#NFS server not responding
This means the worker node cannot set up a TCP connection to ONTAP for NFS. A few possibilities: a) the IP address changed and a firewall or client match rule prevents access, b) the NFS protocol version changed (e.g. from NFSv3 to NFSv4) and the SVM is not configured correctly for that version, or c) security/SELinux policies or something similar on the worker nodes are preventing the NFS connection.
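To narrow down whether the worker node can open a TCP connection at all, a quick check like the following could be run from the node (or from a debug pod on it). This is only a minimal sketch: the helper names are made up for illustration, and the port list assumes the standard NFS port 2049 plus rpcbind on 111 (which NFSv3 clients also use). It tests TCP reachability only, not export policies or NFS version settings.

```python
import socket

# Assumed ports: 2049 is the standard NFS port; 111 is rpcbind/portmapper,
# which NFSv3 clients also contact. Adjust for your environment.
NFS_PORTS = (2049, 111)


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_nfs_ports(host: str, ports=NFS_PORTS, timeout: float = 3.0) -> dict:
    """Map each port to whether a TCP connection could be opened."""
    return {port: can_connect(host, port, timeout) for port in ports}
```

Calling check_nfs_ports with the data LIF address returns a dict such as {2049: True, 111: False}. A False on 2049 points toward possibility a) or c); if both ports connect but mounts still fail, the NFS version or export configuration is the more likely suspect.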
And as silly as it sounds, please make sure your vserver aggr-list has the correct aggregates listed. I've seen that cause issues that manifest in very different ways.
Check with: vserver show -fields aggr-list
It should not be an empty list!
This issue is not consistent: it works fine 4 out of 5 times.
Hm, that sounds strange. Two things from my experience can cause that: a) NFS storpool exhaustion. Are you getting any EMS messages about NFS storpool exhaustion on your NetApp? Or b) a duplicate IP address (those should also be logged in ONTAP's EMS system, at least if they're in the same layer 2 broadcast domain).
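Since the failure is intermittent, it may help to put a number on it before digging through EMS. The sketch below repeatedly attempts a TCP connection and reports the failure rate. The function names and defaults are hypothetical, and it does not detect storpool exhaustion or duplicate IPs itself; it only records when connections fail, so the times can be compared against EMS entries.

```python
import socket
import time


def probe_repeatedly(host, port, attempts=20, interval=1.0, timeout=3.0):
    """Try a TCP connect `attempts` times.

    Returns a list of (succeeded, seconds_elapsed) tuples, one per attempt.
    """
    results = []
    for i in range(attempts):
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                ok = True
        except OSError:
            ok = False
        results.append((ok, time.monotonic() - start))
        # Pause between attempts, but not after the last one.
        if i < attempts - 1:
            time.sleep(interval)
    return results


def failure_rate(results):
    """Fraction of attempts that failed (0.0 when there are no results)."""
    if not results:
        return 0.0
    failed = sum(1 for ok, _ in results if not ok)
    return failed / len(results)
```

Run against the NFS data LIF on port 2049 from an affected worker node. A failure rate that stays near zero from one node but not another would suggest a node-side or path-specific cause rather than something on the ONTAP side.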
Thanks @crisp shell @edgy lynx
After we upgraded OCP to 4.13, a few of the Trident pods are getting restarted occasionally.
Here is what I got from the logs:
level=fatal msg="Unable to start the K8S hybrid controller frontend." error="could not initialize Kubernetes client; couldn't retrieve API server's version"
But after a restart it works fine for a few hours.
And during init container execution we see the error below:
nfs: server xxx.xx.xx.x not responding, still trying
INFO: task python3.11 blocked for more than 122 seconds.
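To see how often these messages occur and which servers and tasks are involved, the node's kernel log can be scanned for them. This is a rough sketch: the two patterns are modelled only on the messages quoted above, the function name is made up, and other kernel versions may word the messages differently (for example, by appending a PID to the task name).

```python
import re
from collections import Counter

# Patterns modelled on the two kernel messages quoted in this thread.
NOT_RESPONDING = re.compile(r"nfs: server (\S+) not responding")
HUNG_TASK = re.compile(r"INFO: task (\S+) blocked for more than (\d+) seconds")


def summarize(lines):
    """Count 'not responding' messages per server and hung-task
    messages per task name across the given log lines."""
    servers = Counter()
    tasks = Counter()
    for line in lines:
        match = NOT_RESPONDING.search(line)
        if match:
            servers[match.group(1)] += 1
            continue
        match = HUNG_TASK.search(line)
        if match:
            tasks[match.group(1)] += 1
    return servers, tasks
```

Feeding it the lines of a saved dmesg or journal export gives two counters. If only one server address shows up, that narrows the search to a single LIF.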
Are you using certificates or credentials? Are K8S and ONTAP on the same subnet? Any possible firewall in the way?
How soon do you notice the issue? You could kick off a network trace from ONTAP, with options to keep only the last x files at a size of y MB each. If you are able to stop the trace once the issue has occurred, you will have something to open a case with.
Anything in the event log on the NetApp?
Could be a perf issue too.