Hello, out of the blue, NFS clients started flooding my FAS's NFS stack with bad sequence numbers. This is causing write I/O timeouts on PVCs, probably because of the flooding. As you can imagine, the SQL HA DB clusters are not having it 😦
Hundreds of thousands of these messages are being suppressed, and the only events logged are at NOTICE and DEBUG severity.
We are pretty much stuck, even with NetApp support involved. The problems are sporadic on the applications, and not all PVs are affected.
fas8300::*> event log show -time >4h -event nblade.nfs4SequenceInvalid,ems.engine.suppressed
Time Node Severity Event
10/18/2024 21:49:19 fas8300-01 NOTICE nblade.nfs4SequenceInvalid: NFS client (IP: 10.144.14.34) sent sequence# 2, but server expected sequence# 10. Server error: OLD_STATEID.
10/18/2024 21:49:19 fas8300-01 DEBUG ems.engine.suppressed: Event 'nblade.nfs4SequenceInvalid' suppressed 499439 times in last 121 seconds.
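To narrow down which clients are responsible, a small script can tally the offending IPs and estimate the event rate from the suppression counter. This is only a rough sketch: it assumes the `event log show` output has been captured as text lines, the regular expressions are written against the two message shapes shown above, and the `summarize` helper is a hypothetical name of mine.

```python
import re
from collections import Counter

# Patterns written against the two EMS message shapes shown above.
SEQ_RE = re.compile(
    r"nblade\.nfs4SequenceInvalid: NFS client \(IP: (?P<ip>[\d.]+)\) "
    r"sent sequence# (?P<sent>\d+), but server expected sequence# (?P<expected>\d+)"
)
SUPPRESSED_RE = re.compile(
    r"ems\.engine\.suppressed: Event '(?P<event>[\w.]+)' "
    r"suppressed (?P<count>\d+) times in last (?P<secs>\d+) seconds"
)


def summarize(lines):
    """Count logged events per client IP and compute suppressed events per second."""
    per_client = Counter()
    rates = []
    for line in lines:
        m = SEQ_RE.search(line)
        if m:
            per_client[m["ip"]] += 1
            continue
        m = SUPPRESSED_RE.search(line)
        if m:
            rates.append((m["event"], int(m["count"]) / int(m["secs"])))
    return per_client, rates


# The two lines from the output above, used as sample input.
sample = [
    "10/18/2024 21:49:19 fas8300-01 NOTICE nblade.nfs4SequenceInvalid: "
    "NFS client (IP: 10.144.14.34) sent sequence# 2, but server expected "
    "sequence# 10. Server error: OLD_STATEID.",
    "10/18/2024 21:49:19 fas8300-01 DEBUG ems.engine.suppressed: Event "
    "'nblade.nfs4SequenceInvalid' suppressed 499439 times in last 121 seconds.",
]

per_client, rates = summarize(sample)
print(per_client.most_common())
for event, rate in rates:
    print(f"{event}: ~{rate:.0f} suppressed events/s")
```

Note that the per-client count only covers events that were actually logged. The suppressed ones carry no IP, so the tally is a sample of the offenders rather than a full picture.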
On the NFS clients, I see the following in /var/log/messages. I don't know whether it is related, but I'd say not:
Oct 17 18:18:21 nfs-client nfsrahead[3638733]: setting /var/lib/rancher/pods/27e6f69a-1a18-4d95-ab7a-734ef47e5085/volumes/kubernetes.io~csi/pvc-af551ebd-60a4-45be-aa10-6d4bcb674131/mount readahead to 128
Oct 17 18:18:32 nfs-client nfsidmap[3639057]: nss_getpwnam: name 'root@nfs-filer.ad-domain' does not map into domain ''
Oct 17 18:22:49 nfs-client nfsidmap[3642513]: nss_name_to_gid: name 'root@nfs-filer.ad-domain' does not map into domain 'ad-domain'
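To see the idmap mismatch at a glance, the `nfsidmap` lines can be reduced to pairs of (domain in the name sent by the filer, domain the client tried to map into). This is a minimal sketch that assumes the log lines have the exact shape shown above; `idmap_mismatches` is a hypothetical helper name, and it says nothing about whether the mismatch is related to the sequence errors.

```python
import re

# Pattern written against the nfsidmap message shape shown above.
IDMAP_RE = re.compile(
    r"nfsidmap\[\d+\]: (?P<func>\w+): name '(?P<name>[^']*)' "
    r"does not map into domain '(?P<local>[^']*)'"
)


def idmap_mismatches(lines):
    """Return the unique (remote domain, local domain) pairs found in the log lines."""
    found = set()
    for line in lines:
        m = IDMAP_RE.search(line)
        if m:
            # Everything after the first '@' is the domain part of the name.
            remote = m["name"].partition("@")[2]
            found.add((remote, m["local"]))
    return sorted(found)


# The two nfsidmap lines from the log above, used as sample input.
sample_log = [
    "Oct 17 18:18:32 nfs-client nfsidmap[3639057]: nss_getpwnam: name "
    "'root@nfs-filer.ad-domain' does not map into domain ''",
    "Oct 17 18:22:49 nfs-client nfsidmap[3642513]: nss_name_to_gid: name "
    "'root@nfs-filer.ad-domain' does not map into domain 'ad-domain'",
]

for remote, local in idmap_mismatches(sample_log):
    print(f"name domain {remote!r} vs. local domain {local!r}")
```

On the two lines above, this shows the name arriving with domain `nfs-filer.ad-domain` while the client tries to map it into an empty domain in one case and `ad-domain` in the other.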