#nblade.nfs4SequenceInvalid


hollow shell

Hello, out of the blue my NFS clients started flooding the NFS stack on my FAS with bad sequence numbers, causing write I/O timeouts on the PVCs, probably because of the flooding. As you can imagine, the SQL HA DB clusters are not having it 😦

I have hundreds of thousands of suppressed messages, all of them NOTICE and DEBUG only.

We are pretty much stuck, even with NetApp support involved. The problems hit the applications sporadically, and not all PVs are affected.

fas8300::*> event log show -time >4h -event nblade.nfs4SequenceInvalid,ems.engine.suppressed
Time Node Severity Event


10/18/2024 21:49:19 fas8300-01 NOTICE nblade.nfs4SequenceInvalid: NFS client (IP: 10.144.14.34) sent sequence# 2, but server expected sequence# 10. Server error: OLD_STATEID.
10/18/2024 21:49:19 fas8300-01 DEBUG ems.engine.suppressed: Event 'nblade.nfs4SequenceInvalid' suppressed 499439 times in last 121 seconds.
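To put the suppression counter into perspective, it can be turned into a per-second rate. A minimal Python sketch, assuming the EMS text format shown in the output above; the regex and the `suppression_rate` helper are hypothetical, not an ONTAP tool:

```python
import re
from typing import Optional, Tuple

# Pattern for the ems.engine.suppressed text as printed by "event log show" above
# (wording assumed from that output, not from ONTAP documentation).
SUPPRESSED_RE = re.compile(
    r"Event '(?P<event>[\w.]+)' suppressed (?P<count>\d+) times "
    r"in last (?P<seconds>\d+) seconds"
)


def suppression_rate(line: str) -> Optional[Tuple[str, float]]:
    """Return (event name, events per second) for a suppression line, else None."""
    match = SUPPRESSED_RE.search(line)
    if match is None:
        return None
    count = int(match.group("count"))
    seconds = int(match.group("seconds"))
    return match.group("event"), count / seconds


line = (
    "10/18/2024 21:49:19 fas8300-01 DEBUG ems.engine.suppressed: "
    "Event 'nblade.nfs4SequenceInvalid' suppressed 499439 times in last 121 seconds."
)
result = suppression_rate(line)
if result is not None:
    event, rate = result
    print(f"{event}: ~{rate:.0f} events/s")  # nblade.nfs4SequenceInvalid: ~4128 events/s
```

So a single node is logging roughly 4,100 invalid-sequence events per second.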

On the NFS clients I see the following in /var/log/messages. I don't know whether it's related; I'd say not.

Oct 17 18:18:21 nfs-client nfsrahead[3638733]: setting /var/lib/rancher/pods/27e6f69a-1a18-4d95-ab7a-734ef47e5085/volumes/kubernetes.io~csi/pvc-af551ebd-60a4-45be-aa10-6d4bcb674131/mount readahead to 128
Oct 17 18:18:32 nfs-client nfsidmap[3639057]: nss_getpwnam: name 'root@nfs-filer.ad-domain' does not map into domain ''
Oct 17 18:22:49 nfs-client nfsidmap[3642513]: nss_name_to_gid: name 'root@nfs-filer.ad-domain' does not map into domain 'ad-domain'

novel karma

Try switching to NFSv3. If you don't have LDAP implemented for NFSv4, there will be issues: the nfsdomain must match on the NFS server and the clients, and so must the users' UIDs.
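The nfsidmap messages in the first post fit this explanation. NFSv4 sends file owners as `user@domain` strings, and the client only maps them to a local account when the part after `@` matches its configured ID-mapping domain (the `Domain` setting in `/etc/idmapd.conf` on Linux). A simplified Python sketch of that check; `maps_into_domain` is a hypothetical helper, not the libnfsidmap code, and the case-insensitive comparison is an assumption:

```python
from typing import Optional


def maps_into_domain(nfs4_name: str, local_domain: str) -> Optional[str]:
    """Return the local user name if nfs4_name's domain matches local_domain.

    Simplified model of NFSv4 ID mapping (assumption: the part after the last
    '@' is compared case-insensitively with the configured domain).
    """
    user, sep, domain = nfs4_name.rpartition("@")
    if not sep:
        return None  # no '@': not a user@domain string
    if domain.lower() != local_domain.lower():
        return None  # domain mismatch: the name cannot be mapped to a local account
    return user


# Values taken from the nfsidmap messages quoted in the first post.
print(maps_into_domain("root@nfs-filer.ad-domain", "ad-domain"))            # None
print(maps_into_domain("root@nfs-filer.ad-domain", "nfs-filer.ad-domain"))  # root
```

Here the name arrives with the domain `nfs-filer.ad-domain` while the client is configured for `ad-domain`, so the mapping fails. NFSv3 sends numeric UIDs/GIDs instead of names, which is one reason the switch sidesteps this.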

hollow shell

It turns out we may have hit a bug. NetApp was able to reproduce our problem in their lab, and it happens when we enable a second storage class: we used ontap-nas-economy in addition to ontap-nas.

Also, it works with NFSv3 because NFSv3 does not use stateids; they only exist in NFSv4 (4.0, 4.1 and 4.2 all use them). That explains why NFSv3 is unaffected.
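The `OLD_STATEID` error in the EMS message comes from this stateid bookkeeping. In NFSv4.1 (RFC 5661, section 8.2.2) a stateid carries a `seqid` that the server increments whenever the associated state changes; a request carrying a lower seqid than the current one is rejected with NFS4ERR_OLD_STATEID, a higher one with NFS4ERR_BAD_STATEID, and seqid 0 means "the current stateid". A toy Python model of only that comparison (the class and function names are hypothetical; a real server also validates the stateid's `other` field and much more):

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OK = "NFS4_OK"
    OLD_STATEID = "NFS4ERR_OLD_STATEID"
    BAD_STATEID = "NFS4ERR_BAD_STATEID"


@dataclass
class ServerState:
    """Server-side view of one piece of open/lock state (toy model)."""
    current_seqid: int

    def bump(self) -> None:
        # The server increments seqid whenever this state changes.
        self.current_seqid += 1


def check_seqid(state: ServerState, client_seqid: int) -> Status:
    """Compare the seqid a client sent with the server's current one."""
    if client_seqid == 0:
        return Status.OK  # NFSv4.1: seqid 0 means "use the current stateid"
    if client_seqid < state.current_seqid:
        return Status.OLD_STATEID  # client is using a superseded stateid
    if client_seqid > state.current_seqid:
        return Status.BAD_STATEID  # client claims a seqid the server never issued
    return Status.OK


# The case from the EMS log: client sent 2, server expected 10.
state = ServerState(current_seqid=10)
print(check_seqid(state, 2).value)  # NFS4ERR_OLD_STATEID
```

NFSv3 keeps no such per-file state on the server, so the same workload never reaches this check.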


From NetApp (Leonardo):

NFS uses something called “filehandles”: opaque strings generated by the storage to represent a file. This way a file can be moved or renamed while the client continues to refer to it.
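To illustrate the idea, here is a toy Python model (a hypothetical `ToyServer`, not how ONTAP actually builds filehandles): the server hands the client an opaque handle tied to the file's identity, here an inode number, rather than to its path, so renaming the file does not invalidate the handle.

```python
import secrets
from typing import Dict


class ToyServer:
    """Hypothetical server that issues opaque filehandles keyed by inode, not path."""

    def __init__(self) -> None:
        self.path_to_inode: Dict[str, int] = {}
        self.inode_to_handle: Dict[int, bytes] = {}
        self.handle_to_inode: Dict[bytes, int] = {}
        self.next_inode = 100

    def create(self, path: str) -> None:
        self.path_to_inode[path] = self.next_inode
        self.next_inode += 1

    def lookup(self, path: str) -> bytes:
        """Return the filehandle for path, issuing one on first use (like LOOKUP)."""
        inode = self.path_to_inode[path]
        if inode not in self.inode_to_handle:
            handle = secrets.token_bytes(16)  # opaque bytes: the client never parses them
            self.inode_to_handle[inode] = handle
            self.handle_to_inode[handle] = inode
        return self.inode_to_handle[inode]

    def rename(self, old: str, new: str) -> None:
        # Only the path changes; the inode, and therefore the handle, stays the same.
        self.path_to_inode[new] = self.path_to_inode.pop(old)

    def resolve(self, handle: bytes) -> int:
        """Map a filehandle back to its file (KeyError plays the role of a stale handle)."""
        return self.handle_to_inode[handle]


server = ToyServer()
server.create("/data/file.txt")
fh = server.lookup("/data/file.txt")
server.rename("/data/file.txt", "/data/archive/file.txt")
# The client's handle still resolves to the same file after the rename.
print(server.resolve(fh) == server.path_to_inode["/data/archive/file.txt"])  # True
```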

ONTAP generates different types of filehandles depending on whether qtree-specific export policies exist or only volume export policies are in use; the two types are not expected to coexist.

Once the first export policy is applied to a qtree, the storage controller starts using the new filehandles, but only for new mounts. Existing mounts continue to use the old filehandles until they are remounted.
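The behaviour Leonardo describes can be modelled as the filehandle format being chosen when a mount is made and then pinned for the life of that mount. A hypothetical Python sketch: the format names `volume` and `qtree`, the `ToyController` class and the export paths are made up, it simplifies to a single controller-wide switch, and tying the switch to the ontap-nas-economy storage class (which provisions PVs as qtrees) is an assumption for this thread:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Mount:
    client: str
    export: str
    fh_format: str  # fixed when the mount is created


@dataclass
class ToyController:
    """Hypothetical controller: the format depends on qtree policies at mount time."""
    qtree_policy_applied: bool = False
    mounts: List[Mount] = field(default_factory=list)

    def mount(self, client: str, export: str) -> Mount:
        fmt = "qtree" if self.qtree_policy_applied else "volume"
        new_mount = Mount(client, export, fmt)
        self.mounts.append(new_mount)
        return new_mount

    def apply_first_qtree_policy(self) -> None:
        # Affects only mounts made from now on; existing Mount objects keep their format.
        self.qtree_policy_applied = True


ctl = ToyController()
old = ctl.mount("10.144.14.34", "/vol_a")         # mounted before any qtree policy
ctl.apply_first_qtree_policy()                     # assumed: first ontap-nas-economy PV
new = ctl.mount("10.144.14.34", "/vol_b/qtree1")  # mounted afterwards
print(old.fh_format, new.fh_format)  # volume qtree
```

The same client now holds one mount per format, which is the mixed state described next.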

In the traces we collected, the client is using both types of filehandles at the same time. This is unexpected and is what is causing the issues.
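When going through packet captures, one way to spot this situation is to group the filehandle formats seen per client and flag any client that uses more than one. A minimal Python sketch, assuming the trace has already been reduced to (client IP, filehandle format) pairs; telling the formats apart from raw filehandle bytes is ONTAP-internal and not covered here, and the records below are made up:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def mixed_format_clients(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return clients that used more than one filehandle format, with the formats seen."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for client_ip, fh_format in records:
        seen[client_ip].add(fh_format)
    # Sort the formats so the result is deterministic.
    return {ip: sorted(fmts) for ip, fmts in seen.items() if len(fmts) > 1}


# Hypothetical, pre-classified trace records.
trace = [
    ("10.144.14.34", "volume"),
    ("10.144.14.34", "qtree"),
    ("10.144.14.35", "volume"),
]
print(mixed_format_clients(trace))  # {'10.144.14.34': ['qtree', 'volume']}
```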