We have a very simple FlexGroup (FG) consisting of two sub-volumes, one on each aggregate, and each aggregate is on a different controller (in the same cluster, of course).
This has been working great for some time.
But recently we have been seeing timeouts in the cluster event log, such as "Nblade.CifsOperationTimedOut", for these operations: "SMB2_COM_QUERY_INFO", "SMB2_COM_SET_INFO", "SMB2_COM_WRITE" and "SMB2_COM_CREATE".
From the client side we can trigger it by copying a file of, say, 35 GB. It copies along just fine, then after about 80% it slows down, stops at or near the end, stays there for a minute or two, and then throws an error.
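To pin down exactly where in the copy the slowdown starts, a client-side timing script can help. This is a minimal sketch, not an ONTAP or NetApp tool: the helper name `timed_copy`, the 64 MiB chunk size and the example paths are all hypothetical. It copies the file in chunks, forces each chunk to disk with `os.fsync`, and records the progress percentage and throughput for every chunk, so a stall near the end of the copy shows up as a sharp drop in MB/s.

```python
import os
import sys
import time


def timed_copy(src: str, dst: str, chunk_size: int = 64 * 1024 * 1024) -> list[tuple[int, float]]:
    """Copy src to dst in fixed-size chunks, returning (percent_done, MB/s) per chunk.

    Hypothetical diagnostic helper: the MB/s figure covers reading, writing
    and fsync-ing one chunk, so a slow or stalled chunk stands out clearly.
    """
    total = os.path.getsize(src)
    samples: list[tuple[int, float]] = []
    copied = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            start = time.monotonic()
            chunk = fin.read(chunk_size)
            if not chunk:
                break  # end of source file
            fout.write(chunk)
            fout.flush()
            os.fsync(fout.fileno())  # wait until the share has accepted this chunk
            elapsed = max(time.monotonic() - start, 1e-9)  # avoid division by zero
            copied += len(chunk)
            percent = copied * 100 // total
            samples.append((percent, len(chunk) / elapsed / 1e6))
    return samples


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example (hypothetical paths): python timed_copy.py D:\big.bin \\svm\share\big.bin
    for pct, mbps in timed_copy(sys.argv[1], sys.argv[2]):
        print(f"{pct:3d}%  {mbps:8.1f} MB/s")
```

Running the same script against the FG share and against a share on a plain FlexVol gives two directly comparable throughput profiles.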
We have tried the same copy to another FG on the same cluster and get the same issue.
We then tried copying to a normal FlexVol, which works just fine without any errors.
We are on ONTAP 9.16.1P10.
We have of course looked at the network, and it all seems fine. The cluster switches are not in a supported setup, but it has been working fine before. The switches are Cisco N3K 3172, I think.
But again, we see no issues on the switch ports, and we are copying at about 400-500 MB/s, so nothing is getting "overloaded" 😉
Any suggestions are very welcome.