#ONAP/7Mode/Poor_performance_in_Data_ONTAP_7-Mode_due_to_excessive_file

plain niche
#

Hi Team

Users utilizing NFS are reporting slow response times from the storage. Upon inspection of the filer, the following message was found in the event log, and it was confirmed that Disk Util and Cache hit are both at 100% according to sysstat.

The customer is running 7-Mode ONTAP 8.1.2P2 on FAS8040 hardware. They are planning a migration but are concerned as critical data is stored, and the system cannot afford performance degradation. In such a scenario, what steps should be taken to improve performance?

https://kb.netapp.com/Legacy/ONTAP/7Mode/Poor_performance_in_Data_ONTAP_7-Mode_due_to_excessive_file_deletions

tulip cedar
#

as the KB states, try to do fewer deletes from the client. If too many deletes happen concurrently, the system switches to a synchronous deletion mode until resources are available again, and during that time IO will be impacted. It was a known problem with older ONTAP versions.
An upgrade to the latest 7-mode ONTAP version (8.2.5P5) might also help, but only ONTAP 9 really fixes these issues for good.
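One way to "do fewer deletes" from the client side is to pace a bulk cleanup instead of firing it all at once. The following is only a minimal sketch: `throttled_delete`, the batch size and the pause length are hypothetical and not taken from the KB, and suitable values would have to be found by watching the filer while the job runs.

```python
import os
import time
from pathlib import Path


def throttled_delete(root, batch_size=500, pause_s=2.0, dry_run=True, sleep=time.sleep):
    """Delete regular files under root in small batches, pausing between batches.

    batch_size and pause_s are illustrative assumptions, not NetApp-recommended
    values. With dry_run=True nothing is removed; files are only counted.
    Returns the number of files deleted (or that would be deleted).
    """
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not dry_run:
                (Path(dirpath) / name).unlink()
            count += 1
            # Pause after every full batch so deletes do not arrive in one burst.
            if count % batch_size == 0:
                sleep(pause_s)
    return count
```

A call such as `throttled_delete("/mnt/nfs/scratch", dry_run=True)` (the path is a placeholder) only reports how many files would be removed; pass `dry_run=False` once the count looks right. Running it from a single client at a time matters as much as the pause, since the problem is triggered by concurrent deletes.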

plain niche
#

Thanks for the comment. Is there a way to check how much traffic or how many packets are occurring on an NFS session connected in 7-mode? I'm also curious if there's a command similar to "nfs connected-sessions" in ONTAP 9.

tulip cedar
#

NFSv3 is stateless, so technically there are no "sessions". ONTAP 9 fakes this by listing clients that have accessed anything within the last X minutes. AFAIK there is no such thing in 7-mode, and the only thing I remember that shows inbound traffic is either sysstat -x or ifstat (but it's been a while since I last worked with a 7-mode system, and our lab, which still has one, is currently down for maintenance 😉)

plain niche
#

Whenever the disk util/cache hit reaches 100%, it shows an incredibly slow response time to the extent that NFS service becomes unusable. I have no confidence in resolving this issue whatsoever.

tulip cedar
#

when disk util hits 100% you are basically bound by disk IO, and you're past the "knee" in the IOPS vs. latency (ms) curve. So yeah, in that case you will be in for a slow ride, as there's nothing that can be done to speed up physics

#

you can add more disks to the aggregate (for example, double the number of disks) to get more IOPS out and push the point where it becomes a problem further out on the curve, but depending on your workload (i.e. how much the clients are "stressing" the system) that might not help much either

plain niche
#

These symptoms suddenly started appearing around two months ago, in the same environment and without any significant change in the amount of data stored. There have been no physical disk removals or failures, and no changes in hardware, network environment, etc.

tulip cedar
#

it's usually client side, i.e. a dozen new NFS clients, or some new software (e.g. antivirus) that puts a lot of additional load on the system.
The interesting thing is that the IOPS-latency curve is nonlinear at the end, so adding 10% more clients doesn't increase the latency by 10% but by 50%, 100% or even more. So yeah, maybe the system was only barely making it before, and that one additional client brought it to the tipping point.
You might be able to do something with ONTAP priorities (is that what they were called in 7-mode? the poor-man's QoS thingy?) to slow down one volume to make room for more IOPS on another volume
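To make the "knee" in the curve concrete, here is a rough sketch that uses a textbook M/M/1 queue as a stand-in for the disks. This is a simplifying assumption, not how ONTAP actually schedules IO, and the function names are made up for illustration. In that model the response time is the service time divided by (1 - utilization), which is why the same 10% of extra load hurts far more on a busy system than on an idle one.

```python
def mm1_latency_ms(service_ms, utilization):
    """Mean response time of an M/M/1 queue: service time / (1 - utilization).

    Assumption: one queue with random arrivals stands in for the aggregate.
    Real disk latency behaves similarly in shape but not in exact numbers.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_ms / (1.0 - utilization)


def latency_increase_pct(utilization, load_growth):
    """Percent latency increase when load grows by load_growth (0.10 = +10%)."""
    before = mm1_latency_ms(1.0, utilization)
    after = mm1_latency_ms(1.0, utilization * (1.0 + load_growth))
    return (after / before - 1.0) * 100.0


if __name__ == "__main__":
    # Same +10% load, applied at different starting utilizations.
    for util in (0.50, 0.85, 0.90):
        print(f"{util:.0%} busy, +10% load: latency +{latency_increase_pct(util, 0.10):.0f}%")
```

In this model, 10% more load at 50% utilization adds roughly 11% latency, while at 85% it more than doubles it. Doubling the spindles would roughly halve utilization for the same load (assuming it spreads evenly), which moves the system back to the flat part of the curve, at least until the clients push harder.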