#High other OPs on CIFS Share/Citrix Environment


slate mesa
#

Hi There,

We are seeing very high other ops on a CIFS share (around 20k IOPS) and are constantly running into the customer's IOPS limit.
The share is used for Citrix profiles and folder redirection.
Can you give me commands to narrow this down, or is this a common issue I'm not aware of?

Cheers! Philipp


devout leaf
#

I would start with statistics top file show and statistics top client show

#

"other ops" is usually metadata

slate mesa
#

Hey, statistics top file show does not show any specific file for the customer. It just shows the share's root volume. statistics top client show always shows different Citrix workers.

#

4006 cl1-n2a VSERVER n2a_aggr1ssd VOLUME /qtree

#

                                  *Total  Read Write Other    Read   Write Latency
Volume       Vserver Aggregate       Ops   Ops   Ops   Ops   (Bps)   (Bps)    (us)
kd_vs01_dat1 kd-vs01 n2a_aggr1ssd   6406   417    62  5016 9356814 1004999      71
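
To put a number on how metadata-heavy that row is, here is a minimal Python sketch that splits the counter line and computes the share of other ops. The column names are my own labels taken from the header above, and the helper names are made up for illustration; this is not an ONTAP API.

```python
# Column labels are assumed from the header of the counter output above.
COLUMNS = ["volume", "vserver", "aggregate", "total_ops", "read_ops",
           "write_ops", "other_ops", "read_bps", "write_bps", "latency_us"]

def parse_volume_row(line):
    """Split one whitespace-separated counter row into a dict with int counters."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    row = dict(zip(COLUMNS, fields))
    for name in COLUMNS[3:]:  # everything after the three name columns is numeric
        row[name] = int(row[name])
    return row

def other_ops_share(row):
    """Fraction of total ops that are 'other' (metadata) ops."""
    if row["total_ops"] == 0:
        return 0.0
    return row["other_ops"] / row["total_ops"]

row = parse_volume_row(
    "kd_vs01_dat1 kd-vs01 n2a_aggr1ssd 6406 417 62 5016 9356814 1004999 71"
)
print(f"{row['volume']}: {other_ops_share(row):.1%} of ops are other ops")
```

For the sample above that works out to roughly 78% other ops, which is the part worth chasing rather than the read/write load.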

hardy quarry
#

6k total ops and about 9 MB/s of reads doesn't seem too extreme, depending on the number of users

pulsar flare
#

But Citrix tends to have more other ops usually.

slate mesa
#

Don't get me wrong, there is no latency and the load is okay. We observe normal read and write IOPS, but the other ops (open/close) are all over the place.

#

To put this in perspective: read is e.g. 1000 IOPS, write 600 IOPS, and other is 10000 IOPS.

#

We suspect that MS Teams could be the problem. For every user there are 10-15 open files in /AppData/Teams/cache etc.
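
One way to check that suspicion from a client is to count the files per profile under the suspect folder. A rough Python sketch follows; the share layout (one directory per user) and the function name are assumptions, and it counts files on disk, not currently open handles. Note that walking the share generates metadata ops of its own, so it is best run once and off-peak.

```python
from pathlib import Path

def count_files_per_profile(share_root, subpath):
    """Return {profile_name: file_count} for files under <share_root>/<profile>/<subpath>.

    Assumes one directory per user directly below share_root (hypothetical layout).
    """
    counts = {}
    for profile in sorted(Path(share_root).iterdir()):
        if not profile.is_dir():
            continue
        target = profile / subpath
        if target.is_dir():
            # rglob walks the subtree recursively; only regular files are counted.
            counts[profile.name] = sum(1 for p in target.rglob("*") if p.is_file())
        else:
            counts[profile.name] = 0
    return counts
```

Called with the profile share root and the relative cache path in question, it returns a per-user count that can be sorted to find the heaviest profiles.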

hardy quarry
#

yeah, there is often some tuning to do with Windows applications that are very chatty. Pulling the whole profile to the local TS is one way of moving the chatter off the share, but then you have to filter what you store at log-off

#

browsers are a special kind of evil with all of their caches and configuration files

devout leaf
#

other ops can also be client-side virus scanning or indexing software, that does a ton of metadata ops on the backend

pulsar flare
#

Chromium-based browsers are bad. Citrix containers do help mitigate some of this. Really, anything in the AppData folder is probably a problem.

mighty zenith
#

We have been seeing an issue with Other IOPs for years at my customer. We have opened cases, but have never been able to track down exactly what is causing it. As @devout leaf mentioned, I strongly suspect it is some virus scanning or indexing that is being done by the clients, but we have never been able to convince the server admins to do a real investigation to determine what is generating them. The Other IOPs don't actually seem to cause any performance issues, so to them, it isn't a problem. But it really messes with the metrics on our side of things.

I have noticed that the Other IOPs increase exponentially as more servers start accessing the volume (which supports my theory that this is a virus scan or index that is part of the standard client server builds). I have also seen them increase dramatically for NFS volumes when the clients mount the snapshots of their volumes and leave them mounted (again, probably because the mysterious crawl is now separately crawling each snapshot directory).

hardy quarry
# mighty zenith We have been seeing an issue with Other IOPs for years at my customer. We have ...

In the case of multiple servers accessing the same volume, most of your "other ops" are checks on the status of files/locks and for new files. It's a poor man's clustering and it doesn't scale very well. Those using such approaches would always be better off with fewer, more performant servers than with many smaller ones, as the "synchronization" overhead with many servers quickly gets out of hand. It's what happens when application people don't understand storage or the costs of not using real clusters with clustered filesystems or other batch approaches. I've seen 80k "other ops" on such setups and the application people didn't know why things were "slow".
Things like log scraping are also very primitive but commonly used these days: they monitor many open files for changes and cause exactly these IO patterns.
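
As a back-of-the-envelope illustration of how that polling adds up, here is a small Python sketch. The model (every server checks every watched file once per interval) and all the numbers are assumptions for illustration, not measurements from any system in this thread.

```python
def estimated_metadata_ops_per_sec(servers, files_per_server, poll_interval_s, ops_per_check=1):
    """Rough metadata op rate if each server polls each watched file once per interval.

    ops_per_check is the assumed number of metadata ops per poll (e.g. open + close).
    """
    if poll_interval_s <= 0:
        raise ValueError("poll_interval_s must be positive")
    return servers * files_per_server * ops_per_check / poll_interval_s

# Hypothetical setup: 200 watched files per server, polled every second, 2 ops per check.
for servers in (4, 16, 64):
    rate = estimated_metadata_ops_per_sec(servers, files_per_server=200,
                                          poll_interval_s=1.0, ops_per_check=2)
    print(f"{servers:>3} servers -> ~{rate:,.0f} other ops/s")
```

Even in this simple model the rate grows with every server added while read and write throughput stay flat, which matches the pattern of high other ops without latency problems.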

worldly lodge
#

I see customers running virus scanning at the desktop/server who still tell the application to “scan all”, including network drives. As soon as they exclude network drives, it clears up. If that’s the case, they should really be looking at an enterprise antivirus solution that works with ONTAP.

hardy quarry
#

if it was virus scanning, you'd probably see more than "Other Ops" because something would be reading the files

static citrus
#

not necessarily. Off-box scanning doesn't read the file, but it does pull metadata and check for a specific flag on the file to determine whether it's been scanned previously

hardy quarry
#

he's talking about client-based scanning

mighty zenith
worldly lodge
static citrus
#

I hear it all the time as I'm showing them the vscan report clearly showing their stuff trying to scan my stuff

pulsar flare