# Direct vs Indirect access

rough mulch
#

The section below states that we don't need to worry about indirect access; you can find it in the PDF linked below:

Use a single logical interface (LIF) for each SVM on each node in the ONTAP cluster. Past recommendations of a LIF per datastore are no longer necessary. While direct access (LIF and datastore on same node) is best, don’t worry about indirect access because the performance effect is generally minimal (microseconds).
https://docs.netapp.com/us-en/ontap-apps-dbs/pdfs/fullsite-sidebar/ONTAP_and_enterprise_applications.pdf

However, the KB below (although I don't fully understand it) indicates that indirect access can cause a performance issue and recommends using direct access.
https://kb.netapp.com/on-prem/ontap/Perf/Perf-KBs/Elevated_CPU_or_high_cluster_latency_when_using_indirect_traffic_using_CIFS_or_NFS

Can some experts here please explain how I should understand the KB, and what the "Network Exempt CPU domain" really is? Does direct vs. indirect access really matter?

hoary bough
#

The second article is over a year old; the first one is only a few days old.
It would seem that the newer article is the correct one to follow.

We use one LIF per node. We don't have one per datastore, and our performance isn't impacted by it.

rough mulch
#

I know all of that. But does anyone know what the “Network Exempt CPU domain” is, or whether the KB still applies today?

strange mountain
#

The ASA is indirect traffic all over the place. An ASA is basically just an AFF but we publish all paths via ALUA as active, even the indirect paths. The cluster network gets used heavily on an ASA.

And also these days, with most customers deploying the 100Gb Nexus 9336C-FX2 cluster switches, and the NS224 NVMe shelves connecting at 100GbE also, the cluster network is typically not your bottleneck for throughput, it can just add some microseconds of latency as any additional hop will.

But note in your first link it does say this, which is absolutely correct:

"Do not be overly concerned about indirect traffic. It is best to avoid indirect traffic in a very I/O-intensive environment for which every microsecond of latency is critical, but the visible performance effect is negligible for typical workloads."

rough mulch
#

@strange mountain I fully understand what you said, but I need to answer these questions for my managers. Do you know what the “Network Exempt CPU domain” is? I honestly don’t understand the issue described in the KB that my managers brought to me. We are not an ASA shop, only AFF and FAS serving NFS or CIFS.

strange mountain
#

NWK_Exempt is the CPU domain that handles NFS and SMB protocol processing, and it can use multiple cores. When traffic goes indirect, both the node that received the traffic and the node that owns the volume take a hit in their NWK_Exempt domain, so you're adding I/O processing load. In NAS environments this domain is often very heavily used. What the KB is basically saying is that the nodes already have their own NAS workloads to process, and on top of that they are receiving traffic destined for other nodes, which they must also process and forward over the cluster network. The result is higher CPU usage, increased cluster network usage, and higher latency.

If the latency of NAS requests isn't causing issues, then there is no problem. But if it is, then you want to try and make your NAS traffic as direct as possible and you'll see a drop in NWK_Exempt across the board.

But that's very hard in CIFS environments as you typically access a UNC path via a name, and access many shares which could be on different nodes. So suddenly you have indirect access immediately.

In large CIFS environments where NWK_Exempt is consuming all resources, the best solutions are:

  1. Add more HA Pairs and spread the load across more physical CPUs
  2. Replace existing controllers with newer generations with more CPU cores

But before that, the first thing to try as much as possible is to eliminate indirect, and prove it is the culprit ... As a test, take the busiest clients and have them access their shares (or NFS mounts) directly to the node hosting their FlexVols via each node's LIF IP. Take the heaviest clients and try that for a period and monitor the CPU. You should see a drop across the board.
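To make that test concrete, here is a minimal Python sketch, assuming you have already collected (for example from the CLI) which node hosts each volume and which node each data LIF lives on. The function name, volume names, node names, and IPs are all hypothetical; it only picks the LIF that gives direct access for each volume.

```python
from typing import Dict, Optional


def direct_lif_for(volume: str,
                   volume_node: Dict[str, str],
                   lif_node: Dict[str, str]) -> Optional[str]:
    """Return the IP of a data LIF on the node that hosts `volume`,
    or None if that node has no LIF (every path to it is indirect)."""
    home = volume_node[volume]
    for lif_ip, node in sorted(lif_node.items()):
        if node == home:
            return lif_ip
    return None


# Hypothetical inventory: the node owning each FlexVol, and the node each LIF lives on.
volume_node = {"vol_home": "node-1", "vol_proj": "node-2", "vol_scratch": "node-3"}
lif_node = {"10.0.0.11": "node-1", "10.0.0.12": "node-2"}

for vol in volume_node:
    target = direct_lif_for(vol, volume_node, lif_node)
    print(f"{vol}: mount via {target}" if target else f"{vol}: no local LIF, indirect only")
```

Mounting `vol_proj` via `10.0.0.12` rather than a LIF on another node is the direct-access test described above; `vol_scratch` is the case where a LIF on node-3 would be needed first.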

hearty quail
#

If you have indirect load, ONTAP basically becomes a router: the node holding the LIF has to receive the traffic (for writes) and send it through the cluster network, and then the D-blade node that owns the volume has to receive it again. That doubles the network CPU load.
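A toy Python model of that "double the load" point, assuming each request costs one arbitrary unit of network-stack (Nwk_Exmpt) processing on every node that touches it. The unit cost and node names are hypothetical, and ONTAP does not account CPU this way; the sketch only shows why the same I/O costs twice as much network processing when it arrives indirectly.

```python
from collections import Counter
from typing import Iterable, Tuple


def nwk_work(requests: Iterable[Tuple[str, str]]) -> Counter:
    """Tally network-processing units per node (toy model, 1 unit per hop).

    Each request is (ingress_node, owning_node). A direct request is handled
    once, on the owning node. An indirect request is handled on the ingress
    node (which forwards it over the cluster network) and again on the owner.
    """
    work = Counter()
    for ingress, owner in requests:
        work[owner] += 1
        if ingress != owner:
            work[ingress] += 1  # the extra "router" hop on the LIF's node
    return work


direct = [("node-1", "node-1")] * 100
indirect = [("node-1", "node-2")] * 100
print(dict(nwk_work(direct)))    # {'node-1': 100}
print(dict(nwk_work(indirect)))  # {'node-2': 100, 'node-1': 100}
```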

rough mulch
#

Thank you all!
I understand that indirect access is not a concern in general, and I also understand how direct and indirect access work. My questions are all about the KB, and whether or not the KB should change my understanding that indirect access is not a concern.

@strange mountain
Your messages explained what the KB is talking about, but unfortunately I still don't fully understand it. Could you please address a couple of follow-ups below? Also, most of our shares are NFS, and we generally use LIFs to reference CIFS shares.

The so-called "NWK_Exempt CPU domain" is the portion of CPU time spent on NFS/CIFS protocol processing AND on transferring network traffic between nodes. Correct?
Did indirect access cause the high NWK_Exempt here, and is that why we want to reduce indirect access?

As suggested by the KB, we can use "node run -node <node> sysstat -M 1" to determine whether NWK_Exempt is too high. But how high is too high?

Is my understanding below correct?
In any event, indirect access alone wouldn't cause performance issues. When the node that received the traffic already has some kind of performance issue, the incoming requests for volumes on the other node add extra load, so the node can't quickly pass those requests on to the other node. We can avoid this kind of issue by using direct access. Indirect access is not the root cause here.

Please let me know. Thank you!

hearty quail
#

Great questions David. Let me check on the KB.

#

This KB is saying if Network Exempt is maxed out, you will have cluster latency. I'm trying to think of an easier way to check, but you could do:
Cluster::> set diag; systemshell -node <node> ps aux | grep -i nwk | grep th | wc -l
and see how many network threads you have.

#

Normally, CI latency only adds tens or hundreds of microseconds.

#

I added some extra clarification. Hopefully that will make it jive better.

#

Thanks for feedback.

rough mulch
#

Sorry for my persistence, but can somebody please tell me:

Was indirect access the root cause of the high NWK_Exempt in this KB? OR
Did the KB just tell us that, since the node already had high NWK_Exempt, we should bypass that node and use direct access via the node that owns the volume?

If it is the former, then that proves indirect access CAN be the root cause of a performance issue. This is the part I want to get confirmed.

strange mountain
#

Typically, in a CIFS/NFS environment where volumes are spread across many nodes (a FlexGroup is a great example), we create a NAS LIF for that SVM on every node that hosts volumes, and then create multiple DNS A records for the same name, each pointing to one of the NAS LIF IPs.

DNS round-robin comes into play here. As clients look up the name, DNS returns them one of the LIF IPs. If the volume is local to that LIF, it's direct access; if the volume is on another node, it becomes indirect access. Given DNS round-robin and a high number of clients, the load gets distributed across all nodes. Problems tend to arise when you have fewer LIFs than nodes hosting volumes for that SVM. If you have one NAS LIF but volumes on different nodes, all traffic has to come in via the one node with the LIF, and you start to see latency over the cluster network.
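An idealized Python model of that round-robin behavior, assuming every client resolves to each A record with equal probability and that I/O is spread across the nodes in fixed proportions. The function and node names are hypothetical, and real clients cache DNS answers, so the spread is lumpier in practice.

```python
from fractions import Fraction
from typing import Dict, List, Tuple


def round_robin_model(lif_nodes: List[str],
                      volume_share: Dict[str, Fraction]
                      ) -> Tuple[Dict[str, Fraction], Fraction]:
    """Idealized DNS round-robin over one A record per LIF.

    Each client resolves to every LIF with equal probability, and its I/O hits
    volumes on each node in proportion to volume_share (values sum to 1).
    Returns (share of front-end traffic entering each LIF node,
             fraction of all I/O that is indirect).
    """
    p_lif = Fraction(1, len(lif_nodes))
    ingress: Dict[str, Fraction] = {}
    direct = Fraction(0)
    for node in lif_nodes:
        ingress[node] = ingress.get(node, Fraction(0)) + p_lif
        # Only I/O for volumes on the LIF's own node stays direct.
        direct += p_lif * volume_share.get(node, Fraction(0))
    return ingress, 1 - direct


nodes = ["node-1", "node-2", "node-3", "node-4"]
share = {n: Fraction(1, 4) for n in nodes}  # data spread evenly over 4 nodes

for layout in (nodes, ["node-1"]):
    ingress, indirect = round_robin_model(layout, share)
    print(f"{len(layout)} LIF(s): indirect={indirect}, max ingress per node={max(ingress.values())}")
```

In this idealized case both layouts send the same 3/4 of I/O indirect; what changes is that with a single LIF one node receives all the front-end traffic, which is the node-side bottleneck the post describes.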

Typically the cluster network is the same speed or faster than most front-end clients can push (not always the case, but most of the time) so the cluster network itself is not usually a bottleneck but the node that has to process all that traffic is.

There are things you can look at such as pNFS or referrals. But ideally if not using FlexGroups you want to try and mount direct to the LIF on the node where the volume resides.

In the VMware world, with NFS Datastores, I just wish VMware would support pNFS ... that frustrates me as Session Trunking is not a solution and only works for multiple LIFs on the same node with ONTAP.

rough mulch
strange mountain
#

It's difficult to say without access to the controller and its performance metrics. Do you have Active IQ Unified Manager deployed? If you pick some of the volumes in that SVM under the "Workload Analysis" section, you get a good graph showing the various cluster components and which one is contributing to the latency. It would be interesting to see what that shows.

If you want a definitive answer the best course of action is to capture a performance archive from the node and create a support case and request investigation.

https://kb.netapp.com/on-prem/ontap/Perf/Perf-KBs/What_are_performance_archives_and_how_are_they_triggered

rough mulch
#

@strange mountain , My 2 questions above are about the case described in this KB https://kb.netapp.com/on-prem/ontap/Perf/Perf-KBs/Elevated_CPU_or_high_cluster_latency_when_using_indirect_traffic_using_CIFS_or_NFS

Is Indirect access in this case as described by the KB the root cause of the performance issue on this node?

OR

The node already had performance issues and therefore it delayed all requests going through the Indirect path?

My manager wanted to know.

strange mountain
#

In that case in the KB, yes, it was being caused by indirect access. You can see that in the last console output of the Issue section, where they viewed the volume latency and saw that it was coming from the "Cluster" which is the backend cluster network.

If nwk_exmpt is high but cluster latency on the volumes is low, then there isn't an issue, since indirect access is expected to drive nwk_exmpt up. But if cluster latency is also being seen, then yes, indirect access is the cause of the performance degradation.

If the node itself is already overworked, then all volumes on that node will show high latency, but not from the "cluster" component; they'll usually see it from Network, Data, or Disk. So if you're seeing a lot of volumes with "cluster" as the latency cause and also high nwk_exmpt usage, then indirect access is the most likely culprit.

If all volumes are seeing high latency caused by other node/cluster components, then it's the already-overworked nodes that are causing the performance issue, not the indirect access directly.
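The reasoning above can be written as a small Python decision helper. It is only a sketch of the rule of thumb in this post, assuming you have already decided whether Nwk_Exmpt is "high" and noted the dominant latency component per volume; the function name, inputs, and the more-than-half threshold are all hypothetical.

```python
from collections import Counter
from typing import Dict


def diagnose(nwk_exmpt_high: bool, top_component: Dict[str, str]) -> str:
    """Rule of thumb from this thread (not a NetApp tool).

    top_component maps volume -> the latency component that dominates for it,
    e.g. "cluster", "network", "data", "disk", as read by eye from
    `qos statistics volume latency show`. "A lot of volumes" is taken here to
    mean more than half, which is an arbitrary, hypothetical cut-off.
    """
    if not top_component:
        return "no data"
    counts = Counter(top_component.values())
    mostly_cluster = counts["cluster"] > len(top_component) / 2
    if mostly_cluster and nwk_exmpt_high:
        return "indirect access is the most likely culprit"
    if mostly_cluster:
        return "cluster latency without high Nwk_Exmpt: outside this rule of thumb"
    if nwk_exmpt_high:
        return "node is busy; latency comes from other components, not indirect access"
    return "latency comes from other components (network, data, disk)"


print(diagnose(True, {"vol1": "cluster", "vol2": "cluster", "vol3": "disk"}))
print(diagnose(True, {"vol1": "data", "vol2": "disk"}))
```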

rough mulch
#

Does Nwk_Exmpt include CPU time on I/O processing caused by NFS/CIFS, AND also CPU time on Cluster which is backend cluster network?

strange mountain
#

Yes, it includes both. It represents:

IP processing, NFS protocol processing (7-mode and cDOT), SMB processing (cDOT)

https://kb.netapp.com/on-prem/ontap/Perf/Perf-KBs/What_are_CPU_as_a_compute_resource_and_the_CPU_domains_in_ONTAP_9

You will always see Nwk_Exmpt usage. Indirect load on it is only a problem if you see latency from the cluster network (which includes the Nwk_Exmpt time spent processing it). If the node is keeping up, cluster latency is low, and the other components (Disk, WAFL, front-end network, etc.) are introducing the latency, but CPU usage and Nwk_Exmpt usage are both high, it just means your node is under heavy load but currently keeping up. Something will eventually give as more workloads hit the controller, so it's not an immediate problem, but it is something to keep an eye on; moving workloads to other nodes is one option.

rough mulch
#

I understand better now. Nowadays the bandwidth of the backend cluster network is much higher than the throughput coming from the client network, so high cluster-network latency caused by indirect traffic, as in the KB's example, wouldn't be seen very often. Is that a reasonable statement?

strange mountain
#

Yes, it's true that most backend cluster networks are very high speed, have dedicated switches, use multiple ports from each controller, and are highly available. Indirect access these days isn't usually a massive issue. That's not to say it won't or can't be, but I rarely see it, if at all, across my 40+ customers.

I have one customer with a FlexGroup of about 10PB spanning 8 controllers, with a NAS LIF on each node. We see indirect traffic constantly, and Nwk_Exmpt usage is often around 200-500%, but the "cluster" latency in "qos statistics volume latency show" is almost always under 200-300µs (less than half a millisecond). That's with 100GbE front-end data ports and 40GbE cluster switches, so the front end is technically capable of a faster network than the backend cluster network.

The bigger causes of latency I mostly see are disk (on systems running NL-SAS drives), Data (WAFL's own processing overhead), and Cloud (Latency to an object-store when using FabricPool).

rough mulch
#

Hi @strange mountain ,

More follow-ups about the KB, if you can please help me to understand it better:

In this KB again:
https://kb.netapp.com/on-prem/ontap/Perf/Perf-KBs/Elevated_CPU_or_high_cluster_latency_when_using_indirect_traffic_using_CIFS_or_NFS

```
Cluster::*> node run -node <node> sysstat -M 1
AVG Nwk_Exmpt
61% 994%
60% 988%
60% 974%
61% 971%

Cluster::*> node run -node <node> sysstat -x 1
CPU
99%
99%
99%
```

1. What is the difference between the AVG figures shown by "node run -node <node> sysstat -M 1" and the CPU figures from "node run -node <node> sysstat -x 1"? AVG CPU is only about 61%, but the "CPU" column shows 99%.
2. Why is a Nwk_Exmpt of 988-994% too high? How high is too high?

Thank you!

hearty quail
#

Just monitor CPU. The CPU busy algorithm includes if network exempt is maxed out.

mild summit
#

@rough mulch do you have a NetApp ATS or SE aligned to your account? Sounds like it might be worth it if these are important questions, since usually they aren’t.

safe wigeon
#

all the ..._exempt statistics are the explicitly parallelized parts of the respective domain, so network exempt takes up (in your case) around 10 CPU cores. Is it too many? That depends on the other workload and the number of cores. If your system has 12 cores, it will be a lot more (relatively-speaking) than if you have a system with 96 cores for example

#

and about the CPU column in sysstat -x, this is not necessarily a simple average, on a busy system it is the maximum usage in any domain so it is usually higher than the average

austere rapids
#

and there, sysstat -M is your friend

hearty quail
#

Honestly if you don't know how this works, just use the sysstat -x cpu busy output. It handles it for you

rough mulch
# safe wigeon all the ..._exempt statistics are the explicitly parallelized parts of the respe...

Excuse my slowness. I still don't get it.
1)
For this KB, how can you tell that NWK_Exempt took 10 CPU cores? It showed NWK_Exempt at 998%, but didn't say it took 10 cores.
2)
Sure, I understand that 10 out of 12 cores is too high and 10 out of 96 cores is not. But what percentage of all cores taken by NWK_Exempt is considered too high?
3)
So, do I just need to look at CPU util by "sysstat -x" without caring about NWK_EXEMPT by "sysstat -M"?

mild summit
safe wigeon
#

as to 1) if one core is maxed out the value is 100%, if two cores are maxed out it's 200%, so if 10 cores are maxed out it's 1000% (and you have 998% which is pretty close to 1000). Sure it could also be 20 cores at 50% or 40 cores at 25% but for total CPU usage that doesn't really matter
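The arithmetic in that answer, as a tiny Python sketch. It uses the 994% from the KB output and the 12- and 96-core examples from the earlier reply; the function name is hypothetical, and it deliberately does not encode a "too high" threshold, since the thread does not give one.

```python
from typing import Tuple


def exempt_core_equivalents(exmpt_pct: float, total_cores: int) -> Tuple[float, float]:
    """Turn a sysstat -M *_Exmpt percentage into (core-equivalents, share of
    all cores), using 100% == one core fully busy."""
    cores = exmpt_pct / 100
    return cores, cores / total_cores


for total in (12, 96):  # the two example core counts from this thread
    cores, share = exempt_core_equivalents(994, total)
    print(f"994% on {total} cores = {cores:.2f} cores busy = {share:.0%} of the node")
```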
...other than that I agree with @mild summit that you should get in contact with your SE or SAM or other technical contact at NetApp, as they can provide much better help since they can access the ASUP data etc.

hearty quail
#

It's the number of NwkThd processes minus 1.

#

I just didn't want to include that in the KB because it involves doing a ps aux | grep NwkThd to find it.

#

Even then I'm not finding it...IDK.

safe wigeon
#

I was just about to say I don't see any NwkThd threads on my system at all 🙂

#

for WAFL there are the wafl_exemptXX threads but no such thing for network

#

I mean you can see the threadpool stats for the network stack in netmpstat -p :

================
Thread Pool:                    on
  Thread Count:                 9
  Dynamic Trigger:              off

but I am not sure those correspond to the percentages displayed in sysstat

hearty quail
#

bash-5.0# ngsh -n 'priv set diag; ps' | grep -i nwk

#

That works too.

#

bash-5.0# ngsh -n 'priv set diag; ps' | grep -i nwk
Warning: These diagnostic commands are for use by NetApp
personnel only.
7 BG N 0 0% 0 0% NwkThd_00
8 BG N 7 0% 0 0% NwkThd_01
9 BG N 0 0% 0 0% NwkThd_02
835 BR 0 47673 0% 3936 10% nwk_cdp_sender
893 BR e 3773 0% 5536 15% nwk_cdp_timer
bash-5.0#

safe wigeon
#

ah. I was in the wrong shell

#

but yeah it's the same number of threads

hearty quail
#

bash-5.0# ps -axrHwww | grep NwkThd
0 - WLs 444:55.52 [kernel/NwkThd_00]
0 - WLs 445:03.47 [kernel/NwkThd_01]
0 - WLs 444:01.38 [kernel/NwkThd_02]
51404 1 S+ 0:00.00 grep NwkThd
bash-5.0#

#

There finally found the right PS options. Sheesh.

safe wigeon
#

it's the same output though, isn't it? except cumulative CPU time which doesn't say a whole lot

hearty quail
#

I just meant instead of using ngsh you can do ps -somethingHsomething and it will spit out NwkThd.

#

Otherwise it calls it "kernel" which isn't helpful.

rough mulch
#

By using your command, I got the following:
*> systemshell -node node-2 ps -axrHwww | grep NwkThd
(system node systemshell)
0 - WLs 151716:48.38 [kernel/NwkThd_00]
0 - WLs 151688:21.06 [kernel/NwkThd_01]
0 - RLs 151582:58.23 [kernel/NwkThd_02]
0 - WLs 81894:09.00 [kernel/NwkThd_03]
0 - RLs 40159:16.79 [kernel/NwkThd_04]
0 - WLs 12994:02.25 [kernel/NwkThd_05]
0 - WLs 2149:16.96 [kernel/NwkThd_06]
0 - WLs 120:00.58 [kernel/NwkThd_07]
0 - WLs 0:45.20 [kernel/NwkThd_08]
0 - WLs 0:00.00 [kernel/NwkThd_09]
0 - WLs 0:00.00 [kernel/NwkThd_10]
63995 0 Ss+ 0:00.00 csh -c ps -axrHwww | grep NwkThd
63998 0 S+ 0:00.00 grep NwkThd

The number of NwkThd threads appeared unchanged while I observed.
How is the number of threads correlated with whether NWK_Exempt is high? Or is there any other way to determine whether NWK_Exempt is maxed out?

I am just trying to understand the KB and the KB only, not necessarily related to my environment or need ASUP data.

hearty quail
#

OK, so you have 11 threads; subtract one, and that means Nwk_Exmpt can go up to 1000%.
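A minimal Python sketch of that rule, assuming you have saved the `ps -axrHwww` output from the systemshell as text. The "(threads - 1) x 100" ceiling is taken from this reply, not from NetApp documentation, and the function name is hypothetical; dividing the KB's 994% by the ceiling (assuming a node with 11 threads) shows how close that domain was to saturation.

```python
import re


def nwk_exmpt_ceiling(ps_output: str) -> int:
    """Count distinct kernel NwkThd_NN threads in `ps -axrHwww` output and
    return the Nwk_Exmpt ceiling per the rule in this thread:
    (threads - 1) * 100 percent."""
    threads = set(re.findall(r"\[kernel/(NwkThd_\d+)\]", ps_output))
    return max(len(threads) - 1, 0) * 100


# Abbreviated shape of the systemshell output posted above (times omitted).
sample = "\n".join(f"0 - WLs 0:00.00 [kernel/NwkThd_{i:02d}]" for i in range(11))
sample += "\n63998 0 S+ 0:00.00 grep NwkThd"  # the grep line itself is not counted

ceiling = nwk_exmpt_ceiling(sample)
print(ceiling)                            # 1000
print(f"{994 / ceiling:.1%} of ceiling")  # 99.4% of ceiling
```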

hearty quail
# safe wigeon I mean you can see the threadpool stats for the network stack in `netmpstat -p` ...

We just don't want that detail in the KB because we don't always want customers mucking around in the systemshell. However, the CG stats Darkstar mentioned might be a good way to do that:

pstejska@pstejska-pc:~$ vsim

Last login time: 7/4/2024 05:18:13
Unsuccessful login attempts since last login: 1
pstejska_vsim::> set d

Warning: These diagnostic commands are for use by NetApp personnel only.
Do you want to continue? {y|n}: y

pstejska_vsim::*> run local netmpstat -p
CG Common Thread Pool Stats:

Thread Pool: on
  Thread Count: 3

rough mulch
#

Great, now I know it better!

One last question, then I'm good.
Is it true that if "sysstat -x 1" shows CPU at almost 100%, then NWK_Exempt should be maxed out as well, but when NWK_Exempt is almost maxed out, CPU is not necessarily that high? In other words, how do these two figures correlate?

safe wigeon
#

not necessarily ... the CPU could be spent somewhere else than networking

#

also the CPU column is not always an average. it could be that Kahuna is at 100% (not kahuna_ex) which would then also show CPU at 100%

#

from the manual:

CPU: The greater of either the utilization of the busiest domain or the average utilization of all CPUs during the previous interval seconds.
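The manual's definition, as a short Python sketch. Expressing Nwk_Exmpt as a 0-100% domain utilization by dividing by its (threads - 1) x 100 ceiling is an assumption based on the earlier reply in this thread, not documented behavior, and the domain dictionary (including the Kahuna value) is hypothetical. With the KB's numbers it shows how sysstat -x can report 99% while the sysstat -M AVG is only about 61%, and why a single saturated domain such as Kahuna can also push the column to 100%.

```python
from typing import Dict


def sysstat_cpu_column(avg_all_cpus: float, domain_util: Dict[str, float]) -> float:
    """CPU column as described in the manual text quoted above: the greater of
    the busiest domain's utilization and the average utilization of all CPUs."""
    busiest = max(domain_util.values(), default=0.0)
    return max(avg_all_cpus, busiest)


# KB numbers: AVG about 61%; Nwk_Exmpt 994% against an assumed 1000% ceiling,
# i.e. the domain is 99.4% utilized. The kahuna value is made up.
cpu = sysstat_cpu_column(61.0, {"nwk_exmpt": 994 / 1000 * 100, "kahuna": 20.0})
print(f"{cpu:.0f}%")  # 99%
```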