Hey everyone, I've had a very long-running ticket open with NetApp support to work out how to make sure our systems are appropriately utilized without losing capability during a failover event (planned or otherwise).
It took a very long time, but they eventually came across this article and passed it along to us: https://kb.netapp.com/on-prem/ontap/Perf/Perf-KBs/CPU_utilization_above_50_percent_before_upgrading_ONTAP
Step 1 says to look at ActiveIQ, but that seems to show exactly the same CPU output as NABox. Step 2 references a deprecated AIQUM capability (the node failover planning guide). Step 3 is where the magic is!
Step 3 is to confirm how much of the workload is background IO. Background IO yields to foreground operations, so it would be halted in a CPU-bottlenecked situation such as a failover.
https://kb.netapp.com/on-prem/ontap/Perf/Perf-KBs/How_to_check_for_background_CPU_utilization_in_ONTAP_9
The basis of this seems to be using the qos statistics workload resource cpu show -node <Node> command to provide a breakdown of the CPU usage.
The request: would it be possible to get a CPU graph (or even just a line item that can be looked at) for the foreground processes only? That would help us keep the essential CPU usage below 50%, so we're not bound up in the event of a failover.
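To make the ask a bit more concrete, here is a rough Python sketch of the calculation I have in mind. It assumes the per-workload CPU percentages have already been collected somehow (for example from the qos statistics output mentioned above) and that we know which workload names count as background. The workload names, the sample numbers, and the function names are all made up for illustration; none of them come from ONTAP or NABox.

```python
from typing import Dict, Iterable

# Threshold from the request: keep foreground CPU under 50% per node.
FAILOVER_CPU_LIMIT = 50.0


def foreground_cpu(cpu_by_workload: Dict[str, float],
                   background_names: Iterable[str]) -> float:
    """Sum the CPU percentage of every workload not listed as background."""
    background = set(background_names)
    return sum(pct for name, pct in cpu_by_workload.items()
               if name not in background)


def failover_headroom_ok(cpu_by_workload: Dict[str, float],
                         background_names: Iterable[str],
                         limit: float = FAILOVER_CPU_LIMIT) -> bool:
    """True when foreground CPU alone stays below the limit."""
    return foreground_cpu(cpu_by_workload, background_names) < limit


# Hypothetical sample data: workload name -> CPU percent on one node.
sample = {
    "user-workload-1": 22.0,
    "user-workload-2": 15.5,
    "background-task-1": 30.0,
}
# Hypothetical list of workloads treated as background IO.
background = {"background-task-1"}

fg = foreground_cpu(sample, background)
print(f"total CPU: {sum(sample.values()):.1f}%, foreground only: {fg:.1f}%")
print("failover headroom OK:", failover_headroom_ok(sample, background))
```

In this made-up example the node looks busy in total (67.5%), but the foreground share is only 37.5%, which is the number that matters for failover planning.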
I'm quite happy to have conversations or provide info about what I mean if my rambling wasn't clear, but I feel like this is something that is very much missing from standard NetApp monitoring, so maybe it's yet another place for NABox to shine?
Thanks again for all your work!