Using nabox (4.0.7) there is a lot of information in the various dashboards, which is just great! But, sometimes I get a bit lost 🙂 E.g. I'm pretty sure I once saw a panel with QoS metrics/info. ... But I can't find it now.
Similarly I was hoping that I might find a Dashboard/Panel/Graph showing ONTAP CP (Consistency Point) information, but no luck so far. Is there such a thing in nabox?
#Looking for specific metrics ...
@solid lotus You can find CP in the Disk dashboard, as below.
For QoS metrics, the Workload dashboard should have them.
Found the CPs - Thanks! The graph / panel is labelled "CP Counts" but the y-axis is labelled "write latency" ... Do you know what it means? I guess each peak shows the time it took to do an individual CP?
Aren't there also different types of CPs in WAFL / ONTAP?
Found the QoS too, thank you. Looks as if it is turned off by default. So we'll have to think about if we should enable it or not ...
QoS metrics will be enabled by default starting with the upcoming Harvest version.
We'll improve the description there to make it clear.
Humm, actually what I see here is both axes labelled "write latency" and no sign of any counts or a y-axis on the right hand side
Underneath the graph is just a list of write latencies per aggr, but no other info.
If you click on Back-to-back CP count that should show CP
hmm.. What is the Harvest version?
Could you share a screenshot of how this panel looks?
I can probably get a screenshot. I have to tx it out of ... where it is. Inspect JSON shows me references to cp counts, so it seems that there should be more info. ... Maybe the graph is just being rendered incorrectly somehow
4.0.7
You can try reimporting it from here in grafana and check if it fixes the issue
https://raw.githubusercontent.com/NetApp/harvest/refs/heads/main/grafana/dashboards/cmode/disk.json
I meant to ask for the Harvest version, as shown here in NABox
How would I do that? (sorry - I'm (not yet) a Grafana professional :-/)
Aha. Harvest version is 24.11.1
It was imported via the nabox upgrade page to replace the version included with 4.0.7
Reimport from github won't work in any case, the site has no direct access to internet
Humm. Found the import function, but it doesn't seem to want to overwrite the existing definition.
Maybe it would be easier to just upgrade the whole thing to 4.0.8?
Okay
Maybe upgrading to the Harvest nightly build is better.
https://github.com/NetApp/harvest/releases/tag/nightly
NABox 4.0.8 will not help
I see this (sorry about the hover-popup)
OK could you share your Grafana version?
I see the issue. I'll share fix.
As a workaround for now, could you change the panel query to the one below, if NABox allows that?
sum(wafl_cp_count{datacenter=~"$Datacenter",cluster=~"$Cluster",node=~"$Node",metric=~"back_to_back_CP|deferred_back_to_back_CP|back_to_back_cp|deferred_back_to_back_cp"})
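A side note for anyone reading along: the `metric=~"..."` matcher in that query accepts both the upper-case and lower-case spellings of the two CP types. Below is a rough Python sketch of which label values it would select. Prometheus anchors `=~` regexes at both ends, so `re.fullmatch` is used here; the candidate values `timer` and `back_to_back_CP_extra` are made up for illustration, and Python's `re` only approximates Prometheus's RE2 syntax.

```python
import re

# Regex copied from the metric=~"..." matcher in the query above.
PATTERN = (
    "back_to_back_CP|deferred_back_to_back_CP"
    "|back_to_back_cp|deferred_back_to_back_cp"
)

def selects(metric_label: str) -> bool:
    # Prometheus fully anchors =~ regexes, so the whole label must match.
    return re.fullmatch(PATTERN, metric_label) is not None

# Hypothetical label values, for illustration only.
candidates = [
    "back_to_back_CP",
    "deferred_back_to_back_cp",
    "timer",
    "back_to_back_CP_extra",
]

selected = [m for m in candidates if selects(m)]
print(selected)
```

Only the two exact spellings survive; a value that merely starts with `back_to_back_CP` is not selected because of the anchoring.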
I'll try the mod. Some things don't seem to be modifiable / deletable (if that's a word); I get an error about their being "provisioned" (or similar)
maybe a Save As will work instead of a Save
I have started a nightly build for you to upgrade.
@short field for dashboard modification in NABox
@short field has a change, not yet published, that will let you edit the out-of-the-box dashboards too
I should probably release that before the Top Client dashboard checkbox
Well Hey ... that worked (mostly). I changed the query, hit Apply and the CP panel is updated. Now I have a RHS y-axis labelled "Back to Back CP Count" 👍🏻
Nice!
Two things occur to me ... the first row of the table, below the graph, is showing the query e.g. "sum(wafl_cp_ ..." and the next rows show "Write Latency" values .... looks strange
This panel is mostly trying to show write latency and CP count together, for issues like the ones below
https://kb.netapp.com/on-prem/ontap/Perf/Perf-KBs/Write_Performance_Impacted_by_Back_to_Back_Consistency_Points
https://kb.netapp.com/on-prem/ontap/Perf/Perf-KBs/What_are_the_different_Consistency_Point_types_and_how_are_they_measured
Second thought ... Is it true? I thought "back to back CP" events should be rare and v. not good. This seems to be showing me them happening all the time?
@red cradle can help on the domain side of this. See if the KB articles help.
#1062049169520476220 channel can also help.
Good idea, I should know what I'm talking about, before talking. I'll read the KBs
In edit I see "Legend: Auto" ... maybe that's why I'm seeing the "raw" query displayed in the table
It's like this for me. Is it any different for you?
Fix is available in https://github.com/NetApp/harvest/releases/tag/nightly
You can upgrade to this if needed.
Dude, it's Friday! Never change a running system (on a Friday afternoon)🔥
Seriously though ... I could consider nightly ... But in general how stable is it? We have (we will have) users who will want to see reliable data ...
it's stable, runs through all CI, regression, and unit tests, and it's what we run leading up to our next release
Also .. I still have to get back to getting my custom Grindstaff ISL interface counters working 😊
For me, after changing the "A" metric query and Applying, then Option -> Legend was set to "Auto" which resulted (AFAICT) in the whole query being used as the "Name" in the table under the graph
Okay. You can set Legend to Back-to-back CP Count for A
Yeah ... but ... really shouldn't it say something like "back2back" + a variable i.e. the node name I think would be correct? (I think of CP ops as a Node / ONTAP instance level concept)
I changed it to be "BOBs {{$Node}}" ... but that results in the Name in the table becoming the literal string "BOBs {right-axis}"
Too many braces {} ?
for legends it is typically {{name}} example
but I think the Back-to-back CP Count legend is acting different, likely because it has right placement
for example
ah, i see the problem. let me try something
I'm glad you guys understand this 🤣
FYI: In your last screenshot: "Foo (right y-axis)" under "Name" is also what I have achieved 🤪
Give this a try
query: sum by (datacenter, cluster, node) (wafl_cp_count{datacenter=~"$Datacenter",cluster=~"$Cluster",node=~"$Node",metric=~"back_to_back_CP|deferred_back_to_back_CP|back_to_back_cp|deferred_back_to_back_cp"})
legend: Back-to-back CP Count {{node}}
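For context on why the `by (...)` clause matters for the legend: a plain `sum(...)` drops every label, so a `{{node}}` placeholder in the legend has nothing to substitute, while `sum by (datacenter, cluster, node)` keeps one series per node. Here is a small Python sketch of that aggregation. The sample series, label values, and counts are invented for illustration, and `sum_by` is a hypothetical helper, not anything from Harvest or Grafana.

```python
from collections import defaultdict

# Hypothetical samples; label values and counts are invented.
samples = [
    ({"datacenter": "dc1", "cluster": "c1", "node": "n1",
      "metric": "back_to_back_CP"}, 3.0),
    ({"datacenter": "dc1", "cluster": "c1", "node": "n1",
      "metric": "deferred_back_to_back_CP"}, 1.0),
    ({"datacenter": "dc1", "cluster": "c1", "node": "n2",
      "metric": "back_to_back_CP"}, 5.0),
]

def sum_by(samples, keep):
    """Mimic PromQL `sum by (keep...)`: add values that share the kept labels."""
    totals = defaultdict(float)
    for labels, value in samples:
        key = tuple((name, labels[name]) for name in keep)
        totals[key] += value
    return dict(totals)

# Like sum(...): a single series with no labels left for the legend.
plain = sum_by(samples, [])

# Like sum by (datacenter, cluster, node)(...): one series per node.
per_node = sum_by(samples, ["datacenter", "cluster", "node"])

for key, total in per_node.items():
    print("Back-to-back CP Count", dict(key)["node"], total)
```

With the grouped form, each resulting series still carries a `node` label, which is what the `Back-to-back CP Count {{node}}` legend needs.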
OK. I'll tx. that query over and try it. It'll take me a couple of minutes
CP info and RHS axis have disappeared from the graph. Table underneath has only "write latency" entries
Let's forget it for now.
Here's what I see with the changes pasted above
That's nice 🙂 Let me try it again - probably screwed something up here. Meanwhile, can you read email sent to ng-harvest-files?
yes
I sent you a screenshot of a different design
thanks. At the moment, Harvest does not have the CP durations, which is what the PAS CP dashboard is showing. Let me check and see if the ZAPI includes that information (I know Rest does not)
We collect cp_phase_times, which is the percentage of time spent in the different phases of a CP. It looks like there is also a counter, total_cp_msecs, for the milliseconds spent in CP, but that isn't broken down by CP phase.
so far, I don't see anything in the wafl perf object that includes the durations. PAS must be collecting that info some other way
That's too deep for me :-). I could poke around inside a perf autosupport archive and see if I can identify anything, but it would be an empirical approach
that's OK, I'll ask around and see what I can find out
BTW: Rahul's query line works. Just saying 😎
yes, we reviewed before he sent. Try this
sum by (datacenter, cluster, node) (wafl_cp_count{datacenter=~"$Datacenter",cluster=~"$Cluster",node=~"$Node",metric=~"back_to_back_CP|deferred_back_to_back_CP|back_to_back_cp|deferred_back_to_back_cp"})
On it. ( I was just trolling you with that last one ;-))
So I cloned the complete Dashboard and applied your changes (query + legend). There it works. Go figure 🤷🏻♂️
Sent you a screenshot via email
thanks! I'll check on the scatter plot, doubt that Harvest has the data to show that. Either way, sounds like you would prefer we change this dashboard to use the latest query. Is that correct?
For sure. Before, here, I wasn't seeing any CP info in that graph. So this is definitely better - IMHO. Still have that slight doubt that these events are really all b2b CPs ... But I need to do my KB homework