#workload dashboard
Hi @idle ember, are these the templates you uncommented in conf/zapiperf/default.yaml? If so, can you share your log files? https://netapp.github.io/harvest/23.11/help/log-collection/
Here are the lines that were commented out
```
Workload: workload.yaml
WorkloadDetail: workload_detail.yaml
WorkloadVolume: workload_volume.yaml
WorkloadDetailVolume: workload_detail_volume.yaml
```
Thanks! Your logs show the metrics are being collected without errors. Any chance you didn't import the latest dashboards? Some of the metric names for workloads changed in 23.08, so if you are using 23.11 pollers with earlier dashboards, Grafana will send the wrong queries to Prometheus.
I did import the 23.11 dashboard with overwrite
let me delete the workload dashboard and re-import
delete/re-import did not fix the problem
Do the variable dropdowns at the top of the dashboard have values?
those are sad screenshots 🙂
Let's check Prometheus. Can you try a query like this, changing the cluster to one you have enabled workloads for: `qos_read_ops{cluster="umeng-aff300-01-02"}`
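For anyone who wants to run the same check outside the Prometheus UI, here is a minimal Python sketch against the Prometheus HTTP API (`/api/v1/query`). The base URL `http://localhost:9090` is an assumption, and the helper names `build_query_url`, `count_series` and `run_query` are made up for illustration; they are not part of Harvest.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Prometheus is reachable here; change this for your environment.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(base_url: str, promql: str) -> str:
    """Return the instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def count_series(response_body: str) -> int:
    """Count the series in the JSON body of an instant-query response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    return len(payload["data"]["result"])


def run_query(promql: str, base_url: str = PROMETHEUS_URL) -> int:
    """Send the query to Prometheus and return how many series came back."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return count_series(resp.read().decode("utf-8"))
```

Calling `run_query('qos_read_ops{cluster="umeng-aff300-01-02"}')` with your own cluster name should return a non-zero count if workload metrics are reaching Prometheus.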
Good good. Now back to Grafana: hover over the first panel in the Workload dashboard and press e to edit that panel, then open Query inspector and press Refresh. Do the query and the result look right?
what if you change the Workload variable to All?
let's try changing WorkloadClass to All also
got 500 error after changing workloadclass to all
sorry my bad. ignore above
workload=~"()"
and workload comes from this var and uses workloadClass
query only returned ALL
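A side note on why `workload=~"()"` returns nothing: Prometheus regex matchers are fully anchored, so an empty group only matches an empty label value. The sketch below imitates that with Python's `re.fullmatch`. Python's `re` is only a stand-in for the RE2 engine Prometheus uses, and the workload names are hypothetical.

```python
import re


def promql_regex_matches(pattern: str, label_value: str) -> bool:
    """Imitate PromQL's =~ matcher, which must match the whole label value."""
    return re.fullmatch(pattern, label_value) is not None


# Hypothetical workload names, for illustration only.
workloads = ["vol1-wid1234", "lun7-wid42"]

# An empty group matches none of them, so the panel query comes back empty.
matched = [w for w in workloads if promql_regex_matches("()", w)]
print(matched)
```

Once the variable has real values, Grafana renders an alternation such as `(vol1-wid1234|lun7-wid42)` and the same matcher selects those series.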
I strongly suspect this is a Grafana breaking change. I'm running 8.1.8 and you are running 10.2.1. The Harvest team has not validated that our dashboards work with 10.X. I've got to jump into a meeting shortly, but will take a look afterwards
👍
we've validated up to Grafana 8.3.4 https://github.com/NetApp/harvest/blob/355360b5b5e0f7ee124571080ea940061fb5e7a2/prom-stack.tmpl#L38
from general reading, I'm under the impression that other teams have hit Grafana breaking changes too
ok.
Good news, that dashboard works for me in 10.2.1. The empty workload query that you pasted above is the result of the TopQOSreadOps variable. What do you see for that variable in Preview of values?
It only has ALL
@idle ember Can you share output of qos_workload_labels metric from prometheus for this cluster?
If this is empty then please share logs consisting of lines Zapi:QosWorkload
Zapi:QosWorkload was missing in default.yml. After adding it, most panels have data now. Only QOS FIXED UTILIZED % still no data. qos_ops and qos_total_ops are being collected.
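A quick way to catch a missing or commented-out object like this is to scan the collector's default.yaml for the entries the dashboard depends on. This is a rough Python sketch that assumes every object is listed on its own line in the `Name: template.yaml` form shown earlier in the thread. It is a plain text scan, not a YAML parser, and the function names are made up for illustration.

```python
import re

# Matches "  Workload: workload.yaml" as well as "#  Workload: workload.yaml".
OBJECT_LINE = re.compile(
    r"^\s*(?P<comment>#)?\s*(?P<name>\w+):\s*(?P<template>\S+\.yaml)\s*$"
)


def enabled_objects(default_yaml_text: str) -> dict[str, bool]:
    """Map each object name to True if any of its lines is active (not commented out)."""
    status: dict[str, bool] = {}
    for line in default_yaml_text.splitlines():
        match = OBJECT_LINE.match(line)
        if match:
            active = match.group("comment") is None
            name = match.group("name")
            status[name] = status.get(name, False) or active
    return status


def missing_objects(default_yaml_text: str, required: list[str]) -> list[str]:
    """Return the required objects that are absent or commented out."""
    status = enabled_objects(default_yaml_text)
    return [name for name in required if not status.get(name, False)]
```

For example, `missing_objects(open("conf/zapi/default.yaml").read(), ["QosWorkload", "QosPolicyFixed"])` would list whichever of the two is not active. The path is an assumption about where your Harvest install keeps the file.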
does your zapi/default.yaml have QosPolicyFixed: qos_policy_fixed.yaml?
Those look good. Is it possible the data hadn't updated when you checked the dashboard? If you reload the dashboard, is that panel still blank? I confirmed that it works fine on Grafana 10.2.1.
reloaded and it is still blank😕
phooey - can you check your TopFixedQOSIOPsPercent variable - do you have anything in preview there?
Only has ALL
Is it possible that you haven't created any fixed QoS policy groups on the cluster in question?
i don't think so. How do I check? BTW, the fixed qos panels are blank for all our clusters
try qos policy-group show from the ONTAP CLI
```
Name              Vserver                 Class        Wklds Throughput            Is Shared
----------------- ----------------------- ------------ ----- --------------------- ---------
extreme-fixed     flc1-noprod-ash-storage user-defined 0     0-50000IOPS,1.53GB/s  false
performance-fixed flc1-noprod-ash-storage user-defined 0     0-30000IOPS,937.5MB/s false
value-fixed       flc1-noprod-ash-storage user-defined 0     0-15000IOPS,468.8MB/s false
3 entries were displayed.
```
great that explains why you aren't seeing any fixed in the panel
oh but why we get fixed metrics?
You're right. I should have said that those are fixed policy groups, but they have not been applied to any workloads. The Wklds column is zero, which also matches your screenshot of the Prometheus metrics that have object_count="0". The panel in question shows the top qos_ops for workloads that have a fixed policy group applied to them, and in your case there are no workloads with a fixed policy group applied.
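If you have many clusters to check, the Wklds column can be read programmatically. This is a minimal Python sketch that assumes the `qos policy-group show` output has been saved as text and that each policy group sits on a single line with six whitespace-separated fields, as in the paste above. Output that wraps long names onto extra lines would need more handling. The function name is made up for illustration.

```python
def unused_policy_groups(cli_output: str) -> list[str]:
    """Return the names of policy groups whose Wklds column is 0."""
    unused = []
    for line in cli_output.splitlines():
        fields = line.split()
        # Data rows: Name, Vserver, Class, Wklds, Throughput, Is Shared.
        # Header, separator and footer lines fail this check and are skipped.
        if len(fields) == 6 and fields[3].isdigit():
            if int(fields[3]) == 0:
                unused.append(fields[0])
    return unused
```

A policy group that shows up in this list exists on the cluster but is not attached to any workload, so it cannot contribute rows to the fixed QoS panels.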
So you are saying that we are not using or enforcing QOS fixed or adaptive at all?
My read of your CLI output is that, since the number of workloads (Wklds column) is zero for all policies, no workloads have a policy applied. @pine flicker is that a correct read?
```
                                            Expected    Peak         Minimum  Block
Name         Vserver                 Wklds  IOPS        IOPS         IOPS     Size
------------ ----------------------- ------ ----------- ------------ -------- -----
extreme      flc1-noprod-ash-storage 0      6144IOPS/TB 12288IOPS/TB 1000IOPS ANY
performance  flc1-noprod-ash-storage 0      2048IOPS/TB 4096IOPS/TB  500IOPS  ANY
value        flc1-noprod-ash-storage 0      128IOPS/TB  512IOPS/TB   75IOPS   ANY
3 entries were displayed.
```
0 workload too
I enabled qos fixed and adaptive on another cluster but the panels still show no data.
```
Name              Vserver              Class        Wklds Throughput            Is Shared
----------------- -------------------- ------------ ----- --------------------- ---------
extreme-fixed     flc1-poc-ash-storage user-defined 36    0-50000IOPS,1.53GB/s  false
performance-fixed flc1-poc-ash-storage user-defined 1     0-30000IOPS,937.5MB/s false
value-fixed       flc1-poc-ash-storage user-defined 1     0-15000IOPS,468.8MB/s false
3 entries were displayed.

flc1-poc-ash-storage::qos> qos adaptive show
                                         Expected    Peak         Minimum  Block
Name         Vserver              Wklds  IOPS        IOPS         IOPS     Size
------------ -------------------- ------ ----------- ------------ -------- -----
extreme      flc1-poc-ash-storage 1      6144IOPS/TB 12288IOPS/TB 1000IOPS ANY
performance  flc1-poc-ash-storage 0      2048IOPS/TB 4096IOPS/TB  500IOPS  ANY
value        flc1-poc-ash-storage 0      128IOPS/TB  512IOPS/TB   75IOPS   ANY
3 entries were displayed.
```
Logs
```
2023-12-04T16:15:43Z INF collector/collector.go:510 > Collected Poller=flc1-poc-ash-storage apiMs=94 calcMs=0 collector=Zapi:QosPolicyFixed instances=12 metrics=96 parseMs=1 pluginMs=0
2023-12-04T16:15:43Z INF collector/collector.go:510 > Collected Poller=flc1-poc-ash-storage apiMs=118 calcMs=0 collector=Zapi:QosPolicyAdaptive instances=3 metrics=27 parseMs=1 pluginMs=0
2023-12-04T16:15:43Z INF collector/collector.go:510 > Collected Poller=flc1-poc-ash-storage apiMs=191 calcMs=0 collector=Zapi:QosWorkload instances=110 metrics=330 parseMs=6 pluginMs=0
```
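To confirm at a glance that each QoS collector is returning instances, the `Collected` lines can be summarised with a few lines of Python. This sketch assumes the `key=value` layout seen in the lines above. The function name is made up for illustration and is not a Harvest tool.

```python
import re

# Pulls key=value pairs such as "collector=Zapi:QosWorkload" or "instances=110".
KEY_VALUE = re.compile(r"(\w+)=(\S+)")


def collected_summary(log_text: str) -> dict[str, dict[str, int]]:
    """Map each collector to the instances and metrics counts of its latest 'Collected' line."""
    summary: dict[str, dict[str, int]] = {}
    for line in log_text.splitlines():
        if "> Collected" not in line:
            continue
        fields = dict(KEY_VALUE.findall(line))
        if "collector" in fields:
            summary[fields["collector"]] = {
                "instances": int(fields.get("instances", 0)),
                "metrics": int(fields.get("metrics", 0)),
            }
    return summary
```

A collector that is absent from the result, or that reports `instances` of 0, is worth checking before looking at the dashboard itself.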
@idle ember There is an issue with the QoS fixed panels: they do not display workloads that have an admin SVM QoS policy applied. I have opened an issue for this, https://github.com/NetApp/harvest/issues/2530, with a fix in PR https://github.com/NetApp/harvest/pull/2532
Can you try importing the dashboard from https://github.com/NetApp/harvest/blob/a55cb91327f19634e094d981669bb577e51d8b6c/grafana/dashboards/cmode/workload.json and see if this fixes the issue?
Thanks for reporting!
@wooden galleon yes the updated workload.json fixed the issue. Thanks!
great! thanks for the confirmation