#Struggling with QOS stats in grafana


hollow trail
#

Hi, I am running NAbox 3.5b and have custom.yaml files in both my zapiperf and restperf locations. We are running ONTAP 9.13 and I expected collection to use RestPerf. I do see some metrics collected under volume workload via ZapiPerf but none via REST. It looks like my Harvest was first deployed without RestPerf collectors in harvest.yml, so I think this is the first issue. Could someone tell me how best to add the RestPerf collectors? Can I just add them to harvest.yml and re-run docker-compose?

#

Second issue: with the ZapiPerf collector, QoS metrics do not appear for any query run in Grafana, such as topk($TopResources, qos_detail_volume_resource_latency{datacenter=~"$Datacenter",cluster=~"$Cluster",svm=~"$SVM",volume=~"$TopVolumeQOSThrottle",resource="throttle"})

#

Yet they do appear in Prometheus under the metric "qos_detail_volume_resource_latency"

neat kernel
#

@hollow trail What is your Harvest version?

hollow trail
#

23-11-01

neat kernel
#

Did you make any manual changes to harvest.yml?

hollow trail
#

no

#

Someone may have before me, but those changes would get wiped on a reboot, wouldn't they?

neat kernel
#

Can you share the contents of your custom.yaml?

hollow trail
#

Objects:
  Volume: custom_volume_blacklist.yaml
  Workload: workload.yaml
  WorkloadDetail: workload_detail.yaml
  WorkloadVolume: workload_volume.yaml
  WorkloadDetailVolume: workload_detail_volume.yaml

neat kernel
#

What is the output of the command below?

dc exec -w /conf nabox-harvest2 /netapp-harvest/bin/harvest doctor --print

hollow trail
#

```
    collectors:
        - Zapi
        - ZapiPerf
        - Rest
    exporters:
        - nabox-prometheus
Exporters:
    nabox-prometheus:
        addr: -REDACTED-
        exporter: Prometheus
        master: true
Pollers:
    _unix:
        addr: -REDACTED-
        autostart: '1'
        collectors:
            - Unix
        datacenter: NAbox
        prometheus_port: 12800
    hsnetappdevclstr:
        addr: -REDACTED-
        autostart: '1'
        collectors:
            - Zapi
            - ZapiPerf
            - Rest
            - Ems
        datacenter: HS
        password: -REDACTED-
        prometheus_port: 12991
        use_insecure_tls: true
        username: -REDACTED-
    hsnetappprodclstr:
        addr: -REDACTED-
        autostart: '1'
        collectors:
            - Zapi
            - ZapiPerf
            - Rest
            - Ems
        datacenter: HS
        password: -REDACTED-
        prometheus_port: 12990
        use_insecure_tls: true
        username: -REDACTED-
    khnetappprodclstr:
        addr: -REDACTED-
        autostart: '1'
        collectors:
            - Zapi
            - ZapiPerf
            - Rest
            - Ems
        datacenter: KH
        password: -REDACTED-
        prometheus_port: 12992
        use_insecure_tls: true
        username: -REDACTED-
    npnetappprodclstr:
        addr: -REDACTED-
        autostart: '1'
        collectors:
            - Zapi
            - ZapiPerf
            - Rest
            - Ems
        datacenter: NP
        password: -REDACTED-
        prometheus_port: 12993
        use_insecure_tls: true
        username: -REDACTED-
Tools:
    grafana_api_token: -REDACTED-
```

neat kernel
#

looks good

#

What is the output of the query qos_detail_volume_resource_latency{resource="throttle"}?

hollow trail
#

Empty, but if I remove the resource="throttle" matcher I get metrics
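
A quick way to see why the throttle matcher comes back empty is to list which resource values the series actually carry. A minimal Python sketch, assuming the standard JSON shape of a Prometheus /api/v1/query instant-query response; label_values is a hypothetical helper and the sample data is made up, not output from this cluster:

```python
from __future__ import annotations


def label_values(response: dict, label: str) -> list[str]:
    """Return the sorted distinct values of `label` across all series
    in a Prometheus instant-query response."""
    series = response.get("data", {}).get("result", [])
    return sorted(
        {s["metric"][label] for s in series if label in s.get("metric", {})}
    )


# Made-up sample shaped like a /api/v1/query response. The label values are
# placeholders chosen for illustration only.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {
                    "__name__": "qos_detail_volume_resource_latency",
                    "resource": "resource_a",
                },
                "value": [1700000000, "12"],
            },
            {
                "metric": {
                    "__name__": "qos_detail_volume_resource_latency",
                    "resource": "resource_b",
                },
                "value": [1700000000, "7"],
            },
        ],
    },
}

# If "throttle" is not in this list, a resource="throttle" matcher returns nothing.
print(label_values(sample, "resource"))
```

The same check can be done directly in Prometheus with count by (resource) (qos_detail_volume_resource_latency).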

neat kernel
#

Did you reset dashboards to sync with 23.11 release? Were you getting data earlier and it is missing recently?

neat kernel
#

How about other panels in that dashboard?

hollow trail
#

No panels work in the workload or QoS dashboards, but single metrics work in Prometheus

#

It looks like it briefly worked while default.yaml was manually edited, but obviously that doesn't survive reboots

hollow trail
#

Any metric starting with qos works when queried on its own, without label matchers

neat kernel
#

Okay I understand the problem now

#

The qos_detail_volume_resource_latency metric is no longer available in Harvest. All of these metrics are now under qos_detail_resource_latency.
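
For custom panels or saved queries that still reference the old name, the rename can be applied mechanically. A minimal Python sketch, assuming the label selectors carry over unchanged to the new metric (not confirmed in this thread); rename_metric is a hypothetical helper, not part of Harvest:

```python
import re

OLD_METRIC = "qos_detail_volume_resource_latency"
NEW_METRIC = "qos_detail_resource_latency"


def rename_metric(expr: str, old: str = OLD_METRIC, new: str = NEW_METRIC) -> str:
    """Replace whole-name occurrences of `old` in a PromQL expression with `new`."""
    # Metric names can contain letters, digits, underscores, and colons, so
    # guard both sides to avoid touching longer names that merely contain `old`.
    pattern = r"(?<![A-Za-z0-9_:])" + re.escape(old) + r"(?![A-Za-z0-9_:])"
    # A function replacement keeps `new` literal (no backslash interpretation).
    return re.sub(pattern, lambda _match: new, expr)


# The dashboard query quoted earlier in this thread.
query = (
    "topk($TopResources, qos_detail_volume_resource_latency{"
    'datacenter=~"$Datacenter",cluster=~"$Cluster",svm=~"$SVM",'
    'volume=~"$TopVolumeQOSThrottle",resource="throttle"})'
)
print(rename_metric(query))
```

Resetting the stock dashboards is still the simpler route; a rewrite like this only matters for queries kept outside them.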

#

Can you reset your dashboards and see if that helps?

hollow trail
#

That's fixed it

neat kernel
#

Cool.

hollow trail
#

that was nice and easy

neat kernel
#

Also, Harvest 24.02 is out. It has a few fixes for the workload dashboard.
