#Harvest Metric Documentation


long raft
#

Hi, I'm looking into Harvest metrics and I want to configure Prometheus rules based on certain conditions.

However, I'm not sure where I can find documentation showing all the different possibilities per metric. For example:

cluster_new_status{} has a status metric, but I want to see which states are possible. E.g. healthy is one, but is there documentation listing all of these metrics and their possible values?

merry verge
#

@long raft Harvest metrics are documented, and cluster_new_status is available here:
https://netapp.github.io/harvest/24.02/ontap-metrics/#cluster_new_status

However, the documentation for labels, such as the status in cluster_new_status, is not available. This label is directly mapped to the output provided by ONTAP via the diagnosis-status-get ZAPI command, which you can find here:
https://github.com/NetApp/harvest/blob/main/conf/zapi/cdot/9.8.0/status.yaml#L7

You may want to track metric values of 0 and 1 for cluster_new_status, where 0 indicates an unhealthy status.

long raft
#

Thanks @merry verge, I found the same doc but couldn't get more precise info. Maybe the best way is to experiment in my dev cluster and create scenarios that produce different metric values.

merry verge
#

The metric values for cluster_new_status will only be 0 and 1. The status is a label within the cluster_new_status metric. For the Zapi Collector, the possible values for the status label, as per the Zapi documentation, are as follows:

Overall system health: (ok, ok-with-suppressed, degraded, unreachable)
These are determined by the diagnosis framework.

For the REST collector, the following equivalent API is called:
api/cluster?return_records=true&fields=health
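As a rough sketch, an alert on the unhealthy value could look like the rule file below. The expression relies only on what is described above (the value is 0 when the cluster is unhealthy, and status is a label). The group name, alert name, `for` duration, and severity are placeholders, and the `cluster` label in the summary is an assumption, so check which labels your series actually carry.

```yaml
groups:
  - name: harvest-cluster-health        # hypothetical group name
    rules:
      - alert: ClusterUnhealthy         # hypothetical alert name
        # Fires while the metric value is 0, i.e. the cluster is unhealthy.
        expr: cluster_new_status == 0
        for: 5m                         # placeholder; tune to your polling interval
        labels:
          severity: critical            # placeholder severity
        annotations:
          # `cluster` is an assumed label; `status` is the label discussed above.
          summary: "Cluster {{ $labels.cluster }} reports status {{ $labels.status }}"
```

Including the status label in the annotation should show which of the states listed above triggered the alert.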

long raft
#

Oh thanks, I'll look into this @merry verge !

long raft
#

@merry verge I have another question, maybe I could create a new thread but I'll put my question here if it's ok.

volume_labels
node_labels
disk_labels
svm_labels

I'm trying to configure alerts for the metrics above, but they generally equal 1 or 0 in Grafana. Is 1 healthy and 0 unhealthy? I couldn't find any of these metrics in the doc - https://netapp.github.io/harvest/23.11/ontap-metric

I did find https://netapp.github.io/harvest/23.11/plugins/#default-tracking-for-svm-node-volume but I'm just unsure if I can trust these metrics/labels.

For example, I want to know if a volume/node/disk/svm is unhealthy. One example alert rule I have created is as follows:

sum (shelf_labels{state!="online"}) by (cluster,datacenter,state,node,op_status) > 0

I'm just not sure what the 1 means in this case, as I can't see this documented.

merry verge
#

@long raft All of the _labels metrics you mentioned above only ever have the value 1. They are never 0. The state you are interested in is carried in the labels, not in the value.

long raft
#

In which case... is there a way to configure alerts for the states of volumes, nodes, disks, and SVMs via Harvest, or would alerting through the ONTAP system logs be a better option?

merry verge
#

You can. For example, if you want to detect offline volumes, you can create an alert on the query below:

volume_labels{state="offline"} == 1
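Wrapped in a Prometheus rule file, that query might look like the sketch below. Only the expression comes from this thread. The group name, alert name, `for` duration, and severity are placeholders, and the `volume` and `svm` labels used in the summary are assumptions, so adjust them to the labels present on your volume_labels series.

```yaml
groups:
  - name: harvest-volume-state          # hypothetical group name
    rules:
      - alert: VolumeOffline            # hypothetical alert name
        # volume_labels is always 1, so the state="offline" matcher does the filtering.
        expr: volume_labels{state="offline"} == 1
        for: 5m                         # placeholder; avoids alerting on brief transitions
        labels:
          severity: warning             # placeholder severity
        annotations:
          # `volume` and `svm` are assumed label names.
          summary: "Volume {{ $labels.volume }} on SVM {{ $labels.svm }} is offline"
```

The same pattern should carry over to node_labels, disk_labels, and svm_labels by swapping the metric name and the label matcher for whichever state you want to catch.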

long raft
#

Oooh, this is helpful. I missed this document, thank you again!