# Hi, does Harvest export its own metrics?
Hi @rose depot, Harvest exports some metrics of its own; most are shown in the Metadata dashboard. We don't currently report error stats, but we should. If you get a chance, please create a GitHub issue for that; otherwise I'll create one later today.
Thanks Chris! Created issue #1457
thanks!
Is there a document about the metadata metrics? Most of our metadata_target_status values are 0, but one has a value of 1.
seems we can use metadata_component_status
as the Harvest Prometheus exported data makes clear 🙄
# HELP metadata_component_status Metric for metadata_component and # HELP metadata_target_status Metric for metadata_target
🙂 i think we can make that clearer
metadata_component_status is very clear. metadata_target_status shows a value of 1, but I don't see an error in the log.
let me dig it up and see
👍
metadata_component_status{poller="flc3-prod-ams-storage",reason!="no instances"} !=0 can give us something
metadata_target_status is published by the poller and is metadata about the cluster being monitored
https://github.com/NetApp/harvest/blob/8419782ea92cff41fbe06c1b9e397cc3a3726177/cmd/poller/poller.go#LL919
If the cluster being monitored is reachable the status is set to 0, otherwise 1
cool
How about metadata_component_status? It seems 0 is normal. What do 1 and 2 mean? We see 2 in our instance.
on it
BTW these are great questions - I'm going to update our fancy new documentation https://netapp.github.io/harvest/ with this info so we can point to it next time. Any opinion on where in the docs you would expect to find this info? Maybe "Configure Harvest (advanced)" or "Reference" or somewhere else?
Troubleshoot or a new section "Monitor Harvest"?
i like it, monitor harvest makes the most sense
metadata_component_status is published by the poller and is metadata about each collector and exporter associated with the poller.
Let's say you're using the Zapi collector, with the out-of-the-box default.yaml, that means you will be monitoring ~22 different objects (as defined in default.yaml). And let's say you are exporting to Prometheus. That means we would expect Harvest to export 22 + 1 metadata_component_status metrics.
The status will be 0 if the collector runs without error; same for Prometheus.
Actually, for Prometheus, metadata_component_status = 0 means that the exporter was initialized, i.e. successfully created. It does not really track anything after that. The collector metadata_component_status metric tracks errors as I mentioned. Here is an example:
metadata_component_status{name="Zapi",poller="sar",promport="12990",reason="no instances",target="SecurityAuditDestination",type="collector",version="2.0.2"} 1
This has a value of 1 because the collector failed to collect any instances for object SecurityAuditDestination, since there are none on this cluster. When you have a non-zero value, you should also have a reason.
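For anyone reading this later: a minimal Python sketch of pulling the labels and value out of the example line above, so the `reason` can be read next to the status. The helper name `parse_sample` and the regex are my own illustration, not part of Harvest, and the parser only handles simple label sets like this one.

```python
import re

# The example line quoted in the message above.
SAMPLE = (
    'metadata_component_status{name="Zapi",poller="sar",promport="12990",'
    'reason="no instances",target="SecurityAuditDestination",'
    'type="collector",version="2.0.2"} 1'
)

# Matches key="value" pairs; allows backslash-escaped characters in the value.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one exposition line into (metric name, labels dict, value)."""
    name, rest = line.split("{", 1)
    label_text, value_text = rest.rsplit("}", 1)
    labels = dict(LABEL_RE.findall(label_text))
    return name, labels, float(value_text)


name, labels, value = parse_sample(SAMPLE)
if value != 0:
    # A non-zero status should come with a reason label.
    print(
        f'{labels["type"]} {labels["name"]}/{labels["target"]}: '
        f'status={value:g} reason={labels["reason"]}'
    )
```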
yeah. here is what we see
Here is how the collector sets the values for metadata_component_status
c.SetStatus(0, "running")
c.SetStatus(1, errs.ErrConnection.Error())
c.SetStatus(1, errs.ErrNoInstance.Error())
c.SetStatus(1, errs.ErrNoMetric.Error())
c.SetStatus(2, errMsg)
c.SetStatus(0, "running")
c.SetStatus(0, "running")
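Read as a lookup table, those calls suggest roughly the following. This is a Python sketch for illustration only; the wording of each description is my own summary of this thread, not text from Harvest, and `describe_status` is a hypothetical helper.

```python
# Descriptions are a summary of this thread, not official Harvest wording.
STATUS_MEANING = {
    0: "ok: the collector is running without error",
    1: "known condition: connection error, no instances, or no metrics",
    2: "failed: unexpected error, see the poller log",
}


def describe_status(value, reason=""):
    """Return a short description for a metadata_component_status value."""
    meaning = STATUS_MEANING.get(int(value), "unknown status")
    return f"{meaning} ({reason})" if reason else meaning


print(describe_status(1, "no instances"))
print(describe_status(2, "context deadline exceeded"))
```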
Right, so you have some 2s, which come from https://github.com/NetApp/harvest/blob/8419782ea92cff41fbe06c1b9e397cc3a3726177/cmd/poller/collector/collector.go#L369 and mean the collector got an error it was not expecting, so it entered the failed state and terminated. You should see more info in your poller log files for that cluster. "API request rejected" usually means your cluster does not have those objects.
context deadline exceeded means there was a timeout collecting that resource
Fcpport: maybe your cluster has no Fibre Channel ports?
possible. so we should filter with reason!="no instances",reason!="API request rejected"
I think that should work (with the caveat that I'm correct about "API request rejected"). Harvest is trying to figure out whether errors it gets from ONTAP are errors you care about or not. That's why, when there are "no instances", the status is set to 1 and logged at INFO instead of WARN or ERR, because maybe it's perfectly fine that you have no instances of some object. We opted to tell you but not shout it 🙂 Seems to me that "API request rejected" is a similar situation. If memory serves, that's ONTAP saying you sent it a ZAPI that does not exist. You could make the case that's fine sometimes and not others.
Can you drop reason altogether and only query for != 0?
ah, but you want to remove those missing resources that you don't care about. I think you mean this?
metadata_component_status{reason!="no instances",reason!="API request rejected"} != 0
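To spell out what that query selects, here is a small Python sketch that applies the same rule to samples already held in memory. The sample list is made up for illustration (the targets `ObjectA` to `ObjectD` are placeholders, not real Harvest objects), and `needs_attention` is a hypothetical helper.

```python
# Reasons we decided are usually harmless in this thread.
IGNORED_REASONS = {"no instances", "API request rejected"}


def needs_attention(samples, ignored=IGNORED_REASONS):
    """Keep samples with a non-zero value whose reason is not ignored.

    A sample without a reason label is treated as reason="", which is how
    a negative label matcher such as reason!="..." behaves in PromQL.
    """
    return [
        s for s in samples
        if s["value"] != 0 and s["labels"].get("reason", "") not in ignored
    ]


# Made-up samples, for illustration only.
samples = [
    {"labels": {"target": "ObjectA", "reason": "running"}, "value": 0},
    {"labels": {"target": "ObjectB", "reason": "no instances"}, "value": 1},
    {"labels": {"target": "ObjectC", "reason": "API request rejected"}, "value": 2},
    {"labels": {"target": "ObjectD", "reason": "context deadline exceeded"}, "value": 2},
]

for s in needs_attention(samples):
    print(s["labels"]["target"], s["value"], s["labels"]["reason"])
```

Only the timeout sample survives the filter here, which is the kind of result you would want to alert on.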
yes
yep, that looks good
Thanks a lot!