#metrics missing for a cluster for some weeks


visual yew
#

Hi Community

I'm facing an issue.
I'm running Harvest (24.05) and VictoriaMetrics inside Podman.
I have a few dozen clusters that have been monitored for more than 2 years.
I noticed that for one cluster I have not been getting some metrics for a few weeks.
I have no errors on the Harvest poller, no errors on vmagent, and no errors on VictoriaMetrics.

All the counters that stopped working stopped at the same time.
I looked in VictoriaMetrics itself and there really is no data for the missing counters (so it does not look like a dashboard/Grafana side effect).

So I'm stuck and I don't know where to dig deeper...

Any ideas?
Thanks
Flo

tough dagger
#

hi @visual yew which counter? If you curl the poller directly do you see that counter? If not, the log files should tell us why

visual yew
#

Hi Chris, for example node_volume_avg_latency

#

Which logs are you talking about?

#

So far I'm blind, and I can only imagine that something is wrong in the VictoriaMetrics database

tough dagger
#

that comes from the volume.yaml template https://netapp.github.io/harvest/nightly/ontap-metrics/#node_volume_avg_latency
log files for the poller that is failing to export the counter you expect
https://netapp.github.io/harvest/nightly/help/log-collection/

If you curl that poller's promPort, like the following, do you see the metric?
Replace 12990 with your promPort

curl -s 'http://localhost:12990/metrics' | grep 'node_volume_avg_latency'
# HELP node_volume_avg_latency Metric for node_volume
# TYPE node_volume_avg_latency gauge
node_volume_avg_latency{cluster="umeng-aff300-01-02",datacenter="dc-1⚡️",node="umeng-aff300-02"} 145.50507832115724
node_volume_avg_latency{cluster="umeng-aff300-01-02",datacenter="dc-1⚡️",node="umeng-aff300-01"} 159.36897773839848
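
If it helps, the same check can be wrapped in a small helper that counts actual sample lines and skips the `# HELP` / `# TYPE` comments. This is only a sketch: `count_samples` is a made-up name, and the port in the usage comment is still my promPort, not necessarily yours.

```shell
#!/bin/sh
# count_samples METRIC
# Reads Prometheus exposition text on stdin and prints how many sample
# lines exist for METRIC. Lines starting with '#' (HELP/TYPE) never match
# because the pattern is anchored to the metric name at line start.
count_samples() {
  # grep -c prints 0 and exits non-zero when nothing matches; "|| true"
  # keeps the function's exit status at 0 in that case.
  grep -c "^$1[{ ]" || true
}

# Usage (replace 12990 with your poller's promPort):
#   curl -s 'http://localhost:12990/metrics' | count_samples node_volume_avg_latency
```

A result of 0 means the poller is not exporting that metric at all, which points at the poller rather than at VictoriaMetrics.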
visual yew
#

I only get metrics related to VictoriaMetrics

tough dagger
#

are you sure that you're curling the correct endpoint?

visual yew
#

In your example it is 12990; is that the Prometheus endpoint?

tough dagger
#

yes, 12990 is my poller's promPort, but may not be yours. You need to replace 12990 with your poller's promPort

visual yew
#

so in prometheus.yml:
static_configs:
  - targets: [localhost:XX]

tough dagger
#

check your harvest.yml and see what you specified for the poller's exporter promPort. The poller has an exporter, which you defined in the Exporters: section of the harvest.yml. The exporter lists which port(s) it will use

The poller log files also tell you with a line like this
msg="server listen" Poller=sar exporter=prometheus1 url=http://:12990/metrics
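
A rough way to pull the port out of a line like that, assuming the log line has the same shape as the one above (`prom_port` is a hypothetical helper name, and how you get at the log depends on your setup):

```shell
#!/bin/sh
# prom_port
# Reads poller log lines on stdin and prints the port number found in
# each "server listen" line, e.g. 12990 for url=http://:12990/metrics.
# Lines that do not match are ignored.
prom_port() {
  sed -n 's|.*msg="server listen".*url=http://[^:]*:\([0-9][0-9]*\)/metrics.*|\1|p'
}

# Usage (the container name is a placeholder for your own):
#   podman logs <your-poller-container> 2>&1 | prom_port
```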

visual yew
#

OK, understood, but curl is not available in the Harvest container; let me see if I can find it in the VictoriaMetrics one

tough dagger
#

you can curl outside the container or inside you can use wget
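
Something like this would use whichever of the two tools is present. It is a sketch, not something shipped with Harvest: `fetch_url` is a made-up name and 12990 is again just my port.

```shell
#!/bin/sh
# fetch_url URL
# Prints the body of URL on stdout using curl if it is installed,
# otherwise wget. Returns 127 if neither tool is available.
fetch_url() {
  if command -v curl >/dev/null 2>&1; then
    curl -s "$1"
  elif command -v wget >/dev/null 2>&1; then
    # -q: quiet, -O-: write the response to stdout
    wget -qO- "$1"
  else
    echo "neither curl nor wget found" >&2
    return 127
  fi
}

# Usage (replace 12990 with your poller's promPort):
#   fetch_url 'http://localhost:12990/metrics' | grep 'node_volume_avg_latency'
```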

visual yew
#

So I can't get data from outside my containerized app except through the ports I authorized?

tough dagger
#

you can't scrape the port unless you exposed it like so in your compose file

visual yew
#

jump into team

#

if you can 🙂

#

teams

visual yew
#

Hi Chris, I have updated to the latest Harvest (24.08) and I still get error messages; I have sent them to the DL

worldly surge
#

Thanks @visual yew We'll take a look.