#Frequent gaps in collected data


tardy briar
#

I see many gaps in the data for my big 8-Node MetroCluster. It is running 9.10.1P8 currently and the harvest log shows many "lagging behind schedule" warnings for all ZapiPerf collectors. Any idea what I can do to reduce the gaps?

terse oracle
#

@tardy briar Could you share the logs with us at ng-harvest-files@netapp.com? Is there anything different about this cluster? Is it slow, or is it being monitored by multiple other tools?

tardy briar
#

I use Harvest, AIQUM and Checkmk for monitoring. I see some lag messages for all systems but not as many as this. It is our biggest cluster with most vServers & Volumes. I will send the logs as suggested.

terse oracle
#

Sure. Most likely it is slow ZAPI/REST response times from the ONTAP system.

terse oracle
#

@tardy briar I have checked the logs, and none of the lag messages are concerning, as they mostly show a delay of less than one second. Could you share some examples of metrics with gaps? How large are these gaps?

tardy briar
#

The gaps are 2-5 minutes most of the time. As I said, I see them in various graphs. Here is SVM IOPS, for example.

#

Second example: Ethernet throughput.
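
To put a number on gaps like these, one option is to compare the timestamps of consecutive samples against the expected scrape interval. The following is a minimal sketch, not something taken from Harvest or VictoriaMetrics: the function name `find_gaps`, the 60-second interval, and the sample timestamps are all assumptions made for illustration.

```python
from typing import List, Tuple


def find_gaps(
    timestamps: List[float], interval_s: float, tolerance: float = 1.5
) -> List[Tuple[float, float]]:
    """Return (start, end) pairs where two consecutive samples are
    further apart than tolerance * interval_s seconds."""
    ordered = sorted(timestamps)
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > tolerance * interval_s:
            gaps.append((prev, cur))
    return gaps


# Hypothetical sample times in seconds, assuming a 60-second scrape interval.
samples = [0, 60, 120, 300, 360, 660]
for start, end in find_gaps(samples, interval_s=60):
    print(f"gap of {end - start:.0f}s between t={start} and t={end}")
```

With these made-up samples the sketch reports one 3-minute and one 5-minute gap, which is the range described above.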

vocal elbow
#

Hi @tardy briar, have you made any modifications to your Prometheus config, such as the scrape interval? Can you share your http://$promIP/config via the same email address as earlier?

tardy briar
#

But it seems the gaps are not related to the lag messages. Ethernet and SVM did not appear in them, and other charts where lags were reported in the logs look fine.

vocal elbow
#

Right, as Rahul mentioned, a lag of several seconds won't cause that.

#

I guess the other thing to check is your Prom logs, to see if anything points to a problem there.

tardy briar
#

I am using VictoriaMetrics as a backend. Here is the config:
```yaml
scrape_configs:
  - job_name: harvest_poller
    static_configs:
      - targets:
          - localhost:5201
  - job_name: harvest
    scrape_interval: 1m0s
    http_sd_configs:
```
#

I see no errors regarding scraping 😦

#

I think I found it: I forgot to add `sort_labels` to my Harvest config. It must have been lost during one of the updates.

vocal elbow
#

ah!

#

if I recall VM is more sensitive to that setting than Prom? Normally that wouldn't cause gaps, right?

tardy briar
#

Correct

#

VM will mark series stale if the label order changes
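
To illustrate why label order matters here, the sketch below renders the same label set twice, once in insertion order and once sorted. It is not Harvest's or VictoriaMetrics' code: `render_series`, the metric name `svm_ops`, and the label values are hypothetical, and the `sort_labels` parameter only mimics the idea behind the setting discussed above.

```python
from typing import Dict


def render_series(name: str, labels: Dict[str, str], sort_labels: bool) -> str:
    """Render a metric name and labels as an exposition-style series string.

    With sort_labels=False the labels keep their insertion order, so the
    same label set can produce different strings from one poll to the next.
    """
    items = sorted(labels.items()) if sort_labels else list(labels.items())
    body = ",".join(f'{key}="{value}"' for key, value in items)
    return f"{name}{{{body}}}"


# The same label set, built in a different order on two hypothetical polls.
poll_a = {"svm": "svm1", "cluster": "mc1"}
poll_b = {"cluster": "mc1", "svm": "svm1"}

# Unsorted: the two renderings differ even though the labels are identical.
print(render_series("svm_ops", poll_a, sort_labels=False))
print(render_series("svm_ops", poll_b, sort_labels=False))

# Sorted: both polls render to the same string.
print(render_series("svm_ops", poll_a, sort_labels=True))
print(render_series("svm_ops", poll_b, sort_labels=True))
```

A backend that is sensitive to label order would see the unsorted output as a change between scrapes, while the sorted output stays stable.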

vocal elbow
#

I'll make a note of that in our FAQ for next time. Thanks for the realtime follow-up 😄

tardy briar
#

Thanks for the quick response. That helped me a lot looking in the right direction.

vocal elbow
#

Have you been pleased with VM in general? We occasionally have folks ask us about it, but we haven't used it in earnest yet.

tardy briar
#

I am very pleased with VM, for many reasons.

  • small and easy to set up for simple use cases
  • also able to do powerful cluster setups with an architecture that is easy to understand
  • many features to ease the transition from other TSDBs (I still insert data via the Graphite protocol for my other tools)
  • improved query language that overcomes some of PromQL's quirks
  • focus on performance and minimal storage usage; it scales really well for large numbers of metrics with high churn rates