# Grafana Alerts firing before Evaluation interval


sonic vine

I have set up volume latency alerts in Grafana. The rule evaluates latency every 1 minute with a For duration of 10 minutes. However, I see alerts generated even when the latency is high for only 2 or 3 one-minute intervals.

For instance, an alert was generated, but when I checked the graph in Grafana, the latency had been high for only 3 minutes. Any help getting this resolved is much appreciated.
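For reference, the behaviour the rule is expected to have can be modelled as a small state machine: each 1-minute evaluation either breaches or not, the first breach moves the alert to Pending, and it only moves to Firing once the condition has held for the whole For duration. This is a simplified sketch of the documented Normal/Pending/Firing semantics, not Grafana's scheduler code; `AlertSim` and `simulate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AlertSim:
    """Simplified Normal -> Pending -> Firing model (hypothetical, not Grafana code)."""

    for_minutes: int
    state: str = "Normal"
    pending_since: Optional[int] = None  # minute of the first breaching evaluation

    def evaluate(self, minute: int, breaching: bool) -> str:
        if not breaching:
            # Condition false: a Pending or Firing alert returns to Normal.
            self.state = "Normal"
            self.pending_since = None
        elif self.state == "Normal":
            # First breach starts the For timer (For=0 fires immediately).
            self.pending_since = minute
            self.state = "Firing" if self.for_minutes == 0 else "Pending"
        elif self.state == "Pending" and minute - self.pending_since >= self.for_minutes:
            # The condition has held for the whole For duration.
            self.state = "Firing"
        return self.state


def simulate(breaches: List[bool], for_minutes: int = 10) -> List[str]:
    """Evaluate once per minute (1m interval) and return the state after each evaluation."""
    rule = AlertSim(for_minutes=for_minutes)
    return [rule.evaluate(minute, b) for minute, b in enumerate(breaches)]


# A 3-minute latency spike evaluated against a 10-minute For duration.
spike = [False] * 2 + [True] * 3 + [False] * 10
print(simulate(spike))
```

Under this model the 3-minute spike goes Normal -> Pending -> Normal and never reaches Firing, while a condition that stays true reaches Firing on the 11th evaluation (minute 10), which is the behaviour the rule was expected to have.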

[FIRING:2] Volume Top Average Latency ASE | NetApp Harvest 21.11 - cDOT (n2_aggr2_sas ase-sy3-cluster-dev00 SY3 poller-ase-sy3-cluster-dev00:13000 harvest ase-sy3-cluster-dev00-02 critical ontap asec3981-sy3-fluid0-vs0 flexvol)
Firing
Value: [ var='B' labels={aggr=n2_aggr2_sas, cluster=ase-sy3-cluster-dev00, datacenter=SY3, instance=poller-ase-sy3-cluster-dev00:13000, job=harvest, node=ase-sy3-cluster-dev00-02, svm=asec3981-sy3-fluid0-vs0, type=flexvol, volume=trident_pvc_3fcf6e5b_0bf0_4c1a_8917_10d77421c319} value=73.121 ], [ var='C' labels={aggr=n2_aggr2_sas, cluster=ase-sy3-cluster-dev00, datacenter=SY3, instance=poller-ase-sy3-cluster-dev00:13000, job=harvest, node=ase-sy3-cluster-dev00-02, svm=asec3981-sy3-fluid0-vs0, type=flexvol, volume=trident_pvc_3fcf6e5b_0bf0_4c1a_8917_10d77421c319} value=1 ]
Labels:

  • alertname = Volume Top Average Latency
  • aggr = n2_aggr2_sas
  • cluster = ase-sy3-cluster-dev00
  • datacenter = SY3
  • grafana_folder = ASE | NetApp Harvest 21.11 - cDOT
  • instance = poller-ase-sy3-cluster-dev00:13000
  • job = harvest
  • node = ase-sy3-cluster-dev00-02
  • severity = critical
  • storage = ontap
  • svm = asec3981-sy3-fluid0-vs0
  • type = flexvol
  • volume = trident_pvc_3fcf6e5b_0bf0_4c1a_8917_10d77421c319
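To read the notification above: in a typical Grafana 9 rule, A is the data query, B a Reduce expression over A and C a Threshold expression over B, so `value=73.121` for B would be the reduced latency and `value=1` for C means the condition evaluated as true. The rule's actual expressions are only in a screenshot, so this layout is an assumption, and the reducer names and the 20.0 limit below are hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical stand-ins for common Grafana Reduce functions.
REDUCERS: Dict[str, Callable[[List[float]], float]] = {
    "last": lambda xs: xs[-1],
    "mean": mean,
    "max": max,
    "min": min,
}


def reduce_series(samples: List[float], func: str = "last") -> float:
    """Expression B (assumed): collapse query A's series to a single number."""
    return REDUCERS[func](samples)


def threshold(value: float, limit: float) -> int:
    """Expression C (assumed, 'is above'): 1 when the reduced value exceeds the limit, else 0."""
    return 1 if value > limit else 0


samples = [10.0, 12.0, 73.121]  # hypothetical latency samples from query A
b = reduce_series(samples, "last")
c = threshold(b, 20.0)
print(b, c)
```

C only reports whether the condition held at that single evaluation; whether the alert fires still depends on the For duration.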
royal chasm

@sonic vine, could you please share a few more details, such as which version of Grafana you are currently using and what the query for A is?

sonic vine

@royal chasm We are running Grafana v9.2.3. I've attached a screenshot of the query.

royal chasm

Hi @sonic vine
I tried to reproduce this with Grafana 9.4.3 and so far haven't been able to. I also verified that Harvest is sending the correct values to Prometheus, so there is no bug on that side. Finally, I checked that if the alert condition becomes false within the configured For duration (10m), the alert state correctly changes from Pending back to Normal.
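The Prometheus side of this check can be repeated independently of the Grafana version by pulling the raw series from the Prometheus HTTP API (`/api/v1/query_range`) and listing which one-minute samples were above the limit before the alert fired. A minimal sketch; the Prometheus address, the query expression (use whatever query A contains) and the limit are hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import List, Tuple
from urllib.parse import urlencode


def build_query_range_url(base_url: str, expr: str, start: float, end: float, step: str = "60s") -> str:
    """Build a Prometheus /api/v1/query_range URL; start and end are Unix seconds."""
    params = urlencode({"query": expr, "start": start, "end": end, "step": step})
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


def breaching_samples(response_body: str, limit: float) -> List[Tuple[str, float]]:
    """Return (UTC timestamp, value) pairs above `limit` from a query_range JSON response."""
    payload = json.loads(response_body)
    hits = []
    # A range query returns a matrix: each series carries [timestamp, "value"] pairs.
    for series in payload["data"]["result"]:
        for ts, raw in series["values"]:
            value = float(raw)
            if value > limit:
                stamp = datetime.fromtimestamp(float(ts), tz=timezone.utc).isoformat()
                hits.append((stamp, value))
    return hits


# Hypothetical usage (requires network access to Prometheus):
# from urllib.request import urlopen
# url = build_query_range_url("http://prometheus:9090", "<query A>", start, end)
# print(breaching_samples(urlopen(url).read().decode(), limit=20.0))
```

Comparing the number of breaching samples with the 10 consecutive evaluations that the For duration requires shows whether the data Grafana saw really matches the 3-minute spike on the graph.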
Here are a few screenshots from my testing:

  • evaluation: 1m, For: 10m -> Image 1
  • The alert stayed in Pending for more than 2m while the value was above the threshold. -> Images 2 and 3
  • Once the read latency fell below 20ms, the alert state changed to Normal. -> Images 4 and 5

It might be better to reach out to the Grafana team about their evaluation interval and alert firing behavior.