#disable alert rule mimir / cortex / loki

frigid spade
#

Hello,

I get too many alerts and don't know how to deactivate them.

Example:
I have many volumes with anti-ransomware status dry-run.
Grafana sends the alert "Anti-ransomware state was changed to "dry-run" on volume "" (UUID: "") in Vserver "" (UUID: "")." many times a day.

Another example:
I create a new volume. Grafana sends me the alert "volume [XXX] created" and then "Anti-ransomware was changed to "disabled" on Vserver "" (UUID: "")."

How can I disable these alerts in mimir / cortex / loki? Do I have to edit the ems_alert_rules.yml file and comment out what I'd like to disable?

Thank you

jolly glacier
#

Alerting certainly needs some flexibility. I’m not sure what the best option is yet. Comments welcome

#

At a minimum, NAbox should let you completely override the alert file. I'll probably do that in the next release.

frigid spade
#

OK! In the meantime, I have changed the files alert_rules.yml and ems_alert_rules.yml to suit my needs. That workaround is OK for me. Thank you

jolly glacier
#

OK, but note that these changes will be overwritten when you upgrade Harvest.

frigid spade
#

OK, that's not good. So I will have to edit those files again.

jolly glacier
#

Yes. I'm not sure there is a simple fix. I need to think about it; it goes along with the need to adjust parameters such as the SnapMirror lag time, which is sometimes too short.
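Until an override mechanism exists, one way to make the manual edits repeatable after an upgrade is a small script that comments out unwanted rules by alert name. The following is a minimal sketch in Python, assuming the rule files keep the usual `- alert: <name>` layout. The function name and the sample expression `some_metric == 1` are hypothetical and not taken from Harvest or NAbox.

```python
import re

# Matches a rule header such as "- alert: Certificates expired"
ALERT_LINE = re.compile(r"-\s+alert:\s*(.+?)\s*$")


def comment_out_alerts(rules_text, alert_names):
    """Return rules_text with every rule named in alert_names commented out."""
    out = []
    skip_indent = None  # indent of the "- alert:" line currently being disabled
    for line in rules_text.splitlines():
        stripped = line.lstrip()
        indent = len(line) - len(stripped)
        match = ALERT_LINE.match(stripped)
        if match:
            name = match.group(1).strip("\"'")
            skip_indent = indent if name in alert_names else None
        elif skip_indent is not None and stripped and indent <= skip_indent:
            # A non-blank line at or left of the rule's indent ends the rule.
            skip_indent = None
        if skip_indent is not None and stripped:
            out.append("# " + line)
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical sample; the first expression is a placeholder.
sample = """groups:
  - name: ems
    rules:
      - alert: Volume Anti-ransomware Monitoring
        expr: some_metric == 1
        labels:
          severity: "warning"
      - alert: Certificates expired
        expr: (security_certificate_expiry_time{} - time()) < 0
"""

disabled = comment_out_alerts(sample, {"Volume Anti-ransomware Monitoring"})
```

Running the same function over ems_alert_rules.yml after each upgrade and writing the result back would restore the local changes. The file location depends on the installation, so it is left out here.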

frigid spade
#

Can I come back to the alert topic again?

Many variables are missing from the alerts.
Here are some examples:

alertname = Certificates expired
description = Certificate [] has been expired on []
summary = Certificate [] has been expired on []
missing: $labels.name and $labels.expiry_time

alertname = Volume Anti-ransomware Monitoring
summary = Anti-ransomware state was changed to "disabled" on volume "" (UUID: "") in Vserver "" (UUID: "").
missing: $labels.volumeName, $labels.volumeUuid, $labels.vserverName and $labels.vserverUuid

And so on... nearly every alert is missing some of its variables.

jolly glacier
#

@drowsy flax @worn glen What do you think?
Let's use this one as an example:

    # Certificates expired
  - alert: Certificates expired
    expr: (security_certificate_expiry_time{} - time()) < 0
    labels:
      severity: "critical"
    annotations:
      summary: "Certificate [{{ $labels.name }}] has been expired on [{{ $labels.expiry_time }}]"
      description: "Certificate [{{ $labels.name }}] has been expired on [{{ $labels.expiry_time }}]"

I'm getting something like this for (security_certificate_expiry_time{} - time()) < 0:

  {
    "metric": {
      "cluster": "cluster2",
      "datacenter": "LAB",
      "instance": "havrest:12001",
      "job": "harvest",
      "uuid": "cluster2.home.lab_46B0EC1911724ADBE6F29EC06DB5C551361C454446B0EC1911724ADBE6F29EC06DB5C551361C4544cluster2"
    },
    "value": [
      1723593600,
      "-12044716"
    ],
    "group": 1
  },

Do you think VictoriaMetrics is handling this differently, or is something misconfigured? I don't see those labels in the result.

drowsy flax
#

@frigid spade and @jolly glacier we'll check with VictoriaMetrics and let you know

jolly glacier
#

FWIW, that's with Prometheus. I can't remember exactly how it works, but the labels should show up, shouldn't they?

drowsy flax
#

@mental pulsar looked into this and discovered that Harvest has a bug where the labels being exported are out of sync with the alert definitions. We'll fix those gaps and add unit/CI tests to make sure this doesn't happen in the future. We'll post here when a fix is available.
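The mismatch can be illustrated with the Certificates expired rule and the query result quoted earlier in the thread. The sketch below is not the Harvest test itself. It is a hypothetical check (the function name `missing_labels` is an assumption) that lists the `$labels.*` references in an annotation which the exported series does not carry.

```python
import re

# Finds template references such as {{ $labels.name }}
LABEL_REF = re.compile(r"\$labels\.([A-Za-z_][A-Za-z0-9_]*)")


def missing_labels(annotation, exported_labels):
    """Return label names used in the annotation but absent from the series."""
    referenced = LABEL_REF.findall(annotation)
    # dict.fromkeys drops duplicates while keeping first-seen order
    return [name for name in dict.fromkeys(referenced) if name not in exported_labels]


# Annotation from the rule quoted in the thread
summary = "Certificate [{{ $labels.name }}] has been expired on [{{ $labels.expiry_time }}]"

# Label names present in the query result quoted in the thread
exported = {"cluster", "datacenter", "instance", "job", "uuid"}

gaps = missing_labels(summary, exported)
print(gaps)
```

For this rule the check reports `name` and `expiry_time`, the same two variables that show up as empty brackets in the alert text.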