#I am trying the new Ems collector in the

indigo phoenix
#

hi @livid slate let's try with curl to remove Harvest from the equation, let me dig that up for you

#

can you try this, replacing username, password, and ip?
curl -v --insecure --user admin:password 'https://10.193.48.11/api/support/ems/events'

#

this collector polls data every 3m by default

#

perhaps you had no new ems events in that 3m window?

livid slate
#

that worked. received many events
"records": [
  {
    "node": {
      "name": "flc1-02-noprod-ash-storage",
      "uuid": "12c70b18-bf3a-11ec-bc33-00a098f218d0",
      "_links": {
        "self": {
          "href": "/api/cluster/nodes/12c70b18-bf3a-11ec-bc33-00a098f218d0"
        }
      }
    },
    "index": 7126896,
    "time": "2022-09-28T09:05:00-07:00",
    "message": {
      "severity": "informational",
      "name": "wafl.scan.ownblocks.done"
    },
    "log_message": "wafl.scan.ownblocks.done: Completed block ownership calculation on volume rootvol(4)@vserver:606bdda8-7930-11ec-8c61-00a0986e013f. The scanner took 2 ms.",
    "_links": {
      "self": {
        "href": "/api/support/ems/events/flc1-02-noprod-ash-storage/7126896"
      }
    }
  },

#

i have a custom zapi Ems template and it collects events too. i did disable that before trying the new Ems collector

tawny sable
#

The new EMS collector collects the last 3 minutes of EMS events on each 3-minute poll. You may not have had any new events since you started the new EMS collector.
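
To make the 3-minute window concrete, here is a rough Python sketch that filters the `records` array from the curl response down to events inside the window. It only illustrates the idea and is not how Harvest implements the collector; the helper name `events_in_window` and the two poll times are made up for the example.

```python
from datetime import datetime, timedelta

def events_in_window(records, now, window=timedelta(minutes=3)):
    """Return records whose `time` field falls within `window` before `now`."""
    cutoff = now - window
    recent = []
    for rec in records:
        # Timestamps are ISO 8601 with a UTC offset, e.g. 2022-09-28T09:05:00-07:00
        ts = datetime.fromisoformat(rec["time"])
        if cutoff <= ts <= now:
            recent.append(rec)
    return recent

# Trimmed-down record taken from the curl output above
records = [
    {"index": 7126896, "time": "2022-09-28T09:05:00-07:00",
     "message": {"severity": "informational", "name": "wafl.scan.ownblocks.done"}},
]

# Hypothetical poll times: one minute and ten minutes after the event
soon = datetime.fromisoformat("2022-09-28T09:06:00-07:00")
later = datetime.fromisoformat("2022-09-28T09:15:00-07:00")
print(len(events_in_window(records, soon)))   # event is still inside the window
print(len(events_in_window(records, later)))  # event has aged out
```

So a collector started after the last event fired would report nothing until a new event arrives.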

livid slate
#

ahh. any way to trigger one event to test?

indigo phoenix
#

there's a way to tell ONTAP to synthesize an event - let me dig it up

tawny sable
livid slate
#

name: Ems
query: ems-message-get-iter
object: ems

collect_only_labels: true

counters:
  ems-message-info:
    - ^ems-severity => severity
    - ^message-name => name
    - ^^node => node
    - ^time => time
    - ^^seq-num => seq_num

plugins:
  - LabelAgent:
      # metric label zapi_value rest_value default_value
      value_to_num:
        - non_encrypted encryption_state none none 0

export_options:
  instance_keys:
    - node
    - seq_num
  instance_labels:
    - severity
    - name
    - time

tawny sable
#

Thanks. Looks like it collects all events on every poll.

livid slate
#

yeah

indigo phoenix
#

give this a shot to create an object.store.unavailable event, replacing username, password, and IP:
curl -sk -uadmin:pass -X POST 'https://10.193.48.11/api/private/cli/event/generate' -d '{ "message-name": "object.store.unavailable", "values": [1,2,3] }'

livid slate
#

very slow to query

#

we were experiencing a low WAFL memory issue. we tried to catch the EMS warning before it goes from the verylow level to veryverylow

#

triggered EMS with curl

indigo phoenix
#

if the ems poller is running you should see it within 3m

livid slate
#

ok

#

got it
curl -s localhost:12991/metrics | egrep '^ems'
ems_events{datacenter="ASH",cluster="flc1-noprod-ash-storage",cluster_uuid="b61b746e-ca01-11e5-aa1c-00a0986e012c",message="object.store.unavailable",node="flc1-01-noprod-ash-storage",node_uuid="2",severity="emergency",index="6713145",config_name="1"} 1
ems_events{cluster="flc1-noprod-ash-storage",cluster_uuid="b61b746e-ca01-11e5-aa1c-00a0986e012c",datacenter="ASH",index="6713146",app="",node="flc1-01-noprod-ash-storage",severity="informational",volume="AU2_1664380392",vol_ident="@vserver:a03a978a-ecd6-11e9-bc47-00a0986e013f",message="wafl.vvol.offline",node_uuid="5b8ad0df-bf3a-11ec-9f89-00a098f21d80"} 1
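
If you want to check the labels programmatically instead of eyeballing the egrep output, a small Python sketch like this can split a metric line into its parts. The function name `parse_metric_line` is made up for the example, the sample line is a shortened copy of the first one above, and the regex assumes label values contain no escaped quotes.

```python
import re

# Matches key="value" pairs; assumes values contain no escaped quotes
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

def parse_metric_line(line):
    """Split one exposition line into (metric name, labels dict, value)."""
    name, rest = line.split("{", 1)
    label_part, value = rest.rsplit("}", 1)
    return name, dict(LABEL_RE.findall(label_part)), float(value)

# Shortened copy of the first ems_events line above
line = ('ems_events{datacenter="ASH",cluster="flc1-noprod-ash-storage",'
        'message="object.store.unavailable",node="flc1-01-noprod-ash-storage",'
        'severity="emergency",index="6713145"} 1')

name, labels, value = parse_metric_line(line)
print(name, labels["message"], labels["severity"], value)
```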

#

even one extra!

indigo phoenix
#

nice!

tawny sable
#

These events will not show up again in the next poll, but you should still see them in Prometheus's historical data. Basically, they vanish on the next poll, and only events from the last 3 minutes are available on port 12991.

livid slate
#

yep they are gone

#

Thank you very much Raul and Chris!

indigo phoenix
#

you bet!

tawny sable
livid slate
indigo phoenix
#

yes, you'll need to add them to your custom

tawny sable
#

Most of our EMS alert examples use last_over_time, and all active EMS events are published with value 1

livid slate
tawny sable
#

It won't help: sum and count would be the same, given the metric value is 1 🙂
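
A tiny Python sketch of why the two coincide: when every sample is 1, adding the values up gives the same number as counting them. The sample list and the helper name are made up for illustration, and the sketch does not depend on which PromQL sum or count function is meant.

```python
def sum_and_count(samples):
    """Return (sum, count) for a list of sample values."""
    return sum(samples), len(samples)

# Hypothetical samples of one ems_events series inside a query window;
# the collector publishes 1 for every active event
samples = [1, 1, 1, 1]

total, count = sum_and_count(samples)
print(total, count)
```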

#

I think we need to handle this use case in the current EMS infrastructure. Could you please open a GitHub request for it?

#

Today we mostly publish 1 when an event happens and 0 when it is resolved by a resolving event.

livid slate
#

will do.

tawny sable
#

Thanks!

livid slate
#

with our zapi ems template, we collect ems timestamp as value. then use this alert condition
count (time() - ems_time{name="wafl.memory.statusLowMemory"} < 600) by (node) > 1
not sure if it worked because prometheus did not report any 😉
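
For what it's worth, here is a rough Python sketch of what that alert expression computes, assuming `ems_time` holds Unix seconds and each event is its own series. The function name `nodes_firing`, the node names, and the timestamps are all hypothetical.

```python
from collections import Counter

def nodes_firing(series, now, max_age=600, threshold=1):
    """series: (node, event_timestamp_in_seconds) pairs, one per ems_time series.

    Mirrors: count (time() - ems_time < max_age) by (node) > threshold
    """
    # Keep only events younger than max_age, then count them per node
    recent = Counter(node for node, ts in series if now - ts < max_age)
    return sorted(node for node, n in recent.items() if n > threshold)

now = 1_664_380_800  # hypothetical evaluation time (Unix seconds)
series = [
    ("node-a", now - 60),    # 1 minute old
    ("node-a", now - 300),   # 5 minutes old
    ("node-b", now - 120),   # only one recent event on this node
    ("node-b", now - 3600),  # too old, filtered out
]
print(nodes_firing(series, now))
```

One thing the sketch makes visible: with `> 1`, a node needs at least two matching events inside the 10-minute window, so a single recent event would not fire the alert.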

tawny sable
#

Thanks. We'll see if we can simplify this use case!

livid slate