#How can I get the last value from the chart to evaluate the alert?

trim sparrow
#

I send data to Netdata every ten minutes, and I need to retrieve the latest recorded value from the database for further evaluation.

This is my current alert definition:

alarm: kopia_quick_maintenance_failed
on: Kopia.quick_maintenance_tasks_status
lookup: min -1s unaligned
every: 1m 
crit: $this != 1
info: "One or more quick maintenance tasks failed"
class: application
type: integrity
component: kopia.maintenance
summary: "At least one quick maintenance task failed"
to: sysadmin
junior flicker
#

Hi. I think you can use calc instead of lookup.

(change dimensionName to the actual dimension name)

calc: $dimensionName
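
For illustration, a minimal sketch of the alert above with `lookup` swapped for `calc`. It assumes `dimensionName` is replaced with a real dimension of the chart; as far as I understand, `calc` with a dimension variable reads the last stored value, so no lookup window is needed.

```conf
# Sketch only: "dimensionName" is a placeholder, not a real dimension.
 alarm: kopia_quick_maintenance_failed
    on: Kopia.quick_maintenance_tasks_status
# Read the dimension's latest value instead of querying a time window.
  calc: $dimensionName
 every: 1m
  crit: $this != 1
    to: sysadmin
```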
trim sparrow
#

And what about when I have a chart with more dimensions?

[quick_maintenance_tasks_status]
  name = quick_maintenance_tasks_status
  title = Quick Maintenance Tasks Status
  context = kopia.quick_maintenance_tasks_status
  family = Quick Maintenance
  type = area
  units = state
  priority = 100001
  dimension = kopia.quick_maintenance_advance-epoch_success 'Advance epoch' last
  dimension = kopia.quick_maintenance_compact-single-epoch_success 'Compact single epoch' last
  dimension = kopia.quick_maintenance_delete-superseded-epoch-indexes_success 'Delete superseded epoch indexes' last
  dimension = kopia.quick_maintenance_cleanup-epoch-markers_success 'Cleanup epoch markers' last
  dimension = kopia.quick_maintenance_cleanup-logs_success 'Cleanup logs' last
  dimension = kopia.quick_maintenance_generate-epoch-range-index_success 'Generate epoch range index' last
  dimension = kopia.quick_maintenance_snapshot-gc_success 'Snapshot gc' last

Do I have to create an alarm for every dimension, or is it possible to have only one alert definition?

junior flicker
#

Instead of creating one chart with multiple dimensions, it is better to create one chart per task status, each with the dimensions "success" and "failed". Then you can create a single alert (a template), because all of the charts will share the same context.

trim sparrow
junior flicker
#

No, you didn't understand me. Dimensions will be "success" and "failed". You create a chart for every task status.
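
A rough sketch of what one such chart could look like in the statsd app config, following the `dimension = ...` syntax used earlier in this thread. The section name and the `_failed` metric are hypothetical; every other task would get its own section with the same context.

```conf
# Sketch only: one chart per task, all sharing one context.
[quick_maintenance_advance_epoch_status]
  name = quick_maintenance_advance_epoch_status
  title = Advance epoch
  context = kopia.quick_maintenance_task_status
  family = Quick Maintenance
  type = line
  units = state
  dimension = kopia.quick_maintenance_advance-epoch_success 'success' last
  # Hypothetical metric name; the sender would have to emit it as well.
  dimension = kopia.quick_maintenance_advance-epoch_failed 'failed' last
```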

trim sparrow
#

OK. Now I'm creating definitions for these charts in /etc/netdata/statsd.d/kopia.conf.

[app]
name = Kopia
metrics = kopia.*
private charts = no
gaps when not collected = no
history = 3600

[quick_maintenance_advance_epoch_status]
  name = kopia_quick_maintenance_advance_epoch_status
  title = Advance epoch
  context = kopia.quick_maintenance_task_status
  family = Quick Maintenance
  type = line
  units = state
  priority = 110001
  dimension = kopia.quick_maintenance_advance-epoch_state 'Success' last

[quick_maintenance_compact_single_epoch_status]
  name = kopia_quick_maintenance_compact_single_epoch_status
  title = Compact single epoch
  context = kopia.quick_maintenance_task_status
  family = Quick Maintenance
  type = line
  units = state
  priority = 100001
  dimension = kopia.quick_maintenance_compact-single-epoch_state 'Success' last

Why is it that even if I give the chart a different name, I still see only one chart with two dimensions on the dashboard?

echo "kopia.quick_maintenance_compact-single-epoch_state:1|g" | nc -w 1 -u 127.0.0.1 8125
echo "kopia.quick_maintenance_advance-epoch_state:1|g" | nc -w 1 -u 127.0.0.1 8125
junior flicker
#

It says 2 instances. All charts are aggregated by context.

trim sparrow
#

Aha, is that normal behavior? I mean, if I define more charts with the same context, will they be aggregated into one chart on the dashboard? I thought there would be one chart for each task, and that I would use the context only for alerts.

junior flicker
#

Normal.

trim sparrow
#

So, does that mean I need to create a different context for each chart and then a separate alert for each one in order to distinguish in the message which task is wrong?

junior flicker
#

No, it doesn't.

trim sparrow
#

Sorry, but I still don’t really understand how the chart and alert system works. I think I’ve read everything in the docs, but it still doesn’t make sense to me.

junior flicker
#

What exactly is not clear? I can't really answer if you are not specific.

#

Templates (`template:`) apply to all charts of a context. This means that if all your charts have the same context, you need only one template.
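
As a hedged sketch, a single template for the `kopia.quick_maintenance_task_status` context could look like the following. The template name is made up, and `$Success` assumes each chart has a dimension named `Success`, as in the statsd config posted above. A template should produce one alert instance per matching chart, so the chart it is attached to indicates which task failed.

```conf
# Sketch only: the template name is hypothetical.
template: kopia_quick_maintenance_task_failed
      on: kopia.quick_maintenance_task_status
# Assumes every chart of this context has a dimension named "Success".
    calc: $Success
   every: 1m
    crit: $this != 1
    info: "A quick maintenance task failed"
      to: sysadmin
```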

trim sparrow
#

I want to know which task did not succeed, i.e. which dimension's metric is != 1.

trim sparrow
#

I'm using Netdata to monitor several maintenance tasks from Kopia, and I send their statuses to StatsD every 10 minutes.

Each task has a separate dimension/metric like this:

dimension = kopia.quick_maintenance_advance-epoch_success 'Advance epoch' last  
dimension = kopia.quick_maintenance_compact-single-epoch_success 'Compact single epoch' last  
dimension = kopia.quick_maintenance_delete-superseded-epoch-indexes_success 'Delete superseded epoch indexes' last  
dimension = kopia.quick_maintenance_cleanup-epoch-markers_success 'Cleanup epoch markers' last  
dimension = kopia.quick_maintenance_cleanup-logs_success 'Cleanup logs' last  
dimension = kopia.quick_maintenance_generate-epoch-range-index_success 'Generate epoch range index' last  

Each dimension value is either 1 (OK) or 0 (Fail).

I would like to create alerts for each task. For example, if kopia.quick_maintenance_generate-epoch-range-index_success = 0, I want to trigger an alert to let me know that the "Generate epoch range index" task failed.

What is the best way to create such alerts and charts?

trim sparrow
#

The goal is to use a variable for the dimension in "summary:", so that the alerts can be told apart in the dashboard.

trim sparrow
#

@junior flicker Now I am happy. But can I use a variable in the "summary" section? Ideally the alias name of the dimension.

trim sparrow
#

@junior flicker Is that possible, please? ⬆️

junior flicker
#

I will check