From a monitoring perspective, I want to be notified by Alertmanager or Harvest when a cluster is down.
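The only method I can think of is to set up the following alerting rule (YAML), which Prometheus evaluates and Alertmanager then routes as a notification.
The rule compares the metric "node_cpu_busy" at two points in time: if a cluster reported the metric one hour ago but reports nothing now, the alert fires.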
    - name: cluster_no_data_alerts
      rules:
        - alert: NoDataForMetric
          expr: group by (cluster) (node_cpu_busy{} offset 1h unless on(cluster) node_cpu_busy{}) == 1
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "No metric data for cluster '{{ $labels.cluster }}'"
            description: "The metric 'node_cpu_busy' has not received data for over 10 minutes."
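One way to sanity-check the expression before relying on the alert is to run it as an instant query against the Prometheus HTTP API (`/api/v1/query`) and see which clusters it returns. Below is a minimal Python sketch of that check. The base URL and the function names `build_query_url`, `silent_clusters` and `fetch_silent_clusters` are hypothetical and only for illustration; the expression is the one from the rule.

```python
import json
import urllib.parse
import urllib.request

# Same expression as in the alert rule.
EXPR = (
    "group by (cluster) "
    "(node_cpu_busy{} offset 1h unless on(cluster) node_cpu_busy{}) == 1"
)


def build_query_url(base_url, expr):
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": expr})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def silent_clusters(payload):
    """Return the sorted 'cluster' labels found in an instant-query response."""
    if payload.get("status") != "success":
        raise ValueError("query failed: %s" % payload.get("error", "unknown error"))
    results = payload["data"]["result"]
    return sorted(r["metric"].get("cluster", "") for r in results)


def fetch_silent_clusters(base_url, expr=EXPR):
    """Run the expression against Prometheus (base_url is an assumption)."""
    with urllib.request.urlopen(build_query_url(base_url, expr), timeout=10) as resp:
        return silent_clusters(json.load(resp))
```

Calling `fetch_silent_clusters("http://localhost:9090")` (address assumed) should return an empty list while every cluster is still reporting, and the names of clusters that reported an hour ago but not now otherwise.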
Is there a better or recommended way to do this?