#Metric for Node performance capacity View


waxen trench
#

Hi,
I am using the Harvest query
headroom_cpu_current_utilization{datacenter=~"$Datacenter",cluster=~"$Cluster",node=~"$Node"} / (headroom_cpu_optimal_point_utilization{datacenter=~"$Datacenter",cluster=~"$Cluster",node=~"$Node"} != 0) * 100
to show the performance capacity trend in Grafana.

However, node5 shows abnormal values in Grafana, as the image shows.
The mean value was 108% and the max was 300%.
When we checked that node in AIQUM, the average performance capacity was only 43% and the max was only 70%.

Only this node in the cluster shows abnormal values in the trend; the other nodes look normal.

What could be the reason for this? The data point?

abstract skiff
#

Hi @waxen trench I'm not familiar with which counters AIQUM is using to calculate that graph. Do you know which counters? We need to make sure that we're comparing the same thing. Can you also send the log files for this poller to ng-harvest-files@netapp.com? We want to see if there are any partial aggregations or other issues logged there.

night vault
#

@waxen trench In addition to Chris's ask, could you also verify if the values of the node_avg_processor_busy metric and headroom_cpu_current_utilization match? They should be identical.

Also, could you share a plot of the performance capacity used for the last 3-4 hours, based on the panel you mentioned earlier?

Is this the same cluster we discussed in this Discord channel?

waxen trench
#

@abstract skiff @night vault We have collected a Performance Archive for this node. The peak performance line of node utilization is fluctuating rapidly, as shown in the image, which might be the root cause. If we divide the utilization value by the peak performance value (optimal point), I think this trend might match what we observed using Harvest.

I was considering using an alternative metric:
headroom_cpu_ewma_hourly{metric="utilization"} / headroom_cpu_ewma_hourly{metric="optimal_point_utilization"}.

However, I found that the full query returns no data:
headroom_cpu_ewma_hourly{datacenter=~"$Datacenter",cluster=~"$Cluster",node=~"$Node", metric="utilization"} / headroom_cpu_ewma_hourly{datacenter=~"$Datacenter",cluster=~"$Cluster",node=~"$Node", metric="optimal_point_utilization"}
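
One possible explanation, assuming the two series really differ only in their `metric` label: PromQL arithmetic between two instant vectors pairs up series whose label sets are identical, so `metric="utilization"` never matches `metric="optimal_point_utilization"` and the result is empty. Adding `ignoring(metric)` after the `/` is the usual way around this. The Python sketch below only imitates that matching rule on hypothetical sample data; the `divide` helper and the label values are illustrative and are not Prometheus or Harvest code.

```python
# Simplified imitation of PromQL one-to-one vector matching.
# The samples below are hypothetical, not real Harvest output.

def divide(lhs, rhs, ignoring=()):
    """Divide samples whose label sets are equal after dropping `ignoring`."""
    def key(labels):
        return frozenset((k, v) for k, v in labels.items() if k not in ignoring)

    rhs_by_key = {key(labels): value for labels, value in rhs}
    result = []
    for labels, value in lhs:
        k = key(labels)
        if k in rhs_by_key:
            # Keep only the labels that took part in the match.
            result.append((dict(k), value / rhs_by_key[k]))
    return result

utilization = [({"node": "node5", "metric": "utilization"}, 57.0)]
optimal = [({"node": "node5", "metric": "optimal_point_utilization"}, 22.0)]

print(divide(utilization, optimal))                        # no pairs: `metric` differs
print(divide(utilization, optimal, ignoring=("metric",)))  # one pair for node5
```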

I would like to use the EWMA values to smooth short-term fluctuations and emphasize long-term trends or cycles. Do you experts have any recommendations?
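
To illustrate the smoothing idea, here is a minimal Python sketch that compares the raw ratio with the ratio of exponentially smoothed series. The sample values and the smoothing factor `alpha` are assumptions chosen for illustration, and the sketch does not reproduce how ONTAP computes `headroom_cpu_ewma_hourly`.

```python
# Hypothetical samples: steady utilization, rapidly fluctuating optimal point.
utilization = [40.0, 42.0, 41.0, 43.0, 40.0, 42.0]
optimal_point = [80.0, 20.0, 85.0, 15.0, 90.0, 25.0]

def ewma(values, alpha=0.3):
    """Exponentially weighted moving average; alpha is an assumed smoothing factor."""
    smoothed = [values[0]]
    for v in values[1:]:
        smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed

def pct(util, opt):
    """Utilization as a percentage of the optimal point, sample by sample."""
    return [100 * u / o for u, o in zip(util, opt)]

raw = pct(utilization, optimal_point)
smooth = pct(ewma(utilization), ewma(optimal_point))

print("raw max:     ", round(max(raw)))
print("smoothed max:", round(max(smooth)))
```

With these made-up numbers the smoothed ratio peaks far lower than the raw one, because the dips in the optimal point no longer dominate individual samples.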

night vault
#

@waxen trench Are you saying that Harvest metrics are being reported correctly?

I'm not sure if EWMA can be used in this context. This question might be better suited for the #1062049169520476220 channel.

waxen trench
# night vault <@706068726092398593> Are you saying that Harvest metrics are being reported cor...

With the query I am using now:
headroom_cpu_current_utilization{datacenter=~"$Datacenter",cluster=~"$Cluster",node=~"$Node"} / (headroom_cpu_optimal_point_utilization{datacenter=~"$Datacenter",cluster=~"$Cluster",node=~"$Node"} != 0) * 100
this node shows an average of 100% and a max of 300% performance capacity.

And from the PA view, as shown in the attached image:
57 / 22 -> about 260%, which might be close to the view in Grafana with Harvest.
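
The arithmetic above can be sketched in Python as follows. The function name is hypothetical; it mirrors the Grafana query by dividing current utilization by the optimal point and skipping samples where the optimal point is zero.

```python
def performance_capacity_used(current_utilization, optimal_point_utilization):
    """Return utilization as a percentage of the optimal point, or None when
    the optimal point is zero (the query's `!= 0` filter drops those samples)."""
    if optimal_point_utilization == 0:
        return None
    return current_utilization / optimal_point_utilization * 100

print(round(performance_capacity_used(57, 22)))  # 259
```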

night vault
#

Okay, so it seems that UM is displaying different data or using a different metric.

waxen trench