#Sudden change of NetApp clusters temperature measures after NAbox and NetApp Harvest update

reef thunder
#

Hi all,

Yesterday morning, around 07:40 AM CET, I updated our NAbox appliance from version 3.1.1 to 3.1.2 and NetApp Harvest from 22.05 to 22.08.0-1. Right after this operation, I noticed a drop in the temperature measurements of both NetApp clusters (the same phenomenon is observed at the same time on our secondary NetApp cluster, which is located at a remote site).

If I could upload a screenshot of the dashboard that would be easier to understand...

Of course, there have been no changes to our infrastructure at that time, and no errors are reported by ONTAP. Everything is still running smoothly on our NetApp clusters.

I have seen that a similar, or almost similar, bug has already been fixed for AFF900.

Could you please tell me whether this sudden change is a consequence of the NAbox or NetApp Harvest update?

Our current configuration of NAbox is the following:
NAbox 3.1.2 (2022-08-27) - Alpine Linux 3.15.6
Grafana 8.4.6
Graphite 1.2.0-dev
NetApp Harvest 22.08.0-1
Prometheus 2.38.0

Any help would be appreciated.
Regards.

wanton crescent
#

@reef thunder Let me check if there was any code change between these versions. Is this cluster AFF900?

reef thunder
#

Both clusters are made of AFF A400 nodes.
Thank you in advance.

wanton crescent
#

Thanks @reef thunder. Could you share with us the output of the command below?
bin/zapi -p POLLERNAME show data --api environment-sensors-get-iter --max 10000
You can also email it to us at ng-harvest-files@netapp.com

#

You can run the above CLI command in NAbox from root, as in the example in the screenshot.

reef thunder
#

Below is a screenshot of the requested command lines. I will send the output by email as well.

wanton crescent
#

Thanks. Please send the output of the command below:
bin/zapi -p POLLERNAME show data --api environment-sensors-get-iter --max 10000

#

Not the one from the screenshot I shared. That was just an example.

reef thunder
#

My bad... I was too fast.
OK, I will send the output by email as it is almost unreadable.

wanton crescent
#

Thanks. Yeah, it will be XML.

wanton crescent
#

@reef thunder Received the XML response. Could you also share the graph values? What was it reporting before the update, and what is it reporting now?

reef thunder
#

OK, it looks like my initial screenshot does not display the values...
The following one should show that before the update, the temperatures of the four nodes ranged roughly from 26.8 to 28.2 °C, and that after the maintenance operation, they dropped to a range of 24.3 to 25.5 °C.
Have a nice weekend.

wanton crescent
#

Thanks for confirming. I see the same values in the XML response you shared. We made a change in 22.08. We'll discuss that change internally and get back to you.

reef thunder
#

OK, I look forward to hearing from you.
Which temperatures are the right ones, those from before or after the update?

hybrid crag
#

Not clear yet. We'll get back to you on the root cause.

wanton crescent
#

@reef thunder Thanks for reporting this. We have fixed this issue (https://github.com/NetApp/harvest/issues/1467), and the fix will be available in our next official release, due later this month. A summary of the issue is below.
Harvest 22.05 had a bug in its average_temperature calculation: it also included cold sensors. We tried to fix this in 22.08 by excluding those sensors, but we ended up excluding some hot sensors as well. We have now corrected this with your help.
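
To illustrate why the choice of sensors moves the reported average, here is a minimal Python sketch. It is not Harvest's actual code: the sensor names, the readings, and the exclusion sets are all invented for illustration, and the real sensor selection rules are the ones described in the GitHub issue above.

```python
from statistics import mean

# Hypothetical readings for a single node, in degrees Celsius.
# Names and values are made up; they are not real ONTAP sensor data.
SENSORS = [
    {"name": "cold_inlet", "celsius": 24.0},  # stands in for a "cold" sensor
    {"name": "board", "celsius": 26.0},
    {"name": "psu", "celsius": 28.0},
    {"name": "cpu", "celsius": 60.0},         # stands in for a "hot" sensor
]


def average_temperature(readings, excluded=frozenset()):
    """Average the readings whose sensor name is not in `excluded`.

    Returns None when every sensor has been excluded.
    """
    kept = [r["celsius"] for r in readings if r["name"] not in excluded]
    return mean(kept) if kept else None


# Case 1: nothing excluded, so the cold sensor is averaged in.
with_cold = average_temperature(SENSORS)

# Case 2: the cold sensor is excluded, but a hot sensor is dropped by mistake.
too_strict = average_temperature(SENSORS, excluded={"cold_inlet", "cpu"})

# Case 3: only the cold sensor is excluded.
cold_only = average_temperature(SENSORS, excluded={"cold_inlet"})

print(with_cold, too_strict, cold_only)
```

With these invented numbers, dropping one hot sensor by mistake pulls the average well below both other cases, which is the same direction as the drop seen on the dashboard after the update.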