#no environment sensor metrics since ontap update to 9.11.1


harsh violet
#

Hello,

I'm missing environment_sensor_average_ambient_temperature, environment_sensor_average_fan_speed, and environment_sensor_power since the update to ONTAP 9.11.1.
Can anyone check their ONTAP: Power dashboard?

Thank you

tidal plover
#

@harsh violet Could you share the log files for this poller with ng-harvest-files@netapp.com?

tidal plover
#

@harsh violet Which type of Harvest installation are you using? rpm, deb, container, native, nabox?

pure oar
#

Hi @harsh violet, thanks for sending along your log file. One thing that jumped out was this error:

2023-08-01T12:42:15Z ERR collector/collector.go:367 > error="context deadline exceeded (Client.Timeout or context cancellation while reading body)" Poller=A300_filer collector=Zapi:Sensor task=data

That means Harvest is timing out when collecting the sensor resources, which would explain why you are no longer seeing your environment_sensor_* metrics. What isn't clear is why ONTAP takes longer to respond in 9.11.1 than it did in 9.9.1. It might be worth increasing the client timeout for that object. By default, Harvest uses a timeout of 30s, but it can be increased as described at https://github.com/NetApp/harvest/wiki/Troubleshooting-Harvest#client_timeout
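
To see which collectors are hitting this timeout, and how often, the matching log lines can be counted per collector. This is only a minimal sketch: it assumes log lines in the format quoted above (with a `collector=<Name>` field), and the function name `count_timeouts` is made up for illustration.

```shell
#!/bin/sh
# Count "context deadline exceeded" errors per collector in a Harvest poller log.
# Assumes each matching line carries a "collector=<Name>" field, as in the
# log line quoted in this thread.
count_timeouts() {
  # $1: path to the poller log file (its location depends on the install type)
  grep 'context deadline exceeded' "$1" \
    | grep -o 'collector=[^ ]*' \
    | sort \
    | uniq -c \
    | sort -rn
}
```

It would be called as `count_timeouts /path/to/poller.log`, where the path is whatever log file your installation writes for the affected poller.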

harsh violet
#

Hi @pure oar / @tidal plover, I checked the client_timeout parameter in my /opt/harvest2-conf/conf/zapi/cdot/9.8.0/volume.yaml and it's 2m (I never changed that). Should I increase it even more?
It's an FMC (Fabric MetroCluster) problem! I have this issue on the FMC A300 & A700 systems. HA systems and MCIP systems show the environment_sensor parameters. All clusters run the same ONTAP version, 9.11.1P8.
So it's clear: the FMC systems have not shown the parameters since the ONTAP update to 9.11.1P8 🤔

tidal plover
#

@harsh violet You need to increase client_timeout in /opt/harvest2-conf/conf/zapi/cdot/9.8.0/sensor.yaml
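
Since client_timeout is set per object template, it can help to list what each template in that directory currently sets. A minimal sketch, assuming the templates are plain YAML files with a `client_timeout:` line; the helper name `show_client_timeouts` is hypothetical.

```shell
#!/bin/sh
# Print the client_timeout line of every template in a directory,
# or "(not set)" when a template does not define one.
show_client_timeouts() {
  # $1: template directory, e.g. /opt/harvest2-conf/conf/zapi/cdot/9.8.0
  for f in "$1"/*.yaml; do
    [ -e "$f" ] || continue   # skip when the glob matched nothing
    value=$(grep -E '^[[:space:]]*client_timeout:' "$f" | head -n 1 | sed 's/^[[:space:]]*//')
    printf '%s: %s\n' "$(basename "$f")" "${value:-(not set)}"
  done
}
```

For example, `show_client_timeouts /opt/harvest2-conf/conf/zapi/cdot/9.8.0` prints one line per template; a template reported as "(not set)" would fall back to the default timeout mentioned earlier in the thread.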

harsh violet
#

Thank you @tidal plover! I still have the same problem after updating the sensor.yaml file with client_timeout: 2m.
Are you sure it is a timeout problem and not a Fabric MetroCluster problem?

tidal plover
#

Based on the logs, it is a timeout issue. To investigate further, let's call the ONTAP system directly to see whether the ZAPI returns a response. Please run the following curl command on the affected poller. You need to replace USER, PASS, and CLUSTER_IP with the appropriate values:

curl --connect-timeout 30 --user USER:PASS --insecure --data-ascii '<?xml version="1.0" encoding="UTF-8"?>
<netapp xmlns="http://www.netapp.com/filer/admin" version="1.130">
  <environment-sensors-get-iter/>
</netapp>' -H "Content-Type: text/xml" 'https://CLUSTER_IP/servlets/netapp.servlets.admin.XMLrequest_filer'
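
Note that `--connect-timeout` only limits the connection phase, not how long the response takes to arrive. To measure the full response time, the same request can be wrapped with curl's `--max-time` and `--write-out` options. This is a sketch, not a tested procedure for this cluster: the function names and the 300-second limit are assumptions chosen for illustration.

```shell
#!/bin/sh
# Build the same ZAPI request body as in the command above.
zapi_body() {
  cat <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<netapp xmlns="http://www.netapp.com/filer/admin" version="1.130">
  <environment-sensors-get-iter/>
</netapp>
EOF
}

# Send the request, discard the XML, and print how long the response took.
# $1: user name (curl prompts for the password), $2: cluster IP or hostname.
time_sensor_zapi() {
  curl --silent --show-error --insecure \
    --connect-timeout 30 --max-time 300 \
    --user "$1" -H "Content-Type: text/xml" \
    --data-ascii "$(zapi_body)" \
    --output /dev/null \
    --write-out 'first byte: %{time_starttransfer}s, total: %{time_total}s\n' \
    "https://$2/servlets/netapp.servlets.admin.XMLrequest_filer"
}
```

Running `time_sensor_zapi USER CLUSTER_IP` and comparing the reported total with the configured client_timeout shows whether the timeout is large enough for this ZAPI.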
harsh violet
#

Amazing! You're right, it is a timeout problem.

PART1: # curl --connect-timeout 30 --user harvest2 --insecure --data-ascii '<?xml version="1.0" encoding="UTF-8"?>

<netapp xmlns="http://www.netapp.com/filer/admin" version="1.130">
<environment-sensors-get-iter/>
</netapp>' -H "Content-Type: text/xml" 'https://IP-OF-CLUSTER/servlets/netapp.servlets.admin.XMLrequest_filer'
Enter host password for user 'harvest2':
<?xml version='1.0' encoding='UTF-8' ?>
<!DOCTYPE netapp SYSTEM 'file:/etc/netapp_gx.dtd'>
<netapp version='1.211' xmlns='http://www.netapp.com/filer/admin'>
<results status="passed"><attributes-list><environment-sensors-info><discrete-sensor-state>normal</discrete-sensor-state><discrete-sensor-value>GOOD</discrete-sensor-value><node-name>NODE</node-name><sensor-name>PSU2</sensor-name><sensor-type>fru</sensor-type><threshold-sensor-state>normal</threshold-sensor-state></environment-sensors-info><environment-sensors-info><discrete-sensor-state>normal</discrete-sensor-state><discrete-sensor-value>GOOD</discrete-sensor-value><node-name>NODE</node-name><sensor-name>PSU1</sensor-name><sensor-type>fru</sensor-type><threshold-sensor-state>normal</threshold-sensor-state></environment-sensors-info><environment-sensors-info><discrete-sensor-state>normal</discrete-sensor-state><discrete-sensor-value>GOOD</discrete-sensor-value><node-name>NODE</node-name><sensor-name>Fan3</sensor-name><sensor-type>fru</sensor-type><threshold-sensor-state>normal</threshold-sensor-state></environment-sensors-info><environment-sensors-info><discrete-sensor-state>normal</discrete-sensor-state><discrete-sensor-value>GOOD</discrete-sensor-value><node-name>NODE</node-name><sensor-name>Fan2</sensor-name><sensor-type>fru</sensor-type><threshold-sensor-state>normal</threshold-sensor-state></environment-sensors-info><environment-sensors-info><discrete-sensor-state>normal</discrete-sensor-state>

#

PART2: <discrete-sensor-value>GOOD</discrete-sensor-value><node-name>NODE</node-name><sensor-name>Fan1</sensor-name><sensor-type>fru</sensor-type><threshold-sensor-state>normal</threshold-sensor-state></environment-sensors-info><environment-sensors-info><discrete-sensor-state>normal</discrete-sensor-state><discrete-sensor-value>IPMI_HB_OK</discrete-sensor-value><node-name>NODE</node-name><sensor-name>SP Status</sensor-name><sensor-type>discrete</sensor-type><threshold-sensor-state>normal</threshold-sensor-state></environment-sensors-info><environment-sensors-info><discrete-sensor-state>normal</discrete-sensor-state><discrete-sensor-value>OK</discrete-sensor-value><node-name>NODE</node-name><sensor-name>mSATA Status</sensor-name><sensor-type>discrete</sensor-type><threshold-sensor-state>normal</threshold-sensor-state></environment-sensors-info><environment-sensors-info><discrete-sensor-state>normal</discrete-sensor-state><discrete-sensor-value>PRESENT</discrete-sensor-value><node-name>NODE</node-name><sensor-name>mSATA Pres</sensor-name><sensor-type>discrete</sensor-type><threshold-sensor-state>normal</threshold-sensor-state></environment-sensors-info></attributes-list><next-tag><environment-sensors-get-iter-key-td>
<key-0>NODE</key-0>
<key-1>7</key-1>
</environment-sensors-get-iter-key-td>
</next-tag><num-records>8</num-records></results></netapp>
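
The response arrives as one long line of XML, which is hard to read. A small filter can list each sensor with its discrete value. This is a rough sketch that assumes the element names seen in the output above; `list_sensors` is a hypothetical helper, and sensors without a `<discrete-sensor-value>` element are printed with an empty value.

```shell
#!/bin/sh
# Print "<sensor-name>: <discrete-sensor-value>" for each record of an
# environment-sensors-get-iter response read from stdin.
list_sensors() {
  awk '
    # Return the text between <tag> and </tag> in rec, or "" if absent.
    function field(rec, tag,    otag, ctag, s, e) {
      otag = "<" tag ">"; ctag = "</" tag ">"
      s = index(rec, otag); if (s == 0) return ""
      rec = substr(rec, s + length(otag))
      e = index(rec, ctag); if (e == 0) return ""
      return substr(rec, 1, e - 1)
    }
    {
      # Split the line into one piece per sensor record.
      n = split($0, recs, "</environment-sensors-info>")
      for (i = 1; i <= n; i++) {
        name = field(recs[i], "sensor-name")
        if (name != "") print name ": " field(recs[i], "discrete-sensor-value")
      }
    }'
}
```

With the curl output saved to a file, `list_sensors < response.xml` prints one line per sensor, for example `PSU2: GOOD`.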

#

Sorry, the text was too long for one message. The output on the FMC A300 looks completely different from that of another HA system. (I had to hide the cluster IP and node name above.)

tidal plover
#

I see that you got the ZAPI response. Where did you hit the timeout in this CLI call?

harsh violet
#

It takes 120 seconds between </threshold-sensor-state></environment-sensors-info> and the next <environment-sensors-info>:

<discrete-sensor-value>OK</discrete-sensor-value><node-name>ramfas11</node-name><sensor-name>mSATA Status</sensor-name><sensor-type>discrete</sensor-type><threshold-sensor-state>normal</threshold-sensor-state></environment-sensors-info>

(120-second pause here, between the record above and the one below)

<environment-sensors-info><discrete-sensor-state>normal</discrete-sensor-state><discrete-sensor-value>PRESENT</discrete-sensor-value><node-name>ramfas11</node-name><sensor-name>mSATA Pres</sensor-name><sensor-type>discrete</sensor-type><threshold-sensor-state>normal</threshold-sensor-state></environment-sensors-info></attributes-list><next-tag><environment-sensors-get-iter-key-td>
<key-0>ramfas11</key-0>
<key-1>7</key-1>
</environment-sensors-get-iter-key-td>

tidal plover
#

Okay, thanks. It's best to contact ONTAP support to check why this ZAPI is slow. On the Harvest side, we can only increase client_timeout.

harsh violet
#

The case is open. I'll give you an update when I know more. Thank you so much for your support!

tidal plover
#

Thanks!

harsh violet
#

problem:
::*> system node environment sensors show -node <node>
Error: show failed on node "<node>": The Service Processor on node "<node>" is not reachable. Verify that the SP or BMC is online, verify that api-service is enabled on the SP or BMC,
verify that the partner node is running, check if pings from SP or BMC to partner node work, check if hw-assist keep-alives are normal, check that network ports are configured
correctly and are functional (up). Then, try the command again.

Syslog:
Aug 31 02:26:29 (none) spcs[5730]: TSimpleServer client died: SSL_accept: sslv3 alert unsupported certificate

solution:
::*> system service-processor api-service renew-internal-certificates