Harvest not able to get the headroom aggregate metrics | NetApp

lilac crest Nov 20, 2024, 4:32 PM

#

NetApp Release 9.14.1P9
harvest version 24.08.0-1 (commit 0cd72654) (build date 2024-08-12T08:58:20-0400) linux/amd64
H/W AFFA400

I recently built a new NetApp system in our infrastructure and noticed that it is not reporting metrics such as headroom_aggr_current_utilization. While checking the poller logs, I could see that ONTAP is rejecting the API calls.
Logs here:
{"level":"info","Poller":"storage01","collector":"ZapiPerf:HeadroomAggr","error":"API request rejected => For resource_headroom_aggr object, no instances were found to match the given query. errNum=\"61110\" statusCode=\"0\"","task":"data","caller":"collector/collector.go:415","time":"2024-11-20T07:53:10Z","message":"Entering standby mode"}
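
For anyone scanning poller logs for the same symptom, here is a minimal Python sketch that pulls the ZAPI error number out of a line like the one above. The helper name `zapi_errnum` is hypothetical and not part of Harvest; the pattern assumes the `errNum="..."` shape seen in this log and tolerates backslash-escaped quotes.

```python
import re
from typing import Optional

# Error text copied from the poller log above.
LOG_LINE = (
    'API request rejected => For resource_headroom_aggr object, no instances '
    'were found to match the given query. errNum="61110" statusCode="0"'
)

# Matches errNum="61110" as well as errNum=\"61110\" (escaped JSON form).
ERRNUM_RE = re.compile(r'errNum=\\?"(\d+)')


def zapi_errnum(line: str) -> Optional[int]:
    """Hypothetical helper: return the ZAPI errNum in a log line, or None."""
    match = ERRNUM_RE.search(line)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    print(zapi_errnum(LOG_LINE))
```

Filtering a log file on the value this returns (61110 here) would show which collectors entered standby for the same reason.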

The only difference between this system and the other storage systems in our infrastructure is that it is the only one running ONTAP 9.14.1P9.
All the others are on 9.13.x.

To check this, I upgraded another system to 9.14.1P8, and that one has no issues with the metrics.
Is anyone else having this issue?

late crane Nov 20, 2024, 4:38 PM

#

Hi @lilac crest, can you upload the log files for the failing poller here? https://upload.nabox.org/moyu-mohi-fazo
How to collect logs

lilac crest Nov 20, 2024, 4:39 PM

#

Sure

#

@late crane Done

late crane Nov 20, 2024, 4:46 PM

#

thanks

#

I think that error is the ZAPI Perf infrastructure of ONTAP saying that you don't have any instances of that object. I see the same response from ONTAP for CopyManager, HeadroomAggr, NFSv3, NFSv4, NFSv41, NFSv42, and Qtree. That message makes sense for most of those objects. Not sure about HeadroomAggr, though; I would have thought you would have instances of that object.

#

Is REST enabled on the cluster in question? If so, we can curl the endpoint to check that way too

lilac crest Nov 20, 2024, 4:54 PM

#

Yeah REST is enabled

#

{
  "name": "headroom_aggregate",
  "description": "Display message service time variance and message inter-arrival time variance for aggregates in a node.",
  "counter_schemas": [
    {
      "name": "current_utilization",
      "description": "This is the storage aggregate average utilization of all the data disks in the aggregate.",
      "type": "percent",
      "unit": "percent",
      "denominator": {
        "name": "current_utilization_denominator"
      }
    }
  ],
  "_links": {
    "self": {
      "href": "/api/cluster/counter/tables/headroom_aggregate"
    }
  }
}

#

curl -s -k --request GET "https://nappstore-14/api/cluster/counter/tables/headroom_aggregate?counter_schemas=current_utilization" -H "accept: application/hal+json" --user

late crane Nov 20, 2024, 4:57 PM

#

Perfect, can you also try this endpoint?
https://nappstore-14/api/cluster/counter/tables/headroom_aggregate/rows?return_records=true

lilac crest Nov 20, 2024, 5:00 PM

#

{
  "records": [
    {
      "id": "nappstore-14a:DISK_SSD_nappstore_14a_data_32de3074-7e40-4ffb-86e1-20c3b16eace9:32de3074-7e40-4ffb-86e1-20c3b16eace9",
      "_links": {
        "self": {
          "href": "/api/cluster/counter/tables/headroom_aggregate/rows/nappstore-14a%3ADISK_SSD_nappstore_14a_data_32de3074-7e40-4ffb-86e1-20c3b16eace9%3A32de3074-7e40-4ffb-86e1-20c3b16eace9"
        }
      }
    },
    {
      "id": "nappstore-14b:DISK_SSD_nappstore_14b_data_b8d85f81-8650-414d-b8f5-f4c86d736dd8:b8d85f81-8650-414d-b8f5-f4c86d736dd8",
      "_links": {
        "self": {
          "href": "/api/cluster/counter/tables/headroom_aggregate/rows/nappstore-14b%3ADISK_SSD_nappstore_14b_data_b8d85f81-8650-414d-b8f5-f4c86d736dd8%3Ab8d85f81-8650-414d-b8f5-f4c86d736dd8"
        }
      }
    }
  ],
  "num_records": 2,
  "_links": {
    "self": {
      "href": "/api/cluster/counter/tables/headroom_aggregate/rows?return_records=true"
    }
  }
}
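
To make the REST result easier to compare with what ZAPI reports, here is a minimal Python sketch that lists the node and aggregate for each row. It assumes each record `id` has the form `node:aggregate:uuid`, which is inferred only from the two ids in the response above; the function name `headroom_rows` is hypothetical.

```python
import json
from typing import List, Tuple

# Trimmed sample shaped like the response above (the _links fields are omitted).
RESPONSE = """
{
  "records": [
    {"id": "nappstore-14a:DISK_SSD_nappstore_14a_data_32de3074-7e40-4ffb-86e1-20c3b16eace9:32de3074-7e40-4ffb-86e1-20c3b16eace9"},
    {"id": "nappstore-14b:DISK_SSD_nappstore_14b_data_b8d85f81-8650-414d-b8f5-f4c86d736dd8:b8d85f81-8650-414d-b8f5-f4c86d736dd8"}
  ],
  "num_records": 2
}
"""


def headroom_rows(payload: str) -> List[Tuple[str, str]]:
    """Return (node, aggregate) pairs, assuming ids look like node:aggregate:uuid."""
    rows = []
    for record in json.loads(payload)["records"]:
        # Split into at most three parts: node, aggregate name, aggregate UUID.
        node, aggregate, _uuid = record["id"].split(":", 2)
        rows.append((node, aggregate))
    return rows


if __name__ == "__main__":
    for node, aggregate in headroom_rows(RESPONSE):
        print(node, aggregate)
```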

late crane Nov 20, 2024, 5:01 PM

#

well that's interesting - RestPerf returns two records while ZapiPerf returns errNum=61110

#

as a workaround, you can switch your poller to use RestPerf instead of ZapiPerf (or move RestPerf above ZapiPerf in the list of collectors). I'll see if I can find anything out about why ONTAP returns a 61110

lilac crest Nov 20, 2024, 5:04 PM

#

Okay, let me try switching to REST in a little while. I also look forward to learning why this is happening 🙂

late crane Nov 20, 2024, 5:04 PM

#

me too!

#

@mossy kayak have you seen this before?

#

What if you try this against the same cluster? (please replace ip)

curl -snk --data-ascii '<?xml version="1.0" encoding="UTF-8"?> <netapp xmlns="http://www.netapp.com/filer/admin" version="1.30"> <perf-object-get-instances><instances><instance>*</instance></instances><objectname>resource_headroom_aggr</objectname></perf-object-get-instances> </netapp>' -H "Content-Type: text/xml" -X POST 'https://10.193.48.154/servlets/netapp.servlets.admin.XMLrequest_filer'

lilac crest Nov 21, 2024, 8:44 AM

#

@late crane sorry for the late response. I got caught up with some other work until late at night.

Here is the output

📎 message.txt

serene night Nov 21, 2024, 9:10 AM

#

Thanks @lilac crest
Could you please also share the response to the command below? Replace USER, PASS, and CLUSTER_IP as needed.

curl --connect-timeout 30 --user USER:PASS --insecure --data-ascii '<?xml version="1.0" encoding="UTF-8"?>
<netapp xmlns="http://www.netapp.com/filer/admin" version="1.130">
    <perf-object-instance-list-info-iter>
        <objectname>resource_headroom_aggr</objectname>
    </perf-object-instance-list-info-iter>
</netapp>' -H "Content-Type: text/xml" 'https://CLUSTER_IP/servlets/netapp.servlets.admin.XMLrequest_filer'

lilac crest Nov 21, 2024, 9:18 AM

#

here is the output

<!DOCTYPE netapp SYSTEM 'file:/etc/netapp_gx.dtd'>
<netapp version='1.241' xmlns='http://www.netapp.com/filer/admin'>
<results status="passed"><attributes-list><instance-info><name>DISK_SSD_nappstore_14a_data_32de3074-7e40-4ffb-86e1-20c3b16eace9</name><uuid>DISK_SSD_32de3074-7e40-4ffb-86e1-20c3b16eace9</uuid></instance-info><instance-info><name>DISK_SSD_nappstore_14b_data_b8d85f81-8650-414d-b8f5-f4c86d736dd8</name><uuid>DISK_SSD_b8d85f81-8650-414d-b8f5-f4c86d736dd8</uuid></instance-info></attributes-list><num-records>2</num-records></results></netapp>
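
The instance names in a ZAPI response like the one above can also be pulled out with a few lines of Python, which helps when comparing them against the REST rows. This is only a sketch using the standard library; the function name `zapi_instances` is hypothetical, and the element names are taken from the output above.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace declared on the <netapp> root element in the response.
NS = {"na": "http://www.netapp.com/filer/admin"}

# Response copied from the output above.
RESPONSE = """<!DOCTYPE netapp SYSTEM 'file:/etc/netapp_gx.dtd'>
<netapp version='1.241' xmlns='http://www.netapp.com/filer/admin'>
<results status="passed"><attributes-list><instance-info><name>DISK_SSD_nappstore_14a_data_32de3074-7e40-4ffb-86e1-20c3b16eace9</name><uuid>DISK_SSD_32de3074-7e40-4ffb-86e1-20c3b16eace9</uuid></instance-info><instance-info><name>DISK_SSD_nappstore_14b_data_b8d85f81-8650-414d-b8f5-f4c86d736dd8</name><uuid>DISK_SSD_b8d85f81-8650-414d-b8f5-f4c86d736dd8</uuid></instance-info></attributes-list><num-records>2</num-records></results></netapp>"""


def zapi_instances(xml_text: str) -> List[str]:
    """Return instance names from a perf-object-instance-list-info-iter response."""
    root = ET.fromstring(xml_text)
    results = root.find("na:results", NS)
    if results is None or results.get("status") != "passed":
        raise ValueError("ZAPI call did not report status=passed")
    names = results.findall("na:attributes-list/na:instance-info/na:name", NS)
    return [name.text for name in names]


if __name__ == "__main__":
    for name in zapi_instances(RESPONSE):
        print(name)
```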

#

@serene night

serene night Nov 21, 2024, 9:19 AM

#

Thanks. This shows that the ZAPI commands are working fine. I'll give you a Harvest command shortly to check further.

lilac crest Nov 21, 2024, 9:20 AM

#

Okay

#

Thanks Rahul

serene night Nov 21, 2024, 9:22 AM

#

Could you run the command below from the Harvest install directory? Let it run for a few minutes and share the output. Replace POLLERNAME with the name of the poller that is having this issue, as defined in your Harvest config.

./bin/poller --poller POLLERNAME --collectors ZapiPerf --objects HeadroomAggr

lilac crest Nov 21, 2024, 9:48 AM

#

Here is the output @serene night

📎 message.txt

serene night Nov 21, 2024, 9:50 AM

#

Thanks. The logs show that data is being collected just fine. Could you restart your Harvest poller and check whether it is working now? You don't need to switch to REST for this step.

lilac crest Nov 21, 2024, 9:50 AM

#

Ok let me see

serene night Nov 21, 2024, 9:50 AM

#

And you can stop the CLI command I shared earlier.

lilac crest Nov 21, 2024, 9:59 AM

#

Thanks Rahul

#

Restarted the instance and waiting for some time.

#

I am no longer seeing the error I shared earlier in the logs.

serene night Nov 21, 2024, 9:59 AM

#

Sure. We should be able to see data in 5 mins if it works fine.

lilac crest Nov 21, 2024, 9:59 AM

#

Will give it some time and see

lilac crest Nov 21, 2024, 10:43 AM

#

@serene night It works
Thanks a lot for the help

mossy kayak Nov 25, 2024, 1:47 PM

#

No idea Chris.
