# Missing volume_total_ops stats after Harvest update

gleaming anchor
#

Hello folks !

I've got a subtle problem. Our teams notified me that some volume metrics (at least volume_total_ops and volume_avg_latency) are missing from dashboards, whereas other data seems to be collected correctly. Those metrics exist up to a date suspiciously close to our upgrade to Harvest 25.11.0. Is this a known issue?

Have a nice day !

gleaming anchor
#

(Harvest is running on the latest NAbox, and the filer is running ONTAP 9.16.1P8 and is polled using Zapi)

blissful plinth
gleaming anchor
#

Bundle uploaded 🙂 The metrics seem to be missing on only some clusters. I'm trying to pinpoint a common pattern, but I suspect it happens on all our 9.16.1 clusters.

blissful plinth
#

I see the error below, which is why the volume perf metrics are missing:

Dec 16 10:17:32 harvest dc[1688]: havrest | time=2025-12-16T10:17:32.906Z level=ERROR source=collector.go:436 msg="" Poller= collector=KeyPerf:Volume error="configuration error => empty url" task=data
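
To see which collectors or plugins are failing and why, the ERROR lines in a log bundle can be tallied. The following is a minimal sketch, assuming the logs keep the `key=value` layout of the line above; `parse_fields` and `tally_errors` are hypothetical helper names, not part of Harvest.

```python
import re
from collections import Counter

# Matches key=value and key="quoted value" pairs in a logfmt-style line.
# Quoted values may contain escaped quotes (\").
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S*)')


def parse_fields(line: str) -> dict[str, str]:
    """Return the key=value fields of one log line as a dict."""
    fields = {}
    for key, raw in PAIR.findall(line):
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            # Drop the surrounding quotes and unescape inner quotes.
            raw = raw[1:-1].replace('\\"', '"')
        fields[key] = raw
    return fields


def tally_errors(lines) -> Counter:
    """Count ERROR lines per (collector or plugin, short error text)."""
    counts = Counter()
    for line in lines:
        fields = parse_fields(line)
        if fields.get("level") != "ERROR":
            continue
        source = fields.get("collector") or fields.get("plugin") or "unknown"
        # Keep only the text before the first colon so similar errors group.
        reason = fields.get("error", "").split(":", 1)[0].strip()
        counts[(source, reason)] += 1
    return counts
```

Pass it any iterable of log lines, such as an open log file, and print `tally_errors(...).most_common()` to see whether "empty url" and "Permission denied" errors show up on the same pollers.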

blissful plinth
#

@gleaming anchor This error can occur if the KeyPerf collector's default template has been changed. Are there any custom changes? The Harvest startup logs are not in the shared logs. Could you restart all these pollers with the dc restart command, wait a few minutes, and then share the logs again?
Also, please share the output of cat /data/packages/harvest/conf/keyperf/default.yaml

gleaming anchor
#

Hmmmmm nope, we did not modify anything, it's a stock install. I'll do what you asked, brb

gleaming anchor
#
collector:          KeyPerf

# Order here matters!
schedule:
  - counter:  24h
  - data:      1m

objects:
  Aggregate:                   aggr.yaml
  CIFSvserver:                 cifs_vserver.yaml
  Cluster:                     cluster.yaml
  EthernetSwitchPort:          ethernet_switch_port.yaml
  FlexCache:                   flexcache.yaml
  LIF:                         lif.yaml
  Lun:                         lun.yaml
  Namespace:                   namespace.yaml
  NFSv3:                       nfsv3.yaml
  NFSv41:                      nfsv4_1.yaml
  NFSv4:                       nfsv4.yaml
  SystemNode:                  system_node.yaml
  Volume:                      volume.yaml
#  Qtree:                       qtree.yaml           #Enabling `qtree.yaml` may slow down data collection
#

Here's the default.yaml

blissful plinth
#

This looks correct.

gleaming anchor
#

I'm restarting all containers

blissful plinth
#

try dc down

gleaming anchor
#

aight

#

I'll wait for 10 min before sending new logs 🙂

#

(dc down and then dc up -d, right ?)

blissful plinth
#

yes

blissful plinth
#

@gleaming anchor Do we have new logs?

gleaming anchor
#

Uploading 'em right now 🙂

#

Aaaand done

blissful plinth
#

Thanks

blissful plinth
#

Looks like the support bundle didn't capture the collector logs from after the restart. Could you share a fresh support bundle? That should contain more logs than this one. No need to restart the pollers.

gleaming anchor
#

Aight

#

Done

blissful plinth
#

Thanks

gleaming anchor
#

havrest | time=2025-12-16T15:08:56.642Z level=ERROR source=nic.go:239 msg="Failed to invoke batch zapi call" Poller=xxx plugin=ZapiPerf:Nic object=NicCommon error="Permission denied => Insufficient privileges: user 'netapp-harvest' does not have read access to this resource errNum=\"13003\" statusCode=\"0\""

blissful plinth
#

Is this the latest one?
nabox-logs-2025-12-16_150746.tgz

gleaming anchor
#

Hmmmmmmm did anything change recently harvest-side, role-wise ?

gleaming anchor
#

You know what, I'll try switching the Harvest user's role to "readonly" to check whether the problem is with the associated role

#

Those "url empty" errors seem kinda fishy with all those permission denied around 'em

#

They look more like "mgwd told me to go f myself and come back with more permissions" 🙂

blissful plinth
#

You are right. For 25.11, we switched volume performance metrics collection to the KeyPerf collector

#

which requires REST permissions.

#

Once the Harvest user has REST permissions, restart the pollers and check.
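
Before restarting the pollers, it can help to confirm that the Harvest user can actually read from the REST API. The following is a minimal sketch, not Harvest's own check: `check_rest_access` and `explain_status` are hypothetical helpers, `/api/storage/volumes` is used as a representative volume endpoint (the exact endpoints KeyPerf queries are an assumption), and the 401/403 interpretation assumes ONTAP follows the usual HTTP conventions.

```python
import base64
import ssl
import urllib.error
import urllib.request


def explain_status(status: int) -> str:
    """Map an HTTP status code to a short diagnosis (assumed conventions)."""
    if status == 200:
        return "REST access OK"
    if status == 401:
        return "authentication failed: check user, password, and http application access"
    if status == 403:
        return "authenticated, but the role lacks read access to this REST API"
    return f"unexpected HTTP status {status}"


def check_rest_access(host: str, user: str, password: str,
                      path: str = "/api/storage/volumes",
                      verify_tls: bool = True) -> str:
    """Send one authenticated GET to the cluster and diagnose the result."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"https://{host}{path}",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
    context = ssl.create_default_context()
    if not verify_tls:
        # Only for clusters with self-signed certificates.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(request, context=context, timeout=10) as resp:
            return explain_status(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the status is in err.code.
        return explain_status(err.code)
```

Calling `check_rest_access("cluster-mgmt.example.com", "netapp-harvest", "...")` with your own cluster management address (the hostname here is a placeholder) gives a quick yes/no answer per cluster.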

#

Some of the clusters are collecting stats just fine.

#

I'll DM you the ones failing

gleaming anchor
#

We have clusters running 9.11.1 (old FAS2650), 9.14.1 and 9.16.1

blissful plinth
gleaming anchor
#

I did already plan to switch all clusters to 100% REST polling to finally get rid of ZAPI

blissful plinth
#

This KeyPerf change applies to clusters running ONTAP 9.10+
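
As a quick way to list which clusters fall under the 9.10+ rule, the version strings can be compared numerically. This is only an illustrative sketch with hypothetical helper names; it does not reproduce how Harvest itself decides which collector to use.

```python
import re

# Minimum ONTAP release the KeyPerf change applies to, per the thread (9.10+).
KEYPERF_MIN = (9, 10)


def parse_ontap_version(text: str) -> tuple[int, ...]:
    """Extract the numeric parts of a version such as '9.16.1P8'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"unrecognized ONTAP version: {text!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def keyperf_change_applies(version: str) -> bool:
    """True when the major.minor version is at or above 9.10."""
    return parse_ontap_version(version)[:2] >= KEYPERF_MIN
```

For the versions mentioned in this thread (9.11.1, 9.14.1, and 9.16.1P8), `keyperf_change_applies` returns `True` in every case, so all of those clusters need REST permissions for the Harvest user.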

gleaming anchor
#

Looks like the equipment scheduled the change for me

#

😄

blissful plinth
#

I see that you already have the Rest collector listed after the Zapi ones in your Harvest configuration 🙂

gleaming anchor
#

But I'm pretty sure, as you'd have guessed, the clusters were not ready for it

#

Not a problem, the change was due anyway.

blissful plinth
#

Yes, I have sent you the poller names. We can start by granting proper REST permissions for one of them and see if it works.

gleaming anchor
#

Thank you @blissful plinth

#

I'll make the change and get back to you 🙂

blissful plinth
#

sure

gleaming anchor
#

Well, that looks promising

blissful plinth
#

@gleaming anchor is the issue resolved post permissions?

gleaming anchor
#

Yep !

#

As I suspected, the "url empty" message was a bit more like "empty with an HTTP 403 response" 🙂

#

Thanks for your help again!