#Harvest. RestPerf Error missing timestamp metric


bronze quarry
#

Hello team. I have deployed new monitoring for NetApp systems and am getting RestPerf errors.
Some of the NetApp systems run ONTAP 9.14.1P4+.
All pollers run in containers. Sometimes restarting the container helps, sometimes it does not. Can you help investigate what the problem is?
I got the error:
time=2024-12-19T09:17:19.739Z level=ERROR source=collector.go:434 msg="" Poller= collector=RestPerf:NFSv41 error="configuration error => missing timestamp metric" task=data
time=2024-12-19T09:17:19.739Z level=ERROR source=collector.go:434 msg="" Poller= collector=RestPerf:NFSv41Node error="configuration error => missing timestamp metric" task=data
time=2024-12-19T09:17:19.739Z level=ERROR source=collector.go:434 msg="" Poller= collector=RestPerf:NVMfLif error="configuration error => missing timestamp metric" task=data
time=2024-12-19T09:17:19.739Z level=ERROR source=collector.go:434 msg="" Poller= collector=RestPerf:HostAdapter error="configuration error => missing timestamp metric" task=data

zealous chasm
#

@bronze quarry Could you share Harvest version?

bronze quarry
#

Hi.

#

harvest version 2.0-nightly (commit 9cad88ee) (build date 2024-12-19T08:52:01+0000) linux/amd64

#

But I got the same errors on the latest version.

zealous chasm
bronze quarry
#

one moment please

#

Done. Check that please

zealous chasm
#

I see that these errors are side effects of timeouts which happened earlier.

time=2024-12-19T10:23:42.165Z level=ERROR source=collector.go:434 msg="" Poller=xxxx collector=RestPerf:Volume error="failed to fetch data. href=[api/cluster/counter/tables/volume?return_records=true&max_records=500] err: error making request connection error Get \"https://a402-dbaas-ds/api/cluster/counter/tables/volume?max_records=500&return_records=true\": dial tcp :443: i/o timeout" task=counter
#

It seems like Cluster is unable to parse requests.
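
To check whether the `missing timestamp metric` errors on `task=data` follow earlier failures on `task=counter`, the log lines can be grouped by task. The following is a minimal Python sketch, not part of Harvest; the names `parse_line` and `errors_by_task` are made up for illustration, and the two sample lines are copied from this thread.

```python
import re
from collections import defaultdict

# Matches key=value pairs; the value is either a double-quoted string
# (which may contain escaped quotes) or a bare token without spaces.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S*)')

def parse_line(line):
    """Return the key=value fields of one log line as a dict."""
    fields = {}
    for key, value in PAIR.findall(line):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]  # drop the surrounding quotes
        fields[key] = value
    return fields

def errors_by_task(lines):
    """Map each task name to the collectors that logged an ERROR for it."""
    grouped = defaultdict(list)
    for line in lines:
        fields = parse_line(line)
        if fields.get("level") == "ERROR":
            grouped[fields.get("task", "")].append(fields.get("collector", ""))
    return dict(grouped)

# Two lines taken from this thread: one data-task error, one counter-task timeout.
SAMPLE = [
    r'time=2024-12-19T09:17:19.739Z level=ERROR source=collector.go:434 msg="" Poller= collector=RestPerf:NFSv41 error="configuration error => missing timestamp metric" task=data',
    r'time=2024-12-19T10:23:42.165Z level=ERROR source=collector.go:434 msg="" Poller=xxxx collector=RestPerf:Volume error="failed to fetch data. href=[api/cluster/counter/tables/volume?return_records=true&max_records=500] err: error making request connection error Get \"https://a402-dbaas-ds/api/cluster/counter/tables/volume?max_records=500&return_records=true\": dial tcp :443: i/o timeout" task=counter',
]

if __name__ == "__main__":
    for task, collectors in errors_by_task(SAMPLE).items():
        print(task, collectors)
```

Run against a full poller log, collectors that appear under both `counter` and `data` are candidates for the pattern described above, where a failed counter fetch is followed by data-task errors.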

bronze quarry
#

Yes, I saw that. But it's strange, because other metrics are collected fine.

zealous chasm
#

Yeah, I suspect Cluster is unable to respond to many requests at once.

bronze quarry
#

I will check and come back with results.

zealous chasm
#

sure

bronze quarry
#

collector=RestPerf:CIFSvserver error="configuration error => missing timestamp metric" task=data
collector=RestPerf:SMB2 error="configuration error => missing timestamp metric" task=data
collector=RestPerf:Lun error="configuration error => missing timestamp metric" task=data
collector=RestPerf:NFSv4 error="configuration error => missing timestamp metric" task=data
collector=RestPerf:NFSv4Node error="configuration error => missing timestamp metric" task=data
collector=RestPerf:NicCommon error="configuration error => missing timestamp metric" task=data
collector=RestPerf:Path error="configuration error => missing timestamp metric" task=data
collector=RestPerf:VolumeSvm error="configuration error => missing timestamp metric" task=data

#

this file limit_60s.yaml
root@575d57a25911:/opt/harvest# cat conf/restperf/limit_60s.yaml
jitter: 1m

collector: RestPerf

# Order here matters!
schedule:
  - counter: 60m
  - data: 1m

objects:
  CIFSNode: cifs_node.yaml
  Disk: disk.yaml
  ExtCacheObj: ext_cache_obj.yaml
  FCVI: fcvi.yaml
  FcpPort: fcp.yaml
  HeadroomAggr: resource_headroom_aggr.yaml
  HeadroomCPU: resource_headroom_cpu.yaml

zealous chasm
#

Could you share logs again to same location

bronze quarry
#

Done.

zealous chasm
#

It's the same timeouts in the logs. Let's curl the endpoint to isolate the issue from Harvest. Do you get any response back if you curl the endpoint api/cluster/counter/tables/lun?return_records=true&max_records=500?

bronze quarry
#

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.

#

but 443 port [tcp/https] succeeded!
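
The curl error above is a certificate verification failure, not a timeout, so the endpoint can still be tested with verification turned off. The following is a rough Python sketch of such a check using only the standard library. It assumes the cluster presents a self-signed certificate and accepts HTTP basic authentication; the helper names `build_request` and `fetch_table`, and the host and credentials passed to them, are placeholders and not something from this thread.

```python
import base64
import json
import ssl
import urllib.request

def build_request(host, table, user, password, max_records=500):
    """Build a GET request for one counter table (path taken from this thread)."""
    url = (
        f"https://{host}/api/cluster/counter/tables/{table}"
        f"?return_records=true&max_records={max_records}"
    )
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {"Authorization": f"Basic {token}", "Accept": "application/json"}
    return urllib.request.Request(url, headers=headers)

def fetch_table(host, table, user, password, timeout=30):
    """Fetch the table and return the decoded JSON body.

    Assumption: the certificate is self-signed, so verification is disabled.
    Only do this for a one-off diagnostic against a host you trust.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    req = build_request(host, table, user, password)
    with urllib.request.urlopen(req, timeout=timeout, context=ctx) as resp:
        return json.load(resp)

# Example call (placeholder host and credentials):
# body = fetch_table("cluster.example", "lun", "admin", "secret")
# print(body["name"], len(body["counter_schemas"]))
```

If this call returns promptly while Harvest still logs timeouts, the single request is fine and the problem is more likely related to many requests arriving together.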

zealous chasm
#

Could you try below

bronze quarry
#

great

#

{
  "name": "lun",
  "description": "This table contains LUN-level SAN counters which are shared between 7-mode and C-mode. These counters are available for every mapped logical unit. The alias name for lun:node is lun_node.",
  "counter_schemas": [
    {
      "name": "average_read_latency",
      "description": "Average read latency in microseconds for all operations on the LUN",
      "type": "average",
      "unit": "microsec",
      "denominator": {
        "name": "read_ops"
      }
    },
    {
      "name": "average_write_latency",
      "description": "Average write latency in microseconds for all operations on the LUN",
      "type": "average",
      "unit": "microsec",
      "denominator": {
        "name": "write_ops"
      }
    },
    {
      "name": "average_xcopy_latency",
      "description": "Average latency in microseconds for xcopy requests",
      "type": "average",
      "unit": "microsec",
      "denominator": {
        "name": "xcopy_requests"
      }
    },
    {
      "name": "caw_requests",
      "description": "Number of compare and write requests",
      "type": "delta",
      "unit": "none"
    },

zealous chasm
#

Great. I noticed that even with a jitter of 1 minute, the requests are still very close together. It's odd that ONTAP is failing under these conditions. We could try increasing the jitter value, but I doubt it will completely resolve the issue.

#

We could try increasing the polling interval, if you are okay with that? It is currently a 1m poll for RestPerf.
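
To get a feel for how much a larger jitter spreads the requests out, here is a small Python simulation. It is a sketch under assumptions, not Harvest code: it assumes each collector is delayed by a uniformly random amount between 0 and the jitter value, and the collector count of 40 and the 5-second window are arbitrary numbers chosen for illustration.

```python
import random

def start_offsets(n_collectors, jitter_seconds, seed=0):
    """Sorted start delays, one per collector, drawn uniformly from [0, jitter]."""
    rng = random.Random(seed)
    return sorted(rng.uniform(0, jitter_seconds) for _ in range(n_collectors))

def max_in_window(offsets, window_seconds):
    """Largest number of starts that fall inside any window of the given length.

    `offsets` must be sorted in ascending order.
    """
    best = 0
    left = 0
    for right, t in enumerate(offsets):
        # Shrink the window from the left until it spans at most window_seconds.
        while t - offsets[left] > window_seconds:
            left += 1
        best = max(best, right - left + 1)
    return best

if __name__ == "__main__":
    for jitter in (60, 180):  # 1m and 3m, expressed in seconds
        offsets = start_offsets(40, jitter)
        print(f"jitter={jitter}s: up to {max_in_window(offsets, 5)} starts within 5s")
```

With the same number of collectors, a longer jitter window lowers the peak number of requests that land close together, which is the effect being discussed here.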

bronze quarry
#

okay. i'll come back with result

zealous chasm
#

This is not a vsim right?

bronze quarry
#

right

#

it's full production)

zealous chasm
#

Okay

#

What is the output of the command below in the ONTAP CLI, in diag mode?

system services web show -fields per-address-limit, wait-queue-capacity 
bronze quarry
#

Rahul, hi. jitter: 3m really helped me. Thank you so much for your help.

zealous chasm
#

Great! Did you change data as well to 3m?

bronze quarry
#

No, I kept data at 1m with jitter at 3m, and it's working perfectly.

zealous chasm
#

Cool. Do you have other API load as well on this cluster apart from Harvest?

bronze quarry
#

Hm, yes, it does. But it's only via ZAPI, not REST.

zealous chasm
#

Okay. Yeah, I suspect that the cluster is unable to handle many requests at the same time, leading to timeouts.

bronze quarry
zealous chasm
#

I believe opening an ONTAP support case would be beneficial, as they can examine the logs on the ONTAP side to determine the root cause of this issue.

bronze quarry
zealous chasm
#

That looks correct