#Poller error "write: connection reset by peer"


primal beacon
#

I just installed Harvest 23.05 and was looking for the new External Service Operation counter. I am seeing counters for all systems except my 8-node MCC (which is the one I am most interested in).

There are these errors in the poller log:
{"level":"error","Poller":"pollername","exporter":"prometheus","error":"write tcp 127.0.0.1:12999->127.0.0.1:59752: write: connection reset by peer","caller":"prometheus/httpd.go:129","time":"2023-05-12T12:30:09+02:00","message":"write metrics"}
{"level":"error","Poller":"pollername","exporter":"prometheus","error":"write tcp 127.0.0.1:12999->127.0.0.1:53466: write: connection reset by peer","caller":"prometheus/httpd.go:129","time":"2023-05-12T12:31:09+02:00","message":"write metrics"}
{"level":"error","Poller":"pollername","exporter":"prometheus","error":"write tcp 127.0.0.1:12999->127.0.0.1:54350: write: connection reset by peer","caller":"prometheus/httpd.go:129","time":"2023-05-12T12:32:09+02:00","message":"write metrics"}
{"level":"error","Poller":"pollername","exporter":"prometheus","error":"write tcp 127.0.0.1:12999->127.0.0.1:47506: write: connection reset by peer","caller":"prometheus/httpd.go:129","time":"2023-05-12T12:33:09+02:00","message":"write metrics"}
{"level":"error","Poller":"pollername","exporter":"prometheus","error":"write tcp 127.0.0.1:12999->127.0.0.1:42870: write: connection reset by peer","caller":"prometheus/httpd.go:129","time":"2023-05-12T12:34:09+02:00","message":"write metrics"}

Other counters are shown for this cluster, so I guess the errors are related to the missing counter.

red badge
#

@primal beacon Are you missing all counters or just service operation counters for 8 node MCC?

primal beacon
#

I think all other counters are available. I am now testing with the External Service template disabled.

primal beacon
#

The error seems to be gone when the template is disabled.

red badge
#

Okay. The service counter template should not cause this error; they are unrelated.

primal beacon
#

I sent the logs

red badge
#

Received

#

From the logs, data is being collected for ExternalServiceOperation:

4430:{"level":"info","Poller":"","collector":"ZapiPerf:ExternalServiceOperation","parseMs":886,"calcMs":35,"skips":0,"apiMs":12831,"pluginMs":4,"metrics":309589,"instances":6587,"caller":"collector/collector.go:483","time":"2023-05-12T12:44:01+02:00","message":"Collected"}
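To put the numbers in that "Collected" entry in perspective, here is a minimal Python sketch that parses the line and derives metrics per instance. It assumes the leading `4430:` is just a prefix from whatever tool was used to search the log (the code skips everything before the first `{`); the function name `summarize_collected` is made up for this illustration and is not part of Harvest.

```python
import json

def summarize_collected(line: str) -> dict:
    """Summarize one 'Collected' entry from the poller log (JSON per line)."""
    # Skip any prefix before the JSON object, e.g. a "4430:" line marker.
    entry = json.loads(line[line.index("{"):])
    instances = entry["instances"]
    metrics = entry["metrics"]
    return {
        "collector": entry["collector"],
        "instances": instances,
        "metrics": metrics,
        # Average number of exported metrics per collected instance.
        "metrics_per_instance": metrics / instances if instances else 0.0,
        # Time spent waiting on the cluster API, in seconds.
        "api_seconds": entry["apiMs"] / 1000,
    }

line = (
    '4430:{"level":"info","Poller":"","collector":"ZapiPerf:ExternalServiceOperation",'
    '"parseMs":886,"calcMs":35,"skips":0,"apiMs":12831,"pluginMs":4,'
    '"metrics":309589,"instances":6587,"caller":"collector/collector.go:483",'
    '"time":"2023-05-12T12:44:01+02:00","message":"Collected"}'
)
summary = summarize_collected(line)
print(summary)
```

For this entry the ratio works out to 47 metrics for each of the 6587 instances, which is how a single template ends up contributing roughly 310,000 metrics to every scrape.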

primal beacon
#

Maybe it contains some data that the exporter doesn't like?

#

ahh no I misinterpreted that

red badge
#

I think it is related to the time Prometheus takes to scrape the poller. Could you share the following metric from Prometheus?

scrape_duration_seconds

primal beacon
#

"connection reset by peer" - my VictoriaMetrics won't scrape it correctly

red badge
#

Yes

primal beacon
#

The issue is the size of the scrape. Enabling this template increases it from 24MB to 140MB, and I have maxscrapesize set to 100MB in VictoriaMetrics.
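A rough way to see which metrics make a scrape this large is to measure the response body per metric name. The Python sketch below is only an illustration: it works on any Prometheus text-format body, the helper names (`fetch_scrape`, `scrape_size_by_metric`, `exceeds_limit`) and the sample metric names are made up, and the `/metrics` path on the poller's port (12999 in the log above) is an assumption.

```python
from collections import Counter
from urllib.request import urlopen

def fetch_scrape(url: str, timeout: float = 120.0) -> str:
    """Download a scrape body, e.g. from http://127.0.0.1:12999/metrics (assumed path)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")

def scrape_size_by_metric(text: str) -> Counter:
    """Return the number of bytes each metric name contributes to the body."""
    sizes = Counter()
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The metric name ends at the first '{' (labels) or ' ' (value).
        ends = [i for i in (line.find("{"), line.find(" ")) if i != -1]
        name = line[: min(ends)] if ends else line
        sizes[name] += len(line.encode("utf-8")) + 1  # +1 for the newline
    return sizes

def exceeds_limit(text: str, limit_bytes: int) -> bool:
    """True if the whole scrape body is larger than the scraper's size limit."""
    return len(text.encode("utf-8")) > limit_bytes

# Tiny made-up sample; a real body would come from fetch_scrape(...).
sample = 'metric_a{svm="s1"} 1\nmetric_a{svm="s2"} 2\nmetric_b 3\n'
sizes = scrape_size_by_metric(sample)
for name, size in sizes.most_common():
    print(name, size)
```

Running the same functions against the real body, once with the template enabled and once with it disabled, should show which metric names account for the jump from 24MB to 140MB.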

red badge
#

Makes sense. We'll take another look at this template. Maybe limiting it to just the LDAP service would reduce the number of counters? Currently we collect all services.

primal beacon
#

I need Active Directory as well.

#

I have a large number of SVMs, and there are a lot of Domain Controllers in our AD. I think that's why it is getting so big.

#

Thanks for your quick response!

red badge
#

Could you share the names of these services so that we can collect only what is needed? LDAP, AD, and what else?

#

For now, you can filter for the relevant services using plugins in this template.

primal beacon
#

I will have to check which operations show some relevant latency. I will keep you posted next week.

red badge
#

Sure, thanks.

primal beacon
#

I still owe you an answer on which histograms are of interest to me:
DNS - DNSquery
LDAP - GetUserInfoFromName
Netlogon - NetrSamLogonEx
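To estimate how much smaller the scrape would get if only these three operations were kept, something like the following Python sketch could be run against a saved scrape body. It is not how Harvest plugins filter data; it only assumes that the operation name appears as a complete quoted label value on each exported line. The metric name `example_latency` and the label name `operation` in the sample are hypothetical.

```python
OPERATIONS_OF_INTEREST = ("DNSquery", "GetUserInfoFromName", "NetrSamLogonEx")

def kept_fraction(text: str, wanted=OPERATIONS_OF_INTEREST) -> float:
    """Fraction of scrape bytes on lines whose labels mention a wanted operation."""
    total = kept = 0
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        size = len(line.encode("utf-8")) + 1  # +1 for the newline
        total += size
        # Match the operation as a full quoted label value, e.g. ="DNSquery".
        if any(f'="{op}"' in line for op in wanted):
            kept += size
    return kept / total if total else 0.0

# Made-up sample lines; metric and label names are hypothetical.
sample = (
    'example_latency{operation="DNSquery"} 1\n'
    'example_latency{operation="SomeOtherOp"} 2\n'
    'example_latency{operation="NetrSamLogonEx"} 3\n'
)
fraction = kept_fraction(sample)
print(f"{fraction:.0%} of the sample would remain")
```

Multiplying the resulting fraction by the current 140MB gives a first guess at the scrape size after filtering, which can then be compared with the 100MB limit.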