#NAbox 4.2 - Unable to see cluster metrics

vague wasp
#

Hello,
We recently added a new cluster to NAbox, but we are unable to see its metrics.
We are running NAbox 4.2 and Harvest 26.02.
After reviewing the container logs, we saw frequent connection events where NAbox attempts to communicate with the cluster, but the connection is lost. Specifically, the logs show "connection reset by peer" events.
Could you please help us determine what is the problem?
Thank you!

dreamy tapir
vague wasp
#

I can't, because we are working on a dark site.

dreamy tapir
#

Does the line "connection reset by peer" appear in the Harvest logs, or is it from a different NAbox container log?

vague wasp
#

When I downloaded the NAbox support bundle, there was a separate containers log file in it.

dreamy tapir
#

Okay. Could you share the full log line of this error, if possible? I need to understand whether the issue is on the NAbox side or the Harvest side.

vague wasp
#

Yes one moment

#

May 19 10:10:58 nabox_hostname :havrest | time = ...... level=ERROR source=cache.go:161 msg="write metrics" poller= cluster_name export=nabox_victoriametrics error="write tcp ip->ip: write: connection reset by peer"

dreamy tapir
#

Thanks. It seems there is an issue with VictoriaMetrics data ingestion. How many clusters do you have? Have you tried restarting the containers in NAbox?

vague wasp
#

We have only one cluster. We tried deleting it and adding it again, but that didn't work.

dreamy tapir
#

Try restarting all containers once and see whether that helps.

vague wasp
#

I restarted the container just now, and it showed metrics for 3 minutes. After that, the metrics disappeared.

dreamy tapir
#

OK, do you see any errors if you run docker logs havrest in the NAbox CLI?

vague wasp
#

I will check it.

#

There are a lot of these: level=ERROR source=cache.go:** msg="write metrics" poller= cluster_name export=nabox_victoriametrics error="write tcp ip->ip: write: broken pipe"

dreamy tapir
#

Any other errors apart from this?

vague wasp
#

1. level=error source=response.go msg="error response" status=500 response="GET \ https://.....: dial tcp lookup .... on ip no such host"

2. level=ERROR source=volumeAnalytics.go msg="failed to collect analytic data" poller=cluster object=volumeAnalytics error="error making request connection error: Get \ ....: context deadline exceeded (Client.Timeout exceeded while awaiting headers)"

dreamy tapir
#

Could you check whether there are any errors on the page below?

https://NABox_IP/vm/targets

#

Also, please check whether metrics collection is working by opening a shell in the container with the steps below:

docker exec -it havrest sh
ps -ef | grep poller

# Check poller promPort from cli and update 12001 to that

wget http://localhost:12001/metrics
cat metrics

# This cat command should show metrics collected from cluster
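
If it helps, here is a minimal sketch for judging how large that payload is once the file has been saved. The function name summarize_metrics is hypothetical (it is not part of NAbox or Harvest), and it assumes the file written by wget above is named metrics.

```shell
# Hypothetical helper (not part of NAbox or Harvest): report the size and
# sample count of a saved Prometheus-format metrics file.
summarize_metrics() {
  file="$1"
  # Arithmetic expansion strips any padding that wc may add to its output.
  bytes=$(( $(wc -c < "$file") ))
  # Count non-empty lines that are not "# HELP" / "# TYPE" comments.
  samples=$(grep -c '^[^#]' "$file")
  echo "bytes=$bytes samples=$samples"
}

# Usage, assuming the file saved by wget above is named "metrics":
# summarize_metrics metrics
```

A byte count close to or above the scrape size limit configured on the VictoriaMetrics side would be consistent with scrapes being cut off, though this is only a rough check.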
vague wasp
dreamy tapir
#

Great!

#

I think in NABox, it is already 200MB

#

Is this cluster heavily loaded with many objects, such as a large number of volumes, LUNs, etc.?

vague wasp
dreamy tapir
#

Understood. Reviewing the logs would have been helpful, but to resolve this issue we should increase the maxScrapeSize setting in VictoriaMetrics.
Have you customized any of the Harvest templates in NAbox?

#

@stable barn can help with this.

vague wasp
#

We never did, but I will wait for a response tomorrow. Thank you very much!

dreamy tapir
#

Sure. I'll share a CLI command you can run to see which object is producing this much data.

#

Could you run the command below in NAbox and share the output?

docker logs havrest 2>&1 | \
  grep 'msg=Collected' | \
  grep 'instancesExported=' | \
  awk '
  {
    c=""; ie=0; me=0
    for(i=1;i<=NF;i++){
      split($i,kv,"=")
      if(kv[1]=="collector")         c=kv[2]
      if(kv[1]=="instancesExported") ie=kv[2]+0
      if(kv[1]=="metricsExported")   me=kv[2]+0
    }
    if(c!=""){last_ie[c]=ie; last_me[c]=me}
  }
  END{
    for(c in last_ie) print last_ie[c], last_me[c], c
  }' | \
  sort -rn | \
  awk '{printf "%-40s instancesExported=%-6d metricsExported=%d\n", "collector="$3, $1, $2}'
vague wasp
#

yes just a moment

dreamy tapir
#

You can also use the command below, which will be faster if you have a large volume of Docker logs.

docker logs havrest --since 30m 2>&1 | \
  grep 'msg=Collected' | \
  grep 'instancesExported=' | \
  awk '
  {
    c=""; ie=0; me=0
    for(i=1;i<=NF;i++){
      split($i,kv,"=")
      if(kv[1]=="collector")         c=kv[2]
      if(kv[1]=="instancesExported") ie=kv[2]+0
      if(kv[1]=="metricsExported")   me=kv[2]+0
    }
    if(c!=""){last_ie[c]=ie; last_me[c]=me}
  }
  END{
    for(c in last_ie) print last_ie[c], last_me[c], c
  }' | \
  sort -rn | \
  awk '{printf "%-40s instancesExported=%-6d metricsExported=%d\n", "collector="$3, $1, $2}'
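
As a complementary check that does not depend on the poller's log lines, the metric names with the most samples can also be counted directly from a saved copy of the poller's /metrics output. This is only a sketch: the function name top_metric_names is hypothetical, and it assumes a Prometheus-format text file such as the one fetched with wget earlier in this thread.

```shell
# Hypothetical helper (not part of NAbox or Harvest): list the metric names
# with the most samples in a saved Prometheus-format metrics file.
top_metric_names() {
  file="$1"
  limit="${2:-10}"   # number of rows to print, default 10
  awk '
    # Skip comment lines (# HELP / # TYPE) and blank lines.
    /^#/ || NF == 0 { next }
    {
      name = $0
      sub(/[{ ].*/, "", name)   # keep only the text before "{" or a space
      count[name]++
    }
    END { for (n in count) print count[n], n }
  ' "$file" | sort -k1,1nr -k2,2 | head -n "$limit"
}

# Usage, assuming the saved file is named "metrics":
# top_metric_names metrics 20
```

Each output row is a sample count followed by a metric name, sorted with the largest count first.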
dreamy tapir
#

Hmm. How about the command below?

docker logs havrest --since 30m 2>&1 |   grep 'msg=Collected' |   grep 'instancesExported='
vague wasp
#

still no output

dreamy tapir
#

If you just run

docker logs havrest

do you see any line with the text Collected in it?

vague wasp
#

I see an error: unable to merge config error=open nabox: no such file or directory, along with a few warnings, and another error: poller exited abnormally, fork/exec bin/poller: no such file or directory

#

No lines with Collected.

dreamy tapir
#

Okay, so that means the poller isn't even starting.

#

Looks like something is wrong with the NAbox file structure. @stable barn, could you please check?

stable barn
#

What version is it?
mkdir /etc/nabox/harvest/nabox

vague wasp
#

Forgive me for disappearing.
Do you mean which version of NAbox?

stable barn
#

yes