#NetApp poller keeps on restarting.


real cedar
north pelican
#

What does docker logs poller-Netapp01 show?

real cedar
#

2024-02-14T03:38:09Z ERR poller/poller.go:1161 > Failed to negotiateAPI error="connection error => Post \"https://Netapp01:443/servlets/netapp.servlets.admin.XMLrequest_filer\": dial tcp 172.30.0.4:443: connect: connection refused" Poller=Netapp01 collector=Zapi
2024-02-14T03:38:09Z ERR poller/poller.go:1161 > Failed to negotiateAPI error="connection error => Post \"https://Netapp01:443/servlets/netapp.servlets.admin.XMLrequest_filer\": dial tcp 172.30.0.4:443: connect: connection refused" Poller=Netapp01 collector=ZapiPerf
2024-02-14T03:38:09Z WRN poller/poller.go:669 > abort collector error="connection error => connection error => Post \"https://Netapp01:443/servlets/netapp.servlets.admin.XMLrequest_filer\": dial tcp 172.30.0.4:443: connect: connection refused" Poller=Netapp01 collector=ZapiPerf object=ObjectStoreClient
2024-02-14T03:38:09Z WRN poller/poller.go:306 > no collectors initialized, stopping Poller=Netapp01
2024-02-14T03:38:09Z INF poller/poller.go:531 > cleaning up and stopping [pid=1] Poller=Netapp01

north pelican
#

Looks like a connection issue with cluster. You may want to check Poller config.

#

You can run below command to check if configuration is correct

curl --connect-timeout 30 --user USER:PASS --insecure --data-ascii '<?xml version="1.0" encoding="UTF-8"?>
<netapp xmlns="http://www.netapp.com/filer/admin" version="1.130">
  <system-get-version/>
</netapp>' -H "Content-Type: text/xml" 'https://URL/servlets/netapp.servlets.admin.XMLrequest_filer'
#

Replace USER, PASS, URL as applicable

real cedar
dapper orbit
#

I think you are missing a space between the URL and the arg before? Try this

curl --connect-timeout 30 --user test:Test123 --insecure --data-ascii '<?xml version="1.0" encoding="UTF-8"?><netapp xmlns="http://www.netapp.com/filer/admin" version="1.130"><system-get-version/></netapp>' -H "Content-Type:text/xml" 'https://netapp01/servlets/netapp.servlets.admin.XMLrequest_filer'
real cedar
#

[root@it-devops04 harvest-22.08.0-1_linux_amd64]# curl --connect-timeout 30 --user test:Test123 --insecure --data-ascii '<?xml version="1.0" encoding="UTF-8"?><netapp xmlns="http://www.netapp.com/filer/admin" version="1.130"><system-get-version/></netapp>' -H "Content-Type:text/xml" 'https://netapp01:443/servlets/netapp.servlets.admin.XMLrequest_filer'
<?xml version='1.0' encoding='UTF-8' ?>
<!DOCTYPE netapp SYSTEM 'file:/etc/netapp_gx.dtd'>
<netapp version='1.211' xmlns='http://www.netapp.com/filer/admin'>
<results status="passed"><build-timestamp>1670716457</build-timestamp><is-clustered>true</is-clustered><version>NetApp Release 9.11.1P5: Sat Dec 10 23:54:17 UTC 2022</version><version-tuple><system-version-tuple><generation>9</generation><major>11</major><minor>1</minor></system-version-tuple></version-tuple></results></netapp>

dapper orbit
#

that looks good. And this was run from the same machine that you ran the docker logs command from earlier? The reason Rahul asked you to try that is because the poller is failing to connect to that cluster (dial tcp 172.30.0.4:443: connect: connection refused) Does ping netapp01 show the same ip (172.30.0.4)?
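
A minimal shell sketch of that comparison, assuming a Linux Docker host with `getent` available. It pulls the address the poller actually dialed out of a `dial tcp` error so it can be set next to what the host resolves for the same name. The helper name `dialed_ip` is hypothetical; the container and host names are the ones from this thread.

```shell
#!/bin/sh
# dialed_ip: print the IPv4 address from the first "dial tcp <ip>:<port>"
# fragment found in the log text on stdin. Hypothetical helper for illustration.
dialed_ip() {
  sed -n 's/.*dial tcp \([0-9][0-9.]*\):[0-9][0-9]*.*/\1/p' | head -n 1
}

# Usage (not run here): compare the two addresses by eye.
#   docker logs poller-Netapp01 2>&1 | dialed_ip
#   getent hosts Netapp01
```

If the two addresses differ, the name is resolving differently inside the container than on the host.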

real cedar
#

Yes, this was run from the same machine as the docker logs command. The netapp01 IP is different, though: it is 10.15.187.22

dapper orbit
#

To rule out DNS issues, what if you change your harvest.yml's addr to use the IP address for the cluster instead of the domain name? If you're using NAbox, try changing the Host field from the domain name to the IP.

real cedar
#

ok..let me try and get back to you

fading cairn
#

Do double-check your NAbox DNS settings; I've seen instances where the search domain was discarded.

real cedar
#

DNS settings look fine

#

2024-02-14T18:27:18Z ERR poller/poller.go:1161 > Failed to negotiateAPI error="connection error => Post \"https://10.15.187.226:443/servlets/netapp.servlets.admin.XMLrequest_filer\": tls: failed to verify certificate: x509: cannot validate certificate for 10.15.187.226 because it doesn't contain any IP SANs" Poller=netapp01 collector=Zapi
2024-02-14T18:27:18Z ERR poller/poller.go:1161 > Failed to negotiateAPI error="connection error => Post \"https://10.15.187.226:443/servlets/netapp.servlets.admin.XMLrequest_filer\": tls: failed to verify certificate: x509: cannot validate certificate for 10.15.187.226 because it doesn't contain any IP SANs" Poller=netapp01 collector=ZapiPerf
2024-02-14T18:27:18Z WRN poller/poller.go:669 > abort collector error="connection error => connection error => Post \"https://10.15.187.226:443/servlets/netapp.servlets.admin.XMLrequest_filer\": tls: failed to verify certificate: x509: cannot validate certificate for 10.15.187.226 because it doesn't contain any IP SANs" Poller=netapp01 collector=ZapiPerf object=VolumeNode
2024-02-14T18:27:18Z WRN poller/poller.go:306 > no collectors initialized, stopping Poller=netapp01
2024-02-14T18:27:18Z INF poller/poller.go:531 > cleaning up and stopping [pid=1] Poller=netapp01

#

Is this a certificate issue?

dapper orbit
#

yes

#

If the cert is self-signed or invalid, you need use_insecure_tls: true included for that poller. NAbox shows that in the UI as a checkbox.
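
To see why the error mentions IP SANs, one option is to inspect the certificate the cluster presents. This is a rough sketch, assuming `openssl` is installed on the Docker host. The helper name `has_ip_san` is hypothetical; it only checks the text form of a certificate for an exact `IP Address:` entry.

```shell
#!/bin/sh
# has_ip_san: succeed if the certificate text on stdin (as printed by
# `openssl x509 -noout -text`) lists the given IP as a Subject Alternative Name.
# Hypothetical helper for illustration.
has_ip_san() {
  # Split the comma-separated SAN list into lines, trim leading blanks,
  # then look for an exact, fixed-string match on "IP Address:<ip>".
  tr ',' '\n' | sed 's/^[[:space:]]*//' | grep -Fxq "IP Address:$1"
}

# Usage (not run here):
#   openssl s_client -connect 10.15.187.226:443 </dev/null 2>/dev/null \
#     | openssl x509 -noout -text | has_ip_san 10.15.187.226 \
#     && echo "IP SAN present" || echo "no IP SAN for this address"
```

When the certificate has no IP SAN, connecting by IP fails verification unless use_insecure_tls: true is set for the poller.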

dapper orbit
#

did selecting that checkbox make the poller work?

real cedar
#

I made the change to harvest.yml file with docker and it worked

#

However, I don't see a port on the netapp poller in the docker ps output

#

CONTAINER ID   IMAGE                           COMMAND                  CREATED      STATUS             PORTS                    NAMES
a7e72859a316   ghcr.io/netapp/harvest:latest   "bin/poller --poller…"   6 days ago   Up About an hour                            poller-netapp01
b26883f63bc5   ghcr.io/netapp/harvest:latest   "bin/poller --poller…"   6 days ago   Up About an hour                            poller-unix
9dc119dd030d   grafana/grafana:8.3.4           "/run.sh"                6 days ago   Up 6 days          0.0.0.0:3000->3000/tcp   grafana
e0a28ef9ee6d   prom/prometheus:v2.33.1         "/bin/prometheus --c…"   6 days ago   Up 6 days          0.0.0.0:9090->9090/tcp   prometheus

#

I don't see any metrics in the Grafana dashboard, and my NetApp cluster's datacenter details are missing, so it looks like the metrics are not reaching Grafana.

dapper orbit
#

are you using the compose workflow or nabox? In an earlier msg, it sounded like you were using nabox?

real cedar
#

Trying both

dapper orbit
#

Depending on how you created the harvest-compose file, it does not expose the port. In the compose workflow that is fine, since the Prometheus container can see the poller ports because they're in the same network. Can you paste your harvest.yml file using doctor by running the following? cd to the directory that contains your harvest.yml file, then run:

docker run --rm --env UID=$(id -u) --env GID=$(id -g) --entrypoint "bin/harvest" --volume "$(pwd):/opt/temp" --volume "$(pwd)/harvest.yml:/opt/harvest/harvest.yml" ghcr.io/netapp/harvest doctor --print

real cedar
#

Admin:
Tools:
Exporters:
  prometheus:
    exporter: Prometheus
    local_http_addr: 0.0.0.0
    port_range: 2000-2030
  prometheus1:
    exporter: Prometheus
    port_range: 13000-14000
Defaults:
  collectors:
    - Zapi
    - ZapiPerf
    - Ems
    - Rest
    - RestPerf
  use_insecure_tls: true
Pollers:
  unix:
    datacenter: local
    addr: -REDACTED-
    collectors:
      - Unix
    exporters:
      - prometheus
  us01cmqa:
    datacenter: us01
    addr: -REDACTED-
    auth_style: basic_auth
    username: -REDACTED-
    password: -REDACTED-
    exporters:
      - prometheus1

#

How can I find prom-ip?

dapper orbit
#

if you are on the same machine that you ran docker ps on, you can use localhost:9090
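
Once Prometheus is reachable there, its targets endpoint shows whether the pollers are being scraped. A small sketch, assuming Prometheus' `/api/v1/targets` HTTP API and its compact JSON output; the helper name `count_health` is hypothetical and uses plain `grep` rather than a JSON parser.

```shell
#!/bin/sh
# count_health: count targets in a given health state ("up", "down", "unknown")
# in the JSON read from stdin. Hypothetical helper for illustration; it relies
# on the response containing no whitespace inside "health":"<state>".
count_health() {
  grep -o "\"health\":\"$1\"" | wc -l | tr -d ' '
}

# Usage (not run here):
#   curl -s http://localhost:9090/api/v1/targets | count_health up
#   curl -s http://localhost:9090/api/v1/targets | count_health down
```

A poller that appears as down, or does not appear at all, would explain empty Grafana dashboards even when the poller container itself is running.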

real cedar
dapper orbit
real cedar
#

[root@it-devops04 harvest-22.08.0-1_linux_amd64]# cat harvest-compose.yml
version: "3.6"

services:

  unix:
    # image: cr.netapp.io/harvest:latest
    image: ghcr.io/netapp/harvest:latest
    container_name: poller-unix
    restart: unless-stopped
    command: '--poller unix --promPort 12990 --config /opt/harvest.yml'
    volumes:
      - /root/harvest-22.08.0-1_linux_amd64/conf:/opt/harvest/conf
      - /root/harvest-22.08.0-1_linux_amd64/cert:/opt/harvest/cert
      - /root/harvest-22.08.0-1_linux_amd64/harvest.yml:/opt/harvest.yml
    networks:
      - backend

  us01cmqa:
    # image: cr.netapp.io/harvest:latest
    image: ghcr.io/netapp/harvest:latest
    container_name: poller-us01cmqa
    restart: unless-stopped
    command: '--poller us01cmqa --config /opt/harvest.yml'
    volumes:
      - /root/harvest-22.08.0-1_linux_amd64/conf:/opt/harvest/conf
      - /root/harvest-22.08.0-1_linux_amd64/cert:/opt/harvest/cert
      - /root/harvest-22.08.0-1_linux_amd64/harvest.yml:/opt/harvest.yml
    networks:
      - backend

[root@it-devops04 harvest-22.08.0-1_linux_amd64]#

dapper orbit
fading cairn
#

I can take over on the NAbox option though

real cedar
#

@dapper orbit .. The process you mentioned fixed the issue.. I am able to see metrics in Grafana now