# New install, unable to add one NetApp cluster

steel rain

Hello team
We have installed NAbox at a customer site to look into some performance issues.
We are unable to add the one cluster we are most interested in. The NAbox web UI just returns a very helpful "An Error Occurred".
We are able to add another cluster without problems.
Both clusters are on the same local network as NAbox (no routers, no firewalls).
Users and roles were added to both clusters as per the documentation.
NAbox v3.3; both clusters run ONTAP 9.12.1P4.

The cluster we are unable to add has been upgraded, migrated and head-swapped, so there might be some old default configuration in place, although I haven't found anything relevant.

The customer has set up MFA and SAML on the problem cluster, but the "harvest2" user should be using password authentication.

In the log of the nabox-harvest2 container we see some entries that might indicate problems communicating with the cluster:

```
nabox-harvest2 | netapp_lib.api.zapi.zapi.NaApiError: NetApp API failed. Reason - URL error:URLError(ConnectionResetError (104, 'Connection reset by peer'))
```

How can we troubleshoot this problem?

Best regards,
Karl

proven mulch

Hi Karl. This might indeed be due to SAML; let me do some research. There was a situation a while back where SAML prevented adding a system with a similar error, but I thought a workaround was in place. In the meantime, maybe the Harvest team has some input regarding SAML.

swift summit

@steel rain let's try calling the ONTAP system directly to see if ZAPI returns any response. Please execute the following curl command on the affected poller. You need to replace USER, PASS, and CLUSTER_IP with the appropriate values:

```
curl --connect-timeout 30 --user USER:PASS --insecure --data-ascii '<?xml version="1.0" encoding="UTF-8"?>
<netapp xmlns="http://www.netapp.com/filer/admin" version="1.130">
  <system-get-version/>
</netapp>' -H "Content-Type: text/xml" 'https://CLUSTER_IP/servlets/netapp.servlets.admin.XMLrequest_filer'
```

steel rain
```
<!DOCTYPE netapp SYSTEM 'file:/etc/netapp_gx.dtd'>
<netapp version='1.221' xmlns='http://www.netapp.com/filer/admin'>
<results status="passed"><build-timestamp>1686084094</build-timestamp><is-clustered>true</is-clustered><version>NetApp Release 9.12.1P4: Tue Jun 06 20:41:34 UTC 2023</version><version-tuple><system-version-tuple><generation>9</generation><major>12</major><minor>1</minor></system-version-tuple></version-tuple></results></netapp>
```

proven mulch

Oh well… I guess back to NAbox then…

swift summit

Can you share the output of the command below from the NAbox shell?

```
dc exec -w /conf nabox-harvest2 /netapp-harvest/bin/harvest doctor --print
```

steel rain

Thank you very much, both of you!
My customer contact is very busy and I have no direct access. I will get the output ASAP.

steel rain

```
nabox:~# dc exec -w /conf nabox-harvest2 /netapp-harvest/bin/harvest doctor --print
Defaults:
  collectors:
    - Zapi
    - ZapiPerf
    - Rest
  exporters:
    - nabox-prometheus
Exporters:
  nabox-prometheus:
    addr: -REDACTED-
    exporter: Prometheus
    master: true
Pollers:
  _unix:
    addr: -REDACTED-
    autostart: '1'
    collectors:
      - Unix
    datacenter: NAbox
    prometheus_port: 12800
```

swift summit

Thanks @steel rain. The poller is not listed in the file, which indicates that the issue likely originates on the NAbox side rather than in Harvest. @proven mulch will be able to help with this.

steel rain

As of now, we have no clusters added to NAbox.
We removed the cluster that we were able to add (so as not to fill up the logs with normal activity).
The cluster we are NOT able to add just returns "An Error Occurred".

proven mulch

Please also try:

```
curl --connect-timeout 30 --user USER:PASS --insecure --data-ascii '<?xml version="1.0" encoding="UTF-8"?>
<netapp xmlns="http://www.netapp.com/filer/admin" version="1.130">
  <cluster-identity-get/>
</netapp>' -H "Content-Type: text/xml" 'https://CLUSTER_IP/servlets/netapp.servlets.admin.XMLrequest_filer'
```

Another thing we can try is to add the configuration manually in harvest.yml and see if Harvest then works correctly.

```
cd /opt/harvest2-conf/
vi harvest.yml
```

You can duplicate an existing section in this file from another cluster that is working, and increment the port number in prometheus_port.
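For illustration only, a duplicated poller section might look roughly like the sketch below. The poller name (taken from the cluster name reported later in this thread), CLUSTER_IP, the password placeholder and the port 12801 are assumptions to adapt; `auth_style`, `username`, `password`, `use_insecure_tls` and `prometheus_port` are standard Harvest poller keys, but compare against the working section in your own harvest.yml rather than copying this verbatim.

```yaml
Pollers:
  # Keep the existing, working poller section(s) as they are (not shown).
  # Hypothetical copy for the problem cluster; name, address and
  # credentials are placeholders.
  nsc-cl01:
    datacenter: NAbox
    addr: CLUSTER_IP
    auth_style: basic_auth
    username: harvest2
    password: PASS
    use_insecure_tls: true
    # Must be unique per poller: increment the port of the section you
    # copied (12801 here is an assumed example value).
    prometheus_port: 12801
```

Collectors and exporters are left out in this sketch on the assumption that the poller should inherit them from the `Defaults` section of the same file.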


The error you are getting exactly matches a previous case where the user couldn't add the system when SAML was in use, and I was able to confirm that a mitigation is in place. Also, that wouldn't explain why curl works. Are you sure you are running curl against the right ONTAP system?

steel rain

I'll be on site at the customer tomorrow, so I will be able to try your suggestions then. I can't guarantee that curl was run against the correct cluster, but I will verify that, as well as add another cluster and copy its config in harvest.yml. Good suggestions there, thank you!

steel rain

I've been sick, but I now have remote access to some of the systems, so it is easier to get the output you'd like to see.
I triple-checked, and we are able to access the right cluster with curl as you asked, @proven mulch:

```
<!DOCTYPE netapp SYSTEM 'file:/etc/netapp_gx.dtd'>
<netapp version='1.221' xmlns='http://www.netapp.com/filer/admin'>
<results status="passed"><attributes><cluster-identity-info><cluster-contact></cluster-contact><cluster-location>NSC-DU</cluster-location><cluster-name>nsc-cl01</cluster-name><cluster-serial-number>1-80-007197</cluster-serial-number><cluster-uuid>f9f6d909-82ad-11e3-9b4b-123478563412</cluster-uuid><rdb-uuid>f9f7052a-82ad-11e3-9b4b-123478563412</rdb-uuid></cluster-identity-info></attributes></results></netapp>
```

I also tried to add another cluster to NAbox and copy its config in harvest.yml. It still fails.

Ehh, actually, it seems to be working today!? Let me check.

steel rain

Yes, it seems that copying the config of the working cluster in harvest.yml and editing it for the problem cluster worked. It was the last thing I got done on Friday, but I didn't notice until today.
I actually got an error on Friday; maybe it just took some time.
It seems my problem is solved for now. Is there any additional info you want at this time to identify what the problem was?
Anyway, thank you very much for helping me here, Yann and Rahul.

shut flint

Glad to hear you're unblocked, @steel rain! If you have logs for the poller in question from when the original problem happened, we can take a look at them to see if anything jumps out. If so, feel free to grab them and send them to ng-harvest-files@netapp.com.

proven mulch

If I understand correctly, you manually added the failed cluster to the configuration and it works, but it still fails if you try to add it using the NAbox UI?