#AIQUM 9.18.1RC1 issues in adding storage to monitor

spark gull
#

Hello folks,

I have reinstalled AIQUM 9.18.1RC1 in the lab to test things, but I am not able to add anything to it. I get this:

2025-12-07 23:42:44 [:ERR]:maint:API:action:[10.7.2.41]::Unable to add cluster datasource. This can occur if the clocks on the systems are not synchronized and the Active IQ Unified Manager HTTPS certificate start date is later than the date on the cluster, or if the cluster has reached the maximum number of EMS notification destinations.

The NTP servers are the same as on the storage, and the timezone is also the same.

The systems I am trying to add are A400s and A70s at the 9.16.1P9 level.

hallow drift
#

that is unfortunately a generic error with adding clusters, and time is not a usual culprit.

this is likely an issue with the cloud agent, it will try to use that by default for ontap versions 9.14 and up. you need port 56443 open between aiqum and the cluster.

but to be sure, you would need to check the server_acq.log and see what it says. there's several things it could be.
https://kb.netapp.com/data-mgmt/AIQUM/AIQUM_Kbs/What_are_the_notable_log_files_and_their_respective_locations_for_Active_IQ_Unified_Manager
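
A plain TCP connect test can rule the port in or out before digging through logs. This is only a sketch: `port_open` is a made-up helper name, and running it from the AIQUM server toward the cluster is an assumption, so run it from whichever side your firewall rules are written for. A successful connect only shows the port is reachable, not that the cloud agent handshake will succeed.

```python
import socket


def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `port_open("10.7.0.50", 56443)` would test the cluster address that appears in the logs further down; substitute your own cluster management address.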

spark gull
#

Good point, I did not notice the change in port requirements; I will request that the new ports be opened.

spark gull
#

I reinstalled the AIQUM server to 9.16.1P2 but nothing changed

#

getting :

#

Sequence number
3658599
Description
This message occurs when the cluster agent connection is not in a connected state. Reconnect attempt to establish a connection will be initiated automatically.
Event
mhost.ca.connect.failure: Cluster agent connection of the client: UnifiedManager_05509979-31b3-43cb-97f3-e0a30191a7d9 is not healthy. Attempting to reconnect. Error: Certificate error: IP address mismatch..
Action
Verify that the connection path to the destination is in healthy state and network security polices are configured as needed to permit connectivity. If this behavior persists, contact NetApp technical support.

#

this is from the ONTAP side of the storage

#

in the AIQUM logs:

#

2025-12-09 19:21:29 [:INFO]:maint:API:in:[10.7.2.42]::GET: Fetching cluster certificate for host 10.7.0.50 over port 443
2025-12-09 19:21:38 [:INFO]:maint:API:action:[10.7.2.42]::Adding datasource host 10.7.0.50
2025-12-09 19:24:56 [:ERR]:maint:API:action:[10.7.2.42]::Unable to add cluster datasource. This can occur if the clocks on the systems are not synchronized and the Active IQ Unified Manager HTTPS certificate start date is later than the date on the cluster, or if the cluster has reached the maximum number of EMS notification destinations.
2025-12-09 19:25:12 [:INFO]:maint:API:in:[10.7.2.42]::GET: Successfully fetched all datasources.
2025-12-09 19:25:17 [:INFO]:maint:API:in:[10.7.2.42]::GET : Fetched list of Audit Log Files.
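
When the log is long, pulling out only the ERR entries makes it easier to spot the failing step. A minimal sketch, assuming every entry follows the layout of the lines pasted above (timestamp, `[:LEVEL]`, some colon-separated fields, a bracketed address, then `::` and the message). The names `Entry`, `parse_entries` and `errors` are made up for illustration, and treating the bracketed value as an address is a guess based on these samples.

```python
import re
from typing import Iterable, List, NamedTuple

# Layout inferred from the sample lines in this thread, not from product docs.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[:(?P<level>\w+)\]:.*?"
    r"\[(?P<addr>[^\]]+)\]::(?P<msg>.*)$"
)


class Entry(NamedTuple):
    ts: str
    level: str
    addr: str  # bracketed value; assumed to be an IP address
    msg: str


def parse_entries(lines: Iterable[str]) -> List[Entry]:
    """Parse lines that match the assumed layout and skip everything else."""
    entries = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            entries.append(Entry(m["ts"], m["level"], m["addr"], m["msg"]))
    return entries


def errors(lines: Iterable[str]) -> List[Entry]:
    """Return only the entries logged at ERR level."""
    return [e for e in parse_entries(lines) if e.level == "ERR"]
```

Feeding it the five lines above would return the single 19:24:56 "Unable to add cluster datasource" entry.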

hallow drift
#

if you check the cluster, does it have a cert named "05509979-31b3-43cb-97f3-e0a30191a7d9" already? try removing that cert and then try readding the cluster.
https://kb.netapp.com/data-mgmt/AIQUM/AIQUM_Kbs/Unable_to_add_a_cluster_in_AIQUM_Certificate_error__IP_address_mismatch

if that doesn't work, you might need to check the hostname for the server you installed aiqum on. there's also a custom.hostname in aiqum's settings you can try setting.
https://kb.netapp.com/data-mgmt/AIQUM/AIQUM_Kbs/AIQUM_fails_to_add_cluster_datasource_due_to_cloud_agent_unable_to_establish_connection_from_certificate_error_IP_address_mismatch
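
To find the cert name to look for, the UUID can be pulled out of the EMS event text. A small sketch, assuming the client name always has the form `UnifiedManager_<uuid>` as in the event pasted above; `client_uuid` is a hypothetical helper name.

```python
import re
from typing import Optional

# Matches "UnifiedManager_" followed by a UUID in 8-4-4-4-12 hex layout.
CLIENT_RE = re.compile(
    r"UnifiedManager_"
    r"(?P<uuid>[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"
)


def client_uuid(event_text: str) -> Optional[str]:
    """Return the UUID from a 'UnifiedManager_<uuid>' client name, or None."""
    m = CLIENT_RE.search(event_text)
    return m.group("uuid") if m else None
```

Run against the mhost.ca.connect.failure text above, this returns `05509979-31b3-43cb-97f3-e0a30191a7d9`, which is the name to search for in the cluster's certificate list.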

viscid geode
#

Had the same issue. You need to have 'localhost' and the IP of your AIQUM server in the SAN of the HTTPS certificate. After generating a new CSR with 'localhost' and the IP in the SAN, I installed the new HTTPS certificate (issued by our CA from that CSR), and then you should be able to add the cluster. (Now I'm getting the error 'failed in conversion', but that's nothing for this thread ^^)
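
To check whether a certificate already carries both entries, the SAN list can be compared against the names you need. This is a sketch, not an official check: `missing_san_entries` is a made-up helper, and it expects the dict shape that Python's `ssl.SSLSocket.getpeercert()` returns. Note that `getpeercert()` only returns a populated dict when the certificate was validated during the handshake.

```python
from typing import Iterable, List, Mapping


def missing_san_entries(cert: Mapping, required: Iterable[str]) -> List[str]:
    """Return the required names that do not appear in the cert's SAN list."""
    # getpeercert() reports SANs as pairs such as ("DNS", "localhost")
    # or ("IP Address", "192.0.2.10"); only the value part is compared here.
    present = {value.lower() for _kind, value in cert.get("subjectAltName", ())}
    return [name for name in required if name.lower() not in present]
```

With a cert whose SAN holds only the server's hostname and IP, `missing_san_entries(cert, ["localhost", "192.0.2.10"])` returns `["localhost"]` (the IP here is a placeholder for your AIQUM server address).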

wide rapids
#

So this is interesting as in my case, the request system we use does not allow adding 'localhost' and IP in the SAN. So based on what you just described, I will not be able to use Aiqum against my storage clusters moving forward since I will never be able to meet this requirement.

hallow drift
#

localhost can probably be eliminated by setting the custom hostname option in the aiqum cli. this has worked to get rid of localhost showing up for snmp and mail links.

ip might be more problematic, but if you add by hostname it should just try to verify the name.

spark gull
#

I have tested a bit more, and older ONTAP versions like 9.15.1 can be monitored in AIQUM 9.18.1, but versions newer than 9.17.1 give this error:

#

Description:
Forces acquisition on a Data Source. Will wait until acquisition fails or completes or there is a timeout.
Failure Reason:
Errors reported on acquisition with start time 15:03:50.431: [Failed in conversion]

#

it is this "Failed in conversion" error

#

it is added to AIQUM but it does not collect data

fervent peak
#

You can check the au.log to try and get more details on why it failed. Usually it is due to either not getting back something that it expects from Ontap or issues with processing the data it does receive.

spark gull
#

From what I see, it looks like this:

#

GMode Conversion exception
AggregatedDataSourceErrorsException: 0 out of 1 devices acquired successfully
Failed in conversion

#

with some entries: NetAppOCIEDataSource.doFoundation, GClusterBuilder

#

/servlets/netapp.servlets.admin.XMLrequest_filer is not exposed

#

user@RO-AVASILE:~$ curl -k -u admin https://172.20.0.110/servlets/netapp.servlets.admin.XMLrequest
Enter host password for user 'admin':
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>404 Not Found</title>
</head><body>
<h1>Not Found</h1>
<p>The requested URL was not found on this server.</p>
</body></html>
user@RO-AVASILE:~$

#

in the logs: "GMode Conversion exception com.onaro.sanscreen.acquisition.framework.datasource.AggregatedDataSourceErrorsException: 0 out of 1 devices acquired successfully."

hollow kettle
#

I fought the "cloud agent" for about a week when I first upgraded to 9.16, and in the end it was just easier to disable it. Just change enable.cloudagent to "false" in /opt/netapp/essentials/conf/server.properties and restart the ocie and ocieau services.
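
If you would rather script the change than edit by hand, something like the following could do the text edit. It is only a sketch: `set_property` is a made-up helper that works on the file's text and never touches the file itself, it only understands simple `key=value` lines (not the `:` separator or line continuations that the properties format also allows), and whether the key is already present in your server.properties is not something this thread confirms, so it appends the key when missing. Back up the file before writing anything back.

```python
def set_property(text: str, key: str, value: str) -> str:
    """Return properties text with key set to value, appending it if absent."""
    out, found = [], False
    for line in text.splitlines():
        stripped = line.lstrip()
        # Leave comment lines alone; rewrite "key=..." or "key = ...".
        if not stripped.startswith(("#", "!")) and "=" in stripped:
            name = stripped.split("=", 1)[0].strip()
            if name == key:
                line, found = f"{key}={value}", True
        out.append(line)
    if not found:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"
```

For example, `set_property("enable.cloudagent=true\n", "enable.cloudagent", "false")` returns `"enable.cloudagent=false\n"`. The services still need the restart described above.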

#

From what I understood at the time, REST responses are just too slow and adding systems never completes, among other things. Harvest takes a more practical approach and pulls what it can from zapi, then uses rest for things that are rest-only, given the correct priorities in its configuration.

hallow drift
# hollow kettle From what I understood at the time, REST responses are just too slow and adding ...

not slower usually, but missing bits of data we're used to getting with ZAPI. the 'doesn't respond quickly' errors usually happen because aiqum is still trying to use the cloud agent connection we created for the REST APIs, but ONTAP has deleted the connection for one reason or another. we are trying to get on top of these REST issues by not just disabling the cloud agent for every problem, but for that we need the cases.

I agree it's not ideal.

neat venture
#

Cloud Agent and mTLS have really been a pain-in-the-a** over the last year... We had so many customers where NetApp support ultimately disabled it since the AIQUM connection to the clusters was not working smoothly.

hollow kettle
spark gull
#

STEP 2) Logged in with diag, switched to root, and used nano / vi to edit the file server.properties, setting "enable.cloudagent=false"