#adding cluster: failed in conversion

strange crystal
#

Anyone else have the issue with adding a cluster to AIQUM resulting in 'failed in conversion'?

Short background story:

  • we had AIQUM (last running version: 9.16P2) for ~ 1 year

  • we ran into: The Mutual TLS certificate for the cluster … has expired

  • we were not able to get it running again (refreshed the internal & external certificates without any success)

  • deployed a new AIQUM

  • first 9.18RC1

  • was not able to add a cluster. Ran into a certificate IP mismatch; figured out that you need both 'localhost' and the IP address in the SAN entries

  • after fixing this, I was able to add a cluster

  • discovery runs into 'Failed in conversion'

  • dropped 9.18RC1, deleted all certs in ONTAP & also removed the user with auth-method 'cert'.

  • deployed 9.16P2 again (fresh deployment)

  • added one cluster = Ok

  • Certificates, ems event filter & destination are created

  • user auth-method 'cert' for 'http' and 'ontapi' are created

  • discovery also runs into 'Failed in conversion'
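
The SAN requirement from the list above (both 'localhost' and the IP address) can be checked with a small sketch like the one below. It assumes the certificate is available as a dict shaped like the result of Python's `ssl.SSLSocket.getpeercert()`; the function name `missing_san_entries` and the example values are hypothetical, not anything AIQUM ships.

```python
import ipaddress


def _normalize(value):
    """Canonicalize IP addresses; lowercase everything else (DNS names)."""
    try:
        return str(ipaddress.ip_address(value.strip()))
    except ValueError:
        return value.strip().lower()


def missing_san_entries(cert, required):
    """Return the required names/IPs that are absent from subjectAltName.

    `cert` is assumed to look like the dict from ssl.SSLSocket.getpeercert(),
    e.g. {"subjectAltName": (("DNS", "host"), ("IP Address", "192.0.2.10"))}.
    """
    present = {
        _normalize(value)
        for kind, value in cert.get("subjectAltName", ())
        if kind in ("DNS", "IP Address")
    }
    return [name for name in required if _normalize(name) not in present]


# Hypothetical example data; 192.0.2.10 is a documentation-only address.
cert = {
    "subjectAltName": (
        ("DNS", "aiqum.example.com"),
        ("IP Address", "192.0.2.10"),
    )
}
print(missing_san_entries(cert, ["localhost", "192.0.2.10"]))  # ['localhost']
```

An empty list would mean every required entry is present in the SAN list.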

-> no firewall between ONTAP & AIQUM
-> custom.hostname already set
-> AIQUM user has admin role (in http, console, ontapi, amqp)
-> /etc/hosts already 'fixed' with replacing 127.0.1.1 with actual IP
-> no errors in notifyd.log, no errors in mgwd.log (ONTAP; the 'event log show' command shows no errors either)
-> server_acq.log shows: Certificates, ems event filter & destination are created; “Completed executing forced acquisition”

The only other thing I noticed: the date & timezone of ONTAP & AIQUM are the same, but the AIQUM logs are 1h behind.

I already have an open case, but maybe someone here can point me in the right direction.

ionic coyote
manic sonnet
#

failed in conversion is usually an error with rest monitoring with aiqum. it can be that bug, or it might be something else, there's a few different versions of them.

Another thing you can check is make sure the ontap licensing is the same on all nodes, you might need to apply your ontap one license again or get licenses for new nodes you've added to the cluster (if you've expanded it). I forget if this had the failed in conversion message along with other key messages, but it's a simple enough check.

Having a case open is the right step, they should get you to L2 fairly soon if you've been able to provide a support bundle and you don't match a known issue. I'd pull up your case myself, but i'm on vacation 🙂

#

if you're able to post the full failed in conversion error (mask any hostnames) and possibly one or two errors before that, i could tell you better
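
A rough sketch of how such an excerpt could be prepared before posting: take the line containing the error, keep the one or two ERROR lines before it, and mask IPv4 addresses and known hostnames. The function name, the `" ERROR "` marker (taken from the au.log line format quoted later in this thread), and the sample lines are assumptions for illustration only.

```python
import re

# Matches dotted-quad IPv4 addresses such as 10.128.16.23.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def excerpt_for_posting(lines, needle, errors_before=2, hostnames=()):
    """Return the first line containing `needle` plus the ERROR lines
    just before it, with IPs and the given hostnames masked."""
    idx = next(
        (i for i, line in enumerate(lines) if needle.lower() in line.lower()),
        None,
    )
    if idx is None:
        return []
    earlier = [line for line in lines[:idx] if " ERROR " in line]
    earlier = earlier[-errors_before:] if errors_before > 0 else []
    masked = []
    for line in earlier + [lines[idx]]:
        line = IPV4.sub("x.x.x.x", line.rstrip("\n"))
        for host in hostnames:
            line = line.replace(host, "<host>")
        masked.append(line)
    return masked


# Hypothetical sample lines, not real AIQUM output.
sample = [
    "2025-09-24 19:52:50,100 INFO [main] acquisition started",
    "2025-09-24 19:52:51,697 ERROR [common-pool-71] 10.128.16.23 - ExecutionException encountered on Builder",
    "2025-09-24 19:52:52,004 ERROR [main] cluster01.example.com - Failed in conversion",
]
for line in excerpt_for_posting(
    sample, "failed in conversion", hostnames=["cluster01.example.com"]
):
    print(line)
```

The masking is deliberately simple; anything else that identifies the environment (cluster names, serial numbers) still needs a manual look before posting.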

#

the easy fix is to disable the cloud agent, but we would prefer you work through support to make sure we get all of these bugs identified. if we have a fix, we can tell you what it is (future upgrade or something tweaked on the cluster side) without disabling it.

#

we can't fix the feature if we're turning it off all of the time, although we do understand the frustration.

#

if you're able to deploy the 9.18RC1 again and add the cluster and leave it in a broken state while disabling the cloud agent for your 9.16P2 for prod monitoring, that should be acceptable too, just make sure your case owner is aware.

strange crystal
# ionic coyote <@768872331035672596> You might want to check ntap version https://kb.netapp.com...

I do not have any 'EOFException' messages in au.log.
I deleted the user from the cluster (with all auth-methods), removed all certs created by AIQUM and added the cluster again in AIQUM: same result. 🙁

I don't think that the cert-based auth has anything to do with my issue, as I'm able to add the cluster & AIQUM also creates all event filters & certs. But I'm not sure, good to know that it's possible to disable cert-based auth. 🙂

strange crystal
# manic sonnet failed in conversion is usually an error with rest monitoring with aiqum. it can...

We did not expand the cluster. We refreshed the complete MC setup ~ 13 months ago and reinstalled AIQUM at the end of the refresh / migration. All LICs are the same and were never expanded as far as I know.

Attached are some error messages from au.log after adding the cluster to AIQUM again & also the last part with the conversion error.

I have a call with the case owner on Wednesday... let's see what we can figure out.

manic sonnet
#

hmmm, i'd expect something aggregate related with that stack trace, or possibly object store related. i'm not recalling anything specific off the top of my head. if you just refreshed the clusters, are there data aggregates created yet?

if you have data aggregates, this requires deeper digging than i can do at the moment. your case owner will either dig deeper or should get you up to L2.
if you don't have any, try creating one and see if it goes away, I do recall seeing an issue once with monitoring if there were no data aggrs on the cluster.

strange crystal
#

The 4-node MC has been running in production for ~13 months. There are some aggregates & a lot of volumes. 🙂
Thank you for your time! Hopefully we will find something on Wednesday. 🤞

manic sonnet
#

yep, just have to work through the support process. if it's still ongoing next week, drop me a DM with the case number and I'll take a peek and make sure it's moving along.

#

i'll be back in the office then

stable moat
#

I have run into this issue as well. From the logs above:
"Unknown SnapMirror Policy Type: continuous"

This indicates you are using SnapMirror S3 and was an exact match for the error in my case.
As my environment was an internal lab, I just deleted my SnapMirror S3 relationships and AIQUM was happy.

I did have an email conversation with an AIQUM EE regarding this but in the end I just disabled the cloud agent.

If you PM me a support case# I'd be happy to ping that EE again to let him know we have caught one in the wild.

strange crystal
#

Hi @stable moat, no, we do not use SnapMirror S3. We are just using FabricPool with on-prem StorageGrid. Nothing else S3-related on the ONTAP side of things. 🙁 (Not sure if FabricPool uses SnapMirror S3 in the background?)

Sure, I can PM you the case# anyway. Thanks 🙂

stable moat
# strange crystal Hi @stable moat, no we do not use SnapMirror S3. We are just using Fabr...

My mistake!

Rechecked my old notes and, while I had that same message regarding the license, I had more SnapMirror-related stuff in my au.log:
2025-09-24 19:52:51,697 ERROR [common-pool-71] c.o.s.a.d.n.b.r.g.n.GClusterBuilder (GClusterBuilder.java:251) - [netappfoundation] 10.128.16.23 - ExecutionException encountered on Builder
java.util.concurrent.ExecutionException: java.lang.NullPointerException
java.lang.NullPointerException
    at com.onaro.sanscreen.acquisition.datasource.netapp_ocie.builders.rest.gmode.netappfoundation.GSnapMirrorBuilder.createChildSnapmirror(GSnapMirrorBuilder.java:91) ~[au-datasource-netappfoundation.jar:9.16.0-2025.05.J22]
    at com.onaro.sanscreen.acquisition.datasource.netapp_ocie.builders.rest.gmode.netappfoundation.GSnapMirrorBuilder.buildModel(GSnapMirrorBuilder.java:70) ~[au-datasource-netappfoundation.jar:9.16.0-2025.05.J22]

Which was my real error.

Will have a look at the logs you provided to the support case.
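
For anyone comparing their own au.log against the messages mentioned in this thread, a minimal sketch along these lines could count the matching lines. The three search strings are copied from the posts above; the dictionary keys and the function name are made up for illustration, and a match is only a hint about where to look, not a diagnosis.

```python
# Search strings quoted in this thread; the short keys are hypothetical labels.
SIGNATURES = {
    "snapmirror_policy": "Unknown SnapMirror Policy Type",
    "snapmirror_npe": "GSnapMirrorBuilder.createChildSnapmirror",
    "eof": "EOFException",
}


def match_signatures(lines):
    """Count how many lines contain each signature string."""
    hits = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, text in SIGNATURES.items():
            if text in line:
                hits[name] += 1
    return hits


# Example with fragments quoted in the thread.
log_lines = [
    "Unknown SnapMirror Policy Type: continuous",
    "at com.onaro.sanscreen.acquisition.datasource.netapp_ocie.builders.rest"
    ".gmode.netappfoundation.GSnapMirrorBuilder.createChildSnapmirror"
    "(GSnapMirrorBuilder.java:91)",
]
print(match_signatures(log_lines))
```

For a real file, the lines could come from `open(path, errors="replace")`, with `path` pointing at wherever au.log lives on the AIQUM host.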

strange crystal
#

Alright, we added some iptables rules, which solved some issues in the logs. (We are using the OVA VM template; not sure why ports 56072 and 56080 are not open to begin with...)
Anyway, we still had the same issue at the end. We disabled the cloud agent and everything worked after a re-discover. Not sure why this is not the default setting for now, as "there is a long line of customers facing the same issue and netapp need to fix it. It's a known bug.".
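
A quick way to confirm whether ports like 56072 and 56080 are reachable after changing firewall rules is a plain TCP connect test. This is a generic sketch using Python's standard `socket` module; which host needs these ports open, and in which direction, is not stated in this thread, so the host you test against is your own assumption.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `port_is_open("aiqum.example.com", 56072)` (hypothetical hostname) returns `False` if the port is closed, filtered, or the name does not resolve, so a `False` alone does not say which of those it was.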

stable moat
#

I can see in the case notes that you disabled the Cloud Agent and got things working again (reverting to ZAPI instead of REST).

If you want to go modern again, with REST, there is a KB article with an exact match of what you ran into.
https://kb.netapp.com/data-mgmt/AIQUM/AIQUM_Kbs/MetroCluster_acquisition_via_REST_fails_in_AIQUM_with_NPE_on_GMode_Conversion?caseId=2010347649
You have a MetroCluster and the error in your au.log matches 100%.

The KB article is just a couple of days old, so no wonder it wasn't found at first.
To get access to the patch being referred to, just ask the support engineer.

strange crystal
#

Hey, thank you for the KB. I'll install a second AIQUM and check if I can get it running. Just need to find some spare time for that.
Currently everything is running and support told me that ZAPI will still be there for a few years.