I have replaced one node in our 4-node cluster (all FAS2720). I followed the guide, and everything went smoothly up until the encryption part...
The guide states that you need to resync the keys before you do the giveback, which I also did. Then I did the giveback: it gave back the root aggregate, but then it just stalled at the data aggregates with the message "Waiting for partner lock synchronization". I ran it once more, but still nothing. When I run "storage failover show" a few times in a row, I also get another message, "Takeover is not possible: Storage failover mailbox version mismatch, The version of software running on each node of the SFO pair is incompatible", which is a bit strange. Do I need to clear the mailbox disks for the new node?
# Giveback after replacing node doesn't complete...
I also get this in the event log: cf.fm.versionMismatch: Failover monitor: CF monitor version mismatch detected: 2/0
This points to mismatched ONTAP versions, but "version -node" shows that all nodes have 9.16.1 installed. The new node itself has never had ONTAP 9.16.1 installed, but the boot media from the replaced controller of course had 9.16.1, and it did a few firmware updates the first time it booted... so that seems legit?
My bet is that I need to clear the mailbox disks on the new node.
...Update: the data aggregates have been given back to the new node, and so have the LIFs, so it is actually up and running. But we still have issues with "storage failover show". On the new node it shows "System ID changed on local (Old: xxxxxxxx, New: yyyyyyyy), Connected to partner, Takeover is not possible: Storage failover mailbox version mismatch, The version of software running on each node of the SFO pair is incompatible", while the partner node shows "Takeover possible true" as if everything is OK. But when I run the command again, I get "System ID changed on local (Old: xxxxxxxx, New: yyyyyyyy), Connected to partner" and nothing else, and then "Takeover is not possible: The version of software running on each node of the SFO pair is incompatible, NVRAM log not synchronized" on the partner node.

Another thing that is a bit strange is that the new node shows "-" in the partner field, whereas its HA partner shows the correct partner name. Could it be that we should have set this at the LOADER prompt? It's not mentioned in the guide... and I would also guess that it is in the "env" on the boot media from the replaced node?
...I just tried booting the node into Maintenance mode and running "mailbox destroy local", but it ends up in the same state after rebooting... 😦
Just a status update: the two nodes are now up and the aggregates/LIFs have been distributed between them, but they are still not an HA pair. I have tried destroying the local mailbox disks with no luck, so I'm not really sure what is causing this. One thought is that the two nodes have very different serial numbers: one node's SN starts with 95... and the other's with 65... Not sure if that makes any difference? We use these nodes for migrating customers, so we have a few of them, but the ones we have left all start with 65..., and we will get another 95... in a few months' time. Could it be that 95... is the FAS2720 and 65... is the FAS2750? And does it make any difference? 🙂
...Not much help 😉 But I eventually managed to fix this by clearing the mailbox disks on both nodes. That involved a bit of downtime because I was unable to fail over, but after that, the nodes booted up and worked as expected. 🙂