#Duplicate shelf IDs. Change procedure as ONTAP is already running?
They show up as 0.(SerialNumber) in the CLI
Should I gracefully shut down the FAS2750/ONTAP, change the IDs on the shelf LEDs, power-cycle the shelves, and as the final step boot the controllers back up?
If the shelves with the duplicate ID both already have disks in an aggregate and you don't have enough space somewhere else to move the volumes, then yes.
If not, then vol move all vols away, offline the aggr, change the shelf-ID and power-cycle the shelf.
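For reference, the vol move / offline route might look roughly like this in the clustershell. This is a sketch rather than a verified runbook: `svm1`, `vol1`, `aggr_old` and `aggr_new` are placeholder names, the `#` lines are annotations, and the move has to be repeated for every volume on the aggregate.

```
# List the volumes that still live on the affected aggregate (placeholder name)
volume show -aggregate aggr_old

# Move each volume to an aggregate on a different shelf, then watch progress
volume move start -vserver svm1 -volume vol1 -destination-aggregate aggr_new
volume move show

# Once the aggregate is empty, take it offline before touching the shelf
storage aggregate offline -aggregate aggr_old

# After the shelf ID is changed and the shelf has been power-cycled
storage aggregate online -aggregate aggr_old
```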
This. With ONTAP the shelf must be power-cycled to set the ID. So yes:
- Graceful shutdown of ONTAP nodes in HA pair
- Set Shelf ID
- Power off shelf. Wait 10 seconds
- Power on shelf. Wait for it to stabilize (20-40 seconds)
- From the LOADER prompt type "bye"
I say this because if you happen to do an upgrade before the nodes are next rebooted, there will be a warning about a LOADER variable and auto-boot. Mind you, it will still work. It's just a silly, annoying message that is resolved by using "bye" at the LOADER (instead of boot_ontap)
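The shutdown and boot steps above could be sketched like this. Node names are placeholders, the `#` lines are annotations, and ONTAP may ask for confirmation or extra options depending on version and cluster size, so treat it as an outline only.

```
# Halt each node of the HA pair without triggering a takeover (placeholder node names)
system node halt -node node1 -inhibit-takeover true
system node halt -node node2 -inhibit-takeover true

# ...set the shelf ID, power the shelf off, wait, power it back on...

# Then at the LOADER prompt on each node's console
LOADER-A> bye
```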
@lost cliff @floral stone after rebooting, both nodes are up and we can SSH in, but the cluster MGMT and the System Manager are down
There is a command in ONTAP to set the shelf ID while the system is running, but it will not take effect until the shelf is physically power-cycled. There's no way around that, sadly
e0a and e0b both showing down
we fixed the shelf ID
now after rebooting, the cluster mgmt port is inaccessible lol
e0a and e0b are the cluster ports? If they're down that is not good
they might just be temporarily degraded due to flapping. If so, they should come up within a few minutes
yeah, no link lights on the e0a and e0b cluster ports, so a totally unrelated issue. Checking the switch side
always a surprise when rebooting!
that doesn't sound good. Do the ports show up when you do node run local sysconfig -av or network port show on one of the nodes?
We had hardware issues with quite a few 2xxx systems with our customers, where after a reboot the onboard ports just disappeared
@tardy dew 10g eth adapter only showing on node 1, not node 2
yikes
the e0a/e0b adapter
yeah, that might be bad... you can try re-seating node 2 (pulling it, waiting a minute or two, and plugging it back in), but I'll be honest with you: in most similar cases that I know about, a mainboard replacement was necessary 😢
yeah, if the cables are long enough it should be fine. You only need to pull it out a centimeter or two
if ONTAP is running, will this cause data issues?
I assume your node 1 has taken over node 2? If not, do a takeover first
but if the cluster ports are down, you're not serving much data anyways
so if takeover doesn't work, just shut down node 2 before pulling it
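In clustershell terms, the takeover-or-halt advice above would be something like the following. `node2` is a placeholder for the node that is about to be pulled, and the `#` lines are annotations.

```
# Check the current HA state first
storage failover show

# Have the partner take over node 2
storage failover takeover -ofnode node2

# If the takeover doesn't work, shut node 2 down instead
system node halt -node node2 -inhibit-takeover true
```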
okay
we're gonna do a full power cycle, pulling the plugs
I'm guessing this is going to end in a controller replacement, as sysconfig didn't even show the adapter
🤦
That’s a bug I’ve run into before.
Log into both SPs
Do a takeover one way.
If you can’t, and I suspect you can’t, you can try to halt one node.
Interrupt the boot process on the rebooting or shut down node.
Use the SP to power-cycle the node and let it boot.
If the cluster ports come back, all good. If not, you may need to repeat on the other node
It’s an issue with the SP talking to ONTAP
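A rough sketch of that sequence for node 2, assuming SSH access to its SP. The prompt and node name are placeholders, the `#` lines are annotations, and on platforms that have a BMC instead of an SP the prompt and available commands may differ.

```
# On node 2's SP: attach to the console to watch and interrupt the boot
SP node2> system console

# Back at the SP prompt (Ctrl-D leaves the console): power-cycle the controller
SP node2> system power cycle
SP node2> system power status

# Once ONTAP is up again, check whether the cluster ports are back
network port show -node node2 -port e0a|e0b
```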
@hallow vigil see above
Reading more: do the above on node 2, the one not showing the ports
currently booting up fingers crossed
If you are nearby you can watch e0a/b light up
If you are on the SP of the node not booting, you can review “net port show -port e0a|e0b”
this has got to be one of the most annoying HW bugs recently. The ports just disappear as if they never existed; not even sysconfig shows them anymore. Something to do with corrupted firmware, according to NetApp support
I'm still waiting for it to happen on a system that is no longer under warranty, so that I can take a closer look at it. Manually reflashing the firmware directly to the chip might be possible
Any update @hallow vigil ?