#Duplicate shelf IDs. Change procedure as ONTAP is already running?

hallow vigil
#

Repeating error message: "Fault reported on disk storage shelf attached to channel 0a."
Two shelves in the stack (DS460C and DS224C); the filer is a FAS2750.

#

They show up as 0.(SerialNumber) in the CLI
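A minimal sketch of how one might flag such shelves from a list of shelf names. It assumes a healthy shelf name has the form stack.id with a numeric ID of at most two digits, and that anything else is the serial-number fallback described above. The function name and the sample names are hypothetical, not ONTAP output.

```python
import re

# Assumption: a normal shelf name is "<stack>.<id>" with a 1-2 digit numeric ID.
_NORMAL_NAME = re.compile(r"^\d+\.\d{1,2}$")


def shelves_needing_id_fix(shelf_names):
    """Return the names that do not look like '<stack>.<numeric id>'."""
    return [name for name in shelf_names if not _NORMAL_NAME.match(name)]


# Hypothetical sample input; the serial-like strings are made up.
sample = ["0.SHFEXAMPLE0001", "0.SHFEXAMPLE0002", "1.10"]
print(shelves_needing_id_fix(sample))
```

Any name this returns is a shelf whose ID is worth checking on the shelf's front display.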

#

Should I gracefully shut down the FAS2750/ONTAP, change the shelf ID LEDs, power-cycle the shelves, and as the final step boot the controllers back up?

floral stone
#

If your shelves with duplicate ID both already have disks in an aggregate and you don't have enough space somewhere else to move the volumes, then yes.
If not, then vol move all vols away, offline the aggr, change the shelf-ID and power-cycle the shelf.
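The two cases above can be sketched as a small decision helper. This is only an illustration of the reasoning in this message; the function name, parameter names, and step wording are hypothetical and do not correspond to any ONTAP command.

```python
def shelf_id_change_plan(both_shelves_in_aggregates, can_move_volumes_elsewhere):
    """Return an ordered list of steps for fixing a duplicate shelf ID.

    Mirrors the two cases described above; the step texts are plain
    descriptions, not CLI commands.
    """
    if both_shelves_in_aggregates and not can_move_volumes_elsewhere:
        # Case 1: data lives on both shelves and cannot be moved away.
        return [
            "Gracefully shut down the controllers",
            "Change the shelf ID",
            "Power-cycle the shelf",
            "Boot the controllers",
        ]
    # Case 2: the affected shelf can be emptied while ONTAP keeps running.
    return [
        "Move all volumes off the affected aggregate",
        "Take the aggregate offline",
        "Change the shelf ID",
        "Power-cycle the shelf",
    ]


for step in shelf_id_change_plan(True, False):
    print(step)
```

Either way the shelf itself has to be power-cycled; the only question is whether the controllers can stay up while that happens.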

lost cliff
#

This. With ONTAP the shelf must be power-cycled to set the ID. So yes:

  1. Graceful shutdown of ONTAP nodes in HA pair
  2. Set Shelf ID
  3. Power off shelf. Wait 10 seconds
  4. Power on shelf. Wait for it to stabilize (20-40 seconds)
  5. From the LOADER prompt type "bye"

I say "bye" because if you happen to do any upgrade before the next reboot of the nodes, there will be a warning about a loader variable and auto-boot. Mind you, it will still work; it's just a silly, annoying message that is avoided by using "bye" at the LOADER prompt (instead of boot_ontap).

hallow vigil
#

@lost cliff @floral stone after rebooting, both nodes are up and we can SSH in, but the cluster MGMT and System Manager are down

tardy dew
#

There is a command in ONTAP to set the shelf ID while the system is running, but it will not take effect until the shelf is physically power-cycled. There's no way around that, sadly

hallow vigil
#

e0a and e0b both showing down

#

we fixed the shelf ID

#

now after rebooting, the cluster mgmt port is inaccessible, lol

tardy dew
#

e0a and e0b are the cluster ports? If they're down that is not good

#

they might just be temporarily degraded due to flapping; if so, they should come up within a few minutes

hallow vigil
#

yeah, no link lights on the e0a and e0b cluster ports, so a totally unrelated issue. Checking the switch side

#

always a surprise when rebooting!

tardy dew
#

that doesn't sound good. Do the ports show up when you run "node run local sysconfig -av" or "network port show" on one of the nodes?

#

We had hardware issues with quite a few 2xxx systems with our customers, where after a reboot the onboard ports just disappeared

hallow vigil
#

@tardy dew 10g eth adapter only showing on node 1, not node 2

#

yikes

#

the e0a/e0b adapter

tardy dew
#

yeah, that might be bad... you can try re-seating node 2 (pulling it, waiting a minute or two, and plugging it back in), but I'll be honest with you: in most similar cases that I know about, a mainboard replacement was necessary 😢

hallow vigil
#

is this a controller replacement?

#

can I reseat without unplugging anything?

tardy dew
#

yeah, if the cables are long enough it should be fine. You only need to pull it out a centimeter or two

hallow vigil
#

if ontap is running will this cause data issues?

tardy dew
#

I assume your node 1 has taken over node 2? If not, do a takeover first

#

but if the cluster ports are down, you're not serving much data anyways

#

so if takeover doesn't work, just shut down node 2 before pulling it

hallow vigil
#

okay

#

we're gonna do a full power cycle, pulling the plugs

#

I'm guessing this is going to end in a controller replacement, as sysconfig didn't even show the adapter

#

🤦

lost cliff
#

That’s a bug I’ve run into before.

Log in to both SPs.
Do a takeover one way.
If you can’t, and I suspect you can’t, you can try to halt one node.
Interrupt the boot process on the node that is rebooting or was shut down.
Use the SP to power-cycle the node and let it boot.

If the cluster ports come back, all good. If not, you may need to repeat this on the other node

#

It’s an issue with the SP talking to ONTAP

#

@hallow vigil see above

#

Having read more of the thread: do the above on node 2, the one not showing the ports

hallow vigil
#

currently booting up, fingers crossed

lost cliff
#

If you are nearby you can watch e0a/b light up
If you are on the SP of the node not booting, you can review “net port show port e0a|e0b”

tardy dew
#

I'm still waiting for it to happen on a system that is no longer under warranty, so that I can take a closer look at it. Manually reflashing the firmware directly to the chip might be possible

lost cliff
#

Any update @hallow vigil ?