#One cluster is offline


obsidian wren
#

Is there any way of fixing this kind of problem?

limpid flax
#

Do you know why node2 was taken over? Once you fix whatever was wrong on node2, you can run "storage failover giveback -ofnode cluster01-02" to hand control back to node2.

Once the giveback is complete, you can do a "net int revert *" to send all the LIFs back to their home node.
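
As a rough sketch of that sequence (the "cluster01::>" prompt is an assumption based on the node name cluster01-02, and the two "show" commands are extra checks, not something the giveback requires):

```
cluster01::> storage failover show
cluster01::> storage failover giveback -ofnode cluster01-02
cluster01::> storage failover show-giveback
cluster01::> network interface show -is-home false
cluster01::> network interface revert *
```

"storage failover show" before the giveback should tell you whether node2 is actually waiting for giveback, and "network interface show -is-home false" lists the LIFs that are still away from their home node before you revert them.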

obsidian wren
#

I have no idea why it was taken over, is there any way to diagnose it?

odd delta
#

Connect via SSH to the SP of node2 and check what state the node is in.

#
system power status
system console

Check whether the system is sitting at the LOADER prompt.

#

If the cluster is under a support contract, simply call support and let them handle it.
But judging by the screenshot, the version looks quite old...

obsidian wren
#

It shows that node2 has RPC: Port mapper failure - RPC: Timed out

obsidian wren
#

I've contacted support and they say there is a possibility of a physical failure of this node's controller.

limpid flax
#

"event log show" is probably the best place to start.
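
Something along these lines, run from the surviving node, might narrow it down (a sketch only: the node name cluster01-01 and the "cluster01::>" prompt are assumptions, and the "cf.*" filter is just a guess at matching the controller failover events):

```
cluster01::> storage failover show
cluster01::> event log show -node cluster01-01
cluster01::> event log show -messagename cf.*
```

The partner's log is the one to read here, since node2 itself is down and can't answer for its own events.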

barren aspen
#

you probably just need to boot the second node and everything will fix itself after a few minutes 😄

obsidian wren
#

sadly

obsidian wren
#

It truly may be physically broken

twin fjord
#

What is the status when you connect to the bad node's console, either physically through the console port or through the SP?

SP-NODE> system console

Is the node sitting at LOADER-B> ?

twin fjord
#

Try

boot_ontap

twin fjord
#

Possibly bad DIMM

#

DIMM-NV1

obsidian wren
#

I will try replacing it

odd delta
#

ONTAP 8.2.3P5 Cluster-Mode

barren aspen
#

yeah, that DIMM is toast

#

also duplicate shelf IDs
