#Aggr Reconstruction did not start

nimble sinew
#

Team, I have a RAID group (rg) degraded with 2 failed disks within an active data aggr. There is a spare, but it was not picked up. Does anybody know why reconstruction did not start automatically, or whether it can be started manually?

obtuse fable
#

You are probably missing spare disks in the corresponding pool

#

Hm well you seem to have at least one spare in pool 0... Did you check the event log to see why it's not rebuilding?

nimble sinew
#

sysconfig -r should be showing the reconstruction process, but it is not starting for an unknown reason. Is there any way to start it manually?

#

3/26/2024 09:28:09 NA-Polux ERROR raid.rg.recons.cantStart: The reconstruction cannot start in RAID group /aggr1/plex0/rg1: No matching disks available in spare pool, targeting any spare pool

#

from the event log
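
For anyone who wants to watch for this event from a script, here is a minimal Python sketch that splits an EMS line like the one above into its parts. The layout (date, time, node, severity, event name, message) is an assumption inferred from that single quoted line, not a documented format, and `parse_ems_line` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the one line quoted above:
# <M/D/YYYY> <HH:MM:SS> <node> <SEVERITY> <event.name>: <message>
EMS_LINE = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2}/\d{4}) (?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<node>\S+)\s+(?P<severity>[A-Z]+)\s+(?P<event>[\w.]+):\s+(?P<message>.*)$"
)
# The RAID group path is the token after "RAID group", up to the next colon.
RG_PATH = re.compile(r"RAID group (?P<path>/\S+?):")


def parse_ems_line(line):
    """Return a dict of fields, or None if the line does not fit the assumed layout."""
    match = EMS_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    stamp = f"{fields.pop('date')} {fields.pop('time')}"
    fields["timestamp"] = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
    rg = RG_PATH.search(fields["message"])
    fields["raid_group"] = rg.group("path") if rg else None
    return fields


if __name__ == "__main__":
    sample = (
        "3/26/2024 09:28:09 NA-Polux ERROR raid.rg.recons.cantStart: "
        "The reconstruction cannot start in RAID group /aggr1/plex0/rg1: "
        "No matching disks available in spare pool, targeting any spare pool"
    )
    print(parse_ems_line(sample))
```

Filtering the parsed lines on the event name raid.rg.recons.cantStart would then give the affected RAID groups.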

#

I was planning to bring one of the broken disks back online and then run a replace command with the valid spare, as a roundabout way of forcing the reconstruction.

obtuse fable
#

it's strange because it should at least pick up the disk 0b.00.23 as spare. Try removing disk ownership on that partition and re-assigning it (should be disk removeowner -disk ... -data true followed by disk assign -node NA-Polux -all true)

nimble sinew
#

I did something similar yesterday, and it took more than a day to zero that spare 😩

obtuse fable
#

you don't need to zero the spares; if a disk fails, even non-zeroed spares will be picked

nimble sinew
#

ok, you want me to remove 1.0.23P1 from the spare pool, then reassign it again?

obtuse fable
#

yep

nimble sinew
#

no reconstruction starts

#

fyi this platform is not under support

nimble sinew
#

any other ideas?

swift leaf
#

Looks like you have 3 spare disks on the other node: 0a.00.2P1, 0a.00.20P1 and 0b.00.21P1. I would try to reassign one of them to the partner node, just to see if it likes them better. Something like disk assign -disk 0a.00.2P1 -owner <othernodename> (I think you need to put a "-force true" in there, or alternatively unassign it first...)

delicate coyote
#

Do a sysconfig -a pls

indigo quail
#

Fun fact - there’s at least 20 customers with castor/pollux HA pairs 🤣

#

Most of any I’ve seen. Yogi/booboo, Mario/luigi also popular

limpid horizon
#

not sure if you already fixed your assignment, but I guess buller7929 is right. EMS says "The reconstruction cannot start in RAID group /aggr1/plex0/rg1: No matching disks available in spare pool, targeting any spare pool"

swift leaf
#

oh.. another thing mentioned in the logs on this system is the "SHUTDOWN PENDING" message... I'm not sure, but maybe the system will do a shutdown if two disks are broken and it cannot rebuild? As a precaution, like if it's too warm etc.? (I hope it doesn't.) But there are actually a few issues on the system which should be addressed eventually, like the volume that is running out of inodes, and net.ifgrp.lacp.link.inactive, which looks like the intercluster links are down? And of course the "well known" disk.dynamicqual.fail.parse issue, which I think is fixed in later ONTAP releases... but of course focus on getting the rebuild up and running...

obtuse fable
#

The thing I'm wondering about is: the system clearly has 1 spare of the matching size and in the correct pool... No idea why it doesn't like that one, it should have at least started to rebuild one of the failed disks

swift leaf
#

Agree, it's a bit strange... I have only had something similar happen when I was doing some disk replace commands in order to "move" disks around to free up a shelf... It was some time ago, but it was also partitioned disks and I had to have support check it, and it ended up being because the two data partitions were on the same node... somehow that caused issues where it would not start the disk replace... it was most likely my own fault for not paying attention to the partitions and just adding "-force" when it didn't do as I told it to 🙂

nimble sinew
#

Hello Team,

#

I resolved the issue by checking the failover state, where I found that aggr1 was on the partner node. After I performed the giveback, the reconstruction started. I believe reconstruction cannot run until the aggregates return to their own node.
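
As a rough illustration of that check, here is a minimal Python sketch that flags aggregates whose current owner differs from their home node. The `Aggregate` record, the `not_at_home` helper and the node names are all hypothetical; in practice the data would come from whatever the system reports about aggregate ownership. Whether an aggregate being away from home actually blocks reconstruction is only my belief from this one case.

```python
from dataclasses import dataclass


@dataclass
class Aggregate:
    # Hypothetical record; field names are illustrative, not an ONTAP API.
    name: str
    home_node: str
    current_owner: str


def not_at_home(aggregates):
    """Return the names of aggregates currently owned by a node other than their home."""
    return [a.name for a in aggregates if a.current_owner != a.home_node]


if __name__ == "__main__":
    # Placeholder node names, not the real ones from this system.
    aggregates = [
        Aggregate("aggr0", home_node="node-a", current_owner="node-a"),
        Aggregate("aggr1", home_node="node-a", current_owner="node-b"),
    ]
    for name in not_at_home(aggregates):
        print(f"{name} is not on its home node; check failover state before chasing spares")
```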
