#Strange (?) aggr0 on new FAS2820 + DS460c + DS212c


knotty totem
#

Hi guys, check this out: the aggr0 setup out of the box on a brand-new FAS2820 with a fully populated DS460c and DS212c. 84 16TB disks, and this is what we've got in aggr0 (screenshot).

Also, it assigned all 12 disks from the DS212c and all 60 disks from the DS460c to node_02. I've reassigned 6 disks from the DS212c and 30 disks from the DS460c to node_01. I wanted more TiB, but it's something like 795 TiB. Funny thing: we have a second identical FAS2820 with the same shelves, and it did the same. ONTAP was 9.13.1P2.
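A rough sketch of the disk and capacity arithmetic behind this post. It assumes "16TB" is the decimal marketing size (16 * 10**12 bytes per drive) and deliberately ignores right-sizing, RAID parity, spares and WAFL reserve, which is why the raw figure lands well above the ~795 TiB reported. The constant names are illustrative only.

```python
# Rough disk/capacity arithmetic for the FAS2820 + DS212c + DS460c setup.
# Assumption: "16TB" is the decimal marketing size (16 * 10**12 bytes).
# Ignores right-sizing, RAID parity, spares and WAFL reserve.

TIB = 2**40  # bytes in one TiB

def raw_tib(disk_count: int, disk_tb: float) -> float:
    """Raw capacity in TiB for disk_count drives of disk_tb decimal TB each."""
    return disk_count * disk_tb * 10**12 / TIB

# Drive counts from the post: 12 internal, 12 in the DS212c, 60 in the DS460c.
INTERNAL, DS212C, DS460C = 12, 12, 60
total_disks = INTERNAL + DS212C + DS460C          # 84

# External drives reassigned to node_01: half of each shelf.
moved_to_node01 = DS212C // 2 + DS460C // 2       # 6 + 30 = 36

if __name__ == "__main__":
    raw = raw_tib(total_disks, 16)                # about 1222 TiB raw
    print(f"{total_disks} disks, {moved_to_node01} moved to node_01")
    print(f"raw {raw:.0f} TiB; 795 TiB usable is {795 / raw:.0%} of raw")
```

The gap between raw and usable is expected on any system; the point of the thread is that the root layout ate more of it than necessary.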

plain hearth
#

are you able to re-init it at this point?

knotty totem
#

caught it just today when I was scrolling through ASUP messages, tbh.

#

will that impair anything except some lost space?

#

in case we cannot reinit it

plain hearth
#

I know you can convert to ADP, but I'm not sure about this odd mixed state, honestly.

knotty totem
plain hearth
#

yeah. there are 12 disks internally, so node 1 grabbed them. I've seen this before. there was an old bug that caused this, like 9.6 maybe? but it's been fixed.

#

the reason I asked about a re-init was to do it without the extra shelves attached and see what it does.

knotty totem
#

this was sitting on 9.13.1P2 out of the box. already upgraded it to P9.

#

this will be hard, unfortunately. the old 2620 that is hosting the data hits EOS soon. we just got this new FAS and rushed to deploy it, and I think the guys have started migrating because we've already deployed and configured CIFS

#

some XCP jobs might already be running

plain hearth
knotty totem
#

could it be problematic in the future? what do you think? any impact besides some lost free space?

#

funny thing: it says aggr0 was created at night a month ago, on 5th April 2024. I've also found a manufacturing ASUP "HA Group Notification (USER_TRIGGERED (ALL:Manufacturing Test Config for SSN"? with system name aggr0_mfg_test_02_, and it looks very similar to aggr0 on node2: it also took drives from the external shelves. that ASUP is from 5th April 2024, so the same night aggr0 was created.

plain hearth
#

I would open a case honestly. esp since it sounds like you can't re-init.

knotty totem
#

we won't cry about these few TiBs

#

systems are stable so far, patched nicely

plain hearth
#

oh got ya. BMC won't be lost

knotty totem
#

there are 4 spares overall

#

so if a disk in aggr0 fails, it should be ok

plain hearth
#

yeah. they're just configured differently.

knotty totem
#

yep

#

all I can see is that we've lost some free space that we could have used

#

felt a little bit old-school, you know, 3 disks for aggr0...

plain hearth
#

lol

#

additional question: did Config Advisor pick this up at all?

knotty totem
#

nope, don't think so

plain hearth
#

mmm

#

huh

knotty totem
#

neither did Active IQ online

#

it did pick up an outdated BMC, though, but on the aff250

#

we've patched it

gritty yacht
#

so basically the concern is that N1 is made of ADP root partitions, and node 2 is full disks?

knotty totem
#

Yep

#

And node2 took all the disks from the external shelves, but I changed ownership of half of them to n01

gritty yacht
#

Yeah, not optimal for efficiency. I think you should probably open a case and have them walk you through the process to reinit for ADP. Sorry..

#

While it’s easy enough to 9a/9b and reinstall, I’d like it logged that this sort of thing happens

knotty totem
gritty yacht
#

Correct. Either is a supported config, just typically both nodes would be the same 😅

knotty totem
rustic tendon
#

This is exactly why I basically force my customers to re-init on every install I do. Then I don’t have these issues! Granted, I have other ones, like sales not entering the order properly so nobody can actually access the licenses.

knotty totem
#
Please find the relevant documents: https://docs.netapp.com/us-en/ontap-systems-upgrade/upgrade/upgrade-reset-default-configuration-node3-and-node4.html and https://docs.netapp.com/us-en/ontap-metrocluster/install-ip/task_sw_config_restore_defaults.html

Also, should we remove the cabling from the additional shelves? Or can we wipe it with the shelves attached?

It can be done with the shelves attached if there is no data on the nodes. No need to remove the cables.

However, please be informed that all aggregates, volumes, LUNs, RAID groups, etc. will be deleted, and the controller will be returned to the state it was in when it shipped from the factory.

Steps :

Please have a console cable connected to the controller.

Bring the new controllers down [make sure they are not part of a cluster yet].

Boot them up and make sure you can access the controller through the console cable.

When it asks you to press Ctrl+C to stop the boot process, press Ctrl+C.

At the boot menu, select option 4 to wipe the configuration.

Do that on both nodes and then reconfigure the controllers from scratch.

#

So if I understand correctly:

  1. wipeconfig on both nodes
  2. set-defaults and option 9a on both nodes
  3. after that option 4 on both nodes

I'm a little confused because I got docs for MetroCluster and for upgrades. And isn't wipeconfig followed by 9a a bit too much? But maybe it's necessary to correct this configuration.
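To keep the option numbers straight, here is a hypothetical checker for a planned boot-menu sequence on the HA pair. The labels are the ONTAP boot menu entries discussed in this thread; the ordering rule it enforces (9a on both nodes before 9b on either) is an assumption based on the usual ADP re-init procedure, so confirm it against NetApp's docs or the case owner. The function name and node names are illustrative.

```python
# Hypothetical checker for a planned (node, boot-menu option) sequence.
# ASSUMPTION: for an ADP re-init, 9a must run on BOTH nodes before 9b on either.

BOOT_OPTIONS = {
    "4": "Clean configuration and initialize all disks",
    "7": "Install new software first",
    "9a": "Unpartition all disks and remove their ownership information",
    "9b": "Clean configuration and initialize node with partitioned disks",
    "9c": "Clean configuration and initialize node with whole disks",
}

def check_adp_plan(steps, nodes=("node_01", "node_02")):
    """Return a list of problems found in a list of (node, option) steps."""
    problems = []
    unpartitioned = set()
    for node, option in steps:
        if option not in BOOT_OPTIONS:
            problems.append(f"{node}: unknown option {option!r}")
        elif option == "9a":
            unpartitioned.add(node)
        elif option == "9b" and unpartitioned != set(nodes):
            missing = sorted(set(nodes) - unpartitioned)
            problems.append(f"{node}: 9b before 9a on {', '.join(missing)}")
        elif option == "9c":
            # Flagged only because the goal here is ADP root partitions on both nodes.
            problems.append(f"{node}: 9c initializes with whole disks (no ADP root partitions)")
    return problems
```

For example, `check_adp_plan([("node_01", "9a"), ("node_02", "9a"), ("node_01", "9b"), ("node_02", "9b")])` returns an empty list, while running 9b on one node and 9c on the other would be flagged, which is the kind of mixed layout this thread started with.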

rustic tendon
#

I go the extra mile and netboot ONTAP to get a full wipe of the boot flash

#

I don’t recall whether the netboot is in that KB or not

knotty totem
#

dunno why support came up with wipeconfig and option 4... thinking about sticking to 9a/9b

rustic tendon
#

If the system shipped with adp then option 4 is “faster”. I just don’t play games anymore. Wipe. Wipe. Wipe.

#

Meaning I actually run 9a three times to verify it grabs all the drives. I’ve had times where an improperly assigned drive was in there. Running it that third time usually makes it stick out.

knotty totem
#

wipeconfig would do it too, I think, but well

#

usually I go with 9a/9b

rustic tendon
#

4 is a wipeconfig. No doubt about it, something was messed up at delivery. Short of forcing NetApp to send you temp gear to fix it, there isn’t much you can do. The root volume is on whole disks. Not sure if that was by design. By default, the 12 internal disks should have been used for root, probably with ADP.

knotty totem
#

but on two identical systems? it's the same on both 2820 pairs

rustic tendon
#

Sounds like you have the right details. You might be able to get swing gear: set it up, SVM migrate, fix your system, migrate back.

knotty totem
#

we caught it before putting production on it

#

🙂

#

lucky us

#

we can reinit it

rustic tendon
#

It’s entirely possible someone swapped disk shelves. I’ve seen customers do that and not realize until it’s too late. I don’t care since I’m doing a reinit anyway

knotty totem
#

created a case for this too

#

its new equipment

#

the thing is how we proceed with it. we've already created a cluster on it but stopped the data migration when I caught this issue

#

and then the same support steps quoted earlier: console cable, Ctrl+C at boot, option 4 on both nodes, then reconfigure the controllers from scratch.

rustic tendon
#

Here’s my process
Get to loader prompt
set-defaults
saveenv
ifconfig e0M -addr=x.x.x.x -mask=255.255.255.0 -gw=x.x.x.y
(Must define gw, otherwise you hit a bug later)
netboot http://x.x.x.z/9318P1.tgz
Choose option 7 at the boot menu
Say no to restoring from backup
Say yes to using it for the next boot
Wait for both nodes to return to the boot menu. Follow the other directions for a 9a/9b
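A small sketch that formats the LOADER lines above for pasting into the console and sanity-checks the network settings first. The helper name and the example addresses (documentation range 192.0.2.0/24) are hypothetical; the only checks are the ones the post motivates: a gateway must be given, and here it is also required to sit on the e0M subnet.

```python
import ipaddress

def loader_netboot_commands(addr: str, mask: str, gw: str, image_url: str) -> list[str]:
    """Hypothetical helper: return the LOADER lines for the netboot procedure.

    It only formats text; nothing is sent to the controller.
    """
    iface = ipaddress.IPv4Interface(f"{addr}/{mask}")  # accepts dotted netmask
    gateway = ipaddress.IPv4Address(gw)
    # The post warns to always define gw; also check it is reachable on-subnet.
    if gateway not in iface.network:
        raise ValueError(f"gateway {gw} is not in {iface.network}")
    return [
        "set-defaults",
        "saveenv",
        f"ifconfig e0M -addr={iface.ip} -mask={iface.netmask} -gw={gateway}",
        f"netboot {image_url}",
    ]
```

Calling `loader_netboot_commands("192.0.2.10", "255.255.255.0", "192.0.2.1", "http://192.0.2.50/9318P1.tgz")` yields the four lines to type before choosing option 7; a gateway outside 192.0.2.0/24 raises `ValueError` instead of producing a config that fails later.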

#

And everything I do is via the serial console (not the SP, but it could be if it’s set up)

knotty totem
#

can't do it from the serial console, it's a dark site and a PITA to get into

#

but we have sp set up

rustic tendon
#

The SP will work as long as you have SSH access
ssh admin@sp
system console
(Take note of the escape sequence indicated to return to sp if desired!)

knotty totem
#

yeah thanks, access via ssh was available all along

rustic tendon
#

That process above will default the system. Make sure, no MAKE SURE ONTAP is cleanly shut down! If there is anything in nvram you will have problems!

knotty totem
#

will check with support and follow up. we will shut it down cleanly

rustic tendon
#

Good luck

knotty totem
#

thanks!

knotty totem
# rustic tendon Good luck

I've checked and we're patched to 9.13.1P9 there, so I think we can skip the netboot. gw is already defined on the SP and e0M

#

9.13.1p9 is ok for us

rustic tendon
#

Like I said, I do it to fully wipe the boot flash. Fully clean install. The process reformats the boot flash and installs whatever version as image1 and image2, along with making sure there's nothing else on the boot flash that shouldn’t be there

knotty totem
#

OK

rustic tendon
#

Takes maybe 10 minutes. From my MacBook I do two systems at the same time in about 5 minutes

knotty totem
#

tomorrow we will try 9a/9b etc., but I'm wondering whether this aggr config will be good. the system is for archiving data over CIFS and NFS, and servers will send data to it. capacity is what we need

knotty totem
#

just to update you guys: everything went perfectly. 9a/9b fixed the issue. now aggr0 is split evenly across 5 disks on both nodes. even better, we've got more than 880 TiB with automatic local tier creation! before it was 795 TiB. I wonder what happened at the factory: option 9b on one node and 9c on the other? We will never know! thanks for all the suggestions, guys!

rustic tendon
#

Check your aggregates (aggr show) and look at your roots. There is/was a bug where one aggr is RAID-DP and the other is RAID-TEC if the drives are 10TB or larger

knotty totem
#

Aggregate aggr0_cluster_01 (online, raid_dp, fast zeroed) (block checksums)
Aggregate aggr0_cluster_02 (online, raid_dp, fast zeroed) (block checksums)
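The check suggested above boils down to "do both root aggregates report the same RAID type?". A minimal sketch that parses status lines in the exact shape pasted here; the regex is fitted to these two lines only (not an official output format), and treating names starting with `aggr0` as roots is an assumption that matches this cluster's naming.

```python
import re

# Fitted to lines like:
# "Aggregate aggr0_cluster_01 (online, raid_dp, fast zeroed) (block checksums)"
LINE_RE = re.compile(r"^Aggregate\s+(\S+)\s+\(([^)]*)\)")

def raid_types(output: str) -> dict[str, str]:
    """Map aggregate name -> RAID type field (e.g. 'raid_dp', 'raid_tec')."""
    result = {}
    for line in output.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        name, status = m.groups()
        fields = [f.strip() for f in status.split(",")]
        result[name] = next((f for f in fields if f.startswith("raid")), "unknown")
    return result

def roots_consistent(output: str, prefix: str = "aggr0") -> bool:
    """True if at least one root aggregate is found and all share one RAID type.

    ASSUMPTION: root aggregates are identified by the name prefix 'aggr0'.
    """
    roots = {n: r for n, r in raid_types(output).items() if n.startswith(prefix)}
    return len(set(roots.values())) == 1
```

Fed the two lines above, both roots come back as `raid_dp`, so the check passes; a RAID-DP/RAID-TEC mix would return `False`.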

vivid flint
#

Update to 9.14.1. It gives you another 5% per SATA aggr because of the reduced WAFL reserve. 🤙

knotty totem
vivid flint
#

Simply update and boom, more space (aggr needs to be bigger than 30TB).
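Back-of-envelope arithmetic for that tip. Assumptions, to be verified for your platform and release: the reserve drops from 10% to 5% of the aggregate, and only aggregates larger than 30TB qualify. The function name and the 400TB example aggregate are hypothetical.

```python
# Extra usable space from a lower WAFL reserve (sketch).
# ASSUMPTIONS: reserve goes from 10% to 5%; only aggregates > 30TB qualify.

THRESHOLD_TB = 30

def wafl_reserve_gain_tb(aggr_tb: float, old_pct: float = 10.0, new_pct: float = 5.0) -> float:
    """Usable TB gained for one aggregate; 0 at or below the threshold."""
    if aggr_tb <= THRESHOLD_TB:
        return 0.0
    return aggr_tb * (old_pct - new_pct) / 100
```

For a hypothetical 400TB aggregate, `wafl_reserve_gain_tb(400)` gives 20.0, i.e. the "another 5%" from the post, while a 30TB aggregate gains nothing.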