#FAS3040 Controller died


visual wren
#

Hey everyone,

I got a call about a NetApp from a client who has never come to us before. He is running a very old FAS3040, and his secondary controller died; his datastore was on node 2, and node 1 was no longer needed. Since the secondary controller was dead, I moved the controller from 1 to 2. It sees the disks and everything, but it says "waiting for giveback." I get it, but the controller is dead, so how can I force this?

I have personally never dealt with a NetApp. It's something I want to try in my homelab.

runic needle
#

Go to the Boot Menu (Ctrl+C during boot) and enter Maintenance mode. There, type "mailbox destroy local" and "mailbox destroy partner", then reboot.

That should make it boot normally.

visual wren
fervent pine
#

Read what the system is asking for first 🙂 It is asking whether you accept the risk of data corruption in case the HA partner is still alive/active. You need to answer "yes" here, since it is not, to confirm booting into Maintenance mode. Then you'll be able to destroy the mailboxes.

visual wren
#

Got it. Before I do this: it will NOT delete any data or the configuration, it just forces HA down to a single node?

#

Sounds like I can remount the datastore, get the hell off this thing, and set up backups correctly for this new client.

fervent pine
#

It will not delete anything.

#
        In a High Availablity configuration, you MUST ensure that the
        partner node is (and remains) down, or that takeover is manually
        disabled on the partner node, because High Availability
        software is not started or fully enabled in Maintenance mode.

        FAILURE TO DO SO CAN RESULT IN YOUR FILESYSTEMS BEING DESTROYED

        NOTE: It is okay to use 'show/status' sub-commands such as
        'disk show or aggr status' in Maintenance mode while the partner is up
#

In your case, the HA partner is down.

visual wren
#

TY. I guess I need to get a 30-day trial going in my homelab to learn more about NetApp; it seems pretty powerful compared to TrueNAS.

fervent pine
#

Careful though: what you are attempting is basically a controller replacement, and on quite a dated OS at that. If you are inexperienced, be advised this is a tricky operation. Tread lightly 🙂

visual wren
#

Oh, I'm very inexperienced with NetApp hardware. I've seen folks use JBODs for these.

But I ran those commands, and now I'm rebooting from the LOADER> prompt with boot_primary.

#

So when I put everything back to normal, I was able to access the controller I was on, but I couldn't see the data from the other SANs, just the storage that was accessed on, say, node 1. They're all interconnected via the SAS cables to the FC switch, though. Would I still be able to mount that data even though it's not reading from controller 2?

#

Another option: this client is willing to drop, say, $15k on one from eBay to get his stuff restored so he can migrate off of it.

#

Or if we can pump the data to another storage device/SAN, he will buy any of them right now. This is connected to a VMware host and is using NFS, so he is willing to spend money to get this fixed now. Not sure if you can sell hardware, @feral temple, or something else that can help. But I think this customer is willing to do anything at this point.

fervent pine
#

Well, a typical FAS3040 setup is a dual-node HA pair with disk shelves accessible to both nodes over FC-AL, which means both nodes have access to all disks. You only need to reassign the disks from the broken node to the working one, and you would then have access to all aggregates/volumes.

#

But since this is a FAS3040 setup... you would also need to modify the node configuration to expose the data through NFS / CIFS / ...

visual wren
#

I like that so far; I just have no idea how I would do it. Are there any docs that cover reassigning the disks to the working node and changing the node configuration?

fervent pine
#

In maintenance mode:

  • disk show -v (to get sysIDs associated with disks)
  • disk reassign -s old_system_id -d new_system_id
#

(old one being the dead node, and new one being the working one)
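To make the reassign step above less error-prone, here is a minimal Python sketch that formats the `disk reassign` command from the two system IDs you read off `disk show -v`. The helper name and the example sysIDs are hypothetical, and it assumes sysIDs are plain numeric strings; it only produces the text to type at the Maintenance-mode prompt and does not talk to the filer.

```python
def build_reassign_command(old_sysid: str, new_sysid: str) -> str:
    """Format the Maintenance-mode command that moves disk ownership
    from the dead node (old_sysid) to the surviving node (new_sysid).

    Hypothetical helper: it only fills in the syntax quoted in the thread,
    "disk reassign -s old_system_id -d new_system_id".
    """
    old_sysid = old_sysid.strip()
    new_sysid = new_sysid.strip()
    # Assumption: system IDs shown by "disk show -v" are numeric strings.
    for name, value in (("old_sysid", old_sysid), ("new_sysid", new_sysid)):
        if not value.isdigit():
            raise ValueError(f"{name} must be numeric, got {value!r}")
    # Reassigning a node's disks to itself is almost certainly a typo.
    if old_sysid == new_sysid:
        raise ValueError("old and new system IDs are identical")
    return f"disk reassign -s {old_sysid} -d {new_sysid}"


if __name__ == "__main__":
    # Made-up IDs for illustration only, not taken from the thread.
    print(build_reassign_command("0101234567", "0107654321"))
```

Double-checking which ID is the dead node before typing the command is the point here: swapping `-s` and `-d` would move ownership in the wrong direction.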

runic needle
fervent pine
feral temple
#

#subscribe

#

Keep us posted!

fickle basin
#

I looked through your message output. In summary:
- You have a 3040 running 8.1.4p8 7-mode
- One node has failed
- The surviving node is waiting for giveback
- There are conflicting disk reservations being detected.

runic needle
#

Ah yes, I guess you also need to run "storage release disks" to clear the SCSI persistent reservations.

#

Also in Maintenance mode.
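Pulling the Maintenance-mode commands mentioned in this thread into one place, here is a minimal Python sketch of a checklist plus an ownership check. The order of the steps is an assumption (it is simply the order the thread suggests them, not an order taken from NetApp documentation), the function names are hypothetical, and the ownership map is something you would transcribe by hand from `disk show -v`, not parsed output. As the thread says, reboot once the steps are done.

```python
from typing import Dict, List


def maintenance_mode_steps(old_sysid: str, new_sysid: str) -> List[str]:
    """Return the Maintenance-mode commands suggested in this thread,
    in one plausible (assumed) order."""
    return [
        "mailbox destroy local",
        "mailbox destroy partner",
        "storage release disks",  # clear reservations left by the dead node
        "disk show -v",           # read the sysIDs and current owners
        f"disk reassign -s {old_sysid} -d {new_sysid}",
        "disk show -v",           # re-check ownership afterwards
    ]


def disks_still_owned_by(ownership: Dict[str, str], sysid: str) -> List[str]:
    """Given a hand-transcribed map of disk name -> owner sysID,
    return (sorted) the disks that still belong to the given node."""
    return sorted(disk for disk, owner in ownership.items() if owner == sysid)


if __name__ == "__main__":
    old, new = "0101234567", "0107654321"  # hypothetical sysIDs
    for step in maintenance_mode_steps(old, new):
        print(step)
    # Hypothetical ownership map after the reassign, not real output.
    after = {"0a.16": new, "0a.17": new, "0b.32": old}
    print("still on dead node:", disks_still_owned_by(after, old))
```

An empty result from `disks_still_owned_by` for the dead node's sysID is a quick sanity check that the reassign covered every disk before moving on to the NFS configuration.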

visual wren
#

@fickle basin you're pretty much spot on. I will try the commands the community has been suggesting so I can at least mount his storage back, and then move him to his new server as well. I also found out he is running VMware 5, so he knew this was coming; he even bought the hardware last month to do it.

So I will put the active (working) node into Maintenance mode, run disk show, reassign the disks, and then see how I can attach that volume back to NFS using basic permissions.

runic needle
#

Don't forget to release the disk reservations.

fickle basin
visual wren
#

Yeah, I'm about to try and work on this. So should I try the 7-mode (no idea what that is) and/or "storage release disks"?

visual wren
#

Question.

So I assume that where it says Owner, this is what I need to get changed to AINA03C0?

visual wren
#

I got the DATA 😛

fervent pine
#

Nice, well played!

visual wren
#

I do want to say thanks to everyone. I will post the steps I took, including what I worked out after finding some basic documentation. This is making me want to learn NetApp now.

visual wren
#

Heck, I should've taken screenshots and done a blog post about this one.

mortal citrus
#

This is pretty great! Glad I was able to steer you over here and that the community came through

visual wren
#

Yes, this will help me save this customer and show him that RAID isn't a backup and that you can lose your data very quickly. Plus, running 2008/2012 servers isn't ideal, so we are now working up a plan to get them updated, set up backups, etc.

#

@mortal citrus Does NetApp offer a community or homelab version that doesn’t expire after 30 days?

I currently have two HP DL380 Gen9 (24-bay) servers running DataCore, which provide HA/failover, and I’m using synchronous replication for my homelab data. It works well and delivers solid IOPS, but I’m not a fan of the fact that DataCore runs on Windows.

I’d really like to give NetApp, or another storage vendor, a try if there’s a non-expiring lab option available.

mortal citrus
#

Unfortunately not (at least not right now) 🫤

#

I'm working on it. It's just slow going.

visual wren
#

It might not be a bad idea to raise that up the chain at some point. Offering a non-expiring community or homelab version would give people who can’t afford the full solution a chance to learn the platform, gain experience with it, and potentially contribute useful community feedback.

I’ve also considered going back to TrueNAS, but they’ve been making some changes lately. From what I can tell, if you want proper HA/failover on your own hardware, you’re pushed toward enterprise support and licensing, which makes it less appealing for a homelab setup.

mortal citrus
#

If only I could share details of the battle it's been so far...

#

Feedback like this is solid. I'm screenshotting it and adding it to the list haha. SEND MORE

visual wren
#

When I was working through the recent restore, I genuinely would not have been able to recover the customer’s data without the help of the community here. I came into the situation with very limited NetApp experience, and the willingness of people to step in, share their knowledge, and guide me through the process made all the difference.

That experience left a strong impression on me. It reinforced that there are vendors who build ecosystems around collaboration and shared expertise—not just transactional support models. Enterprise support absolutely has its place and clear value, but having an active, knowledgeable community that is willing to help others learn and troubleshoot speaks volumes about the strength and culture of the platform.

To me, a true community is about helping each other succeed whenever possible. It builds confidence in the technology, encourages people to invest time learning it more deeply, and fosters long-term loyalty. Those experiences shape how customers and engineers perceive a vendor far beyond feature comparisons or pricing discussions.

This is also why I take the time to write blog posts and document what I learn even if it feels like no one is reading them right now. The information is out there for the future, and at some point it may help someone else facing the same challenge.

It also serves as my personal knowledge base. Writing things down reinforces what I’ve learned and gives me something to reference when I encounter similar scenarios again. Over time, that shared knowledge, whether in forums, blogs, or internal documentation, creates a stronger, more capable community overall.

I believe that’s something worth highlighting.