# The entire client network will be down for maintenance without redundancy


strange pagoda

Our network team has informed us that they need to do some maintenance work, and all switches, including the redundant ones, will be unavailable for 15 minutes.

What should we do with our 8-node cluster? Do we have to shut down all 8 nodes (the entire cluster) in advance, and then bring them back up after they are done?

If yes, should we halt the nodes one by one?

royal grail

As long as the intercluster switches are up, the loss of the customer-facing data switches is not an issue for the cluster.

mossy thistle

Storage will be fine. It will not be so good for the clients.


(Not intercluster switches; technically, intercluster traffic would go over the front end. Those are intracluster switches, or just cluster switches, or the cluster interconnect.)

royal grail

intra/inter always trips me up

strange pagoda

If my understanding is correct, then from the storage perspective I don't need to do anything and can just sit and wait for the customer-facing switches to come back? All datastores, VMs, and NAS shares will be out of service, but there is pretty much nothing I can do about that, so the system admins/VMware admins may have to shut them down in advance, correct?

royal grail

Yup, storage will be fine. VMs will likely become unhappy if not shut down in advance. NFS/CIFS clients should recover on their own, but it's best to shut them down so you aren't chasing random issues afterwards.

strange pagoda

When the network comes back, do I need to do anything to reactivate the LIFs, or will the LIFs come back into service on their own?
Also, what about LUN connections? Are they the same as the NAS/VMware datastores, so that I don't need to do anything?

Sorry for being persistent; there are a lot of clients involved, so I just want to prepare as much as I can.

mossy thistle

Check your LIFs before the shutdown.
SAN LIFs never move.
Data (NAS) LIFs can move.

Where possible, set auto-revert to true if you want/need a LIF to go back to its home port when that port is available again.
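A minimal sketch of the LIF check described above, assuming you have already collected, for each LIF, its operational status, whether it is on its home port, and its auto-revert setting (the kind of information ONTAP's `network interface show` can report). The `Lif` record and its field names are hypothetical, and parsing real command output is not shown. The sketch flags LIFs that are down, plus NAS LIFs that are away from home with auto-revert off, since those will not return home on their own.

```python
from dataclasses import dataclass


@dataclass
class Lif:
    """Hypothetical LIF status record; field names are illustrative only."""
    name: str
    protocol: str      # "san" or "nas" (hypothetical classification)
    oper_up: bool      # operational status is up
    is_home: bool      # currently on its home port
    auto_revert: bool  # auto-revert enabled


def lifs_needing_attention(lifs):
    """Return (name, reason) pairs for LIFs an admin should look at."""
    issues = []
    for lif in lifs:
        if not lif.oper_up:
            issues.append((lif.name, "down"))
        elif lif.protocol == "nas" and not lif.is_home and not lif.auto_revert:
            # A NAS LIF that moved stays on the other port unless
            # auto-revert is enabled or someone reverts it manually.
            issues.append((lif.name, "not home, auto-revert off"))
    return issues


if __name__ == "__main__":
    snapshot = [
        Lif("nfs_lif1", "nas", oper_up=True, is_home=False, auto_revert=False),
        Lif("nfs_lif2", "nas", oper_up=True, is_home=False, auto_revert=True),
        Lif("iscsi_lif1", "san", oper_up=False, is_home=True, auto_revert=False),
    ]
    for name, reason in lifs_needing_attention(snapshot):
        print(f"{name}: {reason}")
```

Running the same check before the window and again afterwards gives you a simple before/after comparison.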

urban snow

The biggest problem is not the storage but the clients. If you have Linux clients on a datastore, and you rip the datastore out from under them for an extended period of time, you will corrupt some or most of your boot disks. Almost 100% guaranteed that you will suffer and your Linux admins will try and blame you. Plan on shutting your VMware infrastructure down (yeah, that will likely hurt). Windows clients have traditionally held up better but they'll also have issues. There are always writes to OS disks and they'll time out with I/O errors. It won't be pretty.
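To put rough numbers on the 15-minute window, here is a minimal sketch comparing the outage length with a guest's disk I/O timeout. The 180-second timeout is an assumed example value, not something stated in this thread; check what your own guests are configured with. Any write still stalled when that timeout expires is what surfaces in the guest as an I/O error.

```python
OUTAGE_S = 15 * 60        # the announced 15-minute window, in seconds
ASSUMED_TIMEOUT_S = 180   # assumption: example guest disk timeout; verify your own


def outage_exceeds_timeout(outage_s: float, guest_disk_timeout_s: float) -> bool:
    """True if stalled I/O would outlive the guest's disk timeout."""
    return outage_s > guest_disk_timeout_s


if __name__ == "__main__":
    exceeded = outage_exceeds_timeout(OUTAGE_S, ASSUMED_TIMEOUT_S)
    ratio = OUTAGE_S / ASSUMED_TIMEOUT_S
    # prints: 900 s outage vs 180 s timeout: exceeded=True (5.0x)
    print(f"{OUTAGE_S} s outage vs {ASSUMED_TIMEOUT_S} s timeout: "
          f"exceeded={exceeded} ({ratio:.1f}x)")
```

Even with a generous timeout, a full 15-minute loss of the datastore is several times longer than the guests will wait, which is why shutting them down beforehand is the safer plan.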

strange pagoda

Okay. So we don't need to do anything on the storage side before the network team shuts down the network connections.

As for the order in which to bring up all the components involved once the network connections are back in service, I am thinking of the following:

  1. Make sure all LIFs are up on the NetApp cluster.
  2. Bring up all ESXi hosts.
  3. Make sure all datastores are up in vSphere.
  4. Bring up all VMs.

Does this order seem alright to you?
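The proposed order is a dependency chain, so it can be written as a small dependency graph and sorted. Below is a minimal sketch using Python's standard `graphlib` module (Python 3.9+). The component names are illustrative labels for the four steps, not real identifiers. Reversing the same order gives a matching shutdown sequence (VMs first), in line with the earlier advice to shut the VMs down in advance.

```python
from graphlib import TopologicalSorter

# Each component maps to the components that must be up before it.
# Names are illustrative labels for the four steps in the list.
DEPENDENCIES = {
    "netapp_lifs": set(),
    "esxi_hosts": {"netapp_lifs"},
    "vsphere_datastores": {"esxi_hosts", "netapp_lifs"},
    "vms": {"vsphere_datastores"},
}


def bring_up_order(deps):
    """Return a start order in which every component follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def shut_down_order(deps):
    """Shutdown is the reverse: dependents stop before what they depend on."""
    return list(reversed(bring_up_order(deps)))


if __name__ == "__main__":
    print(" -> ".join(bring_up_order(DEPENDENCIES)))
    print(" -> ".join(shut_down_order(DEPENDENCIES)))
```

The sketch only orders the steps. In practice each step is a "make sure" gate, so you would confirm that a component is actually healthy before starting the next one.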