#Failover Policy for node management

finite zinc
#

I am trying to figure out the correct way to set up a failover policy for the node management interface.

The problem is that the new hardware has only one copper interface, e0M (my old hardware had an e0c I could use), so the system complains that I do not have a proper failover config. If I change the failover policy from local-only to sfo-partner-only, it seems to work, but then my systems alert with: Alert ID: SwitchUnreachable_Alert (switches are not reachable, although I can ping them fine from the interfaces).

I have both a C800 and a FAS8300. Neither has enough copper ports for management, and the management network cannot exist on the 802.1q trunk for the ifgroup.

What is the proper way to set up the management interfaces for the node? It seems like I am stuck between a rock and a hard place, without a good option here.

#

Oh, I am running 9.14P10.

solar canyon
#

Either simply set the failover-policy to disabled if you're fine with that or make sure you add the management VLAN to one of your data-ports. Then add it to the same failover-group where e0M resides.

#

You can't set it to sfo-partner-only since the node-mgmt LIF can only fail over locally.
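
For reference, the "disabled" option could look roughly like this on the cluster shell. This is only a sketch: the names in angle brackets are placeholders (not taken from this thread) for the admin SVM and the node management LIF, so check the actual names on your system first.

```
# List the current failover settings for the node management LIFs
network interface show -role node-mgmt -fields failover-policy,failover-group

# Disable failover for one node management LIF (placeholder names)
network interface modify -vserver <admin_svm> -lif <node_mgmt_lif> -failover-policy disabled
```

Repeat the modify for each node's management LIF, then rerun the show command to confirm the policy changed.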

crimson socket
#

If you set the failover policy to sfo-partner-only, the LIF actually becomes a cluster management LIF, not a node management LIF.

#

(see the test_mgmt lif in the screenshot)

finite zinc
#

Thanks. It's not an option to put the VLAN on the data ports, as they are on the other side of firewalls. I will try setting it to disabled, but it seems like a hardware design flaw: if e0M fails, there is no way to get to the node out of band, at least not without burning a 10gig port (and from what I understand there is no compatible 1g copper SFP for those ports).

brazen hawk
#

Connect the BMC port on each node; then you have an alternative to e0M for (low-level) node management. You can also still easily hop into the node shell of each node from any cluster login. The cluster management LIF can easily fail over among the e0M ports.

#

and there are systems for centralized access to serial port connections as well

south grove
#

The first thing I do on installs for customers is set the failover policies.

Typically the e0M are in their own broadcast domain.

Node management LIFs get their failover-policy set to disabled, and while I'm at it I set the cluster mgmt LIF to broadcast-domain-wide.

In the unlikely event there is a conflict with a vlan, it gets a little more complicated

I make sure the e0M/vlan are in the same broadcast domain.

I then create a cluster failover-group with just the e0M ports and another with just the vlan ports.

I modify the cluster mgmt LIF to use the e0M failover group, and the data LIFs use the VLAN failover group.
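
The steps above could be sketched on the cluster shell roughly as follows. Everything in angle brackets is a placeholder rather than a value from this thread (SVM, LIF, node, failover-group, ifgrp and VLAN names), and the exact target ports depend on your cabling, so treat this as an outline and not a tested procedure.

```
# Failover group containing only the e0M ports
network interface failover-groups create -vserver <admin_svm> -failover-group <e0M_fg> -targets <node1>:e0M,<node2>:e0M

# Failover group containing only the VLAN ports
network interface failover-groups create -vserver <data_svm> -failover-group <vlan_fg> -targets <node1>:<ifgrp>-<vlan_id>,<node2>:<ifgrp>-<vlan_id>

# Cluster management LIF: fail over only among the e0M ports
network interface modify -vserver <admin_svm> -lif <cluster_mgmt_lif> -failover-policy broadcast-domain-wide -failover-group <e0M_fg>

# Data LIF: fail over only among the VLAN ports
network interface modify -vserver <data_svm> -lif <data_lif> -failover-group <vlan_fg>

# Check the result
network interface show -fields failover-policy,failover-group
network port reachability show
```

The point of the two groups is that both sets of ports share one broadcast domain, while each LIF is still limited to the ports that make sense for it.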

finite zinc
#

I sort of have this. All e0M ports are in their own BD. I think the cluster mgmt LIF should be in this same BD but have its failover policy set to broadcast-domain-wide. I think I just need to change the node management interfaces from local-only to disabled. This should stop the complaints every Sunday. IIRC, failover-group is deprecated though?

#

My lab is almost done being set up, and I can test this there.

south grove
#

I thought that as well, but failover groups are not deprecated! In fact, short of creating ipspaces, they are the only realistic way to stop ONTAP from complaining about port reachability.

#

I just did an install last week where data and management are on the same VLAN. I put them all in the same BD and then created appropriate failover groups, calling them e0M-fg and vlan-126-fg.

south grove
#

Firewall policies are deprecated in favor of network interface service-policy.

brazen hawk
#

I sometimes miss the practical side of documentation where a few such approaches could be explained. It's easier to unwind and change now and much better than before ipspaces were re-introduced in 8.* something, but getting the network setup "correctly" (wrt future flexibility and general security) from the start is a good thing.

crimson socket
#

We almost never use failover groups. If you have multiple ports in the same BD, why restrict LIF failover artificially to a subset of them? Makes no sense, neither for data nor for Mgmt LIFs, it only reduces redundancy. The only use case was in the beginning where you wanted to prevent ONTAP from putting data LIFs on e0M but those times are long gone 🙂