#Sysmgr still needs work

ivory crane
#

Seems like sysmgr just doesn't want to work in failover situations with load... cluster mgmt interface isn't even on a node with load.

#

with one node of 4 down, everything just hangs

ionic sparrow
#

I don't think this is a matter of load, more of running into timeouts for certain API calls. Some CLI calls will also take longer to respond if nodes are down.

ivory crane
#

well, it hangs indefinitely, which would seem to be a poor timeout setting
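For illustration only, a minimal Python sketch of what a bounded timeout around a slow per-node call could look like, so that a down node produces an error instead of an indefinite hang. This is not how System Manager is implemented; `query_node` and the delay values are hypothetical stand-ins.

```python
import concurrent.futures
import time


def call_with_timeout(func, timeout_s, *args):
    """Run func(*args) in a worker thread; stop waiting after timeout_s seconds."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(func, *args)
    try:
        return ("ok", future.result(timeout=timeout_s))
    except concurrent.futures.TimeoutError:
        # The caller gets an answer even though the worker is still stuck.
        return ("timeout", None)
    finally:
        # Do not block on the stuck worker; let it finish in the background.
        pool.shutdown(wait=False)


def query_node(name, delay_s):
    """Hypothetical per-node API call; delay_s simulates a slow or down node."""
    time.sleep(delay_s)
    return f"{name}: healthy"


if __name__ == "__main__":
    print(call_with_timeout(query_node, 1.0, "node1", 0.01))  # responsive node
    print(call_with_timeout(query_node, 0.2, "node4", 1.0))   # simulated down node
```

The point of the design is that the caller reports "node4 timed out" and carries on with the remaining nodes, rather than waiting forever on one of them.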

arctic swallow
#

Open a case please.

#

You probably aren't wrong....

ivory crane
#

just how are you going to analyse this in a case? My threshold for opening cases is pretty high

noble owl
#

Troubleshoot: log in via a node mgmt interface, then check where the cluster_mgmt LIF is. Start migrating it to other nodes. Does it fail on one node or all nodes? Are the service processors available?
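The steps above can be sketched as a small Python helper that only builds the ONTAP CLI lines to paste into an SSH session on a node management interface. The helper name, the vserver name, the node names and the port are assumptions for illustration; check the command syntax against the command reference for your ONTAP release before running anything.

```python
def cluster_mgmt_checks(vserver, nodes, port, lif="cluster_mgmt"):
    """Return ONTAP CLI lines (strings only, nothing is executed)."""
    cmds = [
        # Where is the LIF right now, and is it on its home port?
        f"network interface show -vserver {vserver} -lif {lif} "
        f"-fields curr-node,curr-port,is-home",
    ]
    for node in nodes:
        # Try each node in turn to see whether one node or all nodes fail.
        cmds.append(
            f"network interface migrate -vserver {vserver} -lif {lif} "
            f"-destination-node {node} -destination-port {port}"
        )
    # Send the LIF back to its home port when done.
    cmds.append(f"network interface revert -vserver {vserver} -lif {lif}")
    # Check that the service processors are reachable.
    cmds.append("system service-processor show")
    return cmds


if __name__ == "__main__":
    # "cluster1", the node names and "e0c" are placeholder values.
    for line in cluster_mgmt_checks("cluster1", ["node1", "node2"], "e0c"):
        print(line)
```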

Look at the switch config.

Some of the cooler things I’ve seen include people being over-secure and disabling gratuitous ARP (GARP), which is necessary for LIF failover to work. That can be tough to find, but usually it would be disabled on the core switch where the VLAN (mgmt VLAN) is defined. Second, some people enable MAC security and limit MAC addresses per port. Most mistakenly make this limit too small. This also prevents failover.

ivory crane
#

lif ports failover correctly...

#

i normally use ssh... so i tend to notice... and the mgmt lif was on the partner node of the node i had to take down (failover, ofc)

#

the priority of the sysmgr processes could be a bit higher as well... when one wants to have a look at possible problems during a failover (capacity can be a problem at times), sysmgr is pretty useless when it hangs...

#

you have to start using cli commands to find out what is pushing you into a corner

noble owl
#

You can always add “other” management ports that are not on e0M and put them on tagged VLANs (these should be different from the access port of e0M, otherwise network port reachability will continually bark at you).

ivory crane
#

already done

#

a long time ago... vlan on a 10GE port... not the source of the problem

arctic swallow
#

We would need to look at logs.