#Sysmgr still needs work
I don't think this is a matter of load, more of running into timeouts for certain API calls. Some CLI calls will also take longer to respond if nodes are down.
well, it hangs indefinitely, which would seem to be a poor timeout setting
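If you script your own checks against the cluster during a failover, one way to avoid that kind of indefinite hang on your side is to put a hard deadline around each call. A minimal sketch, assuming a hypothetical helper name `call_with_deadline`; this is not how Sysmgr handles its API calls, just a generic client-side pattern:

```python
import concurrent.futures
import time


def call_with_deadline(fn, timeout_s, *args, **kwargs):
    """Run fn in a worker thread and give up waiting after timeout_s seconds.

    The worker thread is not killed; the caller simply stops waiting for it.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        raise TimeoutError(f"no response within {timeout_s}s") from None
    finally:
        # Do not block here waiting for a call that may never return.
        executor.shutdown(wait=False)


def slow_call():
    """Stand-in for a management call that responds late (hypothetical)."""
    time.sleep(0.5)
    return "late answer"


if __name__ == "__main__":
    try:
        call_with_deadline(slow_call, 0.05)
    except TimeoutError as exc:
        print("gave up:", exc)
```

The point is only that the caller decides how long to wait, so a node that is down produces a clear error instead of a frozen session.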
just how are you going to analyse this in a case? My threshold for opening cases is pretty high
Troubleshooting: log in using a node-mgmt LIF, then check where the cluster_mgmt LIF currently is. Start migrating it to other nodes. Does it fail on one node or on all nodes? Are the service processors available?
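The steps above could be written down as an ordered command list before you start, so nothing gets skipped mid-failover. A rough sketch that only builds the ONTAP CLI strings and does not run anything; the cluster name `cluster1`, the node names, and the port `e0c` are made-up placeholders, and the exact field names should be checked against your ONTAP version:

```python
def failover_check_plan(vserver, lif, targets):
    """Build the ordered CLI commands for a manual LIF migration test.

    targets is a list of (node, port) pairs; the LIF is migrated to each
    in turn and its location is checked after every move.
    """
    show = (
        f"network interface show -vserver {vserver} -lif {lif} "
        "-fields curr-node,curr-port,is-home"
    )
    # Start with the service processors and the LIF's current location.
    plan = ["system service-processor show", show]
    for node, port in targets:
        plan.append(
            f"network interface migrate -vserver {vserver} -lif {lif} "
            f"-destination-node {node} -destination-port {port}"
        )
        plan.append(show)  # confirm where the LIF ended up
    # Send the LIF back to its home port when done.
    plan.append(f"network interface revert -vserver {vserver} -lif {lif}")
    return plan


if __name__ == "__main__":
    # Placeholder names, not taken from any real cluster.
    targets = [("cluster1-01", "e0c"), ("cluster1-02", "e0c")]
    for cmd in failover_check_plan("cluster1", "cluster_mgmt", targets):
        print(cmd)
```

If the migrate step fails for only one destination node, that points at that node's port or switch port rather than at the cluster as a whole.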
Look at the switch config.
Some of the cooler things I’ve seen include people being overly secure and disabling gratuitous ARP (GARP), which is necessary for LIF failover to work. That can be tough to find, but usually it would be disabled on the core switch where the VLAN (the mgmt VLAN) is defined. Second, some people enable MAC security and limit the number of MAC addresses per port. Most mistakenly set this limit too small. This also prevents failover.
lif ports failover correctly...
i normally use ssh... so i tend to notice... and the mgmt lif was on the partner node of the node i had to take down (failover, ofc)
the priority of the sysmgr processes could be a bit higher as well... when one wants to have a look at possible problems during a failover (capacity can be a problem at times) sysmgr is pretty useless when it hangs...
you have to start using cli commands to find out what is pushing you into a corner
You can always add “other” management ports that are not on e0M and put them on tagged VLANs (these should be different from the VLAN on the e0M access port, otherwise network port reachability will continually bark at you).
already done
a long time ago... vlan on a 10GE port... not the source of the problem
We would need to look at logs.