#Issues with BMC on A800


queen marlin
#

We see the following messages on 4 new A800s. After upgrading to ONTAP 9.14.1P10, we can no longer SSH into the BMC after a while. Rebooting the SP from the ONTAP prompt resolves the issue temporarily.
The BMC port is connected to a Cisco switch through an RJ45 SFP.
Any ideas?
Thanks! Martijn

Record 102: Sun Jan 05 06:44:25.519188 2025 [IPMI.notice]: 090e | 02 | EVT: 0901ffff | Wrench_Port_Up | Assertion Event, "Device Enabled"
Record 103: Sun Jan 05 06:44:27.298322 2025 [IPMI.notice]: 090f | 02 | EVT: 0900ffff | Wrench_Port_Up | Assertion Event, "Device Disabled"
Record 104: Sun Jan 05 08:14:25.598406 2025 [IPMI.notice]: 0910 | 02 | EVT: 0901ffff | Wrench_Port_Up | Assertion Event, "Device Enabled"
Record 105: Sun Jan 05 08:14:27.354857 2025 [IPMI.notice]: 0911 | 02 | EVT: 0900ffff | Wrench_Port_Up | Assertion Event, "Device Disabled"
Record 106: Sun Jan 05 09:44:25.664825 2025 [IPMI.notice]: 0912 | 02 | EVT: 0901ffff | Wrench_Port_Up | Assertion Event, "Device Enabled"
Record 107: Sun Jan 05 09:44:27.394699 2025 [IPMI.notice]: 0913 | 02 | EVT: 0900ffff | Wrench_Port_Up | Assertion Event, "Device Disabled"
Record 108: Sun Jan 05 11:14:27.399175 2025 [IPMI.notice]: 0914 | 02 | EVT: 0901ffff | Wrench_Port_Up | Assertion Event, "Device Enabled"
Record 109: Sun Jan 05 11:14:29.409240 2025 [IPMI.notice]: 0915 | 02 | EVT: 0900ffff | Wrench_Port_Up | Assertion Event, "Device Disabled"
Record 110: Sun Jan 05 12:44:27.541674 2025 [IPMI.notice]: 0916 | 02 | EVT: 0901ffff | Wrench_Port_Up | Assertion Event, "Device Enabled"
Record 111: Sun Jan 05 12:44:29.429497 2025 [IPMI.notice]: 0917 | 02 | EVT: 0900ffff | Wrench_Port_Up | Assertion Event, "Device Disabled"

grave jetty
#

What BMC firmware version do you have?
I know there was a bug, fixed in 11.8 (I think), where the BMC port could become unresponsive.

queen marlin
#

10.9P2 - came bundled with ONTAP 9.14.1P10

plucky oar
#

@cerulean gulch you aware of this?

primal bobcat
#

strange... seems to happen precisely every 90 minutes. does the switch have 802.3az (eee) enabled maybe?
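
The 90-minute cadence can be checked directly from the records posted above. The following is only a rough sketch: it assumes the SEL output keeps exactly the line format shown in this thread, and the helper names (`parse`, `enabled_intervals_minutes`) are made up for illustration, not part of any NetApp tooling.

```python
import re
from datetime import datetime

# Records copied from the first post in this thread.
SEL_LINES = """\
Record 102: Sun Jan 05 06:44:25.519188 2025 [IPMI.notice]: 090e | 02 | EVT: 0901ffff | Wrench_Port_Up | Assertion Event, "Device Enabled"
Record 103: Sun Jan 05 06:44:27.298322 2025 [IPMI.notice]: 090f | 02 | EVT: 0900ffff | Wrench_Port_Up | Assertion Event, "Device Disabled"
Record 104: Sun Jan 05 08:14:25.598406 2025 [IPMI.notice]: 0910 | 02 | EVT: 0901ffff | Wrench_Port_Up | Assertion Event, "Device Enabled"
Record 105: Sun Jan 05 08:14:27.354857 2025 [IPMI.notice]: 0911 | 02 | EVT: 0900ffff | Wrench_Port_Up | Assertion Event, "Device Disabled"
Record 106: Sun Jan 05 09:44:25.664825 2025 [IPMI.notice]: 0912 | 02 | EVT: 0901ffff | Wrench_Port_Up | Assertion Event, "Device Enabled"
Record 107: Sun Jan 05 09:44:27.394699 2025 [IPMI.notice]: 0913 | 02 | EVT: 0900ffff | Wrench_Port_Up | Assertion Event, "Device Disabled"
Record 108: Sun Jan 05 11:14:27.399175 2025 [IPMI.notice]: 0914 | 02 | EVT: 0901ffff | Wrench_Port_Up | Assertion Event, "Device Enabled"
Record 109: Sun Jan 05 11:14:29.409240 2025 [IPMI.notice]: 0915 | 02 | EVT: 0900ffff | Wrench_Port_Up | Assertion Event, "Device Disabled"
Record 110: Sun Jan 05 12:44:27.541674 2025 [IPMI.notice]: 0916 | 02 | EVT: 0901ffff | Wrench_Port_Up | Assertion Event, "Device Enabled"
Record 111: Sun Jan 05 12:44:29.429497 2025 [IPMI.notice]: 0917 | 02 | EVT: 0900ffff | Wrench_Port_Up | Assertion Event, "Device Disabled"
""".splitlines()

# Assumed line layout: "Record N: <timestamp> [...] ... "Device Enabled|Disabled""
PATTERN = re.compile(
    r'^Record \d+: (\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2}\.\d{6} \d{4}) '
    r'.*"(Device (?:Enabled|Disabled))"$'
)


def parse(lines):
    """Return a list of (timestamp, state) tuples for every matching record."""
    events = []
    for line in lines:
        match = PATTERN.match(line.strip())
        if match:
            ts = datetime.strptime(match.group(1), "%a %b %d %H:%M:%S.%f %Y")
            events.append((ts, match.group(2)))
    return events


def enabled_intervals_minutes(events):
    """Minutes between consecutive "Device Enabled" events, rounded to 0.1."""
    ups = [ts for ts, state in events if state == "Device Enabled"]
    return [round((b - a).total_seconds() / 60, 1) for a, b in zip(ups, ups[1:])]


if __name__ == "__main__":
    print(enabled_intervals_minutes(parse(SEL_LINES)))
```

Each "Device Enabled" is followed by a "Device Disabled" roughly two seconds later, so only the "Enabled" events are compared against each other.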

queen marlin
#

Will check!

queen marlin
#

The switches use the 802.3z standard (Cisco 9K)

primal bobcat
#

have you tried disabling EEE (no power efficient-ethernet, as described here)?

#

802.3z is just Gigabit Ethernet over fiber (1000BASE-X); 802.3az is EEE

cerulean gulch
#

@plucky oar This is the first that I've heard of a BMC issue on the A800 in a long time. I'd have to dig deeper.

cerulean gulch
#

@queen marlin Without any other log information, this is like throwing darts blindfolded. Try the recommendations by Darkstar first. If that doesn't work, then I suspect that the Ethernet switch controlling the BMC is in a weird state. You can clear the weird state, but it involves removing the PCM to allow the caps to power-down, which resets the switch.

queen marlin
#

Just received a BURT (internal still) from support. It is a bug in the latest BMC firmware for the A800

queen marlin
#

It is public!!

queen marlin
#

Just a note: the workaround is incorrect. A BMC reboot is necessary.

cerulean gulch
#

Well now... that is far less disruptive than a PCM remove/discharge/reseat.

plucky oar
#

Can you reboot the BMC independently of a CF + soft reboot of the node?

high jungle
#

system service-processor reboot-sp -node node-01

#

(reboot any sp without an outage)

plucky oar
#

Learn something new every day

warped thistle
#

I wonder the same thing. We're experiencing the same behaviour on our A800s' BMC ports. Not the end of the world, but looking forward to a fix.
