#e0e links down

shrewd hawk

Hi *,
For two days now, all of the e0e ports in my 4-node cluster have been down with a negotiation failure.

```
e0e: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
    uuid: 22c47a36-d2b6-11eb-9958-d039ea2e15a2
    options=7ec07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6,TXRTLMT,HWRXTSTMP,NOMAP>
    ether d0:39:ea:2e:ab:c5
    nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
    media: Ethernet autoselect <rxpause,txpause> (autoselect <full-duplex,rxpause,txpause>)
    status: no carrier (Negotiation failure)
```

```
cluster::*> version
NetApp Release 9.13.1P9: Fri Apr 19 17:13:02 UTC 2024

cluster::*>
```

Any ideas how to fix this?
We already toggled the ports on both the NetApp and the switch side.
Lights and optics are okay.
Setting the port speed to 25G doesn't work because of https://kb.netapp.com/on-prem/ontap/OHW/OHW-KBs/Speed_modify_fails_as_it_is_not_able_to_modify_the_speed_greater_than_100MB
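When several nodes have to be checked, a rough Python sketch might help. It assumes the `ifconfig`-style text above has already been captured from each node (collection is not shown), and the sample below is the e0e output trimmed to three lines. `parse_link` and `link_down` are hypothetical helpers that extract the `media:` and `status:` fields and flag ports that report `no carrier`.

```python
import re

# Sample: the e0e output from the post above, trimmed to the relevant lines.
SAMPLE = """\
e0e: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
    media: Ethernet autoselect <rxpause,txpause> (autoselect <full-duplex,rxpause,txpause>)
    status: no carrier (Negotiation failure)
"""


def parse_link(text: str) -> dict:
    """Extract the interface name plus the media and status fields (hypothetical helper)."""
    info = {}
    header = re.match(r"(\S+):\s+flags=", text)
    if header:
        info["name"] = header.group(1)
    for raw in text.splitlines():
        line = raw.strip()
        for field in ("media", "status"):
            prefix = field + ":"
            if line.startswith(prefix):
                info[field] = line[len(prefix):].strip()
    return info


def link_down(info: dict) -> bool:
    """True when the status field reports no carrier, e.g. after a negotiation failure."""
    return info.get("status", "").startswith("no carrier")


info = parse_link(SAMPLE)
print(info["name"], "down" if link_down(info) else "up", "-", info["status"])
```

The check keys only on the `no carrier` prefix. This keeps the sketch independent of the reason shown in parentheses.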

keen hollow

What model is that? What SFP is plugged in on the NetApp side (might be visible in "sysconfig -av")? Can you share the port config on the switchport ("show run int e1/1" or similar)?

I remember there was an ONTAP update that disabled unsupported SFPs which had worked fine before, but I think that was longer ago than 9.13...

shrewd hawk

The SFP is visible in sysconfig -a:

```
        e0e MAC Address:    d0:39:ea:2e:ab:c5 (auto-unknown-fd-down)
            SFP Vendor:         Mellanox
            SFP Part Number:    332-00436
            SFP Serial Number:  MT2107FT1XXXX
```

The same SFP model in port e0g is still working against a second switch.
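To compare optics across ports (here, e0e against e0g), here is a minimal Python sketch that parses the `SFP ...:` lines of a `sysconfig -a` port block like the one above into a dict. `parse_sfp` and `same_optic_model` are hypothetical names. The e0g reading is made up for illustration because its output wasn't posted.

```python
# Sample: the e0e block from "sysconfig -a" in the post above.
SAMPLE = """\
        e0e MAC Address:    d0:39:ea:2e:ab:c5 (auto-unknown-fd-down)
            SFP Vendor:         Mellanox
            SFP Part Number:    332-00436
            SFP Serial Number:  MT2107FT1XXXX
"""


def parse_sfp(text: str) -> dict:
    """Map each 'SFP <field>:' line to {field: value} (hypothetical helper)."""
    sfp = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("SFP ") and ":" in line:
            field, value = line[len("SFP "):].split(":", 1)
            sfp[field.strip()] = value.strip()
    return sfp


def same_optic_model(a: dict, b: dict) -> bool:
    """Compare vendor and part number only; serial numbers differ per unit."""
    return (a.get("Vendor"), a.get("Part Number")) == (b.get("Vendor"), b.get("Part Number"))


e0e = parse_sfp(SAMPLE)
# Hypothetical e0g reading: same model, made-up serial number.
e0g = {**e0e, "Serial Number": "EXAMPLE-SERIAL"}
print(e0e["Part Number"], same_optic_model(e0e, e0g))
```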

keen hollow

For 25G, you need to make sure that the FEC settings on both sides match.

Maybe one of the switchports has different FEC parameters.

shrewd hawk

thx, we will check that ...

The strange thing is that we installed ONTAP updates, and three days later all e0e ports stopped working...

fast fractal

With ONTAP, I think there is a KB that suggests turning off FEC on the switch side. However, you can try the different options on the switch side: cl74 and cl91. I think 74 is generally for 25G and 91 is generally for 100G (and I may have that backwards!). If I recall correctly, unless something has changed, there is no way to modify the FEC mode in ONTAP (yet).

shrewd hawk

But the T6225-CR is a dual-port card... my issue is with the quad-port onboard NIC.

keen hollow

Maybe. The solution there is also what I suggested: try different FEC modes on the switchport (none; RS-FEC aka Reed Solomon aka Clause 108; FC-FEC aka FireCode aka BASE-R FEC aka Clause 74).
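The same FEC mode goes by many names, so a small Python sketch can normalize the names used in this thread to one label per mode and then check whether two sides agree. The alias table is an assumption built from this thread, with "off" added as a guess. Clause 91 (the 100G RS-FEC) is folded into `rs-fec` alongside Clause 108, the 25G RS-FEC. The check only compares configured labels. It cannot see what the NetApp side actually negotiates.

```python
# Alias table built from the names used in this thread, plus "off" (assumed).
# Keys are normalized: lowercase, hyphens replaced by spaces, whitespace collapsed.
ALIASES = {
    "none": "none",
    "off": "none",
    "rs fec": "rs-fec",
    "reed solomon": "rs-fec",
    "clause 108": "rs-fec",  # 25G RS-FEC
    "cl108": "rs-fec",
    "clause 91": "rs-fec",   # 100G RS-FEC
    "cl91": "rs-fec",
    "fc fec": "fc-fec",
    "firecode": "fc-fec",
    "base r fec": "fc-fec",
    "clause 74": "fc-fec",
    "cl74": "fc-fec",
    "fec74": "fc-fec",
}


def canonical_fec(name: str) -> str:
    """Return 'none', 'rs-fec' or 'fc-fec' for a known FEC mode name."""
    key = " ".join(name.lower().replace("-", " ").split())
    try:
        return ALIASES[key]
    except KeyError:
        raise ValueError(f"unknown FEC mode name: {name!r}") from None


def fec_matches(side_a: str, side_b: str) -> bool:
    """True when both ends are configured for the same FEC mode."""
    return canonical_fec(side_a) == canonical_fec(side_b)


print(fec_matches("FEC74", "FireCode"), fec_matches("RS-FEC", "none"))
```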

shrewd hawk

I'll give it a try, thanks so far

fast fractal

With 25G, connectivity issues are almost always FEC-related.

Even with Cisco! Connecting a Cisco 93180 to a server requires actually setting the FEC mode on the switch to get the darn link! Cisco to Cisco!

shrewd hawk

We rebooted the switch behind e0e; the port is still down.

FEC settings are okay (FEC74 enabled).

I removed e0e from its ifgrp and toggled the port; it is still down.

fast fractal

Yeah, but try disabling FEC on the switch (ONTAP does not allow you to modify FEC).

If you are using Twinax, try a different cable or swap cables. What does the switch say (show int eth1/6 or whatever)? Is it no link, errdisable, or something else?
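The trial-and-error advice in this thread is to disable FEC on the switch first, then try the other modes. This can be sketched in Python as a loop over switch-side modes. `find_working_fec` and its `try_mode` callback are hypothetical. The callback stands in for whatever applies a mode to the switchport, waits for negotiation, and reports whether the port came up. The lambda in the usage line only simulates a switch.

```python
from typing import Callable, Iterable, Optional

# Order taken from the thread: FEC off first, then RS-FEC, then FC-FEC (BASE-R).
DEFAULT_ORDER = ("none", "rs-fec", "fc-fec")


def find_working_fec(
    try_mode: Callable[[str], bool],
    modes: Iterable[str] = DEFAULT_ORDER,
) -> Optional[str]:
    """Return the first switch-side FEC mode that brings the link up, or None.

    try_mode is a hypothetical callback: apply the mode to the switchport,
    wait for negotiation, and return True if the port reports link up.
    """
    for mode in modes:
        if try_mode(mode):
            return mode
    # No mode worked: next steps from the thread are swapping the cable or
    # checking the switch port state (no link vs. errdisable).
    return None


# Simulated switchport where only FC-FEC brings the link up.
print(find_working_fec(lambda mode: mode == "fc-fec"))
```

The loop stops at the first mode that works, so the switchport is left in the mode that brought the link up.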

umbral crater

Which hardware model is your controller?

keen hollow

My guess is Katana (judging from the sysconfig), so either an A400, FAS8300, or FAS8700.

@shrewd hawk did you check against this KB? I.e., do you see the "bus stuck(I2C or data shorted)" messages in the boot log/dmesg output? If so, try a takeover/giveback, or even reseat the mezz card as suggested in the article.

shrewd hawk

FAS8300 and A400

shrewd hawk

Ports are back online; a Juniper switch reboot solved the issue.

keen hollow

I thought you already rebooted the switch and the issue persisted?

shrewd hawk

Yes, but there are multiple ways to reboot a Juniper switch. The first reboot we did only restarted the Junos VM.

upper pecan

Not sure if this is relevant, but we had a somewhat similar issue with a 100G port on a Cisco switch, which we "fixed" by forcing 100G speed on the switches... It also didn't go away when we rebooted the switches, only when we rebooted the ONTAP nodes, which was somewhat inconvenient... and this was with all-Cisco gear, switches and DAC cables...

fast fractal

Known issue with the Nexus switches. There is a comment in the RCF that in some cases you may need to force the speed.