# Can the SFP on breakout cable X66120-3 be replaced?


proud flax

We use X66120-3 breakout cables to connect a 4-node cluster (2 FAS8200 and 2 AFF A300 nodes) to N9K-C9336C-FX2 switches. We are seeing packet loss on large-MTU packets, and we were told that the SFPs on the nodes went bad and need to be replaced.

I thought the SFPs come attached to the cable and cannot be replaced on their own. If one needs to be replaced, we will have to replace the whole cable. Can you please confirm?

cobalt rover

Yep the whole cable...

orchid spoke

What do you mean by "large MTU packet loss"? If you only lose packets larger than 1500 bytes, then it's not an SFP or cable issue; it's most likely because jumbo frames are not enabled everywhere along the path.
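A rough way to tell the two cases apart is to ping across the cluster network with the don't-fragment bit set, once at a size that fits a standard 1500-byte MTU and once at a size that fits a 9000-byte jumbo MTU. The Python sketch below only does the arithmetic and the classification; it assumes IPv4 without IP options, and the helper names are hypothetical. Loss at every size suggests the physical layer; loss only above the standard size suggests an MTU mismatch.

```python
IPV4_HEADER_BYTES = 20  # assumes no IP options
ICMP_HEADER_BYTES = 8


def max_ping_payload(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def looks_like_mtu_mismatch(results: dict, standard_mtu: int = 1500) -> bool:
    """results maps ping payload size -> True if the ping succeeded.

    Returns True only for the size-dependent pattern: every standard-size
    ping succeeds and every larger one fails. Loss at standard sizes too
    (e.g. from CRC errors) returns False.
    """
    limit = max_ping_payload(standard_mtu)
    small = [ok for size, ok in results.items() if size <= limit]
    large = [ok for size, ok in results.items() if size > limit]
    return bool(small) and bool(large) and all(small) and not any(large)


print(max_ping_payload(1500), max_ping_payload(9000))  # 1472 8972
print(looks_like_mtu_mismatch({1472: True, 8972: False}))  # True
```

A single probe per size can be misleading with intermittent errors, so repeating each size several times before drawing a conclusion is the safer reading of the result.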

Also, did you connect the FAS8200 and the A300 to the same breakout cable? They're supposed to go on different switch ports, i.e. you need another cable for the second HA pair.

winged ocean

Um, if it is a cluster cable, you can connect up to 4 nodes per cable: one cable from switch-01 to the 4 nodes, and a second cable from switch-02 to the same 4 nodes. Those platforms use 10G by default, so they need only one 10G port per switch.

port 1 -> A300-01
port 2 -> A300-02
port 3 -> FAS8200-01
port 4 -> FAS8200-02

orchid spoke

Hm, really? I agree that it usually works technically, but all the documentation says that switch port 1 is for node 1, switch port 2 for node 2, and so on 🤔

I think I remember the reason: some settings are per physical port rather than per breakout port, so if the nodes require different settings (e.g. FEC), this might not work. Also, one broken cable will then immediately degrade four cluster nodes, which is something you probably don't want.
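The blast-radius argument can be made concrete with a small Python model. Cabling is a mapping from cable to the nodes it reaches; the cable and node names are hypothetical stand-ins for the two layouts discussed here (one breakout per switch for all four nodes, versus one breakout per HA pair per switch). In both layouts no node loses all of its cluster links when one cable fails, but the per-switch layout degrades all four nodes at once.

```python
from collections import Counter

# Hypothetical cable and node names for the two layouts being compared.
ONE_CABLE_PER_SWITCH = {
    "sw1-cable": ["a300-01", "a300-02", "fas8200-01", "fas8200-02"],
    "sw2-cable": ["a300-01", "a300-02", "fas8200-01", "fas8200-02"],
}
ONE_CABLE_PER_HA_PAIR = {
    "sw1-cable-a300": ["a300-01", "a300-02"],
    "sw1-cable-fas": ["fas8200-01", "fas8200-02"],
    "sw2-cable-a300": ["a300-01", "a300-02"],
    "sw2-cable-fas": ["fas8200-01", "fas8200-02"],
}


def degraded_nodes(cabling: dict, failed_cable: str) -> list:
    """Nodes that lose one of their cluster links when `failed_cable` fails."""
    return sorted(set(cabling[failed_cable]))


def isolated_nodes(cabling: dict, failed_cable: str) -> list:
    """Nodes left with no working cluster link at all."""
    links = Counter(node for cable, nodes in cabling.items()
                    if cable != failed_cable for node in nodes)
    every_node = {node for nodes in cabling.values() for node in nodes}
    return sorted(node for node in every_node if links[node] == 0)


print(degraded_nodes(ONE_CABLE_PER_SWITCH, "sw1-cable"))        # all four nodes
print(degraded_nodes(ONE_CABLE_PER_HA_PAIR, "sw1-cable-a300"))  # only the A300 pair
print(isolated_nodes(ONE_CABLE_PER_SWITCH, "sw1-cable"))        # []
```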

winged ocean

Yeah, but think about some of the newer configs,
like the 1.12-2-cluster configs:
you have exactly one 10G breakout and one 25G breakout.

That doc you refer to needs to be seriously updated.

It’s. Just. Wrong.

orchid spoke

Yes, having a definitive answer to that question somewhere in the docs would be awesome

cobalt rover

From the switch's point of view, the 40G-to-4x10G cable is more or less shown and handled as separate ports; only the "physical" layer is somewhat different... Anyway, why not just install the X1144A 2-port 40G NIC in each node and be done with it? I can't imagine they are that expensive anymore. They might be OOS with NetApp, but maybe a broker can help out 🙂

winged ocean

I agree. I have converted plenty of FAS8200/A300 systems to use 40G cluster ports, especially when mixing them with bigger nodes. However, when doing upgrades, I'll stick with 10G: loan out a couple of breakout cables and retrieve them when complete.

The problem is that the 40G card only has a total of 80 Gb/s of throughput (one full bidirectional port). Using both ports is certainly better than 10G, but there is that issue.
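The arithmetic behind that comparison, as a small Python sketch. It sums both directions of a full-duplex link, and it takes the 80 Gb/s card limit from the post above as an input rather than as a verified spec; the function name is hypothetical.

```python
from typing import Optional


def aggregate_gbps(ports: int, speed_gbps: int, card_cap_gbps: Optional[int] = None) -> int:
    """Nominal throughput of a node's cluster ports, both directions summed."""
    total = ports * speed_gbps * 2  # full duplex: transmit + receive
    if card_cap_gbps is not None:
        total = min(total, card_cap_gbps)  # card-level limit, if any
    return total


print(aggregate_gbps(2, 10))                    # 40: two 10G breakout lanes, one per switch
print(aggregate_gbps(1, 40))                    # 80: one 40G port
print(aggregate_gbps(2, 40, card_cap_gbps=80))  # 80: both 40G ports, capped by the card
```

So, under these assumptions, the second 40G port adds redundancy rather than bandwidth, while still doubling what the two 10G breakout lanes offer.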

proud flax

NetApp Support asked us to reseat both cluster breakout cables on both ends, so I am going to let them conduct the troubleshooting. But since this will be the first time we do something like this since implementing the two N9K-C9336C-FX2s, can you please help me make sure we disconnect only one path at a time?

We use 2 breakout cables for all 4 nodes, connected to ports Eth1/9/1-Eth1/9/4 on each switch. All connections/ports are listed in the next message.

I will find out the corresponding Nexus commands later, but to illustrate:

  1. Shut down ports Eth1/9/1-Eth1/9/4 on cluster-sw-1.
  2. Reseat the cable end that connects to the switch.
  3. Reseat the other cable ends, which connect to 4 ports on different nodes.
  4. Bring ports Eth1/9/1-Eth1/9/4 back up on cluster-sw-1.
  5. Repeat the same steps on cluster-sw-2.

Do the above steps look good to you?

All connections:
cluster-sw-1
Device-ID Local-Intrfce Hldtme Capability Platform Port-ID
cluster-04 Eth1/9/1 163 H AFF-A300 e0a
cluster-03 Eth1/9/2 132 H AFF-A300 e0a
cluster-06 Eth1/9/3 132 H FAS8200 e0a
cluster-05 Eth1/9/4 155 H FAS8200 e0a

cluster-sw-2
Device-ID Local-Intrfce Hldtme Capability Platform Port-ID
cluster-06 Eth1/9/1 153 H FAS8200 e0b
cluster-04 Eth1/9/2 125 H AFF-A300 e0b
cluster-05 Eth1/9/3 176 H FAS8200 e0b
cluster-03 Eth1/9/4 154 H AFF-A300 e0b

Node/Protocol   Local Port   Discovered Device (LLDP: ChassisID)   Interface   Platform


cluster-05 /cdp
e0a cluster-sw-1(serial-1) Ethernet1/9/4 N9K-C9336C-FX2
e0b cluster-sw-2(serial-2) Ethernet1/9/3 N9K-C9336C-FX2
cluster-04 /cdp
e0a cluster-sw-1(serial-1) Ethernet1/9/1 N9K-C9336C-FX2
e0b cluster-sw-2(serial-2) Ethernet1/9/2 N9K-C9336C-FX2
cluster-03 /cdp
e0a cluster-sw-1(serial-1) Ethernet1/9/2 N9K-C9336C-FX2
e0b cluster-sw-2(serial-2) Ethernet1/9/4 N9K-C9336C-FX2
cluster-06 /cdp
e0a cluster-sw-1(serial-1) Ethernet1/9/3 N9K-C9336C-FX2
e0b cluster-sw-2(serial-2) Ethernet1/9/1 N9K-C9336C-FX2
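Since each node should show e0a on cluster-sw-1 and e0b on cluster-sw-2, the switch-side rows pasted above can be cross-checked mechanically. A Python sketch, assuming one neighbor per line with the device ID first, the local interface second, and the remote port last (in real switch output, long device IDs may wrap onto a second line and need cleanup first); the function names are hypothetical.

```python
from collections import Counter

# Switch-side rows as pasted above: device, local interface, holdtime,
# capability, platform, remote port.
SW1 = """\
cluster-04 Eth1/9/1 163 H AFF-A300 e0a
cluster-03 Eth1/9/2 132 H AFF-A300 e0a
cluster-06 Eth1/9/3 132 H FAS8200 e0a
cluster-05 Eth1/9/4 155 H FAS8200 e0a"""

SW2 = """\
cluster-06 Eth1/9/1 153 H FAS8200 e0b
cluster-04 Eth1/9/2 125 H AFF-A300 e0b
cluster-05 Eth1/9/3 176 H FAS8200 e0b
cluster-03 Eth1/9/4 154 H AFF-A300 e0b"""


def parse_rows(text: str) -> list:
    """Return (node, switch_interface, node_port) for each one-line row."""
    rows = []
    for line in text.strip().splitlines():
        fields = line.split()
        rows.append((fields[0], fields[1], fields[-1]))
    return rows


def cabling_problems(rows: list, expected_port: str) -> list:
    """Duplicate nodes, or nodes using an unexpected port, on one switch."""
    problems = []
    for node, count in Counter(node for node, _, _ in rows).items():
        if count > 1:
            problems.append(f"{node} appears {count} times")
    for node, intf, port in rows:
        if port != expected_port:
            problems.append(f"{node} uses {port} on {intf}, expected {expected_port}")
    return problems


sw1, sw2 = parse_rows(SW1), parse_rows(SW2)
print(cabling_problems(sw1, "e0a"), cabling_problems(sw2, "e0b"))  # [] []
print({n for n, _, _ in sw1} == {n for n, _, _ in sw2})            # True
```

For this cluster the check comes back clean: every node has exactly one link per switch, so pulling one whole breakout cable should leave each node with its other path.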

winged ocean

Just do this after shutting down/enabling each cable:

On the NetApp, run
net port show -ipspace Cluster
Check the health status. If anything is degraded, wait 3-5 minutes and try again.

Then check
net int show -vserver Cluster

Make sure all cluster LIFs return to their home ports.

DO NOT CONTINUE until all LIFs are home. When ports are degraded, the LIFs won't go home. Be patient and wait.

Then repeat the process with the second cable.
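The "check, wait a few minutes, check again" loop above is a polling pattern. A minimal Python sketch, assuming you already have some way to turn `net port show` / `net int show` output into records; the field names `health` and `is_home` are hypothetical, and nothing here talks to ONTAP directly. The defaults (retry every 3 minutes, give up after 15) are just one reading of "wait 3-5 minutes".

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout_s: float = 900,
               interval_s: float = 180,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Re-run `check` every `interval_s` seconds until it passes or time runs out."""
    waited = 0.0
    while True:
        if check():
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s


def cluster_ready(ports: list, lifs: list) -> bool:
    """Hypothetical records: ports carry 'health', LIFs carry 'is_home'."""
    return (all(p["health"] == "healthy" for p in ports)
            and all(lif["is_home"] for lif in lifs))
```

In use, something like `wait_until(lambda: cluster_ready(get_ports(), get_lifs()))`, where `get_ports` and `get_lifs` are hypothetical helpers you would write yourself, gates each step; a `False` return means stop and investigate rather than move on to the second cable.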

Switch commands:
configure terminal
interface ethernet1/9/1-4
shutdown
(do cable maintenance)
no shutdown
end

Also use

show cdp neighbors

to verify the ports are coming up, and

show interface ethernet1/9/1-4

to look at port data.

proud flax

Thank you for such detailed answers!
I have one more question; please help me confirm:

We use 2 breakout cables for the cluster connections to all 4 nodes in the cluster (detailed connections as listed above). With this layout, if I reseat the 4 SFP ends one at a time on the storage node side, it's going to be non-disruptive, right? And what if I reseat the single QSFP end of a breakout cable on the switch side: will that cause an interruption?

So, if these are all non-disruptive, then I don't need to shut down the interfaces, correct?

cobalt rover

It's always a good idea to down the ports manually, as mentioned earlier. About the breakout cables: I would think the two cables are connected to different switches, so if you pull the 4 x 10G ends or the 1 x 40G end, the result should be the same. It should not matter which one you pull, because each node should have two connections, made with two different breakout cables (hope that makes sense?).

In short, provided the cables are connected correctly, it should be possible to pull either cable (one at a time, of course). Always keep an eye on the status of the ports on the NetApp (as described above) and take it slow 🙂

If it were me, I would migrate the cluster LIFs manually on the NetApps (net int migrate.....; you may need to disable auto-revert). Then, once all the cluster LIFs are on the other ports, I would down the ports from the switch side (both the 10G and the 40G ports), pull out the 40G and 10G ends at the same time, and reseat them. Then bring the switch ports back up and revert the cluster LIFs (net int revert *).

But if you are not comfortable with this and the system is critical, I would leave this to a professional 🙂 Depending on your service contract with NetApp, they should be able to send someone out to help.
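The manual-migration approach can be sketched as a Python generator of ONTAP CLI lines, so the full list can be reviewed before anything is typed. It assumes the `network interface modify` / `migrate` / `revert` syntax with `-vserver Cluster`, that each cluster LIF moves to the other cluster port on the same node, and that auto-revert was enabled before you started. The LIF names are hypothetical (check `net int show -vserver Cluster` for yours); confirm the syntax against the command reference for your ONTAP release.

```python
def evacuate_commands(lifs: list, to_port: str) -> list:
    """ONTAP CLI lines to move cluster LIFs off the ports being serviced.

    lifs: (lif_name, node) pairs for the LIFs currently on those ports.
    """
    cmds = ["network interface modify -vserver Cluster -lif * -auto-revert false"]
    for lif, node in sorted(lifs):
        cmds.append(
            f"network interface migrate -vserver Cluster -lif {lif} "
            f"-destination-node {node} -destination-port {to_port}"
        )
    return cmds


def restore_commands() -> list:
    """Send the LIFs home, then turn auto-revert back on (assumed it was on before)."""
    return [
        "network interface revert -vserver Cluster -lif *",
        "network interface modify -vserver Cluster -lif * -auto-revert true",
    ]


# Hypothetical LIF names for the e0a side of two of the nodes in this thread.
for line in evacuate_commands([("cluster-03_clus1", "cluster-03"),
                               ("cluster-04_clus1", "cluster-04")], "e0b"):
    print(line)
```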

winged ocean

Overall:
Switch 1
Shut down eth1/9/1-4.
Reseat all ends of that cable only.
“no shut” those ports and wait until you have verified they are online (net port show / net int show as above). Then repeat on the second switch.

No disruption if you do one cable at a time and verify before proceeding.

Did I mention to verify frequently?

Verify frequently to avoid disruption!

proud flax

So, if I do one cable at a time without shutting down any interfaces, that should be safe?

boreal tundra

That's how we have done it in the past.

If we had a spare switch port, we'd move one of the breakouts over at a time, but when we had to use the same port, we'd just shut that port down on the switch, swap the cable out, and then bring the port back up.
You can do it however works for you.

winged ocean

I was going to suggest that, but since it is a breakout cable, you will likely end up with a degraded health status. Why? Unplug the QSFP end (at the switch) and reinsert it: fine. But as soon as you pull one of the SFP ends, that will appear as link flapping, and the port will be degraded for about 5 minutes.

That’s why I suggested shutting the ports down on the switch, reseating all five ends of the cable, and then turning the ports back on.

proud flax

To report back on what I finally did: I simply unplugged each of the 4 SFP ends and then, finally, the QSFP end, one at a time. During all 5 reseatings, I kept watching the connections. There was no interruption at any point, which is good. However, we still see a large number of CRC errors on all 4 e0a ports across all 4 nodes, which use the 4x10G copper breakout cable connected to the switch. I even replaced the entire breakout cable with a new one: same errors. All e0b ports and their connections to the other switch are fine.
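When every erroring port is cabled to the same switch and the errors survive a cable swap, the most likely shared element is on the switch side (the port or its configuration) rather than any single node SFP. A Python sketch of that reasoning: diff two snapshots of per-port CRC counters (collected however you like, for example from the switch's `show interface counters errors`) and map the growing ports to switches. The counter values below are illustrative only, not from this thread; the port-to-switch mapping follows the CDP output posted earlier, and the function names are hypothetical.

```python
def crc_increase(before: dict, after: dict) -> dict:
    """Per-port growth of the CRC error counter between two snapshots."""
    return {port: after[port] - before.get(port, 0)
            for port in after if after[port] > before.get(port, 0)}


def switches_with_errors(increase: dict, port_to_switch: dict) -> set:
    """Which switches the erroring ports are cabled to."""
    return {port_to_switch[port] for port in increase}


# Cabling from the CDP output: every e0a is on cluster-sw-1, every e0b on cluster-sw-2.
PORT_TO_SWITCH = {f"cluster-0{n}:{p}": sw
                  for n in (3, 4, 5, 6)
                  for p, sw in (("e0a", "cluster-sw-1"), ("e0b", "cluster-sw-2"))}

# Illustrative counters only: errors growing on every e0a port, none on e0b.
before = dict.fromkeys(PORT_TO_SWITCH, 0)
after = {port: (50 if port.endswith("e0a") else 0) for port in PORT_TO_SWITCH}
print(switches_with_errors(crc_increase(before, after), PORT_TO_SWITCH))  # {'cluster-sw-1'}
```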

Now, based on your message above, do you think I should try reseating them again using your suggested method: shutting the ports down on the switch first, reseating all 5 ends, then turning the ports back on? I didn't do it this way because I was a little confused about where I should shut the ports down, on the node side or the switch side. Anyway, since the data center is at a remote site, I'm not sure it is worth trying...

winged ocean

Honestly, as a troubleshooting technique, I would suggest copying the eth1/9/1-4 config to another port that is not in use. I'm not sure which RCF you are using, but you could possibly break out another port and try moving the cable. It's possible that there is something wrong with the switch port.

Is the breakout cable a NetApp cable or Cisco cable? Or something else?

proud flax

It’s a NetApp X66120-3 breakout cable. As far as I know, only ports Eth1/9/* can be used, probably because of the RCF? What other ports can be used?

winged ocean

If you have an unused QSFP28 port, you can update the config to test:

interface breakout module 1 port 8 map 10g-4x

You would need to copy the config from eth1/9/1 to the new breakout:

configure terminal
interface ethernet1/8/1-4
(paste per-port details here)
end
copy running-config startup-config

NetApp Support should be able to help you with this troubleshooting technique.

Convert the port, move the breakout cable, retest.

If needed, move the cable back, revert port 8 to non-breakout, and restore the port config.
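The convert/test/revert steps can be generated as reviewable NX-OS command lists. A Python sketch, assuming the lane configuration is copied by hand from `show running-config interface ethernet1/9/1` and that the spare port's original config is saved before the test; the function names and list contents are placeholders, and the output should be reviewed (ideally with NetApp Support) before it is pasted into a switch.

```python
def breakout_test_commands(module: int, port: int, lane_config: list) -> list:
    """NX-OS lines to split a spare QSFP port into 4x10G and configure its lanes."""
    return [
        "configure terminal",
        f"interface breakout module {module} port {port} map 10g-4x",
        f"interface ethernet{module}/{port}/1-4",
        *lane_config,  # copied from the working lanes on port 9
        "end",
        "copy running-config startup-config",
    ]


def breakout_revert_commands(module: int, port: int, port_config: list) -> list:
    """NX-OS lines to undo the breakout and restore the port's original config."""
    return [
        "configure terminal",
        f"no interface breakout module {module} port {port} map 10g-4x",
        f"interface ethernet{module}/{port}",
        *port_config,  # whatever the port had before the test
        "end",
        "copy running-config startup-config",
    ]


lanes = ["<lines from show running-config interface ethernet1/9/1>"]  # placeholder
print("\n".join(breakout_test_commands(1, 8, lanes)))
```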

If I were at my computer, I could give you exact commands. Sorry.

winged ocean

Just curious, which rcf file are you using?

proud flax

The RCF file that we are using: NX9336C-FX2-RCF-v1.12a-Shared.txt
When you are at your computer, please show me the details and steps to copy the configuration from port 9 to port 8. Appreciate it!
conf
Int eth 8/1/1-4
(paste per details here) ???
Copy run start ???

Another note:
The output of "show running-config" shows the following. Does that mean port 9, and only port 9, can be used in our case? If so, then I cannot use port 8 or any other ports.
* Port 9: 10GbE breakout Intra-Cluster Ports, int e1/9/1-4

winged ocean

OK @proud flax, provided port 8 is currently NOT in use, you should be able to run:

conf
interface breakout module 1 port 8 map 10g-4x
interface e1/8/1-4
inherit port-profile CLUSTER_HA
priority-flow-control mode on
service-policy type qos input HA_POLICY
end
copy run start

That gets port 8 ready as a breakout. Move the cable from port 9 to port 8.
When you are done testing, you can just move the cable back from port 8 to port 9, then clean up:

conf
no interface breakout module 1 port 8 map 10g-4x
interface e1/8
inherit port-profile CLUSTER_HA
priority-flow-control mode on
service-policy type qos input HA_POLICY
end
copy run start

That should be it.

proud flax

We resolved it by reapplying the RCF. Thank you for all your messages!