#Performance implications of using mixed cluster interconnects


teal nacelle
#

We have a 4-node cluster using N9K-C9336C-FX2 switches. The 2 x A300 nodes use 2 x 10G, and the 2 x C60 nodes use 2 x 100G. This configuration is supported by the NetApp account team, and by HWU as well. Because the C60 is newer, we felt that from a cost and efficiency point of view we should use 100G. We have other clusters with mixed speeds as well, and don't see issues there.

The link below indicates that issues can be caused by mixing 40G and 10G:
https://kb.netapp.com/on-prem/ontap/OHW/OHW-KBs/Performance_implications_of_using_mixed_40G_and_10G_cluster_interconnects

My question is: can mixed-speed cluster interconnects like ours cause performance issues?

wary jungle
#

Technically, if you are super concerned, you could purchase a couple of 4-port 10/25G cards and use those. The big difference between the C60 and the A300: the A300 has HA internally on the midplane, whereas on the C60 it is external and shares the cluster network connections. That requires Priority Flow Control (PFC) to be enabled on the switches (which the CN1610 did NOT support), and as long as you have a reasonably current RCF on your 9336 switches, you could break out 100G -> 4x25G or 40G -> 4x10G.

With that said, unless you are actually having issues, I would not worry about it.

teal nacelle
#

Yes, we have a very bad performance issue. OCUM indicates high latency on the intercluster switches.

As I said, we have run mixed speeds for many years on different clusters without issues. Why would this particular combination cause issues? Fundamentally, why does mixed speed cause issues according to that KB?

wary jungle
#

Another option, if you have slots in the A300, is to put a dual 40G card in.

#

It comes down to the overall bandwidth. If you keep data local to a node and access it through that node, in theory there shouldn't be a problem. But if your data is on the C60 and you access it from a LIF on the A300, you may be overloading the A300's network (200G down to 20G of total possible bandwidth).

You could put a dual 40G card in the A300 and downgrade the C60 to 40G using 40G twinax. Then the networking would be symmetrical.
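
To put rough numbers on that, here is a minimal Python sketch of the arithmetic. It only compares best-case line rates from the port counts mentioned in this thread; the function names are made up for illustration, and it ignores real-world factors such as the HA traffic the C60 also carries on those ports.

```python
def node_bandwidth(ports: int, gbps_per_port: int) -> int:
    """Best-case cluster-interconnect bandwidth of one node, in Gbit/s."""
    return ports * gbps_per_port


def mismatch_ratio(sender_gbps: int, receiver_gbps: int) -> float:
    """How many times more the sender can offer than the receiver can absorb."""
    return sender_gbps / receiver_gbps


# Current layout from this thread: C60 on 2 x 100G, A300 on 2 x 10G.
c60_now = node_bandwidth(2, 100)
a300_now = node_bandwidth(2, 10)
current_ratio = mismatch_ratio(c60_now, a300_now)

# Suggested layout: dual 40G card in the A300, C60 ports run at 40G.
c60_40g = node_bandwidth(2, 40)
a300_40g = node_bandwidth(2, 40)
symmetric_ratio = mismatch_ratio(c60_40g, a300_40g)

print(f"current:   {c60_now}G -> {a300_now}G, ratio {current_ratio:.0f}:1")
print(f"symmetric: {c60_40g}G -> {a300_40g}G, ratio {symmetric_ratio:.0f}:1")
```

A 10:1 ratio does not mean you will see congestion, only that the C60 side can burst far more than the A300 side can take in when traffic crosses the cluster network.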

#

If you have lots of NFS traffic, check the output of “nfs connected-clients show” and review the remote-reqs column. That column counts data access that is not local to the node/volume/aggr/LIF.
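
If you copy the request counts out of that output, a quick way to see which clients are mostly going remote is to compute the remote share per client. This is only a sketch: it assumes you also have a local request count next to remote-reqs, and the client names and numbers below are hypothetical placeholders, not output from any real cluster.

```python
def remote_share(local_reqs: int, remote_reqs: int) -> float:
    """Fraction of a client's requests served over the cluster interconnect."""
    total = local_reqs + remote_reqs
    return remote_reqs / total if total else 0.0


# Hypothetical (local, remote) request counts, typed in by hand.
sample_counts = {
    "client-a": (9000, 1000),
    "client-b": (100, 900),
}

shares = {
    client: remote_share(local, remote)
    for client, (local, remote) in sample_counts.items()
}

# List the clients with the highest remote share first.
for client, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{client}: {share:.0%} remote")
```

Clients near the top of that list are the ones whose traffic crosses the cluster network most often.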

teal nacelle
#

Thanks for all the options.

Achieving those requires time and effort. I wanted to know whether this is really the problem, and whether there are other root causes we can check and correct.

The data access is not local. We know.

wary jungle
#

The remote access is very likely the culprit. If you can somehow manage to get your data local, that may alleviate the issue.

teal nacelle
#

Why do you believe remote/indirect access is the culprit? In many discussions, NetApp and most people have said it should only add microseconds of delay and shouldn't cause issues.

wary jungle
#

See the bandwidth above. Depending on the amount of traffic going to the switch at 100G, it will funnel into a 10G pipe.

#

That's not always the case, but it happens.

teal nacelle
#

I checked the KB above again.
It applies to older CN1610 cluster-switch setups, which is why it recommends using a Cisco 3132Q and breakout cables. That scenario doesn’t apply to our current Nexus-based cluster network.

Can you please verify that my understanding of the KB is correct?