#AFF A700 very poor throughput on 40g

final flax
#

Hi all,

We are fighting a very strange performance issue on an AFF A700.
This AFF A700 has 2x 40g dual-port NIC cards (XL710/X91440A) configured in non-breakout mode.
The first port of each card is used as a cluster port and the second one is meant to be used as a front-end port.

We have already configured these front-end ports and are running workload on them.
But we got nowhere near the expected throughput, so we ran several tests to rule out the switch or other network hardware:

  • Direct Connection from Server with 40g to A700 with 40g

    • Connected iSCSI LUN gave throughput of 400-500MB/s
    • iperf/test-link gave 400-500MB/s, independent of which side is server or client
  • Volume move from one aggr to another

    • even when system is nearly idle we only get around 1GB/s (we see about 500MB/s per cluster-port)
  • running test-link within the cluster-network from one node to another

    • gave us around 1500MB/s (we even see around 1000MB/s on an old 8040 with only 10g cluster-ports)

In production the "old" 10g ports are providing better throughput than the 40g ones.
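
For scale, here is a rough sketch of how the figures above compare with nominal line rate. It assumes decimal units (1 Gbit/s = 125 MB/s) and ignores protocol overhead; the function names and labels are only shorthand for the tests listed above, not anything ONTAP reports.

```python
# Rough comparison of reported throughput against nominal line rate.
# Assumption: decimal units, no protocol overhead taken into account.

def line_rate_mb_per_s(gbit_per_s: float) -> float:
    """Nominal line rate in MB/s (1 Gbit/s = 125 MB/s)."""
    return gbit_per_s * 1000 / 8


def utilization_pct(observed_mb_per_s: float, gbit_per_s: float) -> float:
    """Observed throughput as a percentage of the nominal line rate."""
    return 100 * observed_mb_per_s / line_rate_mb_per_s(gbit_per_s)


# (label, observed MB/s using the upper end of each reported range, link Gbit/s)
measurements = [
    ("iSCSI / iperf, direct 40g", 500, 40),
    ("volume move, per cluster-port", 500, 40),
    ("test-link node to node, 40g", 1500, 40),
    ("test-link on old 8040, 10g", 1000, 10),
]

for label, observed, link in measurements:
    pct = utilization_pct(observed, link)
    print(f"{label:32s} {observed:5d} MB/s  {pct:5.1f}% of {link}g")
```

By this rough measure the 40g ports sit at roughly 10-30% of nominal rate, while the old 10g cluster ports reach about 80%.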

Any ideas?

austere jolt
#

MTU? Seeing any retransmissions, CRC or runt errors? Congestion problems?

final flax
#

MTU is 9000 end-to-end. The ifstat counters look fine and there are no drops in e.g. netstat.
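
As a side note on why MTU alone would be unlikely to explain a gap this large, here is a minimal sketch of per-frame payload efficiency. It assumes TCP over IPv4 without options and standard Ethernet framing without a VLAN tag; the constant and function names are illustrative only.

```python
# Per-frame efficiency for TCP over IPv4 on Ethernet.
# Assumptions: no IP/TCP options, no VLAN tag.

ETH_OVERHEAD = 14 + 4 + 8 + 12  # header + FCS + preamble/SFD + inter-frame gap
IP_TCP_HEADERS = 20 + 20        # IPv4 header + TCP header


def payload_efficiency(mtu: int) -> float:
    """Fraction of on-wire bytes that carry TCP payload for a full-size frame."""
    payload = mtu - IP_TCP_HEADERS
    on_wire = mtu + ETH_OVERHEAD
    return payload / on_wire


for mtu in (1500, 9000):
    print(f"MTU {mtu}: {payload_efficiency(mtu):.1%} payload efficiency")
```

Under these assumptions the difference between MTU 1500 and 9000 is only a few percent of wire efficiency, although a smaller MTU also costs more CPU per byte, which this sketch does not model.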

austere jolt
#

what is the server OS?

final flax
#

If I run the same "network test-link run-test" against the other node over the cluster network on a smaller AFF250, I get about 3000MB/s, which is double the A700 result on only a 25Gbit cluster network.

austere jolt
#

the tests between nodes go via a switch though, right?

#

between netapp nodes

final flax
#

No. The network test-link runs directly between switchless nodes (apipa cluster network)

#

This is what makes me think there is something wrong with the adapter or something similar.

austere jolt
#

you don't have any other 40G QSFPs to swap it with?

final flax
#

We tried the 40G front-end port through a switch first. Since we saw the same poor throughput there, we started troubleshooting with a direct connection and the other tests.

austere jolt
#

did the switch tell you anything more about the physical connection? frame errors and signal strength, for example

final flax
#

Port stats and physical stats also looked fine on the switch.

austere jolt
#

it's a bit hard to say much without a bigger picture... i guess i'd open a case and get ready to provide some perfstat input...

#

and i don't know anything about the server... or what card or driver it has, or its buffering setup

fiery hornet
#

Precisely. Your best bet is to open a support case for performance troubleshooting.

#

You'd need a support case to RMA faulty hardware anyway, so save a couple of steps (and headaches) and start with opening a case.

gritty otter
#

iSCSI is a limited protocol.

#

You have to keep the rsize/wsize generally 128k or less.

#

It's better to have multiple connections with multiple IP addresses.