# How to get the 2nd 10GbE port of the mezzanine card to work on a FAS2240?

jovial epoch
#

Hello,

We have a customer with an old FAS2240 running ONTAP 9, with two mezzanine cards in a cluster. e1a (data interface) and e1b (cluster interface) both show up, but only one of them (e1a or e1b) works at a time. Is there a way to get both interfaces up and running at the same time, or is this an HA limitation of the card?

opal heart
#

Define "works"? If you tag one of them for clustering, it won’t show up for data.

#

Switchless cluster or switched?

jovial epoch
#

Tried both; currently switched. One port (e1b) is tagged for cluster. The other (e1a) is intended for data use, but that interface is always down. Somehow only one of e1a or e1b is ever up, no matter what.

#

I can tell from the switch (during a FAS reboot) that both ports are active and the lights are on, but once ONTAP finishes booting, one port is turned off.

opal heart
#

e1a should be cluster and e1b data, though.

jovial epoch
#

AFAIK my engineer has tried all sorts of variations by now and none of them worked. Even now the interface unexpectedly goes down, and there is no way to bring it back up.

#

The intended configuration is a single, non-redundant cluster connection, like here:

#

This is a standalone deployment, and the interfaces are not coming up there either.

jovial epoch
#

Is there any way I could manually bring the link up?

#

Even in the diag shell (systemshell), ifconfig says the "network stack cannot be modified".

jovial epoch
#

@opal heart, here is an old post with a reply from you as well:
https://community.netapp.com/t5/ONTAP-Hardware/Network-Port-Roles-for-FAS2240-With-SFP-Running-clustermode-9-1/td-p/135948

Is a portset required, as andris mentioned, or is a "normal" cluster configuration OK?

jovial epoch
#

Again, after running cluster-setup, I suspect that port e1a (now directly connected as shown in the guide) is DOWN. That is why the setup wizard shows e1b as the cluster port:

opal heart
#

Whichever port is used for the cluster should be in the Cluster IPspace on both nodes. Is this a fresh deployment of ONTAP? I would suggest doing special boot menu option 4 on both nodes, with the cable in e1a (not e1b like that chump Alex says on the community 🤣), and starting fresh. It does not look like you currently have cluster LIFs, which is.. not what I’d expect. Without them, the RDBs that ONTAP needs to manage networking will not run properly.

#

What cable are you using for the crossover?

jovial epoch
#

🙂 Very fresh deployment. This is the state as of right now:

jovial epoch
opal heart
#

Which vendor?

jovial epoch
#

I think Arista

opal heart
#

That is not supported. I don’t say that to be a jerk, but DAC compatibility is limited.

#

Use Cisco or Intel

#

Even Intel isn’t supported, but I know it works with those mezz cards

jovial epoch
#

I could swap in whatever is available here. I tried fiber as well. The SFPs work: I already had them up and running once and was able to ping the other cluster, just not with both ports e1a and e1b up at the same time.

opal heart
#

nod

#

Re portsets: that’s a later change for survivability.

jovial epoch
#

This was from when e1a worked once:

opal heart
#

Hmm

jovial epoch
#

I've run out of ideas for troubleshooting why both ports don't work at the same time (considering that each works on its own). Do you think it could be an SFP issue?

opal heart
#

Does “run local netdiag -v” work?

#

netdiag used to be a 7-Mode command.

#

I don’t know if it’s still in cDOT.

jovial epoch
#

Haven't tried it; let me try. I would love to manually turn on the interfaces and see whether the link comes up or whether it's a hardware issue of some sort, because when booting the system (BIOS stage and loader) both ports are turned on.

#

Is there a command to mess with the network stack?

opal heart
#

I think the mezz card is a single ASIC, so it’s not out of the question that it’s an SFP issue, but I’ll agree this is a head-scratcher.

#

I don’t remember off the top of my head

jovial epoch
#

I just finished setting up the new cluster:

```
FAS2240::*> run local netdiag -v
netdiag not found.  Type '?' for a list of commands

FAS2240::*> run -node local netdiag -v
netdiag not found.  Type '?' for a list of commands

FAS2240::*> node run -node local netdiag -v
netdiag not found.  Type '?' for a list of commands
```
opal heart
#

Blast

jovial epoch
#

I think this is the Twinax

#

But there are various vendors around

#

Usually the port turns on when I reboot both systems at the same time

opal heart
#

At this point I’d set up the cluster with two 1G links for the cluster interconnect, then try setting up the 10G for data.

#

Or have you done that?

jovial epoch
#

I have done that, and it works, since the 4x1G interfaces work. However, that's not what the customer wants. One port for data usually works. Still a bummer.

#

Is there a way I could turn the ports on via diag commands or the systemshell?

#

Btw, the NVRAM battery is complaining as well. Could that cause any issues with the 10GbE card?

opal heart
#

No, the NVRAM battery shouldn’t cause an issue. But it should also be working by now..

#

I don’t know of any commands to do that. I believe the up/down detection is pretty low level. The fact that they appear to work at BIOS/startup could just mean that the card has not actually been initialised yet.

#

I’ve seen enough freaky DAC issues that I wouldn’t spend more time until there are Cisco cables being used

jovial epoch
#
```
FAS2240-01% ifconfig e1a
e1a: flags=8803<UP,BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
    uuid: 6cb224ea-a0e9-11ed-80c8-00a0981c09f8
    options=6c07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
    ether 00:a0:98:xx:xx:xx
    media: Ethernet autoselect (autoselect <full-duplex,rxpause,txpause>)
    status: no carrier
FAS2240-01% ifconfig e1b
e1b: flags=8803<UP,BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
    uuid: 6cb277ad-a0e9-11ed-80c8-00a0981c09f8
    options=6c07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
    ether 00:a0:98:xx:xx:yy
    media: Ethernet autoselect (autoselect <full-duplex,rxpause,txpause>)
    status: no carrier
FAS2240-01% ifconfig e1b up
ifconfig: Modifying networking stack is not allowed at this point!
FAS2240-01% ifconfig e1a up
ifconfig: Modifying networking stack is not allowed at this point!
FAS2240-01%
```
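As a sketch of how to read the output above: the `flags=` value is a hex bitmask, and decoding it suggests that both ports are already administratively UP but lack the RUNNING flag, alongside `status: no carrier`. If that reading is right, `ifconfig e1a up` would not have changed anything even if it were allowed; the problem sits below the admin state. The bit values are the upstream FreeBSD `<net/if.h>` ones; that ONTAP's systemshell uses the same layout is an assumption, and `decode_flags`/`summarize` are hypothetical helpers.

```python
# Interface flag bits as defined in upstream FreeBSD <net/if.h>.
# Assumption: ONTAP's systemshell ifconfig reports the same bit layout.
IFF_BITS = {
    0x1: "UP",           # administratively up
    0x2: "BROADCAST",
    0x4: "DEBUG",
    0x8: "LOOPBACK",
    0x10: "POINTOPOINT",
    0x40: "RUNNING",     # IFF_DRV_RUNNING: driver has its resources allocated
    0x80: "NOARP",
    0x100: "PROMISC",
    0x200: "ALLMULTI",
    0x800: "SIMPLEX",
    0x8000: "MULTICAST",
}


def decode_flags(value: int) -> list[str]:
    """Return the flag names set in an ifconfig 'flags=' value, lowest bit first."""
    return [name for bit, name in sorted(IFF_BITS.items()) if value & bit]


def summarize(flags_hex: str, status: str) -> str:
    """Combine the decoded flags with the 'status:' line into a one-line verdict."""
    names = decode_flags(int(flags_hex, 16))
    admin = "admin up" if "UP" in names else "admin down"
    driver = "driver running" if "RUNNING" in names else "driver not running"
    return f"{admin}, {driver}, link: {status}"


# Values copied from the e1a/e1b output (both ports report the same flags).
print(decode_flags(0x8803))
print(summarize("8803", "no carrier"))
```

For comparison, a FreeBSD port that is both up and linked would typically show `flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST>` and `status: active`.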
opal heart
#

(And make sure they aren’t V01 cables)

jovial epoch
#

What is really freaky is that the e1b (data) port connected to the switch shows as running (RS=Running) and then as shut (S=Shut) after boot. Some part of ONTAP shuts it down around the time the system writes the key file "/tmp/rndc.key".

#
```
[201] module_register_init: MOD_LOAD (NVMeOF, 0xffffffff83e12750, 0) error 45
[271] e1a: Could not setup receive structures
[271] e1b: ixgbe_check_mac_link_generic: **LINK UP** link_up = 0x00000001
[271] e1b: ixgbe_check_mac_link_generic: **LINK DOWN** link_up = 0x00000000
[271] e1b: Could not setup receive structures
FAS2240-01%
```
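The kernel messages above can be grouped per port with a small script; `port_events` is a hypothetical helper, not an ONTAP tool. In the upstream FreeBSD `ixgbe` driver, "Could not setup receive structures" is printed when receive-ring setup fails during interface initialisation, after which the driver stops the interface. Assuming ONTAP's driver behaves the same way, that would fit e1b briefly reporting LINK UP and then going down, and would also be consistent with the missing RUNNING flag in the ifconfig output.

```python
import re
from collections import defaultdict

# Kernel messages copied from the systemshell output above.
DMESG = """\
[201] module_register_init: MOD_LOAD (NVMeOF, 0xffffffff83e12750, 0) error 45
[271] e1a: Could not setup receive structures
[271] e1b: ixgbe_check_mac_link_generic: **LINK UP** link_up = 0x00000001
[271] e1b: ixgbe_check_mac_link_generic: **LINK DOWN** link_up = 0x00000000
[271] e1b: Could not setup receive structures
"""

# "[timestamp] port: message"; only lines naming an eXy-style port are kept.
LINE_RE = re.compile(r"^\[(\d+)\]\s+(e\d+[a-z]):\s+(.*)$")


def port_events(text: str) -> dict[str, list[str]]:
    """Group kernel messages by port, classifying link changes and init failures."""
    events = defaultdict(list)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # e.g. the NVMeOF module line, which names no port
        _, port, msg = match.groups()
        if "**LINK UP**" in msg:
            events[port].append("link up")
        elif "**LINK DOWN**" in msg:
            events[port].append("link down")
        elif "Could not setup receive structures" in msg:
            events[port].append("rx init failed")
        else:
            events[port].append(msg)
    return dict(events)


for port, evs in sorted(port_events(DMESG).items()):
    print(port, "->", ", ".join(evs))
```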
opal heart
#

Yes

#

Four of them, I guess

opal heart
#

NVMeoF? What version of ONTAP is this?

jovial epoch
opal heart
#

On a 2240?

jovial epoch
#

Yes, it seems so. Not good?

opal heart
#

No.. the FAS2240 supports 9.1 at most.

#

Whatever you or the customer have done to get 9.8 on there.. 😮

#

The fact that it even gets this far is.. surprising

jovial epoch
#

It was already installed when we got there. Cool 🙂 Happy to report it's up and running, just without the 10GbE. Maybe with the SFP cable, then. However, from what I was told, performance on a single node over the 10GbE was awful: ~150 MB/s with all CPUs at 100%.

opal heart
#

Well, that may be why we don't support it on that platform 😉

#

Interestingly, though, we do support it on the FAS255x, which is.. not substantially different in CPU, only in RAM.

#

But there are all sorts of things that get turned on/off according to platform, so.. if it's running 9.8, it may not turn off some stuff on a FAS2240 because it doesn't expect to be running on there.

jovial epoch
opal heart
#

yeah

#

That's blocked by the CPU, but there are other things, like how often tasks run, how much priority they have, etc., which are platform-dependent.

jovial epoch
#

I don't know how WAFL has evolved or whether it performs well here, but I will try to find out why 9.8 is on there and whether 9.1 could perform better.

#

For the sake of knowing, I will wait for the cables to get delivered first

opal heart
#

Both the FAS255x and FAS2240 use C3528 processors, but the FAS255x has 3x the RAM.

#

(and no, RAM upgrades don't work)

#

We had a few customers try it years ago, so at boot it checks whether the right DIMMs are in place for the platform.

jovial epoch
opal heart
#

no, it's a BGA CPU I think 😉

jovial epoch
#

challenge accepted

opal heart
#

actually looks like it isn't.. hah

jovial epoch
#

Joke. I would be glad just to get the interfaces up and running

opal heart
#

nod

jovial epoch
#

and then to see whether the performance can saturate the 10GbE interface

opal heart
#

I.. would not be expecting that 🙂

jovial epoch
#

i.e. 9,500 Mbps was measured over the 10G link. That would give a theoretical 1,187.5 MB/s, minus overhead.
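The arithmetic behind the 1,187.5 MB/s figure is 9,500 Mbit/s divided by 8 bits per byte. A minimal sketch, using decimal megabytes and ignoring protocol overhead, puts that ceiling next to the throughput figures quoted in this thread (the ~150 MB/s observed on 9.8, the 250-300 MB/s per-controller estimate and the ~400 MB/s upper bound); the helper names are illustrative only.

```python
def mbps_to_mb_per_s(mbps: float) -> float:
    """Megabits per second to megabytes per second (decimal units, 8 bits/byte)."""
    return mbps / 8


def share_of_link(mb_per_s: float, link_mb_per_s: float) -> float:
    """Fraction of the raw link ceiling that a given throughput represents."""
    return mb_per_s / link_mb_per_s


# Measured 10G rate from the thread; no protocol overhead deducted.
LINK_MB_S = mbps_to_mb_per_s(9500)

# Throughput figures mentioned in the thread, in MB/s.
FIGURES = {
    "observed on 9.8": 150,
    "estimate, low": 250,
    "estimate, high": 300,
    "upper bound": 400,
}

print(f"link ceiling: {LINK_MB_S:,.1f} MB/s")
for label, mb_s in FIGURES.items():
    print(f"{label:>16}: {mb_s:>4} MB/s = {share_of_link(mb_s, LINK_MB_S):.1%} of the link")
```

On these numbers, even the optimistic ~400 MB/s would use only about a third of the link, so saturating 10GbE on this platform is not a realistic target.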

opal heart
#

Yeah, I think more than about 400 MB/s is unlikely on that platform.

jovial epoch
#

There is an additional SAS-attached disk shelf with 21x 6 TB performance disks.

opal heart
#

it's been a while since I've seen people performance test it

jovial epoch
#

I wouldn't do it, but if it tops out at 150 MB/s the customer won't be happy.

opal heart
#

Well, with respect to your customer, if they want performance, a platform released 10 years ago isn't where we'd suggest they start 🙂

jovial epoch
#

Nod. They just bought it out of a bankruptcy and were told it performs well.

#

However, what do you think they should expect (in case we get the 10GbE card running)?

opal heart
#

internal drives are .. 600GB? or 450?

jovial epoch
#

AFAIK 450, plus the DS4243 with 21x 6 TB.

opal heart
#

I'd say 250-300MByte/sec out of each controller is not unlikely

#

for sequential reads

#

of large files

#

With random reads and small files, you'll get metadata read penalties, etc.

#

OK, starting the day here in Perth, Australia. Let me know how the Cisco twinax cables go.

jovial epoch
#

Will do! Thanks a lot for the help, and have a good one. Cheers from Germany

jovial epoch
#

One final thought: could this be related to a power issue? Everything is up and running, but the power LEDs are orange.

#

Maybe after boot the system goes into some power-saving state that cuts power to the SFPs.

opal heart
#

It’s entirely possible; I don’t know for sure. Until it’s running 9.1, though, I don’t recommend further troubleshooting.