#API issue / bug with network ports?


lofty wave
#

This will be a lengthy post, so it will be broken up into the comments as well.

I have an alert that I have been using for quite a while (since 2022).

sum by(node,nic,cluster) (nic_rx_total_errors) > 100

It has been very good at detecting errors with network ports, cables, etc.

Recently, we had a downtime (unclear whether it is related at all) where the nodes were down for an extended period (a week or more). When they came back, one of the ports started getting errors, and the alert fired, asking us to repair it. On inspection, the node didn't have any port with the name the alert was complaining about. Digging further led to more questions.

I queried both ZAPI and REST and both returned the port as being available for the node in the nic_rx_total_errors metric, but when I queried net_port_status, no such port existed.

Is there a specific reason why one metric would report data for a port that both REST and ZAPI return, but that isn't visible in ONTAP or at the systemshell prompt?

Here are the ZAPI and REST queries that I used:

curl -s --connect-timeout 30 --user USER:PASS --insecure --data-ascii '<?xml version="1.0" encoding="UTF-8"?>
<netapp xmlns="http://www.netapp.com/filer/admin" version="1.130">
<perf-object-get-instances>
<objectname>nic_common</objectname>
<instances>
<instance>*</instance>
</instances>
</perf-object-get-instances>
</netapp>' -H "Content-Type: text/xml" 'https://CLUSTER_IP/servlets/netapp.servlets.admin.XMLrequest_filer' 2>&1 | tee zapi_nic_common.xml   # DOES RETURN THE PORT

curl -sk -u USER:PASS 'https://CLUSTER_IP/api/cluster/counter/tables/nic_common/rows?fields=*' 2>&1 | tee rest_nic_common.json   # DOES RETURN THE PORT

curl -sk -u USER:PASS 'https://CLUSTER_IP/api/network/ethernet/ports?fields=**' 2>&1 | tee rest_ethernet_ports.json   # DOES NOT RETURN THE PORT

The port in question was e0a, on a node that didn't show e0a anywhere.
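For anyone wanting to automate the comparison above, here is a small Python sketch. It assumes the /api/network/ethernet/ports response has a `records` list whose entries carry `name` and `node.name`; extracting (node, nic) pairs from the nic_common counter rows is left out, because that row layout isn't shown in this thread. The function name and the sample data are hypothetical.

```python
import json


def ports_missing_from_inventory(counter_ports, ethernet_ports_json):
    """Return (node, port) pairs that have NIC counters but no ethernet port record.

    counter_ports: iterable of (node, nic) pairs, assumed to be already
        extracted from the nic_common counter rows.
    ethernet_ports_json: raw body of GET /api/network/ethernet/ports, assumed
        to hold a "records" list with "name" and "node": {"name": ...} per entry.
    """
    body = json.loads(ethernet_ports_json)
    inventory = {(rec["node"]["name"], rec["name"]) for rec in body.get("records", [])}
    return sorted(set(counter_ports) - inventory)


# Hypothetical sample data shaped like the situation in this thread.
sample_ports_json = json.dumps({
    "records": [
        {"name": "e0c", "node": {"name": "cfsXXnYYb"}},
        {"name": "e0d", "node": {"name": "cfsXXnYYb"}},
        {"name": "e0M", "node": {"name": "cfsXXnYYb"}},
    ]
})
sample_counter_ports = [("cfsXXnYYb", "e0a"), ("cfsXXnYYb", "e0c"), ("cfsXXnYYb", "e0d")]

print(ports_missing_from_inventory(sample_counter_ports, sample_ports_json))
# → [('cfsXXnYYb', 'e0a')]
```

Any pair it prints is a port that the counters know about but the port inventory does not, which is exactly the e0a situation described here.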

More in comments below:

#

(network port show)

Node: cfsXXnYYb
                                                                      Ignore
                                                  Speed(Mbps) Health  Health
Port      IPspace      Broadcast Domain Link MTU  Admin/Oper  Status  Status
--------- ------------ ---------------- ---- ---- ----------- ------- ------
a0a       Default      10.YYY.XXX.0/22  up   1500 -/-         healthy false
e0M       Default      Default          up   1500 auto/1000   healthy false
e0c       Default      -                up   1500 auto/100000 healthy false
e0d       Default      -                up   1500 auto/100000 healthy false
e0e       Cluster      Cluster          up   9000 auto/10000  healthy false
e0f       Default      -                down 1500 auto/-      -       false
e0g       Default      -                down 1500 auto/-      -       false
e0h       Cluster      Cluster          up   9000 auto/10000  healthy false
e3a       Default      -                down 9000 auto/-      -       false
e3b       Default      -                down 9000 auto/-      -       false
e5a       Default      -                down 9000 auto/-      -       false
e5b       Default      -                down 1500 auto/-      -       false
e5c       Default      -                down 9000 auto/-      -       false
e5d       Default      -                down 1500 auto/-      -       false
14 entries were displayed.

#

Then I dropped into the systemshell, and it didn't show the port either:

#

bash-5.0$ ifconfig -a | grep e0
e0M: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
e0g: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
e0f: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
e0c: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
e0d: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
laggport: e0c flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
laggport: e0d flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
bash-5.0$ exit

#

No e0a in the output of either ifconfig or network port show.
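To script that check, here is a minimal Python sketch that pulls interface names out of `ifconfig -a` output in the format pasted above, where each interface starts with a `name: flags=...` line. The helper name is hypothetical; the sample is the output from this thread.

```python
import re

# Interface header lines look like "e0M: flags=8863<UP,...> metric 0 mtu 1500".
# Detail lines such as "laggport: e0c flags=..." do not match, because the
# name must be followed directly by ": flags=".
IFACE_RE = re.compile(r"^([A-Za-z0-9_.]+): flags=")


def interface_names(ifconfig_output):
    """Return the set of interface names found in ifconfig output."""
    names = set()
    for line in ifconfig_output.splitlines():
        match = IFACE_RE.match(line)
        if match:
            names.add(match.group(1))
    return names


sample = """\
e0M: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
e0g: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
e0f: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
e0c: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
e0d: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
laggport: e0c flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
laggport: e0d flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
"""

print("e0a" in interface_names(sample))  # → False
```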

#

On further inspection, net_port_status didn't return the port either, just as /api/network/ethernet/ports didn't in the REST query.

#

My workaround was to use net_port_status to filter out ports that ONTAP doesn't report, so we no longer have to chase alerts for ports that aren't in the system.
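The workaround amounts to a filter join: sum the error counter by (node, nic, cluster), apply the > 100 threshold, then keep only the series whose port also appears in net_port_status. Below is a plain-Python sketch of that logic, useful for reasoning about which series survive. It assumes the error samples carry `cluster`, `node` and `nic` labels (as the alert expression implies) and that net_port_status identifies ports with a `port` label; check your own Harvest labels. In PromQL itself, the same idea would likely need `label_replace` to align the `nic` and `port` label names plus an `and on(cluster, node, nic)` match; treat that as an untested suggestion, not the exact query used here. The function name, the extra `poller` label and all values are hypothetical.

```python
from collections import defaultdict

THRESHOLD = 100  # same threshold as the alert expression in this thread


def alerting_ports(nic_errors, known_ports, threshold=THRESHOLD):
    """Emulate sum by(node,nic,cluster)(nic_rx_total_errors) > threshold,
    keeping only series whose port is also present in net_port_status.

    nic_errors:  list of (labels, value) samples; labels is a dict assumed to
                 contain "cluster", "node" and "nic" (other labels are summed away).
    known_ports: set of (cluster, node, port) tuples built from net_port_status.
    """
    totals = defaultdict(float)
    for labels, value in nic_errors:
        totals[(labels["cluster"], labels["node"], labels["nic"])] += value
    return sorted(
        key for key, total in totals.items()
        if total > threshold and key in known_ports
    )


# Hypothetical samples: e0a exceeds the threshold but is absent from net_port_status.
samples = [
    ({"cluster": "c1", "node": "n1", "nic": "e0a", "poller": "p1"}, 80.0),
    ({"cluster": "c1", "node": "n1", "nic": "e0a", "poller": "p2"}, 40.0),
    ({"cluster": "c1", "node": "n1", "nic": "e0c", "poller": "p1"}, 150.0),
    ({"cluster": "c1", "node": "n1", "nic": "e0d", "poller": "p1"}, 5.0),
]
net_port_status_ports = {("c1", "n1", "e0c"), ("c1", "n1", "e0d")}

print(alerting_ports(samples, net_port_status_ports))  # → [('c1', 'n1', 'e0c')]
```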

#

Thanks for your help with this confusing issue.

valid basin
#

What system is that on? Some ports are not shown in ONTAP, for example the cluster HA ports, MetroCluster iWARP ports, etc. Maybe it was one of those?

#

On the other hand, if that is a FAS2xxx (or equivalent AFF) system, there were some issues where, after a reboot, some of the ports would not come up, due to either a hardware defect or a firmware issue.
Does node run ... sysconfig -av show the port? If it doesn't show up there, you might need a mainboard replacement.

lofty wave
#

AFF-A400

#

I don't buy the mainboard replacement argument; otherwise, every node in my environment would need one.

#

sysconfig -av shows the ports:

slot 0: Dual 10G/25G Ethernet Controller CX5
e0a MAC Address: d0:39:ea:17:ac:90 (auto-10g_cr-fd-up)
e0b MAC Address: d0:39:ea:17:ac:91 (auto-25g_cr-fd-up)
Device Type: CX5 PSID(NAP0000000006)
Firmware Version: 16.26.4012
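To compare this against other sources, the port lines above can be parsed mechanically. A minimal Python sketch: reading the status field `auto-10g_cr-fd-up` as negotiation, speed plus media, duplex and link state is an inference from this sample, not documented behavior, and the function name is hypothetical.

```python
import re

# Matches lines such as "e0a MAC Address: d0:39:ea:17:ac:90 (auto-10g_cr-fd-up)".
PORT_RE = re.compile(r"^\s*(e\w+)\s+MAC Address:\s+([0-9A-Fa-f:]+)\s+\(([^)]*)\)")


def parse_sysconfig_ports(text):
    """Return {port: {"mac", "speed", "link"}} for port lines in sysconfig -av output.

    Assumes the status in parentheses splits on "-" into
    <negotiation>-<speed>_<media>-<duplex>-<link>, as in the sample below.
    """
    ports = {}
    for line in text.splitlines():
        match = PORT_RE.match(line)
        if not match:
            continue
        name, mac, status = match.groups()
        fields = status.split("-")
        speed = fields[1].split("_")[0] if len(fields) > 1 else None
        ports[name] = {"mac": mac, "speed": speed, "link": fields[-1]}
    return ports


sample = """\
slot 0: Dual 10G/25G Ethernet Controller CX5
e0a MAC Address: d0:39:ea:17:ac:90 (auto-10g_cr-fd-up)
e0b MAC Address: d0:39:ea:17:ac:91 (auto-25g_cr-fd-up)
Device Type: CX5 PSID(NAP0000000006)
Firmware Version: 16.26.4012
"""

for port, info in sorted(parse_sysconfig_ports(sample).items()):
    print(port, info["speed"], info["link"])
# e0a 10g up
# e0b 25g up
```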

#

they are not connected

#

they do not show up in 'net port show -node <node>' for the node in question, only in sysconfig -av

#

Again, I don't understand why ports that don't exist in ONTAP, at either the admin or the systemshell level, appear in sysconfig -av and also appear in the API call for the node as active ports that are in use.

#

Maybe this is expected behavior; it's unclear to me.

valid basin
#

sysconfig shows ports that are physically present. The clustershell only shows ports that are actually usable for LIFs. Ports used exclusively for HA, ports used for MetroCluster (iWARP), ports used for storage shelves, UTA ports configured for FC, etc. don't show up in "network port show".
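Put another way, the ports that are physically present but not offered for LIFs are the set difference between the two listings. A minimal Python sketch, assuming `network port show` rows are not wrapped across lines and that the port name is the first column after each node's dashed separator line. The helper names are hypothetical, the port listing is a trimmed excerpt of the output in this thread, and the physical port list is illustrative (e0a/e0b come from the sysconfig excerpt, e0c/e0d from the port listing).

```python
def clustershell_ports(port_show_output):
    """Return the port names listed in `network port show` output.

    Assumes one unwrapped row per port, each starting with the port name,
    and that data rows follow a dashed separator line under each node header.
    """
    ports = set()
    in_rows = False
    for line in port_show_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Node:"):
            in_rows = False  # a new node block starts with its own header
        elif stripped.startswith("---"):
            in_rows = True
        elif in_rows and stripped and not stripped.endswith("entries were displayed."):
            ports.add(stripped.split()[0])
    return ports


def hidden_ports(physical_ports, visible_ports):
    """Ports present in hardware (sysconfig -av) but absent from the clustershell."""
    return sorted(set(physical_ports) - set(visible_ports))


sample = """\
Node: cfsXXnYYb
Port      IPspace      Broadcast Domain Link MTU  Admin/Oper  Status  Status
--------- ------------ ---------------- ---- ---- ----------- ------- ------
a0a       Default      10.YYY.XXX.0/22  up   1500 -/-         healthy false
e0c       Default      -                up   1500 auto/100000 healthy false
e0d       Default      -                up   1500 auto/100000 healthy false
e0e       Cluster      Cluster          up   9000 auto/10000  healthy false
4 entries were displayed.
"""

visible = clustershell_ports(sample)
print(hidden_ports(["e0a", "e0b", "e0c", "e0d"], visible))  # → ['e0a', 'e0b']
```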

#

And in your case, with the A400, e0a and e0b are the HA interconnect ports. By the way, they should be running at the same speed, so I would suggest swapping the 10G cable for a 25G cable.
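One way to catch that kind of mismatch is to compare the speeds that sysconfig -av reports for the HA ports. A minimal Python sketch: the default port pair (e0a, e0b) reflects the A400 discussed here and may differ on other platforms, and taking the speed from the second dash-separated field of the status string is an inference from the sysconfig sample in this thread. Function names are hypothetical.

```python
def negotiated_speed(status):
    """Extract the speed token from a sysconfig status such as "auto-10g_cr-fd-up"."""
    fields = status.split("-")
    return fields[1].split("_")[0] if len(fields) > 1 else None


def ha_speed_mismatch(statuses, ha_ports=("e0a", "e0b")):
    """Return True if the HA interconnect ports that are present report different speeds.

    statuses: {port: status string from sysconfig -av}; ports missing from the
    dict are ignored.
    """
    speeds = {negotiated_speed(statuses[p]) for p in ha_ports if p in statuses}
    return len(speeds) > 1


observed = {"e0a": "auto-10g_cr-fd-up", "e0b": "auto-25g_cr-fd-up"}
print(ha_speed_mismatch(observed))  # → True
```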

#

and while I can't tell you for sure why they show up in the API, I guess it is because you sometimes want to know if there are interface-level errors on those ports, to diagnose a marginal cable or something similar

peak heart
#

If you want to see the ports in ONTAP, try these commands:

set -privilege advanced
system ha interconnect config show
system ha interconnect port show
system ha interconnect status show
peak heart
#

for a two-node switchless cluster it should be like this

valid basin
#

unless they changed it to be backed by AI and it hallucinates link lengths, cable types etc. now 😂