This is a lengthy post, so it continues in the comments.
I have an alert that I have used for quite a while (since 2022).
sum by(node,nic,cluster) (nic_rx_total_errors) > 100
When running this, it's been really good at detecting errors with network ports, cables, etc.
Recently we had a downtime (unclear whether it is related at all) during which the nodes were down for an extended period (more than a week). After they came back, the alert began firing for errors on one of the ports, prompting us to repair it. On inspection, the node didn't have any port by the name the alert was complaining about. Digging further led to more questions.
I queried both ZAPI and REST and both returned the port as being available for the node in the nic_rx_total_errors metric, but when I queried net_port_status, no such port existed.
Is there a specific reason why one metric would record data for a port that both REST and ZAPI return, but that isn't visible within the NA OS or at the OS shell prompt?
Here are the ZAPI and REST queries that I used.
curl -s --connect-timeout 30 --user USER:PASS --insecure --data-ascii '<?xml version="1.0" encoding="UTF-8"?>
<netapp xmlns="http://www.netapp.com/filer/admin" version="1.130">
<perf-object-get-instances>
<objectname>nic_common</objectname>
<instances>
<instance>*</instance>
</instances>
</perf-object-get-instances>
</netapp>' -H "Content-Type: text/xml" 'https://CLUSTER_IP/servlets/netapp.servlets.admin.XMLrequest_filer' 2>&1 | tee zapi_nic_common.xml   # DOES RETURN THE PORT
curl -sk -u USER:PASS 'https://CLUSTER_IP/api/cluster/counter/tables/nic_common/rows?fields=*' 2>&1 | tee rest_nic_common.json   # DOES RETURN THE PORT
curl -sk -u USER:PASS 'https://CLUSTER_IP/api/network/ethernet/ports?fields=**' 2>&1 | tee rest_ethernet_ports.json   # DOES NOT RETURN THE PORT
The port in question was e0a, on a node that wasn't showing an e0a port anywhere else.
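For anyone who wants to reproduce the comparison, here is a minimal Python sketch that diffs the two saved REST responses offline and lists ports that have counter rows but no matching entry in the port inventory. It rests on assumptions I have not verified against every ONTAP version: that each nic_common row carries an "id" of the form "node:port", and that each ethernet port record carries "name" and "node.name". The function names (counter_ports, inventory_ports, stale_ports, compare_files) are hypothetical helpers made up for this illustration.

```python
import json


def counter_ports(counter_rows):
    """Collect (node, port) pairs from a counter-table rows response.

    Assumption: each row's "id" looks like "node:port"; rows without a
    colon are skipped rather than guessed at.
    """
    ports = set()
    for row in counter_rows.get("records", []):
        node, sep, port = row.get("id", "").rpartition(":")
        if sep:
            ports.add((node, port))
    return ports


def inventory_ports(ethernet_ports):
    """Collect (node, port) pairs from a network/ethernet/ports response.

    Assumption: each record has "name" and a nested "node" -> "name".
    """
    ports = set()
    for rec in ethernet_ports.get("records", []):
        node = rec.get("node", {}).get("name")
        name = rec.get("name")
        if node and name:
            ports.add((node, name))
    return ports


def stale_ports(counter_rows, ethernet_ports):
    """Return ports that have counter rows but are absent from the inventory."""
    return sorted(counter_ports(counter_rows) - inventory_ports(ethernet_ports))


def compare_files(counter_path, ports_path):
    """Load the two saved JSON files and return the stale (node, port) pairs."""
    with open(counter_path) as f:
        counter_rows = json.load(f)
    with open(ports_path) as f:
        ethernet_ports = json.load(f)
    return stale_ports(counter_rows, ethernet_ports)
```

Calling compare_files("rest_nic_common.json", "rest_ethernet_ports.json") on the files written by the curl commands above should, if those assumptions hold, return the node and e0a pair described here along with any other ports in the same state.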
More in comments below: