# Harvest should monitor Cisco switches


whole lichen
#

The initial focus is on Cisco switches used for MetroCluster Inter-Switch Links (ISLs). We may expand support to other Cisco switches, but want to focus on MetroCluster first.

If you're interested in this feature, please run the script in the linked GitHub comment and email the Harvest team your switch details. This will help ensure that your switches can be monitored.

Thanks!


left rain
#

Is your plan to use SSH calls for the monitoring?

brazen pier
#

Yes

left rain
#

Are you aware that Cisco offers a REST API?

brazen pier
#

It appears that the REST API has to be enabled on the switch, and it isn't available on all NX-OS versions.

left rain
#

I think enabling a feature is not a big deal. The only non-EOA model, the NX9336C, definitely supports it, even back to NX-OS 9.3 and 10.2, which were supported with ONTAP 9.8.

brazen pier
#

Sure. Yes, we'll check this.

left rain
#

I'm no friend of poking the SSH for device monitoring. Though I must admit this "json-pretty" makes it somewhat usable 😅

brazen pier
#

Yes. SSH with JSON output is what we are exploring.

whole lichen
#

@left rain thanks again for the log files!

Can you say more about your reservations about using SSH to collect monitoring metrics? It seems to be the transport with the best support and the best documentation for the CLI commands. Even though there are several REST models, SSH still seems to be the most common way to monitor switches (with a monitoring user that has limited privileges). The JSON filter means we don't need to parse unstructured CLI output.

left rain
#

SSH is made for human interfacing and not an API. With the availability of a native and documented REST API it just feels wrong to rely on SSH. But maybe that is just my point of view.

whole lichen
#

understood

#

thanks for the feedback

whole lichen
#

Hi @left rain, we have a beta version of the Cisco collector available in nightly if you want to try it. We're working on a Cisco dashboard now. We took your feedback and switched the collector to REST instead of SSH, which means you will need to enable that feature on your switch.

#

To enable the REST API on your switch, SSH to the switch and type:

```
conf t
feature nxapi
```
#

To add the switch to your harvest.yml:

```yaml
switch1:
    addr: 10.65.50.35
    username: user
    password: pass
    collectors:
        - CiscoRest
```
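
Because every switch needs its own poller entry, the entries can also be generated. The sketch below is a minimal, hypothetical helper (not part of Harvest) that renders a harvest.yml with one CiscoRest poller per switch. It assumes the usual harvest.yml layout with top-level Exporters and Pollers sections and a Prometheus exporter; the exporter name prometheus1, the port range, and the second switch are placeholders.

```python
import json
from dataclasses import dataclass


@dataclass
class Switch:
    name: str
    addr: str
    username: str
    password: str


def render_harvest_yml(switches, exporter="prometheus1", port_range="13000-13100"):
    """Return harvest.yml text with one CiscoRest poller per switch.

    Credentials are written with json.dumps, which produces a double-quoted
    string that is also a valid YAML double-quoted scalar, so special
    characters survive intact.
    """
    lines = [
        "Exporters:",
        f"  {exporter}:",
        "    exporter: Prometheus",
        f"    port_range: {port_range}",
        "",
        "Pollers:",
    ]
    for sw in switches:
        lines += [
            f"  {sw.name}:",
            f"    addr: {sw.addr}",
            f"    username: {json.dumps(sw.username)}",
            f"    password: {json.dumps(sw.password)}",
            "    collectors:",
            "      - CiscoRest",
            # Each poller lists the exporter(s) it publishes its metrics to.
            "    exporters:",
            f"      - {exporter}",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = [
        Switch("switch1", "10.65.50.35", "user", "pass"),
        Switch("switch2", "10.65.50.36", "user", "pass"),  # hypothetical
    ]
    print(render_harvest_yml(demo), end="")
```
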
left rain
#

Do I need this configuration for each switch?

whole lichen
#

yes

left rain
#

Auto discovery of switches that are already configured in ONTAP would be cool.

#

I will try it out.

whole lichen
#

great!

left rain
#

I get some warnings like this:

#

level=WARN source=environment.go:259 msg="Unable to parse actualOut" Poller=redacted plugin=CiscoRest:Environment object=Environment error="strconv.ParseFloat: parsing " 132": invalid syntax" trimmed=" 132"
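
The rejected value is " 132", with a leading space. Go's strconv.ParseFloat does not accept surrounding whitespace, so the usual remedy is to trim first (strings.TrimSpace). A minimal sketch of the same defensive parsing in Python, with a hypothetical helper name: strip, parse, and return None rather than failing on JSON null or non-numeric text. Python's float() already tolerates surrounding whitespace; the explicit strip() only mirrors the Go fix.

```python
from typing import Optional


def parse_reading(raw: Optional[str]) -> Optional[float]:
    """Parse a numeric reading from switch JSON, or return None.

    Surrounding whitespace is stripped first, and JSON null, empty strings
    or non-numeric text are reported as "no value" instead of raising.
    """
    if raw is None:
        return None
    text = str(raw).strip()
    if not text:
        return None
    try:
        return float(text)
    except ValueError:
        return None
```
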

whole lichen
#

Thanks @left rain, we'll fix that. If you curl the poller, do you see the switch metrics?

#

Which model of switch gave that warning? It looks like a format change between versions.

left rain
#

harvest status doesn't show a port for the running poller.

#

Found it: the exporters section was missing.

whole lichen
left rain
#

I get the interface statistics

#

Looks good

#

Do you plan to include SFP statistics?

#

I can't see any other error in the log.

whole lichen
#

For SFP statistics, would that be collected via show interface transceiver | json-pretty?

whole lichen
#

Thanks. We should be able to add that. Here is example output from one of your switches. Can you describe what you want included?

```json
{
  "interface": "Ethernet1/1",
  "sfp": "present",
  "type": "QSFP-100G-PCC",
  "name": "Molex",
  "partnum": "aaa-bbb",
  "rev": "B0",
  "serialnum": "serial",
  "nom_bitrate": "25500",
  "len_cu": "2",
  "ciscoid": "17",
  "ciscoid_1": "0"
}
```
left rain
#

The inter-site links through our DWDM equipment use "glass" (fiber) connections. It is important to monitor the SFPs' power levels, since they are good indicators of a connection going bad.

#

```
{
  "interface": "Ethernet1/15",
  "sfp": "present",
  "type": "10Gbase-SR",
  "name": "AVAGO",
  "partnum": "AFBR-710SMZ",
  "rev": "G5.1",
  "serialnum": "AC2211P0FH3",
  "nom_bitrate": "10300",
  ...
  "TABLE_lane": {
    "ROW_lane": {
      "temperature": "42.63",
      "temp_flag": null,
      "temp_alrm_hi": "85.00",
      "temp_alrm_lo": "-5.00",
      "temp_warn_hi": "80.00",
      "temp_warn_lo": "0.00",
      "voltage": "3.28",
      "volt_flag": null,
      "volt_alrm_hi": "3.60",
      "volt_alrm_lo": "3.00",
      "volt_warn_hi": "3.46",
      "volt_warn_lo": "3.13",
      "current": "7.06",
      "current_flag": null,
      "current_alrm_hi": "10.50",
      "current_alrm_lo": "2.50",
      "current_warn_hi": "10.50",
      "current_warn_lo": "2.50",
      "tx_pwr": "-2.38",
      "tx_pwr_flag": null,
      "tx_pwr_alrm_hi": "3.01",
      "tx_pwr_alrm_lo": "-8.99",
      "tx_pwr_warn_hi": "-1.02",
      "tx_pwr_warn_lo": "-4.98",
      "rx_pwr": "-3.72",
      "rx_pwr_flag": null,
      "rx_pwr_alrm_hi": "3.01",
      "rx_pwr_alrm_lo": "-15.08",
      "rx_pwr_warn_hi": "-1.02",
      "rx_pwr_warn_lo": "-15.08",
      "xmit_faults": "0"
    }
  }
}
```

whole lichen
#

Interesting, it looks like that show command returns differently shaped output. For example, Ethernet1/1 has the shape I pasted above (no power readings), but Ethernet1/15 does include power readings.
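
Since the shape depends on the module in the slot, power readings have to be treated as optional. A minimal sketch with hypothetical function names, assuming rows shaped like the two examples above: copper modules carry no TABLE_lane at all, while optical modules nest a single lane object under TABLE_lane/ROW_lane. It is an illustration, not Harvest's optic.go.

```python
def has_optics(row: dict) -> bool:
    """True when a transceiver row carries lane (DOM) data, i.e. an optical module."""
    return "TABLE_lane" in row


def lane_power(row: dict) -> dict:
    """Return {"rx_pwr": float | None, "tx_pwr": float | None} for a single-lane row."""
    if not has_optics(row):
        # Copper / direct-attach cable: there is nothing to read.
        return {"rx_pwr": None, "tx_pwr": None}
    lane = row["TABLE_lane"]["ROW_lane"]
    # Each reading is looked up on its own, so one missing value
    # does not hide the other.
    return {
        key: float(lane[key]) if lane.get(key) is not None else None
        for key in ("rx_pwr", "tx_pwr")
    }
```
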

left rain
#

rx_pwr and tx_pwr in this example

#

Depends on what is inserted in the slot.

whole lichen
#

yep we replied at the same time 🙂

left rain
#

The controllers and shelves are connected with copper cables.

#

There is no optic in that case.

whole lichen
#

I wonder what unit this is: "tx_pwr": "-2.38"

#

maybe dBm

left rain
#

yes dBm
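
Because the readings are in dBm, two small calculations help when turning them into alerts: converting to milliwatts with P(mW) = 10^(dBm/10), and measuring headroom against the module's own warning thresholds (the *_warn_lo and *_warn_hi fields in the Ethernet1/15 output above). A sketch with hypothetical helper names:

```python
def dbm_to_mw(dbm: float) -> float:
    """Convert optical power from dBm to milliwatts: P_mW = 10 ** (dBm / 10)."""
    return 10 ** (dbm / 10)


def margin_db(value: float, warn_lo: float, warn_hi: float) -> float:
    """Distance in dB from a reading to the nearer edge of its warning window.

    A negative result means the reading is already outside the window.
    """
    return min(value - warn_lo, warn_hi - value)
```

For the Ethernet1/15 sample, dbm_to_mw(-2.38) is about 0.578 mW, and margin_db(-3.72, -15.08, -1.02) is about 2.7, meaning the receive power sits roughly 2.7 dB below its upper warning threshold.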

whole lichen
#

thanks

left rain
#

I can confirm the latest nightly fixes the WARN events. Only the "Cluster info" message looks a bit weird. I don't know if that's important.

#

level=INFO source=poller.go:1515 msg="Cluster info" Poller=switchname remote="{Name:switchname Model:nxos UUID: Version:10.3.4a Release: Serial:cisco Nexus9000 C9336C-FX2 Chassis IsSanOptimized:false IsDisaggregated:false ZAPIsExist:false HasREST:false IsClustered:false}"

brazen pier
#

Do you mean that this log message is missing UUID and Release fields?

left rain
#

Yes, and "Serial" reports the model.

brazen pier
left rain
#

It was just something I noticed and wanted to bring to your attention in case you see any relevance in it.

#

I don't think there is a UUID.

brazen pier
#

Sure, thanks! We'll check.

whole lichen
#

Thanks for confirming that nightly fixes the issues, Mamoep. We'll ping you when the dashboard and optics collector are complete. Please let us know if anything else is missing or needed.

whole lichen
#

@left rain Optics collector is available in latest nightly. I used the json you shared earlier to help build it since the switch I have access to does not have optical interfaces.

left rain
#

@whole lichen The latest nightly isn't collecting anything and emits some warnings. I sent the log via email.

brazen pier
#

@left rain
There is an issue in the nightly build that prevents the Cisco templates from working. It will be fixed in the PR (https://github.com/NetApp/harvest/pull/3578). As a workaround until a new nightly build is available, you can try removing the double quotes from the query in each template below; that should work.

https://github.com/NetApp/harvest/blob/main/conf/ciscorest/nxos/9.3.12/environment.yaml#L2
https://github.com/NetApp/harvest/blob/main/conf/ciscorest/nxos/9.3.12/interface.yaml#L2
https://github.com/NetApp/harvest/blob/main/conf/ciscorest/nxos/9.3.12/optic.yaml#L2

whole lichen
#

@left rain thanks for trying out the optics collector and reporting the bug you found. We've fixed that bug in the latest nightly

left rain
#

The latest nightly does deliver some values, but the TX values seem cut off:

#

cisco_optic_rx{datacenter="",interface="Ethernet1/15",switch="switchname"} -3.12
cisco_optic_rx{datacenter="",interface="Ethernet1/16",switch="switchname"} -2.69
cisco_optic_rx{datacenter="",interface="Ethernet1/17",switch="switchname"} -2.51
cisco_optic_rx{datacenter="",interface="Ethernet1/18",switch="switchname"} -3.05
cisco_optic_tx{datacenter="",interface="Ethernet1/15",switch="switchname"} -2
cisco_optic_tx{datacenter="",interface="Ethernet1/16",switch="switchname"} -2
cisco_optic_tx{datacenter="",interface="Ethernet1/17",switch="switchname"} -2
cisco_optic_tx{datacenter="",interface="Ethernet1/18",switch="switchname"} -2

#

Also there are no values for the existing 40G transceivers that are offline.

whole lichen
#

thanks for the confirmation! Let me check the json you provided earlier

left rain
#

They report tx_pwr but no rx_pwr. I didn't know if it was intended to leave out offline ports.

whole lichen
left rain
#

See port Ethernet1/22/* in the "cisco_nx9336c_mcc-switch_out.txt" file I shared.

whole lichen
#

Thanks! Yes, I see that: tx_pwr is present but rx_pwr is missing. My regex was fooled by all the other matching rx_pwr_* keys. I'll update the code to handle this case, exporting tx_pwr metrics when rx_pwr metrics are missing, and fix the tx_pwr truncation you highlighted above.
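
The regex pitfall is easy to reproduce: an unanchored pattern for rx_pwr also matches threshold keys such as rx_pwr_alrm_hi, so a lane without an rx_pwr reading still looks like it has one. The sketch below uses Python's re module to show the failure and an exact-key alternative. The sample lane is illustrative, modeled on the offline ports described above (tx_pwr present, rx_pwr absent), and the helper name is hypothetical.

```python
import re

# Offline port: threshold keys are present, the rx_pwr reading itself is not.
lane = {
    "tx_pwr": "-2.35",
    "rx_pwr_alrm_hi": "3.01",
    "rx_pwr_warn_lo": "-15.08",
}

# Unanchored: re.match only anchors at the start, so the threshold keys match too.
loose = [k for k in lane if re.match(r"rx_pwr", k)]

# Anchored: re.fullmatch requires the whole key to be exactly "rx_pwr".
exact = [k for k in lane if re.fullmatch(r"rx_pwr", k)]


def readings(row: dict) -> dict:
    """Look each reading up by exact key so a missing rx_pwr can't hide tx_pwr."""
    return {k: float(row[k]) for k in ("rx_pwr", "tx_pwr") if k in row}
```
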

whole lichen
#

I'm not seeing any parser bugs for the truncated cisco_optic_tx metric. Since a few minutes have passed, do you still see the collector reporting -2? If so, does show interface transceiver details | json-pretty show a different value for tx_pwr on one of the interfaces with truncated power?

left rain
#

Issue is still visible, CLI output is similar to the test data.

#

"tx_pwr": "-2.35"
"tx_pwr": "-2.51"
"tx_pwr": "-2.31"
"tx_pwr": "-2.50"

whole lichen
#

thanks for confirmation

left rain
#

RX and TX are treated differently in optic.go.

#

Never mind, I missed line 163 where RX is assigned earlier 😄

whole lichen
#

That code means that if rx_pwr is missing, tx_pwr won't be exported. That's the bug I've fixed, but it does not explain the parsing difference.

#

The test case I pasted above passes with the JSON you provided, so the truncation bug must happen further downstream. Hmmm.

whole lichen
#

new nightly fixes offline transceiver issue you reported. Let us know if that fixes the issue when you get a chance. With the new nightly, if you still see truncated tx values can you email us the output from show interface transceiver details | json-pretty again? If I still don't see the issue, we'll provide a debug build. Can you remind me how you are installing Harvest to test these Cisco runs?

left rain
#

I tested the new build. The truncated values are still visible. I sent the output via email.

left rain
#

I use the RPM on RHEL8

brazen pier
#

Thanks. It is strange that RX works fine but TX has issues, given that both use similar code.

#

In the test case, the TX data looks correct.

#

I noticed that the datacenter label is empty in the switch metrics. Is the datacenter field missing from the switch's Harvest configuration?

left rain
#

yes, that is not configured currently

whole lichen
#

Hi @left rain, perhaps the CLI is returning different values than NX-API. Let's check that. Can you try running the following and send us the show-interface-transceiver-details.json file? Please replace $user, $pass, and $ip with your values.
```
curl -k -u$user:$pass -X POST https://$ip/ins -H "Content-Type: application/json" -d '{"ins_api":{"version":"1.0","type":"cli_show","chunk":"0","sid":"sid","input":"show interface transceiver details","output_format":"json"}}' > show-interface-transceiver-details.json
```

left rain
#

Seems like you're correct. 😩 I sent the file as requested.

whole lichen
#

thanks!

#

Sigh indeed, it looks like a Cisco bug. If you change the type from cli_show to cli_show_array, does that give you a float instead of an int?

```
curl -k -u$user:$pass -X POST https://$ip/ins -H "Content-Type: application/json" -d '{"ins_api":{"version":"1.0","type":"cli_show_array","chunk":"0","sid":"sid","input":"show interface transceiver details","output_format":"json"}}'
```
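
To compare cli_show and cli_show_array from a script, the curl calls above can be reproduced with Python's standard library. A sketch, assuming the /ins endpoint and the ins_api body exactly as in those curl commands; the function names are hypothetical. It disables certificate verification to match curl -k, which is only appropriate for lab testing.

```python
import base64
import json
import ssl
import urllib.request


def ins_api_payload(command: str, msg_type: str = "cli_show_array") -> dict:
    """Build the same ins_api request body used in the curl examples."""
    if msg_type not in ("cli_show", "cli_show_array"):
        raise ValueError(f"unsupported type: {msg_type}")
    return {
        "ins_api": {
            "version": "1.0",
            "type": msg_type,
            "chunk": "0",
            "sid": "sid",
            "input": command,
            "output_format": "json",
        }
    }


def run_show(ip: str, user: str, password: str, command: str,
             msg_type: str = "cli_show_array") -> dict:
    """POST a show command to https://<ip>/ins and return the decoded JSON."""
    body = json.dumps(ins_api_payload(command, msg_type)).encode()
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        f"https://{ip}/ins",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
    )
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # same effect as curl -k: lab use only
    ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(req, context=ctx, timeout=30) as resp:
        return json.load(resp)
```

Calling run_show once with "cli_show" and once with "cli_show_array" for "show interface transceiver details" returns both variants side by side, which makes the tx_pwr difference easy to diff.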

left rain
#

That looks better

#

Native API output from "/api/class/eqptFcotSensor.json" also delivers good values.

whole lichen
#

excellent, thanks

whole lichen
#

There's a new nightly that uses cli_show_array instead of cli_show if you want to try it out

left rain
#

The latest nightly doesn't collect any optic metrics:

#

time=2025-04-30T09:45:23.142+02:00 level=INFO source=collector.go:601 msg=Collected Poller=switchname collector=CiscoRest:Optic apiMs=0 bytesRx=14338 calcMs=0 exportMs=0 instances=0 instancesExported=0 metrics=0 metricsExported=0 numCalls=1 parseMs=0 pluginInstances=0 pluginMs=1150 pollMs=1150 renderedBytes=0 zBegin=1745999121992

brazen pier
#

I see the issue. The structure of the cli_show_array output is a little different, and this needs to be handled.
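
One way to absorb the difference is to normalize every TABLE_*/ROW_* value to a list before walking it. A sketch with hypothetical helper names, based on our reading of Cisco's NX-API documentation: cli_show_array returns these elements as arrays even for a single row, while cli_show may return a lone row as a plain object. The helper accepts either shape, so it does not depend on that detail being exact.

```python
def as_list(value) -> list:
    """Wrap a single object in a list, pass lists through, and map None to []."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def rows(parent: dict, name: str) -> list:
    """Return all ROW_<name> entries under TABLE_<name>, whichever shape arrived."""
    out = []
    for table in as_list(parent.get(f"TABLE_{name}")):
        out.extend(as_list(table.get(f"ROW_{name}")))
    return out
```
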

brazen pier
idle flame
#

Are there also plans to check Cisco CRC port errors?

brazen pier
#

Could you share the cisco switch CLI command for that?

idle flame
#

```
--------------------------------------------------------------------------------
Port          Align-Err    FCS-Err   Xmit-Err    Rcv-Err  UnderSize OutDiscards
--------------------------------------------------------------------------------
mgmt0                 0          0         --         --         --          --
Eth1/1                0          0          0          0          0           0
Eth1/2                0          0          0          0          0           0
Eth1/3                0          0          0          0          0           0
Eth1/4                0          0          0          0          0           0
Eth1/5                0          0          0          0          0     1857168
Eth1/6                0          0          0          0          0    27884464
Eth1/7                0          0          0          0          0           0
Eth1/8                0          0          0          0          0           0
Eth1/9                0          0    4128188          0          0        6370
```
#

or:

```
nasw3-mc1-nbg3# show interface counters errors non-zero
Port          Align-Err    FCS-Err   Xmit-Err    Rcv-Err  UnderSize OutDiscards
Eth1/5                0          0          0          0          0      187319
Eth1/6                0          0          0          0          0     3644271
Eth1/15               1      30930          0      30931          0           0
Po20                  0          0 133191230816256          0 4294967296           0
```
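
If the text form is ever needed for a quick one-off check, the table splits cleanly on whitespace. A sketch with a hypothetical function name, assuming the column layout shown above: separator lines and the prompt line are skipped, and -- (not applicable, as on mgmt0) becomes None.

```python
def parse_error_counters(text: str) -> dict:
    """Parse `show interface counters errors` text into {port: {column: int | None}}."""
    header = None
    result = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, dashed separators, and prompt lines such as "switch# show ...".
        if not line or set(line) == {"-"} or "# " in line or line.endswith("#"):
            continue
        fields = line.split()
        if fields[0] == "Port":
            header = fields[1:]
            continue
        if header is None or len(fields) != len(header) + 1:
            continue  # not a data row in the expected layout
        result[fields[0]] = {
            col: (None if val == "--" else int(val))
            for col, val in zip(header, fields[1:])
        }
    return result
```
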

brazen pier
#

Thanks! Do you want to monitor all of these error counters, or specific ones?

idle flame
#

let me check...

#

I think this command's output is clearer:

#
```
Ethernet1/15 is up
admin state is up, Dedicated Interface
  Belongs to Po20
...
  RX
    219811618009 unicast packets  5555257 multicast packets  13166 broadcast packets
    219825261692 input packets  590755573107974 bytes
    69196011137 jumbo packets  0 storm suppression bytes
    0 runts  7434 giants  8067826 CRC  0 no buffer
    8075260 input error  0 short frame  0 overrun   0 underrun  0 ignored
    0 watchdog  0 bad etype drop  0 bad proto drop  0 if down drop
    0 input with dribble  0 input discard
    0 Rx pause
    0 Stomped CRC
  TX
    228647074372 unicast packets  1767250 multicast packets  16213 broadcast packets
    228648857835 output packets  723081482793294 bytes
    85844898339 jumbo packets
    0 output error  0 collision  0 deferred  0 late collision
    0 lost carrier  0 no carrier  0 babble  1213 output discard
    0 Tx pause

nasw3-mc1-nbg3#
```

CRC should be enough.
#

Maybe input errors too.

brazen pier
#

Could you share which fields you are looking for in the attached output?

idle flame
#
```
"eth_inerr": 0,
```
brazen pier
#

We are already collecting the following for each interface:

eth_inbytes
eth_outbytes
eth_inerr
eth_outerr
eth_inmcast
eth_inbcast
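
These interface counters are cumulative, so a dashboard usually plots their rate of change rather than the raw value. A minimal sketch of the per-second increase between two samples, treating a decrease as a counter reset (for example after clear counters or a reboot). It follows the same idea as Prometheus' rate(), but the helper itself is hypothetical.

```python
def counter_rate(prev_value: int, prev_ts: float, cur_value: int, cur_ts: float) -> float:
    """Per-second increase of a monotonically increasing counter.

    If the counter went down, assume it was reset to zero between the
    samples and count only the new value.
    """
    elapsed = cur_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")
    delta = cur_value - prev_value
    if delta < 0:  # counter reset
        delta = cur_value
    return delta / elapsed
```
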
brazen pier
idle flame
#

perfect - thanks

brazen pier
idle flame
brazen pier
#

Nice!

idle flame
#

Is the next step building a dashboard to make the CRC errors visible?

whole lichen
idle flame
#

nice

whole lichen
#

@idle flame, we appreciate you taking a look; please let us know if anything else is needed. @pulsar tundra is working on the dashboard, and the switch, hostname, uptime, chassis, BIOS, and OS version will be shown like this in the dashboard.

idle flame
#

RCF version?

whole lichen
#

do you know which command returns the RCF version?

idle flame
#

let me check

#

```
******************************************************************************
* NetApp Reference Configuration File (RCF)
*
* Switch    : NX9336C (direct storage, Transition, L2 Networks, direct ISL)
* Filename  : NX9336C-FX2_v2.10_Switch-B2.txt
* Date      : Generator: v1.6c 2023-12-05_001, file creation: 2024-07-29, 11:19:36
*
* Platforms : 1: MetroCluster 1 : FAS9000, AFF-A700, AFF-C800, ASA-C800, AFF-A800, ASA-A800
*           : 3: MetroCluster 1 : Cluster Port Speed FC Nodes
*
* Port Usage:
* Ports  1- 2: Intra-Cluster Node Ports, Cluster: MetroCluster 1, VLAN 201
* Ports  3- 4: Ports not used
* Ports  5- 6: 40G / 100G Ports for MetroCluster FC to IP transition
* Ports  7- 8: Intra-Cluster ISL Ports, local cluster, VLAN 201
* Ports  9-10: MetroCluster 1, Node Ports, VLAN 20
* Ports 11-12: Ports not used
* Ports 13-14: Ports not used
* Ports 15-20: MetroCluster ISL, VLAN 20, Port Channel 20, 40G / 100G
* Ports 21-24: MetroCluster ISL, VLAN 20, Port Channel 21, 4x10G breakout
* Ports 25-30: Ports not used
* Ports 31-36: Ports not used
*
******************************************************************************
nasw3-mc1-nbg3#
```
whole lichen
#

Thanks! The same should be available via show version.

idle flame
#

So the Date and Filename fields.

whole lichen
#

we'll add those

idle flame
#

show version doesn't display these details

whole lichen
#

Yes, I noticed. The SSH login banner shows them, which is where I saw them when I ran show version over SSH. I can see them with show banner motd.

#

thanks, we'll add this and include in the same table I shared above

whole lichen
#

@idle flame, given the following banner motd filename and date:
Filename : NX3132Q-V_v2.00_Switch-A1.txt
Date : Generator: v1.6b_2023-07-18_001, file creation: 2024-02-15, 10:28:44

We will export switch labels that look like this:
rcf_filename="NX3132Q-V_v2.00_Switch-A1.txt", rcf_generator="v1.6b_2023-07-18_001"

How does that look? Do you also want the file creation date?

idle flame
#

looks okay for me 🙂

left rain
#

The storage switch RCF looks a little different from the MCC switch RCF:

#

```
**********************************************************
* NetApp Reference Configuration File (RCF)
* Switch : Nexus N9K-C9336C-FX2
* Filename : Nexus-9336C-RCF-v1.8-Storage.txt
* Date : 07-15-2021
* Version : v1.8
* Port Usage : Storage configuration
* Ports 1-36: 100GbE Controller and Shelf Storage Ports
**********************************************************
```

idle flame
#

If possible, the Version: field could be collected as well.

whole lichen
#

Thanks, Mamoep. @idle flame, are you saying that it would be better to publish rcf_version instead of rcf_generator? For the example text you provided, would the version be 1.6c or v1.6c_2023-12-05_001?

left rain
#

Another example for cluster switches:
```
******************************************************************************
* NetApp Reference Configuration File (RCF)
* Switch : NX3232C
* Filename : NX3232C-RCF-v1.10-Cluster-HA.txt
* Date : 10-04-2023
* Version : v1.10
* Port Usage:
* Ports 1-30: 40/100GbE Intra-Cluster/HA Ports, int e1/1-30
* Ports 31-32: Intra-Cluster ISL Ports, int e1/31-32
* Ports 33-34: 10G Intra-Cluster/HA Ports, e1/33-34
* This RCF supports Clustering, HA, RDMA, and DCTCP using a single port profile.
* IMPORTANT NOTES
*   This RCF utilizes QoS and requires TCAM re-configuration, requiring RCF
*   to be loaded twice with the Cluster Switch rebooted in between.
******************************************************************************
```

whole lichen
#

Thanks! So for this switch we would export the labels rcf_filename="NX3232C-RCF-v1.10-Cluster-HA.txt" and rcf_version="v1.10". Does that look good?

idle flame
#

The MCC RCFs are generated with an RCF generator from NetApp. If this generator created RCFs with a proper Version: tag, everything would be a bit easier.

whole lichen
#

ah, that makes sense. Assuming that we can't fix the generator 🙂 what do you think about the labels shown above?

idle flame
#
```
* Date      : Generator: v1.6c 2023-12-05_001, file creation: 2024-07-29, 11:19:36
```
Generated with the generator.
#
```
* Date     : 10-04-2023
* Version  : v1.10
```
Downloaded from mysupport.
#

I think the labels are fine. Version won't work for the RCFs generated with the generator.

whole lichen
#

I can make it work for the RCFs generated with the generator (assuming that format is stable).
For the first one, would you prefer
rcf_version="v1.6c 2023-12-05_001"
or
rcf_version="v1.6c"

idle flame
#

rcf_version="v1.6c"

whole lichen
#

thanks, we'll do that
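
The label rules agreed above can be sketched as a small banner parser: rcf_filename comes from the Filename line, and rcf_version from the Version line when there is one (RCFs downloaded from the support site) or otherwise from the token after Generator: on the Date line (generator-built MCC RCFs, e.g. v1.6c). This assumes both banner formats stay as shown in this thread; it is an illustration, not Harvest's implementation.

```python
import re

# "^\W*" skips the leading "* " (or similar) banner decoration on each line.
FILENAME_RE = re.compile(r"^\W*Filename\s*:\s*(\S+)", re.MULTILINE)
VERSION_RE = re.compile(r"^\W*Version\s*:\s*(\S+)", re.MULTILINE)
# "v1.6c 2023-12-05_001" and "v1.6b_2023-07-18_001" both yield the short version.
GENERATOR_RE = re.compile(r"Generator:\s*(v\d+(?:\.\d+)*[a-z]?)")


def rcf_labels(banner: str) -> dict:
    """Extract rcf_filename and rcf_version labels from an RCF MOTD banner."""
    labels = {}
    if m := FILENAME_RE.search(banner):
        labels["rcf_filename"] = m.group(1)
    if m := VERSION_RE.search(banner):
        labels["rcf_version"] = m.group(1)
    elif m := GENERATOR_RE.search(banner):
        labels["rcf_version"] = m.group(1)
    return labels
```
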

left rain
#

I tested the latest nightly. Optics are collected correctly now.

brazen pier
#

Thanks for the confirmation!

shrewd venture
left rain
#

ONTAP can be configured for switch log collection and will save the credentials for that, though they won't be visible after configuration. A set of default credentials or a credential provider could be used for auto-discovered switches.

brazen pier
whole lichen
#

Has anyone tried the new Cisco switch dashboard in nightly yet? If so, any feedback?

#

Some folks have requested that the Cisco collector also collect neighbor info and show that in the dashboard too. If you get a chance please run the following on your switches and email us your response(s) so we can validate the neighbor template. Thanks!
```
show lldp neighbors detail | json-pretty
show cdp neighbors detail | json-pretty
```
email: ng-harvest-files@netapp.com

idle flame
#

I'm currently busy and haven't been able to test the new nightly.

whole lichen
#

thanks!

#

Do you agree that including LLDP and CDP would be useful?

idle flame
#

Not really, I have no idea what the added value would be.

whole lichen
#

Thanks for the feedback. I'll get more details on the value-add from the folks requesting it. I think it can show which ONTAP clusters are also connected to the same switch, which is useful for MCC.

idle flame
#

Maybe it makes sense in MetroCluster environments, but the neighbor details are static and shouldn't change...

whole lichen
#

thanks for the files @idle flame much appreciated

waxen talon
#

Hi @whole lichen, the Switch dashboard notes that the KeyPerf collector is also needed. Is this needed for the Cisco metrics in general, or only for the dashboard?

brazen pier
#

@waxen talon The ONTAP: Switch dashboard displays metrics for switches available via ONTAP. In contrast, the Cisco: Switch dashboard displays metrics collected directly from Cisco switches.

waxen talon
#

Ahh, the Cisco Switch dashboard isn't imported in NAbox. @fleet silo, if I remember right, there was some kind of filter for the automated import, right?

full crest
#

Hmm... I'm planning to update on Thursday to show off the cool new MCC ISL stats. It would be good to know if there are any wrinkles or extra steps!

whole lichen
river bloom
#

Hello, my Cisco switches are cluster switches, not MetroCluster switches. I created a user with the network-operator role and enabled NX-API. However, I am not able to add the switches in the NAbox configuration; I get the error in the screenshot. Is the Cisco monitoring still targeting only MetroCluster switches? The switch OS is a bit old too:
* Switch : NX9336C-FX2
* Filename : NX9336C-FX2-RCF-v1.10-Cluster-HA-Breakout.txt
* Date : 10-04-2023
* Version : v1.10

brazen pier
river bloom
#

Hi Rahul, thank you for getting back to me! Before collecting the bundle, I decided to try adding the switch one more time, and it worked. Apparently, when I copied the password, there was a blank character at the end of it. So this was a user error 😞 Thank you again!
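
Invisible characters picked up while copying a password are an easy mistake to make. A tiny pre-flight check like the sketch below (a hypothetical helper, not part of Harvest or NAbox) flags leading or trailing whitespace, including non-breaking spaces, as well as zero-width or control characters, before a credential is saved.

```python
import unicodedata


def credential_problems(secret: str) -> list:
    """Return human-readable warnings about invisible characters in a secret."""
    problems = []
    # str.strip() removes Unicode whitespace, including non-breaking spaces.
    if secret != secret.strip():
        problems.append("leading or trailing whitespace")
    # Cc = control characters (tab, newline), Cf = format characters (zero-width space).
    if any(unicodedata.category(ch) in ("Cc", "Cf") for ch in secret):
        problems.append("control or formatting characters (e.g. zero-width space)")
    return problems
```
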