Hi All, looking around in the ONTAP CLI I found "system switch ethernet interface show ...". This set of commands can show the status and (some) if-stats for the (Metro)Cluster switches. What's the best way to map that to the info. that is available via the REST API ? I see in the Swagger interface an API endpoint called "/network/ethernet/switch/ports" which might be the same thing? Then, consequently, what's the best way to see if Harvest is already collecting that info. and, if not, how could I go about adding it? (We're using nabox, currently 4.0.7 and, in this case, talking to an ONTAP 9.12 cluster.)
#ONTAP Switch interface stats
hi @lucid gull If you expand a swagger entry, the corresponding CLI command is shown. For example, this one seems to match what you're looking for.
Another way to get the same thing from the ONTAP CLI is like so:
security login role show-rest -command "system switch ethernet interface show"
Harvest currently does not collect this - you can check that by searching for that endpoint in the Harvest GitHub repo like so https://github.com/search?q=repo%3ANetApp%2Fharvest%20%2Fnetwork%2Fethernet%2Fswitch%2Fports&type=code
creating a new template to collect these metrics should be straightforward. Are you using the Rest or Zapi collectors?
I expanded the "doc" under swagger 
I think we're using both REST and ZAPI ... If that's a thing? Certainly I remember creating REST user access entries.
Neat 👍🏻
yes, you can mix them. Which fields are you interested in from /network/ethernet/switch/ports and I'll whip up a REST template for you to try
Comprehensive answering! 🙏🏻
I also need to check if any of our clusters have records
Good question. I'd like to get the switch name, the interface name and the values returned by "-counters" i.e. InOctets, OutOctets, "In Errors", "Out Errors" and "Out Discards" ... and, well, also the uptime, if it were available (Didn't find it yet)
Of course, we're really only interested in a few of the interfaces
E.g. the MCC ISL links
Which interfaces are used for the ISL Links is dependent on the switch config. installed by NetApp
thanks! Can you run the following and email it to ng-harvest-files@netapp.com?
Replace $ip, $user, $pass with appropriate values.
curl -sk -u "$user:$pass" -H 'Accept: application/json' "https://$ip/api/network/ethernet/switch/ports?max_records=1000000&return_timeout=120&fields=**" > switch-ports.json
OK. Will do. It will take me a few minutes ... I have to transfer the query over to the work environment and back. What can you do 🤷🏻♂️
thanks!
Aha. Fell at the first fence. It seems that the standard NetApp switch configuration has http access disabled. The Filer must be using some other protocol e.g. ssh to get its stats 😦
Aha. Hang on ...
Wrong IP addresses
Doh!
it's almost Monday 🙂
What shall I put in the subject field, to ID me?
I think I'm having a bad case of the Mawndays 😕 ... File might be on its way ...
SMTP Lives!
You mentioned counters: InOctets, Out Discards, etc. I don't see those in your json. I see statistics.receive_raw.discards. Was counters from the CLI?
Humm, yeah. I took those names from (more or less) the CLI column titles.
But the json just contains: "packets", "errors" and "discards" ... for some reason
yep, I noticed that too
So I think it is more or less the same: two structures for receive and transmit, each containing counts of packets, errors and discards
The switch (in this case a cisco) actually offers a lot more stuff, but I think NetApp also sell Brocade units, so maybe you generalise it by providing the lowest common denominator ...
yeah, not sure about that, but seems plausible. Let me see what I can come up with for a template
The only obvious way I see to identify the ISL links is by their InterfaceDescription ... which is in the switch configuration (i.e. "show run") but isn't in the json output.
How does this look?
reformatted two rows to make it easier to read. Newline after each key/value pair
the ethernet_switch_port_new_status metric has a zero value when the state is not up and a one value when the state is up
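For anyone reading along, here is a rough shell sketch of what that value_to_num rule does to the state label, based only on the description above (up becomes 1, every other state falls back to the default 0). The function name state_to_num is made up for illustration and is not part of Harvest.

```shell
# Hypothetical helper, not Harvest code: mimics the rule
# "new_status state up up `0`" (up -> 1, any other state -> default 0).
state_to_num() {
  case "$1" in
    up) echo 1 ;;
    *)  echo 0 ;;
  esac
}

# Walk through the states listed in the template comment.
for state in dormant down lower_layer_down not_present testing unknown up; do
  printf '%s=%s\n' "$state" "$(state_to_num "$state")"
done
```

Only up maps to 1, so a graph of the metric reads as a simple up/not-up signal.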
is_isl=true looks interesting
here's the template that produced the above open-metrics
name: EthernetSwitchPort
query: api/network/ethernet/switch/ports
object: ethernet_switch_port
counters:
  - ^^identity.name => interface
  - ^^switch.name => switch
  - ^state
  - ^speed => speed
  - ^identity.type => type
  - ^isl => is_isl
plugins:
  - LabelAgent:
      value_to_num:
        # [ dormant, down, lower_layer_down, not_present, testing, unknown, up ]
        - new_status state up up `0`
export_options:
  include_all_labels: true
I'll need a tip as to what to do with it - (sorry!)
no worries - let me find the documentation for Nabox4
Perhaps you could send the new template to me, by replying to my email?
will do
steps to add to Nabox4
- ssh to nabox
- sudo mkdir -p /etc/nabox/harvest/user/rest/9.12.0/
- sudo vim /etc/nabox/harvest/user/rest/custom.yaml
- Press i to insert and copy/paste this (indent with spaces, not tabs)
objects:
  EthernetSwitchPort: ethernet_switch_port.yaml
- Save the file by pressing shift-z shift-z (ZZ)
- Create a directory for the template by running
sudo mkdir -p /etc/nabox/harvest/user/rest/9.12.0/
- Copy/paste the text above, or the file sent via email, into that directory as ethernet_switch_port.yaml
- Restart nabox by running
dc restart
- Confirm that the template is working by waiting 30 to 60s after restart and then running
docker logs havrest 2>&1 | grep EthernetSwitchPort
restarting ....
Our nabox had not been customised before so I had to do step 1a: "sudo mkdir /etc/nabox/harvest/user/rest/"
( at least that dir didn't seem to exist)
ah yes, step 5 would have covered that too, but step 5 is after step 1 🙂
thanks for the clarification
havrest
Alas, no matches from the grep 😦
There is log output, but nothing matching EthernetSwitch
Checking my work ...
Should have paid more attention. Text in 4 was wrapped by my browser ... restarting ...
No success I am afraid, and I have to leave now. It's getting late 🙂
I wonder if the cluster in question is set up for REST? Run the following and paste the collectors for the poller.
dc exec -w /harvest -e HARVEST_CONF=/harvest-conf havrest /harvest/bin/harvest doctor --print
OK we can pick it up on Monday
Thanks - have a good weekend!
you too!
🍺 time!
So, if you have a moment, I would circle back around to this issue ... I have the custom.yaml and the ethernet_switch_port.yaml. But I don't see any occurrences of "Ethernet_Switch" in the output of "logs havrest"
Are we looking for polling data? How does the new custom switch stuff get associated with a specific cluster?
I grepped the output of "doctor" and all our collectors have 4 listed i.e. Rest, RestPerf, Zapi, ZapiPerf
can you grab a support bundle from nabox and upload it to https://upload.nabox.org/qiri-noca-kohi
The new custom switch stuff gets associated with a specific cluster by the collector - back on Jan 31, I provided a rest template, so any cluster that lists a Rest collector in the list of collectors should use the custom templates listed in /etc/nabox/harvest/user/rest/custom.yaml
Humm, maybe. I have to check what the security rules are wrt. exporting data 😕
ok. in the meantime, you can try
cd /etc/nabox/harvest/user/rest/ and then paste the result of ls -lR
Here you go ...
NABOX002:user 10.02 14:27:09$ pwd
/etc/nabox/harvest/user
NABOX002:user 10.02 14:27:18$ ls -ltra
total 16
-rw-r--r--. 1 root root 181 Aug 14 15:03 README.md
drwxr-xr-x. 3 root root 4096 Jan 31 17:27 ./
drwxr-xr-x. 3 root root 4096 Jan 31 17:47 rest/
drwxr-xr-x. 6 root root 4096 Jan 31 17:51 ../
NABOX002:user 10.02 14:27:23$ ls -lR rest/
rest/:
total 8
drwxr-xr-x. 2 root root 4096 Jan 31 17:51 9.12.0/
-rw-r--r--. 1 root root 57 Jan 31 17:42 custom.yaml
rest/9.12.0:
total 4
-rw-r--r--. 1 root root 571 Jan 31 17:48 ethernet_switch_port.yaml
NABOX002:user 10.02 14:27:29$ cat rest/custom.yaml
objects:
EthernetSwitchPort: ethernet_switch_port.yaml
NABOX002:user 10.02 14:27:43$
NABOX002:user 10.02 14:27:44$ cat rest/9.12.0/ethernet_switch_port.yaml
name: EthernetSwitchPort
query: api/network/ethernet/switch/ports
object: ethernet_switch_port
counters:
  - ^^identity.name => interface
  - ^^switch.name => switch
  - ^state
  - ^speed => speed
  - ^identity.type => type
  - ^isl => is_isl
plugins:
  - LabelAgent:
      value_to_num:
        # [ dormant, down, lower_layer_down, not_present, testing, unknown, up ]
        - new_status state up up `0`
export_options:
  include_all_labels: true
NABOX002:user 10.02 14:27:55$
I should probably have marked that up somehow, as source code, or?
Discord supports triple backticks with a type, so for yaml you would use
your directory and template look good. We'll take a look at your support bundle if/when you're able to send.
Does this return anything?
docker logs havrest 2>&1 | grep ethernet_switch_port.yaml
You could also check for any errors or warnings like so:
docker logs havrest 2>&1 | grep -E 'ERROR|WARN'
ethernet_switch doesn't return anything. There are no matches for WARN. There are a lot of ERROR matches. Some are "expected" polling errors e.g. for FRU and for EMS events. Also some dup key errors for ClusterSoftware.
I see fcvi.go polling for interconnect adaptor information and getting API not found
Can I grep for a specific "object=..." string? What poller name would poll ethernet_switch_port
These messages are just the running polling messages, I'm not capturing any startup messages, for example. Maybe there would be something interesting there?
if you want, we can run this one collector, which will make it easier to debug. If you want to do that ssh into nabox, then exec into the container like so:
docker exec -it havrest sh
then run the poller directly, like so, replacing $poller with your poller name
bin/poller --poller $poller --confpath ../../../data/packages/harvest/conf:../harvest/conf:active --promPort 22001 --objects EthernetSwitchPort | tee /tmp/log.txt
@lucid gull can you copy/paste the log snippets that you mentioned yesterday about dup key errors for ClusterSoftware? We're working on a fix for that and want to confirm that our fix will also fix your issue https://github.com/NetApp/harvest/pull/3460
Would an email message work? I can do that (slightly) more easily ...
BTW: "replacing $poller with your poller name" What's a poller?
You may await my email!
the poller is https://netapp.github.io/harvest/nightly/concepts/#poller
you can find the name in your harvest.yml file, sudo less /etc/nabox/harvest/harvest.yml
e.g.
To do the debug poller run, do I have to stop the other polling first?
So, bizarrely, manually starting the poller instance, I do see a "WARN" and it is: "template name is empty. Make sure the object is defined ..."
Actually happens twice: for Rest and RestPerf
The warnings include this value: confPath: [../../../data/packages/harvest/conf:../harvest/conf:active]
RestPerf is expected because we only defined it for Rest. We do not expect to see that warning for Rest though.
While you are exec into the container, what does this show when you execute it?
ls /harvest-conf/active/rest
I exec into dc havrest, I am then root, and /harvest-conf/active contains only two dirs: zapi and zapiperf
that would explain the problem
ok exit out of the exec by typing exit or pressing Ctrl-D
then type dc restart
then exec back into the container and run the ls again ls /harvest-conf/active/rest
if you exit out of the container and do ls /etc/nabox/harvest/user do you still see the rest directory you created last week? The files that you see in /etc/nabox/harvest/user (outside the container) should be merged into what you see inside the container at /etc/nabox/harvest/active https://nabox.org/faq/#customize-harvest
Those two locations are out-of-sync for you, which is why your custom template is not being loaded. @hazy sierra do you know why this would happen?
Outside of the container I see a /etc/nabox/harvest/user/rest containing custom.yaml and a dir 9.12.0
thanks, that's what I expected
anything interesting in sudo journalctl -u naboxd --since yesterday
In the container there is an overlay mount on "/harvest-conf" with several options
yes, that's expected
In the havrest container I see my custom stuff under "/harvest-conf/user/rest/..."
( So not "/harvest-conf/active/rest" )
that makes sense, you should see it under /harvest-conf/user/rest too since the volume is mounted into the container
I believe you said that you have Auto enabled for this cluster. What if you try changing it to Rest and then checking if that causes the rest directory to be merged in your /harvest-conf/active/?
Was harvest restarted ?
same behaviour was observed before and after a "dc restart". I think that restarts all containers (?)
He's back 🙂 ... In the nabox WebUI I switched one cluster from automatic to REST, I also did a "dc restart". In the container the directory /harvest-conf/active still contains only 2 entries: zapi and zapiperf
There is another "peer" directory /harvest-conf/active.tmp which does contain 3 entries: zapi, zapiperf and rest ...
There "rest" has the 9.12.0 and ethernet_switch_port.yaml stuff we customised
Here is the output of ls -laR ...
hi @lucid gull sounds like a bug in NAbox's merge logic. Can you share a support bundle from this system so we can check the logs?
I could send such a thing to a @netapp.com address. If that would work for you guys?
typically the support bundle is too large and the mail server rejects - that's why Yann stood up this server to upload files to https://upload.nabox.org/nobi-juve-geny - anyone can upload, but only NetApp staff can download
The security team laughed out loud 🤣
Happy to download from a place you prefer
I could create a P4 support call and upload to netapp.support directly via that? Although since nabox isn't an official product ... I don't know if that would work
worth trying. I don't recall anyone who's tried that before
Or, I can, fairly easily, supply you with the output of any command that you or @hazy sierra would like ...
I'll check in to the support call idea too ...
thanks!
@hazy sierra RobbW's files aren't being merged correctly. He sees the correct templates in active.tmp but not active. Where does NAbox log errors in that logic?
FYI: For context, we are currently running nabox 4.0.7 + Harvest 24.11.1, if that helps
( I had thought to get the ISL customisation working, then upgrade to the latest ... But I could do it the other way around if desirable 🤓 )
Yes, let's try upgrading and then debugging further if needed
So, quick sanity check, first nabox 4.0.9 update and then add Harvest 25.02 should be a good process?
yes, or you can also try upgrading only nabox and retrying that. Nothing in 25.02 will affect the problem you are hitting. If you plan on upgrading to 25.02 anyway, that's fine to upgrade too and then we can retry
same 😦
Where would I configure jitter? I found a link with a reference to /opt/harvest2-conf/conf, but that path doesn't exist. Maybe for an older version of nabox ...
phooey! let's try this, after sshing into nabox
cd /tmp
sudo journalctl -u naboxd --since yesterday > /tmp/naboxd.log 2>&1
sudo journalctl -u nabox --since yesterday > /tmp/nabox.log 2>&1
docker logs havrest --since 1h > /tmp/harvest.log 2>&1
tar -czvf logs.tar.gz *.log
then let's see if the logs.tar.gz file is small enough to email. I think the SMTP relay rejects anything over 20MB
@lucid gull can you rm the .tmp directory, restart the havrest container with dc restart havrest, wait 5m, run docker logs havrest --since 30m > /tmp/harvest.log 2>&1, and email that file.
The previous harvest.log file you sent did not include startup messages which is why I've asked for a restart and then collecting after 5m
Quick question: rm the active.tmp in the container or in the base os?
base os
You nailed it! level=ERROR source=havrest.go:49 msg="unable to merge config" error="yaml: line 2: found character that cannot start any token"
$64K == Which file?!?
🙂 probably custom.yaml or ethernet_switch_port.yaml
Right, I'll "set list" or something ...
what does line 2 look like in each of those? You can email them if you don't spot the problem
In custom.yaml I had a tab (^I) to indent "EthernetSwitch ...". I've changed it to four spaces and restarted ...
That seems to have worked, I don't see any recurrence of the error
awesome!
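For future readers who hit the same "found character that cannot start any token" error: YAML does not allow tabs for indentation, so a quick scan for tab characters can save some guessing. A minimal sketch, where find_tabs is a made-up helper name and the demo file only reproduces the mistake described above:

```shell
# Hypothetical helper: print file:line:content for every line containing a tab.
# The extra /dev/null argument makes grep always print the file name.
find_tabs() {
  tab=$(printf '\t')
  grep -n "$tab" "$@" /dev/null
}

# Demo on a throwaway file that reproduces the mistake (tab before the key).
demo=$(mktemp)
printf 'objects:\n\tEthernetSwitchPort: ethernet_switch_port.yaml\n' > "$demo"
find_tabs "$demo"
rm -f "$demo"
```

On nabox the same function could be pointed at /etc/nabox/harvest/user/rest/custom.yaml and the template file next to it. No output (exit status 1 from grep) means no tabs were found.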
Not to appear (too) dumb, what do I do next? I.e. to see if any Switches / ISL links have been discovered (?)
Yip, in the container, /harvest-conf/active now has a rest subdir!
Yeah!
you can check VictoriaMetrics and confirm that it is collecting the metrics, replace ip with your nabox ip, https://$ip/vm/vmui and then type the new metric name ethernet_switch_port_new_status
OK, I see that under "create new dashboard" I can select a Metric "ethernet_switch_port_new_status" where "is_isl" = "true" ... But I'm not sure how to access the counters? (e.g. the packet and error counts)
packet and error counts are not currently being collected by that template. That detail may have been lost in our conversations about the original template. Let me see if it's possible to collect those - the ONTAP API describes those this way - can you say which of these counters you are interested in including?
Right - those 🙂 I.e. Packets, Discard and Errors for both tx and Rx directions
In VictoriaMetrics I query for the name and I get a warning "Showing 20 series out of 726 series due to performance reasons ..." Below that are a graph and table. If I "show all" I see what looks like a list of all our MetroCluster ISL interfaces... The graphed values for each seem to show values of either 0 or 1, I guess that those correspond to down or up
yes, that's right, the ethernet_switch_port_new_status metric has a zero value when the state is not up and a one value when the state is up
Here is another template to try. ONTAP is providing these counters like performance counters, which means we need to use the KeyPerf collector, so try this:
- ssh to nabox
- sudo mkdir -p /etc/nabox/harvest/user/keyperf/9.15.0
- sudo vim /etc/nabox/harvest/user/keyperf/custom.yaml
- Press i to insert and copy/paste this (indent with spaces, not tabs)
objects:
  EthernetSwitchPort: ethernet_switch_port.yaml
- Save the file by pressing shift-z shift-z (ZZ)
- Create a directory for the template by running
sudo mkdir -p /etc/nabox/harvest/user/keyperf/9.15.0/
- Copy/paste the template text below into ethernet_switch_port.yaml in that directory
name: EthernetSwitchPort
query: api/network/ethernet/switch/ports
object: ethernet_switch_port
counters:
  - ^^identity.name => interface
  - ^^switch.name => switch
  - statistics.receive_raw.errors => receive_errors
  - statistics.receive_raw.discards => receive_discards
  - statistics.receive_raw.packets => receive_packets
  - statistics.transmit_raw.errors => transmit_errors
  - statistics.transmit_raw.discards => transmit_discards
  - statistics.transmit_raw.packets => transmit_packets
  - statistics.timestamp(timestamp) => timestamp
  - filter:
      - statistics.timestamp=!"-"
  - hidden_fields:
      - statistics
export_options:
  instance_keys:
    - interface
    - switch
- sudo vim /etc/nabox/harvest/harvest.yml and add the KeyPerf collector to the list of collectors like so. Save the file.
collectors:
  - Rest
  - RestPerf
  - Zapi
  - ZapiPerf
  - Ems
  - KeyPerf
- sudo vim /data/packages/harvest/conf/keyperf/static_counter_definitions.yaml and copy/paste the following:
objects:
  node:
    counter_definitions:
      - name: statistics.processor_utilization_raw
        type: percent
        base_counter: statistics.processor_utilization_base
      - name: statistics.processor_utilization_base
        type: delta
  flexcache:
    counter_definitions:
      - name: statistics.flexcache_raw.client_requested_blocks
        type: delta
      - name: statistics.flexcache_raw.cache_miss_blocks
        type: delta
  ethernet_switch_port:
    counter_definitions:
      - name: statistics.receive_raw.errors
        type: delta
      - name: statistics.receive_raw.discards
        type: delta
      - name: statistics.receive_raw.packets
        type: delta
      - name: statistics.transmit_raw.errors
        type: delta
      - name: statistics.transmit_raw.discards
        type: delta
      - name: statistics.transmit_raw.packets
        type: delta
- Restart nabox by running
dc restart
- Confirm the template works by waiting 7 minutes after restart and then running
docker logs havrest 2>&1 | grep EthernetSwitchPort
Check for errors by running
docker logs havrest 2>&1 | grep ERR
We're currently running ONTAP 9.12, so probably I should use a sub-directory .../9.12.0/... ?
the way that best-fit works it doesn't matter since there is only one template for all versions of ONTAP in your environment
OK, thanks! So I'll be simply replacing the existing customisation
Aha ... with "keyperf" in the path. I should learn to read
you will be creating a new template for the KeyPerf collector and using that in addition to the Rest template you created. There is one step I missed, I will add it above at step 8. You need to tell your poller to use the KeyPerf collector since your config is not using it at the moment
OK step 8 added
Have run out of time, Windows JumpServer is being rebooted at 18 for a patch installation ... I will be able to get back to this on Wednesday
I think I followed the steps you described. Grepping for ERR returns too much output to be useful, is there something specific I can look for?
I can see numerous log messages for "Collector=Rest:EthernetSwitchPort" ... those contain a KV pair "bytesRx=29913" ... for example
No sign of any other counter values though ...
is that log message INFO or ERROR?
ok, that means that collector is working correctly. What if you limit the logs like so
docker logs havrest --since 30m 2>&1 | grep 'RestPerf:EthernetSwitchPort'
There are no occurrences of 'RestPerf:EthernetSwitchPort' (just RestPerf is logged)
I thought it was some other kind of collector?
No occurrences of 'KeyPerf' either
And step 8 was saved? i.e. adding KeyPerf to the list of collectors? And after all of the changes, you restart the container with dc restart?
If so, what if you restart everything again, then after 5m run docker logs havrest --since 6m 2>&1 | tee havrest.log and then email that havrest.log file. It will include the poller startup when the templates are loaded and should be small enough to mail
Re Step 8: '-KeyPerf' is listed under Defaults. However '-RestPerf' isn't. All of the individual Poller entries seem to have four Collectors: Rest + RestPerf + Zapi + ZapiPerf
Will they inherit KeyPerf from Defaults?
not if they override it, defaults is used when a child does not specify a value
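To make that rule concrete, here is a small shell sketch of the behaviour as described in this thread: a poller's own collectors list wins, and Defaults is only used when the poller does not specify one. This is not Harvest's code; effective_collectors and the example lists are made up for illustration.

```shell
# Hypothetical sketch, not Harvest code: the poller's own list wins,
# Defaults is only consulted when the poller does not set collectors.
effective_collectors() {
  poller_list=$1
  defaults_list=$2
  if [ -n "$poller_list" ]; then
    echo "$poller_list"
  else
    echo "$defaults_list"
  fi
}

# Illustrative values only.
defaults="Zapi ZapiPerf Ems KeyPerf"
effective_collectors "Rest RestPerf Zapi ZapiPerf" "$defaults"   # poller overrides Defaults
effective_collectors "" "$defaults"                              # poller inherits Defaults
```

That is why KeyPerf has to be added to each poller that already carries its own collectors list.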
I've added a '- KeyPerf' collector entry to every individual Poller (Cluster)
And have now replaced the tabs I used with spaces
and dc restarted
Humm ... I think the (an) issue might be: keyperf.go:226 ... msg="Skipping metric due to unknown metricType"
step 9 should cover that
yep, updated with those
Q. Should those counters be of type delta or raw?
I had a question out to ONTAP about that and as you noticed I've flipped between the two while I wait for an answer. I think delta makes more sense because they appear to be monotonically increasing - I'll update the paste to use delta
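As a quick illustration of why delta suits a monotonically increasing counter: the useful number is the change between two polls, not the running total. A minimal sketch with made-up sample values; counter_delta is a hypothetical name, and this is not how Harvest implements it internally.

```shell
# Hypothetical illustration of a "delta" counter: the exported value is the
# difference between two consecutive polls of an ever-increasing total.
counter_delta() {
  previous=$1
  current=$2
  echo $((current - previous))
}

# Made-up raw totals of received packets from two consecutive polls.
counter_delta 1000 1250   # prints 250
```

With type raw, the graph would instead show the running total (1000, then 1250), which mostly just climbs.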
Deja vu ... Takes me back to customising the Remedy Health Profiler in Dallas to suit Wellfleet instead of Cisco. And here I am, still doing the same, er, quality work 🙄
thanks for catching it, you would have noticed when you looked at the values after you get your collector working
That's what the customer in Dallas pointed out 🤣
( though I couldn't say for sure which one is correct yet )
FYI: It's alive! But unhappy: ... collector=KeyPerf:EthernetSwitchPort error="Failed to fetch ... error making request StatusCode: 400 Message: Unexpected argument "statistics..timestamp ... Code: 262179 ...
I'll have a look in those config files to see what I've done wrong this time 🙂
check if you have a two dots side by side .. in the template
Would that be good or bad?
bad, the error you pasted makes it sound like you have statistics..timestamp when it should be statistics.timestamp
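One way to check a template for that is a fixed-string grep for two consecutive dots. A small sketch, where find_double_dots is a made-up helper and the demo file is only an example; note that it only finds a literal ".." in a file and would not catch a field name that gets assembled at runtime.

```shell
# Hypothetical helper: list file:line:content for lines with two dots in a row.
# -F treats the pattern as a fixed string rather than a regular expression.
find_double_dots() {
  grep -nF '..' "$@" /dev/null
}

# Demo on a throwaway file: the first line is fine, the second has the typo.
demo=$(mktemp)
printf '%s\n' '- statistics.timestamp(timestamp) => timestamp' '- statistics..timestamp' > "$demo"
find_double_dots "$demo"
rm -f "$demo"
```

If this prints nothing for the real template, the double dot is probably not coming from the file itself.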
I couldn't find any problem in that file, I can only think that, perhaps, the content has somehow been damaged on its journey through Discord and Outlook ... Maybe you could send me your original example/sample text via email as a .txt attachment? (I think you have my email address).