Trying to enable NFS client, Qtree and FlexGroup counters. I'm running NABox 4.0.10, Grafana 11.5.2, Victoria 2.24.0. I am so very confused with the instructions (note I don't have access to the internet from the running system).
First question: are the config files supposed to be .yml or .yaml? I see conflicting instructions.
Secondly, I am trying to copy the /etc/nabox/harvest/*.yml to /etc/nabox/harvest/user/restperf/custom_xxxx.yml, then modify and 'dc restart'. The changes don't take effect. Any clue as to what I may be doing wrong?
#Need some NABOX configuration help
hi @idle snow sorry for the confusion. Can you share links to the conflicting documentation and we'll update it.
If you want to enable those objects and you don't plan on modifying their templates, there is no reason to copy the templates. Take a look at this example for Qtree https://github.com/NetApp/harvest/discussions/3446
Thanks Chris. Will take a look later, after dousing another fire. 😉
So I followed the instructions and created a file /etc/nabox/harvest/user/restperf/custom.yaml with the following lines:
objects:
Qtree: qtree.yaml
Nothing happened. Then I thought that I may have needed to copy the qtree.yaml file from /data/packages/harvest/conf/rest/9.12.x/qtree.yaml into /etc/nabox/harvest/user/restperf/qtree.yaml; I did that and restarted the docker instance. Waiting...
no need to copy. And because this is a perf template, you will need to wait ~3 minutes to allow two successive polls to run. You can also check that the template is being loaded by running
docker logs havrest 2>&1 | grep Qtree
Lots of 'level=WARN source=rest.go:543 msg="Instance data is not object, skipping" Poller=CLUSTER collector=Rest:Qtree type=String'
then
What version of ONTAP and Harvest?
level=ERROR source=rest.go:457 msg="" Poller=CLUSTER collector=REST:Qtree error="configuration error => empty url" api=api/private/cli/qtree
ONTAP 9.14P9, harvest 25.02.0
thanks! let's make sure your custom.yaml is valid YAML. If you open it again in vi, type :set list and press return, do you see any ^I characters?
if you see ^I that means you used tabs, and YAML only supports spaces for indentation. Try deleting each tab and replacing it with a couple of spaces. Save, dc restart, and then run the grep again a minute or so after the restart
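For anyone on a dark site without yamllint, the same tab check can be sketched with nothing but the Python standard library. This is only an illustration, not a NAbox or Harvest tool, and the function name `find_tabs` is made up for this example.

```python
def find_tabs(text):
    """Return (line_number, column) for every tab character, both 1-based."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == "\t":
                hits.append((line_no, col))
    return hits


# A tab between the colon and the value, as described later in this thread.
sample = "objects:\n  Qtree:\tqtree.yaml\n"
for line_no, col in find_tabs(sample):
    print(f"tab at line {line_no}, column {col}")
```

To check a real file, read it with `open(path).read()` and pass the text to `find_tabs`; an empty list means no tabs were found.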
I guarantee I didn't use tabs. But let me take them out.
Let me rephrase, not in the beginning, I may have from habit in between : and qtree.yaml 😉
Looks like it is trying to collect via ZAPI.
custom.yaml in zapiperf had tabs as well
So did keyperf
perhaps you have ZapiPerf listed before RestPerf in your harvest.yml file? If so, that would explain why ZapiPerf is preferred instead of RestPerf
It appears that the collectors for each array are:
Zapi, ZapiPerf,Rest,EMS
I don't think I modified that
that order explains why ZapiPerf is being used. RestPerf will never be used since it is not in the list. Since you're running ONTAP clusters >9.12 you can change the order to Rest, RestPerf, Zapi, ZapiPerf, Ems
Getting some data
Just so I understand then: For each collector I want to enable, i.e. NFS clients, I need to add it in my custom.yaml for restperf, zapiperf and keyperf dirs, then copy the appropriate yaml file for the collector in to each?
So I feel a little uncomfortable with making the change for every cluster I have. I will, but shouldn't it default that way?
can I comment the harvest.yml?
The top of the file, under Defaults:, collectors:, does not have Rest in the stanza. Should I add it? And should I add both Rest and RestPerf, or only as appropriate?
restperf, zapiperf, and keyperf are all kinds of performance collectors. Let's ignore KeyPerf for a moment. ZapiPerf and RestPerf collect and publish identical metrics (with a few exceptions due to missing ONTAP APIs). Each of those collectors has a default.yaml that lists the objects that collector supports (Disk, Qtree, etc.). When Harvest starts, it reads your collector list from your harvest.yml and uses the collectors in the order you listed. When the same object is collected by multiple collectors, only the first collector listed in your harvest.yml will run. It doesn't make sense to request Qtree metrics via both ZapiPerf and RestPerf, since the metrics will be the same and it just creates more work for ONTAP and Harvest. This is explained here https://netapp.github.io/harvest/nightly/architecture/rest-strategy/#can-i-use-the-rest-and-zapi-collectors-at-the-same-time
My recommendation would be to use these collectors in this order: Rest, RestPerf, Zapi, ZapiPerf, Ems.
Then for the templates that you want to enable, create a custom.yaml and add the objects there like you did above for Qtree.
For example:
objects:
  Qtree: qtree.yaml
  NFSClients: nfs_clients.yaml
You do NOT need to copy the appropriate yaml files for the collectors into each.
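The "first listed collector wins" rule described above can be pictured with a small Python model. This is a simplified sketch, not Harvest's actual code: the `FAMILY` table and the `resolve` function are hypothetical, and it assumes that only collectors of the same kind (Rest vs Zapi, RestPerf vs ZapiPerf) compete for an object.

```python
# Assumption: Rest/Zapi compete with each other, as do RestPerf/ZapiPerf.
# KeyPerf and Ems are left out of this simplified model.
FAMILY = {"Rest": "config", "Zapi": "config", "RestPerf": "perf", "ZapiPerf": "perf"}


def resolve(collectors, templates):
    """Map (family, object) to the first listed collector that has a template for it.

    collectors: collector names in harvest.yml order.
    templates:  dict of collector name -> set of object names it can collect.
    """
    chosen = {}
    for collector in collectors:
        family = FAMILY.get(collector)
        if family is None:
            continue  # e.g. Ems, which is not modelled here
        for obj in sorted(templates.get(collector, ())):
            # setdefault keeps the earlier collector if one already claimed the object
            chosen.setdefault((family, obj), collector)
    return chosen


templates = {name: {"Qtree"} for name in FAMILY}
print(resolve(["Zapi", "ZapiPerf", "Rest", "Ems"], templates))
print(resolve(["Rest", "RestPerf", "Zapi", "ZapiPerf", "Ems"], templates))
```

With the original order from this thread, Qtree performance data goes to ZapiPerf and RestPerf is never considered because it is not in the list. With the recommended order, Rest and RestPerf claim Qtree first.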
Ok, I added NFSClients and CIFSSession. I am trying to get anything from the log, but I don't see anything
feel free to send me the log file after restart and I'll take a look.
docker logs havrest --since 3m > /tmp/havrest.log 2>&1 (change the 3m to cover when you restarted, since the templates are read once during startup)
mail to ng-harvest-files@netapp.com
Unfortunately, I'm a dark site
ok
when you say that you don't see anything - do you mean you don't see anything in the logs?
No I don't see anything matching "nfs_session"
what does docker logs havrest 2>&1 | grep -E 'CIFSSession|NFSClients' show?
Nada
you changed your /etc/nabox/harvest/harvest.yml and created a custom.yaml, can you paste your custom.yaml and the collectors section of the poller in question?
I'm not sure what you mean. I modified the /etc/nabox/harvest/harvest.yml and added "- Rest, -RestPerf, - Zapi, -ZapiPerf, - Ems" for each cluster monitored. The custom.yaml only contains four lines:
Objects:
Qtree: qtree.yaml
CIFSSession: cifs_session.yaml
NFSClients: nfs_clients.yaml
after making these changes did you dc restart? dc restart havrest would be fine too. Also the custom yaml is in which directory?
thanks. the CIFSSession and NFSClients templates are rest templates, not restperf so you will need to create a /etc/nabox/harvest/user/rest/custom.yaml that contains those. The Qtree object has both a rest and restperf template so it should be listed in both
Ah... You may be onto the problem. So I need a custom.yaml for both rest and restperf
yes, you need a custom.yaml for the matching collector's template
Ok, thank you Chris. I guess I am a little thick sometimes. 😉 Sorry about the frustration
you're welcome! and no, the issue is we need to explain these concepts better. It's not you, it's our documentation
and just by entering the object, i.e. Qtree: qtree.yaml it will find the Rest and RestPerf appropriately and with the latest version?
yes, it will find the version that best matches your ONTAP version.
Since the Qtree object has a Rest and RestPerf template, you need a custom.yaml with Qtree: qtree.yaml in both /etc/nabox/harvest/user/rest/custom.yaml and /etc/nabox/harvest/user/restperf/custom.yaml
Right, but for nfs_clients I only need the entry in the rest/custom.yaml not also in the restperf/custom.yaml
that's right since that template does not exist for restperf
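To make the "one custom.yaml per matching collector" point concrete, here is a small Python sketch that renders the file contents for each directory. The object-to-collector mapping only covers the three objects discussed in this thread, and `TEMPLATES` and `render_custom_yaml` are names invented for the example. It prints text and does not write to disk.

```python
# Object -> (template file, collectors that ship a template for it),
# taken from this thread only.
TEMPLATES = {
    "Qtree": ("qtree.yaml", ("rest", "restperf")),
    "CIFSSession": ("cifs_session.yaml", ("rest",)),
    "NFSClients": ("nfs_clients.yaml", ("rest",)),
}


def render_custom_yaml(templates):
    """Return {path: file text} with one custom.yaml per collector directory."""
    per_dir = {}
    for obj, (filename, collectors) in templates.items():
        for collector in collectors:
            # Two spaces of indentation, never tabs.
            per_dir.setdefault(collector, []).append(f"  {obj}: {filename}")
    return {
        f"/etc/nabox/harvest/user/{collector}/custom.yaml": "objects:\n" + "\n".join(lines) + "\n"
        for collector, lines in per_dir.items()
    }


for path, text in render_custom_yaml(TEMPLATES).items():
    print(path)
    print(text)
```

Qtree ends up in both files, while CIFSSession and NFSClients appear only under rest/.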
Thank you sir. I wish you a peaceful weekend
you too!
Hello again. Trying to look for my NFS sessions and am not seeing them
I did notice that the active rest and restperf custom.yaml's are empty
hi @idle snow if you ssh into your nabox system, what is the result of ls -la /etc/nabox/harvest/user/rest/
custom.yaml
And what about cat /etc/nabox/harvest/user/rest/custom.yaml
Text is:
objects:
Qtree: qtree.yaml
CIFSSession: cifs_session.yaml
NFSClients: nfs_clients.yaml
thanks. and what do you have in the collectors section of the poller in question in your /etc/nabox/harvest/harvest.yml file?
Exactly that; What doesn't make sense is that it looked like it was collecting on Friday when we parted
interesting. What does this return?
docker logs havrest 2>&1 | grep 'NFSClients'
Nothing
If you restart the havrest container by running dc restart havrest. Wait 3 minutes, and then run that grep again, do you get anything?
well that is unexpected. What about ls -la /etc/nabox/harvest/active/rest
That is what I was saying before, it only has custom.yaml and its contents are:
objects: {}
Well, glad you are happy...
What about docker logs havrest --since 10m 2>&1 | grep ERROR
I am getting errors on some clusters: go:678 msg="Failed to collect ems data" and,
go:436 msg="" Poller=cluster collector=Rest:FRU/NetPort... error="failed to fetch data: error making request StatusCode 404, Message: API not found, Code: 3 API: /api/private/cli/system/controller/fru?fields=fru_name%2Cnode%2Cserial...
Errors go:601,217,678 and more
FYI, I updated NABox and Harvest last week
thanks, none of those would affect the NFSClients collectors.
Can you check your /etc/nabox/harvest/user/rest/custom.yaml file again and make sure there are no tabs or other syntax errors. Your /etc/nabox/harvest/active/rest/custom.yaml should not have an empty objects section. That's why the collector is not being loaded.
I suspect a syntax error or tabs is causing active to contain objects: {}
I don't see any control characters in there
if possible, it might be worth scping the file off, and using an external tool to validate the yaml
ok
@pure bluff why would the file /etc/nabox/harvest/active/rest/custom.yaml contain objects: {} even though 70tas has data in /etc/nabox/harvest/user/rest/custom.yaml
All I can think of is that the /etc/nabox/harvest/user/rest/custom.yaml file has syntax errors, but 70tas has checked that
Are leading spaces required?
sometimes - best to copy/paste into a yaml validator to check or use yamllint
Here is a valid custom.yaml
objects:
  CIFSSession: cifs_session.yaml
  EthernetSwitchPort: ethernet_switch_port.yaml
  LunMap: lun_map.yaml
  Volume: custom_volume_flexgroup.yaml
Can you paste that with code format ?
Unfortunately I don't have yamllint, I'll have to see if it is approved in EPEL. I am seeing three Grafana errors due to 127.0.0.53:53 i/o timeout; would that be an issue?
Ok, got yamllint installed. Seeing a warning, 'missing document start "---"', and an error, 'too many spaces after colon'. Let me work on that
try yamllint -d relaxed /etc/nabox/harvest/active/rest/custom.yaml. The "too many spaces after colon" will probably become warnings, which is fine
I went through every file.
I gave a space at ^ for indented lines
a : after key, and one space and then the value
yamllint only warns of the missing start "---"
I am still only getting an active/zapi and active/zapiperf
@pure bluff Did you ask me about the code format?
yamllint -d relaxed shows nothing on either the active or user custom.yaml files
Ok, getting somewhere now. I rebooted. Now I have active custom.yaml files with data in all but keyperf/
I'm getting data... Whooppee!
missing keyperf is fine
sounds like the problem may have been syntax errors in the yaml files and after fixing that and restarting the container, things started working
you're welcome! needing a reboot sounds like an nabox bug to me. What do you think @pure bluff?
Rebooting NAbox fixed the objects issue ?
yes
Ok probably a dc restart havrest would have fixed it as well ?
70tas mentioned a few messages above that a restart havrest did not work, only a reboot did. That's why I'm saying that it sounds like a bug
it sounds like the steps to reproduce would be to add syntax errors to custom.yaml files, restart the container, check that active has an empty custom.yaml, fix syntax errors, restart the container and then check that custom.yaml is fixed. 70tas is saying that after fixing the syntax errors and restarting the container that the custom.yaml will be blank until a reboot
Ok, so I managed to create the same custom.yaml :
❯ cat custom.yaml
objects: {}
With something like :
❯ cat ../../user/rest/custom.yaml
object:
  Qtree: qtree.yaml
  Volume: custom_volume_flexgroup.yaml
(Note object instead of objects)
yes, any syntax error in custom.yaml should do - tabs vs spaces. list instead of map, misspelling
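The kinds of mistakes listed above (tabs, a misspelled top-level key, entries that are not nested under objects) can be caught with a rough stdlib-only check. This is a sketch under assumptions, not how NAbox merges templates and not a YAML parser: `check_custom_yaml` is a hypothetical helper that assumes the file should contain exactly one top-level key named objects.

```python
def check_custom_yaml(text):
    """Return a list of human-readable problems found in a custom.yaml text."""
    problems = []
    top_level = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or stripped == "---":
            continue  # blank lines, comments and the document start marker
        if "\t" in line:
            problems.append(f"line {line_no}: tab character")
        if line[0] not in " \t":
            # An unindented line is treated as a top-level key.
            top_level.append((line_no, line.split(":", 1)[0].strip()))
    for line_no, key in top_level:
        if key != "objects":
            problems.append(f"line {line_no}: unexpected top-level key {key!r}")
    if not any(key == "objects" for _, key in top_level):
        problems.append("no top-level 'objects' key")
    return problems


print(check_custom_yaml("object:\n  Qtree: qtree.yaml\n"))
print(check_custom_yaml("objects:\nQtree: qtree.yaml\n"))
```

An empty list only means none of these specific mistakes were found; yamllint or a real YAML parser is still the better check when one is available.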
That's handled correctly, an invalid yaml will abort merge which is fine, I just improved the logs to indicate where the problem is
and after fixing the syntax error and restarting the container - is the corrected template loaded? 70tas is saying that after fixing the syntax errors and restarting the container that the custom.yaml will be blank until a reboot, not a restart
Yes, that's where I'm stumped on this one, I can't figure out how this can happen
on merge the active directory is completely deleted and recreated
did the support bundles from https://upload.nabox.org/nuyo-zuxu-mynu help? I noticed that active.tmp was present. The other issue I noticed was if there is an error in restperf it will prevent rest custom.yaml from merging too
not really in the sense that I only saw the ^I and tried the same, but that cleanly fails, leaving out a dangling active.tmp directory which is fine
right, it seems like some folks have had the dangling active.tmp that persists even after there are no errors. Those folks "fixed" the problem by rebooting
on a running instance (not dev environment) I "break" custom.yaml by putting custom: so I end up with a merged {} after dc restart havrest.
Once fixed, dc restart havrest again, file is correct again
Will categorize this one as cosmic 😄 and wait for it to happen again
Cont... Still not getting any Qtree stats on FlexGroups. @pure bluff do I need a special configuration to collect qtree stats from FlexGroups?
That’s a harvest question, but NAbox should behave and let you configure whatever you need.
Thank you. @modest bluff this is what is in the log...
timestamp level=INFO source=collector.go:601 msg=Collected Poller=aces collector=Rest:Qtree apiMs=1176 bytesRx=211678 calcMs=0 exportMs=1 instances=379 instancesExported=379 metrics=3032 metricsExported=758 numCalls=2 parseMs=2 pluginInstances=0 pluginMs=0 pollMs=1179 renderedBytes=119951 zBegin=1743423306730
timestamp level=INFO source=collector.go:601 msg=Collected Poller=aces collector=RestPerf:Qtree apiMs=188 bytesRx=148200 calcMs=0 exportMs=0 instances=134 instancesExported=48 metrics=1072 metricsExported=192 numCalls=1 numPartials=0 parseMs=2 pluginInstances=0 pluginMs=0 pollMs=188 renderedBytes=322248 skips=0 zBegin=1743423338920, and so on
hi @idle snow those two log lines tell us that the Rest:Qtree and RestPerf:Qtree collectors are running, collecting, and exporting metrics. That's good. If you go to the Qtree dashboard do you see data there? If so, if you select the Volume drop down at the top of the screen, do you see any flexgroup volumes there?
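When there are many of these "Collected" lines, pulling the counters out with a few lines of Python makes them easier to compare. This is a minimal sketch: `parse_fields` is a made-up helper, the sample line is abbreviated from the RestPerf:Qtree line in this thread, and the simple whitespace split will not handle quoted values that contain spaces.

```python
def parse_fields(log_line):
    """Split a key=value style log line into a dict (no support for quoted spaces)."""
    fields = {}
    for token in log_line.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens without '=' such as a leading timestamp
            fields[key] = value
    return fields


line = (
    "level=INFO source=collector.go:601 msg=Collected Poller=aces "
    "collector=RestPerf:Qtree instances=134 instancesExported=48 "
    "metrics=1072 metricsExported=192"
)
fields = parse_fields(line)
not_exported = int(fields["instances"]) - int(fields["instancesExported"])
print(fields["collector"], "instances:", fields["instances"],
      "exported:", fields["instancesExported"], "not exported:", not_exported)
```

The same function can be run over the output of docker logs havrest to list instances and instancesExported per collector.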
No and no
I take that back. No and Yes, but if I select a volume or qtree, I still do not see anything
There are two rows in that dashboard, Highlights and Usage. When you select a Volume are the panels in both rows empty or is there data in some of the panels?
let's check that VictoriaMetrics has the metrics you expect to see. If you go to https://$ip/vm/vmui/ (replacing $ip with your nabox IP), type qtree_labels and press Enter, do you see the flexgroup in the list?
Sorry, I'm back
Yes, however, at the top is "Showing 20 series out of 731 series due to performance reasons"
that message is ok. In VictoriaMetrics, what if you query for qtree_total_ops for one of the flexgroup volumes by typing something like this, replacing $fg with the name of your flexgroup: qtree_total_ops{volume="$fg"} Do you get results for that flexgroup?
@idle snow This is a known issue with restPerf collector due to an ONTAP bug. I have added details here https://github.com/NetApp/harvest/issues/3541
Ladies and Gents, trying to enable EMS collector, but can't seem to find the HOWTO. Can someone help?
hi @idle snow have you tried clicking this toggle in the settings for your cluster in nabox?
Duh! Don't know, let me check. Thank you
Actually, it was already enabled
They are all coming back with "No data"
could you restart nabox via ssh by typing dc restart, wait 5m, then grab a support bundle and upload it to https://upload.nabox.org/gamu-yita-vuty and we'll take a look
Ok, did rest and restperf, do I also need to do the zapis'? I'm running 9.14.1P12
I'm going to let go overnight and see if it has collected by morning
Ok, I got my FG's running. But I'm still not able to get any Health reports.