#Need some NABOX configuration help

idle snow
#

Trying to enable NFS client, Qtree and FlexGroup counters. I'm running NABox 4.0.10, Grafana 11.5.2, Victoria 2.24.0. I am very confused by the instructions (note that I don't have internet access from the running system).
First question: are the config files supposed to be .yml or .yaml? I see conflicting instructions.
Second, I am trying to copy the /etc/nabox/harvest/*.yml to /etc/nabox/harvest/user/restperf/custom_xxxx.yml, then modify it and run 'dc restart'. The changes don't take effect. Any clue as to what I may be doing wrong?

modest bluff
#

hi @idle snow sorry for the confusion. Can you share links to the conflicting documentation and we'll update it.

If you want to enable those objects and you don't plan on modifying their templates, there is no reason to copy the templates. Take a look at this example for Qtree https://github.com/NetApp/harvest/discussions/3446

idle snow
#

Thanks Chris. Will take a look later, after dousing another fire. 😉

idle snow
#

So I followed the instructions and created a file /etc/nabox/harvest/user/restperf/custom.yaml with the following lines:
objects:
Qtree: qtree.yaml

Nothing happened. Then I thought that I may have needed to copy the qtree.yaml file from /data/packages/harvest/conf/rest/9.12.x/qtree.yaml into /etc/nabox/harvest/user/restperf/qtree.yaml; I did that and restarted the docker instance. Waiting...

modest bluff
#

no need to copy. And because this is a perf template, you will need to wait ~3 minutes to allow two successive polls to run. You can also check that the template is being loaded by running
docker logs havrest 2>&1 | grep Qtree

idle snow
#

Lots of 'level=WARN source=rest.go:543 msg="Instance data is not object, skipping" Poller=CLUSTER collector=Rest:Qtree type=String

#

then

modest bluff
#

What version of ONTAP and Harvest?

idle snow
#

level=ERROR source=rest.go:457 msg="" Poller CLUSTER collector=REST:Qtree error="configuration error => empty url" api=api/private/cli/qtree

#

ONTAP 9.14P9, harvest 25.02.0

modest bluff
#

thanks! let's make sure your custom.yaml is valid YAML. Open it again in vi, type :set list, and press Return. Do you see any ^I markers?

idle snow
#

Yes I do

#

Beginning and between : and qtree.yaml

modest bluff
#

if you see ^I that means you used tabs, and YAML only supports spaces. Try deleting each tab and replacing it with a couple of spaces. Save, dc restart, and then run the grep again a minute or so after the restart
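On a system without a YAML linter, the same check that `:set list` does by eye can be sketched in a few lines of Python. This is only an illustration: the function name `find_tabs` and the sample strings are made up for the example and are not part of Harvest or NAbox.

```python
def find_tabs(yaml_text: str) -> list[tuple[int, int]]:
    """Return 1-based (line, column) pairs for every tab character found."""
    hits = []
    for line_no, line in enumerate(yaml_text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == "\t":
                hits.append((line_no, col))
    return hits


# Hypothetical file contents: a tab at the start of the line and after the colon.
broken = "objects:\n\tQtree:\tqtree.yaml\n"
# The same entry with two spaces of indentation and one space after the colon.
fixed = "objects:\n  Qtree: qtree.yaml\n"

print(find_tabs(broken))  # [(2, 1), (2, 8)]
print(find_tabs(fixed))   # []
```

An empty list means no tabs were found; it does not prove the file is otherwise valid YAML.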

idle snow
#

I guarantee I didn't use tabs. But let me take them out.

#

Let me rephrase, not in the beginning, I may have from habit in between : and qtree.yaml 😉

#

Looks like it is trying to collect via ZAPI.

#

custom.yaml in zapiperf had tabs as well

#

So did keyperf

modest bluff
#

perhaps you have ZapiPerf listed before RestPerf in your harvest.yml file? If so, that would explain why ZapiPerf is preferred instead of RestPerf

idle snow
#

It appears that the collectors for each array are:
Zapi, ZapiPerf, Rest, EMS
I don't think I modified that

modest bluff
#

that order explains why ZapiPerf is being used. RestPerf will never be used since it is not in the list. Since you're running ONTAP clusters >9.12 you can change the order to Rest, RestPerf, Zapi, ZapiPerf, Ems

idle snow
#

Getting some data

#

Just so I understand then: for each collector I want to enable, e.g. NFS clients, I need to add it to my custom.yaml in the restperf, zapiperf and keyperf dirs, then copy the appropriate yaml file for the collector into each?

#

So I feel a little uncomfortable with making the change for every cluster I have. I will, but shouldn't it default that way?

#

can I comment the harvest.yml?

#

The top of the file, under Defaults:, collectors:, does not have Rest in the stanza. Should I add it? And should I add both Rest and RestPerf, or only as appropriate?

modest bluff
#

restperf, zapiperf, and keyperf are all kinds of performance collectors. Let's ignore KeyPerf for a moment. ZapiPerf and RestPerf collect and publish identical metrics (with a few exceptions due to missing ONTAP APIs). Each of those collectors has a default.yaml that lists the objects that collector supports (Disk, Qtree, etc.). When Harvest starts, it reads the collector list from your harvest.yml and uses the collectors in the order you listed. When the same object is collected by multiple collectors, only the first collector listed in your harvest.yml will run. It doesn't make sense to request Qtree metrics via both ZapiPerf and RestPerf, since the metrics will be the same and it just creates more work for ONTAP and Harvest. This is explained here https://netapp.github.io/harvest/nightly/architecture/rest-strategy/#can-i-use-the-rest-and-zapi-collectors-at-the-same-time

My recommendation would be to use these collectors in this order: Rest, RestPerf, Zapi, ZapiPerf, Ems.
Then for the templates that you want to enable, create a custom.yaml and add the objects there like you did above for Qtree.
For example:

objects:
  Qtree:      qtree.yaml
  NFSClients: nfs_clients.yaml

You do NOT need to copy the appropriate yaml files for the collectors into each.
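The "first collector listed wins" rule can be pictured with a small model. This is a simplified sketch, not Harvest's actual code: the `KIND` pairing table, the function name, and the `enabled` data are assumptions made for illustration.

```python
# Hypothetical pairing: collectors of the same kind publish the same metrics.
KIND = {
    "Zapi": "config", "Rest": "config",
    "ZapiPerf": "perf", "RestPerf": "perf",
}


def pick_collectors(order, enabled):
    """order: collectors listed for a poller in harvest.yml, in order.
    enabled: collector name -> set of objects its templates enable.
    Returns {(kind, object): first listed collector that offers it}."""
    winners = {}
    for collector in order:
        kind = KIND.get(collector)
        if kind is None:  # e.g. Ems, which is not modelled here
            continue
        for obj in enabled.get(collector, ()):
            # setdefault keeps the earlier collector if one already claimed it
            winners.setdefault((kind, obj), collector)
    return winners


enabled = {
    "Zapi": {"Qtree"}, "ZapiPerf": {"Qtree"},
    "Rest": {"Qtree"}, "RestPerf": {"Qtree"},
}
old = pick_collectors(["Zapi", "ZapiPerf", "Rest", "Ems"], enabled)
new = pick_collectors(["Rest", "RestPerf", "Zapi", "ZapiPerf", "Ems"], enabled)
print(old[("perf", "Qtree")])  # ZapiPerf
print(new[("perf", "Qtree")])  # RestPerf
```

With the original order, RestPerf is never in the list, so the perf Qtree object can only come from ZapiPerf; with the recommended order, RestPerf is listed first and is the one that runs.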

idle snow
#

Ok, I added NFSClients and CIFSSession. I am trying to get anything from the log, but I don't see anything

modest bluff
#

feel free to send me the log file after restart and I'll take a look.
docker logs havrest --since 3m > /tmp/havrest.log 2>&1
Change the 3m to cover when you restarted, since the templates are read once during startup.
mail to ng-harvest-files@netapp.com

idle snow
#

Unfortunately, I'm a dark site

modest bluff
#

ok

#

when you say that you don't see anything - do you mean you don't see anything in the logs?

idle snow
#

No I don't see anything matching "nfs_session"

modest bluff
#

what does docker logs havrest 2>&1 | grep -E 'CIFSSession|NFSClients' show?

idle snow
#

Nada

modest bluff
#

you changed your /etc/nabox/harvest/harvest.yml and created a custom.yaml, can you paste your custom.yaml and the collectors section of the poller in question?

idle snow
#

I'm not sure what you mean. I modified the /etc/nabox/harvest/harvest.yml and added "- Rest, -RestPerf, - Zapi, -ZapiPerf, - Ems" for each cluster monitored. The custom.yaml only contains four lines:
Objects:
Qtree: qtree.yaml
CIFSSession: cifs_session.yaml
NFSClients: nfs_clients.yaml

modest bluff
#

after making these changes did you dc restart? dc restart havrest would be fine too. Also the custom yaml is in which directory?

idle snow
#

Yes

#

/etc/nabox/harvest/user/restperf/custom.yaml

modest bluff
#

thanks. the CIFSSession and NFSClients templates are rest templates, not restperf so you will need to create a /etc/nabox/harvest/user/rest/custom.yaml that contains those. The Qtree object has both a rest and restperf template so it should be listed in both

idle snow
#

Ah... You may be on to the problem. So I need a custom.yaml for both rest and restperf

modest bluff
#

yes, you need a custom.yaml for the matching collector's template

idle snow
#

Ok, thank you Chris. I guess I am a little thick sometimes. 😉 Sorry about the frustration

modest bluff
#

you're welcome! and no, the issue is we need to explain these concepts better. It's not you, it's our documentation

idle snow
#

and just by entering the object, i.e. Qtree: qtree.yaml it will find the Rest and RestPerf appropriately and with the latest version?

modest bluff
#

yes, it will find the version that best matches your ONTAP version.

Since the Qtree object has a Rest and RestPerf template, you need a custom.yaml with Qtree: qtree.yaml in both /etc/nabox/harvest/user/rest/custom.yaml and /etc/nabox/harvest/user/restperf/custom.yaml

idle snow
#

Right, but for nfs_clients I only need the entry in the rest/custom.yaml not also in the restperf/custom.yaml

modest bluff
#

that's right since that template does not exist for restperf
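A short sketch of that rule, for illustration only: the `TEMPLATES` table records just what was stated in this thread (Qtree has rest and restperf templates; CIFSSession and NFSClients are rest only), and the function name is made up.

```python
# Which collectors have a template for each object, as stated in this thread.
TEMPLATES = {
    "Qtree": {"file": "qtree.yaml", "collectors": ["rest", "restperf"]},
    "CIFSSession": {"file": "cifs_session.yaml", "collectors": ["rest"]},
    "NFSClients": {"file": "nfs_clients.yaml", "collectors": ["rest"]},
}


def custom_yaml_by_collector(objects):
    """Return {collector: custom.yaml text} for the requested objects."""
    lines = {}
    for name in objects:
        info = TEMPLATES[name]
        for collector in info["collectors"]:
            entry = f"  {name}: {info['file']}"  # two spaces, never a tab
            lines.setdefault(collector, ["objects:"]).append(entry)
    return {c: "\n".join(body) + "\n" for c, body in lines.items()}


files = custom_yaml_by_collector(["Qtree", "CIFSSession", "NFSClients"])
for collector, text in files.items():
    print(f"/etc/nabox/harvest/user/{collector}/custom.yaml")
    print(text)
```

The rest file ends up with all three objects, while the restperf file lists only Qtree.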

idle snow
#

Thank you sir. I wish you a peaceful weekend

modest bluff
#

you too!

idle snow
#

Hello again. Trying to look for my NFS sessions and am not seeing them

#

I did notice that the active rest and restperf custom.yaml's are empty

modest bluff
#

hi @idle snow if you ssh into your nabox system, what is the result of ls -la /etc/nabox/harvest/user/rest/

idle snow
#

custom.yaml

modest bluff
#

And what about cat /etc/nabox/harvest/user/rest/custom.yaml

idle snow
#

Text is:
objects:
Qtree: qtree.yaml
CIFSSession: cifs_session.yaml
NFSClients: nfs_clients.yaml

modest bluff
#

thanks. and what do you have in the collectors section of the poller in question in your /etc/nabox/harvest/harvest.yml file?

idle snow
#

Exactly that; What doesn't make sense is that it looked like it was collecting on Friday when we parted

modest bluff
#

interesting. What does this return?
docker logs havrest 2>&1 | grep 'NFSClients'

idle snow
#

Nothing

modest bluff
#

If you restart the havrest container by running dc restart havrest. Wait 3 minutes, and then run that grep again, do you get anything?

idle snow
#

I'll try it now

#

Nothing still

modest bluff
#

well that is unexpected. What about ls -la /etc/nabox/harvest/active/rest

idle snow
#

That is what I was saying before, it only has custom.yaml and its contents are:
objects: {}

#

Well, glad you are happy...

modest bluff
#

What about docker logs havrest --since 10m 2>&1 | grep ERROR

idle snow
#

I am getting errors on some clusters: go:678 msg="Failed to collect ems data" and,
go:436 msg="" Poller=cluster collector=Rest:FRU/NetPort... error="failed to fetch data: error making request StatusCode 404, Message: API not found, Code: 3 API: /api/private/cli/system/controller/fru?fields=fru_name%2Cnode%2Cserial...

#

Errors go:601,217,678 and more

#

FYI, I updated NABox and Harvest last week

modest bluff
#

thanks, none of those would affect the NFSClients collectors.

Can you check your /etc/nabox/harvest/user/rest/custom.yaml file again and make sure there are no tabs or other syntax errors. Your /etc/nabox/harvest/active/rest/custom.yaml should not have an empty objects section. That's why the collector is not being loaded.

#

I suspect a syntax error or tabs is causing active to contain objects: {}

idle snow
#

I don't see any control characters in there

modest bluff
#

if possible, it might be worth scping the file off, and using an external tool to validate the yaml

idle snow
#

ok

modest bluff
#

@pure bluff why would the file /etc/nabox/harvest/active/rest/custom.yaml contain objects: {} even though 70tas has data in /etc/nabox/harvest/user/rest/custom.yaml

All I can think of is that the /etc/nabox/harvest/user/rest/custom.yaml file has syntax errors, but 70tas has checked that

idle snow
#

Are leading spaces required?

modest bluff
#

sometimes - best to copy/paste into a yaml validator to check or use yamllint

#

Here is a valid custom.yaml

objects:
    CIFSSession: cifs_session.yaml
    EthernetSwitchPort: ethernet_switch_port.yaml
    LunMap: lun_map.yaml
    Volume: custom_volume_flexgroup.yaml
idle snow
#

Unfortunately I don't have yamllint; I'll have to see if it is approved in EPEL. I am seeing three Grafana errors due to 127.0.0.53:53 i/o timeout; would that be an issue?

#

Ok, got yamllint installed. Seeing a warning, 'missing document start "---"', and an error, 'too many spaces after colon'. Let me work on that

modest bluff
#

try yamllint -d relaxed /etc/nabox/harvest/active/rest/custom.yaml. The "too many spaces after colon" will probably become warnings, which is fine

idle snow
#

I went through every file.
I gave a space at ^ for indented lines
a : after key, and one space and then the value

#

yamllint only warns of the missing start "---"

#

I am still only getting an active/zapi and active/zapiperf

#

@pure bluff Did you ask me about the code format?

#

yamllint -d relaxed shows nothing on either the active or user custom.yaml files

#

Ok, getting somewhere now. I rebooted. Now I have active custom.yaml files with data in all but keyperf/

#

I'm getting data... Whooppee!

modest bluff
#

missing keyperf is fine

#

sounds like the problem may have been syntax errors in the yaml files and after fixing that and restarting the container, things started working

idle snow
#

It required a reboot. The restart did not do it

#

Thanks for the help gentlemen

modest bluff
#

you're welcome! Needing a reboot sounds like a NAbox bug to me. What do you think @pure bluff?

pure bluff
#

Rebooting NAbox fixed the objects issue ?

modest bluff
#

yes

pure bluff
#

Ok probably a dc restart havrest would have fixed it as well ?

modest bluff
#

70tas mentioned a few messages above that a restart havrest did not work, only a reboot did. That's why I'm saying that it sounds like a bug

#

it sounds like the steps to reproduce would be to add syntax errors to custom.yaml files, restart the container, check that active has an empty custom.yaml, fix syntax errors, restart the container and then check that custom.yaml is fixed. 70tas is saying that after fixing the syntax errors and restarting the container that the custom.yaml will be blank until a reboot

pure bluff
#

Ok, so I managed to create the same custom.yaml :

❯ cat custom.yaml
objects: {}
#

With something like :

❯ cat ../../user/rest/custom.yaml
object:
  Qtree: qtree.yaml
  Volume: custom_volume_flexgroup.yaml

(Note object instead of objects)

modest bluff
#

yes, any syntax error in custom.yaml should do: tabs vs spaces, list instead of map, misspelling
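Those three kinds of mistakes can be caught with a rough, standard-library-only check. This is a sketch, not a YAML parser and not what NAbox does during its merge; the function name and messages are invented for the example, and it assumes the simple `objects:` plus `Name: template.yaml` layout used in this thread.

```python
def lint_custom_yaml(text: str) -> list[str]:
    """Return a list of problems; an empty list means none were spotted."""
    problems = []
    # Ignore blank lines and comment lines.
    lines = [l for l in text.splitlines()
             if l.strip() and not l.lstrip().startswith("#")]
    if not lines:
        return ["file is empty"]
    if lines[0].rstrip() != "objects:":
        problems.append(f"first key is {lines[0].rstrip()!r}, expected 'objects:'")
    for line in lines[1:]:
        if "\t" in line:
            problems.append(f"tab character in {line!r}")
        elif not line.startswith(" "):
            problems.append(f"entry not indented: {line!r}")
        elif line.lstrip().startswith("- "):
            problems.append(f"list item where a map entry is expected: {line!r}")
        elif ": " not in line.strip():
            problems.append(f"expected 'Name: template.yaml', got {line!r}")
    return problems


print(lint_custom_yaml("objects:\n  Qtree: qtree.yaml\n"))  # []
print(lint_custom_yaml("object:\n  Qtree: qtree.yaml\n"))
print(lint_custom_yaml("objects:\n\tQtree: qtree.yaml\n"))
print(lint_custom_yaml("objects:\n  - Qtree\n"))
```

A real validator such as yamllint remains the better tool when it is available; this only covers the handful of errors discussed here.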

pure bluff
#

That's handled correctly, an invalid yaml will abort merge which is fine, I just improved the logs to indicate where the problem is

modest bluff
#

and after fixing the syntax error and restarting the container - is the corrected template loaded? 70tas is saying that after fixing the syntax errors and restarting the container that the custom.yaml will be blank until a reboot, not a restart

pure bluff
#

Yes, that's where I'm stumped on this one, can't figure out how this can happen

#

on merge the active directory is completely deleted and recreated

modest bluff
#

did the support bundles from https://upload.nabox.org/nuyo-zuxu-mynu help? I noticed that active.tmp was present. The other issue I noticed was if there is an error in restperf it will prevent rest custom.yaml from merging too

pure bluff
#

not really in the sense that I only saw the ^I and tried the same, but that cleanly fails, leaving out a dangling active.tmp directory which is fine

modest bluff
#

right, it seems like some folks have had the dangling active.tmp that persists even after there are no errors. Those folks "fixed" the problem by rebooting

pure bluff
#

on a running instance (not dev environment) I "break" custom.yaml by putting custom: so I end up with a merged {} after dc restart havrest.
Once fixed, dc restart havrest again, file is correct again

pure bluff
#

Will categorize this one as cosmic 😄 and wait for it to happen again

idle snow
#

Cont... Still not getting any Qtree stats on FlexGroups. @pure bluff do I need a special configuration to collect qtree stats from FlexGroups?

pure bluff
#

That’s a harvest question, but NAbox should behave and let you configure whatever you need.

idle snow
#

Thank you. @modest bluff this is what is in the log...

#

timestamp level=INFO source=collector.go:601 msg=Collected Poller=aces collector=Rest:Qtree apiMs=1176 bytesRx=211678 calcMs=0 exportMs=1 instances=379 instancesExported=379 metrics=3032 metricsExported=758 numCalls=2 parseMs=2 pluginInstances=0 pluginMs=0 pollMs=1179 renderedBytes=119951 zBegin=1743423306730

#

timestamp level=INFO source=collector.go:601 msg=Collected Poller=aces collector=RestPerf:Qtree apiMs=188 bytesRx=148200 calcMs=0 exportMs=0 instances=134 instancesExported=48 metrics=1072 metricsExported=192 numCalls=1 numPartials=0 parseMs=2 pluginInstances=0 pluginMs=0 pollMs=188 renderedBytes=322248 skips=0 zBegin=1743423338920, and so on

modest bluff
#

hi @idle snow those two log lines tell us that the Rest:Qtree and RestPerf:Qtree collectors are running, collecting, and exporting metrics. That's good. If you go to the Qtree dashboard do you see data there? If so, if you select the Volume drop down at the top of the screen, do you see any flexgroup volumes there?

idle snow
#

No and no

#

I take that back. No and Yes, but if I select a volume or qtree I still do not see anything

modest bluff
#

There are two rows in that dashboard, Highlights and Usage. When you select a Volume are the panels in both rows empty or is there data in some of the panels?

idle snow
#

Usage has data, Highlights is blank

#

They actually say "No data"

modest bluff
#

let's check that VictoriaMetrics has the metrics you expect to see. If you go to https://$ip/vm/vmui/ replacing $ip with your nabox ip and type qtree_labels and press Enter, do you see the flexgroup in the list?

idle snow
#

Sorry, I'm back

#

Yes, however, at the top is "Showing 20 series out of 731 series due to performance reasons"

modest bluff
#

that message is ok. In VictoriaMetrics, what if you query for qtree_total_ops for one of the flexgroup volumes by typing something like this, replacing $fg with the name of your flexgroup: qtree_total_ops{volume="$fg"} Do you get results for that flexgroup?
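If the UI is awkward to use, the same selector could in principle be sent to the Prometheus-compatible instant-query endpoint (`/api/v1/query`) that VictoriaMetrics provides. The sketch below only builds the URL. It assumes, without confirmation from this thread, that NAbox exposes that endpoint under the same `/vm` prefix as the vmui page; the host `nabox.example` and the volume `fg_data` are placeholders.

```python
from urllib.parse import urlencode


def instant_query_url(base_url: str, flexgroup: str) -> str:
    """Build an instant-query URL for qtree_total_ops on one volume.

    base_url is assumed to be https://<nabox ip>/vm (unverified assumption).
    """
    query = f'qtree_total_ops{{volume="{flexgroup}"}}'
    params = urlencode({"query": query})  # percent-encodes { } = and "
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


url = instant_query_url("https://nabox.example/vm", "fg_data")
print(url)
```

An empty result list for a FlexGroup volume would point at the metric not being exported for it, rather than at a dashboard problem.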

idle snow
#

Ladies and Gents, trying to enable EMS collector, but can't seem to find the HOWTO. Can someone help?

modest bluff
#

hi @idle snow have you tried clicking this toggle in the settings for your cluster in nabox?

idle snow
#

Duh! Don't know, let me check. Thank you

#

Actually, it was already enabled

#

They are all coming back with "No data"

idle snow
#

Ok, did rest and restperf, do I also need to do the zapis'? I'm running 9.14.1P12

#

I'm going to let it go overnight and see if it has collected by morning

idle snow
#

Ok, I got my FG's running. But I'm still not able to get any Health reports.