#┊・harvest-nabox🔒
Thanks @loud ocean
Really glad you’re all here! I know there are lots of folks excited about harvest
Is this for the harvester for Grafana?
it's for the open source version of Harvest https://github.com/NetApp/harvest
Nice! We'd been holding out for a new release of Harvest at my old job... never got to upgrade though. But what the heck I can run it at home now.
If I have neither running yet... Prometheus or InfluxDB?
(looks like the influxdb exporter is a bit less work.)
@dusty lance we suggest using Prometheus, which has more default Harvest dashboards than InfluxDB.
yep, both work. As @fossil bane said, there are a few more dashboards for Prometheus. Also, for what it's worth, the strong majority of Harvest customers seem to have settled on Prometheus over InfluxDB
Cool, I'll do that, then. At $oldjob we were going to do Influx because our data engineering team was starting to use it for other things. But really as a self-contained thing ("the Harvest box") I suppose that doesn't matter.
Hi all, I'm planning to move from harvest 1.6 to 2.0. Regarding the prometheus disk requirements I've seen that wiki page https://github.com/NetApp/harvest/wiki/FAQ#sizing
In order to estimate disk usage, can I start from the metrics/min data that comes from the "netapp detail: harvest poller" dashboard in the 1.6 installation? Can it be converted to ingested_samples_per_second?
Thanks!
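A rough sketch of the sizing arithmetic being asked about, using the formula from the Prometheus storage documentation (needed_disk_space = retention_time_seconds * ingested_samples_per_second * bytes_per_sample). It assumes, without confirmation from anyone in this channel, that one Harvest 1.6 "metric" per minute corresponds to one Prometheus sample per minute. The function names and input numbers are made up for illustration.

```python
# Assumption: the metrics/min figure from the Harvest 1.6 poller dashboard
# counts the same data points that Prometheus would ingest as samples.
SECONDS_PER_MINUTE = 60
SECONDS_PER_DAY = 86_400


def samples_per_second(metrics_per_minute: float) -> float:
    """Convert a per-minute metric count into a per-second sample rate."""
    return metrics_per_minute / SECONDS_PER_MINUTE


def estimate_disk_bytes(metrics_per_minute: float,
                        retention_days: float,
                        bytes_per_sample: float = 2.0) -> float:
    """Prometheus sizing formula: retention * ingest rate * bytes per sample.

    bytes_per_sample defaults to 2, the upper end of the 1-2 bytes per
    sample that the Prometheus documentation quotes as typical.
    """
    retention_seconds = retention_days * SECONDS_PER_DAY
    return retention_seconds * samples_per_second(metrics_per_minute) * bytes_per_sample


if __name__ == "__main__":
    # Hypothetical poller: 600,000 metrics/min, kept for one year.
    total = estimate_disk_bytes(600_000, retention_days=365)
    print(f"{total / 1e9:.1f} GB")
```

The endpoint-based approach suggested in the FAQ is still the more reliable input, since the 2.0 templates may export a different set of metrics than 1.6 did.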
Hi all, I have a question about the statistics node_disk_data_read and hostadapter_bytes_read. What do they mean?
https://community.netapp.com/t5/ONTAP-Discussions/In-harvest-what-is-the-difference-between-node-disk-data-read-and-hostadapter/m-p/437123#M41017
@ocean kite You can get information about hostadapter counters via the CLI command below
bin/zapi -p POLLERNAME show counters --object hostadapter
Node-level counters are also available through the CLI command below
bin/zapi -p POLLERNAME show counters --object system:node
From above commands:
node_disk_data_read is "Disk KB read per second"
hostadapter_bytes_read is "Bytes reads via Host Adapter"
@tough kestrel As mentioned in the guide, you should start with the Harvest exporter endpoint that Prometheus is scraping, which you can query with curl. It shows all the metrics per poller that Prometheus will scrape at each scrape interval.
@fossil bane In order to curl the endpoint, I have to do the following, right?
By default, the Harvest pollers are part of the backend network and do not expose their Prometheus web end-points. If you want their end-points exposed, pass the --port flag to the generate sub-command
@tough kestrel Yes, for a Docker installation you would need to generate the compose file with the --port flag to expose those poller endpoints.
Hi, sorry if these are dumb questions. What is the suggested way to customize Grafana? I'm trying with environment variables, but it seems they're ignored. I would like to have Grafana listening via HTTPS and then using LDAP authentication.
Thanks
@tough kestrel for HTTPS, it is mostly the ini file configuration that is needed. See if this link helps: https://community.grafana.com/t/grafana-https-configuration/524 . I have not tried an LDAP setup in Grafana, but I found this link, which has information about it: https://grafana.com/docs/grafana/latest/setup-grafana/configure-security/configure-authentication/ldap/
Thanks a lot for those clarifications!
Hello, @fossil bane
could you please better explain which data should be gathered from the endpoint? Here's what came from one of my pollers:
Below is the list of metrics provided by my collectors and plugins.
Exposing data from 2 collectors and 35 objects, 429 metrics in total.
It seems another option to get ingested_samples_per_second is to query prometheus itself. How can I access the GUI?
Thank you
Hello, in the containerized environment, is prom-stack.yml overwritten when I generate a new compose file (for instance, when I add new pollers)?
If I make some customizations for Grafana/Prometheus in that file, how can I make them persistent?
@tough kestrel Yes, prom-stack.yaml gets regenerated to set promPort and GrafanaPort in the file
If you want your customizations to survive regeneration, you can pass a different prom-stack file to the docker-compose command, like below:
docker-compose -f custom-prom-stack.yml -f harvest-compose.yml up -d --remove-orphans
Hi gang!
First of all, Thanks for this wonderful project!
I’ve got a question. I have been tweaking Harvest to measure the QoS Limit “DELAY_CENTER_QOS_LIMIT: throttle”. I uncommented the 4 workload objects at the bottom of /conf/zapiperf/default.yaml and ran docker-compose again, but I don’t see the counter or the workload objects in Grafana. Can someone guide me on what I am missing? Thanks!
Nevermind. The culprit was the improper privilege setting in ONTAP. Thanks, gang 🙂
Glad you were able to solve it @agile oracle! I’ll make sure one of the Harvest folks comes along and sees 🙂
No need for that. I've misconfigured the ONTAP on my side. Thanks and always enjoying your clips @lyric mulch 😉
Hi, Could you share the poller logs.
Where are they located?
Are you using nabox?
Yes.
https://10.216.33.135/grafana/d/ZPvQPGiVk/netapp-detail-volume-details?orgId=1&refresh=1m I'm not seeing anything here.
nabox has highlighted log steps here https://nabox.org/documentation/troubleshooting/
Are you able to access prometheus?
See if there is any metric named qos_latency.
OK, share the logs with us. We'll take a look there
@sturdy crest As discussed, default.yaml still had the workload objects commented out, hence there were no workload metrics.
Mmmm, shouldn't the big fat red warning on that page be fixed ? 😄
I'll have to check the clock...
But the metrics are working now.
On the bright side I did find a bug in the volume dashboard.
@sturdy crest What's that bug?
The volume dashboard references qos_detail_volume_resource_latency but that doesn't exist.
The ones with no data haven't been fixed yet.
We have these metrics also. They come from this template https://github.com/NetApp/harvest/blob/main/conf/zapiperf/cdot/9.8.0/workload_detail_volume.yaml
Hmm. I mean workload_detail_volume is definitely an object in ONTAP CCMA.
yes all volumes of workload-class as autovolume are tracked under this template in harvest
Weird I don't have it in prometheus.
Does it come from the ONTAP side, or should it at least exist in prometheus?
You may not have any volume matching autovolume. See the response of below zapi request
`<?xml version="1.0" encoding="UTF-8"?>
<netapp xmlns="http://www.netapp.com/filer/admin" version="1.160">
<qos-workload-get-iter>
<query>
<qos-workload-info>
<workload-class>autovolume</workload-class>
</qos-workload-info>
</query>
</qos-workload-get-iter>
</netapp>`
Also logs should show if any instances were found for this template
Hmm, ok.
For me , I have the relevant data
I wonder how to check that. Would I need to use ZExplore?
Zoom tool or harvest cli would help
bin/zapi -p POLLERNAME show data --api qos-workload-get-iter
You should have CLI access if you're on VPN. I think SSH works.
I have checked your machine and it still has WorkloadDetailVolume disabled
You are welcome!
Still not seeing it???
takes around 5 minutes for first poll
awesome!
Now if we can just figure out how to build a dashboard similar to delay center view that'd be amazing.
let us know your requirement via Github. You can add any reference from PAS and we can take a look
It was that Github ticket you responded on.
Hang on I'll find the PA link for my vSIM once it loads.
sure you can dm me the details of PAS instance
Hi gang, is there a way to check the max throughput and min throughput of the QoS policy-group with Harvest? I see the qos_detail_ops and the other QoS related metrics but I couldn't figure out how to get those figures.
@agile oracle Does the output of the CLI command below give you the required data? If yes, then it should just be a matter of creating a template for this object in Harvest.
bin/zapi -p POLLERNAME show data --api qos-policy-group-get-iter
When it comes to enabling Workloads and QoS counters, can it be done in custom.yaml instead of altering default.yaml ?
Yes
Ok, so I have this :
nabox-api:/opt/harvest2-conf/conf/zapiperf# cat custom.yaml
collector: ZapiPerf
objects:
  Volume: custom_volume_blacklist.yaml
  Workload: workload.yaml
  WorkloadDetail: workload_detail.yaml
  WorkloadVolume: workload_volume.yaml
  WorkloadDetailVolume: workload_detail_volume.yaml
And not getting much in Top Volume End-to-End QoS Drilldown
or... I just have to be patient
All good thanks !!
Yes, workloads are polled every 3 minutes by default, unlike other perf objects, which are polled every 1 min
which means frequency * 2 or six minutes before Harvest will export metrics for these
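The "frequency * 2" rule above can be written out as simple arithmetic. This is only an illustration; the function name is hypothetical and not part of Harvest, and the two-poll requirement is taken from the message above (presumably because rate counters need two samples to compute a delta).

```python
def minutes_until_first_export(poll_interval_minutes: float,
                               polls_needed: int = 2) -> float:
    """Time before the first metrics appear, per the 'frequency * 2' rule."""
    return poll_interval_minutes * polls_needed


print(minutes_until_first_export(3))  # workload objects, 3-minute default
print(minutes_until_first_export(1))  # other perf objects, 1-minute default
```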
Ok, just so you know, the next version of NAbox will have workload/QoS turned on by default
Thanks @fossil bane I've sorted it out. Appreciated!
Hi all, I've got a question about the multi-tenancy which is documented in the Harvest FAQ (https://github.com/NetApp/harvest/wiki/FAQ#multi-tenancy). Does anybody have an example for that, or is this already in use? I'm interested but do not have any clue how to start.
👋🏻 I see we moved over from Slack :). I keep seeing some outliers in Harvest 2.0 that I used to manage with latency_io_reqd in 1.1x. I have volumes that do very little IO and see crazy latency figures when I'm convinced this is not the case. Is there any way of using a similar parameter to stop the false latency from displaying?
@uneven pumice Could you provide information about the counter name in template https://github.com/NetApp/harvest/blob/main/conf/zapiperf/cdot/9.8.0/volume.yaml having this issue?
I am seeing it reporting under avg_latency yet when I check max write_latency or read_latency the results are completely different. This shows bad results topk($TopResources, volume_avg_latency{datacenter="$Datacenter",cluster="$Cluster",svm=~"$SVM",volume=~"$Volume"})
topk($TopResources, volume_read_latency{datacenter="$Datacenter",cluster="$Cluster",svm=~"$SVM",volume=~"$Volume"})/1000 this or volume_write_latency gives me good results
I think I may see the issue actually
The default query did not have the /1000 on the end
Let me check
Dividing by 1000 is used to display values in ms for tables. ONTAP returns this counter in microseconds. Grafana charts take care of the unit depending on the values received. Also, the table will show the last value only. The value from the table should match the last value from the graph.
I think you are comparing 2 different counters... volume_avg_latency and volume_read_latency both are different counters.
hi @uneven pumice not sure which version of Harvest you are running, but in addition to what Rahul has pointed out, there was a bug described https://github.com/NetApp/harvest/issues/1175#issuecomment-1198327824 that affected latency calculations. This bug was fixed https://github.com/NetApp/harvest/pull/1154 about 29 days ago. Can you confirm which version of Harvest you are using with harvest version ? If you want to try these out before the next release, you can grab the latest https://github.com/NetApp/harvest/releases/tag/nightly build.
ooo that looks like it could be it
@loud ocean , @young steeple - good afternoon - we had a VMware vCenter outage that required ESXi host reboots and now our nabox instance is being assigned the wrong IP Address within VMware vCenter.
Can you please help me in assigning the correct IP address?
I’m guessing the IP address is assigned by the DHCP server ?
Followup question : can you connect to the web interface on this IP address ?
@young steeple - nope, web interface doesn't connect
Where does this IP come from ?
What address are you getting, and what address are you expecting? Mask the last two octets if you want.
ifconfig -a shows: 172.18.0.1 - it's supposed to be 172.24.x.x
Well that’s docker IP
ifconfig -a for the docker0 shows 172.17.0.1
What about enxxx ?
I don't see an enxxx - rather br-0fcff... (first entry in ifconfig -a output) and that's current set to 172.18.0.1
That’s a problem. You do have a network interface attached to the VM ?
What’s in /etc/network/interfaces ?
cat /etc/network/interfaces
auto lo
iface lo inet loopback
That hardly makes any sense 😀 are you sure that’s NAbox ?
yes sir - VMware vCenter shows the following network interface is connected:
VMNetwork-xxx (connected) | 00:50:56:b7:e1:bf
What type of interface is it ? Vmxnet3 ?
so ifconfig eth0 doesn't return anything ?
Returns this:
static
This is what /etc/network/interfaces should look like :
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).
# The loopback network interface
auto lo
iface lo inet loopback
# The primary network interface
auto eth0
iface eth0 inet static
address %s
netmask %s
gateway %s
The Harvest team is happy to announce the release of 22.08 https://github.com/NetApp/harvest/releases/tag/v22.08.0
Highlights of this major release include:
- an ONTAP event management system (EMS) events collector with 64 events out-of-the-box
- Two new dashboards added in this release:
  - Headroom dashboard
  - Quota dashboard
- We've made lots of improvements to the REST Perf collector. The REST Perf collector should be considered early-access as we continue to improve it. This feature requires ONTAP versions 9.11.1 and higher.
- New max plugin that creates new metrics from the maximum of existing metrics by label.
- New compute_metric plugin that creates new metrics by combining existing metrics with mathematical operations.
- 48 feature, 45 bug fixes, and 11 documentation commits this release
I have no explanation on how this file has been reset to the current state though
before changing it, if it's not too late, can you check its modification time ?
I've deployed 4 total nabox instances in our environment and I compared it against one of the other running ones and yes, it looks like what you said - this aligns
March 29 - am I reading that correctly?
Yes
Any idea of the uptime before it was reset?
I'd have to guess at least 50, if not 100 days
so that puts it after March 29
Is it possible you somehow configured eth0 manually in the CLI, figured it seemed to be working, but the config files were never updated ?
nope - I've never manually configured eth0 in the CLI
I've also rebooted the VM several times - also did a VM power cycle
Ok you can overwrite /etc/network/interfaces or get nabox-api internal docker IP and issue the proper API call, but editing the file is simpler
sure, I'll try modifying /etc/network/interfaces - I'll copy the file from another one of the nabox OVA's I deployed and adjust the IPs, netmask, etc.
Just for the sake of nerding, you can use the internal API to reset network config :
curl -X POST -uadmin:Netapp01 -H "Content-type: application/json" -d '{
"hostname": "nabox",
"ip": {
"dns": [
"192.168.0.100"
],
"domain": "company.com",
"gateway": "192.168.0.1",
"ip_address": "192.168.0.100",
"netmask": "255.255.255.0"
},
"use_dhcp": false
}' http://`docker inspect nabox-api|jq -r '.[0].NetworkSettings.Networks["docker-compose_default"].IPAddress'`:5000/api/1.0/system/network-config
manually changed the IP address and rebooted and it's still not allowing me to SSH to the box
Has the file been reset somehow ?
No - its persistent across reboots
Our network team currently has ICMP disabled throughout the environment
SSH to the NAbox failed...
Are you positive gateway and ip are properly set ?
IP = yes, gateway = I don't know - checking internally with our network team
@young steeple - this is what I copied and adjusted from another nabox instance
# The loopback network interface
auto lo
iface lo inet loopback
# The primary network interface
auto eth0
iface eth0 inet static
address 172.24.x.x
netmask 255.255.255.0
gateway 172.24.x.x
Looking good. How about ip a s eth0 ?
I had an issue with the default gateway - I corrected it and it's still not connecting via SSH
@young steeple - are you available for a quick Zoom session?
Not right now but give me an hour
okie - I'll try to vMotion the VM to another host...
Not sure if I can help though, looks like IP misconfiguration but we’ll confirm.
after moving to another host?
no - I fixed the incorrect default gateway in /etc/network/interfaces and I guess it took some time to propagate, maybe?
I'm able to hit both the admin interface and Grafana dashboards successfully
also SSH is functioning perfectly to the OVA
the nice thing about Grafana, is we can tell exactly when the VMware vCenter issue started 2 nights ago because the metrics immediately dropped! lol - now onto root cause for this SEV-1 the other night...
I forgot to mention you needed to reboot or issue service networking restart
You say vcenter issue but is that really vcenter ? Like a dvSwitch issue ?
I rebooted directly from the CLI
Could very well be a dvSwitch issue within VMware vCenter - VMware and Network team have to confirm the configuration and best practices
^ Congrats on the new release 🙂
Look forward to using this next week in production 😃
Same 🙂
Well...on my vSIM lol.
Seems to be working fine in NAbox, don't forget to Reset dashboards folks.
Hi @young steeple I can confirm that the import works well but the "old" NetApp dashboards are still around and need to be deleted manually.
When you say "old" you mean in the same folder as the new ones ?
Correct in the Harvest - cDOT folder
Some dashboards get deprecated ?
By default new release dashboards are imported in new folders to avoid overwriting of old dashboards in case someone has customisation there
In NAbox, the dashboard folders are always the same, and import is manual, to keep it clean in Grafana
Oh I think I see... @fossil bane did you guys rename the dashboards to "ONTAP: *" ?
Yep there was a renaming https://github.com/NetApp/harvest/pull/1080
ok that's why. I do overwrite dashboards, but if the dashboard name changes it won't be overwritten. I should probably empty the folder when importing dashboards with overwrite
Ideally I should use dashboard provisioning feature in Grafana to read it from disk so they're immutable and reflect what's in the folder dynamically
Is there a reason you don't use tags on stock dashboards ?
No reason as such. Can be used.
Would be nice to have to quickly identify default dashboards
I'll open an issue on github and see what people think, or see if there is one already
The new harvest release makes a very good first impression. It matured really well since the first release 21.05.0. Keep up the good work!
Hi guys, two more questions about Harvest 🙈 In the Quota dashboard I only see three entries, but since we have many quotas, some are missing. Does Harvest require some specific setting on the SVM to collect the data? The second question is about the Headroom dashboard. Is there something like a guide which explains what the specific panels are showing?
@tight iron With 22.08, Harvest shows the same quotas as System Manager. Do you see these quotas in System Manager? Here is the code that filters the quotas. It currently lives in a plugin and cannot be configured: https://github.com/NetApp/harvest/blob/main/cmd/collectors/zapi/plugins/qtree/qtree.go#L190-L204
You can run the commands below to see the descriptions of the headroom counters used in the dashboards
bin/zapi -p u2 show counters --object resource_headroom_aggr
bin/zapi -p u2 show counters --object resource_headroom_cpu
Hi @fossil bane thank you for your reply. Yes, in System Manager I see the quotas under the Reports tab. Could it be because we are using the default policy instead of a custom one?
@tight iron Could you confirm if System manager shows different number of quotas under Quota Report than harvest? Also what is your cluster version?
Yes, I see default quotas are excluded in system manager. This is the rest call system manager makes
/api/storage/quota/reports?return_timeout=120&max_records=200&fields=type%2Cvolume%2Csvm%2Cqtree%2Cusers%2Cgroup%2Cspace%2Cfiles&show_default_records=false&return_unmatched_nested_array_objects=true
@tight iron good catch on CIFS latency. Fixed in https://github.com/NetApp/harvest/pull/1221/files if you want to make the change or the next nightly build will be published in 30 minutes or so and it will have the change. Thanks for reporting! https://github.com/NetApp/harvest/releases/tag/nightly
We are currently on ONTAP 9.9.1 P9. Interestingly, in System Manager I see all the quotas (should be around 2351 for only one SVM) but only 3 in Prometheus
thanks, we'll double check 9.9.1 P9 in the meantime can you hit the rest endpoint with curl or Harvest's bin/rest and let's make sure the same number is returned? Something like this should do the trick curl --insecure --user admin:pass 'https://10.193.48.11/api/storage/quota/reports?return_timeout=120&fields=type%2Cvolume%2Csvm%2Cqtree%2Cusers%2Cgroup%2Cspace%2Cfiles&show_default_records=false&return_unmatched_nested_array_objects=true'
with the curl command I receive 2350 records on only one cluster
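For anyone repeating this check, here is a minimal sketch of counting the records in a saved response body. It assumes the response is JSON with a top-level "records" array, which is how ONTAP REST collection responses are generally shaped; the sample payload below is made up and heavily trimmed, and the function name is hypothetical.

```python
import json


def count_quota_records(response_body: str) -> int:
    """Count the entries in the 'records' array of a REST collection response."""
    payload = json.loads(response_body)
    # A missing 'records' key is treated as an empty result.
    return len(payload.get("records", []))


if __name__ == "__main__":
    # Made-up response body for illustration only.
    sample = json.dumps({
        "records": [
            {"type": "tree", "svm": {"name": "svm1"}, "volume": {"name": "vol1"}},
            {"type": "user", "svm": {"name": "svm1"}, "volume": {"name": "vol2"}},
        ],
        "num_records": 2,
    })
    print(count_quota_records(sample))
```

Redirecting the curl output to a file and passing its contents to count_quota_records gives a number to compare against the dashboard.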
thanks. And how many does the ONTAP: Quota dashboard Reports table have? If you want to check Prometheus instead of Grafana, use the metric qtree_disk_limit. The qtree and quota metrics should be better named. We considered changing the metric names for 22.08 but didn't want to break customers already using these metrics, e.g. qtree_labels are qtrees, while qtree_disk_limit are quotas. We're going to deprecate the misnamed ones and fix them in the next release
Over the whole farm (19 Clusters) I only see 3 entries
can you check your poller log for any errors? maybe there is a timeout retrieving the quotas. feel free to shoot the log file our way https://github.com/NetApp/harvest/wiki/FAQ#how-do-i-share-sensitive-log-files-with-netapp
@young steeple is it possible to show logs for a specific poller or would you say just use grep?
it's possible for a single poller, examples here https://nabox.org/documentation/troubleshooting/
ahhh there is a plugin error - duplicated instance key
now we're getting somewhere 🙂
can you share the log file @tight iron so we can track down the duplicate instance key?
Sure just struggling how to collect only the logs for one poller 😅
maybe something like this would work? dc logs NAME_OF_POLLER > log.txt
where NAME_OF_POLLER will be from the name column when running dc ps
mail is sent, hope this is enough
thanks @tight iron that did the trick. @fossil bane found the problem and is working on a fix. We'll post when it is fixed and hits nightly
@loud ocean my Volume Details look like this rn. A lot of points.
hi @viscid agate we improved that in the most recent release (22.08) - the issue was topk was not filtering enough data so you got more series than the five you asked for. Can you try with https://github.com/NetApp/harvest/releases/tag/v22.08.0
Will try and give feedback in the next couple days
@tight iron quota fix is now available in nightly build https://github.com/NetApp/harvest/releases/tag/nightly let us know if it fixes the issue.
I will test it tomorrow
@young steeple I have a weird issue where I can't upgrade packages with FF on my lab Harvest.
Chrome works fine.
Interesting, where does it fail ?
It browses for the file, and it will say sometimes "uploaded successfully" and sometimes it times out.
what is the current version?
I just did it in Chrome, but latest.
NAbox 3.1.1 (2022-06-08) - Alpine Linux 3.14.2
Grafana 8.4.6
Graphite 1.2.0-dev
NetApp Harvest 22.08.0-1
NetApp NMSDK 9.8P4
Prometheus 2.36
you upgraded to beta, or you had a version prior to that one before the upgrade ?
I installed this when nabox was in beta 3 yeah.
so you downgraded ?
No...I mean this isn't a new NAbox.
Task failed successfully.
lol
before the end or at the end ?
Before the end.
Ok, I think I changed something a while back to fix an issue with FF, maybe that was it. I'm upgrading 3.1.1 to 3.1.2b3 now and it seems fine
Hmm ok.
If you get a chance you can try to reapply the update you just did with chrome
Yeah Chrome worked.
Hi Chris, just loaded the nightly release and the average latency is now there, thank you very much for that. Already needed to troubleshoot a vscan issue 🙂
Hi @fossil bane I also looked at the Quota dashboard and as far as I see, now all quotas are reported
Thanks @tight iron for the confirmation. As always Thanks for making Harvest better 😀
If i am allowed.
It is a lil harder to follow here in discord.
Is there something like the Threads in Slack in Discord?
Yes there is a thread option .
I have installed 22.08 of Harvest and I like it a lot.
The single points seem to be gone too.
How 😄
Hi all, is it intentional that the units in the volume - details dashboard / Per volume Space Used panel are set to bytes(SI) instead of bytes(IEC)? This is a bit confusing, e.g. our volumes are 24TB if we do a vol show, and also in vCenter, but in the dashboard they are 26.4TB
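The gap between the two figures is consistent with a binary-versus-decimal unit difference. A quick check, assuming vol show and vCenter report binary units (TiB) while the panel renders decimal terabytes:

```python
TIB = 2 ** 40   # IEC tebibyte, in bytes
TB = 10 ** 12   # SI terabyte, in bytes


def tib_to_tb(tib: float) -> float:
    """Express a size given in TiB as decimal terabytes."""
    return tib * TIB / TB


print(round(tib_to_tb(24), 1))
```

So the same number of bytes reads as 24 in IEC units and roughly 26.4 in SI units.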
Hi everyone.
I have a question regarding metrics exported by Harvest. What is the best way to find the description of a metric? For example, there is the "node_disk_data_written" metric. It is used in one of the default Grafana dashboards as a source for disk latency. But to me it seems that the dashboard shows unrealistic latency values, near 200ms. So I want to dive deeper and check the values on the NetApp cluster itself.
@sour gale Most Harvest metrics are prefixed by object. Let's take the metric you mentioned, node_disk_data_written. After removing the object prefix node, we get disk_data_written. If you search for this string in our GitHub repo or codebase, that should take you to the template mentioned here: https://github.com/NetApp/harvest/blob/main/conf/zapiperf/cdot/9.8.0/system_node.yaml#L24. From the template, you'll find the object name (it's the query field in the template): https://github.com/NetApp/harvest/blob/main/conf/zapiperf/cdot/9.8.0/system_node.yaml#L3 . Now you can run the command below to learn more about the counters for that object.
bin/zapi -p POLLERNAME show counters --object system:node
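The first step of that lookup (dropping the object prefix to get the string to search for) can be sketched as below. The helper name is hypothetical and this is plain string handling, not anything Harvest ships.

```python
def strip_object_prefix(metric: str, obj: str) -> str:
    """Remove the leading '<object>_' from a Harvest metric name."""
    prefix = obj + "_"
    if not metric.startswith(prefix):
        raise ValueError(f"{metric!r} does not start with {prefix!r}")
    return metric[len(prefix):]


# The resulting string is what you would search for in the templates.
print(strip_object_prefix("node_disk_data_written", "node"))
```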
thanks. it was what I looked for
Hi all, I've got a question about volume_avg_latency. As far as I can see, this is not the average of read and write. Can you specify what this metric is? I'm asking because we have a backup volume which has 10ms volume_avg_latency but 0ms read or write
@sturdy crest Thank you for the link, will be added to my OneNote. I'm just wondering how a backup volume with no IOPs or throughput can have 10ms latency on it
Are they CIFS/NFS IOPS or how are backups configured?
just did a statistics collection and there is no CIFS/NFS IOPs traffic on the volume. It is a backup volume which has a snapmirror relation to the productive one but currently there is no job running
could be related with this https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Software/ONTAP_OS/What_is_the_difference_in_latency_of_the_same_volume_between_statistics_volume_show_and_qos_statistics_volume_latency_show_commands
Maybe, but we need more data. These would be dblade IOPS. It could be some client accessing the file system generating the load.
If you do a qos statistics volume latency show, does it give a delay center for that volume?
And check qos statistics volume performance show to make sure that volume's latency is counted at the qos statistics level.
The junction-active state is false for that specific volume and all the values of qos statistics volume show are 0ms. But if I do a statistics volume show I see the Total OPS and latency, which match Harvest
what you're describing reminded me of this discussion https://github.com/NetApp/harvest/discussions/768#discussioncomment-2487882 - at least the snapmirror reads not being tracked in volume counters part. Perhaps the discrepancy you're seeing is for related reasons? Thanks for verifying that the CLI and Harvest match. That makes sense, we aren't doing anything special with the avg_latency counter
Thanks for your feedback @loud ocean. This just came up because I'm currently creating a custom dashboard for our SPOC, where we have the top 5 IOPS, throughput and latency from all volumes of the environment, and then I saw we have a backup volume with an almost flat line at 10ms
If you want I can open a case tomorrow for that
Are you in EMEA?
Just trigger perf archives for a problem time frame. I don't need a case.
Yes I'm in EMEA
Ok. If you have access to trigger now please do, otherwise hit me up tomorrow.
I'm in Kansas, so I'm a few hours behind.
getting this error in Harvest on a couple of our clustered NA systems....trying to track it down.
oller_XXXYYY.log:{"level":"error","Poller":"XXXYYY","plugin":"Zapi:Volume","object":"Volume","error":"duplicate instance key => ","relationshipId":"","caller":"goharvest2/cmd/collectors/zapi/plugins/volume/volume.go:172","time":"2022-08-24T12:21:53-07:00","message":"Failed to create snapmirror cache instance"}
this is working on other clustered systems in the harvest config but this one is failing on 3 of 5 in one location.
we recently did volume moves in those clusters, so I would expect that this could be a reason for it but unsure of how to fix it
harvest version 22.05.0-1 (commit 2bc2942) (build date 2022-05-11T07:56:11-0400) linux/amd64
suggestions on where I should post this if this is not the appropriate place, I saw that some were describing github harvest, which isn't nabox
You're in the right place.
Looks like https://github.com/NetApp/harvest/issues/992 but your version should be fixed.
Snapmirror issue
Hi @sturdy crest the perf archive was triggered
Ok what's the serial number?
And volume name?
If you don't want to broadcast you can DM me.
Ok this is weird.
you mean an interesting/strange behavior 😅
Yeah. I'm bamboozled.
I guess go ahead and open a case.
Hmm. It's not coming from outside WAFL. I don't see it in the spinhi counters at all.
Nevermind, found it.
Dedupe.
That's a counter manager version of that command.
Hi all, does anyone have by chance a prometheus query to filter volumes based on the node model type? For example based on the "issue" above I like to filter every volume out which is on a FAS system and only show volumes on AFFs with the metric volume_avg_latency
You can try this volume_labels * on(node) group_left(model) node_labels
Thank you for that 🙂 so now I have the volumes filtered based on the model. Do I need to create a new query for the volume_avg_latency or is it possible to have this in one?
I think you can just replace volume_labels with volume_avg_latency, See if below helps. If you want to add some more data from volume_labels then we can do further joins to add more results to it.
volume_avg_latency * on(node) group_left(model) node_labels
@tight iron did I help you ok? Or did you have any questions?
Hi @sturdy crest I see that there is the SIS scan running. Since I'm in the military next week I will open a case after that
Nah no need for a case now.
I explained it...
If the scan isn't hurting anything you could leave it.
I see. I'm now trying to write a query which takes the top 5 (or TopResources) only from the AFF systems. Since we only use FAS systems for backup or SnapLock, this makes more sense for us 🙂
NAbox 3.1.2 is officially the latest and least worst version you can get !
Cool thank you @young steeple will test it after I'm back 🙂
Hi gang I wanna ask for quick help. I am trying to get just numeric results from the "max_throughput" that I get from the qos_label. So max_throughput from qos_label returns 4000IOPS for example and I only want the 4000. Tried the Value mappings from the Grafana but this won't change the data type to numeric from the string so it's not gonna work for me. At the moment I am trying to work with the plugin LabelAgent in the template but I am not sure I can get it done with this plugin. Any idea would be appreciated!
<qos_labels Result from the prometheus>
qos_labels{cluster="nas", datacenter="Test", instance="nabox-harvest2:12991", job="harvest2", max_throughput="4000IOPS", min_throughput="0", num_workloads="1", policy_group="file-03f22cec-99562eca1d73-wid13042", svm="nas_fbrsvm"}
<Template>
name: QoSLimit
query: qos-policy-group-get-iter
object: qos
collect_only_labels: true
counters:
qos-policy-group-info:
- ^policy-group => policy_group
- ^^vserver => svm
- ^^pgid
- ^policy-group-class => policy_group_class
- ^max-throughput => max_throughput
- ^min-throughput => min_throughput
- ^num-workloads => num_workloads
plugins:
- LabelAgent:
value_to_num:- status status up 0
split: - max_throughput 'IOPS' ,max_num,placeholder
- status status up 0
export_options:
instance_labels:
- svm
- max_throughput
- min_throughput
- num_workloads
instance_keys:
- policy_group
qos_labels iops
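For the "4000IOPS" to 4000 question above, the string handling itself is small. Here is a minimal Python sketch that runs outside Harvest, assuming the label is always digits followed by an optional unit such as IOPS or MB/s. The helper name split_throughput is made up for illustration and is not a Harvest or LabelAgent API:

```python
import re

# Hypothetical helper, not part of Harvest: splits a QoS throughput label
# such as "4000IOPS" into a numeric value and a unit string.
_THROUGHPUT = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([A-Za-z/]*)\s*$")


def split_throughput(label: str):
    match = _THROUGHPUT.match(label)
    if match is None:
        return None  # label does not start with a number, e.g. "INF"
    value, unit = match.groups()
    return float(value), unit


print(split_throughput("4000IOPS"))
print(split_throughput("0"))
```

Whether the same split can be done inside the template depends on the LabelAgent rules, so treat this only as a way to check the expected result.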
Hey team. Collection of QoS policy requires QoS/Workloads collection ? Anything else ? I've got a user getting No Data I'm wondering if that's just the workload collection that's necessary
Could you ask user to share harvest logs with us.
Hi Team, I have request from one of our customers if there is any documentation on metrics collected and what it means?
His question is about qos metrics and below a question from him.
"what exactly does something like qos_volume_ops represent? How can that be used for QOS performance tracking? How is the value different from “normal” volume read and writes? The same question is for all of the QOS metrics."
Hi Team, I am looking to enable SNMP requests for NAbox to monitor the resources of the VM.
Is it possible to add a switch to enable and configure SNMP for NAbox, or at least to have it preinstalled? I can configure it myself at the CLI.
hi @glacial tree ONTAP's documentation for performance metrics is sparse. Generally the recommendation is to use the ONTAP provided metadata - for example, bin/zapi --poller aff-250 show counters --object workload_volume | less will query the cluster named aff-250 for all the performance counter metadata associated with the workload_volume object. If you look at the Harvest template conf/zapiperf/cdot/9.8.0/workload_volume.yaml you can see that template queries the ONTAP object workload_volume and exports those metrics as qos_volume - in this case, qos_volume_ops is a rate (per second) of the number of operations that completed for a workload
Hello! Since I updated to NAbox 3.1.2 & Harvest 22.08 I don't see the storage nodes & storage shelves on the power dashboard anymore. Does anyone else have this problem?
Hi gang.
I am trying to make the intervals shorter than the 60s but it doesn't seem to work. I was getting the QoS Latency from zapiperf/cdot/9.8.0/workload_detail.yaml every 180s with the default schedule config and I have changed the config from workload_detail.yaml and /zapiperf/default.yaml to 60s and it works but when I set the config as 15s for the schedule, it doesn't seem to work. Any idea what else should I check? Thanks! 🙂
Hi, I have a question related to certificate authentication. How do I need to fill out the server_cert.cnf (CN and alternative names) in the certificate generation process if I want to use just one signed certificate for multiple ONTAP clusters? The example contains the SAN data (with FQDN and IP). https://github.com/NetApp/harvest/blob/main/docs/samples/server_cert.cnf
QoS scheduling
Thanks Chris! I'm not aware any documentation links for performance. Is there a link for ONTAP documentation on this?
I'm not aware of any documentation on ZAPI performance counters beyond the metadata that ONTAP itself returns
You may check Rest Performance document here https://library.netapp.com/ecm/ecm_download_file/ECMLP2883449#qos_volume
Has anyone here successfully imported NAbox into AWS and created an EC2 instance? I'm having issues with the import
Hi guys, love the product, been running Harvest for 6 or 7 years now.
I ran into what seems to be a bug when updating the root password on the appliance. There was a penetration-test done in my company and it detected a default login existed for my NABox 3.1 setup. I webbed into it and when I attempt to set a new password, it says "Wrong password or username". That is with and without the box checked for "change root account instead of admin", though obviously I want it checked.
The password I'm setting it to meets the requirements listed.
I resorted to copy & pasting the default root password just to make sure I wasn't mistyping it (I wasn't)
I figure I'll update to v3.1.2 but wanted to see if anyone knows what might be up with this. Thanks!
Hi @young steeple we have an interesting phenomenon with NAbox 3.1.1. We get errors in Grafana when loading the dashboards. When we look at the Admin - Systems page we do not see any clusters, but all the containers are up and running. A week ago we had the same issue and I needed to do a restore
and the nabox-api log looks like
Hello guys! I am using Prometheus replication with Node Exporter and there is no problem. Then I configured Harvest to monitor NetApp, but I get this error: msg="Out of order sample from remote write" err="duplicate sample for timestamp" series="{name="node_uptime"
When I curl my cluster I see this output. Strange that each node's uptime is there twice, isn't it?
[root@server003 harvest-21.08.0-6_linux_amd64]# curl 0.0.0.0:12993/metrics | grep node_uptime
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
node_uptime{datacenter="dc03",cluster="NAPP01",node="NAPP01-01"} 159378
node_uptime{datacenter="dc03",cluster="NAPP01",node="NAPP01-02"} 160569
node_uptime{datacenter="dc03",cluster="NAPP01",node="NAPP01-02"} 160439
node_uptime{datacenter="dc03",cluster="NAPP01",node="NAPP01-01"} 159247
100 1152k 0 1152k 0 0 495M 0 --:--:-- --:--:-- --:--:-- 562M
Does anyone have the same error? Thanks in advance.
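A quick way to confirm which series are exported more than once is to count them in the scraped text. This is a rough Python sketch, assuming the /metrics output has one sample per line with no trailing timestamp; the function name duplicate_series is invented for this example, and the sample lines are the ones from the curl output above:

```python
from collections import Counter


def duplicate_series(exposition: str):
    """Return series (metric name plus labels) that appear more than once."""
    seen = Counter()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # Assumption: the value is the last space-separated field on the line.
        series, _, _value = line.rpartition(" ")
        if series:
            seen[series] += 1
    return sorted(s for s, n in seen.items() if n > 1)


sample = """\
node_uptime{datacenter="dc03",cluster="NAPP01",node="NAPP01-01"} 159378
node_uptime{datacenter="dc03",cluster="NAPP01",node="NAPP01-02"} 160569
node_uptime{datacenter="dc03",cluster="NAPP01",node="NAPP01-02"} 160439
node_uptime{datacenter="dc03",cluster="NAPP01",node="NAPP01-01"} 159247
"""
for series in duplicate_series(sample):
    print(series)
```

Both node series show up twice here, which matches the "duplicate sample for timestamp" complaint from remote write.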
Following up regarding failures when trying to reset the root password:
Log in with admin, go to reset the password, check the box to do it for root, put in the admin password for "current pw" and the new root pw in for the new and confirm password.
I was trying to do it while logged in as root and it wouldn't work. I believe you HAVE to be in as admin, not root.
You mean you logged in as root in the web ui ?
Hi all, Yann activated the workload/QoS counters with NAbox 3.1.2. I just ran a query for the qos_detail_volume_resource_latency counter but I only see the SVM root volumes. Does Harvest need specific settings on the volume to collect this information?
Hello! About 90% of the panels in the dashboard "ONTAP: SVM" have no data. Highlights, volume performance & capacity are OK, but the rest (all the protocols) display "no data". Do you see the same, or do you have metrics? Thanks
Did you wait a bit ?
duplicate metrics
Hi Yann, after almost 24h I still only see the root volumes. Each volume has its own QoS policy; could it be because we do not use the default policy groups?
qos workload detail volume
Thanks Rahul!
Thanks Chris!
Volumes not showing up
@loud ocean Any news on possible integration of Storagegrid for Harvest?
hi @viscid agate I wanted to discuss what you mentioned in https://github.com/NetApp/harvest/issues/170#issuecomment-1190364468 and see if that was something we could incorporate. Beyond that we've discussed a general purpose StorageGrid collector and it's on the roadmap, but not planned yet. Would a bucket's capacity used metric be useful as a first step?
tenant is probably a better place to start... then buckets -> object count, used capacity ...
or even farther up/next to the tree... site capacity, node capacity (bla all the hardware counters)
there's a lot of layers, hehe... but not so different from ontap in principle... just cassandra is a bit bigger element than wafl
thanks @sterile junco if you have a GitHub account, those comments would be a nice addition to the current issue https://github.com/NetApp/harvest/issues/170. If you don't want to bother, I'll add them. Last time I looked closely at StorageGRID, it didn't have a general /metrics endpoint that returns all Prometheus metrics (like what Harvest does). Instead, StorageGRID requires you to query by name. One idea was since StorageGRID already provides open metrics performance data, don't add that to a Harvest collector, but instead focus on the capacity and system health info that it only provides via REST. In other words, build a StorageGRID REST collector
it seems one can access the internal prometheus UI at https://admin_node/metrics/graph ... but i can't really tell you exactly what that means... if you get a test system you can dig into the internal configuration (just a linux box) on the admin node and see what they've done
i guess the only advantage of having harvest in the mix is just to have a single point for monitoring, etc ... would be nice to be able to configure custom dashboards and alerts (not a fan of the alarms now... they hang too long) ...
i always hope for the convergence of the flexibility of Grafana+Prometheus and the semi-intelligence of *UM
yep that makes sense. looks like StorageGRID bundles Prometheus and Grafana. They ship with a set of dashboards, but if you want those StorageGRID dashboards available in a different Grafana instance, looks like you need to export them from StorageGRID and import into a different instance. If you do that, not sure if it's then possible to change the Prometheus datasource to point to the metrics coming from StorageGRID. I'm not sure yet if StorageGRID exposes the metrics in a way that an external dashboard can use. Maybe you could setup remote writes from the internal Prom instance
we have a number of external SG dashboards as well and I wish I had time to explore making some better ones with the SG sources
Dear users, I'm using harvest on Alpine Linux. When I try to start harvest I get message: "fork/exec /conf/bin/daemonize: no such file or directory". Does anyone have an idea how to start harvest with docker?
hi @dusky mauve can you share more information. Harvest should run fine on Alpine. We build our containers on Alpine. https://github.com/NetApp/harvest/blob/d22ef0756ff7e2054dc6b6845e3b90a08444e289/docker/onePollerPerContainer/Dockerfile#L29 Can you paste more what commands you are running and what is failing?
First of all, I'm not sure if it should work inside the container or with Alpine. It fails when I try to use the command bin/harvest start
I also tried to start it using "docker exec" for a specific container
When I try to run the start command from Alpine I get "bin/harvest: not found"
gotcha, are you wanting to run on Docker and if so, which of the five bullets listed here are you trying? https://github.com/NetApp/harvest/tree/main/docker#harvest-and-containers
or maybe you mean that you want to run Harvest on Alpine without Docker? If you want to run natively without containers we only publish builds for amd64 (easy to build for other platforms, but we haven't gotten requests outside macos). If you want to run a containerized version it should just work
Before I installed the latest release, there was also an error related to pgrep
sounds like you're trying to run native amd64 build on Alpine which won't work. it should work fine with docker though
on Apple Silicon platform you can force x86 image iirc
with Rosetta 2 I assume
Stand-up Prometheus, Grafana, and Harvest via Docker Compose.
thanks Pawel, ah I bet I know what's happening - you are probably at step 3 and that's failing when you run it on Alpine Linux. The containers run fine on Alpine, but as I mentioned above, in step 1, you downloaded an amd64 version which will not run on Alpine
FYI you will have libc issues with Alpine to run Harvest. I finally let go and moved to FROM --platform=linux/amd64 python:3.8-slim-buster
that's true of the amd64 version, but not the containerized version, which is built on Alpine https://github.com/NetApp/harvest/blob/d22ef0756ff7e2054dc6b6845e3b90a08444e289/docker/onePollerPerContainer/Dockerfile#L29
Actually, I'm not sure as I work in the existing environment. Step 3: you mean "Generate a Docker compose file from your harvest.yml"? Nabox works and grafana collects data. Even command 'harvest status' works but shows "not running" pollers. Issue exists when I try to start using 'bin/harvest start' with '--config' file or without. I'll work with it tomorrow. Thank you Chris.
ah did not realize you were using nabox
What are you trying to achieve with NAbox ?
@dusky mauve we'll get things sorted out for you tomorrow then
As I mentioned, I work in the existing environment. But Nabox is handy, I can easily add pollers and install new releases.
If poller fails and stop, we should start with a dc logs nabox-harvest2
Also note that the Harvest container in NAbox does not embed NetApp Harvest; it is mounted from another directory
Any thoughts on setting custom retention for a given metric ? I don't think Prometheus lets us do that but I figured I'd ask anyway
You can make StorageGrid internal Prometheus available to external queries since 11.5. https://kb.netapp.com/Advice_and_Troubleshooting/Hybrid_Cloud_Infrastructure/StorageGRID/How_to_enable_external_access_to_Prometheus_for_StorageGRID
hello, has anyone made a vscan dashboard already? would like to steal it 😉
May be this helps httpsfaun pubhow to
There are some vscan panels in SVM dashboard under below panel
nice thanks for sharing @stiff dove
What does it change? I found that NAbox works with amd64 (only one tar.gz file to download from github) uploaded via site, but Grafana works and collects data. Should it work like this?
yes nabox should work fine, what I meant yesterday is that it was not clear to me that you were using nabox. 😄 I thought you were trying to run Harvest without nabox on Alpine.
@young steeple will help you with adding Harvest to NAbox
Thank you Chris.
Harvest issue in NAbox
Tenant capacity used and bucket metrics are what I need on my side. I have implemented that with a mix of REST and Prometheus queries in my dashboards... it works... but I find it meh.
You wanted to contact me to see how i have done that.
Hi guys,
Is it possible to add metrics in the ONTAP nic area? If we do an ifstat e0d, for example, the output contains "Bus overruns". Is it possible to also grab this with Harvest to show it in the Network Dashboard?
Hey @dusk siren probably 🙂 can you paste the ONTAP cli commands you are using and we'll take a look
frd-ntap41n::*> node run -node frda46104 -command ifstat e0c
-- interface e0c (0 hours, 22 minutes, 26 seconds) --
RECEIVE
Total frames: 238m | Frames/second: 177k | Total bytes: 334g
Bytes/second: 248m | Total errors: 0 | Errors/minute: 0
Total discards: 26 | Discards/minute: 1 | Multi/broadcast: 205
Non-primary u/c: 0 | Errored frames: 0 | Unsupported Op: 0
CRC errors: 0 | Runt frames: 0 | Fragment: 0
Long frames: 0 | Jabber: 0 | Length errors: 0
Alignment errors: 0 | No buffer: 0 | Pause: 0
Jumbo: 0 | Error symbol: 0 | ||Bus overruns: 26||
Queue drops: 0 | LRO segments: 23983k | LRO bytes: 319g
LRO6 segments: 0 | LRO6 bytes: 0 | Bad UDP cksum: 0
Bad UDP6 cksum: 0 | Bad TCP cksum: 0 | Bad TCP6 cksum: 0
Mcast v6 solicit: 0 | Lagg errors: 0 | Lacp errors: 0
Lacp PDU errors: 0
TRANSMIT
Total frames: 274m | Frames/second: 203k | Total bytes: 357g
Bytes/second: 265m | Total errors: 0 | Errors/minute: 0
Total discards: 0 | Queue overflow: 0 | Multi/broadcast: 310
Collisions: 0 | Pause: 0 | Jumbo: 227m
Cfg Up to Downs: 0 | TSO segments: 8981k | TSO bytes: 334g
TSO6 segments: 0 | TSO6 bytes: 0 | HW UDP cksums: 0
HW UDP6 cksums: 0 | HW TCP cksums: 51224k | HW TCP6 cksums: 0
Mcast v6 solicit: 0 | Lagg drops: 0 | Lagg no buffer: 0
Lagg no entries: 0
DEVICE
Mcast addresses: 3 | Rx MBuf Sz: 4096
LINK INFO
Speed: 100G | Duplex: full | Flowcontrol: none
Media state: active | Up to downs: 5 | HW assist: 5655
ah! a nodeshell CLI command - Harvest does not run any of those at the moment. I'll see if we can find this via REST or REST private cli
hopefully that is surfaced somewhere else
Thx Chris
@dusk siren looks like this counter may be exposed via nic_common can you use the following command to verify that counter gives us what we want? Replace u2 with the name of your system with 26 receive overruns. bin/zapi -p u2 show data --object nic_common --counter tx_bus_overruns --counter rx_bus_overruns and if you have the handy https://github.com/tomwright/dasel you can throw in a | dasel -r xml -w json at the end to get pretty printed json, otherwise add --write color to get output that is more readable than XML
@loud ocean These are the right counters:
{ "counters": { "counter-data": [ { "name": "rx_bus_overruns", "value": "670" }, { "name": "tx_bus_overruns", "value": "0" } ] }, "name": "e0d", "sort-id": "0", "uuid": "frda46104:kernel:e0d" }
The number increased over the last few hours
Hi guys;
We are using the Prometheus Service Discovery. Since I updated Harvest to version 22.08.0-1 Harvest seems to expose not the whole address for the target. I only can see the port but not the FQDN. E.g.: {"__meta_poller":"vig-ntap11"}},{"targets":[":13099"] in Prometheus I also can only see the ports. Prometheus and Harvest are running on different VMs
@dusk siren Could you share output of below command
bin/harvest doctor -p --config harvest.yml
Sure:
Admin:
  httpsd:
    listen: :8887
    auth_basic:
      username: -REDACTED-
      password: -REDACTED-
Tools:
  grafana_api_token: -REDACTED-
Exporters:
  prometheus-zf:
    exporter: Prometheus
    port_range: 13000-13999
Defaults:
  collectors:
    - Zapi
    - ZapiPerf
  use_insecure_tls: true
  auth_style: basic_auth
  username: -REDACTED-
  password: -REDACTED-
  exporters:
    - prometheus-zf
Pollers:
  abt-ntap91:
    datacenter: ABT
    addr: -REDACTED-
  alf-ntap11:
    datacenter: ALF
    addr: -REDACTED-
  alf-ntap91:
    datacenter: ALF
    addr: -REDACTED-
  als-ntap91:
    datacenter: ALS
    addr: -REDACTED-
..... More systems
I see local_http_addr is missing in the exporter configuration. Could you add it? That should add the address of the target
0.0.0.0 is the default value, right? Even if it is not named in the yml?
yes
given your prometheus is running on a different machine, it will need the target ip address
It worked until I did the update 😄
hmm that is not something we changed in latest version.
So, the targets are named in harvest and Prometheus is using them right?
Yes, prometheus is scraping end points created by harvest
I didn't update to every version. So the source version was older
hmm let's give local_http_addr a try and see
Ideally 0.0.0.0 should have worked as well. Could you share logs where local_http_addr is not set?
0.0.0.0 is default only. It should not be the prometheus server IP but the IP of machine on which Harvest is running.
That is not yet available in Harvest
Just to confirm, you are using the steps mentioned here https://github.com/NetApp/harvest/blob/main/cmd/exporters/prometheus/README.md#prometheus-http-service-discovery ?
Yes, I did
ok great. So default configuration 0.0.0.0 should have exposed the fqdn with port. If it doesn't then we should see an error in logs (could be related with resolving fqdn). To workaround this, you can mention the fqdn of harvest machine in local_http_addr which should work.
ok
is it working now with local_http_addr?
yeah. I set the IP 0.0.0.0 and it is working
I have a question regard Aggregate Dashboard
Physical Space Used is not correct i think because it does show data that is tiered on our Storagegrid.
Can someone look into this?
Here an Example:
hi @viscid agate yes, we'll take a look 👍
that dashboard is displaying the aggregate physical_used metric returned by ONTAP. Let me check why the CLI may not agree
ah! I see the problem, units, that panel is using bytes (SI) when it should be bytes (IEC). If you change the units for that panel in the dashboard do you see the amount?
I'll open a PR to fix
Oh wow... you're right. Thank you for the hint 😄
The other panels are wrong as well (I have checked 3-4).
I will edit it through JSON until it is fixed in the next release
yes, the PR will include updates to all panels
hi @dusk siren thanks for raising the service discovery issue - we found and fixed the issue. As you discovered, when a poller's Prometheus exporter was missing the local_http_addr param or if that param was the empty string, the poller published an incomplete address that causes service discovery to fail. The workaround is to specify local_http_addr: 0.0.0.0 - this PR fixes the problem https://github.com/NetApp/harvest/pull/1278 so that the local_http_addr can be missing or empty. Thanks!
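For anyone who wants to check their own service discovery output for this symptom, here is a small Python sketch. It assumes the payload follows the standard Prometheus HTTP SD shape (a JSON list of objects with "targets" and "labels"); the function name incomplete_targets and the second sample entry with its hostname are invented for illustration:

```python
import json


def incomplete_targets(sd_json: str):
    """Return targets that carry a port but no host, such as ":13099"."""
    bad = []
    for group in json.loads(sd_json):
        for target in group.get("targets", []):
            if target.startswith(":"):
                bad.append(target)
    return bad


# First entry mirrors the output quoted earlier; the second one is hypothetical.
sample = json.dumps([
    {"targets": [":13099"], "labels": {"__meta_poller": "vig-ntap11"}},
    {"targets": ["harvest-host.example.com:13100"],
     "labels": {"__meta_poller": "example-poller"}},
])
print(incomplete_targets(sample))
```

An empty list means every target has a host part; anything listed would need local_http_addr set, or a Harvest build that includes the fix.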
hi @viscid agate all of the dashboards that were using bytes (SI) were changed to use bytes (IEC) in https://github.com/NetApp/harvest/pull/1280 and https://github.com/NetApp/harvest/pull/1229/files
@quartz wave I think I know what's going on, wrote you an email
I just installed 22.08 in nabox. In the CPU headroom graph, is the utilization supposed to be measured in usec? Or should this really be percent?
I'm running Harvest inside nabox. In the 22.08 Node Details page, the CPU Busy Domains is very different from 22.05. It looks like an order of magnitude in Idle and host OS data. The rest of the values didn't appear to change.
Hi Team, I hope y'all had a good weekend. Does anyone have any experience deploying the harvest on a massive scale? I'm planning to deploy the harvest to monitor over 500 ONTAP cluster environments and I wonder if there is any hiccup that I might face thru the service or the deployment.
CPU domain busy
@agile oracle We have customers monitoring ~300 nodes on a single instance of Harvest running on a machine with 24 GB of memory. Please let us know how your deployment goes. It will be a good benchmark. 🙂
You are welcome. Btw. I have added the two bus overrun metrics to nic_common and I get what I need. Thank you for that
awesome!
Hi all! Can Harvest scrape any S3 metrics? If not, is it planned? Thanks!
@empty knot Harvest doesn't scrape S3 metrics currently. Could you open a github request for the same.
Done - Thanks.
Thanks @empty knot Is your request similar to https://github.com/NetApp/harvest/issues/788
Very similar though mine is not in relation to FabricPool.
ok thanks.
For ONTAP S3 metrics I use the qtree metrics, there's a qtree per bucket
making the switch from nabox to nabox3/harvest 2.0 - some of the metrics in nabox/graphite have not come over into nabox3/prometheus - looking for the nfs connection count that I had in the old world
does anyone know how to reset the admin password for NAbox? I am able to ssh in as root, so there should be a way..
Hi Team, I have got below questions from one of our customers. We have deployed Harvest for them. They are looking for qtree metrics.
Need some help.
For the 7-mode to ONTAP migrated volumes, there may be one or more qtrees within a volume.
* If they apply QOS at the qtree for their different workloads, is it possible to collect statistics at the qtree level.
For newly provisioned workload, there will be one qtree per volume.
* could apply QOS to the volume and collect statistics at the volume but it would be preferable to use one consistent practice of applying QOS at one level.
We will be applying QOS to the qtrees.
* For NFS, this will limit the IOPS at the qtree level.
* For CIFS, this will limit the IOPS at the qtree level once we are at ONTAP 9.9.1
qtree workload
hello! is it possible to bring a netapp e-series to grafana (nabox)?
I am needing to migrate harvest (21.08.0-6) from an OpenStack environment to Azure. I am looking for the easiest method to accomplish this task. Harvest is running on a Centos VM and will be migrated to a Centos VM. I would like to migrate the applications (grafana, prometheus, and harvest) and keep the historical data. Thanks.
Do we have metrics for tiering activity ? packets in/out or throughput ?
Having trouble using influxdb exporter.
i keep seeing these errors in harvest container logs
3:06PM ERR collector/collector.go:433 > export data to [my-influx]: error="Post "http://0.0.0.0:8086/api/v2/write?org=harvest&bucket=harvest&precision=s\": dial tcp 0.0.0.0:8086: connect: connection refused" Poller=cluster-01 collector=ZapiPerf:Volume stack="goroutine 519 [running]:\ngithub.com/netapp/harvest/v2/pkg/logging.MarshalStack({0xbadb60?, 0xc000f9e090?})\n\tgithub.com/netapp/harvest/v2/pkg/logging/logger.go:152 +0x88\ngithub.com/rs/zerolog.(*Event).Err(0xc0008ea000, {0xbadb60, 0xc000f9e090})\n\tgithub.com/rs/zerolog@v1.27.0/event.go:381 +0x63\ngithub.com/netapp/harvest/v2/cmd/poller/collector.(*AbstractCollector).Start(0xc000279ee0, 0xc0005c6040?)\n\tgithub.com/netapp/harvest/v2/cmd/poller/collector/collector.go:433 +0x10ee\ncreated by main.(*Poller).Start\n\t./poller.go:399 +0x2c5\n"
Pollers:
cluster-01:
datacenter: DC-01
addr: 10.193.48.163
auth_style: basic_auth
credentials_file: path/to/credentials.yml # read credentials from the file
username: admin
password: netapp1!
use_insecure_tls: true # Disable TLS verification when connecting to ONTAP cluster
exporters:
- my-influx
my-influx:
exporter: InfluxDB
addr: localhost
bucket: harvest
org: harvest
token: mXHXQF2y3wsC3D3ItwC6RH3Wd3xtZCMEqBC_07AoPWELxGl4DBGGnhycOPrxviQcOUVU9JxqalXqn_NTk0RsxQ==
not sure where im going wrong. could someone help!
Hi,
I'm trying to deploy harvest pod in my k8s cluster. Able to generate k8s deployment.yaml using kompose.
can someone tell me importance of these two volume mounts
volumes:
  - hostPath:
      path: /root/harvest-22.08.0-1_linux_amd64/conf
    name: cluster-01-hostpath0
  - hostPath:
      path: /root/harvest-22.08.0-1_linux_amd64/cert
    name: cluster-01-hostpath1
My plan is to deploy Harvest dynamically from a service. I won't have access to these folders from my service.
How do I go about it?
k8 Harvest
i just noticed that I am not getting logs in /var/log/harvest. In fact the directory /var/log/harvest does not even exists. How can I start getting logs?
A user just sends me a request to use https://grafana.com/grafana/dashboards/14179-grafana-dashboard-for-netapp-ontap-v9-8/. Cute 🙂 I kinda like the smooth curves though.
Ooh I like it.
Hi,
is compute-metric plugin supported for REST only?
It should work in Zapi also
I am seeing the following:
https://github.com/NetApp/harvest/blob/main/conf/zapi/cdot/9.8.0/snapmirror.yaml
This shows:
- ^source-node
but when I look at the data returned back, I don't see source-node or source_node, is that information not in the zapi api call?
it shows all our data with source_node="" in all cases
which seems strange
on all our metrics
Using Histograms with Harvest
Hello,
we have a Problem.
From the Variable cluster (from the screenshot) sometimes we miss Systems. After we disable and reenable them in Nabox System Configs they reappear in the variable.
Can someone look into that?
dashboard issue NABox
the nabox2 had a cool dashboard "netapp overview cdot". is that also available somewhere for the nabox3? https://grafana.com/grafana/dashboards/10181-netapp-overview/
NetApp Overview CDot dashboard
Hey. So my Harvest lab instance isn't collecting qos delay centers again, despite my enabling in the zapiperf/default.yaml.
pstejska-harvest:/opt/harvest2-conf/conf/zapiperf# tail -n 5 default.yaml
Uncomment to collect workload/QOS counters
Workload: workload.yaml
WorkloadDetail: workload_detail.yaml
WorkloadVolume: workload_volume.yaml
WorkloadDetailVolume: workload_detail_volume.yaml
pstejska-harvest:/opt/harvest2-conf/conf/zapiperf#
pstejska-harvest:/opt/harvest2-conf/conf/zapiperf#
pstejska-harvest:/opt/packages/harvest2/conf/zapiperf# tail -n 5 default.yaml
Uncomment to collect workload/QOS counters
Workload: workload.yaml
WorkloadDetail: workload_detail.yaml
WorkloadVolume: workload_volume.yaml
WorkloadDetailVolume: workload_detail_volume.yaml
pstejska-harvest:/opt/packages/harvest2/conf/zapiperf#
Could it be a 9.11 thing?
Good Morning All - I have installed Harvest using docker compose and wanted to set the retention as below. I restarted the Prometheus services for it to take effect, but after a day I still see only the default 15 days of metrics.
'--storage.tsdb.retention.time=184d'
Am I missing something here?
I am trying the new Ems collector in the 22.08 release but it does not collect any event.
poller_flc1-noprod-ash-storage.log:{"level":"info","Poller":"flc1-noprod-ash-storage","collector":"Ems:Ems","path":"conf/ems/9.6.0/ems.yaml","v":"9.8.0","caller":"collector/helpers.go:133","time":"2022-09-28T08:44:25-07:00","message":"best-fit template"}
poller_flc1-noprod-ash-storage.log:{"level":"info","Poller":"flc1-noprod-ash-storage","collector":"Ems:Ems","total instances":0,"caller":"ems/ems.go:387","time":"2022-09-28T08:44:25-07:00"}
poller_flc1-noprod-ash-storage.log:{"level":"info","Poller":"flc1-noprod-ash-storage","collector":"Ems:Ems","queried":61,"caller":"ems/ems.go:456","time":"2022-09-28T08:44:26-07:00","message":"No EMS events returned"}
What am I missing?
we'd like to use ems collector to monitor callhome.spares.low. also want to add resolve_when_ems: condition
is there an ems message that indicates spares low is resolved?
Hi all, short question about the NABox. What is the maximum supported size of the data disk?
Hi All!
I am trying to get the cluster network interconnect usage by using these two metrics in Grafana:
- lif_recv_data{datacenter=~"$Datacenter",cluster=~"$Cluster",node=~"$Node",port=~"$Eth",svm=~"Cluster"}
This metric tells me which ports are cluster ports
- nic_util_percent{datacenter=~"$Datacenter",cluster=~"$Cluster",node=~"$Node",nic=~"$Eth"}
This metric gives the NIC usage percentage
How can I merge these two together, or how can I take the result from metric one (knowing which ports are cluster ports) and apply it to metric two to get the result?
@loud ocean Hello Chris; running into a weird one, every couple of days I lose my main cluster from the reports, even though it is listed in my configuration.
looking to export metrics from older netapp grafana with graphite into nabox 3.1.2 with prometheus. anybody else run into this? want to keep history.
Missing user role in Harvest for NABox
@young steeple
Good question ! How much are you looking at ? It’s whatever linux / VMware supports with ext4
Hi team. IHAC that is running Harvest in their environment, and they are asking us for a historical report from 2018-2022 to show their storage growth over that time for all their systems. Is this the type of data that harvest would collect and report on?
We had 150 GB filled in around 3 months. That would make around 1.2 TB in two years
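That estimate is a plain linear extrapolation. A tiny Python sketch of the arithmetic, assuming growth stays constant (the function name is made up for the example, and real growth depends on how many pollers and objects are added):

```python
def projected_usage_gb(used_gb: float, observed_months: float, horizon_months: float) -> float:
    """Linear extrapolation: the same monthly growth over the whole horizon."""
    return used_gb / observed_months * horizon_months


# 150 GB in 3 months, projected over 24 months
print(projected_usage_gb(150, 3, 24))
```

This gives 1200 GB, which is the roughly 1.2 TB mentioned above.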
Thanks the older version isn't the nabox packaged VM. it's a separate grafana4.6, graphite 1.3.10, and harvest 1.3 install with imported dashboards. I wanted to import the last year of metrics from the old database which looks like it's stored in graphite to the new nabox 3.1.2 which has grafana8.2.7, graphite 1.3.14, harvest2, and prometheus.
@young steeple i must ask again because my colleagues keep asking me over and over...
Is it possible to change default from grafanaserver/admin to grafanaserver/grafana?
Nope 😄 Ok let me think about it and do some tests if I can provide a private flag to change that behaviour.
@hybrid night could be a memory issue indeed.
Hi Gang, I am doing a pitch for partners about Harvest and I am trying to include some examples in the slides. Can anyone let me know if there are any enterprises that are utilizing them? Thanks!
Do we have metrics for Flexcache in Harvest?
@static heart Could you add more details about the metric details? Do you mean workload related flexcache metrics?
Thanks Rahul, I was thinking about read cache HITs, protocol HITs...something that can illustrate how the cache is used.
Anything I can do to help identify it?
flexcache metrics
ontap: "vol show -fields used" shows me 5.52TB.
AIQUM: shows me 5.52 TiB used.
grafana: the dashboard volume / per volume space used shows me "volume size used" 6.07TB.
what I'd like to say:
- ontap shows TiB but names it TB (terabyte).
- AIQUM gets it right (in tebibyte).
- grafana: shows us TiB but names it TB in the legend.
can you confirm that and correct it in the next version of harvest?
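For reference, a small sketch of the unit conversion behind these numbers. It only shows the arithmetic: 5.52 TiB works out to about 6.07 TB, so the three readings would be consistent with one quantity expressed in binary units in one place and decimal units in another. Which label each tool should carry is the open question in this thread.

```python
TB = 10**12   # terabyte (SI, decimal)
TIB = 2**40   # tebibyte (IEC, binary)


def tib_to_tb(tib: float) -> float:
    """Convert tebibytes to terabytes."""
    return tib * TIB / TB


def tb_to_tib(tb: float) -> float:
    """Convert terabytes to tebibytes."""
    return tb * TB / TIB


print(round(tib_to_tb(5.52), 2))  # 6.07
print(round(tb_to_tib(6.07), 2))  # 5.52
```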
Hi, can anyone tell me how Harvest finds the best-fit template?
volume dashboard IEC units
I have a question about certificate authentication.
At my customer no self-signed certificates may be used. If I understand the documentation correctly, the mapping to the ONTAP user to be allowed to read the counters is made using the CN that is used for the CSR. https://github.com/NetApp/harvest/blob/main/docs/AuthAndPermissions.md#using-certificate-authentication
In the example harvest2. When creating the certificate, however, the FQDN (e.g. harvest-host1.domain.com) must be specified as the CN. My question is if the hostname alone can be specified as alt_name (e.g. harvest-host1) which is then mapped to an ONTAP user (harvest-host1 in this case). Is this the right way?
This is probably a newbie question,
Is there a way to send the collected data to an external Graphite server? I couldn't find it in the documentation.
Hi, I have just installed NAbox 3.1.2 with netapp-harvest 22.08.0-1
It's quite cool software, but I have some problems with environmental monitoring
I get no data from my DS460-12 shelf (no temperature or power consumption)
and no power consumption for all my FAS80x nodes
all other shelves and nodes are working well
is there some hardware that is not supported?
Missing Power metrics for FAS80x and DS460-12 shelf
Hello guys, just a very basic question - If I want to get Harvest running, do I need to have an external Grafana Server or something like that to visualise things, or can Harvest itself show dashboards somehow?
A popular choice is https://nabox.org/ which comes with dashboards
Any way to do that on a linux system without appliance?
@empty cairn You can do so with docker workflow as mentioned here https://github.com/NetApp/harvest/blob/main/docker/README.md
Other options are NABox or you can set up grafana prometheus separately and install harvest binary.
In some graphs I have a vertical dotted line moving with the mouse cursor, but in most graphs I don't. Is there a way to have the dotted line in all graphs? Btw, in the old Grafana (harvest 2.x) the dotted line moved in all graphs in the dashboard simultaneously.
@gentle pike Could you share which version of Grafana you are using with which version of Harvest? I could see vertical & horizontal lines in both the mentioned graphs in my setup.
- NaBox 3.1.2
- Harvest 22.08.0-1
- Grafana 8.4.6
Browsing data cleared
Hi,
I tried adding multiple pollers in the harvest.yml, but I see logs for only one poller and there are no other errors. How can I identify what went wrong with the other poller?
Pollers:
  ontap1:
    datacenter: a330e568-63de-462a-bfb9-1f03c3cd04a7-DC
    addr: <ip>
    auth_style: basic_auth
    credentials_file: /opt/secret/ontapcred.yml
    use_insecure_tls: true
    exporters:
      - influx
    collectors:
      - Zapi:
        - zapi_custom.yaml
      - ZapiPerf:
        - zapiperf_custom.yaml
  umeng-aff300-01-02:
    datacenter: a330e568-63de-462a-bfb9-1f03c3cd04a7-DC
    addr: <ip>
    auth_style: basic_auth
    credentials_file: /opt/secret/ontapcred.yml
    use_insecure_tls: true
    exporters:
      - influx
    collectors:
      - Zapi:
        - zapi_custom.yaml
      - ZapiPerf:
        - zapiperf_custom.yaml
Grafana dotted line
Hello guys, does nabox support storagegrid/e-series?
@mental vale can you paste docker version again?
We plan on including fabric pool panels in the volume dashboard for the Nov release of Harvest. The pull request is in progress now https://github.com/NetApp/harvest/pull/1352
Great thanks
Hey all: Is there a metric for measuring the amount of tiered cloud storage that I'm missing?
Hello guys, in nabox the power consumption of NVME/A250 Systems just shows as 0w or empty, is this a known thing?
Hi all, where can I check for the unit of the metrics? Could someone point me to documentation if any.
Hello everyone! I have a strange problem when I try to configure LDAP in NAbox - Grafana. When I upload and submit data in NAbox, the configuration file (grafana.ini) automatically falls back to the default path to ldap.toml. I have a connection (check mark next to LDAP settings) but the user mapping is not working - it shows that it cannot find the user. I am using Linux with Docker. Additionally, I would like to ask how to enable logs in Docker (grafana.log does not appear after uncommenting in the configuration file) to see what exactly is going on?
Hello everyone, i am using the new version of Harvest. Is it possible to add some object.counters from CDOT Ontap to the Prometheus as metrics?
Possible? Yes. How? I'm not actually sure.
NAbox 3 Cluster Dashboard-SVM Performance Drilldown Latency
Team, in the latest Harvest version I see the Grafana dashboard shows no data for quotas for one of my clusters, and no data is collected by Harvest either. The other poller is collecting the data perfectly.
This is the error I can see in the docker logs:
:39AM ERR collector/collector.go:394 > plugin [Qtree]: error="duplicate instance key => st-svm.vol12345.qt_bluser.4.file-limit" Poller=Pollername collector=Zapi:Qtree stack="goroutine 434 [running]:\ngithub.com/netapp/harvest/v2/pkg/logging.MarshalStack({0xbae240?, 0xc003b41ea0?})\n\tgithub.com/netapp/harvest/v2/pkg/logging/logger.go:152 +0x88\ngithub.com/rs/zerolog.(*Event).Err(0xc00040c000, {0xbae240, 0xc003b41ea0})\n\tgithub.com/rs/zerolog@v1.27.0/event.go:381 +0x63\ngithub.com/netapp/harvest/v2/cmd/poller/collector.(*AbstractCollector).Start(0xc0002e8000, 0x0?)\n\tgithub.com/netapp/harvest/v2/cmd/poller/collector/collector.go:394 +0x17ae\ncreated by main.(*Poller).Start\n\t./poller.go:399 +0x2c5\n"
I am trying to get QOS metrics working in harvest2, I have uncommented the workload metrics and rebooted and each time I reboot the metrics get re commented out
The location I am editing: default.yaml in /opt/harvest2-conf/conf/zapiperf
```
Workload: workload.yaml
WorkloadDetail: workload_detail.yaml
WorkloadVolume: workload_volume.yaml
WorkloadDetailVolume: workload_detail_volume.yaml
********:/opt/harvest2-conf/conf/zapiperf# reboot
******:/opt/harvest2-conf/conf/zapiperf# Connection to ****** closed by remote host.
```
Reconnect and
```
Uncomment to collect workload/QOS counters
Workload: workload.yaml
WorkloadDetail: workload_detail.yaml
WorkloadVolume: workload_volume.yaml
WorkloadDetailVolume: workload_detail_volume.yaml
*****:/opt/harvest2-conf/conf/zapiperf
```
So I'm thinking out loud...would it make sense for adding a graph for the netstat object to track packet loss? https://kb.netapp.com/Advice_and_Troubleshooting/Data_Storage_Software/ONTAP_OS/How_to_use_netstat_to_troubleshoot_network_problems_in_ONTAP_9.5_or_newer has the netstat command but there is a counter manager version in the netstat object.
I am trying to add information related to network port status, where they are and whether they are home, related to this post:
https://github.com/NetApp/harvest/issues/471
I am not able to get this to work to be able to have a metric that has this information so we can graph changes in the environment and possibly even put alerts in based on this to inform us when this happens.
Any thoughts on how I can do that?
I get this error when trying to add that one in
net-interface template
ISSUE WITH HARVESTER
Hi, could someone tell me why is the counter display name appended with the unit for node metrics?
I'm using Rest templates, system node for ONTAP v.9.12.1
`name: SystemNode
query: api/cluster/counter/tables/system:node
object: node
counters:
  - ^^id
  - ^node.name => node
  - total_data
  - total_latency
  - total_ops
export_options:
  instance_keys:
    - node`
Please help with the "harvest.yml" configuration file from https://github.com/NetApp/harvest#harvest-configuration. I'm stuck at step "1. Configuration file" as I'm not sure what to configure here and how to access Grafana.
The Harvest team is happy to announce a beta release of the StorageGRID collector is available in the latest nightly build. Please try it out and let us know how it works for you. Details can be found here https://github.com/NetApp/harvest/issues/170#issuecomment-1297448602
Hey guys, I now installed the nightly release on the nabox, and even managed to add the storagegrid as described on github. However, I can't seem to find the storagegrid dashboard. Help would be appreciated 😀
Hi, hope this is the right forum. I have a query regarding the custom.yaml files for ONTAP. Can we decide to use REST or ZAPI yamls at the object level for a given ONTAP version? For example, disks.yaml using ZAPI and aggregate.yaml using REST?
Has anyone successfully deployed harvest in AWS with the cloudformation.yaml file? From here https://docs.aws.amazon.com/fsx/latest/ONTAPGuide/monitoring-harvest-grafana.html
You can use standard NetApp monitoring tools to monitor your file system storage usage and performance, with the following Harvest and Grafana solution being one example.
Hi, can anyone tell me if harvest allows only snake case for custom display name?
hey, following on from #┊・harvest-nabox🔒 message do folks have the volume qos resource latency drill down metrics working? I can now see qos metrics in end-end but I am wanting to see "throttled" by qos metrics
Hi, I am trying to get nabox to work with an old netapp (7-mode) , but I get an ssl handshake error (zapi.py), I can give more details if anyone is available to help me...
Thank you in advance!
The team has been working hard on improving Harvest documentation. Many of you shared that it is difficult to find the docs on GitHub, so we've prioritized moving to a separate documentation site this release https://netapp.github.io/harvest/ We have lots more improvements planned, but this is a strong step in the right direction
Hi team, where can I find the metric units for harvest metrics?
Sudden change of NetApp clusters temperature measures after NAbox and NetApp Harvest update
Hi, does Harvest export its own metrics? We are interested in the error stats. We can view the log and see them, but we're wondering if we can collect those with Prometheus.
Hello,
I have configured a new NAbox instance and added two clusters. I don't see the netapp option when I try to add a new dashboard in Grafana. Can anyone help me with this?
Hello everyone, I would like to know why i am not seeing the metrics from netapp in prometheus. I am able to see the raw metrics arriving from all the collectors with the command "dc logs -f --tail 20 nabox-harvest2"
Where can I see all that raw metrics?
I would like to see raw metrics in order to be able to create new metrics
Thanks for your help.
Have a customer questioning if Harvest itself has any APIs they can pull information from? They understand Harvest is utilizing ZAPIs / APIs from Ontap to pull info from...but their ask is specific to any Harvest APIs/corresponding documentation. I'm not finding any, so thinking it's a 'no', but would love validation. Thanks!
Hello all.
We have a large enterprise customer that currently has three AIQUM clusters, but is asking the following:
“I would like to know about how Netapp integrates with Prometheus as that is a priority for us to integrate metrics with other compute platforms to get a detailed end to end view.”
We are working on setting up a meeting for this Friday, but for now those are all the detail we have.
I am fairly certain we will not need the included Grafana / UIs, but that this will end up just being a bridge from the ONTAP clusters into their internal Prometheus-based monitoring tool.
My assumption is that Harvest will be the best route to go, over working with the APIs directly or using something like https://github.com/sapcc/netapp-api-exporter. Wanted to see if anyone else thinks that seems like the best route as well.
Thanks!
@verbal hazel
@young steeple which is the actual version of grafana in the actual nabox build?
We have this bug :
https://github.com/grafana/grafana/pull/52253
Hi all,
I'm running containerized version of harvest 2.0. Which is the safest way to add new poller? Do I have to stop running pollers when generating compose files after editing harvest.yml?
Thanks!
We love the StorageGrid collector so far.
Is there an ETA when the next release of harvest will be available?
most likely next week.
Hi @young steeple
Is there a way to expose the NABox's Prometheus to use it as a datasource in my own Grafana instance?
The Harvest team is happy to announce the release of 22.11 https://github.com/NetApp/harvest/releases/tag/v22.11.0 This is one of our biggest releases with the largest number of external contributors to date. Go team! Highlights of this major release include: a StorageGRID collector with a Tenant/Buckets dashboard, production ready REST collectors with a full set of REST templates that export ZAPI identical metrics, and a new documentation site that consolidates Harvest documentation into one place
Hi, question about metrics lif_recv_errors and lif_sent_errors. what is time period to calculate them? how to find what the error is? thanks!
Finally 🥳 thanks @fossil bane, @loud ocean and team for the great work. I will roll the new Harvest out tomorrow in our environment
Harvest DB cleanup
@loud ocean I hope it's just the name showing as nightly and it's not actually a nightly build. The reason being that some of my customers don't like seeing "nightly" in the name, as it feels more like a test build than something released as GA.
Thank you @mental vale for noticing. I'll correct it right away.
Updated now
Thanks @fossil bane
@fossil bane After upgrading harvest to 22.11 i still cant see the fabricpool dashboards
Hi guys, I've just upgraded harvest 22.11 on some of our systems but we have an interesting behavior. In the SnapMirror dashboard we do not see any source cluster and only the destination nodes. Also it says we have two SnapMirrors but no last transfer
Hello guys, love the new release so far!
I just opened a new issue/feature request to see to which node/controller a particular disk belongs to: https://github.com/NetApp/harvest/issues/1536
Sure @empty cairn, we'll take a look. Thanks
Hi, can anyone tell me how do I get a tagged version of harvest image and from where?
@fossil bane i upgraded harvest for one of my customer from 22.08 to 22.11 , i used the migration steps as part of it for docker volume migration.
Following the update instructions of the 22.11 harvest package, I tried replacing "qtree" with "quota" for my dashboards but I am missing an equivalent of "qtree_total_ops".
Is this intentional?
Hello guys, I'm currently looking at the harvest dashboard for nfs clients - is my understanding correct that we need to have ontap 9.12.1 for it to work (due to the rest interfaces)?
Where do I see what unit a metric available in Prometheus is in?
For example, I'm running NAbox and want to look at the metric quota_disk_used
Hi, can anyone tell me why is the datacenter id different every time I try to collect metrics from the same cluster? Is there a way I could ignore this field before it is exported?
Hi, it seems the grafana tool released in 22.11 creates new UIDs when importing dashboards. This breaks many of our links because the UID is part of the dashboard link. We were using the grafana tool in 21.08 and it did not create new UIDs.
@young steeple why was the Prometheus container in NAbox stuck in a reboot loop? When I checked the logs, it looks like the data LVM has gone full.
Hi all, is there a Harvest-Dashboard to show FSA metrics?
is there a way to get a list of all dashboards & metrics contained in those dashboards? like from a text file or .yml or..
good morning! first: 22.11.0 looks good - thanks for that. checked all the dashboards again and saw that "Harvest Metadata" didn't show anything (no data). can't select a datacenter / hostname / poller ....
harvest 22.08.0 also shows nothing.
did this work for you? my installation is nabox 3.1.2.
metadata metrics
Feature request: On the maintenance page, when upgrading Harvest, can you please add a comment/help button (or change the file requestor) that says to upload the tarball and not the rpm? I accidentally uploaded the rpm, it took it, went through some motions, and then didn't change the version 🙂
@loud ocean CDOT/ONTAP Node dashboard is showing me one of my four FAS9000 running up to 6x the IOPs of the other three. When I look at CLI, they appear to be closer to each other. Have you or your team heard of this before?
Which commands are you using?
That chart uses sum(volume_total_ops{datacenter="$Datacenter",cluster="$Cluster",node=~"$Node"})
So stats on volume object total ops counter.
The statistics node show may show nblade counters, which can include indirect i/o.
So if you have a node with no data LIFs it will show 0 IOPS for example, because 0 IOPS are coming into the nblade from users.
The volume object is dblade, so it gets translated through nblade of data LIF node to cluster network to disk node where volume object is measured.
That could be one reason.
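To compare nodes outside the dashboard, the same metric can be queried from Prometheus directly. A minimal sketch, assuming Prometheus is reachable over HTTP; the host name, the cluster label value, and the `by (node)` grouping are assumptions for illustration, while `/api/v1/query` is Prometheus' instant-query endpoint:

```python
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a URL for Prometheus' instant-query endpoint (/api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Same metric as the dashboard panel; grouping by node is an assumption
# made here so each node's volume-level ops can be compared side by side.
promql = 'sum by (node) (volume_total_ops{cluster="cluster1"})'
print(instant_query_url("http://prometheus.example:9090", promql))
```

The printed URL can be fetched with curl or a browser; the per-node sums can then be set against what the CLI reports.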
EMS integration is now live in NAbox 3.2b
Our NAbox data disk is full. We have resized disk 3 to 400 GB. I have extended the LV, but how can I resize the disk? Can someone help?
resize2fs is not installed
Hello, which configuration should I change to keep more than 2 weeks of data? NetApp Harvest only shows the last 2 weeks.
data retention
Hi There, hope you are doing well, on my harvest server i have customized, almost every conf files, do you know if i can know from one version to another version of harvest the configurations files provided have been modified?
How to upgrade Grafana
Hi, is there any technical explanation on why on the grafana dashboards the option "Connect Null values" is set to "never"?
Null value connect
So we did the upgrade to v21.11.1, and did some data migrations with prometheus, is there additional migrations for grafana? as I see a bunch of other volumes for grafana_data and harvest? Also, when I go into grafana there is no data being displayed? So not quite sure if was the data migration with prometheus or something else?
22.11.0 upgrade and seeing this in the logs:
{"level":"warn","Poller":"XXXYYY01","error":"unable to import template=[] no best-fit template found","collector":"Zapi","object":"Status_7mode","caller":"./poller.go:684","time":"2022-12-08T10:12:06-08:00","message":"init collector-object"
XXXharv:/opt/harvest/conf # grep Status_7mode /
zapi/default.yaml: Status_7mode: status_7.yaml
XXXharv:/opt/harvest/conf/zapi/7mode/8.6.0 # ls -atlr
-rw-r--r-- 1 harvest harvest 2021 Nov 21 07:34 volume.yaml
-rw-r--r-- 1 harvest harvest 686 Nov 21 07:34 subsystem.yaml
-rw-r--r-- 1 harvest harvest 596 Nov 21 07:34 status_7.yaml
I don't have 7mode, but it's weird that this exists and it says it can't find it.
is there a way to define a label for either a node, or set of nodes such that we can identify uniqueness across nodes? I know that datacenter can define it for a cluster, but really we need to drill down to the node level
@wispy raft Could you share some detail about the use-case for label a node/nodes? you would like to see THAT label in prometheus metric level or grafana panels?
example: A cluster could be in a 'Datacenter' but individual nodes may be in a different 'ROOM' within the cluster, so having a way to denote which 'node' is in which 'room' would be beneficial to our alerting system.
additionally, it would be nice to see that 'data' in grafana as well, since we would be able to sort based on 'room' AND 'datacenter'
can anyone share a Harvest screenshot that shows some FlexCache statistics?
afternoon, i've recently upgraded harvest to 22.11 and i'm looking for the new dashboards. I've reset the dashboards in NAbox but i'm still only seeing the previous dashboards, all the containers are running
Hi All, I've installed Harvest (22.11.0). It's talking to a Filer and (as a test) I configured the Prometheus exporter; I can fetch data from there using curl. So far so good. But the customer would rather use InfluxDB ...
The Filer and the Harvest system are both in the "public" network zone, but the Grafana host is in a (more) secure "management network" zone.
I see that I can configure the InfluxDB exporter with a URL parameter to do HTTPS communication, but what does that do, does Harvest write to that URL? Or does Grafana read from it?
(It's unlikely I'll be allowed to allow open communication from the public network into the management network.)
I just put up nabox 3.2. I'm playing around with the headroom dashboard. On a new AFF-250, with no active data being served, it's projecting 300-400 iops of headroom per aggregate. Looks a wee bit on the low side 🙂
I found an issue with the environmental display in nabox for an AFF-A250. In the Cluster dashboard, nabox is showing fan speeds in orange and red whereas sensor show on the node cli shows them in normal range.
Node Fan speed in Red
I had a look at the audit logs on ONTAP with "security audit log show -application http" and found these entries:
Tue Dec 06 06:47:55 2022 <MYFILER> [kern_audit:info:2460] 8503e80001e6e7cd :: <MYFILER>:http :: <MYHARVEST>:33922 :: <MYFILER>:harvest2 :: GET /api/private/cli/network/connections/active?return_records=true&fields=proto,remote_ip,remote_host,vserver,cid,blocks_lb,local_address,node,service,lif_name,local_port,lru :: Pending
Tue Dec 06 06:47:55 2022 <MYFILER> [kern_audit:info:2460] 8503e80001e6e7cd :: <MYFILER>:http :: <MYHARVEST>:33922 :: <MYFILER>:harvest2 :: GET /api/private/cli/network/connections/active?return_records=true&fields=proto,remote_ip,remote_host,vserver,cid,blocks_lb,local_address,node,service,lif_name,local_port,lru :: Error: invalid operation
on the 6th of Dec. I updated from Harvest 22.08 to 22.11.
can you please check this on your ONTAP system, and how can I fix this?
thank you!
Hi, question about disk_labels. we replaced a failed disk and assigned ownership to container and partitions but disk_labels still has outage="unassigned". what should we look for? we use 22.11 harvest and the filer is FAS 2750 with ontap 9.8P11. Thanks!
Hi, I'm getting a Bad Gateway error trying to download https://dl.tynsoe.org/nabox/NAbox-3.2.update
I've used Chrome and Edge on both my work and personal PC and same issue
NABox download issue
I'm re-posting here to see if anyone has extended Harvest in this way before
Greetings all,
I was just curious if anyone else has had to integrate ONTAP with Splunk recently and what direction you may have taken? The officially supported Splunk add-on is EOL in a month, and the third-party add-on is listed as not supported. This makes it a non-starter option for my large enterprise customer. Has anyone tried to make it work with Harvest, or taken another approach?
Thanks and happy holidays!
Harvest with splunk
NaBox 3.2 fresh installation. But Systems Dashboard is empty. Tried with Chrome, IE, Edge, Safari from different source clients. Any idea ?
Hi all, a short question for Harvest/NABox about collecting. Will data from nodes that are going to be replaced be kept, or will it be deleted? E.g. we had a tech refresh of an A700 Metro to A800, but now the "old" nodes are missing/lost
@young steeple on my installation only the harvest CDOT and 7MODE Dashboards get imported if i reset in webgui. I am missing the StorageGRID Dashboards. Can you look into this ?
StorageGrid NABox
FYI. Harvest team will be out of office next week. Wish you in advance Merry Christmas and Happy New Year!
Hi All,
Hi, is it possible to set default url of NABox 3.2 to /grafana like for NABox 2.x ?
(i.e : redirect to ./grafana when you enter URL of the NABox server)
Hello, I have installed harvest and prometheus but I am unable to collect data from activeIQ UM . Is it something which can be done or I should create a poller for each of my netapp box ?
Harvest polls the NetApp systems directly. The old Harvest 1.6 was polling UM for capacity metrics. The current Harvest 2.x doesn't poll UM at all.
@leaden vigil Harvest 2 does not collect data from ActiveIQ Unified Manager.
Default URL to grafana
thanks guys
my customer is running harvest 1.6 and 2.0 in their environment. they are looking for help to import counters for CIFS into Harvest:
statistics start -object volume -instance * -counter cifs_total_ops -sample-id mu1
statistics show -sample-id mu1 -instance * -object volume -fields instance,counter,value -counter cifs_total_ops -sort-order descending -sort-key cifs_total_ops -max 50
statistics stop -sample-id mu1
statistics sample delete -sample-id mu1
scc29::*> statistics show -sample-id mu1 -instance * -object volume -fields instance,counter,value -counter cifs_total_ops -sort-order descending -sort-key cifs_total_ops -max 50
object instance           counter        value
------ ------------------ -------------- -----
volume swbld_releases_5   cifs_total_ops 6291
volume swbld_rel_hld      cifs_total_ops 2654
volume swbld_releases_hld cifs_total_ops 226
volume swip_pai_1         cifs_total_ops 126
volume swbld_hammer_7     cifs_total_ops 33
volume psg_data_31        cifs_total_ops 31
Trying to find documentation for what the counters are in system_node.yaml under zapiperf, specifically, what is system:node -> net_data_recv / sent (as well as others, but that's a starting point). I have hunted around, clearly these mean something but there is no description on any of them.
Perf counter information
Any known impact to clusters when polling from the old NABOX 2 and new NABOX 3.1 at the same time whilst we cycle out old stats? Should also say the old NABOX is using Harvest 1.6 and the new one is planned to use Harvest 2 only
if i understand you right ...... NO, no problem.
right now we are renewing our hardware and this will take about 1 year. i monitor the whole migration with the old nabox 2 and the new nabox 3 at the same time on the same clusters.
sweet thanks, good to know, i guessed the overhead might be negligible
Has the network LIF dashboard been retired when using Harvest 2.0, I don't seem to able to see it under the Harvest - CDOT dashboard folder in Grafana. It obviously still appears under the General folder but a lot of these dashboards do not work when using Harvest 2.0.
lif dashboard
Is there a list, where I can lookup the meaning of a specific metric?
Is or has NMSDK been phased out? is it not required anymore for use with NABOX? wasnt sure what its requirement was with NABOX previously.
Hey guys I hope that's the right place for some unsolicited feedback for NAbox,
currently I am a dual study student and the past month I was tasked with setting up a NAbox instance as prototype in our environment.
First thing I've got to say is that doing so was surprisingly easy even though that was done in a very restricted environment.
There were only two things that stuck with me as not as nice as the rest of it. First thing is that the /prometheus path is accessible without any authentication and deactivating it causes the homepage in the NAbox admin interface to be empty, maybe the web interface could get the data in the backend without an XHR Request on client side?
Second "issue" is that setting up the LDAP Connection for Grafana isn't as much fun if you have to change the ldap.toml with vi afterwards anyway. This problem came from the web interface replacing the whole file upon clicking submit and because we use an AD I had to comment out the group_search_filter settings, and changing the member_of attribute to "memberOf" to avoid 15s-1m login time. Same thing different attributes were username and email which I had to update regularly because the web interface set it to default. Maybe you could use sed instead of overwriting the file or offer those settings in the web interface.
I hope this doesn't sound too negative, those are minor issues and the whole project is still incredible work^^
Hi, we have a problem with disk_labels. After replacing and assigning a failed disk, outage="unassigned" is stuck in disk_labels. This also causes disk_new_status == 0.
disk outage-reason
----- -------------
1.0.4 -
disk_labels{cluster="flc1-sys-phx1-coresys", container_type="spare", datacenter="PHX", disk="1.0.4", failed="false", firmware_revision="NA02", instance="flc1-sys-phx1-coresys", job="netapp-harvest", model="X341_SSKBE900A10", node="flc1-01-sys-phx1-coresys", outage="unassigned", owner_node="flc1-01-sys-phx1-coresys", serial_number="WFK9G16F", shared="false", shelf="8262126240392612432", shelf_bay="4", type="SAS"} 1
disk_new_status{cluster="flc1-sys-phx1-coresys", datacenter="PHX", disk="1.0.4", instance="flc1-sys-phx1-coresys", job="netapp-harvest", node="flc1-01-sys-phx1-coresys"} 0
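To spot affected disks in scraped output, the labels of a sample line can be pulled apart with a small parser. A minimal sketch using a shortened copy of the `disk_labels` line above; the helper name is made up for illustration, and this is not how Harvest itself evaluates the label:

```python
import re

# Matches name="value" pairs inside the {...} part of a Prometheus sample line.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_labels(sample: str) -> dict:
    """Extract label name/value pairs from one Prometheus sample line."""
    return dict(LABEL_RE.findall(sample))


line = ('disk_labels{cluster="flc1-sys-phx1-coresys", container_type="spare", '
        'disk="1.0.4", failed="false", outage="unassigned"} 1')
labels = parse_labels(line)
if labels.get("outage") == "unassigned":
    print(f'disk {labels["disk"]} ({labels["container_type"]}) still reports outage=unassigned')
```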
I am new to writing queries in Grafana using Prometheus. Basically I'm trying to write a node read latency query (and also one for write) for all nodes, then create an alert when a certain latency is breached. At the moment I am trying to get the query working, but I don't think it likes the wildcard, though I am not sure I have the right query. There used to be a node latency metric in Graphite; I'm trying to find a similar one in Prometheus.
Hello All, I may be missing something simple, but what are the container image tags available for cr.netapp.io/harvest? I can't seem to figure out the pattern other than just latest and I don't like using that
Whats the easiest way to configure email alerting, smtp server etc within Grafana, I see you can do it from cmd line but there is also a section under Admin under the Alerting section on the Grafana web page.
@hot belfry, could you please post your question in #1062050414146625536 ?