Hey all, wanted to share that we've added a new StorageGRID overview dashboard with performance, storage, information lifecycle management, and node panels. These changes are in nightly if you want to give feedback before we release the next version of Harvest later this month. Let us know what you think! You can grab the latest nightly build here: https://github.com/NetApp/harvest/releases/tag/nightly
#New StorageGRID dashboard
Hello @wary lichen, it's cool that you added StorageGRID dashboards. What kind of authentication does the poller support? Is an admin user mandatory?
Thanks
@tame iron The required permissions for SG are listed here: https://netapp.github.io/harvest/23.02/prepare-storagegrid-clusters/#create-storagegrid-group-permissions
Thanks @shadow pebble, sorry, I forgot to RTFM 😉
Hi @wary lichen, quick question: is it possible to add performance/metrics data for buckets/tenants to a dashboard? It would be great to have the kind of performance charts you get when you configure a traffic classification in StorageGRID, which gives you all the performance data for a tenant/bucket. That's what we're looking for, but we can't configure traffic classification for all our buckets; there are too many. It would be perfect if we could have this information in Harvest on NAbox, so we could compare and drill down into the performance of all our buckets.
Hi @trim roost, we'll take a look. I might come back to you with clarifying questions 🙂 Also, not sure if you saw, but last month we added traffic classification metrics in nightly, like so:
@wary lichen, it seems weird: @trim roost and I can't find the dashboards in Grafana on NAbox, even though we reset the dashboards via the button in NAbox after updating. Do we need to import the new dashboards manually? Just FYI, we're currently on 23.02.0-1 and only see the tenants dashboard. 🙂
Reset and import should work; I believe Yann added support for that. Traffic classification was added on March 10, though, so it requires a nightly build or waiting until 23.05 ships. If you want to try these out before the next release, you can grab the latest nightly here: https://github.com/NetApp/harvest/releases/tag/nightly
ahhh that makes sense, thanks for the info @wary lichen 🙂
Awesome, yeah, that gives us a lot of valuable insight, thx. What we're missing is a dashboard showing which buckets generate the most ops and bandwidth. If that could be added, it would be great 🙂
I don't think that's possible, but I'll double-check. To my knowledge, SG does not publish per-bucket metrics like ops, throughput, etc. 🙁
Hi Chris, in the SG Grid Manager (11.6) there are metrics for every configured traffic classification policy, and we have a policy for every tenant. Unfortunately, they're not human-readable: a policy named "joe" is shown as an ID in the graphs...
Hi @tame iron, are you talking about the ones on this page and this dashboard, or a different one?
If so, you're in luck: nightly and the next version of Harvest have a traffic classification row with human-readable names, like so:
I just upgraded nabox to the current beta and upgraded harvest to the nightly. I configured 2 grids and love what I'm seeing. Thanks guys (and gals)!
2 additions I'd like to see:
1: Display grid consumption as a percentage, not just in PB. This would make multiple graphs on the same dashboard more useful and make it more obvious just how full the grid is.
2: Graph by storage pool. Although we only use one storage pool per grid here, I expect some customers will have multiple pools.
thanks Ed, we'll take a look
Yann rejected #2 (graph by storage pool) on the nabox feedback site, deferring to the Harvest team. You're welcome @wary lichen 🙂
I'm seeing a difference between the network traffic NAbox/Harvest reports for the grid and what the Grid Manager admin console shows. NAbox reports a peak of 13.2 GB/sec last night, while the General / Grid dashboard in the admin interface peaks at 72.6 Gb/sec at the same time. Both are received traffic only. That's about a 2x difference. Are they measuring different things?
Hi @rustic gulch, most of those metrics come from SG's Prometheus metrics, the same as in this screenshot. Are you using SG's Grid Manager or something else when comparing?
"Data storage over time" could use some decimals. I was very confused at first glance. 😁
Looks like we set it to auto out of the box. Changing it to 1 or 2 decimals doesn't help in your case, does it?
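One thing worth ruling out when the peaks disagree: the two values are quoted in different units (GB/sec vs Gb/sec). The sketch below assumes "GB" means gigabytes and "Gb" means gigabits, both with decimal (SI) prefixes, so only the factor of 8 between bytes and bits matters; it converts between them so the two peaks can be compared like for like.

```sh
# Convert between gigabytes/s and gigabits/s (1 byte = 8 bits).
# Assumption: "GB/sec" in the NAbox panel is bytes and "Gb/sec" in the
# Grid Manager chart is bits, both with decimal (SI) prefixes.
gbytes_to_gbits() {
    awk -v v="$1" 'BEGIN { printf "%.1f\n", v * 8 }'
}

gbits_to_gbytes() {
    awk -v v="$1" 'BEGIN { printf "%.1f\n", v / 8 }'
}

# Example usage:
#   gbytes_to_gbits 13.2   # NAbox peak expressed in Gb/s
#   gbits_to_gbytes 72.6   # Grid Manager peak expressed in GB/s
```

With the values from the question, gbytes_to_gbits 13.2 prints 105.6, i.e. the NAbox peak expressed in Gb/s.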
thanks for the follow-up, we'll change
The percentage query (C) for the panel "Data space usage breakdown" does not work correctly, at least with VictoriaMetrics. The second "sum by" is missing "cluster", so the labels on the two sides of the division don't match. Working query:
sum by(site_name,cluster)(storagegrid_storage_utilization_data_bytes{datacenter=~"$Datacenter",cluster=~"$Cluster"})/sum by(site_name,cluster)(storagegrid_storage_utilization_usable_space_bytes{datacenter=~"$Datacenter",cluster=~"$Cluster"} + storagegrid_storage_utilization_data_bytes{datacenter=~"$Datacenter",cluster=~"$Cluster"})
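To confirm the corrected query outside Grafana, it can be sent to the Prometheus-compatible HTTP query API (/api/v1/query), which both Prometheus and VictoriaMetrics serve. This is a sketch: PROM_URL is a placeholder for your own endpoint (Prometheus usually listens on 9090, single-node VictoriaMetrics on 8428), and the Grafana $Datacenter/$Cluster variables are replaced by plain regex arguments.

```sh
# Placeholder endpoint; not a Harvest setting.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Print the corrected query with concrete label regexes substituted for
# the Grafana variables (defaults match everything).
# A binary operator between two vectors only pairs series whose label sets
# match, so both sums group by the same labels (site_name, cluster).
data_usage_query() {
    dc="${1:-.*}"
    cl="${2:-.*}"
    sel="datacenter=~\"$dc\",cluster=~\"$cl\""
    printf 'sum by(site_name,cluster)(storagegrid_storage_utilization_data_bytes{%s})/sum by(site_name,cluster)(storagegrid_storage_utilization_usable_space_bytes{%s} + storagegrid_storage_utilization_data_bytes{%s})\n' "$sel" "$sel" "$sel"
}

# Evaluate the query instantly; curl URL-encodes it for us.
query_data_usage() {
    curl -sG "$PROM_URL/api/v1/query" \
        --data-urlencode "query=$(data_usage_query "$1" "$2")"
}

# Example: query_data_usage dc1 mycluster
```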
The "Metadata allowed space usage breakdown" panel has three queries but only uses one in the table.
thanks! we'll fix those
@austere vortex fixed in https://github.com/NetApp/harvest/pull/1930; it will be in tonight's nightly.
Fix confirmed, looks good! 👍
thanks for checking!
Hello Chris, we upgraded to 23.04.26-nightly_linux_amd64. Looking at the same dashboard, the policy drop-down doesn't show our custom classification policies. Maybe a permissions issue?
Hi @tame iron, you're looking at the StorageGrid: Overview dashboard? If so, that variable is populated via the Prometheus query label_values(storagegrid_private_load_balancer_storage_rx_bytes{datacenter=~"$Datacenter",cluster=~"$Cluster"},policy). If you check Prometheus, do you see metrics for storagegrid_private_load_balancer_storage_rx_bytes? If so, let's take a look at the logs for your SG poller.
Also double-check that the user has these permissions: https://netapp.github.io/harvest/23.02/prepare-storagegrid-clusters/#create-storagegrid-group-permissions
Maybe you're missing the Metrics query permission?
This is the group configuration:
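The Grafana variable above can be reproduced directly against Prometheus: label_values(metric{...}, policy) corresponds to the label-values endpoint (/api/v1/label/policy/values) restricted by a match[] series selector. A sketch, assuming a Prometheus or VictoriaMetrics version that accepts match[] on that endpoint; PROM_URL is a placeholder for your own address.

```sh
# Placeholder endpoint; not a Harvest setting.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# List the values of the "policy" label on the traffic classification
# metric, filtered like the dashboard's Datacenter/Cluster variables
# (regex arguments, defaulting to match everything).
policy_values() {
    dc="${1:-.*}"
    cl="${2:-.*}"
    curl -sG "$PROM_URL/api/v1/label/policy/values" \
        --data-urlencode "match[]=storagegrid_private_load_balancer_storage_rx_bytes{datacenter=~\"$dc\",cluster=~\"$cl\"}"
}

# Example: policy_values '.*' goliath
# An empty "data" list means no matching series carry a policy label,
# e.g. the poller is not exporting this metric at all.
```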
that looks good
The logs are full of this:
2023-04-26T15:…:37Z WRN storagegrid/storagegrid.go:165 > no instances on storagegrid Poller=goliath collector=StorageGrid:Prometheus metric=storagegrid_s3_operations_unauthorized
2023-04-26T15:…:37Z INF collector/collector.go:483 > Collected Poller=goliath apiMs=440 calcMs=0 collector=StorageGrid:Prometheus instances=5300 metrics=5300 parseMs=0 pluginMs=0
When you upgraded to nightly, did you follow these steps? https://netapp.github.io/harvest/23.02/install/containers/#upgrade-harvest
We can check. My guess is that your container is not using the new nightly StorageGRID template (that template includes storagegrid_private_load_balancer_storage_rx_bytes, while 23.02 does not). You can verify by running the following, replacing poller-dc1 with the name of one of your Docker pollers (find them via docker ps -a):
docker exec -it poller-dc1 cat /opt/harvest/conf/storagegrid/11.6.0/storagegrid_metrics.yaml | grep storagegrid_private_load_balancer_storage_rx_bytes
For me, that prints:
- storagegrid_private_load_balancer_storage_rx_bytes => private_load_balancer_storage_rx_bytes
I'm guessing it won't print anything for you.
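To run the same check against every poller at once, a loop over the containers works. This sketch assumes the containers follow the poller-<name> naming of the generated harvest-compose.yml and uses the 11.6.0 template path from the command above; like that command, it runs cat inside the container and grep on the host.

```sh
TEMPLATE=/opt/harvest/conf/storagegrid/11.6.0/storagegrid_metrics.yaml
METRIC=storagegrid_private_load_balancer_storage_rx_bytes

# Report, per running poller container, whether its template has METRIC.
check_templates() {
    for p in $(docker ps --filter name=poller- --format '{{.Names}}'); do
        # -it is only needed for interactive sessions, so it is omitted here.
        if docker exec "$p" cat "$TEMPLATE" | grep -q "$METRIC"; then
            echo "$p: ok"
        else
            echo "$p: missing $METRIC"
        fi
    done
}

# Example: check_templates
```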
docker exec -it poller-goliath cat /opt/harvest/conf/storagegrid/11.6.0/storagegrid_metrics.yaml | grep storagegrid_private_load_balancer_storage_rx_bytes
- storagegrid_private_load_balancer_storage_rx_bytes => private_load_balancer_storage_rx_bytes
Our policy names contain spaces. Could that be an issue?
Cool, it's there. After you downloaded the tar.gz and untarred it, did you also run docker-compose -f prom-stack.yml -f harvest-compose.yml up -d --remove-orphans so that the previously running containers would pick up the new templates?
yes, correct
Weird, let me check on spaces. Any errors in your SG poller? Try docker logs $poller-sg 2>&1 | grep ERR. And what does docker exec -it $poller-sg bin/harvest --version return?
docker logs poller-goliath 2>&1 | grep ERR
docker exec -it poller-goliath bin/harvest --version
harvest version 23.02.0-1 (commit 627311aa) (build date 2023-02-21T13:35:23+0000) linux/amd64
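The two checks above (ERR lines in the logs and the Harvest version inside the container) can be collected for several pollers in one pass. A sketch, assuming bin/harvest resolves relative to the container's working directory, as it did in the reply above; poller_health is a hypothetical helper name and takes container names as arguments.

```sh
# Print "<container>: errors=<count> <version>" for each poller given.
poller_health() {
    for p in "$@"; do
        # grep -c prints 0 (and exits 1) when nothing matches; keep going.
        errs=$(docker logs "$p" 2>&1 | grep -c ERR || true)
        ver=$(docker exec "$p" bin/harvest --version)
        printf '%s: errors=%s %s\n' "$p" "$errs" "$ver"
    done
}

# Example: poller_health poller-goliath poller-fletcher-b
```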
Thanks! I see the problem: it's an oversight in our documentation. Sorry about that; we'll fix it. Step 3 of https://netapp.github.io/harvest/23.02/install/containers/#upgrade-harvest says to regenerate your harvest-compose.yml, which assumes an upgrade from one release to the latest release, but that's not what you want. You want to use the nightly image instead of 23.02. Please use the following command instead and then run step 4 again: bin/harvest generate docker full --image ghcr.io/netapp/harvest:nightly --port --output harvest-compose.yml
With --image ghcr.io/netapp/harvest:nightly, we're telling the generate command to use the nightly image instead of the default, latest.
Do I need to stop and remove the containers? We ran into name conflicts when restarting everything...
Typically, docker-compose -f prom-stack.yml -f harvest-compose.yml up -d --remove-orphans is sufficient. If you want to stop them, you can run docker-compose -f prom-stack.yml -f harvest-compose.yml down and then the up command. If you have them handy, can you share the conflict errors so we can take a look?
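The two steps (regenerate against the nightly image, then restart) can be scripted together. A sketch: HARVEST_BIN and COMPOSE are hypothetical overrides introduced here, not Harvest settings, and COMPOSE can be set to "docker compose" for Compose v2. The pull step is an addition to the documented steps: compose up does not re-pull a tag that is already cached locally, so an older :nightly image would otherwise be reused. Note that pull also refreshes the images referenced by prom-stack.yml.

```sh
# Hypothetical overrides for this sketch (not Harvest settings).
HARVEST_BIN="${HARVEST_BIN:-bin/harvest}"
COMPOSE="${COMPOSE:-docker-compose}"

upgrade_to_nightly() {
    # Step 3, pointed at the nightly image instead of the default latest.
    "$HARVEST_BIN" generate docker full \
        --image ghcr.io/netapp/harvest:nightly \
        --port --output harvest-compose.yml || return 1
    # Refresh a locally cached :nightly tag; "up" alone would reuse it.
    $COMPOSE -f prom-stack.yml -f harvest-compose.yml pull || return 1
    # Step 4: recreate the containers on the new image and templates.
    $COMPOSE -f prom-stack.yml -f harvest-compose.yml up -d --remove-orphans
}

# Example (run from the Harvest directory): upgrade_to_nightly
```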
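The stop-then-start sequence can be wrapped in one function. A sketch: COMPOSE is a hypothetical override (docker-compose by default). down removes the stack's containers and networks but leaves named volumes alone unless -v is passed, so, assuming prom-stack.yml keeps Prometheus data in a named volume, history survives the restart.

```sh
# Hypothetical override for this sketch.
COMPOSE="${COMPOSE:-docker-compose}"

restart_stack() {
    # Stop and remove containers and networks; named volumes are kept
    # because -v is not passed.
    $COMPOSE -f prom-stack.yml -f harvest-compose.yml down || return 1
    # Start again, removing containers for services no longer defined.
    $COMPOSE -f prom-stack.yml -f harvest-compose.yml up -d --remove-orphans
}

# Example (run from the Harvest directory): restart_stack
```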
That was the solution! Thank you very much!
I was not able to reproduce the name issue, and it went smoothly without stopping the pollers.
Last time I stopped them because I was upgrading from a Harvest version earlier than 22.11 and wanted to keep historical data...
Thanks again!
Hi @wary lichen, here's the name conflict reproduced (I'm trying to upgrade to the stable release):
# docker-compose -f prom-stack.yml -f harvest-compose.yml up -d --remove-orphans
[+] Running 0/0
[...]
Error response from daemon: Conflict. The container name "/poller-fletcher-b" is already in use by container "b71c8e67225b822c29c1b12541239aa16f9020b405607fd8293fc5e30755d8a8". You have to remove (or rename) that container to be able to reuse that name.
Any idea on the cause?
@tame iron Have you regenerated the harvest-compose.yml or executed any other commands as part of the upgrade process?
I followed the upgrade procedure: https://netapp.github.io/harvest/23.05/install/containers/#upgrade-harvest
The only difference is that I merged prom-stack.yml with our customizations.
If I try to start everything multiple times, the error mentions different pollers that are currently running.
You can get this error if the image name has changed. In this release we moved our Docker image to GitHub. When you try to start a container with the same name as a previous container that was running with a different image name, Docker may try to create a new container with the same name, which conflicts with the previous container that is still running. I tried upgrading to the latest release but was unable to reproduce it. If you're still getting this error, can you try the command below?
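To check whether the image-name explanation applies, list which image each poller container was created from. A sketch, assuming poller containers are named poller-<name> and that the new images live under ghcr.io/netapp/harvest (the repository used for the nightly image earlier in this thread); both helper names are hypothetical.

```sh
# Print "name image" for every poller container, running or stopped.
poller_images() {
    docker ps -a --filter name=poller- --format '{{.Names}} {{.Image}}'
}

# Print the names of poller containers whose image does not come from the
# expected repository (default: ghcr.io/netapp/harvest).
stale_image_pollers() {
    want="${1:-ghcr.io/netapp/harvest}"
    poller_images | awk -v want="$want" 'index($2, want) != 1 { print $1 }'
}

# Example: stale_image_pollers
```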
docker-compose -f prom-stack.yml -f harvest-compose.yml up -d --force-recreate --remove-orphans
Will this erase historical data?
It will not. It only affects the containers and does not modify any Docker volumes.
Same issue, even with --force-recreate.
Okay, is this error for all containers or only some of them?
It seems random across all the defined pollers.
Okay, let's try the commands below. We'll stop the containers and then start them:
docker-compose -f prom-stack.yml -f harvest-compose.yml down
docker-compose -f prom-stack.yml -f harvest-compose.yml up -d --remove-orphans
The down command doesn't stop the containers; they keep running...
Could it be related to the directory change? https://stackoverflow.com/questions/57361075/docker-compose-command-is-failing-with-conflict
If I cd into the old version's directory, docker-compose down stops all the running containers 🙂
Great! Can you please provide the diff of the two compose files for Harvest, or share them with us? I think the issue might be caused by a difference in the Docker image name.
I was able to recreate the issue locally. Looking into it.
I'll send you compose files via PM
Thanks. It appears that Docker Compose sets the com.docker.compose.project.working_dir label on the containers it creates, and changing the working directory leads to the issue you've highlighted.
To resolve this, you can try stopping and removing the containers that were created using the old working directory, and then start new containers with the new working directory. This is a temporary workaround and we'll work on finding a more permanent solution to this issue.
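Containers created from another directory can be spotted by that label. A sketch: other_dir_containers is a hypothetical helper that lists containers carrying com.docker.compose.project.working_dir whose value differs from the directory you pass (default: the current one); those are the ones to stop from their original directory.

```sh
# List "name (working_dir)" for Compose-created containers whose working
# directory differs from the given one (default: current directory).
other_dir_containers() {
    here="${1:-$PWD}"
    docker ps -a --filter label=com.docker.compose.project.working_dir \
        --format '{{.Names}}|{{.Label "com.docker.compose.project.working_dir"}}' |
        awk -F'|' -v here="$here" '$2 != here { print $1 " (" $2 ")" }'
}

# Example (from the new release directory): other_dir_containers
```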
It was solved with:
cd $old-dir
docker-compose -f prom-stack.yml -f harvest-compose.yml down
cd $new-dir
docker-compose -f prom-stack.yml -f harvest-compose.yml up -d --remove-orphans
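The same workaround as a reusable function. A sketch: migrate_stack and COMPOSE are hypothetical names. Each cd runs in a subshell so the caller's working directory does not change, and the start step is skipped if stopping the old stack fails.

```sh
# Hypothetical override for this sketch.
COMPOSE="${COMPOSE:-docker-compose}"

# Stop the stack from the directory that created it, then start it from
# the new release directory.
migrate_stack() {
    old_dir="$1"
    new_dir="$2"
    [ -d "$old_dir" ] && [ -d "$new_dir" ] || {
        echo "usage: migrate_stack OLD_DIR NEW_DIR" >&2
        return 1
    }
    (cd "$old_dir" && $COMPOSE -f prom-stack.yml -f harvest-compose.yml down) || return 1
    (cd "$new_dir" && $COMPOSE -f prom-stack.yml -f harvest-compose.yml up -d --remove-orphans)
}

# Example: migrate_stack /path/to/old-release /path/to/new-release
```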
I'll add this workaround to the upgrade process.
Thanks for helping to fix the issue!