#New StorageGRID dashboard


wary lichen
#

Hey all, wanted to share that we've added a new StorageGRID overview dashboard with performance, storage, information lifecycle management, and node panels. These changes are in nightly if you want to give feedback before we release the next version of Harvest later this month. You can grab the latest nightly build from https://github.com/NetApp/harvest/releases/tag/nightly and let us know what you think.

tame iron
#

Hello @wary lichen, it's cool that you added StorageGRID dashboards. Which kinds of authentication are supported for the poller? Is an admin user mandatory?

Thanks

shadow pebble
tame iron
#

Thanks @shadow pebble, sorry I missed it, should have RTFM 😉

trim roost
#

Hi @wary lichen, quick question: is it possible to add performance metrics for buckets and tenants to a dashboard? What would be cool is the same kind of performance charts you get in StorageGRID when you configure a link classification, which gives you all the performance data for a tenant or bucket. This is what we are looking for, but we can't configure a link classification for all our buckets because there are too many. It would be perfect if we could have this kind of information in Harvest/NAbox so we can compare and drill down into the performance of all our buckets.

wary lichen
#

hi @trim roost, we'll take a look. I might come back to you with clarifying questions 🙂 Also, not sure if you saw, but we added traffic classification metrics to nightly last month, like so

dull escarp
#

@wary lichen it seems weird: @trim roost and I can't find the dashboards in Grafana on NAbox, even though we reset the dashboards via the button in NAbox after updating. Do we need to manually import the new dashboards? Just FYI, we are currently on 23.02.0-1 and only see the tenants dashboard. 🙂

wary lichen
#

reset and import should work, i believe Yann added support for that. Traffic classification was added on March 10 though so it requires a nightly or waiting until 23.05 ships. If you want to try these out before the next release, you can grab the latest nightly https://github.com/NetApp/harvest/releases/tag/nightly

dull escarp
#

ahhh that makes sense, thanks for the info @wary lichen 🙂

trim roost
#

awesome, yeah, that gives us so much valuable insight, thx. What we are missing is a dashboard that shows which buckets generate the most ops and bandwidth. If we could add that, it would be great 🙂

wary lichen
#

I don't think that's possible but will double check. To my knowledge, SG does not publish bucket metrics with ops, throughput, etc 🙁

tame iron
#

Hi Chris, inside SG manager (11.6) there are metrics for any configured traffic classification policy, we have a policy for every tenant. Unfortunately they're not human readable. Policy named "joe" is shown as an id in the graphs...

wary lichen
#

Hi @tame iron are you talking about the ones on this page and this dashboard or a different one?

#

If so, you're in luck: nightly and the next version of Harvest have a traffic classification row with human-readable names, like so

rustic gulch
#

I just upgraded nabox to the current beta and upgraded harvest to the nightly. I configured 2 grids and love what I'm seeing. Thanks guys (and gals)!

rustic gulch
#

2 additions I'd like to see:
1: graph grid consumption as a percentage, not just in PB. This would make a display of multiple graphs on the same dashboard more useful and make it more obvious just how full the grid is.

2: graph by storage pool. Although we have only 1 storage pool per grid in use here, I expect that some customers will have multiple pools.

wary lichen
#

thanks Ed, we'll take a look

rustic gulch
#

Yann rejected #2 (graph by storage pool) on the nabox feedback site, deferring to the Harvest team. You're welcome @wary lichen 🙂

rustic gulch
#

I'm seeing a difference between network traffic reported by nabox/harvest for the grid vs what the Grid admin console is giving me. nabox is reporting a peak of 13.2GB/sec last night, but the General / Grid dashboard from the admin interface is peaking at 72.6Gb/sec at the same time. Both of those are just received traffic. That's about a 2x difference. Are they measuring different things?
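
The two peaks above are quoted in different units (GB/sec and Gb/sec), so it may help to put them on the same scale before comparing. A minimal sketch, assuming the capitalisation is meaningful (B for bytes, b for bits) and that both tools use decimal prefixes; the function names are hypothetical and not part of Harvest or StorageGRID:

```shell
#!/bin/sh
# Convert between gigabytes/sec and gigabits/sec (1 byte = 8 bits).
# Assumes decimal prefixes on both sides, so only the factor of 8 matters.
gbytes_to_gbits() { awk -v v="$1" 'BEGIN { printf "%.1f\n", v * 8 }'; }
gbits_to_gbytes() { awk -v v="$1" 'BEGIN { printf "%.1f\n", v / 8 }'; }

gbytes_to_gbits 13.2   # NAbox/Harvest peak expressed in Gb/sec
gbits_to_gbytes 72.6   # Grid Manager peak expressed in GB/sec
```

If the figures still disagree once they share a unit, the remaining gap would have to come from what each panel measures, not from the unit.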

wary lichen
#

hi @rustic gulch most of those metrics are coming from SG's Prometheus metrics. Same as in this screenshot. Are you using SG's grid manager or something else when comparing?

austere vortex
#

"Data storage over time" could use some decimals. I was very confused at the first look. 😁

wary lichen
#

looks like we set it to auto out-of-the-box. Changing it to 1 or 2 doesn't help in your case does it?

austere vortex
#

If I set it to 2 Decimals I get 2.

#

Grafana 9.4.7

wary lichen
#

thanks for the follow-up, we'll change

austere vortex
#

The percentage query (C) for Panel "Data space usage breakdown" is not working correctly, at least for VictoriaMetrics. The second "sum by" is missing "cluster" to get matching labels. Working query:

sum by(site_name,cluster)(storagegrid_storage_utilization_data_bytes{datacenter=~"$Datacenter",cluster=~"$Cluster"})/sum by(site_name,cluster)(storagegrid_storage_utilization_usable_space_bytes{datacenter=~"$Datacenter",cluster=~"$Cluster"} + storagegrid_storage_utilization_data_bytes{datacenter=~"$Datacenter",cluster=~"$Cluster"})

#

The "Metadata allowed space usage breakdown" panel has three queries but only uses one in the table.

wary lichen
#

thanks! we'll fix those

wary lichen
austere vortex
#

Fix confirmed, looks good! 👍

wary lichen
#

thanks for checking!

tame iron
wary lichen
#

hi @tame iron, you're looking at the StorageGrid: Overview dashboard? If so, that var is populated via the prom query label_values(storagegrid_private_load_balancer_storage_rx_bytes{datacenter=~"$Datacenter",cluster=~"$Cluster"},policy). If you check Prometheus, do you see metrics for storagegrid_private_load_balancer_storage_rx_bytes? If so, let's take a look at the logs for your sg poller.
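
One way to run that check from a shell is Prometheus's HTTP API, which lists label values at /api/v1/label/<name>/values. A minimal sketch, assuming Prometheus is reachable at http://localhost:9090 (adjust for your setup); the helper names are hypothetical:

```shell
#!/bin/sh
# Assumption: Prometheus is reachable here; change to match your install.
PROM_URL="${PROM_URL:-http://localhost:9090}"
METRIC="storagegrid_private_load_balancer_storage_rx_bytes"

# Build the API path that lists the values of one label.
label_values_path() { printf '/api/v1/label/%s/values\n' "$1"; }

# Ask Prometheus for the policy values seen on the metric.
# An empty "data" array in the JSON reply means no matching series exist.
list_policies() {
  curl -sG "${PROM_URL}$(label_values_path policy)" \
    --data-urlencode "match[]=${METRIC}"
}

# Run manually: list_policies
```

This skips the $Datacenter and $Cluster filters the dashboard variable applies, so it only answers whether the metric has any policy label at all.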

#

maybe you are missing the Metrics query?

tame iron
#

This is the group configuration

wary lichen
#

that looks good

tame iron
#

The logs are full of this:
2023-04-26T155037Z WRN storagegrid/storagegrid.go:165 > no instances on storagegrid Poller=goliath collector=StorageGrid:Prometheus metric=storagegrid_s3_operations_unauthorized
2023-04-26T155037Z INF collector/collector.go:483 > Collected Poller=goliath apiMs=440 calcMs=0 collector=StorageGrid:Prometheus instances=5300 metrics=5300 parseMs=0 pluginMs=0

wary lichen
#

When you upgraded to nightly, did you follow these steps? https://netapp.github.io/harvest/23.02/install/containers/#upgrade-harvest
We can check. My guess is your container is not using the new nightly storagegrid template (that template includes storagegrid_private_load_balancer_storage_rx_bytes, while 23.02 does not). You can verify by running the following, changing poller-dc1 to the name of one of your Docker pollers, found via docker ps -a:
docker exec -it poller-dc1 cat /opt/harvest/conf/storagegrid/11.6.0/storagegrid_metrics.yaml | grep storagegrid_private_load_balancer_storage_rx_bytes
For me it prints:
- storagegrid_private_load_balancer_storage_rx_bytes => private_load_balancer_storage_rx_bytes
I'm guessing for you it won't print anything.
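
The same check can be wrapped so it runs against every poller at once. A rough sketch, not an official Harvest tool: the function names are hypothetical, and it assumes poller containers are named with a "poller-" prefix, as in the examples in this thread:

```shell
#!/bin/sh
METRIC="storagegrid_private_load_balancer_storage_rx_bytes"
TEMPLATE="/opt/harvest/conf/storagegrid/11.6.0/storagegrid_metrics.yaml"

# Read template text on stdin and report whether the metric is listed.
template_has_metric() {
  if grep -q -- "$METRIC"; then
    echo "present"
  else
    echo "missing"
  fi
}

# Check one poller container, e.g. check_poller poller-dc1
# (-it is dropped because the output is piped rather than shown on a TTY).
check_poller() {
  docker exec "$1" cat "$TEMPLATE" </dev/null | template_has_metric
}

# Assumption: poller containers are named with a "poller-" prefix.
check_all_pollers() {
  docker ps --format '{{.Names}}' | grep '^poller-' | while read -r name; do
    printf '%s: %s\n' "$name" "$(check_poller "$name")"
  done
}

# Run manually: check_all_pollers
```

A poller reported as "missing" is still running a template that predates the traffic classification metrics.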

tame iron
#

docker exec -it poller-goliath cat /opt/harvest/conf/storagegrid/11.6.0/storagegrid_metrics.yaml | grep storagegrid_private_load_balancer_storage_rx_bytes

- storagegrid_private_load_balancer_storage_rx_bytes => private_load_balancer_storage_rx_bytes
#

The policy names contain space characters. Could that be an issue?

wary lichen
#

cool, it's there. After you downloaded the tar.gz and untarred it, did you also run docker-compose -f prom-stack.yml -f harvest-compose.yml up -d --remove-orphans so the previously running containers would read the new templates?

wary lichen
#

weird, let me check on spaces. Any errors in your sg poller? docker logs $poller-sg 2>&1 | grep ERR and what does docker exec -it $poller-sg bin/harvest --version return?

tame iron
#

docker logs poller-goliath 2>&1 | grep ERR

docker exec -it poller-goliath bin/harvest --version

harvest version 23.02.0-1 (commit 627311aa) (build date 2023-02-21T13:35:23+0000) linux/amd64

wary lichen
#

thanks! I see the problem - it's an oversight in our documentation. Sorry about that, will fix. Step 3 of https://netapp.github.io/harvest/23.02/install/containers/#upgrade-harvest says to regenerate your harvest-compose.yml which assumes a latest release to latest release upgrade but that's not what you want. You want to use the nightly image instead of 23.02. Please use the following cmd instead and then run step 4 again. bin/harvest generate docker full --image ghcr.io/netapp/harvest:nightly --port --output harvest-compose.yml

#

with --image ghcr.io/netapp/harvest:nightly we're telling the generate command to use the nightly image instead of the default latest
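
After regenerating, it can be worth confirming that the compose file and the running containers really point at the nightly image. A rough sketch: the helper names are hypothetical, and it assumes the generated file declares each service's image on its own `image:` line:

```shell
#!/bin/sh
# List the distinct images referenced by a compose file.
# Assumes each service declares its image on its own "image: <name>" line.
compose_images() {
  awk '$1 == "image:" { print $2 }' "$1" | sort -u
}

# Show which image a running container was created from.
container_image() {
  docker inspect --format '{{.Config.Image}}' "$1"
}

# Run manually, for example:
#   compose_images harvest-compose.yml
#   container_image poller-goliath
```

If container_image still shows the old image after the compose file has changed, the containers have not been recreated yet.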

tame iron
#

Do I need to stop and remove the containers? We experienced name conflicts when restarting everything...

wary lichen
#

Typically docker-compose -f prom-stack.yml -f harvest-compose.yml up -d --remove-orphans is sufficient. If you want to stop you can use docker-compose -f prom-stack.yml -f harvest-compose.yml down and then the up. If you have them handy, can you share the conflict errors and we'll take a look?

tame iron
wary lichen
#

sorry for the trouble, doc update incoming!

tame iron
#

I was not able to reproduce the name issue, but it has been smooth without stopping pollers.
Last time I stopped them because I was upgrading from a Harvest version earlier than 22.11 and wanted to keep historical data...
Thanks again!

tame iron
#

Hi @wary lichen , here's the name conflict reproduced (I'm trying to upgrade to the stable release)

# docker-compose -f prom-stack.yml -f harvest-compose.yml up -d --remove-orphans
[+] Running 0/0
[...]
Error response from daemon: Conflict. The container name "/poller-fletcher-b" is already in use by container "b71c8e67225b822c29c1b12541239aa16f9020b405607fd8293fc5e30755d8a8". You have to remove (or rename) that container to be able to reuse that name.

Any idea on the cause?

shadow pebble
#

@tame iron Have you regenerated the harvest-compose.yml or executed any other commands as part of the upgrade process?

tame iron
shadow pebble
#

It is possible to get this error if the image name has changed. In this release we moved our Docker image to GitHub. When you try to start a container with the same name as a previous container that was running with a different image name, Docker may try to create a new container with the same name, which conflicts with the previous container that is still running. I tried upgrading to the latest release but was unable to reproduce it. If you are still getting this error, can you try the command below?

docker-compose -f prom-stack.yml -f harvest-compose.yml up -d --force-recreate --remove-orphans

tame iron
#

Will this erase historical data?

shadow pebble
#

It will not. It only affects the containers and does not modify any Docker volumes.

tame iron
#

Same issue also with --force-recreate

shadow pebble
#

okay is this error for all containers or only some of them?

tame iron
#

It seems to be random across all defined pollers.

shadow pebble
#

Okay. Let's try the commands below. We'll stop the containers and then start them again.

docker-compose -f prom-stack.yml -f harvest-compose.yml down
docker-compose -f prom-stack.yml -f harvest-compose.yml up -d --remove-orphans

tame iron
#

If I change dir to the old version, docker-compose down stops all running containers 🙂

shadow pebble
#

Great! Can you please provide the diff of the two compose files for Harvest, or share them with us? I think the issue might be caused by a difference in the Docker image name.

#

I am able to recreate the issue locally. Looking into it

tame iron
#

I'll send you compose files via PM

shadow pebble
#

Thanks. It appears that the com.docker.compose.project.working_dir setting is set by Docker Compose in the containers it creates, and changing it will lead to the issue you've highlighted.

To resolve this, you can try stopping and removing the containers that were created using the old working directory, and then start new containers with the new working directory. This is a temporary workaround and we'll work on finding a more permanent solution to this issue.

tame iron
#

It has been solved with

cd $old-dir
docker-compose -f prom-stack.yml -f harvest-compose.yml down
cd $new-dir
docker-compose -f prom-stack.yml -f harvest-compose.yml up -d --remove-orphans

I'll add this workaround for the upgrade process

Thanks for helping to fix the issue!
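
The workaround above could be wrapped in a small function so the two directories are explicit. This is a sketch only: recreate_stack and the COMPOSE override are hypothetical names, and the two directory arguments stand in for the old and new Harvest install directories:

```shell
#!/bin/sh
# Stop the stack from the old install directory, then start it from the new one.
# Usage: recreate_stack /path/to/old-harvest /path/to/new-harvest
recreate_stack() {
  old_dir="$1"
  new_dir="$2"
  # Hypothetical override: set COMPOSE="docker compose" for the v2 plugin.
  compose="${COMPOSE:-docker-compose}"
  # $compose is left unquoted on purpose so "docker compose" splits into two words.
  ( cd "$old_dir" && $compose -f prom-stack.yml -f harvest-compose.yml down ) &&
  ( cd "$new_dir" && $compose -f prom-stack.yml -f harvest-compose.yml up -d --remove-orphans )
}
```

Each step runs in a subshell, so the caller's working directory is left unchanged, and the new stack is only started if the old one was brought down successfully.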