#different port

upper sundial
#

Hello, just installed nabox4 and wanted to add storagegrid. This did not work because we're running it on a different port behind our load balancer. Is there currently a way to support this situation?
I've tried appending :xxx (the port number) to the server address, but this is not allowed.
It would be nice to be able to change the port number (and also the http/https option; we use https on a different port, but I could imagine someone not using https).
Is this something that could be considered as a feature for a future update?

tidal dew
#

@fierce ridge is this NABox related?

fierce ridge
#

Very likely 😄

#

That said, https is pretty standard

#

Shame I just released 4.0.8 😄

#

Allowing ":" for port number is a quick change. But it's still going to speak https

upper sundial
#

https is fine

fierce ridge
#

Unless we say != 443 => http
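
As a rough illustration of the address handling being discussed (not NAbox's actual code), a `host[:port]` string could be normalized as below. The function name `storagegrid_base_url`, the default port 443, and the example hostname are assumptions made for this sketch; the scheme stays https, as agreed above.

```python
from urllib.parse import urlsplit


def storagegrid_base_url(address: str, default_port: int = 443) -> str:
    """Turn 'host' or 'host:port' into an https base URL (hypothetical helper)."""
    if "/" in address:
        # Only a bare host[:port] is expected here, not a full URL.
        raise ValueError(f"expected host[:port], got {address!r}")
    # urlsplit only recognises host:port when a '//' netloc prefix is present.
    parts = urlsplit("//" + address)
    if parts.hostname is None:
        raise ValueError(f"no host found in {address!r}")
    # Accessing .port raises ValueError for a non-numeric or out-of-range port.
    port = parts.port if parts.port is not None else default_port
    host = parts.hostname
    if ":" in host:
        # IPv6 literals need their brackets back in a URL.
        host = f"[{host}]"
    return f"https://{host}:{port}"


print(storagegrid_base_url("sg.example.com:8443"))
print(storagegrid_base_url("sg.example.com"))
```

The "!= 443 => http" idea would be a one-line change to the scheme selection, but it would also make https on a custom port impossible, which is the case reported here.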

#

will send a build shortly

upper sundial
#

Man, you're blazing fast !

fierce ridge
#

That's what she said.

#

sorry couldn't resist

upper sundial
#

😆

fierce ridge
#

Now you can use ":" in address and I believe it should work !

upper sundial
#

Thanks ! Downloading as we speak.

#

We're still migrating from NAbox 3 to 4, which takes some time because there is around 1 TB of data. So I will have to wait until that finishes before installing the patch.

fierce ridge
#

That sounds wise !

upper sundial
#

I wish I had added some CPUs to the NABox4 instance. 2 isn't enough. (NABox3 was running 4 vCPUs and NABox4 is pinned at 100% on 2 vCPUs during migration.) But we're halfway there. Doubling and restarting would take the same time as keeping it running, I think.

fierce ridge
#

Probably, I should make a note of it. But that would require also modifying the args for VM migration, it has arguments for number of threads

upper sundial
#

Ah. Ok. I assumed 2 threads was because of the 2 vCPU on destination. Well it's just a one-time deal.

upper sundial
#

@fierce ridge To confirm. The update works like a charm. I have the storagegrid added to the NABox. Now we have to figure out if we can get growth per tenant out of some graph.

tidal dew
#

There is a panel in the Tenants dashboard, "Tenants by Capacity", which may help.

upper sundial
#

There is no such panel in my Tenants dashboard. (just Tenant Quota and Buckets) However in the Overview dashboard I can see tenants by logical space used and quota used.

tidal dew
#

Do you see the panel below in the Tenants dashboard?

#

This and logical space used in overview dashboard for tenants should be same.

#

You can use either of these.

upper sundial
#

Yes. Thanks!

#

this is what I see

tidal dew
#

Could you share Harvest version?

#

These panels were only added in latest release

#

Could you upgrade to 24.11.1?

upper sundial
#

The default that came with nabox4. 24.08.0. I will upgrade.

#

Ok. That did it. I see that now.

tidal dew
#

Great!

upper sundial
#

Unfortunately the Tenant dashboard is empty when I checked today. The regular dashboard is showing data.

tidal dew
upper sundial
#

Ok. Uploaded.

tidal dew
#

Okay. Tenant collection failed during init itself

#

Could you try restarting Harvest container and see if it fixes the issue.

upper sundial
#

ok. after docker restart the graph has entries again.

upper sundial
#

this ranks very high on my weird-shit-o-meter ... nabox3 shows like something stopped november...

#

when I import the same (default volume dashboard) from nabox3 into nabox4 .... the graph picks up where the nabox3 one drops off ?

#

The data from nabox3 was migrated to that nabox4 ....

vernal stirrup
#

hi @upper sundial what versions of Harvest are you running in nabox3 and nabox4? Recent versions of Harvest do not include a panel with that title in the volume dashboard. I suspect what you're observing is a difference in how topK is displayed. When you say, "the graph picks up where the nabox3 one drops off", are you referring to the graphs between the two red lines here? Are the yellow and orange lines for the same volume or different ones?

upper sundial
#

nabox3 has harvest 23.05.0-1 and the nabox4 has 24.11.1 The volume dashboard from nabox3 was exported and imported in nabox4 to have the exact same graphs. This is a selection of one volume. So yes, the green and yellow lines are the same volume as is the orange and yellow or green and orange on the nabox4 graph.

#

(The migrate was done this year so all data from last year was collected on harvest 23/nabox3 and migrated to nabox4. I could imagine that the difference would be around the newyear period from old to new harvest but it's all in 2024.)

#

maybe a clue... Oct 27th we did upgrade ontap on the cluster. and maybe the other change was a previous update.

#

ok march 31st was also an upgrade of ontap.

vernal stirrup
#

maybe. Did you create that panel yourself? Looking at Harvest version 23.05.0-1, the dashboard panel you shared above is not from that version. Version 22.11.0 is the last version of Harvest that included that panel title https://github.com/NetApp/harvest/blob/release/22.11.0/grafana/dashboards/cmode/volume.json#L460
In 23.02.0 we changed all titles with topk to indicate that in their titles
https://github.com/NetApp/harvest/blob/release/23.02.0/grafana/dashboards/cmode/volume.json#L461

upper sundial
#

I did not create the panel myself. It was a default panel. Maybe it stayed after upgrading harvest? Still, the question is why it behaves this way. I could imagine that a volume would get a different UUID and drop out of the graph, but not that it would come back after some time or after changing harvest/nabox.

tidal dew
#

@upper sundial Could you share the Grafana versions from NABox3 and NABox4? I suspect that the differences you are observing are due to the large time range selection and the way Grafana is selecting points to plot. Let's try zooming into one week of the month and then compare the results. Do they look similar?

upper sundial
#

month on nabox3

#

month on nabox4. it misses a part where harvest was not running due to some init error. restarting container fixed that

tidal dew
#

This data looks similar, right?

upper sundial
#

grafana 9.5.14 on nabox3 and grafana 11.4.0 on nabox4. Yes this looks similar so I guess you're on to something.

#

The specific pattern on that volume is a 2 hour spike in traffic for an hour or so. This is creating this graph.

tidal dew
#

We have observed instances where selecting larger time ranges in Grafana may not display all spikes, if any. In this case, we are comparing two different Grafana versions. It is possible that the queries sent from Grafana differ slightly between the versions. You can verify this by checking the network tools in your browser.

upper sundial
#

(every 2 hours, I mean: every 2 hours the traffic/throughput/latency goes up for around an hour. looks like the specific virtual machine is doing some task scheduled every 2 hours)

upper sundial
#

That is during the same grafana if you look at the nabox graph.

#

The changes on 31st March and 27th October match ontap upgrades. So it looks like that triggered a change.

tidal dew
#

I don't believe the ONTAP upgrade is causing the difference here. Let's zoom into the relevant time period you mentioned, focusing on approximately 15 days of data. I expect the data should be the same

upper sundial
#

Looking at the month graph the data is indeed the same. However it's really a shame that the year graph is not helping. What I wanted to find out is since when this traffic behaviour has been going on. It was longer than a month. Going further back using the year graph it is not really helping. It looked like the traffic suddenly stopped and suddenly started.

#

A longer period normally causes the data to become more spread out, or flatter, due to averaging. This is not that.

tidal dew
#

I believe it's not even averaging. Grafana selects a single point from each step size, and this step size changes as you adjust the time range. If you enable the Prometheus query log, you will notice changes in the step size as you modify the interval in Grafana. With a step size of n, Grafana picks one point from each interval of length n. Even the size of the panel matters here. Increasing the panel size in Grafana allows it to display more data points, which can change the appearance of the graph as well.

#

The longer the time range, the more problematic the data representation can become. It might be better for the panel to perform a different type of query that averages the data for longer duration graphs, providing more of a summary.
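
To make the difference between picking one point per step and averaging concrete, here is a small self-contained simulation. It is only a sketch with made-up numbers: the 10 ms / 1 ms latency values, the one-minute resolution, and all function names are assumptions, and it does not reproduce Grafana's or Prometheus's exact evaluation rules. The traffic pattern (a burst of about an hour every 2 hours) and the 6-hour step are taken from this thread.

```python
CYCLE_MIN = 120  # one burst every 2 hours (pattern described in this thread)
BURST_MIN = 60   # each burst lasts about an hour


def latency_at(minute: int) -> float:
    """Synthetic latency in ms: high during a burst, low otherwise."""
    return 10.0 if minute % CYCLE_MIN < BURST_MIN else 1.0


def sample_per_step(start: int, end: int, step: int) -> list[float]:
    """Keep a single raw point at each evaluation time."""
    return [latency_at(t) for t in range(start, end, step)]


def average_per_step(start: int, end: int, step: int) -> list[float]:
    """Average the trailing window of length `step` at each evaluation time."""
    result = []
    for t in range(start, end, step):
        window = [latency_at(m) for m in range(t - step, t)]
        result.append(sum(window) / len(window))
    return result


WEEK = 7 * 24 * 60  # minutes
STEP_6H = 6 * 60    # minutes

# Same data, same step, different start offset: the sampled picture flips.
print(set(sample_per_step(0, WEEK, STEP_6H)))    # every point lands inside a burst
print(set(sample_per_step(60, WEEK, STEP_6H)))   # every point lands between bursts
print(set(average_per_step(0, WEEK, STEP_6H)))   # windowed mean, same at any offset
```

Because a 6-hour (or 12-hour) step is a whole multiple of the 2-hour cycle, every sampled point falls at the same phase of the cycle, so a small shift in alignment can make a plain sampled graph look permanently high or permanently low. Whether that is what happened on this volume is not confirmed here.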

upper sundial
#

Yes. That explains why it's doing this. It would make more sense if it was averaging. That graph would be more helpful. We have the specific behaviour in the vm that is not working well if you pick the wrong timestamps. A regular graph would not make a difference but this specific behaviour is now messing the graphs up.

tidal dew
#

Could you share this panel's Prometheus query?

upper sundial
#

topk($TopResources, volume_avg_latency{datacenter=~"$Datacenter",cluster=~"$Cluster",svm=~"$SVM",volume=~"$TopVolumeAvgReadLatency"})

#

Expr: topk(5, volume_avg_latency{datacenter=~"PDC2",cluster=~"prev-aff-cl1-pdc2",svm=~"svm-previder-02",volume=~"cl2_winvol1502"})
Step: 6h0m0s

tidal dew
#

Thanks.

upper sundial
#

Expr: topk(5, volume_avg_latency{datacenter=~"PDC2",cluster=~"prev-aff-cl1-pdc2",svm=~"svm-previder-02",volume=~"cl2_winvol1502"})
Step: 12h0m0s

#

looks like the newer query is doing 12 hour step instead of 6

upper sundial
#

For what I'm looking for, some kind of averaging would be better than randomly picking a measuring point. With gradually changing values, randomly picking a point is not an issue. With more dynamically behaving values, this can really mislead you.

tidal dew
#

Yes, we have discussed this in the past. Given that this value has to be static for averaging in the query, it may not be suitable for shorter time ranges. For shorter time ranges, we want to capture peaks accurately, as averaging them out would miss those peaks. However, for longer durations, we need smoothing to get a more meaningful summary.

For example, the query below may perform better for queries spanning months, but it will not be as effective for queries covering just one day or a few hours.

avg_over_time(volume_avg_latency{datacenter=~"$Datacenter", cluster=~"$Cluster", svm=~"$SVM", volume=~"$Volume", style!="flexgroup_constituent"}[3h])
and on(datacenter, cluster, svm, volume) 
topk($TopResources, avg_over_time(volume_avg_latency{datacenter=~"$Datacenter", cluster=~"$Cluster", svm=~"$SVM", volume=~"$Volume", style!="flexgroup_constituent"}[3h]) * on(datacenter, cluster, svm, volume) group_left() volume_labels{datacenter=~"$Datacenter", cluster=~"$Cluster", svm=~"$SVM", volume=~"$Volume", tags=~".*$Tag.*"})
upper sundial
#

query is not working unfortunately

tidal dew
#

What is the error?

#

We can also try using the $__interval variable, which adjusts its value as the time range changes. Could you try the following query to see if it works for both shorter and longer time ranges?

avg_over_time(volume_avg_latency{datacenter=~"$Datacenter", cluster=~"$Cluster", svm=~"$SVM", volume=~"$Volume", style!="flexgroup_constituent"}[$__interval])
and on(datacenter, cluster, svm, volume) 
topk($TopResources, avg_over_time(volume_avg_latency{datacenter=~"$Datacenter", cluster=~"$Cluster", svm=~"$SVM", volume=~"$Volume", style!="flexgroup_constituent"}[$__interval]) * on(datacenter, cluster, svm, volume) group_left() volume_labels{datacenter=~"$Datacenter", cluster=~"$Cluster", svm=~"$SVM", volume=~"$Volume", tags=~".*$Tag.*"})
#

Let's try with the query you have shared. I have modified it as below

topk($TopResources, avg_over_time(volume_avg_latency{datacenter=~"$Datacenter", cluster=~"$Cluster", svm=~"$SVM", volume=~"$TopVolumeAvgReadLatency"}[$__interval]))
upper sundial
#

yes. that is giving results

tidal dew
#

Yes, I suspect the interval value is quite large for a time range of one year. When we experimented with this previously, the results appeared unusual.

#

As a result, we have left it for Grafana to automatically select points for the time series, rather than manually guiding it to pick specific points. If we hardcode the value to, say, 3 hours, the graph will probably look fine for a year's worth of data but not for shorter time ranges.

topk($TopResources, avg_over_time(volume_avg_latency{datacenter=~"$Datacenter", cluster=~"$Cluster", svm=~"$SVM", volume=~"$TopVolumeAvgReadLatency"}[3h]))
upper sundial
#

I think the selected volume is not used in the last query and not acting on the relevant values ?

#

is denied by the document's Content Security Policy.

#

got error while trying to save copy of dashboard. it does save it but when editing panel it again shows error.

tidal dew
#

That is strange. I suspect it's some other issue. Maybe try reverting to an older version of this dashboard or deleting and re-importing it.

upper sundial
#

yeah. something is broken, cannot edit the original volume panel either without getting the error.

tidal dew
#

Okay. Better to delete and reimport.

upper sundial
#

Deleted the copy dashboards. Seems to be working again.

#

When looking at the volume dashboard from harvest 24.11.1 and having 1 year as time it gives an error saying duplicate time series.

#

When selecting a smaller time window it is ok.

#

30 days for example works

#

even 90 days is fine

tidal dew
#

Ok, it is possible that the aggregate has changed for the selected volume in the last year, leading to duplicate results.

#

What if you update the query to the one below? Does it work for a 1-year range?

volume_avg_latency{datacenter=~"$Datacenter", cluster=~"$Cluster", svm=~"$SVM", volume=~"$Volume", style!="flexgroup_constituent"}
and on(datacenter, cluster, svm, aggr, volume) 
topk($TopResources, avg_over_time(volume_avg_latency{datacenter=~"$Datacenter", cluster=~"$Cluster", svm=~"$SVM", volume=~"$Volume", style!="flexgroup_constituent"}[3h]) * on(datacenter, cluster, svm, aggr, volume) group_left() volume_labels{datacenter=~"$Datacenter", cluster=~"$Cluster", svm=~"$SVM", volume=~"$Volume", tags=~".*$Tag.*"})
upper sundial
#

Yes !

#

looking great

tidal dew
#

Cool, we'll fix this.

upper sundial
#

However I'm unable to save a copy dashboard.

tidal dew
#

I think that’s not allowed in NABox4. @fierce ridge

upper sundial
#

have created a different folder to avoid contaminating the original ones but still this error.

tidal dew
#

That’s the error in dashboard created in new folder?

upper sundial
#

yes, when trying to create copy of the volume dashboard in the new folder

tidal dew
#

looks like some permission related issue.

#

Do you see any error related to this in grafana logs?
After ssh to nabox machine, docker logs grafana?

upper sundial
# upper sundial looking great

still when looking again there is a lower portion of months. so partly fixed. it has data now but the original issue still there.

tidal dew
#

Okay do you mean that it is missing data for some months?

upper sundial
#

nabox3 shows this. high on the part where nabox4 shows low

tidal dew
#

Okay got it. Yeah that is due to the time range issue we discussed above.

tidal dew
#

Querying on a downsampled dataset may give better results. However, we first need to downsample the dataset before we can perform such queries.

fierce ridge
upper sundial
#

Nope, save as copy is giving the error.

#

but it does create something because with the same name it says that the dashboard exists.

#

Have to delete it to avoid issues.

#

If I don't delete them I get the same error when opening a different dashboard.

upper sundial
tidal dew
upper sundial
#

you mean this ? Yes. Uploaded it. (fyi my ip ends at .58, I saw a .104 connecting but that's a colleague)

fierce ridge
#

I get the operation is insecure error as well. But the dashboard is saved, don't try to save it in Harvest folder though I guess. I did a different folder

#

Looks like an issue with the new Content Security Policies

fierce ridge
#

Ok, issue fixed, let's get you a build

fierce ridge
#

I'd appreciate it if you can upgrade to the version I sent earlier for SG port, validate it, and then to this one for the dashboard thing

upper sundial
#

updated but the volume year graphs show no data. the 90 days do show statistics. Do I have to manually restart nabox ? Or would the upgrade restart all necessary services ?

#

I can confirm that save a copy of dashboard now works without error.

tidal dew
#

What is the error if you hover over or click on these red marks?

upper sundial
#

duplicate time series

tidal dew
#

Okay. I'll share fix for this shortly.

upper sundial
#

not working

tidal dew
#

Okay. I see that node is different which is causing duplicate time series.

upper sundial
#

We do regularly move volumes. Either to free up space on that aggregate or because we're replacing the HA-pair because the support contract is ending. So that could be the case.
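
A minimal sketch of why a volume move can surface as duplicate series: over a long range the same volume appears under two label sets that differ only in `node` (or `aggr`), so anything keyed on datacenter, cluster, svm and volume finds more than one candidate. The label values and the helper name `find_ambiguous_keys` below are hypothetical, and this is plain Python over dictionaries, not how Prometheus or Grafana detect the condition internally.

```python
from collections import defaultdict


def find_ambiguous_keys(series, key_labels):
    """Return {key: count} for keys that map to more than one distinct label set."""
    groups = defaultdict(set)
    for labels in series:
        key = tuple(labels[name] for name in key_labels)
        groups[key].add(frozenset(labels.items()))
    return {key: len(variants) for key, variants in groups.items() if len(variants) > 1}


# Hypothetical label sets seen over one year: vol1 was moved from node-a to node-b.
series_seen = [
    {"datacenter": "dc1", "cluster": "cl1", "svm": "svm1", "volume": "vol1", "node": "node-a"},
    {"datacenter": "dc1", "cluster": "cl1", "svm": "svm1", "volume": "vol1", "node": "node-b"},
    {"datacenter": "dc1", "cluster": "cl1", "svm": "svm1", "volume": "vol2", "node": "node-a"},
]

ambiguous = find_ambiguous_keys(series_seen, ["datacenter", "cluster", "svm", "volume"])
print(ambiguous)  # only vol1 is listed, with two variants
```

Over a 30 or 90 day window that does not include the move, only one variant exists per volume, which would be consistent with the shorter ranges working.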

tidal dew
#

Understood. Yeah, we need to handle that in the dashboard.

#

Let's focus on this Average latency panel and try to fix its query. Could you try the query below and see if it works for this panel?

volume_avg_latency{datacenter=~"$Datacenter",cluster=~"$Cluster",svm=~"$SVM",volume=~"$Volume",style!="flexgroup_constituent"}
and on(datacenter,cluster,svm,volume)
topk($TopResources, avg_over_time(volume_avg_latency{datacenter=~"$Datacenter",cluster=~"$Cluster",svm=~"$SVM",volume=~"$Volume",style!="flexgroup_constituent"}[3h] @ end())
and on(datacenter,cluster,svm,volume)
volume_labels{datacenter=~"$Datacenter",cluster=~"$Cluster",svm=~"$SVM",volume=~"$Volume",tags=~".*$Tag.*"})
upper sundial
#

yes. that works

tidal dew
#

cool. I'll fix for remaining panels and give you updated dashboard.

tidal dew
upper sundial
#

it seems to work partly. the middle graph is showing data for past year but left and right graphs just start from the moment nabox4 was the harvester it seems.

#

when specifically selecting the svm and volume there is the remarkable drop we have discussed earlier in the middle graph and outer graphs are missing data.

tidal dew
#

Could you edit the average latency panel and then inspect -> query as shown in the screenshot? Then, could you share the query that was executed?

tidal dew
upper sundial
#

in both cases

upper sundial
tidal dew
#

Okay, let's take the Volume Average Latency panel again and try the query below to see if it shows all 1-year data correctly without any duplicate series errors for the panel.

volume_avg_latency{datacenter=~"$Datacenter",cluster=~"$Cluster",svm=~"$SVM",volume=~"$Volume",style!="flexgroup_constituent"}
and on (datacenter, cluster, svm, volume)
topk($TopResources, avg_over_time(volume_avg_latency{datacenter=~"$Datacenter",cluster=~"$Cluster",svm=~"$SVM",volume=~"$Volume",style!="flexgroup_constituent"}[3h] @ end()) * on (datacenter, cluster, svm, volume) group_left(node) volume_labels{datacenter=~"$Datacenter",cluster=~"$Cluster",svm=~"$SVM",volume=~"$Volume",tags=~".*$Tag.*"})
upper sundial
#

and for the single volume

tidal dew
#

Okay so data is back but step size issue exists.

upper sundial
tidal dew
#

Okay we don't have a good way yet to handle step size issue.

tidal dew
upper sundial
#

the 1 year for all has data now on the 3 panels.

#

The middle top panel is still having an issue

#

there is the duplicate time series error

tidal dew
#

Could you check if below query works for middle top panel

sum(
  topk(
    $TopResources,
    volume_read_data{datacenter=~"$Datacenter", cluster=~"$Cluster", svm=~"$SVM", volume=~"$Volume", style!="flexgroup_constituent"}
    * on(datacenter, cluster, svm, volume) group_left(node)
    volume_labels{datacenter=~"$Datacenter", cluster=~"$Cluster", svm=~"$SVM", volume=~"$Volume", tags=~".*$Tag.*"}
  )
)
+
sum(
  topk(
    $TopResources,
    volume_write_data{datacenter=~"$Datacenter", cluster=~"$Cluster", svm=~"$SVM", volume=~"$Volume", style!="flexgroup_constituent"}
    * on(datacenter, cluster, svm, volume) group_left(node)
    volume_labels{datacenter=~"$Datacenter", cluster=~"$Cluster", svm=~"$SVM", volume=~"$Volume", tags=~".*$Tag.*"}
  )
)
upper sundial
tidal dew
#

Thanks for confirming!

tidal dew
#

@upper sundial I'll share a volume dashboard with you tomorrow, designed to work over longer time ranges where the data didn't make sense in the Grafana panels.

tidal dew
#

@upper sundial Could you please send us a test email at ng-harvest-files@netapp.com? We would like to share a new dashboard with you for testing via email. This new dashboard effectively handles long time ranges by presenting a summary of the data over time.

tidal dew
#

@upper sundial Gentle reminder.

upper sundial
#

Sorry, been away from discord for a while. Will send a mail.

tidal dew
#

Thanks. I have shared updated Volume dashboard via email for your feedback.

upper sundial
tidal dew
#

@fierce ridge