#Same Qtree is shown as multiple resources
hi @fluid hull we've seen this happen when the instance label for a metric changes. The instance label includes the poller's Prometheus port, which Prometheus uses to scrape. You can verify that the instance label changed by going to Prometheus and typing count by (instance,volume,qtree) (quota_files_used{volume="appdata_mms",qtree="appdata963"}). Do you see more than one row there? In the last six months, did the port for this poller change? That would cause this problem.
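A minimal sketch of the same check done offline, assuming you have saved the JSON that the Prometheus HTTP API returns for a range query on quota_files_used. The function name and the sample payload (including the instance values) are made up for illustration and are not taken from this thread.

```python
from collections import defaultdict


def instances_per_qtree(response):
    """Map (volume, qtree) -> sorted list of distinct instance labels.

    Works on the "result" list of a query or query_range response,
    since both carry the labels under the "metric" key.
    """
    seen = defaultdict(set)
    for series in response["data"]["result"]:
        labels = series["metric"]
        key = (labels.get("volume", ""), labels.get("qtree", ""))
        seen[key].add(labels.get("instance", ""))
    return {key: sorted(values) for key, values in seen.items()}


# Hypothetical payload in the shape of a query_range (matrix) response.
sample = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [
            {"metric": {"instance": "nabox:12990", "volume": "appdata_mms",
                        "qtree": "appdata963"},
             "values": [[1700000000, "10"]]},
            {"metric": {"instance": "nabox:12991", "volume": "appdata_mms",
                        "qtree": "appdata963"},
             "values": [[1700003600, "11"]]},
            {"metric": {"instance": "nabox:12991", "volume": "appdata_mms",
                        "qtree": "appdata001"},
             "values": [[1700003600, "3"]]},
        ],
    },
}

# Keep only the qtrees that were seen under more than one instance label.
changed = {k: v for k, v in instances_per_qtree(sample).items() if len(v) > 1}
print(changed)
```

Any qtree that ends up in `changed` was scraped under more than one instance label in the queried time range, which is the situation described above.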
In the Query? Here?
Or can we access Prom in our Nabox?
you can access prom from your nabox. Example: if nabox is https://10.216.33.135/login use https://10.216.33.135/prometheus/
when you typed the query, can you double check the qtree and volume names by picking them from the completion dropdown that Prometheus shows when typing like this?
thanks for confirming
What if you try
count by (instance,svm,volume,qtree) (quota_files_used{volume="appdata_mms"})
Ran the queries on our production env:
@tall carbon can you help us fix this, or can't it be fixed?
@fluid hull One possible reason is that all quota types are enabled in the template.
Can you please confirm which quotas have been enabled so we can rule this out?
You can check here for the REST template: https://github.com/NetApp/harvest/blob/main/conf/rest/9.12.0/qtree.yaml#L30
You can check here for the ZAPI template: https://github.com/NetApp/harvest/blob/main/conf/zapi/cdot/9.8.0/qtree.yaml#L38
The panels in the dashboard are at the qtree level, but quota_disk_used is a quota metric, so it can have multiple records for the same qtree, volume, and SVM. Attached is a screenshot where the same qtree has multiple records because of the group and user quotas that belong to it.
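As a rough illustration of why one qtree can show up several times at the quota level, here is a small sketch that counts quota records per qtree. It assumes each record carries a `type` label with values such as tree, user, or group; that label name, the function name, and the sample records are assumptions for illustration, so check the linked templates for what your Harvest version actually exports.

```python
from collections import Counter, defaultdict


def quota_types_per_qtree(records):
    """Count quota records by type for each (svm, volume, qtree)."""
    counts = defaultdict(Counter)
    for record in records:
        key = (record["svm"], record["volume"], record["qtree"])
        counts[key][record["type"]] += 1
    return counts


# Hypothetical label sets of quota_disk_used series.
records = [
    {"svm": "svm1", "volume": "vol1", "qtree": "q1", "type": "tree"},
    {"svm": "svm1", "volume": "vol1", "qtree": "q1", "type": "user"},
    {"svm": "svm1", "volume": "vol1", "qtree": "q1", "type": "group"},
    {"svm": "svm1", "volume": "vol1", "qtree": "q2", "type": "tree"},
]

for key, by_type in quota_types_per_qtree(records).items():
    total = sum(by_type.values())
    print(key, dict(by_type), "-> duplicate rows likely" if total > 1 else "")
```

A qtree with more than one record here would appear more than once in a panel that only groups by qtree, volume, and SVM.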
Also, in your screenshot of the count by query, I could not see the counts on the right side. Can you share it again with those counts visible?
Will try to check everything.
For now here are the screenshots with the queries.
As per your screenshot, it doesn't look like you have many quotas belonging to the same qtree, so that rules out the possibility I mentioned yesterday.
Which means Chris's initial suspicion is likely the case here: the metrics have different ports, which causes multiple instances to show for the same qtree object.
How can I check if what Chris suspected is true?
And can I actively avoid that on my side?
How can I check if the port has changed?
We haven't changed network settings, neither on the NetApp nor on the Nabox itself.
@fluid hull Do you still have this issue? If yes, are the SVMs the same for all the duplicate qtrees displayed in the panel? If so, to check whether the poller port has changed over time, let's focus on the panel Top $TopResources Qtrees by Disk Used. If you update its panel query to the one below, does it fix the issue with the duplicate qtrees for this panel?
(
  label_replace(
    quota_disk_used{
      datacenter=~"$Datacenter",
      cluster=~"$Cluster",
      svm=~"$SVM",
      volume=~"$Volume",
      qtree=~"$Qtree"
    },
    "instance", "", "instance", ".*"
  )
)
and
topk(
  $TopResources,
  avg_over_time(
    label_replace(
      quota_disk_used{
        datacenter=~"$Datacenter",
        cluster=~"$Cluster",
        svm=~"$SVM",
        volume=~"$Volume",
        qtree=~"$Qtree"
      },
      "instance", "", "instance", ".*"
    )[3h:] @ end()
  )
)
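The label_replace calls in the query above set instance to an empty string, which in Prometheus is equivalent to removing the label. The toy model below sketches the intended effect: two series that differ only in instance collapse into one. It is not how Prometheus evaluates the query, and the function names and sample values are hypothetical.

```python
def drop_label(labels, name):
    """Return a copy of the label set without `name`."""
    return {k: v for k, v in labels.items() if k != name}


def merge_without_instance(series_list):
    """Merge samples of series that differ only in `instance`.

    Each item is (labels, samples) where samples maps timestamp -> value.
    Raises ValueError if two merged series have a sample at the same
    timestamp, since the toy model cannot pick one.
    """
    merged = {}
    for labels, samples in series_list:
        key = tuple(sorted(drop_label(labels, "instance").items()))
        bucket = merged.setdefault(key, {})
        for ts, value in samples.items():
            if ts in bucket:
                raise ValueError(f"duplicate sample at {ts} for {dict(key)}")
            bucket[ts] = value
    return merged


# Hypothetical data: the same qtree scraped on two different poller ports.
series = [
    ({"instance": "nabox:12990", "svm": "svm1", "qtree": "appdata963"},
     {1: 100.0, 2: 110.0}),
    ({"instance": "nabox:12991", "svm": "svm1", "qtree": "appdata963"},
     {3: 120.0, 4: 130.0}),
]

merged = merge_without_instance(series)
for key, samples in merged.items():
    print(dict(key), sorted(samples.items()))
```

The ValueError branch covers the case where both instances report a sample at the same timestamp, which this sketch does not try to resolve.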
Will try the new query and give you feedback soon
Our production system with the older Harvest
Our clone of production with the current Harvest. On the left is the new query from @spiral fable, on the right the current query.
@fluid hull Has the new query I provided fixed the issue with duplicate records in the panel?
The new query shows data for only a few hours
And on the right side, the original query doesn't apply the filter from the variable
And it is the same SVM
@fluid hull Are you suggesting that with the new query you are getting only one record as a result, as shown in the screenshot? For me, data is available for a longer range.
I have updated the queries of all panels in the qtree dashboard with these changes. Could you import this dashboard and share feedback?
will try and report back.
As soon as I change the query, the results in those panels shrink to a few hours. And I don't know why 😄
The Panels above work fine
This is strange. Let's try the query below in a Grafana panel for testing and check the results.
(
  label_replace(
    quota_disk_used{
      qtree=~"appdata963"
    },
    "instance", "", "instance", ".*"
  )
)
and
topk(
  5,
  avg_over_time(
    label_replace(
      quota_disk_used{
        qtree=~"appdata963"
      },
      "instance", "", "instance", ".*"
    )[3h:] @ end()
  )
)
I have found one qtree that shows all the data.
I will try to find out what the difference between the two qtrees is.
The first difference I have noticed is that the one that works is on an HA pair, and the one with this strange behaviour is on a MetroCluster.
I can confirm that all our MetroClusters show the same issue and all our HA pairs don't
Thanks @fluid hull. It suggests we don't have the relevant qtree data for MetroCluster in Prometheus. Were these clusters added recently? How far back is data available in Prometheus for the MetroClusters? Do other metrics (such as volume, etc.) have the same issue as qtree for MetroClusters? If you could share the logs with us, as well as the MetroCluster poller name, we can check. We don't have any special handling in Harvest for qtrees related to MetroCluster.
For log collection, please refer to: NetApp Harvest Log Collection.
We have had the MetroCluster since 2021
I don't think we are missing data. Look at the two screenshots from two different servers. Only the query differs.
On the left is our production Grafana with Harvest 24.02 and on the right the latest nightly.
Side note: in the Highlights panels we have all the data on both instances.
Other metrics with the same problem are not present, or maybe just not discovered yet.
I will try to collect logs and send them to you @spiral fable if you tell me which logs. Only the Harvest container? Or more?
Thanks @fluid hull . We have identified the issue. I'll share updated Qtree dashboard with you shortly over email.
@fluid hull I have shared updated dashboard via email. Please try and see if it fixes this issue.
I will try the new dashboard and report back 🙂
Seems to work perfectly for me now
Great. Thanks!
Can I ask you briefly what caused the issue?