# Label Collision Issue with Thanos as a Data Source for Grafana


vocal badge
#

Hi Team,

We’re working on configuring a global Thanos query layer as a data source for Grafana, in order to visualize both live and historical NetApp metrics collected via Harvest.
However, we’ve encountered a label collision issue:
- Harvest uses the `cluster` label to represent the NetApp storage cluster.
- Thanos, in our setup, uses the same `cluster` label for Kubernetes cluster names.
- Since the Harvest community's Grafana dashboards rely on the `cluster` label, this creates conflicts in our environment.

Questions:
- Are there recommended ways to adapt the Harvest dashboards to avoid this conflict?
- Is there any roadmap or plan to make the `cluster` label in Harvest configurable?
- Have others in the community tackled similar naming collisions when integrating with Thanos?

Appreciate any guidance or best practices from the community!

marble sentinel
vocal badge
#

@marble sentinel thank you so much 🙂
“Shoutout to the Harvest crew—exactly what we needed. We’ve been burning hours on a dashboard update script; thanks for the permanent solution!”

solemn garden
#

@marble sentinel We have a global Thanos setup, with multiple Kubernetes clusters running Harvest for different pollers. Is there a way to include the cluster label in the query so that the dashboards can work seamlessly with the global Thanos setup?

"cluster" - this holds the k8s cluster name
"netapp_cluster" - this holds the netapp cluster name

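To make the target label layout concrete, here is a toy Go sketch of the mapping above. `relabel` is a hypothetical helper, not Harvest or Thanos code, and the cluster names are made up: the NetApp cluster name that Harvest writes to `cluster` is copied to `netapp_cluster`, and `cluster` is then set to the Kubernetes cluster name. In a real deployment this would more likely be done with Prometheus relabeling rules; the Go version only makes the mapping explicit.

```go
package main

import "fmt"

// relabel is a hypothetical helper (not part of Harvest or Thanos).
// It copies Harvest's NetApp cluster name from "cluster" to "netapp_cluster"
// (if present) and then overwrites "cluster" with the Kubernetes cluster name,
// so both values survive on the same series.
func relabel(series map[string]string, k8sCluster string) map[string]string {
	out := make(map[string]string, len(series)+1)
	for k, v := range series {
		out[k] = v
	}
	if v, ok := out["cluster"]; ok {
		out["netapp_cluster"] = v
	}
	out["cluster"] = k8sCluster
	return out
}

func main() {
	// Example names are invented for illustration.
	s := map[string]string{"__name__": "volume_labels", "cluster": "aff-01", "svm": "svm1"}
	r := relabel(s, "k8s-eu-1")
	fmt.Println(r["cluster"], r["netapp_cluster"])
}
```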
#

Feedback:

It would be nice to be able to set `"hide": 0` on the data source variable, so we can switch between data sources from the UI. Right now the variable has `"hide": 2`, which hides it:

```json
{
  "current": {
    "selected": false,
    "text": "Prometheus",
    "value": "Prometheus"
  },
  "description": null,
  "error": null,
  "hide": 2,
  "includeAll": false,
  "label": "Data Source",
  "multi": false,
  "name": "DS_PROMETHEUS",
  "options": [],
  "query": "prometheus",
  "refresh": 2,
  "regex": "",
  "skipUrlSync": false,
  "type": "datasource"
}
```
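As a minimal sketch of the requested behavior, assuming a dashboard is processed as generic JSON, this Go program walks `templating.list` and sets `hide` to 0 on every variable of type `datasource`. In Grafana's dashboard model, `hide` is 0 (show), 1 (hide label) or 2 (hide variable). `showDatasourceVars` is a hypothetical helper, not Harvest's implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// showDatasourceVars is a hypothetical helper (not Harvest code). It sets
// "hide": 0 on every templating variable of type "datasource" so the
// data source picker becomes visible in the Grafana UI.
func showDatasourceVars(dashboard []byte) ([]byte, error) {
	var d map[string]any
	if err := json.Unmarshal(dashboard, &d); err != nil {
		return nil, err
	}
	// Missing keys yield nil values; ranging over a nil slice is a no-op.
	templating, _ := d["templating"].(map[string]any)
	list, _ := templating["list"].([]any)
	for _, item := range list {
		if v, ok := item.(map[string]any); ok && v["type"] == "datasource" {
			v["hide"] = 0 // maps are references, so this edits d in place
		}
	}
	return json.Marshal(d)
}

func main() {
	in := []byte(`{"templating":{"list":[{"name":"DS_PROMETHEUS","type":"datasource","query":"prometheus","hide":2}]}}`)
	out, err := showDatasourceVars(in)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```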
#

```
bin/harvest grafana customize --output-dir ./thanos/dashboards/ --cluster-label netapp_cluster --directory grafana/dashboards/cmode-details --datasource thanos
```

This command updates the data source on each panel, but we already have a variable defined for the data source. If the command updated that variable instead, we could switch between data sources.

Right now, every panel is locked to a single data source.

marble sentinel
stable meteor
#

Thanks for creating the issue

solemn garden
#

Additionally, is it possible to perform dashboard migration via a binary or another method for a global Thanos setup? The goal is to support both labels:

cluster -> Kubernetes cluster name
netapp_cluster -> NetApp cluster name

If this approach sounds good, I can create an issue for it as well. Let me know what you think!

stable meteor
#

The only thing that comes to mind is the option Rahul mentioned above with labels. Have you tried using the CLI options `--cluster-label` and `--labels` together?

solemn garden
#

Yep. Below is the behaviour.

Command executed:

```
bin/harvest grafana customize --output-dir ./thanos/dashboards/ --cluster-label netapp_cluster --directory grafana/dashboards --labels cluster
```

It modified the query like this:

```
"query": "label_values(volume_labels{datacenter=~\"$Datacenter\",netapp_cluster=~\"$Cluster\",cluster=~\"$Cluster\"}, svm)",
```

and created a new variable:

```json
{
  "allValue": ".*",
  "current": {
    "selected": false
  },
  "definition": "label_values(cluster)",
  "hide": 0,
  "includeAll": true,
  "multi": true,
  "name": "Cluster",
  "options": [],
  "query": {
    "query": "label_values(cluster)",
    "refId": "StandardVariableQuery"
  },
  "refresh": 2,
  "regex": "",
  "skipUrlSync": false,
  "sort": 0,
  "type": "query"
}
```

But this is a duplicate, since we already have a `Cluster` variable:

```json
{
  "allValue": ".*",
  "current": {},
  "datasource": "prometheus",
  "definition": "label_values(cluster_new_status{system_type!=\"7mode\",datacenter=~\"$Datacenter\"},netapp_cluster)",
  "description": null,
  "error": null,
  "hide": 0,
  "includeAll": true,
  "label": null,
  "multi": true,
  "name": "Cluster",
  "options": [],
  "query": {
    "query": "label_values(cluster_new_status{system_type!=\"7mode\",datacenter=~\"$Datacenter\"},netapp_cluster)",
    "refId": "StandardVariableQuery"
  },
  "refresh": 2,
  "regex": "",
  "skipUrlSync": false,
  "sort": 1,
  "type": "query"
}
```
#

It doesn't work as expected, as @marble sentinel mentioned. 🙁

stable meteor
#

Thanks for confirming, that's what we suspected. In terms of dashboard migration, we don't have anything beyond what you're already using via `bin/harvest grafana import` and `bin/harvest grafana customize`. Yes, please open a feature request for this with a before-and-after example of how you would like us to rewrite the Prometheus queries. Are you suggesting a new or modified CLI argument to `import` or `customize`?

solemn garden
#

Query before:
```
"query": "label_values(volume_labels{datacenter=~\"$Datacenter\",cluster=~\"$Cluster\"}, svm)"
```

Query after:
```
"query": "label_values(volume_labels{datacenter=~\"$Datacenter\",cluster=~\"$Cluster\", netapp_cluster=~\"$NetappCluster\"}, svm)"
```

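The before/after rewrite can be sketched as a regex substitution over the decoded PromQL string (the `\"` escaping disappears once the dashboard JSON is parsed). `addNetappClusterMatcher` is a hypothetical helper, not Harvest code. It assumes the dashboards spell the matcher exactly as `cluster=~"$Cluster"`, and it anchors on the preceding `{` or `,` so an existing `netapp_cluster=~...` matcher is left alone.

```go
package main

import (
	"fmt"
	"regexp"
)

// clusterMatcher finds a `cluster=~"$Cluster"` label matcher. The preceding
// `{` or `,` is captured so that `netapp_cluster=~...` never matches.
var clusterMatcher = regexp.MustCompile(`([{,]\s*)cluster=~"\$Cluster"`)

// addNetappClusterMatcher is a hypothetical rewrite (not Harvest code): it
// keeps the Kubernetes `cluster` filter and appends a `netapp_cluster` filter
// bound to a new $NetappCluster variable. In the replacement template, "$$"
// produces a literal "$" and ${1} re-inserts the captured `{` or `,`.
func addNetappClusterMatcher(query string) string {
	return clusterMatcher.ReplaceAllString(query,
		`${1}cluster=~"$$Cluster", netapp_cluster=~"$$NetappCluster"`)
}

func main() {
	before := `label_values(volume_labels{datacenter=~"$Datacenter",cluster=~"$Cluster"}, svm)`
	fmt.Println(addNetappClusterMatcher(before))
}
```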
My idea (just a thought; the dev team can decide):

  1. Add a new variable `NetappCluster`.
  2. Update all panels, variables, and other references (table transformations, overrides, etc.).

Command modification:
```
bin/harvest thanos migrate --output-dir ./thanos/dashboards/ --directory grafana/dashboards --cluster-label netapp_cluster --datasource-hide false
```

- `--datasource-hide false`: already covered by https://github.com/NetApp/harvest/issues/3816
- `--cluster-label`: lets users specify the label that holds the NetApp cluster name, so the binary can create a variable accordingly. E.g., `--cluster-label netapp_cluster` would create a variable `NetappCluster`.

Hope this helps... 🙂
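The proposed label-to-variable naming (`netapp_cluster` to `NetappCluster`) could follow a simple rule: split on `_` and upper-case the first letter of each part. The Go function below is a hypothetical sketch of that rule, not an existing Harvest option, and it assumes ASCII label names.

```go
package main

import (
	"fmt"
	"strings"
)

// variableNameForLabel is a hypothetical naming rule matching the proposal's
// example (netapp_cluster -> NetappCluster): split on "_", drop empty parts,
// and upper-case the first byte of each part. ASCII label names are assumed.
func variableNameForLabel(label string) string {
	var b strings.Builder
	for _, part := range strings.Split(label, "_") {
		if part == "" {
			continue
		}
		b.WriteString(strings.ToUpper(part[:1]))
		b.WriteString(part[1:])
	}
	return b.String()
}

func main() {
	fmt.Println(variableNameForLabel("netapp_cluster"))
}
```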

#

The `NetappCluster` variable would look like this, while the `Cluster` variable remains untouched.
It helps filter NetApp clusters across different Kubernetes clusters.

```json
{
  "allValue": ".*",
  "current": {},
  "datasource": "thanos",
  "definition": "label_values(cluster_new_status{system_type!=\"7mode\",datacenter=~\"$Datacenter\", cluster=~\"$Cluster\"},netapp_cluster)",
  "description": null,
  "error": null,
  "hide": 0,
  "includeAll": true,
  "label": null,
  "multi": true,
  "name": "NetappCluster",
  "options": [],
  "query": {
    "query": "label_values(cluster_new_status{system_type!=\"7mode\",datacenter=~\"$Datacenter\", cluster=~\"$Cluster\"},netapp_cluster)",
    "refId": "StandardVariableQuery"
  },
  "refresh": 2,
  "regex": "",
  "skipUrlSync": false,
  "sort": 1,
  "type": "query"
}
```
stable meteor
#

thanks, @solemn garden! That helps. We should have a pull request for #3816 ready soon. The way it works is you pass --show-datasource to the import or customize subcommand and it will add a variable to each dashboard that contains all of your Prometheus datasources like so:

#

Does that solve your use-case for #3816?

solemn garden
#

Highly appreciate the work... thanks 🙏

stable meteor
#

you're welcome!

stable meteor
#

The latest nightly build has the changes for https://github.com/NetApp/harvest/issues/3816, so please give it a try when you get a chance and let us know if it covers that issue.

When using `grafana import` or `grafana customize`, pass the following CLI arguments: `--show-datasource` and `--datasource '${DS_PROMETHEUS}'`.

stable meteor
#

Hi @solemn garden any feedback on the --show-datasource and --datasource '${DS_PROMETHEUS}' changes? Are they working for you?

solemn garden
#

@stable meteor Sorry, I was on vacation. Back today.
I will update you by tomorrow.

stable meteor
#

thanks!

#

Those changes didn't make it into 25.08.1, so you will need to try the nightly build.

solemn garden
#

@stable meteor It seems like it's working as expected.
Thanks for the amazing effort, really appreciated!

solemn garden
#

@stable meteor
Just noticed that this change updates the metric name in the query as well.

E.g., https://github.com/sapcc/dme-storage-harvest/blob/release/25.08.1-sap/thanos/dashboards/cmode/cluster.json

Actual query:
```
"expr": "cluster_software_status{netapp_cluster=~\"$Cluster\",datacenter=~\"$Datacenter\"}",
```

Modified to:
```
"expr": "netapp_cluster_software_status{netapp_cluster=~\"$Cluster\",datacenter=~\"$Datacenter\"}",
```

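The bug looks like a plain string replacement of `cluster` that also hits metric names. As a hedged sketch (hypothetical, not Harvest's fix), a safer rewrite renames `cluster` only where a label matcher can start, after `{` or `,`, and only when followed by a match operator, so `cluster_software_status` is never touched. It does not handle `by (...)`/`without (...)` clauses; a more robust version could parse the expression with the Prometheus `promql/parser` package instead of a regex.

```go
package main

import (
	"fmt"
	"regexp"
)

// labelMatcher matches the label name `cluster` only at the start of a label
// matcher (after `{` or `,`) and only when followed by =, !=, =~ or !~.
var labelMatcher = regexp.MustCompile(`([{,]\s*)cluster(\s*[=!])`)

// renameClusterLabel is a hypothetical sketch (not Harvest code) that renames
// the `cluster` label in selectors to newLabel while leaving metric names
// such as cluster_software_status unchanged. Running it twice is harmless.
func renameClusterLabel(expr, newLabel string) string {
	return labelMatcher.ReplaceAllString(expr, "${1}"+newLabel+"${2}")
}

func main() {
	expr := `cluster_software_status{cluster=~"$Cluster",datacenter=~"$Datacenter"}`
	fmt.Println(renameClusterLabel(expr, "netapp_cluster"))
}
```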

stable meteor
#

Thanks for the bug report, @solemn garden. That wasn't introduced by the latest change, but it's definitely a bug and we'll fix it. It looks like this affects three queries in the cluster.json dashboard.

vocal badge
#

Thanks, Chris. We haven't checked the other dashboards yet to see whether any other queries are affected.

stable meteor
#

@vocal badge and @solemn garden can you take a look at the screenshots pasted into #3955 and let us know if this new CLI argument does what you need for the issue you opened?

vocal badge
#

@stable meteor Yes, it’s looking good, this new CLI argument works for what we needed. Thanks for the update!