# Improve the Workload dashboard for adaptive QoS: add workload size and absolute minimum to the table

snow adder
#

Hello,
I was looking to improve the Workload dashboard for adaptive QoS. We would like to use the gathered information to calculate the size of the workload assigned to that QoS policy, and also to show the absolute minimum for that workload.

In some cases the workload size can't be calculated, since the minimum and maximum (peak) are shown as the same value when the workload size doesn't exceed the absolute minimum.

Have you worked on something like this?

Thanks.
Best regards!

alpine mulch
#

@snow adder, min-throughput could be added to the respective template and to the Adaptive QoS table of the Workload dashboard accordingly.
However, the size of the workload assigned to that QoS policy would be challenging, as the qos_workload template uses the private/cli/qos/workload REST API, and that API doesn't have size details.

Could you say more about the size field and what it indicates at the QoS workload level?

snow adder
#

@alpine mulch Regarding the workload size, take a look at the screenshot as an example. The adaptive policy assigned to the workload has a performance of 12288 IOPS/TB and an absolute minimum of 1000 IOPS. In the other image, the QoS limit assigned to the workload is 6000 IOPS, because the workload size is around 500 GB. In some cases this calculation won't be possible, because the IOPS assigned to the workload will be the absolute minimum.
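The calculation described above can be sketched in Python: divide the IOPS limit assigned to the workload by the policy's per-TB IOPS figure, and give up when the limit sits at the absolute minimum. This is a minimal illustration, assuming "TB" in the policy means TiB (1024 GiB), which makes the example come out at exactly 500 GiB; `workload_size_gib` is a hypothetical name, not part of Harvest.

```python
from typing import Optional

GIB_PER_TIB = 1024  # assumption: the policy's "per TB" figure is per TiB


def workload_size_gib(assigned_iops: float, peak_iops_per_tib: float,
                      absolute_min_iops: float) -> Optional[float]:
    """Back out a workload's size (GiB) from its adaptive QoS IOPS limit.

    Returns None when the limit is at (or below) the absolute minimum,
    because the limit then no longer reflects the workload size.
    """
    if peak_iops_per_tib <= 0:
        raise ValueError("peak_iops_per_tib must be positive")
    if assigned_iops <= absolute_min_iops:
        return None  # floored at the absolute minimum: size is not recoverable
    return assigned_iops / peak_iops_per_tib * GIB_PER_TIB


if __name__ == "__main__":
    # Numbers from the example: 12288 IOPS/TB policy, 1000 IOPS absolute minimum.
    print(workload_size_gib(6000, 12288, 1000))  # 500.0
    print(workload_size_gib(1000, 12288, 1000))  # None
```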

Let me know if I explained it well, or if you have any other questions.

Thanks.

alpine mulch
#

@snow adder Thanks for the explanation. Now it's clearer how to calculate the size of the workload assigned to the QoS policy.
We will incorporate it into the Adaptive QoS table.

normal flame
#

Thanks, Paqui! We appreciate you raising this issue and offering to try it out.

alpine mulch
#

@snow adder PR https://github.com/NetApp/harvest/pull/3937 is in review, and we are validating it with our local data.
Could you share the REST responses of the calls below by email at ng-harvest-files@netapp.com, so that I can validate with your data as well?

Please replace USER, PASSWORD, and CLUSTER_IP with the appropriate cluster values.
curl -sk -u USER:PASSWORD 'https://CLUSTER_IP/api/private/cli/qos/workload?return_records=true&fields=is_adaptive,class,file,lun,max_throughput,min_throughput,policy_group,qtree,volume,vserver,workload&max_records=10000&ignore_unknown_fields=true' > rest_qos_workload.json

curl -sk -u USER:PASSWORD 'https://CLUSTER_IP/api/private/cli/qos/adaptive-policy-group?return_records=true&fields=absolute_min_iops,expected_iops,expected_iops_allocation,num_workloads,peak_iops,peak_iops_allocation,policy_group,vserver&max_records=10000&ignore_unknown_fields=true' > rest_qos_adaptive.json

curl -sk -u USER:PASSWORD 'https://CLUSTER_IP/api/private/cli/volume?return_records=true&fields=volume,vserver,size,total&is_constituent=*&max_records=10000&ignore_unknown_fields=true' > rest_volume.json
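Once the QoS responses are saved, the size estimate can be reproduced offline. The Python sketch below joins workloads to adaptive policy groups on (vserver, policy_group) and divides each workload's max throughput by the policy's peak IOPS. It assumes the private CLI responses contain a `records` list and that throughput values are strings with a leading number (such as `6000IOPS` or `12288IOPS/TB`); please verify both against the actual files. `estimate_sizes_tb` and `leading_number` are hypothetical helpers, not Harvest code.

```python
import json
import re
import sys
from typing import Dict, Optional

# Assumption: limits look like "6000IOPS" or "12288IOPS/TB"; only the leading number is used.
_LEADING_NUMBER = re.compile(r"\s*(\d+(?:\.\d+)?)")


def leading_number(value) -> Optional[float]:
    """Return the leading numeric part of a value like '6000IOPS', or None."""
    if value is None:
        return None
    match = _LEADING_NUMBER.match(str(value))
    return float(match.group(1)) if match else None


def estimate_sizes_tb(workloads: dict, policies: dict) -> Dict[str, Optional[float]]:
    """Estimate each adaptive workload's size (in the policy's TB unit).

    A workload whose limit is at the absolute minimum maps to None,
    since its size cannot be recovered from the limit.
    """
    policy_by_key = {
        (p.get("vserver"), p.get("policy_group")): p
        for p in policies.get("records", [])
    }
    sizes: Dict[str, Optional[float]] = {}
    for w in workloads.get("records", []):
        # Assumption: is_adaptive may be a JSON boolean or a string.
        if str(w.get("is_adaptive")).lower() not in ("true", "yes"):
            continue
        policy = policy_by_key.get((w.get("vserver"), w.get("policy_group")))
        if policy is None:
            continue
        max_iops = leading_number(w.get("max_throughput"))
        peak = leading_number(policy.get("peak_iops"))
        floor = leading_number(policy.get("absolute_min_iops"))
        if max_iops is None or not peak or (floor is not None and max_iops <= floor):
            sizes[w.get("workload")] = None
        else:
            sizes[w.get("workload")] = max_iops / peak
    return sizes


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python estimate_sizes.py rest_qos_workload.json rest_qos_adaptive.json
    with open(sys.argv[1]) as f_workloads, open(sys.argv[2]) as f_policies:
        result = estimate_sizes_tb(json.load(f_workloads), json.load(f_policies))
    for name, size in sorted(result.items(), key=lambda kv: str(kv[0])):
        print(name, "n/a" if size is None else f"{size:.3f} TB")
```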

normal flame
#

@snow adder This feature request is included in the latest nightly build, if you want to give it a try and let us know how it works for you.

snow adder
#

Apologies, I hadn't read your last messages. @alpine mulch Do you still need the requested information?
@normal flame Yes, we can try the latest nightly build. Do we need to configure or do anything special?

normal flame
#

We don't need the curl output anymore, unless you have problems with the nightly build. If you're using NAbox, you will need to delete the Workload dashboard so that the new one is loaded. If you aren't using NAbox, you'll want to import the new Workload dashboard from the nightly build.

snow adder
#

We have deployed the latest nightly, and we can see Min IOPS in the table, but we don't see the workload size. Maybe I don't have the correct version, or I need to do something more? Regarding the version, we don't use NAbox, and we have deployed the new Workload dashboard.

alpine mulch
#

That's a little strange. What are the last columns, az and az2?
Could you import the Workload dashboard from here: https://raw.githubusercontent.com/NetApp/harvest/refs/heads/main/grafana/dashboards/cmode/workload.json

Also, could you run the query below as-is in your Prometheus query editor and share a screenshot of the response here:

label_join(
    (
            1024 * 1024 * 1024 * 1024
          *
            qos_workload_max_throughput_iops{is_adaptive="Yes"}
        / on (cluster, datacenter, policy_group, svm) group_left ()
          label_replace(
            qos_policy_adaptive_peak_iops{},
            "policy_group",
            "$1",
            "name",
            "(.*)"
          )
      unless on (cluster, datacenter, policy_group, svm)
        (
              qos_workload_min_throughput_iops{is_adaptive="Yes"}
            - on (cluster, datacenter, policy_group, svm)
              label_replace(
                qos_policy_adaptive_absolute_min_iops{},
                "policy_group",
                "$1",
                "name",
                "(.*)"
              )
          ==
            0
        )
    ),
    "unique_id",
    "-",
    "datacenter",
    "cluster",
    "workload"
  )
and on (cluster, datacenter, workload)
  qos_workload_min_throughput_iops{is_adaptive="Yes"}

As you can see, we have only one record across all of my local datacenters where the workload size was eligible and calculated.
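The logic of the size query above can be modeled in Python: multiply 1 TiB (1024^4 bytes) by the workload's max throughput, divide by the policy's peak IOPS, drop policy groups where a workload's min throughput equals the absolute minimum, and keep only workloads that also have a min-throughput series. This is a simplified sketch, assuming each series is a (labels, value) pair already filtered to `is_adaptive="Yes"`; it does not reproduce Prometheus' matching-cardinality checks, and the function names are hypothetical. Because the `unless` matches only on (cluster, datacenter, policy_group, svm), a single workload at the floor excludes every workload in that policy group; the revised query later in the thread adds `workload` to that match.

```python
from typing import Dict, List, Tuple

Labels = Dict[str, str]
Series = List[Tuple[Labels, float]]  # (labels, value) pairs, already is_adaptive="Yes"

TIB_BYTES = 1024 ** 4
GROUP_ON = ("cluster", "datacenter", "policy_group", "svm")
WORKLOAD_ON = ("cluster", "datacenter", "workload")


def match_key(labels: Labels, on: Tuple[str, ...] = GROUP_ON) -> tuple:
    """Values of the `on (...)` labels, used as the matching signature."""
    return tuple(labels.get(name, "") for name in on)


def by_policy_group(series: Series) -> Dict[tuple, float]:
    """Mimic label_replace(x, "policy_group", "$1", "name", "(.*)") and index by group."""
    return {
        match_key({**labels, "policy_group": labels.get("name", "")}): value
        for labels, value in series
    }


def workload_size_bytes(max_tp: Series, min_tp: Series,
                        peak: Series, abs_min: Series) -> Dict[str, float]:
    peak_by_group = by_policy_group(peak)
    floor_by_group = by_policy_group(abs_min)
    # Right-hand side of `unless`: groups where a workload's min throughput
    # equals the policy's absolute minimum (the `- ... == 0` filter).
    floored_groups = {
        match_key(labels)
        for labels, value in min_tp
        if match_key(labels) in floor_by_group
        and value - floor_by_group[match_key(labels)] == 0
    }
    # Right-hand side of the outer `and`: workloads that have a min-throughput series.
    has_min = {match_key(labels, WORKLOAD_ON) for labels, _ in min_tp}
    sizes: Dict[str, float] = {}
    for labels, value in max_tp:
        group = match_key(labels)
        if group not in peak_by_group or group in floored_groups:
            continue
        if match_key(labels, WORKLOAD_ON) not in has_min:
            continue
        # label_join(..., "unique_id", "-", "datacenter", "cluster", "workload")
        unique_id = "-".join(labels.get(n, "") for n in ("datacenter", "cluster", "workload"))
        sizes[unique_id] = TIB_BYTES * value / peak_by_group[group]
    return sizes
```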

snow adder
#

In the Grafana dashboard, when I checked the queries and table information, I saw this error: execution: multiple matches for labels: many-to-one matching must be explicit (group_left/group_right)
If I run the query you shared in the Prometheus query editor, I get the same error.

The Harvest version installed: ||harvest version 25.10.21-nightly (commit 23f2e8b3) (build date 2025-10-21T01:38:13-0400) linux/amd64||

Additionally, when I replace the dashboard with the one you shared, the result is the same. Regarding your question about az: it is the availability zone, a label we add to every cluster. I believe az2 is the same label, but coming from the second query in the panel.
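This error most likely comes from the subtraction inside `unless`: `qos_workload_min_throughput_iops - on (cluster, datacenter, policy_group, svm) ...` is a one-to-one match, so it fails as soon as an adaptive policy group has more than one workload on the left-hand side. The simplified Python model below illustrates the matching rule; it is a sketch of the documented behavior, not Prometheus' implementation, and `subtract_on` and `MatchError` are hypothetical names. Passing `group_left=True` corresponds to adding `group_left()`, which the revised query later in the thread does.

```python
from typing import Dict, List, Tuple

Labels = Dict[str, str]
Sample = Tuple[Labels, float]


class MatchError(Exception):
    """Raised where Prometheus would reject the binary operation."""


def signature(labels: Labels, on: Tuple[str, ...]) -> tuple:
    return tuple(labels.get(name, "") for name in on)


def subtract_on(lhs: List[Sample], rhs: List[Sample], on: Tuple[str, ...],
                group_left: bool = False) -> List[Sample]:
    """Simplified model of `lhs - on (<on>) rhs`, optionally with group_left()."""
    # In both modes the right-hand ("one") side must be unique per signature.
    right: Dict[tuple, float] = {}
    for labels, value in rhs:
        sig = signature(labels, on)
        if sig in right:
            raise MatchError("duplicate right-hand series for " + repr(sig))
        right[sig] = value
    matched = set()
    result: List[Sample] = []
    for labels, value in lhs:
        sig = signature(labels, on)
        if sig not in right:
            continue  # unmatched series are dropped
        if not group_left:
            # One-to-one: a second left-hand series with the same signature is an error.
            if sig in matched:
                raise MatchError("multiple matches for labels: many-to-one matching "
                                 "must be explicit (group_left/group_right)")
            matched.add(sig)
            out_labels = {n: labels[n] for n in on if n in labels}  # only `on` labels survive
        else:
            out_labels = dict(labels)  # group_left keeps the left-hand labels
        result.append((out_labels, value - right[sig]))
    return result
```

With two workloads in one policy group, the one-to-one call raises the same message seen in the dashboard, while the `group_left=True` call returns one result per workload and keeps the `workload` label, which a later `unless on (..., workload)` can then match on.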

alpine mulch
#

Ok, understood about the availability zone.
Could you share the individual responses of these 2 PromQL queries?

First query:

label_join(
    (
            1024 * 1024 * 1024 * 1024
          *
            qos_workload_max_throughput_iops{is_adaptive="Yes"}
        / on (cluster, datacenter, policy_group, svm) group_left ()
          label_replace(
            qos_policy_adaptive_peak_iops{},
            "policy_group",
            "$1",
            "name",
            "(.*)"
          )
      unless on (cluster, datacenter, policy_group, svm)
        (
              qos_workload_min_throughput_iops{is_adaptive="Yes"}
            - on (cluster, datacenter, policy_group, svm)
              label_replace(
                qos_policy_adaptive_absolute_min_iops{},
                "policy_group",
                "$1",
                "name",
                "(.*)"
              )
          ==
            0
        )
    ),
    "unique_id",
    "-",
    "datacenter",
    "cluster",
    "workload"
  )

Second query:

qos_workload_min_throughput_iops{is_adaptive="Yes"}
#

It seems that joining on cluster, datacenter, and workload would not be sufficient here.
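One way to test this on real data is to take the label sets returned by a query such as `qos_workload_min_throughput_iops{is_adaptive="Yes"}`, group them by the join labels, and list which other labels differ inside each colliding group. A minimal Python sketch, assuming the label sets are already loaded as dictionaries (for example from the query API response); `duplicate_keys` and `distinguishing_labels` are hypothetical helpers.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Labels = Dict[str, str]


def duplicate_keys(series: List[Labels], on: Sequence[str]) -> Dict[tuple, List[Labels]]:
    """Group series by the values of the `on` labels; keep only keys shared by 2+ series."""
    groups: Dict[tuple, List[Labels]] = defaultdict(list)
    for labels in series:
        groups[tuple(labels.get(name, "") for name in on)].append(labels)
    return {key: members for key, members in groups.items() if len(members) > 1}


def distinguishing_labels(members: List[Labels], on: Sequence[str]) -> List[str]:
    """Labels outside `on` whose values differ among the colliding series."""
    names = set().union(*(m.keys() for m in members)) - set(on)
    return sorted(n for n in names if len({m.get(n, "") for m in members}) > 1)
```

Labels such as `wid`, `file`, `volume`, or `instance` from the example series are natural candidates to inspect, but which one actually disambiguates depends on the data.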

snow adder
#

Regarding the first query, the result is the same as before: multiple matches for labels: many-to-one matching must be explicit (group_left/group_right)

label_join(
    (
            1024 * 1024 * 1024 * 1024
          *
            qos_workload_max_throughput_iops{is_adaptive="Yes"}
        / on (cluster, datacenter, policy_group, svm) group_left ()
          label_replace(
            qos_policy_adaptive_peak_iops{},
            "policy_group",
            "$1",
            "name",
            "(.*)"
          )
      unless on (cluster, datacenter, policy_group, svm)
        (
              qos_workload_min_throughput_iops{is_adaptive="Yes"}
            - on (cluster, datacenter, policy_group, svm)
              label_replace(
                qos_policy_adaptive_absolute_min_iops{},
                "policy_group",
                "$1",
                "name",
                "(.*)"
              )
          ==
            0
        )
    ),
    "unique_id",
    "-",
    "datacenter",
    "cluster",
    "workload"
  )

For the second query, one example of the output is:

qos_workload_min_throughput_iops{az="az1", class="user_defined", cluster="xxxxx", datacenter="xxx", file="volume-084fad25-43b3-4410-815d-59ebcd8cd147", instance="localhost:13008", is_adaptive="Yes", job="harvest", policy_group="aqos_xx_xxx_x", svm="xxxxxxxxxxxxxx", volume="xxxxxxx", wid="48596", workload="file-volume_084fad25_43b3_4410_815d_59ebcd8cd147-wid48596"}

alpine mulch
#

@snow adder It seems that more fields are required for the PromQL join, as cluster, datacenter, and workload don't form a unique key here. You can apply the Cluster filter in your Workload dashboard to find out for which cluster the panel is broken. Could you then run the curl commands below against that cluster and share the responses by email at ng-harvest-files@netapp.com?

**Note:** Replace USER, PASSWORD, and CLUSTER_IP with the appropriate values for your cluster.

curl -sk -u USER:PASSWORD 'https://CLUSTER_IP/api/private/cli/qos/workload?return_records=true&fields=uuid,class,file,is_adaptive,lun,max_throughput,min_throughput,policy_group,qtree,volume,vserver,wid,workload&max_records=10000&ignore_unknown_fields=true' > qos_workload.json
curl -sk -u USER:PASSWORD 'https://CLUSTER_IP/api/private/cli/qos/adaptive-policy-group?return_records=true&fields=uuid,absolute_min_iops,expected_iops,expected_iops_allocation,num_workloads,object_count,peak_iops,peak_iops_allocation,policy_group,vserver&max_records=10000&ignore_unknown_fields=true' > qos_adaptive_policy_group.json

snow adder
#

Hello Hardik,
Yes, I usually select the cluster where we have adaptive QoS applied, to filter the information shown in the panel and to reduce the size of the Prometheus query, since QoS queries are heavy.

Regarding the requested information, I have sent you the output of those commands for one of our clusters with adaptive QoS. Let me know if you need anything else.

alpine mulch
#

Thanks @snow adder, I received the JSON files. I will try to replay this locally with your data and update you soon.

alpine mulch
#

Yes, these new changes will also be included in the next Harvest release, 25.11.

snow adder
#

Perfect!

snow adder
#

Hello,
I want to tell you something about the Workload dashboard. I don't know why, but I have to add an additional transformation to this panel to avoid blank information, as you can see in my screenshot. I have uploaded an image of this transformation, which may be useful for future improvements.

alpine mulch
#

@snow adder, thanks for the suggestion.
While testing a recent fix in the table, we made a few changes to the query join to handle those cases. We will incorporate your suggestion and validate it.

#

@snow adder We want to understand why records with an empty volume are showing up in your system.

Could you run the 5 queries below in your Prometheus query bar and share the responses by email at ng-harvest-files@netapp.com?

Query 1:
avg_over_time(
  label_join(
    clamp_max(
      (
          (qos_ops{} * 100)
        / on (datacenter, cluster, workload, policy_group, volume, lun, svm, qtree, file, wid)
          qos_workload_max_throughput_iops{is_adaptive="Yes"}
      ),
      100
    ),
    "unique_id",
    "-",
    "datacenter",
    "cluster",
    "workload"
  )[3h:]
)

Query 2:
label_join(
  (
      qos_ops{}
    * on (datacenter, cluster, workload, policy_group, volume, lun, svm, qtree, file, wid)
      (
          qos_workload_labels{is_adaptive="Yes"}
        and
          qos_workload_max_throughput_iops{is_adaptive="Yes"}
      )
  ),
  "unique_id",
  "-",
  "datacenter",
  "cluster",
  "workload"
)

Query 3:
label_join(
    qos_workload_max_throughput_iops{is_adaptive="Yes"}
  and on (cluster, datacenter, workload)
    qos_ops{},
  "unique_id",
  "-",
  "datacenter",
  "cluster",
  "workload"
)

Query 4:
label_join(
    qos_workload_min_throughput_iops{is_adaptive="Yes"}
  and on (cluster, datacenter, workload)
    qos_ops{},
  "unique_id",
  "-",
  "datacenter",
  "cluster",
  "workload"
)

Query 5:
label_join(
    (
            1024 * 1024 * 1024 * 1024
          *
            qos_workload_max_throughput_iops{is_adaptive="Yes"}
        / on (cluster, datacenter, policy_group, svm) group_left ()
          label_replace(
            qos_policy_adaptive_peak_iops{},
            "policy_group",
            "$1",
            "name",
            "(.*)"
          )
      unless on (cluster, datacenter, policy_group, svm, workload)
        (
              qos_workload_min_throughput_iops{is_adaptive="Yes"}
            - on (cluster, datacenter, policy_group, svm) group_left ()
              label_replace(
                qos_policy_adaptive_absolute_min_iops{},
                "policy_group",
                "$1",
                "name",
                "(.*)"
              )
          ==
            0
        )
    ),
    "unique_id",
    "-",
    "datacenter",
    "cluster",
    "workload"
  )
and on (cluster, datacenter, workload)
  qos_workload_min_throughput_iops{is_adaptive="Yes"}
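The first of the five queries computes, for each adaptive workload, the percentage of its max throughput used by `qos_ops`, caps it at 100 with `clamp_max`, and averages it over a 3-hour subquery. The arithmetic is sketched in Python below, assuming paired ops and max-IOPS samples taken at the same steps; it ignores label matching, and the function names are hypothetical. Unlike PromQL, which yields Inf or NaN when dividing by zero, the sketch rejects a non-positive max.

```python
from statistics import mean
from typing import Sequence


def utilization_pct(ops: float, max_iops: float) -> float:
    """(ops * 100) / max_iops, capped at 100 like clamp_max(..., 100)."""
    if max_iops <= 0:
        raise ValueError("max_iops must be positive")  # PromQL would give Inf/NaN instead
    return min(ops * 100.0 / max_iops, 100.0)


def avg_utilization_pct(ops_samples: Sequence[float], max_samples: Sequence[float]) -> float:
    """Mean of per-step utilization, similar in spirit to avg_over_time(...[3h:])."""
    if not ops_samples or len(ops_samples) != len(max_samples):
        raise ValueError("need equally sized, non-empty sample lists")
    return mean(utilization_pct(o, m) for o, m in zip(ops_samples, max_samples))


if __name__ == "__main__":
    # 3000, 6000, 9000 ops against a 6000 IOPS limit -> 50, 100, 100 (capped).
    print(avg_utilization_pct([3000, 6000, 9000], [6000, 6000, 6000]))
```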
alpine mulch
#

@snow adder, could you share your Harvest version? Is it 25.11, or a nightly build (and if so, from which date)?