#License dashboard and data pulled into harvest?


static vortex
#

I know there is a REST API to get that info, but a dashboard to query and review clusters, to make sure all active nodes have all their licenses, would make sense for this

#

/api/cluster/licensing/licenses

glossy mantle
#

hi @static vortex Harvest doesn't currently collect that information. Please open a feature request with the details of what you need https://github.com/NetApp/harvest/issues


static vortex
glossy mantle
#

thanks @static vortex

hearty crow
static vortex
#

Thank you. I will attempt to install this and validate it's working at my next opportunity.

#

When will this be added to a release? May?

glossy mantle
#

thanks @static vortex any feedback you can provide is appreciated. That is correct, the 26.05 release in May

static vortex
#

I'll probably have to validate 'license' metrics appear, I won't be able to add it to the dashboard yet as that would require a change notification and review by peers (and we typically do that with major/minor software releases, not the nightly builds)

static vortex
#

I am collecting the license metrics. I compared against our previous install and it's clearly adding them. Is there anything specific about the metrics I should note (number of them, type, etc.) to confirm it's collecting as expected?

hearty crow
#

There are 5 license metrics currently.

license_labels
license_capacity_maximum_size
license_capacity_used_percent
license_capacity_used_size
license_expiry_time

license_labels contains information about each license. The other metrics are only available for licenses that have that data.
Harvest uses the api/cluster/licensing/licenses?fields=* API to collect this information.

https://netapp.github.io/harvest/nightly/ontap-metrics/#license_labels
https://netapp.github.io/harvest/nightly/ontap-metrics/#license_capacity_maximum_size
https://netapp.github.io/harvest/nightly/ontap-metrics/#license_capacity_used_percent
https://netapp.github.io/harvest/nightly/ontap-metrics/#license_capacity_used_size
https://netapp.github.io/harvest/nightly/ontap-metrics/#license_expiry_time
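
One way to check "is it collecting as expected" is to scan the poller's Prometheus text output for the five metric names listed above. A minimal sketch, assuming the exporter output has been saved to a string. The `present_license_metrics` helper and the sample text (including its label names and values) are invented for illustration and are not part of Harvest.

```python
import re

# The five license metric names listed in this thread.
EXPECTED = {
    "license_labels",
    "license_capacity_maximum_size",
    "license_capacity_used_percent",
    "license_capacity_used_size",
    "license_expiry_time",
}

def present_license_metrics(exposition_text):
    """Return the expected license metrics that appear in Prometheus text output."""
    found = set()
    for line in exposition_text.splitlines():
        if line.startswith("#"):  # skip HELP/TYPE comment lines
            continue
        match = re.match(r"[a-zA-Z_:][a-zA-Z0-9_:]*", line)
        if match and match.group(0) in EXPECTED:
            found.add(match.group(0))
    return found

# Hypothetical sample; labels and values are made up.
sample = """\
# HELP license_labels ...
license_labels{cluster="c1",name="nfs"} 1
license_capacity_used_size{cluster="c1",name="nfs"} 1024
volume_size{cluster="c1"} 42
"""
found = present_license_metrics(sample)
missing = EXPECTED - found
```

A metric in `missing` is not necessarily a problem: as noted above, some metrics only exist for licenses that carry that data.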

static vortex
#

I see all but expiry_time in my list, so I expect that means I don't have any 'temporary' licenses that will expire or have expired? Or would that be for something else?

glossy mantle
static vortex
#

email sent, stripped out identifying information per our corporate requirements for data.

hearty crow
#

@static vortex Thanks for the data. There isn’t an expiry_time field in the response, so it makes sense that you don’t have this metric. Since the data is PII-redacted, I see some errors in the logs when I use this response on my end, for example:

time=2026-03-04T07:21:28.991+05:30 level=ERROR source=license.go:85 msg="Failed to create instance" Poller=dc-1 plugin=Rest:License object=License error="duplicate instance key => cifs_node_none" key=cifs_node_none

Do you see a similar error? You mentioned that you removed node values from the shared JSON, so it could be related to that as well. I want to confirm whether you’re seeing this error in your logs as well and if this is something that requires a fix.

static vortex
#

I get numerous errors of that type

#

=> anti_ransomware_node_none
=> cifs_node_none
=> fcp_node_none
=> flexclone_node_none
=> iscsi_node_none
=> mt_ek_mgmt_node_none
=> nfs_node_none
=> nvme_of_node_none
=> s3_node_none
=> s3_snapmirror_node_none
=> snaplock_node_none
=> snapmanagersuite_node_none
=> snapmirror_node_none
=> snapmirror_sync_node_none
=> snaprestore_node_none
=> snapvault_node_none
=> tpm_node_none
=> ve_node_none

#

all are 'duplicate instance key' errors

#

same line number

glossy mantle
#

Have you created custom templates? We do not export any metrics with the names from above so I'm guessing you added these?

Focusing on the question at hand, we are trying to understand if the new license code is working for you. We can't tell if there is a problem with the license code or if there is a problem because of how you redacted the data. We ran the license code across 39 clusters here and see no errors. The current working theory is we see problems when replaying your shared JSON because of how the file was edited.

static vortex
#

AFAIK, those are the license types that are on the cluster

#

no custom template

#

just whatever came in the nightly that I installed on 3/3

#

I'm guessing a leftover license from a nonexistent node?

glossy mantle
#

cool, so seems like the license feature is working for you and you don't have the license_expiry_time metric because none of your licenses have expirations

static vortex
#

Would this generate the above?

Owner: none
Installed License: ONTAP One with Encryption

glossy mantle
#

yes since that's what the json from ONTAP includes (assuming that wasn't one of your edits)

static vortex
#

this is exactly why I want this license dashboard. It seems that when EOL'ing equipment we don't seem to remove the license for those old systems and they are causing this issue
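
A rough sketch of the kind of check such a dashboard would support: list licenses whose owner is not one of the cluster's active nodes. This assumes the license JSON nests per-node entries under records[].licenses[] with owner and serial_number fields. The helper name, node names, and sample data are all invented for illustration.

```python
def licenses_without_active_owner(doc, active_nodes):
    """List (name, owner, serial_number) for licenses not owned by an active node."""
    orphans = []
    for record in doc.get("records", []):
        for lic in record.get("licenses", []):
            owner = lic.get("owner", "none")
            if owner not in active_nodes:
                orphans.append((record.get("name"), owner, lic.get("serial_number")))
    return orphans

# Invented sample shaped like the assumed layout; not real cluster data.
doc = {"records": [
    {"name": "nfs", "scope": "node", "licenses": [
        {"owner": "node-a", "serial_number": "111"},
        {"owner": "none", "serial_number": "222"},
    ]},
]}
orphans = licenses_without_active_owner(doc, active_nodes={"node-a", "node-b"})
```

Licenses that are not scoped to a node may legitimately have an owner that is not a node name, so a real check would probably filter on scope first.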

#

owners / nodes were redacted, not removed

#

any way to set the license query to create unique instance keys when owner: none?

#

I suspect this will be a larger problem than what this one cluster has

glossy mantle
#

hard to say without the actual json

static vortex
#

which part of the json would be needed?

glossy mantle
#

we would prefer the json from ontap

#

otherwise we have to guess what you have changed and what you haven't

#

which makes dev and debugging difficult

static vortex
#

/usr/bin/jq '.records[].licenses[].host_id = "ABCDEF"' license.json | /usr/bin/jq '.records[].licenses[].serial_number ="GHIJKL"' | /usr/bin/jq '.records[].licenses[].capacity.maximum_size = "1234567890"' > license.json.cleaned

#

this is what I changed

#

I also ran a sed, but that only matched whatever in the file needed redacting to XXXYYY for the owner; if the owner was already none, it didn't match

#

that's why I left the identifying node info at the end of XXXYYY so it was clear it was a different system that had that license
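
Setting every serial_number to the same constant hides whether the originals were distinct, which matters for the uniqueness question. A possible alternative, sketched below, replaces each value with a salted hash so equal inputs stay equal and different inputs stay different. The `pseudonymize` helper, the salt, and the sample data are hypothetical. Whether hashed values satisfy a given data policy is a separate question.

```python
import copy
import hashlib

def pseudonymize(doc, fields=("host_id", "serial_number"), salt="change-me"):
    """Replace sensitive values with salted hashes so distinct inputs stay distinct."""
    cleaned = copy.deepcopy(doc)  # leave the original document untouched
    for record in cleaned.get("records", []):
        for lic in record.get("licenses", []):
            for field in fields:
                if field in lic:
                    raw = (salt + str(lic[field])).encode()
                    lic[field] = hashlib.sha256(raw).hexdigest()[:12]
    return cleaned

# Invented sample; not real serial numbers.
doc = {"records": [{"name": "cifs", "scope": "node", "licenses": [
    {"owner": "none", "serial_number": "AAA"},
    {"owner": "none", "serial_number": "BBB"},
    {"owner": "none", "serial_number": "AAA"},
]}]}
cleaned = pseudonymize(doc)
serials = [lic["serial_number"] for lic in cleaned["records"][0]["licenses"]]
```

The truncation to 12 hex characters is only for readability; a longer prefix lowers the already small chance of two different values mapping to the same output.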

#

does that help?

glossy mantle
#

yes, thanks

glossy mantle
#

right now a unique instance is defined by the combination of
licenseName + "_" + scope + "_" + owner
looking at your JSON this will cause duplicates. What if we change uniqueness to
licenseName + "_" + scope + "_" + owner + "_" + installedLicense

No that won't work either. Maybe adding serial_number would do it or is that the node's serial number? Can you check your non-redacted data and see if serial_number would address the uniqueness problem
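
To make the collision concrete, here is a small sketch of the two key schemes: the current licenseName + scope + owner, and the same key with serial_number appended. The rows are invented, and whether real serial numbers are distinct depends on the actual data, so treat this as an illustration rather than Harvest's implementation.

```python
from collections import Counter

def instance_key(name, scope, owner, serial=None):
    """Join the key parts with underscores; serial is optional."""
    parts = [name, scope, owner]
    if serial is not None:
        parts.append(serial)
    return "_".join(parts)

def duplicates(keys):
    """Return the keys that occur more than once."""
    return sorted(k for k, n in Counter(keys).items() if n > 1)

# Invented rows: (name, scope, owner, serial_number)
rows = [
    ("cifs", "node", "none", "1234567890"),
    ("cifs", "node", "none", "1234567891"),
    ("cifs", "node", "node-a", "1234567892"),
]

dups_without_serial = duplicates(instance_key(n, s, o) for n, s, o, _ in rows)
dups_with_serial = duplicates(instance_key(*row) for row in rows)
```

With these rows the three-part key collides on cifs_node_none, while the four-part key does not.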

static vortex
#

either serial or host_id, both are set

#

both match, if that matters

#

what I would do to test this, is pick a serial number of a host no longer in your cluster and add the license to the cluster

#

it should set owner to 'none'

glossy mantle
#

can you check if the combination of licenseName, scope, owner, serial is unique? What we've learned is the combination of licenseName, scope, and owner is NOT unique with your data. licenseName is "name" from the JSON

static vortex
#

licenseName is referring to this, correct?

"records": [
{
"name": "nfs",

#

vs

"records": [
{
"name": "cifs",

glossy mantle
#

yes for some reason my image showing that is uploading very slowly

static vortex
#

nfs_node_none_1234567890
nfs_node_none_1234567891

cifs_node_none_1234567890
cifs_node_none_1234567891

#

such as that?

glossy mantle
#

yes

#

I can't see your actual values so I can't see if there will be duplicates, that's why I'm asking if you can check

static vortex
#

at least what I am seeing, serials are unique throughout per licenseName

glossy mantle
#

thanks! I'll update the Harvest code to use that, test with CI, and let you know when there is a new nightly with these changes

static vortex
#

I'm re-validating with JQ

glossy mantle
#

good idea

static vortex
#

verified

#

cat license.json | /usr/intel/bin/jq -r '.records[] | .name as $name | .scope as $scope | .licenses[] | "\($name)_\($scope)_\(.owner)_\(.serial_number)"' | wc -l

glossy mantle
#

thank you!

static vortex
#

cat license.json | /usr/intel/bin/jq -r '.records[] | .name as $name | .scope as $scope | .licenses[] | "\($name)_\($scope)_\(.owner)_\(.serial_number)"' | sort -u | wc -l

#

both return 58 records

glossy mantle
#

code is submitted and running through CI

static vortex
#

I'm guessing this will appear tomorrow

#

in the rpm build

glossy mantle
#

it will be available in a few minutes

static vortex
#

didn't expect that

glossy mantle
static vortex
#

downloaded and updated

#

I'll report back later today

static vortex
#

so far all I am seeing is:

msg="lagging behind schedule"

#

and it reports the time, which for this specific cluster is ~40-44s

#

and clearly there was an issue before:

1512 license_ based metrics

now:

2088 license_ based metrics

#

no more ERRORs though

glossy mantle
#

excellent, no errors is good. Does that number of instances match what you see from the ONTAP CLI when running system license show? If you persistently see the lagging message you may want to increase the client_timeout https://netapp.github.io/harvest/nightly/help/troubleshooting/#client_timeout for the conf/rest/9.12.0/license.yaml template.

Can you grep your log files for the pattern Rest:License and paste the results? I'd like to see what it reports for numCalls, metrics, apiMs, etc. It should look something like this

time=2026-03-04T14:05:39.334-05:00 level=INFO source=collector.go:620 msg=Collected Poller=sar collector=Rest:License apiMs=778 bytesRx=9951 calcMs=0 exportMs=2 instances=20 instancesExported=20 metrics=110 metricsExported=38 numCalls=1 parseMs=0 pluginInstances=0 pluginMs=0 pollMs=778 renderedBytes=8607 zBegin=1772651138554

static vortex
#

time=2026-03-04T16:03:01.131-08:00 level=INFO source=collector.go:620 msg=Collected Poller=collector_node collector=Rest:License apiMs=225312 bytesRx=214570 calcMs=0 exportMs=5 instances=18 instancesExported=522 metrics=96 metricsExported=2088 numCalls=12 parseMs=1 pluginInstances=0 pluginMs=5 pollMs=225318 renderedBytes=384838 zBegin=1772668755808
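
For reading these Collected lines, a small sketch that splits the key=value fields and converts apiMs to minutes. The log line below is an abridged copy of the one above; `parse_logfmt` is a made-up helper name, and `shlex` is used only because it handles quoted values such as msg="lagging behind schedule".

```python
import shlex

def parse_logfmt(line):
    """Split a key=value log line into a dict; quoted values may contain spaces."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens without an '='
            fields[key] = value
    return fields

# Abridged from the log line in this thread.
line = (
    "time=2026-03-04T16:03:01.131-08:00 level=INFO source=collector.go:620 "
    "msg=Collected Poller=collector_node collector=Rest:License apiMs=225312 "
    "bytesRx=214570 instances=18 instancesExported=522 metrics=96 "
    "metricsExported=2088 numCalls=12 pollMs=225318"
)
fields = parse_logfmt(line)
api_minutes = int(fields["apiMs"]) / 60_000
metrics_per_instance = int(fields["metricsExported"]) / int(fields["instancesExported"])
```

Here apiMs works out to a little over 3.75 minutes, and metricsExported divided by instancesExported is 4, which is consistent with seeing four of the five license metrics.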

#

I'll change the client_timeout and see if that occurs again, likely tomorrow sometime

hearty crow
static vortex
#

this is done, I'll report back if I see any errors in the log

static vortex
#

time=2026-03-05T10:16:55.310-08:00 level=INFO source=helpers.go:113 msg="best-fit template" Poller=node collector=Rest:License path=conf/rest/9.12.0/license_custom.yaml v=9.16.1 jitter=none
time=2026-03-05T10:20:40.994-08:00 level=WARN source=collector.go:580 msg="lagging behind schedule" Poller=node collector=Rest:License lag=43.758232018s

license_custom.yaml difference:

--- license.yaml 2026-03-05 01:35:55.000000000 -0800
+++ license_custom.yaml 2026-03-05 10:16:08.102708515 -0800
@@ -14,3 +14,4 @@
  - License

export_data: false
+client_timeout: 2m

#

still getting the lagging behind schedule message

#

not sure why this is occurring

glossy mantle
#

hey @static vortex it looks like it's taking roughly 3.75 minutes for ONTAP to provide the license info for this cluster. Since this collector runs every 3m by default, it will lag. You can live with the warning or lengthen the schedule for this template so it runs every 4m instead of every 3m. If you want to change that, add the following lines below your client_timeout line

schedule:
  - counter: 24h  # This handles cases such as cluster upgrades or collector cache updates.
  - data: 4m

this will override the default in your rest/default.yaml template
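
A back-of-the-envelope sketch for picking the data interval from a measured pollMs. It assumes the lag is roughly the poll duration minus the schedule interval, which is a simplification and not something taken from the Harvest code. Both function names are hypothetical.

```python
import math

def suggest_schedule_minutes(poll_ms, headroom=0.0):
    """Smallest whole-minute interval covering poll_ms plus fractional headroom."""
    return math.ceil(poll_ms * (1 + headroom) / 60_000)

def expected_lag_seconds(poll_ms, schedule_minutes):
    """Seconds by which a poll overruns its interval (0 if it fits)."""
    return max(0.0, poll_ms / 1000 - schedule_minutes * 60)

poll_ms = 225318  # pollMs from the Collected log line earlier in this thread
interval = suggest_schedule_minutes(poll_ms)
lag_at_3m = expected_lag_seconds(poll_ms, 3)
lag_at_4m = expected_lag_seconds(poll_ms, 4)
```

For this cluster the sketch lands on the same 4m interval suggested above. A cluster whose polls sit right at the interval boundary would need some headroom to avoid occasional warnings.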

static vortex
#

never mind, I was confused

#

I'll remove the client_timeout and add the schedule change

glossy mantle
#

sounds good

static vortex
#

because I don't expect the licenses to change often, we could bump this up to twice the collector time and it shouldn't have any ill effect

#

which I may do

#

here's the timing from startup until it exported metrics:

time=2026-03-05T10:37:55.996-08:00 level=INFO source=helpers.go:113 msg="best-fit template" Poller=node collector=Rest:License path=conf/rest/9.12.0/license_custom.yaml v=9.16.1 jitter=none
time=2026-03-05T10:37:56.630-08:00 level=INFO source=collector.go:652 msg=Collected Poller=node collector=Rest:License task=counter apiMs=241 bytesRx=0 metrics=0 numCalls=0 pollMs=276 zBegin=1772735876354
time=2026-03-05T10:41:53.975-08:00 level=INFO source=collector.go:620 msg=Collected Poller=node collector=Rest:License apiMs=237338 bytesRx=214570 calcMs=0 exportMs=4 instances=18 instancesExported=522 metrics=96 metricsExported=2088 numCalls=12 parseMs=2 pluginInstances=0 pluginMs=3 pollMs=237342 renderedBytes=430252 zBegin=1772735876630

glossy mantle
static vortex
#

interesting

#

so far at 4m, no behind schedule alert

#

I found a few more that were lagging and added the override for them, taking longer than 3m to run

#

one was right at 4m +- 10 seconds

#

may get a few alerts, but much less

glossy mantle
#

the reason we log that as a warning instead of error is because it's not crucial - when lag happens, it means the current collector is still running when the "next" schedule pops. That's OK since the previous one is still running. We don't run another at the same time, and once the previous one finishes, it will be scheduled to run again as soon as it is done

static vortex
#

I see, so when the current collector is running and you launch on the 3m time schedule, it starts tracking how long it took before it could start the new one and publishes that, interesting

#

the interesting part to me is it must be comparing start time against end time and how far over the schedule interval that is, because if what you say is true and it waits, I would think lags would get longer and longer, up to 3m

glossy mantle
#

not quite - each collector runs in its own goroutine (like a thread). The collector runs a for(true) loop. That loop iterates through the list of tasks (tasks are the items in the schedule from above: counter, data, etc.)

If the task is not due, continue to next task.
If the task is due, run it

At the end of the loop, check what time the next task is due. If that time is in the future, sleep until then. If that time is in the past, log the lag message, and continue back to the top of the forever loop.

code is https://github.com/NetApp/harvest/blob/7b207467fdaf97d551891a47157e3ed77ae42e7f/cmd/poller/collector/collector.go#L579
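
The loop described above can be simulated with a fake clock. This is a simplified sketch, not the Harvest code: it assumes each task's next due time is measured from when its last run started, and the `Task` and `simulate` names are invented. Under that assumption the lag stays constant from cycle to cycle instead of growing.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Task:
    name: str
    interval: float  # seconds between scheduled starts
    duration: float  # simulated run time in seconds
    last_start: Optional[float] = None  # None means never run, so due immediately

    def next_due(self) -> float:
        # Assumption: due time is measured from the start of the last run.
        return 0.0 if self.last_start is None else self.last_start + self.interval

def simulate(tasks, iterations):
    """Run the forever-loop for a few iterations and return the logged lags."""
    clock = 0.0
    lags = []
    for _ in range(iterations):
        for task in tasks:
            if task.next_due() > clock:
                continue  # not due yet, move on to the next task
            task.last_start = clock
            clock += task.duration  # tasks run one at a time, never concurrently
        next_due = min(t.next_due() for t in tasks)
        if next_due > clock:
            clock = next_due  # "sleep" until the next task is due
        else:
            lags.append(clock - next_due)  # "lagging behind schedule"
    return lags

lags_3m = simulate([Task("data", interval=180, duration=224)], iterations=3)
lags_4m = simulate([Task("data", interval=240, duration=224)], iterations=3)
```

With a 224s poll on a 180s schedule, every iteration reports the same 44s lag; on a 240s schedule the loop sleeps instead and reports none.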


static vortex
#

interesting

glossy mantle
#

this also has the advantage of making each task single threaded within the collector. Meaning that pollCounter will run before pollData. Both will never run at the same time

static vortex
#

still getting some random lags on license and volume under REST, but I suspect that we were seeing this before as well

glossy mantle
#

yep

static vortex
#

I'm sure that load is a factor on how long these run anyways, so I would expect some lag when load is higher

glossy mantle
#

agreed

hearty crow
static vortex
#

most of the time it's not noticeable. My point is that customer load affects how long these take to return as much as the amount of data there is to return. I am less concerned about how long the calls take as long as they mostly complete