#License dashboard and data pulled into harvest?
hi @static vortex Harvest doesn't currently collect that information. Please open a feature request with the details of what you need https://github.com/NetApp/harvest/issues
thanks @static vortex
@static vortex Changes are in the latest nightly (https://github.com/NetApp/harvest/releases/tag/nightly) build. A new License dashboard is added.
Thank you. I will attempt to install this and validate it's working at my next opportunity.
When will this be added to a release? May?
thanks @static vortex any feedback you can provide is appreciated. That is correct, the 26.05 release in May
I'll probably have to validate 'license' metrics appear, I won't be able to add it to the dashboard yet as that would require a change notification and review by peers (and we typically do that with major/minor software releases, not the nightly builds)
here is the list of new license metrics that you should see https://github.com/NetApp/harvest/pull/4170/changes#diff-7b54aab7cf2d2c15c25357c5a23e571819791313b56704ad703407533a1ab716
I am collecting the license metrics. I compared against our previous install and it's clearly adding them. Is there anything specific about the metrics I should note (number of them, type, etc.) to confirm it's collecting as expected?
There are 5 license metrics currently.
license_labels
license_capacity_maximum_size
license_capacity_used_percent
license_capacity_used_size
license_expiry_time
license_labels carries the descriptive information about each license. The other metrics are only exported for licenses that have the corresponding data.
Harvest uses the api/cluster/licensing/licenses?fields=* API to collect this information.
https://netapp.github.io/harvest/nightly/ontap-metrics/#license_labels
https://netapp.github.io/harvest/nightly/ontap-metrics/#license_capacity_maximum_size
https://netapp.github.io/harvest/nightly/ontap-metrics/#license_capacity_used_percent
https://netapp.github.io/harvest/nightly/ontap-metrics/#license_capacity_used_size
https://netapp.github.io/harvest/nightly/ontap-metrics/#license_expiry_time
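One way to check that collection is working is to compare the metric names in a scrape against the five listed above. This is a minimal sketch, assuming the poller's Prometheus output has been saved as text. The function name `license_metric_names` and the sample lines (including their label names) are hypothetical and not part of Harvest.

```python
import re

# The five license metrics listed above.
EXPECTED = {
    "license_labels",
    "license_capacity_maximum_size",
    "license_capacity_used_percent",
    "license_capacity_used_size",
    "license_expiry_time",
}

# Metric name at the start of a sample line, before "{" or whitespace.
_NAME = re.compile(r"^(license_[A-Za-z0-9_]*)")


def license_metric_names(exposition_text):
    """Return the set of license_* metric names found in Prometheus text output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        match = _NAME.match(line)
        if match:
            names.add(match.group(1))
    return names


# Hypothetical sample; real text would come from the poller's metrics endpoint.
sample = """\
# TYPE license_labels gauge
license_labels{cluster="c1",name="nfs",owner="none",scope="node"} 1
license_capacity_used_size{cluster="c1",name="nfs"} 1024
"""

found = license_metric_names(sample)
missing = sorted(EXPECTED - found)
print("found:", sorted(found))
print("missing:", missing)
```

A name in `missing` is not necessarily an error, since only license_labels is expected for every license.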
I see all but expiry_time in my list, so I expect that means I don't have any 'temporary' licenses that will expire or have expired? Or would that be for something else?
@static vortex can you check your logs for any errors, run the following curl, and email license.json to ng-harvest-files@netapp.com
Replace $ip, $user, $pass with the appropriate values
curl -k --user "$user:$pass" "https://$ip/api/cluster/licensing/licenses?return_records=true&fields=*" > license.json
email sent, stripped out identifying information per our corporate requirements for data.
@static vortex Thanks for the data. There isn’t an expiry_time field in the response, so it makes sense that you don’t have this metric. Since the data is PII-redacted, I see some errors in the logs when I replay this response on my end, for example:
time=2026-03-04T07:21:28.991+05:30 level=ERROR source=license.go:85 msg="Failed to create instance" Poller=dc-1 plugin=Rest:License object=License error="duplicate instance key => cifs_node_none" key=cifs_node_none
Do you see a similar error? You mentioned that you removed node values from the shared JSON, so it could be related to that as well. I want to confirm whether you’re seeing this error in your logs as well and if this is something that requires a fix.
I get numerous errors of that type
=> anti_ransomware_node_none
=> cifs_node_none
=> fcp_node_none
=> flexclone_node_none
=> iscsi_node_none
=> mt_ek_mgmt_node_none
=> nfs_node_none
=> nvme_of_node_none
=> s3_node_none
=> s3_snapmirror_node_none
=> snaplock_node_none
=> snapmanagersuite_node_none
=> snapmirror_node_none
=> snapmirror_sync_node_none
=> snaprestore_node_none
=> snapvault_node_none
=> tpm_node_none
=> ve_node_none
all are 'duplicate instance key' errors
same line number
Have you created custom templates? We do not export any metrics with the names from above so I'm guessing you added these?
Focusing on the question at hand, we are trying to understand if the new license code is working for you. We can't tell if there is a problem with the license code or if there is a problem because of how you redacted the data. We ran the license code across 39 clusters here and see no errors. The current working theory is we see problems when replaying your shared JSON because of how the file was edited.
AFAIK, those are the license types that are on the cluster
no custom template
just whatever came in the nightly that I installed on 3/3
I'm guessing a leftover license from a nonexistent node?
cool, so seems like the license feature is working for you and you don't have the license_expiry_time metric because none of your licenses have expirations
Would this generate the above?
Owner: none
Installed License: ONTAP One with Encryption
yes since that's what the json from ONTAP includes (assuming that wasn't one of your edits)
this is exactly why I want this license dashboard. It seems that when EOL'ing equipment we don't seem to remove the license for those old systems and they are causing this issue
owners / nodes were redacted, not removed
any way to set the license query to create unique instance keys when owner: none?
I suspect this will be a larger problem than what this one cluster has
hard to say without the actual json
which part of the json would be needed?
we would prefer the json from ontap
otherwise we have to guess what you have changed and what you haven't
which makes dev and debugging difficult
/usr/bin/jq '.records[].licenses[].host_id = "ABCDEF"' license.json | /usr/bin/jq '.records[].licenses[].serial_number ="GHIJKL"' | /usr/bin/jq '.records[].licenses[].capacity.maximum_size = "1234567890"' > license.json.cleaned
this is what I changed
I also did a sed, but that matched whatever was in the file that needed redacted to XXXYYY for the owner, if it was already none, it didn't match it
that's why I left the identifying node info at the end of XXXYYY so it was clear it was a different system that had that license
does that help?
yes, thanks
right now a unique instance is defined by the combination of
licenseName + "_" + scope + "_" + owner
looking at your JSON this will cause duplicates. What if we change uniqueness to
licenseName + "_" + scope + "_" + owner + "_" + installedLicense
No that won't work either. Maybe adding serial_number would do it or is that the node's serial number? Can you check your non-redacted data and see if serial_number would address the uniqueness problem
either serial or host_id, both are set
both match, if that matters
what I would do to test this, is pick a serial number of a host no longer in your cluster and add the license to the cluster
it should set owner to 'none'
can you check if the combination of licenseName, scope, owner, serial is unique? What we've learned is the combination of licenseName, scope, and owner is NOT unique with your data. licenseName is "name" from the JSON
licenseName is referring to this, correct?
"records": [
{
"name": "nfs",
vs
"records": [
{
"name": "cifs",
nfs_node_none_1234567890
nfs_node_none_1234567891
cifs_node_none_1234567890
cifs_node_none_1234567891
such as that?
yes
I can't see your actual values so I can't see if there will be duplicates, that's why I'm asking if you can check
at least what I am seeing, serials are unique throughout per licenseName
thanks! I'll update the Harvest code to use that, test with CI, and let you know when there is a new nightly with these changes
I'm re-validating with JQ
good idea
verified
cat license.json | /usr/intel/bin/jq -r '.records[] | .name as $name | .scope as $scope | .licenses[] | "\($name)_\($scope)_\(.owner)_\(.serial_number)"' | wc -l
thank you!
cat license.json | /usr/intel/bin/jq -r '.records[] | .name as $name | .scope as $scope | .licenses[] | "\($name)_\($scope)_\(.owner)_\(.serial_number)"' | sort -u | wc -l
both return 58 records
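The uniqueness check performed by the two jq pipelines above can also be sketched in Python, which makes it easy to list the offending keys rather than only count them. This is an illustration and not Harvest's code: `instance_keys` and `duplicates` are hypothetical helpers, and the sample data is made up to match the shape of the fragments quoted in this thread.

```python
import json
from collections import Counter


def instance_keys(doc, with_serial):
    """Build one key per license entry, mirroring the jq expressions above."""
    keys = []
    for record in doc.get("records", []):
        name = record.get("name")
        scope = record.get("scope")
        for lic in record.get("licenses", []):
            parts = [name, scope, lic.get("owner")]
            if with_serial:
                parts.append(lic.get("serial_number"))
            keys.append("_".join(str(p) for p in parts))
    return keys


def duplicates(keys):
    """Return the keys that occur more than once, sorted."""
    return sorted(k for k, n in Counter(keys).items() if n > 1)


# Made-up data shaped like the JSON fragments quoted in this thread.
sample = json.loads("""
{"records": [
  {"name": "nfs", "scope": "node", "licenses": [
    {"owner": "none", "serial_number": "1234567890"},
    {"owner": "none", "serial_number": "1234567891"}]},
  {"name": "cifs", "scope": "node", "licenses": [
    {"owner": "none", "serial_number": "1234567890"},
    {"owner": "none", "serial_number": "1234567891"}]}
]}
""")

without_serial = duplicates(instance_keys(sample, with_serial=False))
with_serial = duplicates(instance_keys(sample, with_serial=True))
print(without_serial)  # keys that collide on name + scope + owner
print(with_serial)     # keys that still collide once serial_number is added
```

With this sample, the three-part key collides for both license names, and adding serial_number removes every collision.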
code is submitted and running through CI
it will be available in a few minutes
didn't expect that
changes are in latest nightly https://github.com/NetApp/harvest/releases/tag/nightly
so far all I am seeing is:
msg="lagging behind schedule"
and it reports the time, which for this specific cluster is ~40-44s
and clearly there was an issue before:
1512 license_ based metrics
now:
2088 license_ based metrics
no more ERRORs though
Excellent, no errors is good. Does that number of instances match what you see from the ONTAP CLI when running system license show? If you persistently see the lagging message, you may want to increase the client_timeout https://netapp.github.io/harvest/nightly/help/troubleshooting/#client_timeout for the conf/rest/9.12.0/license.yaml template.
Can you grep your log files for the pattern Rest:License and paste the results? I'd like to see what it reports for numCalls, metrics, apiMs, etc. It should look something like this
time=2026-03-04T14:05:39.334-05:00 level=INFO source=collector.go:620 msg=Collected Poller=sar collector=Rest:License apiMs=778 bytesRx=9951 calcMs=0 exportMs=2 instances=20 instancesExported=20 metrics=110 metricsExported=38 numCalls=1 parseMs=0 pluginInstances=0 pluginMs=0 pollMs=778 renderedBytes=8607 zBegin=1772651138554
time=2026-03-04T16:03:01.131-08:00 level=INFO source=collector.go:620 msg=Collected Poller=collector_node collector=Rest:License apiMs=225312 bytesRx=214570 calcMs=0 exportMs=5 instances=18 instancesExported=522 metrics=96 metricsExported=2088 numCalls=12 parseMs=1 pluginInstances=0 pluginMs=5 pollMs=225318 renderedBytes=384838 zBegin=1772668755808
I'll change the client_timeout and see if that occurs again, likely tomorrow sometime
@static vortex There is one more fix in the nightly build for a license-related issue:
https://github.com/NetApp/harvest/releases/tag/nightly
Please upgrade to this build and let us know if you encounter any issues.
this is done, I'll report back if I see any errors in the log
time=2026-03-05T10:16:55.310-08:00 level=INFO source=helpers.go:113 msg="best-fit template" Poller=node collector=Rest:License path=conf/rest/9.12.0/license_custom.yaml v=9.16.1 jitter=none
time=2026-03-05T10:20:40.994-08:00 level=WARN source=collector.go:580 msg="lagging behind schedule" Poller=node collector=Rest:License lag=43.758232018s
license_custom.yaml difference:
--- license.yaml 2026-03-05 01:35:55.000000000 -0800
+++ license_custom.yaml 2026-03-05 10:16:08.102708515 -0800
@@ -14,3 +14,4 @@
- License
export_data: false
+client_timeout: 2m
still getting the lagging behind schedule message
not sure why this is occurring
hey @static vortex it looks like it's taking roughly 3.75 minutes for ONTAP to provide the license info for this cluster. Since this collector runs every 3m by default, it will lag. You can live with the warning or lengthen the schedule for this template so it runs every 4m instead of every 3m. If you want to change that, add the following lines below your client_timeout line
schedule:
- counter: 24h # This handles cases such as cluster upgrades or collector cache updates.
- data: 4m
this will override the default in your rest/default.yaml template
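The arithmetic behind that suggestion can be checked from a Collected log line. This is a rough sketch, assuming the key=value layout shown in the logs above. `parse_logfmt` and `expected_lag_seconds` are hypothetical helpers, and lag is estimated here simply as poll time minus the schedule interval, which is an assumption rather than Harvest's exact calculation.

```python
import shlex


def parse_logfmt(line):
    """Split a key=value log line into a dict; quoted values are kept whole."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def expected_lag_seconds(poll_ms, interval_seconds):
    """Estimate lag as how far the poll overran its interval (never negative)."""
    return max(0.0, poll_ms / 1000.0 - interval_seconds)


# Shortened copy of a Collected line from earlier in the thread.
line = ('time=2026-03-04T16:03:01.131-08:00 level=INFO msg=Collected '
        'collector=Rest:License apiMs=225312 numCalls=12 pollMs=225318')

fields = parse_logfmt(line)
poll_ms = int(fields["pollMs"])
poll_minutes = round(poll_ms / 60000, 2)
lag_3m = round(expected_lag_seconds(poll_ms, 3 * 60), 1)
lag_4m = expected_lag_seconds(poll_ms, 4 * 60)
print(poll_minutes, lag_3m, lag_4m)
```

For this line the poll overruns a 3m schedule by about 45 seconds, in the same range as the lag value logged above, and fits inside a 4m schedule.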
never mind, I was confused
I'll remove the client_timeout and add the schedule change
sounds good
because I don't expect the licenses to change often, we could bump this up to twice the collector time and it shouldn't have any ill effect
which I may do
here's the timing from startup until it exported metrics:
time=2026-03-05T10:37:55.996-08:00 level=INFO source=helpers.go:113 msg="best-fit template" Poller=node collector=Rest:License path=conf/rest/9.12.0/license_custom.yaml v=9.16.1 jitter=none
time=2026-03-05T10:37:56.630-08:00 level=INFO source=collector.go:652 msg=Collected Poller=node collector=Rest:License task=counter apiMs=241 bytesRx=0 metrics=0 numCalls=0 pollMs=276 zBegin=1772735876354
time=2026-03-05T10:41:53.975-08:00 level=INFO source=collector.go:620 msg=Collected Poller=node collector=Rest:License apiMs=237338 bytesRx=214570 calcMs=0 exportMs=4 instances=18 instancesExported=522 metrics=96 metricsExported=2088 numCalls=12 parseMs=2 pluginInstances=0 pluginMs=3 pollMs=237342 renderedBytes=430252 zBegin=1772735876630
that will work, but you will need to be mindful of the Prometheus exporter's cache_max_keep value of 5m. This duration is the maximum amount of time metrics are cached https://netapp.github.io/harvest/nightly/prometheus-exporter/#parameters
interesting
so far at 4m, no behind schedule alert
I found a few more that were lagging and added the override for them, taking longer than 3m to run
one was right at 4m +- 10 seconds
may get a few alerts, but much less
the reason we log that as a warning instead of error is because it's not crucial - when lag happens, it means the current collector is still running when the "next" schedule pops. That's OK since the previous one is still running. We don't run another at the same time, and once the previous one finishes, it will be scheduled to run again as soon as it is done
I see, so when the current collector is running and you launch on the 3m time schedule, it starts tracking how long it took before it could start the new one and publishes that, interesting
the interesting part to me is it must be comparing start time against end time and how long over the schedule interval that is, because if what you say is true and it waits, i would think lags would get longer and longer up to 3m
not quite - each collector runs in its own goroutine (like a thread). The collector runs a for(true) loop. That loop iterates through the list of tasks (tasks are the items in the schedule from above: counter, data, etc.)
If the task is not due, continue to next task.
If the task is due, run it
At the end of the loop, check what time the next task is due. If that time is in the future, sleep until then. If that time is in the past, log the lag message, and continue back to the top of the forever loop.
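The loop described above can be simulated on a fake clock to see why the lag stays flat instead of growing. This is a sketch in Python for brevity and is not Harvest's Go code. `Task` and `run_collector` are hypothetical names, and computing the next due time from the task's start time is an assumption made for illustration.

```python
class Task:
    def __init__(self, name, interval, duration):
        self.name = name
        self.interval = interval  # seconds between scheduled starts
        self.duration = duration  # simulated seconds one run takes
        self.next_due = 0.0       # due immediately at startup


def run_collector(tasks, stop_at):
    """Simulate the forever loop on a fake clock; return the events it produced."""
    now = 0.0
    events = []
    while now < stop_at:  # the real loop never exits; this bound keeps the sketch finite
        for task in tasks:
            if task.next_due > now:
                continue  # not due: continue to the next task
            start = now
            now += task.duration  # "run" the task; tasks never overlap
            task.next_due = start + task.interval  # assumption: counted from start
            events.append(("ran", task.name, start))
        next_due = min(t.next_due for t in tasks)
        if next_due > now:
            now = next_due  # stands in for sleeping until the next task is due
        else:
            events.append(("lag", round(now - next_due, 3)))  # log and loop again
    return events


# A data poll that takes 225s on a 180s (3m) schedule, then on a 240s (4m) one.
lags_3m = [e[1] for e in run_collector([Task("data", 180, 225)], 900) if e[0] == "lag"]
lags_4m = [e[1] for e in run_collector([Task("data", 240, 225)], 900) if e[0] == "lag"]
print(lags_3m)
print(lags_4m)
```

Under these assumptions every 3m cycle reports the same overrun, because the overdue task restarts as soon as the previous run finishes and its next due time is reset from that restart. The 4m schedule reports no lag.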
interesting
this also has the advantage of making each task single threaded within the collector. Meaning that pollCounter will run before pollData. Both will never run at the same time
still getting some random lags on license and volume under REST, but I suspect that we were seeing this before as well
yep
I'm sure that load is a factor on how long these run anyways, so I would expect some lag when load is higher
agreed
@static vortex You can try limiting concurrent collectors to, say, 40 if it helps with the lag. This may help if the issue is related to slow ONTAP response times due to a high number of API calls.
https://netapp.github.io/harvest/26.02/configure-harvest-basic/#pool
most of the time it's not noticeable. My point is that customer load affects how long these take to return, as much as how much data there is to return. I am less concerned about how long the calls take as long as they mostly complete