#harvestsnapmirror.yaml at main · NetApp...
hi @stable crater the source_node is probably added by the plugin listed on line 37. Let me check
what version of Harvest are you on @stable crater ?
line 27 lists source-node and I believe the zapi code converts dashes to underscores, which is why source_node is referenced later. hat tip to @shadow topaz for reminding me of that conversion
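A minimal sketch of the dash-to-underscore conversion described above. The helper name to_label_name is hypothetical and this is not Harvest's actual code; it only shows why source-node in the template turns up as source_node in the metrics.

```shell
# Hypothetical helper: mimic the dash-to-underscore conversion described above.
to_label_name() {
  printf '%s\n' "$1" | tr '-' '_'
}

to_label_name "source-node"   # prints source_node
```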
sure, but source_node = ""
at least in our poller data; this is a dump of zapi -n POLLER show data --api snapmirror-get-iter
right, that's the real problem. we've been improving this code over the last week. let me dig up the PR and see if that fixes your problem. This is the issue https://github.com/NetApp/harvest/issues/1192 which has been fixed for REST. @civic badge is working on a ZAPI fix now
curl -s 'http://localhost:13003/metrics' | grep ^snap | grep source_node | grep 'source_node=""' | wc -l
0
hold on, it's not ^snap anymore, it's ^netapp_snap
ok, here's the source data
curl -s 'http://localhost:13003/metrics' | grep ^netapp_snap | wc -l
18204
total records
by chance are your snapmirror relationships inter-cluster? If so, that's what that issue is about and is not fixed for ZAPI yet
no, not inter-cluster, these are snapmirrors to external clusters that house 'snapvault'
gotcha. ok, in that case, the zapi won't see the external cluster - it's explained in the issue
Snapmirror relationships are destination driven. So, snapmirror info is available in zapi/cli/rest destination cluster, but no info available from source cluster via zapi/cli/rest.
that's the bug that @civic badge is fixing in the Zapi plugin. Was fixed earlier this week for REST
right, this is a destination cluster that I am querying
not the source of the data
ok, got it to sort out the counts:
Here's the count of everything that is a netapp_snap metric (not policy) and has source_node=""
curl -s 'http://localhost:13003/metrics' | grep ^netapp_snap | grep source_node | grep -v netapp_snapshot_policy | grep 'source_node=""' | wc -l
18200
here's checking what shows up with source_node != ""
curl -s 'http://localhost:13003/metrics' | grep ^netapp_snap | grep source_node | grep -v netapp_snapshot_policy | grep -v 'source_node=""' | wc -l
0
clearly missing data
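The two grep pipelines above can be folded into one pass. This is a sketch only: count_source_node is a hypothetical helper name, and the patterns simply mirror the greps in this thread (metrics starting with netapp_snap, excluding netapp_snapshot_policy).

```shell
# Reads Prometheus text-format metrics on stdin and prints two counts:
# series with an empty source_node label and series with a non-empty one.
count_source_node() {
  awk '
    /^netapp_snap/ && !/^netapp_snapshot_policy/ && /source_node="/ {
      if ($0 ~ /source_node=""/) empty++; else filled++
    }
    END { printf "empty=%d filled=%d\n", empty, filled }
  '
}

# Made-up sample data; against a live poller you would pipe curl output instead:
#   curl -s 'http://localhost:13003/metrics' | count_source_node
printf '%s\n' \
  'netapp_snapmirror_labels{cluster="c2",source_node=""} 1' \
  'netapp_snapmirror_labels{cluster="c2",source_node="n1"} 1' \
  'netapp_snapshot_policy_labels{cluster="c2",source_node=""} 1' \
  | count_source_node   # prints empty=1 filled=1
```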
the reason I found this is that the "dashboard" in harvest for it presents back "Source - " data in a graph
yes, data will be missing until the bug is fixed
I pasted above but may have gotten missed. This is the same issue you are hitting. That issue has been fixed for the REST collector (because REST returns a bit more information) and @civic badge is fixing for ZAPI now
I see the issue, but don't see comments about zapi in that issue
the one you sent shows 'fixed:' and closed
maybe I missed that
I reopened it
is there a timeline for the next release? I noticed that there are nightly builds but I don't see when / what decides a release is coming
Harvest follows the release cycle described here https://github.com/NetApp/harvest/blob/main/SUPPORT.md#harvests-release-and-support-lifecycle
We are currently planning an incremental release of 22.08 in the next couple of weeks.
ah, ok. Though, with having these issues, is it recommended to deviate from 'releases' and use the 'nightly builds'? I would like to have this fixed but not sure what is different between them other than likely a code freeze on that branch you use for it
nightly is built from the main branch and passes all CI and unit tests. We work hard to keep main green because it's the best way to get feedback from customers on current features and fixes. Generally it's safe to use main, but of course, sometimes things slip through. Documentation may lag behind features in main, and main will go through more testing before being promoted to a release.
The snapmirror issue you'd like fixed is still in review for the ZAPI side and hasn't hit main yet https://github.com/NetApp/harvest/pull/1307
hi @stable crater this change has passed CI and been posted to nightly https://github.com/NetApp/harvest/releases/tag/nightly We also updated the FAQ to describe why the Prometheus metric snapmirror_labels has an empty source_node and how the dashboards work around that https://github.com/NetApp/harvest/wiki/FAQ#why-do-my-snapmirror_labels-have-an-empty-source_node If you get a chance to try it out, let us know how it works or yell if something isn't clear in the FAQ
Thank you. I will try it out today
getting the same result with
curl -s 'http://localhost:13003/metrics' | grep ^netapp_snap | grep source_node | grep -v netapp_snapshot_policy | grep 'source_node=""' | wc -l
16996
here's checking what shows up with source_node != ""
curl -s 'http://localhost:13003/metrics' | grep ^netapp_snap | grep source_node | grep -v netapp_snapshot_policy | grep -v 'source_node=""' | wc -l
0
it's down from 18200
the snapmirror dashboard is populated now though....
for some of the clusters, not all of them
hi @stable crater progress! you won't see the source nodes via the curl, that was covered in the FAQ. Glad to hear the dashboard is populated. Would be better if it was populated for all clusters 🙂 let's see if we can figure out why
@stable crater could you share the poller logs from the clusters which haven't been populated in the snapmirror dashboard? Also could you run this command on a poller which hasn't been populated, just to confirm how many relationships exist there: curl -s 'http://localhost:xxxxx/metrics' | grep ^netapp_snapmirror_labels | wc -l
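The wc -l above counts series. If distinct relationships are wanted instead (closer to the count by (relationship_id) the dashboard uses), a sketch along these lines may help. count_relationships is a hypothetical helper, and the default metric name assumes the netapp_ prefix used in this thread.

```shell
# Counts distinct relationship_id values for one metric in Prometheus text
# format read from stdin. The default metric name assumes a netapp_ prefix.
count_relationships() {
  metric="${1:-netapp_snapmirror_labels}"
  grep "^${metric}{" \
    | grep -o 'relationship_id="[^"]*"' \
    | sort -u \
    | wc -l \
    | tr -d ' '
}

# Made-up sample data: three series, two distinct relationships.
printf '%s\n' \
  'netapp_snapmirror_labels{relationship_id="a1",source_node=""} 1' \
  'netapp_snapmirror_labels{relationship_id="a1",source_node="n1"} 1' \
  'netapp_snapmirror_labels{relationship_id="b2",source_node=""} 1' \
  | count_relationships   # prints 2
```

Without a custom prefix, pass the metric name explicitly, for example count_relationships snapmirror_labels.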
do you want the ones that have source_node = "" ?
or just all of them for that poller?
the snapmirror dashboard isn't populating the 'source' filer when trying to select a node that is the destination of the snapmirrors (vault)
I took the snapmirror dashboard from the github site but not sure on what version I should be using of the dashboard
pulled the latest version from github
still broken
"expr": "count (count by (relationship_id) (snapmirror_labels{datacenter="$Datacenter",cluster=~"$Cluster"}))",
there is no metric called 'snapmirror_labels' anymore
it's netapp_snapmirror_labels....
and everything else in the dashboard that has 'snapmirror_' needs replaced with 'netapp_snapmirror_'
I suspect all the other dashboards are also in that same state, where / when the netapp_ was added, those weren't updated
Hi @stable crater I would need the logs for all the snapmirrors on that poller. Regarding the statement "the snapmirror dashboard isn't populating the 'source' filer when trying to select a node that is the destination of the snapmirrors (vault)": with the fix for https://github.com/NetApp/harvest/issues/1192, the snapmirror dashboard now shows the source-side view. A simple example: if there is a relationship from volume v1 (source) in cluster C1 to volume v2 (destination) in cluster C2, the relationship is visible only when you select cluster C1 in the Cluster variable dropdown, not C2. In the same way, selecting a node shows data only if that node is on the source side of a relationship, not the other way around.
Regarding "there is no metric called 'snapmirror_labels' anymore, it's netapp_snapmirror_labels....": we haven't made any changes for this. Per the zapi template https://github.com/NetApp/harvest/blob/main/conf/zapi/cdot/9.8.0/snapmirror.yaml#L3 and the rest template https://github.com/NetApp/harvest/blob/main/conf/rest/9.12.0/snapmirror.yaml#L5 the object name is snapmirror, which ensures that any metric from this template starts with snapmirror_xxxxxxx. It seems you might have a custom template for snapmirror that makes this change, could you confirm?
@stable crater how are you installing Harvest? we aren't prefixing any of the metrics with netapp_
ah, nevermind then, I thought it was set to that by harvest, it looks like when we moved from the old harvest version to the github one, we used the prefix 'netapp' to match the old one
aka. we should have NOT done that
I was able to change the dashboard to match our prefix
sed is your friend
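For anyone wondering what that sed rewrite might look like, here is a rough sketch. add_prefix is a hypothetical helper, netapp_ is the custom prefix from this thread, and the two metric families listed are only examples; a real dashboard references many more, so the list would need extending.

```shell
# Hypothetical helper: add a metric prefix to metric names in dashboard JSON
# read from stdin. Only snapmirror_* and volume_* are handled here.
add_prefix() {
  # A name is rewritten only when it is not preceded by a letter, digit or
  # underscore, so labels like source_volume and already-prefixed names
  # (netapp_snapmirror_labels) are left alone.
  sed -E 's/(^|[^A-Za-z0-9_])(snapmirror_|volume_)/\1netapp_\2/g'
}

printf '%s\n' 'count(snapmirror_labels{source_volume="v1"}) * on(source_volume) volume_labels' \
  | add_prefix
# prints count(netapp_snapmirror_labels{source_volume="v1"}) * on(source_volume) netapp_volume_labels
```

Running it twice gives the same result as running it once, which makes it safer to re-apply after pulling a newer dashboard.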
if you need to do that again, another handy way to rewrite is with the --prefix arg of grafana import
I am using the Grafana import via web browser, not command line, unfortunately
I think I am confused as to how we are getting the source server, is it via the volume labels?
nevermind, I found the issue
well, I found MY issue, due to 'prefix' but the data is still not appearing
it's using 'cluster' to check the 'source' as well, which won't work, since cluster is the 'target' cluster where the snapmirrors are going to. shouldn't we be checking the 'source_volume' against the 'volume' in volume_labels instead?
count by(source_node, relationship_status) (snapmirror_labels{datacenter=~"$Datacenter", source_cluster=~"$Cluster"}) * on(source_volume, source_vserver, source_cluster) label_replace(label_replace(label_replace(label_replace(volume_labels{datacenter=~"$Datacenter", cluster=~"$Cluster", node=~"$SourceNode", node!=""}, "source_volume", "$1", "volume", "(.*)"), "source_vserver", "$1", "svm", "(.*)"), "source_node", "$1", "node", "(.*)"), "source_cluster", "$1", "cluster", "(.*)")
when I read this, the snapmirror labels are looking at the 'source_cluster' of the snapmirrors (which is where the snapmirror is originating from, on the destination side). if you use that same label to search volume_labels, it won't work, since that cluster doesn't contain the 'source volume'
which panel is your pasted expr for?
Source Relationships per Node
this one? I ask because what you pasted does not match the expression in github for that panel and I want to make sure we're talking about the same thing
yep
I pulled this out of the github yesterday
is there a new one?
the entire dashboard
the PR was checked in 2 days ago https://github.com/NetApp/harvest/pull/1307 so you should be good on that, but it included dashboard and snapmirror template/plugin changes so you need to take the entire build, not just the dashboard
where I pulled it from was 'harvest', is there a separate location for the dashboards?
in other words, this list of files changed and some of those changes mean that your poller binary needs to be updated too https://github.com/NetApp/harvest/pull/1307/files
I pulled and installed nightly as suggested and it's working from harvest-22.09.26-nightly.x86_64
ok so you have the new poller and templates too sounds like
yes, bin/harvest --version
harvest version 22.09.26-nightly (commit 81858dac) (build date 2022-09-26T08:11:30-0400) linux/amd64
looks exactly right - that's this commit https://github.com/NetApp/harvest/commit/81858dac0dc21f450ad8b668ea67bae04bb75c28
but the expression you pasted above for the dashboard is missing the group_left at line 1527 in the version you have https://github.com/NetApp/harvest/blob/81858dac0dc21f450ad8b668ea67bae04bb75c28/grafana/dashboards/cmode/harvest_dashboard_snapmirror.json#L1527
is it possible an older version of the dashboard was imported?
it's possible I missed something, double check that panel does not have group_left and maybe double check that the version of the file you download does include it
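A quick way to do that double check from the command line, as a sketch. has_group_left is a hypothetical helper that reads dashboard JSON on stdin; the file path in the comment is taken from the GitHub link above and assumes you run it from the Harvest install directory.

```shell
# Hypothetical check: succeeds if the dashboard JSON on stdin contains a
# group_left join anywhere.
has_group_left() {
  grep -q 'group_left'
}

# Against the downloaded file (path assumed from the link above):
#   has_group_left < grafana/dashboards/cmode/harvest_dashboard_snapmirror.json && echo present
printf '%s\n' '"expr": "a * on (source_volume) group_left(source_node) b"' \
  | has_group_left && echo "group_left present"
```

This only checks the file on disk. The panel inside Grafana still has to be inspected (or the dashboard re-imported) to confirm the imported copy matches.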
weird, I assumed I had the latest version from 'main' but clearly I didn't get the latest one
now that panel is populating
not sure if it's too early for confetti ball so I'll give an enthusiastic 👍
but anytime I select cluster, the panel has 'no data'
ok panel is populating and works until you select a cluster?
Yep, if I set cluster to a node that has snapmirrors, that panel goes to 'no data' immediately
if it has 'all' in the field, it works as expected, to show them all
Here's the query
count by (source_node, relationship_status) (netapp_snapmirror_labels{datacenter=~"$Datacenter",source_cluster=~"$Cluster"} * on (source_volume, source_vserver, source_cluster) group_left(source_node) label_replace( label_replace( label_replace( label_replace (netapp_volume_labels{datacenter=~"$Datacenter",cluster=~"$Cluster",node=~"$SourceNode",node!=""}, "source_volume", "$1", "volume", "(.*)") , "source_vserver", "$1", "svm", "(.*)"), "source_node", "$1", "node", "(.*)") , "source_cluster", "$1", "cluster", "(.*)") )
the source node based on the snapmirror isn't going to be based on the sourcenode for the volume labels
snapmirror is sourced from the destination filer and the source would only be found via volume name, I suspect
and/or cluster
checking
I am working on giving you some data as an example
always helpful, thanks
Mirror location:
netapp_snapmirror_labels{cluster="XXXrcfs70", datacenter="HPC", derived_relationship_type="mirror_vault", destination_location="XXXrcfsv70a:XXXccfs01_build_rw_1a_mirror_vault", destination_node="XXXrcfs70n02a", destination_volume="XXXccfs01_build_rw_1a_mirror_vault", destination_vserver="XXXrcfsv70a", group_type="none", healthy="true", instance="XXXciharv01.fqdn.com:12996", job="harvest_scrape", last_transfer_type="update", policy_type="mirror_vault", protectedBy="volume", protectionSourceType="volume", relationship_id="db1dba11-6082-11e9-acb1-00a098d03d56", relationship_status="idle", relationship_type="extended_data_protection", schedule="sv_1110_2310", source_cluster="XXXccfs01", source_volume="build_rw_1a", source_vserver="XXXccfsv01a"} 1
netapp_volume_labels{aggregate="aggr2_XXXrcfs70n02a_L", cluster="XXXrcfs70", datacenter="HPC", instance="XXXciharv01.fqdn.com:12996", isEncrypted="false", isHardwareEncrypted="false", is_sis_volume="true", job="harvest_scrape", node="XXXrcfs70n02a", protectedBy="not_applicable", protectionRole="destination", snapshot_policy="none", state="online", style="flexvol", svm="XXXrcfsv70a", type="dp", volume="XXXccfs01_build_rw_1a_mirror_vault"}
Source location:
netapp_volume_labels{aggregate="aggr1_XXXrcfs01n50a_H", all_sm_healthy="true", cluster="XXXrcfs01", datacenter="HPC", instance="XXXciharv01.fqdn.com:12992", isEncrypted="false", isHardwareEncrypted="false", is_sis_volume="true", job="harvest_scrape", node="XXXrcfs01n50a", protectedBy="snapmirror", protectionRole="protected", snapshot_policy="2perday_5day_retention", state="online", style="flexvol", svm="XXXrcfsv01a", type="rw", volume="build_rw_1a"}
so between the mirror and the source, the cluster is NOT the same
even though that query assumes they are
hope that helps
source_cluster isn't a label in snapmirror_labels at all
what you pasted includes it?
based on selecting the 'mirror' cluster
so, if in the overall panel, I select datacenter = HPC and cluster=XXXrcfs70, it shows no data, because that is put into the query as 'source_cluster=XXXrcfs70', which doesn't have source data, only mirror data
maybe that is expected?
it seems when I select a cluster node where there IS source volumes, that works fine
yes i follow your example, when you set the variable cluster=XXXrcfs70 you get no data because in the query that becomes source_cluster="XXXrcfs70" which matches nothing. On the other hand, if you set the variable cluster=XXXccfs01 you will get data. It would be clearer if the cluster variable was named SourceCluster
open to suggestions on ways to improve this @stable crater if you have any. Seems like it would be clearer to rename the Cluster variable to SourceCluster, although that's only applicable to some of the panels and not all of them. We could create a separate dashboard to address that variable name change. We could add a hover description on the panels where source_cluster=~"$Cluster". snapmirror_labels{cluster="XXXrcfs70", source_cluster="XXXccfs01"} could be made clearer too, since cluster is really destination_cluster. I think what you want is an or, something like this: snapmirror_labels{source_cluster=~"$Cluster"} or snapmirror_labels{destination_cluster=~"$Cluster"}
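To make the source-side versus destination-side matching concrete, here is a small sketch over one sample series trimmed from the paste above. count_matching and count_either are hypothetical helpers, and they use cluster for the destination side because destination_cluster does not exist as a label today.

```shell
# count_matching LABEL VALUE: count series on stdin whose LABEL equals VALUE.
# The leading [{,] keeps "cluster" from also matching "source_cluster".
count_matching() {
  awk -v re="[{,] ?$1=\"$2\"" '$0 ~ re { n++ } END { print n + 0 }'
}

# count_either VALUE: count series where VALUE is the source_cluster OR the
# (destination) cluster, roughly what the proposed "or" query would select.
count_either() {
  awk -v re="[{,] ?(source_)?cluster=\"$1\"" '$0 ~ re { n++ } END { print n + 0 }'
}

series='snapmirror_labels{cluster="XXXrcfs70", source_cluster="XXXccfs01", source_volume="build_rw_1a"} 1'

printf '%s\n' "$series" | count_matching source_cluster XXXrcfs70   # prints 0 (destination selected)
printf '%s\n' "$series" | count_matching source_cluster XXXccfs01   # prints 1 (source selected)
printf '%s\n' "$series" | count_either XXXrcfs70                    # prints 1 (either side)
```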
is pondering yet
I am trying to understand another relationship that isn't showing up, whether it's something on our end or something in the data in harvest
we have several XDP relationships on source filers and they are NOT appearing in the list from harvest, I am still investigating
I need to run to a meeting, but I'll discuss with the team tomorrow and see if we can improve. Really appreciate you taking the time to try nightly and give us such valuable feedback 💯
sure, I think it should be 'destination_cluster' since that is what we are looking at and what the other queries are pulling out for the list of 'cluster' in that panel, using 'source_cluster' would mean that you'd have to add all clusters to the list, not just those with snapmirror labels
I think I would call it "$Destination_Cluster" as a field name and leave the existing queries the same for 'cluster='
but that would mean changes to the labels, if that is possible. you are correct that in the snapmirror_labels that 'destination_cluster' would be more clear than just 'cluster'
Based on the volume_labels and snapmirror_labels output you pasted, the summary is: volume build_rw_1a, which resides on node XXXrcfs01n50a in cluster XXXrcfs01, is in the protected state. So when you choose cluster XXXrcfs01 and/or node XXXrcfs01n50a, you will find this relationship as well as all other relationships whose source resides on that node and cluster. This is the source-side perspective. For your questions: "so, if in the overall panel, I select datacenter = HPC and cluster=XXXrcfs70, it shows no data, because that is put into the query as 'source_cluster=XXXrcfs70', which doesn't have source data, only mirror data, maybe that is expected?" --> Yes, it's expected
it seems when I select a cluster node where there IS source volumes, that works fine -->Yeah, absolutely.