#No Metric after ONTAP upgrade
@thorn yacht Indeed, in Harvest 23.05, we implemented a new check to handle cases where an API request is rejected by ONTAP. As a result, the retry interval has been set to 24 hours. Unfortunately, this change has led to the scenarios you've reported.
Could you please open a bug for this?
https://github.com/NetApp/harvest/blob/main/cmd/poller/collector/collector.go#L359-L365
What I've seen happen is that nabox loses connectivity to the cluster. In current releases, it retries. In previous releases, it wouldn't even do that.
I always try to remember and check nabox following a cluster upgrade. If it's not capturing data, restart nabox.
Thanks @snow rampart report is open https://github.com/NetApp/harvest/issues/2175
Thanks @thorn yacht
Thanks @hallow thunder. Harvest has had retries from the start, so perhaps a different issue was causing that. Please do let us know if you face it with a recent release. We definitely don’t want to have to restart the poller due to such issues.
I know this has been addressed but is there some specific reason why this couldn't be a tunable parameter and set to a default and overridden if set in the main config file?
@dim folio Are you suggesting that we need to adjust the value of this parameter, the retry interval used when an API request is rejected, to something other than one hour? We have various retry intervals for different sets of errors.
@snow rampart NAbox uses the poller binary directly; if it dies, there is currently no mechanism to restart it. This might be a problem during upgrades, for example. I’ll see if I can improve this. I’m assuming retry in Harvest is implemented by the Harvest binary?
@rancid musk Yes, the current discussion regarding retry is related to retrying the collector in the event of errors. It is a separate use case from the scenario where the poller binary terminates unexpectedly.
Cool, so in that scenario, the poller doesn't quit?