# [BUG] False-positive downtime notifications

topaz edge

I'm running Netdata Cloud (Business) for around 50 nodes, and every day I get more than 10 false-positive "<node> is unreachable" alerts, even though none of my nodes ever go down. About 15 minutes later I get a "<node> is reachable" alert.

The only reason I switched on downtime notifications was to be notified if a node ever "actually" goes down, and this bug really defeats the purpose. It has been there for almost a year now.

Is there a way I can solve this on my end instead of waiting for the Netdata team to fix it?

All my nodes are running v2.1.0.

UPDATE: I found the reachability delay option in the UI and changed it from the default 30 seconds to 60 seconds. Hopefully this will suppress the false alerts while the Netdata team digs deeper into the root cause. I'll update this post if necessary.

That said, judging by when the "<node> is reachable" alert arrives, this false "downtime" seems to last about 15 minutes, so a 60-second delay may not be enough. This bug needs to be fixed at the root.
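As a rough illustration of why a longer delay may not help here: assuming the reachability delay behaves like a simple grace period (an assumption, since the thread does not describe Netdata's actual logic), the hypothetical `notifications` helper below shows that raising the delay from 30 to 60 seconds suppresses short blips but not a false outage lasting about 15 minutes.

```python
# Hypothetical model of a "reachability delay": an "unreachable" alert fires
# only if a node stays disconnected for at least `delay_s` seconds.
# This is an illustrative assumption, NOT Netdata's implementation.

def notifications(outages, delay_s):
    """Return the outages that would trigger an "unreachable" alert.

    outages: list of (start_s, duration_s) disconnection periods.
    delay_s: grace period (seconds) before an alert is sent.
    """
    alerts = []
    for start_s, duration_s in outages:
        # Outages shorter than the delay end before the alert would fire.
        if duration_s >= delay_s:
            alerts.append((start_s, duration_s))
    return alerts


# A 45 s blip and a ~15-minute (900 s) false "downtime".
observed = [(0, 45), (3600, 900)]

print(notifications(observed, delay_s=30))  # [(0, 45), (3600, 900)]
print(notifications(observed, delay_s=60))  # [(3600, 900)]
```

Under this model, a 60-second delay hides the 45-second blip, but the 15-minute event still produces an alert, which matches the expectation that only a root-cause fix will stop these notifications.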

wary lynx

Hi, @topaz edge. @high kelp please check.

high kelp

Hi @topaz edge, could you share your space ID with me in a private message?
You can find it in your space settings on Netdata Cloud.

high kelp

Hi @topaz edge,
We did find something off on the Cloud side, but we haven't found the root cause yet.
We are still running tests, and the investigation won't stop until we find the source of the problem.

Thanks for raising the issue.
I will let you know once the problem is found and fixed.

topaz edge

Thank you, happy new year!

high kelp

Hi @topaz edge,
We finally nailed it down: it was a bug in a third-party service we use, which is outside our control.
We applied a fix on our side and reported the issue to that service's developer team.
From now on, you should no longer experience false positives.

Thanks again for reporting it.