I'm running Netdata Cloud (Business) with around 50 nodes, and every day I get more than 10 false-positive "<node> is unreachable" alerts, even though none of my nodes ever actually go down. About 15 minutes later, I get a "<node> is reachable" alert.
The only reason I switched on downtime notifications was to be notified if a node ever actually goes down, and this bug really defeats that purpose. It has been around for almost a year now.
Is there a way I can solve this on my end instead of waiting for the Netdata team to fix this bug in their software?
All my nodes are running v2.1.0.
UPDATE: I found the reachability delay option in the UI and changed it from the default 30 seconds to 60 seconds. Hopefully this will suppress the false alerts while the Netdata team digs deeper into the root cause. I'll update this post if necessary.
However, judging by when I receive the "<node> is reachable" alert, this false "downtime" seems to last about 15 minutes, so a 60-second delay may not be enough to filter it out. This bug needs to be fixed at the root.
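A rough way to see why the longer delay may not help: if the reachability delay works as a simple threshold (this is my assumption; I don't know how Netdata implements it internally), any apparent outage that lasts longer than the delay still triggers an alert. The function, node names and durations below are made up purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    node: str        # hypothetical node name
    duration_s: int  # how long the node appeared unreachable, in seconds


def alerts_fired(outages, delay_s):
    """Hypothetical model of a reachability delay: an 'unreachable'
    alert fires only if the apparent outage lasts longer than delay_s."""
    return [o.node for o in outages if o.duration_s > delay_s]


# A ~15-minute false outage (what I'm seeing) and a short 45-second blip.
outages = [Outage("node-a", 15 * 60), Outage("node-b", 45)]

print(alerts_fired(outages, 30))  # ['node-a', 'node-b']
print(alerts_fired(outages, 60))  # ['node-a']
```

Under this model, raising the delay from 30 to 60 seconds only filters out short blips; a 15-minute false outage still fires, which is why I think the root cause has to be fixed.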