#Bulk Z-Wave Command Woes


visual nacelle
#

I have what I would describe as a "medium size" Z-Wave network, consisting of ~60 devices, mostly in-wall Inovelli switches. For the past several months, I've been fighting issues with "bulk" commands which target >10 devices on my Z-Wave network. Generally, things have gotten significantly more flaky, and I'm very frequently left with a subset of Z-Wave devices that have not responded to the issued command.

Examples:

  • "Hey Google, turn off all the lights" (via Google Assistant)
  • "Exterior Lights - On" (Automation)
    • Implemented via three different multicast actions to set lights to desired level

Historically, while things like this were never "fast", they did work reliably, and eventually all the targeted devices would be turned off successfully. It seems like something has changed in the way Z-Wave JS handles retries or command queuing, but I've not been able to narrow down the exact root cause.

rain lily
#
visual nacelle
#

Totally agree with that post, but in the one example above I'm actually using multicast and still facing issues. It really seems like something about how Z-Wave JS deals with failures or timeouts has changed in some way that's been particularly impactful on my use cases.

slender swan
#

Have you added any devices since the point where it used to "work"? Re-include some with a different security level? Updated your controller firmware?

visual nacelle
#

hmmm. I may have just sorted out the multicast issue in the automation. Seems like my automation must have partially broken when migrating to the 800 stick, as the device_id had changed for these devices.
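For anyone hitting the same thing after a controller migration, here is a minimal sketch of how stale references could be spotted. It assumes the automation has been loaded as nested dicts and lists, and that targets live under a `device_id` key holding either a string or a list. The function names and the example IDs are hypothetical, not part of any real tool.

```python
def collect_device_ids(node):
    """Recursively gather every value stored under a 'device_id' key."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "device_id":
                # Accept both a single ID and a list of IDs.
                found.extend(value if isinstance(value, list) else [value])
            else:
                found.extend(collect_device_ids(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_device_ids(item))
    return found


def find_stale_device_ids(automation, known_device_ids):
    """Return the referenced device IDs that are not in the known set."""
    known = set(known_device_ids)
    return sorted({d for d in collect_device_ids(automation) if d not in known})


# Hypothetical automation: two targets still point at pre-migration IDs.
automation = {
    "action": [
        {"target": {"device_id": ["old-aaa", "new-bbb"]}},
        {"device_id": "old-ccc"},
    ]
}
stale = find_stale_device_ids(automation, {"new-bbb", "new-ddd"})
```

Anything returned in `stale` is a target the automation would silently skip, which would match a "partially broken" multicast action.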

#

The Google Assistant issue is unrelated to that, and I'm still unsure how to best address that. The way that works, it's basically equivalent to issuing a command to each light I have in Google Home one by one. Pretty much a Z-Wave network torture test 😆

slender swan
#

To really understand what's going on, driver logs (level debug) of a problematic interaction are necessary.

In general, issuing lots of commands at once to multiple devices is problematic in Z-Wave. I have some ideas how to improve what Z-Wave JS does when the communication goes off the rails, but I'm not there yet.
I also wouldn't be surprised if those devices with red lines in the network map have something to do with it.

There's also a problem where some devices report back with supervision and force the flow of communication, while Z-Wave JS is still busy. If that happens while we're waiting for one of those devices with bad connectivity to respond... Let's just say it's not helpful.

visual nacelle
#

I was trying to work out a way I might be able to "proxy" the incoming GA commands so that I could batch them up into a zwavejs multicast command, but it gets really messy.
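A rough sketch of just the batching part of that idea, under stated assumptions: incoming commands are modeled as `(timestamp, entity_id, value)` tuples, everything arriving within a short window of the first command is treated as one burst, and each group of entities sharing a target value could then be handed to a single multicast call. The function name, the tuple shape, and the 0.5 s window are all hypothetical. It does not touch the messy parts, such as reporting per-device results back to Google Assistant.

```python
from collections import defaultdict


def batch_commands(events, window=0.5):
    """Group (timestamp, entity_id, value) events into multicast candidates.

    Events within `window` seconds of a burst's first event share a batch.
    If one entity receives several values in a burst, the latest one wins.
    Returns a list of {value: [entity_id, ...]} dicts, one per burst.
    """
    bursts = []  # each burst maps entity_id -> latest requested value
    burst_start = None
    for ts, entity_id, value in sorted(events, key=lambda e: e[0]):
        if burst_start is None or ts - burst_start > window:
            bursts.append({})
            burst_start = ts
        bursts[-1][entity_id] = value

    grouped = []
    for latest in bursts:
        by_value = defaultdict(list)
        for entity_id, value in latest.items():
            by_value[value].append(entity_id)
        grouped.append(dict(by_value))
    return grouped


# Hypothetical burst of per-light commands followed by a later, separate one.
events = [
    (0.0, "light.a", "off"),
    (0.1, "light.b", "off"),
    (0.2, "light.c", "on"),
    (2.0, "light.d", "off"),
]
batches = batch_commands(events)
```

Here the first three commands collapse into one burst with two groups (`off` and `on`), and the last command forms its own burst.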

#

I'll grab some representative logs 🙂

slender swan
#

Note that multicast only aims to reduce the popcorn effect. It adds one additional command rather than reducing traffic.

visual nacelle
#

For whatever reason, using it makes a massive difference for me.

visual nacelle
#

I've set up a zniffer again -- it's great to be able to use it directly in Z-Wave JS, and the integration is super slick compared to what I remember about using the Windows application a couple years back.

#

Question -- what is a "normal" amount of corrupted frames? Trying to decide if what I'm seeing is to be expected, or requires some further investigation.

haughty pond
#

corrupted frames in zniffer output?

visual nacelle
#

yes

haughty pond
#

could just be the zniffer

#

your zniffer is attached to the same system as the controller?

visual nacelle
#

for the moment, yes.

visual nacelle
#

A few thoughts based on the testing I've done over the past couple weeks:

  • Eliminating power reports (but not energy reports) has greatly improved the flakiness I was seeing with bulk actions initiated via Google Assistant.
  • This makes a lot of sense in the context of toggling switches, as toggling the state of a light always resulted in a power report, greatly increasing the load on the network at a time where it's already super busy.
  • While they haven't been entirely eliminated, the frequency / severity of "controller jammed" events has been significantly reduced.
  • I'm still unsure of what actually changed -- I've always had power reporting enabled, and this all used to work reliably (albeit slowly at times). Perhaps it's just a symptom of increased throughput somewhere else in the stack (i.e., events from Google Assistant being issued more quickly than they used to be)?
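As a back-of-the-envelope illustration of the power report point, here is a first-order count. It assumes one command per light and, per the observation above, one power report per toggle. It deliberately ignores acknowledgements, retries, routing hops, and any other report types, so it is a lower bound rather than a measurement. The function name and the 40-light figure are hypothetical.

```python
def messages_for_bulk_toggle(num_lights, power_report_per_toggle=True):
    """Count application-level messages for toggling `num_lights` lights.

    One outbound command per light, plus one inbound power report per
    light when power reporting is enabled.
    """
    commands = num_lights
    reports = num_lights if power_report_per_toggle else 0
    return commands + reports


with_reports = messages_for_bulk_toggle(40, power_report_per_toggle=True)
without_reports = messages_for_bulk_toggle(40, power_report_per_toggle=False)
```

Under these assumptions, disabling power reports halves the message count during the burst, which is at least consistent with the improvement described.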