Looking for some guidance to help decide how best to set up automations with a large Thread network.
Context: I have 70 REED devices at the moment, all Inovelli White series dimmer switches. A SkyConnect dongle is attached to a beefy VM in my closet that hosts HA and the Matter add-on, both on the latest versions. All devices commissioned okay initially and are more or less working as expected. The only problems are frequent drops of certain devices (which I believe to be a transceiver signal power issue on the devices) and commands/updates sometimes taking a little longer than I'd expect to propagate. But again, mostly working.
I discovered that attempting to send commands to dozens of these devices at once (say, 30-40 of them) caused what appeared to be a full Thread/Matter crash and reset. Thinking further about that, I would surmise that the commands being sent to the devices were probably not the real cause of the crash, but rather the significant number of intermediate state updates being sent from the devices back to the hub -- I'd guess 5-10 updates for each device over the course of 5-10 seconds (dim level now 5%, dim level now 13%, dim level now...).
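To put rough numbers on that guess, here is a back-of-the-envelope sketch in Python. Every input is my own estimate from the paragraph above, not a measurement, and `report_burst` is just a helper name I made up for illustration. It only counts application-level state reports and ignores retries, acknowledgements, and mesh routing overhead.

```python
def report_burst(devices, updates_per_device, window_s):
    """Return (total reports, average reports per second) for one burst.

    All three inputs are guesses, not measured values.
    """
    total = devices * updates_per_device
    return total, total / window_s


# Optimistic end of my guess: 30 devices, 5 updates each, spread over 10 s.
low_total, low_rate = report_burst(30, 5, 10)

# Pessimistic end: 40 devices, 10 updates each, squeezed into 5 s.
high_total, high_rate = report_burst(40, 10, 5)

print(f"low estimate:  {low_total} reports, about {low_rate:.0f}/s")
print(f"high estimate: {high_total} reports, about {high_rate:.0f}/s")
```

So by my own guesses the hub would be seeing somewhere between roughly 15 and 80 state reports per second during one of these bursts, before counting any retransmissions.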
I went searching for some guidance on how many messages a (semi-unreliable, perhaps weak-signal-strength) Thread network can support, what the built-in retry mechanisms are, how the Matter server may be influencing this (it's the thing deciding whether a device is still "up" or not, right?), and any anecdotal guidance from people in the field trying to run this many devices. I have come up fairly empty.
Does anybody have such guidance? In particular, what is the Matter server's role in deciding the stability of the connections to end devices, and when does it determine that a node has been lost and must thus be re-found? Does Thread itself have any limitations I should know about that might come into play here? Any documentation pointers or anecdotes are appreciated.