#All Matter over Thread devices get unavailable after 5-6h

mossy edge
#

Hi @devout saddle and @full trellis

what do you think about the following behavior? My Matter over Thread network was absolutely rock stable since January 2025. I have 67 MoT devices (26 Aqara T2 bulbs, 41 EVE devices) and 7 Apple TBRs (2 hardwired AppleTV 4K 3rd Gen, 4 HomePod Minis and 1 HomePod v2). Not a single issue in that time.

HAOS, Matter Server, OTBR, all Apple Home Hubs/TBRs and iPhones are up to date. I have an HA Yellow and use the integrated Thread radio for OTBR. Since yesterday all my MoT devices lose their connection and reconnect by themselves every 5-6 hours. The only thing that changed is that I enabled OTBR on HAOS about 9 days ago. OTBR has the credentials of my Apple Thread network. Here you can see that all my MoT devices have the same issue:

https://i.imgur.com/Z4OmRSJ.mp4

In the first half of the video you see the history of my MoT devices for one week, and in the second half the history for one day. I am away from home for a few months, so it's not an issue for me.

But before I disable OTBR remotely, I want to ask you guys whether this is something you want to analyze. If so, I will try to download the relevant log files via my smartphone. I do not have a computer here. 😅

Regards Hoppel

mossy edge
#

Yesterday evening I disabled the OTBR integration/addon. Guess what: The issue is solved.

weak fiber
#

Seems like a Thread leader crashes every few hours...

umbral salmon
#

OR maybe OTBR and ATBR fighting for leadership somehow?
OR actually the mesh can also split. A situation where a Router loses connectivity with the mesh and the children don't realize.
The question is, @mossy edge, do they come back online by themselves? Like within 5 to 10 minutes?

mossy edge
#

@umbral salmon OK, so the OTBR splits the mesh. I have 7 Apple TBRs which worked without any issue for at least half a month.

#

@umbral salmon As already answered by @weak fiber, the devices reconnect on their own.

weak fiber
#

You cannot force one to be a leader; as far as I know, the leader is automatically voted in by the Thread nodes.

#

I had a Wemo plug that somehow kept being voted leader (the HomeKit integration showed this) and then crashed, possibly due to overloading.

mossy edge
#

There are some other guys with the same or a similar issue:

umbral salmon
#

I will say this: In my home setup (1 OTBR and 5 Nest Hubs) I can do
ot-ctl state leader
The mesh will accept this and quickly reorganize.
I don't have hoppel's issues though. In my case the presence of OTBR seems to add to overall stability.
But then I don't use the HA OTBR add-on, but run OTBR separately on an old RPi3B where I occasionally recompile from main. So it might also be related to the specific build used in the HA add-on (or OTBR doesn't play nice with ATBR in general?)

umbral salmon
# weak fiber You can not force one to be a leader, it’s automatically voted by Thread nodes a...

Technically speaking, you can. There will always be only one leader, so leadership can change

  • As a result of a split (where each partition votes for its own leader) and a subsequent re-join OR
  • As part of the role promotion of a potentially new device

Also, a node can ask the current leader to give up its leadership role, which will only succeed if the leader weight of the to-be-leader candidate is higher than the weight of the existing leader. See https://openthread.io/reference/group/api-thread-router#otthreadbecomeleader
This is what happens on ot-ctl state leader.
Leader weight results from a number of parameters, e.g. connectivity/link quality. With Thread 1.4 it is also influenced by device properties:
PowerSupply      : external
IsBorderRouter   : yes
SupportsCcm      : no
IsUnstable       : no
WeightAdjustment : 0

With OpenThread you can also override the leader weight by passing a higher value to sudo ot-ctl leaderweight before calling state leader.
Anyhow, I must have been lucky to always succeed with my leadership requests, maybe because my OTBR is on Ethernet as opposed to WiFi, or just because it is on Thread 1.4.
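
The takeover rule described above can be sketched as a toy model in Python. This is not OpenThread code: Node and can_take_over are names invented here for illustration, the weights in the usage note are arbitrary example values, and the only rule encoded is the one stated in this thread, that a takeover request succeeds only if the candidate's leader weight is strictly higher than the current leader's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a Thread router (not an OpenThread type)."""
    name: str
    leader_weight: int  # OpenThread stores the leader weight in one byte

    def __post_init__(self) -> None:
        # Reject values that would not fit into an unsigned 8-bit field.
        if not 0 <= self.leader_weight <= 255:
            raise ValueError("leader_weight must be in the range 0..255")


def can_take_over(candidate: Node, leader: Node) -> bool:
    """Return True if the candidate's request to become leader would succeed.

    Models only the weight comparison; a real mesh has more going on.
    """
    return candidate.leader_weight > leader.leader_weight
```

With these assumptions, can_take_over(Node("otbr", 72), Node("homepod", 64)) is True, while two nodes with equal weight leave the existing leader in place. Raising the local value with ot-ctl leaderweight before calling state leader corresponds to raising candidate.leader_weight in this model.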

For reference:

#

To be fair, I should also mention that the documentation clearly states:

Taking over the leader role in this way is only allowed when triggered by an explicit user action. Using this API without such user action can make a production application non-compliant with the Thread Specification.
full trellis
#

Can Home Assistant see the devices in the Zeroconf browser? You'd need to figure out the Matter operational service for a particular device using fabric ID + node ID; see this README for how to find that:
https://github.com/agners/thread-debug

If the devices do not appear there, then there is some mDNS/SRP problem. You can disable the OTBR SRP to test that theory using ot-ctl srp server disable.
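
As a rough aid for spotting a device in the Zeroconf browser, here is a minimal Python sketch that builds the name to look for. It assumes the Matter operational instance name has the form compressed fabric ID, a hyphen, then node ID, each written as 16 uppercase hex digits, under the _matter._tcp service type, and that you have already obtained both values (for example via the README linked above). The function name is made up for this example.

```python
def matter_operational_instance(compressed_fabric_id: int, node_id: int) -> str:
    """Build the mDNS instance name to search for in a Zeroconf browser.

    Hypothetical helper: both IDs are 64-bit values, rendered as
    zero-padded, uppercase, 16-digit hex strings.
    """
    for value in (compressed_fabric_id, node_id):
        if not 0 <= value < 2**64:
            raise ValueError("IDs must fit in 64 bits")
    return f"{compressed_fabric_id:016X}-{node_id:016X}._matter._tcp.local"


if __name__ == "__main__":
    # Example values only; substitute the IDs of your own fabric and node.
    print(matter_operational_instance(0x2906C908D115D362, 0x1))
```

If no entry with that name shows up while the device is powered and attached, that would point toward the mDNS/SRP path rather than the device itself.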

#

I do have OTBR running with Google devices without such issues, so it must be some interaction problem with Apple BRs then 🤔