#Help with stabilising my Thread and Matter network


hollow prairie
#

I have a Thread and Matter network setup at home with HomeKit and Home Assistant integration. However, I've been experiencing instability issues with my Thread / Matter network, leading to frequent disconnections and very unreliable device performance.
I am running the latest firmware on my Thread border router(s) and have made sure that all my devices run the latest software, but the problems persist.
I'm looking for advice on how to stabilize my Thread network. Are there specific configurations or best practices that I am not following? Any insights into optimising the network for better performance would be greatly appreciated.

I have 70 Thread/Matter devices in use, most of them Eve: Energy power sockets, MotionBlinds, motion sensors, and door and window sensors.

When I look in the Matter Server logs I see many entries like: Subscription Liveness timeout with SubscriptionID = 0xxxx or Subscription failed with CHIP Error 0xxxx, and many more. I do not know where to start troubleshooting.

When I go to the OT Border Router web page and open Topology, I only see a handful of devices, not 70 or so.

What I also find strange is that the Open Thread Border Router integration shows 2 services, both named Home Assistant Sky Connect (Open Thread Border Router). I was expecting a ZBT-2 and my HomeKit gear. I had a ZBT-1 a while ago but upgraded to the new ZBT-2 when it became available (hoping it would improve things, but alas, it did not at all).

I have stopped my OTBR add-on for some time, but this did not resolve my issues.
It looks like there is some issue with who owns the network, or with changing channels, or something similar. The result is unavailable devices that reappear after some time without my doing anything, sometimes after minutes, sometimes after hours.

Any structured help is greatly appreciated.

echo flume
#

How many Thread Border Routers do you have and which brand/model?

  • ZBT-2 and HA OTBR

Any Apple Thread Border Routers?

I have the same setup.

  • 70 Matter over Thread devices
  • 2 hardwired AppleTV 4K 3rd Gen
  • 4 HomePod Minis
  • 1 HomePod v2

My network is solid. Over the day I see a few devices disconnect for some seconds or minutes. Nothing critical. When I add OTBR to the mix, I get a lot of disconnects after about 2 weeks. Some devices do not reconnect by themselves. But when I disable the OTBR everything starts working again.

I read that you already tried to disable your OTBR, but that didn’t solve your issues.

#

Did you try the following?

  • disable OTBR
  • unplug all your Apple TBRs and Apple Home Hubs from power
  • wait half an hour, so that all device caches are cleared
  • plug in your main/primary Apple Home Hub
  • wait until it’s available in Apple Home
  • plug in all the other Apple Home Hubs
#

Did you configure a primary Apple Home Hub in Apple Home settings? If not, configure a hardwired AppleTV 4K 2nd/3rd Gen as your primary Apple Home Hub.

#

Do you have any other 2.4GHz technology in use?

2.4GHz WiFi, Zigbee, Thread and Bluetooth all use the same 2.4GHz frequency band.

Maybe you have interference between these technologies. Here is a good description:

https://www.metageek.com/training/resources/zigbee-wifi-coexistence/

So, if your Thread network is on channel 25 (Apple's default Thread channel), do not use a Zigbee network on channel 25 and do not use 2.4GHz WiFi channel 11; only use WiFi channel 1 and/or 6.
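
To make the channel arithmetic concrete, here is a minimal Python sketch. It assumes nominal channel widths (22 MHz for 2.4GHz WiFi, 2 MHz for the IEEE 802.15.4 channels that Thread and Zigbee use); real radios leak energy beyond those edges, so a channel that is nominally clear can still deserve extra distance, as advised above. The function names are made up for illustration.

```python
WIFI_WIDTH_MHZ = 22        # assumption: nominal 2.4 GHz WiFi channel width
IEEE802154_WIDTH_MHZ = 2   # assumption: nominal Thread/Zigbee channel width


def ieee802154_center_mhz(channel: int) -> int:
    """Center frequency of a 2.4 GHz IEEE 802.15.4 channel (11..26)."""
    if not 11 <= channel <= 26:
        raise ValueError("2.4 GHz 802.15.4 channels are 11..26")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel: int) -> int:
    """Center frequency of a 2.4 GHz WiFi channel (1..13)."""
    if not 1 <= channel <= 13:
        raise ValueError("this sketch covers WiFi channels 1..13")
    return 2407 + 5 * channel


def overlaps(thread_channel: int, wifi_channel: int) -> bool:
    """True if the nominal bands of the two channels overlap."""
    distance = abs(ieee802154_center_mhz(thread_channel) - wifi_center_mhz(wifi_channel))
    return distance < (WIFI_WIDTH_MHZ + IEEE802154_WIDTH_MHZ) / 2


def clear_thread_channels(wifi_channels: list[int]) -> list[int]:
    """Thread/Zigbee channels that overlap none of the given WiFi channels."""
    return [ch for ch in range(11, 27)
            if not any(overlaps(ch, w) for w in wifi_channels)]


if __name__ == "__main__":
    print("Thread 25 center:", ieee802154_center_mhz(25), "MHz")
    print("Nominally clear with WiFi 1 and 6:", clear_thread_channels([1, 6]))
```

With WiFi restricted to channels 1 and 6, this nominal model leaves the upper 802.15.4 channels (including 25) outside any WiFi band.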

#

Which network equipment do you have in use?

#

Are all Matter related devices in the same VLAN?

#

Disable mDNS enhancements.

hollow prairie
#

The only 2 Thread border routers I have are 1 Apple HomePod mini, which hosts my Thread network (MyHome444621901), and 1 ZBT-2 with the HA OTBR, which is also part of the MyHome444621901 preferred network.

Later today I will:
disable the HA OTBR,
unplug all Apple TBRs and Apple Home Hubs from power (this will be the HomePod mini),
wait half an hour, to clear all device caches,
plug in the main/primary Apple Home Hub,
wait until it’s available in Apple Home,
plug in all the other Apple Home Hubs (I will skip this step because I have no other hubs, and I will not plug in my HA OTBR).

On 2.4 GHz I have WiFi and BT, no Zigbee. WiFi is fixed on channel 1 on my APs.

I have UniFi network equipment at home: a Dream Machine and 4 APs.
mDNS is not activated on the Default network and I have only one network, no VLANs.

hollow prairie
#

I did almost everything as described above.
Although I do have multiple Apple TVs, none of them is new enough to have Thread. I did not disconnect them from power during the unplug step. I do not know if that makes a difference.
The HA OTBR is still disabled as I am trying to get the network stable with only the Apple HomePod mini first.

I had some devices dropping during the night (I have an automation on the most important ones to alert me), but this morning everything looked good for multiple hours.

Your suggestions are very much appreciated. I will keep you posted on the progress.

echo flume
#

Great, you are welcome. Please, keep me/us posted. 😉

#

By the way… Some time ago I read on Reddit that some guys had Thread issues with older AppleTVs without a Thread radio. Maybe it’s time for a new hardwired AppleTV 4K 3rd Gen. 😉

Alternatively, you can unplug your HomePod and use the ZBT-2 for testing purposes only.

dapper canyon
#

100% agree with @echo flume to not mix OTBRs. I have two Apple TV 4Ks (Ethernet/Thread) and things are mostly solid.

hollow prairie
#

Still dropping devices.
I just unplugged all Apple TVs and the HomePod mini, and the HA OTBR is still disabled.
I have 3 Apple TVs: 2 are Apple TV HD and 1 is Apple TV 4K (so older models).
I am waiting for an Apple TV refresh to replace them.
The next step will be to unplug everything again, plug in only the HomePod mini, and see if this makes a difference.

slate zealot
#

I have 7 HomePods/ minis and an Ethernet connected ATV and the ZBT-1 all as thread routers and no issues but I only have 4 thread devices so far. I have WiFi on channel 1, thread on 25 and Zigbee on 20. 130 WiFi devices with most on 2.4, 70+ Zigbee and 4 thread. Are you guys saying I’m better off removing my ZBT-1 as a thread router? If so I could repurpose it as Z2M to see if I like that but I am happy with ZHA at the moment.

real sapphire
crisp totem
hollow prairie
#

How can I check which channel my Apple HomePod mini, with its MyHomexxxxxxx network, is on?

#

For example, if I open the Eve app and go to Thread, it keeps spinning on Checking Network Status… and never displays an overview of the network.

hollow prairie
#

Yes, thanks. It says Channel: 25 for me. That was what I was looking for.

echo flume
hollow prairie
#

Last night I unplugged everything again and, after 1 hour, plugged in only the HomePod mini and nothing else. No Apple TVs and not the HA OTBR, which I am keeping disabled for now. Still, during the night I had unavailable devices reported in HA. So this is not (yet) the solution to a stable network.

Any suggested follow-up actions for troubleshooting?

echo flume
#

What happens when you try it with your HA OTBR only, without any Apple TBR?

#
  1. How long are the devices unavailable?
  2. Are those unavailable devices resubscribing/reconnecting by themselves?
  3. Are those unavailable devices always the same devices?
  4. Which devices (brand/model) are becoming unavailable?
hollow prairie
#

Later tonight, when the family does not need any Smart Home functions 🙂 I will give it a go with only the HA OTBR.

  1. Between a couple of minutes and a couple of hours
  2. Yes, I do nothing, I only wait until they start working again
  3. No, it's all over the house and also spread over the day and night
  4. Eve is 90% of my devices (Energy plug, Door/Window sensor, Motion sensor, MotionBlinds), plus Zemismart for the curtains and Nuki for the locks. Everything is on the latest version of its software/firmware.
echo flume
#

I see two possibilities at the moment:

  1. you have 2.4GHz interference
  2. your mesh does not have enough Full Thread Devices (mains-powered).

How many neighboring 2.4GHz WiFi networks do you see in your house?

How many mains-powered Thread devices do you have?

naive oak
#

I have been on a similar journey and have made some breakthroughs.

  • Unifi Cloud Gateway Fibre + 2 Unifi APs
  • Around 50 Nanoleaf downlights using matter over thread
  • HA using ZBT-2
  • 1x Apple Homepod mini

Originally I had started with the Apple Homepod mini before I had HA, so I ended up with two thread border gateways on a single thread network. I had onboarded the downlights into HA by using the "existing matter device" flow with pairing code.
I experienced a lot of instability in this configuration, lights regularly not responding or taking many seconds to respond.

After reading many forum posts and github issues, I decided to:

  • Reset all nanoleaf downlights, remove them from Apple Home
  • Create a brand new separate thread network in HA, add all downlights to HA as new Matter devices

I have very good stability now. The Homepod is still turned on but it doesn't have a role in the thread network. A common belief is that HomePods insist on being the primary thread border gateway but then do a bad job of it.

One thing that doesn't seem solvable is that it seems to take a long time for thread meshes to "settle". For example, I had a multi-hour power outage a few weeks ago, and after the power came back it took over half a day for the 50 thread devices to figure themselves out and stabilise. So for thread in its current form, it seems really important that devices aren't being powered on and off routinely. In my case, I 3d printed face plates that fit my clipsal light switches, they have a hinge so that you can still access the physical switch if needed but my wife/kids/visitors only use the smart switch.

hollow prairie
#

Thanks for the insights.
None of my devices lose power; they are either powered 24/7 or battery-powered.

Over the last few days I experimented with only the HomePod mini or only HA as border router, but in both cases devices still dropped.
I should say that with only the HA OTBR, no HomePod mini, and all Apple TVs disconnected, it looked like I had no dropped devices all night.
But the reality is that I need Apple TVs in the house (although they are not the Thread models), and it looks like the issues return when I add them.
But last night I again unplugged everything Apple, the HomePod mini and all Apple TVs, and then still had some devices dropping. Not too many, but still some over the night.

I really do not want to reset everything and rebuild my network. This would be the last resort, and I am not at that point yet.

#

There are between 5 and 15 neighbours' networks when I run a scan.

I have at least 30 Eve Energy plugs in the house, on all floors, so I think coverage is not an issue from that perspective.

naive oak
#

In HA Settings -> Logs -> OpenThread Border Router do you see many errors around the time of the dropouts?

#

(also check Matter Server as that will cover Thread connectivity issues and Matter issues)

hollow prairie
#

Yes constantly
There is a whole list of errors, mostly in the Matter logs

naive oak
#

Matter Server is pretty noisy on errors, I see CHIP_ERRORS pretty regularly even when everything seems to be working fine

#

but OTBR errors are probably useful

hollow prairie
#

OTBR is mostly something like:

1d.22:31:00.856 [N] MeshForwarder-: Failed to send IPv6 UDP msg, len:90, chksum:4531, ecn:no, to:0xd800, sec:yes, error:NoAck, prio:low, radio:15.4

naive oak
#

Also just checking if you're aware that the OTBR add-on also provides a web UI you can use to visualise the topology on port 8080

hollow prairie
#

Yes I am, but it never shows me more than a couple of devices in this screen

naive oak
#

sorry, I just noticed you mentioned it in your original post

hollow prairie
naive oak
#

is that a purple dot in the top left?

hollow prairie
#

This is the most I have ever seen btw. I am surprised

#

Good question, I need to go to my Mac to check, iPhone is not showing 😎

naive oak
#

it's a strange topology either way, you seem to have a lot of routers but it's formed a hub and spoke around one router instead of a mesh

hollow prairie
#

This is what it looks like in the browser on my Mac

naive oak
#

haha ok, that looks more normal. perhaps I shouldn't read too much into that Topology graph

hollow prairie
#

No purple dot. I have never seen the purple dot I think

naive oak
#

Do you have the Terminal addon in HA?

blissful siren
#

Could you check the status page real quick? Just to verify something

hollow prairie
#

I have the Terminal add-on installed, yes

Status page of what?

blissful siren
#

Sorry, Status page from the OT Border Router page

naive oak
#

I am not a thread expert and haven't dug too deep, but you can supposedly get link metrics like signal strength, collisions etc from there which could be useful if you have a lot of neighbouring 2.4Ghz wifi

hollow prairie
#

That's all I've got

hollow prairie
slate zealot
hollow prairie
#

You need to add it in the Configuration tab, at the bottom of the page.
Ports 8080 and 8081 are the suggested defaults, and they work if you do not already use them for something else.
Then you open a new browser tab with :8080 at the end of the address.

naive oak
#

if you use the ot-ctl discover command, you can see the signal strength (dBm) and Link Quality Indicator (255 should mean no signal-related issues).
The MAC Addresses here will match up with what you see when expanding Device info

hollow prairie
#

I only see a relatively short list. Should I not see 70-plus devices here? Or does it not work like that, and are some devices "asleep"? Even then I would expect to see at least 30-plus devices: all my Eve Energy plugs, which have constant power.

naive oak
#

I don't know exactly, but the data comes from the local node's state.
You could try router table and child table to list all known routers and children of the current node

hollow prairie
#

I see a lot, but it is not clear to me what I am looking at

naive oak
#

I'm not 100% sure either, the OpenThread docs for the router info are not very descriptive (age doesn't even specify what time unit, I guess seconds?).
My advice at this point is an educated guess, but:

  1. The router table (nodes that are acting as routers) looks very mesh-like, so that weird hub and spoke diagram must have been wrong.
  2. The Link column value of 1 means that a link is established with that router. I see that all the routers linked to this node have excellent link quality out (all 3s), but not always good quality in (several 1s). Not sure what you can do about that, but you could use the MAC addresses to find out which devices they are and see if a theme emerges, like a particular area of the house.
  3. There's a Debugging and diagnostics section in the docs, they recommend running netdata length and netdata maxlength to see if the network data is full. Mine shows 115 and 116 respectively.
#

you also mentioned that the devices may sometimes be unavailable for long periods, so next time that happens you could jump in and focus on that particular device - can you ping its ipv6 address, what router(s) are in the path, what is the link quality in and out, etc.

hollow prairie
#

Many thanks for the tips. I was also reading the documentation, but not all of it makes sense to me, and it is hard to tell what is good, what is not, and, even more important, what is missing that should be there. I have far more active devices than are listed in the terminal output above. Why do they not show up, and is that healthy or not?
In the screenshot you can see that I can ping: the first one is a healthy node, the second an offline node that still has an IPv6 address. I also have an offline node with no IPv6 address (second screenshot).
I also found an offline power plug with an IPv6 address that does not respond to a ping to that address.
Also, netdata length is 80 and netdata maxlength is 102 (nowhere near the 254 mentioned in the documentation).

hollow prairie
#

And now, all devices have become online again. No offline device at all!

dusty egret
#

There are no simple answers I guess:

For devices without IP address in "Device info"

  • the device could actually be offline OR
  • only the dns-sd (mDNS) part doesn't work
    The latter can also have multiple causes, depending on whether you have more than one or two border routers, and what your network setup is

For device with IP address in "Device info":

  • the IP might still be cached in HA's (or one border router's) mDNS cache, but the device is actually offline
  • the device came online recently, but advertising (mDNS) the new IP didn't happen correctly
  • the device came online recently, advertising (mDNS) the new IP did happen correctly, but HA core did not process the updated RA from your border router(s) correctly
#

For tracing mDNS issues you might want to look into HA's zeroconf browser and maybe also compare the results with what you see locally on your machine using tools like avahi-browse or dns-sd.
http://homeassistant.local:8123/config/zeroconf

#

For tracing RA issues you might want to have a look into the output of ip -6 route from inside HA and maybe your local machine's equivalent:

$ ip -6 route
fd37:5639:3134:1::/64 proto ra metric 605 pref medium
    nexthop via fe80::5bc2:a8ca:b525:ff02 dev wlp0s20f3 weight 1 
    nexthop via fe80::a745:3993:c3ca:d27 dev wlp0s20f3 weight 1 
    nexthop via fe80::1095:13e3:86cd:1f dev wlp0s20f3 weight 1 
    nexthop via fe80::5c3:3975:10fb:b9be dev wlp0s20f3 weight 1 
    nexthop via fe80::619e:c11a:63da:1636 dev wlp0s20f3 weight 1 
    nexthop via fe80::a756:a4fd:1248:2f88 dev wlp0s20f3 weight 1 

in this case 6 border routers announce responsibility for the fd37:5639:3134:1::/64 ULA range
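
If the output gets long, counting the next hops by hand is tedious. The following is a rough Python sketch that groups the router addresses per RA-learned prefix. It assumes the output format shown above (a route line containing `proto ra`, followed by indented `nexthop via` lines, or a single-line `via` route); the function name is invented for this example.

```python
from collections import defaultdict

# Sample copied from the `ip -6 route` output above.
SAMPLE = """\
fd37:5639:3134:1::/64 proto ra metric 605 pref medium
    nexthop via fe80::5bc2:a8ca:b525:ff02 dev wlp0s20f3 weight 1
    nexthop via fe80::a745:3993:c3ca:d27 dev wlp0s20f3 weight 1
    nexthop via fe80::1095:13e3:86cd:1f dev wlp0s20f3 weight 1
    nexthop via fe80::5c3:3975:10fb:b9be dev wlp0s20f3 weight 1
    nexthop via fe80::619e:c11a:63da:1636 dev wlp0s20f3 weight 1
    nexthop via fe80::a756:a4fd:1248:2f88 dev wlp0s20f3 weight 1
"""


def ra_nexthops(ip6_route_output: str) -> dict[str, list[str]]:
    """Map each RA-learned prefix to the router addresses announcing it."""
    routers = defaultdict(list)
    current_prefix = None
    for line in ip6_route_output.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        fields = stripped.split()
        if not line[0].isspace():
            # A non-indented line starts a new route entry.
            current_prefix = fields[0] if "proto ra" in stripped else None
            if current_prefix and "via" in fields:
                # Single-router form: "<prefix> via <addr> dev ... proto ra ..."
                routers[current_prefix].append(fields[fields.index("via") + 1])
        elif current_prefix and stripped.startswith("nexthop via"):
            routers[current_prefix].append(fields[2])
    return dict(routers)


if __name__ == "__main__":
    for prefix, hops in ra_nexthops(SAMPLE).items():
        print(f"{prefix}: {len(hops)} router(s)")
```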

#

Re: mDNS (zeroconf) browser.
The Matter services that you see announced by your devices under the _matter._tcp service type are of the form:

3134XXXXXXXX175A-00000000000000A6

where the left part is the (compressed) Matter fabric ID, and the right part is the node ID within that fabric. If it has many leading 0s, that is currently an indicator that the device is controlled by the HA Matter server add-on. Other entries might belong to other fabrics (e.g. Apple, Google, etc.). Anyway, for the services controlled by the HA Matter server, the right part is the hex-encoded node ID, which you will also find in the "Device info" (there in decimal notation though).
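
To match a zeroconf entry against the node number in "Device info", the hex part has to be converted to decimal. A small Python sketch of that conversion, using the example name from above (the helper name is invented; the left part is kept as text because the example has redacted digits):

```python
def parse_matter_instance(name: str) -> tuple[str, int]:
    """Split a _matter._tcp instance name into (fabric part, decimal node ID)."""
    fabric_hex, node_hex = name.split("-", 1)
    # The node ID is hex-encoded in the service name; "Device info" shows decimal.
    return fabric_hex, int(node_hex, 16)


if __name__ == "__main__":
    fabric, node_id = parse_matter_instance("3134XXXXXXXX175A-00000000000000A6")
    print(f"fabric part: {fabric}, node ID: {node_id}")
```

For the example name, the node ID `A6` corresponds to node 166 in "Device info".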

hollow prairie
#

Hi, This is all a very steep learning curve for me!
The ip -6 route gives me a different output:

#

Even on my Mac mini, when I run netstat -rn -f inet6 I get a very long list, but nothing similar.

#

What I can do is a route get -inet6 "IPv6 address from my first Matter node"

dusty egret
# hollow prairie Hi, This is all a very steep learning curve for me! The ip -6 route gives me a d...

is this from inside the otbr container? unfortunately that doesn't even look similar to what I see on the HA Green...
From the OS level or maybe from inside the matter server container you should be able to see a route to one of the networks bound to the wpan0 interface.

Another indication for bad RA processing would be if you are able to ping the device ip from your local machine but it is not possible from the device page's ping option (please test with powered devices though, because battery powered devices can be very laggy with responses)

hollow prairie
#

No, the first output is from the Terminal add-on on the HAOS server.

dusty egret
#

Also, maybe basics first: many thread problems result from RF interference, because of reused thread/zigbee channels or conflicting spectrum with 2.4GHz wifi (also considering neighbors).
So, first of all make sure you are separating these channels. For reference, see maybe here: https://community.home-assistant.io/t/should-i-change-my-zigbee-channel/441583/5

Please remember to not confuse the wifi and zigbee channel numbers.

Some indication for good/stable connections:
From your local command line try pinging a thread device. If the round-trip times are in the two digits or low three digits (in milliseconds) and not too volatile, you should be ok.
You can also check the occupancy of your channels:

ot-ctl channel monitor
enabled: 1
interval: 41000
threshold: -75
window: 960
count: 2787
occupancies:
ch 11 (0x02bf)   1.07% busy
ch 12 (0x0000)   0.00% busy
ch 13 (0x0644)   2.44% busy
ch 14 (0x0302)   1.17% busy
ch 15 (0x1f59)  12.24% busy
ch 16 (0x2733)  15.31% busy
ch 17 (0x2667)  15.00% busy
ch 18 (0x4133)  25.46% busy
ch 19 (0x713f)  44.23% busy
ch 20 (0x7dbb)  49.11% busy
ch 21 (0x7cf0)  48.80% busy
ch 22 (0x6622)  39.89% busy
ch 23 (0x355f)  20.84% busy
ch 24 (0x16a6)   8.84% busy
ch 25 (0x1a01)  10.15% busy
ch 26 (0x2f96)  18.58% busy

Done
#

further above you already could see in your output "LQ in" and "LQ out" columns, where LQ stands for link quality. "0" is already not a good value...

hollow prairie
#

I do not use Zigbee at all; there are no adapters or radios for it in my HA environment

#

Z-Wave is the only other radio technology next to WLAN

#

➜ ~ docker exec addon_core_openthread_border_router ot-ctl channel monitor
enabled: 0
Done
➜ ~

#

ping times are around 80 ms avg. for multiple different tests
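
For comparing several ping runs, a short Python sketch like the one below can summarise the round-trip times and how much they vary. It assumes the usual `time=… ms` reply lines printed by ping on Linux and macOS; the sample lines and the address in them are made up for illustration, and the function name is hypothetical.

```python
import re
import statistics

RTT_PATTERN = re.compile(r"time[=<]([\d.]+)\s*ms")

# Made-up sample replies, only to show the expected input shape.
SAMPLE = """\
64 bytes from fd00::1: icmp_seq=1 ttl=64 time=70.0 ms
64 bytes from fd00::1: icmp_seq=2 ttl=64 time=80.0 ms
64 bytes from fd00::1: icmp_seq=3 ttl=64 time=90.0 ms
"""


def rtt_summary(ping_output: str) -> dict[str, float]:
    """Summarise round-trip times (ms) found in captured ping output."""
    rtts = [float(value) for value in RTT_PATTERN.findall(ping_output)]
    if not rtts:
        raise ValueError("no ping replies found in the output")
    return {
        "replies": len(rtts),
        "mean_ms": statistics.mean(rtts),
        "max_ms": max(rtts),
        # Population standard deviation as a simple measure of volatility.
        "jitter_ms": statistics.pstdev(rtts),
    }


if __name__ == "__main__":
    print(rtt_summary(SAMPLE))
```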

#

I see in the documentation that "OPENTHREAD_CONFIG_CHANNEL_MONITOR_ENABLE" is required

dusty egret
hollow prairie
#

I am trying to find the right command to turn it on now

#

➜ ~ docker exec addon_core_openthread_border_router ot-ctl channel monitor
enabled: 1
interval: 41000
threshold: -75
window: 960
count: 0
occupancies:
ch 11 (0x0000) 0.00% busy
ch 12 (0x0000) 0.00% busy
ch 13 (0x0000) 0.00% busy
ch 14 (0x0000) 0.00% busy
ch 15 (0x0000) 0.00% busy
ch 16 (0x0000) 0.00% busy
ch 17 (0x0000) 0.00% busy
ch 18 (0x0000) 0.00% busy
ch 19 (0x0000) 0.00% busy
ch 20 (0x0000) 0.00% busy
ch 21 (0x0000) 0.00% busy
ch 22 (0x0000) 0.00% busy
ch 23 (0x0000) 0.00% busy
ch 24 (0x0000) 0.00% busy
ch 25 (0x0000) 0.00% busy
ch 26 (0x0000) 0.00% busy

Done

#

Working now. Only the output is disappointing

dusty egret
#

You have to wait a while, it is building stats as the count value increases

#

it is getting more precise each cycle. also it will reset when you restart the border router

hollow prairie
#

Yes, I see already some numbers coming in now.
I will first make myself an espresso and come back in a couple of minutes. 👍🏻

real sapphire
hollow prairie
#
➜  ~ docker exec addon_core_openthread_border_router ot-ctl channel monitor      
enabled: 1
interval: 41000
threshold: -75
window: 960
count: 1474
occupancies:
ch 11 (0x1bcd)  10.85% busy
ch 12 (0x4fe2)  31.20% busy
ch 13 (0x4615)  27.37% busy
ch 14 (0x0f1c)   5.90% busy
ch 15 (0x03ba)   1.45% busy
ch 16 (0x1f6e)  12.27% busy
ch 17 (0x2152)  13.01% busy
ch 18 (0x2205)  13.28% busy
ch 19 (0x1d7f)  11.52% busy
ch 20 (0x0000)   0.00% busy
ch 21 (0x0010)   0.02% busy
ch 22 (0x0aa7)   4.16% busy
ch 23 (0x0c05)   4.69% busy
ch 24 (0x001d)   0.04% busy
ch 25 (0x0a1a)   3.94% busy
ch 26 (0x038f)   1.39% busy

Done
#

As I am on channel 25, this looks like it's not the source of the problems, I think?
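
To rank the channels instead of eyeballing the list, the `channel monitor` output can be parsed with a few lines of Python. This is only a sketch that assumes the line format shown above (`ch NN (0x....)  N.NN% busy`); the function names are invented, and the sample is the output from this post.

```python
import re

OCCUPANCY_LINE = re.compile(r"^ch\s+(\d+)\s+\(0x[0-9a-fA-F]+\)\s+([\d.]+)%\s+busy$")

# Occupancy lines copied from the channel monitor output above.
SAMPLE = """\
ch 11 (0x1bcd)  10.85% busy
ch 12 (0x4fe2)  31.20% busy
ch 13 (0x4615)  27.37% busy
ch 14 (0x0f1c)   5.90% busy
ch 15 (0x03ba)   1.45% busy
ch 16 (0x1f6e)  12.27% busy
ch 17 (0x2152)  13.01% busy
ch 18 (0x2205)  13.28% busy
ch 19 (0x1d7f)  11.52% busy
ch 20 (0x0000)   0.00% busy
ch 21 (0x0010)   0.02% busy
ch 22 (0x0aa7)   4.16% busy
ch 23 (0x0c05)   4.69% busy
ch 24 (0x001d)   0.04% busy
ch 25 (0x0a1a)   3.94% busy
ch 26 (0x038f)   1.39% busy
"""


def parse_channel_monitor(output: str) -> dict[int, float]:
    """Return {channel: percent busy} for every occupancy line in the output."""
    occupancy = {}
    for line in output.splitlines():
        match = OCCUPANCY_LINE.match(line.strip())
        if match:
            occupancy[int(match.group(1))] = float(match.group(2))
    return occupancy


def quietest(occupancy: dict[int, float], n: int = 3) -> list[int]:
    """The n least busy channels, lowest occupancy first."""
    return sorted(occupancy, key=lambda ch: (occupancy[ch], ch))[:n]


if __name__ == "__main__":
    stats = parse_channel_monitor(SAMPLE)
    print("channel 25:", stats[25], "% busy")
    print("quietest channels:", quietest(stats))
```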

hollow prairie
#

It's never been as bad as this weekend. I did not change anything, only did some of the commands from above to get more insights.
At some point all, or almost all, devices dropped. Most of them came back rather quickly, but I waited for some to come back by themselves and it did not happen, even overnight, so I took them out of the power socket and put them back in. This made them appear in the list again quickly.
Very strange.

#

Would it make a difference if I would build the network from the ground up again without the Apple devices?
I am really desperate to get a good working network, not only for me but way more for my wife and kids 😎

naive oak
#

When I suspected my Homepod, I powered it off for a couple of days and noticed an improvement; this is what led me to rebuild on a new thread network.
I think you already tried this though? how long did you leave it powered off for?

hollow prairie
#

I have had the HomePod mini powered off for 4 or 5 days now.
This morning I did a full restart of my HA server and today was sort of okay. Some devices dropping off for a short time but coming back rather quickly.
I did not rebuild my network (yet). It will be quite an effort.
Is there a way to investigate 3 of my devices? These are Zemismart MT01 slide curtains with old hardware and firmware, both 1.0.0. Maybe they are the culprit?

naive oak
#

I can't think of how they would be the culprit unless they were spamming the airwaves (which doesn't appear to be the case) or acting as routers (battery powered stuff usually isn't)

dusty egret
#

That way you can save the time to migrate all your devices into a new thread network.

#

It can take half an hour or so for your devices to follow to your OTBR's new channel though. During that time it can be advisable to have the homepod switched off (only after a successful PAN split of course)

naive oak
#

Funnily enough, somehow overnight my Homepod mini managed to get credentials for the new HA thread network and join it as a border router. I assume my iPhone shared them with it.

#

Anyway this morning my thread network had gone to shit again, most of the lights were offline

#

Switched off the Homepod mini at the wall, half an hour later everything is back up and running nicely

echo flume
#

What are your WiFi settings? Can you please post some screenshots?

hollow prairie
#

I have a UniFi Dream Machine and 3 APs configured.
Ask me, I can provide more info if that is useful.
What I do not understand from the screenshots is this: I have 2.4 GHz on channel 1 only, so all other channels should be unused, but the second screenshot seems to show some use of channel 6?

Changes I made in the last couple of days:
Changed Thread to channel 20 (it looks to be the least used channel)
Unplugged the HomePod mini for good (put it in the cupboard)
Moved my ZBT-2 antenna as far away from my HA NUC and as high up as possible
Added an extra Eve power plug in the kitchen as an extra point for my kitchen table light (which seems to be the most problematic)

But still no solid network. Last night several devices dropped off the network every couple of hours.

hollow prairie
#

Is my OTBR not fully installed because I started with an Apple Thread network? Am I missing features?
I read that there should be a tab on the left where I would be able to do "configuration" in this OT Border Router screen?

iron canopy
#

In general you shouldn't need to do anything on the OTBR web interface. it provides some occasionally helpful diagnostics, that's it.

#

there is no "configuration" menu on it.

#

all configuration of the thread network is done through the home assistant thread integration (which talks to the otbr through its api, via the openthread border router integration)

hollow prairie
#

Okay, thanks. Good to know nothing is missing 😎
The integration still has my 1 border router, the HA one. Only the name is left over from the Apple part of my network.

#

Again, today at 17:00 a lot of devices dropped off!!!
I was waiting for this to happen, because I suspected that on multiple days the lights failed to turn on and the blinds failed to lower from 17:00 onward.

crisp totem
# hollow prairie I have a Unify Dream Machine and 3 AP's configured. Ask me, I can provide more i...

An alternative option is to ditch the Eve plugs. Several users I know, certainly not all, have had issues with the routing of their thread network repeatedly failing when Eve plugs (thread routers) are involved. Replacing them with cheaper plugs that support Matter/thread 1.4, like Onvis, has stopped the insidious and sometimes precipitous thread network degradation that occurred at random intervals and times in my thread network of over 100 thread devices.

Eve's 3.5.0 firmware is old and they are very slow to update. The device connection itself was and is robust, but over time I have come to believe, anecdotally, that the plug aggressively takes over as a thread router and fails/errors in that capacity with some frequency.

hollow prairie
#

Not the best option for me, 41 EVE Energy plugs in the house at the moment. Going to be pricey again.

dapper canyon
#

ya I plan on ripping out all Eve devices

crisp totem
crisp totem
# dapper canyon ya I plan on ripping out all Eve devices

I'm down to 3 Eve devices: 2 motion sensors and an Eve Weather, all on an outdoor patio. They are not routers, and no negatives so far. I switched to Ikea for door/window contact sensors; the replacement cost will be recouped by using standard AAA batteries alone.

hollow prairie
#

Still happening. Today too, devices dropped off at 17:00!
Could there be a process that tries to refresh, or maybe to hand over to the Apple border router (which is not there anymore), and that makes all devices disappear and reappear after anywhere from a couple of minutes to an hour or more?
I have added the Matter log from 17:05 in case it gives someone a clue.

crisp totem
#

I know it's not what you want to hear, but this finding was posted in the new-matter-server Discord here.

“Eve Energy + matter.js
Load-dependent instability occurred with Eve Energy thread devices. Reducing from 6 Eve Energy to 3 restored stability.

Raspberry Pi 5, 8GB, NVMe, ZBT-2, single OTBR
HA matter.js fabric: 45 Thread devices
Matter over Wi-Fi: 48 devices (appear connected, not tested)

OBSERVED BEHAVIOR

BEFORE (6 Eve Energy plugged in)
• 80% of Thread devices offline
• Frequent subscription timeouts
• Session teardown and rediscovery backoff
• Devices marked offline despite live telemetry

AFTER (3 of 6 Eve Energy unplugged)
• 5% Thread devices offline
• Subscription churn greatly reduced
• Rediscovery backoff events became rare”

echo flume
#

Hi guys,

as a reference point… I am using the latest HA Python-Matter-Server (not Matter.js), have 7 Apple Thread Border Routers and 77 Matter over Thread devices:

  • 2 hardwired AppleTV 4K 3rd Gen
  • 4 HomePod Minis
  • 1 HomePod v2
  • 11 Aqara T2 E27
  • 9 Aqara T2 GU10
  • 1 Aqara FP300
  • 14 EVE Energy
  • 10 EVE Door & Window
  • 2 EVE Motion
  • 1 EVE MotionBlinds
  • 8 EVE Thermo Gen 4
  • 1 EVE Thermo Gen 5
  • 2 EVE ThermoControl
  • 2 EVE Weather
  • 4 Philips Hue A67 E27
  • 4 Philips Hue Essential A60 E27
  • 8 Philips Hue Essential GU10

I have a full Unifi stack:

  • network application: 10.0.162
  • efg: 5.0.10
  • u5g-max-outdoor: 7.3.37
  • usw-pro-xg-48-poe: 7.2.123
  • 5x u7-pro-xgs: 8.4.6

Everything works as expected. Over the day I see some resubscriptions in my Matter server logfile. But this happens within seconds to maybe one minute.

I am not using the HA OTBR add-on with the ZBT-1 in my Home Assistant Yellow. When I do that, my system gets unstable after one or two weeks. Then I have disconnects every 6 hours. I haven’t tested it for half a year and I do not plan to do so before Apple and HA OTBR both support at least Thread 1.4.0.

When I unplug all 7 Apple Thread Border Routers, it takes about 20-60 minutes for everything to reconnect to the Home Assistant Matter server. It’s much faster in Apple Home, where all devices are available after 15-20 minutes.

When my Thread network is established and I reboot my HAOS (HA Yellow) only, it takes 5 minutes to reestablish my complete Thread network in Home Assistant.

I am using WiFi channels 1 and 6 only, Thread channel 25, and I also have a Zigbee2MQTT network with 17 devices on Zigbee channel 24. 2.4 GHz WiFi channel 6 doesn’t interfere with Zigbee/Thread channels 24/25.

hollow prairie
#

Thanks for your elaborate list of devices and settings.
And if I understand correctly, you do not run the OTBR add-on on your Home Assistant Yellow?
On your Home Assistant Yellow, do you run the Thread network? Or do you only run the Matter add-on and integration on Home Assistant?

My UniFi:
UniFi OS Version: 4.4.6
U6+ AP Device version: 6.7.31
2x AC Lite AP Device version: 6.7.35
USW Lite 16 POE Device version: 7.2.123
2x USW Lite 8 POE Device version: 7.2.123
USW Flex Mini Device version: 2.1.6
Site Manager: 5.0.1

Thread Network Devices:
41 EVE Energy Hardware 1.1 Firmware: 3.5.0
11 EVE Door Hardware: 1.1 Firmware: 3.2.1
3 EVE Motion Hardware: 1.1 Firmware: 3.5.0
12 EVE MotionBlinds Hardware: 1.0 Firmware: 3.5.1
Nuki Smart Lock Pro Hardware: 10.16 Firmware: 1.4.1
3 ZemiSmart Curtain Hardware: 1.0.0 Firmware: 1.0.0

Via HomeKit:
2 EVE Light Switch Hardware: 1.1 Firmware: 2.1.3
Aqara PS-S02E Hardware: 1.0.0 Firmware: 1.2.7

And a whole bunch of other stuff, mostly via WiFi or Bluetooth.

crisp totem
#

HAOS can use the Apple Thread network, and you are in no way required to set up a separate ZBT (1 or 2). Many of us have found Thread orders of magnitude more stable without an HA OTBR directly in the mix.

dapper canyon
slate zealot
#

I have 7 HomePods and a ZBT-1 all together in a thread network without issues.

dapper fox
#

3 EVE Energy as well, and no problem

crisp totem
#

The variance of what works and what doesn’t seems like voodoo at times. I have 13 HomePods and 5 Apple TVs with Thread. The ZBT-1 was a huge negative for me; for a while, the Thread network degraded to nonfunctional every 3-5 days. The 7 Eve Energy plugs were not as big a problem, but would somewhat randomly fall off the network and refuse to reconnect for days, even after rebooting them.

naive oak
hollow prairie
slate zealot
naive oak
#

That other thread has a lot of interesting insights, but the 6 hour regularity seems unique to them

hollow prairie
#

No, I also do have the 6 hour cycle. I found the pattern after I read about it in the other topic.

somber dirge
#

I updated the other day to the new beta Matter server. I'm still having the same every-6-hour "matter storm" cycle, but they are now much smaller (involving fewer nodes) and shorter lived (3 minutes or so). The cycle appears to coincide with when I last restarted my HAOS computer and then continues every six hours from there. I would love to figure out why this happens.

naive oak
#

Is everyone with the 6 hour problem using Unifi gear? If so, is there anything in the system logs around that time to indicate a scheduled WiFi or DNS-related operation?

somber dirge
hollow prairie
#

I need to reboot, will keep an eye out if my 6 hour cycle shifts after that to the new reboot time

hollow prairie
#

The theory of the start time of the HAOS Server setting the 6 hour cycle is not holding up here.
Rebooted my server at 9:45 this morning. Calculated drop should have been 15:45. But at 11:00 I already had a drop cycle.
15:45 comes around, all devices stay connected!
17:00, again a full drop off of all devices!
This time it took quite long for them to recover (more than 15 minutes)
I am puzzled 😕

somber dirge
# hollow prairie The theory of the start time of the HAOS Server setting the 6 hour cycle is not ...

Interesting. I'm about to do the same experiment. I noticed once that restarting the HA host coincided perfectly with my 6-hour drops, but never paid attention close enough other times to verify this. So I'll give it a shot again and report back. So, in your case at least, this points to something external to HA. I see some people wondering if this is related to Unifi somehow. I have a Unifi network and it looks like you do, too. And poking around this issue elsewhere I see Unifi mentioned a lot. So far, I haven't been able to find anything in Unifi that could be causing this, but I'm no expert.

crisp totem
#

Does Unifi potentially do a background 2.4 GHz channel scan on a 6-hour cycle?

hollow prairie
#

On my end all network hardware is Ubiquiti UniFi. I see this mentioned a lot as well, although it is also widely used in this scene.

somber dirge
#

Well, I rebooted my HA host, but the every 6-hour matter storms stayed on their previous schedule. So it's not related to rebooting for me like I thought it might be.

I downloaded the support logs from Unifi and with the help of ChatGPT looked for culprits but couldn't find any. My APs are not scanning or switching channels or anything like that. My network is not restarting. My apple products are not on auto update.

I'm really at a loss here. I can't imagine what this could be! Something is happening every 6 hours and not leaving a trace, except a wake of offline devices.

velvet dagger
somber dirge
crisp totem
#

Neighbors blasting WiFi and switching channels every 6 hours? What is your Thread Border Router?

somber dirge
dusty egret
# somber dirge Apple TVs and HomePod minis, channel 25.

Do you need multiple BRs, or is it just because you have them? What happens when you only have one BR online? Does the problem still exist? (You have to try with at least two BRs in "solo" mode, if the first try still shows the problem)

hollow prairie
#

The problem also exists with 1 border router; I have only the HA OTBR in use now.

dusty egret
#

And when you turn it off, and use one of the Apple BRs?

hollow prairie
#

What you suggest is:
Power up the Apple HomePod mini and let it take part in the Thread network. Then stop the HA OTBR.
Question: should I also stop the Thread add-on?
Of course, keep the Matter add-on running.

dusty egret
#

Essentially yes, should be enough to stop the OTBR Add-on.
And if possible, and if the Apple TV has wired network, maybe use that one (if topologically feasible) instead of a HomePod.

dusty egret
#

@somber dirge Appollo77 replied to your logs in #new-matter-server. He could see that there actually are prefix changes going on. A prefix change would also always affect multiple devices at the same time. So this all seems to happen in the Thread layer.
Typical causes for a prefix change can be:

  • Net-split, when the thread mesh splits into separate partitions. (And subsequently re-joins)
  • (Border) router dropping from the mesh for longer than a few seconds.

This is also why I asked to do the experiment: If there is one BR in the mix that periodically fails, it can cause prefix changes to happen, which eventually would affect all devices.

crisp totem
naive oak
somber dirge
# dusty egret @somber dirge Appollo77 answered to your logs in #new-matter-server...

Hello, again. Yes, it looks like some of my devices switch addresses, which I guess is a sign of a new partition forming. According to ChatGPT going through my logs, the storms sometimes end with two stable partitions. Other times, like an hour ago, several devices switched to a new partition and then back to the original. My setup is that I have a main house with two Apple TVs on Ethernet. Then we have an outbuilding (art studio) that's too far from the house for its Matter over Thread devices to connect reliably to the devices in the main house. So we have two HomePods out there. I have NOT seen the handful of devices out there go offline during the storms. It always seems to be in the house. Curiously, in the house, the devices that go offline and switch addresses are the farthest ones from the Apple TVs. So what the heck does this tell me? Do you think I should try turning off one of the Apple TVs to see what happens in 6 hours?

somber dirge
crisp totem
# somber dirge Hello, again. Yes, it looks like some of my devices switch addresses, which I gu...

You may want to go into Apple Home/Settings/Home Hubs & Bridges, uncheck "automatic Selection", and then pick a main-house Apple TV to fix it as your active Apple home hub, in case Apple is switching it back and forth between the art studio and the house.

Another suggestion is to buy a cheap Thread router, like an Onvis S4 smart plug, and place it halfway between the devices that repeatedly switch addresses and the main-house Apple TV to strengthen your home's Thread network connectivity between those areas.

somber dirge
# crisp totem You may want to go into Apple home/settings/Home Hubs & Bridges then uncheck "au...

Thanks for this!

I tried the static home hub option. My understanding is that setting in Home settings only changes which apple device controls Apple Home, not whether it’s the thread network lead or anything with the thread network. Nevertheless I’ve never seen this move off of my main Apple TV even on auto, and setting it to static didn’t prevent the every-six-hour events.

And I have about a dozen eve energy plugs throughout the house that help relay traffic.

I’ve all but ruled out weak signals to my devices as the cause of this. If that were the case I would see devices dropping all the time randomly, not just for 10 minutes around the six hour schedule. Something on a schedule is causing chaos!

crisp totem
#

Was hoping it would be that easy, sorry. The active home hub does not have to be the network TBR, but Apple's automatic switching from one active hub to another will cause devices to report as being offline. Not sure what that looks like from a Thread routing/IP perspective though.

And just to throw this out there, as some have and some have not seen issues with Eve smart plugs: as a last resort, unplug all or most of those and see if the 6-hour cadence changes.

somber dirge
# dusty egret Do you need multiple BRs, or is it just because you have them? What happens when...

I finally was able to perform your experiment today with just having 1 Apple TBR online at a time. I have two Apple TVs in the main house and two HomePods in an out building. In both cases I left the HomePods off, but tried one Apple TV on only and then the other. In both cases the matter storms still occurred on time.

I'm noticing that the storms are drifting by a few minutes -- now they start about 40 minutes after the hour instead of 30. So, I just really feel like something is happening in Home Assistant somewhere, and the drift is because something is maybe unavailable during the matter storm for a little bit, and when it comes back on, the new 6-hour timer starts. Otherwise, I don't know how to account for the drift. It seems unlikely to be, say, a Unifi event, because it would always start at the same time.

I still just have this weird feeling that having the ZBT-2 installed for part of a day, along with the OTBR add-on, did something here. Maybe I need to talk with someone on that team? I do notice that when people talk about this 6-hour issue, it's maybe always with an OTBR installed.

I'm at a loss!

hollow prairie
#

Precisely, same thought here. The time is also drifting by the same amount (10 minutes). Now devices are dropping off at 14 minutes past the full hour. Exactly 1 minute a day gets added to the time, it seems.
My schedule today is 4:14 UTC (5:14 local time) and then every 6 hours.
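
To make the drift concrete: if one minute is added per day and there are four 6-hour cycles in a day, each cycle would be about 15 seconds longer than 6 hours. The sketch below projects drop times from the 04:14 UTC anchor under that assumption. The uniform drift, the calendar date, and the function name `predict_drops` are all assumptions for illustration, not something taken from the logs.

```python
from datetime import datetime, timedelta, timezone

NOMINAL_PERIOD = timedelta(hours=6)
DRIFT_PER_DAY = timedelta(minutes=1)  # as reported above; assumed to be uniform


def effective_period(nominal: timedelta = NOMINAL_PERIOD,
                     drift_per_day: timedelta = DRIFT_PER_DAY) -> timedelta:
    """Spread the daily drift evenly over the cycles that fit into one day."""
    cycles_per_day = timedelta(days=1) / nominal  # 4.0 for a 6-hour cycle
    return nominal + drift_per_day / cycles_per_day


def predict_drops(anchor: datetime, count: int) -> list[datetime]:
    """Project the next `count` drop times, starting at the anchor itself."""
    period = effective_period()
    return [anchor + i * period for i in range(count)]


if __name__ == "__main__":
    # Hypothetical date; only the 04:14 UTC time of day comes from the post.
    anchor = datetime(2025, 1, 1, 4, 14, tzinfo=timezone.utc)
    for when in predict_drops(anchor, 5):
        print(when.isoformat())
```

A period slightly longer than 6 hours would be consistent with a timer that restarts after each event finishes, rather than a wall-clock schedule, but that is only one possible reading.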

The ZBT-2 is at the moment my only border router; no Apple stuff with Thread is connected.

velvet dagger
#

You're 100% sure it's not your internet router?

dusty egret
somber dirge
somber dirge
velvet dagger
echo flume
# velvet dagger I'm not sure, just throwing ideas

Yes, that could be the problem. I also have a full Unifi stack. In my case everything works rock solid, when I use my 7 Apple Thread Border Routers only:

  • 2 hardwired AppleTV 4K 3rd Gen
  • 4 HomePod Minis
  • 1 HomePod v2

When I enable my ZBT-1 (Home Assistant Yellow) and the OTBR add-on, it takes roughly 2 weeks before the issue starts and all my 75 Matter over Thread devices lose their connection every 6 hours. The issue stops as soon as I disable the OTBR add-on.

So, yeah, maybe it’s a combination of OTBR and Unifi.

It would be great if we could collect more data on this, e.g.

  • 6-hour issue present: yes/no
  • Matter ecosystems: Home Assistant, Apple, Google, Alexa, SmartThings, Ikea, Aqara
  • Thread Border Routers: brand/model
  • Network equipment: brand/model
velvet dagger
echo flume
#

In my case it’s as follows:

  • 6-hour issue present: yes
  • Matter ecosystems: Home Assistant, Apple
  • Thread Border Routers: 1 HA OTBR, 2 hardwired AppleTV 4K 3rd Gen, 4 HomePod Minis, 1 HomePod v2
  • Network equipment: Unifi Full Stack

—————

  • 6-hour issue present: no
  • Matter ecosystems: Home Assistant, Apple
  • Thread Border Routers: 2 hardwired AppleTV 4K 3rd Gen, 4 HomePod Minis, 1 HomePod v2
  • Network equipment: Unifi Full Stack
echo flume
velvet dagger
echo flume
#

I am not using OTBR anymore, because of this.

naive oak
#

Worth noting that when there are multiple Thread Border Routers and the nodes believe there is a communication failure, they form partitions and the partitions communicate via Thread Radio Encapsulation Link (TREL). This is essentially a WiFi/Ethernet bridge between the two Thread networks and uses IPv6/UDP with DNS-SD. This is where the potential Unifi interference may kick in.

#

If you go back a year or two, it was strongly recommended not to have multiple TBRs from different vendors, as the previous Thread spec didn't cover the scenario well. It's very possible that's still the case in practice.

#

So I think anyone experiencing this problem should try using only one Thread Border Router

somber dirge
#

6-hour issue present: yes
Matter ecosystems: Home Assistant, Apple (But issue started after I installed and then removed OTBR and ZBT-2 and never went away)
Thread Border Routers: 2 hardwired AppleTV 4K newest, 2 HomePod Minis newest
Network equipment: Unifi Full Stack

I wonder if I should install OTBR again and then remove it?

hollow prairie
#

In my case it’s as follows:

  • 6-hour issue present: yes,
  • Matter ecosystems: Home Assistant, Apple, (for now I removed the Apple HomePod Mini)
  • Thread Border Routers: 1 HA OTBR, 1 HomePod Mini, (for now I removed the Apple HomePod Mini)
  • Network equipment: Unifi Full Stack

For me it does not seem to make a difference at the moment whether the Apple HomePod Mini is added to the network or not. (The Apple HomePod Mini has already been offline for over 2 weeks now.)
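
The reports collected so far can be compared mechanically. This is a rough sketch that encodes the four reports above as sets of factors and looks for factors present in every "yes" report but in no "no" report. The factor labels and the `Report` type are made up for illustration, and four reports are far too few to prove a cause; the result only suggests where to look next.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    who: str
    issue_present: bool
    factors: frozenset


def distinguishing_factors(reports: list[Report]) -> frozenset:
    """Factors shared by all 'yes' reports and absent from every 'no' report."""
    yes = [r.factors for r in reports if r.issue_present]
    no = [r.factors for r in reports if not r.issue_present]
    if not yes:
        return frozenset()
    common_to_yes = frozenset.intersection(*yes)
    seen_without_issue = frozenset().union(*no)
    return common_to_yes - seen_without_issue


# Hand-encoded from the posts above; "OTBR ever installed" is an
# illustrative label that also covers "installed and later removed".
BASE = frozenset({"Home Assistant", "Apple", "Apple TBRs", "UniFi"})
REPORTS = [
    Report("echo flume (with OTBR)", True, BASE | {"OTBR ever installed"}),
    Report("echo flume (Apple TBRs only)", False, BASE),
    Report("somber dirge", True, BASE | {"OTBR ever installed"}),
    Report("hollow prairie", True, BASE | {"OTBR ever installed"}),
]

if __name__ == "__main__":
    print(sorted(distinguishing_factors(REPORTS)))
```

With these four reports the only distinguishing factor is the OTBR having been installed at some point; UniFi gear appears in every report, including the one without the issue, so this data cannot separate it out.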

somber dirge
echo flume
somber dirge
hollow prairie
#

Yes 40 Eve plugs in the house

somber dirge
# hollow prairie Yes 40 Eve plugs in the house

@echo flume Well, I unplugged all 7 of mine and still get my on-time matter storm. Here are all of the Matter devices I have:

Inovelli White Switches
Eve Energy plugs
Aqara Smart Locks
Sunricher RGB controllers

I've previously tried removing all the RGB controllers from the mix, but that didn't stop the storms either.

We all have something in common, it's just figuring out what it is!

somber dirge
#

Has anyone tried shutting down the matter server, then turning off TBRs for 20 minutes then turning on the Matter server and then one TBR at a time until they're all back on?? ChatGPT keeps telling me it thinks there's ghost info in the thread dataset that is causing an every-6-hour disruption. I don't know if that's right, or if it would help, but thinking of giving it a go.

hollow prairie
#

What we also all seem to have in common, apart from the Eve plugs, is the Ubiquiti gear.

somber dirge
#

So far it seems like the common thread that I've seen here and elsewhere is Ubiquiti gear and OTBR installation (and in my case, removal). We definitely have Eve in common, too. But my problems persisted after I unplugged all of my Eve devices, so I think it must be something else. I still feel like it has something to do with the OTBR software. I think I'll start a GitHub bug report in the OTBR section and see if I can get a developer to look at it.

dapper canyon
#

Not that it helps much, but I have 2 ATV 4Ks (wired) and never used the HA OTBR. I have multiple Eve energy devices (which will be going bye bye), but my Thread network is rock solid. Ruckus Wi-Fi and Netgear Pro-AV enterprise switches. 71 Thread devices and zero complaints. I was tempted a year ago to add the ZBT-1 and OTBR, but given the purported interop issues I never went down that path. I also have 31 Inovelli White switches.

somber dirge
dapper canyon
#

About two years ago I had a QNAP core ethernet switch (no Netgear Pro-AV) and Thread/Matter was a DISASTER. Horrible QNAP network firmware.

dapper canyon
somber dirge
zenith summit
#

Try playing with the leaderweight of the OTBR and see if it helps.

somber dirge
zenith summit
#

I was thinking that your current partition leader is somewhere at the opposite end of the network, and if the TBR is unable to contact the leader for some time, it will create its own partition, hence the OMR prefix changes.

#

A nearby device that can't contact its leader will accept the TBR's newly created partition too. But once the higher-priority leader is reachable again, the node will go back to the old partition. That's what I observe when I reconnect my ESP32 Thread-based light switch.

somber dirge
zenith summit
#

Could be anything, but my guess would be something related to automation ?

#

My thought behind changing leaderweight manually in the OTBR is for the OTBR to have a partition of its own that is stable(?) and that the other nodes in the Thread network will hopefully agree to switch to.
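
The partition behaviour described above can be illustrated with a toy model. This is a deliberately simplified sketch, not OpenThread's actual algorithm: it assumes a node simply joins the reachable partition with the highest leader weight and uses the partition ID as a tie-breaker, and it ignores everything else the real comparison may take into account. The weights, partition IDs, and names are invented for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Partition:
    name: str
    leader_weight: int
    partition_id: int


def preferred(reachable: Iterable[Partition]) -> Optional[Partition]:
    """Toy rule: highest leader weight wins, partition ID breaks ties."""
    return max(reachable,
               key=lambda p: (p.leader_weight, p.partition_id),
               default=None)


# Hypothetical values for illustration only.
original = Partition("original leader", leader_weight=64, partition_id=100)
otbr_default = Partition("OTBR partition", leader_weight=64, partition_id=50)
otbr_raised = Partition("OTBR partition", leader_weight=72, partition_id=50)

if __name__ == "__main__":
    # Leader out of reach: the node follows the OTBR's new partition.
    print(preferred([otbr_default]).name)
    # Leader reachable again: the node flips back, which shows up as churn.
    print(preferred([original, otbr_default]).name)
    # With a raised weight the OTBR partition stays preferred.
    print(preferred([original, otbr_raised]).name)
```

In this model, raising the OTBR's weight removes the flip back to the original partition, which is the stability the suggestion is hoping for. Whether real devices behave the same way would need to be verified on the actual network.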

somber dirge
crisp totem
#

@somber dirge I’m curious if you tried unplugging all but 1 Apple TV or HomePod and leaving the network like that overnight to see if the cycle repeats in that setup? It's always hard to troubleshoot if you have family using the Apple TBR devices during the day.

somber dirge
#

I tried each Apple TV individually and the storms continued. About an hour ago I shut down all of the Apple Thread border routers and my Home Assistant host. After 20 minutes I turned on the Apple devices. A half hour later I turned on the Home Assistant host. Everything came back cleanly and all of my Matter over Thread devices showed up on the same partition, except one. I power cycled that one and it’s now on the same partition as the others. A matter storm is usually scheduled to start in about five minutes, so I’ll see if that happens. If not, I’ll see if the storm starts six hours from restart. Or maybe no more storms at all!?

#

Ugh, matter storm right on schedule with a little bit of drift to later.

crisp totem
#

So unplug all for 15-30 minutes power only one and wait has been done? Can you tell from Apple home when the issue hits?

somber dirge
#

Do you mean unplug all Apple TBRs and leave only one? Yes, I tried that for every one of them.

crisp totem
#

No unplug all for 15-30 then only plug 1 in let it build and manage things by itself from scratch.

somber dirge
#

Yeah, tried that too. I'm convinced restarting and isolating is not a solution to my issues. I need to find the source of the 6-hour chaos.

dusty egret
#

I have seen a case recently where the local network (LAN) got assigned global IPv6 addresses by the ISP. The TBR(s) support DHCPv6 PD, hence all Thread devices also had global IPv6 addresses. Now the ISP-assigned prefix wasn't fixed, but rotated at least once a day...
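
One way to check for this situation is to note the prefix the Thread devices use at a few points in time, then see whether it is a global prefix and whether it changes. Below is a minimal sketch using Python's standard `ipaddress` module. The sample prefixes are placeholders (a made-up ULA and the IPv6 documentation range) and the function names are illustrative; the real values would have to be read from the border router or the Matter server logs.

```python
import ipaddress

ULA_RANGE = ipaddress.ip_network("fc00::/7")      # unique local addresses
GLOBAL_RANGE = ipaddress.ip_network("2000::/3")   # global unicast space


def classify(prefix: str) -> str:
    """Label an IPv6 prefix as 'ULA', 'global', or 'other'."""
    net = ipaddress.ip_network(prefix)
    if net.subnet_of(ULA_RANGE):
        return "ULA"
    if net.subnet_of(GLOBAL_RANGE):
        return "global"
    return "other"


def prefix_changes(observations: list[tuple[str, str]]) -> list[tuple[str, str, str]]:
    """Return (when, old_prefix, new_prefix) for each change between samples."""
    changes = []
    for (_, old), (when, new) in zip(observations, observations[1:]):
        if ipaddress.ip_network(old) != ipaddress.ip_network(new):
            changes.append((when, old, new))
    return changes


if __name__ == "__main__":
    # Placeholder observations: (label, prefix seen on the Thread devices).
    samples = [
        ("day 1", "2001:db8:aaaa:1::/64"),
        ("day 2", "2001:db8:aaaa:1::/64"),
        ("day 3", "2001:db8:bbbb:1::/64"),
    ]
    print(classify(samples[0][1]))
    print(prefix_changes(samples))
```

If the devices sit on a global prefix that changes over time, an ISP-side rotation like the one described above becomes a candidate cause; a stable ULA prefix would point elsewhere.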