#2x SLZB-06s as Thread Border Routers... Have 1 running, need a second.
Because of the low-level networking stuff it does, I don't think there's any ability to run multiple instances of OTBR within HAOS.
There's some trickiness about running OTBR within HAOS that makes using the HAOS OTBR alongside an external OTBR problematic in some circumstances.
Unless you're having an issue where some of your devices can't maintain a connection to the thread mesh, I'd recommend sticking with one OTBR for now.
If you do want to add another thread border router, I'd recommend using a completely separate computer to run it, and make sure that it's running OTBR with TREL enabled. The Matter server running in HAOS will always prefer to use the OTBR running as an HAOS add-on, but with TREL that OTBR can forward traffic to another TBR which has a better thread route to the destination device.
(I wouldn't be surprised if at some point in the future there is a thread border router firmware available for the SLZB-06 devices themselves - the ESP32 they use for the web interface is powerful enough to run OTBR)
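For the separate-computer setup suggested above, here is a minimal sketch of how an `otbr-agent` command line with TREL enabled might be assembled. It assumes an OTBR build with TREL support compiled in; the radio path `/dev/ttyACM0` and infrastructure interface `eth0` are placeholders. The two radio URLs mirror the ones visible in the add-on log later in this thread (`spinel+hdlc+uart://...` and `trel://end0`).

```python
import shlex


def build_otbr_agent_argv(
    radio_device: str,
    infra_iface: str,
    baudrate: int = 460800,
    thread_iface: str = "wpan0",
    enable_trel: bool = True,
) -> list[str]:
    """Assemble an otbr-agent command line (sketch; device/interface names are placeholders)."""
    # Serial RCP link: Spinel over HDLC over a UART, same URL shape as the add-on log.
    rcp_url = f"spinel+hdlc+uart://{radio_device}?uart-baudrate={baudrate}"
    # -I names the Thread network interface, -B the backbone/infrastructure interface.
    argv = ["otbr-agent", "-I", thread_iface, "-B", infra_iface, rcp_url]
    if enable_trel:
        # Second radio URL: Thread Radio Encapsulation Link over the infrastructure
        # interface, so this BR can exchange Thread frames with other TREL-capable BRs.
        argv.append(f"trel://{infra_iface}")
    return argv


print(shlex.join(build_otbr_agent_argv("/dev/ttyACM0", "eth0")))
```

Keeping the arguments as a list (rather than one string) avoids quoting problems with the `?` and `&` characters in the radio URL; `shlex.join` is only used to print a copy-pasteable version.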
I see. My main issue is that I seem to just have poor signal to a "working" Thread/Matter switch (White Series from Inovelli). It drops off the thread network frequently and reports poor signal despite not being physically far from the TBR.
you'd probably be better served by getting more devices that can act as thread routers ("Routing eligible end devices", not border routers), but there aren't a whole lot of Matter devices in that category available right now :/
pretty much just Nanoleaf bulbs and Eve outlets
one advantage of the SLZB-06 is the fact that it's PoE and fairly small - you should try moving it around to different locations to see if there's somewhere it can get a good signal to as many devices as possible to strengthen the mesh (try to reduce the number of walls/floors between it and other devices)
also look into the thread channel vs the 2.4 GHz wifi channels you're using; you might be able to reduce interference by tweaking things there.
(general recommendation is the same as zigbee - prefer wifi channels 1 and 6 and use channel 25 for thread to minimize overlap; if you have both thread and zigbee it's best to have different channels for each)
it also sounds like there might be some firmware updates for the inovelli switches to improve their network connectivity
hopefully it's not a hardware limitation, like poor placement of the antenna within the electrical box when installed or something like that :(
Thank you for the info. I do have some longer cat6a cable coming as well as a loooong USB-C to A cable today, so I'll be moving the TBR closer to the first switch, and I ordered 2 eve plugs to start off with that will definitely have good enough signal to get it all connected.
I've noticed that for some reason, I am unable to change the channel in HA for Thread. I am on 15, but would like to use 20 as that seems to sit between the two most heavily used 2.4 GHz Wi-Fi channels. I have bad/noisy Wi-Fi neighbors on my block. I set 20, but it doesn't seem to change.
I usually recommend channel 25, since that only has slight overlap with wifi channel 11. channel 15 is between wifi channels 1 and 6 and can get interference from both; channel 20 is between wifi channels 6 and 11 and can get interference from both.
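The channel arithmetic behind this can be checked directly: 802.15.4 (Thread) channels 11-26 sit 5 MHz apart starting at 2405 MHz, and 2.4 GHz Wi-Fi channels 1-13 sit 5 MHz apart starting at 2412 MHz but are each roughly 20-22 MHz wide. The sketch below lists which of the usual non-overlapping Wi-Fi channels (1/6/11) have their centre close to a given Thread channel; the 15 MHz cutoff is an assumed rule of thumb, not a standard value, and the sketch ignores regions that also use Wi-Fi channels 12 and 13.

```python
def thread_center_mhz(channel: int) -> int:
    """IEEE 802.15.4 2.4 GHz channels 11-26 (used by Thread), 5 MHz apart from 2405 MHz."""
    if not 11 <= channel <= 26:
        raise ValueError("Thread uses 802.15.4 channels 11-26")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel: int) -> int:
    """2.4 GHz Wi-Fi channels 1-13, 5 MHz apart from 2412 MHz (each ~20-22 MHz wide)."""
    if not 1 <= channel <= 13:
        raise ValueError("only Wi-Fi channels 1-13 are modelled here")
    return 2407 + 5 * channel


def nearby_wifi_channels(thread_channel: int, wifi_channels=(1, 6, 11), within_mhz: int = 15) -> list[int]:
    """Wi-Fi channels whose centre is within `within_mhz` of the Thread channel's centre.

    The 15 MHz cutoff is an assumed rule of thumb, not a standard value.
    """
    f = thread_center_mhz(thread_channel)
    return [w for w in wifi_channels if abs(wifi_center_mhz(w) - f) <= within_mhz]


for ch in (15, 20, 25):
    print(ch, thread_center_mhz(ch), nearby_wifi_channels(ch))
# 15 2425 [1, 6]
# 20 2450 [6, 11]
# 25 2475 [11]
```

Channels 15 and 20 each sit between two Wi-Fi channels, while 25 only has Wi-Fi channel 11 (2462 MHz) as a neighbour, 13 MHz below it.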
not sure why you'd be having an issue with channel change - if you're only running otbr from haos and no other thread border routers, it's unlikely that some other device will be overriding the change.
it can sometimes take a while for the channel change to commit, since it's a multistage process where each device in the thread network gets informed of the pending switchover to a new channel, and then the actual switchover happens afterwards.
Yeah, there's a 5 minute delay before the channel switch actually starts. Did it change channels now?
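The multistage switch described in the two messages above uses Thread's Pending Operational Dataset: the dataset with the new channel is distributed to every device first, together with a delay timer, and each device only applies it once that timer expires. The toy model below is a simplified sketch, not OpenThread's implementation; the class and field names are hypothetical, and the 300-second delay corresponds to the 5-minute figure mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dataset:
    timestamp: int  # newer datasets carry a larger timestamp
    channel: int


@dataclass
class ToyThreadNode:
    """Hypothetical, heavily simplified model of one node applying a channel change."""

    active: Dataset
    pending: Optional[Dataset] = None
    apply_at_s: Optional[float] = None  # time at which the delay timer expires

    def receive_pending(self, dataset: Dataset, now_s: float, delay_s: float) -> None:
        # Stage 1: the new dataset reaches every node along with a delay timer.
        if dataset.timestamp > self.active.timestamp:
            self.pending = dataset
            self.apply_at_s = now_s + delay_s

    def tick(self, now_s: float) -> None:
        # Stage 2: when the timer expires, the pending dataset becomes the active one.
        if self.pending is not None and now_s >= self.apply_at_s:
            self.active, self.pending, self.apply_at_s = self.pending, None, None


node = ToyThreadNode(active=Dataset(timestamp=1, channel=20))
node.receive_pending(Dataset(timestamp=2, channel=25), now_s=0, delay_s=300)
node.tick(now_s=120)
print(node.active.channel)  # 20: still waiting for the delay timer
node.tick(now_s=300)
print(node.active.channel)  # 25: switched
```

This is why the reported channel can legitimately stay unchanged for several minutes after a change is requested.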
I see. I'm changing it to channel 25 now so we'll see
5 minutes passed now. Did it change now?
No, it's still reporting channel 20. Should I restart the TBR itself or OTBR?
You can try to reload just the OTBR integration in HA, and then the Thread integration. Otherwise, try restarting everything (HA and the TBR)
I've restarted add-ons, integrations, and the TBR. Not HAOS though
Doing that now
Okay restarting HAOS got it to report channel 25
With all that, I am just waiting for supplies to come and will test more on Saturday
Okay, I was able to get things done last night. Installed 2 Eve plugs; now the network spread seems much better and I don't need to move the coordinator after all. The switch now reports green on the signal strength
Waiting on a load resistor for my neutral-less switch to make sure that's good but I do anticipate it will be
@hybrid lotus you are running the SLZB right ?
Are you running it over PoE or USB ?
I'm testing two of these units but I'm seeing some crashing. Wonder if that is a generic issue with these things.
Disable the watchdog until it crashes and then one of the last lines would be "radio tx timeout"
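A minimal sketch for checking the tail of a saved add-on log for that message, assuming the log has been copied into a text file. The file name, the 50-line window and the case-insensitive match are our own choices; the exact wording and casing of the real log line may differ.

```python
from collections import deque
from pathlib import Path


def tail_contains(path: Path, needle: str = "radio tx timeout", last_n: int = 50) -> list[str]:
    """Return the lines among the last `last_n` lines of `path` that contain `needle` (case-insensitive)."""
    with path.open(encoding="utf-8", errors="replace") as fh:
        tail = deque(fh, maxlen=last_n)  # keeps only the final `last_n` lines in memory
    return [line.rstrip("\n") for line in tail if needle.lower() in line.lower()]
```

Usage: `tail_contains(Path("otbr.log"))` returns the matching lines, or an empty list if the crash left no such message near the end of the log.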
for that matter, is it SLZB-06 or SLZB-06M? one has the CC2652P, the other EFR32MG21, so the RCP firmware is likely to be a fair bit different
Yes. The SLZB is powered and network connected over ethernet, but protocol communication needs to be over USB in the current firmware and OTBR add-on
Mine is the SLZB-06
06M should work fine as well but it's the "less stable" chipset.
the 06M is the same chip as is in the Skyconnect/ZBT-1 - it's probably better tested as a thread rcp for use with home assistant than the TI chip
(iirc the recommendations for the CC2652P are mostly fairly old, coming from people using Z2M prior to the "ember" zigbee driver becoming available)
Ah well SMLIGHT says it's not their best chip lol
Anyway, if the add-on is crashing or terminating, likely a config or setup issue
tube has also reported issues with lockups doing thread on the CC2652P with their hardware: #the-workshop message (when used with serial over network rather than a direct USB connection)
Solution is to use USB for the time being. Are they running dev firmware?
I have one of each, both show the same symptoms. Trying to determine if it's these devices, me being stupid or some other issue unrelated to these devices.
Where did you read about protocol communication over USB? From a reliability perspective that makes sense, because it's kind of dangerous to tunnel the real-time RCP protocol over Ethernet, but because some people are raving about it, I really wanted to give it a try. Especially since we can't fix the issues with Apple border routers.
So far the most stable setup I could reach is with just a pi on each floor with a dedicated ZBT-1
No the radio firmware is crashing. Most likely because of timing issues
ha, that kind of defeats the whole purpose of the device, making it a somewhat expensive alternative to a ZBT-1 or Sonoff-E. As far as I'm aware, you can't run super long USB cables without risk
I will ping the company that they should either remove the thread over ethernet support or mark it as unstable.
If I followed the website etc correctly they even offer it over wifi, which is insane.
Hmm, coming back on this topic there might be something else going on as I used virtual machines to create the HA setups required to run the OTBR addon connected to the radios. That could also mess with the timings so I'm doing another test with a HA Green, so a dedicated device, connected to one of these SLZB devices.
But yeah, I agree with @eternal grail here that ideally these devices should/could run OTBR themselves.
It's just that I'm a bit worried that the ESP32 is not capable enough for that.
I have none of the issues you're experiencing and my setup is with the device as a TBR, powered by PoE and it has network access, but in the HA add-on it's configured to use USB as the communication method.
I have 3 border routers, one for Z-wave LR, one for Zigbee and one for Thread
I wasn't able to get the add-on to see the SMLIGHT device for Thread over Ethernet; it would only talk to it over USB. I am running only the latest "stable" version, not the dev version, of the Matter over Thread firmware
As someone who has been using the Espressif thread border router devkit as their (only) thread border router for a while, it honestly seems fine?
Memory limits might be a problem on systems with large numbers of registered services (SRP)
ahhh, yes, now I get what you mean. No, that is not using USB as the communication method. It's just there to pass the form validation. Basically you are using RCP (Thread) over Ethernet with success!
your config should look like this (more or less), so the USB device on top is just bogus. It's all about the "network device" setting.
It is ? I remember you saying it crapped out at about 10 devices ?
The SLZB-06 can run powered by PoE with the ESP32 accessible over ethernet while the CC2652P is available via USB through the USB UART chip.
Issue with their default config having mDNS entries set to 10. I bumped that way up and it works quite well.
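If the limit in question is the ESP-IDF mDNS component's `CONFIG_MDNS_MAX_SERVICES` option (our assumption; the message only says the default was 10), raising it means rebuilding the firmware with a different value in `sdkconfig.defaults`. A small sketch that sets or replaces such a line; the value 50 in the usage note is an arbitrary example.

```python
import re


def set_sdkconfig_option(text: str, key: str, value: str) -> str:
    """Return `text` with `key=value` set, replacing an existing line or appending a new one.

    Handles both `KEY=...` lines and `# KEY is not set` lines.
    """
    line = f"{key}={value}"
    pattern = re.compile(rf"^(# )?{re.escape(key)}(=.*| is not set)$", re.MULTILINE)
    if pattern.search(text):
        # A function replacement avoids backslash escapes in `value` being interpreted.
        return pattern.sub(lambda _m: line, text, count=1)
    separator = "" if text.endswith("\n") or not text else "\n"
    return text + separator + line + "\n"
```

Usage: `set_sdkconfig_option(Path("sdkconfig.defaults").read_text(), "CONFIG_MDNS_MAX_SERVICES", "50")`, then write the result back and rebuild.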
I have to take a look at my config shortly
Ah that is nice to hear. Did you get the devkit or something like the S20 from GL-iNet ?
I'm using the Espressif tbr devkit with ethernet sub-board. (Got it off their AliExpress store). The ESP32-S3 on that seems like it should be similar in speed/capability to the ESP32 chips in the SLZB-06 and TubeZB devices.
Now the question is if they have plans to make the device a BR or the community should pick it up
Honestly my main issue with the devkit has been no good case options unless you own a 3d printer.
Yeah that is always the issue with these devkits.
A community ESP32 tbr firmware might be a good idea, to clean up and bring some polish and missing configurability to the Espressif "example" code.
If someone was really adventurous (probably not me, heh) they'd add the matter sdk too and implement the matter thread border router device :)
haha, now you are overdoing it 🙂
Let's start with step one, which is an actual working OTBR on ESP32 that is compatible with HA (so the API exists to send the creds)
That works with the Espressif example; I set up my thread network through HA's OTBR integration. I've done a channel migration too.
Nice!
So with this config, I'm not talking to the radio over USB?
Yes you are - did you try removing the USB cable? You can select anything you want for the USB port btw
The firmware is set to network protocol, but it wouldn't work without the USB attached iirc
It should - that is how I configured them as well. And some good news (knock on wood): the device is still alive now that it's connected to a physical host. So there seems to be something off with my VM host. I'll let it run overnight to see if it remains stable
Yeah, your device is definitely communicating over ethernet.
I'll try without the USB when I get home
I thought that the USB was required even though it was set to ethernet
If I select /dev/ttyAMA10, it will work?
Yes, just select anything you want - it's not used, but it needs to be set just to pass the config form validation
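The behaviour described in the last few messages (only the network device setting matters; the USB selection only has to pass validation) can be summarized as a small hypothetical model, not the add-on's actual code. The option names `device`, `network_device` and `baudrate` are our assumption about the add-on's configuration keys, the address `192.168.1.50:6638` is a placeholder, and `/tmp/ttyOTBR` is the local endpoint that appears in the add-on log later in this thread next to the `socat-otbr-tcp` service.

```python
def effective_radio_url(options: dict) -> str:
    """Hypothetical model: which serial endpoint otbr-agent ends up talking to."""
    baud = options.get("baudrate", 460800)
    if options.get("network_device"):
        # The TCP socket to the stick is bridged (socat) to a local pseudo-terminal,
        # so the selected USB `device` is never opened in this case.
        path = "/tmp/ttyOTBR"
    else:
        # No network device configured: the USB radio is used directly.
        path = options["device"]
    return f"spinel+hdlc+uart://{path}?uart-baudrate={baud}"


opts = {"device": "/dev/ttyAMA10", "network_device": "192.168.1.50:6638", "baudrate": 460800}
print(effective_radio_url(opts))
```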
Got it. I'll test tonight
i saw a recommendation somewhere to set it to /dev/null to make it obvious that it's not being used, so you don't get confused later
Where do I set that manually?
hmm. might have to use the three dot menu and edit it as yaml to do that :/
(I'm not actually sure if that passes the validation properly, tbh, i haven't tried it myself)
Yeah, it doesn't pass validation - you need to select one from the list, which then in turn is not used at all. It's a bit of a funky UI.
If this whole serial over IP does in fact work (with Ethernet; I don't trust WiFi for this at all) - we should totally fix this UX.
Small update on my end. The OTBR crashes seem to be unrelated to the SMLight stick. It looks like it happens with any virtualized instance of OTBR, even if you just plug in a ZBT-1 stick. I can now reproduce it on 2 different virtualization hosts (although both are based on KVM). Maybe somebody else reads this and wants to confirm? Make sure to disable the watchdog on the OTBR add-on so you notice when it crashes; otherwise it restarts.
Running HAOS on QEMU/KVM ....Not exactly sure the scenario you are asking about, but I did unplug, wait, plug back in the ZBT-1 and found the OTBR detected an i/o error and did what looks like a proper shutdown.
Well I'll be. Unplugged it and it's working fine.
I threw some rcp firmware on a spare esp32-h2 i had and added it via usb passthrough to my haos vm with otbr; i'll let you know if i hit an issue
Yeah, well, I wouldn't call it fine. I have been running tests all day together with @median venture and we conclude that the approach of RCP over IP is dangerous. There were multiple occurrences of a radio/firmware lockup where the OTBR could no longer communicate with the stick, leading to a mess. So we actually do not recommend these sticks for Thread when using the Ethernet connection. Instead, just use the USB connection or use them for Zigbee (where one could again argue how smart it is to forward the serial protocol over IP).
Super nice. I'm really curious about your findings. Other than the issue with RCP over IP with that SMLight device, I think it is still a super nice device hardware-wise and if we/they can somehow manage to get OTBR running on the ESP32, it will be a killer device.
home assistant automatically joined the haos otbr to my existing thread network with the esp ot br devkit, and from the add-on logs i think it discovered it for trel, which is neat.
I've had quite a good experience with the stick. Has been working flawlessly since I set it up, aside from my range issue
I think the stick is fine (very fine even) as long as you connect it to USB
Which was my fault, rectified by adding thread devices between the others
It's been over 12 hours since I disconnected the USB and not a single hiccup. And obviously I'd not been running over USB to begin with (without realizing it) and so it's been several days, no issues.
Perhaps there's an issue elsewhere on the network that's causing these issues for you? Network devices, switches, cables, etc?
could also be CPU contention issues when running HAOS in a VM. I'm running the VM on my NAS and I sometimes do cpu-intensive stuff there, so if that's the cause I might be able to reproduce it.
I'm running HAOS on a bare metal Pi5 8GB with a PoE+ hat that uses a 128GB NVMe drive. My network stack is all Unifi
My setup is HAOS OVA via libvirt/kvm on a fairly powerful x86-64 Linux NAS. Notably, the NIC on this is an Intel 82599 10gbit card, shared with the VM via PCIe SR-IOV (so I'm not dealing with any software network virtualization stuff)
That is exactly why it is so dangerous to do MRP over IP. It's a real-time protocol not meant to be routed over an IP network. A bit of congestion on the network can take your Thread network down.
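To see whether a network path shows the kind of latency spikes described above, one rough check is to sample TCP connect times to the stick. This is a sketch under stated assumptions: the host, port, sample count and 20 ms threshold are hypothetical placeholders, and connect time only approximates what the radio stream experiences. Pointing it at a port other than the radio socket (for example the device's web interface) avoids disturbing the connection OTBR is using.

```python
import socket
import statistics
import time


def sample_connect_ms(host: str, port: int, count: int = 50, timeout_s: float = 2.0) -> list[float]:
    """Time `count` TCP connects to host:port, in milliseconds (failed attempts are skipped)."""
    samples = []
    for _ in range(count):
        start = time.perf_counter()
        try:
            with socket.create_connection((host, port), timeout=timeout_s):
                samples.append((time.perf_counter() - start) * 1000)
        except OSError:  # refused, unreachable or timed out
            pass
        time.sleep(0.1)  # space the attempts out a little
    return samples


def summarize(samples_ms: list[float], threshold_ms: float = 20.0) -> dict:
    """Median, worst case and the number of samples above an (arbitrary) threshold."""
    if not samples_ms:
        return {"count": 0}
    return {
        "count": len(samples_ms),
        "median_ms": statistics.median(samples_ms),
        "max_ms": max(samples_ms),
        "over_threshold": sum(s > threshold_ms for s in samples_ms),
    }
```

Usage (placeholder address): `summarize(sample_connect_ms("192.168.1.50", 80))`. A low median with occasional large maxima is the pattern to watch for.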
On a HA Green it took much longer to reproduce but on a VM it was not that hard even. Could reproduce it on a Proxmox host and a NAS.
Well, that is a setup that the average person doesn't have. I tried to simulate common scenarios, such as running OTBR on a HA Green, a pi4 and as VM on a Synology NAS, setups that are pretty common. Strangely enough the issue is much worse on the VM but maybe that adds just enough additional latency to break it sooner.
Why do you have 2 of them sitting on the same rack? Or is this one zigbee and one thread ?
Also, why not put it closer to your devices instead of in this noisy environment with interference from the network gear.
@hybrid lotus Have you updated the Inovelli switch firmware to 1.0.5? Inovelli increased the thread radio power from the initial firmware release to increase range and consistency.
One is Zigbee and one is Thread, Zigbee on Ch 20 and Thread on Ch 25.
Yeah, they're both on 1.0.5. One switch reports green, the other flashes red on signal check. Engineer sent me an email saying that their signal check is incorrect and they have no ETA to fix.
Also, it is less than 10ft to the farthest device, it's just that my home is old and has old building materials.
so signal is very weak to those end points. Simply adding 2 Eve smart plugs corrected that.
So the issue is the network conditions, less the environment the controller runs under?
It's a combination of both. A single missed (or delayed) packet can already give issues.
The problem gets worse when multiple BRs and TREL are involved. If one BR crashes and gets restarted by the watchdog, the traffic gets sent to nowhere. Maybe with a couple of devices, a very solid Ethernet connection, a single BR and a physical (not VM) HA host this can actually work, but those are too many variables to account for. Our conclusion is final: this is not suitable for daily usage, we have added a big fat warning to the OTBR docs, and I've sent the company an email about our findings.
Roger that. I'll stick to networking for now since it's working for me (and it's cleaner with 1 cable to the device). If any issues arise, I'll switch it out to USB control.
With that, a question: Is OTBR implementing support for additional TBRs? While adding some devices to extend the mesh is viable, I do have two of these and would ultimately like to move them to better spots and maintain one cohesive mesh.
Also, with your testing was that with the "stable" firmware or the dev firmware?
I tried one with stable firmware and one with esphome+thread. Didn't try the dev one
But still, we already knew the downsides of it, now its proven.
Problem is that our add-on system doesn't allow multiple copies. And even if we did that, it uses host networking (and its mDNS name), so this would quickly lead to issues. Otherwise we could have simply copied the add-on (so you have otbr2, otbr3, etc.). In the end I just ran a couple of VMs with a plain HA instance + OTBR add-on per stick, but that is a bit overkill.
For now the only way would be to fire up a separate raspberry pi. This can be a pi3, as long as it has 1gb memory.
Ultimately you just want to be able to install the OTBR on something small like a pi zero 2w or even better, run OTBR on the ESP chip
I see. I'll have to research that and how to connect the second one into HA
I see a lot of this
`14:52:33.910 [W] Nat64---------: incoming message is an IPv4 datagram but no NAT64 prefix configured, drop
14:52:33.940 [W] Nat64---------: incoming message is an IPv4 datagram but no NAT64 prefix configured, drop
14:52:33.943 [W] Nat64---------: incoming message is an IPv4 datagram but no NAT64 prefix configured, drop`
The last timestamp for it is 14:52:51.083
Doesn't seem like it keeps normal timestamps for some reason? Seems more like hours passed since successful startup
Cuz I see this startup sequence
`-----------------------------------------------------------
Add-on: OpenThread Border Router
OpenThread Border Router add-on
Add-on version: 2.12.2
You are running the latest version of this add-on.
s6-rc: info: service socat-otbr-tcp successfully started
System: Home Assistant OS 14.0 (aarch64 / raspberrypi5-64)
Home Assistant Core: 2024.12.3
Home Assistant Supervisor: 2024.12.0
Please, share the above information when looking for help
or support in, e.g., GitHub, forums or the Discord chat.
s6-rc: info: service banner successfully started
s6-rc: info: service universal-silabs-flasher: starting
[01:57:35] INFO: Flashing firmware is disabled
s6-rc: info: service universal-silabs-flasher successfully started
s6-rc: info: service otbr-agent: starting
[01:57:35] INFO: Setup OTBR firewall...
[01:57:36] INFO: Starting otbr-agent...
[NOTE]-AGENT---: Running 0.3.0-b041fa52-dirty
[NOTE]-AGENT---: Thread version: 1.3.0
[NOTE]-AGENT---: Thread interface: wpan0
[NOTE]-AGENT---: Radio URL: spinel+hdlc+uart:///tmp/ttyOTBR?uart-baudrate=460800&uart-init-deassert
[NOTE]-AGENT---: Radio URL: trel://end0
tiocmbic: Inappropriate ioctl for device
[NOTE]-ILS-----: Infra link selected: end0
56d.05:01:17.157 [C] P-SpinelDrive-: Software reset co-processor successfully
00:00:00.066 [N] RoutingManager: BR ULA prefix: fd33:1b06:a59c::/48 (loaded)
00:00:00.066 [N] RoutingManager: Local on-link prefix: fdd3:17d2:c78f:131a::/64
00:00:00.087 [N] Mle-----------: Role disabled -> detached
00:00:00.098 [N] P-Netif-------: Changing interface state to up.
00:00:00.109 [W] P-Netif-------: Failed to process request#2: No such process
00:00:00.109 [W] P-Netif-------: Failed to process request#6: No such process
s6-rc: info: service otbr-agent successfully started
s6-rc: info: service otbr-agent-configure: starting
Done
s6-rc: info: service otbr-agent-configure successfully started
s6-rc: info: service otbr-agent-rest-discovery: starting
00:00:00.385 [N] Mle-----------: Role detached -> leader
00:00:00.385 [N] Mle-----------: Partition ID 0x51210602
[NOTE]-BBA-----: BackboneAgent: Backbone Router becomes Primary!
00:00:00.500 [W] Mle-----------: Failed to process Link Accept: Security
00:00:00.762 [W] Mle-----------: Failed to process Link Accept: Security
[01:57:36] INFO: Successfully sent discovery information to Home Assistant.
s6-rc: info: service otbr-agent-rest-discovery successfully started
s6-rc: info: service legacy-services: starting
s6-rc: info: service legacy-services successfully started`
Then this https://pastebin.com/qn6Mfke4
And then after that is just a spam of incoming message is an IPv4 datagram but no NAT64 prefix configured, drop
But the spam stopped
It means one (or more) of your devices wants to reach the internet (NAT64)
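NAT64 lets IPv6-only Thread devices address IPv4 hosts by embedding the IPv4 address in the low 32 bits of an IPv6 address under a /96 prefix; the log lines above say no such prefix was configured, so the datagram was dropped instead of translated. A small illustration using Python's `ipaddress` module and the RFC 6052 well-known prefix `64:ff9b::/96`. The OTBR may instead use its own locally generated prefix, and the addresses here are documentation examples.

```python
import ipaddress

WELL_KNOWN_NAT64 = ipaddress.IPv6Network("64:ff9b::/96")  # RFC 6052 well-known prefix


def synthesize(ipv4: str, prefix: ipaddress.IPv6Network = WELL_KNOWN_NAT64) -> ipaddress.IPv6Address:
    """IPv4 -> IPv6 under a /96 NAT64 prefix: the IPv4 address fills the last 32 bits."""
    return ipaddress.IPv6Address(int(prefix.network_address) | int(ipaddress.IPv4Address(ipv4)))


def extract(ipv6: str, prefix: ipaddress.IPv6Network = WELL_KNOWN_NAT64):
    """Recover the embedded IPv4 address, or None if `ipv6` is not under the prefix."""
    addr = ipaddress.IPv6Address(ipv6)
    if addr not in prefix:
        return None
    return ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)


print(synthesize("192.0.2.1"))  # 64:ff9b::c000:201
print(extract("64:ff9b::c000:201"))  # 192.0.2.1
```

This simple bit-packing only applies to /96 prefixes; RFC 6052 defines other prefix lengths with a different layout.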
Yeah, that is time since start.
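The timestamps at the start of the otbr-agent lines above (`00:00:00.066`, `14:52:33.910`, and once `56d.05:01:17.157`) are uptime, optionally prefixed by a day count, rather than wall-clock time. A small parser sketch; the format is inferred from the lines pasted in this thread, not taken from OpenThread documentation.

```python
import re

# [<days>d.]HH:MM:SS.mmm at the start of a log line, as in the pasted otbr-agent output.
UPTIME_RE = re.compile(r"^(?:(\d+)d\.)?(\d{2}):(\d{2}):(\d{2})\.(\d{3})")


def uptime_seconds(line: str):
    """Seconds since start for a log line beginning with an uptime stamp, else None."""
    m = UPTIME_RE.match(line)
    if m is None:
        return None
    days, hh, mm, ss, ms = m.groups()
    return int(days or 0) * 86400 + int(hh) * 3600 + int(mm) * 60 + int(ss) + int(ms) / 1000


line = "14:52:33.910 [W] Nat64---------: incoming message is an IPv4 datagram but no NAT64 prefix configured, drop"
print(round(uptime_seconds(line) / 3600, 2))  # 14.88 (hours since start)
```

Lines without a stamp (for example the `[NOTE]-AGENT---` lines or the s6-rc messages) return `None`.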
How do I figure it out? I only have 4 Thread devices so far.
Which ones do you have?
Two Inovelli White switches and two Eve Smart Plugs
Most probably the Inovellis then. They are rather advanced. I know for sure the Eves do not do NAT64
Ah okay. Is that a bad message to see? Or I can safely ignore it?
just ignore it
Got it, ty
fyi - during my testing I haven't seen any issues where the otbr add-on itself failed when running it in a HAOS vm - but I am seeing issues where communication between HA and the supervisor breaks while the OTBR add-on is running. See this thread: #1318380648221638738 message
I've had to turn the OTBR add-on back off for the time being to get my HAOS box stable again.
In the end it turned out to be the combination of SMLight + VM.
Those SMLight devices are bad when used for thread. Stefan already warned about doing MRP over IP but it turned out to be even worse than we expected. Too bad, because otherwise these are solid devices. Hopefully one day they manage to run OTBR on the ESP32
Anyways, keep an eye on that discovery about the supervisor. So far I haven't heard about that at all. If you are able to reproduce it, feel free to ping me.
This is in a VM though?
Seems like these issues are VM related. While obviously there should be no issues, I'm having a fine experience thus far on bare metal HAOS.