#How to troubleshoot and fix failure to convert HomeKit device from BLE to Thread

astral lava
#

I have an ESP Thread Border Router board which I flashed using the Espressif docs as a guide, minimally customising e.g. network name, keys etc. and making sure that commission and join were enabled. This is the only Thread Border Router which I have.

The Thread integration in HA picks up my config ok, and having added the Open Thread Border Router integration I am able to configure my new Thread network / board in HA's Preferred Network for Thread. Credentials copied to my iPhone ok (first time I tried it failed, but then it succeeded).

Everything looks ok until I click on "Provision Preferred Thread Credentials" on the connected HomeKit device (Eve MotionBlinds) which I'm trying to convert from BLE (ESPHome BLE Proxies) to Thread. After I click on PRESS, the device and all of its entities go unavailable and don't recover. Nothing sticks out in the logs which gives me any clues at this point, although admittedly I haven't enabled debugging for the HomeKit integration yet.

I've tried a few different basic steps like switching between ethernet and wifi on the ESP board (and reconfiguring with the new IP in HA), moving the board closer to the blinds, but neither of these steps seemed to make a difference.

My ultimate aim is to get Thread working using this ESP board (I don't have any use for a HomePod or AppleTV) for a total of 4 Eve MotionBlinds, but without converting the blinds to Matter. Aside from enabling debugging for HomeKit, does anybody have any pointers for anything else to check? Should this work / is it supported?

I'm using HA Core 2024.8.3 through Docker, network host mode, IPv6 enabled.

astral lava
#

Just to close the loop on this, in case anyone comes looking in future... debug in HA didn't help too much (for HomeKit or Thread), but what I did see was in the web ui for the ESP Thread Border Router, the child device was present in the Topology section.

It turns out there are a couple of extra commands which I needed to execute on my HA container host, in order to configure bi-directional IPv6 connectivity. It's documented by Espressif, I just didn't initially realise I would need to do it:

sudo sysctl -w net/ipv6/conf/eth0/accept_ra=2
sudo sysctl -w net/ipv6/conf/eth0/accept_ra_rt_info_max_plen=128

https://docs.espressif.com/projects/esp-thread-br/en/latest/codelab/connectivity.html

Anyway, bit of PEBKAC / RTFM, but all appears to be working now.
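
For anyone repeating this, a rough sketch of how the two settings could be checked and previewed in sysctl.d form before persisting them. The interface name `eth0` and the helper names `ra_sysctl_lines` and `current_value` are assumptions for illustration; `sysctl` accepts both the slash notation used above and dot notation.

```shell
#!/bin/sh
# Assumption: eth0 is the interface that faces the border router.
IFACE="${IFACE:-eth0}"

# Emit the two settings in sysctl.d (dot) notation.
# Arguments: interface, accept_ra value, accept_ra_rt_info_max_plen value.
ra_sysctl_lines() {
    printf 'net.ipv6.conf.%s.accept_ra=%s\n' "$1" "$2"
    printf 'net.ipv6.conf.%s.accept_ra_rt_info_max_plen=%s\n' "$1" "$3"
}

# Read a current value straight from /proc; prints "unknown" if the path is absent.
current_value() {
    cat "/proc/sys/net/ipv6/conf/$1/$2" 2>/dev/null || echo unknown
}

echo "accept_ra now: $(current_value "$IFACE" accept_ra)"
echo "accept_ra_rt_info_max_plen now: $(current_value "$IFACE" accept_ra_rt_info_max_plen)"

# Preview the lines using the values from the commands above.
ra_sysctl_lines "$IFACE" 2 128
```

Values set with `sysctl -w` do not survive a reboot. One way to keep them would be to write the previewed lines to a file under `/etc/sysctl.d/` (for example `ra_sysctl_lines eth0 2 128 | sudo tee /etc/sysctl.d/99-thread-br.conf`, where the file name is hypothetical) and reload with `sudo sysctl --system`.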

serene crag
#

Interesting! Home Assistant Python Matter Server README shows the following two commands:

sysctl -w net.ipv6.conf.wlan0.accept_ra=1
sysctl -w net.ipv6.conf.wlan0.accept_ra_rt_info_max_plen=64

What is better (ra=1/2, plen=64/128)?

https://github.com/home-assistant-libs/python-matter-server?tab=readme-ov-file#requirements-to-communicate-with-thread-devices-through-thread-border-routers

astral lava
#

Good question. Probably a bit beyond me though - perhaps someone else can chime in?

modern fiber
#

i'd follow the HA recommendations for the home assistant side of things - they should be better tested, and should be based on what's used in HAOS (hopefully)

#

but if you have forwarding enabled on the network interface (i.e. the machine is acting as a router) then you will need to set accept_ra=2
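
As background to the rule above: in the kernel's sysctl documentation, `accept_ra=1` accepts Router Advertisements only while forwarding is disabled, and `accept_ra=2` accepts them even with forwarding enabled. `accept_ra_rt_info_max_plen` is the longest prefix length the kernel will accept in an RA route information option, so 64 ignores anything longer than a /64 while 128 accepts every length. A minimal sketch of the check, assuming the interface is `wlan0` (the helper name `needed_accept_ra` is made up for this example):

```shell
#!/bin/sh
# Map an interface's forwarding flag to the accept_ra value needed
# for Router Advertisements to be honoured on it.
needed_accept_ra() {
    case "$1" in
        0) echo 1 ;;        # forwarding off: accept_ra=1 is enough
        1) echo 2 ;;        # forwarding on: RAs are ignored unless accept_ra=2
        *) echo unknown ;;  # flag could not be read
    esac
}

# Assumption: wlan0 is the interface that faces the border router.
IFACE="${IFACE:-wlan0}"
fwd="$(cat "/proc/sys/net/ipv6/conf/$IFACE/forwarding" 2>/dev/null || echo unknown)"
echo "forwarding=$fwd -> accept_ra=$(needed_accept_ra "$fwd")"
```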

#

as an aside, on the esp thread border router, make sure you increase the MDNS max services config (CONFIG_MDNS_MAX_SERVICES in sdkconfig). The default is … 10? I think, and that's low enough that it can run out of MDNS services after only a few matter devices have been added.

#

symptom is that commissioning new devices will fail when it tries to find the device on the network, since the thread border router isn't able to publish the mdns records to let the commissioner find the device.
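
A quick way to see what a build is currently using might look like the sketch below. It assumes it is run from the border router project directory where ESP-IDF generated `sdkconfig`; the helper name `mdns_max_services` is hypothetical.

```shell
#!/bin/sh
# Print the configured mDNS service limit from an ESP-IDF sdkconfig file,
# or "unset" when the option (or the file) is not present.
mdns_max_services() {
    value="$(sed -n 's/^CONFIG_MDNS_MAX_SERVICES=//p' "$1" 2>/dev/null | tail -n 1)"
    echo "${value:-unset}"
}

# Assumption: sdkconfig sits in the current directory unless SDKCONFIG says otherwise.
mdns_max_services "${SDKCONFIG:-sdkconfig}"
```

The value can then be raised through `idf.py menuconfig` or by editing `sdkconfig`, followed by a rebuild and reflash of the board.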

#

running the TBR with the usb serial port connected to a computer isn't a bad idea when something's going wrong, since there might be some debug logs.

earnest coral
#

I think we’ve seen that behaviour with those blinds before unfortunately.

astral lava
#

I've made a couple of changes to see if it helps with stability: changed the thread channel to something less congested, plus switched to ethernet. So far the connection has been up for just under two days without any dropouts. Will check in again with an update.

earnest coral
#

another case I remembered was someone had their blinds near a border router, but also had another google device acting as a BR that they hadn't mentioned. it was out of range of the rest of the thread network. because they don't have TREL, some packets would randomly go to that other BR and get sprayed into the ether. it sounds like you only have the 1 BR, but worth mentioning (and keeping in mind if you grow your mesh).

#

FYI: This isn't relevant to your current status, but the sysctls above aren't exactly what HAOS does

#

HAOS uses NetworkManager, which processes RAs itself. So it doesn't need the max plen setting at all.

#

And I think it forces accept_ra to 0 so that it and the kernel don't both try to program the same routes into the routing stack

#

they do program the routes differently from each other, so it used to be that the accept_ra* variants were superior to NetworkManager (and HAOS).

#

HAOS has carried NetworkManager patches for a long time to make it work properly, and I think NetworkManager might have finally caught up a couple years later.

#

accept_ra=2 is suspicious, however, as it implies net.ipv6.conf.all.forwarding is on.

#

when that is on, if your BR changes its fe80 address (unstable wifi, firmware bugs, or just decided to for unknown reasons) it will take up to 30 minutes for HA to stop routing packets to the old fe80 address.
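
If you suspect this, one thing to look at is which next hop the RA-learned routes point to, and compare it with the border router's current fe80 address. A rough sketch, assuming a host with iproute2 installed (the helper name `ra_next_hops` is made up for this example):

```shell
#!/bin/sh
# Read `ip -6 route` output on stdin and print the next hop of every route
# that was learned from a Router Advertisement (marked "proto ra").
ra_next_hops() {
    awk '/ proto ra( |$)/ { for (i = 1; i < NF; i++) if ($i == "via") print $(i + 1) }' | sort -u
}

if command -v ip >/dev/null 2>&1; then
    # Next hops currently used for RA-learned routes.
    ip -6 route show | ra_next_hops
    # Neighbour cache: an entry for one of those next hops that is not
    # REACHABLE would be consistent with routing to an old address.
    ip -6 neigh show || true
fi
```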

astral lava
#

Thanks for the explanation @earnest coral . If I get stability issues I'll try accept_ra=1 and see if that makes a difference. For now seems to be working ok, but it's only been a couple of days.

earnest coral
#

the crucial one is net.ipv6.conf.all.forwarding. if it is set to 1 (and you aren't running HAOS) then your setup /is/ broken, there are just various levels of luck to how badly you will feel it

#

depending on your pov, it's a bug in an ipv6 RFC where there is some tension between what is a client and what is a router

#

and the spec forbids routers to do some things that clients do, one of which is dead neighbour detection

#

without the HAOS kernel patch, it's net.ipv6.conf.all.forwarding that controls whether Linux has the client behaviour or the router behaviour

#

ethernet will likely help, as it's people with ropey wifi that get hit the worst
