# Enabling OTBR Add-on Firewall breaks TREL?


wind monolith
#

This is something that I noticed during testing with multiple Thread border routers. I've got OTBR Add-on 2.12.3 on HAOS, along with an Espressif TBR devkit running a build using a development ESP-IDF with TREL support.

If the HA OTBR Firewall is enabled, the Add-on logs do not show any traffic using the "trel" radio. Additionally, the logs on my Espressif TBR include messages indicating that forwarding packets over TREL to the HA OTBR is failing:

I(509980) OPENTHREAD:[N] MeshForwarder-: Failed to send IPv6 UDP msg, len:95, chksum:8306, ecn:no, to:8e291b6bd127e2fd, sec:no, error:Abort, prio:net, radio:trel
I(509980) OPENTHREAD:[N] MeshForwarder-:     src:[fe80:0:0:0:886b:d45b:78c7:8b91]:19788
I(509990) OPENTHREAD:[N] MeshForwarder-:     dst:[fe80:0:0:0:8c29:1b6b:d127:e2fd]:19788

After I disable the HA OTBR Firewall setting, the error log messages on the Espressif TBR disappear, and I see two-way TREL traffic in the HA OTBR logs:

00:08:11.253 [I] MeshForwarder-: Sent IPv6 UDP msg, len:90, chksum:7949, ecn:no, to:0xec00, sec:yes, prio:low, radio:trel
00:08:11.254 [I] MeshForwarder-:     src:[fda9:d862:9078:1:753d:3132:3cc2:8954]:58914
00:08:11.254 [I] MeshForwarder-:     dst:[fda9:d862:9078:1:d6c4:e184:8639:9a85]:5540
00:08:11.282 [I] MeshForwarder-: Received IPv6 UDP msg, len:82, chksum:9505, ecn:no, from:0xdc00, sec:yes, prio:normal, rss:-20.0, radio:trel
00:08:11.283 [I] MeshForwarder-:     src:[fda9:d862:9078:1:3568:e119:2a4a:b1b]:5540
00:08:11.283 [I] MeshForwarder-:     dst:[fda9:d862:9078:1:753d:3132:3cc2:8954]:58914

Is this the expected behaviour?
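To compare a run with the firewall on against one with it off, the MeshForwarder summary lines can be tallied by radio and outcome. This is only a rough sketch, assuming log lines shaped like the excerpts above; `parse_summary` and `tally` are hypothetical helper names and are not part of OTBR or ESP-IDF.

```python
import re
from collections import Counter

# Matches the summary line of a MeshForwarder entry, for example:
#   "... MeshForwarder-: Failed to send IPv6 UDP msg, len:95, ..., radio:trel"
# The indented src:/dst: continuation lines do not match and are skipped.
SUMMARY = re.compile(
    r"MeshForwarder-:\s+(?P<action>.+?)\s+IPv6\s+(?P<proto>\w+)\s+msg,\s*(?P<fields>.*)$"
)


def parse_summary(line):
    """Return a dict of the key:value fields on a summary line, or None."""
    m = SUMMARY.search(line)
    if m is None:
        return None
    entry = {"action": m.group("action"), "proto": m.group("proto")}
    for item in m.group("fields").split(","):
        key, sep, value = item.strip().partition(":")
        if sep:
            entry[key] = value
    return entry


def tally(lines):
    """Count summary lines per (radio, action) pair."""
    counts = Counter()
    for line in lines:
        entry = parse_summary(line)
        if entry is not None:
            counts[(entry.get("radio", "?"), entry["action"])] += 1
    return counts
```

Feeding it the add-on log from each run and comparing the `("trel", "Sent")` and `("trel", "Received")` counts would show whether TREL traffic really stops, rather than relying on spotting individual lines.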

summer hedge
#

For what it's worth, I have two HA OTBRs running, neither with the firewall on, and I occasionally see these two errors in my logs as well.

latent nebula
wind monolith
#

yeah, i'm mostly just wondering if the firewall feature might be causing some weird network issues here. I've previously run into an odd thing where it seemed to have been the cause for my HA frontend becoming inaccessible

glad pewter
#

The firewall rules are a copy of what the upstream ot-br-posix repo provides (see https://github.com/openthread/ot-br-posix/blob/main/script/otbr-firewall). Afaik it pretty much makes sure that things are limited to the wpan0 interface, so it seems very unlikely to me that other things like frontend can be affected by it. I also would expect that TREL sockets are bound to IPs which are not the wpan0 interface, hence the rules should also not influence that.

glad pewter
#

Yeah, I can't reproduce this with a Google BR and OTBR:

# ot-ctl trel peers
| No  | Ext MAC Address  | Ext PAN Id       | IPv6 Socket Address                              |
+-----+------------------+------------------+--------------------------------------------------+
|   1 | c6746af90a2c7826 | 2f7fe0dc78c98e15 | [2a02:abcd:abcd:10:5ff2:5ed3:cdb1:abcd]:61871    |
|   2 | 62ca8af26f4da7fe | 309f0c24472244ca | [2a02:abcd:abcd:10:0:0:0:dcba]:50297             |
# ot-ctl trel counters
Inbound: Packets 26 Bytes 2531
Outbound: Packets 57 Bytes 5435 Failures 0
#

It seems that the OTBR is picking a link local address, which is a bit odd. Afaik, TREL by the Thread standard should use the most global IPv6 address (although I think they are considering changing that). But in any case, link-local should work too. Are those link-local addresses the address of the OTBR on the backhaul (WiFi/LAN) link? 🤔
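For anyone wanting to check which kind of address each TREL peer was discovered on, the `ot-ctl trel peers` table can be parsed with a few lines of Python. This is a sketch that assumes the table layout shown in the output above; `parse_trel_peers` is a hypothetical helper, not an OTBR tool.

```python
import ipaddress
import re

# One data row of the `trel peers` table, as printed above:
# | No | Ext MAC Address | Ext PAN Id | [IPv6 address]:port |
ROW = re.compile(
    r"^\|\s*\d+\s*"
    r"\|\s*(?P<ext_mac>[0-9a-fA-F]{16})\s*"
    r"\|\s*(?P<xpan>[0-9a-fA-F]{16})\s*"
    r"\|\s*\[(?P<addr>[0-9a-fA-F:]+)\]:(?P<port>\d+)\s*\|$"
)


def parse_trel_peers(output):
    """Return one dict per peer row; header and separator lines are ignored."""
    peers = []
    for line in output.splitlines():
        m = ROW.match(line.strip())
        if m is None:
            continue
        addr = ipaddress.IPv6Address(m.group("addr"))
        peers.append(
            {
                "ext_mac": m.group("ext_mac"),
                "xpan": m.group("xpan"),
                "address": addr,
                "port": int(m.group("port")),
                "link_local": addr.is_link_local,
            }
        )
    return peers
```

A peer whose `link_local` flag is true would be using an fe80::/10 address on the infrastructure link; in the table above both peers are on non-link-local addresses.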

glad pewter
summer hedge
#

11d.21:20:29.046 [N] MeshForwarder-: Failed to send IPv6 UDP msg, len:127, chksum:9d54, ecn:no, to:4a49db7a20f7deb9, sec:no, error:Abort, prio:net, radio:trel
11d.21:20:29.046 [N] MeshForwarder-: src:[fe80:0:0:0:5445:8435:6080:d6c9]:19788
11d.21:20:29.046 [N] MeshForwarder-: dst:[fe80:0:0:0:4849:db7a:20f7:deb9]:19788

#

and

#

11d.21:20:33.010 [N] MeshForwarder-: Dropping (reassembly queue) IPv6 UDP msg, len:467, chksum:e1d7, ecn:no, sec:yes, error:ReassemblyTimeout, prio:normal, rss:-20.0, radio:trel

#

For the sending part, the src link local is that of the sending OTBR's wpan0, and the dst is the remote OTBR's wpan0

#

Yes, you are correct that these are only notices.

wind monolith
# glad pewter It seems that the OTBR is picking a link local address, which is a bit odd. Afai...

The addresses in that log are addresses within the Thread network. I'll have to check what info I can get from the esp-ot-br side, but on the HA side, when it's working, it's discovering ULAs for TREL:
```
[INFO]-MDNS----: DNSServiceGetAddrInfo reply: flags=1073741826, host=esp-ot-br.local., sa_family=2, error=0
[INFO]-MDNS----: Service _trel._udp is resolved successfully: add esp-ot-br host esp-ot-br.local. addresses 2
[INFO]-MDNS----: addresses: [ fdcf:8e77:90bc:20:deda:cff:fe5d:2e6f,fdc5:cbbb:63be:782c:deda:cff:fe5d:2e6f ]
[INFO]-DPROXY--: Service discovered: _trel._udp, instance esp-ot-br hostname esp-ot-br.local. addresses 2 port 12390 priority 0 weight 0
[...]
00:00:00.399 [I] TrelInterface-: Added peer mac:8a6bd45b78c78b91, xpan:c5cbbb63be96782c, [fdc5:cbbb:63be:782c:deda:cff:fe5d:2e6f]:12390
```

#

I wonder if it's mDNS getting interfered with? As best I can tell from the logs, the HA OTBR might not actually be finding the other TBR's TREL service via mDNS: I don't see those log lines following a restart with the firewall enabled.

#

or maybe there's just some weird stuff going on elsewhere and i'm seeing a correlation where there is none :(

glad pewter
#

I mean, the idea of TREL is that it runs on the infrastructure link. If IP addresses of the Thread network are in play as TREL peers, something is wrong. That would be TRET then, Thread Radio Encapsulation over Thread 🙈

wind monolith
#

yeah, it's definitely running over the infrastructure link; the packets inside trel have addresses within the thread network of course.
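One way to see that the fe80 addresses in the "Failed to send" logs belong to the Thread interface rather than the backhaul: each one matches the peer's extended address with the universal/local bit of the first byte flipped, which is how a Thread link-local address is formed from the extended MAC. A small sketch to check this against the values in the logs above; the function names are hypothetical.

```python
import ipaddress

LINK_LOCAL_PREFIX = bytes.fromhex("fe80000000000000")  # fe80::/64


def link_local_from_ext_addr(ext_addr_hex):
    """Build the link-local address implied by an 8-byte extended address."""
    raw = bytearray.fromhex(ext_addr_hex)
    if len(raw) != 8:
        raise ValueError("expected an 8-byte extended address")
    raw[0] ^= 0x02  # invert the universal/local bit to get the interface ID
    return ipaddress.IPv6Address(LINK_LOCAL_PREFIX + bytes(raw))


def describe(addr_text):
    """Roughly classify an address as link-local, unique-local or other."""
    addr = ipaddress.IPv6Address(addr_text)
    if addr.is_link_local:
        return "link-local"
    if addr in ipaddress.IPv6Network("fc00::/7"):
        return "unique-local"
    return "other"


# "to:8e291b6bd127e2fd" in the ESP log pairs with dst fe80:0:0:0:8c29:1b6b:d127:e2fd
print(link_local_from_ext_addr("8e291b6bd127e2fd"))
# The TREL peer socket address from the HA log is a ULA on the infrastructure link
print(describe("fdc5:cbbb:63be:782c:deda:cff:fe5d:2e6f"))
```

The same relationship holds for the other log excerpts in this thread (`4a49db7a20f7deb9` and `8a6bd45b78c78b91`), so the link-local addresses are the inner Thread packet addresses, while the TREL peers themselves are reached over ULAs.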

wind monolith
#

I'm curious about digging into this a bit more - how do i get access to the otbr cli / ot-ctl tool with the haos addon?

glad pewter
wind monolith
#

yeah, just figured out the docker exec. I should switch to using the os level ssh; the virtual machine console has some weirdness with line-wrapping.

#

so, none of the ot-ctl "dns" subcommands are working for me in haos (independent of firewall state)

#

On my ESP32 Thread BR:
```
dns browse _trel._udp.default.service.arpa

DNS browse response for _trel._udp.default.service.arpa.
8e291b6bd127e2fd
Port:41378, Priority:0, Weight:0, TTL:10
Host:homeassistant.default.service.arpa.
HostAddress:fdcf:8e77:90bc:20:0:0:0:735 TTL:10
TXT:[xa=8e291b6bd127e2fd, xp=c5cbbb63be96782c] TTL:10
```

#

but on HAOS OTBR:

# ot-ctl dns browse _trel._udp.default.service.arpa
DNS browse response for _trel._udp.default.service.arpa.
Error 28: ResponseTimeout

#

yet despite that, I'm currently in a state where the HAOS OTBR trel peers list shows my ESP OTBR, but the other way around is not true.

#

really strange.

#

(ah, i think that one might be partly due to a known issue in the esp-idf mdns implementation where subscribing to a service doesn't do an initial query, so it'll only pick up the other services when they announce themselves)

wind monolith
#

alright, pretty sure my initial assumption was wrong, trel definitely works fine with the firewall enabled.