#IKEA device not responding to checkin() then zigbee network going down

zenith sleet
#

I have an issue where my zigbee network becomes unusable.
After troubleshooting, it seems that sometimes, an ikea device will not send the PollControl:Default_response() to a PollControl:checkin() command.
This makes zha retry the checkin() command several times.

Unfortunately, this means that the ikea device enters fast poll mode and keeps sending requests every second. This floods the (already very busy) zigbee network and everything falls apart.
Moreover, fast_poll_stop is never sent to the ikea device because the response wasn't received, so there is no chance that the problem stops on its own.
The only way to fix this issue is to identify the device and power cycle it.

One workaround I've found on the internet is to completely disable the PollControl cluster for those.
I tried to manually send fast_poll_stop from the web interface but I only had the BUSY error.

It seems that IKEA devices have some quirks for the checkin command. Shouldn't they simply be ignored?
https://github.com/zigpy/zha/blob/61ec3039f1323190b405f9bbb100c7f2a7520c19/zha/zigbee/cluster_handlers/general.py#L598-L607

Thanks

lilac cedar
#

Can you post the ZHA debug log?

zenith sleet
#

offending device NWK is 0xE5F3

#
2024-10-07 14:24:52.641 DEBUG (MainThread) [zigpy.zcl] [0xE5F3:1:0x0020] Decoded ZCL frame: PollControl:Default_Response(command_id=1, status=<Status.SUCCESS: 0>)
#

that one went ok

#

but things went downhill later

#

for this one I never get the Default_Response, and fast_poll_stop is never sent because it's stuck on the await 3 lines above

#

it happened twice with an ikea on/off remote and once with a motion sensor

#

but my analysis may be totally off 😛

lilac cedar
#

We're working on a fix for this

#

That will hopefully isolate the problems to a single device instead of affecting the whole network

#

That being said, the BUSY error sounds a little suspicious. Do you have any Zigbee groups in use or any sort of components that send out commands to large swaths of the network?

zenith sleet
#

nope, nothing fancy

#

just 84 devices 😅

#

the busy only happens when there is one device flooding the network

#

it also happens with some of my TRVs when they somehow send a lot of mac_poll messages

#

this only happens when they are using the deconz as a gateway directly. if they go through any other mains-powered device, I don't have the problem

#

I wish I had some way to measure the zigbee bandwidth being used
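
One rough proxy, short of a real airtime measurement, is to count how many ZHA debug log lines each device produces. The sketch below assumes log lines shaped like the ones quoted in this thread; the `count_lines_per_device` helper and the sample lines are illustrative, and logged lines are not the same thing as radio bandwidth.

```python
import re
from collections import Counter

# Matches the device tag in zigpy debug lines such as
# "... [zigpy.zcl] [0xE5F3:1:0x0020] Decoded ZCL frame: ..."
# or "... [zigpy.zdo] [0xd731:zdo] ZDO request ..."
DEVICE_TAG = re.compile(r"\[zigpy\.\w+\] \[(0x[0-9A-Fa-f]{4}):")


def count_lines_per_device(lines):
    """Return a Counter of log lines per NWK address (lowercased)."""
    counts = Counter()
    for line in lines:
        match = DEVICE_TAG.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


# Illustrative sample lines in the format seen in this thread (payloads elided).
sample = [
    "2024-10-07 14:24:52.641 DEBUG (MainThread) [zigpy.zcl] [0xE5F3:1:0x0020] Decoded ZCL frame: ...",
    "2024-10-11 23:42:32.635 DEBUG (MainThread) [zigpy.zdo] [0xd731:zdo] ZDO request ZDOCmd.NWK_addr_req: ...",
    "2024-10-11 23:42:35.635 DEBUG (MainThread) [zigpy.zdo] [0xd731:zdo] ZDO request ZDOCmd.NWK_addr_req: ...",
]

# Print the noisiest devices first.
for nwk, n in count_lines_per_device(sample).most_common():
    print(nwk, n)
```

Feeding it an open log file instead of `sample` works the same way, since it only iterates over lines. A device that suddenly dominates the count is a candidate for the one flooding the network.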

#

can you please point me to the PR/issue for this fix btw?

lilac cedar
zenith sleet
#

something is definitely amiss: after that failed checkin, that device starts sending ZDO request ZDOCmd.NWK_addr_req: [00:21:2e:ff:ff:08:02:16, <AddrRequestType.Single: 0>, 0] every 3 seconds and never stops. it seems to be the root cause of the zigbee network going down

#

looking at your patch, it seems that you will be delaying sending packets per device. how will this help in the situation where a device constantly sends packets ?

#

well I guess you won't be sending answers to those packets so it might help a bit
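
To make the "delaying packets per device" idea concrete, here is a minimal sketch of per-device throttling of outgoing requests. It is not the patch discussed here: `PerDeviceLimiter` and its `send` method are hypothetical names, and the approach (one `asyncio.Semaphore` per NWK address) is only an assumption about what such a limit could look like. It caps what the coordinator sends to a device; it does nothing about what the device itself transmits.

```python
import asyncio
from collections import defaultdict


class PerDeviceLimiter:
    """Hypothetical helper: at most `limit` in-flight requests per device."""

    def __init__(self, limit=1):
        self._limit = limit
        self._semaphores = {}

    def _semaphore(self, nwk):
        # Lazily create one semaphore per device address.
        if nwk not in self._semaphores:
            self._semaphores[nwk] = asyncio.Semaphore(self._limit)
        return self._semaphores[nwk]

    async def send(self, nwk, request):
        # Requests to the same device queue up; other devices are unaffected.
        async with self._semaphore(nwk):
            return await request()


async def main():
    limiter = PerDeviceLimiter(limit=1)
    in_flight = defaultdict(int)
    peak = defaultdict(int)

    def make_request(nwk):
        async def request():
            # Track how many requests are concurrently active per device.
            in_flight[nwk] += 1
            peak[nwk] = max(peak[nwk], in_flight[nwk])
            await asyncio.sleep(0.01)  # stand-in for radio round trip
            in_flight[nwk] -= 1
        return request

    await asyncio.gather(
        *(limiter.send("0xe5f3", make_request("0xe5f3")) for _ in range(5)),
        *(limiter.send("0xd731", make_request("0xd731")) for _ in range(2)),
    )
    return dict(peak)


peak = asyncio.run(main())
print(peak)
```

With `limit=1`, the peak concurrency per device never exceeds one, so a single misbehaving device can hold up only its own queue of replies rather than everyone's.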

#

2024-10-09 06:13:38.166 ERROR (MainThread) [zigpy.zcl] [0xE5F3:1:0x0020] Traceback (most recent call last):

here you can see the failure for checkin_response so fast_poll_stop is never sent

dense cloak
#

We probably should ensure that stop can run if long poll fails regardless. I say PR it and we can discuss any details there
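
A minimal sketch of that idea, assuming nothing about the real handler beyond what is described in this thread: `FakePollControl` is a hypothetical stand-in (not the zigpy or ZHA API) whose `checkin_response` times out, and `handle_checkin` still attempts `fast_poll_stop` afterwards.

```python
import asyncio


class FakePollControl:
    """Hypothetical stand-in for a PollControl cluster; not the zigpy API."""

    def __init__(self, fail_checkin):
        self.fail_checkin = fail_checkin
        self.sent = []

    async def checkin_response(self):
        self.sent.append("checkin_response")
        if self.fail_checkin:
            # Simulate the device never acknowledging the response.
            raise asyncio.TimeoutError("no Default_Response received")

    async def fast_poll_stop(self):
        self.sent.append("fast_poll_stop")


async def handle_checkin(cluster):
    try:
        await cluster.checkin_response()
    except asyncio.TimeoutError:
        # Swallow the failure so it does not block the stop command below.
        pass
    finally:
        # Always attempt to end fast poll mode, even if the response failed.
        await cluster.fast_poll_stop()


cluster = FakePollControl(fail_checkin=True)
asyncio.run(handle_checkin(cluster))
print(cluster.sent)  # ['checkin_response', 'fast_poll_stop']
```

Whether a device that ignored the checkin response would actually receive the stop command is a separate question; the sketch only shows that the failure of one step need not prevent the next from being attempted.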

zenith sleet
lilac cedar
#

Also, have you recently migrated your Zigbee network between adapters?

zenith sleet
#

e5f3 is the ikea motion sensor

#

I didn't migrate anything, I've had that zigbee network for 2 years

#

I've had a similar issue with a 2 button remote

#

but I don't have the logs handy

zenith sleet
#

note that apparently for ikea devices, the fast poll thingy is skipped

zenith sleet
#

meh I just had the issue with yet another IKEA device

#

maybe it would be easier to ignore the pollcontrol cluster on those

#

I don't understand why those devices start sending this constantly:

2024-10-11 23:42:32.635 DEBUG (MainThread) [zigpy.zdo] [0xd731:zdo] ZDO request ZDOCmd.NWK_addr_req: [00:21:2e:ff:ff:08:02:16, <AddrRequestType.Single: 0>, 0]

#

this time it was from an ikea scene button

#

what does this command even do ?

#

also why is checkin not being delivered at all? it doesn't make sense. it feels more like an ikea bug

#

but I've had those devices for a while and I don't understand why they are starting to fail now. was checkin ignored before for ikea devices ?

zenith sleet
#

yes, could that be the cause ?

lilac cedar
#

I believe devices use this packet to discover routes

zenith sleet
#

mhh I disabled it but I still see the issue

#

but maybe it takes a while for source_routing to be disabled

lilac cedar
#

You may have to reboot the routers. It's not something that's really ever supposed to change. But that being said, the routing traffic will happen no matter what, with source routing you're just seeing it in a form that's visible to ZHA.