#Ongoing Zigbee issues - At wits end.


livid zodiac
#

Apologies, this is a bit of a long one, as it's been going on for months and we are losing hope.

Been having on-and-off issues with Zigbee that seem to be getting worse as time goes on. The network was in place for over a year and was functioning well; then, in the last 5-6 months and for no obvious reason, it started to misbehave to the point where we are getting desperate.

It started with the occasional delay on an action taking place. For example, sending a light.turn_on could take 3 to 15 seconds to happen. Often the whole network will stop responding to instructions, then execute all the queued-up requests in rapid order (such as turning lights on/off/back on etc.).

The behaviour is somewhat erratic: sometimes the system will work for days/weeks on end without fault, then it will start misbehaving, apparently at random.

We are now seeing odd behaviours like some lights dimming in a loop or flashing/blinking apparently at random (e.g. one bulb in a group of five will just start ramping up from 0 brightness repeatedly). Often it is a different bulb in a different room. None of the bulbs are directly assigned automations or behaviours; those are only assigned to entire groups.

We've had a small number of devices (an Ikea smart plug and a Philips Hue bulb) fall off the network; they had to be reset and paired again. The situation seems to be getting worse.

#

What I've tried

  • Checked for interference / tried other channels
    Used the AP scanning function on my UniFi APs (2 APs in the property, running on Wi-Fi channels 1 and 6) to get channel usage. It shows low utilization/noise on most channels, including Zigbee channel 20 (2450 MHz, which overlaps with Wi-Fi channel 8). This is the only real tool I have to check this.
  • Moving the coordinator to another part of the house, and making sure that it is away from any other electronic device and from the access points
  • Adding an additional router to boost signal on the other side of the house
  • Rebuilding the Zigbee network fully with a factory reset and re-pairing of each device in place
  • Replacing the coordinator, going from an Electrolama to an SLZB-06
  • Replacing Zigbee2MQTT with ZHA
  • Rebuilding Home Assistant fully
  • Moving HA to a more powerful host device
    We thought for a little while that rebuilding HA had helped with the issues, but it is unclear whether this is the case. The new host is usually sitting at around 12% CPU usage and about 10% RAM, so it does not seem to be taxed in any way.
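The channel overlap mentioned in the first bullet can be checked with a little arithmetic. This is a rough sketch, not a site survey: the centre-frequency formulas are the standard 2.4 GHz ones, while the 22 MHz Wi-Fi width and 2 MHz Zigbee width are simplifying assumptions, and the function names are made up for illustration.

```python
# Nominal channel widths: simplifying assumptions, real signals spill further.
WIFI_WIDTH_MHZ = 22.0
ZIGBEE_WIDTH_MHZ = 2.0


def zigbee_centre_mhz(channel: int) -> int:
    """Centre frequency of a 2.4 GHz Zigbee channel (11 to 26)."""
    if not 11 <= channel <= 26:
        raise ValueError("Zigbee 2.4 GHz channels run from 11 to 26")
    return 2405 + 5 * (channel - 11)


def wifi_centre_mhz(channel: int) -> int:
    """Centre frequency of a 2.4 GHz Wi-Fi channel (1 to 13)."""
    if not 1 <= channel <= 13:
        raise ValueError("this sketch only covers Wi-Fi channels 1 to 13")
    return 2407 + 5 * channel


def overlaps(zigbee_channel: int, wifi_channel: int) -> bool:
    """True if the nominal bands of the two channels intersect."""
    gap = abs(zigbee_centre_mhz(zigbee_channel) - wifi_centre_mhz(wifi_channel))
    return gap < (WIFI_WIDTH_MHZ + ZIGBEE_WIDTH_MHZ) / 2


def clear_zigbee_channels(wifi_channels):
    """Zigbee channels whose nominal band misses every listed Wi-Fi channel."""
    return [z for z in range(11, 27)
            if not any(overlaps(z, w) for w in wifi_channels)]


if __name__ == "__main__":
    print(zigbee_centre_mhz(20))          # 2450
    print(overlaps(20, 8))                # True
    print(clear_zigbee_channels([1, 6]))  # [15, 20, 21, 22, 23, 24, 25, 26]
```

With only the two APs on Wi-Fi channels 1 and 6, this nominal model leaves Zigbee 15 and 20 to 26 clear. It says nothing about neighbours' networks or non-Wi-Fi noise, which an AP scan may also miss.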

Looking at the Zigbee network map in ZHA (attached) we can see most of the connecting lines are 'yellow' and occasionally green.

#

(The red device is one of the Ikea Smart-plugs we have not gotten around to re-pairing back to the network)

#

The network is composed of 72 devices, with the following breakdown:

33 Philips Hue bulbs (Mains Powered)
6 Ikea 'Tradfri' bulbs (Mains powered)
6 Philips Hue Remotes (Battery)
2 Ikea Tradfri Remotes (Battery)
5 Xiaomi mmWave motion detectors (Mains powered USB)
5 Xiaomi Aqara door sensors (Battery)

Over 3 floors in a normal sized UK home (i.e., fairly small)

#

I've been collecting logs from HA and the SLZB-06 coordinator, even from just today, but honestly I'm not sure exactly what is relevant. Please find attached some snippets from HA and the coordinator that keep cropping up.

#

HA logs, this error shows up several times a day:

Logger: homeassistant
Source: /usr/src/homeassistant/homeassistant/runner.py:112
First occurred: 03:18:26 (10 occurrences)
Last logged: 13:24:11
Error doing job: Task exception was never retrieved (None)

Traceback (most recent call last):
  File "/usr/local/lib/python3.13/site-packages/zigpy_znp/api.py", line 1120, in request_callback_rsp
    return await callback_rsp
           ^^^^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.13/site-packages/zigpy_znp/api.py", line 1117, in request_callback_rsp
    async with asyncio_timeout(timeout):
               ~~~~~~~~~~~~~~~^^^^^^^^^
  File "/usr/local/lib/python3.13/asyncio/timeouts.py", line 116, in __aexit__
    raise TimeoutError from exc_val
TimeoutError
#

And often this error as well:

2025-05-06 11:01:40.841 ERROR (MainThread) [homeassistant] Error doing job: Task exception was never retrieved (None)
Traceback (most recent call last):
  File "/usr/local/lib/python3.13/site-packages/zigpy_znp/api.py", line 1118, in request_callback_rsp
    await self.request(request, timeout=timeout, **response_params)
  File "/usr/local/lib/python3.13/site-packages/zigpy_znp/api.py", line 1079, in request
    response = await response_future
               ^^^^^^^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.13/site-packages/zigpy_znp/api.py", line 1117, in request_callback_rsp
    async with asyncio_timeout(timeout):
               ~~~~~~~~~~~~~~~^^^^^^^^^
  File "/usr/local/lib/python3.13/asyncio/timeouts.py", line 116, in __aexit__
    raise TimeoutError from exc_val
TimeoutError
2025-05-06 11:01:40.843 ERROR (MainThread) [homeassistant] Error doing job: Task exception was never retrieved (None)
Traceback (most recent call last):
  File "/usr/local/lib/python3.13/site-packages/zigpy_znp/api.py", line 1120, in request_callback_rsp
    return await callback_rsp
           ^^^^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.13/site-packages/zigpy_znp/api.py", line 1117, in request_callback_rsp
    async with asyncio_timeout(timeout):
               ~~~~~~~~~~~~~~~^^^^^^^^^
  File "/usr/local/lib/python3.13/asyncio/timeouts.py", line 116, in __aexit__
    raise TimeoutError from exc_val
TimeoutError
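For context on the tracebacks above: a CancelledError followed by a TimeoutError is the shape Python's asyncio produces when a wait with a timeout expires. Here that suggests zigpy_znp sent a request and the awaited response did not arrive in time. The sketch below reproduces the pattern in isolation. It uses asyncio.wait_for as a stand-in for the asyncio_timeout context manager seen in the traceback, and wait_for_reply plus the 0.05 s timeout are purely illustrative.

```python
import asyncio


async def wait_for_reply(reply: asyncio.Future, timeout: float) -> str:
    """Wait for a reply that may never arrive, as a radio request might."""
    try:
        await asyncio.wait_for(reply, timeout)
        return "reply received"
    except asyncio.TimeoutError as exc:
        # The pending wait is cancelled internally and the cancellation is
        # re-raised as a timeout, so both exceptions appear in a traceback.
        return f"timed out (cause: {type(exc.__cause__).__name__})"


async def main() -> str:
    # A future that nobody resolves: stands in for a response the
    # coordinator never sends back.
    reply = asyncio.get_running_loop().create_future()
    return await wait_for_reply(reply, timeout=0.05)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The error on its own does not say why the reply was missing; a busy radio, a dropped serial or network link to the coordinator, or an unreachable device could all look the same from this side.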

#

We get a lot of these errors in the log on the SLZB-06, but the timestamps don't seem to line up with the errors in HA, so we don't know if they are related or not:

[10:31:04] zb_packet | wrong paket len: 2 expected: 4
[10:41:43] zb_packet | wrong paket len: 2 expected: 4
[10:52:29] zb_packet | wrong paket len: 57 expected: 202
[11:21:13] zbSelfOta | Heap: 2684
[11:29:47] zb_packet | wrong paket len: 5 expected: 70
[11:39:45] zb_packet | wrong paket len: 5 expected: 49
[12:21:13] zbSelfOta | Heap: 2684
[12:23:59] zb_packet | wrong paket len: 2 expected: 4
[12:40:58] zb_packet | wrong paket len: 2 expected: 4
[13:05:23] zb_packet | wrong paket len: 2 expected: 4
[13:05:23] zb_packet | wrong paket len: 13 expected: 14
[13:21:13] zbSelfOta | Heap: 2684
[13:24:45] zb_packet | wrong paket len: 5 expected: 164
[13:39:03] zb_packet | wrong paket len: 2 expected: 4
[13:40:02] zb_packet | wrong paket len: 5 expected: 176
[13:48:00] zb_packet | wrong paket len: 43 expected: 198
[14:21:13] zbSelfOta | Heap: 2684
[14:56:40] zb_packet | wrong paket len: 5 expected: 43
[15:21:13] zbSelfOta | Heap: 2684
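Whether the two logs line up can be checked mechanically rather than by eye. The sketch below parses a few of the coordinator lines quoted above and measures how far each HA error time is from the nearest "wrong paket len" entry. It assumes both clocks are in sync, in the same timezone and on the same day; the function names and the 60-second window are illustrative choices.

```python
import re
from datetime import datetime, timedelta

# A few lines copied from the SLZB-06 log in the post.
COORDINATOR_LOG = """\
[10:31:04] zb_packet | wrong paket len: 2 expected: 4
[10:41:43] zb_packet | wrong paket len: 2 expected: 4
[10:52:29] zb_packet | wrong paket len: 57 expected: 202
[11:21:13] zbSelfOta | Heap: 2684
[11:29:47] zb_packet | wrong paket len: 5 expected: 70
[13:24:45] zb_packet | wrong paket len: 5 expected: 164
"""

LINE_RE = re.compile(
    r"\[(\d{2}:\d{2}:\d{2})\] zb_packet \| wrong paket len: (\d+) expected: (\d+)"
)


def parse_short_packets(log_text):
    """Return (time, got, expected) for every 'wrong paket len' line."""
    events = []
    for match in LINE_RE.finditer(log_text):
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        events.append((stamp, int(match.group(2)), int(match.group(3))))
    return events


def nearest_gap(ha_time, events):
    """Smallest time difference between an HA error and any parsed event."""
    target = datetime.strptime(ha_time, "%H:%M:%S")
    return min(abs(target - stamp) for stamp, _, _ in events)


def coincides(ha_time, events, window=timedelta(seconds=60)):
    """True if a coordinator event falls within `window` of the HA error."""
    return nearest_gap(ha_time, events) <= window


if __name__ == "__main__":
    events = parse_short_packets(COORDINATOR_LOG)
    # HA error times taken from the post: 11:01:40 and 13:24:11.
    for ha_time in ("11:01:40", "13:24:11"):
        print(ha_time, nearest_gap(ha_time, events), coincides(ha_time, events))
```

Run over the full logs, this gives a count of how many HA timeouts have a coordinator entry close by, which is a firmer basis than spot checks for deciding whether the two are related.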

#

And finally, a (massive?) Debug logfile when we managed to capture the errors happening

#

God I hope someone can make more sense of this than I can!

proven fossil
#

One thing that gets some people is they have these:

33 Philips Hue bulbs (Mains Powered)
6 Ikea 'Tradfri' bulbs (Mains powered)

#

And also a dumb switch on them that removes the power. They need to be always powered on if they are also routing devices.
I don't see anything obvious otherwise.

#

Personally I would have a wall wart or two in each room, whether you use them or not, to act as always-on routing devices.

livid zodiac
#

Hi, so these are always powered. The light switches are actually blocked off so they can't be turned off by accident, so no concern there. Most of the devices are bulbs and, given they are always on, we didn't worry too much about having wall-wart style devices. A lot of the smart plugs I use are ESPHome devices which work fine, so I'd prefer not to have to buy a bunch of Zigbee plugs if I can help it.

runic oxide
#

I know 72 devices makes this extremely difficult to do, but have you considered going all the way out to Zigbee channel 25?

#

The only other thing I can think of is that some of those bulbs are just really poor routers, or that you're getting interference. Maybe the UniFi stuff isn't sensitive enough to pick it up if you're getting packet errors.

livid zodiac
#

I've seen this suggested, and... aye, it's a lot of work to change the channel up to 25, but at this point I'm willing to put up with the pain to see if it makes any difference.

#

I'm just trying to figure out if there's anything else a little more 'obviously' wrong that won't require most of a day to go around resetting everything 😄

livid zodiac
#

I've written a small script that pulls the 'diagnostic data' from ZHA every 30 minutes and rips out the energy_signal fields. At least this way I can see what the Zigbee channels look like and whether that's an issue. Weird you can't just... have HA do that out of the box, but 🤷

livid zodiac
#

Well, after leaving the script running for a few days, it's collected 1200 scan results, and on average it seems to show that channel 24 is the 'quietest', so... will give that a try I guess 😄
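The script itself was not posted. As a rough sketch of the averaging step only: the code below assumes each scan has already been reduced to a {channel: energy} dict, which is an illustrative layout and not the actual structure of the ZHA diagnostics file. The function names and the sample numbers are made up.

```python
from collections import defaultdict
from statistics import mean


def average_energy(scans):
    """Average the energy reading per channel over many scans.

    `scans` is a list of {channel: energy} dicts. This layout is an
    assumption for illustration, not the real ZHA diagnostics format.
    """
    readings = defaultdict(list)
    for scan in scans:
        for channel, energy in scan.items():
            readings[int(channel)].append(float(energy))
    return {channel: mean(values) for channel, values in sorted(readings.items())}


def quietest_channel(scans):
    """Channel with the lowest average energy across all scans."""
    averages = average_energy(scans)
    return min(averages, key=averages.get)


if __name__ == "__main__":
    # Made-up readings, only to show the call shape.
    sample_scans = [
        {"11": 60, "24": 10, "25": 20},
        {"11": 70, "24": 30, "25": 15},
    ]
    print(average_energy(sample_scans))
    print(quietest_channel(sample_scans))
```

An average can hide short bursts of interference, so looking at the maximum or a high percentile per channel alongside the mean may give a fuller picture.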

livid zodiac
#

Well..... that went to total and utter shit.. 😦

#

Changed the channel to 25 yesterday. It didn't seem to make any noticeable difference, so I figured I'd just leave it to sort itself out. This morning the entire Zigbee network is unresponsive, and HA/ZHA is no longer talking to the coordinator (the integration is marked as failed/unavailable). If I try to go into the settings, it just gives me an option to either 'reconfigure the existing network' or 'create a new network'.

#

Looking at the logs, it looks like it just fell over at about 1am. There is a single message about the 'ZHA backups not matching the current network settings', then several watchdog failures and 'No gateway object exists' errors.

#

Rebooting HA / the coordinator did nada, and if I select 'reconfigure the existing network' (it's already totally hosed at this point) it just fails anyway with an 'unable to connect to the coordinator' error lol

little sky
#

Are you still on ZHA or back on Z2M? I have similar symptoms (10+ s to answer a command), but in my case it seems to come from a weird slowness in Z2M. I've posted something here; I don't know if your issue is the same: #1420049437908009130 message

spiral spindle
#

No, never made it over to Z2M. I tried to update the firmware on the POPP ZB shield using the Elelabs_EzspFwUtility, but I think this must have bricked it; maybe I flashed the wrong file. But a brand new RaspBeeII does not respond either.
Not sure how to verify the hat/shield is getting any power?
The XZG MT add-on seems to do a thorough job of checking/scanning all possibilities, but even that fails at every turn.

#

...just downgraded to OS 16.0 and then further down to 15.2, but no joy bringing it back to life