Excessive Logging 'Data received on an unknown session. Dropping it!' | Home Assistant | Page 1

alpine bobcat May 30, 2024, 9:58 AM

#

Hi @manic moth and @nimble ridge,

regarding CHIP_ERROR Data received on an unknown session. Dropping it

Yesterday evening I restarted my Matter Server addon. Round about 5 minutes later my 55 Matter over Thread devices were available in Home Assistant. Another 5 minutes later the dropping messages began.

Here is a my log file. I see this dropping logs never ending: https://dpaste.org/MnkNh
Here is an overview of all my devices (brand/model) and their node ids (sorted by node id): https://dpaste.org/Ri5kU

As already said. I see these messages since 12 May. First thought was the Apple tvOS/HomePodOS 17.5 update. But that was launched on 13 May. In my Home Assistant backups I see the following:

addon_core_matter_server_5.5.1 - 3 Mai 2024
core_2024.5.2 - 11 Mai 2024
core 2024.5.4 and addon_core_matter_server_6.0.0 - 20 Mai 2024

So, also no direct indication that a HA Core or a Matter server update introduced that logs. My Matter system works really stable. So, its not a major issue for me. But it makes it hard to read the logs.

Any idea about this?

manic moth May 30, 2024, 12:30 PM

#

Just checked, I don't have any such logs in my logs, and I have a couple of Eve devices in my production systems (2x Motion sensor, 1x Blind, 2x Energy Plug). It sounds a bit like a currupted/incomplete message got received? 🤔 Or maybe deleyed received? 🤔

Given the Apple update is closeset in date, maybe it is still that? While it was launched on May 13, maybe rollout started a bit earlier? 🤔

#

The error comes from this line of code:
https://github.com/project-chip/connectedhomeip/blob/v1.3.0.0/src/transport/SessionManager.cpp#L722

It seems that the server just can't find a session for that device at that point in time.

What I would try here is enabling more logs. You should find more information about the session in the logs when searching from the error upwards for the session ID (LSID). Maybe it got closed? But why...? Now in a production setup of your size this is really hard if you enable SDK debug log, things get completely overwhelming. What I would suggest to do is to isolate one of the problematic devices on a separate HA test installation, and then enable full logs there. Thanks to Matter multi-admin this should be fairly straight forward 😎

alpine bobcat May 30, 2024, 9:40 PM

#

Thanks for checking. No, the latest Apple Release was on Monday (Mai 13). It’s typically on Tuesday. 😉

What I don’t understand is why only battery powered devices show this behavior? Where is the difference between a battery and a mains powered device in regards to the session?

Thanks for the suggestion. When I find the time, I will setup a vm for testing purposes.

Thanks again, I will keep you updated.

pine shoal May 31, 2024, 7:04 AM

#

I see the same error messages since re-adding my Eve Energy Plugs yesterday evening. I'm on the latest beta and it started after commisioning the 5th or 6th plug out of 8.

#

https://dpaste.org/zSmoN

#

I had removed the plugs for the last days because of massive thread network instability with my Apple TV's 4K (2nd gen) after tvOS 17.5

#

Node 33 - 41 are Eve Motionblinds

pine shoal May 31, 2024, 7:47 AM

#

I've removed the last four Eve Energy Plugs (out of 8 total) a few minutes ago and the error messages are gone. 🧐

#

https://dpaste.org/kKQu4

nimble ridge May 31, 2024, 7:53 AM

#

Maybe the polling of the energy attributes has anything to do with it.
In that case you maybe want to switch to HA beta 2024.6 and the latest matter server beta to see if that helps.

pine shoal May 31, 2024, 7:55 AM

#

I'm on the latest Matter Server 6.1.0 BETA 1 and HA 2024.6.0b3

#

I've never seen the error messages on the stable beside the instability problem with my Apple TV's and tvOS 17.5(.1)

nimble ridge May 31, 2024, 7:57 AM

#

pine shoal I'm on the latest Matter Server 6.1.0 BETA 1 and HA 2024.6.0b3

OK, in that case its probably not related as @alpine bobcat above is on stable.

pine shoal May 31, 2024, 8:03 AM

#

In my case it seems that also only battery powered Eve Motionblinds are re-subscribing after these messages and maybe caused by the Eve Energy Plugs.

nimble ridge May 31, 2024, 8:04 AM

#

Yeah, I have a feeling the Eve Energy plugs are causing the issue. Did they receive an update recently ?

pine shoal May 31, 2024, 8:04 AM

#

No update recently. They are still on 3.2.1.

#

My Eve Motionblinds are mostly stable on re-subscription. I only see it sometimes after closing or opening all 9 blinds together. I think it's because they all send position updates every second at the same time.

pine shoal May 31, 2024, 8:26 AM

#

@nimble ridge Can I stop polling the energy attributes in the latest beta if I disable the relevant entities in HA?

nimble ridge May 31, 2024, 8:27 AM

#

pine shoal <@461603794090852353> Can I stop polling the energy attributes in the latest bet...

No, and there is no (more) need to do so - we have optimized all that

alpine bobcat May 31, 2024, 8:58 AM

#

nimble ridge OK, in that case its probably not related as <@1005545243824885782> above is on...

Yes, I am using all the latest stable version.

alpine bobcat May 31, 2024, 9:06 AM

#

pine shoal No update recently. They are still on 3.2.1.

Since the Nanoleaf firmware update 3.6.173 (Matter 1.2) I have the feeling that my Nanoleaf bulbs are more stable then my EVE Energies. But it's really not unstable. Everything works fine. Throughout the day I see 2 to 5 short disconnects, mostly from the EVE Energies. Nanoleaf bulbs have have maybe 1 to 3 short disconnects per week.

#

My setup is:

Thread Border Routers:

2 hardwired AppleTV 4K 3rd Gen
4 HomePod Minis
1 HomePod v2

#

Devices:

14 EVE Energies
9 EVE Thermos
7 EVE Door & Window
3 EVE Motion
1 EVE MotionBlinds
11 Nanoleaf A19-E27
10 Nanoleaf GU10

#

Yes, I see the 'dropping it' logs, but I do not see any issue in practice. However, I also have the feeling, that tvOS/HomePodOS 17.5.1 made my setup a bit more prone to the small disconnects. These disconnects vary from 5 seconds to 3-5 minutes. But the devices always come back by itself.

alpine bobcat May 31, 2024, 9:14 AM

#

nimble ridge No, and there is no (more) need to do so - we have optimized all that

Shall I update to all the latest beta for testing purposes? @pine shoal has the 'dropping it' logs, even though he uses the latest beta of HA and Matter. I am unsure, when I find the time to setup a new HA instance to bind one or more of my battery powered devices to that Matter fabric and enabling more logs. This will take some time.

nimble ridge May 31, 2024, 9:18 AM

#

Most likely this is just something that is now being logged (and was hidden before) so if you dont experience any stability issues I wouldnt care too much, no rush. Still it would be interesting to see if you can reproduce it on an isolated install like Stefan suggested. But like I said, ro rush.

#

Maybe/hopefully we will be able to reproduce it as well at some point (until now I cant)

alpine bobcat May 31, 2024, 9:35 AM

#

pine shoal I've never seen the error messages on the stable beside the instability problem ...

Which date did you the messages the very first time? You can grep through your logs via ssh with the following command.

ha host logs --identifier addon_core_matter_server -n 100000 | grep "Dropping it!"

pine shoal May 31, 2024, 9:36 AM

#

alpine bobcat Which date did you the messages the very first time? You can grep through your l...

Yesterday evening after re-adding my last 4 plugs. See the first log.

alpine bobcat May 31, 2024, 9:47 AM

#

OK, I grep'ed through my logs and brought the data to an Excel pivottable. Here is the result:

#

2024-05-15        36
2024-05-19        1
2024-05-20        7
2024-05-21        1
2024-05-23        48
2024-05-24        11
2024-05-26        31
2024-05-27        1634
2024-05-28        9500
2024-05-29        10413
2024-05-30        10558
2024-05-31        5356
overall result    37596```

#

Crazy, wtf... whats going on? 😄

#

Sadly I do not get back to May 12, even though I brought the lines in the command above to 10000000.

pine shoal May 31, 2024, 9:51 AM

#

When did you update to tvOS 17.5.1?

#

I think it was released on May 22.

alpine bobcat May 31, 2024, 9:51 AM

#

I think on May 13. As soon as it was available.

#

Google says May 13.

pine shoal May 31, 2024, 9:52 AM

#

I mean 17.5.1 not 17.5

alpine bobcat May 31, 2024, 9:52 AM

#

Ah sorry, you mean 17.5.1. I meant 17.5.

#

Yeah, I don't know. Some days ago.

#

😉

#

WOW 10000 dropping logs a day.

#

My 'dropping it' logs started at May 12, so one day before tvOS 17.5 release. Maybe I misread it yesterday.

pine shoal May 31, 2024, 9:57 AM

#

I had massive instability problems since tvOS 17.5 with my ATV's 4K 2nd gen so I switched the active home hub to a homepod mini. After Apple released 17.5.1 only for ATV's it became a mandatory active home hub. I changed to 3nd gen ATV's this week and my network is now stable back again. 🤷‍♂️

alpine bobcat May 31, 2024, 9:57 AM

#

@nimble ridge Any idea how to get into the logfile, to see the first log line? I am firm with vi or nano, but I am unsure where to search for matter server logfile on HAOS.

pine shoal May 31, 2024, 10:16 AM

#

@nimble ridge I'm trying to reproduce the error messages so I removed all of my Eve Energy Plugs from HA and started to re-add them. Now I get a Traceback in _custom_attributes_poller on every plug while commissioning them.

#

https://dpaste.org/HMF7U

nimble ridge May 31, 2024, 11:16 AM

#

pine shoal <@461603794090852353> I'm trying to reproduce the error messages so I removed al...

Ah, that is a small oversight (race condition) if you start removing nodes that are part as polled.
Will fix that soon. Just restart the server to get rid of the error now

manic moth May 31, 2024, 4:35 PM

#

alpine bobcat <@461603794090852353> Any idea how to get into the logfile, to see the first log...

Add-on logs are stored in systemd-journald on the host OS. It is automatically dropping the last logs. If you don't get more when increasing the number of lines, then you already get the oldest line stored 😢

alpine bobcat May 31, 2024, 4:46 PM

#

Ok, thanks for the information. Have a nice weekend

pine shoal Jun 5, 2024, 3:46 PM

#

After updating the Matter Server to 6.1.0 I've readded my Eve Energy Plugs to HA (nodes 109, 111, 112, 113, 114, 115, 116, 117). After adding the fifth plug (node 114) the messages "Data received on an unknown session. Dropping it!" started. I think it is not related to specific nodes because I removed nodes 112, 113, 116 & 117 again. After reducing back to 4 Eve Energies the messages stopped except the 4 lines at 17:12 o'clock.

https://dpaste.org/85QEo

manic moth Jun 5, 2024, 4:04 PM

#

Is that with a 2024.6.0bx Core version?

pine shoal Jun 5, 2024, 4:17 PM

#

2024.6.0b8

alpine bobcat Jun 24, 2024, 4:54 PM

#

@pine shoal You are able to reproduce the issue easily as it seems. Can you enable more logs how @manic moth suggested and post the logs?

#1245487486600482816 message

In your case you maybe do not need to pair it to another fresh HA instance. How many devices do you have?

I still see a lot of droppings and it’s impossible to read my logfile

pine shoal Jun 24, 2024, 9:46 PM

#

alpine bobcat <@543796291583606824> You are able to reproduce the issue easily as it seems. Ca...

My setup is quite big with 63 Thread devices (including the 8 Eve Energy‘s). Without being able to filter the debug logging on devices I can confirm it’s almost impossible to read the logs. 😉

#

Atm I‘ve removed my Eve Energy’s from HA.

alpine bobcat Jun 25, 2024, 8:05 PM

#

pine shoal My setup is quite big with 63 Thread devices (including the 8 Eve Energy‘s). Wit...

Ok, I have 61 devices. 😅

#

@manic moth Is it possible to hide, reduce or disable this log?

manic moth Jun 25, 2024, 8:20 PM

#

I mean, there are log levels. What log are you looking to remove?

alpine bobcat Jun 25, 2024, 8:24 PM

#

The ’dropping it‘ logs:

2024-06-25 22:12:04.806 (Dummy-2) CHIP_ERROR [chip.native.IN] Data received on an unknown session (LSID=39581). Dropping it!

#

——-
Matter server log level: Info
Matter SDK log level: Error

#

What would be the correct settings to hide these logs?

manic moth Jun 25, 2024, 8:28 PM

#

Hm, I guess "Matter SDK Log Level" to None should make this go away. But error logs should be left enabled, just in case 😅

alpine bobcat Jun 25, 2024, 8:39 PM

#

😃 Yeah, that’s not what I want. The question was if it is somehow possible for you to disable this log explicitly? I do not see any issues in practice. But I can’t read my logfile anymore, because of all the ’dropping it‘ messages. I do not have the time for any testing at the moment. Too many hobbies and it’s summer… 😃 But I really want get back to a readable logfile.

manic moth Jun 25, 2024, 8:43 PM

#

I mean, we could patch it out in the SDK source, but not sure if that is a good idea.

#

I really wonder what is going on here, why only Eve are affected, and what change caused it 🤔

alpine bobcat Jun 25, 2024, 8:48 PM

#

manic moth I really wonder what is going on here, why only Eve are affected, and what chang...

Yeah, that is what I also don’t understand. The issue came up near Apple OS 17.5 release. Maybe this solves itself with iOS 17.6. But we have to wait for it.

alpine bobcat Jun 25, 2024, 8:51 PM

#

manic moth I mean, we could patch it out in the SDK source, but not sure if that is a good ...

I think we do not need to do it right now. Let’s wait until the next Apple OS release 17.6 and then decide, if it still occurs.

manic moth Jun 25, 2024, 8:52 PM

#

I've grepped through my logs, and I had that error once the last few days, but this was with a Nanoleaf bulb 🤔 Seems more coincidence

#

I do have two Eve Smart Plugs. But those are not connected through a Apple BR.

#

There is a 6.2.0b2 release, did you try it? I don't think it will make a difference for that particular issue though, the underlying SDK is largely the same.

alpine bobcat Jun 25, 2024, 8:53 PM

#

manic moth I've grepped through my logs, and I had that error once the last few days, but t...

Remember my pivot table:

#1245487486600482816 message

alpine bobcat Jun 25, 2024, 8:53 PM

#

manic moth There is a 6.2.0b2 release, did you try it? I don't think it will make a differe...

No, I didn’t try it.

#

But I can do it.

pine shoal Jun 25, 2024, 9:07 PM

#

manic moth I mean, we could patch it out in the SDK source, but not sure if that is a good ...

Patching it out isn’t a good idea. Besides these messages my battery powered Eve MotionBlinds start to resubscibe. After removing the Eve Energy’s it goes back to normal.

manic moth Jun 25, 2024, 9:09 PM

#

pine shoal Patching it out isn’t a good idea. Besides these messages my battery powered Eve...

This does sound like a network overload issue 🤔

pine shoal Jun 25, 2024, 9:18 PM

#

manic moth This does sound like a network overload issue 🤔

I don’t think so. My devices are paired to Apple Home and HA and are working fine if I leave the Eve Energy’s only paired with Apple Home. I also tried HA as the only controller to all devices and got the same issues after pairing the Eve Energy’s. 🙁

alpine bobcat Jun 25, 2024, 9:18 PM

#

Downloading python_matter_server-6.2.0b1-py3-none-any.whl.metadata

#

Ok, I also have 60 Matter over Thread devices and really do not recognize any issues, besides the dropping logs.

manic moth Jun 25, 2024, 9:23 PM

#

pine shoal I don’t think so. My devices are paired to Apple Home and HA and are working fin...

Is the CPU load high caused by the Matter Server?

pine shoal Jun 25, 2024, 9:23 PM

#

alpine bobcat Ok, I also have 60 Matter over Thread devices and really do not recognize any is...

If I remember correctly you said that your battery powered Eve devices also resubscribe after this messages? 🤔

manic moth Jun 25, 2024, 9:23 PM

#

Sounds like some type of lock contention or deadlock then, if not on the network layer

alpine bobcat Jun 25, 2024, 9:28 PM

#

The dropping it logs are still there with the latest Matter server beta.

pine shoal Jun 25, 2024, 9:28 PM

#

manic moth Is the CPU load high caused by the Matter Server?

I think that CPU load was normal but I can check this on Thursday evening again.

#

Maybe it’s an issue on Apple side. My HomePod Minis caused instability issues with 17.5 after changing the AP. I‘m now on the 17.6 beta since Sunday and it seems to run better.

alpine bobcat Jun 25, 2024, 9:33 PM

#

pine shoal I think that CPU load was normal but I can check this on Thursday evening again.

No need to do it: https://imgur.com/a/ssuwHNe cpu/ram looks fine on a HA Yellow.

pine shoal Jun 25, 2024, 9:34 PM

#

I‘m also using a Yellow

alpine bobcat Jun 25, 2024, 9:37 PM

#

Here is my latest logfile: https://dpaste.org/4KAxP

alpine bobcat Jun 25, 2024, 9:40 PM

#

pine shoal I‘m also using a Yellow

We have a very similiar setup: HA Yellow, Unifi network, Apple TBRs only, round about 60 MoT devices

#

HA Yellow Thread radio is disabled here.

pine shoal Jun 25, 2024, 9:43 PM

#

alpine bobcat Here is my latest logfile: https://dpaste.org/4KAxP

Your log also shows resubscription messages from battery powered Eve devices.

alpine bobcat Jun 25, 2024, 9:44 PM

#

I am looking which are re-subscribing, give me some minutes.

#

Everything starts here with node 16. Its an EVE Energy:

2024-06-25 23:18:04.415 (MainThread) INFO [matter_server.server.device_controller.mdns] Node 16 activity on MDNS, trigger resubscribe
2024-06-25 23:18:06.412 (MainThread) INFO [matter_server.server.device_controller.mdns] Node 16 activity on MDNS, trigger resubscribe
2024-06-25 23:18:08.413 (MainThread) INFO [matter_server.server.device_controller.mdns] Node 16 activity on MDNS, trigger resubscribe
2024-06-25 23:23:00.660 (Dummy-2) CHIP_ERROR [chip.native.IN] Data received on an unknown session (LSID=4729). Dropping it!
2024-06-25 23:23:02.135 (Dummy-2) CHIP_ERROR [chip.native.IN] Data received on an unknown session (LSID=4729). Dropping it!
2024-06-25 23:23:03.537 (Dummy-2) CHIP_ERROR [chip.native.IN] Data received on an unknown session (LSID=4729). Dropping it!
```

#

node id - brand model
158 - EVE Thermo
11 - EVE Motion
159 - EVE Thermo
165 - EVE Thermo
170 - EVE Thermo
169 - EVE Thermo
163 - EVE Thermo
160 - EVE Thermo
162 - EVE Thermo
220 - EVE Motion

#

158 - EVE Thermo
87 - EVE Motionsblinds
11 - EVE Motion
165 - EVE Thermo (second time)
237 - EVE Weather
157 - EVE Thermo

#

in chronological order to the log file

#

So yes, only battery powered devices produce this 'Dropping it' logs.

pine shoal Jun 25, 2024, 10:06 PM

#

Same like me. My other battery powered devices (Aqara & tado) are not affected.

alpine bobcat Jun 25, 2024, 10:07 PM

#

Interesting! I only have EVE battery powered Matter over Thread devices.

pine shoal Jun 25, 2024, 10:09 PM

#

I think these Dropping logs are caused by the Eve Energy and this produces re-supscriptions for battery powered Eve devices.

alpine bobcat Jun 25, 2024, 10:10 PM

#

Yeah, maybe, I have 14 EVE Energies paired to Apple Home and Home Assistant.

#

But you said, you removed all your EVE Energies from Apple Home and had them paired to HA only and the issue was still there. You had to unpair them from HA. Right?

pine shoal Jun 25, 2024, 10:14 PM

#

Right. I would swear that your Dropping logs will also reduce and then stop if you reduce your paired Eve Energy’s in HA. For me it started after adding the fifth plug.

#

Apple Home also seems to be more responsive after removing the Eve Energy’s from HA completely. After opening Apple Home it normally refreshed every device in the past. Now it mostly opens without refreshing.

alpine bobcat Jun 25, 2024, 10:20 PM

#

Very Interesting finding. 😄

alpine bobcat Jun 25, 2024, 10:23 PM

#

pine shoal Apple Home also seems to be more responsive after removing the Eve Energy’s from...

I also see this happening. When I shutdown all my Apple TBRs, restart them and wait until the mesh procedure completes, opening Apple Home is really fast. But it slows down over time. It takes 1 or 2 seconds until everything is refreshed and available.

alpine bobcat Jun 25, 2024, 10:24 PM

#

pine shoal Right. I would swear that your Dropping logs will also reduce and then stop if y...

Maybe this should be my next test, removing all EVE Energies from HA. But what do we win, when I do this? 😄

#

Do we maybe have a firmware problem with the EVE Energies? They are still using Matter 1.0, right? Yes, I know it should be downwards compatible.

#

But why do you see these logs only with EVE battery powered devices, while aqara and tado do not show this behavior? They also use the EVE Energies as their Thread Routers.

manic moth Jun 26, 2024, 3:00 PM

#

Maybe we run into a CASE session limit here or something.

#

So it seems that EVE energy are really causing this. Can you try add this to a separate test HA instance? Just 1,2 (or more) Eve energies, and see if that shows the error too? Thanks to multi-admin it should be rather easy to commission it with an addtiional controller.

#

First this would be interesting while still connected to the productive HA. I would expect that the erros happen on the test HA instance too, as the CASE sessions are shared resources. If it doesn't happen on the test HA instance, then it is more likely a controller issue. If it does happen on the test intsance too, then we probably run into a device limit of some sort.

pine shoal Jun 27, 2024, 10:09 PM

#

Maybe I was wrong about the EVE Energy's causing the problem. A few day's ago I removed 4 Nanoleaf GU10 bulbs for special testing from my HA instance. At this moment I was down to 41 active nodes in HA. Today I could pair all of my 8 EVE Energy's without any "Dropping it" errors into HA (back to 49 active nodes). After this I started to re-pair my 4 GU10 bulbs. Shortly after pairing the first bulb the error messages appeared again. I removed the bulb again and the error messages stopped. The same happens with pairing a Aqara P2 Door & Windows sensor. It seems that these error messages start to appear with 50 active nodes. Curiously only battery powered EVE devices re-subscribe after these "Dropping it" messages.

alpine bobcat Jun 27, 2024, 10:52 PM

#

Interesting finding. At the moment I am also down to 52 devices, because of beta testing something. So it is easy to validate for me. I only need to remove three devices from my HA instance. Give me some minutes…

alpine bobcat Jun 27, 2024, 11:24 PM

#

pine shoal Maybe I was wrong about the EVE Energy's causing the problem. A few day's ago I ...

Maybe its too early to say that you found the problem, but my log is quiet, after removing 3 devices from HA. I removed these 3 nodes:

~ # ha host logs --identifier addon_core_matter_server -n 1000 | grep "successfully removed from Matter server"
2024-06-28 00:55:29.825 (MainThread) INFO [matter_server.server.device_controller] Node ID 14 successfully removed from Matter server.
2024-06-28 00:57:39.956 (MainThread) INFO [matter_server.server.device_controller] Node ID 19 successfully removed from Matter server.
2024-06-28 01:04:10.728 (MainThread) INFO [matter_server.server.device_controller] Node ID 17 successfully removed from Matter server.

#

Here is a snippet from my HA Matter server log from round about 10 minutes before I removed the first of the three nodes: https://dpaste.org/MpQzm

#

Now its 2024-06-28 01:26 here in Germany, the last log line is still from 2024-06-28 01:07:07.859.

@manic moth Are these operational discovery failed expected after removing a node id from HA? I first removed them in Apple Home from the HA service 4939, before I removed them from HA.

#

@pine shoal I can't believe it, I can read my logs again, already 20 minutes without a single log line. 😉 Great finding!

#

I hope this limitation can be set to at least 100 devices. I still have round about 20 bulbs lying around. 😄

alpine bobcat Jun 28, 2024, 12:10 AM

#

Ok, great 1 hour later, still no new log line. Awesome! 👏 Now let’s hope that the limit is somehow configurable. 😉

#

Good night

manic moth Jun 28, 2024, 8:22 AM

#

alpine bobcat Now its `2024-06-28 01:26` here in Germany, the last log line is still from `202...

Yeah if you remove our fabric entry via Apple Home then this is expected. We still try to find that device, but it doesn't publish an operational service anymore for our fabric/node id.

#

Actually, there is an event notifying when one removes us from the fabric:

2024-06-28 10:23:10.007 (MainThread) VERBOSE [matter_server.server.device_controller.node_215] Received node event: EventReadResult(Header=EventHeader(EndpointId=0, ClusterId=40, EventId=2, EventNumber=20, Priority=<EventPriority.INFO: 1>, Timestamp=6015131, TimestampType=<EventTimestampType.SYSTEM: 0>), Status=<Status.Success: 0>, Data=BasicInformation.Events.Leave(fabricIndex=10)) - transaction: <Subscription (Id=2154774328)>

So we can react on this , and remove the device on our end....

#

@alpine bobcat @pine shoal these are very interesting finds! I was thinking about that this problem could be related to a CASE session limit. Your findings very much support that suspcion

#

I thought it was a per device one, but this seems to point to a controller global limit somehow 🤔

#

#ifndef CHIP_CONFIG_CONTROLLER_MAX_ACTIVE_DEVICES
#define CHIP_CONFIG_CONTROLLER_MAX_ACTIVE_DEVICES 64
#endif

#

Soooo how many do you plan to add? 😎

alpine bobcat Jun 28, 2024, 8:43 AM

#

How many devices can you make possible? 😂

#

I think in the long term I will run 100 maybe 150 Matter devices. What is this limit used for? Why is it configured to 50? Is there any performance reason? Let’s try it, with 100 in the next step. What do you think?

pine shoal Jun 28, 2024, 9:10 AM

#

manic moth Soooo how many do you plan to add? 😎

It would be interesting to know how many devices a controller "must" support. Does the Matter standard has a definition for it?

nimble ridge Jun 28, 2024, 9:14 AM

#

Well, I don’t think there is a limit (there is some discussion about that though) but it must support 100s of devices with ease

manic moth Jun 28, 2024, 9:29 AM

#

alpine bobcat How many devices can you make possible? 😂

I'd say your memory is the limit. Or maybe int max😉

#

I go with 1024 for now.

manic moth Jun 28, 2024, 9:32 AM

#

pine shoal It would be interesting to know how many devices a controller "must" support. Do...

Yeah, like Marcel, I am not aware of a limit either. The standard really doesn't talk a lot about controllers in general. 🤷‍♂️

alpine bobcat Jun 28, 2024, 3:56 PM

#

manic moth I go with 1024 for now.

That’s great. When do you think we will see this implemented? 😃

manic moth Jun 29, 2024, 12:59 PM

#

6.2.0 BETA 2 with this fixed is out now 🎉

alpine bobcat Jun 29, 2024, 2:35 PM

#

Sadly, I am not at home today. Will check it out tomorrow. 😃 Thanks

pine shoal Jun 29, 2024, 3:13 PM

#

manic moth 6.2.0 BETA 2 with this fixed is out now 🎉

After updating to 6.2.0b2 I still see „Dropping it“ messages. 🤔

manic moth Jun 29, 2024, 6:42 PM

#

Hm

manic moth Jun 29, 2024, 7:16 PM

#

There is another closely related config CHIP_CONFIG_CONTROLLER_MAX_ACTIVE_CASE_CLIENTS but from what I understand that is only relevant during session establishment

#

From what I can tell, the other config got successfully applied in the builds 🤔

pine shoal Jun 29, 2024, 8:01 PM

#

Maybe we should wait about @alpine bobcat feedback. For me it’s the same behavior like before.

alpine bobcat Jun 30, 2024, 5:55 PM

#

pine shoal Maybe we should wait about <@1005545243824885782> feedback. For me it’s the sam...

Yeah, its the same for me. I installed the new Matter server 6.20.b2 at 19:33, waited until all devices were connected again and shared one of my EVE Energies (node id 243) that are connected to Apple Home only at 19:43 with Home Assistant. The "Dropping it" logs came back immediately. Now I have 50 MoT devices again.

https://dpaste.org/vF96m

alpine bobcat Jun 30, 2024, 7:22 PM

#

manic moth ```c++ #ifndef CHIP_CONFIG_CONTROLLER_MAX_ACTIVE_DEVICES #define CHIP_CONFIG_CON...

So, 64 max active devices were configured before the lift to 1024. But why is the border at 50 devices?

manic moth Jun 30, 2024, 7:35 PM

#

Yeah not sure, I can only assume that some of the devices occupy multiple CASE sessions 🤔

alpine bobcat Jun 30, 2024, 7:35 PM

#

Hm…. One Moment… No new Dropping it messages, only these 7 log lines that are already in the log above. Still 50 MoT devices.

manic moth Jun 30, 2024, 7:36 PM

#

Ok, keep crossing my fingers then 🤞 😅

alpine bobcat Jun 30, 2024, 7:37 PM

#

I am going to pair some 2 more Mot devices now. That was my situation before the last test with a logfile full of dropping it messages.

#

I will keep you updated.

#

No, with 51 MoT devices its already going crazy. A lot of dropping it logs. I can post a complete log, if you need it.

pine shoal Jun 30, 2024, 7:57 PM

#

I‘ve seen the same. With 50 devices it starts with rare „Dropping it“ messages and if I pair more devices, messages also appear more often.

alpine bobcat Jun 30, 2024, 8:09 PM

#

manic moth There is another closely related config `CHIP_CONFIG_CONTROLLER_MAX_ACTIVE_CASE_...

So, maybe we want to try this next.

alpine bobcat Jun 30, 2024, 8:49 PM

#

Yep, I removed the 51. MoT device half an hour ago and the dropping it messages stopped. So I am back to 50 devices for now.

alpine bobcat Jul 1, 2024, 8:28 AM

#

I still see some dropping messages after I un-paired/removed the 51. MoT device (node 244) at 22:11 o'clock. Now where I look through my logs, I see that one of my EVE Energies (node 22) got unavailable after un-pairing (node 244). This node is still paired to Apple Home and works as expected. How can that be? Any idea?

Here is my complete log: https://dpaste.org/bOzGj

manic moth Jul 1, 2024, 4:11 PM

#

Turns out the config did not get applied correctly 😢

#

But meanwhile, bug found and fixed 🎉

6.2.0 BETA 4 should resolve the problem.

alpine bobcat Jul 1, 2024, 5:25 PM

#

alpine bobcat I still see some dropping messages after I un-paired/removed the 51. MoT device ...

This was my mistake. I deleted the home assistant service from the device, but forgot that I did that. Maybe I am getting old... 😄

alpine bobcat Jul 1, 2024, 5:27 PM

#

manic moth But meanwhile, bug found and fixed 🎉 6.2.0 BETA 4 should resolve the problem.

Great! I am testing it now. After commissioning the 50th device (node 245), I already see some dropping messages: https://dpaste.org/5Oaie

Now I am going to pair the 51th device.

#

The log for node 246 looks the same. After the node commissioning completes I see four dropping it messages for the device. https://dpaste.org/Hntqy

#

And now it starts again: https://dpaste.org/kwzKd

@manic moth that was not the solution.

alpine bobcat Jul 1, 2024, 5:52 PM

#

Yep, my log is full of dropping it now…

manic moth Jul 1, 2024, 5:56 PM

#

This doesn't really make sense 😥

#

Can you check if b4 really got installed on add-on startup?

alpine bobcat Jul 1, 2024, 5:59 PM

#

manic moth Can you check if b4 really got installed on add-on startup?

Oh, I always posted the log from the last startup…

Downloading python_matter_server-6.2.0b4-py3-none-any.whl.metadata

manic moth Jul 1, 2024, 6:00 PM

#

Yeah that looks good

alpine bobcat Jul 1, 2024, 6:05 PM

#

manic moth There is another closely related config `CHIP_CONFIG_CONTROLLER_MAX_ACTIVE_CASE_...

Did you change this setting, too?

manic moth Jul 1, 2024, 6:06 PM

#

Yes, it should be at 192 now.

nimble ridge Jul 1, 2024, 6:07 PM

#

So... just thinking out loud here... did you also try to add another matter device ?

#

anything else than the eve plug I mean

alpine bobcat Jul 1, 2024, 6:10 PM

#

Hi @nimble ridge

@pine shoal tested this here:

#1245487486600482816 message

Do you think it’s still worth a try? I can try that later. Now I have to bring my toddler to bed.

manic moth Jul 1, 2024, 6:33 PM

#

alpine bobcat Did you change this setting, too?

Actually, that was another setting which I've set to 192. The CHIP_CONFIG_CONTROLLER_MAX_ACTIVE_CASE_CLIENTS setting is still at 16.

#

Hmm,

#

#ifndef CHIP_CONFIG_MAX_FABRICS
#define CHIP_CONFIG_MAX_FABRICS 16
#endif // CHIP_CONFIG_MAX_FABRICS

#ifndef CHIP_CONFIG_SECURE_SESSION_POOL_SIZE
#define CHIP_CONFIG_SECURE_SESSION_POOL_SIZE (CHIP_CONFIG_MAX_FABRICS * 3 + 2)
#endif // CHIP_CONFIG_SECURE_SESSION_POOL_SIZE

#

That adds up to exactly 50, so it seems this is the limit we actually hit.

nimble ridge Jul 1, 2024, 6:49 PM

#

manic moth That adds up to exactly 50, so it seems this is the limit we actually hit.

a bit strange to calculate the pool size on max fabrics ?

manic moth Jul 1, 2024, 6:49 PM

#

I guess for a device this makes sense.

#

For a controller it is a bit odd indeed. Especially that then there are only 3 secure sessions per fabric allocated 😅

#

16 very small networks 😅

nimble ridge Jul 1, 2024, 7:13 PM

#

yeah indeed

alpine bobcat Jul 1, 2024, 10:57 PM

#

So we have a secure session pool of 16 * 3 + 2. Yeah, 16 very small networks. 😃

#

Can you change chip_config_max_fabrics to 64 for example?

#

So we get 64 very small networks, which probably nobody needs, but a usable safe session pool size… 😃

manic moth Jul 2, 2024, 6:37 AM

#

I've bumped CHIP_CONFIG_SECURE_SESSION_POOL_SIZE to 4096 directly. So with that we can have multiple fabrics with many many devices 😅

nimble ridge Jul 2, 2024, 6:53 AM

#

As soon as @alpine bobcat hits that limit we need to do an intervention that he needs to stop adding devices to his home network 🤣

alpine bobcat Jul 2, 2024, 6:54 AM

#

🤣🤣🤣

manic moth Jul 2, 2024, 7:22 AM

#

@alpine bobcat @pine shoal 6.2.0 BETA 5 out with the secure session limit bumped to 4096. This time! 🤞

alpine bobcat Jul 2, 2024, 7:23 AM

#

Downloading python_matter_server-6.2.0b5-py3-none-any.whl.metadata

alpine bobcat Jul 2, 2024, 8:04 AM

#

I think we got it! 🎉 explode

Round about 40 minutes after the update to 6.2.0b5, I do not see a single "Dropping it!" log line with now 52 devices paired to Home Assistant. Here is my complete log: https://dpaste.org/tschE

Great, thank you! 😃👍

manic moth Jul 2, 2024, 8:55 AM

#

Awesome! Thanks for testing!

alpine bobcat Jul 2, 2024, 10:40 AM

#

Yes, still no new "dropping it" lines. 😄

#

One last thing. Can you have a short look at my log file, please?
https://dpaste.org/bwO2J

#

Are these constant (re-)subscribing log lines an expected behavior? I know its all INFO, but I want to be sure. 😉

#

I also have some CHIP_ERRORs, mostly Time out! failed to receive report data from Exchange. Any idea, why this happens?

alpine bobcat Jul 2, 2024, 11:02 AM

#

These CHIP_ERRORs are not new to me.

manic moth Jul 2, 2024, 11:58 AM

#

So these are triggered by the SDK:

2024-07-02 10:53:33.402 (Dummy-2) CHIP_ERROR [chip.native.DMG] Time out! failed to receive report data from Exchange: 1824r with Node: <00000000000000D5, 1>
2024-07-02 10:53:33.404 (MainThread) INFO [matter_server.server.device_controller.node_213] Previous subscription failed with Error: 50, re-subscribing in 0 ms...
2024-07-02 10:53:36.420 (MainThread) INFO [root] Re-subscription succeeded!

#

They mean that the device failed to check-in in time. The reason could be that the liveness message got lost. RF/Ethernet networks do not guarnatee delivery, so this can happen every now and then under normal circumstances.

The MRP (Message Reliabiliy Protocol) should trigger up to 4 re-transmits. But it could be that temporary super busy 2.4GHz network takes them all out. Or maybe the device just crashed, but I think then it would take longer to re-subscribe. There was also some discussion if the liveness check-in is a bit too picky (as in, timeouts too early). I expect those to become fewer and fewer as the timings and timeouts of firmware and controllers are improved.

#

And then there are those:

2024-07-02 11:47:57.721 (MainThread) INFO [matter_server.server.device_controller.mdns] Node 176 activity on MDNS, trigger resubscribe
2024-07-02 11:48:14.665 (MainThread) INFO [matter_server.server.device_controller.mdns] Node 158 re-discovered on MDNS

These are from the Python Matter Server's zeroconf discovery. I would say your log has a bit too many of those 😅 But I'd need to look into why those get triggered that regularly.

alpine bobcat Jul 2, 2024, 8:28 PM

#

OK, thanks again for the explanation. This really helps me to understand what's going on. As long as everything works as expected in practice, I am happy. No rush with the zeroconf discovery messages. Finally these "Dropping it" log lines have disappeared and I can read my log file again. Have a nice evening. 😃

nimble ridge Jul 2, 2024, 9:15 PM

#

I am a bit inclined to say that these logs are expected on a big network like you have.
Its a lot of traffic for a lot of devices over various transports, including thread. Connections can be lost from time to time or packets dropped. As long as it restores itself and doesnt hurt actual operation. We log every fart of the SDK operations in HA, while the ecosystems like Apple and Google hide that all away from you. So incidental connection resubscriptions are "part of the deal", you see those especially more often with SDK 1.0 devices. Keep an eye on it of things stay responsive or you see logging at super steady/often intervals, like the one about the "dropping it".

alpine bobcat Jul 2, 2024, 9:43 PM

#

I see it exactly the same way and you keep telling me the same things again and again. 😅 We are the early adopters, so we have to live with certain bugs/issues at the moment, especially when you have so many Matter over Thread devices. I will continue to keep an eye on my thread network as a whole and keep an close eye on when the situation worsens or improves. I'm already looking forward to Apple OS 17.6 and 18, which may bring further improvements in addition to new device models. The next EVE/Nanoleaf firmware based on Matter 1.3 will also be interesting.

manic moth Jul 3, 2024, 7:25 AM

#

@nimble ridge I agree and disagree here at the same time.

There are different sources of messages with "resubscribe" in the logs: Some from the SDK, and some from the mDNS discovery.

Over the 3h of operation in the above logs there are a total of 2 from the SDK. This seems somewhat within normal operation normal to me too, I agree.

However, there are a total of 66 from the Matter Servers mdns implementation. That is a bit too many for my liking, tbh 😢 I wonder if we have a timing issue or something there 🤔

@alpine bobcat are the ones from the matter_server.server.device_controller.mdns logger continue to show? Can you share another log maybe?

nimble ridge Jul 3, 2024, 7:27 AM

#

If we see excessive logs or a pattern then we need to have a better look but in general mdns is very noisy.
If you think there is something to follow-up let's do that - to me this looks pretty normal (as its the same I see in test networks)

alpine bobcat Jul 3, 2024, 10:26 AM

#

OK, here is another log since the last Matter server restart: https://dpaste.org/77nP2

I see lot of CHIP_ERRORs, but in this dashboard history card I do not see any big trouble: https://imgur.com/a/P97HbXU

There was one EVE Energy unavailability within the last 24 hours: https://imgur.com/a/aDsi9Tx and three Nanoeaf unavailabilities for less than 40 seconds. I every case it healed itself. In general I do not have to power cycle devices anymore. They come back online by themselves.

#

I have some more Matter over Thread devices. But some Nanoleaf bulbs are not paired to HA at the moment for beta testing reasons. And some other devices, like EVE Thermo and EVE Weather, don't give me such a historical line. They show the temperature or humidity for example. I don't know how to get such a historical line for these device types.

pine shoal Jul 3, 2024, 3:18 PM

#

manic moth <@1005545243824885782> <@543796291583606824> 6.2.0 BETA 5 out with the secure se...

The change is also working for me. No more "Dropping it" messages! 🙂

alpine bobcat Jul 3, 2024, 4:03 PM

#

@pine shoal Can you also post a complete from the last Matter server restart? I am interested to see, what’s going on in your log file. 😃

pine shoal Jul 3, 2024, 4:17 PM

#

alpine bobcat <@543796291583606824> Can you also post a complete from the last Matter server r...

I can upload the logs later this week if you‘re interested. 😉 Today I’ve updated all my Home Hubs to 17.6 public beta 2 and actually waiting for things to settle down. So logs make no sense atm.

alpine bobcat Jul 3, 2024, 4:48 PM

#

Ok, please keep me updated, how it works and if you see some improvements. 😉

alpine bobcat Jul 3, 2024, 4:50 PM

#

pine shoal I can upload the logs later this week if you‘re interested. 😉 Today I’ve update...

I would be interested in a log file of about 24 hours of normal operation to compare the type and amount of CHIP_ERRORs. You have a similar setup to mine.

alpine bobcat Jul 18, 2024, 3:41 PM

#

@manic moth By the way... Here is a log of my last 24 hours: https://dpaste.org/cNEpC

The log is much quieter since the latest updates. Nearly all CHIP_ERRORs are from Nanoleaf bulbs, only one is from an EVE Energy (node 16). Overall I am fine with the log. Everthing works as expected. Thanks

#

@manic moth Even though its off topic. I see a small issue when I ssh into my HA Yellow and look at the new colourful logs. I can't read INFO, WARNING, CHIP_ERROR without marking the log lines, because its black on black. Is this changeable?

https://imgur.com/a/nUVkFZO

manic moth Jul 18, 2024, 4:13 PM

#

About the coloring: That depends on your virtual terminal adapter. We use default settings from the Python coloredlogs setting 🤷‍♂️ If you change the background to something more greyish it should be fine.

nimble ridge Jul 18, 2024, 4:15 PM

#

That log looks fine for the amount of devices you have @alpine bobcat.
One task for us maybe is to lower the log level of "Activity on mDNS, trigger resubscribe" to debug

manic moth Jul 18, 2024, 4:16 PM

#

A couple thigns in the logs which are interesting: You have two instances where the Matter SDK reports a long dispatch time. On startup, that is normal as the system is busy. But during regular operation, that is not ideal. Not sure, was there maybe high CPU load in general? 🤔

2024-07-17 20:51:03.858 (Dummy-2) CHIP_ERROR [chip.native.DL] Long dispatch time: 363 ms, for event type 2
...
2024-07-18 00:28:04.889 (Dummy-2) CHIP_ERROR [chip.native.DL] Long dispatch time: 398 ms, for event type 2

It is not suuper long, so not that problematic, but still not ideal.

#

These Activity on mDNS, trigger resubscribe don't hurt much, as we just tell the SDK it should re-subscribe NOW, in case that node is currently waiting to retry re-subscribe...

2024-07-17 15:13:30.751 (MainThread) INFO [matter_server.server.device_controller.mdns] <Node:159> Activity on mDNS, trigger resubscribe
2024-07-17 15:13:32.758 (MainThread) INFO [matter_server.server.device_controller.mdns] <Node:159> Activity on mDNS, trigger resubscribe

All those nodes listed are probably not in that state, so it is somewhat wasteful. I think this stems from the fact that Thread BR update their mDNS entry every now and then. I am not sure why it doesn't happen regularly though 🤔

alpine bobcat Jul 18, 2024, 9:13 PM

#

manic moth A couple thigns in the logs which are interesting: You have two instances where ...

I can't say it, but I don't think so. Everytime I look at the cpu load, it is between 1 and 5%.

#

Here is another interesting log file: https://dpaste.org/j3beU

At this time stamp 2024-07-18 17:53:45.170 I started to update my APs where also my 5 HomePods are connected to. I was not at home until 2024-07-18 20:30. When I came home all devices were connected and everything worked as expected. But there was a lot of work in my Thread mesh, when I look at the logs. While the update of my APs the infrastructure links from the Thread mesh to the WiFi do not work for some minutes and the mesh has to rebuild and find new routes. It took over two hours until the log got quiet again. Not the ideal situation. What do think about this?

#

@pine shoal Any news about 17.6 beta or 18 beta? 😄

pine shoal Jul 19, 2024, 9:18 AM

#

alpine bobcat <@543796291583606824> Any news about 17.6 beta or 18 beta? 😄

Do NOT install the tvOS/HomePodOS 18 public beta! 😉 I was on 17.6 public beta which was working quite well but my HomePods decided to update to 18 pb when it was released. Because of that I also updated my ATV's to tvOS 18 pb. After updating I saw lot of Matter timeouts in HA and it was not possible to run both Apple Home and HA as a controller at the same time. I could reduce the timeouts by removing Apple Home and leaving HA as the only controller. Still seeing a lot of timeouts compared to 17.6 but it works most of the time for now.

HomePodOS 17.6 public beta fixed the instability I've seen on 17.5 due to AP roaming. I also noticed that these annoying updating devices messages mostly disappeared after opening Apple Home app. It only updates individual devices that have changed their status.

alpine bobcat Jul 19, 2024, 8:19 PM

#

pine shoal Do NOT install the tvOS/HomePodOS 18 public beta! 😉 I was on 17.6 public beta ...

Some days ago I also decided to reduce the traffic in my Thread mesh, by removing all 23 Nanoleaf bulbs from Apple Home. Now they are paired to Home Assistant and light groups of my Matter bulbs are bridged to HomeKit. That works fine for the moment.

#

Ok, thanks for the tip. I think I will stay away from those Apple betas. It’s summer time. I do not have much time, short before holidays and too much other hobbies… 😉

#

Yeah, I also recognized a lot of timeouts due to HomePod roaming. I thought the issue is Unifi.

Can I ask which Unifi equipment you have?

#

Yeah, I also see these update messages on some of my EVE devices. Great to hear that they found the issue. 😃👍

pine shoal Jul 19, 2024, 8:56 PM

#

alpine bobcat Yeah, I also recognized a lot of timeouts due to HomePod roaming. I thought the ...

No Unifi at all. We‘re using a FRITZ!Box and a FRITZ!Repeater 6000.

alpine bobcat Jul 19, 2024, 9:26 PM

#

pine shoal No Unifi at all. We‘re using a FRITZ!Box and a FRITZ!Repeater 6000.

Oh, ok… I thought, you are a Unifi user, too. Thanks for the enlightenment. 😃

#Excessive Logging 'Data received on an unknown session. Dropping it!'