Store compression | Home Assistant

rare sinew Feb 26, 2026, 9:03 AM

#

I've vibe-coded a little benchmark script using the same methods core uses to store those JSON files. Here are the results:

Input file: ./knx_project.json
Initial size: 1364.11 KB

Results (averages over 3 iterations):
Mode: none
  Stored size: 1364.11 KB (compression: 0.00%)
  Save time: 0.32 ms
  Load time: 3.60 ms
Mode: gzip
  Stored size: 73.94 KB (compression: 94.58%)
  Save time: 39.03 ms
  Load time: 3.06 ms

#

📎 bench_storage.py

#

Those results are from an M1 Pro MacBook, so saving and loading will behave quite differently on a Raspberry Pi with an SD card... not sure if we should optimize for that, but users still seem to use those too 😬

#

Here it is with more detail:

Input file: ./knx_project.json
Initial size: 1364.11 KB

Results (averages over 3 iterations):
Mode: none
  Stored size: 1364.11 KB (compression: 0.00%)
  Save total: 2.29 ms
    Serialization: 1.79 ms
    Compression: 0.00 ms
    Write IO: 0.50 ms
  Load total: 3.73 ms
    Read IO: 0.27 ms
    Decompression: 0.00 ms
    Parse: 3.46 ms
Mode: gzip
  Stored size: 73.94 KB (compression: 94.58%)
  Save total: 40.05 ms
    Serialization: 1.58 ms
    Compression: 38.30 ms
    Write IO: 0.17 ms
  Load total: 2.92 ms
    Read IO: 0.06 ms
    Decompression: 0.48 ms
    Parse: 2.39 ms
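The bench_storage.py attachment isn't reproduced here. As a rough illustration of how a per-phase breakdown like this can be measured, here is a minimal sketch for the gzip mode. It assumes stdlib `json` as a stand-in for core's serializer, and the function names are hypothetical, not taken from the script.

```python
import gzip
import json
import time


def _ms_since(start: float) -> float:
    """Elapsed wall-clock time since `start`, in milliseconds."""
    return (time.perf_counter() - start) * 1000


def time_gzip_roundtrip(data: object, path: str) -> tuple[dict[str, float], int]:
    """Hypothetical helper: time each save/load phase for gzip-compressed JSON.

    Uses stdlib json as a stand-in for core's serializer (an assumption).
    Returns (timings in ms, stored size in bytes).
    """
    timings: dict[str, float] = {}

    start = time.perf_counter()
    raw = json.dumps(data).encode("utf-8")
    timings["serialization"] = _ms_since(start)

    start = time.perf_counter()
    packed = gzip.compress(raw)  # gzip.compress defaults to compresslevel=9
    timings["compression"] = _ms_since(start)

    start = time.perf_counter()
    with open(path, "wb") as fh:
        fh.write(packed)  # no fsync: this mostly measures the OS page cache
    timings["write_io"] = _ms_since(start)

    start = time.perf_counter()
    with open(path, "rb") as fh:
        blob = fh.read()
    timings["read_io"] = _ms_since(start)

    start = time.perf_counter()
    unpacked = gzip.decompress(blob)
    timings["decompression"] = _ms_since(start)

    start = time.perf_counter()
    loaded = json.loads(unpacked)
    timings["parse"] = _ms_since(start)

    if loaded != data:
        raise ValueError("round trip changed the data")
    return timings, len(packed)
```

Without an `os.fsync` after the write, the "write IO" figure largely reflects the OS page cache rather than the storage device, so it says little about how an SD card would behave.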

median edge Feb 26, 2026, 11:34 AM

#

But uh...

#

why? 😄

#

As in, what are we trying to solve here?

true acorn Feb 26, 2026, 12:08 PM

#

Holy shit, the KNX file is huge. Yes, for that size difference it's probably worth it.

rare sinew Feb 26, 2026, 12:13 PM

#

median edge As in, what are we trying to solve here?

Main motivation was to speed up loading times and reduce wear of SD cards.

#

I guess backups are compressed anyway.

#

In KNX we store a communication history that can grow up to ~15 MB (the size is configurable; the default is much smaller 😉)

median edge Feb 26, 2026, 12:20 PM

#

hehe uh

#

maybe that file isn't really suitable for that :S haha

#

considered using a sqlite or something instead maybe? :S

rare sinew Feb 26, 2026, 12:20 PM

#

HACS repository cache is also ~1.5 MB. Same for the entity registry on my installation.

rare sinew Feb 26, 2026, 12:23 PM

#

median edge considered using a sqlite or something instead maybe? :S

Yeah, in the end we send this to the frontend via WS (WebSocket), so JSON was the path of least resistance.

true acorn Feb 26, 2026, 12:29 PM

#

Yeah those sizes are not good

median edge Feb 26, 2026, 12:33 PM

#

The thing is, if we compress this, we will hit some of the community that consumes these files

#

Probably not for this specific case, but I would consider splitting things like the communication history out (and maybe compressing that specifically).

rare sinew Feb 26, 2026, 12:53 PM

#

Therefore I'd like to make it optional. For data that is stored once and read on every start (like the KNX project or other config) or really huge data (HACS cache, maybe automation traces) I'd use it.
For the entity registry maybe not 😬

#

Store(
    ...,
    compress: bool = False,
)

Like this, so we can decide per use.
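A minimal sketch of what such an opt-in flag could do. It uses hypothetical `save_json`/`load_json` helpers rather than HA's actual `Store` class, and assumes gzip with a `.gz` suffix so the format is obvious from the file name.

```python
import gzip
import json
import os
from pathlib import Path


def save_json(path: Path, data: object, compress: bool = False) -> Path:
    """Hypothetical helper: write data as JSON, optionally gzip-compressed.

    When compress is True, ".gz" is appended to the file name (an assumed
    convention). Returns the path actually written.
    """
    raw = json.dumps(data).encode("utf-8")
    if compress:
        path = path.with_name(path.name + ".gz")
        raw = gzip.compress(raw)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_bytes(raw)
    os.replace(tmp, path)  # atomic rename when tmp and path share a filesystem
    return path


def load_json(path: Path) -> object:
    """Hypothetical helper: read a file written by save_json.

    Compression is detected from the suffix, so callers don't need the flag.
    """
    raw = path.read_bytes()
    if path.suffix == ".gz":
        raw = gzip.decompress(raw)
    return json.loads(raw)  # json.loads accepts UTF-8 bytes directly
```

Because the reader detects compression from the suffix, the flag only affects how a store is written. Turning it on for an existing store would still need some migration or a fallback to the old uncompressed file.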

hardy trench Feb 26, 2026, 3:24 PM

#

That sounds pretty good to me. Which compression format do you suggest? Would we enforce a suffix on the file corresponding to the compression algorithm?

median edge Feb 26, 2026, 3:30 PM

#

Agreed, sounds good 👍

rare sinew Feb 26, 2026, 3:39 PM

#

Since we are on Python 3.14 now we could get fancy https://docs.python.org/3/library/compression.zstd.html 😃

#

that might yield even better numbers than the gzip test above
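Python 3.14 adds Zstandard to the standard library as `compression.zstd` (PEP 784). Below is a minimal sketch of picking a codec by file suffix and comparing sizes. The `.gz`/`.zst` suffixes and the `CODECS` table are assumptions, not an agreed convention. On older Python versions the zstd entry is simply skipped.

```python
import gzip
from collections.abc import Callable

try:
    # Standard library on Python 3.14+ (PEP 784); missing on older versions.
    from compression import zstd
except ImportError:
    zstd = None

Codec = tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]

# Hypothetical suffix -> (compress, decompress) table.
CODECS: dict[str, Codec] = {
    ".gz": (gzip.compress, gzip.decompress),
}
if zstd is not None:
    CODECS[".zst"] = (zstd.compress, zstd.decompress)


def compressed_sizes(raw: bytes) -> dict[str, int]:
    """Compress raw with every available codec, verify the round trip, return sizes."""
    sizes: dict[str, int] = {}
    for suffix, (compress, decompress) in CODECS.items():
        packed = compress(raw)
        if decompress(packed) != raw:
            raise ValueError(f"round trip failed for {suffix}")
        sizes[suffix] = len(packed)
    return sizes
```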

rare sinew Feb 26, 2026, 4:02 PM

#

Input file: ./knx_project.json
Initial size: 1364.11 KB

Results (averages over 10 iterations):
Mode: none
  Stored size: 1364.11 KB (compression: 0.00%)
  Save total: 2.59 ms
    Serialization: 1.62 ms
    Compression: 0.00 ms
    Write IO: 0.97 ms
  Load total: 2.76 ms
    Read IO: 0.33 ms
    Decompression: 0.00 ms
    Parse: 2.44 ms
Mode: gzip
  Stored size: 73.94 KB (compression: 94.58%)
  Save total: 41.49 ms
    Serialization: 1.58 ms
    Compression: 39.48 ms
    Write IO: 0.43 ms
  Load total: 3.05 ms
    Read IO: 0.07 ms
    Decompression: 0.50 ms
    Parse: 2.48 ms
Mode: zstd
  Stored size: 77.55 KB (compression: 94.31%)
  Save total: 3.01 ms
    Serialization: 1.57 ms
    Compression: 1.19 ms
    Write IO: 0.26 ms
  Load total: 2.79 ms
    Read IO: 0.05 ms
    Decompression: 0.50 ms
    Parse: 2.24 ms
Mode: zlib
  Stored size: 81.92 KB (compression: 93.99%)
  Save total: 10.65 ms
    Serialization: 1.57 ms
    Compression: 8.86 ms
    Write IO: 0.22 ms
  Load total: 2.84 ms
    Read IO: 0.06 ms
    Decompression: 0.46 ms
    Parse: 2.33 ms
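The "compression" figure looks like the space saved relative to the input size. Assuming that definition, the reported percentages can be reproduced from the stored sizes above:

```python
INITIAL_KB = 1364.11

# Stored sizes in KB, copied from the 10-iteration benchmark output above.
STORED_KB = {"none": 1364.11, "gzip": 73.94, "zstd": 77.55, "zlib": 81.92}


def compression_pct(initial_kb: float, stored_kb: float) -> float:
    """Percentage of space saved: (1 - stored / initial) * 100."""
    return (1 - stored_kb / initial_kb) * 100


for mode, stored in STORED_KB.items():
    print(f"{mode}: {compression_pct(INITIAL_KB, stored):.2f}%")
```

This prints 0.00%, 94.58%, 94.31% and 93.99%, matching the reported values. In other words, all three codecs land within about 0.6 percentage points of each other, so save time is what separates them here.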

true acorn Feb 26, 2026, 4:27 PM

#

median edge The thing is, if we compress this, we will hit some of the community that consum...

Hm, maybe, but if I hear of entity registry files of 1.5 MB I do want to compress the registries

rare sinew Feb 26, 2026, 4:27 PM

#

Or use SQLite 😛

true acorn Feb 26, 2026, 4:28 PM

#

So far HA has been proudly SQL-free at its core, but I see your point...

#

Imagine allowing AI to write SQL to query our data...

#

Efficient search

median edge Feb 26, 2026, 4:52 PM

#

Yes. I do want vector search, but that's a bit off topic

#

but the fact that the frontend now does all of the search logic is just... "meh"

median edge Feb 26, 2026, 6:40 PM

#

true acorn Hm, maybe, but if I hear of entity registry files of 1.5 MB I do want to compress the ...

Hehe, agreed, but don't forget about people actually using them (even though they shouldn't) and them being a source for AI stuff

hardy trench Feb 26, 2026, 7:14 PM

#

surely the AI can handle decompression, directly or indirectly

median edge Feb 26, 2026, 7:15 PM

#

Not for context, which is what they mostly use it for

hardy trench Feb 26, 2026, 7:17 PM

#

why not? can't you pipe it via a gzip filter?

median edge Feb 26, 2026, 7:19 PM

#

k

hardy trench Feb 26, 2026, 7:19 PM

#

no?

true acorn Feb 26, 2026, 9:06 PM

#

yeah they can

median edge Feb 26, 2026, 9:18 PM

#

As soon as they hit command line execution, a confirmation is needed again

#

imho it doesn't improve the situation

#

Unless we have a concrete use case or need to compress the core entity registry to solve an issue at hand, I don't think we should do it.

#

That said, I'm not against the proposal; I just want to be sure that we don't use it because we "feel like it" but instead have an actual reason to do so.

true acorn Feb 26, 2026, 9:30 PM

#

I think it would be good to start with measuring how often the entity registry is written to.

#

or actually, track that for all files: file size + writes
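Tracking this could be as simple as counting writes and bytes per file wherever stores hit the disk. The sketch below uses hypothetical names (`WriteStats`, `write_tracked`) and is not an existing HA API. Bytes written per file (writes × size) is the number most relevant to flash wear.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class WriteStats:
    """Hypothetical per-file counters: number of writes and bytes written."""

    writes: Counter = field(default_factory=Counter)
    bytes_written: Counter = field(default_factory=Counter)
    last_size: dict[str, int] = field(default_factory=dict)

    def record(self, path: str, payload: bytes) -> None:
        self.writes[path] += 1
        self.bytes_written[path] += len(payload)
        self.last_size[path] = len(payload)


def write_tracked(stats: WriteStats, path: str, payload: bytes) -> None:
    """Hypothetical write wrapper that records each write in `stats`."""
    with open(path, "wb") as fh:
        fh.write(payload)
    stats.record(path, payload)
```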

#

If we can go from 1.5 MB to 73 KB (which is possible, lots of duplicate keys), it would be a huge help in saving SD cards.
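To illustrate why repeated keys compress so well, here is a small experiment on synthetic, registry-like entries. The field names are loosely modeled on a registry and the data is made up; real registries will give different ratios. DEFLATE (used by gzip) replaces repeated byte sequences with back-references, so keys that recur in every entry cost very little after the first one.

```python
import gzip
import json

# Synthetic entries (hypothetical data): every entry repeats the same keys
# and most of the same values; only the numeric suffix changes.
entries = [
    {
        "entity_id": f"sensor.example_{i}",
        "platform": "example",
        "disabled_by": None,
        "hidden_by": None,
        "unique_id": f"example-{i}",
    }
    for i in range(1000)
]
raw = json.dumps({"data": {"entities": entries}}).encode("utf-8")
packed = gzip.compress(raw)

saved = (1 - len(packed) / len(raw)) * 100
print(f"raw: {len(raw)} bytes, gzip: {len(packed)} bytes ({saved:.1f}% smaller)")
```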

median edge Feb 26, 2026, 9:46 PM

#

In all honesty, I do not think SD cards are a concern at this point. Changing this would probably trigger repercussions for the existing community and for AI tooling.
I'm curious to learn what existing problem we are solving to date.

Don't get me wrong, I'm not against compression, especially not in combination with the history use case above. It's more that I'm worried about backlash from the community if we just enable it on everything including the kitchen sink, without having facts on an existing (recent) issue.
