#Store compression
I've vibed a little benchmark script using the same methods core uses to store those JSON files. Here are the results:
Input file: ./knx_project.json
Initial size: 1364.11 KB
Results (averages over 3 iterations):
  Mode: none
    Stored size: 1364.11 KB (compression: 0.00%)
    Save time: 0.32 ms
    Load time: 3.60 ms
  Mode: gzip
    Stored size: 73.94 KB (compression: 94.58%)
    Save time: 39.03 ms
    Load time: 3.06 ms
Those results are from an M1 Pro MacBook, so save and load times will differ significantly from what you'd see on a Raspberry Pi with an SD card... Not sure if we should optimize for that, but users still seem to use those too.
Here with more detail:
Input file: ./knx_project.json
Initial size: 1364.11 KB
Results (averages over 3 iterations):
  Mode: none
    Stored size: 1364.11 KB (compression: 0.00%)
    Save total: 2.29 ms
      Serialization: 1.79 ms
      Compression: 0.00 ms
      Write IO: 0.50 ms
    Load total: 3.73 ms
      Read IO: 0.27 ms
      Decompression: 0.00 ms
      Parse: 3.46 ms
  Mode: gzip
    Stored size: 73.94 KB (compression: 94.58%)
    Save total: 40.05 ms
      Serialization: 1.58 ms
      Compression: 38.30 ms
      Write IO: 0.17 ms
    Load total: 2.92 ms
      Read IO: 0.06 ms
      Decompression: 0.48 ms
      Parse: 2.39 ms
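For anyone who wants to reproduce a breakdown like the one above: a minimal sketch of a phase-timed round trip, not the actual script used here. It assumes stdlib `json` and `gzip` (core may use a different serializer), and `make_sample` / `bench_once` are made-up names with a made-up payload rather than the real KNX project file. Write IO is measured without fsync, so it mostly reflects the page cache.

```python
import gzip
import json
import os
import tempfile
import time


def make_sample(n=2000):
    # Hypothetical payload with many repeated keys; not the real KNX project file.
    return {
        "devices": [
            {"name": f"device {i}", "address": f"1.1.{i}", "enabled": True}
            for i in range(n)
        ]
    }


def bench_once(data, compress):
    """Run one save/load round trip; return sizes plus per-phase times in ms."""
    fd, path = tempfile.mkstemp(suffix=".json.gz" if compress else ".json")
    os.close(fd)
    try:
        t0 = time.perf_counter()
        raw = json.dumps(data).encode("utf-8")  # serialization
        t1 = time.perf_counter()
        payload = gzip.compress(raw) if compress else raw  # compression
        t2 = time.perf_counter()
        with open(path, "wb") as fh:  # write IO (no fsync)
            fh.write(payload)
        t3 = time.perf_counter()
        with open(path, "rb") as fh:  # read IO
            stored = fh.read()
        t4 = time.perf_counter()
        raw_back = gzip.decompress(stored) if compress else stored  # decompression
        t5 = time.perf_counter()
        loaded = json.loads(raw_back)  # parse
        t6 = time.perf_counter()
    finally:
        os.unlink(path)

    marks = [t0, t1, t2, t3, t4, t5, t6]
    names = ["serialize", "compress", "write_io", "read_io", "decompress", "parse"]
    result = {
        f"{name}_ms": (end - start) * 1000
        for name, start, end in zip(names, marks, marks[1:])
    }
    result.update(
        raw_bytes=len(raw),
        stored_bytes=len(payload),
        round_trip_ok=loaded == data,
    )
    return result


if __name__ == "__main__":
    sample = make_sample()
    for compress in (False, True):
        print("gzip" if compress else "none", bench_once(sample, compress))
```

Averaging over several iterations, as in the numbers above, would just mean calling `bench_once` in a loop and taking the mean per key.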
Holy shit knx file is huge. Yes for that size difference it's probably worth it
Main motivation was to speed up loading times and reduce wear of SD cards.
I guess backups are compressed anyway.
In KNX we store a communication history that can grow up to ~15 MB (the size is configurable; the default is much less).
hehe uh
maybe that file isn't really suitable for that :S haha
Have you considered using SQLite or something instead, maybe? :S
HACS repository cache is also ~1.5 MB. Same for the entity registry on my installation.
Yeah, in the end we send this to the frontend via WS, so JSON was the path of least resistance.
Yeah those sizes are not good
The thing is, if we compress this, we will hit the part of the community that consumes these files
probably not for this specific case. But I would consider splitting things like communication history out (and maybe compress that specifically in that case)
Therefore I'd like to make it optional. For data that is stored once and read on every start (like the KNX project or other config) or for really huge data (HACS cache, maybe automation traces) I'd use it.
For the entity registry, maybe not.
Store(
    ...,
    compress: bool = False,
)
Like this so we can decide for every use.
That sounds pretty good to me. Which compression format do you suggest? Would we enforce a suffix on the file corresponding to the compression algorithm?
Agreed, sounds good.
Since we are on 3.14 now we could get fancy: https://docs.python.org/3/library/compression.zstd.html
that might yield even better numbers than the gzip test above
Input file: ./knx_project.json
Initial size: 1364.11 KB
Results (averages over 10 iterations):
  Mode: none
    Stored size: 1364.11 KB (compression: 0.00%)
    Save total: 2.59 ms
      Serialization: 1.62 ms
      Compression: 0.00 ms
      Write IO: 0.97 ms
    Load total: 2.76 ms
      Read IO: 0.33 ms
      Decompression: 0.00 ms
      Parse: 2.44 ms
  Mode: gzip
    Stored size: 73.94 KB (compression: 94.58%)
    Save total: 41.49 ms
      Serialization: 1.58 ms
      Compression: 39.48 ms
      Write IO: 0.43 ms
    Load total: 3.05 ms
      Read IO: 0.07 ms
      Decompression: 0.50 ms
      Parse: 2.48 ms
  Mode: zstd
    Stored size: 77.55 KB (compression: 94.31%)
    Save total: 3.01 ms
      Serialization: 1.57 ms
      Compression: 1.19 ms
      Write IO: 0.26 ms
    Load total: 2.79 ms
      Read IO: 0.05 ms
      Decompression: 0.50 ms
      Parse: 2.24 ms
  Mode: zlib
    Stored size: 81.92 KB (compression: 93.99%)
    Save total: 10.65 ms
      Serialization: 1.57 ms
      Compression: 8.86 ms
      Write IO: 0.22 ms
    Load total: 2.84 ms
      Read IO: 0.06 ms
      Decompression: 0.46 ms
      Parse: 2.33 ms
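A small sketch for comparing stored sizes across the same codecs, in case someone wants to try it on their own files. The compression levels behind the numbers above are not stated, so the levels here are passed explicitly and are purely an assumption; `compressed_sizes` is a made-up helper name. The zstd import is guarded because `compression.zstd` only exists on Python 3.14+.

```python
import gzip
import json
import zlib

try:
    from compression import zstd  # stdlib on Python 3.14+
except ImportError:
    zstd = None


def compressed_sizes(raw: bytes, level: int = 6) -> dict[str, int]:
    """Return the stored size in bytes for each available codec."""
    sizes = {
        "none": len(raw),
        "gzip": len(gzip.compress(raw, compresslevel=level)),
        "zlib": len(zlib.compress(raw, level)),
    }
    if zstd is not None:
        # Library default level; zstd levels are not comparable to deflate levels.
        sizes["zstd"] = len(zstd.compress(raw))
    return sizes


if __name__ == "__main__":
    # Hypothetical repetitive payload, standing in for a real store file.
    payload = json.dumps(
        [{"entity_id": f"sensor.s{i}", "platform": "demo", "disabled_by": None} for i in range(5000)]
    ).encode("utf-8")
    for name, size in compressed_sizes(payload).items():
        print(f"{name}: {size / 1024:.2f} KB")
```

At equal levels gzip and zlib wrap the same deflate stream with different headers, so large gaps between the two in a benchmark usually point at different level settings rather than at the formats themselves.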
Hm, maybe, but when I hear of entity registry files of 1.5 MB, I do want to compress the registries
Or use SQLite
So far HA has been proudly SQL free at its core but I see your point..
Imagine allowing AI to write SQL to query our data...
Efficient search
Yes. I do want vector search, but a bit off topic
but the fact the frontend now does all of the search logic is just... "meh"
Hehe agreed, don't forget about people actually using them (even though they shouldn't) and it being a source for AI stuff
surely the AI can handle decompression, directly or indirectly
Not for context, which is what they mostly use it for
why not? can't you pipe it via a gzip filter?
k
no?
yeah they can
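For tools that read these files directly, handling both variants can be small. A sketch, assuming gzip ends up being the format: it sniffs the two gzip magic bytes and falls back to plain JSON otherwise. `load_json_maybe_gzip` is a hypothetical helper, not something core provides.

```python
import gzip
import json
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def load_json_maybe_gzip(path):
    """Load a JSON file that may or may not be gzip-compressed."""
    payload = Path(path).read_bytes()
    # Valid JSON text cannot start with the control byte 0x1f,
    # so the magic-byte check is unambiguous.
    if payload[:2] == GZIP_MAGIC:
        payload = gzip.decompress(payload)
    return json.loads(payload)
```

This does not address the confirmation prompt for command-line execution; it only shows that reading either variant does not depend on the file suffix.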
As soon as they hit command line execution, a confirmation is needed again
imho it doesn't improve the situation
unless we have a concrete use / need to compress the core entity registry to solve an issue at hand, I don't think we should do it
that said, I'm not against the proposal, but I just want to be sure that we don't use it because we "feel like it", but instead have an actual reason to do so
I think it would be good to start with measuring how often the entity registry is written to.
or actually, track that for all files. Filesize + writes
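A sketch of what that measurement could look like: a small in-memory tracker that counts writes and bytes per store key. `WriteTracker` and its method names are hypothetical; where it would be hooked into the save path is left open, and the key in the usage note is only an example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class WriteStats:
    writes: int = 0          # number of saves seen
    bytes_written: int = 0   # cumulative bytes across all saves
    last_size: int = 0       # size of the most recent save


class WriteTracker:
    """Collect file size and write counts per store key."""

    def __init__(self) -> None:
        self.stats: dict[str, WriteStats] = defaultdict(WriteStats)

    def record(self, key: str, size: int) -> None:
        """Call once per save with the number of bytes written."""
        entry = self.stats[key]
        entry.writes += 1
        entry.bytes_written += size
        entry.last_size = size

    def report(self) -> list[tuple[str, WriteStats]]:
        """Return entries sorted by total bytes written, largest first."""
        return sorted(
            self.stats.items(),
            key=lambda item: item[1].bytes_written,
            reverse=True,
        )
```

Calling `tracker.record("core.entity_registry", len(payload))` on each save and dumping `tracker.report()` after a day would show which files actually dominate write volume.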
if we can go from 1.5 MB to 73 KB (which is possible, lots of duplicate keys), it would be a huge help in saving SD cards
In all honesty, I do not think SD cards are a concern at this point. Changing this would probably trigger repercussions for the existing community and AI tooling.
I'm curious to learn which existing problem we are solving here.
Don't get me wrong, I'm not against compression; especially not in combination with the history use case above. It is more about just enabling it on everything in the kitchen sink that I'm worried about in terms of backlash from the community without having facts on an existing (recent) issue.