#Store compression


rare sinew
#

I've vibed a little benchmark script using the same methods core uses to store those JSON files. Here are the results:

Input file: ./knx_project.json
Initial size: 1364.11 KB

Results (averages over 3 iterations):
Mode: none
  Stored size: 1364.11 KB (compression: 0.00%)
  Save time: 0.32 ms
  Load time: 3.60 ms
Mode: gzip
  Stored size: 73.94 KB (compression: 94.58%)
  Save time: 39.03 ms
  Load time: 3.06 ms
#

Those results are from an M1 Pro MacBook, so save and load times will be significantly different from what you'd see on a Raspberry Pi with an SD card... not sure if we should optimize for that, but users still seem to use those too 😬

#

Here with more detail

Input file: ./knx_project.json
Initial size: 1364.11 KB

Results (averages over 3 iterations):
Mode: none
  Stored size: 1364.11 KB (compression: 0.00%)
  Save total: 2.29 ms
    Serialization: 1.79 ms
    Compression: 0.00 ms
    Write IO: 0.50 ms
  Load total: 3.73 ms
    Read IO: 0.27 ms
    Decompression: 0.00 ms
    Parse: 3.46 ms
Mode: gzip
  Stored size: 73.94 KB (compression: 94.58%)
  Save total: 40.05 ms
    Serialization: 1.58 ms
    Compression: 38.30 ms
    Write IO: 0.17 ms
  Load total: 2.92 ms
    Read IO: 0.06 ms
    Decompression: 0.48 ms
    Parse: 2.39 ms
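For readers who want to reproduce a breakdown like the one above, here is a minimal sketch of such a benchmark. It is not the script used in this thread: the `benchmark` function, the synthetic `sample` data, and the use of the standard-library `json` and `gzip` modules are all assumptions for illustration, and core may serialize differently. Note that the read is served from the OS page cache, so the read IO figure will be optimistic.

```python
import gzip
import json
import os
import tempfile
import time


def benchmark(data, compress):
    """Run one save/load round trip and time each phase in milliseconds."""
    timings = {}

    def timed(phase, func, *args):
        # Time a single call and store the elapsed milliseconds under `phase`.
        start = time.perf_counter()
        result = func(*args)
        timings[phase] = (time.perf_counter() - start) * 1000
        return result

    def write_file(path, payload):
        with open(path, "wb") as file:
            file.write(payload)

    def read_file(path):
        with open(path, "rb") as file:
            return file.read()

    raw = timed("serialization", lambda: json.dumps(data).encode("utf-8"))
    payload = timed("compression", lambda: gzip.compress(raw) if compress else raw)

    fd, path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    try:
        timed("write_io", write_file, path, payload)
        stored = timed("read_io", read_file, path)
    finally:
        os.remove(path)

    decoded = timed("decompression", lambda: gzip.decompress(stored) if compress else stored)
    loaded = timed("parse", json.loads, decoded)
    return {"stored_bytes": len(payload), "timings_ms": timings, "loaded": loaded}


# Synthetic sample with many repeated keys, standing in for the real project file.
sample = {"devices": [{"address": f"1.1.{i}", "name": f"Device {i}"} for i in range(2000)]}
plain = benchmark(sample, compress=False)
packed = benchmark(sample, compress=True)
print(plain["stored_bytes"], packed["stored_bytes"])
```

Averaging over several iterations, as the results above do, would smooth out noise from a single run.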
median edge
#

But uh...

#

why? 😄

#

As in, what are we trying to solve here?

true acorn
#

Holy shit, the KNX file is huge. Yes, for that size difference it's probably worth it

rare sinew
#

I guess backups are compressed anyway.

#

In KNX we store a communication history that can grow up to ~15 MB (size configurable, default is much less 😉)

median edge
#

hehe uh

#

maybe that file isn't really suitable for that :S haha

#

have you considered using SQLite or something instead maybe? :S

rare sinew
#

The HACS repository cache is also ~1.5 MB. Same for the entity registry on my installation.

rare sinew
true acorn
#

Yeah those sizes are not good

median edge
#

The thing is, if we compress this, we will affect the part of the community that consumes these files directly

#

probably not for this specific case. But I would consider splitting things like communication history out (and maybe compress that specifically in that case)

rare sinew
#

Therefore I'd like to make it optional. I'd use it for data that is stored once and read on every start (like the KNX project or other config) or for really huge data (HACS cache, maybe automation traces).
For the entity registry maybe not 😬

#
Store(
    ...,
    compress: bool = False,
)

Like this so we can decide for every use.
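As a rough sketch of what such an opt-in flag could look like: the `SketchStore` class below is hypothetical and is not the real core `Store`. It only illustrates a per-store `compress` parameter, assuming gzip and the standard-library `json` module. It also ignores things a real implementation would need, such as atomic writes and migrating existing uncompressed files.

```python
import gzip
import json
from pathlib import Path
from typing import Any


class SketchStore:
    """Hypothetical stand-in for a storage helper; not the real core API."""

    def __init__(self, path: Path, compress: bool = False) -> None:
        self._path = path
        self._compress = compress

    def save(self, data: Any) -> None:
        # Serialize first, then compress only if this store opted in.
        raw = json.dumps(data).encode("utf-8")
        if self._compress:
            raw = gzip.compress(raw)
        self._path.write_bytes(raw)

    def load(self) -> Any:
        raw = self._path.read_bytes()
        if self._compress:
            raw = gzip.decompress(raw)
        return json.loads(raw)
```

Each integration would then pick the flag per store, for example `SketchStore(Path("knx_project.json"), compress=True)` for a large, rarely written file and the default for everything else.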

hardy trench
#

That sounds pretty good to me. Which compression format do you suggest? Would we enforce a suffix on the file corresponding to the compression algorithm?

median edge
#

Agreed sounds good 👍

rare sinew
#

that might yield even better numbers than the gzip test above

rare sinew
#
Input file: ./knx_project.json
Initial size: 1364.11 KB

Results (averages over 10 iterations):
Mode: none
  Stored size: 1364.11 KB (compression: 0.00%)
  Save total: 2.59 ms
    Serialization: 1.62 ms
    Compression: 0.00 ms
    Write IO: 0.97 ms
  Load total: 2.76 ms
    Read IO: 0.33 ms
    Decompression: 0.00 ms
    Parse: 2.44 ms
Mode: gzip
  Stored size: 73.94 KB (compression: 94.58%)
  Save total: 41.49 ms
    Serialization: 1.58 ms
    Compression: 39.48 ms
    Write IO: 0.43 ms
  Load total: 3.05 ms
    Read IO: 0.07 ms
    Decompression: 0.50 ms
    Parse: 2.48 ms
Mode: zstd
  Stored size: 77.55 KB (compression: 94.31%)
  Save total: 3.01 ms
    Serialization: 1.57 ms
    Compression: 1.19 ms
    Write IO: 0.26 ms
  Load total: 2.79 ms
    Read IO: 0.05 ms
    Decompression: 0.50 ms
    Parse: 2.24 ms
Mode: zlib
  Stored size: 81.92 KB (compression: 93.99%)
  Save total: 10.65 ms
    Serialization: 1.57 ms
    Compression: 8.86 ms
    Write IO: 0.22 ms
  Load total: 2.84 ms
    Read IO: 0.06 ms
    Decompression: 0.46 ms
    Parse: 2.33 ms
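One possible explanation for the gap between gzip and zlib above, which is not confirmed anywhere in this thread: both use DEFLATE, but in Python `gzip.compress` defaults to level 9 while `zlib.compress` defaults to level 6. The sketch below compares them at matching levels on synthetic data (the `sample` payload and the `measure` helper are assumptions for illustration). zstd is left out because it needs a third-party package on most Python versions.

```python
import gzip
import json
import time
import zlib

# Synthetic sample with many repeated keys, standing in for the real project file.
sample = {"devices": [{"address": f"1.1.{i}", "name": f"Device {i}"} for i in range(2000)]}
raw = json.dumps(sample).encode("utf-8")


def measure(name, compress_func):
    """Compress `raw` once and return (name, size in bytes, elapsed ms, payload)."""
    start = time.perf_counter()
    packed = compress_func(raw)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return name, len(packed), elapsed_ms, packed


results = [
    measure("gzip default (level 9)", lambda data: gzip.compress(data)),
    measure("gzip level 6", lambda data: gzip.compress(data, compresslevel=6)),
    measure("zlib default (level 6)", lambda data: zlib.compress(data)),
]
for name, size, elapsed_ms, _ in results:
    print(f"{name}: {size} bytes in {elapsed_ms:.2f} ms")
```

If the two level 6 rows come out close to each other on real data, the choice between gzip and zlib is mostly about the container format rather than speed.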
true acorn
rare sinew
#

Or use SQLite 😛

true acorn
#

So far HA has been proudly SQL free at its core but I see your point..

#

Imagine allowing AI to write SQL to query our data...

#

Efficient search

median edge
#

Yes. I do want vector search, but a bit off topic

#

but the fact the frontend now does all of the search logic is just... "meh"

median edge
hardy trench
#

surely the AI can handle decompression, directly or indirectly

median edge
#

Not for context; which is what they use it mostly for

hardy trench
#

why not? can't you pipe it via a gzip filter?

median edge
#

k

hardy trench
#

no?

true acorn
#

yeah they can

median edge
#

As soon as they hit command line execution, a confirmation is needed again

#

imho it doesn't improve the situation

#

unless we have a concrete use / need to compress the core entity registry to solve an issue at hand, I don't think we should do it

#

that said, I'm not against the proposal, but I just want to be sure that we don't use it because we "feel like it"; but instead have an actual reason to do so

true acorn
#

I think it would be good to start with measuring how often the entity registry is written to.

#

or actually, track that for all files. Filesize + writes
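A minimal sketch of what tracking file size and writes per file could look like. `WriteTracker` and `FileStats` are hypothetical names, and where such a hook would live in core is left open; the idea is simply to call `record` wherever a store writes its payload.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FileStats:
    writes: int = 0
    bytes_written: int = 0
    last_size: int = 0


class WriteTracker:
    """Hypothetical helper that counts writes and bytes per storage key."""

    def __init__(self) -> None:
        self._stats: dict[str, FileStats] = defaultdict(FileStats)

    def record(self, key: str, payload: bytes) -> None:
        stats = self._stats[key]
        stats.writes += 1
        stats.bytes_written += len(payload)
        stats.last_size = len(payload)

    def report(self) -> list[tuple[str, FileStats]]:
        """Return entries sorted by total bytes written, largest first."""
        return sorted(self._stats.items(), key=lambda item: item[1].bytes_written, reverse=True)


tracker = WriteTracker()
tracker.record("entity_registry", b"x" * 1500)
tracker.record("entity_registry", b"x" * 1600)
tracker.record("knx_project", b"x" * 1400)
for key, stats in tracker.report():
    print(key, stats.writes, stats.bytes_written, stats.last_size)
```

Sorting by total bytes written surfaces files that are both large and written often, which is the combination that matters for SD card wear.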

#

if we can go from 1.5 MB to 73 KB (which is possible, lots of duplicate keys), it would be a huge help in sparing SD cards

median edge
#

In all honesty, I do not think SD cards are a concern at this point. Changing this would probably trigger repercussions for the existing community and AI tooling.
I'm curious to learn which existing problem we are solving here.

Don't get me wrong, I'm not against compression; especially not in combination with the history use case above. It is more about just enabling it on everything in the kitchen sink that I'm worried about in terms of backlash from the community without having facts on an existing (recent) issue.