#Checking space usage and shrinking/migrating database
1 messages ยท Page 1 of 1 (latest)
overlay2 are the file systems of the containers. Mostly addons. Leave them alone. ha supervisor repair does housekeeping for this.
https://file.coffee/u/2GPUbQMkc1u8qLRhdFYxH.png mnt/data/ This is weird as there's a /docker/ folder suspiciously with the same size as the one in var/lib/
It's mounted in different places. Doesn't necessarily use double the space though.
Also see here about the double thing: <#general-archived message>
Also see here: <#general-archived message>
I don't want to overwhelm you with this stuff.
I can see my mariaDB is f'ing large (I have 2 years set to retain) but I didn't expect it to get that large
So basically /mnt/data/data and /var/lib/docker is the same.
I recommend using a proper Time Series database for long term data. SQL is very inefficient for that.
Influx DB would be one example. Also read this: https://community.home-assistant.io/t/how-to-keep-your-recorder-database-size-under-control/295795
You can shrink the used data inside the DB but the file(s) will stay the same size. Just how InnoDB works.
I'd have to check where HA gets its data from. You can also check df -h.
Also note that gdu shows GiB while HA shows GB and so on and gdu counts the docker directory double.
wiztree doesn't allow me to scan network devices
It does for me. Just click Select Folder. HA seems to use this: https://docs.python.org/3/library/shutil.html#shutil.disk_usage
You might have to run it as admin. Or if you are try without. I remember there was some strangeness with this.
no, tried both ways
It's only really faster for local drives so it doesn't matter much what you use.
so basically the 2 main lumps that smb doesn't detect are the /addons folder, and that is mostly mariaDB (55GB). And then the docker x2 (37GB x2)
so, for the docker thing it's definitive that it's standard and normal that it's doubled. Nothing to be done there?
I'd like to understand what makes it grow, and how to keep it down, at least if possible
You can't change it being there two times. There's quite some inconsistencies how the space is calculated/shown.
Here's some data from my test HAOS VM
# df -h /mnt/data/docker/
Filesystem Size Used Available Use% Mounted on
/dev/sda8 30.8G 6.0G 23.5G 20% /mnt/data
# du -shc /mnt/data/docker/* | sort -rh
10.4G total
10.4G /mnt/data/docker/overlay2
27.8M /mnt/data/docker/image
...
As you can see, df says 6G used, du says 10.4G. gdu says 10.4G too.
You can try docker system df to see some of it but like I said, supervisor should clean this up periodically and through ha supervisor repair.
If you have lots of addons it's normal to have a big overlay2 folder.
Not sure. How overlay works I guess.
You should think about the Time Series database though. HA has long term statistics so you don't need 2 years worth of recorder data to see all.
yeah, I get that. That's going to be my focus
I have heard about the long term statistics but tbh I don't get what it means
for me long term statistics = storing the data for years ๐
Trying to find the blog post.
I use the default 10 days and that already makes my database bloat to 4-8G. I have my long term data in VictoriaMetrics: #installation-archived message
so if I turn down my keep_days back to 10 I will still be able to see the long-term data?
Assuming the data is eligible for it. Yes. It's not as ganular.
I actually did the change to mariadb just because I wanted to keep all the data, and had heard that sql wasn't as good for it.
The problem with 2 years of data in HA is you can't do anything with it. You can count, let's say, average daily energy consumption over a month or so.
If you have it in InfluxDB or similar you can use grafana do create dynamic dashboard to calculate pretty much everything.
yeah, I heard of them too, but never got around to research more and go that route
Well HeidiSQL is still SQL. In my expeience it performs rather badly when you have more data than memory anyways.
I used MariaDB in the beginning because the tools are just better and I'm familiar with it. Like checking space usage per table and so on is infinitely easier.
tbh it's not that I'm looking for dashboards etc
I just like being able to see temperatures over a long time
Nowadays I just use the defaults as my important data is in VictoriaDB. I use HA's graphs just for a quick peek.
got it
never heard of the ones you mentioned, only InfluxDB
seems to be quite popular, so I guess the others are more advanced, and I'm clearly not at that user stage
Here's a year of data for a temperature sensor: https://i.imgur.com/dJ21oHx.png
You can see it goes light blue after 10~ days.
do you think InfluxDB will be good enough then?
I don't like its syntax but it's the most popular choice for this and there's lots of dashboard already.
See here: https://grafana.com/grafana/dashboards/?dataSource=influxdb
Or more specifically for HA: https://grafana.com/grafana/dashboards/?search=home+assistant&dataSource=influxdb
I'm still confused tho
the point is to set up influxdb and then keep 720 days to retain?
Why 720? The recorder is not involved in it. They are separate.
or is 720 def not needed because these supposed long term statistics are still keeping the data somehow?
that's what I guess I'm missing
The long-term statistics table is never purged. ahhhhhhhhhhh
If you set up the influxdb integration: https://www.home-assistant.io/integrations/influxdb/
It justs sends the data to InfluxDB in addition to the recorder.
but it has nothing to do with the long term ones, right?
well, influxDB stores it forever if you want, so it acts as another copy of the full data
No. And if you set this: https://www.home-assistant.io/integrations/recorder/#purge_keep_days
It does not change anything about long term stats or InfluxDB.
I guess you can try to select last 5 years in your history tab and see if it becomes light blue like mine too: <#1216777289270951957 message>
Might load for 1 an hour...
Not all your entities might support long term stats.
nah my mariaDB it's only been set up for less than a year now
so it's still accumulating
I can't imagine having to back up 50G+ of data for HA.
HA's backups are not that efficient.
That's not the stock history panel. History explorer is dead, btw.
really?
Hmm. Have you run HA longer than you had MariaDB?
My thought was that the recorder data might just be as old as the long term stats. Otherwise you can check the developer tools states if the entity has the states it needs the link here says it needs to have: https://developers.home-assistant.io/docs/core/entity/sensor/#long-term-statistics
ouch, that was amusing
so sad though, it was a great tool
I think we can go back to the main HA channels now. The thread served its purpose.
thank you SO much for all the help
would you mind if I ping you to continue with this?
No that's fine.
really feel it's the right time for me to do this db change and set it up fine for future
I'd recommend you run both in parallel until your're ready to switch. Like I said earlier, they are separate.
https://file.coffee/u/ruGDEDkLfM4DxWfoOGk0h.png this one supposedly has the state_class needed for long term
you meant it should have the earlier 6 months before mariaDB?
also, just to clarify for me, even if I switch keep_days back to default 10, that should purge (whenever the next one is) the recorder values but still retain the long-term ones, correct?
Yes. You should have long term stats from the time of first installation. At least as far as I know. You are using a recent HA version, yeah?
See my picture here: I use a purge of 10 days: <#1216777289270951957 message>
yeah, last stable version
yeah, I saw
idk, maybe I somehow cleared everything when changing the db?
tbh didn't know much of what I was doing
The attributes for that sensor are
state_class: measurement
unit_of_measurement: ยฐC
device_class: temperature
friendly_name: ...
As far as I know you can't really delete long term stats.
weird then :S
recorder:
db_url: !secret mariadb_url # Use new MariaDB
purge_keep_days: 730 # Retain data for 2 years instead of default 10 days
this is what I did to my config
Well unless you destroy the whole DB, of course. Switching from SQLite to MariaDB might count as such a case as you basically create a new database from scratch.
Yes. You can run both in parallel for a year though and don't lose anything. The point of the switch is to have better efficiency and ways to inspect the data.
but whenever I switch that recorder url it will get replaced and deleted, right?
Well yes and no. InfluxDB does NOT replace the recorder.
ah
The idea is to store ganular recorder data for only 10 days, then have recorder long term stats and granular + long term via InfluxDB.
got it. So what should I do with my mariaDB then? Atm it seems I'm storing granular for 2 years, so long term cannot even be seen
so I guess if I purge over 10 days, it will switch to granular for 10, then long term
and then I set up influxDB as a "backup" DB for granular long term
am I right?
You could try reducing purge days by a month or so and see if long term stats show up. But if you only have 6 months of data ๐คท
what do you mean only 6 months? I have almost a year.
11 months then ๐
okok got it
I mean that 11 months is less than, let's say, 700 days.
If not maybe the folks in #integrations-archived can tell you more. I'm not an expert with the long term stats.
not managing to purge
just to get my roadmap clear. Would you to switch the recorder back to default db?
or stick to mariaDB?
Depends why you use MariaDB. I use SQLite at the moment. The only benefits of MariaDB I see is the better tools to inspect the database.
I never looked at the db myself really
just thought it was more efficient for long term
but now understanding the recorder won't be responsible of long term, I don't think it matters
right?
I think it's still a bit faster. I switched because it uses more memory and space.
Long term stats are still saved in the recorder database.
They don't take a lot of storage though as they are not that granular.
ah ok. Well so to avoid losing all the data again, I will just stick with mariaDB for now
tbh don't think I even need influxDB if the long term thing really works as it seems
don't really use the granular bit that much
See here for some reasons why the database can't be easily shrunk even if you remove the data itself: https://www.google.com/search?q=shrink+innodb
InfluxDB is useful if you want to do stuff with the data. Like let's say check the average power consumption of the last hour or so.
Or store glances data for CPU usage per process or whatever.
well if that's the case I might end up just changing back to default just for clearing the db in full xD
don't really understand much unfortunately
but I get that even after purged my 55GB won't reduce much
so I guess my only option is to start from 0 again
Think of the database file like belly skin that is now flappy after you've lost weight.
yeah, makes sense
is that the same for any db though? Or just mariaDB?
meaning, is everyone forced to reset their DB because it just keeps inflating infinitely?
ah no, if you stick to the 10d rule, it won't increase
silly me
It only inflates to the biggest it has been.
yep
so it was dumb of me to do it ๐
so once again. Knowing I'm starting form 0. What do you recommend for recorder DB? Keeping in mind I'm a user, not a dev. ๐
I'd stick to SQLite.
priority is ease of use, and popularity (for community support)
ok
funny thing it still had the original sqlite
so there's the old data
So at least it works.
do you know if there's a way to migrate the data?
Not officially.
well, I guess I don't want the granular data, and the long term one wasn't stored
so it'd be complex
nvm
I'll survive
happy to recover 55GB
also prob mariaDB and that db was responsible of some of that docker folder
so might see even a larger reduction
The devs talk about this somewhere from here: https://www.youtube.com/watch?v=IGnCGDaXR0M&t=2636s
Sorry, don't have a more accurate timestamp.
Otherwise: https://www.google.com/search?q=homea+ssistnat+mysql+sqlite++migrate
yeah found something, but again I'd need to trim it down to 10d granular, or else I'll have the same issue if I migrate everything
and I don't think making mariadb process the data and convert it to long term manually is going to be straightforward
nvm
48:30
found a migration guide and the video suggested it was possible too, so going to try
and first I am purging all the long term
https://file.coffee/u/HOHTHS96VcJpCgIT5w5Wl.png it's going to take the night, but it should reduce the size a lot before migrating it
looking good!
Hey @lilac dune just FYI. I learned a few things.
- You can check db size without SSH access by going to Settings > System > Repairs > 3 dots > System information, Scroll down to Recorder.
- I had a 55GB db as you saw. I started purging the db before migrating back to SQLite. https://file.coffee/u/QjQ68HVvSSv5c46W-zbH5.png Still working on it after 2 days. ๐ Anyway, when purging there are these handy options https://file.coffee/u/VDAQBiggt4N22i45t3qZ1.png that seem to take much longer but help reducing the db size after the purge.
- It seems to be working! https://file.coffee/u/Pk1OxvrxbWD9rw7O2wX_Z.png Db currently sitting at 19GB and going down. ๐ Will see how small it ends with. I will still migrate it back to SQLite. The video info was very useful.
1
You can also use a SQL sensor for this: https://www.home-assistant.io/integrations/sql/
And file size for SQLite: https://www.home-assistant.io/integrations/filesize/
This is mine right now: https://i.imgur.com/IZTSjN5.png
See here for queries and tops to keep the database small. Not sure if the queries work with MySQL too: https://community.home-assistant.io/t/how-to-keep-your-recorder-database-size-under-control/295795
See here for MySQL queries you can run in phpMyAdmin: https://stackoverflow.com/questions/9620198/how-to-get-the-sizes-of-the-tables-of-a-mysql-database
2
Yeah that's handy but unfortunately it doesn't show progress or anything. This also happens periodically on its own btw: https://www.home-assistant.io/integrations/recorder/#auto_repack
Cool, Might create a sensor once I migrate then. It's handy to have the history. What's the diff between the 2 sensors you got?
- Then that solves the issue of "skin staying flacid" no? Sort of
I think my purge is completed now. Got it down to 900MiB omfg
such a reduction
Well one checks the actual file size, the other the data SQLite reports.
Not in the case of MySQL. See here again for why: #1216777289270951957 message
that says that, but as I said my MySQL one has shrunk heavily after the purge.
Maybe it does a recreate.
I'm trying to double check with gdu but can't find the command to launch it
I heavily updated my post about it. Take another look: #general-archived message
How come HA now reports less (110GB) that gdu (120+37.7GiB)?
For what exactly? I thought I explained that alst time.
I don't know how often that updates.
maybe you did, sorry. I don't understand
you think gdu is more reliable? Not sure why it would increase suddenly though, as I've only been reducing it
The gist is that overlays skew the data a bit and they calculate it differently.
I just checked and can confirm db has shrunk 54GB ๐
overlays folder hasn't changed a bit though ๐ฆ I thought it would since you said it was related to addons, and thought it would be MariaDB fault
You can actually trigger the compaction ayourself in phpMyAdmin with optimize table but it's kind of write heavy as it writes a copy of the database, then deletes the old one.
I think SQLite does the same, btw.
trigger what? The purge? I'm happy to do it via HA purge service. Not that familiar with phpmyadmin and SQL anyway
The repack.
yeah, prefer to do it via HA
will migrate back to sqlite now, and delete mariadb altogether. And will check again. Really hoping to see overlays reduce somewhat ๐ฆ
appreciate it, but really, way too messy when HA is one click ๐
Still good to know about.
As the livestream video said, if you use MySQL you are the DB admin ๐
yeah, true. Notice how quick I'm moving back ๐
Mind helping me with this? https://file.coffee/u/he7D2ssk8M5eC-uQ6U4il.png
got an error when trying to install the migration tool
the guide is this one: https://community.home-assistant.io/t/migrate-back-from-mariadb-to-the-default-sqlite/604278
Try pipx.
apk add py3-pipx ??
Check if it's installed first. I'd have to check the package name first. Am mobile right now.
no, command not found
Try apk add pipx.
yeah, it worked
Then run the same command but just with x appended.
installed package mysql-to-sqlite3 2.1.10, installed using Python 3.11.8
These apps are now globally available
- mysql2sqlite
โ ๏ธ Note: '/root/.local/bin' is not on your PATH environment variable. These apps will not be globally accessible until your PATH is
updated. Run `pipx ensurepath` to automatically add it, or manually modify your PATH in your shell's config file (i.e. ~/.bashrc).
done! โจ ๐ โจ
Got this
should I do the ensurepath?
tbh the tutorial said I should have got this warning, but I didn't https://community-assets.home-assistant.io/original/4X/e/c/3/ec35101ae3d355e482ddb08e1b6057f3869d8804.png
It told you it's not in your path
Allso please post this in code blocks, no pictures.
If you do I can copy paste your paths from there and give you a proper command. I don't want to retype it.
It's also easier to read.
โ ~ pipx install mysql-to-sqlite3
installed package mysql-to-sqlite3 2.1.10, installed using Python 3.11.8
These apps are now globally available
- mysql2sqlite
โ ๏ธ Note: '/root/.local/bin' is not on your PATH environment variable. These apps will not be globally accessible until your PATH is
updated. Run `pipx ensurepath` to automatically add it, or manually modify your PATH in your shell's config file (i.e. ~/.bashrc).
done! โจ ๐ โจ
โ ~
โ ~
โ ~ pipx ensurepath
Success! Added /root/.local/bin to the PATH environment variable.
Consider adding shell completions for pipx. Run 'pipx completions' for instructions.
You will need to open a new terminal or re-login for the PATH changes to take effect.
Otherwise pipx is ready to go! โจ ๐ โจ
โ ~ ha core stop
Processing... Done.
Command completed successfully.
โ ~ mysql2sqlite -f /config/home-assistant_v2.db -d homeassistant -u homeassistant -h homeassistant.local -P 3306 -p -V --use-buffered-cursors
zsh: command not found: mysql2sqlite
โ ~ zsh: command not found: mysql2sqlite
Try /root/.local/bin/mysql2sqlite ....
cd there? Or just add that instead of just the name at the beginning of the command?
No.
You can also run export PATH="/root/.local/bin:$PATH" to make it runnable with just mysql2sqlite
RIght now it doesn't know where mysql2sqlite is. There's specific paths it looks at to check for commands
Try echo $PATH.
this worked
migrating now
tahnk you very much
Checking space usage and shrinking/migrating database
https://file.coffee/u/YzMqP8MWbM7tLppJ58z4Z.png Sorry, how do I use these commands? Already did the commenting, just need to save and close the file.
Which commands?
the ones shown at the bottom
I mean key shortcuts
Yeah sorry it took me a second. Just CTRL+X and so on.
Just enter.
that is SO misleading lol
nano is one of the easiest editors. You should try vi ๐
soooo... after rebooting with the sqlite db now the db is empty xD
the migration seemed to work, and no errors
but now history shows nothing. Not even the old data from the initial months
will ask in the forum thread :S
config/home-assistant_v2.db.corrupt.2024-03-13T21:03:18.844016+00:00
maybe that's a flag
Yeah I'm not sure if that conversion works properly.
Tried this, but after booting it still found it corrupt ๐ฆ https://community.home-assistant.io/t/restore-corrupt-database-after-nightly-purge/602377/3?u=chaoscontrol
Guess there's a reason the devs don't have docs for it if it were that simple.
Did you stop HA while doing the migration?
yep
I managed to make it work trying the migration again. But this time, instead of migrating into the old original HA database and merging with it, I did it into a brand new db. And it worked fine. Obv I don't have the data from those first HA months now though.
But still have the file from the first migration which holds that data. Do you know if there's a way to properly migrate that date range only into the current working sqlite db?