gentle relic Aug 17, 2024, 5:51 PM

#

HA OS is killing off processes and crippling the migration. I can reboot from the shell, but migration just starts again and I end up here.

copper sentinel Aug 17, 2024, 5:52 PM

#

Please install the glances addon and share what it looks like. Make sure to press z and m to show processes and sort by memory usage before.

fallow rapids BOT Aug 17, 2024, 5:52 PM

#

Please use imgur or other image sharing web sites, and share the link here.

Image posting is blocked in most channels to discourage people from sharing text as images. Sharing text as images assumes that everybody sees the world as you do, which isn't the case. Some people are colour blind, or have visual impairment that means they can't make sense of an image of text.

gentle relic Aug 17, 2024, 5:54 PM

#

After the OS killed some stuff off, it settled back a bit. Here is glances now: https://imgur.com/a/vzs2MSI

copper sentinel Aug 17, 2024, 5:55 PM

#

Make sure to press z and m to show processes and sort by memory usage before.

gentle relic Aug 17, 2024, 5:56 PM

#

https://imgur.com/a/Jmn5qFo

copper sentinel Aug 17, 2024, 5:57 PM

#

This is still not sorted by memory.

gentle relic Aug 17, 2024, 5:59 PM

#

Argh. Now when I press any key in glances (like h or m to sort) it just beeps at me and does nothing. I tried leaving glances and coming back, and refreshing the app (cmd+r).

copper sentinel Aug 17, 2024, 5:59 PM

#

:<

gentle relic Aug 17, 2024, 6:00 PM

#

It did that a minute ago and refreshing brought it back.

copper sentinel Aug 17, 2024, 6:01 PM

#

Some things might be out of view this way but I can see that HA itself uses 2G alone which I think is a bit high.

gentle relic Aug 17, 2024, 6:01 PM

#

https://imgur.com/a/zI3VAXI

#

I had to use the browser. The macOS App is not always fun 🙂

copper sentinel Aug 17, 2024, 6:01 PM

#

I'd try starting HA in safe mode and see if the memory usage goes down. Also check this: #general-archived message

gentle relic Aug 17, 2024, 6:02 PM

#

I might need to reboot it again. I get this when trying to restart in safe mode:

#

Failed to restart Home Assistant
The system cannot restart while a database upgrade is in progress.

copper sentinel Aug 17, 2024, 6:03 PM

#

Do you know how big your database is roughly?

#

You can check with ls -lh /config/ via the SSH addon.

#

You can monitor the core logs with ha core logs -vf.

#

Might take a while if the database is large or the host is slow. I saw that there was a decent amount of IO. For this kind of storage at least.

gentle relic Aug 17, 2024, 6:04 PM

#

Yup, I've been watching both OS and CORE logs in an SSH session. Here is the db:

#

-rw-r--r-- 1 root root 675037184 Nov 25 2021 home-assistant_v2.db

#

-rw-r--r-- 1 root root 643.8M Nov 25 2021 home-assistant_v2.db

#

(-lh)
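
For reference, the two listings above show the same size in different units. A minimal sketch of the conversion, assuming `ls -lh` here uses binary (1024-based) units with one decimal place; the helper name `human_size` is made up for illustration:

```python
def human_size(num_bytes: int) -> str:
    """Format a byte count with 1024-based units and one decimal place."""
    units = ["B", "K", "M", "G", "T"]
    size = float(num_bytes)
    for unit in units:
        # Stop at the first unit where the value fits, or at the largest unit.
        if size < 1024 or unit == units[-1]:
            return f"{size:.1f}{unit}"
        size /= 1024


# Size of home-assistant_v2.db as reported by plain `ls -l`.
print(human_size(675037184))
```

So the SQLite file is about 644 MiB, well under a gigabyte.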

#

But it seems OS has killed off the containers and so the migrations get halted until I reboot

copper sentinel Aug 17, 2024, 6:07 PM

#

I wouldn't reboot just yet. The writes hint towards the upgrade being underway.
You can also monitor progress with watch -n1 ls -lah /config/ | sort -h. A temporary file should be written somewhere.
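
The same idea (keep an eye on which files in the config directory are growing) can be sketched in Python. This is only an illustration: `largest_files` and `watch_directory` are hypothetical helper names, and `/config` is the path from the message above. Note that as typed, the shell pipes the output of `watch` itself into `sort`, so the pipeline may need to be quoted for `watch` to get a refreshing, sorted view.

```python
import os
import time


def largest_files(directory: str, count: int = 5) -> list[tuple[str, int]]:
    """Return up to `count` (name, size_in_bytes) pairs, largest first."""
    entries = []
    with os.scandir(directory) as it:
        for entry in it:
            # Only regular files; skip directories and do not follow symlinks.
            if entry.is_file(follow_symlinks=False):
                size = entry.stat(follow_symlinks=False).st_size
                entries.append((entry.name, size))
    return sorted(entries, key=lambda item: item[1], reverse=True)[:count]


def watch_directory(directory: str, interval: float = 1.0, iterations: int = 10) -> None:
    """Print the largest files every `interval` seconds, `iterations` times."""
    for _ in range(iterations):
        for name, size in largest_files(directory):
            print(f"{size:>14,d}  {name}")
        print("-" * 40)
        time.sleep(interval)
```

Calling `watch_directory("/config")` would print the five largest files once per second for ten rounds, which is enough to see whether a temporary file is being written.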

gentle relic Aug 17, 2024, 6:07 PM

#

Lots of OS log entries like:

#

2024-08-17 17:49:33.490 owl kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/docker-bbde1c54fe766666822523dae17d9b1f40dcc44b4b26894a74b700cb5648d5dd.scope,task=beam.smp,pid=26335,uid=0

#

"watch -n1 ls -lah /config/ | sort -h" - very cool! Watching...

#

Ah, and now the app has lost its connection. 😦

#

but I am still SSH'ed in from a macOS terminal

copper sentinel Aug 17, 2024, 6:09 PM

#

SSH runs in its own container. Maybe the homeassistant container was killed.

#

ha host logs -vf should tell you this too.

#

You could try stopping the addons (besides SSH, of course) to gain back some head room but I think this should not be necessary.

gentle relic Aug 17, 2024, 6:11 PM

#

no output from watch or ha host logs now... Seems like it's ground to a halt.

#

top in ssh session shows basically zero activity.

copper sentinel Aug 17, 2024, 6:14 PM

#

I have a somewhat poor idea if you have backups. Stop core, rename the database file and let it create a new one. Restore core later.

gentle relic Aug 17, 2024, 6:15 PM

#

yup, daily backups on a SMB share to a Mac Mini

copper sentinel Aug 17, 2024, 6:16 PM

#

Another way would be to re-install HA completely, restore core and if that upgraded fine restore the addons.
I don't really like these options but I don't have another idea (aside from what I shared initially) right now if the logs are quiet.

gentle relic Aug 17, 2024, 6:17 PM

#

let's try the first one. I issued ha core stop in the terminal but it took several seconds and then:

#

ha core stop

Processing... Done.

Error: Another job is running for job group home_assistant_core

copper sentinel Aug 17, 2024, 6:18 PM

#

ha jobs info should tell you what they are. Likely the upgrade.

fallow rapids BOT Aug 17, 2024, 6:18 PM

#

To format your text as code, enter three backticks on the first line, press Enter for a new line, paste your code, press Enter again for another new line, and lastly three more backticks.
```yaml
example: here
```
Don't forget you can edit your post rather than repeatedly posting the same thing.

gentle relic Aug 17, 2024, 6:20 PM

#

huh, some watchdog must have kicked in because the app just reconnected and is going through startup as if I had restarted CORE

#

~ # ha jobs info
ignore_conditions: []
jobs:
- child_jobs: []
  done: false
  errors: []
  name: addon_restart_after_problem
  progress: 0
  reference: core_samba
  stage: null
  uuid: d627493951d04821b0f9ff50a94e6225
- child_jobs: []
  done: false
  errors: []
  name: home_assistant_core_restart
  progress: 0
  reference: null
  stage: null
  uuid: 3706b3c713494a22b25e63f3dff7946a
~ id:browse

copper sentinel Aug 17, 2024, 6:21 PM

#

Lol. That last link 😄
That's why we use code blocks 🙂

#

Gotta go for now.

gentle relic Aug 17, 2024, 6:22 PM

#

glances is back up. https://imgur.com/a/xZMaBIA

#

ok, thanks @copper sentinel for the help!

#

675037184 Nov 25 2021 home-assistant_v2.db

#

It seems this file has not been updated in a looong time. I moved to mariadb - is that the reason? Is the database in some other place?

gentle relic Aug 17, 2024, 6:46 PM

#

OK, so I renamed home-assistant_v2.db and rebooted. The system came back up and said it is migrating the database. Core logs say the same: The database is about to upgrade from schema version 44 to 45

#

But there is no new home-assistant_v2.db

gentle relic Aug 17, 2024, 7:04 PM

#

And I'm back at 100% SWAP etc.

#

and OS just killed off a bunch of stuff

gentle relic Aug 17, 2024, 7:35 PM

#

I disabled both the influxdb and mariadb addons and rebooted. Stable (but no recorder). I re-enabled and started influxdb. So far stable. Homeassistant process only using 743M. 64% MEM use, no SWAP use. Now to enable mariadb...

#

Now I started mariadb and curiously there is no MEM/SWAP creep yet. Also no message on upgrading the database. Maybe I need to restart to trigger that?

gentle relic Aug 17, 2024, 7:58 PM

#

I found the mariadb database itself mounted in the container

#

Strike that. SWAP is at 100% and OS is killing things off.

#

mariadb is huge...
```
config # docker exec 426efbbf3f62 du -sh /data/databases
24.2G /data/databases
```

copper sentinel Aug 17, 2024, 8:40 PM

#

gentle relic It seems this file has not been updated in a looong time. I moved to mariadb -...

Oh. That explains it. Yeah. This is the default sqlite file.

#

What are your recorder settings?

gentle relic Aug 17, 2024, 8:50 PM

#

Recorder in configuration.yaml

#

  db_url: !secret recorder_url
  purge_keep_days: 120

#

recorder_url: mysql://homeassistant:Chr0n1cle@core-mariadb/homeassistant?charset=utf8mb4

#

426efbbf3f62 in my docker command above is the mariadb container

#

migration just failed. Here is the log message. https://imgur.com/a/R7g2wDe

copper sentinel Aug 17, 2024, 8:54 PM

#

120 can be quite a lot.

#

You don't need more than 10 usually due to long term statistics: #1216777289270951957 message

gentle relic Aug 17, 2024, 8:55 PM

#

is 120 the days of full resolution data? Will it also keep a sub-sampled version for much longer?

#

aha. So let me set that to much smaller.

#

Here is the final error in the log that got cut off in the imgur: sqlalchemy.exc.PendingRollbackError: Can't reconnect until invalid transaction is rolled back. Please rollback() fully before proceeding (Background on this error at: https://sqlalche.me/e/20/8s2b)

copper sentinel Aug 17, 2024, 8:56 PM

#

Yeah and HA's recorder is very inefficient at storing such data. If you want granular data for a long time I recommend Grafana and a time series database.

#

HA's database is not quite my expertise.

gentle relic Aug 17, 2024, 8:57 PM

#

I do have Grafana installed. What time-series DB? MySQL?

copper sentinel Aug 17, 2024, 8:57 PM

#

I use VictoriaMetrics but InfluxDB is available as addon.

gentle relic Aug 17, 2024, 8:57 PM

#

I also have InfluxDB but it is not used now

copper sentinel Aug 17, 2024, 8:57 PM

#

MySQL is not a time series database.

#

But back to topic, I'm not quite sure how to fix the SQL error.

gentle relic Aug 17, 2024, 8:59 PM

#

So the immediate question is how do I recover at least some of the data I have. I keep water bill (3 month) stats etc...

#

The help in the referenced webpage says: When a connection is invalidated, any Transaction that was in progress is now in an invalid state, and must be explicitly rolled back in order to remove it from the Connection.

#

I suspect when OS terminates the container due to OOM (out of memory) it has corrupted the mariadb.

#

I was reading in other posts that mariadb was recommended some time ago due to DB corruption issues and that that is no longer a problem so I should use the built-in DB support for recorder.

#

That's why I had switched to mariadb back then

copper sentinel Aug 17, 2024, 9:01 PM

#

You could try to temporarily switch the recorder to the default just so you can start. Disable all your non-essential addons, then restore both HA and the MySQL addon to an earlier state.
Have you tried safe mode yet?

gentle relic Aug 17, 2024, 9:02 PM

#

I'm in safe mode now 🙂

copper sentinel Aug 17, 2024, 9:02 PM

#

HA tends to get very slow when the database is bigger than 1G or so.

gentle relic Aug 17, 2024, 9:03 PM

#

so comment out the configuration.yaml recorder entry and reboot? Then it will use the default (mysql?)

copper sentinel Aug 17, 2024, 9:03 PM

#

The default is SQLite but yes.

#

The .db file is SQLite.

gentle relic Aug 17, 2024, 9:04 PM

#

ah. ok. sigh. I wonder if I can recover some of the mariadb - maybe I can manually do a rollback?

copper sentinel Aug 17, 2024, 9:05 PM

#

Perhaps. I'm very rusty. Haven't played DBA in a while now.

#

If you restart MySQL the transactions should be invalidated so I don't think that's it. Like it says, a transaction has to succeed.

gentle relic Aug 17, 2024, 9:07 PM

#

the mariadb log shows it does a check on startup and it seems to say all the tables are OK.

#

I set the days to keep to 14 - I'll try one more reboot in the hope that 1. mariadb is not actually corrupted and 2. HA will trim the DB back BEFORE trying to migrate it...

copper sentinel Aug 17, 2024, 9:09 PM

#

"Trim" only happens on Sundays.

#

It's kind of an intense process.

gentle relic Aug 17, 2024, 9:10 PM

#

ah. OK, commenting out the recorder entry in config. I can still start mariadb and if I get ambitious I might try to salvage some data from it.

copper sentinel Aug 17, 2024, 9:13 PM

#

bdraco is the database wizard here but I don't recommend pinging them.
Rather search for the last error in the github organization's issues.

gentle relic Aug 17, 2024, 9:14 PM

#

good idea. Thanks for sticking with me @copper sentinel !

#

cc: @lean reef in case you have any ideas...

#

OK, commented out recorder and rebooted. Now I have a shiny new empty historical DB :(
-rw-r--r-- 1 root root 4943872 Aug 17 14:21 home-assistant_v2.db
-rw-r--r-- 1 root root 32768 Aug 17 14:21 home-assistant_v2.db-shm
-rw-r--r-- 1 root root 4214792 Aug 17 14:21 home-assistant_v2.db-wal

copper sentinel Aug 17, 2024, 9:23 PM

#

That was just one step 🙂

gentle relic Aug 17, 2024, 10:21 PM

#

well, it's growing like a banshee!
-rw-r--r-- 1 root root 181.0M Aug 17 15:20 home-assistant_v2.db
-rw-r--r-- 1 root root 32.0K Aug 17 15:20 home-assistant_v2.db-shm
-rw-r--r-- 1 root root 6.1M Aug 17 15:20 home-assistant_v2.db-wal

gentle relic Aug 17, 2024, 10:57 PM

#

What the heck is it recording?
-rw-r--r-- 1 root root 287.2M Aug 17 15:57 home-assistant_v2.db
-rw-r--r-- 1 root root 32.0K Aug 17 15:56 home-assistant_v2.db-shm
-rw-r--r-- 1 root root 6.6M Aug 17 15:57 home-assistant_v2.db-wal
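
To put a number on that growth, here is a back-of-the-envelope sketch using only the three `ls` listings of home-assistant_v2.db posted in this thread. It assumes the file modification times are close to the moments the listings were taken; `growth_rates` is a name invented for the example.

```python
from datetime import datetime

# (modification time, size in MiB) of home-assistant_v2.db from the listings.
SAMPLES = [
    ("14:21", 4943872 / 1024**2),  # first listing was in bytes
    ("15:20", 181.0),
    ("15:57", 287.2),
]


def growth_rates(samples: list[tuple[str, float]]) -> list[float]:
    """Return the growth in MiB per hour between each pair of samples."""
    rates = []
    for (t0, s0), (t1, s1) in zip(samples, samples[1:]):
        delta = datetime.strptime(t1, "%H:%M") - datetime.strptime(t0, "%H:%M")
        minutes = delta.total_seconds() / 60
        rates.append((s1 - s0) / minutes * 60)
    return rates


for rate in growth_rates(SAMPLES):
    print(f"{rate:.0f} MiB/hour")
```

Both intervals come out at roughly 170 to 180 MiB per hour, which would be on the order of 4 GiB per day if the rate held. That is far more than a typical set of sensors should produce, so something is likely writing state changes at a very high frequency.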

copper sentinel Aug 17, 2024, 11:20 PM

#

Check here: #general-archived message

gentle relic Aug 18, 2024, 1:25 AM

#

Wow, DbStats is exactly what I needed! I can immediately see what the issue is. One of my home-brew devices (water usage monitor) is spamming the system!
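
The kind of per-entity count that reveals a spamming device can also be approximated by hand against the SQLite file. This is a minimal sketch, not how DbStats works internally. It assumes a recorder schema in which `states` rows reference `states_meta.entity_id` through `metadata_id`, and it runs against a small in-memory demo database with made-up entity names rather than a real home-assistant_v2.db.

```python
import sqlite3

# Assumed schema: states.metadata_id -> states_meta.metadata_id / entity_id.
QUERY = """
SELECT m.entity_id, COUNT(*) AS row_count
FROM states AS s
JOIN states_meta AS m ON m.metadata_id = s.metadata_id
GROUP BY m.entity_id
ORDER BY row_count DESC
LIMIT ?
"""


def top_entities(conn: sqlite3.Connection, limit: int = 10) -> list[tuple[str, int]]:
    """Return (entity_id, number_of_state_rows), busiest entities first."""
    return conn.execute(QUERY, (limit,)).fetchall()


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with hypothetical entities."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE states_meta (metadata_id INTEGER PRIMARY KEY, entity_id TEXT);
        CREATE TABLE states (state_id INTEGER PRIMARY KEY, metadata_id INTEGER, state TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO states_meta VALUES (?, ?)",
        [(1, "sensor.water_usage"), (2, "sensor.outdoor_temperature")],
    )
    rows = [(1, str(i)) for i in range(500)] + [(2, "21.5")] * 12
    conn.executemany("INSERT INTO states (metadata_id, state) VALUES (?, ?)", rows)
    return conn


for entity_id, row_count in top_entities(build_demo_db()):
    print(f"{row_count:>6}  {entity_id}")
```

Against a real database, it would be safer to run such a query on a copy of the file while Home Assistant is stopped, and to check the actual table layout first since the schema changes between versions.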

#I just upgraded my HA Blue yesterday
