# HA crash: how to diagnose


coral glacier
#

I switched from HomeSeer to HA about a month ago. HA was running fine until it halted about an hour ago while nothing new was happening.
I did not get a screenshot of the halt screen.
Running HAOS in VirtualBox on Win10, on a dedicated OptiPlex for HA and supporting Docker containers.
On reboot it hangs for a couple of minutes while coming up, then drops the console to an `sh` prompt as root.
The VM is on the network, but SSH is refused and there is no reply from the web UI on the usual port 8123.

Where do I find the crash log file and the boot log for HA, to get a direction for troubleshooting?

I am not sure yet of the difference between HA components like Core and Supervisor, so I'm not sure where to start.

What is the `systemctl` start order for HA services?

#

`systemctl restart hassos-supervisor` seems to have brought it up, but it does not come up on its own at boot. Looking for logs.

#

```
2025-02-05 12:52:48.394 WARNING (Recorder) [homeassistant.components.recorder.util] The system could not validate that the sqlite3 database at //config/home-assistant_v2.db was shutdown cleanly
2025-02-05 12:52:48.538 WARNING (Recorder) [homeassistant.components.recorder.util] Ended unfinished session (id=6 from 2025-01-31 23:49:11.696409)
```

#

No surprise, given the crash.
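These recorder warnings say that Home Assistant found the database had not been closed cleanly and closed out the unfinished session; on their own they do not say the file is damaged. To check the file itself, SQLite's `PRAGMA quick_check` (or the slower `PRAGMA integrity_check`) can be run against a copy taken while HA is stopped. A minimal sketch using Python's standard `sqlite3` module; the copy-to-a-temp-dir step and the function name `check_sqlite_copy` are illustrative choices, not part of any HA tooling.

```python
import os
import shutil
import sqlite3
import tempfile


def check_sqlite_copy(db_path: str) -> tuple[int, list[str]]:
    """Copy the database (plus any -wal/-shm side files) to a temp dir and
    run PRAGMA quick_check on the copy, never on the live file.
    Returns (size in bytes, check messages); a healthy file gives ["ok"]."""
    size = os.path.getsize(db_path)
    with tempfile.TemporaryDirectory() as tmp:
        copy_path = os.path.join(tmp, os.path.basename(db_path))
        for suffix in ("", "-wal", "-shm"):
            src = db_path + suffix
            if os.path.exists(src):
                shutil.copy2(src, copy_path + suffix)
        conn = sqlite3.connect(copy_path)
        try:
            rows = conn.execute("PRAGMA quick_check").fetchall()
        finally:
            conn.close()  # close before the temp dir is deleted
    return size, [row[0] for row in rows]
```

For example, `check_sqlite_copy("/config/home-assistant_v2.db")` (the path from the log line, assumed to be reachable from wherever you run it). Anything other than `["ok"]` points at real corruption; a file that is too damaged to open raises `sqlite3.DatabaseError` instead. The returned size also puts a number on the database-size suspicion raised later in the thread.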

#

`tts.piper` errors on boot, but it is working now; it must have been called before it had started.

#

No other boot issues once I start it manually with `systemctl`.

#

last log lines before it went down

#

Logs before the crash look normal, then a spray of NUL characters.
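A run of NUL bytes at the end of a log often (not always) means the file's length was extended but the data never reached the disk before the VM stopped, so the last intact line marks roughly where logging stopped. A minimal sketch for finding that point; the function name is hypothetical and it only inspects bytes you pass in.

```python
def find_nul_tail(data: bytes) -> tuple[int, bytes, int]:
    """Return (offset of the first NUL byte, last complete line before it,
    total NUL count). The offset is -1 when the data has no NUL bytes."""
    first = data.find(b"\x00")
    if first == -1:
        return -1, b"", 0
    clean = data[:first]
    if not clean.endswith(b"\n"):
        # Text right before the NULs is an unfinished line; discard it.
        clean = clean[: clean.rfind(b"\n") + 1]
    lines = clean.splitlines()
    last_line = lines[-1] if lines else b""
    return first, last_line, data.count(b"\x00")
```

Usage, assuming the log has been copied somewhere readable: `find_nul_tail(open("home-assistant.log.1", "rb").read())`. The timestamp on the returned line is the last moment the log is known to have been written.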

coral glacier
#

After a backup and a reboot via the HA web UI, it came back up with no errors. How did it fix itself? I rebooted the host 3 times with the same failure to start HA before forcing the service to start manually, so something fixed something.

#

A full reboot of the VM also worked with no issue. I would feel a lot better if I could find the crash log, and the boot logs from when it failed to start automatically.

coral glacier
#

HA crashed and restarted on its own today, with no explanation. The OS had no issue, just normal execution in the log. Is there any crash log other than `home-assistant.log.1`? A complete mystery is not filling me with confidence; there has to be something to try, to work the issue.
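`home-assistant.log` (and the previous run's `home-assistant.log.1`) holds one record per line in the form `date time LEVEL (thread) [logger] message`, as in the excerpts quoted in this thread, with tracebacks continuing on the following lines. A minimal sketch that groups those records and keeps only warnings and errors, to make the minutes before a crash easier to scan. The regex is inferred from the lines quoted here, not from a documented format, and `parse_log`/`Record` are hypothetical names.

```python
import re
from dataclasses import dataclass, field

# Pattern inferred from the log excerpts in this thread (assumption, not a spec).
RECORD = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>[A-Z]+) \((?P<thread>[^)]*)\) \[(?P<logger>[^\]]*)\] (?P<msg>.*)$"
)


@dataclass
class Record:
    ts: str
    level: str
    thread: str
    logger: str
    msg: str
    extra: list[str] = field(default_factory=list)  # traceback / continuation lines


def parse_log(lines, levels=("WARNING", "ERROR", "CRITICAL")):
    """Yield Records whose level is in `levels`, with continuation lines attached."""
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        m = RECORD.match(line)
        if m:
            if current is not None and current.level in levels:
                yield current
            current = Record(**m.groupdict())
        elif current is not None:
            current.extra.append(line)
    if current is not None and current.level in levels:
        yield current
```

Usage, for example: `with open("home-assistant.log.1", encoding="utf-8", errors="replace") as f:` then loop over `parse_log(f)` and print `ts`, `level` and `logger`. On the systemd side, the unit name used earlier in this thread is `hassos-supervisor`, so `journalctl -u hassos-supervisor.service` would be the matching journal query.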

#

`journalctl -u hassio-supervisor.service` is empty

coral glacier
#

`ha docker` has no `logs` command?

#

I can't get the Docker logs for the Supervisor to dump to a file from the console. It seems I can't create a file anywhere.

#

I think I made a mistake going with HAOS; it's impossible to manage and troubleshoot. I think the best bet is to install it on a fresh Ubuntu image, or add it to my existing Docker Ubuntu host.

coral glacier
#

The File editor add-on has a `homeassistant` dir where the config storage is, but it does not exist as root from the console.

#

Also, there is no `/var/lib/docker` from the File editor. Is the File editor inside the container? I really need to find some docs on how this system works.

haughty star
#

> is the file editor from inside the container?

Yes. Every add-on is a container. HA itself too.

> need to find some docs on how this system works

Maybe this will help: #general-archived message

haughty star
> Replying to coral glacier: I think I made a mistake going with HAOS, it's impossible to manage and troublesh...

Yeah it's not for everyone. I'd recommend you keep virtualizing but perhaps run HA in a docker container instead: https://gist.github.com/Impact123/c23c36eafe1672ec056233e450a86ae2
You can get logs via the `ha` commands like `ha host logs`, `ha supervisor logs` and so on. Also see here in case that command is borked: https://developers.home-assistant.io/docs/operating-system/debugging/
Maybe you can restore a snapshot or a backup? Perhaps after using a fresh OVA.
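The `ha ... logs` commands print to the terminal, and logs printed after a crash are easy to lose. If you have a shell where both the `ha` CLI and Python are available and some directory is writable (an assumption; per the earlier message about not being able to create files, this may not hold on the bare HAOS console), a small script can save each one to a timestamped file. A minimal sketch: `collect_logs` and the file naming are hypothetical; `ha host logs` and `ha supervisor logs` are the commands named above, and `ha core logs` is the equivalent for Core.

```python
import subprocess
from datetime import datetime
from pathlib import Path

# `ha` CLI log commands; edit this mapping to add or drop sources.
LOG_COMMANDS = {
    "host": ["ha", "host", "logs"],
    "supervisor": ["ha", "supervisor", "logs"],
    "core": ["ha", "core", "logs"],
}


def collect_logs(outdir, commands=LOG_COMMANDS, run=subprocess.run):
    """Run each log command and save stdout+stderr to <name>-<timestamp>.log.
    `run` is injectable so the function can be exercised without HA."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    written = []
    for name, cmd in commands.items():
        try:
            result = run(cmd, capture_output=True, text=True, timeout=120)
            text = result.stdout + result.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            # e.g. `ha` is not on PATH in this shell, or the command hung
            text = f"failed to run {' '.join(cmd)}: {exc}\n"
        path = outdir / f"{name}-{stamp}.log"
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written
```

Writing a file even when a command fails is deliberate: the error message itself is evidence of what was reachable at that moment.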

coral glacier
#

I have been working those suggestions you gave, thank you

#

I have no direct proof, but I think the size of the DB is causing the issues. When SQLite gets corrupted and restarts, it runs well for a while. I observed that once swap usage rises above 0, it is heading for a lockup, so I doubled the CPU and memory, to 8 GB and 4 CPUs.
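The "swap above 0 means a lockup is coming" observation can be watched for directly. On Linux, `/proc/meminfo` reports `SwapTotal`, `SwapFree` and `MemAvailable` in kB. A minimal sketch that turns that text into swap-in-use and available-memory numbers; the function names are hypothetical, and reading `/proc/meminfo` from inside the HAOS guest (for example from an add-on container) is an assumption about where it gets run.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}; most values are kB."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def swap_report(text: str) -> dict[str, int]:
    """Swap currently in use and MemAvailable, both in kB."""
    info = parse_meminfo(text)
    swap_used = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
    return {"swap_used_kb": swap_used, "mem_available_kb": info.get("MemAvailable", 0)}
```

Usage: `swap_report(open("/proc/meminfo").read())`, run periodically and logged, would show whether swap use and shrinking `MemAvailable` really precede each lockup, which is firmer evidence than a single observation.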

#

It is just acting completely unstable. After adding more memory and bringing it up, many things are not working properly. I think I traced it to the fact that ALL of the devices are unlinked from their entities.

#

The entities still exist though, and it's across all integrations.
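One way to put numbers on "all devices unlinked" is to inspect the entity registry HA keeps in `/config/.storage/core.entity_registry`. That file is an internal, undocumented JSON store whose layout can change between releases, so this sketch is a read-only check under assumptions: entries live under `data.entities` and carry `platform` and `device_id` keys. Run it on a copy, never on the live file; the function names are hypothetical.

```python
import json
from collections import Counter


def load_registry(path: str) -> dict:
    """Load a copy of the entity registry JSON."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def unlinked_by_platform(registry: dict) -> Counter:
    """Count entities with no device_id, grouped by integration platform.
    Assumes the (internal) layout {"data": {"entities": [...]}}."""
    counts = Counter()
    for entity in registry.get("data", {}).get("entities", []):
        if not entity.get("device_id"):
            counts[entity.get("platform", "?")] += 1
    return counts
```

Many entities legitimately have no device (helpers and templates, for instance), so a nonzero count alone proves nothing; comparing the counts from the live registry against those from the same file inside a known-good backup shows whether links were actually lost or were only missing from the UI at startup.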

#

Also this. I know I saw this at least one time before in the last month

#

Loading for almost an hour. I suspect the Supervisor needs a delay when HAOS is starting.

#

nothing obvious in logs

#

```
2025-02-09 17:48:10.512 INFO (MainThread) [homeassistant.bootstrap] Setting up recorder: {'recorder'}
2025-02-09 17:48:10.513 INFO (MainThread) [homeassistant.setup] Setting up recorder
2025-02-09 17:48:10.516 INFO (MainThread) [homeassistant.components.http] Now listening on port 8123
2025-02-09 17:48:10.517 INFO (MainThread) [homeassistant.setup] Setup of domain http took 0.01 seconds
2025-02-09 17:48:10.530 WARNING (Recorder) [homeassistant.components.recorder.util] The system could not validate that the sqlite3 database at //config/home-assistant_v2.db was shutdown cleanly
2025-02-09 17:48:10.578 INFO (MainThread) [homeassistant.setup] Setup of domain recorder took 0.06 seconds
```

#

It was a clean shutdown via the HA UI.

#

```
2025-02-09 17:48:11.728 ERROR (MainThread) [frontend.js.modern.202502050] Uncaught error from Chrome 132.0.0.0 on Windows 10
SyntaxError: Bad control character in string literal in JSON at position 654005 (line 1 column 654006)
parse (node_modules/home-assistant-js-websocket/dist/connection.js:11:36)
2025-02-09 17:48:11.728 ERROR (MainThread) [frontend.js.modern.202502050] Uncaught error from Chrome 132.0.0.0 on Windows 10
SyntaxError: Bad control character in string literal in JSON at position 654006 (line 1 column 654007)
parse (node_modules/home-assistant-js-websocket/dist/connection.js:11:36)
2025-02-09 17:48:11.801 ERROR (MainThread) [frontend.js.modern.202502050] Uncaught error from Chrome 132.0.0.0 on Windows 10
SyntaxError: Bad control character in string literal in JSON at position 654006 (line 1 column 654007)
parse (node_modules/home-assistant-js-websocket/dist/connection.js:11:36)
2025-02-09 17:48:11.873 ERROR (MainThread) [frontend.js.modern.202502050] Uncaught error from Chrome 132.0.0.0 on Windows 10
SyntaxError: Expected ',' or '}' after property value in JSON at position 654007 (line 1 column 654008)
parse (node_modules/home-assistant-js-websocket/dist/connection.js:11:36)
2025-02-09 17:48:12.029 ERROR (MainThread) [frontend.js.modern.202502050] Uncaught error from Chrome 132.0.0.0 on Windows 10
SyntaxError: Expected ',' or '}' after property value in JSON at position 654007 (line 1 column 654008)
parse (node_modules/home-assistant-js-websocket/dist/connection.js:11:36)
2025-02-09 17:48:12.130 INFO (MainThread) [homeassistant.components.websocket_api.http.connection] [140561815311344] Received unknown command: history/history_during_period
2025-02-09 17:48:12.188 ERROR (MainThread) [frontend.js.modern.202502050] Uncaught error from Chrome 132.0.0.0 on Windows 10
SyntaxError: Bad control character in string literal in JSON at position 654382 (line 1 column 654383)
parse (node_modules/home-assistant-js-websocket/dist/connection.js:11:36)
```

#

```
2025-02-09 17:48:14.433 ERROR (MainThread) [frontend.js.modern.202502050] Uncaught error from Chrome 132.0.0.0 on Windows 10
SyntaxError: Bad control character in string literal in JSON at position 2951306 (line 1 column 2951307)
parse (node_modules/home-assistant-js-websocket/dist/connection.js:11:36)
```

#

I assume this is just an expired UI tab
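The `Bad control character in string literal` error concerns the content of a websocket message rather than the age of the tab: JSON forbids raw characters below 0x20 inside strings, so some value sent to the browser contained one, possibly related to the NUL characters seen in the log earlier. A minimal sketch for locating such characters in a saved message (for example a websocket frame copied from the browser's developer tools, Network tab); the function names are hypothetical.

```python
def control_chars_in_strings(text: str) -> list[tuple[int, str]]:
    """Return (index, repr) for each raw control character (< 0x20) inside a
    JSON string literal. Control characters between tokens (e.g. newlines)
    are valid whitespace, which is why string state is tracked."""
    found = []
    in_string = False
    escaped = False
    for i, ch in enumerate(text):
        if in_string:
            if ord(ch) < 0x20:
                found.append((i, repr(ch)))
                escaped = False
            elif escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
    return found


def context(text: str, pos: int, width: int = 40) -> str:
    """Show the text around a position reported by a parse error."""
    return repr(text[max(0, pos - width): pos + width])
```

`context(frame, 654005)` shows what surrounds the first reported position. JavaScript counts positions in UTF-16 units and Python in code points, so the two can drift apart if the message contains emoji or other characters outside the Basic Multilingual Plane.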

#

```
2025-02-09 17:48:15.113 ERROR (MainThread) [aiohttp.server] Error handling request from 192.168.0.50
Traceback (most recent call last):
  File "/usr/local/lib/python3.13/asyncio/selector_events.py", line 1094, in _write_sendmsg
    nbytes = self._sock.sendmsg(self._get_sendmsg_buffer())
BrokenPipeError: [Errno 32] Broken pipe

The above exception was the direct cause of the following exception:
```

#

Supervisor restart seems to have resolved these issues

#

Can't seem to find any articles on adding a startup wait (delay) to a service.
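HAOS is built as an appliance and is not meant to have its systemd units edited, so adding a delay there is not straightforward. What can be done from outside, for example from the existing Ubuntu Docker host mentioned in this thread, is to measure how long the web UI takes to answer after each boot, which would put numbers on the "loading almost an hour" symptom. A minimal standard-library sketch; `wait_for_http` is a hypothetical helper and the URL and timings are assumptions.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Optional


def wait_for_http(url: str, timeout: float = 3600.0, interval: float = 5.0) -> Optional[float]:
    """Poll `url` until the server answers. Returns the seconds waited,
    or None if `timeout` seconds pass first."""
    start = time.monotonic()
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return time.monotonic() - start
        except urllib.error.HTTPError as err:
            # Any HTTP status (401, 404, 501, ...) means the server is answering.
            err.close()
            return time.monotonic() - start
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            pass  # refused, unreachable, reset or timed out: not up yet
        if time.monotonic() - start >= timeout:
            return None
        time.sleep(interval)
```

For example, `wait_for_http("http://homeassistant.local:8123/")` started right after booting the VM. `homeassistant.local` is HAOS's usual mDNS name but may not resolve on every network, in which case the VM's IP address works instead. Note that a listening port only shows the HTTP server is up; integrations may still be loading.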

#

I suspect this is a startup-timing issue, because even on good reboots some integrations, like GreenEye Monitor, do not record properly until I restart them after boot.

#

I do run one Z-Wave JS UI "sidecar" in HAOS. I think I should move it to my Ubuntu Docker host; I already have 3 others running there, and that would take load off the HAOS guest.

#

Also, I just noticed that in addition to devices not being linked to entities on initial bootup, all categories were missing. I wish I could get a good line on detailed logs. I have given up on getting SSH into HAOS; USB mounting on VirtualBox does not seem to work.

#

I think HAOS is not a viable solution to maintain. I was hoping for simple maintenance and troubleshooting with a standardized stack, but it seems to cause more issues than it solves.