#HA Stability Precautionary Measures

scarlet grove

My HA Yellow's NVMe has an incredibly corrupted partition. I'm running fsck on it, but it has been correcting things for the past 5 minutes, so I'm not holding my breath. I don't know what caused it: the recent update from 2025.8.1 to 2025.8.2, the latest OS update from a week or so back that only showed itself today, the power outage we had a while ago (we average 1-2 a year?), or cosmic interdimensional gremlins...

My question: What would be some reasonable steps to help prevent this in the future?

Also, unlike me, use the backup feature to make regular off-device backups. Do as I say, not as I do. 😣
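For the off-device part, here is a rough Python sketch of one way to do it. It assumes the backups are visible as '.tar' files in a local folder (for example '/backup' inside an add-on) and that the destination is an off-device mount such as a NAS share; both paths are placeholders, and 'copy_new_backups' is a hypothetical helper, not part of Home Assistant.

```python
"""Copy Home Assistant backup archives to an off-device location.

Assumptions: backups appear as .tar files in `src`, and `dest` is a mounted
off-device path (NAS, USB disk, ...). Both are placeholders.
"""
import shutil
from pathlib import Path


def copy_new_backups(src: Path, dest: Path) -> list[str]:
    """Copy every .tar archive from src that is not already in dest.

    Returns the names of the archives that were copied this run.
    """
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for archive in sorted(src.glob("*.tar")):
        target = dest / archive.name
        # Skip archives that were already copied with the same size.
        if target.exists() and target.stat().st_size == archive.stat().st_size:
            continue
        # Copy to a temporary name first, then rename, so an interrupted
        # copy (say, during a power cut) never looks like a complete backup.
        tmp = target.with_suffix(".tar.partial")
        shutil.copy2(archive, tmp)
        tmp.replace(target)
        copied.append(archive.name)
    return copied
```

Usage would be something like 'copy_new_backups(Path("/backup"), Path("/media/nas/ha-backups"))' on a schedule; recent HA versions can also send backups to other locations directly, which may be simpler if it fits your setup.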

autumn aspen

Ensure clean shutdowns - never pull the power cord out unnecessarily.

Connect the HA Yellow to a UPS for clean power and battery backup. If possible, use one that integrates with HA, so that you can trigger a clean shutdown when the battery gets low.
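Inside HA this would normally be an automation that calls the 'hassio.host_shutdown' service. As a sketch of the same logic from another machine, the Python below polls HA's REST API. Assumptions: the UPS is exposed through something like the NUT integration, the entity IDs, URL and token are placeholders, and the status entity holds NUT's raw flags ('OL' online, 'OB' on battery, 'LB' low battery).

```python
"""Trigger a clean HA host shutdown when the UPS battery runs low.

Placeholders: HA_URL, TOKEN and both entity IDs are hypothetical.
Uses HA's REST API: GET /api/states/<entity_id> and
POST /api/services/<domain>/<service>.
"""
import json
import urllib.request

HA_URL = "http://homeassistant.local:8123"      # placeholder
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"           # placeholder
STATUS_ENTITY = "sensor.ups_status_data"         # hypothetical: raw NUT flags
CHARGE_ENTITY = "sensor.ups_battery_charge"      # hypothetical: percent
HEADERS = {"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"}


def should_shut_down(status_data: str, charge_percent: float, threshold: float = 30.0) -> bool:
    """Shut down if the UPS reports low battery, or it is on battery below threshold."""
    flags = status_data.split()
    if "LB" in flags:  # the UPS itself says the battery is low
        return True
    return "OB" in flags and charge_percent <= threshold


def _get_state(entity_id: str) -> str:
    req = urllib.request.Request(f"{HA_URL}/api/states/{entity_id}", headers=HEADERS)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["state"]


def _call_service(domain: str, service: str) -> None:
    req = urllib.request.Request(
        f"{HA_URL}/api/services/{domain}/{service}",
        data=b"{}", headers=HEADERS, method="POST",
    )
    with urllib.request.urlopen(req, timeout=10):
        pass


def check_ups_once(threshold: float = 30.0) -> bool:
    """Poll both entities once; returns True if a shutdown was requested."""
    status = _get_state(STATUS_ENTITY)
    try:
        charge = float(_get_state(CHARGE_ENTITY))
    except ValueError:  # e.g. "unavailable" while the integration reconnects
        return False
    if should_shut_down(status, charge, threshold):
        _call_service("hassio", "host_shutdown")
        return True
    return False
```

The 30% threshold is an arbitrary example; pick one that leaves your UPS enough runtime for the shutdown to finish.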

Use 'smartctl' to check the NVMe drive's health, looking for wear, temperature spikes and controller errors. You can expose the SMART data using the SSH & Web Terminal add-on and use command_line sensors to track the wear level.
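A rough sketch of the parsing side, assuming smartmontools 7.0 or newer (for '--json') and a drive at '/dev/nvme0' (a placeholder; 'smartctl --scan' lists yours). The key names follow smartctl's NVMe JSON output, but check them against your own drive's output. It also pulls the 'unsafe_shutdowns' counter, which could hint at whether power loss is a factor.

```python
"""Summarise NVMe health from smartctl's JSON output.

Assumes smartmontools >= 7.0 and that the caller may read the device.
"""
import json
import subprocess


def read_smart(device: str = "/dev/nvme0") -> dict:
    # smartctl's exit status is a bit mask; a non-zero code does not always
    # mean the JSON is unusable, so check=True is deliberately not used.
    proc = subprocess.run(["smartctl", "--json", "-a", device],
                          capture_output=True, text=True)
    return json.loads(proc.stdout)


def summarize(data: dict) -> dict:
    """Pick out the wear, temperature and error fields worth tracking."""
    log = data.get("nvme_smart_health_information_log", {})
    return {
        "passed": data.get("smart_status", {}).get("passed"),
        "temperature_c": data.get("temperature", {}).get("current"),
        "percentage_used": log.get("percentage_used"),    # wear estimate
        "available_spare": log.get("available_spare"),
        "media_errors": log.get("media_errors"),
        "error_log_entries": log.get("num_err_log_entries"),
        "unsafe_shutdowns": log.get("unsafe_shutdowns"),
        "critical_warning": log.get("critical_warning"),
    }
```

Printing 'json.dumps(summarize(read_smart()))' gives a single JSON line that a command_line sensor could read; where smartctl is available and has device access depends on your setup, so treat that wiring as an assumption.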

Keep at least 10-20% of the drive free at all times.
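A minimal Python check for that, using only the standard library. The mount point is a placeholder (what '/' refers to depends on where the script runs on HAOS), and the 15% default is just the middle of the 10-20% range.

```python
"""Report whether a filesystem still has enough free space."""
import shutil


def free_percent(path: str = "/") -> float:
    """Free space (as available to unprivileged users) as a percentage of total."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def has_enough_free_space(path: str = "/", minimum_percent: float = 15.0) -> bool:
    return free_percent(path) >= minimum_percent
```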

HAOS uses 'ext4', which supports TRIM, so schedule a job to run 'fstrim -v /' regularly.
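A sketch of a wrapper that such a job could run, assuming util-linux 'fstrim' is installed and the script runs as root; the mount point is a placeholder and the scheduling (cron, a systemd timer, etc.) is left to your setup. The regex targets fstrim's verbose line of the form '<mountpoint>: <size> (<n> bytes) trimmed'.

```python
"""Run fstrim and report how many bytes were trimmed."""
import re
import subprocess

# Matches the "(<n> bytes) trimmed" part of each verbose fstrim line.
_TRIMMED = re.compile(r"\((\d+) bytes\) trimmed")


def parse_trimmed_bytes(output: str) -> int:
    """Sum the byte counts from every 'trimmed' line in fstrim -v output."""
    return sum(int(n) for n in _TRIMMED.findall(output))


def run_fstrim(mountpoint: str = "/") -> int:
    # check=True: a failed trim should raise rather than report 0 bytes.
    proc = subprocess.run(["fstrim", "-v", mountpoint],
                          capture_output=True, text=True, check=True)
    return parse_trimmed_bytes(proc.stdout)
```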

scarlet grove

Those are great, thank you!