#HAOS fails to start due to full HDD. Trying to clear up space via emergency console.


languid hemlock
#

So I believe my issue is that the HDD HAOS is installed on is 100% full, and that's preventing it from starting: I get a string of "I/O buffer" errors before it dumps me to the emergency console. HA runs on a VE in Proxmox, if that is relevant. I don't know what caused it to fill completely, and I naively gave it all the space available on my Proxmox machine, so there is nowhere to expand the drive to. That means I need to somehow reach into the VE from Proxmox or via the emergency console and delete (hopefully) some unnecessary stuff to give it enough space to boot successfully, but I haven't found actionable specifics on how to do that.

What can I do from the emergency console to clear out some of this space?

vestal sundial
#

Please share `df -hT`. Do you have PVE backups/snapshots?

#

I'd also like to see `lvs -a`, `lsblk -o+FSTYPE,MODEL`, and `qm config VMIDHERE` from the PVE side.

languid hemlock
vestal sundial
#

HAOS is not the issue here. Or at least it's not full.

#

Your comically small data thin pool is full though.

languid hemlock
#

This is what the HA console slowly prints before it eventually dumps to emergency console

vestal sundial
#

Can I see `qm config 101` too?

languid hemlock
vestal sundial
#

Try `docker system prune -af` and then `fstrim -av` in the HAOS console. Then check `lvs` on the node again and see if Data% has gone down.

languid hemlock
vestal sundial
#

Why is your disk only 32G? It seems to be a USB stick too.

languid hemlock
#

thinclient Dell Wyse. Just a small SSD

vestal sundial
#

Alright, so we can't do anything in the VM for now. Shut it down and we'll work on the node. Show me `pvs`.

languid hemlock
vestal sundial
#

Welp. There's no unallocated space to increase the data pool. There's usually a few gigs but not in this case.

#

Those backups you mentioned. Did you put them somewhere other than the VM's local storage?

languid hemlock
#

It's the backup within HA afaik. I've never had to check so beyond having the good guess that I did have it make one (or more) backups at some point, I don't know.

#

no snapshots in PVE, though I did recently set that up for my personal HA instance here at home. This is my parents' setup I'm responsible for fixing atm 😛

vestal sundial
#

Not a good idea to put backups on the machine you back up. The issue is that without some free space it's hard to do anything. We can try to mount the HAOS data disk on the node and move those backups somewhere else but I can't walk you through that from mobile.

languid hemlock
#

I don't know if at any point HA asked me to point to a specific location to store the backups it wanted to make before doing system updates. I was imagining reverting in the case of breaking changes, not in the case that the entire system kicks the bucket

vestal sundial
#

I might have an idea. We could delete the swap LV.

#

You can later set up a swap file if needed.

languid hemlock
vestal sundial
#

What does `free -h` look like?

#

No, the PVE installer chooses certain sizes. With a 32G disk there's not a lot of choice to split it up.

languid hemlock
vestal sundial
#

Alright. Run these

```
swapoff -a
lvremove pve/swap
lvresize -r -l +95%FREE pve/data
```

Check with `lvs` afterwards. Also remove the swap entry in `/etc/fstab` via `nano /etc/fstab`.
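
If you'd rather not edit the file by hand, here is a minimal sketch that comments out swap entries instead of deleting them. `comment_out_swap` is a hypothetical helper name, and it assumes the usual whitespace-separated fstab layout with `swap` as the filesystem type in the third field. The sample lines only mimic what a default PVE fstab tends to look like; yours may differ. It reads fstab content on stdin and writes to stdout, so nothing on disk changes by itself.

```shell
#!/bin/sh
# Hypothetical helper: comment out active swap entries in fstab-formatted text.
# Reads stdin, writes stdout; it never touches /etc/fstab itself.
comment_out_swap() {
    awk '
        # Leave existing comments and blank lines untouched.
        /^[ \t]*#/ || NF == 0 { print; next }
        # Field 3 is the filesystem type; "swap" marks a swap entry.
        $3 == "swap" { print "#" $0; next }
        { print }
    '
}

# Example on sample content (assumed layout, not copied from the node).
printf '%s\n' \
    '/dev/pve/root / ext4 errors=remount-ro 0 1' \
    '/dev/pve/swap none swap sw 0 0' \
    'proc /proc proc defaults 0 0' | comment_out_swap
```

On the node this could be used as `comment_out_swap < /etc/fstab > /tmp/fstab.new`, followed by a `diff /etc/fstab /tmp/fstab.new` to review the change before copying it back.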

#

The 5% unallocated is exactly for cases like this.

languid hemlock
vestal sundial
#

That looks better. Now start the VM again and see if it works.

languid hemlock
vestal sundial
#

Also note that you cannot rely on the size shown inside the VM as you really only have about 14G.

#

Cases like these can cause corruption inside the guest. You could try pressing `e` on the boot selection screen and forcing an fsck.

#

If not, we can try to mount the partition on the node, grab the backup and recreate the whole virtual disk of the VM. Essentially reinstalling it.

languid hemlock
#

stick `fsck.mode=force` somewhere in here?

vestal sundial
#

Yes just put it at the end of the last argument but use the normal boot option.

languid hemlock
#

That seems to send me through a partial boot and then kicks me back to the grub boot options

vestal sundial
#

Show me what you configured.

#

You can also attach and boot a lightweight live iso such as grml to the VM and do the fsck from there.

languid hemlock
vestal sundial
#

It has to be on the linux line after the last argument.

languid hemlock
#

same effect. I could only perceive one red error as the boot details flew past but it moves too quick to read it

vestal sundial
#

I wanted to check again before you let it rip.

languid hemlock
vestal sundial
#

That should work. You could try slot B I guess.

languid hemlock
#

already did. No go

vestal sundial
#

:<

languid hemlock
#

is there something I can tack on to stop it from rebooting once it fails so I can at least read the console?

vestal sundial
#

Can't think of something but you could record the screen?

languid hemlock
#

recently found the perfect tool to do just that! Let's try it out.

vestal sundial
#

`systemctl status docker`.

languid hemlock
vestal sundial
#

There's more below.

languid hemlock
vestal sundial
#

I expected some logs there but oh well. Let's see `journalctl -ru docker` then.

#

My recommendation is probably still to extract the backup and recreate the virtual disk with a fresh HAOS.

languid hemlock
#

This PVE console window is a bit annoying to work with :\

#

doesn't like to stay resized

vestal sundial
languid hemlock
vestal sundial
#

Oh well if overlay is broken this is not worth it.

#

Do this

```
cd /mnt/data/supervisor/backup
ls -l
```
languid hemlock
vestal sundial
#
```
scp -r /mnt/data/supervisor/backup/ USERHERE@PVENODEIPHERE:~/haos_backups
```
#

Download the file from the node via FileZilla or whatever.

#

Note that this is only a core backup and not even recent.

languid hemlock
#

USER being what I log into PVE as? Didn't seem to like that

vestal sundial
#

root.

languid hemlock
vestal sundial
#

:<

languid hemlock
#

So for reference, what actually went wrong and how do I avoid that in the future (outside of keeping a snapshot in PVE)?

vestal sundial
#

You know about these fake USB sticks? Your VM basically has a 32G one while the real size was 10G.

#

Stop the VM and follow the kpartx way I linked earlier. The VM appears to be too broken.

languid hemlock
#

Good grief, how did I get it into that state?

vestal sundial
#

With LVM-Thin you can overprovision. It's a nice feature, but the pool should never become full and you shouldn't allocate more to one guest than you have in total.
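
To put numbers on that, here is a rough sketch that compares the virtual size handed to guests with the real size of the thin pool. The figures are the ones from this thread (a 32G HAOS disk on a pool of roughly 14G after the resize); `overcommit_ratio` is a made-up helper name, not a PVE or LVM tool.

```shell
#!/bin/sh
# Hypothetical helper: ratio of provisioned virtual disk space to real pool size.
# Arguments: pool size in GiB, then one or more virtual disk sizes in GiB.
overcommit_ratio() {
    pool=$1
    shift
    # Sum the virtual disk sizes (one per line) and divide by the pool size.
    printf '%s\n' "$@" | awk -v pool="$pool" '
        { total += $1 }
        END { printf "%.2f\n", total / pool }
    '
}

# One 32G virtual disk on a ~14.29G pool, as in this thread.
ratio=$(overcommit_ratio 14.29 32)
echo "provisioned/pool ratio: $ratio"
```

Anything above 1.00 means the guests can fill the pool before they fill their own disks, which is the situation this VM ended up in.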

languid hemlock
#

I managed to provision it the entire SSD instead of what the actual free space was, then?

#

I...wonder if I did the same on my instance.

vestal sundial
#

Yeah the HAOS disk is 32G.

#

To be fair I haven't seen it cause that much breakage before. Usually it's fine after what we did and a fresh boot.

languid hemlock
#

Should that issue be self evident by what I see for LVM and LVM-thin in the GUI?

vestal sundial
#

No.

#

For LVM-Thin this shows what the storage Summary shows too.

#

For LVM it's full because the space is allocated to the three LVs you saw in lvs and this is fine.

languid hemlock
#

I'm assuming LVM-thin, space is provisioned on an as-needed basis instead of "locking in" a chunk of space regardless of whether it's filled or not?

vestal sundial
#

In theory yes but it requires participation from the guest to free unused but allocated "buffer".

languid hemlock
#

Storage is so cheap these days, it probably only makes sense if you have a lot of guests and a rather variable storage requirement for them? Otherwise I could have avoided this mishap by not using LVM-Thin?

vestal sundial
#

LVM-Thin is needed for snapshots too.

languid hemlock
#

Is there something I can do to warn me in the future "Hey, it's getting pretty full in here. Maybe don't download ESPHome Device Assistant" 😛

languid hemlock
#

Sometimes I think I need simpler hobbies. Feels like I spend more time troubleshooting than actively enjoying myself lol

#

Thanks for your help. I'm going to stick all that in my notes for future me to see.

vestal sundial
#

We're not quite done though. We need to grab the backup and recreate the virtual disk.

languid hemlock
#

okie

#

I got maybe another 15 minutes in me before I stuff some watermelon in my face and head to bed, though.

vestal sundial
#

Just shout here and I'll respond when I'm available.

languid hemlock
#

🀝

languid hemlock
#

@vestal sundial

vestal sundial
#

Yep.

languid hemlock
#

let's get that backup out of the VE

vestal sundial
#

Stop the VM. Install `kpartx` and share `ls -l /dev/mapper/pve`.

languid hemlock
#

yeah just figured out proxmox can't contact the internet

#

not a problem I expected to see

vestal sundial
#

Any guesses why?

languid hemlock
#

Something is still manually misconfigured after bringing it back to my place and putting it on my network? The router sees it and I'm not whitelisting anything for internet connectivity.

vestal sundial
#

I'm guessing your subnet or router ip is different.

languid hemlock
#

yeah that sorted it. Back when I first brought it over Google told me to edit the interfaces file to change the IP so I could actually access it via the LAN. I guess that was only half the battle

#

`ls: cannot access '/dev/mapper/pve': No such file or directory`

vestal sundial
#

This is just temporary. I want to focus on the actual HA issue.

#

Hmm. What does `ls -l /dev/mapper` say?

languid hemlock
vestal sundial
#

Alright I need the Hardware tab of the VM again. Also please use code blocks so I can copy and paste.

languid hemlock
#

how do I copy that?

vestal sundial
#

Use an SSH client or right click copy in the GUI.

languid hemlock
#

the GUI doesn't give me any kind of context menu option to copy; it's just the usual browser right-click menu

#

This is what you're referring to?

vestal sundial
#

In node > Shell if you select the text you can copy it.

#

But seriously just use SSH. Windows even comes with it. Just do `ssh root@iphere` in the terminal.

#

Alright we need disk1.

#

Run `kpartx -av /dev/mapper/...`.

languid hemlock
vestal sundial
#

Replace ... with the path to the volume. I ain't writing that.

#

Try to press the TAB key for completion.

languid hemlock
#

```
root@proxmox:~# kpartx -av /dev/mapper/pve-vm--101--disk--1
add map pve-vm--101--disk--1p1 (252:7): 0 65536 linear 252:6 2048
add map pve-vm--101--disk--1p2 (252:8): 0 49152 linear 252:6 67584
add map pve-vm--101--disk--1p3 (252:9): 0 524288 linear 252:6 116736
add map pve-vm--101--disk--1p4 (252:10): 0 49152 linear 252:6 641024
add map pve-vm--101--disk--1p5 (252:11): 0 524288 linear 252:6 690176
add map pve-vm--101--disk--1p6 (252:12): 0 16384 linear 252:6 1214464
add map pve-vm--101--disk--1p7 (252:13): 0 196608 linear 252:6 1230848
add map pve-vm--101--disk--1p8 (252:14): 0 65681375 linear 252:6 1427456
```

vestal sundial
#

Alright now this

```
mkdir -p /mnt/hassos-data
mount /dev/mapper/pve-vm--101--disk--1p8 /mnt/hassos-data
ls -l /mnt/hassos-data
```
languid hemlock
#

```
root@proxmox:~# ls -l /mnt/hassos-data/
total 1323204
drwx--x---  13 root root         4096 Sep 10 17:16 docker
drwxr-sr-x   3 root render       4096 Jan 10  2025 logs
drwx------ 212 root root        28672 Dec 19  2024 lost+found
-rw-r--r--   1 root root          244 Sep 10 17:16 rauc.db
drwxr-xr-x  17 root root         4096 Sep 10 12:48 supervisor
-rw-------   1 root root   1354911744 Sep 10 13:19 swapfile
```

vestal sundial
#
```
mkdir ~/hassos-backups
cp /mnt/hassos-data/supervisor/backup/* ~/hassos-backups/
ls -l ~/hassos-backups/
```
languid hemlock
#

```
root@proxmox:~# ls -l ~/hassos-backups/
total 18504
-rw-r--r-- 1 root root    10240 Sep 11 14:43 395ce42d.tar
-rw-r--r-- 1 root root 18933760 Sep 11 14:43 Home_Assistant_Core_2025.8.3_2025-08-31_17.15_42946398.tar
```

vestal sundial
#

Perfect. We can also try to extract the newest core configs if you like. Were there any addons you installed and need their data?

languid hemlock
#

z-wave js is the only one that comes to mind

vestal sundial
#

I don't know exactly where it stores its settings but you can do this to investigate

```
apt install gdu
gdu /mnt/hassos-data
```
#

Use FileZilla or WinSCP and download the backups from `/root/hassos-backups` to your PC.

languid hemlock
#

I can skip it if it's not readily apparent, there's only a handful of devices I'd need to re-pair.

vestal sundial
#

Let's skip this then. Also do this

```
apt install zip
zip -vr ~/hassos-backups/homeassistant_backup.zip /mnt/hassos-data/supervisor/homeassistant/
```
#

Once you've downloaded both files to your PC we can continue.

languid hemlock
#

unable to locate zio

vestal sundial
#

Typo. Try again.

#

We could also use tar which comes included by default but I guess zip is simpler for you to use.

languid hemlock
#

zipped

vestal sundial
languid hemlock
#

I haven't used winscp so give me a sec

vestal sundial
#

Now unmount the file system and delete the mappings again

```
umount /mnt/hassos-data
kpartx -dv /dev/mapper/pve-vm--101--disk--1
```

Then we can grab the HAOS image and extract it

```
apt install xz-utils
wget https://github.com/home-assistant/operating-system/releases/download/16.2/haos_ova-16.2.qcow2.xz
unxz -v -T0 haos_ova-16.2.qcow2.xz
```

Go into Hardware, detach disk1 and delete the now unused disk. Then run this

```
qm disk import 101 haos_ova-16.2.qcow2 local-lvm
```

Double click the new unused disk in Hardware and attach it. Enable Discard, IO Thread and SSD. Then go to Options > Boot Order and add it at the top. You should now be able to start it again and it will be a fresh HAOS. Do a snapshot now (without RAM) and then restore the backup you downloaded. After the restore, reboot the VM. Then we fix the network.

languid hemlock
#

I'm getting connection refused when trying to scp to my local machine

vestal sundial
#

You're supposed to connect to the PVE node.

languid hemlock
#

I'm in the ssh for the PVE node. scp from cd to user@windowsmachineIP:directory no?

vestal sundial
#

Your Windows PC is likely not running an SFTP server.

#

The idea is that you use FileZilla or WinSCP from your Windows PC to connect to the PVE node, which is running one, and download the backups. Just as I said.

languid hemlock
#

I'll try filezilla then

vestal sundial
#

SSH port is 22.

languid hemlock
#

ok got it

vestal sundial
languid hemlock
#

`unable to locate package unxz`

vestal sundial
#

Edited.

languid hemlock
#

I don't see an 'SSD' option, though I see discard and IO thread

vestal sundial
#

Make sure Advanced is selected.

#

You don't technically need that option for HAOS but it doesn't hurt and makes sense.

#

Discard is the most important as you might know from last time we discussed this.

languid hemlock
#

fyi `WARN: iothread is only valid with virtio disk or virtio-scsi-single controller, ignoring`

vestal sundial
#

Hmm. What did you choose? virtio-scsi-single is the default and a good choice.

languid hemlock
#

I think it was just defaulting to SCSI

vestal sundial
#

Can you share your Hardware tab again?

languid hemlock
#

HA is preparing btw

vestal sundial
#

Double click the controller and use VirtIO SCSI single.

#

You should also delete the unused disk immediately. We need that space.

#

You can also remove the HAOS image via `rm haos_ova*`. Not needed any more and again, we need that space 🙂

languid hemlock
#

Do I want to let it finish "preparing" before I snapshot?

#

it's already griping about network information

vestal sundial
#

Yeah the bridge is down. I forgot.

#

Show me the output of the DHCP command from earlier and `cat /etc/network/interfaces`.

languid hemlock
#

```
root@proxmox:~# cat /etc/network/interfaces
auto lo
iface lo inet loopback

iface enp1s0 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.27/24
        gateway 192.168.1.1
        bridge-ports enp1s0
        bridge-stp off
        bridge-fd 0

source /etc/network/interfaces.d/*
```

vestal sundial
#

Hmm. That looks okay though. Perhaps there was an IP conflict. .27 is pretty low.

#

Do you know the DHCP range of your router and the one at your parents' house?

languid hemlock
#

parents' house, no. Here it's .2-.254

#

HA says it's on .57

vestal sundial
#

That's too large IMHO. You need to keep some addresses free for static IPs.

#

Can you give PVE a static DHCP lease?

languid hemlock
#

I have a number of statics set within that range. Does that mean I'm stepping on DHCPs toes?

vestal sundial
#

Absolutely. Unless you have static DHCP leases for that you will run into conflicts.

#

PVE does not normally use DHCP but it prevents the router from giving its ip to something else.
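
A quick way to sanity-check this is a sketch that tests whether a static address's last octet falls inside the router's DHCP range. It assumes a /24 network where only the last octet varies, and `in_dhcp_range` is a hypothetical helper, not an existing command.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the host octet lies inside the DHCP range.
# Arguments: host octet, first octet of the range, last octet of the range.
in_dhcp_range() {
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

# PVE sits on .27; the router here hands out .2-.254.
if in_dhcp_range 27 2 254; then
    echo ".27 is inside the DHCP range: reserve it in the router or shrink the range"
else
    echo ".27 is outside the DHCP range"
fi
```

With a narrower range, for example `in_dhcp_range 27 50 200`, the check fails, meaning the router would never hand out .27 on its own.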

languid hemlock
#

well, nothing's complained in the years it's been running that I can remember 🤞

vestal sundial
#

That you noticed 😄

languid hemlock
#

If I haven't noticed, then it wasn't important 🙃

vestal sundial
#

Anyways. Can you reserve it?

#

Or perhaps shrink the DHCP range to a max of 50-200 or so.

languid hemlock
#

you mean assign a static IP for PVE?

vestal sundial
#

In the router. Yes.

#

I specifically call it a DHCP lease as the static ip is set on PVE and the lease in the router.

languid hemlock
#

it's on .27 so I just gave it a static assignment to .27 in the router

vestal sundial
#

Alright. Reverse the temporary DHCP ip again as per my docs and then see if HA works.

#

By the way if you try the DHCP thingy again you should now get .27 if done right.

languid hemlock
#

oh. "when done" means when I want to disable DHCP huh

vestal sundial
#

Yep.

languid hemlock
#

I took it as "when the previous line finishes" 😛

#

so...it should already be off

vestal sundial
#

Hmm. What does `ip a` say?

languid hemlock
#

root@proxmox:~# ip address 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host noprefixroute valid_lft forever preferred_lft forever 2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UP group default qlen 1000 link/ether f4:ee:08:ac:45:76 brd ff:ff:ff:ff:ff:ff 6: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether f4:ee:08:ac:45:76 brd ff:ff:ff:ff:ff:ff inet 192.168.1.27/24 scope global vmbr0 valid_lft forever preferred_lft forever inet6 fe80::f6ee:8ff:feac:4576/64 scope link valid_lft forever preferred_lft forever 8: tap101i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UNKNOWN group default qlen 1000 link/ether 0a:aa:08:dd:e1:9c brd ff:ff:ff:ff:ff:ff

#

I spot .27?

vestal sundial
#

Looks okay to me.

#

You can check `ha network info` in the HAOS CLI.

#

By the way this looks better if you use three backticks for multi-line code.

languid hemlock
#

ESPHome discord wanted singles so I switched, lol

#

I can't get to the CLI and I can't stop the VM either

vestal sundial
#

That makes no sense. This looks better, no?

```
root@proxmox:~# ip address
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UP group default qlen 1000
    link/ether f4:ee:08:ac:45:76 brd ff:ff:ff:ff:ff:ff
6: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether f4:ee:08:ac:45:76 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.27/24 scope global vmbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::f6ee:8ff:feac:4576/64 scope link
       valid_lft forever preferred_lft forever
8: tap101i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UNKNOWN group default qlen 1000
    link/ether 0a:aa:08:dd:e1:9c brd ff:ff:ff:ff:ff:ff
```
vestal sundial
#

You reloaded the web interface on the new ip, yeah?

languid hemlock
#

yeah

vestal sundial
#

I can only guess it was caused because the old unused disk was not removed before starting the setup and it temporarily ran into 100% again.

#

Can you hover over the warning icon and also check node > System Log?

#

You really need a bigger SSD.

languid hemlock
#

I believe that was the SCSI warning from earlier

#

`Sep 11 15:18:42 proxmox dmeventd[321]: WARNING: Thin pool pve-data-tpool data is now 100.00% full.` 👀

vestal sundial
#

Oh well. I documented to remove it 😄

#

Try to stop the VM and start it again.

#

If it's broken again you now know how to replace the disk with a fresh HAOS.

#

Or just restore the snapshot.

languid hemlock
#

The snapshot I was trying to take after I started the system with the old drive still hanging around that caused the current 100% disk use? 😛

vestal sundial
#

Snapshots only store a delta of before and after.

languid hemlock
#

I don't have a snapshot because we didn't get to it after I asked if I should let HA finish preparing

vestal sundial
#

It contributes but the old disk was the bigger issue. Can't you buy a used SSD from eBay for the guests?

#

Ah.

languid hemlock
#

Yeah I think I'm just going to order a larger SSD and start from scratch on that. Plop in a copy of the SSD I stuck in my thinclient so I know it's compatible.

vestal sundial
#

No need. You can just use it as additional storage.

#

This should still work for now.

#

Pretty much all my nodes have their own separate boot disk and another one for guests.

vestal sundial
#

Just start again from the "Then we can grab the HAOS image and extract it" step.

#

Make sure there is no unused disk and you removed the HAOS image before starting HAOS. You need enough free space.

languid hemlock
#

yeah lol

vestal sundial
#

Check with `lvs` before starting.

#

The idea of the snapshot was so you don't have to do this again.

languid hemlock
#

Is restore a prompt that will show up in the webui?

vestal sundial
#

It's called Rollback in the Snapshots tab.

languid hemlock
#

the HA backup we downloaded? the webui answered my question by popping up the option so it's currently restoring 😛

vestal sundial
#

I thought you were talking about the PVE snapshot. Onboarding has options to upload the HA backup, yep.

languid hemlock
#

Oh, were you planning to restore with the .tar via the console?

vestal sundial
#

No. I made you download the file so you can upload it via the GUI.

languid hemlock
#

which then didn't want my .zip 😛

#

so I gave it the .tar

vestal sundial
#

The zip is not an HA-created backup.

#

It's just the config directory we zipped up in case you need the current configs since the HA backup is over a week old.

#

You'd need to restore those files manually.

vestal sundial
#

Make sure to reboot the VM after restore or things can get funky.

#

I'd also like to see `lvs`.

languid hemlock
#
```
  LV                              VG  Attr       LSize   Pool Origin        Data%  Meta%  Move Log Cpy%Sync Convert
  data                            pve twi-aotz-- <14.29g                    57.53  1.86
  root                            pve -wi-ao----  12.84g
  snap_vm-101-disk-0_post-restore pve Vri---tz-k   4.00m data vm-101-disk-0
  snap_vm-101-disk-0_pre-prep     pve Vri---tz-k   4.00m data vm-101-disk-0
  snap_vm-101-disk-1_post-restore pve Vri---tz-k  32.00g data vm-101-disk-1
  snap_vm-101-disk-1_pre-prep     pve Vri---tz-k  32.00g data vm-101-disk-1
  vm-101-disk-0                   pve Vwi-aotz--   4.00m data               0.00
  vm-101-disk-1                   pve Vwi-aotz--  32.00g data               25.48
```
vestal sundial
#

Seems fine for now. Perhaps set up the warning mail though.
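
As a starting point for such a warning, here is a minimal sketch of the threshold check only; sending the mail is left out. `pool_usage_warning` is a hypothetical helper and 80 is an arbitrary example threshold, not a recommended value.

```shell
#!/bin/sh
# Hypothetical helper: print a warning when thin pool usage reaches a threshold.
# Arguments: usage in percent (as shown in the Data% column), threshold in percent.
pool_usage_warning() {
    awk -v used="$1" -v limit="$2" 'BEGIN {
        if (used + 0 >= limit + 0)
            printf "WARNING: thin pool is %.2f%% full (limit %d%%)\n", used, limit
        else
            printf "OK: thin pool is %.2f%% full\n", used
    }'
}

# Data% of pve/data from the lvs output above.
pool_usage_warning 57.53 80

# On the node the live value could be read like this (needs root and the LVM tools):
#   used=$(lvs --noheadings -o data_percent pve/data | tr -d ' ')
#   pool_usage_warning "$used" 80
```

Run from cron, the output could be forwarded by whatever notification method the node already has set up.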

#

I'll mark this as resolved then.

languid hemlock
#

appreciate this lengthy effort to get HA working again

#

When I get some external storage I should be able to just push the whole VM over to it?

vestal sundial
#

Yep.

languid hemlock
#

snapshot logs mentioned this:

```
  WARNING: Set activation/thin_pool_autoextend_threshold below 100 to trigger automatic extension of thin pools before they get full.
```
vestal sundial
#

You kinda have to ignore this for now. The autoextend would not be a good idea.