#Nothing will boot anymore


steep stone
#

Chain of events and explanation:
Build new kernel to confirm LLVM bug -> delete old efibootmgr entry and create new entry
-> no entries persist after rebooting -> install GRUB, install gentoo-kernel-bin
-> all kernels get stuck after loading initramfs

What I've tried:
Nuking and recreating ESP, reinstalling kernels and grub - no result
Nuking swap and recreating (on suspicion I accidentally DD'd a live iso to the partition)
Updating UEFI firmware - no result (not in regards to efibootmgr either)
Removing partitions (like ntfs and swap) from fstab and regenerating initramfs
Changing around installkernel use flags (systemd, -systemd, still not entirely sure what the entire mess behind that change is)
btrfs check on root partition

Other Linux distros on separate drives boot just fine, Windows installed on the same NVME works without issues, only Gentoo seems to be affected.

Current grub.cfg (stock): https://bpa.st/65UA
emerge --info: https://bpa.st/H6RA

#

I might try to set up systemd-boot and see where that gets me, altho I doubt much will happen

#

I think I had issues in the past where only efibootmgr was a reliable way of getting it to boot but given that I can't create entries I can't really use that

#

@unique badge does this smell like a firmware issue or a "what the fuck is happening" issue

#

my UEFI also only lets me access the shell from USB devices, so I can't try anything from there

unique badge
#

Looks strange

#

The thing jumping out at me to ask first is that the system doesn't seem to run ~amd64

#

is that true

#

I've got to go out but making sure you are using the newest versions to start then confirming if the issue happens on llvm17 would be my first port of call

silver orchid
#

@steep stone "all kernels get stuck after loading initramfs" means grub loads fine and the initramfs loads fine -> at what point does the kernel get stuck? Can you boot using SystemRescueCD with findroot?

steep stone
#

I‘ll try

#

not at home rn

silver orchid
#

sure np, I don't have too many ideas. I had trouble after updating grub actually, something changed there; I used to get stuck too and needed to reinstall grub. Some people on forums complained, and some downgraded grub from what I've seen. Re-installing helped in my case (but my case was different, since it got stuck even while loading grub itself)

#

maybe some information that can be useful: show the grub version you use, with USE flags, the commands you run to install it, command outputs...

#

I doubt the problem is in the kernel or initramfs if you use -bin versions

#

yet grub/efi install is sorta complicated and even documentation is confusing and not consistent across different places where you read it

steep stone
#

Honestly the EFI issues are probably unrelated and just muddying up everything else so it's distracting me

#

...maybe

gloomy verge
#

It may be useful to inspect the output of installkernel (specifically the initramfs generation part) to confirm that dracut is not doing something weird (I'm always suspicious of dracut)

#

dracut may be including some wrong cmdline

steep stone
#

am back at home so I'll do that right away

#

that could be it yea

#

dracut output seems relatively normal

#

one of the first things I checked were btrfs and systemd being included

gloomy verge
#

try it with --verbose (or simply emerge --config gentoo-kernel-bin which will execute it with verbose by default)

steep stone
#

oh you're right

#

I was wondering why there was no cmdline

#

well for dracut

steep stone
#

I haven't used gentoo-kernel for a while so I'm still a bit confused by the installkernel systemd use flag change, which I think just uses the path used by systemd-boot? I tried with and without

#

rebooted again with the newest device I added unplugged just to try it, no change

silver orchid
#

grub loads fine and you can edit the kernel options with e?

steep stone
#

yes

silver orchid
#

how does the current grub.cfg look?

steep stone
#

I removed all my custom config stuff

#

to try and get the custom kernel booting

silver orchid
#

@steep stone I can't see init=/lib/systemd/systemd

#

you run on systemd?

steep stone
#

good catch

#

yes

#

i figured it might be init

#

so let‘s try that

silver orchid
#

you need to add it in /etc/default/grub

#

I have it like this:
`GRUB_CMDLINE_LINUX="rootfstype=ext4 init=/lib/systemd/systemd intel_pstate=disable cgroup_enable=memory swapaccount=1"`

gloomy verge
steep stone
#

I think I forgot to alter my default grub, given that I uninstalled grub months ago

#

idk how I missed that

gloomy verge
steep stone
#

never had to include that in my commandline on custom kernels

#

funny stuff

#

altho that might be my issue

gloomy verge
#

but you can try it anyway I guess

steep stone
#

doesn't hurt

#

altho I tried to manually add it to the commandline from grub's editor

#

Ctrl-X to boot

#

should be the correct way

#

didn't change anything but maybe I made a typo

#

no luck

#

hmpf

gloomy verge
#

what I find the most confusing here is that you don't get some error message to indicate where to look for the problem

#

maybe we can make this more verbose somehow

steep stone
#

I wish I could look into the boot process

silver orchid
#

tried without initramfs?

steep stone
#

and see what happens post-initramfs

#

I guess I could try to make something initramfs-less

steep stone
#

I've been booting into another Gentoo-based system this entire time running 6.6.8 (which I also tried to use)

#

and nothing complains here

gloomy verge
#

can you try to boot with 'debug rd.shell' on the cmdline

steep stone
#

sure!

gloomy verge
#

it should drop you to some debug shell if the kernel doesn't start correctly

#

it might also need 'console=tty0'

steep stone
#

just add debug rd.shell to my cmdline correct?

#

appended those three options

gloomy verge
#

yeah that should do it, maybe add 'console=tty0' to be safe

steep stone
#

`linux /vmlinuz-6.7.4-gentoo-dist.old root=UUID=c78888dc-f31f-4de9-b980-582d146bb86b ro init=/usr/lib/systemd/systemd debug rd.shell console=tty0`
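A minimal Python sketch for sanity-checking a command line like this one before pressing Ctrl-X: it splits the parameters and reports which of the wanted options (`debug`, `rd.shell`, `console=tty0`) are missing. The helper names are hypothetical, `loglevel=7` is only an example of an extra option, quoted values are not handled, and a repeated key (the kernel accepts several `console=`) keeps only its last value.

```python
from __future__ import annotations


def parse_cmdline(cmdline: str) -> dict[str, str | None]:
    """Split a kernel command line into {param: value}; bare flags map to None.

    Splits on the first '=', so root=UUID=... keeps "UUID=..." as its value.
    Quoted values are not handled, and a repeated key keeps its last value.
    """
    params: dict[str, str | None] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def missing_params(cmdline: str, wanted: list[str]) -> list[str]:
    """Return the entries of `wanted` (e.g. "debug", "console=tty0") not on the line."""
    present = parse_cmdline(cmdline)
    missing = []
    for item in wanted:
        key, sep, value = item.partition("=")
        if key not in present or (sep and present[key] != value):
            missing.append(item)
    return missing


# Hypothetical usage: the parameters from the menuentry (without "linux /vmlinuz...").
line = ("root=UUID=c78888dc-f31f-4de9-b980-582d146bb86b ro "
        "init=/usr/lib/systemd/systemd debug rd.shell console=tty0")
print(missing_params(line, ["debug", "rd.shell", "console=tty0", "loglevel=7"]))  # ['loglevel=7']
```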

#

quite a long one now

#

lol

#

the systemd one is unnecessary but I don't think it would interfere

#

(?)

#

the UUID is correct btw

#

I checked that

silver orchid
#

maybe even loglevel=1? or maybe some other value, not 1

steep stone
#

`/dev/nvme0n1p5: UUID="c78888dc-f31f-4de9-b980-582d146bb86b" UUID_SUB="c81655cd-c9ef-41b0-9ab0-50f1827fca47" BLOCK_SIZE="4096" TYPE="btrfs" PARTUUID="d70fc61c-1155-8d49-a5c2-e5fda548add2"`

#

`├─nvme0n1p5 259:5 0 207.4G 0 part /`
`└─nvme0n1p6 259:6 0 512M 0 part /boot`

gloomy verge
#

is it correct that this is the .old image?

steep stone
#

I just used grub; it appended it to both new and old

#

I forgot to clean those up

steep stone
#

i would be so happy to see a single error message

gloomy verge
#

hmm, maybe dracut thinks everything is fine then, and the problem is later

steep stone
#

getting fbcon shouldn't be an issue on my card, it has never been before

#

so idk where to look

gloomy verge
#

so here's something complex to try, but if you adjust the root= on the gentoo-kernel-bin cmdline to point to the root of your rescue usb does that boot? And maybe try also the other way around, booting your main root with the kernel from the rescue usb.

steep stone
#

let me set up a rescue USB real quick

#

the system I've been booting into has a wonky squashfs setup

#

setting up a new usb makes things easier

#

or maybe not

silver orchid
#

whats the last thing you see on screen when it's stuck?

steep stone
#

loading initial ramdisk

silver orchid
#

I'd try without an initramfs

steep stone
#

will try that after Andrew's suggestion

steep stone
#

okay

#

liveusb with my systems root boots to TTY login

#

only other thing showing up is cgroup: unknown subsys name 'net_cls', but idk if that's to be expected

#

anything you want me to try while on my root with the liveusb kernel?

silver orchid
#

Maybe check dmesg for errors, but I doubt there's anything there

steep stone
#

doesn't seem like it

#

i saved the output to a file i can wgetpaste later tho

#

at least we found something that works

silver orchid
#

You can copy and add kernel from USB

#

And the initramfs

steep stone
#

unsure where those are located

#

the EFI partition has a few .efi files, but I can't just mount any of the others, so I need to look that up

silver orchid
#

I don't know either

steep stone
#

i have a couple other linux devices I could copy from

#

ran smartctl out of curiosity

#

self test failed

#

nvm, I'm reading it wrong maybe

#

passed self-test but log failed

silver orchid
#

Can rescue CD use your kernel?

steep stone
#

im getting really tired

#

I wonder what my next steps should be if I can boot into my root with the liveusb kernel

#

all of these are fresh kernels

silver orchid
#

You can live with the rescue USB for some time at least

steep stone
#

on a newly formatted partition

steep stone
#

no networking, can‘t get a GUI

#

which I think is to be expected

silver orchid
steep stone
#

well all I did was append root=uuid=blabla

silver orchid
#

MB

silver orchid
steep stone
#

never dealt with anything like this, so I'm just happy to learn and hopefully fix things

gloomy verge
#

that should boot since it should then be an identically configured kernel (minus version difference)

#

and since I am now growing suspicious of dracut again, maybe you can try downgrading dracut and then regenning the initramfs

steep stone
#

alright sure

#

I never had issues with dracut before, but if this is it, I know why people hate it

#

i know many dracut enemies

#

I'll try an older kernel version as well just to rule things out

#

just can't use LTS due to my graphics card

gloomy verge
steep stone
#

my root partition remained unchanged and I use /dev names in fstab

#

but partition numbers shouldn‘t change

#

(?)

#

i could switch to UUIDs

gloomy verge
steep stone
#

that‘s funny

#

i never saw partition ordering change

#

only device names

gloomy verge
#

it can also change if the uefi initializes them in a different order due to configuration changes or updates

#

in general it is best to use PARTUUID, since it is the "most" static
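If the `/dev/` names in fstab are the suspect, the switch can be scripted. A rough Python sketch, assuming the PARTUUIDs (or UUIDs) have already been collected with `blkid`; `rewrite_fstab` and the mapping passed to it are hypothetical, and rewritten lines are re-joined with tabs rather than keeping the original spacing.

```python
from __future__ import annotations


def rewrite_fstab(fstab_text: str, ids: dict[str, str], kind: str = "PARTUUID") -> str:
    """Replace /dev/<name> sources in fstab with KIND=<id>.

    `ids` maps device paths (e.g. "/dev/nvme0n1p5") to the UUID or PARTUUID
    that blkid reports for them. Comments, blank lines and devices missing
    from `ids` are left untouched; rewritten lines are re-joined with tabs.
    """
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        if fields and not fields[0].startswith("#") and fields[0] in ids:
            fields[0] = f"{kind}={ids[fields[0]]}"
            line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical example using the root partition's PARTUUID from the blkid output.
fstab = "/dev/nvme0n1p5 / btrfs defaults,compress=zstd 0 0\n"
print(rewrite_fstab(fstab, {"/dev/nvme0n1p5": "d70fc61c-1155-8d49-a5c2-e5fda548add2"}), end="")
```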

steep stone
#

they haven't changed yet, but maybe they do during boot

#

since I know that my device lettering definitely changes on different linux installations

#

I‘ll switch to UUIDs for them just to rule it out

#

usually it just throws me an error like "couldn't mount X" when fstab is wrong, but we'll see

#

i should also use a known working kernel from my other install

#

ah well

#

now I get a fucked up UUID kernel panic

#

i got something

gloomy verge
#

well at least that is an error

steep stone
#

the kernel 6.6.8 I took from a different install gets there

#

all newer kernels don‘t

#

they get stuck at the same place as before

gloomy verge
#

hmm, but doesn't that mean the UUID you specify is wrong?

#

maybe it is the same problem, but it just gets stuck in a different point

steep stone
#

I appended smth to the cmdline to try and boot onto the liveusb

#

seems to be that UUID

#

gotta clean that up

#

this kernel gets me somewhere so im leaving debug options as is

gloomy verge
#

another thing to maybe try is USE=generic-uki on gentoo-kernel-bin, this will get you a pregenerated generic initramfs/uki. But this does require setting up systemd-gpt-auto-generator (i.e. giving your root partition the correct id)
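For `systemd-gpt-auto-generator` what matters is the partition *type* GUID, which is separate from the PARTUUID that `blkid` prints. A small Python sketch that names a type GUID (for example one read from `lsblk -o NAME,PARTTYPE`). The ESP value is C12A7328-F81F-11D2-BA4B-00A0C93EC93B; the root value is the x86-64 root type from the Discoverable Partitions Specification and assumes an x86-64 machine. `describe_type` is a hypothetical helper.

```python
import uuid

# Partition *type* GUIDs, not the per-partition PARTUUID.
# ESP: the standard EFI System Partition type. Root: x86-64 root type from
# the Discoverable Partitions Specification (assumes an x86-64 machine).
KNOWN_TYPES = {
    uuid.UUID("c12a7328-f81f-11d2-ba4b-00a0c93ec93b"): "EFI System Partition",
    uuid.UUID("4f68bce3-e8cd-4db1-96e7-fbcaf984b709"): "Linux root (x86-64)",
}


def describe_type(type_guid: str) -> str:
    """Name a partition type GUID; case, hyphens and braces do not matter."""
    try:
        key = uuid.UUID(type_guid)
    except ValueError:
        return "invalid GUID"
    return KNOWN_TYPES.get(key, "unknown/other")


print(describe_type("C12A7328-F81F-11D2-BA4B-00A0C93EC93B"))
```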

steep stone
#

I hate taking phone pics but I‘m also leaving this as a reference for myself

#

hmm, I did make grub not pass the UUID to make that other thing work

#

let me fix that up too

#

interesting that the list of available partitions doesn't even show anything on my nvme

#

also idk why ive been typing on my phone this entire time when I could've just pulled out my laptop

#

after letting GRUB pass the UUID, it's complaining about a different PARTUUID it can't find

#

I'll just manually specify it, what a headache

#

wait that PARTUUID is for the partition

#

but my kernel can't find it

#

I have
`UUID=c78888dc-f31f-4de9-b980-582d146bb86b / btrfs defaults,compress=zstd 0 0` in my fstab,
and the menuentry specifically for that kernel uses the PARTUUID (none of the others do):
`linux /vmlinuz-6.6.8-gentoo-dist root=PARTUUID=d70fc61c-1155-8d49-a5c2-e5fda548add2 ro init=/usr/lib/systemd/systemd debug rd.shell console=tty0`

#
`/dev/nvme0n1p5: UUID="c78888dc-f31f-4de9-b980-582d146bb86b" UUID_SUB="c81655cd-c9ef-41b0-9ab0-50f1827fca47" BLOCK_SIZE="4096" TYPE="btrfs" PARTUUID="d70fc61c-1155-8d49-a5c2-e5fda548add2"`
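One way to confirm that both `root=` forms point at the same partition is to parse the `blkid` line and compare. A Python sketch; the function names are hypothetical, `shlex` takes care of the quoted values, and plain `/dev/...` root paths are deliberately not handled.

```python
from __future__ import annotations

import shlex


def parse_blkid_line(line: str) -> tuple[str, dict[str, str]]:
    """Split one line of `blkid` output into (device, {TAG: value})."""
    device, _, rest = line.partition(":")
    tags = {}
    for token in shlex.split(rest):  # shlex strips the double quotes
        key, _, value = token.partition("=")
        tags[key] = value
    return device.strip(), tags


def root_matches(root_arg: str, tags: dict[str, str]) -> bool:
    """Check a root= value like 'UUID=...' or 'PARTUUID=...' against blkid tags."""
    kind, sep, value = root_arg.partition("=")
    if not sep or kind not in ("UUID", "PARTUUID"):
        return False  # /dev/... paths and other forms are not handled here
    return tags.get(kind, "").lower() == value.lower()


blkid = ('/dev/nvme0n1p5: UUID="c78888dc-f31f-4de9-b980-582d146bb86b" '
         'UUID_SUB="c81655cd-c9ef-41b0-9ab0-50f1827fca47" BLOCK_SIZE="4096" '
         'TYPE="btrfs" PARTUUID="d70fc61c-1155-8d49-a5c2-e5fda548add2"')
dev, tags = parse_blkid_line(blkid)
print(dev, root_matches("PARTUUID=d70fc61c-1155-8d49-a5c2-e5fda548add2", tags),
      root_matches("UUID=c78888dc-f31f-4de9-b980-582d146bb86b", tags))
```

Both checks succeeding would agree with the IDs in the menuentries being correct, which points the search away from a typo in `root=`.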
#

do I need to understand why all the newer kernels have

```
linux   /vmlinuz-6.6.13-gentoo-dist root=UUID=c78888dc-f31f-4de9-b980-582d146bb86b ro init=/usr/lib/systemd/systemd debug rd.shell console=tty0
```
as their menuentry
#

or is this just some grub funniness

#

Idk if manually specifying UUID over PARTUUID does anything here, but it results in the same kernel panic

gloomy verge
#

maybe just some change in grub 2.06 vs 2.12

#

in any case PARTUUID is the better option, though it should not matter

steep stone
#

I know the UUIDs are correct

#

but it can't open them

#

so at least

gloomy verge
steep stone
#

there's an error

steep stone
#

do you know what the causes of this could be?

#

this is the same kernel I was on earlier, where I could mount the root partition and chroot into it

gloomy verge
#

no clue to be honest

#

maybe the firmware has some option to not initialize the nvme drives?

steep stone
#

well, one thing of note is that my other NVME shows up under SATA configuration

#

this one doesn't have any options

#

it's there under NVME config

gloomy verge
#

it's probably one of those drives that is m2 in shape but sata under the hood

steep stone
#

yea

#

that one is

#

my big one isn't

steep stone
#

it's a crappy one I got from an old prebuilt that is barely faster than an HDD

formal obsidian
#

related maybe

steep stone
#

always suspected something was up with it

#

DRAMless too I think

#

I only use it for other OSes to try them out

#

i'm running a full self-test on the main nvme while I'm already in the configuration settings

#

controller and namespace

#

at this point I hope my NVME is just faulty

#

as soon as I put a kernel onto its ESP nothing works anymore

#

same kernel on a USB or different drive? works

#

fsck did report some issues before and after partitioning but I think I had the partition mounted

#

so I chalked it up to that

#

it passed...

formal obsidian
#

this is a sad day

steep stone
#

yea I'm at a loss

#

officially

#

NVME works fine

#

Well I guess the summary for now is:

If I boot the exact same kernel from a different drive and pass the UUID, it boots
If I boot the kernel from the same drive and pass the UUID, it can't find the NVME drive at all

gloomy verge
#

Could you try booting with 'systemd.gpt_auto=0' on the cmdline?

#

When it is enabled it tries to do magic on the partitions that are on the same disk

#

Maybe that is why it works when it is on a different disk

steep stone
#

that didnt seem to have done it

#

unless im using GRUB's editor wrong

#

I just append it and Ctrl+X/F10 right?

gloomy verge
#

Yeah I think that's correct, don't use grub myself tho

steep stone
#

I stopped using it for reasons

#

but rn I kinda feel forced to

#

with efibootmgr just dying

gloomy verge
#

I don't understand why it should be impossible to boot from a disk that is otherwise working

steep stone
#

running an extended self-test with controller and namespace testing in UEFI should be a good way to assess health

#

right?

#

maybe I can run another test

gloomy verge
#

Yeah that should be a proper test

steep stone
#

I could try to move the ESP somewhere on the drive

#

but idk how much of a difference that would make

#

delete swap, move ESP to a different sector, make swap the last partition instead

gloomy verge
#

If you can't boot from the disk this means the firmware doesn't think it exists somehow, but then because you can actually use the disk, the disk itself is probably fine

steep stone
#

which is confusing since I can boot into Windows on the same NVME

#

and mount it when in a different linux install on another drive

formal obsidian
steep stone
#

1 TB

gloomy verge
#

Wait a minute, is your ESP the right kind of FAT?

steep stone
#

I think on this PC VFAT

#

usually I do fat 32

gloomy verge
#

I vaguely remember that some boards are picky on the exact type of fat because the spec only mentions fat32(?)

#

Or it might have been fat16

#

And now that I think about it, I also remember that some boards have a max size of the ESP
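On Linux, `vfat` is the driver name for every FAT variant, so "VFAT vs FAT32" is not really the distinction; the variant follows from the cluster count recorded in the boot sector. A Python sketch of that rule from Microsoft's FAT specification, reading only the BIOS Parameter Block. `fat_type` is a hypothetical helper, and the device path in the usage comment is the `/boot` partition from the lsblk output.

```python
import struct


def fat_type(boot_sector: bytes) -> str:
    """Classify a FAT volume as FAT12/FAT16/FAT32 from its BIOS Parameter Block.

    Uses the cluster-count rule from Microsoft's FAT specification; the
    "FAT32" text label in the boot sector is informational and ignored.
    """
    if len(boot_sector) < 40:
        raise ValueError("need at least the first 40 bytes of the boot sector")
    (bytes_per_sec, sec_per_clus, reserved, num_fats,
     root_entries, total16, _media, fat_size16) = struct.unpack_from("<HBHBHHBH", boot_sector, 11)
    (total32,) = struct.unpack_from("<I", boot_sector, 32)
    (fat_size32,) = struct.unpack_from("<I", boot_sector, 36)
    if bytes_per_sec == 0 or sec_per_clus == 0:
        raise ValueError("not a FAT boot sector")

    root_dir_secs = (root_entries * 32 + bytes_per_sec - 1) // bytes_per_sec
    fat_size = fat_size16 or fat_size32  # FAT32 stores 0 in the 16-bit field
    total = total16 or total32
    data_secs = total - (reserved + num_fats * fat_size + root_dir_secs)
    clusters = data_secs // sec_per_clus
    if clusters < 4085:
        return "FAT12"
    if clusters < 65525:
        return "FAT16"
    return "FAT32"


# Hypothetical usage (needs root): read the first sector of the ESP.
# with open("/dev/nvme0n1p6", "rb") as dev:
#     print(fat_type(dev.read(512)))
```

When formatting, `mkfs.fat -F 32` asks for FAT32 explicitly instead of letting the tool choose.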

steep stone
#

I always kept mine at 512MB

#

leaves space for multiple kernels without being too big

#

I can try FAT32 tho

gloomy verge
#

It also has to have the correct GUID

#

C12A7328-F81F-11D2-BA4B-00A0C93EC93B

#

On the other hand, maybe none of this makes sense since grub does start

steep stone
#

reformatted to Fat32 and yea same result

#

the kernel is an older version of gentoo-kernel-bin, so NVME support should be baked in

gloomy verge
#

I wonder if the disk boots if it is in a different machine

#

But that is not easy to test

steep stone
#

I can try to put it in my laptop but taking it out is a bit painful

#

might be something to do tho

#

just to try

#

upgrading my UEFI firmware should have reset it

#

but I can also still try that

#

I do have a CMOS reset if I jump some pins

#

wouldn't be able to get much further than initial bootup tho I think with one being -v2 and this one being -v3

#

at least gentoo-kernel-bin is just x86-64
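Which x86-64 level a CPU supports can be read off its `/proc/cpuinfo` flags. A Python sketch assuming the psABI level definitions, with the feature names translated to the flag names Linux prints (SSE3 appears as `pni`, LZCNT as `abm`); only the v2 to v4 additions are checked and the baseline is assumed. The helper names are hypothetical.

```python
from __future__ import annotations

# Additions per x86-64 psABI level, spelled as Linux /proc/cpuinfo flags
# (SSE3 is "pni", LZCNT is reported as "abm"). The v1 baseline is assumed.
LEVEL_FLAGS = {
    "x86-64-v2": {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"},
    "x86-64-v3": {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"},
    "x86-64-v4": {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"},
}


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flag set from the first 'flags' line of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def highest_level(flags: set[str]) -> str:
    """Highest level whose additions (and all lower ones) are present."""
    level = "x86-64"
    for name in ("x86-64-v2", "x86-64-v3", "x86-64-v4"):
        if not LEVEL_FLAGS[name] <= flags:
            break
        level = name
    return level


# Usage on the machine itself:
# with open("/proc/cpuinfo") as f:
#     print(highest_level(cpu_flags(f.read())))
```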

gloomy verge
#

Aaah, might have a problem in the initramfs then as well

steep stone
#

yeeeee

#

I don't have any other CPUs this new

#

jumping CMOS pins seems like the best hardware related option

#

probably wrong term for those but you get me

formal obsidian
steep stone
#

I use a screwdriver

#

well that did indeed clear it

#

it's memory training

#

well that doesnt seem to be the root of the issue either

gloomy verge
#

Thought of one more thing to try. What if you mount your ESP at /efi instead of /boot and then install the kernel and initramfs to the main partition instead of having them on the ESP (I assume you currently have them on ESP since you were EFI stub booting before)

steep stone
#

no difference, I‘m considering just getting a new drive once I have the money to be exclusively for gentoo

unique badge
#

where did we get too?

silver orchid
#

@steep stone have you tried downgrading grub? I think there are some noticeably breaking changes with the new grub version and EFI setup. I had a problem with my netbook just today: a grub update broke grub again, but yet again removing /boot/EFI and running grub-install again fixed it. My case is different because I wasn't able to load grub itself, but maybe it's still related to the grub changes...

steep stone
#

if I boot with a liveUSB kernel or I boot with the kernel on a separate drive pointing the ROOT variable to my root partition

#

it works just fine

#

however if I copy that same kernel to my ESP and boot it

#

it doesn‘t recognize my nvme drive

#

it can‘t open the device either by UUID or /dev/name

#

and the list of available devices my kernel panic gives me only includes SATA drives

#

I am starting to think it might be an issue with the drive itself or the firmware

#

but I already reset and updated my firmware and the drive passes all selftests

#

I tried moving my ESP to /efi and loading my kernels from the root partition

#

but it literally can‘t detect the device it is literally loaded from

steep stone
#

I guess if that fails my only option

#

is to actually use the ESP from my Xenia Linux installation

#

for my Gentoo install

#

because I can‘t have the ESP on the same drive anymore or it just fails

formal obsidian
#

or reinstall gentoo 👀

steep stone
#

that wouldn‘t fix it?

#

i already deleted and reformatted my ESP several times

#

so reinstalling gentoo wouldn‘t do anything unless I move it to a different drive

#

well my /boot has to be a different drive not the ESP per se I think

#

maybe the drive is somehow uninitialized post-kernel?

steep stone
#

would make my boot times a few seconds slower but at least I can boot

steep stone
#

Downgraded GRUB and compiled it with GCC, did the same for dracut

#

this is one hell of an adventure.

silver orchid
steep stone
#

no.

silver orchid
#

okay

steep stone
#

I'm just gonna sacrifice something else on one of my 3 disks and use a different drive for my boot part

silver orchid
#

uninstall windows XD

steep stone
#

no.

silver orchid
#

writing this from windows

steep stone
#

although funnily enough

#

Windows is now complaining about changed hardware

#

which may be a result of me updating my firmware or may be indicative of an issue

#

my mouse keeps freezing too altho I can't tell if that's just the game itself or Windows

silver orchid
#

yes I've seen on #chill

steep stone
#

so my computer is like

#

not happy

silver orchid
#

tried with no ramfs so far?

steep stone
#

no

#

I honestly doubt it's even worth the effort at this point

silver orchid
#

yet we know that you can boot using grub from the rescue CD, and grub itself loads fine, but the binary kernel (which works for most people) gets stuck

steep stone
#

ONLY if it's on the same drive

#

if the kernel is on a different drive

#

it works

#

it's not a bootloader issue

#

and it is not a kernel issue

#

since the same kernel works on one drive

#

and not on another

#

across several different -bin kernel versions

#

it's fucked idk man

#

might try to rearrange the partitions on the drive a little bit

#

but the most likely plan of action is that I use a partition on a different drive until I can buy a Gentoo-only nvme

#

I tried using efibootmgr on other drives

#

like my SSD

#

and that didn't work either

#

interestingly enough

unique badge
#

this is so weird, I have no clue what to suggest as the next plan

steep stone
#

yeaaa I'll just try what I outlined earlier

#

given that other OSes are starting to act up in different ways maybe I have an underlying issue

#

Xenia Linux works fine, Windows reports hardware being changed and acts strange

#

Gentoo can't boot

#

🥴

unique badge
#

this might be a #gentoo time as well for more opinion but you might need to pick your times to ask as it's the weekend

#

I'm leaning towards hardware though, maybe check SMART?

steep stone
#

I can try other methods to look at drive health but that one should‘ve been pretty solid

#

maybe efibootmgr acting up hints at mobo stuff? I have no clue

#

and the windows hardware thing

#

do you know if it deactivates its license when you update motherboard firmware

unique badge
#

That's lame, I know SMART isn't the best with SSD always but even a hardware fault sounds like it would be useful

steep stone
#

it shouldn‘t

steep stone
#

so either this is some issue with my activation method/windows iso

#

or smth is being sketchy

unique badge
#

have you tried GRUB? might rule out some issues giving it a test

unique badge
#

I had that when I ran a tech shop and never understood it

steep stone
#

yea quick google either had people whose hardware started returning different values

#

or just Microsoft keyservers

#

acting up

#

everything seemingly works when compiling which is like one of the best stress tests

unique badge
#

agreed there

silver orchid
#

even for asking on IRC or elsewhere, I'd write up the question somewhere with the history, logs, data, everything... for example Unix StackExchange (and if you have karma there, add a big bounty)

steep stone
#

I have a bunch of unallocated space on my HDD

#

would slow boot times but hey

#

as long as I can boot

unique badge
steep stone
#

forgive the terminology since I'm sure it's inaccurate

#

but I wonder if there is such a thing as my drive being uninitialized

unique badge
#

You get Sam, tdr or kurley and you are golden for example

steep stone
#

post-kernel loading

#

since the kernel is loaded from it but then can't physically find it

#

but that would also make Windows not boot

unique badge
steep stone
#

if that happened

#

so I have no clue anymore

silver orchid
steep stone
#

yea I might just write up a document outlining everything and providing the few logs I do have

#

(since the only source of any errors is one specific kernel that panics and all other ones giving no output at all)

unique badge
gloomy verge
#

Seems like a similar problem somehow

steep stone
#

I will try disabling fast boot again and appending that cmdline

#

that seems to be the exact issue I'm having except mine is constant

#

not just "sometimes"

#

maybe I can try an old LTS kernel too?

gloomy verge
#

yes, either LTS or the 6.7 series which the last post seems to suggest is better

steep stone
#

I think I tried to boot into 6.7.6 after updating it in a chroot yesterday and got stuck too

#

but I wanna try 5.10

steep stone
#

6.6.16 should be the only LTS kernel that even boots on my hardware

#

guess I'll try

#

6.1 shouldn't boot on a 7800

steep stone
#

@gloomy verge out of sheer curiosity I just copied a non-gentoo kernel to my gentoo install

#

and it boots

#

this seems to be directly related to gentoo-kernel

#

I also generated the initramfs by copying over the modules from another distro

#

with dracut

steep stone
#

booting vanilla-kernel with a random old config at least gives me funny messages

steep stone
#

seems like these are all non-issues

#

but vanilla with a gentoo-config based custom config

#

doesn't boot

#

building vanilla with chimera config

gloomy verge
#

hmm, do you have any clue which config option causes this difference?

#

or it could be one of our kernel patches I guess

#

(oh no wait, that is nonsense, it must be the config if sys-kernel/vanilla-kernel with our config behaves the same)

steep stone
#

I was gonna see if I can identify any difference

#

gonna build vanilla with the stock non-modified gentoo config too

#

just to confirm

#

or well

#

since vanilla is newer I did do make olddefconfig just to get it working asap

gloomy verge
#

I wonder if building CONFIG_NVME_* into the kernel instead of as a module makes some difference. We have it as a module, but I don't know what other distros/vanilla are doing with respect to nvme.
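To see how a given kernel was built in this respect, a Python sketch that reports `y`, `m` or `n` for the NVMe options in a `.config`. The option list is an assumption (`CONFIG_NVME_CORE` plus the PCIe driver `CONFIG_BLK_DEV_NVME`), as is the config path in the usage comment. If both come back as `m`, the root device can only appear once the initramfs loads the modules.

```python
import re

# Assumed option list: NVMe core plus the PCIe NVMe host driver.
NVME_OPTIONS = ("CONFIG_NVME_CORE", "CONFIG_BLK_DEV_NVME")
_SET = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")


def option_states(config_text: str, names=NVME_OPTIONS) -> dict:
    """Map each option to 'y', 'm' or 'n' (unset or absent) in a kernel .config."""
    states = {name: "n" for name in names}
    for line in config_text.splitlines():
        match = _SET.match(line)
        if match and match.group(1) in states:
            states[match.group(1)] = match.group(2)
    return states


# Hypothetical usage; the config path depends on what the kernel package installed.
# with open("/boot/config-6.7.4-gentoo-dist") as f:
#     print(option_states(f.read()))
```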

steep stone
#

I did diff the config I used to build and gentoo config earlier

gloomy verge
#

Did we ever double check that the initramfs contains the nvme modules?
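A Python sketch for that check: it scans the file listing printed by dracut's `lsinitrd` for the `nvme` and `nvme-core` module files, compressed or not. The helper is hypothetical, looks only at the last field of each line, and the image path in the usage comment is a placeholder that depends on how installkernel names the image.

```python
import re

# nvme.ko / nvme-core.ko, optionally compressed (.xz, .zst, .gz).
_NVME_MODULE = re.compile(r"/(nvme|nvme-core)\.ko(?:\.(?:xz|zst|gz))?$")


def nvme_modules(listing: str) -> set:
    """Return which NVMe module files appear in an `lsinitrd` file listing."""
    found = set()
    for line in listing.splitlines():
        fields = line.split()
        if fields:
            match = _NVME_MODULE.search(fields[-1])
            if match:
                found.add(match.group(1))
    return found


# Hypothetical usage; the image path is a placeholder:
#   lsinitrd /boot/initramfs-<version>.img > listing.txt
# with open("listing.txt") as f:
#     print(nvme_modules(f.read()) or "no NVMe modules found")
```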

steep stone
#

but that was the config after make olddefconfig
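The kernel source tree ships `scripts/diffconfig` for comparing configs; as a stand-in, a Python sketch of the same idea. It treats `# CONFIG_X is not set` and absent options alike as `n`, which can hide differences in non-boolean options, and the file names in the usage comment are hypothetical.

```python
def parse_config(text: str) -> dict:
    """Read CONFIG_X=value lines and '# CONFIG_X is not set' lines (as 'n')."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("CONFIG_") and "=" in line:
            key, _, value = line.partition("=")
            options[key] = value
        elif line.startswith("# CONFIG_") and line.endswith(" is not set"):
            options[line[2:-len(" is not set")]] = "n"
    return options


def diff_configs(old_text: str, new_text: str) -> dict:
    """Return {option: (old, new)} for every option whose value differs."""
    old, new = parse_config(old_text), parse_config(new_text)
    changes = {}
    for key in sorted(old.keys() | new.keys()):
        before, after = old.get(key, "n"), new.get(key, "n")
        if before != after:
            changes[key] = (before, after)
    return changes


# Hypothetical usage: compare the Gentoo config with the olddefconfig result.
# from pathlib import Path
# for key, (a, b) in diff_configs(Path("config.gentoo").read_text(),
#                                 Path(".config").read_text()).items():
#     print(f"{key}: {a} -> {b}")
```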