#[SOLVED] Randomly rebooting, no obvious cause


brave tulip
#

Arch seems to cause my system to randomly reboot for unknown reasons. This does not occur on Windows (which I have on a completely different drive) (tested only once). The issue started this morning after powering on my system. Running sudo journalctl -b -1 doesn't seem to show anything obvious as to why my system is rebooting.

Any tips on how to resolve this?

runic kraken
#

system specs?

#

this is a known issue on some ryzen 5900x/5950x systems

brave tulip
#

excuse the unconfigured fastfetch

#

i ended up doing some testing and it appears that the issue was somehow related to my installation of discord

#

eg stopping the discord client from running stopped the issue

#

i uninstalled discord (which i'd also been using with vencord), deleted whatever trash it left in .config, and then reinstalled it

#

things seem to work fine without crashing now. it's not an arch specific issue so this can be closed

brave tulip
#

also for anyone else reading this i used the arch repos version of the discord client, not a flatpak or anything

#

[SOLVED] Randomly rebooting, no obvious cause (it was Discord/Vencord)

#

never mind, not fixed

#

stock discord client still causes reboots

brave tulip
#

repo ver is .102, downgrading to .101 and seeing how that works via the downgrade pkg in AUR

brave tulip
#

downgrading to even .097 isn't fixing the issue

#

which i know didnt have this issue

#

i know discord forces some updates so im really not sure how to fix this. i can try switching to flatpak

brave tulip
#

it's just not doing it anymore??? idk what is happening 😭

brave tulip
#

it's something to do with discord but i'm not sure what's causing it. just switching to the flatpak version temporarily

brave tulip
#

happens even with the flatpak. routing all output of the flatpak to a log file reveals nothing of interest

runic kraken
#

tbh it sounds like the issue isn't discord (and frankly, i dont see how discord would cause this)

#

what kind of circumstances is it rebooting under? i assume you're using cpu at stock settings
if it's idle, it might be a similar issue to the zen 3 cpus i described above. basically, linux will let more cores drop to lower cstates than usually possible in windows and that can drop voltage enough to reveal ccd instability. this also happens when undervolting if you've pushed it too low, and it's very sporadic and can be difficult to test for.
if it's under load it might be a psu issue.

Also, it would be a good idea to check the journal after a reboot to see if the kernel logged anything useful. I can guide you through it if you need. That should point us in the right direction to investigate whether this is a hardware or software problem
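
For anyone who wants to look at the idle states mentioned above, here is a minimal sketch. It assumes the standard Linux cpuidle sysfs layout under /sys/devices/system/cpu/cpu0/cpuidle. The function name `list_cstates` and its optional directory argument are made up for illustration. It only reports which idle states the kernel exposes and how often each was entered; it cannot prove or rule out an idle-voltage problem.

```shell
#!/bin/sh
# List the idle states (C-states) the kernel exposes for one CPU.
# The optional argument is a cpuidle directory; the default is the
# usual sysfs location for cpu0 (assumed to exist on this machine).
list_cstates() {
    dir="${1:-/sys/devices/system/cpu/cpu0/cpuidle}"
    if [ ! -d "$dir" ]; then
        echo "no cpuidle directory at $dir" >&2
        return 1
    fi
    for state in "$dir"/state*; do
        [ -d "$state" ] || continue
        name=$(cat "$state/name")        # e.g. POLL, C1, C2
        usage=$(cat "$state/usage")      # times this state was entered
        disabled=$(cat "$state/disable") # 1 if the state is disabled
        printf '%s name=%s entered=%s disabled=%s\n' \
            "${state##*/}" "$name" "$usage" "$disabled"
    done
}
```

Calling `list_cstates` with no argument reads cpu0. If the deeper states show a high entry count at idle, the system is at least reaching them.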

brave tulip
# runic kraken what kind of circumstances is it rebooting under? i assume you're using cpu at s...

it occurs basically when I'm chilling at idle for extended periods of time. having discord running seems to shorten the amount of time required for a reboot to occur but last night it did eventually reboot by itself without discord being open, so you're def right about it being divorced from discord
I decided to run memtest last night and it passed, so it doesn't seem to be a hardware issue. stressing the system with stress also didn't seem to cause a reboot, but maybe more testing is needed there

#

CPU is totally stock

runic kraken
#

tbh this does sound like an idle voltage problem

#

cpu isn't defective this just happens sometimes

brave tulip
#

I mean what's weird is that I've had this system installed for like 2 months and this hasn't happened until just now

runic kraken
#

that is weird

#

next time it reboots try running sudo journalctl -k -r -b -1
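
In the command above, -k limits output to kernel messages, -b -1 selects the previous boot, and -r prints newest entries first. Below is a hedged sketch of one way to narrow that output down. The helper name `filter_hw_errors` and its keyword list are illustrative assumptions, not a complete list of what a failing system can log. A hard reset can also leave nothing behind, because the last messages may never reach the disk.

```shell
#!/bin/sh
# Filter kernel log text on stdin for lines that often accompany
# hardware faults. The keyword list is an assumption for illustration.
filter_hw_errors() {
    grep -i -E 'mce:|machine check|hardware error|edac|watchdog|panic'
}

# Intended use on a systemd machine (previous boot, kernel messages only):
#   sudo journalctl -k -b -1 --no-pager | filter_hw_errors
# Listing boots first helps confirm that -1 really is the boot that crashed:
#   journalctl --list-boots
```

An empty result does not clear the hardware. It only means the kernel did not get to log one of these patterns before the machine went down.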

brave tulip
#

kk, I'll lyk. thank u for the help so far

brave tulip
#

@runic kraken

runic kraken
#

did this happen right after boot?

brave tulip
#

the crash? happened within a few minutes

#

had stock discord open, was browsing some forums, then i crashed

#

after i rebooted i ran the command you gave and uploaded it

runic kraken
#

alr, im just mentioning because it looks like this happened right after init from the logs

brave tulip
#

it def wasn't right after init, i was able to use the pc for like 5 minutes

runic kraken
#

there's no clear indication here of anything going wrong on the software end, the only weird thing at all is a usb device resetting
i would try enabling PBO and running +5 CO all core (should increase voltage by about 15mV, will consume a little more power and run a degree or two hotter, just see if it works)

brave tulip
#

okay, and just as a sanity check question; I keep windows 10 on a separate drive. I think on Sunday night I started updating windows, then I booted into Linux, so the windows update never finished (since it needs to reboot back into windows to finish). now I just noticed that I can't actually boot into my windows drive anymore. is there any chance that I fucked up my Linux install by not booting back into windows to finish the update?

#

could also be a hardware issue that just happened to show up tho

runic kraken
#

you should probably fix your windows install though

brave tulip
#

okay yea the bios is now freezing whenever I try to save the PBO settings

#

definitely a hardware issue at this point rofl

runic kraken
#

definitely a novel failure mode lol

#

never heard of that one before

#

slight overvolting should provide more stability at the same clocks

#

what board are you running if I may ask?

brave tulip
#

"MSI MAG X670E TOMAHAWK WIFI ATX AM5 Motherboard"

#

I just somehow failed a memtest even though last night I passed it so at this point I'm removing a RAM stick and testing one at a time

runic kraken
#

honestly looks like it's getting garbage fed into it from somewhere? but it's both instruction and data so it can't be the decoder or l1 caches

#

When I was undervolting i did run into L2 cache poisoning which could manifest as the same symptoms? though it must be severe to lock up the cpu

#

if you can get into OS I would suggest installing rasdaemon. It can't catch anything if the system goes down but if it's just ambiently spitting out recoverable MCEs it'll catch them

brave tulip
#

also it seems like the bios settings set themselves back to default after a recent reboot since i had to re-disable secure boot

#

memtests both separately passed when in slot 2. i'm currently just running the system to see if it'll fail with just one memory stick; perhaps the 4th slot is somehow fucked

#

also sorry i'm just kinda using this to track the progress i've made and what tests i've done on my PC

#

though i am down to try anything you recommend to troubleshoot the issue. all these parts are pretty new so

#

kk i've now got rasdaemon running via systemctl

brave tulip
#
blake@main ~> rasdaemon -f
rasdaemon: Can't locate a mounted debugfs
blake@main ~> type debugfs
debugfs is /usr/bin/debugfs
#

it was running via systemctl on that boot so whatever messages it prints should be there

#

reboots with any stick in slot 2. probably going to have to bring it in to a repair shop unless you have other ideas

runic kraken
#

there's nothing in there
iirc for foreground you need to run it as root
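
A note on the output pasted above: /usr/bin/debugfs is the ext2/3/4 filesystem debugger from e2fsprogs, which is unrelated to the kernel's debugfs filesystem that rasdaemon is looking for. The sketch below checks whether a debugfs filesystem is mounted. The function name `debugfs_mounted` is hypothetical, and /sys/kernel/debug is assumed as the conventional mount point.

```shell
#!/bin/sh
# Succeed if a filesystem of type debugfs appears in the mount table.
# The optional argument is a mounts file; /proc/mounts is the default.
debugfs_mounted() {
    mounts="${1:-/proc/mounts}"
    awk '$3 == "debugfs" { found = 1 } END { exit !found }' "$mounts"
}

# Possible follow-up, run by hand (all three need root):
#   debugfs_mounted || sudo mount -t debugfs none /sys/kernel/debug
#   sudo rasdaemon -f          # run in the foreground
#   sudo ras-mc-ctl --summary  # summarize recorded errors, if any
```

If debugfs is already mounted and the message still appears, running as an unprivileged user is the more likely cause, as suggested above.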

#

I would only take it in as a last resort tbh. this still looks like a faulty CPU to me and it'd be a lot cheaper to just rma it if we can confirm that it is

#

just to check, you're running latest bios + microcode?
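
One rough way to answer the question above from a running system is sketched here. It assumes /proc/cpuinfo has a "microcode" line, as it does on x86 Linux. The function name `microcode_revision` is made up for this example.

```shell
#!/bin/sh
# Print the microcode revision reported for the first CPU.
# The optional argument is a cpuinfo-style file; /proc/cpuinfo is the default.
microcode_revision() {
    cpuinfo="${1:-/proc/cpuinfo}"
    awk -F': *' '/^microcode/ { print $2; exit }' "$cpuinfo"
}

# Related checks, run by hand:
#   cat /sys/class/dmi/id/bios_version /sys/class/dmi/id/bios_date
#   pacman -Q amd-ucode   # Arch package that ships AMD microcode updates
```

The revision alone does not say whether it is the latest. It has to be compared against the board vendor's current BIOS release notes.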

#

I'll check back here in the morning it's very late and I gotta sleep

brave tulip
#

okay, good news: bios update installed. bad news: grub no longer appears in my boot menu, so I can't boot into arch. good news: I can boot into windows and grab the arch ISO, then use cfdisk to try and triage

brave tulip
#

oh my GOD that took forever

#

so my grub installation somehow got like corrupted despite winblows being on a separate drive

#

so in order to save shit i got systemrescue to save whatever stuff i hadn't backed up yet (because i'm lame and busy and regularly forget lmao)

#

then after fucking around with installing grub (which would give weird errors sometimes; probably PEBKAC)

#

i realized what i had to do and which order i had to do it

#

i didn't realize that grub would not recognize existing linux installs so i had to reinstall the base package in order for grub-mkconfig to recognize Oh Yeah I Should Probably Boot To That
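
The order of steps described above can be sketched roughly as follows, run from the Arch ISO. This is a reconstruction under assumptions, not the exact commands used in this thread: it assumes a UEFI system with the EFI system partition mounted at /boot, and it assumes the missing piece was the kernel image (the `linux` package), since grub-mkconfig only emits a Linux entry when it finds a kernel under /boot. The device names and the `run` and `repair_grub` helpers are hypothetical. By default the script only prints the commands.

```shell
#!/bin/sh
# Print each step (default) or execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# $1 = root partition, $2 = EFI system partition.
# Both are placeholders; check lsblk for the real device names first.
repair_grub() {
    root_part="$1"
    esp_part="$2"
    run mount "$root_part" /mnt
    run mount "$esp_part" /mnt/boot
    # Reinstall the kernel so /boot contains an image for GRUB to find.
    run arch-chroot /mnt pacman -S linux
    # Re-register GRUB with the firmware, then regenerate its menu.
    run arch-chroot /mnt grub-install --target=x86_64-efi --efi-directory=/boot --bootloader-id=GRUB
    run arch-chroot /mnt grub-mkconfig -o /boot/grub/grub.cfg
}
```

Running `repair_grub /dev/sdX2 /dev/sdX1` prints the five commands for review. Setting DRY_RUN=0 and running as root would execute them, so the placeholders must be replaced first.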

#

anyway NOW i'm back in arch on my system. and hopefully the issue is fixed. i'm gonna be so fucking mad if it's not because i've been doing this shit since i got up

brave tulip
#

thank you so much for the help btw laceflower, you're my savior

#

even if this doesn't work ur tips have been extremely helpful and i appreciate you spending time helping me resolve my issue

brave tulip
#

seems to have totally fixed the issue, tysm! i'll close this issue sometime tonight