#EXT4-fs error, Kernel panic and kdump not working

dull crag
#

Every few days and seemingly randomly this server crashes with this output on the screen.

From the output it seems to me that there are three problems:

  1. EXT4-fs errors.
  2. Kernel panics.
  3. kdump is not able to save the dump or recover the system.

I can't find anything suspicious in journalctl, and obviously I don't have a kernel dump. The crashes have persisted across multiple system and kernel updates, but the system had run for a long time before this problem started.

How should I go about understanding and fixing the problem?
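For the kdump part, a minimal Python sketch of a sanity check, assuming a Linux host: it looks for a `crashkernel=` reservation on the kernel command line and reads `/sys/kernel/kexec_crash_loaded`, which reports `1` when a capture kernel is loaded. The function names are made up for illustration, and this only confirms that the prerequisites are in place, not that a dump would actually be written.

```python
from pathlib import Path


def crashkernel_param(cmdline):
    """Return the value of crashkernel= from a kernel command line, or None."""
    for token in cmdline.split():
        if token.startswith("crashkernel="):
            return token.split("=", 1)[1]
    return None


def crash_kernel_loaded(path="/sys/kernel/kexec_crash_loaded"):
    """True/False if the sysfs flag is readable, None if it cannot be read."""
    try:
        return Path(path).read_text().strip() == "1"
    except OSError:
        return None
```

On the affected machine, `crashkernel_param(Path("/proc/cmdline").read_text())` returning `None`, or `crash_kernel_loaded()` returning `False`, would suggest that kdump never had a capture kernel to boot into.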

viscid hollow
#

I advise running a SMART test and fsck.
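As a rough illustration of what to look for in the SMART data, here is a small Python sketch that scans the ATA attribute table (the kind printed by `smartctl -A`) for a few attributes commonly associated with failing media. The watched attribute list and the function name are assumptions of this sketch; NVMe drives report health in a different format and are not covered.

```python
# Attributes often tied to failing media. This list is an assumption of the
# sketch; which attributes a drive reports varies by vendor.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def suspicious_smart_attributes(smartctl_output):
    """Return {attribute_name: raw_value} for watched attributes with a non-zero raw value."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in WATCHED and raw.isdigit() and int(raw) > 0:
            found[name] = int(raw)
    return found
```

An empty result does not prove the disk is healthy; it only means none of the watched counters are non-zero. For the filesystem side, a read-only pass such as `fsck -n` on the unmounted partition avoids modifying anything while checking.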

austere loom
#

This is almost assuredly a hardware issue. Is this in any kind of RAID configuration?

dull crag
#

Just got another one today, this time with no mention of EXT4-fs.

austere loom
dull crag
# austere loom It may be that the ext4 error was simply a side effect of this. This is definite...

Indeed, very good point regarding the same location; I hadn't noticed that before. I'll run a memtest as soon as I have physical access to the machine.
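If "same location" refers to the faulting instruction pointer, a small Python sketch like the following could tally it across crashes. It assumes the panic screens have been transcribed to text and contain the usual x86-64 `RIP: 0010:symbol+0xoff/0xlen` line; the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches the x86-64 oops line "RIP: 0010:symbol+0xoff/0xlen" and captures the symbol.
RIP_RE = re.compile(r"RIP:\s*[0-9a-fA-F]{4}:(\S+?)\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+")


def faulting_symbols(crash_texts):
    """Count which symbol the first RIP line of each crash text points at."""
    counts = Counter()
    for text in crash_texts:
        match = RIP_RE.search(text)
        if match:
            counts[match.group(1)] += 1
    return counts
```

If every crash lands on the same symbol, that is a different picture from crashes scattered across unrelated code, so the tally is worth having either way.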

The last kernel update was indeed a good few months ago, because I don't often have physical access to the machine and I avoid updating remotely. That said, this problem also happened before the last kernel update, so I've ruled the update out as the cause, at least with a high degree of probability.

I'll post the results of the memtest soon, thank you so much!

dull crag
#

Interestingly, the memtest passed without errors. (I ran a few cycles and rebooted to run it again before the image was taken.)
Does this rule out a hardware problem? I guess I can update the kernel again to a current one, but I suspect that won't fix the problem.

austere loom
#

I'm still on the side that it's probably a hardware issue tbqh, there's not much that would cause a kernel panic during a page fault.

#

It's possible it's a software issue, but I highly doubt it.

dull crag
#

Okay, so I'll keep exploring the hardware angle. I can try disconnecting storage to isolate the issue, but since I don't have a way to trigger the problem and the services on the server require the storage to be attached, I'm not sure how conclusive these tests will be.