#Server crash with no logs

dusty depot
#

Dear all,

We are currently running Project Zomboid version 42.14.1 and are experiencing continuous server crashes that result in a crash loop. The server shuts down without generating any error logs, making it difficult to identify the root cause.

We have performed extensive troubleshooting, including migrating the server to a dedicated VPS environment. Unfortunately, the crashes persist even after the migration, and we are still unable to determine what is causing them.

The only temporary workaround we have found is setting a password on the server. This allows us to connect, join the server, and manually trigger a world save. However, this does not resolve the underlying issue, and the crash loop resumes shortly afterward.

Did anyone experience the same kind of issue? Can you help us?

Thank you in advance for the support.

#

@real hatch

dusty depot
#

@ripe stratus apologies for mentioning you directly. I am doing so because our problem is highly disruptive and it’s creating major problems for the user experience

#

It’s making it impossible for players to play during peak hours. Our VM has 192 GB of RAM and an AMD Ryzen 9 at 5.67 GHz

My assumption is that it’s not a resource problem, because the server barely reaches 10% load with around 10 players online. We also cannot find a root cause, due to the lack of logs from both the OS and Project Zomboid

#

We allocated a minimum of 16GB of RAM and a maximum of 128GB

#

I am tagging @real hatch; he and I are both sysadmins actively working on the problem to identify the root cause

ripe stratus
#

Can you share a screenshot of the in-game "Show Statistics" screen from the Admin menu?

#

And can you check the server install folder for any "hs_err_pid" files? Can you share all of them here?
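A minimal sketch of that check, assuming the HotSpot default file name pattern `hs_err_pid<pid>.log` (written to the JVM's working directory unless configured otherwise). The function name `find_hs_err_logs` is hypothetical, and the directory passed in the example is the install path that appears in the journal entry later in this thread:

```python
from pathlib import Path


def find_hs_err_logs(install_dir):
    """Return hs_err_pid*.log files under install_dir, newest first."""
    root = Path(install_dir)
    # rglob also searches subfolders, in case the JVM was started from one.
    logs = list(root.rglob("hs_err_pid*.log"))
    return sorted(logs, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    # /opt/pzserver is the path shown in the bot's log; adjust as needed.
    for log_file in find_hs_err_logs("/opt/pzserver"):
        print(log_file)
```

An empty result only means no fatal-error file was written in that folder; a JVM that exits on its own does not produce one.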

#

Lastly, can you explain the nature of the crashes further? Does the server freeze, crash, what happens when you manually launch it again? And what do you mean by a "crash loop" exactly?

real hatch
# ripe stratus Lastly, can you explain the nature of the crashes further? Does the server freez...

For the crash loops, we're deploying the server with a custom Python-based Discord bot

The bot starts the servers as child processes and gives them top OS priority

Once the bot notices, in a regular interval check, that the child process is dead (server crash), it waits until all PZ-related processes have fully quit (saving world state and such in a crash event) and then restarts the server

All of those restarts keep crashing after the boot process fully finishes and the server starts listening for connecting players. As soon as any player attempts to connect, the process implodes: the JVM exits cleanly (so we can't catch any JVM exception trace externally), PZ itself apparently performs a clean server shutdown because its JVM is exiting, and the server starts looping in that state

Eventually it does clear up by itself and starts working, but only after a dozen or so restarts like that. The only reliable workaround we have is to put a temporary password on the server, let it boot and run without players for a minute or two, manually run /save on it, and then have an admin connect and spend around a minute in the world. Only after that do we shut down the server, remove the password, and allow the playerbase to connect, at which point it works without any issues again (until the next crash event happens)
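The restart loop described above can be sketched roughly as follows. This is not the actual bot code: `supervise`, `max_runs`, `poll_interval`, and `restart_delay` are hypothetical names, and the sketch leaves out the process priority handling and the wait for other PZ-related processes to quit.

```python
import subprocess
import sys
import time


def supervise(cmd, max_runs=3, poll_interval=0.1, restart_delay=0.1):
    """Run cmd, restart it whenever it exits, and record each return code."""
    exit_codes = []
    for _ in range(max_runs):
        child = subprocess.Popen(cmd)
        # poll() returns None for as long as the child is still alive.
        while child.poll() is None:
            time.sleep(poll_interval)
        # On POSIX, a negative value means "killed by signal N".
        exit_codes.append(child.returncode)
        time.sleep(restart_delay)
    return exit_codes


if __name__ == "__main__":
    # Stand-in for start-server.sh: a child that exits right away.
    demo_cmd = [sys.executable, "-c", "raise SystemExit(0)"]
    print(supervise(demo_cmd, max_runs=2))
```

Recording the return code of every run is what makes a loop like this useful for diagnosis, provided the launched script passes the real JVM exit code through.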

real hatch
#

We also found that the start-server.sh script masks the JVM exit code with 0, so we edited it to capture the actual exit code (for example, 137 for SIGKILL or 134 for SIGABRT). When the next crash event happens we'll have more details on the process death itself, if we don't find anything more before then
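For reading such codes, a small sketch that assumes the usual shell convention of reporting a signal death as 128 plus the signal number. `describe_exit` is a hypothetical helper, not part of start-server.sh or the bot, and a program can also choose a value above 128 on its own, so the result is a hint rather than proof:

```python
import signal


def describe_exit(code):
    """Translate a shell-style exit status into a readable description."""
    if code == 0:
        return "clean exit"
    if code > 128:
        signum = code - 128
        try:
            # signal.Signals maps a signal number to its symbolic name.
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            return f"killed by unknown signal {signum}"
    return f"exited with error code {code}"


if __name__ == "__main__":
    print(describe_exit(137))
```

Note that Python's own `subprocess` reports a signal death as a negative return code instead; the 128 + N form is what a wrapping shell script sees.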

ripe stratus
#

And to lower any issue vectors, can you remove the custom bot and see what happens when the server crashes and you manually start it without the use of any external or automation management tools?

real hatch
# ripe stratus And to lower any issue vectors, can you remove the custom bot and see what happe...

We did test runs without the bot (just by running start-server.sh in tmux) to verify, and the crash still occurs in the same manner, with the same fallout of crashing on every subsequent run until we 'fix' it with a temporary password

The bot is just acting as a remote server overseer so we don't have to ssh into it every time and to monitor and execute scheduled restarts and restart on crash

We attempted that fix a week ago while trying to debug for solutions ourselves. We didn't keep any crash logs from them as they were pretty much the same as with the automated bot, no differences

ripe stratus
real hatch
# ripe stratus Did you also check out dmesg or journalctl for any logs when these crashes happe...

Yeah, dmesg is clean, no kernel-level events
And journalctl only shows the bot's own log of the server crash (exit code 0), nothing from the kernel or the OS

We're currently running the server on a bare-metal Ryzen machine with 180G of available RAM on Ubuntu Server 24.04, running PZ with the flags -Xms8g and -Xmx128g. We were on a VPS earlier with the same crashes, and on a Windows Server VPS with the same crashes. (We first thought it might be an OS issue: switching from Windows greatly reduced the crashes, and switching from the VPS to bare metal reduced them again, but they're still occurring.)

#

Journalctl entry:

  Feb 24 19:56:46 Ubuntu-2404-noble-amd64-base pzbot[35227]: 2026-02-24 19:56:46,314 [INFO] bot.scheduler.auto_start: Auto-start: server crashed, entering recovery...
  Feb 24 19:56:51 Ubuntu-2404-noble-amd64-base pzbot[35227]: 2026-02-24 19:56:51,584 [INFO] bot.scheduler.auto_start: Auto-start: all blockers cleared, ready to restart
  Feb 24 19:56:51 Ubuntu-2404-noble-amd64-base pzbot[35227]: 2026-02-24 19:56:51,584 [INFO] bot.scheduler.auto_start: Auto-start: starting server...
  Feb 24 19:56:51 Ubuntu-2404-noble-amd64-base pzbot[35227]: 2026-02-24 19:56:51,584 [INFO] bot.server.process: Starting PZ server: /opt/pzserver/start-server.sh
  Feb 24 19:56:51 Ubuntu-2404-noble-amd64-base pzbot[35227]: 2026-02-24 19:56:51,585 [INFO] bot.server.process: PZ server spawned (PID 37862)
  Feb 24 19:57:03 Ubuntu-2404-noble-amd64-base pzbot[35227]: 2026-02-24 19:57:03,425 [INFO] bot.cogs.chat_relay: Tailing 2026-02-24_19-56_chat.txt (pos=454)
  Feb 24 19:57:08 Ubuntu-2404-noble-amd64-base pzbot[35227]: 2026-02-24 19:57:08,594 [INFO] bot.server.process: PZ server is ready (took ~12s)
  Feb 24 19:57:08 Ubuntu-2404-noble-amd64-base pzbot[35227]: 2026-02-24 19:57:08,594 [INFO] bot.cogs.chat_relay: DayLength: 120 real minutes per in-game day
  Feb 24 19:57:08 Ubuntu-2404-noble-amd64-base pzbot[35227]: 2026-02-24 19:57:08,594 [INFO] bot.bot: Restart schedule reset: next restart in 360 minutes
  Feb 24 19:57:44 Ubuntu-2404-noble-amd64-base pzbot[35227]: 2026-02-24 19:57:44,977 [INFO] bot.cogs.chat_relay: Tailing 2026-02-24_19-56_user.txt (pos=80)
  Feb 24 19:58:11 Ubuntu-2404-noble-amd64-base pzbot[35227]: 2026-02-24 19:58:11,206 [WARNING] bot.server.process: PZ server exited unexpectedly (code 0)
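
As a rough aid for reading the entry above, this sketch computes how long the server process lived, using the bot's own timestamps from the "spawned", "is ready", and "exited unexpectedly" lines. `seconds_between` is a hypothetical helper; the timestamp format is taken from the log lines themselves.

```python
from datetime import datetime

# Matches the bot's timestamps, e.g. "2026-02-24 19:56:51,585".
STAMP_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def seconds_between(start_stamp, end_stamp):
    """Return the elapsed seconds between two bot log timestamps."""
    start = datetime.strptime(start_stamp, STAMP_FORMAT)
    end = datetime.strptime(end_stamp, STAMP_FORMAT)
    return (end - start).total_seconds()


spawned = "2026-02-24 19:56:51,585"
ready = "2026-02-24 19:57:08,594"
exited = "2026-02-24 19:58:11,206"

print(f"spawn to exit: {seconds_between(spawned, exited):.3f} s")
print(f"ready to exit: {seconds_between(ready, exited):.3f} s")
```

In this run the process lived for roughly 80 seconds after being spawned and roughly 63 seconds after the bot considered it ready.
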
ripe stratus
#

Same crashes... is anything similar between these configurations in that case? On this server and the previous servers.

#

Usually 128g is overkill for the server, beyond overkill I would say, did you have the same RAM config before as well?

#

And did you do a fresh install on this new server or did you copy anything from the other VPS?

ripe stratus
#

Since the crashes are easily reproducible by just connecting to the server after the reboot, it allows us to test the configs and potential fixes

real hatch
ripe stratus
#

It would be very difficult to OOM at 16g

#

I had 80 players playing, the RAM usage was 10g

real hatch
#

We're using a shit ton of the KI5 car mods, not sure if they're eating into the RAM as well

ripe stratus
#

In this instance, you had only 5 GB used

real hatch
#

Just checked, we're on 17G currently 😅

ripe stratus
#

And what is the value it drops to after the GC runs?

#

Since that is the important number

real hatch
#

No, sorry, that's 17G on the system, the PZ process seems to be on 9.1

ripe stratus
#

We are mainly setting the heap, this also measures non-heap assignments, you cannot really control those
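To separate the process's own footprint from system-wide usage on Linux, the resident set size can be read from `/proc/<pid>/status`. A minimal sketch, assuming the standard `VmRSS:` line in that file; `parse_vm_rss_kb` and `read_vm_rss_kb` are hypothetical helper names. The value covers heap and non-heap memory together, so it can sit above the configured heap size:

```python
def parse_vm_rss_kb(status_text):
    """Extract VmRSS (resident set size, in kB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:   9541632 kB".
            return int(line.split()[1])
    return None


def read_vm_rss_kb(pid):
    """Read the resident set size of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_vm_rss_kb(status_file.read())
```

Calling `read_vm_rss_kb` with the PID of the Java process gives the figure to compare against the -Xmx setting.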

#

Afaik, KI5 mods would mainly add to the non-heap usage

#

Though I have not tested that

real hatch
#

Alright, so, you want us to use the default -Xmx16g and remove the -Xms flag, and with the JVM exit code catcher modification in start-server.sh we should be able to gather some more data and come back after the next crash event?

Or do you have something different / additional in mind that we might try now?

ripe stratus
#

That is basically it for the test, yes, though I am also wondering: are you running the server directly through the .sh or using the systemd service?

#

Since the wiki has that as well, though we recommend against using that