# Crash after each update

restive shell

Hey Krill
My agents crash after each automatic update. What can be wrong?

vapid hedgeBOT

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

restive shell

They're on an external Hetzner server. OpenClaw doctor helps, but they crash again after updating.

reef martenBOT

Most likely it is not the download step; it is the post-update restart/state reload that is failing.

First, can you paste:

openclaw --version
openclaw status --all
openclaw gateway status --deep
openclaw doctor

Then, right after the next crash:

openclaw logs --follow
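If it is easier to paste everything at once, here is a small bash sketch that runs the read-only commands above and saves their output to one file. The function name and file naming are made up for illustration. It deliberately leaves out openclaw doctor, because doctor may repair state and hide the first failure.

```shell
# Hypothetical helper: run the read-only diagnostics listed above and save
# all output to one file. openclaw doctor is deliberately NOT included,
# because it may repair state and hide the original failure.
collect_openclaw_diag() {
  local out="${1:-openclaw-diag-$(date +%Y%m%d-%H%M%S).txt}"
  local cmd
  {
    for cmd in \
      "openclaw --version" \
      "openclaw status --all" \
      "openclaw gateway status --deep"; do
      echo "\$ $cmd"
      # $cmd is left unquoted on purpose so it splits into command + args
      $cmd 2>&1 || echo "(exit status $?)"
      echo
    done
  } > "$out"
  echo "$out"   # print the file name so you can cat or upload it
}

# usage:
#   collect_openclaw_diag              # writes openclaw-diag-<timestamp>.txt
#   collect_openclaw_diag diag.txt     # writes diag.txt
```

openclaw logs --follow is left out as well, since it keeps streaming and never returns.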

The most common causes I’d check are:

  • stale/duplicate gateway service after restart
  • stale session store / lock files (sessions.json, session locks)
  • Node/service path mismatch after update, especially with version managers
  • source install update issues on beta/dev (pnpm/bootstrap/build drift)

A useful clue: the auto-updater reuses the same update path as openclaw update, so if manual updates crash too, the scheduler is probably not the real problem.

If you want a temporary stability workaround while debugging, disable auto-apply and update manually for a bit:
https://docs.openclaw.ai/install/updating

Relevant docs:

Paste the outputs and I’ll help narrow it down.

That pattern usually means doctor is repairing stale state/service drift, and the next update is re-triggering it.

On a Hetzner box, my top suspects are:

  • stale or duplicate systemd service after restart
  • service config mismatch (Config (cli) vs Config (service))
  • session store / lock file issue that doctor keeps cleaning up
  • install-method mismatch (source vs npm vs container)

Can you paste these?

openclaw --version
openclaw gateway status --deep
openclaw status --all
openclaw doctor
openclaw logs --follow

The key thing I want to see is whether gateway status --deep shows:

  • extra gateway-like services
  • config mismatch
  • port conflict
  • restart failure hints

If it is Linux/systemd drift, that command usually catches it.
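If you want to scan a saved status --deep output for those red flags, a rough bash sketch is to grep for the line formats OpenClaw prints in the outputs pasted in this thread (Config (cli), Config (service), "is already in use", RPC probe: failed). The helper name is hypothetical and it assumes those line formats stay as shown, so treat it as a first pass, not a replacement for reading the output.

```shell
# Hypothetical helper: read `openclaw gateway status --deep` output on stdin
# and flag the drift symptoms listed above. Line formats are assumed to
# match the outputs pasted in this thread.
check_gateway_status() {
  local text cli svc found=0
  text=$(cat)
  cli=$(printf '%s\n' "$text" | sed -n 's/^Config (cli): //p' | head -n 1)
  svc=$(printf '%s\n' "$text" | sed -n 's/^Config (service): //p' | head -n 1)
  # config mismatch: CLI and service read different config files
  if [ -n "$cli" ] && [ -n "$svc" ] && [ "$cli" != "$svc" ]; then
    echo "config mismatch: cli=$cli service=$svc"
    found=1
  fi
  # port conflict: another process already holds the port
  if printf '%s\n' "$text" | grep -q 'is already in use'; then
    echo "port conflict"
    found=1
  fi
  # restart failure hint: the gateway process runs but does not answer RPC
  if printf '%s\n' "$text" | grep -q '^RPC probe: failed'; then
    echo "rpc probe failed"
    found=1
  fi
  if [ "$found" -eq 0 ]; then
    echo "no obvious drift"
  fi
}

# usage: openclaw gateway status --deep | check_gateway_status
```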

Also, how is it installed on Hetzner?

  • git/source
  • npm/pnpm global
  • Docker/Compose
restive shell

🦞 OpenClaw 2026.4.12 (1c0672b)
Service: systemd (enabled)
File logs: /tmp/openclaw/openclaw-2026-04-14.log
Command: /usr/bin/node /usr/lib/node_modules/openclaw/dist/entry.js gateway --port 18790
Service file: ~/.config/systemd/user/openclaw-gateway.service
Service env: OPENCLAW_STATE_DIR=/root/.openclaw-scouty OPENCLAW_GATEWAY_PORT=18790
...
Gateway: bind=lan (0.0.0.0), port=18790 (service args)
Probe target: ws://127.0.0.1:18790
Dashboard: http://89.167.78.211:18790/
Probe note: bind=lan listens on 0.0.0.0 (all interfaces); probing via 127.0.0.1.

Runtime: running (pid 2512565, state active, sub running, last exit 0, reason 0)
Warm-up: launch agents can take a few seconds. Try again shortly.
RPC probe: failed
RPC target: ws://127.0.0.1:18790
gateway closed (1006):

Port 18790 is already in use.

  • pid 2512483 root: openclaw-gateway (*:18790)
  • Gateway already running locally. Stop it (openclaw gateway stop) or use a different port.
    Listening: *:18790
    Note: CLI config resolves gateway port=18789 (free).
    Troubles: run openclaw status
    Troubleshooting: https://docs.openclaw.ai/troubleshooting
    ...
    Overview

│ Version │ 2026.4.12 │
│ Dashboard │ http://127.0.0.1:18789/ │
│ Tailscale │ off │
│ Channel │ stable (config) │
│ Update │ pnpm · npm update 2026.4.14 │
│ Gateway │ local · ws://127.0.0.1:18789 (local loopback) · unreachable (gateway closed (1006): ) │

reef martenBOT

Yep, this is the clue.

This does not look like an agent crash. It looks like gateway/service drift after update:

  • the service is starting with port 18790
  • the CLI config still resolves to 18789
  • and 18790 is already occupied by another openclaw-gateway process

So after update/restart, OpenClaw is ending up in a split-brain state: wrong port, and likely a stale/duplicate listener.
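The split can be spotted mechanically by pulling two numbers out of the status output: the --port on the Command: line (what the service actually runs) and the "CLI config resolves gateway port=" note (what the CLI talks to). A bash sketch with a hypothetical helper name, assuming the line formats from the paste above; that note only appeared there next to the port-conflict message, so the helper says so when it cannot find both ports.

```shell
# Hypothetical helper: compare the port the service runs on with the port
# the CLI config resolves to, using lines like these from the paste above:
#   Command: /usr/bin/node .../entry.js gateway --port 18790
#   Note: CLI config resolves gateway port=18789 (free).
compare_ports() {
  local text svc cli
  text=$(cat)
  svc=$(printf '%s\n' "$text" | sed -n 's/^Command: .*--port \([0-9][0-9]*\).*/\1/p' | head -n 1)
  cli=$(printf '%s\n' "$text" | sed -n 's/.*CLI config resolves gateway port=\([0-9][0-9]*\).*/\1/p' | head -n 1)
  if [ -z "$svc" ] || [ -z "$cli" ]; then
    echo "could not find both ports"
  elif [ "$svc" = "$cli" ]; then
    echo "ports agree: $svc"
  else
    echo "split-brain: service=$svc cli=$cli"
  fi
}

# usage: openclaw gateway status --deep | compare_ports
```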

What I’d fix first:

  1. Make config and service agree on one port
    Right now they do not.

  2. Make sure only one gateway instance exists
    Port 18790 is already in use is the big red flag here.

  3. Check for stale/extra systemd installs
    On Hetzner/Linux this is often an old service or override surviving updates.

Please paste these next:

openclaw config get gateway
openclaw gateway status --deep
systemctl --user cat openclaw-gateway

If you previously installed or reran OpenClaw in more than one way, that would explain it.

Temporary workaround: turn off auto-updates until the port/service mismatch is fixed; otherwise each restart can re-break it.

Docs:

restive shell

Service: systemd (enabled)
File logs: /tmp/openclaw/openclaw-2026-04-14.log
Command: /usr/bin/node /usr/lib/node_modules/openclaw/dist/index.js gateway --port 18789
Service file: ~/.config/systemd/user/openclaw-gateway.service
Service env: OPENCLAW_GATEWAY_PORT=18789

Config (cli): ~/.openclaw/openclaw.json
Config (service): ~/.openclaw/openclaw.json

Gateway: bind=loopback (127.0.0.1), port=18789 (service args)
Probe target: ws://127.0.0.1:18789
Dashboard: http://127.0.0.1:18789/
Probe note: Loopback-only gateway; only local clients can connect.

Runtime: running (pid 2514666, state active, sub running, last exit 0, reason 0)
RPC probe: ok

Listening: 127.0.0.1:18789
Troubles: run openclaw status

reef martenBOT

This output looks healthy now.

So the standing config/service mismatch is probably already fixed, and the real bug is happening right after auto-update, before doctor repairs it.

A few useful takeaways:

  • service/config now match
  • RPC probe is ok
  • the loopback bind is not what is crashing
  • so what we need is the first failure log, captured before running doctor

Please grab these next time it breaks, before running openclaw doctor:

openclaw logs --follow
journalctl --user -u openclaw-gateway -n 200 --no-pager

Also paste the part of openclaw doctor that says what it repaired, especially lines like:

  • migrated config/state
  • removed stale lock file
  • fixed sessions / transcripts
  • rebuilt stale UI
  • extra gateway detected

That’s the key clue now, because the current status itself looks fine.
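Since the first failure is what matters, a small bash sketch can snapshot the evidence into one folder before anyone runs doctor. The helper name is hypothetical, it only covers the default openclaw-gateway unit, and the file-log path assumes the /tmp/openclaw/openclaw-YYYY-MM-DD.log naming seen in the outputs above, which may differ on other installs.

```shell
# Hypothetical helper: snapshot the evidence right after a crash, BEFORE
# running openclaw doctor. The file-log path assumes the naming seen in
# this thread (/tmp/openclaw/openclaw-YYYY-MM-DD.log).
save_crash_snapshot() {
  local dir="${1:-openclaw-crash-$(date +%Y%m%d-%H%M%S)}"
  local log="/tmp/openclaw/openclaw-$(date +%F).log"
  mkdir -p "$dir"
  # last 200 lines of the default gateway unit's journal
  journalctl --user -u openclaw-gateway -n 200 --no-pager > "$dir/journal.txt" 2>&1 \
    || echo "(journalctl exit status $?)" >> "$dir/journal.txt"
  # service/config/port view while the broken state still exists
  openclaw gateway status --deep > "$dir/status-deep.txt" 2>&1 \
    || echo "(status exit status $?)" >> "$dir/status-deep.txt"
  # copy today's file log if it exists
  if [ -f "$log" ]; then
    cp "$log" "$dir/"
  fi
  echo "$dir"
}

# usage: save_crash_snapshot   # then run openclaw doctor and note what it repaired
```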

If you want to reproduce it in a controlled way later, do it with logs open and use:

openclaw update --dry-run

then, only if you want to test it live:

restive shell

So they (Scouty on 18790 and Sparky on 18789) are back on Telegram and on version 2026.4.12.
Can you guide me through updating both to 2026.4.14 from the terminal?

reef martenBOT

Yep. If Scouty and Sparky are two gateways on the same box, do it like this from the terminal:

# 1) See the unit names first
systemctl --user list-units 'openclaw-gateway*' --no-pager

# 2) Update OpenClaw once, without auto-restarting mid-way
openclaw update --tag 2026.4.14 --no-restart

# 3) Restart both gateway units
systemctl --user restart openclaw-gateway.service
systemctl --user restart openclaw-gateway-<other-profile>.service

# 4) Verify
openclaw --version
systemctl --user status openclaw-gateway.service --no-pager
systemctl --user status openclaw-gateway-<other-profile>.service --no-pager

If you use profiles, also verify each one directly:

openclaw --profile <profile1> gateway status --deep
openclaw --profile <profile2> gateway status --deep
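If you would rather not type the unit names by hand, the same steps can be wrapped in a bash sketch that updates once and then restarts every loaded unit matching openclaw-gateway*. It assumes all gateways run as systemd user units with that name prefix, as in this thread; the function name is hypothetical and 2026.4.14 is only the example tag.

```shell
# Sketch: update OpenClaw once, then restart every loaded systemd user unit
# whose name starts with openclaw-gateway. Assumes that naming, as in this
# thread; the tag is passed in (for example 2026.4.14).
update_and_restart_all() {
  local tag="$1" unit
  # one update for the shared install, without restarting mid-way
  openclaw update --tag "$tag" --no-restart || return 1
  # --no-legend/--plain keep the output to one unit per line, name first
  for unit in $(systemctl --user list-units 'openclaw-gateway*' --no-legend --plain | awk '{print $1}'); do
    echo "restarting $unit"
    systemctl --user restart "$unit"
  done
}

# usage:
#   update_and_restart_all 2026.4.14
#   systemctl --user list-units 'openclaw-gateway*' --no-pager   # verify
```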

Big note, because I think this may be the real footgun:

18789 and 18790 are too close together for two gateways on one host.
The docs recommend keeping gateway ports at least 20 apart, because derived browser/CDP ports can collide.
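To make the spacing rule concrete: 18790 - 18789 = 1, while 19001 - 18789 = 212. A tiny bash check with a hypothetical name, assuming the 20-port minimum quoted above:

```shell
# Sketch: check that two gateway ports are at least 20 apart, following the
# docs' recommendation quoted above (derived browser/CDP ports can collide).
port_gap_ok() {
  local a="$1" b="$2" min_gap=20 gap
  gap=$(( a > b ? a - b : b - a ))   # absolute difference
  if [ "$gap" -ge "$min_gap" ]; then
    echo "ok (gap $gap)"
  else
    echo "too close (gap $gap)"
  fi
}

# port_gap_ok 18789 18790   -> too close (gap 1)
# port_gap_ok 18789 19001   -> ok (gap 212)
```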

So after updating, I would move one of them, for example:

restive shell

That's definitely it. They listen to the same gateway. How do I move Scouty to 19001 before updating to 4.14?

reef martenBOT

Yep, good call. Move Scouty first, then update.

If Scouty is the instance using:

  • OPENCLAW_STATE_DIR=/root/.openclaw-scouty
  • current port 18790

then on the server run:

# target Scouty only
export OPENCLAW_STATE_DIR=/root/.openclaw-scouty

# confirm you're editing the right instance
openclaw config get gateway.port
# move Scouty to 19001
openclaw config set gateway.port 19001 --strict-json

# refresh the managed service metadata for this instance
openclaw gateway install --port 19001 --force

# restart Scouty
openclaw gateway restart

# verify
openclaw config get gateway.port
openclaw gateway status --deep

If Scouty uses a custom config path outside that state dir, set that too before running the commands:

export OPENCLAW_CONFIG_PATH=/root/.openclaw-scouty/openclaw.json
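Before running the move, it is worth confirming that 19001 is actually free, since "Port 18790 is already in use" was the original red flag. A bash-only sketch with a hypothetical name; it uses bash's /dev/tcp redirection (plain sh does not support it) and only tests whether something accepts connections on 127.0.0.1.

```shell
# Sketch (bash only): report whether something accepts TCP connections on
# 127.0.0.1:<port>. Relies on bash's /dev/tcp redirection; plain sh lacks it.
port_in_use() {
  local port="$1"
  # The subshell exits non-zero if the connection is refused.
  if (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
    echo "in use: $port"
  else
    echo "free: $port"
  fi
}

# usage: port_in_use 19001   # you want "free: 19001" before moving Scouty
```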

Then update OpenClaw once for both:

openclaw update --tag 2026.4.14 --no-restart

Then restart both instances:

# Sparky (default instance)
unset OPENCLAW_STATE_DIR OPENCLAW_CONFIG_PATH
openclaw gateway restart

# Scouty
export OPENCLAW_STATE_DIR=/root/.openclaw-scouty
openclaw gateway restart

After that, verify both:

# Sparky
unset OPENCLAW_STATE_DIR OPENCLAW_CONFIG_PATH
openclaw gateway status --deep

# Scouty
export OPENCLAW_STATE_DIR=/root/.openclaw-scouty
restive shell

Scouty successfully moved to 19001
Update ongoing now

reef martenBOT

Nice, that was the right move 🎉

Once the update finishes, I’d verify both immediately:

# Sparky
unset OPENCLAW_STATE_DIR OPENCLAW_CONFIG_PATH
openclaw --version
openclaw gateway status --deep

# Scouty
export OPENCLAW_STATE_DIR=/root/.openclaw-scouty
openclaw --version
openclaw gateway status --deep

If both come back clean on 2026.4.14, I’d consider the port-overlap issue the likely root cause.

If anything looks weird, paste the two gateway status --deep outputs and I’ll check them.

restive shell

I only receive results for 19001

reef martenBOT

That usually means your shell is still pointed at Scouty.

If OPENCLAW_STATE_DIR=/root/.openclaw-scouty is still exported, every openclaw ... command will keep talking to 19001.
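A quick way to see what your shell is currently targeting is to inspect those variables before running anything. A bash sketch with a hypothetical name; the Sparky/Scouty labels and ports are this thread's setup, not something OpenClaw itself knows about.

```shell
# Hypothetical helper: show which instance plain `openclaw ...` commands in
# this shell will target. The Sparky/Scouty labels and ports are this
# thread's setup; OpenClaw itself does not know those names.
which_instance() {
  if [ -n "${OPENCLAW_PROFILE:-}" ]; then
    echo "profile: $OPENCLAW_PROFILE"
  fi
  case "${OPENCLAW_STATE_DIR:-}" in
    "")                 echo "state dir: not exported (default instance, Sparky on 18789)" ;;
    */.openclaw-scouty) echo "state dir: $OPENCLAW_STATE_DIR (Scouty, 19001)" ;;
    *)                  echo "state dir: $OPENCLAW_STATE_DIR (custom)" ;;
  esac
  if [ -n "${OPENCLAW_CONFIG_PATH:-}" ]; then
    echo "config path: $OPENCLAW_CONFIG_PATH"
  fi
}
```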

Try this exactly:

# see what your shell is targeting right now
echo "$OPENCLAW_STATE_DIR"
echo "$OPENCLAW_CONFIG_PATH"

# check Sparky (default, 18789)
unset OPENCLAW_STATE_DIR OPENCLAW_CONFIG_PATH
openclaw gateway status --deep

# check Scouty (19001)
OPENCLAW_STATE_DIR=/root/.openclaw-scouty openclaw gateway status --deep

If Sparky does not come back on 18789, restart it explicitly:

unset OPENCLAW_STATE_DIR OPENCLAW_CONFIG_PATH
openclaw gateway restart
openclaw gateway status --deep

So, short version:

  • 19001 = Scouty
  • 18789 = Sparky
  • if you keep seeing only 19001, you are probably still in Scouty’s env context, or Sparky is not running yet

Paste the gateway status --deep for the unset case if Sparky still looks wrong.

restive shell

openclaw gateway status --deep

🦞 OpenClaw 2026.4.14 (323493f) — Finally, a use for that always-on Mac Mini under your desk.



Service: systemd (enabled)
File logs: /tmp/openclaw/openclaw-2026-04-14.log
Command: /usr/bin/node /usr/lib/node_modules/openclaw/dist/index.js gateway --port 19001
Service file: ~/.config/systemd/user/openclaw-gateway.service
Service env: OPENCLAW_STATE_DIR=/root/.openclaw-scouty OPENCLAW_GATEWAY_PORT=19001

Config (cli): ~/.openclaw-scouty/openclaw.json
Config (service): ~/.openclaw-scouty/openclaw.json

Gateway: bind=lan (0.0.0.0), port=19001 (service args)
Probe target: ws://127.0.0.1:19001
Dashboard: http://89.167.78.211:19001/
Probe note: bind=lan listens on 0.0.0.0 (all interfaces); probing via 127.0.0.1.

Runtime: running (pid 2520515, state active, sub running, last exit 0, reason 0)
RPC probe: ok

Listening: *:19001
Troubles: run openclaw status

reef martenBOT

Ah, yep, this explains it.

You overwrote the default systemd unit with Scouty.

Right now:

  • openclaw-gateway.service = Scouty
  • so Sparky no longer has its own managed service

To fix it cleanly, use profiles, so each gateway gets its own service name.

Run this:

# important: clear the manual Scouty env first
unset OPENCLAW_STATE_DIR OPENCLAW_CONFIG_PATH

# 1) restore Sparky as the default gateway service on 18789
openclaw gateway install --port 18789 --force
openclaw gateway restart

# 2) install Scouty as its own profile-backed service on 19001
openclaw --profile scouty gateway install --port 19001 --force
systemctl --user enable --now openclaw-gateway-scouty.service

# 3) verify both services exist
systemctl --user list-units 'openclaw-gateway*' --no-pager

# 4) verify both gateways
openclaw gateway status --deep
openclaw --profile scouty gateway status --deep

Why this works:

  • default service = openclaw-gateway.service → Sparky
  • profile service = openclaw-gateway-scouty.service → Scouty

That’s the clean multiple-gateway setup from the docs:
https://docs.openclaw.ai/gateway/multiple-gateways

From here on, I’d manage Scouty with --profile scouty instead of exporting OPENCLAW_STATE_DIR, because the export is what caused the service overwrite.
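To make that habit stick, a shell function can wrap the profile flag. This is a hypothetical convenience, not an OpenClaw feature: as a precaution it also strips any leftover OPENCLAW_STATE_DIR, OPENCLAW_CONFIG_PATH and OPENCLAW_GATEWAY_PORT exports, so a stale export cannot leak into commands like gateway install.

```shell
# Hypothetical convenience function: always reach Scouty through its
# profile instead of exporting OPENCLAW_STATE_DIR into your shell.
# env -u drops leftover exports for this one command only.
scouty() {
  env -u OPENCLAW_STATE_DIR -u OPENCLAW_CONFIG_PATH -u OPENCLAW_GATEWAY_PORT \
    openclaw --profile scouty "$@"
}

# usage:
#   scouty gateway status --deep
#   scouty gateway restart
```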

If you paste the output of the final list-units and both gateway status --deep commands, I can confirm the setup is clean.

restive shell

2 loaded units listed.
eway status --deep

🦞 OpenClaw 2026.4.14 (323493f) —
Service: systemd (enabled)
File logs: /tmp/openclaw/openclaw-2026-04-14.log
Command: /usr/bin/node /usr/lib/node_modules/openclaw/dist/index.js gateway --port 18789
Service file: ~/.config/systemd/user/openclaw-gateway.service
Service env: OPENCLAW_STATE_DIR=/root/.openclaw-scouty OPENCLAW_GATEWAY_PORT=18789

Config (cli): ~/.openclaw/openclaw.json
Config (service): ~/.openclaw-scouty/openclaw.json
Root cause: CLI and service are using different config paths (likely a profile/state-dir mismatch).
Fix: rerun openclaw gateway install --force from the same --profile / OPENCLAW_STATE_DIR you expect.

Gateway: bind=lan (0.0.0.0), port=18789 (service args)
Probe target: ws://127.0.0.1:18789
Dashboard: http://89.167.78.211:18789/
Probe note: bind=lan listens on 0.0.0.0 (all interfaces); probing via 127.0.0.1.

Runtime: running (pid 2527944, state active, sub running, last exit 0, reason 0)
RPC probe: ok

Listening: *:18789
Troubles: run openclaw status

🦞 OpenClaw 2026.4.14 (323493f)
Service: systemd (enabled)
File logs: /tmp/openclaw/openclaw-2026-04-14.log
Command: /usr/bin/node /usr/lib/node_modules/openclaw/dist/index.js gateway --port 19001
Service file: ~/.config/systemd/user/openclaw-gateway-scouty.service
Service env: OPENCLAW_PROFILE=scouty OPENCLAW_STATE_DIR=/root/.openclaw-scouty OPENCLAW_CONFIG_PATH=/root/.openclaw-scouty/openclaw.json OPENCLAW_GATEWAY_PORT=19001

Config (cli): ~/.openclaw-scouty/openclaw.json
Config (service): ~/.openclaw-scouty/openclaw.json

Gateway: bind=lan (0.0.0.0), port=19001 (service args)
Probe target: ws://127.0.0.1:19001
Dashboard: http://89.167.78.211:19001/
Probe note: bind=lan listens on 0.0.0.0 (all interfaces); probing via 127.0.0.1.

Runtime: running (pid 2528158, state active, sub running, last exit 0, reason 0)
RPC probe: ok

Listening: *:19001
Troubles: run openclaw --profile scouty status

reef martenBOT

You’re very close. Scouty is clean; Sparky is still contaminated by Scouty’s env.

The bad part is here:

  • default unit: openclaw-gateway.service
  • but service env still contains OPENCLAW_STATE_DIR=/root/.openclaw-scouty

So fix only the default unit from a clean env:

env -u OPENCLAW_STATE_DIR \
    -u OPENCLAW_CONFIG_PATH \
    -u OPENCLAW_PROFILE \
    -u OPENCLAW_GATEWAY_PORT \
    openclaw gateway install --port 18789 --force
env -u OPENCLAW_STATE_DIR \
    -u OPENCLAW_CONFIG_PATH \
    -u OPENCLAW_PROFILE \
    -u OPENCLAW_GATEWAY_PORT \
    openclaw gateway restart

env -u OPENCLAW_STATE_DIR \
    -u OPENCLAW_CONFIG_PATH \
    -u OPENCLAW_PROFILE \
    -u OPENCLAW_GATEWAY_PORT \
    openclaw gateway status --deep

What you want to see after that:

  • Config (cli): ~/.openclaw/openclaw.json
  • Config (service): ~/.openclaw/openclaw.json
  • no Scouty state dir in the default unit env
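Those checks can be scripted against the status --deep output of the default unit. A bash sketch with a hypothetical name, assuming the Service env: and Config lines look like the pastes in this thread; run the status command from a clean env so your own shell exports do not skew the CLI side.

```shell
# Hypothetical helper: read `openclaw gateway status --deep` output for the
# DEFAULT unit on stdin and check the target state described above.
default_unit_ok() {
  local text env_line cli svc
  text=$(cat)
  env_line=$(printf '%s\n' "$text" | grep '^Service env:' | head -n 1)
  cli=$(printf '%s\n' "$text" | sed -n 's/^Config (cli): //p' | head -n 1)
  svc=$(printf '%s\n' "$text" | sed -n 's/^Config (service): //p' | head -n 1)
  # the default unit must not carry Scouty's state dir
  case "$env_line" in
    *openclaw-scouty*)
      echo "NOT clean: Scouty state dir in default unit env"
      return 1 ;;
  esac
  # CLI and service must read the same config file
  if [ -z "$cli" ] || [ "$cli" != "$svc" ]; then
    echo "NOT clean: cli=$cli service=$svc"
    return 1
  fi
  echo "clean"
}

# usage:
#   env -u OPENCLAW_STATE_DIR -u OPENCLAW_CONFIG_PATH -u OPENCLAW_PROFILE \
#     openclaw gateway status --deep | default_unit_ok
```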

Then re-check Scouty:

openclaw --profile scouty gateway status --deep

So the state should end up as:

  • Sparky → openclaw-gateway.service → ~/.openclaw → 18789
  • Scouty → openclaw-gateway-scouty.service → ~/.openclaw-scouty → 19001

Paste the new default-unit gateway status --deep once you run that.

restive shell


Service: systemd (enabled)
File logs: /tmp/openclaw/openclaw-2026-04-14.log
Command: /usr/bin/node /usr/lib/node_modules/openclaw/dist/index.js gateway --port 18789
Service file: ~/.config/systemd/user/openclaw-gateway.service
Service env: OPENCLAW_STATE_DIR=/root/.openclaw-scouty OPENCLAW_GATEWAY_PORT=18789

Config (cli): ~/.openclaw/openclaw.json
Config (service): ~/.openclaw-scouty/openclaw.json
Root cause: CLI and service are using different config paths (likely a profile/state-dir mismatch).
Fix: rerun openclaw gateway install --force from the same --profile / OPENCLAW_STATE_DIR you expect.

Gateway: bind=lan (0.0.0.0), port=18789 (service args)
Probe target: ws://127.0.0.1:18789
Dashboard: http://89.167.78.211:18789/
Probe note: bind=lan listens on 0.0.0.0 (all interfaces); probing via 127.0.0.1.

Runtime: running (pid 2529597, state active, sub running, last exit 0, reason 0)
RPC probe: ok

Listening: *:18789

Service: systemd (enabled)
File logs: /tmp/openclaw/openclaw-2026-04-14.log
Command: /usr/bin/node /usr/lib/node_modules/openclaw/dist/index.js gateway --port 19001
Service file: ~/.config/systemd/user/openclaw-gateway-scouty.service
Service env: OPENCLAW_PROFILE=scouty OPENCLAW_STATE_DIR=/root/.openclaw-scouty OPENCLAW_CONFIG_PATH=/root/.openclaw-scouty/openclaw.json OPENCLAW_GATEWAY_PORT=19001

Config (cli): ~/.openclaw-scouty/openclaw.json
Config (service): ~/.openclaw-scouty/openclaw.json

Gateway: bind=lan (0.0.0.0), port=19001 (service args)
Probe target: ws://127.0.0.1:19001
Dashboard: http://89.167.78.211:19001/
Probe note: bind=lan listens on 0.0.0.0 (all interfaces); probing via 127.0.0.1.

Runtime: running (pid 2528158, state active, sub running, last exit 0, reason 0)
RPC probe: ok

Listening: *:19001

reef martenBOT

Yep, Scouty is fine. The default unit is still literally written with Scouty’s state dir.

So let’s stop being implicit and rewrite Sparky’s unit with the explicit default paths:

env -u OPENCLAW_PROFILE \
    -u OPENCLAW_GATEWAY_PORT \
    OPENCLAW_STATE_DIR=/root/.openclaw \
    OPENCLAW_CONFIG_PATH=/root/.openclaw/openclaw.json \
    openclaw gateway install --port 18789 --force

env -u OPENCLAW_PROFILE \
    -u OPENCLAW_GATEWAY_PORT \
    OPENCLAW_STATE_DIR=/root/.openclaw \
    OPENCLAW_CONFIG_PATH=/root/.openclaw/openclaw.json \
    openclaw gateway restart

env -u OPENCLAW_PROFILE \
    -u OPENCLAW_GATEWAY_PORT \
    OPENCLAW_STATE_DIR=/root/.openclaw \
    OPENCLAW_CONFIG_PATH=/root/.openclaw/openclaw.json \
    openclaw gateway status --deep

What we want after that:

  • Config (cli): ~/.openclaw/openclaw.json
  • Config (service): ~/.openclaw/openclaw.json
  • no /root/.openclaw-scouty in the default unit env
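To avoid retyping that env prefix for every step, the same command can be wrapped once in a shell function. A sketch with a hypothetical name; the paths are the explicit defaults used above and assume you run as root.

```shell
# Hypothetical convenience function: run openclaw against Sparky with the
# explicit default paths used above, so the long env prefix is typed once.
# /root/.openclaw is this thread's default state dir (running as root).
sparky() {
  env -u OPENCLAW_PROFILE -u OPENCLAW_GATEWAY_PORT \
    OPENCLAW_STATE_DIR=/root/.openclaw \
    OPENCLAW_CONFIG_PATH=/root/.openclaw/openclaw.json \
    openclaw "$@"
}

# usage (same three steps as above):
#   sparky gateway install --port 18789 --force
#   sparky gateway restart
#   sparky gateway status --deep
```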

Scouty should stay as it is; confirm with:

openclaw --profile scouty gateway status --deep

If the default unit still shows Scouty after this, paste:

systemctl --user cat openclaw-gateway.service

because then the unit file itself needs inspection.

restive shell

Sparky answered back on Telegram, but Scouty didn't. Maybe I need to restart Scouty?

reef martenBOT

Yep, I would restart only Scouty now.

openclaw --profile scouty gateway restart
openclaw --profile scouty gateway status --deep

If Scouty still doesn’t answer on Telegram after that, check the unit directly:

systemctl --user status openclaw-gateway-scouty.service --no-pager

So, yes:

  • Sparky is back
  • now just bounce Scouty and re-test

If Scouty still stays quiet after the restart, paste the openclaw --profile scouty gateway status --deep output.

restive shell

Scouty is answering!
Hopefully next update they'll both still answer!

Shall I run doctor now, since we just updated, or is there no need?

reef martenBOT

Nice, that sounds a lot healthier 🎉

Yeah, I’m optimistic now. The big fixes were:

  • separate service units
  • separate config/state
  • ports far enough apart

That was the real footgun.

If the next auto-update still does anything weird, grab gateway status --deep for both right away, but I’d expect this setup to survive updates much better now.

Glad that helped! 🎉

I would, yes, but just as a sanity check, not because something looks broken.

Since you now have two instances, run it once for each:

# Sparky
openclaw doctor

# Scouty
openclaw --profile scouty doctor

If both are already answering and doctor comes back clean, I’d stop there and not keep poking it.

So, short answer: yes, worth running once now.

restive shell

Super, all good!