#ERR_MODULE_NOT_FOUND led to doctor --fix breaking config, now Gateway UI is completely unreachable (

random tinselBOT

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

upper graniteBOT

You’re almost certainly dealing with two overlapping problems:

  1. a corrupted host npm install (ERR_MODULE_NOT_FOUND in /usr/lib/node_modules/openclaw/dist/...)
  2. a runtime split-brain (host systemd gateway + Docker gateway competing / different configs)

Also, per the docs, doctor --fix is a migration/repair tool, not a package reinstaller, so it can legitimately rewrite config and service state.

1) First, verify what’s actually running (read-only)

(Yeah, even though you said 2026.4.5, verify the binary that is actually installed.)

openclaw --version
openclaw gateway status --deep
openclaw status --all
ss -ltnp | rg ':18789'
docker compose ps
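If the :18789 line from ss is hard to read, a small awk filter can reduce it to address, process name and PID. This is a sketch that assumes iproute2's usual users:(("name",pid=N,fd=M)) column; listeners_on_port is a hypothetical helper, not an OpenClaw command.

```bash
#!/usr/bin/env bash
# Sketch: reduce `ss -ltnp` output (read on stdin) to "address process pid"
# for listeners on one TCP port. Assumes the iproute2 column format
# users:(("name",pid=N,fd=M)); lines without it print "?" placeholders.
listeners_on_port() {
  local port="$1"
  awk -v port="$port" '
    # Only LISTEN rows whose local address (column 4) ends in :PORT
    $1 == "LISTEN" && $4 ~ (":" port "$") {
      if (match($0, /users:\(\("[^"]+",pid=[0-9]+/)) {
        s = substr($0, RSTART, RLENGTH)
        sub(/^users:\(\("/, "", s)       # leaves: name",pid=N
        split(s, parts, "\",pid=")       # parts[1]=name, parts[2]=pid
        print $4, parts[1], parts[2]
      } else {
        print $4, "?", "?"               # ss showed no process info
      }
    }'
}
```

For example: sudo ss -ltnp | listeners_on_port 18789 (without root, ss may not show processes owned by other users). More than one line, or an unexpected process name, points to the split-brain described above.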

2) Pick ONE gateway owner (critical)

If your intended setup is Docker-owned, remove host daemon ownership:

openclaw gateway uninstall
systemctl --user disable --now openclaw-gateway.service || true

Then run only container gateway:

docker compose up -d openclaw-gateway
docker compose logs -f openclaw-gateway

If Docker is the owner, reinstalling the host copy in /usr/lib/node_modules/openclaw won’t fix the container runtime.


3) Fix the missing module only on the runtime you actually use

If host gateway runtime is still needed:

npm uninstall -g openclaw
npm install -g openclaw@2026.4.5 --force

(That specifically addresses the mismatch where the installed dist code references a hashed chunk file that is missing.)

4) Fix “unauthorized: gateway token missing”

Check auth mode + token in the active config path:

openclaw config get gateway.auth.mode
openclaw config get gateway.auth.token

If mode is token and token is empty:

openclaw config set gateway.auth.mode token
openclaw config set gateway.auth.token "$(openssl rand -hex 32)"
openclaw gateway restart
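The token here comes from openssl rand -hex 32. If openssl isn't installed on the box, reading 32 bytes from /dev/urandom gives the same 64-character hex shape. A minimal sketch; gen_gateway_token is a hypothetical helper name and the fallback is an assumption, not something OpenClaw ships.

```bash
#!/usr/bin/env bash
# Sketch: produce a 64-character hex token (32 random bytes), like
# `openssl rand -hex 32`, falling back to /dev/urandom + od without openssl.
gen_gateway_token() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 32
  else
    # od prints the bytes as space-separated hex; strip spaces and newlines.
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
  fi
}
```

Usage: openclaw config set gateway.auth.token "$(gen_gateway_token)"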

Then get a fresh dashboard URL:


Yep — you can unblock this safely without doctor --fix.

Best low-risk option (no service rewrites)

Use:

openclaw doctor --non-interactive

Per the docs, this applies only safe migrations (like config normalization) and skips restart/service/sandbox repair actions that need prompts.


If you want to patch manually (exactly what you asked), do this:

cp /root/.openclaw/openclaw.json /root/.openclaw/openclaw.json.pre-websearch-fix.bak
nano /root/.openclaw/openclaw.json
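Since "back up, then edit" recurs throughout this thread, a tiny helper that copies the file to a timestamped .bak and prints the path can save some typing. A sketch; backup_file and its naming scheme are assumptions, not part of OpenClaw.

```bash
#!/usr/bin/env bash
# Sketch: copy a config file to FILE.bak.<timestamp> before hand-editing it,
# and print the backup path. Refuses to proceed if the file is missing.
backup_file() {
  local src="$1" dest
  if [ ! -f "$src" ]; then
    echo "backup_file: $src not found" >&2
    return 1
  fi
  dest="${src}.bak.$(date +%Y%m%d-%H%M%S)"
  # -p keeps mode and timestamps (and ownership where permitted)
  cp -p -- "$src" "$dest" && printf '%s\n' "$dest"
}
```

Usage: backup_file /root/.openclaw/openclaw.json && nano /root/.openclaw/openclaw.json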

In tools.web.search, remove legacy provider-owned keys like:

  • apiKey
  • brave: { ... }
  • exa: { ... }
  • firecrawl: { ... }
  • any other provider sub-objects under tools.web.search.*

Keep only normal runtime fields there (example):

tools: {
  web: {
    search: {
      enabled: true,
      provider: "brave",
      maxResults: 5,
    },
  },
},

If you were using Brave API key, move it to:

plugins: {
  entries: {
    brave: {
      config: {
        webSearch: {
          apiKey: "YOUR_KEY",
        },
      },
    },
  },
},

(Or temporarily disable search to unblock everything: search: { enabled: false }.)


Commands without rg / compose v2

Use these now:

openclaw --version
openclaw gateway status --deep
openclaw status --all
ss -ltnp | grep ':18789'
docker compose version || docker-compose version
docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'
docker-compose ps 2>/dev/null || true

Paste that output and I’ll give you the exact “Docker vs host service” cleanup sequence next.
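If you'd rather not type both Compose spellings each time, a wrapper can prefer Compose v2 (docker compose) and fall back to the standalone docker-compose binary. A sketch; compose is a hypothetical function name.

```bash
#!/usr/bin/env bash
# Sketch: prefer Compose v2 (`docker compose`), fall back to the standalone
# v1 `docker-compose` binary, and return 127 if neither is available.
compose() {
  if docker compose version >/dev/null 2>&1; then
    docker compose "$@"
  elif command -v docker-compose >/dev/null 2>&1; then
    docker-compose "$@"
  else
    echo "compose: neither 'docker compose' nor 'docker-compose' found" >&2
    return 127
  fi
}
```

Usage: compose ps, or compose logs -f openclaw-gateway.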


Perfect, that narrows it down cleanly.
For your host-based setup, run these exact 2 commands:

npm install -g openclaw@latest --force
openclaw config set gateway.auth.mode token && openclaw config set gateway.auth.token "$(openssl rand -hex 32)" && openclaw gateway install --force && openclaw gateway restart

Then grab the token + verify:

openclaw config get gateway.auth.token
openclaw gateway status

Use that token in the Control UI settings.
(Yes, this rotates token auth to a fresh value.)


Yes — in your case that is the correct and safest next step.

Your logs match the documented startup guard: if gateway.mode is missing, the service exits with code 1 before binding.

Run exactly:

openclaw config set gateway.mode local
openclaw gateway start
openclaw gateway status

If status comes back healthy, you’re done.
If it still fails, send just:

journalctl --user -u openclaw-gateway.service -n 50 --no-pager

(But based on your log, setting gateway.mode=local should unblock it.)
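Because the gateway may take a few seconds to bind after openclaw gateway start, a short polling loop avoids checking status too early. A sketch that assumes ss (iproute2) is installed; wait_for_listen is a hypothetical helper, and it only confirms that something is listening on the port, not that it is the OpenClaw gateway.

```bash
#!/usr/bin/env bash
# Sketch: wait until something listens on TCP PORT, polling `ss -ltn` once per
# second for up to TIMEOUT seconds (default 30). Returns 0 if up, 1 on timeout.
wait_for_listen() {
  local port="$1" timeout="${2:-30}" waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # Column 4 of `ss -ltn` is the local address; match a trailing :PORT.
    if ss -ltn | awk -v port="$port" '$4 ~ (":" port "$") { found = 1 } END { exit !found }'; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}
```

Usage: openclaw gateway start && wait_for_listen 18789 30 && openclaw gateway status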


Hell yes — you’re at the last mile.

Use this exact sequence (replace with your real domain, no trailing slash):

openclaw config set gateway.controlUi.allowedOrigins '["https://YOUR_SUBDOMAIN","http://127.0.0.1:18789","http://localhost:18789"]' --strict-json
openclaw gateway restart

Then verify:

openclaw config get gateway.controlUi.allowedOrigins --json
openclaw gateway status

Notes:

  • Origin must be exact (scheme + host + optional port), no path.
  • If you use multiple hostnames (e.g. https://a.example.com and https://b.example.com), include both in that array.
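To catch a trailing slash or a path before it ends up in the config, the origins can be checked and turned into the JSON array in one step. A sketch: is_origin and origins_json are hypothetical helpers, and the regex is a simplification that accepts only http/https with a hostname or IPv4 address and an optional port (no IPv6 literals).

```bash
#!/usr/bin/env bash
# Sketch: validate bare origins (scheme://host[:port], no path, no trailing
# slash) and print them as a JSON array for `--strict-json`.
# Simplified: http/https only, hostname or IPv4 host, optional numeric port;
# IPv6 literals such as http://[::1]:18789 are rejected.
is_origin() {
  local re='^https?://[A-Za-z0-9.-]+(:[0-9]{1,5})?$'
  [[ $1 =~ $re ]]
}

origins_json() {
  local o out=""
  for o in "$@"; do
    if ! is_origin "$o"; then
      echo "origins_json: not a bare origin: $o" >&2
      return 1
    fi
    out+="${out:+,}\"$o\""   # comma-separate after the first entry
  done
  printf '[%s]\n' "$out"
}
```

Usage (replace claw.example.com with your real subdomain): json=$(origins_json https://claw.example.com http://127.0.0.1:18789 http://localhost:18789) && openclaw config set gateway.controlUi.allowedOrigins "$json" --strict-json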

If this works, glad that helped 🎉
If this fixed it, please right-click the solution message → Apps → ✅ Mark Solution so Answer Overflow can index it.


Nice — you’re at the final auth gate.

For Control UI, there is no pairing code. It’s a device approval flow.

Run:

openclaw devices list
openclaw devices approve --latest

If you want to approve a specific request instead:

openclaw devices approve <requestId>

Then refresh the UI tab.

If devices list shows nothing pending, reload the Control UI once (to create a fresh request), then run openclaw devices list again.


Great diagnosis — and yes, this is classic split-state.

Safest path: don’t “merge” directories, and don’t run OpenClaw as root against /home/openclaw/.openclaw.
Use openclaw user + /home/openclaw/.openclaw as the single source of truth, then re-apply only the few config fixes you made under root.

Recommended recovery plan

# 1) stop/remove the root-owned gateway service
openclaw gateway stop || true
openclaw gateway uninstall || true

# 2) make sure openclaw-user config has required keys
sudo -u openclaw -H openclaw config set gateway.mode local
sudo -u openclaw -H openclaw config set gateway.auth.mode token
sudo -u openclaw -H openclaw config set gateway.controlUi.allowedOrigins '["https://YOUR_SUBDOMAIN","http://127.0.0.1:18789","http://localhost:18789"]' --strict-json
sudo -u openclaw -H bash -lc 'openclaw config set gateway.auth.token "$(openssl rand -hex 32)"'

# 3) install/start service as openclaw user
sudo -u openclaw -H openclaw gateway install --force
sudo -u openclaw -H openclaw gateway restart
sudo -u openclaw -H openclaw gateway status --deep

Why this is safest

  • avoids root/openclaw ownership conflicts
  • preserves original agents/sessions/skills in /home/openclaw/.openclaw
  • avoids risky cross-user chown -R on active state

After this, you’ll likely need to re-pair the browser device once (normal).


Yep — in this situation, yes: running doctor in non-interactive mode as openclaw is the right move.

That mode is specifically for safe migrations (schema/state normalization) and skips service-repair prompts.
So it’s the best way to unblock legacy Slack/Web keys without redoing the root/systemd mess.

Use this sequence:

sudo -u openclaw -H bash -lc 'cp ~/.openclaw/openclaw.json ~/.openclaw/openclaw.json.pre-noninteractive.bak && openclaw doctor --non-interactive'

Then continue with the openclaw user config fixes:

sudo -u openclaw -H openclaw config set gateway.mode local
sudo -u openclaw -H openclaw config set gateway.auth.mode token
sudo -u openclaw -H openclaw config set gateway.controlUi.allowedOrigins '["https://YOUR_SUBDOMAIN","http://127.0.0.1:18789","http://localhost:18789"]' --strict-json
sudo -u openclaw -H bash -lc 'openclaw config set gateway.auth.token "$(openssl rand -hex 32)"'

Then reinstall/restart service as openclaw:

sudo -u openclaw -H openclaw gateway install --force
sudo -u openclaw -H openclaw gateway restart

So yes — your instinct was correct. doctor --non-interactive is the safest unblocker here.


Yes — in your exact case, proceed with --fix for the openclaw user.

You’ve already confirmed doctor is detecting the right migrations; it just needs repair mode to write them.

sudo -u openclaw -H bash -lc 'cp ~/.openclaw/openclaw.json ~/.openclaw/openclaw.json.pre-fix.bak && openclaw doctor --fix --non-interactive'

That should apply the Slack/WebSearch legacy migrations in /home/openclaw/.openclaw/openclaw.json and unblock config set.

Then continue:

sudo -u openclaw -H openclaw config set gateway.mode local
sudo -u openclaw -H openclaw config set gateway.auth.mode token
sudo -u openclaw -H openclaw config set gateway.controlUi.allowedOrigins '["https://YOUR_SUBDOMAIN","http://127.0.0.1:18789","http://localhost:18789"]' --strict-json

So yes: your proposed command is the right next step.


Yep — this is the right diagnosis.
For headless VPS, the safe fix is:

  1. enable lingering for openclaw
  2. start that user manager
  3. run --user commands with the user-bus env set

uid=$(id -u openclaw)
sudo loginctl enable-linger openclaw
sudo systemctl start "user@${uid}.service"

sudo -u openclaw env \
XDG_RUNTIME_DIR="/run/user/${uid}" \
DBUS_SESSION_BUS_ADDRESS="unix:path=/run/user/${uid}/bus" \
openclaw gateway install --force

sudo -u openclaw env \
XDG_RUNTIME_DIR="/run/user/${uid}" \
DBUS_SESSION_BUS_ADDRESS="unix:path=/run/user/${uid}/bus" \
openclaw gateway restart

That’s the cleanest way to “wake up” systemctl --user without switching architecture.
So yes: loginctl enable-linger openclaw is part of the correct fix.
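To avoid repeating the two env assignments on every --user command, they can be wrapped in a function. A sketch that assumes linger is already enabled and user@<uid>.service is running; as_openclaw and the OPENCLAW_USER override are hypothetical. It also passes -H so HOME points at the openclaw user's home, matching the sudo -u openclaw -H calls used elsewhere in this thread.

```bash
#!/usr/bin/env bash
# Sketch: run a command as the openclaw user with the user-manager environment
# set, so `systemctl --user` (and commands that call it) work from a root shell.
# Assumes `loginctl enable-linger openclaw` was run and user@<uid>.service is up.
as_openclaw() {
  local user="${OPENCLAW_USER:-openclaw}" uid   # OPENCLAW_USER: hypothetical override
  uid=$(id -u "$user") || return 1
  # -H sets HOME to the target user's home, like the other sudo calls in this thread.
  sudo -u "$user" -H env \
    XDG_RUNTIME_DIR="/run/user/${uid}" \
    DBUS_SESSION_BUS_ADDRESS="unix:path=/run/user/${uid}/bus" \
    "$@"
}
```

Usage: as_openclaw openclaw gateway install --force, or as_openclaw systemctl --user status openclaw-gateway.service --no-pager.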

upper graniteBOT

You’re correct: providers.openai.apiKey is old/invalid at root level.

✅ Correct config path in this build:
models.providers.openai.apiKey

So:

sudo -u openclaw -H openclaw config set models.providers.openai.apiKey "sk-..."
sudo -u openclaw -H openclaw gateway restart
sudo -u openclaw -H openclaw models status

Alternative (recommended for daemon setups): put OPENAI_API_KEY in /home/openclaw/.openclaw/.env and restart gateway.

And no — you generally do not need agents configure main just to use a global OpenAI key.
auth-profiles.json is mainly for explicit profile-managed auth flows; global provider key/env is enough for normal OpenAI usage.


(So between your options: use models.providers.openai.apiKey, not plugins.entries.openai.config.apiKey.)


Yep — that error means in your current config shape, models.providers.openai requires baseUrl when that block exists.

Fast fix: set both fields together in one write:

sudo -u openclaw -H openclaw config set --batch-json '[{"path":"models.providers.openai.baseUrl","value":"https://api.openai.com/v1"},{"path":"models.providers.openai.apiKey","value":"sk-REPLACE_ME"}]'
sudo -u openclaw -H openclaw gateway restart

If you’d rather avoid provider-block schema quirks, use env instead (also valid):

  • put OPENAI_API_KEY=... in /home/openclaw/.openclaw/.env
  • restart gateway

That should clear this validation error immediately.


Good catch — and yeah, this explains the loop.

You’re now in the custom provider schema path.
If models.providers.openai exists, OpenClaw treats it like a full custom provider object (with required fields), not a simple “drop key here” shortcut.

Safest fix (recommended)

Don’t use models.providers.openai.* for this right now.
Use env key + built-in OpenAI provider:

sudo -u openclaw -H openclaw config unset models.providers.openai
sudo -u openclaw -H bash -lc 'grep -q "^OPENAI_API_KEY=" ~/.openclaw/.env 2>/dev/null && sed -i "s/^OPENAI_API_KEY=.*/OPENAI_API_KEY=sk-REPLACE_ME/" ~/.openclaw/.env || echo "OPENAI_API_KEY=sk-REPLACE_ME" >> ~/.openclaw/.env'
sudo -u openclaw -H openclaw gateway restart

Then verify:

sudo -u openclaw -H openclaw models status
sudo -u openclaw -H openclaw models list

If openai/gpt-5.4 appears in models list, you can set it as primary.
No extra “capability flags” are required just to use it.

So: unset custom models.providers.openai, use OPENAI_API_KEY, restart, verify.
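The grep/sed one-liner works, but sed treats the replacement text specially (a / or & in the value would break it). A sketch of an alternative that rewrites the file without parsing the value; set_env_var is a hypothetical helper. It moves the key to the end of the file and should be run as the openclaw user so the file keeps the right owner.

```bash
#!/usr/bin/env bash
# Sketch: set KEY=VALUE in a dotenv-style file, replacing any existing KEY=
# line or appending a new one. The value is written verbatim with printf, so
# characters like / or & (special to sed) are safe. KEY is used in a grep
# pattern, so keep it to plain names such as OPENAI_API_KEY.
set_env_var() {
  local file="$1" key="$2" value="$3" tmp
  touch "$file" || return 1
  # Temp file in the same directory so the final mv is an atomic rename;
  # mktemp creates it with mode 0600, which suits a file holding API keys.
  tmp=$(mktemp "${file}.XXXXXX") || return 1
  grep -v -- "^${key}=" "$file" > "$tmp" || true   # grep exits 1 if nothing is kept
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  mv -f -- "$tmp" "$file"
}
```

Usage, in a shell running as the openclaw user: set_env_var ~/.openclaw/.env OPENAI_API_KEY 'sk-REPLACE_ME', then restart the gateway.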


That error means the main agent auth store still has no OpenAI credential (or the running service isn’t seeing env).

Fastest deterministic fix: write the key directly into that agent auth store:

sudo -u openclaw -H openclaw models auth paste-token --provider openai
sudo -u openclaw -H openclaw models status --probe
sudo -u openclaw -H openclaw gateway restart

When prompted, paste your sk-... key.

If it still fails after that, run this once to confirm you’re not accidentally still on root service/config:

sudo -u openclaw -H openclaw gateway status --deep

(You should see service/config paths under /home/openclaw/.openclaw, not /root/.openclaw.)


Huge win. Nicely done digging through that mess. 🦞

For your goal (openclaw data + root-simple ops), the clean answer is:

Use a system service that runs as User=openclaw

Yes — that’s the right “vanilla” shape for VPS admin from root.

Why

  • no per-command XDG_RUNTIME_DIR / DBUS_SESSION_BUS_ADDRESS hacks
  • keep state in /home/openclaw/.openclaw
  • manage with normal root commands: sudo systemctl restart openclaw-gateway

Migration (safe path)

  1. Disable the user service (so you don’t run two gateways):
uid=$(id -u openclaw)
sudo -u openclaw env XDG_RUNTIME_DIR="/run/user/$uid" DBUS_SESSION_BUS_ADDRESS="unix:path=/run/user/$uid/bus" systemctl --user disable --now openclaw-gateway.service || true
  2. Create system unit:
OPENCLAW_BIN=$(command -v openclaw)
sudo tee /etc/systemd/system/openclaw-gateway.service >/dev/null <<EOF
[Unit]
Description=OpenClaw Gateway
After=network-online.target
Wants=network-online.target

[Service]
User=openclaw
Group=openclaw
Environment=HOME=/home/openclaw
Environment=OPENCLAW_STATE_DIR=/home/openclaw/.openclaw
ExecStart=${OPENCLAW_BIN} gateway --port 18789
Restart=always
RestartSec=5
TimeoutStopSec=30
TimeoutStartSec=30
SuccessExitStatus=0 143
KillMode=control-group

[Install]
WantedBy=multi-user.target
EOF
  3. Enable/start:
sudo systemctl daemon-reload
sudo systemctl enable --now openclaw-gateway.service
sudo systemctl status openclaw-gateway.service --no-pager

After this, manage it with:

  • sudo systemctl restart openclaw-gateway
  • sudo systemctl status openclaw-gateway
  • sudo journalctl -u openclaw-gateway -f

That gives you the stability of openclaw user data with root-level simplicity.
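To confirm the system unit really runs the gateway as openclaw (and not as root), you can look up the unit's main PID and the owner of that process. A sketch using systemctl show -p MainPID --value and ps -o user=; gateway_runs_as is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: print the user that owns a unit's main process, using
# `systemctl show -p MainPID --value` and `ps -o user=`.
gateway_runs_as() {
  local unit="${1:-openclaw-gateway.service}" pid
  pid=$(systemctl show -p MainPID --value "$unit") || return 1
  if [ -z "$pid" ] || [ "$pid" = 0 ]; then
    echo "gateway_runs_as: $unit has no running main process" >&2
    return 1
  fi
  ps -o user= -p "$pid" | tr -d ' '
}
```

Usage: [ "$(gateway_runs_as)" = openclaw ] && echo "gateway runs as openclaw"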

If this fixed it end-to-end, please mark solution so Answer Overflow indexes it ✅

upper graniteBOT

You’re not hitting a stale lockfile.

In OpenClaw, gateway singleton is TCP bind-based (no persistent PID/lock file to delete).
That "already running under systemd; waiting 5000ms" loop is almost always supervisor conflict (two service layers / duplicate units), not a port lock.

Nuclear reset (clean + deterministic)

# 0) stop both possible managers
sudo systemctl disable --now openclaw-gateway.service || true

uid=$(id -u openclaw)
sudo loginctl enable-linger openclaw
sudo systemctl start "user@${uid}.service"

sudo -u openclaw env \
XDG_RUNTIME_DIR="/run/user/${uid}" \
DBUS_SESSION_BUS_ADDRESS="unix:path=/run/user/${uid}/bus" \
systemctl --user disable --now openclaw-gateway.service || true

# 1) kill leftover processes + clear failed state
sudo pkill -9 -u openclaw -f "openclaw gateway" || true
sudo systemctl reset-failed openclaw-gateway.service || true

Now use one system service only, and make it run the gateway in the foreground directly:

ExecStart=/usr/bin/openclaw gateway run --port 18789

(Use your actual openclaw path if different.)


Then:

sudo systemctl daemon-reload
sudo systemctl enable --now openclaw-gateway.service
sudo systemctl status openclaw-gateway.service --no-pager

Direct answers

  • Lock/PID file causing this? No (not the mechanism here).
  • Built-in daemon conflict? Yes, effectively — duplicate supervision paths/user-vs-system unit behavior.
  • Nuclear option? Disable all other units, kill all gateway procs, run one foreground system unit only.
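To see every unit file that could be launching a gateway (system units, per-user units, or two units with different names), you can grep the unit directories for an ExecStart that runs openclaw ... gateway. A sketch: the default directory list and the ExecStart pattern are assumptions, and find_gateway_units is a hypothetical helper. It does not inspect drop-ins or units generated at runtime.

```bash
#!/usr/bin/env bash
# Sketch: list unit files whose ExecStart launches an OpenClaw gateway, to spot
# duplicate supervisors. Default directories are an assumption; pass your own
# as arguments to override. Drop-ins and generated units are not inspected.
find_gateway_units() {
  local dirs=("$@") d
  if [ "${#dirs[@]}" -eq 0 ]; then
    dirs=(/etc/systemd/system /lib/systemd/system /root/.config/systemd/user /home/*/.config/systemd/user)
  fi
  for d in "${dirs[@]}"; do
    [ -d "$d" ] || continue                # also skips unmatched globs
    # -l prints matching file names; -s hides errors for unreadable files.
    grep -lsE '^ExecStart=.*openclaw.* gateway' "$d"/*.service
  done
  return 0
}
```

Run it from a root shell; more than one result means more than one unit can start a gateway.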

If this clears it, please mark solution so others can find this ghost-loop fix via Answer Overflow ✅


You found the right thread to pull. This is not /run/user/* — it’s the gateway lock layer.

Direct answers

  • A lock file does exist in this codepath, at /tmp/openclaw-<uid>/gateway.<hash>.lock.
  • --force won’t fix this lock-loop (it helps port reclaim, not lock-owner logic).
  • The loop happens because under systemd it retries instead of hard-failing on lock conflict.

Nuclear reset (do this exactly)

uid=$(id -u openclaw)

sudo systemctl stop openclaw-gateway.service
sudo pkill -9 -u openclaw -f "openclaw gateway" || true
sudo rm -f /tmp/openclaw-${uid}/gateway.*.lock
sudo systemctl reset-failed openclaw-gateway.service

Then run once in foreground (outside systemd retry behavior) to see the real error:

sudo -u openclaw -H env \
-u INVOCATION_ID -u SYSTEMD_EXEC_PID -u JOURNAL_STREAM \
OPENCLAW_STATE_DIR=/home/openclaw/.openclaw \
/usr/bin/openclaw gateway run --port 18789 --verbose

If that starts cleanly, your service can be adjusted similarly (or fall back to user service).
If it errors, paste that first hard error — that’s the root cause hidden by the 5s systemd retry loop.


Also: yes, this retry behavior is supervisor-detection-driven, not evidence of a surviving port bind.


Yep — if the lingering user manager is the squatter, the clean kill switch is:

  1. disable linger
  2. terminate the user manager session

You don’t need to close a special socket manually.

uid=$(id -u openclaw)

sudo loginctl disable-linger openclaw
sudo loginctl terminate-user openclaw
sudo systemctl stop "user@${uid}.service" || true

# ensure old user-unit autostart link is gone
sudo rm -f /home/openclaw/.config/systemd/user/default.target.wants/openclaw-gateway.service

# cleanup leftovers
sudo pkill -9 -u openclaw -f "openclaw gateway" || true
sudo rm -f /tmp/openclaw-${uid}/gateway.*.lock

Then start only your system unit:

sudo systemctl restart openclaw-gateway.service
sudo systemctl status openclaw-gateway.service --no-pager

So yes: loginctl terminate-user openclaw is the definitive immediate stop, and disable-linger prevents it from respawning in the background.

upper graniteBOT

Nice catch. PPID 800002 means you’re right: there’s a living supervisor chain.

Also: if Caddy is truly listening on :18789, that alone can block OpenClaw. One listener per port.

1) Identify exactly what 800002 is

Run:

ps -o pid,ppid,pgid,sid,user,tty,etime,cmd -p 800002,800009
tr '\0' ' ' < /proc/800002/cmdline; echo
readlink -f /proc/800002/exe
cat /proc/800002/cgroup
systemctl status 800002 --no-pager

If that PID belongs to a systemd unit, stop/kill the unit (best way).
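If systemctl status <pid> is not conclusive, the owning unit can also be read from /proc/<pid>/cgroup, where the innermost .service (or .scope) in the path is the unit that started the process. A sketch that assumes the systemd cgroup layout; unit_of_pid and unit_of_cgroup_line are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Sketch: map a PID to the systemd unit that owns it via /proc/PID/cgroup.
# Handles paths like /system.slice/NAME.service and the per-user form
# .../user@UID.service/app.slice/NAME.service (innermost unit wins).
unit_of_cgroup_line() {
  # Split the path on "/" and print the last .service or .scope component.
  tr '/' '\n' | grep -E '\.(service|scope)$' | tail -n 1
}

unit_of_pid() {
  local pid="$1"
  # cgroup v2 has a single "0::" line; on v1 use the name=systemd hierarchy.
  grep -E '^(0::|[0-9]+:name=systemd:)' "/proc/${pid}/cgroup" | head -n 1 | cut -d: -f3- | unit_of_cgroup_line
}
```

Usage: unit_of_pid 800002. A result like session-N.scope means the process was started from a login session rather than by a service.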


2) Kill the entire 800002 branch (if rogue process tree)

sudo pkill -TERM -P 800002 || true
sudo kill -TERM 800002 || true
sleep 2
sudo pkill -KILL -P 800002 || true
sudo kill -KILL 800002 || true

Then confirm nothing remains:

pgrep -a -u openclaw -f "openclaw gateway" || true
ss -ltnp | grep ':18789' || true

3) Important port sanity check

If Caddy is on 18789, move one of them. Preferred layout:

  • Caddy listens on :443/:80
  • OpenClaw listens on 127.0.0.1:18789
  • Caddy reverse-proxies to 127.0.0.1:18789

If Caddy must keep 18789, set OpenClaw to another port and update proxy target.


If you paste the output of systemctl status 800002 + /proc/800002/cgroup, I can tell you the exact parent service name to kill permanently in one command.


Bingo — that explains the entire ghost loop.

Short answer: no, 2026.4.5 does not require the -gateway suffix in the unit name at runtime.
But yes, you should still migrate to the canonical service name for sanity.

Why

  • openclaw.service is legacy and can confuse doctor / gateway status --deep / service tooling.
  • Current docs + tooling expect openclaw-gateway.service (and profile variants).
  • Keeping legacy name works today, but it’s not the safest long-term path.

Recommendation

  • Keep one service only.
  • Prefer canonical openclaw-gateway.service.
  • Remove/disable the legacy openclaw.service once you’ve replicated the same ExecStart/env/state path.

If you want, I can give you a zero-downtime cutover sequence (copy unit, daemon-reload, stop old/start new, verify port/process, then purge old unit).


And yep — excellent forensic work here. This was a nasty one. 🦞


Perfect — here’s the cutover sequence (fast + rollback-ready).

# 0) backup the currently working legacy unit
OLD_UNIT_PATH=$(systemctl show -p FragmentPath --value openclaw.service)
sudo cp "$OLD_UNIT_PATH" "/etc/systemd/system/openclaw.service.bak.$(date +%F-%H%M%S)"

# 1) disable/remove the conflicting duplicate unit if present
sudo systemctl disable --now openclaw-gateway.service || true

# 2) clone the working legacy unit into canonical name
sudo cp "$OLD_UNIT_PATH" /etc/systemd/system/openclaw-gateway.service
sudo sed -i 's/^Description=.*/Description=OpenClaw Gateway/' /etc/systemd/system/openclaw-gateway.service
sudo systemctl daemon-reload

# 3) cutover (brief blip)
sudo systemctl stop openclaw.service
sudo systemctl enable --now openclaw-gateway.service
sudo systemctl disable openclaw.service

# 4) verify
sudo systemctl status openclaw-gateway.service --no-pager
ss -ltnp | grep ':18789' || true
sudo journalctl -u openclaw-gateway.service -n 50 --no-pager

Rollback (if needed):

sudo systemctl disable --now openclaw-gateway.service
sudo systemctl enable --now openclaw.service
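Before purging the legacy unit, it can be worth checking that the canonical unit really carries the same ExecStart, user and environment. A sketch that compares only those lines (sorted) with diff; compare_units and the chosen key list are assumptions, and drop-ins are not included (systemctl cat shows those).

```bash
#!/usr/bin/env bash
# Sketch: compare only the lines that decide how the gateway runs
# (ExecStart, User, Group, Environment*, WorkingDirectory) between two unit
# files. Prints a unified diff; returns 0 if they match, 1 if they differ.
unit_essentials() {
  grep -E '^(ExecStart|User|Group|Environment|EnvironmentFile|WorkingDirectory)=' "$1" | LC_ALL=C sort
}

compare_units() {
  diff -u <(unit_essentials "$1") <(unit_essentials "$2")
}
```

Usage: compare_units "$OLD_UNIT_PATH" /etc/systemd/system/openclaw-gateway.service (no output means the essentials match).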