#ERR_MODULE_NOT_FOUND led to doctor --fix breaking config, now Gateway UI is completely unreachable
You’re almost certainly dealing with two overlapping problems:
- a corrupted host npm install (ERR_MODULE_NOT_FOUND in /usr/lib/node_modules/openclaw/dist/...)
- a runtime split-brain (host systemd gateway + Docker gateway competing / different configs)
Also, from docs: doctor --fix is a migration/repair tool, not a package file reinstaller, so it can legitimately rewrite config/service state.
1) First, verify what’s actually running (read-only)
(yeah, even though you said 2026.4.5 — verify current binary)
openclaw --version
openclaw gateway status --deep
openclaw status --all
ss -ltnp | rg ':18789'
docker compose ps
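If the ss output is noisy, a small filter can reduce it to the process names holding the port. This is a minimal sketch: port_owners is a hypothetical helper name, and it assumes the usual `ss -ltnp` column layout (State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port, Process).

```shell
# Hypothetical helper: print the process names listening on a TCP port.
# Reads `ss -ltnp` output on stdin; prints nothing if no listener matches.
port_owners() {
  # Field 4 is Local Address:Port; keep LISTEN rows whose address ends in ":<port>".
  awk -v suffix=":$1" '$1 == "LISTEN" && substr($4, length($4) - length(suffix) + 1) == suffix' |
    # The Process column looks like users:(("node",pid=1234,fd=21)); extract the quoted name.
    grep -o '("[^"]*"' | tr -d '("' | sort -u
}
```

Usage: `sudo ss -ltnp | port_owners 18789`. Without root, ss may not show process names for other users' sockets. More than one name, or a name you did not expect, points at the competing owner.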
2) Pick ONE gateway owner (critical)
If your intended setup is Docker-owned, remove host daemon ownership:
openclaw gateway uninstall
systemctl --user disable --now openclaw-gateway.service || true
Then run only container gateway:
docker compose up -d openclaw-gateway
docker compose logs -f openclaw-gateway
If Docker is the owner, reinstalling host /usr/lib/node_modules/openclaw won’t fix container runtime.
3) Fix the missing module only on the runtime you actually use
If host gateway runtime is still needed:
npm uninstall -g openclaw
npm install -g openclaw@2026.4.5 --force
(That specifically addresses the missing hashed dist chunk mismatch.)
4) Fix “unauthorized: gateway token missing”
Check auth mode + token in the active config path:
openclaw config get gateway.auth.mode
openclaw config get gateway.auth.token
If mode is token and token is empty:
openclaw config set gateway.auth.mode token
openclaw config set gateway.auth.token "$(openssl rand -hex 32)"
openclaw gateway restart
Then get a fresh dashboard URL:
openclaw dashboard --no-open
If you paste outputs from step 1 (gateway status --deep, ss, docker compose ps), I can tell you exactly which process is stealing 18789 and give you a precise recovery path in 1-2 commands.
Yep — you can unblock this safely without doctor --fix.
Best low-risk option (no service rewrites)
Use:
openclaw doctor --non-interactive
Per docs, this applies only safe migrations (like config normalization) and skips restart/service/sandbox repair actions that need prompts.
If you want to patch manually (exactly what you asked), do this:
cp /root/.openclaw/openclaw.json /root/.openclaw/openclaw.json.pre-websearch-fix.bak
nano /root/.openclaw/openclaw.json
In tools.web.search, remove legacy provider-owned keys like:
- apiKey
- brave: { ... }
- exa: { ... }
- firecrawl: { ... }
- etc. provider sub-objects under tools.web.search.*
Keep only normal runtime fields there (example):
tools: {
  web: {
    search: {
      enabled: true,
      provider: "brave",
      maxResults: 5,
    },
  },
},
If you were using Brave API key, move it to:
plugins: {
  entries: {
    brave: {
      config: {
        webSearch: {
          apiKey: "YOUR_KEY",
        },
      },
    },
  },
},
(Or temporarily disable search to unblock everything: search: { enabled: false }.)
Commands without rg / compose v2
Use these now:
openclaw --version
openclaw gateway status --deep
openclaw status --all
ss -ltnp | grep ':18789'
docker compose version || docker-compose version
docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'
docker-compose ps 2>/dev/null || true
Paste that output and I’ll give you the exact “Docker vs host service” cleanup sequence next.
Perfect, that narrows it down cleanly.
For your host-based setup, run these exact 2 commands:
npm install -g openclaw@latest --force
openclaw config set gateway.auth.mode token && openclaw config set gateway.auth.token "$(openssl rand -hex 32)" && openclaw gateway install --force && openclaw gateway restart
Then grab the token + verify:
openclaw config get gateway.auth.token
openclaw gateway status
Use that token in the Control UI settings.
(Yes, this rotates token auth to a fresh value.)
Yes — in your case that is the correct and safest next step.
Your logs match the documented startup guard: if gateway.mode is missing, service exits with code 1 before binding.
Run exactly:
openclaw config set gateway.mode local
openclaw gateway start
openclaw gateway status
If status comes back healthy, you’re done.
If it still fails, send just:
journalctl --user -u openclaw-gateway.service -n 50 --no-pager
(But based on your log, setting gateway.mode=local should unblock it.)
Hell yes — you’re at the last mile.
Use this exact sequence (replace with your real domain, no trailing slash):
openclaw config set gateway.controlUi.allowedOrigins '["https://YOUR_SUBDOMAIN","http://127.0.0.1:18789","http://localhost:18789"]' --strict-json
openclaw gateway restart
Then verify:
openclaw config get gateway.controlUi.allowedOrigins --json
openclaw gateway status
Notes:
- Origin must be exact (scheme + host + optional port), no path.
- If you use multiple hostnames (e.g. https://a.example.com and https://b.example.com), include both in that array.
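A quick pre-check can catch a trailing slash or stray path before you write the array. This is a rough sketch, not OpenClaw's own validation: is_exact_origin is a hypothetical helper, and the pattern is deliberately simple (http/https only, no IPv6 literals).

```shell
# Hypothetical helper: succeed only for scheme + host + optional port.
# Rejects trailing slashes, paths, and bare hostnames.
is_exact_origin() {
  printf '%s\n' "$1" | grep -Eq '^https?://[A-Za-z0-9.-]+(:[0-9]+)?$'
}
```

Usage: `is_exact_origin "https://YOUR_SUBDOMAIN" || echo "fix the origin first"`. Run it on each entry before passing the list to config set.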
If this works, glad that helped 🎉
If this fixed it, please right-click the solution message → Apps → ✅ Mark Solution so Answer Overflow can index it.
Nice — you’re at the final auth gate.
For Control UI, there is no pairing code. It’s a device approval flow.
Run:
openclaw devices list
openclaw devices approve --latest
If you want to approve a specific request instead:
openclaw devices approve <requestId>
Then refresh the UI tab.
If devices list shows nothing pending, reload the Control UI once (to create a fresh request), then run openclaw devices list again.
Great diagnosis — and yes, this is classic split-state.
Safest path: don’t “merge” directories and don’t run root against /home/openclaw/.openclaw.
Use openclaw user + /home/openclaw/.openclaw as the single source of truth, then re-apply only the few config fixes you made under root.
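To confirm the split before changing anything, you can check which candidate state directories actually hold a config file. A minimal sketch, assuming the config file is named openclaw.json as in the paths above; find_state_dirs is a hypothetical helper name.

```shell
# Hypothetical helper: print every given directory that contains an openclaw.json.
# More than one line of output suggests split state.
find_state_dirs() {
  for dir in "$@"; do
    if [ -f "$dir/openclaw.json" ]; then
      printf '%s\n' "$dir"
    fi
  done
}
```

Usage (as root, so both homes are readable): `find_state_dirs /root/.openclaw /home/openclaw/.openclaw`. It only reads; nothing is moved or merged.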
Recommended recovery plan
# 1) stop/remove the root-owned gateway service
openclaw gateway stop || true
openclaw gateway uninstall || true
# 2) make sure openclaw-user config has required keys
sudo -u openclaw -H openclaw config set gateway.mode local
sudo -u openclaw -H openclaw config set gateway.auth.mode token
sudo -u openclaw -H openclaw config set gateway.controlUi.allowedOrigins '["https://YOUR_SUBDOMAIN","http://127.0.0.1:18789","http://localhost:18789"]' --strict-json
sudo -u openclaw -H bash -lc 'openclaw config set gateway.auth.token "$(openssl rand -hex 32)"'
# 3) install/start service as openclaw user
sudo -u openclaw -H openclaw gateway install --force
sudo -u openclaw -H openclaw gateway restart
sudo -u openclaw -H openclaw gateway status --deep
Why this is safest
- avoids root/openclaw ownership conflicts
- preserves original agents/sessions/skills in /home/openclaw/.openclaw
- avoids risky cross-user chown -R on active state
After this, you’ll likely need to re-pair the browser device once (normal).
Refs:
https://docs.openclaw.ai/help/faq#where-does-openclaw-store-its-data
https://docs.openclaw.ai/gateway/multiple-gateways
https://docs.openclaw.ai/cli/gateway
If this gets you fully back, please mark solution so it gets indexed on Answer Overflow ✅
Yep — in this situation, yes: running doctor in non-interactive mode as openclaw is the right move.
That mode is specifically for safe migrations (schema/state normalization) and skips service-repair prompts.
So it’s the best way to unblock legacy Slack/Web keys without redoing the root/systemd mess.
Use this sequence:
sudo -u openclaw -H bash -lc 'cp ~/.openclaw/openclaw.json ~/.openclaw/openclaw.json.pre-noninteractive.bak && openclaw doctor --non-interactive'
Then continue with the openclaw user config fixes:
sudo -u openclaw -H openclaw config set gateway.mode local
sudo -u openclaw -H openclaw config set gateway.auth.mode token
sudo -u openclaw -H openclaw config set gateway.controlUi.allowedOrigins '["https://YOUR_SUBDOMAIN","http://127.0.0.1:18789","http://localhost:18789"]' --strict-json
sudo -u openclaw -H bash -lc 'openclaw config set gateway.auth.token "$(openssl rand -hex 32)"'
Then reinstall/restart service as openclaw:
sudo -u openclaw -H openclaw gateway install --force
sudo -u openclaw -H openclaw gateway restart
So yes — your instinct was correct. doctor --non-interactive is the safest unblocker here.
Yes — in your exact case, proceed with --fix for the openclaw user.
You’ve already confirmed doctor is detecting the right migrations; it just needs repair mode to write them.
sudo -u openclaw -H bash -lc 'cp ~/.openclaw/openclaw.json ~/.openclaw/openclaw.json.pre-fix.bak && openclaw doctor --fix --non-interactive'
That should apply the Slack/WebSearch legacy migrations in /home/openclaw/.openclaw/openclaw.json and unblock config set.
Then continue:
sudo -u openclaw -H openclaw config set gateway.mode local
sudo -u openclaw -H openclaw config set gateway.auth.mode token
sudo -u openclaw -H openclaw config set gateway.controlUi.allowedOrigins '["https://YOUR_SUBDOMAIN","http://127.0.0.1:18789","http://localhost:18789"]' --strict-json
So yes: your proposed command is the right next step.
Yep — this is the right diagnosis.
For headless VPS, the safe fix is:
- enable lingering for openclaw
- start that user manager
- run --user commands with the user-bus env set
uid=$(id -u openclaw)
sudo loginctl enable-linger openclaw
sudo systemctl start "user@${uid}.service"
sudo -u openclaw env \
XDG_RUNTIME_DIR="/run/user/${uid}" \
DBUS_SESSION_BUS_ADDRESS="unix:path=/run/user/${uid}/bus" \
openclaw gateway install --force
sudo -u openclaw env \
XDG_RUNTIME_DIR="/run/user/${uid}" \
DBUS_SESSION_BUS_ADDRESS="unix:path=/run/user/${uid}/bus" \
openclaw gateway restart
That’s the cleanest way to “wake up” systemctl --user without switching architecture.
So yes: loginctl enable-linger openclaw is part of the correct fix.
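Since the same two variables get repeated on every command, a small wrapper can cut the typing. This is a sketch under assumptions: user_bus_env and as_user_bus are hypothetical names, and the paths follow the /run/user/<uid> layout used above.

```shell
# Hypothetical helper: print the user-bus environment assignments for a uid.
user_bus_env() {
  printf 'XDG_RUNTIME_DIR=/run/user/%s\n' "$1"
  printf 'DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/%s/bus\n' "$1"
}

# Hypothetical wrapper: run a command as <user> with that environment set.
as_user_bus() {
  user="$1"; shift
  uid=$(id -u "$user") || return 1
  # Unquoted on purpose: the two assignments contain no whitespace,
  # so word splitting turns them into two arguments for env.
  sudo -u "$user" env $(user_bus_env "$uid") "$@"
}
```

Usage: `as_user_bus openclaw openclaw gateway restart` or `as_user_bus openclaw systemctl --user status openclaw-gateway.service`. Lingering still has to be enabled first, otherwise /run/user/<uid>/bus will not exist.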
You’re correct: providers.openai.apiKey is old/invalid at root level.
✅ Correct config path in this build:
models.providers.openai.apiKey
So:
sudo -u openclaw -H openclaw config set models.providers.openai.apiKey "sk-..."
sudo -u openclaw -H openclaw gateway restart
sudo -u openclaw -H openclaw models status
Alternative (recommended for daemon setups): put OPENAI_API_KEY in /home/openclaw/.openclaw/.env and restart gateway.
And no — you generally do not need agents configure main just to use a global OpenAI key.
auth-profiles.json is mainly for explicit profile-managed auth flows; global provider key/env is enough for normal OpenAI usage.
(So between your options: use models.providers.openai.apiKey, not plugins.entries.openai.config.apiKey.)
Yep — that error means in your current config shape, models.providers.openai requires baseUrl when that block exists.
Fast fix: set both fields together in one write:
sudo -u openclaw -H openclaw config set --batch-json '[{"path":"models.providers.openai.baseUrl","value":"https://api.openai.com/v1"},{"path":"models.providers.openai.apiKey","value":"sk-REPLACE_ME"}]'
sudo -u openclaw -H openclaw gateway restart
If you’d rather avoid provider-block schema quirks, use env instead (also valid):
- put OPENAI_API_KEY=... in /home/openclaw/.openclaw/.env
- restart gateway
That should clear this validation error immediately.
Good catch — and yeah, this explains the loop.
You’re now in the custom provider schema path.
If models.providers.openai exists, OpenClaw treats it like a full custom provider object (with required fields), not a simple “drop key here” shortcut.
Safest fix (recommended)
Don’t use models.providers.openai.* for this right now.
Use env key + built-in OpenAI provider:
sudo -u openclaw -H openclaw config unset models.providers.openai
sudo -u openclaw -H bash -lc 'grep -q "^OPENAI_API_KEY=" ~/.openclaw/.env 2>/dev/null && sed -i "s/^OPENAI_API_KEY=.*/OPENAI_API_KEY=sk-REPLACE_ME/" ~/.openclaw/.env || echo "OPENAI_API_KEY=sk-REPLACE_ME" >> ~/.openclaw/.env'
sudo -u openclaw -H openclaw gateway restart
Then verify:
sudo -u openclaw -H openclaw models status
sudo -u openclaw -H openclaw models list
If openai/gpt-5.4 appears in models list, you can set it as primary.
No extra “capability flags” are required just to use it.
So: unset custom models.providers.openai, use OPENAI_API_KEY, restart, verify.
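One caveat with the grep/sed one-liner: it interpolates the key into a sed replacement, so a value containing / or & would be mangled. If that worries you, here is a minimal alternative that never passes the value through sed. upsert_env_key is a hypothetical helper name, and setting mode 600 is my assumption about what you want for a file holding a secret.

```shell
# Hypothetical helper: set KEY=VALUE in an env file, replacing any existing entry.
# Usage: upsert_env_key <file> <key> <value>
upsert_env_key() {
  file="$1"; key="$2"; value="$3"
  tmp=$(mktemp) || return 1
  # Copy every line except an existing KEY=... entry (key is assumed to be a plain name).
  if [ -f "$file" ]; then
    grep -v "^${key}=" "$file" > "$tmp"
  fi
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write through the existing file so its owner is kept, then restrict access.
  cat "$tmp" > "$file" && rm -f "$tmp" && chmod 600 "$file"
}
```

Run it as the openclaw user so the file ends up owned by that user, for example by sourcing it inside the same `sudo -u openclaw -H bash -lc '...'` wrapper used above, then restart the gateway as before.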
That error means the main agent auth store still has no OpenAI credential (or the running service isn’t seeing env).
Fastest deterministic fix: write the key directly into that agent auth store:
sudo -u openclaw -H openclaw models auth paste-token --provider openai
sudo -u openclaw -H openclaw models status --probe
sudo -u openclaw -H openclaw gateway restart
When prompted, paste your sk-... key.
If it still fails after that, run this once to confirm you’re not accidentally still on root service/config:
sudo -u openclaw -H openclaw gateway status --deep
(You should see service/config paths under /home/openclaw/.openclaw, not /root/.openclaw.)
Huge win. Nicely done digging through that mess. 🦞
For your goal (openclaw data + root-simple ops), the clean answer is:
Use a system service that runs as User=openclaw
Yes — that’s the right “vanilla” shape for VPS admin from root.
Why
- no per-command XDG_RUNTIME_DIR/DBUS_SESSION_BUS_ADDRESS hacks
- keep state in /home/openclaw/.openclaw
- manage with normal root commands:
sudo systemctl restart openclaw-gateway
Migration (safe path)
- Disable the user service (so you don’t run two gateways):
uid=$(id -u openclaw)
sudo -u openclaw env XDG_RUNTIME_DIR="/run/user/$uid" DBUS_SESSION_BUS_ADDRESS="unix:path=/run/user/$uid/bus" systemctl --user disable --now openclaw-gateway.service || true
- Create system unit:
OPENCLAW_BIN=$(command -v openclaw)
sudo tee /etc/systemd/system/openclaw-gateway.service >/dev/null <<EOF
[Unit]
Description=OpenClaw Gateway
After=network-online.target
Wants=network-online.target
[Service]
User=openclaw
Group=openclaw
Environment=HOME=/home/openclaw
Environment=OPENCLAW_STATE_DIR=/home/openclaw/.openclaw
ExecStart=${OPENCLAW_BIN} gateway --port 18789
Restart=always
RestartSec=5
TimeoutStopSec=30
TimeoutStartSec=30
SuccessExitStatus=0 143
KillMode=control-group
[Install]
WantedBy=multi-user.target
EOF
- Enable/start:
sudo systemctl daemon-reload
sudo systemctl enable --now openclaw-gateway.service
sudo systemctl status openclaw-gateway.service --no-pager
After this, manage it with:
sudo systemctl restart openclaw-gateway
sudo systemctl status openclaw-gateway
sudo journalctl -u openclaw-gateway -f
That gives you the stability of openclaw user data with root-level simplicity.
If this fixed it end-to-end, please mark solution so Answer Overflow indexes it ✅
You’re not hitting a stale lockfile.
In OpenClaw, gateway singleton is TCP bind-based (no persistent PID/lock file to delete).
That "already running under systemd; waiting 5000ms" loop is almost always supervisor conflict (two service layers / duplicate units), not a port lock.
Nuclear reset (clean + deterministic)
# 0) stop both possible managers
sudo systemctl disable --now openclaw-gateway.service || true
uid=$(id -u openclaw)
sudo loginctl enable-linger openclaw
sudo systemctl start "user@${uid}.service"
sudo -u openclaw env \
XDG_RUNTIME_DIR="/run/user/${uid}" \
DBUS_SESSION_BUS_ADDRESS="unix:path=/run/user/${uid}/bus" \
systemctl --user disable --now openclaw-gateway.service || true
# 1) kill leftover processes + clear failed state
sudo pkill -9 -u openclaw -f "openclaw gateway" || true
sudo systemctl reset-failed openclaw-gateway.service || true
Now use one system service only, and make it run foreground gateway directly:
ExecStart=/usr/bin/openclaw gateway run --port 18789
(Use your actual openclaw path if different.)
Then:
sudo systemctl daemon-reload
sudo systemctl enable --now openclaw-gateway.service
sudo systemctl status openclaw-gateway.service --no-pager
Direct answers
- Lock/PID file causing this? No (not the mechanism here).
- Built-in daemon conflict? Yes, effectively — duplicate supervision paths/user-vs-system unit behavior.
- Nuclear option? Disable all other units, kill all gateway procs, run one foreground system unit only.
If this clears it, please mark solution so others can find this ghost-loop fix via Answer Overflow ✅
You found the right thread to pull. This is not /run/user/* — it’s the gateway lock layer.
Direct answers
- Lock file exists (in this codepath), at: /tmp/openclaw-<uid>/gateway.<hash>.lock
- --force won’t fix this lock-loop (it helps port reclaim, not lock-owner logic).
- The loop happens because under systemd it retries instead of hard-failing on lock conflict.
Nuclear reset (do this exactly)
uid=$(id -u openclaw)
sudo systemctl stop openclaw-gateway.service
sudo pkill -9 -u openclaw -f "openclaw gateway" || true
sudo rm -f /tmp/openclaw-${uid}/gateway.*.lock
sudo systemctl reset-failed openclaw-gateway.service
Then run once in foreground (outside systemd retry behavior) to see the real error:
sudo -u openclaw -H env \
-u INVOCATION_ID -u SYSTEMD_EXEC_PID -u JOURNAL_STREAM \
OPENCLAW_STATE_DIR=/home/openclaw/.openclaw \
/usr/bin/openclaw gateway run --port 18789 --verbose
If that starts cleanly, your service can be adjusted similarly (or fall back to user service).
If it errors, paste that first hard error — that’s the root cause hidden by the 5s systemd retry loop.
Also: yes, this retry behavior is supervisor-detection-driven, not evidence of a surviving port bind.
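If you end up repeating the lock cleanup, a helper that removes only the gateway lock files and reports how many it found makes it easier to tell whether a lock was really there. A minimal sketch: clear_gateway_locks is a hypothetical name, and the gateway.*.lock pattern is taken from the path above.

```shell
# Hypothetical helper: remove gateway.*.lock files in one directory, print the count removed.
# Anything else in the directory is left alone.
clear_gateway_locks() {
  lock_dir="$1"
  count=0
  for lock in "$lock_dir"/gateway.*.lock; do
    # An unmatched glob stays literal, so skip anything that is not a real file.
    [ -f "$lock" ] || continue
    rm -f "$lock" && count=$((count + 1))
  done
  printf '%s\n' "$count"
}
```

Usage (as root, after the stop and pkill steps): `clear_gateway_locks "/tmp/openclaw-$(id -u openclaw)"`. A result of 0 means no lock file was present, so the loop has another cause.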
Yep — if the lingering user manager is the squatter, the clean kill switch is:
- disable linger
- terminate the user manager session
You don’t need to close a special socket manually.
uid=$(id -u openclaw)
sudo loginctl disable-linger openclaw
sudo loginctl terminate-user openclaw
sudo systemctl stop "user@${uid}.service" || true
# ensure old user-unit autostart link is gone
sudo rm -f /home/openclaw/.config/systemd/user/default.target.wants/openclaw-gateway.service
# cleanup leftovers
sudo pkill -9 -u openclaw -f "openclaw gateway" || true
sudo rm -f /tmp/openclaw-${uid}/gateway.*.lock
Then start only your system unit:
sudo systemctl restart openclaw-gateway.service
sudo systemctl status openclaw-gateway.service --no-pager
So yes: loginctl terminate-user openclaw is the definitive immediate stop, and disable-linger prevents it from respawning in the background.
Nice catch. PPID 800002 means you’re right: there’s a living supervisor chain.
Also: if Caddy is truly listening on :18789, that alone can block OpenClaw. One listener per port.
1) Identify exactly what 800002 is
Run:
ps -o pid,ppid,pgid,sid,user,tty,etime,cmd -p 800002,800009
tr '\0' ' ' < /proc/800002/cmdline; echo
readlink -f /proc/800002/exe
cat /proc/800002/cgroup
systemctl status 800002 --no-pager
If that PID belongs to a systemd unit, stop/kill the unit (best way).
2) Kill the entire 800002 branch (if rogue process tree)
sudo pkill -TERM -P 800002 || true
sudo kill -TERM 800002 || true
sleep 2
sudo pkill -KILL -P 800002 || true
sudo kill -KILL 800002 || true
Then confirm nothing remains:
pgrep -a -u openclaw -f "openclaw gateway" || true
ss -ltnp | grep ':18789' || true
3) Important port sanity check
If Caddy is on 18789, move one of them. Preferred layout:
- Caddy listens on :443/:80
- OpenClaw listens on 127.0.0.1:18789
- Caddy reverse-proxies to 127.0.0.1:18789
If Caddy must keep 18789, set OpenClaw to another port and update proxy target.
If you paste the output of systemctl status 800002 + /proc/800002/cgroup, I can tell you the exact parent service name to kill permanently in one command.
Bingo — that explains the entire ghost loop.
Short answer: no, 2026.4.5 does not hard-require -gateway in the unit name for runtime.
But yes, you should still migrate to the canonical service name for sanity.
Why
- openclaw.service is legacy and can confuse doctor / gateway status --deep / service tooling.
- Current docs + tooling expect openclaw-gateway.service (and profile variants).
- Keeping legacy name works today, but it’s not the safest long-term path.
Recommendation
- Keep one service only.
- Prefer canonical openclaw-gateway.service.
- Remove/disable the legacy openclaw.service once you’ve replicated the same ExecStart/env/state path.
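To check the "one service only" rule, you can scan the usual unit directories for both names. This is a sketch, not part of OpenClaw tooling: find_gateway_units is a hypothetical helper, and the directory list in the usage line is the standard systemd layout rather than anything confirmed for your box.

```shell
# Hypothetical helper: list legacy and canonical gateway unit files found in the given directories.
find_gateway_units() {
  for dir in "$@"; do
    for name in openclaw.service openclaw-gateway.service; do
      if [ -e "$dir/$name" ]; then
        printf '%s\n' "$dir/$name"
      fi
    done
  done
}
```

Usage: `find_gateway_units /etc/systemd/system /usr/lib/systemd/system /home/openclaw/.config/systemd/user`. If it prints more than one path, you have more than one supervisor candidate and should decide which single unit stays.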
If you want, I can give you a zero-downtime cutover sequence (copy unit, daemon-reload, stop old/start new, verify port/process, then purge old unit).
And yep — excellent forensic work here. This was a nasty one. 🦞
Perfect — here’s the cutover sequence (fast + rollback-ready).
# 0) backup the currently working legacy unit
OLD_UNIT_PATH=$(systemctl show -p FragmentPath --value openclaw.service)
sudo cp "$OLD_UNIT_PATH" "/etc/systemd/system/openclaw.service.bak.$(date +%F-%H%M%S)"
# 1) disable/remove the conflicting duplicate unit if present
sudo systemctl disable --now openclaw-gateway.service || true
# 2) clone the working legacy unit into canonical name
sudo cp "$OLD_UNIT_PATH" /etc/systemd/system/openclaw-gateway.service
sudo sed -i 's/^Description=.*/Description=OpenClaw Gateway/' /etc/systemd/system/openclaw-gateway.service
sudo systemctl daemon-reload
# 3) cutover (brief blip)
sudo systemctl stop openclaw.service
sudo systemctl enable --now openclaw-gateway.service
sudo systemctl disable openclaw.service
# 4) verify
sudo systemctl status openclaw-gateway.service --no-pager
ss -ltnp | grep ':18789' || true
sudo journalctl -u openclaw-gateway.service -n 50 --no-pager
Rollback (if needed):
sudo systemctl stop openclaw-gateway.service
sudo systemctl start openclaw.service