#ERR_MODULE_NOT_FOUND led to doctor --fix breaking config, now Gateway UI is completely unreachable (


random tinselBOT Apr 6, 2026, 4:30 PM

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

upper graniteBOT Apr 6, 2026, 4:32 PM

You’re almost certainly dealing with two overlapping problems:

a corrupted host npm install (ERR_MODULE_NOT_FOUND in /usr/lib/node_modules/openclaw/dist/...)
a runtime split-brain (host systemd gateway + Docker gateway competing / different configs)

Also, from docs: doctor --fix is a migration/repair tool, not a package file reinstaller, so it can legitimately rewrite config/service state.

1) First, verify what’s actually running (read-only)

(yeah, even though you said 2026.4.5 — verify current binary)

openclaw --version
openclaw gateway status --deep
openclaw status --all
ss -ltnp | rg ':18789'
docker compose ps

2) Pick ONE gateway owner (critical)

If your intended setup is Docker-owned, remove host daemon ownership:

openclaw gateway uninstall
systemctl --user disable --now openclaw-gateway.service || true

Then run only container gateway:

docker compose up -d openclaw-gateway
docker compose logs -f openclaw-gateway

If Docker is the owner, reinstalling host /usr/lib/node_modules/openclaw won’t fix container runtime.

3) Fix the missing module only on the runtime you actually use

If host gateway runtime is still needed:

npm uninstall -g openclaw
npm install -g openclaw@2026.4.5 --force

(That specifically addresses the missing hashed dist chunk mismatch.)

4) Fix “unauthorized: gateway token missing”

Check auth mode + token in the active config path:

openclaw config get gateway.auth.mode
openclaw config get gateway.auth.token

If mode is token and token is empty:

openclaw config set gateway.auth.mode token
openclaw config set gateway.auth.token "$(openssl rand -hex 32)"
openclaw gateway restart

Then get a fresh dashboard URL:

openclaw dashboard --no-open

If you paste outputs from step 1 (gateway status --deep, ss, docker compose ps), I can tell you exactly which process is stealing 18789 and give you a precise recovery path in 1-2 commands.
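Since an empty token is what produces "unauthorized: gateway token missing", it can help to sanity-check a generated value before writing it. This is only a sketch: `new_gateway_token` and `is_hex_token` are hypothetical helper names, not openclaw commands, and the 64-hex-character shape is simply what `openssl rand -hex 32` emits, not a documented OpenClaw requirement.

```shell
# Hypothetical helpers; neither function is part of the openclaw CLI.

# Print a fresh token: 32 random bytes rendered as 64 lowercase hex characters.
new_gateway_token() {
  openssl rand -hex 32
}

# Succeed only if $1 is exactly 64 lowercase hex characters.
is_hex_token() {
  [ "${#1}" -eq 64 ] || return 1
  case "$1" in
    *[!0-9a-f]*) return 1 ;;   # any character outside 0-9a-f is rejected
  esac
  return 0
}

# Usage (commented out so sourcing this snippet changes nothing):
#   token="$(new_gateway_token)"
#   is_hex_token "$token" && openclaw config set gateway.auth.token "$token"
```

If the check fails, the likely cause is that `openssl` is missing or errored, in which case the command substitution would have written an empty token.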


Yep — you can unblock this safely without doctor --fix.

Best low-risk option (no service rewrites)

Use:

openclaw doctor --non-interactive

Per docs, this applies only safe migrations (like config normalization) and skips restart/service/sandbox repair actions that need prompts.

If you want to patch manually (exactly what you asked), do this:

cp /root/.openclaw/openclaw.json /root/.openclaw/openclaw.json.pre-websearch-fix.bak
nano /root/.openclaw/openclaw.json

In tools.web.search, remove legacy provider-owned keys like:

apiKey
brave: { ... }
exa: { ... }
firecrawl: { ... }
etc. provider sub-objects under tools.web.search.*

Keep only normal runtime fields there (example):

tools: {
  web: {
    search: {
      enabled: true,
      provider: "brave",
      maxResults: 5,
    },
  },
},

If you were using Brave API key, move it to:

plugins: {
  entries: {
    brave: {
      config: {
        webSearch: {
          apiKey: "YOUR_KEY",
        },
      },
    },
  },
},

(Or temporarily disable search to unblock everything: search: { enabled: false }.)

Commands without `rg` / compose v2

Use these now:

openclaw --version
openclaw gateway status --deep
openclaw status --all
ss -ltnp | grep ':18789'
docker compose version || docker-compose version
docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'
docker-compose ps 2>/dev/null || true

Paste that output and I’ll give you the exact “Docker vs host service” cleanup sequence next.


Perfect, that narrows it down cleanly.
For your host-based setup, run these exact 2 commands:

npm install -g openclaw@latest --force
openclaw config set gateway.auth.mode token && openclaw config set gateway.auth.token "$(openssl rand -hex 32)" && openclaw gateway install --force && openclaw gateway restart

Then grab the token + verify:

openclaw config get gateway.auth.token
openclaw gateway status

Use that token in the Control UI settings.
(Yes, this rotates token auth to a fresh value.)

Yes — in your case that is the correct and safest next step.

Your logs match the documented startup guard: if gateway.mode is missing, service exits with code 1 before binding.

Run exactly:

openclaw config set gateway.mode local
openclaw gateway start
openclaw gateway status

If status comes back healthy, you’re done.
If it still fails, send just:

journalctl --user -u openclaw-gateway.service -n 50 --no-pager

(But based on your log, setting gateway.mode=local should unblock it.)

Hell yes — you’re at the last mile.

Use this exact sequence (replace YOUR_SUBDOMAIN with your real domain, no trailing slash):

openclaw config set gateway.controlUi.allowedOrigins '["https://YOUR_SUBDOMAIN","http://127.0.0.1:18789","http://localhost:18789"]' --strict-json
openclaw gateway restart

Then verify:

openclaw config get gateway.controlUi.allowedOrigins --json
openclaw gateway status

Notes:

Origin must be exact (scheme + host + optional port), no path.
If you use multiple hostnames (e.g. https://a.example.com and https://b.example.com), include both in that array.
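The "exact origin" rule in the notes (scheme + host + optional port, no path) can be checked before running config set. A minimal sketch, assuming a hypothetical helper named `is_bare_origin`; it is not an openclaw command, it only understands http/https with hostname or IPv4 hosts (no bracketed IPv6), and it does not range-check the port number.

```shell
# Hypothetical helper, not part of the openclaw CLI.
# Succeed if $1 is a bare origin: scheme://host[:port], no path, no trailing slash.
# Expects a single-line value.
is_bare_origin() {
  # -x anchors the pattern to the whole line; -E enables +, ?, and {m,n}.
  printf '%s\n' "$1" | grep -Eqx 'https?://[A-Za-z0-9.-]+(:[0-9]{1,5})?'
}

# Usage (commented out so sourcing this snippet changes nothing):
#   for o in "https://a.example.com" "http://127.0.0.1:18789"; do
#     is_bare_origin "$o" || echo "not a bare origin: $o"
#   done
```

A trailing slash or a path such as /ui fails the check, which is the usual way an otherwise correct hostname ends up rejected.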

If this works, glad that helped 🎉
If this fixed it, please right-click the solution message → Apps → ✅ Mark Solution so Answer Overflow can index it.

Nice — you’re at the final auth gate.

For Control UI, there is no pairing code. It’s a device approval flow.

Run:

openclaw devices list
openclaw devices approve --latest

If you want to approve a specific request instead:

openclaw devices approve <requestId>

Then refresh the UI tab.

If devices list shows nothing pending, reload the Control UI once (to create a fresh request), then run openclaw devices list again.

Great diagnosis — and yes, this is classic split-state.

Safest path: don’t “merge” directories and don’t run root against /home/openclaw/.openclaw.
Use openclaw user + /home/openclaw/.openclaw as the single source of truth, then re-apply only the few config fixes you made under root.

Recommended recovery plan

# 1) stop/remove the root-owned gateway service
openclaw gateway stop || true
openclaw gateway uninstall || true

# 2) make sure openclaw-user config has required keys
sudo -u openclaw -H openclaw config set gateway.mode local
sudo -u openclaw -H openclaw config set gateway.auth.mode token
sudo -u openclaw -H openclaw config set gateway.controlUi.allowedOrigins '["https://YOUR_SUBDOMAIN","http://127.0.0.1:18789","http://localhost:18789"]' --strict-json

sudo -u openclaw -H bash -lc 'openclaw config set gateway.auth.token "$(openssl rand -hex 32)"'

# 3) install/start service as openclaw user
sudo -u openclaw -H openclaw gateway install --force
sudo -u openclaw -H openclaw gateway restart
sudo -u openclaw -H openclaw gateway status --deep

Why this is safest

avoids root/openclaw ownership conflicts
preserves original agents/sessions/skills in /home/openclaw/.openclaw
avoids risky cross-user chown -R on active state

After this, you’ll likely need to re-pair the browser device once (normal).

Refs:

https://docs.openclaw.ai/help/faq#where-does-openclaw-store-its-data
https://docs.openclaw.ai/gateway/multiple-gateways
https://docs.openclaw.ai/cli/gateway

If this gets you fully back, please mark solution so it gets indexed on Answer Overflow ✅

Yep — in this situation, yes: running doctor in non-interactive mode as openclaw is the right move.

That mode is specifically for safe migrations (schema/state normalization) and skips service-repair prompts.
So it’s the best way to unblock legacy Slack/Web keys without redoing the root/systemd mess.

Use this sequence:

sudo -u openclaw -H bash -lc 'cp ~/.openclaw/openclaw.json ~/.openclaw/openclaw.json.pre-noninteractive.bak && openclaw doctor --non-interactive'

Then continue with the openclaw user config fixes:

sudo -u openclaw -H openclaw config set gateway.mode local
sudo -u openclaw -H openclaw config set gateway.auth.mode token

sudo -u openclaw -H openclaw config set gateway.controlUi.allowedOrigins '["https://YOUR_SUBDOMAIN","http://127.0.0.1:18789","http://localhost:18789"]' --strict-json
sudo -u openclaw -H bash -lc 'openclaw config set gateway.auth.token "$(openssl rand -hex 32)"'

Then reinstall/restart service as openclaw:

sudo -u openclaw -H openclaw gateway install --force
sudo -u openclaw -H openclaw gateway restart

So yes — your instinct was correct. doctor --non-interactive is the safest unblocker here.

Yes — in your exact case, proceed with --fix for the openclaw user.

You’ve already confirmed doctor is detecting the right migrations; it just needs repair mode to write them.

sudo -u openclaw -H bash -lc 'cp ~/.openclaw/openclaw.json ~/.openclaw/openclaw.json.pre-fix.bak && openclaw doctor --fix --non-interactive'

That should apply the Slack/WebSearch legacy migrations in /home/openclaw/.openclaw/openclaw.json and unblock config set.

Then continue:

sudo -u openclaw -H openclaw config set gateway.mode local
sudo -u openclaw -H openclaw config set gateway.auth.mode token
sudo -u openclaw -H openclaw config set gateway.controlUi.allowedOrigins '["https://YOUR_SUBDOMAIN","http://127.0.0.1:18789","http://localhost:18789"]' --strict-json

So yes: your proposed command is the right next step.

Yep — this is the right diagnosis.
For headless VPS, the safe fix is:

enable lingering for openclaw
start that user manager
run --user commands with the user-bus env set

uid=$(id -u openclaw)
sudo loginctl enable-linger openclaw
sudo systemctl start "user@${uid}.service"

sudo -u openclaw env \
XDG_RUNTIME_DIR="/run/user/${uid}" \
DBUS_SESSION_BUS_ADDRESS="unix:path=/run/user/${uid}/bus" \
openclaw gateway install --force


sudo -u openclaw env \
XDG_RUNTIME_DIR="/run/user/${uid}" \
DBUS_SESSION_BUS_ADDRESS="unix:path=/run/user/${uid}/bus" \
openclaw gateway restart

That’s the cleanest way to “wake up” systemctl --user without switching architecture.
So yes: loginctl enable-linger openclaw is part of the correct fix.
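If you end up repeating that env prefix a lot, it can be wrapped once. This is a sketch, not something OpenClaw ships: `user_bus_env` and `run_in_user_session` are hypothetical names, and it assumes linger is already enabled so that /run/user/<uid>/bus exists.

```shell
# Hypothetical helpers, not part of the openclaw CLI.

# Print the two env assignments needed to reach a user's systemd manager.
user_bus_env() {
  printf 'XDG_RUNTIME_DIR=/run/user/%s\n' "$1"
  printf 'DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/%s/bus\n' "$1"
}

# Run a command as the given user with those variables set.
run_in_user_session() {
  local user="$1"
  shift
  local uid
  uid="$(id -u "$user")" || return 1
  # Word splitting is intentional here: both values are free of spaces and globs.
  sudo -u "$user" env $(user_bus_env "$uid") "$@"
}

# Usage (commented out so sourcing this snippet changes nothing):
#   run_in_user_session openclaw systemctl --user status openclaw-gateway.service
#   run_in_user_session openclaw openclaw gateway restart
```

The wrapper does nothing the commands above do not; it only keeps the uid lookup and the two variables in one place.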

upper graniteBOT Apr 6, 2026, 6:08 PM

You’re correct: providers.openai.apiKey is old/invalid at root level.

✅ Correct config path in this build:
models.providers.openai.apiKey

So:

sudo -u openclaw -H openclaw config set models.providers.openai.apiKey "sk-..."
sudo -u openclaw -H openclaw gateway restart
sudo -u openclaw -H openclaw models status

Alternative (recommended for daemon setups): put OPENAI_API_KEY in /home/openclaw/.openclaw/.env and restart gateway.

And no — you generally do not need agents configure main just to use a global OpenAI key.
auth-profiles.json is mainly for explicit profile-managed auth flows; global provider key/env is enough for normal OpenAI usage.

(So between your options: use models.providers.openai.apiKey, not plugins.entries.openai.config.apiKey.)

Yep — that error means in your current config shape, models.providers.openai requires baseUrl when that block exists.

Fast fix: set both fields together in one write:

sudo -u openclaw -H openclaw config set --batch-json '[{"path":"models.providers.openai.baseUrl","value":"https://api.openai.com/v1"},{"path":"models.providers.openai.apiKey","value":"sk-REPLACE_ME"}]'
sudo -u openclaw -H openclaw gateway restart

If you’d rather avoid provider-block schema quirks, use env instead (also valid):

put OPENAI_API_KEY=... in /home/openclaw/.openclaw/.env
restart gateway

That should clear this validation error immediately.

Good catch — and yeah, this explains the loop.

You’re now in the custom provider schema path.
If models.providers.openai exists, OpenClaw treats it like a full custom provider object (with required fields), not a simple “drop key here” shortcut.

Safest fix (recommended)

Don’t use models.providers.openai.* for this right now.
Use env key + built-in OpenAI provider:

sudo -u openclaw -H openclaw config unset models.providers.openai
sudo -u openclaw -H bash -lc 'grep -q "^OPENAI_API_KEY=" ~/.openclaw/.env 2>/dev/null && sed -i "s/^OPENAI_API_KEY=.*/OPENAI_API_KEY=sk-REPLACE_ME/" ~/.openclaw/.env || echo "OPENAI_API_KEY=sk-REPLACE_ME" >> ~/.openclaw/.env'
sudo -u openclaw -H openclaw gateway restart

Then verify:

sudo -u openclaw -H openclaw models status
sudo -u openclaw -H openclaw models list

If openai/gpt-5.4 appears in models list, you can set it as primary.
No extra “capability flags” are required just to use it.

So: unset custom models.providers.openai, use OPENAI_API_KEY, restart, verify.
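The grep/sed one-liner above works for typical keys, but sed would misread a value that happens to contain "/" or "&". A more defensive sketch of the same "set or replace one key in .env" step follows; `upsert_env_key` is a hypothetical helper name, and it assumes the file uses plain KEY=VALUE lines.

```shell
# Hypothetical helper, not part of the openclaw CLI.
# Set KEY=VALUE in an env file: drop any existing line for KEY, then append the new one.
# No sed involved, so the value is written verbatim.
upsert_env_key() {
  local file="$1" key="$2" value="$3"
  local tmp
  tmp="$(mktemp)" || return 1
  # Copy every line except existing assignments of this key.
  if [ -f "$file" ]; then
    grep -v -- "^${key}=" "$file" > "$tmp" || true
  fi
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write through cat instead of mv so an existing target keeps its owner and mode.
  cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage (commented out so sourcing this snippet changes nothing):
#   upsert_env_key /home/openclaw/.openclaw/.env OPENAI_API_KEY "sk-REPLACE_ME"
```

Run it as the openclaw user (for example through sudo -u openclaw -H bash -lc) so the file is not created with root ownership, and consider chmod 600 on the file since it holds a secret.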

That error means the main agent auth store still has no OpenAI credential (or the running service isn’t seeing env).

Fastest deterministic fix: write the key directly into that agent auth store:

sudo -u openclaw -H openclaw models auth paste-token --provider openai
sudo -u openclaw -H openclaw models status --probe
sudo -u openclaw -H openclaw gateway restart

When prompted, paste your sk-... key.

If it still fails after that, run this once to confirm you’re not accidentally still on root service/config:

sudo -u openclaw -H openclaw gateway status --deep

(You should see service/config paths under /home/openclaw/.openclaw, not /root/.openclaw.)

Huge win. Nicely done digging through that mess. 🦞

For your goal (openclaw data + root-simple ops), the clean answer is:

Use a system service that runs as `User=openclaw`

Yes — that’s the right “vanilla” shape for VPS admin from root.

Why

no per-command XDG_RUNTIME_DIR / DBUS_SESSION_BUS_ADDRESS hacks
keep state in /home/openclaw/.openclaw
manage with normal root commands: sudo systemctl restart openclaw-gateway

Migration (safe path)

Disable the user service (so you don’t run two gateways):

uid=$(id -u openclaw)
sudo -u openclaw env XDG_RUNTIME_DIR="/run/user/$uid" DBUS_SESSION_BUS_ADDRESS="unix:path=/run/user/$uid/bus" systemctl --user disable --now openclaw-gateway.service || true

Create system unit:

OPENCLAW_BIN=$(command -v openclaw)
sudo tee /etc/systemd/system/openclaw-gateway.service >/dev/null <<EOF
[Unit]
Description=OpenClaw Gateway
After=network-online.target
Wants=network-online.target

[Service]
User=openclaw
Group=openclaw
Environment=HOME=/home/openclaw
Environment=OPENCLAW_STATE_DIR=/home/openclaw/.openclaw
ExecStart=${OPENCLAW_BIN} gateway --port 18789
Restart=always
RestartSec=5
TimeoutStopSec=30
TimeoutStartSec=30
SuccessExitStatus=0 143
KillMode=control-group

[Install]
WantedBy=multi-user.target
EOF

Enable/start:

sudo systemctl daemon-reload
sudo systemctl enable --now openclaw-gateway.service
sudo systemctl status openclaw-gateway.service --no-pager

After this, manage it with:

sudo systemctl restart openclaw-gateway
sudo systemctl status openclaw-gateway
sudo journalctl -u openclaw-gateway -f

That gives you the stability of openclaw user data with root-level simplicity.

If this fixed it end-to-end, please mark solution so Answer Overflow indexes it ✅

upper graniteBOT Apr 6, 2026, 7:03 PM

You’re not hitting a stale lockfile.

In OpenClaw, gateway singleton is TCP bind-based (no persistent PID/lock file to delete).
That "already running under systemd; waiting 5000ms" loop is almost always supervisor conflict (two service layers / duplicate units), not a port lock.

Nuclear reset (clean + deterministic)

# 0) stop both possible managers
sudo systemctl disable --now openclaw-gateway.service || true

uid=$(id -u openclaw)
sudo loginctl enable-linger openclaw
sudo systemctl start "user@${uid}.service"

sudo -u openclaw env \
XDG_RUNTIME_DIR="/run/user/${uid}" \
DBUS_SESSION_BUS_ADDRESS="unix:path=/run/user/${uid}/bus" \
systemctl --user disable --now openclaw-gateway.service || true

# 1) kill leftover processes + clear failed state
sudo pkill -9 -u openclaw -f "openclaw gateway" || true
sudo systemctl reset-failed openclaw-gateway.service || true

Now use one system service only, and make it run foreground gateway directly:

ExecStart=/usr/bin/openclaw gateway run --port 18789

(Use your actual openclaw path if different.)

Then:

sudo systemctl daemon-reload
sudo systemctl enable --now openclaw-gateway.service
sudo systemctl status openclaw-gateway.service --no-pager

Direct answers

Lock/PID file causing this? No (not the mechanism here).
Built-in daemon conflict? Yes, effectively — duplicate supervision paths/user-vs-system unit behavior.
Nuclear option? Disable all other units, kill all gateway procs, run one foreground system unit only.

If this clears it, please mark solution so others can find this ghost-loop fix via Answer Overflow ✅

You found the right thread to pull. This is not /run/user/* — it’s the gateway lock layer.

Direct answers

Lock file exists (in this codepath), at:
/tmp/openclaw-<uid>/gateway.<hash>.lock
--force won’t fix this lock-loop (it helps port reclaim, not lock-owner logic).
The loop happens because under systemd it retries instead of hard-failing on lock conflict.

Nuclear reset (do this exactly)

uid=$(id -u openclaw)

sudo systemctl stop openclaw-gateway.service
sudo pkill -9 -u openclaw -f "openclaw gateway" || true

sudo rm -f /tmp/openclaw-${uid}/gateway.*.lock
sudo systemctl reset-failed openclaw-gateway.service

Then run once in foreground (outside systemd retry behavior) to see the real error:

sudo -u openclaw -H env \
-u INVOCATION_ID -u SYSTEMD_EXEC_PID -u JOURNAL_STREAM \
OPENCLAW_STATE_DIR=/home/openclaw/.openclaw \
/usr/bin/openclaw gateway run --port 18789 --verbose

If that starts cleanly, your service can be adjusted similarly (or fall back to user service).
If it errors, paste that first hard error — that’s the root cause hidden by the 5s systemd retry loop.

Also: yes, this retry behavior is supervisor-detection-driven, not evidence of a surviving port bind.
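Before the rm -f, it is worth seeing which lock files are actually there. A minimal sketch, assuming the /tmp/openclaw-<uid>/gateway.<hash>.lock layout described above; `list_gateway_locks` is a hypothetical helper name, not an openclaw command.

```shell
# Hypothetical helper, not part of the openclaw CLI.
# Print gateway lock files found in a directory, one per line; print nothing if none.
list_gateway_locks() {
  local dir="$1" f
  for f in "$dir"/gateway.*.lock; do
    # An unmatched glob stays literal, so confirm the file really exists.
    [ -e "$f" ] && printf '%s\n' "$f"
  done
  return 0
}

# Usage (commented out so sourcing this snippet changes nothing):
#   list_gateway_locks "/tmp/openclaw-$(id -u openclaw)"
```

If it prints nothing while the retry loop continues, the lock file is probably not what is holding things up and the foreground run is the better next step.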

Yep — if the lingering user manager is the squatter, the clean kill switch is:

disable linger
terminate the user manager session

You don’t need to close a special socket manually.

uid=$(id -u openclaw)

sudo loginctl disable-linger openclaw
sudo loginctl terminate-user openclaw
sudo systemctl stop "user@${uid}.service" || true

# ensure old user-unit autostart link is gone
sudo rm -f /home/openclaw/.config/systemd/user/default.target.wants/openclaw-gateway.service


# cleanup leftovers
sudo pkill -9 -u openclaw -f "openclaw gateway" || true
sudo rm -f /tmp/openclaw-${uid}/gateway.*.lock

Then start only your system unit:

sudo systemctl restart openclaw-gateway.service
sudo systemctl status openclaw-gateway.service --no-pager

So yes: loginctl terminate-user openclaw is the definitive immediate stop, and disable-linger prevents it from respawning in the background.

upper graniteBOT Apr 6, 2026, 8:16 PM

Nice catch. PPID 800002 means you’re right: there’s a living supervisor chain.

Also: if Caddy is truly listening on :18789, that alone can block OpenClaw. One listener per port.

1) Identify exactly what 800002 is

Run:

ps -o pid,ppid,pgid,sid,user,tty,etime,cmd -p 800002,800009
tr '\0' ' ' < /proc/800002/cmdline; echo
readlink -f /proc/800002/exe
cat /proc/800002/cgroup
systemctl status 800002 --no-pager

If that PID belongs to a systemd unit, stop/kill the unit (best way).

2) Kill the entire 800002 branch (if rogue process tree)

sudo pkill -TERM -P 800002 || true
sudo kill -TERM 800002 || true
sleep 2
sudo pkill -KILL -P 800002 || true
sudo kill -KILL 800002 || true

Then confirm nothing remains:

pgrep -a -u openclaw -f "openclaw gateway" || true
ss -ltnp | grep ':18789' || true

3) Important port sanity check

If Caddy is on 18789, move one of them. Preferred layout:

Caddy listens on :443/:80
OpenClaw listens on 127.0.0.1:18789
Caddy reverse-proxies to 127.0.0.1:18789

If Caddy must keep 18789, set OpenClaw to another port and update proxy target.

If you paste the output of systemctl status 800002 + /proc/800002/cgroup, I can tell you the exact parent service name to kill permanently in one command.
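To answer "who is on 18789" without eyeballing the ss output, the owner name and PID can be pulled out of the users:(...) column. A sketch only: `port_owners` is a hypothetical helper name, and it assumes the usual `ss -ltnp` column layout (local address in field 4, owners rendered as ("name",pid=N,fd=M)). Run ss with sudo, otherwise the process column may be empty.

```shell
# Hypothetical helper, not part of the openclaw CLI.
# Read `ss -ltnp` output on stdin; print "name pid" for each process listening on the port.
port_owners() {
  awk -v port="$1" '
    {
      # Field 4 is Local Address:Port; compare the part after the last colon.
      n = split($4, parts, ":")
      if (parts[n] != port) next
      line = $0
      # Each owner appears as ("name",pid=123,fd=4) inside users:(...).
      while (match(line, /\("[^"]+",pid=[0-9]+/)) {
        entry = substr(line, RSTART + 2, RLENGTH - 2)   # name",pid=123
        split(entry, kv, "\",pid=")
        print kv[1], kv[2]
        line = substr(line, RSTART + RLENGTH)
      }
    }'
}

# Usage (commented out so sourcing this snippet changes nothing):
#   sudo ss -ltnp | port_owners 18789
```

If this prints caddy rather than node or openclaw, the port layout described in step 3 is the thing to fix first.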

Bingo — that explains the entire ghost loop.

Short answer: no, 2026.4.5 does not hard-require -gateway in the unit name for runtime.
But yes, you should still migrate to the canonical service name for sanity.

Why

openclaw.service is legacy and can confuse doctor / gateway status --deep / service tooling.
Current docs + tooling expect openclaw-gateway.service (and profile variants).
Keeping legacy name works today, but it’s not the safest long-term path.

Recommendation

Keep one service only.
Prefer canonical openclaw-gateway.service.
Remove/disable the legacy openclaw.service once you’ve replicated the same ExecStart/env/state path.

If you want, I can give you a zero-downtime cutover sequence (copy unit, daemon-reload, stop old/start new, verify port/process, then purge old unit).

And yep — excellent forensic work here. This was a nasty one. 🦞

Perfect — here’s the cutover sequence (fast + rollback-ready).

# 0) backup the currently working legacy unit
OLD_UNIT_PATH=$(systemctl show -p FragmentPath --value openclaw.service)
sudo cp "$OLD_UNIT_PATH" "/etc/systemd/system/openclaw.service.bak.$(date +%F-%H%M%S)"

# 1) disable/remove the conflicting duplicate unit if present
sudo systemctl disable --now openclaw-gateway.service || true

# 2) clone the working legacy unit into canonical name
sudo cp "$OLD_UNIT_PATH" /etc/systemd/system/openclaw-gateway.service
sudo sed -i 's/^Description=.*/Description=OpenClaw Gateway/' /etc/systemd/system/openclaw-gateway.service
sudo systemctl daemon-reload

# 3) cutover (brief blip)
sudo systemctl stop openclaw.service
sudo systemctl enable --now openclaw-gateway.service
sudo systemctl disable openclaw.service

# 4) verify
sudo systemctl status openclaw-gateway.service --no-pager
ss -ltnp | grep ':18789' || true
sudo journalctl -u openclaw-gateway.service -n 50 --no-pager

Rollback (if needed):

sudo systemctl stop openclaw-gateway.service
sudo systemctl start openclaw.service
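Whether you cut over or roll back, the thing to confirm is that exactly one of the two units is active afterwards. A small sketch of that check; `exactly_one_active` is a hypothetical helper name, and it only reads "unit state" pairs, so the systemctl call lives in the usage line.

```shell
# Hypothetical helper, not part of the openclaw CLI.
# Read "unit state" pairs on stdin and succeed only when exactly one state is "active".
exactly_one_active() {
  local unit state count=0
  while read -r unit state; do
    [ "$state" = "active" ] && count=$((count + 1))
  done
  [ "$count" -eq 1 ]
}

# Usage (commented out so sourcing this snippet changes nothing):
#   for u in openclaw.service openclaw-gateway.service; do
#     printf '%s %s\n' "$u" "$(systemctl is-active "$u")"
#   done | exactly_one_active && echo "single gateway owner"
```

Zero active units means the gateway is down; two means the duplicate-supervisor loop can come back.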
