Please help. When I follow the logs, they show the gateway closed, and then it might come back, and so on. On Discord, messages can show abnormally long loading times, with the typing indicator coming and going. The logs also say the gateway is not reachable, but the turn still continues? status --deep won't work because it says the gateway is not reachable.
#gateway oddities
This sounds like two “gateways” may be getting mixed together:
- Gateway websocket closed: 1000 in Discord logs = Discord’s gateway websocket reconnecting.
- openclaw status --deep says gateway not reachable = OpenClaw’s local Gateway daemon is unreachable.
First thing: please paste:
openclaw --version
openclaw gateway status
openclaw status --all
openclaw doctor --deep
If status --deep fails, that’s expected while the OpenClaw Gateway is unreachable — use status --all instead.
Also grab the relevant log tail:
grep -Ei 'discord|Gateway websocket|READY|reconnect|identify|fatal|error|listener|EADDRINUSE|not reachable' /tmp/openclaw/openclaw-$(date +%F).log | tail -200
My suspicion: either the Gateway process is restarting/crashing, or there’s a Discord websocket reconnect loop from duplicate bot-token usage / duplicate gateway instance. Check for duplicates too:
pgrep -af 'openclaw.*gateway|openclaw-gateway|node .*openclaw'
docker ps
Docs for the gateway runbook are here: https://docs.openclaw.ai/gateway/troubleshooting
That status is useful: this is not a port conflict. The port is “in use” because your OpenClaw Gateway process owns it.
The bad part is:
Runtime: running
Listening: 127.0.0.1:18789
Connectivity probe: failed
timeout
That usually means the process is alive and bound to the port, but the Gateway isn’t answering the WebSocket probe — likely startup hang, event-loop stall, or the process is overloaded.
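If it helps to see the distinction, here is a minimal sketch that separates "nothing is listening" from "listening but not answering". The function names are made up for illustration and are not part of OpenClaw. It uses a plain HTTP request as a stand-in for the CLI's WebSocket probe, and it assumes the Gateway serves /healthz on the same port.

```shell
# Hypothetical helper (not an OpenClaw command): classify the Gateway state
# from two independent observations.
#   $1 = "yes" if something is listening on the port, else "no"
#   $2 = "yes" if an HTTP request got any response, else "no"
classify_gateway() {
  local listening="$1" answering="$2"
  if [ "$listening" = "no" ]; then
    echo "down: nothing is bound to the port"
  elif [ "$answering" = "no" ]; then
    echo "wedged: port is bound but the process is not answering"
  else
    echo "healthy: port is bound and the process answers"
  fi
}

# Observation helpers. Port 18789 is taken from the status output above;
# the /healthz path is an assumption.
port_listening() { ss -ltn 2>/dev/null | grep -q ":18789 " && echo yes || echo no; }
gateway_answers() {
  curl -s -o /dev/null --max-time 5 http://127.0.0.1:18789/healthz && echo yes || echo no
}

# Usage:
#   classify_gateway "$(port_listening)" "$(gateway_answers)"
```

Your status output would land in the "wedged" branch: the port is bound, but the probe times out.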
Please run this next:
openclaw gateway restart
sleep 20
openclaw gateway status
If it still says probe timeout, paste:
journalctl --user -u openclaw-gateway.service -n 200 --no-pager
grep -Ei 'fatal|error|exception|timeout|liveness|event.loop|event-loop|memory|heap|rss|discord|Gateway websocket|READY|reconnect' /tmp/openclaw/openclaw-2026-04-28.log | tail -250
ps -p 3535 -o pid,etimes,%cpu,%mem,rss,vsz,stat,cmd
Also: don’t chase the Port 18789 is already in use line yet — that’s expected while pid 3535 is listening.
One extra note: there’s a main-branch fix today around exposing event-loop health in readiness, so if 2026.4.26 keeps doing this, this may be worth a GitHub issue with diagnostics. But first let’s see whether it recovers after restart or immediately wedges again.
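Side note on the grep above: the log filename has a date baked in, so it goes stale after midnight. A small sketch for picking the newest file instead, assuming the logs keep the openclaw-YYYY-MM-DD.log naming seen in this thread (the helper name is hypothetical):

```shell
# Hypothetical helper: print the newest date-stamped OpenClaw log in a directory.
# Relies on the YYYY-MM-DD stamp sorting lexicographically in date order.
#   $1 = log directory (defaults to /tmp/openclaw, the path used in this thread)
latest_log() {
  ls -1 "${1:-/tmp/openclaw}"/openclaw-*.log 2>/dev/null | sort | tail -n 1
}

# Usage:
#   grep -Ei 'fatal|error|timeout' "$(latest_log)" | tail -250
```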
This confirms it: the Gateway process is alive but wedged/unresponsive.
The big clue is this line:
Git main @ 4340cb74 · behind 856 · fetch failed
Command: /home/ubuntu/openclaw/dist/index.js gateway
So you’re running from a source checkout, not the npm-installed latest. npm latest 2026.4.26 doesn’t really help if systemd is launching ~/openclaw/dist/index.js, and that checkout is 856 commits behind.
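If you want to double-check which binary the service really launches, something like this should show it. It is a sketch: unit_exec_start is a made-up helper name, and the unit name is the one from the journalctl command earlier.

```shell
# Hypothetical helper: print the ExecStart command from a systemd unit file
# supplied on stdin.
unit_exec_start() {
  sed -n 's/^ExecStart=//p'
}

# Usage: show what the user service launches
#   systemctl --user cat openclaw-gateway.service | unit_exec_start
#
# How many commits the checkout is behind origin/main. This only reflects the
# last successful fetch, so the number is stale while fetch is failing:
#   git -C ~/openclaw rev-list --count HEAD..origin/main
```

If ExecStart points into ~/openclaw/dist, the npm-installed version is not what is running.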
I’d do this next:
cd ~/openclaw
git fetch origin main
git pull --ff-only
pnpm install
pnpm build
openclaw gateway install --force
openclaw gateway restart
sleep 20
openclaw gateway status
If git fetch still fails, fix that first — otherwise you’re stuck on an old source tree/dist.
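The sleep 20 in that sequence is only a guess at how long startup takes. If you would rather poll than wait a fixed time, a generic retry loop works. This is a sketch, wait_until is a hypothetical helper, and the /healthz path is an assumption about what the Gateway serves on that port.

```shell
# Hypothetical helper: retry a command until it succeeds or attempts run out.
#   $1 = max attempts, $2 = seconds between attempts, rest = command to run
# Returns 0 on the first success, 1 if every attempt failed.
wait_until() {
  local tries="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$tries" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Only sleep when another attempt is still coming.
    if [ "$i" -lt "$tries" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage: poll the local endpoint for up to roughly 30 seconds
#   wait_until 15 2 curl -sf --max-time 3 http://127.0.0.1:18789/healthz \
#     && echo "gateway reachable" || echo "still not answering"
```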
If it still wedges after updating/rebuilding, paste:
journalctl --user -u openclaw-gateway.service -n 200 --no-pager
grep -Ei 'fatal|error|exception|timeout|liveness|event.loop|event-loop|memory|heap|rss|discord|Gateway websocket|READY|reconnect' /tmp/openclaw/openclaw-2026-04-28.log | tail -250
Yep — same diagnosis. systemd is saying “I successfully started a process,” but the OpenClaw CLI is saying “that process is not answering as a Gateway.”
So the next useful thing is startup logs for pid 7423 / that restart, not more status output:
journalctl --user -u openclaw-gateway.service -b -n 300 --no-pager
tail -n 200 ~/.openclaw/logs/gateway.err.log ~/.openclaw/logs/gateway.log 2>/dev/null
grep -Ei 'fatal|error|exception|timeout|liveness|event.loop|event-loop|memory|heap|rss|listening|ready|discord|Gateway websocket|reconnect' /tmp/openclaw/openclaw-2026-04-28.log | tail -300
ps -p 7423 -o pid,etimes,%cpu,%mem,rss,vsz,stat,cmd
Also: did you update/rebuild the source checkout yet? Your earlier report said systemd is launching:
/home/ubuntu/openclaw/dist/index.js
and that checkout was 856 commits behind. If it’s still behind, I’d fix that before digging deeper, because the installed npm version won’t affect this service path.
That log is actually pretty telling.
These lines:
client initialized ... awaiting gateway readiness
...
[heartbeat] started
mean startup is getting partway through, but it’s not reaching a healthy Gateway-ready state. The unresolved Discord channels are probably just deleted/missing channels, not fatal. The Tailscale Serve failure is worth fixing later, but it should not make local ws://127.0.0.1:18789 time out.
The big thing: this matches a known-ish “Discord gateway startup stuck at awaiting gateway readiness” area, and your checkout is 856 commits behind. I’d update/rebuild before chasing logs further:
cd ~/openclaw
git status -sb
git fetch origin main
git merge --ff-only origin/main
pnpm install
pnpm build
pnpm openclaw gateway install --force
pnpm openclaw gateway restart
sleep 20
pnpm openclaw gateway status
If git fetch fails, paste the exact error. If fetch fails but origin/main is already ahead locally, this may still work:
cd ~/openclaw
git merge --ff-only origin/main
If it still wedges after that, temporarily isolate Tailscale:
openclaw config set gateway.tailscale.mode off
openclaw gateway restart
sleep 20
openclaw gateway status
That’s reversible, and helps tell us whether the tailscale serve failed path is involved.
That update failure is separate and likely the root cause of being stuck on the old checkout.
unable to write file ... fatal: Could not reset index file usually means one of:
- disk/inodes full
- repo files owned by another user, often from a previous sudo git/pnpm
- filesystem / repo dir not writable
Please don’t rerun the update yet — check this first:
cd ~/openclaw
df -h .
df -ih .
id
git status -sb
ls -ld . .git vendor vendor/a2ui vendor/a2ui/specification vendor/a2ui/specification/0.9 2>/dev/null
find . -maxdepth 4 ! -user "$(id -un)" -printf '%u:%g %p\n' 2>/dev/null | head -50
If df shows 100%, free space first.
If the find output shows lots of root:root or another user inside ~/openclaw, that’s probably it.
Also, if you ran any of the build/update commands with sudo, don’t — the source checkout should be writable by your normal ubuntu user.
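If the find output is long, a per-owner count is easier to read than a file list. A rough sketch, assuming GNU find (the -printf flag used above already requires it); foreign_owners is a hypothetical name:

```shell
# Hypothetical helper: count files under a directory that are NOT owned by the
# current user, grouped by owner. Prints nothing when ownership is clean.
#   $1 = directory to scan (defaults to the current directory)
foreign_owners() {
  find "${1:-.}" ! -user "$(id -u)" -printf '%u\n' 2>/dev/null \
    | sort | uniq -c | sort -rn
}

# Usage:
#   foreign_owners ~/openclaw
```

Any line that shows root there points at an earlier sudo run inside the checkout.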
Disk/inodes look fine, so next suspect is permissions/ownership in ~/openclaw or .git.
Run these:
cd ~/openclaw
id
git status -sb
ls -ld . .git .git/index vendor vendor/a2ui vendor/a2ui/specification vendor/a2ui/specification/0.9 2>/dev/null
find . -maxdepth 4 ! -user "$(id -un)" -printf '%u:%g %p\n' 2>/dev/null | head -80
If you see root:root in that output, fix ownership from the parent dir:
sudo chown -R "$(id -un):$(id -gn)" ~/openclaw
Then retry:
cd ~/openclaw
git reset --hard HEAD
git fetch origin main
git merge --ff-only origin/main
If the find output is clean and it still can’t write, paste ls -ld output — we’ll check file modes/immutable flags next.
Nice — ownership looks basically OK, and behind 3 means the fetch mostly worked. The earlier behind 856 is gone.
Two possibilities now:
- the repo is fine now and only needs the last few commits
- the failing files are one level deeper than what we checked (.../0.9/json/*)
Try the simple path first:
cd ~/openclaw
git status --porcelain
git pull --ff-only
If git status --porcelain prints local changes, paste it before pulling.
If git pull works:
pnpm install
pnpm build
pnpm openclaw gateway install --force
pnpm openclaw gateway restart
sleep 20
pnpm openclaw gateway status
If it fails with unable to write file again, check the deeper path:
cd ~/openclaw
ls -ld vendor/a2ui/specification/0.9/json
ls -l vendor/a2ui/specification/0.9/json/common_types.json vitest.config.ts zizmor.yml 2>/dev/null
lsattr -d vendor/a2ui/specification/0.9/json vitest.config.ts zizmor.yml 2>/dev/null
Paste that if it still errors.
Good, repo update is fixed ✅
Now rebuild/reinstall the service so systemd stops launching stale dist:
cd ~/openclaw
pnpm install
pnpm build
pnpm openclaw gateway install --force
pnpm openclaw gateway restart
sleep 20
pnpm openclaw gateway status
If pnpm build fails, paste the first real error. If status still says runtime running but connectivity timeout after this, paste the fresh journalctl tail again.
It should work, but in your case openclaw update hit the source-checkout updater’s preflight worktree and failed while writing files.
Since you’re running from:
/home/ubuntu/openclaw/dist/index.js
you’re in the git/source checkout flow, not the npm-global package flow. openclaw update does extra safety steps there: fetch, preflight temp worktree/build, then update/build/restart. Your manual git pull --ff-only just succeeded, so the repo itself is probably OK now.
For this incident, I’d skip openclaw update and finish manually:
cd ~/openclaw
pnpm install
pnpm build
pnpm openclaw gateway install --force
pnpm openclaw gateway restart
sleep 20
pnpm openclaw gateway status
After things are healthy, you can test the updater again with:
pnpm openclaw update status
pnpm openclaw update --dry-run
Yes — openclaw update --channel dev is supposed to use the git checkout and update to latest main.
The flow is roughly:
set channel dev → use ~/openclaw git checkout → fetch origin/main → preflight/build-check → rebase/update → install deps → build → doctor/plugin sync → restart gateway
In your case, it started that flow but failed at the preflight worktree step:
Preparing preflight worktree
error: unable to write file ...
fatal: Could not reset index file to revision 'HEAD'
So the intended answer is “yes”, but the updater itself got blocked before it could finish. Since your manual git pull --ff-only worked afterward, finishing with manual pnpm install && pnpm build && pnpm openclaw gateway restart is the fastest recovery path.
After the gateway is healthy, pnpm openclaw update --channel dev --dry-run would be the safe way to see whether the updater is still broken.
Yeah — at this point doctor won’t fix it, because the service does start; it just never becomes reachable. We need to isolate whether startup is hanging on Discord/Tailscale vs core Gateway.
This will briefly take the bot offline:
cd ~/openclaw
pnpm openclaw gateway stop
Then run the Gateway in the foreground with channels skipped + startup tracing:
cd ~/openclaw
OPENCLAW_SKIP_CHANNELS=1 OPENCLAW_GATEWAY_STARTUP_TRACE=1 pnpm openclaw gateway --tailscale off --port 18789 --verbose
Leave that running for ~30s. In a second terminal, run:
cd ~/openclaw
pnpm openclaw gateway status
curl -i --max-time 5 http://127.0.0.1:18789/healthz
curl -i --max-time 5 http://127.0.0.1:18789/readyz
If this works, the core Gateway is fine and the hang is likely channel/Tailscale startup.
If this still fails, paste the foreground startup trace — that should show exactly where it wedges.
Good — this is progress. With channels skipped, connectivity probe is OK and /healthz is 200.
So the core Gateway is fine. The normal startup hang is coming from something loaded during full service startup — most likely Discord channel startup or Tailscale Serve, not the Gateway server itself.
Also this line matters:
Service config looks out of date or non-standard.
Let’s clean that up first, then isolate channels.
Run:
cd ~/openclaw
pnpm openclaw gateway install --force
systemctl --user daemon-reload
pnpm openclaw gateway restart
sleep 20
pnpm openclaw gateway status
If it wedges again, test with Tailscale disabled persistently:
pnpm openclaw config set gateway.tailscale.mode off
pnpm openclaw gateway restart
sleep 20
pnpm openclaw gateway status
If it still wedges after Tailscale off, then we isolate Discord accounts. Since your logs show two Discord accounts (Pikachu and Quiet), temporarily disable one configured Discord account at a time in config and restart. Don’t paste tokens, but you can inspect account ids with:
pnpm openclaw config get channels.discord --json
Given the earlier awaiting gateway readiness lines, I’m leaning Discord startup/lifecycle race rather than core Gateway.
Yep — that confirms the foreground “skip channels” test is healthy:
{"ready":true,"failing":[]}
The event_loop_delay is degraded but not fatal there; 1.5s max delay during startup/build-ish conditions can happen. The important part is: Gateway ready works when channels are skipped.
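For repeat checks, the /readyz body can be tested in a script. This is a sketch that assumes the response keeps the flat shape shown above. It uses grep rather than a real JSON parser, and readyz_ok is a made-up name.

```shell
# Hypothetical helper: succeed only if the /readyz JSON on stdin reports
# "ready": true. Pattern matching only, so it assumes the flat shape above.
readyz_ok() {
  grep -Eq '"ready"[[:space:]]*:[[:space:]]*true'
}

# Usage:
#   curl -s --max-time 5 http://127.0.0.1:18789/readyz | readyz_ok \
#     && echo "ready" || echo "not ready"
```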
So next isolate in this order:
- clean/reinstall service metadata
- disable Tailscale
- isolate Discord accounts
Commands:
cd ~/openclaw
pnpm openclaw gateway install --force
systemctl --user daemon-reload
pnpm openclaw config set gateway.tailscale.mode off
pnpm openclaw gateway restart
sleep 20
pnpm openclaw gateway status
If still unreachable, check the Discord config account ids:
pnpm openclaw config get channels.discord --json
Don’t paste tokens. We’re looking for the account keys/names so we can temporarily disable one bot at a time. Your logs show isolated-bot / Pikachu and main-bot / Quiet; one of those may be hanging startup.
for the first time, gateway status --all and --deep succeed without asking if the gateway is running... good sign..
That’s a very good sign 🎉
Given what changed, the likely culprit was one of:
- stale/non-standard systemd service metadata/PATH
- Tailscale Serve startup getting in the way
- less likely, Discord, now that full status/deep works
Do one more sanity pass:
pnpm openclaw gateway status
pnpm openclaw status --deep
pnpm openclaw channels status --probe
If those stay green and Discord replies normally, you’re probably fixed.
Since we set Tailscale off while isolating, only turn it back on if you actually need remote dashboard access:
pnpm openclaw config set gateway.tailscale.mode serve
pnpm openclaw gateway restart
sleep 20
pnpm openclaw gateway status
If enabling Tailscale makes it wedge again, we’ve found the trigger.
no way, those commands worked for the first time... i dont need tailscale i can live without it tbh. i just ssh into my server via tailscale and not into the gateway
Hell yeah — then leave gateway.tailscale.mode off. That’s a totally valid setup.
If you SSH in over Tailscale, you can still reach the dashboard safely with a tunnel when needed:
ssh -N -L 18789:127.0.0.1:18789 ubuntu@<your-tailscale-hostname>
Then open locally:
http://127.0.0.1:18789/
So the stable state is:
pnpm openclaw config set gateway.tailscale.mode off
pnpm openclaw gateway restart
pnpm openclaw status --deep
If this fixed it, please right-click the solution message → Apps → ✅ Mark Solution so Answer Overflow indexes it for the next person. 🦐
tbh i dont want to save this solution cause in this chat i gave a lot of personal logs
Totally fair — don’t mark it then. Privacy > indexing.
If you want a safe version later, you can make a new sanitized thread with only:
Symptom: Gateway service running, but local probe timed out / status --deep unreachable.
Cause/workaround: Tailscale Serve exposure seemed to wedge startup.
Fix: set gateway.tailscale.mode off, reinstall/restart gateway service.
No hostnames, channel IDs, logs, usernames, or paths needed.
For now: just leave it unmarked. Glad it’s working again 🦐