# OpenClaw Gateway won't start

barren basin

Service: systemd (enabled)
File logs: /tmp/openclaw/openclaw-2026-04-28.log
Command: /usr/bin/node /home/openclaw/.npm-global/lib/node_modules/openclaw/dist/index.js gateway --port 18789
Service file: ~/.config/systemd/user/openclaw-gateway.service
Service env: OPENCLAW_GATEWAY_PORT=18789

Config (cli): ~/.openclaw/openclaw.json
Config (service): ~/.openclaw/openclaw.json

Gateway: bind=loopback (127.0.0.1), port=18789 (service args)
Probe target: ws://127.0.0.1:18789
Dashboard: http://127.0.0.1:18789/
Probe note: Loopback-only gateway; only local clients can connect.

Runtime: running (pid 4212, state active, sub running, last exit 0, reason 0)
Warm-up: launch agents can take a few seconds. Try again shortly.
Connectivity probe: failed
Probe target: ws://127.0.0.1:18789
timeout
Capability: unknown

Port 18789 is already in use.

  • pid 4212 openclaw: openclaw-gateway (127.0.0.1:18789)
  • Gateway already running locally. Stop it (openclaw gateway stop) or use a different port.
    Listening: 127.0.0.1:18789
    Troubles: run openclaw status
    Troubleshooting: https://docs.openclaw.ai/troubleshooting
paper hamletBOT

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

severe nexusBOT

This is not “port conflict with another app” — PID 4212 is already the OpenClaw gateway listening on 127.0.0.1:18789.

The real problem is that the process is running but the WS/RPC probe is timing out. That usually means the gateway is hung during startup or blocked on something before it can answer.

Please paste:

openclaw --version
openclaw status --all
tail -n 120 /tmp/openclaw/openclaw-2026-04-28.log

Also: if this is on a VPS and you’re opening the dashboard from another machine, bind=loopback means only the VPS itself can connect. You’d need an SSH tunnel or a non-loopback/Tailscale-style setup. But the local probe timeout still needs the logs.

Docs: https://docs.openclaw.ai/gateway/troubleshooting

barren basin

this is on a VPS, but I'm currently inputting all commands through the console

severe nexusBOT

Got it — then loopback is fine. Since you’re running commands on the VPS itself, 127.0.0.1:18789 should work.

So the important bit is: gateway process is running, but not responding to the probe. Let’s look at startup logs:

openclaw --version
openclaw status --all
tail -n 160 /tmp/openclaw/openclaw-2026-04-28.log
journalctl --user -u openclaw-gateway -n 120 --no-pager

If you want to try a quick recovery first:

openclaw gateway restart
sleep 5
openclaw gateway status

But if it still says Connectivity probe: failed ... timeout, paste the log tail — that should show what it’s stuck on.

barren basin

which logs do you want to see? I got a lot of them

severe nexusBOT

Just the latest gateway log tail plus the systemd tail — not everything.

Run these and paste the output:

openclaw --version
tail -n 160 /tmp/openclaw/openclaw-2026-04-28.log
journalctl --user -u openclaw-gateway -n 80 --no-pager

If that file is huge/noisy, this filtered one is also useful:

grep -Ei 'error|warn|fail|timeout|listen|gateway|port|panic|exception|denied|auth' /tmp/openclaw/openclaw-2026-04-28.log | tail -n 120

Make sure to redact any API keys/tokens if you spot them, though OpenClaw’s status/log tooling usually redacts the obvious ones.
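If grep isn't handy, or you want to tweak the filter programmatically, the same "match keywords, keep the last N lines" step can be done with Node, which the gateway already requires. This is a minimal sketch: the keyword list is copied from the grep command above, and the script name `filter-log.js` is hypothetical.

```javascript
// Node version of: grep -Ei '<keywords>' <log> | tail -n <limit>
const fs = require('fs');

// Same keywords as the grep command above; /i makes the match case-insensitive.
// Note: the pattern must not use the /g flag, or .test() becomes stateful.
const KEYWORDS = /error|warn|fail|timeout|listen|gateway|port|panic|exception|denied|auth/i;

// Return the last `limit` lines of `text` that match `pattern`.
function filterTail(text, pattern = KEYWORDS, limit = 120) {
  const matches = text.split('\n').filter((line) => pattern.test(line));
  return matches.slice(Math.max(0, matches.length - limit));
}

// Usage: node filter-log.js /tmp/openclaw/openclaw-2026-04-28.log [limit]
if (process.argv[2]) {
  const limit = Number(process.argv[3] || 120);
  const text = fs.readFileSync(process.argv[2], 'utf8');
  console.log(filterTail(text, KEYWORDS, limit).join('\n'));
}
```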

barren basin

version: OpenClaw 2026.4.26 (be8c246)

severe nexusBOT

Thanks. 2026.4.26 is recent, so the next thing I need is the actual startup failure or stall from the logs.

Please paste these two outputs:

tail -n 160 /tmp/openclaw/openclaw-2026-04-28.log
journalctl --user -u openclaw-gateway -n 80 --no-pager

If the first one is too noisy, use this instead:

grep -Ei 'error|warn|fail|timeout|listen|gateway|port|panic|exception|denied|auth' /tmp/openclaw/openclaw-2026-04-28.log | tail -n 120
barren basin

is there a way to share a file? The output was big

severe nexusBOT

Yep — attach it as a .txt file in this thread.

Best small files to upload:

tail -n 300 /tmp/openclaw/openclaw-2026-04-28.log > openclaw-log-tail.txt
journalctl --user -u openclaw-gateway -n 150 --no-pager > openclaw-journal-tail.txt

Then drag/drop both files here.

Before uploading, quick sanity check for secrets:

grep -Ei 'api[_-]?key|token|secret|password|authorization|bearer|sk-' openclaw-log-tail.txt openclaw-journal-tail.txt

If that shows anything sensitive, redact it first. Paths, ports, stack traces, and errors are fine.
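The same check can be sketched in Node if you also want a masked preview of each hit. The keyword list matches the grep above. Masking every run of 20+ token-like characters is an assumed heuristic, not OpenClaw's own redaction, so still eyeball the result before posting.

```javascript
// Flag lines that look like they contain secrets before sharing a log.
const fs = require('fs');

// Same keywords as the grep above (case-insensitive, no /g so .test() is stateless).
const SECRET_HINT = /api[_-]?key|token|secret|password|authorization|bearer|sk-/i;
// Assumed heuristic: any run of 20+ token-like characters gets masked.
const LONG_TOKEN = /[A-Za-z0-9_-]{20,}/g;

// Return [{ line, text }] for each matching line (1-based), with long tokens masked.
function findSecrets(text) {
  const hits = [];
  text.split('\n').forEach((line, i) => {
    if (SECRET_HINT.test(line)) {
      hits.push({ line: i + 1, text: line.replace(LONG_TOKEN, '[REDACTED]') });
    }
  });
  return hits;
}

// Usage: node find-secrets.js openclaw-log-tail.txt openclaw-journal-tail.txt
if (process.argv.length > 2) {
  for (const file of process.argv.slice(2)) {
    for (const hit of findSecrets(fs.readFileSync(file, 'utf8'))) {
      console.log(`${file}:${hit.line}: ${hit.text}`);
    }
  }
}
```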

barren basin

are you sure Discord lets you upload files in posts? I don't seem to be able to

severe nexusBOT

You may be right — Discord/forum permissions can block uploads. My bad.

Let’s make the output small enough to paste directly. Run:

grep -Ei 'error|warn|fail|timeout|listen|gateway|port|panic|exception|denied|auth|starting|started' /tmp/openclaw/openclaw-2026-04-28.log | tail -n 80

And for systemd:

journalctl --user -u openclaw-gateway -n 60 --no-pager

If either is still too large, paste just the last ~50 lines in a Discord code block:

```text
paste here
```

What I’m specifically looking for is the first error/warning near gateway startup, not the full log history.

barren basin

I'll paste the log by parts

severe nexusBOT

Perfect. Please paste them as code blocks if you can, and start with the last part first — the newest lines are usually most useful.

Something like:

part 1/3
...

Aim for the output from:

tail -n 200 /tmp/openclaw/openclaw-2026-04-28.log

Then after that, if needed, we’ll look at the journalctl tail.

barren basin
severe nexusBOT

Yep — I can read that Drive file. No need to paste the giant log.

The useful bits I see are:

telegram/network: fetch fallback ... ETIMEDOUT,ENETUNREACH
gateway/ws: handshake timeout ... 127.0.0.1 -> 127.0.0.1:18789
telegram deleteWebhook failed: Network request for 'deleteWebhook' failed!

So this looks less like “gateway won’t start” and more like Telegram startup/network calls are hanging/failing, and the local gateway probe times out while that’s happening.

Try this quick isolation test:

openclaw config set channels.telegram.enabled false --strict-json
openclaw gateway restart
sleep 5
openclaw gateway status

If the probe becomes OK, we’ve confirmed Telegram/network is the culprit.

Then test VPS access to Telegram:

curl -4 -v --connect-timeout 10 https://api.telegram.org/ 2>&1 | tail -40
curl -6 -v --connect-timeout 10 https://api.telegram.org/ 2>&1 | tail -40

If IPv4 also times out/fails, your VPS/provider/firewall/DNS is blocking outbound Telegram API access. Docs: https://docs.openclaw.ai/channels/telegram#troubleshooting

barren basin

no dice:

🦞 OpenClaw 2026.4.26 (be8c246) — The only crab in your contacts you actually want to hear from. 🦞

│
◇  
Service: systemd (enabled)
File logs: /tmp/openclaw/openclaw-2026-04-28.log
Command: /usr/bin/node /home/openclaw/.npm-global/lib/node_modules/openclaw/dist/index.js gateway --port 18789
Service file: ~/.config/systemd/user/openclaw-gateway.service
Service env: OPENCLAW_GATEWAY_PORT=18789

Config (cli): ~/.openclaw/openclaw.json
Config (service): ~/.openclaw/openclaw.json

Gateway: bind=loopback (127.0.0.1), port=18789 (service args)
Probe target: ws://127.0.0.1:18789
Dashboard: http://127.0.0.1:18789/
Probe note: Loopback-only gateway; only local clients can connect.

Runtime: running (pid 7228, state active, sub running, last exit 0, reason 0)
Warm-up: launch agents can take a few seconds. Try again shortly.
Connectivity probe: failed
Probe target: ws://127.0.0.1:18789
  timeout
Capability: unknown

Port 18789 is already in use.
- pid 7228 openclaw: openclaw-gateway (127.0.0.1:18789)
- Gateway already running locally. Stop it (openclaw gateway stop) or use a different port.
Listening: 127.0.0.1:18789
Troubles: run openclaw status
Troubleshooting: https://docs.openclaw.ai/troubleshooting

severe nexusBOT

Thanks for testing — that rules out the obvious “Telegram alone blocks startup” theory.

The PID changed to 7228, so the restart happened. Now I need the fresh lines after that restart, not the old 23:17 log. Since Drive worked, easiest is:

tail -n 250 /tmp/openclaw/openclaw-2026-04-28.log > fresh-openclaw-log.txt
journalctl --user -u openclaw-gateway -n 120 --no-pager > fresh-openclaw-journal.txt

Share those via Drive the same way and I can read them.

Also run these two tiny checks and paste the output:

openclaw config get channels.telegram.enabled
curl -i --max-time 5 http://127.0.0.1:18789/ | head -20

That tells us:

  • whether Telegram actually stayed disabled in the service config
  • whether plain HTTP dashboard responds while only the WebSocket/RPC probe is wedged.
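For reference, the HTTP half of that check can be sketched in Node. This is a generic probe, not OpenClaw's WebSocket/RPC probe. It only shows whether something on the port answers plain HTTP before a deadline. It also illustrates how a server that accepts connections but never replies shows up as a timeout rather than a refusal.

```javascript
// Generic HTTP reachability check with a deadline (Node's built-in http module).
const http = require('http');

// Resolve { ok: true, status } once response headers arrive, or
// { ok: false, error } on a connection error or when timeoutMs elapses first.
function checkHttp(url, timeoutMs = 5000) {
  return new Promise((resolve) => {
    let timer;
    const finish = (result) => {
      clearTimeout(timer);
      resolve(result); // later calls are ignored by the Promise
    };
    const req = http.get(url, (res) => {
      res.resume(); // discard the body; only the status matters here
      finish({ ok: true, status: res.statusCode });
    });
    req.on('error', (err) => finish({ ok: false, error: err.message }));
    // Destroying the request with an error triggers the 'error' handler above.
    timer = setTimeout(() => req.destroy(new Error('timeout')), timeoutMs);
  });
}

// Usage: node check-http.js http://127.0.0.1:18789/
if (process.argv[2]) {
  checkHttp(process.argv[2]).then((r) => console.log(r));
}
```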
barren basin

should I sudo it?

➜  logs journalctl --user -u openclaw-gateway -n 120 --no-pager > fresh-openclaw-journal.txt
Hint: You are currently not seeing messages from the system.
      Users in groups 'adm', 'systemd-journal' can see all messages.
      Pass -q to turn off this notice.
No journal files were opened due to insufficient permissions.
severe nexusBOT

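For the journal, yes, but use sudo only for reading logs. Note that sudo journalctl --user reads root's own user journal, not the openclaw user's, so filter by the user unit instead:

sudo journalctl --user-unit openclaw-gateway.service -n 120 --no-pager > fresh-openclaw-journal.txt

Alternatively, as the hint suggests, add the openclaw user to the systemd-journal group (sudo usermod -aG systemd-journal openclaw), log out and back in, and rerun the original command without sudo.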

But the file log is probably enough, and doesn’t need sudo:

tail -n 250 /tmp/openclaw/openclaw-2026-04-28.log > fresh-openclaw-log.txt

Then share the fresh log link.

Also paste these directly if you can:

openclaw config get channels.telegram.enabled
curl -i --max-time 5 http://127.0.0.1:18789/ | head -20
barren basin

link to the log: https://drive.google.com/file/d/1N2hme2XTuiN1yMe1alFYdLzetHzRs-9n/view?usp=sharing

openclaw config get channels.telegram.enabled:
false

curl -i --max-time 5 http://127.0.0.1:18789/ | head -20:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2801  100  2801    0     0  72931      0 --:--:-- --:--:-- --:--:-- 73710
HTTP/1.1 200 OK
X-Content-Type-Options: nosniff
Referrer-Policy: no-referrer
Permissions-Policy: camera=(), microphone=(self), geolocation=()
X-Frame-Options: DENY
Content-Security-Policy: default-src 'self'; base-uri 'none'; object-src 'none'; frame-ancestors 'none'; script-src 'self' 'sha256-RxCZFmTWY/yQmhYxMDn+blaCuwLzOsV/XsVb0n5EkRU='; style-src 'self' 'unsafe-inline' https://fonts.googleapis.com; img-src 'self' data: blob:; font-src 'self' https://fonts.gstatic.com; worker-src 'self'; connect-src 'self' ws: wss:
Content-Type: text/html; charset=utf-8
Cache-Control: no-cache
Date: Tue, 28 Apr 2026 23:53:32 GMT
Connection: keep-alive
Keep-Alive: timeout=5
Content-Length: 2801

<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>OpenClaw Control</title>
    <meta name="color-scheme" content="dark light" />
severe nexusBOT

I can read the log. Good news: the gateway is actually starting now.

Fresh log shows:

http server listening ... 
gateway ready

and your curl confirms the HTTP dashboard is responding.

The remaining noisy/failing part is WhatsApp, not Telegram:

WhatsApp session logged out. Run: openclaw channels login
[whatsapp] channel exited ... 401 Unauthorized / Connection Failure
auto-restart attempt ...

Let’s isolate that too:

openclaw config set channels.whatsapp.enabled false --strict-json
openclaw gateway restart
sleep 10
openclaw gateway status

If that goes green, the gateway is fine and the next step is either:

openclaw channels login --channel whatsapp

or leave WhatsApp disabled if you don’t need it right now.

One more note: I also saw chat.history taking ~58s in the log, which may explain why CLI probes feel like they’re timing out even though the HTTP server is alive. First disable the broken WhatsApp loop; then rerun status and we’ll see if the slow history/RPC issue remains.

barren basin
Service: systemd (enabled)
File logs: /tmp/openclaw/openclaw-2026-04-28.log
Command: /usr/bin/node /home/openclaw/.npm-global/lib/node_modules/openclaw/dist/index.js gateway --port 18789
Service file: ~/.config/systemd/user/openclaw-gateway.service
Service env: OPENCLAW_GATEWAY_PORT=18789

Config (cli): ~/.openclaw/openclaw.json
Config (service): ~/.openclaw/openclaw.json

Gateway: bind=loopback (127.0.0.1), port=18789 (service args)
Probe target: ws://127.0.0.1:18789
Dashboard: http://127.0.0.1:18789/
Probe note: Loopback-only gateway; only local clients can connect.

Runtime: running (pid 9003, state active, sub running, last exit 0, reason 0)
Warm-up: launch agents can take a few seconds. Try again shortly.
Connectivity probe: failed
Probe target: ws://127.0.0.1:18789
  timeout
Capability: unknown

Port 18789 is already in use.
- pid 9003 openclaw: openclaw-gateway (127.0.0.1:18789)
- Gateway already running locally. Stop it (openclaw gateway stop) or use a different port.
Listening: 127.0.0.1:18789
Troubles: run openclaw status
Troubleshooting: https://docs.openclaw.ai/troubleshooting
severe nexusBOT

Okay, that means the HTTP side is alive, but the RPC/WebSocket probe is still getting stuck.

Two focused checks now:

openclaw config get channels.whatsapp.enabled
openclaw gateway status --timeout 60000

If the 60s timeout eventually works, then the gateway isn’t dead — the normal 10s probe is just being starved by something slow.

Also close any open dashboard/control UI browser tab for a moment, then:

openclaw gateway restart
sleep 10
openclaw gateway status --timeout 60000

Reason: your log had very slow RPC calls like chat.history taking ~48–58s, and those can make the CLI probe look like the gateway is unreachable even while curl http://127.0.0.1:18789/ works.

If channels.whatsapp.enabled is still true, disable it again first:

openclaw config set channels.whatsapp.enabled false --strict-json
openclaw gateway restart
barren basin

whatsapp is disabled. Why am I having such slow RPC calls?

severe nexusBOT

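Because the gateway is Node-based, all RPCs share one event loop. If one RPC does a lot of synchronous work (for example, reading and parsing a huge transcript), every other RPC and handshake waits until it finishes. Your log showed chat.history taking ~48–58s, which is not normal.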

Most likely cause: a huge/gnarly active session transcript. chat.history has to read/parse/project the session transcript before returning it, and the Control UI/dashboard often requests that when opened.
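Here is a generic Node.js sketch of that starvation effect (not OpenClaw code). Synchronous work on the event loop delays every other callback, so a 10 ms timer fires late when the loop is busy. The busy-wait stands in for something like parsing a huge transcript synchronously.

```javascript
// Busy-wait for `ms` milliseconds; nothing else can run on this thread meanwhile.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin
  }
}

// Schedule a 10 ms timer, then block the loop for `blockMs`.
// Resolves with how many ms actually passed before the timer callback ran.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const start = Date.now();
    setTimeout(() => resolve(Date.now() - start), 10);
    blockFor(blockMs);
  });
}

(async () => {
  const idle = await measureTimerDelay(0);
  const busy = await measureTimerDelay(200);
  console.log(`10 ms timer ran after ${idle} ms when idle, ${busy} ms behind a 200 ms block`);
})();
```

The timer cannot fire until the blocking loop returns. A 10s client-side probe deadline can expire the same way while the server is stuck in one long synchronous call.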

Run these two read-only checks:

node - <<'NODE'
// Rank the 20 slowest gateway RPC responses in the JSON-lines log.
const fs=require('fs');
const p='/tmp/openclaw/openclaw-2026-04-28.log';
const rows=fs.readFileSync(p,'utf8').split('\n').flatMap(l=>{
  try {
    const j=JSON.parse(l);
    const m=typeof j.message==='string' ? j.message : '';
    // Assumed line shape: "⇄ res <marker> <method> <N>ms"
    const x=m.match(/⇄ res . ([^ ]+) (\d+)ms/);
    return x ? [{ms:+x[2], method:x[1], time:j.time}] : [];
  } catch { return []; } // skip lines that are not JSON
}).sort((a,b)=>b.ms-a.ms).slice(0,20);
console.table(rows);
NODE

And:

find ~/.openclaw/agents -path '*/sessions/*' -type f -printf '%s %TY-%Tm-%Td %TH:%TM %p\n' \
  | sort -nr | head -30 | numfmt --field=1 --to=iec

Paste those results. If we see one giant transcript/session file, we can quarantine or clean it up safely — don’t delete anything yet.
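`-printf` is a GNU find extension. If it isn't available, or you prefer Node, this sketch produces a similar listing. It mirrors the `-path '*/sessions/*'` filter by keeping files that sit below a directory named `sessions`. Its IEC formatting uses one decimal place, so sizes may differ slightly from numfmt's rounding. The script name is hypothetical.

```javascript
// Node alternative to: find ... -path '*/sessions/*' -type f | sort -nr | head -30 | numfmt --to=iec
const fs = require('fs');
const path = require('path');

// Recursively collect regular files under `dir` (symlinks are not followed).
function walk(dir, out = []) {
  for (const ent of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, ent.name);
    if (ent.isDirectory()) walk(full, out);
    else if (ent.isFile()) out.push(full);
  }
  return out;
}

// IEC size like numfmt --to=iec (1K = 1024 bytes), with one decimal place.
function iec(bytes) {
  const units = ['', 'K', 'M', 'G', 'T'];
  let n = bytes;
  let i = 0;
  while (n >= 1024 && i < units.length - 1) {
    n /= 1024;
    i += 1;
  }
  return i === 0 ? String(n) : `${n.toFixed(1)}${units[i]}`;
}

// Largest files somewhere below a directory named "sessions", biggest first.
function largestSessionFiles(root, limit = 30) {
  return walk(root)
    .filter((p) => path.dirname(p).split(path.sep).includes('sessions'))
    .map((p) => {
      const st = fs.statSync(p);
      return { path: p, size: st.size, mtime: st.mtime };
    })
    .sort((a, b) => b.size - a.size)
    .slice(0, limit);
}

// Usage: node largest-sessions.js ~/.openclaw/agents
if (process.argv[2]) {
  for (const f of largestSessionFiles(process.argv[2])) {
    console.log(iec(f.size), f.mtime.toISOString().slice(0, 16), f.path);
  }
}
```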

Docs for session cleanup: https://docs.openclaw.ai/cli/sessions

barren basin
Service: systemd (enabled)
File logs: /tmp/openclaw/openclaw-2026-04-28.log
Command: /usr/bin/node /home/openclaw/.npm-global/lib/node_modules/openclaw/dist/index.js gateway --port 18789
Service file: ~/.config/systemd/user/openclaw-gateway.service
Service env: OPENCLAW_GATEWAY_PORT=18789

Config (cli): ~/.openclaw/openclaw.json
Config (service): ~/.openclaw/openclaw.json

Gateway: bind=loopback (127.0.0.1), port=18789 (service args)
Probe target: ws://127.0.0.1:18789
Dashboard: http://127.0.0.1:18789/
Probe note: Loopback-only gateway; only local clients can connect.

Runtime: running (pid 9003, state active, sub running, last exit 0, reason 0)
Connectivity probe: ok
Capability: admin-capable

Listening: 127.0.0.1:18789
Troubles: run openclaw status
Troubleshooting: https://docs.openclaw.ai/troubleshooting
severe nexusBOT

Nice — that’s the healthy state 🎉

So the gateway itself is up:

Connectivity probe: ok
Capability: admin-capable

Likely what happened: broken channel startup/retry loops + slow dashboard/session RPCs made the short probe time out during warmup. With Telegram/WhatsApp disabled, the probe can complete.

Next:

  • If you need Telegram, re-enable it after confirming VPS outbound access to api.telegram.org.
  • If you need WhatsApp, run:
openclaw channels login --channel whatsapp
  • Then re-enable only the channel you’re actively fixing, one at a time.

If this solved the thread, please right-click the solution message → Apps → ✅ Mark Solution so Answer Overflow indexes it for future users.

barren basin

I'm trying to message my agent over the TUI and he won't respond. Maybe it has something to do with the slow RPCs?

➜  logs openclaw tui

🦞 OpenClaw 2026.4.26 (be8c246) — I've survived more breaking changes than your last three relationships.

 openclaw tui - ws://127.0.0.1:18789 - agent main - session main

 agents list failed: Error: gateway request timeout for agents.list

 history failed: Error: gateway request timeout for chat.history
 connected | idle
 agent main | session main | unknown | tokens ?
severe nexusBOT

Yes, possibly — but there are two separate TUI modes:

  • openclaw tui = Gateway-backed, uses RPC/WebSocket
  • openclaw chat / openclaw terminal = local embedded mode, not the gateway

If you’re using openclaw tui, slow chat.history/RPC can absolutely make it look frozen.

Try a fresh session to bypass any huge/stuck history:

openclaw tui --session tui-smoke-$(date +%s)

Send a simple hi. If that responds, your old/default session history is probably the issue.

If it still doesn’t respond, run this in another console while sending a message:

tail -f /tmp/openclaw/openclaw-2026-04-28.log | grep -Ei 'chat|agent|model|error|timeout|rate|auth|run'

Also check model/auth health:

openclaw models status

Paste what appears right after you send the TUI message. That’ll tell us whether it’s stuck at RPC/history, agent run startup, or model auth/API. Docs: https://docs.openclaw.ai/cli/tui


Yep — this confirms it:

connected | idle
agents.list failed: gateway request timeout
history failed: gateway request timeout

The WS connection succeeds, but startup RPCs time out. Since chat.history is one of them, a large/bad session main history can block things badly enough that even agents.list times out.

Try this to bypass main history:

openclaw tui --session tui-smoke-$(date +%s) --history-limit 1

If that works, the issue is almost certainly the default session main transcript/history.


Then run:

find ~/.openclaw/agents -path '*/sessions/*' -type f -printf '%s %TY-%Tm-%Td %TH:%TM %p\n' \
  | sort -nr | head -30 | numfmt --field=1 --to=iec

Also useful:

openclaw sessions cleanup --all-agents --dry-run

Don’t delete anything yet — paste the biggest files / cleanup dry-run summary first. We can quarantine the bad session safely if needed.

barren basin

it worked. Let me get you the outputs of the other two commands

severe nexusBOT

Perfect — that strongly points to the default session main history/transcript being the problem, not the gateway or TUI generally.

Paste those outputs when you have them. We’re looking for either:

  • a giant transcript file, or
  • cleanup showing stale/oversized session entries.

Once we identify the main session file, safest fix is to archive/move it aside, not delete it.

barren basin
severe nexusBOT

I read the file. Those biggest files are mostly under the security agent and checkpoint/reset artifacts, not obviously the active main TUI session.

For the default TUI session, I want the actual store entry for agent:main:main. Run:

node - <<'NODE'
const fs=require('fs');
for (const p of [
  `${process.env.HOME}/.openclaw/agents/main/sessions/sessions.json`,
  `${process.env.HOME}/.openclaw/sessions/sessions.json`,
]) {
  if (!fs.existsSync(p)) continue;
  const j=JSON.parse(fs.readFileSync(p,'utf8'));
  console.log('STORE', p);
  for (const [k,v] of Object.entries(j)) {
    if (k === 'agent:main:main' || k === 'main' || k.includes(':main')) {
      console.log(k, JSON.stringify(v,null,2));
    }
  }
}
NODE

If it shows a sessionId or sessionFile, check its size:

ls -lh ~/.openclaw/agents/main/sessions/<SESSION_ID>.jsonl

Since the fresh session worked, the pragmatic fix is probably to reset the bad main session from the TUI:

openclaw tui --session main --history-limit 1

then type:

/reset

But don’t do that yet if you care about preserving the old main conversation — let’s identify the file first so we can archive it safely.

barren basin
severe nexusBOT

Dry-run is clean — no stale/missing sessions to prune. So cleanup won’t fix this.

Important bit:

agent:main:main ... system id:a56410de-0968-4e3a-ad81-61035f4d85f3

Let’s inspect that exact transcript:

ls -lh ~/.openclaw/agents/main/sessions/a56410de-0968-4e3a-ad81-61035f4d85f3.jsonl
wc -l ~/.openclaw/agents/main/sessions/a56410de-0968-4e3a-ad81-61035f4d85f3.jsonl

If you don’t care about preserving the old main session, try resetting it with minimal history load:

openclaw tui --session main --history-limit 1

Then type:

/reset

If you do care about preserving it, don’t reset yet — paste the ls/wc output first and we can archive it safely.

barren basin

what happens if I don't preserve the old main session?

severe nexusBOT

If you don’t preserve it, you’re basically saying: “forget this one conversation thread and start main fresh.”

What you lose:

  • the old main chat transcript/history in the TUI
  • that session’s token counters / selected model overrides / thinking settings if stored there
  • any context the agent was relying on from that conversation

What you don’t lose:

  • config
  • agents
  • auth/API keys
  • workspace files
  • other sessions like dm, Telegram sessions, etc.

Safest reset path:

openclaw tui --session main --history-limit 1

then type:

/reset

That should archive/reset the agent:main:main session and let openclaw tui work normally again. If you want a zero-risk backup first:

cp ~/.openclaw/agents/main/sessions/a56410de-0968-4e3a-ad81-61035f4d85f3.jsonl ~/main-session-backup.jsonl
barren basin

➜ logs ls -lh ~/.openclaw/agents/main/sessions/a56410de-0968-4e3a-ad81-61035f4d85f3.jsonl
wc -l ~/.openclaw/agents/main/sessions/a56410de-0968-4e3a-ad81-61035f4d85f3.jsonl
ls: cannot access '/home/openclaw/.openclaw/agents/main/sessions/a56410de-0968-4e3a-ad81-61035f4d85f3.jsonl': No such file or directory
wc: /home/openclaw/.openclaw/agents/main/sessions/a56410de-0968-4e3a-ad81-61035f4d85f3.jsonl: No such file or directory

severe nexusBOT

Ah, interesting — the session store points at a564..., but the transcript file is already missing. That mismatch itself can cause weird history behavior.

Check whether the store has a custom sessionFile path:

node - <<'NODE'
const fs=require('fs');
const p=`${process.env.HOME}/.openclaw/agents/main/sessions/sessions.json`;
const j=JSON.parse(fs.readFileSync(p,'utf8'));
const e=j['agent:main:main'];
console.log(JSON.stringify(e,null,2));
NODE

Also search for the transcript under a reset/deleted/checkpoint name:

find ~/.openclaw/agents/main/sessions -type f -name '*a56410de*' -printf '%s %p\n' | numfmt --field=1 --to=iec

If no transcript exists, I’d just reset main:

openclaw tui --session main --history-limit 1

then:

/reset

There may be no old transcript left to preserve anyway.

barren basin

this appears to be the session file /home/openclaw/.openclaw/agents/main/sessions/a56410de-0968-4e3a-ad81-61035f4d85f3.jsonl

severe nexusBOT

That path is what the store wants to use, but your ls showed the file isn’t actually there.

Can you run these exactly?

pwd
echo "$HOME"
ls -ld ~/.openclaw/agents/main/sessions
find ~/.openclaw/agents/main/sessions -maxdepth 1 -type f -name '*a56410de*' -ls

If the find command returns nothing, the transcript is gone/archived elsewhere and resetting main is the right move.

If it does return a .reset... or .deleted... file, paste that line — we can decide whether to restore/archive/reset cleanly.

barren basin

drwx------ 2 openclaw openclaw 12288 Apr 28 22:19 /home/openclaw/.openclaw/agents/main/sessions
1835084 16 -rw-rw-r-- 1 openclaw openclaw 13828 Apr 28 22:02 /home/openclaw/.openclaw/agents/main/sessions/a56410de-0968-4e3a-ad81-61035f4d85f3.jsonl.codex-app-server.json
1835018 4 -rw------- 1 openclaw openclaw 248 Apr 28 22:02 /home/openclaw/.openclaw/agents/main/sessions/a56410de-0968-4e3a-ad81-61035f4d85f3.trajectory-path.json
1835115 100 -rw------- 1 openclaw openclaw 98628 Apr 28 22:02 /home/openclaw/.openclaw/agents/main/sessions/a56410de-0968-4e3a-ad81-61035f4d85f3.trajectory.jsonl
1835116 72 -rw------- 1 openclaw openclaw 71391 Apr 28 18:36 /home/openclaw/.openclaw/agents/main/sessions/a56410de-0968-4e3a-ad81-61035f4d85f3.trajectory.jsonl.deleted.2026-04-28T23-07-01.991Z

severe nexusBOT

Yep — there is no actual:

a56410de-...jsonl

Only sidecar/trajectory files exist. So agent:main:main points to a missing transcript. That explains the weirdness.
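The same check can be written as a small function that separates the plain `<id>.jsonl` transcript from files that merely share the session-id prefix. The naming rule is inferred from the listing above (sidecars, trajectories, `.deleted.<timestamp>` archives), not from OpenClaw's source. The script name is hypothetical.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Given a session id and the file names in the sessions directory, report
// whether "<id>.jsonl" itself exists and list the other files sharing the prefix.
function classifySessionFiles(sessionId, fileNames) {
  const transcript = `${sessionId}.jsonl`;
  const related = fileNames.filter((name) => name.startsWith(sessionId)).sort();
  return {
    transcriptPresent: related.includes(transcript),
    others: related.filter((name) => name !== transcript),
  };
}

// Usage: node classify-session.js <SESSION_ID>
if (process.argv[2]) {
  const dir = path.join(os.homedir(), '.openclaw', 'agents', 'main', 'sessions');
  console.log(classifySessionFiles(process.argv[2], fs.readdirSync(dir)));
}
```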

Fix it by resetting main:

openclaw tui --session main --history-limit 1

then type:

/reset

After that, quit and try normal TUI again:

openclaw tui

If /reset times out because the broken main session is too wedged, the fallback is to patch the session store manually — but try the built-in reset first since it preserves OpenClaw’s bookkeeping.

barren basin
#

it takes the tui like a whole minute to connect, is that normal?

severe nexusBOT

No — a whole minute is not normal. A local loopback TUI should usually connect in a few seconds.

Given what we found, the likely reason is: TUI connects, then immediately tries agents.list + chat.history; chat.history for session main is wedged because the session store points at a missing transcript. It waits for RPC timeouts before showing you the UI.

For now, bypass it:

openclaw tui --session tui-smoke-$(date +%s) --history-limit 1

For the real fix, reset the broken main session:

openclaw tui --session main --history-limit 1

then type:

/reset

After that:

openclaw tui

should start much faster. If /reset also hangs/times out, tell me and we’ll do the manual session-store repair.

barren basin

how do I know if the session was reset?

severe nexusBOT

After /reset, the TUI should show a system line like:

session agent:main:main reset

You can verify from the shell too:

node - <<'NODE'
const fs=require('fs');
const p=`${process.env.HOME}/.openclaw/agents/main/sessions/sessions.json`;
const j=JSON.parse(fs.readFileSync(p,'utf8'));
console.log(JSON.stringify(j['agent:main:main'], null, 2));
NODE

Look for a new sessionId — it should no longer be:

a56410de-0968-4e3a-ad81-61035f4d85f3

Then check the new transcript exists:

NEW_ID=$(node -e "const fs=require('fs'); const j=JSON.parse(fs.readFileSync(process.env.HOME+'/.openclaw/agents/main/sessions/sessions.json','utf8')); console.log(j['agent:main:main']?.sessionId||'')")
ls -lh ~/.openclaw/agents/main/sessions/$NEW_ID.jsonl

Final practical check:

openclaw tui

If it opens quickly and no longer prints history failed, the reset worked.

barren basin

yeah, I have a new session id.

severe nexusBOT

Nice — reset worked.

Now final checks:

openclaw tui

It should open without the minute-long delay / history failed.

And:

openclaw gateway status

should stay:

Connectivity probe: ok
Capability: admin-capable

Once that’s good, you can re-enable channels one at a time. I’d leave WhatsApp disabled until you’re ready to relogin, since the logs showed it was logged out/retrying.

barren basin

should I do this for all agents?

severe nexusBOT

No — don’t reset all agents preemptively.

Only reset sessions that are actually broken/slow. Resetting all agents would wipe useful conversation context for security, clu, percy, etc.

Good rule:

  1. Test the agent/session:

    openclaw tui --session agent:security:main --history-limit 1
    

    or whatever session you use.

  2. If it opens/responds normally, leave it alone.

  3. If it hangs or history failed, inspect that session’s store entry / transcript, then reset just that session.

For now I’d only reset agent:main:main, keep WhatsApp disabled until relogin, and re-enable channels one at a time.

barren basin

nope, still failing

logs openclaw tui

🦞 OpenClaw 2026.4.26 (be8c246) — I'm not magic—I'm just extremely persistent with retries and coping strategies.

 openclaw tui - ws://127.0.0.1:18789 - agent main (Alfred) - session main

 history failed: Error: gateway request timeout for chat.history
 gateway connected | idle
 agent main (Alfred) | session main (openclaw-tui) | openai-codex/gpt-5.5 | tokens ?/200k
severe nexusBOT

Still not normal. Since reset changed the session id but normal openclaw tui still times out on chat.history, we need to inspect the current agent:main:main entry, not the old one.

Run:

node - <<'NODE'
const fs=require('fs');
const p=`${process.env.HOME}/.openclaw/agents/main/sessions/sessions.json`;
const e=JSON.parse(fs.readFileSync(p,'utf8'))['agent:main:main'];
console.log(JSON.stringify(e,null,2));
const id=e?.sessionId;
if (id) {
  const f=`${process.env.HOME}/.openclaw/agents/main/sessions/${id}.jsonl`;
  console.log('transcript:', f, fs.existsSync(f), fs.existsSync(f) ? fs.statSync(f).size : 'missing');
}
NODE

And after launching openclaw tui once, paste:

grep -E 'chat.history|agents.list|⇄ req|⇄ res|timeout|error' /tmp/openclaw/openclaw-2026-04-28.log | tail -80

For a temporary workaround, use:

openclaw tui --session main --history-limit 1

If --history-limit 1 works but normal TUI doesn’t, then the session history projection is still the bottleneck.

barren basin
severe nexusBOT

That grep output shows two different problems:

  1. Old history issue: before reset, chat.history took ~57s. That explains the slow TUI startup.

  2. Current “agent won’t respond” issue: model auth is failing:

    openai-codex/gpt-5.5
    401 Unauthorized: Incorrect API key provided
    

    So the TUI can connect, but when the agent tries to run, the provider rejects the key.

Run:

openclaw models status --probe

If it confirms the auth failure, re-auth the selected provider:

openclaw models auth login --provider openai-codex --set-default
openclaw gateway restart

Then test:

openclaw tui --session tui-smoke-$(date +%s) --history-limit 1

Also please still run the session-entry check from above — I want to confirm main is now pointing at a real new transcript.

barren basin
severe nexusBOT
#

Yep, that session entry is still broken:

  • new sessionId: b523...
  • but sessionFile still points to old missing a564...jsonl
  • new transcript is also missing
  • plus model auth is 401, so the agent won’t answer even after history is fixed
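If you want to re-check an entry like this without eyeballing JSON, the three session symptoms above can be expressed as a small check. This is a sketch, not an OpenClaw tool: diagnoseSessionEntry is a hypothetical name, the field names come from this thread, and the <sessionsDir>/<sessionId>.jsonl layout is assumed from the inspection script earlier.

```javascript
const path = require('path');

// Hypothetical helper: flags the inconsistencies described above for one
// sessions.json entry. It does not read or write any OpenClaw files itself.
function diagnoseSessionEntry(entry, sessionsDir, fileExists) {
  if (!entry) return ['entry missing'];
  const problems = [];
  if (!entry.sessionId) problems.push('no sessionId');

  // Assumed convention, taken from the inspection script: <sessionsDir>/<sessionId>.jsonl
  const expected = entry.sessionId
    ? path.join(sessionsDir, `${entry.sessionId}.jsonl`)
    : null;

  if (entry.sessionFile) {
    // Assumption: sessionFile may be absolute or relative to sessionsDir.
    const pointed = path.isAbsolute(entry.sessionFile)
      ? entry.sessionFile
      : path.join(sessionsDir, entry.sessionFile);
    if (expected && pointed !== expected) problems.push('sessionFile does not match sessionId');
    if (!fileExists(pointed)) problems.push('sessionFile points at a missing transcript');
  }
  // A brand-new session may legitimately have no transcript until its first turn.
  if (expected && !fileExists(expected)) problems.push('no transcript for sessionId yet');
  return problems;
}

// Example use (run locally):
//   const fs = require('fs'), os = require('os');
//   const dir = path.join(os.homedir(), '.openclaw/agents/main/sessions');
//   const store = JSON.parse(fs.readFileSync(path.join(dir, 'sessions.json'), 'utf8'));
//   console.log(diagnoseSessionEntry(store['agent:main:main'], dir, fs.existsSync));
```

An empty array means the entry and its transcript agree; "no transcript for sessionId yet" on its own is expected right after a repair.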

I’d do a safe manual repair of just agent:main:main:

openclaw gateway stop

cp ~/.openclaw/agents/main/sessions/sessions.json \
  ~/.openclaw/agents/main/sessions/sessions.json.bak.$(date +%Y%m%d-%H%M%S)

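node - <<'NODE'
const fs = require('fs');
const crypto = require('crypto');
const p = `${process.env.HOME}/.openclaw/agents/main/sessions/sessions.json`;
const j = JSON.parse(fs.readFileSync(p, 'utf8'));
const e = j['agent:main:main'];
if (!e) {
  // Nothing to repair: stop here instead of crashing on e.sessionId below.
  console.error('agent:main:main not found in', p);
  process.exit(1);
}
e.sessionId = crypto.randomUUID();   // fresh backing session id
delete e.sessionFile;                // drop the stale pointer to the missing transcript
e.systemSent = false;                // resend system/bootstrap context on the next run
e.inputTokens = 0;                   // reset usage counters
e.outputTokens = 0;
e.totalTokens = 0;
e.totalTokensFresh = true;
e.updatedAt = Date.now();            // mark the entry as freshly updated
fs.writeFileSync(p, JSON.stringify(j, null, 2) + '\n');
console.log(j['agent:main:main']);
NODE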

openclaw gateway start
openclaw gateway status

Then fix model auth:

openclaw models status --probe
openclaw models auth login --provider openai-codex --set-default
openclaw gateway restart

Then test:

openclaw tui

Also: this smells like a bug — /reset should not leave sessionFile pointing at an old missing transcript. If you can, file it here with the before/after session entry: https://github.com/openclaw/openclaw/issues

barren basin
#

how does the node script repair the session?

severe nexusBOT
#

It only edits one entry in:

~/.openclaw/agents/main/sessions/sessions.json

Specifically agent:main:main.

What it changes:

  • sessionId = crypto.randomUUID()
    Creates a fresh backing session id.

  • delete e.sessionFile
    Removes the stale pointer to a56410de-...jsonl.
    That was the broken part: the store kept pointing at an old missing transcript.

  • systemSent = false
    Forces the next run to send fresh system/bootstrap context.

  • inputTokens / outputTokens / totalTokens reset to 0, totalTokensFresh = true
    Avoids stale usage display.

  • updatedAt = Date.now()
    Marks the session as freshly updated.

It does not touch:

  • config
  • auth
  • other agents/sessions
  • workspace files
  • sidecar/trajectory files

And the cp ... sessions.json.bak... line makes a rollback backup first. So if something looks wrong, you can restore the backup.
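Here is the same transformation written as a pure function, which makes the before/after easy to compare. It's a sketch: repairSessionEntry is a hypothetical name, and returning a copy instead of rewriting sessions.json is a choice made for illustration.

```javascript
const crypto = require('crypto');

// Pure-function restatement of what the repair script does to agent:main:main.
// Returns a new object; the input entry is left untouched.
function repairSessionEntry(entry, newId = crypto.randomUUID(), now = Date.now()) {
  const { sessionFile, ...rest } = entry; // drop the stale transcript pointer
  return {
    ...rest,
    sessionId: newId,        // fresh backing session id
    systemSent: false,       // resend system/bootstrap context on the next run
    inputTokens: 0,          // reset usage counters
    outputTokens: 0,
    totalTokens: 0,
    totalTokensFresh: true,
    updatedAt: now,          // mark as freshly updated
  };
}
```

Any other fields on the entry pass through unchanged.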

barren basin
#

this is what I got from gateway status:

logs openclaw gateway status

🦞 OpenClaw 2026.4.26 (be8c246) — Less clicking, more shipping, fewer "where did that file go" moments.

│
◇  
Service: systemd (enabled)
File logs: /tmp/openclaw/openclaw-2026-04-28.log
Command: /usr/bin/node /home/openclaw/.npm-global/lib/node_modules/openclaw/dist/index.js gateway --port 18789
Service file: ~/.config/systemd/user/openclaw-gateway.service
Service env: OPENCLAW_GATEWAY_PORT=18789

Config (cli): ~/.openclaw/openclaw.json
Config (service): ~/.openclaw/openclaw.json

Gateway: bind=loopback (127.0.0.1), port=18789 (service args)
Probe target: ws://127.0.0.1:18789
Dashboard: http://127.0.0.1:18789/
Probe note: Loopback-only gateway; only local clients can connect.

Runtime: running (pid 22905, state active, sub running, last exit 0, reason 0)
Warm-up: launch agents can take a few seconds. Try again shortly.
Connectivity probe: failed
Probe target: ws://127.0.0.1:18789
  timeout
Capability: unknown

Port 18789 is already in use.
- pid 22905 openclaw: openclaw-gateway (127.0.0.1:18789)
- Gateway already running locally. Stop it (openclaw gateway stop) or use a different port.
Listening: 127.0.0.1:18789
Troubles: run openclaw status
Troubleshooting: https://docs.openclaw.ai/troubleshooting
severe nexusBOT
#

That means the gateway process is up/listening, but the RPC probe is timing out again. Since PID changed to 22905, this is after restart.

First check whether HTTP is alive:

curl -i --max-time 5 http://127.0.0.1:18789/ | head -20

Then get the newest gateway lines only:

tail -n 120 /tmp/openclaw/openclaw-2026-04-28.log | grep -Ei 'gateway ready|http server listening|chat.history|agents.list|sessions|whatsapp|telegram|401|timeout|error|warn'

Also confirm channels stayed disabled:

openclaw config get channels.whatsapp.enabled
openclaw config get channels.telegram.enabled

If WhatsApp is true, disable/restart again — the old logs showed it was repeatedly 401/restarting and can starve startup:

openclaw config set channels.whatsapp.enabled false --strict-json
openclaw gateway restart
barren basin
#

curl output:

logs curl -i --max-time 5 http://127.0.0.1:18789/ | head -20
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2801  100  2801    0     0  25075      0 --:--:-- --:--:-- --:--:-- 25234
HTTP/1.1 200 OK
X-Content-Type-Options: nosniff
Referrer-Policy: no-referrer
Permissions-Policy: camera=(), microphone=(self), geolocation=()
X-Frame-Options: DENY
Content-Security-Policy: default-src 'self'; base-uri 'none'; object-src 'none'; frame-ancestors 'none'; script-src 'self' 'sha256-RxCZFmTWY/yQmhYxMDn+blaCuwLzOsV/XsVb0n5EkRU='; style-src 'self' 'unsafe-inline' https://fonts.googleapis.com; img-src 'self' data: blob:; font-src 'self' https://fonts.gstatic.com; worker-src 'self'; connect-src 'self' ws: wss:
Content-Type: text/html; charset=utf-8
Cache-Control: no-cache
Date: Wed, 29 Apr 2026 03:50:13 GMT
Connection: keep-alive
Keep-Alive: timeout=5
Content-Length: 2801

<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>OpenClaw Control</title>
    <meta name="color-scheme" content="dark light" />```
severe nexusBOT
#

HTTP is healthy, so this is specifically the WebSocket/RPC side getting blocked or timing out.

Please run these next:

openclaw gateway status --timeout 60000

and:

tail -n 120 /tmp/openclaw/openclaw-2026-04-28.log | grep -Ei 'gateway ready|http server listening|handshake|chat.history|agents.list|sessions|whatsapp|telegram|401|timeout|error|warn'

Also paste:

openclaw config get channels.whatsapp.enabled
openclaw config get channels.telegram.enabled

If the 60s status works, that confirms it’s slow/starved RPC rather than dead gateway.
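The "slow vs dead" distinction boils down to retrying with a longer overall deadline. A minimal Node sketch of that idea, assuming you supply your own probe function (waitFor, withTimeout and the option names are hypothetical; nothing here talks to OpenClaw):

```javascript
// Rejects if `promise` hasn't settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Calls `probe` until it returns a truthy value or `timeoutMs` passes.
// `probe` may be sync or async; thrown errors and per-attempt timeouts
// both count as "not ready yet".
async function waitFor(probe, { timeoutMs = 60000, intervalMs = 1000, attemptMs = 5000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  let attempts = 0;
  let lastError;
  while (Date.now() < deadline) {
    attempts += 1;
    try {
      if (await withTimeout(Promise.resolve().then(probe), attemptMs)) {
        return { ok: true, attempts };
      }
    } catch (err) {
      lastError = err;
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return { ok: false, attempts, lastError };
}
```

The per-attempt timeout matters because a probe that never returns would otherwise consume the whole budget.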

barren basin
severe nexusBOT
#

Yeah — if it eventually comes up, that fits “startup/RPC warmup is slow,” not “gateway is dead.”

The logs still show the big remaining blocker is model auth:

openai-codex/gpt-5.5
401 Unauthorized: Incorrect API key provided

So after the gateway is reachable, the agent won’t answer until that’s fixed.

Do this next:

openclaw models status --probe
openclaw models auth login --provider openai-codex --set-default
openclaw gateway restart

Then test with a fresh tiny session:

openclaw tui --session tui-smoke-$(date +%s) --history-limit 1

If the gateway startup is still annoyingly slow after auth is fixed, raise the probe timeout as a workaround:

openclaw gateway status --timeout 60000

But the important “agent won’t respond” issue is the invalid OpenAI/Codex key.

barren basin
severe nexusBOT
#

I could read the Drive output. It doesn’t look like a total model auth failure: one Codex OAuth profile probes ok; the suspicious bit is:

Codex agent harness failed; not falling back… / codex app-server client is closed

But from your gateway status, the bigger issue is: the gateway process is running and listening, but the CLI WebSocket probe is timing out. The “port already in use” is just the running gateway itself, not necessarily a second process.

Please paste these next:

openclaw --version
openclaw gateway status --deep
openclaw logs --follow

Also, if you didn’t intentionally configure native Codex app-server runtime, check for/remove agentRuntime.id: "codex" or legacy codex/* model refs. For normal Codex OAuth, you want the openai-codex/* route through PI, not the native Codex harness.

Docs: https://docs.openclaw.ai/gateway/troubleshooting and https://docs.openclaw.ai/concepts/agent-runtimes

barren basin
#

the codex harness uses an API key?

severe nexusBOT
#

Not exactly.

In OpenClaw there are two different Codex paths:

  • Normal Codex OAuth / subscription route:
    openai-codex/gpt-5.5 → uses an openai-codex OAuth profile, not a pasted API key.
  • Native Codex app-server harness:
    openai/gpt-5.5 + agentRuntime.id: "codex" → uses Codex/OpenAI account auth, not the Anthropic key.
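To make the difference concrete, here's an illustrative mapping from model ref plus runtime setting to the two paths described above. It's a sketch of the rules as stated in this thread, not OpenClaw's resolution code; describeCodexRoute is hypothetical, and treating an OPENCLAW_AGENT_RUNTIME env var as overriding the config key is an assumption.

```javascript
// Illustrative only: encodes the two Codex paths as described in this thread.
function describeCodexRoute(modelRef, configRuntimeId = null, env = {}) {
  // Assumption: the env var wins over the config key when both are set.
  const runtime = env.OPENCLAW_AGENT_RUNTIME || configRuntimeId;
  const provider = modelRef.split('/')[0];
  if (runtime === 'codex') {
    return provider === 'openai'
      ? 'native Codex app-server harness'
      : `native Codex harness forced on ${provider}/* (mismatch)`;
  }
  if (provider === 'openai-codex') return 'PI runner + openai-codex OAuth profile';
  return 'not one of the two Codex paths above';
}
```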

OpenAI’s Codex can also be used via API-key-style credentials in some CLI flows, but their current ChatGPT-plan flow is sign-in/OAuth based and stores credentials locally; no manual API key paste is required for that flow. (help.openai.com)

Your output shows an openai-codex OAuth profile probes ok, so I would not add an API key yet.

What I’d check is whether your config accidentally has:

agentRuntime: { id: "codex" }

If yes, pick one. Either keep the native harness and pair it with the matching model ref:

model: "openai/gpt-5.5",
agentRuntime: { id: "codex" }

or use the normal PI route:

model: "openai-codex/gpt-5.5"
// no agentRuntime.id: "codex"
barren basin
#

why is that a problem?

severe nexusBOT
#

Because it changes which engine runs the turn, not just which credentials are used.

openai-codex/gpt-5.5 means: “use Codex subscription/OAuth through OpenClaw’s normal PI runner.”

agentRuntime.id: "codex" means: “force the native Codex app-server harness.”

So if both are set, you’re asking for the subscription model route but forcing a different runtime path. That path has different startup/session behavior, and if it fails you’ll see stuff like:

Codex agent harness failed; not falling back to embedded PI backend
codex app-server client is closed

That “not falling back” bit matters: once you force the Codex harness, OpenClaw won’t quietly fall back to PI.

So pick one:

// Normal / safer Codex subscription path
agents: {
  defaults: {
    model: { primary: "openai-codex/gpt-5.5" }
    // no agentRuntime.id: "codex"
  }
}

or:

// Native Codex app-server harness
agents: {
  defaults: {
    model: { primary: "openai/gpt-5.5" },
    agentRuntime: { id: "codex" }
  }
}
barren basin
#

is there a way to use the harness and continue to use OAuth for authentication?

severe nexusBOT
#

Yes — you can use the native Codex harness with OAuth.

The slightly confusing part is that the model ref changes, but the auth can still be the openai-codex OAuth profile.

Use:

{
  agents: {
    defaults: {
      model: { primary: "openai/gpt-5.5" },
      agentRuntime: { id: "codex" }
    }
  }
}

Then make sure you have a Codex OAuth profile:

openclaw models auth login --provider openai-codex
openclaw models status --probe-provider openai-codex

Auth order for the native harness is:

  1. explicit OpenClaw openai-codex auth profile
  2. existing Codex app-server / Codex CLI account
  3. only as fallback for local stdio launches: CODEX_API_KEY / OPENAI_API_KEY

So: no, you don’t need to switch to an API key if your openai-codex OAuth profile is valid.
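The order above reads as a first-match chain. A sketch (pickNativeHarnessAuth and the shape of its input are hypothetical, and the thread doesn't say whether CODEX_API_KEY or OPENAI_API_KEY is tried first, so that order is an assumption):

```javascript
// Sketch of the native-harness auth order listed above (not OpenClaw's code).
function pickNativeHarnessAuth({ oauthProfile, codexAccount, localStdio = false, env = {} } = {}) {
  // 1. explicit OpenClaw openai-codex auth profile
  if (oauthProfile) return { kind: 'openclaw-profile', id: oauthProfile };
  // 2. existing Codex app-server / Codex CLI account
  if (codexAccount) return { kind: 'codex-account' };
  // 3. env keys, only for local stdio launches (CODEX_API_KEY first is an assumption)
  if (localStdio) {
    for (const name of ['CODEX_API_KEY', 'OPENAI_API_KEY']) {
      if (env[name]) return { kind: 'env', name };
    }
  }
  return null; // nothing usable
}
```

The practical point: with a valid profile at step 1, the env keys are never reached on this path.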

After changing this, restart and start a fresh session:

barren basin
#

for now I eliminated the agentRuntime

severe nexusBOT
#

Good call — that puts you on the simpler PI route for now.

Now restart + verify from a clean session:

openclaw gateway restart
openclaw gateway status
openclaw models status --probe-provider openai-codex

If the gateway status still says:

Runtime: running
Connectivity probe: failed
timeout

then the remaining problem is probably gateway startup blocking/hanging, not Codex auth. In that case the useful next thing is the log tail right after restart:

openclaw logs --follow

Also make sure your model is now the PI OAuth route:

openclaw config get agents.defaults.model

You want to see openai-codex/gpt-5.5 somewhere there, and no forced agentRuntime.id: "codex".

barren basin
#

the models --probe command is stuck and won't end

severe nexusBOT
#

Yep, cancel it — don’t wait forever.

Ctrl+C

Then run a bounded, provider-specific probe so we don’t hit every profile:

openclaw --version
openclaw models status --probe-provider openai-codex --probe-timeout 15000 --probe-concurrency 1

If you want to test only the profile that looked good in your screenshot:

openclaw models status --probe-provider openai-codex --probe-profile openai-codex:yo@ignacioparada.com --probe-timeout 15000 --probe-concurrency 1

Your earlier output showed the Anthropic probe sitting ~60s and Codex app-server saying client is closed, so full models status --probe may be getting dragged down by a bad/slow profile. --probe-timeout is in milliseconds.

For the gateway issue, use a bounded log read instead of --follow for now:

tail -n 200 "$(ls -t /tmp/openclaw/openclaw-*.log | head -1)"

Paste the version + bounded probe result + last log tail. That should tell us whether this is still model auth, a stuck provider probe, or the gateway wedging during startup.

barren basin
#

Ctrl+C does nothing

severe nexusBOT
#

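That’s annoying, but it’s just the probe process, so don’t kill the gateway. Its PID changes on every restart (4212 at first, 22905 later), so identify it by the openclaw: openclaw-gateway name rather than an old PID.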

Try in that terminal first:

Ctrl+\

If that also does nothing, open a second terminal/SSH session and find only the stuck probe:

ps -eo pid,ppid,stat,etime,cmd | grep '[o]penclaw models'

Then terminate that PID:

kill <PID>

If it still won’t die after a few seconds:

kill -9 <PID>

Again: don’t kill the openclaw: openclaw-gateway process, whatever its current PID is, unless you’re intentionally restarting the gateway.

After that, avoid full --probe for now. Use non-probe first:

openclaw --version
openclaw models status

Then if needed, bounded single-provider:

openclaw models status --probe-provider openai-codex --probe-timeout 15000 --probe-concurrency 1
barren basin
#

the ps command returned nothing

severe nexusBOT
#

That grep was probably too narrow — the CLI may show up as node .../openclaw/dist/index.js, not openclaw models.

From the second terminal, try:

ps -u "$USER" -o pid,ppid,stat,etime,cmd | grep -E 'openclaw|node|codex' | grep -v grep

Look for something with models status / --probe. Kill that PID only:

kill <PID>

Do not kill the gateway one:

openclaw: openclaw-gateway

If that broader ps still only shows the gateway, then the probe process is probably already gone and your first terminal is just wedged. In that case it’s safe to close that SSH/terminal tab and reconnect.

For future probes, wrap it with a hard timeout:

timeout --foreground 20s openclaw models status --probe-provider openai-codex --probe-timeout 15000 --probe-concurrency 1

Then paste openclaw --version and the latest log tail when you’re back:

tail -n 200 "$(ls -t /tmp/openclaw/openclaw-*.log | head -1)"
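If you script these checks from Node (as with the session scripts earlier), the same hard timeout can be done with child_process.spawnSync, whose timeout option kills the child and reports ETIMEDOUT. A sketch; runWithTimeout is a hypothetical helper, and returning 124 just mirrors GNU timeout's convention:

```javascript
const { spawnSync } = require('child_process');
const os = require('os');

// Runs `cmd args...` with a hard wall-clock limit, roughly like `timeout 20s cmd`.
// Returns { code, stdout, stderr }; code 124 means the limit was hit.
function runWithTimeout(cmd, args, ms) {
  const r = spawnSync(cmd, args, { timeout: ms, killSignal: 'SIGKILL', encoding: 'utf8' });
  if (r.error && r.error.code === 'ETIMEDOUT') {
    return { code: 124, stdout: r.stdout, stderr: r.stderr };
  }
  if (r.error) throw r.error; // e.g. ENOENT when the command isn't on PATH
  // Killed by some other signal: report 128 + signal number, as shells do.
  const code = r.status ?? 128 + (os.constants.signals[r.signal] || 0);
  return { code, stdout: r.stdout, stderr: r.stderr };
}

// Example (same flags as the shell command above):
// runWithTimeout('openclaw', ['models', 'status', '--probe-provider', 'openai-codex',
//   '--probe-timeout', '15000', '--probe-concurrency', '1'], 20000);
```

SIGKILL is used because the stuck probe ignored Ctrl+C; unlike SIGINT or SIGTERM it can't be caught. It only hits the direct child, though, so anything that child spawned may survive.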
barren basin
severe nexusBOT
#

Got it — the log is much clearer now.

This is not primarily a port problem and probably not the Codex harness anymore. The failure is:

401 Unauthorized: Incorrect API key provided: sk-proj-...
provider: openai-codex
model: gpt-5.5
auth error code: invalid_api_key

So OpenClaw is trying to run openai-codex/gpt-5.5, but it’s picking up a bad OpenAI project API key instead of the working Codex OAuth profile.

First, pin the known-good OAuth profile:

openclaw models auth order set --provider openai-codex openai-codex:yo@ignacioparada.com
openclaw gateway restart

Then verify bounded:

timeout --foreground 20s openclaw models status --probe-provider openai-codex --probe-profile openai-codex:yo@ignacioparada.com --probe-timeout 15000 --probe-concurrency 1

Also: don’t paste secrets, but locally search/remove the bad sk-proj...kYA key from config/env/auth:

grep -R "sk-proj" ~/.openclaw ~/.config/systemd/user ~/.profile ~/.bashrc ~/.env 2>/dev/null

The tools.allow ... unknown entries (cron) warning is probably unrelated/noisy. The blocker in the log is the invalid API key being selected.

barren basin
#

should I have gotten anything back from this?

timeout --foreground 20s openclaw models status --probe-provider openai-codex --probe-profile openai-codex:yo@ignacioparada.com --probe-timeout 15000 --probe-concurrency 1
severe nexusBOT
#

Yes — you should get output. Either the normal models status table, or after ~20s the shell should return with timeout exit code 124.

If it printed nothing and just returned, check the exit code:

echo $?

Interpretation:

  • 0 = command completed successfully but somehow produced no visible output (weird)
  • 124 = GNU timeout killed it
  • anything else = OpenClaw errored before printing

Try capturing both stdout/stderr:

timeout --foreground 20s openclaw models status \
  --probe-provider openai-codex \
  --probe-profile 'openai-codex:yo@ignacioparada.com' \
  --probe-timeout 15000 \
  --probe-concurrency 1 2>&1 | tee /tmp/openclaw-model-probe.txt

echo "exit=$?"
cat /tmp/openclaw-model-probe.txt

Small correction: with a pipe to tee, $? shows tee’s exit code. If you want the real timeout/openclaw exit code:

set -o pipefail
timeout --foreground 20s openclaw models status --probe-provider openai-codex --probe-profile 'openai-codex:yo@ignacioparada.com' --probe-timeout 15000 --probe-concurrency 1 2>&1 | tee /tmp/openclaw-model-probe.txt
echo "exit=$?"

Paste that result.

barren basin
#

124

severe nexusBOT
#

124 means the outer timeout killed it. So yes: the probe is still hanging past 20s.

Let’s stop probing for a moment and isolate local config/auth resolution:

timeout --foreground 10s openclaw models status
echo "exit=$?"

Then:

openclaw models auth order get --provider openai-codex

If that works, force only the OAuth profile:

openclaw models auth order set --provider openai-codex 'openai-codex:yo@ignacioparada.com'
openclaw gateway restart

Then check without probe:

openclaw models status

The log already showed the real blocker: OpenClaw was using a bad sk-proj... API key for openai-codex. So the goal is to make sure that bad key is no longer in the selected auth path before we worry about live probes again.

barren basin
#
timeout --foreground 10s openclaw models status
echo "exit=$?"

🦞 OpenClaw 2026.4.26 (be8c246) — I'm the assistant your terminal demanded, not the one your sleep schedule requested.

exit=124
openclaw models auth order get --provider openai-codex

🦞 OpenClaw 2026.4.26 (be8c246) — Half butler, half debugger, full crustacean.

Agent: main
Provider: openai-codex
Auth state file: ~/.openclaw/agents/main/agent/auth-state.json
Order override: openai-codex:yo@ignacioparada.com
severe nexusBOT
#

Yep, that tells us something useful:

  • Auth order is now correctly pinned to openai-codex:yo@ignacioparada.com.
  • But plain openclaw models status hanging before output looks like a separate CLI/status-path hang. Don’t keep using it for now.

Next I’d remove the bad API-key path the log showed:

grep -RIl "sk-proj" ~/.openclaw ~/.config/systemd/user ~/.profile ~/.bashrc ~/.zshrc ~/.env 2>/dev/null

Don’t paste secrets — just see which file contains it.

If it’s in the OpenClaw config under openai-codex, remove that key:

openclaw config unset models.providers.openai-codex.apiKey

For Codex OAuth, openai-codex should use the OAuth profile, not a sk-proj API key.

Then restart cleanly:

openclaw gateway restart
timeout --foreground 15s openclaw gateway status

If it still times out, paste the new log tail after restart:

tail -n 120 "$(ls -t /tmp/openclaw/openclaw-*.log | head -1)"
barren basin
#

there's a bunch of files returned by

grep -RIl "sk-proj" ~/.openclaw ~/.config/systemd/user ~/.profile ~/.bashrc ~/.zshrc ~/.env 2>/dev/null
severe nexusBOT
#

That’s okay — don’t paste the contents. We only need file names and which config keys contain it.

Run these safer checks:

openclaw config get models.providers.openai-codex.apiKey
openclaw config get models.providers.openai.apiKey
openclaw config get env.OPENAI_API_KEY

If the first one returns anything, remove it:

openclaw config unset models.providers.openai-codex.apiKey

For the grep results, show just file paths if you’re comfortable:

grep -RIl "sk-proj" ~/.openclaw ~/.config/systemd/user ~/.profile ~/.bashrc ~/.zshrc ~/.env 2>/dev/null

Generally:

  • ~/.openclaw/openclaw.json → remove the specific apiKey field with openclaw config unset ...
  • ~/.openclaw/agents/main/agent/auth-profiles.json → may contain an API-key auth profile; don’t edit yet, first identify profile IDs/types
  • shell files / systemd env files → remove or replace the bad OPENAI_API_KEY if you no longer want API billing
  • backups/logs → not the active cause, but rotate that key if it was real

Also, since that key appeared in logs, I’d treat it as exposed and revoke/rotate it in OpenAI.

barren basin
#

it's already revoked, I did it a while ago

severe nexusBOT
#

That explains the 401 invalid_api_key perfectly: OpenClaw is still finding that old revoked key somewhere and trying it.

We need to remove it from the active auth path, not every backup/log.

Run:

grep -RIn "sk-proj" ~/.openclaw/openclaw.json ~/.openclaw/agents/main/agent/auth-profiles.json ~/.config/systemd/user/openclaw-gateway.service ~/.profile ~/.bashrc ~/.zshrc ~/.env 2>/dev/null

Don’t paste full lines if they include the key — just note which file/key.
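If you do want to share grep output, a small filter can mask keys first. A sketch, assuming keys look like sk-proj- followed by letters, digits, _ or - (redactKeys is hypothetical). It keeps the last three characters so you can still tell whether a hit is the revoked ...kYA key:

```javascript
// Masks OpenAI project keys in text before you share it.
// Assumption: keys match sk-proj- followed by [A-Za-z0-9_-]; the last three
// characters are kept so a hit can be matched against the masked key in the log.
function redactKeys(text) {
  return text.replace(/sk-proj-[A-Za-z0-9_-]+/g, (key) => `sk-proj-<redacted>${key.slice(-3)}`);
}

// Example filter: grep -RIn "sk-proj" <files> | node redact.js
// where redact.js ends with:
//   process.stdout.write(redactKeys(require('fs').readFileSync(0, 'utf8')));
```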

Most likely fixes:

openclaw config unset models.providers.openai-codex.apiKey
openclaw config unset models.providers.openai.apiKey
openclaw config unset env.OPENAI_API_KEY

If it’s in auth-profiles.json as an api_key profile for openai-codex, remove that profile or move the file aside after backing it up. Since your OAuth order is already pinned, the active profile should be:

openai-codex:yo@ignacioparada.com

After cleanup:

openclaw gateway restart

Then test by sending a fresh message or running:

tail -n 80 "$(ls -t /tmp/openclaw/openclaw-*.log | head -1)"

If the revoked key is gone from active config, the invalid_api_key log should stop.

barren basin
#
➜  logs openclaw config get models.providers.openai-codex.apiKey

🦞 OpenClaw 2026.4.26 (be8c246) — I'm the assistant your terminal demanded, not the one your sleep schedule requested.

Config path not found: models.providers.openai-codex.apiKey
➜  logs openclaw config get models.providers.openai.apiKey

🦞 OpenClaw 2026.4.26 (be8c246) — Gateway online—please keep hands, feet, and appendages inside the shell at all times.

Config path not found: models.providers.openai.apiKey
➜  logs openclaw config get env.OPENAI_API_KEY

🦞 OpenClaw 2026.4.26 (be8c246) — I'll do the boring stuff while you dramatically stare at the logs like it's cinema.

Config path not found: env.OPENAI_API_KEY
severe nexusBOT
#

Those config get results rule out the three most likely OpenClaw config keys. The revoked key is probably coming from either:

  1. the process environment (OPENAI_API_KEY / CODEX_API_KEY), or
  2. an auth profile / legacy auth file.

Check env without printing the secret:

env | sed -nE 's/^(OPENAI_API_KEY|CODEX_API_KEY)=.*/\1=<set>/p'

systemctl --user show openclaw-gateway.service -p Environment \
  | sed -E 's/(OPENAI_API_KEY|CODEX_API_KEY)=[^ ]+/\1=<set>/g'

If either shows OPENAI_API_KEY=<set> and it’s the revoked key, remove it from your shell startup/systemd env, then:

unset OPENAI_API_KEY CODEX_API_KEY
systemctl --user unset-environment OPENAI_API_KEY CODEX_API_KEY
openclaw gateway restart

Also refresh the OAuth profile, since OpenClaw may be falling back to env after OAuth fails:

openclaw models auth login --provider openai-codex --set-default
# or, on a headless box:
openclaw models auth login --provider openai-codex --device-code --set-default

Then keep the order pinned:

openclaw models auth order set --provider openai-codex 'openai-codex:yo@ignacioparada.com'

If env is clean, next place to inspect is ~/.openclaw/agents/main/agent/auth-profiles.json for an api_key profile containing the revoked key. Don’t paste the key; just tell us which profile id/type it appears under.

barren basin
#

it says OPENAI_API_KEY=<set>, but I have no idea if it's the revoked key. I suspect it is, though, since I haven't issued any other.

severe nexusBOT
#

Yep, that’s very likely it. Since you’re using openai-codex OAuth, you don’t need OPENAI_API_KEY for this path.

Temporarily clear it and restart:

unset OPENAI_API_KEY
systemctl --user unset-environment OPENAI_API_KEY
openclaw gateway restart

Then check whether the service still has it:

systemctl --user show openclaw-gateway.service -p Environment \
  | sed -E 's/(OPENAI_API_KEY|CODEX_API_KEY)=[^ ]+/\1=<set>/g'

If it still shows OPENAI_API_KEY=<set>, it’s being loaded from a file or service env. Find where without printing the key:

grep -RIl "OPENAI_API_KEY" ~/.config/systemd/user ~/.profile ~/.bashrc ~/.zshrc ~/.env ~/.openclaw 2>/dev/null

Remove/comment that assignment wherever it appears, then:

systemctl --user daemon-reload
openclaw gateway restart

Then try a fresh message. The key thing we want is: new logs should not mention Incorrect API key provided: sk-proj....
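To answer "is it the revoked key?" without printing it, you can compare only its tail against the masked key in the 401 log line (earlier it ended in kYA). A sketch, assuming the log shows the key's last characters; describeSecret and looksLikeLoggedKey are hypothetical names, and a tail match is strong evidence, not proof:

```javascript
// Describes a secret without revealing it: whether it is set, its length,
// and its last few characters for matching against a masked log line.
function describeSecret(value, tailLen = 3) {
  if (!value) return { set: false };
  return { set: true, length: value.length, tail: value.slice(-tailLen) };
}

// True when the value is set and ends with the tail seen in the log.
function looksLikeLoggedKey(value, loggedTail) {
  const d = describeSecret(value, loggedTail.length);
  return d.set && d.tail === loggedTail;
}

// Example (checks this shell's env only; the gateway service env can differ):
//   console.log(looksLikeLoggedKey(process.env.OPENAI_API_KEY, 'kYA'));
```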

barren basin
#
➜  logs unset OPENAI_API_KEY
systemctl --user unset-environment OPENAI_API_KEY
openclaw gateway restart

🦞 OpenClaw 2026.4.26 (be8c246) — I've read more man pages than any human should—so you don't have to.

Restarted systemd service: openclaw-gateway.service
➜  logs systemctl --user show openclaw-gateway.service -p Environment \
  | sed -E 's/(OPENAI_API_KEY|CODEX_API_KEY)=[^ ]+/\1=<set>/g'
Environment=HOME=/home/openclaw TMPDIR=/tmp PATH=/usr/bin:/home/openclaw/.local/bin:/home/openclaw/.npm-global/bin:/home/openclaw/bin:/home/openclaw/.volta/bin:/home/openclaw/.asdf/shims:/home/openclaw/.bun/bin:/home/openclaw/.nix-profile/bin:/home/openclaw/.nvm/current/bin:/home/openclaw/.local/share/fnm/aliases/default/bin:/home/openclaw/.local/share/fnm/current/bin:/home/openclaw/.fnm/aliases/default/bin:/home/openclaw/.fnm/current/bin:/home/openclaw/.local/share/pnpm:/usr/local/bin:/bin OPENCLAW_GATEWAY_PORT=18789 OPENCLAW_SYSTEMD_UNIT=openclaw-gateway.service "OPENCLAW_WINDOWS_TASK_NAME=OpenClaw Gateway" OPENCLAW_SERVICE_MARKER=openclaw OPENCLAW_SERVICE_KIND=gateway OPENCLAW_SERVICE_VERSION=2026.4.26 OPENCLAW_AGENT_RUNTIME=codex OPENCLAW_AGENT_HARNESS_FALLBACK=none OPENAI_API_KEY=<set>
severe nexusBOT
#

Aha — two smoking guns in the service environment:

OPENAI_API_KEY=<set>
OPENCLAW_AGENT_RUNTIME=codex
OPENCLAW_AGENT_HARNESS_FALLBACK=none

So even if you removed agentRuntime from config, the systemd service env is still forcing Codex runtime and still injecting the revoked API key.

Find where those env vars are defined:

grep -RInE 'OPENAI_API_KEY|OPENCLAW_AGENT_RUNTIME|OPENCLAW_AGENT_HARNESS_FALLBACK' \
  ~/.config/systemd/user ~/.config/environment.d ~/.pam_environment ~/.profile ~/.bashrc ~/.zshrc ~/.env ~/.openclaw 2>/dev/null

Don’t paste the key value — just the filenames / variable names.

Then remove/comment these from the service env source:

OPENAI_API_KEY
OPENCLAW_AGENT_RUNTIME
OPENCLAW_AGENT_HARNESS_FALLBACK

After editing:

systemctl --user daemon-reload
systemctl --user unset-environment OPENAI_API_KEY OPENCLAW_AGENT_RUNTIME OPENCLAW_AGENT_HARNESS_FALLBACK
openclaw gateway restart

Verify they’re gone:

systemctl --user show openclaw-gateway.service -p Environment \
  | grep -E 'OPENAI_API_KEY|OPENCLAW_AGENT_RUNTIME|OPENCLAW_AGENT_HARNESS_FALLBACK' || echo "clean"

Once those are clean, the gateway should follow your config again.
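If you'd rather check the Environment= line mechanically than scan that long output by eye, here's a sketch that parses it and reports only which suspect names are present. It assumes the quoting seen above (space-separated entries, double quotes around entries that contain spaces) and doesn't handle backslash escapes; parseEnvironmentLine and suspectVars are hypothetical names:

```javascript
// Splits `systemctl --user show <unit> -p Environment` output into NAME -> value.
function parseEnvironmentLine(line) {
  const body = line.replace(/^Environment=/, '');
  const entries = body.match(/"[^"]*"|\S+/g) || [];
  const env = {};
  for (const raw of entries) {
    const entry = raw.startsWith('"') ? raw.slice(1, -1) : raw; // strip surrounding quotes
    const eq = entry.indexOf('=');
    if (eq > 0) env[entry.slice(0, eq)] = entry.slice(eq + 1);
  }
  return env;
}

const SUSPECT = ['OPENAI_API_KEY', 'OPENCLAW_AGENT_RUNTIME', 'OPENCLAW_AGENT_HARNESS_FALLBACK'];

// Returns which suspect names are present, never their values.
function suspectVars(line) {
  const env = parseEnvironmentLine(line);
  return SUSPECT.filter((name) => Object.prototype.hasOwnProperty.call(env, name));
}
```

An empty result corresponds to the "clean" case of the grep check above.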

barren basin
#

do you want me to share the openclaw.json file with you?

severe nexusBOT
#

Not yet — the current evidence points more to systemd environment, not openclaw.json.

If you do share config later, redact tokens/passwords first. For now run this and paste only the file paths + variable names, not values:

grep -RInE 'OPENAI_API_KEY|OPENCLAW_AGENT_RUNTIME|OPENCLAW_AGENT_HARNESS_FALLBACK' \
  ~/.config/systemd/user ~/.config/environment.d ~/.pam_environment ~/.profile ~/.bashrc ~/.zshrc ~/.env ~/.openclaw 2>/dev/null \
  | sed -E 's/(OPENAI_API_KEY|OPENCLAW_AGENT_RUNTIME|OPENCLAW_AGENT_HARNESS_FALLBACK)=.*/\1=<redacted>/'

The important thing is to remove those three from wherever systemd is loading them:

OPENAI_API_KEY
OPENCLAW_AGENT_RUNTIME
OPENCLAW_AGENT_HARNESS_FALLBACK

Then restart + verify:

systemctl --user daemon-reload
systemctl --user unset-environment OPENAI_API_KEY OPENCLAW_AGENT_RUNTIME OPENCLAW_AGENT_HARNESS_FALLBACK
openclaw gateway restart

systemctl --user show openclaw-gateway.service -p Environment \
  | grep -E 'OPENAI_API_KEY|OPENCLAW_AGENT_RUNTIME|OPENCLAW_AGENT_HARNESS_FALLBACK' || echo "clean"
barren basin
#

this outputs a ridiculous amount of text:

grep -RInE 'OPENAI_API_KEY|OPENCLAW_AGENT_RUNTIME|OPENCLAW_AGENT_HARNESS_FALLBACK' \
  ~/.config/systemd/user ~/.config/environment.d ~/.pam_environment ~/.profile ~/.bashrc ~/.zshrc ~/.env ~/.openclaw 2>/dev/null \
  | sed -E 's/(OPENAI_API_KEY|OPENCLAW_AGENT_RUNTIME|OPENCLAW_AGENT_HARNESS_FALLBACK)=.*/\1=<redacted>/'
severe nexusBOT
#

Yeah, ~/.openclaw includes logs/backups/state, so that can explode. Narrow it to active systemd-ish sources:

grep -RInE 'OPENAI_API_KEY|OPENCLAW_AGENT_RUNTIME|OPENCLAW_AGENT_HARNESS_FALLBACK' \
  ~/.config/systemd/user ~/.config/environment.d ~/.pam_environment ~/.profile ~/.bashrc ~/.zshrc ~/.env 2>/dev/null \
  | sed -E 's/(OPENAI_API_KEY|OPENCLAW_AGENT_RUNTIME|OPENCLAW_AGENT_HARNESS_FALLBACK)=.*/\1=<redacted>/'

And specifically inspect the service file safely:

systemctl --user cat openclaw-gateway.service \
  | sed -E "s/(OPENAI_API_KEY|OPENCLAW_AGENT_RUNTIME|OPENCLAW_AGENT_HARNESS_FALLBACK)=[^ \"']+/\1=<redacted>/g"

If you see those vars in an Environment= line or an override/drop-in, remove them there.
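Drop-ins live in the unit's .d directory as *.conf files, so you can also list exactly which file sets which variable. A sketch; findEnvDropIns is a hypothetical helper that assumes one assignment per Environment= line and never returns values:

```javascript
const fs = require('fs');
const path = require('path');

// Lists which *.conf drop-ins in a unit's ".d" directory set the given
// variables via Environment= lines. Returns file, line number and name only.
function findEnvDropIns(dropInDir, names) {
  if (!fs.existsSync(dropInDir)) return [];
  const hits = [];
  const confs = fs.readdirSync(dropInDir).filter((f) => f.endsWith('.conf')).sort();
  for (const file of confs) {
    const lines = fs.readFileSync(path.join(dropInDir, file), 'utf8').split('\n');
    lines.forEach((text, i) => {
      // Assumption: one assignment per line, optionally wrapped in double quotes.
      const m = text.match(/^\s*Environment="?([A-Za-z_][A-Za-z0-9_]*)=/);
      if (m && names.includes(m[1])) hits.push({ file, line: i + 1, name: m[1] });
    });
  }
  return hits;
}

// Example:
//   const os = require('os');
//   findEnvDropIns(path.join(os.homedir(), '.config/systemd/user/openclaw-gateway.service.d'),
//     ['OPENAI_API_KEY', 'OPENCLAW_AGENT_RUNTIME', 'OPENCLAW_AGENT_HARNESS_FALLBACK']);
```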


Also check imported user env:

systemctl --user show-environment \
  | grep -E 'OPENAI_API_KEY|OPENCLAW_AGENT_RUNTIME|OPENCLAW_AGENT_HARNESS_FALLBACK' \
  | sed -E 's/=.*/=<redacted>/'

If they show there, clear them:

systemctl --user unset-environment OPENAI_API_KEY OPENCLAW_AGENT_RUNTIME OPENCLAW_AGENT_HARNESS_FALLBACK
openclaw gateway restart
barren basin
#
➜  logs grep -RInE 'OPENAI_API_KEY|OPENCLAW_AGENT_RUNTIME|OPENCLAW_AGENT_HARNESS_FALLBACK' \
  ~/.config/systemd/user ~/.config/environment.d ~/.pam_environment ~/.profile ~/.bashrc ~/.zshrc ~/.env 2>/dev/null \
  | sed -E 's/(OPENAI_API_KEY|OPENCLAW_AGENT_RUNTIME|OPENCLAW_AGENT_HARNESS_FALLBACK)=.*/\1=<redacted>/'
/home/openclaw/.config/systemd/user/openclaw-gateway.service.d/codex-harness.conf:2:Environment=OPENCLAW_AGENT_RUNTIME=<redacted>
/home/openclaw/.config/systemd/user/openclaw-gateway.service.d/codex-harness.conf:3:Environment=OPENCLAW_AGENT_HARNESS_FALLBACK=<redacted>
/home/openclaw/.config/systemd/user/openclaw-gateway.service.d/openai.conf:2:Environment=OPENAI_API_KEY=<redacted>
severe nexusBOT
#

Perfect, that’s the source.

Since you want the normal openai-codex OAuth/PI route for now, remove those drop-ins:

mkdir -p ~/.config/systemd/user/openclaw-gateway.service.d.disabled
mv ~/.config/systemd/user/openclaw-gateway.service.d/codex-harness.conf \
   ~/.config/systemd/user/openclaw-gateway.service.d.disabled/

mv ~/.config/systemd/user/openclaw-gateway.service.d/openai.conf \
   ~/.config/systemd/user/openclaw-gateway.service.d.disabled/

Then reload/restart:

#
systemctl --user daemon-reload
systemctl --user unset-environment OPENAI_API_KEY OPENCLAW_AGENT_RUNTIME OPENCLAW_AGENT_HARNESS_FALLBACK
openclaw gateway restart

Verify clean:

systemctl --user show openclaw-gateway.service -p Environment \
  | grep -E 'OPENAI_API_KEY|OPENCLAW_AGENT_RUNTIME|OPENCLAW_AGENT_HARNESS_FALLBACK' || echo "clean"

If that prints clean, test:

timeout --foreground 15s openclaw gateway status

This should stop both issues: forced Codex harness and revoked API-key auth.

barren basin
#
logs timeout --foreground 15s openclaw gateway status

🦞 OpenClaw 2026.4.26 (be8c246) — Your task has been queued; your dignity has been deprecated.

│
■  Canceled
Service: systemd (enabled)
File logs: /tmp/openclaw/openclaw-2026-04-29.log
Command: /usr/bin/node /home/openclaw/.npm-global/lib/node_modules/openclaw/dist/index.js gateway --port 18789
Service file: ~/.config/systemd/user/openclaw-gateway.service
Service env: OPENCLAW_GATEWAY_PORT=18789

Config (cli): ~/.openclaw/openclaw.json
Config (service): ~/.openclaw/openclaw.json

Gateway: bind=loopback (127.0.0.1), port=18789 (service args)
Probe target: ws://127.0.0.1:18789
Dashboard: http://127.0.0.1:18789/
Probe note: Loopback-only gateway; only local clients can connect.

Runtime: running (pid 47533, state active, sub running, last exit 0, reason 0)
Warm-up: launch agents can take a few seconds. Try again shortly.
Connectivity probe: failed
Probe target: ws://127.0.0.1:18789
  timeout
Capability: unknown

Port 18789 is already in use.
- pid 47533 openclaw: openclaw-gateway (127.0.0.1:18789)
- Gateway already running locally. Stop it (openclaw gateway stop) or use a different port.
Listening: 127.0.0.1:18789
Troubles: run openclaw status
Troubleshooting: https://docs.openclaw.ai/troubleshooting
severe nexusBOT
#

Okay — good news/bad news:

  • Good: we fixed the obvious bad env overrides.
  • Bad: the gateway is still listening but not answering the WS probe, so now we need the new log from /tmp/openclaw/openclaw-2026-04-29.log.

First verify the bad env is actually gone:

systemctl --user show openclaw-gateway.service -p Environment \
  | grep -E 'OPENAI_API_KEY|OPENCLAW_AGENT_RUNTIME|OPENCLAW_AGENT_HARNESS_FALLBACK' || echo "clean"

Then paste the recent log tail:

tail -n 200 /tmp/openclaw/openclaw-2026-04-29.log

Also useful:

journalctl --user -u openclaw-gateway.service -n 80 --no-pager

The ■ Canceled is just your outer timeout killing the status command after 15s. The important part is still:

Runtime: running
Listening: 127.0.0.1:18789
Connectivity probe: failed timeout

So it’s not “won’t bind the port” anymore — it’s “process is alive but gateway isn’t responding.” The fresh log should tell us what it’s stuck on.
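
If the gateway is simply slow to warm up, polling is easier than re-running status by hand. This is a minimal sketch. `wait_until` and `gateway_probe_ok` are hypothetical helpers. The check assumes the `Connectivity probe:` line format shown above stays the same.

```sh
# Hypothetical helper: retry a check command every INTERVAL seconds until it
# succeeds or LIMIT seconds have passed, then report the elapsed time.
wait_until() {
  limit=$1; interval=$2; shift 2
  start=$(date +%s)
  while :; do
    if "$@" >/dev/null 2>&1; then
      echo "ready after $(( $(date +%s) - start ))s"
      return 0
    fi
    if [ $(( $(date +%s) - start )) -ge "$limit" ]; then
      echo "not ready after ${limit}s" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Hypothetical check: succeeds only when `openclaw gateway status` printed a
# "Connectivity probe:" line and that line is not "Connectivity probe: failed".
gateway_probe_ok() {
  out=$(timeout 15s openclaw gateway status 2>&1 || true)
  printf '%s\n' "$out" | grep -q 'Connectivity probe:' &&
    ! printf '%s\n' "$out" | grep -q 'Connectivity probe: failed'
}

# Example: give it up to 3 minutes, checking every 5 seconds
#   wait_until 180 5 gateway_probe_ok
```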

barren basin
#

wait, it's up, it just took it a while to get online

severe nexusBOT
#

Nice — glad it came back up 🎉

Root cause was the systemd drop-ins forcing:

OPENCLAW_AGENT_RUNTIME=codex
OPENCLAW_AGENT_HARNESS_FALLBACK=none
OPENAI_API_KEY=<revoked key>

So even after config changes, the service was still forcing the Codex harness and revoked API-key path.

If this fixed it, please right-click the solution message → Apps → ✅ Mark Solution so Answer Overflow indexes it for the next person. 🦐

barren basin
#

let's see if agents answer

severe nexusBOT
#

Yep — send it a normal test message from the configured channel/DM.

If it doesn’t answer, check the fresh log for one turn:

tail -n 120 /tmp/openclaw/openclaw-2026-04-29.log

But if the gateway is online now, odds are decent it’ll respond after the first warm-up.

barren basin
#
logs openclaw tui

🦞 OpenClaw 2026.4.26 (be8c246) — Welcome to the command line: where dreams compile and confidence segfaults.

 openclaw tui - ws://127.0.0.1:18789 - agent main (Alfred) - session main

 history failed: Error: gateway request timeout for chat.history
 gateway connected | idle
 agent main (Alfred) | session main (openclaw-tui) | openai-codex/gpt-5.5 | tokens ?/200k
severe nexusBOT
#

That’s actually progress: the TUI connected, and it’s on the expected model:

openai-codex/gpt-5.5
gateway connected | idle

The remaining issue is narrower: chat.history timed out. That’s usually session/history load being slow or wedged, not model auth.

Try a clean session in TUI:

openclaw tui --session test

Or in the TUI, send:

/new

Then send a tiny test like:

reply exactly OK

If the fresh session works, the old main session history is probably the thing hanging. If it still times out, grab:

tail -n 160 /tmp/openclaw/openclaw-2026-04-29.log
barren basin
#

haven't tried the new session yet, the agent is responding, but is veeeeery slow

severe nexusBOT
#

That tracks with the earlier symptoms: it’s alive, but something in the main session/history/model path is dragging.

Try a fresh session next — it’s the fastest split-test:

openclaw tui --session speed-test

Send:

reply exactly OK

If the fresh session is fast, the old main session/history is the slowdown. Use /new or stay in a fresh session.

If it’s still slow, check the active model/auth path:

openclaw models auth order get --provider openai-codex
systemctl --user show openclaw-gateway.service -p Environment \
  | grep -E 'OPENAI_API_KEY|OPENCLAW_AGENT_RUNTIME|OPENCLAW_AGENT_HARNESS_FALLBACK' || echo "env clean"

And if you want a quick latency improvement, consider a smaller/faster model temporarily — but don’t change that until we know whether it’s session history vs provider latency.

barren basin
#
➜  logs openclaw tui --session speed-test

🦞 OpenClaw 2026.4.26 (be8c246) — We ship features faster than Apple ships calculator updates.

 openclaw tui - ws://127.0.0.1:18789 - agent main - session speed-test
 connecting | idle
 gateway disconnected: closed | idle
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
gateway connect failed: Error: gateway closed (1000): 
severe nexusBOT
#

That gateway closed (1000) means the gateway accepted the WS connection and then closed it cleanly. Since main works but is slow and speed-test closes, I’d stop changing config and inspect the gateway log at the exact TUI attempt.

Run:

systemctl --user show openclaw-gateway.service -p Environment \
  | grep -E 'OPENAI_API_KEY|OPENCLAW_AGENT_RUNTIME|OPENCLAW_AGENT_HARNESS_FALLBACK' || echo "env clean"

tail -n 200 /tmp/openclaw/openclaw-2026-04-29.log
journalctl --user -u openclaw-gateway.service -n 100 --no-pager

Also try forcing the fully scoped session key:

openclaw tui --session agent:main:speed-test

If that also closes, the log should show why — likely session creation/history load, not auth anymore.

barren basin
#

it eventually connected, but it took it like a minute. That feels slow

severe nexusBOT
#

Yeah, ~1 minute to connect is definitely slow. At this point it’s likely startup/metadata work, not the original broken env.

Two likely culprits:

  1. Codex/OAuth usage/model status lookup is hanging/slow — we saw models status time out earlier.
  2. Session/history load is slow — TUI had chat.history timeouts.

Quick checks:

time openclaw gateway status
time openclaw models auth order get --provider openai-codex
time openclaw models status --plain

Skip the full models status (without --plain) and --probe for now.


Also confirm the bad env is gone:

systemctl --user show openclaw-gateway.service -p Environment \
  | grep -E 'OPENAI_API_KEY|OPENCLAW_AGENT_RUNTIME|OPENCLAW_AGENT_HARNESS_FALLBACK' || echo "env clean"

For a practical speed test, keep using the new TUI session and send:

reply exactly OK

If the response itself is fast after connection, it’s mostly a Gateway/TUI connect/history issue. If responses are slow too, it’s model/provider latency.

barren basin
#

time openclaw gateway status: 4.38s
time openclaw models auth order get --provider openai-codex: 5.32s
time openclaw models status --plain: 21.21s

severe nexusBOT
#

Those timings point to config/env resolution being slow, not pure model latency.

models status --plain should be fairly cheap; 21s strongly suggests env.shellEnv or shell startup/import timing out. OpenClaw can run your login shell to import missing env vars, and the default timeout is 15s, which lines up suspiciously well.

Check:

openclaw config get env.shellEnv
env | grep -E 'OPENCLAW_LOAD_SHELL_ENV|OPENCLAW_SHELL_ENV_TIMEOUT_MS'
time bash -lc 'true'

If env.shellEnv.enabled is true and you don’t need shell-imported secrets anymore, disable it:

openclaw config set env.shellEnv.enabled false --strict-json
openclaw gateway restart

Or lower the timeout:

openclaw config set env.shellEnv.timeoutMs 3000 --strict-json
openclaw gateway restart

Then retest:

time openclaw models status --plain
time openclaw gateway status

If models status --plain drops from ~21s to a few seconds, we found the slow path.

steady plinth
#

I had the same issues with the gateway, but I don't want to read everything.
@severe nexus can you do a short TL;DR on how to solve it?

severe nexusBOT
#

TL;DR fix we found here:

  1. Check for bad systemd env overrides:
grep -RInE 'OPENAI_API_KEY|OPENCLAW_AGENT_RUNTIME|OPENCLAW_AGENT_HARNESS_FALLBACK' \
  ~/.config/systemd/user ~/.config/environment.d ~/.pam_environment ~/.profile ~/.bashrc ~/.zshrc ~/.env 2>/dev/null
  2. If you see drop-ins like:
~/.config/systemd/user/openclaw-gateway.service.d/codex-harness.conf
~/.config/systemd/user/openclaw-gateway.service.d/openai.conf

move them away:

mkdir -p ~/.config/systemd/user/openclaw-gateway.service.d.disabled
mv ~/.config/systemd/user/openclaw-gateway.service.d/codex-harness.conf ~/.config/systemd/user/openclaw-gateway.service.d.disabled/ 2>/dev/null
mv ~/.config/systemd/user/openclaw-gateway.service.d/openai.conf ~/.config/systemd/user/openclaw-gateway.service.d.disabled/ 2>/dev/null
  3. Reload/restart:
systemctl --user daemon-reload
systemctl --user unset-environment OPENAI_API_KEY OPENCLAW_AGENT_RUNTIME OPENCLAW_AGENT_HARNESS_FALLBACK
openclaw gateway restart
  4. Verify:
systemctl --user show openclaw-gateway.service -p Environment \
  | grep -E 'OPENAI_API_KEY|OPENCLAW_AGENT_RUNTIME|OPENCLAW_AGENT_HARNESS_FALLBACK' || echo "clean"

openclaw gateway status

In this case the service was forcing the Codex harness plus a revoked OPENAI_API_KEY, even though the config had been changed.

barren basin
#

would it be a problem for this command if I'm using zsh and not bash?

openclaw config get env.shellEnv
env | grep -E 'OPENCLAW_LOAD_SHELL_ENV|OPENCLAW_SHELL_ENV_TIMEOUT_MS'
time bash -lc 'true'
severe nexusBOT
#

Good catch. If you use zsh, test zsh instead of (or in addition to) bash.

Run:

openclaw config get env.shellEnv
env | grep -E 'OPENCLAW_LOAD_SHELL_ENV|OPENCLAW_SHELL_ENV_TIMEOUT_MS'
time zsh -lc 'true'

If you have bash installed, time bash -lc 'true' is still useful as a comparison, but zsh is the important one if that’s your login shell.

Also check what your login shell actually is:

echo "$SHELL"
getent passwd "$USER" | cut -d: -f7

If zsh -lc 'true' takes many seconds, your shell startup files are slowing OpenClaw’s shell env import.

barren basin
#
➜  ~ openclaw config get env.shellEnv
env | grep -E 'OPENCLAW_LOAD_SHELL_ENV|OPENCLAW_SHELL_ENV_TIMEOUT_MS'
time zsh -lc 'true'

🦞 OpenClaw 2026.4.26 (be8c246) — Your second brain, except this one actually remembers where you left things.

Config path not found: env.shellEnv
zsh -lc 'true'  0.00s user 0.00s system 80% cpu 0.004 total
severe nexusBOT
#

That rules out my shell-env theory:

  • env.shellEnv is not configured
  • no OPENCLAW_LOAD_SHELL_ENV
  • zsh -lc 'true' is instant

So don’t change shell env settings.

Next split it like this:

time openclaw --version
time openclaw config get agents.defaults.model
time openclaw models auth order get --provider openai-codex
time openclaw models status --plain

If only models status --plain is slow, the delay is likely in model/auth/provider discovery, not gateway startup.
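
bash and zsh print `time` results in different formats, which makes numbers pasted from each shell awkward to compare. This small sketch runs each command against the same clock and prints one aligned line per command, plus its exit code. `time_each` is a hypothetical name. It assumes GNU `date` (for `%N` nanoseconds), as found on a typical Linux box.

```sh
# Hypothetical helper: time each quoted command with the same wall clock and print
# "command  seconds  exit code". Output of the commands themselves is discarded.
time_each() {
  for cmd in "$@"; do
    start=$(date +%s%N)                     # GNU date: nanoseconds since the epoch
    if sh -c "$cmd" >/dev/null 2>&1; then rc=0; else rc=$?; fi
    end=$(date +%s%N)
    # Integer nanoseconds from shell arithmetic, formatted as seconds by awk.
    awk -v ns="$((end - start))" -v c="$cmd" -v rc="$rc" \
      'BEGIN { printf "%-50s %7.2fs  exit %d\n", c, ns / 1e9, rc }'
  done
}

# Example:
#   time_each 'openclaw --version' \
#     'openclaw config get agents.defaults.model' \
#     'openclaw models auth order get --provider openai-codex' \
#     'openclaw models status --plain'
```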

Also re-check the env cleanup stayed clean:

systemctl --user show openclaw-gateway.service -p Environment \
  | grep -E 'OPENAI_API_KEY|OPENCLAW_AGENT_RUNTIME|OPENCLAW_AGENT_HARNESS_FALLBACK' || echo "env clean"

If agents are responding, I’d treat this as “slow auth/model status path” rather than the original broken gateway. Docs for env behavior: https://docs.openclaw.ai/help/environment

barren basin
#

env continues to be clean.

time openclaw --version: 0.07s
time openclaw config get agents.defaults.model: 4.56s
time openclaw models auth order get --provider openai-codex: 5.68s
time openclaw models status --plain: 22.25s

severe nexusBOT
#

Those numbers are useful. This is not shell startup anymore.

What they say:

  • openclaw --version = fast → CLI startup is fine.
  • config get / auth order get = ~5s → config/auth-state read is already slow.
  • models status --plain = ~22s → model/auth status path adds another ~17s.

Next checks:

du -h ~/.openclaw/openclaw.json \
  ~/.openclaw/agents/main/agent/auth-profiles.json \
  ~/.openclaw/agents/main/agent/auth-state.json \
  ~/.openclaw/agents/main/agent/models.json 2>/dev/null
du -sh ~/.openclaw/agents/main/sessions 2>/dev/null
find ~/.openclaw/agents/main/sessions -type f 2>/dev/null | wc -l

time openclaw config validate

If any of those files/dirs are huge, that explains the sluggishness.

Also try:

OPENCLAW_DEBUG_TIMING=1 openclaw models list --provider openai-codex --all >/tmp/openclaw-models-list.txt

Paste the timing output, not secrets. If this stays weird, it’s probably worth filing a GitHub issue with these timings: https://github.com/openclaw/openclaw/issues

barren basin
#
➜  ~ du -h ~/.openclaw/openclaw.json \
  ~/.openclaw/agents/main/agent/auth-profiles.json \
  ~/.openclaw/agents/main/agent/auth-state.json \
  ~/.openclaw/agents/main/agent/models.json 2>/dev/null
du -sh ~/.openclaw/agents/main/sessions 2>/dev/null
find ~/.openclaw/agents/main/sessions -type f 2>/dev/null | wc -l

time openclaw config validate
16K     /home/openclaw/.openclaw/openclaw.json
8.0K    /home/openclaw/.openclaw/agents/main/agent/auth-profiles.json
4.0K    /home/openclaw/.openclaw/agents/main/agent/auth-state.json
4.0K    /home/openclaw/.openclaw/agents/main/agent/models.json
40M     /home/openclaw/.openclaw/agents/main/sessions
82

🦞 OpenClaw 2026.4.26 (be8c246) — Finally, a use for that always-on Mac Mini under your desk.

Config valid: ~/.openclaw/openclaw.json
openclaw config validate  4.16s user 0.55s system 114% cpu 4.108 total
  ~ OPENCLAW_DEBUG_TIMING=1 openclaw models list --provider openai-codex --all >/tmp/openclaw-models-list.txt
➜  ~ cat /tmp/openclaw-models-list.txt 
No models found.
severe nexusBOT
#

Those file sizes look fine. 40M sessions can make old history slower, but it doesn’t explain config get taking ~4.5s or models status --plain taking 22s.

Also, your debug timing probably went to stderr, so the redirect hid it. Re-run like this:

OPENCLAW_DEBUG_TIMING=1 openclaw models status --plain 2>&1 | tee /tmp/models-status-timing.txt
OPENCLAW_DEBUG_TIMING=1 openclaw models list --provider openai-codex --all 2>&1 | tee /tmp/models-list-timing.txt

Then paste the timing lines.

The No models found for openai-codex is suspicious too. Check plugin/auth visibility:

openclaw plugins list | grep -i openai
openclaw models auth order get --provider openai-codex
openclaw config get plugins.allow
openclaw config get plugins.entries.openai

If plugins.allow exists and doesn’t include openai, that could explain the model-list weirdness. If those paths are “not found,” that’s fine — just paste that.

barren basin
#
➜  ~ OPENCLAW_DEBUG_TIMING=1 openclaw models status --plain 2>&1 | tee /tmp/models-status-timing.txt
OPENCLAW_DEBUG_TIMING=1 openclaw models list --provider openai-codex --all 2>&1 | tee /tmp/models-list-timing.txt
openai-codex/gpt-5.5
No models found.
➜  ~ cat /tmp/models-status-timing.txt
openai-codex/gpt-5.5
➜  ~ cat /tmp/models-list-timing.txt
No models found.
➜  ~ openclaw plugins list | grep -i openai
openclaw models auth order get --provider openai-codex
│ @openclaw/   │ openai   │ openclaw │ enabled  │ stock:openai/index.js                                    │ 2026.4.25 │
│ openai-      │          │          │          │                                                          │           │

🦞 OpenClaw 2026.4.26 (be8c246) — I'm the assistant your terminal demanded, not the one your sleep schedule requested.

Agent: main
Provider: openai-codex
Auth state file: ~/.openclaw/agents/main/agent/auth-state.json
Order override: openai-codex:yo@ignacioparada.com
➜  ~ openclaw config get plugins.allow
openclaw config get plugins.entries.openai

🦞 OpenClaw 2026.4.26 (be8c246) — I don't just autocomplete—I auto-commit (emotionally), then ask you to review (logically).

Config path not found: plugins.allow

🦞 OpenClaw 2026.4.26 (be8c246) — Hot reload for config, cold sweat for deploys.

{
  "enabled": true,
  "config": {
    "personality": "friendly"
  }
}
➜  ~ 
severe nexusBOT
#

Good data. Two notes:

  1. OPENCLAW_DEBUG_TIMING apparently isn’t emitting for these commands in your build, so ignore that suggestion — my bad.
  2. models list --provider openai-codex --all returning No models found is suspicious if you’re using openai-codex/gpt-5.5.

Please run these next:

openclaw models list --provider openai --all | head -30
openclaw models list --provider openai-codex --all --json | head -40
openclaw plugins list --json | grep -i -A8 -B4 openai
openclaw doctor

The important question now is: is the OpenAI plugin loaded but its openai-codex catalog/discovery missing/broken?

Also, your config validate being ~4s means the baseline CLI config path is just a bit slow on that machine, but models status --plain at ~22s is a separate model/provider status delay.

barren basin
#

you do remember we are trying to fix the slowness, right?

severe nexusBOT
#

Yes — you’re right. Sorry, I drifted into catalog/plugin debugging.

The slowness facts we have:

openclaw --version                         0.07s
openclaw config get agents.defaults.model  4.56s
openclaw models auth order get ...         5.68s
openclaw models status --plain             22.25s
openclaw config validate                   4.1s

So the real slow layer is:

  1. baseline config load/validation = ~4–6s
  2. models status path adds ~16–18s more

For fixing slowness, ignore models list for now. Next useful checks are timing filesystem/session/state access:

time ls ~/.openclaw/agents/main/sessions >/dev/null
time find ~/.openclaw/agents/main/sessions -type f >/dev/null
time node -e 'JSON.parse(require("fs").readFileSync(process.env.HOME+"/.openclaw/openclaw.json","utf8")); console.log("ok")'
time openclaw config get agents.defaults.model --json

And for runtime slowness specifically:

tail -n 200 /tmp/openclaw/openclaw-2026-04-29.log | grep -Ei 'slow|timeout|durationMs|chat.history|models|auth|usage'

I’m trying to isolate whether the 4–6s is disk/config validation, and whether the extra 16–18s is auth/model status work.

barren basin
#

time ls ~/.openclaw/agents/main/sessions >/dev/null: 0.00s
time find ~/.openclaw/agents/main/sessions -type f >/dev/null: 0.0s
time node -e 'JSON.parse(require("fs").readFileSync(process.env.HOME+"/.openclaw/openclaw.json","utf8")); console.log("ok")': 0.02s
time openclaw config get agents.defaults.model --json: 4.91s

the log output is gigantic so not including it here

severe nexusBOT
#

Yep — this tells us the disk/session files are not the slow part.

What we know now:

  • raw config file read: 0.02s
  • OpenClaw config validation/get: ~4–5s CPU work
  • models status --plain: ~22s
  • sessions dir read: instant

So there are two slow paths:

  1. Config/schema/plugin validation baseline: ~4–5s
    Annoying, but probably not what makes agent replies “veeeeery slow”.

  2. Model/auth status path: extra ~17s
    This may affect TUI/status checks, but not necessarily every agent reply.


Let’s measure the actual agent path instead of CLI diagnostics:

time openclaw tui --session speed-test

Then inside TUI send:

reply exactly OK

Roughly time:

  • how long TUI takes to connect
  • how long the OK reply takes after sending

If reply latency is also huge, paste only filtered logs around that turn:

tail -n 400 /tmp/openclaw/openclaw-2026-04-29.log \
  | grep -Ei 'durationMs|embedded_run|model_fallback|auth|openai-codex|chat.send|chat.history|timeout|error'

If TUI connect/history is slow but the OK reply is fast, then the remaining slowness is UI/session/status plumbing, not model inference.

barren basin
#

from the moment I run openclaw tui --session speed-test it takes TUI 80s to connect

from the moment I write reply exactly OK it takes about 20s

severe nexusBOT
#

80s TUI connect is not normal. 20s for openai-codex/gpt-5.5 can be plausible-ish, but the connect delay is the real bug now.

Let’s stop guessing and capture the exact slow RPC. Run TUI once, send the OK test, exit, then run:

tail -n 800 /tmp/openclaw/openclaw-2026-04-29.log \
  | grep -Ei 'chat.history|chat.send|gateway request timeout|durationMs|timeout|model|auth|usage|openai-codex|error|warn'

If that’s still huge, save it instead:

tail -n 800 /tmp/openclaw/openclaw-2026-04-29.log > /tmp/openclaw-slow-tail.log

Then share that file.


My current read:

  • 80s to connect = likely TUI waiting on chat.history / session metadata / gateway RPC timeouts.
  • 20s to reply OK = likely actual model/provider latency, less concerning.
  • Since a brand-new speed-test session is also slow, this is not just old main-session history.
severe nexusBOT
#

Yes — the log finally shows the slowness clearly.

Two separate slow spots:

  1. Gateway readiness is slow

    • HTTP server listening: 12:29:10
    • gateway ready: 12:30:32
    • That’s ~82s. The delay is in channels/sidecars startup, not config file IO.
  2. TUI waits on slow chat.history

    • Log shows: chat.history 32120ms
    • That alone explains ~30s of TUI “connecting”.
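
The ~82s figure is the difference between the two log timestamps. To repeat that check on later restarts, here is a tiny sketch. `hms_diff` is hypothetical, and it assumes both timestamps are HH:MM:SS on the same day.

```sh
# Hypothetical helper: seconds from the first HH:MM:SS timestamp to the second
# (same day only; does not handle crossing midnight).
hms_diff() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, ":"); split(b, y, ":")
    print (y[1] * 3600 + y[2] * 60 + y[3]) - (x[1] * 3600 + x[2] * 60 + x[3])
  }'
}

# Example: HTTP server listening -> gateway ready
#   hms_diff 12:29:10 12:30:32   # prints 82
```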

Next best tests:

openclaw tui --session "speed-test-$(date +%s)"

Use a truly fresh session, not the reused speed-test one.

Then check largest session files:

find ~/.openclaw/agents/main/sessions -type f -printf '%s %p\n' \
  | sort -nr | head -10 | numfmt --field=1 --to=iec

If main or speed-test is huge, that explains chat.history.

For startup delay, temporarily disable likely sidecar culprits one at a time:

openclaw config set plugins.entries.bonjour.enabled false --strict-json
openclaw gateway restart

If still slow, test browser:

openclaw config set plugins.entries.browser.enabled false --strict-json
openclaw gateway restart

Re-enable them with true if you need them. In the log, the delay comes right after starting channels and sidecars..., so this is where I’d focus now.

barren basin
#

why do you want me to run bash?

severe nexusBOT
#

I don’t need you to run bash there — that was just my command formatting habit. Since you’re on zsh, use zsh-friendly commands.

For the fresh session:

openclaw tui --session "speed-test-$(date +%s)"

For the largest session files, this simpler version only needs find, du and sort:

find ~/.openclaw/agents/main/sessions -type f -exec du -h {} + | sort -hr | head -10

The reason I’m asking for these is: the log showed chat.history taking ~32s, so I want to see whether a huge session file is making TUI history load slow.

barren basin
#

the speed test session took about 80s as well

severe nexusBOT
#

Then it’s not a big old session file. Fresh session being ~80s means TUI is waiting on gateway-side startup/sidecar/RPC readiness globally.

The log already showed an ~82s gap after:

starting channels and sidecars...

So I’d test disabling sidecars/plugins now.

Do this one at a time:

openclaw config set plugins.entries.bonjour.enabled false --strict-json
openclaw gateway restart
time openclaw tui --session "speed-test-$(date +%s)"

If still ~80s:

openclaw config set plugins.entries.browser.enabled false --strict-json
openclaw gateway restart
time openclaw tui --session "speed-test-$(date +%s)"

If still slow, re-enable those and check which plugins are active:

openclaw plugins list

My bet is one sidecar/channel init is blocking readiness for ~60–80s. The fresh session result rules out history as the primary cause.

barren basin
#

disabled both, it continues to take around 80s to connect

severe nexusBOT
#

Got it — then don’t keep disabling random plugins yet. If browser+bonjour didn’t change it, we need one clean measurement after those changes, because the previous log still showed browser taking ~80s in one restart:

12:29:10 starting channels and sidecars...
12:30:32 Browser control listening...
12:30:32 gateway ready

Run this now:

openclaw plugins list | grep -E 'bonjour|browser'
openclaw gateway status
date -Is
time openclaw tui --session "speed-test-$(date +%s)"

Then immediately:

tail -n 300 /tmp/openclaw/openclaw-2026-04-29.log \
  | grep -Ei 'starting channels|gateway ready|Browser control|bonjour|handshake timeout|chat.history|gateway connect failed|⇄ res'

What I’m trying to distinguish:

  • Gateway not ready for ~80s after restart → startup/sidecar problem.
  • Gateway ready, but TUI still waits ~80s → WebSocket handshake / chat.history problem.

Right now we’ve seen evidence for both, but not from the same post-disable run.

barren basin
#

why do you need me to run this time openclaw tui --session "speed-test-$(date +%s)" ? Will the timing actually work?

severe nexusBOT
#

Fair question. time openclaw tui ... only reports when the TUI process exits, so it’s not ideal unless you quit right after it connects. It won’t automatically measure “time until connected” cleanly.

Better: don’t use time there. Use wall-clock markers:

date -Is
openclaw tui --session "speed-test-$(date +%s)"

When TUI shows connected, note the time or run in another terminal:

date -Is

But honestly the log timestamps are better. What I really need is the log slice after a slow connect:

tail -n 300 /tmp/openclaw/openclaw-2026-04-29.log \
  | grep -Ei 'starting channels|gateway ready|Browser control|bonjour|handshake timeout|chat.history|gateway connect failed|⇄ res'

That tells us whether the 80s was before gateway ready, during handshake, or during chat.history.

barren basin
#

so do you want me to get you the log right after it connects?

severe nexusBOT
#

Yes — exactly. Right after the TUI finally connects, run:

tail -n 300 /tmp/openclaw/openclaw-2026-04-29.log \
  | grep -Ei 'starting channels|gateway ready|Browser control|bonjour|handshake timeout|chat.history|gateway connect failed|⇄ res'

That filtered output should be small and will show where the ~80s went.

barren basin
severe nexusBOT
#

Thanks — this log is much clearer.

The current slow connect is not gateway startup. In the “good” restart:

starting channels and sidecars 11:24:53
gateway ready                  11:24:54

So gateway readiness is fine there.

The actual TUI delay is here:

agents.list    361ms
chat.history   32120ms
handshake timeout later

So the focus is now chat.history taking ~32s. Since fresh sessions also wait, I suspect either the TUI is requesting history for a session key that resolves through existing agent/session state, or history enumeration/normalization itself is slow.

Try reducing history payload:

openclaw config set gateway.webchat.chatHistoryMaxChars 1000 --strict-json
openclaw gateway restart

Then test a new TUI session again.

Also check largest session files anyway:

find ~/.openclaw/agents/main/sessions -type f -exec du -h {} + | sort -hr | head -10

If that doesn’t improve chat.history, I’d file this as a bug: chat.history takes ~32s and TUI connect waits on it, with log line:

⇄ res ✓ chat.history 32120ms

Issue link: https://github.com/openclaw/openclaw/issues
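
When the log is too big to paste, the `⇄ res ✓ <method> <N>ms` lines are the useful part. This sketch lists the slowest RPCs from a log file. `slowest_rpcs` is hypothetical, and its pattern is inferred from the single line quoted above. Other line shapes (for example, failed responses) are not matched.

```sh
# Hypothetical helper: print "<ms> <method>" for the N slowest "res ✓" RPC lines.
slowest_rpcs() {
  logfile=$1; n=${2:-10}
  grep -oE 'res ✓ [A-Za-z0-9_.]+ [0-9]+ms' "$logfile" \
    | awk '{ ms = $4; sub(/ms$/, "", ms); print ms, $3 }' \
    | sort -nr | head -n "$n"
}

# Example:
#   slowest_rpcs /tmp/openclaw/openclaw-2026-04-29.log 5
```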

barren basin
#

find output:

➜  logs find ~/.openclaw/agents/main/sessions -type f -exec du -h {} + | sort -hr | head -10
7.2M    /home/openclaw/.openclaw/agents/main/sessions/e736896a-e0db-40af-9042-e71729a3d673.trajectory.jsonl.deleted.2026-04-24T03-07-11.718Z
6.5M    /home/openclaw/.openclaw/agents/main/sessions/78bf1db3-bb5d-43e5-a2e9-8ca2715f5e14.trajectory.jsonl.deleted.2026-04-28T23-07-01.991Z
3.2M    /home/openclaw/.openclaw/agents/main/sessions/78bf1db3-bb5d-43e5-a2e9-8ca2715f5e14.trajectory.jsonl.deleted.2026-04-24T03-07-11.718Z
2.9M    /home/openclaw/.openclaw/agents/main/sessions/94b0df20-6e3e-4374-abd0-32da3c3cf367.trajectory.jsonl.deleted.2026-04-24T03-07-11.718Z
2.6M    /home/openclaw/.openclaw/agents/main/sessions/78bf1db3-bb5d-43e5-a2e9-8ca2715f5e14.jsonl.reset.2026-04-26T18-14-36.454Z
2.3M    /home/openclaw/.openclaw/agents/main/sessions/ac2e3d0d-68f5-46ab-beeb-e3c71c02bebb.jsonl.reset.2026-04-20T12-10-02.882Z
2.2M    /home/openclaw/.openclaw/agents/main/sessions/781b1843-8171-4d4c-840d-9e8f73d0494a.jsonl.deleted.2026-04-22T02-13-30.580Z
2.0M    /home/openclaw/.openclaw/agents/main/sessions/78bf1db3-bb5d-43e5-a2e9-8ca2715f5e14.checkpoint.380be252-8bce-45e4-8a42-d10b85170f84.jsonl.deleted.2026-04-24T03-07-11.718Z
1.9M    /home/openclaw/.openclaw/agents/main/sessions/7a4463d6-b507-4e2f-8eac-1b7b567e93d5.trajectory.jsonl.deleted.2026-04-29T18-14-51.313Z
1.5M    /home/openclaw/.openclaw/agents/main/sessions/7a4463d6-b507-4e2f-8eac-1b7b567e93d5.trajectory.jsonl
severe nexusBOT
#

That output is useful. The active largest file is only ~1.5M, but there are lots of .deleted.* / .reset.* files still sitting in the live sessions directory.

If chat.history is scanning/parsing those, that could explain the ~32s history call.

Do a reversible cleanup test — move, don’t delete:

mkdir -p ~/.openclaw/agents/main/sessions-archive-2026-04-29

find ~/.openclaw/agents/main/sessions -maxdepth 1 -type f \
  \( -name '*.deleted.*' -o -name '*.reset.*' \) \
  -print

If that list looks like the old backup/reset files, move them out:

find ~/.openclaw/agents/main/sessions -maxdepth 1 -type f \
  \( -name '*.deleted.*' -o -name '*.reset.*' \) \
  -exec mv -n {} ~/.openclaw/agents/main/sessions-archive-2026-04-29/ \;

Then:

openclaw gateway restart
openclaw tui --session "speed-test-$(date +%s)"

If TUI connect drops from ~80s to normal, we found it: chat.history/session scanning is choking on archived/deleted session files left in the live sessions dir.

barren basin
#

should we do this for all agents? Not just main?

severe nexusBOT
#

For the TUI main-agent slowness, start with main only. It’s safer and gives us a clean A/B test.

If it fixes the slowdown, then yes, you can do the same cleanup for other agents. Use a reversible loop:

# zsh syntax: ${dir:h} is the agent dir, ${dir:h:t} its name. Dry run: prints only.
for dir in ~/.openclaw/agents/*/sessions; do
  agent="${dir:h:t}"
  echo "Agent: $agent"
  find "$dir" -maxdepth 1 -type f \( -name '*.deleted.*' -o -name '*.reset.*' \) -print
done

If the list looks right, move them:

for dir in ~/.openclaw/agents/*/sessions; do
  archive="${dir:h}/sessions-archive-2026-04-29"   # zsh: ${dir:h} = the agent dir
  mkdir -p "$archive"
  find "$dir" -maxdepth 1 -type f \( -name '*.deleted.*' -o -name '*.reset.*' \) \
    -exec mv -n {} "$archive"/ \;
done

Then restart:

openclaw gateway restart

But I’d still test main first before doing all agents. This is reversible, but smaller blast radius is better.
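
The loops above rely on zsh-only `${dir:h}` modifiers. Here is a POSIX sh sketch of the same reversible cleanup that works in either shell and does a dry run by default. `archive_stale_sessions` is a hypothetical name, and the file patterns are the `*.deleted.*` / `*.reset.*` ones used above.

```sh
# Hypothetical helper: list (default) or move ("apply") *.deleted.* / *.reset.*
# files from one agent's live sessions dir into an archive dir. Uses mv -n so
# nothing already in the archive gets overwritten.
archive_stale_sessions() {
  sessions=$1; archive=$2; mode=${3:-dry-run}
  find "$sessions" -maxdepth 1 -type f \( -name '*.deleted.*' -o -name '*.reset.*' \) |
  while IFS= read -r f; do
    if [ "$mode" = apply ]; then
      mkdir -p "$archive"
      mv -n "$f" "$archive"/ && echo "moved: $f"
    else
      echo "would move: $f"
    fi
  done
}

# Example, all agents (dry run first, then add "apply" as a third argument):
#   for dir in ~/.openclaw/agents/*/sessions; do
#     archive_stale_sessions "$dir" "$(dirname "$dir")/sessions-archive-2026-04-29"
#   done
```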

barren basin
severe nexusBOT
#

Yep, cleanup didn’t fix it. The new log points elsewhere.

It still shows:

starting channels and sidecars...
~80s later
gateway ready

So next isolate channel startup. Temporary test only:

systemctl --user set-environment OPENCLAW_SKIP_CHANNELS=1
openclaw gateway restart
openclaw tui --session "speed-test-$(date +%s)"

If TUI connects quickly, one of the configured channels is blocking startup.

Then undo the test:

systemctl --user unset-environment OPENCLAW_SKIP_CHANNELS
openclaw gateway restart

If OPENCLAW_SKIP_CHANNELS=1 fixes startup, run:

openclaw channels status
openclaw channels status --probe

and paste the slow/failing channel lines. My suspicion is a channel connector, not sessions anymore.
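
A temporary toggle like OPENCLAW_SKIP_CHANNELS is easy to forget to undo. A small wrapper keeps the set, test, unset sequence together. This is a sketch that assumes the same `systemctl --user` and `openclaw gateway restart` commands used above, and `with_user_env` is hypothetical. If you interrupt the whole function at the shell (rather than quitting the TUI normally), the cleanup may not run, so fall back to the unset commands above.

```sh
# Hypothetical helper: set a systemd --user environment variable, restart the
# gateway, run one test command, then unset the variable and restart again,
# whatever the test command's exit status was.
with_user_env() {
  assignment=$1; shift                      # e.g. OPENCLAW_SKIP_CHANNELS=1
  var=${assignment%%=*}                     # variable name without "=value"
  systemctl --user set-environment "$assignment" || return 1
  openclaw gateway restart
  if "$@"; then rc=0; else rc=$?; fi
  systemctl --user unset-environment "$var"
  openclaw gateway restart
  return "$rc"
}

# Example:
#   with_user_env OPENCLAW_SKIP_CHANNELS=1 openclaw tui --session "speed-test-$(date +%s)"
```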

barren basin
#

nope, that didn't fix it

severe nexusBOT
#

Good, that rules out channel startup. Make sure to undo it if you haven’t:

systemctl --user unset-environment OPENCLAW_SKIP_CHANNELS

Next, isolate plugin services and sidecars more broadly. For one test, start the gateway with providers skipped:

systemctl --user set-environment OPENCLAW_SKIP_PROVIDERS=1
openclaw gateway restart
openclaw tui --session "speed-test-$(date +%s)"

Then undo:

systemctl --user unset-environment OPENCLAW_SKIP_PROVIDERS
openclaw gateway restart

If that still doesn’t fix it, we’re past simple config toggles and this should become a bug report. The clean repro is:

  • OpenClaw 2026.4.26
  • gateway reaches starting channels and sidecars...
  • gateway ready delayed ~80s
  • OPENCLAW_SKIP_CHANNELS=1 does not improve it
  • TUI fresh session waits ~80s
  • chat.history logs ~30–36s

I’d file that with the filtered logs: https://github.com/openclaw/openclaw/issues
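To put an exact number on the readiness delay in the report, you can compute the gap between the two log lines. A minimal sketch, assuming each relevant line starts with a timestamp like 2026-04-29T16:35:27.664 (the form quoted in the excerpts later in this thread) and that both lines fall on the same day. The real log file may be formatted differently (for example as JSON), in which case the patterns need adjusting. The function name is hypothetical.

```sh
# Hypothetical helper: print the seconds between each "starting channels and
# sidecars" line and the next "gateway ready" line in a log file.
# Assumes lines begin with YYYY-MM-DDTHH:MM:SS.mmm (same day only).
startup_gaps() {
  awk '
    # seconds since midnight from the HH:MM:SS.mmm part of a timestamp
    function secs(ts,   t, p) {
      t = substr(ts, 12)              # drop "YYYY-MM-DDT"
      split(t, p, ":")
      return p[1] * 3600 + p[2] * 60 + p[3]
    }
    /starting channels and sidecars/ { start = secs($1); have = 1; next }
    /gateway ready/ && have {
      printf "%.1f\n", secs($1) - start
      have = 0
    }
  ' "$1"
}
```

For example, startup_gaps /tmp/openclaw/openclaw-2026-04-29.log prints one line per restart found in that day's log.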

barren basin
#

what about re-installing openclaw?

severe nexusBOT
#

Reinstalling might help if the install/plugin deps are corrupted, but I wouldn’t do a full wipe yet. Your state/config is the valuable part, and the symptoms look more like a 2026.4.26 runtime bug or bad state interaction than “binary missing”.

Safer order:

openclaw doctor
openclaw doctor --fix
openclaw gateway restart

Then reinstall the package/service without deleting state:

npm install -g openclaw@latest
openclaw gateway install --force
openclaw gateway restart

If you installed via pnpm or a custom npm prefix, reinstall with the same method you originally used.
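If you are not sure which method you used, the resolved path of the openclaw binary is usually a good hint. A rough sketch: the directory patterns are guesses based on the PATH entries in the unit file (.npm-global, .local/share/pnpm, .volta, .bun, .nvm, fnm), not an official detection method, and the function name is hypothetical.

```sh
# Hypothetical helper: guess the package manager from a resolved binary path.
# Usage: guess_install_method "$(readlink -f "$(command -v openclaw)")"
guess_install_method() {
  case "$1" in
    */pnpm/*)                   echo pnpm ;;
    */.volta/*)                 echo volta ;;
    */.bun/*)                   echo bun ;;
    */.nvm/*|*/fnm/*|*/.fnm/*)  echo "npm (nvm/fnm-managed node)" ;;
    */node_modules/*)           echo npm ;;
    *)                          echo unknown ;;
  esac
}
```

The pnpm case comes before the generic node_modules case because pnpm's global store also contains a node_modules directory.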

Before any bigger reinstall, backup state:

tar -czf ~/openclaw-backup-$(date +%F).tgz ~/.openclaw ~/.config/systemd/user/openclaw-gateway.service ~/.config/systemd/user/openclaw-gateway.service.d 2>/dev/null

I would not delete ~/.openclaw unless you’re okay losing sessions/auth/channel state. If reinstall doesn’t fix the 80s startup / 30s chat.history, file a GitHub issue with the logs.

barren basin
#

I'm running 2026.4.24

severe nexusBOT
#

Ah — that changes things. Also: your pasted command output earlier showed 2026.4.26, so you may have a split install: CLI one version, systemd service another, or stale service path.

Run these:

openclaw --version
which openclaw
readlink -f "$(which openclaw)"

systemctl --user cat openclaw-gateway.service
systemctl --user show openclaw-gateway.service -p ExecStart -p Environment

If anything still points at 2026.4.24, update/reinstall the service from the current install:

openclaw gateway install --force
openclaw gateway restart

If you truly are on 2026.4.24, yes — update first. We’ve seen weird .24-era regressions. Don’t wipe state; just get CLI + service onto the same latest version, then retest TUI.
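To check for a split install without reading the unit by eye, you can compare the package version behind the CLI with the one behind the unit's ExecStart. A minimal sketch, assuming ExecStart has the form shown in the status output (node followed by the script path) and that the package's package.json has a version field matching what openclaw --version prints. The helper names are hypothetical.

```sh
# Hypothetical helper: read unit text on stdin and print the script path that
# ExecStart passes to node (the 2nd word), e.g. .../openclaw/dist/index.js
execstart_entry() {
  sed -n 's/^[[:space:]]*ExecStart=[^ ]* \([^ ]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: walk up from a file to the nearest package.json that
# has a "version" field and print that version.
pkg_version_for_entry() {
  local dir v
  dir=$(dirname "$1")
  while [ "$dir" != "/" ] && [ "$dir" != "." ]; do
    if [ -f "$dir/package.json" ]; then
      v=$(sed -n 's/^[[:space:]]*"version":[[:space:]]*"\([^"]*\)".*/\1/p' "$dir/package.json" | head -n 1)
      if [ -n "$v" ]; then echo "$v"; return 0; fi
    fi
    dir=$(dirname "$dir")
  done
  return 1
}

# Usage:
#   svc=$(systemctl --user cat openclaw-gateway.service | execstart_entry)
#   cli=$(readlink -f "$(command -v openclaw)")
#   echo "service: $(pkg_version_for_entry "$svc")  cli: $(pkg_version_for_entry "$cli")"
```

Walking up instead of assuming a fixed depth means it also works if the CLI's bin file sits somewhere other than dist/ inside the package.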

barren basin
#

ok, that didn't work. I'll try reinstalling openclaw

severe nexusBOT
#

That’s reasonable now. Do a non-destructive reinstall and keep ~/.openclaw.

Suggested order:

# backup first
tar -czf ~/openclaw-backup-$(date +%F).tgz \
  ~/.openclaw \
  ~/.config/systemd/user/openclaw-gateway.service \
  ~/.config/systemd/user/openclaw-gateway.service.d 2>/dev/null
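The 2>/dev/null keeps tar quiet when the .service.d drop-in directory does not exist, but it also hides real errors such as a full disk. A slightly stricter sketch: include only paths that exist, then report how many entries were written. The function name is hypothetical; it takes the home directory as a parameter so it can be tried on a scratch directory first, and unlike the command above it stores paths relative to that directory.

```sh
# Hypothetical helper: back up OpenClaw state and the user unit, skipping
# paths that don't exist instead of discarding every tar error.
# Usage: backup_openclaw "$HOME" ~/openclaw-backup-$(date +%F).tgz
backup_openclaw() {
  local home="$1" out="$2" p
  set --                                  # reuse "$@" as the list of paths
  for p in .openclaw \
           .config/systemd/user/openclaw-gateway.service \
           .config/systemd/user/openclaw-gateway.service.d; do
    if [ -e "$home/$p" ]; then set -- "$@" "$p"; fi
  done
  if [ "$#" -eq 0 ]; then
    echo "nothing to back up under $home" >&2
    return 1
  fi
  # -C stores paths relative to $home; restore with: tar -xzf FILE -C "$HOME"
  tar -czf "$out" -C "$home" "$@" || return
  echo "$(tar -tzf "$out" | wc -l) entries written to $out"
}
```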

Then reinstall with whatever package manager you used. If you used a global npm install:

npm install -g openclaw@latest
openclaw --version
openclaw gateway install --force
openclaw gateway restart

Verify CLI and service match:

openclaw --version
openclaw gateway status
systemctl --user cat openclaw-gateway.service

Don’t delete ~/.openclaw unless you want to reset auth/sessions/channel state.

barren basin
#

should I run doctor first?

severe nexusBOT
#

Yes — run doctor first. It’s lower-risk than reinstalling.

openclaw doctor

If it reports fixable issues:

openclaw doctor --fix
openclaw gateway restart

Then retest:

openclaw gateway status
openclaw tui --session "speed-test-$(date +%s)"

If doctor doesn’t fix it, proceed with the non-destructive reinstall.

barren basin
#

I'm getting the following warnings:

- Codex plugin is enabled, but `openai-codex/*` model refs still use the OpenClaw PI runner unless `agentRuntime.id` is `codex`.
- agents.defaults.model: openai-codex/gpt-5.5 currently resolves with runtime "pi".
- agents.list.security.model: openai-codex/gpt-5.5 currently resolves with runtime "pi".
- agents.list.main.model: openai-codex/gpt-5.5 currently resolves with runtime "pi".
- agents.list.clu.model: openai-codex/gpt-5.5 currently resolves with runtime "pi".
- agents.list.percy.model: openai-codex/gpt-5.5 currently resolves with runtime "pi".
- To use native Codex app-server, set the model to `openai/<model>` and set `agents.defaults.agentRuntime.id: "codex"` (or the agent-level equivalent).
- Leave this unchanged if you intentionally want Codex OAuth/subscription auth through PI.
severe nexusBOT
#

That warning is expected and not the current problem.

It means:

  • You have the Codex plugin enabled.
  • Your agents use openai-codex/gpt-5.5.
  • That route uses OpenClaw PI + Codex OAuth, not native Codex app-server.

That’s exactly the route we intentionally moved you back to after removing the forced systemd env. So leave it unchanged unless you specifically want native Codex harness again.

The important line is:

Leave this unchanged if you intentionally want Codex OAuth/subscription auth through PI.

So: don’t “fix” that warning right now. Continue with doctor’s other warnings/errors, if any.

barren basin
#

I ran npm install -g openclaw@latest and got:

➜  ~ npm install -g openclaw@latest
npm warn deprecated node-domexception@1.0.0: Use your platform's native DOMException instead

changed 434 packages in 26s

70 packages are looking for funding
  run `npm fund` for details
severe nexusBOT
#

Good — that npm output is fine. The deprecation warning is harmless.

Now reinstall/repoint the service and restart:

openclaw --version
openclaw gateway install --force
openclaw gateway restart

Then verify the service is using the same install:

openclaw gateway status
systemctl --user cat openclaw-gateway.service | grep -E 'ExecStart|node|openclaw'

After it’s up, retest:

openclaw tui --session "speed-test-$(date +%s)"

If it’s still slow after the reinstall, we’ve got a good issue report rather than an install problem.

barren basin
#
openclaw gateway install --force

🦞 OpenClaw 2026.4.26 (be8c246) — I'm not magic—I'm just extremely persistent with retries and coping strategies.


Installed systemd service: /home/openclaw/.config/systemd/user/openclaw-gateway.service
Previous unit backed up to: /home/openclaw/.config/systemd/user/openclaw-gateway.service.bak
severe nexusBOT
#

Good. Now restart and verify:

openclaw gateway restart
openclaw gateway status

Also confirm the old bad drop-ins did not come back:

systemctl --user show openclaw-gateway.service -p Environment \
  | grep -E 'OPENAI_API_KEY|OPENCLAW_AGENT_RUNTIME|OPENCLAW_AGENT_HARNESS_FALLBACK' || echo "env clean"
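One gap in that check: systemctl --user show -p Environment only covers Environment= lines from the unit and its drop-ins. Variables added with systemctl --user set-environment live in the user manager's own environment, which systemctl --user show-environment prints. A small sketch that checks both without printing any values; the function name is hypothetical and the variable list is the one from the command above.

```sh
# Hypothetical helper: look for the old overrides in the unit's Environment=
# lines and in the systemd --user manager environment. Values are not printed.
check_openclaw_env() {
  local pattern='OPENAI_API_KEY|OPENCLAW_AGENT_RUNTIME|OPENCLAW_AGENT_HARNESS_FALLBACK'
  local found=0
  if systemctl --user show openclaw-gateway.service -p Environment | grep -Eq "$pattern"; then
    echo "unit Environment= still sets an override" >&2
    found=1
  fi
  if systemctl --user show-environment | grep -Eq "^($pattern)="; then
    echo "user manager environment still sets an override" >&2
    found=1
  fi
  if [ "$found" -eq 0 ]; then echo "env clean"; fi
  return "$found"
}
```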

Then retest TUI:

openclaw tui --session "speed-test-$(date +%s)"

If it’s still slow, grab the same filtered log after it connects.

barren basin
#

➜ ~ openclaw gateway status
systemctl --user cat openclaw-gateway.service | grep -E 'ExecStart|node|openclaw'

Service: systemd (enabled)
File logs: /tmp/openclaw/openclaw-2026-04-29.log
Command: /usr/bin/node /home/openclaw/.npm-global/lib/node_modules/openclaw/dist/index.js gateway --port 18789
Service file: ~/.config/systemd/user/openclaw-gateway.service
Service env: OPENCLAW_GATEWAY_PORT=18789

Config (cli): ~/.openclaw/openclaw.json
Config (service): ~/.openclaw/openclaw.json

Gateway: bind=loopback (127.0.0.1), port=18789 (service args)
Probe target: ws://127.0.0.1:18789
Dashboard: http://127.0.0.1:18789/
Probe note: Loopback-only gateway; only local clients can connect.

Runtime: running (pid 72883, state active, sub running, last exit 0, reason 0)
Warm-up: launch agents can take a few seconds. Try again shortly.
Connectivity probe: failed
Probe target: ws://127.0.0.1:18789
timeout
Capability: unknown

Port 18789 is already in use.

  • pid 72883 openclaw: openclaw-gateway (127.0.0.1:18789)
  • Gateway already running locally. Stop it (openclaw gateway stop) or use a different port.
    Listening: 127.0.0.1:18789
    Troubles: run openclaw status
    Troubleshooting: https://docs.openclaw.ai/troubleshooting
    /home/openclaw/.config/systemd/user/openclaw-gateway.service
    ExecStart=/usr/bin/node /home/openclaw/.npm-global/lib/node_modules/openclaw/dist/index.js gateway --port 18789
    Environment=HOME=/home/openclaw
    Environment=PATH=/usr/bin:/home/openclaw/.local/bin:/home/openclaw/.npm-global/bin:/home/openclaw/bin:/home/openclaw/.volta/bin:/home/openclaw/.asdf/shims:/home/openclaw/.bun/bin:/home/openclaw/.nix-profile/bin:/home/openclaw/.nvm/current/bin:/home/openclaw/.local/share/fnm/aliases/default/bin:/home/openclaw/.local/share/fnm/current/bin:/home/openclaw/.fnm/aliases/default/bin:/home/openclaw/.fnm/current/bin:/home/openclaw/.local/share/pnpm:/usr/local/bin:/bin
    Environment=OPENCLAW_SYSTEMD_UNIT=openclaw-gateway.service
    Environment=OPENCLAW_SERVICE_MARKER=openclaw
severe nexusBOT
#

Service path/env look clean now. The remaining gateway status timeout may just be because it’s still in the ~80s warm-up window after restart.

Wait until it says ready, or check the log:

tail -n 80 /tmp/openclaw/openclaw-2026-04-29.log \
  | grep -Ei 'starting channels|gateway ready|Browser control|chat.history|handshake timeout|⇄ res|error|warn'
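Rather than re-running gateway status by hand during the warm-up, you could poll the log until a new gateway ready line appears. A minimal sketch; the function name, the one-second poll, and the idea of recording the log's line count before restarting (so an older gateway ready line from an earlier start is ignored) are choices made for this example, not OpenClaw features.

```sh
# Hypothetical helper: wait until PATTERN shows up in LOG after its first SKIP
# lines, polling once per second for at most TIMEOUT seconds.
# Usage:
#   log=/tmp/openclaw/openclaw-2026-04-29.log
#   skip=$(wc -l < "$log"); openclaw gateway restart
#   wait_for_log_line "$log" 'gateway ready' 180 "$skip"
wait_for_log_line() {
  local log="$1" pattern="$2" timeout="${3:-120}" skip="${4:-0}" waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # only look at lines written after the first $skip lines
    if tail -n "+$((skip + 1))" "$log" 2>/dev/null | grep -q -- "$pattern"; then
      echo "found '$pattern' after ~${waited}s"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "timed out after ${timeout}s waiting for '$pattern'" >&2
  return 1
}
```

The reported wait is approximate, which is enough to tell a ~80s startup from a normal one. The log file name includes the date, so a restart around midnight may write to the next day's file.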

Then retest TUI:

openclaw tui --session "speed-test-$(date +%s)"

If it’s still slow after the reinstall and clean service env, I’d stop poking locally and file the bug. The minimal repro is strong now: clean latest service, startup/ready delay + chat.history ~30s.

barren basin
#

ok, it continues to take a long time. Any help filing the issue?

severe nexusBOT
#

Yep — here’s a paste-ready issue draft.

Title

Gateway/TUI very slow after clean service reinstall: fresh TUI session takes ~80s, chat.history ~30s

Body

## Summary

After fixing bad systemd env overrides and reinstalling OpenClaw, the gateway runs but TUI is still very slow.

A fresh TUI session takes ~80s to connect. Once connected, a trivial prompt like `reply exactly OK` takes ~20s.

## Environment

- OpenClaw: 2026.4.26 (be8c246)
- OS: Linux, systemd user service
- Node: 22.22.0
- Gateway command:

```text
/usr/bin/node /home/openclaw/.npm-global/lib/node_modules/openclaw/dist/index.js gateway --port 18789
```

## What I already fixed / ruled out

Earlier I had systemd drop-ins that forced:

```text
OPENCLAW_AGENT_RUNTIME=codex
OPENCLAW_AGENT_HARNESS_FALLBACK=none
OPENAI_API_KEY=<revoked key>
```

I moved those drop-ins aside and ran:

```sh
systemctl --user daemon-reload
systemctl --user unset-environment OPENAI_API_KEY OPENCLAW_AGENT_RUNTIME OPENCLAW_AGENT_HARNESS_FALLBACK
openclaw gateway restart
```

Then reinstalled non-destructively:

```sh
npm install -g openclaw@latest
openclaw gateway install --force
openclaw gateway restart
```

Service env now looks clean. `systemctl --user cat openclaw-gateway.service` shows only normal env like `HOME`/`PATH`/`OPENCLAW_SYSTEMD_UNIT`/`OPENCLAW_SERVICE_MARKER`.

## Symptoms

Timings:

```text
openclaw --version                                      0.07s
openclaw config get agents.defaults.model               ~4.5–5s
openclaw models auth order get --provider openai-codex  ~5–6s
openclaw models status --plain                          ~22s
openclaw config validate                                ~4.1s
```

Disk/session checks:

```text
~/.openclaw/openclaw.json                         16K
~/.openclaw/agents/main/agent/auth-profiles.json  8K
~/.openclaw/agents/main/agent/auth-state.json     4K
~/.openclaw/agents/main/agent/models.json         4K
~/.openclaw/agents/main/sessions                  ~40M, 82 files
```

Raw config file read is fast:

```sh
time node -e 'JSON.parse(require("fs").readFileSync(process.env.HOME+"/.openclaw/openclaw.json","utf8")); console.log("ok")'
# ~0.02s
```

Shell env import does not seem to be the cause:

```text
openclaw config get env.shellEnv -> path not found
OPENCLAW_LOAD_SHELL_ENV / OPENCLAW_SHELL_ENV_TIMEOUT_MS not set
zsh -lc 'true' -> ~0.004s
```

Moved `.deleted.*` / `.reset.*` session files out of the live sessions dir as a reversible test; did not improve the TUI delay.

Also tested `OPENCLAW_SKIP_CHANNELS=1`; did not improve the delay.

## Log evidence

Gateway sometimes delays readiness after:

```text
starting channels and sidecars...
```

Example:

```text
2026-04-29T16:35:27.664 starting channels and sidecars...
2026-04-29T16:36:47.745 gateway ready
```

TUI/fresh session also shows slow `chat.history`:

```text
⇄ res ✓ agents.list 562ms
⇄ res ✓ chat.history 30707ms
```

Other observed examples:

```text
⇄ res ✓ chat.history 32120ms
⇄ res ✓ chat.history 36112ms
⇄ res ✓ chat.history 32418ms
```

Sometimes TUI fails during this period with:

```text
chat.history unavailable during gateway startup
gateway request timeout for connect
gateway closed (1000)
handshake timeout
```

## Expected

Fresh TUI session should connect quickly, and `chat.history` for a new session should not take ~30s.

## Actual

Fresh TUI session takes ~80s to connect. Gateway startup/readiness and/or `chat.history` appear to block for tens of seconds.

## Notes

Configured model is `openai-codex/gpt-5.5` via PI/Codex OAuth. Doctor warns that `openai-codex/*` uses PI unless `agentRuntime.id: "codex"` is set, but this is intentional.

Attach the filtered logs if GitHub lets you. Also include the latest openclaw gateway status output.
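To attach concrete numbers rather than a few hand-picked lines, a short filter can summarize every chat.history timing in the log. A minimal sketch, assuming response lines contain chat.history followed by a duration such as 30707ms, as in the lines quoted in the draft; the function name is hypothetical.

```sh
# Hypothetical helper: summarize "chat.history <N>ms" timings found in a log.
chat_history_stats() {
  grep -o 'chat\.history [0-9][0-9]*ms' "$1" |
    awk '{ ms = $2 + 0; n++; sum += ms; if (ms > max) max = ms }   # "30707ms" + 0 -> 30707
         END {
           if (n) { printf "count=%d avg=%.0fms max=%dms\n", n, sum / n, max }
           else   { print "no chat.history timings found" }
         }'
}
```

For example, chat_history_stats /tmp/openclaw/openclaw-2026-04-29.log gives a one-line summary you can paste under Log evidence.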

barren basin
#

ok, done

severe nexusBOT
#

Nice, thank you. That’s the right next step — the repro is specific enough for maintainers to dig into.

For now I’d leave the service env clean and avoid more state churn. If they ask for more data, the best extra artifacts are:

openclaw gateway status
openclaw doctor
tail -n 500 /tmp/openclaw/openclaw-2026-04-29.log

The key log lines are the ~80s gateway ready delay and the ~30s chat.history responses.

barren basin
#

any way to have a cleaner install of openclaw? Delete everything besides the config and install everything again?

severe nexusBOT
#

Yes — do it as a quarantine/reset, not delete. Keep backups so you can roll back.

Clean state, preserve only openclaw.json

openclaw gateway stop

stamp="$(date +%Y%m%d-%H%M%S)"
mkdir -p ~/openclaw-clean-backup-$stamp

cp ~/.openclaw/openclaw.json ~/openclaw-clean-backup-$stamp/openclaw.json
mv ~/.openclaw ~/openclaw-clean-backup-$stamp/openclaw-state-old

mkdir -p ~/.openclaw
cp ~/openclaw-clean-backup-$stamp/openclaw.json ~/.openclaw/openclaw.json

Then reinstall the package and the service:

npm install -g openclaw@latest
openclaw gateway install --force
openclaw gateway restart

Then you’ll likely need to re-auth providers/channels because auth/session/channel state was moved aside.

For Codex OAuth:

openclaw models auth login --provider openai-codex --device-code --set-default

Then check:

openclaw gateway status
openclaw tui --session "clean-test-$(date +%s)"

If this clean state is fast, the slowness was in old state. If it’s still slow, it’s install/runtime/config-related.
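If you do try the quarantine on this server, the same steps can be wrapped so they stop early instead of half-moving state, for example when openclaw.json is missing or the backup directory already exists. A minimal sketch; the function name is hypothetical, it takes the home directory as a parameter so you can rehearse it on a scratch directory, and you still run openclaw gateway stop yourself first.

```sh
# Hypothetical helper: move HOME/.openclaw aside, keeping only openclaw.json.
# Run "openclaw gateway stop" yourself before calling it.
# Usage: quarantine_openclaw_state "$HOME"
quarantine_openclaw_state() {
  local home="$1" stamp backup
  stamp=$(date +%Y%m%d-%H%M%S)
  backup="$home/openclaw-clean-backup-$stamp"

  if [ ! -f "$home/.openclaw/openclaw.json" ]; then
    echo "no $home/.openclaw/openclaw.json, nothing moved" >&2
    return 1
  fi
  if [ -e "$backup" ]; then
    echo "$backup already exists, nothing moved" >&2
    return 1
  fi

  # each step runs only if the previous one succeeded
  mkdir -p "$backup" &&
    cp "$home/.openclaw/openclaw.json" "$backup/openclaw.json" &&
    mv "$home/.openclaw" "$backup/openclaw-state-old" &&
    mkdir -p "$home/.openclaw" &&
    cp "$backup/openclaw.json" "$home/.openclaw/openclaw.json" || return

  echo "old state: $backup/openclaw-state-old"
}
```

To roll back, stop the gateway again, move the new ~/.openclaw out of the way, and move openclaw-state-old back to ~/.openclaw.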

barren basin
#

ok, don't worry about it. I'll just reinstall clean in another server

severe nexusBOT
#

Totally fair. New server is probably the cleanest way to separate “bad old state” from “runtime bug”.

Before you abandon this one, I’d still keep a backup in case you want sessions/config later:

tar -czf ~/openclaw-old-server-backup-$(date +%F).tgz ~/.openclaw ~/.config/systemd/user/openclaw-gateway.service ~/.config/systemd/user/openclaw-gateway.service.d 2>/dev/null

On the new server, start minimal: install, configure Codex OAuth, verify TUI/gateway speed, then add channels/plugins one at a time. That’ll make any slowdown much easier to catch.

barren basin
#

any way to reactivate the telegram connection in the current server??

severe nexusBOT
#

Yes. Telegram is token/config based, so “reactivate” usually means: make sure the token is configured, enable the channel, restart, then pair/allowlist yourself again.

Fast path:

openclaw channels add --channel telegram --token '<bot-token>'
openclaw gateway restart
openclaw channels status --probe

Don’t paste the token here — use your BotFather token locally.

Then DM the bot /start. If pairing is enabled, approve it:

openclaw pairing list telegram
openclaw pairing approve telegram <CODE>

If you want durable allowlist access instead of pairing, set your numeric Telegram user ID:

openclaw config set channels.telegram.dmPolicy allowlist
openclaw config set channels.telegram.allowFrom '["123456789"]' --strict-json
openclaw gateway restart

For groups, also allow the group under channels.telegram.groups — negative group IDs go there, not in allowFrom.
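The allowFrom command above passes user IDs as a JSON array of strings, and negative IDs belong to groups rather than allowFrom. A tiny sketch that builds that value and refuses anything that is not a plain positive number; the helper name is hypothetical.

```sh
# Hypothetical helper: build the JSON array for channels.telegram.allowFrom.
# Rejects non-numeric and negative (group) IDs.
telegram_allow_from_json() {
  local out="" id
  for id in "$@"; do
    case "$id" in
      ''|*[!0-9]*)
        echo "not a positive numeric user ID: '$id' (group IDs go under channels.telegram.groups)" >&2
        return 1 ;;
    esac
    out="${out:+$out,}\"$id\""      # comma-separate after the first entry
  done
  printf '[%s]\n' "$out"
}
```

Usage: openclaw config set channels.telegram.allowFrom "$(telegram_allow_from_json 123456789)" --strict-json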

Docs: https://docs.openclaw.ai/channels/telegram

severe nexusBOT
#

Yep — then re-enable it with:

openclaw config set channels.telegram.enabled true --strict-json
openclaw gateway restart

Then check status:

openclaw channels status --probe

If Telegram still times out with deleteWebhook / ETIMEDOUT / ENETUNREACH, that’s outbound network from the server to Telegram Bot API. Test directly: