#Continuation: Broken Openclaw


true skiff

Hello. I had another thread going but it had to be closed. I have a summary written of what we were most recently working on, if I can send that to you in a few pieces:
OpenClaw 2026.4.25 (aa36ee6) on native Windows, config at:
D:\Caelan\openclaw-rescue-noplugins.json
state/logs under:
D:\Caelan\.openclaw\

Original issue: gateway extremely slow / dashboard and Discord unusable. Rolling back to 2026.4.23 did not fully help; trying 2026.4.25 showed severe hangs.

Key findings:

  1. Normal gateway/service would bind port 18789 but local probes timed out.

    • openclaw gateway status showed listener on 127.0.0.1:18789
    • Invoke-WebRequest http://127.0.0.1:18789/healthz -TimeoutSec 5 timed out
    • Dashboard also refused to load
  2. Discord/channel startup was initially suspected because gateway reached:

    • starting channels and sidecars...
    • hooks loaded
    • then stalled
      But later testing showed the problem was broader than Discord.
  3. Rescue mode with channels/providers skipped:

$env:OPENCLAW_CONFIG_PATH="D:\Caelan\openclaw-rescue-noplugins.json"
$env:OPENCLAW_SKIP_CHANNELS="1"
$env:OPENCLAW_SKIP_PROVIDERS="1"

still resulted in /healthz timeouts unless bundled plugins were disabled too.

  4. Important smoking gun:

    Running openclaw hooks disable boot-md blocked for ~5 minutes with:
    failed to load bundled channel qqbot: Timed out waiting for bundled runtime deps lock
    lock path:
    D:\Caelan\.openclaw\plugin-runtime-deps\openclaw-2026.4.25-eb7d6f0fe52e\.openclaw-runtime-deps.lock
    owner PID in lock was 2376, but Windows showed PID 2376 was svchost.exe, so it looked like a stale/wrong PID collision.
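The PID check above can be scripted. This is a minimal sketch with two assumptions. First, you have already read the owner PID out of the lock file by hand; the lock file's format isn't shown in this thread, so the PID is passed in as a parameter. Second, a live OpenClaw owner would always be a `node` process. `Get-LockOwnerStatus` is a hypothetical helper, not an OpenClaw command. It only uses the built-in `Get-Process` cmdlet.

```powershell
# Hypothetical helper (not part of OpenClaw): classify the owner PID found in a lock file.
# Assumption: a live OpenClaw owner would be a "node" process, so any other live
# process (for example svchost) means the PID was reused and the lock is likely stale.
function Get-LockOwnerStatus {
    param([Parameter(Mandatory)][int]$OwnerPid)

    $proc = Get-Process -Id $OwnerPid -ErrorAction SilentlyContinue
    if (-not $proc) {
        # Nothing is running with that PID: the owner exited without releasing the lock.
        $status = 'dead-owner'
    } elseif ($proc.ProcessName -ne 'node') {
        # Something else now holds that PID: PID reuse.
        $status = 'pid-reused'
    } else {
        # A node process owns the PID; it may genuinely be holding the lock.
        $status = 'possibly-live'
    }

    $name = if ($proc) { $proc.ProcessName } else { $null }
    [pscustomobject]@{
        OwnerPid = $OwnerPid
        Process  = $name
        Status   = $status
    }
}

# Example with the PID recorded in the runtime-deps lock:
Get-LockOwnerStatus -OwnerPid 2376
```

A `pid-reused` or `dead-owner` result matches the "stale/wrong PID collision" reading above. A `possibly-live` result means a node process really does hold that PID, and it should be stopped before the lock is touched.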

jovial escarpBOT

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

true skiff
  5. Renaming the stale lock alone did not fix the gateway. The working rescue combination was:

$env:OPENCLAW_CONFIG_PATH="D:\Caelan\openclaw-rescue-noplugins.json"
$env:OPENCLAW_SKIP_CHANNELS="1"
$env:OPENCLAW_SKIP_PROVIDERS="1"
$env:OPENCLAW_DISABLE_BUNDLED_PLUGINS="1"
openclaw gateway run --verbose

With that, /healthz returned:

{"ok":true,"status":"live"}

Conclusion:
Core gateway works only when bundled plugins are disabled. Strong suspicion: 2026.4.25 bundled plugin runtime-deps/staging/lock regression on Windows, not just Discord.

Current rescue restart steps:
$gwPid = (Get-NetTCPConnection -LocalPort 18789 -State Listen -ErrorAction SilentlyContinue).OwningProcess | Select-Object -First 1
if ($gwPid) { Stop-Process -Id $gwPid -Force }

$env:OPENCLAW_CONFIG_PATH="D:\Caelan\openclaw-rescue-noplugins.json"
$env:OPENCLAW_SKIP_CHANNELS="1"
$env:OPENCLAW_SKIP_PROVIDERS="1"
$env:OPENCLAW_DISABLE_BUNDLED_PLUGINS="1"

openclaw gateway run --verbose
Then verify:
Invoke-WebRequest http://127.0.0.1:18789/healthz -UseBasicParsing -TimeoutSec 5
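A single `Invoke-WebRequest` probe can fire while the gateway is still starting. A small retry loop makes the check less timing-sensitive. This is a sketch. `Wait-GatewayHealthz` is a hypothetical helper name, and the attempt count and delays are assumptions you can tune. Only the `/healthz` path and port 18789 come from this thread.

```powershell
# Hypothetical helper: poll the gateway's /healthz until it answers or we give up.
function Wait-GatewayHealthz {
    param(
        [string]$Url = 'http://127.0.0.1:18789/healthz',
        [int]$Attempts = 10,
        [int]$TimeoutSec = 5,
        [int]$DelaySeconds = 3
    )

    for ($i = 1; $i -le $Attempts; $i++) {
        try {
            $resp = Invoke-WebRequest -Uri $Url -UseBasicParsing -TimeoutSec $TimeoutSec
            Write-Host "attempt ${i}: HTTP $($resp.StatusCode) $($resp.Content)"
            return $true
        } catch {
            # Timeouts, refused connections and non-2xx statuses all land here.
            Write-Host "attempt ${i}: $($_.Exception.Message)"
        }
        if ($i -lt $Attempts) { Start-Sleep -Seconds $DelaySeconds }
    }
    return $false
}
```

`Wait-GatewayHealthz` returns `$true` as soon as `/healthz` answers with a success status. In rescue mode that body was `{"ok":true,"status":"live"}`. It returns `$false` if every attempt timed out or was refused. Progress goes to `Write-Host`, so only the boolean reaches the pipeline.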
Model situation:

  • Dashboard loads in rescue mode, but agent got stuck looping through fallbacks.
  • openclaw models status showed:
    • Default: openai-codex/gpt-5.5
    • 21 fallbacks configured
    • google-antigravity OAuth expired
    • openai-codex:default expired
    • openai-codex:lindsay... ok, expires in 4d
    • openai has env API key from OPENAI_API_KEY
    • openrouter has API key
  • Suggested rescue model fix:
$env:OPENCLAW_CONFIG_PATH="D:\Caelan\openclaw-rescue-noplugins.json"
$env:OPENCLAW_SKIP_CHANNELS="1"
$env:OPENCLAW_SKIP_PROVIDERS="1"
$env:OPENCLAW_DISABLE_BUNDLED_PLUGINS="1"

openclaw models set openai/gpt-4.1
openclaw models fallbacks clear
Then start a new dashboard chat and test “say hi”.


Known cleanup:

  • There is now a stale warning:
    plugins.entries.qqbot: plugin not found: qqbot
    This came from an attempted targeted isolation step and can be cleaned later. It is not the main issue.
willow willowBOT

Yep, send the rest in pieces. I’ll keep track.

That runtime-deps lock / stale PID collision is a very useful lead, so once I have the full summary we can pick up from there instead of restarting from zero. Code blocks for commands/logs are perfect. 🦐


Got it — this narrows it a lot.

The key checkpoint I’m carrying forward is:

  • Gateway core is healthy
  • Healthz only responds when OPENCLAW_DISABLE_BUNDLED_PLUGINS=1 is set
  • So the immediate blocker is very likely in bundled plugin runtime deps / staging / locking on Windows, not the dashboard, Discord, or model provider layer by itself.
  • Separately, the model fallback loop is a second issue to stabilize once the gateway is responsive.

Send the next piece when ready. 🦐


Got it. I’ll treat the plugins.entries.qqbot: plugin not found: qqbot warning as cleanup noise from isolation, not the root cause.

So far the working theory remains: bundled plugin runtime deps are wedging startup on Windows, and the model/fallback config is only relevant after rescue mode is stable.

Send the next chunk whenever.

true skiff

That was it. Now, I don't think GPT-4.1 is the right model to test, but I don't know. I feel like my Google models and OAuth/Codex should at least work. And if not those, the free models should; I had an OpenRouter key and everything.

willow willowBOT

Yeah, your instinct is fair: gpt-4.1 probably isn’t the model you ultimately want.

But for this specific rescue test, the model choice is not about the “best model”; it’s about picking the least complicated auth path.

Right now:

  • google-antigravity OAuth showed expired.
  • openai-codex:default showed expired.
  • OpenRouter/free models add another failure layer: upstream routing, free-tier limits, provider incompatibility, rate limits.
  • You had 21 fallbacks, which can turn one bad auth/profile into a long confusing loop.

So the reason openai/gpt-4.1 was suggested is basically: you have OPENAI_API_KEY, so direct OpenAI API is the cleanest smoke test. Not because it’s the right daily driver.

I’d do it in this order:

$env:OPENCLAW_CONFIG_PATH="D:\Caelan\openclaw-rescue-noplugins.json"

openclaw models fallbacks clear
openclaw models set openai/gpt-4.1
openclaw infer model run --local --model openai/gpt-4.1 --prompt "Reply with exactly: smoke-ok" --json

If that works, then the agent/model path is alive. After that, re-auth Codex/Google once the bundled-plugin startup issue is dealt with.

I would not use OpenRouter/free models as the first smoke test here. They’re fine later, but bad for diagnosis because “API key exists” doesn’t mean “this specific free model is available, compatible, not rate-limited, and fast right now.”

Docs that match this flow:

true skiff

PS D:\Caelan\.openclaw\workspace> $env:OPENCLAW_CONFIG_PATH="D:\Caelan\openclaw-rescue-noplugins.json"
PS D:\Caelan\.openclaw\workspace>
PS D:\Caelan\.openclaw\workspace> openclaw models fallbacks clear
Config warnings:

  • plugins.entries.qqbot: plugin not found: qqbot (stale config entry ignored; remove it from plugins config)

🦞 OpenClaw 2026.4.25 (aa36ee6) — Claws out, commit in—let's ship something mildly responsible.

|
o Config warnings -------------------------------------------------------------------------+
| |
| - plugins.entries.qqbot: plugin not found: qqbot (stale config entry ignored; remove it |
| from plugins config) |
| |
+-------------------------------------------------------------------------------------------+
Config warnings:

  • plugins.entries.qqbot: plugin not found: qqbot (stale config entry ignored; remove it from plugins config)
Config write rejected: D:\Caelan\openclaw-rescue-noplugins.json (size-drop:45893->15277). Rejected payload saved to D:\Caelan\openclaw-rescue-noplugins.json.rejected.2026-04-28T06-51-45-229Z.
Error: Config write rejected: D:\Caelan\openclaw-rescue-noplugins.json (size-drop:45893->15277). Rejected payload saved to D:\Caelan\openclaw-rescue-noplugins.json.rejected.2026-04-28T06-51-45-229Z.
PS D:\Caelan\.openclaw\workspace> openclaw models set openai/gpt-4.1
Config warnings:
  • plugins.entries.qqbot: plugin not found: qqbot (stale config entry ignored; remove it from plugins config)

🦞 OpenClaw 2026.4.25 (aa36ee6) — Somewhere between 'hello world' and 'oh god what have I built.'

|
o Config warnings -------------------------------------------------------------------------+
| |
| - plugins.entries.qqbot: plugin not found: qqbot (stale config entry ignored; remove it |
| from plugins config) |
| |
+-------------------------------------------------------------------------------------------+
Config warnings:

  • plugins.entries.qqbot: plugin not found: qqbot (stale config entry ignored; remove it from plugins config)
Config write rejected: D:\Caelan\openclaw-rescue-noplugins.json (size-drop:45893->16098). Rejected payload saved to D:\Caelan\openclaw-rescue-noplugins.json.rejected.2026-04-28T06-51-48-367Z.
Error: Config write rejected: D:\Caelan\openclaw-rescue-noplugins.json (size-drop:45893->16098). Rejected payload saved to D:\Caelan\openclaw-rescue-noplugins.json.rejected.2026-04-28T06-51-48-367Z.
PS D:\Caelan\.openclaw\workspace> openclaw infer model run --local --model openai/gpt-4.1 --prompt "Reply with exactly: smoke-ok" --json
Config warnings:

  • plugins.entries.qqbot: plugin not found: qqbot (stale config entry ignored; remove it from plugins config)
00:52:50 [diagnostic] lane task error: lane=main durationMs=8651 error="Error: EPERM: operation not permitted, open 'D:\Caelan\.openclaw\agents\main\sessions\ceccad72-54c4-4ad6-a03e-816f210749b4.jsonl.lock'"
00:52:50 [diagnostic] lane task error: lane=session:agent:main:main durationMs=8757 error="Error: EPERM: operation not permitted, open 'D:\Caelan\.openclaw\agents\main\sessions\ceccad72-54c4-4ad6-a03e-816f210749b4.jsonl.lock'"
00:52:50 [model-fallback/decision] model fallback decision: decision=candidate_failed requested=openai/gpt-4.1 candidate=openai/gpt-4.1 reason=unknown next=openai-codex/gpt-5.5 detail=EPERM: operation not permitted, open 'D:\Caelan\.openclaw\agents\main\sessions\ceccad72-54c4-4ad6-a03e-816f210749b4.jsonl.lock'
00:52:51 [diagnostic] lane task error: lane=main durationMs=1141 error="Error: EPERM: operation not permitted, open 'D:\Caelan\.openclaw\agents\main\sessions\ceccad72-54c4-4ad6-a03e-816f210749b4.jsonl.lock'"
00:52:51 [diagnostic] lane task error: lane=session:agent:main:main durationMs=1154 error="Error: EPERM: operation not permitted, open 'D:\Caelan\.openclaw\agents\main\sessions\ceccad72-54c4-4ad6-a03e-816f210749b4.jsonl.lock'"
00:52:51 [model-fallback/decision] model fallback decision: decision=candidate_failed requested=openai/gpt-4.1 candidate=openai-codex/gpt-5.5 reason=unknown next=openai/gpt-5.4 detail=EPERM: operation not permitted, open 'D:\Caelan\.openclaw\agents\main\sessions\ceccad72-54c4-4ad6-a03e-816f210749b4.jsonl.lock'
00:52:52 [diagnostic] lane task error: lane=main durationMs=496 error="Error: EPERM: operation not permitted, open 'D:\Caelan\.openclaw\agents\main\sessions\ceccad72-54c4-4ad6-a03e-816f210749b4.jsonl.lock'"
00:52:52 [diagnostic] lane task error: lane=session:agent:main:main durationMs=509 error="Error: EPERM: operation not permitted, open 'D:\Caelan\.openclaw\agents\main\sessions\ceccad72-54c4-4ad6-a03e-816f210749b4.jsonl.lock'"
00:52:52 [model-fallback/decision] model fallback decision: decision=candidate_failed requested=openai/gpt-4.1 candidate=openai/gpt-5.4 reason=unknown next=openai-codex/gpt-5.4 detail=EPERM: operation not permitted, open 'D:\Caelan\.openclaw\agents\main\sessions\ceccad72-54c4-4ad6-a03e-816f210749b4.jsonl.lock'
00:52:53 [diagnostic] lane task error: lane=main durationMs=962 error="Error: EPERM: operation not permitted, open 'D:\Caelan\.openclaw\agents\main\sessions\ceccad72-54c4-4ad6-a03e-816f210749b4.jsonl.lock'"
00:52:53 [diagnostic] lane task error: lane=session:agent:main:main durationMs=972 error="Error: EPERM: operation not permitted, open 'D:\Caelan\.openclaw\agents\main\sessions\ceccad72-54c4-4ad6-a03e-816f210749b4.jsonl.lock'"
00:52:53 [model-fallback/decision] model fallback decision: decision=candidate_failed requested=openai/gpt-4.1 candidate=openai-codex/gpt-5.4 reason=unknown next=google/gemini-2.5-flash detail=EPERM: operation not permitted, open

willow willowBOT

That write rejection is actually good news: OpenClaw refused to rewrite your 45KB rescue config into a 15KB clobbered-looking version. Your active config should be unchanged; the .rejected... file is just the failed attempted write.

For now, stop doing config-writing commands like models set / fallbacks clear on this rescue config.

If openclaw models set openai/gpt-4.1 is still sitting there, hit Ctrl+C.

Next, do a no-write smoke test instead:

$env:OPENCLAW_CONFIG_PATH="D:\Caelan\openclaw-rescue-noplugins.json"

openclaw infer model run --local --model openai/gpt-4.1 --prompt "Reply with exactly: smoke-ok" --json

This tests the model/auth path without changing your config or starting the gateway.

If that works, then test the ones you actually care about, one at a time:

openclaw infer model run --local --model openai-codex/gpt-5.5 --prompt "Reply with exactly: codex-ok" --json

And for Google, use whichever exact model ref your models status showed, e.g.:

openclaw infer model run --local --model google-antigravity/gemini-3-flash --prompt "Reply with exactly: google-ok" --json

If Codex/Google fail with expired OAuth, that’s a separate auth refresh issue — but first let’s prove one clean provider works.

Docs:
https://docs.openclaw.ai/cli/infer
https://docs.openclaw.ai/cli/config

true skiff

'D:\Caelan\.openclaw\agents\main\sessions\ceccad72-54c4-4ad6-a03e-816f210749b4.jsonl.lock'
00:52:54 [diagnostic] lane task error: lane=main durationMs=672 error="Error: EPERM: operation not permitted, open 'D:\Caelan\.openclaw\agents\main\sessions\ceccad72-54c4-4ad6-a03e-816f210749b4.jsonl.lock'"
00:52:54 [diagnostic] lane task error: lane=session:agent:main:main durationMs=684 error="Error: EPERM: operation not permitted, open 'D:\Caelan\.openclaw\agents\main\sessions\ceccad72-54c4-4ad6-a03e-816f210749b4.jsonl.lock'"
00:52:54 [model-fallback/decision] model fallback decision: decision=candidate_failed requested=openai/gpt-4.1 candidate=google/gemini-2.5-flash reason=unknown next=openai-codex/gpt-5.1-codex-max detail=EPERM: operation not permitted, open 'D:\Caelan\.openclaw\agents\main\sessions\ceccad72-54c4-4ad6-a03e-816f210749b4.jsonl.lock'
00:52:55 [diagnostic] lane task error: lane=main durationMs=959 error="Error: EPERM: operation not permitted, open 'D:\Caelan\.openclaw\agents\main\sessions\ceccad72-54c4-4ad6-a03e-816f210749b4.jsonl.lock'"
00:52:55 [diagnostic] lane task error: lane=session:agent:main:main durationMs=970 error="Error: EPERM: operation not permitted, open 'D:\Caelan\.openclaw\agents\main\sessions\ceccad72-54c4-4ad6-a03e-816f210749b4.jsonl.lock'"
00:52:55 [model-fallback/decision] model fallback decision: decision=candidate_failed requested=openai/gpt-4.1 candidate=openai-codex/gpt-5.1-codex-max reason=unknown next=openrouter/arcee-ai/trinity-large-preview:free detail=EPERM: operation not permitted, open 'D:\Caelan\.openclaw\agents\main\sessions\ceccad72-54c4-4ad6-a03e-816f210749b4.jsonl.lock'
00:52:55 [diagnostic] lane task error: lane=main durationMs=665 error="Error: EPERM: operation not permitted, open 'D:\Caelan\.openclaw\agents\main\sessions\ceccad72-54c4-4ad6-a03e-816f210749b4.jsonl.lock'"
00:52:55 [diagnostic] lane task error: lane=session:agent:main:main durationMs=677 error="Error: EPERM: operation not permitted, open 'D:\Caelan\.openclaw\agents\main\sessions\ceccad72-54c4-4ad6-a03e-816f210749b4.jsonl.lock'"
00:52:55 [model-fallback/decision] model fallback decision: decision=candidate_failed requested=openai/gpt-4.1 candidate=openrouter/arcee-ai/trinity-large-preview:free reason=unknown next=openrouter/openrouter/auto detail=EPERM: operation not permitted, open

willow willowBOT

Yep — that second rejection is from the models set command finishing/failing. Same story: config protected itself, no write committed.

Now the important bit is the command you just ran:

openclaw infer model run --local --model openai/gpt-4.1 --prompt "Reply with exactly: smoke-ok" --json

If it returns, paste the output.

If it sits there for more than ~60–90 seconds, hit Ctrl+C and rerun with the bundled-plugin disable env too:

$env:OPENCLAW_CONFIG_PATH="D:\Caelan\openclaw-rescue-noplugins.json"
$env:OPENCLAW_DISABLE_BUNDLED_PLUGINS="1"
true skiff

'D:\Caelan\.openclaw\agents\main\sessions\ceccad72-54c4-4ad6-a03e-816f210749b4.jsonl.lock'
00:52:56 [diagnostic] lane task error: lane=main durationMs=476 error="Error: EPERM: operation not permitted, open 'D:\Caelan\.openclaw\agents\main\sessions\ceccad72-54c4-4ad6-a03e-816f210749b4.jsonl.lock'"
00:52:56 [diagnostic] lane task error: lane=session:agent:main:main durationMs=487 error="Error: EPERM: operation not permitted, open 'D:\Caelan\.openclaw\agents\main\sessions\ceccad72-54c4-4ad6-a03e-816f210749b4.jsonl.lock'"
00:52:56 [model-fallback/decision] model fallback decision: decision=candidate_failed requested=openai/gpt-4.1 candidate=openrouter/openrouter/auto reason=unknown next=openrouter/openrouter/free detail=EPERM: operation not permitted, open 'D:\Caelan\.openclaw\agents\main\sessions\ceccad72-54c4-4ad6-a03e-816f210749b4.jsonl.lock'
00:52:57 [diagnostic] lane task error: lane=main durationMs=483 error="Error: EPERM: operation not permitted, open 'D:\Caelan\.openclaw\agents\main\sessions\ceccad72-54c4-4ad6-a03e-816f210749b4.jsonl.lock'"
00:52:57 [diagnostic] lane task error: lane=session:agent:main:main durationMs=495 error="Error:

willow willowBOT
openclaw infer model run --local --model openai/gpt-4.1 --prompt "Reply with exactly: smoke-ok" --json

That keeps us away from the same bundled runtime-deps lock path while testing the direct model call.

true skiff

EPERM: operation not permitted, open 'D:\Caelan\.openclaw\agents\main\sessions\ceccad72-54c4-4ad6-a03e-816f210749b4.jsonl.lock'"
00:52:57 [model-fallback/decision] model fallback decision: decision=candidate_failed requested=openai/gpt-4.1 candidate=openrouter/openrouter/free reason=unknown next=openrouter/z-ai/glm-4.7 detail=EPERM: operation not permitted, open 'D:\Caelan\.openclaw\agents\main\sessions\ceccad72-54c4-4ad6-a03e-816f210749b4.jsonl.lock'
00:52:57 [diagnostic] lane task error: lane=main durationMs=484 error="Error: EPERM: operation not permitted, open 'D:\Caelan\.openclaw\agents\main\sessions\ceccad72-54c4-4ad6-a03e-816f210749b4.jsonl.lock'"
00:52:57 [diagnostic] lane task error: lane=session:agent:main:main durationMs=497 error="Error: EPERM: operation not permitted, open 'D:\Caelan\.openclaw\agents\main\sessions\ceccad72-54c4-4ad6-a03e-816f210749b4.jsonl.lock'"
00:52:57 [model-fallback/decision] model fallback decision: decision=candidate_failed requested=openai/gpt-4.1 candidat

willow willowBOT

Okay, this is not a model failure yet. The real failure is filesystem/session locking:

EPERM: operation not permitted, open
D:\Caelan\.openclaw\agents\main\sessions\...\jsonl.lock

That explains why it starts “falling back”. OpenClaw is treating the local lock-file error as a model-candidate failure. Annoying, but useful: we need to clear the session lock problem before model tests mean anything.
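One way to confirm this from the pasted output is to group the `model fallback decision` lines by their `detail=` text. If every candidate lands in a single group with the same `EPERM ... .jsonl.lock` detail, the failures share one local cause and are not per-model auth problems. This is a sketch that assumes the log line format shown in this thread. `Get-FallbackFailureSummary` is a hypothetical helper.

```powershell
# Hypothetical helper: group "model fallback decision" lines by their failure detail.
# Assumes the line format seen in this thread, e.g.
#   ... decision=candidate_failed requested=A candidate=B reason=unknown next=C detail=TEXT
function Get-FallbackFailureSummary {
    param([Parameter(Mandatory)][string[]]$Lines)

    $pattern = 'decision=candidate_failed\s.*?candidate=(?<candidate>\S+)\s.*?detail=(?<detail>.*)$'
    $rows = foreach ($line in $Lines) {
        if ($line -match $pattern) {
            [pscustomobject]@{
                Candidate = $Matches['candidate']
                Detail    = $Matches['detail'].Trim()
            }
        }
    }
    if (-not $rows) { return }

    # One output row per distinct detail; a single row means one shared root cause.
    $rows | Group-Object -Property Detail | ForEach-Object {
        [pscustomobject]@{
            Detail     = $_.Name
            Count      = $_.Count
            Candidates = $_.Group.Candidate -join ', '
        }
    }
}
```

For example, save the pasted output to a text file and run `Get-FallbackFailureSummary -Lines (Get-Content .\fallback-paste.txt)`. The file name is only a placeholder. Lines cut off mid-message (like the ones ending in `open`) form their own small group, so check that the largest group covers the real failures.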

Please do this carefully:

# 1) Stop the currently running command
# Press Ctrl+C if it is still running

# 2) Stop gateway if it is listening
$gwPid = (Get-NetTCPConnection -LocalPort 18789 -State Listen -ErrorAction SilentlyContinue).OwningProcess | Select-Object -First 1
if ($gwPid) { Stop-Process -Id $gwPid -Force }

# 3) Show any remaining OpenClaw processes
Get-CimInstance Win32_Process |
  Where-Object { $_.CommandLine -match 'openclaw' } |
  Select-Object ProcessId,Name,CommandLine

If that prints leftover openclaw gateway, openclaw infer, or similar processes, stop those too.

Then move aside only the session lock files:

$sessionDir = "D:\Caelan\.openclaw\agents\main\sessions"
$stamp = Get-Date -Format "yyyyMMdd-HHmmss"

Get-ChildItem $sessionDir -Filter "*.jsonl.lock" -Force |
  Rename-Item -NewName { "$($_.Name).stale-$stamp" }
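You may want to see exactly which lock files will be touched before renaming anything. The same rename can be wrapped in a function that supports `-WhatIf` for a dry run. This is only a sketch. `Move-StaleSessionLock` is a hypothetical helper, and it still assumes every OpenClaw process has already been stopped; it doesn't check.

```powershell
# Hypothetical helper: rename *.jsonl.lock files aside; supports -WhatIf for a dry run.
# Assumes all OpenClaw processes are already stopped; it does not check for them.
function Move-StaleSessionLock {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        [Parameter(Mandatory)][string]$SessionDir,
        [string]$Stamp = (Get-Date -Format 'yyyyMMdd-HHmmss')
    )

    # Snapshot the list first so renamed files are never picked up a second time.
    $locks = @(Get-ChildItem -LiteralPath $SessionDir -Filter '*.jsonl.lock' -File -Force)
    foreach ($lock in $locks) {
        $newName = "$($lock.Name).stale-$Stamp"
        if ($PSCmdlet.ShouldProcess($lock.FullName, "Rename to $newName")) {
            Rename-Item -LiteralPath $lock.FullName -NewName $newName
            $newName   # emit the new name so the caller can see what changed
        }
    }
}
```

Run `Move-StaleSessionLock -SessionDir "D:\Caelan\.openclaw\agents\main\sessions" -WhatIf` first to print what would be renamed. Then drop `-WhatIf` to actually do it. The session `.jsonl` files themselves are never matched by the filter.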

Then rerun the no-write smoke test with bundled plugins disabled:

$env:OPENCLAW_CONFIG_PATH="D:\Caelan\openclaw-rescue-noplugins.json"
$env:OPENCLAW_DISABLE_BUNDLED_PLUGINS="1"

openclaw infer model run --local --model openai/gpt-4.1 --prompt "Reply with exactly: smoke-ok" --json

I’m intentionally not telling you to run doctor --fix here, even though doctor normally knows about stale session locks, because this install already has a bundled runtime-deps lock regression and I don’t want to poke that path more than necessary.


Yep, same root error repeating. Don’t wait for it to finish the fallback chain — it’ll just keep misclassifying the lock failure as model failures.

Hit Ctrl+C now, then do the process/lock cleanup from my previous message.

The key point: until this file can be created/opened…

D:\Caelan\.openclaw\agents\main\sessions\ceccad72-54c4-4ad6-a03e-816f210749b4.jsonl.lock

…none of the model results are meaningful. This is Windows/filesystem state first, model auth second.
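This EPERM is about opening a file. A quick way to separate "OpenClaw's locking is confused" from "Windows won't let anything write here" (permissions, antivirus, and so on) is to create and delete a throwaway file in the same folder from plain PowerShell. This is a sketch. `Test-DirectoryWritable` is a hypothetical helper. It deliberately uses a random probe name instead of the real `.jsonl.lock` name, so it can never be mistaken for a held session lock. It only tests basic create/delete access, not OpenClaw's own locking logic.

```powershell
# Hypothetical helper: can this directory create, exclusively open, and delete a new file?
function Test-DirectoryWritable {
    param([Parameter(Mandatory)][string]$Path)

    # Random name so the probe can never collide with a real *.jsonl.lock file.
    $probe = [System.IO.Path]::Combine($Path, 'probe-' + [guid]::NewGuid().ToString('N') + '.tmp')
    try {
        # CreateNew fails if the file exists; FileShare.None allows no other opener.
        $mode   = [System.IO.FileMode]::CreateNew
        $access = [System.IO.FileAccess]::ReadWrite
        $share  = [System.IO.FileShare]::None
        $fs = [System.IO.File]::Open($probe, $mode, $access, $share)
        $fs.Dispose()
        Remove-Item -LiteralPath $probe -Force -ErrorAction Stop
        return $true
    } catch {
        # Access-denied, sharing, and missing-directory failures all surface here.
        Write-Warning "Cannot create/delete a file in ${Path}: $($_.Exception.Message)"
        return $false
    }
}
```

Usage, after every OpenClaw process is stopped: `Test-DirectoryWritable -Path "D:\Caelan\.openclaw\agents\main\sessions"`. A `$false` here points at Windows/filesystem state rather than OpenClaw itself.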


Yep — Ctrl+C now. It’s just walking the fallback list and every candidate will fail for the same lock-file reason.

After Ctrl+C, run this exact block:

$gwPid = (Get-NetTCPConnection -LocalPort 18789 -State Listen -ErrorAction SilentlyContinue).OwningProcess | Select-Object -First 1
if ($gwPid) { Stop-Process -Id $gwPid -Force }

Get-CimInstance Win32_Process |
  Where-Object { $_.CommandLine -match 'openclaw' } |
  Select-Object ProcessId,Name,CommandLine

Paste what that prints before deleting/renaming anything else. That’ll tell us whether another OpenClaw process is holding or racing the session lock.


Still the same loop. Please don’t let it run further.

Do this now:

  1. Press Ctrl+C in that PowerShell window.
  2. Run:
$gwPid = (Get-NetTCPConnection -LocalPort 18789 -State Listen -ErrorAction SilentlyContinue).OwningProcess | Select-Object -First 1
if ($gwPid) { Stop-Process -Id $gwPid -Force }

Get-CimInstance Win32_Process |
  Where-Object { $_.CommandLine -match 'openclaw' } |
  Select-Object ProcessId,Name,CommandLine
  3. Paste that output.

We need to stop the process that’s causing/holding the .jsonl.lock problem before any more model testing.


RainChecked — pause and hit Ctrl+C. The pasted output is still from the same running command.

I don’t need more fallback lines; I need the output of this after it stops:

Get-CimInstance Win32_Process |
  Where-Object { $_.CommandLine -match 'openclaw' } |
  Select-Object ProcessId,Name,CommandLine

If Ctrl+C doesn’t stop it, close that PowerShell window and open a fresh one, then run the process check above.


Stop — that’s still the same runaway command output.

Do not paste more fallback lines. Do this instead:

  1. Press Ctrl+C.
  2. If it does not stop, close that PowerShell window entirely.
  3. Open a fresh PowerShell and run:
Get-CimInstance Win32_Process |
  Where-Object { $_.CommandLine -match 'openclaw' } |
  Select-Object ProcessId,Name,CommandLine

Paste only that process-list output.

If you still see OpenClaw processes there, we’ll kill those next. The model test is blocked until the .jsonl.lock EPERM loop is gone.

true skiff

PS D:\Caelan\.openclaw\workspace> Get-CimInstance Win32_Process |

Where-Object { $_.CommandLine -match 'openclaw' } |
Select-Object ProcessId,Name,CommandLine

ProcessId Name CommandLine


 3448 node.exe "D:\node.exe" D:\Caelan\npm-global/node_modules/openclaw/openclaw.mjs gateway run --verbose
willow willowBOT

Good, found it. You still have a gateway process running:

PID 3448 node.exe ... openclaw.mjs gateway run --verbose

Stop that one:

Stop-Process -Id 3448 -Force

Then confirm OpenClaw is fully stopped:

Get-CimInstance Win32_Process |
  Where-Object { $_.CommandLine -match 'openclaw' } |
  Select-Object ProcessId,Name,CommandLine

If that prints nothing, then move aside the session lock files:

$sessionDir = "D:\Caelan\.openclaw\agents\main\sessions"
$stamp = Get-Date -Format "yyyyMMdd-HHmmss"

Get-ChildItem $sessionDir -Filter "*.jsonl.lock" -Force |
  Rename-Item -NewName { "$($_.Name).stale-$stamp" }

Then try the no-write smoke test again:

$env:OPENCLAW_CONFIG_PATH="D:\Caelan\openclaw-rescue-noplugins.json"
$env:OPENCLAW_DISABLE_BUNDLED_PLUGINS="1"

openclaw infer model run --local --model openai/gpt-4.1 --prompt "Reply with exactly: smoke-ok" --json
true skiff

just trying to fall back over and over

willow willowBOT

Yep — because the gateway process is still running and continuing the old agent turn. It’s not going to recover on its own; it’ll just walk the whole fallback list.

Kill the gateway first. Don’t run any more model commands yet.

Stop-Process -Id 3448 -Force

Then verify:

Get-CimInstance Win32_Process |
  Where-Object { $_.CommandLine -match 'openclaw' } |
  Select-Object ProcessId,Name,CommandLine

If that prints nothing, we’re good.


If PID 3448 or another openclaw ... gateway run comes back, then something is restarting it; run:

openclaw gateway stop

Then check the process list again.

Once the process list is empty, we can clear the .jsonl.lock files and retry. The fallback loop is a symptom of the stuck gateway/session lock, not proof that all your models are broken.

true skiff

i DID kill it. the smoke test is causing model fallback over and over

willow willowBOT

You’re right — I’m sorry. I treated the smoke test as isolated, but in 2026.4.25 it’s still hitting the main agent/session path, so it keeps tripping over the same .jsonl.lock and then falsely falls back through every model.

Don’t run that smoke test again yet.

Do this instead: move the broken session lock/session dir out of the way after killing OpenClaw.

# confirm no OpenClaw process is running
Get-CimInstance Win32_Process |
  Where-Object { $_.CommandLine -match 'openclaw' } |
  Select-Object ProcessId,Name,CommandLine

If that prints nothing, run:

$sessionDir = "D:\Caelan\.openclaw\agents\main\sessions"
$stamp = Get-Date -Format "yyyyMMdd-HHmmss"

Rename-Item $sessionDir "$sessionDir.stuck-$stamp"
New-Item -ItemType Directory -Path $sessionDir | Out-Null

That preserves the old sessions as sessions.stuck-... but gives OpenClaw a clean sessions folder.

Then retry once:

$env:OPENCLAW_CONFIG_PATH="D:\Caelan\openclaw-rescue-noplugins.json"
$env:OPENCLAW_DISABLE_BUNDLED_PLUGINS="1"
openclaw infer model run --local --model openai/gpt-4.1 --prompt "Reply with exactly: smoke-ok" --json

If it still falls back after that, stop immediately — then we switch to a raw OpenAI API curl/PowerShell test outside OpenClaw entirely.

true skiff

yep doing it again:
PS D:\Caelan\.openclaw\workspace> $sessionDir = "D:\Caelan\.openclaw\agents\main\sessions"
PS D:\Caelan\.openclaw\workspace> $stamp = Get-Date -Format "yyyyMMdd-HHmmss"
PS D:\Caelan\.openclaw\workspace>
PS D:\Caelan\.openclaw\workspace> Rename-Item $sessionDir "$sessionDir.stuck-$stamp"
PS D:\Caelan\.openclaw\workspace> New-Item -ItemType Directory -Path $sessionDir | Out-Null
PS D:\Caelan\.openclaw\workspace> $env:OPENCLAW_CONFIG_PATH="D:\Caelan\openclaw-rescue-noplugins.json"
PS D:\Caelan\.openclaw\workspace> $env:OPENCLAW_DISABLE_BUNDLED_PLUGINS="1"
PS D:\Caelan\.openclaw\workspace> $env:OPENCLAW_CONFIG_PATH="D:\Caelan\openclaw-rescue-noplugins.json"
PS D:\Caelan\.openclaw\workspace> $env:OPENCLAW_DISABLE_BUNDLED_PLUGINS="1"
PS D:\Caelan\.openclaw\workspace> openclaw infer model run --local --model openai/gpt-4.1 --prompt "Reply with exactly: smoke-ok" --json
Config warnings:

  • plugins.entries.qqbot: plugin not found: qqbot (stale config entry ignored; remove it from plugins config)
01:24:49 [agent/embedded] workspace bootstrap file AGENTS.md is 14572 chars (limit 12000); truncating in injected context (sessionKey=agent:main:main)
01:24:49 [agent/embedded] workspace bootstrap file MEMORY.md is 27114 chars (limit 12000); truncating in injected context (sessionKey=agent:main:main)
01:24:50 [diagnostic] lane task error: lane=main durationMs=8851 error="Error: Unable to resolve bundled plugin public surface speech-core/runtime-api.js"
01:24:50 [diagnostic] lane task error: lane=session:agent:main:main durationMs=8865 error="Error: Unable to resolve bundled plugin public surface speech-core/runtime-api.js"
01:24:50 [model-fallback/decision] model fallback decision: decision=candidate_failed requested=openai/gpt-4.1 candidate=openai/gpt-4.1 reason=unknown next=openai-codex/gpt-5.5 detail=Unable to resolve bundled plugin public surface speech-core/runtime-api.js
willow willowBOT

Yep — stop that test. I’m calling it: OpenClaw model smoke tests are not trustworthy on this install right now.

That new error is the giveaway:

Unable to resolve bundled plugin public surface speech-core/runtime-api.js

So:

  • Without OPENCLAW_DISABLE_BUNDLED_PLUGINS, you hit the bundled runtime-deps/lock problem.
  • With OPENCLAW_DISABLE_BUNDLED_PLUGINS=1, the agent path later needs bundled core surfaces like speech-core, so it fails anyway.
  • The fallback loop is therefore OpenClaw runtime failure, not proof your models are bad.

Please Ctrl+C and don’t run openclaw infer model run again for now.

To test whether your OpenAI API key itself works, test outside OpenClaw:

$body = @{
  model = "gpt-4.1"
  input = "Reply with exactly: smoke-ok"
  max_output_tokens = 20
} | ConvertTo-Json -Depth 5

Invoke-RestMethod `
  -Uri "https://api.openai.com/v1/responses" `
  -Method Post `
  -Headers @{
    "Authorization" = "Bearer $env:OPENAI_API_KEY"
    "Content-Type" = "application/json"
  } `
  -Body $body |
  ConvertTo-Json -Depth 10

OpenAI’s current Responses API uses POST /v1/responses with model + input, authenticated with Authorization: Bearer .... (platform.openai.com)

If that returns smoke-ok, your OpenAI key is fine and this is squarely an OpenClaw 2026.4.25 bundled/runtime issue.
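Piping the whole result to `ConvertTo-Json` works, but the `smoke-ok` text is buried in nested objects. This sketch pulls out just the reply text. It assumes the documented Responses API shape: an `output` array whose `message` items carry `content` parts of type `output_text`. `Get-ResponseText` is a hypothetical helper name.

```powershell
# Hypothetical helper: join the output_text parts of a Responses API result object.
# Assumes the documented shape: output[] -> (type "message") -> content[] -> (type "output_text").text
function Get-ResponseText {
    param([Parameter(Mandatory)]$Response)

    $parts = foreach ($item in @($Response.output)) {
        if ($item.type -ne 'message') { continue }   # skip non-message items
        foreach ($part in @($item.content)) {
            if ($part.type -eq 'output_text') { $part.text }
        }
    }
    $parts -join ''
}
```

To use it, drop the trailing `| ConvertTo-Json -Depth 10` from the request and assign the `Invoke-RestMethod` result to `$resp`. Then run `Get-ResponseText -Response $resp`. It should print just the model's reply.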

At this point I’d preserve these exact two errors for a GitHub issue:

Timed out waiting for bundled runtime deps lock
Unable to resolve bundled plugin public surface speech-core/runtime-api.js
true skiff

PS D:\Caelan\.openclaw\workspace> $body = @{

model = "gpt-4.1"
input = "Reply with exactly: smoke-ok"
max_output_tokens = 20
} | ConvertTo-Json -Depth 5
PS D:\Caelan\.openclaw\workspace>
PS D:\Caelan\.openclaw\workspace> Invoke-RestMethod -Uri "https://api.openai.com/v1/responses"
-Method Post -Headers @{ "Authorization" = "Bearer $env:OPENAI_API_KEY" "Content-Type" = "application/json" }
-Body $body |
ConvertTo-Json -Depth 10
Invoke-RestMethod : Specified value has invalid Control characters.
Parameter name: value
At line:1 char:1

+ Invoke-RestMethod `
    + CategoryInfo          : NotSpecified: (:) [Invoke-RestMethod], ArgumentException
    + FullyQualifiedErrorId : System.ArgumentException,Microsoft.PowerShell.Commands.InvokeRestMethodCommand
willow willowBOT

That error is from PowerShell rejecting the header value before it even sends the request. Most likely $env:OPENAI_API_KEY has a hidden newline/control character in it.

Clean it into a local variable and retry — don’t paste the key here:

$key = ($env:OPENAI_API_KEY -replace '[\x00-\x20]', '')

$body = @{
  model = "gpt-4.1"
  input = "Reply with exactly: smoke-ok"
  max_output_tokens = 20
} | ConvertTo-Json -Depth 5

Invoke-RestMethod `
  -Uri "https://api.openai.com/v1/responses" `
  -Method Post `
  -Headers @{ Authorization = "Bearer $key" } `
  -ContentType "application/json" `
  -Body $body |
  ConvertTo-Json -Depth 10

OpenAI’s docs show the Responses endpoint using POST /v1/responses with model + input, and API keys are sent via Authorization: Bearer .... (platform.openai.com)

If that works, fix the env var permanently in that terminal:

$env:OPENAI_API_KEY = $key

If it still errors, run this without showing the key:

"length=$($env:OPENAI_API_KEY.Length)"
"controlChars=$((($env:OPENAI_API_KEY.ToCharArray() | Where-Object { [int]$_ -lt 32 }).Count))"
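The same length/control-character check can be done outside PowerShell. The minimal Python sketch below uses a hypothetical function name and reports only positions and code points, so the key itself never gets printed. It flags C0 controls and DEL (U+007F), which is slightly broader than the `< 32` test in the PowerShell one-liner.

```python
def describe_control_chars(value: str) -> list:
    """Return (index, 'U+XXXX') for every control character in value.

    Only positions and code points are reported, never the secret itself.
    """
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(value)
        if ord(ch) < 32 or ord(ch) == 127  # C0 controls plus DEL
    ]
```

A key copied with a trailing Windows line ending shows up as two hits at the end, e.g. `describe_control_chars("sk-abc\r\n")` gives `[(6, "U+000D"), (7, "U+000A")]`.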
true skiff
#

PS D:\Caelan\.openclaw\workspace> $key = ($env:OPENAI_API_KEY -replace '[\x00-\x20]', '')
PS D:\Caelan\.openclaw\workspace>
PS D:\Caelan\.openclaw\workspace> $body = @{
model = "gpt-4.1"
input = "Reply with exactly: smoke-ok"
max_output_tokens = 20
} | ConvertTo-Json -Depth 5
PS D:\Caelan\.openclaw\workspace>
PS D:\Caelan\.openclaw\workspace> Invoke-RestMethod -Uri "https://api.openai.com/v1/responses"
-Method Post -Headers @{ Authorization = "Bearer $key" }
-ContentType "application/json" `
-Body $body |
ConvertTo-Json -Depth 10
Invoke-RestMethod : Specified value has invalid Control characters.
Parameter name: value
At line:1 char:1
+ Invoke-RestMethod `
    + CategoryInfo          : NotSpecified: (:) [Invoke-RestMethod], ArgumentException
    + FullyQualifiedErrorId : System.ArgumentException,Microsoft.PowerShell.Commands.InvokeRestMethodCommand
willow willowBOT
#

That error is happening locally in PowerShell before the request reaches OpenAI. Your request shape is fine — OpenAI’s Responses API uses POST /v1/responses with Authorization: Bearer $OPENAI_API_KEY. (platform.openai.com)

The key/env var still has a hidden invalid character. Don’t paste the key here — run this instead:

$raw = [string]$env:OPENAI_API_KEY

$bad = for ($i = 0; $i -lt $raw.Length; $i++) {
  $code = [int][char]$raw[$i]
  $cat = [Globalization.CharUnicodeInfo]::GetUnicodeCategory($raw, $i).ToString()
  if ($code -lt 33 -or $code -eq 127 -or ($code -ge 128 -and $code -le 159) -or $cat -in @("Control","Format","LineSeparator","ParagraphSeparator","SpaceSeparator")) {
    [pscustomobject]@{
      Index = $i
      CodePoint = ("U+{0:X4}" -f $code)
      Category = $cat
    }
  }
}

$bad | Format-Table
"raw length=$($raw.Length), bad chars=$(@($bad).Count)"
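For comparison, here is the same Unicode-category scan sketched in Python. `unicodedata.category` returns two-letter codes, so the PowerShell names map to Cc (Control), Cf (Format), Zl (LineSeparator), Zp (ParagraphSeparator) and Zs (SpaceSeparator). The function name and dict keys are illustrative only. Zero-width spaces (Cf) and non-breaking spaces (Zs) are the kind of invisible paste residue this check catches.

```python
import unicodedata

# Two-letter equivalents of the PowerShell category names checked above.
SUSPECT_CATEGORIES = {"Cc", "Cf", "Zl", "Zp", "Zs"}


def find_suspect_chars(raw: str) -> list:
    """List invisible or non-printable characters without echoing the key."""
    bad = []
    for i, ch in enumerate(raw):
        code = ord(ch)
        cat = unicodedata.category(ch)
        # Same numeric ranges as the PowerShell version, plus the category test.
        if code < 33 or code == 127 or 128 <= code <= 159 or cat in SUSPECT_CATEGORIES:
            bad.append({"index": i, "code_point": f"U+{code:04X}", "category": cat})
    return bad
```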

Then make a clean ASCII-printable copy and retry:

$key = -join ([char[]][string]$env:OPENAI_API_KEY | Where-Object {
  $n = [int]$_
  $n -ge 33 -and $n -le 126
})
$key = $key.Trim('"', "'")

$body = @{
  model = "gpt-4.1"
  input = "Reply with exactly: smoke-ok"
  max_output_tokens = 20
} | ConvertTo-Json -Depth 5

Invoke-RestMethod `
  -Uri "https://api.openai.com/v1/responses" `
  -Method Post `
  -Headers @{ Authorization = "Bearer $key" } `
  -ContentType "application/json" `
  -Body $body |
  ConvertTo-Json -Depth 10
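Here is what the cleanup step does, as a Python sketch with a hypothetical helper name. It keeps only printable non-space ASCII (0x21 to 0x7E), like the Where-Object filter, then trims stray quotes. It can only strip junk; it cannot repair a key that was truncated or mistyped.

```python
def clean_key(raw: str) -> str:
    """Keep printable non-space ASCII, then drop surrounding quotes."""
    # Same range as the PowerShell filter: 33..126 inclusive.
    kept = "".join(ch for ch in raw if 33 <= ord(ch) <= 126)
    # Equivalent of $key.Trim('"', "'"): strip quotes from both ends only.
    return kept.strip("\"'")
```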

If that works, reset the env var from the cleaned value and restart the gateway/service so OpenClaw sees the clean environment:

[Environment]::SetEnvironmentVariable("OPENAI_API_KEY", $key, "User")

Then continue the OpenClaw side with the normal triage ladder: openclaw status, openclaw gateway status, logs, then openclaw doctor. (docs.openclaw.ai)

true skiff
#

PS D:\Caelan\.openclaw\workspace> $raw = [string]$env:OPENAI_API_KEY
PS D:\Caelan\.openclaw\workspace>
PS D:\Caelan\.openclaw\workspace> $bad = for ($i = 0; $i -lt $raw.Length; $i++) {
$code = [int][char]$raw[$i]
$cat = [Globalization.CharUnicodeInfo]::GetUnicodeCategory($raw, $i).ToString()
if ($code -lt 33 -or $code -eq 127 -or ($code -ge 128 -and $code -le 159) -or $cat -in @("Control","Format","LineSeparator","ParagraphSeparator","SpaceSeparator")) {
[pscustomobject]@{
Index = $i
CodePoint = ("U+{0:X4}" -f $code)
Category = $cat
}
}
}
PS D:\Caelan\.openclaw\workspace>
PS D:\Caelan\.openclaw\workspace> $bad | Format-Table
PS D:\Caelan\.openclaw\workspace> "raw length=$($raw.Length), bad chars=$(@($bad).Count)"
raw length=52, bad chars=0
PS D:\Caelan\.openclaw\workspace> $key = -join ([char[]][string]$env:OPENAI_API_KEY | Where-Object {
$n = [int]$_
$n -ge 33 -and $n -le 126
})
PS D:\Caelan\.openclaw\workspace> $key = $key.Trim('"', "'")
PS D:\Caelan\.openclaw\workspace> $body = @{
model = "gpt-4.1"
input = "Reply with exactly: smoke-ok"
max_output_tokens = 20
} | ConvertTo-Json -Depth 5
PS D:\Caelan\.openclaw\workspace>
PS D:\Caelan\.openclaw\workspace> Invoke-RestMethod `
-Uri "https://api.openai.com/v1/responses" -Method Post
-Headers @{ Authorization = "Bearer $key" } -ContentType "application/json"
-Body $body |
ConvertTo-Json -Depth 10
Invoke-RestMethod : {
"error": {
"message": "You exceeded your current quota, please check your plan and billing details. For more information on
this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.",
"type": "insufficient_quota",
"param": null,
"code": "insufficient_quota"
}
}
At line:1 char:1
+ Invoke-RestMethod `
    + CategoryInfo          : InvalidOperation: (System.Net.HttpWebRequest:HttpWebRequest) [Invoke-RestMethod], WebException
    + FullyQualifiedErrorId : WebCmdletWebResponseException,Microsoft.PowerShell.Commands.InvokeRestMethodCommand
willow willowBOT
#

Good — that means the env var doesn’t have obvious whitespace/control junk. You stopped right before the actual retry, so run this next with the cleaned $key you already made:

Invoke-RestMethod `
  -Uri "https://api.openai.com/v1/responses" `
  -Method Post `
  -Headers @{ Authorization = "Bearer $key" } `
  -ContentType "application/json" `
  -Body $body |
  ConvertTo-Json -Depth 10

If it still says invalid control characters, run this tiny sanity check:

"key length=$($key.Length)"
"starts=$($key.Substring(0, [Math]::Min(7,$key.Length)))"
"ends=$($key.Substring([Math]::Max(0,$key.Length-4)))"

$header = "Bearer $key"
$badHeader = for ($i = 0; $i -lt $header.Length; $i++) {
  $n = [int][char]$header[$i]
  if ($n -lt 32 -or $n -gt 126) {
    [pscustomobject]@{ Index=$i; Code=$n; Hex=("0x{0:X2}" -f $n) }
  }
}
$badHeader | Format-Table

Don’t paste the full key, but paste the length/prefix/suffix and whether $badHeader prints anything. If the smoke test succeeds, we can go back to the OpenClaw gateway/plugin lock issue.

#

Good, that’s actually useful: the API key/header is syntactically valid now. The current OpenAI failure is account/billing quota, not the PowerShell control-character issue.

So we have two separate tracks:

  1. Model calls will fail until you either add/fix OpenAI billing/quota or switch OpenClaw to a provider/key with quota.
  2. Gateway /healthz hanging is still likely the bundled plugin runtime-deps lock/root issue, especially given the qqbot lock timeout.
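The split into two tracks comes from the error code in the JSON body, not just from the fact that the call failed. Below is a small Python sketch of that triage. The bucket names are made up. Only the two codes that actually appear in this thread are handled explicitly: insufficient_quota here, and refresh_token_reused in the OpenClaw doctor output.

```python
import json


def classify_openai_error(body: str) -> str:
    """Map an error response body to a coarse troubleshooting bucket."""
    try:
        err = json.loads(body)["error"]
        code = err.get("code") or err.get("type")
    except (ValueError, KeyError, TypeError, AttributeError):
        return "unparseable"
    if code == "insufficient_quota":
        return "quota"    # key accepted; the account needs billing/credit
    if code == "refresh_token_reused":
        return "reauth"   # OAuth refresh token already consumed; sign in again
    return f"other:{code}"
```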

Next I’d isolate the plugin lock cleanly. In PowerShell, first make sure no OpenClaw/npm/node install is running:

Get-Process openclaw,node,npm,pnpm -ErrorAction SilentlyContinue |
  Select-Object Id, ProcessName, Path

If that only shows things you know are safe to stop, stop the gateway/service and retry after moving the stale lock aside — rename, don’t delete:

openclaw gateway stop

$lock = "D:\Caelan\.openclaw\plugin-runtime-deps\openclaw-2026.4.25-eb7d6f0fe52e\.openclaw-runtime-deps.lock"
if (Test-Path $lock) {
  Rename-Item $lock ".openclaw-runtime-deps.lock.stale-$(Get-Date -Format yyyyMMdd-HHmmss)"
}

$env:OPENCLAW_CONFIG_PATH="D:\Caelan\openclaw-rescue-noplugins.json"
openclaw plugins deps --prune
openclaw plugins deps --repair
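The "rename, don't delete" step works the same way for any lock file or staging directory. Here is a hedged Python sketch with a hypothetical helper name. It moves a path aside using the same `.stale-yyyyMMdd-HHmmss` suffix as the PowerShell above, so the original can be restored if the guess was wrong. Run it only after the gateway is stopped: on Windows, a rename can fail while another process still has the file open.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def move_aside(path: Path, now: Optional[datetime] = None) -> Optional[Path]:
    """Rename path to '<name>.stale-YYYYMMDD-HHMMSS' instead of deleting it.

    Returns the new path, or None if path does not exist.
    """
    if not path.exists():
        return None
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    target = path.with_name(f"{path.name}.stale-{stamp}")
    path.rename(target)  # works for a single lock file or a whole directory
    return target
```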

Then try the minimal gateway again:

$env:OPENCLAW_SKIP_CHANNELS="1"
$env:OPENCLAW_SKIP_PROVIDERS="1"
openclaw gateway run

In another PowerShell:

Invoke-WebRequest http://127.0.0.1:18789/healthz -TimeoutSec 5

If plugins deps --repair hangs or the gateway still times out, paste the output from:

openclaw --version
true skiff
#

Hold on one sec. First, I kinda forgot where we left off because I took a day off from troubleshooting. Second, my Codex OAuth weekly usage has reset, so we could just switch back to using that with gpt-5.5.

willow willowBOT
#

No worries — quick reset:

Where we left off:

  • Your OpenAI API key was valid, but the account had insufficient_quota.
  • Separate from that, the gateway hang looked like a bundled plugin runtime-deps lock problem:
    D:\Caelan\.openclaw\plugin-runtime-deps\...\.openclaw-runtime-deps.lock
  • The lock claimed an owner PID that Windows showed as svchost.exe, so it smelled stale/wrong.

If Codex OAuth quota is back, yes — I’d switch model auth back there, but don’t confuse the two Codex paths:

  • Codex OAuth/subscription path: openai-codex/...
  • Native Codex runtime path: openai/... with agentRuntime.id: "codex"

For this rescue, I’d use the simpler OAuth/subscription path first:

openclaw models auth login --provider openai-codex --set-default
openclaw models status --probe-provider openai-codex
openclaw models list --provider openai-codex

Then set the exact model ref shown by the list/status output, e.g. the openai-codex/... one you intend to use:

openclaw models set <exact-provider/model-from-list>

After that, we still need to finish the gateway/plugin-lock repair. Codex quota fixes model calls, but it won’t by itself fix /healthz timing out if startup is stuck on plugin runtime deps.

Docs: https://docs.openclaw.ai/concepts/models and https://docs.openclaw.ai/plugins/codex-harness

true skiff
#

OK, I've done that.

willow willowBOT
#

Perfect. Let’s prove that path works without needing the gateway, then go back to the hang.

Run:

openclaw models status --probe-provider openai-codex
openclaw infer model run --local --prompt "Reply with exactly: codex-smoke-ok" --json

If that returns codex-smoke-ok, model/auth is no longer the blocker.

Then do the gateway lock repair:

openclaw gateway stop
$root = "D:\Caelan\.openclaw\plugin-runtime-deps\openclaw-2026.4.25-eb7d6f0fe52e"
if (Test-Path $root) {
  Rename-Item $root "openclaw-2026.4.25-eb7d6f0fe52e.stale-$(Get-Date -Format yyyyMMdd-HHmmss)"
}

openclaw plugins deps --prune
openclaw plugins deps --repair

Then minimal gateway test:

$env:OPENCLAW_CONFIG_PATH="D:\Caelan\openclaw-rescue-noplugins.json"
$env:OPENCLAW_SKIP_CHANNELS="1"
$env:OPENCLAW_SKIP_PROVIDERS="1"
openclaw gateway run

In a second PowerShell:

Invoke-WebRequest http://127.0.0.1:18789/healthz -TimeoutSec 5

Paste whichever step fails/hangs first. If it works, we’ll re-enable providers/channels one layer at a time.

true skiff
#

Smoke test has been hanging for over 10 minutes šŸ™ But here is the output from before that:
PS D:\Caelan\.openclaw\workspace> openclaw models status --probe-provider openai-codex
Config warnings:

  • plugins.entries.discord: plugin disabled (disabled in config) but config is present
  • plugins.entries.continuity: plugin disabled (disabled in config) but config is present
  • plugins.entries.stability: plugin disabled (disabled in config) but config is present
  • plugins.entries.lossless-claw: plugin disabled (disabled in config) but config is present

šŸ¦ž OpenClaw 2026.4.25 (aa36ee6) — I autocomplete your thoughts—just slower and with more API calls.

Config warnings:

  • plugins.entries.discord: plugin disabled (disabled in config) but config is present
  • plugins.entries.continuity: plugin disabled (disabled in config) but config is present
  • plugins.entries.stability: plugin disabled (disabled in config) but config is present
  • plugins.entries.lossless-claw: plugin disabled (disabled in config) but config is present

Config : D:\Caelan\.openclaw\openclaw.json
Agent dir : D:\Caelan\.openclaw\agents\main\agent
Default : openai-codex/gpt-5.5
Fallbacks (21): openai-codex/gpt-5.5, openai/gpt-5.4, openai-codex/gpt-5.4, google/gemini-2.5-flash, openai-codex/gpt-5.1-codex-max, openrouter/arcee-ai/trinity-large-preview:free, openrouter/auto, openrouter/free, openrouter/z-ai/glm-4.7, openrouter/z-ai/glm-4.5-air:free, google/gemini-3.1-pro-preview, openrouter/google/gemini-3.1-pro-preview, google/gemini-3-flash-preview, openrouter/moonshotai/kimi-k2-thinking, google/gemini-3-pro-preview, openai/gpt-5.4-pro, openai/gpt-5.4, openai/gpt-5.5, openai/gpt-4o-2024-11-20, openai/gpt-4.1, openrouter/google/gemini-3-flash-preview
Image model : openrouter/auto
Image fallbacks (2): google-antigravity/gemini-2.5-flash, google-antigravity/gemini-3-flash-preview
Aliases (6) : wren-venice -> openrouter/cognitivecomputations/dolphin-mistral-24b-venice-edition:free, OpenRouter -> openrouter/auto, gemini -> google/gemini-3.1-pro-preview, gemini-flash-lite -> google/gemini-3.1-flash-lite-preview, gemini-flash -> google/gemini-3-flash-preview, gpt -> openai/gpt-5.4
Configured models (22): openrouter/google/gemini-3-pro-preview, google/gemini-2.5-flash, openai-codex/gpt-5.1-codex-max, openrouter/arcee-ai/trinity-large-preview:free, openrouter/cognitivecomputations/dolphin-mistral-24b-venice-edition:free, openrouter/auto, openrouter/free, openrouter/z-ai/glm-4.7, openrouter/z-ai/glm-4.5-air:free, google/gemini-3.1-pro-preview, google/gemini-3.1-flash-lite-preview, google/gemini-3-flash-preview, openrouter/google/gemini-3.1-pro-preview, openrouter/moonshotai/kimi-k2-thinking, openai-codex/gpt-5.5, openai-codex/gpt-5.4, openai-codex/gpt-5.1, openai/gpt-5.4-pro, openai/gpt-5.4, openai/gpt-5.5, openai/gpt-4o-2024-11-20, openai/gpt-4.1

Auth overview
Auth store : D:\Caelan\.openclaw\agents\main\agent\auth-profiles.json
Shell env : off


Providers w/ OAuth/tokens (2): google-antigravity (1), openai-codex (2)

  • google effective=profiles:D:\Caelan\.openclaw\agents\main\agent\auth-profiles.json | profiles=1 (oauth=0, token=0, api_key=1) | google=AIzaSyCb...K6J9vL48 | models.json=AIzaSyCb...K6J9vL48 | source=models.json: D:\Caelan\.openclaw\agents\main\agent\models.json
  • google-antigravity effective=profiles:D:\Caelan\.openclaw\agents\main\agent\auth-profiles.json | profiles=1 (oauth=1, token=0, api_key=0) | google-antigravity:lindsay.caldwell729@gmail.com=OAuth (lindsay.caldwell729@gmail.com)
  • openai effective=env:sk-Z127v...3UZwGbwF | env=sk-Z127v...3UZwGbwF | source=env: OPENAI_API_KEY
  • openai-codex effective=profiles:D:\Caelan\.openclaw\agents\main\agent\auth-profiles.json | profiles=2 (oauth=2, token=0, api_key=0) | openai-codex:default=OAuth, openai-codex:lindsay.caldwell729@gmail.com=OAuth (lindsay.caldwell729@gmail.com)
  • openrouter effective=profiles:D:\Caelan\.openclaw\agents\main\agent\auth-profiles.json | profiles=1 (oauth=0, token=0, api_key=1) | openrouter:default=sk-or-v1...1fc2fc58 | env=sk-or-v1...1fc2fc58 | source=env: OPENROUTER_API_KEY | models.json=OPENROUT..._API_KEY | source=models.json: D:\Caelan\.openclaw\agents\main\agent\models.json
  • perplexity effective=env:sk-or-v1...1fc2fc58 | env=sk-or-v1...1fc2fc58 | source=env: OPENROUTER_API_KEY

OAuth/token status

willow willowBOT
#

Oof šŸ™. Stop that smoke test with Ctrl+C. A 10+ minute hang means this is not just provider quota anymore.

The fact that models status --probe-provider openai-codex only got as far as warnings + banner strongly suggests OpenClaw is still getting stuck in the plugin/runtime-deps path before the actual model probe finishes.

Let’s temporarily sidestep the old staging dir entirely:

# in this PowerShell only
$env:OPENCLAW_CONFIG_PATH="D:\Caelan\openclaw-rescue-noplugins.json"
$env:OPENCLAW_PLUGIN_STAGE_DIR="D:\Caelan\.openclaw\plugin-runtime-deps-clean"
$env:OPENCLAW_SKIP_CHANNELS="1"
$env:OPENCLAW_SKIP_PROVIDERS="1"

openclaw gateway stop
openclaw plugins deps --repair

If plugins deps --repair hangs too, stop it and run:

Get-Process openclaw,node,npm,pnpm -ErrorAction SilentlyContinue |
  Select-Object Id, ProcessName, Path

Get-ChildItem "D:\Caelan\.openclaw\plugin-runtime-deps*" -Force |
  Select-Object FullName, LastWriteTime

If repair completes, test the gateway with the clean stage dir:

openclaw gateway run

Second PowerShell:

Invoke-WebRequest http://127.0.0.1:18789/healthz -TimeoutSec 5

For now, don’t spend more cycles on Codex/model smoke tests — first we need CLI/gateway startup to stop hanging. The clean OPENCLAW_PLUGIN_STAGE_DIR is the fastest way to confirm whether that old locked/corrupt plugin-runtime-deps root is the culprit.

#

Yep — that confirms the model config is now pointed at openai-codex/gpt-5.5, so the selection part is done.

But if the smoke test/status probe hangs, stop it with Ctrl+C. At this point the blocker is still startup/plugin loading, not model choice.

Next, run the clean plugin-stage isolation exactly like this:

openclaw gateway stop

$env:OPENCLAW_CONFIG_PATH="D:\Caelan\openclaw-rescue-noplugins.json"
$env:OPENCLAW_PLUGIN_STAGE_DIR="D:\Caelan\.openclaw\plugin-runtime-deps-clean"
$env:OPENCLAW_SKIP_CHANNELS="1"
$env:OPENCLAW_SKIP_PROVIDERS="1"

openclaw plugins deps --repair

If that hangs, don’t wait 10 minutes — give it ~60–90 seconds, then Ctrl+C and paste:

Get-Process openclaw,node,npm,pnpm -ErrorAction SilentlyContinue |
  Select-Object Id, ProcessName, Path

Get-ChildItem "D:\Caelan\.openclaw\plugin-runtime-deps*" -Force |
  Select-Object FullName, LastWriteTime
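Instead of watching a terminal for 60 to 90 seconds, a wrapper can enforce the deadline. The Python sketch below uses a hypothetical helper name and assumes the CLI is on PATH. On Windows, an npm-installed CLI may need to be invoked as `openclaw.cmd`; that is an assumption about this install. When the timeout hits, `subprocess.run` kills only the direct child, so a leftover node process can still hold the lock. The Get-Process check above still applies.

```python
import subprocess
from typing import List, Tuple


def run_with_deadline(cmd: List[str], seconds: float) -> Tuple[str, str]:
    """Run cmd, giving up after `seconds`.

    Returns ("ok" | "failed" | "hung", captured output).
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=seconds)
    except subprocess.TimeoutExpired as exc:
        # run() has already killed the direct child; partial output may be bytes.
        partial = exc.stdout or b""
        if isinstance(partial, bytes):
            partial = partial.decode(errors="replace")
        return "hung", partial
    status = "ok" if proc.returncode == 0 else "failed"
    return status, proc.stdout + proc.stderr
```

For example, `run_with_deadline(["openclaw", "doctor", "--fix"], 90)` returns "hung" rather than blocking indefinitely.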

If plugins deps --repair completes, then:

openclaw gateway run

And in a second PowerShell:

Invoke-WebRequest http://127.0.0.1:18789/healthz -TimeoutSec 5

Tiny but important: don’t keep testing Codex until the clean-stage test passes. The model config looks right; now we need to unstick the runtime-deps/plugin path.

#

Good news: Codex auth looks usable — Week 100% left and one openai-codex profile is ok. The expired openai-codex:default is noise, but let’s keep it out of rotation.

First stop the hanging smoke test with Ctrl+C, then run:

openclaw models auth order set --provider openai-codex "<the openai-codex profile id that says ok>"
openclaw models auth order get --provider openai-codex

Use the exact profile id from your output — the one ending in your email, not openai-codex:default.

Then go back to the real blocker: plugin/runtime-deps startup.

openclaw gateway stop
$env:OPENCLAW_CONFIG_PATH="D:\Caelan\openclaw-rescue-noplugins.json"
$env:OPENCLAW_PLUGIN_STAGE_DIR="D:\Caelan\.openclaw\plugin-runtime-deps-clean"
$env:OPENCLAW_SKIP_CHANNELS="1"
$env:OPENCLAW_SKIP_PROVIDERS="1"
$env:OPENCLAW_PLUGIN_LIFECYCLE_TRACE="1"

openclaw plugins deps --repair

If that hangs for more than ~90 seconds, Ctrl+C and paste only:

Get-Process openclaw,node,npm,pnpm -ErrorAction SilentlyContinue |
  Select-Object Id, ProcessName, Path
Get-ChildItem "D:\Caelan\.openclaw\plugin-runtime-deps*" -Force |
  Select-Object FullName, LastWriteTime

No need to paste more auth/key output — we’ve got enough there.

true skiff
willow willowBOT
#

Yep, that error is because the email alone is just the label/account part, not the auth profile id.

From your output, the actual profile id is the full thing on the left of =OAuth:

openai-codex:lindsay.caldwell729@gmail.com

So run:

openclaw models auth order set --provider openai-codex "openai-codex:lindsay.caldwell729@gmail.com"
openclaw models auth order get --provider openai-codex
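The rule applied here is inferred from the two ids in the models status output (openai-codex:default and openai-codex:<email>): a profile id is the provider, a colon, then a label. Below is a tiny Python check that rejects a bare label before it reaches the CLI. The function is hypothetical, and the format is an inference from this thread, not documented OpenClaw behavior.

```python
def split_profile_id(profile_id: str) -> tuple:
    """Split 'provider:label' into (provider, label); reject anything else."""
    provider, sep, label = profile_id.partition(":")
    if not sep or not provider or not label:
        raise ValueError(f"not a provider:label profile id: {profile_id!r}")
    return provider, label
```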

If that works, it should stop trying the expired:

openai-codex:default

But this is optional cleanup — the bigger blocker is still the plugin/runtime-deps hang. After that auth-order command, go straight back to the clean stage test:

openclaw gateway stop

$env:OPENCLAW_CONFIG_PATH="D:\Caelan\openclaw-rescue-noplugins.json"
$env:OPENCLAW_PLUGIN_STAGE_DIR="D:\Caelan\.openclaw\plugin-runtime-deps-clean"
$env:OPENCLAW_SKIP_CHANNELS="1"
$env:OPENCLAW_SKIP_PROVIDERS="1"
$env:OPENCLAW_PLUGIN_LIFECYCLE_TRACE="1"

openclaw plugins deps --repair
true skiff
#

it's saying: error: unknown command 'deps'

willow willowBOT
#

Ah — that one’s on me. openclaw plugins deps is from newer/current docs, but it’s not in 2026.4.25, so your install is right to say unknown command.

Use this 2026.4.25-safe path instead:

openclaw gateway stop

$env:OPENCLAW_CONFIG_PATH="D:\Caelan\openclaw-rescue-noplugins.json"
$env:OPENCLAW_PLUGIN_STAGE_DIR="D:\Caelan\.openclaw\plugin-runtime-deps-clean"
$env:OPENCLAW_SKIP_CHANNELS="1"
$env:OPENCLAW_SKIP_PROVIDERS="1"

openclaw doctor --fix

If doctor --fix hangs too, stop it after ~90s and try the most minimal “no bundled plugins at all” gateway test:

$env:OPENCLAW_DISABLE_BUNDLED_PLUGINS="1"
openclaw gateway run

Second PowerShell:

Invoke-WebRequest http://127.0.0.1:18789/healthz -TimeoutSec 5

If that works, we’ve proven the core gateway is okay and bundled plugin loading/runtime-deps is the hang. If it still hangs, it’s deeper than plugins.

Docs refs: https://docs.openclaw.ai/help/faq and https://docs.openclaw.ai/cli/doctor

true skiff
#

T OpenClaw doctor
|
o OAuth refresh errors ------------------------------------------------------------+
| |
| - openai-codex:default: OAuth refresh failed — Try again; if this persists, run |
| openclaw models auth login --provider openai-codex. |
| |
+-----------------------------------------------------------------------------------+
|
o Model auth ---------------------------------------------------------------------+
| |
| - google-antigravity:lindsay.caldwell729@gmail.com: expiring (50m) — openclaw |
| configure. |
| - openai-codex:default: expired (0m) — Re-auth via openclaw models auth login |
| --provider openai-codex or openclaw configure. |
| |
+----------------------------------------------------------------------------------+
|


o Plugin registry -------------------------------------------+
| |
| Plugin registry refreshed: 0/115 enabled plugins indexed. |
| |
+-------------------------------------------------------------+
|
o State integrity --------------------------------------------------------------------------+
| |
| - Multiple state directories detected. This can split session history. |
| - ~\.openclaw |
| Active state dir: D:\Caelan\.openclaw |
| - 1/1 recent sessions are missing transcripts. |
| Verify sessions in store: openclaw sessions --store |
| "D:\Caelan\.openclaw\agents\main\sessions\sessions.json" |
| Preview cleanup impact: openclaw sessions cleanup --store |
| "D:\Caelan\.openclaw\agents\main\sessions\sessions.json" --dry-run |
| Prune missing entries: openclaw sessions cleanup --store |
| "D:\Caelan\.openclaw\agents\main\sessions\sessions.json" --enforce --fix-missing |
| - Main session transcript missing |
| (D:\Caelan\.openclaw\agents\main\sessions\f89328e3-405a-43e4-a789-663fbcb977e0.jsonl). |
| History will appear to reset. |
| |
+--------------------------------------------------------------------------------------------+
|


o Session locks ------------------------------------------------------------------------------+
| |
| - Found 1 session lock file. |
| - D:\Caelan\.openclaw\agents\main\sessions\f89328e3-405a-43e4-a789-663fbcb977e0.jsonl.lock |
| pid=10028 (dead) age=42h35m stale=yes (dead-pid, too-old) [removed] |
| - Removed 1 stale session lock file. |
| |
+----------------------------------------------------------------------------------------------+
[openai-codex] Token refresh failed: 401 {
"error": {
"message": "Your refresh token has already been used to generate a new access token. Please try signing in again.",
"type": "invalid_request_error",
"param": null,
"code": "refresh_token_reused"
}
}
|
o Security ---------------------------------+
| |
| - No channel security warnings detected. |
| - Run: openclaw security audit --deep |
| |
+--------------------------------------------+
|
o Skills status ------------+
| |
| Eligible: 16 |
| Missing requirements: 46 |
| Blocked by allowlist: 0 |
| |
+----------------------------+
|

willow willowBOT
#

That doctor output is auth cleanup noise, not the main hang.

  • openai-codex:default is expired.
  • Your usable one is still openai-codex:lindsay.caldwell729@gmail.com with quota back.
  • Doctor reports expired profiles even if you’ve pinned auth order to the good profile.

So: ignore that for now unless you need a clean doctor later.

Next test the core gateway with bundled plugins fully disabled:

openclaw gateway stop

$env:OPENCLAW_CONFIG_PATH="D:\Caelan\openclaw-rescue-noplugins.json"
$env:OPENCLAW_PLUGIN_STAGE_DIR="D:\Caelan\.openclaw\plugin-runtime-deps-clean"
$env:OPENCLAW_SKIP_CHANNELS="1"
$env:OPENCLAW_SKIP_PROVIDERS="1"
$env:OPENCLAW_DISABLE_BUNDLED_PLUGINS="1"

openclaw gateway run

In a second PowerShell:

Invoke-WebRequest http://127.0.0.1:18789/healthz -TimeoutSec 5

If this works, we’ve proven the core gateway is fine and the hang is bundled plugin/runtime-deps. If it still times out, the issue is deeper than the plugin stage.

#

That’s actually a useful result: with bundled plugins disabled, doctor indexed 0/115 enabled plugins, which means the disable flag is taking effect. The state warnings are real but not our current blocker.

Now do the actual health probe with that same no-bundled-plugin environment:

$env:OPENCLAW_CONFIG_PATH="D:\Caelan\openclaw-rescue-noplugins.json"
$env:OPENCLAW_PLUGIN_STAGE_DIR="D:\Caelan\.openclaw\plugin-runtime-deps-clean"
$env:OPENCLAW_SKIP_CHANNELS="1"
$env:OPENCLAW_SKIP_PROVIDERS="1"
$env:OPENCLAW_DISABLE_BUNDLED_PLUGINS="1"

openclaw gateway run

Second PowerShell:

Invoke-WebRequest http://127.0.0.1:18789/healthz -TimeoutSec 5

Interpretation:

  • If /healthz works: core gateway is okay; bundled plugin/runtime-deps is definitely the hang.
  • If /healthz still times out: we pivot to state/session corruption or multiple state dirs as the next suspect.

Don’t run the session cleanup yet — it’s not destructive exactly, but I’d rather prove whether the core gateway can breathe first.

true skiff
#

o Plugins -------+
| |
| Loaded: 0 |
| Imported: 0 |
| Disabled: 115 |
| Errors: 0 |
| |
+-----------------+
|
o Bootstrap file size --------------------------------------------------+
| |
| Workspace bootstrap files exceed limits and will be truncated: |
| - AGENTS.md: 14,572 raw / 11,999 injected (18% truncated; max/file) |
| - MEMORY.md: 27,114 raw / 11,999 injected (56% truncated; max/file) |
| Total bootstrap injected chars: 39,636 (66% of max/total 60,000). |
| Total bootstrap raw chars (before truncation): 57,324. |
| |
| - Tip: tune agents.defaults.bootstrapMaxChars for per-file limits. |
| |
+------------------------------------------------------------------------+
|
o
|
o Gateway --------------+
| |
| Gateway not running. |
| |
+------------------------+
|
o Gateway connection -------------------------------+
| |
| Gateway target: ws://127.0.0.1:18789 |
| Source: local loopback |
| Config: D:\Caelan\openclaw-rescue-noplugins.json |
| Bind: loopback |
| |
+----------------------------------------------------+
|
o Memory search -------------------------------------------------+
| |
| No active memory plugin is registered for the current config. |
| |
+-----------------------------------------------------------------+
|
o Gateway --------------------------------------------------------------------------------+
| |
| Runtime: stopped (state Disabled, last run 1, last run time 4/26/2026 3:15:16 PM, Task |
| Last Run Result=1; treating as not running.) |
| Service is loaded but not running (likely exited immediately). |
| File logs: D:\Caelan\.openclaw\logs\openclaw.log |
| Logs: schtasks /Query /TN "OpenClaw Gateway" /V /FO LIST |
| Restart attempts: D:\Caelan\.openclaw\logs\gateway-restart.log |
| |
+------------------------------------------------------------------------------------------+
Error: schtasks run failed: ERROR: The scheduled task "OpenClaw Gateway" could not run because it is disabled.

willow willowBOT
#

That stale session lock removal is a real find: a lock left behind 42 hours ago by a dead process could plausibly have kept things wedged, so we may have just removed one of the blockers.

The Codex refresh error is separate: that old openai-codex:default profile is busted/reused. Ignore it for this gateway test; re-auth later if needed.
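Doctor's verdict on that lock (dead-pid, too-old) is essentially a staleness rule. The same rule also covers the earlier runtime-deps lock, whose owner PID 2376 turned out to be svchost.exe. Below is a Python sketch of that rule. The names, the one-hour default threshold and the `expected_names` set are hypothetical. How you build the running-process map (for example, from Get-Process output) is also left as an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Mapping, Set, Tuple


@dataclass
class LockInfo:
    owner_pid: int        # PID recorded in the lock
    created_at: datetime  # when the lock was taken (or the file's mtime)


def lock_is_stale(
    lock: LockInfo,
    running: Mapping[int, str],               # live PID -> image name
    expected_names: Set[str],                 # lowercase names allowed to own it
    now: datetime,
    max_age: timedelta = timedelta(hours=1),  # hypothetical threshold
) -> Tuple[bool, List[str]]:
    reasons: List[str] = []
    name = running.get(lock.owner_pid)
    if name is None:
        reasons.append("dead-pid")
    elif name.lower() not in expected_names:
        # PID is alive but belongs to an unrelated process: the PID was reused.
        reasons.append(f"pid-reused-by:{name}")
    if now - lock.created_at > max_age:
        reasons.append("too-old")
    return bool(reasons), reasons
```

Checking the process name as well as liveness matters on Windows, where PIDs are recycled quickly. A bare "is PID alive" test would have trusted the svchost.exe collision.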

Now run the health test:

$env:OPENCLAW_CONFIG_PATH="D:\Caelan\openclaw-rescue-noplugins.json"
$env:OPENCLAW_PLUGIN_STAGE_DIR="D:\Caelan\.openclaw\plugin-runtime-deps-clean"
$env:OPENCLAW_SKIP_CHANNELS="1"
$env:OPENCLAW_SKIP_PROVIDERS="1"
$env:OPENCLAW_DISABLE_BUNDLED_PLUGINS="1"

openclaw gateway run

Leave that open. In a second PowerShell:

Invoke-WebRequest http://127.0.0.1:18789/healthz -TimeoutSec 5

If that works, next test bundled plugins with the clean stage dir:

Remove-Item Env:OPENCLAW_DISABLE_BUNDLED_PLUGINS -ErrorAction SilentlyContinue
openclaw gateway run

Then probe /healthz again. This tells us whether the stale session lock was the main issue, or whether bundled plugin runtime deps are still the culprit.
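Because these variables are per-process, it is easy to script the "one layer at a time" approach. The Python sketch below uses hypothetical names, with values copied from the commands above. It builds the rescue environment with or without OPENCLAW_DISABLE_BUNDLED_PLUGINS for a child process. Only the launched process and its children see these values, the same as setting them in a single PowerShell window.

```python
import os
from typing import Dict, Mapping, Optional

# Values copied from the rescue commands in this thread.
RESCUE_VARS = {
    "OPENCLAW_CONFIG_PATH": r"D:\Caelan\openclaw-rescue-noplugins.json",
    "OPENCLAW_PLUGIN_STAGE_DIR": r"D:\Caelan\.openclaw\plugin-runtime-deps-clean",
    "OPENCLAW_SKIP_CHANNELS": "1",
    "OPENCLAW_SKIP_PROVIDERS": "1",
}


def rescue_env(disable_bundled: bool,
               base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a child-process environment for one rescue layer."""
    env = dict(os.environ if base is None else base)
    env.update(RESCUE_VARS)
    if disable_bundled:
        env["OPENCLAW_DISABLE_BUNDLED_PLUGINS"] = "1"
    else:
        # Equivalent of: Remove-Item Env:OPENCLAW_DISABLE_BUNDLED_PLUGINS
        env.pop("OPENCLAW_DISABLE_BUNDLED_PLUGINS", None)
    return env
```

Usage would be along the lines of `subprocess.Popen(["openclaw", "gateway", "run"], env=rescue_env(disable_bundled=False))`, followed by the /healthz probe.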

#

That output is good, actually: plugins are fully disabled and doctor/status isn’t wedged anymore. The “Gateway not running” line just means we haven’t started the foreground gateway yet.

Now run this in the same PowerShell window where those env vars are set:

openclaw gateway run

Leave it running. Then in a second PowerShell:

Invoke-WebRequest http://127.0.0.1:18789/healthz -TimeoutSec 5

Ignore the bootstrap-size warning for now — not the blocker.

Results:

  • If /healthz works → core gateway is healthy; next we re-enable bundled plugins using the clean stage dir.
  • If /healthz times out → issue is deeper than bundled plugins, and the next suspect is state/session store or port/process weirdness.

Paste the last ~30 lines from openclaw gateway run if it hangs or errors.

#

That’s the service/scheduled-task path, not the foreground run path. The scheduled task is disabled, so any gateway start/restart/status style command will complain. That’s okay for this test.

Use foreground only:

# Same PowerShell with env vars set
openclaw gateway run

Do not use openclaw gateway start for this step.

If gateway run exits immediately, paste the last output. If it keeps running, then in a second PowerShell:

Invoke-WebRequest http://127.0.0.1:18789/healthz -TimeoutSec 5

If you want to re-enable the Windows scheduled task later, that’s a separate step after we prove the foreground gateway is healthy.

true skiff
#

healthz keeps timing out, but the gateway run has gotten to 'ready status'

willow willowBOT
#

Hmm, if gateway run reached “ready”, then /healthz should be instant. /healthz is the shallow live probe, not a deep channel/model check. So now we need to verify whether anything is actually listening on 18789, and whether PowerShell is probing the right thing.

In the second PowerShell, run:

Get-NetTCPConnection -LocalPort 18789 -ErrorAction SilentlyContinue |
  Select-Object LocalAddress,LocalPort,State,OwningProcess

netstat -ano | findstr ":18789"
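When the netstat output gets long, tallying the rows for the port by state and PID makes the pattern easier to read. Below is a Python sketch with a hypothetical helper name. It assumes the five-column TCP rows that `netstat -ano` prints. As a reminder of TCP semantics: CLOSE_WAIT on the listener's side means the client has closed its end, but the owning process has not yet closed its socket.

```python
from collections import Counter


def tally_port(netstat_text: str, port: int) -> Counter:
    """Count (state, pid) for TCP rows whose *local* address ends in :port."""
    counts: Counter = Counter()
    suffix = f":{port}"
    for line in netstat_text.splitlines():
        parts = line.split()
        # Expected shape: Proto LocalAddress ForeignAddress State PID
        if len(parts) != 5 or parts[0] != "TCP":
            continue
        _, local, _remote, state, pid = parts
        if local.endswith(suffix):
            counts[(state, int(pid))] += 1
    return counts
```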

Then try curl.exe instead of PowerShell’s web cmdlet:

curl.exe --noproxy "*" -v --max-time 5 http://127.0.0.1:18789/healthz
curl.exe --noproxy "*" -v --max-time 5 http://localhost:18789/healthz

If you see an OwningProcess, identify it:

Get-Process -Id <PID_FROM_OWNINGPROCESS> |
  Select-Object Id,ProcessName,Path

Paste:

  1. the last ~20 lines from the openclaw gateway run window, especially any “listening/bind/ready” lines
  2. the Get-NetTCPConnection / curl.exe -v output

This will tell us whether the gateway is bound but blocked, bound on a different address, or if /healthz is hitting a stale/wrong process. Docs ref: https://docs.openclaw.ai/gateway/health
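The same three-way question (nothing listening, connection accepted but silent, or a real HTTP reply) can be answered by a small probe. Below is a Python sketch with a hypothetical function name and standard-library sockets only. A completed TCP handshake proves only that a socket is listening: the OS completes the handshake even if the owning process never services the connection. That distinction is what separates "bound but blocked" from "not bound".

```python
import socket


def probe_health(host: str, port: int, path: str = "/healthz", timeout: float = 5.0) -> str:
    """Classify an HTTP health probe by where it stalls.

    Returns "refused", "connect-timeout", "accepted-but-silent",
    "closed-without-response", or the HTTP status line if one arrives.
    """
    try:
        # create_connection also applies `timeout` to later send/recv calls.
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"           # nothing listening on host:port
    except socket.timeout:
        return "connect-timeout"   # handshake never completed
    with sock:
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}:{port}\r\n"
            "Connection: close\r\n\r\n"
        )
        sock.sendall(request.encode("ascii"))
        try:
            first = sock.recv(64)
        except socket.timeout:
            return "accepted-but-silent"  # handshake done, no HTTP reply in time
    if not first:
        return "closed-without-response"
    return first.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
```

A gateway that is bound but not answering would come back as "accepted-but-silent", matching what curl -v reports as an established connection with 0 bytes received.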

true skiff
#

LocalAddress LocalPort State       OwningProcess
------------ --------- -----       -------------
::1              18789 Listen              17084
127.0.0.1        18789 CloseWait           17084
127.0.0.1        18789 Established         17084
127.0.0.1        18789 Listen              17084
127.0.0.1        18789 Established         17084
127.0.0.1        18789 CloseWait           17084

PS D:\Caelan\.openclaw\workspace>
PS D:\Caelan\.openclaw\workspace> netstat -ano | findstr ":18789"
TCP 127.0.0.1:18789 0.0.0.0:0 LISTENING 17084
TCP 127.0.0.1:18789 127.0.0.1:59096 ESTABLISHED 17084
TCP 127.0.0.1:18789 127.0.0.1:61863 CLOSE_WAIT 17084
TCP 127.0.0.1:18789 127.0.0.1:61882 CLOSE_WAIT 17084
TCP 127.0.0.1:18789 127.0.0.1:64219 ESTABLISHED 17084
TCP 127.0.0.1:59096 127.0.0.1:18789 ESTABLISHED 14796
TCP 127.0.0.1:61863 127.0.0.1:18789 FIN_WAIT_2 19156
TCP 127.0.0.1:61882 127.0.0.1:18789 FIN_WAIT_2 19156
TCP 127.0.0.1:64219 127.0.0.1:18789 ESTABLISHED 14796
TCP [::1]:18789 [::]:0 LISTENING 17084

PS D:\Caelan\.openclaw\workspace> curl.exe --noproxy "*" -v --max-time 5 http://127.0.0.1:18789/healthz
*   Trying 127.0.0.1:18789...
* Established connection to 127.0.0.1 (127.0.0.1 port 18789) from 127.0.0.1 port 51845
* using HTTP/1.x
> GET /healthz HTTP/1.1
> Host: 127.0.0.1:18789
> User-Agent: curl/8.18.0
> Accept: */*
>
* Request completely sent off
* Operation timed out after 5013 milliseconds with 0 bytes received
* closing connection #0
curl: (28) Operation timed out after 5013 milliseconds with 0 bytes received
PS D:\Caelan\.openclaw\workspace> curl.exe --noproxy "*" -v --max-time 5 http://localhost:18789/healthz
* Host localhost:18789 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:18789...
* Established connection to localhost (::1 port 18789) from ::1 port 51848
* using HTTP/1.x
> GET /healthz HTTP/1.1
> Host: localhost:18789
> User-Agent: curl/8.18.0
> Accept: */*
>
* Request completely sent off
* Operation timed out after 5002 milliseconds with 0 bytes received
* closing connection #0
curl: (28) Operation timed out after 5002 milliseconds with 0 bytes received
PS D:\Caelan\.openclaw\workspace> Get-Process -Id <PID_FROM_OWNINGPROCESS> |
Select-Object Id,ProcessName,Path
At line:1 char:17
+ Get-Process -Id <PID_FROM_OWNINGPROCESS> |
+                 ~
The '<' operator is reserved for future use.
    + CategoryInfo          : ParserError: (:) [], ParentContainsErrorRecordException
    + FullyQualifiedErrorId : RedirectionNotSupported