01:21:34 [diagnostic] stuck session: sessionId=medicare-web sessionKey=agent:medicare-web:main state=processing age=135s queueDepth=1
01:22:04 [diagnostic] stuck session: sessionId=medicare-web sessionKey=agent:medicare-web:main state=processing age=165s queueDepth=1
01:22:34 [diagnostic] stuck session: sessionId=medicare-web sessionKey=agent:medicare-web:main state=processing age=195s queueDepth=1
01:23:04 [diagnostic] stuck session: sessionId=medicare-web sessionKey=agent:medicare-web:main state=processing age=225s queueDepth=1
01:23:34 [diagnostic] stuck session: sessionId=medicare-web sessionKey=agent:medicare-web:main state=processing age=255s queueDepth=1
01:24:04 [diagnostic] stuck session: sessionId=medicare-web sessionKey=agent:medicare-web:main state=processing age=285s queueDepth=1
01:24:19 [agent/embedded] embedded run timeout: runId=301f22c8-0ccf-4c38-9936-192b9cf5535e sessionId=0aa3896c-c615-419c-a8d5-b075a4566c8c timeoutMs=300000
01:24:49 [agent/embedded] Profile openai-codex:default timed out. Trying next account...
01:24:49 [agent/embedded] embedded run failover decision: runId=301f22c8-0ccf-4c38-9936-192b9cf5535e stage=assistant decision=fallback_model reason=timeout from=openai-codex/gpt-5.5 profile=sha256:06bfb5171eff
01:24:49 [diagnostic] lane task error: lane=main durationMs=340196 error="FailoverError: LLM request timed out."
01:24:49 [diagnostic] lane task error: lane=session:agent:medicare-web:main durationMs=340199 error="FailoverError: LLM request timed out."
01:24:49 [model-fallback/decision] model fallback decision: decision=candidate_failed requested=openai-codex/gpt-5.5 candidate=openai-codex/gpt-5.5 reason=timeout next=openai-codex/gpt-5.4 detail=LLM request timed out.
01:24:52 [plugins] memory-lancedb-pro: all 3 memories were filtered out due to redundancy policy
01:25:01 [plugins] memory-lancedb-pro: injecting 2 memories into context for agent medicare-web
01:25:42 [agent/embedded] embedded run failover decision: runId=301f22c8-0ccf-4c38-9936-192b9cf5535e stage=assistant decision=surface_error reason=timeout from=openai-codex/gpt-5.4 profile=sha256:06bfb5171eff
01:25:42 [model-fallback/decision] model fallback decision: decision=candidate_succeeded requested=openai-codex/gpt-5.5 candidate=openai-codex/gpt-5.4 reason=unknown next=none
01:28:04 [plugins] memory-lancedb-pro: injecting 3 memories into context for agent peanut
01:28:24 [plugins] memory-lancedb-pro: regex fallback found 0 capturable texts for agent peanut
#STUCK SESSION
That log looks like a provider/model timeout, not a memory/plugin issue.
What’s happening:
- stuck session ... processing is the diagnostic warning while the agent turn is still running.
- At timeoutMs=300000, the embedded run hit the 5-minute timeout.
- OpenClaw then tried failover from openai-codex/gpt-5.5 to openai-codex/gpt-5.4.
- The memory-lancedb-pro lines are probably incidental context injection, not the cause.
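As a quick sanity check on that reading, the age values and timeoutMs in the pasted log can be lined up with a few lines of Python. This is only a sketch using timestamps copied from the log above; started_at is a made-up helper name, and the date is left out because the log lines carry only a time of day.

```python
from datetime import datetime, timedelta

FMT = "%H:%M:%S"

def started_at(logged_at: str, age_seconds: int) -> datetime:
    """Back out when the turn began from a 'stuck session' line (hypothetical helper)."""
    return datetime.strptime(logged_at, FMT) - timedelta(seconds=age_seconds)

# Values copied from the log: age=285s at 01:24:04, timeout logged at 01:24:19.
start = started_at("01:24:04", 285)
timeout_logged = datetime.strptime("01:24:19", FMT)
elapsed = (timeout_logged - start).total_seconds()

print(start.strftime(FMT))  # 01:19:19
print(elapsed)              # 300.0, which matches timeoutMs=300000
```

The turn started around 01:19:19 and the timeout fired 300 seconds later, so the stuck session warnings and the embedded run timeout describe the same single run.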
First thing: what version are you on?
openclaw --version
openclaw status --all
If this is still blocking the medicare-web session, try a fresh session/reset for that agent, or restart the gateway once. If it comes back repeatedly, the useful bug report is: version, openclaw status --all, and ~100 log lines before the first embedded run timeout.
Relevant docs: https://docs.openclaw.ai/help/faq and https://docs.openclaw.ai/concepts/model-failover
Version 4.23
It's on a fresh /new session
Got it — if /new is fresh, then I’d stop looking at stale session state. This is almost certainly the live model call hanging until the runner’s 300s timeout.
Two quick clarifiers:
- Is the exact version output 2026.4.23? If yes, I’d update first before chasing ghosts.
- Did the fallback reply from openai-codex/gpt-5.4 ever arrive, or did the user-facing turn still fail?
For triage, please grab:
openclaw status --all
openclaw gateway status
And the log window from ~1 minute before embedded run timeout through the final model-fallback/decision line.
If you need to unblock immediately: switch that agent/session to a different configured working model/provider, or remove the slow Codex model from that agent’s fallback chain temporarily. The queueDepth=1 part is just the next message stuck behind the active turn; the root symptom is the 5-minute LLM timeout.
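If cutting that log window by hand is tedious, something like the following Python sketch could do it. It assumes every relevant line starts with an HH:MM:SS timestamp as in the paste above and that the window does not cross midnight. log_window and lead_seconds are illustrative names, not part of OpenClaw.

```python
from datetime import datetime, timedelta

FMT = "%H:%M:%S"

def log_window(lines, lead_seconds=60):
    """Return lines from lead_seconds before the first 'embedded run timeout'
    through the last '[model-fallback/decision]' line, inclusive."""
    stamped = []
    for line in lines:
        try:
            stamped.append((datetime.strptime(line[:8], FMT), line))
        except ValueError:
            continue  # skip lines that do not start with HH:MM:SS

    timeout_idx = next(
        (i for i, (_, text) in enumerate(stamped) if "embedded run timeout" in text),
        None,
    )
    if timeout_idx is None:
        return []  # no timeout in this log, nothing to cut

    start = stamped[timeout_idx][0] - timedelta(seconds=lead_seconds)
    decision_idx = [
        i for i, (_, text) in enumerate(stamped) if "[model-fallback/decision]" in text
    ]
    # Never end the window before the timeout line itself.
    last_idx = max([timeout_idx] + decision_idx)
    return [
        text for i, (ts, text) in enumerate(stamped) if ts >= start and i <= last_idx
    ]
```

Feed it the log file's lines, for example the result of read_text().splitlines() on wherever your gateway log lives, and paste the returned lines after checking them for secrets.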
It's not 4.23 - something else is wrong.
Answer from the agent usually is :25:42 [agent/embedded] embedded run failover decision: runId=301f22c8-0ccf-4c38-9936-192b9cf5535e stage=assistant decision=surface_error reason=timeout from=openai-codex/gpt-5.4 profile=sha256:06bfb5171eff
Ah yep — I misread your 4.23, sorry. The important line is this one:
stage=assistant decision=surface_error reason=timeout from=openai-codex/gpt-5.4
That means the fallback candidate also timed out, and OpenClaw had no successful next step left, so it surfaced the timeout instead of producing an assistant answer. In other words: fresh /new rules out stale session history; this is the active LLM call path timing out twice.
The weird part is your later line:
candidate_succeeded ... candidate=openai-codex/gpt-5.4
paired with surface_error reason=timeout. That suggests the fallback wrapper considered the gpt-5.4 candidate “handled”, but the embedded runner handled it by surfacing a timeout/error payload. That’s worth filing as a bug if the user-visible reply is just bad/internal instead of a clean timeout message.
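To check whether that pairing shows up on every attempt, a small Python sketch like this can scan a log for it. It assumes the key=value layout seen in the pasted lines; fields and contradictory_outcomes are names invented here, and it matches on the model name only.

```python
import re

# key=value pairs, where the value is either quoted or runs to the next space
KV = re.compile(r'(\w+)=("[^"]*"|\S+)')

def fields(line: str) -> dict:
    """Extract key=value pairs from one log line."""
    return {key: value.strip('"') for key, value in KV.findall(line)}

def contradictory_outcomes(lines):
    """Yield (runner_line, fallback_line) pairs where the embedded runner
    surfaced an error for a model that the fallback layer then logged
    as candidate_succeeded."""
    surfaced = {}  # model name -> runner line that surfaced an error for it
    for line in lines:
        f = fields(line)
        if "[agent/embedded]" in line and f.get("decision") == "surface_error":
            surfaced[f.get("from")] = line
        elif (
            "[model-fallback/decision]" in line
            and f.get("decision") == "candidate_succeeded"
            and f.get("candidate") in surfaced
        ):
            yield surfaced[f["candidate"]], line
```

Running list(contradictory_outcomes(lines)) over your log and attaching any pairs it finds would make the bug report much easier to act on.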
Next things I’d check:
openclaw status --all
openclaw models status
And whether a direct minimal run on that same provider/model hangs too. If both openai-codex/gpt-5.5 and openai-codex/gpt-5.4 are timing out under the same auth profile hash, I’d temporarily move this agent to a different known-good provider/model or remove Codex from its fallback chain to unblock.
For a GitHub issue, include the exact log window you pasted plus config model/fallback shape with secrets redacted: https://github.com/openclaw/openclaw/issues
To be clear this was the exact sequence over multiple attempts
queueDepth=1
01:24:04 [diagnostic] stuck session: sessionId=medicare-web sessionKey=agent:medicare-web:main state=processing age=285s queueDepth=1
01:24:19 [agent/embedded] embedded run timeout: runId=301f22c8-0ccf-4c38-9936-192b9cf5535e sessionId=0aa3896c-c615-419c-a8d5-b075a4566c8c timeoutMs=300000
01:24:49 [agent/embedded] Profile openai-codex:default timed out. Trying next account...
01:24:49 [agent/embedded] embedded run failover decision: runId=301f22c8-0ccf-4c38-9936-192b9cf5535e stage=assistant decision=fallback_model reason=timeout from=openai-codex/gpt-5.5 profile=sha256:06bfb5171eff
01:24:49 [diagnostic] lane task error: lane=main durationMs=340196 error="FailoverError: LLM request timed out."
01:24:49 [diagnostic] lane task error: lane=session:agent:medicare-web:main durationMs=340199 error="FailoverError: LLM request timed out."
01:24:49 [model-fallback/decision] model fallback decision: decision=candidate_failed requested=openai-codex/gpt-5.5 candidate=openai-codex/gpt-5.5 reason=timeout next=openai-codex/gpt-5.4 detail=LLM request timed out.
01:24:52 [plugins] memory-lancedb-pro: all 3 memories were filtered out due to redundancy policy
01:25:01 [plugins] memory-lancedb-pro: injecting 2 memories into context for agent medicare-web
01:25:42 [agent/embedded] embedded run failover decision: runId=301f22c8-0ccf-4c38-9936-192b9cf5535e stage=assistant decision=surface_error reason=timeout from=openai-codex/gpt-5.4 profile=sha256:06bfb5171eff
01:25:42 [model-fallback/decision] model fallback decision: decision=candidate_succeeded requested=openai-codex/gpt-5.5 candidate=openai-codex/gpt-5.4 reason=unknown next=none
01:28:04 [plugins] memory-lancedb-pro: injecting 3 memories into context for agent peanut
01:28:24 [plugins] memory-lancedb-pro: regex fallback found 0 capturable texts for agent peanut
01:34:18 [diagnostic] stuck session: sessionId=medicare-web sessionKey=agent:medicare-web:main state=processing age=120s queueDepth=1
01:34:49 [diagnostic] stuck session: sessionId=medicare-web sessionKey=agent:medicare-web:main state=processing age=151s queueDepth=1
01:35:19 [diagnostic] stuck session: sessionId=medicare-web sessionKey=agent:medicare-web:main state=processing age=181s queueDepth=1
Yep, this sequence makes it clearer. I’d read it as:
- queueDepth=1 = another message/turn is waiting behind the active medicare-web run. It’s a symptom, not the root cause.
- 01:24:19 embedded run timeout ... timeoutMs=300000 = the primary openai-codex/gpt-5.5 run hit the 5-minute run timeout.
- 01:24:49 Profile openai-codex:default timed out. Trying next account... = profile rotation was attempted, but there wasn’t a usable next profile/account, so it escalated.
- candidate_failed ... next=openai-codex/gpt-5.4 = model fallback kicked in correctly.
- 01:25:42 ... surface_error reason=timeout from=openai-codex/gpt-5.4 = the fallback model also produced a timeout-shaped failure.
- candidate_succeeded ... candidate=openai-codex/gpt-5.4 is the suspicious bit. That looks like the outer fallback wrapper treated “the fallback candidate returned a surfaced error result” as success, instead of logging it as candidate_failed / final failure. So the logging/state here is probably misleading even though the underlying issue is still: both Codex candidates timed out.
So I don’t think this is memory, stale /new state, or LanceDB. The memory-lancedb-pro lines are just context being rebuilt for the fallback/new attempt.
The real question is why openai-codex is timing out for that agent. I’d check three things:
openclaw status --all
openclaw models status
Then temporarily switch medicare-web away from openai-codex/gpt-5.5 + gpt-5.4 to any known-good model/provider. If that fixes it, we’ve isolated this to the Codex provider/auth path or that model fallback chain.
Also: please file this log sequence as a GitHub issue. The surface_error reason=timeout followed by candidate_succeeded is worth fixing or at least making much clearer: https://github.com/openclaw/openclaw/issues
Are older versions of OpenClaw maintained and updated?
Short answer: no, not as long-lived maintenance branches.
OpenClaw moves fast and support is generally for current latest / beta / dev builds, not old pinned versions. Older versions may still exist on npm/GitHub, but bugfixes are not usually backported to them.
So for a bug like this, the first step is to test on current stable (or on beta/dev if maintainers ask for it):
openclaw update status
openclaw update
Updating to a newer version isn't an option as it bricks my entire OpenClaw
Oof, that’s frustrating — and it changes the priority. If newer OpenClaw bricks the install, that’s the first bug to isolate, because staying pinned means you’re effectively stuck on a snapshot with no backported fixes.
I wouldn’t tell you to blindly update in-place. Safer path:
- Keep your current working install pinned/backed up.
- Test the newer version in a separate install/state dir if possible.
- Capture the actual “bricks my OpenClaw” failure so it can be fixed.
Useful info:
openclaw --version
openclaw update status
openclaw status --all
And for the failed newer version: what exactly breaks — gateway won’t start, config migration fails, auth disappears, sessions won’t load, tools fail, Discord stops replying?
If you can paste the first fatal error from the newer-version logs, we can triage that without risking your current setup. If it’s reproducible, it should go to GitHub too: https://github.com/openclaw/openclaw/issues
What CLI is latest beta branch running
Latest npm dist-tags right now:
latest: 2026.5.7
beta: 2026.5.12-beta.6
So the beta CLI would be openclaw@2026.5.12-beta.6.
Check locally with:
openclaw --version
openclaw update status
What harness?
By “harness” I mean the runner layer OpenClaw uses to talk to the model/CLI and turn it into an agent session — not a separate thing you install.
For openai-codex/..., that’s the OpenClaw embedded/agent runner path around the Codex provider: it sends the prompt/tools, watches streaming/output, enforces timeouts, and reports failover. Your logs are from that layer:
[agent/embedded] embedded run timeout
[model-fallback/decision]
So in your case: the OpenClaw runner/harness is timing out waiting for the Codex model response.
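To make the idea concrete, here is a minimal Python sketch of what a runner layer with a per-call timeout and a model fallback chain could look like. It is not OpenClaw's code: run_with_fallback, call_model, fake_call and the model names are all hypothetical, and the real runner also handles tools, streaming and auth profiles.

```python
import asyncio

async def run_with_fallback(call_model, models, timeout_s):
    """Try each model in order. Return (model, reply) for the first reply that
    arrives within timeout_s; raise TimeoutError if every candidate times out."""
    for model in models:
        try:
            reply = await asyncio.wait_for(call_model(model), timeout=timeout_s)
            return model, reply
        except asyncio.TimeoutError:
            continue  # this candidate timed out, move on to the next one
    raise TimeoutError(f"all candidates timed out: {models}")

async def fake_call(model):
    """Hypothetical stand-in for a provider call: one model is too slow."""
    await asyncio.sleep(0.2 if model == "slow-model" else 0.01)
    return f"reply from {model}"

model, reply = asyncio.run(
    run_with_fallback(fake_call, ["slow-model", "fast-model"], timeout_s=0.05)
)
print(model, reply)  # fast-model reply from fast-model
```

In this sketch a timeout on the last candidate raises instead of returning a value, so the caller cannot mistake it for a successful reply. Your logs suggest the real layers may not agree on that distinction, but that is a guess from the log output, not something confirmed from the source.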
o OpenAI OAuth complete
Config write rejected: C:\Users\desti\.openclaw\openclaw.json (size-drop:37509->12843). Rejected payload saved to C:\Users\desti\.openclaw\openclaw.json.rejected.2026-05-15T06-10-57-064Z.
Error: Config write rejected: C:\Users\desti\.openclaw\openclaw.json (size-drop:37509->12843). Rejected payload saved to C:\Users\desti\.openclaw\openclaw.json.rejected.2026-05-15T06-10-57-064Z.
First: what does openclaw --version show?
That error is OpenClaw’s config safety guard doing its job. The OAuth flow tried to write a config that was much smaller than your current one (37509 -> 12843 bytes), so OpenClaw rejected it instead of risking clobbering your existing openclaw.json. The active config should still be the original file; the smaller attempted write was saved as:
C:\Users\desti\.openclaw\openclaw.json.rejected.2026-05-15T06-10-57-064Z
Please don’t paste either config file here, since they may contain secrets.
Run these in PowerShell:
openclaw --version
openclaw config validate
openclaw doctor
openclaw doctor --fix
openclaw config validate
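If you want to see what the rejected OAuth write would have dropped without pasting either file, a Python sketch along these lines lists only the key names that exist in the active config but not in the rejected payload. It assumes both files parse as strict JSON (it will fail on comments or trailing commas), and dropped_keys and compare_files are names made up for this example.

```python
import json
from pathlib import Path

def dropped_keys(active: dict, rejected: dict, prefix: str = "") -> list[str]:
    """List dotted key paths present in `active` but absent from `rejected`.
    Only key names are returned, never values."""
    missing = []
    for key, value in active.items():
        path = f"{prefix}{key}"
        if key not in rejected:
            missing.append(path)
        elif isinstance(value, dict) and isinstance(rejected[key], dict):
            # Both sides have this section, so compare its children too.
            missing.extend(dropped_keys(value, rejected[key], prefix=f"{path}."))
    return missing

def compare_files(active_path: str, rejected_path: str) -> list[str]:
    """Load both files as JSON and report which keys the rejected one lacks."""
    def load(path: str) -> dict:
        return json.loads(Path(path).read_text(encoding="utf-8"))
    return dropped_keys(load(active_path), load(rejected_path))
```

Call compare_files with the path to openclaw.json and the path to the .rejected file from the error message. The output contains key names only, but glance over it before sharing in case a key name itself is sensitive.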