Logs:
20:13:34+00:00 warn diagnostic {"subsystem":"diagnostic"} stuck session: sessionId=unknown sessionKey=agent:main:telegram:direct:8668358572 state=processing age=143s queueDepth=1 reason=queued_work_without_active_run classification=stale_session_state recovery=checking
20:13:34+00:00 warn diagnostic {"subsystem":"diagnostic"} stuck session recovery skipped: reason=active_reply_work action=keep_lane sessionId=c29a1f24-a652-466e-98f1-ac8644cd00e6 sessionKey=agent:main:telegram:direct:8668358572 age=143s queueDepth=1 activeSessionId=c29a1f24-a652-466e-98f1-ac8644cd00e6
#Trouble with session stalls
Bandit, this log is the stuck-session watchdog, not necessarily the root error.
What it’s saying:
- queued_work_without_active_run means the diagnostic state sees queued work for agent:main:telegram:direct:8668358572 but doesn’t see the normal active-run marker for that queued item.
- recovery=checking means OpenClaw considered auto-recovery.
- recovery skipped: reason=active_reply_work action=keep_lane ... activeSessionId=... means it found an active embedded reply/run for that same lane, so it refused to clear/reset the queue while work might still be legitimately running. That’s intentional protective behavior.
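If it helps to compare several of these watchdog lines, the key=value fields can be pulled apart mechanically. This is only a rough sketch: the two strings are the log lines from this thread with the timestamp and JSON prefix trimmed, and parse_fields / summarize are names I made up for illustration, not anything OpenClaw ships.

```python
import re

# The two watchdog lines from this thread, with timestamp and JSON prefix trimmed.
LINES = [
    "stuck session: sessionId=unknown "
    "sessionKey=agent:main:telegram:direct:8668358572 state=processing age=143s "
    "queueDepth=1 reason=queued_work_without_active_run "
    "classification=stale_session_state recovery=checking",
    "stuck session recovery skipped: reason=active_reply_work action=keep_lane "
    "sessionId=c29a1f24-a652-466e-98f1-ac8644cd00e6 "
    "sessionKey=agent:main:telegram:direct:8668358572 age=143s queueDepth=1 "
    "activeSessionId=c29a1f24-a652-466e-98f1-ac8644cd00e6",
]


def parse_fields(line: str) -> dict:
    """Return every key=value token in a log line as a dict (hypothetical helper)."""
    return dict(re.findall(r"(\w+)=(\S+)", line))


def summarize(line: str) -> str:
    """Describe one watchdog line in plain words (hypothetical helper)."""
    fields = parse_fields(line)
    if "recovery skipped" in line:
        # The watchdog found active work on the lane and left the queue alone.
        same = fields.get("sessionId") == fields.get("activeSessionId")
        return (
            f"recovery skipped: reason={fields['reason']} action={fields['action']} "
            f"active session matches: {same}"
        )
    return f"flagged: reason={fields['reason']} recovery={fields['recovery']}"


for line in LINES:
    print(summarize(line))
```

The useful part is the last comparison: when sessionId and activeSessionId match, the watchdog is looking at a lane that still has a live reply attached, which is why it keeps the lane instead of clearing it.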
First check the version, because session recovery behavior has been changing in recent releases:
openclaw --version
openclaw status --all
openclaw logs --follow
If Telegram never replies and the same session keeps repeating this for several minutes, the least invasive recovery is usually to send /reset or /new in that Telegram chat, then try again. If that doesn’t release it, restart the gateway and re-check:
openclaw gateway restart
openclaw status
Docs refs: https://docs.openclaw.ai/help/faq and https://docs.openclaw.ai/concepts/session
If it keeps happening after restart, this is worth a GitHub issue with openclaw --version, the openclaw status --all output, and a few minutes of surrounding logs: https://github.com/openclaw/openclaw/issues
Is Anthropic having issues? I keep getting these fallbacks, and they keep ending in failures.
I checked the public Claude status page just now: Claude API is showing Operational. That doesn’t rule out regional/account/rate-limit trouble, but there isn’t an obvious global outage signal from the status page.
If OpenClaw is falling back and each fallback also resolves to a failure, I’d check these in order:
openclaw --version
openclaw models status --probe
openclaw status --all
openclaw logs --follow
Things to look for in the logs:
- 429 / rate_limit_error / overloaded_error: likely provider capacity, quota, or long-context eligibility.
- 401 / auth / OAuth expiry: credential/profile issue, not Anthropic status.
- fallback targets using the same provider/account: the fallback can fail for the same reason as the primary.
- Model is not allowed: agents.defaults.models allowlist doesn’t include the selected fallback.
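If the log is long, a crude first pass is to bucket lines by the markers in that checklist. A minimal sketch, assuming plain substring/regex matching is good enough for triage; the bucket names and the classify function are hypothetical, and a stray "429" or "401" inside an ID could produce a false hit, so treat the result as a hint only.

```python
import re

# Markers come from the checklist in this thread; the bucket names are made up here.
PATTERNS = [
    ("provider_capacity_or_quota",
     re.compile(r"\b429\b|rate_limit_error|overloaded_error")),
    ("auth_or_credentials",
     re.compile(r"\b401\b|oauth|\bauth", re.IGNORECASE)),
    ("model_allowlist",
     re.compile(r"Model is not allowed")),
]


def classify(line: str) -> str:
    """Return the first matching bucket for a log line, else 'unclassified'."""
    for bucket, pattern in PATTERNS:
        if pattern.search(line):
            return bucket
    return "unclassified"
```

Anything that comes back unclassified (a timeout, for instance) still needs a human read; the buckets only cover the cases listed in this thread.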
Docs: https://docs.openclaw.ai/concepts/models and https://docs.openclaw.ai/concepts/model-failover
If you paste the exact fallback log block plus openclaw --version, I can help tell whether it’s provider-side, auth/config, or fallback-chain behavior.
I'm on 5.7.
Logs: 20:10:05+00:00 warn agent/cli-backend {"subsystem":"agent/cli-backend"} claude live session turn failed: provider=claude-cli model=claude-sonnet-4-6 durationMs=195668 error=FailoverError
20:10:05+00:00 warn model-fallback/decision {"subsystem":"model-fallback/decision"} {"event":"model_fallback_decision","tags":["error_handling","model_fallback","candidate_failed"],"runId":"b9a9ec7d-0f5d-4089-bcab-1faa3e39683b","sessionId":"c29a1f24-a652-466e-98f1-ac8644cd00e6","lane":"main","decision":"candidate_failed","requestedProvider":"anthropic","requestedModel":"claude-sonnet-4-6","candidateProvider":"anthropic","candidateModel":"claude-sonnet-4-6","attempt":1,"total":1,"reason":"timeout","status":408,"errorPreview":"CLI produced no output for 180s and was terminated.","errorHash":"sha256:8349e7990402","fallbackStepType":"fallback_step","fallbackStepFromModel":"anthropic/claude-sonnet-4-6","fallbackStepFromFailureReason":"timeout","fallbackStepFromFailureDetail":"CLI produced no output for 180s and was terminated.","fallbackStepChainPosition":1,"fallbackStepFinalOutcome":"chain_exhausted","isPrimary":true,"requestedModelMatched":true,"fallbackConfigured":false} model fallback decision
This one points more at the Claude CLI backend hanging locally than a clean Anthropic API outage.
Key bits:
- provider=claude-cli means OpenClaw is driving the local claude CLI, not calling Anthropic API directly.
- errorPreview: "CLI produced no output for 180s and was terminated." means the CLI process was silent for 3 minutes, so OpenClaw killed that turn.
- attempt:1 total:1 + fallbackConfigured:false means there was no real model fallback chain configured. The fallbackStep... chain_exhausted wording is the failover machinery reporting “primary failed and there’s nowhere else to go.”
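Those fields are easier to read once the JSON payload is lifted out of the log line. Here is a rough sketch using only Python's standard json module; LINE is the model_fallback_decision line from this thread trimmed to the fields discussed, and json_objects / summarize are hypothetical helper names, not OpenClaw tooling.

```python
import json

# The model_fallback_decision line from this thread, trimmed to the fields discussed.
LINE = (
    '20:10:05+00:00 warn model-fallback/decision '
    '{"subsystem":"model-fallback/decision"} '
    '{"event":"model_fallback_decision","decision":"candidate_failed",'
    '"candidateProvider":"anthropic","candidateModel":"claude-sonnet-4-6",'
    '"attempt":1,"total":1,"reason":"timeout","status":408,'
    '"errorPreview":"CLI produced no output for 180s and was terminated.",'
    '"fallbackStepFinalOutcome":"chain_exhausted","fallbackConfigured":false} '
    'model fallback decision'
)


def json_objects(line: str):
    """Yield each top-level JSON object embedded in a log line."""
    decoder = json.JSONDecoder()
    pos = line.find("{")
    while pos != -1:
        try:
            obj, end = decoder.raw_decode(line, pos)
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            pos = line.find("{", pos + 1)
            continue
        yield obj
        pos = line.find("{", end)


def summarize(line: str):
    """Pick out the fields that show whether a fallback chain really existed."""
    for obj in json_objects(line):
        if obj.get("event") == "model_fallback_decision":
            return {
                "model": f'{obj["candidateProvider"]}/{obj["candidateModel"]}',
                "attempt": f'{obj["attempt"]}/{obj["total"]}',
                "reason": obj["reason"],
                "fallback_configured": obj["fallbackConfigured"],
                "detail": obj["errorPreview"],
            }
    return None


print(summarize(LINE))
```

An attempt of 1/1 together with fallback_configured set to false is the combination to watch for: the "fallback" wording in the log then only means the primary failed with nothing behind it.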
I’d check the CLI directly on the same machine/user that runs the gateway:
claude --version
claude auth status
claude -p "say hi"
openclaw models status --probe
openclaw logs --follow
If claude -p "say hi" also hangs or takes ages, fix/re-login/update Claude CLI first. If direct Claude CLI works but OpenClaw keeps timing out, try /reset or /new in that Telegram session, then restart the gateway.
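To put a number on "hangs or takes ages", the direct CLI check can be wrapped in a timer with a hard deadline. This is a sketch under assumptions: run_with_deadline is a hypothetical helper, and the deadline is a total wall-clock limit, which is not the same thing as OpenClaw's "no output for 180s" check.

```python
import shutil
import subprocess
import time


def run_with_deadline(cmd, seconds):
    """Run cmd and return (status, elapsed_seconds).

    status is 'missing' if the executable is not on PATH, 'timeout' if the
    deadline passed, 'ok' for exit code 0, and 'failed' otherwise.
    """
    if shutil.which(cmd[0]) is None:
        return "missing", 0.0
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left behind.
        return "timeout", time.monotonic() - start
    status = "ok" if proc.returncode == 0 else "failed"
    return status, time.monotonic() - start
```

Run it as the same user that runs the gateway, for example run_with_deadline(["claude", "-p", "say hi"], 180). A result of timeout or a very long elapsed time points at the CLI itself; a quick ok points back at the OpenClaw session.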