#Stuck in a reasoning loop

woven salmon

OpenClaw reported an issue, and this is the answer it gave me when I asked:

[12/05/2026 21:49] Roger: Can you look at the logs to see why I got an error at 21:45
[12/05/2026 21:49] Portfolio Guardian: Found it. Here's the timeline:

21:25 — Context overflow. Your session hit 38 messages and the estimated context size exceeded the safe threshold for qwen3.6-35b. The system auto-compacted the transcript and retried.

21:31–21:45 — Model got stuck. After compaction, the model started a model_call but never returned a usable response. It kept timing out — the diagnostic flagged it as "stalled" every 30 seconds for 20+ minutes.

21:37 & 21:41 — Reasoning-only retries. The model produced a "reasoning-only" turn (internal chain-of-thought with no visible answer). The system retried twice with visible-answer continuation — both failed.

21:45:44 — Error surfaced. After exhausting retries, the model ended with an "incomplete terminal response." No fallback candidate was available (it was the only model configured), so the error was sent to you.

Root cause: The session grew too large (38 messages), triggering a context overflow. After compaction, the local model (qwen3.6-35b via LM Studio) got stuck in a reasoning loop and never produced a final answer.

Fix options:

  1. Reduce session size — I can keep responses shorter to delay overflow
  2. Add a fallback model — so if one fails, another gets tried
  3. Increase compaction aggressiveness — compact sooner before hitting the hard limit

Want me to set up a fallback model or adjust compaction?

woven sierra (BOT)

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

lethal axle (BOT)

That diagnosis is plausible: context overflow → auto-compaction → local LM Studio/Qwen model fails to produce a visible final answer → OpenClaw eventually surfaces incomplete terminal response because there’s no fallback.

A couple things I’d check before changing config:

  1. What version are you on?
openclaw --version
  2. Grab a safe diagnostic snapshot around the failure:
openclaw status --all

(tokens are redacted)

  3. For local LM Studio models, make sure you’re using the Responses API when supported and that the model entry has realistic contextWindow / maxTokens. The local-model docs recommend api: "openai-responses" for LM Studio and keeping hosted fallbacks available with models.mode: "merge".

Docs:
https://docs.openclaw.ai/gateway/local-models
https://docs.openclaw.ai/concepts/models

My recommendation: add a hosted fallback first. Making compaction more aggressive may reduce overflow, but it won’t fix the underlying “local model got stuck / reasoning-only” behavior. A fallback gives OpenClaw somewhere sane to go when the local model stalls.

If you paste openclaw --version + the relevant redacted status --all bit, we can sanity-check the exact config keys before you edit anything. 🦞

woven salmon

This is openclaw 2026.5.7.

lethal axle (BOT)

Thanks — 2026.5.7 is recent enough that I’d treat this as a local-model/session behavior issue, not an obvious “you’re on an old broken build” thing.

Given the timeline, I’d do this in order:

  1. Start a fresh session for the immediate unblock:
/new

or /reset if you want to clear the current thread/session state.

  2. Add a fallback model in config, then restart/reload the gateway. The important shape is:
agents: {
  defaults: {
    model: {
      primary: "lmstudio/your-qwen-model-id",
      fallbacks: ["provider/model-you-trust"]
    }
  }
},
models: {
  mode: "merge"
}

I’m intentionally not naming a specific hosted model here because availability/auth changes fast — use openclaw models status / openclaw models list to pick one you actually have configured.

  3. If LM Studio supports it for that model, use:
api: "openai-responses"

in the LM Studio provider block. That tends to handle reasoning/final text separation better than plain chat completions.

  4. Optional but useful: set compaction to use a stronger/stabler model than the local Qwen:
agents: {
  defaults: {
    compaction: {
      model: "provider/model-you-trust"
    }
  }
}

Docs: https://docs.openclaw.ai/concepts/compaction

The key point: shorter replies/earlier compaction can help, but your actual failure was after compaction, when the local model stalled/reasoning-looped. So fallback + possibly a separate compaction model is the cleaner fix. 🦞
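To make the fallback point concrete, here’s a minimal Python sketch of why a primary + fallbacks chain avoids the dead end you hit. It is not OpenClaw’s implementation: first_visible_answer, the call function and the (reasoning, visible_text) turn shape are all hypothetical, and the model refs are the same placeholders as in the config above.

```python
from typing import Callable, Optional

# Hypothetical turn shape: (reasoning_text, visible_text). Not an OpenClaw type.
Turn = tuple[str, str]


def first_visible_answer(
    chain: list[str], call: Callable[[str], Optional[Turn]]
) -> tuple[str, str]:
    """Try each model ref in order and return (ref, visible_text) from the
    first candidate that produces non-empty visible text."""
    failures = []
    for ref in chain:
        turn = call(ref)
        if turn is None:  # stalled / timed out / transport error
            failures.append(f"{ref}: no response")
            continue
        _reasoning, visible = turn
        if visible.strip():
            return ref, visible
        failures.append(f"{ref}: reasoning-only")
    # With a single-model chain, this is the only possible outcome on failure.
    raise RuntimeError("all candidates failed: " + "; ".join(failures))


# Placeholder refs from the config above; fake responses for illustration.
chain = ["lmstudio/your-qwen-model-id", "provider/model-you-trust"]
fake = {
    "lmstudio/your-qwen-model-id": ("thinking...", ""),
    "provider/model-you-trust": ("", "Here is the answer."),
}
print(first_visible_answer(chain, fake.get))
```

With only the local model in the chain, the same fake responses land in the RuntimeError branch, which is roughly the "no fallback candidate was available" situation from your log.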

woven salmon

How can I diagnose why it went into a loop?

lethal axle (BOT)

You probably won’t be able to see the model’s private “why” directly, but you can diagnose which layer caused the loop: OpenClaw context/compaction, LM Studio transport, or the model itself.

I’d do this:

  1. Pull the exact OpenClaw log window around the failure:
openclaw logs --json --local-time > openclaw-log.jsonl

Then search around 21:25–21:45 for:

rg "auto-compaction|context|overflow|model_call|stalled|reasoning|incomplete|timeout|lmstudio|qwen" openclaw-log.jsonl

What you’re looking for:

  • Did compaction succeed cleanly?
  • Did the model stream any bytes/events after the retry?
  • Was there time-to-first-byte, then no final answer?
  • Did LM Studio return stop reason/error, or did OpenClaw abort on timeout?
  2. Reproduce with targeted model diagnostics enabled. If you run gateway in foreground:
OPENCLAW_DEBUG_MODEL_TRANSPORT=1 \
OPENCLAW_DEBUG_MODEL_PAYLOAD=summary \
OPENCLAW_DEBUG_SSE=events \
openclaw gateway

If you need deeper payload inspection, temporarily use:

OPENCLAW_DEBUG_MODEL_PAYLOAD=full-redacted

Careful: that can include redacted-but-still-private prompt/message content, so don’t paste it publicly.

Docs: https://docs.openclaw.ai/logging

  3. Check LM Studio’s own server log at the same timestamp. The key distinction is:
  • LM Studio never produced tokens/events → backend/model load/serving problem
  • It produced only reasoning/internal-looking output and no final text → model/protocol mismatch or reasoning behavior
  • It produced a final answer but OpenClaw didn’t accept it → adapter/parsing/stop-reason issue worth filing
  4. Compare the pre/post-compaction state:
/compact

Then ask a simple follow-up in the same session. If it loops only after compaction, try setting compaction to a stronger/stabler model, or start a new session and see if the issue disappears.

  5. Sanity-check LM Studio config: for LM Studio, prefer:
api: "openai-responses"

when supported. If you’re using chat completions with a reasoning-heavy Qwen model, that’s a common place for “reasoning but no visible final” weirdness.
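If you’d rather script step 1 than eyeball rg output, this Python sketch applies the same keyword pattern and also restricts hits to the 21:25 to 21:45 window. It assumes each line of openclaw-log.jsonl contains a clock time (for example inside an ISO timestamp); the script name filter_log.py is just illustrative.

```python
import re
import sys
from typing import Iterable, Iterator

# Same alternation as the rg pattern in step 1.
KEYWORDS = re.compile(
    r"auto-compaction|context|overflow|model_call|stalled|reasoning"
    r"|incomplete|timeout|lmstudio|qwen"
)
# First HH:MM[:SS] clock time on the line (also matches inside ISO timestamps
# such as 2026-05-12T21:45:44). Assumes each log line carries a local time.
CLOCK = re.compile(r"(?<!\d)([01]\d|2[0-3]):([0-5]\d)(?::[0-5]\d)?(?!\d)")


def in_window(line: str, start: str, end: str) -> bool:
    m = CLOCK.search(line)
    if m is None:
        return False
    hhmm = f"{m.group(1)}:{m.group(2)}"
    return start <= hhmm <= end  # zero-padded "HH:MM" strings compare correctly


def matching_lines(
    lines: Iterable[str], start: str = "21:25", end: str = "21:45"
) -> Iterator[str]:
    for line in lines:
        if in_window(line, start, end) and KEYWORDS.search(line):
            yield line.rstrip("\n")


if __name__ == "__main__":
    # e.g. python filter_log.py openclaw-log.jsonl
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as fh:
            for hit in matching_lines(fh):
                print(hit)
```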

Most useful thing to share here would be a redacted snippet containing the log lines around auto-compaction start/complete, the first failed model_call, and the terminal incomplete error — not the full payload. 🦞

woven salmon

21:45:44 warn agent/embedded reasoning-only retries exhausted: runId=ccede50e-9169-4229-a2c9-c58b06af8438 sessionId=63e5b1e4-d926-4ad2-af50-271a11375ed8 provider=lmstudio/qwen/qwen3.6-35b-a3b attempts=2/2 — surfacing incomplete-turn error

lethal axle (BOT)

That line is actually very specific: OpenClaw did detect the failure mode and tried to repair it twice.

Meaning:

  • LM Studio/Qwen returned an assistant turn that OpenClaw classified as reasoning-only: thinking/reasoning content existed, but there was no visible assistant text to send back.
  • OpenClaw then injected a “visible-answer continuation” retry instruction.
  • Qwen did the same thing again, twice.
  • After attempts=2/2, OpenClaw stopped retrying and surfaced the incomplete-turn error instead of looping forever.

So the “loop” was not OpenClaw blindly looping; it was the local model repeatedly failing the reply contract.
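Here’s a hedged Python sketch of that bounded retry contract, assuming a simplified turn shape. The Turn class, the continuation wording and run_turn are hypothetical; only the behavior (detect reasoning-only, retry up to 2 times, then surface an error) comes from the log line.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Turn:
    """Hypothetical assistant turn; OpenClaw's real objects look different."""
    reasoning: str
    text: str

    @property
    def reasoning_only(self) -> bool:
        return bool(self.reasoning.strip()) and not self.text.strip()


# Hypothetical wording; the real continuation instruction is internal to OpenClaw.
CONTINUATION = "Reply to the user with your final answer as visible text."


def run_turn(call: Callable[[Optional[str]], Turn], max_retries: int = 2) -> str:
    """call(None) is the normal request; call(CONTINUATION) is a repair retry."""
    turn = call(None)
    attempts = 0
    while turn.reasoning_only and attempts < max_retries:
        attempts += 1
        print(f"reasoning-only turn detected, retrying {attempts}/{max_retries}")
        turn = call(CONTINUATION)
    if turn.reasoning_only:
        # Bounded: give up and surface an error instead of looping forever.
        raise RuntimeError(
            f"reasoning-only retries exhausted: attempts={attempts}/{max_retries}"
        )
    return turn.text
```

A model that keeps answering with reasoning only ends in the RuntimeError after two retries, which is the shape of your attempts=2/2 warning.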

Next logs to find are the earlier two warnings for the same runId:

rg "ccede50e-9169-4229-a2c9-c58b06af8438|reasoning-only assistant turn detected|auto-compaction|model_call|stream" /tmp/openclaw/openclaw-*.log

You should see lines like:

reasoning-only assistant turn detected ... retrying 1/2 with visible-answer continuation
reasoning-only assistant turn detected ... retrying 2/2 with visible-answer continuation

If you want to know why Qwen did that, turn on model transport/SSE diagnostics and reproduce:

OPENCLAW_DEBUG_MODEL_TRANSPORT=1 \
OPENCLAW_DEBUG_MODEL_PAYLOAD=summary \
OPENCLAW_DEBUG_SSE=events \
openclaw gateway

If it still shows response events but no visible final text, I’d suspect one of:

  1. LM Studio adapter/API mismatch; use api: "openai-responses" if supported.
  2. Qwen reasoning mode/template emitting hidden reasoning but not final answer.
  3. Post-compaction prompt confused the local model enough that it stayed in think/analysis mode.
  4. Context/max token settings too optimistic for the served model, causing degraded behavior after compaction.

The practical workaround is fallback + maybe use a stronger/stabler model for compaction. But diagnostically, that warning already tells us the terminal cause: model produced reasoning-only output twice after OpenClaw explicitly asked for visible final text.

woven salmon

21:46:57 debug openclaw verbose
21:46:57 debug openclaw verbose
21:47:08 debug agent/embedded [compaction-diag] start runId=63e5b1e4-d926-4ad2-af50-271a11375ed8 sessionKey=agent:portfolio-guardian:telegram:direct:6938637819 diagId=cmp-mp33q154-3zk_5g trigger=budget provider=lmstudio/qwen/qwen3.6-35b-a3b attempt=1 maxAttempts=1 pre.messages=26 pre.historyTextChars=138144 pre.toolResultChars=138064 pre.estTokens=37684
21:47:08 debug agent/embedded [compaction-diag] contributors diagId=cmp-mp33q154-3zk_5g top=[{"role":"toolResult","chars":16000,"tool":"firecrawl__firecrawl_scrape"},{"role":"toolResult","chars":16000,"tool":"firecrawl__firecrawl_scrape"},{"role":"toolResult","chars":16000,"tool":"firecrawl__firecrawl_scrape"}]
21:47:08 warn agent/embedded [compaction-diag] end runId=63e5b1e4-d926-4ad2-af50-271a11375ed8 sessionKey=agent:portfolio-guardian:telegram:direct:6938637819 diagId=cmp-mp33q154-3zk_5g trigger=budget provider=lmstudio/qwen/qwen3.6-35b-a3b attempt=1 maxAttempts=1 outcome=failed reason=already_compacted_recently durationMs=8473
21:47:08 debug openclaw verbose
21:47:08 debug openclaw verbose
21:47:19 debug agent/embedded [context-diag] pre-prompt: sessionKey=agent:portfolio-guardian:telegram:direct:6938637819 messages=13 roleCounts=assistant:5,compactionSummary:1,toolResult:7 historyTextChars=96416 maxMessageTextChars=16000 historyImageBlocks=0 systemPromptChars=30227 promptChars=604 promptImages=0 provider=lmstudio/qwen/qwen3.6-35b-a3b sessionFile=/home/admin/.openclaw/agents/portfolio-guardian/sessions/63e5b1e4-d926-4ad2-af50-271a11375ed8.jsonl
21:48:45 debug agent/embedded compaction wait aborted: runId=93d0298c-2882-4c18-83d5-9daebf3d35ee sessionId=63e5b1e4-d926-4ad2-af50-271a11375ed8
21:49:15 debug openclaw verbose
21:49:15 debug openclaw verbose

lethal axle (BOT)

This log is the smoking gun. The loop wasn’t random — the session was saturated with huge tool output.

Key bits:

pre.historyTextChars=138144
pre.toolResultChars=138064
contributors ... firecrawl__firecrawl_scrape 16000 x 3

Almost the entire transcript was Firecrawl scrape results. Then after a recent compaction, OpenClaw tried to compact again because of budget pressure, but refused:

outcome=failed reason=already_compacted_recently

That guard exists to avoid thrashing: compact → immediately compact again → compact again. So the next prompt still had:

messages=13
compactionSummary:1
toolResult:7
historyTextChars=96416
systemPromptChars=30227

That is a lot for a local Qwen agent loop: ~96k chars of history, 7 tool results, plus a 30k-char system prompt. My read: Qwen got a post-compaction prompt dominated by scraped web pages and stayed in reasoning mode instead of producing final visible text. OpenClaw retried twice, then stopped.
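To put numbers on "almost the entire transcript", this Python snippet parses the key=value fields from the tail of the [compaction-diag] start line and computes the tool-output share plus a rough chars-per-token ratio. The ratio assumes pre.estTokens tracks roughly the same history text, which I haven’t confirmed.

```python
import re

# key=value tokens; every value used here is free of spaces.
KV = re.compile(r"(\S+?)=(\S+)")


def parse_diag(line: str) -> dict[str, str]:
    return dict(KV.findall(line))


# Tail of the [compaction-diag] start line from the log above.
line = (
    "trigger=budget provider=lmstudio/qwen/qwen3.6-35b-a3b attempt=1 "
    "maxAttempts=1 pre.messages=26 pre.historyTextChars=138144 "
    "pre.toolResultChars=138064 pre.estTokens=37684"
)
d = parse_diag(line)
hist = int(d["pre.historyTextChars"])
tool = int(d["pre.toolResultChars"])
est = int(d["pre.estTokens"])

tool_share = tool / hist      # fraction of history text that is tool output
chars_per_token = hist / est  # rough: assumes estTokens tracks history size
print(f"tool output share: {tool_share:.2%}, ~{chars_per_token:.2f} chars/token")
```

That prints a tool output share of 99.94% at roughly 3.67 chars per token, so only about 80 chars of the pre-compaction history were not tool output.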

What I’d do next:

  1. Start a fresh session for this user/task:
/new
  2. Reduce Firecrawl payload size. Prefer search/summarize/extract modes over dumping full scrape pages into the session. If the tool has params for onlyMainContent, formats, maxAge, limit, etc., use the smallest useful output.

  3. Add/keep a fallback model. This exact failure needs somewhere else to go when local Qwen returns reasoning-only.

  4. Consider a stronger compaction model:

agents: {
  defaults: {
    compaction: {
      model: "provider/model-you-trust"
    }
  }
}
  5. Consider transcript rotation after compaction so old bulky tool output stops riding along in the active transcript:
agents: {
  defaults: {
    compaction: {
      truncateAfterCompaction: true,
      keepRecentTokens: 8000
    }
  }
}

Docs: https://docs.openclaw.ai/concepts/compaction

For diagnosis, inspect that session file and count the largest tool results:

jq -r 'select(.role=="toolResult") | [.toolName, (.content|tostring|length)] | @tsv' \
  /home/admin/.openclaw/agents/portfolio-guardian/sessions/63e5b1e4-d926-4ad2-af50-271a11375ed8.jsonl \
  | sort -k2 -nr | head

If those are mostly Firecrawl, I’d call the cause: large scrape outputs kept in context after recent compaction + local Qwen reasoning-only behavior under pressure.
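If jq isn’t handy, here’s a Python equivalent of that filter. It makes the same assumption the jq command does, that each JSONL line has role, toolName and content at the top level; if OpenClaw nests those fields differently in its session files, adjust the lookups.

```python
import json
import sys
from typing import Iterable


def tool_result_sizes(lines: Iterable[str]) -> list[tuple[str, int]]:
    """(toolName, content length) for each toolResult entry, largest first."""
    rows = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        entry = json.loads(raw)
        # Same assumption as the jq filter: role/toolName/content at top level.
        if not isinstance(entry, dict) or entry.get("role") != "toolResult":
            continue
        content = entry.get("content")
        # Mirror jq's tostring: strings as-is, anything else as compact JSON.
        if isinstance(content, str):
            text = content
        else:
            text = json.dumps(content, separators=(",", ":"), ensure_ascii=False)
        rows.append((entry.get("toolName") or "", len(text)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as fh:
            for name, size in tool_result_sizes(fh)[:10]:  # like `head`
                print(f"{name}\t{size}")
```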

woven salmon

What does truncateAfterCompaction: true do?

lethal axle (BOT)

truncateAfterCompaction: true makes compaction create a new active transcript instead of continuing to use the same giant session file.

Without it, OpenClaw adds a compaction summary, but the active transcript can still stay bulky on disk and may keep hitting size/budget pressure.

With it enabled:

  • OpenClaw summarizes older history into a compaction summary
  • keeps the recent unsummarized tail, controlled by keepRecentTokens
  • starts a fresh successor transcript containing only the summary + recent tail + preserved state
  • archives the previous full JSONL transcript as the checkpoint/source

So in your case, it helps because those huge Firecrawl toolResult blobs stop being carried forward in the active transcript after compaction.
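A rough Python sketch of the successor-transcript step from the bullets above, assuming a crude 4-chars-per-token estimate. Msg, est_tokens and successor_transcript are illustrative names, not OpenClaw internals, and the sketch skips the preserved-state and archiving parts; it only shows how keepRecentTokens decides which recent messages ride along.

```python
import math
from dataclasses import dataclass


@dataclass
class Msg:
    role: str
    text: str


def est_tokens(text: str, chars_per_token: float = 4.0) -> int:
    # Crude chars/4 heuristic, not OpenClaw's estimator.
    return math.ceil(len(text) / chars_per_token)


def successor_transcript(
    history: list[Msg], summary: str, keep_recent_tokens: int
) -> list[Msg]:
    """Summary first, then the newest contiguous messages that fit the budget."""
    tail: list[Msg] = []
    budget = keep_recent_tokens
    for msg in reversed(history):
        cost = est_tokens(msg.text)
        if cost > budget:
            break  # this message and everything older is left to the summary
        tail.append(msg)
        budget -= cost
    tail.reverse()
    return [Msg("compactionSummary", summary)] + tail


# Seven 16,000-char scrape results followed by a short assistant reply.
history = [Msg("toolResult", "x" * 16000) for _ in range(7)]
history.append(Msg("assistant", "short answer"))

for keep in (40000, 8000):
    roles = [m.role for m in successor_transcript(history, "summary", keep)]
    print(keep, roles.count("toolResult"), "tool results kept")
```

Under these assumptions a 40000-token tail keeps all seven scrapes verbatim, while 8000 keeps one.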

Example:

agents: {
  defaults: {
    compaction: {
      truncateAfterCompaction: true,
      keepRecentTokens: 8000
    }
  }
}

Important: the docs note that byte-guard compaction only really works with truncateAfterCompaction: true, because otherwise the active file doesn’t shrink.

Docs: https://docs.openclaw.ai/concepts/compaction

woven salmon

These are the current settings
"compaction": {
  "mode": "safeguard",
  "model": "lmstudio/qwen/qwen3.6-35b-a3b",
  "timeoutSeconds": 1800,
  "reserveTokens": 48000,
  "keepRecentTokens": 40000,
  "reserveTokensFloor": 48000,
  "maxHistoryShare": 0.65,
  "midTurnPrecheck": {
    "enabled": true
  },
  "truncateAfterCompaction": true,
  "maxActiveTranscriptBytes": "40mb",
  "memoryFlush": {
    "enabled": true,
    "model": "lmstudio/qwen/qwen3.6-35b-a3b",
    "softThresholdTokens": 16000,
    "prompt": "Save decisions, state changes, lessons, and open tasks to memory/YYYY-MM-DD.md. Skip routine conversation. If nothing worth storing, reply exactly NO_R>
    "systemPrompt": "Only persist what matters for session continuity. Be concise. If nothing matters, reply exactly NO_REPLY."
  }
},

lethal axle (BOT)

These settings explain why compaction didn’t save you much.

The big issue is this combo:

reserveTokens: 48000,
reserveTokensFloor: 48000,
keepRecentTokens: 40000,
maxHistoryShare: 0.65

That is extremely conservative and keeps a huge recent tail. So OpenClaw is saying:

  • reserve 48k tokens of headroom
  • always keep up to 40k recent tokens verbatim
  • allow history to occupy up to 65% of the context
  • use the same local Qwen model for normal replies, compaction, and memory flush

For a Firecrawl-heavy session, keepRecentTokens: 40000 means a lot of those huge scrape tool results can survive compaction. That matches your log: after compaction, the next prompt still had ~96k chars and 7 tool results.
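Back-of-envelope for that claim, using the chars-per-token ratio implied by your own compaction-diag line (138144 chars vs. 37684 estimated tokens) and the 16,000-char size of the top Firecrawl contributors. It’s an estimate, not what OpenClaw computes internally.

```python
# Rough chars-per-token ratio from the compaction-diag line (138144 chars ~ 37684 tokens).
CHARS_PER_TOKEN = 138144 / 37684  # ~3.67
SCRAPE_CHARS = 16000              # size of each top Firecrawl contributor


def scrapes_that_fit(keep_recent_tokens: int) -> int:
    """How many full 16,000-char scrape results fit in the verbatim tail."""
    return int(keep_recent_tokens * CHARS_PER_TOKEN // SCRAPE_CHARS)


for keep in (40000, 8000):
    print(f"keepRecentTokens={keep}: up to {scrapes_that_fit(keep)} full scrapes")
```

So a 40k tail has room for about nine full scrape pages, while 8k leaves room for about one.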

I’d tune it down. Something saner for this workload:

compaction: {
  mode: "safeguard",
  model: "lmstudio/qwen/qwen3.6-35b-a3b", // ideally use a stronger/stabler hosted model here if available
  timeoutSeconds: 900,

  reserveTokens: 24000,
  reserveTokensFloor: 24000,
  keepRecentTokens: 8000,
  maxHistoryShare: 0.4,

  midTurnPrecheck: { enabled: true },
  truncateAfterCompaction: true,
  maxActiveTranscriptBytes: "20mb",

  memoryFlush: {
    enabled: true,
    model: "lmstudio/qwen/qwen3.6-35b-a3b",
    softThresholdTokens: 6000,
    systemPrompt: "Only persist what matters for session continuity. Be concise. If nothing matters, reply exactly NO_REPLY.",
    prompt: "Save decisions, state changes, lessons, and open tasks to memory/YYYY-MM-DD.md. Skip routine conversation. If nothing worth storing, reply exactly NO_REPLY."
  }
}

If you have any hosted fallback available, my stronger recommendation is: use the hosted/stabler model for compaction.model and maybe memoryFlush.model, while keeping local Qwen as the main chat model if you want. Compaction is exactly where weaker/local models often struggle, because they’re asked to summarize messy oversized context.

Also: maxActiveTranscriptBytes: "40mb" is pretty high. Your failure came from token/context pressure, which a file-size guard doesn’t necessarily catch in time. Dropping to "20mb" is reasonable.

woven salmon

I have a context window of 256K for the model

lethal axle (BOT)

256K helps, but two caveats:

  1. OpenClaw has to know it’s 256K.
  2. LM Studio has to actually be serving the model with that context, not just advertising a model that theoretically supports it.

Your log is the important clue:

pre.estTokens=37684
trigger=budget

If OpenClaw truly believed the effective context was 256K, ~38K estimated tokens should not usually be budget pressure, even with reserveTokens: 48000. That suggests one of these is true:

  • the OpenClaw model catalog entry for lmstudio/qwen/qwen3.6-35b-a3b has a lower contextWindow than 256K
  • LM Studio is running the model with a smaller loaded context length
  • safeguard compaction is applying maxHistoryShare/budget limits before the hard 256K window
  • the estimator/log is undercounting some parts, but the actual provider request is larger

I’d check these:

openclaw models status
openclaw models list --provider lmstudio
openclaw config get models.providers.lmstudio
openclaw config get agents.defaults.models

In your LM Studio provider config, the model entry should explicitly say something like:

models: [
  {
    id: "qwen/qwen3.6-35b-a3b",
    contextWindow: 262144,
    maxTokens: 8192
  }
]

Also check LM Studio’s loaded model settings/UI for context length. If LM Studio is loaded at 64K or 128K, OpenClaw config saying 256K won’t make the backend actually accept/use 256K.

That said: even with a real 256K window, I’d still lower keepRecentTokens. Big context ≠ good reasoning. Seven tool results of up to 16K chars each (mostly Firecrawl scrapes) plus a 30K-char system prompt can make local models wander or produce reasoning-only output long before they hit the hard limit.

So I’d keep the large window, but tune like:

reserveTokens: 48000,        // okay if you want huge headroom
reserveTokensFloor: 24000,   // don’t force 48K everywhere
keepRecentTokens: 8000,      // this is the big one
maxHistoryShare: 0.4

Main thing to verify now: does OpenClaw’s resolved LM Studio model entry actually show contextWindow: 262144?

woven salmon

{
  "id": "qwen/qwen3.6-35b-a3b",
  "name": "Qwen3.6 35B A3B",
  "reasoning": false,
  "input": [
    "text",
    "image"
  ],
  "cost": {
    "input": 0,
    "output": 0,
    "cacheRead": 0,
    "cacheWrite": 0
  },
  "compat": {
    "supportsReasoningEffort": false
  },
  "contextWindow": 262144,
  "contextTokens": 200000,
  "maxTokens": 8192
}

lethal axle (BOT)

Good, that changes the diagnosis a bit: OpenClaw does know the model is large-context.

This part matters:

contextWindow: 262144,
contextTokens: 200000,
maxTokens: 8192

contextWindow is the advertised model window. contextTokens is likely the effective budget OpenClaw is using for prompt assembly. With your reserveTokens: 48000, the rough prompt budget is still about:

200000 - 48000 = 152000 tokens

So your logged pre.estTokens=37684 should not be a hard context overflow. I’d retract the earlier “maybe OpenClaw thinks it’s smaller” suspicion — this config looks fine on that front.
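Same arithmetic in Python, with one labeled assumption: I’m treating the effective reserve as the larger of reserveTokens and reserveTokensFloor. With your config that makes no difference, since both are 48000.

```python
def prompt_budget(context_tokens: int, reserve_tokens: int, reserve_floor: int) -> int:
    # Assumption: the effective reserve is the larger of the two settings.
    return context_tokens - max(reserve_tokens, reserve_floor)


budget = prompt_budget(200_000, 48_000, 48_000)  # contextTokens, reserve, floor
est = 37_684                                     # pre.estTokens from the log
print(f"budget={budget}, used={est / budget:.0%}, headroom={budget - est}")
```

That’s roughly a quarter of the budget in use, with about 114K tokens of headroom left.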


What your logs now point to is more specific:

  1. The transcript was dominated by Firecrawl tool results.
  2. Compaction had already happened recently, so a follow-up compaction was blocked by already_compacted_recently.
  3. The next model call still fit the large window, but Qwen returned reasoning-only output twice.

So this is probably not “ran out of context.” It’s more likely Qwen/LM Studio behavior under a messy tool-heavy prompt.

Two settings still look too generous even for 256K:

keepRecentTokens: 40000,
maxHistoryShare: 0.65

Those let a lot of raw scrape output survive compaction. Big context is useful, but local reasoning models can still get worse when the prompt is full of raw scraped pages.


I’d tune to preserve less raw tail:

keepRecentTokens: 8000,
maxHistoryShare: 0.4

You can probably keep:

reserveTokens: 48000

with a 200K effective budget if you like the headroom.

The other thing I’d investigate is Qwen’s thinking mode/template in LM Studio. Your OpenClaw config says:

reasoning: false

but the model still produced something OpenClaw classified as reasoning-only. For Qwen-family models, check whether LM Studio has a “thinking/reasoning” or chat-template setting enabled. If there’s an option to disable thinking, do that. Some Qwen variants also respond to /no_think, but I’d prefer fixing the LM Studio/template setting rather than relying on prompt magic.

For a clean proof, reproduce with diagnostics and see whether LM Studio streams reasoning blocks/text but no final content:

OPENCLAW_DEBUG_MODEL_TRANSPORT=1 \
OPENCLAW_DEBUG_MODEL_PAYLOAD=summary \
OPENCLAW_DEBUG_SSE=peek \
openclaw gateway

If SSE shows only reasoning-ish events and no visible output, it’s the model/template. If SSE shows a final answer and OpenClaw still calls it reasoning-only, that’s worth filing as a bug with the redacted event shape.
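A small Python helper for sorting a captured stream into those buckets. The event shape ({"kind": ..., "delta": ...}) is hypothetical, not the real LM Studio or OpenAI Responses SSE schema, so you’d map your peek output onto it by hand.

```python
from typing import Iterable, Mapping


def classify(events: Iterable[Mapping[str, str]]) -> str:
    """Sort a captured stream into the cases discussed above.
    Event shape {"kind": "reasoning" | "text", "delta": "..."} is hypothetical."""
    saw_any = saw_reasoning = saw_text = False
    for ev in events:
        saw_any = True
        delta = ev.get("delta", "").strip()
        if ev.get("kind") == "reasoning" and delta:
            saw_reasoning = True
        elif ev.get("kind") == "text" and delta:
            saw_text = True
    if not saw_any:
        return "no events: backend/model load/serving problem"
    if saw_text:
        return "final text present: if OpenClaw still says reasoning-only, file a bug"
    if saw_reasoning:
        return "reasoning only: model/template issue"
    return "events but no content: check the stop reason"
```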

woven salmon

I've turned off thinking in lmstudio

lethal axle (BOT)

Good move — that removes the most likely model-side cause.

Now I’d do one controlled test rather than waiting for the next failure:

  1. Start a fresh session or reset the affected one:
/new
  2. Ask it a normal short question first. Confirm it produces visible text.

  3. Then retry a smaller version of the Firecrawl task. If it works, the root cause was probably the combo of thinking mode + huge scrape context.

I’d still reduce this setting:

keepRecentTokens: 40000

to something like:

keepRecentTokens: 8000

Because even with thinking off, keeping 40k recent tokens means raw Firecrawl pages can survive compaction and dominate the next prompt.

If it happens again with thinking off, grab the same warning plus the preceding OPENCLAW_DEBUG_SSE=peek output. At that point we’re looking for whether LM Studio is still emitting hidden/reasoning-shaped events despite the setting, or whether OpenClaw is misclassifying the response.