Tools killing tasks | Friends of the Crustacean 🦞🤝 | Page 1

silk copper Mar 31, 2026, 12:45 PM

Hardware: Beelink GTR9 Pro, AMD Ryzen AI Max+ 395, 128 GB unified memory
Software: Lemonade Server with the llama.cpp ROCm backend
Model: Qwen3.5 35B-A3B, 19.70 GB

So I'm using a local LLM, and I can keep chatting with a bot for a long time, but the minute I give it a task that requires spawning a subagent, reading files, or web search, it eats up context very fast and gets 'terminated'. Then I have to message it again to resume the task, even though the session context isn't full yet.

Is there a way to do this correctly, or a way for bots to stream tool output gradually instead of dumping a lot of data at once?

drowsy yachtBOT Mar 31, 2026, 12:49 PM

Yep — what you’re seeing is a common local-model failure mode with tool-heavy runs.

The short version: session history can look “not full” in chat, but tool results (read/web output/subagent chatter) can still blow up the effective prompt fast.

Try this order:

First, check your version and the type of termination:

openclaw --version
openclaw models status
openclaw logs --follow

If subagents are involved, also check:

/subagents info <id>

That tells you if it was a timeout vs context blowup.

Enable session pruning (biggest win)

{
  agents: {
    defaults: {
      contextPruning: {
        mode: "cache-ttl",
        ttl: "5m",
        keepLastAssistants: 3
      }
    }
  }
}

This trims old tool results before each call (without rewriting transcript history).
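Here is a toy Python model of that behavior, just to make it concrete. It is not OpenClaw's implementation: the Message type, the placeholder text, and the exact rule (once the TTL has passed, clear tool results that sit before the last keepLastAssistants assistant replies) are simplifying assumptions. It builds a new list for the next request and leaves the stored history alone.

```python
from dataclasses import dataclass, replace
from typing import List

@dataclass(frozen=True)
class Message:
    role: str     # "user", "assistant", or "tool"
    content: str

# Hypothetical placeholder; OpenClaw's real marker text may differ.
PLACEHOLDER = "[old tool result cleared]"

def prune_for_next_call(history: List[Message], seconds_since_last_call: float,
                        ttl_seconds: float = 300, keep_last_assistants: int = 3) -> List[Message]:
    """Build the message list for the next model call; `history` itself is not modified."""
    if seconds_since_last_call < ttl_seconds:
        return list(history)  # still inside the TTL window: send everything unchanged
    assistant_positions = [i for i, m in enumerate(history) if m.role == "assistant"]
    if keep_last_assistants <= 0:
        cutoff = len(history)  # nothing is protected
    elif len(assistant_positions) < keep_last_assistants:
        return list(history)  # not enough turns yet: everything still counts as recent
    else:
        # Messages from the Nth-from-last assistant reply onward stay intact.
        cutoff = assistant_positions[-keep_last_assistants]
    return [replace(m, content=PLACEHOLDER) if m.role == "tool" and i < cutoff else m
            for i, m in enumerate(history)]
```

The transcript keeps every tool result; only the copy sent to the model shrinks.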

Verify your local model’s real context settings

In your provider model entry, make sure contextWindow/maxTokens match what llama.cpp server is actually configured for.
A mismatch here causes “early” failures that feel random: the client thinks there is room left while the server has already hit its limit.
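Rough arithmetic for why a mismatch bites: the prompt (history plus tool results) and the reserved reply (maxTokens) have to fit inside the context the server actually allocated. The numbers below are hypothetical, and how OpenClaw budgets internally is an assumption. One more thing worth checking: depending on version and flags, llama-server may split its --ctx-size across its --parallel slots, so a single request can get less than the full window.

```python
def per_request_context(ctx_size: int, parallel_slots: int = 1) -> int:
    # Assumption: the server splits --ctx-size evenly across --parallel slots.
    # Check your llama.cpp / Lemonade version; some setups share one unified cache.
    return ctx_size // parallel_slots

def fits(prompt_tokens: int, context_window: int, max_tokens: int) -> bool:
    """The prompt plus the reserved reply must fit inside the window."""
    return prompt_tokens + max_tokens <= context_window

# Hypothetical numbers.
configured_window = 131072                                     # what the model entry claims
server_window = per_request_context(65536, parallel_slots=2)  # what one request really gets
max_tokens = 8192
prompt_tokens = 40000  # history + a couple of big tool results

print(fits(prompt_tokens, configured_window, max_tokens))  # True  (client-side view)
print(fits(prompt_tokens, server_window, max_tokens))      # False (server-side reality)
```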

Reduce tool payload size

Keep /verbose off (avoid /verbose full)
In prompts, force paging/summaries:
read in chunks (limit/offset)
smaller web fetches (maxChars)
fewer search hits (count)
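A sketch of what those three caps amount to. read_chunk, cap_chars, and top_hits are hypothetical helpers, not OpenClaw tool code, and the 0-based offset is an assumption (check the read tool's docs for its actual convention).

```python
from itertools import islice
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def read_chunk(path: str, offset: int, limit: int) -> List[str]:
    """Return at most `limit` lines starting at 0-based line `offset`, without keeping the rest."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in islice(f, offset, offset + limit)]

def cap_chars(text: str, max_chars: int) -> str:
    """Truncate text and say how much was dropped, so the model knows it can ask for more."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + f"\n[truncated {len(text) - max_chars} chars]"

def top_hits(results: Sequence[T], count: int) -> List[T]:
    """Keep only the first `count` search results."""
    return list(results[:count])
```

Paging through a large file then means calls with offset 0, 200, 400, ... and limit 200, handling each chunk before requesting the next.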

If subagents are timing out
Set a higher default (or 0 for no timeout):

{
  agents: {
    defaults: {
      subagents: {
        runTimeoutSeconds: 1800
      }
    }
  }
}

On your streaming question: channel messages don’t do true token-delta streaming yet; OpenClaw supports block/preview streaming modes, but that’s presentation-layer streaming, not incremental tool-result ingestion.

Docs:

FAQ (context truncation / context too large): https://docs.openclaw.ai/help/faq
Session pruning: https://docs.openclaw.ai/concepts/session-pruning
Streaming/chunking: https://docs.openclaw.ai/concepts/streaming
Local models: https://docs.openclaw.ai/gateway/local-models
Sub-agents: https://docs.openclaw.ai/tools/subagents

If you paste openclaw --version + one failing logs --follow snippet, I can help tune exact values for your rig.

silk copper Mar 31, 2026, 12:58 PM

Give me all options to reduce payload size

drowsy yachtBOT Mar 31, 2026, 1:01 PM

Perfect question. Here are all the practical payload-reduction levers in OpenClaw, grouped by what they actually shrink:

1) Shrink model input context (most important)

Prune old tool outputs automatically

{
  agents: {
    defaults: {
      contextPruning: {
        mode: "cache-ttl",
        ttl: "5m",
        keepLastAssistants: 3
      }
    }
  }
}

(You can also tune softTrim, hardClear, minPrunableToolChars, tools.allow/deny.)

Compact long sessions

/compact before heavy tool tasks
/new when topic changes

Reduce tool schema overhead

Use a tighter tool profile:

{ tools: { profile: "coding" } }

Deny big tools you don’t need:

{ tools: { deny: ["browser", "canvas"] } }
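Why denying tools helps even when they are never called: with OpenAI-compatible chat APIs (which Lemonade Server exposes), the tool definitions are sent along with each request, so every denied tool shrinks every call. The schemas below are hypothetical and heavily abbreviated; this only illustrates the effect of a deny list, not how OpenClaw assembles its tool list.

```python
import json
from typing import Dict, Iterable, List

# Hypothetical, heavily abbreviated tool definitions; real schemas are much larger.
TOOLS: List[Dict] = [
    {"name": "read", "parameters": {"path": "string", "offset": "integer", "limit": "integer"}},
    {"name": "browser", "parameters": {"action": "string", "url": "string", "selector": "string"}},
    {"name": "canvas", "parameters": {"action": "string", "html": "string"}},
]

def apply_deny(tools: List[Dict], deny: Iterable[str]) -> List[Dict]:
    """Drop every tool whose name is on the deny list."""
    denied = set(deny)
    return [t for t in tools if t["name"] not in denied]

def schema_chars(tools: List[Dict]) -> int:
    """Rough size of the tool definitions as serialized into each request."""
    return len(json.dumps(tools))

kept = apply_deny(TOOLS, ["browser", "canvas"])
print([t["name"] for t in kept], schema_chars(TOOLS), "->", schema_chars(kept))
```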

Reduce bootstrap/system-prompt size

{
agents: {
defaults: {
bootstrapMaxChars: 12000,
bootstrapTotalMaxChars: 60000
}
}
}

Also keep AGENTS.md, SOUL.md, TOOLS.md, etc. short.
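A sketch of how a per-file cap and a total cap interact, assuming bootstrapMaxChars limits each injected file and bootstrapTotalMaxChars limits all of them combined. This version simply cuts at the limit and drops files once the total budget is spent; OpenClaw's real truncation may differ (for example, by keeping a file's head and tail with a marker).

```python
from typing import Dict

def build_bootstrap(files: Dict[str, str], max_chars: int = 12000,
                    total_max_chars: int = 60000) -> Dict[str, str]:
    """Clip each file to max_chars and stop once total_max_chars is used up."""
    injected: Dict[str, str] = {}
    remaining = total_max_chars
    for name, text in files.items():  # dicts keep insertion order
        if remaining <= 0:
            break
        clipped = text[:min(max_chars, remaining)]
        injected[name] = clipped
        remaining -= len(clipped)
    return injected
```

Since these files are part of the system prompt, every character that survives the caps is paid for on every turn.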

Reduce image payload

{ agents: { defaults: { imageMaxDimensionPx: 800 } } }
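Assuming imageMaxDimensionPx caps the longer side while preserving the aspect ratio, the arithmetic looks like this (how many tokens an image costs afterwards depends on the vision model):

```python
from typing import Tuple

def fit_within(width: int, height: int, max_px: int = 800) -> Tuple[int, int]:
    """Scale down so the longer side is at most max_px, keeping the aspect ratio."""
    longest = max(width, height)
    if longest <= max_px:
        return width, height  # already small enough: never upscale
    scale = max_px / longest
    return max(1, round(width * scale)), max(1, round(height * scale))

w, h = fit_within(4032, 3024)             # a typical 12 MP phone photo
print(w, h)                               # 800 600
print(round(4032 * 3024 / (w * h), 1))    # 25.4 (times fewer pixels)
```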

Reduce skill/plugin prompt overhead

Allow only bundled skills you actually use:

{ skills: { allowBundled: ["<needed-skill-1>", "<needed-skill-2>"] } }

Disable unused skills/plugins.

2) Shrink tool-result payloads

Web search result count

{ tools: { web: { search: { maxResults: 3 } } } }

Web fetch extraction limits

{
  tools: {
    web: {
      fetch: {
        maxChars: 8000,
        maxCharsCap: 8000,
        maxResponseBytes: 1000000
      }
    }
  }
}
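One plausible reading of the three fetch limits, written as a sketch: maxResponseBytes stops the download, maxChars is the default amount of extracted text returned, and maxCharsCap is a ceiling on whatever maxChars a single call asks for. Those semantics are assumptions (check the web tools docs), and the decode step is a stand-in for real HTML-to-text extraction.

```python
from typing import Optional

def fetch_text(raw: bytes, requested_max_chars: Optional[int] = None,
               max_chars: int = 8000, max_chars_cap: int = 8000,
               max_response_bytes: int = 1_000_000) -> str:
    body = raw[:max_response_bytes]               # download limit (bytes)
    text = body.decode("utf-8", errors="ignore")  # stand-in for HTML-to-text extraction
    limit = max_chars if requested_max_chars is None else requested_max_chars
    limit = min(limit, max_chars_cap)             # a single call cannot exceed the cap
    return text[:limit]
```

Under these assumptions, with maxChars and maxCharsCap both at 8000, no single fetch adds more than 8000 characters of page text to the context, however large the page is.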

Prompt behavior (important)
Tell the bot to paginate reads and summarize:

read files in chunks (offset/limit)
fetch fewer chars/pages
summarize each chunk before continuing
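The "summarize each chunk before continuing" pattern, sketched: only the short notes are carried forward, not the raw chunks. summarize is a parameter because in a real run the model writes the summary; first_line_summary is a hypothetical stand-in so the example runs on its own.

```python
from typing import Callable, Iterator, List

def chunks(lines: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` lines."""
    for start in range(0, len(lines), size):
        yield lines[start:start + size]

def summarize_in_chunks(lines: List[str], size: int,
                        summarize: Callable[[List[str]], str]) -> List[str]:
    """Keep one short note per chunk; the raw chunk is dropped after summarizing."""
    return [summarize(chunk) for chunk in chunks(lines, size)]

def first_line_summary(chunk: List[str]) -> str:
    # Hypothetical stand-in for a model-written summary.
    return f"{len(chunk)} lines, starts with: {chunk[0][:40]}"

rows = [f"row {i}" for i in range(250)]
for note in summarize_in_chunks(rows, 100, first_line_summary):
    print(note)
```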

3) Shrink output/message payload (UX, not context)

Keep:

/verbose off
/reasoning off

Stream smaller chunks in Discord (display-layer):

channels.discord.streaming: "partial"
optional block streaming chunk/coalesce settings

(Helps readability; does not solve context growth by itself.)

4) Model/server alignment (prevents fake “early full”)

Ensure your local model entry has accurate contextWindow + maxTokens matching llama.cpp runtime.
Cap response verbosity via model params (maxTokens) for that model entry.

If you want, I can give you one “aggressive low-payload” config block you can paste directly.

silk copper Mar 31, 2026, 1:10 PM

13:07:33+00:00 info [warn] Subagent announce completion direct announce agent call transient failure, retrying 3/4 in 10s: gateway timeout after 90000ms
Gateway target: ws://127.0.0.1:18789
Source: local loopback
Config: /home/dev/.openclaw/openclaw.json
Bind: loopback

drowsy yachtBOT Mar 31, 2026, 1:12 PM

Nice catch — this log line is super useful.

That warning is not “context full” directly. It’s the subagent completion announce step timing out:

Subagent announce ... transient failure ... gateway timeout after 90000ms

So the child run finished, but the handoff (delivery back to the requester) timed out.
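Reading the log line: each announce attempt waits up to 90000 ms (90 s) for the gateway, and failures are retried after a delay. Below is a generic retry loop plus a worst-case bound, assuming 4 attempts and a fixed 10 s delay as in "retrying 3/4 in 10s"; the real delays may vary per attempt, and this is not OpenClaw's code. Under those assumptions one stuck announce can hold up the handoff for several minutes.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def retry(call: Callable[[], T], attempts: int = 4, delay_s: float = 10.0,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `call` until it succeeds, retrying on TimeoutError up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except TimeoutError:
            if attempt == attempts:
                raise  # out of retries: surface the last timeout
            sleep(delay_s)
    raise AssertionError("unreachable")

def worst_case_seconds(attempts: int, timeout_ms: int, delay_s: float) -> float:
    """Upper bound if every attempt runs into the timeout."""
    return attempts * timeout_ms / 1000 + (attempts - 1) * delay_s

print(worst_case_seconds(4, 90_000, 10))  # 390.0
```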

What to do next

Check version first

openclaw --version

If you’re below 2026.3.12, update — there was a fix specifically around subagent completion announce timeout/retry behavior.

Reduce concurrency pressure

{
  agents: {
    defaults: {
      maxConcurrent: 1,
      subagents: {
        maxConcurrent: 1
      }
    }
  }
}

(Local llama.cpp + subagents + tool calls can starve the gateway loop.)
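The effect of maxConcurrent: 1, modeled with an asyncio semaphore (an illustration of the idea, not OpenClaw's scheduler): runs queue up instead of competing, so at most one of them is hitting the local model at any moment. fake_run is a hypothetical stand-in for a slow model call.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple

async def run_limited(jobs: List[Callable[[], Awaitable[int]]],
                      max_concurrent: int = 1) -> Tuple[List[int], int]:
    """Run jobs with at most `max_concurrent` in flight; also report the peak observed."""
    sem = asyncio.Semaphore(max_concurrent)
    active = 0
    peak = 0

    async def guarded(job: Callable[[], Awaitable[int]]) -> int:
        nonlocal active, peak
        async with sem:
            active += 1
            peak = max(peak, active)
            try:
                return await job()
            finally:
                active -= 1

    results = await asyncio.gather(*(guarded(job) for job in jobs))
    return list(results), peak

async def fake_run(i: int) -> int:
    await asyncio.sleep(0.01)  # stand-in for a slow local-model call
    return i

jobs = [lambda i=i: fake_run(i) for i in range(3)]
print(asyncio.run(run_limited(jobs, max_concurrent=1)))  # ([0, 1, 2], 1)
```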

Keep your payload controls on (from earlier)

contextPruning.mode: "cache-ttl"
lower tools.web.search.maxResults
lower tools.web.fetch.maxChars / maxResponseBytes
force paged read usage (offset/limit), not whole files
/verbose off

Make subagents return short announces
In your task prompt, require:

“write full output to file”
“announce only a short summary (N lines)”
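What that split looks like, sketched in Python: the full result goes to disk and only a few lines travel back through the announce. finish_subagent and the file path are hypothetical; in practice the subagent follows these instructions from its prompt, and taking the first N lines is only a stand-in for a real summary.

```python
from pathlib import Path

def finish_subagent(full_output: str, out_path: str, summary_lines: int = 5) -> str:
    """Write the full result to disk; return a short announce message that points to it."""
    path = Path(out_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(full_output, encoding="utf-8")
    head = full_output.splitlines()[:summary_lines]  # stand-in for a real summary
    return "\n".join(head + [f"(full output: {path})"])
```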

If you send openclaw --version, I can tell you immediately whether this is the known pre-fix behavior or something else.

Changelog: https://github.com/openclaw/openclaw/blob/main/CHANGELOG.md
Sub-agents docs: https://docs.openclaw.ai/tools/subagents
Session pruning: https://docs.openclaw.ai/concepts/session-pruning
