#Tools killing tasks

1 messages · Page 1 of 1 (latest)

silk copper
#

Hardware: Beelink GTR9 Pro AMD AI MAX 395+ , 128 GB Unified
Software: Lemonade Server with llamma.cpp rocm backend
Model: Qwen3.5 35B-A3B , 19.70GB

So am using a local LLM and I can keep chatting with a bot for a long time but the minute I give them a task that requires subagent spawn or reading files or web search they just eat up context so fast and get 'terminated' then I gotta message them again to resume the task even though session context isnt full yet.

Any solutions to do this correctly, or where bots can stream tool output slowly rather than dump lots of data at once?

drowsy yachtBOT
#

Yep — what you’re seeing is a common local-model failure mode with tool-heavy runs.

The short version: session history can look “not full” in chat, but tool results (read/web output/subagent chatter) can still blow up the effective prompt fast.

Try this order:

  1. First check version + type of termination
openclaw --version
openclaw models status
openclaw logs --follow

If subagents are involved, also check:

/subagents info <id>

That tells you if it was a timeout vs context blowup.

#
  1. Enable session pruning (biggest win)
{
agents: {
defaults: {
contextPruning: {
mode: "cache-ttl",
ttl: "5m",
keepLastAssistants: 3
}
}
}
}

This trims old tool results before each call (without rewriting transcript history).

  1. Verify your local model’s real context settings
#

In your provider model entry, make sure contextWindow/maxTokens match what llama.cpp server is actually configured for.
Mismatch here causes “early” failures that feel random.

  1. Reduce tool payload size
  • Keep /verbose off (avoid /verbose full)
  • In prompts, force paging/summaries:
  • read in chunks (limit/offset)
  • smaller web fetches (maxChars)
  • fewer search hits (count)
  1. If subagents are timing out
    Set a higher default (or 0 for no timeout):
{
agents: {
defaults: {
#
subagents: {
runTimeoutSeconds: 1800
}
}
}
}

On your streaming question: channel messages don’t do true token-delta streaming yet; OpenClaw supports block/preview streaming modes, but that’s presentation-layer streaming, not incremental tool-result ingestion.

Docs:

#

If you paste openclaw --version + one failing logs --follow snippet, I can help tune exact values for your rig.

silk copper
#

Give me all options to reduce payload size

drowsy yachtBOT
#

Perfect question. Here are all the practical payload-reduction levers in OpenClaw, grouped by what they actually shrink:

1) Shrink model input context (most important)

  1. Prune old tool outputs automatically
{
agents: {
defaults: {
contextPruning: {
mode: "cache-ttl",
ttl: "5m",
keepLastAssistants: 3
}
}
}
#
}

(You can also tune softTrim, hardClear, minPrunableToolChars, tools.allow/deny.)

  1. Compact long sessions
  • /compact before heavy tool tasks
  • /new when topic changes
  1. Reduce tool schema overhead
  • Use a tighter tool profile:
{ tools: { profile: "coding" } }
  • Deny big tools you don’t need:
#
{ tools: { deny: ["browser", "canvas"] } }
  1. Reduce bootstrap/system-prompt size
{
agents: {
defaults: {
bootstrapMaxChars: 12000,
bootstrapTotalMaxChars: 60000
}
}
}

Also keep AGENTS.md, SOUL.md, TOOLS.md, etc. short.

#
  1. Reduce image payload
{ agents: { defaults: { imageMaxDimensionPx: 800 } } }
  1. Reduce skill/plugin prompt overhead
  • Allow only bundled skills you actually use:
{ skills: { allowBundled: ["<needed-skill-1>", "<needed-skill-2>"] } }
  • Disable unused skills/plugins.

2) Shrink tool-result payloads

  1. Web search result count
#
{ tools: { web: { search: { maxResults: 3 } } } }
  1. Web fetch extraction limits
{
tools: {
web: {
fetch: {
maxChars: 8000,
maxCharsCap: 8000,
maxResponseBytes: 1000000
}
}
}
#
}
  1. Prompt behavior (important)
    Tell the bot to paginate reads and summarize:
  • read files in chunks (offset/limit)
  • fetch fewer chars/pages
  • summarize each chunk before continuing

3) Shrink output/message payload (UX, not context)

  1. Keep:
  • /verbose off
  • /reasoning off
#
  1. Stream smaller chunks in Discord (display-layer):
  • channels.discord.streaming: "partial"
  • optional block streaming chunk/coalesce settings

(Helps readability; does not solve context growth by itself.)


4) Model/server alignment (prevents fake “early full”)

  1. Ensure your local model entry has accurate contextWindow + maxTokens matching llama.cpp runtime.

  2. Cap response verbosity via model params (maxTokens) for that model entry.


If you want, I can give you one “aggressive low-payload” config block you can paste directly.

silk copper
#

13:07:33+00:00 info [warn] Subagent announce completion direct announce agent call transient failure, retrying 3/4 in 10s: gateway timeout after 90000ms
Gateway target: ws://127.0.0.1:18789
Source: local loopback
Config: /home/dev/.openclaw/openclaw.json
Bind: loopback

drowsy yachtBOT
#

Nice catch — this log line is super useful.

That warning is not “context full” directly. It’s the subagent completion announce step timing out:

Subagent announce ... transient failure ... gateway timeout after 90000ms

So child run finished, but handoff/delivery back to requester timed out.

What to do next

  1. Check version first
openclaw --version

If you’re below 2026.3.12, update — there was a fix specifically around subagent completion announce timeout/retry behavior.

  1. Reduce concurrency pressure
#
{
agents: {
defaults: {
maxConcurrent: 1,
subagents: {
maxConcurrent: 1
}
}
}
}

(Local llama.cpp + subagents + tool calls can starve the gateway loop.)

  1. Keep your payload controls on (from earlier)
  • contextPruning.mode: "cache-ttl"
  • lower tools.web.search.maxResults
#
  • lower tools.web.fetch.maxChars / maxResponseBytes
  • force paged read usage (offset/limit), not whole files
  • /verbose off
  1. Make subagents return short announces
    In your task prompt, require:
  • “write full output to file”
  • “announce only a short summary (N lines)”

If you send openclaw --version, I can tell you immediately whether this is the known pre-fix behavior or something else.

Changelog: https://github.com/openclaw/openclaw/blob/main/CHANGELOG.md
Sub-agents docs: https://docs.openclaw.ai/tools/subagents
Session pruning: https://docs.openclaw.ai/concepts/session-pruning

GitHub

Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞 - openclaw/openclaw