# Can you help me reduce my context size? I think my fixed cost starts at 20k.
Yeah, very plausible.
For local backends, OpenClaw warns below 32k and blocks below 16k, and even at 16k the baseline prompt, tools, workspace files, and chat history can eat the window fast.
First, what does this show?
openclaw --version
Then in the thread try:
/status
/context list
/compact
If it still keeps happening, /new starts fresh.
A few practical fixes:
- use a model with 64k+ context, ideally 128k+ for tool use
- make sure Ollama is using the native URL like http://host:11434, not /v1
- if you must stay local/small, try:
  - agents.defaults.experimental.localModelLean: true
  - compat.supportsTools: false on that model
Docs:
- https://docs.openclaw.ai/help/faq#i-am-getting-context-too-large-errors-how-do-i-reset-or-compact
- https://docs.openclaw.ai/reference/token-use
- https://docs.openclaw.ai/providers/ollama
- https://docs.openclaw.ai/gateway/local-models
Paste openclaw --version and /status and I'll tell you the best next step.
OpenClaw 2026.4.21 (f788c88)
Model: openai-codex/gpt-5.4-mini · oauth (openai-codex:marketing.ps@trrgroup.com (marketing.ps@trrgroup.com))
Fallbacks: openai-codex/gpt-5.4
Tokens: 20k in / 116 out
Context: 20k/200k (10%) · Compactions: 0
Usage: 5h 85% left ⏱2h 9m · Week 96% left ⏱6d 16h
Session: agent:main:main:heartbeat • updated just now
Runtime: direct · Think: low · Text: low
Queue: collect (depth 0)
Yep, that basically confirms it.
Your version is fine, 2026.4.21 is recent. The big signal is this line:
Context: 20k/200k
So your current session is already carrying about 20k tokens. If you point that same kind of session at an Ollama model with only 16k context, it's very likely to fail immediately or act weird.
One important detail though: the status you pasted is for a 200k Codex session, not the Ollama model itself. So the problem is basically:
- current OpenClaw session payload ≈ 20k
- Ollama model window = 16k
- that math does not work
What I'd do next:
- Start fresh before testing Ollama
  /new
- Check what is eating the baseline
  /context list
  /context detail
- If you want Ollama to work locally, don't use 16k
  - 32k is the bare minimum territory
  - 64k+ is much more realistic for tool use
  - 128k feels way better
- Shrink the baseline if you must stay local
  - agents.defaults.experimental.localModelLean: true
  - lower agents.defaults.bootstrapMaxChars
  - lower agents.defaults.bootstrapTotalMaxChars
  - if needed, set that Ollama model to compat.supportsTools: false
- Make sure you're using native Ollama
  - use http://host:11434
  - not .../v1
Docs:
- https://docs.openclaw.ai/reference/token-use
- https://docs.openclaw.ai/providers/ollama
- https://docs.openclaw.ai/gateway/local-models
- https://docs.openclaw.ai/help/faq#i-am-getting-context-too-large-errors-how-do-i-reset-or-compact
If you paste your openclaw models status and Ollama model name, I can help you trim it down to a config that actually fits.
Context breakdown
Workspace: C:\Users\prapunsak.openclaw\workspace
Bootstrap max/file: 32,000 chars
Bootstrap max/total: 180,000 chars
Sandbox: mode=off sandboxed=false
System prompt (run): 66,423 chars (~16,606 tok) (Project Context 37,699 chars (~9,425 tok))
Injected workspace files:
- AGENTS.md: OK | raw 17,305 chars (~4,327 tok) | injected 17,305 chars (~4,327 tok)
- SOUL.md: OK | raw 4,818 chars (~1,205 tok) | injected 4,818 chars (~1,205 tok)
- TOOLS.md: OK | raw 1,636 chars (~409 tok) | injected 1,636 chars (~409 tok)
- IDENTITY.md: OK | raw 3,009 chars (~753 tok) | injected 3,009 chars (~753 tok)
- USER.md: OK | raw 2,214 chars (~554 tok) | injected 2,214 chars (~554 tok)
- HEARTBEAT.md: OK | raw 860 chars (~215 tok) | injected 860 chars (~215 tok)
- BOOTSTRAP.md: MISSING | raw 0 | injected 0
- MEMORY.md: OK | raw 8,179 chars (~2,045 tok) | injected 8,179 chars (~2,045 tok)
Skills list (system prompt text): 9,900 chars (~2,475 tok) (21 skills)
Skills: budget-audit, coding-agent, config-diff-review, gog, healthcheck, knowledge-reconciliation, learning-loop, memory-promotion, memory-watchdog, model-routing, nano-pdf, node-connect, notion, receipt-normalizer, self-improvement-review, session-logs, skill-creator, summarize, taskflow, taskflow-inbox-triage, … (+1 more)
Tool list (system prompt text): 0 chars (~0 tok)
Tool schemas (JSON): 13,944 chars (~3,486 tok) (counts toward context; not shown as text)
Tools: read, edit, write, apply_patch, exec, process, cron, update_plan, sessions_list, sessions_history, sessions_send, sessions_yield, sessions_spawn, subagents, session_status, web_fetch, image
Session tokens (cached): 19,953 total / ctx=200,000
Inline shortcut: a command token inside normal text (e.g. "hey /status") that runs immediately (allowlisted senders only) and is stripped before the model sees the remaining message.
Yep, that's the smoking gun.
Your problem is not really chat history, it's the baseline system prompt.
Right now, before the model even starts replying, you're already at about:
- workspace files: ~9.4k tokens
- skills block: ~2.5k tokens
- tool schemas: ~3.5k tokens
- total system prompt: ~16.6k tokens
So a 16k Ollama model literally has no room left. It's already over budget before the actual conversation does much.
Also, important: /new only helps session history. It does not fix this big bootstrap/system-prompt load.
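The token figures in that breakdown look like roughly 4 characters per token, rounded up. Here is a minimal Python sketch of the same arithmetic, using the character counts from your /context detail output. The 4-chars-per-token ratio is inferred from the dump and the 16,000 window is a round-number assumption about your Ollama setup; neither is an official OpenClaw formula.

```python
import math

CHARS_PER_TOKEN = 4  # assumption: inferred from the dump, not an official ratio


def estimate_tokens(chars: int) -> int:
    """Rough token estimate: characters divided by 4, rounded up."""
    return math.ceil(chars / CHARS_PER_TOKEN)


# Character counts copied from the context breakdown in this thread.
injected_files = {
    "AGENTS.md": 17_305,
    "MEMORY.md": 8_179,
    "SOUL.md": 4_818,
    "IDENTITY.md": 3_009,
    "USER.md": 2_214,
    "TOOLS.md": 1_636,
    "HEARTBEAT.md": 860,
}
system_prompt_chars = 66_423
tool_schema_chars = 13_944

model_window = 16_000  # assumption: the small Ollama window discussed here

file_tokens = {name: estimate_tokens(c) for name, c in injected_files.items()}
system_tokens = estimate_tokens(system_prompt_chars)
schema_tokens = estimate_tokens(tool_schema_chars)

# Largest files first, so the best trim targets are at the top.
for name, tokens in sorted(file_tokens.items(), key=lambda kv: -kv[1]):
    print(f"{name:<13} ~{tokens:>6,} tok")
print(f"system prompt ~{system_tokens:,} tok, tool schemas ~{schema_tokens:,} tok")
print(f"left in a {model_window:,} window: {model_window - system_tokens:,} tok")
```

The last line comes out negative, which is the whole problem: the system prompt alone is larger than the window.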
Biggest wins, in order
- Lower bootstrap limits hard
  - yours: 32,000 per file / 180,000 total
  - docs default: 12,000 per file / 60,000 total
  - for local Ollama, I'd go much lower than your current settings
- Trim the biggest injected files first
  - AGENTS.md is the biggest hit
  - MEMORY.md is also expensive
  - then SOUL.md, IDENTITY.md, USER.md
- Enable lean local mode
  - agents.defaults.experimental.localModelLean: true
- Shrink the skills prompt budget
  - skills.limits.maxSkillsPromptChars
  - or per-agent: agents.list[].skillsLimits.maxSkillsPromptChars
- Use a dedicated lean Ollama agent
  - smaller workspace/bootstrap
  - smaller tool allowlist
  - or disable tools for that model with compat.supportsTools: false
Minimal starter config to try
{
  agents: {
    defaults: {
      bootstrapMaxChars: 4000,
      bootstrapTotalMaxChars: 12000,
      experimental: {
        localModelLean: true,
      },
    },
  },
  skills: {
    limits: {
      maxSkillsPromptChars: 2000,
    },
  },
}
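To see what those caps would mean for your current files, here is a rough Python sketch that compares the raw sizes from your /context detail output against the two limits in the starter config. It only reports which files are over; what OpenClaw actually does with an over-cap file is not something this sketch assumes.

```python
# Caps from the starter config in this message.
BOOTSTRAP_MAX_CHARS = 4_000         # agents.defaults.bootstrapMaxChars
BOOTSTRAP_TOTAL_MAX_CHARS = 12_000  # agents.defaults.bootstrapTotalMaxChars

# Raw sizes copied from the /context detail output earlier in the thread.
raw_sizes = {
    "AGENTS.md": 17_305,
    "MEMORY.md": 8_179,
    "SOUL.md": 4_818,
    "IDENTITY.md": 3_009,
    "USER.md": 2_214,
    "TOOLS.md": 1_636,
    "HEARTBEAT.md": 860,
}


def over_cap(sizes: dict[str, int], per_file_cap: int) -> dict[str, int]:
    """Return how many characters each file exceeds the per-file cap by."""
    return {
        name: size - per_file_cap
        for name, size in sizes.items()
        if size > per_file_cap
    }


excess = over_cap(raw_sizes, BOOTSTRAP_MAX_CHARS)
# Total if every file were already at or under the per-file cap.
capped_total = sum(min(size, BOOTSTRAP_MAX_CHARS) for size in raw_sizes.values())

for name, extra in excess.items():
    print(f"{name}: {extra:,} chars over the per-file cap")
print(f"total at per-file cap: {capped_total:,} / {BOOTSTRAP_TOTAL_MAX_CHARS:,}")
```

With your current files, three are over the per-file cap, and the combined size would still be above the total cap even if each file sat exactly at 4,000 characters. So the caps alone are not enough, the files themselves need trimming.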
If that still isn't enough, then make the Ollama model/tooling leaner too.
My honest recommendation
Keep your normal full agent for cloud models, and make a separate slim Ollama agent for local runs. Trying to cram this current prompt shape into 16k is going to be miserable.
Docs:
- https://docs.openclaw.ai/reference/token-use
- https://docs.openclaw.ai/concepts/context
- https://docs.openclaw.ai/providers/ollama
- https://docs.openclaw.ai/gateway/local-models
If you want, paste your openclaw.json model/agent section and I'll help you turn it into a lean Ollama profile.
Can you help me take a look at my Openclaw Control System Administrator/Technical Support Agent.md?
Yep, absolutely. That file is probably one of the biggest wins.
From your /context detail, AGENTS.md is already ~17.3k chars / ~4.3k tokens, so trimming it could save a lot.
I can't read your machine from here, so paste the file here and I'll go through it with you. If it's huge, send it in 2 to 3 chunks.
If you want, paste just these parts first:
- the top section / role description
- any long rules / policy lists
- examples / templates
- anything that feels "nice to have" instead of "needed every turn"
I'll tell you:
- what should stay in AGENTS.md
- what should move out
- what can be shortened hard
- what is probably burning tokens for no benefit
If you paste it, I'll mark it up.
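If you want to check sizes yourself before pasting anything, here is a small Python sketch that ranks the top-level .md files in a folder by character count, with the same rough 4-chars-per-token estimate the breakdown appears to use. The function name is made up for this example, and the folder you pass in is whatever your own workspace path is.

```python
from pathlib import Path


def rank_markdown_files(workspace: Path) -> list[tuple[str, int, int]]:
    """Return (name, chars, approx_tokens) for top-level .md files, largest first."""
    rows = []
    for path in workspace.glob("*.md"):
        chars = len(path.read_text(encoding="utf-8", errors="replace"))
        rows.append((path.name, chars, -(-chars // 4)))  # ceil(chars / 4)
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Example call (hypothetical path, replace with your own workspace folder):
# for name, chars, tokens in rank_markdown_files(Path("workspace")):
#     print(f"{name:<14} {chars:>7,} chars  ~{tokens:,} tok")
```

Whatever lands at the top of that list is where trimming pays off first.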
I can't send markdown in this chat. Maybe you can help me by writing a prompt for me to put in Claude so that it helps me trim, based on your strict rules, to reduce unnecessary token usage in Agent.md at startup?
Yep, use this. Paste this into Claude first, then paste your Agent.md under it.
You are optimizing an OpenClaw AGENTS.md / agent startup instruction file for minimum token usage.
Context:
- This file is injected into the model context at startup on every run.
- My local Ollama model has a small context window, so every token matters.
- Other files are already injected separately, including things like persona, identity, user info, tools, memory, and heartbeat.
- Because of that, AGENTS.md should NOT duplicate personality, biography, tool docs, examples, or anything that can live elsewhere.
Your job:
Ruthlessly trim the file while preserving only instructions that are truly necessary on most turns.
Strict rules:
- Keep only instructions that are needed frequently and materially change behavior.
- Delete anything that is:
  - repetitive
  - explanatory instead of directive
  - backstory, lore, personality, or flavor
  - long examples
  - duplicated from other startup files
  - duplicated from built-in tool behavior
  - only useful in rare edge cases
- Compress long paragraphs into short bullets.
- Merge overlapping rules.
- Prefer direct commands over explanation.
- If something is useful but not needed every turn, move it out of AGENTS.md and mark it as "move to external doc".
- Assume separate files already handle:
  - persona/tone
  - identity
  - user profile
  - tool descriptions
  - memory notes
- Do not preserve "nice to have" wording.
- Be aggressive. I want a lean operational file, not a manifesto.
Optimization target:
- Reduce total size as much as possible.
- Prefer a hard practical target of under 4,000 characters if possible.
- If that is not possible, still minimize as much as you can.
Output format:
- "What to remove" list
- "What to move out of AGENTS.md" list
- "What must stay" list
- A fully rewritten lean version of the file
- A short explanation of the biggest token savings
Important review test:
For every line, ask:
- Is this needed on most turns?
- Does it prevent a real failure mode?
- Is it already covered elsewhere?
- Is it worth paying for on every single startup?
If the answer is no, remove it.
Do not ask me clarifying questions first.
Do the rewrite immediately.
If you want, after Claude rewrites it, paste the result here and I'll do a second-pass trim.
Yep, use this one for MEMORY.md.
Paste this into Claude first, then paste your full MEMORY.md under it.
You are trimming MEMORY.md for an OpenClaw workspace.
Context:
- MEMORY.md is injected into startup context on many runs, so it must stay extremely lean.
- Its job is to store only durable, high-value facts that repeatedly improve behavior.
- Other files already exist for persona, identity, user profile, tools, and general instructions.
- Daily or temporary notes can live outside MEMORY.md.
Your job:
Ruthlessly reduce MEMORY.md to only the facts that are genuinely worth paying for on every startup.
Keep only:
- durable user preferences that matter often
- stable environment-specific facts
- recurring pain points that still happen repeatedly
- long-lived terminology corrections or naming rules
- persistent safety boundaries or important do-not-do items not already covered elsewhere
Delete or move out:
- one-off incidents
- resolved bugs that are unlikely to recur
- outdated version-specific notes
- release-specific quirks
- temporary troubleshooting notes
- stale reminders
- examples
- explanations
- duplicates of AGENTS.md, USER.md, IDENTITY.md, SOUL.md, or tool docs
- anything that only mattered once
- anything that can live in a dated note or external reference file
Strict rules:
- Prefer very short bullets.
- Remove all filler and explanation.
- Merge overlapping items.
- If a note is old, narrow, or tied to a single historical incident, assume it should be removed or moved out unless it still causes recurring failures.
- MEMORY.md should contain facts, not essays.
- If a fact is useful but not needed on every startup, move it to an archive or dated memory note.
- Be aggressive.
Optimization target:
- Cut total size as much as possible.
- Prefer a practical target of under 2,000 characters if possible.
Output format:
- "Remove entirely"
- "Move to archive or dated memory"
- "Keep"
- A fully rewritten lean MEMORY.md
- A short note on the biggest token savings
Review test for every line:
- Is this durable?
- Is this likely to matter again?
- Does it change behavior in a meaningful way?
- Is it worth injecting every startup?
If no, remove it.
Do not ask clarifying questions first.
Rewrite immediately.
If you want, after Claude trims it, paste the result here and I'll do a second-pass cleanup.
Yep, here's a good one for SOUL.md.
Paste this into Claude first, then paste your full SOUL.md under it.
You are trimming SOUL.md for an OpenClaw workspace.
Context:
- SOUL.md is startup-injected context, so it should be extremely small.
- Its purpose is only to preserve core voice, vibe, and a few non-negotiable personality/boundary cues.
- It should NOT duplicate instructions already covered by AGENTS.md, USER.md, IDENTITY.md, MEMORY.md, or built-in system rules.
- This file is not for long philosophy, lore, backstory, or repeated safety instructions.
Your job:
Ruthlessly compress SOUL.md into the smallest possible file that still preserves the intended personality and tone.
Keep only:
- the smallest set of lines needed to preserve voice
- a few core behavioral truths
- a few genuine boundaries if they are unique to SOUL.md
- the vibe in compact form
Delete or compress:
- lore
- origin story
- philosophy paragraphs
- repeated safety language
- repeated "be helpful / be kind / be careful" wording
- examples
- dramatic phrasing
- duplicate tone guidance
- anything already covered elsewhere
- anything that does not change behavior in a noticeable way
Strict rules:
- Prefer short bullets over paragraphs.
- Keep personality, but make it compact.
- One idea per line.
- Remove repetition and explanation.
- Preserve tone with as few words as possible.
- If a line is nice but not essential, delete it.
- Aim for a file that feels alive, but not bloated.
Optimization target:
- Prefer a practical target of under 1,000 characters if possible.
Output format:
- "Remove entirely"
- "Keep in compressed form"
- A fully rewritten lean SOUL.md
- A short note on what personality traits were preserved
Review test for every line:
- Does this meaningfully shape tone?
- Is it unique to SOUL.md?
- Is it worth paying for on every startup?
- Can it be said in fewer words?
If no, remove or compress it.
Do not ask clarifying questions first.
Rewrite immediately.
If you want, after Claude trims it, paste the result here and I'll help you make it even tighter.
Can you write a prompt for me to trim IDENTITY.md?
Yep, here's one for IDENTITY.md.
Paste this into Claude first, then paste your full IDENTITY.md under it.
You are trimming IDENTITY.md for an OpenClaw workspace.
Context:
- IDENTITY.md is startup-injected context, so it must be tiny.
- Its purpose is only to preserve the agent's essential identity markers.
- It should NOT duplicate tone rules from SOUL.md, operating rules from AGENTS.md, or user/environment facts from USER.md and MEMORY.md.
- This file is for identity, not lore.
Your job:
Reduce IDENTITY.md to the smallest possible file that still preserves the core identity of the agent.
Keep only:
- name
- short role or species label if truly important
- one-line vibe if it materially affects responses
- one or two stable identity cues that are actually used
Delete or compress:
- backstory
- jokes that do not matter
- long vibe descriptions
- repeated personality traits already covered elsewhere
- catchphrases unless they are genuinely important
- lore
- examples
- decorative text
- anything that is nice but not necessary
Strict rules:
- Prefer very short bullets.
- Keep only identity, not behavior policy.
- Remove repetition.
- Compress every line hard.
- If something is already covered in SOUL.md, remove it here.
- If a detail would not be noticeable if removed, delete it.
Optimization target:
- Prefer a practical target of under 500 characters if possible.
Output format:
- "Remove entirely"
- "Keep in compressed form"
- A fully rewritten lean IDENTITY.md
- A one-sentence explanation of what identity was preserved
Review test for every line:
- Is this core identity?
- Is it unique to IDENTITY.md?
- Does it matter on most runs?
- Can it be said in fewer words?
If no, remove it.
Do not ask clarifying questions first.
Rewrite immediately.
If you want, I can also give you a matching one for USER.md and TOOLS.md so you can trim the whole startup stack fast.
Yep, here's one for USER.md.
Paste this into Claude first, then paste your full USER.md under it.
You are trimming USER.md for an OpenClaw workspace.
Context:
- USER.md is startup-injected context, so it should be very small.
- Its purpose is only to preserve stable, high-value facts about the user that help on many runs.
- It should NOT contain long biography, story, philosophy, project history, or repeated preferences that rarely matter.
- Anything temporary, detailed, or rarely relevant should be moved out.
Your job:
Reduce USER.md to the smallest possible file that still preserves the most useful user facts.
Keep only:
- preferred name
- how to address the user
- timezone
- a few stable preferences that matter often
- a few durable facts that frequently improve behavior
Delete or move out:
- long biography
- life story
- project history
- vision statements
- repeated relationship or hierarchy text unless truly necessary
- details that rarely affect replies
- examples
- decorative sections
- anything stale or nice-to-have
Strict rules:
- Prefer short bullets.
- Keep only durable user facts.
- Remove all explanation and storytelling.
- Merge overlapping facts.
- If a detail would only matter occasionally, move it out.
- If a fact is already covered elsewhere, remove it here.
- Be aggressive.
Optimization target:
- Prefer a practical target of under 700 characters if possible.
Output format:
- "Remove entirely"
- "Move to archive or external note"
- "Keep"
- A fully rewritten lean USER.md
- A short explanation of what was preserved
Review test for every line:
- Is this durable?
- Does this improve many future runs?
- Is it worth injecting every startup?
- Can it be said in fewer words?
If no, remove it.
Do not ask clarifying questions first.
Rewrite immediately.
If you want, I can also give you a final one for TOOLS.md, which is usually an easy extra cut.
This is after trimming.
Context breakdown
Workspace: C:\Users\prapunsak.openclaw\workspace
Bootstrap max/file: 32,000 chars
Bootstrap max/total: 180,000 chars
Sandbox: mode=off sandboxed=false
System prompt (run): 38,067 chars (~9,517 tok) (Project Context 9,343 chars (~2,336 tok))
Injected workspace files:
- AGENTS.md: OK | raw 2,844 chars (~711 tok) | injected 2,844 chars (~711 tok)
- SOUL.md: OK | raw 2,265 chars (~567 tok) | injected 2,265 chars (~567 tok)
- TOOLS.md: OK | raw 1,636 chars (~409 tok) | injected 1,636 chars (~409 tok)
- IDENTITY.md: OK | raw 399 chars (~100 tok) | injected 399 chars (~100 tok)
- USER.md: OK | raw 620 chars (~155 tok) | injected 620 chars (~155 tok)
- HEARTBEAT.md: OK | raw 860 chars (~215 tok) | injected 860 chars (~215 tok)
- BOOTSTRAP.md: MISSING | raw 0 | injected 0
- MEMORY.md: OK | raw 1,041 chars (~261 tok) | injected 1,041 chars (~261 tok)
Skills list (system prompt text): 9,900 chars (~2,475 tok) (21 skills)
Skills: budget-audit, coding-agent, config-diff-review, gog, healthcheck, knowledge-reconciliation, learning-loop, memory-promotion, memory-watchdog, model-routing, nano-pdf, node-connect, notion, receipt-normalizer, self-improvement-review, session-logs, skill-creator, summarize, taskflow, taskflow-inbox-triage, … (+1 more)
Tool list (system prompt text): 0 chars (~0 tok)
Tool schemas (JSON): 13,944 chars (~3,486 tok) (counts toward context; not shown as text)
Tools: read, edit, write, apply_patch, exec, process, cron, update_plan, sessions_list, sessions_history, sessions_send, sessions_yield, sessions_spawn, subagents, session_status, web_fetch, image
Session tokens (cached): 13,504 total / ctx=200,000
Inline shortcut: a command token inside normal text (e.g. "hey /status") that runs immediately (allowlisted senders only) and is stripped before the model sees the remaining message.
Nice, that's a big improvement.
You cut the startup prompt from about 16.6k tokens down to 9.5k, and project context from about 9.4k down to 2.3k. That's real progress.
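For the exact numbers, here is a quick Python sketch of the before/after arithmetic, using the token figures from your two context dumps. The helper name is just for illustration.

```python
def reduction(old: int, new: int) -> tuple[int, float]:
    """Return (tokens_saved, percent_saved) rounded to one decimal place."""
    saved = old - new
    return saved, round(100 * saved / old, 1)


# Token figures copied from the two context breakdowns in this thread.
before = {"system prompt": 16_606, "project context": 9_425}
after = {"system prompt": 9_517, "project context": 2_336}

for label in before:
    saved, percent = reduction(before[label], after[label])
    print(f"{label}: {before[label]:,} -> {after[label]:,} ({saved:,} saved, {percent}%)")
```

Both rows save the same 7,089 tokens, which is consistent with all of the reduction coming from the trimmed workspace files rather than from skills or tools.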
A few important takeaways:
- Your workspace files are no longer the main problem
  - they're only about 2.3k tokens now
  - that part looks much healthier
- The new biggest costs are now:
  - tool schemas: ~3.5k
  - skills list: ~2.5k
- 16k is still pretty tight
  - 9.5k startup
  - plus session/history
  - plus user message
  - plus output headroom
So I'd do this next:
- start a fresh session with /new
  - your cached session is still 13.5k
  - that old history can still crush a 16k model
- shrink skills next
  - lower skills.limits.maxSkillsPromptChars
  - or per-agent skillsLimits.maxSkillsPromptChars
- shrink tools next
  - for a local Ollama agent, use a much smaller tool allowlist
  - or set compat.supportsTools: false if you just want plain chat
- lower the bootstrap caps too
  - not because they are hurting now
  - but to stop the files from growing back to 32k / 180k territory later
My honest read:
- you've probably fixed the worst startup bloat
- but 16k is still a cramped target for a tool-using OpenClaw agent
- if you can get to 32k or 64k, life gets much easier
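To put rough numbers on "cramped", here is a Python sketch of the headroom left at each window size. It assumes the tool schemas are counted on top of the reported system prompt, reserves a made-up 1,000 tokens for the reply, and uses round window sizes, so treat the results as ballpark only.

```python
STARTUP_TOKENS = 9_517      # "System prompt (run)" after trimming
TOOL_SCHEMA_TOKENS = 3_486  # "Tool schemas (JSON)"; assumed to be counted on top
OUTPUT_RESERVE = 1_000      # assumption: room kept free for the reply


def headroom(window: int) -> int:
    """Tokens left for history and the user message in a given window."""
    return window - STARTUP_TOKENS - TOOL_SCHEMA_TOKENS - OUTPUT_RESERVE


# Round window sizes as used in this thread, not exact model limits.
for window in (16_000, 32_000, 64_000):
    print(f"{window:>6,} window -> {headroom(window):>6,} tok for history + message")
```

Under those assumptions a 16k window leaves only about 2k tokens for everything else, while 32k leaves roughly 18k.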
If you want, I can write you one more Claude prompt for:
- trimming TOOLS.md, or
- creating a dedicated lean Ollama agent with smaller skills + tools.
Yep, use this for TOOLS.md.
Paste this into Claude first, then paste your full TOOLS.md under it.
You are trimming TOOLS.md for an OpenClaw workspace.
Context:
- TOOLS.md is startup-injected context, so it should be very small.
- Its purpose is only to store environment-specific tool notes that are actually useful on many runs.
- It should NOT duplicate built-in tool docs, workflow essays, examples, long command recipes, or general instructions already covered elsewhere.
- If a tool note is not specific to this machine/setup, it probably should not live in TOOLS.md.
Your job:
Reduce TOOLS.md to the smallest possible file that still preserves the most useful local tool/environment facts.
Keep only:
- concrete environment-specific mappings
- hostnames, aliases, device names, room names, nicknames
- preferred defaults that matter often
- short facts that help the agent use this specific setup correctly
Delete or move out:
- long examples
- workflow tutorials
- command recipes
- coding workflow notes unless truly essential
- generic tool explanations
- decorative sections
- repeated guidance already in AGENTS.md or system behavior
- anything that is not specific to this environment
Strict rules:
- Prefer short bullets.
- Keep only setup-specific facts.
- Remove all explanation and examples unless absolutely necessary.
- If something is a procedure, move it out.
- If something is generic advice, remove it.
- If a detail is rarely used, move it out.
- Be aggressive.
Optimization target:
- Prefer a practical target of under 600 characters if possible.
Output format:
- "Remove entirely"
- "Move to external doc"
- "Keep"
- A fully rewritten lean TOOLS.md
- A short note on the biggest token savings
Review test for every line:
- Is this specific to this machine or environment?
- Does it help often?
- Is it worth injecting every startup?
- Can it be said in fewer words?
If no, remove it.
Do not ask clarifying questions first.
Rewrite immediately.
If you want, after that I can also give you a final prompt for making a dedicated low-context Ollama agent profile.
OpenClaw 2026.4.21 (f788c88)
Model: ollama/qwen2.5:3b-instruct-q4_K_M · api-key (ollama:default)
Fallbacks: openai-codex/gpt-5.4
Fallback: openai-codex/gpt-5.4 · oauth (openai-codex:marketing.ps@trrgroup.com (marketing.ps@trrgroup.com)) (format)
Tokens: 15k in / 34 out
Context: 15k/200k (7%) · Compactions: 0
Session: agent:main:main:heartbeat • updated just now
Tasks: 0 active · 1 total · agent-local
Runtime: direct · Think: low · Text: low
Queue: collect (depth 0)
Context breakdown (detailed)
Workspace: C:\Users\prapunsak.openclaw\workspace
Bootstrap max/file: 32,000 chars
Bootstrap max/total: 180,000 chars
Sandbox: mode=off sandboxed=false
System prompt (run): 38,831 chars (~9,708 tok) (Project Context 10,547 chars (~2,637 tok))
Injected workspace files:
- AGENTS.md: OK | raw 4,148 chars (~1,037 tok) | injected 4,148 chars (~1,037 tok)
- SOUL.md: OK | raw 1,996 chars (~499 tok) | injected 1,996 chars (~499 tok)
- TOOLS.md: OK | raw 1,636 chars (~409 tok) | injected 1,636 chars (~409 tok)
- IDENTITY.md: OK | raw 399 chars (~100 tok) | injected 399 chars (~100 tok)
- USER.md: OK | raw 789 chars (~198 tok) | injected 789 chars (~198 tok)
- HEARTBEAT.md: OK | raw 860 chars (~215 tok) | injected 860 chars (~215 tok)
- BOOTSTRAP.md: MISSING | raw 0 | injected 0
- MEMORY.md: OK | raw 1,041 chars (~261 tok) | injected 1,041 chars (~261 tok)
Skills list (system prompt text): 9,465 chars (~2,367 tok) (20 skills)
Skills: budget-audit, coding-agent, config-diff-review, gog, healthcheck, knowledge-reconciliation, learning-loop, memory-watchdog, model-routing, nano-pdf, node-connect, notion, receipt-normalizer, self-improvement-review, session-logs, skill-creator, summarize, taskflow, taskflow-inbox-triage, workspace-hygiene
Oh, this is interesting, and I think it's the real clue.
Your status says OpenClaw currently thinks this model has:
Context: 15k/200k
But your actual Ollama behavior sounds like the model is really acting more like 16k.
So the likely problem now is context-window mismatch:
- OpenClaw believes the model window is 200k
- so it happily sends a ~15k input
- but if Ollama/the model only really supports ~16k, you're already at the cliff before output/tool overhead
That would absolutely explain weird failures.
What I'd check next:
- Make sure OpenClaw's model metadata matches reality
  - if you defined the Ollama model manually, set the model's contextWindow correctly
  - if it was auto-discovered, the Ollama host may be reporting a window larger than what the model actually tolerates in practice
- Start a fresh session after changing it
  /new
- Be careful with this specific model
  - qwen2.5:3b-instruct-q4_K_M is a very small local model for a tool-using agent
  - even with the context fixed, it may still be flaky on tool schemas / agent prompts
- If it still struggles, disable tool pressure
  - use a lean agent/tool set
  - or set that model to compat.supportsTools: false
So my short read is:
- your trimming work helped a lot
- but now the issue looks less like "prompt too fat"
- and more like "OpenClaw thinks the model has 200k, but the backend behaves like 16k"
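The mismatch idea can be sketched in a few lines of Python: whatever window the client believes in, the usable one is the smaller of the two. The 16,000 backend limit is only the suspected value here, and the 2,000-token reserve for output and tool overhead is an assumption for illustration.

```python
def effective_window(configured: int, backend: int) -> int:
    """The usable window is whichever side is smaller."""
    return min(configured, backend)


def fits(input_tokens: int, configured: int, backend: int, reserve: int = 2_000) -> bool:
    """True if the input plus a reserve fits inside the smaller window."""
    return input_tokens + reserve <= effective_window(configured, backend)


# ~15k input from the status above; backend=16_000 is the suspected real limit.
print(fits(15_000, configured=200_000, backend=16_000))
print(fits(15_000, configured=200_000, backend=200_000))
```

If the first check fails while the second passes, the client would send a request that the backend cannot actually hold.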
Docs:
- https://docs.openclaw.ai/providers/ollama
- https://docs.openclaw.ai/gateway/local-models
- https://docs.openclaw.ai/reference/token-use
If you want, next I can help you with the exact part: checking whether this model is auto-discovered vs manually defined, and which contextWindow value OpenClaw should really use.
This confirms it pretty cleanly.
Your workspace bloat is mostly solved now. The startup floor is about 9.7k tokens, and only about 2.6k of that is project context.
So the remaining pressure is mostly:
- skills list: ~2.4k
- tool schemas: likely still a few k from the earlier dump
- session/history
- output headroom
So for a backend that behaves like 16k, you're still living dangerously close to the ceiling.
Big takeaway:
- AGENTS.md is not the main problem anymore
- the next big wins are skills + tools + real contextWindow alignment
What I'd do next:
- Start fresh
  /new
- Cut skills harder
  - lower skills.limits.maxSkillsPromptChars
  - for a tiny local model, I'd be pretty aggressive here
- Run a much leaner local agent
  - tiny tool allowlist
  - or compat.supportsTools: false if you mainly want plain chat
- Fix the model metadata if needed
  - if OpenClaw thinks the model is 200k but the backend behaves like 16k, that mismatch still needs fixing
- Be realistic about this model
  - qwen2.5:3b-instruct-q4_K_M is very small for a tool-using OpenClaw agent
  - even with trimming, it may still struggle
So, short version: you already won the file-trimming battle. The next bottleneck is no longer AGENTS.md, it's the runtime surface around skills/tools/contextWindow.
If you want, I can help you with the next best step:
- a lean Ollama agent config, or
- a Claude prompt to trim the skill/tool surface.
I am only trying to use Ollama Qwen for heartbeat right now. The 200k is the fallback, GPT 5.4. Qwen supports up to 32k.
Ah, got it, that changes the picture.
If:
- Qwen is only for heartbeat
- and the model really has 32k
- and your trimmed startup is about 9.7k
then raw context size is probably no longer the main blocker. That should fit.
So at this point I'd suspect one of these instead:
- actual Ollama runtime limit is lower than expected
  - model says 32k, but runtime num_ctx or practical limit is lower
- small-model fragility
  - qwen2.5:3b-instruct-q4_K_M is still a small quantized model
  - it may choke on the full agent prompt shape even if the token count technically fits
- heartbeat does not need tools, but the model still sees tool/schema overhead
  - for heartbeat, that's mostly wasted pressure
For heartbeat specifically, I'd make it as lean as possible:
- disable tool pressure for that model if you can
  compat.supportsTools: false
- cut skills prompt budget hard
  - heartbeat does not need a big skills block
- keep thinking low or off
  - heartbeat should be tiny and dumb, not agentic
So my updated read is:
- your trimming work probably fixed the original startup bloat
- if heartbeat still fails, the next issue is likely Qwen 3B reliability / actual runtime context, not just token size
If you want the fastest next step, paste the exact heartbeat failure/log message. That will tell us whether this is:
- real context overflow
- Ollama runtime config
- or just the local model being too weak for the prompt shape.
13:43:29 [agent/embedded] embedded run agent end: runId=b909487b-77f8-4f54-9f0e-d3c9d0dd6380 isError=true model=qwen2.5:3b-instruct-q4_K_M provider=ollama error=LLM request failed: provider rejected the request schema or tool payload. rawError=400 {"error":"\"qwen2.5:3b-instruct-q4_K_M\" does not support thinking"}
13:43:29 [ws] res sessions.list 151ms conn=fb10ff70…4237 id=63274717…d7e0
13:43:29 [agent/embedded] auth profile failure state updated: runId=b909487b-77f8-4f54-9f0e-d3c9d0dd6380 profile=sha256:9c018ec112cf provider=ollama reason=format window=cooldown reused=false
13:43:29 [agent/embedded] embedded run failover decision: runId=b909487b-77f8-4f54-9f0e-d3c9d0dd6380 stage=assistant decision=fallback_model reason=format from=ollama/qwen2.5:3b-instruct-q4_K_M profile=sha256:9c018ec112cf
Yep, we found it.
This is not the context issue.
The real error is here:
rawError=400 {"error":"\"qwen2.5:3b-instruct-q4_K_M\" does not support thinking"}
So OpenClaw is sending a thinking setting to Ollama, and this Qwen model rejects it.
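Since the generic wrapper text hides the useful part, here is a small Python sketch that pulls the provider's own message out of a log line. It assumes the line ends with rawError=<status> followed by a JSON object, as in the log you pasted; other log formats would need a different pattern.

```python
import json
import re

# Matches "rawError=400 {...}" at the end of a log line.
RAW_ERROR = re.compile(r"rawError=(\d{3}) (\{.*\})\s*$")


def parse_raw_error(line: str):
    """Return (http_status, provider_message) from a log line, or None."""
    match = RAW_ERROR.search(line)
    if match is None:
        return None
    status = int(match.group(1))
    try:
        body = json.loads(match.group(2))
    except json.JSONDecodeError:
        return status, match.group(2)  # keep the raw text if it is not valid JSON
    if isinstance(body, dict):
        return status, body.get("error", "")
    return status, match.group(2)


line = (
    "error=LLM request failed: provider rejected the request schema or tool payload. "
    'rawError=400 {"error":"\\"qwen2.5:3b-instruct-q4_K_M\\" does not support thinking"}'
)
print(parse_raw_error(line))
```

Reading the provider message first, and the wrapper second, is usually the faster way to triage these failures.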
Your status already hinted at it:
Runtime: ... Think: low
So the fix is:
- turn thinking off for that heartbeat session/agent
  - quick test: /think off
- make it persistent for the heartbeat/local agent
  - set that agent's thinkingDefault: "off"
If you want it globally for local runs:
{
  agents: {
    defaults: {
      thinkingDefault: "off",
    },
  },
}
Better if only the heartbeat/local agent should use it:
{
  agents: {
    list: [
      {
        id: "your-heartbeat-agent",
        thinkingDefault: "off",
      },
    ],
  },
}
Why this matters:
- heartbeat does not need thinking
- Qwen is rejecting the request before the real reply even starts
- that generic "schema or tool payload" wrapper is misleading, but the raw error is clear
So I'd do this in order:
- /think off
- rerun heartbeat
- if it still fails, then we look at tools next
I'm glad you grabbed the log, that made the real problem obvious.
With thinking off, the heartbeat session still times out:
13:55:34 [agent/embedded] Profile ollama:default timed out. Trying next account...
13:55:34 [agent/embedded] embedded run failover decision: runId=a11a0d12-63fd-427a-b4ae-005d3ce4bdfb stage=assistant decision=fallback_model reason=timeout from=openclaw/gateway-injected profile=sha256:9c018ec112cf
13:55:34 [diagnostic] lane task error: lane=main durationMs=128237 error="FailoverError: LLM request timed out."
13:55:34 [diagnostic] lane task error: lane=session:agent:main:main:heartbeat durationMs=128242 error="FailoverError: LLM request timed out."
That's actually useful, because it means we moved past the thinking error.
Now the failure mode is:
- request accepted
- model runs
- then times out after ~128s
So this is no longer "bad request", it's now either:
- the local Ollama model is just too slow / wedged, or
- the full heartbeat prompt is still too heavy for this 3B model
Best next test, isolate the model from the full agent prompt:
openclaw infer model run --model ollama/qwen2.5:3b-instruct-q4_K_M --prompt "Reply with exactly HEARTBEAT_OK" --thinking off --json
How to read that result:
- if this also times out
  -> it's an Ollama/model/runtime issue, not heartbeat specifically
- if this succeeds quickly
  -> the model can answer, but the full OpenClaw heartbeat prompt is still too much
  -> then I'd do:
    - agents.defaults.experimental.localModelLean: true
    - disable tools for that model with compat.supportsTools: false
    - cut skills prompt budget harder
For heartbeat specifically, I'd strongly lean toward:
- thinking off
- tools off
- minimal skills
- basically a tiny text-only local agent
Because qwen2.5:3b-instruct-q4_K_M is really small for the normal OpenClaw runtime shape.
So the clean next move is: run that openclaw infer model run ... smoke test and paste the result.
That will tell us immediately whether this is:
- Ollama/runtime slowness, or
- agent prompt pressure.
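If you want to take OpenClaw out of the picture entirely, a rough sketch like this talks to Ollama's native `/api/generate` endpoint directly. Assumptions on my side: Ollama is listening on the default `http://localhost:11434`, and the 60-second timeout is an arbitrary choice, not an OpenClaw setting.

```python
import json
import time
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: default local Ollama address
MODEL = "qwen2.5:3b-instruct-q4_K_M"


def build_request(base_url, model, prompt):
    """Build a non-streaming POST for Ollama's native /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def smoke_test(base_url=OLLAMA_URL, model=MODEL, timeout_s=60.0):
    """Send one tiny prompt; return the reply text and the wall-clock seconds it took."""
    request = build_request(base_url, model, "Reply with exactly HEARTBEAT_OK")
    started = time.monotonic()
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return {
        "reply": payload.get("response", "").strip(),
        "seconds": round(time.monotonic() - started, 1),
    }

# Usage (needs a running Ollama): print(smoke_test())
```

If `smoke_test()` comes back in a few seconds, the model itself is fine and the pressure is coming from the agent prompt. If it hangs or times out, the problem sits in Ollama or the machine, before OpenClaw is even involved.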
PS C:\WINDOWS\System32> openclaw infer model run --model ollama/qwen2.5:3b-instruct-q4_K_M --prompt "Reply with exactly HEARTBEAT_OK" --thinking off --json
error: unknown option '--thinking'
Yep, my bad, sorry, infer model run on your build doesn't take --thinking.
Use openclaw agent for this test instead:
openclaw agent --local --agent <your-heartbeat-agent-id> --message "Reply with exactly HEARTBEAT_OK" --thinking off --json
If you don't know the agent id, run this first:
openclaw agents list
Then use the same agent your heartbeat is using.
Why this test is better:
- it uses the normal agent runtime
- it lets you force --thinking off
- it tells us whether the model can answer a tiny prompt without the heartbeat loop getting in the way
How to read it:
- if this also times out
  -> Ollama/model/runtime is the bottleneck
- if this succeeds quickly
  -> the heartbeat setup/prompt surface is still too heavy
So the corrected next step is:
- openclaw agents list
- run the openclaw agent --local ... --thinking off ... test
- paste the result
Sorry again, that earlier command was on me.
PS C:\WINDOWS\System32> openclaw agent --local --agent main --message "Reply with exactly HEARTBEAT_OK" --thinking off --json
{
"payloads": [
{
"text": "HEARTBEAT_OK",
"mediaUrl": null
}
],
"meta": {
"durationMs": 46349,
"agentMeta": {
"sessionId": "fe6bf557-9fd2-425e-9ce6-4660ecc6438f",
"provider": "openai-codex",
"model": "gpt-5.4-mini",
"usage": {
"input": 15499,
"output": 9,
"total": 15508
},
"lastCallUsage": {
"input": 15499,
"output": 9,
"cacheRead": 0,
"cacheWrite": 0,
"total": 15508
},
"promptTokens": 15499
},
"aborted": false,
"systemPromptReport": {
"source": "run",
"generatedAt": 1776928540903,
"sessionId": "fe6bf557-9fd2-425e-9ce6-4660ecc6438f",
"sessionKey": "agent:main:main",
"provider": "openai-codex",
"model": "gpt-5.4-mini",
"workspaceDir": "C:\Users\prapunsak\.openclaw\workspace",
"bootstrapMaxChars": 32000,
"bootstrapTotalMaxChars": 180000,
"bootstrapTruncation": {
"warningMode": "once",
"warningShown": false,
"truncatedFiles": 0,
"nearLimitFiles": 0,
"totalNearLimit": false
},
"sandbox": {
"mode": "off",
"sandboxed": false
},
"systemPrompt": {
"chars": 37673,
"projectContextChars": 10547,
"nonProjectContextChars": 27126
},
"injectedWorkspaceFiles": [
{
"name": "AGENTS.md",
"path": "C:\Users\prapunsak\.openclaw\workspace\AGENTS.md",
"missing": false,
"rawChars": 4148,
"injectedChars": 4148,
"truncated": false
},
{
"name": "SOUL.md",
"path": "C:\Users\prapunsak\.openclaw\workspace\SOUL.md",
"missing": false,
"rawChars": 1996,
"injectedChars": 1996,
"truncated": false
},
{
"name": "TOOLS.md",
"path": "C:\Users\prapunsak\.openclaw\workspace\TOOLS.md",
"missing": false,
"rawChars": 1636,
"injectedChars": 1636,
"truncated": false
},
{
"name": "IDENTITY.md",
"path": "C:\Users\prapunsak\.openclaw\workspace\IDENTITY.md",
"missing": false,
"rawChars": 399,
"injectedChars": 399,
"truncated": false
},
{
"name": "USER.md",
"path": "C:\Users\prapunsak\.openclaw\workspace\USER.md",
"missing": false,
"rawChars": 789,
"injectedChars": 789,
"truncated": false
},
{
"name": "HEARTBEAT.md",
"path": "C:\Users\prapunsak\.openclaw\workspace\HEARTBEAT.md",
"missing": false,
"rawChars": 860,
"injectedChars": 860,
"truncated": false
},
{
"name": "BOOTSTRAP.md",
"path": "C:\Users\prapunsak\.openclaw\workspace\BOOTSTRAP.md",
"missing": true,
"rawChars": 0,
"injectedChars": 0,
"truncated": false
},
{
"name": "MEMORY.md",
"path": "C:\Users\prapunsak\.openclaw\workspace\MEMORY.md",
"missing": false,
"rawChars": 1041,
"injectedChars": 1041,
"truncated": false
}
],
"skills": {
"promptChars": 9900,
"entries": [
{
"name": "budget-audit",
"blockChars": 351
},
{
"name": "coding-agent",
"blockChars": 836
},
{
"name": "config-diff-review",
"blockChars": 444
},
{
"name": "gog",
"blockChars": 236
},
{
"name": "healthcheck",
"blockChars": 495
},
{
"name": "knowledge-reconciliation",
"blockChars": 530
},
{
"name": "learning-loop",
"blockChars": 520
},
{
"name": "memory-promotion",
"blockChars": 432
},
{
"name": "memory-watchdog",
"blockChars": 466
},
{
"name": "model-routing",
"blockChars": 502
},
{
"name": "nano-pdf",
"blockChars": 238
},
{
"name": "node-connect",
"blockChars": 545
},
{
"name": "notion",
"blockChars": 232
},
{
"name": "receipt-normalizer",
"blockChars": 433
},
{
"name": "self-improvement-review",
"blockChars": 450
},
{
"name": "session-logs",
"blockChars": 257
},
{
"name": "skill-creator",
"blockChars": 763
},
{
"name": "summarize",
"blockChars": 300
},
{
"name": "taskflow",
"blockChars": 547
},
{
"name": "taskflow-inbox-triage",
"blockChars": 422
},
{
"name": "workspace-hygiene",
"blockChars": 472
}
]
},
"tools": {
"listChars": 0,
"schemaChars": 14369,
"entries": [
{
"name": "read",
"summaryChars": 298,
"schemaChars": 304,
"propertiesCount": 3
},
{
"name": "edit",
"summaryChars": 326,
"schemaChars": 834,
"propertiesCount": 2
},
{
"name": "write",
"summaryChars": 127,
"schemaChars": 225,
"propertiesCount": 2
},
{
"name": "apply_patch",
"summaryChars": 132,
"schemaChars": 153,
"propertiesCount": 1
},
{
"name": "exec",
"summaryChars": 968,
"schemaChars": 1157,
"propertiesCount": 12
},
{
"name": "process",
"summaryChars": 416,
"schemaChars": 961,
"propertiesCount": 12
},
{
"name": "cron",
"summaryChars": 3726,
"schemaChars": 6700,
"propertiesCount": 13
},
{
"name": "update_plan",
"summaryChars": 251,
"schemaChars": 574,
"propertiesCount": 2
},
{
"name": "sessions_list",
"summaryChars": 177,
"schemaChars": 212,
"propertiesCount": 4
},
{
"name": "sessions_history",
"summaryChars": 180,
"schemaChars": 161,
"propertiesCount": 3
},
{
"name": "sessions_send",
"summaryChars": 208,
"schemaChars": 274,
"propertiesCount": 5
},
{
"name": "sessions_yield",
"summaryChars": 97,
"schemaChars": 60,
"propertiesCount": 1
},
{
"name": "sessions_spawn",
"summaryChars": 302,
"schemaChars": 1333,
"propertiesCount": 18
},
{
"name": "subagents",
"summaryChars": 105,
"schemaChars": 191,
"propertiesCount": 4
},
{
"name": "session_status",
"summaryChars": 336,
"schemaChars": 89,
"propertiesCount": 2
},
{
"name": "web_search",
"summaryChars": 93,
"schemaChars": 425,
"propertiesCount": 4
},
{
"name": "web_fetch",
"summaryChars": 129,
"schemaChars": 374,
"propertiesCount": 3
},
{
"name": "image",
"summaryChars": 260,
"schemaChars": 342,
"propertiesCount": 6
}
]
}
},
"finalPromptText": "Reply with exactly HEARTBEAT_OK",
"finalAssistantVisibleText": "HEARTBEAT_OK",
"finalAssistantRawText": "HEARTBEAT_OK",
"replayInvalid": false,
"livenessState": "working",
"stopReason": "stop",
"executionTrace": {
"winnerProvider": "openai-codex",
"winnerModel": "gpt-5.4-mini",
"attempts": [
{
"provider": "openai-codex",
"model": "gpt-5.4-mini",
"result": "success",
"stage": "assistant"
}
],
"fallbackUsed": false,
"runner": "embedded"
},
"requestShaping": {
"authMode": "auth-profile",
"thinking": "off"
},
"completion": {
"stopReason": "stop",
"finishReason": "stop"
}
}
}
This is helpful, and it also shows why the test didn't prove the Ollama path yet.
That run used:
- provider: openai-codex
- model: gpt-5.4-mini
So --agent main hit your normal main session, not the heartbeat session that was using Ollama.
You can even see it in the session key:
- test command used agent:main:main
- earlier heartbeat was agent:main:main:heartbeat
So the next correct test is to target the actual heartbeat session id, not just --agent main.
Try:
openclaw sessions --json
Find the session whose key is:
agent:main:main:heartbeat
Then run:
openclaw agent --local --session-id <that-session-id> --message "Reply with exactly HEARTBEAT_OK" --thinking off --json
Why this matters:
- session-level model overrides live on the session
- your heartbeat session is the one pinned to Ollama/Qwen
- --agent main created/used a different session that stayed on Codex
So right now we've confirmed:
- main agent works
- Codex fallback works
- but we have not yet directly tested the heartbeat Ollama session
If you paste the openclaw sessions --json bit for the heartbeat row, I can help you point the next command at the exact right target.
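To avoid eyeballing a thousand lines of JSON next time, a small sketch like this pulls out the fields that show which path a run actually took. It relies only on field names visible in the pasted `openclaw agent ... --json` output; the trimmed `sample` is copied from that paste, and the function names are just illustrative.

```python
def summarize_run(report):
    """Extract the routing-relevant fields from an `openclaw agent --json` result."""
    meta = report.get("meta", {})
    trace = meta.get("executionTrace", {})
    return {
        "provider": trace.get("winnerProvider"),
        "model": trace.get("winnerModel"),
        "fallbackUsed": trace.get("fallbackUsed"),
        "sessionKey": meta.get("systemPromptReport", {}).get("sessionKey"),
        "thinking": meta.get("requestShaping", {}).get("thinking"),
    }


def hit_expected_provider(report, expected_provider):
    """True only if the winning provider is the one the test meant to exercise."""
    return summarize_run(report)["provider"] == expected_provider


# Trimmed copy of the pasted run (values taken from the output above).
sample = {
    "meta": {
        "systemPromptReport": {"sessionKey": "agent:main:main"},
        "executionTrace": {
            "winnerProvider": "openai-codex",
            "winnerModel": "gpt-5.4-mini",
            "fallbackUsed": False,
        },
        "requestShaping": {"thinking": "off"},
    }
}

print(summarize_run(sample))
print("hit ollama:", hit_expected_provider(sample, "ollama"))
```

For the pasted run this reports `openai-codex` on `agent:main:main`, which is exactly why the test did not count as an Ollama check.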
Close, but that's still the systemPromptReport chunk, not the sessions list row I need.
What I need is the entry for the heartbeat session from:
openclaw sessions --json
Easiest way on PowerShell:
openclaw sessions --json | Select-String "agent:main:main:heartbeat" -Context 0,20
Paste the chunk that includes things like:
- sessionId
- sessionKey
- ideally model/provider if shown
Right now we're trying to grab the actual heartbeat session id, because that's the one using Ollama. The --agent main test hit Codex instead.
So, one more try:
openclaw sessions --json | Select-String "heartbeat" -Context 0,20
Paste that result and I'll help you target the exact session.
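If the Select-String output is hard to read, a structure-agnostic search is another option. I don't know the exact shape of the `openclaw sessions --json` output, so this sketch assumes nothing about it: it walks whatever JSON it is given and returns every object that has a string value containing the heartbeat key. The `sample` shape below is hypothetical and only there to show the call.

```python
def find_objects(node, needle):
    """Yield every JSON object that has a string value containing `needle`."""
    if isinstance(node, dict):
        if any(isinstance(value, str) and needle in value for value in node.values()):
            yield node
        for value in node.values():
            yield from find_objects(value, needle)
    elif isinstance(node, list):
        for item in node:
            yield from find_objects(item, needle)


# Hypothetical shape, for illustration only; the real output may differ.
sample = {
    "sessions": [
        {"key": "agent:main:main", "sessionId": "a"},
        {"key": "agent:main:main:heartbeat", "sessionId": "b"},
    ]
}

for row in find_objects(sample, "agent:main:main:heartbeat"):
    print(row)
```

Feed it the parsed output of the sessions command and the matching row should include the session id to pass to `--session-id`.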
No worries, that's still not the session row, but honestly this chunk is enough to see the next bottleneck.
These two numbers are the big deal now:
- skills: 9900 chars
- tool schemas: 14369 chars
That is a lot of runtime surface for a 3B Qwen heartbeat model.
So I'd stop trimming the markdown files now. You already got the file bloat way down. The next problem is the runtime prompt shape, not AGENTS.md.
What to do next, in order:
- keep thinking off
  - we already know that part mattered
- turn on lean local mode
  - agents.defaults.experimental.localModelLean: true
- disable tool calling for this Qwen model
  - compat.supportsTools: false
- shrink the skills prompt budget hard
  - skills.limits.maxSkillsPromptChars
  - or better, a much smaller per-agent skillsLimits.maxSkillsPromptChars for the heartbeat/local agent
At this point, that's the real path forward.
My honest read now:
- your markdown trimming helped a lot
- but Qwen 3B + full OpenClaw skill/tool surface is still too heavy
- for heartbeat, I'd treat it as a tiny text-only model, not a full agentic model
So the practical target is:
- thinking off
- tools off
- smaller skills block
- lean local mode on
If you want, I can write you a single Claude prompt that says:
"rewrite my OpenClaw config for a dedicated ultra-lean local heartbeat agent using Qwen"
That might be the fastest next move.
Yep, that's the smoking gun.
For your local Qwen heartbeat model, these tool schemas are wildly expensive:
- cron → 6700 schema chars
- sessions_spawn → 1333
- exec → 1157
- process → 961
That is a ton of overhead for a 3B local model, and heartbeat does not need that stuff.
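To put numbers on that, here is a quick sketch that ranks the tools by schema size and shows each one's share of the total. The figures are copied from `meta.systemPromptReport.tools` in the pasted run; the function name is just illustrative.

```python
# Figures copied from meta.systemPromptReport.tools in the pasted run.
TOOLS_REPORT = {
    "schemaChars": 14369,
    "entries": [
        {"name": name, "schemaChars": chars}
        for name, chars in [
            ("read", 304), ("edit", 834), ("write", 225), ("apply_patch", 153),
            ("exec", 1157), ("process", 961), ("cron", 6700), ("update_plan", 574),
            ("sessions_list", 212), ("sessions_history", 161), ("sessions_send", 274),
            ("sessions_yield", 60), ("sessions_spawn", 1333), ("subagents", 191),
            ("session_status", 89), ("web_search", 425), ("web_fetch", 374), ("image", 342),
        ]
    ],
}


def rank_tool_schemas(tools, top=4):
    """Return (name, schemaChars, percent of total) for the `top` largest tool schemas."""
    total = tools["schemaChars"]
    ranked = sorted(tools["entries"], key=lambda entry: entry["schemaChars"], reverse=True)
    return [
        (entry["name"], entry["schemaChars"], round(100 * entry["schemaChars"] / total, 1))
        for entry in ranked[:top]
    ]


for name, chars, share in rank_tool_schemas(TOOLS_REPORT):
    print(f"{name:15s} {chars:6d} chars  {share:5.1f}%")
```

On these numbers `cron` alone is a bit under half of all tool schema characters, and the top four together are roughly 70%.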
So the practical conclusion is:
- your markdown file trimming helped
- but now the real problem is tool schema bloat
- and cron is the biggest offender by far
If this model is only for heartbeat, I would do this:
- disable tools entirely for Qwen
  - compat.supportsTools: false
- if you can't do that, give the local heartbeat agent a tiny tool allowlist
  - definitely remove:
    - cron
    - exec
    - process
    - sessions_spawn
    - probably the rest of the sessions_* tools too
- keep:
  - thinkingDefault: "off"
  - agents.defaults.experimental.localModelLean: true
  - smaller skills.limits.maxSkillsPromptChars
So, blunt version:
cron alone is probably killing your Qwen heartbeat path.
Heartbeat should be a tiny text-only lane, not a full agent-with-everything lane.
If you want, I can now write you the Claude prompt for "rewrite my config into an ultra-lean local heartbeat agent".
Yep, this confirms two things pretty clearly:
- your manual smoke test still did not hit Qwen
  - winnerProvider: "openai-codex"
  - winnerModel: "gpt-5.4-mini"
- thinking off is working
  - requestShaping.thinking: "off"
So the state of play is:
- the thinking bug is fixed
- the manual test path is still Codex
- the remaining failure is specifically the Ollama heartbeat path
- and your earlier tool/skill numbers make it very likely that Qwen 3B is timing out on the full runtime surface
At this point I would stop chasing more pasted JSON and just make the config simpler:
- thinkingDefault: "off" for the local heartbeat lane
- agents.defaults.experimental.localModelLean: true
- compat.supportsTools: false for that Qwen model
- lower skills.limits.maxSkillsPromptChars hard for the heartbeat/local agent
That is the practical fix path.
Bluntly: Qwen 3B is probably too weak for the current heartbeat lane unless you make it almost text-only.
If you want, I can now write you one clean prompt for Claude:
"rewrite my OpenClaw config into a dedicated ultra-lean Qwen heartbeat agent"
That's probably the fastest way out of this.
That would be very helpful. I am also new to Claude Code so I wasn't sure what folder to point it to.
Yep, and for this job I'd point Claude Code at:
C:\Users\prapunsak\.openclaw
Why that folder:
- openclaw.json should live there
- your workspace\ folder lives under it
- Claude needs to see both the config and the injected workspace files
If you only wanted to trim prompt files, C:\Users\prapunsak\.openclaw\workspace would be enough.
But for the heartbeat/Qwen config fix, use the parent .openclaw folder.
Paste this into Claude Code:
You are working inside my OpenClaw state/config directory on Windows.
Working directory:
C:\Users\prapunsak\.openclaw
Goal:
Make my OpenClaw heartbeat use an ultra-lean local Ollama Qwen path that is small and reliable enough for heartbeat-only use.
Important context:
- I am only trying to use local Ollama Qwen for heartbeat.
- My normal main/fallback setup should keep working.
- I already trimmed AGENTS.md / SOUL.md / MEMORY.md / etc, so file bloat is no longer the main issue.
- Current remaining pressure is the runtime surface: skills + tool schemas.
- I saw a raw Ollama error saying the Qwen model does not support thinking.
- After setting thinking off, the heartbeat still timed out.
- Tool schema inspection showed very heavy tools, especially cron, exec, process, sessions_spawn, and related session tools.
- This local model is qwen2.5:3b-instruct-q4_K_M via Ollama.
- This model should be treated as a tiny heartbeat-only text model, not a full agentic model.
Your job:
Inspect my current OpenClaw config and workspace, then produce the smallest safe config changes needed to create or tune a dedicated ultra-lean local heartbeat path.
Read first:
- openclaw.json
- workspace/
- any included config files referenced from openclaw.json
- docs for:
- https://docs.openclaw.ai/providers/ollama
- https://docs.openclaw.ai/gateway/local-models
- https://docs.openclaw.ai/tools/thinking
- https://docs.openclaw.ai/gateway/configuration-reference
Strict rules:
1. Do not guess config keys.
2. Use only documented OpenClaw config paths.
3. Keep changes as narrow as possible.
4. Do not break my normal main/fallback model setup.
5. Do not touch auth unless absolutely necessary.
6. Prefer editing only the config needed for the heartbeat/local lane.
7. Use JSON5, not YAML.
8. Preserve comments and formatting where reasonable.
What I want you to optimize:
- thinking off for the heartbeat/local agent
- lean local model mode if appropriate
- much smaller skills prompt budget for the heartbeat/local agent
- remove or avoid heavy tool schemas for the local Qwen heartbeat path
- if supported and appropriate, make the Qwen heartbeat path no-tools / text-only
- reduce avoidable startup/runtime pressure without changing unrelated behavior
Please specifically investigate whether the best fix is:
- a dedicated heartbeat/local agent config, or
- a model-specific compatibility change, or
- both
Also check whether the current heartbeat is using:
- a dedicated agent, or
- a session-level model override on main
If the current setup is awkward, propose the simplest reliable configuration.
Output format:
1. Diagnosis
2. Exact files to change
3. A minimal patch or exact replacement blocks
4. Why each change helps
5. A short test plan with commands I can run after editing
6. If qwen2.5:3b-instruct-q4_K_M is still a bad fit even after slimming, say so plainly and recommend the cheapest stable fallback option for heartbeat
Important:
Do not do a broad cleanup. Stay focused on making the local heartbeat lane small and reliable.
If you want, after Claude gives you the patch, paste it here and I'll sanity-check it before you apply it.
Could you give a citation for agents.defaults.experimental.localModelLean: true? Claude Code can't find it.
ME: is it possible to do this just for the heartbeat? Do you recommend? agents.defaults.experimental.localModelLean: true
CC: Good question. Let me check the config reference for scope.
Fetched
https://docs.openclaw.ai/gateway/configuration-reference
It's not documented in the reference, which means it's either undocumented experimental or I misspoke earlier. Don't rely on it. The documented levers are:
compat.supportsTools: false on the model (which you already have): this is the stable, per-model way to drop tool schemas. That's your real weight-cutter.
agents.defaults.models["…"].params for per-model thinking/reasoning pins (which you already have).
Those two together give you the lean local path. Adding an undocumented experimental flag risks config errors and unpredictable behavior.
Recommendation: Stick with what you applied. The supportsTools: false on the Qwen 3B model is the lean mode: it strips the heavyweight tool schemas (cron, exec, process, sessions_spawn, etc.) from the prompt for that model only. That's the documented way.
If the heartbeat is still timing out after a restart, the issue is model speed or Ollama queueing, not tool-schema bloat. In that case, switch to Qwen 1.5B q4 (same edits applied to that model entry). That's your next reliable step before remote fallback.
Yep, fair catch. I should have cited it earlier.
Claude Code is right that it's not in the configuration reference.
But it is documented in two other docs:
- Local Models
  https://docs.openclaw.ai/gateway/local-models
  Exact wording there:
  first try agents.defaults.experimental.localModelLean: true to drop heavyweight default tools like browser, cron, and message; this is an experimental flag, not a stable default-mode setting.
- Experimental Features
  https://docs.openclaw.ai/concepts/experimental-features
  Exact wording there:
  agents.defaults.experimental.localModelLean: true is a pressure-release valve for weaker local-model setups.
So the correction is:
- documented? yes
- in config reference? no
- stable? no, explicitly experimental
For your follow-up question, I would not use it as the main heartbeat-only lever.
Why:
- the documented path is agents.defaults.experimental.localModelLean
- that is a defaults/global knob
- I do not see a documented per-heartbeat/per-agent version of that flag
So for heartbeat-only, I'd recommend exactly what Claude Code said:
- use compat.supportsTools: false on the Qwen model
- keep thinking off for that path
That is the cleaner, more targeted, more stable lever.
So my honest recommendation now is:
- heartbeat-only / targeted fix → compat.supportsTools: false
- broad experimental pressure relief for weak local backends → agents.defaults.experimental.localModelLean: true
If you want, I can help you word a short reply back to Claude Code so it updates its plan cleanly.
What is the correct command to do a budget audit and usage review? I am trying to create a skill that checks usage and optimizes model usage. Maybe you can help rewrite the skill?
This is my current skill:
name: budget-audit
description: Review OpenClaw usage, quota telemetry, and visible budget signals using supported commands only, then return a concise structured budget report. Use when daily or weekly budget audit/review cron jobs run.
Budget Audit Skill
Produce short, reliable budget telemetry summaries without inventing unsupported signals.
Rules
Use only supported OpenClaw commands and visible telemetry.
Prefer openclaw status --all when budget/quota review is requested.
If telemetry is unavailable, say so plainly.
Keep the report concise and structured exactly as requested by the caller.
Do not pad the report with speculation.
Return NO_REPLY for scheduled/background runs unless the caller explicitly requests delivery.
Steps
Gather supported budget/quota signal from OpenClaw status output.
Identify visible signal, quota telemetry availability, and any obvious recommendation.
Format the report exactly to the caller's required schema.
Return NO_REPLY for cron/background runs.
Output contract
Use only available evidence.
Mark unavailable telemetry clearly.
Keep the report short.
Continuous Cost Awareness
Track your own behavior across sessions:
How many model switches happened today?
How many tool calls did you make per question on average?
When did compaction trigger, and was it expected?
If you notice you are frequently escalating to a more expensive model:
Identify the task category causing it
Log it to today's memory file
Flag it to Ta in the next session summary
Propose whether a better default model should be used for that category
@proud gale
I'm here.
Want me to do one of these next?
- write a short reply back to Claude Code
- sanity-check the config block it generated
- help you test whether Qwen heartbeat is fixed
Paste whichever you have and I'll keep going.
Yep. The main issue with your current skill is that it promises signals OpenClaw does not expose directly.
The safest supported commands are:
- openclaw status --usage → provider quota / usage windows
- openclaw status --all → broader pasteable diagnostic
- openclaw models status → resolved model, fallbacks, auth overview
- in chat: /status and /usage cost
I would remove these from the skill, because they are not reliably available from supported commands alone:
- "How many model switches happened today?"
- "How many tool calls per question on average?"
- "When did compaction trigger, and was it expected?"
- "Track your own behavior across sessions"
- "Log it to today's memory file"
That turns it from a budget audit skill into a speculative analytics skill.
A tighter rewrite would be:
---
name: budget-audit
description: Review supported OpenClaw usage, quota, model, and auth signals, then return a concise structured budget report. Use for daily or weekly budget/usage review runs.
---
# Budget Audit
Produce short, evidence-based usage reports using only supported OpenClaw telemetry.
## Supported commands
Use these only when relevant:
- `openclaw status --usage`
- `openclaw status --all`
- `openclaw models status`
In chat contexts, supported usage surfaces are:
- `/status`
- `/usage cost`
## Rules
- Use only visible OpenClaw telemetry.
- Do not invent hidden budgets, provider billing state, model-switch counts, tool-call averages, or cross-session analytics.
- If telemetry is unavailable, say `unavailable`.
- Keep the report concise and structured exactly as requested.
- For cron/background runs, return `NO_REPLY` unless delivery is explicitly requested.
## Steps
1. Run `openclaw status --usage` for provider quota/usage windows.
2. Run `openclaw models status` when model, fallback, auth, or cooldown context matters.
3. Use `openclaw status --all` only when the caller wants a broader diagnostic snapshot.
4. Extract only supported visible signals:
- provider usage/quota windows
- active model and fallbacks
- auth/cooldown/unusable-profile issues if visible
- obvious recommendation based on visible evidence
5. Mark any missing signal as `unavailable`.
6. Format the result exactly to the caller's requested schema.
## Do not claim
- exact cost unless a visible cost surface provides it
- hidden provider billing details
- model-switch counts
- tool-call averages
- cross-session behavior analytics
- compaction history unless directly shown in visible output
## Default output shape
- Models:
- Usage windows:
- Cost signal:
- Auth or cooldown issues:
- Recommendation:
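If the cron job post-processes the result in a script, the "mark missing signals as `unavailable`" rule can be enforced mechanically. This is only a sketch: the field names come from the default output shape above, while `render_report` and the input dictionary are hypothetical helpers, not part of OpenClaw.

```python
# Field order taken from the "Default output shape" section of the skill.
FIELDS = [
    "Models",
    "Usage windows",
    "Cost signal",
    "Auth or cooldown issues",
    "Recommendation",
]


def render_report(signals):
    """Render the fixed report shape; any missing or blank signal becomes 'unavailable'."""
    lines = []
    for field in FIELDS:
        value = signals.get(field)
        text = value.strip() if isinstance(value, str) and value.strip() else "unavailable"
        lines.append(f"- {field}: {text}")
    return "\n".join(lines)


# Example using values visible in the /status output earlier in this thread.
print(render_report({
    "Models": "openai-codex/gpt-5.4-mini (fallback: openai-codex/gpt-5.4)",
    "Usage windows": "5h 85% left, week 96% left",
}))
```

Anything the commands did not expose stays visibly `unavailable` instead of being guessed.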
So, short answer:
- best command for budget/quota review: openclaw status --usage
- best command for model/auth context: openclaw models status
- best broad diagnostic snapshot: openclaw status --all
If you want, I can also rewrite it into a Claude Code-ready skill file with an even stricter output schema for cron use.
Some of my skills only need to load during cron; they don't need to load at startup as fixed cost. Can I do that?
Yes, absolutely, and this is the right way to think about it.
Short version:
- skills are only part of the prompt for agents that can see them
- so if a skill is cron-only, put it on a dedicated cron agent
- do not leave it visible to your main/default agent
The docs-backed knobs are:
- agents.defaults.skills
- agents.list[].skills
- agents.list[].skillsLimits.maxSkillsPromptChars
And cron jobs can target a specific agent with --agent.
So the pattern is:
- keep your main agent's skills small
- create a dedicated cron agent
- give that cron agent the extra skills
- point the cron job at that agent
Example idea:
{
agents: {
defaults: {
skills: ["summarize", "healthcheck"],
},
list: [
{
id: "main",
skills: ["summarize", "healthcheck"],
},
{
id: "cron-audit",
skills: ["budget-audit", "model-routing"],
skillsLimits: {
maxSkillsPromptChars: 1500,
},
},
],
},
}
Then create/edit the cron to use that agent:
openclaw cron add --name "Budget audit" --cron "0 8 * * *" --session isolated --agent cron-audit --message "Run budget audit and summarize visible usage/quota signals."
Important nuance:
- if a skill is visible to main, it still contributes to main's fixed prompt cost
- to keep it out of startup cost for main, it has to be excluded from main
- agents.list[].skills is the lever that makes that happen
Also useful:
- agents.list[].skills: [] means no skills for that agent
- isolated cron sessions help keep token cost down too
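A small check like this can catch a cron-only skill that is still listed on an interactive agent. It is a sketch under assumptions: the config is modeled as a plain Python dict mirroring the example above, and it only inspects explicit `agents.list[].skills` entries, because I am not certain how OpenClaw merges per-agent lists with `agents.defaults.skills`. Agents without an explicit list are reported separately as unpinned.

```python
# Python dict mirroring the example config above (not read from a real file).
EXAMPLE_CONFIG = {
    "agents": {
        "defaults": {"skills": ["summarize", "healthcheck"]},
        "list": [
            {"id": "main", "skills": ["summarize", "healthcheck"]},
            {
                "id": "cron-audit",
                "skills": ["budget-audit", "model-routing"],
                "skillsLimits": {"maxSkillsPromptChars": 1500},
            },
        ],
    }
}


def audit_skill_split(config, interactive_ids, cron_only):
    """Return (leaks, unpinned): cron-only skills listed on interactive agents,
    and interactive agents that have no explicit skills list at all."""
    by_id = {agent["id"]: agent for agent in config.get("agents", {}).get("list", [])}
    leaks, unpinned = {}, []
    for agent_id in interactive_ids:
        agent = by_id.get(agent_id, {})
        if "skills" not in agent:
            unpinned.append(agent_id)
            continue
        overlap = sorted(set(agent["skills"]) & set(cron_only))
        if overlap:
            leaks[agent_id] = overlap
    return leaks, unpinned


print(audit_skill_split(EXAMPLE_CONFIG, ["main"], ["budget-audit", "model-routing"]))
```

An empty result for both values means the interactive agent carries none of the cron-only skills in its explicit list.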
If you want, I can help you design a main vs cron-audit split so your daily agent stays lean.
yes please. I think a sub-agent for main is good for this
Yep, that's a good pattern, and I'd structure it like this:
Recommendation
- keep main lean
- create one narrow audit agent
- let:
  - cron target audit directly
  - main call audit as a sub-agent when needed
That gives you the best of both:
- low fixed cost on main
- audit skills only load when actually needed
A clean split would be:
{
agents: {
defaults: {
skills: ["summarize", "healthcheck"],
},
list: [
{
id: "main",
skills: ["summarize", "healthcheck"],
},
{
id: "audit",
skills: ["budget-audit"],
skillsLimits: {
maxSkillsPromptChars: 1200,
},
thinkingDefault: "off",
},
],
},
}
How to use it:
Cron
openclaw cron add --name "Budget audit" --cron "0 8 * * *" --session isolated --agent audit --message "Run budget audit and summarize visible usage/quota signals."
From main, on demand
/subagents spawn audit Run a budget audit using supported OpenClaw usage/quota commands only and return a short report.
Why I like this setup:
- main does not carry the budget-audit skill at startup
- audit only pays its own context when invoked
- cron and interactive usage can reuse the same narrow agent
- easier to reason about than stuffing everything into main
One nuance:
- sub-agents do have their own context/token cost when spawned
- but that's on-demand cost, not constant fixed prompt cost on every main turn
So yes, I'd do dedicated audit agent + optional sub-agent call from main.
If you want, I can help you write:
- the exact budget-audit skill for the audit agent, and
- a tiny prompt/instruction for main telling it when to delegate.
- Cron jobs for sub-agent of main
  - budget audit, config-diff-review, knowledge-reconciliation, learning-loop, memory-watchdog, receipt-normalizer, self-improvement-review, workspace-hygiene
- Skill not using anymore: memory-promotion
- Main skill: model-routing
- Aemeath skill: model-routing
- cron job for aemeath: learning-loop, memory-watchdog
- cron job for aemeath's sub agent:
Yep, based on that list, I'd split it like this.
Remove entirely
- memory-promotion
Keep interactive agents very lean
- main
  - model-routing
- aemeath
  - model-routing
Move cron-only / maintenance skills off the interactive agents
- main-cron (or main-audit)
  - budget-audit
  - config-diff-review
  - knowledge-reconciliation
  - learning-loop
  - memory-watchdog
  - receipt-normalizer
  - self-improvement-review
  - workspace-hygiene
- aemeath-cron
  - learning-loop
  - memory-watchdog
That gives you this shape:
{
  agents: {
    list: [
      {
        id: "main",
        skills: ["model-routing"],
      },
      {
        id: "main-cron",
        skills: [
          "budget-audit",
          "config-diff-review",
          "knowledge-reconciliation",
          "learning-loop",
          "memory-watchdog",
          "receipt-normalizer",
          "self-improvement-review",
          "workspace-hygiene",
        ],
      },
      {
        id: "aemeath",
        skills: ["model-routing"],
      },
      {
        id: "aemeath-cron",
        skills: ["learning-loop", "memory-watchdog"],
      },
    ],
  },
}
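If you want a quick way to confirm that this shape really keeps cron-only skills off the interactive agents, a small check like the one below can help. It is only a sketch: the layout is copied by hand from the config above, and the INTERACTIVE set and both helper functions are names invented for this example, not OpenClaw config keys or APIs.

```python
# Layout copied by hand from the config above.
AGENT_SKILLS = {
    "main": ["model-routing"],
    "main-cron": [
        "budget-audit",
        "config-diff-review",
        "knowledge-reconciliation",
        "learning-loop",
        "memory-watchdog",
        "receipt-normalizer",
        "self-improvement-review",
        "workspace-hygiene",
    ],
    "aemeath": ["model-routing"],
    "aemeath-cron": ["learning-loop", "memory-watchdog"],
}

# Assumption for this sketch: which agents are chatted with directly.
INTERACTIVE = {"main", "aemeath"}


def cron_only_skills(agent_skills, interactive):
    """Collect every skill listed on a non-interactive agent."""
    skills = set()
    for agent, names in agent_skills.items():
        if agent not in interactive:
            skills.update(names)
    return skills


def leaked_skills(agent_skills, interactive):
    """Map each interactive agent to the cron-only skills it still carries."""
    cron_only = cron_only_skills(agent_skills, interactive)
    leaks = {}
    for agent in sorted(interactive):
        overlap = sorted(set(agent_skills.get(agent, [])) & cron_only)
        if overlap:
            leaks[agent] = overlap
    return leaks


print(leaked_skills(AGENT_SKILLS, INTERACTIVE) or "interactive agents are lean")
```

An empty result means neither main nor aemeath lists a skill that also lives on a cron agent.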
Why I'd do it this way:
- main stays tiny
- aemeath stays tiny
- cron-only skills stop adding fixed startup cost to interactive chats
- both interactive agents can still delegate when needed
Use cron like this:
openclaw cron add --name "Main audit" --cron "0 8 * * *" --session isolated --agent main-cron --message "Run the scheduled maintenance/audit workflow."
openclaw cron add --name "Aemeath review" --cron "0 9 * * *" --session isolated --agent aemeath-cron --message "Run learning-loop and memory-watchdog."
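If you end up with more than a couple of these jobs, it may be easier to generate the command lines from a small table than to hand-edit them. The sketch below only prints the commands (a dry run) and uses just the flags shown in the two commands above; the helper name and the JOBS table are invented for this example. Note that shlex.join produces POSIX-style quoting, so on Windows cmd you would need to adapt the quotes.

```python
import shlex


def cron_add_command(name, schedule, agent, message):
    """Build the argument list for one cron job, using only the flags shown above."""
    return [
        "openclaw", "cron", "add",
        "--name", name,
        "--cron", schedule,
        "--session", "isolated",
        "--agent", agent,
        "--message", message,
    ]


# (name, cron schedule, target agent, message), copied from the commands above.
JOBS = [
    ("Main audit", "0 8 * * *", "main-cron",
     "Run the scheduled maintenance/audit workflow."),
    ("Aemeath review", "0 9 * * *", "aemeath-cron",
     "Run learning-loop and memory-watchdog."),
]

for job in JOBS:
    # Dry run: print the command instead of executing it.
    print(shlex.join(cron_add_command(*job)))
```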
And if main needs that work on demand, it can spawn:
/subagents spawn main-cron Run the maintenance review workflow and return a short summary.
So yes, I think your instinct was right, but I'd make it two dedicated cron agents, not load those skills onto main or aemeath directly.
If you want, I can turn that into a Claude Code prompt that rewrites your config cleanly.
Yep, that's even cleaner.
So the final split I'd recommend is:
- main: model-routing
- main-cron: budget-audit, config-diff-review, knowledge-reconciliation, learning-loop, memory-watchdog, receipt-normalizer, self-improvement-review, workspace-hygiene
- aemeath: model-routing
- aemeath-cron: the cron-only skills for Aemeath's sub-agent
So yes, do not put those cron skills on aemeath itself.
Put them on aemeath-cron and have cron target that agent directly.
That keeps:
- main lean
- aemeath lean
- cron logic isolated
- fixed startup cost much lower
If you want, send me the exact skill list for aemeath-cron and I'll give you the final clean agent layout.
aemeath-cron
learning-loop
memory-watchdog
Perfect, then I'd lock it in like this:
{
  agents: {
    list: [
      {
        id: "main",
        skills: ["model-routing"],
      },
      {
        id: "main-cron",
        skills: [
          "budget-audit",
          "config-diff-review",
          "knowledge-reconciliation",
          "learning-loop",
          "memory-watchdog",
          "receipt-normalizer",
          "self-improvement-review",
          "workspace-hygiene",
        ],
      },
      {
        id: "aemeath",
        skills: ["model-routing"],
      },
      {
        id: "aemeath-cron",
        skills: ["learning-loop", "memory-watchdog"],
      },
    ],
  },
}
That's a nice split:
- main stays lean
- aemeath stays lean
- main-cron carries the heavy maintenance stuff
- aemeath-cron carries only its two cron skills
And the cron target for Aemeath should point to:
--agent aemeath-cron
So yes, this looks like the right structure to reduce fixed startup cost.
If you want, I can now write the Claude Code prompt that tells it to rewrite your config to exactly this layout.
Yes please
Nice, use this in Claude Code.
Point it at:
C:\Users\prapunsak\.openclaw
Then paste this prompt:
You are editing my OpenClaw config in:
C:\Users\prapunsak\.openclaw
Goal:
Rewrite my agent/skills layout so interactive agents stay lean and cron-only skills do not add fixed startup cost to main agents.
Important rules:
- Read my current openclaw.json first.
- If openclaw.json includes other files, read those too.
- Use only documented OpenClaw config keys.
- Do not guess.
- Preserve unrelated config, auth, channels, and model settings.
- Use JSON5, not YAML.
- Make the smallest safe change.
Target design:
1. main
   - skills: ["model-routing"]
2. main-cron
   - skills:
     - budget-audit
     - config-diff-review
     - knowledge-reconciliation
     - learning-loop
     - memory-watchdog
     - receipt-normalizer
     - self-improvement-review
     - workspace-hygiene
3. aemeath
   - skills: ["model-routing"]
4. aemeath-cron
   - skills:
     - learning-loop
     - memory-watchdog
5. remove unused skill exposure:
   - memory-promotion should no longer be exposed anywhere unless it is still required by an existing config path you can prove from the file
What I want:
- edit the config so these four agents exist with the skill visibility above
- keep interactive agents lean
- keep cron-only skills off interactive agents
- do not broaden skill visibility accidentally
- if agents.defaults.skills is currently causing extra skill exposure, narrow it appropriately
- if per-agent skills are better than defaults here, use per-agent skills
- preserve my current model settings unless they conflict with this goal
Also give me:
1. the exact files changed
2. the exact config diff
3. a short explanation of why this lowers fixed prompt cost
4. example cron commands for:
   - main-cron
   - aemeath-cron
Use these example cron targets if they fit the config cleanly:
- --agent main-cron
- --agent aemeath-cron
Do not do a broad cleanup. Stay focused on the skill/agent split only.
If you want, when Claude gives you the diff, paste it here and I'll sanity-check it before you save it.
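In the meantime, a rough way to sanity-check the result yourself is to compare per-agent skill lists before and after the rewrite. This is only a sketch: the "before" layout below is a hypothetical example (I have not seen your real config), the "after" layout is the target design from the prompt, and both helper functions are invented for illustration.

```python
def skill_changes(before, after):
    """Report added and removed skills for every agent whose list changed."""
    changes = {}
    for agent in sorted(set(before) | set(after)):
        old = set(before.get(agent, []))
        new = set(after.get(agent, []))
        if old != new:
            changes[agent] = {
                "added": sorted(new - old),
                "removed": sorted(old - new),
            }
    return changes


def agents_exposing(layout, skill):
    """List the agents that still expose the given skill."""
    return sorted(agent for agent, skills in layout.items() if skill in skills)


# Hypothetical "before" layout, for illustration only.
before = {
    "main": ["model-routing", "budget-audit", "memory-promotion"],
    "aemeath": ["model-routing", "learning-loop", "memory-watchdog"],
}

# Target design from the prompt.
after = {
    "main": ["model-routing"],
    "main-cron": [
        "budget-audit", "config-diff-review", "knowledge-reconciliation",
        "learning-loop", "memory-watchdog", "receipt-normalizer",
        "self-improvement-review", "workspace-hygiene",
    ],
    "aemeath": ["model-routing"],
    "aemeath-cron": ["learning-loop", "memory-watchdog"],
}

for agent, change in skill_changes(before, after).items():
    print(agent, change)
print("memory-promotion still on:", agents_exposing(after, "memory-promotion"))
```

What you want to see is only removals on main and aemeath, additions only on the two cron agents, and an empty list for memory-promotion.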