#Can you help me reduce my context size? I think my fixed cost starts at 20k.

stoic pivot
#

This is probably why my Ollama runs have been failing: the model's context window is only 16k.

proud galeBOT
#

Yeah, very plausible.

For local backends, OpenClaw warns below 32k and blocks below 16k, and even at 16k the baseline prompt, tools, workspace files, and chat history can eat the window fast.
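
As a rough illustration of those thresholds, here is a minimal Python sketch. The function name and the exact cutoffs (16,000 and 32,000 tokens) are assumptions for illustration, not OpenClaw's actual code:

```python
def classify_context_window(ctx_tokens: int) -> str:
    """Classify a local model's context window using the thresholds described above."""
    block_below = 16_000  # assumed: windows smaller than this are refused
    warn_below = 32_000   # assumed: windows smaller than this trigger a warning
    if ctx_tokens < block_below:
        return "blocked"
    if ctx_tokens < warn_below:
        return "warning"
    return "ok"


# A model at exactly 16k clears the block threshold but still lands in the warning band.
print(classify_context_window(16_000))
```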

First, what does this show?

openclaw --version

Then in the thread try:

  • /status
  • /context list
  • /compact

If it still keeps happening, /new starts fresh.

#

A few practical fixes:

  • use a model with 64k+ context, ideally 128k+ for tool use
  • make sure Ollama is using the native URL like http://host:11434, not /v1
  • if you must stay local/small, try:
    • agents.defaults.experimental.localModelLean: true
    • compat.supportsTools: false on that model

Paste openclaw --version and /status and I’ll tell you the best next step.

stoic pivot
#

🦞 OpenClaw 2026.4.21 (f788c88)
🧠 Model: openai-codex/gpt-5.4-mini · 🔑 oauth (openai-codex:marketing.ps@trrgroup.com (marketing.ps@trrgroup.com))
🔄 Fallbacks: openai-codex/gpt-5.4
🧮 Tokens: 20k in / 116 out
📚 Context: 20k/200k (10%) · 🧹 Compactions: 0
📊 Usage: 5h 85% left ⏱2h 9m · Week 96% left ⏱6d 16h
🧵 Session: agent:main:main:heartbeat • updated just now
⚙️ Runtime: direct · Think: low · Text: low
🪢 Queue: collect (depth 0)

proud galeBOT
#

Yep, that basically confirms it.

Your version is fine, 2026.4.21 is recent. The big signal is this line:

  • 📚 Context: 20k/200k

So your current session is already carrying about 20k tokens. If you point that same kind of session at an Ollama model with only 16k context, it’s very likely to fail immediately or act weird.

One important detail though: the status you pasted is for a 200k Codex session, not the Ollama model itself. So the problem is basically:

  • current OpenClaw session payload ≈ 20k
  • Ollama model window = 16k
  • that math does not work 😅

What I’d do next:

  1. Start fresh before testing Ollama

    • /new
  2. Check what is eating the baseline

    • /context list
    • /context detail
  3. If you want Ollama to work locally, don’t use 16k

    • 32k is the bare minimum territory
    • 64k+ is much more realistic for tool use
    • 128k feels way better
  4. Shrink the baseline if you must stay local

    • agents.defaults.experimental.localModelLean: true
    • lower agents.defaults.bootstrapMaxChars
    • lower agents.defaults.bootstrapTotalMaxChars
    • if needed, set that Ollama model to compat.supportsTools: false
stoic pivot
#

🧠 Context breakdown
Workspace: C:\Users\prapunsak.openclaw\workspace
Bootstrap max/file: 32,000 chars
Bootstrap max/total: 180,000 chars
Sandbox: mode=off sandboxed=false
System prompt (run): 66,423 chars (~16,606 tok) (Project Context 37,699 chars (~9,425 tok))

Injected workspace files:

  • AGENTS.md: OK | raw 17,305 chars (~4,327 tok) | injected 17,305 chars (~4,327 tok)
  • SOUL.md: OK | raw 4,818 chars (~1,205 tok) | injected 4,818 chars (~1,205 tok)
  • TOOLS.md: OK | raw 1,636 chars (~409 tok) | injected 1,636 chars (~409 tok)
  • IDENTITY.md: OK | raw 3,009 chars (~753 tok) | injected 3,009 chars (~753 tok)
  • USER.md: OK | raw 2,214 chars (~554 tok) | injected 2,214 chars (~554 tok)
  • HEARTBEAT.md: OK | raw 860 chars (~215 tok) | injected 860 chars (~215 tok)
  • BOOTSTRAP.md: MISSING | raw 0 | injected 0
  • MEMORY.md: OK | raw 8,179 chars (~2,045 tok) | injected 8,179 chars (~2,045 tok)

Skills list (system prompt text): 9,900 chars (~2,475 tok) (21 skills)
Skills: budget-audit, coding-agent, config-diff-review, gog, healthcheck, knowledge-reconciliation, learning-loop, memory-promotion, memory-watchdog, model-routing, nano-pdf, node-connect, notion, receipt-normalizer, self-improvement-review, session-logs, skill-creator, summarize, taskflow, taskflow-inbox-triage, … (+1 more)
Tool list (system prompt text): 0 chars (~0 tok)
Tool schemas (JSON): 13,944 chars (~3,486 tok) (counts toward context; not shown as text)
Tools: read, edit, write, apply_patch, exec, process, cron, update_plan, sessions_list, sessions_history, sessions_send, sessions_yield, sessions_spawn, subagents, session_status, web_fetch, image

Session tokens (cached): 19,953 total / ctx=200,000

Inline shortcut: a command token inside normal text (e.g. “hey /status”) that runs immediately (allowlisted senders only) and is stripped before the model sees the remaining message.

proud galeBOT
#

Yep, that’s the smoking gun.

Your problem is not really chat history, it’s the baseline system prompt.

Right now, before the model even starts replying, you’re already at about:

  • workspace files: ~9.4k tokens
  • skills block: ~2.5k tokens
  • tool schemas: ~3.5k tokens
  • total system prompt: ~16.6k tokens

So a 16k Ollama model literally has no room left. It’s already over budget before the actual conversation does much.
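
The token figures in your breakdown look like roughly one token per four characters, rounded up (for example 66,423 chars → ~16,606 tok). Here is a minimal sketch of that estimate. The 4:1 ratio is inferred from the numbers above rather than taken from OpenClaw's source, and the 16,000-token window is an assumed value for a "16k" model:

```python
import math

CHARS_PER_TOKEN = 4  # assumption inferred from the /context output, not a documented constant


def estimate_tokens(chars: int) -> int:
    """Approximate token count as ceil(chars / 4)."""
    return math.ceil(chars / CHARS_PER_TOKEN)


# Character counts copied from the context breakdown above.
system_prompt_chars = 66_423
components = {
    "project context": 37_699,
    "skills list": 9_900,
    "tool schemas": 13_944,
}

window = 16_000  # assumed size of a "16k" model window
for name, chars in components.items():
    print(f"{name}: ~{estimate_tokens(chars):,} tok")

total = estimate_tokens(system_prompt_chars)
print(f"system prompt: ~{total:,} tok, headroom in a 16k window: {window - total:,}")
```

A negative headroom means the system prompt alone is already larger than the window.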

Also, important: /new only helps session history. It does not fix this big bootstrap/system-prompt load.

Biggest wins, in order

#
  1. Lower bootstrap limits hard

    • yours: 32,000 per file / 180,000 total
    • docs default: 12,000 per file / 60,000 total
    • for local Ollama, I’d go much lower than your current settings
  2. Trim the biggest injected files first

    • AGENTS.md is the biggest hit
    • MEMORY.md is also expensive
    • then SOUL.md, IDENTITY.md, USER.md
  3. Enable lean local mode

    • agents.defaults.experimental.localModelLean: true
  4. Shrink the skills prompt budget

    • skills.limits.maxSkillsPromptChars
    • or per-agent: agents.list[].skillsLimits.maxSkillsPromptChars
  5. Use a dedicated lean Ollama agent
    • smaller workspace/bootstrap
    • smaller tool allowlist
    • or disable tools for that model with compat.supportsTools: false

Minimal starter config to try

{
  agents: {
    defaults: {
      bootstrapMaxChars: 4000,
      bootstrapTotalMaxChars: 12000,
      experimental: {
        localModelLean: true,
      },
    },
  },
  skills: {
    limits: {
      maxSkillsPromptChars: 2000,
    },
  },
}
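
To see what those caps would mean for the files in your breakdown, here is a rough sketch. It assumes each file is simply cut to `bootstrapMaxChars` and that the capped sizes are then compared against `bootstrapTotalMaxChars`. How OpenClaw actually truncates is not something this thread confirms, so treat the result as an estimate:

```python
# Raw sizes (chars) copied from the "Injected workspace files" list above.
raw_chars = {
    "AGENTS.md": 17_305,
    "SOUL.md": 4_818,
    "TOOLS.md": 1_636,
    "IDENTITY.md": 3_009,
    "USER.md": 2_214,
    "HEARTBEAT.md": 860,
    "MEMORY.md": 8_179,
}
PER_FILE_CAP = 4_000  # bootstrapMaxChars from the starter config
TOTAL_CAP = 12_000    # bootstrapTotalMaxChars from the starter config

# Assumption: a file over the per-file cap is cut down to the cap.
capped = {name: min(size, PER_FILE_CAP) for name, size in raw_chars.items()}
over_cap = [name for name, size in raw_chars.items() if size > PER_FILE_CAP]
capped_total = sum(capped.values())

print("over per-file cap:", ", ".join(over_cap))
print(f"capped total: {capped_total:,} chars (total cap {TOTAL_CAP:,})")
```

With the original sizes, several files are over the per-file cap and the capped total is still above the total cap, so trimming the files themselves is probably a better first step than relying on truncation.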

If that still isn’t enough, then make the Ollama model/tooling leaner too.

My honest recommendation

Keep your normal full agent for cloud models, and make a separate slim Ollama agent for local runs. Trying to cram this current prompt shape into 16k is going to be miserable.

stoic pivot
#

Can you help me take a look at my Openclaw Control System Administrator/Technical Support Agent.md?

proud galeBOT
#

Yep, absolutely. That file is probably one of the biggest wins.

From your /context detail, AGENTS.md is already ~17.3k chars / ~4.3k tokens, so trimming it could save a lot.

I can’t read your machine from here, so paste the file here and I’ll go through it with you. If it’s huge, send it in 2 to 3 chunks.

If you want, paste just these parts first:

  1. the top section / role description
  2. any long rules / policy lists
  3. examples / templates
  4. anything that feels “nice to have” instead of “needed every turn”

I’ll tell you:

  • what should stay in AGENTS.md
  • what should move out
  • what can be shortened hard
  • what is probably burning tokens for no benefit

If you paste it, I’ll mark it up.

stoic pivot
#

I can't send markdown in this chat. Maybe you can help me by writing a prompt for me to put in Claude so that it helps me trim based on your strict rules to reduce unnecessary token usage in agent.md at startup?

proud galeBOT
#

Yep, use this. Paste this into Claude first, then paste your Agent.md under it.

You are optimizing an OpenClaw AGENTS.md / agent startup instruction file for minimum token usage.

Context:

  • This file is injected into the model context at startup on every run.
  • My local Ollama model has a small context window, so every token matters.
  • Other files are already injected separately, including things like persona, identity, user info, tools, memory, and heartbeat.
  • Because of that, AGENTS.md should NOT duplicate personality, biography, tool docs, examples, or anything that can live elsewhere.

Your job:
Ruthlessly trim the file while preserving only instructions that are truly necessary on most turns.

Strict rules:

  1. Keep only instructions that are needed frequently and materially change behavior.
  2. Delete anything that is:
    • repetitive
    • explanatory instead of directive
    • backstory, lore, personality, or flavor
    • long examples
    • duplicated from other startup files
    • duplicated from built-in tool behavior
    • only useful in rare edge cases
  3. Compress long paragraphs into short bullets.
  4. Merge overlapping rules.
  5. Prefer direct commands over explanation.
  6. If something is useful but not needed every turn, move it out of AGENTS.md and mark it as “move to external doc”.
  7. Assume separate files already handle:
    • persona/tone
    • identity
    • user profile
    • tool descriptions
    • memory notes
  8. Do not preserve “nice to have” wording.
  9. Be aggressive. I want a lean operational file, not a manifesto.

Optimization target:

  • Reduce total size as much as possible.
  • Prefer a hard practical target of under 4,000 characters if possible.
  • If that is not possible, still minimize as much as you can.

Output format:

  1. “What to remove” list
  2. “What to move out of AGENTS.md” list
  3. “What must stay” list
  4. A fully rewritten lean version of the file
  5. A short explanation of the biggest token savings

Important review test:
For every line, ask:

  • Is this needed on most turns?
  • Does it prevent a real failure mode?
  • Is it already covered elsewhere?
  • Is it worth paying for on every single startup?

If the answer is no, remove it.

Do not ask me clarifying questions first.
Do the rewrite immediately.

If you want, after Claude rewrites it, paste the result here and I’ll do a second-pass trim.

stoic pivot
#

Can you write a prompt for me to trim MEMORY.md?

proud galeBOT
#

Yep, use this one for MEMORY.md.

Paste this into Claude first, then paste your full MEMORY.md under it.

You are trimming MEMORY.md for an OpenClaw workspace.

Context:

  • MEMORY.md is injected into startup context on many runs, so it must stay extremely lean.
  • Its job is to store only durable, high-value facts that repeatedly improve behavior.
  • Other files already exist for persona, identity, user profile, tools, and general instructions.
  • Daily or temporary notes can live outside MEMORY.md.

Your job:
Ruthlessly reduce MEMORY.md to only the facts that are genuinely worth paying for on every startup.

Keep only:

  • durable user preferences that matter often
  • stable environment-specific facts
  • recurring pain points that still happen repeatedly
  • long-lived terminology corrections or naming rules
  • persistent safety boundaries or important do-not-do items not already covered elsewhere

Delete or move out:

  • one-off incidents
  • resolved bugs that are unlikely to recur
  • outdated version-specific notes
  • release-specific quirks
  • temporary troubleshooting notes
  • stale reminders
  • examples
  • explanations
  • duplicates of AGENTS.md, USER.md, IDENTITY.md, SOUL.md, or tool docs
  • anything that only mattered once
  • anything that can live in a dated note or external reference file

Strict rules:

  1. Prefer very short bullets.
  2. Remove all filler and explanation.
  3. Merge overlapping items.
  4. If a note is old, narrow, or tied to a single historical incident, assume it should be removed or moved out unless it still causes recurring failures.
  5. MEMORY.md should contain facts, not essays.
  6. If a fact is useful but not needed on every startup, move it to an archive or dated memory note.
  7. Be aggressive.

Optimization target:

  • Cut total size as much as possible.
  • Prefer a practical target of under 2,000 characters if possible.

Output format:

  1. “Remove entirely”
  2. “Move to archive or dated memory”
  3. “Keep”
  4. A fully rewritten lean MEMORY.md
  5. A short note on the biggest token savings

Review test for every line:

  • Is this durable?
  • Is this likely to matter again?
  • Does it change behavior in a meaningful way?
  • Is it worth injecting every startup?

If no, remove it.

Do not ask clarifying questions first.
Rewrite immediately.

If you want, after Claude trims it, paste the result here and I’ll do a second-pass cleanup.

stoic pivot
#

Can you write a prompt for me to trim SOUL.md?

proud galeBOT
#

Yep, here’s a good one for SOUL.md.

Paste this into Claude first, then paste your full SOUL.md under it.

You are trimming SOUL.md for an OpenClaw workspace.

Context:

  • SOUL.md is startup-injected context, so it should be extremely small.
  • Its purpose is only to preserve core voice, vibe, and a few non-negotiable personality/boundary cues.
  • It should NOT duplicate instructions already covered by AGENTS.md, USER.md, IDENTITY.md, MEMORY.md, or built-in system rules.
  • This file is not for long philosophy, lore, backstory, or repeated safety instructions.

Your job:
Ruthlessly compress SOUL.md into the smallest possible file that still preserves the intended personality and tone.

Keep only:

  • the smallest set of lines needed to preserve voice
  • a few core behavioral truths
  • a few genuine boundaries if they are unique to SOUL.md
  • the vibe in compact form

Delete or compress:

  • lore
  • origin story
  • philosophy paragraphs
  • repeated safety language
  • repeated “be helpful / be kind / be careful” wording
  • examples
  • dramatic phrasing
  • duplicate tone guidance
  • anything already covered elsewhere
  • anything that does not change behavior in a noticeable way

Strict rules:

  1. Prefer short bullets over paragraphs.
  2. Keep personality, but make it compact.
  3. One idea per line.
  4. Remove repetition and explanation.
  5. Preserve tone with as few words as possible.
  6. If a line is nice but not essential, delete it.
  7. Aim for a file that feels alive, but not bloated.

Optimization target:

  • Prefer a practical target of under 1,000 characters if possible.

Output format:

  1. “Remove entirely”
  2. “Keep in compressed form”
  3. A fully rewritten lean SOUL.md
  4. A short note on what personality traits were preserved

Review test for every line:

  • Does this meaningfully shape tone?
  • Is it unique to SOUL.md?
  • Is it worth paying for on every startup?
  • Can it be said in fewer words?

If no, remove or compress it.

Do not ask clarifying questions first.
Rewrite immediately.

If you want, after Claude trims it, paste the result here and I’ll help you make it even tighter.

stoic pivot
proud galeBOT
#

Yep, here’s one for IDENTITY.md.

Paste this into Claude first, then paste your full IDENTITY.md under it.

You are trimming IDENTITY.md for an OpenClaw workspace.

Context:

  • IDENTITY.md is startup-injected context, so it must be tiny.
  • Its purpose is only to preserve the agent’s essential identity markers.
  • It should NOT duplicate tone rules from SOUL.md, operating rules from AGENTS.md, or user/environment facts from USER.md and MEMORY.md.
  • This file is for identity, not lore.

Your job:
Reduce IDENTITY.md to the smallest possible file that still preserves the core identity of the agent.

Keep only:

  • name
  • short role or species label if truly important
  • one-line vibe if it materially affects responses
  • one or two stable identity cues that are actually used

Delete or compress:

  • backstory
  • jokes that do not matter
  • long vibe descriptions
  • repeated personality traits already covered elsewhere
  • catchphrases unless they are genuinely important
  • lore
  • examples
  • decorative text
  • anything that is nice but not necessary

Strict rules:

  1. Prefer very short bullets.
  2. Keep only identity, not behavior policy.
  3. Remove repetition.
  4. Compress every line hard.
  5. If something is already covered in SOUL.md, remove it here.
  6. If a detail would not be noticeable if removed, delete it.

Optimization target:

  • Prefer a practical target of under 500 characters if possible.

Output format:

  1. “Remove entirely”
  2. “Keep in compressed form”
  3. A fully rewritten lean IDENTITY.md
  4. A one-sentence explanation of what identity was preserved

Review test for every line:

  • Is this core identity?
  • Is it unique to IDENTITY.md?
  • Does it matter on most runs?
  • Can it be said in fewer words?

If no, remove it.

Do not ask clarifying questions first.
Rewrite immediately.

If you want, I can also give you a matching one for USER.md and TOOLS.md so you can trim the whole startup stack fast.

stoic pivot
#

Can you write a prompt for me to trim USER.md?

proud galeBOT
#

Yep, here’s one for USER.md.

Paste this into Claude first, then paste your full USER.md under it.

You are trimming USER.md for an OpenClaw workspace.

Context:

  • USER.md is startup-injected context, so it should be very small.
  • Its purpose is only to preserve stable, high-value facts about the user that help on many runs.
  • It should NOT contain long biography, story, philosophy, project history, or repeated preferences that rarely matter.
  • Anything temporary, detailed, or rarely relevant should be moved out.

Your job:
Reduce USER.md to the smallest possible file that still preserves the most useful user facts.

Keep only:

  • preferred name
  • how to address the user
  • timezone
  • a few stable preferences that matter often
  • a few durable facts that frequently improve behavior

Delete or move out:

  • long biography
  • life story
  • project history
  • vision statements
  • repeated relationship or hierarchy text unless truly necessary
  • details that rarely affect replies
  • examples
  • decorative sections
  • anything stale or nice-to-have

Strict rules:

  1. Prefer short bullets.
  2. Keep only durable user facts.
  3. Remove all explanation and storytelling.
  4. Merge overlapping facts.
  5. If a detail would only matter occasionally, move it out.
  6. If a fact is already covered elsewhere, remove it here.
  7. Be aggressive.

Optimization target:

  • Prefer a practical target of under 700 characters if possible.

Output format:

  1. “Remove entirely”
  2. “Move to archive or external note”
  3. “Keep”
  4. A fully rewritten lean USER.md
  5. A short explanation of what was preserved

Review test for every line:

  • Is this durable?
  • Does this improve many future runs?
  • Is it worth injecting every startup?
  • Can it be said in fewer words?

If no, remove it.

Do not ask clarifying questions first.
Rewrite immediately.

If you want, I can also give you a final one for TOOLS.md, which is usually an easy extra cut.

stoic pivot
#

This is after trimming:
🧠 Context breakdown
Workspace: C:\Users\prapunsak.openclaw\workspace
Bootstrap max/file: 32,000 chars
Bootstrap max/total: 180,000 chars
Sandbox: mode=off sandboxed=false
System prompt (run): 38,067 chars (~9,517 tok) (Project Context 9,343 chars (~2,336 tok))

Injected workspace files:

  • AGENTS.md: OK | raw 2,844 chars (~711 tok) | injected 2,844 chars (~711 tok)
  • SOUL.md: OK | raw 2,265 chars (~567 tok) | injected 2,265 chars (~567 tok)
  • TOOLS.md: OK | raw 1,636 chars (~409 tok) | injected 1,636 chars (~409 tok)
  • IDENTITY.md: OK | raw 399 chars (~100 tok) | injected 399 chars (~100 tok)
  • USER.md: OK | raw 620 chars (~155 tok) | injected 620 chars (~155 tok)
  • HEARTBEAT.md: OK | raw 860 chars (~215 tok) | injected 860 chars (~215 tok)
  • BOOTSTRAP.md: MISSING | raw 0 | injected 0
  • MEMORY.md: OK | raw 1,041 chars (~261 tok) | injected 1,041 chars (~261 tok)

Skills list (system prompt text): 9,900 chars (~2,475 tok) (21 skills)
Skills: budget-audit, coding-agent, config-diff-review, gog, healthcheck, knowledge-reconciliation, learning-loop, memory-promotion, memory-watchdog, model-routing, nano-pdf, node-connect, notion, receipt-normalizer, self-improvement-review, session-logs, skill-creator, summarize, taskflow, taskflow-inbox-triage, … (+1 more)
Tool list (system prompt text): 0 chars (~0 tok)
Tool schemas (JSON): 13,944 chars (~3,486 tok) (counts toward context; not shown as text)
Tools: read, edit, write, apply_patch, exec, process, cron, update_plan, sessions_list, sessions_history, sessions_send, sessions_yield, sessions_spawn, subagents, session_status, web_fetch, image

Session tokens (cached): 13,504 total / ctx=200,000

Inline shortcut: a command token inside normal text (e.g. “hey /status”) that runs immediately (allowlisted senders only) and is stripped before the model sees the remaining message.

proud galeBOT
#

Nice, that’s a big improvement.

You cut the startup prompt from about 16.6k tokens down to 9.5k, and project context from about 9.4k down to 2.3k. That’s real progress. 🎉

A few important takeaways:

  1. Your workspace files are no longer the main problem

    • they’re only about 2.3k tokens now
    • that part looks much healthier
  2. The new biggest costs are now:

    • tool schemas: ~3.5k
    • skills list: ~2.5k
  3. 16k is still pretty tight

    • 9.5k startup
    • plus session/history
    • plus user message
    • plus output headroom

So I’d do this next:

  • start a fresh session with /new

    • your cached session is still 13.5k
    • that old history can still crush a 16k model
  • shrink skills next

    • lower skills.limits.maxSkillsPromptChars
    • or per-agent skillsLimits.maxSkillsPromptChars
  • shrink tools next

    • for a local Ollama agent, use a much smaller tool allowlist
    • or set compat.supportsTools: false if you just want plain chat
  • lower the bootstrap caps too
    • not because they are hurting now
    • but to stop the files from growing back to 32k / 180k territory later

My honest read:

  • you’ve probably fixed the worst startup bloat
  • but 16k is still a cramped target for a tool-using OpenClaw agent
  • if you can get to 32k or 64k, life gets much easier
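
To put numbers on that, here is a quick sketch of what is left over at a few window sizes. The window sizes are taken as round decimal values (16,000 and so on), which is an assumption. The two token figures are copied from your output above:

```python
STARTUP_TOKENS = 9_517          # "System prompt (run)" after trimming
CACHED_SESSION_TOKENS = 13_504  # "Session tokens (cached)" from the same output


def headroom(window: int, used: int) -> int:
    """Tokens left for history, the user message, and the reply."""
    return window - used


for window in (16_000, 32_000, 64_000):
    fresh = headroom(window, STARTUP_TOKENS)
    cached = headroom(window, CACHED_SESSION_TOKENS)
    print(f"{window:>6,} window: {fresh:>6,} left on a fresh session, "
          f"{cached:>6,} left with the cached session")
```

At 16k the cached session leaves only a couple of thousand tokens, which is why starting with /new matters before testing.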

If you want, I can write you one more Claude prompt for:

  • trimming TOOLS.md, or
  • creating a dedicated lean Ollama agent with smaller skills + tools.
stoic pivot
proud galeBOT
#

Yep, use this for TOOLS.md.

Paste this into Claude first, then paste your full TOOLS.md under it.

You are trimming TOOLS.md for an OpenClaw workspace.

Context:

  • TOOLS.md is startup-injected context, so it should be very small.
  • Its purpose is only to store environment-specific tool notes that are actually useful on many runs.
  • It should NOT duplicate built-in tool docs, workflow essays, examples, long command recipes, or general instructions already covered elsewhere.
  • If a tool note is not specific to this machine/setup, it probably should not live in TOOLS.md.

Your job:
Reduce TOOLS.md to the smallest possible file that still preserves the most useful local tool/environment facts.

Keep only:

  • concrete environment-specific mappings
  • hostnames, aliases, device names, room names, nicknames
  • preferred defaults that matter often
  • short facts that help the agent use this specific setup correctly

Delete or move out:

  • long examples
  • workflow tutorials
  • command recipes
  • coding workflow notes unless truly essential
  • generic tool explanations
  • decorative sections
  • repeated guidance already in AGENTS.md or system behavior
  • anything that is not specific to this environment

Strict rules:

  1. Prefer short bullets.
  2. Keep only setup-specific facts.
  3. Remove all explanation and examples unless absolutely necessary.
  4. If something is a procedure, move it out.
  5. If something is generic advice, remove it.
  6. If a detail is rarely used, move it out.
  7. Be aggressive.

Optimization target:

  • Prefer a practical target of under 600 characters if possible.

Output format:

  1. “Remove entirely”
  2. “Move to external doc”
  3. “Keep”
  4. A fully rewritten lean TOOLS.md
  5. A short note on the biggest token savings

Review test for every line:

  • Is this specific to this machine or environment?
  • Does it help often?
  • Is it worth injecting every startup?
  • Can it be said in fewer words?

If no, remove it.

Do not ask clarifying questions first.
Rewrite immediately.

If you want, after that I can also give you a final prompt for making a dedicated low-context Ollama agent profile.
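
A quick way to track progress against the character targets in these prompts is sketched below. The targets come from the prompts in this thread, the current sizes are copied from your after-trimming breakdown, and the helper name is just illustrative:

```python
# Character targets suggested in the trimming prompts in this thread.
targets = {
    "AGENTS.md": 4_000,
    "MEMORY.md": 2_000,
    "SOUL.md": 1_000,
    "USER.md": 700,
    "TOOLS.md": 600,
    "IDENTITY.md": 500,
}
# Raw sizes (chars) from the after-trimming /context output.
current = {
    "AGENTS.md": 2_844,
    "MEMORY.md": 1_041,
    "SOUL.md": 2_265,
    "USER.md": 620,
    "TOOLS.md": 1_636,
    "IDENTITY.md": 399,
}


def still_over_target(sizes: dict, limits: dict) -> dict:
    """Return the excess characters for each file that is above its target."""
    return {name: size - limits[name] for name, size in sizes.items() if size > limits[name]}


for name, excess in still_over_target(current, targets).items():
    print(f"{name}: {excess:,} chars over target")
```

To measure files directly instead of copying numbers, `len(pathlib.Path(path).read_text(encoding="utf-8"))` gives a comparable character count.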

stoic pivot
#

🦞 OpenClaw 2026.4.21 (f788c88)
🧠 Model: ollama/qwen2.5:3b-instruct-q4_K_M · 🔑 api-key (ollama:default)
🔄 Fallbacks: openai-codex/gpt-5.4
↪️ Fallback: openai-codex/gpt-5.4 · 🔑 oauth (openai-codex:marketing.ps@trrgroup.com (marketing.ps@trrgroup.com)) (format)
🧮 Tokens: 15k in / 34 out
📚 Context: 15k/200k (7%) · 🧹 Compactions: 0
🧵 Session: agent:main:main:heartbeat • updated just now
📌 Tasks: 0 active · 1 total · agent-local
⚙️ Runtime: direct · Think: low · Text: low
🪢 Queue: collect (depth 0)

#

🧠 Context breakdown (detailed)
Workspace: C:\Users\prapunsak.openclaw\workspace
Bootstrap max/file: 32,000 chars
Bootstrap max/total: 180,000 chars
Sandbox: mode=off sandboxed=false
System prompt (run): 38,831 chars (~9,708 tok) (Project Context 10,547 chars (~2,637 tok))
Injected workspace files:

  • AGENTS.md: OK | raw 4,148 chars (~1,037 tok) | injected 4,148 chars (~1,037 tok)
  • SOUL.md: OK | raw 1,996 chars (~499 tok) | injected 1,996 chars (~499 tok)
  • TOOLS.md: OK | raw 1,636 chars (~409 tok) | injected 1,636 chars (~409 tok)
  • IDENTITY.md: OK | raw 399 chars (~100 tok) | injected 399 chars (~100 tok)
  • USER.md: OK | raw 789 chars (~198 tok) | injected 789 chars (~198 tok)
  • HEARTBEAT.md: OK | raw 860 chars (~215 tok) | injected 860 chars (~215 tok)
  • BOOTSTRAP.md: MISSING | raw 0 | injected 0
  • MEMORY.md: OK | raw 1,041 chars (~261 tok) | injected 1,041 chars (~261 tok)

Skills list (system prompt text): 9,465 chars (~2,367 tok) (20 skills)
Skills: budget-audit, coding-agent, config-diff-review, gog, healthcheck, knowledge-reconciliation, learning-loop, memory-watchdog, model-routing, nano-pdf, node-connect, notion, receipt-normalizer, self-improvement-review, session-logs, skill-creator, summarize, taskflow, taskflow-inbox-triage, workspace-hygiene
proud galeBOT
#

Oh, this is interesting, and I think it’s the real clue.

Your status says OpenClaw currently thinks this model has:

  • 📚 Context: 15k/200k

But your actual Ollama behavior sounds like the model is really acting more like 16k.

So the likely problem now is context-window mismatch:

  • OpenClaw believes the model window is 200k
  • so it happily sends a ~15k input
  • but if Ollama/the model only really supports ~16k, you’re already at the cliff before output/tool overhead

That would absolutely explain weird failures.

What I’d check next:

  1. Make sure OpenClaw’s model metadata matches reality

    • if you defined the Ollama model manually, set the model’s contextWindow correctly
    • if it was auto-discovered, the Ollama host may be reporting a window larger than what the model actually tolerates in practice
  2. Start a fresh session after changing it

    • /new
  3. Be careful with this specific model

    • qwen2.5:3b-instruct-q4_K_M is a very small local model for a tool-using agent
    • even with the context fixed, it may still be flaky on tool schemas / agent prompts
  4. If it still struggles, disable tool pressure

    • use a lean agent/tool set
    • or set that model to compat.supportsTools: false

So my short read is:

  • your trimming work helped a lot
  • but now the issue looks less like “prompt too fat”
  • and more like “OpenClaw thinks the model has 200k, but the backend behaves like 16k”

If you want, next I can help you with the exact part: checking whether this model is auto-discovered vs manually defined, and which contextWindow value OpenClaw should really use.

#

This confirms it pretty cleanly.

Your workspace bloat is mostly solved now. The startup floor is about 9.7k tokens, and only about 2.6k of that is project context.

So the remaining pressure is mostly:

  • skills list: ~2.4k
  • tool schemas: likely still a few k from the earlier dump
  • session/history
  • output headroom

So for a backend that behaves like 16k, you’re still living dangerously close to the ceiling.

Big takeaway:

  • AGENTS.md is not the main problem anymore
  • the next big wins are skills + tools + real contextWindow alignment

What I’d do next:

  1. Start fresh

    • /new
  2. Cut skills harder

    • lower skills.limits.maxSkillsPromptChars
    • for a tiny local model, I’d be pretty aggressive here
  3. Run a much leaner local agent

    • tiny tool allowlist
    • or compat.supportsTools: false if you mainly want plain chat
  4. Fix the model metadata if needed

    • if OpenClaw thinks the model is 200k but the backend behaves like 16k, that mismatch still needs fixing
  5. Be realistic about this model

    • qwen2.5:3b-instruct-q4_K_M is very small for a tool-using OpenClaw agent
    • even with trimming, it may still struggle

So, short version: you already won the file-trimming battle. The next bottleneck is no longer AGENTS.md, it’s the runtime surface around skills/tools/contextWindow.

If you want, I can help you with the next best step:

  • a lean Ollama agent config, or
  • a Claude prompt to trim the skill/tool surface.
stoic pivot
#

I am only trying to use Ollama Qwen for heartbeat right now. The 200k is from the gpt 5.4 fallback. Qwen supports up to 32k.

proud galeBOT
#

Ah, got it, that changes the picture.

If:

  • Qwen is only for heartbeat
  • and the model really has 32k
  • and your trimmed startup is about 9.7k

then raw context size is probably no longer the main blocker. That should fit.

So at this point I’d suspect one of these instead:

  1. actual Ollama runtime limit is lower than expected

    • model says 32k, but runtime num_ctx or practical limit is lower
  2. small-model fragility

    • qwen2.5:3b-instruct-q4_K_M is still a small quantized model
    • it may choke on the full agent prompt shape even if the token count technically fits
  3. heartbeat does not need tools, but the model still sees tool/schema overhead
    • for heartbeat, that’s mostly wasted pressure

For heartbeat specifically, I’d make it as lean as possible:

  • disable tool pressure for that model if you can
    compat.supportsTools: false

  • cut skills prompt budget hard

    • heartbeat does not need a big skills block
  • keep thinking low or off

    • heartbeat should be tiny and dumb, not agentic

So my updated read is:

  • your trimming work probably fixed the original startup bloat
  • if heartbeat still fails, the next issue is likely Qwen 3B reliability / actual runtime context, not just token size

If you want the fastest next step, paste the exact heartbeat failure/log message. That will tell us whether this is:

  • real context overflow
  • Ollama runtime config
  • or just the local model being too weak for the prompt shape.
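
To check the runtime side, you can ask Ollama what it reports for the model. This is a minimal sketch, assuming Ollama is on the default `http://localhost:11434` and that `/api/show` returns a `parameters` string and a `model_info` map, as in current Ollama releases. The helper names are mine, and the sample payload is made up for illustration:

```python
import json
import urllib.request


def extract_context_info(show: dict) -> dict:
    """Pull the trained context length and any num_ctx override out of an /api/show response."""
    info = {"context_length": None, "num_ctx": None}
    for key, value in (show.get("model_info") or {}).items():
        if key.endswith(".context_length"):
            info["context_length"] = int(value)
    for line in (show.get("parameters") or "").splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "num_ctx":
            info["num_ctx"] = int(parts[1])
    return info


def fetch_show(model: str, host: str = "http://localhost:11434") -> dict:
    """POST /api/show for one model (needs a running Ollama server)."""
    body = json.dumps({"model": model}).encode("utf-8")
    req = urllib.request.Request(
        f"{host}/api/show", data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


# Illustrative payload only, not real output from your machine.
sample = {
    "model_info": {"qwen2.context_length": 32768},
    "parameters": "num_ctx 8192\ntemperature 0.7",
}
print(extract_context_info(sample))
```

On your machine you would call `extract_context_info(fetch_show("qwen2.5:3b-instruct-q4_K_M"))`. If `num_ctx` comes back empty, Ollama uses its own default, which may be much smaller than the model's trained context length.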
stoic pivot
#

13:43:29 [agent/embedded] embedded run agent end: runId=b909487b-77f8-4f54-9f0e-d3c9d0dd6380 isError=true model=qwen2.5:3b-instruct-q4_K_M provider=ollama error=LLM request failed: provider rejected the request schema or tool payload. rawError=400 {"error":"\"qwen2.5:3b-instruct-q4_K_M\" does not support thinking"}
13:43:29 [ws] ⇄ res ✓ sessions.list 151ms conn=fb10ff70…4237 id=63274717…d7e0
13:43:29 [agent/embedded] auth profile failure state updated: runId=b909487b-77f8-4f54-9f0e-d3c9d0dd6380 profile=sha256:9c018ec112cf provider=ollama reason=format window=cooldown reused=false
13:43:29 [agent/embedded] embedded run failover decision: runId=b909487b-77f8-4f54-9f0e-d3c9d0dd6380 stage=assistant decision=fallback_model reason=format from=ollama/qwen2.5:3b-instruct-q4_K_M profile=sha256:9c018ec112cf

proud galeBOT
#

Yep, we found it.

This is not the context issue.

The real error is here:

rawError=400 {"error":"\"qwen2.5:3b-instruct-q4_K_M\" does not support thinking"}

So OpenClaw is sending a thinking setting to Ollama, and this Qwen model rejects it.

Your status already hinted at it:

  • ⚙️ Runtime: ... Think: low

So the fix is:

  1. turn thinking off for that heartbeat session/agent
    • quick test: /think off
  2. make it persistent for the heartbeat/local agent
    • set that agent’s thinkingDefault: "off"

If you want it globally for local runs:

{
  agents: {
    defaults: {
      thinkingDefault: "off",
    },
  },
}
#

Better if only the heartbeat/local agent should use it:

{
  agents: {
    list: [
      {
        id: "your-heartbeat-agent",
        thinkingDefault: "off",
      },
    ],
  },
}

Why this matters:

  • heartbeat does not need thinking
stoic pivot
#

Thinking is off in the heartbeat session, but it still times out:
13:55:34 [agent/embedded] Profile ollama:default timed out. Trying next account...
13:55:34 [agent/embedded] embedded run failover decision: runId=a11a0d12-63fd-427a-b4ae-005d3ce4bdfb stage=assistant decision=fallback_model reason=timeout from=openclaw/gateway-injected profile=sha256:9c018ec112cf
13:55:34 [diagnostic] lane task error: lane=main durationMs=128237 error="FailoverError: LLM request timed out."
13:55:34 [diagnostic] lane task error: lane=session:agent:main:main:heartbeat durationMs=128242 error="FailoverError: LLM request timed out."

proud galeBOT
#

That’s actually useful, because it means we moved past the thinking error.

Now the failure mode is:

  • request accepted
  • model runs
  • then times out after ~128s

So this is no longer ā€œbad requestā€, it’s now either:

  1. the local Ollama model is just too slow / wedged, or
  2. the full heartbeat prompt is still too heavy for this 3B model
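
The two failure signatures seen so far can be told apart mechanically from the failover decision lines. This is only a sketch in Python: it relies on the key=value layout of the two log lines pasted in this thread, and the function names and the wording of the verdicts are mine.

```python
import re

# Matches the "embedded run failover decision" lines pasted in this thread.
FAILOVER_RE = re.compile(r"embedded run failover decision: (?P<fields>.*)$")


def parse_failover(line):
    """Return the key=value fields of a failover decision line, or None."""
    match = FAILOVER_RE.search(line)
    if match is None:
        return None
    tokens = match.group("fields").split()
    return dict(tok.split("=", 1) for tok in tokens if "=" in tok)


def diagnose(line):
    """Map the reason field to a rough verdict (the verdict text is my own)."""
    fields = parse_failover(line)
    if fields is None:
        return "not a failover line"
    reason = fields.get("reason")
    if reason == "format":
        return "request rejected: check thinking or tool payload"
    if reason == "timeout":
        return "request accepted but too slow"
    return f"other: {reason}"
```

A reason of format points at the request shape (the thinking error), while timeout means the request got through and the model or prompt size is the suspect.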

Best next test, isolate the model from the full agent prompt:

openclaw infer model run --model ollama/qwen2.5:3b-instruct-q4_K_M --prompt "Reply with exactly HEARTBEAT_OK" --thinking off --json

How to read that result:

  • if this also times out
    -> it’s an Ollama/model/runtime issue, not heartbeat specifically

  • if this succeeds quickly
    -> the model can answer, but the full OpenClaw heartbeat prompt is still too much
    -> then I’d do:

    • agents.defaults.experimental.localModelLean: true
    • disable tools for that model with compat.supportsTools: false
    • cut skills prompt budget harder

For heartbeat specifically, I’d strongly lean toward:

  • thinking off
  • tools off
  • minimal skills
  • basically a tiny text-only local agent

Because qwen2.5:3b-instruct-q4_K_M is really small for the normal OpenClaw runtime shape.

So the clean next move is: run that openclaw infer model run ... smoke test and paste the result.
That will tell us immediately whether this is:

  • Ollama/runtime slowness, or
  • agent prompt pressure.
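
If you want to take OpenClaw out of the picture entirely, a direct call to Ollama gives a baseline latency for the model on its own. A minimal Python sketch, assuming Ollama's native /api/generate endpoint on the default port; the host, the timeout, and the helper names are assumptions on my side.

```python
import json
import time
import urllib.request


def build_payload(model, prompt, num_ctx=None):
    """Build a non-streaming /api/generate request body."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    if num_ctx is not None:
        # Optional runtime context window override.
        payload["options"] = {"num_ctx": num_ctx}
    return payload


def smoke_test(model, prompt, host="http://localhost:11434", timeout=120):
    """Send one tiny prompt straight to Ollama and time the round trip."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    started = time.monotonic()
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    elapsed = round(time.monotonic() - started, 1)
    return {"seconds": elapsed, "text": body.get("response", "")}


# Example (needs a running Ollama server):
# print(smoke_test("qwen2.5:3b-instruct-q4_K_M", "Reply with exactly HEARTBEAT_OK"))
```

If this tiny prompt alone takes tens of seconds, the bottleneck is the model or the machine, not the agent prompt.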
stoic pivot
#

PS C:\WINDOWS\System32> openclaw infer model run --model ollama/qwen2.5:3b-instruct-q4_K_M --prompt "Reply with exactly HEARTBEAT_OK" --thinking off --json
error: unknown option '--thinking'

proud galeBOT
#

Yep, my bad, sorry, infer model run on your build doesn’t take --thinking.

Use openclaw agent for this test instead:

openclaw agent --local --agent <your-heartbeat-agent-id> --message "Reply with exactly HEARTBEAT_OK" --thinking off --json

If you don’t know the agent id first, run:

openclaw agents list

Then use the same agent your heartbeat is using.

Why this test is better:

  • it uses the normal agent runtime
  • it lets you force --thinking off
  • it tells us whether the model can answer a tiny prompt without the heartbeat loop getting in the way

How to read it:

  • if this also times out
    -> Ollama/model/runtime is the bottleneck

  • if this succeeds quickly
    -> the heartbeat setup/prompt surface is still too heavy

So the corrected next step is:

  1. openclaw agents list
  2. run the openclaw agent --local ... --thinking off ... test
  3. paste the result
#

Sorry again, that earlier command was on me.

stoic pivot
#

PS C:\WINDOWS\System32> openclaw agent --local --agent main --message "Reply with exactly HEARTBEAT_OK" --thinking off --json
{
"payloads": [
{
"text": "HEARTBEAT_OK",
"mediaUrl": null
}
],
"meta": {
"durationMs": 46349,
"agentMeta": {
"sessionId": "fe6bf557-9fd2-425e-9ce6-4660ecc6438f",
"provider": "openai-codex",
"model": "gpt-5.4-mini",
"usage": {
"input": 15499,
"output": 9,
"total": 15508
},
"lastCallUsage": {
"input": 15499,
"output": 9,
"cacheRead": 0,
"cacheWrite": 0,
"total": 15508
},
"promptTokens": 15499
},
"aborted": false,
"systemPromptReport": {
"source": "run",
"generatedAt": 1776928540903,
"sessionId": "fe6bf557-9fd2-425e-9ce6-4660ecc6438f",
"sessionKey": "agent:main:main",
"provider": "openai-codex",
"model": "gpt-5.4-mini",
"workspaceDir": "C:\Users\prapunsak\.openclaw\workspace",
"bootstrapMaxChars": 32000,
"bootstrapTotalMaxChars": 180000,
"bootstrapTruncation": {
"warningMode": "once",
"warningShown": false,
"truncatedFiles": 0,
"nearLimitFiles": 0,
"totalNearLimit": false
},
"sandbox": {
"mode": "off",
"sandboxed": false
},
"systemPrompt": {
"chars": 37673,
"projectContextChars": 10547,
"nonProjectContextChars": 27126
},
"injectedWorkspaceFiles": [
{
"name": "AGENTS.md",
"path": "C:\Users\prapunsak\.openclaw\workspace\AGENTS.md",
"missing": false,
"rawChars": 4148,
"injectedChars": 4148,
"truncated": false
},
{
"name": "SOUL.md",
"path": "C:\Users\prapunsak\.openclaw\workspace\SOUL.md",
"missing": false,
"rawChars": 1996,
"injectedChars": 1996,
"truncated": false
},
{
"name": "TOOLS.md",
"path": "C:\Users\prapunsak\.openclaw\workspace\TOOLS.md",
"missing": false,
"rawChars": 1636,
"injectedChars": 1636,
"truncated": false
},
{
"name": "IDENTITY.md",
"path": "C:\Users\prapunsak\.openclaw\workspace\IDENTITY.md",
"missing": false,
"rawChars": 399,
"injectedChars": 399,
"truncated": false
},
{
"name": "USER.md",
"path": "C:\Users\prapunsak\.openclaw\workspace\USER.md",
"missing": false,
"rawChars": 789,
"injectedChars": 789,
"truncated": false
},
{
"name": "HEARTBEAT.md",
"path": "C:\Users\prapunsak\.openclaw\workspace\HEARTBEAT.md",
"missing": false,
"rawChars": 860,
"injectedChars": 860,
"truncated": false
},
{
"name": "BOOTSTRAP.md",
"path": "C:\Users\prapunsak\.openclaw\workspace\BOOTSTRAP.md",
"missing": true,
"rawChars": 0,
"injectedChars": 0,
"truncated": false
},
{
"name": "MEMORY.md",
"path": "C:\Users\prapunsak\.openclaw\workspace\MEMORY.md",
"missing": false,
"rawChars": 1041,
"injectedChars": 1041,
"truncated": false
}
],
"skills": {
"promptChars": 9900,
"entries": [
{
"name": "budget-audit",
"blockChars": 351
},
{
"name": "coding-agent",
"blockChars": 836
},
{
"name": "config-diff-review",
"blockChars": 444
},
{
"name": "gog",
"blockChars": 236
},
{
"name": "healthcheck",
"blockChars": 495
},
{
"name": "knowledge-reconciliation",
"blockChars": 530
},
{
"name": "learning-loop",
"blockChars": 520
},
{
"name": "memory-promotion",
"blockChars": 432
},
{
"name": "memory-watchdog",
"blockChars": 466
},
{
"name": "model-routing",
"blockChars": 502
},
{
"name": "nano-pdf",
"blockChars": 238
},
{
"name": "node-connect",
"blockChars": 545
},
{
"name": "notion",
"blockChars": 232
},
{
"name": "receipt-normalizer",
"blockChars": 433
},
{
"name": "self-improvement-review",
"blockChars": 450
},
{
"name": "session-logs",
"blockChars": 257
},
{
"name": "skill-creator",
"blockChars": 763
},
{
"name": "summarize",
"blockChars": 300
},
{
"name": "taskflow",
"blockChars": 547
},
{
"name": "taskflow-inbox-triage",
"blockChars": 422
},
{
"name": "workspace-hygiene",
"blockChars": 472
}
]
},
"tools": {
"listChars": 0,
"schemaChars": 14369,
"entries": [
{
"name": "read",
"summaryChars": 298,
"schemaChars": 304,
"propertiesCount": 3
},
{
"name": "edit",
"summaryChars": 326,
"schemaChars": 834,
"propertiesCount": 2
},
{
"name": "write",
"summaryChars": 127,
"schemaChars": 225,
"propertiesCount": 2
},
{
"name": "apply_patch",
"summaryChars": 132,
"schemaChars": 153,
"propertiesCount": 1
},
{
"name": "exec",
"summaryChars": 968,
"schemaChars": 1157,
"propertiesCount": 12
},
{
"name": "process",
"summaryChars": 416,
"schemaChars": 961,
"propertiesCount": 12
},
{
"name": "cron",
"summaryChars": 3726,
"schemaChars": 6700,
"propertiesCount": 13
},
{
"name": "update_plan",
"summaryChars": 251,
"schemaChars": 574,
"propertiesCount": 2
},
{
"name": "sessions_list",
"summaryChars": 177,
"schemaChars": 212,
"propertiesCount": 4
},
{
"name": "sessions_history",
"summaryChars": 180,
"schemaChars": 161,
"propertiesCount": 3
},
{
"name": "sessions_send",
"summaryChars": 208,
"schemaChars": 274,
"propertiesCount": 5
},
{
"name": "sessions_yield",
"summaryChars": 97,
"schemaChars": 60,
"propertiesCount": 1
},
{
"name": "sessions_spawn",
"summaryChars": 302,
"schemaChars": 1333,
"propertiesCount": 18
},
{
"name": "subagents",
"summaryChars": 105,
"schemaChars": 191,
"propertiesCount": 4
},
{
"name": "session_status",
"summaryChars": 336,
"schemaChars": 89,
"propertiesCount": 2
},
{
"name": "web_search",
"summaryChars": 93,
"schemaChars": 425,
"propertiesCount": 4
},
{
"name": "web_fetch",
"summaryChars": 129,
"schemaChars": 374,
"propertiesCount": 3
},
{
"name": "image",
"summaryChars": 260,
"schemaChars": 342,
"propertiesCount": 6
}
]
}
},
"finalPromptText": "Reply with exactly HEARTBEAT_OK",
"finalAssistantVisibleText": "HEARTBEAT_OK",
"finalAssistantRawText": "HEARTBEAT_OK",
"replayInvalid": false,
"livenessState": "working",
"stopReason": "stop",
"executionTrace": {
"winnerProvider": "openai-codex",
"winnerModel": "gpt-5.4-mini",
"attempts": [
{
"provider": "openai-codex",
"model": "gpt-5.4-mini",
"result": "success",
"stage": "assistant"
}
],
"fallbackUsed": false,
"runner": "embedded"
},
"requestShaping": {
"authMode": "auth-profile",
"thinking": "off"
},
"completion": {
"stopReason": "stop",
"finishReason": "stop"
}
}
}

proud galeBOT
#

This is helpful, and it also shows why the test didn’t prove the Ollama path yet.

That run used:

  • provider: openai-codex
  • model: gpt-5.4-mini

So --agent main hit your normal main session, not the heartbeat session that was using Ollama.

You can even see it in the session key:

  • test command used agent:main:main
  • earlier heartbeat was agent:main:main:heartbeat

So the next correct test is to target the actual heartbeat session id, not just --agent main.

Try:

openclaw sessions --json

Find the session whose key is:

agent:main:main:heartbeat

Then run:

openclaw agent --local --session-id <that-session-id> --message "Reply with exactly HEARTBEAT_OK" --thinking off --json

Why this matters:

  • session-level model overrides live on the session
  • your heartbeat session is the one pinned to Ollama/Qwen
  • --agent main created/used a different session that stayed on Codex

So right now we’ve confirmed:

  • main agent works
  • Codex fallback works
  • but we have not yet directly tested the heartbeat Ollama session

If you paste the openclaw sessions --json bit for the heartbeat row, I can help you point the next command at the exact right target.

#

Close, but that’s still the systemPromptReport chunk, not the sessions list row I need.

What I need is the entry for the heartbeat session from:

openclaw sessions --json

Easiest way on PowerShell:

openclaw sessions --json | Select-String "agent:main:main:heartbeat" -Context 0,20

Paste the chunk that includes things like:

  • sessionId
  • sessionKey
  • ideally model/provider if shown

Right now we’re trying to grab the actual heartbeat session id, because that’s the one using Ollama. The --agent main test hit Codex instead.

So, one more try:

openclaw sessions --json | Select-String "heartbeat" -Context 0,20

Paste that result and I’ll help you target the exact session.
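
If Select-String is awkward, the same lookup can be sketched in Python. I have not seen the exact shape of the openclaw sessions --json output, so this walks whatever JSON it gets and only assumes that each session entry is an object with a sessionKey field (the field name comes from the systemPromptReport you pasted). The function name is mine.

```python
import json


def find_by_session_key(node, session_key):
    """Recursively collect every JSON object whose sessionKey matches."""
    matches = []
    if isinstance(node, dict):
        if node.get("sessionKey") == session_key:
            matches.append(node)
        for value in node.values():
            matches.extend(find_by_session_key(value, session_key))
    elif isinstance(node, list):
        for item in node:
            matches.extend(find_by_session_key(item, session_key))
    return matches


# Example: save the CLI output to a file first, then load and search it.
# with open("sessions.json", encoding="utf-8") as fh:
#     rows = find_by_session_key(json.load(fh), "agent:main:main:heartbeat")
#     print(json.dumps(rows, indent=2))
```

Because it walks the whole tree, it should keep working whether the sessions sit in a top-level list or under some wrapper key.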

#

No worries, that’s still not the session row, but honestly this chunk is enough to see the next bottleneck.

These two numbers are the big deal now:

  • skills: 9900 chars
  • tool schemas: 14369 chars

That is a lot of runtime surface for a 3B Qwen heartbeat model.

So I’d stop trimming the markdown files now. You already got the file bloat way down. The next problem is the runtime prompt shape, not AGENTS.md.
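
To put those two numbers in context-window terms, here is a back-of-the-envelope sketch in Python. The 4 characters per token ratio is a rough heuristic I am assuming, not something OpenClaw reports, and real tokenizers will differ; the char counts are the ones from your report.

```python
import math

CHARS_PER_TOKEN = 4  # rough heuristic (assumption); real tokenizers vary


def estimate_tokens(chars):
    """Estimate tokens from a character count, rounding up."""
    return math.ceil(chars / CHARS_PER_TOKEN)


def window_share(chars, window_tokens):
    """Fraction of a context window consumed by that many characters."""
    return estimate_tokens(chars) / window_tokens


# Char counts taken from the pasted systemPromptReport.
report = {"skills": 9900, "tool_schemas": 14369}

for window in (16384, 32768):
    total_chars = sum(report.values())
    share = window_share(total_chars, window)
    print(f"{window} token window: skills + tool schemas use about {share:.0%}")
```

Even as a rough estimate, this shows a large slice of a small window going to surface the heartbeat never uses.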

What to do next, in order:

  1. keep thinking off

    • we already know that part mattered
  2. turn on lean local mode

    • agents.defaults.experimental.localModelLean: true
  3. disable tool calling for this Qwen model

    • compat.supportsTools: false
  4. shrink the skills prompt budget hard

    • skills.limits.maxSkillsPromptChars
    • or better, a much smaller per-agent skillsLimits.maxSkillsPromptChars for the heartbeat/local agent

At this point, that’s the real path forward.

My honest read now:

  • your markdown trimming helped a lot
  • but Qwen 3B + full OpenClaw skill/tool surface is still too heavy
  • for heartbeat, I’d treat it as a tiny text-only model, not a full agentic model

So the practical target is:

  • thinking off
  • tools off
  • smaller skills block
  • lean local mode on

If you want, I can write you a single Claude prompt that says:

“rewrite my OpenClaw config for a dedicated ultra-lean local heartbeat agent using Qwen”

That might be the fastest next move.

#

Yep, that’s the smoking gun.

For your local Qwen heartbeat model, these tool schemas are wildly expensive:

  • cron → 6700 schema chars
  • sessions_spawn → 1333
  • exec → 1157
  • process → 961

That is a ton of overhead for a 3B local model, and heartbeat does not need that stuff.

So the practical conclusion is:

  • your markdown file trimming helped
  • but now the real problem is tool schema bloat
  • and cron is the biggest offender by far
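
For the record, this is how the ranking falls out of the numbers in your paste. A small Python sketch; the schemaChars values are copied from the tools.entries block of your systemPromptReport, and the helper name is mine.

```python
# schemaChars per tool, copied from the pasted systemPromptReport.
SCHEMA_CHARS = {
    "read": 304, "edit": 834, "write": 225, "apply_patch": 153,
    "exec": 1157, "process": 961, "cron": 6700, "update_plan": 574,
    "sessions_list": 212, "sessions_history": 161, "sessions_send": 274,
    "sessions_yield": 60, "sessions_spawn": 1333, "subagents": 191,
    "session_status": 89, "web_search": 425, "web_fetch": 374, "image": 342,
}


def rank_by_share(schema_chars):
    """Return (tool, chars, share of total) sorted from heaviest to lightest."""
    total = sum(schema_chars.values())
    ranked = sorted(schema_chars.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, chars, chars / total) for name, chars in ranked]


for name, chars, share in rank_by_share(SCHEMA_CHARS)[:4]:
    print(f"{name:15s} {chars:5d} chars  {share:.1%}")
```

The four heaviest tools together account for roughly 70% of the schema characters, which is why dropping them matters more than any further file trimming.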
#

If this model is only for heartbeat, I would do this:

  1. disable tools entirely for Qwen

    • compat.supportsTools: false
  2. if you can’t do that, give the local heartbeat agent a tiny tool allowlist

    • definitely remove:
      • cron
      • exec
      • process
      • sessions_spawn
      • probably the rest of the sessions_* tools too
  3. keep:

    • thinkingDefault: "off"
    • agents.defaults.experimental.localModelLean: true
    • smaller skills.limits.maxSkillsPromptChars
#

So, blunt version:

cron alone is probably killing your Qwen heartbeat path.

Heartbeat should be a tiny text-only lane, not a full agent-with-everything lane.

If you want, I can now write you the Claude prompt for “rewrite my config into an ultra-lean local heartbeat agent”.

#

Yep, this confirms two things pretty clearly:

  1. your manual smoke test still did not hit Qwen

    • winnerProvider: "openai-codex"
    • winnerModel: "gpt-5.4-mini"
  2. thinking off is working

    • requestShaping.thinking: "off"

So the state of play is:

  • the thinking bug is fixed
  • the manual test path is still Codex
  • the remaining failure is specifically the Ollama heartbeat path
  • and your earlier tool/skill numbers make it very likely that Qwen 3B is timing out on the full runtime surface

At this point I would stop chasing more pasted JSON and just make the config simpler:

  • thinkingDefault: "off" for the local heartbeat lane
  • agents.defaults.experimental.localModelLean: true
  • compat.supportsTools: false for that Qwen model
  • lower skills.limits.maxSkillsPromptChars hard for the heartbeat/local agent

That is the practical fix path.

Bluntly: Qwen 3B is probably too weak for the current heartbeat lane unless you make it almost text-only.

If you want, I can now write you one clean prompt for Claude:

“rewrite my OpenClaw config into a dedicated ultra-lean Qwen heartbeat agent”

That’s probably the fastest way out of this.

stoic pivot
#

That would be very helpful. I am also new to Claude Code, so I wasn't sure what folder to point it to.

proud galeBOT
#

Yep, and for this job I’d point Claude Code at:

C:\Users\prapunsak\.openclaw

Why that folder:

  • openclaw.json should live there
  • your workspace\ folder lives under it
  • Claude needs to see both the config and the injected workspace files

If you only wanted to trim prompt files, C:\Users\prapunsak\.openclaw\workspace would be enough.
But for the heartbeat/Qwen config fix, use the parent .openclaw folder.

Paste this into Claude Code:

You are working inside my OpenClaw state/config directory on Windows.

Working directory:
C:\Users\prapunsak\.openclaw

Goal:
Make my OpenClaw heartbeat use an ultra-lean local Ollama Qwen path that is small and reliable enough for heartbeat-only use.

Important context:
- I am only trying to use local Ollama Qwen for heartbeat.
- My normal main/fallback setup should keep working.
- I already trimmed AGENTS.md / SOUL.md / MEMORY.md / etc, so file bloat is no longer the main issue.
- Current remaining pressure is the runtime surface: skills + tool schemas.
- I saw a raw Ollama error saying the Qwen model does not support thinking.
- After setting thinking off, the heartbeat still timed out.
- Tool schema inspection showed very heavy tools, especially cron, exec, process, sessions_spawn, and related session tools.
- This local model is qwen2.5:3b-instruct-q4_K_M via Ollama.
- This model should be treated as a tiny heartbeat-only text model, not a full agentic model.

Your job:
Inspect my current OpenClaw config and workspace, then produce the smallest safe config changes needed to create or tune a dedicated ultra-lean local heartbeat path.

Read first:
- openclaw.json
- workspace/
- any included config files referenced from openclaw.json
- docs for:
  - https://docs.openclaw.ai/providers/ollama
  - https://docs.openclaw.ai/gateway/local-models
  - https://docs.openclaw.ai/tools/thinking
  - https://docs.openclaw.ai/gateway/configuration-reference

Strict rules:
1. Do not guess config keys.
2. Use only documented OpenClaw config paths.
3. Keep changes as narrow as possible.
4. Do not break my normal main/fallback model setup.
5. Do not touch auth unless absolutely necessary.
6. Prefer editing only the config needed for the heartbeat/local lane.
7. Use JSON5, not YAML.
8. Preserve comments and formatting where reasonable.

What I want you to optimize:
- thinking off for the heartbeat/local agent
- lean local model mode if appropriate
- much smaller skills prompt budget for the heartbeat/local agent
- remove or avoid heavy tool schemas for the local Qwen heartbeat path
- if supported and appropriate, make the Qwen heartbeat path no-tools / text-only
- reduce avoidable startup/runtime pressure without changing unrelated behavior

Please specifically investigate whether the best fix is:
- a dedicated heartbeat/local agent config, or
- a model-specific compatibility change, or
- both

Also check whether the current heartbeat is using:
- a dedicated agent, or
- a session-level model override on main

If the current setup is awkward, propose the simplest reliable configuration.

Output format:
1. Diagnosis
2. Exact files to change
3. A minimal patch or exact replacement blocks
4. Why each change helps
5. A short test plan with commands I can run after editing
6. If qwen2.5:3b-instruct-q4_K_M is still a bad fit even after slimming, say so plainly and recommend the cheapest stable fallback option for heartbeat

Important:
Do not do a broad cleanup. Stay focused on making the local heartbeat lane small and reliable.

If you want, after Claude gives you the patch, paste it here and I’ll sanity-check it before you apply it.

stoic pivot
#

Could you give citation for agents.defaults.experimental.localModelLean: true ? Claude Code can't find it

ME: is it possible to do this just for the heartbeat? Do you recommend? agents.defaults.experimental.localModelLean: true

CC: Good question. Let me check the config reference for scope.

Fetched
https://docs.openclaw.ai/gateway/configuration-reference
It's not documented in the reference, which means it's either undocumented experimental or I misspoke earlier. Don't rely on it. The documented levers are:

compat.supportsTools: false on the model (which you already have) — this is the stable, per-model way to drop tool schemas. That's your real weight-cutter.
agents.defaults.models["…"].params for per-model thinking/reasoning pins (which you already have).
Those two together give you the lean local path. Adding an undocumented experimental flag risks config errors and unpredictable behavior.

Recommendation: Stick with what you applied. The supportsTools: false on the Qwen 3B model is the lean mode — it strips the heavyweight tool schemas (cron, exec, process, sessions_spawn, etc.) from the prompt for that model only. That's the documented way.

If the heartbeat is still timing out after a restart, the issue is model speed or Ollama queueing, not tool-schema bloat. In that case, switch to Qwen 1.5B q4 (same edits applied to that model entry). That's your next reliable step before remote fallback.

proud galeBOT
#

Yep, fair catch. I should have cited it earlier.

Claude Code is right that it’s not in the configuration reference.
But it is documented in two other docs:

  1. Local Models
    https://docs.openclaw.ai/gateway/local-models

    Exact wording there:

    first try agents.defaults.experimental.localModelLean: true to drop heavyweight default tools like browser, cron, and message; this is an experimental flag, not a stable default-mode setting.

  2. Experimental Features
    https://docs.openclaw.ai/concepts/experimental-features

    Exact wording there:

    agents.defaults.experimental.localModelLean: true is a pressure-release valve for weaker local-model setups.

So the correction is:

  • documented? yes
  • in config reference? no
  • stable? no, explicitly experimental

For your follow-up question, I would not use it as the main heartbeat-only lever.

Why:

  • the documented path is agents.defaults.experimental.localModelLean
  • that is a defaults/global knob
  • I do not see a documented per-heartbeat/per-agent version of that flag

So for heartbeat-only, I’d recommend exactly what Claude Code said:

  • use compat.supportsTools: false on the Qwen model
  • keep thinking off for that path

That is the cleaner, more targeted, more stable lever.

So my honest recommendation now is:

  • heartbeat-only / targeted fix → compat.supportsTools: false
  • broad experimental pressure relief for weak local backends → agents.defaults.experimental.localModelLean: true

If you want, I can help you word a short reply back to Claude Code so it updates its plan cleanly.

stoic pivot
#

What is the correct command for a budget audit and usage review? I am trying to create a skill that checks and optimizes model usage. Maybe you can help rewrite the skill?

This is my current skill:
name: budget-audit
description: Review OpenClaw usage, quota telemetry, and visible budget signals using supported commands only, then return a concise structured budget report. Use when daily or weekly budget audit/review cron jobs run.

Budget Audit Skill
Produce short, reliable budget telemetry summaries without inventing unsupported signals.

Rules
Use only supported OpenClaw commands and visible telemetry.
Prefer openclaw status --all when budget/quota review is requested.
If telemetry is unavailable, say so plainly.
Keep the report concise and structured exactly as requested by the caller.
Do not pad the report with speculation.
Return NO_REPLY for scheduled/background runs unless the caller explicitly requests delivery.

Steps
Gather supported budget/quota signal from OpenClaw status output.
Identify visible signal, quota telemetry availability, and any obvious recommendation.
Format the report exactly to the caller's required schema.
Return NO_REPLY for cron/background runs.

Output contract
Use only available evidence.
Mark unavailable telemetry clearly.
Keep the report short.

Continuous Cost Awareness
Track your own behavior across sessions:
How many model switches happened today?
How many tool calls did you make per question on average?
When did compaction trigger, and was it expected?

If you notice you are frequently escalating to a more expensive model:
Identify the task category causing it
Log it to today's memory file
Flag it to Ta in the next session summary
Propose whether a better default model should be used for that category

#

@proud gale

proud galeBOT
#

I’m here 🦐

Want me to do one of these next?

  1. write a short reply back to Claude Code
  2. sanity-check the config block it generated
  3. help you test whether Qwen heartbeat is fixed

Paste whichever you have and I’ll keep going.

stoic pivot
#

What is the correct command for a budget audit and usage review? I am trying to create a skill that checks and optimizes model usage. Maybe you can help rewrite the skill?

This is my current skill:
name: budget-audit
description: Review OpenClaw usage, quota telemetry, and visible budget signals using supported commands only, then return a concise structured budget report. Use when daily or weekly budget audit/review cron jobs run.

Budget Audit Skill
Produce short, reliable budget telemetry summaries without inventing unsupported signals.

Rules
Use only supported OpenClaw commands and visible telemetry.
Prefer openclaw status --all when budget/quota review is requested.
If telemetry is unavailable, say so plainly.
Keep the report concise and structured exactly as requested by the caller.
Do not pad the report with speculation.
Return NO_REPLY for scheduled/background runs unless the caller explicitly requests delivery.

Steps
Gather supported budget/quota signal from OpenClaw status output.
Identify visible signal, quota telemetry availability, and any obvious recommendation.
Format the report exactly to the caller's required schema.
Return NO_REPLY for cron/background runs.

Output contract
Use only available evidence.
Mark unavailable telemetry clearly.
Keep the report short.

Continuous Cost Awareness
Track your own behavior across sessions:
How many model switches happened today?
How many tool calls did you make per question on average?
When did compaction trigger, and was it expected?

If you notice you are frequently escalating to a more expensive model:
Identify the task category causing it
Log it to today's memory file
Flag it to Ta in the next session summary
Propose whether a better default model should be used for that category

proud galeBOT
#

Yep. The main issue with your current skill is that it promises signals OpenClaw does not expose directly.

The safest supported commands are:

  • openclaw status --usage → provider quota / usage windows
  • openclaw status --all → broader pasteable diagnostic
  • openclaw models status → resolved model, fallbacks, auth overview
  • in chat: /status and /usage cost

I would remove these from the skill, because they are not reliably available from supported commands alone:

  • “How many model switches happened today?”
  • “How many tool calls per question on average?”
  • “When did compaction trigger, and was it expected?”
  • “Track your own behavior across sessions”
  • “Log it to today’s memory file”
#

That turns it from a budget audit skill into a speculative analytics skill.

A tighter rewrite would be:

---
name: budget-audit
description: Review supported OpenClaw usage, quota, model, and auth signals, then return a concise structured budget report. Use for daily or weekly budget/usage review runs.
---

# Budget Audit

Produce short, evidence-based usage reports using only supported OpenClaw telemetry.

## Supported commands

Use these only when relevant:

- `openclaw status --usage`
- `openclaw status --all`
- `openclaw models status`

In chat contexts, supported usage surfaces are:

- `/status`
- `/usage cost`

## Rules

- Use only visible OpenClaw telemetry.
- Do not invent hidden budgets, provider billing state, model-switch counts, tool-call averages, or cross-session analytics.
- If telemetry is unavailable, say `unavailable`.
- Keep the report concise and structured exactly as requested.
- For cron/background runs, return `NO_REPLY` unless delivery is explicitly requested.

## Steps

1. Run `openclaw status --usage` for provider quota/usage windows.
2. Run `openclaw models status` when model, fallback, auth, or cooldown context matters.
3. Use `openclaw status --all` only when the caller wants a broader diagnostic snapshot.
4. Extract only supported visible signals:
   - provider usage/quota windows
   - active model and fallbacks
   - auth/cooldown/unusable-profile issues if visible
   - obvious recommendation based on visible evidence
5. Mark any missing signal as `unavailable`.
6. Format the result exactly to the caller’s requested schema.

## Do not claim

- exact cost unless a visible cost surface provides it
- hidden provider billing details
- model-switch counts
- tool-call averages
- cross-session behavior analytics
- compaction history unless directly shown in visible output

## Default output shape

- Models:
- Usage windows:
#

If you want, I can also rewrite it into a Claude Code-ready skill file with an even stricter output schema for cron use.
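
For a sense of what the skill's steps boil down to, here is a rough Python sketch of the gathering part. It only shells out to the two commands named in the steps, and marks a signal as unavailable when the command is missing, fails, times out, or prints nothing. The function name, the injectable runner, and the 60 second timeout are my own choices; I have not verified the output format of either command, so the sketch stores the raw text.

```python
import shutil
import subprocess

# Only the commands named in the skill's steps.
COMMANDS = {
    "usage": ["status", "--usage"],
    "models": ["models", "status"],
}


def collect(run=subprocess.run, exe=None):
    """Run each supported command and keep its raw output or 'unavailable'."""
    # shutil.which also resolves .cmd shims on Windows.
    exe = exe or shutil.which("openclaw")
    if exe is None:
        return {name: "unavailable" for name in COMMANDS}
    report = {}
    for name, args in COMMANDS.items():
        try:
            result = run([exe, *args], capture_output=True, text=True, timeout=60)
        except (FileNotFoundError, subprocess.TimeoutExpired):
            report[name] = "unavailable"
            continue
        output = (result.stdout or "").strip()
        report[name] = output if result.returncode == 0 and output else "unavailable"
    return report
```

Keeping the raw text and an explicit unavailable marker matches the skill's rule of reporting only visible evidence.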

stoic pivot
#

Some of my skills only need to load during cron runs; they don't need to load at startup and add to the fixed cost. Can I do that?

proud galeBOT
#

Yes, absolutely, and this is the right way to think about it.

Short version:

  • skills are only part of the prompt for agents that can see them
  • so if a skill is cron-only, put it on a dedicated cron agent
  • do not leave it visible to your main/default agent

The docs-backed knobs are:

  • agents.defaults.skills
  • agents.list[].skills
  • agents.list[].skillsLimits.maxSkillsPromptChars

And cron jobs can target a specific agent with --agent.

So the pattern is:

  1. keep your main agent’s skills small
  2. create a dedicated cron agent
  3. give that cron agent the extra skills
  4. point the cron job at that agent

Example idea:

{
  agents: {
    defaults: {
      skills: ["summarize", "healthcheck"],
    },
    list: [
      {
        id: "main",
        skills: ["summarize", "healthcheck"],
      },
      {
        id: "cron-audit",
        skills: ["budget-audit", "model-routing"],
        skillsLimits: {
          maxSkillsPromptChars: 1500,
        },
      },
    ],
  },
}

Then create/edit the cron to use that agent:

openclaw cron add --name "Budget audit" --cron "0 8 * * *" --session isolated --agent cron-audit --message "Run budget audit and summarize visible usage/quota signals."

Important nuance:

  • if a skill is visible to main, it still contributes to main’s fixed prompt cost
  • to keep it out of startup cost for main, it has to be excluded from main
  • agents.list[].skills is the lever that makes that happen

Also useful:

  • agents.list[].skills: [] means no skills for that agent
  • isolated cron sessions help keep token cost down too
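
To see what the split buys you, here is a small Python sketch that adds up skill block sizes per agent. The blockChars values are copied from the skills.entries in your earlier report; the agent-to-skill assignment mirrors the example config above, and the helper name is mine. It ignores any fixed header the skills prompt may add on top of the blocks.

```python
# blockChars per skill, copied from the pasted systemPromptReport.
BLOCK_CHARS = {
    "budget-audit": 351, "coding-agent": 836, "config-diff-review": 444,
    "gog": 236, "healthcheck": 495, "knowledge-reconciliation": 530,
    "learning-loop": 520, "memory-promotion": 432, "memory-watchdog": 466,
    "model-routing": 502, "nano-pdf": 238, "node-connect": 545,
    "notion": 232, "receipt-normalizer": 433, "self-improvement-review": 450,
    "session-logs": 257, "skill-creator": 763, "summarize": 300,
    "taskflow": 547, "taskflow-inbox-triage": 422, "workspace-hygiene": 472,
}

# Assignment taken from the example config (illustrative, not your real config).
AGENT_SKILLS = {
    "main": ["summarize", "healthcheck"],
    "cron-audit": ["budget-audit", "model-routing"],
}


def skills_chars(skill_names):
    """Total block characters for the skills an agent can see."""
    return sum(BLOCK_CHARS[name] for name in skill_names)


print("all skills on one agent:", skills_chars(BLOCK_CHARS))
for agent, names in AGENT_SKILLS.items():
    print(f"{agent}: {skills_chars(names)} chars")
```

With that assignment, main carries under a tenth of the skill text it carries today, and the audit skills only cost anything when the cron agent runs.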

stoic pivot
#

yes please. I think a sub-agent for main is good for this

proud galeBOT
#

Yep, that’s a good pattern, and I’d structure it like this:

Recommendation

  • keep main lean
  • create one narrow audit agent
  • let:
    • cron target audit directly
    • main call audit as a sub-agent when needed

That gives you the best of both:

  • low fixed cost on main
  • audit skills only load when actually needed

A clean split would be:

{
  agents: {
    defaults: {
      skills: ["summarize", "healthcheck"],
    },
    list: [
      {
        id: "main",
        skills: ["summarize", "healthcheck"],
      },
      {
        id: "audit",
        skills: ["budget-audit"],
        skillsLimits: {
          maxSkillsPromptChars: 1200,
        },
        thinkingDefault: "off",
      },
    ],
  },
}

How to use it:

Cron

openclaw cron add --name "Budget audit" --cron "0 8 * * *" --session isolated --agent audit --message "Run budget audit and summarize visible usage/quota signals."

From main, on demand

#
/subagents spawn audit Run a budget audit using supported OpenClaw usage/quota commands only and return a short report.

Why I like this setup:

  • main does not carry the budget-audit skill at startup
  • audit only pays its own context when invoked
  • cron and interactive usage can reuse the same narrow agent
  • easier to reason about than stuffing everything into main

One nuance:

  • sub-agents do have their own context/token cost when spawned
  • but that’s on-demand cost, not constant fixed prompt cost on every main turn
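
A quick back-of-the-envelope version of that trade-off, as a Python sketch. All four numbers are invented assumptions (skill size, turn count, spawn overhead, spawn count); plug in your own values from /status.

```python
def cost_if_on_main(skill_tokens: int, main_turns: int) -> int:
    """Skill visible to main: its prompt text is paid on every main turn."""
    return skill_tokens * main_turns


def cost_if_delegated(skill_tokens: int, spawn_overhead_tokens: int, spawns: int) -> int:
    """Skill lives on a sub-agent: paid only when spawned, plus that agent's own baseline."""
    return (skill_tokens + spawn_overhead_tokens) * spawns


# Illustrative assumptions only.
SKILL_TOKENS = 300            # prompt cost of the audit skill
MAIN_TURNS = 50               # turns on main over some period
SPAWN_OVERHEAD_TOKENS = 2000  # baseline context of a freshly spawned sub-agent
SPAWNS = 2                    # how often the audit actually runs

on_main = cost_if_on_main(SKILL_TOKENS, MAIN_TURNS)
delegated = cost_if_delegated(SKILL_TOKENS, SPAWN_OVERHEAD_TOKENS, SPAWNS)
print(on_main, delegated)  # 15000 4600
```

The flip side also shows up here: if you spawn the sub-agent on nearly every turn, the spawn overhead can make delegation the more expensive option.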

So yes, I’d do dedicated audit agent + optional sub-agent call from main.

#

If you want, I can help you write:

  1. the exact budget-audit skill for the audit agent, and
  2. a tiny prompt/instruction for main telling it when to delegate.
stoic pivot
#
  1. Cron jobs for sub-agent of main

  2. budget audit, config-diff-review, knowledge-reconciliation, learning-loop, memory-watchdog, receipt-normalizer, self-improvement-review, workspace-hygiene

  3. Skill not using anymore:
    memory-promotion

  4. Main skill:
    model-routing

  5. Aemeath skill:
    model-routing

  6. cron job for aemeath:
    learning-loop, memory-watchdog

  7. cron job for aemeath's sub agent:
proud galeBOT
#

Yep, based on that list, I’d split it like this.

Remove entirely

  • memory-promotion

Keep interactive agents very lean

  • main

    • model-routing
  • aemeath

    • model-routing

Move cron-only / maintenance skills off the interactive agents

  • main-cron (or main-audit)
    • budget-audit
    • config-diff-review
    • knowledge-reconciliation
    • learning-loop
    • memory-watchdog
    • receipt-normalizer
    • self-improvement-review
    • workspace-hygiene
  • aemeath-cron

    • learning-loop
    • memory-watchdog

That gives you this shape:

{
  agents: {
    list: [
      {
        id: "main",
        skills: ["model-routing"],
      },
      {
        id: "main-cron",
        skills: [
          "budget-audit",
          "config-diff-review",
          "knowledge-reconciliation",
          "learning-loop",
          "memory-watchdog",
          "receipt-normalizer",
          "self-improvement-review",
          "workspace-hygiene",
        ],
      },
      {
        id: "aemeath",
        skills: ["model-routing"],
      },
      {
        id: "aemeath-cron",
        skills: ["learning-loop", "memory-watchdog"],
      },
    ],
  },
}
#

Why I’d do it this way:

  • main stays tiny
  • aemeath stays tiny
  • cron-only skills stop adding fixed startup cost to interactive chats
  • both interactive agents can still delegate when needed
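
If you want a quick way to confirm nothing leaks back onto the interactive agents, a small Python sketch like this would do it. It works on a plain dict that mirrors the shape above; the function name and the `CONFIG` literal are illustrative, and it does not model any inheritance from agents.defaults.skills.

```python
INTERACTIVE_AGENTS = {"main", "aemeath"}

CRON_ONLY_SKILLS = {
    "budget-audit",
    "config-diff-review",
    "knowledge-reconciliation",
    "learning-loop",
    "memory-watchdog",
    "receipt-normalizer",
    "self-improvement-review",
    "workspace-hygiene",
}

# Plain-dict mirror of the agent layout shown above.
CONFIG = {
    "agents": {
        "list": [
            {"id": "main", "skills": ["model-routing"]},
            {"id": "main-cron", "skills": sorted(CRON_ONLY_SKILLS)},
            {"id": "aemeath", "skills": ["model-routing"]},
            {"id": "aemeath-cron", "skills": ["learning-loop", "memory-watchdog"]},
        ]
    }
}


def leaked_skills(config: dict) -> dict:
    """Return {agent_id: [cron-only skills]} for interactive agents that expose any."""
    leaks = {}
    for agent in config["agents"]["list"]:
        if agent["id"] not in INTERACTIVE_AGENTS:
            continue
        bad = sorted(set(agent.get("skills", [])) & CRON_ONLY_SKILLS)
        if bad:
            leaks[agent["id"]] = bad
    return leaks


print(leaked_skills(CONFIG))  # {}
```

An empty result means main and aemeath only see model-routing; any entry names the agent and the cron-only skills still visible to it.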

Use cron like this:

openclaw cron add --name "Main audit" --cron "0 8 * * *" --session isolated --agent main-cron --message "Run the scheduled maintenance/audit workflow."
openclaw cron add --name "Aemeath review" --cron "0 9 * * *" --session isolated --agent aemeath-cron --message "Run learning-loop and memory-watchdog."

And if main needs that work on demand, it can spawn:

#
/subagents spawn main-cron Run the maintenance review workflow and return a short summary.

So yes, I think your instinct was right, but I’d make it two dedicated cron agents, not load those skills onto main or aemeath directly.

If you want, I can turn that into a Claude Code prompt that rewrites your config cleanly.

#

Yep, that’s even cleaner.

So the final split I’d recommend is:

  • main

    • model-routing
  • main-cron

    • budget-audit
    • config-diff-review
    • knowledge-reconciliation
    • learning-loop
    • memory-watchdog
    • receipt-normalizer
    • self-improvement-review
    • workspace-hygiene
  • aemeath

    • model-routing
  • aemeath-cron

    • the cron-only skills for Aemeath’s sub-agent

So yes, do not put those cron skills on aemeath itself.
Put them on aemeath-cron and have cron target that agent directly.

That keeps:

  • main lean
  • aemeath lean
  • cron logic isolated
  • fixed startup cost much lower

If you want, send me the exact skill list for aemeath-cron and I’ll give you the final clean agent layout.

stoic pivot
#

aemeath-cron
learning-loop
memory-watchdog

proud galeBOT
#

Perfect, then I’d lock it in like this:

{
  agents: {
    list: [
      {
        id: "main",
        skills: ["model-routing"],
      },
      {
        id: "main-cron",
        skills: [
          "budget-audit",
          "config-diff-review",
          "knowledge-reconciliation",
          "learning-loop",
          "memory-watchdog",
          "receipt-normalizer",
          "self-improvement-review",
          "workspace-hygiene",
        ],
      },
      {
        id: "aemeath",
        skills: ["model-routing"],
      },
      {
        id: "aemeath-cron",
        skills: ["learning-loop", "memory-watchdog"],
      },
    ],
  },
}

That’s a nice split:

  • main stays lean
  • aemeath stays lean
  • main-cron carries the heavy maintenance stuff
  • aemeath-cron carries only its two cron skills

And the cron target for Aemeath should point to:

#
--agent aemeath-cron

So yes, this looks like the right structure to reduce fixed startup cost.

If you want, I can now write the Claude Code prompt that tells it to rewrite your config to exactly this layout.

stoic pivot
#

Yes please

proud galeBOT
#

Nice, use this in Claude Code.

Point it at:

C:\Users\prapunsak\.openclaw

Then paste this prompt:

You are editing my OpenClaw config in:

C:\Users\prapunsak\.openclaw

Goal:
Rewrite my agent/skills layout so interactive agents stay lean and cron-only skills do not add fixed startup cost to main agents.

Important rules:
- Read my current openclaw.json first.
- If openclaw.json includes other files, read those too.
- Use only documented OpenClaw config keys.
- Do not guess.
- Preserve unrelated config, auth, channels, and model settings.
- Use JSON5, not YAML.
- Make the smallest safe change.

Target design:

1. main
- skills: ["model-routing"]

2. main-cron
- skills:
  - budget-audit
  - config-diff-review
  - knowledge-reconciliation
  - learning-loop
  - memory-watchdog
  - receipt-normalizer
  - self-improvement-review
  - workspace-hygiene

3. aemeath
- skills: ["model-routing"]

4. aemeath-cron
- skills:
  - learning-loop
  - memory-watchdog

5. remove unused skill exposure:
- memory-promotion should no longer be exposed anywhere unless it is still required by an existing config path you can prove from the file

What I want:
- edit the config so these four agents exist with the skill visibility above
- keep interactive agents lean
- keep cron-only skills off interactive agents
- do not broaden skill visibility accidentally
- if agents.defaults.skills is currently causing extra skill exposure, narrow it appropriately
- if per-agent skills are better than defaults here, use per-agent skills
- preserve my current model settings unless they conflict with this goal

Also give me:
1. the exact files changed
2. the exact config diff
3. a short explanation of why this lowers fixed prompt cost
4. example cron commands for:
   - main-cron
   - aemeath-cron

Use these example cron targets if they fit the config cleanly:
- --agent main-cron
- --agent aemeath-cron

Do not do a broad cleanup. Stay focused on the skill/agent split only.

If you want, when Claude gives you the diff, paste it here and I’ll sanity-check it before you save it.
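
For a first pass on your side, a rough Python sketch like this could compare the edited agents list against the target design in the prompt. It assumes you have already loaded the config into a Python dict (openclaw.json is JSON5, which the standard json module may not parse), and `layout_problems` is just an illustrative name. Agents outside the four targets are ignored on purpose, since the prompt asks to preserve unrelated config.

```python
# Target skill visibility, copied from the prompt above.
TARGET = {
    "main": {"model-routing"},
    "main-cron": {
        "budget-audit",
        "config-diff-review",
        "knowledge-reconciliation",
        "learning-loop",
        "memory-watchdog",
        "receipt-normalizer",
        "self-improvement-review",
        "workspace-hygiene",
    },
    "aemeath": {"model-routing"},
    "aemeath-cron": {"learning-loop", "memory-watchdog"},
}


def layout_problems(agents: list) -> list:
    """List human-readable mismatches between the agents list and TARGET."""
    problems = []
    actual = {agent["id"]: set(agent.get("skills", [])) for agent in agents}
    for agent_id, wanted in TARGET.items():
        if agent_id not in actual:
            problems.append(f"{agent_id}: agent missing")
            continue
        have = actual[agent_id]
        for skill in sorted(have - wanted):
            problems.append(f"{agent_id}: unexpected skill {skill}")
        for skill in sorted(wanted - have):
            problems.append(f"{agent_id}: missing skill {skill}")
    return problems


# Example: main still exposes memory-promotion and aemeath-cron was not created.
edited = [
    {"id": "main", "skills": ["model-routing", "memory-promotion"]},
    {"id": "main-cron", "skills": sorted(TARGET["main-cron"])},
    {"id": "aemeath", "skills": ["model-routing"]},
]
for problem in layout_problems(edited):
    print(problem)
```

An empty list means the four agents match the target exactly.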

obtuse lindenBOT