Can you help me reduce my context size? I think my fixed cost starts at 20k. | Friends of the Crustacean 🦞🤝 | Page 1

stoic pivot Apr 22, 2026, 10:15 AM

#

This is probably why my Ollama has been failing: the model's context is only 16k.

proud galeBOT Apr 22, 2026, 10:17 AM

#

Yeah, very plausible.

For local backends, OpenClaw warns below 32k and blocks below 16k, and even at 16k the baseline prompt, tools, workspace files, and chat history can eat the window fast.
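To make those two thresholds concrete, here is a minimal Python sketch. The function name and the exact cutoffs (16,000 and 32,000 tokens) are my own assumptions read off the sentence above; it is not how OpenClaw implements the check.

```python
# Assumed cutoffs, taken from "warns below 32k and blocks below 16k".
WARN_BELOW = 32_000
BLOCK_BELOW = 16_000


def classify_context_window(tokens: int) -> str:
    """Return 'block', 'warn', or 'ok' for a local model's context window."""
    if tokens < BLOCK_BELOW:
        return "block"
    if tokens < WARN_BELOW:
        return "warn"
    return "ok"


for window in (8_000, 16_000, 32_000, 128_000):
    print(window, classify_context_window(window))
```

Under these assumptions a 16k model is allowed but still lands in the warning band.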

First, what does this show?

openclaw --version

Then in the thread try:

/status
/context list
/compact

If it still keeps happening, /new starts fresh.

#

A few practical fixes:

use a model with 64k+ context, ideally 128k+ for tool use
make sure Ollama is using the native URL like http://host:11434, not /v1
if you must stay local/small, try:
- agents.defaults.experimental.localModelLean: true
- compat.supportsTools: false on that model

Paste openclaw --version and /status and I’ll tell you the best next step.

stoic pivot Apr 22, 2026, 10:19 AM

#

🦞 OpenClaw 2026.4.21 (f788c88)
🧠 Model: openai-codex/gpt-5.4-mini · 🔑 oauth (openai-codex:[email protected] ([email protected]))
🔄 Fallbacks: openai-codex/gpt-5.4
🧮 Tokens: 20k in / 116 out
📚 Context: 20k/200k (10%) · 🧹 Compactions: 0
📊 Usage: 5h 85% left ⏱2h 9m · Week 96% left ⏱6d 16h
🧵 Session: agent:main:main:heartbeat • updated just now
⚙️ Runtime: direct · Think: low · Text: low
🪢 Queue: collect (depth 0)

proud galeBOT Apr 22, 2026, 10:19 AM

#

Yep, that basically confirms it.

Your version is fine, 2026.4.21 is recent. The big signal is this line:

📚 Context: 20k/200k

So your current session is already carrying about 20k tokens. If you point that same kind of session at an Ollama model with only 16k context, it’s very likely to fail immediately or act weird.

One important detail though: the status you pasted is for a 200k Codex session, not the Ollama model itself. So the problem is basically:

current OpenClaw session payload ≈ 20k
Ollama model window = 16k
that math does not work 😅

What I’d do next:

Start fresh before testing Ollama
- /new
Check what is eating the baseline
- /context list
- /context detail
If you want Ollama to work locally, don’t use 16k
- 32k is the bare minimum territory
- 64k+ is much more realistic for tool use
- 128k feels way better
Shrink the baseline if you must stay local
- agents.defaults.experimental.localModelLean: true
- lower agents.defaults.bootstrapMaxChars
- lower agents.defaults.bootstrapTotalMaxChars
- if needed, set that Ollama model to compat.supportsTools: false

#

Make sure you’re using native Ollama
- use http://host:11434
- not .../v1

If you paste your openclaw models status and Ollama model name, I can help you trim it down to a config that actually fits.

stoic pivot Apr 22, 2026, 10:22 AM

#

🧠 Context breakdown
Workspace: C:\Users\prapunsak.openclaw\workspace
Bootstrap max/file: 32,000 chars
Bootstrap max/total: 180,000 chars
Sandbox: mode=off sandboxed=false
System prompt (run): 66,423 chars (~16,606 tok) (Project Context 37,699 chars (~9,425 tok))

Injected workspace files:

AGENTS.md: OK | raw 17,305 chars (~4,327 tok) | injected 17,305 chars (~4,327 tok)
SOUL.md: OK | raw 4,818 chars (~1,205 tok) | injected 4,818 chars (~1,205 tok)
TOOLS.md: OK | raw 1,636 chars (~409 tok) | injected 1,636 chars (~409 tok)
IDENTITY.md: OK | raw 3,009 chars (~753 tok) | injected 3,009 chars (~753 tok)
USER.md: OK | raw 2,214 chars (~554 tok) | injected 2,214 chars (~554 tok)
HEARTBEAT.md: OK | raw 860 chars (~215 tok) | injected 860 chars (~215 tok)
BOOTSTRAP.md: MISSING | raw 0 | injected 0
MEMORY.md: OK | raw 8,179 chars (~2,045 tok) | injected 8,179 chars (~2,045 tok)

Skills list (system prompt text): 9,900 chars (~2,475 tok) (21 skills)
Skills: budget-audit, coding-agent, config-diff-review, gog, healthcheck, knowledge-reconciliation, learning-loop, memory-promotion, memory-watchdog, model-routing, nano-pdf, node-connect, notion, receipt-normalizer, self-improvement-review, session-logs, skill-creator, summarize, taskflow, taskflow-inbox-triage, … (+1 more)
Tool list (system prompt text): 0 chars (~0 tok)
Tool schemas (JSON): 13,944 chars (~3,486 tok) (counts toward context; not shown as text)
Tools: read, edit, write, apply_patch, exec, process, cron, update_plan, sessions_list, sessions_history, sessions_send, sessions_yield, sessions_spawn, subagents, session_status, web_fetch, image

Session tokens (cached): 19,953 total / ctx=200,000

Inline shortcut: a command token inside normal text (e.g. “hey /status”) that runs immediately (allowlisted senders only) and is stripped before the model sees the remaining message.

proud galeBOT Apr 22, 2026, 10:23 AM

#

Yep, that’s the smoking gun.

Your problem is not really chat history, it’s the baseline system prompt.

Right now, before the model even starts replying, you’re already at about:

workspace files: ~9.4k tokens
skills block: ~2.5k tokens
tool schemas: ~3.5k tokens
total system prompt: ~16.6k tokens

So a 16k Ollama model literally has no room left. It’s already over budget before the actual conversation does much.
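If you want to redo that arithmetic yourself, here is a rough Python sketch. The token figures in the dump are consistent with a simple ceil(chars / 4) estimate; that is an observation from the numbers, not a documented formula. The dump also lists the tool schemas separately from the system prompt text ("not shown as text"), so the sketch assumes they are counted on top of the 16.6k, which would land close to the 19,953 session total. The 16,000-token window is also an assumption for "16k".

```python
import math


def estimate_tokens(chars: int) -> int:
    """Approximate token count as ceil(chars / 4) (assumed heuristic)."""
    return math.ceil(chars / 4)


# Character counts copied from the /context output above.
system_prompt_chars = 66_423  # includes project context and skills list
tool_schema_chars = 13_944    # reported separately from the prompt text

prompt_tokens = estimate_tokens(system_prompt_chars)
schema_tokens = estimate_tokens(tool_schema_chars)
floor_tokens = prompt_tokens + schema_tokens

print("system prompt text:", prompt_tokens)   # 16606
print("tool schemas:", schema_tokens)         # 3486
print("prompt + schemas:", floor_tokens)      # 20092
print("over an assumed 16,000 window by:", floor_tokens - 16_000)  # 4092
```

Either way you read it, the fixed cost alone is at or above the whole window.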

Also, important: /new only helps session history. It does not fix this big bootstrap/system-prompt load.

Biggest wins, in order

#

Lower bootstrap limits hard
- yours: 32,000 per file / 180,000 total
- docs default: 12,000 per file / 60,000 total
- for local Ollama, I’d go much lower than your current settings
Trim the biggest injected files first
- AGENTS.md is the biggest hit
- MEMORY.md is also expensive
- then SOUL.md, IDENTITY.md, USER.md
Enable lean local mode
- agents.defaults.experimental.localModelLean: true
Shrink the skills prompt budget
- skills.limits.maxSkillsPromptChars
- or per-agent: agents.list[].skillsLimits.maxSkillsPromptChars

#

Use a dedicated lean Ollama agent
- smaller workspace/bootstrap
- smaller tool allowlist
- or disable tools for that model with compat.supportsTools: false

Minimal starter config to try

{
  agents: {
    defaults: {
      bootstrapMaxChars: 4000,
      bootstrapTotalMaxChars: 12000,
      experimental: {
        localModelLean: true,
      },
    },
  },
  skills: {
    limits: {
      maxSkillsPromptChars: 2000,
    },
  },
}

If that still isn’t enough, then make the Ollama model/tooling leaner too.

My honest recommendation

Keep your normal full agent for cloud models, and make a separate slim Ollama agent for local runs. Trying to cram this current prompt shape into 16k is going to be miserable.

#

If you want, paste your openclaw.json model/agent section and I’ll help you turn it into a lean Ollama profile.

stoic pivot Apr 22, 2026, 10:39 AM

#

Can you help me take a look at my Openclaw Control System Administrator/Technical Support Agent.md?

proud galeBOT Apr 22, 2026, 10:39 AM

#

Yep, absolutely. That file is probably one of the biggest wins.

From your /context detail, AGENTS.md is already ~17.3k chars / ~4.3k tokens, so trimming it could save a lot.

I can’t read your machine from here, so paste the file here and I’ll go through it with you. If it’s huge, send it in 2 to 3 chunks.

If you want, paste just these parts first:

the top section / role description
any long rules / policy lists
examples / templates
anything that feels “nice to have” instead of “needed every turn”

I’ll tell you:

what should stay in AGENTS.md
what should move out
what can be shortened hard
what is probably burning tokens for no benefit

If you paste it, I’ll mark it up.

stoic pivot Apr 22, 2026, 10:44 AM

#

I can't send markdown in this chat. Maybe you can help by writing a prompt for me to put into Claude, so that it trims the file based on your strict rules and cuts unnecessary token usage in agent.md at startup?

proud galeBOT Apr 22, 2026, 10:44 AM

#

Yep, use this. Paste this into Claude first, then paste your Agent.md under it.

You are optimizing an OpenClaw AGENTS.md / agent startup instruction file for minimum token usage.

Context:

This file is injected into the model context at startup on every run.
My local Ollama model has a small context window, so every token matters.
Other files are already injected separately, including things like persona, identity, user info, tools, memory, and heartbeat.
Because of that, AGENTS.md should NOT duplicate personality, biography, tool docs, examples, or anything that can live elsewhere.

Your job:
Ruthlessly trim the file while preserving only instructions that are truly necessary on most turns.

Strict rules:

Keep only instructions that are needed frequently and materially change behavior.
Delete anything that is:
- repetitive
- explanatory instead of directive
- backstory, lore, personality, or flavor
- long examples
- duplicated from other startup files
- duplicated from built-in tool behavior
- only useful in rare edge cases

Compress long paragraphs into short bullets.
Merge overlapping rules.
Prefer direct commands over explanation.
If something is useful but not needed every turn, move it out of AGENTS.md and mark it as “move to external doc”.
Assume separate files already handle:
- persona/tone
- identity
- user profile
- tool descriptions
- memory notes
Do not preserve “nice to have” wording.
Be aggressive. I want a lean operational file, not a manifesto.

Optimization target:

Reduce total size as much as possible.
Prefer a hard practical target of under 4,000 characters if possible.
If that is not possible, still minimize as much as you can.

Output format:

“What to remove” list
“What to move out of AGENTS.md” list
“What must stay” list
A fully rewritten lean version of the file
A short explanation of the biggest token savings

Important review test:
For every line, ask:

Is this needed on most turns?
Does it prevent a real failure mode?
Is it already covered elsewhere?
Is it worth paying for on every single startup?

If the answer is no, remove it.

Do not ask me clarifying questions first.
Do the rewrite immediately.

If you want, after Claude rewrites it, paste the result here and I’ll do a second-pass trim.

stoic pivot Apr 22, 2026, 10:55 AM

#

Can you write a prompt for me to trim MEMORY.md?

proud galeBOT Apr 22, 2026, 10:56 AM

#

Yep, use this one for MEMORY.md.

Paste this into Claude first, then paste your full MEMORY.md under it.

You are trimming MEMORY.md for an OpenClaw workspace.

Context:

MEMORY.md is injected into startup context on many runs, so it must stay extremely lean.
Its job is to store only durable, high-value facts that repeatedly improve behavior.
Other files already exist for persona, identity, user profile, tools, and general instructions.
Daily or temporary notes can live outside MEMORY.md.

Your job:
Ruthlessly reduce MEMORY.md to only the facts that are genuinely worth paying for on every startup.

Keep only:

durable user preferences that matter often
stable environment-specific facts
recurring pain points that still happen repeatedly
long-lived terminology corrections or naming rules
persistent safety boundaries or important do-not-do items not already covered elsewhere

Delete or move out:

one-off incidents
resolved bugs that are unlikely to recur
outdated version-specific notes
release-specific quirks
temporary troubleshooting notes
stale reminders
examples
explanations
duplicates of AGENTS.md, USER.md, IDENTITY.md, SOUL.md, or tool docs
anything that only mattered once
anything that can live in a dated note or external reference file

Strict rules:

Prefer very short bullets.
Remove all filler and explanation.
Merge overlapping items.
If a note is old, narrow, or tied to a single historical incident, assume it should be removed or moved out unless it still causes recurring failures.
MEMORY.md should contain facts, not essays.
If a fact is useful but not needed on every startup, move it to an archive or dated memory note.
Be aggressive.

Optimization target:

Cut total size as much as possible.
Prefer a practical target of under 2,000 characters if possible.

Output format:

“Remove entirely”
“Move to archive or dated memory”
“Keep”
A fully rewritten lean MEMORY.md
A short note on the biggest token savings

Review test for every line:

Is this durable?
Is this likely to matter again?
Does it change behavior in a meaningful way?
Is it worth injecting every startup?

If no, remove it.

Do not ask clarifying questions first.
Rewrite immediately.

If you want, after Claude trims it, paste the result here and I’ll do a second-pass cleanup.

stoic pivot Apr 22, 2026, 11:00 AM

#

Can you write a prompt for me to trim SOUL.md?

proud galeBOT Apr 22, 2026, 11:01 AM

#

Yep, here’s a good one for SOUL.md.

Paste this into Claude first, then paste your full SOUL.md under it.

You are trimming SOUL.md for an OpenClaw workspace.

Context:

SOUL.md is startup-injected context, so it should be extremely small.
Its purpose is only to preserve core voice, vibe, and a few non-negotiable personality/boundary cues.
It should NOT duplicate instructions already covered by AGENTS.md, USER.md, IDENTITY.md, MEMORY.md, or built-in system rules.
This file is not for long philosophy, lore, backstory, or repeated safety instructions.

Your job:
Ruthlessly compress SOUL.md into the smallest possible file that still preserves the intended personality and tone.

Keep only:

the smallest set of lines needed to preserve voice
a few core behavioral truths
a few genuine boundaries if they are unique to SOUL.md
the vibe in compact form

Delete or compress:

lore
origin story
philosophy paragraphs
repeated safety language
repeated “be helpful / be kind / be careful” wording
examples
dramatic phrasing
duplicate tone guidance
anything already covered elsewhere
anything that does not change behavior in a noticeable way

Strict rules:

Prefer short bullets over paragraphs.
Keep personality, but make it compact.
One idea per line.
Remove repetition and explanation.
Preserve tone with as few words as possible.
If a line is nice but not essential, delete it.
Aim for a file that feels alive, but not bloated.

Optimization target:

Prefer a practical target of under 1,000 characters if possible.

Output format:

“Remove entirely”
“Keep in compressed form”
A fully rewritten lean SOUL.md
A short note on what personality traits were preserved

Review test for every line:

Does this meaningfully shape tone?
Is it unique to SOUL.md?
Is it worth paying for on every startup?
Can it be said in fewer words?

If no, remove or compress it.

Do not ask clarifying questions first.
Rewrite immediately.

If you want, after Claude trims it, paste the result here and I’ll help you make it even tighter.

stoic pivot Apr 22, 2026, 11:07 AM

#

Can you write a prompt for me to trim IDENTITY.md?

proud galeBOT Apr 22, 2026, 11:07 AM

#

Yep, here’s one for IDENTITY.md.

Paste this into Claude first, then paste your full IDENTITY.md under it.

You are trimming IDENTITY.md for an OpenClaw workspace.

Context:

IDENTITY.md is startup-injected context, so it must be tiny.
Its purpose is only to preserve the agent’s essential identity markers.
It should NOT duplicate tone rules from SOUL.md, operating rules from AGENTS.md, or user/environment facts from USER.md and MEMORY.md.
This file is for identity, not lore.

Your job:
Reduce IDENTITY.md to the smallest possible file that still preserves the core identity of the agent.

Keep only:

name
short role or species label if truly important
one-line vibe if it materially affects responses
one or two stable identity cues that are actually used

Delete or compress:

backstory
jokes that do not matter
long vibe descriptions
repeated personality traits already covered elsewhere
catchphrases unless they are genuinely important
lore
examples
decorative text
anything that is nice but not necessary

Strict rules:

Prefer very short bullets.
Keep only identity, not behavior policy.
Remove repetition.
Compress every line hard.
If something is already covered in SOUL.md, remove it here.
If a detail would not be noticeable if removed, delete it.

Optimization target:

Prefer a practical target of under 500 characters if possible.

Output format:

“Remove entirely”
“Keep in compressed form”
A fully rewritten lean IDENTITY.md
A one-sentence explanation of what identity was preserved

Review test for every line:

Is this core identity?
Is it unique to IDENTITY.md?
Does it matter on most runs?
Can it be said in fewer words?

If no, remove it.

Do not ask clarifying questions first.
Rewrite immediately.

If you want, I can also give you a matching one for USER.md and TOOLS.md so you can trim the whole startup stack fast.

stoic pivot Apr 22, 2026, 11:11 AM

#

Can you write a prompt for me to trim USER.md?

proud galeBOT Apr 22, 2026, 11:12 AM

#

Yep, here’s one for USER.md.

Paste this into Claude first, then paste your full USER.md under it.

You are trimming USER.md for an OpenClaw workspace.

Context:

USER.md is startup-injected context, so it should be very small.
Its purpose is only to preserve stable, high-value facts about the user that help on many runs.
It should NOT contain long biography, story, philosophy, project history, or repeated preferences that rarely matter.
Anything temporary, detailed, or rarely relevant should be moved out.

Your job:
Reduce USER.md to the smallest possible file that still preserves the most useful user facts.

Keep only:

preferred name
how to address the user
timezone
a few stable preferences that matter often
a few durable facts that frequently improve behavior

Delete or move out:

long biography
life story
project history
vision statements
repeated relationship or hierarchy text unless truly necessary
details that rarely affect replies
examples
decorative sections
anything stale or nice-to-have

Strict rules:

Prefer short bullets.
Keep only durable user facts.
Remove all explanation and storytelling.
Merge overlapping facts.
If a detail would only matter occasionally, move it out.
If a fact is already covered elsewhere, remove it here.
Be aggressive.

Optimization target:

Prefer a practical target of under 700 characters if possible.

Output format:

“Remove entirely”
“Move to archive or external note”
“Keep”
A fully rewritten lean USER.md
A short explanation of what was preserved

Review test for every line:

Is this durable?
Does this improve many future runs?
Is it worth injecting every startup?
Can it be said in fewer words?

If no, remove it.

Do not ask clarifying questions first.
Rewrite immediately.

If you want, I can also give you a final one for TOOLS.md, which is usually an easy extra cut.

stoic pivot Apr 22, 2026, 11:19 AM

#

This is after trimming:
🧠 Context breakdown
Workspace: C:\Users\prapunsak.openclaw\workspace
Bootstrap max/file: 32,000 chars
Bootstrap max/total: 180,000 chars
Sandbox: mode=off sandboxed=false
System prompt (run): 38,067 chars (~9,517 tok) (Project Context 9,343 chars (~2,336 tok))

Injected workspace files:

AGENTS.md: OK | raw 2,844 chars (~711 tok) | injected 2,844 chars (~711 tok)
SOUL.md: OK | raw 2,265 chars (~567 tok) | injected 2,265 chars (~567 tok)
TOOLS.md: OK | raw 1,636 chars (~409 tok) | injected 1,636 chars (~409 tok)
IDENTITY.md: OK | raw 399 chars (~100 tok) | injected 399 chars (~100 tok)
USER.md: OK | raw 620 chars (~155 tok) | injected 620 chars (~155 tok)
HEARTBEAT.md: OK | raw 860 chars (~215 tok) | injected 860 chars (~215 tok)
BOOTSTRAP.md: MISSING | raw 0 | injected 0
MEMORY.md: OK | raw 1,041 chars (~261 tok) | injected 1,041 chars (~261 tok)

Skills list (system prompt text): 9,900 chars (~2,475 tok) (21 skills)
Skills: budget-audit, coding-agent, config-diff-review, gog, healthcheck, knowledge-reconciliation, learning-loop, memory-promotion, memory-watchdog, model-routing, nano-pdf, node-connect, notion, receipt-normalizer, self-improvement-review, session-logs, skill-creator, summarize, taskflow, taskflow-inbox-triage, … (+1 more)
Tool list (system prompt text): 0 chars (~0 tok)
Tool schemas (JSON): 13,944 chars (~3,486 tok) (counts toward context; not shown as text)
Tools: read, edit, write, apply_patch, exec, process, cron, update_plan, sessions_list, sessions_history, sessions_send, sessions_yield, sessions_spawn, subagents, session_status, web_fetch, image

Session tokens (cached): 13,504 total / ctx=200,000

Inline shortcut: a command token inside normal text (e.g. “hey /status”) that runs immediately (allowlisted senders only) and is stripped before the model sees the remaining message.

proud galeBOT Apr 22, 2026, 11:19 AM

#

Nice, that’s a big improvement.

You cut the startup prompt from about 16.6k tokens down to 9.5k, and project context from about 9.4k down to 2.3k. That’s real progress. 🎉

A few important takeaways:

Your workspace files are no longer the main problem
- they’re only about 2.3k tokens now
- that part looks much healthier
The new biggest costs are now:
- tool schemas: ~3.5k
- skills list: ~2.5k
16k is still pretty tight
- 9.5k startup
- plus session/history
- plus user message
- plus output headroom
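Here is a quick Python sketch of that budget, in case it helps to see the headroom per window size. The 1,000-token output reserve and the window sizes are placeholder assumptions; the startup figure is the trimmed system prompt from your dump.

```python
def headroom(window: int, startup: int, output_reserve: int = 1_000) -> int:
    """Tokens left for history and the user message after fixed costs."""
    return window - startup - output_reserve


STARTUP_TOKENS = 9_517  # trimmed system prompt from the dump above

for window in (16_000, 32_000, 64_000):
    free = headroom(window, STARTUP_TOKENS)
    print(f"{window:>6} window -> {free:>6} tokens free")
```

With these numbers a 16k window leaves about 5.5k for history plus the message, which is less than the 13.5k currently cached in the session.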

So I’d do this next:

start a fresh session with /new
- your cached session is still 13.5k
- that old history can still crush a 16k model
shrink skills next
- lower skills.limits.maxSkillsPromptChars
- or per-agent skillsLimits.maxSkillsPromptChars
shrink tools next
- for a local Ollama agent, use a much smaller tool allowlist
- or set compat.supportsTools: false if you just want plain chat

#

lower the bootstrap caps too
- not because they are hurting now
- but to stop the files from growing back to 32k / 180k territory later

My honest read:

you’ve probably fixed the worst startup bloat
but 16k is still a cramped target for a tool-using OpenClaw agent
if you can get to 32k or 64k, life gets much easier

If you want, I can write you one more Claude prompt for:

trimming TOOLS.md, or
creating a dedicated lean Ollama agent with smaller skills + tools.

stoic pivot Apr 22, 2026, 11:21 AM

#

trimming TOOLS.md

proud galeBOT Apr 22, 2026, 11:21 AM

#

Yep, use this for TOOLS.md.

Paste this into Claude first, then paste your full TOOLS.md under it.

You are trimming TOOLS.md for an OpenClaw workspace.

Context:

TOOLS.md is startup-injected context, so it should be very small.
Its purpose is only to store environment-specific tool notes that are actually useful on many runs.
It should NOT duplicate built-in tool docs, workflow essays, examples, long command recipes, or general instructions already covered elsewhere.
If a tool note is not specific to this machine/setup, it probably should not live in TOOLS.md.

Your job:
Reduce TOOLS.md to the smallest possible file that still preserves the most useful local tool/environment facts.

Keep only:

concrete environment-specific mappings
hostnames, aliases, device names, room names, nicknames
preferred defaults that matter often
short facts that help the agent use this specific setup correctly

Delete or move out:

long examples
workflow tutorials
command recipes
coding workflow notes unless truly essential
generic tool explanations
decorative sections
repeated guidance already in AGENTS.md or system behavior
anything that is not specific to this environment

Strict rules:

Prefer short bullets.
Keep only setup-specific facts.
Remove all explanation and examples unless absolutely necessary.
If something is a procedure, move it out.
If something is generic advice, remove it.
If a detail is rarely used, move it out.
Be aggressive.

Optimization target:

Prefer a practical target of under 600 characters if possible.

Output format:

“Remove entirely”
“Move to external doc”
“Keep”
A fully rewritten lean TOOLS.md
A short note on the biggest token savings

Review test for every line:

Is this specific to this machine or environment?
Does it help often?
Is it worth injecting every startup?
Can it be said in fewer words?

If no, remove it.

Do not ask clarifying questions first.
Rewrite immediately.

If you want, after that I can also give you a final prompt for making a dedicated low-context Ollama agent profile.
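To check the results against the character targets from these trimming prompts, a small Python sketch like this may help. The helper names are hypothetical, and it counts characters with plain len(), which is only assumed to match how OpenClaw counts them.

```python
from pathlib import Path
from typing import Dict

# Character targets quoted in the trimming prompts in this thread.
TARGETS: Dict[str, int] = {
    "AGENTS.md": 4_000,
    "MEMORY.md": 2_000,
    "SOUL.md": 1_000,
    "IDENTITY.md": 500,
    "USER.md": 700,
    "TOOLS.md": 600,
}


def check_sizes(sizes: Dict[str, int]) -> Dict[str, int]:
    """Return {filename: chars over target} for every file above its target."""
    return {
        name: chars - TARGETS[name]
        for name, chars in sizes.items()
        if name in TARGETS and chars > TARGETS[name]
    }


def measure(workspace: Path) -> Dict[str, int]:
    """Read character counts for the target files present in a workspace."""
    return {
        name: len((workspace / name).read_text(encoding="utf-8"))
        for name in TARGETS
        if (workspace / name).is_file()
    }


# Sizes from the post-trim /context output earlier in the thread.
after_trim = {
    "AGENTS.md": 2_844, "MEMORY.md": 1_041, "SOUL.md": 2_265,
    "IDENTITY.md": 399, "USER.md": 620, "TOOLS.md": 1_636,
}
print(check_sizes(after_trim))  # {'SOUL.md': 1265, 'TOOLS.md': 1036}
```

Against those targets, only SOUL.md and TOOLS.md are still over. To run it on real files, pass your workspace folder to measure() and feed the result to check_sizes().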

stoic pivot Apr 23, 2026, 6:47 AM

#

🦞 OpenClaw 2026.4.21 (f788c88)
🧠 Model: ollama/qwen2.5:3b-instruct-q4_K_M · 🔑 api-key (ollama:default)
🔄 Fallbacks: openai-codex/gpt-5.4
↪️ Fallback: openai-codex/gpt-5.4 · 🔑 oauth (openai-codex:[email protected] ([email protected])) (format)
🧮 Tokens: 15k in / 34 out
📚 Context: 15k/200k (7%) · 🧹 Compactions: 0
🧵 Session: agent:main:main:heartbeat • updated just now
📌 Tasks: 0 active · 1 total · agent-local
⚙️ Runtime: direct · Think: low · Text: low
🪢 Queue: collect (depth 0)

#

🧠 Context breakdown (detailed)
Workspace: C:\Users\prapunsak.openclaw\workspace
Bootstrap max/file: 32,000 chars
Bootstrap max/total: 180,000 chars
Sandbox: mode=off sandboxed=false
System prompt (run): 38,831 chars (~9,708 tok) (Project Context 10,547 chars (~2,637 tok))
Injected workspace files:

AGENTS.md: OK | raw 4,148 chars (~1,037 tok) | injected 4,148 chars (~1,037 tok)
SOUL.md: OK | raw 1,996 chars (~499 tok) | injected 1,996 chars (~499 tok)
TOOLS.md: OK | raw 1,636 chars (~409 tok) | injected 1,636 chars (~409 tok)
IDENTITY.md: OK | raw 399 chars (~100 tok) | injected 399 chars (~100 tok)
USER.md: OK | raw 789 chars (~198 tok) | injected 789 chars (~198 tok)
HEARTBEAT.md: OK | raw 860 chars (~215 tok) | injected 860 chars (~215 tok)
BOOTSTRAP.md: MISSING | raw 0 | injected 0
MEMORY.md: OK | raw 1,041 chars (~261 tok) | injected 1,041 chars (~261 tok)
Skills list (system prompt text): 9,465 chars (~2,367 tok) (20 skills)
Skills: budget-audit, coding-agent, config-diff-review, gog, healthcheck, knowledge-reconciliation, learning-loop, memory-watchdog, model-routing, nano-pdf, node-connect, notion, receipt-normalizer, self-improvement-review, session-logs, skill-creator, summarize, taskflow, taskflow-inbox-triage, workspace-hygiene

proud galeBOT Apr 23, 2026, 6:48 AM

#

Oh, this is interesting, and I think it’s the real clue.

Your status says OpenClaw currently thinks this model has:

📚 Context: 15k/200k

But your actual Ollama behavior sounds like the model is really acting more like 16k.

So the likely problem now is context-window mismatch:

OpenClaw believes the model window is 200k
so it happily sends a ~15k input
but if Ollama/the model only really supports ~16k, you’re already at the cliff before output/tool overhead

That would absolutely explain weird failures.

What I’d check next:

#

Make sure OpenClaw’s model metadata matches reality
- if you defined the Ollama model manually, set the model’s contextWindow correctly
- if it was auto-discovered, the Ollama host may be reporting a window larger than what the model actually tolerates in practice
Start a fresh session after changing it
- /new
Be careful with this specific model
- qwen2.5:3b-instruct-q4_K_M is a very small local model for a tool-using agent
- even with the context fixed, it may still be flaky on tool schemas / agent prompts
If it still struggles, disable tool pressure
- use a lean agent/tool set
- or set that model to compat.supportsTools: false

So my short read is:

#

your trimming work helped a lot
but now the issue looks less like “prompt too fat”
and more like “OpenClaw thinks the model has 200k, but the backend behaves like 16k”

If you want, next I can help you with the exact part: checking whether this model is auto-discovered vs manually defined, and which contextWindow value OpenClaw should really use.

#

This confirms it pretty cleanly.

Your workspace bloat is mostly solved now. The startup floor is about 9.7k tokens, and only about 2.6k of that is project context.

So the remaining pressure is mostly:

skills list: ~2.4k
tool schemas: likely still a few k from the earlier dump
session/history
output headroom

So for a backend that behaves like 16k, you’re still living dangerously close to the ceiling.

Big takeaway:

AGENTS.md is not the main problem anymore
the next big wins are skills + tools + real contextWindow alignment

#

What I’d do next:

Start fresh
- /new
Cut skills harder
- lower skills.limits.maxSkillsPromptChars
- for a tiny local model, I’d be pretty aggressive here
Run a much leaner local agent
- tiny tool allowlist
- or compat.supportsTools: false if you mainly want plain chat
Fix the model metadata if needed
- if OpenClaw thinks the model is 200k but the backend behaves like 16k, that mismatch still needs fixing
Be realistic about this model
- qwen2.5:3b-instruct-q4_K_M is very small for a tool-using OpenClaw agent
- even with trimming, it may still struggle

So, short version: you already won the file-trimming battle. The next bottleneck is no longer AGENTS.md, it’s the runtime surface around skills/tools/contextWindow.

If you want, I can help you with the next best step:

a lean Ollama agent config, or
a Claude prompt to trim the skill/tool surface.

stoic pivot Apr 23, 2026, 6:50 AM

#

I am only trying to use Ollama Qwen for heartbeat right now. The 200k is the fallback, gpt-5.4. Qwen supports up to 32k.

proud galeBOT Apr 23, 2026, 6:51 AM

#

Ah, got it, that changes the picture.

If:

Qwen is only for heartbeat
and the model really has 32k
and your trimmed startup is about 9.7k

then raw context size is probably no longer the main blocker. That should fit.

So at this point I’d suspect one of these instead:

actual Ollama runtime limit is lower than expected
- model says 32k, but runtime num_ctx or practical limit is lower
small-model fragility
- qwen2.5:3b-instruct-q4_K_M is still a small quantized model
- it may choke on the full agent prompt shape even if the token count technically fits
heartbeat does not need tools, but the model still sees tool/schema overhead
- for heartbeat, that’s mostly wasted pressure
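One way to compare the advertised window with what the runtime is set to is to ask Ollama directly. This is a hedged Python sketch: it assumes Ollama's /api/show endpoint returns a model_info map with an "<arch>.context_length" key and a parameters string that may contain a num_ctx line, which matches my understanding of the API but is worth verifying against your Ollama version. The base URL and helper names are assumptions too.

```python
import json
import urllib.request
from typing import Optional


def fetch_show(model: str, base_url: str = "http://localhost:11434") -> dict:
    """POST /api/show and return the parsed JSON (needs a running Ollama)."""
    req = urllib.request.Request(
        f"{base_url}/api/show",
        data=json.dumps({"model": model}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def trained_context_length(show: dict) -> Optional[int]:
    """Find the '<arch>.context_length' entry in model_info, if present."""
    for key, value in show.get("model_info", {}).items():
        if key.endswith(".context_length"):
            return int(value)
    return None


def configured_num_ctx(show: dict) -> Optional[int]:
    """Parse a 'num_ctx N' line from the Modelfile parameters, if present."""
    for line in show.get("parameters", "").splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "num_ctx":
            return int(parts[1])
    return None


# Example (requires a running Ollama on the assumed default port):
# show = fetch_show("qwen2.5:3b-instruct-q4_K_M")
# print(trained_context_length(show), configured_num_ctx(show))
```

If configured_num_ctx() comes back as None, the Modelfile sets no explicit num_ctx, and the runtime default may be well below the trained length.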

For heartbeat specifically, I’d make it as lean as possible:

disable tool pressure for that model if you can
- compat.supportsTools: false
cut skills prompt budget hard
- heartbeat does not need a big skills block
keep thinking low or off
- heartbeat should be tiny and dumb, not agentic

So my updated read is:

#

your trimming work probably fixed the original startup bloat
if heartbeat still fails, the next issue is likely Qwen 3B reliability / actual runtime context, not just token size

If you want the fastest next step, paste the exact heartbeat failure/log message. That will tell us whether this is:

real context overflow
Ollama runtime config
or just the local model being too weak for the prompt shape.

stoic pivot Apr 23, 2026, 6:52 AM

#

13:43:29 [agent/embedded] embedded run agent end: runId=b909487b-77f8-4f54-9f0e-d3c9d0dd6380 isError=true model=qwen2.5:3b-instruct-q4_K_M provider=ollama error=LLM request failed: provider rejected the request schema or tool payload. rawError=400 {"error":"\"qwen2.5:3b-instruct-q4_K_M\" does not support thinking"}
13:43:29 [ws] ⇄ res ✓ sessions.list 151ms conn=fb10ff70…4237 id=63274717…d7e0
13:43:29 [agent/embedded] auth profile failure state updated: runId=b909487b-77f8-4f54-9f0e-d3c9d0dd6380 profile=sha256:9c018ec112cf provider=ollama reason=format window=cooldown reused=false
13:43:29 [agent/embedded] embedded run failover decision: runId=b909487b-77f8-4f54-9f0e-d3c9d0dd6380 stage=assistant decision=fallback_model reason=format from=ollama/qwen2.5:3b-instruct-q4_K_M profile=sha256:9c018ec112cf

proud galeBOT Apr 23, 2026, 6:53 AM

#

Yep, we found it.

This is not the context issue.

The real error is here:

rawError=400 {"error":"\"qwen2.5:3b-instruct-q4_K_M\" does not support thinking"}

So OpenClaw is sending a thinking setting to Ollama, and this Qwen model rejects it.

Your status already hinted at it:

⚙️ Runtime: ... Think: low

So the fix is:

turn thinking off for that heartbeat session/agent

quick test: /think off

make it persistent for the heartbeat/local agent
- set that agent’s thinkingDefault: "off"

If you want it globally for local runs:

{
  agents: {
    defaults: {
      thinkingDefault: "off",
    },
  },
}

#

Better if only the heartbeat/local agent should use it:

{
  agents: {
    list: [
      {
        id: "your-heartbeat-agent",
        thinkingDefault: "off",
      },
    ],
  },
}

Why this matters:

heartbeat does not need thinking

Qwen is rejecting the request before the real reply even starts
that generic “schema or tool payload” wrapper is misleading, but the raw error is clear

So I’d do this in order:

/think off
rerun heartbeat
if it still fails, then we look at tools next

I’m glad you grabbed the log, that made the real problem obvious.

stoic pivot Apr 23, 2026, 6:56 AM

#

With thinking off, the heartbeat session still times out:
13:55:34 [agent/embedded] Profile ollama:default timed out. Trying next account...
13:55:34 [agent/embedded] embedded run failover decision: runId=a11a0d12-63fd-427a-b4ae-005d3ce4bdfb stage=assistant decision=fallback_model reason=timeout from=openclaw/gateway-injected profile=sha256:9c018ec112cf
13:55:34 [diagnostic] lane task error: lane=main durationMs=128237 error="FailoverError: LLM request timed out."
13:55:34 [diagnostic] lane task error: lane=session:agent:main:main:heartbeat durationMs=128242 error="FailoverError: LLM request timed out."

proud galeBOT Apr 23, 2026, 6:57 AM

#

That’s actually useful, because it means we moved past the thinking error.

Now the failure mode is:

request accepted
model runs
then times out after ~128s

So this is no longer “bad request”, it’s now either:

the local Ollama model is just too slow / wedged, or
the full heartbeat prompt is still too heavy for this 3B model
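
A rough way to tell these failure modes apart from the gateway log is to read the `reason=` field on the failover decision line. This Python sketch only assumes the `decision=... reason=... from=...` layout seen in the two log pastes in this thread; the hint text is my reading of those two cases, not an official list of reasons, and `summarize_failover` is a hypothetical helper name.

```python
import re

FAILOVER_RE = re.compile(
    r"decision=(?P<decision>\S+)\s+reason=(?P<reason>\S+)\s+from=(?P<source>\S+)"
)

# Interpretation of the two reasons seen in this thread (assumption, not official).
HINTS = {
    "format": "provider rejected the request (here: unsupported thinking setting)",
    "timeout": "request was accepted but no reply arrived in time",
}


def summarize_failover(line: str):
    """Extract decision, reason and source from a failover decision log line."""
    match = FAILOVER_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["hint"] = HINTS.get(info["reason"], "not seen in this thread")
    return info


LINES = [
    "13:43:29 [agent/embedded] embedded run failover decision: "
    "runId=b909487b-77f8-4f54-9f0e-d3c9d0dd6380 stage=assistant "
    "decision=fallback_model reason=format "
    "from=ollama/qwen2.5:3b-instruct-q4_K_M profile=sha256:9c018ec112cf",
    "13:55:34 [agent/embedded] embedded run failover decision: "
    "runId=a11a0d12-63fd-427a-b4ae-005d3ce4bdfb stage=assistant "
    "decision=fallback_model reason=timeout "
    "from=openclaw/gateway-injected profile=sha256:9c018ec112cf",
]

for line in LINES:
    print(summarize_failover(line))
```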

Best next test, isolate the model from the full agent prompt:

openclaw infer model run --model ollama/qwen2.5:3b-instruct-q4_K_M --prompt "Reply with exactly HEARTBEAT_OK" --thinking off --json

How to read that result:

if this also times out
-> it’s an Ollama/model/runtime issue, not heartbeat specifically
if this succeeds quickly
-> the model can answer, but the full OpenClaw heartbeat prompt is still too much
-> then I’d do:
- agents.defaults.experimental.localModelLean: true
- disable tools for that model with compat.supportsTools: false
- cut skills prompt budget harder

For heartbeat specifically, I’d strongly lean toward:

thinking off
tools off
minimal skills
basically a tiny text-only local agent

Because qwen2.5:3b-instruct-q4_K_M is really small for the normal OpenClaw runtime shape.

So the clean next move is: run that openclaw infer model run ... smoke test and paste the result.
That will tell us immediately whether this is:

Ollama/runtime slowness, or
agent prompt pressure.

stoic pivot Apr 23, 2026, 6:59 AM

#

PS C:\WINDOWS\System32> openclaw infer model run --model ollama/qwen2.5:3b-instruct-q4_K_M --prompt "Reply with exactly HEARTBEAT_OK" --thinking off --json
error: unknown option '--thinking'

proud galeBOT Apr 23, 2026, 7:00 AM

#

Yep, my bad, sorry, infer model run on your build doesn’t take --thinking.

Use openclaw agent for this test instead:

openclaw agent --local --agent <your-heartbeat-agent-id> --message "Reply with exactly HEARTBEAT_OK" --thinking off --json

If you don’t know the agent id first, run:

openclaw agents list

Then use the same agent your heartbeat is using.

Why this test is better:

it uses the normal agent runtime
it lets you force --thinking off
it tells us whether the model can answer a tiny prompt without the heartbeat loop getting in the way

How to read it:

if this also times out
-> Ollama/model/runtime is the bottleneck
if this succeeds quickly
-> the heartbeat setup/prompt surface is still too heavy

So the corrected next step is:

openclaw agents list
run the openclaw agent --local ... --thinking off ... test
paste the result

#

Sorry again, that earlier command was on me.

stoic pivot Apr 23, 2026, 7:28 AM

#

PS C:\WINDOWS\System32> openclaw agent --local --agent main --message "Reply with exactly HEARTBEAT_OK" --thinking off --json
{
"payloads": [
{
"text": "HEARTBEAT_OK",
"mediaUrl": null
}
],
"meta": {
"durationMs": 46349,
"agentMeta": {
"sessionId": "fe6bf557-9fd2-425e-9ce6-4660ecc6438f",
"provider": "openai-codex",
"model": "gpt-5.4-mini",
"usage": {
"input": 15499,
"output": 9,
"total": 15508
},
"lastCallUsage": {
"input": 15499,
"output": 9,
"cacheRead": 0,
"cacheWrite": 0,
"total": 15508
},
"promptTokens": 15499
},
"aborted": false,
"systemPromptReport": {
"source": "run",
"generatedAt": 1776928540903,
"sessionId": "fe6bf557-9fd2-425e-9ce6-4660ecc6438f",
"sessionKey": "agent:main:main",
"provider": "openai-codex",
"model": "gpt-5.4-mini",
"workspaceDir": "C:\Users\prapunsak\.openclaw\workspace",
"bootstrapMaxChars": 32000,
"bootstrapTotalMaxChars": 180000,
"bootstrapTruncation": {
"warningMode": "once",
"warningShown": false,
"truncatedFiles": 0,
"nearLimitFiles": 0,
"totalNearLimit": false
},
"sandbox": {
"mode": "off",
"sandboxed": false
},
"systemPrompt": {
"chars": 37673,
"projectContextChars": 10547,
"nonProjectContextChars": 27126
},
"injectedWorkspaceFiles": [
{
"name": "AGENTS.md",
"path": "C:\Users\prapunsak\.openclaw\workspace\AGENTS.md",
"missing": false,
"rawChars": 4148,
"injectedChars": 4148,
"truncated": false
},

{
"name": "SOUL.md",
"path": "C:\Users\prapunsak\.openclaw\workspace\SOUL.md",
"missing": false,
"rawChars": 1996,
"injectedChars": 1996,
"truncated": false
},
{
"name": "TOOLS.md",
"path": "C:\Users\prapunsak\.openclaw\workspace\TOOLS.md",
"missing": false,
"rawChars": 1636,
"injectedChars": 1636,
"truncated": false
},
{
"name": "IDENTITY.md",
"path": "C:\Users\prapunsak\.openclaw\workspace\IDENTITY.md",
"missing": false,
"rawChars": 399,
"injectedChars": 399,
"truncated": false
},
{
"name": "USER.md",
"path": "C:\Users\prapunsak\.openclaw\workspace\USER.md",
"missing": false,
"rawChars": 789,
"injectedChars": 789,
"truncated": false
},
{
"name": "HEARTBEAT.md",
"path": "C:\Users\prapunsak\.openclaw\workspace\HEARTBEAT.md",
"missing": false,
"rawChars": 860,
"injectedChars": 860,
"truncated": false
},
{
"name": "BOOTSTRAP.md",
"path": "C:\Users\prapunsak\.openclaw\workspace\BOOTSTRAP.md",
"missing": true,
"rawChars": 0,
"injectedChars": 0,
"truncated": false
},
{
"name": "MEMORY.md",
"path": "C:\Users\prapunsak\.openclaw\workspace\MEMORY.md",
"missing": false,
"rawChars": 1041,
"injectedChars": 1041,
"truncated": false
}
],
"skills": {
"promptChars": 9900,
"entries": [
{
"name": "budget-audit",
"blockChars": 351
},
{
"name": "coding-agent",
"blockChars": 836
},
{

"name": "config-diff-review",
"blockChars": 444
},
{
"name": "gog",
"blockChars": 236
},
{
"name": "healthcheck",
"blockChars": 495
},
{
"name": "knowledge-reconciliation",
"blockChars": 530
},
{
"name": "learning-loop",
"blockChars": 520
},
{
"name": "memory-promotion",
"blockChars": 432
},
{
"name": "memory-watchdog",
"blockChars": 466
},
{
"name": "model-routing",
"blockChars": 502
},
{
"name": "nano-pdf",
"blockChars": 238
},
{
"name": "node-connect",
"blockChars": 545
},
{
"name": "notion",
"blockChars": 232
},
{
"name": "receipt-normalizer",
"blockChars": 433
},
{
"name": "self-improvement-review",
"blockChars": 450
},
{
"name": "session-logs",
"blockChars": 257
},
{
"name": "skill-creator",
"blockChars": 763
},
{
"name": "summarize",
"blockChars": 300
},
{
"name": "taskflow",
"blockChars": 547
},
{
"name": "taskflow-inbox-triage",
"blockChars": 422
},
{
"name": "workspace-hygiene",
"blockChars": 472
}
]
},
"tools": {
"listChars": 0,
"schemaChars": 14369,
"entries": [
{
"name": "read",
"summaryChars": 298,
"schemaChars": 304,
"propertiesCount": 3
},

{
"name": "edit",
"summaryChars": 326,
"schemaChars": 834,
"propertiesCount": 2
},
{
"name": "write",
"summaryChars": 127,
"schemaChars": 225,
"propertiesCount": 2
},
{
"name": "apply_patch",
"summaryChars": 132,
"schemaChars": 153,
"propertiesCount": 1
},
{
"name": "exec",
"summaryChars": 968,
"schemaChars": 1157,
"propertiesCount": 12
},
{
"name": "process",
"summaryChars": 416,
"schemaChars": 961,
"propertiesCount": 12
},
{
"name": "cron",
"summaryChars": 3726,
"schemaChars": 6700,
"propertiesCount": 13
},
{
"name": "update_plan",
"summaryChars": 251,
"schemaChars": 574,
"propertiesCount": 2
},
{
"name": "sessions_list",
"summaryChars": 177,
"schemaChars": 212,
"propertiesCount": 4
},
{
"name": "sessions_history",
"summaryChars": 180,
"schemaChars": 161,
"propertiesCount": 3
},
{
"name": "sessions_send",
"summaryChars": 208,
"schemaChars": 274,
"propertiesCount": 5
},
{
"name": "sessions_yield",
"summaryChars": 97,
"schemaChars": 60,
"propertiesCount": 1
},
{
"name": "sessions_spawn",
"summaryChars": 302,
"schemaChars": 1333,
"propertiesCount": 18
},

{
"name": "subagents",
"summaryChars": 105,
"schemaChars": 191,
"propertiesCount": 4
},
{
"name": "session_status",
"summaryChars": 336,
"schemaChars": 89,
"propertiesCount": 2
},
{
"name": "web_search",
"summaryChars": 93,
"schemaChars": 425,
"propertiesCount": 4
},
{
"name": "web_fetch",
"summaryChars": 129,
"schemaChars": 374,
"propertiesCount": 3
},
{
"name": "image",
"summaryChars": 260,
"schemaChars": 342,
"propertiesCount": 6
}
]
}
},
"finalPromptText": "Reply with exactly HEARTBEAT_OK",
"finalAssistantVisibleText": "HEARTBEAT_OK",
"finalAssistantRawText": "HEARTBEAT_OK",
"replayInvalid": false,
"livenessState": "working",
"stopReason": "stop",
"executionTrace": {
"winnerProvider": "openai-codex",
"winnerModel": "gpt-5.4-mini",
"attempts": [
{
"provider": "openai-codex",
"model": "gpt-5.4-mini",
"result": "success",
"stage": "assistant"
}
],
"fallbackUsed": false,
"runner": "embedded"
},
"requestShaping": {
"authMode": "auth-profile",
"thinking": "off"
},
"completion": {
"stopReason": "stop",
"finishReason": "stop"
}
}
}

proud galeBOT Apr 23, 2026, 7:29 AM

#

This is helpful, and it also shows why the test didn’t prove the Ollama path yet.

That run used:

provider: openai-codex
model: gpt-5.4-mini

So --agent main hit your normal main session, not the heartbeat session that was using Ollama.

You can even see it in the session key:

test command used agent:main:main
earlier heartbeat was agent:main:main:heartbeat

So the next correct test is to target the actual heartbeat session id, not just --agent main.
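
To avoid misreading a smoke test like this again, you can check which provider actually answered. A small Python sketch, assuming the `--json` output keeps the nesting seen in the paste above (`meta.executionTrace.winnerProvider`, `meta.systemPromptReport.sessionKey`); `check_smoke_test` is a hypothetical helper name, and the sample dict is trimmed down from that paste.

```python
def check_smoke_test(result: dict, expected_provider: str) -> str:
    """Report whether the run was answered by the provider we meant to test."""
    meta = result.get("meta", {})
    trace = meta.get("executionTrace", {})
    report = meta.get("systemPromptReport", {})
    provider = trace.get("winnerProvider")
    model = trace.get("winnerModel")
    session_key = report.get("sessionKey")
    if provider == expected_provider:
        return f"ok: {provider}/{model} answered on {session_key}"
    return (
        f"mismatch: expected {expected_provider}, "
        f"but {provider}/{model} answered on {session_key}"
    )


# Trimmed copy of the relevant fields from the pasted output.
SAMPLE = {
    "meta": {
        "systemPromptReport": {"sessionKey": "agent:main:main"},
        "executionTrace": {
            "winnerProvider": "openai-codex",
            "winnerModel": "gpt-5.4-mini",
        },
    }
}

print(check_smoke_test(SAMPLE, expected_provider="ollama"))
```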

Try:

openclaw sessions --json

Find the session whose key is:

agent:main:main:heartbeat

Then run:

openclaw agent --local --session-id <that-session-id> --message "Reply with exactly HEARTBEAT_OK" --thinking off --json

Why this matters:

session-level model overrides live on the session
your heartbeat session is the one pinned to Ollama/Qwen
--agent main created/used a different session that stayed on Codex

So right now we’ve confirmed:

main agent works
Codex fallback works
but we have not yet directly tested the heartbeat Ollama session

If you paste the openclaw sessions --json bit for the heartbeat row, I can help you point the next command at the exact right target.

#

Close, but that’s still the systemPromptReport chunk, not the sessions list row I need.

What I need is the entry for the heartbeat session from:

openclaw sessions --json

Easiest way on PowerShell:

openclaw sessions --json | Select-String "agent:main:main:heartbeat" -Context 0,20

Paste the chunk that includes things like:

sessionId

sessionKey
ideally model/provider if shown

Right now we’re trying to grab the actual heartbeat session id, because that’s the one using Ollama. The --agent main test hit Codex instead.

So, one more try:

openclaw sessions --json | Select-String "heartbeat" -Context 0,20

Paste that result and I’ll help you target the exact session.
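
If Select-String output is awkward to read, the same lookup can be sketched in Python. I have not confirmed the exact shape of `openclaw sessions --json`, so this walks whatever JSON it gets and returns any object with a string value containing the session key. The `sessions` wrapper and the sample rows below are invented for the demo; only the key `agent:main:main:heartbeat` comes from your logs.

```python
from typing import Any, Iterator


def find_rows(node: Any, needle: str) -> Iterator[dict]:
    """Yield every JSON object that has a string value containing `needle`."""
    if isinstance(node, dict):
        if any(isinstance(v, str) and needle in v for v in node.values()):
            yield node
        for value in node.values():
            yield from find_rows(value, needle)
    elif isinstance(node, list):
        for item in node:
            yield from find_rows(item, needle)


# Hypothetical sample data; the real output shape may differ.
SAMPLE = {
    "sessions": [
        {"sessionKey": "agent:main:main", "sessionId": "aaaa-1111"},
        {"sessionKey": "agent:main:main:heartbeat", "sessionId": "bbbb-2222"},
    ]
}

for row in find_rows(SAMPLE, "agent:main:main:heartbeat"):
    print(row)
```

With real output you would load the command's JSON with `json.load` and pass it in place of `SAMPLE`.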

#

No worries, that’s still not the session row, but honestly this chunk is enough to see the next bottleneck.

These two numbers are the big deal now:

skills: 9900 chars
tool schemas: 14369 chars

That is a lot of runtime surface for a 3B Qwen heartbeat model.
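
For a rough sense of scale, here is a back-of-envelope Python sketch. It assumes about 4 characters per token, which is a common rule of thumb and not a measured value for Qwen's tokenizer, and a 16k window as mentioned earlier in the thread.

```python
# Values taken from the pasted systemPromptReport.
SKILLS_PROMPT_CHARS = 9900
TOOL_SCHEMA_CHARS = 14369

# Assumptions: rough chars-per-token ratio and the 16k window discussed earlier.
CHARS_PER_TOKEN = 4
CONTEXT_WINDOW_TOKENS = 16_000


def approx_tokens(chars: int) -> int:
    """Very rough token estimate from a character count."""
    return chars // CHARS_PER_TOKEN


skills_tokens = approx_tokens(SKILLS_PROMPT_CHARS)
tools_tokens = approx_tokens(TOOL_SCHEMA_CHARS)
total_tokens = skills_tokens + tools_tokens
share = total_tokens / CONTEXT_WINDOW_TOKENS

print(f"skills ~{skills_tokens} tokens, tools ~{tools_tokens} tokens")
print(f"together ~{total_tokens} tokens, about {share:.0%} of the window")
```

Under those assumptions, skills plus tool schemas alone come to roughly 6k tokens before the rest of the system prompt and any history are counted.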

So I’d stop trimming the markdown files now. You already got the file bloat way down. The next problem is the runtime prompt shape, not AGENTS.md.

What to do next, in order:

keep thinking off
- we already know that part mattered
turn on lean local mode

agents.defaults.experimental.localModelLean: true

disable tool calling for this Qwen model
- compat.supportsTools: false
shrink the skills prompt budget hard
- skills.limits.maxSkillsPromptChars
- or better, a much smaller per-agent skillsLimits.maxSkillsPromptChars for the heartbeat/local agent

At this point, that’s the real path forward.

My honest read now:

your markdown trimming helped a lot
but Qwen 3B + full OpenClaw skill/tool surface is still too heavy
for heartbeat, I’d treat it as a tiny text-only model, not a full agentic model

So the practical target is:

thinking off
tools off
smaller skills block
lean local mode on

If you want, I can write you a single Claude prompt that says:

“rewrite my OpenClaw config for a dedicated ultra-lean local heartbeat agent using Qwen”

That might be the fastest next move.

#

Yep, that’s the smoking gun.

For your local Qwen heartbeat model, these tool schemas are wildly expensive:

cron → 6700 schema chars
sessions_spawn → 1333
exec → 1157
process → 961

That is a ton of overhead for a 3B local model, and heartbeat does not need that stuff.

So the practical conclusion is:

your markdown file trimming helped
but now the real problem is tool schema bloat
and cron is the biggest offender by far
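
The ranking is easy to reproduce from the `tools.entries` list in the report. A short Python sketch using the `schemaChars` values pasted earlier in this thread (the variable names are mine):

```python
# schemaChars per tool, copied from the pasted systemPromptReport.
TOOL_SCHEMA_CHARS = {
    "read": 304, "edit": 834, "write": 225, "apply_patch": 153,
    "exec": 1157, "process": 961, "cron": 6700, "update_plan": 574,
    "sessions_list": 212, "sessions_history": 161, "sessions_send": 274,
    "sessions_yield": 60, "sessions_spawn": 1333, "subagents": 191,
    "session_status": 89, "web_search": 425, "web_fetch": 374, "image": 342,
}

total = sum(TOOL_SCHEMA_CHARS.values())
ranked = sorted(TOOL_SCHEMA_CHARS.items(), key=lambda kv: kv[1], reverse=True)
top_four = ranked[:4]
top_four_chars = sum(chars for _, chars in top_four)

print(f"total schema chars: {total}")
for name, chars in top_four:
    print(f"{name:15s} {chars:5d}  {chars / total:.1%}")
print(f"top four together: {top_four_chars / total:.1%}")
```

The per-tool values add up to the report's 14369 total, cron alone is a bit under half of it, and the four heaviest tools together are about 70%.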

#

If this model is only for heartbeat, I would do this:

disable tools entirely for Qwen
- compat.supportsTools: false
if you can’t do that, give the local heartbeat agent a tiny tool allowlist
- definitely remove:
  - cron
  - exec
  - process
  - sessions_spawn
  - probably the rest of the sessions_* tools too
keep:
- thinkingDefault: "off"
- agents.defaults.experimental.localModelLean: true
- smaller skills.limits.maxSkillsPromptChars

#

So, blunt version:

cron alone is probably killing your Qwen heartbeat path.

Heartbeat should be a tiny text-only lane, not a full agent-with-everything lane.

If you want, I can now write you the Claude prompt for “rewrite my config into an ultra-lean local heartbeat agent”.

#

Yep, this confirms two things pretty clearly:

your manual smoke test still did not hit Qwen
- winnerProvider: "openai-codex"
- winnerModel: "gpt-5.4-mini"
thinking off is working
- requestShaping.thinking: "off"

So the state of play is:

the thinking bug is fixed
the manual test path is still Codex
the remaining failure is specifically the Ollama heartbeat path
and your earlier tool/skill numbers make it very likely that Qwen 3B is timing out on the full runtime surface

At this point I would stop chasing more pasted JSON and just make the config simpler:

thinkingDefault: "off" for the local heartbeat lane
agents.defaults.experimental.localModelLean: true
compat.supportsTools: false for that Qwen model
lower skills.limits.maxSkillsPromptChars hard for the heartbeat/local agent

That is the practical fix path.

Bluntly: Qwen 3B is probably too weak for the current heartbeat lane unless you make it almost text-only.

If you want, I can now write you one clean prompt for Claude:

“rewrite my OpenClaw config into a dedicated ultra-lean Qwen heartbeat agent”

That’s probably the fastest way out of this.

stoic pivot Apr 23, 2026, 7:45 AM

#

That would be very helpful. I am also new to Claude Code, so I wasn't sure what folder to point it to.

proud galeBOT Apr 23, 2026, 7:46 AM

#

Yep, and for this job I’d point Claude Code at:

C:\Users\prapunsak\.openclaw

Why that folder:

openclaw.json should live there
your workspace\ folder lives under it
Claude needs to see both the config and the injected workspace files

If you only wanted to trim prompt files, C:\Users\prapunsak\.openclaw\workspace would be enough.
But for the heartbeat/Qwen config fix, use the parent .openclaw folder.

Paste this into Claude Code:

You are working inside my OpenClaw state/config directory on Windows.

Working directory:
C:\Users\prapunsak\.openclaw

Goal:
Make my OpenClaw heartbeat use an ultra-lean local Ollama Qwen path that is small and reliable enough for heartbeat-only use.

Important context:
- I am only trying to use local Ollama Qwen for heartbeat.
- My normal main/fallback setup should keep working.
- I already trimmed AGENTS.md / SOUL.md / MEMORY.md / etc, so file bloat is no longer the main issue.
- Current remaining pressure is the runtime surface: skills + tool schemas.
- I saw a raw Ollama error saying the Qwen model does not support thinking.
- After setting thinking off, the heartbeat still timed out.
- Tool schema inspection showed very heavy tools, especially cron, exec, process, sessions_spawn, and related session tools.

- This local model is qwen2.5:3b-instruct-q4_K_M via Ollama.
- This model should be treated as a tiny heartbeat-only text model, not a full agentic model.

Your job:
Inspect my current OpenClaw config and workspace, then produce the smallest safe config changes needed to create or tune a dedicated ultra-lean local heartbeat path.

Read first:
- openclaw.json
- workspace/
- any included config files referenced from openclaw.json
- docs for:
  - https://docs.openclaw.ai/providers/ollama
  - https://docs.openclaw.ai/gateway/local-models
  - https://docs.openclaw.ai/tools/thinking
  - https://docs.openclaw.ai/gateway/configuration-reference

Strict rules:
1. Do not guess config keys.
2. Use only documented OpenClaw config paths.
3. Keep changes as narrow as possible.
4. Do not break my normal main/fallback model setup.
5. Do not touch auth unless absolutely necessary.
6. Prefer editing only the config needed for the heartbeat/local lane.
7. Use JSON5, not YAML.
8. Preserve comments and formatting where reasonable.

What I want you to optimize:
- thinking off for the heartbeat/local agent
- lean local model mode if appropriate
- much smaller skills prompt budget for the heartbeat/local agent

- remove or avoid heavy tool schemas for the local Qwen heartbeat path
- if supported and appropriate, make the Qwen heartbeat path no-tools / text-only
- reduce avoidable startup/runtime pressure without changing unrelated behavior

Please specifically investigate whether the best fix is:
- a dedicated heartbeat/local agent config, or
- a model-specific compatibility change, or
- both

Also check whether the current heartbeat is using:
- a dedicated agent, or
- a session-level model override on main

If the current setup is awkward, propose the simplest reliable configuration.

Output format:
1. Diagnosis
2. Exact files to change
3. A minimal patch or exact replacement blocks
4. Why each change helps
5. A short test plan with commands I can run after editing
6. If qwen2.5:3b-instruct-q4_K_M is still a bad fit even after slimming, say so plainly and recommend the cheapest stable fallback option for heartbeat

Important:
Do not do a broad cleanup. Stay focused on making the local heartbeat lane small and reliable.

If you want, after Claude gives you the patch, paste it here and I’ll sanity-check it before you apply it.

stoic pivot Apr 23, 2026, 8:11 AM

#

Could you give a citation for agents.defaults.experimental.localModelLean: true? Claude Code can't find it.

ME: is it possible to do this just for the heartbeat? Do you recommend? agents.defaults.experimental.localModelLean: true

CC: Good question. Let me check the config reference for scope.

Fetched
https://docs.openclaw.ai/gateway/configuration-reference
It's not documented in the reference, which means it's either undocumented experimental or I misspoke earlier. Don't rely on it. The documented levers are:

compat.supportsTools: false on the model (which you already have) — this is the stable, per-model way to drop tool schemas. That's your real weight-cutter.
agents.defaults.models["…"].params for per-model thinking/reasoning pins (which you already have).
Those two together give you the lean local path. Adding an undocumented experimental flag risks config errors and unpredictable behavior.

Recommendation: Stick with what you applied. The supportsTools: false on the Qwen 3B model is the lean mode — it strips the heavyweight tool schemas (cron, exec, process, sessions_spawn, etc.) from the prompt for that model only. That's the documented way.

If the heartbeat is still timing out after a restart, the issue is model speed or Ollama queueing, not tool-schema bloat. In that case, switch to Qwen 1.5B q4 (same edits applied to that model entry). That's your next reliable step before remote fallback.

proud galeBOT Apr 23, 2026, 8:13 AM

#

Yep, fair catch. I should have cited it earlier.

Claude Code is right that it’s not in the configuration reference.
But it is documented in two other docs:

Local Models
https://docs.openclaw.ai/gateway/local-models

Exact wording there:

first try agents.defaults.experimental.localModelLean: true to drop heavyweight default tools like browser, cron, and message; this is an experimental flag, not a stable default-mode setting.

Experimental Features
https://docs.openclaw.ai/concepts/experimental-features

Exact wording there:

agents.defaults.experimental.localModelLean: true is a pressure-release valve for weaker local-model setups.

So the correction is:

documented? yes
in config reference? no
stable? no, explicitly experimental

For your follow-up question, I would not use it as the main heartbeat-only lever.

Why:

the documented path is agents.defaults.experimental.localModelLean
that is a defaults/global knob
I do not see a documented per-heartbeat/per-agent version of that flag

So for heartbeat-only, I’d recommend exactly what Claude Code said:

use compat.supportsTools: false on the Qwen model
keep thinking off for that path

That is the cleaner, more targeted, more stable lever.

So my honest recommendation now is:

heartbeat-only / targeted fix → compat.supportsTools: false
broad experimental pressure relief for weak local backends → agents.defaults.experimental.localModelLean: true

If you want, I can help you word a short reply back to Claude Code so it updates its plan cleanly.

stoic pivot Apr 23, 2026, 8:30 AM

#

What is the correct command for a budget audit and usage review? I am trying to create a skill that checks usage and optimizes model usage. Maybe you can help rewrite the skill?

This is my current skill:
name: budget-audit
description: Review OpenClaw usage, quota telemetry, and visible budget signals using supported commands only, then return a concise structured budget report. Use when daily or weekly budget audit/review cron jobs run.

Budget Audit Skill
Produce short, reliable budget telemetry summaries without inventing unsupported signals.

Rules
Use only supported OpenClaw commands and visible telemetry.
Prefer openclaw status --all when budget/quota review is requested.
If telemetry is unavailable, say so plainly.
Keep the report concise and structured exactly as requested by the caller.
Do not pad the report with speculation.
Return NO_REPLY for scheduled/background runs unless the caller explicitly requests delivery.

Steps
Gather supported budget/quota signal from OpenClaw status output.
Identify visible signal, quota telemetry availability, and any obvious recommendation.
Format the report exactly to the caller's required schema.
Return NO_REPLY for cron/background runs.

Output contract
Use only available evidence.
Mark unavailable telemetry clearly.
Keep the report short.

Continuous Cost Awareness
Track your own behavior across sessions:
How many model switches happened today?
How many tool calls did you make per question on average?
When did compaction trigger, and was it expected?

If you notice you are frequently escalating to a more expensive model:
Identify the task category causing it
Log it to today's memory file
Flag it to Ta in the next session summary
Propose whether a better default model should be used for that category

#

@proud gale

proud galeBOT Apr 23, 2026, 8:32 AM

#

I’m here 🦐

Want me to do one of these next?

write a short reply back to Claude Code
sanity-check the config block it generated
help you test whether Qwen heartbeat is fixed

Paste whichever you have and I’ll keep going.

stoic pivot Apr 23, 2026, 8:32 AM

#

What is the correct command for a budget audit and usage review? I am trying to create a skill that checks usage and optimizes model usage. Maybe you can help rewrite the skill?

This is my current skill:
name: budget-audit
description: Review OpenClaw usage, quota telemetry, and visible budget signals using supported commands only, then return a concise structured budget report. Use when daily or weekly budget audit/review cron jobs run.

Budget Audit Skill
Produce short, reliable budget telemetry summaries without inventing unsupported signals.

Rules
Use only supported OpenClaw commands and visible telemetry.
Prefer openclaw status --all when budget/quota review is requested.
If telemetry is unavailable, say so plainly.
Keep the report concise and structured exactly as requested by the caller.
Do not pad the report with speculation.
Return NO_REPLY for scheduled/background runs unless the caller explicitly requests delivery.

Steps
Gather supported budget/quota signal from OpenClaw status output.
Identify visible signal, quota telemetry availability, and any obvious recommendation.
Format the report exactly to the caller's required schema.
Return NO_REPLY for cron/background runs.

Output contract
Use only available evidence.
Mark unavailable telemetry clearly.
Keep the report short.

Continuous Cost Awareness
Track your own behavior across sessions:
How many model switches happened today?
How many tool calls did you make per question on average?
When did compaction trigger, and was it expected?

If you notice you are frequently escalating to a more expensive model:
Identify the task category causing it
Log it to today's memory file
Flag it to Ta in the next session summary
Propose whether a better default model should be used for that category

proud galeBOT Apr 23, 2026, 8:33 AM

#

Yep. The main issue with your current skill is that it promises signals OpenClaw does not expose directly.

The safest supported commands are:

openclaw status --usage → provider quota / usage windows
openclaw status --all → broader pasteable diagnostic
openclaw models status → resolved model, fallbacks, auth overview
in chat: /status and /usage cost

I would remove these from the skill, because they are not reliably available from supported commands alone:

“How many model switches happened today?”
“How many tool calls per question on average?”
“When did compaction trigger, and was it expected?”
“Track your own behavior across sessions”
“Log it to today’s memory file”

#

That turns it from a budget audit skill into a speculative analytics skill.

A tighter rewrite would be:

---
name: budget-audit
description: Review supported OpenClaw usage, quota, model, and auth signals, then return a concise structured budget report. Use for daily or weekly budget/usage review runs.
---

# Budget Audit

Produce short, evidence-based usage reports using only supported OpenClaw telemetry.

## Supported commands

Use these only when relevant:

- `openclaw status --usage`
- `openclaw status --all`
- `openclaw models status`

In chat contexts, supported usage surfaces are:

- `/status`
- `/usage cost`

## Rules

- Use only visible OpenClaw telemetry.
- Do not invent hidden budgets, provider billing state, model-switch counts, tool-call averages, or cross-session analytics.

- If telemetry is unavailable, say `unavailable`.
- Keep the report concise and structured exactly as requested.
- For cron/background runs, return `NO_REPLY` unless delivery is explicitly requested.

## Steps

1. Run `openclaw status --usage` for provider quota/usage windows.
2. Run `openclaw models status` when model, fallback, auth, or cooldown context matters.
3. Use `openclaw status --all` only when the caller wants a broader diagnostic snapshot.
4. Extract only supported visible signals:
   - provider usage/quota windows
   - active model and fallbacks
   - auth/cooldown/unusable-profile issues if visible
   - obvious recommendation based on visible evidence
5. Mark any missing signal as `unavailable`.

6. Format the result exactly to the caller’s requested schema.

## Do not claim

- exact cost unless a visible cost surface provides it
- hidden provider billing details
- model-switch counts
- tool-call averages
- cross-session behavior analytics
- compaction history unless directly shown in visible output

## Default output shape

- Models:
- Usage windows:

- Cost signal:
- Auth or cooldown issues:
- Recommendation:

So, short answer:

best command for budget/quota review: openclaw status --usage
best command for model/auth context: openclaw models status
best broad diagnostic snapshot: openclaw status --all
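
To show what the default output shape and the `unavailable` rule look like in practice, here is a small Python sketch. It only formats values you have already read from the commands above; it does not call OpenClaw, and `format_report` is a hypothetical helper name. The sample values are taken from the /status output you pasted earlier.

```python
# Field names follow the "Default output shape" in the rewritten skill.
FIELDS = [
    "Models",
    "Usage windows",
    "Cost signal",
    "Auth or cooldown issues",
    "Recommendation",
]


def format_report(signals: dict) -> str:
    """Render the report, marking any missing or empty signal as 'unavailable'."""
    lines = []
    for field in FIELDS:
        value = str(signals.get(field, "")).strip() or "unavailable"
        lines.append(f"- {field}: {value}")
    return "\n".join(lines)


SAMPLE = {
    "Models": "openai-codex/gpt-5.4-mini (fallback: openai-codex/gpt-5.4)",
    "Usage windows": "5h 85% left, week 96% left",
}

print(format_report(SAMPLE))
```

Anything the commands did not show stays `unavailable` instead of being guessed, which is the main behavior the skill is meant to enforce.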

#

If you want, I can also rewrite it into a Claude Code-ready skill file with an even stricter output schema for cron use.

stoic pivot Apr 23, 2026, 8:34 AM

#

Some of my skills only need to load during cron runs; they don't need to load at startup and add to the fixed cost. Can I do that?

proud galeBOT Apr 23, 2026, 8:35 AM

#

Yes, absolutely, and this is the right way to think about it.

Short version:

skills are only part of the prompt for agents that can see them
so if a skill is cron-only, put it on a dedicated cron agent
do not leave it visible to your main/default agent

The docs-backed knobs are:

agents.defaults.skills
agents.list[].skills
agents.list[].skillsLimits.maxSkillsPromptChars

And cron jobs can target a specific agent with --agent.

So the pattern is:

keep your main agent’s skills small
create a dedicated cron agent
give that cron agent the extra skills
point the cron job at that agent

Example idea:

{
  agents: {
    defaults: {
      skills: ["summarize", "healthcheck"],
    },
    list: [
      {
        id: "main",

        skills: ["summarize", "healthcheck"],
      },
      {
        id: "cron-audit",
        skills: ["budget-audit", "model-routing"],
        skillsLimits: {
          maxSkillsPromptChars: 1500,
        },
      },
    ],
  },
}

Then create/edit the cron to use that agent:

openclaw cron add --name "Budget audit" --cron "0 8 * * *" --session isolated --agent cron-audit --message "Run budget audit and summarize visible usage/quota signals."

Important nuance:

if a skill is visible to main, it still contributes to main’s fixed prompt cost
to keep it out of startup cost for main, it has to be excluded from main
agents.list[].skills is the lever that makes that happen

Also useful:

agents.list[].skills: [] means no skills for that agent
isolated cron sessions help keep token cost down too
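
To see what the split buys you, you can add up the per-skill `blockChars` from the systemPromptReport pasted earlier. A rough Python sketch, assuming the example agent ids and skill lists above, and treating the sums as lower bounds, since the report's overall `promptChars` (9900) is larger than the sum of the individual blocks:

```python
# blockChars per skill, copied from the pasted systemPromptReport.
BLOCK_CHARS = {
    "summarize": 300,
    "healthcheck": 495,
    "budget-audit": 351,
    "model-routing": 502,
}

# Example assignment from the config sketch above (agent ids are illustrative).
AGENT_SKILLS = {
    "main": ["summarize", "healthcheck"],
    "cron-audit": ["budget-audit", "model-routing"],
}

CURRENT_MAIN_PROMPT_CHARS = 9900   # skills.promptChars with all skills visible
CRON_AUDIT_LIMIT = 1500            # maxSkillsPromptChars in the example


def skills_chars(agent_id: str) -> int:
    """Sum the skill block sizes visible to one agent."""
    return sum(BLOCK_CHARS[name] for name in AGENT_SKILLS[agent_id])


for agent_id in AGENT_SKILLS:
    print(f"{agent_id}: ~{skills_chars(agent_id)} skill chars")

print(f"main today: {CURRENT_MAIN_PROMPT_CHARS} skill chars")
print(f"cron-audit fits its limit: {skills_chars('cron-audit') <= CRON_AUDIT_LIMIT}")
```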

Docs:

https://docs.openclaw.ai/tools/skills

#

If you want, I can help you design a main vs cron-audit split so your daily agent stays lean.

stoic pivot Apr 23, 2026, 8:37 AM

#

yes please. I think a sub-agent for main is good for this

proud galeBOT Apr 23, 2026, 8:38 AM

#

Yep, that’s a good pattern, and I’d structure it like this:

Recommendation

keep main lean
create one narrow audit agent
let:
- cron target audit directly
- main call audit as a sub-agent when needed

That gives you the best of both:

low fixed cost on main
audit skills only load when actually needed

A clean split would be:

{
  agents: {
    defaults: {
      skills: ["summarize", "healthcheck"],
    },
    list: [
      {
        id: "main",
        skills: ["summarize", "healthcheck"],
      },
      {
        id: "audit",
        skills: ["budget-audit"],
        skillsLimits: {
          maxSkillsPromptChars: 1200,

        },
        thinkingDefault: "off",
      },
    ],
  },
}

How to use it:

Cron

openclaw cron add --name "Budget audit" --cron "0 8 * * *" --session isolated --agent audit --message "Run budget audit and summarize visible usage/quota signals."

From main, on demand

#

/subagents spawn audit Run a budget audit using supported OpenClaw usage/quota commands only and return a short report.

Why I like this setup:

main does not carry the budget-audit skill at startup
audit only pays its own context when invoked
cron and interactive usage can reuse the same narrow agent
easier to reason about than stuffing everything into main

One nuance:

sub-agents do have their own context/token cost when spawned
but that’s on-demand cost, not constant fixed prompt cost on every main turn
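
If it helps to see that trade-off as arithmetic, here is a back-of-the-envelope sketch in Python. Every number in it is invented for illustration (they are not measurements from your setup), and it ignores things like chat history growth and prompt caching:

```python
def total_prompt_tokens(fixed_per_turn, turns, on_demand_cost=0, spawns=0):
    """Rough total: the fixed prompt is paid on every turn,
    a sub-agent's context only when it is actually spawned."""
    return fixed_per_turn * turns + on_demand_cost * spawns


# Hypothetical numbers, purely illustrative.
everything_on_main = total_prompt_tokens(fixed_per_turn=20_000, turns=50)
lean_main_plus_audit = total_prompt_tokens(
    fixed_per_turn=15_000, turns=50, on_demand_cost=8_000, spawns=2
)

print(everything_on_main)    # 1000000
print(lean_main_plus_audit)  # 766000
```

With these made-up figures, two audit spawns cost far less than carrying the extra skills on all 50 main turns. The real numbers will differ, so check /status before and after.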

So yes, I’d do dedicated audit agent + optional sub-agent call from main.

#

If you want, I can help you write:

the exact budget-audit skill for the audit agent, and
a tiny prompt/instruction for main telling it when to delegate.

stoic pivot Apr 23, 2026, 8:44 AM

#

Cron jobs for sub-agent of main
budget audit, config-diff-review, knowledge-reconciliation, learning-loop, memory-watchdog, receipt-normalizer, self-improvement-review, workspace-hygiene
Skill no longer in use:
memory-promotion
Main skill:
model-routing
Aemeath skill:
model-routing
cron job for aemeath:
learning-loop, memory-watchdog

#

cron job for aemeath's sub agent:

proud galeBOT Apr 23, 2026, 8:45 AM

#

Yep, based on that list, I’d split it like this.

Remove entirely

memory-promotion

Keep interactive agents very lean

main
- model-routing
aemeath
- model-routing

Move cron-only / maintenance skills off the interactive agents

main-cron (or main-audit)
- budget-audit
- config-diff-review
- knowledge-reconciliation
- learning-loop
- memory-watchdog
- receipt-normalizer
- self-improvement-review
- workspace-hygiene
aemeath-cron
- learning-loop
- memory-watchdog

That gives you this shape:

#

{
  agents: {
    list: [
      {
        id: "main",
        skills: ["model-routing"],
      },
      {
        id: "main-cron",
        skills: [
          "budget-audit",
          "config-diff-review",
          "knowledge-reconciliation",
          "learning-loop",
          "memory-watchdog",
          "receipt-normalizer",
          "self-improvement-review",
          "workspace-hygiene",
        ],
      },
      {
        id: "aemeath",
        skills: ["model-routing"],
      },
      {
        id: "aemeath-cron",
        skills: ["learning-loop", "memory-watchdog"],
      },
    ],
  },
}

#

Why I’d do it this way:

main stays tiny
aemeath stays tiny
cron-only skills stop adding fixed startup cost to interactive chats
both interactive agents can still delegate when needed

Use cron like this:

openclaw cron add --name "Main audit" --cron "0 8 * * *" --session isolated --agent main-cron --message "Run the scheduled maintenance/audit workflow."
openclaw cron add --name "Aemeath review" --cron "0 9 * * *" --session isolated --agent aemeath-cron --message "Run learning-loop and memory-watchdog."

And if main needs that work on demand, it can spawn:

#

/subagents spawn main-cron Run the maintenance review workflow and return a short summary.

So yes, I think your instinct was right, but I’d use two dedicated cron agents rather than loading those skills onto main or aemeath directly.

If you want, I can turn that into a Claude Code prompt that rewrites your config cleanly.

#

Yep, that’s even cleaner.

So the final split I’d recommend is:

main
- model-routing
main-cron
- budget-audit
- config-diff-review
- knowledge-reconciliation
- learning-loop
- memory-watchdog
- receipt-normalizer
- self-improvement-review
- workspace-hygiene
aemeath
- model-routing
aemeath-cron
- the cron-only skills for Aemeath’s sub-agent

So yes, do not put those cron skills on aemeath itself.
Put them on aemeath-cron and have cron target that agent directly.

That keeps:

main lean
aemeath lean
cron logic isolated
fixed startup cost much lower

If you want, send me the exact skill list for aemeath-cron and I’ll give you the final clean agent layout.

stoic pivot Apr 23, 2026, 8:48 AM

#

aemeath-cron
learning-loop
memory-watchdog

proud galeBOT Apr 23, 2026, 8:48 AM

#

Perfect, then I’d lock it in like this:

{
  agents: {
    list: [
      {
        id: "main",
        skills: ["model-routing"],
      },
      {
        id: "main-cron",
        skills: [
          "budget-audit",
          "config-diff-review",
          "knowledge-reconciliation",
          "learning-loop",
          "memory-watchdog",
          "receipt-normalizer",
          "self-improvement-review",
          "workspace-hygiene",
        ],
      },
      {
        id: "aemeath",
        skills: ["model-routing"],
      },
      {
        id: "aemeath-cron",
        skills: ["learning-loop", "memory-watchdog"],
      },
    ],
  },
}

That’s a nice split:

main stays lean
aemeath stays lean
main-cron carries the heavy maintenance stuff
aemeath-cron carries only its two cron skills
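
If you want a quick sanity check after editing, a small Python sketch like this can flag cron-only skills that leaked onto an interactive agent. It works on a plain dict you paste in by hand (Python's standard library has no JSON5 parser), and leaked_skills is a hypothetical helper, not an OpenClaw command:

```python
CRON_ONLY = {
    "budget-audit", "config-diff-review", "knowledge-reconciliation",
    "learning-loop", "memory-watchdog", "receipt-normalizer",
    "self-improvement-review", "workspace-hygiene",
}
INTERACTIVE = {"main", "aemeath"}


def leaked_skills(agents, interactive_ids, cron_only):
    """Map each interactive agent id to the cron-only skills it still exposes."""
    leaks = {}
    for agent in agents:
        overlap = set(agent.get("skills", [])) & cron_only
        if agent["id"] in interactive_ids and overlap:
            leaks[agent["id"]] = sorted(overlap)
    return leaks


# The layout from the config above, transcribed as Python data.
agents = [
    {"id": "main", "skills": ["model-routing"]},
    {"id": "main-cron", "skills": sorted(CRON_ONLY)},
    {"id": "aemeath", "skills": ["model-routing"]},
    {"id": "aemeath-cron", "skills": ["learning-loop", "memory-watchdog"]},
]

print(leaked_skills(agents, INTERACTIVE, CRON_ONLY))  # {}
```

An empty result means main and aemeath expose none of the cron-only skills. It says nothing about agents.defaults.skills, so check that separately.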

And the cron target for Aemeath should point to:

#

--agent aemeath-cron

So yes, this looks like the right structure to reduce fixed startup cost.

If you want, I can now write the Claude Code prompt that tells it to rewrite your config to exactly this layout.

stoic pivot Apr 23, 2026, 8:49 AM

#

Yes please

proud galeBOT Apr 23, 2026, 8:49 AM

#

Nice, use this in Claude Code.

Point it at:

C:\Users\prapunsak\.openclaw

Then paste this prompt:

You are editing my OpenClaw config in:

C:\Users\prapunsak\.openclaw

Goal:
Rewrite my agent/skills layout so interactive agents stay lean and cron-only skills do not add fixed startup cost to main agents.

Important rules:
- Read my current openclaw.json first.
- If openclaw.json includes other files, read those too.
- Use only documented OpenClaw config keys.
- Do not guess.
- Preserve unrelated config, auth, channels, and model settings.
- Use JSON5, not YAML.
- Make the smallest safe change.

Target design:

1. main
- skills: ["model-routing"]

2. main-cron
- skills:
  - budget-audit
  - config-diff-review
  - knowledge-reconciliation
  - learning-loop
  - memory-watchdog
  - receipt-normalizer
  - self-improvement-review
  - workspace-hygiene

3. aemeath
- skills: ["model-routing"]

4. aemeath-cron
- skills:
  - learning-loop
  - memory-watchdog

5. remove unused skill exposure:
- memory-promotion should no longer be exposed anywhere unless it is still required by an existing config path you can prove from the file

What I want:
- edit the config so these four agents exist with the skill visibility above
- keep interactive agents lean
- keep cron-only skills off interactive agents
- do not broaden skill visibility accidentally
- if agents.defaults.skills is currently causing extra skill exposure, narrow it appropriately
- if per-agent skills are better than defaults here, use per-agent skills
- preserve my current model settings unless they conflict with this goal

Also give me:
1. the exact files changed
2. the exact config diff
3. a short explanation of why this lowers fixed prompt cost
4. example cron commands for:
   - main-cron
   - aemeath-cron

Use these example cron targets if they fit the config cleanly:
- --agent main-cron
- --agent aemeath-cron

Do not do a broad cleanup. Stay focused on the skill/agent split only.

If you want, when Claude gives you the diff, paste it here and I’ll sanity-check it before you save it.

obtuse lindenBOT Apr 23, 2026, 8:52 AM

#


This is my current skill:
name: budget-audit
description: Review OpenClaw usage, quota telemetry, and visible budget signals using supported commands only, then return a concise structured budget report. Use when daily or weekly budget audit/review cron jobs run.
