#Saving model cost
Let me check the docs for the current guidance so I don’t hand-wave this.
Yeah, there is official guidance now, and the short version is: route by job type, don’t just hope fallbacks save money.
Best-supported cost levers:
- Give heartbeat its own cheaper model with agents.defaults.heartbeat.model
- Make heartbeat cheap with:
  isolatedSession: true
  lightContext: true
  The docs explicitly say this can cut heartbeat cost from roughly ~100K tokens to ~2-5K per run
- Use cron for exact scheduled jobs, and give those runs their own --model, instead of stuffing everything into the main session
- Use a cheaper sub-agent model via agents.defaults.subagents.model for repetitive worker tasks
- Route by agent role with agents.list[].model
A couple of important caveats from the docs:
- Use smaller/cheaper models only for low-stakes, trusted/chat-only work
- For tool-enabled or untrusted-input flows, the docs recommend prioritizing a stronger model
- Fallbacks are mostly for resilience, not primary cost control, because OpenClaw still tries the primary model first
If you’re using prompt caching, the official extras are:
- cacheRetention
- agents.defaults.contextPruning.mode: "cache-ttl"
- keep heartbeat just under cache TTL only for agents where warming the cache is actually worth it
Minimal shape:
{
  agents: {
    defaults: {
      model: { primary: "your-strong-main-model" },
      heartbeat: {
        model: "your-cheaper-routine-model",
        every: "30m",
        isolatedSession: true,
        lightContext: true,
      },
      subagents: {
        model: "your-cheaper-worker-model",
      },
    },
  },
}
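To put rough numbers on that heartbeat saving: with every: "30m" there are 48 heartbeat runs a day. Here is a back-of-the-envelope sketch in shell, using only the per-run figures quoted from the docs above (~100K tokens without the two flags, ~2-5K with them). The function name is made up for illustration, it is not an OpenClaw command, and real usage will vary.

```bash
# Rough estimate only: tokens spent on heartbeats per day.
# heartbeat_tokens_per_day is a hypothetical helper, not part of OpenClaw.
# Usage: heartbeat_tokens_per_day <tokens_per_run> <interval_minutes>
heartbeat_tokens_per_day() {
  local tokens_per_run=$1 interval_minutes=$2
  local runs_per_day=$(( 24 * 60 / interval_minutes ))
  echo $(( tokens_per_run * runs_per_day ))
}

heartbeat_tokens_per_day 100000 30   # full-context heartbeat
heartbeat_tokens_per_day 5000 30     # isolatedSession + lightContext, upper end
heartbeat_tokens_per_day 2000 30     # isolatedSession + lightContext, lower end
```

That works out to about 4.8M tokens a day versus roughly 96K-240K, before any per-token price difference from the cheaper model.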
Also, to measure instead of guessing:
/status
/usage full
/usage cost
Docs:
- https://docs.openclaw.ai/automation
- https://docs.openclaw.ai/gateway/heartbeat
- https://docs.openclaw.ai/concepts/models
- https://docs.openclaw.ai/reference/prompt-caching
- https://docs.openclaw.ai/reference/token-use
If you want, I can sketch a cheap-but-safe config for your exact provider mix.
biggest wins i've found: use a cheaper model for heartbeats and status checks (haiku or a local model work fine), reserve the expensive model for actual task execution. also check if your agents are sending full context on every heartbeat — trimming that alone can cut costs 40-50%.
Thanks @hearty smelt I have some free credits from Google so been using
Primary: google/gemini-3.1-flash-lite-preview
Fallbacks: gemini-3.1-flash, gemini-3.1-pro-preview
I also configured xAI..
Any tips welcome
That setup is pretty sensible as an availability ladder, but the bigger savings usually come from role separation, not from the fallback chain itself.
My practical tips:
- Use the cheap Gemini tier for heartbeat, not just as main default
  Set:
  agents.defaults.heartbeat.model
  isolatedSession: true
  lightContext: true
  That is the cleanest “save tokens without changing behavior much” win.
- Use cron for scheduled summaries/reports, with an explicit cheap or mid-tier model
  Good for digests, inbox sweeps, daily reports, etc.
- Be careful making the lightest tier your main tool-using brain
  Official guidance is basically: cheaper/smaller is fine for low-stakes, trusted stuff, but for tool use or untrusted inputs, a stronger default is safer.
- Use a separate sub-agent model for grunt work
  agents.defaults.subagents.model is great for summarization, extraction, repetitive background work.
- Treat xAI as a separate route, not just “also configured”
  If you want to use it intentionally, I’d make it a separate agent or switch to it on demand. Cross-provider fallbacks are more about resilience than cost optimization.
So if it were me, I’d probably do:
- main chat / tools: stronger default
- heartbeat / cron / summaries: cheap Gemini
- worker subagents: cheap or mid Gemini
- xAI: manual switch or dedicated agent for the cases where you specifically want it
If you want, I can mock up a concrete JSON5 config for exactly that split.
sure
I’d start with this split, it’s a good cheap but not too fragile baseline:
{
  agents: {
    defaults: {
      model: {
        primary: "google/gemini-3.1-flash",
        fallbacks: [
          "google/gemini-3.1-pro-preview",
        ],
      },
      heartbeat: {
        every: "30m",
        model: "google/gemini-3.1-flash-lite-preview",
        isolatedSession: true,
        lightContext: true,
        target: "none",
      },
      subagents: {
        model: "google/gemini-3.1-flash-lite-preview",
      },
    },
  },
}
Why this shape:
- main interactive/tool work gets the more reliable mid tier
- heartbeat gets the cheap tier
- subagents/background grunt work also get the cheap tier
- pro stays as the “oh shit, need a stronger retry” fallback
A couple of tweaks I’d recommend:
- Use full refs everywhere
  Prefer google/... explicitly, not unprefixed model names
- Don’t put xAI in the normal fallback chain unless you want outage protection
  For cost/control, I’d use xAI as a manual /model ... switch or a separate agent, not as a routine fallback
- If your bot is mostly trusted chat and not tool-heavy, you can go even cheaper
  just keep your current lite model as primary too, and still keep the heartbeat/subagent split above
For scheduled digests/reports, I’d also use isolated cron jobs with an explicit cheap model, for example:
openclaw cron add \
--name "Daily digest" \
--cron "0 8 * * *" \
--session isolated \
--model "google/gemini-3.1-flash-lite-preview" \
--message "Summarize anything important from the last day."
After changing it, I’d watch:
/status
/usage full
/usage cost
for a day or two and see where the real spend is.
If you want, I can also sketch a 2-agent version like main + ops/heartbeat, which is even cleaner.
how do i test this is working right
Let me check the safest test path in the docs so I don’t send you on a fake command.
Easiest safe test loop:
- Verify the model config resolves
  openclaw models status
  openclaw models status --probe
  That confirms your default/fallback/auth setup is actually valid.
- Test the normal chat path
  In the thread, turn on usage info:
  /usage full
  Then send a tiny prompt. That lets you see the normal session behavior and cost footer.
- Test heartbeat explicitly
  Make sure HEARTBEAT.md is not empty, otherwise OpenClaw will skip it.
  Then trigger one manually:
  openclaw system event --text "Test heartbeat config" --mode now
  openclaw system heartbeat last
- Watch logs while doing that
  openclaw logs --follow
  If heartbeat didn’t run, this is where you’ll usually see why.
- Know the common “looks broken but isn’t” cases
  Heartbeat can skip with reasons like:
  empty-heartbeat-file
  no-tasks-due
  quiet-hours
  alerts-disabled
- Test cron separately with a throwaway job
  openclaw cron add \
  --name "model-test" \
  --at "5m" \
  --session isolated \
  --model "google/gemini-3.1-flash-lite-preview" \
  --message "Reply with: cron test ok" \
  --announce
  Then force-run it:
  openclaw cron run <jobId>
  openclaw cron runs --id <jobId> --limit 20
My recommendation: test in this order:
- models status --probe
- normal chat with /usage full
- manual heartbeat wake
- throwaway cron job
That usually tells you very quickly whether:
- config is valid
- main chat is using the right route
- heartbeat is actually firing
- isolated jobs are using the right model
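The scriptable parts of that order can be wrapped in one small shell function. This is a sketch only: it reuses the commands from the steps above unchanged, the function name smoke_test is made up, and the /usage full chat step is left out because it has to be done by hand in the thread.

```bash
# Hypothetical wrapper around the commands listed above; smoke_test is not an OpenClaw command.
# Usage: smoke_test [jobId]   (jobId = ID of the throwaway cron job, if you created one)
smoke_test() {
  # 1. Check that config, fallbacks and auth resolve
  openclaw models status --probe || return 1

  # 2. Wake heartbeat now, then ask whether it ran
  openclaw system event --text "Test heartbeat config" --mode now || return 1
  openclaw system heartbeat last || return 1

  # 3. Optional: force-run the throwaway cron job and list its recent runs
  if [ -n "${1:-}" ]; then
    openclaw cron run "$1" || return 1
    openclaw cron runs --id "$1" --limit 20
  fi
}
```

Call smoke_test on its own for the model and heartbeat checks, or smoke_test <jobId> to include the cron job. It stops at the first command that fails, so the last output you see is the step to look at.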
If you want, I can give you a tiny HEARTBEAT.md test file and a throwaway cron command you can paste exactly.
how do i know if heartbeat runs, will there be a message in the log with which model it used?
Let me verify that before I guess.
Short answer: not as a guaranteed plain-English log line, no.
What the docs do guarantee is:
- Did heartbeat run?
  openclaw system heartbeat last
- Why did it skip/fail?
  openclaw logs --follow
  That’s where you’ll see things like empty-heartbeat-file, no-tasks-due, quiet-hours, etc.
For the exact model/provider used, the more reliable surface is diagnostics/telemetry, not the generic heartbeat log line:
- OpenClaw emits model.usage diagnostic events with:
  - provider
  - model
  - usage tokens
- The logging docs also call out the exported openclaw.model.usage span with:
  - openclaw.channel
  - openclaw.provider
  - openclaw.model
So the practical answer is:
- run confirmation → openclaw system heartbeat last
- skip/error reason → openclaw logs --follow
- exact model used → diagnostics / telemetry (model.usage / openclaw.model.usage)
If you want a dead-simple sanity check, do this:
openclaw system event --text "Test heartbeat config" --mode now
openclaw system heartbeat last
openclaw logs --follow
Then temporarily make the heartbeat model different from your main model. That way, if you inspect the usage event or provider-side usage, it’s obvious which path fired.
If you want, I can help you find the exact easiest place to read those model.usage events on your setup.
where are the model usage files?
Let me check the docs/source for the exact path.
Usually there is not one dedicated “model-usage file”.
OpenClaw usage data is spread across a few places:
- Session transcripts
  ~/.openclaw/agents/<agentId>/sessions/<sessionId>.jsonl
- Session metadata / counters
  ~/.openclaw/agents/<agentId>/sessions/sessions.json
- Diagnostics log if you want model.usage events
  /tmp/openclaw/openclaw-YYYY-MM-DD.log
  or whatever you set in logging.file
If you’re testing heartbeat, the most useful raw place is usually the diagnostics log:
ls -t /tmp/openclaw/openclaw-*.log | head -1
rg "model.usage|heartbeat" /tmp/openclaw/openclaw-*.log
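If you temporarily give heartbeat a different model from your main one, a quick count of model.usage lines per model makes it obvious which route fired. This is a sketch, not an official tool: count_model_usage is a made-up helper, and it assumes the model ID appears as plain text on the same line as model.usage, which nothing in this thread confirms, so eyeball a few raw lines first.

```bash
# Hypothetical helper: count diagnostics log lines that mention both
# "model.usage" and the given model ID.
# Assumption: both strings appear as plain text on the same log line.
# Usage: count_model_usage <model-id> <logfile>...
count_model_usage() {
  local model=$1
  shift
  # -h drops filename prefixes; -F matches fixed strings, so dots are literal
  grep -hF -- "model.usage" "$@" | grep -cF -- "$model"
}
```

For example: count_model_usage "google/gemini-3.1-flash-lite-preview" /tmp/openclaw/openclaw-*.log. Matching is by substring, so an ID that is a prefix of another (google/gemini-3.1-flash vs google/gemini-3.1-flash-lite-preview) will count both.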
A couple of related paths:
- cron run logs
  ~/.openclaw/cron/runs/<jobId>.jsonl
- mac app diagnostics
  ~/Library/Logs/OpenClaw/diagnostics.jsonl
So the short version is:
- /usage cost reads from session logs/transcripts
- model.usage diagnostics show up in the diagnostics log
- heartbeat itself is easiest to trace via:
  openclaw system heartbeat last
  openclaw logs --follow
If you want, I can give you a one-liner to find the exact transcript file for your current session too.
this is getting too complicated
Yeah, totally fair. I’d simplify hard.
You do not need to chase usage files right now.
Just do this:
1. Keep your normal model setup as-is
Don’t touch fallbacks/xAI/subagents for now.
2. Only add a cheap heartbeat override
{
  agents: {
    defaults: {
      heartbeat: {
        model: "google/gemini-3.1-flash-lite-preview",
        isolatedSession: true,
        lightContext: true,
      },
    },
  },
}
3. Test only one thing
openclaw system event --text "Test heartbeat" --mode now
openclaw system heartbeat last
If heartbeat last shows it ran, you’re basically done.
That’s the simplest version of the idea:
- normal chat uses your normal model setup
- heartbeat uses the cheap model
- no log spelunking required
If you want, I can give you the smallest possible final config for your setup, nothing extra, no options dump.
i changed my json earlier.
"heartbeat": {
"every": "30m",
"model": "google/gemini-3.1-flash-lite-preview",
"isolatedSession": true,
"lightContext": true,
"target": "none"
},
yes heartbeat ran.