#Saving model cost

38 messages · Page 1 of 1 (latest)

cloud robin
#

Other than running locally, what is a good way to save money by optimizing model calls, for example by separating heartbeats from other calls? There is a lot of info floating around but I wanted to see if there was any official guidance...

whole zincBOT
#

Yeah, there is official guidance now, and the short version is: route by job type, don’t just hope fallbacks save money.

Best-supported cost levers:

  • Give heartbeat its own cheaper model with agents.defaults.heartbeat.model
  • Make heartbeat cheap with:
    • isolatedSession: true
    • lightContext: true
      The docs explicitly say this can cut heartbeat cost from roughly 100K tokens to about 2-5K per run
  • Use cron for exact scheduled jobs, and give those runs their own --model, instead of stuffing everything into the main session
  • Use a cheaper sub-agent model via agents.defaults.subagents.model for repetitive worker tasks
  • Route by agent role with agents.list[].model
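
To put the heartbeat numbers in perspective, here is a back-of-the-envelope sketch. The per-run figures are the approximate ones quoted above; the 30-minute interval is an assumed example, so swap in your own schedule.

```python
# Rough daily token estimate for heartbeat runs.
# Per-run figures are the approximate numbers quoted above; the 30-minute
# interval is an assumed example, not a recommendation.

RUNS_PER_DAY = 24 * 60 // 30  # one heartbeat every 30 minutes -> 48 runs


def daily_tokens(tokens_per_run: int, runs_per_day: int = RUNS_PER_DAY) -> int:
    """Total tokens consumed per day by heartbeat runs of a given size."""
    return tokens_per_run * runs_per_day


full_context = daily_tokens(100_000)  # heartbeat carrying the full context
light_low = daily_tokens(2_000)       # low end of the light-context range
light_high = daily_tokens(5_000)      # high end of the light-context range

print(f"full context : {full_context:,} tokens/day")
print(f"light context: {light_low:,} to {light_high:,} tokens/day")
```

Under those assumptions that is about 4.8M tokens a day versus roughly 96K to 240K, before the cheaper per-token price of the heartbeat model is even counted.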

A couple of important caveats from the docs:

  • Use smaller/cheaper models only for low-stakes, trusted/chat-only work
  • For tool-enabled or untrusted-input flows, the docs recommend prioritizing a stronger model
  • Fallbacks are mostly for resilience, not primary cost control, because OpenClaw still tries the primary model first

If you’re using prompt caching, the official extras are:

  • cacheRetention
  • agents.defaults.contextPruning.mode: "cache-ttl"
  • keep heartbeat just under cache TTL only for agents where warming the cache is actually worth it
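
As a sketch of the "just under cache TTL" idea: the helper below is hypothetical, the 60-minute TTL is a made-up example (the real value depends on your provider and cacheRetention setting), and it assumes the heartbeat interval accepts other minute values in the same format as "30m".

```python
def heartbeat_interval(cache_ttl_minutes: int, margin_minutes: int = 5) -> str:
    """Return an interval string that lands shortly before the cache expires."""
    if margin_minutes <= 0:
        raise ValueError("margin must be positive so the heartbeat lands before expiry")
    interval = cache_ttl_minutes - margin_minutes
    if interval <= 0:
        raise ValueError("cache TTL is too short for the chosen margin")
    return f"{interval}m"


# Assumed 60-minute TTL with the default 5-minute margin.
print(heartbeat_interval(60))
```

This only pays off when the cached prefix is large enough that a warm cache saves more than the extra heartbeat runs cost.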

Minimal shape:

{
  agents: {
    defaults: {
      model: { primary: "your-strong-main-model" },
      heartbeat: {
        model: "your-cheaper-routine-model",
        every: "30m",
        isolatedSession: true,
        lightContext: true,
      },
      subagents: {
        model: "your-cheaper-worker-model",
      },
    },
  },
}

Also, to measure instead of guessing:

  • /status

hearty smelt
#

biggest wins i've found: use a cheaper model for heartbeats and status checks (haiku or a local model work fine), reserve the expensive model for actual task execution. also check if your agents are sending full context on every heartbeat — trimming that alone can cut costs 40-50%.

cloud robin
#

Thanks @hearty smelt I have some free credits from Google so been using
Primary: google/gemini-3.1-flash-lite-preview
Fallbacks: gemini-3.1-flash, gemini-3.1-pro-preview
I also configured xAI..

Any tips welcome

whole zincBOT
#

That setup is pretty sensible as an availability ladder, but the bigger savings usually come from role separation, not from the fallback chain itself.

My practical tips:

  • Use the cheap Gemini tier for heartbeat, not just as main default
    Set:

    • agents.defaults.heartbeat.model
    • isolatedSession: true
    • lightContext: true
      That is the cleanest “save tokens without changing behavior much” win.
  • Use cron for scheduled summaries/reports, with an explicit cheap or mid-tier model
    Good for digests, inbox sweeps, daily reports, etc.

  • Be careful making the lightest tier your main tool-using brain
    Official guidance is basically: cheaper/smaller is fine for low-stakes, trusted stuff, but for tool use or untrusted inputs, a stronger default is safer.

  • Use a separate sub-agent model for grunt work
    agents.defaults.subagents.model is great for summarization, extraction, repetitive background work.

  • Treat xAI as a separate route, not just “also configured”
    If you want to use it intentionally, I’d make it a separate agent or switch to it on demand. Cross-provider fallbacks are more about resilience than cost optimization.

So if it were me, I’d probably do:

  • main chat / tools: stronger default
  • heartbeat / cron / summaries: cheap Gemini
  • worker subagents: cheap or mid Gemini
  • xAI: manual switch or dedicated agent for the cases where you specifically want it

If you want, I can mock up a concrete JSON5 config for exactly that split.

cloud robin
#

sure

whole zincBOT
#

I’d start with this split; it’s a cheap but not too fragile baseline:

{
  agents: {
    defaults: {
      model: {
        primary: "google/gemini-3.1-flash",
        fallbacks: [
          "google/gemini-3.1-pro-preview",
        ],
      },

      heartbeat: {
        every: "30m",
        model: "google/gemini-3.1-flash-lite-preview",
        isolatedSession: true,
        lightContext: true,
        target: "none",
      },

      subagents: {
        model: "google/gemini-3.1-flash-lite-preview",
      },
    },
  },
}

Why this shape:

  • main interactive/tool work gets the more reliable mid tier
  • heartbeat gets the cheap tier
  • subagents/background grunt work also get the cheap tier
  • pro stays as the “oh shit, need a stronger retry” fallback
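
If it helps to see the split as a lookup, here is a small illustrative sketch. It is not OpenClaw's resolver: the model_for helper is hypothetical, and falling back to the primary model when a role has no override is an assumption made for this example.

```python
# Illustrative only: a lookup that mirrors the config split above.
# This is NOT OpenClaw's resolver; falling back to the primary model when a
# role has no override is an assumption made for this sketch.

CONFIG = {
    "agents": {
        "defaults": {
            "model": {
                "primary": "google/gemini-3.1-flash",
                "fallbacks": ["google/gemini-3.1-pro-preview"],
            },
            "heartbeat": {"model": "google/gemini-3.1-flash-lite-preview"},
            "subagents": {"model": "google/gemini-3.1-flash-lite-preview"},
        }
    }
}


def model_for(role: str, config: dict = CONFIG) -> str:
    """Return the model a given role would use under the assumed rule."""
    defaults = config["agents"]["defaults"]
    override = defaults.get(role, {}).get("model")
    # Roles without their own section (such as "main") use the primary model.
    return override or defaults["model"]["primary"]


for role in ("main", "heartbeat", "subagents"):
    print(f"{role:10s} -> {model_for(role)}")
```

The point of the table is that cost is decided by which role a call belongs to, not by where a model sits in the fallback list.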

A couple of tweaks I’d recommend:

  • Use full refs everywhere
    Prefer google/... explicitly, not unprefixed model names

  • Don’t put xAI in the normal fallback chain unless you want outage protection
    For cost/control, I’d use xAI as a manual /model ... switch or a separate agent, not as a routine fallback

  • If your bot is mostly trusted chat and not tool-heavy, you can go even cheaper
    just keep your current lite model as primary too, and still keep the heartbeat/subagent split above

For scheduled digests/reports, I’d also use isolated cron jobs with an explicit cheap model, for example:

openclaw cron add \
  --name "Daily digest" \
  --cron "0 8 * * *" \
  --session isolated \
  --model "google/gemini-3.1-flash-lite-preview" \
  --message "Summarize anything important from the last day."

After changing it, I’d watch:

  • /status
  • /usage full
  • /usage cost

for a day or two and see where the real spend is.

If you want, I can also sketch a 2-agent version like main + ops/heartbeat, which is even cleaner.

cloud robin
#

how do i test this is working right

whole zincBOT
#

Easiest safe test loop:

  1. Verify the model config resolves

    openclaw models status
    openclaw models status --probe
    

    That confirms your default/fallback/auth setup is actually valid.

  2. Test the normal chat path
    In the thread, turn on usage info:

    /usage full
    

    Then send a tiny prompt. That lets you see the normal session behavior and cost footer.

  3. Test heartbeat explicitly
    Make sure HEARTBEAT.md is not empty, otherwise OpenClaw will skip it.

    Then trigger one manually:

    openclaw system event --text "Test heartbeat config" --mode now
    openclaw system heartbeat last

  4. Watch logs while doing that

    openclaw logs --follow
    

    If heartbeat didn’t run, this is where you’ll usually see why.

  5. Know the common “looks broken but isn’t” cases
    Heartbeat can skip with reasons like:

    • empty-heartbeat-file
    • no-tasks-due
    • quiet-hours
    • alerts-disabled

  6. Test cron separately with a throwaway job

    openclaw cron add \
      --name "model-test" \
      --at "5m" \
      --session isolated \
      --model "google/gemini-3.1-flash-lite-preview" \
      --message "Reply with: cron test ok" \
      --announce
    

    Then force-run it:

    openclaw cron run <jobId>
    openclaw cron runs --id <jobId> --limit 20

My recommendation: test in this order:

  • models status --probe
  • normal chat with /usage full
  • manual heartbeat wake
  • throwaway cron job

That usually tells you very quickly whether:

  • config is valid
  • main chat is using the right route
  • heartbeat is actually firing
  • isolated jobs are using the right model
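
If you would rather script the CLI part of that order, a minimal sketch could look like this. It only uses commands already shown above; the run_plan helper is hypothetical, and treating a non-zero exit code as failure is an assumption about the CLI. The /usage full check happens in chat, and the cron step needs a job id first, so both are left out.

```python
import shutil
import subprocess

# Commands copied from the steps above, in the recommended order.
# The in-chat /usage check and the cron step (which needs a job id) are omitted.
TEST_PLAN = [
    ["openclaw", "models", "status", "--probe"],
    ["openclaw", "system", "event", "--text", "Test heartbeat config", "--mode", "now"],
    ["openclaw", "system", "heartbeat", "last"],
]


def run_plan(plan=TEST_PLAN) -> bool:
    """Run each command in order and stop at the first failure."""
    if shutil.which("openclaw") is None:
        print("openclaw not found on PATH; nothing was run")
        return False
    for cmd in plan:
        print("$", " ".join(cmd))
        result = subprocess.run(cmd)
        # Assumption: the CLI signals failure with a non-zero exit code.
        if result.returncode != 0:
            print(f"step failed with exit code {result.returncode}")
            return False
    return True
```

Call run_plan() from a Python shell once the config is saved; it prints each command before running it so you can see where it stopped.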

If you want, I can give you a tiny HEARTBEAT.md test file and a throwaway cron command you can paste exactly.

cloud robin
#

how do i know if heartbeat runs, will there be a message in the log with which model it used?

whole zincBOT
#

Short answer: not as a guaranteed plain-English log line, no.

What the docs do guarantee is:

  • Did heartbeat run?
    openclaw system heartbeat last
    
  • Why did it skip/fail?
    openclaw logs --follow
    
    That’s where you’ll see things like empty-heartbeat-file, no-tasks-due, quiet-hours, etc.

For the exact model/provider used, the more reliable surface is diagnostics/telemetry, not the generic heartbeat log line:

  • OpenClaw emits model.usage diagnostic events with:
    • provider
    • model
    • usage tokens
  • The logging docs also call out the exported openclaw.model.usage span with:
    • openclaw.channel
    • openclaw.provider
    • openclaw.model

So the practical answer is:

  • run confirmation → openclaw system heartbeat last
  • skip/error reason → openclaw logs --follow
  • exact model used → diagnostics / telemetry (model.usage / openclaw.model.usage)

If you want a dead-simple sanity check, do this:

openclaw system event --text "Test heartbeat config" --mode now
openclaw system heartbeat last
openclaw logs --follow

Then temporarily make the heartbeat model different from your main model. That way, if you inspect the usage event or provider-side usage, it’s obvious which path fired.

If you want, I can help you find the exact easiest place to read those model.usage events on your setup.

cloud robin
#

where are the model usage files?

whole zincBOT
#

Usually there is not one dedicated “model-usage file”.

OpenClaw usage data is spread across a few places:

  • Session transcripts
    ~/.openclaw/agents/<agentId>/sessions/<sessionId>.jsonl
    
  • Session metadata / counters
    ~/.openclaw/agents/<agentId>/sessions/sessions.json
    
  • Diagnostics log if you want model.usage events
    /tmp/openclaw/openclaw-YYYY-MM-DD.log
    
    or whatever you set in logging.file

If you’re testing heartbeat, the most useful raw place is usually the diagnostics log:

ls -t /tmp/openclaw/openclaw-*.log | head -1
rg "model.usage|heartbeat" /tmp/openclaw/openclaw-*.log
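
If rg is not installed, a rough Python equivalent is sketched below. It assumes the default log path mentioned above and makes no assumption about the log line format: it only keeps lines that mention model.usage or heartbeat. The function names are made up for this example.

```python
import glob
import re

# Same idea as the rg command above, with the dot escaped so it only
# matches a literal "model.usage".
PATTERN = re.compile(r"model\.usage|heartbeat")


def matching_lines(lines):
    """Yield lines that mention model.usage or heartbeat."""
    for line in lines:
        if PATTERN.search(line):
            yield line.rstrip("\n")


def scan_logs(pattern="/tmp/openclaw/openclaw-*.log"):
    """Print matches from every diagnostics log found at the given path."""
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line in matching_lines(handle):
                print(f"{path}: {line}")
```

Call scan_logs() with no arguments for the default location, or pass your own glob if you changed logging.file.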

A couple of related paths:

  • cron run logs
    ~/.openclaw/cron/runs/<jobId>.jsonl
    
  • mac app diagnostics
    ~/Library/Logs/OpenClaw/diagnostics.jsonl
    

So the short version is:

  • /usage cost reads from session logs/transcripts
  • model.usage diagnostics show up in the diagnostics log
  • heartbeat itself is easiest to trace via:
    openclaw system heartbeat last
    openclaw logs --follow
    

If you want, I can give you a one-liner to find the exact transcript file for your current session too.

cloud robin
#

this is getting too complicated

whole zincBOT
#

Yeah, totally fair. I’d simplify hard.

You do not need to chase usage files right now.

Just do this:

1. Keep your normal model setup as-is

Don’t touch fallbacks/xAI/subagents for now.

2. Only add a cheap heartbeat override

{
  agents: {
    defaults: {
      heartbeat: {
        model: "google/gemini-3.1-flash-lite-preview",
        isolatedSession: true,
        lightContext: true,
      },
    },
  },
}

3. Test only one thing

openclaw system event --text "Test heartbeat" --mode now
openclaw system heartbeat last

If heartbeat last shows it ran, you’re basically done.

That’s the simplest version of the idea:

  • normal chat uses your normal model setup
  • heartbeat uses the cheap model
  • no log spelunking required

If you want, I can give you the smallest possible final config for your setup, nothing extra, no options dump.

cloud robin
#

i changed my json earlier.

"heartbeat": {
"every": "30m",
"model": "google/gemini-3.1-flash-lite-preview",
"isolatedSession": true,
"lightContext": true,
"target": "none"
},
yes heartbeat ran.