# Is there any future plan to address the crazy token burn?
A lot of the “token burn” people see with OpenClaw is expected, because it’s not a single chat completion per message — it’s an agent loop that may do multiple LLM calls (plan → tool call(s) → follow-ups), and it can carry a big context (system prompt + skills + session history + tool results).
That said: if you’re seeing huge usage for simple prompts, that’s often tunable (or occasionally a loop/bug).
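Here is a rough back-of-the-envelope model of why one message can cost far more than one completion. It is a hypothetical sketch, not OpenClaw's actual accounting. It assumes that every LLM call in the loop re-sends the full base context (system prompt + skills + session history) plus all tool results gathered so far in that request. The token counts are made-up illustrative numbers, and `agent_loop_input_tokens` is a hypothetical helper.

```python
def agent_loop_input_tokens(base_context: int, tool_result_sizes: list[int]) -> int:
    """Rough total input tokens for ONE user message handled by an agent loop.

    Simplified, hypothetical model (not OpenClaw's real accounting):
    - the first LLM call sends the base context (system prompt + skills + history);
    - each tool result is appended, and the next LLM call re-sends
      the base context plus every tool result gathered so far.
    """
    total = 0
    carried = 0  # tool output carried forward inside this request
    # One call before any tool runs, then one follow-up call per tool result.
    for result_size in [0] + tool_result_sizes:
        carried += result_size
        total += base_context + carried
    return total


# Illustrative numbers only: 20k tokens of base context, three 5k-token tool results.
print(agent_loop_input_tokens(20_000, []))            # plain chat: 20000
print(agent_loop_input_tokens(20_000, [5_000] * 3))   # 4 calls: 110000
```

Under this toy model, the three-tool request uses 5.5× the input tokens of a plain reply. The base context is also paid again on every call, which is why model choice, turn count, and context size are the main things to tune.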
## What’s “working as intended” vs “fixable”
Working as intended:
- Tool-heavy tasks (browser/file/workflows) = multiple model turns = more tokens.
- Long-running sessions = more context sent each request (until compaction/pruning kicks in).
Usually fixable:
- Expensive default model for everyday chat
- Sub-agents using an expensive model
- Background automation (heartbeat/cron) quietly spending tokens
- Very long sessions with lots of tool output (needs `/compact` / pruning)
- Verbose/reasoning left on (can increase output a lot)
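The long-session point, and why `/compact` and pruning help, can be illustrated with a toy history. This is a hypothetical sketch, not how OpenClaw implements session pruning. The `Turn` type, the `prune_tool_results` helper, the “keep the last N turns intact, stub out older tool output” policy, and the use of character counts as a stand-in for tokens are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str  # "user", "assistant", or "tool" (hypothetical roles)
    text: str


def context_size(history: list[Turn]) -> int:
    """Rough size of what gets re-sent each request (characters, not real tokens)."""
    return sum(len(t.text) for t in history)


def prune_tool_results(history: list[Turn], keep_recent: int = 2,
                       placeholder: str = "[tool output trimmed]") -> list[Turn]:
    """Hypothetical pruning policy: return a new list where tool output older
    than the last `keep_recent` turns is replaced by a short placeholder.
    User/assistant turns are kept as-is; the input list is not modified."""
    cutoff = len(history) - keep_recent
    pruned = []
    for i, turn in enumerate(history):
        if turn.role == "tool" and i < cutoff:
            pruned.append(Turn(turn.role, placeholder))
        else:
            pruned.append(turn)
    return pruned


history = [
    Turn("user", "read the logs"),
    Turn("tool", "x" * 50_000),  # one big file read
    Turn("assistant", "The logs show a timeout."),
    Turn("user", "ok, now check the config"),
    Turn("tool", "y" * 2_000),
]
print(context_size(history), context_size(prune_tool_results(history)))
```

Without pruning, the old 50,000-character read rides along on every later request. With it, only the recent tool output is sent in full.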
## Biggest cost levers (practical)
- Use a cheaper default model for day-to-day, and only switch up when you need it (via `/model`), or split into two agents (fast/cheap vs “heavy”).
  Docs: https://docs.openclaw.ai/help/faq#how-can-i-use-different-models-for-different-tasks
- Set a cheaper model for sub-agents (sub-agents can multiply spend fast). The FAQ explicitly calls out `agents.defaults.subagents.model`.
  Docs: https://docs.openclaw.ai/help/faq#the-bot-freezes-while-doing-heavy-work-how-do-i-offload-that
- Keep sessions lean: use `/compact` occasionally, and `/new` when switching topics.
  Docs: https://docs.openclaw.ai/concepts/compaction
- Trim tool-output bloat (especially if you do lots of exec/web/file reads): enable/tune session pruning.
  Docs: https://docs.openclaw.ai/concepts/session-pruning
- Check background spend: if heartbeat/cron is enabled, that’s extra LLM calls even when you’re not chatting.
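Background jobs are easy to underestimate because their cost scales with the schedule, not with how much you chat. Here is a back-of-the-envelope estimate. The 30-minute interval and 15k tokens per run are hypothetical placeholders, and `daily_background_tokens` is a hypothetical helper that reads nothing from OpenClaw. Check your own heartbeat/cron settings and your provider's usage dashboard for real numbers.

```python
def daily_background_tokens(interval_minutes: float, tokens_per_run: int,
                            llm_calls_per_run: int = 1) -> int:
    """Tokens spent per day by a job that fires every `interval_minutes`.

    All inputs are user-supplied estimates (hypothetical helper).
    """
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    runs_per_day = int(24 * 60 // interval_minutes)
    return runs_per_day * llm_calls_per_run * tokens_per_run


# Hypothetical: a job every 30 minutes that re-sends ~15k tokens of context.
print(daily_background_tokens(30, 15_000))  # 48 runs/day -> 720000
```

Even with modest per-run context, a frequent schedule adds up to hundreds of thousands of tokens a day while you're not chatting at all.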
## “Are there plans to reduce it?”
Yes, in the sense that the project keeps adding things like compaction/pruning/caching-aware behavior to prevent unnecessary re-sending of giant contexts. But OpenClaw can’t magically make an expensive model cheap: the main control is still which model you run by default and how many turns your agent takes per request.
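Here is a tiny cost calculator that puts numbers on “which model you run by default” times “how many turns per request”. The model labels and per-million-token prices are placeholders, not any provider's real pricing. For simplicity it assumes a fixed input size per call, although in practice input grows as tool results accumulate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    name: str              # placeholder label, not a real model ID
    input_per_mtok: float  # USD per 1M input tokens (hypothetical)
    output_per_mtok: float # USD per 1M output tokens (hypothetical)


def request_cost(price: ModelPrice, calls: int, input_tokens_per_call: int,
                 output_tokens_per_call: int) -> float:
    """Cost of one user request that triggers `calls` LLM calls in the agent loop."""
    per_call = (input_tokens_per_call * price.input_per_mtok
                + output_tokens_per_call * price.output_per_mtok) / 1_000_000
    return calls * per_call


cheap = ModelPrice("cheap-default", input_per_mtok=0.5, output_per_mtok=2.0)
heavy = ModelPrice("heavy", input_per_mtok=5.0, output_per_mtok=20.0)

for model in (cheap, heavy):
    for calls in (1, 4):
        cost = request_cost(model, calls, input_tokens_per_call=20_000,
                            output_tokens_per_call=500)
        print(f"{model.name:>13} x{calls} calls: ${cost:.3f}")
```

With these placeholder prices, the heavy model costs 10× as much per call, and going from 1 to 4 calls multiplies either model's cost by 4. The two levers compound.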