# Token Usage
Not as a single hard total-per-task token budget, no.
Closest knobs today are:
- `params.maxTokens`: caps the model’s output
- `contextTokens`: caps the effective context/history budget
- `thinkingDefault: "off"`: helps for simple tasks, since reasoning tokens still count
Example (JSON5):
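```json5
{
  agents: {
    defaults: {
      contextTokens: 8000,     // caps the effective context/history budget
      thinkingDefault: "off",  // no reasoning tokens by default on simple tasks
      models: {
        "openai/gpt-5.4-mini": {
          params: { maxTokens: 1000 }, // caps this model's output length
        },
      },
    },
  },
}
```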
Important caveat: 1k total tokens for the whole run is usually too tight, because OpenClaw’s system prompt, tools, history, and attachments all count before the answer is generated.
For monitoring, use `/usage tokens` and `/status`.
Docs:
- https://docs.openclaw.ai/reference/token-use
- https://docs.openclaw.ai/gateway/configuration-reference
If you want, I can help sketch a “cheap/simple tasks” agent profile for your exact model/provider.
If you want a true hard per-task token cap, that’s probably a feature request right now rather than an existing single knob.
Best path is a GitHub issue: https://github.com/openclaw/openclaw/issues
Current workaround is still:
- `params.maxTokens` for response length
- `contextTokens` for effective context budget
- `thinkingDefault: "off"` for simple/cheap tasks
Okay, what about best practices for optimizing your token usage?
Yep. My short list for OpenClaw token hygiene:
- Measure first
  - `/usage tokens`, `/status`
  - `/context detail` if you want to see what’s actually eating prompt space
- Use a separate “cheap/simple” agent
  - smaller model
  - `thinkingDefault: "off"` or low
  - lower `params.maxTokens`
- Keep sessions short
  - use `/compact` on long threads
  - don’t let one session accumulate tons of unrelated work
- Trim prompt bloat
  - keep `AGENTS.md` / bootstrap files lean
  - lower `bootstrapMaxChars` / `bootstrapTotalMaxChars` if needed
  - reduce `skills.limits.maxSkillsPromptChars`
  - reduce `contextLimits.toolResultMaxChars`
- Be careful with screenshots/images
  - lower `agents.defaults.imageMaxDimensionPx` for screenshot-heavy workflows
- Use prompt caching when your provider supports it
  - `cacheRetention: "short"` or `"long"`
  - for long-running threads, a heartbeat interval just under the cache TTL keeps the cache warm and can save a lot
- Use the big model only when it matters
  - cheap model for routing, formatting, simple Q&A
  - bigger model for coding/debugging/deep reasoning
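A rough sketch of the “cheap/simple” profile plus the prompt-trimming and image knobs from that list. It assumes `bootstrapMaxChars`, `bootstrapTotalMaxChars`, and `imageMaxDimensionPx` all sit under `agents.defaults` (only the image path is spelled out above) and that `skills.limits` is top-level; verify both against the configuration reference. Every number is illustrative, not a recommended default. `contextLimits.toolResultMaxChars` is left out because I’m not sure where it nests.

```json5
{
  agents: {
    defaults: {
      thinkingDefault: "off",        // no reasoning tokens by default
      contextTokens: 16000,          // illustrative: smaller effective context budget
      bootstrapMaxChars: 8000,       // assumed placement: per-file bootstrap cap
      bootstrapTotalMaxChars: 24000, // assumed placement: total bootstrap cap
      imageMaxDimensionPx: 1024,     // illustrative: downscale screenshots
      models: {
        "openai/gpt-5.4-mini": {
          params: { maxTokens: 1000 }, // cap output length for the small model
        },
      },
    },
  },
  skills: {
    // assumed top-level: limits how much skill text lands in the prompt
    limits: { maxSkillsPromptChars: 4000 },
  },
}
```

Putting these on `agents.defaults` makes every agent cheap. If you’d rather leave the defaults alone and use a dedicated cheap agent, check the configuration reference for which of these fields a per-agent entry accepts.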
Big caveat: system prompt + history + tool results all count, so optimizing output tokens alone usually doesn’t move the needle enough.
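For the caching tip, a minimal sketch, assuming `cacheRetention` is set as a per-model `params` entry and the heartbeat interval lives at `agents.defaults.heartbeat.every`; both placements are assumptions to check against the token-use docs. `your-provider/your-model` is a hypothetical placeholder, and `55m` assumes `"long"` retention gives roughly a one-hour cache TTL on your provider.

```json5
{
  agents: {
    defaults: {
      models: {
        // hypothetical model id: replace with a provider that supports prompt caching
        "your-provider/your-model": {
          params: { cacheRetention: "long" }, // assumed placement
        },
      },
      // assumed placement: fire just under the (assumed ~1h) TTL so the cached prefix stays warm
      heartbeat: { every: "55m" },
    },
  },
}
```

The heartbeat only pays off if it fires before the cache expires; with a shorter TTL, shrink the interval to match, keeping in mind that each heartbeat is itself a request that costs tokens.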