#Token Usage

balmy quartz
#

Is there a way to limit token usage on openclaw? For example: set a hard limit of 1k tokens for simple tasks?

winged timberBOT
#

Not as a single hard total-per-task token budget, no.

Closest knobs today are:

  • params.maxTokens: caps the model’s output
  • contextTokens: caps the effective context/history budget
  • thinkingDefault: "off": helps for simple tasks, since reasoning tokens still count

Example (JSON5):

{
  agents: {
    defaults: {
      contextTokens: 8000,
      thinkingDefault: "off",
      models: {
        "openai/gpt-5.4-mini": {
          params: { maxTokens: 1000 },
        },
      },
    },
  },
}

Important caveat: 1k total tokens for the whole run is usually too tight, because OpenClaw’s system prompt, tools, history, and attachments all count before the answer is generated.
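To make that caveat concrete, here is a rough back-of-the-envelope sketch in Python. The component sizes are made-up placeholders, not measured OpenClaw values, and `remaining_output_budget` is a hypothetical helper, not part of OpenClaw; `/context detail` is the place to get real numbers.

```python
def remaining_output_budget(total_budget: int, overhead: dict[str, int]) -> int:
    """Tokens left for the answer after fixed prompt overhead (may be negative)."""
    return total_budget - sum(overhead.values())


# Hypothetical sizes, for illustration only.
overhead = {
    "system_prompt": 2500,
    "tool_schemas": 1500,
    "history": 3000,
    "attachments": 0,
}

left = remaining_output_budget(1000, overhead)
print(left)  # negative: the budget is spent before the model writes anything
```

With numbers in this range, a 1k total cap is exhausted by the prompt alone, which is why capping output (params.maxTokens) and context (contextTokens) separately tends to be the more practical approach.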

For monitoring:

  • /usage tokens
  • /status

If you want a true hard per-task token cap, that’s probably a feature request right now rather than an existing single knob.

Best path is a GitHub issue: https://github.com/openclaw/openclaw/issues

Current workaround is still:

  • params.maxTokens for response length
  • contextTokens for effective context budget
  • thinkingDefault: "off" for simple/cheap tasks

balmy quartz
#

Okay, what about best practices for optimizing your token usage?

winged timberBOT
#

Yep. My short list for OpenClaw token hygiene:

  1. Measure first

    • /usage tokens
    • /status
    • /context detail if you want to see what’s actually eating prompt space
  2. Use a separate “cheap/simple” agent

    • smaller model
    • thinkingDefault: "off" or low
    • lower params.maxTokens
  3. Keep sessions short

    • use /compact on long threads
    • don’t let one session accumulate tons of unrelated work
  4. Trim prompt bloat

    • keep AGENTS.md / bootstrap files lean
    • lower bootstrapMaxChars / bootstrapTotalMaxChars if needed
    • reduce skills.limits.maxSkillsPromptChars
    • reduce contextLimits.toolResultMaxChars
  5. Be careful with screenshots/images

    • lower agents.defaults.imageMaxDimensionPx for screenshot-heavy workflows
  6. Use prompt caching when your provider supports it

    • cacheRetention: "short" or "long"
    • for long-running threads, heartbeat just under cache TTL can save a lot
  7. Use the big model only when it matters

    • cheap model for routing, formatting, simple Q&A
    • bigger model for coding/debugging/deep reasoning
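The cheap-versus-big split in the last item can be sketched as a tiny routing function. This only illustrates the idea and is not an OpenClaw API: `pick_model`, `SIMPLE_TASKS`, and the `example/big-model` identifier are hypothetical, and in OpenClaw itself the split would more likely be expressed as separate agents in config.

```python
# Model identifiers are illustrative; substitute whatever your providers expose.
CHEAP_MODEL = "openai/gpt-5.4-mini"  # the model named in the config example earlier in the thread
BIG_MODEL = "example/big-model"      # placeholder, not a real model name

# Hypothetical task labels that are assumed safe to send to the cheap model.
SIMPLE_TASKS = {"routing", "formatting", "simple_qa"}


def pick_model(task_kind: str) -> str:
    """Return the cheap model for simple task kinds, the big model otherwise."""
    return CHEAP_MODEL if task_kind in SIMPLE_TASKS else BIG_MODEL


print(pick_model("formatting"))
print(pick_model("debugging"))
```

Defaulting unknown task kinds to the big model is a deliberate choice here: misrouting a hard task to a small model usually costs more in retries than the tokens it saves.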

Big caveat: system prompt + history + tool results all count, so optimizing output tokens alone usually doesn’t move the needle enough.
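A quick worked example of that caveat, as a Python sketch with invented token counts (`savings_from_halving_output` is a hypothetical helper). It counts tokens only and ignores the fact that providers often price output tokens higher than input tokens.

```python
def savings_from_halving_output(input_tokens: int, output_tokens: int) -> float:
    """Fraction of total tokens saved if the output is cut in half."""
    total = input_tokens + output_tokens
    return (output_tokens / 2) / total


# Hypothetical run: 9000 prompt tokens (system prompt, history, tool results), 1000 output tokens.
share = savings_from_halving_output(input_tokens=9000, output_tokens=1000)
print(f"{share:.1%}")  # 5.0%
```

Halving the answer length saves only a few percent of the run in this scenario, whereas trimming history or tool results attacks the part that dominates.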