#Token costs


exotic sparrow
#

Hey, I’m debugging a very expensive OpenClaw run and want to sanity-check whether this is expected or if I’m doing something wrong.

Setup:
OpenClaw on Telegram
model: deepinfra/google/gemma-4-31B-it
session key: agent:main:telegram:default:direct:1641047688

Problem:
A single visible user message + one final assistant reply is costing about 65k–76k input tokens.

Concrete example from a live session:
user message content was only 691 chars
first assistant call cost 15,391 input tokens
total for that one visible exchange ended up at 64,751 input tokens in the live session
DeepInfra dashboard delta for that message was ~76,534 input tokens
Part 2 coming

exotic sparrow
#

@earnest mauve part 2:

What I measured from systemPromptReport for the run:
systemPrompt.chars = 13,035
skills.promptChars = 3,301
tools.schemaChars = 11,349
tool count = 27
injectedWorkspaceFiles = []

So before any useful work, the known first-pass char budget was:
system prompt: 13,035
tool schemas: 11,349
skills prompt: 3,301
wrapped user message: 691

Also, this one message triggered 4 model passes:
toolUse -> read_active_plan
toolUse -> read ~/.openclaw-companion/workspace/skills/diet-planner/SKILL.md
toolUse -> read_health_hub(section=diet_snapshot) (also emitted a stray "uma")
final text answer

wait for part 2
Questions:
Is a ~15k-token first call for a 691-char user message expected with 27 tools and 6 skills, even with no injected workspace files?
Is there a recommended way to reduce tool-schema overhead per turn? Dynamic tool filtering? Separate lighter agent profile?
Is loading the full SKILL.md into history via read tool the intended pattern, even though it then bloats subsequent calls?
Are 3 sequential tool-use passes for a simple diet request expected behavior, or is there a recommended pattern to reduce this churn?
Is there a way to get token-level breakdowns for system prompt vs skills vs tool schemas vs history, beyond the char counts in systemPromptReport?

I already checked that this spike is not coming from my own user wrapper or extra injected canonical blocks on that turn.

earnest mauveBOT
#

Short version: not automatically a bug.

The big clue is this line:

  • first assistant call: 15,391 input tokens

That means the cost is mostly coming from OpenClaw context, not your 691-char visible message. Per docs, each run can include:

  • system prompt
  • tool schemas
  • injected workspace files (AGENTS.md, SOUL.md, TOOLS.md, USER.md, memory.md, etc.)
  • session history
  • tool results
  • provider-side hidden wrappers/headers

Docs:
https://docs.openclaw.ai/reference/token-use

#

https://docs.openclaw.ai/concepts/context
https://docs.openclaw.ai/concepts/system-prompt

Also, 64,751 / 15,391 ≈ 4.2. That looks a lot like roughly 4 internal model submissions for one visible exchange. OpenClaw can do that if there were tool turns, retries, compaction, or similar internal passes.

So my read is:

  • not normal for a “tiny plain chat prompt”
  • but very plausible for OpenClaw if the prompt is fat and the provider path is rebilling the full prefix each pass
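
A quick arithmetic check of that reading, using only the figures quoted in this thread. The pass count of 4 is an assumption taken from the ratio, and the "floor" assumes every pass re-sends at least the first-call prefix with no caching discount:

```python
# Figures quoted in the thread for one visible exchange.
first_call_tokens = 15_391        # input tokens on the first model pass
session_total_tokens = 64_751     # input tokens for the whole exchange (live session)
dashboard_delta_tokens = 76_534   # approximate DeepInfra dashboard delta

# How many "first calls" fit into the session total.
ratio = session_total_tokens / first_call_tokens

# Assumption: 4 passes, each billed at least the first-call prefix.
assumed_passes = 4
floor_tokens = assumed_passes * first_call_tokens

# Whatever exceeds the floor is context that accumulated across passes.
growth_tokens = session_total_tokens - floor_tokens

# Gap between the session's own accounting and the provider dashboard.
untracked_tokens = dashboard_delta_tokens - session_total_tokens

print(f"ratio total/first: {ratio:.2f}")
print(f"floor for {assumed_passes} passes: {floor_tokens}")
print(f"growth across passes: {growth_tokens}")
print(f"dashboard minus session: {untracked_tokens}")
```

If the pass count is really 4, most of the total is the same prefix billed four times, and only a small remainder comes from context that accumulated between passes.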

First things I’d want next:

  1. openclaw --version
  2. In that Telegram session: /context detail
  3. /status
  4. /usage full
#

If /context detail already shows something like 10k to 15k prompt tokens before much history, we’ve basically found the cause.

Most common culprits:

  • large injected workspace files, especially memory.md, AGENTS.md, TOOLS.md
  • long session history
  • big tool schema inventory
  • no useful prompt caching on that provider path, so the same prefix gets billed again on each internal pass

Send part 2 plus those outputs and I can sanity-check whether this is expected overhead or a bad config.

#

Yep, this is enough to reason about now.

One important correction first: I think you’re double-counting skills.promptChars.

In OpenClaw’s context report path, systemPrompt.chars already includes the injected skills block, injected files, and tool-list text. The report only adds tool schema chars separately for the tracked prompt estimate. So your first-pass tracked prompt is closer to:

  • systemPrompt.chars = 13,035
  • tools.schemaChars = 11,349
  • user wrapper/message ≈ 691

That is roughly 25k chars before history/tool results, not 28k+.

Direct answers

1) Is ~15k input on the first call expected here?

Yes, plausible. Not obviously wrong.

#

Why:

  • 25k-ish chars of prompt before “real work” is already substantial
  • tokenization on Gemma/OpenAI-compatible routes can be denser than the rough 4 chars/token rule
  • you still have transcript/history/wrappers/provider overhead beyond the raw char counters
  • /context detail itself calls out untracked provider/runtime overhead

So for a 27-tool, 6-skill agent, 15k first-pass input is believable even with no injected workspace files.
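
To make the char budget and the tokenization point concrete, here is a small sketch using the numbers from this thread. It assumes, as described above, that systemPrompt.chars already contains the skills block. The implied chars-per-token figure treats all 15,391 tokens as tracked prompt, which overstates density because some of those tokens are untracked overhead:

```python
# Char counts from systemPromptReport, as quoted in the thread.
system_prompt_chars = 13_035   # assumed to already include the skills block
tool_schema_chars = 11_349
user_message_chars = 691
skills_prompt_chars = 3_301    # counted inside system_prompt_chars

first_call_tokens = 15_391     # observed input tokens on the first pass

# Tracked first-pass prompt without double-counting skills.
tracked_chars = system_prompt_chars + tool_schema_chars + user_message_chars

# What the total looks like if skills are added a second time.
double_counted_chars = tracked_chars + skills_prompt_chars

# Rough 4 chars/token rule versus what the observed count implies.
rule_of_thumb_tokens = tracked_chars / 4
implied_chars_per_token = tracked_chars / first_call_tokens

print(f"tracked chars: {tracked_chars}")
print(f"with skills double-counted: {double_counted_chars}")
print(f"tokens at 4 chars/token: {rule_of_thumb_tokens:.0f}")
print(f"implied chars/token: {implied_chars_per_token:.2f}")
```

The gap between the rule-of-thumb estimate and the observed count is the part that denser tokenization plus untracked overhead would have to explain.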

2) Best way to reduce tool-schema overhead?

Yes: the recommended pattern today is a lighter agent/profile, not magical per-turn dynamic filtering.

Useful knobs:

  • tools.profile
  • tools.allow / tools.deny
  • tools.byProvider
#
  • agents.list[].tools.profile
  • agents.list[].tools.allow/deny
  • agents.list[].tools.byProvider

Docs:
https://docs.openclaw.ai/tools/index
https://docs.openclaw.ai/tools/multi-agent-sandbox-tools

For your case, I’d strongly consider a diet agent with only the few tools it actually needs.

Also reduce skill overhead via:

  • agents.list[].skills
  • disable-model-invocation: true on skills the model should not see
  • skills.limits.maxSkillsPromptChars
  • agents.list[].skillsLimits.maxSkillsPromptChars
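
A rough sketch of what such a trimmed agent entry might look like, written as a Python dict and dumped to JSON. The nesting is only inferred from the dotted key paths above; the `id` field, the value types, and the agent name `diet` are assumptions, so check the linked docs for the real schema before copying anything. The tool and skill names are the ones that appear in this thread:

```python
import json

# Hypothetical agent entry. The shape is inferred from the key paths
# agents.list[].tools.allow and agents.list[].skills, not from the schema.
diet_agent = {
    "id": "diet",  # hypothetical field and agent name
    "tools": {
        # Only the tools the diet flow used in this thread.
        "allow": ["read", "read_active_plan", "read_health_hub"],
    },
    # Assumed to be a list of skill names visible to this agent.
    "skills": ["diet-planner"],
}

config_fragment = {"agents": {"list": [diet_agent]}}

print(json.dumps(config_fragment, indent=2))
```

The point is the direction: 3 tools instead of 27 and 1 skill instead of 6 for this agent, which shrinks both tools.schemaChars and the skills block on every pass.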
#

3) Is loading full SKILL.md via read intended?

Yes. That is the intended design.

OpenClaw keeps only a compact skills list in the base prompt, then the model reads SKILL.md on demand.

Docs:
https://docs.openclaw.ai/concepts/context
https://docs.openclaw.ai/tools/skills

But for a hot-path skill, big SKILL.md files are expensive. If this skill gets used constantly, I’d treat that as a sign to:

  • keep the skill much shorter
  • move repeatable logic into a dedicated tool
  • or use a dedicated slash/tool dispatch flow for the hot path

4) Are 3 tool-use passes for a simple diet request expected?

For a generic agent, yes. For an optimized diet workflow, probably more churn than ideal.

#

Your 4-pass sequence:

  1. read plan
  2. read skill
  3. read health snapshot
  4. final answer

…is very normal agentic behavior.

But it’s also exactly why the total balloons. Later passes include earlier tool results, so the per-pass input grows.
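
A minimal model of that growth, assuming no prompt caching so each pass is billed its full context. The function name and the tool-result sizes are hypothetical and purely illustrative; only the 15,391 base comes from this thread:

```python
def per_pass_inputs(base_tokens, added_per_tool_pass):
    """Return the input tokens billed on each pass.

    base_tokens: prompt size on the first pass.
    added_per_tool_pass: tokens each tool pass adds to history
        (tool call plus tool result), one entry per tool pass.
    The final text-answer pass sees everything added before it.
    """
    inputs = []
    context = base_tokens
    for added in added_per_tool_pass:
        inputs.append(context)   # this pass is billed the current context
        context += added         # its call and result join the history
    inputs.append(context)       # final answer pass
    return inputs


# Hypothetical sizes for three tool passes; not measured values.
passes = per_pass_inputs(15_391, [500, 1_000, 500])
total = sum(passes)

print(passes)
print(total)
```

Note that a result added on the first tool pass is billed on every later pass, so early large reads (such as a full SKILL.md) cost more than late ones.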

For this workflow, the biggest wins are:

  • dedicated lighter agent
  • collapse reads where possible
  • if diet_snapshot is almost always needed, inject a small structured block pre-turn instead of spending a whole model pass to fetch it
  • if the workflow is stable, replace “read skill then decide” with a narrower custom tool or command
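
As an illustration of the pre-turn injection idea, here is a sketch of building a small, size-capped block from a snapshot. Everything in it is hypothetical: the function name, the wrapper tag, the field names, and the size cap. How the block actually gets attached to the turn depends on your own wrapper and is not shown:

```python
import json


def build_snapshot_block(snapshot, keys, max_chars=600):
    """Render a compact block containing only the selected snapshot fields.

    Raises ValueError instead of truncating, so the block never carries
    broken JSON into the prompt.
    """
    compact = {k: snapshot[k] for k in keys if k in snapshot}
    body = json.dumps(compact, separators=(",", ":"))
    if len(body) > max_chars:
        raise ValueError(f"snapshot block is {len(body)} chars, cap is {max_chars}")
    return f"<diet_snapshot>{body}</diet_snapshot>"


# Hypothetical snapshot; field names are made up for the example.
snapshot = {
    "calories_target": 2200,
    "protein_g": 140,
    "notes": "long free-text history that should not be injected",
}

block = build_snapshot_block(snapshot, keys=["calories_target", "protein_g"])
print(block)
```

Whitelisting fields and failing on oversize keeps the injected block predictable, which is the whole reason to prefer it over a model pass that reads the full snapshot.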
#

5) Can you get token-level breakdowns beyond char counts?

Not exact per-section tokenizer counts, as far as current OpenClaw surfaces go.

Best available today:

  • /context detail
  • /context json

Those give you:

  • per-skill prompt-entry sizes
  • per-tool schema sizes
  • tracked prompt estimate
  • actual cached context usage
  • untracked provider/runtime overhead

But I don’t think OpenClaw currently exposes exact tokenizer-accurate split accounting for:

#
  • system prompt vs skills vs tool schemas vs history
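
Until exact accounting exists, one crude workaround is to split a measured token count across sections in proportion to their char counts. This is only a sketch: the function name is made up, and it assumes uniform token density across sections and zero untracked overhead, neither of which holds in practice (JSON tool schemas often tokenize differently from prose):

```python
def split_tokens_by_chars(total_tokens, section_chars):
    """Allocate total_tokens to sections proportionally to their char counts."""
    total_chars = sum(section_chars.values())
    return {
        name: round(total_tokens * chars / total_chars)
        for name, chars in section_chars.items()
    }


# Char counts from this thread; systemPrompt.chars is assumed to
# already include the skills block.
sections = {
    "system prompt (incl. skills)": 13_035,
    "tool schemas": 11_349,
    "user message": 691,
}

estimate = split_tokens_by_chars(15_391, sections)

for name, tokens in estimate.items():
    print(f"{name}: ~{tokens} tokens")
```

Treat the output as a ranking of what to trim first, not as a bill.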

Docs:
https://docs.openclaw.ai/concepts/context
https://docs.openclaw.ai/reference/token-use

My practical take

What you’re seeing looks more like expected OpenClaw overhead + multi-pass churn than a billing bug.

If I were optimizing this exact setup, I’d do these in order:

  1. make a diet-specific agent with a tiny tool set
  2. keep only the relevant skill visible to that agent
  3. shorten diet-planner/SKILL.md hard
  4. reduce persisted tool-result size with per-agent contextLimits.toolResultMaxChars if safe
  5. if diet_snapshot is always needed, stop spending a separate model turn to discover that
#

Also, please paste openclaw --version. Token/context reporting has changed a bit across versions, and that matters here.

exotic sparrow
#

Considering Honcho memory for Bryan on OpenClaw. Main questions:

  1. In multi-agent OpenClaw setups, is the recommended pattern shared user memory + isolated per-agent memory peer?
  2. For a public-facing assistant, is Honcho materially better than builtin/QMD memory, or mostly overkill unless you need strong user modeling?
  3. If self-hosting Honcho, what tends to dominate real-world cost/ops: deriver workload, embeddings, Postgres/pgvector growth, or Redis?
  4. If using managed Honcho with active Telegram/WhatsApp traffic, what have people actually seen in monthly cost?

Would especially love feedback from anyone running Honcho on OpenClaw in production.

would it help save tokens costs?

exotic sparrow
#

@earnest mauve

exotic sparrow
#

@earnest mauve

fallen roost
#

this is super helpful breakdown

i ran into something similar and ended up building something to trace runs step by step + see token cost per pass

made it way easier to spot where things were blowing up (usually after tool calls / accumulated context)

sharing here in case it helps: https://clawtrail.dev

curious if this would’ve helped in your setup