#Token Consumption

atomic stone
#

I don't really understand the token concept, but when I asked "how many tokens were used for this question", it said 21k in / 78 out. Is that low, high, or too high?

Would that apply to all models out there?

And how much would that cost me if, say, I use the Claude 4.6 Opus API?

#

@olive junco

olive junco BOT
#

“Tokens” here means LLM text tokens (chunks of text), not login/auth tokens. Roughly: 1 token ≈ 3–4 characters of English (very approximate), so counts are mostly “how much text you sent + how much text the model replied”, including hidden stuff like system prompt + chat history.

Is 21k in / 78 out low or high?

  • 21,000 input tokens is high for a single short question, but pretty normal if you’re using an agent/app that sends a lot of context (full conversation history, tool schemas, logs, pasted files, etc.) every time.
  • 78 output tokens is tiny (a couple sentences).

A quick intuition: 21k tokens is on the order of ~15k words worth of context being sent to the model.

Does the same token count apply to all models?

  • The concept applies broadly (most APIs bill by input/output tokens).
  • But the exact token count for the same text can differ between model families because tokenizers differ.
  • Costs also vary a lot per model/provider, and some providers have extras (e.g., prompt caching, web-search tool fees).

What would that cost on Claude Opus 4.6?

Anthropic’s pricing (for prompts ≤ 200K tokens) is: $5 / MTok input and $25 / MTok output.
Source: https://claude.com/pricing

So for 21,000 in and 78 out:

  • Input: 21,000 / 1,000,000 × $5 = $0.105
  • Output: 78 / 1,000,000 × $25 = $0.00195

Total ≈ $0.107 (~11 cents) for that one call (assuming you’re in the ≤200K bracket).
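The arithmetic above can be sketched as a small helper. This is a minimal illustration, not an official billing formula: `call_cost_usd` is a hypothetical name, the prices are the ones quoted above, and extras such as prompt caching or tool fees are ignored.

```python
def call_cost_usd(input_tokens, output_tokens, input_per_mtok, output_per_mtok):
    """Return the cost in USD of one call, given prices per million tokens (MTok)."""
    return (input_tokens * input_per_mtok + output_tokens * output_per_mtok) / 1_000_000


# Prices quoted above for Opus 4.6: $5 / MTok input, $25 / MTok output.
cost = call_cost_usd(21_000, 78, input_per_mtok=5.0, output_per_mtok=25.0)
print(f"${cost:.5f}")
```

Swapping in another model's per-MTok prices gives the same estimate for that model.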

If you tell me what app/workflow you used when you saw “21k in”, I can usually point out what’s inflating the prompt (chat history, tool schemas, files, etc.) and how to bring it down (or use prompt caching where supported).

atomic stone
#

I just chatted with my agent and asked it that. Anyway, would trimming the orchestrator (main agent) MD files bring that token consumption down? Or should I just use a smaller model (Sonnet or Haiku) for the orchestrator to lower the cost for general use, and use Opus only for a specialized sub-agent (for coding, for example)? Which is the better approach?

olive junco BOT
#

Yes — trimming the orchestrator’s injected markdown/context can directly reduce token burn, because that stuff typically gets sent on every single model call as part of the “prompt” (system instructions + project context + tool schemas + conversation history).

That said, there are usually bigger levers than shaving a few paragraphs.

What’s likely making it “21k in” every turn

Common contributors:

  1. Long conversation history (every new message includes the prior chat)
  2. Big system/orchestrator docs (AGENTS/SOUL/house rules, etc.)
  3. Tool schemas (each enabled tool adds JSON schema tokens)

So yes: trimming orchestrator MD files helps, but it’s only one slice.

Strategy: smaller orchestrator model + “big brain” specialist

Your proposed approach is generally the best cost/perf pattern:

  • Default/orchestrator: use a cheaper-but-capable model (often “Sonnet” tier).
  • Specialized sub-agent (coding / heavy reasoning): use Opus when needed.
  • Ultra-light tasks (heartbeats, simple extraction, routing): Haiku can be great if it stays reliable.

Why: the orchestrator tends to make lots of calls. Making every call Opus is the fastest way to rack up spend.
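The split described above could be sketched as a simple lookup. Everything here is hypothetical: the task labels, the tier names, and `pick_tier` are illustrations only, not settings from any particular agent framework.

```python
# Hypothetical task labels mapped to model tiers, following the split above.
ROUTES = {
    "coding": "opus",           # specialized sub-agent, heavy reasoning
    "heavy_reasoning": "opus",
    "heartbeat": "haiku",       # ultra-light tasks
    "extraction": "haiku",
    "routing": "haiku",
}
DEFAULT_TIER = "sonnet"         # orchestrator / general use


def pick_tier(task_type):
    """Return the model tier for a task, falling back to the default tier."""
    return ROUTES.get(task_type, DEFAULT_TIER)


print(pick_tier("coding"))     # opus
print(pick_tier("chat"))       # sonnet (not in ROUTES, so the default applies)
print(pick_tier("heartbeat"))  # haiku
```

Only the tasks that need the expensive tier pay for it; everything unlisted falls back to the cheaper default.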

Cost intuition for your 21k/78 example (Anthropic pricing, prompts ≤200K)

From https://claude.com/pricing:

  • Opus 4.6: $5/MTok in, $25/MTok out → ~$0.107 per call (your earlier number)
  • Sonnet 4.6: $3/MTok in, $15/MTok out → 21k/78 ≈ $0.064
  • Haiku 4.5: $1/MTok in, $5/MTok out → 21k/78 ≈ $0.021

(Outputs are tiny here; input dominates.)
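For a side-by-side check of those three numbers, here is a rough sketch. The price table just copies the figures listed above; `cost_by_model` and `input_share` are made-up names for illustration.

```python
# (input $/MTok, output $/MTok), copied from the list above.
PRICES_PER_MTOK = {
    "Opus 4.6": (5.0, 25.0),
    "Sonnet 4.6": (3.0, 15.0),
    "Haiku 4.5": (1.0, 5.0),
}


def cost_by_model(input_tokens, output_tokens, prices=PRICES_PER_MTOK):
    """Return {model name: cost in USD} for one call at each model's prices."""
    return {
        name: (input_tokens * inp + output_tokens * out) / 1_000_000
        for name, (inp, out) in prices.items()
    }


def input_share(input_tokens, output_tokens, inp, out):
    """Return the fraction of the call's cost that comes from input tokens."""
    input_cost = input_tokens * inp
    return input_cost / (input_cost + output_tokens * out)


costs = cost_by_model(21_000, 78)
for name, usd in costs.items():
    share = input_share(21_000, 78, *PRICES_PER_MTOK[name])
    print(f"{name}: ${usd:.3f} per call, {share:.0%} of it input")
```

With these prices, the input share is about 98% on every tier, which is why shrinking the prompt matters more here than shortening replies.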

Practical guidance (what I’d do)

  1. Keep orchestrator docs short but strong: remove repetition, long examples, and “personality essays”. Keep only rules that change behavior.
  2. Limit tools enabled by default (each tool schema adds tokens). Enable extra tools only in the agents that need them.
  3. Split work by agent:
    • main agent = Sonnet (general use)
    • coding agent = Opus (when you’re actually coding / debugging something gnarly)
    • lightweight agent = Haiku (summaries, tagging, quick lookups)
  4. If you do long multi-turn tasks with lots of repeated context, look into prompt caching support (it can be a huge multiplier if available in your stack).

If you paste what’s in your orchestrator MD (or at least its rough size/sections) and roughly how many tools you have enabled, I can tell you what to cut first for the biggest token drop.

atomic stone
olive junco BOT
#

30 KB total across those “always-injected” MD files is not crazy, but it’s also big enough to matter if it’s getting sent on every turn.

Rough token math

Rule of thumb: 1 token ≈ 4 chars of English-ish text (varies, but good enough for budgeting).

  • 30,000 chars / 4 ≈ 7,500 tokens of “static context” per call

So if you’re seeing ~21k input tokens, then very roughly:

  • ~7–9k tokens = those MD files
  • ~12–14k tokens = everything else (chat history, tool schemas, any retrieved content, etc.)
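The same budgeting can be sketched in a few lines. This assumes the 4-chars-per-token rule of thumb, which is only an approximation (real tokenizers will give different counts), and `estimate_tokens` is a hypothetical helper name.

```python
def estimate_tokens(num_chars, chars_per_token=4.0):
    """Rough token estimate from a character count (rule of thumb, not a tokenizer)."""
    return round(num_chars / chars_per_token)


observed_input_tokens = 21_000            # the "21k in" figure from the chat
static_tokens = estimate_tokens(30_000)   # ~30 KB of always-injected MD files
other_tokens = observed_input_tokens - static_tokens

print(static_tokens)  # 7500
print(other_tokens)   # 13500 (chat history, tool schemas, retrieved content, ...)
```

For real files, the character count could come from summing the length of each file's text. If the estimate and the dashboard number disagree a lot, the tokenizer difference is the first thing to suspect.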

Cost impact (just the MD files)

At Opus 4.6 input pricing ($5 / MTok):

  • 7,500 tokens ≈ 0.0075 MTok → $0.0375 (~3.8 cents) per call just for the static MDs.

That’s why “small trims” still add up if your agent does lots of turns.
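To see how that adds up, here is a minimal sketch. The 200-call count and the trimmed size of 5,000 tokens are made-up figures for illustration, and prompt caching (which would lower the real number where supported) is ignored.

```python
def static_context_cost_usd(static_tokens, calls, input_per_mtok):
    """Cost in USD of resending the same static context on every call."""
    return static_tokens * calls * input_per_mtok / 1_000_000


OPUS_INPUT_PER_MTOK = 5.0  # input price quoted above

per_call = static_context_cost_usd(7_500, 1, OPUS_INPUT_PER_MTOK)
current = static_context_cost_usd(7_500, 200, OPUS_INPUT_PER_MTOK)   # hypothetical 200 calls
trimmed = static_context_cost_usd(5_000, 200, OPUS_INPUT_PER_MTOK)   # hypothetical trimmed size

print(f"per call: ${per_call:.4f}")             # per call: $0.0375
print(f"200 calls: ${current:.2f}")             # 200 calls: $7.50
print(f"after trim: ${trimmed:.2f}")            # after trim: $5.00
print(f"saved: ${current - trimmed:.2f}")       # saved: $2.50
```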