# Token Consumption
“Tokens” here means LLM text tokens (chunks of text), not login/auth tokens. Roughly: 1 token ≈ 3–4 characters of English (very approximate), so counts are mostly “how much text you sent + how much text the model replied”, including hidden stuff like system prompt + chat history.
Is 21k in / 78 out low or high?
- 21,000 input tokens is high for a single short question, but pretty normal if you’re using an agent/app that sends a lot of context (full conversation history, tool schemas, logs, pasted files, etc.) every time.
- 78 output tokens is tiny (a couple sentences).
A quick intuition: at roughly 0.75 English words per token, 21k tokens is about 15k words' worth of context being sent to the model.
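To make these rules of thumb concrete, here is a rough converter. It assumes the ~3–4 characters per token mentioned above plus a common (approximate) heuristic of ~0.75 English words per token. Neither is a real tokenizer, so treat the output as budgeting math only.

```python
# Rough heuristics for English text (assumptions, not real tokenizer output):
# ~3-4 characters per token and ~0.75 words per token.
CHARS_PER_TOKEN_RANGE = (3, 4)
WORDS_PER_TOKEN = 0.75


def approx_words(tokens: int) -> int:
    """Approximate English word count for a token count."""
    return round(tokens * WORDS_PER_TOKEN)


def approx_chars(tokens: int) -> tuple[int, int]:
    """Approximate (low, high) character range for a token count."""
    low, high = CHARS_PER_TOKEN_RANGE
    return tokens * low, tokens * high


if __name__ == "__main__":
    tokens_in = 21_000
    print(f"{tokens_in:,} tokens ~ {approx_words(tokens_in):,} words")
    lo, hi = approx_chars(tokens_in)
    print(f"{tokens_in:,} tokens ~ {lo:,}-{hi:,} characters")
```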
Does the same token count apply to all models?
- The concept applies broadly (most APIs bill by input/output tokens).
- But the exact token count for the same text can differ between model families because tokenizers differ.
- Costs also vary a lot per model/provider, and some providers have pricing extras (e.g., separate prompt-caching rates, web-search tool fees).
What would that cost on Claude Opus 4.6?
Anthropic’s pricing (for prompts ≤ 200K tokens) is $5 / MTok input and $25 / MTok output (MTok = one million tokens).
Source: https://claude.com/pricing
So for 21,000 in and 78 out:
- Input: 21,000 / 1,000,000 × $5 = $0.105
- Output: 78 / 1,000,000 × $25 = $0.00195
Total ≈ $0.107 (~11 cents) for that one call (assuming you’re in the ≤200K bracket).
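The same arithmetic as a small Python helper, handy for plugging in other token counts. Prices are passed in rather than hard-coded, since they differ by model and may change; the $5/$25 figures are the ones quoted above.

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_per_mtok: float, output_per_mtok: float) -> float:
    """Dollar cost of one API call, given prices per million tokens (MTok)."""
    return (input_tokens * input_per_mtok
            + output_tokens * output_per_mtok) / 1_000_000


# Opus 4.6 prices quoted above (prompts <= 200K tokens): $5 in, $25 out per MTok.
cost = call_cost(21_000, 78, input_per_mtok=5.0, output_per_mtok=25.0)
print(f"input + output = ${cost:.5f} (about ${cost:.3f})")
```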
If you tell me what app/workflow you used when you saw “21k in”, I can usually point out what’s inflating the prompt (chat history, tool schemas, files, etc.) and how to bring it down (or use prompt caching where supported).
I just chat with my agent and prompted that. Anyway, would trimming the orchestrator (main agent) MD files help bring that token consumption down? Or should I just use a lesser model (Sonnet or Haiku) for the orchestrator to lower the cost for general use, and only use Opus if I set up a specialized sub-agent (for coding, for example)? Which approach is better?
Yes — trimming the orchestrator’s injected markdown/context can directly reduce token burn, because that stuff typically gets sent on every single model call as part of the “prompt” (system instructions + project context + tool schemas + conversation history).
That said, there are usually bigger levers than shaving a few paragraphs.
What’s likely making it “21k in” every turn
Common contributors:
- Long conversation history (every new message includes the prior chat)
- Big system/orchestrator docs (AGENTS/SOUL/house rules, etc.)
- Tool schemas (each enabled tool adds JSON schema tokens)
So yes: trimming orchestrator MD files helps, but it’s only one slice.
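Chat history is the contributor that grows by itself: each turn resends everything before it. A minimal sketch, using made-up sizes (8,000 tokens of static docs and tool schemas, 100-token user messages, 500 tokens per earlier exchange) and assuming the agent resends the full history with no summarization or truncation:

```python
def input_tokens_per_turn(static_tokens: int, user_msg_tokens: int,
                          exchange_tokens: int, turns: int) -> list[int]:
    """Input tokens sent on each turn when the full chat history is resent.

    Turn t sends the static context, the (t - 1) earlier exchanges,
    and the new user message.
    """
    return [static_tokens + (t - 1) * exchange_tokens + user_msg_tokens
            for t in range(1, turns + 1)]


# Hypothetical sizes, for illustration only.
per_turn = input_tokens_per_turn(static_tokens=8_000, user_msg_tokens=100,
                                 exchange_tokens=500, turns=30)
print("turn 1 input:", per_turn[0])
print("turn 30 input:", per_turn[-1])
print("total input over 30 turns:", sum(per_turn))
```

With these hypothetical numbers, input climbs from 8,100 tokens on turn 1 to 22,600 on turn 30, so a long session alone can reach the ~21k range even with modest static docs.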
Strategy: smaller orchestrator model + “big brain” specialist
Your proposed approach is generally the best cost/perf pattern:
- Default/orchestrator: use a cheaper-but-capable model (often “Sonnet” tier).
- Specialized sub-agent (coding / heavy reasoning): use Opus when needed.
- Ultra-light tasks (heartbeats, simple extraction, routing): Haiku can be great if it stays reliable.
Why: the orchestrator tends to make lots of calls. Making every call Opus is the fastest way to rack up spend.
Cost intuition for your 21k/78 example (Anthropic pricing, prompts ≤200K)
From https://claude.com/pricing:
- Opus 4.6: $5/MTok in, $25/MTok out → ~$0.107 per call (your earlier number)
- Sonnet 4.6: $3/MTok in, $15/MTok out → 21k/78 ≈ $0.064
- Haiku 4.5: $1/MTok in, $5/MTok out → 21k/78 ≈ $0.021
(Outputs are tiny here; input dominates.)
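The comparison above can be reproduced in a few lines. The price table is copied from the figures quoted above (prompts ≤ 200K tokens) and should be re-checked against the pricing page before relying on it:

```python
# $/MTok (input, output) for prompts <= 200K tokens, as quoted above
# from https://claude.com/pricing; re-check before relying on them.
PRICES = {
    "Opus 4.6": (5.0, 25.0),
    "Sonnet 4.6": (3.0, 15.0),
    "Haiku 4.5": (1.0, 5.0),
}


def call_cost(input_tokens: int, output_tokens: int,
              input_per_mtok: float, output_per_mtok: float) -> float:
    """Dollar cost of one call, given prices per million tokens."""
    return (input_tokens * input_per_mtok
            + output_tokens * output_per_mtok) / 1_000_000


costs = {model: call_cost(21_000, 78, p_in, p_out)
         for model, (p_in, p_out) in PRICES.items()}

for model, (p_in, p_out) in PRICES.items():
    # Fraction of the bill that comes from input tokens.
    input_share = 21_000 * p_in / (21_000 * p_in + 78 * p_out)
    print(f"{model:<11} ${costs[model]:.3f}/call (input = {input_share:.0%} of cost)")
```

Because output is priced at 5× input for all three tiers and this call has only 78 output tokens, input accounts for about 98% of the cost on every model, so shrinking the prompt matters as much as picking a cheaper tier.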
Practical guidance (what I’d do)
- Keep orchestrator docs short but strong: remove repetition, long examples, and “personality essays”. Keep only rules that change behavior.
- Limit tools enabled by default (each tool schema adds tokens). Enable extra tools only in the agents that need them.
- Split work by agent:
  - main agent = Sonnet (general use)
  - coding agent = Opus (when you’re actually coding / debugging something gnarly)
  - lightweight agent = Haiku (summaries, tagging, quick lookups)
- If you do long multi-turn tasks with lots of repeated context, look into prompt caching: where your stack supports it, cached input is billed at a fraction of the normal input rate, which can cut the cost of that repeated context substantially.
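One way to express the split above is a small routing table. This is a hypothetical sketch, not any particular agent framework's config format: the agent names, task labels, and tool names are made up, and the tier strings are labels rather than real API model IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    model_tier: str              # "haiku", "sonnet" or "opus" (a label, not an API model ID)
    tools: tuple[str, ...] = ()  # only the tool schemas this agent actually needs


# Hypothetical agents and tool names, for illustration only.
AGENTS = {
    "main": AgentConfig("sonnet", ("web_search", "delegate")),
    "coding": AgentConfig("opus", ("read_file", "write_file", "run_tests")),
    "light": AgentConfig("haiku"),
}

# Hypothetical task labels mapped to the agent that should handle them.
TASK_TO_AGENT = {
    "code": "coding",
    "debug": "coding",
    "summarize": "light",
    "tag": "light",
    "lookup": "light",
}


def route(task: str) -> AgentConfig:
    """Pick the agent for a task; anything unrecognized stays with the main agent."""
    return AGENTS[TASK_TO_AGENT.get(task, "main")]


print(route("debug"))
print(route("plan a trip"))
```

Unrecognized tasks fall back to the main (Sonnet-tier) agent, and each agent carries only its own tool list, which keeps unused tool schemas out of every other agent's prompt.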
If you paste what’s in your orchestrator MD (or at least its rough size/sections) and roughly how many tools you have enabled, I can tell you what to cut first for the biggest token drop.
30 KB total across those “always-injected” MD files is not crazy, but it’s also big enough to matter if it’s getting sent on every turn.
Rough token math
Rule of thumb: 1 token ≈ 4 chars of English-ish text (varies, but good enough for budgeting).
- 30 KB ≈ 30,000 chars (mostly ASCII text) / 4 ≈ 7,500 tokens of “static context” per call
So if you’re seeing ~21k input tokens, then very roughly:
- ~7–9k tokens = those MD files
- ~12–14k tokens = everything else (chat history, tool schemas, any retrieved content, etc.)
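The same estimate in code, assuming 30 KB means about 30,000 bytes of mostly ASCII text (so ~1 byte per character) and trying both 4 and 3.5 characters per token to show how sensitive the result is:

```python
def estimate_tokens(num_chars: int, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from a character count (heuristic, not a tokenizer)."""
    return round(num_chars / chars_per_token)


MD_BYTES = 30_000         # ~30 KB of always-injected MD files, assumed ~1 byte per char
OBSERVED_INPUT = 21_000   # input tokens reported for one call

for cpt in (4.0, 3.5):
    md = estimate_tokens(MD_BYTES, cpt)
    rest = OBSERVED_INPUT - md
    print(f"{cpt} chars/token: MD files ~{md:,} tokens, everything else ~{rest:,}")
```

That gives roughly 7,500 to 8,600 tokens for the MD files and 12,400 to 13,500 for everything else, in line with the ranges above.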
Cost impact (just the MD files)
At Opus 4.6 input pricing ($5 / MTok):
- 7,500 tokens = 0.0075 MTok × $5 = $0.0375 (~3.8 cents) per call just for the static MDs.
That’s why “small trims” still add up if your agent does lots of turns.
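To see how the per-call figure adds up, here is a sketch that multiplies it by a hypothetical 200 calls per day and compares a hypothetical trim from 7,500 to 5,000 static tokens. It assumes no prompt caching, so every call pays the full Opus 4.6 input price of $5/MTok:

```python
def static_context_cost(tokens_per_call: int, calls: int, input_per_mtok: float) -> float:
    """Dollar cost of resending the same static context on every call (no caching)."""
    return tokens_per_call * calls * input_per_mtok / 1_000_000


CALLS_PER_DAY = 200   # hypothetical usage level
OPUS_INPUT = 5.0      # $/MTok, Opus 4.6 input price quoted above

full = static_context_cost(7_500, CALLS_PER_DAY, OPUS_INPUT)
trimmed = static_context_cost(5_000, CALLS_PER_DAY, OPUS_INPUT)  # hypothetical trim by a third
print(f"full MDs:    ${full:.2f}/day")
print(f"trimmed MDs: ${trimmed:.2f}/day (saves ${full - trimmed:.2f}/day)")
```

At those made-up volumes the MD files alone cost $7.50/day on Opus, and trimming a third of them saves $2.50/day; at Sonnet's $3/MTok input price the untrimmed files would cost $4.50/day.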