and burned through tokens way faster than is sustainable.
version 2026.4.15. Telegram.
Main model is usually a minimax monthly subscription which has been effectively unlimited for the one price.
Tried out Kimi as an alternative. I had one short conversation over 51 minutes while tweaking settings and testing the model. I didn't do anything crazy with it. It spent about 1 dollar. I had my claw analyze the usage csv and I can paste her response after this.
#Tried out Kimi 2.5 pay as you go
Here's the breakdown from the CSV. This covers the Apr 18 session:
17 large requests (the conversation):
• Avg input: 112,000 tokens/call (massive)
• Avg output: 193 tokens/call (very lean responses)
• Time span: 51 minutes
• Cache hit rate: 38% overall, but wildly variable
The cost mystery: At $0.03/1K input + $0.06/1K output (Kimi's published rates), these 17 calls alone should cost ~$1.03. Your dashboard shows $0.84 total across the whole day. Either Kimi's giving you a bulk discount, the cache billing works differently than the docs suggest, or the CSV costs are calculated differently than billing. I'd ask Kimi support about the discrepancy.
The root cause of the burn rate: Your context window is ~112K-120K tokens per call. That's the full system prompt + workspace files + chat history, sent every single turn. Even though 38% cache-hit sounds decent, the uncached fresh tokens still cost full price, and you're sending 97K-114K fresh tokens on most calls.
For comparison: a normal conversation without the full system-prompt-every-turn approach might use 2K-5K input tokens per call. You're running at 20-60x that.
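A quick way to sanity-check figures like these is to recompute them. The sketch below uses only the call count and averages quoted above; the helper name `flat_cost` and the flat per-1K pricing model (no cache discount, no per-request fees) are assumptions for illustration. With the rates exactly as quoted, the input side alone comes out near $57 rather than ~$1.03, so the quoted rates are probably not the ones actually billed. Substitute the real rates from the provider's pricing page before drawing conclusions.

```python
# Figures quoted in the analysis above.
CALLS = 17
AVG_INPUT_TOKENS = 112_000
AVG_OUTPUT_TOKENS = 193


def flat_cost(calls, avg_in, avg_out, in_rate_per_1k, out_rate_per_1k):
    """Return (input_cost, output_cost) in dollars under flat per-1K pricing.

    Assumption: no cache discount and no per-request fees.
    """
    input_cost = calls * avg_in / 1000 * in_rate_per_1k
    output_cost = calls * avg_out / 1000 * out_rate_per_1k
    return input_cost, output_cost


# Rates as quoted in the analysis; treat them as placeholders.
input_cost, output_cost = flat_cost(
    CALLS, AVG_INPUT_TOKENS, AVG_OUTPUT_TOKENS, 0.03, 0.06
)
input_share = AVG_INPUT_TOKENS / (AVG_INPUT_TOKENS + AVG_OUTPUT_TOKENS)

print(f"input cost:  ${input_cost:.2f}")
print(f"output cost: ${output_cost:.2f}")
print(f"input share of tokens: {input_share:.1%}")
```

Whatever the real rates are, the token split stays the same: almost all of the billed tokens are input, so the input rate and the cache discount dominate the bill.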
Let me verify the docs and source so I don't hand-wave this.
That can happen pretty fast with pay-as-you-go Kimi, and it's often not just "one short chat".
Most likely causes here:
- Same session kept growing
  OpenClaw resends the system prompt + chat history + tool results on each turn, so lots of small test messages in one 51-minute session can snowball token use.
  Docs: https://docs.openclaw.ai/reference/token-use
- You may have used a thinking model or enabled thinking
  moonshot/kimi-k2.5 is non-reasoning, but moonshot/kimi-k2-thinking and moonshot/kimi-k2-thinking-turbo do add reasoning overhead.
  Docs: https://docs.openclaw.ai/providers/moonshot
- Extra hidden calls
  Compaction, tools, web/media handling, etc. can add more paid requests in the same session.
  Docs: https://docs.openclaw.ai/concepts/compaction
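To see why a growing session snowballs, here is a minimal sketch of the arithmetic. It assumes the full context is resent every turn and grows by a fixed amount per turn; the function name and all numbers are hypothetical and chosen only for illustration, not measured from OpenClaw.

```python
def total_input_tokens(base_tokens, growth_per_turn, turns):
    """Sum the input tokens billed across one session.

    Assumption: turn i (0-based) sends base_tokens + i * growth_per_turn,
    i.e. the whole context is resent and grows by a fixed amount each turn.
    """
    return sum(base_tokens + i * growth_per_turn for i in range(turns))


# Hypothetical numbers for illustration only.
BASE = 20_000
GROWTH = 5_000

for turns in (1, 5, 17):
    total = total_input_tokens(BASE, GROWTH, turns)
    print(f"{turns:>2} turns: {total:>9,} input tokens, avg {total // turns:,} per call")
```

Because every turn pays again for everything before it, total input grows roughly with the square of the number of turns, which is why starting a fresh session for tests makes such a difference.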
What I'd do next:
- turn on /usage tokens or /usage full
- check /status after a reply
- use /new before model-testing so you aren't paying to resend old context
- keep /think off while cost-testing
- use /compact if you want to stay in the same thread
Slash command docs: https://docs.openclaw.ai/tools/slash-commands
If you paste your claw's CSV analysis plus the exact model ref you used, I can tell you whether this looks expected or suspicious.
Yep, that analysis looks basically right.
The big tell is this:
- 112k avg input
- 193 avg output
So the spend is overwhelmingly from context replay, not from Kimi "talking too much".
One small correction to the pasted analysis: it's usually not just "system prompt + workspace files + chat history". In OpenClaw, the heavy part can also include:
- tool schemas
- skills list
- injected bootstrap files (AGENTS.md, SOUL.md, USER.md, MEMORY.md, etc.)
- prior conversation/tool results
Docs:
https://docs.openclaw.ai/concepts/context
https://docs.openclaw.ai/reference/token-use
So the next thing I'd check is:
- /context list for the rough breakdown
- /context detail for the big offenders
That'll tell you whether the weight is mostly:
- injected files,
- tools,
- or accumulated chat history.
Most practical fixes:
- Use /new for model testing
  Otherwise every tiny test keeps dragging the old context forward.
- Trim injected files
  Especially MEMORY.md, AGENTS.md, TOOLS.md, long persona files, etc.
- Use /compact sooner
  Or just reset the session if it's a throwaway test.
- Keep Kimi on a non-thinking model while cost testing
  moonshot/kimi-k2.5 is cheaper to sanity-check than the thinking variants.
  Docs: https://docs.openclaw.ai/providers/moonshot
- Consider a lean agent for Telegram chat
  Fewer injected files/tools = much lower per-turn tax.
On the $1.03 vs $0.84 mismatch, I would not blame OpenClaw first. That sounds more like provider-side billing math, cached token treatment, or dashboard lag.
If you paste a /context detail, I can help pinpoint exactly what's inflating the 112k.
Any idea what the ~25 or so "undefined" 616 chars injections are?
Let me check the source so I can answer this one precisely.
My bet: those are skills list entries, not 25 mystery files.
Why I think that:
- OpenClaw injects a compact skills list into the prompt on every run
- each skill block includes name + description + location
- ~616 chars each is a very believable per-skill block size
- having ~25 of them lines up with "a bunch of installed/eligible skills"
Docs: https://docs.openclaw.ai/concepts/context
So the undefined part is probably just a reporting bug / stale saved report, not the real prompt content literally saying undefined.
Quick way to verify:
- run /context json
- look at whether those entries are under:
  - skills.entries → probably just the skill list, label bug
  - tools.entries → then it's a tool/plugin missing a name, which is a real bug
If you want a fast sanity check, also try:
- /new
- then /context detail
If the names suddenly look normal in a fresh session, that points even more to a stale report issue.
So short version: I don't think those are 25 hidden bootstrap injections. I think they're almost certainly the per-skill prompt entries, with bad labeling.
If you paste the relevant bit from /context json, I can tell you exactly which section they're coming from.
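If you would rather check a saved report yourself, something like the sketch below could count the unlabeled entries per section. The skills.entries and tools.entries keys come from the message above; everything else is an assumption, including the per-entry "name" field, the helper name `find_unnamed_entries`, and the sample data, so adjust it to whatever /context json actually emits.

```python
import json


def find_unnamed_entries(report):
    """Return {section: count} of entries whose name is missing or 'undefined'.

    Assumption: the report looks like
    {"skills": {"entries": [...]}, "tools": {"entries": [...]}}
    and each entry may carry a "name" field.
    """
    counts = {}
    for section in ("skills", "tools"):
        entries = report.get(section, {}).get("entries", [])
        counts[section] = sum(
            1 for entry in entries
            if entry.get("name") in (None, "", "undefined")
        )
    return counts


# Made-up sample data, not real OpenClaw output.
sample = json.loads("""
{
  "skills": {"entries": [{"name": "weather", "chars": 440}, {"chars": 616}]},
  "tools": {"entries": [{"name": "read"}]}
}
""")

print(find_unnamed_entries(sample))
```

A non-zero count under skills would fit the labeling-bug theory; a non-zero count under tools would point at a tool or plugin that really is missing a name.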
yep i found the duplicate injection culprit but that's separate from this thread. back to what i should run to show you where the token drain is coming from
Best clean repro:
- /new
- /usage tokens
- Send one normal test message like "hello" or whatever your typical opener is
- /status
- /context detail
Then paste:
- the usage footer from that one reply
- the /status output
- the /context detail output, especially:
  - System prompt (...)
  - Injected workspace files
  - Skills list
  - Tool schemas (JSON)
  - Tracked prompt estimate
  - Actual context usage
If you want to inspect the already-expensive session instead of a clean repro, just run:
- /status
- /context detail
That'll show us whether the drain is mostly:
- base prompt tax,
- injected files,
- tools/skills,
- or accumulated chat history.
If it's easier, /context json is also great, but detail is usually enough. Feel free to redact private paths/content before pasting.
can't paste an image so here we go PART 1
> ME:
Hey there
> BOT:
Hey!
Usage: 154 in / 18 out · cache 19k cached / 0 new · est $0.0012
> ME:
/status
> BOT:
OpenClaw 2026.4.15 (041266a)
Model: minimax/MiniMax-M2.7 · api-key (env: MINIMAX_API_KEY)
Fallbacks: groq/llama-3.3-70b-versatile, openrouter/free, local/qwen2.5-1.5b-instruct-q4_k_m.gguf
Tokens: 154 in / 18 out · Cost: $0.0001
Cache: 99% hit · 19k cached, 0 new
Context: 19k/205k (9%) · Compactions: 0
Session: agent:main:telegram:direct:8695526498 • updated just now
Runtime: direct · Think: low
Queue: steer (depth 0)
> ME:
/context list
PART 2
> BOT:
Context breakdown
Workspace: /home/openclaw/.openclaw/workspace
Bootstrap max/file: 20,000 chars
Bootstrap max/total: 60,000 chars
Sandbox: mode=off sandboxed=false
System prompt (run): 54,027 chars (~13,507 tok) (Project Context 32,842 chars (~8,211 tok))
Injected workspace files:
• AGENTS.md: OK | raw 15,072 chars (~3,768 tok) | injected 15,072 chars (~3,768 tok)
• SOUL.md: OK | raw 1,423 chars (~356 tok) | injected 1,423 chars (~356 tok)
• TOOLS.md: OK | raw 3,414 chars (~854 tok) | injected 3,414 chars (~854 tok)
• IDENTITY.md: OK | raw 1,428 chars (~357 tok) | injected 1,428 chars (~357 tok)
• USER.md: OK | raw 866 chars (~217 tok) | injected 866 chars (~217 tok)
• HEARTBEAT.md: OK | raw 888 chars (~222 tok) | injected 888 chars (~222 tok)
• BOOTSTRAP.md: MISSING | raw 0 | injected 0
• MEMORY.md: OK | raw 9,355 chars (~2,339 tok) | injected 9,355 chars (~2,339 tok)
• SELF_IMPROVEMENT_REMINDER.md: OK | raw 616 chars (~154 tok) | injected 616 chars (~154 tok)
PART 3
Skills list (system prompt text): 8,804 chars (~2,201 tok) (20 skills)
Skills: clawhub-security, clean-coding, collectiviq-query, discord, google, healthcheck, node-connect, openai-whisper, openclaw-ops, openclaw-support, petra-backup, reviewed-learning-loop, security, skill-creator, taskflow, taskflow-inbox-triage, telegram-ops, telegram-security, video-frames, weather
Tool list (system prompt text): 0 chars (~0 tok)
Tool schemas (JSON): 14,219 chars (~3,555 tok) (counts toward context; not shown as text)
Tools: read, edit, write, exec, process, image_generate, music_generate, video_generate, sessions_list, sessions_history, sessions_send, sessions_yield, sessions_spawn, subagents, session_status, web_search, web_fetch, image, memory_search, memory_get
Session tokens (cached): 19,226 total / ctx=204,800
Inline shortcut: a command token inside normal text (e.g. "hey /status") that runs immediately (allowlisted senders only) and is stripped before the model sees the remaining message.
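One thing worth noticing in this output: the token figures look like a simple character-count heuristic rather than real tokenizer counts. The sketch below checks every row pasted above against "one token per four characters, rounded up" and then ranks the injected files. That rule is an observation from this one report, not a documented OpenClaw formula, and the helper name `estimate_tokens` is made up here. Actual billed tokens depend on each provider's tokenizer.

```python
import math

# (label, chars, reported token estimate), copied from the /context list output above.
REPORTED = [
    ("System prompt (run)", 54_027, 13_507),
    ("AGENTS.md", 15_072, 3_768),
    ("SOUL.md", 1_423, 356),
    ("TOOLS.md", 3_414, 854),
    ("IDENTITY.md", 1_428, 357),
    ("USER.md", 866, 217),
    ("HEARTBEAT.md", 888, 222),
    ("MEMORY.md", 9_355, 2_339),
    ("SELF_IMPROVEMENT_REMINDER.md", 616, 154),
    ("Skills list", 8_804, 2_201),
    ("Tool schemas (JSON)", 14_219, 3_555),
]


def estimate_tokens(chars):
    """Rough estimate: one token per four characters, rounded up."""
    return math.ceil(chars / 4)


# Rows where the reported estimate differs from the chars/4 rule.
mismatches = [label for label, chars, tok in REPORTED if estimate_tokens(chars) != tok]
print("rows that do not match chars/4:", mismatches)

# Rank only the injected workspace files by their reported estimate.
files = [row for row in REPORTED if row[0].endswith(".md")]
for label, chars, tok in sorted(files, key=lambda row: row[2], reverse=True):
    print(f"{label:<30} ~{tok:>5,} tok")
```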
Nice, this is already useful.
Big takeaway from part 1:
- your clean base context is about 19k
- not 112k
So the Kimi burn is not just "your bootstrap/system prompt is insanely huge".
It points more toward:
- session growth during that 51-minute convo, and/or
- much worse cache reuse on Kimi than what you saw here on MiniMax
That lines up with your CSV note about lots of fresh uncached tokens.
So yes, please paste part 2, especially the /context list output.
If you have it later, /context detail is even better, but /context list alone is already enough to separate:
- base prompt tax
- injected files
- skills/tools overhead
- conversation/history growth
Yep, this is the smoking gun for the fixed prompt tax.
Your clean-session overhead is roughly:
- System prompt: ~13.5k tok
- Total cached context on hello: ~19k tok
Biggest always-injected contributors:
- AGENTS.md: ~3,768 tok
- MEMORY.md: ~2,339 tok
- TOOLS.md: ~854 tok
So:
- AGENTS + MEMORY alone = ~6.1k tokens every turn
- that 616-char file is only ~154 tok, basically noise
What this means:
- You do have a chunky base tax
  - around 19k before the conversation really grows
- But that still does not explain 112k avg by itself
  - the remaining ~90k-ish is still coming from:
    - accumulated chat/session history
    - tool results
    - weaker cache reuse on Kimi than what you saw here
So the answer is basically both:
- fixed overhead is real
- session growth is the bigger killer
If you want to cut costs fastest, I'd start with:
- trimming AGENTS.md
- trimming MEMORY.md
- keeping test sessions on /new
- using /compact sooner in longer chats
If you want the last missing piece, paste the rest of /context list or ideally /context detail, especially:
- Skills list
- Tool schemas (JSON)
That'll show how much of the remaining ~5k-6k base overhead is tools vs skills.
Yep, now we can pretty much account for it.
Your brand-new session floor is about 19.2k tokens.
Biggest recurring prompt costs:
- AGENTS.md: ~3,768 tok
- Tool schemas: ~3,555 tok
- MEMORY.md: ~2,339 tok
- Skills list: ~2,201 tok
- TOOLS.md: ~854 tok
So the main story is:
- Your agent has a real fixed tax
  - about 19k before the convo even gets long
- Kimi made that pain visible
  - because it's pay-as-you-go
  - and your CSV showed lots of fresh uncached input
- The 112k average came from fixed tax + session growth
  - not one mystery file
  - not that 616-char reminder
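To put rough numbers on that split, here is a small sketch using only figures already posted in this thread. It mixes the fresh-session floor measured on MiniMax with the per-call average from the Kimi CSV, so treat the result as a ballpark; the two providers may count tokens differently, and the variable names are just for illustration.

```python
FLOOR_TOKENS = 19_226       # "Session tokens (cached)" from the fresh session above
AVG_INPUT_TOKENS = 112_000  # average input per call from the Kimi CSV analysis

# Largest recurring prompt costs, as listed above.
recurring = {
    "AGENTS.md": 3_768,
    "Tool schemas": 3_555,
    "MEMORY.md": 2_339,
    "Skills list": 2_201,
    "TOOLS.md": 854,
}

top_five = sum(recurring.values())
growth_tokens = AVG_INPUT_TOKENS - FLOOR_TOKENS
floor_share = FLOOR_TOKENS / AVG_INPUT_TOKENS

print(f"top five recurring items: {top_five:,} tok ({top_five / FLOOR_TOKENS:.0%} of the floor)")
print(f"fixed floor: {FLOOR_TOKENS:,} tok ({floor_share:.0%} of the average call)")
print(f"session growth and the rest: {growth_tokens:,} tok")
```

On these numbers the fixed floor is under a fifth of the average call, which is why resetting or compacting sessions should help more than trimming any single file.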
So if you want to make it sustainable, I'd prioritize in this order:
- Shrink AGENTS.md
- Trim MEMORY.md
- Reduce eligible skills for this chat agent
- Use a leaner Telegram/chat agent with fewer tools/skills
- Use /new for testing and /compact sooner in longer chats
Honestly, this looks like an agent that's optimized for capability, not cheap casual chat. That's why MiniMax monthly felt okay, and Kimi felt brutal.
If you want, I can help you turn this into a concrete "highest ROI cuts first" checklist.
i took a lot of time to train/massage my AGENTS.md so i don't wanna shrink it anymore. The primary selling point for openclaw for me is the identity aspect even though it isn't that great yet
That's totally fair, and honestly I think that's a good instinct.
If identity is the point, I wouldn't gut the part that makes the agent feel like yours.
So I'd treat AGENTS.md as protected and optimize the other stuff instead.
Your next-best cuts are:
So the better move is probably:
- Keep the same identity/persona
- Make the Telegram agent leaner on tools + skills
- Trim MEMORY/TOOLS before touching AGENTS
- Use /new for tests so session growth doesn't pile on top
Honestly, the nicest setup here might be:
- one "soulful chat" agent with your full identity
- one heavier ops/tooling agent for tasky stuff
Same personality if you want, but way less per-turn baggage on the casual-chat side.
That gets you cheaper chats without sacrificing the identity part, which I agree is one of the most interesting things about OpenClaw right now.
If you want, I can help you figure out the highest-ROI non-AGENTS cuts first.
but yes you're correct this isn't a chat bot it's my command center for the machine. eventually i plan on having it orchestrate smaller more bespoke agents but we're not there yet.
Yep, that makes perfect sense.
A command-center agent is naturally context-heavy. That's not you misusing OpenClaw, that's just the architecture cost showing up.
So the practical takeaway is:
- MiniMax monthly is a much better fit for your main orchestrator right now
- Kimi pay-as-you-go is going to feel rough on that same agent shape
- later, when you split into smaller bespoke agents, the economics should look a lot better
That future model feels right to me:
- one high-context "captain"
- multiple cheap, focused workers
Until then, I'd mostly treat Kimi as:
- a fresh-session test model
- or a special-purpose side agent
- not the main machine-brain
Honestly, I'm glad we traced this, because now it looks less like "something is broken" and more like "this agent profile is expensive by design".
If this fixed the mystery, please right-click the solution message → Apps → Mark Solution so Answer Overflow can index it for future folks.