# Tried out Kimi 2.5 pay-as-you-go


proven shale

I tried Kimi 2.5 pay-as-you-go and burned through tokens way faster than is sustainable.
OpenClaw version 2026.4.15, on Telegram.
My main model is usually MiniMax on a monthly subscription, which has been effectively unlimited for one flat price.
I was trying Kimi as an alternative. I had one short conversation over 51 minutes while tweaking settings and testing the model, and didn't do anything crazy with it. It cost about $1. I had my claw analyze the usage CSV and will paste her response after this.

granite acorn (BOT)

To help others find answers, you can mark your question as solved via right-click solution message -> Apps -> ✅ Mark Solution

proven shale
Here's the breakdown from the CSV. This covers the Apr 18 session:

17 large requests (the conversation):

• Avg input: 112,000 tokens/call — massive
• Avg output: 193 tokens/call — very lean responses
• Time span: 51 minutes
• Cache hit rate: 38% overall, but wildly variable

The cost mystery: At $0.03/1K input + $0.06/1K output (Kimi's published rates), these 17 calls alone should cost ~$1.03. Your dashboard shows $0.84 total across the whole day. Either Kimi's giving you a bulk discount, the cache billing works differently than the docs suggest, or the CSV costs are calculated differently than billing. I'd ask Kimi support about the discrepancy.
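A quick way to sanity-check the quoted rates is to multiply them out. This minimal sketch takes the CSV averages (17 calls, ~112,000 input and ~193 output tokens per call) and the per-1K rates exactly as quoted above, and ignores caching. At face value those rates come to roughly $57, not ~$1.03, so the quoted rates (or their units) are probably off and worth checking against Moonshot's current pricing page. The second figure is the flat input rate per million tokens that ~$1.03 would imply, ignoring output, which is a negligible share of the token volume here.

```python
def cost_usd(calls: int, avg_in: int, avg_out: int,
             in_per_1k: float, out_per_1k: float) -> float:
    """Flat cost for `calls` requests, with no cache discount applied."""
    input_tokens = calls * avg_in
    output_tokens = calls * avg_out
    return input_tokens / 1000 * in_per_1k + output_tokens / 1000 * out_per_1k

# Averages from the CSV analysis; rates as quoted there (not verified).
CALLS, AVG_IN, AVG_OUT = 17, 112_000, 193
face_value = cost_usd(CALLS, AVG_IN, AVG_OUT, in_per_1k=0.03, out_per_1k=0.06)
print(f"at the quoted rates: ${face_value:.2f}")

# Flat input rate (USD per 1M tokens) that would make ~$1.03 come out right.
implied_per_million = 1.03 / (CALLS * AVG_IN / 1_000_000)
print(f"implied input rate: ~${implied_per_million:.2f} per 1M tokens")
```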

The root cause of the burn rate: each call sends ~112K-120K tokens of context. That's the full system prompt + workspace files + chat history, sent every single turn. A 38% cache-hit rate sounds decent, but the uncached fresh tokens still cost full price, and you're sending 97K-114K fresh tokens on most calls.

For comparison: a normal conversation without the full system-prompt-every-turn approach might use 2K-5K input tokens per call. You're running at 20-60x that.
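The two ratios above are simple arithmetic on the CSV averages. This sketch splits an average call into cached and fresh input using the 38% overall hit rate (per-call hit rates varied a lot, so this is only an average, not a per-call figure) and compares the average input size against the 2K-5K "normal conversation" range.

```python
AVG_INPUT = 112_000      # average input tokens per call (CSV analysis)
CACHE_HIT_RATE = 0.38    # overall cache-hit rate across the 17 calls

cached = AVG_INPUT * CACHE_HIT_RATE
fresh = AVG_INPUT - cached
print(f"avg cached: ~{cached:,.0f} tok, avg fresh: ~{fresh:,.0f} tok")

# How many times larger each call is than a lean 2K-5K-token prompt.
for lean in (5_000, 2_000):
    print(f"vs {lean:,}-token prompt: {AVG_INPUT / lean:.1f}x")
```

That gives roughly 22x and 56x, which is where the "20-60x" comes from.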
patent mauve (BOT)

Let me verify the docs and source so I don’t hand-wave this.


That can happen pretty fast with pay-as-you-go Kimi, and it’s often not just "one short chat".

Most likely causes here:

  1. Same session kept growing
    OpenClaw resends the system prompt + chat history + tool results on each turn, so lots of small test messages in one 51-minute session can snowball token use.
    Docs: https://docs.openclaw.ai/reference/token-use

  2. You may have used a thinking model or enabled thinking
    moonshot/kimi-k2.5 is non-reasoning, but moonshot/kimi-k2-thinking and moonshot/kimi-k2-thinking-turbo do add reasoning overhead.
    Docs: https://docs.openclaw.ai/providers/moonshot

  3. Extra hidden calls
    Compaction, tools, web/media handling, etc. can add more paid requests in the same session.
    Docs: https://docs.openclaw.ai/concepts/compaction
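To see why lots of small test messages in one session snowball, here is a toy model, not OpenClaw's actual accounting: every call resends a fixed base prompt plus all earlier turns, and each turn adds a constant amount of history. The numbers are hypothetical; the point is that total input grows roughly quadratically with the number of turns, while starting each message from a fresh session keeps every call at the base.

```python
def per_call_inputs(base: int, growth_per_turn: int, turns: int) -> list[int]:
    """Input tokens sent on each call when the full history is replayed.

    Toy model: call k (0-based) carries the base prompt plus k turns of history.
    """
    return [base + k * growth_per_turn for k in range(turns)]

# Hypothetical numbers: 20k base prompt, each turn adds 3k of messages/tool output.
calls = per_call_inputs(base=20_000, growth_per_turn=3_000, turns=17)
print("last call:", calls[-1])                          # 20k + 16 * 3k
print("total input:", sum(calls))
print("fresh session per message:", 20_000 * 17)
```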

What I’d do next:

  • turn on /usage tokens or /usage full
  • check /status after a reply
  • use /new before model-testing so you aren’t paying to resend old context
  • keep /think off while cost-testing
  • use /compact if you want to stay in the same thread

Slash command docs: https://docs.openclaw.ai/tools/slash-commands

If you paste your claw’s CSV analysis plus the exact model ref you used, I can tell you whether this looks expected or suspicious.


Yep, that analysis looks basically right.

The big tell is this:

  • 112k avg input
  • 193 avg output

So the spend is overwhelmingly from context replay, not from Kimi "talking too much".

One small correction to the pasted analysis: it’s usually not just "system prompt + workspace files + chat history". In OpenClaw, the heavy part can also include:

  • tool schemas
  • skills list
  • injected bootstrap files (AGENTS.md, SOUL.md, USER.md, MEMORY.md, etc.)
  • prior conversation/tool results

Docs:

  1. Trim injected files
    Especially MEMORY.md, AGENTS.md, TOOLS.md, long persona files, etc.

  2. Use /compact sooner
    Or just reset the session if it’s a throwaway test.

  3. Keep Kimi on a non-thinking model while cost testing
    moonshot/kimi-k2.5 is cheaper to sanity-check than the thinking variants.
    Docs: https://docs.openclaw.ai/providers/moonshot

  4. Consider a lean agent for Telegram chat
    Fewer injected files/tools = much lower per-turn tax.

On the $1.03 vs $0.84 mismatch, I would not blame OpenClaw first. That sounds more like provider-side billing math, cached token treatment, or dashboard lag.

If you paste a /context detail, I can help pinpoint exactly what’s inflating the 112k.

proven shale

Any idea what the ~25 or so "undefined" 616-char injections are?

patent mauve (BOT)

Let me check the source so I can answer this one precisely.


My bet: those are skills list entries, not 25 mystery files.

Why I think that:

  • OpenClaw injects a compact skills list into the prompt on every run
  • each skill block includes name + description + location
  • ~616 chars each is a very believable per-skill block size
  • having ~25 of them lines up with "a bunch of installed/eligible skills"

Docs: https://docs.openclaw.ai/concepts/context

So the "undefined" label is probably just a reporting bug or a stale saved report, not prompt content that literally says "undefined".
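For scale, assuming the common ~4 characters-per-token rule of thumb (the same ratio OpenClaw's /context estimates imply, e.g. 616 chars shown as ~154 tok), ~25 entries of 616 chars each add up to a few thousand tokens on every turn. This is an approximation, not the model's real tokenizer.

```python
CHARS_PER_TOKEN = 4  # rough English-text heuristic, not a real tokenizer

def approx_tokens(chars: int) -> int:
    return round(chars / CHARS_PER_TOKEN)

entry_chars, entries = 616, 25
total_chars = entry_chars * entries
print(f"{entries} x {entry_chars} chars = {total_chars:,} chars "
      f"(~{approx_tokens(total_chars):,} tok per turn)")
```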

Quick way to verify:

  • run /context json
  • look at whether those entries are under:
    • skills.entries → probably just the skill list, label bug
    • tools.entries → then it’s a tool/plugin missing a name, which is a real bug
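If the /context json output is saved to a file, a few lines of Python can show which section the unlabeled entries sit in. The layout below (top-level `skills` and `tools` objects, each with an `entries` list whose items carry `name` and `chars` fields) is a hypothetical assumption based only on the skills.entries / tools.entries paths mentioned above; the real JSON may use different keys, so adjust accordingly. The filename `context.json` is also just an example.

```python
import json

def unnamed_entries(report: dict) -> dict[str, list[int]]:
    """Map section -> char counts of entries whose name is missing or 'undefined'.

    Hypothetical layout: {"skills": {"entries": [{"name": ..., "chars": ...}]},
                          "tools":  {"entries": [...]}}
    """
    found: dict[str, list[int]] = {}
    for section in ("skills", "tools"):
        for entry in report.get(section, {}).get("entries", []):
            if entry.get("name") in (None, "", "undefined"):
                found.setdefault(section, []).append(entry.get("chars", 0))
    return found

# Inline sample; in practice something like:
#   with open("context.json") as f: report = json.load(f)
sample = json.loads("""{
  "skills": {"entries": [{"name": "weather", "chars": 410},
                         {"name": "undefined", "chars": 616}]},
  "tools":  {"entries": [{"name": "read", "chars": 300}]}
}""")
print(unnamed_entries(sample))
```

Entries showing up under `skills` would support the label-bug theory; entries under `tools` would point at a plugin missing a name.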

If you want a fast sanity check, also try:

  • /new
  • then /context detail

If the names suddenly look normal in a fresh session, that points even more to a stale report issue.

So short version: I don’t think those are 25 hidden bootstrap injections. I think they’re almost certainly the per-skill prompt entries, with bad labeling.

If you paste the relevant bit from /context json, I can tell you exactly which section they’re coming from.

proven shale

Yep, I found the duplicate-injection culprit, but that's separate from this thread. Back to what I should run to show you where the token drain is coming from.

patent mauve (BOT)

Best clean repro:

  1. /new
  2. /usage tokens
  3. Send one normal test message like "hello" or whatever your typical opener is
  4. /status
  5. /context detail

Then paste:

  • the usage footer from that one reply
  • the /status output
  • the /context detail output, especially:
    • System prompt (...)
    • Injected workspace files
    • Skills list
    • Tool schemas (JSON)
    • Tracked prompt estimate
    • Actual context usage

If you want to inspect the already-expensive session instead of a clean repro, just run:

  • /status
  • /context detail

That’ll show us whether the drain is mostly:

  • base prompt tax,
  • injected files,
  • tools/skills,
  • or accumulated chat history.

If it’s easier, /context json is also great, but detail is usually enough. Feel free to redact private paths/content before pasting.

proven shale

Can't paste an image, so here we go. PART 1

> ME:
Hey there

> BOT:
Hey! ☘️
Usage: 154 in / 18 out · cache 19k cached / 0 new · est $0.0012

> ME:
/status

> BOT:
🦞 OpenClaw 2026.4.15 (041266a)
🧠 Model: minimax/MiniMax-M2.7 · 🔑 api-key (env: MINIMAX_API_KEY)
🔄 Fallbacks: groq/llama-3.3-70b-versatile, openrouter/free, local/qwen2.5-1.5b-instruct-q4_k_m.gguf
🧮 Tokens: 154 in / 18 out · 💵 Cost: $0.0001
🗄️ Cache: 99% hit · 19k cached, 0 new
📚 Context: 19k/205k (9%) · 🧹 Compactions: 0
🧵 Session: agent:main:telegram:direct:8695526498 • updated just now
⚙️ Runtime: direct · Think: low
🪢 Queue: steer (depth 0)

> ME:
/context list

PART 2

> BOT:
🧠 Context breakdown
Workspace: /home/openclaw/.openclaw/workspace
Bootstrap max/file: 20,000 chars
Bootstrap max/total: 60,000 chars
Sandbox: mode=off sandboxed=false
System prompt (run): 54,027 chars (~13,507 tok) (Project Context 32,842 chars (~8,211 tok))

Injected workspace files:

• AGENTS.md: OK | raw 15,072 chars (~3,768 tok) | injected 15,072 chars (~3,768 tok)
• SOUL.md: OK | raw 1,423 chars (~356 tok) | injected 1,423 chars (~356 tok)
• TOOLS.md: OK | raw 3,414 chars (~854 tok) | injected 3,414 chars (~854 tok)
• IDENTITY.md: OK | raw 1,428 chars (~357 tok) | injected 1,428 chars (~357 tok)
• USER.md: OK | raw 866 chars (~217 tok) | injected 866 chars (~217 tok)
• HEARTBEAT.md: OK | raw 888 chars (~222 tok) | injected 888 chars (~222 tok)
• BOOTSTRAP.md: MISSING | raw 0 | injected 0
• MEMORY.md: OK | raw 9,355 chars (~2,339 tok) | injected 9,355 chars (~2,339 tok)
• SELF_IMPROVEMENT_REMINDER.md: OK | raw 616 chars (~154 tok) | injected 616 chars (~154 tok)


PART 3


Skills list (system prompt text): 8,804 chars (~2,201 tok) (20 skills)
Skills: clawhub-security, clean-coding, collectiviq-query, discord, google, healthcheck, node-connect, openai-whisper, openclaw-ops, openclaw-support, petra-backup, reviewed-learning-loop, security, skill-creator, taskflow, taskflow-inbox-triage, telegram-ops, telegram-security, video-frames, weather
Tool list (system prompt text): 0 chars (~0 tok)
Tool schemas (JSON): 14,219 chars (~3,555 tok) (counts toward context; not shown as text)
Tools: read, edit, write, exec, process, image_generate, music_generate, video_generate, sessions_list, sessions_history, sessions_send, sessions_yield, sessions_spawn, subagents, session_status, web_search, web_fetch, image, memory_search, memory_get

Session tokens (cached): 19,226 total / ctx=204,800

Inline shortcut: a command token inside normal text (e.g. "hey /status") that runs immediately (allowlisted senders only) and is stripped before the model sees the remaining message.

patent mauve (BOT)

Nice, this is already useful.

Big takeaway from part 1:

  • your clean base context is about 19k
  • not 112k

So the Kimi burn is not just "your bootstrap/system prompt is insanely huge".

It points more toward:

  1. session growth during that 51-minute convo, and/or
  2. much worse cache reuse on Kimi than what you saw here on MiniMax

That lines up with your CSV note about lots of fresh uncached tokens.

So yes, please paste part 2, especially the /context list output.


If you have it later, /context detail is even better, but /context list alone is already enough to separate:

  • base prompt tax
  • injected files
  • skills/tools overhead
  • conversation/history growth

Yep, this is the smoking gun for the fixed prompt tax.

Your clean-session overhead is roughly:

  • System prompt: ~13.5k tok
  • Total cached context on hello: ~19k tok

Biggest always-injected contributors:

  • AGENTS.md: ~3.8k tok
  • MEMORY.md: ~2.3k tok
  • TOOLS.md: ~0.85k tok

So:

  • AGENTS + MEMORY alone = ~6.1k tokens every turn
  • that 616-char file is only ~154 tok, basically noise
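The per-file numbers can be tallied straight from the /context list text pasted above. This minimal sketch parses lines in the format shown there (`• NAME: OK | raw N chars (~T tok) | injected N chars (~T tok)`) and keeps the injected token estimate for each file; the regex is written against that pasted 2026.4.15 output only and may need tweaks for other OpenClaw versions.

```python
import re

# Matches a workspace-file line and captures the "injected" token estimate.
LINE = re.compile(
    r"•\s*(?P<name>\S+):\s*\w+\s*\|.*injected\s+[\d,]+\s+chars\s+\(~(?P<tok>[\d,]+)\s+tok\)"
)

def injected_tokens(text: str) -> dict[str, int]:
    totals: dict[str, int] = {}
    for line in text.splitlines():
        m = LINE.search(line)
        if m:  # MISSING files ("injected 0") don't match and are skipped
            totals[m["name"]] = int(m["tok"].replace(",", ""))
    return totals

pasted = """\
• AGENTS.md: OK | raw 15,072 chars (~3,768 tok) | injected 15,072 chars (~3,768 tok)
• MEMORY.md: OK | raw 9,355 chars (~2,339 tok) | injected 9,355 chars (~2,339 tok)
• BOOTSTRAP.md: MISSING | raw 0 | injected 0
• SELF_IMPROVEMENT_REMINDER.md: OK | raw 616 chars (~154 tok) | injected 616 chars (~154 tok)
"""
tok = injected_tokens(pasted)
print(tok)
print("AGENTS + MEMORY:", tok["AGENTS.md"] + tok["MEMORY.md"])
```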

What this means:

  1. You do have a chunky base tax
    • around 19k before the conversation really grows
  2. But that still does not explain 112k avg by itself
    • the remaining ~90k-ish is coming from:
      • accumulated chat/session history
      • tool results
    • and weaker cache reuse on Kimi than what you saw here means more of it bills as fresh input

So the answer is basically both:

  • fixed overhead is real
  • session growth is the bigger killer
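A back-of-envelope check on "session growth is the bigger killer", using a toy linear model rather than how OpenClaw or Kimi actually count tokens: assume each of the n = 17 calls starts from the ~19.2k floor B and grows by a constant g tokens per turn. The average of such a series is the floor plus half the total growth, so the observed ~112k average implies:

$$
\bar{x} = B + g\,\frac{n-1}{2}
\quad\Longrightarrow\quad
g = \frac{2(\bar{x} - B)}{n-1} = \frac{2(112{,}000 - 19{,}200)}{16} = 11{,}600 \text{ tokens per turn}
$$

That is a lot for short chat messages, which fits the idea that tool results and other non-message content make up much of the growth. Treat it as a rough estimate only: the floor was measured on the MiniMax session, Kimi's tokenizer may count differently, and the CSV showed wide per-call variation.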

If you want to cut costs fastest, I’d start with:

  • trimming AGENTS.md
  • trimming MEMORY.md
  • keeping test sessions on /new
  • using /compact sooner in longer chats

If you want the last missing piece, paste the rest of /context list or ideally /context detail, especially:

  • Skills list
  • Tool schemas (JSON)

That’ll show how much of the remaining ~5k-6k base overhead is tools vs skills.


Yep, now we can pretty much account for it.

Your brand-new session floor is about 19.2k tokens.

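Biggest recurring prompt costs:

  • AGENTS.md: ~3.8k tok
  • Tool schemas (JSON): ~3.6k tok
  • MEMORY.md: ~2.3k tok
  • Skills list: ~2.2k tok
  • TOOLS.md: ~0.85k tok
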
So the main story is:

  1. Your agent has a real fixed tax
    • about 19k before the convo even gets long
  2. Kimi made that pain visible
    • because it’s pay-as-you-go
    • and your CSV showed lots of fresh uncached input
  3. The 112k average came from fixed tax + session growth
    • not one mystery file
    • not that 616-char reminder
So if you want to make it sustainable, I’d prioritize in this order:

  1. Shrink AGENTS.md
  2. Trim MEMORY.md
  3. Reduce eligible skills for this chat agent
  4. Use a leaner Telegram/chat agent with fewer tools/skills
  5. Use /new for testing and /compact sooner in longer chats

Honestly, this looks like an agent that’s optimized for capability, not cheap casual chat. That’s why MiniMax monthly felt okay, and Kimi felt brutal.


If you want, I can help you turn this into a concrete "highest ROI cuts first" checklist.

proven shale

I took a lot of time to train/massage my AGENTS.md, so I don't wanna shrink it anymore. The primary selling point of OpenClaw for me is the identity aspect, even though it isn't that great yet.

patent mauve (BOT)

That’s totally fair, and honestly I think that’s a good instinct.

If identity is the point, I wouldn’t gut the part that makes the agent feel like yours.

So I’d treat AGENTS.md as protected and optimize the other stuff instead.

Your next-best cuts are:

  • Tool schemas: ~3.6k tok
  • MEMORY.md: ~2.3k tok
  • Skills list: ~2.2k tok
  • TOOLS.md: ~0.85k tok
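To size that list against the fresh-session floor, here is the arithmetic using the exact /context list estimates (3,555 / 2,339 / 2,201 / 854 tok) and the "Session tokens (cached): 19,226 total" line. Treat the total as an upper bound on savings: removing everything listed isn't realistic, and trimming rather than deleting saves proportionally less.

```python
FLOOR = 19_226  # "Session tokens (cached): 19,226 total" on a fresh session

non_agents = {
    "Tool schemas (JSON)": 3_555,
    "MEMORY.md": 2_339,
    "Skills list": 2_201,
    "TOOLS.md": 854,
}
total = sum(non_agents.values())
for name, tok in non_agents.items():
    print(f"{name:<20} {tok:>6,} tok  {tok / FLOOR:5.1%} of floor")
print(f"{'all of the above':<20} {total:>6,} tok  {total / FLOOR:5.1%} of floor")
```

So the non-AGENTS candidates together are a bit under half of the per-turn floor, which is why there is real room to cut without touching AGENTS.md.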

So the better move is probably:

  1. Keep the same identity/persona
  2. Make the Telegram agent leaner on tools + skills
  3. Trim MEMORY/TOOLS before touching AGENTS
  4. Use /new for tests so session growth doesn’t pile on top

Honestly, the nicest setup here might be:

  • one "soulful chat" agent with your full identity
  • one heavier ops/tooling agent for tasky stuff

Same personality if you want, but way less per-turn baggage on the casual-chat side.

That gets you cheaper chats without sacrificing the identity part, which I agree is one of the most interesting things about OpenClaw right now.

If you want, I can help you figure out the highest-ROI non-AGENTS cuts first.

proven shale

But yes, you're correct: this isn't a chat bot, it's my command center for the machine. Eventually I plan on having it orchestrate smaller, more bespoke agents, but we're not there yet.

patent mauve (BOT)

Yep, that makes perfect sense.

A command-center agent is naturally context-heavy. That’s not you misusing OpenClaw; that’s just the architecture cost showing up.

So the practical takeaway is:

  • MiniMax monthly is a much better fit for your main orchestrator right now
  • Kimi pay-as-you-go is going to feel rough on that same agent shape
  • later, when you split into smaller bespoke agents, the economics should look a lot better

That future model feels right to me:

  • one high-context "captain"
  • multiple cheap, focused workers

Until then, I’d mostly treat Kimi as:

  • a fresh-session test model
  • or a special-purpose side agent
  • not the main machine-brain

Honestly, I’m glad we traced this, because now it looks less like "something is broken" and more like "this agent profile is expensive by design".

If this fixed the mystery, please right-click the solution message → Apps → ✅ Mark Solution so Answer Overflow can index it for future folks.