#OpenRouter
115M/week is high, but sadly not impossible — usually it’s one of these:
- Long context replay (client keeps sending full chat history each turn)
- Reasoning-heavy responses (extra internal tokens)
- Retries/background loops (same request sent many times)
- Shared/leaked API key (another app/process using the same key)
Quick sanity check:
115M / 7 ≈ 16.4M tokens/day.
If each request is ~200k input tokens, that’s only ~80 requests/day — this can happen faster than people expect.
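The sanity check above can be reproduced in a few lines of Python. This is a minimal sketch; the 200k-token request size is the illustrative figure from the message, not a measured value.

```python
# Weekly total as reported on the usage dashboard.
WEEKLY_TOKENS = 115_000_000
# Illustrative request size from the message above (an assumption, not measured).
ASSUMED_TOKENS_PER_REQUEST = 200_000

tokens_per_day = WEEKLY_TOKENS / 7
requests_per_day = tokens_per_day / ASSUMED_TOKENS_PER_REQUEST

print(f"{tokens_per_day / 1e6:.1f}M tokens/day")
print(f"{requests_per_day:.0f} requests/day at 200k tokens each")
```

Swapping in a smaller request size shows how quickly the required request count grows, which is why the per-request size is the first thing to check.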
If you’re using OpenClaw, do this to verify where it’s coming from:
- /usage full (per-response token footer)
- openclaw status --usage (provider snapshot)
- and send openclaw --version here, so I can help you narrow it down.
Docs for token accounting:
Perfect — please share them here and I’ll help you audit it.
Best screenshots to post (redact API keys):
- OpenRouter Usage page (date range = last 7 days)
- Per-model breakdown (if available)
- Request/activity logs showing largest calls
- Your client settings for max tokens / reasoning / retries (if you have them)
Once I see those, I can usually tell within a minute whether it’s context replay, retries, or key leakage.
I'm going to send you everything. Just note that I say it's impossible because I don't even use it that much: about 2 hours a day on average over 6-7 days.
Yeah, with ~2h/day that number does sound off for “normal” usage.
Usually when it jumps like that, it’s either:
- a hidden retry/runaway loop, or
- another app using the same API key.
Do this now before sharing screenshots:
- Rotate the OpenRouter key
- Create a new key only for this one app/client
- Set a hard spend limit
Then send whatever screenshots you have and I’ll help pinpoint it.
If you can, include:
- usage graph by day
- biggest request rows
- model breakdown
- your client’s retry/max-token settings
We’ll figure out where those tokens went.
I can't send a screenshot because there is no button.
No worries — text is enough 👍
Just paste these values here (no screenshot needed):
- Date range
- Total requests
- Total input tokens
- Total output tokens
- Top 3 biggest requests (tokens each)
- Any retry/error count (429/5xx if shown)
- Whether this key was used in more than one app
If easier, copy this and fill it:
Range:
Requests:
Input tokens:
Output tokens:
Largest req #1:
Largest req #2:
Largest req #3:
429/5xx:
Other apps using same key: yes/no
And if you haven’t yet, rotate the key first so usage can’t keep climbing.
No other apps are using the key. It has stopped working now because the model isn't available anymore, and it's just one model.
tokens:
Model          Min (tok)   Max (tok)   Avg (tok)   Sum (tok)
Hunter Alpha   909K        38.9M       14.4M       115M
requests:
Model          Min (req)   Max (req)   Avg (req)   Sum (req)
Hunter Alpha   6           556         236         1.88K
This is super useful — and it actually explains a lot.
Your numbers are internally consistent:
- 115M tokens / 1.88K requests ≈ 61K tokens per request
- Peak bucket: 38.9M tokens with 556 requests (≈70K/request)
So this is probably not a dashboard math bug.
It looks like a request storm + large context per call (most likely retries/looping in the client, especially if the model started failing/unavailable).
Given your “~2h/day” usage, 1.88K requests is way too high for manual chatting.
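A minimal sketch of the arithmetic behind those per-request figures, using the dashboard totals quoted above. The 14 hours of active use (2 hours/day over 7 days) is an assumption taken from the user's own estimate.

```python
# Totals copied from the dashboard figures in the thread.
total_tokens = 115_000_000
total_requests = 1_880
peak_tokens = 38_900_000
peak_requests = 556

# Assumption: about 2 hours/day of active use over 7 days.
assumed_active_hours = 2 * 7

avg_per_request = total_tokens / total_requests
peak_per_request = peak_tokens / peak_requests
requests_per_active_hour = total_requests / assumed_active_hours

print(f"average: {avg_per_request / 1e3:.0f}K tokens/request")
print(f"peak bucket: {peak_per_request / 1e3:.0f}K tokens/request")
print(f"{requests_per_active_hour:.0f} requests per active hour")
```

More than two requests per minute, sustained for every hour of use, is hard to reach by typing alone, which is what points toward automated retries or background turns.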
What to do now
- Keep using a fresh rotated key
- In your client, set:
  - max retries = 0 or 1
  - disable auto-continue/background agents
  - lower max context / clear chat history often
- Do a 10-minute test with the new key:
  - if request count increases while idle → definite background loop
If you tell me which client/app you used, I can point to the exact settings to stop this.
So I'm putting $10 on the API from Xiaomi themselves. The prices are:
MiMo-V2-Pro
Model name: mimo-v2-pro
Category: Text Generation - General-purpose large language models
Context length: 1M
Max output: 128K
Features: Text Generation, Deep Thinking, Stream Output, Function Call, JSON Output, Web Search
Price:
- 0 < tokens ≤ 256K: Input $1.00 / 1M tokens, Cached Input $0.20 / 1M tokens, Output $3.00 / 1M tokens
- 256K < tokens ≤ 1M: Input $2.00 / 1M tokens, Cached Input $0.40 / 1M tokens, Output $6.00 / 1M tokens
Note: Cache writing is free for a limited time.
Rate limit: RPM 100, TPM 10M
I'm going to use the API only for OpenClaw. I won't use it that much; beyond my own time there's a heartbeat every 30 minutes, where I don't think anything needs to happen (I mean no tool calls). Let's see in a few hours whether the usage is coherent.
Yep — good plan, and one important thing:
Heartbeat still runs a full model turn even if no tool is called.
So it can burn tokens quietly.
If you want usage to stay low in OpenClaw, set heartbeat like this:
{
  agents: {
    defaults: {
      heartbeat: {
        every: "2h", // or "0m" to disable
        target: "none", // default, no outbound message
        isolatedSession: true, // big token saver
        lightContext: true // only HEARTBEAT.md bootstrap
      }
    }
  }
}
Also keep HEARTBEAT.md tiny.
Docs:
If you want, paste your heartbeat block and I’ll sanity-check it before you run.
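To get a feel for how much a heartbeat can cost, here is a rough estimate in Python. It is only a sketch: the 40,000-token context per heartbeat turn is a hypothetical figure, and the price is the uncached input rate for the ≤256K tier from the price list quoted earlier in the thread. Output tokens and cache discounts are ignored.

```python
from datetime import timedelta


def heartbeat_tokens_per_day(interval: timedelta, context_tokens: int) -> int:
    """Input tokens per day if every heartbeat sends `context_tokens`."""
    runs_per_day = int(timedelta(days=1) / interval)
    return runs_per_day * context_tokens


# USD per 1M uncached input tokens (<=256K tier of the quoted price list).
INPUT_PRICE_PER_M = 1.00
# Hypothetical context size per heartbeat turn (an assumption).
ASSUMED_CONTEXT_TOKENS = 40_000

for label, interval in [("30m", timedelta(minutes=30)), ("2h", timedelta(hours=2))]:
    tokens = heartbeat_tokens_per_day(interval, ASSUMED_CONTEXT_TOKENS)
    cost = tokens / 1_000_000 * INPUT_PRICE_PER_M
    print(f"every {label}: {tokens:,} tokens/day, about ${cost:.2f}/day")
```

Under these assumptions, a 30-minute heartbeat costs four times as much as a 2-hour one, and shrinking the per-heartbeat context reduces both proportionally.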
These messages alone cost me 1.1M tokens. It's exactly 13 messages (7-6); that makes no sense.
You're right, it's frustrating, but the numbers you're seeing are plausible if the client resends the full context on every turn.
1.1M / 13 messages ≈ 85k tokens per message
This happens when you add up:
- a large system prompt
- the full thread history
- any tool calls/results
- long reasoning/output
So it isn't “13 simple messages”; it's 13 sends of a much larger payload.
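A small Python sketch of why full-context replay adds up so fast. The base prompt size and the tokens added per turn are hypothetical values chosen for illustration; the point is that total input grows roughly with the square of the number of turns.

```python
def replayed_input_tokens(turns: int, base_tokens: int, added_per_turn: int) -> int:
    """Total input tokens when each turn resends the base prompt plus all prior history."""
    total = 0
    history = 0
    for _ in range(turns):
        total += base_tokens + history  # this turn's full payload
        history += added_per_turn       # history grows after every turn
    return total


# Hypothetical values: 10k-token base prompt, 12k tokens of new history per turn
# (user text, tool results, and model output combined).
total = replayed_input_tokens(turns=13, base_tokens=10_000, added_per_turn=12_000)
print(f"{total:,} input tokens over 13 turns")
```

With these made-up inputs the total lands close to 1.1M, even though no single turn adds more than 12k tokens of new material.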
To check right away:
- /context detail (tells you what actually takes up space)
- /usage tokens or /usage full
- /status
To cut costs right away:
- start a new session more often
- use /compact periodically
- lower max output/reasoning
- if you use heartbeat: isolatedSession: true + lightContext: true (or turn it off)
If you like, paste the output of /context detail here and I'll tell you exactly where the tokens are going.
But if they're plausible, that means the cost isn't sustainable for anyone. And this is open source.
You're absolutely right in principle.
Left as it is, it isn't sustainable for “normal” use.
The important part is this: OpenClaw's defaults are geared toward capability, not minimum cost.
With long threads + 1M context + heartbeat, token usage can explode.
So it needs to be put into “saver mode”:
- Heartbeat off (or much less frequent)
- Shorter sessions (/new) and /compact
- Reduce bootstrap/context
Example low-cost config (valid):
{
  agents: {
    defaults: {
      bootstrapMaxChars: 6000,
      bootstrapTotalMaxChars: 30000,
      heartbeat: {
        every: "0m" // disabled
      }
    }
  }
}
If you want to keep heartbeat:
heartbeat: { every: "2h", isolatedSession: true, lightContext: true, target: "none" }
Docs:
- https://docs.openclaw.ai/reference/token-use
- https://docs.openclaw.ai/gateway/heartbeat
- https://docs.openclaw.ai/concepts/context
If you paste your agents.defaults block, I'll tune it for you to keep costs reasonable.
Bootstrap limits
• max/file: 20,000 chars
• max/total: 150,000 chars
Sandbox
• mode=off
• sandboxed=false
⸻
System prompt
• Run: 21,671 chars (~5,418 tok)
• Project Context: 8,138 chars (~2,035 tok)
⸻
Injected workspace files
• AGENTS.md → 771 chars (~193 tok)
• SOUL.md → 1,717 chars (~430 tok)
• TOOLS.md → 700 chars (~175 tok)
• IDENTITY.md → 654 chars (~164 tok)
• USER.md → 821 chars (~206 tok)
• HEARTBEAT.md → 686 chars (~172 tok)
• BOOTSTRAP.md → MISSING
• MEMORY.md → 2,047 chars (~512 tok)
⸻
Skills
• Total: 3,023 chars (~756 tok)
• Count: 6
List:
• gemini
• healthcheck
• node-connect
• skill-creator
• video-frames
• weather
Top skills (by weight):
• skill-creator → 756 chars (~189 tok)
• node-connect → 538 chars (~135 tok)
• healthcheck → 488 chars (~122 tok)
• weather → 413 chars (~104 tok)
• video-frames → 226 chars (~57 tok)
• gemini → 218 chars (~55 tok)
🧠 Context breakdown (Part 2)
Tools
Tool list (prompt text):
• 1,993 chars (~499 tok)
Tool schemas (JSON):
• 15,974 chars (~3,994 tok)
⸻
Available tools
read, edit, write, exec, process, browser, canvas, message, tts,
agents_list, sessions_list, sessions_history, sessions_send,
sessions_yield, sessions_spawn, subagents, session_status,
web_search, web_fetch, memory_search, memory_get
⸻
Top tools (schema size)
• message → 5,025 chars (~1,257 tok)
• browser → 2,799 chars (~700 tok)
• sessions_spawn → 1,179 chars (~295 tok)
• exec → 1,086 chars (~272 tok)
• web_search → 1,084 chars (~271 tok)
• process → 961 chars (~241 tok)
• canvas → 661 chars (~166 tok)
• edit → 591 chars (~148 tok)
• read → 392 chars (~98 tok)
• web_fetch → 374 chars (~94 tok)
• write → 313 chars (~79 tok)
⸻
Top tools (summary size)
• browser → 1,683 chars (~421 tok)
• memory_search → 334 chars (~84 tok)
• read → 298 chars (~75 tok)
• session_status → 207 chars (~52 tok)
• sessions_spawn → 198 chars (~50 tok)
• exec → 181 chars (~46 tok)
⸻
Tools (parameter count)
• message → 94
• browser → 48
• canvas → 18
• sessions_spawn → 17
• exec → 12
• process → 12
• web_search → 9
• edit → 6
• sessions_send → 5
• read → 4
⸻
Session
• Cached tokens: 39,763
• Context max: 200,000
⸻
Inline shortcut
Inline command inside the text (e.g. hey /status)
• executed immediately
• only for allowlisted senders
• stripped before the model processes the rest
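A note on the token figures in the dump above: they are all consistent with a simple "characters divided by 4, rounded up" estimate. That is an inference from the numbers themselves, not documented behavior, so treat these counts as approximations rather than real tokenizer output. A minimal check in Python:

```python
def estimate_tokens(chars: int) -> int:
    """Rough token estimate: ceil(chars / 4). Inferred from the dump, not a real tokenizer."""
    return (chars + 3) // 4


# (chars, tokens as reported in the dump) for a few entries.
samples = {
    "System prompt (Run)": (21_671, 5_418),
    "SOUL.md": (1_717, 430),
    "USER.md": (821, 206),
    "Tool schemas (JSON)": (15_974, 3_994),
}

for name, (chars, reported) in samples.items():
    print(f"{name}: estimated {estimate_tokens(chars)}, reported {reported}")
```

The provider's tokenizer may count differently, especially for non-English text or JSON, so the billed figure can differ from these estimates.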
Perfect, this data clears up a lot.
Bootstrap is not the main culprit here.
From your dump:
- base system + files + skills ≈ 10k tokens/turn (order of magnitude)
- over 13 messages that would be ~130k tokens
- but you spent 1.1M ⇒ ~970k remain to be explained
So the bulk almost certainly comes from:
- chat history that grows with every turn, and/or
- the model's deep thinking/reasoning (tokens often not visible in the final text), and/or
- overly long outputs.
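The gap can be worked out directly from the dump. This sketch assumes the fixed per-turn overhead is the system prompt ("Run") plus the JSON tool schemas; whether the workspace files and skills are already included in the "Run" figure is an assumption.

```python
# Figures copied from the context dump (estimated tokens).
system_prompt_tokens = 5_418   # "System prompt, Run"
tool_schema_tokens = 3_994     # "Tool schemas (JSON)"

turns = 13
total_spent = 1_100_000        # tokens the user reports for these 13 messages

# Assumption: fixed overhead per turn = system prompt + tool schemas.
base_per_turn = system_prompt_tokens + tool_schema_tokens
base_total = base_per_turn * turns
unexplained = total_spent - base_total

print(f"fixed overhead: {base_per_turn:,} tokens/turn, {base_total:,} over {turns} turns")
print(f"unexplained: {unexplained:,} tokens ({unexplained / total_spent:.0%} of the total)")
```

Even with a generous overhead estimate, roughly nine tenths of the spend has to come from something that grows or repeats per turn.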
What to do right now
- Use /usage full for 2-3 messages and compare input vs output per turn
- Run /new more often (separate topics)
- Use /compact before the thread gets long
- In the client/model, lower or disable deep thinking and cap max output tokens
If you paste 2 real lines of /usage full (input/output), I'll tell you exactly which of the three is weighing most.
There, this closes the loop 👇
The key figure is:
- Session cached tokens: 39,763
It means every turn already starts with ~40k input tokens (even if you write a single line).
So:
- 13 messages × ~40k input ≈ 520k input
- add output/deep thinking (e.g. 20–50k per turn) and you easily reach ~1.1M total
So yes: it's “plausible”, and the problem is mostly accumulated context + reasoning output, not bootstrap/tools alone.
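As a cross-check, the same figures can be turned around to ask how much output/reasoning per turn would be needed to reach the observed total. This sketch assumes every one of the 13 turns carried the full 39,763-token context, which overstates the early turns.

```python
cached_context_tokens = 39_763   # "Session cached tokens" from the dump
turns = 13
total_spent = 1_100_000          # tokens reported for the 13 messages

# Assumption: every turn sends the full cached context as input.
input_total = cached_context_tokens * turns
remaining = total_spent - input_total
implied_output_per_turn = remaining / turns

print(f"input: {input_total:,} tokens")
print(f"implied output/reasoning: about {implied_output_per_turn / 1e3:.0f}k tokens per turn")
```

The implied figure falls inside the 20–50k range mentioned above, so the explanation holds together, but only a real /usage full line can confirm the input/output split.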
Immediate action
- run /compact now
- or better, /new when you change topic
- keep deep thinking lower/off
- cap max output
- heartbeat only with isolatedSession: true + lightContext: true (or off)
If you like, in your next message send me just one line of /usage full (input/output) and I'll tell you the exact setting to stay within budget.
What!? 💸 Usage cost
Session $0.0000 · 868k tokens
Today $0.0000
Last 30d $0.0000
Yep — that can happen, and it’s confusing 😅
$0.0000 there is OpenClaw’s local estimate, not the provider invoice.
It shows zero when:
- pricing for that model/provider isn’t configured in OpenClaw, or
- auth mode hides dollar estimates (OAuth flows), or
- provider pricing is dynamic and OpenClaw only has token counts.
So this is possible:
- 868k tokens (real)
- $0.0000 (local estimate unavailable)
For actual billing, trust the provider dashboard.
Docs: https://docs.openclaw.ai/reference/token-use
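To make the "no pricing configured" case concrete, here is an illustrative Python sketch. It is not OpenClaw's implementation: the function, the pricing table, and the flat $1.00 per 1M rate are all hypothetical. It only shows why a missing price should be read as "unknown" rather than "free".

```python
from typing import Optional

# Hypothetical pricing table: USD per 1M tokens, keyed by model name.
PRICING: dict[str, float] = {"mimo-v2-pro": 1.00}


def estimate_cost(model: str, tokens: int, pricing: dict[str, float]) -> Optional[float]:
    """Return a dollar estimate, or None when no price is known for the model."""
    rate = pricing.get(model)
    if rate is None:
        return None  # unknown price: do not report this as $0
    return tokens / 1_000_000 * rate


for model in ("mimo-v2-pro", "unpriced-model"):
    cost = estimate_cost(model, 868_000, PRICING)
    shown = "n/a (no pricing configured)" if cost is None else f"${cost:.4f}"
    print(f"{model}: 868k tokens -> {shown}")
```

A display that prints $0.0000 for the second case is showing a missing estimate, which is why the provider dashboard remains the source of truth for billing.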
33 requests, 1.2M tokens spent. That's crazy wrong.
It feels wrong, but with your own stats it matches almost exactly:
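- You had ~39,763 cached context tokens per request
- 39,763 × 33 requests = 1,312,179 input tokens as a baseline
That baseline is slightly above the 1.2M you observed (1.2M / 33 ≈ 36k per request), so the average context was a little smaller than the snapshot. Either way, ~1.2M for 33 requests is unfortunately in the expected range for this context size.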
So the fix is not billing support — it’s reducing per-request context:
- /new now (hard reset context)
- /compact regularly
- shorter threads per topic
- lower reasoning/deep-thinking + max output
- heartbeat off, or isolated/light
Your token burn is context inflation, not random overcharging.