#OpenRouter

green spire
#
  • I used Hunter Alpha on OpenRouter for a week, and the dashboard says I used 115M tokens, which seems impossible. I can't verify it with them because there's no support. Any hints about this?
#

i wanted to show screenshots

scenic groveBOT
#

115M/week is high, but sadly not impossible — usually it’s one of these:

  1. Long context replay (client keeps sending full chat history each turn)
  2. Reasoning-heavy responses (extra internal tokens)
  3. Retries/background loops (same request sent many times)
  4. Shared/leaked API key (another app/process using the same key)

Quick sanity check:
115M / 7 ≈ 16.4M tokens/day.
If each request is ~200k input tokens, that’s only ~80 requests/day — this can happen faster than people expect.
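
A rough sketch of that arithmetic in Python, in case you want to plug in your own numbers. The 200k-token request size is only the illustrative figure used above, not a measured value:

```python
# Weekly total from the dashboard; the request size is an illustrative assumption.
WEEKLY_TOKENS = 115_000_000
DAYS = 7
TOKENS_PER_REQUEST = 200_000


def requests_per_day(weekly_tokens: int, days: int, tokens_per_request: int) -> float:
    """Return how many requests per day would consume the weekly total."""
    daily_tokens = weekly_tokens / days
    return daily_tokens / tokens_per_request


daily = WEEKLY_TOKENS / DAYS
print(f"{daily / 1e6:.1f}M tokens/day")
print(f"{requests_per_day(WEEKLY_TOKENS, DAYS, TOKENS_PER_REQUEST):.0f} requests/day")
```

Lowering `TOKENS_PER_REQUEST` to something like 60k raises the request count to a few hundred per day, which is still reachable for an agent that loops or retries.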

If you’re using OpenClaw, do this to verify where it’s coming from:

  • /usage full (per-response token footer)
  • openclaw status --usage (provider snapshot)
  • and send openclaw --version here, I can help you narrow it down.

Docs for token accounting:

#

Perfect — please share them here and I’ll help you audit it.

Best screenshots to post (redact API keys):

  1. OpenRouter Usage page (date range = last 7 days)
  2. Per-model breakdown (if available)
  3. Request/activity logs showing largest calls
  4. Your client settings for max tokens / reasoning / retries (if you have them)

Once I see those, I can usually tell within a minute whether it’s context replay, retries, or key leakage.

green spire
#

I'm going to send you everything. Just note that I say it's impossible because I don't even use it that much: about 2 hours a day on average over 6-7 days.

scenic groveBOT
#

Yeah, with ~2h/day that number does sound off for “normal” usage.
Usually when it jumps like that, it’s either:

  • a hidden retry/runaway loop, or
  • another app using the same API key.

Do this now before sharing screenshots:

  1. Rotate the OpenRouter key
  2. Create a new key only for this one app/client
  3. Set a hard spend limit

Then send whatever screenshots you have and I’ll help pinpoint it.
If you can, include:

  • usage graph by day
  • biggest request rows
  • model breakdown
  • your client’s retry/max-token settings
#

We’ll figure out where those tokens went.

green spire
#

I can't send a screenshot because there is no button.

scenic groveBOT
#

No worries — text is enough 👍

Just paste these values here (no screenshot needed):

  • Date range
  • Total requests
  • Total input tokens
  • Total output tokens
  • Top 3 biggest requests (tokens each)
  • Any retry/error count (429/5xx if shown)
  • Whether this key was used in more than one app

If easier, copy this and fill it:

Range:
Requests:
Input tokens:
Output tokens:
Largest req #1:
Largest req #2:
Largest req #3:
429/5xx:
Other apps using same key: yes/no

And if you haven’t yet, rotate the key first so usage can’t keep climbing.

green spire
#

No other apps are using the key. It has stopped working now because the model is no longer available, and it's just one model.

Tokens:
Model          Min (tok)   Max (tok)   Avg (tok)   Sum (tok)
Hunter Alpha   909K        38.9M       14.4M       115M

Requests:
Model          Min (req)   Max (req)   Avg (req)   Sum (req)
Hunter Alpha   6           556         236         1.88K

scenic groveBOT
#

This is super useful — and it actually explains a lot.

Your numbers are internally consistent:

  • 115M tokens / 1.88K requests ≈ 61K tokens per request
  • Peak bucket: 38.9M tokens with 556 requests (≈70K/request)

So this is probably not a dashboard math bug.
It looks like a request storm + large context per call (most likely retries/looping in the client, especially if the model started failing/unavailable).

Given your “~2h/day” usage, 1.88K requests is way too high for manual chatting.
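
For reference, here is a small sketch of how those per-request averages can be derived from the dashboard shorthand. It assumes the peak token bucket (38.9M) and the peak request bucket (556) fall in the same period, which the dashboard summary does not confirm:

```python
def parse_count(text: str) -> float:
    """Convert dashboard shorthand such as '1.88K' or '38.9M' to a number."""
    multipliers = {"K": 1e3, "M": 1e6, "B": 1e9}
    suffix = text[-1].upper()
    if suffix in multipliers:
        return float(text[:-1]) * multipliers[suffix]
    return float(text)


def tokens_per_request(tokens: str, requests: str) -> float:
    """Average tokens per request for one dashboard row."""
    return parse_count(tokens) / parse_count(requests)


overall = tokens_per_request("115M", "1.88K")
peak = tokens_per_request("38.9M", "556")  # assumes both maxima share a bucket
print(f"overall: {overall:,.0f} tokens/request")
print(f"peak bucket: {peak:,.0f} tokens/request")
```

Both averages landing in the 60-70K range suggests the per-call payload was consistently large, rather than a few outlier requests.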

What to do now

  1. Keep using a fresh rotated key
  2. In your client, set:
  • max retries = 0 or 1
  • disable auto-continue/background agents
  • lower max context / clear chat history often
  3. Do a 10-minute test with the new key:
  • if request count increases while idle → definite background loop

If you tell me which client/app you used, I can point to the exact settings to stop this.

green spire
#

So I'm topping up $10 for the API from Xiaomi directly. These are the prices:

MiMo-V2-Pro
Model name: mimo-v2-pro
Category: Text Generation - General-purpose large language models
Context length: 1M
Max output: 128K
Features: Text Generation, Deep Thinking, Stream Output, Function Call, JSON Output, Web Search
Price:
0 < tokens ≤ 256K: Input $1.00 / 1M tokens, Cached Input $0.20 / 1M tokens, Output $3.00 / 1M tokens
256K < tokens ≤ 1M: Input $2.00 / 1M tokens, Cached Input $0.40 / 1M tokens, Output $6.00 / 1M tokens
Note: Cache writing is free for a limited time.
Rate limit: RPM 100, TPM 10M

I'm going to use the API only for OpenClaw. I won't use it that much; apart from my own usage there is a heartbeat every 30m where I don't think anything needs to happen, I mean no tool calls. Let's see in a few hours whether the usage is coherent.

scenic groveBOT
#

Yep — good plan, and one important thing:

Heartbeat still runs a full model turn even if no tool is called.
So it can burn tokens quietly.
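
To get a feel for the scale, here is a back-of-the-envelope sketch. The 40,000 tokens per heartbeat turn is a hypothetical figure for a session with accumulated context, not something measured from your setup, and the function name is made up for illustration:

```python
MINUTES_PER_DAY = 24 * 60


def heartbeat_tokens_per_day(interval_minutes: int, tokens_per_turn: int) -> int:
    """Estimate daily tokens spent on heartbeat turns alone.

    interval_minutes <= 0 is treated as "heartbeat disabled".
    tokens_per_turn is whatever context each heartbeat turn sends (hypothetical here).
    """
    if interval_minutes <= 0:
        return 0
    turns_per_day = MINUTES_PER_DAY // interval_minutes
    return turns_per_day * tokens_per_turn


for label, minutes in [("30m", 30), ("2h", 120), ("off", 0)]:
    print(label, heartbeat_tokens_per_day(minutes, tokens_per_turn=40_000))
```

Under that assumption, a 30-minute heartbeat is 48 turns a day, so the interval and the per-turn context size both matter.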

If you want usage to stay low in OpenClaw, set heartbeat like this:

{
  agents: {
    defaults: {
      heartbeat: {
        every: "2h",            // or "0m" to disable
        target: "none",         // default, no outbound message
        isolatedSession: true,  // big token saver
        lightContext: true      // only HEARTBEAT.md bootstrap
      }
    }
  }
}

green spire
#

These messages alone cost me 1.1M tokens. That's exactly 13 messages, 7-6. It makes no sense like this.

scenic groveBOT
#

You're right, it's frustrating, but the numbers you see are plausible if the client resends the whole context on every turn.

1.1M / 13 messages ≈ 85k tokens per message
This happens when you add up:

  • a large system prompt
  • the full thread history
  • any tool calls/results
  • long reasoning/output

So it's not “13 simple messages”; it's 13 times a much larger payload.
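
A minimal sketch of why resending history adds up so fast. The base size and per-turn growth below are hypothetical values chosen for illustration, not numbers read from this session:

```python
def replayed_input_tokens(turns: int, base_tokens: int, added_per_turn: int) -> int:
    """Total input tokens when every turn resends the base prompt plus all prior history.

    base_tokens: fixed overhead sent each turn (system prompt, tool schemas, ...).
    added_per_turn: tokens each turn adds to the history (user text, tool results, reply).
    """
    total = 0
    history = 0
    for _ in range(turns):
        total += base_tokens + history  # input sent on this turn
        history += added_per_turn       # history grows for the next turn
    return total


# Hypothetical: 10k fixed overhead, each turn adds 12k tokens of history.
print(replayed_input_tokens(turns=13, base_tokens=10_000, added_per_turn=12_000))
```

The total grows roughly with the square of the number of turns, which is why starting a fresh session or compacting resets the cost curve.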

To verify it right away:

  1. /context detail (shows what actually takes up space)
  2. /usage tokens or /usage full
  3. /status

To cut costs right away:

  • open a new session more often
  • use /compact periodically
  • lower max output/reasoning
  • if you use heartbeat: isolatedSession: true + lightContext: true (or turn it off)

If you like, paste the output of /context detail and I'll tell you exactly where the tokens are going.

green spire
#

But if they're plausible, that means the cost isn't sustainable for anyone. And it's open source.

scenic groveBOT
#

You're absolutely right in principle.
If it stays as it is, it's not sustainable for “normal” use.

The important part is this: in OpenClaw the defaults are geared toward capability, not minimum cost.
With long threads + 1M context + heartbeat, token usage can explode.

So it needs to be put into “saving mode”:

  1. Heartbeat off (or much less frequent)
  2. Shorter sessions (/new) and /compact
  3. Reduce bootstrap/context

Example low-cost config (valid):

{
  agents: {
    defaults: {
      bootstrapMaxChars: 6000,
      bootstrapTotalMaxChars: 30000,
      heartbeat: {
        every: "0m" // disabled
      }
    }
  }
}

If you want to keep heartbeat:

heartbeat: { every: "2h", isolatedSession: true, lightContext: true, target: "none" }

green spire
#

Bootstrap limits
• max/file: 20,000 chars
• max/total: 150,000 chars

Sandbox
• mode=off
• sandboxed=false

System prompt
• Run: 21,671 chars (~5,418 tok)
• Project Context: 8,138 chars (~2,035 tok)

Injected workspace files
AGENTS.md → 771 chars (~193 tok)
SOUL.md → 1,717 chars (~430 tok)
TOOLS.md → 700 chars (~175 tok)
IDENTITY.md → 654 chars (~164 tok)
USER.md → 821 chars (~206 tok)
HEARTBEAT.md → 686 chars (~172 tok)
BOOTSTRAP.md → MISSING
MEMORY.md → 2,047 chars (~512 tok)

Skills
• Total: 3,023 chars (~756 tok)
• Count: 6

List:
• gemini
• healthcheck
• node-connect
• skill-creator
• video-frames
• weather

Top skills (by size):
• skill-creator → 756 chars (~189 tok)
• node-connect → 538 chars (~135 tok)
• healthcheck → 488 chars (~122 tok)
• weather → 413 chars (~104 tok)
• video-frames → 226 chars (~57 tok)
• gemini → 218 chars (~55 tok)

#

🧠 Context breakdown (Part 2)

Tools

Tool list (prompt text):
• 1,993 chars (~499 tok)

Tool schemas (JSON):
• 15,974 chars (~3,994 tok)

Available tools

read, edit, write, exec, process, browser, canvas, message, tts,
agents_list, sessions_list, sessions_history, sessions_send,
sessions_yield, sessions_spawn, subagents, session_status,
web_search, web_fetch, memory_search, memory_get

Top tools (schema size)
• message → 5,025 chars (~1,257 tok)
• browser → 2,799 chars (~700 tok)
• sessions_spawn → 1,179 chars (~295 tok)
• exec → 1,086 chars (~272 tok)
• web_search → 1,084 chars (~271 tok)
• process → 961 chars (~241 tok)
• canvas → 661 chars (~166 tok)
• edit → 591 chars (~148 tok)
• read → 392 chars (~98 tok)
• web_fetch → 374 chars (~94 tok)
• write → 313 chars (~79 tok)

Top tools (summary size)
• browser → 1,683 chars (~421 tok)
• memory_search → 334 chars (~84 tok)
• read → 298 chars (~75 tok)
• session_status → 207 chars (~52 tok)
• sessions_spawn → 198 chars (~50 tok)
• exec → 181 chars (~46 tok)

Tools (parameter count)
• message → 94
• browser → 48
• canvas → 18
• sessions_spawn → 17
• exec → 12
• process → 12
• web_search → 9
• edit → 6
• sessions_send → 5
• read → 4

Session
• Cached tokens: 39,763
• Context max: 200,000

Inline shortcut

Inline command inside the text (e.g. hey /status)
• executed immediately
• only for allowlisted senders
• removed before the model processes the rest

scenic groveBOT
#

Perfect, this data clarifies a lot.
Here the bootstrap is not the main culprit.

From your dump:

  • base system+files+skills ≈ ~10k tokens/turn (order of magnitude)
  • over 13 messages that would be ~130k tokens
  • but you spent 1.1M ⇒ ~970k remain to be explained
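
The estimate above can be reproduced roughly from the dump. This sketch assumes the "System prompt · Run" figure already includes project context, workspace files, skills and the tool list text, and that tool schemas are sent on top of it; the dump does not state that explicitly. The ceil(chars / 4) rule is inferred from the char/token pairs in the dump:

```python
import math


def estimate_tokens(chars: int) -> int:
    """Approximate tokens as ceil(chars / 4), matching the char/token pairs in the dump."""
    return math.ceil(chars / 4)


# Character counts copied from the /context detail output.
SYSTEM_PROMPT_CHARS = 21_671   # "System prompt · Run"
TOOL_SCHEMA_CHARS = 15_974     # "Tool schemas (JSON)"

MESSAGES = 13
SPENT_TOKENS = 1_100_000       # as reported by the user

per_turn = estimate_tokens(SYSTEM_PROMPT_CHARS) + estimate_tokens(TOOL_SCHEMA_CHARS)
fixed_total = per_turn * MESSAGES
unexplained = SPENT_TOKENS - fixed_total

print(f"fixed overhead per turn: {per_turn:,} tokens")
print(f"fixed overhead over {MESSAGES} messages: {fixed_total:,} tokens")
print(f"left to explain: {unexplained:,} tokens")
```

Even if the other components were counted separately, the fixed part would stay in the low tens of thousands per turn, so most of the spend has to come from elsewhere.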

So the bulk almost certainly comes from:

  1. chat history that grows with every turn, and/or
  2. the model's deep thinking/reasoning (tokens often not visible in the final text), and/or
  3. overly long outputs.

What to do right away

  • Use /usage full for 2-3 messages and look at input vs output per turn
  • Run /new more often (separate topics)
  • Use /compact before the thread gets long
  • In the client/model, lower or disable deep thinking and limit max output tokens

If you paste 2 real lines from /usage full (input/output), I'll tell you exactly which of the 3 weighs the most.

#

Right, this closes the loop 👇

The key figure is:

  • Session cached tokens: 39,763

It means every turn already starts with ~40k input tokens (even if you type a single line).

So:

  • 13 messages × ~40k input ≈ 520k input
  • if you add output/deep thinking (e.g. 20–50k per turn), you easily reach ~1.1M total

So yes: it's “plausible”, and the problem is mostly accumulated context + reasoning output, not bootstrap/tools alone.

Immediate action

  1. run /compact now
  2. or better, /new when you change topic
  3. keep deep thinking lower/off
  4. limit max output
  5. heartbeat only with isolatedSession: true + lightContext: true (or off)

If you like, in your next message send me just one line of /usage full (input/output) and I'll tell you the exact setting to stay within budget.

green spire
#

What!? 💸 Usage cost
Session $0.0000 · 868k tokens
Today $0.0000
Last 30d $0.0000

scenic groveBOT
#

Yep — that can happen, and it’s confusing 😅

$0.0000 there is OpenClaw’s local estimate, not the provider invoice.
It shows zero when:

  1. pricing for that model/provider isn’t configured in OpenClaw, or
  2. auth mode hides dollar estimates (OAuth flows), or
  3. provider pricing is dynamic and OpenClaw only has token counts.

So this is possible:

  • 868k tokens (real)
  • $0.0000 (local estimate unavailable)

For actual billing, trust the provider dashboard.
Docs: https://docs.openclaw.ai/reference/token-use

green spire
#

33 requests, 1.2M tokens spent. That's crazy wrong.

scenic groveBOT
#

It feels wrong, but it's in line with your own stats:

  • You had ~39,763 cached context tokens per request
  • 39,763 × 33 requests = 1,312,179 input tokens as a rough baseline

So ~1.2M for 33 requests (≈36K per request) is unfortunately about what you'd expect. The baseline slightly overshoots, presumably because earlier requests in the session carried less context.
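
Here is a small sketch of that check, plus a rough cost bound using the mimo-v2-pro prices quoted earlier in the thread for the 0 < tokens ≤ 256K tier. Treating all 1.2M tokens as input is a simplifying assumption; output tokens are priced higher ($3.00 / 1M), so a real bill would depend on the input/output split:

```python
CACHED_CONTEXT_TOKENS = 39_763  # "Cached tokens" from the session dump
REQUESTS = 33
OBSERVED_TOKENS = 1_200_000     # as reported by the user

# USD per 1M tokens, from the price list pasted in the thread (lower tier).
INPUT_PRICE = 1.00
CACHED_INPUT_PRICE = 0.20

baseline = CACHED_CONTEXT_TOKENS * REQUESTS
average_per_request = OBSERVED_TOKENS / REQUESTS

# Bounds assuming every token is input: all uncached vs. all served from cache.
cost_if_uncached = OBSERVED_TOKENS / 1e6 * INPUT_PRICE
cost_if_cached = OBSERVED_TOKENS / 1e6 * CACHED_INPUT_PRICE

print(f"baseline: {baseline:,} tokens")
print(f"observed average: {average_per_request:,.0f} tokens/request")
print(f"input-only cost range: ${cost_if_cached:.2f} to ${cost_if_uncached:.2f}")
```

The gap between the two bounds shows how much prompt caching matters when the same large context is resent on every request.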

So the fix is not billing support — it’s reducing per-request context:

  1. /new now (hard reset context)
  2. /compact regularly
  3. shorter threads per topic
  4. lower reasoning/deep-thinking + max output
  5. heartbeat off, or isolated/light

Your token burn is context inflation, not random overcharging.