High Token Usage | Friends of the Crustacean 🦞🤝 | Page 1

latent chasm Mar 3, 2026, 11:16 AM

#

I am seeing very high token usage and am burning through almost $10 every 4-6 hours, depending on what I am doing. This is all from messaging my main agent through Telegram. I had several sub-agents, but they have all been disabled whilst I fault-find. I have tried the context reduction methods in the documentation, but I am still going through tokens very quickly.
Originally I was using my CGPT subscription as my model. This was with my main agent and several sub-agents all buzzing along nicely. I have since tried the Anthropic, OpenAI and OpenRouter APIs, all with similar token burn rates.
My main uses are GTD, some calendar API stuff, writing local text files and building a very small website to push to git.

Current version is 2026.3.2

At one point I had 50 active sessions, and even after closing them I still have 21 that persist.

wind bearBOT Mar 3, 2026, 11:16 AM

#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

tough flaxBOT Mar 3, 2026, 11:19 AM

#

What you’re describing (“~$10 every 4–6h” across multiple providers) usually means one of two things:

Each run is sending a huge prompt (system prompt + tool schemas + injected workspace files + long session history), so every message is expensive, or
You have background runs (cron/heartbeat/retries) happening even when you’re not actively chatting.

The good news: you can measure which one it is in ~2 minutes.

1) Measure the burn per reply (Telegram)

In your Telegram chat with the agent, run:

/usage full (or /usage tokens if full is too noisy)
then send 1 normal message like: “ping”

Paste the usage footer it prints (input/output/context tokens + any cache info).
Docs: https://docs.openclaw.ai/reference/token-use and https://docs.openclaw.ai/concepts/usage-tracking.md

Also run:

/context list

That will tell you if your baseline overhead is already massive (tool schemas + injected files).
Docs: https://docs.openclaw.ai/concepts/context.md

If you see something like “System prompt … (~8k–20k tok)” or “Tool schemas … (several thousand tok)” before you’ve even chatted much: that’s your floor cost every single message.

2) Check if you’re paying while “idle” (Gateway CLI)

On the gateway host, run:

openclaw logs --follow

Then don’t message the bot for ~2 minutes. If you still see model runs happening, you’ve got background activity (cron/heartbeat/retry loops).

Quick checks:

openclaw cron status
openclaw cron list
openclaw system heartbeat last

Runbook reference: https://docs.openclaw.ai/gateway/troubleshooting.md (see “Cron and heartbeat delivery” + command ladder)

3) About the “21 sessions that persist”

That part is usually normal.

“Closing” sessions in a UI doesn’t necessarily delete them from disk.
Session stores are retained (default pruneAfter is 30d, maxEntries 500), and cleanup is a maintenance operation.

To see what’s actually in there and whether they’re active:

openclaw sessions --active 360
openclaw sessions --all-agents
openclaw sessions cleanup --all-agents --dry-run

Docs: https://docs.openclaw.ai/cli/sessions.md and https://docs.openclaw.ai/concepts/session.md

This affects disk/housekeeping more than token burn, but it’s a good sanity check (and can reveal “cron:*” sessions etc).

4) The biggest practical lever: reduce tool schema + context size for your Telegram “daily driver”

If your Telegram agent is used for GTD/calendar/chat most of the time, don’t give it the full “coding” tool universe (browser/canvas/nodes/etc). Tool schemas count toward context on every provider.

You can:

create a “messaging-only / lightweight” agent for Telegram
keep a separate “coding” agent for website/git work

Docs (tool allow/deny + profiles):
https://docs.openclaw.ai/tools/multi-agent-sandbox-tools
https://docs.openclaw.ai/gateway/sandbox-vs-tool-policy-vs-elevated

This is often the difference between “every message costs a lot” vs “only heavy work costs a lot”.

If you paste (a) the /usage full footer from one reply and (b) the top of /context list (especially the system prompt + tool schema sizes), I can tell you pretty quickly which bucket you’re in and what the most effective next change is.

latent chasm Mar 3, 2026, 11:21 AM

#

⚙️ Usage footer: tokens.

⚙️ Usage footer: full.

💸 Usage cost
Session $179.70 · 347.8m tokens
Today $6.65
Last 30d $209.46

🧠 Context breakdown
Workspace: /Users/mattmoore/.openclaw/sandboxes/agent-main-f331f052
Bootstrap max/file: 20,000 chars
Bootstrap max/total: 150,000 chars
Sandbox: mode=all sandboxed=true
System prompt (run): 23,988 chars (~5,997 tok) (Project Context 13,023 chars (~3,256 tok))

Injected workspace files:

Skills list (system prompt text): 2,428 chars (~607 tok) (6 skills)
Skills: coding-agent, healthcheck, skill-creator, summarize, weather, they-ask-you-answer
Tool list (system prompt text): 778 chars (~195 tok)
Tool schemas (JSON): 5,506 chars (~1,377 tok) (counts toward context; not shown as text)
Tools: read, edit, write, exec, process, sessions_list, sessions_history, sessions_send, sessions_spawn, subagents, session_status, image

Session tokens (cached): 42,770 total / ctx=200000

Inline shortcut: a command token inside normal text (e.g. “hey /status”) that runs immediately (allowlisted senders only) and is stripped before the model sees the remaining message.

⚙️ Usage footer: full.

tough flaxBOT Mar 3, 2026, 11:22 AM

#

Those numbers help, but /usage cost is an aggregate report, not the per-message smoking gun.

Session $179.70 · 347.8m tokens = “this session key’s history, over time, summed up from local logs”
Today $6.65 = the only part that’s a rate signal

Two fast ways to figure out what’s actually burning money:

A) Is it spending while you’re idle? (background runs)

Note the current Today value ($6.65).
Don’t message the bot for 15–30 minutes.
Run /usage cost again.

If Today increases while idle, you almost certainly have something like cron / heartbeat / webhook / retry loop generating model calls.
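A rough way to turn the two readings into an hourly rate is sketched below. It assumes the /usage cost output keeps the "Today $..." line format pasted earlier in the thread; the second reading and the helper names are made up for illustration.

```python
import re


def parse_today(usage_text: str) -> float:
    """Extract the dollar amount from the 'Today $6.65' line of /usage cost output."""
    match = re.search(r"Today\s+\$([0-9]+(?:\.[0-9]+)?)", usage_text)
    if match is None:
        raise ValueError("no 'Today $...' line found")
    return float(match.group(1))


def idle_burn_per_hour(before: str, after: str, idle_minutes: float) -> float:
    """Dollars per hour spent during a window in which no messages were sent."""
    delta = parse_today(after) - parse_today(before)
    return delta * 60.0 / idle_minutes


# First reading is the one from the thread; the second is a hypothetical
# reading taken after 30 idle minutes.
before = "Session $179.70 · 347.8m tokens\nToday $6.65\nLast 30d $209.46"
after = "Session $180.30 · 349.0m tokens\nToday $7.25\nLast 30d $210.06"

rate = idle_burn_per_hour(before, after, idle_minutes=30)
print(f"idle burn: ${rate:.2f}/hour")
```

Any non-zero rate here points at background runs rather than chat history, since nothing was sent during the window.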

On the gateway host, paste the output of:

openclaw cron status
openclaw cron list
openclaw system heartbeat last
openclaw logs --follow

(With logs --follow, just watch for 1–2 minutes without chatting; if you see LLM runs, that’s the culprit.)

Docs: https://docs.openclaw.ai/gateway/troubleshooting.md (command ladder + cron/heartbeat section)

B) Or is every single reply huge? (prompt/context overhead)

Right now you’ve only shown the toggle confirmations (“Usage footer: full/tokens”), not an actual per-reply footer.

In Telegram:

Run /usage full
Send: ping

Paste the usage footer from the bot’s reply (the one that includes input/output/context/cache tokens)

Also run and paste:

/status
/context list

/context list will tell us if you have a massive baseline (system prompt + injected workspace files + tool schemas) making every message expensive.
Docs: https://docs.openclaw.ai/concepts/context.md and https://docs.openclaw.ai/reference/token-use

About the “21 sessions persist”

That’s normal: sessions are retained for maintenance/UX; having many doesn’t cost tokens by itself.

To see what’s actually “active” recently:

openclaw sessions --all-agents --active 360

openclaw sessions cleanup --all-agents --dry-run

Docs: https://docs.openclaw.ai/cli/sessions.md

One quick clarification that matters: when you say “burning through $10”, is that your provider dashboard spend, or OpenClaw’s /usage cost estimate? (The estimate depends on your configured pricing and can be misleading; provider spend is definitive.)

If you paste:

the per-reply /usage full footer from a single “ping”
/context list top section
whether “Today $…” rises while idle

…I can tell you which bucket you’re in and the most effective fix (background job vs giant prompts vs tool/schema overhead).

#

That /context list output is really useful — and it strongly suggests your burn is not coming from “tool schema bloat” or a huge bootstrap.

What your context breakdown says

Baseline overhead per request (before chat history) is roughly:

System prompt (run): ~6,000 tokens
- of which injected files are ~3,256 tokens
Tool schemas: ~1,377 tokens
AGENTS.md alone is ~1,965 tokens every single call (worth trimming, but it’s not catastrophic)

So your “fixed tax” per message is ~7–8k tokens. Not nothing, but it doesn’t explain $10/4–6h by itself.
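The "fixed tax" figure can be reproduced from the character counts in the /context list output. The four-characters-per-token ratio below is inferred from the numbers pasted in this thread (each reported token count matches chars / 4, rounded up); it is not a documented OpenClaw formula.

```python
import math


def estimate_tokens(chars: int) -> int:
    """Rough token estimate: about 4 characters per token, rounded up."""
    return math.ceil(chars / 4)


# Character counts copied from the /context list output in this thread.
system_prompt_chars = 23_988  # includes project context, skills list, tool list
tool_schema_chars = 5_506     # JSON schemas, counted separately from the prompt text

baseline = estimate_tokens(system_prompt_chars) + estimate_tokens(tool_schema_chars)
print(f"baseline per call: ~{baseline} tokens")
```

That lands inside the ~7-8k range quoted above, before any chat history is added.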

The real red flag is this line:

Session tokens (cached): 42,770 total

That usually means: each time you send a message, the model is seeing ~40k tokens of history, so even a trivial “what’s next?” GTD prompt costs real money.

Why it keeps growing

Your context window is showing ctx=200000. OpenClaw uses 200000 as a fallback when it can’t resolve the model’s real context window from the model catalog (docs mention this defaulting behavior). If it thinks you have a 200k window, auto-compaction won’t kick in until you’re much larger — which makes long-running “everything in one chat” sessions expensive for ages.

2 quick checks that will pinpoint the cause

1) Get a real per-reply token footer

Right now you’ve only shown the confirmation line (“Usage footer: full”), not the footer from an actual response.

In Telegram:

/usage full
send ping
paste the usage footer attached to the bot’s reply (the one with input/output/context/cache tokens)

Alternative: run /status right after the ping and paste it — it usually includes “last input/output tokens” style info.

2) Confirm whether the 200k context window is “real” or fallback

On the gateway host (or wherever you run the CLI), paste:

openclaw models status
openclaw status --deep

If your actual model is, say, 32k/128k context but OpenClaw is treating it as 200k, you’ll get way less compaction than you expect, and token usage will climb steadily.

Practical fixes (once we confirm)

Most common “this costs a fortune” fix patterns:

Split GTD vs coding into different sessions/agents
- Coding (website + git + lots of pasted code) permanently inflates user/assistant history.
- Later, your GTD chat inherits that entire history unless you /new or separate agents.
Force earlier compaction by capping effective context
- Typically via agents.defaults.contextTokens (so OpenClaw compacts sooner instead of happily carrying 40k+ forever).
- Exact number depends on your model (often 32k or 64k is a good sanity cap for “daily driver” chats).

Trim AGENTS.md
- You’re paying ~2k tokens every call just to inject that file.
- Move long “how I like things” into memory files / notes that aren’t injected every run.
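To illustrate why capping the effective context (the agents.defaults.contextTokens idea above) brings compaction forward, here is a toy model. The trigger rule and the 0.8 ratio are hypothetical and chosen only for illustration; OpenClaw's real compaction logic may work differently.

```python
from typing import Optional


def effective_window(model_window: int, context_cap: Optional[int] = None) -> int:
    """Window the agent budgets against: the model's window, optionally capped."""
    return model_window if context_cap is None else min(model_window, context_cap)


def would_compact(session_tokens: int, window: int, trigger_ratio: float = 0.8) -> bool:
    """Hypothetical rule: compact once history reaches trigger_ratio of the window."""
    return session_tokens >= trigger_ratio * window


session_tokens = 42_770  # "Session tokens (cached)" from the /context list output

uncapped = would_compact(session_tokens, effective_window(200_000))
capped = would_compact(session_tokens, effective_window(200_000, context_cap=32_000))
print(f"uncapped 200k window compacts: {uncapped}")
print(f"capped at 32k compacts: {capped}")
```

Under this toy rule the same 42k session sits untouched in a 200k window but is already overdue for compaction with a 32k cap.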

If you paste the per-reply footer from the ping + your /status after it (and the model shown there), I can tell you whether you’re paying for huge input tokens each message, hidden reasoning/output, or background activity.

latent chasm Mar 3, 2026, 11:26 AM

#

Choose off, tokens, full, or cost for /usage.

⚙️ Usage footer: full.

ping

pong
Usage: ? in / 5 out · session agent:main:telegram:default:direct:1810982771

/status

🦞 OpenClaw 2026.3.2 (85377a2)
🧠 Model: openrouter/anthropic/claude-3.5-sonnet · 🔑 api-key sk-or-v1...67dea5ef (openrouter:default)
↪️ Fallback: anthropic/claude-sonnet-4-20250514 · 🔑 api-key sk-ant-a...gVczIwAA (anthropic:default) (billing)
🧮 Tokens: 0 in / 5 out
🗄️ Cache: 0% hit · 0 cached, 42k new
📚 Context: 42k/200k (21%) · 🧹 Compactions: 6
🧵 Session: agent:main:telegram:default:direct:1810982771 • updated just now
⚙️ Runtime: docker/all · Think: off
🪢 Queue: collect (depth 0)

tough flaxBOT Mar 3, 2026, 11:28 AM

#

That output basically explains the whole burn:

Your session is 42k tokens of context (📚 Context: 42k/200k)
And you’re getting 0% cache hit with “0 cached, 42k new”

So even a dumb ping → pong is forcing the provider to process ~42k tokens of input every time. If you send lots of short Telegram messages, that adds up fast.
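As a back-of-the-envelope check, the sketch below prices a single uncached 42k-token turn. The $3 per million input tokens is an assumed price for illustration; check your provider's current price list, and note that output tokens are ignored.

```python
INPUT_PRICE_PER_MTOK = 3.00  # assumed USD per million input tokens


def input_cost(tokens: int, price_per_mtok: float = INPUT_PRICE_PER_MTOK) -> float:
    """Cost in USD of sending `tokens` uncached input tokens."""
    return tokens * price_per_mtok / 1_000_000


per_message = input_cost(42_000)
messages_per_10_dollars = int(10 / per_message)
print(f"${per_message:.3f} per message, about {messages_per_10_dollars} messages per $10")
```

At that assumed price, roughly 80 short messages are enough to spend $10, which is in line with a busy few hours of Telegram chat.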

Also note: Usage: ? in / 5 out means OpenRouter isn’t returning input token counts in a way OpenClaw can show, so /usage full won’t be very illuminating on that provider. /status is giving you the real clue (42k context + no cache).

The fastest sanity test

In Telegram, do:

/new
ping
/status

If the problem is “session bloat”, you should see context drop from ~42k to something like <10k immediately (mostly system prompt + your few messages).

If that drops your “spend per message” feeling, you’ve found the core issue: you were using one long-lived chat for GTD + coding + everything, so every tiny GTD question drags all the old coding convo along.

Fixes that actually move the needle

1) Separate GTD vs coding (most effective)

Keep a “daily driver” GTD chat small, and do coding/website work in a separate session/agent.

Practical workflow:

Before you start coding: /new (or do it in a separate agent/chat)
After a big coding session: don’t go back to GTD in the same session key unless you /new again

2) Get prompt caching working (your cache is currently doing nothing)

Right now: 0 cached, 42k new every turn = worst case.

Two things to try:

A) Switch primary away from OpenRouter to direct Anthropic (since you already have a billed Anthropic key as fallback).

Direct Anthropic caching support is solid per docs: https://docs.openclaw.ai/reference/prompt-caching

B) Explicitly set cacheRetention in config for the model(s) you use (this is what tells OpenClaw to cache stable blocks):

Docs: https://docs.openclaw.ai/reference/prompt-caching

After enabling, do two quick turns (ping, then ping again) and check /status — you want to see cache hit > 0% and non-zero cached tokens.

3) Trim the always-injected stuff (nice-to-have)

Your /context list shows:

AGENTS.md is ~2k tokens every call

Not the main problem, but it’s “always on” overhead. If AGENTS.md has a lot of narrative/old instructions, slimming it down helps every single message.

About the “21 sessions persist”

That’s normal retention; it doesn’t directly cost tokens unless something is actively running in them.

If you want to inspect/clean up safely:

openclaw sessions --all-agents --active 360
openclaw sessions cleanup --all-agents --dry-run

Docs: https://docs.openclaw.ai/cli/sessions.md

If you do the /new → ping → /status test and paste the new /status, we can confirm in one shot whether this is purely “long session context”, and whether caching is actually kicking in after config/provider changes.

latent chasm Mar 3, 2026, 11:30 AM

#

🦞 OpenClaw 2026.3.2 (85377a2)
🧠 Model: openrouter/anthropic/claude-3.5-sonnet · 🔑 api-key sk-or-v1...67dea5ef (openrouter:default)
↪️ Fallback: anthropic/claude-sonnet-4-20250514 · 🔑 api-key sk-ant-a...gVczIwAA (anthropic:default) (billing)
🧮 Tokens: 2 in / 172 out
🗄️ Cache: 98% hit · 12k cached, 210 new
📚 Context: 12k/200k (6%) · 🧹 Compactions: 0
🧵 Session: agent:main:telegram:default:direct:1810982771 • updated just now
⚙️ Runtime: docker/all · Think: off
🪢 Queue: collect (depth 0)

This is after the /new > ping > /status sequence.

tough flaxBOT Mar 3, 2026, 11:31 AM

#

Perfect — that confirms it.

What changed after `/new`

Before:

Context: 42k
Cache: 0% hit · 0 cached, 42k new
Result: every tiny message was effectively paying to re-send/re-process ~42k tokens

After /new:

Context: 12k (6%)
Cache: 98% hit · 12k cached, 210 new
Result: now each new message is mostly paying only for the delta (a couple hundred tokens), not the whole history
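The before/after difference can be put in rough dollar terms. Both prices below are assumptions for illustration ($3 per million uncached input tokens, cached reads at a tenth of that), and the sketch treats the "new" tokens as plain uncached input, which ignores any cache-write premium a provider may charge.

```python
INPUT_PRICE = 3.00       # assumed USD per million uncached input tokens
CACHE_READ_PRICE = 0.30  # assumed USD per million cached tokens read


def turn_cost(new_tokens: int, cached_tokens: int) -> float:
    """Approximate input cost in USD of one turn."""
    return (new_tokens * INPUT_PRICE + cached_tokens * CACHE_READ_PRICE) / 1_000_000


# Token counts taken from the two /status outputs in this thread.
before = turn_cost(new_tokens=42_000, cached_tokens=0)
after = turn_cost(new_tokens=210, cached_tokens=12_000)

print(f"before /new: ${before:.4f} per turn")
print(f"after /new:  ${after:.4f} per turn")
print(f"ratio: about {before / after:.0f}x")
```

Under these assumptions the same "ping" is well over an order of magnitude cheaper once the session is small and the cache is hitting.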

So the root cause of the “$10 every 4–6h” feeling is overwhelmingly:

You were running daily GTD + calendar + coding + file work inside one long-lived Telegram session, and it had grown to ~40k+ context.
Even though caching exists, your earlier turns weren’t hitting it (likely because the prompt/history kept changing enough or you were outside TTL / cache conditions), so you were repeatedly writing big cache chunks / processing big prompts.

What to do going forward (practical + low effort)

1) Treat `/new` as your “start of a new project” button

Starting a coding/build task? /new first.
Done coding and going back to GTD? /new again (or keep a separate GTD agent/session).

This alone will keep you out of the 40k+ context trap.

2) Keep two agents (recommended)

GTD agent (Telegram default): minimal tools, short context, cheap model if you want.
Coding agent: full tools, larger context, used only when you’re actually coding.

This avoids “my todo chat is dragging 2000 lines of code history forever”.

Docs for multi-agent tool profiles:
https://docs.openclaw.ai/tools/multi-agent-sandbox-tools

3) Make compaction/pruning more aggressive for the daily driver

Even with caching, long history still creeps up and eventually costs you again.
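A simple model shows why history creep matters: if every turn resends the whole history, the total input processed grows roughly with the square of the number of turns. The baseline and per-turn growth below are illustrative assumptions, and the sketch ignores both compaction and cache discounts.

```python
def cumulative_input_tokens(baseline: int, growth_per_turn: int, turns: int) -> int:
    """Total input tokens processed over `turns` messages when history is never reset.

    Turn k (starting at 0) sends the fixed baseline plus k turns' worth of history.
    """
    return sum(baseline + growth_per_turn * k for k in range(turns))


# Assumed numbers: ~7k fixed overhead, ~500 tokens of history added per turn.
short_session = cumulative_input_tokens(baseline=7_000, growth_per_turn=500, turns=20)
long_session = cumulative_input_tokens(baseline=7_000, growth_per_turn=500, turns=80)

print(f"20 turns: {short_session:,} input tokens")
print(f"80 turns: {long_session:,} input tokens")
print(f"4x the turns costs {long_session / short_session:.1f}x the input")
```

Resetting or compacting the session periodically keeps each segment on the cheap, near-linear part of that curve.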

Good knobs to look at:

Compaction (keeps the convo useful but summarized): https://docs.openclaw.ai/concepts/compaction.md
Session pruning (tool results): https://docs.openclaw.ai/concepts/session-pruning.md
Token/caching overview: https://docs.openclaw.ai/reference/token-use and https://docs.openclaw.ai/reference/prompt-caching

If you want, paste your openclaw.json agents + models section (redact keys) and I can suggest a sane “GTD cheap + coding expensive” split.

About the 21 sessions persisting

Now that we’ve proven the cost is session-context, those “persisting sessions” are mostly just stored metadata/history. They aren’t costing you money unless something is actively running in them.

If you still want to clean them up:

openclaw sessions --all-agents

openclaw sessions cleanup --all-agents --dry-run

Docs: https://docs.openclaw.ai/cli/sessions.md

If this fixed the main issue, please mark the solution message in the thread (right-click → Apps → ✅ Mark Solution) so Answer Overflow indexes it for the next person.

latent chasm Mar 3, 2026, 11:43 AM

#

I have had several agents set up, but I was running into context issues between different projects, and my main agent seemed to sort all of that out. I have been using him a lot, though, for 3-4 different projects and for migrating OpenClaw to a new machine. I'm considering starting again with a fresh install on the fresh machine.

tough flaxBOT Mar 3, 2026, 11:45 AM

#

A totally fresh install on the new machine is fine, but it won’t really fix the thing that was costing you money unless you also change how you separate “projects”.

What you just proved with /new → ping → /status is:

the same agent becomes cheap again as soon as the session history is small
your expensive days were basically “one mega-session doing 3–4 projects + migration”, so every tiny GTD ping was dragging tens of thousands of tokens along

Better than “start over”: pick a separation strategy

You’ve got a few good options:

Option A (lowest friction): one chat per project

Keep one agent, but don’t mix projects in the same DM thread:

For each new project: /new
If you want true parallel threads, create separate Telegram group chats (or a private group with topics/forums) and talk to the bot there — each group/topic becomes a separate session key.

Telegram session isolation notes are in the channel docs (group IDs + topics):

https://docs.openclaw.ai/channels/telegram

Option B (cleanest): one agent per project (recommended)

This is what multi-agent is designed for: each agent has its own workspace + sessions + auth store, so “Project A” never bleeds into “Project B”.

Docs: https://docs.openclaw.ai/concepts/multi-agent

Quick workflow:

openclaw agents add gtd
openclaw agents add proj-a
openclaw agents add proj-b
openclaw agents list --bindings

Then either:

use multiple Telegram bots (one per agent) and bind each accountId to an agent (example in the multi-agent doc), or

bind different Telegram peers/groups to different agents.

CLI binding helper: https://docs.openclaw.ai/cli/agents.md

If you are migrating to a new machine

If the goal is “move everything without re-onboarding”, follow the migration guide and copy the whole state dir + workspace:

https://docs.openclaw.ai/install/migrating

Key point: copying the full ~/.openclaw/ brings sessions too — so if you want the new machine to feel “fresh”, you can migrate config/auth/workspace but deliberately not bring over old sessions (otherwise you’ll just inherit the same giant histories again).

If you tell me which of these you prefer:

multiple Telegram bots (one per agent), or
one bot but separate project groups/topics, or

one DM and you’re happy to use /new as a hard boundary

…I can suggest the simplest concrete setup for your case.

latent chasm Mar 3, 2026, 11:55 AM

#

I have tried multiple agents, tasking them with different projects, but the burn rate was high.
I have tried one supergroup with all agents and a topic for each project, but the burn rate was also high.
I have saved contexts to paste before tasking each agent to do something, but usage is still high.
Each agent is based on a different individual: a task manager, a marketing guy, a customer service rep, a health coach, and one main agent to oversee everything.
I'm early enough in that I'm happy to completely restructure my agent / sub-agent strategy to optimise token usage. What do you recommend?

tough flaxBOT Mar 3, 2026, 11:56 AM

#

If you want to optimize token burn, the #1 thing is to stop “one message → many LLM calls” and stop dragging giant histories into tiny GTD turns.

From what you showed earlier, the burn wasn’t “OpenRouter vs Anthropic vs OpenAI” — it was context growth + too much parallel agent activity.

Here’s the structure that tends to be cheapest and usable:

Recommended setup (cost-first, still powerful)

1) One Daily Driver agent (GTD / calendar / quick questions)

Goal: small context, minimal tools, minimal output.

Use it in one dedicated Telegram DM/thread.
Don’t let it code, don’t let it do big web research, don’t let it run long multi-step plans.
Use a cheaper model here if you can tolerate it.

Key habits:

/new when you switch “projects”

/compact occasionally when it gets rambly (compaction persists and shrinks history): https://docs.openclaw.ai/concepts/compaction.md

2) One Builder agent (coding / files / git)

Goal: this is where big context is allowed — but it should be isolated.

Put coding in a separate agent and preferably a separate Telegram bot or separate chat so it can’t pollute your GTD chat.
This agent can have the full toolset, and a bigger model.

Multi-agent routing overview: https://docs.openclaw.ai/concepts/multi-agent
CLI helpers: https://docs.openclaw.ai/cli/agents.md

3) Treat “marketing guy / health coach / customer support rep” as modes, not always-on agents

What gets expensive fast is: “main overseer” spawning 3–6 persona agents for every request. That multiplies cost even if each agent is efficient.

Cheaper pattern:

Keep one Daily Driver agent and use short “mode prompts” when needed:
- “Switch to marketing mode for the next answer: …”

- “Answer like a health coach, max 10 bullets, no long explanations.”
Only spawn a specialist agent when you’re doing a real deep task.

If you do keep multiple persona agents, avoid the “supergroup with all agents” pattern unless you have strict mention gating; otherwise you can easily end up with multiple agents responding / building context in parallel.
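The cost gap between fanning a request out to persona agents and using one agent with a short mode prompt can be sketched as follows. The 7k baseline echoes the per-call overhead measured earlier in the thread; the agent count and the 50-token mode prompt are illustrative assumptions.

```python
def tokens_for_request(baseline_per_agent: int, agents_invoked: int, extra_prompt: int = 0) -> int:
    """Input tokens for one user request, summed over every agent that runs."""
    return agents_invoked * (baseline_per_agent + extra_prompt)


BASELINE = 7_000  # roughly the fixed per-call overhead measured earlier in the thread

# Hypothetical fan-out: an overseer plus four persona agents all process the request.
fan_out = tokens_for_request(BASELINE, agents_invoked=5)

# One agent, with a short "answer in marketing mode" instruction added to the message.
mode_prompt = tokens_for_request(BASELINE, agents_invoked=1, extra_prompt=50)

print(f"fan-out: {fan_out:,} tokens, mode prompt: {mode_prompt:,} tokens")
```

This counts only fixed overhead; each persona agent also accumulates its own history, so the real gap tends to widen over time.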

Hygiene knobs that prevent “silent re-bloating”

A) Auto-reset stale sessions (optional but effective)

If you tend to come back to an old thread days later, add an idle reset so the session doesn’t quietly grow forever. Session docs: https://docs.openclaw.ai/concepts/session.md

B) Keep injected files small

Your /context list showed AGENTS.md is ~2k tokens every call. Fine, but don’t put huge pasted “project context” into injected files. Instead: store project briefs in separate notes/files that are only read when needed.

C) Keep caching effective

When caching works, you saw it drop to “210 new” tokens — that’s the happy path.
Prompt caching knobs: https://docs.openclaw.ai/reference/prompt-caching

About “fresh install on the new machine”

Not required to fix burn rate. If you migrate, you can choose to not bring old sessions, which effectively gives you a clean slate without losing config/auth.

Migration guide: https://docs.openclaw.ai/install/migrating

If you answer just two questions, I can make this concrete (bots/bindings/models) for your setup:

Do you want multiple Telegram bots (one per agent), or keep one bot and separate by groups/topics?
Is your “main overseer” currently spawning specialist agents on most messages, or only occasionally?

latent chasm Mar 4, 2026, 8:28 AM

#

Do you recommend having one bot per agent that is specialised in one skill, or one bot per project, loading individual skills for that project as needed?
The main overseer hasn't spawned any specialist agents itself; it has only suggested them in conversation, e.g. saying that if I want to do marketing I should have a marketing bot. The overseer very rarely instructs the other agents to do things. I usually copy and paste a prompt from it into the relevant agent's Telegram chat.

tough flaxBOT Mar 4, 2026, 8:29 AM

#

Between those two, “one bot per project” wins for token efficiency most of the time.

Reason: tokens are dominated by (a) session history and (b) whatever context you have to re-introduce. If you do “one bot per persona/skill”, you end up re-feeding the same project context to multiple bots (often via C&P), and each bot builds its own long-running history across multiple projects anyway.

What I recommend instead (best tradeoff)

Think in two axes:

Project isolation (prevents “coding junk” contaminating GTD)
Tool/model profile isolation (prevents your GTD chat from hauling coding tools/policies)

So the cheapest/cleanest structure is usually:

A) 1 “Daily Driver” bot (GTD + calendar only)

Minimal tools, minimal skills, cheaper model if you want.
You keep this chat “clean” and never do long coding sessions in it.

B) 1 “Builder” bot (coding / files / git)

Full tools, bigger model if needed.
Use one topic/chat per project inside this bot.

If you like Telegram topics: a private supergroup with forum topics per project is great because each topic is a separate session key (so contexts don’t mix). Just make sure you’re not pasting huge “saved contexts” into every topic.

Docs for multi-agent + routing (if you want separate bots/agents):
https://docs.openclaw.ai/concepts/multi-agent
https://docs.openclaw.ai/cli/agents.md
Telegram topics/session isolation notes: https://docs.openclaw.ai/channels/telegram

Where do “marketing / health coach / customer support” fit?

Given your overseer doesn’t actually dispatch the specialists and you’re mostly C&P’ing prompts, don’t run them as permanent separate agents unless you really need long-lived memory in that role.

Cheaper approach:

Keep them as short “modes” inside the project bot:
- “Marketing mode: write 3 landing page variants…”
- “Customer support mode: draft a reply…”

If one role truly needs continuity (e.g. health coaching), then yes: a dedicated role-bot can make sense because it’s not project-specific.

Quick rule of thumb

If something depends on project state (codebase, roadmap, decisions): project bot/topic
If something depends on personal long-term behavior (health coaching): role bot
If it’s just writing style: mode prompt inside one bot, not a whole agent

One more important note

“Saved contexts to paste” is usually a silent token killer. Better:

keep a 10–20 line project brief and update it, or
store project notes in a file and have the Builder bot read it when needed (instead of pasting walls of text)

If you tell me roughly how many projects you run concurrently (2? 5? 10?) and whether you prefer one bot + topics or multiple bots, I’ll suggest an exact minimal setup (number of bots/agents, what each one does, and what not to mix) that keeps your context from creeping back to 40k+.

latent chasm Mar 7, 2026, 9:01 PM

#

OK, I have a few projects on the go, but I am not smart enough to work on multiple projects concurrently, so I only work on one at a time. I have limited context switching and frequently compact sessions, which has helped. Today, though, my agent has been on a spending spree, sending hundreds of single-line replies instead of one large one, with huge contexts that failed to compact properly. I asked him to save the context to memory so I could start a new session. When I did, he went from being a smart 25-year-old guy to a stroppy teenager who had lost his read and write access.

tough flaxBOT Mar 7, 2026, 9:02 PM

#

That symptom bundle usually comes from one (or more) of these three things:

A retry/compaction loop (gateway keeps trying → compacts → retries → fails → tries again), which can rack up spend fast and also produce lots of short “status-y” replies.
Telegram “block streaming” / chunking (one model run, but it gets delivered as tons of tiny messages).
You accidentally switched into a different agent/workspace/tool profile after /new (so the persona changes and tools like read/write appear “gone”).

Given you said they “lost read/write access” and the personality flipped, #3 is very plausible.

1) Confirm whether you actually lost tools, or the model is just refusing

In the “stroppy teenager” chat, run:

/status
/context list

You’re looking for two lines:

🧵 Session: agent:<agentId>:... (did the agentId change?)

In /context list, the Tools: list — do read/write still appear?

If read/write are not listed there, it’s a real tool policy/binding change.
If they are listed but the bot claims it can’t use them, it’s usually persona/system-prompt drift (or the model is being weird), not actual capability loss.

If the agentId changed unexpectedly, that’s a routing/bindings issue (easy to fix once we see the /status).

2) Check if you accidentally “memory-dumped” into an auto-injected file

One common footgun: telling the agent “save all context to memory” and it writes a massive blob into MEMORY.md / memory.md.

Those files are auto-injected into every run (up to the bootstrap cap), which can:

jack up your baseline tokens again immediately, even after /new
make the model act different because it’s reading a giant new instruction/context blob every turn

Docs on what gets injected: https://docs.openclaw.ai/reference/token-use
If you want to store big project briefs, put them in a normal note file (or memory/*.md on-demand files), not the always-injected memory root.
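To see what an oversized auto-injected file costs, the sketch below applies the 20,000-character "Bootstrap max/file" cap from the earlier /context list output and the rough four-characters-per-token ratio. It assumes a file is simply truncated at the cap; the file sizes are illustrative.

```python
import math

BOOTSTRAP_MAX_PER_FILE = 20_000  # chars, from the "Bootstrap max/file" line earlier in the thread


def injected_tokens(file_chars: int, cap: int = BOOTSTRAP_MAX_PER_FILE) -> int:
    """Tokens added to every call by one auto-injected file (about 4 chars per token)."""
    return math.ceil(min(file_chars, cap) / 4)


small_brief = injected_tokens(1_200)   # assumed size of a short project brief
memory_dump = injected_tokens(60_000)  # a large "save everything" dump, capped at 20,000 chars

messages = 100
print(f"brief: {small_brief} tokens per call, {small_brief * messages:,} over {messages} messages")
print(f"dump:  {memory_dump} tokens per call, {memory_dump * messages:,} over {messages} messages")
```

Even with the cap, a large dump adds thousands of tokens to every single turn, including the first one after /new.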

3) If you literally got “hundreds of single-line replies”

Two quick checks:

A) Make sure it’s not delivery/streaming mode

In Telegram, streaming can be partial vs block. Block-mode can look like “a million tiny replies”.
Telegram streaming docs: https://docs.openclaw.ai/channels/telegram (see channels.telegram.streaming)

B) Make sure it’s not multiple bots responding

If you have more than one Telegram bot in the same group/topic and requireMention is off, you can get a reply storm that looks like one agent going crazy.

4) If it’s a real compaction/retry loop (actual spend explosion)

On the gateway host, the fastest way to confirm is watching logs while it happens:

openclaw logs --follow
