# Token consumption tricks
If you're on Claude Max (claude.ai), "token consumption" mostly shows up as: fewer messages before you hit the usage limit, slower responses, and the model "forgetting" because context gets too big. The big wins are all about keeping the input context small and forcing short outputs.
Below is the full bag of tricks, ordered roughly by impact.
1) Biggest wins (usually 5–20× fewer tokens)
A) Start a new chat aggressively
Long chats silently get expensive because each new message can drag a lot of prior context along.
- New topic? New chat
- Same topic but different subproblem? New chat
- Debugging with lots of attempts? New chat after you've learned the key facts
Prompt to carry over without dragging history:
"We're starting fresh. Here are the only facts you should assume: (bullets). Ask 1–2 questions if anything critical is missing."
B) Replace history with a "compressed state"
Instead of letting the model re-read 50 turns, maintain a tiny "state" block you paste each time.
Template (copy/paste):
STATE (keep; overwrite as needed):
Goal: …
Constraints: …
Current status: …
Decisions made: …
Open questions: …
TASK NOW: …
OUTPUT FORMAT: (e.g., 6 bullets, max 120 words)
This is the single most reliable way to cut tokens dramatically.
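If you keep the state in a script or a notes tool rather than retyping it, a small helper can render the block so the fields always come out in the same order. This is only a sketch: the function name `render_state` and the example values are made up for illustration.

```python
def render_state(state: dict, task: str, output_format: str) -> str:
    """Render the STATE template above as plain text ready to paste."""
    fields = ["Goal", "Constraints", "Current status",
              "Decisions made", "Open questions"]
    lines = ["STATE (keep; overwrite as needed):"]
    for name in fields:
        # Missing fields stay visible as "(none)" so gaps are obvious.
        lines.append(f"{name}: {state.get(name, '(none)')}")
    lines.append(f"TASK NOW: {task}")
    lines.append(f"OUTPUT FORMAT: {output_format}")
    return "\n".join(lines)


prompt = render_state(
    {"Goal": "fix flaky login test", "Current status": "fails 1 in 5 runs"},
    task="suggest the next experiment",
    output_format="6 bullets, max 120 words",
)
print(prompt)
```

Overwrite the dictionary as the work progresses and paste the rendered block at the top of each new chat.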
C) Don't paste raw logs / code / articles; paste only the minimal excerpt
If you paste 2,000 lines, you pay for 2,000 lines, every time the model needs them again.
- Paste only the failing section
- Include file name + function name + ~50 lines around the issue
- For logs: include first error + stack trace + 30 lines before/after
Prompt:
"I'm pasting an excerpt. Do not ask for the full file unless absolutely necessary. If you need more, specify exactly which lines/range."
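For logs, the "first error plus 30 lines either side" rule is easy to automate before pasting. A rough sketch, assuming the first line that matches a simple error-like pattern is the one you care about; the function name and the default pattern are illustrative, not part of any tool.

```python
import re


def excerpt_around_first_error(log_text: str, context: int = 30,
                               pattern: str = r"error|exception|traceback") -> str:
    """Return the first matching line plus `context` lines before and after."""
    lines = log_text.splitlines()
    matcher = re.compile(pattern, re.IGNORECASE)
    for index, line in enumerate(lines):
        if matcher.search(line):
            start = max(0, index - context)
            end = min(len(lines), index + context + 1)
            return "\n".join(lines[start:end])
    return ""  # no error-like line found


# Tiny demo: a 100-line log with one error on line 50.
log = "\n".join(f"line {n}" for n in range(100))
log = log.replace("line 50", "line 50 ERROR: connection refused")
excerpt = excerpt_around_first_error(log, context=2)
print(excerpt)
```

The stack trace normally follows the first error line, so the "after" window usually captures it; widen `context` if your traces are long.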
D) Force short outputs (hard cap)
Most people say "be concise" and still get essays. Put a hard budget and a stop rule.
Examples:
- "Answer in ≤120 tokens. If you can't, ask me 1 question."
- "Give exactly 7 bullets, no intro, no recap."
- "Return only: JSON with keys root_cause, fix, next_steps."
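If you consume the JSON-only reply in a script, it helps to reject anything that drifts from the agreed shape instead of reading the extra prose. A minimal sketch, assuming the reply is a single JSON object; `parse_reply` is a hypothetical helper name.

```python
import json

REQUIRED_KEYS = {"root_cause", "fix", "next_steps"}


def parse_reply(reply_text: str) -> dict:
    """Parse the model's reply and insist on exactly the three agreed keys."""
    # json.loads raises json.JSONDecodeError (a ValueError) on non-JSON text.
    data = json.loads(reply_text)
    if not isinstance(data, dict) or set(data) != REQUIRED_KEYS:
        raise ValueError(f"expected exactly the keys {sorted(REQUIRED_KEYS)}")
    return data


reply = '{"root_cause": "stale cache", "fix": "clear it", "next_steps": ["add a test"]}'
print(parse_reply(reply)["root_cause"])
```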
2) Model / settings choices (very high leverage)
A) Avoid "extended thinking / deep reasoning" unless you truly need it
In Claude products, more thinking generally means more internal work and often longer outputs. Use the lightest mode that works.
B) Prefer smaller/faster models for most turns
Use the best model only for the "hard steps":
- brainstorming / outlining / summarizing: smaller model
- final tricky reasoning / final answer: bigger model
C) Split work: cheap pass → expensive pass
- Cheap: "extract facts, detect unknowns, propose plan"
- Expensive: "solve only the hard part with the cleaned facts"
This prevents paying expensive-model rates to read messy context.
3) Conversation tactics that reduce back-and-forth (saves both tokens + messages)
A) Ask the model to ask you questions first
This avoids 3–6 turns of wrong assumptions.
Prompt:
"Before solving, ask up to 3 clarifying questions. If you can proceed, say 'No questions' and proceed."
B) Use "two-step" outputs: plan then execute (but keep plan tiny)
Prompt:
"Step 1: 5-bullet plan (max 60 words). Step 2: execute."
C) Stop when sufficient
Prompt:
"Stop as soon as you have a working answer. Don't provide extra alternatives unless I ask."
4) Make your inputs more compressible
A) Use identifiers and references instead of repeating text
Instead of re-pasting a long spec, do:
- "Spec v3 is unchanged"
- "Constraint C2 still applies"
- "Use the same assumptions as above"
Or create a short "facts list" and refer to bullet numbers:
- "Assume F1–F7; update F3 to …"
B) Use structured input formats
Models waste tokens parsing prose.
Good:
- bullets
- tables
- key/value blocks
- JSON (if you can tolerate it)
Bad:
- long narrative paragraphs
- repeated context in different words
5) Output-format tricks (prevents token bloat)
A) For code: ask for diffs or minimal patches
Prompt:
"Give a minimal patch only (diff-style). No full-file rewrites."
B) For explanations: demand "justification size limits"
Prompt:
"Explain the why in 2 bullets max."
C) For research: ask for "answer + citations only"
Prompt:
"Return: (1) conclusion (2) 3 supporting bullets (3) sources list. Nothing else."
6) When you must include large context: chunking and retrieval
A) Chunk input and ask for extraction
Instead of "read all this and solve", do:
- Paste chunk 1 → "extract only relevant facts; ignore everything else"
- Paste chunk 2 → same
- Then give the extracted facts back and ask for solution
Yes, it's multiple turns, but you avoid repeatedly carrying giant raw text forward.
B) "Quote only what you use"
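The chunk-and-extract loop can be prepared ahead of time. Here is a sketch that splits a document on line boundaries and prefixes each chunk with the extraction instruction; the character budget is an arbitrary stand-in for a token budget, and `chunk_lines` is a hypothetical helper.

```python
def chunk_lines(text: str, max_chars: int = 4000) -> list[str]:
    """Split text into chunks of whole lines, each at most max_chars long
    (a single line longer than max_chars becomes its own oversized chunk)."""
    chunks, current, size = [], [], 0
    for line in text.splitlines():
        cost = len(line) + 1  # +1 for the newline
        if current and size + cost > max_chars:
            chunks.append("\n".join(current))
            current, size = [], 0
        current.append(line)
        size += cost
    if current:
        chunks.append("\n".join(current))
    return chunks


EXTRACT_PROMPT = "Extract only relevant facts; ignore everything else.\n\n"

document = "\n".join(f"row {n}: some text" for n in range(10))
prompts = [EXTRACT_PROMPT + chunk for chunk in chunk_lines(document, max_chars=60)]
print(len(prompts))
```

Send each prompt in its own turn (ideally its own chat), collect the extracted facts, and only then ask for the solution.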
Prompt:
"Only quote the exact lines you rely on (max 10 lines total)."
This keeps outputs tight and prevents the model from reprinting your whole input.
7) Practical copy/paste prompts (steal these)
Ultra-low token mode
"Ultra-low-token mode. Output ≤90 tokens. No preamble. If missing info, ask 1 question only."
Debug mode (minimal)
"Find the root cause and fix. Output exactly:
- Root cause (1 sentence)
- Fix (numbered steps, max 5)
- Verification (max 3 steps)"
"Compressed state" generator (use once, then keep using the state)
"Summarize this conversation into a STATE block ≤150 words with: Goal, Constraints, Current status, Decisions, Open questions. This STATE should replace the entire chat history."
Quick diagnostic question (so I can tailor this)
When you say "token consumption," do you mean:
- claude.ai Max usage limits (messages run out), or
- API billing (you pay per token), or
- OpenClaw/agent runs (tools + long context causing big prompts)?
Tell me which one, and what your typical workflow is (coding? research? chatty back-and-forth?), and I'll give you a tighter "do this, not that" setup.
I am talking about decreasing token consumption during heartbeats
Heartbeats in OpenClaw are full agent turns (full system prompt + session history + tools), so the only dramatic token reductions come from either running them less or skipping the model call entirely.
Here are the best levers (highest impact first), with the exact knobs that exist today:
1) Make OpenClaw skip the heartbeat run (no model call)
A) Create HEARTBEAT.md but keep it "effectively empty"
From the heartbeat docs: if HEARTBEAT.md exists but is basically empty (blank lines / headers / comments), OpenClaw skips the heartbeat run to save API calls. If the file is missing, the heartbeat still runs.
Docs: https://docs.openclaw.ai/gateway/heartbeat#heartbeatmd-optional
So: create HEARTBEAT.md and leave only e.g.
# HEARTBEAT.md
# (empty on purpose)
B) Turn off heartbeat visibility everywhere (also skips the run)
If all three are false, OpenClaw skips the heartbeat run entirely:
showOk: false
showAlerts: false
useIndicator: false
Docs: https://docs.openclaw.ai/gateway/heartbeat#what-each-flag-does
(Useful if you enabled heartbeats globally but want to hard-disable them for a channel/account.)
2) Reduce how often heartbeats happen
In config:
{
  agents: {
    defaults: {
      heartbeat: {
        every: "2h" // or "4h"; "0m" disables
      }
    }
  }
}
Docs: https://docs.openclaw.ai/gateway/heartbeat#defaults
Also consider activeHours so you're not burning turns overnight:
{
agents: { defaults: { heartbeat: { every: "1h", activeHours: { start: "09:00", end: "22:00" } } } }
}
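To see how much `every` and `activeHours` matter, count the runs per day. This is back-of-the-envelope arithmetic only: it assumes evenly spaced runs and a window that does not cross midnight, and it does not model how OpenClaw actually treats window boundaries.

```python
def runs_per_day(every_minutes: int, start: str = "00:00", end: str = "24:00") -> int:
    """Rough count of heartbeat runs per day inside an active window."""
    def to_minutes(hhmm: str) -> int:
        hours, minutes = hhmm.split(":")
        return int(hours) * 60 + int(minutes)

    window = to_minutes(end) - to_minutes(start)
    return window // every_minutes


print(runs_per_day(30))                    # every 30m, all day   -> 48
print(runs_per_day(60, "09:00", "22:00"))  # every 1h, 09:00-22:00 -> 13
print(runs_per_day(120))                   # every 2h, all day    -> 12
```

Going from a 30-minute heartbeat around the clock to an hourly one with active hours cuts the number of full agent turns from roughly 48 to roughly 13 per day, before any per-run savings.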
3) Use a cheaper model for heartbeat runs
Heartbeat supports a dedicated model override:
{
  agents: {
    defaults: {
      heartbeat: {
        model: "provider/model-here"
      }
    }
  }
}
(Use whatever "small/cheap" model you have access to; I won't list models here because availability changes a lot. OpenClaw docs + your provider list are the source of truth.)
4) Shrink the context heartbeats have to read
Even if the heartbeat prompt is short, the main session history can be huge, and heartbeats pay that cost repeatedly.
Practical ways:
- Run /compact periodically to summarize long sessions (also mentioned in the token docs): https://docs.openclaw.ai/reference/token-use#tips-for-reducing-token-pressure
- Avoid pasting giant logs/tool outputs into your main session (or move that work to a new session)
Advanced option: move "scheduled checks" off heartbeat and onto cron (isolated) so it runs with a clean context instead of your giant main chat. Guide: https://docs.openclaw.ai/automation/cron-vs-heartbeat
5) Don't enable extra heartbeat output
- Keep includeReasoning: false (reasoning delivery can add extra messages/tokens). Docs: https://docs.openclaw.ai/gateway/heartbeat#reasoning-delivery-optional
Also: target: "none" only stops delivery; it still runs the model, so it doesn't save tokens.
If you tell me which of these you want:
- "I want heartbeats but super cheap" vs
- "I want zero token burn unless something is actually scheduled"
…I'll suggest a clean config pattern (usually: empty HEARTBEAT.md + cron jobs for the few real tasks).
But what if the heartbeat needs to specify that, on each run, the agent should open Twitter, collect posts, reply to them (skipping any it has already replied to), and write a summary at the end?
If you do "open Twitter → read timeline → reply → summarize" on every heartbeat, you're basically choosing a high-token workflow on a timer. You can optimize it a lot, but you won't get "dramatic" reductions unless you change the architecture (because Twitter page reads via browser snapshots are inherently token-heavy, and heartbeat runs are full agent turns).
Here's the best-practice setup to keep tokens as low as possible while still doing that job:
1) Don't run Twitter automation in the main heartbeat session
Heartbeat runs in the main session by default, so every Twitter run bloats the session history, and then future heartbeats get more expensive.
Instead, put Twitter into a separate, dedicated session (or better: cron isolated; see #2). Heartbeat docs mention you can override the heartbeat run session key. That way, your main chat doesn't grow from automation output.
Docs: https://docs.openclaw.ai/gateway/heartbeat
2) Prefer Cron (isolated) for "do X every N minutes" automation
For token efficiency, Twitter automation is exactly what cron isolated is for: clean context each run, no main-session pollution, you can pick a cheaper model, and you can control delivery.
Guide: https://docs.openclaw.ai/automation/cron-vs-heartbeat
If you must keep "heartbeat" as the trigger, the compromise is:
- Heartbeat stays lightweight ("is Twitter job due?")
- Cron does the heavy lifting
3) Use a tiny state file so you don't "remember" via chat history
"Do not reply if already did" should not be implemented by having the model reread old summaries.
Instead store state like:
- lastSeenTweetId
- a small rolling list of repliedTweetIds (cap it, e.g. last 100)
This makes each run: fetch only new posts since lastSeenTweetId, act, update state. Minimal tokens.
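As a sketch of what that state handling could look like if a script (or a tool the agent calls) owns the file: the JSON layout mirrors the two fields above, while the helper names, the file name and the temp-dir location are only for the demo.

```python
import json
import tempfile
from pathlib import Path

MAX_REPLIED_IDS = 100  # cap for the rolling list


def load_state(path: Path) -> dict:
    """Load the state file, or start empty if it does not exist yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {"lastSeenTweetId": None, "repliedTweetIds": []}


def already_replied(state: dict, tweet_id: str) -> bool:
    return tweet_id in state["repliedTweetIds"]


def record_reply(state: dict, tweet_id: str) -> None:
    """Remember a reply and keep only the most recent MAX_REPLIED_IDS ids."""
    state["repliedTweetIds"].append(tweet_id)
    state["repliedTweetIds"] = state["repliedTweetIds"][-MAX_REPLIED_IDS:]


def save_state(path: Path, state: dict) -> None:
    path.write_text(json.dumps(state, indent=2))


state_path = Path(tempfile.mkdtemp()) / "x-reply-state.json"

for batch in (["101", "102"], ["102", "103"]):  # two simulated runs
    state = load_state(state_path)
    for tweet_id in batch:
        if not already_replied(state, tweet_id):
            record_reply(state, tweet_id)  # the real reply would happen here
        state["lastSeenTweetId"] = tweet_id
    save_state(state_path, state)

print(load_state(state_path))
```

The second run skips tweet 102 without the model ever rereading an old summary, which is the whole point of keeping the memory in a file.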
4) Cap the work per run (hard limits)
This is a huge token saver and prevents runaway browsing:
- Max new tweets processed per run: e.g. 5–10
- Max replies per run: e.g. 3
- If more exist: save lastSeenTweetId and continue next run
5) Avoid "reading Twitter" with huge browser snapshots
Browser automation tends to dump large UI text into tool results (token expensive).
If possible, use a lower-text path:
- API / RSS / email notifications / curated search pages
- If using browser: open specific tweet URLs or a single narrow list, don't scroll endlessly
And instruct your agent to keep browser reads minimal (small snapshots, minimal depth, no screenshots or full-DOM dumps unless needed).
6) Keep heartbeat outputs tiny (or drop them)
Every "summary" you write can become more context later. Make it one line:
- "Replied to 2 tweets (ids …). Skipped 5 already-handled. LastSeen=…"
and if nothing happened, return HEARTBEAT_OK (so it's suppressed).
Docs: https://docs.openclaw.ai/gateway/heartbeat#response-contract
7) Use a cheaper heartbeat model
Heartbeat supports a separate model override. Use your cheapest "good enough" model for the Twitter loop; save the expensive one for real conversations.
Docs: https://docs.openclaw.ai/gateway/heartbeat#config
The blunt truth
If you insist on: every 30m heartbeat + open Twitter in browser + read timeline, token usage will stay high. The only way to make it "dramatically" cheaper is:
- move to cron isolated, and/or
- switch from browser scraping to a low-text feed/API, and
- persist state in a file so you don't need conversational memory.
If you tell me:
- are you using browser automation (logged-in X) or an API/feed?
- how often is your heartbeat (every: "30m"/"1h")?
…I can propose a concrete minimal-token flow (heartbeat vs cron split + what state to store + how to cap work).
- Logged in X
- Every 30 minutes
If you're logged into X in a real browser and you want to read timeline + reply every 30 minutes, that's basically the most token-expensive thing you can schedule, because the browser tool has to "see" a lot of page text every run.
You can still make it manageable. The goal is: keep each run bounded and prevent context growth.
Best architecture (biggest token win): Cron (isolated) for the X work
Heartbeat runs in your main session (big context, grows forever). For "do X automation every 30m", you'll get much lower token burn if you move it to an isolated cron job (clean slate each run, doesn't pollute your main chat).
Docs: https://docs.openclaw.ai/automation/cron-vs-heartbeat
Then keep heartbeat for "awareness" stuff only (or slow it down).
If you insist on doing it in heartbeat: do these 5 things
1) Store "already replied" in a state file (not in chat history)
Create something like memory/x-reply-state.json with:
- lastSeenTweetId
- repliedTweetIds (cap to last ~100-300)
- optional lastRunAt
This is what makes "don't reply twice" cheap.
2) Hard-cap the work per run (this saves you the most)
In your HEARTBEAT.md / heartbeat prompt enforce:
- maxTweetsToConsider = 10
- maxReplies = 3
- if more tweets exist → stop and continue next run
Without caps, one busy timeline = runaway tokens.
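If part of the loop lives in a script, the caps can be enforced in code rather than left to the prompt. A minimal sketch, assuming the new tweet ids arrive oldest first; `plan_run` and its return shape are hypothetical.

```python
MAX_TWEETS_TO_CONSIDER = 10
MAX_REPLIES = 3


def plan_run(new_tweet_ids: list[str], replied_ids: set[str]) -> dict:
    """Decide which tweets to reply to in this run and where to resume next time."""
    to_reply, skipped, last_seen = [], 0, None
    for tweet_id in new_tweet_ids[:MAX_TWEETS_TO_CONSIDER]:
        if tweet_id in replied_ids:
            skipped += 1
        elif len(to_reply) < MAX_REPLIES:
            to_reply.append(tweet_id)
        else:
            break  # reply budget used up; resume from here next run
        last_seen = tweet_id  # only advances past tweets that were handled
    return {"to_reply": to_reply, "skipped": skipped, "last_seen": last_seen}


plan = plan_run([str(n) for n in range(1, 16)], replied_ids={"2"})
print(plan)
```

In the demo, 15 new tweets arrive but only three get replies and the cursor stops at tweet 4, so the remaining ones are picked up on later runs instead of blowing up a single one.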
3) Avoid screenshots; use minimal text snapshots
OpenClaw can prune old tool output, but image blocks aren't pruned right now, so screenshots make long-term token pressure worse. Prefer text snapshots and keep them small.
4) Enable tool-result pruning so the session doesnât grow forever
Turn on agents.defaults.contextPruning. This trims/clears old tool results before each LLM call so heartbeats don't get more expensive over time.
Docs: https://docs.openclaw.ai/gateway/configuration#agentsdefaultscontextpruning-tool-result-pruning
A strong default for automation-heavy sessions:
{
  agents: {
    defaults: {
      contextPruning: { mode: "aggressive" }
    }
  }
}
5) Only send a summary when you actually replied
If there's nothing new → return HEARTBEAT_OK (so delivery is suppressed). If you replied → send a tiny summary (1–2 lines, no quoted tweets).
Docs: https://docs.openclaw.ai/gateway/heartbeat#response-contract
Example summary format:
X run: replied=2, skipped_already_done=4, lastSeen=1891234567890
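If a script rather than the model produces the final line, the "summary only when something happened" rule is a couple of lines. A sketch; `heartbeat_reply` is a hypothetical helper, and the output format is the example one above.

```python
def heartbeat_reply(replied: int, skipped: int, last_seen: str) -> str:
    """One-line summary when replies were sent, HEARTBEAT_OK otherwise."""
    if replied == 0:
        return "HEARTBEAT_OK"
    return (f"X run: replied={replied}, "
            f"skipped_already_done={skipped}, lastSeen={last_seen}")


print(heartbeat_reply(2, 4, "1891234567890"))
print(heartbeat_reply(0, 6, "1891234567890"))
```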
What I'd do in practice for your exact case
- Use cron isolated every 30m for the X automation (lowest tokens, clean context).
- In that job, read/update memory/x-reply-state.json.
- Keep replies + summary short, and cap max replies per run.
If you want, paste your current agents.defaults.heartbeat config block + whether you're using Chrome Relay or the isolated browser, and I'll suggest the tightest possible prompt/HEARTBEAT.md checklist for X that won't explode tokens.