#Tokens consumption tricks

light oriole

Give me the full list of tricks to reduce token consumption dramatically. I am using a Claude Max subscription.

urban wraithBOT

If you’re on Claude Max (claude.ai), “token consumption” mostly shows up as: fewer messages before you hit the usage limit, slower responses, and the model “forgetting” because context gets too big. The big wins are all about keeping the input context small and forcing short outputs.

Below is the full bag of tricks, ordered roughly by impact.


1) Biggest wins (usually 5–20× fewer tokens)

A) Start a new chat aggressively

Long chats silently get expensive because each new message can drag a lot of prior context along.

  • New topic? New chat
  • Same topic but different subproblem? New chat
  • Debugging with lots of attempts? New chat after you’ve learned the key facts

Prompt to carry over without dragging history:

“We’re starting fresh. Here are the only facts you should assume: (bullets). Ask 1–2 questions if anything critical is missing.”
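To see why long chats get expensive, here is a rough back-of-the-envelope sketch. It assumes the whole history is re-sent as input on every turn and that each turn adds about the same number of tokens; it ignores caching and any carry-over summary, and the numbers are illustrative only.

```python
def cumulative_input_tokens(turns, tokens_per_turn):
    """Total input tokens if every turn re-sends the whole history so far."""
    # Turn k carries k * tokens_per_turn tokens (all earlier turns plus the new one),
    # so the total is tokens_per_turn * (1 + 2 + ... + turns).
    return tokens_per_turn * turns * (turns + 1) // 2


one_long_chat = cumulative_input_tokens(turns=50, tokens_per_turn=400)
five_short_chats = 5 * cumulative_input_tokens(turns=10, tokens_per_turn=400)

print(one_long_chat)     # 510000
print(five_short_chats)  # 110000
```

Same 50 turns in both cases, but under these assumptions splitting them into five fresh chats reads roughly a fifth of the input.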

B) Replace history with a “compressed state”

Instead of letting the model re-read 50 turns, maintain a tiny “state” block you paste each time.

Template (copy/paste):

STATE (keep; overwrite as needed):
Goal: 

Constraints: 

Current status: 

Decisions made: 

Open questions: 


TASK NOW: 

OUTPUT FORMAT: (e.g., 6 bullets, max 120 words)

This is the single most reliable way to cut tokens dramatically.
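If you'd rather keep the state in a small script than in a note, a minimal sketch could look like this. The class name, field names, and example values are made up for illustration; only the block layout follows the template above.

```python
from dataclasses import dataclass, field


@dataclass
class ChatState:
    goal: str
    constraints: list = field(default_factory=list)
    status: str = ""
    decisions: list = field(default_factory=list)
    open_questions: list = field(default_factory=list)


def render_prompt(state, task, output_format):
    """Build the paste-ready block: STATE, then the task, then the output budget."""

    def join(items):
        return "; ".join(items) if items else "none"

    return "\n".join([
        "STATE (keep; overwrite as needed):",
        f"Goal: {state.goal}",
        f"Constraints: {join(state.constraints)}",
        f"Current status: {state.status or 'none'}",
        f"Decisions made: {join(state.decisions)}",
        f"Open questions: {join(state.open_questions)}",
        "",
        f"TASK NOW: {task}",
        f"OUTPUT FORMAT: {output_format}",
    ])


# Example values are invented for illustration.
state = ChatState(
    goal="Fix flaky login test",
    constraints=["no new dependencies"],
    status="fails about 1 run in 5 on CI",
    decisions=["retry logic ruled out"],
)
prompt = render_prompt(state, "Propose the next experiment", "6 bullets, max 120 words")
print(prompt)
```

Overwrite the fields after each session and paste the rendered block into a fresh chat instead of continuing the old one.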

C) Don’t paste raw logs / code / articles; paste only the minimal excerpt

If you paste 2,000 lines, you pay for 2,000 lines—every time the model needs them again.

  • Paste only the failing section
  • Include file name + function name + ~50 lines around the issue
  • For logs: include first error + stack trace + 30 lines before/after

Prompt:

“I’m pasting an excerpt. Do not ask for the full file unless absolutely necessary. If you need more, specify exactly which lines/range.”
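For logs, the trimming can be scripted. A minimal sketch, assuming the first error line contains the literal text "ERROR" (adjust the marker for your log format); a multi-line stack trace directly after the error falls inside the trailing context window.

```python
def excerpt_around_first_error(lines, marker="ERROR", context=30):
    """Return the first line containing marker plus `context` lines on each side."""
    for index, line in enumerate(lines):
        if marker in line:
            start = max(0, index - context)
            end = min(len(lines), index + context + 1)
            return lines[start:end]
    return []  # no error found: nothing worth pasting


# Synthetic 2,000-line log with one error, for illustration.
log = [f"INFO step {i}" for i in range(2000)]
log[1200] = "ERROR connection refused"

excerpt = excerpt_around_first_error(log)
print(len(log), len(excerpt))  # 2000 61
```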

D) Force short outputs (hard cap)

Most people say “be concise” and still get essays. Put a hard budget and a stop rule.

Examples:

  • “Answer in ≤120 tokens. If you can’t, ask me 1 question.”
  • “Give exactly 7 bullets, no intro, no recap.”
  • “Return only: JSON with keys root_cause, fix, next_steps.”

2) Model / settings choices (very high leverage)

A) Avoid “extended thinking / deep reasoning” unless you truly need it

In Claude products, extended thinking produces extra reasoning tokens on top of the visible answer, and it often leads to longer outputs too. Use the lightest mode that works.

B) Prefer smaller/faster models for most turns

Use the best model only for the “hard steps”:

  • brainstorming / outlining / summarizing: smaller model
  • final tricky reasoning / final answer: bigger model

C) Split work: cheap pass → expensive pass

  1. Cheap: “extract facts, detect unknowns, propose plan”
  2. Expensive: “solve only the hard part with the cleaned facts”

This prevents paying expensive-model rates to read messy context.


3) Conversation tactics that reduce back-and-forth (saves both tokens + messages)

A) Ask the model to ask you questions first

This avoids 3–6 turns of wrong assumptions.

Prompt:

“Before solving, ask up to 3 clarifying questions. If you can proceed, say ‘No questions’ and proceed.”

B) Use “two-step” outputs: plan then execute (but keep plan tiny)

Prompt:

“Step 1: 5-bullet plan (max 60 words). Step 2: execute.”

C) Stop when sufficient

Prompt:

“Stop as soon as you have a working answer. Don’t provide extra alternatives unless I ask.”


4) Make your inputs more compressible

A) Use identifiers and references instead of repeating text

Instead of re-pasting a long spec, do:

  • “Spec v3 is unchanged”
  • “Constraint C2 still applies”
  • “Use the same assumptions as above”

Or create a short “facts list” and refer to bullet numbers:

  • “Assume F1–F7; update F3 to …”

B) Use structured input formats

Prose usually needs more tokens than structured text to carry the same facts.

Good:

  • bullets
  • tables
  • key/value blocks
  • JSON (if you can tolerate it)

Bad:

  • long narrative paragraphs
  • repeated context in different words

5) Output-format tricks (prevents token bloat)

A) For code: ask for diffs or minimal patches

Prompt:

“Give a minimal patch only (diff-style). No full-file rewrites.”
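To get a feel for the size difference, here is a small sketch using Python's standard difflib on a synthetic 200-line file with a one-line fix. The file names and contents are invented for illustration.

```python
import difflib

# Synthetic 200-line file and a copy with a single changed line.
original = [f"line {i}\n" for i in range(1, 201)]
patched = list(original)
patched[99] = "line 100 (fixed)\n"

diff = list(difflib.unified_diff(original, patched, fromfile="a/app.py", tofile="b/app.py"))

print(len(patched))  # 200 lines for a full-file rewrite
print(len(diff))     # 11 lines for the patch (default 3 lines of context)
print("".join(diff))
```

The patch carries the two file headers, one hunk header, the changed line in both versions, and three lines of context on each side, instead of the whole file.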

B) For explanations: demand “justification size limits”

Prompt:

“Explain the why in 2 bullets max.”

C) For research: ask for “answer + citations only”

Prompt:

“Return: (1) conclusion (2) 3 supporting bullets (3) sources list. Nothing else.”


6) When you must include large context: chunking and retrieval

A) Chunk input and ask for extraction

Instead of “read all this and solve”, do:

  1. Paste chunk 1 → “extract only relevant facts; ignore everything else”
  2. Paste chunk 2 → same
  3. Then give the extracted facts back and ask for solution

Yes, it’s multiple turns—but you avoid repeatedly carrying giant raw text forward.
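The flow above can be sketched as a small loop. The `extract` and `solve` callables are hypothetical stand-ins for the two prompts (in real use each would be a model call); here they are plain functions so the sketch runs on its own.

```python
def chunk_lines(lines, chunk_size):
    """Yield consecutive slices of at most chunk_size lines."""
    for start in range(0, len(lines), chunk_size):
        yield lines[start:start + chunk_size]


def extract_then_solve(lines, chunk_size, extract, solve):
    """Run extract on each chunk, then hand only the collected facts to solve."""
    facts = []
    for chunk in chunk_lines(lines, chunk_size):
        facts.extend(extract(chunk))
    return solve(facts)


# Stand-ins for the two model calls (hypothetical; replace with real prompts).
def keep_timeouts(chunk):
    return [line for line in chunk if "timeout" in line]


def count_facts(facts):
    return f"{len(facts)} relevant lines"


document = [f"request {i} ok" for i in range(1000)]
document[250] = "request 250 timeout after 30s"
document[900] = "request 900 timeout after 30s"

print(extract_then_solve(document, 200, keep_timeouts, count_facts))  # 2 relevant lines
```

The final step only ever sees the extracted facts, never the raw 1,000 lines.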

B) “Quote only what you use”

Prompt:

“Only quote the exact lines you rely on (max 10 lines total).”

This keeps outputs tight and prevents the model from reprinting your whole input.


7) Practical copy/paste prompts (steal these)

Ultra-low token mode

“Ultra-low-token mode. Output ≤90 tokens. No preamble. If missing info, ask 1 question only.”

Debug mode (minimal)

“Find the root cause and fix. Output exactly:

  1. Root cause (1 sentence)
  2. Fix (numbered steps, max 5)
  3. Verification (max 3 steps)”

“Compressed state” generator (use once, then keep using the state)

“Summarize this conversation into a STATE block ≤150 words with: Goal, Constraints, Current status, Decisions, Open questions. This STATE should replace the entire chat history.”


Quick diagnostic question (so I can tailor this)

When you say “token consumption,” do you mean:

  1. claude.ai Max usage limits (messages run out), or
  2. API billing (you pay per token), or
  3. OpenClaw/agent runs (tools + long context causing big prompts)?

Tell me which one, and what your typical workflow is (coding? research? chatty back-and-forth?), and I’ll give you a tighter “do this, not that” setup.

light oriole

I am talking about decreasing token consumption during heartbeats

urban wraithBOT

Heartbeats in OpenClaw are full agent turns (full system prompt + session history + tools), so the only dramatic token reductions come from either running them less or skipping the model call entirely.

Here are the best levers (highest impact first), with the exact knobs that exist today:

1) Make OpenClaw skip the heartbeat run (no model call)

A) Create HEARTBEAT.md but keep it “effectively empty”

From the heartbeat docs: if HEARTBEAT.md exists but is basically empty (blank lines / headers / comments), OpenClaw skips the heartbeat run to save API calls. If the file is missing, the heartbeat still runs.
Docs: https://docs.openclaw.ai/gateway/heartbeat#heartbeatmd-optional

So: create HEARTBEAT.md and leave only e.g.

# HEARTBEAT.md
# (empty on purpose)

B) Turn off heartbeat visibility everywhere (also skips the run)

If all three are false, OpenClaw skips the heartbeat run entirely:

#
{
  agents: {
    defaults: {
      heartbeat: {
        every: "2h" // or "4h"; "0m" disables
      }
    }
  }
}

Docs: https://docs.openclaw.ai/gateway/heartbeat#defaults

Also consider activeHours so you’re not burning turns overnight:

{
  agents: { defaults: { heartbeat: { every: "1h", activeHours: { start: "09:00", end: "22:00" } } } }
}
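A rough way to compare schedules is to count runs per day. This sketch assumes one run per interval inside the active window; the exact boundary behavior is up to OpenClaw's scheduler, so treat the numbers as estimates.

```python
def runs_per_day(every_minutes, start="00:00", end="24:00"):
    """Estimate heartbeat runs per day for an interval and an active window."""

    def to_minutes(hhmm):
        hours, minutes = hhmm.split(":")
        return int(hours) * 60 + int(minutes)

    window = to_minutes(end) - to_minutes(start)
    return window // every_minutes


print(runs_per_day(30))                     # 48 (every 30m, all day)
print(runs_per_day(120))                    # 12 (every 2h, all day)
print(runs_per_day(60, "09:00", "22:00"))   # 13 (every 1h, 09:00 to 22:00)
```

Since each run is a full agent turn, going from 48 to 13 runs a day cuts heartbeat token use by roughly the same ratio, all else equal.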
#

(Use whatever “small/cheap” model you have access to; I won’t list models here because availability changes a lot—OpenClaw docs + your provider list are the source of truth.)


4) Shrink the context heartbeats have to read

Even if the heartbeat prompt is short, the main session history can be huge, and heartbeats pay that cost repeatedly.

Practical ways:

Advanced option: move “scheduled checks” off heartbeat and onto cron (isolated) so it runs with a clean context instead of your giant main chat. Guide: https://docs.openclaw.ai/automation/cron-vs-heartbeat


5) Don’t enable extra heartbeat output

#

Also: target: "none" only stops delivery — it still runs the model, so it doesn’t save tokens.


If you tell me which of these you want:

  1. “I want heartbeats but super cheap” vs
  2. “I want zero token burn unless something is actually scheduled”


I’ll suggest a clean config pattern (usually: empty HEARTBEAT.md + cron jobs for the few real tasks).

light oriole

But what if I need the heartbeat to specify that, on each run, the agent should open Twitter, collect posts, reply to them, not reply to ones it already replied to, and write a summary at the end?

urban wraithBOT

If you do “open Twitter → read timeline → reply → summarize” on every heartbeat, you’re basically choosing a high‑token workflow on a timer. You can optimize it a lot, but you won’t get “dramatic” reductions unless you change the architecture (because Twitter page reads via browser snapshots are inherently token-heavy, and heartbeat runs are full agent turns).

Here’s the best-practice setup to keep tokens as low as possible while still doing that job:

1) Don’t run Twitter automation in the main heartbeat session

Heartbeat runs in the main session by default, so every Twitter run bloats the session history, and then future heartbeats get more expensive.

Instead, put Twitter into a separate, dedicated session (or better: cron isolated; see #2). Heartbeat docs mention you can override the heartbeat run session key. That way, your main chat doesn’t grow from automation output.
Docs: https://docs.openclaw.ai/gateway/heartbeat

2) Prefer Cron (isolated) for “do X every N minutes” automation

For token efficiency, Twitter automation is exactly what cron isolated is for: clean context each run, no main-session pollution, you can pick a cheaper model, and you can control delivery.
Guide: https://docs.openclaw.ai/automation/cron-vs-heartbeat

If you must keep “heartbeat” as the trigger, the compromise is:

  • Heartbeat stays lightweight (“is Twitter job due?”)
  • Cron does the heavy lifting

3) Use a tiny state file so you don’t “remember” via chat history

“Do not reply if already did” should not be implemented by having the model reread old summaries.

Instead store state like:

  • lastSeenTweetId
  • a small rolling list of repliedTweetIds (cap it, e.g. last 100)

This makes each run: fetch only new posts since lastSeenTweetId, act, update state. Minimal tokens.
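A minimal sketch of such a state file, using only the Python standard library. The helper names and the file name `state.json` are hypothetical, the IDs are made up, and updating lastSeenTweetId is left to the code that fetches posts.

```python
import json
import tempfile
from pathlib import Path

MAX_REPLIED_IDS = 100  # rolling cap, as suggested above


def load_state(path):
    """Return saved state, or an empty state on the first run."""
    if path.exists():
        return json.loads(path.read_text())
    return {"lastSeenTweetId": None, "repliedTweetIds": []}


def record_reply(state, tweet_id):
    """Remember a replied tweet; return False if it was already recorded."""
    if tweet_id in state["repliedTweetIds"]:
        return False
    state["repliedTweetIds"].append(tweet_id)
    # Keep only the newest MAX_REPLIED_IDS entries so the file stays small.
    state["repliedTweetIds"] = state["repliedTweetIds"][-MAX_REPLIED_IDS:]
    return True


def save_state(path, state):
    path.write_text(json.dumps(state))


# Demo in a temporary directory; IDs are kept as strings.
with tempfile.TemporaryDirectory() as folder:
    state_path = Path(folder) / "state.json"
    state = load_state(state_path)
    first = record_reply(state, "1001")
    second = record_reply(state, "1001")
    save_state(state_path, state)
    reloaded = load_state(state_path)

print(first, second)  # True False
print(reloaded)       # {'lastSeenTweetId': None, 'repliedTweetIds': ['1001']}
```

The "already replied?" check becomes a lookup in a tiny file, so the model never has to reread old summaries to answer it.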

4) Cap the work per run (hard limits)

This is a huge token saver and prevents runaway browsing:

  • Max new tweets processed per run: e.g. 5–10
  • Max replies per run: e.g. 3
  • If more exist: save lastSeenTweetId and continue next run

5) Avoid “reading Twitter” with huge browser snapshots

Browser automation tends to dump large UI text into tool results (token expensive).

If possible, use a lower-text path:

  • API / RSS / email notifications / curated search pages
  • If using browser: open specific tweet URLs or a single narrow list, don’t scroll endlessly

And instruct your agent to keep browser reads minimal (small snapshots, minimal depth, don’t screenshot/full DOM unless needed).

6) Keep heartbeat outputs tiny (or drop them)

Every “summary” you write can become more context later. Make it one line:

7) Use a cheaper heartbeat model

Heartbeat supports a separate model override. Use your cheapest “good enough” model for the Twitter loop; save the expensive one for real conversations.
Docs: https://docs.openclaw.ai/gateway/heartbeat#config


The blunt truth

If you insist on: every 30m heartbeat + open Twitter in browser + read timeline, token usage will stay high. The only way to make it “dramatically” cheaper is:

  • move to cron isolated, and/or
  • switch from browser scraping to a low-text feed/API, and
  • persist state in a file so you don’t need conversational memory.

If you tell me:

  1. are you using browser automation (logged-in X) or an API/feed?
  2. how often is your heartbeat (every: "30m" / "1h")?
    
I can propose a concrete minimal-token flow (heartbeat vs cron split + what state to store + how to cap work).

light oriole

  1. Logged in X
  2. Every 30 minutes

urban wraithBOT

If you’re logged into X in a real browser and you want to read timeline + reply every 30 minutes, that’s basically the most token-expensive thing you can schedule — because the browser tool has to “see” a lot of page text every run.

You can still make it manageable. The goal is: keep each run bounded and prevent context growth.

Best architecture (biggest token win): Cron (isolated) for the X work

Heartbeat runs in your main session (big context, grows forever). For “do X automation every 30m”, you’ll get much lower token burn if you move it to an isolated cron job (clean slate each run, doesn’t pollute your main chat).
Docs: https://docs.openclaw.ai/automation/cron-vs-heartbeat

Then keep heartbeat for “awareness” stuff only (or slow it down).

If you insist on doing it in heartbeat: do these 5 things

1) Store “already replied” in a state file (not in chat history)

Create something like memory/x-reply-state.json with:

  • lastSeenTweetId
  • repliedTweetIds (cap to last ~100-300)
  • optional lastRunAt

This is what makes “don’t reply twice” cheap.

2) Hard-cap the work per run (this saves you the most)

In your HEARTBEAT.md / heartbeat prompt enforce:

  • maxTweetsToConsider = 10
  • maxReplies = 3
  • if more tweets exist → stop and continue next run

Without caps, one busy timeline = runaway tokens.
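The caps can also be enforced in code rather than left to the prompt. A minimal sketch, assuming the new tweet IDs arrive oldest first; the function name and IDs are hypothetical, and the cap values mirror the ones above.

```python
def plan_run(new_ids, already_replied, max_tweets=10, max_replies=3):
    """Pick which tweets to reply to this run and where to resume next run."""
    to_reply, skipped, last_seen = [], 0, None
    for tweet_id in new_ids[:max_tweets]:
        if tweet_id in already_replied:
            skipped += 1
        elif len(to_reply) < max_replies:
            to_reply.append(tweet_id)
        else:
            break  # reply cap reached: leave the rest for the next run
        last_seen = tweet_id  # only advance past tweets actually handled
    return to_reply, skipped, last_seen


new_ids = [str(n) for n in range(101, 113)]  # 12 made-up IDs, oldest first
result = plan_run(new_ids, already_replied={"102", "104"})
print(result)  # (['101', '103', '105'], 2, '105')
```

Because lastSeen stops at the last handled tweet, anything past the cap is picked up on the next run instead of being dropped.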

3) Avoid screenshots; use minimal text snapshots

OpenClaw can prune old tool output, but image blocks aren’t pruned right now — so screenshots make long-term token pressure worse. Prefer text snapshots and keep them small.

4) Enable tool-result pruning so the session doesn’t grow forever

Turn on agents.defaults.contextPruning. This trims/clears old tool results before each LLM call so heartbeats don’t get more expensive over time.
Docs: https://docs.openclaw.ai/gateway/configuration#agentsdefaultscontextpruning-tool-result-pruning

A strong default for automation-heavy sessions:

{
  agents: {
    defaults: {
      contextPruning: { mode: "aggressive" }
    }
  }
}

5) Only send a summary when you actually replied

If there’s nothing new → return HEARTBEAT_OK (so delivery is suppressed). If you replied → send a tiny summary (1–2 lines, no quoted tweets).
Docs: https://docs.openclaw.ai/gateway/heartbeat#response-contract

Example summary format:

  • X run: replied=2, skipped_already_done=4, lastSeen=1891234567890
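As a sketch, the "summary only when something happened" rule is a tiny function. The function name is hypothetical; HEARTBEAT_OK and the summary format come from the text above.

```python
def heartbeat_reply(replied, skipped, last_seen):
    """Return HEARTBEAT_OK when nothing was posted, else a one-line summary."""
    if replied == 0:
        return "HEARTBEAT_OK"
    return f"X run: replied={replied}, skipped_already_done={skipped}, lastSeen={last_seen}"


print(heartbeat_reply(0, 4, "1891234567890"))  # HEARTBEAT_OK
print(heartbeat_reply(2, 4, "1891234567890"))
# X run: replied=2, skipped_already_done=4, lastSeen=1891234567890
```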

What I’d do in practice for your exact case

  • Use cron isolated every 30m for the X automation (lowest tokens, clean context).
  • In that job, read/update memory/x-reply-state.json.
  • Keep replies + summary short, and cap max replies per run.

If you want, paste your current agents.defaults.heartbeat config block + whether you’re using Chrome Relay or the isolated browser, and I’ll suggest the tightest possible prompt/HEARTBEAT.md checklist for X that won’t explode tokens.