Is it possible to move some context out of the system message for helping with caching purposes on Anthropic API? I recently put 10$ in credits to use with OpenClaw and 1 hour cache but I got 4$ cache writes and 2$ cache reads in about 4 total hours of usage. I guess this is because the system message gets changing so much and this triggers a cache re-write every single time. This is also important as I'm going to create a OpenClaw instance to use with some friends and as we're sharing money and using for real purposes, not experiments, we need maximum efficiency in credit usage.
#Move some context out of system message
1 messages ยท Page 1 of 1 (latest)
Let me verify this against the docs/source so I don't guess.
I found the relevant bit. OpenClaw already does exactly this now, so the main question is whether your churn is coming from something else in the prompt/session shape.
Yes, and newer OpenClaw already does this.
It now splits the system prompt into a cache-stable prefix and a volatile suffix, so changing bits like HEARTBEAT.md and per-turn metadata should not bust the whole Anthropic cache by themselves.
Docs:
So if you're seeing a lot of cacheWrite, it's usually one of these instead:
- the session went idle past the TTL, then had to re-warm
- something above the cache boundary changed, like injected bootstrap files, tools, skills, config, model/provider
- long tool output or media-heavy history got re-cached
For a shared, cost-sensitive setup, I'd recommend:
- default to
cacheRetention: "short" - use
longonly for sessions/agents you revisit often - if you do use 1h cache, keep it warm only where it matters (
heartbeat.every: "55m")
- enable
contextPruning.mode: "cache-ttl"so re-warms are smaller - keep injected files small, especially
MEMORY.md/memory.md
Also, 1h cache is not always cheaper. On Anthropic, cache writes are the expensive part, so if sessions are bursty or go idle a lot, short or even none can be more cost efficient.
More docs:
If you want, paste openclaw --version and I can suggest a concrete JSON5 config for a multi-user shared setup.
At this point, is it worth it using cache at all? Like, Haiku is 1$/million input and 5$/million output, but I got 4$ cache writes and 2$ cache reads (input) and actually got only 0.40$ of output. Isn't this like, very wrong?
I want to sanity-check the pricing side before I say "yes/no", because the answer depends a lot on Anthropic's cache multipliers and your reuse pattern.
It looks unintuitive, but not necessarily wrong.
Assuming Claude Haiku 4.5 on direct Anthropic, current pricing is roughly:
- normal input:
$1 / MTok - 1h cache write:
$2 / MTok - cache read:
$0.10 / MTok
So your dollar totals need to be converted back to tokens.
With your numbers:
$4cache writes = about2Mtokens written to cache$2cache reads = about20Mtokens served from cache
Without caching, that same prompt traffic would have been about:
22Mnormal input tokens- about
$22of input cost
With caching, you paid about $6 for that input-side traffic.
So oddly enough, those numbers suggest cache probably saved you money, not lost you money. The $0.40 output just means the model didn't generate much text. Output cost is separate.
What is true:
1hcache is only worth it if you reuse the same growing prefix enough- for bursty/shared usage,
shortis usually the safer default - I'd use
longonly for persistent chat/research agents
Rule of thumb for Haiku 4.5:
- 5m cache is usually worth it after 1 reuse
- 1h cache is usually worth it after about 2+ reuses
So for a shared OpenClaw setup, my recommendation is:
- default
cacheRetention: "short" - use
longonly on specific agents you revisit a lot - keep
contextPruning.mode: "cache-ttl"enabled
- only keep heartbeat on for those long-cache agents
Docs:
- https://docs.openclaw.ai/reference/prompt-caching
- https://docs.openclaw.ai/concepts/session-pruning
- Anthropic pricing: https://platform.claude.com/docs/en/build-with-claude/prompt-caching
If you want, I can sketch a JSON5 config for a shared "cheap by default, deep chat when needed" setup.
But why exactly I wrote 2 million tokens in 4 hours of usage? And what would be the price for these same 2 million tokens considering I come back to a chat in about 5-30 minutes or even longer?
I meant, if the cache were 5min instead of 1 hour
Because those 2M write tokens are not "what you typed". They're the cached prompt prefix being written, over and over as the conversation grows.
That prefix includes:
- tool definitions
- system prompt
- injected files/context
- prior chat history
- tool results
So a session can hit big numbers fast. For example, 20 warm writes with an average cached prefix of 100k tokens is already 2M write tokens. In OpenClaw, that is pretty plausible once a thread gets long or tool-heavy.
Also, Anthropic's cache advances with the conversation, so as the prefix grows, later writes can be much larger than early ones.
For the same 2M-token prefix on Haiku 4.5:
- No cache:
$2.00each time - 5m cache write:
$2.50once
- 1h cache write:
$4.00once - Cache read/hit:
$0.20each reuse
So if you come back:
-
within 5 minutes
- 5m cache hit:
$0.20 - 1h cache hit:
$0.20
- 5m cache hit:
-
after 5 to 30 minutes
- 5m cache is expired, you pay another
$2.50 - 1h cache still hits, you pay
$0.20
- 5m cache is expired, you pay another
-
after more than 1 hour
- both miss, you pay a fresh write again
Rule of thumb:
- mostly
<5mreturns:short - mostly
5m to 1hreturns to the same chat:long - mostly one-off / bursty / many different chats:
noneorshort
And importantly, cache is about reusing the same prefix, not all usage globally. Different chats/friends/sessions will not magically share one cache if their history differs.
If you want to see where the tokens are actually coming from, try:
/context detail- or enable cache tracing via the prompt-caching docs
Docs:
- https://docs.openclaw.ai/reference/prompt-caching
- https://docs.openclaw.ai/concepts/session-pruning
- Anthropic pricing: https://platform.claude.com/docs/en/build-with-claude/prompt-caching
If you want, I can give you a "shared instance, cheapest sane defaults" JSON5 config next.
Ah, got it.
If those same 2M write tokens had been 5-minute cache writes instead of 1-hour cache writes on Haiku 4.5:
- 1h write:
2M ร $2/MTok = $4.00 - 5m write:
2M ร $1.25/MTok = $2.50
So yes, for the same write volume, 5-minute cache would have been $1.50 cheaper.
But the catch is the TTL:
- come back within 5 minutes: still a cheap cache read (
$0.20for 2M tokens) - come back after 6+ minutes: the 5m cache is gone, so you pay another write
- come back after 5 to 30 minutes: 5m usually misses, 1h usually still hits
So for a 2M-token prefix:
If you come back after 3 minutes
- no cache:
$4.00 - 5m cache:
$2.70 - 1h cache:
$4.20
If you come back after 10 minutes
- no cache:
$4.00 - 5m cache:
$5.00 - 1h cache:
$4.20
So the rough takeaway is:
- rapid back-and-forth in the same chat โ
shortis great - returning after 5 to 30 minutes โ
longis usually better - only 1 revisit after 10-30m โ even
longcan still be slightly worse than no cache - 2+ revisits within the hour โ
longstarts winning