Hello, I tried the new routing through the Codex harness instead of the legacy OAuth path. It worked, but it drains far more of my usage limit than plain OAuth does. I don't know whether prompt caching is worse or whether much more context is being loaded into each session or turn. Is there a way to fix it? Otherwise I will stick to legacy OAuth Codex using my Codex subscription.
#Codex harness burn more token
Yeah, that can happen.
Short version:
- openai-codex/* = Codex OAuth through the normal OpenClaw provider path
- codex/* = Codex app-server harness
Those are not the same execution path, so usage is not 1:1 comparable. The harness gives you native Codex thread resume/compaction, but OpenClaw still adds its own tools/system prompt/context and keeps a transcript mirror, so it can burn more quota in practice.
What I’d recommend:
- If your priority is lowest usage / best value, stick to openai-codex/gpt-5.4
- Use codex/gpt-5.4 only if you specifically want the native harness features
- In long harness threads, use /codex compact or /compact
- Turn on /usage tokens and check /status so you can compare input vs cacheRead
- Run /context detail once, it’s the fastest way to see if prompt/tool/workspace overhead is the real culprit
So, yes: there are ways to reduce it, but I don’t think there’s currently a magic switch that makes the harness path as cheap as the older OAuth path.
Can you paste openclaw --version? I want to sanity-check whether you’re on a build where Codex routing/cache behavior changed.
OpenClaw 2026.4.12 (1c0672b)
Nice, thanks. 2026.4.12 is recent enough that I wouldn’t blame this on “you’re just outdated”.
My honest take: if codex/* is burning a lot more than openai-codex/*, that’s probably a real harness-path difference, not you imagining it.
A few concrete checks:
- make sure /fast is off
- make sure you did not set plugins.entries.codex.config.appServer.serviceTier: "priority"
- compare in two fresh sessions:
  - openai-codex/gpt-5.4
  - codex/gpt-5.4
- turn on /usage tokens
- check /status after each turn
- run /context detail once
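If you'd rather check the serviceTier setting programmatically than by eye, here is a minimal Python sketch. It assumes you have already loaded your OpenClaw config into a nested dict (for example with json.load); the thread does not say where the config file lives or what format it uses, so that part is left to you. The helper names get_path and uses_priority_tier are made up for illustration. Only the dotted key path comes from the check above.

```python
# Dotted key path taken from the checklist above.
SERVICE_TIER_PATH = "plugins.entries.codex.config.appServer.serviceTier"


def get_path(config, dotted):
    """Walk a nested dict along a dotted path; return None if any key is missing."""
    node = config
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def uses_priority_tier(config):
    """True only when the service tier is explicitly set to "priority"."""
    return get_path(config, SERVICE_TIER_PATH) == "priority"


if __name__ == "__main__":
    # Hypothetical config contents, for illustration only.
    example = {
        "plugins": {
            "entries": {
                "codex": {"config": {"appServer": {"serviceTier": "priority"}}}
            }
        }
    }
    print(uses_priority_tier(example))
```

A missing key is treated the same as "not priority", so an untouched config passes the check.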
How to read that:
- low cacheRead on repeated turns = caching is worse
- huge input growth over time = thread/context growth
- long Codex thread getting fat = try /codex compact
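The reading rules above can be sketched as a small Python helper, if you want to paste in the numbers from a few turns and compare both paths the same way. This is only a rough sketch under stated assumptions: you copy the per-turn input and cacheRead values by hand, the two counts are treated as disjoint (if your input figure already includes cached tokens, drop the addition), and the 0.5 and 2.0 thresholds are arbitrary placeholders, not anything OpenClaw defines.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    input_tokens: int  # "input" value copied by hand for one turn
    cache_read: int    # "cacheRead" value copied by hand for the same turn


def prompt_total(turn):
    # Assumption: input and cacheRead are disjoint counts.
    return turn.input_tokens + turn.cache_read


def cache_hit_ratio(turn):
    total = prompt_total(turn)
    return turn.cache_read / total if total else 0.0


def diagnose(turns, min_ratio=0.5, growth_factor=2.0):
    """Return hints based on the reading rules; thresholds are hypothetical."""
    hints = []
    repeated = turns[1:]  # the first turn cannot hit the cache yet
    if repeated:
        avg = sum(cache_hit_ratio(t) for t in repeated) / len(repeated)
        if avg < min_ratio:
            hints.append("low cache reuse")
    if len(turns) >= 2 and prompt_total(turns[-1]) >= growth_factor * prompt_total(turns[0]):
        hints.append("context growth")
    return hints


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    harness = [Turn(10000, 0), Turn(9000, 1000), Turn(9500, 1500)]
    print(diagnose(harness))
```

Run it once per path with the same prompts on fresh sessions. If only the codex/* run reports low cache reuse, caching is the likelier culprit; if it reports context growth, try /codex compact first.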
So right now my recommendation would be:
- if you want best subscription efficiency, use openai-codex/*
- if you want native Codex app-server features, use codex/*
- I would not switch to harness just assuming it will be cheaper, because today it often isn’t
If you can reproduce a really big delta on fresh sessions, that’s worth a GitHub issue with:
- exact version
- model on each side
- whether /fast was on
- 2-3 turns of /usage tokens output for each path