#Compaction settings, context window and token count

grave elm
#

Krill my man, what's the ideal setting here? We're running on GPT-5.4 and I want to optimize for context as well as output. I think we put a global cap on agents of 240K and a floor of 40K.

Ask me some questions if that would help. Also, do you understand what I'm asking?

hard sunBOT
#

Yep, I understand what you mean.

You’re basically asking, “for GPT-5.4, what should my runtime context budget + compaction headroom be so I keep useful context without making output worse or slower?”

Short answer: yes, 240k cap + 40k floor is a sane starting point.

A few important nuances from the docs:

  • contextTokens = your effective runtime cap
  • reserveTokensFloor = minimum headroom before compaction
  • keepRecentTokens = how much recent convo stays intact during compaction

For direct openai/*, OpenClaw already auto-enables OpenAI server-side compaction on Responses.
For openai-codex/gpt-5.4, docs explicitly say the native window is 1.05M, but the default runtime cap is 272k, because smaller caps tend to have better latency and quality in practice.
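
To make the three knobs concrete, here’s a rough Python sketch of how they might carve up the window. It assumes compaction kicks in once usage crosses contextTokens minus the reserve, which is my reading of “headroom” and not a quote from the docs. The function name and the returned keys are hypothetical.

```python
def budget(context_tokens: int, reserve_floor: int, keep_recent: int) -> dict:
    """Rough split of a runtime window (assumed model, not OpenClaw's code)."""
    if reserve_floor + keep_recent >= context_tokens:
        raise ValueError("reserve + keepRecent must fit inside the cap")
    trigger = context_tokens - reserve_floor  # assumed point where compaction starts
    summarizable = trigger - keep_recent      # older history eligible for summarizing
    return {
        "trigger_at": trigger,
        "kept_verbatim": keep_recent,
        "summarizable": summarizable,
    }


print(budget(240_000, 40_000, 60_000))
```

Under that assumption, a 240k cap with a 40k floor means compaction starts around 200k used, with 60k of recent conversation kept intact.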

So if it were me, I’d start here:

{
  agents: {
    defaults: {
      compaction: {
        reserveTokensFloor: 40000,
        keepRecentTokens: 60000
      }
    }
  },
  models: {
    providers: {
      openai: {
        models: [{ id: "gpt-5.4", contextTokens: 240000 }]
      }
      // or openai-codex if that's your provider path
    }
  }
}

Questions that would let me tighten this up for you:

  1. Are you using openai/gpt-5.4 or openai-codex/gpt-5.4?
  2. Is this mostly tool-heavy coding, or more chat/research/writing?
  3. Are you optimizing more for max recall, or for speed/quality stability?
  4. Are you currently seeing too-frequent compaction, slow replies, or stale summaries?
  5. What does openclaw --version show?

If you want, paste your current compaction/model block and I’ll give you the exact values I’d set.
Docs: https://docs.openclaw.ai/reference/session-management-compaction https://docs.openclaw.ai/reference/token-use https://docs.openclaw.ai/providers/openai

grave elm
#

from Tony my head agent

I’ve got the live facts. One last check: whether OpenClaw’s local docs actually expose a distinct OAuth path for Codex/OpenAI on this box, or if practically we’re just talking about which provider/model string we point at.
Result: Yes, Codex via OAuth is a real option, and we are already using the Codex lane on main/doc. Live config says main / Happy / Forge / Mason / Doc = openai-codex/gpt-5.4, Intel = openai-codex/gpt-5.3-codex, and openclaw --version is 2026.4.14. We’ve also used OpenAI/Codex OAuth profiles before on Sabian and Pi, so this is not theoretical. Source: memory/2026-04-07-sabian-models.md, memory/2026-04-04-pi-oauth.md

My take: for tool-heavy coding / agentic execution, Codex is the right default. I would not switch main hive from openai-codex/gpt-5.4 to plain openai/gpt-5.4 unless you find a specific quality/stability win in your actual workload. Right now your compaction defaults are:
// main
"compaction": {
  "mode": "safeguard",
  "recentTurnsPreserve": 8,
  "memoryFlush": { "enabled": true },
  "reserveTokensFloor": 40000
}
// doc
"compaction": {
  "mode": "safeguard",
  "recentTurnsPreserve": 8,
  "reserveTokensFloor": 24000,
  "memoryFlush": { "enabled": true }
}

More coming below; wait for the next message, then reply.

#

Send Krill this exact reply:

```text
We are currently on OpenClaw 2026.4.14.

Main hive active defaults:

  - main: openai-codex/gpt-5.4
  - happy: openai-codex/gpt-5.4
  - forge: openai-codex/gpt-5.4
  - mason: openai-codex/gpt-5.4
  - doc: openai-codex/gpt-5.4
  - intel: openai-codex/gpt-5.3-codex

Main compaction block:
{
  "mode": "safeguard",
  "recentTurnsPreserve": 8,
  "memoryFlush": { "enabled": true },
  "reserveTokensFloor": 40000
}

Doc compaction block:
{
  "mode": "safeguard",
  "recentTurnsPreserve": 8,
  "reserveTokensFloor": 24000,
  "memoryFlush": { "enabled": true }
}

We can use OpenAI/Codex OAuth, and have done so on other instances before.
Workload is mixed, but a lot of it is tool-heavy agentic ops plus coding, config, and file work.
We want good recall without compaction firing too early, but also stable response quality.
Please recommend exact values and whether you’d keep openai-codex/gpt-5.4 or move any roles to openai/gpt-5.4.
```

hard sunBOT
#

Nice, that’s enough to get specific.

Yes, I understand what you’re optimizing for, and with that workload I’d do this:

My recommendation

Keep these on openai-codex/gpt-5.4

  • main
  • happy
  • forge
  • mason
  • doc (for now)

I would not move your core tool-heavy roles to plain openai/gpt-5.4 by default.
For coding, config, file ops, and agentic work, I’d keep the Codex lane as the default.

Only possible exception

If doc becomes mostly pure writing/synthesis and not much tool/file work, that’s the one role I’d A/B test on openai/gpt-5.4. But I would test it, not switch blind.


Exact values I’d set

Global runtime cap for GPT-5.4

Set Codex GPT-5.4 to:

{
  models: {
    providers: {
      "openai-codex": {
        models: [{ id: "gpt-5.4", contextTokens: 240000 }]
      }
    }
  }
}

That’s a good target. Docs say Codex GPT-5.4 has a 1.05M native window but a 272k default runtime cap, and smaller runtime caps usually behave better for latency + quality. So 240k is a pretty sensible balance.


Compaction settings

Main / happy / forge / mason

I’d make it explicit like this:

{
  compaction: {
    mode: "safeguard",
    recentTurnsPreserve: 8,
    reserveTokens: 40000,
    reserveTokensFloor: 40000,
    keepRecentTokens: 64000,
    memoryFlush: { enabled: true }
  }
}

Doc

I’d raise it from the current 24k floor and use:

{
  compaction: {
    mode: "safeguard",
    recentTurnsPreserve: 8,
    reserveTokens: 32000,
    reserveTokensFloor: 32000,
    keepRecentTokens: 96000,
    memoryFlush: { enabled: true }
  }
}

Why this split

  • 40k reserve on tool-heavy agents gives cleaner headroom for tool turns + output.
  • 64k keepRecent preserves enough local working set for coding/config sessions.
  • Doc usually benefits more from a bigger preserved recent tail than from an oversized reserve floor, so I’d bias it toward keepRecentTokens.
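
Here’s a quick Python sketch comparing how the two profiles divide a 240k cap. It treats the window as reserve + recent tail + older history, which is a simplification I’m assuming for illustration. The profile names and the helper are hypothetical.

```python
CAP = 240_000  # contextTokens from the recommendation above

PROFILES = {
    "core": {"reserve": 40_000, "keep_recent": 64_000},
    "doc": {"reserve": 32_000, "keep_recent": 96_000},
}


def split(cap: int, reserve: int, keep_recent: int) -> dict:
    """Share of the cap held as reserve and recent tail (assumed model)."""
    return {
        "reserve_pct": round(100 * reserve / cap, 1),
        "recent_pct": round(100 * keep_recent / cap, 1),
        "older_tokens": cap - reserve - keep_recent,  # what is left for older history
    }


for name, profile in PROFILES.items():
    print(name, split(CAP, **profile))
```

The doc profile gives up some reserve and some room for older history in exchange for a much larger untouched recent tail.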

One extra thing I’d strongly consider

If a lot of your bloat is old tool output, also enable context pruning.
Compaction helps history, but pruning helps when giant tool results are the real problem.

If you want, send your actual agents + models.providers.openai-codex block and I’ll rewrite it into the exact JSON5 I’d use.

Docs:
https://docs.openclaw.ai/reference/session-management-compaction
https://docs.openclaw.ai/reference/token-use
https://docs.openclaw.ai/providers/openai

grave elm
#

What runtime cap and compaction/pruning settings give the best real-world performance for our actual mix of tool-heavy agentic work, coding, config, file ops, and some research/writing?

Current setup:

  • main / happy / forge / mason / doc / James = openai-codex/gpt-5.4
  • intel = openai-codex/gpt-5.3-codex
  • OpenClaw version = 2026.4.14

Current compaction:
Main:
{
  "mode": "safeguard",
  "recentTurnsPreserve": 8,
  "memoryFlush": { "enabled": true },
  "reserveTokensFloor": 40000
}

Doc:
{
  "mode": "safeguard",
  "recentTurnsPreserve": 8,
  "reserveTokensFloor": 24000,
  "memoryFlush": { "enabled": true }
}

Your suggested direction makes sense:

  • keep core tool-heavy agents on openai-codex/gpt-5.4
  • set openai-codex gpt-5.4 runtime cap to 240000
  • explicit reserve/keepRecent settings
  • likely add context pruning

What I want from you:

  1. the exact JSON5 block you would set for our main instance
  2. whether you would apply the same block to Doc or keep Doc slightly different
  3. whether context pruning should be on, and if so with what settings
  4. any downside/risk to 240k vs 272k for our workload
  5. whether any role should be moved off Codex right now, or only after A/B testing

This is meant to define our default runtime standard for core, durable, high-value agents across instances, not a blind one-size-fits-all preset for every lightweight agent.

Also, please recommend:

  1. the default standard for heavy-duty core agents
  2. what should differ for lighter agents
  3. what should differ for writing/research-only agents
  4. which settings should be global defaults vs role-specific overrides
hard sunBOT
#

Yep, for core durable tool-heavy agents, this is the standard I’d set right now.

1) Exact JSON5 block I’d use for the main instance

{
  models: {
    providers: {
      "openai-codex": {
        models: [
          {
            id: "gpt-5.4",
            contextTokens: 240000
          }
        ]
      }
    }
  },

  agents: {
    defaults: {
      model: {
        primary: "openai-codex/gpt-5.4"
      },

      compaction: {
        mode: "safeguard",
        recentTurnsPreserve: 8,

        // make the reserve explicit instead of relying only on the floor
        reserveTokens: 40000,
        reserveTokensFloor: 40000,

        // keep a solid recent working set intact
        keepRecentTokens: 64000,

        memoryFlush: {
          enabled: true,
          softThresholdTokens: 6000
        }
      },

      contextPruning: {
        mode: "cache-ttl",
        ttl: "1h",
        keepLastAssistants: 3,

        softTrimRatio: 0.3,
        hardClearRatio: 0.5,
        minPrunableToolChars: 50000,

        softTrim: {
          maxChars: 4000,
          headChars: 1500,
          tailChars: 1500
        },

        hardClear: {
          enabled: true,
          placeholder: "[Old tool result content cleared]"
        },

        tools: {
          deny: ["browser", "canvas"]
        }
      }
    }
  }
}

That would become my default heavy-duty core profile.
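
Before rolling a profile like this out, a small sanity check can catch values that contradict each other. The Python sketch below mirrors the numbers from the block above. The rules it enforces are my own assumptions about what a coherent profile looks like, not constraints taken from the docs, and `check_profile` is a hypothetical helper.

```python
CONTEXT_TOKENS = 240_000

COMPACTION = {
    "reserveTokens": 40_000,
    "reserveTokensFloor": 40_000,
    "keepRecentTokens": 64_000,
}

PRUNING = {
    "softTrimRatio": 0.3,
    "hardClearRatio": 0.5,
    "softTrim": {"maxChars": 4000, "headChars": 1500, "tailChars": 1500},
}


def check_profile(context_tokens: int, compaction: dict, pruning: dict) -> list:
    """Return a list of problems; an empty list means the values are coherent."""
    problems = []
    if compaction["reserveTokens"] < compaction["reserveTokensFloor"]:
        problems.append("reserveTokens is below reserveTokensFloor")
    if compaction["reserveTokens"] + compaction["keepRecentTokens"] >= context_tokens:
        problems.append("reserve + keepRecent does not fit inside the cap")
    if not 0 < pruning["softTrimRatio"] < pruning["hardClearRatio"] <= 1:
        problems.append("softTrimRatio should be below hardClearRatio")
    trim = pruning["softTrim"]
    if trim["headChars"] + trim["tailChars"] > trim["maxChars"]:
        problems.append("head + tail exceeds softTrim.maxChars")
    return problems


print(check_profile(CONTEXT_TOKENS, COMPACTION, PRUNING))
```

With the values above, the check reports no problems.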


2) Would I apply the same block to Doc?

Almost, but not exactly.
I’d keep doc on openai-codex/gpt-5.4, but give it a slightly different compaction profile.

If you’re using per-agent overrides, I’d do:

{
  id: "doc",
  compaction: {
    mode: "safeguard",
    recentTurnsPreserve: 8,
    reserveTokens: 32000,
    reserveTokensFloor: 32000,
    keepRecentTokens: 96000,
    memoryFlush: {
      enabled: true,
      softThresholdTokens: 6000
    }
  }
}

Why:

  • doc usually benefits more from a bigger preserved recent tail
  • it usually needs slightly less reserve headroom than coding/tool-heavy agents

So:

  • core agents: 40k reserve / 64k keepRecent
  • doc-style agents: 32k reserve / 96k keepRecent

3) Should context pruning be on?

Yes. Absolutely for your workload.

For tool-heavy agentic runs, pruning is one of the biggest real-world wins, because a lot of context pressure comes from old tool results, not just normal chat history.

Important nuance from the docs: pruning is in-memory only, it does not delete session history on disk.

So for your setup, I’d keep pruning on globally with the block above.
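
To show what the softTrim numbers mean in practice, here’s a minimal Python sketch of head/tail trimming on one oversized tool result. It illustrates the idea only: the marker text and the function are hypothetical, and OpenClaw’s actual pruning logic may differ.

```python
def soft_trim(text: str, max_chars: int = 4000,
              head_chars: int = 1500, tail_chars: int = 1500) -> str:
    """Keep the head and tail of an oversized tool result, drop the middle."""
    if len(text) <= max_chars:
        return text  # small results pass through untouched
    omitted = len(text) - head_chars - tail_chars
    marker = f"\n[... {omitted} chars trimmed ...]\n"  # hypothetical marker format
    return text[:head_chars] + marker + text[-tail_chars:]


big_result = "a" * 1500 + "b" * 7000 + "c" * 1500
trimmed = soft_trim(big_result)
print(len(big_result), "->", len(trimmed))
```

The start and end of a tool result usually carry the command echo and the final status, which is why keeping head and tail tends to be a reasonable trade.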

When I would make pruning gentler

Only for a near-pure writing/research agent where:

  • tool output is rare
  • long conversational continuity matters more than tool-result cleanup

In that case I’d keep pruning on, but consider:

  • higher minPrunableToolChars
  • maybe hardClear.enabled: false

But for your main durable core agents, I would not start gentler. I’d start with the pruning block above.


4) Downside/risk to 240k vs 272k

There is a downside, but it’s pretty modest.

What you give up at 240k

  • slightly more frequent compaction on very long runs
  • a bit less raw room for giant transcripts, huge docs, or long research accumulation

What you gain

  • less prompt bloat
  • better latency
  • usually better stability/quality in real agentic use
  • fewer “we technically fit, but the model is swimming in sludge” situations

For your workload, I think 240k is the better operating point.

If this were mostly long-form writing or research with very few tools, I’d be more tempted to stay closer to 256k or 272k.
But for coding + tools + config + file ops, I’d take the cleaner 240k budget.
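
For a rough sense of scale, this Python sketch compares the room left before the reserve is hit at each cap. It assumes the usable budget is simply cap minus a 40k reserve, which is my simplification; the helper name is hypothetical.

```python
def usable_before_compaction(cap: int, reserve: int = 40_000) -> int:
    """Tokens available before the reserve is reached (assumed trigger model)."""
    return cap - reserve


baseline = usable_before_compaction(272_000)
for cap in (240_000, 256_000, 272_000):
    usable = usable_before_compaction(cap)
    print(f"{cap:,} cap: {usable:,} usable ({usable / baseline:.0%} of the 272k case)")
```

Under that assumption, 240k leaves 200k usable against 232k at 272k, so roughly 14% less room before compaction.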


5) Move any role off Codex now?

No, not right now.

My answer is:

  • Keep core roles on openai-codex/gpt-5.4
  • Do not move anything off Codex by default yet
  • If you want to experiment, do it with A/B testing only

Best A/B candidate

doc, or a dedicated writer, research, or synthesis agent.

That’s where plain openai/gpt-5.4 might win.
But I would test it as a specialized lane, not replace your core Codex lane.


Recommended standards by agent type

A) Heavy-duty core agents

Use this as the default standard:

  • model: openai-codex/gpt-5.4
  • contextTokens: 240000
  • reserveTokens: 40000
  • reserveTokensFloor: 40000
  • keepRecentTokens: 64000
  • pruning: on
  • recentTurnsPreserve: 8
  • memoryFlush.enabled: true

This is what I’d use for:

  • main
  • happy
  • forge
  • mason
  • James
  • most other durable ops agents

B) Lighter agents

For lightweight or narrower agents, I would reduce pressure.

Target shape:

  • lower effective cap
  • lower reserve
  • smaller keepRecent
  • same pruning philosophy

Operationally I’d aim for roughly:

  • cap: 160k to 180k
  • reserve: 24k to 32k
  • keepRecent: 32k to 48k

These should not inherit the full heavy-duty standard unless they really need it.


C) Writing / research-only agents

This is the only category where I’d seriously test openai/gpt-5.4.

Target shape:

  • model: A/B test openai/gpt-5.4 vs Codex
  • reserve: 32k
  • keepRecent: 96k+
  • pruning: on, but possibly gentler
  • cap: 240k minimum, maybe 256k if the role is truly long-context and low-tooling
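
If you want to keep the three tiers honest across instances, the ranges above can be written down as data and checked. This Python sketch is illustrative only: the tier names, field names, and `out_of_range` helper are hypothetical, and an open upper bound stands in for “96k+”.

```python
from typing import Dict, List, Optional, Tuple

Range = Tuple[int, Optional[int]]  # (low, high); high=None means "or more"

TIERS: Dict[str, Dict[str, Range]] = {
    "core": {
        "cap": (240_000, 240_000),
        "reserve": (40_000, 40_000),
        "keep_recent": (64_000, 64_000),
    },
    "light": {
        "cap": (160_000, 180_000),
        "reserve": (24_000, 32_000),
        "keep_recent": (32_000, 48_000),
    },
    "writing": {
        "cap": (240_000, 256_000),
        "reserve": (32_000, 32_000),
        "keep_recent": (96_000, None),
    },
}


def out_of_range(tier: str, settings: Dict[str, int]) -> List[str]:
    """Return the fields whose values fall outside the tier's suggested range."""
    bad = []
    for field, (low, high) in TIERS[tier].items():
        value = settings[field]
        if value < low or (high is not None and value > high):
            bad.append(field)
    return bad


print(out_of_range("light", {"cap": 170_000, "reserve": 40_000, "keep_recent": 40_000}))
```

In this example the lightweight agent is flagged for carrying the heavy-duty 40k reserve.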

6) What should be global defaults vs role-specific overrides?

Make these global defaults

For an instance whose important agents are mostly heavy-duty:

  • openai-codex/gpt-5.4 runtime cap = 240000
  • compaction mode = safeguard
  • recentTurnsPreserve = 8
  • reserveTokens = 40000
  • reserveTokensFloor = 40000
  • keepRecentTokens = 64000
  • memoryFlush.enabled = true
  • pruning = on

Make these role-specific overrides

  • doc / writing-heavy agents:
    • lower reserve
    • higher keepRecent
  • lightweight agents:
    • smaller cap / smaller reserve / smaller keepRecent
  • any future pure writer/research role:
    • candidate for openai/gpt-5.4 A/B test
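
The defaults-plus-overrides idea can be sketched as a recursive merge, shown here in Python. I’m assuming a role override replaces only the keys it names and inherits the rest; how OpenClaw actually resolves per-agent overrides isn’t confirmed here, and `merge` is a hypothetical helper.

```python
import copy


def merge(defaults: dict, override: dict) -> dict:
    """Recursively overlay override on defaults without mutating either."""
    result = copy.deepcopy(defaults)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


GLOBAL_DEFAULTS = {
    "compaction": {
        "mode": "safeguard",
        "recentTurnsPreserve": 8,
        "reserveTokens": 40_000,
        "reserveTokensFloor": 40_000,
        "keepRecentTokens": 64_000,
        "memoryFlush": {"enabled": True},
    }
}

DOC_OVERRIDE = {
    "compaction": {
        "reserveTokens": 32_000,
        "reserveTokensFloor": 32_000,
        "keepRecentTokens": 96_000,
    }
}

doc_effective = merge(GLOBAL_DEFAULTS, DOC_OVERRIDE)
print(doc_effective["compaction"])
```

Here doc keeps the global mode, recentTurnsPreserve, and memoryFlush settings and changes only the three token values.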

Bottom line

If you want one clean standard for core, durable, high-value agents across instances, I’d standardize on:

  • openai-codex/gpt-5.4
  • contextTokens: 240000
  • reserveTokens/reserveTokensFloor: 40000
  • keepRecentTokens: 64000
  • context pruning on