#codex/gpt-5.5 compaction issue

rancid arch

@krill can you sanity check an OpenClaw compaction issue?

OpenClaw 2026.4.23, Telegram group topic session, model openai-codex/gpt-5.5.

What happened:

  • Status after the event showed: Context: 62k/1.0m, Compactions: 1
  • Actual compaction event was at 2026-04-26T22:15:47.089Z
  • Transcript compaction record had tokensBefore = 271575
  • Immediately before compaction, usage was around:
    • 271552 total tokens at 22:11:55Z
    • 271386 total tokens at 22:13:31Z
  • Gateway log right before compaction showed upstream rejection:
    • context_length_exceeded
    • message: “Your input exceeds the context window of this model.”
    • provider/model: openai-codex/gpt-5.5
  • Local config/catalog says GPT-5.5 has:
    • contextWindow: 1000000
    • contextTokens: 1000000
  • Compaction config:
    • reserveTokensFloor = 40000
    • memoryFlush.softThresholdTokens = 6000
    • maxHistoryShare = 0.8

My read: OpenClaw/status believed this was a 1M context model, but the upstream Codex/GPT-5.5 route rejected around ~272k total tokens, so compaction was reactive after provider failure, not a planned 1M-window compaction.

The session was file/tool-output heavy: roughly ~639k text chars before compaction, ~585k from tool results.

Questions:

  1. Is GPT-5.5 through the openai-codex route effectively capped closer to ~250k-300k despite config showing 1M?
  2. Should OpenClaw’s model catalog/config for openai-codex/gpt-5.5 be lowered, or should OpenClaw auto-detect this provider-side cap?
  3. Is totalTokens including cache-read tokens in a way that differs from the upstream context limit?
  4. Best local mitigation for now: set GPT-5.5 contextWindow/contextTokens to ~240k-250k so OpenClaw compacts proactively instead of hitting context_length_exceeded mid-turn?
  5. Also saw a “compaction retry aggregate timeout (60000ms)” warning after the provider rejection, but post-compaction turns continued. Is that expected?

Trying to avoid surprise compactions mid-coding session.

red pineBOT

Yep — your read looks right.

Short version: openai-codex/gpt-5.5 is not practically a 1M-input session through the Codex/ChatGPT route. OpenAI’s published ChatGPT/Codex GPT‑5.5 Thinking window is 400k total = 272k input + 128k max output on Pro, or 256k total = 128k input + 128k max output on other paid tiers. That lines up suspiciously well with your rejection at tokensBefore ≈ 271,575. (help.openai.com)
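Quick arithmetic check using only the figures quoted above (OpenAI’s published Pro and other-paid-tier windows, plus tokensBefore from your transcript). It is just the sums, nothing OpenClaw-specific:

```python
# Figures quoted above: OpenAI's published ChatGPT/Codex GPT-5.5 Thinking windows,
# plus tokensBefore from the transcript compaction record.
PRO = {"input": 272_000, "max_output": 128_000}
OTHER_PAID = {"input": 128_000, "max_output": 128_000}
TOKENS_BEFORE = 271_575


def total_window(tier: dict) -> int:
    """Total window = input budget + reserved maximum output."""
    return tier["input"] + tier["max_output"]


pro_total = total_window(PRO)
other_total = total_window(OTHER_PAID)
gap_to_input_cap = PRO["input"] - TOKENS_BEFORE

print(f"Pro total: {pro_total:,}  other paid tiers: {other_total:,}")
print(f"tokensBefore is {gap_to_input_cap:,} tokens below the Pro input cap")
```

The recorded prompt size sits a few hundred tokens under the 272k input figure, which is why it reads as a hard input cap rather than an accounting quirk.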

Answers:

  1. Yes, effectively ~272k input for Pro Codex route, not 1M usable input. Your failure point is basically right at that cap.

  2. OpenClaw should not treat this as contextTokens: 1000000 for runtime budgeting. Current OpenClaw docs now distinguish native metadata from runtime cap: contextWindow: 1000000, but default runtime contextTokens: 272000 for GPT‑5.5 through Codex OAuth. (docs.openclaw.ai)
    So if 2026.4.23 showed 1.0m and didn’t compact before ~272k, that looks like stale/incorrect catalog/session metadata or a bug in applying contextTokens.

  3. totalTokens can differ from the live context limit. OpenClaw docs say provider usage totals can include cached input/output/multiple tool-loop calls, while context display is supposed to use the latest prompt snapshot. But in your case, tokensBefore=271575 plus provider rejection strongly suggests this was a real prompt/context cap, not just cache-read accounting weirdness. (docs.openclaw.ai)

  4. Best local mitigation: set contextTokens, not necessarily contextWindow, to something like 240000 or 250000 so OpenClaw/Pi compacts early. Leave contextWindow as native metadata if your build respects contextTokens; if 2026.4.23 ignores it, lowering both is a pragmatic workaround.

{
  models: {
    providers: {
      "openai-codex": {
        models: [
          { id: "gpt-5.5", contextTokens: 240000 }
        ]
      }
    }
  }
}

Then restart the gateway and start a fresh session / reset that session so the stored session contextTokens doesn’t keep showing the stale 1.0m.

  5. The 60s compaction retry aggregate timeout sounds non-fatal if later turns continued. I checked the source: that warning means OpenClaw stops waiting on compaction retry completion after 60s to avoid wedging the session lane, then proceeds. Worth watching, but by itself it matches “warning + recovered” behavior.
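That “stop waiting, then proceed” behavior is a standard aggregate-timeout pattern. Here is a generic asyncio sketch of it, purely as an illustration (not OpenClaw’s source; `wait_for_retries` is a hypothetical name). `asyncio.wait` with a timeout only stops waiting; it does not cancel the unfinished tasks:

```python
import asyncio


async def wait_for_retries(tasks, timeout_s: float) -> tuple[int, int]:
    """Wait on all retry tasks under one aggregate deadline.

    asyncio.wait() leaves still-running tasks alone when the timeout expires;
    it just returns, so the caller can move on instead of blocking the lane.
    """
    done, pending = await asyncio.wait(tasks, timeout=timeout_s)
    if pending:
        print(f"warning: retry aggregate timeout ({timeout_s * 1000:.0f}ms); proceeding")
    return len(done), len(pending)


async def demo() -> tuple[int, int]:
    fast = asyncio.create_task(asyncio.sleep(0.01))  # a retry that finishes
    slow = asyncio.create_task(asyncio.sleep(10))    # a retry that hangs
    counts = await wait_for_retries([fast, slow], timeout_s=0.5)
    slow.cancel()  # demo cleanup only
    return counts


print(asyncio.run(demo()))
```

The demo uses 0.5s instead of 60s so it runs quickly; the point is that the caller regains control with the slow retry still pending, which is consistent with later turns continuing after the warning.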

Also: file/tool-output heavy sessions are exactly where I’d cap lower than the theoretical provider limit. I’d use 240k for now; 250k is probably okay, but 240k gives a little room for hidden wrappers/tool schema/output reserve.
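Rough headroom math for those options, assuming the ~272k Pro input cap above. The compaction point uses a made-up rule (compact once the prompt reaches contextTokens minus reserveTokensFloor, with your 40000 floor); OpenClaw’s actual trigger logic may differ, so treat that column as illustrative only:

```python
PROVIDER_INPUT_CAP = 272_000   # Pro Codex input cap discussed above
RESERVE_TOKENS_FLOOR = 40_000  # compaction.reserveTokensFloor from your config


def budget(context_tokens: int) -> tuple[int, int]:
    """Return (headroom under the provider cap, assumed compaction point).

    ASSUMPTION, for illustration only: compaction fires once the prompt reaches
    contextTokens - reserveTokensFloor. OpenClaw's real trigger may differ.
    """
    headroom = PROVIDER_INPUT_CAP - context_tokens
    compact_at = context_tokens - RESERVE_TOKENS_FLOOR
    return headroom, compact_at


for ct in (1_000_000, 250_000, 240_000):
    headroom, compact_at = budget(ct)
    print(f"contextTokens={ct:,}: headroom={headroom:,}, compacts at ~{compact_at:,}")
```

Under that assumed rule, the 1M setting would not compact until far past the provider cap, which fits the reactive compaction you saw.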

rancid arch

why would gpt 5.5 have less context window than gpt 5.4?

red pineBOT

It probably doesn’t, intrinsically.

The confusing bit is that there are two different numbers being mixed:

  • Native/model metadata context: OpenClaw/catalog may say ~1M for GPT‑5.4/5.5.
  • Codex/ChatGPT subscription route effective budget: OpenAI documents GPT‑5.5 Thinking as 400k on Pro = 272k input + 128k max output, or 256k on other paid tiers = 128k input + 128k output. (help.openai.com)

So the ~272k rejection doesn’t mean “GPT‑5.5 is worse than GPT‑5.4.” It likely means the openai-codex route is enforcing the ChatGPT/Codex product cap, including a big reserved output budget.

Why might a newer model expose less through that route?

  • Product/tier cap, not raw architecture limit.
  • Reasoning models reserve a lot for output/thinking/tool orchestration.
  • Latency/quality guardrail: huge contexts can make coding agents slower and less reliable.
  • Rollout safety/cost control: OpenAI can expose different limits per route even for the same model family.

So I’d phrase it as: GPT‑5.5 may have ~1M native/catalog context, but Codex OAuth currently behaves like ~272k usable input on Pro. OpenClaw should budget against contextTokens, not the native contextWindow, for proactive compaction.
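In code terms, the precedence is roughly this. `runtime_budget` is a hypothetical helper, not OpenClaw’s implementation; it only shows “runtime cap first, native metadata as fallback”:

```python
from typing import Optional


def runtime_budget(context_window: int, context_tokens: Optional[int]) -> int:
    """Token budget to plan proactive compaction against.

    context_window: native/catalog metadata (e.g. 1,000,000 for GPT-5.5).
    context_tokens: runtime cap for this route, if one is configured.
    """
    if context_tokens is None:
        return context_window  # no runtime cap: falls back to native metadata
    return min(context_tokens, context_window)


print(runtime_budget(1_000_000, None))     # stale setup: plans against the full 1M
print(runtime_budget(1_000_000, 272_000))  # Codex OAuth runtime cap from the docs quoted earlier
```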

rancid arch

@krill I want your recommendation on OpenClaw memory/context architecture, not just the GPT-5.5 cap.

Our workflow is context-heavy:

  • Main agent is a trading-system engineering assistant.
  • Workspace has AGENTS.md, SOUL.md, USER.md, MEMORY.md, HEARTBEAT.md, plus daily logs in memory/YYYY-MM-DD.md.
  • Startup instructions currently require reading SOUL.md, USER.md, today’s daily memory, yesterday’s daily memory, gateway health, boot script, and sniper state.
  • Current system prompt report in one session:
    • total system prompt ~47.6k chars
    • project context ~21k chars
    • tool schemas ~31.9k chars
    • MEMORY.md is ~16k chars and gets injected truncated to 12k.
  • Daily logs are getting large:
    • memory/2026-04-27.md ~34k chars
    • memory/2026-04-26.md ~107k chars
    • memory/2026-04-25.md ~68k chars
  • Our main sessions often read big source files, docs, logs, and JSONL outputs directly; in one compaction case, tool results were ~585k chars of ~639k active text chars.
  • Semantic memory search is currently failing with provider/credential errors, so we rely on direct file reads and exact memory_get/manual grep.
  • We’re considering whether Obsidian / OpenClaw Memory Wiki / a curated vault could help.

Questions:

  1. What would you recommend as the best memory architecture for this setup?
  2. Should startup stop reading full today/yesterday daily logs and instead read a small “current state / active workstream” file?
  3. Should MEMORY.md be reduced to durable invariants only, with topic pages for active processes, Mission Control, OpenClaw ops, etc.?
  4. Is OpenClaw Memory Wiki the right bridge to an Obsidian-style vault, or should Obsidian stay human-facing while OpenClaw uses compiled summaries?
  5. How should we structure files so the agent can retrieve relevant context on demand without loading giant daily logs?
  6. Should we fix semantic/vector memory first before trying Obsidian/wiki?
  7. Is there a recommended pattern for “daily log as archive” vs “active context as compact startup file”?
  8. Can OpenClaw use Obsidian vault pages safely without injecting the whole vault into every session?
  9. Are there config options for shrinking startup context/tool schema load or lazy-loading tools, or is the main win workflow discipline + memory architecture?
  10. For avoiding compaction context loss, would you recommend:
  • subagents for large file audits,
  • manual /compact at phase boundaries,
  • smaller startup files,
  • Memory Wiki compiled digests,
  • semantic memory provider,
  • or some combination?

Goal: preserve continuity across compactions without stuffing every session with huge .md files or raw logs.

red pineBOT

Let me check the current docs/source for the memory/wiki + pruning knobs so I don’t hand-wave this.


Yep — for this workflow I’d stop treating context as storage. You need a tiered memory architecture.

My recommendation:

1. Shrink “always injected” memory hard

Your always-on prompt is already expensive:

  • ~47.6k system prompt chars
  • ~31.9k tool schemas
  • MEMORY.md injected/truncated
  • extra startup reads of large daily files

For a coding/trading-system agent, I’d make the always-injected layer only:

  • AGENTS.md: operating rules + retrieval protocol
  • USER.md: tiny preferences/identity
  • MEMORY.md: curated durable facts only, ideally <5–8k chars
  • HEARTBEAT.md: tiny checklist only
  • no daily logs by default

Daily files should not be read wholesale at startup. Replace “read today/yesterday” with “read memory/CURRENT.md or search/index first.”

Docs: https://docs.openclaw.ai/concepts/context

2. Add a small ā€œcurrent stateā€ file

For trading-system engineering, create one compact handoff file, e.g.

memory/CURRENT.md

Keep it under ~2–5k chars. It should contain:

  • active branch/session goal
  • current system state
  • known blockers
  • last good commands/tests
  • open risks
  • links/line refs into daily logs

This is what startup should read, not 100k-char daily logs.

Daily logs become append-only archive. CURRENT.md is the working set.
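A minimal sketch of that split, assuming the memory/ layout above. `append_daily`, `write_current`, and the 5,000-char guard are hypothetical; they just show the two write modes (append for the archive, whole-file overwrite with a size budget for the working set):

```python
import tempfile
from datetime import date
from pathlib import Path

CURRENT_MAX_CHARS = 5_000  # upper end of the ~2-5k chars suggested above


def append_daily(root: Path, entry: str, day: date) -> Path:
    """Archive: append one entry to memory/YYYY-MM-DD.md; never rewrite history."""
    path = root / "memory" / f"{day.isoformat()}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(entry.rstrip("\n") + "\n")
    return path


def write_current(root: Path, text: str) -> Path:
    """Working set: replace memory/CURRENT.md wholesale, refusing to let it bloat."""
    if len(text) > CURRENT_MAX_CHARS:
        raise ValueError(f"CURRENT.md would be {len(text)} chars; limit is {CURRENT_MAX_CHARS}")
    path = root / "memory" / "CURRENT.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    append_daily(root, "- 22:15Z compaction after context_length_exceeded", date(2026, 4, 26))
    append_daily(root, "- set gpt-5.5 contextTokens to 240000", date(2026, 4, 26))
    write_current(root, "Goal: stop surprise compactions\nBlocker: memory_search provider errors\n")
    print((root / "memory" / "2026-04-26.md").read_text(encoding="utf-8"))
```

The size guard is the useful part: if CURRENT.md starts growing like a daily log, the write fails loudly instead of silently bloating startup.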

3. Fix retrieval before adding more memory

If memory_search is broken, that’s the first architectural bug. Without search, the model falls back to full-file reads, which is exactly how you get 585k chars of tool output.

Run/check:

openclaw memory status --deep
openclaw memory index --force

Then set an explicit provider if auto-detect is confused:

{
  agents: {
    defaults: {
      memorySearch: {
        provider: "openai" // or "gemini", "github-copilot", "local", "ollama"
      }
    }
  }
}

Docs: https://docs.openclaw.ai/concepts/memory-search

4. Use Memory Wiki/Obsidian for curated knowledge, not raw logs

I’d use Memory Wiki, optionally rendered for Obsidian, but not as a dumping ground.

Good wiki pages:

  • systems/execution-engine.md
  • systems/sniper.md
  • runbooks/deploy.md
  • incidents/2026-04-26-context-overflow.md
  • decisions/model-context-budget.md
  • dashboards/open-questions.md

Bad wiki pages:

  • raw JSONL dumps
  • full logs
  • full source files
  • every daily note copied verbatim

Memory Wiki is best as the compiled belief/knowledge layer: claims, evidence, decisions, contradictions, freshness. Obsidian is mostly a human UX on top.

Docs: https://docs.openclaw.ai/plugins/memory-wiki

5. Cap tool results aggressively

Your biggest problem is not the system prompt; it’s accumulated tool output.


I’d set:

{
  agents: {
    defaults: {
      contextLimits: {
        memoryGetMaxChars: 8000,
        memoryGetDefaultLines: 80,
        toolResultMaxChars: 8000,
        postCompactionMaxChars: 1800
      }
    }
  }
}
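To make the cap concrete: one common way to bound a tool result is to keep its head and tail and mark the gap. This is a hypothetical helper, not OpenClaw’s truncation logic (which may cut differently); the 8000 default just mirrors toolResultMaxChars above:

```python
TOOL_RESULT_MAX_CHARS = 8_000  # mirrors contextLimits.toolResultMaxChars above


def cap_tool_result(text: str, max_chars: int = TOOL_RESULT_MAX_CHARS) -> str:
    """Keep max_chars of an oversized tool result (half head, half tail) plus a gap marker."""
    if len(text) <= max_chars:
        return text
    head = max_chars // 2
    tail = max_chars - head
    omitted = len(text) - max_chars
    marker = f"\n[... {omitted} chars omitted ...]\n"
    # Slice the tail by index, not text[-tail:], so tail == 0 keeps nothing.
    return text[:head] + marker + text[len(text) - tail:]


big_log = "\n".join(f"2026-04-26T22:{i % 60:02d}:00Z event {i}" for i in range(10_000))
capped = cap_tool_result(big_log)
print(len(big_log), len(capped))
```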

And make the agent’s operating rule:

Never read large files blindly. Use rg, wc, head/tail, targeted line ranges, or subagents. Summarize findings; don’t drag raw logs forward.

Docs: https://docs.openclaw.ai/gateway/config-agents
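The “targeted line ranges” habit, sketched as a hypothetical helper in the spirit of memoryGetDefaultLines/memoryGetMaxChars (this is not the memory_get tool itself). It iterates the file lazily, so a 100k-char daily log is never held in memory whole:

```python
import tempfile
from itertools import islice
from pathlib import Path

DEFAULT_LINES = 80  # mirrors contextLimits.memoryGetDefaultLines
MAX_CHARS = 8_000   # mirrors contextLimits.memoryGetMaxChars


def read_range(path: Path, start: int = 1, lines: int = DEFAULT_LINES,
               max_chars: int = MAX_CHARS) -> str:
    """Return numbered lines start..start+lines-1 (1-based), capped at max_chars.

    Iterates the file lazily, so only the requested window is kept in memory.
    """
    out, used = [], 0
    with path.open(encoding="utf-8") as fh:
        window = islice(fh, start - 1, start - 1 + lines)
        for lineno, line in enumerate(window, start=start):
            piece = f"{lineno}: {line}"
            if used + len(piece) > max_chars:
                out.append(f"[truncated at {max_chars} chars]\n")
                break
            out.append(piece)
            used += len(piece)
    return "".join(out)


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "2026-04-26.md"
    log.write_text("".join(f"line {i}\n" for i in range(1, 201)), encoding="utf-8")
    print(read_range(log, start=10, lines=3), end="")
```

In practice you would find the start line with rg first, then pull only that window.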

6. Enable pruning + proactive compaction

For this workflow, I’d enable pruning even outside Anthropic:

{
  agents: {
    defaults: {
      contextPruning: { mode: "cache-ttl", ttl: "5m" },
      compaction: {
        reserveTokensFloor: 40000,
        truncateAfterCompaction: true,
        maxActiveTranscriptBytes: "20mb"
      }
    }
  }
}
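For intuition on what TTL pruning buys you, here is a generic sketch: tool results older than the TTL (5m = 300s) are swapped for a short placeholder, while user/assistant turns stay. The `Message` shape and `prune_tool_results` are hypothetical, this is not how OpenClaw’s cache-ttl mode is implemented, and it ignores any interaction with provider prompt caching:

```python
from dataclasses import dataclass

PRUNED = "[old tool result pruned; re-run the tool if it is still needed]"


@dataclass
class Message:
    role: str          # "user", "assistant", or "tool"
    content: str
    created_at: float  # seconds since session start


def prune_tool_results(history: list[Message], ttl_s: float, now: float) -> list[Message]:
    """Swap tool results older than ttl_s for a placeholder; keep every other turn."""
    return [
        Message(m.role, PRUNED, m.created_at)
        if m.role == "tool" and now - m.created_at > ttl_s
        else m
        for m in history
    ]


history = [
    Message("user", "audit the sniper logs", 0.0),
    Message("tool", "x" * 50_000, 10.0),  # big raw log read early on
    Message("assistant", "summary of the log findings", 30.0),
    Message("tool", "recent grep hit", 400.0),
]
kept = prune_tool_results(history, ttl_s=300, now=420.0)  # ttl "5m" = 300s
print([len(m.content) for m in kept])
```

The assistant’s summary survives while the 50k-char raw read does not, which is the “conclusions, not raw evidence” rule applied mechanically.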

Separately from the memory architecture, I’d still set:

{
  models: {
    providers: {
      "openai-codex": {
        models: [
          { id: "gpt-5.5", contextTokens: 240000 }
        ]
      }
    }
  }
}

Then design the agent as if 240k is the ceiling, not 1M.


My preferred shape

Hot prompt: tiny rules + current state pointer
CURRENT.md: active working set
MEMORY.md: durable facts only
daily logs: raw chronological archive
memory_search/QMD: broad retrieval
Memory Wiki: curated decisions/entities/runbooks
session pruning: trims old tool output
compaction: semantic checkpointing
subagents: summarize huge files/logs instead of importing them into main context

The big rule: main agent context should contain conclusions and pointers, not raw evidence unless needed right now.


My recommendation: yes, move to a layered memory system. Don’t make the active session carry your whole operational history.

Best architecture

AGENTS.md                 # short operating rules + retrieval protocol
MEMORY.md                 # durable invariants only, high signal
memory/CURRENT.md         # compact active state / handoff, read at startup
memory/workstreams/*.md   # active project/process pages
memory/runbooks/*.md      # stable procedures
memory/decisions/*.md     # decisions + rationale
memory/incidents/*.md     # postmortems / known issues
memory/YYYY-MM-DD.md      # raw daily archive, not startup context
wiki/ or Obsidian vault   # compiled human/agent knowledge layer
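If it helps, a hypothetical scaffolding script for that tree (paths taken from the layout above; wiki/ or the Obsidian vault is left out because its location is up to you). It only creates what is missing and never overwrites existing files:

```python
import tempfile
from pathlib import Path

# Paths from the layout above; daily logs are created by the agent as it goes.
DIRS = ["memory/workstreams", "memory/runbooks", "memory/decisions", "memory/incidents"]
FILES = ["AGENTS.md", "MEMORY.md", "memory/CURRENT.md"]


def scaffold(root: Path) -> list[Path]:
    """Create any missing directories and empty files; never touch existing ones."""
    created = []
    for rel in DIRS:
        path = root / rel
        if not path.exists():
            path.mkdir(parents=True)
            created.append(path)
    for rel in FILES:
        path = root / rel
        if not path.exists():
            path.parent.mkdir(parents=True, exist_ok=True)
            path.touch()
            created.append(path)
    return created


with tempfile.TemporaryDirectory() as tmp:
    for p in scaffold(Path(tmp)):
        print(p.relative_to(tmp))
```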

Direct answers

  1. Best memory architecture: hot/cold split. Startup reads only compact state; archives are searched/read on demand.

  2. Yes, stop reading full today/yesterday logs. Read memory/CURRENT.md plus maybe a tiny “recent changes” section. Daily logs at 34k/107k/68k are archives, not startup context.

  3. Yes, shrink MEMORY.md to durable invariants: things like “production constraints”, “trading safety rules”, “operator preferences”, “known permanent architecture decisions”. Move active processes into topic pages.

  4. Memory Wiki is the right bridge, but Obsidian should stay mostly human-facing. Let OpenClaw consume compiled summaries/digests/pages via wiki_search/wiki_get, not the whole vault. Obsidian is great as the editor/UI.

  5. Structure files around retrieval. One page per system, incident, decision, workstream. Each page should have:

    • status
    • last updated
    • current truth
    • evidence links / daily-log refs
    • open questions
    • next actions
  6. Yes, fix semantic/vector memory first. Without working search, the agent will keep doing giant read/grep passes. Run:

   openclaw memory status --deep
   openclaw memory index --force

Docs: https://docs.openclaw.ai/concepts/memory-search

  7. Recommended pattern: daily log = append-only black box; CURRENT.md = cockpit.

    Daily log records what happened. CURRENT.md says what matters now.

  8. Yes, OpenClaw can use Obsidian pages safely if the vault is searched/fetched on demand. Don’t put the whole vault, or symlinked huge folders, into always-injected bootstrap files.

  9. Some config helps, but architecture/workflow is the main win. Useful knobs:

    • agents.defaults.contextInjection: "continuation-skip"
    • bootstrapMaxChars
    • bootstrapTotalMaxChars
    • startupContext.maxTotalChars
    • contextLimits.memoryGetMaxChars
    • contextLimits.toolResultMaxChars
    • tools.profile / tools.allow / tools.byProvider
    • skill allowlists

    Tool schemas do count, but the huge killer in your example is tool output/history, not just schemas. Docs: https://docs.openclaw.ai/concepts/context

  10. Use a combination:
  • ✅ subagents for large file/log audits
  • ✅ manual /compact at phase boundaries
  • ✅ small startup files
  • ✅ semantic memory provider
  • ✅ Memory Wiki for compiled durable knowledge
  • ✅ context pruning for old tool results
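For item 5, a hypothetical page template with those fields (status, last updated, current truth, evidence links / daily-log refs, open questions, next actions). The renderer is just an illustration; Memory Wiki may use its own page format:

```python
from datetime import date

SECTIONS = ["Current truth", "Evidence (links / daily-log refs)", "Open questions", "Next actions"]


def topic_page(title: str, status: str, updated: date, body: dict[str, list[str]]) -> str:
    """Render a topic page: status, last-updated date, then one bullet list per section."""
    lines = [f"# {title}", "", f"Status: {status}", f"Last updated: {updated.isoformat()}", ""]
    for section in SECTIONS:
        lines.append(f"## {section}")
        items = body.get(section) or ["(none yet)"]
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    return "\n".join(lines)


page = topic_page(
    "decisions/model-context-budget",
    "active",
    date(2026, 4, 27),
    {
        "Current truth": ["gpt-5.5 via Codex OAuth: plan for ~272k input; contextTokens 240000"],
        "Evidence (links / daily-log refs)": [
            "memory/2026-04-26.md: compaction at 22:15:47Z, tokensBefore=271575",
        ],
    },
)
print(page)
```

Empty sections render as “(none yet)” so a stale or half-filled page is obvious at a glance.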

Suggested config starter:

{
  agents: {
    defaults: {
      contextInjection: "continuation-skip",
      startupContext: {
        enabled: true,
        dailyMemoryDays: 0,
        maxTotalChars: 3000
      },
      contextLimits: {
        memoryGetMaxChars: 8000,
        memoryGetDefaultLines: 80,
        toolResultMaxChars: 8000,
        postCompactionMaxChars: 1800
      },
      contextPruning: { mode: "cache-ttl", ttl: "5m" },
      compaction: {
        reserveTokensFloor: 40000,
        truncateAfterCompaction: true,
        maxActiveTranscriptBytes: "20mb"
      }
    }
  }
}
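To show what a small startup budget means in practice, a hypothetical loader that reads files in priority order (CURRENT.md first) and stops at a 3,000-char total, mirroring startupContext.maxTotalChars. It is not how OpenClaw assembles startup context, and the file order is an assumption:

```python
import tempfile
from pathlib import Path

STARTUP_ORDER = ["memory/CURRENT.md", "MEMORY.md"]  # assumed priority order
MAX_TOTAL_CHARS = 3_000  # mirrors startupContext.maxTotalChars above


def build_startup_context(root: Path, order=STARTUP_ORDER, max_total=MAX_TOTAL_CHARS) -> str:
    """Concatenate files in priority order, cutting off at max_total chars overall.

    Daily logs are deliberately not in the order (dailyMemoryDays: 0).
    """
    parts, remaining = [], max_total
    for rel in order:
        path = root / rel
        if remaining <= 0 or not path.is_file():
            continue
        chunk = (f"=== {rel} ===\n" + path.read_text(encoding="utf-8"))[:remaining]
        parts.append(chunk)
        remaining -= len(chunk)
    return "".join(parts)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "memory").mkdir()
    (root / "memory" / "CURRENT.md").write_text("Goal: ship sniper fix\n", encoding="utf-8")
    (root / "MEMORY.md").write_text("Invariant: never trade without risk checks\n", encoding="utf-8")
    print(build_startup_context(root))
```

Because CURRENT.md comes first, it always survives the cut; lower-priority files get whatever budget is left.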

And I’d add this rule to AGENTS.md:

On startup, read memory/CURRENT.md first. Do not read daily logs wholesale. Use memory search, targeted line ranges, or subagents for archives/logs/source audits. Keep conclusions in CURRENT.md or topic pages.

That gives you continuity across compactions without dragging raw logs into every turn.

rain vessel

Why is 5.5 not showing as an available model when connecting via codex oauth?

red pineBOT

Most likely one of three things:

  1. Version/catalog lag. First check openclaw --version. If you’re on 2026.4.23, update — current OpenClaw docs say Codex OAuth should support gpt-5.5, and the docs now prefer the canonical ref openai/gpt-5.5 with openai-codex auth, while openai-codex/* is legacy aliasing. (docs.openclaw.ai)

  2. OpenAI rollout/account gating. OpenAI says GPT‑5.5 is rolling out gradually to Plus/Pro/Business/Enterprise users in ChatGPT and Codex, so it may simply not be available to that account/session yet. (help.openai.com)

  3. Wrong expectation from Codex model picker. OpenAI’s Codex docs also say Codex model availability is separate from ChatGPT/API and can change over time. So “I have it in ChatGPT” ≠ guaranteed visible through the Codex OAuth catalog immediately. (help.openai.com)

I’d try:

openclaw --version
openclaw models list --provider openai-codex
openclaw models auth login --provider openai-codex
openclaw models list --provider openai-codex