#About compaction

frail laurel
#

In the OpenClaw dashboard, I can see "Compacting context", but it takes too long to compact. What causes this, and how can I fix it?

jagged ridgeBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

molten mauveBOT
#

Compaction is not “just a local cleanup” — it’s an actual LLM summarization run over a potentially huge chunk of your session history. So “Compacting context…” can take a while when any of these are true:

Why it can be slow

  • Your transcript is large (lots of messages + big tool outputs). More tokens in → slower summarization.
  • The session’s model is slow / rate-limited (compaction uses your primary model unless you override it).
  • It may be doing multiple passes: OpenClaw can do a silent memory flush before compaction, then compaction, then retry your original request. That can feel like “it’s stuck compacting”.

Things that usually fix it

  1. Use a faster dedicated model for compaction (so your “main” model can be whatever you like, but compaction stays snappy):
    agents.defaults.compaction.model: "<provider>/<model-id>"
    Docs: https://docs.openclaw.ai/concepts/compaction

  2. Reduce how much it tries to keep uncompressed (if you’re compacting huge histories, keeping too many “recent tokens” makes the compaction job itself massive). The deep dive explains the knobs: https://docs.openclaw.ai/reference/session-management-compaction

  3. Stop context bloat at the source

Quick debugging questions (so we can be precise):

  • What does /status show for contextTokens / contextWindow and Compactions?
  • What’s your openclaw --version and which provider/model are you using for that session?
  • Is this a local model (Ollama/LM Studio/etc.) or cloud?

If you paste /status output, I can tell whether you’re hitting “huge transcript”, “slow model”, or “compaction thresholds too conservative.”

frail laurel
#

Yes, I have a lot of text, plus exec results, so…

molten mauveBOT
#

Yep — lots of exec output is the #1 reason compaction feels slow. Compaction has to “read” (tokenize + summarize) all that accumulated history, and tool results add up fast even when each one is individually truncated.

Practical ways to improve it:

  1. Don’t dump huge command output into the session

    • Prefer “summaries” at the shell level: | tail -n 200, | rg 'error|warn', --json + filter, etc.
    • For logs, grab just the relevant slice instead of the whole thing.
  2. Enable session pruning (prevents old tool results from bloating the prompt between turns)

    • This trims old toolResult messages right before model calls (doesn’t rewrite the JSONL transcript, but it does keep context smaller so you hit compaction less / later).
    • Config is agents.defaults.contextPruning (docs + examples):
      https://docs.openclaw.ai/concepts/session-pruning
    • Note: right now it’s Anthropic-only (and OpenRouter Anthropic). If you’re on a different provider, tell me which one and I’ll point at the right lever.
  3. Use a faster model just for compaction

    • Set agents.defaults.compaction.model to a faster/cheaper summarization model you have access to, so your “main” model choice doesn’t make compaction painful.
      Docs: https://docs.openclaw.ai/concepts/compaction
  4. If you’re already deep in the weeds: start a fresh session
    • /new (or /reset) is the instant fix when a session transcript is huge and you don’t need it all kept “alive”.

If you paste:

  • /status (especially contextTokens + compactions count)
  • what provider/model you’re using
I can tell whether pruning will help immediately, or whether you should mostly solve it by swapping the compaction model or tuning keepRecentTokens and reserveTokens (deep dive: https://docs.openclaw.ai/reference/session-management-compaction).

frail laurel
#

It’s OpenAI GPT 4.5. Is there a good way to solve this? Thinking is also set to high, which I think should be none or low.

molten mauveBOT
#

If you’re on openai/gpt-4.5 and /think high, then yeah: compaction is going to feel chunky. Compaction is essentially a big summarization call, and “high thinking” adds extra latency on top.

1) Drop thinking for normal chat (and only turn it up when you need it)

You can persist it for the session with a directive-only message:

  • /think off (or /think low / /think minimal)

Then, when you do want deeper reasoning for one message, you can add it inline (non-persistent hint) to just that request.

Docs (exact options + behavior): https://docs.openclaw.ai/tools/slash-commands

2) Use a different (faster) model just for compaction

By default compaction uses the session’s primary model — so if your primary is slow/expensive, compaction will be too. You can override:

  • agents.defaults.compaction.model: "<provider>/<model-id>"

Pick something you have access to that’s “fast + good at summarizing” rather than “max intelligence”.
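
As a rough sketch, the dotted path above would nest like this in openclaw.json. This assumes the override sits under the same agents.defaults layout that other defaults use; the model value is a placeholder to replace, not a recommendation, and the compaction docs remain the authority on the exact shape.

```json5
// openclaw.json (sketch): only the compaction override is shown
{
  agents: {
    defaults: {
      compaction: {
        // Placeholder: substitute a fast summarization model you have access to.
        model: "<provider>/<model-id>",
      },
    },
  },
}
```

With an override like this, the session's primary model stays as it is and only the summarization run is meant to use the model named here.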

Docs: https://docs.openclaw.ai/concepts/compaction

3) Exec output is the gasoline on the fire

Even truncated outputs accumulate. Best mitigation is operational:

  • pipe/slice output (| tail -n 200, | rg ..., etc.)
  • avoid dumping whole logs into the session
  • if the session is already huge, /new is the instant reset

One quick check

Run /status once and see how close you are to the context window + how often compaction has happened. If you paste that (plus your openclaw --version), I can tell whether you’re mostly suffering from thinking level, model choice for compaction, or just sheer transcript size.

frail laurel
#

It should not take 10-20 minutes, right?

molten mauveBOT
#

No — 10–20 minutes is not normal for compaction.

In most setups compaction takes on the order of seconds to a couple of minutes. When it takes that long, it usually means the run is not “just compacting” but is stuck or retrying behind the scenes.

Common causes in your case (OpenAI + lots of exec output + /think high):

  1. You’re effectively paying for multiple slow calls

    • (optional) silent memory flush
    • compaction summarization
    • retry of the original request
      With /think high, each of those can get painfully slow.
  2. OpenAI request is stalling or retrying (rate limits / network / transport)

    • You’ll see this immediately in logs as repeated retries / long gaps.

What I’d do first (fast triage)

  • In-chat: /think off (or /think low) then run /compact once manually.

If logs suggest OpenAI transport weirdness

You can try forcing SSE (often more stable than WS for some networks) and/or disabling WS warmup:

Longer-term fixes

  • Keep /think low/off by default, only bump it for one-off prompts.
  • Set a faster compaction model via agents.defaults.compaction.model (so compaction doesn’t inherit your “heavy” model).

If you paste the relevant log snippet around one compaction attempt (even just the error/retry lines) + openclaw --version, I can tell you which bucket you’re in (stall vs rate limit vs “too much context + high thinking”).

frail laurel
#

Agent embedded run timeout during compaction?

molten mauveBOT
#

Yep — that error is exactly what it sounds like: compaction runs inside the normal embedded agent loop, and the loop has a hard timeout.

From the agent-loop docs, the embedded runtime timeout is:

So if compaction (plus any pre-flush + retry) takes longer than that, you’ll see “Agent embedded run timeout” and the compaction won’t finish.

Best fixes (recommended)

  1. Turn thinking down (high thinking makes compaction much slower)

    • /think off (or /think low) in that session
  2. Use a faster model for compaction

    • agents.defaults.compaction.model: "<provider>/<model-id>"
      (so your main model can stay “smart”, but compaction stays fast)
  3. Reduce giant exec dumps (they’re the usual reason compaction becomes huge)

If you just need it to stop timing out

Increase the runtime timeout in openclaw.json, e.g.:

{
  agents: {
    defaults: {
      timeoutSeconds: 1200, // 20 minutes
    },
  },
}

If you paste the /status output (contextTokens/contextWindow + compactions count), I can help you pick whether this is better solved by timeoutSeconds vs compaction model + thinking level (usually the latter).

frail laurel
#

Is there an easier way to compact? Compaction always causes problems. Should I create an outside-in compaction mode?

molten mauveBOT
#

“Outside‑in compaction mode” isn’t a built-in thing today (at least there’s no documented mode: "outside-in" in agents.defaults.compaction). So I wouldn’t start by inventing a new mode — the problems you’re seeing are almost certainly from runtime timeout + huge transcript + slow settings, not the compaction strategy itself.

Easiest / most reliable ways to make compaction stop being painful

  1. Make compaction fast
  • Set default thinking low/off: /think off (or /think low)
  • Use a dedicated compaction model: agents.defaults.compaction.model: "<provider>/<model-id>" (pick something fast at summarization)
  2. Stop it timing out mid-compaction
  3. Prevent the transcript from becoming “uncompactable”
  • Biggest win: don’t persist huge exec dumps (slice output with tail, rg, etc.)
  • Compaction has to read that history to summarize it — so garbage in = slow compaction.
  4. Operational workaround: compact earlier / reset more often
  • Run /compact at natural milestones (before the session becomes enormous)
  • Or /new when the thread has served its purpose

If you really want custom behavior (advanced)

OpenClaw can delegate /compact + context assembly to a context-engine plugin (kind: "context-engine" via plugins.slots.contextEngine), but that’s a real engineering project, not a quick config tweak. Docs: https://docs.openclaw.ai/tools/plugin

If you paste your /status and the exact timeout/error line from logs, I can suggest a minimal config that usually fixes this (timeoutSeconds + thinking + compaction.model).
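
For reference, a minimal sketch of that combination might look like the following. It assumes both keys live under agents.defaults, as in the dotted paths used in this thread; the 1200-second value and the model string are illustrative placeholders rather than tuned recommendations. Thinking level is not part of this fragment because it is set in chat with /think.

```json5
// openclaw.json (sketch): longer runtime timeout plus a dedicated compaction model
{
  agents: {
    defaults: {
      // Illustrative value: 20 minutes. Raise it only as far as you need.
      timeoutSeconds: 1200,
      compaction: {
        // Placeholder: substitute a fast summarization model you have access to.
        model: "<provider>/<model-id>",
      },
    },
  },
}
```

The timeout only buys headroom; the compaction model and a lower thinking level are what should actually shorten the run.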

frail laurel
#

What model is the best for compaction then?