#Optimise compaction

1 messages · Page 1 of 1 (latest)

oblique cedar
#

I want to optimise compaction and avoid tools going into loops and prompts abruptly ending. I running lmstudio/qwen/qwen3.6-35b-a3b on a beelink gtr9 with 96 GB RAM and have set the maximum context to 256K. Here is my openclaw config, please make suggestions and ask questions

  "compaction": {
    "mode": "default",
    "model": "lmstudio/qwen/qwen3.6-35b-a3b",
    "timeoutSeconds": 3600,
    "reserveTokens": 32000,
    "keepRecentTokens": 40000,
    "reserveTokensFloor": 48000,
    "maxHistoryShare": 0.65,
    "memoryFlush": {
      "enabled": true,
      "model": "lmstudio/qwen/qwen3.6-35b-a3b",
      "softThresholdTokens": 64000,
      "prompt": "Save decisions, state changes, lessons, and open tasks to memory/YYYY-MM-DD.md. Skip routine conversation. If nothing worth storing: NO_>
      "systemPrompt": "Only persist what matters for session continuity. Be concise."
    }
  },
  "contextPruning": {
    "mode": "cache-ttl",
    "ttl": "30m",
    "keepLastAssistants": 4,
    "softTrimRatio": 0.65,
    "hardClearRatio": 0.88,
    "minPrunableToolChars": 1200
  },
keen zincBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> âś… Mark Solution

sick urchinBOT
#

Rex Horizon — a few things jump out. I checked the current docs/source rather than guessing claws-first 🦞

First: can you share openclaw --version? A lot of compaction/pruning behavior has changed recently, so version matters.

Main suggestions:

  1. Use safeguard mode for this setup
    default works, but with 256K context + long local-model sessions I’d prefer the chunked safeguard path:
compaction: {
  mode: "safeguard",
  timeoutSeconds: 1800,
  reserveTokens: 48000,
  reserveTokensFloor: 48000,
  keepRecentTokens: 40000,
#
  maxHistoryShare: 0.65,
  midTurnPrecheck: { enabled: true },
  truncateAfterCompaction: true,
  maxActiveTranscriptBytes: "40mb",
}

midTurnPrecheck is especially relevant to “tools go into loops / prompt abruptly ends”: it checks pressure after tool results are appended but before the next model call, so OpenClaw can compact/truncate instead of sending an overstuffed prompt.

  1. Your memoryFlush.softThresholdTokens: 64000 is probably too high
    In source, the trigger threshold is effectively:

contextWindow - reserveTokensFloor - softThresholdTokens

So with 256K / 48K / 64K, the preflight threshold is around 144K. That means housekeeping/compaction pressure starts quite early. I’d try 16000 or 24000 first.

#

Also your prompt looks truncated: NO_> — make sure it says the exact silent token NO_REPLY, otherwise silent housekeeping can leak or behave oddly.

Suggested:

memoryFlush: {
  enabled: true,
  model: "lmstudio/qwen/qwen3.6-35b-a3b",
  softThresholdTokens: 16000,
  systemPrompt: "Only persist durable session continuity facts. Be concise. Reply exactly NO_REPLY if nothing matters.",
  prompt: "Save decisions, state changes, lessons, and open tasks to memory/YYYY-MM-DD.md. Skip routine conversation. If nothing worth storing, reply exactly NO_REPLY."
}
  1. Your pruning is maybe too aggressive on small tool results
    minPrunableToolChars: 1200 can prune/clear fairly small outputs. If the model loses a useful prior tool result, it may re-call tools and look loop-y. I’d raise it:
#
contextPruning: {
  mode: "cache-ttl",
  ttl: "10m",
  keepLastAssistants: 4,
  softTrimRatio: 0.5,
  hardClearRatio: 0.75,
  minPrunableToolChars: 8000,
  softTrim: { maxChars: 4000, headChars: 1500, tailChars: 1500 },
  hardClear: { enabled: true, placeholder: "[Old tool result content cleared]" }
}
  1. Enable actual tool-loop detection
    This is separate from compaction/pruning and is disabled by default:
#
tools: {
  loopDetection: {
    enabled: true,
    historySize: 30,
    warningThreshold: 8,
    criticalThreshold: 16,
    globalCircuitBreakerThreshold: 24,
    detectors: {
      genericRepeat: true,
      knownPollNoProgress: true,
      pingPong: true,
    },
  },
}
#
  1. Consider lowering runaway retry budget
    If this is the embedded OpenClaw/Pi runtime, agents.defaults.runRetries can cap runaway attempts:
runRetries: {
  base: 16,
  perProfile: 4,
  min: 24,
  max: 60,
}

Questions that would help tune this properly:

  • What OpenClaw version are you on? (openclaw --version)
  • Are these Discord/chat embedded sessions, or a CLI/ACP backend? Some compaction bits only apply to embedded Pi sessions.
  • In LM Studio, is the model actually loaded with 256K context, and does OpenClaw know that context window for this provider/model? If OpenClaw thinks it’s smaller, compaction will fire much earlier.
oblique cedar
#

I'm running OpenClaw 2026.5.7

sick urchinBOT
#

Great, 2026.5.7 is new enough that the knobs I mentioned are relevant.

For your exact setup, I’d focus on three likely pressure points:

  1. Make sure OpenClaw’s model metadata also says 256K
    LM Studio’s max context alone isn’t enough if OpenClaw’s provider config still has a lower/default window. In your models.providers.lmstudio.models[], make sure the matching model has something like:
{
  id: "qwen/qwen3.6-35b-a3b",
  contextWindow: 262144,
  contextTokens: 262144,
  maxTokens: 8192
}

If you want a safer effective cap, set contextTokens: 200000 while leaving contextWindow: 262144. That gives OpenClaw room to compact before LM Studio hard-fails.

#
  1. I would not keep memoryFlush.softThresholdTokens at 64000
    That makes preflight compaction/flush begin very early for a 256K window. I’d start with:
memoryFlush: {
  enabled: true,
  model: "lmstudio/qwen/qwen3.6-35b-a3b",
  softThresholdTokens: 16000,
  systemPrompt: "Only persist durable session continuity facts. Be concise. Reply exactly NO_REPLY if nothing matters.",
  prompt: "Save decisions, state changes, lessons, and open tasks to memory/YYYY-MM-DD.md. Skip routine conversation. If nothing worth storing, reply exactly NO_REPLY."
}
  1. Turn on loop protection separately from compaction
    Compaction won’t necessarily stop a local model from repeatedly polling/reading/searching. Add:
#
tools: {
  loopDetection: {
    enabled: true,
    historySize: 30,
    warningThreshold: 8,
    criticalThreshold: 16,
    globalCircuitBreakerThreshold: 24,
    detectors: {
      genericRepeat: true,
      knownPollNoProgress: true,
      pingPong: true,
    },
  },
}
#

My recommended compaction block for you would be:

compaction: {
  mode: "safeguard",
  model: "lmstudio/qwen/qwen3.6-35b-a3b",
  timeoutSeconds: 1800,
  reserveTokens: 48000,
  reserveTokensFloor: 48000,
  keepRecentTokens: 40000,
  maxHistoryShare: 0.65,
  midTurnPrecheck: { enabled: true },
  truncateAfterCompaction: true,
  maxActiveTranscriptBytes: "40mb",
  memoryFlush: { /* as above */ },
}
#

One important question: what does your models.providers.lmstudio block look like, especially api, contextWindow, contextTokens, and maxTokens? That’s the part I’d want to verify next, because mismatched context metadata is a classic cause of “why did it compact / end / fail so early?”

oblique cedar
#

Yes, it was truncated here it is in full

  "compaction": {
    "mode": "default",
    "model": "lmstudio/qwen/qwen3.6-35b-a3b",
    "timeoutSeconds": 3600,
    "reserveTokens": 32000,
    "keepRecentTokens": 40000,
    "reserveTokensFloor": 48000,
    "maxHistoryShare": 0.65,
    "memoryFlush": {
      "enabled": true,
      "model": "lmstudio/qwen/qwen3.6-35b-a3b",
      "softThresholdTokens": 64000,
      "prompt": "Save decisions, state changes, lessons, and open tasks to memory/YYYY-MM-DD.md. Skip routine conversation. If nothing worth storing: NO_FLUSH",
      "systemPrompt": "Only persist what matters for session continuity. Be concise."
    }
  },
  "contextPruning": {
    "mode": "cache-ttl",
    "ttl": "30m",
    "keepLastAssistants": 4,
    "softTrimRatio": 0.65,
    "hardClearRatio": 0.88,
    "minPrunableToolChars": 1200
  },
sick urchinBOT
#

Yep — that full version confirms one thing I’d definitely change:

NO_FLUSH is not the silent token OpenClaw documents. Use NO_REPLY instead.

So this:

"prompt": "Save decisions, state changes, lessons, and open tasks to memory/YYYY-MM-DD.md. Skip routine conversation. If nothing worth storing: NO_FLUSH"

should become:

"prompt": "Save decisions, state changes, lessons, and open tasks to memory/YYYY-MM-DD.md. Skip routine conversation. If nothing worth storing, reply exactly NO_REPLY."

I’d also add the same instruction to systemPrompt, because local models sometimes treat the user prompt as “soft”:

#
"systemPrompt": "Only persist what matters for session continuity. Be concise. If nothing matters, reply exactly NO_REPLY."

The other big thing: your reserveTokens: 32000 is effectively misleading because reserveTokensFloor: 48000 means OpenClaw will bump the reserve up to 48K anyway. I’d make them match so future-you doesn’t have to remember the hidden floor behavior:

"reserveTokens": 48000,
"reserveTokensFloor": 48000,

For 256K context, I’d change this next:

"softThresholdTokens": 64000
#

to either:

"softThresholdTokens": 16000

or, if you want a more conservative safety margin:

"softThresholdTokens": 24000

64K is huge. With your 48K reserve floor, it makes the preflight threshold roughly 256K - 48K - 64K = 144K. That means memory flush / compaction pressure starts while you still have a ton of window left.

For pruning, I’d raise minPrunableToolChars. 1200 is small enough that useful tool results can disappear from model-visible context, which can cause re-reading/re-polling loops. Try:

#
"minPrunableToolChars": 8000

or 12000 if you do lots of code/debugging.

My adjusted version:

"compaction": {
  "mode": "safeguard",
  "model": "lmstudio/qwen/qwen3.6-35b-a3b",
  "timeoutSeconds": 1800,
  "reserveTokens": 48000,
  "keepRecentTokens": 40000,
  "reserveTokensFloor": 48000,
#
  "maxHistoryShare": 0.65,
  "midTurnPrecheck": { "enabled": true },
  "truncateAfterCompaction": true,
  "maxActiveTranscriptBytes": "40mb",
  "memoryFlush": {
    "enabled": true,
    "model": "lmstudio/qwen/qwen3.6-35b-a3b",
    "softThresholdTokens": 16000,
    "prompt": "Save decisions, state changes, lessons, and open tasks to memory/YYYY-MM-DD.md. Skip routine conversation. If nothing worth storing, reply exactly NO_REPLY.",
    "systemPrompt": "Only persist what matters for session continuity. Be concise. If nothing matters, reply exactly NO_REPLY."
  }
},
"contextPruning": {
  "mode": "cache-ttl",
  "ttl": "10m",
#
  "keepLastAssistants": 4,
  "softTrimRatio": 0.5,
  "hardClearRatio": 0.75,
  "minPrunableToolChars": 8000,
  "softTrim": { "maxChars": 4000, "headChars": 1500, "tailChars": 1500 },
  "hardClear": { "enabled": true, "placeholder": "[Old tool result content cleared]" }
}

And separately add loop detection under top-level tools, not inside compaction:

"tools": {
  "loopDetection": {
    "enabled": true,
#
    "historySize": 30,
    "warningThreshold": 8,
    "criticalThreshold": 16,
    "globalCircuitBreakerThreshold": 24,
    "detectors": {
      "genericRepeat": true,
      "knownPollNoProgress": true,
      "pingPong": true
    }
  }
}

One nuance: if Qwen is slow at summarization on the GTR9, keep timeoutSeconds: 3600. I suggested 1800 mostly to avoid an hour-long wedged compaction, but local hardware can be spiky.

oblique cedar
#

This is the entries for models

      {
        "id": "qwen/qwen3.6-35b-a3b",
        "name": "Qwen3.6 35B A3B",
        "reasoning": true,
        "input": [
          "text",
          "image"
        ],
        "cost": {
          "input": 0,
          "output": 0,
          "cacheRead": 0,
          "cacheWrite": 0
        },
        "compat": {
          "supportsReasoningEffort": true,
          "supportedReasoningEfforts": [
            "off",
            "on"
          ],
          "reasoningEffortMap": {
            "off": "off",
            "none": "off",
            "minimal": "on",
            "low": "on",
            "medium": "on",
            "high": "on",
            "xhigh": "on",
            "adaptive": "on",
            "max": "on"
          },
        },
        "contextWindow": 262144,
        "contextTokens": 131072,
        "maxTokens": 8192
      }
sick urchinBOT
#

Ah — that explains a lot.

You have:

"contextWindow": 262144,
"contextTokens": 131072,

contextWindow is the native/model metadata, but contextTokens is the effective runtime cap OpenClaw uses. So right now OpenClaw is behaving as if this is a 128K-ish model, not a 256K model.

With your current compaction settings, the preflight threshold is roughly:

131072 - 48000 - 64000 = 19072 tokens
#

That is extremely early. It means memory flush / compaction pressure can start around 19K visible tokens. That’s probably the main source of weird early compaction / abrupt behavior.

I’d change the model entry to either full 256K:

"contextWindow": 262144,
"contextTokens": 262144,
"maxTokens": 8192

or my preferred safer option:

"contextWindow": 262144,
"contextTokens": 200000,
"maxTokens": 8192
#

Then pair that with:

"reserveTokens": 48000,
"reserveTokensFloor": 48000,
"memoryFlush": {
  "softThresholdTokens": 16000
}

That gives you thresholds around:

  • 262144 - 48000 - 16000 = 198144 if using full 256K
  • 200000 - 48000 - 16000 = 136000 if using safer 200K cap

That’s much more reasonable.

#

One more config hygiene note: if your config file is JSON5, the trailing comma after reasoningEffortMap is fine. If it’s strict JSON, remove it.

Also, if “abrupt ending” means the assistant stops mid-answer rather than context errors, consider raising:

"maxTokens": 16384

…but I’d only do that after fixing contextTokens, because bigger output budgets can make local models slower and increase pressure.

So the biggest fix is: don’t leave contextTokens at 131072 if you’re trying to run a 256K setup.

oblique cedar
#

When I say abrubt ended I mean I get ' Agent couldn't generate a response.'

sick urchinBOT
#

Got it — that specific message means OpenClaw finished the run without a deliverable assistant message. It’s not just “the answer was short.” Internally that can happen when the last model turn is one of these:

  • empty response from the provider
  • reasoning/thinking-only response with no final text
  • the model ended on a tool call and never produced the post-tool final answer
  • provider/model error surfaced as an incomplete turn

Given your config, I’d suspect two things first:

  1. contextTokens: 131072 + softThresholdTokens: 64000
    This was making the effective threshold about 19K tokens, so the agent could get pushed into housekeeping/compaction paths very early. Fix that first.

  2. Qwen reasoning mode may be producing reasoning-only / no final text
    Your model entry says:

#
"reasoning": true,
"compat": {
  "supportsReasoningEffort": true,
  "supportedReasoningEfforts": ["off", "on"]
}

For stability, I’d explicitly force reasoning off at the agent default level while testing:

"agents": {
  "defaults": {
    "reasoningDefault": "off",
    "thinkingDefault": "off"
  }
#
}

And for the model entry, if LM Studio/Qwen is flaky with reasoning/tool finalization, test temporarily with:

"reasoning": false,
"compat": {
  "supportsReasoningEffort": false
}

That’s not necessarily the final setting, but it’s a good isolation test. If the “couldn’t generate” errors disappear, the issue is likely reasoning/final-answer formatting rather than compaction.

My debugging order would be:

#
  1. Change model cap:
"contextWindow": 262144,
"contextTokens": 200000
  1. Change memory flush:
"softThresholdTokens": 16000
  1. Replace NO_FLUSH with NO_REPLY.

  2. Enable midTurnPrecheck and tools.loopDetection.

#
  1. Temporarily force reasoning off.

Then reproduce once and check logs around that timestamp. The useful thing to look for is the final stop reason / provider error around the incomplete turn. If you can paste the gateway log lines immediately before Agent couldn't generate a response, we can tell whether it’s empty response, reasoning-only, tool-use terminal, timeout, or context overflow.

oblique cedar
#

So can you give me the changes in full

sick urchinBOT
#

Yep — here’s the full set I’d try as a clean baseline.

Under your models.providers.lmstudio.models[], change the Qwen entry to this shape:

{
  "id": "qwen/qwen3.6-35b-a3b",
  "name": "Qwen3.6 35B A3B",
  "reasoning": false,
  "input": ["text", "image"],
  "cost": {
    "input": 0,
    "output": 0,
    "cacheRead": 0,
    "cacheWrite": 0
  },
#
  "compat": {
    "supportsReasoningEffort": false
  },
  "contextWindow": 262144,
  "contextTokens": 200000,
  "maxTokens": 8192
}

Then under agents.defaults, use:

{
  "reasoningDefault": "off",
  "thinkingDefault": "off",
#

  "compaction": {
    "mode": "safeguard",
    "model": "lmstudio/qwen/qwen3.6-35b-a3b",
    "timeoutSeconds": 1800,
    "reserveTokens": 48000,
    "keepRecentTokens": 40000,
    "reserveTokensFloor": 48000,
    "maxHistoryShare": 0.65,
    "midTurnPrecheck": {
      "enabled": true
    },
    "truncateAfterCompaction": true,
    "maxActiveTranscriptBytes": "40mb",
    "memoryFlush": {
#
      "enabled": true,
      "model": "lmstudio/qwen/qwen3.6-35b-a3b",
      "softThresholdTokens": 16000,
      "prompt": "Save decisions, state changes, lessons, and open tasks to memory/YYYY-MM-DD.md. Skip routine conversation. If nothing worth storing, reply exactly NO_REPLY.",
      "systemPrompt": "Only persist what matters for session continuity. Be concise. If nothing matters, reply exactly NO_REPLY."
    }
  },

  "contextPruning": {
    "mode": "cache-ttl",
    "ttl": "10m",
    "keepLastAssistants": 4,
    "softTrimRatio": 0.5,
    "hardClearRatio": 0.75,
    "minPrunableToolChars": 8000,
#
    "softTrim": {
      "maxChars": 4000,
      "headChars": 1500,
      "tailChars": 1500
    },
    "hardClear": {
      "enabled": true,
      "placeholder": "[Old tool result content cleared]"
    }
  },

  "runRetries": {
    "base": 16,
    "perProfile": 4,
    "min": 24,
#
    "max": 60
  }
}

And add this at top level, beside agents / models, not inside agents.defaults:

{
  "tools": {
    "loopDetection": {
      "enabled": true,
      "historySize": 30,
      "warningThreshold": 8,
      "criticalThreshold": 16,
#
      "globalCircuitBreakerThreshold": 24,
      "detectors": {
        "genericRepeat": true,
        "knownPollNoProgress": true,
        "pingPong": true
      }
    }
  }
}

A couple notes:

  • contextTokens: 200000 is intentional. It lets LM Studio keep the full 256K native window while OpenClaw behaves more conservatively.
  • If this stabilizes things, you can later try contextTokens: 262144.
  • If Qwen performs worse with reasoning disabled, turn reasoning back on after confirming compaction/tool loops are stable. I’d isolate one variable at a time.
oblique cedar
#

It doesn't like "runRetries": {
"base": 16,
"perProfile": 4,
"min": 24,
"max": 60
}

sick urchinBOT
#

Good catch — drop runRetries for now. It’s optional and not needed for the main compaction fix.

Use this agents.defaults block without it:

{
  "reasoningDefault": "off",
  "thinkingDefault": "off",

  "compaction": {
    "mode": "safeguard",
    "model": "lmstudio/qwen/qwen3.6-35b-a3b",
    "timeoutSeconds": 1800,
    "reserveTokens": 48000,
    "keepRecentTokens": 40000,
    "reserveTokensFloor": 48000,
#
    "maxHistoryShare": 0.65,
    "midTurnPrecheck": {
      "enabled": true
    },
    "truncateAfterCompaction": true,
    "maxActiveTranscriptBytes": "40mb",
    "memoryFlush": {
      "enabled": true,
      "model": "lmstudio/qwen/qwen3.6-35b-a3b",
      "softThresholdTokens": 16000,
      "prompt": "Save decisions, state changes, lessons, and open tasks to memory/YYYY-MM-DD.md. Skip routine conversation. If nothing worth storing, reply exactly NO_REPLY.",
      "systemPrompt": "Only persist what matters for session continuity. Be concise. If nothing matters, reply exactly NO_REPLY."
    }
  },
#
  "contextPruning": {
    "mode": "cache-ttl",
    "ttl": "10m",
    "keepLastAssistants": 4,
    "softTrimRatio": 0.5,
    "hardClearRatio": 0.75,
    "minPrunableToolChars": 8000,
    "softTrim": {
      "maxChars": 4000,
      "headChars": 1500,
      "tailChars": 1500
    },
    "hardClear": {
      "enabled": true,
      "placeholder": "[Old tool result content cleared]"
#
    }
  }
}

Keep the top-level tools.loopDetection block — that’s the important loop-control piece.

If it also rejects midTurnPrecheck, truncateAfterCompaction, or maxActiveTranscriptBytes, remove those three next and keep the core fixes:

  • model contextTokens: 200000
  • mode: "safeguard"
  • reserveTokens: 48000
  • softThresholdTokens: 16000
  • NO_REPLY instead of NO_FLUSH
  • higher minPrunableToolChars
#

If you paste the exact validation error, I can trim it to what 2026.5.7 accepts.