# Auto compactions issue
First check version please:
openclaw --version
Likely explanation: the 1M native context is not the same as OpenClaw's runtime context budget. For gpt-5.5 via Codex OAuth, OpenClaw documents a native contextWindow of 1,000,000, but a default runtime contextTokens cap of 272,000, so auto-compaction before 1M is expected unless you raise that cap. (docs.openclaw.ai)
To confirm what's actually active, run in that chat:
/status
/context detail
If itâs Codex OAuth, you can raise the runtime cap in ~/.openclaw/openclaw.json:
{
models: {
providers: {
"openai-codex": {
models: [{ id: "gpt-5.5", contextTokens: 512000 }],
},
},
},
}
Then restart the gateway.
If you're using direct OpenAI Responses (openai/* on api.openai.com), OpenClaw also auto-enables server-side compaction with a default threshold around 70% of the model context window; that can be tuned or disabled with responsesCompactThreshold / responsesServerCompaction. OpenAI's Responses API itself has compaction support for long-running conversations. (docs.openclaw.ai)
That error means the run ended without a usable visible assistant response; the "tool actions may have already been executed" part means don't blindly retry if it was doing file/system actions. Grab:
openclaw status --all
openclaw logs --limit 200 --plain
Docs: https://docs.openclaw.ai/providers/openai and https://docs.openclaw.ai/concepts/compaction
so i did raise the context in openclaw.json and i am using codex gpt-5.5
Gotcha. Then the key thing is: did you raise contextTokens on the exact openai-codex/gpt-5.5 model row?
For Codex GPT-5.5, setting only contextWindow: 1000000 won't necessarily change the runtime compaction budget, because the built-in row already has contextTokens: 272000.
Use this shape:
{
models: {
providers: {
"openai-codex": {
models: [
{
id: "gpt-5.5",
contextWindow: 1000000,
contextTokens: 512000
}
]
}
}
}
}
Then:
openclaw gateway restart
And verify in the same chat:
/status
/context detail
If /status still shows ~272k context, then either:
- the config is under openai instead of openai-codex,
- the gateway didn't reload the config,
- the current session has stale model/context state (try /new),
- or the model override in that chat is not actually openai-codex/gpt-5.5.
Also please paste openclaw --version + the relevant redacted config snippet if it still compacts early.
so this is my settings
- openai-codex/gpt-5.4
- contextWindow: 1050000
- contextTokens: 1050000
- openai-codex/gpt-5.5
- contextWindow: 1000000
- contextTokens: 1000000
Those model entries look right. If it's still compacting early, I'd check these next:
1) Agent-level override beats the model row
Check if you have a lower cap here:
openclaw config get agents.defaults.contextTokens
openclaw config get agents.list
If agents.defaults.contextTokens or an agent's contextTokens is set lower, remove or raise it.
2) Verify the active session is actually using that route
In the chat, run:
/status
/context detail
You want to see openai-codex/gpt-5.5 and ~1,000,000 context.
If it says openai/gpt-5.5 + Runtime: OpenAI Codex, that's the native Codex harness path, not the openai-codex/* PI route.
3) The expected compaction point is not exactly 1M
OpenClaw/Pi keeps reserve headroom (the default floor is ~20k tokens), so with 1M context it can compact around 980k, not exactly 1,000,000.
4) If you changed config recently
Restart and use a fresh session:
openclaw gateway restart
Then in chat:
/new
/status
If /status shows 1M but it compacts well before ~980k, paste openclaw --version, /status, and the log lines around the compaction/error. In that case it's probably overflow recovery or a runtime mismatch rather than config.
openclaw config get agents.defaults.contextTokens
🦞 OpenClaw 2026.4.24 (cbcfdf6) - If you can describe it, I can probably automate it, or at least make it funnier.
Config path not found: agents.defaults.contextTokens
That result is actually good: "Config path not found" means there is no global agents.defaults.contextTokens override, so that's not what's forcing it lower.
Next checks:
openclaw config get agents.list
openclaw models list --provider openai-codex
In agents.list, look for any per-agent contextTokens value. If thereâs none, check whether the model catalog is really seeing your 1M override:
- If models list shows gpt-5.5 as 272k/1000k or similar → your model override isn't being applied.
- If it shows 1000k → config is fine, and the session/runtime is the next suspect.
Then in the same chat/session that compacts:
/status
/context detail
If /status still shows the old context cap, try:
/new
/model openai-codex/gpt-5.5
/status
Also: don't paste the full agents.list if it contains secrets/tokens; redact first. Docs for the relevant model config bit: https://docs.openclaw.ai/providers/openai
openclaw models list --provider openai-codex
🦞 OpenClaw 2026.4.24 (cbcfdf6) - If it works, it's automation; if it breaks, it's a "learning opportunity."
Model Input Ctx Local Auth Tags
openai-codex/gpt-5.5 text 977k no yes default,configured
openai-codex/gpt-5.4 text 1025k no yes configured
That output means your override is being picked up. 977k is just how OpenClaw formats ~1,000,000 tokens, so model config is not the problem now.
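Where the odd-looking 977k comes from: both rows match "tokens divided by 1024, rounded". That is inferred from these two rows only, not from OpenClaw's docs or code, so treat the hypothetical fmt_ctx_k helper below as a way to sanity-check the Ctx column, not as OpenClaw's actual formatter:

```bash
#!/usr/bin/env bash
# Hypothetical helper: reproduce the "Ctx" column of `openclaw models list`,
# ASSUMING it shows tokens / 1024 rounded to the nearest integer plus "k".
# Inferred from the 977k (1,000,000) and 1025k (1,050,000) rows above.
fmt_ctx_k() {
  local tokens=$1
  # Integer rounding: add half the divisor (512) before dividing by 1024.
  echo "$(( (tokens + 512) / 1024 ))k"
}

fmt_ctx_k 1000000   # 977k
fmt_ctx_k 1050000   # 1025k
```

If a row ever shows a value this formula doesn't reproduce, that's a hint the override isn't the one being read.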
So next suspect is the active session/runtime state or the compaction run itself failing.
Please run in the same chat that's failing:
/status
/context detail
Things to check:
- If /status shows openai-codex/gpt-5.5 and ~977k ctx, config is fine.
- If it shows a lower ctx, do:
/new
/model openai-codex/gpt-5.5
/status
Also, compaction near ~957k to 977k is expected because OpenClaw/Pi reserves output headroom. But if it's compacting much earlier or repeatedly failing, grab:
openclaw logs --limit 300 --plain
Look for lines around compaction, context length, request_too_large, or the exact "couldn't generate a response" event.
Tokens: 28k in / 3.3k out
Cache: 70% hit · 66k cached, 0 new
Context: 67k/1.0m (7%) · 🧹 Compactions: 0
Usage: 5h 97% left ⏱2h 6m · Week 68% left ⏱1d 5h
That status says the current session is not near compaction:
Context: 67k/1.0m (7%)
Compactions: 0
So the config is working. The "Agent couldn't generate a response" error is likely not from context pressure in this session.
Next thing I'd check is logs around the failed turn:
openclaw logs --limit 300 --plain
Search for lines mentioning:
couldn't generate
compaction
context length
request_too_large
openai-codex
error
If the error happened in another chat/thread/session, run /status there too, since session state is per conversation. If logs don't show a clear provider error, try /new once and retry the same prompt; that will tell us whether it's stale transcript/session-specific or a broader Codex runtime issue.
its very difficult to search can u tell me a command that does that
Yep, run this right after the failure:
openclaw logs --limit 1000 --plain | grep -nEi -C 3 "couldn.?t generate|agent couldn|compaction|auto-compaction|context length|request_too_large|input exceeds|too long|openai-codex|gpt-5\.5|error|fail|exception"
If that prints too much, narrower version:
openclaw logs --limit 1000 --plain | grep -nEi -C 5 "couldn.?t generate|compaction|context length|request_too_large"
Paste the matching chunk here, but redact tokens/keys/URLs if any show up.
the log is too big how can i share here
Don't share the whole log; just extract a small redacted slice.
Try this:
openclaw logs --limit 2000 --plain \
| grep -nEi -C 8 "couldn.?t generate|agent couldn|compaction|auto-compaction|context length|request_too_large|input exceeds|too long|openai-codex|gpt-5\.5|error|fail|exception" \
| tail -200
If it's still huge, save it to a file:
openclaw logs --limit 2000 --plain \
| grep -nEi -C 8 "couldn.?t generate|agent couldn|compaction|auto-compaction|context length|request_too_large|input exceeds|too long|openai-codex|gpt-5\.5|error|fail|exception" \
| tail -200 > openclaw-error-snippet.txt
Then upload openclaw-error-snippet.txt here.
Before uploading, redact anything that looks like:
- API keys / tokens
- gateway URLs with tokens
- private file paths you don't want public
- personal message content if unrelated
i cannot share files here
No worries, don't upload a file. Just paste a small terminal snippet.
Run this; it should print max ~80 redacted lines:
openclaw logs --limit 2000 --plain \
| grep -nEi -C 4 "couldn.?t generate|agent couldn|compaction|context length|request_too_large|input exceeds|too long|openai-codex|gpt-5\.5|HTTP [45][0-9][0-9]|error|exception" \
| sed -E 's/(sk-[A-Za-z0-9_-]+)/[REDACTED_KEY]/g; s/(token=)[^ ]+/\1[REDACTED]/g; s/(authorization: bearer )[A-Za-z0-9._-]+/\1[REDACTED]/Ig' \
| tail -80
Then paste the output here inside triple backticks:
```text
paste output here
```
If it's still too much, run the same command with tail -40 instead of tail -80.
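If you end up running this filter often, the redaction step can be wrapped in a small reusable function. A minimal sketch, assuming GNU sed (the I flag is a GNU extension); redact_log is a hypothetical name, and it only catches the three shapes used in the command above, so still eyeball the output before pasting:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the sed redaction above; reads stdin, writes stdout.
# Requires GNU sed: the trailing I makes the "authorization: bearer" match
# case-insensitive while keeping the original casing in the output.
redact_log() {
  sed -E \
    -e 's/(sk-[A-Za-z0-9_-]+)/[REDACTED_KEY]/g' \
    -e 's/(token=)[^ ]+/\1[REDACTED]/g' \
    -e 's/(authorization: bearer )[A-Za-z0-9._-]+/\1[REDACTED]/Ig'
}
```

Usage: openclaw logs --limit 2000 --plain | grep -nEi -C 4 "couldn.?t generate|compaction|request_too_large" | redact_log | tail -80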
how about i share it here in parts
Yes, that works, but please share filtered chunks, not the whole raw log.
Best format:
PART 1/3
```log
<paste ~80 lines>
```
Then PART 2/3, etc.
Before pasting, run this so each part is already filtered/redacted:
```bash
openclaw logs --limit 3000 --plain \
| grep -nEi -C 5 "couldn.?t generate|agent couldn|compaction|context length|request_too_large|input exceeds|too long|openai-codex|gpt-5\.5|HTTP [45][0-9][0-9]|error|exception" \
| sed -E 's/(sk-[A-Za-z0-9_-]+)/[REDACTED_KEY]/g; s/(token=)[^ ]+/\1[REDACTED]/g; s/(authorization: bearer )[A-Za-z0-9._-]+/\1[REDACTED]/Ig'
```
If it prints a ton, paste only the parts around the most recent failure timestamp.
Part 1/3
33:2026-04-26T23:17:52.473Z warn agent/embedded {"subsystem":"agent/embedded"} {"event":"embedded_run_agent_end","tags":["error_handling","lifecycle","agent_end","assistant_error"],"runId":"57df549f-96e2-459f-8108-e425961837f7","isError":true,"error":"LLM request timed out.","failoverReason":"timeout","model":"gpt-5.5","provider":"openai-codex","rawErrorPreview":"terminated","rawErrorHash":"sha256:e8c95a2a7271","providerRuntimeFailureKind":"timeout"} embedded run agent end
34:2026-04-26T23:27:39.351Z warn agent/embedded {"subsystem":"agent/embedded"} {"event":"embedded_run_agent_end","tags":["error_handling","lifecycle","agent_end","assistant_error"],"runId":"e233e688-bb4b-40b6-8cf2-34847760e36e","isError":true,"error":"LLM request timed out.","failoverReason":"timeout","model":"gpt-5.5","provider":"openai-codex","rawErrorPreview":"terminated","rawErrorHash":"sha256:e8c95a2a7271","providerRuntimeFailureKind":"timeout"} embedded run agent end
35:2026-04-26T23:27:54.128Z warn agent/embedded {"subsystem":"agent/embedded"} {"event":"embedded_run_agent_end","tags":["error_handling","lifecycle","agent_end","assistant_error"],"runId":"e233e688-bb4b-40b6-8cf2-34847760e36e","isError":true,"error":"LLM request timed out.","failoverReason":"timeout","model":"gpt-5.5","provider":"openai-codex","rawErrorPreview":"terminated","rawErrorHash":"sha256:e8c95a2a7271","providerRuntimeFailureKind":"timeout"} embedded run agent end
36:2026-04-26T23:38:48.381Z warn agent/embedded {"subsystem":"agent/embedded"} {"event":"embedded_run_agent_end","tags":["error_handling","lifecycle","agent_end","assistant_error"],"runId":"7f2f292f-4829-45fe-9ff8-c53b0e6e757e","isError":true,"error":"LLM request rejected: Your input exceeds the context window of this model. Please adjust your input and try again.","failoverReason":null,"model":"gpt-5.5","provider":"openai-codex","rawErrorPreview":"Codex error:
Part 1 already tells us something useful: there are two separate failures:
- LLM request timed out
- LLM request rejected: Your input exceeds the context window of this model
So the compaction is probably OpenClaw reacting to a provider-side context rejection, not your config being ignored.
Please send part 2/3, especially the lines immediately after:
runId":"7f2f292f-4829-45fe-9ff8-c53b0e6e757e"
If easier, run this focused command:
openclaw logs --limit 3000 --plain \
| grep -n -C 20 "7f2f292f-4829-45fe-9ff8-c53b0e6e757e"
That should give the exact failure block without the whole log.
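To see at a glance how many of each failure you have (timeouts vs. context rejections) without scrolling, you can count the distinct "error" values. A rough sketch, assuming the "error":"..." field layout shown in the Part 1 lines; summarize_errors is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: count distinct "error":"..." values in OpenClaw's
# JSON-style log lines, most frequent first. Keys like "isError" or
# "error_handling" are not matched. Messages containing escaped quotes
# (\") would be cut short at the first quote.
summarize_errors() {
  grep -o '"error":"[^"]*"' | sort | uniq -c | sort -rn
}
```

Usage: openclaw logs --limit 3000 --plain | summarize_errors. For just the four Part 1 lines, that would report the timeout three times and the context-window rejection once.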
54-Docs: /tools/browser#playwright-requirement raw_params={"action":"act","targetId":"F1237BDBBC37D0CE82738F1029EF8041","kind":"evaluate","fn":"() => ({title: document.title, hasError: !!document.querySelector('.error'), headings: [...document.querySelectorAll('h2')].map(x=>x.textContent).slice(0,20)})"}
55:2026-04-26T23:40:48.584Z warn agent/embedded {"subsystem":"agent/embedded"} compaction retry aggregate timeout (60000ms): proceeding with pre-compaction state runId=7f2f292f-4829-45fe-9ff8-c53b0e6e757e sessionId=4f128ca8-0ca6-4af2-8ce5-513b923faf2d
56:2026-04-26T23:40:48.591Z warn agent/embedded {"subsystem":"agent/embedded"} using current snapshot: timed out during compaction runId=7f2f292f-4829-45fe-9ff8-c53b0e6e757e sessionId=4f128ca8-0ca6-4af2-8ce5-513b923faf2d
57:2026-04-26T23:41:14.860Z warn agent/embedded {"subsystem":"agent/embedded"} incomplete turn detected: runId=7f2f292f-4829-45fe-9ff8-c53b0e6e757e sessionId=4f128ca8-0ca6-4af2-8ce5-513b923faf2d stopReason=toolUse payloads=0 - surfacing error to user
Yep, this log explains the behavior:
- Codex rejected the request: input exceeds context window
- OpenClaw tried overflow-recovery compaction
- That retry wait hit a hard 60s aggregate timeout
- Then the turn ended incomplete after a tool call, so you got the scary "tool actions may have already been executed" message
So your 1M config is recognized, but the provider is still rejecting some real prompt as too large, and recovery compaction isn't finishing fast enough.
Best workaround: lower the runtime cap so OpenClaw compacts before Codex rejects it:
{
models: {
providers: {
"openai-codex": {
models: [
{ id: "gpt-5.5", contextWindow: 1000000, contextTokens: 750000 },
{ id: "gpt-5.4", contextWindow: 1050000, contextTokens: 800000 }
]
}
}
}
}
Then:
openclaw gateway restart
Optional but helpful: make compaction visible so you know when it happens:
{
agents: {
defaults: {
compaction: {
notifyUser: true,
keepRecentTokens: 50000
}
}
}
}
If this still happens with contextTokens: 750000, I'd file it as a GitHub issue with those log lines, especially the compaction retry aggregate timeout (60000ms) part: https://github.com/openclaw/openclaw/issues
what is this keepRecentTokens: 50000
keepRecentTokens: 50000 means: when OpenClaw compacts, keep roughly the most recent 50k tokens verbatim, and summarize older history.
So after compaction, the next prompt has:
- compacted summary of older stuff
- last ~50k tokens exactly as-is
It's useful because recent tool calls/errors/files stay intact instead of only being summarized.
You don't need to set it. I suggested it as a safer long-session setting. If you're unsure, leave it out and only lower contextTokens first:
{
models: {
providers: {
"openai-codex": {
models: [
{ id: "gpt-5.5", contextWindow: 1000000, contextTokens: 750000 }
]
}
}
}
}
That's the main fix.
this is my compaction settings
{
"mode": "safeguard",
"reserveTokensFloor": 0,
"memoryFlush": {
"enabled": false,
"softThresholdTokens": 4000,
"prompt": "Review the session for any architectural decisions, bug fixes, or new patterns. Write personal notes to memory/YYYY-MM-DD.md. Reply NO_REPLY if nothing to store.",
"systemPrompt": "Session nearing compaction. Store durable memories now."
},
"notifyUser": true
}
Those settings are mostly fine, but reserveTokensFloor: 0 is working against what you want.
It disables OpenClaw's safety floor, so compaction can happen very close to the provider limit. Since Codex is rejecting before OpenClaw thinks it must compact, add an explicit reserve:
{
"mode": "safeguard",
"reserveTokens": 150000,
"reserveTokensFloor": 0,
"memoryFlush": {
"enabled": false
},
"notifyUser": true
}
What that means:
- reserveTokens: 150000 → start compacting with ~150k tokens of headroom.
- reserveTokensFloor: 0 → don't force OpenClaw's default minimum; okay if you set reserveTokens yourself.
- memoryFlush.enabled: false → fine; avoids an extra pre-compaction agent turn.
- notifyUser: true → just shows compaction notices.
I'd still also lower model contextTokens from 1000000 to something like 750000 to 850000. That's the more reliable guard against Codex-side "input exceeds context window" errors.
so what settings do u suggest look at my setting and then tell me updated settings
Yep, based on your logs/settings, I'd use this balanced, safer setup:
{
"models": {
"providers": {
"openai-codex": {
"models": [
{
"id": "gpt-5.5",
"contextWindow": 1000000,
"contextTokens": 800000
},
{
"id": "gpt-5.4",
"contextWindow": 1050000,
"contextTokens": 850000
}
]
}
}
},
"agents": {
"defaults": {
"compaction": {
"mode": "safeguard",
"reserveTokens": 150000,
"reserveTokensFloor": 0,
"keepRecentTokens": 50000,
"timeoutSeconds": 900,
"truncateAfterCompaction": true,
"memoryFlush": {
"enabled": false
},
"notifyUser": true
}
}
}
}
Why this combo:
- contextTokens: 800000 → don't drive Codex all the way to the edge of its advertised 1M.
- reserveTokens: 150000 → compact proactively before provider rejection.
- keepRecentTokens: 50000 → keep recent work/tool calls intact after compaction.
- truncateAfterCompaction: true → keeps the active transcript smaller after compaction.
- memoryFlush.enabled: false → avoids an extra pre-compaction turn/time.
Then run:
openclaw gateway restart
And in chat:
/new
/status
You should see roughly 800k context for gpt-5.5.
i should remove the prompts? and system prompts?
Yes, since you have:
memoryFlush: { enabled: false }
you can remove the prompt, systemPrompt, and softThresholdTokens fields. They won't help while memory flush is disabled, and they just make the config noisier.
Use:
"memoryFlush": {
"enabled": false
}
If later you re-enable memory flush, then add custom prompt / systemPrompt back. For now, simpler is better.
AFAIK, the 1mil context is only available with API and not through codex oauth
if that helps
Yeah, good catch; Json is right here. OpenAI's GPT-5.5 announcement says Codex gets GPT-5.5 with a 400K context window, while the API version is the one with 1M context. (openai.com)
So for openai-codex/gpt-5.5, I'd update to something like:
{
"models": {
"providers": {
"openai-codex": {
"models": [
{
"id": "gpt-5.5",
"contextWindow": 400000,
"contextTokens": 320000
},
{
"id": "gpt-5.4",
"contextWindow": 400000,
"contextTokens": 320000
}
]
}
}
},
"agents": {
"defaults": {
"compaction": {
"mode": "safeguard",
"reserveTokens": 80000,
"reserveTokensFloor": 0,
"keepRecentTokens": 50000,
"timeoutSeconds": 900,
"truncateAfterCompaction": true,
"memoryFlush": {
"enabled": false
},
"notifyUser": true
}
}
}
}
That explains your logs perfectly: OpenClaw was configured for 1M, but Codex OAuth upstream rejected the real request as over its effective context window.
If you want the actual 1M path, that would be API-key openai/gpt-5.5 when available/eligible â not openai-codex/gpt-5.5.
Yes, that helps a lot, and I think Json is right.
The safer assumption is:
- openai-codex/gpt-5.5 via Codex OAuth: treat as ~400k max, not 1M.
- OpenAI's API model docs may list gpt-5.5 with 1M, but ChatGPT/Codex surfaces have lower context limits. OpenAI's Help page lists GPT-5.5 Thinking as 400k for Pro and 256K for paid tiers. (help.openai.com)
So update your settings to:
{
"models": {
"providers": {
"openai-codex": {
"models": [
{
"id": "gpt-5.5",
"contextWindow": 400000,
"contextTokens": 320000
},
{
"id": "gpt-5.4",
"contextWindow": 400000,
"contextTokens": 320000
}
]
}
}
},
"agents": {
"defaults": {
"compaction": {
"mode": "safeguard",
"reserveTokens": 80000,
"reserveTokensFloor": 0,
"keepRecentTokens": 50000,
"timeoutSeconds": 900,
"truncateAfterCompaction": true,
"memoryFlush": {
"enabled": false
},
"notifyUser": true
}
}
}
}
That should stop OpenClaw from driving Codex past the real upstream limit and hitting the "input exceeds context window" rejection.
thanks for verifying krill, you da best
Happy to help! 🦞 Nice catch; that explains the whole "config says 1M but upstream says nope" weirdness.
400k is for both 5.4 and 5.5 in codex?
thanks
Not exactly. Correction:
- GPT-5.5 in Codex: yes, OpenAI says 400K context. (openai.com)
- GPT-5.4 in Codex: OpenAI's GPT-5.4 announcement says Codex had experimental 1M context support, and the API model is listed at 1.05M. (openai.com)
So safest settings:
{ id: "gpt-5.5", contextWindow: 400000, contextTokens: 320000 }
For gpt-5.4, you can either be conservative:
{ id: "gpt-5.4", contextWindow: 400000, contextTokens: 320000 }
or test the long-context path:
{ id: "gpt-5.4", contextWindow: 1050000, contextTokens: 850000 }
Given your actual failure was on gpt-5.5, I'd definitely cap 5.5 to ~320k.
You got it, hope that gets the compaction weirdness under control 🦞
nps, good luck
{
"id": "gpt-5.5",
"contextWindow": 400000,
"contextTokens": 350000
},
{
"id": "gpt-5.4",
"contextWindow": 1050000,
"contextTokens": 950000
}
also for compaction make these changes
"reserveTokens": 50000,
"reserveTokensFloor": 0,
"keepRecentTokens": 50000,
"timeoutSeconds": 900,
"truncateAfterCompaction": true,
what does this setting mean, when will auto compaction happen?
Yes, with those settings, auto-compaction should happen roughly here:
GPT-5.5
contextWindow: 400000,
contextTokens: 350000,
reserveTokens: 50000
Effective budget is ~350k, and OpenClaw keeps 50k headroom.
So proactive compaction happens around:
350k - 50k = ~300k context used
GPT-5.4
contextWindow: 1050000,
contextTokens: 950000,
reserveTokens: 50000
Compaction around:
950k - 50k = ~900k context used
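The same rule of thumb as a tiny calculator, handy if you tweak the numbers later. It only encodes the approximation used here (contextTokens minus reserveTokens, with reserveTokensFloor: 0 so your explicit reserve is what applies); OpenClaw's real trigger logic may differ, and compaction_threshold is a hypothetical name:

```bash
#!/usr/bin/env bash
# Back-of-the-envelope estimate of when proactive compaction kicks in,
# using this thread's rule of thumb: threshold ~= contextTokens - reserveTokens.
# Not OpenClaw's exact algorithm; overflow-recovery compaction can still
# fire earlier if the provider rejects a request.
compaction_threshold() {
  local context_tokens=$1 reserve_tokens=$2
  echo $(( context_tokens - reserve_tokens ))
}

compaction_threshold 350000 50000   # gpt-5.5: 300000
compaction_threshold 950000 50000   # gpt-5.4: 900000
```

Plugging in the earlier 1M setup with the ~20k default floor (compaction_threshold 1000000 20000) gives 980000, matching the ~980k estimate from before.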
What each setting means:
- contextWindow = native/upstream max you believe the model supports.
- contextTokens = OpenClaw's runtime cap; it should stay below the real upstream limit.
- reserveTokens: 50000 = leave ~50k tokens free for output/tool loops/provider overhead.
- reserveTokensFloor: 0 = don't force OpenClaw's default reserve floor; use your explicit reserveTokens.
- keepRecentTokens: 50000 = after compaction, keep the latest ~50k tokens verbatim and summarize older history.
- timeoutSeconds: 900 = allow a single compaction operation up to 15 minutes.
- truncateAfterCompaction: true = after successful compaction, rotate to a smaller active transcript.
Also: if Codex rejects before those thresholds, OpenClaw may still do overflow-recovery compaction earlier. That's expected.
on updating openclaw i get this error:
Updating plugins...
No plugin updates needed.
Completion cache update failed: Error: spawnSync /usr/bin/node ETIMEDOUT
That one is probably non-fatal. It means the update finished, but the post-update shell completion cache generation timed out after ~30s.
It only affects CLI tab-completion, not the gateway/model/compaction stuff.
Run:
openclaw --version
openclaw doctor
If OpenClaw updated correctly, manually regenerate the completion cache:
OPENCLAW_COMPLETION_SKIP_PLUGIN_COMMANDS=1 openclaw completion --write-state
If that works, you're done.
If it hangs/times out too, you can ignore it for now unless shell tab-completion is important. If your terminal startup feels slow, run:
openclaw doctor --fix
my openclaw is very slow
Slow in which part: CLI commands, gateway startup, or agent replies? Quick triage:
openclaw --version
time openclaw status
openclaw gateway status --deep
openclaw doctor
If CLI commands are slow / completion cache timed out, run:
openclaw doctor --fix
OPENCLAW_COMPLETION_SKIP_PLUGIN_COMMANDS=1 openclaw completion --write-state
If agent replies are slow, check in chat:
/status
/context detail
For Codex GPT-5.5, don't run it near the fake 1M. Use your safer cap:
{ "id": "gpt-5.5", "contextWindow": 400000, "contextTokens": 350000 }
Also after config changes:
openclaw gateway restart
Then in chat:
/new
/status
If it's still slow, paste openclaw status --all (redacted). That'll show whether it's model latency, gateway/plugin issues, huge context, or service health.
no i mean its taking too much cpu
Ah, CPU. Got it.
First identify which process is burning CPU:
ps -eo pid,ppid,pcpu,pmem,etime,command \
| grep -E '[o]penclaw|[n]ode' \
| sort -k3 -nr \
| head -20
Then check what OpenClaw is doing:
openclaw status --all
openclaw logs --limit 500 --plain \
| grep -nEi "compaction|memory|index|embedding|plugin|discord|heartbeat|cron|completion|timeout|error|exception" \
| tail -120
Common CPU causes:
- Compaction loop: logs mention compaction repeatedly.
- Huge transcript/session: enable truncateAfterCompaction: true like we discussed.
- Memory/indexing/embeddings: logs mention memory, index, embedding.
- Plugin/dependency loop after update: run:
openclaw doctor
- Shell completion cache: temporary; if it keeps happening:
OPENCLAW_COMPLETION_SKIP_PLUGIN_COMMANDS=1 openclaw completion --write-state
If CPU stays high while idle, paste the top 5 lines from the ps command plus the filtered log chunk.
real 1m22.761s
user 1m24.732s
sys 0m9.817s
That timing means the command burned ~85s of user CPU (~95s including sys) in ~83s of wall-clock time, so it was computing, not just waiting. That's high.
Which command did you time: openclaw status, doctor, or completion --write-state?
Run these two to pinpoint it:
time openclaw --version
time openclaw status
And for the hot process:
ps -eo pid,ppid,pcpu,pmem,etime,command \
| grep -E '[o]penclaw|[n]ode' \
| sort -k3 -nr \
| head -20
If openclaw --version is also slow, that's CLI startup/plugin loading. If only status is slow, it's probably gateway/state/log/session scanning.
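A quick way to read time output: (user + sys) / real is the average number of cores the command kept busy. A minimal sketch; cpu_busy_pct is a hypothetical helper, and inputs are in milliseconds so plain shell integer arithmetic works:

```bash
#!/usr/bin/env bash
# Average CPU busy percentage from `time` output: (user + sys) * 100 / real.
# 100 means one core fully busy for the whole run; above 100 means more than one.
# Inputs in milliseconds; the result is truncated to an integer.
cpu_busy_pct() {
  local real_ms=$1 user_ms=$2 sys_ms=$3
  echo $(( (user_ms + sys_ms) * 100 / real_ms ))
}

# real 1m22.761s, user 1m24.732s, sys 0m9.817s from the timing above
cpu_busy_pct 82761 84732 9817   # 114
```

Anything at or above 100 means the command was CPU-bound for the whole run rather than waiting on the network or disk.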
1197656 1197649 109 0.3 00:24 openclaw
1196753 1029 107 0.6 09:15 openclaw-gateway
1197347 1197339 13.5 0.3 03:39 openclaw-agent
1197649 1196289 1.5 0.0 00:24 openclaw
1197339 494813 0.2 0.0 03:39 openclaw
That shows two hot things:
- openclaw PID 1197656 at ~109% CPU: likely the CLI command you just ran, maybe completion/cache/status.
- openclaw-gateway PID 1196753 at ~107% CPU for 9 minutes: that's the concerning one.
First, get full args so we know what they are doing:
ps -p 1197656,1197649,1196753,1197347,1197339 -o pid,ppid,pcpu,pmem,etime,args
Then check gateway logs for loops:
openclaw logs --limit 1000 --plain \
| grep -nEi "loop|retry|compaction|completion|plugin|memory|index|embedding|heartbeat|cron|error|exception|timeout" \
| tail -150
If you just need CPU relief now, restart the gateway:
openclaw gateway restart
If the CPU comes back immediately after restart, paste the full ps ... args output plus the filtered log tail. Since you're on 2026.4.24, I'd also run:
openclaw doctor
and if doctor mentions plugin/runtime deps or repeated repair, that may be the culprit.
why is openclaw gateway taking so much cpu
1196753 root 88.3 0.7 680 MB openclaw-gateway
88% CPU on openclaw-gateway while idle is not normal. It's usually one of:
- plugin/dependency loop after update
- memory indexing/embedding loop
- compaction/retry loop
- cron/heartbeat job stuck
- channel reconnect loop
First get the exact loop from the gateway log:
tail -n 2000 "$(ls -t /tmp/openclaw/openclaw-*.log | head -1)" \
| grep -nEi "plugin|runtime-deps|npm|ENOTEMPTY|memory|index|embedding|compaction|cron|heartbeat|retry|loop|timeout|error|exception" \
| tail -120
If you just need relief now:
openclaw gateway restart
sleep 20
ps -p 1196753 -o pid,user,pcpu,pmem,rss,etime,args
If CPU jumps back up after restart, paste the filtered log chunk.
Also: you're on 2026.4.24, and I'm suspicious of plugin/runtime-deps issues in that version. If the log mentions plugin-runtime-deps, npm, ENOTEMPTY, or repeated dependency installs, that's likely the cause and should go to a GitHub issue with the log snippet.
how can i fix this
Try this fix path:
openclaw gateway restart
openclaw update --yes
openclaw doctor --fix
openclaw gateway restart
Then wait ~30s and check CPU:
ps -C openclaw-gateway -o pid,user,pcpu,pmem,rss,etime,args
Why: you're on 2026.4.24, and that release had session/transcript/runtime issues that doctor --fix can repair. High gateway CPU can come from replaying/repairing bad session state, plugin deps, or loops.
If CPU is still high after that, run:
openclaw logs --limit 1000 --plain \
| grep -nEi "plugin|runtime-deps|npm|ENOTEMPTY|memory|index|embedding|compaction|cron|heartbeat|retry|timeout|error|exception" \
| tail -120
Paste that filtered output. If it mentions plugin-runtime-deps, npm, or ENOTEMPTY, that's likely the root cause.
OpenClaw gateway listening ports only:
⢠127.0.0.1:18789
⢠127.0.0.1:18791
why is it listening to two ports
18789 is the normal default Gateway port. 18791 usually means a second Gateway-like process/profile is also running, often a stale service/process from an update/restart, or another profile/rescue gateway.
Check which PID owns each port:
sudo ss -ltnp 'sport = :18789 or sport = :18791'
Then compare with:
ps -eo pid,ppid,pcpu,pmem,etime,args | grep -E '[o]penclaw|[n]ode'
openclaw gateway status --deep
openclaw gateway probe
If 18791 is stale/duplicate, the clean fix is:
openclaw gateway restart
Wait 20s, then recheck:
sudo ss -ltnp 'sport = :18789 or sport = :18791'
If both still remain, paste the ss output. Don't kill anything yet until we see which service owns which port.
Start with point 1: the repeated plugin manifest scanning.
What I observed: while CPU was high, the gateway main process was constantly reading /usr/lib/node_modules/openclaw/dist/extensions/*/openclaw.plugin.json. OpenClaw has 100+ bundled extension manifests, and the cache TTL in the installed code is only 1000ms, so any active path that asks "what plugins/channels/commands exist?" can trigger repeated filesystem scans.
Why it matters: this is synchronous filesystem work inside the Node gateway process. If it happens during a stuck Telegram run or command discovery loop, it can burn CPU and make the gateway less responsive even though the actual bot polling is not expensive.
Likely fix direction: increase the cache TTL first as a low-risk workaround, then consider an upstream/code fix that caches plugin discovery for the process lifetime or invalidates it only when config/extensions change.
Suggested first test, not applying yet:
Environment=OPENCLAW_PLUGIN_DISCOVERY_CACHE_MS=60000
Environment=OPENCLAW_PLUGIN_MANIFEST_CACHE_MS=60000
is this a good idea?
Yes, that's a reasonable low-risk test. I checked: those env vars are real:
OPENCLAW_PLUGIN_DISCOVERY_CACHE_MS
OPENCLAW_PLUGIN_MANIFEST_CACHE_MS
Default is 1000ms, so 60000 should reduce repeated manifest scanning. Tradeoff: plugin/config discovery changes may take up to ~60s to reflect unless you restart the gateway.
Since your gateway is running as root/systemd, test it like this:
sudo systemctl edit openclaw-gateway.service
Add:
[Service]
Environment=OPENCLAW_PLUGIN_DISCOVERY_CACHE_MS=60000
Environment=OPENCLAW_PLUGIN_MANIFEST_CACHE_MS=60000
Then:
sudo systemctl daemon-reload
sudo systemctl restart openclaw-gateway.service
Verify:
sudo systemctl show openclaw-gateway.service -p Environment
ps -C openclaw-gateway -o pid,pcpu,pmem,etime,args
If CPU drops, that's strong evidence the manifest scan loop is part of it. Still worth filing upstream, because a hot gateway path repeatedly rebuilding plugin discovery is probably something OpenClaw should cache better internally.
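If you'd rather not use the interactive editor (for example when scripting this on several machines), the same drop-in can be written directly. A minimal sketch; write_cache_dropin is a hypothetical helper, the file name plugin-cache.conf is an arbitrary choice (systemctl edit itself writes override.conf), and the optional argument lets you point it at a scratch directory to inspect the file before touching /etc:

```bash
#!/usr/bin/env bash
# Write the plugin-cache [Service] overrides as a systemd drop-in file.
# Default dir is the standard system drop-in location for
# openclaw-gateway.service; pass another dir to preview the result.
write_cache_dropin() {
  local dir=${1:-/etc/systemd/system/openclaw-gateway.service.d}
  mkdir -p "$dir"
  cat > "$dir/plugin-cache.conf" <<'EOF'
[Service]
Environment=OPENCLAW_PLUGIN_DISCOVERY_CACHE_MS=60000
Environment=OPENCLAW_PLUGIN_MANIFEST_CACHE_MS=60000
EOF
}

# Usage (as root), then reload and restart as above:
#   sudo bash -c "$(declare -f write_cache_dropin); write_cache_dropin"
#   sudo systemctl daemon-reload
#   sudo systemctl restart openclaw-gateway.service
```

To undo the test, delete the file and run daemon-reload plus a restart again.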
sudo ss -ltnp 'sport = :18789 or sport = :18791'
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0 511 127.0.0.1:18789 0.0.0.0:* users:(("openclaw-gatewa",pid=1208315,fd=26))
LISTEN 0 511 127.0.0.1:18791 0.0.0.0:* users:(("openclaw-gatewa",pid=1208315,fd=28))
LISTEN 0 511 [::1]:18789 [::]:* users:(("openclaw-gatewa",pid=1208315,fd=27))
That output is normal: it's one gateway process listening on both ports.
- 18789 = main Gateway WS/HTTP port.
- 18791 = browser control service, derived as gateway.port + 2.
Docs say the browser control service binds to loopback on a port derived from gateway.port; default 18791 = 18789 + 2. So this is not a duplicate gateway and not the CPU cause.
The CPU issue is more likely the plugin manifest scanning you found, a stuck channel/run, or a loop. The cache env test is still reasonable.
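Because the second port is derived, you can check the pairing for any gateway.port instead of memorizing 18791. A small sketch; only the +2 offset comes from the docs quoted above, and GATEWAY_PORT is a hypothetical variable you'd set if you changed gateway.port:

```bash
#!/usr/bin/env bash
# Derive the browser control port (gateway.port + 2) and list listeners on both.
browser_control_port() {
  echo $(( $1 + 2 ))
}

GATEWAY_PORT=${GATEWAY_PORT:-18789}
BROWSER_PORT=$(browser_control_port "$GATEWAY_PORT")
echo "gateway=$GATEWAY_PORT browser=$BROWSER_PORT"

# Expect a single openclaw-gatewa(y) PID owning both ports.
# Run with sudo to see the owning process in the last column.
if command -v ss >/dev/null 2>&1; then
  ss -ltnp "sport = :$GATEWAY_PORT or sport = :$BROWSER_PORT" || true
fi
```

If the two ports show different PIDs, that is the case where a duplicate gateway is worth investigating.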
ok how can i check and fix high cpu
Do it in this order. This gives you diagnosis + safe fixes without guessing.
1) Confirm what the gateway is doing
PID=$(pgrep -x openclaw-gateway | head -1)
ps -p "$PID" -o pid,user,pcpu,pmem,rss,etime,args
openclaw tasks list --status running
openclaw tasks audit
Then logs:
openclaw logs --limit 2000 --plain \
| grep -nEi "plugin|manifest|runtime-deps|npm|ENOTEMPTY|memory|index|embedding|compaction|cron|heartbeat|retry|timeout|error|exception" \
| tail -150
2) If you suspect plugin manifest scanning, confirm it
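If strace is installed, start with a per-syscall summary:
PID=$(pgrep -f 'openclaw-gateway' | head -1)
sudo timeout 15s strace -f -p "$PID" \
-e trace=openat,newfstatat,statx \
-c
-c only totals calls per syscall; it doesn't show which files were touched. To count hits on plugin manifests, trace without -c (strace prints to stderr, hence 2>&1):
sudo timeout 15s strace -f -p "$PID" -e trace=openat,newfstatat,statx 2>&1 | grep -c 'openclaw.plugin.json'
If you see tons of open/stat calls for openclaw.plugin.json, apply the cache workaround.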
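To rank which paths get hit most (not just whether manifests show up), a minimal sketch. It reads full strace text rather than the -c summary, and assumes strace's default format where the path is the first quoted argument on each line; top_strace_paths is a made-up helper.

```shell
#!/bin/sh
# Read strace output on stdin and print the N most frequently touched paths,
# most frequent first. Assumes the path is the first "..." string per line.
top_strace_paths() {
  n="${1:-10}"
  # Keep only the first double-quoted string on each line, then tally.
  sed -n 's/^[^"]*"\([^"]*\)".*/\1/p' \
    | sort | uniq -c | sort -rn | head -n "$n"
}
```

Usage: `sudo timeout 15s strace -f -p "$PID" -e trace=openat,newfstatat,statx 2>&1 | top_strace_paths 20`. If the top entries are all openclaw.plugin.json files, that's the manifest-scan pattern.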
3) Apply plugin cache workaround
sudo systemctl edit openclaw-gateway.service
Add:
[Service]
Environment=OPENCLAW_PLUGIN_DISCOVERY_CACHE_MS=60000
Environment=OPENCLAW_PLUGIN_MANIFEST_CACHE_MS=60000
Then:
sudo systemctl daemon-reload
sudo systemctl restart openclaw-gateway.service
sleep 30
ps -C openclaw-gateway -o pid,user,pcpu,pmem,rss,etime,args
4) Run repairs after update
openclaw doctor
openclaw doctor --fix
sudo systemctl restart openclaw-gateway.service
5) If CPU is still high
Paste:
openclaw --version
ps -C openclaw-gateway -o pid,user,pcpu,pmem,rss,etime,args
openclaw tasks list --status running
openclaw logs --limit 500 --plain | tail -120
Don't kill it yet unless it's unusable; the log should show whether it's plugin scanning, memory indexing, compaction, cron/heartbeat, or a channel reconnect loop.
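One caveat on the ps checks in these steps: procps ps reports %CPU as total CPU time divided by the process's lifetime, so on a gateway that has been up for days the number barely moves after a change. A minimal sketch that measures CPU over a short window instead, assuming Linux /proc and a whole-second interval (cpu_ticks and cpu_pct are made-up helpers):

```shell
#!/bin/sh
# Print utime + stime (in clock ticks) for a PID from /proc/<pid>/stat.
cpu_ticks() {
  # Strip "pid (comm) " first so spaces in comm can't shift the fields.
  # The remainder starts at field 3 (state), so utime (field 14) and
  # stime (field 15) are words 12 and 13.
  rest=$(sed 's/^.*) //' "/proc/$1/stat") || return 1
  set -- $rest
  echo $(( ${12} + ${13} ))
}

# CPU % used by a PID during the next N seconds (100 = one full core;
# can exceed 100 if the process keeps several cores busy).
cpu_pct() {
  pid="$1"; interval="${2:-5}"
  hz=$(getconf CLK_TCK)
  t1=$(cpu_ticks "$pid") || return 1
  sleep "$interval"
  t2=$(cpu_ticks "$pid") || return 1
  echo $(( (t2 - t1) * 100 / (hz * interval) ))
}
```

Usage: `cpu_pct "$(pgrep -f 'openclaw-gateway' | head -1)" 10`. Run it before and after a change to compare like with like.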
PID=$(pgrep -x openclaw-gateway | head -1)
ps -p "$PID" -o pid,user,pcpu,pmem,rss,etime,args
openclaw tasks list --status running
openclaw tasks audit
pgrep: pattern that searches for process name longer than 15 characters will result in zero matches
Try `pgrep -f' option to match against the complete command line.
error: list of process IDs must follow -p
Usage:
ps [options]
Try 'ps --help <simple|list|output|threads|misc|all>'
or 'ps --help <s|l|o|t|m|a>'
for additional help text.
For more details see ps(1).
🦞 OpenClaw 2026.4.25 (aa36ee6) - We ship features faster than Apple ships calculator updates.
Background tasks: 0
Task pressure: 0 queued · 0 running · 0 issues
Status filter: running
No background tasks found.
🦞 OpenClaw 2026.4.25 (aa36ee6) - If it works, it's automation; if it breaks, it's a "learning opportunity."
Tasks audit: 0 findings · 0 errors · 0 warnings
Task findings: 0 · TaskFlow findings: 0
No tasks audit findings.
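Nice, you're on 2026.4.25 now, and tasks are clean. The pgrep issue is just Linux's 15-character process-name limit: the kernel truncates the name to openclaw-gatewa (visible in the ss output earlier), so pgrep -x openclaw-gateway never matches.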
Use -f:
PID=$(pgrep -f 'openclaw-gateway' | head -1)
ps -p "$PID" -o pid,user,pcpu,pmem,rss,etime,args
Or simpler:
ps -eo pid,user,pcpu,pmem,rss,etime,args \
| grep -E '[o]penclaw-gateway|[o]penclaw'
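To see the truncation directly: the kernel keeps the short process name (comm) in 16 bytes including the terminating NUL, so only 15 characters survive, and that's what pgrep without -f compares against. A tiny sketch (comm_name is just an illustrative helper):

```shell
#!/bin/sh
# Mimic the kernel's comm truncation: keep the first 15 characters.
comm_name() {
  printf '%.15s\n' "$1"
}

comm_name openclaw-gateway          # prints: openclaw-gatewa

# On a live system, read the real value instead:
# cat "/proc/$(pgrep -f 'openclaw-gateway' | head -1)/comm"
```

So pgrep -x openclaw-gatewa would also match, but -f is clearer.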
Then check if CPU is still high. If it is, run:
openclaw logs --limit 2000 --plain \
| grep -nEi "plugin|manifest|runtime-deps|npm|ENOTEMPTY|memory|index|embedding|compaction|cron|heartbeat|retry|timeout|error|exception" \
| tail -150
Since there are no running tasks, the likely suspects are now a gateway startup/channel loop, plugin discovery, memory indexing, or channel health/reconnect; the filtered logs should reveal which.
323:2026-04-27T15:49:17.300Z error telegram/api {"subsystem":"telegram/api"} telegram sendChatAction failed: Network request for 'sendChatAction' failed!
329:2026-04-27T15:51:56.415Z error Gateway agent failed; falling back to embedded: Error: gateway timeout after 330000ms
172:2026-04-27T15:28:39.168Z error Invalid config at /root/.openclaw/openclaw.json:\n- plugins.entries.feishu: plugin feishu: plugin requires OpenClaw >=2026.4.25, but this host is 2026.4.24; skipping load
173:- plugins.entries.whatsapp: plugin whatsapp: plugin requires OpenClaw >=2026.4.25, but this host is 2026.4.24; skipping load
174:2026-04-27T15:28:39.192Z error gateway {"subsystem":"gateway"} shutdown error: Error: Invalid config at /root/.openclaw/openclaw.json:
175:- plugins.entries.feishu: plugin feishu: plugin requires OpenClaw >=2026.4.25, but this host is 2026.4.24; skipping load
176:- plugins.entries.whatsapp: plugin whatsapp: plugin requires OpenClaw >=2026.4.25, but this host is 2026.4.24; skipping load
That log has the smoking gun:
plugin requires OpenClaw >=2026.4.25, but this host is 2026.4.24
Your CLI is 2026.4.25, but the gateway service was still running 2026.4.24 at that time. That split-brain can cause bad loops/high CPU.
Run:
which openclaw
openclaw --version
sudo env "PATH=$PATH" openclaw --version
sudo systemctl cat openclaw-gateway.service | grep -E "ExecStart|Environment"
Then repair the service to use the current install:
sudo env "PATH=$PATH" openclaw gateway install --force
sudo systemctl daemon-reload
sudo systemctl restart openclaw-gateway.service
Verify:
sleep 20
openclaw gateway status --deep
ps -C openclaw-gateway -o pid,user,pcpu,pmem,rss,etime,args
openclaw logs --limit 200 --plain | tail -100
The Telegram sendChatAction line is probably just a network/typing-action failure. The version mismatch is the thing I'd fix first.
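To pull every such requirement out of a long log at once, a minimal sketch. It assumes the exact wording seen above ("plugin X: plugin requires OpenClaw >=A, but this host is B") and compares dotted versions numerically field by field; both function names are hypothetical helpers, not OpenClaw commands.

```shell
#!/bin/sh
# List unique "plugin requires OpenClaw >=X, but this host is Y" complaints
# from plain gateway logs on stdin.
plugin_version_mismatches() {
  sed -n 's/.*plugin \([^:]*\): plugin requires OpenClaw >=\([0-9.]*\), but this host is \([0-9.]*\);.*/\1 needs >=\2, host was \3/p' \
    | sort -u
}

# Exit 0 if dotted version $1 >= $2, compared numerically per field,
# so 2026.4.10 counts as newer than 2026.4.9 (a string compare gets that wrong).
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      if ((x[i] + 0) > (y[i] + 0)) exit 0
      if ((x[i] + 0) < (y[i] + 0)) exit 1
    }
    exit 0
  }'
}
```

Usage: `openclaw logs --limit 2000 --plain | plugin_version_mismatches`, then `version_ge "$(openclaw --version | awk '{print $2}')" 2026.4.25 && echo ok`. Note this checks the CLI on your PATH, not necessarily the binary the service runs, which is exactly the split you're chasing.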
which openclaw
openclaw --version
sudo env "PATH=$PATH" openclaw --version
sudo systemctl cat openclaw-gateway.service | grep -E "ExecStart|Environment"
/usr/bin/openclaw
OpenClaw 2026.4.25 (aa36ee6)
OpenClaw 2026.4.25 (aa36ee6)
No files found for openclaw-gateway.service.
That means it's probably a user systemd service, not a system service. Use --user:
systemctl --user cat openclaw-gateway.service | grep -E "ExecStart|Environment"
systemctl --user status openclaw-gateway.service --no-pager
To reinstall/restart the managed user service:
openclaw gateway install --force
systemctl --user daemon-reload
systemctl --user restart openclaw-gateway.service
Then verify:
openclaw gateway status --deep
ps -eo pid,user,pcpu,pmem,rss,etime,args | grep -E '[o]penclaw-gateway'
If systemctl --user cat also says no files found, then the gateway is being run by something else. Find parent + command:
PID=$(pgrep -f 'openclaw-gateway' | head -1)
ps -p "$PID" -o pid,ppid,user,pcpu,pmem,etime,args
ps -p "$(ps -p "$PID" -o ppid=)" -o pid,user,etime,args
That will tell us what supervisor owns it.
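Another way to find the owner: systemd places each process it starts in a cgroup named after its unit, so /proc/<pid>/cgroup shows whether it's a system service, a user service (under user@UID.service), or something launched from a login session (session-N.scope). A minimal sketch, assuming systemd with cgroup v2, plus a fallback for the v1 name=systemd line; classify_cgroup is a made-up helper.

```shell
#!/bin/sh
# Read /proc/<pid>/cgroup on stdin and report what kind of systemd unit
# (if any) owns the process.
classify_cgroup() {
  # v2 lines look like "0::/path"; v1 has "N:name=systemd:/path".
  path=$(sed -n -e 's/^0:://p' -e 's/^[0-9]*:name=systemd://p' | head -n 1)
  unit=${path##*/}    # last path component, e.g. openclaw-gateway.service
  case "$path" in
    */user@*.service/*) echo "user service: $unit (use systemctl --user)" ;;
    /system.slice/*)    echo "system service: $unit (use sudo systemctl)" ;;
    *.scope)            echo "not a service: $unit (started from a login session or terminal)" ;;
    *)                  echo "unknown: $path" ;;
  esac
}
```

Usage: `classify_cgroup < "/proc/$(pgrep -f 'openclaw-gateway' | head -1)/cgroup"`.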
but what difference does it make to cpu usage
It helps CPU only if the hot loop is repeatedly rebuilding plugin discovery.
Right now the default cache TTL is ~1000 ms. If some gateway path asks "what plugins/channels/commands exist?" many times, OpenClaw can re-scan ~100+ plugin manifest JSON files every second. That's a lot of synchronous filesystem/stat/JSON-parse work inside the Node process, which burns CPU.
Changing to:
OPENCLAW_PLUGIN_DISCOVERY_CACHE_MS=60000
OPENCLAW_PLUGIN_MANIFEST_CACHE_MS=60000
means: reuse the plugin discovery/manifest result for 60 seconds instead of 1 second.
So instead of:
scan manifests up to ~60 times/minute
it becomes:
scan manifests about once per minute
Tradeoff: if you install/enable/disable plugins, the gateway might not notice for up to 60 s unless you restart it. For a normal always-on gateway, that's fine.
But if CPU is caused by Telegram reconnects, model calls, memory indexing, or a stuck service/version mismatch, this won't fix it. It's specifically for the repeated manifest-scan CPU pattern.
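The arithmetic behind that, as a quick sketch. It assumes something asks for plugin discovery at least once per TTL window (the worst case described above) and uses ~100 manifests from the estimate; both numbers are illustrative, not measured.

```shell
#!/bin/sh
# Worst-case manifest rescans per minute for a cache TTL in milliseconds:
# one rescan per expired window, rounded up.
scans_per_minute() {
  echo $(( (60000 + $1 - 1) / $1 ))
}

# Manifest files parsed per minute = rescans/minute x manifests per rescan.
manifest_reads_per_minute() {
  echo $(( $(scans_per_minute "$1") * $2 ))
}

for ttl in 1000 60000; do
  echo "TTL ${ttl} ms: $(scans_per_minute "$ttl") scans/min, $(manifest_reads_per_minute "$ttl" 100) manifest reads/min"
done
# TTL 1000 ms: 60 scans/min, 6000 manifest reads/min
# TTL 60000 ms: 1 scans/min, 100 manifest reads/min
```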
ok this did not fix it whats next
Since the plugin cache didn't fix it, next isolate channel/runtime loops. Your logs already showed Telegram network errors, so start there.
1) Confirm hot process + owner
PID=$(pgrep -f 'openclaw-gateway' | head -1)
ps -p "$PID" -o pid,ppid,user,pcpu,pmem,rss,etime,args
ps -p "$(ps -p "$PID" -o ppid=)" -o pid,user,etime,args
2) Check channels
openclaw channels status --probe
Look for anything repeatedly disconnected, polling stall, or failing probes.
3) Temporarily disable Telegram and restart
Since your logs had Telegram failures:
openclaw config set channels.telegram.enabled false --strict-json
openclaw gateway restart
sleep 60
ps -p "$(pgrep -f 'openclaw-gateway' | head -1)" -o pid,user,pcpu,pmem,rss,etime,args
If CPU drops, Telegram/reconnect/polling is the culprit.
4) If CPU is still high, disable WhatsApp too
openclaw config set channels.whatsapp.enabled false --strict-json
openclaw gateway restart
sleep 60
ps -p "$(pgrep -f 'openclaw-gateway' | head -1)" -o pid,user,pcpu,pmem,rss,etime,args
5) If still high, grab the loop signature
openclaw logs --limit 3000 --plain \
| grep -nEi "telegram|whatsapp|polling|sendChatAction|getUpdates|health|restart|retry|memory|index|embedding|compaction|plugin|error|timeout|exception" \
| tail -200
Paste that. At this point we need to identify which subsystem is looping; don't keep tuning context/compaction until the CPU source is known.
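If that output is long, counting repeats makes a loop stand out. A minimal sketch, assuming the plain log format seen earlier (timestamp, level, subsystem, message), with or without grep -n prefixes; log_hotspots is a hypothetical helper, and masking digits is only a heuristic so retries that differ by IDs or durations group together.

```shell
#!/bin/sh
# Count identical log messages (after dropping the timestamp and masking
# digits) and print the N most repeated, most frequent first.
log_hotspots() {
  n="${1:-15}"
  sed -e 's/^[0-9]*://' \
      -e 's/^[^ ]* //' \
      -e 's/[0-9][0-9]*/N/g' \
    | cut -c1-160 | sort | uniq -c | sort -rn | head -n "$n"
}
```

Usage: `openclaw logs --limit 3000 --plain | log_hotspots 20`. The first step strips a grep -n line prefix, the second drops the timestamp, and the third masks numbers.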
All Telegram accounts are "connected", so don't disable all of Telegram yet. The suspicious one is:
Telegram vision ... works, audit failed
Next isolate just that account.
1) See why audit failed
Run:
openclaw channels status --probe --json \
| grep -i -A20 -B5 '"vision"'
If that's messy, use:
openclaw logs --limit 2000 --plain \
| grep -nEi "telegram.*vision|vision.*telegram|audit failed|sendChatAction|getUpdates|polling stall|telegram/api|error|timeout" \
| tail -120
2) Temporarily disable only vision
openclaw config set channels.telegram.accounts.vision.enabled false --strict-json
openclaw gateway restart
sleep 60
ps -p "$(pgrep -f 'openclaw-gateway' | head -1)" -o pid,user,pcpu,pmem,rss,etime,args
If CPU drops, the vision Telegram account/audit/polling loop is the culprit.
3) If CPU does not drop
Re-enable it:
openclaw config set channels.telegram.accounts.vision.enabled true --strict-json
Then disable all Telegram briefly as a stronger isolation:
openclaw config set channels.telegram.enabled false --strict-json
openclaw gateway restart
sleep 60
ps -p "$(pgrep -f 'openclaw-gateway' | head -1)" -o pid,user,pcpu,pmem,rss,etime,args
If turning all of Telegram off fixes CPU, it's one of those pollers. If not, Telegram isn't the cause.
how can i move from polling to interrupts? that should decrease overall headache and improve cpu performance, as i have lots of agents
Yes. For Telegram, "interrupts" means webhook mode. It can reduce the headache because you stop running 6 long-polling loops (getUpdates); Telegram pushes updates to you instead.
Caveat: you need a public HTTPS URL that Telegram can reach.
Recommended setup for multiple Telegram accounts
Use one webhook secret, but give each bot account a unique public URL + local port.
Example:
openclaw config set channels.telegram.webhookSecret "$(openssl rand -hex 32)"
Then configure each account:
openclaw config set channels.telegram.accounts.default.webhookUrl "https://YOURDOMAIN.com/tg/default"
openclaw config set channels.telegram.accounts.default.webhookPath "/tg/default"
openclaw config set channels.telegram.accounts.default.webhookPort 8787 --strict-json
openclaw config set channels.telegram.accounts.hawkeye.webhookUrl "https://YOURDOMAIN.com/tg/hawkeye"
openclaw config set channels.telegram.accounts.hawkeye.webhookPath "/tg/hawkeye"
openclaw config set channels.telegram.accounts.hawkeye.webhookPort 8788 --strict-json
openclaw config set channels.telegram.accounts.po.webhookUrl "https://YOURDOMAIN.com/tg/po"
openclaw config set channels.telegram.accounts.po.webhookPath "/tg/po"
openclaw config set channels.telegram.accounts.po.webhookPort 8789 --strict-json
Repeat for qa, router, vision, etc.
Reverse proxy idea
Proxy each path to its matching local port:
location /tg/default { proxy_pass http://127.0.0.1:8787; }
location /tg/hawkeye { proxy_pass http://127.0.0.1:8788; }
location /tg/po { proxy_pass http://127.0.0.1:8789; }
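Since each account's webhookPort and its proxy_pass port have to line up, it can help to generate both from one list. A minimal dry-run sketch: it only prints commands and nginx lines, it doesn't run anything. The account list is the six accounts named in this thread; YOURDOMAIN.com and the 8787-upward port numbering are the same placeholders used above.

```shell
#!/bin/sh
# Dry run: print per-account webhook config commands and matching nginx
# location lines from a single list, so the ports can't drift apart.
DOMAIN="YOURDOMAIN.com"     # placeholder: your real public hostname
ACCOUNTS="default hawkeye po qa router vision"
BASE_PORT=8787              # first local webhook port, +1 per account

print_webhook_cmds() {
  port=$BASE_PORT
  for acct in $ACCOUNTS; do   # unquoted on purpose: split on spaces
    echo "openclaw config set channels.telegram.accounts.$acct.webhookUrl \"https://$DOMAIN/tg/$acct\""
    echo "openclaw config set channels.telegram.accounts.$acct.webhookPath \"/tg/$acct\""
    echo "openclaw config set channels.telegram.accounts.$acct.webhookPort $port --strict-json"
    port=$((port + 1))
  done
}

print_nginx_locations() {
  port=$BASE_PORT
  for acct in $ACCOUNTS; do
    echo "location /tg/$acct { proxy_pass http://127.0.0.1:$port; }"
    port=$((port + 1))
  done
}

print_webhook_cmds
print_nginx_locations
```

Review the output first; once it looks right, `print_webhook_cmds | sh` applies the config and the location lines go into your nginx server block. The shared webhookSecret is still set once, separately.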
Then restart:
openclaw gateway restart
openclaw channels status --probe
You want Telegram accounts to show mode:webhook instead of mode:polling.