#openclaw keeps crashing when running doctor. is this oom heap issue?
I don't see the actual crash/log in the thread, so I wouldn't call it heap yet.
Quick triage:
- If it says FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory, yes, that's heap/OOM.
- If it's gateway closed / stuck processing / Codex turn never completes / doctor hangs, that may be one of the post-2026.4.26 Codex/ACP cleanup issues rather than pure heap; there are newer fixes on main around Codex harness lanes getting stuck. (raw.githubusercontent.com)
Please paste the exact output from:
openclaw --version
node --version
NODE_OPTIONS=--max-old-space-size=4096 openclaw doctor
And if the gateway is running as a service, also paste recent logs:
openclaw logs --follow
One gotcha: if your 4GB heap change was only in your shell, the service won't necessarily inherit it. OpenClaw's docs note env precedence starts with the actual process environment, and services don't inherit your interactive shell env by default. (docs.openclaw.ai)
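The inheritance gotcha can be reproduced without OpenClaw at all. A minimal sketch, assuming a POSIX shell; `child_sees` is a hypothetical helper name, and `env -i` only approximates how a service manager starts a process with a clean environment:

```sh
# Print what a freshly started child process sees for NODE_OPTIONS.
# Any arguments are used as a command prefix (for example: env -i).
child_sees() {
  "$@" /bin/sh -c 'printf "%s\n" "${NODE_OPTIONS:-<unset>}"'
}

unset NODE_OPTIONS
NODE_OPTIONS="--max-old-space-size=4096"   # shell variable only, not exported
child_sees                                  # prints <unset>
export NODE_OPTIONS
child_sees                                  # prints --max-old-space-size=4096
child_sees env -i                           # prints <unset>: clean env, like a service
```

The third call is the interesting one: the variable is exported in the shell, yet a process started with a clean environment never sees it.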
Redact tokens/API keys before pasting.
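One way to redact before pasting is to pipe the output through sed. A rough sketch only: `redact` is a hypothetical helper, and the patterns are assumptions about what secrets tend to look like (`sk-` prefixed keys, and assignments whose name contains API_KEY, TOKEN, SECRET, or PASSWORD), so still read the result before posting it:

```sh
# Replace likely secrets on stdin with a placeholder.
redact() {
  sed -E \
    -e 's/sk-[A-Za-z0-9_-]{8,}/sk-[REDACTED]/g' \
    -e 's/([A-Za-z0-9_]*(API_KEY|TOKEN|SECRET|PASSWORD)[A-Za-z0-9_]*[=:][[:space:]]*)[^[:space:]]+/\1[REDACTED]/g'
}

# Usage: openclaw doctor 2>&1 | redact
```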
what should we do first? clean up codex acp issues?
I'd not start by wiping Codex/ACP state yet. First separate "gateway/doctor crash" from "stale ACP tasks".
Do this order:
openclaw --version
openclaw status --all
openclaw gateway status
openclaw logs --follow
If logs mention:
device-pair: notify poll failed
Unexpected end of JSON input
~/.openclaw/devices/pending.json
then that's likely corrupted pairing state, not heap/Codex. Fix with:
mv ~/.openclaw/devices/pending.json ~/.openclaw/devices/pending.json.bad
openclaw gateway restart
openclaw doctor
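Before moving the file aside, it can help to confirm that it really looks truncated. A rough sketch in plain shell; `looks_truncated` is a hypothetical helper, and the check is only a heuristic (empty file, or last non-blank character is not `}` or `]`), not a JSON parser:

```sh
# Exit status 0: file is empty or does not end in } or ] (likely truncated).
# Exit status 1: file ends like complete JSON.
# Exit status 2: file does not exist.
looks_truncated() {
  [ -f "$1" ] || return 2
  [ -s "$1" ] || return 0
  last=$(tr -d ' \t\r\n' < "$1" | tail -c 1)
  case "$last" in
    '}'|']') return 1 ;;
    *)       return 0 ;;
  esac
}

# Usage:
#   looks_truncated ~/.openclaw/devices/pending.json && echo "looks truncated"
```

A file that passes this check can still be invalid JSON, so treat a "not truncated" result as inconclusive rather than as proof the file is fine.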
If gateway is stable but Codex/ACP is stuck, then clean the supported way:
openclaw tasks list --runtime acp
openclaw tasks audit
openclaw tasks maintenance
openclaw tasks maintenance --apply
Only cancel specific stuck running tasks after inspecting them:
openclaw tasks show <lookup>
openclaw tasks cancel <lookup>
So: logs/status first → corrupted state check → then ACP task maintenance. Don't manually delete sessions.json unless we've confirmed that's the broken thing.
because my status message shows acp too idk why
🦞 OpenClaw 2026.4.26 (be8c246)
🧠 Model: openai/gpt-5.5 · 🔑 api-key (openai:default)
🧮 Tokens: 983 in / 126 out
🗄️ Cache: 99% hit · 101k cached, 0 new
📚 Context: 102k/272k (38%) · 🧹 Compactions: 0
🧵 Session: agent:main:telegram:group:-1003705610703:topic:1124 • updated just now
📌 Tasks: recently finished · acp · Context engine turn maintenance · No transcript changes were needed.
⚙️ Execution: direct · Runtime: OpenAI Codex · Think: medium · Text: low
👥 Activation: always · 🪢 Queue: collect (depth 0)
anyway, what should i do now?
That "Tasks: recently finished · acp · Context engine turn maintenance · No transcript changes were needed" line is not by itself a problem. It just means an ACP-backed maintenance task ran and finished cleanly. I wouldn't clean ACP state first.
Do this now:
openclaw logs --follow
Leave that running, then in another terminal run:
openclaw doctor
When it crashes, paste the 20-50 log lines around the crash.
If the crash line says heap/OOM, retry once with:
NODE_OPTIONS="--max-old-space-size=4096" openclaw doctor
If that works, then your heap setting isn't being inherited by the process/service.
If it does not say OOM, don't touch Codex/ACP yet; we should diagnose the actual crash. The ACP line in status looks harmless.
Docs for the flow: https://docs.openclaw.ai/help/faq and https://docs.openclaw.ai/gateway/doctor
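To pull out just the lines around the crash from a saved log, something like this may help. A minimal sketch: `crash_context` is a hypothetical helper, the search patterns are taken from the error strings mentioned in this thread, and `-B`/`-A` are GNU/BSD grep options rather than strict POSIX:

```sh
# $1 = saved log file, $2 = lines of context on each side (default 25)
crash_context() {
  grep -n -E -B "${2:-25}" -A "${2:-25}" \
    'FATAL ERROR|heap out of memory|failed to start|Unexpected end of JSON input' "$1"
}

# Usage: save the log output to a file first, for example
#   openclaw logs --follow > gateway.log    (Ctrl-C after the crash)
#   crash_context gateway.log
```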
hmm ok sure i will try
Sounds good. Tiny shortcut while you test: if logs show JavaScript heap out of memory, it's heap. If logs show JSON parse / pending.json / device-pair, it's likely corrupted pairing state. Anything else, paste the crash block and we'll chase the real cause. 🦞
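The shortcut above can be written down as a tiny classifier. A minimal sketch, assuming the log text is piped in on stdin; `classify_crash` and its labels are hypothetical names, and the patterns are only the strings quoted in this thread:

```sh
# Read log text on stdin and print one label: heap, pairing-state, or unknown.
classify_crash() {
  log=$(cat)
  if printf '%s\n' "$log" | grep -q 'JavaScript heap out of memory'; then
    echo heap
  elif printf '%s\n' "$log" | grep -q -E 'Unexpected end of JSON input|pending\.json|device-pair'; then
    echo pairing-state
  else
    echo unknown
  fi
}

# Usage: classify_crash < gateway.log
```

An "unknown" result just means neither pattern matched, which is the cue to paste the crash block instead of guessing.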
i just keep getting doctor and these running in a loop
2026-Apr-29 22:35:40
└ Doctor complete.
2026-Apr-29 22:35:40
2026-Apr-29 22:35:41
[entrypoint] setting up nginx basic auth (user: T3v5wQY43KFJqBUH)
2026-Apr-29 22:35:41
2026-04-29T22:35:46.295+00:00 [gateway] loading configuration…
2026-Apr-29 22:35:46
2026-04-29T22:35:46.377+00:00 [gateway] resolving authentication…
2026-Apr-29 22:35:46
2026-04-29T22:35:46.380+00:00 [gateway] starting...
2026-Apr-29 22:35:47
2026-04-29T22:35:47.764+00:00 [secrets] [SECRETS_RELOADER_DEGRADED] SecretRefResolutionError: Environment variable "OPENAI_API_KEY" is missing or empty.
2026-Apr-29 22:35:47
2026-04-29T22:35:47.771+00:00 [gateway] wrote stability bundle: /data/.openclaw/logs/stability/openclaw-stability-2026-04-29T22-35-47-767Z-1-gateway.startup_failed.json
2026-Apr-29 22:35:47
2026-04-29T22:35:47.772+00:00 Gateway failed to start: Error: Startup failed: required secrets are unavailable. SecretRefResolutionError: Environment variable "OPENAI_API_KEY" is missing or empty.
2026-Apr-29 22:35:48
[entrypoint] state dir: /data/.openclaw
2026-Apr-29 22:35:48
[entrypoint] workspace dir: /data/workspace
2026-Apr-29 22:35:48
[entrypoint] setting up persistent tool storage in /data...
2026-Apr-29 22:35:48
[entrypoint] running configure...
2026-Apr-29 22:35:48
[configure] state dir: /data/.openclaw
2026-Apr-29 22:35:48
[configure] workspace dir: /data/workspace
2026-Apr-29 22:35:48
[configure] config file: /data/.openclaw/openclaw.json
2026-Apr-29 22:35:48
[configure] merged persisted config from /data/.openclaw/openclaw.json
2026-Apr-29 22:35:48
[configure] ZAI provider enabled (ZAI_API_KEY set)
2026-Apr-29 22:35:48
[configure] primary model (override): openai/gpt-5.5
aw doctor --fix...
2026-Apr-29 22:35
🦞 OPENCLAW 🦞
2026-Apr-29 22:35:50
2026-Apr-29 22:35:50
┌ OpenClaw doctor
Yep, this is not heap and not ACP.
The crash reason is right here:
Startup failed: required secrets are unavailable.
Environment variable "OPENAI_API_KEY" is missing or empty.
Your container is restart-looping: the entrypoint runs configure + doctor --fix, then Gateway starts and fails because the configured primary model is:
primary model (override): openai/gpt-5.5
…but the container does not have OPENAI_API_KEY.
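For a long restart-loop log, the same conclusion can be extracted mechanically. A minimal sketch, assuming the log has been saved to a file; `missing_secrets` and `count_failed_starts` are hypothetical helpers that match only the exact message wording shown in the log above:

```sh
# Print each env var name reported as missing or empty, once.
missing_secrets() {
  sed -n -E 's/.*Environment variable "([A-Za-z_][A-Za-z0-9_]*)" is missing or empty.*/\1/p' "$1" | sort -u
}

# Count how many times the gateway gave up during startup.
count_failed_starts() {
  grep -c 'Gateway failed to start' "$1"
}

# Usage:
#   missing_secrets gateway.log
#   count_failed_starts gateway.log
```

A count above one together with a single repeated variable name is a strong hint that the loop has one cause, not several.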
Do one of these:
Option A: keep OpenAI/Codex
Add OPENAI_API_KEY to the container env / docker compose / hosting panel, then recreate/restart the container.
Example compose shape:
environment:
  OPENAI_API_KEY: ${OPENAI_API_KEY}
Then:
docker compose up -d --force-recreate
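Since `${OPENAI_API_KEY}` is interpolated from the environment, it may be worth failing fast when the variable is empty instead of recreating the container into the same loop. A minimal sketch, assuming the value comes from the exported shell environment rather than from an `.env` file; `require_env` is a hypothetical helper:

```sh
# Succeed only if the named variable is exported and non-empty.
require_env() {
  if env | grep -q "^$1=."; then
    return 0
  fi
  echo "missing, empty, or not exported: $1" >&2
  return 1
}

# Usage: require_env OPENAI_API_KEY && docker compose up -d --force-recreate
```

Compose's own `${OPENAI_API_KEY:?message}` interpolation form should give a similar fail-fast check inside the compose file itself.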
Option B: don't use OpenAI
Remove/change the openai/gpt-5.5 primary model override so it uses a provider you actually configured, like the ZAI key your logs say is present.
Don't keep rerunning doctor yet; fix the missing secret or model override first, then restart. Docs: https://docs.openclaw.ai/help/environment and https://docs.openclaw.ai/concepts/models
but i just did subscription auth and switched to codex back end why need api key?
Because "Codex backend" and "OpenAI provider auth" are different knobs.
Your log says the configured model is:
primary model (override): openai/gpt-5.5
openai/gpt-5.5 is being treated as the direct OpenAI provider route, so OpenClaw tries to resolve OPENAI_API_KEY. ChatGPT/Codex subscription auth is separate from normal API key billing/auth; OpenAI documents ChatGPT/API billing separately, while Codex plan access is via "Sign in with ChatGPT." (help.openai.com)
If you want subscription OAuth, first check what is currently configured:
openclaw config get agents.defaults.model.primary
openclaw config get agents.defaults.agentRuntime
openclaw config get models.providers.openai.apiKey
openclaw models status
Then likely fix is either:
openclaw models set openai-codex/gpt-5.5
or, if you intentionally want native Codex harness, keep openai/gpt-5.5 + agentRuntime.id: "codex" but remove the stale direct OpenAI API-key SecretRef if you don't use it:
openclaw config unset models.providers.openai.apiKey
Also check your Docker/hosting env for a model override forcing openai/gpt-5.5 every boot. The "primary model (override)" line makes me suspect an env/compose setting is reapplying it.
oh so in addition to auth i have to remove api key?
Yes, if it's a stale/missing API-key reference.
You don't have to remove a working API key. But your config currently points at:
models.providers.openai.apiKey -> env OPENAI_API_KEY
…and the container does not have OPENAI_API_KEY, so startup fails before Codex auth can help.
If you're subscription/Codex-only, do:
openclaw config unset models.providers.openai.apiKey
openclaw models set openai-codex/gpt-5.5
openclaw gateway restart
If you want native Codex harness, then model can be openai/gpt-5.5, but you still need to remove the missing API-key ref unless you actually provide OPENAI_API_KEY.
Also check Docker env/compose for anything forcing the model back to openai/gpt-5.5 on boot.
whats wrong? i thought openai/gpt-5.5 works for codex also?
You're not crazy: openai/gpt-5.5 can be right for native Codex harness.
The problem is: your config also has a direct OpenAI API-key secret ref somewhere, and Gateway fails startup because that env var is missing:
models.providers.openai.apiKey -> OPENAI_API_KEY
Codex subscription sign-in doesn't satisfy a broken OPENAI_API_KEY SecretRef. OpenAI's Codex subscription flow is "sign in with ChatGPT"; API availability/billing is separate. (help.openai.com)
So choose one:
Native Codex harness:
openclaw config unset models.providers.openai.apiKey
openclaw config set agents.defaults.model.primary openai/gpt-5.5
openclaw config set agents.defaults.agentRuntime.id codex
openclaw gateway restart
Codex OAuth through normal OpenClaw/PI runner:
openclaw config unset models.providers.openai.apiKey
openclaw models set openai-codex/gpt-5.5
openclaw gateway restart
Your immediate blocker is not the model name; it's the stale missing OPENAI_API_KEY reference killing startup.
Docs: https://docs.openclaw.ai/providers/openai and https://docs.openclaw.ai/plugins/codex-harness
this:
GPT-5.5 is currently subscription/OAuth-only in OpenClaw. Use openai-codex/gpt-5.5 for PI OAuth, or openai/gpt-5.5 with the Codex app-server harness.
is from:
Yep, exactly. That doc line is right.
For native Codex app-server harness, this is valid:
model: openai/gpt-5.5
agentRuntime.id: codex
So the model name is not what's wrong.
What's wrong is this separate stale config:
models.providers.openai.apiKey -> OPENAI_API_KEY
That makes Gateway try to resolve OPENAI_API_KEY during startup, and your container doesn't have it, so it exits before the Codex harness can run.
Your target state should be:
openclaw config unset models.providers.openai.apiKey
openclaw config set agents.defaults.model.primary openai/gpt-5.5
openclaw config set agents.defaults.agentRuntime.id codex
openclaw gateway restart
So: keep openai/gpt-5.5, remove the broken API-key SecretRef. That's the key bit.
how to do?
Do it from the folder with your docker-compose.yml.
Since the container is restart-looping, stop it first, then run config edits through the gateway image:
docker compose stop openclaw-gateway
Then run these:
docker compose run --rm --no-deps --entrypoint node openclaw-gateway \
dist/index.js config unset models.providers.openai.apiKey
docker compose run --rm --no-deps --entrypoint node openclaw-gateway \
dist/index.js config set agents.defaults.model.primary openai/gpt-5.5
docker compose run --rm --no-deps --entrypoint node openclaw-gateway \
dist/index.js config set agents.defaults.agentRuntime.id codex
docker compose run --rm --no-deps --entrypoint node openclaw-gateway \
dist/index.js config validate
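The four invocations above share the same long prefix, so a small wrapper can cut down on typos. A minimal sketch using only the command shape shown above; `oc` is a hypothetical function name, and it assumes the compose service is called openclaw-gateway as in this thread:

```sh
# Run an OpenClaw CLI subcommand in a one-off container from the gateway image.
oc() {
  docker compose run --rm --no-deps --entrypoint node openclaw-gateway \
    dist/index.js "$@"
}

# Usage (same edits as above):
#   oc config unset models.providers.openai.apiKey
#   oc config set agents.defaults.model.primary openai/gpt-5.5
#   oc config set agents.defaults.agentRuntime.id codex
#   oc config validate
```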
Then restart:
docker compose up -d --force-recreate openclaw-gateway
docker compose logs -f openclaw-gateway
Also check your .env / hosting panel: if you have a blank line like this, remove it unless you're actually using an API key:
OPENAI_API_KEY=
Expected result: Gateway should stop failing on OPENAI_API_KEY, while still using openai/gpt-5.5 with agentRuntime.id: codex.
Docs: https://docs.openclaw.ai/install/docker and https://docs.openclaw.ai/plugins/codex-harness
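Blank entries like `OPENAI_API_KEY=` are easy to miss in a long env file. A minimal sketch that lists them; `blank_env_entries` is a hypothetical helper, and it assumes a simple `NAME=value` file format with `#` comments:

```sh
# Print NAME for every `NAME=` or `NAME=""` line with no value.
# Commented-out lines are ignored because they do not start with a name.
blank_env_entries() {
  sed -n -E 's/^[[:space:]]*(export[[:space:]]+)?([A-Za-z_][A-Za-z0-9_]*)=[[:space:]]*("")?[[:space:]]*$/\2/p' "$1"
}

# Usage: blank_env_entries .env
```

Hosting panels keep their variables elsewhere, so this only covers a local env file.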
setting a random str to the env var did the trick
Nice, that confirms it was a missing SecretRef, not heap/OOM.
I'd still replace the random string with the clean fix when you can:
openclaw config unset models.providers.openai.apiKey
openclaw gateway restart
The random env var works because startup only needed OPENAI_API_KEY to be non-empty. But later, anything that actually touches direct OpenAI API-key routes (embeddings, TTS, image gen, accidental openai/* PI route) may fail with auth/401 using that fake key.
After restart, sanity check:
openclaw models status
openclaw doctor
If this fixed it, please right-click the solution message → Apps → ✅ Mark Solution so Answer Overflow indexes it for the next person. 🦞
u sure? can you check github to see if others have had similar problems?