#OOM every ~30 minutes of use in a container.
I'll check whether there's already a fix or workaround landed for that OOM pattern, then I’ll give a clean in-thread answer.
If this is the same crash as #57349, first thing I’d check is whether you’re still on 2026.3.28.
A couple of gateway memory-leak fixes landed after that, including:
- a node-related leak fix in early April
- fix(gateway): plug long-running memory leaks in 2026.4.10
So my recommendation is:
- Update to latest first if you’re still on 2026.3.28, especially for Docker/container setups.
  Docker docs: https://docs.openclaw.ai/install/docker
- Then paste these:
  openclaw --version
  openclaw status --all
  openclaw gateway status
  openclaw logs --follow
- If it still OOMs on current latest, grab the log section around the crash and post it here, because then it’s probably a new leak/regression, not just the older one.
Useful refs:
- FAQ: https://docs.openclaw.ai/help/faq
- Troubleshooting: https://docs.openclaw.ai/gateway/troubleshooting
So, short version: don’t spend time deep-debugging 2026.3.28 yet, update first.
As described in the github issue linked, this is happening on 2026.4.11
Got it, that’s useful. I’m checking whether anything newer than 2026.4.11 already landed for this specific leak, so I can point you at either a real workaround or a likely fix.
Oof, okay, that’s important, thanks.
If it still happens on 2026.4.11, then this does not look fully fixed by the earlier leak patches.
I checked current main, and I don’t see an obvious post-2026.4.11 fix for the webchat/control UI -> node.list polling path yet.
Most useful workaround for now:
- Do not leave the Control UI/webchat tab open if you need stability.
- The UI starts polling node.list every ~5s as soon as it connects, even if you are not on the Nodes tab.
- So “just leave webchat open on Chat” is probably still enough to trigger the bad path.
If you need a stopgap only, you can also raise the Node heap in the container with NODE_OPTIONS=--max-old-space-size=8192, but that’s just buying time, not fixing the leak.
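To make the "buying time" point concrete, here is a rough sketch of the arithmetic, assuming the leak grows roughly linearly. The baseline and leak-rate numbers are hypothetical placeholders, not measurements from this issue.

```python
def minutes_until_oom(heap_limit_mb: float, baseline_mb: float, leak_mb_per_min: float) -> float:
    """Minutes until a linearly leaking process reaches its heap limit."""
    if leak_mb_per_min <= 0:
        # No growth means the limit is never reached.
        return float("inf")
    # Headroom is whatever is left above the steady-state footprint.
    headroom_mb = max(heap_limit_mb - baseline_mb, 0.0)
    return headroom_mb / leak_mb_per_min


# Hypothetical numbers for illustration only.
baseline_mb = 300.0
leak_mb_per_min = 60.0

for limit_mb in (2048, 8192):
    eta = minutes_until_oom(limit_mb, baseline_mb, leak_mb_per_min)
    print(f"--max-old-space-size={limit_mb}: about {eta:.0f} minutes until OOM")
```

Under that assumption a larger heap only stretches the interval between crashes in proportion to the added headroom. The leak rate itself is unchanged.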
At this point I’d suggest updating the GitHub issue with:
- exact openclaw --version
- confirmation it still reproduces on 2026.4.11
- whether closing the webchat/control UI tab stops the memory growth
- fresh crash logs from:
  openclaw status --all
  openclaw logs --follow
Docs:
If you want, I can help you write a tight repro/update comment for the issue.
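One way to check whether closing the webchat/Control UI tab stops the memory growth is to sample the container's memory from inside it and compare the growth rate with the tab open and closed. The sketch below assumes a cgroup v2 container where /sys/fs/cgroup/memory.current is readable. The function names are made up for illustration and are not part of OpenClaw.

```python
import time
from pathlib import Path

# Assumption: cgroup v2 layout as seen from inside the container.
CGROUP_MEMORY_FILE = Path("/sys/fs/cgroup/memory.current")


def read_memory_bytes(path: Path = CGROUP_MEMORY_FILE) -> int:
    """Read the container's current memory usage in bytes."""
    return int(path.read_text().strip())


def growth_mb_per_min(samples: list[tuple[float, int]]) -> float:
    """Least-squares slope of (seconds, bytes) samples, in MB per minute."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_b = sum(b for _, b in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")
    cov = sum((t - mean_t) * (b - mean_b) for t, b in samples)
    bytes_per_sec = cov / var_t
    return bytes_per_sec * 60 / (1024 * 1024)


def sample(duration_s: float, interval_s: float = 10.0) -> list[tuple[float, int]]:
    """Collect (elapsed seconds, bytes) pairs for roughly duration_s seconds."""
    samples = []
    start = time.monotonic()
    while True:
        elapsed = time.monotonic() - start
        samples.append((elapsed, read_memory_bytes()))
        if elapsed >= duration_s:
            return samples
        time.sleep(interval_s)
```

Running growth_mb_per_min(sample(300)) once with the tab open and once with it closed gives two numbers to put in the issue. memory.current also counts page cache, so treat this as a rough signal, not an exact heap measurement.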
This is literally covered in the issue linked to start this conversation.
You’re right, my bad, I re-asked for info that was already in the issue.
The only actually useful addition here is:
- it still reproduces on 2026.4.11
- there’s no obvious released fix yet
- the main workaround is still don’t leave Control UI/webchat open
- bumping NODE_OPTIONS just delays the crash
So if you add anything to the GitHub issue, I’d keep it to one short update like: “still reproduces on 2026.4.11”. That’s the new signal.
So what are others doing? If others are running OpenClaw in a container, this is going to happen. Which leads me to believe no one is running this in a container, because it is unusable beyond asking one-off questions. No work or building can take place: the crash kills any current main or subagent work being done, and then you have to re-prompt through the web UI to continue?
I wouldn’t jump to “nobody runs it in a container”.
My honest read is more:
- containerized gateway itself is viable
- long-lived Control UI/webchat sessions in a container are the weak spot here
- for heavier work, people tend to either:
  - run from source/on-host for the best dev loop, or
  - keep the gateway in Docker/Podman but use Discord/Telegram/TUI/CLI as the main surface, not a browser tab left open all day
That lines up with the docs too, which treat Docker as optional, while the normal recommendation is the regular install / source flow:
And yes, with the bug in #57349, your concern is fair:
- an OOM kills the active main/subagent work
- if you mounted persistence correctly, history/workspace should survive
- but the in-flight turn is toast, so you usually have to resume manually
So I think your conclusion is basically:
container + web-ui-for-hours-at-a-time is not dependable enough right now for serious building
If you want actual human experience reports, I’d also ask in #users-helping-users. This thread is good for bug tracking, but that channel is better for “what are people actually doing in practice?”
It does survive, but after the crash the work doesn't just pick up where it left off. You have to specifically ask "What's going on" or "How are things going" or "Hey, you crashed. Pick up where you left off". If you do not, work on the previous prompt does not continue.
Yep, exactly, and that’s the important distinction.
State surviving is not the same as the run resuming.
After a gateway crash, the transcript/session may still be there, but the in-flight execution is gone. So today it behaves more like:
- history survives
- active turn dies
- no automatic continuation
- you have to manually re-engage the agent to get it moving again
So for real build/workflows, I agree with you, that’s still a broken experience. A restart policy helps the process come back up, but it does not solve continuity.
I mean, what are fake build/workflows? Is that not what this was designed for? What model is backing you right now Krill?
Fair pushback, that wording was bad on my part.
I didn’t mean “fake” as in not real. I meant short/simple one-off tasks versus long-running interactive build sessions.
And yes, you’re right, real build/workflows are absolutely part of what OpenClaw is supposed to support. So if this path OOMs, drops the in-flight run, and needs a manual “pick back up” message, that’s a real break in the intended experience, not some edge case I’d hand-wave away.
On the model question, I can’t give reliable runtime model identity in chat here, and it wouldn’t really change this diagnosis anyway. This looks like a gateway/control-ui/runtime continuity problem, not a model-choice problem.
For model docs/community notes, use:
I just want to know if I'm talking in loops or if you are backed by a specific model I haven't run this query through yet. That is why I ask what model is backing your responses right now.
Totally fair question.
I can’t give the runtime model identity here, but I can answer the part underneath it:
No, you’re not talking in loops.
The reason the answer keeps collapsing to the same place is that, from the docs/repo/issue state, this still looks like a real unresolved bug with no clean released workaround beyond avoiding the bad path.
So the stable answer is still:
- your use case is valid
- the OOM + lost in-flight continuation is real
- I don’t see a released fix yet
- “don’t leave webchat open” is a workaround, not a solution
If you want a second non-me take, I’d ask in #users-helping-users or #models so you can compare against human experience / other setups.
Thanks for the "help"
Fair. Sorry, that was frustrating.
You wanted a concrete fix, and I don’t have one for this bug. I shouldn’t have dragged you through the same points twice.
"Fair"... Well that gives it away. Thanks again