# OpenClaw/Codex hangs or loses app-server connection when using multiple channels

honest lance

I’m seeing a recurring reliability issue when using OpenClaw across multiple Discord/channel contexts or when several messages/files are coming in close together.

Today I received this error:

“⚠️ Codex app-server connection closed before this turn finished. OpenClaw retried once when the stdio turn was still replay-safe; please try again if this keeps happening.”

What was happening at the time:

  • I was working in Discord through OpenClaw.
  • The task involved multiple turns/messages in the same channel.
  • I had uploaded PDF attachments and was asking Codex/OpenClaw to review them and generate a PDF output.
  • While the agent was working, additional messages/status checks were sent.
  • The turn appeared to get interrupted/hung, and the app-server connection closed before completion.
  • This seems more likely when multiple channels/messages/tool calls are active or when I interrupt/check status while a longer task is running.

Impact:

  • Long-running tasks become unreliable.
  • The agent may lose continuity or fail before delivering the output file.
  • I have to resend context or ask for status repeatedly.
  • It makes document-heavy workflows hard to trust.

What I need help with:

  1. Confirm what causes “Codex app-server connection closed before this turn finished.”
  2. Determine whether this is an OpenClaw orchestration issue, Codex stdio/session issue, Discord message handling issue, or a timeout/resource issue.
  3. Add better recovery behavior so long-running tasks can continue or resume safely after interruptions.
  4. Prevent multi-message/channel activity from causing the active task to hang or drop.
  5. Provide recommended workflow/config changes if I should be handling long document-generation tasks differently.

Useful context:

  • Surface: Discord channel conversation through OpenClaw
  • Task type: PDF review + generated PDF output
  • Attachments: multiple PDFs
  • Behavior: turn aborted / connection closed / retry only happened once because stdio replay was considered safe

Please let me know what logs, config files, timestamps, or reproduction steps you need. I can provide the exact time, channel, and task details.

smoky yarrow (bot)
# honest lance I’m seeing a recurring reliability issue when using OpenClaw across multiple Dis...

I checked the current docs/source before answering.

That warning means the Codex app-server client connection closed after OpenClaw had already started a Codex turn, but before OpenClaw saw the terminal turn/completed event. OpenClaw treats that as recoverable only while the turn is still replay-safe, and in that case it retries once. If the retry also fails, or the turn is no longer replay-safe, it stops instead of blindly replaying work that might already have used tools or produced side effects.
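To make that retry rule concrete, here is a minimal POSIX sh sketch of a "retry once, but only while replay-safe" policy. It is not OpenClaw's code: `run_turn_replay_safe` is a hypothetical helper, and it assumes the wrapped command creates a marker file (the hypothetical `$marker`) as soon as it has done something that should not be repeated, such as a tool call or writing output.

```shell
# Hypothetical sketch of a "retry once while replay-safe" policy.
# Not OpenClaw's implementation: the marker file is an assumed convention
# that the wrapped command creates once it has produced side effects.
run_turn_replay_safe() {
  marker="$1"
  shift
  rm -f "$marker"

  # First attempt.
  if "$@"; then
    return 0
  fi

  # The attempt failed. If side effects were recorded, replaying could
  # duplicate them, so stop and report instead of retrying.
  if [ -e "$marker" ]; then
    echo "turn failed after side effects; not replaying" >&2
    return 1
  fi

  # Still replay-safe: retry exactly once; its exit status is the result.
  echo "turn failed while replay-safe; retrying once" >&2
  "$@"
}
```

For example, `run_turn_replay_safe /tmp/turn.marker ./my-turn.sh`, where `./my-turn.sh` is a placeholder for whatever performs the work. The point of the marker check is the same trade-off described above: one cheap retry when nothing irreversible has happened, and a hard stop otherwise.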

So the likely boundary is:

  • Codex app-server / stdio connection closed mid-turn: the direct trigger.
  • OpenClaw orchestration: detects it, aborts the active turn, and does the one replay-safe retry.
  • Discord messages/status checks: probably not the root cause by themselves, but they can add pressure by queueing, steering, or interrupting work while a long Codex turn is active.
  • Timeout/resource/provider pressure: very plausible, especially with multiple PDFs + generated output + long session history.

I would not call this “just Discord” without logs. The useful evidence is the gateway log around the exact timestamp.

Please grab:

```shell
openclaw --version
openclaw status --all
openclaw gateway status
openclaw channels status --probe
openclaw logs --local-time --follow
```
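If it helps to hand these over in one go, a minimal POSIX sh sketch like the one below runs the four one-shot commands above and writes their output to a single timestamped report. The `collect_openclaw_diagnostics` function and the report file name are hypothetical; `openclaw logs --local-time --follow` is left out because it keeps streaming until you stop it.

```shell
# Hypothetical helper: run the one-shot OpenClaw diagnostic commands from
# this thread and append each command plus its output to one report file.
# `openclaw logs --local-time --follow` is omitted because it never exits.
collect_openclaw_diagnostics() {
  out="${1:-openclaw-diagnostics-$(date +%Y%m%d-%H%M%S).txt}"
  : > "$out"  # start with an empty report

  for args in "--version" "status --all" "gateway status" "channels status --probe"; do
    echo "===== openclaw $args =====" >> "$out"
    # $args is intentionally unquoted so it splits into separate arguments.
    # A failing command is recorded in the report instead of aborting it.
    openclaw $args >> "$out" 2>&1 || echo "(exit status $?)" >> "$out"
    echo >> "$out"
  done

  echo "$out"  # print the report path
}
```

Running `collect_openclaw_diagnostics` with no argument prints the name of the report it wrote, which you can then attach to the thread.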

Then reproduce the problem (or find the time window when it happened) and look for lines like:

  • codex app-server client closed before turn completed
  • codex app-server connection closed during startup
  • codex app-server turn idle timed out waiting for turn/completed
  • codex app-server client retired after timed-out turn
  • process exits, OOM, provider/model errors, or Discord attachment/media handling errors nearby
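A minimal POSIX sh sketch for narrowing the log stream to those events. The `filter_codex_events` name is hypothetical, the first four patterns are copied from the list above, and the exact wording of OpenClaw's log lines may differ between versions. The last pattern is an assumed, deliberately broad net for the "process exits, OOM, errors" case.

```shell
# Hypothetical filter: keep log lines mentioning the Codex app-server events
# listed in this thread, plus generic failure keywords (case-insensitive).
# The keyword pattern is intentionally broad and will let some noise through.
filter_codex_events() {
  grep -E -i \
    -e 'codex app-server client closed before turn completed' \
    -e 'codex app-server connection closed during startup' \
    -e 'codex app-server turn idle timed out' \
    -e 'codex app-server client retired after timed-out turn' \
    -e 'exit|oom|out of memory|error'
}
```

With the logs command from above, that would be `openclaw logs --local-time --follow | filter_codex_events`. Because the last pattern is broad, expect some unrelated lines (for example, anything containing "error").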

Docs for the diagnostic flow:

https://docs.openclaw.ai/help/faq
https://docs.openclaw.ai/gateway/troubleshooting
https://docs.openclaw.ai/logging

For the workflow side, I’d handle document-heavy jobs as a fresh session/thread with all PDFs in the initial message, then avoid sending “status?” into the same active session unless needed. Ask it up front to write intermediate artifacts/checkpoints and final output to disk, then send the file at the end. If this is a recurring production workflow, it is worth moving it to a local/source install or dedicated local agent session so logs, files, and recovery state are easier to inspect.
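A minimal sketch of the checkpoint idea, assuming the job can be split per PDF: each finished step writes a marker file, and a rerun after an interruption skips anything already marked done. `process_pdfs_with_checkpoints`, the `.done`/`.review` file naming, and the `review_step` command it calls are hypothetical placeholders, not OpenClaw features.

```shell
# Hypothetical checkpoint/resume loop for a document-heavy job.
# $1: directory of input PDFs; $2: directory for checkpoints and outputs.
# review_step is a placeholder for the real per-file work; it must write
# its result to the path given as its second argument.
process_pdfs_with_checkpoints() {
  in_dir="$1"
  work_dir="$2"
  mkdir -p "$work_dir"

  for pdf in "$in_dir"/*.pdf; do
    [ -e "$pdf" ] || continue  # no PDFs: the glob stayed literal
    name=$(basename "$pdf" .pdf)
    done_marker="$work_dir/$name.done"

    if [ -e "$done_marker" ]; then
      echo "skip $name (checkpoint found)"
      continue
    fi

    # Write to a temp file first so an interrupted step never leaves a
    # half-written result that looks finished.
    review_step "$pdf" "$work_dir/$name.review.tmp" || return 1
    mv "$work_dir/$name.review.tmp" "$work_dir/$name.review"
    : > "$done_marker"
    echo "done $name"
  done
}
```

Rerunning the same command after an interruption resumes at the first PDF without a `.done` marker, so a dropped turn costs one file's worth of work rather than the whole batch.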

If you can share the exact timestamp, openclaw --version, install type, model/provider, PDF count/sizes, and the log slice from ~2 minutes before to ~2 minutes after the warning, that should be enough to tell whether this is a Codex app-server crash/stdio close, an OpenClaw timeout/watchdog path, a provider/resource failure, or channel ingress pressure. If the logs show the same path repeatedly on current OpenClaw, file it as a GitHub issue with that repro: https://github.com/openclaw/openclaw/issues
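To cut that log slice out of a saved log, something like the sketch below works if each line begins with a sortable timestamp such as `2025-01-31 14:02:07`. That timestamp format is an assumption (check what your `openclaw logs --local-time` output actually looks like), and `log_window` is a hypothetical helper: you pass the start and end of the window, for example two minutes either side of the warning, plus an optional log file.

```shell
# Hypothetical helper: print log lines whose leading "YYYY-MM-DD HH:MM:SS"
# timestamp falls inside [start, stop]. Assumes that timestamp format.
# Lines without a timestamp (stack traces, wrapped output) are kept when
# they follow an in-window line, so multi-line errors stay intact.
# Reads the given files, or stdin if none are given.
log_window() {
  start="$1"
  stop="$2"
  shift 2
  awk -v start="$start" -v stop="$stop" '
    /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]/ {
      ts = substr($0, 1, 19)
      in_window = (ts >= start && ts <= stop)  # string comparison
    }
    in_window { print }
  ' "$@"
}
```

For example, `log_window '2025-01-31 14:00:00' '2025-01-31 14:04:00' saved-openclaw.log > slice.txt`, where the file name is a placeholder. Comparing timestamps as strings works only because this format sorts lexicographically in time order.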