# Native subagent hangs gateway / event-loop starvation on 2026.5.7

vapid mist

I need help diagnosing a native subagent hang/regression.

Environment:

  • OpenClaw 2026.5.7 stable
  • Linux arm64, Node v25.9.0
  • Gateway service: openclaw-gateway.service
  • Main model: openai-codex/gpt-5.5
  • Native subagent runtime
  • Stable appears to use @mariozechner/pi-coding-agent 0.73.0

Problem:
Native subagents used to work, but since around May 10–11, a single subagent can hang the whole gateway. This is not just “too many subagents”.

Symptoms:

  • Dashboard/WebChat request timeouts / WebSocket disconnects
  • Discord gateway disconnects/timeouts
  • Gateway becomes very slow/unresponsive
  • Restart often needs SIGKILL
  • Logs show event-loop starvation:
    • eventLoopUtilization=1
    • cpuCoreRatio≈1
    • large eventLoopDelay
    • Discord fetch timers delayed tens of seconds
    • WebSocket calls backed up for 100–200s+

Important test:
A minimal Discord-origin subagent reproduced the hang, so this does not seem Dashboard-specific.

Discord smoke test:

  • runId: 30e325bf-a394-4328-bd3d-9f5aee80ed65
  • childSessionKey: agent:coder:subagent:e6459bbb-fbc6-49f3-9c65-df56cfa1c2e5
  • requesterOrigin.channel: discord
  • context=isolated, lightContext=true, runTimeoutSeconds=60
  • task only: reply DISCORD_SUBAGENT_SMOKE_OK

Logs:

  • prep totalMs=143495
  • session-resource-loader=141347ms
  • startup totalMs=40595
  • attempt-dispatch=39821ms
  • subagent lane timed out after 90000ms
  • Discord fetch timeout elapsed 46636ms; timer delayed 36636ms
  • chat.history WS calls took 145–215s
  • cleanup timed out at pi-trajectory-flush
  • systemd SIGKILLed gateway during restart
  • service reported 5.6G memory peak
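To pull the same stage timings out of your own gateway logs (for example, to compare one run before and one after an upgrade), a small filter like the one below can help. It is only a sketch. It assumes the timings appear as `key=<N>ms` or `totalMs=<N>` tokens, as in the excerpt above, and both the `stage_timings` name and its 5000 ms default cutoff are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: scan gateway log text on stdin for the startup stage timings quoted
# in this thread and flag any above a cutoff. The token shapes
# ("session-resource-loader=141347ms", "prep totalMs=143495") are assumed
# from the excerpt; stage_timings and its default cutoff are hypothetical.
stage_timings() {
  local threshold_ms="${1:-5000}" key value ms
  grep -oE '((prep|startup) totalMs|session-resource-loader|attempt-dispatch)=[0-9]+(ms)?' |
    while IFS='=' read -r key value; do
      ms="${value%ms}"          # strip a trailing "ms" unit if present
      if (( ms > threshold_ms )); then
        printf 'SLOW %s %s\n' "$key" "$ms"
      else
        printf 'ok   %s %s\n' "$key" "$ms"
      fi
    done
}
```

For example: `journalctl -u openclaw-gateway.service --since "1 hour ago" | stage_timings 5000`, assuming the gateway logs to the systemd journal. A plain string filter is crude on purpose. It is meant for quick before/after comparisons, not precise profiling.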

Earlier run:

  • runId bb9da5a5-3c79-4924-870a-38ba1fa9ec89
  • outcome: subagent run lost active execution context
  • session-resource-loader=59705ms
  • same event-loop starvation / Discord timeout pattern

Question:
Is this a known 2026.5.7 issue, a stale-state issue, or a pi-coding-agent regression? Should we upgrade to beta 2026.5.12-beta.6, or clean the subagent/session state?

merry hedge [BOT]

This smells much more like the 2026.5.7 embedded Pi resource-loader stall than a generic stale-state/subagent-concurrency problem.

Why: your smoking gun is session-resource-loader=141347ms before dispatch, plus ELU=1 / one core pinned / huge timer delays. In 2026.5.7, OpenClaw constructs Pi’s DefaultResourceLoader directly with filesystem discovery enabled. In v2026.5.12-beta.6, there’s a very relevant changelog entry:

> Agents: disable Pi's default filesystem resource discovery for embedded runs ... avoiding ... event-loop stalls during first ... agent startup. Fixes #77443.

And the code changed to pass noExtensions, noSkills, noPromptTemplates, noThemes, and noContextFiles into the embedded Pi resource loader. That matches your session-resource-loader timing almost too well.

So my recommendation:

  1. Upgrade to beta first; don’t start by deleting state:
```bash
openclaw update --channel beta --dry-run
openclaw update --channel beta
openclaw --version
```

Docs: https://docs.openclaw.ai/install/updating

  2. After restarting, rerun exactly the same tiny Discord smoke test. Success criteria: session-resource-loader should drop from ~60–140s to something boring, and Discord/WS timers should not drift by tens of seconds.

  3. If it still hangs on beta, treat it as either a remaining Pi/native-subagent regression or corrupted local state. At that point, archive (rather than delete) the specific bad child session/task state, and file a GitHub issue with these run IDs and logs: https://github.com/openclaw/openclaw/issues
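The "archive rather than delete" advice can be followed with a plain `mv` to a timestamped sibling directory, which keeps the suspect state available for a bug report. A minimal sketch, assuming you already know which directory holds the session state (this thread does not say where OpenClaw keeps it, so the path is yours to supply); `archive_state` is a hypothetical helper, not an OpenClaw command.

```bash
#!/usr/bin/env bash
# Sketch: move a session-state directory aside instead of deleting it, so the
# suspect state can still be attached to a bug report later. The path passed
# in is up to you; archive_state is hypothetical, not part of OpenClaw.
archive_state() {
  local src="${1%/}" dest
  if [ -z "$src" ] || [ ! -d "$src" ]; then
    echo "archive_state: not a directory: ${1:-<none>}" >&2
    return 1
  fi
  dest="${src}.archived-$(date +%Y%m%d-%H%M%S)"
  if [ -e "$dest" ]; then
    echo "archive_state: $dest already exists" >&2
    return 1
  fi
  mv -- "$src" "$dest" || return 1
  printf '%s\n' "$dest"   # print where the old state went
}
```

A typical order would be: stop the gateway (`sudo systemctl stop openclaw-gateway.service`, or `systemctl --user stop openclaw-gateway.service` for a user unit), run `archive_state` on the suspect child-session or session directory, start the gateway again, and rerun the smoke test. Archiving the whole session directory is the broader version of the same step, and it is reversible if the hang turns out to be unrelated.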

I would not call this “Dashboard-specific” or “too many subagents”; your Discord-origin isolated/lightContext repro rules that out pretty cleanly. 🦞

vapid mist

Updating to the beta did not fix it, but deleting all session files did.