- Executive summary
• The gateway is not dead. It is listening on 0.0.0.0:18789, HTTP responds, and basic health/RPC probes can succeed.
• The failure is more specific: the WebSocket/session path used for subagent work becomes unhealthy or slow enough to time out.
• This explains why normal chat can still work while specialist (subagent) launches fail.
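If you want to confirm this split from the outside, one option is to time a bare WebSocket upgrade handshake separately from the HTTP check. This is a minimal standard-library Python sketch under stated assumptions: it assumes the gateway accepts upgrades on path "/" at 127.0.0.1:18789 without extra auth headers, and ws_handshake_probe is a hypothetical helper, not an OpenClaw API. A successful 101 response only shows that the upgrade path answers. It cannot reproduce the session-level stalls (such as slow sessions.patch calls), which happen after the connection is open.

```python
import base64
import hashlib
import os
import socket
import time

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def ws_handshake_probe(host: str, port: int, path: str = "/", timeout: float = 5.0) -> dict:
    """Attempt a WebSocket upgrade and report whether it completed, and how fast.

    Hypothetical helper: the "/" path and the absence of auth headers are assumptions.
    The timeout applies to each socket operation, not to the probe as a whole.
    """
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    expected_accept = base64.b64encode(
        hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    ).decode("ascii")
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Key: {key}\r\n"
        "Sec-WebSocket-Version: 13\r\n"
        "\r\n"
    )
    start = time.monotonic()
    data = b""
    try:
        # create_connection applies `timeout` to connect and to later send/recv calls.
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request.encode("ascii"))
            # Read until the blank line that ends the response headers.
            while b"\r\n\r\n" not in data:
                chunk = sock.recv(4096)
                if not chunk:
                    break
                data += chunk
    except OSError as exc:  # refused, reset, or timed out (socket.timeout is an OSError)
        return {"ok": False, "status": None, "seconds": time.monotonic() - start, "error": repr(exc)}
    elapsed = time.monotonic() - start

    header_lines = data.split(b"\r\n\r\n", 1)[0].decode("latin-1").split("\r\n")
    status_line = header_lines[0]
    accept = None
    for line in header_lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "sec-websocket-accept":
            accept = value.strip()
    # The upgrade counts as healthy only with a 101 status and the correct accept key.
    ok = status_line.startswith("HTTP/1.1 101") and accept == expected_accept
    return {"ok": ok, "status": status_line, "seconds": elapsed, "error": None}
```

For example, ws_handshake_probe("127.0.0.1", 18789) returns a dict with ok, status, seconds and error. A slow or missing 101 would point at the upgrade path itself, while a fast 101 alongside failing subagent launches would be consistent with the session-level congestion described in this report.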
- What I verified
• Port 18789 is listening.
• Local HTTP probe to http://127.0.0.1:18789/ returned 200.
• openclaw health --json returned ok.
• Gateway process is alive, but heavy:
  • working set around 475 MB
  • many threads/handles
• Memory is not catastrophic right now, so this does not look like simple RAM starvation.
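The two network checks above (port listening, HTTP 200 on the loopback address) can be reproduced with a short standard-library Python sketch that also records how long the HTTP request takes. The helper names port_is_listening and http_status are hypothetical. A 200 here only shows that the HTTP listener is answering; as the summary notes, it does not exercise the WebSocket/session path.

```python
import socket
import time
import urllib.error
import urllib.request


def port_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, unreachable, or timed out
        return False


def http_status(url: str, timeout: float = 5.0):
    """Return (status_code, seconds) for a GET request, or (None, seconds) on failure."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        status = exc.code  # the server answered, just not with a 2xx code
    except OSError:
        status = None  # URLError, refused connections and timeouts are all OSError subclasses
    return status, time.monotonic() - start


if __name__ == "__main__":
    host, port = "127.0.0.1", 18789
    print("listening:", port_is_listening(host, port))
    print("http:", http_status(f"http://{host}:{port}/"))
```

Running it on the gateway host should print True and a (200, seconds) pair when the listener is in the state described above.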
- Actual issues found
The logs point to a few concrete problems:
• Session/WebSocket operations are stalling
  • I found repeated slow gateway operations, such as sessions.patch taking ~12.4s.
  • I also found lane congestion warnings ("lane wait exceeded").
  • This matches your symptoms: subagent/session actions are the path that times out.
• Subagent announce path is timing out on loopback
  • There are explicit log entries showing:
    • Subagent announce completion direct announce ... gateway timeout after 120000ms
  • So the problem is not limited to startup; it also affects completion handoff.
• Gateway restarts have been unhealthy/inconsistent
  • I found:
    • Gateway restart timed out after 60s waiting for health checks
    • Found stale gateway process(es)
    • stale session lock removal
  • That suggests restarts and state cleanup have not completed cleanly.
• Scheduled Task runtime checks are unreliable
  • openclaw gateway status reported:
    • Runtime: unknown (Error: schtasks unavailable: schtasks timed out after 15000ms)
  • That does not directly cause the subagent issue, but it is another sign that the control layer is unhealthy or slow.
• Bonjour/mDNS is flapping
  • Repeated log entries show advertise/probing/restart loops.
  • I do not think this is the root cause of your subagent timeouts, but it adds noise.
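If you want to track whether these signatures keep appearing (for example, before and after a clean restart), a minimal Python sketch like the following can count them in a gateway log file. The matched substrings are copied from the log lines quoted in this report. The label names, the command-line usage, and the assumption of plain-text logs with one entry per line are hypothetical. Bonjour/mDNS flapping is left out because no exact log text for it is quoted here.

```python
import sys
from collections import Counter
from typing import Iterable

# Substrings taken from the log excerpts quoted in this report (matched
# case-insensitively). The labels on the left are illustrative names only.
SIGNATURES = {
    "lane congestion": "lane wait exceeded",
    "announce timeout": "gateway timeout after",
    "restart health-check timeout": "restart timed out after",
    "stale gateway process": "stale gateway process",
    "stale session lock": "stale session lock",
    "schtasks timeout": "schtasks timed out",
}


def count_signatures(lines: Iterable[str]) -> Counter:
    """Count how many lines contain each known failure signature."""
    counts: Counter = Counter()
    for line in lines:
        lowered = line.lower()
        for label, needle in SIGNATURES.items():
            if needle in lowered:
                counts[label] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python scan_gateway_log.py <path-to-gateway-log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for label, n in count_signatures(fh).most_common():
            print(f"{n:6d}  {label}")
```

Counting lines rather than parsing timestamps keeps the sketch independent of the log format; if the counts stay high after a clean restart, that would point back at the session/lane congestion rather than leftover state.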