#After 2026.4.15: gateway enters zombie state, recovers after WhatsApp 408 disconnect

raw ermine

Reporting a post-update behavior for tracking purposes.

**Before 2026.4.15:** A WhatsApp 408 disconnect during an active agent run caused a full gateway crash:
Unhandled promise rejection: Error: Agent listener invoked outside active run

**After 2026.4.15:** No crash. Instead, the gateway silently freezes mid-session (no ws ⇄ res activity for ~9 minutes), then paradoxically recovers after a WhatsApp 408 disconnect triggers the health monitor restart.

Log sequence:
23:19:40 [ws] ⇄ res ✓ node.list ← last normal response
← 9 minutes of silence, no ws ⇄ res, webchat and WhatsApp unresponsive
23:28:30 [whatsapp] Web connection closed (status 408)
23:28:30 [health-monitor] restarting (reason: disconnected)
23:28:42 [whatsapp] Listening for personal WhatsApp inbound messages.
23:29:21 [whatsapp] Inbound message ← fully recovered
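
To put a number on the silent window in a log like the one above, here is a minimal Python sketch. It assumes every log line starts with an `HH:MM:SS` timestamp, as in the excerpt; the helper names `parse_time` and `largest_gap` are made up for illustration and are not part of OpenClaw.

```python
from datetime import datetime, timedelta

# Timestamped lines from the report, with the "←" annotations removed.
LOG = """\
23:19:40 [ws] ⇄ res ✓ node.list
23:28:30 [whatsapp] Web connection closed (status 408)
23:28:30 [health-monitor] restarting (reason: disconnected)
23:28:42 [whatsapp] Listening for personal WhatsApp inbound messages.
23:29:21 [whatsapp] Inbound message
"""

def parse_time(line):
    """Return the leading HH:MM:SS timestamp of a line, or None if absent."""
    try:
        return datetime.strptime(line[:8], "%H:%M:%S")
    except ValueError:
        return None

def largest_gap(log_text):
    """Return (gap, line_before, line_after) for the longest pause
    between two consecutive timestamped lines."""
    stamped = [(parse_time(line), line) for line in log_text.splitlines()]
    stamped = [(t, line) for t, line in stamped if t is not None]
    best = (timedelta(0), None, None)
    for (t0, l0), (t1, l1) in zip(stamped, stamped[1:]):
        gap = t1 - t0
        if gap < timedelta(0):  # the log crossed midnight
            gap += timedelta(days=1)
        if gap > best[0]:
            best = (gap, l0, l1)
    return best

gap, before, after = largest_gap(LOG)
print(f"longest silence: {gap}")
print(f"  after:  {before}")
print(f"  before: {after}")
```

On the excerpt this reports the pause between the last node.list response and the 408 close, which is 8 minutes 50 seconds and matches the "~9 minutes" in the report.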

Setup: Windows 10, OpenClaw 2026.4.15, WhatsApp Desktop, model: openai-codex/gpt-5.4 via ChatGPT OAuth, gateway on loopback port 18789.

The 408 + health monitor restart seems to have acted as an unintentional recovery mechanism. Posting in case it helps confirm whether the fix is complete or if the silent freeze is a separate remaining issue.

untold crescent (BOT)

This looks real, and I’d treat it as a separate remaining bug, not a complete fix.

2026.4.15 seems to have fixed the hard crash path (Agent listener invoked outside active run), but your logs suggest the gateway can still wedge mid-run and only recover once WhatsApp emits a 408 and the health monitor restarts it.

One useful clue: by default the health monitor checks every 5 minutes and uses a 30-minute stale-event threshold, so a 9-minute silent freeze would not normally be restarted as stuck. That fits your timeline: recovery only happened after the channel became explicitly disconnected.
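
As a rough illustration of that reasoning, here is a small Python model of the restart decision. It is only a sketch under stated assumptions: the two durations are the defaults quoted above, the constant and function names are hypothetical (not OpenClaw config keys or internals), and treating "stale" as silence at or beyond the threshold is a guess.

```python
from datetime import timedelta

# Defaults as quoted in this thread; the names are hypothetical.
CHECK_INTERVAL = timedelta(minutes=5)
STALE_THRESHOLD = timedelta(minutes=30)

def restart_reason(silence, disconnected):
    """Return why a health check would restart the channel, or None.

    Assumed model: an explicit disconnect always triggers a restart;
    otherwise only silence at or beyond the stale threshold does.
    """
    if disconnected:
        return "disconnected"
    if silence >= STALE_THRESHOLD:
        return "stale"
    return None

def checks_during(silence):
    """Number of full check intervals that fit inside the silent window."""
    return silence // CHECK_INTERVAL

freeze = timedelta(minutes=9)
print(checks_during(freeze))          # full check intervals within the freeze
print(restart_reason(freeze, False))  # frozen but still "connected"
print(restart_reason(freeze, True))   # after the 408 disconnect
```

Under this model a 9-minute freeze can span a health check and still not be restarted, because it is far below the 30-minute threshold. The restart only fires once the channel reports itself as disconnected.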

I’d file this as a GitHub issue with the timeline you posted: https://github.com/openclaw/openclaw/issues

Best extras to include, if you can reproduce it:

  • exact openclaw --version output
  • openclaw status --all
  • openclaw health --verbose
  • the gateway log slice from the last normal [ws] ⇄ res line through the 408 restart
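
For the last item, a minimal Python sketch of cutting that slice out of a longer log. It assumes the log is plain text with one entry per line and that the marker strings match the lines quoted in this thread; `slice_incident` is a hypothetical helper, not an OpenClaw command.

```python
def slice_incident(lines):
    """Return the lines from the last '[ws] ⇄ res' entry before the 408
    close through the health-monitor restart that follows it."""
    close_idx = next(
        (i for i, line in enumerate(lines) if "(status 408)" in line), None
    )
    if close_idx is None:
        return []  # no 408 close in this log
    ws_idxs = [i for i in range(close_idx) if "[ws] ⇄ res" in lines[i]]
    start = ws_idxs[-1] if ws_idxs else 0
    end = next(
        (i for i in range(close_idx, len(lines))
         if "[health-monitor] restarting" in lines[i]),
        close_idx,  # fall back to the close line if no restart is logged
    )
    return lines[start:end + 1]

# Lines taken from the report in this thread.
SAMPLE = [
    "23:19:40 [ws] ⇄ res ✓ node.list",
    "23:28:30 [whatsapp] Web connection closed (status 408)",
    "23:28:30 [health-monitor] restarting (reason: disconnected)",
    "23:28:42 [whatsapp] Listening for personal WhatsApp inbound messages.",
    "23:29:21 [whatsapp] Inbound message",
]

incident = slice_incident(SAMPLE)
print("\n".join(incident))
```

On a real log file, the same function could be fed `open(path, encoding="utf-8").read().splitlines()`, with `path` pointing to wherever your gateway log lives.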


Glad you posted it. This is exactly the kind of “fixed one failure mode, exposed another” report that helps.

raw ermine

Thanks! Makes sense. I'll file the GitHub issue with the logs and link it here.

untold crescent (BOT)

Perfect, thank you.

Once you’ve got the issue up, drop the link here and I can sanity-check that it has the right breadcrumbs. The timeline you already wrote is strong; adding the exact openclaw --version output will make it even better.
