#upgrade to 2026.4.25 is hanging


royal rain
#

Upgraded from v2026.4.24 → v2026.4.25. Gateway hangs silently after channel init starts.
On first boot after upgrade, 11 deps were installed fresh including @whiskeysockets/baileys@7.0.0-rc.9. Gateway reaches ready state fine (10 plugins loaded). Then:
[gateway] starting channels and sidecars...
[gateway/hooks] loaded 4 internal hook handlers
After that — complete silence. No [whatsapp], no [telegram], no error, no crash. Log is frozen for 10+ minutes. Plugins active: whatsapp, telegram, acpx, browser, memory-core, memory-wiki, webhooks.
Questions:

Is there a known hang in channel init on v2026.4.25?
Is baileys@7.0.0-rc.9 a breaking change for the whatsapp plugin?
Is there a more verbose log flag or a separate channel-level log to diagnose further?

compact juncoBOT
#

Short version: I wouldn’t blame Baileys first. @openclaw/whatsapp 2026.4.25 intentionally pins @whiskeysockets/baileys@7.0.0-rc.9, so it’s not an accidental transitive upgrade. (raw.githubusercontent.com)

That frozen point is suspicious though: after internal hooks, startup enters sidecars.channels, does primary-model prewarm, then starts channels. So if you never see any [whatsapp] / [telegram], it may be stuck before the actual channel listeners. (raw.githubusercontent.com)

I’d do this:

openclaw --version
node -v
openclaw doctor --fix
openclaw status --all

If 2026.4.26 is available for you, update first — 2026.4.25 had major plugin startup/install registry changes, and 2026.4.26 includes nearby gateway/plugin install startup fixes. (raw.githubusercontent.com)

For verbose startup:

OPENCLAW_SKIP_CHANNELS=1 OPENCLAW_GATEWAY_STARTUP_TRACE=1 OPENCLAW_LOG_LEVEL=debug openclaw gateway

If that boots, disable WhatsApp/Telegram one at a time:

openclaw config set channels.whatsapp.enabled false --strict-json
openclaw config set channels.telegram.enabled false --strict-json
openclaw gateway restart

If it still hangs after doctor --fix + latest patch, this is GitHub-issue-worthy with openclaw status --all, Node version, and the startup-trace log.

royal rain
#

Found that /home/node/.openclaw/registry does not exist in the container. Given the 4.25 release note about plugin startup moving to a cold persisted registry — is this directory expected to be pre-created before first boot on a fresh upgrade? Could its absence cause a silent hang/clean exit after loaded 4 internal hook handlers?

compact juncoBOT
#

No — /home/node/.openclaw/registry not existing is not the smoking gun.

In 4.25 the persisted plugin index lives under:

/home/node/.openclaw/plugins/installs.json

not .openclaw/registry. It does not need to be pre-created; if missing/invalid, OpenClaw should derive the plugin index and you can refresh it with:

openclaw plugins registry --refresh
# or
openclaw doctor --fix

So absence of .openclaw/registry should not cause a silent hang after:

[gateway/hooks] loaded 4 internal hook handlers

That point is after plugin bootstrap and internal hooks; next likely phase is channel startup / model prewarm. I’d verify exact version first:

openclaw --version
node -v

Then run:

OPENCLAW_LOG_LEVEL=debug OPENCLAW_GATEWAY_STARTUP_TRACE=1 openclaw gateway

If you want to specifically inspect/repair plugin registry state:

openclaw plugins registry
openclaw plugins registry --refresh
ls -la /home/node/.openclaw/plugins

If it still freezes, isolate channels:

OPENCLAW_SKIP_CHANNELS=1 OPENCLAW_LOG_LEVEL=debug OPENCLAW_GATEWAY_STARTUP_TRACE=1 openclaw gateway

If OPENCLAW_SKIP_CHANNELS=1 boots cleanly, then it’s not the registry — it’s channel startup/model prewarm. Next disable WhatsApp/Telegram one at a time.

royal rain
#

After waiting out the health-monitor grace period (~35 min), we can see channels are starting but immediately dying with zero logs:
[health-monitor] [telegram:ditto-ops] health-monitor: restarting (reason: stopped)
[health-monitor] [telegram:kingler-tg] health-monitor: restarting (reason: stopped)
[health-monitor] [whatsapp:default] health-monitor: restarting (reason: stopped)
Health monitor restart attempts also produce zero channel logs. All 3 channels, same silent failure. channels logs --channel all hangs (channel subsystem is dead).
This regression is consistent across both v2026.4.24 and v2026.4.25. v2026.4.20 works fine.
Questions:

Is there a known silent crash path in channel init introduced in 4.24/4.25 that fires before channel-level logging initializes?
Is there a separate per-channel log file on disk we can read directly rather than via CLI?
Does OPENCLAW_SKIP_CHANNELS=1 with OPENCLAW_GATEWAY_STARTUP_TRACE=1 produce any channel-level trace before skipping — or does it skip entirely?

compact juncoBOT
#

That’s useful data — and yeah, “all 3 channels immediately become stopped with no per-channel logs” smells like a shared startup/runtime path, not Baileys specifically.

Answers:

  1. Known silent path?
    There is a plausible silent path: if a channel plugin’s startAccount() returns cleanly instead of throwing or staying alive, core marks it running: false in finally, but doesn’t log channel exited because that log only happens on rejection. Health monitor then reports reason: stopped. So this can happen before useful channel logs exist.

  2. Separate per-channel log file?
    No, not separate files. Channel logs are filtered from the main Gateway JSONL log. Read directly:

    ls -lh /tmp/openclaw/openclaw-*.log
    tail -n 1000 "$(ls -t /tmp/openclaw/openclaw-*.log | head -1)" \
      | grep -E 'gateway/channels|telegram|whatsapp|health-monitor|channel startup|channel exited'
    

    If logging.file is customized, use:

    openclaw config get logging.file

  3. OPENCLAW_SKIP_CHANNELS=1 + startup trace?
    It skips channel startup entirely. You should get a sidecars.channels startup-trace phase plus the “skipping channel start” log, but no per-channel trace because no channel plugin starts.
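
The silent path in answer 1 is easier to see in miniature. The sketch below is a shell analogy only, not OpenClaw's code: `supervise` is a hypothetical wrapper that treats a clean exit from a long-running worker as something worth logging, which is the behavior the thread suggests is missing when startAccount() returns without throwing.

```shell
# Hypothetical supervisor for a long-running worker (an analogy only,
# not OpenClaw's implementation). It reports both a failure and an
# unexpected clean exit, so a status-0 return is never silent.
supervise() {
  name=$1
  shift
  status=0
  "$@" || status=$?
  if [ "$status" -eq 0 ]; then
    echo "[$name] exited cleanly (status 0) but was expected to keep running"
  else
    echo "[$name] exited with status $status"
  fi
  return "$status"
}
```

For example, `supervise demo true` prints the "exited cleanly" line, while a supervisor that only logs on a non-zero status would print nothing for the same worker.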

Given it reproduces on 4.24 + 4.25 and 4.20 works, I’d try 4.26 before deeper digging — 4.26 has several startup/plugin-runtime fixes near this area.

Best next repro command:

OPENCLAW_LOG_LEVEL=trace OPENCLAW_GATEWAY_STARTUP_TRACE=1 openclaw gateway

And if still broken, please file a GitHub issue with openclaw --version, node -v, openclaw status --all, and the direct /tmp/openclaw/openclaw-*.log tail. This one sounds like it needs maintainer eyes.
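
The direct log tail can be wrapped in a small helper so the same filter is easy to rerun after each boot. This is a minimal sketch: `channel_lines` is a made-up name, and the default directory and grep pattern are simply the ones quoted in this thread, so adjust both if logging.file is customized.

```shell
# Print channel-related lines from the newest Gateway log in a directory.
# Usage: channel_lines [LOG_DIR] [TAIL_COUNT]
# Defaults follow the commands quoted in the thread.
channel_lines() {
  log_dir=${1:-/tmp/openclaw}
  count=${2:-1000}
  # Newest file first; empty when nothing matches the glob.
  newest=$(ls -t "$log_dir"/openclaw-*.log 2>/dev/null | head -n 1)
  if [ -z "$newest" ]; then
    echo "no openclaw-*.log files found in $log_dir" >&2
    return 1
  fi
  tail -n "$count" "$newest" \
    | grep -E 'gateway/channels|telegram|whatsapp|health-monitor|channel startup|channel exited'
}
```

Note that the function also returns non-zero when the newest log exists but contains no matching lines, which is itself a useful signal for a silent channel failure.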

royal rain
#

Update: Isolated to channel startup via OPENCLAW_SKIP_CHANNELS=1
Ran OPENCLAW_SKIP_CHANNELS=1 OPENCLAW_GATEWAY_STARTUP_TRACE=1 — gateway boots cleanly to ready state. All plugins load, hooks register, sidecars start fine. Log confirms:
[gateway/channels] skipping channel start (OPENCLAW_SKIP_CHANNELS=1 or OPENCLAW_SKIP_PROVIDERS=1)
So the failure is definitively in channel startup, not model prewarm or plugin bootstrap. All three channels (whatsapp:default, telegram:ditto-ops, telegram:kingler-tg) log starting provider then die silently — no exception, no error log, health monitor reports reason: stopped.
Currently doing a source diff between v2026.4.20 and v2026.4.24 on the channel startup path to find what changed.

compact juncoBOT
#

Nice isolate. Small clarification: OPENCLAW_SKIP_CHANNELS=1 skips the whole sidecars.channels block, which includes model prewarm and startChannels(). But since you see each provider start and then become stopped when not skipped, I agree this is channel-startup proper.

The diff I’d focus on: between 2026.4.20 and 2026.4.24, the shared server-channels.ts path barely changed. The big shared change is in bundled channel loading/runtime deps: bundled channels now prepare/mirror staged runtime deps before loading channel entries. That would fit “Telegram + WhatsApp all die the same way” better than a Baileys-only bug.

Try isolating the runtime-deps staging path with a fresh explicit stage dir:

mkdir -p /tmp/openclaw-plugin-runtime-deps-test

OPENCLAW_PLUGIN_STAGE_DIR=/tmp/openclaw-plugin-runtime-deps-test \
OPENCLAW_LOG_LEVEL=trace \
OPENCLAW_GATEWAY_STARTUP_TRACE=1 \
openclaw gateway

Also check what got staged:

find /home/node/.openclaw/plugin-runtime-deps -maxdepth 4 -type f -name package.json -print 2>/dev/null
find /home/node/.openclaw/plugin-runtime-deps -maxdepth 4 -type d -name node_modules -print 2>/dev/null

If a fresh OPENCLAW_PLUGIN_STAGE_DIR fixes it, that strongly points at stale/corrupt staged bundled runtime deps or module-resolution collision. If it still fails, next useful thing is a GitHub issue with:

openclaw --version
node -v
openclaw status --all
OPENCLAW_LOG_LEVEL=trace OPENCLAW_GATEWAY_STARTUP_TRACE=1 openclaw gateway

and mention: OPENCLAW_SKIP_CHANNELS=1 cleanly boots, but all bundled channel accounts return to stopped with no rejection/error.
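
To make the issue report easier to assemble, the short-running commands above can be captured into one file. This is a sketch under assumptions: `collect_diagnostics` and `run_section` are hypothetical helper names, and the long-running trace boot is deliberately left out because it does not exit on its own.

```shell
# Run one command and append its output under a header line.
# A missing binary is noted instead of aborting the whole report.
run_section() {
  echo "### $*" >> "$report"
  if command -v "$1" >/dev/null 2>&1; then
    "$@" >> "$report" 2>&1 || echo "(exit status $?)" >> "$report"
  else
    echo "($1 not found on PATH)" >> "$report"
  fi
  echo >> "$report"
}

# Usage: collect_diagnostics OUTPUT_FILE
collect_diagnostics() {
  report=$1
  : > "$report"
  run_section openclaw --version
  run_section node -v
  run_section openclaw status --all
}
```

Running `collect_diagnostics /tmp/openclaw-report.txt` inside the container leaves a file that can be pasted into the issue next to the startup-trace log.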

royal rain
#

Fix for silent channel crash on v2026.4.24/4.25 — seeking confirmation before making permanent
Root cause identified: corrupt plugin-runtime-deps dir. Channels all die silently after starting provider because startAccount() resolves immediately (no throw) when deps are in a bad state.
Fix that worked: pointing OPENCLAW_PLUGIN_STAGE_DIR to a fresh directory.
Planning to make this permanent by adding to Dockerfile:
ENV OPENCLAW_PLUGIN_STAGE_DIR=/home/node/.openclaw/plugin-stage
This path is inside the host-mounted volume so it persists across container restarts.
Questions:

Is OPENCLAW_PLUGIN_STAGE_DIR the correct env var to override the stage dir permanently?
Is /home/node/.openclaw/ the right place for this, or should it live elsewhere?
Is there an openclaw doctor --fix equivalent that repairs a corrupt stage dir instead of needing a fresh path?

compact juncoBOT
#

Yep — that diagnosis makes sense.

  1. Yes, OPENCLAW_PLUGIN_STAGE_DIR is the right env var.
    It’s specifically for bundled plugin runtime deps. Docs: https://docs.openclaw.ai/install/updating

  2. /home/node/.openclaw/plugin-stage is fine if:

    • it’s writable by the node user
    • it’s on the mounted persistent volume
    • the Gateway and any doctor --fix runs use the same env var

    I’d maybe name it closer to the docs/default:

    OPENCLAW_PLUGIN_STAGE_DIR=/home/node/.openclaw/plugin-runtime-deps
    

    But your /home/node/.openclaw/plugin-stage path is valid.

  3. Doctor can repair missing bundled runtime deps, but for a corrupt stage dir I’d still prefer a clean rebuild:
    openclaw gateway stop
    
    mv /home/node/.openclaw/plugin-runtime-deps \
       /home/node/.openclaw/plugin-runtime-deps.bad.$(date +%s) 2>/dev/null || true
    
    OPENCLAW_PLUGIN_STAGE_DIR=/home/node/.openclaw/plugin-stage \
      openclaw doctor --fix
    
    openclaw gateway restart
    

Important: if you set the env in Dockerfile/compose, also run doctor --fix inside the container with that same env active, otherwise doctor may repair the default stage dir while the Gateway uses another one.

Given this was silent channel death, I’d also file a GitHub issue with the repro/fix — corrupt runtime deps should ideally log loudly, not make startAccount() resolve cleanly.
https://github.com/openclaw/openclaw/issues

royal rain
#

Follow-up to earlier thread on v2026.4.25 hang after [hooks] loaded 4 internal hook handlers
Additional debugging done since last update:

OPENCLAW_SKIP_CHANNELS=1 confirmed: failure is in channel startup, not model prewarm
OPENCLAW_PLUGIN_STAGE_DIR=/tmp/openclaw-plugin-runtime-deps-test — channels connected successfully
Real boot with OPENCLAW_PLUGIN_STAGE_DIR pointing to any path under /home/node/.openclaw/ (host-mounted NTFS volume) — dies silently, same point, exit code 0
JSONL log file on disk confirms: log ends exactly at loaded 4 internal hook handlers, nothing after
Node process is gone (pgrep node empty) — clean exit, no crash

The only consistent difference between the working test run and the failing real boot: /tmp is container-local, /home/node/.openclaw/ is a Windows NTFS host-mounted volume.
Question: What does the channel startup path do with OPENCLAW_PLUGIN_STAGE_DIR that would silently fail on an NTFS-backed mount — and is keeping this dir on a container-local path the correct permanent fix?
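
One way to confirm which paths sit on the host mount is to look up the filesystem type behind each candidate stage dir. The sketch below assumes a Linux container with /proc/self/mounts; `mount_fstype` is a hypothetical helper, and it does not decode escaped characters in mount points. The bind mount may not be reported as ntfs at all, since the type name depends on the Docker backend, so compare the result against a path known to be container-local such as /tmp.

```shell
# Print the filesystem type of the mount that contains a path.
# Usage: mount_fstype PATH [MOUNTS_FILE]
# MOUNTS_FILE defaults to /proc/self/mounts (Linux only); the optional
# argument exists so the function can be run against a fixture file.
mount_fstype() {
  target=$1
  mounts=${2:-/proc/self/mounts}
  awk -v p="$target" '
    {
      mp = $2
      # A mount point matches if it is "/", equals the path, or is a parent of it.
      if (mp == "/" || p == mp || index(p, mp "/") == 1) {
        # Prefer the longest mount point; on ties the later entry wins,
        # because later mounts shadow earlier ones.
        if (length(mp) >= best_len) { best_len = length(mp); best = $3 }
      }
    }
    END { if (best != "") print best; else exit 1 }
  ' "$mounts"
}
```

Comparing `mount_fstype /home/node/.openclaw` with `mount_fstype /tmp` shows whether the two directories are backed by different filesystems.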

compact juncoBOT
#

Yep — that pretty much confirms it’s the stage filesystem, not Telegram/WhatsApp.

OPENCLAW_PLUGIN_STAGE_DIR is used for bundled plugin runtime deps: it creates package roots, runs npm installs, creates/renames node_modules, mirrors bundled plugin dist/ files, and may use symlinks/copies + lock dirs. That path wants POSIX-ish filesystem behavior. A Windows NTFS bind mount can absolutely break that in weird ways.
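
A quick probe can show whether a candidate directory supports the operations listed above. This is a rough sketch, not what OpenClaw runs: `probe_stage_fs` and `_probe_check` are hypothetical names, and the checks cover only directory creation, directory rename, symlinks, and a mkdir-based lock. A pass does not prove that npm staging will work, but a failure is a strong hint to move the stage dir.

```shell
# Run one check and record a failure without aborting the remaining checks.
_probe_check() {
  label=$1
  shift
  if "$@" 2>/dev/null; then
    echo "ok   $label"
  else
    echo "FAIL $label"
    probe_failed=1
  fi
}

# Usage: probe_stage_fs DIR
# Prints one line per check and returns 1 if any check failed.
probe_stage_fs() {
  dir=$1
  probe_failed=0
  work="$dir/.stage-probe.$$"
  _probe_check "create directory" mkdir -p "$work/node_modules.tmp"
  _probe_check "rename directory" mv "$work/node_modules.tmp" "$work/node_modules"
  _probe_check "create symlink" ln -s node_modules "$work/link"
  _probe_check "read symlink" test -L "$work/link"
  _probe_check "lock dir via mkdir" mkdir "$work/.lock"
  rm -rf "$work"
  return "$probe_failed"
}
```

Running it once against /home/node/.openclaw and once against /tmp gives a side-by-side view of the two filesystems from inside the container.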

Recommended permanent fix: keep OPENCLAW_PLUGIN_STAGE_DIR off the Windows bind mount.

Best Docker shape:

# service level
environment:
  OPENCLAW_PLUGIN_STAGE_DIR: /var/lib/openclaw/plugin-runtime-deps
volumes:
  - ./openclaw-state:/home/node/.openclaw
  - openclaw-plugin-runtime-deps:/var/lib/openclaw/plugin-runtime-deps

# top level
volumes:
  openclaw-plugin-runtime-deps:

So: persistent Docker-managed volume = good. /tmp = also good for correctness, but deps may reinstall after container recreation. /home/node/.openclaw on NTFS bind mount = avoid for plugin stage.

doctor --fix can repair/install missing bundled runtime deps in the configured stage dir, but if the underlying filesystem is NTFS-backed and producing bad node_modules/rename/symlink behavior, doctor can’t really make that reliable. You can rotate the bad dir, but the real fix is “stage dir on container-local/Linux filesystem.”

Also: this silent clean exit is still worth filing. Bad stage deps should produce a loud startup error, not make channels resolve/stopped with exit code 0.
https://github.com/openclaw/openclaw/issues

royal rain
#

Update: root cause confirmed, fix identified
Your diagnosis was correct — OPENCLAW_PLUGIN_STAGE_DIR was pointing into the Windows NTFS host-mounted volume. Specifically, the Dockerfile had:
ENV OPENCLAW_PLUGIN_STAGE_DIR=/home/node/.openclaw/plugin-runtime-deps
And /home/node/.openclaw is a Windows NTFS bind mount. That's why /tmp worked and the real boot didn't.
Planned fix:

Remove the ENV OPENCLAW_PLUGIN_STAGE_DIR line from Dockerfile
Add a Docker-managed named volume for the stage dir in docker-compose:

# service level
environment:
  OPENCLAW_PLUGIN_STAGE_DIR: /var/lib/openclaw/plugin-runtime-deps
volumes:
  - openclaw-plugin-runtime-deps:/var/lib/openclaw/plugin-runtime-deps

# top level
volumes:
  openclaw-plugin-runtime-deps:

Before applying: is there anything else that needs to happen on first boot with a fresh named volume? Does doctor --fix need to run explicitly, or will OpenClaw stage the deps automatically on startup?

compact juncoBOT
#

Yes — that compose shape is the right fix.

A fresh named volume should be okay. OpenClaw will stage bundled runtime deps automatically on startup before importing the active bundled channel plugin, so doctor --fix is not strictly required.

I’d still recommend running it once as a clean preflight because it gives clearer output than “first gateway boot is doing npm staging”:

docker compose run --rm openclaw openclaw doctor --fix
docker compose up -d

Two important details:

  1. Make sure /var/lib/openclaw/plugin-runtime-deps is writable by the container user (node). If your image runs as node, add something like this before switching users in the Dockerfile:

    RUN mkdir -p /var/lib/openclaw/plugin-runtime-deps \
      && chown -R node:node /var/lib/openclaw

  2. Keep the stage dir off the NTFS bind mount. This is the key part:
    • /home/node/.openclaw bind-mounted from Windows: good for config/state, bad for runtime deps
    • /var/lib/openclaw/plugin-runtime-deps Docker named volume: good
    • /tmp: works, but non-persistent
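
Both details can be checked from inside the container before the first real boot. The sketch below is a hypothetical preflight, not part of OpenClaw: `check_stage_dir` only confirms that the directory named by OPENCLAW_PLUGIN_STAGE_DIR exists and is writable by the current user.

```shell
# Preflight: confirm the configured stage dir exists and is writable by
# the current user. Reads OPENCLAW_PLUGIN_STAGE_DIR from the environment.
check_stage_dir() {
  dir=${OPENCLAW_PLUGIN_STAGE_DIR:-}
  if [ -z "$dir" ]; then
    echo "OPENCLAW_PLUGIN_STAGE_DIR is not set; the default stage dir will be used"
    return 0
  fi
  if [ ! -d "$dir" ]; then
    echo "stage dir $dir does not exist" >&2
    return 1
  fi
  if [ ! -w "$dir" ]; then
    echo "stage dir $dir is not writable by $(id -un)" >&2
    return 1
  fi
  echo "stage dir $dir is present and writable by $(id -un)"
}
```

Run it as the same user the Gateway runs as (node in this thread); a check run as root would report a root-owned volume as writable and miss the problem.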

Your compose should be roughly:

# service level
environment:
  OPENCLAW_PLUGIN_STAGE_DIR: /var/lib/openclaw/plugin-runtime-deps
volumes:
  - ./openclaw-state:/home/node/.openclaw
  - openclaw-plugin-runtime-deps:/var/lib/openclaw/plugin-runtime-deps

# top level
volumes:
  openclaw-plugin-runtime-deps:

Docs for this env var: https://docs.openclaw.ai/install/updating

And yeah — please file the silent-exit part as an issue. The workaround is solid, but OpenClaw should log loudly when stage-dir filesystem behavior breaks channel runtime loading.

royal rain
#

Resolved — thanks Krill. Summary for anyone hitting the same issue:
Problem: On Docker (Linux container, Windows 11 host), gateway died silently after [hooks] loaded 4 internal hook handlers. Exit code 0, no channel logs, all 3 channels reporting reason: stopped. Reproduced on v2026.4.24 and v2026.4.25. v2026.4.20 worked fine.
Root cause: OPENCLAW_PLUGIN_STAGE_DIR was pointing to a path inside a Windows NTFS bind-mounted volume (/home/node/.openclaw/). The plugin staging path does npm installs, creates/renames node_modules, and uses symlinks — all of which require POSIX filesystem behavior. NTFS-backed mounts break this silently, causing startAccount() to resolve cleanly with no error or log output.
Fix: Move OPENCLAW_PLUGIN_STAGE_DIR off the NTFS bind mount onto a Docker-managed named volume:

# service level
environment:
  OPENCLAW_PLUGIN_STAGE_DIR: /var/lib/openclaw/plugin-runtime-deps
volumes:
  - openclaw-plugin-runtime-deps:/var/lib/openclaw/plugin-runtime-deps

# top level
volumes:
  openclaw-plugin-runtime-deps:

Remove any ENV OPENCLAW_PLUGIN_STAGE_DIR line from the Dockerfile that points into the bind-mounted path.
Result: Gateway boots cleanly, all channels start, dep staging works correctly.
Will file a separate GitHub issue for the silent clean exit — bad stage-dir behavior should log loudly, not make channels resolve as stopped with no output.

compact juncoBOT
#

Awesome, glad we got it pinned down. 🎉

That summary is solid and should help the next Windows + Docker user a lot. The key bit is definitely: state bind mount is fine; plugin runtime-deps stage needs a Linux/Docker-managed filesystem.

If this fixed it, please right-click the solution message → Apps → ✅ Mark Solution so Answer Overflow indexes it for future users.

And yes please do file the silent-exit issue — the workaround is good, but the failure mode should be loud. 🦐