#ACP broke with latest update?

left swan
#

Here's what my Opus 4.6-powered Claude concluded:

Still failing as expected — I've traced the root cause. Here's the diagnosis:

The acpx runtime backend can't connect to the gateway WebSocket. The gateway log shows repeated errors:

gateway connect failed: invalid handshake: first request must be connect
acp startup identity reconcile: checked=22 resolved=0 failed=22

The acpx subprocess launches and tries to connect back to the gateway, but the handshake fails — it sends sessions.list before the required connect frame. This is why:

  • The acpx backend shows "registered" but never "ready"
  • Every ACP turn fails immediately with "Internal error" (0 tokens, 2 seconds)

The version mismatch: OpenClaw 2026.4.25 bundles acpx v0.6.1 (up from v0.5.3 in 2026.4.24). The v0.6.1 runtime has a changed gateway handshake protocol that the gateway isn't handling correctly — or the gateway's WebSocket protocol changed in 2026.4.25 and the bundled runtime deps weren't rebuilt to match.

This is an OpenClaw bug, not a config issue. Our options:

  1. Downgrade to OpenClaw 2026.4.24 — ACP worked there
  2. Wait for 2026.4.26+ — there's already a release (we saw it in the changelog), might fix this
  3. Try installing 2026.4.26 now

Want me to check if 2026.4.26 is available and try the upgrade? Subagent spawns (non-ACP) still work fine — that's how Boris built the launchd migration.

Is this true?

vague badgerBOT
#

Mostly plausible, but overstated.

What I can confirm:

  • 2026.4.25 did bump bundled acpx to 0.6.1. The changelog says that explicitly. (raw.githubusercontent.com)
  • invalid handshake: first request must be connect really means a Gateway WS client sent some other RPC before connect; if the frame was sessions.list, that diagnosis is directionally right.
  • But acp startup identity reconcile: checked=… resolved=0 failed=… is not proof by itself that acpx’s protocol changed. Similar reconcile failures have appeared with other ACP issues where the root cause was session/routing/runtime state, not a gateway handshake protocol mismatch. (github.com)
  • “registered but never ready” is also a little suspicious as evidence: recent ACPX behavior intentionally avoids normal startup probes, so “registered” without “ready” can be expected unless probing is enabled.
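As an illustration of the rule behind that error message, here is a minimal Python sketch of a gate that rejects any request arriving before `connect`. The `HandshakeGate` class and the `{"method": ...}` frame shape are assumptions made for this example; they are not OpenClaw's actual Gateway code or wire format.

```python
import json


class HandshakeGate:
    """Hypothetical model of a 'first request must be connect' rule."""

    def __init__(self) -> None:
        self.connected = False

    def accept(self, raw_frame: str) -> tuple[bool, str]:
        """Return (ok, reason) for one incoming JSON frame."""
        method = json.loads(raw_frame).get("method")
        if not self.connected:
            if method != "connect":
                # Any other RPC before the handshake is refused.
                return False, "invalid handshake: first request must be connect"
            self.connected = True
            return True, "handshake complete"
        # After the handshake, ordinary requests are passed through.
        return True, f"dispatch {method}"
```

In this model a client whose first frame is `sessions.list` is refused no matter what it sends afterwards. That matches the log line, but it says nothing about which client sent the frame.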

The part I would not state as fact yet:

“acpx v0.6.1 has a changed gateway handshake protocol that the gateway isn’t handling.”

That’s a reasonable hypothesis, but I’d call it unconfirmed unless you have a captured WS frame showing sessions.list as the first frame from the ACP bridge process.

Also: 2026.4.26 is in the changelog, but stable latest is still 2026.4.25 from what I checked; treat 2026.4.26-beta.1 as beta, not “the stable fix.” (raw.githubusercontent.com)

Recommended next step: paste/file a GitHub issue with:

openclaw --version
openclaw status --all
openclaw gateway status
openclaw logs --follow

status --all is the docs-recommended shareable report with redaction. (docs.openclaw.ai)
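If it helps, the four commands above can be captured into one pasteable report. This is a rough Python sketch, not an official tool: the `collect` and `run_command` helpers are made up for this example, and it assumes `openclaw logs --follow` keeps streaming until interrupted, so that command is cut off by a timeout.

```python
import subprocess
from typing import Callable, Sequence

# Commands quoted from this thread, in the same order.
COMMANDS = [
    ["openclaw", "--version"],
    ["openclaw", "status", "--all"],
    ["openclaw", "gateway", "status"],
    ["openclaw", "logs", "--follow"],
]


def run_command(argv: Sequence[str], timeout: float = 10.0) -> str:
    """Run one command and return its output, even if it times out or is missing."""
    try:
        done = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
        return done.stdout + done.stderr
    except subprocess.TimeoutExpired as exc:
        partial = exc.stdout or ""
        if isinstance(partial, bytes):  # partial output may arrive as bytes
            partial = partial.decode(errors="replace")
        return partial + "\n[stopped after timeout]"
    except FileNotFoundError:
        return "[command not found]"


def collect(runner: Callable[[Sequence[str]], str] = run_command) -> str:
    """Build one text report with a '$ command' header per section."""
    sections = []
    for argv in COMMANDS:
        sections.append("$ " + " ".join(argv) + "\n" + runner(argv).rstrip())
    return "\n\n".join(sections) + "\n"
```

Calling `print(collect())` prints the report. Only `status --all` is described here as redacted, so read through the log section for secrets before posting it publicly.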

Short answer: yes, likely an OpenClaw regression rather than config — but the exact “acpx handshake protocol mismatch” conclusion is not proven yet.

left swan
#

Here's my bot's reply to that:

Fair correction from Krill: I overstated the diagnosis. The "handshake protocol mismatch" is a hypothesis, not confirmed. Let me gather the actual diagnostic output they recommended:

OK wait, I may have found the actual issue. openclaw doctor says the Anthropic auth token (from Claude CLI) is expiring in 6 hours. More importantly, it recommends re-auth. If the token is in a bad state for the acpx subprocess, that would explain the "Internal error": the ACP runtime can't authenticate to Anthropic's API.

But that doesn't fully explain the "invalid handshake" WS errors. Those are a gateway-level issue. Let me check one more thing:

Here's where we are, Bob:

Confirmed findings:

  • ACP runtime shows "registered" but never "ready" since the 2026.4.25 upgrade
  • Gateway logs show gateway connect failed: invalid handshake: first request must be connect — the acpx subprocess can't connect back to the gateway
  • anthropic:claude-cli auth token is expiring (6h left)
  • openclaw doctor recommends re-auth

What Krill correctly flagged: My "acpx handshake protocol mismatch" diagnosis was a hypothesis, not proven. The reconcile failures and missing "ready" could have other causes.

What I'd recommend:

  1. File the GitHub issue with the diagnostic output as Krill suggested
  2. Try re-authing: claude auth login then openclaw models auth login --provider anthropic --method cli --set-default
  3. Restart gateway after re-auth and test ACP again
  4. If still broken, try the 2026.4.26-beta.1

Subagent spawns work fine as a fallback for Boris/Vlad work in the meantime. Want me to file the issue, or would you rather handle it?

vague badgerBOT
#

That bot reply is much better, but I’d tweak two things:

  1. Re-auth is worth trying, and those commands look right:

    claude auth login
    claude auth status --text
    openclaw models auth login --provider anthropic --method cli --set-default
    openclaw gateway restart
    

    Docs: https://docs.openclaw.ai/gateway/cli-backends

  2. Don’t call the Anthropic token the root cause yet. An expiring/bad Claude CLI token could explain ACP “Internal error” during a turn, but it does not explain a Gateway WS error like:

    invalid handshake: first request must be connect
    

    That still points to a Gateway/client handshake problem or a separate ACPX/runtime issue.

#

Also, “registered but never ready” is not always proof of failure anymore, because ACPX startup probing may be skipped unless enabled.

Best next move:

openclaw --version
openclaw status --all
openclaw gateway status
openclaw logs --follow

Then file a GitHub issue with that output. I wouldn’t jump to 2026.4.26-beta.1 unless you’re okay testing beta builds; latest is still 2026.4.25 from npm.

left swan
#

There was some back and forth, but here's the latest from my bot:

Still failing — same 2-second instant death, 0 tokens. Auth refresh didn't fix it, confirming Krill's point: the token wasn't the root cause. This is the WS handshake / acpx runtime issue.

Time to file the GitHub issue. Let me gather the diagnostic output:

Got all the diagnostic data. Here's what I'd put in the GitHub issue. Want me to file it, or would you rather paste it yourself?

Title: ACP turns fail with "Internal error" after upgrade to 2026.4.25 — gateway WS handshake rejected

Body:

  • Version: OpenClaw 2026.4.25 (aa36ee6), macOS arm64, Node v22.22.2 (nvm)
  • ACP agents: claude (claude-agent-acp v0.26.0), codex (codex-acp)
  • Symptom: All ACP sessions_spawn calls fail instantly (2s, 0 tokens) with AcpRuntimeError: Internal error / ACP_TURN_FAILED
  • Last working: 2026.4.24 on Apr 26 — acpx v0.5.3 showed "registered" then "ready"
  • Since upgrade: acpx v0.6.1 shows "registered" but never "ready" across 6+ gateway restarts

Key log evidence:

[gateway/ws] invalid handshake: first request must be connect
  (lastFrameMethod: sessions.list — acpx subprocess sends sessions.list before connect frame)
[gateway] gateway connect failed: Error: gateway closed (1008)
[gateway] acp startup identity reconcile (renderer=v1): checked=22 resolved=0 failed=22
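The counters in the reconcile line can be pulled out mechanically when comparing logs across restarts. This is a small Python sketch that assumes the line format is exactly as quoted above; `reconcile_counts` is a helper name made up for this example.

```python
import re

# Matches both quoted variants: with and without "(renderer=v1)".
RECONCILE = re.compile(
    r"acp startup identity reconcile.*?checked=(\d+) resolved=(\d+) failed=(\d+)"
)


def reconcile_counts(log_text: str) -> list[dict[str, int]]:
    """Return one {checked, resolved, failed} dict per reconcile line found."""
    results = []
    for line in log_text.splitlines():
        match = RECONCILE.search(line)
        if match:
            checked, resolved, failed = map(int, match.groups())
            results.append({"checked": checked, "resolved": resolved, "failed": failed})
    return results
```

A result where `failed` equals `checked` on every restart shows the failure is consistent. It does not show the cause.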

What I've ruled out:

  • Auth: Claude CLI re-authed successfully, token fresh
  • Session state: Cleaned all stale ACP session files + orphaned .jsonl transcripts
  • Config: maxConcurrentSessions bumped to 100, well under limit
  • Standalone: claude --print and claude-agent-acp both work fine outside OpenClaw
  • Subagent (non-ACP) spawns work perfectly

Want me to file it on GitHub?

vague badgerBOT
#

Yep — this is now issue-worthy. The write-up is good.

I’d make two small edits before filing:

  1. Phrase this as evidence, not certainty:

    lastFrameMethod: sessions.list suggests an ACP-related Gateway client is sending sessions.list before connect.

    Don’t say “the acpx subprocess sends…” unless the log clearly identifies the process/client.

  2. Add the raw diagnostics as attachments/paste:

    openclaw --version
    openclaw status --all
    openclaw gateway status
    openclaw doctor
    
#

Also include:

  • whether OPENCLAW_ACPX_RUNTIME_STARTUP_PROBE is set
  • exact plugins/acpx / bundled runtime-deps path if visible in logs
  • the exact command/input that triggers the failing ACP turn
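For the first item, a quick way to report the variable is a few lines of Python. The variable name comes from this thread; its accepted values aren't stated here, so this sketch only reports whether it is set and what it contains. `describe_probe_setting` is a made-up helper name.

```python
import os
from typing import Mapping, Optional

PROBE_VAR = "OPENCLAW_ACPX_RUNTIME_STARTUP_PROBE"


def describe_probe_setting(env: Optional[Mapping[str, str]] = None) -> str:
    """Report the probe variable as seen in the given environment."""
    env = os.environ if env is None else env
    if PROBE_VAR not in env:
        return f"{PROBE_VAR} is not set"
    return f"{PROBE_VAR}={env[PROBE_VAR]!r}"
```

This reads the environment of whichever process runs it. If the gateway runs as a service (launchd, for example), its environment may differ from your shell's, so check it there as well.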

And yes: auth is now pretty well ruled out. A fresh Claude auth token not changing the 2s / 0-token failure points back to ACPX/Gateway/runtime state.

Per project policy I can’t file the GitHub issue for you, but you should paste it here: https://github.com/openclaw/openclaw/issues

left swan
vague badgerBOT
#

Perfect, thank you. That’s exactly the right place for it.

If this ends up fixed or worked around in the thread, please right-click the solution message → Apps → ✅ Mark Solution so Answer Overflow indexes it for the next person. 🦐

left swan
#

workaround is to use regular subagents and not acp

vague badgerBOT
#

Yep, that’s the safest workaround for now: use regular runtime="subagent" spawns and avoid ACP until the issue is fixed.

If you need to preserve the workflow, the practical swap is:

  • ACP/Codex/Claude thread-bound work → regular subagent
  • avoid ACP-only fields like resumeSessionId / ACP streamTo
  • keep the GitHub issue open with the ACP diagnostics so maintainers can track the real fix
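If the spawn arguments are built in code, the swap can be sketched like this in Python. The field names `runtime`, `resumeSessionId` and `streamTo` come from this thread; the dict shape and the `task` key in the usage line are assumptions for illustration, not the real `sessions_spawn` schema.

```python
# ACP-only fields named in this thread; a subagent spawn should not carry them.
ACP_ONLY_FIELDS = ("resumeSessionId", "streamTo")


def to_subagent_spawn(args: dict) -> dict:
    """Return a copy of the spawn args retargeted at a regular subagent."""
    converted = {key: value for key, value in args.items() if key not in ACP_ONLY_FIELDS}
    converted["runtime"] = "subagent"
    return converted
```

For example, `to_subagent_spawn({"runtime": "acp", "task": "build", "resumeSessionId": "abc"})` gives back a dict with `runtime` set to `"subagent"`, `task` unchanged and `resumeSessionId` removed. The original dict is left untouched, so the ACP version can be restored once the fix lands.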

Glad you have a path forward at least. 🦐