# After updating to v2026.3.13, `openclaw devices list` fails with gateway handshake timeout

grizzled sinew
#

Environment: OpenClaw v2026.3.13 (61d171a), Debian 12, Node 22, systemd user service, loopback bind
Affected: 2 separate gateways on same host (different user accounts), both updated from v2026.3.12 → v2026.3.13

Problem: After updating to v2026.3.13, openclaw devices list fails with gateway handshake timeout. All other CLI commands (status, doctor) work fine via token-auth.

Steps to reproduce:

  1. Update from v2026.3.12 → v2026.3.13
  2. Gateway starts correctly (systemd active, Telegram connected, heartbeat running)
  3. Run: sudo -iu <bot_user> env XDG_RUNTIME_DIR=/run/user/$(id -u <bot_user>) PATH=~/.npm-global/bin:$PATH openclaw devices list
  4. → Hangs for 10s, then: gateway closed (1006 abnormal closure)

Expected: Device list output
Actual: Gateway-side logs show [ws] handshake timeout followed by closed before connect ... code=1006

Already tried:

  • Full device identity reset (deleted device.json + device-auth.json + paired.json) → same error after gateway restart
  • Passing --token <gateway-token> explicitly → same error
  • Hard stop/start gateway (not just restart) → same error
  • Verified paired.json has correct scopes (all 5 operator.* scopes present)
  • openclaw status works fine (uses token-auth, no device handshake)
  • openclaw doctor works fine
  • Affects 2 independent gateways with separate user accounts on same host

Hypothesis: v3 device-auth signature payload binds platform + device family. The sudo -iu context may present different platform metadata than the gateway expects, causing handshake mismatch. Both gateways use sudo -iu <user> for CLI access (systemd user service pattern).

Is dangerouslyDisableDeviceAuth still supported? Config validation rejects it as "Unrecognized key" in v2026.3.13.

woeful pilotBOT
#

Short answers up front

  • Is dangerouslyDisableDeviceAuth supported? Yes. It still exists, but the config key moved. Use:
    {
      gateway: {
        controlUi: { dangerouslyDisableDeviceAuth: true }
      }
    }
    (That is gateway.controlUi.dangerouslyDisableDeviceAuth, not a top-level key.)

  • What “handshake timeout” means here: the gateway sent the connect.challenge but never received the client’s connect frame within the handshake window, so the server closed the socket. In other words: the CLI didn’t reply to the server’s connect.challenge.

Why that usually happens (most likely causes)

  1. Different CLI environment / different binary than you expect
    • Running under sudo -iu can change PATH, HOME, and OPENCLAW_STATE_DIR; the CLI you invoke may be a different version or using a different state dir than the gateway expects. A mismatched client can fail to complete the new handshake flow.
  2. Proxy / env interfering with WebSocket frames
    • HTTP_PROXY/HTTPS_PROXY/NO_PROXY can break local wss/ws handshakes in some environments. If a proxy is set in the sudo environment it can stop the client from receiving the challenge or from sending the connect frame correctly.
  3. Device identity / state-dir mismatch or unreadable identity
    • If the CLI is using a different ~/.openclaw (or cannot read/write identity files), the client may not present the expected device credentials, which can lead to pairing/verification flows or unexpected failures. (Note: metadata pinning / v3 payloads include platform/deviceFamily and can trigger pairing when they don’t match the paired record, but that normally results in a 1008 pairing-required reply, not a pure “handshake timeout”.)
  4. Different client behaviour / older binary
    • If the CLI binary is older/newer than the gateway and doesn’t implement the same challenge/connect flow, it can ignore the challenge.
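The two close codes that matter in this thread are easy to mix up, so here is a small lookup sketch. The meanings of 1000, 1006 and 1008 come from the WebSocket spec (RFC 6455); the function name and the hint wording are made up for illustration and are not part of OpenClaw.

```shell
#!/usr/bin/env bash
# describe_close_code is a hypothetical helper, not an OpenClaw command.
# Code meanings follow RFC 6455; the "pairing required" hint reflects how
# this thread describes the gateway's use of 1008.
describe_close_code() {
  case "$1" in
    1000) echo "normal closure" ;;
    1006) echo "abnormal closure: no close frame was exchanged" ;;
    1008) echo "policy violation: in this thread, pairing required" ;;
    *)    echo "unrecognized close code: $1" ;;
  esac
}

describe_close_code 1006
describe_close_code 1008
```

A 1006 together with "[ws] handshake timeout" points at the connection dying before connect was sent, while a 1008 would mean the gateway actually saw the connect frame and rejected it.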

What to check (quick verifications — run as the same user you use for the command)

  1. Confirm which binary + version the sudo run is actually using
  • Run (exactly as you run the command):

    sudo -iu <bot_user> which openclaw
    sudo -iu <bot_user> openclaw --version

  2. Confirm the CLI’s state directory + identity files exist and are readable by that user
  • Run:

    sudo -iu <bot_user> sh -c 'echo HOME="$HOME" OPENCLAW_STATE_DIR="${OPENCLAW_STATE_DIR:-$HOME/.openclaw}"; ls -la "$HOME/.openclaw/identity" || true; cat "$HOME/.openclaw/identity/device.json" 2>/dev/null || true; ls -la "$HOME/.openclaw/pairing" 2>/dev/null || true'

  3. Check for proxy env vars (these can silently break WS to loopback in odd setups)
  • Run:

    sudo -iu <bot_user> env | rg -i 'proxy|NO_PROXY' || true

    If HTTP_PROXY/HTTPS_PROXY are set, try unsetting them (or add NO_PROXY=127.0.0.1,localhost) and retry.

  4. Run devices list with explicit URL/token and extended timeout (this will show any gateway error JSON)
  • Replace <token> with your gateway token:

    sudo -iu <bot_user> openclaw devices list --url ws://127.0.0.1:18789 --token '<token>' --timeout 30000 --json

    Note: passing --url forces explicit-credentials checks, so include --token too.

  5. Tail gateway logs while reproducing (run this as the gateway user, on the gateway host)
  • If the gateway is installed as a user service:

    openclaw logs --follow

    OR

    journalctl --user -u openclaw-gateway.service -f

    Then run the devices list command and paste the logs around the failure.

  6. Try an interactive shell (avoid inline env / tilde expansion surprises)
  • Do an interactive login shell, then run the command:

    sudo -iu <bot_user>
    export XDG_RUNTIME_DIR=/run/user/$(id -u)
    export PATH="$HOME/.npm-global/bin:$PATH"
    openclaw devices list

This rules out subtle expansion/tilde / quoting differences.
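For the log-tailing step, it can help to cut the gateway output down to just the handshake failures before pasting. A minimal sketch, assuming the log lines look like the "[ws] handshake timeout" / "closed before connect" messages quoted in this thread; filter_handshake_failures is a hypothetical helper name and the "[telegram]" line is made-up filler.

```shell
#!/usr/bin/env bash
# Keep only WebSocket handshake failures from a gateway log stream on stdin.
# --line-buffered makes matches appear immediately when the input is a
# live stream such as `journalctl -f`.
filter_handshake_failures() {
  grep --line-buffered -E '\[ws\] (handshake timeout|closed before connect)'
}

# Demo on sample lines (the middle line is invented filler).
filter_handshake_failures <<'EOF'
[ws] handshake timeout conn=9511c15d… remote=127.0.0.1
[telegram] heartbeat ok
[ws] closed before connect conn=9511c15d… code=1006
EOF
```

Against a live service this would be used as: journalctl --user -u openclaw-gateway.service -f | filter_handshake_failures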

Extra, if the above doesn’t show the issue

  • Try a minimal WS client to see the server’s challenge, e.g. (on the gateway host as that user):

    node -e "const W=require('ws'); const ws=new W('ws://127.0.0.1:18789'); ws.on('open',()=>console.log('open')); ws.on('message',m=>console.log('msg',m.toString())); ws.on('close',(c,r)=>console.log('close',c,r.toString())); ws.on('error',e=>console.error('error',e.message));"

    You should immediately see the server’s connect.challenge event. If that shows up but the openclaw CLI does not handle it, the problem is client-side (binary/env).

What to paste here if you want me to continue

  • Output of:
    • sudo -iu <bot_user> which openclaw && sudo -iu <bot_user> openclaw --version
    • sudo -iu <bot_user> env | rg -i 'OPENCLAW|proxy|HOME|XDG_RUNTIME_DIR' || true
    • sudo -iu <bot_user> sh -c 'ls -la "$HOME/.openclaw/identity" "$HOME/.openclaw/pairing" || true'
    • The gateway log tail covering your run (openclaw logs --follow output) while you run the devices list command
    • The exact command line you use (so I can check for quoting/tilde expansion issues)
Quick hypothesis & next steps (practical)

  • Most commonly this is an env / path / binary mismatch when running via sudo. Start by confirming the openclaw binary and state-dir used by the sudo -iu invocation and the presence/permissions of ~/.openclaw/identity/device.json for that user. Also check proxy env vars. Those checks resolve ~90% of “connect.challenge never answered” cases I’ve seen after upgrades.

If you want, paste the outputs listed above (which openclaw / version, env snippet, ls of identity/pairing, a short tail of gateway logs, and the exact command line) and I’ll read them and give a concrete next action.

#

Two separate things here:

1) dangerouslyDisableDeviceAuth

Yes, it’s still supported, but the key is not top-level anymore. It’s Control UI only:

{
  gateway: {
    controlUi: {
      dangerouslyDisableDeviceAuth: true,
    },
  },
}

Docs: https://docs.openclaw.ai/web/control-ui

Also: this will not affect openclaw devices list / CLI pairing. It only relaxes Control UI device identity checks in break-glass scenarios.


2) Why you see [ws] handshake timeout

That log happens when the gateway sends the initial connect.challenge event and never receives the client’s connect frame (so the handshake never completes).

The biggest red flag in your repro command is this part:

PATH=~/.npm-global/bin:$PATH

Both ~ and $PATH get expanded by the calling shell before sudo runs, so you’re very likely injecting the calling user’s home directory and PATH (and maybe pointing at a different openclaw / different node) into the <bot_user> session. That can absolutely cause “client never responds to connect.challenge” symptoms.

Try this instead (forces expansion inside the target user login shell)

sudo -iu <bot_user> bash -lc '
  export XDG_RUNTIME_DIR=/run/user/$(id -u)
  export PATH="$HOME/.npm-global/bin:$PATH"
  command -v openclaw
  openclaw --version
  openclaw devices list --json
'

Or skip PATH hacking entirely and call the binary directly:

sudo -iu <bot_user> /home/<bot_user>/.npm-global/bin/openclaw devices list --json
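If you want to see the early-expansion effect without touching sudo at all, here is a self-contained sketch. It fakes "another user" by handing the child shell a different HOME; /home/bot_user_example is a hypothetical path used only for the demo.

```shell
#!/usr/bin/env bash
# Simulate a second user by giving the child shell a different HOME.
# /home/bot_user_example is a made-up path for illustration only.
fake_home="/home/bot_user_example"

# Double quotes: the CALLING shell expands $HOME before the child starts,
# so the child receives the caller's home directory baked into the text.
expanded_early=$(HOME="$fake_home" bash -c "echo $HOME/.npm-global/bin")

# Single quotes: the text reaches the child untouched and expands there,
# so the child's own HOME is used.
expanded_late=$(HOME="$fake_home" bash -c 'echo $HOME/.npm-global/bin')

echo "expanded by caller: $expanded_early"
echo "expanded by target: $expanded_late"
```

The first line prints a path under your own home directory and the second a path under the fake one, which is the same difference as between the original inline PATH=... form and the quoted bash -lc form.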

3) Quick extra checks (common culprits)

Proxy vars can break WS even to loopback in some setups:

sudo -iu <bot_user> env | rg -i 'proxy|no_proxy' || true

If anything is set, test once with:

sudo -iu <bot_user> bash -lc '
  unset HTTP_PROXY HTTPS_PROXY ALL_PROXY
  export NO_PROXY=127.0.0.1,localhost
  openclaw devices list --json
'
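If ripgrep is not installed for that user, the same proxy check works with plain grep. A minimal sketch; list_proxy_vars is a hypothetical helper name, and the variable list is the one already used in this thread.

```shell
#!/usr/bin/env bash
# Print proxy-related variables from the current environment, one per line.
# Matches upper- and lower-case spellings; prints nothing if none are set.
list_proxy_vars() {
  env | grep -iE '^(http_proxy|https_proxy|all_proxy|no_proxy)=' || true
}

list_proxy_vars
```

Empty output means none of the four variables is set in that environment. To check the bot user's login environment, the function body can be pasted into the bash -lc '...' form shown above.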

If the corrected bash -lc form still reproduces, paste the output of the 3 lines it prints (command -v openclaw, openclaw --version, and the devices list --json error) plus the gateway log snippet around the handshake timeout — then we can narrow it to either a proxy/env issue or an actual 2026.3.13 regression.

grizzled sinew
#

Follow-up — all env checks done, still reproduces.

Binary + version (via bash -l login shell):

/home/hubee/.npm-global/bin/openclaw
OpenClaw 2026.3.13 (61d171a)

Same binary for gateway (systemd user service) and CLI. No proxy vars set.

Identity files readable by bot user:

identity/device.json (416b), device-auth.json (428b)
devices/paired.json (1 device, all 5 operator.* scopes)

Command (script copied over via SCP and run in a login shell, so no tilde/expansion issues):

sudo -iu hubee bash -l /tmp/diag.sh

Contents of /tmp/diag.sh:

export XDG_RUNTIME_DIR=/run/user/$(id -u)
export PATH="$HOME/.npm-global/bin:$PATH"
openclaw devices list --json

Gateway log:

[ws] handshake timeout conn=9511c15d… remote=127.0.0.1
[ws] closed before connect conn=9511c15d… code=1006

Raw WS test (challenge IS received):

msg {"type":"event","event":"connect.challenge","payload":{"nonce":"c432338f…"}}

Gateway sends challenge. CLI never replies.

Also tested:

  • Direct binary: sudo -iu hubee /home/hubee/.npm-global/bin/openclaw devices list → timeout
  • openclaw status / doctor → work (token-auth)
  • 2 gateways on same host, different users/ports → both fail
  • Full device identity reset → same result

Looks like a v2026.3.13 CLI regression in the connect.challenge handler.

woeful pilotBOT
#

That’s a really good diag — and one key detail jumped out:

openclaw devices list is going through the CLI-mode gateway client path, which (by design) requests the full CLI_DEFAULT_OPERATOR_SCOPES set (admin/read/write/approvals/pairing).
openclaw status is mostly using the probe/least-privilege path (READ scope) and can still succeed even if the “full CLI client” path is busted.

So the next step is to confirm whether this is:

  • A) “anything that uses CLI-mode callGateway is broken” (bigger regression), or
  • B) “only device pairing methods are broken” (narrower).

1) Can you run these two in the same sudo/login setup?

sudo -iu hubee bash -lc 'openclaw gateway call health --json'
sudo -iu hubee bash -lc 'openclaw gateway call device.pair.list --json --params "{}"'

Interpretation:

  • If gateway call health also handshake-times-out → it’s not “devices list”, it’s the CLI-mode gateway client path in general.
  • If gateway call health works but device.pair.list fails → it’s specific to device pairing RPCs (or their authz path) even before connect completes (which is weird, but at least scoped).
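The A/B decision above can be written down as a tiny helper, so the result is unambiguous when you paste it. This is a sketch under one assumption that is not confirmed anywhere in this thread: that the CLI exits non-zero when a gateway call fails. classify_regression is a hypothetical name.

```shell
#!/usr/bin/env bash
# Classify the failure from the exit statuses of the two probe commands.
# ASSUMPTION: the openclaw CLI exits non-zero when a gateway call fails.
#   $1 = exit status of: openclaw gateway call health --json
#   $2 = exit status of: openclaw gateway call device.pair.list --json --params "{}"
classify_regression() {
  local health_rc=$1 pair_rc=$2
  if [ "$health_rc" -ne 0 ]; then
    echo "case A: CLI-mode gateway client path is broken"
  elif [ "$pair_rc" -ne 0 ]; then
    echo "case B: only device pairing RPCs fail"
  else
    echo "neither: both calls succeeded"
  fi
}

# Example with made-up statuses: health failed, so the pairing result is moot.
classify_regression 1 1
```

In practice you would capture $? right after each of the two commands and pass both values in.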

2) Turn on client-side debug logs for one run

There is an env knob for debug file logging:

sudo -iu hubee bash -lc 'export OPENCLAW_LOG_LEVEL=debug; openclaw devices list --json'

Then immediately paste the relevant lines from the latest CLI log (not the gateway service log):

sudo -iu hubee bash -lc 'tail -n 200 "$(ls -t /tmp/openclaw/openclaw-*.log | head -1)"'

What I’m looking for are lines like:

  • gateway client parse error: ...
  • gateway connect challenge timeout
  • anything indicating the client received a frame but didn’t validate it
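To avoid eyeballing 200 lines, the markers listed above can be searched for directly. A rough sketch: the two marker strings are the ones named in this message, scan_client_log is a hypothetical helper, and whether the CLI really logs these exact strings is an assumption.

```shell
#!/usr/bin/env bash
# Report which known handshake markers appear in a client log read from stdin.
scan_client_log() {
  local log marker found=0
  log=$(cat)
  for marker in "gateway client parse error" "gateway connect challenge timeout"; do
    # -F treats the marker as a fixed string, not a regular expression.
    if printf '%s\n' "$log" | grep -qF -- "$marker"; then
      echo "found: $marker"
      found=1
    fi
  done
  [ "$found" -eq 1 ] || echo "no known handshake markers found"
}

# Demo on a made-up log excerpt.
printf '%s\n' 'ws open' 'gateway connect challenge timeout' | scan_client_log
```

It can be fed from the tail command above by piping its output into scan_client_log. A "no known handshake markers found" result is itself useful evidence, since it suggests the client never got as far as handling the challenge.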

3) Re: your “v3 platform/device family mismatch” hypothesis

That mismatch normally triggers a 1008 “pairing required” path (scope-upgrade / metadata-upgrade), after the client has sent connect. It shouldn’t present as “never replies to connect.challenge”.


If you paste:

  1. the output of those two openclaw gateway call ... --json commands, and
  2. that tail -n 200 client log snippet with OPENCLAW_LOG_LEVEL=debug,

…I can tell you which bucket this falls into and whether it’s likely a 2026.3.13 regression worth filing as a GitHub issue (with a minimal repro).

grizzled sinew
#

Results: it's Case A — entire CLI-mode callGateway is broken.

Test 1: gateway call health → timeout (handshake)
Test 1b: gateway call device.pair.list → timeout (handshake)

So this is not specific to device pairing — any CLI-mode gateway call fails.

Test 2: debug logs (OPENCLAW_LOG_LEVEL=debug):

{"cause":"handshake-timeout","handshake":"failed","durationMs":27462,"host":"127.0.0.1:18791","handshakeMs":3000}

Gateway-side: challenge sent, 3s handshake window expires, connection closed.

Client-side debug log shows no indication the CLI ever received or processed the connect.challenge event. No parse error, no challenge received — the client opens the WS but appears to never handle the server's first message.

Meanwhile, a raw node -e WS client immediately sees:

msg {"type":"event","event":"connect.challenge","payload":{"nonce":"..."}}

Environment confirmed clean: same binary (v2026.3.13), login shell, no proxy vars, identity files readable.

openclaw status and openclaw doctor still work (probe/token-auth path, not CLI-mode callGateway).

Two independent gateways on same host, both upgraded from v2026.3.12 → v2026.3.13, both exhibit identical behavior.

woeful pilotBOT
#

Got it — with those results, this really does look like a 2026.3.13 regression in the “CLI-mode gateway RPC client” path (anything that goes through mode=cli / gateway call / devices etc), while the probe/least-privilege path (status, parts of doctor) still works.

A couple of concrete points (so we’re aligned):

  • Gateway handshake timeout is 3s server-side.
  • Your raw ws client shows the gateway does immediately send:
    {"type":"event","event":"connect.challenge",...}
  • The OpenClaw CLI-mode client appears to open the WS but never processes the first message, because:
    • no connect.challenge timeout (client-side timer) log
    • no parse error log
    • and the gateway never sees a connect request

At this point, I’d treat it as a real bug and switch to “workaround + report”.

Workaround (pragmatic)

If you need devices / gateway call functionality right now, the cleanest workaround is to roll back to 2026.3.12 (known-good for you), on both gateway + CLI, then restart the systemd user service.

How you roll back depends on how you installed (npm global vs git checkout), but the key is: CLI binary + service entrypoint must match. After rollback, run:

  • openclaw --version
  • openclaw gateway status
  • restart service

(If you want, tell me your install method and I’ll give the exact minimal rollback steps for that setup.)
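Whichever way you roll back, a quick scripted check that the CLI really reports the version you expect can save a round trip. A sketch assuming the --version output looks like the "OpenClaw 2026.3.13 (61d171a)" line pasted earlier in this thread; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Pull the version number out of `openclaw --version` style output on stdin,
# e.g. "OpenClaw 2026.3.13 (61d171a)" -> "2026.3.13".
extract_version() {
  sed -nE 's/^OpenClaw ([0-9]+\.[0-9]+\.[0-9]+).*/\1/p'
}

# Succeed only if the version read from stdin equals the expected one ($1).
versions_match() {
  [ "$(extract_version)" = "$1" ]
}

# Demo with the version string quoted in this thread.
echo 'OpenClaw 2026.3.13 (61d171a)' | extract_version
```

After a rollback you would pipe the real output in, for example: openclaw --version | versions_match 2026.3.12 && echo "CLI is on 2026.3.12"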

Please file a GitHub issue (this is actionable)

This is exactly the kind of bug maintainers can reproduce from a single command:

  • Debian 12 + Node 22
  • OpenClaw 2026.3.13
  • openclaw gateway call health --json → handshake timeout
  • but openclaw status / openclaw doctor still work
  • gateway log: [ws] handshake timeout ... closed before connect

Issue link: https://github.com/openclaw/openclaw/issues

Include these in the report (copy/paste):

  1. openclaw --version and node --version
  2. exact failing command: openclaw gateway call health --json
  3. gateway log lines showing handshake timeout
  4. your raw ws proof that connect.challenge is sent/visible
  5. mention “2 gateways, 2 users, different ports, identical behaviour after 2026.3.12 → 2026.3.13”
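If it helps, the checklist above can be turned into a paste-ready skeleton. This is only a sketch: render_issue_report is a hypothetical helper, the title wording is a suggestion, and the log and ws evidence still need to be pasted in by hand.

```shell
#!/usr/bin/env bash
# Print an issue skeleton from the facts collected in this thread.
#   $1 = OpenClaw version, $2 = Node version
render_issue_report() {
  cat <<EOF
Title: CLI-mode gateway calls time out during handshake after upgrading to $1

Environment: OpenClaw $1, Node $2, Debian 12
Failing command: openclaw gateway call health --json
Still working: openclaw status, openclaw doctor
Gateway log: [ws] handshake timeout, then closed before connect (code=1006)
Raw ws client: connect.challenge event is received
Scope: 2 gateways, 2 users, different ports, identical behaviour after 2026.3.12 -> 2026.3.13
EOF
}

render_issue_report "2026.3.13" "22"
```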

If you paste the GitHub issue link here after you open it, I can help tighten the repro text / label it correctly so it doesn’t get bounced.

(Also: gateway.controlUi.dangerouslyDisableDeviceAuth is unrelated here; it won’t affect CLI-mode RPC.)