#OPENCLAW AGENT EXECUTION BUG — ROOT CAUSE ANALYSIS

21 messages · Page 1 of 1 (latest)

mighty mauve
#

OPENCLAW AGENT EXECUTION BUG — ROOT CAUSE ANALYSIS


CONTEXT:

OpenClaw environment where:

• Existing agents (e.g. systemarchitect) execute normally
• ANY newly created agent hangs indefinitely on execution
• Hanging occurs even with minimal prompt: "Respond with exactly: OK"
• Happens across different agent names and templates
• Cloning working agent directories does NOT fix it
• Config restore + session clearing + cache clearing performed
• Gateway restart successful
• Runtime confirmed healthy (existing agents respond instantly)


SYMPTOMS:

• openclaw agent --agent <new-agent> → hangs / no output
• No errors surfaced
• No difference between stdout or telegram channel
• Agent appears correctly in registry


ALREADY TESTED:

• Session deletion
• Workspace prompt clearing
• Model switching
• Agent recreation
• Config restore
• Runtime reset (/tmp + sessions)
• Template cloning from working agent


WORKING:

• systemarchitect agent executes normally


FAILING:

• market-brief
• market-brief-v2
• trading-desk
• any newly created agent


QUESTION:

What internal OpenClaw component could cause:

• New agents failing at execution
• Existing agents continuing to work

Specifically:

Is there a known issue with:

• agent lifecycle registry
• workspace binding cache
• agent ID/context initialization
• gateway routing layer


GOAL:

Identify ROOT CAUSE and minimal fix path

NOT workaround

drowsy thicketBOT
icy meadowBOT
#

Most likely, this is not gateway routing and probably not the agent registry.

Because openclaw agent --agent <id> hangs from CLI too, the failure is lower down, in per-agent state initialization:

  • agentDir resolution
  • per-agent auth store
  • per-agent session store / sessionFile path resolution
  • workspace bootstrap state (BOOTSTRAP.md)

If I had to bet on one root cause from your symptoms, it is this:

new agents are hitting a bad per-agent state path, while the old working agent already has a valid one.

That fits several known multi-agent bug classes OpenClaw has had:

  • wrong/default agentDir handling for new agents
  • shared agentDir collisions
  • stale absolute sessionFile paths in multi-agent setups
#
  • bootstrapping state never completing for new workspaces

So, to your list:

  • agent lifecycle registry: unlikely, the agent is discoverable
  • workspace binding cache: unlikely, CLI reproduction bypasses routing/bindings
  • agent ID/context initialization: yes, very plausible
  • gateway routing layer: no, almost certainly not the primary fault

Minimal fix path

First, version matters a lot here. Please paste:

pnpm openclaw --version
pnpm openclaw status --all
pnpm openclaw config get agents
#
pnpm openclaw config get session.store

What I want to check immediately:

  1. Does each agent have a unique agentDir?
    Shared agentDir causes auth/session collisions.

  2. Does status --all show bootstrap pending for the broken agents?
    A stuck BOOTSTRAP.md can keep new agents in first-run state.

  3. Are broken agents carrying stale sessionFile pointers?
    Especially if you restored config or changed state dirs.

  4. Is the failing agent missing auth-profiles.json in its own agentDir?
    Auth is per-agent.

#

There were real fixes in Jan and Feb 2026 around exactly this area, multi-agent agentDir, auth path, and session path resolution. So if you're on an older build, the root cause is likely a version-specific multi-agent state bug, not your prompt/template.

Docs:

Paste those 4 outputs and I can narrow it to the exact component pretty fast.

mighty mauve
#

OPENCLAW v2026.4.2 — MULTI-AGENT INITIALIZATION BUG


VERSION:

pnpm openclaw --version → 2026.4.2


BEHAVIOR:

• Existing agents (systemarchitect) execute normally
• ANY newly created agent hangs on execution
• Reproducible across names, templates, and cloned agents


CONFIRMED:

• Session state cleared → no change
• /tmp runtime cleared → no change
• Gateway restarted → no change
• Agent cloned from working agent → still hangs
• Minimal prompt ("OK") → still hangs


CRITICAL:

• systemarchitect works immediately
• trading-desk (and any new agent) hangs
• failure occurs ONLY during new agent initialization


QUESTION:

Given v2026.4.2 and known multi-agent fixes earlier in 2026:

Is this a known bug in:

• agentDir resolution
• auth-profiles initialization
• sessionFile path handling
• bootstrap state machine


REQUEST:

Identify exact root cause for this version and provide minimal fix

NOT workaround

#

OPENCLAW v2026.4.2 — MULTI-AGENT INITIALIZATION BUG


VERSION:

pnpm openclaw --version → 2026.4.2


BEHAVIOR:

• Existing agents (systemarchitect) execute normally
• ANY newly created agent hangs on execution
• Reproducible across names, templates, and cloned agents


CONFIRMED:

• Session state cleared → no change
• /tmp runtime cleared → no change
• Gateway restarted → no change
• Agent cloned from working agent → still hangs
• Minimal prompt ("OK") → still hangs


CRITICAL:

• systemarchitect works immediately
• trading-desk (and any new agent) hangs
• failure occurs ONLY during new agent initialization


QUESTION:

Given v2026.4.2 and known multi-agent fixes earlier in 2026:

Is this a known bug in:

• agentDir resolution
• auth-profiles initialization
• sessionFile path handling
• bootstrap state machine


REQUEST:

Identify exact root cause for this version and provide minimal fix

NOT workaround

#

@icy meadow

#

Hello

#

Yo @icy meadow

icy meadowBOT
#

Best-supported RCA from the v2026.4.2 repo delta:

This is most likely a fresh-session bootstrap/init bug, not agent registry or gateway routing.

Why that fits your repro:

  • systemarchitect already has an established session/history path
  • every new agent hits the first-run initialization path
  • openclaw agent --agent <new-agent> hanging from CLI too means it is below channel routing
  • cloning a working workspace does not help if the broken path is session/bootstrap init, not workspace contents

Highest-signal fix lane after v2026.4.2

These landed right after the tag and target exactly that path:

  • 94340b9598 fix(agent-init): move session startup context into the runtime (#65055)
  • 7f071a6a8e Agents: persist bootstrap marker after clean sessions_yield
  • ef98a8dd49 refactor(agents): trim bootstrap marker gate
#

Relevant files:

  • src/agents/pi-embedded-runner/run/attempt.ts
  • src/agents/pi-embedded-runner/run/attempt.context-engine-helpers.ts
  • src/agents/bootstrap-files.ts

So, from your candidate list

  • Yes: bootstrap state machine / first-turn session initialization
  • No, probably not: agent lifecycle registry
  • No, probably not: gateway routing layer
  • Unlikely primary RCA: agentDir resolution, auth-profiles initialization, sessionFile path handling

Minimal fix path

Upgrade off 2026.4.2 to a build that includes those commits.

If you are source-patching, the smallest meaningful fix set is:

  1. 94340b9598
  2. 7f071a6a8e
#

Yes. For v2026.4.2, the matching known failure is the fresh-session bootstrap state machine, not the agent registry, not gateway routing, and not primarily agentDir / auth / sessionFile.

Exact failing component

The bad path is the new-agent first-run init lane in the embedded runner:

  • src/agents/pi-embedded-runner/run/attempt.ts
  • src/agents/pi-embedded-runner/run/attempt.context-engine-helpers.ts
  • src/agents/bootstrap-files.ts

The key bug is that fresh sessions could fail to transition cleanly out of bootstrap/startup context handling, specifically around the bootstrap completion marker path (FULL_BOOTSTRAP_COMPLETED_CUSTOM_TYPE) and first-turn startup context handling.

That is why the split looks like this:

  • existing agents work: they already have an established session/transcript and bypass the broken first-run path
  • new agents hang: every new agent enters that broken first-run bootstrap/init path

The fix commits

#

The repo fixes that line up with your repro are:

  • 94340b9598 fix(agent-init): move session startup context into the runtime (#65055)
  • 7f071a6a8e Agents: persist bootstrap marker after clean sessions_yield
  • ef98a8dd49 refactor(agents): trim bootstrap marker gate

Minimal fix path

I checked the tags. The first release that contains all three is:

  • v2026.4.12

So the minimal real fix is:

  • upgrade from v2026.4.2 to v2026.4.12 or newer

If you are patching from source instead of upgrading a release, cherry-pick:

#
  1. 94340b9598
  2. 7f071a6a8e
  3. ef98a8dd49

Short answer to your list

  • agent lifecycle registryno
  • workspace binding cacheno evidence
  • agent ID/context initializationyes, in the bootstrap/init path
  • gateway routing layerno
  • agentDir resolutionnot the main bug
  • auth-profiles initializationnot the main bug
  • sessionFile path handlingnot the main bug
  • bootstrap state machineyes, this is the one

Docs:

#

If this fixed it, please right-click the solution message → Apps → ✅ Mark Solution so Answer Overflow can index it.

#

Yep, the short version:

Root cause: v2026.4.2 has a new-session bootstrap/init bug.
Not: agent registry, workspace binding cache, or gateway routing.
Fix: upgrade to v2026.4.12+.

If you need the exact patch set, it is:

  • 94340b9598
  • 7f071a6a8e
  • ef98a8dd49

So from your list, bootstrap state machine / agent init is the right bucket.

#

Hey, if you want, I can also turn that into a one-paragraph final RCA you can paste into an issue or internal notes.