#OPENCLAW AGENT EXECUTION BUG — ROOT CAUSE ANALYSIS

21 messages · Page 1 of 1 (latest)

mighty mauve Apr 13, 2026, 9:58 PM

OPENCLAW AGENT EXECUTION BUG — ROOT CAUSE ANALYSIS

CONTEXT:

OpenClaw environment where:

• Existing agents (e.g. systemarchitect) execute normally
• ANY newly created agent hangs indefinitely on execution
• Hanging occurs even with minimal prompt: "Respond with exactly: OK"
• Happens across different agent names and templates
• Cloning working agent directories does NOT fix it
• Config restore + session clearing + cache clearing performed
• Gateway restart successful
• Runtime confirmed healthy (existing agents respond instantly)

SYMPTOMS:

• openclaw agent --agent <new-agent> → hangs / no output
• No errors surfaced
• No difference between stdout or telegram channel
• Agent appears correctly in registry

ALREADY TESTED:

• Session deletion
• Workspace prompt clearing
• Model switching
• Agent recreation
• Config restore
• Runtime reset (/tmp + sessions)
• Template cloning from working agent

WORKING:

• systemarchitect agent executes normally

FAILING:

• market-brief
• market-brief-v2
• trading-desk
• any newly created agent

QUESTION:

What internal OpenClaw component could cause:

• New agents failing at execution
• Existing agents continuing to work

Specifically:

Is there a known issue with:

• agent lifecycle registry
• workspace binding cache
• agent ID/context initialization
• gateway routing layer

GOAL:

Identify ROOT CAUSE and minimal fix path

NOT workaround

drowsy thicketBOT Apr 13, 2026, 9:58 PM

icy meadowBOT Apr 13, 2026, 10:03 PM

Most likely, this is not gateway routing and probably not the agent registry.

Because openclaw agent --agent <id> hangs from CLI too, the failure is lower down, in per-agent state initialization:

agentDir resolution
per-agent auth store
per-agent session store / sessionFile path resolution
workspace bootstrap state (BOOTSTRAP.md)

If I had to bet on one root cause from your symptoms, it is this:

new agents are hitting a bad per-agent state path, while the old working agent already has a valid one.

That fits several known multi-agent bug classes OpenClaw has had:

wrong/default agentDir handling for new agents
shared agentDir collisions
stale absolute sessionFile paths in multi-agent setups

bootstrapping state never completing for new workspaces

So, to your list:

agent lifecycle registry: unlikely, the agent is discoverable
workspace binding cache: unlikely, CLI reproduction bypasses routing/bindings
agent ID/context initialization: yes, very plausible
gateway routing layer: no, almost certainly not the primary fault

Minimal fix path

First, version matters a lot here. Please paste:

pnpm openclaw --version
pnpm openclaw status --all
pnpm openclaw config get agents

pnpm openclaw config get session.store

What I want to check immediately:

Does each agent have a unique agentDir?
Shared agentDir causes auth/session collisions.
Does status --all show bootstrap pending for the broken agents?
A stuck BOOTSTRAP.md can keep new agents in first-run state.
Are broken agents carrying stale sessionFile pointers?
Especially if you restored config or changed state dirs.
Is the failing agent missing auth-profiles.json in its own agentDir?
Auth is per-agent.

There were real fixes in Jan and Feb 2026 around exactly this area, multi-agent agentDir, auth path, and session path resolution. So if you're on an older build, the root cause is likely a version-specific multi-agent state bug, not your prompt/template.

Docs:

Paste those 4 outputs and I can narrow it to the exact component pretty fast.

mighty mauve Apr 13, 2026, 10:19 PM

OPENCLAW v2026.4.2 — MULTI-AGENT INITIALIZATION BUG

VERSION:

pnpm openclaw --version → 2026.4.2

BEHAVIOR:

• Existing agents (systemarchitect) execute normally
• ANY newly created agent hangs on execution
• Reproducible across names, templates, and cloned agents

CONFIRMED:

• Session state cleared → no change
• /tmp runtime cleared → no change
• Gateway restarted → no change
• Agent cloned from working agent → still hangs
• Minimal prompt ("OK") → still hangs

CRITICAL:

• systemarchitect works immediately
• trading-desk (and any new agent) hangs
• failure occurs ONLY during new agent initialization

QUESTION:

Given v2026.4.2 and known multi-agent fixes earlier in 2026:

Is this a known bug in:

• agentDir resolution
• auth-profiles initialization
• sessionFile path handling
• bootstrap state machine

REQUEST:

Identify exact root cause for this version and provide minimal fix

NOT workaround

OPENCLAW v2026.4.2 — MULTI-AGENT INITIALIZATION BUG

VERSION:

pnpm openclaw --version → 2026.4.2

BEHAVIOR:

• Existing agents (systemarchitect) execute normally
• ANY newly created agent hangs on execution
• Reproducible across names, templates, and cloned agents

CONFIRMED:

CRITICAL:

• systemarchitect works immediately
• trading-desk (and any new agent) hangs
• failure occurs ONLY during new agent initialization

QUESTION:

Given v2026.4.2 and known multi-agent fixes earlier in 2026:

Is this a known bug in:

• agentDir resolution
• auth-profiles initialization
• sessionFile path handling
• bootstrap state machine

REQUEST:

Identify exact root cause for this version and provide minimal fix

NOT workaround

@icy meadow

Hello

Yo @icy meadow

icy meadowBOT Apr 13, 2026, 10:30 PM

Best-supported RCA from the v2026.4.2 repo delta:

This is most likely a fresh-session bootstrap/init bug, not agent registry or gateway routing.

Why that fits your repro:

systemarchitect already has an established session/history path
every new agent hits the first-run initialization path
openclaw agent --agent <new-agent> hanging from CLI too means it is below channel routing
cloning a working workspace does not help if the broken path is session/bootstrap init, not workspace contents

Highest-signal fix lane after `v2026.4.2`

These landed right after the tag and target exactly that path:

94340b9598 fix(agent-init): move session startup context into the runtime (#65055)
7f071a6a8e Agents: persist bootstrap marker after clean sessions_yield
ef98a8dd49 refactor(agents): trim bootstrap marker gate

Relevant files:

src/agents/pi-embedded-runner/run/attempt.ts
src/agents/pi-embedded-runner/run/attempt.context-engine-helpers.ts
src/agents/bootstrap-files.ts

So, from your candidate list

Yes: bootstrap state machine / first-turn session initialization
No, probably not: agent lifecycle registry
No, probably not: gateway routing layer
Unlikely primary RCA: agentDir resolution, auth-profiles initialization, sessionFile path handling

Minimal fix path

Upgrade off 2026.4.2 to a build that includes those commits.

If you are source-patching, the smallest meaningful fix set is:

94340b9598
7f071a6a8e

preferably also ef98a8dd49

That is the fix, not a workaround.

Docs:

If you want, I can also map this to the first release version that contains those commits.

Yes. For v2026.4.2, the matching known failure is the fresh-session bootstrap state machine, not the agent registry, not gateway routing, and not primarily agentDir / auth / sessionFile.

Exact failing component

The bad path is the new-agent first-run init lane in the embedded runner:

src/agents/pi-embedded-runner/run/attempt.ts
src/agents/pi-embedded-runner/run/attempt.context-engine-helpers.ts
src/agents/bootstrap-files.ts

The key bug is that fresh sessions could fail to transition cleanly out of bootstrap/startup context handling, specifically around the bootstrap completion marker path (FULL_BOOTSTRAP_COMPLETED_CUSTOM_TYPE) and first-turn startup context handling.

That is why the split looks like this:

existing agents work: they already have an established session/transcript and bypass the broken first-run path
new agents hang: every new agent enters that broken first-run bootstrap/init path

The fix commits

The repo fixes that line up with your repro are:

94340b9598 fix(agent-init): move session startup context into the runtime (#65055)
7f071a6a8e Agents: persist bootstrap marker after clean sessions_yield
ef98a8dd49 refactor(agents): trim bootstrap marker gate

Minimal fix path

I checked the tags. The first release that contains all three is:

v2026.4.12

So the minimal real fix is:

upgrade from v2026.4.2 to v2026.4.12 or newer

If you are patching from source instead of upgrading a release, cherry-pick:

94340b9598
7f071a6a8e
ef98a8dd49

Short answer to your list

agent lifecycle registry → no
workspace binding cache → no evidence
agent ID/context initialization → yes, in the bootstrap/init path
gateway routing layer → no
agentDir resolution → not the main bug
auth-profiles initialization → not the main bug
sessionFile path handling → not the main bug
bootstrap state machine → yes, this is the one

Docs:

If this fixed it, please right-click the solution message → Apps → ✅ Mark Solution so Answer Overflow can index it.

Yep, the short version:

Root cause: v2026.4.2 has a new-session bootstrap/init bug.
Not: agent registry, workspace binding cache, or gateway routing.
Fix: upgrade to v2026.4.12+.

If you need the exact patch set, it is:

94340b9598
7f071a6a8e
ef98a8dd49

So from your list, bootstrap state machine / agent init is the right bucket.

Hey, if you want, I can also turn that into a one-paragraph final RCA you can paste into an issue or internal notes.

#OPENCLAW AGENT EXECUTION BUG — ROOT CAUSE ANALYSIS

Minimal fix path

Highest-signal fix lane after v2026.4.2

So, from your candidate list

Minimal fix path

Exact failing component

The fix commits

Minimal fix path

Short answer to your list

Highest-signal fix lane after `v2026.4.2`