Multi agent setup | Friends of the Crustacean 🦞🤝 | Page 1

ebon wind Apr 12, 2026, 9:26 PM

Hi, I’m testing a multi-agent OpenClaw setup and I’m seeing partial infrastructure instability. Short subagent pings work, but longer delegated tasks often fail or return partial results with gateway/tick timeout style errors, even though the gateway eventually comes up and wired agents can answer simple liveness checks. Could you help me figure out why deeper subagent/specialist coordination is unstable and what logs or diagnostics you want me to provide?

formal oar BOT Apr 12, 2026, 9:28 PM

Yes, and tick timeout is a useful clue.

In current OpenClaw, that usually means the client stopped receiving gateway keepalive ticks, so I’d first suspect gateway stall/reload under heavier multi-agent load, not basic reachability. Short pings working but longer delegated runs failing often points to one of these:

Gateway/event-loop stall or restart during child runs
Queue or session-store contention under fan-out
Subagent timeout / announce-back failure
Model/backend instability that only shows up on larger, tool-heavy prompts

Please paste this first:

openclaw --version
openclaw status --all
openclaw gateway status --deep
openclaw health --verbose
openclaw models status

Then reproduce one failing run and grab:

openclaw logs --follow

What I want from that repro:

the exact timestamp of the failure
whether it fails during spawn, child execution, or announce back
your subagent fan-out at the time, for example 1 child vs 4-8 children
the relevant config snippet for:
- agents.defaults.maxConcurrent
- agents.defaults.subagents
- acp.maxConcurrentSessions if this is ACP, not native subagents
- the model/provider used by parent and child

Things I’ll specifically look for in the logs:

tick timeout
gateway restart/reload lines
timeout waiting for session store lock
queue delay / queued for ...ms
subagent timed out
provider 429 / 5xx
local OpenAI-compatible backend errors on larger prompts/tools
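If the log window is long, a small script can do the first pass for you. This is only a rough sketch: it assumes you saved the log output to a text file and that the phrases listed above appear verbatim in the lines. `PHRASES` and `classify` are names I made up for illustration, and the sample lines are placeholders, not real OpenClaw output.

```python
# Phrases quoted in this thread; their exact wording in real logs is an assumption.
PHRASES = [
    "tick timeout",
    "timeout waiting for session store lock",
    "queued for",
    "subagent timed out",
]


def classify(lines):
    """Return {phrase: [matching lines]} for every phrase that occurs at least once."""
    hits = {}
    for line in lines:
        lowered = line.lower()
        for phrase in PHRASES:
            if phrase in lowered:
                hits.setdefault(phrase, []).append(line.rstrip("\n"))
    return hits


# Placeholder lines, only to show the shape of the result.
sample = [
    "placeholder line that mentions a tick timeout",
    "placeholder line with nothing of interest",
]
for phrase, matched in classify(sample).items():
    print(f"{phrase}: {len(matched)} line(s)")
```

To run it on a saved log, pass an open file instead of `sample`, for example `classify(open("gateway.log", encoding="utf-8"))`, where `gateway.log` is a hypothetical file name. Provider 429 / 5xx lines are left out on purpose, because matching bare numbers without knowing the log format produces too many false hits.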

Quick isolation test, if you want: run the same deeper task with just 1 active child. If that becomes stable, it strongly suggests contention/resource pressure rather than routing.

Docs:

FAQ: https://docs.openclaw.ai/help/faq
Troubleshooting: https://docs.openclaw.ai/gateway/troubleshooting
Subagents: https://docs.openclaw.ai/tools/subagents

If you paste those outputs plus one failing log window, I can help narrow it down pretty quickly.

ebon wind Apr 12, 2026, 9:50 PM

21:47:49+00:00 error Subagent completion direct announce failed for run daf1c3c1-fe4a-43f7-928e-0274f118002e: gateway closed (4000): tick timeout
Gateway target: ws://127.0.0.1:18789
Source: local loopback
Config: C:\Users\Openclaw.openclaw\openclaw.json
Bind: lan

formal oar BOT Apr 12, 2026, 9:51 PM

That log is helpful.

A few important clues:

gateway closed (4000): tick timeout means the gateway client watchdog stopped receiving keepalive ticks.
Because the target is ws://127.0.0.1:18789 / local loopback, this looks more like the gateway becoming unresponsive or restarting locally than a Discord/network issue.
Subagent completion direct announce failed is specifically the direct completion handoff path. It does not necessarily mean the child run itself failed. OpenClaw should try a queue fallback after this, so the next log lines matter a lot.
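To make the "watchdog stopped receiving keepalive ticks" point concrete, here is a generic sketch of how a tick watchdog tends to work. This is not OpenClaw's implementation: the class name, the 30 second interval, and the tolerance factor are all assumptions chosen for illustration.

```python
class TickWatchdog:
    """Generic keepalive watchdog: reports a timeout once ticks stop arriving."""

    def __init__(self, tick_interval_s, tolerance=2.0):
        # Hypothetical policy: wait `tolerance` tick intervals before giving up.
        self.deadline_s = tick_interval_s * tolerance
        self.last_tick_s = None

    def on_tick(self, now_s):
        """Record the arrival time of a keepalive tick."""
        self.last_tick_s = now_s

    def timed_out(self, now_s):
        """True when the last tick is older than the allowed deadline."""
        if self.last_tick_s is None:
            return False  # no tick seen yet, nothing to compare against
        return now_s - self.last_tick_s > self.deadline_s


watchdog = TickWatchdog(tick_interval_s=30)
watchdog.on_tick(0)
print(watchdog.timed_out(45))  # False: still inside the 60 s deadline
print(watchdog.timed_out(61))  # True: no tick for more than 60 s
```

The point of the sketch: the socket can stay reachable the whole time. If the sending side is stalled and never emits a tick, `on_tick` is never called and the client closes the connection on its own.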

What I’d like next:

openclaw --version
openclaw status --all
openclaw gateway status --deep
openclaw doctor

And please paste 60 to 120 seconds of logs before and after 21:47:49+00:00, not just the one line.

What I’m looking for in that window:

gateway restart / reload lines
drain timeout reached; proceeding with restart
timeout waiting for session store lock
long queued for ...ms delays
provider/model 429 / 5xx
whether the queue fallback announce succeeded after the direct path failed
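If cutting that window out by hand is tedious, something like this can do it. It is a sketch under assumptions: it expects each log line to start with a time stamp shaped like `21:47:49+00:00` (as in your paste), treats lines without a stamp as continuations of the previous line, and does not handle a window that crosses midnight or mixes UTC offsets across a day boundary. `parse_time` and `window` are hypothetical helper names, and the sample lines are placeholders.

```python
from datetime import datetime, timedelta

STAMP_FORMAT = "%H:%M:%S%z"  # e.g. 21:47:49+00:00 (colon in the offset needs Python 3.7+)


def parse_time(line):
    """Parse a leading time stamp; return None when the line does not start with one."""
    token = line.split(" ", 1)[0]
    try:
        return datetime.strptime(token, STAMP_FORMAT)
    except ValueError:
        return None


def window(lines, center, seconds=120):
    """Keep lines within +/- `seconds` of `center`; unstamped lines follow the line above."""
    mid = datetime.strptime(center, STAMP_FORMAT)
    span = timedelta(seconds=seconds)
    kept, inside = [], False
    for line in lines:
        stamp = parse_time(line)
        if stamp is not None:
            inside = abs(stamp - mid) <= span
        if inside:
            kept.append(line)
    return kept


# Placeholder lines, only to exercise the filter.
sample = [
    "21:45:00+00:00 info placeholder A",
    "21:46:30+00:00 info placeholder B",
    "21:47:49+00:00 error placeholder failure line",
    "  placeholder continuation line",
    "21:49:40+00:00 info placeholder C",
    "21:50:10+00:00 info placeholder D",
]
for line in window(sample, "21:47:49+00:00"):
    print(line)
```

With the default of 120 seconds this keeps placeholder B, the failure line with its continuation, and placeholder C.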

Also please include your relevant config values for:

agents.defaults.maxConcurrent
agents.defaults.subagents.maxConcurrent
agents.defaults.subagents.maxChildrenPerAgent
agents.defaults.subagents.runTimeoutSeconds
agents.defaults.subagents.announceTimeoutMs
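If it is easier than copying the values by hand, a short script can pull just those keys. This sketch assumes the config parses as plain JSON and that the dotted names above map one-to-one onto nested objects; if your file contains comments or other non-JSON syntax, `json` will reject it. `lookup` is a hypothetical helper and the sample values are invented.

```python
import json

KEYS = [
    "agents.defaults.maxConcurrent",
    "agents.defaults.subagents.maxConcurrent",
    "agents.defaults.subagents.maxChildrenPerAgent",
    "agents.defaults.subagents.runTimeoutSeconds",
    "agents.defaults.subagents.announceTimeoutMs",
]


def lookup(config, dotted):
    """Walk a nested dict along a dotted path; return None if any segment is missing."""
    node = config
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


# Invented values, only to exercise the lookup; they are not recommendations.
sample = json.loads(
    '{"agents": {"defaults": {"maxConcurrent": 4, "subagents": {"maxConcurrent": 2}}}}'
)
for key in KEYS:
    print(key, "=", lookup(sample, key))
```

Keys that print `None` are simply not set in the file, which is also worth reporting.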

My current guess is: the gateway is stalling under heavier subagent load, and the first visible symptom is the completion handoff losing ticks.

Quick isolation test, if you want:

run the same deeper task with 1 child only
or temporarily set subagent concurrency very low

If that stabilizes it, we’re probably dealing with resource / queue / lock contention, not broken routing.

If you paste the surrounding log window, I think we can get much more concrete.

ebon wind Apr 12, 2026, 10:04 PM

00:03:52+02:00 [diagnostic] lane wait exceeded: lane=session:agent:main:telegram:direct:846947265 waitedMs=4160 queueAhead=0

formal oar BOT Apr 12, 2026, 10:04 PM

That one is interesting, but it’s more backpressure evidence than a root cause.

What it means:

lane=session:... = OpenClaw is serializing work for that specific session
waitedMs=4160 = this task sat ~4.2s before it could start
queueAhead=0 does not mean the lane was idle; it only means no other queued task was ahead of this one, so the wait came from the task that was already running

So this usually means:

one task was already active for that session
the next task had to wait for it to finish or hit a boundary
this is session-level contention, not necessarily a global crash

By itself, 4 seconds isn’t terrible. The question is whether you’re seeing lots of these, or much bigger waits, right before the tick timeout.
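To answer the "lots of these, or much bigger waits" question quickly, you could tally them per lane. A minimal sketch, assuming every such line has the same `lane=... waitedMs=... queueAhead=...` layout as the one you pasted; `lane_waits` and `summarize` are hypothetical helper names.

```python
import re

# Field layout copied from the pasted line; other layouts will simply not match.
LANE_WAIT = re.compile(
    r"lane wait exceeded: lane=(?P<lane>\S+) waitedMs=(?P<waited>\d+) queueAhead=(?P<ahead>\d+)"
)


def lane_waits(lines):
    """Extract (lane, waited_ms, queue_ahead) from every matching log line."""
    found = []
    for line in lines:
        match = LANE_WAIT.search(line)
        if match:
            found.append((match["lane"], int(match["waited"]), int(match["ahead"])))
    return found


def summarize(waits):
    """Per lane: (number of lane-wait lines, longest wait in ms)."""
    summary = {}
    for lane, waited_ms, _ in waits:
        count, longest = summary.get(lane, (0, 0))
        summary[lane] = (count + 1, max(longest, waited_ms))
    return summary


sample = [
    "00:03:52+02:00 [diagnostic] lane wait exceeded: "
    "lane=session:agent:main:telegram:direct:846947265 waitedMs=4160 queueAhead=0",
]
print(summarize(lane_waits(sample)))
```

A single lane with a count that keeps climbing, or a longest wait in the tens of seconds just before the tick timeout, would be much stronger evidence than one 4 second wait.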

What I’d want now is the surrounding log window showing:

a few lines before and after this lane wait exceeded
any lane task done
any lane task error
any gateway restart / drain / reload lines
any timeout waiting for session store lock

So yeah, this supports the theory that heavier work is clogging the queue/session lane, but it doesn’t yet prove why the gateway later misses ticks.

Two concrete things that would help a lot:

still paste:

openclaw --version
openclaw status --all
openclaw gateway status --deep
