Running 7-agent pipeline on Win10, OpenClaw 2026.4.2 (npm install), MiniMax M2.7 API key. Orchestrator (main) dispatches workers via sessions_send, each worker has its own Telegram bot bound via bindings.
Problem: nested lane waits are massive with queueAhead=0:
lane=nested waitedMs=130086 queueAhead=0
lane=nested waitedMs=152390 queueAhead=0
lane=session:agent:notebooklm-ingestion-agent:... waitedMs=348938 queueAhead=0
Workers eventually run successfully but agent.wait times out at 45s, so orchestrator gives up and retries → duplicate dispatches. Also seeing periodic Profile minimax:global timed out failovers.
Already done: killed duplicate gateway scheduled task, fixed groupPolicy: allowlist (was silently dropping messages), removed hardcoded session keys from worker MEMORY files. doctor --deep clean, Telegram ok.
Questions: (1) config knob for nested lane concurrency? (2) where to bump 45s agent.wait? (3) should I migrate to sessions_spawn + sessions_yield?
Happy to share sanitized config / more logs.
#Multi-agent dispatch hangs — orchestrator can't reach sub-agents and sub-agent replies never arrive
Yep — this behavior lines up with how lanes/tools are wired today.
Short answers first:
- Nested lane concurrency knob?
No direct config knob right now.
The nested lane is effectively default-lane behavior (concurrency 1 unless explicitly set in code), and config only exposes concurrency for:
  - agents.defaults.maxConcurrent → main lane
  - agents.defaults.subagents.maxConcurrent → subagent lane
  - cron.maxConcurrentRuns → cron lane
- Where to bump the 45s agent.wait?
At the caller, not a global config key.
  - If you call agent.wait directly: raise timeoutMs there.
  - If you dispatch via sessions_send: raise timeoutSeconds on each call (default is 30s in tool code).
- Migrate to sessions_spawn + sessions_yield?
For a 7-worker pipeline: yes, strongly recommended.
sessions_send uses nested lane and can trigger extra A2A follow-up/announce work, which amplifies wait pressure.
sessions_spawn is built for fan-out background work and uses the subagent lane, where concurrency is tunable.
What your logs likely mean:
- queueAhead=0 + huge waitedMs usually means “you were next, but lane slot was occupied” (active task), not a deep queue backlog.
- Retrying after wait timeout creates duplicate dispatches (new idempotency each retry), so workers eventually run and duplicates appear.
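To make that reading of the logs concrete, here is a minimal sketch that parses a lane-wait line and labels it. The function names and the 45000 ms threshold are my own choices for illustration, not anything OpenClaw ships; only the three field names come from the log lines quoted above.

```typescript
// Parsed form of a lane-wait diagnostic line such as:
//   lane=nested waitedMs=130086 queueAhead=0
interface LaneWait {
  lane: string;
  waitedMs: number;
  queueAhead: number;
}

// Extract the three key=value fields; returns null if any is missing.
function parseLaneWait(line: string): LaneWait | null {
  const lane = /lane=(\S+)/.exec(line);
  const waited = /waitedMs=(\d+)/.exec(line);
  const ahead = /queueAhead=(\d+)/.exec(line);
  if (!lane || !waited || !ahead) return null;
  return {
    lane: String(lane[1]),
    waitedMs: Number(waited[1]),
    queueAhead: Number(ahead[1]),
  };
}

type Diagnosis = "ok" | "slot-occupied" | "backlog";

// thresholdMs is a hypothetical cutoff for "this wait is worth flagging".
function diagnose(w: LaneWait, thresholdMs: number = 45000): Diagnosis {
  if (w.waitedMs < thresholdMs) return "ok";
  // Long wait with nothing queued ahead: the lane slot itself was busy.
  return w.queueAhead === 0 ? "slot-occupied" : "backlog";
}
```

Run over the three lines in the original post, every one would come back as "slot-occupied", which matches the "active task holding the slot" reading rather than a deep queue.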
If you must stay on sessions_send for now, quick mitigations:
- Set session.agentToAgent.maxPingPongTurns: 0 (cuts reply-back churn).
- Increase timeoutSeconds on sessions_send calls.
- Treat wait timeout as “unknown/pending”, not immediate failure + redispatch.
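The "unknown/pending, not redispatch" rule can be sketched as a small ledger. Everything here (the class, its method names, the key format) is hypothetical and only illustrates the bookkeeping; it is not an OpenClaw API.

```typescript
type DispatchState = "dispatched" | "pending" | "completed";

// Tracks one entry per logical unit of work so that a wait timeout
// never triggers a second dispatch of the same work.
class DispatchLedger {
  private states: Record<string, DispatchState> = {};

  // Only work that has never been sent may be dispatched.
  shouldDispatch(key: string): boolean {
    return !Object.prototype.hasOwnProperty.call(this.states, key);
  }

  markDispatched(key: string): void {
    this.states[key] = "dispatched";
  }

  // A wait timeout means "outcome unknown": keep the entry, do not resend.
  markWaitTimedOut(key: string): void {
    if (this.states[key] === "dispatched") this.states[key] = "pending";
  }

  markCompleted(key: string): void {
    this.states[key] = "completed";
  }

  stateOf(key: string): DispatchState | undefined {
    return this.states[key];
  }
}
```

The key would be something stable per job and worker (for example a job ID plus the worker's agent ID), so a retry loop that asks shouldDispatch first cannot create the duplicates described above.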
If you switch to spawn pattern:
- Dispatch workers with sessions_spawn (non-blocking).
- Use sessions_yield to wait for completion events (push-based).
- Tune agents.defaults.subagents.maxConcurrent + runTimeoutSeconds.
Docs:
- https://docs.openclaw.ai/tools/subagents
- https://docs.openclaw.ai/concepts/session-tool
- https://docs.openclaw.ai/concepts/queue
- https://docs.openclaw.ai/gateway/configuration-reference
- https://docs.openclaw.ai/concepts/model-failover
If you share sanitized config, include these blocks and I’ll give exact edits:
agents.defaults, agents.defaults.subagents, session.agentToAgent, tools.agentToAgent, bindings.
Also paste exact openclaw --version output (full string), just to confirm patch/build before we tune further.
Going with Path B: I want to migrate to sessions_spawn + sessions_yield. How can I get the blocks to you? Discord doesn't allow posts of more than 2000 chars and I've already whittled them down.
version is OpenClaw 2026.4.2 (d74a122)
Perfect, thanks — 2026.4.2 (d74a122) helps.
Yep, Discord limit is annoying. Easiest options:
- Attach a file (config-snippet.txt / .json5) in the thread (best).
- If no file, send 4 small messages with this exact split:
[1/4]
- agents.defaults
- agents.defaults.subagents
- orchestrator agent entry from agents.list[] (just that one)
[2/4]
- worker agent entries (agents.list[]) for 1–2 workers (pattern is enough)
- each worker’s subagents block (if present)
[3/4]
- tools.allow / tools.deny
- tools.subagents
- session.agentToAgent
[4/4]
- relevant bindings[] entries (orchestrator + one worker example)
- 20–40 lines of queue/failover logs around one timeout
While you gather that, here’s the minimum Path B baseline to move to sessions_spawn + sessions_yield:
- Ensure orchestrator can call: sessions_spawn, sessions_yield, sessions_list, sessions_history, subagents
- Set:
  - agents.defaults.subagents.maxConcurrent (start with 8)
  - agents.defaults.subagents.runTimeoutSeconds (e.g. 900)
  - subagents.allowAgents so orchestrator can spawn your worker agent IDs (default is same-agent only)
- Set session.agentToAgent.maxPingPongTurns: 0 (reduces old sessions_send chatter overhead)
Then orchestration flow should be:
- spawn all workers via sessions_spawn (non-blocking)
- store returned runId / childSessionKey
- call sessions_yield
- consume completion events (no 45s wait-loop retries)
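The bookkeeping half of that flow (store what spawn returned, then mark children off as completion events arrive) might look roughly like this. runId and childSessionKey are the field names mentioned above; the CompletionEvent shape and the ChildRegistry class are assumptions made up for the sketch, and the real completion payload may differ.

```typescript
// Fields named in this thread as returned by sessions_spawn.
interface SpawnResult {
  runId: string;
  childSessionKey: string;
}

// Hypothetical completion-event shape; the real payload may differ.
interface CompletionEvent {
  runId: string;
  ok: boolean;
}

interface Child extends SpawnResult {
  workerId: string;
  status: "running" | "done" | "failed";
}

class ChildRegistry {
  private children: Child[] = [];

  // Call once per successful spawn.
  record(workerId: string, res: SpawnResult): void {
    this.children.push({
      workerId,
      runId: res.runId,
      childSessionKey: res.childSessionKey,
      status: "running",
    });
  }

  // Returns the matching child, or undefined for an unknown runId.
  applyCompletion(ev: CompletionEvent): Child | undefined {
    for (const child of this.children) {
      if (child.runId === ev.runId) {
        child.status = ev.ok ? "done" : "failed";
        return child;
      }
    }
    return undefined;
  }

  // Worker IDs that have been spawned but have not reported back yet.
  outstanding(): string[] {
    return this.children
      .filter((c) => c.status === "running")
      .map((c) => c.workerId);
  }
}
```

With this in place there is no timer anywhere: the orchestrator yields, and outstanding() only shrinks when an event actually arrives.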
Drop the blocks in the split above and I’ll give you an exact config patch.
Can't I just tell my main agent to do this? I'm having trouble dragging and dropping into Discord.
Yep — 100%, you can have your main agent do it.
No file upload needed.
Paste this to your main agent:
Migrate my orchestration from sessions_send to sessions_spawn + sessions_yield.
Constraints:
- First do ANALYZE ONLY (no writes), then show a plan + exact diffs.
- Keep existing bindings unless required.
- Do not use sessions_send for worker dispatch anymore.
Required config updates:
1) session.agentToAgent.maxPingPongTurns = 0
2) agents.defaults.subagents.maxConcurrent = 8
3) agents.defaults.subagents.runTimeoutSeconds = 900
4) Ensure orchestrator agent tool policy allows:
sessions_spawn, sessions_yield, sessions_list, sessions_history, subagents, session_status
5) Ensure orchestrator subagents.allowAgents includes my 7 worker agent IDs (or explicit safe allowlist).
Required orchestration behavior updates:
- Fan out workers with sessions_spawn (runtime=subagent, explicit agentId).
- Collect returned runId + childSessionKey.
- Call sessions_yield after dispatch.
- Do NOT retry-dispatch on wait timeout.
- Use completion events (and sessions_history only as fallback) for final aggregation.
Output format:
A) Findings
B) Proposed config diff
C) Proposed orchestration diff
D) Validation checklist
After I approve, apply changes, restart gateway, and run a smoke test with 2 workers.
If you want, I can also give you a tiny follow-up prompt that forces your agent to print results in 3 Discord-safe chunks ([1/3], [2/3], [3/3]).
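If you would rather split long output yourself than rely on the agent, a chunker along these lines would do it. This is only a sketch: the function name is made up, and the 2000-character limit is the one mentioned earlier in this thread.

```typescript
// Split text into chunks that fit a message-length limit, each prefixed
// with an "[i/n] " marker. Splits on line boundaries where possible.
function chunkForDiscord(text: string, limit: number = 2000): string[] {
  const reserve = 10; // room for the "[i/n] " prefix (assumes fewer than 1000 chunks)
  const budget = limit - reserve;
  const chunks: string[] = [];
  let current = "";
  for (const line of text.split("\n")) {
    let rest = line;
    // Hard-split any single line that is longer than the budget.
    while (rest.length > budget) {
      if (current) {
        chunks.push(current);
        current = "";
      }
      chunks.push(rest.slice(0, budget));
      rest = rest.slice(budget);
    }
    const candidate = current ? current + "\n" + rest : rest;
    if (candidate.length > budget) {
      // Adding this line would overflow: close the chunk and start a new one.
      chunks.push(current);
      current = rest;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks.map((c, i) => `[${i + 1}/${chunks.length}] ${c}`);
}
```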
I'm going to send you the response back in chunks, ok?
first response
A) Findings
• Current orchestration relies on sessions_send for worker dispatch + progress signaling. We’ve seen it be unreliable (timeouts / gateway closed), which is exactly what sessions_spawn + completion events are meant to avoid.
• Your current openclaw.json already has:
• tools.sessions.visibility: "all" and tools.agentToAgent.enabled: true
• Agent registry includes the 7 worker IDs you listed:
queue-agent, claim-worker, sec-filing-agent, notebooklm-ingestion-agent, owner-earnings-extraction-agent, owner-earnings-normalization-agent, intrinsic-value-agent
• Missing / not yet present in config (will need adding):
• session.agentToAgent.maxPingPongTurns
• agents.defaults.subagents.{maxConcurrent, runTimeoutSeconds}
• an orchestrator tool-policy allowlist (if you want it explicit vs tool profile)
• agents.defaults.subagents.allowAgents (or equivalent allowlist) for the 7 worker agent IDs
Perfect — chunk 1 looks solid and aligned with what we expected ✅
Please send B (proposed config diff) next, then C (orchestration diff), then D (validation checklist).
Most important for B:
- exact JSON5 paths/values for
  - session.agentToAgent.maxPingPongTurns
  - agents.defaults.subagents.maxConcurrent
  - agents.defaults.subagents.runTimeoutSeconds
  - agents.defaults.subagents.allowAgents (or per-agent equivalent)
I’ll sanity-check each chunk as you post it.
second response
B) Proposed config diff (exact)
File: C:\Users\Eagle3dStreaming\.openclaw\openclaw.json
@@
 "agents": {
   "defaults": {
+    "subagents": {
+      "maxConcurrent": 8,
+      "runTimeoutSeconds": 900,
+      "allowAgents": [
+        "queue-agent",
+        "claim-worker",
+        "sec-filing-agent",
+        "notebooklm-ingestion-agent",
+        "owner-earnings-extraction-agent",
+        "owner-earnings-normalization-agent",
+        "intrinsic-value-agent"
+      ]
+    },
     "model": {
       "primary": "minimax/MiniMax-M2.7"
     },
@@
 "tools": {
   "profile": "coding",
@@
   "agentToAgent": {
     "enabled": true
+  }
+},
+"session": {
+  "agentToAgent": {
+    "maxPingPongTurns": 0
   }
 },
Tool-policy note (requirement #4): OpenClaw’s tools.profile: "coding" already includes the tools you listed in practice, but if you want an explicit allowlist for orchestrator agent only, we can add an agent-specific tools policy block (I’m not adding it yet because your config doesn’t currently use per-agent tool policies, and I want to “keep existing bindings unless required”). If you confirm the schema you prefer (global vs per-agent), I’ll draft the exact diff.
third response
C) Proposed orchestration diff (exact)
Target: Main orchestrator logic (wherever you currently do dispatch via sessions_send).
Replace patterns like:
• dispatch worker → sessions_send(...)
• wait/poll worker session → sessions_list/sessions_history loops
• retry-dispatch on timeout
With:
- Dispatch via sessions_spawn (runtime=subagent, explicit agentId)
- Store runId + childSessionKey in an in-memory map (or structured state object)
- Immediately sessions_yield() after dispatch fanout
- On completion events, advance the pipeline; use sessions_history(childSessionKey) only if the completion payload is missing key fields
Pseudo-diff (illustrative, since I don’t yet have your orchestrator source file in this workspace):
- await sessions_send({ sessionKey: `agent:${workerId}:telegram:direct:${userId}`, message })
- // poll/wait, retry on timeout, etc.
+ const spawnRes = await sessions_spawn({
+   runtime: "subagent",
+   mode: "run",
+   agentId: workerId,
+   task: message
+ })
+ state.children.push({ workerId, runId: spawnRes.runId, childSessionKey: spawnRes.childSessionKey })
+ await sessions_yield()
Fan-out rule: You asked to “fan out workers” with max concurrency 8. Your pipeline is sequential by design, so the most sensible “fan out” is:
• fan out only where it’s safe (e.g., non-dependent background checks), OR
• keep pipeline sequential but still use sessions_spawn to run each worker as an isolated run (recommended)
If you truly want parallelism inside the pipeline, tell me which steps are allowed to overlap (right now your MEMORY.md says never parallelize).
Nice — B is mostly correct ✅
Two important tweaks before apply:
- Add maxChildrenPerAgent (you’re fanning out 7 workers; default is 5)
agents: {
  defaults: {
    subagents: {
      maxConcurrent: 8,
      maxChildrenPerAgent: 10, // <- important for 7-worker fanout
      runTimeoutSeconds: 900,
      allowAgents: [/* your 7 worker ids */],
    },
  },
}
- tools.profile: "coding" already includes group:sessions, so you should have sessions_spawn + sessions_yield available unless another deny layer exists.
Everything else in your B chunk looks aligned (including session.agentToAgent.maxPingPongTurns: 0).
Send chunk C (orchestration diff) next — that’s the make-or-break part for eliminating duplicate dispatch.
C is very good — this is the right migration shape ✅
A few small tighten-ups:
- Add label per spawn (makes tracing easier in sessions_list / logs).
- Consider cleanup: "delete" to avoid subagent session buildup.
- Pass runTimeoutSeconds in the spawn call if some workers are known long-runners.
- Do not keep any retry-dispatch-on-timeout logic.
Example tightened spawn call:
{
  runtime: "subagent",
  mode: "run",
  agentId: workerId,
  label: `pipe-${jobId}-${workerId}`,
  task: message,
  runTimeoutSeconds: 900,
  cleanup: "delete"
}
And yep: if your pipeline is logically sequential, do sequential spawn + yield per stage (still isolated, still better than sessions_send), not forced parallel fanout.
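For the sequential case, the two pieces you need are "build the spawn arguments" and "pick the next stage". A minimal sketch, assuming the argument names from the spawn call quoted in this thread; buildSpawnArgs and nextStage are hypothetical helper names, and the stage order is whatever your pipeline defines.

```typescript
// Argument names mirror the spawn call quoted in this thread.
interface SpawnArgs {
  runtime: "subagent";
  mode: "run";
  agentId: string;
  label: string;
  task: string;
  runTimeoutSeconds: number;
  cleanup: "delete";
}

// Build one spawn request; the label makes the run easy to find in logs.
function buildSpawnArgs(
  jobId: string,
  workerId: string,
  task: string,
  runTimeoutSeconds: number = 900
): SpawnArgs {
  return {
    runtime: "subagent",
    mode: "run",
    agentId: workerId,
    label: `pipe-${jobId}-${workerId}`,
    task,
    runTimeoutSeconds,
    cleanup: "delete",
  };
}

// Sequential pipeline: the next stage is the first one, in order,
// that has not completed yet. Returns null when every stage is done.
function nextStage(stages: string[], completed: string[]): string | null {
  for (const stage of stages) {
    if (completed.indexOf(stage) === -1) return stage;
  }
  return null;
}
```

The loop is then: nextStage, spawn with buildSpawnArgs, yield, add the worker to completed when its completion event arrives, repeat. Nothing in it depends on a timer, so there is nothing to retry.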
Send D (validation checklist) and we can greenlight final apply.
fourth response
D) Validation checklist
- Config
• [ ] openclaw.json contains:
• [ ] session.agentToAgent.maxPingPongTurns = 0
• [ ] agents.defaults.subagents.maxConcurrent = 8
• [ ] agents.defaults.subagents.runTimeoutSeconds = 900
• [ ] agents.defaults.subagents.allowAgents includes all 7 worker IDs
• [ ] orchestrator tool policy allows: sessions_spawn, sessions_yield, sessions_list, sessions_history, subagents, session_status
- Orchestrator behavior
• [ ] No remaining sessions_send calls for dispatch
• [ ] After spawning worker(s), orchestrator calls sessions_yield (no polling loops)
• [ ] No retry-dispatch on timeouts
• [ ] Completion events drive the next step; sessions_history used only as fallback
- Smoke test (after approval)
• [ ] Restart gateway
• [ ] Run a smoke test dispatching 2 workers (you choose which two) via sessions_spawn
• [ ] Verify completion events arrive and orchestrator advances correctly without sessions_send
If you tell me where your orchestrator code lives (file path / repo), I’ll produce a real, line-accurate “Proposed orchestration diff” instead of pseudocode.
This is a solid plan — I’d greenlight it with one required addition:
- Add agents.defaults.subagents.maxChildrenPerAgent (set to 10 for your 7-worker case).
So your final subagents block should include:
- maxConcurrent: 8
- maxChildrenPerAgent: 10 ✅
- runTimeoutSeconds: 900
- allowAgents: [7 worker ids]
Everything else in your checklist looks right.
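If you want a quick sanity check before restarting, something like the following would catch the two easy mistakes (a worker missing from allowAgents, or maxChildrenPerAgent left below the worker count). The checkSubagents helper is hypothetical and works on a plain object you pass in; it does not read openclaw.json or know the real schema beyond the four keys discussed here. The fallback of 5 is the default quoted above.

```typescript
interface SubagentsConfig {
  maxConcurrent?: number;
  maxChildrenPerAgent?: number;
  runTimeoutSeconds?: number;
  allowAgents?: string[];
}

// Returns human-readable problems; an empty array means the block passes.
function checkSubagents(cfg: SubagentsConfig, workerIds: string[]): string[] {
  const problems: string[] = [];

  // Every worker the orchestrator spawns must be on the allowlist.
  const allowed = cfg.allowAgents ?? [];
  const missing = workerIds.filter((id) => allowed.indexOf(id) === -1);
  if (missing.length > 0) {
    problems.push(`allowAgents is missing: ${missing.join(", ")}`);
  }

  // Fall back to the default of 5 quoted in this thread.
  const maxChildren = cfg.maxChildrenPerAgent ?? 5;
  if (maxChildren < workerIds.length) {
    problems.push(
      `maxChildrenPerAgent (${maxChildren}) is below worker count (${workerIds.length})`
    );
  }

  if (cfg.runTimeoutSeconds === undefined) {
    problems.push("runTimeoutSeconds is not set");
  }
  return problems;
}
```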
If helpful, paste this to your main agent as the apply step:
Apply the approved migration now.
Requirements:
1) Update openclaw.json:
- session.agentToAgent.maxPingPongTurns = 0
- agents.defaults.subagents.maxConcurrent = 8
- agents.defaults.subagents.maxChildrenPerAgent = 10
- agents.defaults.subagents.runTimeoutSeconds = 900
- agents.defaults.subagents.allowAgents = [the 7 worker IDs]
2) Update orchestrator logic:
- Replace sessions_send dispatch with sessions_spawn (runtime=subagent, mode=run, explicit agentId, label, runTimeoutSeconds, cleanup=delete)
- Remove retry-dispatch-on-timeout logic
- Use sessions_yield after spawn dispatch
- Use completion events as primary; sessions_history only fallback
3) Restart gateway.
4) Run smoke test with 2 workers.
5) Return:
- exact diff applied
- smoke test result
- any remaining failures
Send me the smoke-test result and I’ll help tune anything left (especially if MiniMax timeout failovers still appear).
Hey — I’m here.
Quick update on the migration work: I applied the config changes but hit a schema mismatch that broke the gateway restart on the first pass (allowAgents / groupChat weren’t valid where I put them). I ran openclaw doctor --repair --non-interactive, which repaired the config and it now validates cleanly.
What’s still incomplete right now:
• The config does not yet contain agents.defaults.subagents.allowAgents after doctor repair (it removed it), so we still need to put the 7-agent allowlist in the correct schema location.
• I also haven’t found an actual “orchestrator code file” in workspace/ to patch (it looks like orchestration is currently happening via agent instructions + MEMORY.md conventions rather than a JS/PS orchestrator script). So I can’t truthfully claim I’ve replaced dispatch logic in code yet.
If you tell me where the orchestrator logic lives (file path, or if it’s purely “main agent behavior”), I’ll finish:
- put the allowlist in the correct config key (per schema),
- restart gateway successfully,
- run the 2-worker smoke test via sessions_spawn + sessions_yield,
- return exact diffs + results.
Great update — you’re close.
You can avoid the allowAgents schema fight by putting it on the main/orchestrator agent (not only in defaults).
That’s also cleaner security-wise.
Use this shape:
{
  agents: {
    defaults: {
      subagents: {
        maxConcurrent: 8,
        maxChildrenPerAgent: 10,
        runTimeoutSeconds: 900
      }
    },
    list: [
      {
        id: "main",
        subagents: {
          allowAgents: [
            "queue-agent",
            "claim-worker",
            "sec-filing-agent",
            "notebooklm-ingestion-agent",
            "owner-earnings-extraction-agent",
            "owner-earnings-normalization-agent",
            "intrinsic-value-agent"
          ],
          requireAgentId: true
        }
      }
    ]
  },
  session: {
    agentToAgent: { maxPingPongTurns: 0 }
  }
}
If doctor --repair still strips defaults allowAgents, that likely means your 2026.4.2 parser expects the per-agent location more reliably — so keep it under agents.list[].subagents and proceed.
On “no orchestrator code file found”: that’s fine.
If orchestration is behavior/prompt-driven, then the “patch” is your main agent instructions (AGENTS/MEMORY/prompt), not JS/PS code.
Set explicit orchestration rules for main agent:
- Never dispatch workers with sessions_send
- Always dispatch with sessions_spawn (runtime: "subagent", explicit agentId)
- After dispatch, call sessions_yield
- Never retry-dispatch on wait timeout
- Use completion events first; sessions_history only fallback
Then:
- openclaw gateway restart
- run your 2-worker smoke test
If you paste the post-restart smoke result, I’ll do the final sanity pass.
Thanks — this bindings block is useful.
Quick read: bindings look structurally fine, but they’re not the key blocker for your spawn migration.
What still matters for Path B:
- Spawn permissions (separate from bindings)
  main needs subagents.allowAgents listing target worker agent IDs.
  Bindings route inbound chat; they do not grant sessions_spawn cross-agent rights.
- Runtime choice
  - Worker pipeline: use sessions_spawn with runtime: "subagent" (default).
  - ACP bindings only matter if you intentionally spawn ACP sessions (runtime: "acp").
- Ordering sanity (important on older configs)
  Put the topic-specific ACP bindings above broad account/channel bindings, so specific topic routes can’t be shadowed.
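To show why ordering matters, here is a toy router. It assumes first-match-wins evaluation, which is my reading of "shadowed" rather than something confirmed from the OpenClaw source, and the Binding shape is invented for the example (it is not the real bindings schema).

```typescript
// Hypothetical binding shape for illustration; not the real OpenClaw schema.
interface Binding {
  agentId: string;
  channel: string;
  accountId?: string;
  topicId?: string;
}

interface Inbound {
  channel: string;
  accountId: string;
  topicId?: string;
}

// A binding matches when every field it specifies equals the inbound value.
function matches(b: Binding, msg: Inbound): boolean {
  return (
    b.channel === msg.channel &&
    (b.accountId === undefined || b.accountId === msg.accountId) &&
    (b.topicId === undefined || b.topicId === msg.topicId)
  );
}

// First-match-wins routing (assumed), so order decides between overlapping bindings.
function route(bindings: Binding[], msg: Inbound): string | null {
  for (const b of bindings) {
    if (matches(b, msg)) return b.agentId;
  }
  return null;
}

// Number of optional fields set; higher means more specific.
function specificity(b: Binding): number {
  return (b.accountId !== undefined ? 1 : 0) + (b.topicId !== undefined ? 1 : 0);
}

// Reorder so the most specific bindings come first, keeping ties in input order.
function orderBySpecificity(bindings: Binding[]): Binding[] {
  return bindings
    .map((b, i) => ({ b, i }))
    .sort((x, y) => specificity(y.b) - specificity(x.b) || x.i - y.i)
    .map((x) => x.b);
}
```

With a broad account binding listed before a topic binding on the same account, route returns the broad binding's agent for a message in that topic; after orderBySpecificity the topic binding wins.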