#Multi-Agent Migration Plan


indigo hinge
#

Is this a solid plan? Currently the "captain" creates the sub-workers.

Goal
Promote high-value ephemeral sub-agents to permanent agentIds so they retain knowledge between runs. Keep spawn pattern for one-shot tasks.
Current Problem
All sub-agents are ephemeral: no persistent memory, no agent-to-agent messaging, and everything funnels through Captain (Opus 4.6).
Proposed Changes
Make Permanent (agentId): Scout (research, Sonnet 4.6), Bolt (code, Opus 4.6), Lens (analytics, Opus 4.6). Each gets own AGENTS.md, MEMORY.md, and memory dir under shared workspace.
Stay Ephemeral: Echo, Pulse, Shade, Interns — standalone/event-driven tasks.
Comms: Discord inbound → Captain only. Captain delegates via sessions_send. Workers post to their output channels. Captain synthesizes.
Migration (sequential, verified per step)

1. Verify sub-agent tool access ← DONE
2. Enable agent-to-agent messaging
3. Add Scout as first permanent agent, test end-to-end
4. Add Bolt and Lens same pattern
5. Update AGENTS.md — replace spawn templates for permanent agents

Phase 1 Result:
memory_search: ❌ unavailable to sub-agents
read, message, web_search, exec, browser: ✅
Model aliases: ❌ don't resolve — use full path anthropic/claude-sonnet-4-6

Key Rules

Don't separate workspaces. Don't add all agents at once. Don't remove sessions_spawn — it's the fallback. Don't let workers manage cron or respond to Discord directly.

Success = Scout/Bolt/Lens persist knowledge, Captain delegates via sessions_send, ephemeral spawn still works, no gateway errors.
Comms Flow
Blacklist → #orders → Captain (main)
                         ↓ sessions_send
          ┌──────────────┬──────────────┐
          ↓              ↓              ↓
        Scout          Bolt           Lens
          ↓              ↓              ↓
    #find-output   #build-output  #track-output
          ↓              ↓              ↓
          └──────────────┴──────────────┘
                         ↓
                Captain synthesizes
                         ↓
                  #captain-output

weak ridgeBOT
#

Yep — the shape of it is solid (Captain as the single “inbound brain”, plus specialized workers, plus keeping sessions_spawn for one-shots). The main things I’d tighten are a few OpenClaw-specific mechanics that can surprise you:

1) “Permanent agentId” implies separate workspace + auth + sessions

In OpenClaw, an agentId isn’t just a label; it’s a fully scoped brain (workspace, per-agent auth, per-agent session store). So “Scout/Bolt/Lens each get their own AGENTS.md, MEMORY.md, memory/” basically means they need distinct workspace roots (that’s the normal/intended model).
If you want shared code/assets, prefer a shared folder (or symlinks / bind mounts) rather than truly sharing the same workspace root, otherwise you lose most of the benefits and invite file/memory cross-contamination.

Docs: https://docs.openclaw.ai/concepts/multi-agent

2) Cross-agent delegation needs agent-to-agent messaging enabled

Your “Enable agent-to-agent messaging” step is the right next step: it’s off by default and must be explicitly allowlisted (so Captain can sessions_send into scout/bolt/lens).

Docs: https://docs.openclaw.ai/concepts/multi-agent (see tools.agentToAgent)

3) sessions_send has a built-in ping‑pong + announce behavior

This is the biggest gotcha with your “workers post to their output channels; Captain synthesizes” flow:

  • sessions_send can trigger a reply-back “ping-pong” loop (requester ↔ target).
  • Then it runs an announce step that can post back to the calling channel unless the target replies ANNOUNCE_SKIP.

So: if you don’t want Scout/Bolt/Lens talking in #orders, bake into each worker’s AGENTS.md something like:

  • always REPLY_SKIP when appropriate to stop ping-pong
  • always ANNOUNCE_SKIP in the announce step
  • use message tool to post only in their designated output channels

Also consider setting session.agentToAgent.maxPingPongTurns = 0 if you want strictly one-turn delegation.

Docs: https://docs.openclaw.ai/concepts/session-tool

4) Memory: memory_search being unavailable is usually “not enabled / not allowed”

memory_search / memory_get only exist when memory search is enabled for that agent, and tools for sub-agents can also be filtered. If you expect Scout/Bolt/Lens to retain knowledge via semantic recall, make sure memory search is enabled for those agents and that the tool set they run with includes it.

Docs: https://docs.openclaw.ai/concepts/memory

5) Discord safety: if workers can send messages, enforce it with Discord perms

If Scout/Bolt/Lens have message, they can technically send anywhere the bot has access. If your rule is “workers only post to #find-output/#build-output/#track-output”, the most robust enforcement is Discord channel permissions (bot role can only write in those output channels). That keeps accidents from becoming incidents.


If you align those 5 points, your phased rollout plan (Scout first, then Bolt/Lens, keep spawn as fallback) is exactly the way I’d do it.

indigo hinge
#
  1. Auth sharing — We use Claude Max (anthropic:manual token). Docs say auth profiles are per-agent and "never reuse agentDir." Can we symlink auth-profiles.json into each agent's agentDir, or should we copy it? What's the intended pattern for sharing one flat-rate token across multiple agents?
  2. agents.defaults scope — Does agents.defaults.memorySearch config apply to ALL agents automatically, or does each agent in agents.list[] need its own memorySearch config? Same question for compaction, model, etc. Trying to avoid duplicating config 6 times.
  3. Tool allow/deny approach — When specifying tools per agent in agents.list[], is it deny-list or allow-list? i.e., if I set tools.allow: ["message", "memory_search"], does that mean ONLY those tools are available? Or should I use tools.deny to block specific tools and keep everything else open? Workers need most tools (exec, read, write, message, memory_search, web_search, web_fetch, browser).
  4. Shared domain knowledge — Our workers write to shared topic files (memory/topics/). With separate workspaces, what's the recommended pattern? Symlink a shared memory/topics/ directory into each workspace? Or is there a better approach that doesn't risk cross-contamination?
  5. maxPingPongTurns = 0 + announce behavior — If we set maxPingPongTurns = 0, does the announce step still fire? We want: Captain sends task → worker does work → worker posts to Discord channels directly → no announce back to Captain's channel. Is maxPingPongTurns = 0 + ANNOUNCE_SKIP in worker AGENTS.md sufficient, or is there a config-level way to disable announce entirely?
  6. openclaw agents add wizard — Does it handle workspace creation + agentDir + session store setup automatically? Or do we need to manually create the directory structure?
weak ridgeBOT
#
  1. Auth sharing (Claude setup-token / “manual”)
  • Intended pattern is per-agent auth-profiles.json. Don’t reuse agentDir across agents, and don’t try to make multiple agents “share” one agentDir.
  • For sharing one Claude subscription token across multiple agents: copy the same credentials into each agent’s
    ~/.openclaw/agents/<agentId>/agent/auth-profiles.json (or just run the wizard / paste-token flow per agent).
  • A symlink just for auth-profiles.json will often “work”, but it’s not the intended pattern and is brittle (future refresh/write behavior, file locking expectations, wizard/doctor rewriting, accidental cross-agent coupling). Copying is the boring/reliable choice.

Docs: https://docs.openclaw.ai/concepts/multi-agent + https://docs.openclaw.ai/concepts/oauth

  2. agents.defaults scope (memorySearch / compaction / model / etc.)
  • Yes: agents.defaults.* is the baseline for all agents automatically.
  • You only set per-agent overrides in agents.list[] when you want something different for that agent.
  • For memorySearch: config can live in agents.defaults.memorySearch, but each agent still gets its own index/store (it’s per-agent on disk), so you avoid config duplication but not per-agent indexing work.

Docs (precedence): https://docs.openclaw.ai/tools/multi-agent-sandbox-tools
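To make the "defaults as baseline, per-agent overrides on top" idea concrete, here is a minimal sketch of how an effective per-agent config could be computed. It assumes a recursive (deep) merge, which the thread does not confirm; the `enabled` key and the exact shape of the `model` field are hypothetical placeholders, not OpenClaw's real schema.

```python
from copy import deepcopy
from typing import Any


def merge_config(defaults: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return defaults with override applied on top (deep merge is an assumption)."""
    merged = deepcopy(defaults)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested sections are merged key by key instead of being replaced.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical config shape, for illustration only.
agents_defaults = {
    "model": "anthropic/claude-opus-4-6",
    "memorySearch": {"enabled": True},
}
agents_list = [
    {"id": "scout", "model": "anthropic/claude-sonnet-4-6"},  # overrides the model only
    {"id": "bolt"},  # inherits everything
    {"id": "lens"},  # inherits everything
]

effective = {
    agent["id"]: merge_config(
        agents_defaults, {k: v for k, v in agent.items() if k != "id"}
    )
    for agent in agents_list
}

for agent_id, config in effective.items():
    print(agent_id, config)
```

Only Scout carries an override here, so the memorySearch block is written once and still applies to all three agents.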

  3. Tools allow/deny semantics (per agent)
  • It’s allowlist + denylist.
  • Rule of thumb (from docs): if allow is non-empty, everything else is blocked; deny always wins.
  • So if workers “need most tools”, you usually either:
    • leave allow empty and just set a small deny (e.g. block cron, gateway, maybe session tools), or
    • set allow using groups (group:fs, group:runtime, group:web, etc.) so you’re not listing 10 tools by hand.

Docs: https://docs.openclaw.ai/gateway/sandbox-vs-tool-policy-vs-elevated
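The rule of thumb above (a non-empty allow blocks everything else; deny always wins) can be expressed as a small predicate. This is a toy model of the stated rule only: `is_tool_allowed` is a hypothetical helper, and it does not expand `group:*` entries or reproduce any other behavior of the real policy engine.

```python
from typing import Iterable


def is_tool_allowed(
    tool: str, allow: Iterable[str] = (), deny: Iterable[str] = ()
) -> bool:
    """Toy model: deny always wins; a non-empty allow list blocks everything else."""
    allow_set, deny_set = set(allow), set(deny)
    if tool in deny_set:
        return False
    if allow_set:
        return tool in allow_set
    return True  # empty allow list: everything not denied stays available


# Worker that needs most tools: leave allow empty and deny a few.
worker_deny = ["cron", "gateway"]
print(is_tool_allowed("exec", deny=worker_deny))
print(is_tool_allowed("cron", deny=worker_deny))

# A narrow allow list locks out everything that is not named in it.
narrow_allow = ["message", "memory_search"]
print(is_tool_allowed("exec", allow=narrow_allow))
print(is_tool_allowed("message", allow=narrow_allow, deny=["message"]))
```

Under this model, `tools.allow: ["message", "memory_search"]` would leave workers without exec, read, or web_search, which is why the small-deny approach fits agents that need most tools.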

  4. Shared domain knowledge with separate workspaces
  • There’s no first-class “shared memory DB” across agentIds today; each agent’s memory_search indexes its own MEMORY.md + memory/**/*.md.
  • Recommended pattern: keep a separate shared KB folder/repo and then:
    • mount/symlink it read-only into each agent workspace (single-writer policy), or
    • copy it into each workspace if you want divergence.
  • If you specifically want memory_search to cover shared topics, the shared files must appear under each agent’s memory/ (so symlink/bind-mount memory/topics/ is the practical way). To avoid “cross-contamination”, enforce single-writer (e.g., only Captain updates the shared topics; workers write proposals to their own workspace and Captain merges).
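As a rough illustration of the symlink pattern, the sketch below links one shared topics folder into `memory/topics` of each agent workspace. The directory names and the `link_shared_topics` helper are hypothetical, and the demo runs in a temporary directory rather than a real OpenClaw install. It does not enforce read-only access: the single-writer policy still has to be a convention or a filesystem permission.

```python
import os
import tempfile
from pathlib import Path


def link_shared_topics(shared_topics: Path, workspaces: list[Path]) -> list[Path]:
    """Symlink shared_topics into <workspace>/memory/topics for each workspace."""
    links = []
    for workspace in workspaces:
        memory_dir = workspace / "memory"
        memory_dir.mkdir(parents=True, exist_ok=True)
        link = memory_dir / "topics"
        if link.is_symlink():
            link.unlink()  # refresh a stale link
        elif link.exists():
            # Never clobber a real directory of notes.
            raise FileExistsError(f"{link} exists and is not a symlink; move it first")
        os.symlink(shared_topics, link, target_is_directory=True)
        links.append(link)
    return links


# Demo in a throwaway directory (paths are placeholders).
root = Path(tempfile.mkdtemp())
shared = root / "shared-kb" / "topics"
shared.mkdir(parents=True)
(shared / "pricing.md").write_text("# Pricing notes\n")

workspaces = [root / f"workspace-{name}" for name in ("scout", "bolt", "lens")]
links = link_shared_topics(shared, workspaces)
for link in links:
    print(link, "->", os.readlink(link))
```

Agent-specific notes (for example memory/agents/researcher.md) stay as ordinary files in each workspace, so only the topics folder is shared.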
  5. maxPingPongTurns = 0 and announce
  • maxPingPongTurns = 0 stops the reply-back ping-pong, but the announce step still exists after the primary run.
  • The reliable way to make workers silent (other than their explicit message sends) is: worker replies ANNOUNCE_SKIP in the announce step (and optionally REPLY_SKIP defensively).
  • I don’t see a config-level “disable announce entirely” knob in the current docs; the supported control is ANNOUNCE_SKIP.

Docs: https://docs.openclaw.ai/concepts/session-tool
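The interaction between maxPingPongTurns and the announce step can be pictured with a toy model. This is not OpenClaw code: the `Worker` class and `simulate_send` are invented for illustration, and the model only encodes what is stated above, namely that the turn limit bounds the reply-back loop while the announce step is suppressed only by ANNOUNCE_SKIP.

```python
from dataclasses import dataclass

REPLY_SKIP = "REPLY_SKIP"
ANNOUNCE_SKIP = "ANNOUNCE_SKIP"


@dataclass
class Worker:
    reply: str     # what the worker answers in a reply-back turn
    announce: str  # what the worker answers in the announce step


def simulate_send(worker: Worker, max_ping_pong_turns: int) -> dict:
    """Toy model: count reply-back turns, then decide whether an announce is posted."""
    turns = 0
    while turns < max_ping_pong_turns and worker.reply != REPLY_SKIP:
        turns += 1
    # The announce step runs regardless of the turn limit.
    announced = worker.announce != ANNOUNCE_SKIP
    return {"ping_pong_turns": turns, "announced_to_caller_channel": announced}


chatty = Worker(reply="anything else?", announce="Task finished")
silent = Worker(reply=REPLY_SKIP, announce=ANNOUNCE_SKIP)

print(simulate_send(chatty, max_ping_pong_turns=0))
print(simulate_send(silent, max_ping_pong_turns=0))
```

In this model the chatty worker still announces with a turn limit of 0, which matches the point that the limit alone is not enough to keep workers out of the calling channel.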

  6. openclaw agents add wizard behavior
  • Yes: it’s designed to set up the agent entry and paths for you:
    • writes agents.list[].workspace and agents.list[].agentDir (and optionally bindings)
    • creates/seeds the workspace bootstrap files
  • The sessions store directory under ~/.openclaw/agents/<agentId>/sessions/ is created automatically on first use (you shouldn’t need to precreate it manually).

Docs: https://docs.openclaw.ai/start/wizard + https://docs.openclaw.ai/concepts/multi-agent

indigo hinge
#

Issue 1: Model error
openclaw models list shows anthropic/claude-sonnet-4-6 as "missing" (no auth resolved). Opus-4-6 works fine through the same anthropic:manual token. Scout was set to sonnet-4-6 as primary, which fails. Even as a fallback for main, sonnet-4-6 would fail if ever needed.
Issue 2: Discord
Re-reading the multi-agent docs, they explicitly show:
"Discord: one bot per agent, enable Message Content Intent, copy each token."
And the config example has separate accounts.default and accounts.coding with different bot tokens. So yes — the docs suggest separate Discord bot accounts per agent when running multi-agent.
The question is whether Scout can use the message tool through the global Discord bot without its own account/binding, or if it needs a dedicated Discord bot app + account entry.
What to ask Krill:
openclaw models list shows anthropic/claude-sonnet-4-6 as missing (no auth). We're on Claude Max via anthropic:manual token (sk-ant-oat01-...). claude-opus-4-6 resolves fine through the same token. Is sonnet-4-6 not available through the setup token, or do we need to refresh?
For multi-agent Discord: Scout (a non-default agent) needs to post to Discord channels via the message tool. Does it need its own Discord bot app + account entry in channels.discord.accounts, or can it send through the existing bot s

weak ridgeBOT
#

Issue 1 (Sonnet shows missing / no auth resolved)

In multi-agent setups, the most common cause is: Scout doesn’t have an Anthropic auth profile resolved, even if main does.

A good way to pinpoint it:

openclaw models status --agent scout --probe
openclaw models status --agent main  --probe

If Scout shows “Missing auth” (or no usable anthropic:* profile), fix is either:

A) Copy auth store (recommended + boring + reliable)

  • Copy ~/.openclaw/agents/main/agent/auth-profiles.json
    → ~/.openclaw/agents/scout/agent/auth-profiles.json
  • Restart gateway
  • Re-run openclaw models status --agent scout --probe
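Option A's copy step can be scripted if you add several agents. The sketch below is a minimal version that assumes the `~/.openclaw/agents/<agentId>/agent/auth-profiles.json` layout quoted above; `copy_auth_profiles` is a hypothetical helper, and the demo uses a temporary directory with placeholder file contents, not a real token. Tightening the file mode to 0600 is a design choice of this sketch, not something the docs are known to require.

```python
import shutil
import tempfile
from pathlib import Path

DEFAULT_ROOT = Path.home() / ".openclaw" / "agents"


def copy_auth_profiles(
    source_agent: str, target_agent: str, root: Path = DEFAULT_ROOT
) -> Path:
    """Copy auth-profiles.json from one agent's agentDir to another's."""
    src = root / source_agent / "agent" / "auth-profiles.json"
    dst = root / target_agent / "agent" / "auth-profiles.json"
    if not src.is_file():
        raise FileNotFoundError(src)
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)
    dst.chmod(0o600)  # the file holds credentials, so keep it owner-only
    return dst


# Demo against a throwaway root with placeholder contents.
demo_root = Path(tempfile.mkdtemp())
main_auth = demo_root / "main" / "agent" / "auth-profiles.json"
main_auth.parent.mkdir(parents=True)
main_auth.write_text('{"placeholder": true}\n')

copied = [copy_auth_profiles("main", name, root=demo_root) for name in ("scout", "bolt", "lens")]
for path in copied:
    print(path)
```

After copying, the remaining steps are unchanged: restart the gateway and re-run the status probe for each agent.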

B) Auth profile order mismatch (very plausible with anthropic:manual)
If your token lives under a non-default profile id (e.g. anthropic:manual), check what Scout is trying to use:

openclaw models auth order get --provider anthropic --agent scout --json

If Scout’s order doesn’t include the profile that actually exists in its auth-profiles.json, it’ll look exactly like “missing/no auth resolved”.

Finally, if Scout does have working Anthropic auth but only that one model id fails, then it’s likely either:

  • the model name isn’t in your current catalog, or
  • it’s not accessible for that auth method/profile.

Verify what your install thinks exists:

openclaw models list --all --provider anthropic

(If it’s not listed there, it won’t resolve cleanly until the catalog/config supports it.)

Docs: https://docs.openclaw.ai/cli/models


Issue 2 (Discord multi-agent: does Scout need its own bot?)

Docs recommend one Discord bot per agent primarily for inbound routing (bindings by accountId) and for “separate persona per bot”.

In your architecture (Captain is the only inbound agent; workers just post to output channels), Scout does not need its own Discord bot/app just to use message.

What’s true:

  • Channel accounts/tokens are global gateway config (channels.discord...), not per-agent.
  • A non-default agent can still call the message tool and send via the existing Discord bot token.

Two practical notes:

  1. If Scout is not bound to Discord, its session context may be internal, so make sure your worker prompts/tool usage explicitly sets the Discord channel/target when sending (don’t rely on “current channel” inference).

  2. If you want hard safety boundaries (“Scout can only post in #find-output”), the most robust approach is still Discord permissions (or a dedicated low-permission bot for workers). Tool policy alone won’t prevent “oops wrong channel id” if the bot can write everywhere.

Docs: https://docs.openclaw.ai/concepts/multi-agent#discord-bots-per-agent

indigo hinge
#

Questions for Krill re: multi-agent memory persistence:

Session retention for non-main agents — do agent sessions (scout/bolt/lens) get swept by the same archive cron as main? Is there a per-agent session retention config, or does the global setting apply? We want their sessions to persist so agent:scout:main keeps its conversation context across tasks.
MEMORY.md injection for agents — when sessions_send triggers an agent, does the agent's MEMORY.md from its workspace get injected into the system prompt automatically (like it does for main)? Or do agents only get AGENTS.md? We need to confirm agents actually read their own MEMORY.md.
Memory search scope for agents — when Scout runs memory_search, does it search Scout's workspace (~/.openclaw/workspace-scout/MEMORY.md + memory/*.md) or main's workspace? The symlinked memory/topics/ means topics are shared, but agent-specific files like memory/agents/researcher.md need to be in Scout's own workspace.
Session continuity — when I send a second task to agent:scout:main, does it continue the same session (with prior context) or start fresh? The main in the session key suggests it should be the persistent session, but I want to confirm.
Gateway restart impact — do agent sessions survive gateway restarts? Bolt and Lens both timed out on first health check (needed 180s), which suggests cold start. Is that just model loading time, or did their sessions reset?

weak ridgeBOT
#

Session retention / “archive sweep”

  • Regular agent sessions like agent:scout:main live on disk under ~/.openclaw/agents/<agentId>/sessions/ and are governed by the global session.* config (reset + maintenance). They’re not “ephemeral” just because the agent isn’t main. Docs: https://docs.openclaw.ai/concepts/session
  • The thing that does auto-archive by default is sessions_spawn sub-agents (agents.defaults.subagents.archiveAfterMinutes). That does not apply to permanent agents (Scout/Bolt/Lens). Docs: https://docs.openclaw.ai/concepts/session-tool
  • There isn’t a documented agents.list[].session.* per-agent retention override; instead, each agent has its own session store file, but the same session.reset / session.maintenance rules apply across agents.

Does Scout’s MEMORY.md get injected when triggered via sessions_send?

  • Yes for a normal agent session: OpenClaw injects bootstrap files (incl MEMORY.md / memory.md when present) into the system prompt on every turn for that agent/workspace. Docs: https://docs.openclaw.ai/concepts/system-prompt
  • Caveat: MEMORY.md is intended for the agent’s “main/private” context (not shared/group contexts). So: sending work to agent:scout:main is the right pattern if you want Scout’s MEMORY.md injected. Docs: https://docs.openclaw.ai/concepts/agent-workspace
  • You can verify exactly what’s being injected by running /context list inside the Scout session.

memory_search scope (which workspace gets searched?)

  • memory_search is per-agent: it indexes that agent’s own MEMORY.md + memory/**/*.md in that agent’s workspace, and stores the index per-agent (e.g. ~/.openclaw/memory/<agentId>.sqlite). Docs: https://docs.openclaw.ai/concepts/memory
  • So Scout searches Scout’s workspace. If you symlink memory/topics/ into each workspace, each agent will index the shared topics (still per-agent indices), while Scout-only notes remain Scout-only.

Session continuity for agent:scout:main

  • Yes: sending a second task to the same session key (agent:scout:main) continues the same session/transcript unless your session.reset policy expires it (daily/idle) or you trigger /new or /reset. Docs: https://docs.openclaw.ai/concepts/session

Gateway restart impact

  • Sessions survive gateway restarts (they’re persisted as sessions.json + *.jsonl on disk). Restart alone doesn’t “reset” them. Docs: https://docs.openclaw.ai/concepts/session
  • Timeouts right after restart are usually just cold start latency (provider warm-up, first big prompt, etc.), not session loss. For agent-to-agent calls you can also use timeoutSeconds: 0 (fire-and-forget) and then check later via sessions_history if you don’t want the Captain waiting. Docs: https://docs.openclaw.ai/concepts/session-tool
indigo hinge
#

When using sessions_send to a permanent agent (agent:bolt:main) and it times out — is the message guaranteed to be queued and delivered when the agent's current tool call completes? Or can messages be lost on timeout?

Context: I keep misinterpreting timeouts as "message not delivered" and then taking destructive recovery actions (duplicate sends, gateway restarts). Need to understand the delivery guarantee so I can trust fire-and-forget (timeoutSeconds: 0).

#

@weak ridge

#

@weak ridge can you help with the question above

weak ridgeBOT
#

I can, but I can’t see the “question above” in my context (this thread came through without prior messages).

Can you copy/paste the question (and any error logs / config snippet it refers to) here? If it’s long, paste the most relevant part + say what you’re trying to migrate (single-agent → multi-agent, heartbeat/crons, per-agent tool allowlists, etc.).

indigo hinge
#

When using sessions_send to a permanent agent (agent:bolt:main) and it times out — is the message guaranteed to be queued and delivered when the agent's current tool call completes? Or can messages be lost on timeout?

Context: I keep misinterpreting timeouts as "message not delivered" and then taking destructive recovery actions (duplicate sends, gateway restarts). Need to understand the delivery guarantee so I can trust fire-and-forget (timeoutSeconds: 0).

weak ridgeBOT
#

If sessions_send times out while waiting, that does not mean “message wasn’t delivered.” It means “the tool stopped waiting for the target run to finish.”

What actually happens (per current OpenClaw code):

  • sessions_send first submits the message via the Gateway agent RPC and gets a runId back. That RPC acks immediately (status:"accepted") and then continues asynchronously: the handler responds with accepted, then runs void agentCommand(...) as an async continuation (Source: src/gateway/server-methods/agent.ts#524-563).
  • Only after that, sessions_send optionally calls agent.wait. If agent.wait hits its timeout, sessions_send returns status:"timeout" — but the run is already in flight/queued. Source: the agent call happens before agent.wait, and the timeout return is only based on agent.wait (Source: src/agents/tools/sessions-send-tool.ts#284-335).
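The submit-then-wait ordering described above can be mimicked with asyncio. This is an illustrative model, not the OpenClaw implementation: `submit`, `sessions_send`, `agent_run`, and the "ok" status are hypothetical stand-ins, and only the "accepted" and "timeout" statuses come from the thread. The point it shows is that a timeout on the wait leaves the already-submitted run untouched.

```python
import asyncio
import uuid

RUNS: dict[str, asyncio.Task] = {}


async def agent_run(message: str, duration: float) -> str:
    """Stand-in for the target agent doing its work."""
    await asyncio.sleep(duration)
    return f"done: {message}"


async def submit(message: str, duration: float) -> dict:
    """Ack immediately with a runId; the run continues in the background."""
    run_id = uuid.uuid4().hex
    RUNS[run_id] = asyncio.create_task(agent_run(message, duration))
    return {"status": "accepted", "runId": run_id}


async def sessions_send(message: str, duration: float, timeout_seconds: float) -> dict:
    ack = await submit(message, duration)
    if timeout_seconds == 0:
        return ack  # fire-and-forget: do not wait at all
    try:
        # shield() keeps the run alive when the wait itself times out.
        reply = await asyncio.wait_for(
            asyncio.shield(RUNS[ack["runId"]]), timeout_seconds
        )
    except asyncio.TimeoutError:
        return {"status": "timeout", "runId": ack["runId"]}
    return {"status": "ok", "runId": ack["runId"], "reply": reply}


async def demo() -> tuple[dict, str]:
    result = await sessions_send("build report", duration=0.05, timeout_seconds=0.01)
    late_reply = await RUNS[result["runId"]]  # the run still finishes later
    return result, late_reply


result, late_reply = asyncio.run(demo())
print(result["status"], "|", late_reply)
```

A caller that sees "timeout" with a runId should check on that run later rather than resend, since resending would enqueue a second copy of the task.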

Queueing behind an in-progress tool call:

  • Embedded runs are serialized per-session using a per-session command lane (session:<sessionKey>). So a send to agent:bolt:main will sit in that session’s lane until the current run completes. Source: runEmbeddedPiAgent enqueues into enqueueCommandInLane(sessionLane, ...) where sessionLane = "session:" + sessionKey (Source: src/agents/pi-embedded-runner/run.ts#177-195, src/agents/pi-embedded-runner/lanes.ts#3-6).
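Per-session serialization can be pictured as one lock per lane key. The sketch below is a simplified model under that assumption: the `Lanes` class is hypothetical and the real lane queue may work differently, but the lane key format "session:" + sessionKey follows the description above.

```python
import asyncio
from collections import defaultdict


class Lanes:
    """One lock per lane, so work in the same lane runs one item at a time."""

    def __init__(self) -> None:
        self._locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

    async def run(self, session_key: str, name: str, duration: float, log: list[str]) -> None:
        lane = "session:" + session_key
        async with self._locks[lane]:  # waits for earlier work in this lane
            log.append(f"start {name}")
            await asyncio.sleep(duration)
            log.append(f"end {name}")


async def demo() -> list[str]:
    lanes = Lanes()
    log: list[str] = []
    await asyncio.gather(
        lanes.run("agent:bolt:main", "long tool call", 0.03, log),
        lanes.run("agent:bolt:main", "new task", 0.01, log),   # same lane: queued
        lanes.run("agent:scout:main", "other agent", 0.0, log),  # different lane
    )
    return log


log = asyncio.run(demo())
print(log)
```

In this model the new task for agent:bolt:main starts only after the long tool call ends, while the send to agent:scout:main is not held up by Bolt's lane.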

So the practical guarantees:

  1. If sessions_send returns accepted or timeout with a runId: the message was accepted by the gateway and is queued/in-flight. Don’t retry just because you didn’t get a completion in time.

  2. If sessions_send returns error because it couldn’t submit the agent call (e.g. transport/gateway timeout before getting the ack): delivery is unknown (could be not submitted, or submitted but you didn’t receive the ack). That’s the one case where “maybe resend” is defensible—but it can create duplicates because sessions_send generates