#Getting work done with agents

hidden elbow
#

Hey Krill, we're running a 20-agent production hive on v2026.4.2 (one instance on v2026.4.9). We need your best practices and guidance on pushing OpenClaw hard.

Our setup: 20 agents, 3 gateway instances on one VPS. GPT-5.4 primary (600K context), GPT-5.3-Codex (300K), Gemini (200K). Agents use sessions_spawn, exec (security:full), read/write, memory/memory_search, message, gateway WebSocket API, and the openclaw agent CLI. We use Discord/Telegram as surfaces.

Config: compaction.memoryFlush: true, reserveTokensFloor: 40000, bootstrapMaxChars: 12000. No nightly resets: sessions persist until manual /new.

Questions:

1. Async / parallel agent work

Right now sessions_spawn is synchronous: Agent A blocks until Agent B finishes. We want Agent A to kick off work on Agent B (or multiple agents) and continue without waiting. Fire-and-forget spawning.

  • Is there a way to do async sessions_spawn?
  • Can an agent spawn itself into a separate session to offload heavy work?
  • Can an agent run multiple spawns in parallel?
  • What happens if you spawn an agent that already has an active session on another channel?
  • Can we use openclaw agent session via exec backgrounded (&) as a workaround?

Wait for part 2 of the message below.

thin sailBOT
stable plinthBOT
#

Got it, I’ll wait for part 2.

hidden elbow
#

2. Session management at scale

Our busiest agent hit a 17 MB session file today and started compaction-looping (compaction fires but can't shrink below the limit because tool outputs are dense). Context overflow at 13:48, 14:13, 14:29 UTC.

  • What's the recommended way to prevent sessions from bloating when agents do heavy exec work (build logs, file reads, curl responses)?
  • Is there a max session file size setting or auto-reset threshold?
  • Can we trigger /new programmatically via the gateway API or CLI?
  • Any compaction improvements between v2026.4.2 and v2026.4.9?

3. Best practices for a large hive

Open floor: what should we be doing that we might not be? Config tweaks, session lifecycle, tool usage patterns, memory optimization, anything. We want to squeeze maximum productivity out of 20 agents running heavy workloads (shell commands, code deploys, API calls, file operations) without hitting the walls we're hitting.

4. What's new or coming?

Any v2026.4.9+ features we should upgrade for? Anything on the roadmap for background execution, async spawning, or session size management?


And can an agent spawn itself to complete work, have the child report back when it is done, and then give me feedback on completion via Discord or similar?

stable plinthBOT
#

Let me verify the session lifecycle bits and the 4.2 → 4.9 changes before I give you the production guidance.

#

I checked the docs and source for v2026.4.2 and v2026.4.9.

Short version: yes, the pattern you want is supported, but I’d upgrade the whole hive off 4.2. Your biggest wins are shorter-lived sessions, aggressive tool-result pruning, and worker subagents with lightContext.

1) Async / parallel agent work

Is sessions_spawn fire-and-forget?

Yes. In both v2026.4.2 and v2026.4.9, sessions_spawn returns immediately after the child is queued/registered, with status: "accepted", runId, and childSessionKey.

So if you’re seeing “Agent A waits for Agent B to finish”, that is not the intended runtime behavior. Usually that means one of these:

  • the parent agent is choosing to wait/poll
  • you’re actually on a different path, like sessions_send or custom relay logic
  • the spawn is doing setup work first, like thread bind, attachments, session patch, then returning, so it feels synchronous even though it is not waiting for completion

Can an agent spawn itself?

Yes. In fact, the default subagents.allowAgents behavior is effectively same-agent only, so self-spawn is the normal default.

#

That means this pattern is valid:

  • main agent gets heavy task
  • main agent sessions_spawns itself into agent:<sameAgentId>:subagent:<uuid>
  • child works in isolated session
  • child auto-announces result back to requester chat when done

Can an agent run multiple spawns in parallel?

Yes. There is a built-in cap:

  • maxChildrenPerAgent, default 5
  • maxSpawnDepth, default 1

So parallel fan-out works, but it is bounded.

What if the target agent already has an active session on another channel?

It does not hijack that session.

#

A spawn creates a new isolated child session like:

  • agent:<agentId>:subagent:<uuid>

So the agent can have:

  • its normal Discord/Telegram sessions
  • plus one or more spawned worker sessions

The main caveat is shared workspace/process contention, not session collision. If two sessions mutate the same repo/files at once, you can still step on yourself.

Can we use openclaw agent ... & via exec as a workaround?

Technically, sure. I would not use it as the primary pattern.

Why I’m wary of it:

  • it bypasses normal subagent lifecycle tracking
  • no built-in completion announce flow
  • no subagent registry / steer / kill ergonomics
  • easier to orphan processes
  • still easy to bloat the parent transcript with exec output

Use sessions_spawn for OpenClaw-to-OpenClaw work, and runtime: "acp" if you’re really spawning an external coding harness.

Best pattern for “spawn, continue, then tell me when done”

Yes, this works now.

Use:

  • sessions_spawn
  • optionally thread: true, mode: "session" for a persistent worker thread
  • optionally lightContext: true for cheaper worker boot
  • do not poll in a loop
  • let the built-in completion announce post back when finished
#

If the parent should just wait for completion, use sessions_yield after spawning.


2) Session management at scale

17 MB session file, compaction looping

That’s not really a “file size” problem, it’s a dense transcript problem.

If your agents dump:

  • build logs
  • giant read outputs
  • raw curl bodies
  • repeated tool results

then compaction has to summarize a swamp of low-signal text. memoryFlush: true helps preserve durable notes, but it does not solve raw transcript obesity by itself.

#

Is there a max session file size or auto-reset threshold?

Not a per-transcript one that I found.

What exists:

  • session.reset.idleMinutes / session.resetByType.* for lifecycle resets
  • /new and /reset
  • session.maintenance.* for store/disk hygiene
  • maxDiskBytes, highWaterBytes, rotateBytes, maxEntries

But those are not “reset active session when transcript hits 17 MB”.

There is auto-reset recovery for certain compaction/context-failure paths in source, but that is failure handling, not a general “reset when big” threshold.
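
Since there is no built-in size threshold, one option is an external check that flags oversized transcripts so you can reset those sessions yourself. This is a minimal sketch in POSIX shell, assuming transcripts are regular files somewhere under a state directory you pass in. The helper name big_sessions and the 10 MiB default are hypothetical, not OpenClaw features.

```shell
# big_sessions DIR [MIN_BYTES]
# Lists regular files under DIR that are larger than MIN_BYTES
# (default 10 MiB). Hypothetical helper, not an OpenClaw command.
big_sessions() {
    find "$1" -type f -size +"${2:-10485760}"c
}
```

Each path it prints is a candidate for /new or a programmatic reset. Run it from cron or a heartbeat job, not from inside the agent's own transcript.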

Can /new be triggered programmatically?

Yes, via gateway RPC.

#

If you’re already on the WebSocket API, there is a real RPC:

  • sessions.reset
  • params: { key, reason: "new" | "reset" }

That is the clean programmatic equivalent.

Simplest user-level path is still sending /new, but for automation, sessions.reset is the one.
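
A rough sketch of building that call from a script. The method name and params are the ones listed above. The outer envelope with "method" and "params" fields is an assumption about the wire format, and reset_frame is a hypothetical helper name, so check the frame shape against your gateway before relying on it.

```shell
# reset_frame KEY REASON
# Prints one JSON message for a sessions.reset call. The method name and
# params come from the RPC described above; the outer {"method","params"}
# envelope is an ASSUMPTION. KEY must not contain quotes or backslashes,
# because no JSON escaping is done here.
reset_frame() {
    case $2 in
        new|reset) ;;
        *) echo "reason must be new or reset" >&2; return 2 ;;
    esac
    printf '{"method":"sessions.reset","params":{"key":"%s","reason":"%s"}}\n' "$1" "$2"
}
```

You would pipe the printed line into whatever WebSocket client you already use for the gateway API. Rejecting unknown reasons up front keeps a typo from turning into a silent no-op.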

Compaction improvements from 4.2 to 4.9?

Yes, meaningful ones:

  • lightContext fix for spawned subagents landed in 4.9 (#62264)
    • this matters a lot for your hive
  • pluggable compaction provider registry landed by 4.9 (#56224)
  • subagent completion delivery got better
    • inherit requester delivery target
    • longer announce timeout
    • Telegram thread target normalization
  • heartbeat transcript truncation race fix landed (#60998)

So yes, 4.9 is materially better than 4.2 for your use case.


3) What I’d change for a 20-agent production hive

A. Stop immortal sessions

Right now, “persist until manual /new” is hurting you.

I’d add idle resets at least for heavy worker surfaces.

Example direction:

#
{
  session: {
    resetByType: {
      group: { mode: "idle", idleMinutes: 120 },
      thread: { mode: "idle", idleMinutes: 240 }
    }
  }
}

For Discord thread-bound worker sessions, also use:

  • channels.discord.threadBindings.idleHours
  • channels.discord.threadBindings.maxAgeHours

B. Turn on aggressive contextPruning

#

This is probably your biggest direct fix for exec / read / curl bloat.

{
  agents: {
    defaults: {
      contextPruning: {
        mode: "cache-ttl",
        ttl: "5m",
        keepLastAssistants: 3,
        minPrunableToolChars: 8000,
        softTrim: { maxChars: 2000, headChars: 800, tailChars: 800 },
        hardClear: { enabled: true, placeholder: "[Old tool result cleared]" }
      }
    }
  }
}

The default minPrunableToolChars: 50000 is often too forgiving for log-heavy agents.
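
To picture what a soft trim does to an old tool result, here is an illustration in POSIX shell. It is not OpenClaw's implementation: it assumes a result larger than maxChars is cut down to its first headChars and last tailChars, and it counts bytes, not characters. soft_trim is a hypothetical name.

```shell
# soft_trim FILE MAX HEAD TAIL
# Prints FILE unchanged if it is at most MAX bytes. Otherwise prints the
# first HEAD bytes, a marker line, and the last TAIL bytes.
# Illustration only; the real pruning logic may differ.
soft_trim() {
    st_size=$(($(wc -c <"$1")))
    if [ "$st_size" -le "$2" ]; then
        cat "$1"
    else
        head -c "$3" "$1"
        printf '\n[... %d bytes trimmed ...]\n' "$((st_size - $3 - $4))"
        tail -c "$4" "$1"
    fi
}
```

With the values from the snippet above, that would be soft_trim build.log 2000 800 800: a 17 MB log collapses to roughly 1.6 KB of head and tail.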

C. Keep verbose output out of the transcript

This matters a lot.

For heavy shell/API work, prefer:

  • redirect logs to files
  • return only exit code + summary + tail
  • read targeted slices, not whole blobs

Good patterns:

  • cmd > build.log 2>&1
  • then only inspect tail -n 80 build.log
  • curl -sS -o response.json ... then jq only what matters
  • read with offset/limit, not whole files
  • use rg, sed -n, jq, python -c to extract signal
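
The first two patterns can be wrapped in one small helper so agents call it by habit. A minimal sketch in POSIX shell; run_quiet and the RUN_QUIET_TAIL variable are hypothetical names, not part of OpenClaw.

```shell
# run_quiet LOGFILE CMD [ARGS...]
# Runs CMD with stdout and stderr redirected to LOGFILE, then prints only
# the exit code and the last lines of the log (80 by default, override
# with RUN_QUIET_TAIL). Returns CMD's exit status.
run_quiet() {
    rq_log=$1
    shift
    rq_status=0
    "$@" >"$rq_log" 2>&1 || rq_status=$?
    echo "exit=$rq_status log=$rq_log"
    tail -n "${RUN_QUIET_TAIL:-80}" "$rq_log"
    return "$rq_status"
}
```

Usage would look like run_quiet build.log make -j2. The full output stays on disk for later targeted reads, and the transcript only ever sees the status line plus the tail.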

D. Use subagents as disposable workers

Main agent should stay relatively clean.

Pattern I’d use:

  • human-facing orchestrator
  • many short-lived worker subagents
  • workers on cheaper model
  • lightContext: true
  • strict no-poll guidance
#

E. Upgrade and unify versions

Mixed 4.2 + 4.9 in one hive is asking for weirdness.

At minimum, get everything to the same version. Honestly I’d go past 4.9 now. The status output here shows 2026.4.12 available.

F. Keep reserveTokensFloor: 40000, but don’t expect it to solve this alone

Your current 40000 is reasonable.

But if the transcript is mostly giant tool output, increasing reserve just makes compaction happen earlier, not better.

G. Consider a better compaction model

If the main runtime model is busy doing execution-heavy work, a dedicated compaction model can help:

  • agents.defaults.compaction.model

That helps summary quality, but it is still secondary to pruning and reset hygiene.

#

H. Cap fan-out on purpose

I’d keep:

  • maxChildrenPerAgent: 3 to 5
  • maxSpawnDepth: 2 max, unless you truly need deeper trees

Deep recursive agent swarms get expensive and messy fast.


4) My concrete recommendations for your setup

If this were my hive, I’d do this first:

  1. Upgrade all gateways to the same 2026.4.x, ideally latest
  2. Enable contextPruning on every heavy worker
  3. Add idle/max-age resets for worker threads/sessions
  4. Use lightContext: true on spawned workers after upgrade
  5. Stop dumping raw logs/API bodies into the transcript
  6. Use self-spawned disposable workers for long jobs
  7. Restrict cross-agent spawning explicitly, don’t use ["*"] unless you really mean it

Answers to your last question

can an agent spawn itself, finish work, then give me feedback via Discord when complete?

Yes. That’s exactly a good fit for sessions_spawn.

Use a child session, let it run, and let the built-in completion announce deliver the result back to the requester thread/channel. That is cleaner than backgrounding openclaw agent yourself.


hidden elbow
#

What details or questions should you ask us so we can help you help us?

Krill — follow-up. Three things:

  1. contextPruning config: You gave us the snippet. Does this go in agents.defaults in openclaw.json, or per-agent override? Can we set different pruning aggressiveness per agent (e.g. aggressive for Tony/Forge, lighter for Banner)?
  2. Upgrade path 4.2 → 4.12: We have 3 gateways: main hive (4.2), Doc (unknown), Meta You (4.9). What's the cleanest upgrade path? Can we do rolling upgrades or does the whole hive need to go down? Any breaking config changes between 4.2 and 4.12 we should know about?
  3. Recommended openclaw.json snippet: You offered to produce one for a 20-agent hive with sane defaults for worker spawning, pruning, idle reset, and Discord thread bindings. Yes please. That would be the most valuable single artifact you could give us.

Also, you have free rein here. We need your help, so don't be shy and don't hold back. Ask questions as well, since you know the best-practice setups that would help us.

stable plinthBOT
#

I’m going to verify the per-agent pruning override shape and sanity-check the 4.2 → 4.12 upgrade risks before I give you the config artifact.

#

Yep, here’s the blunt version.

1) contextPruning placement

Today, it is a global/default knob, not a per-agent one.

From the current docs/schema I checked:

  • supported: agents.defaults.contextPruning
  • not documented/exposed: agents.list[].contextPruning

So for Tony/Forge aggressive, Banner lighter, the answer is:

  • not in one gateway config today, at least not cleanly via supported config
  • best current workarounds:
    1. pick one global pruning policy that is “aggressive but safe”
    2. put the heaviest agents on a separate gateway/config
    3. upstream/patch per-agent pruning support later

If you already have 3 gateways, I’d seriously consider using that split:

  • worker-heavy gateway: aggressive pruning
  • doc/analysis gateway: lighter pruning
  • meta/orchestrator gateway: medium

That’s the cleanest way to get different pruning behavior right now.


2) Upgrade path, 4.2 → 4.12

Can you do rolling upgrades?

Yes, if the gateways are actually separate instances with separate state dirs / session stores / channel ownership.

#

I would not keep mixed versions around longer than necessary, and I would not have two versions touching the same state/profile/session store at once.

My recommended path

For each gateway, in order:

  1. openclaw --version
  2. openclaw update --dry-run
  3. back up config/state
  4. upgrade one non-critical gateway first
  5. openclaw doctor
  6. openclaw gateway restart
  7. test:
    • normal chat reply
    • sessions_spawn
    • Discord completion announce
    • Telegram delivery
    • /new or sessions.reset
  8. then move to the next gateway

Does the whole hive need to go down?

Not necessarily.
If these are truly separate gateways, do it one gateway at a time.

Anything breaking between 4.2 and 4.12?

I did not find a big “subagents/session config broke” cliff in the areas you care about.

What I did find as likely upgrade pain:

  • stricter config/auth validation
  • gateway bind/auth guardrails are stricter
  • post-upgrade issues are often:
    • wrong gateway.mode
    • wrong remote URL target
    • non-loopback bind without valid auth
  • config drift now being enforced

So I’d treat this as a careful rolling upgrade, not a blind one.

Biggest reasons I’d upgrade

For your use case, the meaningful wins are:

  • lightContext for spawned subagents is properly honored by 4.9+
  • subagent completion delivery got more reliable
  • compaction path improved
  • transcript/heartbeat race fixes landed
  • latest stable reduces “weird mixed-behavior” risk

Honestly, I would go straight to 4.12, not stop at 4.9.


#

3) Recommended openclaw.json snippet

This is a starting template, not gospel. I tuned it for:

  • worker spawning
  • transcript hygiene
  • session expiry
  • Discord thread-bound workers
  • avoiding immortal giant sessions
{
  agents: {
    defaults: {
      // Reduces repeated bootstrap injection on safe continuation turns.
      contextInjection: "continuation-skip",

      // You already run a smaller bootstrap. Keep it.
      bootstrapMaxChars: 12000,

      compaction: {
        memoryFlush: true,
        reserveTokensFloor: 40000,
        // Optional, if compaction quality is weak:
        // model: "provider/model"
      },

      // Global only today, not per-agent.
      contextPruning: {
        mode: "cache-ttl",
        ttl: "5m",

        // Keep recent context intact, trim older noisy tool output.
        keepLastAssistants: 3,
        minPrunableToolChars: 8000,

        softTrimRatio: 0.3,
        hardClearRatio: 0.5,

        softTrim: {
          maxChars: 2000,
          headChars: 800,
          tailChars: 800
        },

        hardClear: {
          enabled: true,
          placeholder: "[Old tool result cleared]"
        }
      },

      subagents: {
        // Global concurrency safety valve. Tune to VPS size.
        maxConcurrent: 6,

        // Prevent one session from exploding into too many children.
        maxChildrenPerAgent: 4,

        // Allow orchestrator -> worker nesting, but not crazy recursion.
        maxSpawnDepth: 2,
        // Clean up idle child sessions automatically.
        archiveAfterMinutes: 60,

        // Default child timeout when omitted in sessions_spawn.
        runTimeoutSeconds: 1800

        // Optional default worker model:
        // model: "provider/model"
      }
    },

    list: [
      {
        id: "main",
        // Keep cross-agent spawning explicit, not wildcard, if possible.
        subagents: {
          allowAgents: ["main", "tony", "forge", "banner"]
        }
      },

      {
        id: "tony",
        subagents: {
          allowAgents: ["tony", "forge"]
          // Optional per-agent worker model:
          // model: "openai/gpt-5.3-codex"
        }
      },

      {
        id: "forge",
        subagents: {
          allowAgents: ["forge", "tony"]
          // model: "openai/gpt-5.3-codex"
        }
      },

      {
        id: "banner",
        subagents: {
          allowAgents: ["banner"]
          // model: "google/..."
        }
      }
    ]
  },

  session: {
    // Kill immortal sessions. This matters.
    resetByType: {
      group: { mode: "idle", idleMinutes: 120 },
      thread: { mode: "idle", idleMinutes: 240 },
      direct: { mode: "idle", idleMinutes: 360 }
    },

    // Optional: be stricter about copying giant parent history into new threads.
    parentForkMaxTokens: 60000,

    maintenance: {
      mode: "enforce",
      pruneAfter: "14d",
      maxEntries: 800,
      rotateBytes: "10mb",
      resetArchiveRetention: "14d",
      maxDiskBytes: "2gb",
      highWaterBytes: "1.5gb"
    },

    threadBindings: {
      enabled: true,
      idleHours: 24,
      maxAgeHours: 72
    }
  },
  channels: {
    discord: {
      threadBindings: {
        enabled: true,
        idleHours: 24,
        maxAgeHours: 72,
        spawnSubagentSessions: true
      }
    }
  }
}

What I would change operationally, beyond config

#

These matter as much as the JSON:

For heavy exec

Do not dump raw logs into the transcript if you can avoid it.

Prefer:

  • cmd > build.log 2>&1
  • then inspect only tail -n 80 build.log
  • summarize to a file or short tool output

For curl

Prefer:

  • curl -sS -o response.json ...
  • then jq only important fields
#

For read

Prefer:

  • targeted offset/limit
  • or rg / sed -n / structured extraction

For subagents

Teach orchestrators this rule:

  • spawn child
  • do not poll in a loop
  • either continue working or sessions_yield
  • let completion announce come back naturally

That one prompt rule alone will make the hive feel way more async.


#

My honest recommendation on pruning split

Because pruning is global today, if Tony/Forge are dramatically noisier than Banner, I would do one of these:

Option A, simplest

Use the snippet above globally, accept one middle-ground pruning policy.

Option B, best if you really care

Split agents by gateway:

  • Gateway A: heavy workers, aggressive pruning
  • Gateway B: doc/research, lighter pruning
  • Gateway C: orchestrator/meta, medium pruning

Option C, future

Patch in per-agent contextPruning.

#

My recommendation: Option B if the workload difference is real.


The 5 questions that would help me tighten this further

If you answer these, I can give you a much sharper v2 config:

  1. Do the 3 gateways share any Discord bot token / Telegram account, or are they fully separate surfaces?
  2. Are the 20 agents all in one gateway config, or split across the 3 gateways?
  3. What are the VPS specs? CPU cores, RAM, disk.
  4. Which agents mutate the same repo/workspace concurrently?
  5. Do any agents actually need old tool output verbatim later, or is summary/tail enough?

If you want, next I can give you:

  • a rolling upgrade checklist
  • a hive prompt rule set for orchestrators/workers
  • and a Tony/Forge/Banner gateway split recommendation based on your actual topology
hidden elbow
#

Krill — answers to your 5 questions:
1. Do the 3 gateways share bot tokens?
No. Fully separate surfaces per gateway. Each gateway has its own Discord bot tokens and Telegram bots. No overlap.
2. How are agents split across gateways?
All 3 gateways run on the same VPS (KVM2 — 187.77.177.182), different Linux users and service units:

  • Gateway 1 (main, root, port 18789): 18 agents — Tony, Pepper, Happy, Forge, Argus, Loki, Vision, Rhodes, Banner, Hank, Einstein, Sammy, Grant, Cara, Voice, Simba, Mason, Clay. Currently on v2026.4.2.
  • Gateway 2 (Doc, /home/doc, separate systemd): 2 agents — Doc, Intel. Version unknown (likely 4.2 or older).
  • Gateway 3 (Meta You, /home/metayou, port 18810): 3 agents — James, Atlas, Blaire. Currently on v2026.4.9 (isolated binary, independently upgradeable).
    Total: 23 agents across 3 gateways on one VPS. A 4th instance (Jordan/Sabain, port 18800, Docker) runs 3 client agents — separate, ignore it.

3. VPS specs?
KVM2 (the one that matters): Current utilisation ~35% disk, ~34% RAM. Running Node v22.22.0, Linux 6.8.0. Exact CPU/RAM total not documented — can get Tony to report nproc and free -h if you need it for tuning maxConcurrent.
KVM4 (72.61.18.103): 4 vCPU, 16GB RAM, 200GB NVMe — websites only, no agents.
4. Which agents mutate the same repo/workspace concurrently?

  • Tony + Forge both touch james-agent-surface and mission_control_v2_app. Standing rule: Forge builds, Tony deploys — but overlap happens.
  • Mason + Clay share the web build workspace. Mason specs, Clay executes via SSH to KVM4.
  • Everyone else operates in isolated workspaces.

Answer 5 and more are coming below. Wait for the next message before replying.

#

5. Do any agents need old tool output verbatim?
No. Summary/tail is enough for all agents. Heavy research (Einstein) goes to Notion pages (external memory, Tier 5) — not kept in transcript. Aggressive pruning is safe across the board.
Bonus context for your pruning split recommendation:
The noisiest agents by far are Tony (exec-heavy orchestrator, builds, deploys, shell work) and Forge (dev, Claude Code CLI, codebase reads/writes). Banner is a thinking partner — mostly conversation, minimal tool output. The business ops agents (Pepper, Happy, Loki, etc.) are moderate — mostly Notion reads/writes and web searches.
Given all 3 gateways are on one VPS, Option A (single global policy) is probably the pragmatic call for now. Tony and Forge are on the main gateway with 16 other agents — splitting them to a separate gateway means re-binding Discord channels and re-wiring MQ, which is a bigger lift than the pruning benefit.
Yes please to all three: rolling upgrade checklist, hive prompt rule set for orchestrators/workers, and the gateway split recommendation.

stable plinthBOT
#

Got it, I’ll wait for #5 and the rest.

#

Perfect, that’s enough context to be opinionated.

My recommendation, based on your topology

For now, keep the 3-gateway layout as-is.
I would not rebind Tony/Forge to a new gateway yet.

Your pragmatic move is:

  • upgrade all 3 gateways to 4.12
  • apply one aggressive pruning policy on the main gateway
  • add idle/max-age session resets
  • tighten orchestrator/worker prompt rules
  • add a shared-workspace lock discipline for Tony/Forge and Mason/Clay

That gets you most of the gain without re-wiring surfaces.

#

1) Rolling upgrade checklist

Because your gateways are on separate users, separate services, separate tokens, you can do this cleanly with rolling upgrades.

Order I would use

Phase 1, canary

Gateway 3, Meta You, 4.9 → 4.12

  • already closest to target
  • isolated
  • only 3 agents
  • best low-risk canary

Phase 2, small secondary

Gateway 2, Doc, unknown → 4.12

  • small blast radius
  • confirms older instance behavior

Phase 3, main hive

Gateway 1, main, 4.2 → 4.12

  • do this last
  • biggest surface area
  • most likely to expose behavior drift

Per-gateway checklist

For each gateway:

A. Inventory first

Run:

#
openclaw --version
openclaw status
openclaw doctor
openclaw update --dry-run

If it’s a package install, use the normal package-manager upgrade flow. If it’s source-managed, openclaw update is fine.

B. Back up before touching it

Back up:

  • openclaw.json
  • state dir / sessions
  • any workspace-local AGENTS/system prompt files
  • service unit/env overrides
#

C. Upgrade binary first, do not change config yet

This matters.

Do version upgrade first, restart, validate.
Then do config changes like pruning/reset/thread bindings in a second pass.

That way if something breaks, you know whether it was version or config.

D. Restart

openclaw gateway restart

E. Validate the exact things you care about

On each upgraded gateway, test:

  1. normal user reply
  2. sessions_spawn returns immediately
  3. child completion announce posts back correctly
  4. /new or sessions.reset works
  5. one heavy exec run
  6. one Discord thread-bound worker flow
  7. one Telegram delivery flow
  8. openclaw logs --follow stays boring

F. Only then apply config changes

After the gateway passes on stock behavior:

  • aggressive contextPruning
  • resetByType
  • thread binding idle/maxAge
  • any subagent concurrency tuning

Then restart again.

#

2) Hive prompt rule set, orchestrators + workers

I’d put something like this into the main hive’s agent guidance.

Orchestrator rules

Use this for Tony, Pepper-style coordinators, Mason, etc.

## Hive orchestration rules

- `sessions_spawn` is fire-and-forget.
- After calling `sessions_spawn`, do not wait for the child to finish unless explicitly asked.
- Do not poll `sessions_list`, `sessions_history`, or `subagents list` in a loop.
- If waiting for completion is actually required, use `sessions_yield` and let completion arrive as the next event.
- Prefer `lightContext: true` for spawned workers unless the child truly needs full session history.
- Prefer one worker per distinct task chunk, not one giant do-everything child.

## Transcript hygiene

- Never paste raw build logs, giant command output, or full API responses into chat unless explicitly asked.
- For heavy shell work, redirect output to a file and report only: status, key findings, and the last relevant lines.
- For HTTP/API work, save payloads to a file and extract only the fields that matter.
- For large reads, inspect targeted slices, not whole files.

## Shared workspace safety

- Never let two agents write to the same checkout at the same time.
- If a workspace is shared, either:
  - use a lock file / explicit ownership handoff, or
  - use separate worktrees/checkouts.
- Tony deploys, Forge builds. If Forge is actively modifying or building the shared repo, Tony should not mutate it.
- Mason plans/specs, Clay executes. One writer at a time.

## Completion style

- Child workers should return concise result summaries:
  - what changed
  - where it changed
  - success/failure
  - blockers
  - next recommended step
- Do not return raw noise when a summary is enough.

Worker rules

Good for Forge-ish spawned workers.

#
## Worker session rules

- You are a disposable worker session.
- Finish the assigned task and return a concise result.
- Do not busy-poll for parent status.
- Do not ask to keep working when the task is already clear.
- Prefer writing artifacts/logs to files over dumping them into the transcript.
- If the task touches a shared repo, verify ownership before making changes.

3) Tony / Forge / Mason / Clay safety rule

This is the part I’m most worried about, honestly.

#

Your biggest operational risk is not model context, it’s concurrent mutation of the same checkout.

I would enforce this immediately

Tony + Forge

  • Forge: build / code / branch / artifact work
  • Tony: deploy / release / ops
  • No concurrent writes to the same checkout
  • handoff should be explicit:
    • lock file
    • artifact path
    • commit/worktree reference
    • “Tony may deploy now” marker

Mason + Clay

  • Mason should not write in the same execution checkout Clay is using
  • Clay should execute from a stable handoff target, not Mason’s actively changing workspace
#

If you do nothing else, fix this. It will save you more pain than another 20k reserve tokens.
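
A lock-file handoff does not need anything fancy. Here is a minimal sketch in POSIX shell that uses mkdir as the atomic step, since creating a directory either succeeds for exactly one caller or fails. The function names, the lock path, and the owner file are all hypothetical; adapt them to wherever your shared checkouts live.

```shell
# acquire_lock DIR OWNER
# Succeeds only if DIR does not exist yet (mkdir is atomic), then records
# OWNER inside it. Fails and reports the current holder otherwise.
acquire_lock() {
    if mkdir "$1" 2>/dev/null; then
        echo "$2" >"$1/owner"
        return 0
    fi
    echo "locked by $(cat "$1/owner" 2>/dev/null || echo unknown)" >&2
    return 1
}

# release_lock DIR OWNER
# Removes the lock, but only if OWNER is the recorded holder.
release_lock() {
    [ "$(cat "$1/owner" 2>/dev/null)" = "$2" ] || return 1
    rm -f "$1/owner" && rmdir "$1"
}
```

In practice Forge would call acquire_lock /path/to/repo.lock forge before a build and release it after writing the artifact, and Tony would refuse to deploy while the lock is held by anyone else. A crashed worker leaves a stale lock behind, so pair this with a max-age check or a manual override.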


4) Gateway split recommendation

Right now

I agree with you, Option A is correct.

Keep:

  • Gateway 1 = main hive, aggressive pruning
  • Gateway 2 = doc/intel, same or slightly lighter later
  • Gateway 3 = meta, same defaults for now

No surface rebinding yet.

#

When I would revisit a split

Only if one of these becomes true:

  • Tony/Forge account for most compaction incidents
  • main gateway CPU/RAM starts spiking during build waves
  • Discord/Telegram latency gets noticeably worse during heavy worker bursts
  • main gateway logs show repeated congestion from exec/subagent load

If you eventually split later

The first candidates to move are:

  • Tony
  • Forge
  • Mason
  • Clay

That would create a dedicated execution gateway and leave business/chat agents on the main hive.

#

But I would not do that yet.


5) My tuning recommendation, right now

Because aggressive pruning is safe for all of you, I’d use it broadly.

Main gateway

Use the aggressive policy from earlier, and I’d actually be comfortable going a bit harder:

  • ttl: "5m"
  • minPrunableToolChars: 8000
  • keep hardClear enabled

If Tony/Forge still bloat sessions after that, the next move is not “less pruning”, it’s:

  • stricter transcript hygiene in prompts
  • more file-backed logs
  • faster session turnover

Session lifecycle

For the main hive, I’d absolutely enable:

  • group idle reset
  • thread idle reset
  • Discord thread maxAgeHours

Immortal worker threads are killing you.


6) One extra config tweak I do like for you

#

This one is worth keeping if you haven’t already:

{
  agents: {
    defaults: {
      contextInjection: "continuation-skip"
    }
  }
}

It reduces repeated bootstrap reinjection on safe continuation turns, which helps exactly the kind of long-running orchestrator sessions you have.


7) What I want from you next, if you want a tighter v3

#

Two tiny things would let me tune concurrency more confidently:

  1. from KVM2:
nproc
free -h
  2. if you have it, rough answer to:
    • how many heavy jobs run at once during peak, 2, 5, 10+?

With that, I can give you a more confident recommendation for:

  • agents.defaults.maxConcurrent
  • agents.defaults.subagents.maxConcurrent
  • maxChildrenPerAgent
#

If you want, next I can turn this into a copy-paste rollout plan with:

  • “upgrade gateway 3”
  • “validate”
  • “apply config”
  • “restart”
  • “move to gateway 2”
  • “move to gateway 1”

basically a runbook your team can execute line by line.

hidden elbow
#

Hey Krill, here's what you asked for.

1. KVM2 specs:

nproc: 2

        total   used    free    shared  buff/cache  available
Mem:    7.8Gi   3.7Gi   627Mi   1.1Mi   3.7Gi       4.0Gi
Swap:   0B      0B      0B

2. Peak concurrency:

During build waves (Forge building + Clay SSH'ing to KVM4 + Tony deploying + Argus monitoring + a standup runner), we hit roughly 5–8 concurrent agent sessions with 2–3 heavy exec jobs at the same time. Not 10+. Normal steady-state is lower — maybe 3–4 active sessions.

3. Yes please — turn the upgrade plan into a copy-paste rollout runbook we can execute line by line. The phased order you gave (Meta You → Doc → Main) is exactly right.

Also requesting all three deliverables:

  • Rolling upgrade runbook (line-by-line)
  • Hive prompt rule set (orchestrator + worker templates)
  • Gateway split recommendation (when to revisit)

One version check: you noted Meta You at 4.9 — we'll confirm with openclaw --version on that instance before starting Phase 1. It may still be on 4.5.

stable plinthBOT
#

That hardware changes my recommendation a bit.

On 2 vCPU, 7.8 GiB RAM, no swap, I would be conservative, not ambitious.

Revised tuning for KVM2

I’d dial my earlier numbers back to roughly:

  • agents.defaults.maxConcurrent: 3
  • agents.defaults.subagents.maxConcurrent: 4
  • agents.defaults.subagents.maxChildrenPerAgent: 3

Reason: on a 2-core box, your bottleneck is not just OpenClaw logic, it’s overlapping exec, builds, SSH, and Node processes fighting for the same tiny CPU budget.

Also, I’m a bit worried about 0 swap. Even a modest swapfile would make this box much less brittle during build waves.
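
A minimal sketch of the swapfile suggestion, assuming a standard Linux host. The 2G size and the /swapfile path are illustrative assumptions, not values from this thread. The function only prints the usual command sequence so it can be reviewed before anything runs as root.

```shell
#!/bin/sh
# Print (do not run) the usual Linux steps for adding a swapfile.
# Size and path defaults are illustrative assumptions.
plan_swapfile() (
  size="${1:-2G}"
  path="${2:-/swapfile}"
  printf '%s\n' \
    "fallocate -l $size $path" \
    "chmod 600 $path" \
    "mkswap $path" \
    "swapon $path" \
    "echo '$path none swap sw 0 0' >> /etc/fstab"
)

plan_swapfile 2G /swapfile
```

Once the output looks right, it can be piped to a root shell, for example `plan_swapfile 2G /swapfile | sudo sh`. On some filesystems a file created with fallocate is not accepted as swap, and writing the file with dd is the usual fallback.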


#

1) Copy-paste rollout runbook

I’m splitting this into:

  1. binary upgrade pass
  2. config rollout pass

Do not combine them. Upgrade binaries first, validate, then change config.


Phase 0, preflight, once

0.1 Freeze the blast radius

During the rollout window:

  • avoid big deploys
  • avoid long Forge builds
  • avoid Mason/Clay overlapping jobs
  • avoid manual prompt experiments on Tony

0.2 Save current config/state

Run this once per gateway user and save it somewhere safe.

Meta You

sudo -iu metayou bash -lc '
set -e
date
openclaw --version || true
openclaw status || true
openclaw doctor || true
cp -av ~/.openclaw ~/.openclaw.backup.$(date +%Y%m%d-%H%M%S) || true
'

Doc

sudo -iu doc bash -lc '
set -e
date
openclaw --version || true
openclaw status || true
openclaw doctor || true
cp -av ~/.openclaw ~/.openclaw.backup.$(date +%Y%m%d-%H%M%S) || true
'

Main

bash -lc '
set -e
date
openclaw --version || true
openclaw status || true
openclaw doctor || true
cp -av ~/.openclaw ~/.openclaw.backup.$(date +%Y%m%d-%H%M%S) || true
'

Phase 1, Meta You canary

1.1 Inspect current version

sudo -iu metayou bash -lc '
set -e
openclaw --version
openclaw status
openclaw doctor
openclaw update --dry-run || true
'

1.2 Upgrade

If openclaw update --dry-run looks sane and this install supports it:

sudo -iu metayou bash -lc '
set -e
openclaw update --yes --no-restart
openclaw gateway restart
sleep 5
openclaw --version
openclaw status
'

If this instance is package-managed and openclaw update is not the right path, use your normal package-manager update flow for that user, then:

sudo -iu metayou bash -lc '
set -e
openclaw gateway restart
sleep 5
openclaw --version
openclaw status
'

1.3 Smoke test

Test these manually on Meta You:

  • normal chat reply
  • one sessions_spawn
  • one child completion announce
  • one /new
  • one Telegram/Discord delivery path

If Meta You looks good, continue.

#

Phase 2, Doc gateway

2.1 Inspect current version

sudo -iu doc bash -lc '
set -e
openclaw --version
openclaw status
openclaw doctor
openclaw update --dry-run || true
'

2.2 Upgrade

sudo -iu doc bash -lc '
set -e
openclaw update --yes --no-restart || true
openclaw gateway restart
sleep 5
openclaw --version
openclaw status
'

If package-managed, use that upgrade path, then restart.

2.3 Smoke test

Same checks:

  • normal reply
  • sessions_spawn
  • completion announce
  • /new
  • one actual doc/intel workflow

If clean, continue.


Phase 3, Main gateway

3.1 Inspect current version

bash -lc '
set -e
openclaw --version
openclaw status
openclaw doctor
openclaw update --dry-run || true
'

3.2 Upgrade

bash -lc '
set -e
openclaw update --yes --no-restart || true
openclaw gateway restart
sleep 5
openclaw --version
openclaw status
'

If package-managed, use that upgrade path, then restart.

3.3 Main smoke test

Before config changes, validate:

  • Tony normal reply
  • Forge normal reply
  • one sessions_spawn
  • one child result announce
  • one heavy-ish exec
  • one /new
  • one Discord thread-bound worker flow
  • one Telegram delivery flow
#

If that all works, move to config rollout.


Phase 4, config rollout

4.1 Main gateway config, conservative for 2 vCPU

Apply this style of tuning on the main gateway first.

{
  agents: {
    defaults: {
      maxConcurrent: 3,
      contextInjection: "continuation-skip",
      bootstrapMaxChars: 12000,

      compaction: {
        memoryFlush: true,
        reserveTokensFloor: 40000
      },

      contextPruning: {
        mode: "cache-ttl",
        ttl: "5m",
        keepLastAssistants: 3,
        minPrunableToolChars: 8000,
        softTrimRatio: 0.3,
        hardClearRatio: 0.5,
        softTrim: {
          maxChars: 2000,
          headChars: 800,
          tailChars: 800
        },
        hardClear: {
          enabled: true,
          placeholder: "[Old tool result cleared]"
        }
      },

      subagents: {
        maxConcurrent: 4,
        maxChildrenPerAgent: 3,
        maxSpawnDepth: 2,
        archiveAfterMinutes: 60,
        runTimeoutSeconds: 1800
      }
    }
  },

  session: {
    resetByType: {
      group: { mode: "idle", idleMinutes: 120 },
      thread: { mode: "idle", idleMinutes: 240 },
      direct: { mode: "idle", idleMinutes: 360 }
    },
    parentForkMaxTokens: 60000,
    maintenance: {
      mode: "enforce",
      pruneAfter: "14d",
      maxEntries: 800,
      rotateBytes: "10mb",
      resetArchiveRetention: "14d",
      maxDiskBytes: "2gb",
      highWaterBytes: "1.5gb"
    },
    threadBindings: {
      enabled: true,
      idleHours: 24,
      maxAgeHours: 72
    }
  },

  channels: {
    discord: {
      threadBindings: {
        enabled: true,
        idleHours: 24,
        maxAgeHours: 72,
        spawnSubagentSessions: true
      }
    }
  }
}

4.2 Restart main after config

bash -lc '
set -e
openclaw gateway restart
sleep 5
openclaw status
'

4.3 Re-test main

Watch specifically for:

  • Tony and Forge no longer stuffing giant tool output into context
  • sessions_spawn feeling async again
  • less compaction churn
  • lower prompt growth over time
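
One rough way to watch for transcript bloat during the re-test is to list oversized files under the state directory. This is a sketch only: it assumes session transcripts live somewhere under ~/.openclaw (the directory backed up in Phase 0), and the 10 MiB default simply mirrors the rotateBytes value in the config block.

```shell
#!/bin/sh
# List files under a directory that are larger than a threshold (in MiB).
# Assumption: session transcripts live somewhere under ~/.openclaw.
# Note: "-size +Nk" is a common find extension (GNU and BSD), not strict POSIX.
big_files() (
  dir="${1:-$HOME/.openclaw}"
  min_mib="${2:-10}"
  # ls -S sorts each batch of matches by size, largest first.
  find "$dir" -type f -size +"$((min_mib * 1024))"k -exec ls -S {} +
)

# Example: big_files "$HOME/.openclaw" 10
```

An empty result after a day of build waves suggests pruning and idle resets are doing their job. Any file that keeps reappearing points at an agent that still dumps raw output into its session.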

4.4 Roll same config shape to Meta You and Doc

You can keep the same structure there.
If those gateways are lighter, you can later loosen concurrency, but I would start uniform.

#

2) Hive prompt rule set, copy-paste templates

Orchestrator template

Use this for Tony, Pepper, Mason-type agents.

## Orchestration rules

- `sessions_spawn` is fire-and-forget.
- After spawning a worker, continue with other useful work unless explicitly told to wait.
- Do not poll `sessions_list`, `sessions_history`, or `subagents list` in loops.
- If waiting for completion is actually required, use `sessions_yield`.
- Prefer `lightContext: true` for spawned workers unless the worker truly needs full session history.
- Prefer small focused workers over one giant worker.

## Transcript hygiene

- Never dump raw build logs, full API payloads, or giant file contents into the transcript unless explicitly asked.
- For shell work, redirect verbose output to a file and report only status, key findings, and the final relevant lines.
- For HTTP/API work, save payloads to a file and extract only the fields that matter.
- For large reads, inspect targeted slices, not whole files.

## Shared repo safety

- Never allow two agents to write to the same checkout at the same time.
- If a workspace is shared, use either:
  - a lock file / ownership handoff, or
  - separate worktrees/checkouts.
- Tony deploys, Forge builds.
- Mason plans/specs, Clay executes.
- If another agent owns the shared workspace, do not mutate it.

## Result format

Return concise summaries:
- what changed
- where it changed
- success/failure
- blockers
- next step

Worker template

Use this for Forge-ish spawned worker sessions.

## Worker rules

- You are a disposable worker session.
- Finish the assigned task and return a concise result.
- Do not busy-poll for parent status.
- Prefer writing artifacts and logs to files over dumping them into the transcript.
- If the task touches a shared repo, verify ownership before writing.
- Return summary, changed files, blockers, and recommended next step.
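
The "logs to files, not the transcript" rule in both templates can be backed by a small wrapper that agents call from exec. This is a hedged sketch: quiet_run is a hypothetical helper name, and the five-line tail is an arbitrary choice.

```shell
#!/bin/sh
# Run a command with all output sent to a log file, then print a short report:
# one status line plus the last few lines of the log.
# Usage: quiet_run <logfile> <command> [args...]
quiet_run() (
  log="$1"; shift
  status=0
  "$@" >"$log" 2>&1 || status=$?
  echo "exit=$status log=$log lines=$(wc -l <"$log" | tr -d ' ')"
  tail -n 5 "$log"
  exit "$status"
)

# Example: quiet_run /tmp/build.log make -j2
```

However long the build log is, only six lines reach the transcript, and the wrapper keeps the exit status of the wrapped command so the agent can still branch on failure.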

Shared workspace handoff template

If you want a lightweight human-readable handoff convention, use a file like:

# WORKSPACE_LOCK.md

Current owner: Forge
Task: build verification for mission_control_v2_app
Started: 2026-04-13T20:00Z
Expected release handoff: Tony after build artifact is ready
Status: active

That sounds almost stupidly simple, but it works.
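
If agents should take the lock themselves rather than rely on a human editing the file, one possible sketch uses the shell's noclobber option so that creating the file fails when it already exists. The function names are hypothetical, and this assumes a local filesystem where exclusive create behaves as expected.

```shell
#!/bin/sh
# Take a single-writer lock by creating the lock file exclusively.
# With noclobber (set -C), ">" fails if the file already exists.
# Usage: acquire_lock <lockfile> <owner> <task>
acquire_lock() (
  lock="$1"; owner="$2"; task="$3"
  set -C
  {
    echo "# WORKSPACE_LOCK.md"
    echo
    echo "Current owner: $owner"
    echo "Task: $task"
    echo "Started: $(date -u +%Y-%m-%dT%H:%MZ)"
    echo "Status: active"
  } >"$lock"
)

# Release by deleting the file; only the current owner should do this.
release_lock() { rm -f "$1"; }

# Example: acquire_lock WORKSPACE_LOCK.md Forge "build verification" || echo blocked
```

A non-zero return means another agent already holds the workspace, which maps directly onto the "do not mutate it" rule in the orchestrator template.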


3) Gateway split recommendation, when to revisit

My current recommendation

Do not split Tony/Forge off yet.

#

Why:

  • you already have 3 gateways
  • all 3 live on the same 2-core box
  • the real resource bottleneck is the host, not just routing shape
  • surface rebinding cost is real
  • you’ll get more benefit right now from pruning + reset discipline + concurrency caps

Revisit only if one of these becomes true

Trigger A

Tony/Forge still dominate compaction / overflow incidents after pruning and reset fixes.

Trigger B

Main gateway becomes visibly laggy during build waves.

Trigger C


You move a worker gateway to a different host or give KVM2 materially more resources.

Trigger D

Tony/Forge/Mason/Clay need tighter operational isolation than the current shared main hive can give.

If you do split later

First candidates:

  • Tony
  • Forge
  • Mason
  • Clay

But I’d only do it when you can also improve compute isolation, not just shuffle configs on the same 2-core VPS.


#

4) My final opinionated take

For this exact box, the winning stack is:

  • 4.12 everywhere
  • aggressive pruning
  • shorter-lived sessions
  • strict no-log-dumps prompt rules
  • conservative concurrency
  • shared-workspace ownership discipline

Not more fan-out.
Not more concurrency.
Not another gateway split on the same tiny host.

That’s the part I feel strongest about.

#

Docs worth bookmarking:

If this helps, I can do one more pass and turn the config + prompt templates into a clean “Main Gateway v1” bundle with:

  • final openclaw.json block
  • final orchestrator prompt
  • final worker prompt
  • final shared-workspace rule block

all ready to paste.

hidden elbow
#

If we upgrade KVM2 to 4 vCPU / 16 GB, or even to a KVM 8 (8 vCPU cores, 32 GB RAM, 400 GB NVMe disk space, 32 TB bandwidth), how would you adjust the concurrency numbers? I'm guessing you'd go back to your original maxConcurrent: 6 recommendation, which is where we want to be.

We will also be growing instances and agents, so what do you think?

stable plinthBOT
#

Yeah, on bigger boxes I’d loosen it, but not linearly.

Short version:

• 4 vCPU / 16 GB: I’d move to about **maxConcurrent: 4**, **subagents.maxConcurrent: 6**, **maxChildrenPerAgent: 4**
• 8 vCPU / 32 GB: I’d move to about **maxConcurrent: 6**, **subagents.maxConcurrent: 8**, **maxChildrenPerAgent: 5**

So yes, on 8 / 32, my earlier “6” becomes pretty sane again.

My table

| Host size      | agents.defaults.maxConcurrent | agents.defaults.subagents.maxConcurrent | maxChildrenPerAgent |
| -------------- | ----------------------------- | --------------------------------------- | ------------------- |
| 2 vCPU / 8 GB  | 3                             | 4                                       | 3                   |
| 4 vCPU / 16 GB | 4                             | 6                                       | 4                   |
| 8 vCPU / 32 GB | 6                             | 8                                       | 5                   |
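
The table can be turned into a tiny lookup for provisioning scripts. This is a sketch: caps_for_vcpus is a hypothetical helper, and rounding in-between core counts down to the nearest listed tier is an assumption, not something stated in the table.

```shell
#!/bin/sh
# Map a vCPU count to the caps from the table above.
# Output order: maxConcurrent subagents.maxConcurrent maxChildrenPerAgent
# Assumption: core counts between tiers use the next lower tier.
caps_for_vcpus() {
  if   [ "$1" -ge 8 ]; then echo "6 8 5"
  elif [ "$1" -ge 4 ]; then echo "4 6 4"
  else                      echo "3 4 3"
  fi
}

for n in 2 4 8; do
  echo "$n vCPU: $(caps_for_vcpus "$n")"
done
```

On the host itself the argument could come from `nproc`, the same command used for the KVM2 specs.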

Why I still wouldn’t go crazy

Your real bottleneck is not just “agent sessions”, it’s:

• builds
• exec
• SSH
• code tooling
• concurrent writes to shared repos

So even on 8 / 32, I would not jump straight to 10+ top-level concurrent runs unless you’ve also:

• separated worker vs chat roles better
• reduced shared-workspace contention


• kept transcript hygiene tight
• added some swap
• observed stable behavior for a few days

What matters most for concurrency

In practice, the order is:

  1. CPU cores
  2. RAM
  3. swap exists or not
  4. how many heavy exec/build jobs overlap
  5. whether browser/Chrome-style tools are active

Disk and bandwidth are nice, but they are not what sets these concurrency caps.

If you upgrade and also grow agents

#

Then I’d start thinking in gateway roles, not just bigger numbers:

• Gateway A, orchestrator/chat
lower concurrency, cleaner sessions
• Gateway B, workers/execution
higher concurrency, more subagents, more exec
• Gateway C, docs/research/meta
lighter, cheaper, more memory-oriented

That’s where scaling starts to feel clean instead of chaotic.

My opinionated take

If you go to 4 / 16

Good upgrade, worth doing.
I’d still keep the current topology and just relax caps a bit.

#

If you go to 8 / 32

Now it becomes worth revisiting a real worker gateway split later.
That’s the point where Tony/Forge/Mason/Clay on a more execution-oriented gateway starts making sense.

How I’d raise limits safely

Don’t jump all at once.

After upgrade:

  1. set the new caps
  2. run for 48 hours
  3. watch:
    • p95 reply latency
    • compaction frequency
    • CPU saturation during build waves
    • memory pressure
    • queue/backlog feel
  4. only then bump again

So if you land on 8 / 32, I’d start at:

• maxConcurrent: 5
• subagents.maxConcurrent: 7
• maxChildrenPerAgent: 4

and only move to 6 / 8 / 5 after it proves stable.

If you want, I can turn that into a capacity-planning matrix next, like:

• current KVM2
• 4/16 target
• 8/32 target


• recommended gateway roles
• recommended caps per gateway role

hidden elbow
#

Going KVM 8 (8 vCPU / 32 GB). Please send the Main Gateway v1 bundle with caps tuned for that box. We'll start conservative per your advice (maxConcurrent: 5, subagents: 7, children: 4) and step up to 6/8/5 after 48h observation. And yes — adding a swapfile.

stable plinthBOT
#

Nice, that’s a much healthier target box.

Here’s the Main Gateway v1 bundle for KVM 8 (8 vCPU / 32 GB), tuned to start conservative at 5 / 7 / 4, then step to 6 / 8 / 5 after 48h if stable.


Main Gateway v1

1) openclaw.json merge block

Use this as the main gateway baseline.

{
  agents: {
    defaults: {
      // Start conservative on KVM 8. After 48h stable, bump to 6.
      maxConcurrent: 5,

      contextInjection: "continuation-skip",
      bootstrapMaxChars: 12000,

      compaction: {
        memoryFlush: true,
        reserveTokensFloor: 40000
      },

      contextPruning: {
        mode: "cache-ttl",
        ttl: "5m",
        keepLastAssistants: 3,
        minPrunableToolChars: 8000,

        softTrimRatio: 0.3,
        hardClearRatio: 0.5,

        softTrim: {
          maxChars: 2000,
          headChars: 800,
          tailChars: 800
        },

        hardClear: {
          enabled: true,
          placeholder: "[Old tool result cleared]"
        }
      },

      subagents: {
        // Start conservative on KVM 8. After 48h stable, bump to 8.
        maxConcurrent: 7,

        // Start conservative. After 48h stable, bump to 5.
        maxChildrenPerAgent: 4,

        maxSpawnDepth: 2,
        archiveAfterMinutes: 60,
        runTimeoutSeconds: 1800
      }
    },

    list: [
      {
        id: "tony",
        subagents: {
          allowAgents: ["tony", "forge"],
          model: "openai/gpt-5.3-codex"
        }
      },
      {
        id: "forge",
        subagents: {
          allowAgents: ["forge", "tony"],
          model: "openai/gpt-5.3-codex"
        }
      },
      {
        id: "mason",
        subagents: {
          allowAgents: ["mason", "clay"],
          model: "openai/gpt-5.3-codex"
        }
      },
      {
        id: "clay",
        subagents: {
          allowAgents: ["clay", "mason"],
          model: "openai/gpt-5.3-codex"
        }
      },
      {
        id: "banner",
        subagents: {
          allowAgents: ["banner"]
        }
      }
    ]
  },

  session: {
    resetByType: {
      group: { mode: "idle", idleMinutes: 120 },
      thread: { mode: "idle", idleMinutes: 240 },
      direct: { mode: "idle", idleMinutes: 360 }
    },

    parentForkMaxTokens: 60000,

    maintenance: {
      mode: "enforce",
      pruneAfter: "14d",
      maxEntries: 1000,
      rotateBytes: "10mb",
      resetArchiveRetention: "14d",
      maxDiskBytes: "4gb",
      highWaterBytes: "3gb"
    },

    threadBindings: {
      enabled: true,
      idleHours: 24,
      maxAgeHours: 72
    }
  },

  channels: {
    discord: {
      threadBindings: {
        enabled: true,
        idleHours: 24,
        maxAgeHours: 72,
        spawnSubagentSessions: true
      }
    }
  }
}

After 48h, if stable

Change just these 3 lines:

agents.defaults.maxConcurrent = 6
agents.defaults.subagents.maxConcurrent = 8
agents.defaults.subagents.maxChildrenPerAgent = 5

2) Orchestrator prompt block

#

Use this for Tony, Mason, Pepper-style agents.

## Main Gateway orchestration rules

- `sessions_spawn` is fire-and-forget.
- After spawning a worker, continue with other useful work unless explicitly told to wait.
- Do not poll `sessions_list`, `sessions_history`, or `subagents list` in loops.
- If waiting for completion is actually required, use `sessions_yield`.
- Prefer `lightContext: true` for spawned workers unless the worker truly needs full session history.
- Prefer small focused workers over one giant worker.

## Transcript hygiene

- Never dump raw build logs, full API payloads, or giant file contents into the transcript unless explicitly asked.
- For shell work, redirect verbose output to a file and report only:
  - status
  - key findings
  - final relevant lines
- For HTTP/API work, save payloads to a file and extract only the fields that matter.
- For large reads, inspect targeted slices, not whole files.

## Shared workspace safety

- Never allow two agents to write to the same checkout at the same time.
- If a workspace is shared, use either:
  - a lock file / explicit ownership handoff, or
  - separate worktrees/checkouts.
- Tony deploys, Forge builds.
- Mason plans/specs, Clay executes.
- If another agent owns the shared workspace, do not mutate it.

## Result contract

Return concise summaries:
- what changed
- where it changed
- success/failure
- blockers
- next step

3) Worker prompt block

Use this for Forge-ish and execution worker sessions.

## Worker session rules

- You are a disposable worker session.
- Finish the assigned task and return a concise result.
- Do not busy-poll for parent status.
- Prefer writing artifacts and logs to files over dumping them into the transcript.
- If the task touches a shared repo, verify ownership before writing.
- Return:
  - summary
  - changed files
  - blockers
  - recommended next step

#

4) Shared workspace rule block

This is the part I’d make explicit.

## Shared workspace ownership

The following workspaces are shared and must be treated as single-writer:

- Tony + Forge:
  - james-agent-surface
  - mission_control_v2_app

- Mason + Clay:
  - shared web build workspace

Rules:
- Only one agent may actively write in a shared checkout at a time.
- Use a lock file or separate worktree before modifying shared code.
- Tony deploys, Forge builds.
- Mason plans/specs, Clay executes.
- If ownership is unclear, stop and return blocked status instead of guessing.

5) Simple lock file template

Drop this into shared repos if you want a dead-simple coordination pattern.

# WORKSPACE_LOCK.md

Current owner: Forge
Task: build verification for mission_control_v2_app
Started: 2026-04-13T20:00Z
Expected handoff: Tony after build artifact is ready
Status: active

That’s not fancy, but it prevents a lot of dumb collisions.


6) Stability checklist for the first 48h

Before you step from 5 / 7 / 4 to 6 / 8 / 5, I’d want all of these true:

  • no context overflow loop
  • no repeated compaction storms
  • no user-visible reply lag during build waves
  • no main gateway memory pressure
  • no shared-workspace collisions between Tony/Forge or Mason/Clay
  • no “why is the agent waiting on its own spawned child?” weirdness

If those stay clean for 48h, bump the caps.


7) My final opinion on this bundle

This is a good v1 for the bigger box.

The main wins are still:

  • aggressive pruning
  • shorter-lived sessions
  • clear spawn behavior
  • shared-workspace discipline

The bigger box lets you loosen concurrency, but it does not remove the need for those rules.

If you want, next I can do a Meta/Doc lighter bundle too, with milder caps and a simpler prompt set.

hidden elbow
#

Coming in a few messages.

For Krill 🦐:

KVM 8 is live and verified — 8 vCPU, 32 GB RAM, 400 GB NVMe, 4G swap. Tony confirmed. Main Gateway v1 bundle received and QA'd. Three flags from QA:

  1. agents.list is partial: only 5 of 18 agents. We'll MERGE your subagent blocks into existing per-agent configs, not replace the whole list.
  2. subagents.model: "openai/gpt-5.3-codex" for spawned workers: we will verify that OpenClaw handles per-spawn model overrides this way before applying.
  3. Shared workspace list: we'll expand it as needed during rollout.

Now I need the Meta/Doc lighter bundle you offered, with updated context:

Meta You instance (same KVM 8 box, separate gateway on port 18810, openclaw-meta-you.service under metayou Linux user):

  • Currently: 3 agents — James (Opus 4.6), Atlas (Sonnet 4.6), Blaire (GPT-5.4 via OpenAI Pro)
  • Coming soon: FINN — a financial copilot agent. Voice + text + file upload (CSV/PDF financial statements). Dashboard, weekly reports, plain-language Q&A. Pilot first, then productised for multiple clients.
  • Growth plan: ~4 more agents beyond FINN, totalling ~7-8 on this gateway. One will likely be a shared services / builder agent (like our Mason/Clay pattern on main).
  • Auth: Anthropic Max 5x ($100/mo) for James/Atlas. Blaire + future agents on OpenAI Pro. FINN will likely be OpenAI Pro (GPT-5.4 or 5.3-Codex).
  • Key constraint: Anthropic agents need lean bootstrap (proven at 3.3KB/agent). Output costs 5x input on Anthropic.
  • Voice: FINN will use voice (like our existing Voice agent on main — Gemini Flash via ElevenLabs). Voice sessions are typically short but can spike.

One more message coming below, wait to reply.

#
  • File processing: FINN will ingest uploaded financial docs (PDFs, CSVs). This means disk I/O and potentially larger context windows during analysis.

Doc instance (separate box, own OpenClaw install):

  • Currently: 2 agents — Doc (GPT-5.4), Intel (GPT-5.3-Codex)

  • Growth plan: Staying at 2. No expansion planned. Light workload — audits, research, version watch.

  • Just needs a sensible lighter config that matches the main bundle's patterns (pruning, session management) but with minimal concurrency.

Jordan/Sabain instance (port 18800, Docker, same KVM 8 box):

  • Currently: 3 agents — jordan, vic, russ. Separate client deployment.

  • Growth plan: Will grow by a few agents as Sabain adds capability. Separate config, separate gateway.

  • Would appreciate a starter template for this one too — something conservative that Sabain can scale up.

Overall fleet on KVM 8 after growth:

  • Main hive: 18 agents (gateway on 18789)

  • Meta You: ~7-8 agents (gateway on 18810)

  • Jordan: ~5-6 agents (gateway on 18800)

  • Total: ~30-32 agents sharing 8 vCPU / 32 GB RAM

At what point does this box need to become a dedicated server or split into two boxes? FINN with voice + file processing is the heaviest new workload.

Please provide:

  1. Meta You lighter bundle — tuned for 7-8 agents, mixed Anthropic/OpenAI, voice + file upload workload

  2. Doc lighter bundle — tuned for 2 agents, minimal

  3. Jordan starter template — conservative, growable to ~6 agents

  4. Your opinion on KVM 8 ceiling — when do we need to split or upgrade given 30+ agents with FINN's workload?

stable plinthBOT
#

Got it, I’ll wait for the rest before replying.

#

Yep, here’s the next layer. And good catch on the earlier agents.list, yes, treat all of these as merge blocks, not full replacement files.

Fleet-level take first

On KVM 8, 8 vCPU / 32 GB / 4G swap, this fleet is still workable if you cap active concurrency sanely.

What matters is not “30+ agents exist”, it’s:

  • how many are active at once
  • how many are doing heavy exec
  • how many are doing voice / file analysis
  • whether one gateway can starve another

My recommended starting gateway caps across this box:

  • Main: 5 / 7 / 4
  • Meta: 3 / 4 / 3
  • Jordan: 2 / 3 / 2
  • Doc: separate box, can ignore for KVM8 pressure

That gives you a sane host-wide starting point without pretending all gateways deserve the same budget.
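
As a quick sanity check on those starting caps, the worst case where every gateway on KVM 8 saturates its top-level cap at once can be worked out as below. This is only arithmetic under an assumption: a capped run is not the same as a busy core, since many runs mostly wait on model APIs, so the ratio is a rough ceiling rather than a load prediction. fleet_budget is a hypothetical helper name.

```shell
#!/bin/sh
# Sum the top-level maxConcurrent caps of gateways sharing one host
# and relate the total to the core count.
# Usage: fleet_budget <vcpus> <cap> [<cap> ...]
fleet_budget() (
  cores="$1"; shift
  total=0
  for cap in "$@"; do total=$((total + cap)); done
  awk -v t="$total" -v c="$cores" \
    'BEGIN { printf "top-level runs: %d on %d vCPU (%.2f per core)\n", t, c, t / c }'
)

# Main 5, Meta 3, Jordan 2 on the 8 vCPU box (Doc is on a separate box).
fleet_budget 8 5 3 2
# prints: top-level runs: 10 on 8 vCPU (1.25 per core)
```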


1) Meta You lighter bundle

Use case: mixed Anthropic/OpenAI, lean bootstrap, light growth to ~7–8 agents, FINN voice + file upload, one future builder/shared-services agent.

Meta openclaw.json merge block

{
  agents: {
    defaults: {
      maxConcurrent: 3,

      // Anthropic-friendly, keeps repeat bootstrap injection down.
      contextInjection: "continuation-skip",

      // Keep this tighter than main, since you already proved lean bootstraps.
      bootstrapMaxChars: 6000,

      compaction: {
        memoryFlush: true,
        reserveTokensFloor: 40000
      },

      contextPruning: {
        mode: "cache-ttl",
        ttl: "5m",

        keepLastAssistants: 3,
        minPrunableToolChars: 6000,

        softTrimRatio: 0.3,
        hardClearRatio: 0.5,

        softTrim: {
          maxChars: 1500,
          headChars: 600,
          tailChars: 600
        },
        hardClear: {
          enabled: true,
          placeholder: "[Old tool result cleared]"
        }
      },

      subagents: {
        maxConcurrent: 4,
        maxChildrenPerAgent: 3,
        maxSpawnDepth: 2,
        archiveAfterMinutes: 45,
        runTimeoutSeconds: 1200
      }
    }
  },

  session: {
    resetByType: {
      group: { mode: "idle", idleMinutes: 120 },
      thread: { mode: "idle", idleMinutes: 180 },
      direct: { mode: "idle", idleMinutes: 300 }
    },

    parentForkMaxTokens: 50000,

    maintenance: {
      mode: "enforce",
      pruneAfter: "14d",
      maxEntries: 600,
      rotateBytes: "10mb",
      resetArchiveRetention: "14d",
      maxDiskBytes: "2gb",
      highWaterBytes: "1.5gb"
    },

    threadBindings: {
      enabled: true,
      idleHours: 24,
      maxAgeHours: 48
    }
  },

  channels: {
    discord: {
      threadBindings: {
        enabled: true,
        idleHours: 24,
        maxAgeHours: 48,
        spawnSubagentSessions: true
      }
    }
  }
}

Meta notes

I’d run Meta a little tighter than Main because:

  • Anthropic output is expensive
  • James/Atlas want lean context
  • FINN’s file sessions can spike context fast
  • voice sessions are short, but bursty

FINN-specific operating rule

I would strongly encourage this pattern for FINN:

  • uploaded PDF/CSV comes in
  • FINN spawns a short-lived analysis worker
  • worker extracts + summarizes
  • FINN responds from the summary
  • raw parsed content does not linger in the main session

That matters a lot.
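
For the CSV half of that pattern, the analysis worker could hand back something like the summary below instead of the raw file. This is a naive sketch under stated assumptions: one record per line, no quoted commas, and summarize_csv is a hypothetical helper. PDF extraction is out of scope here.

```shell
#!/bin/sh
# Emit a compact summary of a CSV instead of its full contents:
# the header, the number of data rows, and a few sample rows.
# Naive: assumes one record per line and no quoted commas.
# Usage: summarize_csv <file> [sample_rows]
summarize_csv() (
  file="$1"
  sample="${2:-3}"
  awk -F, -v n="$sample" '
    NR == 1 { print "columns(" NF "): " $0; next }
    NR <= n + 1 { rows[NR] = $0 }
    END {
      print "data rows: " (NR > 0 ? NR - 1 : 0)
      for (i = 2; i <= n + 1 && i <= NR; i++) print "sample: " rows[i]
    }
  ' "$file"
)

# Example: summarize_csv statement.csv 3
```

The worker returns a handful of lines regardless of file size, which is the point of the pattern: FINN answers from the summary and the parsed rows never enter the long-lived session.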


2) Doc lighter bundle

#

Use case: 2 agents, separate box, light audits/research/version watch, no growth planned.

Doc openclaw.json merge block

{
  agents: {
    defaults: {
      maxConcurrent: 2,
      contextInjection: "continuation-skip",
      bootstrapMaxChars: 8000,

      compaction: {
        memoryFlush: true,
        reserveTokensFloor: 40000
      },

      contextPruning: {
        mode: "cache-ttl",
        ttl: "10m",

        keepLastAssistants: 3,
        minPrunableToolChars: 8000,

        softTrimRatio: 0.3,
        hardClearRatio: 0.5,

        softTrim: {
          maxChars: 1500,
          headChars: 600,
          tailChars: 600
        },

        hardClear: {
          enabled: true,
          placeholder: "[Old tool result cleared]"
        }
      },

      subagents: {
        maxConcurrent: 2,
        maxChildrenPerAgent: 2,
        maxSpawnDepth: 2,
        archiveAfterMinutes: 45,
        runTimeoutSeconds: 1200
      }
    }
  },

  session: {
    resetByType: {
      group: { mode: "idle", idleMinutes: 180 },
      thread: { mode: "idle", idleMinutes: 240 },
      direct: { mode: "idle", idleMinutes: 480 }
    },

    parentForkMaxTokens: 50000,

    maintenance: {
      mode: "enforce",
      pruneAfter: "21d",
      maxEntries: 300,
      rotateBytes: "10mb",
      resetArchiveRetention: "21d",
      maxDiskBytes: "1gb",
      highWaterBytes: "800mb"
    },

    threadBindings: {
      enabled: true,
      idleHours: 24,
      maxAgeHours: 72
    }
  },

  channels: {
    discord: {
      threadBindings: {
        enabled: true,
        idleHours: 24,
        maxAgeHours: 72,
        spawnSubagentSessions: true
      }
    }
  }
}

Doc notes

This one should stay boring.

I would optimize Doc for:

  • low background noise
  • good transcript hygiene
  • light concurrency
  • reliable long-ish research threads without letting them go immortal

3) Jordan starter template

Use case: separate client deployment, conservative today, growable to ~6 agents, same KVM8 host.

Jordan openclaw.json merge block

{
  agents: {
    defaults: {
      maxConcurrent: 2,
      contextInjection: "continuation-skip",
      bootstrapMaxChars: 8000,

      compaction: {
        memoryFlush: true,
        reserveTokensFloor: 40000
      },

      contextPruning: {
        mode: "cache-ttl",
        ttl: "5m",

        keepLastAssistants: 3,
        minPrunableToolChars: 8000,

        softTrimRatio: 0.3,
        hardClearRatio: 0.5,

        softTrim: {
          maxChars: 1800,
          headChars: 700,
          tailChars: 700
        },

        hardClear: {
          enabled: true,
          placeholder: "[Old tool result cleared]"
        }
      },

      subagents: {
        maxConcurrent: 3,
        maxChildrenPerAgent: 2,
        maxSpawnDepth: 2,
        archiveAfterMinutes: 45,
        runTimeoutSeconds: 1200
      }
    }
  },

  session: {
    resetByType: {
      group: { mode: "idle", idleMinutes: 120 },
      thread: { mode: "idle", idleMinutes: 180 },
      direct: { mode: "idle", idleMinutes: 300 }
    },

    parentForkMaxTokens: 50000,

    maintenance: {
      mode: "enforce",
      pruneAfter: "14d",
      maxEntries: 500,
      rotateBytes: "10mb",
      resetArchiveRetention: "14d",
      maxDiskBytes: "1.5gb",
      highWaterBytes: "1.2gb"
    },

    threadBindings: {
      enabled: true,
      idleHours: 24,
      maxAgeHours: 48
    }
  },

  channels: {
    discord: {
      threadBindings: {
        enabled: true,
        idleHours: 24,
        maxAgeHours: 48,
        spawnSubagentSessions: true
      }
    }
  }
}

Jordan scale-up path

Start at 2 / 3 / 2.

If Jordan grows to ~6 agents and behaves well for a week, I’d move it to:

  • maxConcurrent: 3
  • subagents.maxConcurrent: 4
  • maxChildrenPerAgent: 3

No faster than that on the shared box.

#

4) My opinion on the KVM8 ceiling

Short version

KVM8 is fine for this fleet in the short term.
It is not the forever box if FINN becomes real.

The ceiling is not “32 agents exist”.
The ceiling is more like:

  • Main doing 2 heavy exec/build jobs
  • Meta doing FINN voice + file parsing
  • Jordan doing client work
  • all at the same time

That is where the box starts to get honest with you.

#

I would stay on KVM8 if:

  • FINN is still pilot-level
  • voice is intermittent
  • file analysis sessions are limited
  • heavy exec overlap stays moderate
  • reply latency stays acceptable

I would split to two boxes when any of these happen

Trigger 1

FINN gets real client traffic
Meaning: more than occasional PDF/CSV analysis, or multiple voice/file sessions in the same day.

Trigger 2

One gateway can starve another
Example: Main build waves make Meta or Jordan feel laggy.

#

Trigger 3

CPU or RAM starts living hot
My rough “time to split” thresholds:

  • sustained CPU pressure in normal business hours
  • RAM regularly above ~75%
  • swap actually being leaned on, not just sitting there
  • p95 response latency noticeably degrading during busy windows
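
The RAM and swap thresholds can be checked with a few lines of awk over /proc/meminfo. This is a sketch assuming a Linux host: mem_pressure is a hypothetical helper, "used" is computed as MemTotal minus MemAvailable, and the HOT flag applies only the ~75% RAM figure from the list above.

```shell
#!/bin/sh
# Report RAM and swap usage percentages from a /proc/meminfo-style file.
# Flags HOT when RAM use exceeds 75%, mirroring the rough threshold above.
# Usage: mem_pressure [meminfo_file]
mem_pressure() (
  awk '
    $1 == "MemTotal:"     { total = $2 }
    $1 == "MemAvailable:" { avail = $2 }
    $1 == "SwapTotal:"    { stotal = $2 }
    $1 == "SwapFree:"     { sfree = $2 }
    END {
      ram  = total  ? 100 * (total - avail) / total   : 0
      swap = stotal ? 100 * (stotal - sfree) / stotal : 0
      printf "ram_used=%d%% swap_used=%d%% %s\n", ram, swap, (ram > 75 ? "HOT" : "ok")
    }
  ' "${1:-/proc/meminfo}"
)

# Example: mem_pressure   (reads /proc/meminfo by default)
```

Sampling this every few minutes during business hours gives a concrete trace for "regularly above ~75%" and shows whether swap is actually being used or just sitting there.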

Trigger 4

Jordan needs stronger client isolation
This is not just performance, it’s operational cleanliness.
Client deployment sharing a box with your internal hive is okay early, not ideal forever.


5) Which gateway should move first?

#

There are two reasonable answers.

If you care most about performance risk

Move Meta / FINN first.
Why:

  • voice + file analysis is the heaviest new workload
  • context can spike fast
  • FINN is the most likely “surprise resource consumer”

If you care most about tenant isolation

Move Jordan first.
Why:

  • it’s already a separate client deployment
  • cleaner operational boundary
  • easier to reason about support/noise/isolation

My recommendation

#

If FINN becomes real before Jordan becomes large, move Meta first.
If Jordan becomes customer-critical first, move Jordan first.


6) My fleet recommendation, blunt version

For now

  • keep Main on KVM8 with 5 / 7 / 4
  • keep Meta on KVM8 with 3 / 4 / 3
  • keep Jordan on KVM8 with 2 / 3 / 2
  • keep Doc separate and boring

When FINN graduates from pilot

That’s when I’d stop treating KVM8 as the forever home.

At that point, I’d want either:

  • Box A: Main + maybe Doc
  • Box B: Meta/FINN + Jordan
    or
  • Box A: Main + Jordan
  • Box B: Meta/FINN

depending on whether performance or client isolation is the bigger concern.


If you want, next I can package this into a Fleet v1 matrix:

  • gateway
  • role
  • caps
  • growth path
  • first split trigger
  • first move candidate

which would probably be the cleanest “operating sheet” for your team.