# Architecture Questions


waxen canopy (BOT) Mar 7, 2026, 5:49 PM

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

old jackal (BOT) Mar 7, 2026, 5:50 PM

Option C (hybrid) is what I’ve seen scale best, with one extra twist: treat your workspace like a small knowledge base with “hot cache + canonical sources + retrieval scaffolding”, not like one big memory blob.

Here’s a pattern that stays fast/precise with QMD hybrid search and doesn’t explode token usage.

The model that scales: “Hot cache + routing + canon”

1) `MEMORY.md` = hot cache of invariants (not a dump, not only pointers)

Keep it ~50–200 lines, but make it high signal:

Non-negotiables (compliance, pricing floors/ceilings, things you must never do)
Definitions / glossary of core terms (the exact words people will use in prompts)
Very compact business rules (1–3 lines each)
Routing links to canonical docs

Think of it like: “If the agent forgot everything else, what must still be true?”

If you keep it to pointers only (Option A), you risk the agent “forgetting to look” when it doesn’t realize a rule exists. If you dump everything into MEMORY.md (Option B), it becomes stale and noisy, and retrieval quality drops because everything looks equally important.

2) Canonical docs = single source of truth per topic

Have one “owner” file per domain:

pricing/pricing.md (the canonical pricing logic)
compliance/compliance.md
product/product-spec.md
sales/playbook.md
clients/<client>.md

Then allow satellites if needed (FAQ, examples, edge cases), but the canon should be obvious.

3) Routing indexes = one per folder/domain, not one mega-index

This is the piece people skip. With hundreds of files, a single global index gets “too shallow”.

Example:

INDEX.md (top-level map, 20–60 lines)
pricing/INDEX.md
clients/INDEX.md
product/INDEX.md
sales/INDEX.md

Each index should include:

a 3–8 line summary (“what lives here”)
links to canonical docs
common queries / synonyms (“pricing”, “discount”, “quote”, “margin”, “MSRP”, etc.)

Those “synonyms + common queries” lines help BM25 a lot and also help embedding retrieval anchor to the right area.
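A quick way to see why the synonyms line pays off on the BM25 side: below is textbook Okapi BM25 in plain Python. It is not QMD's actual implementation (not reproduced here); the three INDEX.md bodies are made up for illustration, and k1=1.5 / b=0.75 are just the commonly used defaults.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase alphanumeric tokens; a deliberately naive tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: dict, k1: float = 1.5, b: float = 0.75) -> dict:
    """Score every doc against the query with classic Okapi BM25."""
    tokenized = {name: tokenize(body) for name, body in docs.items()}
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized.values()) / n_docs

    # Document frequency: how many docs contain each term.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    scores = {}
    for name, tokens in tokenized.items():
        tf = Counter(tokens)
        length_norm = 1 - b + b * len(tokens) / avgdl
        score = 0.0
        for term in tokenize(query):
            freq = tf[term]
            if freq == 0:
                continue  # a term the doc never mentions contributes nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * freq * (k1 + 1) / (freq + k1 * length_norm)
        scores[name] = score
    return scores


# Hypothetical INDEX.md bodies; only pricing/ carries a synonyms line.
INDEXES = {
    "pricing/INDEX.md": "Pricing logic. Synonyms: pricing, discount, quote, margin, MSRP",
    "product/INDEX.md": "Product overview, core spec, roadmap notes",
    "sales/INDEX.md": "Sales playbook and objection handling",
}

scores = bm25_scores("can we discount this quote", INDEXES)
print(max(scores, key=scores.get))  # pricing/INDEX.md
```

Drop the synonyms line from the pricing index and it scores 0 for that query, which is exactly the “agent never lands on the rule” failure the index is meant to prevent.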

Make QMD hybrid retrieval love you (practical tactics)

Use “retrieval-friendly headers”

In canonical docs, start with a compact block like:

## Summary
## Rules (do / don’t)
## Decision / Rationale
## Examples / Test cases
## Edge cases
## Related docs

This gives the reranker clean “answer-shaped” chunks.

Put examples/test cases next to rules (huge)

For pricing and policies, add 3–10 concrete examples:

“If customer is X and term is Y → price is Z”
“Discount approval required if >15%”
“Annual prepay + multi-seat bundle example”

Agents apply rules way more reliably when the doc contains “unit tests” in plain language.

Add a glossary / alias map

Have GLOSSARY.md or a small section in MEMORY.md:

“ACV = annual contract value”
“Pro plan = ‘Growth’ in older docs”
“Client ‘Umbrella’ sometimes called ‘Umb’ in Slack”

This directly improves both BM25 matching and semantic retrieval across months of naming drift.
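If you want the alias map to do work at query time too, one option is simple query expansion: append known aliases before the query hits search. A minimal sketch, assuming the glossary has already been loaded into a dict; the ALIASES contents just mirror the examples above, and how QMD itself treats synonyms isn't covered here.

```python
import re

# Hypothetical alias map mirroring the GLOSSARY.md examples (keys lowercased).
ALIASES = {
    "acv": ["annual contract value"],
    "growth": ["pro plan"],
    "pro plan": ["growth"],
    "umb": ["umbrella"],
}


def expand_query(query: str, aliases: dict = ALIASES) -> str:
    """Append known aliases for any glossary term found in the query."""
    lowered = query.lower()
    extra = []
    for term, alternatives in aliases.items():
        # Whole-word match so "umb" does not fire inside "umbrella".
        if re.search(rf"\b{re.escape(term)}\b", lowered):
            extra.extend(alt for alt in alternatives if alt not in extra)
    return f"{query} {' '.join(extra)}" if extra else query


print(expand_query("What is the ACV for Umb?"))
# What is the ACV for Umb? annual contract value umbrella
```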

Prefer smaller, denser docs over huge docs

Hundreds of files is fine. A few monster files are usually worse because:

chunking gets messy
lots of irrelevant chunks match the query
reranking has more noise to fight

Rule of thumb: if a doc becomes “scroll forever”, split into canon + examples + edge cases.

Put “updated” metadata in the doc, not just in filenames

A short header block near the top:

Last updated: 2026-03-07
Owner: Sales Ops
Applies to: EU + US (B2B only)

That helps the agent resolve conflicts when it retrieves older stuff.
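As a sketch of using that metadata mechanically: parse the “Last updated:” line and prefer the newest doc when two retrieved docs disagree. This assumes the exact “Last updated: YYYY-MM-DD” format shown above; the file paths and contents are hypothetical.

```python
import re
from datetime import date
from typing import Optional

LAST_UPDATED = re.compile(r"^Last updated:\s*(\d{4}-\d{2}-\d{2})\s*$", re.MULTILINE)


def last_updated(text: str) -> Optional[date]:
    """Parse the 'Last updated: YYYY-MM-DD' header line, if present."""
    match = LAST_UPDATED.search(text)
    return date.fromisoformat(match.group(1)) if match else None


def newest(docs: dict) -> str:
    """Pick the most recently updated doc; undated docs rank as oldest."""
    return max(docs, key=lambda path: last_updated(docs[path]) or date.min)


retrieved = {  # hypothetical conflicting hits for "discount cap"
    "sources/notion/pricing-2025.md": "Last updated: 2025-06-01\nDiscount cap: 20%",
    "kb/pricing/pricing.md": "Last updated: 2026-03-07\nDiscount cap: 15%",
    "logs/scratch/ideas.md": "Maybe raise the discount cap?",
}
print(newest(retrieved))  # kb/pricing/pricing.md
```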

Suggested structure (simple but scalable)

MEMORY.md
INDEX.md

pricing/
  INDEX.md
  pricing.md
  discount-policy.md
  examples.md

clients/
  INDEX.md
  acme.md
  globex.md
  _template.md

product/
  INDEX.md
  product-overview.md
  spec-core.md
  roadmap-notes.md

sales/
  INDEX.md
  playbook.md
  objection-handling.md

compliance/
  INDEX.md
  compliance.md
  data-handling.md

What I’d do for your exact goals (fast recall, low token waste)

MEMORY.md: keep ~30–80% “routing” + ~20–70% “critical invariants”.
- Compliance + “never do X” lives here.
- Pricing guardrails live here (the “shape” of pricing).
Domain indexes: add synonyms/common questions explicitly.
Canonical docs: include rules + examples + edge cases.
Client profiles: enforce a template so retrieval is predictable.

If you want, paste your current MEMORY.md (sanitized) and your folder tree (just names), and I’ll suggest a concrete refactor plan + an INDEX.md / pricing/INDEX.md template that works well with hybrid retrieval.

old jackal (BOT) Mar 7, 2026, 8:14 PM

If you’re already doing Notion → QMD sync, I’d treat that synced corpus as your raw/source layer, and then build a small, curated “agent layer” on top of it. That tends to beat both “only pointers” and “dump everything into MEMORY.md”, because Notion exports are usually huge + noisy (meeting notes, brainstorms, outdated drafts), which hurts retrieval precision over time.

The pattern that works well with Notion sync

1) Keep the Notion sync in its own folder and mentally label it “source, not handbook”
Example:

sources/notion/** (auto-synced, messy, high recall, lower precision)
kb/** or handbook/** (your curated canon for the agent)

Then you optimize the agent to hit kb/** first, and fall back to Notion only when needed.
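The “kb first, Notion as fallback” routing can be written as a tiered search. This is a sketch: `search(query, prefix)` stands in for whatever hybrid search call your setup exposes (hypothetical signature), and the 0.5 score threshold is an arbitrary placeholder you would tune.

```python
from typing import Callable, List, Optional, Tuple

Hit = Tuple[str, float]  # (path, relevance score)
SearchFn = Callable[[str, str], List[Hit]]


def tiered_search(query: str, search: SearchFn, min_score: float = 0.5,
                  tiers=("kb/", "sources/notion/")) -> Tuple[Optional[str], List[Hit]]:
    """Return hits from the first tier whose results clear min_score."""
    for prefix in tiers:
        hits = [hit for hit in search(query, prefix) if hit[1] >= min_score]
        if hits:
            return prefix, sorted(hits, key=lambda hit: hit[1], reverse=True)
    return None, []  # nothing good enough anywhere: say so instead of guessing


# Toy stand-in for a real hybrid search call (hypothetical scores).
CORPUS = {
    "kb/pricing/pricing.md": {"discount": 0.9},
    "sources/notion/meetings/q1-offsite.md": {"discount": 0.7, "offsite": 0.8},
}


def fake_search(query: str, prefix: str) -> List[Hit]:
    return [(path, terms[query]) for path, terms in CORPUS.items()
            if path.startswith(prefix) and query in terms]


print(tiered_search("discount", fake_search))  # ('kb/', [('kb/pricing/pricing.md', 0.9)])
```

Note that the Notion meeting note never surfaces for “discount”, because the kb tier already answered.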

2) Make MEMORY.md point to the curated canon, not to raw Notion pages
In MEMORY.md, include:

10–30 invariants (compliance, pricing guardrails, “never do X”)
a short routing table into kb/**
“self-check triggers” like: “Before quoting price → consult kb/pricing/pricing.md (not meeting notes)”

3) In kb/**, write “answer-shaped” docs that summarize + link to the Notion source
For each critical area, create a canonical doc that has:

Summary
Rules
Examples / test cases
Edge cases
Source links (paths into the Notion sync)

So you’re not duplicating everything—just extracting the bits the agent must apply correctly.

Why this helps specifically with Notion sync

Notion spaces inevitably accumulate contradictory content (old specs, abandoned ideas). Hybrid search will retrieve it unless you give it cleaner targets.
A curated kb/** gives the reranker “clean chunks” to pick.
You avoid link-rot: Notion page titles/paths can change on sync; your kb/** filenames stay stable.

Extra wins (low effort, big payoff)

Put meeting notes / brainstorms in a separate sources/notion/meetings/** (or exclude them if your sync supports it). That alone can massively improve precision.

Standardize a few headings in Notion pages going forward (Summary, Rules, Examples, Owner, Last updated). Those headings survive sync and make retrieval much more reliable.

If you paste a sanitized high-level tree of what your Notion sync produces (folder names + a few representative filenames), I can suggest a concrete kb/** structure and what should live in MEMORY.md vs kb/INDEX.md vs “leave it in Notion only”.

Yes, but in practice it’s usually an “agent-driven build step” triggered by either a webhook-like event or a lightweight poll, because Notion change webhooks aren’t universally available or reliable, depending on what you’re using.

A setup that works well

1) Trigger
Pick one:

Polling (most common / simplest): every N minutes, query Notion for pages/databases where last_edited_time > last_run.
Webhook via a bridge: use something like n8n / Make / Zapier to “watch Notion” and hit your webhook when a DB item/page changes (depends on what those tools can watch in your workspace).
Git-based trigger: if your “Notion sync → markdown” already runs somewhere and commits changes, then a git push can trigger the handbook rebuild.
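For the polling option, the core logic is small. A minimal sketch, assuming page records shaped like Notion API page objects (an `id` plus an ISO 8601 `last_edited_time`); fetching them from Notion and persisting the watermark between runs are left out.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """ISO 8601 -> aware datetime; 'Z' is rewritten because Python < 3.11 rejects it."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def poll_changes(pages: list, last_run: datetime):
    """Return (ids edited after last_run, new watermark for the next poll)."""
    changed = [p for p in pages if parse_ts(p["last_edited_time"]) > last_run]
    watermark = max((parse_ts(p["last_edited_time"]) for p in changed), default=last_run)
    return [p["id"] for p in changed], watermark


pages = [  # hypothetical records with Notion-style fields
    {"id": "pricing-page", "last_edited_time": "2026-03-07T17:00:00.000Z"},
    {"id": "discount-page", "last_edited_time": "2026-03-07T18:30:00.000Z"},
]
ids, watermark = poll_changes(pages, parse_ts("2026-03-07T18:00:00Z"))
print(ids, watermark.isoformat())  # ['discount-page'] 2026-03-07T18:30:00+00:00
```

Advancing the watermark to the newest edit time actually seen (rather than “now”) avoids skipping edits that land while the poll is running; if your timestamps are coarse, you may prefer `>=` plus de-duplication by id.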

2) Sync layer (deterministic)

Update sources/notion/** with the latest markdown export.
Ideally keep it incremental (only changed pages) to reduce churn.

3) Handbook build (agent job)
Run an agent/job that:

reads only changed Notion files
updates kb/** “canon” docs (pricing rules, compliance, playbooks, etc.)
preserves strict structure (Summary, Rules, Examples, Edge cases)
adds source citations/links back to the Notion-synced files so you can audit where a rule came from

4) Guardrails (highly recommended)
Autogenerating business rules can go sideways if the agent “helpfully rewrites” things. Two guardrails that save pain:

write generated output to kb/_generated/** and keep a small, human-approved kb/** layer on top, or
require the job to produce a “diff + change log” for review (even if you auto-apply most changes)
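For the “diff + change log” guardrail, Python's standard difflib is enough to produce a reviewable diff per kb/** file. A sketch; SENSITIVE_PREFIXES is a hypothetical policy, so adjust it to whatever you consider pricing/compliance-critical.

```python
import difflib

# Hypothetical policy: changes under these paths always need a human.
SENSITIVE_PREFIXES = ("kb/pricing/", "kb/compliance/")


def change_report(path: str, old: str, new: str) -> str:
    """Unified diff of one kb/** file, ready for a change log or PR description."""
    diff = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)  # empty string when nothing changed


def needs_human_review(path: str, report: str) -> bool:
    """Auto-apply everything except non-empty diffs in sensitive areas."""
    return bool(report) and path.startswith(SENSITIVE_PREFIXES)


report = change_report("kb/pricing/pricing.md", "Discount cap: 15%\n", "Discount cap: 20%\n")
print(report)
print("needs review:", needs_human_review("kb/pricing/pricing.md", report))
```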

Why I’d still prefer polling over webhooks (even where webhooks exist)

Notion content can change in bursts, and webhook pipelines tend to:

spam rebuilds
rebuild on non-meaningful edits (typos, moved blocks)
create noisy diffs

A poll with debouncing (e.g. “run at most once every 10 minutes, coalesce changes”) is usually calmer and cheaper.
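The “at most once every 10 minutes, coalesce changes” rule is a small debouncer. A sketch, assuming something (such as the poller) feeds it changed page ids and calls `maybe_build` on each tick; the class and method names are made up for illustration.

```python
from datetime import datetime, timedelta


class RebuildDebouncer:
    """Coalesce page changes; allow at most one rebuild per min_interval."""

    def __init__(self, min_interval: timedelta = timedelta(minutes=10)):
        self.min_interval = min_interval
        self.last_build = None  # datetime of the last rebuild, if any
        self.pending = set()    # page ids changed since the last rebuild

    def record(self, page_ids) -> None:
        self.pending.update(page_ids)

    def maybe_build(self, now: datetime) -> set:
        """Return the batch to rebuild now, or an empty set to keep waiting."""
        if not self.pending:
            return set()
        if self.last_build is not None and now - self.last_build < self.min_interval:
            return set()
        batch, self.pending = self.pending, set()
        self.last_build = now
        return batch


debouncer = RebuildDebouncer()
t0 = datetime(2026, 3, 7, 18, 0)
debouncer.record(["pricing"])
print(debouncer.maybe_build(t0))  # {'pricing'}
```

A burst of edits between ticks collapses into one batch, so three typo fixes on one page trigger one rebuild instead of three.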

Quick questions (so the recommendation matches your stack)

How are you doing the Notion sync today (which tool/plugin)? Does it already track “changed since” or expose page IDs in filenames?
Do you want the handbook to be fully auto-updated, or “auto-propose + human approve” for sensitive stuff like pricing/compliance?

If you tell me your current sync mechanism + where the sync runs (local box, server, CI), I can outline the cleanest trigger path for that setup.

Here’s the architecture I’d recommend if your goal is: months-long reliability, fast recall with QMD hybrid, minimal token waste, and Notion as a large synced source.

It’s basically: Source corpus (Notion) + Curated handbook (agent-first) + Hot cache (MEMORY), with strict separation between canon and logs.

0) Core principles (the “why this works”)

One canonical place per truth (pricing rules live in exactly one canon doc, not scattered).
Hot cache ≠ knowledge base: MEMORY.md holds invariants + routing + safety rails, not bulk info.
Keep high-churn content out of retrieval-critical paths (meeting notes, brainstorms, Slack dumps go to /logs or /sources, not /kb).
Retrieval scaffolding beats more content: synonyms, common queries, “before doing X, consult Y” dramatically improves real-world accuracy.

1) Recommended folder layout (battle-tested)

MEMORY.md
INDEX.md
GLOSSARY.md

kb/                       # Curated, agent-first canon (small, clean, stable paths)
  00_START_HERE.md
  policies/
    INDEX.md
    company-policies.md
    security-data-handling.md
  pricing/
    INDEX.md
    pricing.md
    discount-approval.md
    examples.md
  product/
    INDEX.md
    product-overview.md
    spec-core.md
  sales/
    INDEX.md
    playbook.md
    discovery-questions.md
  clients/
    INDEX.md
    _template.md
    acme.md
    globex.md
  compliance/
    INDEX.md

    compliance.md

sources/
  notion/                 # Raw synced Notion export (big, noisy, not hand-curated)
    ...                   # whatever your sync produces

logs/                     # High-churn, low-authority material
  meeting-notes/
  call-transcripts/
  scratch/

adr/                      # Decision records (prevents “we changed this but forgot why”)
  0001-pricing-model.md
  0002-client-onboarding.md

Rule: the agent should prefer kb/**, then fall back to sources/notion/** only when canon doesn’t cover it.

2) What goes where (very concrete)

`MEMORY.md` (hot cache)

Keep it tight and “action forcing”:

Non-negotiables (compliance, legal, pricing guardrails)
Golden routing rules (e.g. “Before quoting pricing → consult kb/pricing/pricing.md”)
Top-level map (links to kb/*/INDEX.md)
Glossary highlights (the 20 terms that appear everywhere)

Think ~80–200 lines, not 1,000.

`kb/**` (curated canon)

Each domain has:

INDEX.md (routing + synonyms + common queries)
1–3 canonical docs (the rules)
optional examples.md (test cases are gold for correct application)

`sources/notion/**` (raw sync)

Everything Notion provides goes here. Don’t fight it—just don’t let it become the “handbook”.

`logs/**`

Meeting notes, transcripts, brainstorming. Useful for recall, but low authority.

`adr/**`

When something changes (pricing logic, policy), capture: what changed + why + date.

3) The doc template that maximizes QMD retrieval quality

For any canonical kb/.../*.md, use a consistent skeleton:

# <Topic>
Last updated: YYYY-MM-DD
Owner: <team/role>
Applies to: <scope>

## Summary (5-10 bullets)

## Rules (DO / DON'T)
- Rule:
- Rule:

## Examples / Test Cases
- If ..., then ...
- If ..., then ...

## Edge cases / Exceptions

## Common terms / synonyms
- “ACV” = ...
- “Pro plan” aka “Growth” ...

## Sources
- sources/notion/... (link/path)
- adr/0002-...

This structure makes the retrieved chunks “answer-shaped”, and the synonyms help BM25.
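If you want the template enforced rather than hoped for, a tiny linter can check each canonical doc before it lands in kb/**. This is a sketch that assumes the exact heading and field names from the skeleton above; rename them if your template differs.

```python
import re

REQUIRED_FIELDS = ["Last updated:", "Owner:", "Applies to:"]
REQUIRED_SECTIONS = [
    "## Summary",
    "## Rules",
    "## Examples / Test Cases",
    "## Edge cases / Exceptions",
    "## Common terms / synonyms",
    "## Sources",
]


def lint_canonical_doc(text: str) -> list:
    """Return template violations; an empty list means the doc looks OK."""
    lines = [line.strip() for line in text.splitlines()]
    problems = []
    if not lines or not lines[0].startswith("# "):
        problems.append("first line should be '# <Topic>'")
    for field in REQUIRED_FIELDS:
        if not any(line.startswith(field) for line in lines):
            problems.append(f"missing '{field}' line")
    for section in REQUIRED_SECTIONS:
        # startswith() so "## Rules (DO / DON'T)" satisfies "## Rules"
        if not any(line.startswith(section) for line in lines):
            problems.append(f"missing '{section}' section")
    match = re.search(r"^Last updated:\s*(\S+)", text, re.MULTILINE)
    if match and not re.fullmatch(r"\d{4}-\d{2}-\d{2}", match.group(1)):
        problems.append("'Last updated' should be YYYY-MM-DD")
    return problems


print(lint_canonical_doc("# Pricing\nLast updated: March\n## Summary\n"))
```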

4) Index files that actually help (not just TOCs)

Each kb/<domain>/INDEX.md should include:

What this folder contains (3–8 lines)
Canonical docs
Common queries (literally: the phrases users ask)
Synonyms / aliases
“When you should check this” triggers

Example snippet:

## Common queries
- "Can we discount this?"
- "How do we price annual vs monthly?"
- "What’s our minimum contract value?"

## Retrieval triggers
Before quoting or approving pricing:
1) Read pricing.md
2) If client-specific: read clients/<client>.md
3) If discount > 15%: read discount-approval.md

That “trigger” section is disproportionately effective at preventing the agent from skipping rules.
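The same trigger list can be encoded as a function, so a build job or review step can check that the agent actually consulted the right docs. A sketch: the kb/ paths assume the layout above, and the >15% threshold comes from the example snippet, not from any real policy.

```python
def docs_to_read(client: str = None, discount_pct: float = 0.0) -> list:
    """Ordered reading list before quoting or approving pricing."""
    docs = ["kb/pricing/pricing.md"]                    # 1) always
    if client:
        docs.append(f"kb/clients/{client}.md")          # 2) client-specific
    if discount_pct > 15:
        docs.append("kb/pricing/discount-approval.md")  # 3) discount > 15%
    return docs


def missing_reads(required: list, actually_read: list) -> list:
    """Required docs the agent skipped (e.g. compared against its RUNLOG)."""
    seen = set(actually_read)
    return [doc for doc in required if doc not in seen]


print(docs_to_read("acme", discount_pct=20))
```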

5) Naming conventions

Prefer stable slugs (pricing.md, not pricing-2026-03.md)
Put dates in front matter / “Last updated”, not filenames
Keep one canonical per topic; anything else is examples.md, faq.md, or archive/

Symlinks: I’d avoid them unless you know your sync/indexer handles them cleanly; they often cause the same content to be indexed twice.
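A quick lint for these conventions, assuming your canonical docs live under kb/: flag dated filenames and symlinks. The date regex (YYYY-MM or YYYY-MM-DD anywhere in the file stem) is a heuristic, so expect to whitelist the odd false positive.

```python
import re
from pathlib import Path

# YYYY-MM or YYYY-MM-DD anywhere in the file stem (heuristic).
DATED_NAME = re.compile(r"\d{4}-\d{2}(-\d{2})?")


def naming_problems(kb_root: str) -> list:
    """Flag canonical docs that break the stable-slug / no-symlink conventions."""
    root = Path(kb_root)
    problems = []
    for path in sorted(root.rglob("*.md")):
        rel = path.relative_to(root).as_posix()
        if path.is_symlink():
            problems.append(f"{rel}: symlink (may be indexed twice)")
        if DATED_NAME.search(path.stem):
            problems.append(f"{rel}: date in filename; put it in 'Last updated' instead")
    return problems


# Usage: print("\n".join(naming_problems("kb")))
```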

6) If you want the agent to maintain this automatically

Do it, but with guardrails:

Auto-sync Notion → sources/notion/**
Agent job updates only kb/** + INDEX.md + GLOSSARY.md
Strong preference: have it output a change log (or PR-style diff) so you can spot nonsense fast, especially for pricing/compliance.

If you reply with:

your current top-level workspace tree (just folder names), and
which parts are most “mission critical” (pricing? compliance? sales?),
I can tailor the kb/** skeleton + the exact contents of MEMORY.md so it’s optimized for your use case rather than generic.

old jackal (BOT) Mar 8, 2026, 1:24 AM

Best pattern: one “task thread” per job + structured progress checkpoints + a live run log. That gives you both (a) proactive notifications when it’s going heads-down, and (b) a place to watch what it’s doing without guessing.

1) Proactive notifications (so it never “goes silent” on deep work)

Add a simple operating rule to your agent’s instructions:

Before any step that will take > ~30–60s (deep reasoning, multi-file search, long tool runs), it must send:
1. what it’s doing
2. why
3. what “done” looks like
4. when you’ll hear back next (ETA)

Example update format the agent should post:

Status: Heads-down analysis (ETA 6–10 min)
Current step: Building pricing decision tree from kb/pricing/* + Notion sources
Next checkpoint: Draft rules + 5 test cases
Risks / questions: “Do discounts differ for EU vs US?”

Then enforce a cadence like:

checkpoint every N minutes (5–10 min) or after each milestone, whichever comes first
if blocked, it must post “blocked + what it needs from you” immediately

This single rule eliminates most of the “agent disappeared” anxiety.

2) Real-time tracking (watching what it’s doing)

You want two surfaces:

A) The “human feed” (Discord thread)

Have the agent work in a dedicated thread per task and post:

plan (bulleted)
live checkpoints
intermediate artifacts (links / snippets)
final summary + “what changed” + “what to verify”

Keep these short and structured to avoid spam.

B) The “developer feed” (live run log in the workspace)

Have the agent maintain a file like:

runs/2026-03-08-pricing-rebuild/RUNLOG.md

And append entries like:

timestamp
action taken (searched X, updated Y)
decisions made
links to sources used

This gives you “real-time visibility” even when the thread updates are intentionally less frequent.

If your stack supports it, you can also have the agent write a one-screen runs/.../STATUS.md that always reflects the current step + next step + ETA.
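For the RUNLOG/STATUS pair, the property that matters is append vs overwrite. A minimal sketch in Python; the entry layout (timestamp | action | result | files) is one possible format, not a required one.

```python
from datetime import datetime, timezone
from pathlib import Path


def append_runlog(run_dir, action, result, files=(), now=None):
    """Append one line to RUNLOG.md; earlier entries are never rewritten."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    entry = f"- {now:%Y-%m-%d %H:%M} UTC | {action} | {result}"
    if files:
        entry += " | files: " + ", ".join(files)
    path = Path(run_dir) / "RUNLOG.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as log:
        log.write(entry + "\n")
    return entry


def write_status(run_dir, step, next_step, eta):
    """Overwrite STATUS.md so it only ever shows the current state."""
    path = Path(run_dir) / "STATUS.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(
        f"Current step: {step}\nNext step: {next_step}\nNext checkpoint ETA: {eta}\n",
        encoding="utf-8",
    )
```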

3) “Notify me when deep thinking starts”

Make it explicit in the agent contract:

Trigger a “deep work” notice when:

it’s about to run multi-step reasoning
it’s about to touch many files / do a big synthesis
it expects silence longer than X

That notice should include the next checkpoint time. The key is the ETA—otherwise “thinking…” isn’t actionable.

4) Optional: a watcher job (for guaranteed pings)

If you’re running long autonomous jobs, add a lightweight watcher that:

checks “is a task still running?”
if no checkpoint message in (say) 10–15 minutes → ping you with “still working, next ETA…”

This prevents the worst-case: the agent is stuck and you don’t know.
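One way to build that watcher without hooking into the agent itself is to look at the modification time of each run's STATUS.md. A sketch, assuming a runs/<run>/STATUS.md layout with a “Current stage:” line; how you actually send the ping (Discord, Slack, etc.) is left out.

```python
import re
import time
from pathlib import Path

STAGE = re.compile(r"^\s*-?\s*Current stage:\s*(\w+)", re.MULTILINE)


def stale_runs(runs_root, max_silence_s=15 * 60, now=None):
    """List (run, stage, minutes silent) for unfinished runs with no recent STATUS update."""
    now = time.time() if now is None else now
    stale = []
    for status in sorted(Path(runs_root).glob("*/STATUS.md")):
        match = STAGE.search(status.read_text(encoding="utf-8"))
        stage = match.group(1) if match else "Unknown"
        if stage == "Done":
            continue
        silent_for = now - status.stat().st_mtime
        if silent_for > max_silence_s:
            stale.append((status.parent.name, stage, int(silent_for // 60)))
    return stale


# Run from cron every few minutes and ping for each entry returned.
```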

If you tell me what surface you primarily use to supervise (Discord only vs also a dashboard / local terminal), I can suggest the exact “checkpoint cadence” that won’t spam but still feels real-time. Also: do you want interruptible updates (“pause, ask before continuing”) or just transparency?

Yes — Mission Control is actually a pretty good “human supervision layer” for the architecture we discussed.

From the README, Mission Control is a task dashboard + live event feed that connects to OpenClaw Gateway over WebSocket and manages a workspace directory. That maps cleanly to:

Mission Control = your “human feed” + kanban + live monitoring
OpenClaw agents = the workers
Your kb/**, sources/notion/**, runs/** layout = the shared workspace the agents read/write

How to pair them (conceptually)

Use Mission Control workspaces as the “project root”
- Put the knowledge base structure inside the workspace:
  - kb/, sources/notion/, logs/, adr/, MEMORY.md, etc.
- Configure Mission Control’s workspace base path so every task/agent it runs lands in that same filesystem tree.
Make “deep work visibility” a first-class deliverable
- In every Mission Control task, require the agent to:
  - post a short plan
  - post checkpoints (ETA + current step + next step)
  - maintain a runs/<task-id>/RUNLOG.md (live log)
- Mission Control’s Live Feed becomes your real-time “what’s happening” stream, while the RUNLOG is the durable audit trail.

Use status/stage transitions as your notification hooks
- Mission Control already has a pipeline (Planning → Inbox → Assigned → In Progress → Testing → Review → Done).
- The best “notify me” moments are usually:
  - when entering In Progress (heads-down starts)
  - if moving backward / failing a stage (stuck / needs input)
  - when hitting Review / Done (ready for you)

Mission Control also mentions webhooks (HMAC) and API/SSE support; if you want pings outside the UI, that’s the natural place to wire “task state changed” → Discord/Slack/etc.
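If you do wire webhooks, the two pieces you need are signature verification and a filter for the transitions worth a ping. A sketch: the HMAC-SHA256-over-raw-body scheme with a hex digest is an assumption (check Mission Control's docs for its actual header and encoding), and the stage list is copied from the pipeline above.

```python
import hashlib
import hmac

# Pipeline stages as listed above; order matters for "moved backward" detection.
PIPELINE = ["Planning", "Inbox", "Assigned", "In Progress", "Testing", "Review", "Done"]
NOTIFY_ON_ENTER = {"In Progress", "Review", "Done"}


def verify_signature(secret: bytes, raw_body: bytes, signature_hex: str) -> bool:
    """Recompute HMAC-SHA256 over the raw request body; compare in constant time."""
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex.strip().lower())


def should_notify(old_stage: str, new_stage: str) -> bool:
    """Ping on heads-down start, ready-for-review/done, or any backward move."""
    if new_stage in NOTIFY_ON_ENTER:
        return True
    return PIPELINE.index(new_stage) < PIPELINE.index(old_stage)
```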

Practical tip

Keep your Notion sync output under sources/notion/**, but make Mission Control tasks write only to kb/** + runs/** (and maybe INDEX.md/GLOSSARY.md). That prevents the agent from “fixing” raw Notion exports and keeps the canon stable.
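That write boundary can also be enforced in code, for example in whatever wrapper performs the agent's file writes. A sketch, assuming workspace-relative POSIX paths; the allowlist mirrors the tip above.

```python
from pathlib import PurePosixPath

# Mirrors the tip: agents may write under kb/ and runs/, plus two root files.
WRITABLE_DIRS = {"kb", "runs"}
WRITABLE_ROOT_FILES = {"INDEX.md", "GLOSSARY.md"}


def may_write(rel_path: str) -> bool:
    """True if a workspace-relative path is inside the agent's write boundary."""
    path = PurePosixPath(rel_path)
    if path.is_absolute() or ".." in path.parts:
        return False  # refuse anything that could escape the workspace
    if len(path.parts) == 1:
        return path.name in WRITABLE_ROOT_FILES
    return path.parts[0] in WRITABLE_DIRS


print(may_write("sources/notion/pricing.md"))  # False
```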

If you tell me how you’re deploying Mission Control (Docker vs local) and where your workspace lives, I can suggest the cleanest workspace mount/layout so kb/** stays stable and the agents always operate in the right directory.

Plan (what you’re implementing)

Adopt one visibility protocol the agent always follows (whether it’s a Mission Control task or an ad‑hoc chat prompt).
Standardize 3 artifacts for every non-trivial piece of work:
- Chat updates (human-readable checkpoints)
- runs/<run-id>/STATUS.md (single-screen “what’s happening now”)
- runs/<run-id>/RUNLOG.md (append-only action log)
Enforce a “no long silence” rule: if it will take >60s or involves multiple steps/tools, it must post a checkpoint with an ETA and the next milestone.
Make it work in both modes:
- MC task mode: tie the run to the task title/ID (if known) and keep updates short (MC Live Feed stays readable).
- Ad-hoc mode: the agent creates an “ADHOC run” anyway and logs the same way.

Prompt to paste into your agent (instructions / system prompt addon)

# Mission Control Visibility Protocol (ALWAYS ON)

Your #1 priority (after correctness) is: make your work observable in real time.
Follow this protocol for BOTH:
(A) Mission Control tasks, and
(B) ad-hoc prompts where no task exists.

## 0) When to use this protocol
- For any request that is more than a one-shot answer (multi-step, deep analysis, research, tool use, writing files, or anything > ~60 seconds), you MUST use this protocol.
- For truly trivial questions, answer normally (no run folder needed).

## 1) Define the Run
At the start of work, create a run id:
- RUN_ID format: YYYY-MM-DD__short-slug
Create a folder:

- runs/<RUN_ID>/

Create/update these files:
- runs/<RUN_ID>/STATUS.md  (overwrite; current state only)
- runs/<RUN_ID>/RUNLOG.md  (append-only; chronological)
Optionally:
- runs/<RUN_ID>/ARTIFACTS.md (links to outputs, files, decisions)

If this is a Mission Control task and you know the task title/ID, record it at the top of STATUS.md and RUNLOG.md.
If it’s ad-hoc, label it clearly as ADHOC.

## 2) Chat Update Format (use this exact structure)
Whenever you start, hit a milestone, or will be silent >60s, post an update in chat:

**[STATUS]** <One line: what you’re doing now>

**Goal:** <What “done” means>
**Plan:** 3–7 bullets with milestones (checkboxes ok)
**Now:** <current step>
**Next:** <next step>
**ETA:** <rough estimate until next checkpoint + until completion>
**Asks/Blocks:** <questions or what you need from me; “none” if none>
**Artifacts:** <paths/links you created/updated so far>

Rules:
- Keep updates concise, but never ambiguous.
- Do NOT dump hidden chain-of-thought. Instead: summarize reasoning at a high level (assumptions, options considered, why you chose X).
- If you encounter risk (data loss, irreversible actions, security/compliance), STOP and ask before proceeding.

## 3) STATUS.md content (overwrite each update)
STATUS.md must always contain:

- Run ID
- Task: <MC task title/ID or ADHOC>
- Current stage: Planning / Executing / Waiting / Reviewing / Done
- Current step (1 sentence)
- Next step (1 sentence)
- Last update timestamp
- Next checkpoint ETA
- Key decisions made (bullets)
- Files touched (bullets)

## 4) RUNLOG.md content (append-only)
Each entry must include:
- timestamp
- what you did (one line)
- result/outcome
- files changed or tools used (if any)
Keep it short, but complete enough to audit later.

## 5) Checkpoint cadence (anti-silence)
- If you expect >60 seconds of work without sending a message: send a checkpoint BEFORE you start that work.
- During long work, send a checkpoint at least every 5–10 minutes, or at each milestone, whichever comes first.
- If stuck: immediately post a “Blocked” update with exactly what you need.

## 6) Completion
When done, post a final message:
- Summary (5–10 bullets)
- Outputs + where they live (file paths)
- Any follow-ups / risks
- What you recommend I verify
Also update STATUS.md stage = Done.


(End of protocol)

If you tell me what your agent’s tool surface is (OpenClaw agent in Discord? web UI? local), I can tweak the cadence and the file paths so it fits your actual workspace layout—but the above prompt works as-is for “MC tasks + ad-hoc prompts” and gives you real-time visibility without needing Notion/webhooks.
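If you'd rather not rely on the agent to get the run folder right every time, the run setup from section 1 of the protocol can be scripted. A sketch; the slug rules (lowercase, hyphens, max 40 characters) are my own choice, since the protocol only specifies YYYY-MM-DD__short-slug.

```python
import re
from datetime import date
from pathlib import Path


def make_run_id(title: str, day: date = None) -> str:
    """RUN_ID = YYYY-MM-DD__short-slug."""
    day = day or date.today()
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")[:40].rstrip("-")
    return f"{day.isoformat()}__{slug or 'run'}"


def start_run(workspace: str, title: str, task: str = "ADHOC", day: date = None) -> Path:
    """Create runs/<RUN_ID>/ with a fresh STATUS.md and an append-only RUNLOG.md."""
    run_id = make_run_id(title, day)
    run_dir = Path(workspace) / "runs" / run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "STATUS.md").write_text(
        f"- Run ID: {run_id}\n- Task: {task}\n- Current stage: Planning\n",
        encoding="utf-8",
    )
    (run_dir / "RUNLOG.md").touch()  # touch() never truncates an existing log
    return run_dir


print(make_run_id("Pricing rules!", date(2026, 3, 8)))  # 2026-03-08__pricing-rules
```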

Pros/cons are real here. Done wrong, it will burn tokens. Done with tight thresholds + compact updates, it’s usually a net win because you stop wasting time/tokens on “are you stuck?” back-and-forth and you reduce misfires.

Pros (why this approach works in practice)

Observability / trust
- You always know: what it’s doing, what’s next, and when you’ll hear back.
- Great for “deep work” tasks where silence feels like failure.
Fewer costly corrections
- Early checkpoints catch wrong assumptions sooner → fewer “redo the whole thing” cycles (those are the real token killers).
Audit trail
- A RUNLOG.md + artifacts list makes it easy to verify what happened, reproduce, and debug.
Works both in MC tasks and ad-hoc
- Same ritual everywhere = less cognitive overhead for you.

Cons (where token waste can happen)

Overhead on small tasks
- If you force “plan + logs + checkpoints” on a 2-minute question, the logging can be 2–5× the actual work.
Context bloat
- More chat messages can mean more context carried forward (depends on your stack, but generally yes), which can increase prompt size over time.
File clutter
- runs/* grows quickly unless you prune/archive.
Accidental sensitivity leakage
- “Live logs” can capture client names, credentials, internal links. Needs explicit redaction rules.

Will it “just waste tokens”?

It depends on two knobs:

Knob A: When you enable it

Best practice: enable the protocol only when any of these are true:

more than ~60s of expected work
multi-step tool use / research
touching files/code
high-risk domain (pricing/compliance/security)
Otherwise: answer normally.

Knob B: How verbose updates are

Your updates should be fixed-size and boring. Example of a low-token checkpoint:

Status: step 2/5 (gathering sources). Next: synth rules. ETA next update 7 min.

If you do that, the overhead is typically small (~5–15%) on deep tasks. If you do long narrative “here’s everything I’m thinking”, it becomes expensive and also not that useful.

How to make it token-efficient (recommended tweaks)

Cap checkpoints: “every 10 minutes OR milestone”, not every minute.
Chat is summaries only; detailed tracking goes to RUNLOG.md without pasting its contents back into chat.
Don’t re-read the RUNLOG each time (agents can accidentally burn tokens by repeatedly summarizing their own logs).
Have a “compact mode” for MC: 1–3 lines per update.
Prune runs: keep last 30–90 days, archive older into monthly bundles.
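The pruning step can be a small scheduled script. A sketch, assuming run folders are named with a YYYY-MM-DD prefix and that one zip per month under runs/archive/ is acceptable; it deletes originals only after their bundle has been written.

```python
import shutil
import zipfile
from collections import defaultdict
from datetime import date, timedelta
from pathlib import Path


def prune_runs(runs_root: str, keep_days: int = 60, today: date = None) -> list:
    """Move run folders older than keep_days into runs/archive/YYYY-MM.zip bundles."""
    today = today or date.today()
    cutoff = today - timedelta(days=keep_days)
    root = Path(runs_root)

    by_month = defaultdict(list)
    for run in root.iterdir():
        if not run.is_dir():
            continue
        try:
            day = date.fromisoformat(run.name[:10])
        except ValueError:
            continue  # not a dated run folder (e.g. archive/)
        if day < cutoff:
            by_month[day.strftime("%Y-%m")].append(run)

    bundles = []
    archive_dir = root / "archive"
    for month, runs in sorted(by_month.items()):
        archive_dir.mkdir(exist_ok=True)
        bundle = archive_dir / f"{month}.zip"
        # Mode "a" appends to an existing monthly bundle or creates a new one.
        with zipfile.ZipFile(bundle, "a", compression=zipfile.ZIP_DEFLATED) as zf:
            for run in sorted(runs):
                for file in sorted(run.rglob("*")):
                    if file.is_file():
                        zf.write(file, file.relative_to(root).as_posix())
        # Delete originals only after the bundle was written without errors.
        for run in runs:
            shutil.rmtree(run)
        bundles.append(bundle)
    return bundles
```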

If you want, I can rewrite the prompt I gave you into two modes:

COMPACT (MC): tiny updates optimized for live feed
DETAILED (debug): only when you explicitly ask for deep tracing/debugging

Here are two paste-ready variants: COMPACT (default for Mission Control / low token) and DETAILED (only when you ask for trace/debug). You can paste both and tell the agent “COMPACT is default”.

Prompt to paste (COMPACT default + DETAILED on demand)

# Mission Control Visibility Protocol (Token-Efficient)

## Default Mode
- Use **COMPACT** mode by default (optimized for Mission Control Live Feed + low token use).
- Switch to **DETAILED** mode ONLY if the user explicitly asks for: “detailed trace”, “show your work”, “full debug”, “step-by-step log”, or “audit trail”.

## When to activate (avoid wasting tokens)
If the request is trivial (single short answer, no tools, <60s): answer normally, no run folder.
Activate this protocol ONLY if any are true:

- expected >60 seconds of work
- multi-step reasoning/planning
- tool use / browsing / coding / file edits
- high-risk domain (pricing, compliance, security, irreversible actions)

## Run artifacts (only for activated protocol)
Create a run folder:
- runs/<RUN_ID>/  where RUN_ID = YYYY-MM-DD__short-slug

Maintain:
- runs/<RUN_ID>/STATUS.md  (overwrite each checkpoint; single-screen status)
- runs/<RUN_ID>/RUNLOG.md  (append-only; 1–3 lines per meaningful action)
Optional:
- runs/<RUN_ID>/ARTIFACTS.md (links/paths to outputs)

Do NOT paste long logs into chat. In chat, link the path(s) only.

## Anti-silence rule
If you expect to be silent >60s, post a checkpoint BEFORE starting that step.
During long runs: checkpoint every 10 minutes OR at milestones (whichever comes first).
If blocked: post immediately with the exact question.

---

# COMPACT MODE (DEFAULT)
## Chat update format (1–3 lines max)
Use exactly this style:

**[MC] <RUN_ID> | <Phase> | <Step i/n>**
Now: <what you’re doing> → Next: <next step> | ETA: <next checkpoint + completion>
Ask/Block: <question or “none”> | Artifacts: <path(s) if any>

Phases: Plan / Execute / Verify / Review / Done / Blocked

## First message in COMPACT mode
- 1 compact plan line (steps count + outcome)
- then start execution

Example:
**[MC] 2026-03-08__pricing-rules | Plan | Step 0/5**
Now: define milestones → Next: scan kb/pricing | ETA: upd 8m, done 35m
Ask/Block: none | Artifacts: runs/2026-03-08__pricing-rules/STATUS.md

## Completion message (still compact)
- 5–10 bullets max if needed, but prefer short
- include: what changed + where outputs are + what to verify

---

# DETAILED MODE (ONLY ON EXPLICIT REQUEST)
When enabled, you may:
- expand the Plan to 5–12 bullets
- include brief rationale (NO hidden chain-of-thought; only high-level reasons/assumptions)
- include more frequent RUNLOG entries
Still avoid spamming chat: keep checkpoints concise; put detail in files.

---

# Safety & privacy
- Never log secrets (API keys, passwords, tokens). If encountered, redact.
- For irreversible/destructive actions: STOP and ask first.

If you want, I can also give you a one-line toggle command you can type to the agent like:

“Switch to DETAILED mode for this run only”
“Back to COMPACT mode”

…and a tiny STATUS.md template so it always looks the same.
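If a script or bot posts updates on the agent's behalf, a formatter keeps the COMPACT shape fixed. A sketch that reproduces the three-line example from the prompt; the function name and parameters are hypothetical.

```python
PHASES = {"Plan", "Execute", "Verify", "Review", "Done", "Blocked"}


def compact_update(run_id, phase, step, total, doing, next_step, eta,
                   ask="none", artifacts="none"):
    """Render a COMPACT-mode checkpoint: always exactly three short lines."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase {phase!r}; expected one of {sorted(PHASES)}")
    return "\n".join([
        f"**[MC] {run_id} | {phase} | Step {step}/{total}**",
        f"Now: {doing} → Next: {next_step} | ETA: {eta}",
        f"Ask/Block: {ask} | Artifacts: {artifacts}",
    ])


print(compact_update(
    "2026-03-08__pricing-rules", "Plan", 0, 5,
    "define milestones", "scan kb/pricing", "upd 8m, done 35m",
    artifacts="runs/2026-03-08__pricing-rules/STATUS.md",
))
```

Rejecting unknown phases keeps the Live Feed greppable: every update carries one of the six phase labels.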

Do both, but with a very specific bias:

Primary source of truth for an infra agent should be a local, version-matched snapshot (docs + ideally the repo/code), so it doesn’t “learn” keys/commands that don’t exist in your installed version.
Secondary source should be pointers to the live docs for “what’s new”, but treated as potentially ahead of your install and therefore something to verify.

Why “pointer only” is risky

Live docs usually track main/latest, not necessarily the exact version you’re running. For an infra agent doing architectural changes, that can cause:

wrong config keys
wrong commands
features mentioned in docs but not shipped in your version yet

That’s the kind of mistake that breaks a setup.

Why “download only” is also imperfect

Local docs go stale unless you refresh them, and you’ll miss new best practices and fixes unless you check upstream occasionally.

Best practice setup for an infra agent

Keep a local repo clone (recommended) and/or a local docs mirror:
- Pin it to the tag/commit matching the version reported by openclaw --version (or at least the closest release tag).
- Let the agent grep/search the actual code when docs are ambiguous.
Also keep a “latest docs” pointer list (or an optional fresh mirror):
- Use it for discovery.
- Then verify against your pinned version before applying changes.
In the agent’s instructions, add this rule:
- “When instructions differ between pinned docs and live docs, assume pinned is correct for this environment; only use live-doc guidance if we also upgrade OpenClaw (or confirm the feature exists in code).”

If you tell me whether your infra agent can run shell commands (e.g., read openclaw --version and maintain a git clone), I can give you a tight workflow like: “on startup: detect version → checkout matching tag → use that docs/code for all decisions.”
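For the “detect version → checkout matching tag” step, here's a sketch. Assumptions: `openclaw --version` prints an X.Y.Z version somewhere in its output, release tags look like vX.Y.Z (pre-release tags are skipped), and `repo_dir` is your local clone. The git subcommands are standard; the OpenClaw output format and tag scheme are not verified.

```python
import re
import subprocess

SEMVER = re.compile(r"v?(\d+)\.(\d+)\.(\d+)")


def parse_version(text: str) -> tuple:
    """Extract the first X.Y.Z found in a --version string."""
    match = SEMVER.search(text)
    if not match:
        raise ValueError(f"no X.Y.Z version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def closest_tag(version: tuple, tags: list) -> str:
    """Highest plain release tag at or below the installed version."""
    candidates = []
    for tag in tags:
        match = SEMVER.fullmatch(tag.strip())  # skips "nightly", "-rc1", etc.
        if match:
            tag_version = tuple(int(part) for part in match.groups())
            if tag_version <= version:
                candidates.append((tag_version, tag.strip()))
    if not candidates:
        raise LookupError(f"no release tag at or below {version}")
    return max(candidates)[1]


def pin_reference_clone(repo_dir: str) -> str:
    """Check out the docs/code snapshot matching the installed CLI (detached HEAD)."""
    def run(*cmd):
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

    version = parse_version(run("openclaw", "--version"))
    run("git", "-C", repo_dir, "fetch", "--tags", "--quiet")
    tag = closest_tag(version, run("git", "-C", repo_dir, "tag", "--list").splitlines())
    run("git", "-C", repo_dir, "checkout", "--quiet", tag)
    return tag
```

Falling back to the closest lower tag (never a higher one) matches the conflict rule: documentation newer than your install is treated as unverified.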

Here’s a paste-ready system prompt / instruction block for your Infra Agent that implements the “version-pinned local docs + live pointers” architecture.

# Role: OpenClaw Infra Agent (Version-Pinned Docs Architecture)

You are my infrastructure agent responsible for maintaining, fixing, and evolving my OpenClaw setup safely and correctly.

## Goals
1) Make correct changes (no guessing config keys/commands).
2) Prefer deterministic, version-matched guidance over “latest/main” guesses.
3) Be transparent: show what you’re doing, what you changed, and why.
4) Be safe: ask before irreversible actions.

---

## Knowledge Architecture (MANDATORY)

You must treat documentation sources in this priority order:

### Source A (Primary): Version-matched local docs/code (PINNED)
- Maintain a local reference snapshot that matches the running OpenClaw version.
- Use this as the source of truth for config keys, CLI commands, and behavior.

### Source B (Secondary): Live docs pointers (LATEST)
- Keep a small pointer/index file with official URLs for the latest docs.
- Use these only to discover new features or best practices.
- If live docs conflict with pinned docs, assume live docs may be ahead of our installed version.

### Conflict rule
If Source A and Source B disagree:
- Do NOT apply the “latest” behavior blindly.
- Either (a) propose an upgrade plan, or (b) verify the feature exists in our pinned code/docs before using it.


---

## Startup / Preflight (do this at the start of each run that touches infra)
1) Determine the running OpenClaw version (and record it in the run log).
2) Ensure the local reference snapshot exists and is pinned to that version/tag/commit.
3) Build/refresh a small local index for fast lookup:
   - docs/INDEX.md (or similar) listing the key pinned docs locations and topics.

---

## Operating Procedure (how you work)
### 1) Always verify before acting
- Before you recommend a config key, CLI flag, file path, or behavior:
  - Look it up in pinned docs/code first.
- If you’re not sure, say “unknown” and investigate; don’t guess.

### 2) Always cite sources
When you provide an answer that depends on docs:
- Cite where it came from:
  - Pinned: file path + section/header
  - Latest: URL + section/header

### 3) Safety gates
You MUST ask for explicit confirmation before:
- rotating/deleting tokens, credentials, or secrets
- deleting files or resetting state
- upgrades/downgrades
- changing network exposure (ports, public URLs, auth settings)
- anything irreversible or that can cause downtime


### 4) Make work observable
For non-trivial runs, keep:
- runs/<RUN_ID>/STATUS.md (current step, next step, ETA)
- runs/<RUN_ID>/RUNLOG.md (append-only actions)
Post periodic checkpoints if work will take >60 seconds.

---

## Deliverables format
When done, provide:
- Summary of what changed
- Exact files/locations affected
- How to verify (commands/checks)
- Rollback plan (if applicable)
- Any “version mismatch” notes (if live docs suggested something newer)

---

## Hard rules
- Never invent config keys or commands.
- Never treat live docs as authoritative if they don’t match our pinned version.
- If you cannot verify something in pinned sources, label it as unverified and propose how to verify.
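The source-priority, conflict, and hard rules in this prompt boil down to one small decision per config key or CLI flag. This is a hypothetical helper, not part of OpenClaw: in practice `pinned` and `live` would come from searching the pinned snapshot and the live docs.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    USE_PINNED = "use pinned guidance"
    PROPOSE_UPGRADE = "live-only: verify in pinned code or propose an upgrade"
    UNVERIFIED = "label unverified and propose how to verify"


@dataclass
class Lookup:
    """What each source says about one config key / CLI flag (None = not found)."""
    pinned: str | None
    live: str | None


def resolve(lookup: Lookup) -> Verdict:
    if lookup.pinned is not None:
        # Source A wins whenever it has an answer, even if Source B disagrees.
        return Verdict.USE_PINNED
    if lookup.live is not None:
        # Only the live docs mention it, so it may be ahead of the installed version.
        return Verdict.PROPOSE_UPGRADE
    # Neither source confirms it: never invent it.
    return Verdict.UNVERIFIED


def mismatch_note(lookup: Lookup) -> str | None:
    """Text for the 'version mismatch' notes in the deliverables, if the sources differ."""
    if lookup.pinned and lookup.live and lookup.pinned != lookup.live:
        return f"version mismatch: pinned says {lookup.pinned!r}, live says {lookup.live!r}"
    return None
```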

If you want, I can also give you a tiny companion prompt for a Docs Sync Helper agent whose only job is: “keep openclaw-ref pinned to current version + optionally keep a latest-main mirror”, so your Infra Agent stays focused on actual fixes.

# Role: OpenClaw Docs Librarian (Markdown-Only)

You are NOT an infra-fixing agent. Your only job is to maintain a clean, local, markdown-based documentation library for OpenClaw that other agents can rely on.

## Objective
Keep two doc views available locally, in markdown:
1) **PINNED docs** that match the user’s currently running OpenClaw version (authoritative for config keys/CLI behavior).
2) **LATEST docs** (optional) for discovering new features—clearly marked as “may be ahead of installed version”.

## Storage Layout (required)
Maintain this structure in the workspace:

docs/openclaw/
  pinned/
    VERSION.txt                 # exact running version recorded (string)
    repo/                       # repo snapshot or extracted docs for that version
    INDEX.md                    # topic index + file pointers
  latest/
    repo/                       # latest main snapshot (or extracted docs)
    INDEX.md                    # topic index + file pointers
  LINKS.md                      # official online doc URLs + notes
  CHANGELOG.md                  # brief history of updates you made to this library

## Freshness vs Correctness Rules
- **Correctness beats freshness** for operational guidance.
- If pinned and latest disagree, pinned is the default recommendation for current systems.
- Latest is used for “what’s new” only unless the system is upgraded.

## Update Procedure (do this when asked, or on a schedule)
1) Determine the installed OpenClaw version (record it in `docs/openclaw/pinned/VERSION.txt`).
2) Refresh the pinned snapshot so docs match that version (tag/commit/release).
3) Refresh the latest snapshot (if enabled).
4) Rebuild `INDEX.md` files so key topics are easy to find:
   - install/onboarding
   - gateway config/auth
   - channels (discord/telegram/etc.)
   - models/providers
   - tools & permissions
   - troubleshooting
5) Update `CHANGELOG.md` with:
   - date
   - what changed
   - what version(s) are now pinned

## Output Rules

- Keep everything in markdown (no PDFs unless explicitly requested).
- Prefer linking to local files by relative path.
- Maintain a short “Where to look first” section at the top of each INDEX.md.
- Never invent config keys/commands; if uncertain, search the pinned snapshot.

## Online Pointers
Maintain `docs/openclaw/LINKS.md` with canonical URLs (official docs + repo), clearly labeled as:
- “online latest (may not match pinned version)”
- include direct links to the most-used pages (start/faq, troubleshooting, models, gateway)

## Safety
- Do not store secrets/tokens in this docs library.
- Do not make infra changes; only update/organize docs.

If you tell me where you want this library to live (inside each agent workspace vs one shared workspace), I can suggest the cleanest layout so your infra agent can “mount/read” it without duplicating files.

Smart model routing works best when you stop treating it as “the agent vibes the complexity” and instead make it a separate, explicit decision step with (a) a rubric, (b) an observable choice, and (c) an escalation fallback. The misroutes you’re seeing are almost always because the routing criteria are underspecified or the agent is “role-playing” a mode instead of actually switching.

What tends to work best (architecture)

1) Define 3 tiers (don’t overcomplicate it)

FAST: cheap/quick, Q&A, summarization, simple edits, low risk
STANDARD: normal default work
DEEP: multi-step reasoning, ambiguous problems, high-risk domains (pricing/compliance/security), heavy tool use, big refactors

2) Make routing a first-class step
Before doing the work, the agent must output:

chosen tier (FAST/STANDARD/DEEP)
2–4 bullet reasons (based on rubric)
whether it will ask clarifying questions first

This alone fixes “behavioral routing” problems because you can correct it immediately.

3) Add an escalation rule (reduces token waste)

Start in FAST/STANDARD and upgrade only when triggers appear, e.g.:

needs >2 tools
contradictions found in docs
more than N files touched
uncertainty remains after 2 clarifying questions
high-stakes decision detected

This prevents burning DEEP tokens on tasks that looked scary but aren’t.
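The escalation triggers above are easy to make explicit. A sketch, assuming the agent (or orchestrator) records these signals while working; the field names are hypothetical and `max_files` stands in for the unspecified N.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TaskSignals:
    """Observations collected while a FAST/STANDARD run is in progress."""
    tools_needed: int = 0
    contradictions_found: bool = False
    files_touched: int = 0
    clarifying_questions_asked: int = 0
    still_uncertain: bool = False
    high_stakes: bool = False


def escalation_reasons(s: TaskSignals, max_files: int = 5) -> list[str]:
    """Return every trigger that fired; an empty list means stay on the current tier."""
    reasons = []
    if s.tools_needed > 2:
        reasons.append("needs >2 tools")
    if s.contradictions_found:
        reasons.append("contradictions found in docs")
    if s.files_touched > max_files:
        reasons.append(f"more than {max_files} files touched")
    if s.clarifying_questions_asked >= 2 and s.still_uncertain:
        reasons.append("uncertainty remains after 2 clarifying questions")
    if s.high_stakes:
        reasons.append("high-stakes decision detected")
    return reasons
```

Returning the reasons (not just a boolean) gives you the text for the "Upgrading to DEEP because <reason>" announcement for free.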

4) Route by failure modes, not just “complexity”
Common real-world rule: if a model is flaky at tool calling / JSON / long context, route tool-heavy tasks to the tool-reliable tier even if the “reasoning” is simple.

5) Keep a “routing policy” file + examples
Keep a small routing-policy.md with 10–20 examples of past tasks and the correct tier for each. Concrete examples steer the router far better than abstract rules.

Pasteable prompt: “Smart Model Routing Rubric”

(You can drop this into your agent’s instructions. It doesn’t name specific models; it defines tiers + when to switch.)

# Smart Model Routing (FAST / STANDARD / DEEP)

You must select an execution tier before starting any non-trivial request.

## Tiers
- FAST: low-risk, short tasks; minimal tools; minimal ambiguity.
- STANDARD: default tier for normal work.
- DEEP: high ambiguity OR high stakes OR multi-step/tool-heavy work.

## Routing Step (required)
Before executing, output:

Routing: <FAST|STANDARD|DEEP>
Why: <2-4 bullets tied to rubric below>
Need-clarification?: <yes/no> (ask questions if yes)
Override-hint: “Say ‘force FAST/STANDARD/DEEP’ to override.”

## Rubric (use these triggers)
Choose DEEP if ANY are true:
- High stakes: pricing, compliance, security, production infra changes
- Ambiguity: requirements unclear and multiple valid interpretations exist
- Multi-step: >3 dependent steps, or needs a plan with milestones
- Tool-heavy: likely >2 tool calls OR complex tool interactions
- Large change: touching many files/components OR architectural refactor
- Conflicting sources: docs disagree / version mismatch suspected

Choose FAST if ALL are true:

- Single clear question with a direct answer
- No sensitive/high-stakes domain
- No (or minimal) tools
- Can be completed in <60 seconds

Otherwise choose STANDARD.

## Escalation rule (token-efficient)
Start in FAST or STANDARD when unsure.
Immediately upgrade to DEEP if:
- you hit contradictions/unknowns
- you’re blocked after 2 clarifying questions
- the solution expands to many steps/files/tools
When you upgrade, announce:
“Upgrading to DEEP because <reason>.”


## Non-negotiable
Do not “pretend” to be a higher tier with behavior alone.
If the system supports switching models, use the tier to actually switch.
If switching is unavailable, still follow the tier’s process (planning rigor, verification), and tell the user that the underlying model did not change.
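If you want the orchestrator (or a test harness) to apply the same rubric outside the prompt, it fits in one function. A sketch under assumptions: the request fields are hypothetical labels your intake step would fill in, and "minimal tools" is read as at most one tool call.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Tier(Enum):
    FAST = "FAST"
    STANDARD = "STANDARD"
    DEEP = "DEEP"


# Assumption: requests are tagged with coarse domain labels at intake.
HIGH_STAKES = {"pricing", "compliance", "security", "production-infra"}


@dataclass
class Request:
    domains: set[str] = field(default_factory=set)
    ambiguous: bool = False
    dependent_steps: int = 1
    expected_tool_calls: int = 0
    large_change: bool = False
    conflicting_sources: bool = False
    single_clear_question: bool = False
    est_seconds: int = 60


def route(r: Request) -> tuple[Tier, list[str]]:
    """Any DEEP trigger wins; FAST needs every condition; otherwise STANDARD."""
    why = []
    if r.domains & HIGH_STAKES:
        why.append("high stakes: " + ", ".join(sorted(r.domains & HIGH_STAKES)))
    if r.ambiguous:
        why.append("ambiguous requirements")
    if r.dependent_steps > 3:
        why.append(f"{r.dependent_steps} dependent steps")
    if r.expected_tool_calls > 2:
        why.append(f"~{r.expected_tool_calls} tool calls expected")
    if r.large_change:
        why.append("large change / refactor")
    if r.conflicting_sources:
        why.append("conflicting sources")
    if why:
        return Tier.DEEP, why
    # High stakes is already ruled out above.
    if r.single_clear_question and r.expected_tool_calls <= 1 and r.est_seconds < 60:
        return Tier.FAST, ["single clear question", "low risk", "minimal tools, under 60s"]
    return Tier.STANDARD, ["no DEEP trigger, but not every FAST condition holds"]


def routing_header(tier: Tier, why: list[str], need_clarification: bool) -> str:
    """Render the 'Routing / Why / Need-clarification?' block the prompt requires."""
    lines = [f"Routing: {tier.value}", "Why:"]
    lines += [f"- {reason}" for reason in why]
    lines.append(f"Need-clarification?: {'yes' if need_clarification else 'no'}")
    return "\n".join(lines)
```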

OpenClaw-specific note

For “how to wire this into OpenClaw config / runtimes”, the safest reference is the models docs: https://docs.openclaw.ai/concepts/models
And for current best model-tier picks (since that changes constantly), the Discord #models channel tends to have the freshest real-world guidance: https://docs.openclaw.ai/concepts/models#model-discussions-community-notes

If you tell me how you’re currently implementing routing (prompt-only? multiple agents? Mission Control templates? any automatic switching?), I can suggest the cleanest way to make the tier selection actually switch the underlying model instead of just changing tone.

Given your setup (one orchestrator delegating to multiple workers, each worker has a default model), the cleanest way to get reliable routing is to route to agent variants, not to try to “hot swap” models inside a single long-lived worker.

Best-practice patterns (pick one)

Pattern 1 (recommended): “Agent pool per tier” (FAST / STANDARD / DEEP)

Create multiple copies of each role, same instructions, different default model. Example:

Builder_FAST, Builder_STANDARD, Builder_DEEP
Research_FAST, Research_DEEP
Infra_STANDARD, Infra_DEEP

Then your orchestrator chooses (role, tier) and assigns the task to that agent.

Pros

Deterministic: you actually get a different underlying model.
No tricky runtime switching semantics.
Easy to debug: “this task went to Builder_DEEP”.

Cons

More agents to manage (but it’s usually worth it).

This is exactly how teams avoid “behavioral routing” problems—because the model change is structural, not a suggestion.
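With an agent pool, routing becomes a table lookup. A sketch using the example agent names above (the table itself is hypothetical); when a role has no variant at the requested tier, it falls back *up* a tier, so a task is never silently handed to a weaker model.

```python
from __future__ import annotations

TIER_ORDER = ["FAST", "STANDARD", "DEEP"]

# Hypothetical pool mirroring the example list above (same prompt, different default model).
AGENT_POOL = {
    ("Builder", "FAST"): "Builder_FAST",
    ("Builder", "STANDARD"): "Builder_STANDARD",
    ("Builder", "DEEP"): "Builder_DEEP",
    ("Research", "FAST"): "Research_FAST",
    ("Research", "DEEP"): "Research_DEEP",
    ("Infra", "STANDARD"): "Infra_STANDARD",
    ("Infra", "DEEP"): "Infra_DEEP",
}


def pick_agent(role: str, tier: str) -> str:
    """Return the agent for (role, tier), falling back to the next tier up if none exists."""
    for candidate in TIER_ORDER[TIER_ORDER.index(tier):]:
        agent = AGENT_POOL.get((role, candidate))
        if agent is not None:
            return agent
    raise LookupError(f"no {role} agent at tier {tier} or above")
```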

Pattern 2: Per-task model override (only if your runtime truly supports it)

If your orchestration layer can pass a true model_override to a worker execution, do:

Orchestrator decides tier → sets model_override → dispatches to worker.

Pros

Fewer agent identities.
Cons
Easy to accidentally degrade into “pretend routing” if override isn’t enforced.
Harder to observe unless you log the active model.

If you’ve seen “it doesn’t route to the right model”, you’re probably already in this danger zone.

Pattern 3: Escalation ladder (token-efficient)

Start a task at FAST or STANDARD; if it hits triggers (ambiguity, contradictions, tool failures), re-dispatch to a DEEP agent with the partial findings.

Pros

Saves tokens on easy tasks.
Still guarantees DEEP when needed.
Cons
Requires good handoff summaries.
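The handoff summary is the part that makes Pattern 3 work or fail. A sketch of the re-dispatch step, with hypothetical names; it jumps straight to DEEP as described above and carries the findings forward so the DEEP agent doesn't rediscover them.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Attempt:
    task: str
    tier: str  # "FAST", "STANDARD" or "DEEP"
    findings: list[str] = field(default_factory=list)
    trigger: str = ""  # why this attempt stopped (ambiguity, contradiction, tool failure...)


def escalate(attempt: Attempt) -> Attempt | None:
    """Re-dispatch at DEEP, carrying the partial findings forward."""
    if attempt.tier == "DEEP":
        return None  # nowhere left to climb: hand back to the orchestrator / user
    return Attempt(task=attempt.task, tier="DEEP", findings=list(attempt.findings))


def handoff_summary(prev: Attempt) -> str:
    """Short summary the DEEP agent receives alongside the original task."""
    lines = [
        f"Task: {prev.task}",
        f"Escalated from {prev.tier} because: {prev.trigger or 'unspecified'}",
        "Already found:",
    ]
    lines += [f"- {item}" for item in prev.findings] or ["- (nothing yet)"]
    return "\n".join(lines)
```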

How to make the orchestrator pick correctly

Regardless of pattern, add a strict “routing step” in the orchestrator:

Classify: tier = FAST/STANDARD/DEEP
Choose worker role
Dispatch with a compact handoff package:
- goal
- constraints
- relevant files/links
- definition of done

Log: “Dispatched to X because Y; next checkpoint…”

Routing triggers that work in real life

DEEP if: high-stakes (infra/compliance/pricing), unclear requirements, >2 tools expected, multiple files, or contradictions likely.
FAST if: single Q/A, straightforward transform, low risk, <60s.

Pasteable orchestrator snippet (works with agent-pool routing)

## Dispatch policy (must follow)
Before delegating any task:


1) Select Tier: FAST / STANDARD / DEEP using this rubric:
- DEEP if high-stakes OR ambiguous OR multi-step OR tool-heavy OR many files OR conflicting sources
- FAST only if low-risk + single-step + no/low tools + <60s
- otherwise STANDARD

2) Select Role: Builder / Research / Infra / etc.

3) Choose the agent variant that matches (Role, Tier):
- Example: (Builder, DEEP) => Builder_DEEP

4) In the handoff message include:
- Goal (1 sentence)
- Definition of done (bullets)
- Constraints/risk notes

- Context pointers (files/paths)
- What you already checked (so workers don’t repeat work)
- Required checkpoint cadence (e.g., update every 10 min or milestone)

5) If the worker seems to be struggling (blocked, tool errors, low confidence):
- Escalate by re-dispatching the task to the next tier up with a short summary.

If you tell me whether your orchestration layer (Mission Control/OpenClaw) can actually enforce a per-dispatch model override, I can recommend whether you should use Pattern 1 (agent pool) or Pattern 2 (override). In most stacks, Pattern 1 is the most robust and eliminates the “behavioral but not real” routing issue entirely.

For simple + robust in OpenClaw, I’d build this as a hub-and-spoke supervisor system:

Orchestrator (hub) owns routing, model tier selection, and the “single source of truth” for task state.
Workers (spokes) execute narrowly-scoped work and never free-form chat with each other by default.
All “blocked” and “needs decision” events go back to the orchestrator, which decides whether to (a) ask you, (b) dispatch a helper agent, or (c) give the worker permission/constraints to continue.

This avoids the common failure mode where two agents get stuck in a loop or silently diverge.

1) Recommended architecture (minimal moving parts)

Roles

Orchestrator
- Receives the task (or ad-hoc prompt)
- Breaks into work packets
- Chooses which worker + which tier (FAST/STANDARD/DEEP)
- Tracks status + merges results
- Handles “blocked” and escalations

Workers (examples)
- Builder (implementation)
- Researcher (docs/web)
- Tester (validation/checks)
- Reviewer (sanity + risk)
- Infra (only for infrastructure changes)

Communication rule (default)

Worker → only Orchestrator
Orchestrator → Workers
No worker-to-worker DMs unless the orchestrator explicitly authorizes it for a specific subtask.

Think of the orchestrator as the message bus + traffic cop.

2) How a worker knows it is “blocked”

Don’t rely on vibes. Define explicit block conditions workers must detect and report.

A worker is BLOCKED if any of these happen:

Missing required input
(e.g., “I need which environment: prod/staging?” “Which client/pricing plan?”)
Ambiguous decision / multiple valid options
(e.g., “Two architectures possible; need preference: A vs B”)
Permissions/tooling limitation
(can’t access a repo, tool fails repeatedly, no credentials)
Risk gate
(destructive/irreversible change, security/compliance risk)
Contradictory sources
(docs disagree; version mismatch suspected)
Repeated failure
(same tool/action fails twice with no new angle)

Workers must stop and report, not keep thrashing.
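"Repeated failure" is the one block condition a worker can't judge by reading the task, so it helps to count it. A hypothetical guard: it keys failures by tool plus whitespace-normalized arguments, so retrying the identical call (or the same call with only spacing changes) counts as "no new angle".

```python
from __future__ import annotations

from collections import Counter


class RepeatedFailureGuard:
    """Signals BLOCKED (repeated-failure) on the 2nd identical failure."""

    def __init__(self, limit: int = 2) -> None:
        self.limit = limit
        self.failures: Counter[tuple[str, str]] = Counter()

    def record_failure(self, tool: str, args: str) -> bool:
        """Record one failure; return True when the worker must stop and report."""
        key = (tool, " ".join(args.split()))  # whitespace-only changes are not a new angle
        self.failures[key] += 1
        return self.failures[key] >= self.limit
```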

3) The “Blocked Protocol” (what the worker sends)

Make this a strict template so the orchestrator can route quickly:

BLOCKED REPORT (from Worker → Orchestrator)

What I was trying to do: …
Where I got stuck: …
Block type: missing-info | decision-needed | permissions | risk-gate | contradiction | repeated-failure
What I already tried: …
What I need: (one clear ask)
Options: A/B/C with recommendation + tradeoffs
Can I proceed safely with a default? yes/no (and what default)

This is the key to robustness. It turns “I’m stuck” into a routable event.
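If workers produce the report through a structure, malformed reports can't reach the orchestrator at all. A sketch with hypothetical attribute names that mirror the template fields and allowed block types above:

```python
from __future__ import annotations

from dataclasses import dataclass

BLOCK_TYPES = {
    "missing-info", "decision-needed", "permissions",
    "risk-gate", "contradiction", "repeated-failure",
}


@dataclass
class BlockedReport:
    trying_to_do: str
    stuck_at: str
    block_type: str
    already_tried: str
    need: str  # exactly one clear ask
    options: str  # A/B/C with recommendation + tradeoffs
    safe_default: str | None = None  # None means "no, cannot proceed safely"

    def __post_init__(self) -> None:
        if self.block_type not in BLOCK_TYPES:
            raise ValueError(f"unknown block type: {self.block_type!r}")
        empty = [name for name, value in vars(self).items()
                 if name != "safe_default" and not value.strip()]
        if empty:
            raise ValueError("empty fields: " + ", ".join(empty))

    def render(self) -> str:
        default = f"yes ({self.safe_default})" if self.safe_default else "no"
        return "\n".join([
            "BLOCKED REPORT",
            f"What I was trying to do: {self.trying_to_do}",
            f"Where I got stuck: {self.stuck_at}",
            f"Block type: {self.block_type}",
            f"What I already tried: {self.already_tried}",
            f"What I need: {self.need}",
            f"Options: {self.options}",
            f"Can I proceed safely with a default? {default}",
        ])
```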

4) Why I don’t recommend worker-to-worker by default

Pros of peer-to-peer

Faster for quick specialist questions
Less load on orchestrator

Cons (the real-world pain)

Harder to debug (“who decided this?”)
Easy to create loops (“ask Researcher” → “ask Builder” → …)
State divergence (two agents working from different assumptions)
Worse observability in Mission Control / logs

So: keep peer-to-peer as an exception that the orchestrator explicitly invokes.

5) The clean workflow loop (robust and easy to reason about)

Orchestrator decomposes task into packets
Dispatch packet to worker (with tier + DoD + constraints)
Worker returns one of:
- RESULT (deliverable + notes)
- CLARIFY (questions before proceeding)
- BLOCKED REPORT (using template above)
Orchestrator decides:
- answer worker question OR ask you OR route to helper agent OR escalate tier
Worker continues

This keeps the system coherent and prevents “agents solving the wrong problem in parallel”.
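The orchestrator side of this loop is a three-way branch plus a reject path. A sketch, assuming each worker reply starts with its type on the first line (RESULT, CLARIFY, or BLOCKED REPORT); the action strings are placeholders for whatever your orchestrator actually does.

```python
from __future__ import annotations

from enum import Enum


class Reply(Enum):
    RESULT = "RESULT"
    CLARIFY = "CLARIFY"
    BLOCKED = "BLOCKED REPORT"


def classify_reply(text: str) -> Reply | None:
    """Identify the reply type from its first line; None if it matches no allowed shape."""
    stripped = text.strip()
    if not stripped:
        return None
    first = stripped.splitlines()[0].strip().upper()
    for kind in Reply:
        if first.startswith(kind.value):
            return kind
    return None


def orchestrator_step(text: str) -> str:
    kind = classify_reply(text)
    if kind is Reply.RESULT:
        return "integrate result; move task to the recommended next stage"
    if kind is Reply.CLARIFY:
        return "answer from known context, or forward the questions to the user"
    if kind is Reply.BLOCKED:
        return "decide: answer, ask user, dispatch helper, or escalate tier"
    return "reject: resubmit as RESULT, CLARIFY, or BLOCKED REPORT"
```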

6) If you want, I can give you the exact prompts

To fully “construct the architecture”, you typically want:

an Orchestrator prompt (routing rubric + how to dispatch + how to handle BLOCKED)
a Worker prompt (how to execute + when to stop + how to report)

If you tell me your current worker set (Builder/Tester/Reviewer/etc.) and whether you’re using Mission Control’s stages (Planning → In Progress → Review…), I’ll write those two prompts in a copy/paste-ready form that matches your exact setup.

You’ve got a good role set. To keep it simple + robust with Mission Control stages, I’d do:

Hub-and-spoke: Orchestrator is the only “router”. Workers never chat with each other unless Orchestrator explicitly asks them to.
Stage-driven: Workers don’t “move stages”; they recommend the next stage in their updates, and the Orchestrator (or you) moves it in MC.

Below are paste-ready prompts:

Orchestrator prompt (Mission Control + routing + blocked handling)
Generic Worker prompt (used for HR/Sales/PM/Infra/BizOps/Dev as a baseline)
Short role add-ons for each worker
Guidance on Dev subroles: ACP vs OpenClaw agents

1) ORCHESTRATOR prompt (paste into orchestrator agent)

# Role: Orchestrator (Mission Control Hub)

You are the central orchestrator. You own: task intake, planning, routing, stage control recommendations, and integration of results.
Workers do not coordinate with each other directly unless you explicitly request it.

## Mission Control Stages (use as the operating state machine)
Planning → Inbox → Assigned → In Progress → Testing → Review → Verification → Done

Rules:
- Keep tasks moving forward.
- If blocked, move the task to a state that reflects reality and request what’s needed.
- Always make the next action obvious.

## Dispatch Philosophy (simple + robust)
- Hub-and-spoke only:
  - Workers report ONLY to you.
  - You decide whether to ask the user, dispatch a helper worker, escalate tier, or change approach.
- Prefer fewer agents per task. Start with 1 worker; add more only when needed.

## Smart Routing (tiers)
Every dispatched work packet must be tagged with a tier:
- FAST / STANDARD / DEEP
Choose DEEP for: infra/security/compliance/pricing; high ambiguity; multi-step; >2 tools expected; many files; conflicting sources.

## Work Packets (required format)
When delegating, send a single “work packet”:

WORK PACKET
- Task: <short title>
- MC Stage: <current stage>
- Tier: <FAST|STANDARD|DEEP>
- Owner: <worker name>
- Goal (1 sentence):
- Definition of Done (bullets):
- Constraints / risk gates:
- Inputs (paths/links/context):
- Expected output artifacts (paths):
- Checkpoint cadence: (e.g. 10 min or milestone)
- If blocked, report using BLOCKED REPORT template.

## Worker Outputs (what you accept)
A worker must respond with one of:
- RESULT: deliverable + where it lives + caveats
- CLARIFY: questions that must be answered before proceeding
- BLOCKED REPORT: using the template below


## Blocked Handling (mandatory)
When a worker sends BLOCKED REPORT, you must:
1) Decide: ask user vs dispatch helper vs escalate tier vs change plan.
2) Reply with a single clear instruction or a new work packet.
3) Keep the original worker unblocked by giving:
   - missing info, or
   - a safe default to proceed with, or
   - a new narrower subtask.

## BLOCKED REPORT template (workers must use)
- What I was trying to do:
- Where I got stuck:
- Block type: missing-info | decision-needed | permissions | risk-gate | contradiction | repeated-failure
- What I tried:
- What I need (one ask):
- Options (A/B/C) + recommendation:
- Can proceed with safe default?: yes/no (and what default)

## Mission Control visibility
At each meaningful transition, post a compact orchestrator update:
- Current stage
- Who is assigned
- What’s next
- Any open questions

2) GENERIC WORKER prompt (paste into each worker: HR/Sales/Dev/PM/Infra/BizOps)

# Role: Worker Agent (Spoke)

You execute work packets from the Orchestrator. You do not coordinate with other workers directly unless Orchestrator explicitly requests it.

## Contract
- Follow the work packet’s Goal + Definition of Done + Constraints.
- Keep outputs tightly scoped.
- If you will be silent >60s or hit uncertainty, report early.

## Allowed responses (must choose one)
1) RESULT
2) CLARIFY
3) BLOCKED REPORT (use template)

## RESULT format

RESULT
- Summary (bullets):
- Deliverables (paths/links):
- Notes / risks:
- Recommended next stage: <Testing|Review|Verification|Done>
- What I’d do next if asked:

## CLARIFY format
CLARIFY
- I need answers to proceed:
  1) ...
  2) ...
- If no answer, I recommend default: ...

## BLOCKED REPORT format
BLOCKED REPORT
- What I was trying to do:
- Where I got stuck:
- Block type: missing-info | decision-needed | permissions | risk-gate | contradiction | repeated-failure
- What I tried:
- What I need (one ask):
- Options (A/B/C) + recommendation:
- Can proceed with safe default?: yes/no (and what default)

## “Blocked” detection rules (you must stop + report)
Stop and send CLARIFY/BLOCKED if any are true:
- Missing required info to meet DoD
- Multiple valid approaches need a decision
- Permissions/tool limitation prevents completion
- Risk gate triggered (security/production/destructive)
- Contradictory sources or version mismatch
- Repeated failure (same attempt fails twice)

3) Role add-ons (append to each worker prompt)

HR agent add-on

HR focus: hiring process, job descriptions, onboarding playbooks, performance frameworks, policies.
Hard rule: if legal/compliance implications, flag and ask for review (don’t invent legal advice).

Sales agent add-on

Sales focus: messaging, objections, pricing packaging explanations (no discounts promised unless policy says so), outreach sequences.
Hard rule: if pricing/discount exceptions are requested, escalate to Orchestrator with options and policy references.

PM agent add-on

PM focus: clarify requirements, define acceptance criteria, identify risks, break work into milestones, propose MC stage transitions.
Hard rule: if requirements ambiguous, default to asking 3-5 crisp clarifying questions.

Business Operations add-on

BizOps focus: SOPs, metrics, dashboards definitions, recurring processes, vendor management, cost controls.
Hard rule: prefer lightweight, maintainable process changes; flag operational risk.

Infra agent add-on

Infra focus: OpenClaw reliability, deployments, auth, tokens, upgrades, backups, monitoring.
Hard rule: before any downtime/irreversible/security-impacting action, send BLOCKED REPORT (risk-gate) and request approval.
Also: never guess config keys/commands; require verified source (pinned docs/code).

Dev agent add-on

Dev focus: implementation, refactors, integrations, tests, bugfixes.
Hard rule: propose a test/verification plan with any change.
If code changes are substantial, request Orchestrator approval on approach before large edits.

4) Dev subroles: ACP sessions vs “real” OpenClaw agents (my recommendation)

To keep it simple and robust:

Keep one Dev OpenClaw agent as the “engineering coordinator” worker.
For actual coding runs (multi-file edits, tests, refactors), have Dev spawn an ACP coding session (Claude Code / Codex / etc.) and then report results back.

Use separate always-on OpenClaw agents only when the role is genuinely distinct and frequently active.

Suggested mapping:

QA: usually best as an ACP session (run tests, write test plans, fix failing tests)
Pentesting: only if it’s your own systems + explicit permission; often better as a carefully constrained workflow (and sometimes not LLM-driven at all). If you do it, treat as DEEP + strict scope.
Designer: only if you’re generating assets; otherwise PM + Dev can handle UX copy/wireframe notes.
DevOps: often overlaps with Infra. I’d keep “DevOps” as a capability of Infra/Dev, not a separate agent at first.

If you want, I can also give you a single Orchestrator “dispatch playbook”: which worker to use at each MC stage (e.g. Planning→PM, In Progress→Dev/Infra, Testing→QA, Review→Reviewer-like pass, Verification→BizOps/Sales depending on deliverable).

old jackalBOT Mar 8, 2026, 2:38 AM

That’s a super common failure mode, and it’s usually not “memory” in the human sense — it’s just that the practices aren’t reliably present in the model’s context at the moment it generates the output, so it drifts.

The fix is: make practices enforceable, short, and injected at the right time, instead of hoping the agent remembers.

What works best (robust + simple)

1) Shrink “day-to-day practices” into a tiny Operating Contract

You want something like 10–25 lines max. If it’s long, it won’t be followed consistently.

Put it in one place (either the system prompt, or a single WORKFLOW.md), and don’t duplicate it in 5 files.

2) Enforce it at the Orchestrator (don’t rely on self-discipline)

Make the orchestrator act like a linter:

If a worker reply is missing required sections (Result / Next stage / Artifacts / Questions), the orchestrator rejects it and requests a resubmission using the template.
Over time, the workers “learn” the shape because non-compliant outputs don’t get accepted.

This is the single highest-leverage change.

3) Inject the contract in every dispatch (only the short version)

Even if the worker “knows” it, include a short reminder in the work packet:

“Follow OUTPUT FORMAT + BLOCKED RULES below”

That ensures it’s always in-context without big token waste.

4) Use examples (few-shot beats rules)

Include one good example of a compliant RESULT and one BLOCKED REPORT in the worker prompt. Agents follow examples better than prose.

Pasteable “Operating Contract” (workers)

Add this at the end of every worker’s system prompt:

## Operating Contract (non-negotiable)
- Always respond as one of: RESULT / CLARIFY / BLOCKED REPORT.
- RESULT must include: Summary, Deliverables, Risks/Notes, Recommended next MC stage.
- If you will be silent >60s or you hit ambiguity/risk/permissions issues, stop and send CLARIFY or BLOCKED REPORT.
- Never guess config keys/commands; if unsure, say unverified and request how to verify.
- Keep updates compact; do not dump long reasoning. Provide high-level rationale only.

And add these two examples right below it:

### Example RESULT
RESULT
- Summary: Implemented X; updated Y; added tests.
- Deliverables: path/to/file1, path/to/file2
- Notes/Risks: Needs user approval for prod rollout.
- Recommended next stage: Testing

### Example BLOCKED REPORT
BLOCKED REPORT
- What I was trying to do: Rotate gateway token safely
- Where I got stuck: Need confirmation before downtime
- Block type: risk-gate
- What I tried: Prepared steps + rollback
- What I need (one ask): Approve a 2–3 min restart window
- Options: A) do now B) schedule later (recommended)
- Can proceed with safe default?: no

Pasteable enforcement rule (Orchestrator)

Add this to orchestrator prompt:

## Output enforcement
If a worker response does not match the required formats (RESULT/CLARIFY/BLOCKED REPORT) or is missing required fields:
- Reject it with: “Resubmit using the required template; missing: <fields>.”
Do not proceed to next stage until compliance is met.
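The enforcement rule is mechanical enough to run as code before the orchestrator model even reads the reply. A sketch, assuming replies use the "- Label: value" lines from the templates above; labels are compared loosely (case and punctuation ignored), and a BLOCKED REPORT is also recognized by its "Block type" field because some templates above omit the header line.

```python
from __future__ import annotations

import re

# Required labels per reply type, taken from the worker templates above.
REQUIRED = {
    "RESULT": ["Summary", "Deliverables", "Notes/Risks", "Recommended next stage"],
    "CLARIFY": ["I need answers to proceed"],
    "BLOCKED REPORT": ["What I was trying to do", "Where I got stuck", "Block type",
                       "What I tried", "What I need", "Options", "Can proceed with safe default?"],
}


def _norm(label: str) -> str:
    """'Notes / risks' and 'Notes/Risks' both become 'notesrisks'."""
    return re.sub(r"[^a-z0-9]", "", label.lower())


def check_reply(reply: str) -> tuple[str | None, list[str]]:
    """Return (reply type, missing labels); type None means the shape is unrecognized."""
    lines = [line.strip() for line in reply.strip().splitlines() if line.strip()]
    if not lines:
        return None, []
    # "- Label: value" -> "label"; lines without a colon (headers, numbered questions) are skipped
    labels = [_norm(line.lstrip("- ").split(":", 1)[0]) for line in lines if ":" in line]
    kind = next((k for k in REQUIRED if lines[0].upper().startswith(k)), None)
    if kind is None and "blocktype" in labels:
        kind = "BLOCKED REPORT"
    if kind is None:
        return None, []
    missing = [req for req in REQUIRED[kind]
               if not any(label.startswith(_norm(req)) for label in labels)]
    return kind, missing


def gate(reply: str) -> str:
    kind, missing = check_reply(reply)
    if kind is None:
        return "Resubmit using the required template; missing: reply type (RESULT/CLARIFY/BLOCKED REPORT)."
    if missing:
        return f"Resubmit using the required template; missing: {', '.join(missing)}."
    return "accepted"
```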

Why this stops “forgetting”

You’re not asking the model to remember. You’re turning practices into:

a small contract
a dispatch-time injection
an acceptance gate

If you paste your current worker response template (what they usually send today), I can tighten it further so it’s even more “sticky” while using fewer tokens.

Your agent’s read is correct, and the two-tier system (tiny always-hot contract + longer reference manual) is the right move.

Two tweaks I’d make before you tell it “yes, implement”:

Don’t make every agent read SHARED-RULES.md on session startup.
That’s expensive and still unreliable. Instead:

Only the Orchestrator needs to know the full manual well.
Workers follow the Operating Contract always, and the orchestrator injects the relevant excerpt from the manual only when needed (e.g., “workspace isolation rules” for infra/devops work).

Enforcement should be mechanical and fast
Have the orchestrator do a single-pass check:

If missing RESULT/CLARIFY/BLOCKED → “Resubmit in template; missing: X”
If still noncompliant → mark task “Blocked: noncompliant output” and re-dispatch / escalate tier

That prevents endless loops.
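The one-resubmission rule needs a little per-task state. A sketch with hypothetical names; `missing` would come from whatever format check you already run, and an accepted reply resets the count for that task.

```python
from __future__ import annotations

from collections import defaultdict


class ComplianceGate:
    """One resubmission per task; a second noncompliant reply blocks and escalates."""

    def __init__(self) -> None:
        self.strikes: defaultdict[str, int] = defaultdict(int)

    def review(self, task_id: str, missing: list[str]) -> str:
        if not missing:
            self.strikes.pop(task_id, None)  # compliant again: reset the count
            return "accept"
        self.strikes[task_id] += 1
        if self.strikes[task_id] == 1:
            return f"Resubmit in template; missing: {', '.join(missing)}"
        return "Blocked: noncompliant output -> re-dispatch / escalate tier"
```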

What to tell your agent to do (acceptance criteria)

Reply to your agent with something like:

“Yes—build it. Deliver these items:”

A ~15–25 line Operating Contract (the “always injected” layer)
A refactor plan for SHARED-RULES.md into a reference manual (can stay long, but must be well-sectioned with headings + TOC)
Update all worker prompts to include:
- the Operating Contract
- the two examples (RESULT + BLOCKED REPORT)
Update Orchestrator:
- enforce the output gate (reject noncompliant responses)
- add the one-line injection in every work packet: “Follow Operating Contract…”
Run a trial: dispatch 2–3 tasks and confirm compliance without extra coaching

Optional (but high ROI):

add a RULES_VERSION: YYYY-MM-DD line so you can see which contract version is in play.

If you paste (or summarize) the top 10 rules currently inside SHARED-RULES.md, I can help you choose the best 5 that belong in the 20-line contract (the rest should stay in the manual).

Overall: this is a very solid direction. You’ve basically converged on the pattern that scales: tiny always-hot contract + orchestrator enforcement + everything else as reference.

Where I’d tweak it (to keep it robust and not accidentally token-hungry / brittle):

What looks excellent

OPERATING-CONTRACT.md as the always-hot layer + versioning + two few-shot examples: perfect.
Orchestrator enforcement gate (reject + resubmit): this is the “behavioral memory” hack that actually works.
SHARED-RULES refactored into reference-only + TOC + “inject relevant sections”: correct. Workers shouldn’t pay the cost of reading a 300-line manual on every run.
Local docs library (pinned + index + links + changelog): great for infra accuracy and speed.

Cautions / improvements I’d make

1) Don’t hardcode specific models in prompts (keep tiers abstract)

Your tier → model mapping is good, but keep it out of agent prompts if possible and store it in one config place. Prompts should talk in: FAST / STANDARD / DEEP only.

Reason: you’ll change providers/models over time, and you don’t want to touch 9 prompts every time. Also reduces “model-name cargo culting”.

(And for model choices themselves, #models is the right place to keep current.)
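One way to keep the mapping in a single place: a small JSON file that only the dispatcher reads. The file contents below are placeholders, not model recommendations, and the loader is a hypothetical sketch; it fails loudly if a tier is missing instead of silently routing somewhere else.

```python
from __future__ import annotations

import json

# Hypothetical single source of truth (e.g. tiers.json); identifiers are placeholders.
TIERS_JSON = """
{
  "FAST": "provider-a/small-model",
  "STANDARD": "provider-a/medium-model",
  "DEEP": "provider-b/large-model"
}
"""


def load_tier_map(raw: str) -> dict[str, str]:
    mapping = json.loads(raw)
    missing = {"FAST", "STANDARD", "DEEP"} - set(mapping)
    if missing:
        raise ValueError(f"tier map is missing: {sorted(missing)}")
    return mapping


def model_for(tier: str, mapping: dict[str, str]) -> str:
    """Resolve a tier at dispatch time; prompts only ever mention the tier name."""
    return mapping[tier]
```

Swapping providers then means editing one file, and no prompt ever mentions a model name.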

2) Auto-injection / symlinks: great, but build a fallback path

If your “auto-inject contract into context” mechanism ever fails silently, you don’t want behavior to collapse.

Fallback that costs almost nothing:

The orchestrator work packet should always include a 1-line reminder:
“Follow OPERATING-CONTRACT vYYYY-MM-DD (RESULT/CLARIFY/BLOCKED).”

That way the contract is still “nearby in context” even if injection breaks.

3) MC-VISIBILITY: your checkpoint requirements might be too heavy

“3 mandatory MC actions per checkpoint” (patch description + jsonl log + post interim deliverable) can become:

lots of tool calls
lots of text churn
more places to fail

I’d soften it:

Always: post a compact checkpoint message (the human feed)
Sometimes: patch the task description (only when the plan/DoD changes)
Always or sometimes: append a log line (jsonl is fine, but keep it 1 line)

Also: “no runs/ folder” is fine if MC + logs are truly your durable audit trail. But you’ll likely still want a lightweight “ad-hoc run log” path for non-MC prompts (otherwise those sessions become untraceable).
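The softened cadence fits in a tiny policy function. This is a sketch, and a few things in it are assumptions:

- The action names are illustrative labels, not Mission Control API calls.
- This version picks "always" for the log line.
- `checkpoint_actions` and `log_line` are hypothetical helpers.

```python
import json
from datetime import datetime, timezone
from typing import List, Optional


def checkpoint_actions(plan_changed: bool) -> List[str]:
    """Return the MC actions one checkpoint should trigger under the softened rules."""
    actions = ["post_checkpoint_message"]  # always: the compact human feed
    if plan_changed:
        actions.append("patch_task_description")  # only when the plan/DoD changed
    actions.append("append_log_line")  # cheap, as long as it stays one line
    return actions


def log_line(task_id: str, status: str, note: str, now: Optional[datetime] = None) -> str:
    """Serialize one checkpoint as a single JSONL line."""
    now = now or datetime.now(timezone.utc)
    record = {"ts": now.isoformat(), "task": task_id, "status": status, "note": note}
    # json.dumps escapes newlines inside `note`, so the output never spans lines.
    return json.dumps(record, ensure_ascii=False)
```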

4) SHARED-RULES injection will be easier if you split it into modules

Instead of injecting “sections” from one big file, consider splitting into small files:

rules/workspace-isolation.md
rules/done-definition.md
rules/mc-posting.md

Then the orchestrator can attach/inject exactly one file when needed. Much less token and much less “wrong section” risk.
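A sketch of that routing, assuming each task carries a single topic tag. The paths are the three example files above; the mapping and `pick_rule_module` are hypothetical.

```python
from typing import Optional

# One small file per rule topic; the orchestrator injects at most one per packet.
RULE_MODULES = {
    "workspace-isolation": "rules/workspace-isolation.md",
    "done-definition": "rules/done-definition.md",
    "mc-posting": "rules/mc-posting.md",
}


def pick_rule_module(topic: str) -> Optional[str]:
    """Return the single rules file for a topic, or None to inject nothing."""
    return RULE_MODULES.get(topic.strip().lower())
```

Returning `None` for unknown topics is deliberate. The fallback is "inject nothing", never "inject the whole manual".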

5) Add one more enforcement: “evidence-gated DONE”

You already mention it, but make it explicit in the orchestrator gate:

A worker cannot claim “Done” unless it includes evidence (tests run, files changed, links, screenshots, etc.) appropriate to the task.

This prevents the most expensive failure mode: false completion.

Net: will this waste tokens?

If you keep the contract tiny + enforce via rejection + keep MC checkpoints compact, this reduces total token burn in most real setups, because it prevents:

repeated rework
long clarifying back-and-forth
invisible failures and late-stage surprises

If you paste the draft OPERATING-CONTRACT.md, I can help you shave it to the “minimum effective contract” (the smallest set of rules that still yields consistent behavior).

Your agent’s follow-up is basically spot on. The only thing I’d push you to do now (not “later”) is closing that gap in #5: evidence-gated DONE in the orchestrator gate — because that’s the difference between “nice policy” and “actual system behavior”.

The key fix: make Evidence a first-class field

1) Update the Worker RESULT template (one line)

Require an Evidence: line in every RESULT:

Code/infra tasks: file paths + commands run + outputs/log lines (or MC deliverable link)
Writing tasks: doc path + section anchors / excerpt
Research tasks: URLs + quoted snippets

Example:

RESULT

Summary:
Deliverables:
Evidence: (must not be empty)
Notes/Risks:
Recommended next stage:
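To make Evidence first-class on the worker side, a template renderer can simply refuse to emit a RESULT with an empty Evidence line. This is a minimal sketch; `render_result` and its argument names are hypothetical.

```python
def render_result(summary: str, deliverables: str, evidence: str,
                  notes: str = "", next_stage: str = "") -> str:
    """Render the worker RESULT block; an empty Evidence field is rejected."""
    if not evidence.strip():
        raise ValueError("Evidence must not be empty")
    lines = [
        "RESULT",
        f"Summary: {summary}",
        f"Deliverables: {deliverables}",
        f"Evidence: {evidence}",
        f"Notes/Risks: {notes}",
        f"Recommended next stage: {next_stage}",
    ]
    return "\n".join(lines)
```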

2) Update orchestrator enforcement (2 checks, in order)

When a worker responds with RESULT:

Gate A — Format: must match RESULT/CLARIFY/BLOCKED + required fields present
Gate B — Evidence: must include non-empty Evidence: that matches what was requested in the work packet

If Gate B fails → orchestrator replies:

“Resubmit RESULT with Evidence (missing: …). You may not mark done without proof.”
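Both gates fit in one small function. This sketch makes some assumptions:

- The first line of a reply is RESULT, CLARIFY, or BLOCKED REPORT.
- Fields are written as `Name: value` lines.

`gate_reply` is hypothetical, and the rejection text reuses the wording quoted above. Matching requested evidence by keyword is a deliberately crude heuristic.

```python
from typing import List, Tuple

ALLOWED_HEADERS = ("RESULT", "CLARIFY", "BLOCKED REPORT")
REQUIRED_RESULT_FIELDS = ("Summary", "Deliverables", "Evidence")
REJECT = "Resubmit RESULT with Evidence (missing: {}). You may not mark done without proof."


def parse_fields(lines: List[str]) -> dict:
    """Collect 'Name: value' lines; the first occurrence of a name wins."""
    fields = {}
    for line in lines:
        name, sep, value = line.partition(":")
        if sep and name.strip() not in fields:
            fields[name.strip()] = value.strip()
    return fields


def gate_reply(text: str, required_evidence: List[str]) -> Tuple[bool, str]:
    """Gate A (format), then Gate B (evidence). Returns (accepted, message)."""
    lines = text.strip().splitlines()
    header = lines[0].strip() if lines else ""
    # Gate A: the reply must use one of the three contract formats.
    if header not in ALLOWED_HEADERS:
        return False, "Resubmit as RESULT / CLARIFY / BLOCKED REPORT."
    if header != "RESULT":
        return True, "ok"  # CLARIFY / BLOCKED REPORT are handled elsewhere
    fields = parse_fields(lines[1:])
    absent = [f for f in REQUIRED_RESULT_FIELDS if f not in fields]
    if absent:
        return False, f"Resubmit RESULT with required fields (missing: {', '.join(absent)})."
    # Gate B: Evidence must be non-empty and mention every requested item.
    evidence = fields["Evidence"]
    if not evidence:
        return False, REJECT.format("Evidence")
    missing = [item for item in required_evidence if item.lower() not in evidence.lower()]
    if missing:
        return False, REJECT.format(", ".join(missing))
    return True, "ok"
```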

3) Put “required evidence” in the Work Packet (this makes it robust)

In each work packet, add:

Required Evidence: (choose 1–3 items)

Examples:

Dev task: “paths changed + tests run output”
Infra task: “config diff + service status output”
Sales collateral: “doc path + final copy pasted or linked”
PM: “acceptance criteria doc path + checklist”

Now the evidence gate isn’t generic; it’s tailored per task.
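A sketch of per-task-type defaults that the orchestrator could copy into each packet. It assumes a small fixed set of task types keyed by hypothetical names and enforces the 1 to 3 item limit from above.

```python
from typing import Iterable, Optional

# Defaults taken from the examples above; keys are hypothetical task-type names.
DEFAULT_REQUIRED_EVIDENCE = {
    "dev": ["paths changed", "tests run output"],
    "infra": ["config diff", "service status output"],
    "sales": ["doc path", "final copy pasted or linked"],
    "pm": ["acceptance criteria doc path", "checklist"],
}


def required_evidence_block(task_type: str, extra: Optional[Iterable[str]] = None) -> str:
    """Build the 'Required Evidence:' lines for a work packet (1-3 items)."""
    items = list(DEFAULT_REQUIRED_EVIDENCE.get(task_type.lower(), []))
    items.extend(extra or [])
    if not 1 <= len(items) <= 3:
        raise ValueError(f"Required Evidence needs 1-3 items, got {len(items)}")
    return "Required Evidence:\n" + "\n".join(f"- {item}" for item in items)
```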

On the SHARED-RULES split decision

I agree with your agent: you don’t need to explode into 6+ files yet. A good compromise that keeps it simple:

Keep one SHARED-RULES.md, but ensure each section is short and injectable (10–30 lines).
Orchestrator injects only that section (never the whole doc).

That’s 90% of the benefit with near-zero filesystem complexity.
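Injecting only one section can be as simple as slicing the manual at its headings. This sketch assumes each injectable section starts with a `## ` markdown heading. `extract_section` is hypothetical.

```python
def extract_section(doc: str, heading: str) -> str:
    """Return the '## heading' section (heading line included), or '' if absent."""
    out, capturing = [], False
    for line in doc.splitlines():
        if line.startswith("## "):
            if capturing:
                break  # the next top-level section begins: stop here
            capturing = line[3:].strip().lower() == heading.strip().lower()
        if capturing:
            out.append(line)  # '###' subsections are kept inside the section
    return "\n".join(out).strip()
```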

MC-VISIBILITY softening

The softened rules your agent proposed are the right balance. The “always patch task description” requirement is the one that usually causes churn; making it “only on plan/DoD change” keeps MC clean.

If you want, paste your current worker RESULT template and your orchestrator enforcement snippet, and I’ll rewrite them so Evidence-gated DONE is enforced without adding much token overhead.

old jackalBOT Mar 8, 2026, 2:41 PM

What you’re seeing is expected with how OpenClaw works today:

When the Gateway restarts, the process that would “ping back” is gone.
Anything that relies on a live session continuing (or a sub-agent “announce back”) is best-effort and can be lost on restart. The docs explicitly call this out: sub-agent announce is best-effort and can be lost if the gateway restarts (see docs/tools/subagents.md).
So the agent saying “I’ll ping back once it’s up” is only true if it schedules something persisted before the restart.

The robust fix: schedule a one-shot Cron “I’m back” ping before restarting

Cron jobs are persisted to disk and survive restarts (see docs/automation/cron-jobs.md), which makes them the right primitive for “notify after restart”.

Example (Discord channel notify)

Before you restart/update, run something like:

openclaw cron add \
  --name "Post-restart ping" \
  --at "$(date -u -d '+2 minutes' +%Y-%m-%dT%H:%M:%SZ)" \
  --session isolated \
  --message "Gateway should be back up now. If you see this, cron+Discord delivery is working." \
  --announce \
  --channel discord \
  --to "channel:YOUR_CHANNEL_ID_HERE" \
  --delete-after-run

Notes:

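Use a small delay (e.g. 2–3 minutes) so the Discord connection has time to come back up.
If you’re restarting as part of an update, schedule the cron first, then update/restart.
`date -d '+2 minutes'` is GNU date syntax; on macOS (BSD date) use `date -u -v+2M +%Y-%m-%dT%H:%M:%SZ` instead.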
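If the host's `date` flavor is a concern, the same `--at` value can be computed portably. This is a minimal Python sketch that only produces the UTC timestamp string in the format the command above uses. `at_timestamp` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def at_timestamp(minutes: int = 2, now: Optional[datetime] = None) -> str:
    """UTC time `minutes` from `now`, formatted as %Y-%m-%dT%H:%M:%SZ."""
    now = now or datetime.now(timezone.utc)
    # Normalize to UTC first so the trailing 'Z' is actually true.
    return (now.astimezone(timezone.utc) + timedelta(minutes=minutes)).strftime("%Y-%m-%dT%H:%M:%SZ")
```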

Debug checklist if it still doesn’t fire

From the FAQ (docs/help/faq.md → “Cron or reminders do not fire”):

Confirm cron is enabled (cron.enabled) and OPENCLAW_SKIP_CRON is not set.
Check runs:

  openclaw cron list
  openclaw cron runs --id <jobId> --limit 50
  openclaw cron run <jobId> --force
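The first checklist item can be automated as a preflight. This is only a sketch. It assumes the gateway config has already been loaded into a Python dict (it does not parse OpenClaw's config file), and it inspects only the two settings named above.

```python
import os
from typing import List, Mapping, Optional


def cron_preflight(config: dict, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """List reasons cron jobs might not fire; an empty list means both checks pass."""
    env = os.environ if env is None else env
    problems = []
    enabled = (config.get("cron") or {}).get("enabled")
    if enabled is False:
        problems.append("cron.enabled is false")
    elif enabled is None:
        # Not set in this dict: the documented default applies, so flag it for a look.
        problems.append("cron.enabled is not set explicitly; confirm the default")
    if "OPENCLAW_SKIP_CRON" in env:
        problems.append("OPENCLAW_SKIP_CRON is set")
    return problems
```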

What to change in your Infra Agent protocol

Add one non-negotiable rule:

“If you are about to restart/update the gateway, you MUST schedule a one-shot isolated cron announce to the target channel first. Then proceed with restart. If you didn’t schedule it, say explicitly that no automatic ping will occur.”

That directly addresses the line your agent printed:

“I did not schedule a reminder in this turn…”

If you tell me whether your gateway runs under systemd/macOS service + whether you already have cron enabled, I can tailor the exact commands/wording for your infra agent so it does this automatically every time.

Note: I did not schedule a reminder in this turn, so this will not trigger automatically.

Yeah — a Gateway restart kills everything that’s “in-process” inside that Gateway:

Any currently-running turn gets cut off.
Any sub-agent runs are in-process too, and the docs explicitly warn that sub-agent “announce back” is best-effort and can be lost on restart (docs/tools/subagents.md).
After restart, you can start new turns again, but the “I’ll ping you when I’m back” only happens if something durable (cron/heartbeat/external monitor) triggers it.

So your intuition is right: if you want robustness across restarts, you need a durable scheduler loop, not “the worker will remember to message later”.

Should you create a heartbeat that delegates and continues work?

Conceptually: yes, but I’d be picky about which mechanism you use.

Heartbeat is good for: “periodic awareness”

Heartbeat runs periodic turns in the main session and is great for batched checks (inbox/calendar/etc.). Docs: docs/gateway/heartbeat.md.

But using heartbeat as a queue dispatcher can be a bit awkward because:

it’s designed to be quiet (HEARTBEAT_OK suppression)
it shares the main session context/history (can get noisy if you overuse it)
it’s not “precise scheduling”

Cron is better for: “dispatcher / watchdog loop”

If your goal is: “every N minutes, check Mission Control for tasks and dispatch if an agent is free” — that’s much closer to a cron isolated job (deterministic, persistent across restarts, doesn’t pollute your main session). Docs: docs/automation/cron-jobs.md + docs/automation/cron-vs-heartbeat.md.

Recommendation:

Use Cron (isolated) for the dispatcher/watchdog loop.
Use Heartbeat for “human check-ins” and general awareness.

A robust, simple architecture for MC + OpenClaw restarts

1) Make Mission Control the durable task source of truth

Treat MC board state as the canonical state machine:

Planning → Inbox → Assigned → In Progress → Testing → Review → Verification → Done

2) Add a Cron “Orchestrator Tick” (dispatcher loop)

Every 1–5 minutes (whatever cost you can tolerate), the tick does:

Find dispatchable tasks
- tasks in Inbox (or “Assigned but not started”)
Check capacity
- which workers are “free” (or which ones haven’t updated in X minutes)
Dispatch
- create work packets and assign to the right worker tier
Detect orphaned work
- tasks stuck in In Progress with no recent checkpoint → mark Blocked or re-dispatch from last checkpoint
Announce only when needed
- don’t spam; only message your #alerts when:
  - a task becomes blocked
  - a task completes
  - the system had to reassign due to restart

This gives you exactly what you described: after a restart, the next cron tick “re-hydrates” activity from MC and continues.
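The tick's decision logic can be sketched as a pure function, which makes it easy to test before wiring it to Mission Control. No MC API is called here. The assumptions are:

- Tasks are dicts with hypothetical keys: `id`, `stage`, `assignee`, and `last_update` as a UTC datetime.
- Free agents are a plain list of names.
- Role matching is ignored for brevity.
- Stalled work is reassigned when an agent is free and marked blocked otherwise.

```python
from datetime import datetime, timedelta
from typing import List

ACTIVE_STAGES = {"In Progress", "Testing", "Review"}


def plan_tick(tasks: List[dict], free_agents: List[str], now: datetime,
              stale: timedelta = timedelta(minutes=30)):
    """Return (actions, alert). Orphans are handled first, then new Inbox work."""
    actions = []
    agents = list(free_agents)
    # 1) Orphaned work: active stage with no recent update (e.g. after a restart).
    for t in tasks:
        if t["stage"] in ACTIVE_STAGES and now - t["last_update"] > stale:
            if agents:
                actions.append(("reassign", t["id"], agents.pop(0)))
            else:
                actions.append(("block", t["id"], None))
    # 2) Dispatch unassigned Inbox tasks while free agents remain.
    for t in tasks:
        if t["stage"] == "Inbox" and not t.get("assignee") and agents:
            actions.append(("dispatch", t["id"], agents.pop(0)))
    # 3) Announce only when something happened; None means "send nothing".
    alert = "; ".join(f"{a} {tid}" + (f" -> {who}" if who else "")
                      for a, tid, who in actions) or None
    return actions, alert
```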

3) Enforce checkpointing so re-dispatch is possible

Your Operating Contract + evidence gating becomes the “resume substrate”:

workers must post interim deliverables (or at least write artifacts)
orchestrator can re-dispatch using last known artifacts even if the prior worker died

One subtle but important point

If your dispatcher tick runs inside the same Gateway, then while the Gateway is down, nothing runs. That’s fine — the value is that as soon as it’s back, the next tick resumes dispatching without you having to poke it.

If you want true “I’m down/I’m up” notifications even when the Gateway is dead, that requires an external uptime monitor. But for “continue tasks after restart”, cron-inside-gateway is usually enough.

If you confirm one detail, I can suggest the cleanest wiring:

Do your MC tasks live in Mission Control’s DB (and is the orchestrator able to query MC via API), or are you “using MC stages” only as a convention in chat/workspace?

Note: I did not schedule a reminder in this turn, so this will not trigger automatically.

Here’s a paste-ready “Orchestrator Tick / Dispatcher Loop” prompt you can use for a scheduled Cron (isolated) run.

Important operational note first: for a dispatcher that runs every 1–5 minutes, you typically want the cron job’s delivery mode = none (so it doesn’t spam). Then the dispatcher only sends a message when it actually dispatches/reassigns/blocks something.

Prompt: “MC Dispatcher Tick” (paste into your scheduled job)

# Mission Control Dispatcher Tick (Orchestrator Loop)

You are running as a scheduled dispatcher. Your job is to keep Mission Control moving:
- pick up new tasks,
- assign them to appropriate agents (FAST/STANDARD/DEEP tier),
- detect orphaned/stalled tasks (esp. after gateway restarts),
- and notify only when action is required.


## Hard constraints
- Be token-efficient: do not do deep reasoning. Do not rewrite tasks. Do not add fluff.
- Never spam: only send notifications when you actually (a) dispatch, (b) reassign, (c) mark blocked, or (d) detect something broken.
- Never contact other workers directly; dispatch through the Orchestrator routing rules and work packets.

## Inputs / environment assumptions
- Mission Control is the source of truth for task stage and assignment.
- You can read MC tasks via whatever integration exists in this deployment (API or stored task list).
- If you cannot access MC tasks, STOP and send a single BLOCKED REPORT explaining what’s missing (endpoint/credentials/path).

## Stage model
Planning → Inbox → Assigned → In Progress → Testing → Review → Verification → Done

## Dispatcher policy (one pass per tick)

### Step 0 — Load state
Load:
- list of MC tasks (id, title, stage, assignee if any, lastUpdate timestamp if available)
- agent availability (best effort; if unknown, assume 1 task per agent and avoid over-dispatching)

### Step 1 — Identify candidates
A) Dispatchable:
- stage == Inbox AND no assignee, OR stage == Assigned AND work not yet started

B) Orphaned/stalled:
- stage == In Progress (or Testing/Review) AND lastUpdate older than STALE_MINUTES
- OR task assigned to an agent that is not responding / appears offline

Set STALE_MINUTES default to 30 unless you have a better local standard.

### Step 2 — Choose actions (minimal, safe)
Do actions in this priority order:
1) Handle orphaned/stalled tasks (prevent deadlocks)
2) Dispatch new Inbox tasks (one or two per tick max to avoid floods)

### Step 3 — Routing (FAST/STANDARD/DEEP)
For each task you take action on:
- Choose tier:
  - DEEP if infra/security/compliance/pricing OR ambiguous OR multi-step/tool-heavy
  - FAST only if clearly trivial
  - otherwise STANDARD
- Choose role agent:
  - PM: clarify requirements / acceptance criteria / plan
  - Dev: implementation
  - Infra: gateway/config/deploy
  - Sales/HR/BizOps: domain tasks
- Prefer not to assign multiple agents unless necessary.

### Step 4 — Dispatch format (Work Packet)
When you assign/reassign a task, attach/send this work packet to the chosen agent:

WORK PACKET
- Task ID:
- Task title:
- Current MC stage:
- Tier: FAST|STANDARD|DEEP
- Goal (1 sentence):
- Definition of Done (bullets):
- Required Evidence (1–3 bullets):
- Constraints / risk gates:
- Inputs (paths/links/context):
- Checkpoint cadence: (e.g. 10 min or milestone)
- Contract reminder: Respond ONLY as RESULT / CLARIFY / BLOCKED REPORT.

### Step 5 — Notifications (only on action)
If you took any action, send a single compact notification to the alerts channel (or MC activity feed):
- what tasks you dispatched/reassigned/blocked
- who they went to
- any CLARIFY questions you need from the user
If you took NO action, send nothing.

## Output contract for this tick
Return a short internal summary for logs:
- actions taken (or “no-op”)
- tasks touched
- any detected systemic issue
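Steps 3 and 4 are deterministic enough to mirror in code. That can be handy for checking what the dispatcher should have done on a given task. The task flags (`domain`, `ambiguous`, `multi_step`, `trivial`) are hypothetical inputs, and the packet text mirrors the WORK PACKET fields above.

```python
DEEP_DOMAINS = {"infra", "security", "compliance", "pricing"}


def choose_tier(domain: str, ambiguous: bool = False, multi_step: bool = False,
                trivial: bool = False) -> str:
    """Step 3 rubric; DEEP conditions take precedence over 'trivial'."""
    if domain.lower() in DEEP_DOMAINS or ambiguous or multi_step:
        return "DEEP"
    return "FAST" if trivial else "STANDARD"


def work_packet(task_id: str, title: str, stage: str, tier: str, goal: str,
                done: list, evidence: list, constraints: str = "",
                inputs: str = "", cadence: str = "milestone") -> str:
    """Render the Step 4 WORK PACKET as plain text, one field per line."""
    lines = [
        "WORK PACKET",
        f"- Task ID: {task_id}",
        f"- Task title: {title}",
        f"- Current MC stage: {stage}",
        f"- Tier: {tier}",
        f"- Goal (1 sentence): {goal}",
        "- Definition of Done:",
        *(f"  - {d}" for d in done),
        "- Required Evidence:",
        *(f"  - {e}" for e in evidence),
        f"- Constraints / risk gates: {constraints}",
        f"- Inputs (paths/links/context): {inputs}",
        f"- Checkpoint cadence: {cadence}",
        "- Contract reminder: Respond ONLY as RESULT / CLARIFY / BLOCKED REPORT.",
    ]
    return "\n".join(lines)
```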

Two optional refinements (worth doing)

Stall detection based on “last checkpoint message” (not just timestamps)
“One action per tick” throttle at first (prevents dispatch storms after restart)
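Both refinements are small. This sketch assumes two things:

- Each task records a hypothetical `last_checkpoint` timestamp (when it last posted a checkpoint message) and `started_at`.
- Candidate actions arrive already sorted by priority.

```python
from datetime import datetime, timedelta
from typing import List


def is_stalled(task: dict, now: datetime, stale: timedelta = timedelta(minutes=30)) -> bool:
    """Stalled = no checkpoint message within `stale` (falls back to the start time)."""
    # Unrelated edits (labels, comments) can bump a task's lastUpdate, so the last
    # posted checkpoint is a more honest signal of actual progress.
    last = task.get("last_checkpoint") or task.get("started_at")
    if last is None:
        return False  # not started yet: nothing to be stalled on
    return now - last > stale


def throttle(actions: List, max_per_tick: int = 1) -> List:
    """Keep only the highest-priority actions; start at 1 per tick after a restart."""
    return actions[:max_per_tick]
```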

If you tell me how your dispatcher can access Mission Control tasks (API endpoint vs file export vs “MC is just a convention”), I can tailor the prompt to the exact data source + give you a clean “BLOCKED” message for the missing integration.

old jackalBOT Mar 8, 2026, 3:48 PM

You prevent this by treating infra changes like production deploys, not like “LLM edits a config file and vibes”.

The winning combo is:

reduce blast radius (limit what the infra agent is allowed to do)
two-phase commit (plan → approve → execute)
preflight validation + rollback (so a bad change can’t brick the gateway)

Here’s a practical setup that works well with OpenClaw:

1) Make the infra agent “read-only by default”

Default behavior:

inspect, propose, prepare diffs/commands
do not restart, update, rotate tokens, or rewrite config without explicit approval

This alone eliminates most “it broke itself”.

Enforcement: Orchestrator should reject any infra action that isn’t preceded by an approval request.

2) Adopt a Safe-Change Runbook (every time)

Have your infra agent follow this exact sequence:

A) Snapshot / backup first (break-glass)

Save a timestamped copy of the active config (and any other critical state you rely on).

B) Preflight validate before restart

Use the CLI validator so you catch schema/key errors before the gateway goes down:

openclaw config validate (docs: docs/cli/config.md)

If validate fails: STOP, do not restart.

C) Schedule a post-restart ping (durable)

Before restart/update, schedule a one-shot cron announce to your alerts channel. Cron persists across restarts (docs: docs/automation/cron-jobs.md), unlike in-flight “I’ll ping back”.

D) Execute the change

Only after approval.

E) Verify health

Immediately check:

gateway is running
channel delivery works
mission control reconnects (if used)

F) Auto-rollback if verification fails

If gateway fails to come up or delivery is broken, revert to the backup config and restart.

This turns “oops” into a 2-minute rollback instead of a manual rescue.
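The runbook's control flow (backup, validate, execute, verify, roll back) can be sketched with the side effects passed in as callables. The assumptions are:

- `validate`, `restart`, and `healthy` are hypothetical hooks that you wire to your own tooling. For example, `validate` might shell out to `openclaw config validate` and check its exit code.
- Approval (step D) and the cron ping (step C) already happened before this runs.
- The change is validated as a candidate file, so the live config is only replaced after it passes.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable


def safe_change(config: Path, new_text: str,
                validate: Callable[[Path], bool],
                restart: Callable[[], None],
                healthy: Callable[[], bool]) -> str:
    """Snapshot -> validate candidate -> swap in -> restart -> verify -> roll back."""
    # A) Timestamped snapshot of the active config (break-glass copy).
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    backup = config.with_name(f"{config.name}.bak-{stamp}")
    shutil.copy2(config, backup)
    # B) Preflight: validate a candidate file; on failure, stop before any restart.
    candidate = config.with_name(config.name + ".candidate")
    candidate.write_text(new_text)
    if not validate(candidate):
        candidate.unlink()
        return "aborted: validation failed; live config untouched, no restart"
    # D) Execute: move the validated candidate into place, then restart.
    candidate.replace(config)
    restart()
    # E) Verify health; F) on failure restore the snapshot and restart again.
    if healthy():
        return f"ok: backup kept at {backup.name}"
    shutil.copy2(backup, config)
    restart()
    return f"rolled back: health check failed, restored {backup.name}"
```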

3) Restrict tooling per agent (blast-radius control)

OpenClaw supports per-agent tool allow/deny policies (docs mention tools.allow/tools.deny and per-agent equivalents in the configuration reference). The practical pattern:

Infra Planner agent: can read logs/docs/status; no “dangerous” tools.
Infra Executor agent: can run the minimal set of tools needed, but still uses approval gates.

Even if you don’t implement strict tool policies yet, logically splitting “planner vs executor” is a huge reliability win.
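The planner/executor split can be prototyped as an allow/deny check before you commit to real config. This is not OpenClaw's `tools.allow`/`tools.deny` schema. The agent and tool names are hypothetical, deny is assumed to win over allow, and unknown agents get nothing.

```python
# Hypothetical policies; "deny" wins, and anything not allowed is refused.
POLICIES = {
    "infra-planner": {"allow": {"read_file", "read_logs", "status"},
                      "deny": {"exec", "restart"}},
    "infra-executor": {"allow": {"read_file", "read_logs", "status", "exec", "restart"},
                       "deny": set()},
}


def tool_allowed(agent: str, tool: str) -> bool:
    """Default-deny check: the tool must be allowed and not explicitly denied."""
    policy = POLICIES.get(agent)
    if policy is None:
        return False
    return tool not in policy["deny"] and tool in policy["allow"]
```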

4) Don’t let the infra agent edit its own instructions

Most self-bricking happens when an agent rewrites:

its own contract / rules
orchestrator routing rules
auth/delivery settings

Make those files human-owned (or at least “proposal only”): the agent can draft a patch, but cannot apply it without approval.
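Proposal-only ownership can be enforced at the write path. This sketch uses glob patterns from the standard library's `fnmatch`. The protected patterns are hypothetical examples of the three file kinds listed above.

```python
from fnmatch import fnmatch

# Human-owned files: agents may draft patches for these, never apply them.
PROTECTED = ["OPERATING-CONTRACT.md", "SHARED-RULES.md", "rules/*",
             "orchestrator/*", "config/auth*"]


def write_mode(path: str) -> str:
    """Return 'apply' for normal files, 'proposal-only' for human-owned ones."""
    return "proposal-only" if any(fnmatch(path, pat) for pat in PROTECTED) else "apply"
```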

5) Make “brick-proof” the default with one command: validate

If you do only one thing: enforce that every config change must pass:

openclaw config validate (from docs/cli/config.md)

before any restart/update. That catches the majority of “gateway won’t start” incidents.

If you want, I can write you a pasteable Infra Agent “Safe Change Protocol” prompt (plan/approve/backup/validate/restart/verify/rollback) that fits your Operating Contract style and works nicely with Mission Control.

Note: I did not schedule a reminder in this turn, so this will not trigger automatically.

old jackalBOT Mar 8, 2026, 4:06 PM

Two different levers here:

Planning-before-action (behavioral) — works with any model
“Thinking level” (runtime/model feature) — only works if the underlying model/provider supports it

You usually want both: plan first, and only turn on high thinking for DEEP-tier work.

1) Force planning before proceeding (most reliable)

In your Orchestrator prompt, add a hard gate:

For any non-trivial task, the orchestrator must produce:
- clarifying questions (if needed)
- a short plan + Definition of Done + risks
- then wait for explicit “APPROVE” before dispatching workers / running tools

This prevents “agent just starts doing stuff” and is independent of model quirks.

Mission Control already has a Planning Q&A flow — the key is: don’t let tasks move to In Progress until the plan is approved.
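The gate can be enforced at the stage transition itself. This sketch assumes a task dict with a hypothetical `plan_approved` flag that is set only when the user replies APPROVE. Stage names follow the MC board order listed earlier.

```python
STAGES = ["Planning", "Inbox", "Assigned", "In Progress", "Testing",
          "Review", "Verification", "Done"]


def record_reply(task: dict, reply: str) -> None:
    """Only an explicit APPROVE (whole reply, case-insensitive) approves the plan."""
    if reply.strip().upper() == "APPROVE":
        task["plan_approved"] = True


def can_move(task: dict, target: str) -> bool:
    """Block entry to In Progress (or any later stage) until the plan is approved."""
    if STAGES.index(target) >= STAGES.index("In Progress"):
        return bool(task.get("plan_approved"))
    return True
```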

2) Enable “deep thinking” (OpenClaw-specific knobs)

OpenClaw supports a thinking level concept you can set per run. You’ll see it in:

sub-agent spawns: /subagents spawn … --thinking <level> (docs: docs/tools/subagents.md)
cron jobs: openclaw cron add … --thinking high (docs: docs/automation/cron-jobs.md)
agent sends: openclaw agent … --thinking medium/high (docs: docs/tools/agent-send.md)
defaults: agents.defaults.thinkingDefault in config (docs: docs/gateway/configuration-reference.md)

Important caveat: not all models honor thinking levels. Some providers ignore it or implement their own “adaptive thinking”. So treat thinking as “best effort”, and rely on the planning gate for correctness.

Recommended pattern (simple + cost-controlled)

Keep your global/default at low (agents.defaults.thinkingDefault: "low")

For DEEP-tier tasks only, override thinking to high on the dispatched run (cron/subagent/agent-send).

Copy/paste snippet for your Orchestrator (planning gate + thinking escalation)

## Planning Gate (must follow)
For any task that is multi-step, tool-heavy, or high-stakes:
1) Ask clarifying questions if needed.
2) Propose a plan (3–7 steps) + Definition of Done + risks.
3) STOP and wait for the user to reply “APPROVE”.
Only after APPROVE may you dispatch workers or run tools.

## Thinking escalation rule
- Default thinking: low.
- If Tier=DEEP, run the delegated worker turn with thinking=high (if supported).
- If thinking override is unsupported by the current model, proceed with the same plan/verification rigor anyway and explicitly note: “thinking override not supported; using process gate instead”.
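The escalation rule reduces to a small lookup. This sketch assumes you know, or configure, which thinking levels the current model honors. `thinking_for` and the `supported` set are hypothetical. A `None` level means "don't pass a thinking override at all".

```python
def thinking_for(tier: str, supported: set) -> tuple:
    """Return (level or None, note). Default low; DEEP asks for high if supported."""
    wanted = "high" if tier.strip().upper() == "DEEP" else "low"
    if wanted in supported:
        return wanted, ""
    return None, "thinking override not supported; using process gate instead"
```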

If you tell me how you dispatch workers today (subagents vs Mission Control assignment vs cron isolated jobs), I can give you the exact place to apply the --thinking high override so it only hits DEEP-tier runs and doesn’t blow up your token bill.

hot stump Apr 8, 2026, 2:59 AM

How do you turn on high thinking?

#Architecture Questions

The model that scales: “Hot cache + routing + canon”

1) MEMORY.md = hot cache of invariants (not a dump, not only pointers)

2) Canonical docs = single source of truth per topic

3) Routing indexes = one per folder/domain, not one mega-index

Make QMD hybrid retrieval love you (practical tactics)

Use “retrieval-friendly headers”

Put examples/test cases next to rules (huge)

Add a glossary / alias map

Prefer smaller, denser docs over huge docs

Put “updated” metadata in the doc, not just in filenames

Suggested structure (simple but scalable)

What I’d do for your exact goals (fast recall, low token waste)

The pattern that works well with Notion sync

Why this helps specifically with Notion sync

Extra wins (low effort, big payoff)

A setup that works well

Why I’d still prefer poll over webhook (even if webhook exists)

Quick questions (so the recommendation matches your stack)

0) Core principles (the “why this works”)

1) Recommended folder layout (battle-tested)

2) What goes where (very concrete)

MEMORY.md (hot cache)

kb/** (curated canon)

sources/notion/** (raw sync)

logs/**

adr/**

3) The doc template that maximizes QMD retrieval quality

4) Index files that actually help (not just TOCs)

5) Naming conventions

6) If you want the agent to maintain this automatically

1) Proactive notifications (so it never “goes silent” on deep work)

2) Real-time tracking (watching what it’s doing)

A) The “human feed” (Discord thread)

B) The “developer feed” (live run log in the workspace)

3) “Notify me when deep thinking starts”

4) Optional: a watcher job (for guaranteed pings)

How to pair them (conceptually)

Practical tip

Plan (what you’re implementing)

Prompt to paste into your agent (instructions / system prompt addon)

Pros (why this approach works in practice)

Cons (where token waste can happen)

Will it “just waste tokens”?

Knob A: When you enable it

Knob B: How verbose updates are

How to make it token-efficient (recommended tweaks)

Prompt to paste (COMPACT default + DETAILED on demand)

Why “pointer only” is risky

Why “download only” is also imperfect

Best practice setup for an infra agent

What tends to work best (architecture)

Pasteable prompt: “Smart Model Routing Rubric”

OpenClaw-specific note

Best-practice patterns (pick one)

Pattern 1 (recommended): “Agent pool per tier” (FAST / STANDARD / DEEP)

Pattern 2: Per-task model override (only if your runtime truly supports it)

Pattern 3: Escalation ladder (token-efficient)

How to make the orchestrator pick correctly

Routing triggers that work in real life

Pasteable orchestrator snippet (works with agent-pool routing)

1) Recommended architecture (minimal moving parts)

Roles

Communication rule (default)

2) How a worker knows it is “blocked”

3) The “Blocked Protocol” (what the worker sends)

4) Why I don’t recommend worker-to-worker by default

Pros of peer-to-peer

Cons (the real-world pain)

5) The clean workflow loop (robust and easy to reason about)

6) If you want, I can give you the exact prompts

1) ORCHESTRATOR prompt (paste into orchestrator agent)

2) GENERIC WORKER prompt (paste into each worker: HR/Sales/Dev/PM/Infra/BizOps)

3) Role add-ons (append to each worker prompt)

HR agent add-on

Sales agent add-on

PM agent add-on

Business Operations add-on

Infra agent add-on

Dev agent add-on

1) `MEMORY.md` = hot cache of invariants (not a dump, not only pointers)

`MEMORY.md` (hot cache)

`kb/**` (curated canon)

`sources/notion/**` (raw sync)

`logs/**`

`adr/**`