#Architecture Questions
Option C (hybrid) is what I've seen scale best, with one extra twist: treat your workspace like a small knowledge base with "hot cache + canonical sources + retrieval scaffolding", not like one big memory blob.
Here's a pattern that stays fast/precise with QMD hybrid search and doesn't explode token usage.
The model that scales: "Hot cache + routing + canon"
1) MEMORY.md = hot cache of invariants (not a dump, not only pointers)
Keep it ~50–200 lines, but make it high signal:
- Non-negotiables (compliance, pricing floors/ceilings, things you must never do)
- Definitions / glossary of core terms (the exact words people will use in prompts)
- Very compact business rules (1–3 lines each)
- Routing links to canonical docs
Think of it like: "If the agent forgot everything else, what must still be true?"
If you keep it pure pointers only (Option A), you risk the agent "forgetting to look" when it doesn't realize a rule exists. If you dump everything into MEMORY.md (Option B), it becomes stale, noisy, and retrieval quality drops because everything looks equally important.
2) Canonical docs = single source of truth per topic
Have one "owner" file per domain:
- pricing/pricing.md (the canonical pricing logic)
- compliance/compliance.md
- product/product-spec.md
- sales/playbook.md
- clients/<client>.md
Then allow satellites if needed (FAQ, examples, edge cases), but the canon should be obvious.
3) Routing indexes = one per folder/domain, not one mega-index
This is the piece people skip. For hundreds of files, one global index gets "too shallow".
Example:
- INDEX.md (top-level map, 20–60 lines)
- pricing/INDEX.md
- clients/INDEX.md
- product/INDEX.md
- sales/INDEX.md
Each index should include:
- a 3–8 line summary ("what lives here")
- links to canonical docs
- common queries / synonyms ("pricing", "discount", "quote", "margin", "MSRP", etc.)
Those "synonyms + common queries" lines help BM25 a lot and also help embedding retrieval anchor to the right area.
Make QMD hybrid retrieval love you (practical tactics)
Use "retrieval-friendly headers"
In canonical docs, start with a compact block like:
## Summary
## Rules (do / don't)
## Decision / Rationale
## Examples / Test cases
## Edge cases
## Related docs
This gives the reranker clean "answer-shaped" chunks.
Put examples/test cases next to rules (huge)
For pricing and policies, add 3–10 concrete examples:
- "If customer is X and term is Y → price is Z"
- "Discount approval required if >15%"
- "Annual prepay + multi-seat bundle example"
Agents apply rules way more reliably when the doc contains "unit tests" in plain language.
Add a glossary / alias map
Have GLOSSARY.md or a small section in MEMORY.md:
- "ACV = annual contract value"
- "Pro plan = 'Growth' in older docs"
- "Client 'Umbrella' sometimes called 'Umb' in Slack"
This directly improves both BM25 matching and semantic retrieval across months of naming drift.
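One way to put the alias map to work on the query side is to expand a query into variants before searching. This is only a sketch: `ALIASES` and `expand_query` are hypothetical names, the entries just mirror the glossary examples above, and nothing here depends on how QMD itself handles queries.

```python
import re

# Hypothetical alias map; entries mirror the glossary examples above.
ALIASES = {
    "acv": ["annual contract value"],
    "pro plan": ["growth"],
    "umbrella": ["umb"],
}


def expand_query(query: str, aliases: dict = ALIASES) -> list:
    """Return the original query plus one lowercased variant per known alias."""
    variants = [query]
    lowered = query.lower()
    for term, alternatives in aliases.items():
        # Whole-word match only, so "acv" does not fire inside another word.
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, lowered):
            for alt in alternatives:
                variants.append(re.sub(pattern, alt, lowered))
    return variants
```

Each variant can then be sent to the search layer separately and the results merged. The map is one-directional as written; add reverse entries if older docs use the long form.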
Prefer smaller, denser docs over huge docs
Hundreds of files is fine. A few monster files are usually worse because:
- chunking gets messy
- lots of irrelevant chunks match the query
- reranking has more noise to fight
Rule of thumb: if a doc becomes "scroll forever", split into canon + examples + edge cases.
Put "updated" metadata in the doc, not just in filenames
A short metadata block near the top:
Last updated: 2026-03-07
Owner: Sales Ops
Applies to: EU + US (B2B only)
That helps the agent resolve conflicts when it retrieves older stuff.
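If you want the same conflict resolution in a script (for example a lint step over retrieved docs), a minimal sketch could look like this. It assumes the `Last updated: YYYY-MM-DD` line format shown above; `last_updated` and `newest` are hypothetical helper names.

```python
import re
from datetime import date
from typing import Optional

# Matches a "Last updated: YYYY-MM-DD" line anywhere in the document.
LAST_UPDATED = re.compile(r"^Last updated:\s*(\d{4})-(\d{2})-(\d{2})", re.MULTILINE)


def last_updated(text: str) -> Optional[date]:
    """Extract the 'Last updated' date, or None if the doc has no such line."""
    match = LAST_UPDATED.search(text)
    if match is None:
        return None
    year, month, day = (int(group) for group in match.groups())
    return date(year, month, day)


def newest(docs: dict) -> Optional[str]:
    """Return the path with the most recent date; undated docs are ignored."""
    dated = {path: d for path, text in docs.items()
             if (d := last_updated(text)) is not None}
    return max(dated, key=dated.get) if dated else None
```

Undated docs never win here, which is a deliberate nudge toward adding the metadata.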
Suggested structure (simple but scalable)
MEMORY.md
INDEX.md
pricing/
  INDEX.md
  pricing.md
  discount-policy.md
  examples.md
clients/
  INDEX.md
  acme.md
  globex.md
  _template.md
product/
  INDEX.md
  product-overview.md
  spec-core.md
  roadmap-notes.md
sales/
  INDEX.md
  playbook.md
  objection-handling.md
compliance/
  INDEX.md
  compliance.md
  data-handling.md
What I'd do for your exact goals (fast recall, low token waste)
- MEMORY.md: keep ~30–80% "routing" + ~20–70% "critical invariants".
  - Compliance + "never do X" lives here.
  - Pricing guardrails live here (the "shape" of pricing).
- Domain indexes: add synonyms/common questions explicitly.
- Canonical docs: include rules + examples + edge cases.
- Client profiles: enforce a template so retrieval is predictable.
If you want, paste your current MEMORY.md (sanitized) and your folder tree (just names), and I'll suggest a concrete refactor plan + an INDEX.md / pricing/INDEX.md template that works well with hybrid retrieval.
If you're already doing Notion → QMD sync, I'd treat that synced corpus as your raw/source layer, and then build a small, curated "agent layer" on top of it. That tends to beat both "only pointers" and "dump everything into MEMORY.md", because Notion exports are usually huge + noisy (meeting notes, brainstorms, outdated drafts), which hurts retrieval precision over time.
The pattern that works well with Notion sync
1) Keep the Notion sync in its own folder and mentally label it "source, not handbook"
Example:
- sources/notion/** (auto-synced, messy, high recall, lower precision)
- kb/** or handbook/** (your curated canon for the agent)
Then you optimize the agent to hit kb/** first, and fall back to Notion only when needed.
2) Make MEMORY.md point to the curated canon, not to raw Notion pages
In MEMORY.md, include:
- 10–30 invariants (compliance, pricing guardrails, "never do X")
- a short routing table into kb/**
- "self-check triggers" like: "Before quoting price → consult kb/pricing/pricing.md (not meeting notes)"
3) In kb/**, write "answer-shaped" docs that summarize + link to the Notion source
For each critical area, create a canonical doc that has:
- Summary
- Rules
- Examples / test cases
- Edge cases
- Source links (paths into the Notion sync)
So you're not duplicating everything, just extracting the bits the agent must apply correctly.
Why this helps specifically with Notion sync
- Notion spaces inevitably accumulate contradictory content (old specs, abandoned ideas). Hybrid search will retrieve it unless you give it cleaner targets.
- A curated kb/** gives the reranker "clean chunks" to pick.
- You avoid link-rot: Notion page titles/paths can change on sync; your kb/** filenames stay stable.
Extra wins (low effort, big payoff)
- Put meeting notes / brainstorms in a separate sources/notion/meetings/** (or exclude them if your sync supports it). That alone can massively improve precision.
- Standardize a few headings in Notion pages going forward (Summary, Rules, Examples, Owner, Last updated). Those headings survive sync and make retrieval much more reliable.
If you paste a sanitized high-level tree of what your Notion sync produces (folder names + a few representative filenames), I can suggest a concrete kb/** structure and what should live in MEMORY.md vs kb/INDEX.md vs "leave it in Notion only".
Yes, but in practice it's usually an "agent-driven build step" triggered by either a webhook-like event or a lightweight poll, because Notion change webhooks aren't universally available or reliable, depending on what you're using.
A setup that works well
1) Trigger
Pick one:
- Polling (most common / simplest): every N minutes, query Notion for pages/databases where last_edited_time > last_run.
- Webhook via a bridge: use something like n8n / Make / Zapier to "watch Notion" and hit your webhook when a DB item/page changes (depends on what those tools can watch in your workspace).
- Git-based trigger: if your "Notion sync → markdown" already runs somewhere and commits changes, then a git push can trigger the handbook rebuild.
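For the polling option, the "changed since last run" filter is small. A sketch, assuming you already have page records as dicts carrying `id` and an ISO 8601 `last_edited_time`; fetching them from Notion is out of scope here, and `changed_since` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    # Accept a trailing "Z" (UTC), which older Python versions do not parse directly.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def changed_since(pages: list, last_run: datetime) -> list:
    """Return the ids of pages edited strictly after last_run."""
    return [page["id"] for page in pages
            if parse_ts(page["last_edited_time"]) > last_run]
```

Persist `last_run` only after a sync completes successfully, so a failed run gets retried on the next poll.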
2) Sync layer (deterministic)
- Update sources/notion/** with the latest markdown export.
- Ideally keep it incremental (only changed pages) to reduce churn.
3) Handbook build (agent job)
Run an agent/job that:
- reads only changed Notion files
- updates kb/** "canon" docs (pricing rules, compliance, playbooks, etc.)
- preserves strict structure (Summary, Rules, Examples, Edge cases)
- adds source citations/links back to the Notion-synced files so you can audit where a rule came from
4) Guardrails (highly recommended)
Autogenerating business rules can go sideways if the agent "helpfully rewrites" things. Two guardrails that save pain:
- write generated output to kb/_generated/** and have a short kb/** human-approved layer, or
- require the job to produce a "diff + change log" for review (even if you auto-apply most changes)
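The "diff for review" guardrail can be as simple as a unified diff between the approved doc and the generated one. A sketch using Python's standard `difflib`; `review_diff` is a hypothetical helper name.

```python
import difflib


def review_diff(old: str, new: str, path: str) -> str:
    """Return a unified diff of old vs new content, or '' if nothing changed."""
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(lines)
```

An empty result means the job can skip the review step for that file, which keeps the change log short.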
Why I'd still prefer poll over webhook (even if webhook exists)
Notion content can change in bursts, and webhook pipelines tend to:
- spam rebuilds
- rebuild on non-meaningful edits (typos, moved blocks)
- create noisy diffs
A poll with debouncing (e.g. "run at most once every 10 minutes, coalesce changes") is usually calmer and cheaper.
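Here is a minimal sketch of that debounce-and-coalesce behaviour. `Debouncer` is a hypothetical class; time is passed in as plain seconds so the logic stays testable, and the 600-second default matches the "once every 10 minutes" example.

```python
from dataclasses import dataclass, field


@dataclass
class Debouncer:
    min_interval: float = 600.0            # seconds between rebuilds
    last_build: float = float("-inf")      # no rebuild has happened yet
    pending: set = field(default_factory=set)

    def record(self, page_id: str) -> None:
        """Remember a changed page; repeated edits to one page collapse into one entry."""
        self.pending.add(page_id)

    def poll(self, now: float) -> list:
        """Return the coalesced batch to rebuild, or [] if idle or too soon."""
        if not self.pending or now - self.last_build < self.min_interval:
            return []
        batch = sorted(self.pending)
        self.pending.clear()
        self.last_build = now
        return batch
```

Changes recorded during the quiet window are not lost; they ride along with the next allowed rebuild.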
Quick questions (so the recommendation matches your stack)
- How are you doing the Notion sync today (which tool/plugin)? Does it already track "changed since" or expose page IDs in filenames?
- Do you want the handbook to be fully auto-updated, or "auto-propose + human approve" for sensitive stuff like pricing/compliance?
If you tell me your current sync mechanism + where the sync runs (local box, server, CI), I can outline the cleanest trigger path for that setup.
Here's the architecture I'd recommend if your goal is: months-long reliability, fast recall with QMD hybrid, minimal token waste, and Notion as a large synced source.
It's basically: Source corpus (Notion) + Curated handbook (agent-first) + Hot cache (MEMORY), with strict separation between canon and logs.
0) Core principles (the "why this works")
- One canonical place per truth (pricing rules live in exactly one canon doc, not scattered).
- Hot cache ≠ knowledge base: MEMORY.md holds invariants + routing + safety rails, not bulk info.
- Keep high-churn content out of retrieval-critical paths (meeting notes, brainstorms, Slack dumps go to /logs or /sources, not /kb).
- Retrieval scaffolding beats more content: synonyms, common queries, "before doing X, consult Y" dramatically improves real-world accuracy.
1) Recommended folder layout (battle-tested)
MEMORY.md
INDEX.md
GLOSSARY.md
kb/                          # Curated, agent-first canon (small, clean, stable paths)
  00_START_HERE.md
  policies/
    INDEX.md
    company-policies.md
    security-data-handling.md
  pricing/
    INDEX.md
    pricing.md
    discount-approval.md
    examples.md
  product/
    INDEX.md
    product-overview.md
    spec-core.md
  sales/
    INDEX.md
    playbook.md
    discovery-questions.md
  clients/
    INDEX.md
    _template.md
    acme.md
    globex.md
  compliance/
    INDEX.md
    compliance.md
sources/
  notion/                    # Raw synced Notion export (big, noisy, not hand-curated)
    ...                      # whatever your sync produces
logs/                        # High-churn, low-authority material
  meeting-notes/
  call-transcripts/
  scratch/
adr/                         # Decision records (prevents "we changed this but forgot why")
  0001-pricing-model.md
  0002-client-onboarding.md
Rule: the agent should prefer kb/**, then fall back to sources/notion/** only when canon doesn't cover it.
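If your stack lets you post-process retrieval hits, the preference rule can be enforced in code as well as in the prompt. A sketch over plain `(path, score)` pairs; it does not assume anything about QMD's own ranking API, and putting logs/ after sources/ is my assumption about authority order.

```python
def tier(path: str) -> int:
    """Lower tier means more authoritative."""
    if path.startswith("kb/"):
        return 0
    if path.startswith("sources/notion/"):
        return 1
    return 2  # logs/, scratch, anything else (assumed lowest authority)


def prefer_canon(hits: list) -> list:
    """Sort (path, score) hits by tier first, then by descending score."""
    return sorted(hits, key=lambda hit: (tier(hit[0]), -hit[1]))
```

This is a strict preference: a weak kb/ hit still outranks a strong Notion hit. If that is too aggressive, drop kb/ hits below a score threshold before sorting.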
2) What goes where (very concrete)
MEMORY.md (hot cache)
Keep it tight and "action forcing":
- Non-negotiables (compliance, legal, pricing guardrails)
- Golden routing rules (e.g. "Before quoting pricing → consult kb/pricing/pricing.md")
- Top-level map (links to kb/*/INDEX.md)
- Glossary highlights (the 20 terms that appear everywhere)
Think ~80–200 lines, not 1,000.
kb/** (curated canon)
Each domain has:
- INDEX.md (routing + synonyms + common queries)
- 1–3 canonical docs (the rules)
- optional examples.md (test cases are gold for correct application)
sources/notion/** (raw sync)
Everything Notion provides goes here. Don't fight it; just don't let it become the "handbook".
logs/**
Meeting notes, transcripts, brainstorming. Useful for recall, but low authority.
adr/**
When something changes (pricing logic, policy), capture: what changed + why + date.
3) The doc template that maximizes QMD retrieval quality
For any canonical kb/.../*.md, use a consistent skeleton:
# <Topic>
Last updated: YYYY-MM-DD
Owner: <team/role>
Applies to: <scope>
## Summary (5-10 bullets)
## Rules (DO / DON'T)
- Rule:
- Rule:
## Examples / Test Cases
- If ..., then ...
- If ..., then ...
## Edge cases / Exceptions
## Common terms / synonyms
- "ACV" = ...
- "Pro plan" aka "Growth" ...
## Sources
- sources/notion/... (link/path)
- adr/0002-...
This structure makes the retrieved chunks "answer-shaped", and the synonyms help BM25.
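A template only helps if docs actually follow it, so a tiny lint check is worth having. A sketch: `REQUIRED_SECTIONS` is my guess at which headings from the skeleton above should be mandatory, and `missing_sections` is a hypothetical helper name.

```python
import re

# Assumed mandatory headings, taken from the skeleton above.
REQUIRED_SECTIONS = [
    "Summary",
    "Rules",
    "Examples / Test Cases",
    "Edge cases / Exceptions",
    "Sources",
]


def missing_sections(text: str) -> list:
    """Return required '## ' headings that the doc does not contain."""
    headings = [match.group(1).strip().lower()
                for match in re.finditer(r"^##\s+(.+)$", text, re.MULTILINE)]
    # Prefix match, so "## Summary (5-10 bullets)" satisfies "Summary".
    return [section for section in REQUIRED_SECTIONS
            if not any(h.startswith(section.lower()) for h in headings)]
```

Run it over kb/** in the handbook build job and fail (or just warn) when the list is non-empty.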
4) Index files that actually help (not just TOCs)
Each kb/<domain>/INDEX.md should include:
- What this folder contains (3–8 lines)
- Canonical docs
- Common queries (literally: the phrases users ask)
- Synonyms / aliases
- "When you should check this" triggers
Example snippet:
## Common queries
- "Can we discount this?"
- "How do we price annual vs monthly?"
- "What's our minimum contract value?"
## Retrieval triggers
Before quoting or approving pricing:
1) Read pricing.md
2) If client-specific: read clients/<client>.md
3) If discount > 15%: read discount-approval.md
That "trigger" section is disproportionately effective at preventing the agent from skipping rules.
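The trigger list above is effectively a small decision procedure, which also makes it easy to test. A sketch: `docs_to_read` is a hypothetical helper, and the kb/ path prefixes assume the folder layout from section 1.

```python
from typing import Optional


def docs_to_read(client: Optional[str] = None, discount_pct: float = 0.0) -> list:
    """Return the docs to consult before quoting or approving pricing."""
    docs = ["kb/pricing/pricing.md"]                     # always read the canon
    if client:
        docs.append(f"kb/clients/{client}.md")           # client-specific terms
    if discount_pct > 15:
        docs.append("kb/pricing/discount-approval.md")   # only above the 15% line
    return docs
```

For example, `docs_to_read("acme", 20)` returns all three paths, while a plain pricing question returns only the canon.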
5) Naming conventions
- Prefer stable slugs (pricing.md, not pricing-2026-03.md)
- Put dates in front matter / "Last updated", not filenames
- Keep one canonical per topic; anything else is examples.md, faq.md, or archive/
Symlinks: I'd avoid them unless you know your sync/indexer handles them cleanly; they often create weird duplication in indexing.
6) If you want the agent to maintain this automatically
Do it, but with guardrails:
- Auto-sync Notion → sources/notion/**
- Agent job updates only kb/** + INDEX.md + GLOSSARY.md
- Strong preference: have it output a change log (or PR-style diff) so you can spot nonsense fast, especially for pricing/compliance.
If you reply with:
- your current top-level workspace tree (just folder names), and
- which parts are most "mission critical" (pricing? compliance? sales?),
I can tailor the kb/** skeleton + the exact contents of MEMORY.md so it's optimized for your use case rather than generic.
Best pattern: one "task thread" per job + structured progress checkpoints + a live run log. That gives you both (a) proactive notifications when it's going heads-down, and (b) a place to watch what it's doing without guessing.
1) Proactive notifications (so it never "goes silent" on deep work)
Add a simple operating rule to your agent's instructions:
- Before any step that will take > ~30–60s (deep reasoning, multi-file search, long tool runs), it must send:
  - what it's doing
  - why
  - what "done" looks like
  - when you'll hear back next (ETA)
Example update format the agent should post:
- Status: Heads-down analysis (ETA 6–10 min)
- Current step: Building pricing decision tree from kb/pricing/* + Notion sources
- Next checkpoint: Draft rules + 5 test cases
- Risks / questions: "Do discounts differ for EU vs US?"
Then enforce a cadence like:
- checkpoint every N minutes (5–10 min) or after each milestone, whichever comes first
- if blocked, it must post "blocked + what it needs from you" immediately
This single rule eliminates most of the "agent disappeared" anxiety.
2) Real-time tracking (watching what it's doing)
You want two surfaces:
A) The "human feed" (Discord thread)
Have the agent work in a dedicated thread per task and post:
- plan (bulleted)
- live checkpoints
- intermediate artifacts (links / snippets)
- final summary + "what changed" + "what to verify"
Keep these short and structured to avoid spam.
B) The "developer feed" (live run log in the workspace)
Have the agent maintain a file like:
runs/2026-03-08-pricing-rebuild/RUNLOG.md
And append entries like:
- timestamp
- action taken (searched X, updated Y)
- decisions made
- links to sources used
This gives you "real-time visibility" even when the thread updates are intentionally less frequent.
If your stack supports it, you can also have the agent write a one-screen runs/.../STATUS.md that always reflects the current step + next step + ETA.
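The split between the two files is: STATUS.md gets overwritten, RUNLOG.md only ever grows. A minimal sketch of a helper an agent tool (or wrapper script) could call at each checkpoint; `checkpoint` and the exact field layout are assumptions, not a fixed format.

```python
from datetime import datetime, timezone
from pathlib import Path


def checkpoint(run_dir: Path, step: str, next_step: str, eta: str) -> None:
    """Overwrite STATUS.md with the current state and append one line to RUNLOG.md."""
    run_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # STATUS.md is overwritten: it only ever shows the current state.
    (run_dir / "STATUS.md").write_text(
        f"Current step: {step}\nNext step: {next_step}\n"
        f"ETA: {eta}\nLast update: {stamp}\n",
        encoding="utf-8",
    )
    # RUNLOG.md is append-only: one line per checkpoint, oldest first.
    with (run_dir / "RUNLOG.md").open("a", encoding="utf-8") as log:
        log.write(f"- {stamp} {step}\n")
```

Usage would be something like `checkpoint(Path("runs/2026-03-08-pricing-rebuild"), "scan kb/pricing", "draft rules", "8 min")`.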
3) "Notify me when deep thinking starts"
Make it explicit in the agent contract:
Trigger a "deep work" notice when:
- it's about to run multi-step reasoning
- it's about to touch many files / do a big synthesis
- it expects silence longer than X
That notice should include the next checkpoint time. The key is the ETA; otherwise "thinking..." isn't actionable.
4) Optional: a watcher job (for guaranteed pings)
If you're running long autonomous jobs, add a lightweight watcher that:
- checks "is a task still running?"
- if no checkpoint message in (say) 10–15 minutes → ping you with "still working, next ETA..."
This prevents the worst-case: the agent is stuck and you don't know.
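The watcher's core check is a one-liner. A sketch: `needs_ping` is a hypothetical function, and how you obtain `last_checkpoint` (thread history, STATUS.md mtime, task API) depends on your stack.

```python
from datetime import datetime, timedelta


def needs_ping(last_checkpoint: datetime, now: datetime, task_running: bool,
               max_silence: timedelta = timedelta(minutes=15)) -> bool:
    """True if a task is still running but has been silent longer than max_silence."""
    return task_running and (now - last_checkpoint) > max_silence
```

Run it from a cron-style job every few minutes; a finished task never triggers a ping, however old its last checkpoint is.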
If you tell me what surface you primarily use to supervise (Discord only vs also a dashboard / local terminal), I can suggest the exact "checkpoint cadence" that won't spam but still feels real-time. Also: do you want interruptible updates ("pause, ask before continuing") or just transparency?
Yes, Mission Control is actually a pretty good "human supervision layer" for the architecture we discussed.
From the README, Mission Control is a task dashboard + live event feed that connects to OpenClaw Gateway over WebSocket and manages a workspace directory. That maps cleanly to:
- Mission Control = your "human feed" + kanban + live monitoring
- OpenClaw agents = the workers
- Your kb/**, sources/notion/**, runs/** layout = the shared workspace the agents read/write
How to pair them (conceptually)
- Use Mission Control workspaces as the "project root"
  - Put the knowledge base structure inside the workspace: kb/, sources/notion/, logs/, adr/, MEMORY.md, etc.
  - Configure Mission Control's workspace base path so every task/agent it runs lands in that same filesystem tree.
- Make "deep work visibility" a first-class deliverable
  - In every Mission Control task, require the agent to:
    - post a short plan
    - post checkpoints (ETA + current step + next step)
    - maintain a runs/<task-id>/RUNLOG.md (live log)
  - Mission Control's Live Feed becomes your real-time "what's happening" stream, while the RUNLOG is the durable audit trail.
- Use status/stage transitions as your notification hooks
  - Mission Control already has a pipeline (Planning → Inbox → Assigned → In Progress → Testing → Review → Done).
  - The best "notify me" moments are usually:
    - when entering In Progress (heads-down starts)
    - if moving backward / failing a stage (stuck / needs input)
    - when hitting Review / Done (ready for you)
Mission Control also mentions webhooks (HMAC) and API/SSE support; if you want pings outside the UI, that's the natural place to wire "task state changed" → Discord/Slack/etc.
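If you do wire up the webhook, verify the signature before acting on the payload. The sketch below is generic HMAC verification; the hash (SHA-256), the hex encoding, and how the signature is delivered are all assumptions on my part, so check Mission Control's README for the actual scheme.

```python
import hashlib
import hmac


def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Check an HMAC-SHA256 hex signature over the raw request body (assumed scheme)."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature_hex)
```

Always verify against the raw bytes of the request body, not a re-serialized JSON object, since re-serialization can change whitespace and key order.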
Practical tip
Keep your Notion sync output under sources/notion/**, but make Mission Control tasks write only to kb/** + runs/** (and maybe INDEX.md/GLOSSARY.md). That prevents the agent from "fixing" raw Notion exports and keeps the canon stable.
If you tell me how you're deploying Mission Control (Docker vs local) and where your workspace lives, I can suggest the cleanest workspace mount/layout so kb/** stays stable and the agents always operate in the right directory.
Plan (what you're implementing)
- Adopt one visibility protocol the agent always follows (whether it's a Mission Control task or an ad-hoc chat prompt).
- Standardize 3 artifacts for every non-trivial piece of work:
  - Chat updates (human-readable checkpoints)
  - runs/<run-id>/STATUS.md (single-screen "what's happening now")
  - runs/<run-id>/RUNLOG.md (append-only action log)
- Enforce a "no long silence" rule: if it will take >60s or involves multiple steps/tools, it must post a checkpoint with an ETA and the next milestone.
- Make it work in both modes:
  - MC task mode: tie the run to the task title/ID (if known) and keep updates short (MC Live Feed stays readable).
  - Ad-hoc mode: the agent creates an "ADHOC run" anyway and logs the same way.
Prompt to paste into your agent (instructions / system prompt addon)
# Mission Control Visibility Protocol (ALWAYS ON)
Your #1 priority (after correctness) is: make your work observable in real time.
Follow this protocol for BOTH:
(A) Mission Control tasks, and
(B) ad-hoc prompts where no task exists.
## 0) When to use this protocol
- For any request that is more than a one-shot answer (multi-step, deep analysis, research, tool use, writing files, or anything > ~60 seconds), you MUST use this protocol.
- For truly trivial questions, answer normally (no run folder needed).
## 1) Define the Run
At the start of work, create a run id:
- RUN_ID format: YYYY-MM-DD__short-slug
Create a folder:
- runs/<RUN_ID>/
Create/update these files:
- runs/<RUN_ID>/STATUS.md (overwrite; current state only)
- runs/<RUN_ID>/RUNLOG.md (append-only; chronological)
Optionally:
- runs/<RUN_ID>/ARTIFACTS.md (links to outputs, files, decisions)
If this is a Mission Control task and you know the task title/ID, record it at the top of STATUS.md and RUNLOG.md.
If it's ad-hoc, label it clearly as ADHOC.
## 2) Chat Update Format (use this exact structure)
Whenever you start, hit a milestone, or will be silent >60s, post an update in chat:
**[STATUS]** <One line: what you're doing now>
**Goal:** <What "done" means>
**Plan:** 3–7 bullets with milestones (checkboxes ok)
**Now:** <current step>
**Next:** <next step>
**ETA:** <rough estimate until next checkpoint + until completion>
**Asks/Blocks:** <questions or what you need from me; "none" if none>
**Artifacts:** <paths/links you created/updated so far>
Rules:
- Keep updates concise, but never ambiguous.
- Do NOT dump hidden chain-of-thought. Instead: summarize reasoning at a high level (assumptions, options considered, why you chose X).
- If you encounter risk (data loss, irreversible actions, security/compliance), STOP and ask before proceeding.
## 3) STATUS.md content (overwrite each update)
STATUS.md must always contain:
- Run ID
- Task: <MC task title/ID or ADHOC>
- Current stage: Planning / Executing / Waiting / Reviewing / Done
- Current step (1 sentence)
- Next step (1 sentence)
- Last update timestamp
- Next checkpoint ETA
- Key decisions made (bullets)
- Files touched (bullets)
## 4) RUNLOG.md content (append-only)
Each entry must include:
- timestamp
- what you did (one line)
- result/outcome
- files changed or tools used (if any)
Keep it short, but complete enough to audit later.
## 5) Checkpoint cadence (anti-silence)
- If you expect >60 seconds of work without sending a message: send a checkpoint BEFORE you start that work.
- During long work, send a checkpoint at least every 5–10 minutes, or at each milestone, whichever comes first.
- If stuck: immediately post a "Blocked" update with exactly what you need.
## 6) Completion
When done, post a final message:
- Summary (5–10 bullets)
- Outputs + where they live (file paths)
- Any follow-ups / risks
- What you recommend I verify
Also update STATUS.md stage = Done.
(End of protocol)
If you tell me what your agent's tool surface is (OpenClaw agent in Discord? web UI? local), I can tweak the cadence and the file paths so it fits your actual workspace layout, but the above prompt works as-is for "MC tasks + ad-hoc prompts" and gives you real-time visibility without needing Notion/webhooks.
Pros/cons are real here. Done wrong, it will burn tokens. Done with tight thresholds + compact updates, it's usually a net win because you stop wasting time/tokens on "are you stuck?" back-and-forth and you reduce misfires.
Pros (why this approach works in practice)
- Observability / trust
  - You always know: what it's doing, what's next, and when you'll hear back.
  - Great for "deep work" tasks where silence feels like failure.
- Fewer costly corrections
  - Early checkpoints catch wrong assumptions sooner → fewer "redo the whole thing" cycles (those are the real token killers).
- Audit trail
  - A RUNLOG.md + artifacts list makes it easy to verify what happened, reproduce, and debug.
- Works both in MC tasks and ad-hoc
  - Same ritual everywhere = less cognitive overhead for you.
Cons (where token waste can happen)
- Overhead on small tasks
  - If you force "plan + logs + checkpoints" on a 2-minute question, the logging can be 2–5× the actual work.
- Context bloat
  - More chat messages can mean more context carried forward (depends on your stack, but generally yes), which can increase prompt size over time.
- File clutter
  - runs/* grows quickly unless you prune/archive.
- Accidental sensitivity leakage
  - "Live logs" can capture client names, credentials, internal links. Needs explicit redaction rules.
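As a starting point for those redaction rules, here is a sketch that masks obvious `key=value` style secrets before a line is written to the run log. The pattern list is illustrative only and will miss plenty of real formats, so treat it as a floor, not a guarantee; `redact` is a hypothetical helper.

```python
import re

# Illustrative patterns only; extend for the credential formats you actually use.
SECRET_PATTERNS = [
    re.compile(r"(?i)\b(api[_-]?key|token|password|secret)\b(\s*[:=]\s*)(\S+)"),
]


def redact(line: str) -> str:
    """Replace the value part of recognizable secrets with [REDACTED]."""
    for pattern in SECRET_PATTERNS:
        # Keep the key name and separator, drop only the value.
        line = pattern.sub(r"\1\2[REDACTED]", line)
    return line
```

Client names and internal links need a different approach (for example an explicit deny-list), since they do not follow a recognizable shape.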
Will it "just waste tokens"?
It depends on two knobs:
Knob A: When you enable it
Best practice: enable the protocol only when any of these are true:
- >60s expected work
- multi-step tool use / research
- touching files/code
- high-risk domain (pricing/compliance/security)
Otherwise: answer normally.
Knob B: How verbose updates are
Your updates should be fixed-size and boring. Example of a low-token checkpoint:
Status: step 2/5 (gathering sources). Next: synth rules. ETA next update 7 min.
If you do that, the overhead is typically small (~5–15%) on deep tasks. If you do long narrative "here's everything I'm thinking", it becomes expensive and also not that useful.
How to make it token-efficient (recommended tweaks)
- Cap checkpoints: "every 10 minutes OR milestone", not every minute.
- Chat is summaries only; detailed tracking goes to RUNLOG.md without pasting its contents back into chat.
- Don't re-read the RUNLOG each time (agents can accidentally burn tokens by repeatedly summarizing their own logs).
- Have a "compact mode" for MC: 1–3 lines per update.
- Prune runs: keep last 30–90 days, archive older into monthly bundles.
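Pruning is easy to automate if run folders follow the `YYYY-MM-DD__short-slug` naming from the protocol. A sketch that only decides which runs are old enough to archive; the archiving itself (zip, move, delete) is left to you, and `runs_to_archive` is a hypothetical helper.

```python
from datetime import date, timedelta


def runs_to_archive(run_ids: list, today: date, keep_days: int = 90) -> list:
    """Return run ids whose date prefix is older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    stale = []
    for run_id in run_ids:
        try:
            started = date.fromisoformat(run_id[:10])
        except ValueError:
            continue  # not in the expected format; leave it alone
        if started < cutoff:
            stale.append(run_id)
    return stale
```

Folders that do not match the naming scheme are skipped rather than archived, which is the safer default.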
If you want, I can rewrite the prompt I gave you into two modes:
- COMPACT (MC): tiny updates optimized for live feed
- DETAILED (debug): only when you explicitly ask for deep tracing/debugging
Here are two paste-ready variants: COMPACT (default for Mission Control / low token) and DETAILED (only when you ask for trace/debug). You can paste both and tell the agent "COMPACT is default".
Prompt to paste (COMPACT default + DETAILED on demand)
# Mission Control Visibility Protocol (Token-Efficient)
## Default Mode
- Use **COMPACT** mode by default (optimized for Mission Control Live Feed + low token use).
- Switch to **DETAILED** mode ONLY if the user explicitly asks for: "detailed trace", "show your work", "full debug", "step-by-step log", or "audit trail".
## When to activate (avoid wasting tokens)
If the request is trivial (single short answer, no tools, <60s): answer normally, no run folder.
Activate this protocol ONLY if any are true:
- expected >60 seconds of work
- multi-step reasoning/planning
- tool use / browsing / coding / file edits
- high-risk domain (pricing, compliance, security, irreversible actions)
## Run artifacts (only for activated protocol)
Create a run folder:
- runs/<RUN_ID>/ where RUN_ID = YYYY-MM-DD__short-slug
Maintain:
- runs/<RUN_ID>/STATUS.md (overwrite each checkpoint; single-screen status)
- runs/<RUN_ID>/RUNLOG.md (append-only; 1–3 lines per meaningful action)
Optional:
- runs/<RUN_ID>/ARTIFACTS.md (links/paths to outputs)
Do NOT paste long logs into chat. In chat, link the path(s) only.
## Anti-silence rule
If you expect to be silent >60s, post a checkpoint BEFORE starting that step.
During long runs: checkpoint every 10 minutes OR at milestones (whichever comes first).
If blocked: post immediately with the exact question.
---
# COMPACT MODE (DEFAULT)
## Chat update format (1–3 lines max)
Use exactly this style:
**[MC] <RUN_ID> | <Phase> | <Step i/n>**
Now: <what you're doing> → Next: <next step> | ETA: <next checkpoint + completion>
Ask/Block: <question or "none"> | Artifacts: <path(s) if any>
Phases: Plan / Execute / Verify / Review / Done / Blocked
## First message in COMPACT mode
- 1 compact plan line (steps count + outcome)
- then start execution
Example:
**[MC] 2026-03-08__pricing-rules | Plan | Step 0/5**
Now: define milestones → Next: scan kb/pricing | ETA: upd 8m, done 35m
Ask/Block: none | Artifacts: runs/2026-03-08__pricing-rules/STATUS.md
## Completion message (still compact)
- 5–10 bullets max if needed, but prefer short
- include: what changed + where outputs are + what to verify
---
# DETAILED MODE (ONLY ON EXPLICIT REQUEST)
When enabled, you may:
- expand the Plan to 5â12 bullets
- include brief rationale (NO hidden chain-of-thought; only high-level reasons/assumptions)
- include more frequent RUNLOG entries
Still avoid spamming chat: keep checkpoints concise; put detail in files.
---
# Safety & privacy
- Never log secrets (API keys, passwords, tokens). If encountered, redact.
- For irreversible/destructive actions: STOP and ask first.
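If updates are posted by a script or tool rather than typed freehand by the model, the COMPACT format from the protocol above can be pinned down in a formatter so it never drifts. A sketch; `compact_update` and its parameter names are hypothetical.

```python
def compact_update(run_id: str, phase: str, step: int, total: int,
                   now: str, next_step: str, eta: str,
                   ask: str = "none", artifacts: str = "none") -> str:
    """Render one three-line COMPACT checkpoint message."""
    return (
        f"**[MC] {run_id} | {phase} | Step {step}/{total}**\n"
        f"Now: {now} → Next: {next_step} | ETA: {eta}\n"
        f"Ask/Block: {ask} | Artifacts: {artifacts}"
    )
```

Fixed-shape messages are also easier to grep in the Live Feed than free-form status prose.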
If you want, I can also give you a one-line toggle command you can type to the agent like:
- "Switch to DETAILED mode for this run only"
- "Back to COMPACT mode"
...and a tiny STATUS.md template so it always looks the same.
Do both, but with a very specific bias:
- Primary source of truth for an infra agent should be a local, version-matched snapshot (docs + ideally the repo/code), so it doesn't "learn" keys/commands that don't exist in your installed version.
- Secondary source should be pointers to the live docs for "what's new", but treated as potentially ahead of your install and therefore something to verify.
Why "pointer only" is risky
Live docs usually track main/latest, not necessarily the exact version you're running. For an infra agent doing architectural changes, that can cause:
- wrong config keys
- wrong commands
- features mentioned in docs but not shipped in your version yet
That's the kind of mistake that breaks a setup.
Why "download only" is also imperfect
Local docs can drift stale unless you refresh them, and you'll miss new best practices / fixes unless you occasionally look at upstream.
Best practice setup for an infra agent
- Keep a local repo clone (recommended) and/or a local docs mirror:
  - Pin it to the same tag/commit as your openclaw --version (or at least the closest release tag).
  - Let the agent grep/search the actual code when docs are ambiguous.
- Also keep a "latest docs" pointer list (or an optional fresh mirror):
  - Use it for discovery.
  - Then verify against your pinned version before applying changes.
- In the agent's instructions, add this rule:
  - "When instructions differ between pinned docs and live docs, assume pinned is correct for this environment; only use live-doc guidance if we also upgrade OpenClaw (or confirm the feature exists in code)."
If you tell me whether your infra agent can run shell commands (e.g., read openclaw --version and maintain a git clone), I can give you a tight workflow like: "on startup: detect version → checkout matching tag → use that docs/code for all decisions."
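The "checkout matching tag" step needs a rule for when there is no exact match. Here is a sketch that picks the exact tag if present, else the highest tag not newer than the installed version. Both the shape of the version string and the `v1.2.3` style tag names are assumptions; I have not checked what `openclaw --version` prints or how the repo names its tags.

```python
import re
from typing import Optional


def parse_version(text: str) -> Optional[tuple]:
    """Pull the first dotted number (e.g. 1.4.2) out of a string as a tuple of ints."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    return tuple(int(part) for part in match.group(1).split(".")) if match else None


def closest_tag(installed: str, tags: list) -> Optional[str]:
    """Return the exact matching tag, else the highest tag not newer than installed."""
    target = parse_version(installed)
    if target is None:
        return None
    candidates = [(version, tag) for tag in tags
                  if (version := parse_version(tag)) is not None and version <= target]
    return max(candidates)[1] if candidates else None
```

Returning None when nothing matches is intentional: the agent should stop and ask rather than pin to a newer tag than what is installed.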
Here's a paste-ready system prompt / instruction block for your Infra Agent that implements the "version-pinned local docs + live pointers" architecture.
# Role: OpenClaw Infra Agent (Version-Pinned Docs Architecture)
You are my infrastructure agent responsible for maintaining, fixing, and evolving my OpenClaw setup safely and correctly.
## Goals
1) Make correct changes (no guessing config keys/commands).
2) Prefer deterministic, version-matched guidance over "latest/main" guesses.
3) Be transparent: show what you're doing, what you changed, and why.
4) Be safe: ask before irreversible actions.
---
## Knowledge Architecture (MANDATORY)
You must treat documentation sources in this priority order:
### Source A (Primary): Version-matched local docs/code (PINNED)
- Maintain a local reference snapshot that matches the running OpenClaw version.
- Use this as the source of truth for config keys, CLI commands, and behavior.
### Source B (Secondary): Live docs pointers (LATEST)
- Keep a small pointer/index file with official URLs for the latest docs.
- Use these only to discover new features or best practices.
- If live docs conflict with pinned docs, assume live docs may be ahead of our installed version.
### Conflict rule
If Source A and Source B disagree:
- Do NOT apply the "latest" behavior blindly.
- Either (a) propose an upgrade plan, or (b) verify the feature exists in our pinned code/docs before using it.
---
## Startup / Preflight (do this at the start of each run that touches infra)
1) Determine the running OpenClaw version (and record it in the run log).
2) Ensure the local reference snapshot exists and is pinned to that version/tag/commit.
3) Build/refresh a small local index for fast lookup:
- docs/INDEX.md (or similar) listing the key pinned docs locations and topics.
---
## Operating Procedure (how you work)
### 1) Always verify before acting
- Before you recommend a config key, CLI flag, file path, or behavior:
- Look it up in pinned docs/code first.
- If you're not sure, say "unknown" and investigate; don't guess.
### 2) Always cite sources
When you provide an answer that depends on docs:
- Cite where it came from:
- Pinned: file path + section/header
- Latest: URL + section/header
### 3) Safety gates
You MUST ask for explicit confirmation before:
- rotating/deleting tokens, credentials, or secrets
- deleting files or resetting state
- upgrades/downgrades
- changing network exposure (ports, public URLs, auth settings)
- anything irreversible or that can cause downtime
### 4) Make work observable
For non-trivial runs, keep:
- runs/<RUN_ID>/STATUS.md (current step, next step, ETA)
- runs/<RUN_ID>/RUNLOG.md (append-only actions)
Post periodic checkpoints if work will take >60 seconds.
---
## Deliverables format
When done, provide:
- Summary of what changed
- Exact files/locations affected
- How to verify (commands/checks)
- Rollback plan (if applicable)
- Any "version mismatch" notes (if live docs suggested something newer)
---
## Hard rules
- Never invent config keys or commands.
- Never treat live docs as authoritative if they don't match our pinned version.
- If you cannot verify something in pinned sources, label it as unverified and propose how to verify.
If you want, I can also give you a tiny companion prompt for a Docs Sync Helper agent whose only job is: "keep openclaw-ref pinned to current version + optionally keep a latest-main mirror", so your Infra Agent stays focused on actual fixes.
# Role: OpenClaw Docs Librarian (Markdown-Only)
You are NOT an infra-fixing agent. Your only job is to maintain a clean, local, markdown-based documentation library for OpenClaw that other agents can rely on.
## Objective
Keep two doc views available locally, in markdown:
1) **PINNED docs** that match the user's currently running OpenClaw version (authoritative for config keys/CLI behavior).
2) **LATEST docs** (optional) for discovering new features, clearly marked as "may be ahead of installed version".
## Storage Layout (required)
Maintain this structure in the workspace:
    docs/openclaw/
      pinned/
        VERSION.txt     # exact running version recorded (string)
        repo/           # repo snapshot or extracted docs for that version
        INDEX.md        # topic index + file pointers
      latest/
        repo/           # latest main snapshot (or extracted docs)
        INDEX.md        # topic index + file pointers
      LINKS.md          # official online doc URLs + notes
      CHANGELOG.md      # brief history of updates you made to this library
## Freshness vs Correctness Rules
- **Correctness beats freshness** for operational guidance.
- If pinned and latest disagree, pinned is the default recommendation for current systems.
- Latest is used for "what's new" only unless the system is upgraded.
## Update Procedure (do this when asked, or on a schedule)
1) Determine the installed OpenClaw version (record it in `docs/openclaw/pinned/VERSION.txt`).
2) Refresh the pinned snapshot so docs match that version (tag/commit/release).
3) Refresh the latest snapshot (if enabled).
4) Rebuild `INDEX.md` files so key topics are easy to find:
- install/onboarding
- gateway config/auth
- channels (discord/telegram/etc.)
- models/providers
- tools & permissions
- troubleshooting
5) Update `CHANGELOG.md` with:
- date
- what changed
- what version(s) are now pinned
## Output Rules
- Keep everything in markdown (no PDFs unless explicitly requested).
- Prefer linking to local files by relative path.
- Maintain a short "Where to look first" section at the top of each INDEX.md.
- Never invent config keys/commands; if uncertain, search the pinned snapshot.
## Online Pointers
Maintain `docs/openclaw/LINKS.md` with canonical URLs (official docs + repo), clearly labeled as:
- "online latest (may not match pinned version)"
- include direct links to the most-used pages (start/faq, troubleshooting, models, gateway)
## Safety
- Do not store secrets/tokens in this docs library.
- Do not make infra changes; only update/organize docs.
If you tell me where you want this library to live (inside each agent workspace vs one shared workspace), I can suggest the cleanest layout so your infra agent can "mount/read" it without duplicating files.
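If you'd rather create that library skeleton by script than by hand, here is a rough sketch. It only creates the directories and empty placeholder files named in the Storage Layout section of the Librarian prompt; the function name and the choice to never overwrite existing files are my assumptions.

```python
from pathlib import Path

# Relative paths taken from the "Storage Layout" section of the Librarian prompt.
LAYOUT_DIRS = ["pinned/repo", "latest/repo"]
LAYOUT_FILES = [
    "pinned/VERSION.txt",
    "pinned/INDEX.md",
    "latest/INDEX.md",
    "LINKS.md",
    "CHANGELOG.md",
]


def scaffold_docs_library(workspace: Path) -> Path:
    """Create docs/openclaw/ with empty placeholders; existing files are kept."""
    root = workspace / "docs" / "openclaw"
    for rel in LAYOUT_DIRS:
        (root / rel).mkdir(parents=True, exist_ok=True)
    for rel in LAYOUT_FILES:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():
            path.touch()
    return root
```

Running it twice is safe, so the Librarian can call it at the start of every update without clobbering VERSION.txt or the indexes.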
Smart model routing works best when you stop treating it as "the agent vibes the complexity" and instead make it a separate, explicit decision step with (a) a rubric, (b) an observable choice, and (c) an escalation fallback. The misroutes you're seeing are almost always because the routing criteria are underspecified or the agent is "role-playing" a mode instead of actually switching.
What tends to work best (architecture)
1) Define 3 tiers (don't overcomplicate it)
- FAST: cheap/quick, Q&A, summarization, simple edits, low risk
- STANDARD: normal default work
- DEEP: multi-step reasoning, ambiguous problems, high-risk domains (pricing/compliance/security), heavy tool use, big refactors
2) Make routing a first-class step
Before doing the work, the agent must output:
- chosen tier (FAST/STANDARD/DEEP)
- 2-4 bullet reasons (based on rubric)
- whether it will ask clarifying questions first
This alone fixes "behavioral routing" problems because you can correct it immediately.
3) Add an escalation rule (reduces token waste)
Start in FAST/STANDARD and upgrade only when triggers appear, e.g.:
- needs >2 tools
- contradictions found in docs
- more than N files touched
- uncertainty remains after 2 clarifying questions
- high-stakes decision detected
This prevents burning DEEP tokens on tasks that looked scary but aren't.
4) Route by failure modes, not just "complexity"
Common real-world rule: if a model is flaky at tool calling / JSON / long context, route tool-heavy tasks to the tool-reliable tier even if the "reasoning" is simple.
5) Keep a "routing policy" file + examples
Put a small routing-policy.md with 10-20 examples of past tasks and the correct tier. Concrete examples guide the router far better than abstract rules.
Pasteable prompt: "Smart Model Routing Rubric"
(You can drop this into your agent's instructions. It doesn't name specific models; it defines tiers + when to switch.)
# Smart Model Routing (FAST / STANDARD / DEEP)
You must select an execution tier before starting any non-trivial request.
## Tiers
- FAST: low-risk, short tasks; minimal tools; minimal ambiguity.
- STANDARD: default tier for normal work.
- DEEP: high ambiguity OR high stakes OR multi-step/tool-heavy work.
## Routing Step (required)
Before executing, output:
Routing: <FAST|STANDARD|DEEP>
Why: <2-4 bullets tied to rubric below>
Need-clarification?: <yes/no> (ask questions if yes)
Override-hint: "Say 'force FAST/STANDARD/DEEP' to override."
## Rubric (use these triggers)
Choose DEEP if ANY are true:
- High stakes: pricing, compliance, security, production infra changes
- Ambiguity: requirements unclear and multiple valid interpretations exist
- Multi-step: >3 dependent steps, or needs a plan with milestones
- Tool-heavy: likely >2 tool calls OR complex tool interactions
- Large change: touching many files/components OR architectural refactor
- Conflicting sources: docs disagree / version mismatch suspected
Choose FAST if ALL are true:
- Single clear question with a direct answer
- No sensitive/high-stakes domain
- No (or minimal) tools
- Can be completed in <60 seconds
Otherwise choose STANDARD.
## Escalation rule (token-efficient)
Start in FAST or STANDARD when unsure.
Immediately upgrade to DEEP if:
- you hit contradictions/unknowns
- you're blocked after 2 clarifying questions
- the solution expands to many steps/files/tools
When you upgrade, announce:
"Upgrading to DEEP because <reason>."
## Non-negotiable
Do not "pretend" to be a higher tier with behavior alone.
If the system supports switching models, use the tier to actually switch.
If switching is unavailable, still follow the tier's process (planning rigor, verification), and tell the user that the underlying model did not change.
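If you want the rubric enforced in code rather than left to the prompt, a minimal sketch could look like this. The `TaskSignals` fields are hypothetical names for the rubric's triggers, and reading "no (or minimal) tools" as at most one tool call is my assumption.

```python
from dataclasses import dataclass


@dataclass
class TaskSignals:
    """Hypothetical inputs the router fills in before choosing a tier."""
    high_stakes: bool = False          # pricing, compliance, security, prod infra
    ambiguous: bool = False            # multiple valid interpretations
    dependent_steps: int = 1           # rubric threshold: >3 means DEEP
    expected_tool_calls: int = 0       # rubric threshold: >2 means DEEP
    large_change: bool = False         # many files or architectural refactor
    conflicting_sources: bool = False  # docs disagree / version mismatch
    single_clear_question: bool = False
    under_60_seconds: bool = False


def choose_tier(s: TaskSignals) -> str:
    """DEEP if ANY deep trigger fires; FAST only if ALL fast conditions hold."""
    deep = (
        s.high_stakes
        or s.ambiguous
        or s.dependent_steps > 3
        or s.expected_tool_calls > 2
        or s.large_change
        or s.conflicting_sources
    )
    if deep:
        return "DEEP"
    fast = (
        s.single_clear_question
        and s.expected_tool_calls <= 1  # assumption: "minimal tools" = at most one call
        and s.under_60_seconds
    )
    return "FAST" if fast else "STANDARD"
```

The DEEP check runs first on purpose: a short, clear question about pricing still lands in DEEP, which matches the "ANY vs ALL" asymmetry in the rubric.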
OpenClaw-specific note
For "how to wire this into OpenClaw config / runtimes", the safest reference is the models docs: https://docs.openclaw.ai/concepts/models
And for current best model-tier picks (since that changes constantly), the Discord #models channel tends to have the freshest real-world guidance: https://docs.openclaw.ai/concepts/models#model-discussions-community-notes
If you tell me how you're currently implementing routing (prompt-only? multiple agents? Mission Control templates? any automatic switching?), I can suggest the cleanest way to make the tier selection actually switch the underlying model instead of just changing tone.
Given your setup (one orchestrator delegating to multiple workers, each worker has a default model), the cleanest way to get reliable routing is to route to agent variants, not to try to "hot swap" models inside a single long-lived worker.
Best-practice patterns (pick one)
Pattern 1 (recommended): "Agent pool per tier" (FAST / STANDARD / DEEP)
Create multiple copies of each role, same instructions, different default model. Example:
- Builder_FAST, Builder_STANDARD, Builder_DEEP
- Research_FAST, Research_DEEP
- Infra_STANDARD, Infra_DEEP
Then your orchestrator chooses (role, tier) and assigns the task to that agent.
Pros
- Deterministic: you actually get a different underlying model.
- No tricky runtime switching semantics.
- Easy to debug: "this task went to Builder_DEEP".
Cons
- More agents to manage (but it's usually worth it).
This is exactly how teams avoid "behavioral routing" problems, because the model change is structural, not a suggestion.
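Here is a small sketch of the (role, tier) lookup for Pattern 1, using the example pool above. The round-up behavior when a role has no agent at the requested tier is my assumption, not something your runtime provides; `resolve_agent` is a hypothetical helper.

```python
TIERS = ["FAST", "STANDARD", "DEEP"]

# Pool copied from the example above: not every role has every tier.
AGENT_POOL = {
    "Builder": {"FAST", "STANDARD", "DEEP"},
    "Research": {"FAST", "DEEP"},
    "Infra": {"STANDARD", "DEEP"},
}


def resolve_agent(role: str, tier: str) -> str:
    """Map (role, tier) to an agent name such as 'Builder_DEEP'.

    Assumption: if the exact tier is missing for this role, round UP to
    the next available tier instead of silently downgrading.
    """
    available = AGENT_POOL[role]
    for candidate in TIERS[TIERS.index(tier):]:
        if candidate in available:
            return f"{role}_{candidate}"
    raise LookupError(f"no {role} agent at tier {tier} or above")
```

Because the result is a concrete agent name, you can log it with every dispatch and see exactly which variant handled each task.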
Pattern 2: Per-task model override (only if your runtime truly supports it)
If your orchestration layer can pass a true model_override to a worker execution, do:
- Orchestrator decides tier → sets model_override → dispatches to worker.
Pros
- Fewer agent identities.
Cons
- Easy to accidentally degrade into "pretend routing" if override isn't enforced.
- Harder to observe unless you log the active model.
If you've seen "it doesn't route to the right model", you're probably already in this danger zone.
Pattern 3: Escalation ladder (token-efficient)
Start a task with FAST/STANDARD; if it hits triggers (ambiguity, contradictions, tool failures), re-dispatch to a DEEP agent with the partial findings.
Pros
- Saves tokens on easy tasks.
- Still guarantees DEEP when needed.
Cons
- Requires good handoff summaries.
How to make the orchestrator pick correctly
Regardless of pattern, add a strict "routing step" in the orchestrator:
- Classify: tier = FAST/STANDARD/DEEP
- Choose worker role
- Dispatch with a compact handoff package:
- goal
- constraints
- relevant files/links
- definition of done
- Log: "Dispatched to X because Y; next checkpoint…"
Routing triggers that work in real life
- DEEP if: high-stakes (infra/compliance/pricing), unclear requirements, >2 tools expected, multiple files, or contradictions likely.
- FAST if: single Q/A, straightforward transform, low risk, <60s.
Pasteable orchestrator snippet (works with agent-pool routing)
## Dispatch policy (must follow)
Before delegating any task:
1) Select Tier: FAST / STANDARD / DEEP using this rubric:
- DEEP if high-stakes OR ambiguous OR multi-step OR tool-heavy OR many files OR conflicting sources
- FAST only if low-risk + single-step + no/low tools + <60s
- otherwise STANDARD
2) Select Role: Builder / Research / Infra / etc.
3) Choose the agent variant that matches (Role, Tier):
- Example: (Builder, DEEP) => Builder_DEEP
4) In the handoff message include:
- Goal (1 sentence)
- Definition of done (bullets)
- Constraints/risk notes
- Context pointers (files/paths)
- What you already checked (so workers don't repeat work)
- Required checkpoint cadence (e.g., update every 10 min or milestone)
5) If the worker seems to be struggling (blocked, tool errors, low confidence):
- Escalate by re-dispatching the task to the next tier up with a short summary.
If you tell me whether your orchestration layer (Mission Control/OpenClaw) can actually enforce a per-dispatch model override, I can recommend whether you should use Pattern 1 (agent pool) or Pattern 2 (override). In most stacks, Pattern 1 is the most robust and eliminates the "behavioral but not real" routing issue entirely.
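The escalation in step 5 of the dispatch policy ("next tier up with a short summary") can be sketched as below. What happens when a task is already at DEEP is not covered by the policy; raising so the orchestrator asks the user is my assumption, and `build_escalation` is a hypothetical name.

```python
TIERS = ["FAST", "STANDARD", "DEEP"]


def build_escalation(current_tier: str, findings: str) -> dict:
    """Return a re-dispatch request one tier up, carrying partial findings.

    Assumption: at DEEP there is nowhere left to escalate, so raise and
    let the orchestrator ask the user instead of looping.
    """
    index = TIERS.index(current_tier)
    if index == len(TIERS) - 1:
        raise ValueError("already at DEEP; ask the user instead of escalating")
    return {
        "tier": TIERS[index + 1],
        "handoff_summary": findings.strip(),
        "reason": f"escalated from {current_tier}",
    }
```

Keeping the summary inside the escalation payload is what saves tokens here: the higher tier starts from what was already checked rather than from zero.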
For simple + robust in OpenClaw, I'd build this as a hub-and-spoke supervisor system:
- Orchestrator (hub) owns routing, model tier selection, and the "single source of truth" for task state.
- Workers (spokes) execute narrowly-scoped work and never free-form chat with each other by default.
- All "blocked" and "needs decision" events go back to the orchestrator, which decides whether to (a) ask you, (b) dispatch a helper agent, or (c) give the worker permission/constraints to continue.
This avoids the common failure mode where two agents get stuck in a loop or silently diverge.
1) Recommended architecture (minimal moving parts)
Roles
- Orchestrator
- Receives the task (or ad-hoc prompt)
- Breaks into work packets
- Chooses which worker + which tier (FAST/STANDARD/DEEP)
- Tracks status + merges results
- Handles "blocked" and escalations
- Workers (examples)
- Builder (implementation)
- Researcher (docs/web)
- Tester (validation/checks)
- Reviewer (sanity + risk)
- Infra (only for infrastructure changes)
Communication rule (default)
- Worker → only Orchestrator
- Orchestrator → Workers
- No worker-to-worker DMs unless the orchestrator explicitly authorizes it for a specific subtask.
Think of the orchestrator as the message bus + traffic cop.
2) How a worker knows it is "blocked"
Don't rely on vibes. Define explicit block conditions workers must detect and report.
A worker is BLOCKED if any of these happen:
- Missing required input (e.g., "I need which environment: prod/staging?" "Which client/pricing plan?")
- Ambiguous decision / multiple valid options (e.g., "Two architectures possible; need preference: A vs B")
- Permissions/tooling limitation (can't access a repo, tool fails repeatedly, no credentials)
- Risk gate (destructive/irreversible change, security/compliance risk)
- Contradictory sources (docs disagree; version mismatch suspected)
- Repeated failure (same tool/action fails twice with no new angle)
Workers must stop and report, not keep thrashing.
3) The "Blocked Protocol" (what the worker sends)
Make this a strict template so the orchestrator can route quickly:
BLOCKED REPORT (from Worker → Orchestrator)
- What I was trying to do: …
- Where I got stuck: …
- Block type: missing-info | decision-needed | permissions | risk-gate | contradiction | repeated-failure
- What I already tried: …
- What I need: (one clear ask)
- Options: A/B/C with recommendation + tradeoffs
- Can I proceed safely with a default? yes/no (and what default)
This is the key to robustness. It turns "I'm stuck" into a routable event.
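If your orchestrator parses reports in code, the template maps naturally onto a small record type. This is a sketch only: the field names are my own shorthand for the template lines, and the validation rules (known block type, no empty core fields) are assumptions about what "strict" should mean.

```python
from dataclasses import dataclass, field
from typing import List, Optional

BLOCK_TYPES = {
    "missing-info", "decision-needed", "permissions",
    "risk-gate", "contradiction", "repeated-failure",
}


@dataclass
class BlockedReport:
    trying_to_do: str
    stuck_at: str
    block_type: str
    already_tried: str
    need: str                             # exactly one clear ask
    options: List[str] = field(default_factory=list)
    safe_default: Optional[str] = None    # None means "cannot proceed safely"

    def problems(self) -> List[str]:
        """Return reasons the orchestrator should bounce this report."""
        issues = []
        if self.block_type not in BLOCK_TYPES:
            issues.append(f"unknown block type: {self.block_type}")
        for name in ("trying_to_do", "stuck_at", "already_tried", "need"):
            if not getattr(self, name).strip():
                issues.append(f"empty field: {name}")
        return issues
```

Since the block type is one of six fixed values, the orchestrator can switch on it directly: `risk-gate` goes to you for approval, `missing-info` becomes a question, and so on.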
4) Why I don't recommend worker-to-worker by default
Pros of peer-to-peer
- Faster for quick specialist questions
- Less load on orchestrator
Cons (the real-world pain)
- Harder to debug ("who decided this?")
- Easy to create loops ("ask Researcher" → "ask Builder" → …)
- State divergence (two agents working from different assumptions)
- Worse observability in Mission Control / logs
So: keep peer-to-peer as an exception that the orchestrator explicitly invokes.
5) The clean workflow loop (robust and easy to reason about)
- Orchestrator decomposes task into packets
- Dispatch packet to worker (with tier + DoD + constraints)
- Worker returns one of:
- RESULT (deliverable + notes)
- CLARIFY (questions before proceeding)
- BLOCKED REPORT (using template above)
- Orchestrator decides:
- answer worker question OR ask you OR route to helper agent OR escalate tier
- Worker continues
This keeps the system coherent and prevents "agents solving the wrong problem in parallel".
6) If you want, I can give you the exact prompts
To fully "construct the architecture", you typically want:
- an Orchestrator prompt (routing rubric + how to dispatch + how to handle BLOCKED)
- a Worker prompt (how to execute + when to stop + how to report)
If you tell me your current worker set (Builder/Tester/Reviewer/etc.) and whether you're using Mission Control's stages (Planning → In Progress → Review…), I'll write those two prompts in a copy/paste-ready form that matches your exact setup.
You've got a good role set. To keep it simple + robust with Mission Control stages, I'd do:
- Hub-and-spoke: Orchestrator is the only "router". Workers never chat with each other unless Orchestrator explicitly asks them to.
- Stage-driven: Workers don't "move stages"; they recommend the next stage in their updates, and the Orchestrator (or you) moves it in MC.
Below are paste-ready prompts:
- Orchestrator prompt (Mission Control + routing + blocked handling)
- Generic Worker prompt (used for HR/Sales/PM/Infra/BizOps/Dev as a baseline)
- Short role add-ons for each worker
- Guidance on Dev subroles: ACP vs OpenClaw agents
1) ORCHESTRATOR prompt (paste into orchestrator agent)
# Role: Orchestrator (Mission Control Hub)
You are the central orchestrator. You own: task intake, planning, routing, stage control recommendations, and integration of results.
Workers do not coordinate with each other directly unless you explicitly request it.
## Mission Control Stages (use as the operating state machine)
Planning → Inbox → Assigned → In Progress → Testing → Review → Verification → Done
Rules:
- Keep tasks moving forward.
- If blocked, move the task to a state that reflects reality and request what's needed.
- Always make the next action obvious.
## Dispatch Philosophy (simple + robust)
- Hub-and-spoke only:
- Workers report ONLY to you.
- You decide whether to ask the user, dispatch a helper worker, escalate tier, or change approach.
- Prefer fewer agents per task. Start with 1 worker; add more only when needed.
## Smart Routing (tiers)
Every dispatched work packet must be tagged with a tier:
- FAST / STANDARD / DEEP
Choose DEEP for: infra/security/compliance/pricing; high ambiguity; multi-step; >2 tools expected; many files; conflicting sources.
## Work Packets (required format)
When delegating, send a single "work packet":
WORK PACKET
- Task: <short title>
- MC Stage: <current stage>
- Tier: <FAST|STANDARD|DEEP>
- Owner: <worker name>
- Goal (1 sentence):
- Definition of Done (bullets):
- Constraints / risk gates:
- Inputs (paths/links/context):
- Expected output artifacts (paths):
- Checkpoint cadence: (e.g. 10 min or milestone)
- If blocked, report using BLOCKED REPORT template.
## Worker Outputs (what you accept)
A worker must respond with one of:
- RESULT: deliverable + where it lives + caveats
- CLARIFY: questions that must be answered before proceeding
- BLOCKED REPORT: using the template below
## Blocked Handling (mandatory)
When a worker sends BLOCKED REPORT, you must:
1) Decide: ask user vs dispatch helper vs escalate tier vs change plan.
2) Reply with a single clear instruction or a new work packet.
3) Keep the original worker unblocked by giving:
- missing info, or
- a safe default to proceed with, or
- a new narrower subtask.
## BLOCKED REPORT template (workers must use)
- What I was trying to do:
- Where I got stuck:
- Block type: missing-info | decision-needed | permissions | risk-gate | contradiction | repeated-failure
- What I tried:
- What I need (one ask):
- Options (A/B/C) + recommendation:
- Can proceed with safe default?: yes/no (and what default)
## Mission Control visibility
At each meaningful transition, post a compact orchestrator update:
- Current stage
- Who is assigned
- What's next
- Any open questions
2) GENERIC WORKER prompt (paste into each worker: HR/Sales/Dev/PM/Infra/BizOps)
# Role: Worker Agent (Spoke)
You execute work packets from the Orchestrator. You do not coordinate with other workers directly unless Orchestrator explicitly requests it.
## Contract
- Follow the work packet's Goal + Definition of Done + Constraints.
- Keep outputs tightly scoped.
- If you will be silent >60s or hit uncertainty, report early.
## Allowed responses (must choose one)
1) RESULT
2) CLARIFY
3) BLOCKED REPORT (use template)
## RESULT format
RESULT
- Summary (bullets):
- Deliverables (paths/links):
- Notes / risks:
- Recommended next stage: <Testing|Review|Verification|Done>
- What I'd do next if asked:
## CLARIFY format
CLARIFY
- I need answers to proceed:
1) ...
2) ...
- If no answer, I recommend default: ...
## BLOCKED REPORT format
- What I was trying to do:
- Where I got stuck:
- Block type: missing-info | decision-needed | permissions | risk-gate | contradiction | repeated-failure
- What I tried:
- What I need (one ask):
- Options (A/B/C) + recommendation:
- Can proceed with safe default?: yes/no (and what default)
## "Blocked" detection rules (you must stop + report)
Stop and send CLARIFY/BLOCKED if any are true:
- Missing required info to meet DoD
- Multiple valid approaches need a decision
- Permissions/tool limitation prevents completion
- Risk gate triggered (security/production/destructive)
- Contradictory sources or version mismatch
- Repeated failure (same attempt fails twice)
3) Role add-ons (append to each worker prompt)
HR agent add-on
HR focus: hiring process, job descriptions, onboarding playbooks, performance frameworks, policies.
Hard rule: if legal/compliance implications, flag and ask for review (don't invent legal advice).
Sales agent add-on
Sales focus: messaging, objections, pricing packaging explanations (no discounts promised unless policy says so), outreach sequences.
Hard rule: if pricing/discount exceptions are requested, escalate to Orchestrator with options and policy references.
PM agent add-on
PM focus: clarify requirements, define acceptance criteria, identify risks, break work into milestones, propose MC stage transitions.
Hard rule: if requirements ambiguous, default to asking 3-5 crisp clarifying questions.
Business Operations add-on
BizOps focus: SOPs, metrics, dashboards definitions, recurring processes, vendor management, cost controls.
Hard rule: prefer lightweight, maintainable process changes; flag operational risk.
Infra agent add-on
Infra focus: OpenClaw reliability, deployments, auth, tokens, upgrades, backups, monitoring.
Hard rule: before any downtime/irreversible/security-impacting action, send BLOCKED REPORT (risk-gate) and request approval.
Also: never guess config keys/commands; require verified source (pinned docs/code).
Dev agent add-on
Dev focus: implementation, refactors, integrations, tests, bugfixes.
Hard rule: propose a test/verification plan with any change.
If code changes are substantial, request Orchestrator approval on approach before large edits.
4) Dev subroles: ACP sessions vs "real" OpenClaw agents (my recommendation)
To keep it simple and robust:
- Keep one Dev OpenClaw agent as the "engineering coordinator" worker.
- For actual coding runs (multi-file edits, tests, refactors), have Dev spawn an ACP coding session (Claude Code / Codex / etc.) and then report results back.
Use separate always-on OpenClaw agents only when the role is genuinely distinct and frequently active.
Suggested mapping:
- QA: usually best as an ACP session (run tests, write test plans, fix failing tests)
- Pentesting: only if it's your own systems + explicit permission; often better as a carefully constrained workflow (and sometimes not LLM-driven at all). If you do it, treat as DEEP + strict scope.
- Designer: only if you're generating assets; otherwise PM + Dev can handle UX copy/wireframe notes.
- DevOps: often overlaps with Infra. I'd keep "DevOps" as a capability of Infra/Dev, not a separate agent at first.
If you want, I can also give you a single Orchestrator "dispatch playbook": which worker to use at each MC stage (e.g. Planning→PM, In Progress→Dev/Infra, Testing→QA, Review→Reviewer-like pass, Verification→BizOps/Sales depending on deliverable).
That's a super common failure mode, and it's usually not "memory" in the human sense; it's just that the practices aren't reliably present in the model's context at the moment it generates the output, so it drifts.
The fix is: make practices enforceable, short, and injected at the right time, instead of hoping the agent remembers.
What works best (robust + simple)
1) Shrink "day-to-day practices" into a tiny Operating Contract
You want something like 10-25 lines max. If it's long, it won't be followed consistently.
Put it in one place (either the system prompt, or a single WORKFLOW.md), and don't duplicate it in 5 files.
2) Enforce it at the Orchestrator (don't rely on self-discipline)
Make the orchestrator act like a linter:
- If a worker reply is missing required sections (Result / Next stage / Artifacts / Questions), the orchestrator rejects it and requests a resubmission using the template.
- Over time, the workers "learn" the shape because non-compliant outputs don't get accepted.
This is the single highest-leverage change.
3) Inject the contract in every dispatch (only the short version)
Even if the worker "knows" it, include a short reminder in the work packet:
- "Follow OUTPUT FORMAT + BLOCKED RULES below"
That ensures it's always in-context without big token waste.
4) Use examples (few-shot beats rules)
Include one good example of a compliant RESULT and one BLOCKED REPORT in the worker prompt. Agents follow examples better than prose.
Pasteable "Operating Contract" (workers)
Add this at the end of every worker's system prompt:
## Operating Contract (non-negotiable)
- Always respond as one of: RESULT / CLARIFY / BLOCKED REPORT.
- RESULT must include: Summary, Deliverables, Risks/Notes, Recommended next MC stage.
- If you will be silent >60s or you hit ambiguity/risk/permissions issues, stop and send CLARIFY or BLOCKED REPORT.
- Never guess config keys/commands; if unsure, say unverified and request how to verify.
- Keep updates compact; do not dump long reasoning. Provide high-level rationale only.
And add these two examples right below it:
### Example RESULT
RESULT
- Summary: Implemented X; updated Y; added tests.
- Deliverables: path/to/file1, path/to/file2
- Notes/Risks: Needs user approval for prod rollout.
- Recommended next stage: Testing
### Example BLOCKED REPORT
- What I was trying to do: Rotate gateway token safely
- Where I got stuck: Need confirmation before downtime
- Block type: risk-gate
- What I tried: Prepared steps + rollback
- What I need (one ask): Approve a 2-3 min restart window
- Options: A) do now B) schedule later (recommended)
- Can proceed with safe default?: no
Pasteable enforcement rule (Orchestrator)
Add this to orchestrator prompt:
## Output enforcement
If a worker response does not match the required formats (RESULT/CLARIFY/BLOCKED REPORT) or is missing required fields:
- Reject it with: "Resubmit using the required template; missing: <fields>."
Do not proceed to next stage until compliance is met.
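If the orchestrator can run a script before accepting a reply, the enforcement rule above can be made mechanical. This is a rough sketch, assuming replies are plain text, the reply type is named on the first line (or, for blocked reports, identified by a "Block type:" line), and field names are matched by simple substring. All of that is my assumption, not OpenClaw behavior.

```python
from typing import Optional

# Required field labels per reply type, matched case-insensitively.
REQUIRED_FIELDS = {
    "RESULT": ["Summary", "Deliverables", "Notes", "Recommended next stage"],
    "CLARIFY": ["I need answers to proceed"],
    "BLOCKED REPORT": ["What I was trying to do", "Where I got stuck",
                       "Block type", "What I tried", "What I need"],
}


def detect_kind(reply: str) -> Optional[str]:
    """Identify the reply type from its first line or a 'Block type:' field."""
    lines = reply.strip().splitlines()
    first = lines[0].strip().upper() if lines else ""
    for kind in REQUIRED_FIELDS:
        if first.startswith(kind):
            return kind
    if "block type:" in reply.lower():
        return "BLOCKED REPORT"
    return None


def lint_reply(reply: str) -> Optional[str]:
    """Return None if compliant, else the rejection message to send back."""
    kind = detect_kind(reply)
    if kind is None:
        return ("Resubmit using the required template; "
                "missing: RESULT/CLARIFY/BLOCKED REPORT header.")
    lowered = reply.lower()
    missing = [f for f in REQUIRED_FIELDS[kind] if f.lower() not in lowered]
    if missing:
        return f"Resubmit using the required template; missing: {', '.join(missing)}."
    return None
```

Substring matching is crude (a field name that only appears in prose would pass), but it is cheap and catches the common case of a worker replying in free text.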
Why this stops "forgetting"
You're not asking the model to remember. You're turning practices into:
- a small contract
- a dispatch-time injection
- an acceptance gate
If you paste your current worker response template (what they usually send today), I can tighten it further so it's even more "sticky" while using fewer tokens.
Your agent's read is correct, and the two-tier system (tiny always-hot contract + longer reference manual) is the right move.
Two tweaks I'd make before you tell it "yes, implement":
- Don't make every agent read SHARED-RULES.md on session startup.
That's expensive and still unreliable. Instead:
- Only the Orchestrator needs to know the full manual well.
- Workers follow the Operating Contract always, and the orchestrator injects the relevant excerpt from the manual only when needed (e.g., "workspace isolation rules" for infra/devops work).
- Enforcement should be mechanical and fast
Have the orchestrator do a single-pass check:
- If missing RESULT/CLARIFY/BLOCKED → "Resubmit in template; missing: X"
- If still noncompliant → mark task "Blocked: noncompliant output" and re-dispatch / escalate tier
That prevents endless loops.
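The "one rejection, then stop" behavior is easy to get wrong in a prompt, so here is a minimal sketch of the control flow. `lint` and `ask_resubmit` are hypothetical callables you would supply: the first returns None for a compliant reply or a rejection message otherwise, the second sends that message to the worker and returns its retry.

```python
from typing import Callable, Optional, Tuple


def enforce_once(
    first_reply: str,
    lint: Callable[[str], Optional[str]],
    ask_resubmit: Callable[[str], str],
) -> Tuple[str, str]:
    """Single-pass gate: at most one rejection, then mark the task blocked.

    Returns (status, payload) where status is 'accepted' or 'blocked'.
    """
    problem = lint(first_reply)
    if problem is None:
        return ("accepted", first_reply)
    retry = ask_resubmit(problem)
    if lint(retry) is None:
        return ("accepted", retry)
    # Second failure: stop here so the orchestrator can re-dispatch or escalate.
    return ("blocked", "Blocked: noncompliant output")
```

There is no loop in the function at all, which is the point: the worst case is exactly two worker turns before the task changes hands.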
What to tell your agent to do (acceptance criteria)
Reply to your agent with something like:
"Yes, build it. Deliver these items:"
- A ~15-25 line Operating Contract (the "always injected" layer)
- A refactor plan for SHARED-RULES.md into a reference manual (can stay long, but must be well-sectioned with headings + TOC)
- Update all worker prompts to include:
- the Operating Contract
- the two examples (RESULT + BLOCKED REPORT)
- Update Orchestrator:
- enforce the output gate (reject noncompliant responses)
- add the one-line injection in every work packet: "Follow Operating Contract…"
- Run a trial: dispatch 2-3 tasks and confirm compliance without extra coaching
Optional (but high ROI):
- add a RULES_VERSION: YYYY-MM-DD line so you can see which contract version is in play.
If you paste (or summarize) the top 10 rules currently inside SHARED-RULES.md, I can help you choose the best 5 that belong in the 20-line contract (the rest should stay in the manual).
Overall: this is a very solid direction. You've basically converged on the pattern that scales: tiny always-hot contract + orchestrator enforcement + everything else as reference.
Where I'd tweak it (to keep it robust and not accidentally token-hungry / brittle):
What looks excellent
- OPERATING-CONTRACT.md as the always-hot layer + versioning + two few-shot examples: perfect.
- Orchestrator enforcement gate (reject + resubmit): this is the "behavioral memory" hack that actually works.
- SHARED-RULES refactored into reference-only + TOC + "inject relevant sections": correct. Workers shouldn't pay the cost of reading a 300-line manual on every run.
- Local docs library (pinned + index + links + changelog): great for infra accuracy and speed.
Cautions / improvements I'd make
1) Don't hardcode specific models in prompts (keep tiers abstract)
Your tier → model mapping is good, but keep it out of agent prompts if possible and store it in one config place. Prompts should talk in: FAST / STANDARD / DEEP only.
Reason: you'll change providers/models over time, and you don't want to touch 9 prompts every time. Also reduces "model-name cargo culting".
(And for model choices themselves, #models is the right place to keep current.)
2) Auto-injection / symlinks: great, but build a fallback path
If your "auto-inject contract into context" mechanism ever fails silently, you don't want behavior to collapse.
Fallback that costs almost nothing:
- The orchestrator work packet should always include a 1-line reminder:
"Follow OPERATING-CONTRACT vYYYY-MM-DD (RESULT/CLARIFY/BLOCKED)."
That way the contract is still "nearby in context" even if injection breaks.
3) MC-VISIBILITY: your checkpoint requirements might be too heavy
"3 mandatory MC actions per checkpoint" (patch description + jsonl log + post interim deliverable) can become:
- lots of tool calls
- lots of text churn
- more places to fail
I'd soften it:
- Always: post a compact checkpoint message (the human feed)
- Sometimes: patch the task description (only when the plan/DoD changes)
- Always or sometimes: append a log line (jsonl is fine, but keep it 1 line)
Also: "no runs/ folder" is fine if MC + logs are truly your durable audit trail. But you'll likely still want a lightweight "ad-hoc run log" path for non-MC prompts (otherwise those sessions become untraceable).
4) SHARED-RULES injection will be easier if you split it into modules
Instead of injecting "sections" from one big file, consider splitting into small files:
- rules/workspace-isolation.md
- rules/done-definition.md
- rules/mc-posting.md
Then the orchestrator can attach/inject exactly one file when needed. That costs far fewer tokens and carries much less "wrong section" risk.
5) Add one more enforcement: âevidence-gated DONEâ
You already mention it, but make it explicit in the orchestrator gate:
- A worker cannot claim âDoneâ unless it includes evidence (tests run, files changed, links, screenshots, etc.) appropriate to the task.
This prevents the most expensive failure mode: false completion.
Net: will this waste tokens?
If you keep the contract tiny + enforce via rejection + keep MC checkpoints compact, this reduces total token burn in most real setups, because it prevents:
- repeated rework
- long clarifying back-and-forth
- invisible failures and late-stage surprises
If you paste the draft OPERATING-CONTRACT.md, I can help you shave it to the "minimum effective contract" (the smallest set of rules that still yields consistent behavior).
Your agent's follow-up is basically spot on. The only thing I'd push you to do now (not "later") is closing that gap in #5: evidence-gated DONE in the orchestrator gate, because that's the difference between "nice policy" and "actual system behavior".
The key fix: make Evidence a first-class field
1) Update the Worker RESULT template (one line)
Require an Evidence: line in every RESULT:
- Code/infra tasks: file paths + commands run + outputs/log lines (or MC deliverable link)
- Writing tasks: doc path + section anchors / excerpt
- Research tasks: URLs + quoted snippets
Example:
RESULT
- Summary:
- Deliverables:
- Evidence: (must not be empty)
- Notes/Risks:
- Recommended next stage:
2) Update orchestrator enforcement (2 checks, in order)
When a worker responds with RESULT:
Gate A (Format): must match RESULT/CLARIFY/BLOCKED + required fields present
Gate B (Evidence): must include non-empty Evidence: that matches what was requested in the work packet
If Gate B fails → orchestrator replies:
- "Resubmit RESULT with Evidence (missing: …). You may not mark done without proof."
3) Put "required evidence" in the Work Packet (this makes it robust)
In each work packet, add:
- Required Evidence: (choose 1-3 items)
Examples:
- Dev task: "paths changed + tests run output"
- Infra task: "config diff + service status output"
- Sales collateral: "doc path + final copy pasted or linked"
- PM: "acceptance criteria doc path + checklist"
Now the evidence gate isn't generic; it's tailored per task.
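If your orchestrator enforces this in code rather than purely in a prompt, the two gates could look roughly like the Python sketch below. `WorkerReply`, `check_reply`, and the field names are hypothetical, and the keyword match against `required_evidence` is a deliberately crude stand-in for whatever "matches what was requested" means in your setup:

```python
from dataclasses import dataclass, field

VALID_KINDS = {"RESULT", "CLARIFY", "BLOCKED"}
REQUIRED_RESULT_FIELDS = ("Summary", "Deliverables", "Evidence")


@dataclass
class WorkerReply:
    kind: str                                   # RESULT / CLARIFY / BLOCKED
    fields: dict = field(default_factory=dict)  # e.g. {"Evidence": "..."}


def check_reply(reply: WorkerReply, required_evidence: list) -> tuple:
    """Return (accepted, message). Gate A checks format, Gate B checks evidence."""
    # Gate A: reply type and required fields.
    if reply.kind not in VALID_KINDS:
        return False, "Resubmit as RESULT / CLARIFY / BLOCKED."
    if reply.kind != "RESULT":
        return True, "ok"  # the evidence gate only applies to RESULT
    absent = [f for f in REQUIRED_RESULT_FIELDS if f not in reply.fields]
    if absent:
        return False, "Resubmit RESULT with fields: " + ", ".join(absent)

    # Gate B: Evidence must be non-empty and mention each required item.
    # Assumption: a naive case-insensitive keyword check is good enough here.
    evidence = reply.fields["Evidence"].strip().lower()
    missing = [item for item in required_evidence if item.lower() not in evidence]
    if not evidence or missing:
        what = ", ".join(missing) if missing else "any evidence"
        return False, (
            f"Resubmit RESULT with Evidence (missing: {what}). "
            "You may not mark done without proof."
        )
    return True, "ok"
```

The `required_evidence` list would come straight from the work packet's Required Evidence line, which is what keeps the gate task-specific.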
On the SHARED-RULES split decision
I agree with your agent: you don't need to explode into 6+ files yet. A good compromise that keeps it simple:
- Keep one SHARED-RULES.md, but ensure each section is short and injectable (10-30 lines).
- Orchestrator injects only that section (never the whole doc).
That's 90% of the benefit with near-zero filesystem complexity.
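As a rough illustration of "inject only that section", here is a small Python sketch. It assumes each injectable section in SHARED-RULES.md starts with a `## ` heading; `extract_section` is a hypothetical helper, not an OpenClaw feature:

```python
def extract_section(markdown_text: str, heading: str) -> str:
    """Return one '## <heading>' section (heading line included), or '' if absent.

    Assumption: sections are delimited by level-2 ('## ') headings;
    deeper headings ('### ...') stay inside their parent section.
    """
    wanted = heading.strip().lower()
    captured = []
    capturing = False
    for line in markdown_text.splitlines():
        if line.startswith("## "):
            if capturing:
                break  # the next level-2 heading ends the section
            capturing = line[3:].strip().lower() == wanted
        if capturing:
            captured.append(line)
    return "\n".join(captured).strip()
```

An empty return value is a useful signal: the orchestrator can refuse to dispatch rather than silently inject nothing.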
MC-VISIBILITY softening
The softened rules your agent proposed are the right balance. The "always patch task description" requirement is the one that usually causes churn; making it "only on plan/DoD change" keeps MC clean.
If you want, paste your current worker RESULT template and your orchestrator enforcement snippet, and I'll rewrite them so Evidence-gated DONE is enforced without adding much token overhead.
What you're seeing is expected with how OpenClaw works today:
- When the Gateway restarts, the process that would "ping back" is gone.
- Anything that relies on a live session continuing (or a sub-agent "announce back") is best-effort and can be lost on restart. The docs explicitly call this out: sub-agent announce is best-effort and can be lost if the gateway restarts (see docs/tools/subagents.md).
- So the agent saying "I'll ping back once it's up" is only true if it schedules something persisted before the restart.
The robust fix: schedule a one-shot Cron "I'm back" ping before restarting
Cron jobs are persisted to disk and survive restarts (see docs/automation/cron-jobs.md), which makes them the right primitive for "notify after restart".
Example (Discord channel notify)
Before you restart/update, run something like:
openclaw cron add \
--name "Post-restart ping" \
--at "$(date -u -d '+2 minutes' +%Y-%m-%dT%H:%M:%SZ)" \
--session isolated \
--message "Gateway should be back up now. If you see this, cron+Discord delivery is working." \
--announce \
--channel discord \
--to "channel:YOUR_CHANNEL_ID_HERE" \
--delete-after-run
Notes:
- Use a small delay (e.g. 2-3 minutes) so the Discord connection is definitely back up.
- If you're restarting as part of an update, schedule the cron first, then update/restart.
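One portability caveat on the --at value: the -d '+2 minutes' form is a GNU date extension and is not accepted by the BSD date that ships with macOS. If you need the same UTC timestamp on both, a small Python helper is one option. This is only a sketch; `at_timestamp` is a hypothetical name:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def at_timestamp(delay_minutes: int = 2, now: Optional[datetime] = None) -> str:
    """Return a UTC timestamp `delay_minutes` from now, formatted like
    the date command above (YYYY-MM-DDTHH:MM:SSZ)."""
    base = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return (base + timedelta(minutes=delay_minutes)).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    print(at_timestamp())
```

Saved as a script, its output can be substituted for the $(date …) expression in the command above.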
Debug checklist if it still doesnât fire
From the FAQ (docs/help/faq.md → "Cron or reminders do not fire"):
- Confirm cron is enabled (cron.enabled) and OPENCLAW_SKIP_CRON is not set.
- Check runs:
openclaw cron list
openclaw cron runs --id <jobId> --limit 50
openclaw cron run <jobId> --force
What to change in your Infra Agent protocol
Add one non-negotiable rule:
"If you are about to restart/update the gateway, you MUST schedule a one-shot isolated cron announce to the target channel first. Then proceed with restart. If you didn't schedule it, say explicitly that no automatic ping will occur."
That directly addresses the line your agent printed:
"I did not schedule a reminder in this turn…"
If you tell me whether your gateway runs under systemd/macOS service + whether you already have cron enabled, I can tailor the exact commands/wording for your infra agent so it does this automatically every time.
Note: I did not schedule a reminder in this turn, so this will not trigger automatically.
Yeah, a Gateway restart kills everything that's "in-process" inside that Gateway:
- Any currently-running turn gets cut off.
- Any sub-agent runs are in-process too, and the docs explicitly warn that sub-agent "announce back" is best-effort and can be lost on restart (docs/tools/subagents.md).
- After restart, you can start new turns again, but the "I'll ping you when I'm back" only happens if something durable (cron/heartbeat/external monitor) triggers it.
So your intuition is right: if you want robustness across restarts, you need a durable scheduler loop, not "the worker will remember to message later".
Should you create a heartbeat that delegates and continues work?
Conceptually: yes, but I'd be picky about which mechanism you use.
Heartbeat is good for: "periodic awareness"
Heartbeat runs periodic turns in the main session and is great for batched checks (inbox/calendar/etc.). Docs: docs/gateway/heartbeat.md.
But using heartbeat as a queue dispatcher can be a bit awkward because:
- it's designed to be quiet (HEARTBEAT_OK suppression)
- it shares the main session context/history (can get noisy if you overuse it)
- it's not "precise scheduling"
Cron is better for: "dispatcher / watchdog loop"
If your goal is "every N minutes, check Mission Control for tasks and dispatch if an agent is free", that's much closer to an isolated cron job (deterministic, persistent across restarts, doesn't pollute your main session). Docs: docs/automation/cron-jobs.md + docs/automation/cron-vs-heartbeat.md.
Recommendation:
- Use Cron (isolated) for the dispatcher/watchdog loop.
- Use Heartbeat for "human check-ins" and general awareness.
A robust, simple architecture for MC + OpenClaw restarts
1) Make Mission Control the durable task source of truth
Treat MC board state as the canonical state machine:
Planning → Inbox → Assigned → In Progress → Testing → Review → Verification → Done
2) Add a Cron "Orchestrator Tick" (dispatcher loop)
Every 1-5 minutes (whatever cost you can tolerate), the tick does:
- Find dispatchable tasks
  - tasks in Inbox (or "Assigned but not started")
- Check capacity
  - which workers are "free" (or which ones haven't updated in X minutes)
- Dispatch
  - create work packets and assign to the right worker tier
- Detect orphaned work
  - tasks stuck in In Progress with no recent checkpoint → mark Blocked or re-dispatch from last checkpoint
- Announce only when needed
  - don't spam; only message your #alerts when:
    - a task becomes blocked
    - a task completes
    - the system had to reassign due to restart
This gives you exactly what you described: after a restart, the next cron tick "re-hydrates" activity from MC and continues.
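The selection logic of such a tick is small enough to sketch in Python. Everything here is hypothetical scaffolding (the `Task` shape, `plan_tick`, and the default thresholds are assumptions); how tasks are actually loaded from Mission Control depends on your integration:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Task:
    id: str
    stage: str                # e.g. "Inbox", "In Progress"
    assignee: Optional[str]
    last_update: datetime     # last checkpoint / update time (UTC)


def plan_tick(tasks, now, stale_minutes=30, max_dispatch=2):
    """Return (action, task_id) pairs for one tick.

    Stalled work is handled first, then at most `max_dispatch`
    unassigned Inbox tasks are dispatched.
    """
    cutoff = now - timedelta(minutes=stale_minutes)
    actions = []
    for t in tasks:
        # Orphaned: In Progress with no recent checkpoint.
        if t.stage == "In Progress" and t.last_update < cutoff:
            actions.append(("redispatch", t.id))
    dispatched = 0
    for t in tasks:
        if t.stage == "Inbox" and t.assignee is None and dispatched < max_dispatch:
            actions.append(("dispatch", t.id))
            dispatched += 1
    return actions
```

An empty result means the tick is a no-op and nothing should be announced.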
3) Enforce checkpointing so re-dispatch is possible
Your Operating Contract + evidence gating becomes the "resume substrate":
- workers must post interim deliverables (or at least write artifacts)
- orchestrator can re-dispatch using last known artifacts even if the prior worker died
One subtle but important point
If your dispatcher tick runs inside the same Gateway, then while the Gateway is down, nothing runs. That's fine: the value is that as soon as it's back, the next tick resumes dispatching without you having to poke it.
If you want true "I'm down/I'm up" notifications even when the Gateway is dead, that requires an external uptime monitor. But for "continue tasks after restart", cron-inside-gateway is usually enough.
If you confirm one detail, I can suggest the cleanest wiring:
- Do your MC tasks live in Mission Control's DB (and is the orchestrator able to query MC via API), or are you "using MC stages" only as a convention in chat/workspace?
Here's a paste-ready "Orchestrator Tick / Dispatcher Loop" prompt you can use for a scheduled Cron (isolated) run.
Important operational note first: for a dispatcher that runs every 1-5 minutes, you typically want the cron job's delivery mode = none (so it doesn't spam). Then the dispatcher only sends a message when it actually dispatches/reassigns/blocks something.
Prompt: "MC Dispatcher Tick" (paste into your scheduled job)
# Mission Control Dispatcher Tick (Orchestrator Loop)
You are running as a scheduled dispatcher. Your job is to keep Mission Control moving:
- pick up new tasks,
- assign them to appropriate agents (FAST/STANDARD/DEEP tier),
- detect orphaned/stalled tasks (esp. after gateway restarts),
- and notify only when action is required.
## Hard constraints
- Be token-efficient: do not do deep reasoning. Do not rewrite tasks. Do not add fluff.
- Never spam: only send notifications when you actually (a) dispatch, (b) reassign, (c) mark blocked, or (d) detect something broken.
- Never contact other workers directly; dispatch through the Orchestrator routing rules and work packets.
## Inputs / environment assumptions
- Mission Control is the source of truth for task stage and assignment.
- You can read MC tasks via whatever integration exists in this deployment (API or stored task list).
If you cannot access MC tasks, STOP and send a single BLOCKED report explaining what's missing (endpoint/credentials/path).
## Stage model
Planning → Inbox → Assigned → In Progress → Testing → Review → Verification → Done
## Dispatcher policy (one pass per tick)
### Step 0: Load state
Load:
- list of MC tasks (id, title, stage, assignee if any, lastUpdate timestamp if available)
- agent availability (best effort; if unknown, assume 1 task per agent and avoid over-dispatching)
### Step 1: Identify candidates
A) Dispatchable:
- stage == Inbox (or Assigned but not started) AND no assignee
B) Orphaned/stalled:
- stage == In Progress (or Testing/Review) AND lastUpdate older than STALE_MINUTES
- OR task assigned to an agent that is not responding / appears offline
Set STALE_MINUTES default to 30 unless you have a better local standard.
### Step 2: Choose actions (minimal, safe)
Do actions in this priority order:
1) Handle orphaned/stalled tasks (prevent deadlocks)
2) Dispatch new Inbox tasks (one or two per tick max to avoid floods)
### Step 3: Routing (FAST/STANDARD/DEEP)
For each task you take action on:
- Choose tier:
  - DEEP if infra/security/compliance/pricing OR ambiguous OR multi-step/tool-heavy
  - FAST only if clearly trivial
  - otherwise STANDARD
- Choose role agent:
  - PM: clarify requirements / acceptance criteria / plan
  - Dev: implementation
  - Infra: gateway/config/deploy
  - Sales/HR/BizOps: domain tasks
- Prefer not to assign multiple agents unless necessary.
### Step 4: Dispatch format (Work Packet)
When you assign/reassign a task, attach/send this work packet to the chosen agent:
WORK PACKET
- Task ID:
- Task title:
- Current MC stage:
- Tier: FAST|STANDARD|DEEP
- Goal (1 sentence):
- Definition of Done (bullets):
- Required Evidence (1-3 bullets):
- Constraints / risk gates:
- Inputs (paths/links/context):
- Checkpoint cadence: (e.g. 10 min or milestone)
- Contract reminder: Respond ONLY as RESULT / CLARIFY / BLOCKED REPORT.
### Step 5: Notifications (only on action)
If you took any action, send a single compact notification to the alerts channel (or MC activity feed):
- what tasks you dispatched/reassigned/blocked
- who they went to
- any CLARIFY questions you need from the user
If you took NO action, send nothing.
## Output contract for this tick
Return a short internal summary for logs:
- actions taken (or "no-op")
- tasks touched
- any detected systemic issue
Two optional refinements (worth doing)
- Stall detection based on "last checkpoint message" (not just timestamps)
- "One action per tick" throttle at first (prevents dispatch storms after restart)
If you tell me how your dispatcher can access Mission Control tasks (API endpoint vs file export vs "MC is just a convention"), I can tailor the prompt to the exact data source + give you a clean "BLOCKED" message for the missing integration.
You prevent this by treating infra changes like production deploys, not like "LLM edits a config file and vibes".
The winning combo is:
- reduce blast radius (limit what the infra agent is allowed to do)
- two-phase commit (plan → approve → execute)
- preflight validation + rollback (so a bad change can't brick the gateway)
Here's a practical setup that works well with OpenClaw:
1) Make the infra agent "read-only by default"
Default behavior:
- inspect, propose, prepare diffs/commands
- do not restart, update, rotate tokens, or rewrite config without explicit approval
This alone eliminates most "it broke itself" incidents.
Enforcement: Orchestrator should reject any infra action that isn't preceded by an approval request.
2) Adopt a Safe-Change Runbook (every time)
Have your infra agent follow this exact sequence:
A) Snapshot / backup first (break-glass)
- Save a timestamped copy of the active config (and any other critical state you rely on).
B) Preflight validate before restart
Use the CLI validator so you catch schema/key errors before the gateway goes down:
openclaw config validate (docs: docs/cli/config.md)
If validate fails: STOP, do not restart.
C) Schedule a post-restart ping (durable)
Before restart/update, schedule a one-shot cron announce to your alerts channel. Cron persists across restarts (docs: docs/automation/cron-jobs.md), unlike an in-flight "I'll ping back".
D) Execute the change
Only after approval.
E) Verify health
Immediately check:
- gateway is running
- channel delivery works
- mission control reconnects (if used)
F) Auto-rollback if verification fails
If gateway fails to come up or delivery is broken, revert to the backup config and restart.
This turns "oops" into a 2-minute rollback instead of a manual rescue.
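The backup → validate → restart → verify → rollback sequence can be sketched as a small Python wrapper. This is only an illustration under assumptions: `safe_change` and its callables (`apply_change`, `validate`, `restart`, `healthy`) are hypothetical hooks you would wire to your own commands, and the approval step and the post-restart cron ping are left out:

```python
import shutil
import time
from pathlib import Path
from typing import Callable


def safe_change(
    config_path: Path,
    apply_change: Callable[[Path], None],  # edits the config in place (post-approval)
    validate: Callable[[], bool],          # e.g. wraps the CLI config validator
    restart: Callable[[], None],           # restarts the gateway
    healthy: Callable[[], bool],           # gateway up + delivery works
) -> str:
    # A) Snapshot the active config with a timestamped name.
    stamp = time.strftime("%Y%m%dT%H%M%S")
    backup = config_path.with_name(f"{config_path.name}.{stamp}.bak")
    shutil.copy2(config_path, backup)

    # Apply the edit, then B) validate BEFORE any restart.
    apply_change(config_path)
    if not validate():
        shutil.copy2(backup, config_path)
        return "aborted: validation failed, config restored, no restart"

    # D) Execute, E) verify.
    restart()
    if healthy():
        return "ok"

    # F) Roll back to the snapshot and restart again.
    shutil.copy2(backup, config_path)
    restart()
    return "rolled back"
```

Passing the risky steps in as callables keeps the sequencing testable without touching a real gateway.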
3) Restrict tooling per agent (blast-radius control)
OpenClaw supports per-agent tool allow/deny policies (docs mention tools.allow/tools.deny and per-agent equivalents in the configuration reference). The practical pattern:
- Infra Planner agent: can read logs/docs/status; no "dangerous" tools.
- Infra Executor agent: can run the minimal set of tools needed, but still uses approval gates.
Even if you don't implement strict tool policies yet, logically splitting "planner vs executor" is a huge reliability win.
4) Don't let the infra agent edit its own instructions
Most self-bricking happens when an agent rewrites:
- its own contract / rules
- orchestrator routing rules
- auth/delivery settings
Make those files human-owned (or at least "proposal only"): the agent can draft a patch, but cannot apply it without approval.
5) Make "brick-proof" the default with one command: validate
If you do only one thing: enforce that every config change must pass:
openclaw config validate (from docs/cli/config.md)
before any restart/update. That catches the majority of "gateway won't start" incidents.
If you want, I can write you a pasteable Infra Agent "Safe Change Protocol" prompt (plan/approve/backup/validate/restart/verify/rollback) that fits your Operating Contract style and works nicely with Mission Control.
Two different levers here:
- Planning-before-action (behavioral): works with any model
- "Thinking level" (runtime/model feature): only works if the underlying model/provider supports it
You usually want both: plan first, and only turn on high thinking for DEEP-tier work.
1) Force planning before proceeding (most reliable)
In your Orchestrator prompt, add a hard gate:
- For any non-trivial task, the orchestrator must produce:
- clarifying questions (if needed)
- a short plan + Definition of Done + risks
- then wait for explicit "APPROVE" before dispatching workers / running tools
This prevents "agent just starts doing stuff" and is independent of model quirks.
Mission Control already has a Planning Q&A flow; the key is: don't let tasks move to In Progress until the plan is approved.
2) Enable "deep thinking" (OpenClaw-specific knobs)
OpenClaw supports a thinking level concept you can set per run. You'll see it in:
- sub-agent spawns: /subagents spawn … --thinking <level> (docs: docs/tools/subagents.md)
- cron jobs: openclaw cron add … --thinking high (docs: docs/automation/cron-jobs.md)
- agent sends: openclaw agent … --thinking medium/high (docs: docs/tools/agent-send.md)
- defaults: agents.defaults.thinkingDefault in config (docs: docs/gateway/configuration-reference.md)
Important caveat: not all models honor thinking levels. Some providers ignore it or implement their own "adaptive thinking". So treat thinking as "best effort", and rely on the planning gate for correctness.
Recommended pattern (simple + cost-controlled)
- Keep your global/default at low (agents.defaults.thinkingDefault: "low")
- For DEEP-tier tasks only, override thinking to high on the dispatched run (cron/subagent/agent-send).
Copy/paste snippet for your Orchestrator (planning gate + thinking escalation)
## Planning Gate (must follow)
For any task that is multi-step, tool-heavy, or high-stakes:
1) Ask clarifying questions if needed.
2) Propose a plan (3-7 steps) + Definition of Done + risks.
3) STOP and wait for the user to reply "APPROVE".
Only after APPROVE may you dispatch workers or run tools.
## Thinking escalation rule
- Default thinking: low.
- If Tier=DEEP, run the delegated worker turn with thinking=high (if supported).
- If thinking override is unsupported by the current model, proceed with the same plan/verification rigor anyway and explicitly note: "thinking override not supported; using process gate instead".
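If the escalation rule ends up in dispatcher code instead of a prompt, it reduces to a tiny lookup. A Python sketch, where `thinking_for` and the `supports_override` flag are hypothetical (how you detect model support is up to your setup):

```python
def thinking_for(tier: str, supports_override: bool = True):
    """Return (thinking_level, note) for a dispatched run.

    Default is "low"; only DEEP escalates to "high". If the model
    cannot honor an override, return None plus the note to log.
    """
    level = "high" if tier.strip().upper() == "DEEP" else "low"
    if level == "high" and not supports_override:
        return None, "thinking override not supported; using process gate instead"
    return level, ""
```

A `None` level means: skip the thinking override entirely and rely on the planning gate.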
If you tell me how you dispatch workers today (subagents vs Mission Control assignment vs cron isolated jobs), I can give you the exact place to apply the --thinking high override so it only hits DEEP-tier runs and doesn't blow up your token bill.
How do you turn on high thinking?