Krill, review this OpenClaw recovery plan as an architecture/stability audit, not a vibes check.
Goal: make OpenClaw reliable enough for client website builds, infra, Google Ads/analytics, and daily operating support without dead
jobs, stale context, or fake completion claims.
Current diagnosis: the gateway is healthy, but the operating layer is unreliable. The main failure areas are ACP/Codex disconnects and timeouts,
broad cron jobs whose runs can outlast their schedule interval, overloaded or truncated context, active-memory timeouts, noisy session cleanup, and
missing recent transcripts.
Target architecture: Morgan orchestrates, domain agents execute, live tools verify reality, Supabase records durable state,
artifacts prove delivery, dashboards summarize, and a nightly brain finds leverage.
Non-negotiables: Supabase is the canonical ledger; markdown is only for plans and runbooks; no “done/sent/deployed/fixed” claim unless a source row plus a
receipt, artifact, or live readback supports it; ACP completion text is not proof; risky actions need approval.
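The evidence rule above can be made mechanical. Here is a minimal Python sketch, assuming a hypothetical `TaskRow` shape for a ledger row and hypothetical evidence fields (`receipt_id`, `artifact_urls`, `live_readback_ok`). It is not an OpenClaw or Supabase API, just an illustration of "no guarded claim without a row plus evidence".

```python
from dataclasses import dataclass, field
from typing import Optional

# Claims that must be backed by evidence before anyone reports them (from the non-negotiables).
GUARDED_CLAIMS = {"done", "sent", "deployed", "fixed"}


@dataclass
class TaskRow:
    """Hypothetical shape of a ledger row; field names are illustrative only."""
    task_id: str
    status: str
    receipt_id: Optional[str] = None                        # e.g. an email/API send receipt
    artifact_urls: list[str] = field(default_factory=list)  # e.g. deploy URLs, exported files
    live_readback_ok: bool = False                          # a fresh read of the real system confirmed it


def may_claim(claim: str, row: Optional[TaskRow]) -> tuple[bool, str]:
    """Return (allowed, reason).

    Agent or ACP completion text is deliberately not an input: only the
    ledger row and its attached evidence can unlock a guarded claim.
    """
    if claim not in GUARDED_CLAIMS:
        return True, "claim not guarded"
    if row is None:
        return False, "no source row in ledger"
    has_evidence = bool(row.receipt_id) or bool(row.artifact_urls) or row.live_readback_ok
    if not has_evidence:
        return False, "source row has no receipt, artifact, or live readback"
    return True, "evidence present"
```

For example, `may_claim("deployed", TaskRow("t1", "deployed"))` is refused because the row exists but carries no receipt, artifact, or readback.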
Priority plan:
- Back up config, tasks DB, cron, and sessions before repair.
- Stabilize chat/context: shrink boot files, lazy-load protocols, reduce context stuffing, tune or disable active-memory while
debugging, and restore useful normal chat.
- Split deterministic jobs from agent review work; add leases, receipts, blocker output, and a bounded Command Center runner.
- Put ACP behind a stability gate with preflight and artifact receipts.
- Repair session store/debuggability.
- Rebuild Morgan/Kit/Mika/Rex operating model.
- Productize websites, Ads, and life ops with dashboards and approval gates.
Please identify: wrong sequencing, missing OpenClaw-native primitives, overcomplicated parts, unsafe autonomy, missing acceptance
gates, evals we need before implementation, and the smallest robust path to make this boring and dependable.