# I built a live conflict intelligence system in 6 days with OpenClaw. Here's the full architecture


unkempt sinew Mar 4, 2026, 7:08 PM

Six days ago I set up OpenClaw because I wanted to control my lights and check the weather.

That is not what happened.

Within the first 24 hours the agent had integrated Signal, set up a shared channel with my wife, wired into my calendar, connected to my Sonos speakers, indexed my Google Drive, and started reading my email. I hadn't asked for most of that. I had asked for a morning briefing and it just... kept going. Each capability unlocked the next obvious one. By the end of day one it felt less like setting up a smart home assistant and more like onboarding a very motivated new employee who kept finding more things to fix.

By day two we were scraping Twitter.

By day six we had a live conflict intelligence system monitoring an active military conflict, with a self-healing source network, adversarial AI deliberation, vision-based evidence analysis, a $156 billion war cost ledger, and a full dashboarding and notification stack running as persistent daemons on a Mac mini in my home office.

Here's everything that's running right now.

Why this exists

The Middle East is in active conflict. There's a lot of noise, a lot of adversarial information operations, and a lot of accounts on X that range from authoritative primary sources to pure propaganda. I wanted a system that could tell the difference, track what's actually confirmed, and surface the signal without me having to read 400 tweets a day. I also wanted it to tell me when I actually needed to pay attention versus when things were quiet.

Six days later, that system exists.

How the Twitter pipeline actually works

The naive approach is to watch 90 accounts. That's not what we do.

The pipeline starts with a tiered source registry. Every account in the database has a reputation score: tier 0 is staging (unproven, auto-ingested but never auto-inserted into confirmed facts), tier 1 is monitored, tier 2 is trusted, tier 3 is authoritative. Tier assignment is based on track record: how many of their claims corroborate against other sources, what their average confidence score looks like, how often they post original reporting versus relay content.
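
A minimal sketch of how a registry like this could be modeled. The tier numbers and their meanings come from the description above; the field names, the `Source` class, and the `can_auto_insert` helper are assumptions for illustration, not the actual schema.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    STAGING = 0        # unproven: ingested, never auto-inserted into confirmed facts
    MONITORED = 1
    TRUSTED = 2
    AUTHORITATIVE = 3


@dataclass
class Source:
    # All field names here are hypothetical.
    handle: str
    tier: Tier
    claims_total: int = 0
    claims_corroborated: int = 0
    avg_confidence: float = 0.0
    original_posts: int = 0
    relay_posts: int = 0

    @property
    def corroboration_rate(self) -> float:
        """Share of this source's claims that other sources corroborated."""
        if self.claims_total == 0:
            return 0.0
        return self.claims_corroborated / self.claims_total

    @property
    def original_ratio(self) -> float:
        """Share of posts that are original reporting rather than relays."""
        total = self.original_posts + self.relay_posts
        return self.original_posts / total if total else 0.0


def can_auto_insert(source: Source) -> bool:
    """Tier 0 sources are never auto-inserted into confirmed facts."""
    return source.tier > Tier.STAGING
```

Using an `IntEnum` keeps tier comparisons (`tier > Tier.STAGING`) readable while still storing plain integers in the database.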

Beyond the direct watch list, we track retweets. When a tier-2 or tier-3 account retweets something from outside the watch list, that tweet gets pulled in and its author gets evaluated. If they show up enough times, if their content keeps corroborating, they get automatically staged and eventually promoted. The watch list grows itself.
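
The retweet-driven growth described above can be sketched roughly as follows. The sighting threshold and the class and method names are assumptions, and this version only counts sightings; the corroboration check that gates real promotion is left out.

```python
from collections import Counter

STAGE_AFTER_SIGHTINGS = 3   # assumed threshold, not stated in the post
TRUSTED_TIERS = {2, 3}      # only tier-2 and tier-3 retweets count


class RetweetDiscovery:
    """Hypothetical helper that stages authors retweeted by trusted accounts."""

    def __init__(self, watch_list):
        self.watch_list = dict(watch_list)   # handle -> tier
        self.sightings = Counter()           # handle -> trusted retweet count

    def observe(self, retweeter, original_author):
        """Record a retweet; return True if the author was just staged at tier 0."""
        if self.watch_list.get(retweeter) not in TRUSTED_TIERS:
            return False   # retweets from lower tiers are ignored
        if original_author in self.watch_list:
            return False   # already tracked, nothing to discover
        self.sightings[original_author] += 1
        if self.sightings[original_author] >= STAGE_AFTER_SIGHTINGS:
            self.watch_list[original_author] = 0   # stage, do not promote
            return True
        return False
```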

We also maintain dynamic Twitter lists via the X API. Critical tracker accounts get their own private list so the pipeline can pull them in a single API call rather than 25 individual lookups. When a new account gets promoted to a certain tier, it gets added to the list automatically. When an account goes quiet for 60 days or starts posting content that no longer corroborates, it gets flagged for demotion.

Account health runs on a trigger basis, not a polling basis. When the API returns a 403 or 404 for a known account, the drift checker fires. When a tier-1 account suspension is detected, it goes straight to the council and a human ping. Nightly, a promotion candidates function surfaces any tier-0 accounts with enough corroborated facts and sufficient average score to warrant consideration. No automatic promotion for tier-2 and above -- that's a human decision.
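
As a rough illustration of the two mechanisms above (the status-code trigger and the nightly promotion-candidates pass), here is a sketch. The cutoffs, the `Account` fields, and both function names are assumptions; the post does not give the actual numbers.

```python
from dataclasses import dataclass

MIN_CORROBORATED_FACTS = 5   # assumed cutoff, not stated in the post
MIN_AVG_SCORE = 70.0         # assumed cutoff, not stated in the post


@dataclass
class Account:
    handle: str
    tier: int
    corroborated_facts: int
    avg_score: float


def on_lookup_status(status_code):
    """Trigger-based health check: only a 403 or 404 fires the drift checker."""
    return "drift_check" if status_code in (403, 404) else None


def promotion_candidates(accounts):
    """Return tier-0 accounts worth surfacing for review, best first.

    This only surfaces candidates. Nothing is promoted here, since
    promotion to tier 2 and above stays a human decision.
    """
    eligible = [
        a for a in accounts
        if a.tier == 0
        and a.corroborated_facts >= MIN_CORROBORATED_FACTS
        and a.avg_score >= MIN_AVG_SCORE
    ]
    return sorted(eligible, key=lambda a: a.avg_score, reverse=True)
```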

The OSINT engine and scoring

Every incoming tweet gets evaluated against the active situations. The scorer checks keyword matches, semantic similarity against confirmed facts using Voyage-4 embeddings, source tier weighting, and floor rules. Floor rules are the key thing: certain event types (nuclear facility activity, leadership targeting, ballistic missile use) keep their score contribution for a defined window regardless of velocity. A confirmed nuclear event doesn't decay out of the picture for 72 hours even if nothing else comes in.
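
To make the floor-rule idea concrete, here is a small sketch. Only the 72-hour nuclear window comes from the post; the other windows, the half-life, the exponential decay shape, and the function name are assumptions chosen for illustration.

```python
# Hours during which an event keeps its full score contribution.
FLOOR_WINDOWS_HOURS = {
    "nuclear_facility_activity": 72.0,   # from the post
    "leadership_targeting": 48.0,        # assumed
    "ballistic_missile_use": 24.0,       # assumed
}
HALF_LIFE_HOURS = 6.0   # assumed decay rate once no floor applies


def contribution(base_score, event_type, age_hours):
    """Score contribution of one event at a given age.

    Inside the floor window the contribution is held at base_score.
    Afterwards (or for event types with no floor) it halves every
    HALF_LIFE_HOURS.
    """
    window = FLOOR_WINDOWS_HOURS.get(event_type, 0.0)
    decaying_hours = max(0.0, age_hours - window)
    return base_score * 0.5 ** (decaying_hours / HALF_LIFE_HOURS)
```

With these assumed numbers, an ordinary event worth 20 points is down to 10 after six hours, while a nuclear facility event is still worth the full 20 at hour 71.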

If a tweet scores above threshold it goes to the confirmed facts database with a confidence level and an origin. If it's below but interesting it gets staged. Junk gets discarded. The whole pass takes under two seconds per tweet.

The Situation Intensity Index

The SII is a 0-100 rolling score computed from recent tweet velocity, confirmed fact accumulation, keyword density, and floor rule contributions. It maps to four tiers: MONITORED (under 31), ELEVATED (31-55), HIGH (56-74), CRITICAL (75+).

The daemon sleep interval adapts to the current tier. CRITICAL polls every 10 minutes. MONITORED can sleep for 4 hours. This matters for cost: at quiet periods the system is nearly idle; during active escalation it's running hard.
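
The tier mapping and adaptive sleep could look something like this. The tier boundaries and the CRITICAL and MONITORED intervals are from the post; the ELEVATED and HIGH intervals are assumptions, as are the function names.

```python
SLEEP_MINUTES = {
    "CRITICAL": 10,     # from the post
    "HIGH": 30,         # assumed
    "ELEVATED": 90,     # assumed
    "MONITORED": 240,   # from the post (4 hours)
}


def sii_tier(score):
    """Map a 0-100 SII score to its tier name."""
    if not 0 <= score <= 100:
        raise ValueError(f"SII score out of range: {score}")
    if score >= 75:
        return "CRITICAL"
    if score >= 56:
        return "HIGH"
    if score >= 31:
        return "ELEVATED"
    return "MONITORED"


def sleep_seconds(score):
    """How long the daemon sleeps before the next poll at this score."""
    return SLEEP_MINUTES[sii_tier(score)] * 60
```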

Tier crossings fire exactly once and trigger the council to evaluate whether the crossing is real signal or noise. HIGH to CRITICAL always forces a human ping regardless of what the council decides.
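
"Fires exactly once" is essentially edge detection on the tier. A minimal sketch, assuming the tier is recomputed on every poll; the class name and event shape are hypothetical.

```python
class CrossingDetector:
    """Emit one event per tier change and stay silent while the tier holds."""

    def __init__(self, initial_tier="MONITORED"):
        self.last_tier = initial_tier

    def update(self, tier):
        """Return a crossing event dict, or None if the tier is unchanged."""
        if tier == self.last_tier:
            return None
        event = {
            "from": self.last_tier,
            "to": tier,
            # HIGH -> CRITICAL pings the human no matter what the council says.
            "force_human": (self.last_tier, tier) == ("HIGH", "CRITICAL"),
        }
        self.last_tier = tier
        return event
```

In a real daemon `last_tier` would need to be persisted between runs, otherwise a restart could re-fire a crossing that was already handled.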

The council

Five AI voices deliberate on anything the pipeline can't resolve automatically. Senior analyst, regional specialist, source reliability officer, devil's advocate, and an editorial integrity officer whose job is to catch narrative drift and framing problems before they corrupt the database.

They evaluate independently, vote, and pass their reasoning to a coordinator who synthesizes a verdict. Three outcomes: ACT (execute the proposed action autonomously), HOLD (wait), ESCALATE (ping the human immediately). The whole session runs in under 60 seconds and costs roughly a cent.
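
The coordinator in the real system synthesizes the voices' reasoning, which a few lines of code cannot reproduce. As a stand-in, here is a plain majority-vote sketch over the three outcomes, with the assumption (not from the post) that a split council escalates to the human.

```python
from collections import Counter

VERDICTS = ("ACT", "HOLD", "ESCALATE")


def synthesize(votes):
    """Reduce role -> verdict votes to one outcome.

    A strict majority wins. With no majority the safe default in this
    sketch is ESCALATE, which is an assumption rather than the actual rule.
    """
    if not votes:
        raise ValueError("no votes cast")
    for role, verdict in votes.items():
        if verdict not in VERDICTS:
            raise ValueError(f"{role} cast an unknown verdict: {verdict}")
    top, count = Counter(votes.values()).most_common(1)[0]
    return top if count > len(votes) / 2 else "ESCALATE"


votes = {
    "senior_analyst": "ACT",
    "regional_specialist": "ACT",
    "source_reliability_officer": "HOLD",
    "devils_advocate": "HOLD",
    "editorial_integrity_officer": "ACT",
}
verdict = synthesize(votes)   # three of five voted ACT
```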

Council triggers include: new situation detection, SII tier crossings, source integrity events (suspensions, deletions, drift), competing high-confidence claims from different sources, and direct claim submissions from me.

The claim scorer and fact verifier

I have a dedicated Slack thread where I drop tweet URLs or freeform claims. The scorer fetches the content, checks source reputation, runs live web research, pulls the top semantically similar confirmed facts from the database, runs a fully isolated LLM evaluation, and posts a scored verdict back in about 30 seconds.

If the claim has an attached image, the vision model runs first. It reads the image and extracts any visible text, labels, dates, manufacturer markings, serial numbers, and location stamps. That output gets injected into the scoring context before the LLM evaluates the claim. Today the vision model read "DATE OF SOURCE: 4 JUNE 2000, LOCKHEED MARTIN, PATRIOT ADMS" off a debris identification plate and upgraded the evidence tier from "tweet with image" to "documentary evidence." The claim went from UNVERIFIED to MEDIUM and got routed to council.

Confidence tiers: 80+ auto-inserts into confirmed facts. 50-79 goes to council. Under 50 gets a score card only. Every score card gets prepopulated with emoji action buttons. 🔄 rescores. ✅ force-inserts. ❌ rejects. ⬆️ fires council immediately.
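
The routing thresholds and emoji actions above map naturally onto a small function and a lookup table. The route and action names below are hypothetical labels; only the thresholds and the emoji meanings are from the post.

```python
def route(confidence):
    """Decide where a scored claim goes based on its 0-100 confidence."""
    if confidence >= 80:
        return "auto_insert"       # straight into confirmed facts
    if confidence >= 50:
        return "council"           # needs deliberation
    return "score_card_only"       # posted for reference, nothing else


# Emoji reaction -> action name (action names are assumed).
EMOJI_ACTIONS = {
    "🔄": "rescore",
    "✅": "force_insert",
    "❌": "reject",
    "⬆️": "fire_council",
}


def dispatch(emoji):
    """Look up the action for a reaction; unknown reactions are ignored."""
    return EMOJI_ACTIONS.get(emoji)
```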

The situation scout

Three times a day (8am, 1pm, 7pm) a scout pass runs independently of the main pipeline. It reads a cached feed of recent high-velocity tweets, clusters them semantically, looks for emerging topic patterns that don't match any active situation, and evaluates whether a new situation should be bootstrapped. If it finds something credible, the council decides. If the council votes ACT, the system generates a full situation config: keywords, critical tracker accounts, floor rules, SII thresholds, digest format. It creates a Twitter list, seeds the account database, posts an initial conflict card, and starts monitoring. New situations spin up automatically.
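
The post does not say which clustering method the scout uses. As one possible illustration, here is a greedy cosine-similarity pass over tweet embeddings that keeps only clusters not matching any active situation. The threshold, minimum cluster size, and function names are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def greedy_clusters(vectors, threshold=0.8):
    """Assign each vector to the first cluster whose seed is similar enough."""
    clusters = []   # lists of indices; the first index is the cluster seed
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def emerging_clusters(vectors, situation_vectors, min_size=3, threshold=0.8):
    """Clusters big enough to matter that match no active situation."""
    found = []
    for cluster in greedy_clusters(vectors, threshold):
        if len(cluster) < min_size:
            continue
        seed = vectors[cluster[0]]
        if all(cosine(seed, s) < threshold for s in situation_vectors):
            found.append(cluster)
    return found
```

Anything this returns would be a candidate only. Whether it becomes a new situation is still the council's call.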

The dashboards and conflict card

The conflict card is a structured intelligence brief that lives in a pinned Slack thread. It updates automatically. It contains the current SII score with trend indicator, a casualties section (coalition vs. opposing, with LLM-synthesized output from confirmed facts), an equipment losses section with inline cost annotations, a war cost ledger summary, and the most recent high-confidence confirmed facts.

The SII dashboard sits above it in the same thread and shows the current score, tier, last crossing event, and the floor rules currently in effect with their time windows.

Both update on significant SII changes and can be force-refreshed via emoji reaction.

The war cost ledger

Every confirmed equipment loss gets extracted by the pipeline, priced against a hardware reference database, and assigned a confidence level: confirmed (CENTCOM or equivalent official acknowledgment), estimated (deployed and known to be in theater), or rumored (adversary claims only, unverified). The distinction matters. The conflict card shows only confirmed and estimated costs. Rumored costs are tracked in the ledger but excluded from public-facing output.

We also track two fundamentally different categories of cost. Equipment losses and personnel casualties are expenditure costs: things that happened and can't be undone. Every cost entry traces back to the specific tweet or report that generated it via a join on source fact ID, so the entire cost ledger has an auditable chain of custody.
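
Here is a sketch of how the confidence levels and the source-fact join might be laid out, using an in-memory SQLite database. The table and column names are assumptions, and the rows are placeholders for illustration only; none of the figures are real.

```python
import sqlite3

SCHEMA = """
CREATE TABLE facts (
    id INTEGER PRIMARY KEY,
    source_url TEXT NOT NULL,
    summary TEXT NOT NULL
);
CREATE TABLE cost_entries (
    id INTEGER PRIMARY KEY,
    source_fact_id INTEGER NOT NULL REFERENCES facts(id),
    side TEXT NOT NULL,
    item TEXT NOT NULL,
    usd REAL NOT NULL,
    confidence TEXT NOT NULL
        CHECK (confidence IN ('confirmed', 'estimated', 'rumored'))
);
"""


def public_total(conn, side):
    """Sum only confirmed and estimated costs; rumored stays in the ledger."""
    row = conn.execute(
        "SELECT COALESCE(SUM(usd), 0) FROM cost_entries "
        "WHERE side = ? AND confidence IN ('confirmed', 'estimated')",
        (side,),
    ).fetchone()
    return row[0]


def audit_trail(conn):
    """Join every cost entry back to the fact that generated it."""
    return conn.execute(
        "SELECT c.item, c.usd, c.confidence, f.source_url "
        "FROM cost_entries AS c JOIN facts AS f ON f.id = c.source_fact_id "
        "ORDER BY c.id"
    ).fetchall()


# Placeholder data: hypothetical items and amounts, not real figures.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO facts (id, source_url, summary) VALUES (?, ?, ?)",
    [
        (1, "https://example.com/fact/1", "placeholder fact A"),
        (2, "https://example.com/fact/2", "placeholder fact B"),
    ],
)
conn.executemany(
    "INSERT INTO cost_entries (source_fact_id, side, item, usd, confidence) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, "coalition", "item A", 100.0, "confirmed"),
        (1, "coalition", "item B", 50.0, "estimated"),
        (2, "coalition", "item C", 999.0, "rumored"),
    ],
)
```

Making `source_fact_id` non-nullable is what enforces the chain of custody: a cost entry cannot exist without pointing at a fact.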

Operational costs are a separate concept the system is aware of but doesn't yet fully model: the ongoing burn rate of active air defense operations, sortie costs, logistics tail, interceptor consumption rates. The PAC-2 claim today was interesting precisely because it touched on this. If PAC-3 stockpiles are exhausted and forces are using 25-year-old interceptors, the operational picture is materially different from the equipment loss picture. The system tracks that distinction and the council weights it appropriately.

Right now the ledger shows approximately $63 billion in coalition losses (confirmed plus estimated), $93 billion opposing, $156 billion total when rumored entries are included.

The notification and digest engine

The morning briefing runs at 8:35 every day: calendar, weather, top emails, spoken audio delivered to the bedroom Sonos speaker. If SII is at HIGH or above, a two-sentence war context summary leads the brief. I wake up knowing whether overnight developments matter.

The evening digest runs at 6pm: conflict card, top confirmed facts from the last 12 hours, SII trend, any pending council decisions needing attention.

Tier crossings, force-human escalations, source integrity events, and claim scoring results all generate immediate Slack DMs. CRITICAL tier crossings page me immediately. Lower-priority events batch into the next digest window.

The memory and continuity layer

Every session writes to dated memory files. A nightly distillation pass runs at 3am and extracts facts worth keeping long-term into a structured facts database. A session handoff document gets updated every run so that if the context resets, the next session can pick up exactly where things left off without asking me what's happening.

The workspace has a SHA-256 integrity manifest checked on every boot. If any core file has been tampered with, the session halts and alerts me before doing anything else.
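
A boot-time integrity check along these lines can be done with the standard library alone. This is a sketch under assumptions: the manifest is taken to be a mapping of relative file names to hex digests, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_file(path):
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, names):
    """Record the current digest of each core file under root."""
    root = Path(root)
    return {name: sha256_file(root / name) for name in names}


def verify_manifest(root, manifest):
    """Return the names of files that are missing or whose digest changed."""
    root = Path(root)
    failed = []
    for name, expected in sorted(manifest.items()):
        path = root / name
        if not path.is_file() or sha256_file(path) != expected:
            failed.append(name)
    return failed
```

On boot, a non-empty result from `verify_manifest` would be the signal to halt and alert before doing anything else. The manifest itself has to live somewhere the agent cannot quietly rewrite, or the check proves little.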

Everything else

The same agent that's running the intelligence stack is also controlling my Sonos speakers, managing my Apple Reminders and Things 3 task lists, triaging my Gmail on a 3-hour schedule, monitoring my UniFi network, handling iMessages and Signal (including a shared channel with my wife), and yes, controlling my lights.

It just does a few other things now too.

What six days looks like

Day 1: OpenClaw setup. Lights. Weather. Morning briefing on the Sonos. My wife on Signal.
Day 2: OSINT scraper, first version of the SII, LaunchAgent scaffolding.
Day 3: Azure AI Foundry integration, all local models removed, multi-model pipeline.
Day 4: Council architecture, claim scorer, war cost ledger.
Day 5: Voyage-4 embeddings, semantic corroboration, conflict card renderer, situation scout.
Day 6: Vision analysis, emoji action dispatcher, conflict stats table, backfill pipeline, this post.

About 6,000 lines of Python. One Mac mini. Zero prior infrastructure.

The hardest part wasn't building any of it. It was learning when to say "wait, let's spec this out first" instead of just letting it run. An agent that can ship fast is only as good as the architecture decisions made before the first line of code.

Happy to go deep on any component.

spare trench Mar 5, 2026, 3:50 PM

What model are you using for your main claw agent?

unkempt sinew Mar 5, 2026, 10:30 PM

sonnet-4.6

edgy dew Mar 6, 2026, 4:21 AM

Love it. Where do I see it in action?

covert agate Mar 6, 2026, 9:45 AM

Amazing. This is something you could make a website for that would be a way better source than the normal news.