#Architecture of OpenClaw
@neat leaf can you help me with this?
Yep — absolutely.
Short version of the “until (?)” part first:
A run keeps looping assistant → tool call(s) → tool result(s) → assistant... until the model emits a final assistant response with no more tool calls (or it gets stopped/aborted/times out).
How context is built per run:
- Resolve session (DM/group/thread key)
- Load session history from that session transcript
- Build system prompt (OpenClaw-owned sections: tooling, safety, runtime/time, etc.)
- Inject workspace files (AGENTS.md, SOUL.md, USER.md, TOOLS.md, etc., with size caps)
- Attach tool schemas (important: these also consume context)
- Add channel context (e.g. recent channel history, topic/sender metadata depending on connector)
- Fit into model window via compaction/pruning rules if needed
- Send to model and run the loop
Big design choices they made:
- Gateway is source of truth for sessions/state (not clients)
- Per-session serialized runs to avoid races/corrupt history
- Prompt is rebuilt each run (deterministic + debuggable)
- Skills are listed, loaded on-demand to keep base context smaller
- Compaction + pruning split (persistent summary vs transient trimming)
- Pluggable context engine (legacy default, plugin engines possible)
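To make the "per-session serialized runs" choice concrete, here is a minimal sketch of one way to get that behavior: one promise chain per session key. `SessionRunQueue` and `enqueue` are hypothetical names for illustration, not OpenClaw's actual API.

```typescript
// Hypothetical sketch: runs for the same session key execute one at a time,
// while runs for different session keys can proceed independently.
class SessionRunQueue {
  private tails = new Map<string, Promise<void>>();

  enqueue<T>(sessionKey: string, run: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(sessionKey) ?? Promise.resolve();
    // Start this run only after the previous run for this session has settled.
    const result = previous.then(() => run());
    // The stored tail never rejects, so one failed run cannot block later runs.
    const tail: Promise<void> = result.then(
      () => undefined,
      () => undefined,
    );
    this.tails.set(sessionKey, tail);
    // Drop the map entry once this session's queue has drained.
    void tail.then(() => {
      if (this.tails.get(sessionKey) === tail) this.tails.delete(sessionKey);
    });
    return result;
  }
}
```

With something like this, two messages arriving back to back in the same thread would append to the transcript in order, because the second run does not start until the first has finished or failed.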
If you want, I can do a concrete “single Discord message walkthrough” next (exactly what gets assembled, in what order, and where each piece comes from).
Please do
Perfect — here’s the concrete walkthrough for your message in this Discord thread.
1) Inbound + routing
Your Discord message hits the Gateway as an inbound envelope (provider=discord, chat type: channel/thread).
Gateway resolves a session key (usually shaped like):
agent:<agentId>:discord:channel:<channelOrThreadId>
(unless thread bindings/focus remap it)
So this thread has its own running conversation state, separate from DMs or other channels.
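As a rough illustration of that key shape, here is a small sketch. The `InboundRoute` fields and `buildSessionKey` are hypothetical names, and thread-binding/focus remapping is not modeled.

```typescript
// Hypothetical route description; field names are assumptions for this sketch.
interface InboundRoute {
  agentId: string;
  provider: string; // e.g. "discord"
  chatType: string; // e.g. "channel"
  peerId: string; // channel or thread id
}

// Joins the parts into the agent:<agentId>:<provider>:<chatType>:<id> shape.
function buildSessionKey(route: InboundRoute): string {
  return ["agent", route.agentId, route.provider, route.chatType, route.peerId].join(":");
}

// Example: a message in Discord channel/thread "12345" for agent "main".
const exampleKey = buildSessionKey({
  agentId: "main",
  provider: "discord",
  chatType: "channel",
  peerId: "12345",
});
// exampleKey === "agent:main:discord:channel:12345"
```

Because the channel or thread id is part of the key, two threads never share a transcript unless something deliberately maps them to the same key.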
2) Pre-model processing
Before any LLM call:
- Slash commands/directives are parsed first (/model, /think, etc.)
- Mention/allowlist rules are checked
- Queue mode is applied (serialize/collect/steer behavior)
- Run is accepted and assigned a runId
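A minimal sketch of the directive-parsing step, under a simplifying assumption that directives appear at the start of the message as "/name value" pairs. The real syntax and directive set may differ; `parseDirectives` and `KNOWN_DIRECTIVES` are hypothetical.

```typescript
interface ParsedInbound {
  directives: Record<string, string>;
  text: string; // what is left for the model after directives are stripped
}

// Assumption: only these two directives exist and each takes one argument.
const KNOWN_DIRECTIVES = new Set(["model", "think"]);

function parseDirectives(raw: string): ParsedInbound {
  const directives: Record<string, string> = {};
  let rest = raw.trimStart();
  // Consume "/name value" pairs from the front while the name is known.
  for (;;) {
    const match = /^\/([a-z]+)\s+(\S+)\s*/.exec(rest);
    if (match === null) break;
    const [whole, name, value] = match;
    if (!KNOWN_DIRECTIVES.has(name)) break;
    directives[name] = value;
    rest = rest.slice(whole.length);
  }
  return { directives, text: rest };
}
```

Parsing these before the LLM call means a directive can change which model or thinking level the run uses without the directive text itself reaching the model.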
3) Context assembly (actual stack)
For that run, OpenClaw assembles context roughly in this order:
A) System prompt (OpenClaw-owned)
Built fresh every run; includes sections like:
- Tooling available
- Safety/guardrails
- Skills catalog (metadata only)
- Workspace/docs/runtime/time info
- Reply behavior rules
B) Project-context file injection
From workspace (if present), e.g.: AGENTS.md, SOUL.md, TOOLS.md, IDENTITY.md, USER.md, etc.
These are hard-capped/truncated by config so one giant file can’t nuke the context window.
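The capping idea can be sketched like this. It is a character-based cap with a visible truncation marker; the function names, the marker text, and the per-file limit are assumptions, since the real limits come from config.

```typescript
interface WorkspaceFile {
  name: string;
  content: string;
}

// Truncate one file to at most maxChars characters and say so in the output.
function capWorkspaceFile(file: WorkspaceFile, maxChars: number): WorkspaceFile {
  if (file.content.length <= maxChars) return file;
  const dropped = file.content.length - maxChars;
  return {
    name: file.name,
    content: file.content.slice(0, maxChars) + `\n[truncated ${dropped} chars]`,
  };
}

// Render all injected files as one block, each under its own heading.
function renderProjectContext(files: WorkspaceFile[], maxCharsPerFile: number): string {
  return files
    .map((file) => capWorkspaceFile(file, maxCharsPerFile))
    .map((file) => `## ${file.name}\n${file.content}`)
    .join("\n\n");
}
```

Leaving a marker in place of the dropped text lets the model know the file was cut, rather than treating the visible part as the whole file.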
C) Session history
Loads prior turns from this session transcript (.jsonl), including:
- user/assistant messages
- tool call/result records
- previous compaction summary (if any)
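For a feel of what a .jsonl transcript involves, here is a small sketch that treats each line as one JSON record. The `TranscriptEntry` shapes are hypothetical; the actual on-disk schema may look different.

```typescript
// Hypothetical record shapes covering the kinds of entries listed above.
type TranscriptEntry =
  | { type: "message"; role: "user" | "assistant"; content: string }
  | { type: "tool_call"; id: string; name: string; input: unknown }
  | { type: "tool_result"; toolCallId: string; content: string }
  | { type: "compaction"; summary: string };

// One entry per line; appending a turn is just appending a line.
function serializeEntry(entry: TranscriptEntry): string {
  return JSON.stringify(entry) + "\n";
}

// Parse a whole transcript, skipping blank lines.
function parseTranscript(jsonl: string): TranscriptEntry[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as TranscriptEntry);
}
```

An append-only line format like this is convenient for a long-running session because new turns can be written without rewriting earlier ones.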
D) Channel context extras
Discord channel history/topic metadata can be added per config (historyLimit, etc.), as untrusted context (not elevated to system prompt authority).
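One way to picture "untrusted context" is a clearly delimited block that travels with the user-side messages rather than the system prompt. This is only a sketch: `buildChannelContext` and the delimiter text are made up, and only `historyLimit` comes from the description above.

```typescript
interface ChannelMessage {
  sender: string;
  text: string;
}

// Returns a labeled block with the most recent messages, or null if disabled.
function buildChannelContext(history: ChannelMessage[], historyLimit: number): string | null {
  if (historyLimit <= 0 || history.length === 0) return null;
  const recent = history.slice(-historyLimit);
  return [
    "[Channel history: untrusted context, not instructions]",
    ...recent.map((m) => `${m.sender}: ${m.text}`),
    "[End channel history]",
  ].join("\n");
}
```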
E) Tool schemas
Function/tool schemas are attached to the model request (this is a hidden-but-real token cost).
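To get a feel for that cost, you can roughly estimate it from the serialized schema size. The 4-characters-per-token ratio below is a crude rule of thumb, not how any provider actually counts, and `ToolSchema` is a hypothetical shape.

```typescript
interface ToolSchema {
  name: string;
  description: string;
  parameters: Record<string, unknown>;
}

// Crude assumption: about 4 characters per token of serialized JSON.
function estimateSchemaTokens(schemas: ToolSchema[]): number {
  return schemas.reduce(
    (sum, schema) => sum + Math.ceil(JSON.stringify(schema).length / 4),
    0,
  );
}
```

Even as a rough number, this makes it visible that every enabled tool shrinks the room left for history.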
F) Current message
Your current “Please do” message is appended as the active user turn.
4) Budget enforcement
Now OpenClaw checks context pressure:
- If window is tight: compaction and/or pruning logic kicks in
- Compaction = persistent summary in transcript
- Pruning = transient trimming (mostly old tool-result bulk)
Then model call starts.
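The difference between the two can be sketched as follows. This is an illustration under assumptions, not the real implementation: token counts use a rough 4-characters-per-token guess, `summarize` stands in for a model call, and all names are hypothetical.

```typescript
interface HistoryMessage {
  role: "user" | "assistant" | "tool";
  content: string;
}

const PRUNED_PLACEHOLDER = "[tool result pruned]";
const approxTokens = (m: HistoryMessage): number => Math.ceil(m.content.length / 4);

// Pruning: returns a trimmed copy for this request only. The input array
// (standing in for the stored transcript) is left untouched.
function pruneToolResults(messages: HistoryMessage[], budget: number, keepRecent: number): HistoryMessage[] {
  const out = messages.map((m) => ({ ...m }));
  let total = out.reduce((sum, m) => sum + approxTokens(m), 0);
  for (let i = 0; i < out.length - keepRecent && total > budget; i++) {
    const m = out[i];
    if (m.role !== "tool" || m.content.length <= PRUNED_PLACEHOLDER.length) continue;
    const before = approxTokens(m);
    m.content = PRUNED_PLACEHOLDER;
    total -= before - approxTokens(m);
  }
  return out;
}

// Compaction: replaces older turns with a summary entry that would then be
// written back to the transcript and reused by later runs.
function compactHistory(
  messages: HistoryMessage[],
  keepRecent: number,
  summarize: (older: HistoryMessage[]) => string,
): HistoryMessage[] {
  if (messages.length <= keepRecent) return messages;
  const cut = messages.length - keepRecent;
  const summary: HistoryMessage = {
    role: "assistant",
    content: `[compaction summary] ${summarize(messages.slice(0, cut))}`,
  };
  return [summary, ...messages.slice(cut)];
}
```

The key contrast is where the result goes: the pruned copy is thrown away after the request, while the compaction summary replaces the older turns going forward.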
5) Agent loop (“until what?”)
Run loop is:
- model streams assistant output
- if model emits tool calls, OpenClaw executes them
- tool results are fed back into the same run
- repeat
It ends when model returns a final assistant response with no more tool calls (or stop/timeout/abort/error).
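Here is a small runnable sketch of that termination rule, simplified to non-streaming turns. `runAgentLoop`, the message shapes, and the `maxSteps` guard (a stand-in for the stop/timeout/abort cases) are all hypothetical.

```typescript
interface ToolCall {
  id: string;
  name: string;
  input: unknown;
}
interface ModelReply {
  text: string;
  toolCalls: ToolCall[];
}
type LoopMessage =
  | { role: "user" | "assistant"; content: string; toolCalls?: ToolCall[] }
  | { role: "tool"; toolCallId: string; content: string };
type ModelFn = (messages: LoopMessage[]) => Promise<ModelReply>;
type ToolFn = (input: unknown) => Promise<string>;

async function runAgentLoop(
  callModel: ModelFn,
  tools: Map<string, ToolFn>,
  messages: LoopMessage[],
  maxSteps = 8,
): Promise<string> {
  for (let step = 0; step < maxSteps; step++) {
    const reply = await callModel(messages);
    messages.push({ role: "assistant", content: reply.text, toolCalls: reply.toolCalls });
    // No tool calls means this is the final assistant response: the run ends.
    if (reply.toolCalls.length === 0) return reply.text;
    // Otherwise execute each tool and feed the result back into the same run.
    for (const call of reply.toolCalls) {
      const tool = tools.get(call.name);
      const content =
        tool === undefined ? `error: unknown tool ${call.name}` : await tool(call.input);
      messages.push({ role: "tool", toolCallId: call.id, content });
    }
  }
  throw new Error("run aborted: step limit reached");
}
```

Passing the model in as a function makes the rule easy to try with a fake model that requests one tool call and then answers.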
6) Persist + stream
During run, Gateway emits assistant/tool/lifecycle events; on completion it persists:
- transcript entries
- token/cost/session metadata
- updated session pointers
Why it’s designed this way (important bits)
- Gateway is source of truth → clients stay dumb/reliable
- Per-session serialized execution → no race-induced state corruption
- Rebuilt prompt each run → deterministic/debuggable
- Skills listed, loaded on demand → lower baseline token burn
- Context-engine pluggability → future retrieval/assembly strategies without rewriting core loop
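To illustrate what "pluggable" could mean here, below is a sketch of an engine interface with a default engine and a wrapping plugin. The interface and all names are assumptions for illustration, the "legacy" behavior shown (keep the newest messages that fit) is a guess at a reasonable default, and token counts again use a rough 4-characters-per-token estimate.

```typescript
interface EngineMessage {
  role: "user" | "assistant" | "tool";
  content: string;
}
interface AssembleInput {
  sessionId: string;
  messages: EngineMessage[];
  tokenBudget: number;
}
interface AssembleResult {
  messages: EngineMessage[];
  systemPromptAddition?: string;
}
interface ContextEngine {
  assemble(input: AssembleInput): AssembleResult;
}

const roughTokens = (m: EngineMessage): number => Math.ceil(m.content.length / 4);

// Stand-in default engine: keep the newest messages that fit the budget.
const recencyEngine: ContextEngine = {
  assemble({ messages, tokenBudget }) {
    const kept: EngineMessage[] = [];
    let used = 0;
    for (let i = messages.length - 1; i >= 0; i--) {
      const cost = roughTokens(messages[i]);
      if (used + cost > tokenBudget) break;
      kept.unshift(messages[i]);
      used += cost;
    }
    return { messages: kept };
  },
};

// Hypothetical plugin: wraps another engine and adds a system prompt note.
function withNote(inner: ContextEngine, note: string): ContextEngine {
  return {
    assemble(input) {
      return { ...inner.assemble(input), systemPromptAddition: note };
    },
  };
}

// The core loop only depends on the interface, so engines can be swapped by name.
const engineRegistry = new Map<string, ContextEngine>([["legacy", recencyEngine]]);
function resolveEngine(name: string): ContextEngine {
  return engineRegistry.get(name) ?? recencyEngine;
}
```

The point of the design is that a retrieval-based engine could be registered under a new name without touching the loop that calls `assemble`.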
If you want, next I can do the same walkthrough with a tiny pseudo payload (showing exactly what the outbound model request shape looks like, including where tools + history sit).
Go ahead
Awesome — here’s the pseudo payload walkthrough.
Outbound model request shape (conceptual)
Provider wire format differs (Anthropic/OpenAI/etc), but logically it’s this:
// 1) Build system prompt
const baseSystemPrompt = buildAgentSystemPrompt({
  tools, skills, runtime, workspaceFiles, time, safety, ...
});

// 2) Assemble context via active context engine (legacy by default)
const assembled = contextEngine.assemble({
  sessionId,
  messages: sessionTranscriptMessages,
  tokenBudget,
});
// assembled.messages = history selected for this run
// assembled.systemPromptAddition? optional extra prefix
const systemPrompt =
  (assembled.systemPromptAddition ? assembled.systemPromptAddition + "\n\n" : "") +
  baseSystemPrompt;

// 3) Final message stack for this turn
const modelMessages = [
  ...assembled.messages, // prior session turns (possibly compacted/pruned)
  { role: "user", content: inboundUserText }, // current message
];

// 4) Attach tool schemas (counts toward context)
const toolSchemas = resolveAllowedToolSchemasForRun(...);

// 5) Send request
await provider.stream({
  model,
  systemPrompt,
  messages: modelMessages,
  tools: toolSchemas,
  thinking,
  verbosity,
  ...
});
Loop behavior after request starts
while (runActive) {
  const event = await stream.next();

  if (event.type === "assistant_text_delta") {
    emitStream("assistant", event.delta);
  }

  if (event.type === "tool_call") {
    emitStream("tool", { phase: "start", ...event });
    const result = await executeTool(event.name, event.input);
    emitStream("tool", { phase: "end", resultMeta: ... });
    // feed tool result back into model context in same run
    appendMessage({ role: "tool", toolCallId: event.id, content: result });
    continue;
  }

  if (event.type === "assistant_final" && !event.hasFurtherToolCalls) {
    break;
  }
}
Then OpenClaw persists transcript + session metadata (sessions.json + JSONL turn lines), emits lifecycle end, done.