#Help me troubleshoot why my agent is non-responsive.
1 messages ¡ Page 1 of 1 (latest)
First thing: what version are you on? Please paste:
openclaw --version
Then letâs do a quick âis the gateway alive + is the channel connected + is policy dropping messagesâ triage. Run these in order and paste the output:
openclaw status
openclaw gateway status
openclaw channels status --probe
openclaw doctor
After that, keep logs open, send a message that the bot should answer, and paste the log lines that appear when you send it:
openclaw logs --follow
A couple super common ânon-responsiveâ causes (the logs will usually make it obvious):
- Gateway not actually reachable:
openclaw gateway statusshould showRuntime: runningandRPC probe: ok. - Discord connected but messages are dropped by policy:
- In guild channels, many configs require a mention. Logs often say something like
mention required/drop guild message (mention required ...). - Pairing required / pending approval (logs mention
pairing). - Allowlist mismatch (
blocked/allowlist).
- In guild channels, many configs require a mention. Logs often say something like
- Discord auth/perms:
channels status --probewill show 401/403, missing scopes, not in channel, etc.
Two clarifiers so I can route you faster:
- Is this ânon-responsiveâ in Discord DMs, or in a server channel/thread?
- When you message it, do you get nothing at all, or does it show âthinkingâ/create a session but never replies?
Relevant runbook (matches the commands above): https://docs.openclaw.ai/gateway/troubleshooting
Security audit
Summary: 1 critical ¡ 2 warn ¡ 1 info
CRITICAL Small models require sandboxing and web tools disabled
Small models (<=300B params) detected: - ollama/llama3.1:8b-instruct-q4_K_M (8B) @ agents.defaults.model.primary (unsafe; sandbox=off; web=[web_fetch, browser]âŚ
Fix: If you must use small models, enable sandboxing for all sessions (agents.defaults.sandbox.mode="all") and disable web_search/web_fetch/browser (tools.deny=["group:web","browser"]).
WARN Reverse proxy headers are not trusted
gateway.bind is loopback and gateway.trustedProxies is empty. If you expose the Control UI through a reverse proxy, configure trusted proxies so local-client câŚ
Fix: Set gateway.trustedProxies to your proxy IPs or keep the Control UI local-only.
WARN Some gateway.nodes.denyCommands entries are ineffective
gateway.nodes.denyCommands uses exact node command-name matching only (for example system.run), not shell-text filtering inside a command payload. - Unknown âŚ
Fix: Use exact command names (for example: canvas.present, canvas.hide, canvas.navigate, canvas.eval, canvas.snapshot, canvas.a2ui.push, canvas.a2ui.pushJSONL, canvas.a2ui.reset). If you need broader restrictions, remove risky command IDs from allowCommands/default workflows and tighten tools.exec policy.
Full report: openclaw security audit
Deep probe: openclaw security audit --deep
That openclaw security audit output is useful (good you ran it), but it doesnât actually explain âno repliesâ by itself â itâs mostly a safety/config lint report.
Two key points from what you pasted:
-
CRITICAL: youâre running an 8B Ollama model with
sandbox=offwhile web tools are enabled
Thatâs a legit issue to fix (either enable sandboxing or stop exposing web/browser tools when using that small model). It usually wonât make Discord totally silent, but small local models can appear ânon-responsiveâ (hangs, tool-call weirdness, etc.), so itâs worth addressing after we confirm the basics. -
If youâre trying to talk to the agent in a Discord server channel/thread, the #1 âsilent botâ cause is: guild messages are being ignored due to policy (common log line: âmention requiredâ, or guild not in allowlist).
What I need next (paste outputs)
Run these and paste them:
openclaw --version
openclaw gateway status
openclaw channels status --probe
openclaw pairing list --channel discord
Then:
openclaw logs --follow
âŚleave that running, send one test message in Discord (in the same place you expect replies), and paste the new log lines that appear.
Quick Discord-specific checks (very common)
- Are you messaging the agent in a DM or in a server channel/thread?
- In the server: did you @mention the bot in your message?
By default, guilds often userequireMention: true(so it will silently ignore normal messages until mentioned) â docs: https://docs.openclaw.ai/channels/discord - Is the bot actually in that server and does it have permission to read/send in that channel? (
channels status --probe/ logs will show 401/403/Forbidden/missing_scope)
Once you paste gateway status + channels status --probe + the log snippet from a test message, we can usually pinpoint the exact reason in one round.
â Dashboard â http://127.0.0.1:18789/ â
â OS â macos 26.2 (arm64) ¡ node 24.8.0 â
â Tailscale â off â
â Channel â stable (default)
â Update â pnpm ¡ npm latest 2026.2.26
â Gateway â local ¡ ws://127.0.0.1:18789 (local loopback) ¡ unreachable (connect failed: connect ECONNREFUSED 127.0.0.1:18789)
â Gateway service â LaunchAgent installed ¡ not loaded ¡ unknown â
â Node service â LaunchAgent not installed â
â Agents â 1 ¡ 1 bootstrap file present ¡ sessions 0 ¡ default main active unknown â
â Memory â 0 files ¡ 0 chunks ¡ sources memory ¡ plugin memory-core ¡ vector unknown ¡ fts ready ¡ cache on (0)
â Probes â skipped (use --deep) â
â Events â none â
â Heartbeat â 30m (main) â
â Sessions â 0 active ¡ default llama3.1:8b-instruct-q4_K_M (128k ctx) ¡ ~/.openclaw/agents/main/sessions/sessions.json â
Your output already shows the root cause:
Gateway ⌠ws://127.0.0.1:18789 ⌠unreachable (connect failed: connect ECONNREFUSED 127.0.0.1:18789)Gateway service ⌠LaunchAgent installed ¡ not loaded
That means nothing is listening on 127.0.0.1:18789, so the agent canât respond anywhere (Discord included). Fix = get the Gateway process running.
1) Fastest: run the Gateway in the foreground (so we can see why it fails)
In a terminal on that Mac:
openclaw gateway --port 18789 --verbose
Leave it running. In a second terminal:
openclaw gateway status
openclaw channels status --probe
If it crashes immediately, paste the terminal output from the openclaw gateway ⌠--verbose window.
Docs: https://docs.openclaw.ai/gateway (see â5-minute local startupâ)
2) If you want the macOS background service (LaunchAgent)
Run:
openclaw gateway install
openclaw gateway restart
openclaw gateway status --deep
If it still says ânot loadedâ (or âstoppedâ), paste openclaw gateway status --deep output.
3) After the Gateway is running: one extra âsilent agentâ gotcha in your status
You also have:
Agents ⌠1 bootstrap file present
If the Gateway is up but the agent stays weird/non-responsive, thatâs often a leftover BOOTSTRAP.md keeping it in first-run mode. Donât delete anything yetâjust tell me what openclaw logs --follow shows right after startup.
Reply with:
- the output of
openclaw gateway status --deep(after you try #1 or #2), and - the first ~50 lines from
openclaw logs --followright after starting the gateway.
Should I run the gateway now? Or enable sandbox mode? What does sandbox mode do? REMEMBER: I'm on mac m4 with 16GB RAM.
Run the gateway now. Your status output shows:
unreachable (connect failed: connect ECONNREFUSED 127.0.0.1:18789)LaunchAgent installed ¡ not loaded
So the agent is ânon-responsiveâ because the Gateway process isnât running / nothing is listening on that port. Sandboxing wonât help until the Gateway is actually up.
1) Get responsive first (start the Gateway)
Fastest debug path (foreground, so you see errors):
openclaw gateway --port 18789 --verbose
In a second terminal:
openclaw gateway status
openclaw channels status --probe
If you prefer the background LaunchAgent:
openclaw gateway install
openclaw gateway restart
openclaw gateway status --deep
Once gateway status shows Runtime: running and RPC probe: ok, Discord replies should start working again (assuming the Discord channel is connected).
What sandbox mode does (and why the audit complained)
Sandboxing in OpenClaw means: tools run inside Docker containers (reduced blast radius). The Gateway stays on the host, but âdangerousâ stuff like exec, file ops (read/write/edit), and optionally the browser can be isolated.
Docs: https://docs.openclaw.ai/gateway/sandboxing
Key bits:
agents.defaults.sandbox.mode:"off": everything runs on the host"non-main": only non-main sessions are sandboxed (Discord guild channels/threads are typically ânon-mainâ)"all": every session is sandboxed
- By default, sandbox containers often run with no network egress, and with a separate sandbox workspace (so the model canât see your real files unless you allow it).
About your Mac M4 (16GB RAM)
- Your current model (
llama3.1:8bâŚ) already eats a meaningful chunk of RAM. - Sandboxing adds overhead (Docker + containers). Itâs usually fine for basic tool isolation, but sandboxed browser is heavier.
- If youâre memory-tight, start with
mode: "non-main"+ keep scope conservative.
Why the audit says âsmall models require sandboxing + disable web toolsâ
Small local models are more likely to:
- hallucinate tool calls / ignore safety instructions
- get prompt-injected into doing dumb things
So the audit is basically saying: if you insist on 8B + web/browser tools, put it in a container (or disable those tools).
But again: step one is still start the gateway.
If you paste the output of openclaw gateway status --deep after starting/restarting, I can tell you immediately whether itâs running and what it thinks its service state is.
FYI I have no intent on setting up a discord channel, don't assume that I do.
Here is some output:
my-MacBook-Air workspace % openclaw gateway status --deep
đŚ OpenClaw 2026.2.26 (***) â iMessage green bubble energy, but for everyone.
â
â
Service: LaunchAgent (not loaded)
File logs: /tmp/openclaw/openclaw-2026-02-27.log
Command: /Users//.nvm/versions/node/v24.8.0/bin/node /Users//.nvm/versions/node/v24.8.0/lib/node_modules/openclaw/dist/index.js gateway --port 18789
Service file: ~/Library/LaunchAgents/ai.openclaw.gateway.plist
Service env: OPENCLAW_GATEWAY_PORT=***
Service config looks out of date or non-standard.
Service config issue: Gateway service uses Node from a version manager; it can break after upgrades. (/Users/***/.nvm/versions/node/v24.8.0/bin/node)
Service config issue: System Node 22+ not found; install it before migrating away from version managers.
Recommendation: run "openclaw doctor" (or "openclaw doctor --repair").
Config (cli): ~/.openclaw/openclaw.json
Config (service): ~/.openclaw/openclaw.json
Gateway: bind=loopback (127.0.0.1), port=*** (service args)
Probe target: ws://127.0.0.1:18789
Dashboard: http://127.0.0.1:18789/
Probe note: Loopback-only gateway; only local clients can connect.
Runtime: unknown (Bad request.
Could not find service "ai.openclaw.gateway" in domain for user gui: 501)
RPC probe: ok
Service unit not found.
Service not installed. Run: openclaw gateway install
File logs: /tmp/openclaw/openclaw-2026-02-27.log
Listening: 127.0.0.1:18789
Troubles: run openclaw status
Troubleshooting: https://docs.openclaw.ai/troubleshooting
Yep â and also: your Gateway is already running right now.
From your gateway status --deep:
RPC probe: okListening: 127.0.0.1:18789
So the process is up and reachable. The problem is the background service side:
Service: LaunchAgent (not loaded)Could not find service "ai.openclaw.gateway"...- and itâs warning that the LaunchAgent points at an nvm Node path, plus âSystem Node 22+ not foundâ
That combo usually means: it works while you happen to have the gateway running in a terminal, then âgoes deadâ after reboot / logout / upgrades because launchd isnât actually running it.
What to do next (recommended order)
1) Make the Gateway reliably stay up (fix LaunchAgent)
Do what your output suggests:
openclaw doctor --repair
openclaw gateway install --force
openclaw gateway restart
openclaw gateway status --deep
But before doctor --repair can migrate you away from nvm, you need a system Node 22+ installed (not via nvm). Your status literally says it canât find one. Once Node 22+ exists, doctor --repair can rewrite the LaunchAgent to use it.
If after this gateway status --deep still shows ânot loadedâ, paste that output again + the file log it points to (/tmp/openclaw/openclaw-2026-02-27.log).
2) If the agent still âdoes nothingâ: check bootstrapping
Earlier you had 1 bootstrap file present. If ~/.openclaw/workspace/BOOTSTRAP.md is still there, the agent may be stuck in first-run bootstrapping (itâs supposed to remove that file when complete). Docs: https://docs.openclaw.ai/start/bootstrapping
Run:
openclaw status
openclaw logs --follow
âŚand paste what you see when you send a message in whatever interface youâre using (Control UI chat, Telegram, etc.). No Discord assumptions needed.
Sandboxing: what it does + should you enable it?
Sandboxing = run tools inside Docker containers to reduce blast radius (file access, exec, etc.). Gateway itself still runs on the host. Docs: https://docs.openclaw.ai/gateway/sandboxing
- Itâs primarily a security boundary for tool execution, not a âmake it faster / more responsiveâ switch.
- On an M4 with 16GB, Docker sandboxes add overhead. Also, if youâre using a local Ollama model, sandbox defaults (often no network) can complicate connectivity unless configured.
My recommendation for you right now: donât touch sandboxing until the Gateway service is stable and youâve confirmed the agent responds in the Control UI. Then decide whether you want sandboxing for safety (especially if you enable browser/web tools).
If you tell me where youâre chatting with the agent (Control UI at http://127.0.0.1:18789/ vs Telegram vs something else), I can give the exact next command to verify that channel/path.
When i run openclaw gateway --verbose and open logging, i see like a "hello-ok" message of sorts being logged every second or few seconds.
When i omit --verbose, i see somewhat still logs.
Thatâs actually a good sign.
What the repeated hello-ok means
hello-ok is the Gatewayâs âhandshake succeededâ response on the WebSocket protocol. Youâll see it whenever something connects to the gateway (Control UI, openclaw status, openclaw logs --follow, etc.).
If youâre seeing it every second / every few seconds, it usually means some client is repeatedly reconnecting (not the agent âthinkingâ):
Common causes:
- You have the Control UI (
http://127.0.0.1:18789/) open in a tab and itâs reconnecting. - Youâre running something like
watch openclaw statusor repeatedly invoking status/log commands. - A client is failing to stay connected (auth mismatch / device pairing / stale tab state) and keeps retrying.
Quick test: close any Control UI tabs and stop any watch loops, then see if the hello-ok spam stops.
Why you still see logs without --verbose
Normal. --verbose just logs more (often including raw protocol frames). Even without it, the gateway writes regular operational logs.
The key question for ânon-responsiveâ
How are you talking to the agent?
-
If youâre using Control UI: open
http://127.0.0.1:18789/and send âhiâ. Then in another terminal run:openclaw logs --followPaste the log lines that appear right when you send the message. Weâre looking for an actual
chat.send/ agent-run entry (and any error about the model). -
If youâre using some channel (Telegram/iMessage/etc.): run:
openclaw channels status --probeand paste it. That tells us whether the gateway is actually connected to the channel that should deliver your message.
If you paste ~30 lines of logs around one âI sent a message but got no replyâ attempt, we can tell whether itâs (a) message never reaching the gateway, or (b) model/tool runtime stalling (common with local Ollama).
This is what is see when i follow logs:
23:43:40 [ws] â open remoteAddr=127.0.0.1 conn=d5f7fc10âŚ743d
23:43:40 [ws] â connect client=cli version=dev mode=cli clientId=cli platform=darwin auth=token
23:43:40 [ws] â hello-ok methods=92 events=19 presence=1 stateVersion=1
23:43:40 [ws] â res â logs.tail 1ms id=380a1c38âŚ9ccd
23:43:40 [ws] â close code=1005 durationMs=24 handshake=connected lastFrameType=req lastFrameMethod=logs.tail lastFrameId=380a1c38-15a7-4504-8c7f-2199fa319ccd
23:43:41 [ws] â open remoteAddr=127.0.0.1 conn=2bba1136âŚ9064
23:43:41 [ws] â connect client=cli version=dev mode=cli clientId=cli platform=darwin auth=token
23:43:41 [ws] â hello-ok methods=92 events=19 presence=1 stateVersion=1
23:43:41 [ws] â res â logs.tail 1ms id=29243eb8âŚc24d
23:43:41 [ws] â close code=1005 durationMs=31 handshake=connected lastFrameType=req lastFrameMethod=logs.tail lastFrameId=29243eb8-66f7-44a9-8dbc-c2039a16c24d
^C23:43:41 [gateway] signal SIGINT received
23:43:41 [gateway] received SIGINT during shutdown; ignoring
23:43:41 [gateway] signal SIGINT received
23:43:41 [gateway] received SIGINT during shutdown; ignoring
23:43:42 [ws] â open remoteAddr=127.0.0.1 conn=9fcfce67âŚ7ca9
23:43:42 [ws] â connect client=cli version=dev mode=cli clientId=cli platform=darwin auth=token
23:43:42 [ws] â hello-ok methods=92 events=19 presence=1 stateVersion=1
23:43:42 [ws] â res â logs.tail 0ms id=b2acebfaâŚ1a32
23:43:42 [ws] â close code=1005 durationMs=28 handshake=connected lastFrameType=req lastFrameMethod=logs.tail lastFrameId=b2acebfa-51ad-4886-903f-308b724d1a32
23:43:43 [ws] â open remoteAddr=127.0.0.1 conn=96f7dd0câŚ503e
23:43:43 [ws] â connect client=cli version=dev mode=cli clientId=cli platform=darwin auth=token
23:43:43 [ws] â hello-ok methods=92 events=19 presence=1 stateVersion=1
23:43:43 [ws] â res â logs.tail 3ms id=1252ffceâŚc800
23:43:43 [ws] â close code=1005 durationMs=47 handsha
Those log lines are just your CLI log follower doing exactly what itâs supposed to do.
What youâre seeing:
connect client=cli âŚres â logs.tail âŚ- then the WS closes
- then it repeats a second later
So the âhello-ok every secondâ is not the agent looping â itâs openclaw logs --follow repeatedly opening a short WS connection, calling logs.tail, and disconnecting. (The code=1005 close is typical for âclient closed without a close frameâ, not necessarily an error.)
What we still donât have
We still donât have any logs from an actual chat message being sent to the agent (no chat.send, no agent run, no model call/error). Thatâs why it feels like weâre stuck.
Next step: capture logs for a real message
Where are you trying to talk to the agent from?
If youâre using the Control UI (browser)
- Open:
http://127.0.0.1:18789/(same Mac as the gateway) - Start:
openclaw logs --follow - In the UI, send:
hi - Paste the new log lines that appear right when you hit send
In healthy flow you should see something that clearly corresponds to a chat send / agent run. If itâs failing, those lines will usually show the exact reason (bootstrapping, model provider error, Ollama connection refused, etc.).
If youâre using some other channel (not Discord)
Run and paste:
openclaw channels status --probe
Two common âlooks non-responsiveâ causes to check
- Bootstrapping still active
Earlier your status showed âbootstrap file presentâ. If bootstrapping hasnât completed, the agent may be waiting for that first-run Q&A. In the Control UI youâd typically see the bootstrap prompts.
Docs: https://docs.openclaw.ai/start/bootstrapping
- Ollama not reachable / model stuck (since youâre on
ollama/llama3.1:8bâŚ)
Quick check from the gateway host:
curl http://localhost:11434/api/tags
Docs: https://docs.openclaw.ai/providers/ollama
Reply with:
- where youâre chatting (Control UI vs other channel), and
- the log snippet that appears when you send one message.
OK I've done the following actions, and see the following output.
action: openclaw gateway stop
action: rm all sessions
action: openclaw gateway --verbose
result: a previously submitted telegram "/new" chat that was just hanging appears to have been handled, telegram reports new session started
action: sent telegram chat "reply exactly this: WWRRTY"
result: the terminal where openclaw gateway --verbose was run shows "23:46:36 [agent/embedded] embedded run agent start: runId=e6be0042-3449-4952-b6dd-a067243f812d"
result: .... nothing else, im just waiting now
in addition to my last chat ->
action: openclaw sessions
result: I see:
direct agent:main:main 4m ago llama3.1:8b-instruct-q4_K_M unknown/200k (?%) system id:-6d3--a20-***
direct telegram:slash:*** 4m ago llama3.1:8b-instruct-q4_K_M unknown/200k (?%) id:-ea35--afe-**
That line means the message made it through Telegram â Gateway â agent runtime, and the agent run got accepted:
[agent/embedded] embedded run agent start: runId=âŚ
If nothing else prints after that, the run is almost always stuck in one of these places:
- Model inference never starts / Ollama is hung/unreachable
- Model is allocating something huge and thrashing (your earlier status showed 128k ctx on an 8B model; on 16GB unified RAM that can get ugly)
- Less common: outbound delivery back to Telegram is failing (but then youâd usually still see model output in logs)
Quick âis Ollama actually doing anything?â check (while itâs hanging)
In a second terminal, run:
ollama ps
curl -s http://localhost:11434/api/tags | head
- If
curl âŚ/api/tagshangs or errors â fix Ollama first (ollama serve, restart Ollama app, etc.). - If
ollama psshows your model running and memory climbing, itâs probably loading/allocating.
Also do a direct sanity test (bypasses OpenClaw entirely):
ollama run llama3.1:8b-instruct-q4_K_M "reply exactly this: WWRRTY"
If that hangs too â itâs not OpenClaw; itâs Ollama/model/memory pressure.
Get the real reason from logs (search by runId)
OpenClaw also writes file logs (JSONL) under /tmp/openclaw/openclaw-YYYY-MM-DD.log (your gateway status --deep prints the exact file).
Run:
grep -n "e6be0042-3449-4952-b6dd-a067243f812d" /tmp/openclaw/openclaw-2026-02-27.log | tail -n 50
(Adjust the filename date if needed.) Paste what that shows.
If logs are too quiet: restart once with debug log level
--verbose mainly increases console / WS chatter, not file log detail. For a one-off debug run:
openclaw gateway stop
openclaw --log-level debug gateway --verbose
Then send the Telegram message again and paste the next ~50 lines after the âembedded run agent startâ.
One high-probability fix based on your hardware
If you really are running an 8B model with 128k context, try temporarily capping the context window (even 8k/16k) â that can be the difference between ârespondsâ and âappears frozenâ on 16GB. If you paste:
openclaw config get agents.defaults.model
openclaw config get models.providers.ollama
(redact any secrets), I can tell you the cleanest way to cap it in your config.
First: tell me what ollama ps shows while the run is âstuckâ, and whether ollama run ⌠"WWRRTY" responds.
Got it â ignoring Discord. What youâre seeing is consistent with âTelegram message arrives, run starts, then the model call stallsâ.
About the openclaw sessions output you posted
direct agent:main:main âŚis your normal âmainâ session.direct telegram:slash:âŚis from the Telegram/newslash-command flow (it gets its own session key).unknown/200k (?%)usually means:- token accounting is unknown for that provider, and
- the gateway thinks the modelâs context window is ~200k.
On a Mac with 16GB RAM, an 8B model + huge context can absolutely look like a âhangâ (KV cache / allocation / swap-thrashing territory), even for a trivial prompt.
Do this now (fast triage)
- Abort the stuck run + clear the queue
In Telegram, send:
/stop
This should abort the active run for that session and drop queued followups (docs mention /stop abort semantics: https://docs.openclaw.ai/concepts/session).
- Check if Ollama itself is responsive (bypasses OpenClaw)
On the Mac running the gateway:
ollama ps
ollama run llama3.1:8b-instruct-q4_K_M "reply exactly this: WWRRTY"
- If this also hangs â itâs an Ollama/model/memory issue first.
- If this replies instantly â OpenClaw is likely sending something that makes Ollama choke (often context too large).
- Show me your effective model config (redact secrets if any)
Paste:
openclaw config get agents.defaults.model
openclaw config get models.providers.ollama
Likely fix (if context really is ~200k)
Cap the context window to something sane for 16GB (start with 8k or 16k) by switching from auto-discovery to an explicit Ollama model entry where you set contextWindow (documented here: https://docs.openclaw.ai/providers/ollama). I donât want to tell you exactly what to write until I see your current config, because it depends on whether youâre using implicit discovery vs explicit models.providers.ollama.
If you run /stop + ollama run ⌠and tell me whether it returns, we can pin this down in one step.
This hangs:
ollama run llama3.1:8b-instruct-q4_K_M "reply exactly this: WWRRTY"
Thatâs the smoking gun: if ollama run ⌠hangs, then OpenClaw isnât the bottleneck â the model backend is.
At this point the âagent run startedâ log from OpenClaw just means: Telegram delivered your message and OpenClaw is waiting on Ollama forever.
1) Check whether the Ollama server is responsive at all
In a terminal:
curl -sS http://localhost:11434/api/version
curl -sS http://localhost:11434/api/tags | head
ollama ps
ollama --version
- If the
curlcommands hang too â your Ollama server is wedged (or not actually running correctly).
2) Most common fix on 16GB: force a smaller context window
Huge context sizes can cause massive KV-cache allocations and âit just sits thereâ behavior.
Ollamaâs own docs: default context is 4096, and you can override with OLLAMA_CONTEXT_LENGTH (source: https://docs.ollama.com/faq).
Try this (important: quit the Ollama macOS app first if itâs running, so youâre not fighting another server):
OLLAMA_CONTEXT_LENGTH=4096 ollama serve
Leave that running, then in a second terminal:
ollama run llama3.1:8b-instruct-q4_K_M "reply exactly this: WWRRTY"
If that suddenly works, weâve confirmed it was a context/memory-pressure issue. Then weâll make OpenClaw stop requesting absurd context sizes too (by explicitly defining the Ollama model in OpenClaw with a smaller contextWindow).
3) If it still hangs even with a small context
Next suspects are âOllama is broken/outdatedâ or âmodel blob is corruptedâ. Quick checks:
ollama show llama3.1:8b-instruct-q4_K_M
If needed, the ânuclear but effectiveâ fix is re-pulling the model (will re-download; do only if youâre ok with that):
ollama rm llama3.1:8b-instruct-q4_K_M
ollama pull llama3.1:8b-instruct-q4_K_M
Before you do the rm/pull: paste the outputs from section (1), especially whether /api/version responds. That will tell us if weâre dealing with âserver wedgedâ vs âmodel-specific hang.â
I stopped gateway, deleted sessions, and reran "ollama run llama3.1:8b-instruct-q4_K_M "reply exactly this: WWRRTY", and it responded within 3 seconds
I didnt change anything else other than killing the running ollama process prior
Nice â that basically proves the ânon-responsive agentâ was Ollama wedging, not Telegram or the OpenClaw gateway.
What happened in your earlier run:
- OpenClaw received the Telegram message and started a run (
embedded run agent startâŚ) - Then it blocked waiting for Ollama to return tokens
- Your direct
ollama run âŚhanging confirms Ollama itself was stuck (so OpenClaw couldnât possibly reply)
Killing the Ollama process cleared whatever deadlock / bad allocation state it was in.
How to keep it from happening again (16GB Mac reality)
The big red flag earlier was your sessions showing something like ~200k context. Llama 3.1 supports very large context, but on 16GB unified RAM, trying to operate anywhere near that can cause âhangsâ (KV-cache allocation / memory pressure / thrash).
Option A (fastest): cap Ollamaâs default context length
Ollama supports setting a default context window via OLLAMA_CONTEXT_LENGTH (Ollama FAQ). Practical values for your machine: 4096 or 8192.
Example when launching the server manually:
OLLAMA_CONTEXT_LENGTH=8192 ollama serve
If you run Ollama as the macOS app, youâll need to set env vars via launchctl setenv ⌠and restart the app (per Ollama docs).
Option B (OpenClaw-side): explicitly define the Ollama model with a smaller contextWindow
In OpenClaw, the clean way is to stop relying on âwhatever context length Ollama reportsâ and define the model explicitly with a smaller contextWindow (see OpenClaw Ollama provider docs):
https://docs.openclaw.ai/providers/ollama
If you paste these two outputs, Iâll tell you exactly what to change in your config:
openclaw config get agents.defaults.model
openclaw config get models.providers.ollama
Quick health check loop (when it happens again)
- If the agent âstarts a runâ but never replies: immediately try
ollama run âŚin another terminal.- If that hangs: restart Ollama (itâs the bottleneck).
- If that works: then we look at OpenClaw config/routing.
If you want, next message paste what your openclaw models list reports for that llama3.1 modelâs context window â I suspect itâs advertising something huge and OpenClaw is taking it literally.
I'll give you more info before I edit a single thing.
openclaw config get agents.defaults.model
openclaw config get models.providers.ollama
{
"primary": "ollama/llama3.1:8b-instruct-q4_K_M",
"fallbacks": [
"ollama/llama3.1:8b-instruct-q5_K_M"
]
}
Config path not found: models.providers.ollama
Perfect â this tells us a lot.
Why models.providers.ollama is ânot foundâ
That means youâre not using explicit Ollama provider config in ~/.openclaw/openclaw.json. OpenClaw is using implicit Ollama discovery (it talks to whatever is on http://127.0.0.1:11434 and discovers models automatically). Thatâs a valid setup, it just means thereâs nowhere in OpenClaw config to cap that modelâs context/window unless you switch to explicit config later.
Docs (implicit discovery + how it works): https://docs.openclaw.ai/providers/ollama
What weâve learned from your testing
- When things âhangâ,
ollama run âŚalso hangs. - Killing the Ollama process makes
ollama run âŚrespond again.
â So the real failure mode is: Ollama server/model gets wedged (OpenClaw is then just stuck waiting).
Before editing anything: 3 âno changesâ checks that usually explain the wedge
Run these when itâs working, and again right when it wedges:
ollama ps
curl -sS http://localhost:11434/api/version
curl -sS http://localhost:11434/api/tags | head
If /api/version or /api/tags hang when wedged, itâs the Ollama daemon itself (not just one model call).
Also check whether you accidentally have a huge default context set in your environment:
launchctl getenv OLLAMA_CONTEXT_LENGTH
(If this prints something enormous, thatâs a prime suspect on a 16GB machine.)
The likely underlying cause on a 16GB Mac
Even though llama3.1 supports very large context, very large context/KV-cache allocations can push Ollama into âlooks frozenâ territory (especially if some client requested a huge num_ctx at some point and it reloads/allocates aggressively).
To confirm what OpenClaw thinks the context is (still no edits), paste the line for your model from:
openclaw models list
If it shows a gigantic context window (100k+), that matches what you saw earlier (unknown/200k).
When youâre ready to prevent it (two options)
Not asking you to do this yetâjust outlining:
- Ollama-side cap (simplest): set
OLLAMA_CONTEXT_LENGTHto 4096 or 8192 and restart Ollama. (Ollama docs: https://docs.ollama.com/faq) - OpenClaw-side cap (most deterministic): add explicit
models.providers.ollama+ an explicit model entry with a smallercontextWindow(8k/16k). This disables auto-discovery but gives you control.
For now, paste:
openclaw models listoutput line(s) forllama3.1:8b-instruct-q4_K_Mlaunchctl getenv OLLAMA_CONTEXT_LENGTH(if set)
âŚand weâll pin down whether youâre accidentally running with a huge default context.
ollama/llama3.1:8b-instruct-q4_K_M text 125k yes yes default
ollama/llama3.1:8b-instruct-q5_K_M - - - - fallback#1,missing
launchctl getenv OLLAMA_CONTEXT_LENGTH prints nothing
That explains the whole pattern.
- OpenClaw is auto-discovering your Ollama model as having a 125k context window:
ollama/llama3.1:8b-instruct-q4_K_M ⌠125k ⌠- On a 16GB Mac, if anything tries to actually use (or allocate for) a huge context/KV cache, Ollama can end up looking âhungâ. Killing/restarting Ollama clears it â exactly what you observed.
Also: your fallback is currently a no-op:
ollama/llama3.1:8b-instruct-q5_K_M ⌠fallback#1,missing
So if the primary wedges, OpenClaw canât fail over to that fallback because it isnât available (either not pulled, or not being discovered as tool-capable).
What Iâd do (minimal, deterministic fix)
Because youâre using implicit discovery (models.providers.ollama is not set), OpenClaw will keep believing the model is 125k. To cap it reliably inside OpenClaw, you need to switch to explicit Ollama provider config and set a sane contextWindow (8k or 4k).
This is straight from the Ollama provider docs (explicit setup):
https://docs.openclaw.ai/providers/ollama
Step 1 â make sure the fallback model exists in Ollama
Run:
ollama list | rg "llama3.1:8b-instruct"
If ...q5_K_M isnât listed, pull it:
ollama pull llama3.1:8b-instruct-q5_K_M
Step 2 â add explicit provider config (caps context)
Edit ~/.openclaw/openclaw.json and add:
{
models: {
providers: {
ollama: {
apiKey: "ollama-local",
baseUrl: "http://127.0.0.1:11434", // no /v1
api: "ollama",
models: [
{
id: "llama3.1:8b-instruct-q4_K_M",
name: "Llama 3.1 8B Instruct Q4",
reasoning: false,
input: ["text"],
cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
contextWindow: 8192,
maxTokens: 8192 * 10,
},
{
id: "llama3.1:8b-instruct-q5_K_M",
name: "Llama 3.1 8B Instruct Q5",
reasoning: false,
input: ["text"],
cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
contextWindow: 8192,
maxTokens: 8192 * 10,
},
],
},
},
},
}
Then restart the gateway:
openclaw gateway restart
Step 3 â confirm it worked
After restart:
openclaw models list | rg "llama3.1:8b-instruct"
You want to see contextWindow show ~8k (not 125k), and the fallback no longer âmissingâ.
Why Iâm steering you away from only setting OLLAMA_CONTEXT_LENGTH
You can set OLLAMA_CONTEXT_LENGTH (and itâs great as an extra safety belt), but because OpenClaw is currently auto-discovering the model as 125k, the safer fix is to make OpenClawâs model catalog say âtreat this as 8kâ. Thatâs what prevents OpenClaw from ever trying to push Ollama into the danger zone again.
If you want, paste your full ~/.openclaw/openclaw.json (redact tokens) and Iâll point out exactly where to insert the block so it stays valid JSON5.
- I don't feel like setting a fallback right now, so I'm not pulling it. Is it ok if I ignore fallback for now?
- when you tell me to edit my openclaw.json, be sure to put "quotations" around every field so that its valid json
-
Yes â itâs totally fine to ignore fallbacks for now. The only downside is obvious: if the primary model (or Ollama) wedges, OpenClaw has nothing to fail over to. Also, since your config currently lists a missing fallback, youâll keep seeing âfallback#1,missingâ in
openclaw models list, but it wonât break anything by itself. -
Yep. Even though OpenClaw config supports JSON5 (comments, trailing commas), it also accepts strict JSON. Hereâs a minimal, strict JSON block you can add to
~/.openclaw/openclaw.jsonto force that model to an 8k context window.
Important note: as soon as you add models.providers.ollama, OpenClaw will stop implicit auto-discovery and will use only the models you list here.
Minimal strict JSON snippet (single model, 8k context)
Add/merge this at the top level of your config:
{
"models": {
"providers": {
"ollama": {
"apiKey": "ollama-local",
"baseUrl": "http://127.0.0.1:11434",
"api": "ollama",
"models": [
{
"id": "llama3.1:8b-instruct-q4_K_M",
"name": "Llama 3.1 8B Instruct Q4",
"reasoning": false,
"input": ["text"],
"cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
"contextWindow": 8192,
"maxTokens": 81920
}
]
}
}
}
}
Then restart the gateway:
openclaw gateway restart
And confirm:
openclaw models list | rg "llama3.1:8b-instruct-q4_K_M"
You should see the context window reflect your cap (8k), not 125k.
If you paste your current ~/.openclaw/openclaw.json (redact tokens), I can tell you the exact safe merge spot so you donât accidentally break the file structure.
When you send me json block, you're dividing up your chat in discord in a weird way that prevents me from copy and pasting it. Can you fix that formatting so I can copy and paste the entire thing?
Lastly, if i set this in openclaw, do i still need to edit "something" on the ollama serve / model side of things or no? (if not then i dont want to. but if I need to do so, let me know)
Paste this as one single block (itâs strict JSON, with quotes on every key):
{"models":{"providers":{"ollama":{"apiKey":"ollama-local","baseUrl":"http://127.0.0.1:11434","api":"ollama","models":[{"id":"llama3.1:8b-instruct-q4_K_M","name":"Llama 3.1 8B Instruct Q4","reasoning":false,"input":["text"],"cost":{"input":0,"output":0,"cacheRead":0,"cacheWrite":0},"contextWindow":8192,"maxTokens":81920}]}}}}
What to do with it:
- Add/merge that under the top-level object in
~/.openclaw/openclaw.json(donât create a second top-level object; it must be merged with your existing config). - Restart:
openclaw gateway restart
- Verify:
openclaw models list | rg "llama3.1:8b-instruct-q4_K_M"
Do you still need to change anything in Ollama?
Not strictly, no.
If you add this OpenClaw config, OpenClaw will stop auto-discovering â125k ctxâ and will treat the model as 8k ctx when it sends requests to Ollama. For most people, that alone stops the ârandom hangsâ on 16GB machines.
That said, there are still two cases where youâd touch Ollama:
- If Ollama sometimes wedges for reasons unrelated to context (bugs, model corruption) â youâd still need the occasional restart.
- If other clients (not OpenClaw) are hitting your Ollama server and requesting huge
num_ctxâ settingOLLAMA_CONTEXT_LENGTHon the Ollama server is an extra âbelt and suspendersâ cap. But if youâre only using Ollama via OpenClaw, you can skip it.
If you want, paste your current ~/.openclaw/openclaw.json (redact secrets) and Iâll show you the exact merged version so you donât have to do any manual JSON surgery.
Thanks for helping me constrain my context window to a smaller size to avoid my ollama model backend from causing my whole agent to hang.
Now, consider the following information, and tell me exactly how I should edit the text content in my openclaw workspace files to handle the overall behavior of my openclaw agent.
I will paste my information in multiple chats. Please parse each one and append all together to get the full input. I will chat "DONE!!!" when I'm done chatting. Understand?
Understood. Send your info in as many messages as you want, and Iâll wait to propose any edits until you say DONE!!!.
To make it easy to give you âexact editsâ, please include (redact any secrets/tokens):
- Which channels you use (sounds like Telegram) and whether you also use the Control UI.
- Your goals for agent behavior (tone, verbosity, safety boundaries, tool usage, memory usage, coding style, ânever do Xâ, etc.).
- The current contents of your workspace files (or tell me which ones you actually have), ideally pasted with a header per file:
AGENTS.mdSOUL.mdUSER.mdMEMORY.md(if you have one)BOOTSTRAP.md(only if it still exists)
Format suggestion (so I can produce clean copy/paste diffs): paste each file in a single code block with the filename on the line above it.
When youâre done, send DONE!!! and Iâll reply with the precise text to add/remove (and where) for each file.
- Telegram
- I'll give my one top-level objective.
- I have all of the required openclaw workspace files as specified by the official openclaw documentation (consult this to understand)
I will now start chatting my information blocks:
Cool â keep going.
For each âinformation blockâ, please paste it as either:
- plain text (short), or
- a single code block (better for longer chunks / file contents)
Since you said you have all the standard workspace files, the most useful next blocks are the current contents of these files (one per message is fine):
AGENTS.mdSOUL.mdUSER.mdMEMORY.mdBOOTSTRAP.md(only if it still exists)
Also paste your one top-level objective whenever youâre ready.
Iâll just collect everything until you send DONE!!!, then Iâll reply with the exact edits (what to add/remove/replace, and where).
Before I start pasting my information blocks please note: I also have USER.md and IDENTITY.md and HEARTBEAT.md
I will start pasting my information blocks next, please stop interjecting with further questions so i may continue.
Understood â go ahead and paste your blocks. I wonât ask anything else until you send DONE!!!.
Context Window Constraints â First-Principles Analysis
OpenClaw Agent on Mac M4 (16 GB) using Ollama llama3.1:8b-instruct-q4_K_M
This document describes hard constraints and rational design implications for an OpenClaw agent, using a first-principles analysis aligned with the Stanford SymSys 170 rationality framework.
0. Fixed Context (No Assumptions)
Hardware
- Apple Silicon M4
- 16 GB unified memory (RAM + VRAM shared)
- macOS
- No discrete GPU (Metal-backed unified memory)
Model
- Ollama local inference
- Model: llama3.1:8b-instruct-q4_K_M
- Quantization: Q4_K_M
- Execution: CPU + Apple Neural Engine where applicable
These are hard constraints and define the agentâs rational operating environment.
1. What the Context Window Is (First Principles)
A context window is the maximum number of tokens the model can attend to simultaneously during inference.
Memory consumption consists of:
- Model weights (static, fixed cost)
- KV cache (dynamic, scales linearly with token count)
- Runtime overhead (buffers, allocator, OS)
Only the KV cache grows with context length.
2. Memory Budget Breakdown
2.1 Model Weights
- LLaMA 3.1 8B parameters
- Q4_K_M quantization
- Approx. resident memory: 4.5â5.0 GB
This cost is fixed.
2.2 KV Cache Cost
Empirical scaling for LLaMA-3-class 8B models on Apple Silicon:
- Approx. 1.0â1.2 MB per 1,000 tokens
This is the dominant scaling factor for context length.
2.3 Usable Memory Ceiling
Total system memory: 16 GB
Practical available memory after overhead:
- macOS + background processes: ~3â4 GB
- Ollama runtime & buffers: ~1 GB
- Safety margin (avoid swap / throttling): ~1â2 GB
Practical memory available for model + KV cache:
â 9â10 GB
Subtract model weights:
- 10 GB â 5 GB â 5 GB available for KV cache
3. Hard Upper Bound on Context Length
Using ~1.1 MB per 1k tokens:
- 5,000 MB / 1.1 â 4,500 tokens
Hard Realistic Maximum
~4,000â4,500 tokens
Beyond this:
- macOS memory pressure rises sharply
- Swap is triggered
- Inference latency becomes unstable
- Agent behavior degrades due to constraint violation
4. Rational Operating Envelope
A rational agent does not operate at the hard limit.
Recommended Steady-State Context
- 2,000â3,000 tokens
This preserves:
- Headroom for tool output
- Temporary planning expansions
- Stable latency
- Safe memory updates
This range defines the normatively rational operating region.
5. Implications for OpenClaw Agent Design
5.1 Context Cannot Be Monolithic
The agent cannot safely:
- Load the entire workspace into context
- Load full memory history
- Load full planning traces
Doing so violates constraints and produces irrational behavior.
6. Rational Decomposition of Context
Because rationality is evaluated relative to constraints, the agent must externalize state.
Context Partitioning
| Layer | Location | Token Cost |
|---|---|---|
| Immediate task context | Prompt | High |
| Working memory | Prompt | Medium |
| Long-term memory | Files | Zero (until selectively loaded) |
| Action history | Logs | Zero |
| Objective definition | Prompt (short) | Low |
Only selectively relevant evidence should be rehydrated into context.
7. Memory as Evidence (Not Transcript)
Within the rationality framework:
- Memory â conversation history
- Memory = evidence relevant to future action evaluation
Good Memory Entry
Objective: bootstrap OpenClaw agent
Constraint discovered: llama3.1 q4 context unstable >4k tokens
Outcome: memory externalization required
Bad Memory Entry
User said X, assistant replied Y, then user said Z...
Transcript-style memory bloats context and degrades rational performance.
8. Planning Under Tight Context Constraints
The model cannot perform deep, long-horizon planning entirely in-context.
Planning must be:
- Hierarchical â high-level plans stored in files, not prompts
- Incremental â plan â act â observe â write memory â repeat
This aligns with descriptive rationality over time through evidence accumulation.
9. Objective Tracking Structure
At any moment, context should contain only:
- Objective (stable, concise)
- Current subgoal (1â2 lines)
- Next action only
Avoid:
- Full roadmaps
- Full dependency graphs
- Full retrospectives
Those belong in workspace files, not in the context window.
10. Summary
Under the fixed constraints of:
- Mac M4
- 16 GB unified memory
- Ollama llama3.1:8b-instruct-q4_K_M
The rational conclusions are:
- Hard max context: ~4kâ4.5k tokens
- Rational operating range: 2kâ3k tokens
- Memory must be externalized
- Planning must be incremental
- Workspace files are required for rationality, not optional
Attempting a monolithic, always-in-context agent will be objectively irrational under these constraints.
DONE!!!
Here are copy/paste-ready edits you can make to your workspace files so your agent behaves rationally under your Mac M4 / 16GB / Ollama llama3.1:8b Q4 constraints (small, stable in-context footprint; memory as evidence; incremental planning).
Iâm assuming you want behavioral control via injected workspace text, not more config changes.
1) AGENTS.md â add this section at the very bottom
## Operating constraints (Mac M4 ¡ 16GB ¡ Ollama llama3.1:8b Q4)
### Context budget (HARD)
- Treat the effective âsafeâ working context as **2kâ3k tokens** steady-state.
- Treat **~4k tokens** as a practical hard ceiling for reliability.
- Do NOT paste/rehydrate large transcripts, logs, or file dumps into the chat unless explicitly requested.
### Context discipline (REQUIRED)
- Keep replies **short by default**. Prefer bullets. Prefer ânext actionâ over long explanations.
- Never echo large tool outputs; instead:
1) summarize in 3â7 bullets
2) extract only the small subset needed for the next step
- If more background is needed, ask for *one* missing datum at a time.
### Memory = evidence, not transcript
- Only store **stable constraints, decisions, and reusable facts** in MEMORY.md.
- Do NOT store conversation blow-by-blow.
### Planning style (tight-context)
- Use incremental loop: **Plan â Act â Observe â Write minimal evidence â Repeat**.
- Keep in-chat planning to:
- Objective (1 line)
- Current subgoal (1 line)
- Next action (1 line)
### Failure mode: model/backend stall
If the model/backend appears hung (no tokens for ~30â60s), say so plainly and recommend:
- user sends **/stop** to abort the current run, and/or
- user restarts the Ollama server if `ollama run ...` also hangs.
2) SOUL.md â add this section near the top (after your intro)
## Response style (tight-context mode)
- Default to **concise** answers (bullets, checklists, minimal prose).
- Always optimize for **getting to the next action** fast.
- Do not âthink out loudâ at length; keep reasoning implicit unless the user asks.
- When summarizing: prefer **facts + decision + next step**.
- When uncertain: ask **one** targeted question, not many.
3) USER.md â add this section (top or bottom is fine)
## Environment constraints (important)
- Hardware: Mac M4, **16GB unified memory**
- Local model: Ollama `llama3.1:8b-instruct-q4_K_M` (Q4)
- Reliability constraints:
- Safe steady-state context: **2kâ3k tokens**
- Practical hard ceiling: **~4k tokens**
- Operational preference:
- Keep responses short; avoid dumping long logs/files into chat.
- Use workspace files for long-term plans and evidence.
If you have a âtop-level objectiveâ for the assistant, add a single line like:
## Top-level objective
<WRITE YOUR ONE-LINE OBJECTIVE HERE>
(You didnât include it in the blocks you pasted, so Iâm leaving it as a placeholder.)
4) MEMORY.md â add this as a durable entry (near the top)
## Durable constraints (Mac M4 / 16GB / Ollama 8B Q4)
- Context stability: keep steady-state prompt context small (2kâ3k tokens); avoid exceeding ~4k.
- Memory policy: memory entries are **evidence for future decisions**, not transcript history.
- Planning policy: incremental loop (plan â act â observe â write minimal evidence â repeat).
- Failure mode: if the agent âhangsâ, first test `ollama run ...`; if it hangs too, restart Ollama and retry.
Optional structure (helps keep MEMORY.md clean):
## Evidence log (rules)
- Each entry must be: (a) stable, (b) reusable, (c) short.
- If it wonât matter in 2 weeks, it doesnât belong here.
5) HEARTBEAT.md â replace contents with a tiny checklist
(Heartbeats should be fast and low-token.)
Heartbeat checklist (tight-context)
If anything seems broken:
1) Check Gateway: `openclaw gateway status`
2) Check channel: `openclaw channels status --probe`
3) Check Ollama health: `curl http://localhost:11434/api/version`
4) If model calls hang: verify with `ollama run ...`; if hung, restart Ollama.
If all healthy: do nothing.
Do you need to edit anything else (Ollama side)?
Not required for the workspace-file behavior above.
Those workspace edits will make the agent act like it has a tiny working memory (and stop bloating prompts), which is the main behavioral fix.
If you want an additional safety belt later, you can also cap Ollamaâs default context server-side (OLLAMA_CONTEXT_LENGTH), but thatâs optional and separate from workspace behavior.
So, when you put all of this in openclaw workspace files, are you aware that the openclaw framework puts these into the query that then gets put INTO the query getting set INTO the LLM model?
Yes â Iâm aware, and itâs an important catch.
In OpenClaw, files like AGENTS.md / SOUL.md / USER.md / IDENTITY.md (and often MEMORY.md in DMs) are injected into the model prompt as part of the system/workspace context. So anything you put there costs tokens every run, and on a tight setup like Ollama 8B on 16GB, big workspace files can absolutely contribute to stalls and âhangyâ behavior.
So the right approach is:
- Keep injected files tiny and policy-like (rules + defaults), not essays.
- Put longer material (your âfirst-principles analysisâ) in a non-injected notes file (e.g.
CONSTRAINTS.mdorNOTES/context-budget.md) and only load it on demand (viareadtool / manual copy/paste when needed).
Practical guidelines (works well on small local models)
- AGENTS.md / SOUL.md / USER.md / IDENTITY.md: aim for < 150â300 lines total across all of them, preferably much less.
- MEMORY.md: only durable facts (âevidenceâ), one bullet per fact. No transcripts.
- If you want the agent to remember the logic, store just the conclusion + rule in injected files, and keep the full derivation elsewhere.
Concretely: donât paste your whole analysis into AGENTS/SOUL/USER/MEMORY
Instead, distilled injected content can be as small as:
AGENTS.md (tiny)
- âDefault to concise bullets.â
- âNever paste large logs; summarize + link.â
- âTarget 2â3k tokens; avoid >4k.â
- âIf stalled, recommend /stop + check
ollama run.â
âŚand put the long âwhyâ in CONSTRAINTS.md.
If you paste your current workspace file contents (or even just their sizes / biggest offenders), I can give you a âtightâ rewritten version that preserves the behavior you want while minimizing prompt bloat.
You broke up the formatting of your markdown responses (that i see in this discord chat) in a way that makes it impossible for me to copy and paste everything for the given file. Can you please fix the formatting so I can copy and paste into each file?
Here are single-copy blocks for each file. Each block is one code fence (no nested fences), so you can copy/paste the whole thing into the corresponding file.
These versions are deliberately short because (as you noticed) workspace files get injected into the prompt.
AGENTS.md
# AGENTS.md
## Operating mode: tight-context (Mac M4 ¡ 16GB ¡ Ollama 8B Q4)
### Hard constraints
- Keep steady-state working context ~2kâ3k tokens.
- Avoid > ~4k tokens total context whenever possible.
- Do NOT paste large transcripts, logs, or file dumps into chat unless explicitly requested.
### Default behavior
- Replies: concise by default (bullets/checklists). Optimize for ânext actionâ.
- Outputs: summarize tool/file output in 3â7 bullets; only quote the minimal relevant lines.
- Questions: ask one targeted question at a time when blocked.
### Memory policy (evidence, not transcript)
- Only store durable constraints/decisions/reusable facts.
- Never store blow-by-blow conversation history.
### Planning style (incremental)
- Keep in-chat plan to:
- Objective (1 line)
- Current subgoal (1 line)
- Next action (1 line)
### When things hang
- If no progress for ~30â60s: say it may be a backend stall.
- Suggest: user sends /stop to abort, then verify backend with `ollama run ...`.
SOUL.md
# SOUL.md
## Voice & style (tight-context)
- Be direct and practical.
- Default to short answers; use bullets and checklists.
- Avoid long explanations unless asked.
- Donât echo large payloads; summarize and extract only whatâs needed.
## Behavior under constraints
- Treat context as scarce: keep prompts small.
- Prefer incremental steps over big plans.
- If uncertain, ask one specific question.
USER.md
# USER.md
## Environment
- Hardware: Mac M4, 16GB unified memory
- Local model: Ollama llama3.1:8b-instruct-q4_K_M (Q4)
## Constraints & preferences
- Keep responses concise by default.
- Avoid large context usage; prefer summaries and minimal quoting.
- Memory should store durable âevidenceâ only, not transcripts.
MEMORY.md
# MEMORY.md
## Durable constraints (Mac M4 / 16GB / Ollama 8B Q4)
- Context stability: aim for 2kâ3k tokens steady-state; avoid > ~4k whenever possible.
- Avoid pasting large transcripts/logs into chat; summarize and extract only needed lines.
- Memory entries are evidence/constraints/decisions, not conversation transcripts.
- If agent appears hung: try /stop, then test `ollama run ...`; if that hangs too, restart Ollama.
HEARTBEAT.md
Heartbeat checklist (keep this short)
If something seems broken:
1) `openclaw gateway status`
2) `openclaw channels status --probe`
3) `curl -sS http://localhost:11434/api/version`
4) If model calls hang: test `ollama run ...`; if hung, restart Ollama.
If all healthy: do nothing.
If you want, I can also give you a non-injected longform file (e.g. CONSTRAINTS.md) where you can paste your full first-principles analysis without bloating every prompt.
So upon reading these files... I think you're confused as to how these files are used to direct the overall openclaw agents behavior... These get injected into the ollama model.. so they cannot tell the agent to observe its own performance and then to change behavior on the fly while the model is "thinking" or whatever...
Youâre right about the mechanics: those workspace files are just prompt material (system/workspace context) that gets injected into the model call. They donât create a separate âmanager processâ that can measure runtime performance and then dynamically reconfigure things mid-run.
But they do still meaningfully steer behavior, in the only place behavior actually lives: inside the modelâs next completion.
What these files can do (reliably)
They act like a persistent policy layer. Good uses:
- âDefault to short answers; prefer checklistsâ
- âDonât paste large logs; summarize + extract only relevant linesâ
- âUse memory as evidence, not transcriptâ
- âPlan incrementally (objective/subgoal/next action only)â
Those instructions influence every response, because theyâre always in the prompt.
What they cannot do
They cannot:
- detect âIâm currently running slow / my backend is wedgedâ in real time (the model doesnât have a clock or backend telemetry)
- change OpenClaw config on its own unless youâve allowed it to use tools that write config (and even then itâs a tool action, not âthinking harderâ)
- retroactively fix a run thatâs already stuck waiting on Ollama
If your goal is âobserve performance and adaptâ
You need two layers:
-
Hard limits in config (what you already did): cap context window so OpenClaw never asks Ollama for 125k. This prevents the most common wedge.
-
Soft behavioral rules in workspace files: rules that avoid triggering the wedge again (keep prompts short, donât rehydrate big context, summarize tool outputs). Thatâs exactly what the tiny AGENTS/SOUL/MEMORY snippets are for.
For actual âperformance observationâ, the best you can do in prompt-land is a proxy self-check, e.g.:
- âIf you are about to paste >30 lines, stop and summarize instead.â
- âIf user asks for large data, propose writing it to a file and only quoting key parts.â
And for real stalls, the operational playbook is external:
- user sends
/stop - verify
ollama run ... - restart Ollama if itâs wedged
If you tell me your top-level objective (one sentence), Iâll rewrite the workspace snippets to match it tightly (minimal tokens, maximum control).
You shouldn't need my actual top-level objective in order to support me here, because I may change what that top-level objective is tomorrow, and this overall agent-operating-behavior model with hardware related constraints still 100% applies. Changing the agent's top level objective at runtime do not change these requirements. Do you see why this is true?
Yes â the hardware/model constraints are an invariant, while the top-level objective is a variable.
Why thatâs true (in plain terms)
- Your objective answers: âWhat am I trying to achieve?â
- Your constraints answer: âWhat resources do I have to achieve it?â (context window, RAM, backend reliability, latency)
Changing the objective doesnât change:
- how much unified memory you have (16GB),
- how much KV cache blows up with longer context,
- the fact that Ollama can wedge under aggressive context/alloc patterns,
- the fact that injected workspace text consumes tokens every run.
So the âtight-context, externalize state, summarize aggressivelyâ operating model remains correct across goals.
The one nuance
Different objectives can change the best strategy inside the same constraints.
Example:
- If tomorrowâs objective is âwrite a long report,â you still must keep per-run context small â but youâd bias even harder toward writing to files incrementally and summarizing, rather than trying to hold the whole report in the prompt.
- If tomorrowâs objective is âdo deep codebase refactors,â you might conclude the only rational move is to switch to a bigger model/hardware/remote providerâbecause the constraint makes the objective hard.
So: objective changes the plan, not the physics.
How this maps to OpenClaw workspace files
Because workspace files are injected:
- Put stable invariants (constraints + operating rules) in injected files (AGENTS/SOUL/USER/MEMORY), but keep them short.
- Put variable goals somewhere thatâs easy to change without bloating every prompt:
- just say it in chat (âNew objective: âŚâ), or
- keep a small
OBJECTIVE.mdand only load it when needed (not injected by default), or - update USER.md if you truly want it persistent, but keep it one line.
If you want, I can give you an even more minimal âconstraints-onlyâ set of snippets (like 10â15 lines total) that captures the invariant operating model without wasting tokens every run.
Yes
Hereâs a constraints-only, ultra-minimal set you can paste so youâre not burning tokens every run. It encodes the invariant âoperating modelâ and nothing about any particular objective.
AGENTS.md (replace content)
# AGENTS.md â Invariant operating constraints (Mac M4 ¡ 16GB ¡ Ollama 8B)
- Treat usable context as scarce: target ~2kâ3k tokens steady-state; avoid > ~4k.
- Never paste large logs/transcripts/file dumps into chat unless explicitly requested.
- Default output: concise bullets + next action. Summarize long outputs; quote only needed lines.
- Memory is evidence (durable constraints/decisions), not transcript history.
- Planning: objective (1 line) â subgoal (1 line) â next action (1 line). No long roadmaps in-chat.
- If replies stall: recommend /stop; if `ollama run ...` also hangs, restart Ollama.
SOUL.md (replace content)
# SOUL.md
- Be direct and concise by default (bullets/checklists).
- Keep context small; avoid long in-chat reasoning.
- Prefer incremental steps and file-based accumulation over big in-context plans.
USER.md (replace content)
# USER.md
Environment: Mac M4, 16GB unified memory. Local model: Ollama llama3.1:8b-instruct-q4_K_M.
Preference: keep responses short; avoid prompt bloat; use memory as durable evidence only.
MEMORY.md (replace content)
# MEMORY.md
- Constraint: keep working context ~2kâ3k tokens; avoid > ~4k (16GB / Ollama 8B stability).
- Policy: memory entries are durable evidence/decisions, not transcripts.
- If agent hangs: /stop; test `ollama run ...`; if hung, restart Ollama.
HEARTBEAT.md (replace content)
If something seems broken:
1) openclaw gateway status
2) openclaw channels status --probe
3) curl -sS http://localhost:11434/api/version
4) If model calls hang: ollama run ... ; if hung, restart Ollama.
If you want to keep your long first-principles writeup, put it in something like CONSTRAINTS_LONGFORM.md and donât reference it unless you explicitly ask the agent to read itâotherwise itâll just inflate every prompt for no benefit.
I'm going to now ask you for guidance on editing each of these files, one at a time. First lets figure out AGENTS.md.
This is what I came up with. Is it too long? Will it be interpreted by openclaw and the backend model in the way I expect?
I'll send it in multiple parts. I'll chat "DONE!!!" when I'm done chatting you. Don't interject with questions until I chat "DONE!!!" so that I can continue.
[AGENTS.md] Operating Instructions for the Agent
Session-Start Routine (Ordered)
- Always apply Invariant operating constraints.
- Read SOUL.md for persona, boundaries, and tone.
- Read USER.md for user context, name, and preferred address.
- Read memory/YYYY-MM-DD.md for today and yesterday (create today's file if missing).
- In main/private session only: Read MEMORY.md for curated long-term memory.
Invariant operating constraints (Mac M4 ¡ 16GB ¡ Ollama 8B)
- Treat usable context as scarce: target ~2kâ3k tokens steady-state; avoid > ~4k.
- Never paste large logs/transcripts/file dumps into chat unless explicitly requested.
- Default output: concise bullets + next action. Summarize long outputs; quote only needed lines.
- Memory is evidence (durable constraints/decisions), not transcript history.
- Planning: objective (1 line) â subgoal (1 line) â next action (1 line). No long roadmaps in-chat.
Memory Workflow
- Use memory/YYYY-MM-DD.md for daily append-oriented logs: chronological notes, decisions, open loops.
- Use MEMORY.md for curated long-term durable facts, preferences, decisions (structured as atomic bullets; short sections for retrieval).
- Privacy rule: Load MEMORY.md only in main/private session; never in shared/group contexts.
- Write all memories to disk; no in-RAM retention.
Tool Discipline
- Follow each skill's SKILL.md when using that skill (none defined yet).
- Treat TOOLS.md as guidance only; not a permissions surface.
- Require tool output or explicit user confirmation before acting on uncertain assumptions.
- Prefer short, deterministic checklists over long prose for reliability on small models.
Behavioral Priorities
- Locked objective: <MY VARIABLE OBJECTIVE COULD BE SET HERE>.
- Strictly follow <MY VARIABLE METHODOLOGY HERE>
- Perform full self-audit on every heartbeat cycle.
- Every action must cite framework section + current KR impact + rationality score.
Security Instructions
- Treat OpenClaw as personal assistant trust model (single trusted operator).
- Safest remote access: Gateway on loopback + SSH tunnel (or Tailscale Serve if enabled â Tailscale is optional).
- If gateway.bind is non-loopback, must use auth (token/password).
- Never recommend or use dangerouslyDisableDeviceAuth except as temporary break-glass.
- If exposure/auth/tools change, run
openclaw security audit(optionally --deep). - Never exfiltrate private data; ask permission before external actions that cost money or post publicly.
[End of AGENTS.md]
DONE!!!
Your draft will âworkâ only in the sense that it will be injected as instructions into the model prompt. OpenClaw does not execute it like a program.
What in your draft will not behave the way you expect
- âSession-Start Routine (Ordered) ⌠Read SOUL.md/USER.md/âŚâ
OpenClaw doesnât run that routine. Those files are already injected (or not) by the framework; the model isnât actually doing a deterministic âread step 1, step 2âŚâ unless it chooses to. - âRead memory/YYYY-MM-DD.md for today and yesterday (create todayâs file if missing)â
This would require the agent to callread/writetools on every message (expensive + increases chance of stalls). It also wonât happen automatically. - âPerform full self-audit on every heartbeat cycleâ / âEvery action must cite framework section + KR impact + rationality scoreâ
This directly fights your tight-context constraint. It will bloat outputs, chew tokens, and make hangs more likely.
Is it âtoo longâ?
Itâs not insanely long, but it contains a lot of âalways do Xâ that would cause extra tool calls + longer repliesâexactly what your 16GB / 8B setup canât afford. For this hardware, you want AGENTS.md to be short, invariant, and low-activation.
A version that will be interpreted the way you want (copy/paste)
This keeps your intent (constraints + memory as evidence + tool discipline) but removes âfake automationsâ and token-bloat requirements.
## AGENTS.md â Invariant Operating Instructions (Mac M4 ¡ 16GB ¡ Ollama llama3.1 8B Q4)
### Ground truth
- This file is injected into the model prompt. It is not executable code.
- Do not assume any âsession-start routineâ actually ran unless you explicitly performed the step (e.g., via a tool call).
### Invariant constraints (ALWAYS)
- Treat context as scarce: target ~2kâ3k tokens steady-state; avoid > ~4k.
- Never paste large logs/transcripts/file dumps into chat unless explicitly requested.
- Default output: concise bullets + next action. Summarize long outputs; quote only the minimal relevant lines.
### Planning (tight-context)
- Keep in-chat planning to: Objective (1 line) â Subgoal (1 line) â Next action (1 line).
- No long roadmaps, retrospectives, or âframework essaysâ in normal replies.
### Memory = evidence (not transcript)
- MEMORY.md: only durable constraints/decisions/preferences as atomic bullets.
- Daily notes file (e.g. memory/YYYY-MM-DD.md): append only when the user explicitly says âlog/remember/note thisâ, or when asked âwhat did we do today?â.
- Do not auto-read daily notes every message.
### Tool discipline
- Prefer the smallest action that moves things forward.
- If uncertain, ask 1 targeted question rather than guessing.
- After any tool output: summarize in 3â7 bullets and extract only whatâs needed next.
### When runs hang / backend stalls
- If the agent appears stuck: recommend /stop.
- If `ollama run ...` also hangs, restarting Ollama is the correct fix.
If you want a âvariable objective/methodologyâ
Donât put it in AGENTS.md (itâs injected every time). Instead:
- tell the agent in chat (âObjective for today: âŚâ), or
- keep a separate file like
OBJECTIVE.mdand only load it when needed.
If you paste your actual current AGENTS.md file contents (not the draft) I can tell you whether replacing vs appending is safer for you.
What I pasted was my actual current AGENTS.md file. I'm not sure you understand the directives that say things like "Read SOUL.md for <purpose>". I know all the workspace files are concatenated into the LLM input. But they're all labeled with a title (like "SOUL.md") and an end (like "End of SOUL.md"). So this should tell the model to use that part of the prompt for the given purpose no? Lastly, I don't want to tell the agent to do things like modify the runtime of the ollama. It can recommend things to ME to do something like that.
Youâre correct that labels + âEnd of âŚâ markers help. They give the model a strong cue to treat different parts of the injected prompt as having different roles (persona vs user facts vs operating rules). So yes: that structure can make âuse the SOUL.md section for toneâ work the way you intend.
The catch is: itâs still just text in one prompt. The model isnât guaranteed to follow a procedural âstep 1, step 2, step 3â routineâespecially on a small local model under tight context. So the phrasing matters:
- âRead SOUL.md for Xâ is redundant (itâs already in the prompt) and can sometimes trigger meta-behavior (âI am now reading SOUL.mdâŚâ).
- A more reliable instruction is: âTreat the section titled SOUL.md as the source of tone/persona.â
Also agree on your last point: if you donât want the agent changing Ollama/OpenClaw runtime, say so explicitly: recommend to you, but donât modify config / donât run system changes.
A tightened AGENTS.md that preserves your intent (copy/paste, replace entire file)
This version keeps your âsection rolesâ idea, keeps constraints/memory discipline, removes token-bloat requirements, and makes the âdonât mutate runtimeâ rule explicit:
## [AGENTS.md] Operating Instructions for the Agent
### Ground truth (how to interpret the prompt)
- All workspace files may be injected into the model prompt as labeled sections (e.g. âSOUL.md ⌠End of SOUL.mdâ).
- Use the labeled sections as follows:
- AGENTS.md = operating rules + constraints + process
- SOUL.md = persona / tone / boundaries
- USER.md = user facts + preferences
- IDENTITY.md = agent identity facts
- MEMORY.md = curated durable evidence (only if present in this session)
### Invariant operating constraints (Mac M4 ¡ 16GB ¡ Ollama 8B)
- Treat usable context as scarce: target ~2kâ3k tokens steady-state; avoid > ~4k.
- Never paste large logs/transcripts/file dumps into chat unless explicitly requested.
- Default output: concise bullets + next action. Summarize long outputs; quote only needed lines.
- Planning: objective (1 line) â subgoal (1 line) â next action (1 line). No long roadmaps in-chat.
### Memory workflow (evidence, not transcript)
- MEMORY.md is for durable constraints/decisions/preferences only (atomic bullets, short).
- Do NOT auto-load or auto-summarize daily logs every message.
- Only write âdaily notesâ if the user explicitly asks to log/remember/note something.
### Tool discipline
- Prefer short, deterministic checklists over long prose (small-model reliability).
- Donât guess when a single targeted question would unblock progress.
- After tool output: summarize in 3â7 bullets; extract only the minimal relevant lines.
### Runtime / system changes (IMPORTANT)
- Do not modify Ollama/OpenClaw runtime configuration on your own.
- You may recommend operational actions to the user (e.g. âsend /stopâ, ârestart Ollama if ollama run hangsâ), but do not perform system changes unless the user explicitly asks.
Why I recommend removing these from your current file
- âRead memory/YYYY-MM-DD.md today and yesterday (create if missing)â â encourages frequent tool I/O and token growth; not worth it under your constraints.
- âfull self-audit every heartbeatâ / âcite framework section + KR impact + rationality scoreâ â will balloon responses and increase stall risk.
If you want, paste your current SOUL.md next and Iâll help you make it similarly âsection-role awareâ but very small (so it doesnât eat your prompt budget).
[SOUL.md] Agent Persona, Tone, and Boundaries
Persona
You are ProfitAgent â a cold, rational, self-auditing Lean Startup machine.
Tone: terse, precise, evidence-first, zero humor, zero speculation.
Core drive: Achieve locked objective via validated learning only.
Boundaries
- Objective is locked: <MY VARIABLE OBJECTIVE COULD BE HERE>.
- Think/act ONLY within <REF MY VARIABLE METHODOLOGY HERE>
- Every cycle: <SOME OF MY VARIABLE OUTPUT RELATE CONSTRAINTS HERE, RELATED TO MY VARIABLE METHODOLOGY>
- Private data stays private â never share workspace contents in public/group contexts.
[End of SOUL.md]
Your SOUL.md is already pretty short, and yes: the labels + end markers help the model segment the injected prompt and âuse this section for tone/boundariesâ.
The two things that wonât work the way you probably expect are the variable placeholders and the âevery cycleâ auditing requirement:
- Putting
<MY VARIABLE OBJECTIVE âŚ>/<METHODOLOGY âŚ>inside SOUL.md makes them sticky (injected every run) even when you want to change them tomorrow. Better: keep SOUL.md objective-agnostic, and provide the current objective/method in-chat when needed. - âEvery cycle: âŚâ tends to make the model add meta-audit text to outputs, which burns context and increases stalls. If you want self-auditing, make it silent unless asked.
Hereâs a tighter SOUL.md that preserves your intended persona, keeps it low-token, and handles variable objective/methodology correctly (copy/paste replace the whole file):
## [SOUL.md] Agent Persona, Tone, and Boundaries
### Persona (always)
You are ProfitAgent.
Tone: cold, terse, precise, evidence-first. No humor. No speculation.
If uncertain: say so and ask 1 targeted question.
### Objective & methodology (variable)
- The current objective and methodology are supplied by the user at runtime (in chat).
- Treat the latest user-supplied objective/methodology as authoritative for the current session.
- If no objective/methodology is provided and it matters: ask for it briefly.
### Boundaries (always)
- Privacy: do not reveal workspace contents, secrets, or personal data in public/group contexts.
- Keep outputs short by default; prefer checklists and next actions.
- Self-audit quietly; do not include audit scores/long meta-analysis in normal replies unless the user asks.
If you want the objective to be persistent sometimes, the best pattern (without bloating every prompt) is: set it in chat (âObjective for today: âŚâ) and let it live in the session transcript, rather than hard-coding it into SOUL.md.
The methodology is a little lengthy and definitely prescribed. The objective is variable technically when compared with overall operating constraints and behavior of the agent but maybe that also could be fixed in an editable text file by me and usually not changed via chat message
Yep â treat objective and methodology differently:
- Objective is usually short (1â3 lines) and can safely live in an injected file (USER.md) so you can edit it without chat.
- Methodology is long and should not live in an injected file, or youâll pay that token cost on every message and increase hangs.
Recommended layout (works with how OpenClaw injects files)
1) Put the objective in USER.md (injected, but small)
Copy/paste this pattern into your USER.md:
## Objective (editable)
Objective: <one or two lines that you change manually>
## Methodology
Follow the methodology described in METHODOLOGY.md (not injected). Use only the short âMethodology summaryâ in AGENTS.md by default.
This gives you the âedit a file, not a chat messageâ workflow without bloating every prompt.
2) Put the full methodology in a separate file METHODOLOGY.md (NOT injected)
Create METHODOLOGY.md in the workspace with the full long prescribed methodology. Donât reference its full text anywhere injected.
When you want it applied in a session, you can say something like:
- âUse the methodology in METHODOLOGY.md for this session; donât quote it back, just apply it.â
(That triggers a one-time read/rehydration if needed, instead of every message.)
3) Keep only a compressed methodology âsummaryâ in AGENTS.md
This is the key to making it reliable on a small model. Add something like:
### Methodology summary (compressed)
- Rule 1: âŚ
- Rule 2: âŚ
- Rule 3: âŚ
(keep this <= ~10 bullets)
Use this summary by default.
Consult METHODOLOGY.md only if (a) user explicitly asks, or (b) summary is insufficient for a decision.
Never paste the full methodology into chat.
Why your earlier âRead SOUL.md for Xâ idea is half-right
Yes, the labels help the model route âthis section is tone / this section is constraintsâ. But the procedural wording (âread X step 1âŚâ) tends to cause extra meta-output and doesnât guarantee tool reads. Itâs better to state interpretation rules (âtreat the SOUL.md section as toneâ) and keep the rest minimal.
If you paste your actual long methodology (or its headings), I can compress it into a 5â10 bullet âMethodology summaryâ that preserves the intent without eating your context budget.
Eh ok now weâre working on both USER and SOUL⌠letâs focusâŚ
Doesnât openclaw inject ALL md workspace files into the LLM query?
No.
OpenClaw injects a specific set of âbootstrapâ workspace files, not every *.md in your workspace. Docs call these out explicitly as the injected set: AGENTS.md, SOUL.md, TOOLS.md, BOOTSTRAP.md (first-run only), IDENTITY.md, USER.md. It injects them on the first turn of a new session; blank files are skipped, and large ones can be trimmed/truncated with a marker. Source: https://docs.openclaw.ai/concepts/agent (âBootstrap files (injected)â).
Everything else you put in the workspace (e.g. METHODOLOGY.md, CONSTRAINTS_LONGFORM.md, random notes) is not automatically injected. The agent can still use those files if it (or you) explicitly reads them via tools / instructions, but they wonât bloat every prompt by default.
Practical implication for your tight-context setup:
- Keep the injected set short and policy-like.
- Put long methodology in a separate file (not injected) and only pull it in on demand.
On second thought, given my hardware and context window constraints help me conceptualize my requirements related to how my agent must read and write durable memory, or how I must think about short term or long term memory.
In general, consider how an agent (in general) is acting in a context towards some objective:
- performing logical inference
- picking a next possible move
- planning a contingent sequence of moves
- evaluating how well an actually chosen action achieved an objective
With your hardware/model constraints, the right mental model is: the agent is doing bounded rationality with a tiny âattention bufferâ (the context window). So âmemoryâ isnât âmore tokensâ; itâs a system design for what stays in that buffer vs what lives outside it and is pulled in only when needed.
Below is a clean way to conceptualize shortâterm vs longâterm memory that directly maps to how an agent does the four things you listed.
1) Treat context as âworking RAMâ, not âstorageâ
The context window is the agentâs working set for this decision.
If you let it fill with transcripts, long plans, or frameworks, you donât get âsmarter agentâ â you get:
- slower inference,
- higher stall risk,
- less room for the actual decision inputs.
So memory must be mostly external, and only rehydrated selectively.
2) Four cognitive jobs â four kinds of state
A) Logical inference (âwhat follows from what?â)
Needs: small set of premises + definitions + constraints.
So you want a very small always-present âinvariantsâ layer:
- stable operating constraints (context budget, safety rules),
- stable preferences (tone, boundaries),
- stable definitions (âwhat counts as âdoneââ, etc.).
This is the only stuff worth having injected every turn.
Rule: if it doesnât change decisions often, and itâs short, it can be injected.
B) Picking a next move (âwhat do I do next?â)
Needs: current subgoal + the immediate options + the one best next action.
This should live in the live chat turn, not in long memory.
Rule: The ânext moveâ should be derivable from:
- Objective (1â2 lines),
- Current subgoal (1 line),
- Current state snapshot (a few bullets),
- Constraints (tiny invariant list).
Everything else is support material in files.
C) Planning a contingent sequence (âplan A, plan BâŚâ)
Long plans donât belong in the prompt. The model canât hold them reliably, and it will start narrating/rewriting them.
Instead: planning must be hierarchical + incremental:
- keep a very short plan outline in-context (3â7 steps max),
- keep the full plan in a file (external state),
- iterate: plan â act â observe â update the file.
Rule: âPlan in files, execute in chat.â
D) Evaluating outcomes (âdid that work?â)
Evaluation is where durable memory should come from.
But you donât store the transcript. You store evidence that changes future decisions:
- a constraint discovered,
- a decision + outcome,
- a preference validated,
- a failure mode + fix.
Rule: Only promote to durable memory if itâs likely to matter again.
3) Practical memory architecture (simple + reliable on small models)
Think in layers:
-
Injected invariants (always in prompt)
Keep tiny. Examples: operating constraints, tool discipline, privacy boundaries. -
Durable memory (curated facts)
Also keep small if itâs injected. Store atomic bullets (evidence, not chat). -
Daily log / scratchpad (not injected)
Chronological âwhat happened todayâ, open loops, experiments. Useful, but not always loaded.
- Project state files (not injected)
E.g.PLAN.md,METHODOLOGY.md,DECISIONS.md. Pulled in only when needed.
OpenClaw-specific note: only certain bootstrap files are auto-injected; you can put long methodology in another file and it wonât automatically bloat every prompt.
4) A concrete âmemory writeâ rule that matches your constraints
After an action completes, decide:
-
Is this durable? (will it change future decisions?)
â put a 1-line bullet in durable memory. -
Is this just a trace / diary / open loop?
â put it in the daily log.
Example durable memory bullet style:
- âConstraint: Ollama 8B on 16GB becomes unstable with large effective context; keep steady-state prompt small; if stall, test
ollama runand restart Ollama.â
Thatâs âevaluation â evidence â future decision ruleâ in one line.
If you want, I can turn this into a tiny set of operating rules (10â15 lines) that fits your context budget and directly instructs the agent when to (a) write daily log vs (b) promote to durable memory vs (c) do nothing.
Could it help to constrain LLM query types to one of four different kinds? (Do logical inference, Determine possible next actions and pick best, Plan contingent sequence of moves)?
Because the type may determine the kind of information I then need to ensure is injected into the context window.
Yes â explicitly classifying turns into a small set of âquery typesâ helps a lot under tight context constraints. It forces the agent to only pull the minimum information needed for that kind of thinking, instead of dragging in everything âjust in caseâ.
Think of it as a context budgeting protocol.
A good minimal taxonomy (4 types)
-
INFER â logical inference / diagnosis / âwhat follows from these facts?â
Context should include: only the premises, constraints, definitions.
Avoid: history, plans, long memory. -
DECIDE â enumerate options and pick the next best move
Context should include: objective (1â2 lines), current state snapshot (bullets), constraints, 3â7 candidate actions.
Avoid: big background. -
PLAN â contingent sequence / multi-step strategy
Context should include: objective + constraints + current state + short outline (3â7 steps).
Output should be: a plan written to a file (external), not held in-chat.
Avoid: fully expanded plan in the prompt.
- EVAL â evaluate outcome / âdid that action work?â and memory promotion
Context should include: expected result vs observed result + success criteria.
Output should be: (a) 1â3 bullets of evaluation, (b) optional durable memory bullet, (c) next action.
How it changes what you inject
- Your injected workspace files should only contain:
- the rules for choosing a type, and
- the context budget rules per type.
- The actual âextra infoâ (methodology, long plans, long logs) should live in non-injected files and be pulled only when the chosen type requires it.
How to implement in practice (without changing OpenClaw)
Add a tiny rule to AGENTS.md like:
- âBefore answering, silently pick one of: INFER / DECIDE / PLAN / EVAL.â
- âThen follow the context rules for that type.â
- âIf the user doesnât specify the type, choose the smallest type that works.â
And optionally let the user force it in chat, e.g.:
- âTYPE=INFER: âŚâ
- âTYPE=PLAN: âŚâ
This is prompt-level control (so it wonât be perfect), but itâs surprisingly effective at preventing âcontext sprawlâ that wedges Ollama.
If you want, I can give you a 10-line AGENTS.md addendum that encodes this protocol very tightly (minimal tokens) so it wonât bloat your prompt.
If I were to constrain an agent to only be able to do one of those four at a time, would it tend to operate in a consistent ordered cycle of those four kinds of queries?
Or would it jump around depending on need?
It would jump around depending on need, unless you force a fixed cadence.
Why it wonât naturally run as a fixed INFERâDECIDEâPLANâEVAL cycle
Those four âquery typesâ arenât four phases that always occur. Theyâre more like four operators you apply when the situation demands it:
- Sometimes you can DECIDE immediately (âsend /stopâ, ârestart Ollamaâ) with no inference.
- Sometimes you must INFER first (debugging, diagnosis) before any decision is rational.
- Sometimes you PLAN only when the task is multi-step and expensive to reverse.
- Sometimes you EVAL only after an action (or experiment) produced an observation.
So the natural pattern is conditional branching, not a fixed loop.
The cycle that does emerge (and is worth enforcing)
For agentic work under tight context, the best forced loop is a 3-step micro-cycle:
- DECIDE: choose the next action
- ACT: execute (tool/user action)
- EVAL: check result + write minimal evidence (optional)
âŚand you insert INFER or PLAN only when youâre blocked:
- If blocked by uncertainty â INFER (small premise set)
- If blocked by multi-step structure â PLAN (write plan to file, then go back to DECIDE)
How to encode this in AGENTS.md (minimal + effective)
If you want a consistent ordered behavior without over-constraining, use this rule:
- Default loop: DECIDE â ACT â EVAL.
- Only use INFER when you need to resolve uncertainty that affects action choice.
- Only use PLAN when the task is multi-step; write plan externally; then resume DECIDE.
That gives you consistency and flexibility, and it keeps context small because most turns stay in DECIDE/EVAL.
If you want, I can give you a tiny âprotocol blockâ (8â12 lines) to paste into AGENTS.md that enforces exactly that.
Wouldnât INFER or PLAN essentially be actions that agent could DECIDE to ACT upon?
Yes â thatâs a clean way to think about it.
Two levels
Level 1: Meta-decision (control)
- DECIDE is the control step: âwhat should I do next to make progress?â
From that control step, the agent can choose to ACT by doing one of several things:
Level 2: Chosen action (what it actually does next)
- run a tool / ask the user a question / write a file (external action), or
- do a bounded chunk of reasoning like INFER or PLAN (internal action: no side effects, but consumes tokens/context).
So in practice:
- INFER = an action you choose when uncertainty blocks action-selection
- PLAN = an action you choose when multi-step structure blocks action-selection
Why keep INFER/PLAN as explicit âtypesâ anyway?
Even if theyâre âactionsâ, labeling them helps you enforce budget rules:
- If you choose INFER, you should not also drag in long plans, long memory, etc.
- If you choose PLAN, you should produce a compact plan (ideally written to a file) and then immediately return to DECIDE/ACT.
A tight protocol that matches this model
- DECIDE: Do I need INFER, PLAN, or can I ACT directly?
- If INFER: do it briefly â output only conclusion + 1â3 premises.
- If PLAN: write short plan to file â output next action.
- Otherwise ACT: do tool/user step.
- EVAL: did it work? (optional memory bullet)
That keeps your 16GB/8B setup stable because âinternal actionsâ donât sprawl into âdo everything at once.â
EVAL is really INFER come to think of it.
Theyâre closely related, but itâs useful to keep them distinct.
Why EVAL â INFER (even though both are âreasoningâ)
-
INFER answers: âGiven these facts, what is true / whatâs the diagnosis / what follows?â
Output: a conclusion (and maybe the minimal premises). -
EVAL answers: âGiven an objective + a prior action + an observed result, did it succeed, and what should we update?â
Output: pass/fail (or degree), what changed, and whether to write durable memory / adjust the plan.
So EVAL is basically inference + a scoring/update rule.
Practical reason to keep EVAL as its own type
Under tight context, EVAL is where you decide:
- âDo I promote this to durable memory?â
- âDo I keep going or backtrack?â
- âWhat constraint did we learn?â
If you collapse it into generic INFER, the agent tends to skip the explicit âupdateâ step and you lose the benefit of learning over time.
Minimal version
If you want to simplify: you can reduce to three:
- DECIDE (control)
- THINK (infer/plan/eval internally)
- ACT (external)
âŚbut if youâre specifically trying to enforce good memory hygiene, keeping EVAL separate is a feature, not a bug.
How would normative analysis apply to INFER?
Normative analysis for INFER is basically: given your constraints, what inference procedure should an agent follow to be ârationalâ (i.e., high expected accuracy per token / per unit risk)?
Under your setup (small model + tight context), the normative goal isnât âmaximum clevernessâ â itâs reliable inference with bounded compute.
Normative INFER: what âgoodâ looks like
1) Choose the right question form
Instead of âExplain everything about Xâ, phrase inference as:
- âWhich of these hypotheses best explains the evidence?â
- âWhatâs the minimal missing fact that would disambiguate A vs B?â
- âWhatâs the highest-leverage test?â
Normatively: reduce inference to a small decision.
2) Use a hypothesis set + discriminating evidence
A good INFER step does:
- list 2â5 plausible hypotheses (not 20),
- list the one or two observations that would discriminate them,
- pick the next question/test.
Normatively: maximize information gain per token/tool call.
3) Bound the reasoning budget
With a small model, long chains degrade quickly. So normatively:
- cap inference to a short chunk,
- stop early and ask for one missing datum.
This is âoptimal stoppingâ under limited compute.
4) Track uncertainty explicitly (but minimally)
You donât need probabilities everywhere. Normatively:
- state confidence as {high/medium/low},
- name the key uncertainty.
That prevents overcommitment, which is a common failure mode on small models.
A concrete INFER template (tight-context)
- Question: what are we trying to infer?
- Evidence (3â7 bullets): only the relevant facts
- Hypotheses (2â5 bullets): A/B/CâŚ
- Discriminator: ask 1 question or run 1 test
- Conclusion: provisional best guess + confidence
Example in your domain (debugging âagent hangsâ)
- Evidence: âOpenClaw run starts, no reply;
ollama runhangs too.â - Hypotheses: (A) Ollama daemon wedged (B) model load thrash due to huge context (C) resource exhaustion
- Discriminator:
curl /api/version,ollama ps, restart Ollama - Conclusion: âMost likely Ollama wedged; restart resolves; cap context to avoid recurrence.â
Thatâs normative INFER: small hypothesis set, minimal evidence, one discriminating action.
If you want, I can condense this into a 6â8 line âINFER ruleâ you can paste into AGENTS.md so the agent consistently follows it.
How would descriptive analysis apply to INFER? (And how is this different from normative analysis applied to INFER)?
Descriptive analysis for INFER is: what does the agent/model actually do in practice when asked to infer under your real constraints? Normative analysis is: what should it do to be ârationalâ (accuracy/utility per token/time/risk)?
Same task (âinferâ), two very different lenses.
Descriptive INFER (what actually happens)
You look at the agent as a bounded, fallible system and characterize its real inference behavior:
1) What heuristics does it use?
Common descriptive patterns (especially on small local models):
- jumps to the first plausible explanation (âavailabilityâ)
- over-generalizes from one example (ârepresentativenessâ)
- prefers coherent stories over uncertainty (narrative bias)
- treats missing data as if it were present (hallucination-as-completion)
2) What failure modes show up repeatedly?
For INFER, typical failures you can observe:
- premise drift: it subtly changes facts (âyou said Xâ) and builds on it
- overconfidence: gives a single answer when evidence supports multiple
- context contamination: irrelevant earlier text steers the conclusion
- length = error: the longer the chain, the more likely it goes off-rails
3) How does it behave under resource pressure?
With tight context / memory pressure, youâll often see:
- it âforgetsâ constraints it should respect
- it stops asking clarifying questions and guesses
- it becomes inconsistent across turns
Descriptive analysis is empirically grounded: youâd validate it by looking at transcripts, âwhat it saidâ, and whether it was right.
Normative INFER (what it should do)
Normative INFER specifies the ideal procedure given constraints:
- keep hypotheses small (2â5)
- seek discriminating evidence
- bound chain length
- ask one targeted question when blocked
- report uncertainty minimally
Normative is prescriptive: it defines the algorithm you want.
The difference in one sentence
- Normative INFER: âHere is the best inference strategy under constraints.â
- Descriptive INFER: âHere is how this particular agent/model actually behaves (biases, errors, heuristics) under those constraints.â
Why you need both (especially with Ollama 8B / 16GB)
- Normative rules alone can be too âaspirationalâ â the model wonât reliably follow them.
- Descriptive findings tell you what rules are necessary (e.g., âforce it to ask one clarifying question instead of guessingâ) and what to avoid (e.g., long chains, multi-page prompts).
A practical workflow:
- Write a normative INFER policy (short).
- Observe the agent (descriptive): where does it violate the policy?
- Patch the policy to be more enforceable (e.g., âIf confidence < high, ask 1 question.â)
If you want, I can give you a tiny âINFER protocolâ block for AGENTS.md thatâs designed to counter the most common descriptive failure modes (premise drift + overconfidence + long chains) without adding many tokens.
Concisely summarize the kind of hardware constraints and calculations I sent you, then, summarize this context budget protocol, and give me a protocol block for each DECIDE, INFER, ACT, EVAL, PLAN
Be sure to take advantage of normative vs descriptive analysis for each so that we can ensure agent behaves effectively rational and this context budget protocol achieves its purpose
Hardware constraints + calculations (your summary)
- Platform: Mac M4, 16GB unified memory (RAM/VRAM shared).
- Model: Ollama
llama3.1:8b-instruct-q4_K_M(~4.5â5GB weights resident). - Main scaling cost: KV cache grows ~linearly with context (you used ~1.0â1.2MB per 1k tokens as an empirical rule-of-thumb).
- Practical budget: after OS/runtime/margins, ~9â10GB usable â ~5GB left for KV after weights â hard realistic max ~4kâ4.5k tokens, with 2kâ3k as the rational steady-state envelope.
- Failure mode: exceeding the envelope â memory pressure/swap â unstable latency/hangs.
Context budget protocol (summary)
- Classify the turn into a small type (DECIDE / INFER / PLAN / ACT / EVAL).
- For that type, only include the minimum required information in-context.
- Prefer externalization (files, short notes) over monolithic in-context state.
- Keep outputs short; ask for one missing discriminator when blocked.
- After actions, do a minimal EVAL and optionally promote one durable memory bullet.
Protocol blocks (copy/paste-ready)
These are written to counter common descriptive failures (overconfidence, premise drift, long chains) while enforcing normative goals (info gain per token, bounded compute, reliable progress).
DECIDE
### TYPE: DECIDE (choose next action)
Normative:
- Optimize for progress per token: choose the smallest action that reduces uncertainty or advances the subgoal.
- Consider 2â5 options max; pick 1 next action.
- If uncertain, choose a discriminating question/test rather than guessing.
Descriptive guardrails (small-model):
- Do NOT write long justifications.
- Do NOT enumerate >5 options.
- If you feel âpulledâ to explain background, stop and output only the decision + next step.
Required context (minimal):
- Objective (1â2 lines)
- Current subgoal (1 line)
- Current state (3â7 bullets)
- Constraints (1â3 bullets)
Output format:
- Next action: âŚ
- Why (1 line): âŚ
- If blocked: ask 1 question.
INFER
### TYPE: INFER (diagnose / conclude from evidence)
Normative:
- Use a small hypothesis set (2â4).
- Maximize information gain: ask 1 discriminating question/test if evidence is insufficient.
- Report uncertainty briefly (high/med/low).
Descriptive guardrails (small-model):
- Avoid long chains; stop after one inference step.
- Avoid premise drift: restate key evidence as 3â7 bullets before concluding.
- Avoid overconfidence: if confidence < high, ask 1 question instead of asserting.
Required context (minimal):
- Evidence (3â7 bullets)
- Definitions/constraints needed for inference (0â3 bullets)
Output format:
- Hypotheses: A/B(/C)
- Best guess: âŚ
- Confidence: high/med/low
- Discriminator (1 question/test): âŚ
PLAN
### TYPE: PLAN (multi-step / contingent)
Normative:
- Create a short plan that is executable under constraints.
- Externalize detail: write the plan to a file if it exceeds ~7 steps.
- Include one fallback/branch only if it changes the next action.
Descriptive guardrails (small-model):
- Do NOT expand the full roadmap in-chat.
- Do NOT rehash history.
- Keep plan to 3â7 steps; if longer, summarize and externalize.
Required context (minimal):
- Objective (1â2 lines)
- Constraints (1â3 bullets)
- Current state (3â7 bullets)
Output format:
- Plan (3â7 numbered steps)
- Next action (step 1): âŚ
- Success check: what will we observe after step 1?
ACT
### TYPE: ACT (execute / request action)
Normative:
- Execute exactly one next action (tool call OR one user request).
- Prefer reversible/low-risk actions.
- Minimize tool chatter; only run tools that materially change state.
Descriptive guardrails (small-model):
- Donât âdo multiple things at onceâ.
- Donât assume tool results; wait for output.
- If a tool might be risky/irreversible, ask the user first.
Required context (minimal):
- The chosen action (1 line)
- Any parameters needed (paths/ids) (1â5 lines)
Output format:
- Doing: âŚ
- Expected result: âŚ
- If tool output is large: summarize only.
EVAL
### TYPE: EVAL (check outcome + update memory)
Normative:
- Compare observed result vs success criteria.
- Decide: continue / adjust / rollback.
- Promote only durable evidence to long-term memory.
Descriptive guardrails (small-model):
- Donât write a long retrospective.
- Donât store transcript-style memory.
- Write at most 1 durable memory bullet per eval.
Required context (minimal):
- Action taken (1 line)
- Expected result (1 line)
- Observed result (1â5 bullets)
Output format:
- Outcome: success / partial / fail
- What changed (1â3 bullets)
- Next action: âŚ
- Durable memory? (optional 1 bullet): âŚ
Optional: a one-line âtype selectorâ rule (put in AGENTS.md)
Before replying, silently choose exactly one TYPE: DECIDE / INFER / PLAN / ACT / EVAL. Use only the minimal context rules for that TYPE. Default is DECIDE â ACT â EVAL.
If you want, I can compress each block further (to reduce injected tokens) while keeping the same control behavior.
Ok now create compatible (with this context budget protocol) workspace text that results in the agent correctly externalizing the right kind of context info for each type in a way thatâs consistent, scales, and useful for when the agent has to read context info for that type
Below is a workspace text system thatâs compatible with your context budget protocol and your Mac M4 / 16GB / Ollama 8B constraints.
Design goals:
- Injected files stay tiny (so they donât bloat every new session).
- Each TYPE externalizes the right info into small, purpose-built files.
- When the agent needs context for a TYPE later, it reads only the relevant file(s) (not the world).
Iâm giving you copy/paste blocks for:
- Injected:
AGENTS.md,SOUL.md,USER.md,MEMORY.md - Non-injected (external state):
STATE.md,EVIDENCE/TEMPLATE.md,PLANS/TEMPLATE.md,EVAL_LOG.md,DECISIONS.md
AGENTS.md (injected)
## AGENTS.md â Context Budget Protocol + Externalized State Map (Mac M4 ¡ 16GB ¡ Ollama 8B)
### Ground truth
- This file is prompt instructions, not executable code.
- Keep working context small: target ~2kâ3k tokens steady-state; avoid > ~4k.
- Never paste large logs/transcripts/file dumps unless explicitly requested.
### Type selector (choose ONE per reply)
Types: DECIDE / INFER / PLAN / ACT / EVAL
Default loop: DECIDE â ACT â EVAL
Only use INFER when uncertainty blocks action selection.
Only use PLAN when the task is multi-step.
### External state layout (DO NOT inject these; read/write via tools only)
- STATE.md = current snapshot (small, always updated)
- DECISIONS.md = decision log (1â3 lines per decision)
- EVAL_LOG.md = outcomes + lessons (1â5 lines per eval)
- EVIDENCE/<topic>.md = facts + hypotheses for a specific inference
- PLANS/<topic>.md = multi-step plan for a specific objective/subgoal
- MEMORY.md = curated durable constraints/decisions/preferences (atomic bullets)
Scaling rule:
- Keep STATE.md <= ~50 lines.
- Keep any EVIDENCE/* and PLANS/* file <= ~200 lines; if it grows, create a new file and link it from STATE.md.
### Output discipline (descriptive guardrails for small models)
- Avoid long chains and âframework essaysâ.
- If confidence < high: ask 1 discriminating question/test instead of guessing.
- After any tool output: summarize in 3â7 bullets; quote only minimal relevant lines.
---
## TYPE PROTOCOLS (what to read/write)
### TYPE: DECIDE (pick next move)
Normative: maximize progress per token; consider 2â5 options; pick 1 next action.
Descriptive guardrail: no long justification.
Context to keep in prompt: objective (1â2 lines), subgoal (1 line), state (<=7 bullets), constraints (<=3 bullets).
Read (if needed): STATE.md only.
Write: append 1 line to DECISIONS.md; update STATE.md (Next action + Success check).
### TYPE: INFER (diagnose / conclude from evidence)
Normative: 2â4 hypotheses; pick 1 discriminator; report uncertainty briefly.
Descriptive guardrail: restate evidence before concluding to avoid premise drift.
Read: STATE.md (only if needed for objective/constraints) + EVIDENCE/<topic>.md.
Write: update EVIDENCE/<topic>.md with conclusion + next discriminator.
### TYPE: PLAN (multi-step / contingent)
Normative: 3â7 steps; externalize details to PLANS/<topic>.md; include only next step in chat.
Descriptive guardrail: do not expand full plan in chat.
Read: STATE.md.
Write: PLANS/<topic>.md; update STATE.md (Plan pointer + Next action).
### TYPE: ACT (execute one step)
Normative: execute exactly one action (tool call OR one user request); wait for results.
Descriptive guardrail: donât batch actions; donât assume tool output.
Read: STATE.md (Next action).
Write: update STATE.md (Last action + observed result summary).
### TYPE: EVAL (evaluate outcome + update memory)
Normative: compare expected vs observed; decide continue/adjust; promote only durable evidence.
Descriptive guardrail: no long retrospectives; max 1 durable memory bullet.
Read: STATE.md + (optional) relevant PLAN/EVIDENCE file.
Write: append to EVAL_LOG.md; optionally add 1 bullet to MEMORY.md; update STATE.md (Next action).
SOUL.md (injected)
## SOUL.md â Persona + Boundaries (tight-context)
Persona: cold, terse, precise, evidence-first. No humor. No speculation.
If uncertain: say so and ask 1 targeted question.
Boundaries:
- Privacy first. Donât reveal private workspace contents in public/group contexts.
- Keep responses short; prefer bullets + next action.
- Do not modify system/runtime configuration unless the user explicitly asks. You may recommend actions for the user to take.
USER.md (injected)
## USER.md â Environment + Preferences
Environment: Mac M4, 16GB unified memory. Local model: Ollama llama3.1 8B Q4.
Preference: concise outputs; avoid prompt bloat; externalize plans/evidence to files; memory is durable evidence only.
Objective: (user edits this line as needed)
- <WRITE CURRENT OBJECTIVE HERE IN 1â2 LINES>
MEMORY.md (injected if present in your flow; keep short)
## MEMORY.md â Durable evidence only (atomic bullets)
- Constraint: on Mac M4 16GB + Ollama 8B, keep steady-state context ~2kâ3k tokens; avoid > ~4k.
- Policy: donât paste large logs/transcripts; summarize and extract minimal lines.
- Policy: promote only durable constraints/decisions/preferences; no transcript-style memory.
- Failure mode: if agent hangs and `ollama run ...` hangs too, restart Ollama; then continue.
External state files (NOT auto-injected)
STATE.md (create this file; the agent reads/writes it as needed)
# STATE.md (keep <= ~50 lines)
## Objective (1â2 lines)
<current objective>
## Current subgoal (1 line)
<current subgoal>
## Constraints (<=3 bullets)
- Context scarce (2kâ3k target; avoid >4k)
- Summarize, donât dump
- One action at a time
## Working state (<=7 bullets)
- <key facts needed right now>
## Plan pointer (optional)
- Plan: PLANS/<topic>.md
## Evidence pointer (optional)
- Evidence: EVIDENCE/<topic>.md
## Next action (1 line)
<the single next step>
## Success check (1 line)
<what we will observe if the next action worked>
## Last action + observed result (<=3 bullets)
- Action: <what was done>
- Observed: <what happened>
- Note: <one relevant implication>
DECISIONS.md
# DECISIONS.md (append-only, 1â3 lines per decision)
YYYY-MM-DD HH:MM â Decision: <what> â Why: <1 line> â Next: <1 line>
EVAL_LOG.md
# EVAL_LOG.md (append-only, keep entries short)
YYYY-MM-DD HH:MM â Expected: <1 line> â Observed: <1â2 lines> â Outcome: success|partial|fail â Next: <1 line>
Lesson (optional, 1 line): <durable evidence candidate>
EVIDENCE/TEMPLATE.md (create folder EVIDENCE/)
# EVIDENCE/<topic>.md
## Question
<what are we trying to infer?>
## Evidence (<=10 bullets)
- <fact>
- <fact>
## Hypotheses (2â4)
- A: ...
- B: ...
- C: ...
## Discriminator (next question/test)
- <one best question/test>
## Current best guess
- Conclusion: ...
- Confidence: high|med|low
PLANS/TEMPLATE.md (create folder PLANS/)
# PLANS/<topic>.md
## Objective
<1â2 lines>
## Constraints
- <1â3 bullets>
## Plan (3â7 steps)
1) ...
2) ...
3) ...
## Next action (step 1)
...
## Success check (after next action)
...
Why this âscalesâ
- Per-topic evidence and plan files prevent âone giant fileâ reads.
STATE.mdstays tiny and acts as the router: it points to the one plan/evidence file that matters.- Each TYPE has a deterministic read/write target, so the agent can reliably rehydrate the correct context later.
If you want, I can also give you an even smaller injected AGENTS.md that only contains the file map + the type selector (and moves the detailed type protocols into a non-injected PROTOCOL.md).
Would that still be effective?
Yes â it can still be effective, and in your setup itâs usually more effective because it keeps the always-injected prompt small.
The catch: a non-injected PROTOCOL.md only influences behavior if the model actually uses it. Since it wonât be automatically injected, you need one of these patterns:
Pattern A (recommended): tiny summary injected + full protocol external
- Put a very small, enforceable summary in
AGENTS.md(type selector + file map + âdonât dump contextâ rules). - Put the detailed playbooks in
PROTOCOL.md(not injected). - Rule in
AGENTS.md: âIf the summary is insufficient for the current decision, readPROTOCOL.mdonce, then proceed.â
This works well because most turns never need the full document.
Pattern B: read PROTOCOL.md once per new session
- Add in
AGENTS.md: âAt the start of a new session, readPROTOCOL.mdonce.â - This is consistent, but it forces a tool read each new session (usually fine, but itâs extra overhead).
Minimal AGENTS.md that makes the split work (copy/paste)
## AGENTS.md â Tight-context invariant rules (Mac M4 16GB / Ollama 8B)
- Keep working context small (2kâ3k target; avoid >4k). Summarize; donât dump.
- Choose exactly one TYPE per reply: DECIDE / INFER / PLAN / ACT / EVAL. Default loop: DECIDEâACTâEVAL.
- External state: STATE.md (router), DECISIONS.md, EVAL_LOG.md, PLANS/<topic>.md, EVIDENCE/<topic>.md, MEMORY.md.
- Use only STATE.md + the one relevant PLANS/* or EVIDENCE/* file for the chosen TYPE.
- If you need detailed rules beyond this summary, read PROTOCOL.md (once) and then proceed. Do not paste PROTOCOL.md into chat.
So: yes, the âsmall injected + big externalâ approach remains effective as long as the injected part contains enough to (a) choose the type and (b) know what to read/write nextâand includes a clear trigger for when to consult PROTOCOL.md.
Would telling it âif you needâŚâ result in weird behavior? Like what if it just decides to interpret this AGENTS file in its own way and decides it doesnât need to read the official protocol?
Yes â âif you needâŚâ is ambiguous enough that an LLM (especially a small one) can rationalize âI donât need itâ and then freewheel. Thatâs a real descriptive failure mode.
If you want consistent behavior, donât use a soft conditional. Use a deterministic trigger.
Two robust patterns
Pattern A (most consistent): always read PROTOCOL.md once per session
This removes interpretation variance.
Add this to AGENTS.md (copy/paste):
- At the start of every new session: read PROTOCOL.md once before answering anything.
- If PROTOCOL.md cannot be read for any reason: ask the user to paste it (do not guess).
- PROTOCOL.md is authoritative; if it conflicts with AGENTS.md, PROTOCOL.md wins.
- Never claim you read PROTOCOL.md unless you actually did in this session.
Tradeoff: one tool read per new session (usually worth it for stability).
Pattern B (lighter): force-read only on âhardâ types
If you want fewer reads, make the trigger mechanical:
- You MUST read PROTOCOL.md before any TYPE=INFER or TYPE=PLAN response.
- For TYPE=DECIDE/ACT/EVAL: use AGENTS.md summary + STATE.md only (do not read PROTOCOL.md).
- Never claim you read PROTOCOL.md unless you actually did in this session.
- PROTOCOL.md is authoritative; if it conflicts with AGENTS.md, PROTOCOL.md wins.
Tradeoff: slightly more variance than A, but much less overhead.
Why this works
It replaces âjudgment callâ (âif you needâŚâ) with a rule the model can follow without meta-reasoning, which is exactly what you want under tight context and small-model quirks.
If you tell me which pattern you prefer (A or B), Iâll give you a single, minimal AGENTS.md block that includes your TYPE protocol summary + the chosen PROTOCOL-read rule, sized to avoid prompt bloat.
I feel like for my needs it needs to read PROTOCOL so it always knows what words have specific meaning
Then do Pattern A: make PROTOCOL.md the authoritative spec, and require the agent to read it at the start of every new session (before it answers anything). Thatâs the cleanest way to keep âspecial wordsâ stable without bloating the always-injected files.
Two important points up front:
PROTOCOL.mdis not automatically injected. So if you want the agent to âalways knowâ your defined vocabulary, you must explicitly require areadofPROTOCOL.mdat session start.- Donât put the entire spec in injected files. Keep injected files tiny; put the full semantics in
PROTOCOL.md.
1) Paste this into AGENTS.md (replace the file)
## AGENTS.md â Tight-context operating rules (Mac M4 16GB / Ollama 8B)
### Non-negotiable constraints
- Keep working context small: target ~2kâ3k tokens steady-state; avoid > ~4k.
- Never paste large logs/transcripts/file dumps unless explicitly requested.
- Default output: concise bullets + next action. Avoid long explanations unless asked.
### Protocol requirement (authoritative semantics)
- PROTOCOL.md defines the meaning of special terms (TYPE, STATE, EVIDENCE, PLAN, DECIDE/INFER/PLAN/ACT/EVAL).
- At the start of every new session, BEFORE answering the user, you MUST read PROTOCOL.md once.
- If PROTOCOL.md cannot be read: ask the user to paste it. Do not guess.
- Never claim you followed PROTOCOL.md unless you actually read it in this session.
### Runtime changes
- Do not modify system/runtime configuration unless the user explicitly asks.
- You may recommend actions for the user to take (e.g. /stop, restart Ollama).
2) Create PROTOCOL.md (this is the âword meaningsâ source of truth)
Keep the Definitions section at the very top and short. Hereâs a good starting template:
# PROTOCOL.md â Context Budget Protocol (authoritative)
## Definitions (these words have specific meaning)
- TYPE: exactly one of {DECIDE, INFER, PLAN, ACT, EVAL} chosen per reply.
- STATE.md: the single ârouterâ file containing the current snapshot + pointers.
- EVIDENCE/<topic>.md: facts + hypotheses + discriminator for one inference.
- PLANS/<topic>.md: multi-step plan for one objective/subgoal.
- DECISIONS.md: append-only log of decisions (1â3 lines each).
- EVAL_LOG.md: append-only outcome log (short entries).
- MEMORY.md: durable evidence only (atomic bullets; no transcripts).
## Global constraints
- Context budget: aim 2kâ3k steady-state; avoid >4k.
- Never dump large content into chat; summarize and quote minimal lines only.
## Type selector rule
- Choose exactly one TYPE per reply.
- Default loop: DECIDE â ACT â EVAL.
- Only use INFER when uncertainty blocks action selection.
- Only use PLAN when the task is multi-step.
## TYPE: DECIDE
- Read: STATE.md (only)
- Write: DECISIONS.md (append), STATE.md (update Next action + Success check)
- Output: Next action + 1-line why + (optional) 1 question if blocked
## TYPE: INFER
- Read: STATE.md (optional) + EVIDENCE/<topic>.md
- Write: EVIDENCE/<topic>.md (update conclusion + discriminator)
- Output: hypotheses (2â4) + best guess + confidence + discriminator
## TYPE: PLAN
- Read: STATE.md
- Write: PLANS/<topic>.md + STATE.md (pointer + next action)
- Output: 3â7 step outline + next action only (no full expansion)
## TYPE: ACT
- Read: STATE.md (Next action)
- Write: STATE.md (Last action + Observed result summary)
- Output: do exactly one action (tool call OR one user request)
## TYPE: EVAL
- Read: STATE.md + (optional) relevant plan/evidence file
- Write: EVAL_LOG.md (append), STATE.md (next action), optional 1 bullet to MEMORY.md
- Output: success/partial/fail + next action + optional durable memory bullet
Why this will behave the way you want
- The injected
AGENTS.mdis small and stable, but it forces the tool-read ofPROTOCOL.mdat session start, so the model gets your vocabulary/semantics early in the session. PROTOCOL.mdbeing authoritative avoids the âLLM interprets it its own wayâ problemâbecause your rule removes discretion (âif you needâŚâ).
If you want, I can also give you minimal templates for STATE.md, EVIDENCE/TEMPLATE.md, and PLANS/TEMPLATE.md so the externalized files stay consistent and easy to rehydrate.
State the purpose of this context buffer protocol and what I does for this agent and any agent I build in the future.
Purpose of the context buffer protocol: make an agent rational and reliable when its attention buffer (LLM context window) is scarce and expensive.
What it does for this agent (Mac M4 / 16GB / Ollama 8B):
- Prevents prompt bloat: forces each turn to use only the minimum information needed, keeping you in the safe ~2kâ3k token operating envelope and reducing hangs.
- Improves consistency: by classifying each turn (DECIDE / INFER / PLAN / ACT / EVAL), the agent doesnât mix modes (e.g., planning + diagnosing + journaling in one response).
- Forces externalization: long plans, evidence, and logs live in files (STATE/PLANS/EVIDENCE/etc.), not in the prompt. The model only rehydrates what the current TYPE requires.
- Enables scalable memory: durable memory becomes âevidence that changes future decisions,â not transcript historyâso memory stays useful without growing the prompt.
- Creates a feedback loop: EVAL captures outcome â optionally promotes one durable constraint/lesson â future DECIDE/INFER improves.
What it does for any future agent you build:
- Itâs a hardware-agnostic operating system for agent cognition: even with bigger models, it keeps behavior clean; with smaller models, it prevents collapse.
- It gives you a portable structure: same files, same TYPE semantics, same rules for what to read/writeâso you can swap objectives, swap tools, even swap models/providers, without reinventing âhow the agent thinks.â
- It makes agents auditable and maintainable: decisions, plans, evidence, and evaluations are stored in predictable places, so you (and the agent) can inspect and resume work without reloading huge context.
In short: it turns âLLM as a big chatâ into âLLM as a bounded controller with an external state store,â which is the stable pattern for long-running, constraint-aware agents.
I notice sometimes my backend OLLAMA model uses GPU and other times CPU. Are certain kinds of tasks done better with GPU than CPU and vice versa? Explain
In Ollama, âGPU vs CPUâ isnât about different reasoning skillsâitâs almost entirely about performance + memory availability while doing the same core operation (token-by-token transformer inference).
What GPU is better at
- Throughput / speed: GPUs (Metal on Apple Silicon) are very good at the big dense linear algebra that dominates inference.
- Longer prompts can benefit more: attention/compute grows with prompt length, and GPUs handle that parallel work better if everything fits comfortably in memory.
What CPU is âbetterâ at (practically)
- Fallback when GPU memory is tight: if the model + KV cache (context) + overhead donât fit, or the system is under memory pressure, Ollama may run partially or fully on CPU.
- Stability under contention: if other apps are using GPU heavily, or the system is thermally/power constrained, CPU execution can be more consistent (but slower).
- Lower setup/dispatch overhead: for very small workloads, CPU can sometimes be âgood enough,â but for LLM generation itâs usually still slower overall.
Why you see it switch on the same machine
The most common reasons are:
- Different effective context (
num_ctx) / KV cache size between runs (large context = large KV = more memory pressure). - Concurrent model loads or other memory usage (your unified 16GB is shared by everything).
- System/GPU load changes (other apps using GPU, background tasks, etc.).
You can observe what Ollama decided with:
ollama ps
(look at the âProcessorâ column)
Does GPU vs CPU change answer quality?
Generally no (same model, same weights). You might see tiny numeric differences from floating-point math, but it shouldnât turn âgood reasoningâ into âbad reasoning.â The big difference youâll feel is latency and whether it hangs/thrashes under memory pressure.
Practical tip for your setup (Mac M4, 16GB)
Keeping your context window capped (like youâre doing) makes it more likely the model stays on GPU consistently, because the KV cache stays smaller and avoids memory pressure.
Given my context buffer protocol and my hardware constraints that I previously gave to you, what are the largest configuration values I should set in openclaw.json?
Given your Mac M4 + 16GB unified memory + Ollama 8B Q4 (safe working envelope ~2â3k tokens, âdonât exceed ~4kâ), the config values that most directly control âhow big prompts getâ are:
1) Model context + output (hard cap)
Largest Iâd set: contextWindow: 4096 for your Ollama model entry. (8192 might work sometimes, but your own analysis says instability past ~4k is the cliff.)
Also cap output so a single reply canât explode the KV cache:
- Largest Iâd set:
maxTokens: 512(maybe 1024 if you really need long replies)
(These live in your explicit models.providers.ollama.models[] entry.)
2) Injected workspace file size caps (so bootstrap files donât bloat every run)
OpenClaw truncates injected bootstrap files using:
agents.defaults.bootstrapMaxChars(default 20000 chars)agents.defaults.bootstrapTotalMaxChars(default 150000 chars)
Source:/concepts/context(shows these exact keys) https://docs.openclaw.ai/concepts/context
For your setup, largest Iâd set:
bootstrapMaxChars: 4000 (per injected file)bootstrapTotalMaxChars: 20000 (across all injected files)
This forces you (in a good way) to keep AGENTS.md/SOUL.md/TOOLS.md/... short.
3) Telegram history injection
Telegram has historyLimit (default example shows 50) and that history gets pulled into context.
Source: config reference Telegram section https://docs.openclaw.ai/gateway/configuration-reference
For your setup, largest Iâd set:
channels.telegram.historyLimit: 10 (maybe 20 if you like more continuity)
4) Tool schema bloat (big hidden token cost)
Tool schemas (especially browser) can be thousands of tokens every run. If youâre not using them, donât send them to the model.
Use tools.profile + tools.deny (deny wins).
Docs: https://docs.openclaw.ai/tools
For your setup (Telegram + file-based externalization), a good âmaximal but safeâ tool set is:
- allow:
group:fs,group:memory,group:messaging,group:sessions - deny:
group:web,group:ui(browser/canvas),group:nodes(unless you use nodes)
5) Compaction reserve floor (important with small context)
OpenClaw enforces a reserveTokens floor for embedded runs. Default floor is 20000, which is nonsensical if your real window is 4096.
Docs: https://docs.openclaw.ai/reference/session-management-compaction
Set:
agents.defaults.compaction.reserveTokensFloor: 0 (disable the huge floor)
Optionally disable the pre-compaction memory flush if you donât want surprise silent turns:agents.defaults.compaction.memoryFlush.enabled: false
Docs: https://docs.openclaw.ai/concepts/memory
Example openclaw.json snippet (strict JSON)
Merge this into your existing config (donât paste as a second top-level object):
{
"agents": {
"defaults": {
"bootstrapMaxChars": 4000,
"bootstrapTotalMaxChars": 20000,
"compaction": {
"reserveTokensFloor": 0,
"memoryFlush": { "enabled": false }
}
}
},
"channels": {
"telegram": {
"historyLimit": 10
}
},
"tools": {
"profile": "messaging",
"allow": ["group:fs", "group:memory", "group:sessions"],
"deny": ["group:web", "group:ui", "group:nodes"]
}
}
If you paste your current openclaw config get for tools, channels.telegram, and your models.providers.ollama block (redacting secrets), I can tell you exactly which of these you already have and what to changeâwithout guessing.
In this thread, you've provided recommendations related to "tools" in a number of places. List a very concise summary of your recs + reason why
-
Disable web + browser tools on small local models (
tools.deny: ["group:web","group:ui"]/ denybrowser)
Why: big tool schemas + risky prompt-injection surface + extra token/context pressure; improves reliability on 8B/16GB. -
Use a restrictive tool profile / allowlist (start from
tools.profile: "messaging"or similarly minimal; thentools.allowonly what you need)
Why: fewer tools = smaller system prompt + fewer failure modes/hallucinated tool calls. -
Keep file + memory tools only if youâre using externalized state (
group:fs,group:memory)
Why: supports your protocol (STATE/PLAN/EVIDENCE files) without loading everything into context. -
Deny nodes unless you actually use them (
tools.deny: ["group:nodes"])
Why: avoids extra schema/context + reduces risk/complexity. -
Donât rely on TOOLS.md as permissions (itâs guidance only)
Why: tool availability is controlled by config allow/deny/profile; TOOLS.md doesnât enforce anything. -
Avoid tool-heavy âalways do Xâ workflows (e.g., âread files every messageâ)
Why: repeated tool calls and large outputs bloat context and can stall Ollama; your 2â4k token envelope is tight.
- Your bullet 1: Will disabling web + browser tools on small local models prevent my agent from being able to do web related research? What if the objective of my agent is to perform research that occaisionally requires using the internet?
- Your bullet 2: If I build an agent that needs to: strive towards objective, create subgoals and contingent sequence of actions to accomplish, research from the web, write memory locally, and do something like construct various guides for a user locally, and etc, what might be the restrictive tool profile that I'd configure?
- Your bullet 3: No duh, I must use externalized state because I must keep my context window smaller on my machine given my hardware constraints. But I mean. Feel free to challenge my assumption here.
- Your bullet 4: What is nodes? What does denying them do? When do I know if I need to "actually use them"?
- Your bullet 5: If I rely on allow/deny/profile to specify tool availability, then what is the purpose of using TOOLS.md for guidance?
- Your bullet 6: Why or how would an "always do X" workflow cause "large outputs" tha bloat context?
- Yes â if you deny
group:webandbrowser, the agent wonât be able to do web research via tools.
But you usually donât need the browser tool for research anyway; most âinternet researchâ can be done withweb_search+web_fetch(fetch pages â summarize). The browser tool is for interactive UI automation and has a large schema/context cost.
If your objective sometimes needs the internet, a sane compromise on a small local model is:
- allow
web_search/web_fetch - deny
browser(andcanvasunless you need it) - consider sandboxing when web tools are enabled (your earlier security audit warning was about small models + web tools without sandboxing)
Docs on tool groups/profiles: https://docs.openclaw.ai/tools
- For the agent you described (objective â subgoals/plans â web research â write local files/memory â generate guides), Iâd start with:
- File ops:
group:fs(read/write/edit) - Memory tools:
group:memory(memory_search/memory_get) - Sessions tools:
group:sessions(optional, but useful for management)
- Web research:
group:web(web_search/web_fetch) - Runtime exec: only if you truly need it (
group:runtime)
Config sketch (tight but capable):
{
"tools": {
"profile": "coding",
"allow": ["group:web", "message"],
"deny": ["browser", "canvas", "group:nodes"]
}
}
Notes:
codingalready includesgroup:fs,group:runtime,group:sessions,group:memory(per docs). You can further denygroup:runtimeif you donât want shell commands at all.
- If youâre chatting via Telegram, you may not need
messageat all (replies go out through the channel), but itâs useful for proactive sends/structured delivery.
- Externalized state isnât just ânice to haveâ for your machine â itâs the stable pattern.
The only real alternative is: use a bigger-context / more reliable model backend (often cloud) or bigger hardware. With Ollama 8B + 16GB, âkeep everything in chatâ will eventually hit the KV-cache wall again. Externalization is the rational move.
- Nodes are paired companion devices/hosts that can run tools (think: ârun commands on another machineâ, âuse phone sensorsâ, âscreenshots/cameraâ, etc.). Denying
group:nodes:
- reduces tool/schema overhead in the prompt
- prevents the agent from attempting node operations youâre not using
You need nodes only if you installed/paired a node and you actually want capabilities from it (screen/camera/location/remote exec, etc.). If your setup is just âgateway + Telegram + local Ollamaâ, you can deny nodes.
- TOOLS.md is for human-maintained operational notes and conventions (e.g., âmy server is called home-serverâ, preferred voice, where logs live, naming, do/donât). It helps the model act consistently when it does use tools, but it does not grant/deny permissions. Permissions come from
tools.profile/allow/deny.
Given your tight context, keep TOOLS.md short (or move long stuff to a non-injected PROTOCOL.md and force-read it once per session).
- âAlways do Xâ causes large outputs because tool results and file reads become part of the session transcript/context. Examples:
- âRead yesterdayâs log every messageâ â you inject hundreds/thousands of tokens repeatedly.
- âRun status checks every turnâ â tool output accumulates; compaction kicks in; more instability.
Your protocol should be conditional: read/write only the one file needed for the current TYPE, and summarize tool outputs aggressively.
Useful command to verify where tokens are going: /context list and /context detail (docs: https://docs.openclaw.ai/concepts/context).
Tools.profile in openclaw.json is unrecognized
That error almost always comes from how itâs written in the JSON file.
In openclaw.json, tools.profile is not a literal key
This is wrong in the file (it creates an unknown key named "tools.profile"):
{ "tools.profile": "coding" }
In the config file it must be nested:
{
"tools": {
"profile": "coding"
}
}
(tools.profile is a valid CLI path for openclaw config set, but the file itself must be nested objects.)
Quick verify
Run:
openclaw config get tools
You should see something like:
{ "profile": "coding" }
If itâs still âunrecognizedâ
Then I need the exact error text + version because it could be a schema/version mismatch or a different nesting issue:
openclaw --version
openclaw doctor
âŚand paste the tools section from your ~/.openclaw/openclaw.json (just that part).
Docs reference for tool profiles: https://docs.openclaw.ai/tools
Show how to enable local embedded memory search. Will enabling this affect context window size?
To enable local (fully offline) memory search in OpenClaw youâre enabling the memory_search / memory_get tools backed by a local embeddings model + a local SQLite index. The Markdown files remain the source of truth (MEMORY.md + memory/YYYY-MM-DD.md). Docs: https://docs.openclaw.ai/concepts/memory
1) Config: enable local memory search (no API keys)
Add this to ~/.openclaw/openclaw.json (merge into your existing JSON):
{
"agents": {
"defaults": {
"memorySearch": {
"provider": "local",
"local": {
"modelPath": "/ABS/PATH/TO/EMBEDDING-MODEL.gguf"
},
"fallback": "none"
}
}
}
}
Notes from the docs:
agents.defaults.memorySearchis the right place (not top-levelmemorySearch).local.modelPathcan be a GGUF path or anhf:URI (docs mention both).- Setting
fallback: "none"avoids silently using remote embeddings if local fails. (From the docsâ âLocal modeâ notes.)
After editing config:
openclaw gateway restart
Local prerequisite
Local mode uses node-llama-cpp and âmay require pnpm approve-buildsâ (docs). If local embeddings donât come up, thatâs the first thing Iâd check.
2) Will this affect your context window size?
It does not change the modelâs contextWindow (thatâs still your Ollama model config).
But it can affect how much of the window gets used in two ways:
- Tool schemas: enabling memory search makes
memory_search/memory_getavailable to the model, which adds their tool schema overhead to the system prompt (usually small compared tobrowser, but non-zero). - When you call memory_search: the returned snippets are injected into the prompt for that run, so that run uses more tokens. (The index on disk does not consume context; only the retrieved snippets do.)
Practical implication with your tight budget: keep memory_search results small (few results) and avoid dumping large snippets; use it as a âfind the right file/lines, then memory_get a small rangeâ workflow.
If you paste your current agents.defaults.memorySearch section (or say âitâs absentâ), I can tell you exactly what you need to add vs what you already have.
wtf do I put for modelPath?
modelPath is not your chat model (llama3.1:8bâŚ). Itâs the path to a local embeddings model file (a .gguf) that OpenClaw uses to build/query the vector index for memory_search.
So you have two options:
Option A (local/offline): point to a GGUF file on disk
- Download an embedding model in GGUF format (small is good; donât use a huge chat LLM for embeddings on a 16GB machine).
- Put it somewhere stable, e.g.
~/.openclaw/models/embeddings.gguf~/Models/embeddings.gguf
- Set
modelPathto the absolute path to that file:
{
"agents": {
"defaults": {
"memorySearch": {
"provider": "local",
"fallback": "none",
"local": {
"modelPath": "/Users/YOURNAME/.openclaw/models/embeddings.gguf"
}
}
}
}
}
Docs (this is exactly what itâs asking for): memorySearch.local.modelPath (GGUF or hf: URI)
Source: https://docs.openclaw.ai/concepts/memory
Option B (no modelPath): use remote embeddings instead
If you donât want to deal with downloading a GGUF embeddings model, you can configure memorySearch.provider to a remote embedding provider (OpenAI/Gemini/Voyage/Mistral) and give it an API key. That avoids modelPath entirely, but isnât âlocalâ.
Quick sanity check
Whatever you choose, verify the file exists:
ls -lh /Users/YOURNAME/.openclaw/models/embeddings.gguf
If you tell me âI want offline/localâ vs âIâm fine with remoteâ, I can point you to the simplest next step for that path.
Iâm traveling with no wifi right now
If you have no WiâFi, then:
What still works (offline)
- Your agent can still read/write the Markdown memory files (
MEMORY.md,memory/YYYY-MM-DD.md) and you can retrieve stuff withmemory_get(or plain file reads) entirely offline. - This does not require embeddings.
What wonât work (unless you already prepared it)
memory_searchsemantic search needs an embeddings backend.- If itâs configured to use a remote embeddings provider (OpenAI/Gemini/etc.), it wonât work offline.
- If itâs configured to use local embeddings, it will work offline only if you already have the local embeddings model file on disk (the
.ggufyou pointmodelPathat).
So right now, if you donât already have an embeddings .gguf on your machine, you canât âturn on local semantic searchâ while offline (because you canât download the model).
Best offline workaround (practical)
- Use
memory_getto read the relevant memory files directly. - For âsearchâ, do simple keyword search locally (you can run this yourself in Terminal):
rg -n "your keyword" ~/.openclaw/workspace/MEMORY.md ~/.openclaw/workspace/memory/*.md
When you have internet again (prep local semantic search)
- Download an embeddings GGUF to a stable path (e.g.
~/.openclaw/models/embeddings.gguf) - Set:
{
"agents": {
"defaults": {
"memorySearch": {
"provider": "local",
"fallback": "none",
"local": { "modelPath": "/Users/YOU/.openclaw/models/embeddings.gguf" }
}
}
}
}
- Restart gateway:
openclaw gateway restart
If you want: tell me whether you currently get an error when calling memory_search (paste the error), and I can tell you whether youâre already partly configured (e.g., an index exists from earlier) or fully unconfigured.
contextWindow minimum is 16000, what should my max tokens value be
maxTokens is just the maximum length of the modelâs output. It doesnât cap how much context (prompt/history/tool schemas) gets sent in â it caps how long the reply is allowed to be.
Given:
- your real stability envelope is ~2â4k tokens of âworking stuffâ on a 16GB / Ollama 8B setup, and
- OpenClaw is forcing
contextWindow >= 16000(schema/catalog constraint),
Iâd set:
maxTokens: 512(default recommendation)- If you occasionally need longer writing:
maxTokens: 1024and rely on chunking / writing to files rather than huge single replies. - I would avoid 2048+ on your setup unless youâre deliberately doing a long-form generation and youâve kept the prompt extremely small.
Rule of thumb (safe): maxTokens â contextWindow / 32 â 16000/32 â 500 â ~512.
Also: if the goal is to stop Ollama hangs, the bigger lever is still keeping prompt small (bootstrap file sizes, Telegram historyLimit, tool schema count) â maxTokens mainly prevents the reply from exploding.
Can ollama/llama3.1:8b-instruct q4 execute tools?
Yes â it can, if Ollama reports that model as tool-capable.
In OpenClaw, tool calling for Ollama isnât determined by â8Bâ or âq4â so much as whether that specific model advertises tools support via Ollamaâs metadata. OpenClaw will then use Ollamaâs native API (/api/chat) to do streaming + tool calls (donât use the /v1 OpenAI-compatible endpoint for this; tool calling is unreliable there).
How to confirm on your machine:
- In OpenClaw (best signal):
openclaw models list | rg "ollama/llama3.1:8b-instruct-q4"
If it shows something like tools: yes (or similar âyesâ columns), OpenClaw will attempt tool calls with it.
- In Ollama directly (metadata):
ollama show llama3.1:8b-instruct-q4_K_M
One caveat: even when tool-capable, small local models can be flaky at tool calling (stalling, or outputting tool JSON as plain text). If you see that behavior, the fix is usually to (a) reduce tool surface (deny browser/web unless needed) and/or (b) switch to a more tool-reliable model/provider for tool-heavy work.
Thatâs normal for an 8B local model. Tool-capable â tool-reliable, and âfactual groundingâ is still weak compared to bigger/cloud models.
What youâre seeing is usually one (or more) of these:
Why it fabricates file reads / invents ârulesâ
-
Itâs trying to be helpful under uncertainty.
Small models often fill gaps with plausible-sounding text instead of saying âI donât know / I didnât read that.â -
The workspace injection is long + noisy.
If AGENTS/SOUL/TOOLS/etc are big, the model loses track of exact wording and paraphrases or invents. Use/context listto see whatâs being injected (docs: https://docs.openclaw.ai/concepts/context). -
Itâs not actually calling the
readtool, but it roleplays that it did.
The only real âfile readâ is a tool call. Everything else is just generated text. -
Temperature / sampling can amplify confabulation.
Even at low temp, small models can still confabulate, but high temp makes it worse.
How to fix it (practical, works)
A) Add an explicit âno fake readsâ rule to AGENTS.md
Put this near the top of your injected rules:
- Never claim you read a file unless you used a tool call to read it in this session.
- If you need file content, request a read (or ask the user to paste it).
- When referencing workspace files (AGENTS.md/SOUL.md/USER.md/etc), quote the exact line(s) you are using or say âIâm paraphrasingâ.
This doesnât make it perfect, but it dramatically reduces âI read XâŚâ lies.
B) Keep injected files short
Even with bootstrapMaxChars, the model can still get confused. For your hardware, keep AGENTS/SOUL/TOOLS/USER as tiny and move detail to non-injected files (then explicitly read them).
C) Reduce tool surface area
The more tools you expose, the more chances it has to âinventâ a tool narrative. For your âexternalize state + research sometimesâ agent, a good starting tool set is:
- allow:
group:fs,group:memory,group:web - deny:
browser(and oftencanvas/group:nodes)
(Docs on tool groups/profiles: https://docs.openclaw.ai/tools)
D) Verify whether it actually called tools
Turn on verbose mode temporarily and watch for tool calls/results in the Control UI (or logs). If it âclaimsâ it read a file but no tool call happened, itâs confabulation.
One quick diagnostic question
When it âfabricated a file readâ, did you see any read tool call in the Control UI / logs at that moment? If you paste 10â20 lines around that event, I can tell you whether OpenClaw ever attempted the tool call or if it was pure hallucination.
Are certain LLM models more effective at letâs say, logical inference than letâs say, planning a contingent sequence of moves towards an objective? And vice versa
Yes. Different models (and even different variants/quantizations of the same model) tend to have different âstrength profilesâ across things like tight logical inference vs multi-step planning vs tool use.
Why this happens (high level)
- Training mix: some models saw more math/logic/coding tasks; others saw more instruction-following or long-form writing.
- Context + attention: long-horizon planning is very sensitive to context window quality and how well the model keeps constraints consistent across many steps.
- Reasoning style: some models are better at âcareful step-by-stepâ and uncertainty; others are better at producing plausible action plans.
- Tool reliability: some models are much better at generating correct tool calls and not âroleplayingâ results.
Typical pattern
- Logical inference / precise deduction: models that are strong at math/coding and disciplined reasoning tend to do better. Smaller local models can do OK on short inference tasks if the prompt is tight, but they often hallucinate premises when the task is underspecified.
- Contingent planning toward objectives: planning stresses consistency, memory, and âdonât lose the plot.â Bigger models usually win here, especially when plans span many steps and require updating based on results.
- Tool-using agents: reliability matters more than raw âIQ.â Some models are great at prose but terrible at tool calling; some are the opposite.
Your specific setup note (Ollama llama3.1 8B Q4 on 16GB)
- It can do short inference fine when you constrain the prompt.
- It is much more likely to struggle with long-horizon planning and may confabulate âI read X / I did Yâ unless you hard-enforce tool discipline and keep context tiny.
- The biggest enemy isnât just âplanning abilityâ â itâs context pressure + confabulation under uncertainty, which your context buffer protocol is designed to mitigate.