#Setting up - Help a beginner
If your main goal is OpenClaw + Telegram + basic "assistant-y" stuff, a Mac mini is a totally reasonable always-on box, but 16GB of unified memory becomes the main bottleneck the moment you try to run a local LLM and keep the gateway/agent tooling comfy.
Practical reality check on "local LLM to save tokens"
- You can run local models on a Mac mini, but for OpenClaw-style "agentic" work (tool calling, multi-step tasks, reliability), small local models often feel flaky compared to good cloud models.
- So the best experience for most people is hybrid:
  - cheap cloud model for routine chat/heartbeats
  - stronger model only when you actually ask it to do real work
  - optional: local model for "offline Q&A / quick drafts", not mission-critical tool use
(For up-to-date model recommendations, ask in #models; things change fast. Docs: https://docs.openclaw.ai/concepts/models)
If you keep the Mac mini: what upgrades matter?
RAM is king. SSD is second.
- 32GB RAM: the "minimum I'd personally recommend" if local LLM is a real goal.
- 64GB RAM: starts to feel comfortable for bigger local models and longer contexts.
- 256GB SSD: workable, but you'll run out surprisingly fast (models + caches + dev stuff). I'd aim for 512GB or 1TB if you can.
With 16GB, you'll likely be limited to small quantized models and shorter contexts, and macOS will lean on swap when you multitask (which feels bad on an always-on box).
If you're open to non-Mac options (still "small box by the router")
A small Linux mini-PC can be a great always-on gateway host, and you can keep your "main computer" separate. GPU-based local LLM setups get expensive/complicated fast though (and aren't as discreet once cooling/power enters the picture).
What I'd do in your shoes (rookie-friendly path)
- Don't return the Mac solely for local LLM dreams unless you're ready to pay for more RAM.
- Get OpenClaw stable + useful with a cheap cloud model + spending limits first.
- Once you're comfortable, experiment with local inference knowing it's a tradeoff.
If you tell me what you mean by "basic tasks" (just chat? summaries? web browsing? coding? home automation?), I can sanity-check whether 16GB is likely to frustrate you or not. Also: you'll get real "what people run at home" anecdotes in #users-helping-users.
The point is to eventually run different agents, each specialized in different tasks and learning about different topics, so they can work together to build software that I can sell. That's aside from checking, summarizing and replying to personal and work e-mails, doing online tasks for me, etc.
Commercializing access to agents running simple daily-use task bots for other people who don't have the patience or knowledge to set up OpenClaw is another possibility. I just want to be sure whether I should keep this Mac mini for now, as it will still serve me for a while, or return it while I can still get a 100% refund and move on to another one.
Given what you want long-term, it helps to separate two goals:
1) Learning + personal assistant (email, online tasks, "help me build stuff")
For this, the Mac mini M4 16GB / 256GB is totally fine as an always-on OpenClaw box if you're OK using a cloud model for the "thinking".
What will frustrate you on 16GB is specifically: "run a local LLM + run the gateway + do browser/tool automation + multitask" all on the same machine. It'll work for small local models, but you'll hit limits fast (swap, slowdowns, short contexts).
If your priority is "get productive, learn, build prototypes": keep the Mac, and control spend by:
- using a cheaper model for day-to-day chatter/background tasks
- reserving "strong" models for big jobs
- putting hard spend limits on API keys
(See: https://docs.openclaw.ai/concepts/models and, for current community picks, ask in #models.)
2) "I might sell this / host agents for other people"
That's a completely different hardware and architecture problem.
The hard parts aren't just "can it run locally?"; they're:
- security isolation between users (multi-tenant risk)
- handling other people's email / accounts (privacy, compliance, incident response)
- uptime/monitoring/backups
- billing + rate limiting + abuse handling
- model/provider Terms (many providers have rules around reselling / offering access)
A single Mac mini under your router is fine for your assistant, but it's not the direction I'd bet on for a hosted product.
So should you return it?
If you want a simple decision rule:
- Keeping it makes sense if: your next 1-3 months are "learn OpenClaw, build flows for yourself, prototype agents", and you're OK with cloud models for the heavy lifting.
- Return/upgrade makes sense if: you're committed to local LLM as a core requirement soon. In that case I'd strongly consider 32GB RAM minimum (Apple RAM isn't upgradeable), and ideally 512GB+ storage (or plan on an external SSD).
If you answer these 3 questions, I can be much more concrete:
- Do you want the local model to be the default brain, or just a "cheap fallback"?
- Will the Mac be doing browser automation/tasks too (so it needs headroom), or just running the gateway?
- What's your realistic budget ceiling for RAM/SSD upgrades?
1 - Local LLM as capable as possible, as I believe that will save me API costs in the long run. Why? Because yes, I can set up daily limits on the API, but that will just mean I take longer to reach the same goal, as the amount of transmissions and content (and thus the "price") won't change. With a decent local LLM, that cost doesn't exist in the long run and would most likely offset the higher initial PC cost.
2 - Knowing from 1 that a local LLM is the main point, the Mac will be doing as much as possible.
3 - Around the $2000 mark for the full setup
If "local LLM as capable as possible" is non-negotiable, then I would not keep a 16GB / 256GB Mac mini. You'll hit the wall quickly once you run: gateway + browser/tooling + a decent local model + multiple agent sessions.
Also one key correction: local inference isn't truly "free" long-run. You pay in hardware, power, and (big one) time/latency. And for agentic tool use, the "cheapest" model is often the one that actually succeeds in fewer steps. So even local-first people usually keep a cloud fallback for hard jobs.
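To put rough numbers on that trade-off, here is a minimal sketch of the break-even arithmetic. Every figure in it ($1000 of extra hardware, $40 a month of avoided API spend, $5 a month of extra electricity) is a made-up placeholder, not a measurement, so plug in your own.

```shell
# Months until extra hardware spend is paid back by avoided API costs.
# All inputs are hypothetical placeholders; substitute your own numbers.
#   $1 = extra hardware cost
#   $2 = monthly API spend you expect to avoid
#   $3 = extra monthly power cost of running local inference
breakeven_months() {
  awk -v hw="$1" -v api="$2" -v power="$3" 'BEGIN {
    saved = api - power                 # net saving per month
    if (saved <= 0) { print "never"; exit }
    printf "%.1f\n", hw / saved
  }'
}

breakeven_months 1000 40 5    # prints 28.6
breakeven_months 1000 5 5     # prints never
```

This leaves out the time/latency cost and any hard jobs that still go to a cloud fallback, so the real break-even point is likely later than the number it prints.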
That said, with a ~$2000 budget, here are the realistic best paths:
Option A (simplest/quiet/discreet): stay Apple, buy RAM
If you want a small silent box by the router, Apple Silicon is great if you buy enough unified memory.
- RAM: I'd target 64GB if you can, otherwise 32GB minimum.
- SSD: 512GB+ strongly recommended (models + caches + dev stuff add up fast). External SSD is fine too, but internal still helps.
- CPU upgrades: usually less important than RAM for your stated goal.
This is the "least hassle" setup, and very newbie-friendly.
Option B (most capability per $ for local LLM): small PC + NVIDIA GPU (Linux)
If you're not a fan of macOS and want maximum local capability per dollar:
- Aim for NVIDIA (easiest path for LLM tooling).
- VRAM matters more than CPU: 16GB VRAM is the floor, 24GB VRAM is the sweet spot for "capable local" before things get awkward.
- Tradeoff: it's usually bigger/noisier than a Mac mini, and "discreet living room box" gets harder.
This is the "best performance" route, but it's also the most DIY.
My blunt recommendation based on your answers
- If you want discreet + low-maintenance: return the 16GB Mac mini and rebuy with more RAM (32GB/64GB) and ideally more SSD.
- If you want maximum local LLM capability per $ and don't love macOS anyway: consider a small Linux + NVIDIA box and treat it as your always-on "agent server".
If you want, tell me which matters more: silent/discreet vs max capability, and whether you're OK with a "two-box" setup (Mac mini for OpenClaw + separate GPU box for inference). Then I can give a much tighter "buy this class of machine" recommendation without hand-waving.
(And for real-world "what people actually run", ask in #users-helping-users too; you'll get concrete builds and noise/thermals feedback.)
At this stage, discreet interests me more. Returning the 16/256 and going for a 64/512 seems like the best compromise.
Yeah, if "discreet single box" is the priority, 64GB RAM / 512GB SSD is a solid compromise and (importantly) avoids the one thing you can't fix later: RAM.
A few practical notes so you don't get surprised later:
- 64GB is the real win for local LLM on Apple Silicon. It gives you headroom for: gateway + browser/tool automation + longer contexts + not living in swap.
- 512GB is the minimum I'd be comfortable with if this is going to be your "always-on agent box". Models, caches, logs, and general dev stuff add up fast.
  If 1TB is in budget, it's nice, but you can also add a fast external SSD later for model storage.
- Even with 64GB, I'd still plan a cloud fallback for the occasional "hard" task (local can be great, but tool reliability is the difference between finishing in 3 steps vs 30).
If you go this route and want to sanity-check performance before the return window closes: run your intended local inference stack + OpenClaw for a day, watch memory pressure / swap, and see if it feels "snappy enough" for you. That'll make the decision feel a lot less theoretical.
Sounds good! I'll keep the thread open in case someone who has gone or is going through the same thing can share their input as well.
Makes sense.
If you want more "people who run this at home" input, it's worth also asking in #users-helping-users; you'll get better anecdotes on noise/thermals, "did I regret 32 vs 64", external SSD setups, etc.
If you end up going 64/512 and want a quick sanity checklist for first-week setup (keeping it stable + keeping costs down), drop a note here and I'll outline one.
Feel free to, I'm all ears
Here's a "first week" checklist I'd follow on a Mac mini that's meant to sit quietly by the router (stable + not burning money). I'm keeping it practical / rookie-friendly.
Day 0: Baseline sanity (saves you hours later)
- Record versions: run openclaw --version and note it somewhere.
- Make sure the gateway survives reboots: after you restart the Mac, confirm it comes back and still replies on Telegram.
- Know the 3 commands you'll use most (service control):
openclaw gateway status
openclaw gateway restart
openclaw gateway stop (or start)
Docs index: https://docs.openclaw.ai/help/faq (tons of "real-world setup" answers)
Day 1: Make it stable as an always-on box
- Disable sleep (sleep is the #1 "why did it stop replying?" issue on home Macs).
- Prefer Ethernet over Wi-Fi if possible (less random disconnect pain).
- Bookmark troubleshooting + log locations so you're not guessing later:
https://docs.openclaw.ai/gateway/troubleshooting
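For the "disable sleep" item, one possible sketch uses macOS's built-in pmset tool. The choice of settings below is my assumption about what an always-on box wants, not something from the OpenClaw docs. The function only prints the commands by default so you can review them before applying anything.

```shell
# Dry run by default: prints the pmset commands instead of running them.
# Pass --apply to actually run them with sudo (macOS only).
always_on_power() {
  if [ "${1:-}" = "--apply" ]; then run="sudo"; else run="echo"; fi
  $run pmset -a sleep 0         # never put the system to sleep
  $run pmset -a disksleep 0     # never sleep the disk
  $run pmset -a autorestart 1   # power back on after a power failure
}

always_on_power                 # review the printed commands first
```

After applying, pmset -g shows the active power settings so you can confirm they stuck.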
Day 2: Lock down access (especially if it touches email later)
- In Telegram (and any channel), make sure you're using allowlists / restricted access so random people can't DM your bot. (FAQ covers Telegram allowFrom, group behavior, etc.)
- Don't expose the dashboard/gateway openly to the internet unless you really know what you're doing. If you need remote access, prefer a private network approach (VPN/Tailscale-style).
Day 3: Cost control that actually moves the needle
Even if you go local-first, you'll still want controls so you don't accidentally burn tokens when you do use a hosted model.
- Set up two tiers mentally:
  - "cheap/default" for routine chat + background
  - "strong/fallback" for hard tasks
- Keep contexts small on purpose:
  - Use /new when you switch topics (prevents "context snowball")
  - If you hit context issues, use the reset/compact guidance in the FAQ sections on sessions/context limits
Models docs (how selection/defaults/failover work): https://docs.openclaw.ai/concepts/models
Day 4: Local LLM reality check (before you commit)
- Run your local inference stack + OpenClaw for a couple hours doing real tasks (not benchmarks).
- Watch for: slowdowns, swap usage, overheating/throttling, and whether tool-using tasks "stick the landing".
- If it's flaky: keep local as default for casual chat, but keep that cloud fallback for anything important.
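For the swap part of that check, here is a small sketch that pulls the "used" figure out of the line macOS prints for sysctl vm.swapusage. The sample values are invented, and the helper reads stdin so it can be tried on a saved line first.

```shell
# Extract the "used" swap figure (in MB) from `sysctl vm.swapusage` output.
# Reads the sysctl line on stdin so it can be tried on a saved sample.
swap_used_mb() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i == "used") { v = $(i + 2); sub(/M$/, "", v); print v }
  }'
}

# Sample line in the format macOS prints (the values here are made up):
echo 'vm.swapusage: total = 2048.00M  used = 1234.50M  free = 813.50M  (encrypted)' | swap_used_mb
# On the Mac itself: sysctl vm.swapusage | swap_used_mb
```

A number that keeps climbing while the local model is answering is the sign that the model plus everything else no longer fits in memory.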
Day 5-7: Backup + "oops recovery"
- Figure out what folder OpenClaw stores its data in on your machine and back it up (FAQ has a "where things live on disk" section).
- Make sure you can recover from: a broken config edit, a bad skill, or a stuck session (FAQ + troubleshooting cover common resets).
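Once you know which folder that is, the backup itself can be as simple as a timestamped archive. This is a generic sketch: the paths in the usage comment are placeholders, not OpenClaw's actual data location, so check the FAQ for the real one.

```shell
# Create a timestamped .tar.gz of a directory and print the archive path.
#   $1 = directory to back up (placeholder: wherever OpenClaw keeps its data)
#   $2 = directory to write the archive into
backup_dir() {
  src="$1"; dest="$2"
  [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
  mkdir -p "$dest" || return 1
  archive="$dest/backup-$(date +%Y%m%d-%H%M%S).tar.gz"
  # -C switches to the parent so the archive stores a relative path
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  echo "$archive"
}

# Usage (both paths are hypothetical):
#   backup_dir "$HOME/path/to/openclaw-data" "$HOME/Backups"
```

Restoring one of these archives into a scratch folder once is a cheap way to confirm the backup is actually usable.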
If you tell me which local inference route you're planning (e.g., Ollama vs llama.cpp vs something else) and whether you'll run it on the same box, I can tighten this into a more specific "do these exact checks" plan, without going into deep sysadmin land.
I've tried to set up Ollama locally but it wouldn't work and would fall back straight to the Google API. Is it a known issue?
Usually not a "known Ollama is broken" issue. It's OpenClaw doing normal failover because it can't see/use Ollama, so it falls back to the next provider you have configured (Google in your case).
The 3 most common causes:
- Ollama isn't actually reachable from where the Gateway runs
  - Quick check (should return JSON):
curl http://localhost:11434/api/tags
- Ollama provider isn't enabled (super common on macOS service installs)
  - If you only did export OLLAMA_API_KEY=... in your terminal, but the Gateway is running as a launchd service, it often won't inherit that env var, so OpenClaw won't auto-discover Ollama and the fallback triggers.
  - Fix: set it in config instead (docs show this), then restart the gateway.
- No models are being discovered
  - OpenClaw's Ollama auto-discovery only keeps models that report tool support, so you can end up with "0 Ollama models available" even though ollama list shows models installed; then it falls back.
Docs (worth skimming, especially the "/v1" warning + discovery rules):
https://docs.openclaw.ai/providers/ollama
Minimal debug checklist
Can you paste the output of:
ollama list
openclaw models list
openclaw gateway status
That will usually make it obvious whether it's (a) env/config not applied to the service, (b) wrong base URL (e.g. accidentally using /v1), or (c) discovery filtering out your installed model(s).
I will as soon as possible
Cool. Whenever you get a minute, paste those outputs and we'll pin it down.
If you're running the Gateway as a macOS service, the most likely culprit is the env var not being seen by launchd. So if you want a quick thing to try in the meantime: follow the "Enable Ollama for OpenClaw" step on the Ollama provider page using the config method (not just export ...), then run openclaw gateway restart.
Still on the Mac Mini M4 topic, why exactly have you not elaborated on VRAM?
Because on a Mac mini (Apple Silicon) VRAM isn't a separate thing you can pick the way you would on an NVIDIA/AMD GPU PC.
What's different on Macs
- Apple Silicon uses unified memory: the CPU and GPU share the same pool of RAM.
- So the "VRAM" the GPU uses is basically a slice of your unified memory, allocated dynamically by the OS (plus some driver/OS reservations). You can't buy "more VRAM" without buying more unified memory.
Why I kept talking about RAM instead
For local LLMs, the hard requirement is "do the model weights + KV cache fit in fast memory?"
- On a PC with a discrete GPU: that fast memory is VRAM, so VRAM is the headline spec.
- On a Mac: that fast memory is unified memory, so 64GB unified memory effectively is your "VRAM budget" too.
Practical takeaway
If you're staying Mac mini and you care about local LLM capability, the knob that matters most is:
- Unified memory (32GB/64GB): directly increases what the GPU can hold (and what your system can do without swapping)
If you switch to a GPU PC, then we'd talk VRAM first (because you'll feel the difference between 16GB vs 24GB VRAM immediately for local inference).
OK, let's start fresh. Instruct me step by step on how to install the overall best possible free local LLM to feed OpenClaw on a Mac Mini M4 16/256 that's currently working with an external API.
I can walk you through this cleanly. Two quick constraints up front:
- On a Mac mini M4 16GB, "best possible" local LLM really means best that fits + runs without swapping. You will be limited to smaller/quantized options.
- "Free" models still have licenses. If you plan to sell something, you must pick a model whose license allows commercial use.
Also: I don't actually see the screenshot attached here. Please re-upload the pic (the OpenClaw model/provider settings page / config view) so I can make sure we're matching your current setup.
That said, here's the step-by-step "fresh start" path that works for most people on macOS, using Ollama (simplest integration):
Step 1 - Verify your Gateway is healthy
In Terminal:
openclaw --version
openclaw gateway status
If gateway status isn't "running / RPC probe ok", stop here and we fix that first.
Step 2 - Install & start Ollama
- Install Ollama: https://ollama.ai
- Make sure it's running (Ollama app running, or service running).
- Verify the API responds:
curl http://localhost:11434/api/tags
If that fails, OpenClaw can't use Ollama yet.
Step 3 - Pull a tool-capable model in Ollama
OpenClaw's auto-discovery only lists Ollama models that report tool support.
So:
- Go to the OpenClaw Ollama provider doc and pick a model from their "Quick start" list (they keep this current):
https://docs.openclaw.ai/providers/ollama
- Run:
ollama pull <model-id-you-picked>
Then confirm:
ollama list
Step 4 - Enable Ollama in OpenClaw (IMPORTANT on macOS services)
If your Gateway runs as a macOS service, don't rely on export OLLAMA_API_KEY=... in your shell (launchd often won't see it).
Do this (any value works; Ollama doesn't require a real key):
openclaw config set models.providers.ollama.apiKey "ollama-local"
Then restart the gateway:
openclaw gateway restart
(These exact keys/commands are from the provider doc above.)
Step 5 - Confirm OpenClaw sees Ollama models
Run:
openclaw models list
You should see models with IDs like ollama/....
If you see none, it's usually one of:
- Ollama not reachable (curl fails)
- you defined models.providers.ollama explicitly (disables auto-discovery) but didn't define models
- the model you pulled doesn't report tool support, so it gets filtered out
Step 6 - Set the default agent model to Ollama
Once openclaw models list shows at least one ollama/... model, set your agent default to it (docs example):
- set agents.defaults.model.primary to the ollama/... model id you want
- restart gateway again
(If you paste your screenshot / current config, I'll tell you exactly what to change so it doesn't keep falling back to Google.)
Next message from you (so I can make this 100% precise)
Please paste the output of:
curl http://localhost:11434/api/tags
ollama list
openclaw models list
openclaw gateway status
...and re-upload the screenshot you mentioned. Then I'll tell you exactly why it's falling back and which step is missing.
Step 1 - 1) v2026.3.2
2)Runtime: running (pid 548, state active)
RPC probe: ok
Perfect. Gateway is up and reachable (RPC probe: ok).
Step 2 - Install + start Ollama, then confirm it's reachable
2.1 Install Ollama
- Install from: https://ollama.ai
- After install, open the Ollama app once (so the background service starts).
2.2 Confirm the Ollama service is running
In Terminal, run:
- Does the API respond?
curl http://localhost:11434/api/tags
What I'm looking for:
- It should return JSON (even if the model list is empty).
- If you get "connection refused" / can't connect, Ollama isn't running yet.
- Do you have any models installed yet?
ollama list
If ollama says "command not found", your PATH isn't picking it up (we'll fix that), but try the curl first since that's the real "is it running" test.
Reply with the output of those two commands and we'll go to Step 3 (pulling a tool-capable model and wiring OpenClaw to it).
Installing Ollama to /Applications...
Starting Ollama...
Unable to find application named 'Ollama'
That output means "the installer tried to launch the Ollama macOS app, but macOS couldn't find Ollama.app".
This usually happens in one of these cases:
- the app didn't actually get copied to /Applications, or
- you installed only the CLI (e.g. via Homebrew), so there is no GUI app to launch, or
- the app is somewhere else (like ~/Applications).
Let's diagnose in the simplest way.
Step 2A - Check if the Ollama app exists
In Terminal, run:
ls -ld /Applications/Ollama.app
If it exists
Start it explicitly:
open /Applications/Ollama.app
If macOS blocks it, go to System Settings > Privacy & Security and allow it ("Open Anyway"), then try again.
If it does not exist
Then either the install failed or you only have the CLI. Let's check if the CLI is installed:
which ollama
ollama --version
- If ollama --version works: great, we can run it without the GUI app. Start the server manually:
ollama serve
(Leave that terminal window open for now.)
- If ollama is not found: re-install via the official mac download (drag into Applications):
https://ollama.ai
Step 2B - Confirm the server is reachable
Once Ollama is running (GUI app open or ollama serve running), run:
curl http://localhost:11434/api/tags
Paste the outputs of:
ls -ld /Applications/Ollama.app
which ollama
ollama --version
curl http://localhost:11434/api/tags
...and we'll move to Step 3 (pull a model + make OpenClaw use it).
Step 2B - aiverse@Jarbass-Mac-mini ~ % curl http://localhost:11434/api/tags
{"models":[]}% aiverse@Jarbass-Mac-mini ~
Nice, that means Ollama is running and reachable; it just has zero models installed yet ("models":[]).
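If you'd rather have a number than eyeball the JSON, a rough sketch like this counts the models in an /api/tags response. It assumes each model entry carries exactly one "name" key in compact JSON, which matches the responses I've seen but is an assumption here. It's a deliberately crude count, not a real JSON parser.

```shell
# Count the models reported in an Ollama /api/tags response read from stdin.
# Crude on purpose: counts "name": keys rather than parsing the JSON fully.
count_ollama_models() {
  grep -o '"name":' | wc -l | tr -d ' '
}

echo '{"models":[]}' | count_ollama_models    # prints 0
# Live check: curl -s http://localhost:11434/api/tags | count_ollama_models
```

Once a pull finishes, the live check should print 1 or more.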
Step 3 - Install (pull) a model into Ollama
3.1 Pick a tool-capable model (important for OpenClaw)
OpenClaw's Ollama auto-discovery only lists models that report tool support.
So: open this page and pick one model from the "Quick start" list (they keep it current):
https://docs.openclaw.ai/providers/ollama
On your 16GB Mac, I recommend you start with the smallest option from that list first (just to validate the whole pipeline). You can always try bigger after it's working.
3.2 Pull it
Run:
ollama pull <model-id-you-picked>
3.3 Confirm Ollama now sees it
Run:
ollama list
curl http://localhost:11434/api/tags
At that point, api/tags should show at least one model.
Reply with the output of ollama list (just that), and we'll do Step 4: configure OpenClaw to actually use ollama/<that-model> as the default (and optionally stop it from falling back to Google during testing).
Which of these is the most adequate for my computer (using it solely for this and nothing else)
ollama pull gpt-oss:20b
or
ollama pull llama3.3
or
ollama pull qwen2.5-coder:32b
or
ollama pull deepseek-r1:32b
On a Mac mini M4 with 16GB unified memory, that exact list is (bluntly) way too heavy except maybe the 20B one, and even that is likely to be slow / memory-tight once you add OpenClaw + tool use.
Here's the breakdown:
- llama3.3 - that's the 70B model. Not realistic on 16GB (it'll either fail to run or swap itself to death).
- qwen2.5-coder:32b - 32B is also not realistic on 16GB for the same reason.
- deepseek-r1:32b - same story: 32B is too big for 16GB.
- gpt-oss:20b - the only one from your list I'd even try on 16GB, but expect it to be tight (and if it runs, it may still be painfully slow once contexts/tool calls grow).
What I recommend for your machine (so it actually works)
Start with a tool-capable model in the ~7B-10B range first (something with an explicit small size tag like :7b / :8b / :9b), get OpenClaw working end-to-end, then try bigger if you want.
Those 20B/32B/70B pulls are great targets for a 32GB/64GB Mac mini, but they're not a good first move on 16GB/256GB.
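The reasoning behind that breakdown is simple arithmetic, sketched below. It assumes roughly 4 bits per weight, which is my assumption about a typical quantized download (actual files vary), and it counts the weights only, so treat the result as a lower bound.

```shell
# Rough size of the model weights alone, in GB:
#   billions of parameters * bits per weight / 8
# Ignores the KV cache, macOS, the gateway and the browser, so the real
# memory need is higher than what this prints.
weights_gb() {
  awk -v params_b="$1" -v bits="$2" 'BEGIN { printf "%.1f\n", params_b * bits / 8 }'
}

# Assuming 4-bit quantization (an assumption; actual downloads vary):
for size in 9 20 32 70; do
  echo "${size}B -> $(weights_gb "$size" 4) GB"
done
```

On that estimate a 32B model already needs about 16 GB for weights alone, which is the whole machine, while a 9B model leaves room for everything else.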
If you still want to proceed right now using only the four commands you posted: pick gpt-oss:20b as the "least bad" option.
Tell me which route you want:
- "Make it work reliably on 16GB" (I'll point you to a small-size pull command style), or
- "Try the biggest thing that might run" (you pull gpt-oss:20b and we see if it's usable).
Yeah but those are the only ones that pop up on
https://docs.openclaw.ai/providers/ollama
Yep. That page is showing examples, not "the only models that exist", and (annoyingly) the examples skew big.
For a Mac mini 16GB / 256GB, from that specific list:
- llama3.3 - this is the 70B model. Not suitable on 16GB.
- qwen2.5-coder:32b - 32B, also not suitable on 16GB.
- deepseek-r1:32b - 32B, also not suitable on 16GB.
- gpt-oss:20b - the only one that's even remotely plausible on 16GB (and even then, expect it to be tight/slow once OpenClaw + contexts grow).
So if you want the simplest path that's consistent with the OpenClaw docs (and avoids manual model config): pull the 20B one first and see if it's usable:
ollama pull gpt-oss:20b
ollama list
If that ends up too slow or you hit memory pressure, the next step is: pick a smaller Ollama model (not on that OpenClaw page), and we can still use it with OpenClaw, but it may require explicit model config (and some smaller models won't be as reliable at tool calling).
If you run the pull, paste ollama list after it finishes and we'll continue to Step 4 (making OpenClaw actually use it and preventing Google fallback while testing).
Or, alternatively, give me smaller model names and we'll go from there right away
I can't give you a list of specific "small model names" to pull here; those recommendations go stale really fast (and I'd rather not send you on a 30GB download chase that turns out to be the wrong pick). The best place for current "this works well on 16GB" picks is #models.
What I can do is get you unstuck immediately with a reliable selection method:
How to pick a smaller Ollama model for a 16GB Mac (that actually works with OpenClaw)
1) Go to the Ollama library and pick something explicitly small (look for tags like :7b / :8b / :9b):
https://ollama.com/library
2) Pull it:
ollama pull <model-id>
3) Check that Ollama sees it:
ollama list
4) Check whether OpenClaw auto-discovers it (tool-capable filter):
openclaw models list
- If it shows up as ollama/...: great, we can set it as your default and you'll stay local.
- If it doesn't show up: that model likely doesn't report tool support, so OpenClaw hides it during auto-discovery. Then you have two choices:
  - pick a different model (repeat steps 1-4), or
  - we switch to explicit model config to force it (works, but tool calling may be weaker depending on the model).
If you tell me which size tag you want to target (7B vs 8B vs 9B) and whether you care more about general chat vs coding, I'll tell you exactly what to look for on the library page (without guessing names).
Pulling ollama qwen3.5:9b
Good pick for a 16GB box; 9B is in the "should actually run" range.
Once the pull finishes, do these in order:
Step 3.4 - Confirm Ollama sees the model
ollama list
curl http://localhost:11434/api/tags
Step 4 - Make OpenClaw see/use Ollama
4.1 Enable the Ollama provider (service-safe)
openclaw config set models.providers.ollama.apiKey "ollama-local"
openclaw gateway restart
4.2 Check whether OpenClaw discovered it
openclaw models list
You're looking for an entry like:
ollama/qwen3.5:9b
4.3 If it shows up: set it as the default model
openclaw config set agents.defaults.model.primary "ollama/qwen3.5:9b"
openclaw gateway restart
Important note about "falling back to Google"
If you leave a cloud fallback configured, OpenClaw may still use it when the Ollama model errors / times out. If you want strictly local (no surprise charges), paste your current agents.defaults.model + any models.providers.google... bits (or screenshot), and I'll tell you the smallest safe change to prevent fallback while you test.
Send me the output of:
ollama list
openclaw models list
after you run the steps above.
openclaw models list
🦞 OpenClaw 2026.3.2 (85377a2) - I'm not AI-powered, I'm AI-possessed. Big difference.
Model                                  Input       Ctx    Local  Auth  Tags
google/gemini-2.5-flash-lite           text+image  1024k  no     yes   default,configured
groq/llama3-8b-8192                    text        8k     no     yes   fallback#1
anthropic/claude-haiku-4-5-20251001    text+image  195k   no     yes   configured,alias:haiku
anthropic/claude-sonnet-4-6            text+image  195k   no     yes   configured,alias:sonnet
google/gemini-2.5-flash                text+image  1024k  no     yes   configured
google/gemini-2.5-pro                  text+image  1024k  no     yes   configured
openrouter/anthropic/claude-haiku-4.5  text+image  195k   no     yes   configured
openrouter/openrouter/auto             text+image  1953k  no     yes   configured,alias:OpenRouter
qwen-portal/coder-model                text        125k   no     yes   configured,alias:qwen
qwen-portal/vision-model               text+image  125k   no     yes   configured
That output explains the problem: OpenClaw isn't seeing any Ollama provider/models at all yet, so it can only list your cloud providers.
Let's wire Ollama in explicitly (this works even if the model doesn't advertise "tools" perfectly).
Step 4 - Confirm you're editing the same config the Gateway service uses
Run:
openclaw config file
openclaw gateway status
(If gateway status shows "Config (cli)" vs "Config (service)" pointing at different files, tell me; that's a very common reason changes "don't stick".)
Step 5 - Add Ollama provider + your model (explicit config)
Run these commands:
openclaw config set models.providers.ollama.baseUrl "http://127.0.0.1:11434"
openclaw config set models.providers.ollama.apiKey "ollama-local"
openclaw config set models.providers.ollama.api "ollama"
Now register the model explicitly:
openclaw config set models.providers.ollama.models '[
  {
    id: "qwen3.5:9b",
    name: "Local Ollama (qwen3.5:9b)",
    reasoning: false,
    input: ["text"],
    cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
    contextWindow: 8192,
    maxTokens: 81920,
  }
]' --strict-json
Set it as your default:
openclaw config set agents.defaults.model.primary "ollama/qwen3.5:9b"
Restart + validate:
openclaw config validate
openclaw gateway restart
openclaw models list
After that, openclaw models list should include ollama/qwen3.5:9b.
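To check that without scanning the table by eye, a tiny filter like the one below works on the listing format shown in this thread. The function name is mine, and the two sample rows are copied from the listings here rather than produced live.

```shell
# Print only the ollama/... rows from `openclaw models list` output on stdin.
# Exit status is non-zero when there are none (grep found no match).
ollama_rows() {
  grep '^ollama/'
}

# Tried on two rows copied from the listings in this thread:
printf '%s\n' \
  'ollama/qwen3.5:9b text 8k yes yes default' \
  'google/gemini-2.5-flash text+image 1024k no yes configured' | ollama_rows
# Live: openclaw models list | ollama_rows || echo "no Ollama models visible"
```

An empty result means OpenClaw still can't see the provider, which points back at the config steps rather than at Ollama itself.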
Avoiding surprise Google charges (optional but recommended while testing)
If you want "local only" during testing, we should also clear/disable fallbacks (otherwise it can still jump to Google when Ollama errors). If you paste the output of:
openclaw config get agents.defaults.model
I'll tell you the exact one-liner to make fallbacks empty without touching the rest of your setup.
{
"primary": "ollama/qwen3.5:9b",
"fallbacks": [
"groq/llama3-8b-8192"
]
Great, that's exactly what we need.
To make it local-only (no Groq fallback), run:
openclaw config set agents.defaults.model.fallbacks '[]' --strict-json
openclaw gateway restart
Now if Ollama isn't configured correctly (or the model errors), you'll get a failure instead of silently using Groq/Google, which is what we want while validating.
Next, run and paste:
openclaw models list
We should see an ollama/... entry. If we still don't, we'll finish the explicit models.providers.ollama config so OpenClaw can actually talk to your local Ollama server.
🦞 OpenClaw 2026.3.2 (85377a2) - Hot reload for config, cold sweat for deploys.
Model                                  Input       Ctx    Local  Auth  Tags
ollama/qwen3.5:9b                      text        8k     yes    yes   default
anthropic/claude-haiku-4-5-20251001    text+image  195k   no     yes   configured,alias:haiku
anthropic/claude-sonnet-4-6            text+image  195k   no     yes   configured,alias:sonnet
google/gemini-2.5-flash                text+image  1024k  no     yes   configured
google/gemini-2.5-pro                  text+image  1024k  no     yes   configured
openrouter/anthropic/claude-haiku-4.5  text+image  195k   no     yes   configured
openrouter/openrouter/auto             text+image  1953k  no     yes   configured,alias:OpenRouter
qwen-portal/coder-model                text        125k   no     yes   configured,alias:qwen
qwen-portal/vision-model               text+image  125k   no     yes   configured
google/gemini-2.5-flash-lite           text+image  1024k  no     yes   configured
Perfect â thatâs the state you wanted:
ollama/qwen3.5:9bshows as Local: yes and tagged default- and youâve cleared fallbacks, so it wonât silently jump to a paid provider for normal text chats
Next step: verify itâs actually answering via Telegram
- Send your bot a simple text like: âtell me a 1âsentence jokeâ
- If it replies: youâre done for the âlocal text LLMâ part.
Two practical follow-ups (so it stays working)
1) Make sure Ollama starts after reboots
Since OpenClaw depends on http://localhost:11434, you want Ollama to always be running. Easiest path: add Ollama to macOS Login Items (or just make sure the Ollama app launches on boot).
2) Know when it will still use cloud models
You still have cloud models configured (Anthropic/Google/etc). That's fine and doesn't cost anything unless used, but if you send images or use a vision-only workflow, OpenClaw may need a vision-capable model (often cloud).
If you want, tell me your goal:
- Local-only, even if features break, or
- Local for text, cloud allowed for images / special cases
…and I'll suggest the cleanest config for that.
The reply was:
⚠️ Agent failed before reply: Model context window too small (8192 tokens). Minimum is 16000.
Logs: openclaw logs --follow
Good news: Ollama is wired up correctly. Bad news: OpenClaw is refusing to use that model because (per your config) it only has an 8192 token context window, and this agent needs ≥ 16000 just to fit the system + tools.
This is fixable by increasing the declared contextWindow for your Ollama model entry.
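To make the failure concrete, here is a minimal sketch of the kind of guard that would produce that message. The 16000 minimum and the wording are taken from the error above; the function name and structure are made up for illustration and are not OpenClaw's actual code.

```python
# Hypothetical sketch of a "context window too small" guard.
# MIN_CONTEXT_TOKENS comes from the error message; the rest is illustrative.
MIN_CONTEXT_TOKENS = 16000


def check_context_window(declared_tokens: int, minimum: int = MIN_CONTEXT_TOKENS) -> str:
    """Return "ok" if the declared window meets the minimum, else an error string."""
    if declared_tokens < minimum:
        return (
            f"Model context window too small ({declared_tokens} tokens). "
            f"Minimum is {minimum}."
        )
    return "ok"


if __name__ == "__main__":
    print(check_context_window(8192))   # the window declared when the error appeared
    print(check_context_window(16384))  # a declared window above the minimum
```

The point is only that the check runs against the declared contextWindow in your config, which is why raising that value clears the error.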
Fix (2 commands + restart)
Run:
openclaw config set models.providers.ollama.models[0].contextWindow 16384 --strict-json
openclaw config set models.providers.ollama.models[0].maxTokens 163840 --strict-json
openclaw gateway restart
Then try your Telegram message again.
If it still errors
Run this and paste the output:
openclaw config get models.providers.ollama.models[0]
(Also, openclaw logs --follow is the right way to watch what's happening, yep.)
Sent it exactly the "tell me a 1-sentence joke" text, it showed "typing..." for a while. Then it stopped and about two minutes later it replied with a joke. Is it normal for it to be this slow?
Yeah, on a local 9B model + OpenClaw, that kind of delay can be normal, especially on the first request.
There are two main reasons it feels "weirdly slow" for a 1-sentence joke:
1) Cold start / model load
The first time Ollama uses a model (or after it's been idle/unloaded), it has to load weights into memory. First response can be dramatically slower than the next few.
2) OpenClaw's prompt is big (tools + policies)
Even if your message is tiny, OpenClaw is sending a large system/tool setup along with it. The model still has to process all of that before it can output 1 sentence. On 16GB hardware, that "prompt processing" dominates.
Quick way to tell which one is biting you
Try the same prompt directly in Ollama (no OpenClaw overhead):
ollama run qwen3.5:9b "tell me a 1-sentence joke"
- If this is fast (a few seconds) but Telegram is slow → it's mostly OpenClaw prompt/tool overhead.
- If this is also slow → it's mostly hardware/model speed.
Biggest speed win on OpenClaw: reduce tools (shrinks the prompt)
If you're currently using this as a "chat-only" agent, set a minimal tool profile:
openclaw config set tools.profile "minimal"
openclaw gateway restart
This can make local models feel much snappier because the model isn't being fed huge tool schemas every turn. (You can switch back to full later when you want browsing/coding tools.)
Docs on tool profiles: https://docs.openclaw.ai/tools
If you tell me whether you want this agent to be chat-only or chat + web/browser automation, I can suggest the best tools.profile/allowlist combo for speed without breaking what you need.
It took 66.8 seconds, including all the reasoning behind it. OC took nearly 2 minutes
That comparison is actually super informative:
- ~67s in raw Ollama → this model is already pretty heavy/slow on your 16GB box (especially if it insists on "thinking out loud").
- ~2 min via OpenClaw → expected, because OpenClaw adds a lot of extra prompt around your message (system instructions + tool schemas + safety/policy). More input tokens = more time before it can even start generating the joke.
So yes: the extra ~50-70 seconds is very plausibly just prompt overhead.
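As a quick worked version of that comparison, here is a small sketch using the two timings from this thread. "Nearly 2 minutes" is rounded to 120 s, which is an assumption; the helper name is made up for illustration.

```python
# Timings reported in this thread, in seconds.
RAW_OLLAMA_S = 66.8      # direct `ollama run`
VIA_OPENCLAW_S = 120.0   # "nearly 2 minutes" through Telegram (rounded, assumption)


def overhead(total_s: float, baseline_s: float) -> tuple[float, float]:
    """Return (extra seconds, extra time as a fraction of the total)."""
    extra = total_s - baseline_s
    return extra, extra / total_s


if __name__ == "__main__":
    extra_s, share = overhead(VIA_OPENCLAW_S, RAW_OLLAMA_S)
    print(f"extra: {extra_s:.1f} s ({share:.0%} of the Telegram round trip)")
```

With these inputs the extra time comes to roughly 53 s, so a bit under half of the Telegram round trip is spent on something other than the bare model call.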
The quickest win (and why I suggested it)
If you set tools to minimal, OpenClaw stops sending huge tool schemas every turn, which can cut that overhead a lot:
openclaw config set tools.profile "minimal"
openclaw gateway restart
Then retry the same "1-sentence joke" and see if it's closer to the ~67s baseline.
Reality check
Even at ~67s, that's still slow for "chatty" usage. To get snappy local chat you usually need one (or more) of:
- a smaller model for day-to-day chatting (and keep 9B for "heavier" requests), and/or
- stronger hardware (though more RAM alone doesn't guarantee speed; it mostly expands what you can run)
If you want, we can pinpoint where the extra time is going
Run openclaw logs --follow, send one prompt, then paste the few lines around the model request timing (there's usually a "request took …ms" style line). That'll tell us whether the delay is inside the LLM call vs queueing/Telegram/etc.
Even calling the bot by his name, just "Bot?", takes 160 seconds through Telegram
160 seconds for just "Bot?" is not what you want, but it's also very often explainable with OpenClaw:
The usual culprit: your session context is huge
Even if your new message is 1 token, the model still has to process:
- the entire conversation so far, plus
- OpenClaw's system prompt + tool/policy prompt
If you've been testing a bunch (and the model has been outputting long "reasoning"), the context can balloon and everything gets slower and slower.
Do this first (fastest fix)
In Telegram, send:
/new
Then try again with something tiny like "hi".
Also confirm you're actually in "minimal tools" mode
Run:
openclaw config get tools.profile
If it's not "minimal", set it (again) and restart:
openclaw config set tools.profile "minimal"
openclaw gateway restart
If it's still slow after /new
Then it's mostly just "this 9B model on 16GB is slow for agent-style prompts".
At that point the real options are:
- switch to a smaller local model for chat (look for :4b/:7b variants in the Ollama library), or
- accept a cheap cloud model for "instant chat", and keep local for heavier/offline use.
If you want, run openclaw logs --follow, send one "hi" after /new, and paste the lines showing how long the LLM call took + prompt token counts. That will tell us exactly whether it's (a) prompt size, (b) model speed, or (c) something else in the pipeline.
It took 78 seconds after the prompt with the minimal tools patch
78s after switching to minimal tools tells us two things:
- The "OpenClaw overhead" part is mostly under control now (good).
- The remaining slowness is basically the model/runtime speed on your 16GB Mac.
Because you measured ~67s in raw ollama run earlier, OpenClaw being ~78s is actually pretty close to the baseline, which means there isn't a magical OpenClaw setting that will turn this into a 3-second bot. You need a faster model (usually smaller) or different hardware.
What to do next (pick one)
Option A: Keep this model, accept it's "slow but local"
This is fine for "fire-and-forget" tasks, but it won't feel chatty/interactive.
Option B: Switch to a smaller local model for chat (recommended)
Look for a :4b or :7b class model in the Ollama library, pull it, and swap your OpenClaw default to it.
High-level steps:
ollama pull <smaller-model-id>
- Update OpenClaw to use it:
openclaw config set models.providers.ollama.models[0].id "<smaller-model-id>"
openclaw config set agents.defaults.model.primary "ollama/<smaller-model-id>"
openclaw gateway restart
(If you tell me what smaller model id you pulled, I'll give you the exact config set lines including updating the name field cleanly.)
One more useful debug command (shows you why it's slow)
In Telegram, run:
/context detail
That will show how big your system/tool prompt is. If it's still huge, we can trim more.
If your goal is "fast chatty assistant", Option B is the realistic path on 16GB.
Here's the result of the /context detail:
🧠 Context breakdown (detailed)
Workspace: /Users/aiverse/.openclaw/workspace
Bootstrap max/file: 20,000 chars
Bootstrap max/total: 150,000 chars
Sandbox: mode=off sandboxed=false
System prompt (run): 36,632 chars (~9,158 tok) (Project Context 16,513 chars (~4,129 tok))
Injected workspace files:
• AGENTS.md: OK | raw 7,804 chars (~1,951 tok) | injected 7,804 chars (~1,951 tok)
• SOUL.md: OK | raw 1,664 chars (~416 tok) | injected 1,664 chars (~416 tok)
• TOOLS.md: OK | raw 850 chars (~213 tok) | injected 850 chars (~213 tok)
• IDENTITY.md: OK | raw 459 chars (~115 tok) | injected 459 chars (~115 tok)
• USER.md: OK | raw 583 chars (~146 tok) | injected 583 chars (~146 tok)
• HEARTBEAT.md: OK | raw 167 chars (~42 tok) | injected 167 chars (~42 tok)
• BOOTSTRAP.md: MISSING | raw 0 | injected 0
• MEMORY.md: OK | raw 4,298 chars (~1,075 tok) | injected 4,298 chars (~1,075 tok)
Skills list (system prompt text): 12,886 chars (~3,222 tok) (42 skills)
Skills: 1password, apple-notes, apple-reminders, bear-notes, blogwatcher, blucli, camsnap, clawhub, discord, eightctl, gemini, gh-issues, gifgrep, github, gog, goplaces, healthcheck, himalaya, imsg, mcporter, … (+22 more)
Top skills (prompt entry size):
• github: 569 chars (~143 tok)
• gh-issues: 505 chars (~127 tok)
• healthcheck: 488 chars (~122 tok)
• model-usage: 460 chars (~115 tok)
• things-mac: 433 chars (~109 tok)
• clawhub: 429 chars (~108 tok)
• weather: 413 chars (~104 tok)
• xurl: 384 chars (~96 tok)
• himalaya: 380 chars (~95 tok)
• apple-notes: 372 chars (~93 tok)
• 1password: 345 chars (~87 tok)
• goplaces: 331 chars (~83 tok)
• mcporter: 327 chars (~82 tok)
• apple-reminders: 307 chars (~77 tok)
• skill-creator: 293 chars (~74 tok)
• summarize: 293 chars (~74 tok)
• wacli: 274 chars (~69 tok)
• oracle: 273 chars (~69 tok)
• openai-image-gen: 269 chars (~68 tok)
• songsee: 248 chars (~62 tok)
• openai-whisper-api: 246 chars (~62 tok)
• ordercli: 245 chars (~62 tok)
• nano-banana-pro: 242 chars (~61 tok)
• obsidian: 242 chars (~61 tok)
• blogwatcher: 240 chars (~60 tok)
• gifgrep: 240 chars (~60 tok)
• imsg: 238 chars (~60 tok)
• nano-pdf: 231 chars (~58 tok)
• openai-whisper: 230 chars (~58 tok)
• eightctl: 229 chars (~58 tok)
… (+12 more skills)
Tool list (system prompt text): 185 chars (~47 tok)
Tool schemas (JSON): 89 chars (~23 tok) (counts toward context; not shown as text)
Tools: session_status
Top tools (schema size):
• session_status: 89 chars (~23 tok)
Top tools (summary text size):
• session_status: 207 chars (~52 tok)
Tools (param count):
• session_status: 2 params
Session tokens (cached): unknown / ctx=16384
Inline shortcut: a command token inside normal text (e.g. "hey /status") that runs immediately (allowlisted senders only) and is stripped before the model sees the remaining message.
That /context detail output basically answers the "why is it slow?" question:
You're feeding the model ~9,158 tokens of system prompt before it even sees your message (plus the chat history). On a local 9B model, just reading that much text every turn is expensive.
Two biggest chunks you can trim immediately:
- Skills list: ~3,222 tokens ("42 skills")
- Workspace files: AGENTS.md (~1,951 tok) + MEMORY.md (~1,075 tok) are pretty chunky
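If you want to sanity-check those numbers yourself, the "~tok" figures in the output line up with roughly 4 characters per token, rounded up. That ratio is inferred from the pasted numbers, not taken from the docs, so treat this as a rough sketch; the helper name is made up.

```python
import math

CHARS_PER_TOKEN = 4  # assumption inferred from the pasted output, not from the docs


def estimate_tokens(chars: int) -> int:
    """Estimate tokens as chars / 4, rounded up."""
    return math.ceil(chars / CHARS_PER_TOKEN)


# Character counts copied from the /context detail output.
sections = {
    "system prompt (total)": 36_632,
    "skills list": 12_886,
    "AGENTS.md": 7_804,
    "MEMORY.md": 4_298,
}

if __name__ == "__main__":
    total = estimate_tokens(sections["system prompt (total)"])
    for name, chars in sections.items():
        tokens = estimate_tokens(chars)
        # Share of the whole system prompt taken by this section.
        print(f"{name:<24}{tokens:>6} tok  {tokens / total:>5.0%}")
```

On these figures the skills list alone is about a third of the system prompt, with AGENTS.md and MEMORY.md together making up roughly another third.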
Fastest "make it faster" steps
1) Stop injecting the 42 bundled skills (big win)
Those skills are great when you're using a strong cloud model, but for local chat they mostly just bloat the prompt.
Per docs, you can allowlist bundled skills. To effectively disable bundled skills injection:
openclaw config set skills.allowBundled '[]' --strict-json
openclaw gateway restart
Then do /new in Telegram and run /context detail again. You should see that skills section shrink a lot.
Docs: https://docs.openclaw.ai/tools/skills-config
2) Keep tools minimal (you already did)
That was the right move.
3) (Optional but huge) Slim down AGENTS.md / MEMORY.md
For a local-first "chat bot" workspace, you want those files tiny. Every extra line is paid for on every message.
Your AGENTS.md and MEMORY.md sizes are already shown above; you could cut them down to a few rules each and you'll feel it immediately.
If you do step (1) and re-run /context detail, paste the new numbers (system prompt tokens + skills tokens). We can iterate until it feels acceptable on your hardware.
This is great debugging output, and it shows exactly what's still bloating your prompt:
- Tools are basically gone now (Tools: session_status, tiny schemas) ✅
- But you're still injecting ~42 skills into the system prompt every turn ❌
(that was ~3,222 tokens in your earlier breakdown)
On a local 9B model, that skills list alone can cost you a lot of seconds per message.
Next step: disable bundled skills (and start a new session)
Per the docs, skills.allowBundled is the switch for bundled skills, and changes take effect on the next new session.
Run on the Mac:
openclaw config set skills.allowBundled '[]' --strict-json
openclaw gateway restart
Then in Telegram:
- /new
- /context detail again
You should see the skills section shrink dramatically (ideally to 0 bundled skills).
If skills still show up after that
Then they're coming from ~/.openclaw/skills or <workspace>/skills (managed/workspace skills are not affected by allowBundled). In that case, paste:
openclaw config get skills.load.extraDirs
and we'll disable those sources too.
Once the skills injection is gone, your local latency should drop noticeably.
After step1
[Same /context detail output as before, unchanged: System prompt (run) 36,632 chars (~9,158 tok); Skills list 12,886 chars (~3,222 tok), 42 skills; Tools: session_status; ctx=16384]
Those numbers are identical to before (still 42 skills / ~3,222 tokens), which strongly suggests the change hasn't been applied to the session snapshot yet.
Two things to check:
1) Verify the config actually took
In Terminal, run:
openclaw config get skills.allowBundled
It should print:
[]
If it's null/missing (or not []), then the config set didn't stick and we fix that first.
2) Start a NEW session (this is required for skills changes)
Skills are snapshotted when the session starts (docs: skills snapshot/performance). So after changing skills.allowBundled, you must do:
In Telegram (standalone message):
/new
Then run:
/context detail
If skills.allowBundled is [] and you've done /new, the skills list should drop (for bundled skills).
If you still see 42 skills after /new
Then those skills are not "bundled" anymore: they're being loaded from ~/.openclaw/skills or <workspace>/skills (managed/workspace skills are unaffected by allowBundled).
In that case, paste:
openclaw config get skills.load.extraDirs
and also tell me if you ever installed skills via ClawHub (that often puts them into ~/.openclaw/skills). Then we'll disable the managed/workspace skill sources cleanly.
Got it, still showing the full skills list, so we're definitely not past the "session snapshot" issue yet (or skills.allowBundled didn't apply).
Next 2 checks (these will tell us exactly which):
- In Terminal, paste:
openclaw config get skills.allowBundled
- In Telegram, send a standalone:
/new
…and then run /context detail again.
If skills.allowBundled prints [] and you've done /new and you still see those skills, then they're coming from ~/.openclaw/skills or <workspace>/skills (managed/workspace), and we'll disable that source next.
bash-3.2$ openclaw config get skills.allowBundled
🦞 OpenClaw 2026.3.2 (85377a2) - Your personal assistant, minus the passive-aggressive calendar reminders.
[]
You have new mail in /var/mail/aiverse
bash-3.2$
Perfect, skills.allowBundled is definitely set to [] now.
Two important details:
- Skills are snapshotted when a session starts, so you must start a new session for this to affect /context.
- If you still see lots of skills after /new, then they're not "bundled" anymore: they're coming from managed/workspace skills (~/.openclaw/skills or <workspace>/skills), which allowBundled does not filter.
Do this next
Step A â Force a fresh session snapshot
In Telegram, send (standalone):
/new
Then run:
/context detail
Step B â If skills still show up, identify where theyâre coming from
On the Mac, run:
openclaw skills list --eligible
That command is specifically meant to show what skills are eligible and (usually) where they're loaded from. Paste the top chunk of that output and we'll disable the right source (managed/workspace/extraDirs) so your local prompt shrinks and speed improves.
Docs for that command: https://docs.openclaw.ai/cli/skills
⚠️ 📖 Read failed
[Same /context detail output as before, unchanged: System prompt (run) 36,632 chars (~9,158 tok); Skills list 12,886 chars (~3,222 tok), 42 skills; Tools: session_status; ctx=16384]
That ⚠️ 📖 Read failed line means /context detail couldn't read something it normally injects (a workspace file or a skill file). The snippet you pasted cuts off right before it would show which item failed, so I can't tell what it is yet.
Two things to do:
1) Paste the rest of the /context detail output
Specifically the part that lists:
- MEMORY.md: ... (and whether it says READ FAILED)
- the Skills list section (how many skills now)
- the Tools section (you said you're on minimal tools; I want to confirm it stuck)
2) Confirm where those "42 skills" are coming from
Since skills.allowBundled is [], bundled skills should be gone in a new session. If they're still present, they're almost certainly being loaded from managed/workspace skills.
Run on the Mac and paste the first ~30 lines:
openclaw skills list --eligible
That will tell us whether the skills are coming from:
- ~/.openclaw/skills (managed/local)
- <workspace>/skills
- skills.load.extraDirs
Once we know the source, we can disable it cleanly (and your prompt size / latency will drop a lot).
Cool, this is now unambiguous:
- skills.allowBundled is []
- but you still have 42 skills injected
That means those 42 skills are not coming from "bundled skills" anymore. They're coming from one of the other skill locations:
- ~/.openclaw/skills (managed/local), or
- <workspace>/skills, or
- skills.load.extraDirs
allowBundled can't hide those (by design).
Next step: find the source path
Run this on the Mac and paste the output (or at least the first 30 lines):
openclaw skills list --eligible
That command will show where each eligible skill is loaded from.
If they're mostly from ~/.openclaw/skills (most common)
And you want a fast local chat bot, the quickest reversible way to stop injecting them is to temporarily move that folder out of the way:
mv ~/.openclaw/skills ~/.openclaw/skills.disabled
openclaw gateway restart
Then in Telegram:
- /new
- /context detail
You should see the skills list drop (or become tiny), and latency should improve noticeably.
Nothing is deleted; you can undo it anytime:
mv ~/.openclaw/skills.disabled ~/.openclaw/skills
Before we do the move though, paste openclaw skills list --eligible so we're not guessing.
Here it goes again then
[Same /context detail output as before, unchanged: System prompt (run) 36,632 chars (~9,158 tok); Skills list 12,886 chars (~3,222 tok), 42 skills; Tools: session_status; ctx=16384]
Yep, next step is just to grab the source paths so we stop guessing.
Run on the Mac and paste the output here:
openclaw skills list --eligible
If it's long, even the first ~40 lines is enough. I'm mainly looking for whether the paths point at ~/.openclaw/skills or <workspace>/skills (or an extraDirs folder).
That's still the /context detail output (from Telegram).
What we need now is a Terminal command on the Mac that shows where the skills are coming from:
openclaw skills list --eligible
Please copy/paste the output of that command here (even just the first ~40 lines). It will include the file paths, which is the missing piece. That's how we'll tell if those 42 skills are coming from:
- ~/.openclaw/skills (managed/local)
- /Users/aiverse/.openclaw/workspace/skills (workspace)
- or an extraDirs folder
Once we see the paths, the fix is straightforward.
You're still sending the Telegram /context detail output. That's useful, but it won't tell us where those skills live on disk.
If you're having trouble getting openclaw skills list --eligible output, we can do a simpler "brute force but reversible" test that answers the same question:
Reversible test: temporarily disable managed/workspace skill folders
On the Mac mini Terminal (not Telegram), run these one at a time:
1) Check if the managed skills folder exists
ls -la ~/.openclaw/skills
If that shows a folder full of skills, temporarily move it out of the way:
mv ~/.openclaw/skills ~/.openclaw/skills.disabled
2) Check if the workspace skills folder exists
ls -la ~/.openclaw/workspace/skills
If it exists and has skills, disable it too:
mv ~/.openclaw/workspace/skills ~/.openclaw/workspace/skills.disabled
3) Restart + new session
openclaw gateway restart
Then in Telegram:
- /new
- /context detail
If the skills list disappears (or drops to ~0), we found the source and your prompt will shrink a lot → faster replies.
Undo anytime:
mv ~/.openclaw/skills.disabled ~/.openclaw/skills
mv ~/.openclaw/workspace/skills.disabled ~/.openclaw/workspace/skills
openclaw gateway restart
If you run step (1) and paste what ls -la ~/.openclaw/skills prints (even just "No such file" vs showing contents), I can tell you exactly which branch to take.
bash-3.2$ openclaw skills list --eligible
🦞 OpenClaw 2026.3.2 (85377a2) - It's not "failing," it's "discovering new ways to configure the same thing wrong."
Skills (42/42 ready)
Status | Skill | Description | Source
✓ ready | 1password | Set up and use 1Password CLI (op). Use when installing the CLI, enabling desktop app integration, signing in (single or multi-account), or reading/injecting/running secrets via op. | openclaw-bundled
✓ ready | apple-notes | Manage Apple Notes via the memo CLI on macOS (create, view, edit, delete, search, move, and export notes). Use when a user asks OpenClaw to add a note, list notes, search notes, or manage note folders. | openclaw-bundled
✓ ready | apple-reminders | Manage Apple Reminders via remindctl CLI (list, add, edit, complete, delete). Supports lists, date filters, and JSON/plain output. | openclaw-bundled
✓ ready | bear-notes | Create, search, and manage Bear notes via grizzly CLI. | openclaw-bundled
✓ ready | blogwatcher | Monitor blogs and RSS/Atom feeds for updates using the blogwatcher CLI. | openclaw-bundled
✓ ready | blucli | BluOS CLI (blu) for discovery, playback, grouping, and volume. | openclaw-bundled
✓ ready | camsnap | Capture frames or clips from RTSP/ONVIF cameras. | openclaw-bundled
✓ ready | clawhub | Use the ClawHub CLI to search, install, update, and publish agent skills from clawhub.com. Use when you need to fetch new skills on the fly, sync installed skills to latest or a specific version, or publish new/updated skill folders with the npm-installed clawhub CLI. | openclaw-bundled
✓ ready | discord | Discord ops via the message tool (channel=discord). | openclaw-bundled
✓ ready | eightctl | Control Eight Sleep pods (status, temperature, alarms, schedules). | openclaw-bundled
✓ ready | gemini | Gemini CLI for one-shot Q&A, summaries, and generation. | openclaw-bundled
Awesome, this output is exactly what we needed.
Key point: your 42 skills are 100% bundled (Source = openclaw-bundled). So skills.allowBundled should be able to filter them.
But since skills.allowBundled is currently [] and you still see 42/42 ready, OpenClaw is almost certainly treating an empty allowlist as "no filter" (i.e., allow everything). That's a common allowlist semantics gotcha.
Workaround: set allowBundled to a non-empty list
If you want zero bundled skills injected, set it to a skill name that doesn't exist:
openclaw config set skills.allowBundled '["__none__"]' --strict-json
openclaw gateway restart
Then in Telegram:
- /new
- /context detail
And in Terminal, confirm:
openclaw skills list --eligible
You should now see 0 eligible bundled skills, and your system prompt should shrink by ~3,222 tokens (the skills list), which helps local speed.
If you do want a tiny set of skills (instead of none)
Set skills.allowBundled to just a couple you actually need (much smaller prompt than 42).
If ["__none__"] still doesn't reduce the eligible list, that's a bug, and it'd be worth filing an issue with the exact openclaw --version + openclaw config get skills.allowBundled + openclaw skills list --eligible output.
Yep, that's still the "everything allowed" state.
Now do the change + re-check:
- Set a non-empty allowlist that matches nothing:
openclaw config set skills.allowBundled '["__none__"]' --strict-json
openclaw gateway restart
- Start a fresh session snapshot in Telegram:
/new
- Re-run and paste just the top part (the "Skills (… ready)" line is enough):
openclaw skills list --eligible
If this works, it should drop from 42/42 ready to 0 ready (or close to it), and /context detail should no longer show that 42-skill list being injected.
🧠 Context breakdown (detailed)
Workspace: /Users/aiverse/.openclaw/workspace
Bootstrap max/file: 20,000 chars
Bootstrap max/total: 150,000 chars
Sandbox: mode=off sandboxed=false
System prompt (run): 23,348 chars (~5,837 tok) (Project Context 16,513 chars (~4,129 tok))
Injected workspace files:
• AGENTS.md: OK | raw 7,804 chars (~1,951 tok) | injected 7,804 chars (~1,951 tok)
• SOUL.md: OK | raw 1,664 chars (~416 tok) | injected 1,664 chars (~416 tok)
• TOOLS.md: OK | raw 850 chars (~213 tok) | injected 850 chars (~213 tok)
• IDENTITY.md: OK | raw 459 chars (~115 tok) | injected 459 chars (~115 tok)
• USER.md: OK | raw 583 chars (~146 tok) | injected 583 chars (~146 tok)
• HEARTBEAT.md: OK | raw 167 chars (~42 tok) | injected 167 chars (~42 tok)
• BOOTSTRAP.md: MISSING | raw 0 | injected 0
• MEMORY.md: OK | raw 4,298 chars (~1,075 tok) | injected 4,298 chars (~1,075 tok)
Skills list (system prompt text): 0 chars (~0 tok) (0 skills)
Skills: (none)
Tool list (system prompt text): 185 chars (~47 tok)
Tool schemas (JSON): 89 chars (~23 tok) (counts toward context; not shown as text)
Tools: session_status
Top tools (schema size):
• session_status: 89 chars (~23 tok)
Top tools (summary text size):
• session_status: 207 chars (~52 tok)
Tools (param count):
• session_status: 2 params
Session tokens (cached): unknown / ctx=16384
Inline shortcut: a command token inside normal text (e.g. "hey /status") that runs immediately (allowlisted senders only) and is stripped before the model sees the remaining message.
Nice, that's a win.
Your system prompt dropped from ~9,158 tok → ~5,837 tok, which is exactly what we wanted (it strongly suggests the 42-skill list is no longer being injected).
Next: confirm + re-test speed
- Scroll a bit further down in /context detail and confirm it now says something like:
  - Skills list: 0 skills (or missing entirely)
- Then try a simple message in Telegram again (e.g. "hi" or "1-sentence joke") and time it.
You should see a noticeable speedup because the model has ~3k fewer tokens to read every turn.
If it's still too slow
At this point the remaining biggest prompt chunk is your workspace files, especially:
- AGENTS.md (~1,951 tok)
- MEMORY.md (~1,075 tok)
For a local chat-only bot, you can often shrink those a lot (or use a separate minimal workspace) and get another meaningful latency drop.
Perfect, that's exactly the "lean local chat" setup:
- Skills: none (0 tok) ✅
- Tools: session_status only ✅
- System prompt is now much smaller ✅
Now the only big remaining prompt chunks are your injected workspace files (especially AGENTS.md and MEMORY.md), and then the raw speed of the local model itself.
Next: re-test latency
Try the same Telegram prompt again ("hi" / "1-sentence joke") and time it. It should be noticeably faster than before.
If it's still too slow
Two levers left:
- Shrink the injected workspace files
AGENTS.md and MEMORY.md are now the biggest remaining prompt chunks.
- Use a smaller local model for chat
Even with the prompt trimmed, a 9B model on 16GB can just be inherently slow. If you want "near-instant" replies, you'll usually need a smaller model for day-to-day chat.
If you tell me the new timed result after this prompt cleanup, I can tell you whether you're still hitting "prompt overhead" or just the model/hardware limit.
It took about 26 seconds, which is a massive improvement over the initial time
26 seconds is a huge improvement, and it tracks perfectly with what we fixed (you stopped paying ~3k "skills list" tokens every single message).
Quick recap of what's now making it fast(er):
- Skills injected: 0
- Tools: session_status only
- Much smaller system prompt → far less "prompt processing" time per turn
What to expect going forward
- The first reply after a long idle/reboot can still be slower (model cold-start / load).
- 26s is about what I'd call "usable but not chatty". If you want near-instant replies, the next lever is usually a smaller local model (or more compute), not more config trimming.
Don't forget how to "turn features back on" later
Right now you've intentionally stripped things down for speed. When you want skills back:
- you can remove the bundled-skill block with:
openclaw config unset skills.allowBundled
openclaw gateway restart
(and /new in Telegram so the session re-snapshots skills)
Same idea for tools: if you later want browsing/web/etc, you'll switch tools.profile back from minimal.
If this got you to a good baseline, please mark the solution in the thread so the fix is searchable for the next person.
Now if we consider a 4b model instead, but with some skills and tools enabled, will those 26 seconds improve or degrade?
Mostly improve, unless you re-bloat the prompt again.
A decent mental model is:
latency ≈ (input tokens the model must read + output tokens it generates) ÷ tokens/sec of the model
So two knobs fight each other:
1) Smaller model (9B → ~4B)
- Usually faster per token on the same Mac (often noticeably faster).
- But quality drops, and you may need more back-and-forth to get the same result (which can erase the speed win for "real tasks").
2) Re-enabling skills/tools
- Increases input tokens every single message.
- The "42 skills list" you had earlier was ~3,222 tokens of extra prompt. That alone added a ton of time.
- Some tools (especially UI/browser ones) add big schemas, which can also force larger context windows and slow things down.
Rule of thumb for your case
- 4B + a small handful of skills/tools (say 0–5 skills, and a light tool profile) will likely feel faster than your current 9B setup.
- 4B + "turn everything back on" (dozens of skills + heavy tools) will likely creep back toward slow again, because prompt size dominates.
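To make that mental model concrete, here is a rough Python sketch. The prompt sizes are the ones reported by /context detail in this thread; the tokens/sec figures and the reply length are made-up placeholders, so measure your own before trusting any number it prints.

```python
def estimate_latency_s(input_tokens: int, output_tokens: int, tokens_per_sec: float) -> float:
    """Rough per-reply latency: total tokens divided by model throughput."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return (input_tokens + output_tokens) / tokens_per_sec


# Prompt sizes taken from this thread's /context detail output.
FULL_PROMPT = 9_158  # system prompt with the 42-skill list injected
LEAN_PROMPT = 5_837  # system prompt after the skills list was removed
REPLY_TOKENS = 40    # assumption: a short one-sentence reply

# Throughput values are hypothetical placeholders, not measurements.
scenarios = [
    ("9B, full prompt", FULL_PROMPT, 250.0),
    ("9B, lean prompt", LEAN_PROMPT, 250.0),
    ("4B, lean prompt", LEAN_PROMPT, 300.0),
    ("4B, full prompt", FULL_PROMPT, 300.0),
]

for label, prompt_tokens, tps in scenarios:
    latency = estimate_latency_s(prompt_tokens, REPLY_TOKENS, tps)
    print(f"{label}: ~{latency:.0f}s")
```

The point of the comparison is the shape, not the values: a faster model with the full prompt back on can land close to the slower model with the lean prompt.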
Best way to do this without guessing
- Switch to the 4B model
- Re-enable only the 1–3 skills/tools you actually want
- Run /context detail and watch the system prompt tokens
- Time a fixed prompt ("1-sentence joke", "summarize this sentence", etc.)
If you tell me what tools you want (just web? browser? email? reminders?), I can suggest the smallest tools.profile + skills.allowBundled combo that gets you that capability without going back to a 9k-token system prompt.
Say (as an example) I want to use the agent to put in place a betting strategy on polymarket and use the agent's web browsing capabilities (API and without API) to continuously monitor the web for news related to the topics it is betting on so that it can adjust the strategy and improve its odds. What skills and tools would I need?
For that Polymarket + "monitor news and adjust" workflow, think in capabilities first (tools), then optional helper wrappers (skills). Also: I'd strongly recommend human-in-the-loop for any trade/bet execution and to check Polymarket's ToS/geo rules before automating anything.
Tools you'd need (the real "power")
Minimum set for monitor → decide → notify:
- Web monitoring
  - web_search (find new items fast)
  - web_fetch (pull article text for summarizing/extraction)
- Interaction with Polymarket
  - Prefer API (more reliable): typically via exec running a small script/CLI that calls the API (or a custom tool/skill you write that wraps it).
  - "Without API" (UI automation): browser tool (works, but more brittle: selectors change, bot detection, logins, etc.)
- Automation / running continuously
  - cron tool (run every N minutes / on schedule)
- State + reporting
  - read / write (store current positions, last-seen headlines, decision logs, etc.)
  - message (send you alerts/approval requests to Telegram)
Docs: Tools overview https://docs.openclaw.ai/tools
Skills you might enable (optional, but can help)
These don't add new powers; they mostly add guidance / workflows. For this use case, useful ones are typically:
- a feed/news watcher skill (RSS/blog monitoring)
- a summarizer skill (turn long articles into "impact on market?" bullets)
- a URL/article cleaner skill
You can also just skip skills entirely and rely on the tools above (often better for keeping prompts small on local models).
Practical architecture (what usually works best)
- Agent A (chatty, fast, local): minimal tools, just talks to you.
- Agent B (worker): has cron + web_search/web_fetch + browser/exec + message, runs scheduled checks, and sends you "Here's what changed + what I propose" messages. You approve before it places/adjusts bets.
If you tell me whether you want API-only (recommended) or UI automation too, I can suggest the smallest tool allowlist/profile that enables it without blowing your prompt size back up.
Would any of this work better on a 4b model, or even better, Gemini 2.5 flash lite?
For the Polymarket worker idea (web monitoring + decide + maybe act), a 4B local model and Gemini 2.5 Flash Lite will feel very different:
4B local (Ollama)
Pros
- Free per-token (after hardware)
- Private/local
- Can be fast if the prompt is lean
Cons (big for your use case)
- Much less reliable at multi-step tool use (browser automation, keeping state, not getting confused by long pages)
- More likely to need extra back-and-forth to get the same result
- If you re-enable lots of tools/skills, it can still bog down because it must read a big prompt every turn
Where it fits best: "chat + simple summarization/extraction" and as a cheap local assistant, not as the main autonomous web/trading worker.
Gemini 2.5 Flash Lite (hosted)
Pros
- Typically way better at agentic workflows: web research, long context, consistent tool calling, planning
- Much faster wall-clock for "read a bunch of stuff and decide"
Cons
- Costs money
- Data leaves your machine (provider/privacy considerations)
Where it fits best: the worker agent that does continuous monitoring + web tasks, then sends you "here's what changed + what I propose" messages.
Practical recommendation
- Use local (4B/9B) for day-to-day chat and lightweight tasks.
- Use Flash Lite (or another strong hosted model) for the "monitor + browse + decide" worker, with human approval before any bet placement.
If you tell me whether you want API-only Polymarket interaction or UI automation too, I can suggest a minimal tool set and a split that keeps token spend low while staying reliable.
Let's move on to qwen 4b and check the speed, and then decide
Cool, let's do a clean A/B speed test.
Step 0: Pick the exact Ollama model ID
"qwen 4b" needs the exact Ollama tag (something like …:4b).
Grab the exact ID from the Ollama library page you're using, then we'll plug it in below as:
<QWEN_4B_ID>
Step 1: Pull the 4B model
ollama pull <QWEN_4B_ID>
ollama list
(Optional quick baseline, no OpenClaw overhead):
time ollama run <QWEN_4B_ID> "tell me a 1-sentence joke"
Step 2: Switch OpenClaw to the 4B model
You currently have a single explicit Ollama model entry, so we'll just swap that entry + the default:
openclaw config set models.providers.ollama.models[0].id "<QWEN_4B_ID>"
openclaw config set agents.defaults.model.primary "ollama/<QWEN_4B_ID>"
openclaw gateway restart
openclaw models list
Step 3: Test on Telegram (important)
In Telegram:
/new
- then send:
tell me a 1-sentence joke
Time it like before.
Rollback (if it's worse)
openclaw config set models.providers.ollama.models[0].id "qwen3.5:9b"
openclaw config set agents.defaults.model.primary "ollama/qwen3.5:9b"
openclaw gateway restart
Reply with the exact <QWEN_4B_ID> you plan to pull (copy/paste from Ollama), and your measured times (direct Ollama + Telegram/OpenClaw).
step1
bash-3.2$ bash
The default interactive shell is now zsh.
To update your account to use zsh, please run chsh -s /bin/zsh.
For more details, please visit https://support.apple.com/kb/HT208050.
bash-3.2$ time ollama run <QWEN_4B_ID> "tell me a 1-sentence joke"
bash: QWEN_4B_ID: No such file or directory
real 0m0.003s
user 0m0.000s
sys 0m0.001s
That error is just because you literally ran the placeholder. In a shell, <...> is not variable syntax: < is input redirection, so bash tried to read from a file named QWEN_4B_ID, failed with "No such file or directory", and never actually ran ollama.
For Qwen 3.5, the 4B tag is:
qwen3.5:4b (Ollama library lists 0.8b/2b/4b/9b/etc.)
Do this instead
1) Pull the 4B model
ollama pull qwen3.5:4b
ollama list
2) Baseline speed test (direct Ollama)
time ollama run qwen3.5:4b "tell me a 1-sentence joke"
3) Switch OpenClaw to it
openclaw config set models.providers.ollama.models[0].id "qwen3.5:4b"
openclaw config set agents.defaults.model.primary "ollama/qwen3.5:4b"
openclaw gateway restart
Then in Telegram:
- /new
- send the same prompt and time it
Paste the two timings (direct Ollama + Telegram/OpenClaw) and we'll decide if 4B is the better default for you.
real 4m19.689s
user 0m0.380s
sys 0m0.465s
That's for 4b
I'm waiting on the Telegram results now
Oof, 4m19s for a direct ollama run is not what we'd expect from a 4B model. That's slower than your 9B baseline, which means something's off (or it did a huge "thinking" dump).
A couple quick sanity checks before we conclude "4B is worse":
1) Was that the first run after pulling?
The first run can be much slower (load/compile/warm-up). Please run it a second time and time it again:
time ollama run qwen3.5:4b "tell me a 1-sentence joke. Output ONLY the joke."
2) Did it output a ton of text/reasoning?
If it printed a long explanation, that alone can add minutes. If you can, tell me roughly how many lines it produced.
3) Show the model sizes you have installed
Paste:
ollama list
If the 2nd run is still minutes…
Then yeah: for your machine/setup, qwen3.5:4b is not a speed win, and you should stick with the 9B (since you got it down to ~26s in OpenClaw) or try a different small model variant.
Send me the second timing + ollama list and we'll decide whether to switch OpenClaw over or roll back.
Got it. If qwen3.5:4b is taking 4m19s directly in Ollama, then switching OpenClaw to it won't make anything faster (OpenClaw will only add overhead).
Before we write it off completely, we need 2 quick data points:
- Second-run timing (warm):
time ollama run qwen3.5:4b "tell me a 1-sentence joke. Output ONLY the joke."
- Your installed models + sizes:
ollama list
If the second run is still in the multi-minute range, I'd roll back and stick with the 9B (since we already got that down to ~26s in Telegram), and we can look for a different small model that actually improves latency.
Makes sense. If OpenClaw is currently pointed at the 4B model, Telegram can easily take minutes (your direct Ollama baseline was ~4m).
While you're waiting, two quick checks help keep this controlled:
- Confirm what model Telegram is actually using:
openclaw models list
(look for which one is tagged default)
- If it's still "typing…" after ~2 minutes and you don't want to wait forever, you can abort the in-flight turn from Telegram with:
/stop
If the Telegram result comes back anywhere near that 4-minute Ollama time, I'd roll back to the 9B as your default (since we already got that down to ~26s with prompt slimming).
When I changed the model to :4b and sent the /new command to telegram, I got this:
✅ New session started · model: ollama/qwen3.5:4b
[MISSING] Expected at: /Users/aiverse/.openclaw/workspace/BOOTSTRAP.md
That message is normal.
/new starts a fresh session and OpenClaw prints a little bootstrap check. It can inject a BOOTSTRAP.md file (if present) as "startup instructions" for the agent, but in your workspace it isn't there, so it reports:
[MISSING] Expected at: .../BOOTSTRAP.md
Nothing is broken; it's just informational. In fact, for a local-speed-focused setup, you probably want it missing (or empty), because anything in BOOTSTRAP.md would be extra prompt tokens every session.
Next step is simply: send a normal prompt now and time it. If the 4B model is painfully slow, roll back to the 9B default.
21 seconds to tell a joke through telegram on the 4b model
Nice, 21s vs ~26s is a real improvement (and it confirms your setup is now "prompt-light" enough that model size/speed actually matters).
A few practical takeaways:
- The speed win is modest (~20% faster), so the real question is quality:
  - If the 4B model still answers well for your day-to-day chat/summaries → keep it as default.
  - If you notice it getting confused / giving weaker answers → the 9B is usually the better "brain", just a bit slower.
- As soon as you turn on heavier tools/skills (browser/web/automation), the 4B model is more likely to struggle than the 9B. For your Polymarket-style "worker agent" idea, I'd still expect you'll want either:
  - a bigger local model, or
  - a hosted model for the worker, with local for chat.
If you want a clean decision: try 5 prompts you actually care about (summarize an article, extract key facts, write a short email reply, etc.) on 4B, then switch back to 9B and compare quality. Speed is only worth it if the answers are still usable.
Tell me what 5 prompts would help me the most for the polymarket strategy I told you about as an example. And tell me what tasks I should give my local LLM agent (qwen 3.5 4b or 9b if you deem it better for the job in hand) and what tasks I should give to another agent using Gemini 2.5 flash lite, and how to create that agent
I'll keep this non-gambling-advice / non-automation-by-default: the safe pattern is monitor + analyze + propose, then you approve before any bet/trade action. (Also double-check Polymarket ToS/region rules before automating anything.)
5 prompts that actually move you forward (for the "Polymarket worker" agent)
Use these as copy/paste prompts to your Gemini 2.5 Flash Lite worker (because it'll be doing web + long-context + tool-heavy work):
- Market spec + constraints (turn vague idea into a config)
"Help me define a monitoring + decision spec for Polymarket market: <paste market URL + rules text>. Output JSON with: markets[], outcomes[], what events invalidate the thesis, max exposure, max daily changes, cooldown rules, and what data sources to monitor."
- News monitoring plan (queries + sources)
"Create a news monitoring plan for this market. Output: (a) 10 web_search queries (include synonyms, people/orgs, key phrases), (b) 10 trustworthy sources/RSS feeds, (c) a schedule (how often to check), (d) keywords that indicate material updates."
- Article triage (convert articles into "does this matter?")
"Given this article text: <paste fetched text>, extract the new factual claims, rate relevance to each outcome (0–5), and explain in 3 bullets what it changes (if anything). Output a table."
- Portfolio/position review (human-in-the-loop action proposal)
"Given current positions + odds (I'll paste them), and the last 24h of signals, propose 'Hold/Adjust/Exit' with: reason, confidence, max size change, and a one-line message to request my approval. Do NOT execute anything."
- Monitoring log + post-mortem
"Maintain a decision log. For every proposed change, record: timestamp, evidence links, what changed, what was decided, and what would falsify it. Output append-only JSONL entries."
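If you'd rather keep that decision log in deterministic code than trust the model to format it, a minimal Python sketch of an append-only JSONL log could look like this. The field names mirror the prompt above; the function names and the file path are illustrative assumptions, not anything OpenClaw provides.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_decision(log_path, evidence_links, what_changed, decision, would_falsify):
    """Append one decision as a single JSON line; earlier lines are never rewritten."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "evidence_links": list(evidence_links),
        "what_changed": what_changed,
        "decision": decision,
        "would_falsify": would_falsify,
    }
    # Mode "a" only ever adds to the end of the file, which keeps the log append-only.
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return entry


def read_decisions(log_path):
    """Load every logged decision, oldest first."""
    with Path(log_path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Usage would be something like append_decision("decisions.jsonl", ["https://example.com/article"], "new poll published", "hold", "official result contradicts the poll"), with the path and values being placeholders.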
What to give the local agent vs the Gemini agent
Local agent (Qwen 4B/9B): keep it "fast + cheap"
Best tasks:
- Your chat interface ("what are we monitoring?", "summarize last update", "what's the current plan?")
- Formatting/summarizing messages you'll read on your phone
- Simple "config edits" / "reminders" / "status checks"
Model choice:
- 4B if you prioritize speed for chat/summaries
- 9B if you notice the 4B gets confused or misses nuance
Keep this agent's tools/skills minimal (like you've done), otherwise you'll lose the speed advantage.
Gemini 2.5 Flash Lite worker: "tool-heavy analyst"
Best tasks:
- Web monitoring: web_search, web_fetch
- Browser automation (only if needed): browser
- Scheduled monitoring: cron
- Parsing lots of text and producing structured recommendations
How to create a second "Gemini worker" agent (recommended approach)
The cleanest beginner setup is two Telegram bots → two isolated agents.
- Create a 2nd Telegram bot in BotFather (get a second token)
- Add a new agent workspace
openclaw agents add polymarket
- Bind that Telegram bot to the new agent
In ~/.openclaw/openclaw.json, you'll end up with something like (from the docs pattern):
{
  agents: { list: [
    { id: "main", default: true, workspace: "~/.openclaw/workspace" },
    { id: "polymarket", workspace: "~/.openclaw/workspace-polymarket" },
  ]},
  bindings: [
    { agentId: "main", match: { channel: "telegram", accountId: "default" } },
    { agentId: "polymarket", match: { channel: "telegram", accountId: "polymarket" } },
  ],
  channels: { telegram: { accounts: {
    default: { botToken: "…", dmPolicy: "pairing" },
    polymarket: { botToken: "…", dmPolicy: "pairing" }
  }}}
}
- Restart:
openclaw gateway restart
- DM the polymarket bot and set its session model once:
/new
/model google/gemini-2.5-flash-lite
From there, you talk to the local bot for "fast chat", and the polymarket bot for "worker tasks".
If you want, paste your current openclaw config get channels.telegram.accounts (redact tokens) and I'll tell you the exact minimal diff to add the second bot + binding.
On your step 1 (market spec), do I have to give it a specific URL? Or can I simply tell it to search the web for news in general?
For the market spec prompt, you really want to give it something concrete about the market.
Best: give the Polymarket market URL
That lets the worker agent anchor on:
- the exact question
- outcomes
- resolution/settlement criteria
- deadlines / edge cases
Without that, it's very easy for the agent to "spec" the wrong market (or invent resolution rules).
If you don't have the URL handy
You can still do it, but you must paste at least:
- the exact market question text
- the outcomes
- the settlement rules (or a screenshot / copied text)
"Search the web for news in general?"
You can, but it won't be useful unless you narrow it. "News in general" is infinite; you'll drown in irrelevant articles and burn tokens.
A good middle ground prompt is:
"Search the web for news related to <topic/person/company/event> that could affect the probability of <outcome A/B> in this market: <paste market question/outcomes>. Propose 10 focused queries + 10 reliable sources + an update cadence."
If you paste one market URL (or the question/outcomes text), I can rewrite your "market spec" prompt so it's tight and doesn't waste searches.
On another note: If I am limited by RAM, would it help to have 10 different agents asking 10 different questions in parallel with the local qwen3.5 instead of having a single agent asking 10 questions one after the other? Time wise
On one Mac mini running one local model, spinning up 10 agents in parallel usually does not make it faster. It usually makes it slower (and less stable).
Why:
The bottleneck is compute, not "number of agents"
Local inference is basically: one GPU/CPU crunching tokens. If you run 10 requests at once, they don't magically get 10× compute; they time-slice the same hardware.
So total throughput is roughly the same (often worse), and:
- each individual reply takes longer (higher latency)
- you risk RAM pressure / swapping (which is catastrophic for speed)
Parallel can help only when you're I/O-bound
If the tasks are mostly "wait for web pages / APIs", parallelism helps.
But if each task is "read big prompt → think", local LLM inference dominates, and parallelism hurts.
Practical rule on a 16GB local setup
- Keep one worker doing LLM calls sequentially.
- If you must parallelize, do it lightly (e.g. 2 at a time), and only for I/O-heavy steps.
If your goal is "10 questions quickly", the real speedups are:
- smaller prompts (what we already did)
- smaller/faster model
- or offload heavy parallel work to a hosted model (where concurrency actually scales).
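A minimal Python sketch of that split: fetches (I/O-bound) run a few at a time, while calls to the local model go through a lock so only one inference runs at once. Both fetch_page and run_local_llm are hypothetical stand-ins that just sleep; real code would do an HTTP request and call your local model.

```python
import asyncio


async def fetch_page(url: str) -> str:
    """Stand-in for an I/O-bound fetch (hypothetical; sleeps instead of doing HTTP)."""
    await asyncio.sleep(0.05)
    return f"text of {url}"


async def run_local_llm(prompt: str, llm_lock: asyncio.Lock) -> str:
    """Stand-in for local inference; the lock allows one call at a time."""
    async with llm_lock:
        await asyncio.sleep(0.05)  # pretend this is the model crunching tokens
        return f"summary({prompt})"


async def process(urls, max_parallel_fetches: int = 4):
    fetch_slots = asyncio.Semaphore(max_parallel_fetches)  # light parallelism for I/O
    llm_lock = asyncio.Lock()                              # strictly sequential LLM calls

    async def handle(url: str) -> str:
        async with fetch_slots:
            page = await fetch_page(url)
        return await run_local_llm(page, llm_lock)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(handle(u) for u in urls))


if __name__ == "__main__":
    urls = [f"https://example.com/{i}" for i in range(10)]
    summaries = asyncio.run(process(urls))
    print(len(summaries), "summaries")
```

The waiting on web pages overlaps, but the model still sees one prompt at a time, so you avoid the RAM pressure of ten concurrent inferences.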
So if I had a Mac Mini M4 Pro (14 core CPU and 20 core GPU) with 64GB of RAM and 512GB, would I be able to run everything locally with literally zero API costs? This is still in the hypothetical polymarket context, compared to my current M4 16/256 setup running local + online-based LLMs
You could get very close to "zero API spend" on an M4 Pro + 64GB unified memory, but it won't be "free in the absolute sense", and it won't automatically be "as good as cloud".
What that upgraded Mac mini would change vs your M4 16/256
- 64GB unified memory = you can run much stronger local models (bigger weights, bigger context, less swapping) and keep more stuff resident.
- M4 Pro GPU/CPU = generally better local inference throughput/latency than the base M4.
- 512GB SSD = more room for multiple model files + caches (still easy to fill if you hoard models).
So yes: it can make "local-first OpenClaw" far more viable.
Can you run "everything" locally with literally $0 API costs?
Only if you avoid any tools/features that call paid APIs. Examples:
Can be $0 (API-wise):
- Local LLM inference via Ollama
- Browser automation / scraping (still "online", but not paid API)
- Pulling public web pages and parsing them locally
Often NOT $0 (unless you deliberately disable/replace):
- web_search tool (typically uses a search provider API key)
- Any cloud speech-to-text / TTS / image generation you enable
- Any paid data sources you choose (news APIs, market data feeds, etc.)
So the answer is: yes, you can set up a "no paid model API" system, but you must design for it.
Polymarket-specific reality check
Even with strong local models, the reliable way to do this is usually:
- deterministic code for "fetch market data / compute exposure / place order"
- LLM (local or cloud) for "summarize news + extract claims + propose actions"
- human approval before execution
You can scrape instead of using an API, but it's brittle and can break anytime the site changes.
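Here is a small Python sketch of that division of labor: the LLM only produces a proposal, deterministic code checks it against hard limits, and nothing is executable until a human has approved it. The Proposal type, the limit names, and the status strings are all illustrative assumptions; there is no Polymarket call in here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    """What the LLM is allowed to produce: a suggestion, never an action."""
    market: str
    size_change_usd: float  # positive adds exposure, negative reduces it
    reason: str


def check_risk_limits(current_exposure_usd, proposal, max_exposure_usd, max_change_usd):
    """Deterministic checks; returns a list of violations (empty list means OK)."""
    violations = []
    new_exposure = current_exposure_usd + proposal.size_change_usd
    if abs(proposal.size_change_usd) > max_change_usd:
        violations.append("change exceeds per-action limit")
    if new_exposure > max_exposure_usd:
        violations.append("resulting exposure exceeds cap")
    if new_exposure < 0:
        violations.append("cannot reduce below zero exposure")
    return violations


def decide(current_exposure_usd, proposal, approved_by_human, **limits):
    """Gate a proposal: risk limits first, then explicit human approval."""
    violations = check_risk_limits(current_exposure_usd, proposal, **limits)
    if violations:
        return "rejected: " + "; ".join(violations)
    if not approved_by_human:
        return "pending approval"
    return "ready to execute"
```

For example, decide(100.0, Proposal("some-market", 50.0, "new filing"), approved_by_human=False, max_exposure_usd=200.0, max_change_usd=75.0) stays at "pending approval" until you say yes.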
Compared to your current setup (M4 16/256 + mix of local/cloud)
- The upgraded machine will reduce how often you "need" cloud for capability.
- But for tool reliability + complex multi-step browsing, cloud models can still be better (and sometimes cheaper overall because they finish in fewer attempts).
If you tell me "I want $0 paid APIs, I'm okay with slower/less capable", I can outline the exact knobs to disable (web_search, skills that use cloud APIs, etc.) while keeping the browser-based workflow working.
I'm just wondering if the upgrade cost (about 1600$) would be compensated by the savings in API tokens in the long run
It can pay off, but only if your monthly API spend is high enough. Otherwise $1600 is a long break-even.
Simple break-even math
Roughly:
break-even months ≈ 1600 ÷ (monthly API spend you avoid)
So for example:
- If you'd otherwise spend $200/mo on APIs → break-even ~8 months
- $50/mo → ~32 months (~2.7 years)
- $20/mo → ~80 months (~6.7 years)
(And local isn't truly $0: you'll still have some electricity cost, and you may still choose to use a hosted model for certain tasks.)
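The same break-even arithmetic as a small Python helper, in case you want to plug in your own numbers. The monthly_extra_cost parameter (for things like electricity) is an addition for illustration and defaults to zero.

```python
def break_even_months(upgrade_cost: float, monthly_savings: float,
                      monthly_extra_cost: float = 0.0) -> float:
    """Months until the upgrade pays for itself; infinite if it never does."""
    net_savings = monthly_savings - monthly_extra_cost
    if net_savings <= 0:
        return float("inf")
    return upgrade_cost / net_savings


# The three spend levels from the example above, against a $1600 upgrade.
for avoided_spend in (200, 50, 20):
    months = break_even_months(1600, avoided_spend)
    print(f"${avoided_spend}/mo avoided -> {months:.0f} months (~{months / 12:.1f} years)")
```

Passing a non-zero monthly_extra_cost shows how quickly a small running cost stretches the timeline when the avoided spend is low.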
The trap: local often increases "time cost"
Even if it saves tokens, local can be slower/less reliable for tool-heavy work → you may end up spending more time getting the same outcome. That matters a lot when you're learning/building.
Best way to decide without guessing
Measure your real usage for 1–2 weeks, then extrapolate:
- Turn on usage footers in chat: /usage cost (or /usage full)
- Do your normal workflow for a week (or simulate the Polymarket "worker" checks)
- Look at the daily/weekly cost trend and project monthly
If you tell me what you're currently spending per week (or paste a couple /usage cost results), I can help you estimate a realistic break-even.
I'm just thinking that if I spend 3$ a day, which isn't that hard on a computer running 24h a day, that's 93$ per month (let's call it 100$). In 1.5 years the difference is paid for in API savings, electricity included
Your math is basically right if (big if) the upgrade really lets you eliminate most of that ~$3/day:
- $3/day ≈ $90/mo
- $1600 ÷ $90/mo ≈ 17–18 months (call it ~1.5 years)
So the break-even story is plausible.
Where this often goes sideways is in the assumptions:
1) The upgrade won't necessarily take API spend to "near-zero"
Even on an M4 Pro/64GB, you may still choose to use a hosted model for:
- tool-heavy browsing/planning reliability
- speed (cloud can be dramatically faster wall-clock)
- vision/image tasks
- web search APIs (unless you disable them and rely on browser scraping)
So your spend might go from $90 → $30/mo, not $90 → $0/mo. That is $60/mo saved instead of $90, which stretches break-even from ~18 to ~27 months.
2) You already proved your current M4 can run local
You've got Qwen 4B doing ~21s replies after prompt slimming. So the upgrade isn't "cloud vs local", it's "better local vs current local + occasional cloud". The incremental savings may be smaller than you're thinking.
Best next step before spending $1600
Run a 1–2 week "local-first" trial on your current Mac:
- keep your current lean prompt setup
- only use Gemini/hosted when you truly get stuck
- track actual spend with /usage cost
Then you'll know whether the upgrade would save you $20/mo or $100/mo in practice.
If you paste a couple days of /usage cost totals (or just "I spent $X in 3 days doing Y"), I can help you estimate a more realistic break-even.
What I really need to know is if that speed difference between local and online-based (with the more capable M4Pro version) is big enough to eventually make any betting strategy make or break
It depends entirely on what kind of "strategy" you mean, and for most Polymarket-style edges, the "LLM latency difference" is rarely the make-or-break factor.
When speed does matter (and local vs cloud can matter)
If your plan is basically "react to breaking news before everyone else", then yes: seconds matter. In that world:
- 20–30s local latency can be too slow
- 2–5s cloud latency can be better
…but also: you're competing against people with direct feeds + custom code + automation. An LLM-driven pipeline (local or cloud) usually isn't the winning edge there.
When speed doesn't matter much
If your plan is more like:
- medium-term thesis updates
- interpreting complex developments
- adjusting positions a few times per day / week
…then the difference between 5s vs 25s usually doesn't decide profitability. Correctness + discipline + risk management matter far more.
The real "make or break" factors are usually:
- data freshness (what sources you watch, alerts, RSS, transcripts, filings, etc.)
- execution plumbing (API reliability, avoiding UI scraping, reducing friction)
- false positives / hallucinations (LLM confidently wrong = expensive)
- risk controls (position sizing, max loss, cooldowns, human approval)
Practical recommendation
Use the LLM as an analyst, not a high-frequency trader:
- run fast deterministic monitors (RSS/keyword triggers) to detect "something changed"
- only then invoke the LLM to summarize/assess impact
- keep a human approval step before any bet placement
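As a sketch of the "deterministic monitor first, LLM second" idea: the Python snippet below decides with plain string matching whether a headline is both new and relevant, and only then would you spend an LLM call on it. The keyword list and the make_trigger name are made up for illustration; feeding it real headlines (RSS, alerts) is left out.

```python
import hashlib


def make_trigger(keywords):
    """Build a cheap filter that says whether a headline deserves an LLM call."""
    lowered = [k.lower() for k in keywords]
    seen = set()  # digests of headlines already processed

    def should_invoke_llm(headline: str) -> bool:
        normalized = headline.strip().lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest in seen:
            return False  # duplicate: nothing changed
        seen.add(digest)
        return any(k in normalized for k in lowered)

    return should_invoke_llm


trigger = make_trigger(["recount", "court ruling"])  # placeholder keywords
for headline in ["Recount ordered in key state", "Local team wins again"]:
    if trigger(headline):
        print("would ask the LLM about:", headline)
```

Because the filter is deterministic and runs in microseconds, it can sit on the hot path; the slow, fallible LLM step only runs on the small fraction of items that pass.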
If you tell me your intended cadence (e.g. "react within 1–2 minutes" vs "daily adjustments"), I can say whether local latency is likely acceptable or whether you're in a regime where you'll want cloud + a more automated data pipeline.
My idea was precisely to build a setup through OC that would allow me to be one of those people with direct feeds + custom code + automation. Is it plausible with both Mac Mini versions or just the higher end?
It's plausible to build an automated "direct feeds + custom code" setup with OpenClaw in the loop, but the honest answer is that hardware is not the main limiter once you're chasing speed.
What actually determines whether you can be "one of those people"
- Data ingestion latency + reliability
- How fast you get the signal (feeds, alerts, RSS, transcripts, filings, etc.)
- Whether it's machine-readable and stable
- Execution plumbing
- API-first integration (UI scraping is too brittle/slow)
- Robust retries, idempotency, logging, risk limits
- Running the execution code close to the network endpoints that matter (often a VPS beats a box under your router)
- Critical path design
If you want to compete on speed, an LLM should not be in the critical path.
- Use deterministic code to detect an event and (optionally) execute within seconds
- Use the LLM for: summarization, classification, "is this materially relevant?", generating human-readable briefs, etc.
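For the "robust retries, idempotency, logging" part of the execution plumbing, a rough sketch might look like the following. `send(order, key)` is a hypothetical client function, and the whole thing assumes the remote API deduplicates requests that carry the same idempotency key; check whether the API you use actually supports that.

```python
import logging
import time
import uuid

log = logging.getLogger("worker")


def submit_with_retry(send, order, max_attempts=3, base_delay_s=0.2, sleep=time.sleep):
    """Retry a flaky call while reusing one idempotency key.

    `send(order, key)` is a hypothetical client function; the assumption is that
    the remote side ignores duplicate requests carrying the same key.
    """
    key = str(uuid.uuid4())  # generated once, reused on every attempt
    for attempt in range(1, max_attempts + 1):
        try:
            return send(order, key)
        except ConnectionError as exc:
            log.warning("attempt %d/%d failed: %s", attempt, max_attempts, exc)
            if attempt == max_attempts:
                raise
            sleep(base_delay_s * 2 ** (attempt - 1))  # exponential backoff
```

Reusing one key across attempts is what keeps a retry after a timeout from turning into a double order.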
So... base M4 vs M4 Pro 64GB?
- Both can run OpenClaw + scripts + cron + web monitoring.
- The M4 Pro/64GB mainly helps if you insist on doing more LLM work locally (bigger local models, fewer stalls, more headroom).
- But if your goal is "low-latency automated execution," the difference between those Macs is usually less important than:
- running the worker on a VPS
- having good data sources
- keeping the LLM out of the hot path
Practical recommendation
- Use your current Mac mini as the "control plane" / interface.
- Put the fast feed+execution worker somewhere reliable/always-on (often a VPS), and let OpenClaw supervise + alert + approve.
If you tell me whether you're aiming for "react within seconds" vs "react within minutes," I can be more concrete about whether the Mac-only approach is realistic.
React within seconds would be the answer
If your target is "react within seconds", then the honest answer is:
- Neither Mac mini (base or Pro) with an LLM-in-the-loop pipeline is the right way to compete on speed.
- You need an event-driven, deterministic worker (code) that's always running, plus API/WebSocket feeds, and likely a VPS with good network routes.
A few concrete implications:
1) Keep the LLM out of the hot path
LLMs (local or cloud) add unpredictable latency. For seconds-level reaction, the "decision" must be:
- rule-based / statistical code
- precomputed thresholds
- fast parsing of structured data
Use the LLM after the fact to:
- summarize what happened
- explain the rationale
- generate a report
- optionally sanity-check risk limits (but not blocking execution)
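As a rough illustration of that split, the sketch below keeps the decision as plain arithmetic against a precomputed threshold and hands the event to a queue that a slow LLM worker could drain later. `THRESHOLD`, `decide`, `on_tick`, and the idea of a `model_prob` input are all assumptions made up for the example.

```python
import queue

summary_queue = queue.Queue()  # drained later by a slow LLM worker (not shown)

THRESHOLD = 0.05  # hypothetical precomputed edge: act only on a gap of 5 points or more


def decide(market_price: float, model_prob: float) -> str:
    """Hot path: pure arithmetic on structured data, no network or LLM call."""
    edge = model_prob - market_price
    if edge >= THRESHOLD:
        return "buy"
    if edge <= -THRESHOLD:
        return "sell"
    return "hold"


def on_tick(market_price: float, model_prob: float) -> str:
    action = decide(market_price, model_prob)
    if action != "hold":
        # Non-blocking hand-off: the LLM explains the trade after the fact.
        summary_queue.put_nowait(
            {"price": market_price, "prob": model_prob, "action": action}
        )
    return action
```

The LLM never sits between the tick and the decision; it only sees events that were already acted on.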
2) Cron polling won't cut it
Even 10-second polling is too slow/noisy. You want:
- streaming updates (WebSocket) or push-based feeds
- a daemon/service, not "ask an agent repeatedly"
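The shape of such a daemon, as opposed to a polling loop, might look something like this with plain `asyncio`. `fake_feed` is a stand-in I made up for a real WebSocket or push feed, and the event format is an assumption; a real worker would await messages from the socket instead.

```python
import asyncio


async def fake_feed(out: asyncio.Queue, events):
    """Stand-in for a WebSocket/push feed; a real one would await socket messages."""
    for event in events:
        await out.put(event)
    await out.put(None)  # sentinel: feed closed


async def worker(inbox: asyncio.Queue, handle):
    """Long-running consumer: wakes only when an event arrives, no polling interval."""
    handled = []
    while True:
        event = await inbox.get()
        if event is None:
            break
        handled.append(handle(event))
    return handled


async def main(events):
    inbox = asyncio.Queue()
    _, results = await asyncio.gather(
        fake_feed(inbox, events),
        worker(inbox, lambda e: e["price"]),
    )
    return results
```

Something like `asyncio.run(main([{"price": 0.41}]))` drives it end to end; in practice the worker would run under a service manager so it restarts on failure.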
3) Hardware matters less than architecture + network placement
An M4 Pro is nice, but your limiting factor will be:
- data latency
- execution latency
- reliability
- ToS/API constraints
A small VPS close to the services you hit often beats a box under your router.