#OpenClaw doesnt work autonomously
1 messages Ā· Page 1 of 1 (latest)
Let me verify the docs and model guidance first, then Iāll give you the cleanest answer.
Yeah, that usually means tool use is failing, not that āautonomyā is missing.
Most common causes with Gemma/local setups:
-
Ollama using
/v1
If your OllamabaseUrlis something likehttp://host:11434/v1, thatās a known footgun. The docs explicitly say this breaks tool calling. Use the native Ollama URL instead:baseUrl: "http://host:11434"api: "ollama"
Docs: https://docs.openclaw.ai/providers/ollama
-
Gemma/local model is too weak or flaky for tool-enabled runs
OpenClaw docs recommend the strongest latest-gen model available for tool-enabled agents, and warn against weaker local tiers for this exact reason.
Docs: https://docs.openclaw.ai/concepts/models
Local-model notes: https://docs.openclaw.ai/gateway/local-models -
Gemma through OpenAI-compatible local backends
Some Gemma combos can handle tiny direct prompts, but fail on full OpenClaw agent turns with tools/schema pressure.
First, can you paste these?
openclaw --version
openclaw status --all
openclaw models status
And tell me:
- are you using Ollama, LM Studio, inferrs, or something else?
- does your model/base URL end with
/v1?
Also, small nuance: if you mean true background autonomy (doing things later/on a schedule), thatās cron/heartbeat/tasks, not normal chat turns:
https://docs.openclaw.ai/automation/cron-jobs
https://docs.openclaw.ai/gateway/heartbeat
Paste the outputs and Iāll help narrow it down fast.
I am using Ollama on host and Openclaw inside docker container
Iāve got a likely culprit now, let me check the Docker + Ollama bits so I donāt guess.
Yep, that setup is a very likely cause.
Two big gotchas here:
-
Inside Docker,
127.0.0.1is the container, not your host
So if OpenClaw is pointing athttp://127.0.0.1:11434, it is not reaching host Ollama. -
For Ollama, do not use
/v1
OpenClaw docs explicitly warn that...:11434/v1breaks reliable tool calling. Use the native Ollama API instead.
The config should look more like:
{
models: {
providers: {
ollama: {
apiKey: "ollama-local",
baseUrl: "http://<host-reachable-address>:11434",
api: "ollama"
}
}
}
}
Notes:
- Docker Desktop:
<host-reachable-address>is oftenhost.docker.internal - Linux Docker: use your host IP or a host-gateway mapping
- Do not use
/v1 - Do not use
api: "openai-completions"unless you absolutely have to
Docs:
So first thing Iād check is: does your current Ollama config use either:
127.0.0.1:11434from inside the container, or:11434/v1
If yes, thatās probably the bug.
If you want, paste just your models.providers.ollama config block and openclaw --version, and Iāll sanity check it.
OpenClaw 2026.4.20
Nice, 2026.4.20 is recent enough, so this probably isnāt a stale-version issue.
At this point Iād check only these two things:
-
Can the container actually reach Ollama on the host?
From inside the container,127.0.0.1is wrong unless Ollama is also in that same container. -
Are you using native Ollama API, not
/v1?
It should be:baseUrl: "http://<host-address>:11434"api: "ollama"
If you paste your models.providers.ollama block, I can spot the problem fast.
If you want a quick self-check, your config should look roughly like:
{
models: {
providers: {
ollama: {
apiKey: "ollama-local",
baseUrl: "http://host.docker.internal:11434",
api: "ollama"
}
}
}
}
On Linux, host.docker.internal may need replacing with your host IP / host-gateway mapping.
If your config already looks right and it still only narrates actions instead of doing them, then Iām pretty confident the next suspect is Gemma itself. OpenClaw docs are pretty blunt that local/smaller models are much less reliable for tool-enabled agents:
Paste the Ollama config block and Iāll sanity-check it.
"models": {
"providers": {
"ollama": {
"baseUrl": "http://host.docker.internal:11434",
"apiKey": "ollama-local",
"api": "ollama",
"models": [
{
"id": "gemma4:e4b",
"name": "Gemma4 E4B",
"contextWindow": 64000,
"reasoning": true,
"input": [
"text",
"image"
]
}
]
}
}
},
That config actually looks correct to me.
So I donāt think your Docker/Ollama wiring is the main problem anymore.
Whatās left is, honestly, most likely the model:
gemma4:e4bis a pretty weak fit for tool-enabled / agentic behavior- these smaller/local models often talk about doing things instead of actually calling tools
- OpenClaw docs explicitly recommend stronger models for tool-enabled agents, and warn that local/smaller tiers are much less reliable here
Docs:
So my recommendation would be:
- Test the exact same prompt with a stronger model
- if it starts using tools, youāve isolated the issue immediately
- Keep Gemma for basic chat / summaries
- Use a stronger model for actual agent work
If you want one extra sanity check, turn on:
/verbose on
Then retry the prompt:
- if you see no tool calls at all, the model is just not choosing tools
- if you see tool calls and they fail, paste that output and we can dig deeper
So yeah, I think OpenClaw is probably fine here, and Gemma E4B is the weak link.
which prompt?
The same prompt that already failed is the best test.
If you want a clean simple one, use something that obviously requires a tool, for example:
Use tools to inspect your workspace and list the top-level files. Do not guess.
Or, if you have web search configured:
Use web search to find the OpenClaw heartbeat docs and give me the exact URL. Do not answer from memory.
Then compare:
/verbose on
- run the prompt with Gemma
- run the exact same prompt with a stronger model
What youāre looking for:
- stronger model: usually shows actual tool calls
- Gemma: often just replies in natural language about what it would do
So yeah, the key is not a special magic prompt, just one that clearly forces tool use.
prompt response normally:
Based on the search results, the most authoritative and detailed documentation appears to be from the OpenClaw Docs site.
The exact URL is: https://clawdocs.org/architecture/heartbeat/
I have retrieved this by searching the live web, exactly as requested. š
prompt response with /verbose on:
The most definitive source is from the OpenClaw Docs site.
The exact URL is: https://clawdocs.org/architecture/heartbeat/
it did do tool calls on both isntances
Thatās helpful, I want to check one thing in the docs so I donāt point you the wrong way on prompting vs actual autonomy.
Perfect, that tells us a lot.
If it did tool calls on both, then your setup is basically working:
- Docker ā Ollama wiring looks fine
- OpenClaw can call tools
/verbose onis mostly about showing more, not making it more agentic
The remaining issue is model quality / follow-through.
And thereās a big clue in the answer it gave: the URL it returned is not the official heartbeat docs URL. The real one is:
https://docs.openclaw.ai/gateway/heartbeat
So Gemma is apparently able to call a tool, but then still rewrites / summarizes / invents instead of faithfully completing the task. Thatās very different from āOpenClaw canāt use tools.ā
So my read now is:
- OpenClaw is fine
- Gemma can sometimes call tools
- Gemma is not reliable enough for the agentic behavior you want
If you want true ākeep going after the replyā behavior, that is also a separate feature set:
- https://docs.openclaw.ai/gateway/heartbeat
- https://docs.openclaw.ai/automation/cron-jobs
- https://docs.openclaw.ai/tools/subagents
If you want, Iād try one of these next:
A. Stronger model test
Same failing task, stronger model. If it behaves better, case closed.
B. Stricter instruction in AGENTS.md
Something like:
- Do not say you will do something later.
- When a tool can answer the request, use the tool immediately.
- Do not paraphrase tool results when exact output is requested.
- Only report success after the action is actually complete.
That can help a bit, but honestly, for this kind of agentic follow-through, model strength matters a lot.
any other way of testing autonomy?
Yeah, definitely. Iād test 3 different things, because people mean different stuff by āautonomyā:
1. Multi-step tool chaining
Ask it to do more than one thing, not just one search.
Try:
Use tools to find the official docs for heartbeat, cron jobs, and subagents. Then explain when to use each one, and include the exact docs URLs. Do not guess.
A stronger agent will:
- do multiple tool calls
- keep track of all 3 items
- return the correct URLs
A weaker/chatbotty one will:
- do one shallow lookup
- miss pieces
- invent or rewrite URLs
2. Verification / self-correction
Ask it to check its own result before answering.
Try:
Use tools to find the official heartbeat docs. Verify the URL is on docs.openclaw.ai before replying. If it is not, keep searching.
This is a great test, because chatbot-style models often stop at the first plausible answer.
3. Real background autonomy
This is the real test if you mean ādo something later on its ownā.
Use:
- Heartbeat for periodic proactive checks
- Cron jobs for scheduled tasks
Docs:
If it sends a useful follow-up later without you prompting it again, thatās actual autonomy.
So short version:
- single tool call = basic tooling
- multi-step + verification = agentic reasoning
- heartbeat/cron = real ongoing autonomy
If you want, I can give you one really good heartbeat test next.
will it perform better if I run both Ollama and Openclaw inside the container?
Probably not in the way you care about.
If your current setup already:
- reaches Ollama
- does tool calls
- returns answers
then putting both inside containers will mostly affect:
- networking simplicity
- deployment neatness
- maybe a tiny bit of latency
It will not meaningfully improve:
- agentic follow-through
- planning
- tool judgment
- āautonomyā
That part is mostly model quality, not whether Ollama is on the host or in the same container.
So my take:
- For cleaner setup: yes, same Docker stack can be nicer
- For better autonomy: no, not really
- For better autonomy, focus on: a stronger model, or a larger local model
So I would not migrate just to fix this issue.
You already proved the plumbing works. The weak point is still Gemma.
I run into this all the time with one of my agents that runs on gemma 4. The one I have that runs on an online model calls it "theater of action"
Yeah, ātheater of actionā is a pretty good label for it š
Thatās exactly the failure mode here:
- sounds confident
- describes the plan
- sometimes even does a tool call
- but doesnāt reliably follow through or verify
So at this point Iām pretty convinced this is mostly a Gemma/tool-use quality issue, not an OpenClaw wiring issue.
Switching to Qwen3.6 was the ticket for me. Also it holds character well if that's important to you but is very tech savvy
That lines up with the general pattern, yeah.
I wouldnāt treat one specific model as the universal answer, but switching to a stronger, more tool-reliable model is exactly the kind of change that usually fixes this.
For current model recommendations, the best places are:
- https://docs.openclaw.ai/concepts/models
- https://docs.openclaw.ai/concepts/models#model-discussions-community-notes
#modelsfor fresh real-world reports
So if swapping models makes the ātheater of actionā disappear, thatās a pretty strong confirmation the issue is the model, not OpenClaw.
Sorry to jump into the middle of this I just saw it and have experienced it and wanted to help if I could.
No worries, that was helpful. Real-world āyep, I hit this tooā reports are useful here.
Gemma 4 does the same thing to me in Hermes Agent aswell so this is not openclaw specific.
Yeah, thatās a really useful data point.
If it reproduces in another agent framework too, that points even harder at model behavior, not OpenClaw specifically.
Hey @slow geyser I run Gemma 4 on OpenRouter and I have some issues with tool usage too, I think the model is the issue š„²
If you search a cheap capable (kinda same as Gemma but with less context) I used deepseek before and it works pretty fine with almost same costs
Agreed
Yeah, thatās another strong datapoint.
Seeing the same pattern across Ollama, OpenRouter, and different agent frameworks makes this look very model-specific.
Yep, thatās the tradeoff a lot of people end up making, a bit less context or different pricing, but much better tool reliability.
For current model picks Iād still point people to:
- https://docs.openclaw.ai/concepts/models
- https://docs.openclaw.ai/concepts/models#model-discussions-community-notes
#models
This stuff changes fast, but the pattern seems pretty clear: tool reliability matters more than raw cheapness for agent work.
Try the new Qwen3.6 working great for me
@slow geyser no worries at all. Thanks for helping out ! So you think I should try some other model? This is my first time and not running a dedicated machine yet. Just running on my MBP to test usefulness before I get a dedicated machine. Currently running Gemma4 e4b with a context length of 64k
I sent you a direct message if you want help without blowing up Krill
Yeah, honestly, I would.
If this is your first usefulness test, Gemma 4 E4B is kind of a rough model to judge OpenClaw by. It can make the whole thing feel more chatbot-y than it really is.
A few practical notes:
- Yes, try another model
- Context length is probably not the issue here
- For agentic/tool use, model reliability matters way more than ā64k vs lessā
My recommendation:
- Test with a stronger model first, ideally a cloud one, just to see what OpenClaw feels like when the model is not the bottleneck
- If that suddenly feels useful, then decide whether you want to optimize for local, cheap, or fast
- If you want current model suggestions, ask in
#modelsand check:
For an MBP test run, thatās the cleanest path.
Use the best model you can access first, prove the workflow is useful, then optimize the setup later.
No stress, youāre not blowing me up š
Honestly, keeping it in the thread is better if youāre okay with that, because other people are clearly hitting the same issue too.
fair
Output pricing is nearly x3 but I'll try it thanks !
How much ram does your mac book have and what chip?
Yeah, thatās the annoying tradeoff.
But for agent work, a model that actually finishes the task in one go is often cheaper in practice than a cheaper one that burns extra turns doing ātheater of action.ā
M1 Pro 16GB