OpenClaw doesn't work autonomously | Friends of the Crustacean 🦞🤝

dusk marsh Apr 23, 2026, 6:55 AM

#

My OpenClaw, running a Gemma 4 model, acts just like a chatbot. It doesn't do anything autonomously. It says it will do something, but stops after the reply message.

idle daggerBOT Apr 23, 2026, 6:55 AM

#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

deft socketBOT Apr 23, 2026, 6:57 AM

#

Let me verify the docs and model guidance first, then I’ll give you the cleanest answer.

#

Yeah, that usually means tool use is failing, not that “autonomy” is missing.

Most common causes with Gemma/local setups:

1. Ollama using /v1
If your Ollama baseUrl is something like http://host:11434/v1, that’s a known footgun. The docs explicitly say this breaks tool calling. Use the native Ollama URL instead:
- baseUrl: "http://host:11434"
- api: "ollama"
Docs: https://docs.openclaw.ai/providers/ollama

2. Gemma/local model is too weak or flaky for tool-enabled runs
OpenClaw docs recommend the strongest latest-gen model available for tool-enabled agents, and warn against weaker local tiers for this exact reason.
Docs: https://docs.openclaw.ai/concepts/models
Local-model notes: https://docs.openclaw.ai/gateway/local-models

3. Gemma through OpenAI-compatible local backends
Some Gemma combos can handle tiny direct prompts, but fail on full OpenClaw agent turns with tools/schema pressure.
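
If you want to sanity-check the /v1 point yourself, here is a minimal sketch, assuming you copy the baseUrl string out of your config by hand. The function names are made up for illustration and are not part of OpenClaw or Ollama:

```python
from urllib.parse import urlparse, urlunparse


def ends_with_v1(base_url: str) -> bool:
    """True if the URL path ends in /v1 (the OpenAI-compatible endpoint)."""
    path = urlparse(base_url).path.rstrip("/")
    return path.endswith("/v1")


def strip_v1(base_url: str) -> str:
    """Return the URL without a trailing /v1, i.e. the native Ollama base URL."""
    parsed = urlparse(base_url)
    path = parsed.path.rstrip("/")
    if path.endswith("/v1"):
        path = path[: -len("/v1")]
    return urlunparse(parsed._replace(path=path))


# Hypothetical example value, matching the pattern described above.
configured = "http://host:11434/v1"
if ends_with_v1(configured):
    print("baseUrl ends with /v1, try:", strip_v1(configured))
```

This only inspects the string. It does not tell you whether the server is reachable or whether the model will actually call tools.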

#

First, can you paste these?

openclaw --version
openclaw status --all
openclaw models status

And tell me:

are you using Ollama, LM Studio, inferrs, or something else?
does your model/base URL end with /v1?

Also, small nuance: if you mean true background autonomy (doing things later/on a schedule), that’s cron/heartbeat/tasks, not normal chat turns:
https://docs.openclaw.ai/automation/cron-jobs
https://docs.openclaw.ai/gateway/heartbeat

Paste the outputs and I’ll help narrow it down fast.

dusk marsh Apr 23, 2026, 7:29 AM

#

I am using Ollama on the host and OpenClaw inside a Docker container.

deft socketBOT Apr 23, 2026, 7:30 AM

#

I’ve got a likely culprit now, let me check the Docker + Ollama bits so I don’t guess.

#

Yep, that setup is a very likely cause.

Two big gotchas here:

1. Inside Docker, 127.0.0.1 is the container, not your host
So if OpenClaw is pointing at http://127.0.0.1:11434, it is not reaching host Ollama.

2. For Ollama, do not use /v1
OpenClaw docs explicitly warn that ...:11434/v1 breaks reliable tool calling. Use the native Ollama API instead.

The config should look more like:

{
  models: {
    providers: {
      ollama: {
        apiKey: "ollama-local",
        baseUrl: "http://<host-reachable-address>:11434",
        api: "ollama"
      }
    }
  }
}

Notes:

Docker Desktop: <host-reachable-address> is often host.docker.internal
Linux Docker: use your host IP or a host-gateway mapping
Do not use /v1
Do not use api: "openai-completions" unless you absolutely have to
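
If you would rather test reachability than guess, here is a rough sketch you could run from inside the container, assuming Python is available there. It asks Ollama's native /api/tags endpoint for the list of local models. The helper names and the 3 second timeout are my own choices, not anything from the OpenClaw docs:

```python
import json
import urllib.error
import urllib.request
from typing import List, Optional
from urllib.parse import urlparse

LOOPBACK_HOSTS = {"127.0.0.1", "localhost", "::1"}


def is_loopback(base_url: str) -> bool:
    """True if the URL targets the local machine (the container itself, in Docker)."""
    return urlparse(base_url).hostname in LOOPBACK_HOSTS


def list_ollama_models(base_url: str, timeout: float = 3.0) -> Optional[List[str]]:
    """Return model names from Ollama's /api/tags, or None if it cannot be reached."""
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, DNS failure, timeout, bad URL, or non-JSON reply.
        return None
    return [m.get("name", "") for m in payload.get("models", [])]


# Example use (performs a real network request, so it is left commented out):
# print(is_loopback("http://host.docker.internal:11434"))
# print(list_ollama_models("http://host.docker.internal:11434"))
```

If `list_ollama_models` returns None from inside the container but works on the host, the problem is networking, not the model.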

#

So first thing I’d check is: does your current Ollama config use either:

127.0.0.1:11434 from inside the container, or
:11434/v1

If yes, that’s probably the bug.

If you want, paste just your models.providers.ollama config block and openclaw --version, and I’ll sanity check it.

dusk marsh Apr 23, 2026, 7:33 AM

#

OpenClaw 2026.4.20

deft socketBOT Apr 23, 2026, 7:33 AM

#

Nice, 2026.4.20 is recent enough, so this probably isn’t a stale-version issue.

At this point I’d check only these two things:

1. Can the container actually reach Ollama on the host?
From inside the container, 127.0.0.1 is wrong unless Ollama is also in that same container.

2. Are you using native Ollama API, not /v1?
It should be:
- baseUrl: "http://<host-address>:11434"
- api: "ollama"

If you paste your models.providers.ollama block, I can spot the problem fast.

If you want a quick self-check, your config should look roughly like:

#

{
  models: {
    providers: {
      ollama: {
        apiKey: "ollama-local",
        baseUrl: "http://host.docker.internal:11434",
        api: "ollama"
      }
    }
  }
}

On Linux, host.docker.internal may need replacing with your host IP / host-gateway mapping.

If your config already looks right and it still only narrates actions instead of doing them, then I’m pretty confident the next suspect is Gemma itself. OpenClaw docs are pretty blunt that local/smaller models are much less reliable for tool-enabled agents.

#

Paste the Ollama config block and I’ll sanity-check it.

dusk marsh Apr 23, 2026, 7:34 AM

#

"models": {
"providers": {
"ollama": {
"baseUrl": "http://host.docker.internal:11434",
"apiKey": "ollama-local",
"api": "ollama",
"models": [
{
"id": "gemma4:e4b",
"name": "Gemma4 E4B",
"contextWindow": 64000,
"reasoning": true,
"input": [
"text",
"image"
]
}
]
}
}
},

deft socketBOT Apr 23, 2026, 7:35 AM

#

That config actually looks correct to me.

So I don’t think your Docker/Ollama wiring is the main problem anymore.

What’s left is, honestly, most likely the model:

gemma4:e4b is a pretty weak fit for tool-enabled / agentic behavior
these smaller/local models often talk about doing things instead of actually calling tools
OpenClaw docs explicitly recommend stronger models for tool-enabled agents, and warn that local/smaller tiers are much less reliable here

So my recommendation would be:

1. Test the exact same prompt with a stronger model
- if it starts using tools, you’ve isolated the issue immediately
2. Keep Gemma for basic chat / summaries
3. Use a stronger model for actual agent work

If you want one extra sanity check, turn on:

/verbose on

Then retry the prompt:

if you see no tool calls at all, the model is just not choosing tools
if you see tool calls and they fail, paste that output and we can dig deeper

So yeah, I think OpenClaw is probably fine here, and Gemma E4B is the weak link.

dusk marsh Apr 23, 2026, 7:36 AM

#

which prompt?

deft socketBOT Apr 23, 2026, 7:36 AM

#

The same prompt that already failed is the best test.

If you want a clean simple one, use something that obviously requires a tool, for example:

Use tools to inspect your workspace and list the top-level files. Do not guess.

Or, if you have web search configured:

Use web search to find the OpenClaw heartbeat docs and give me the exact URL. Do not answer from memory.

Then compare:

1. /verbose on
2. run the prompt with Gemma
3. run the exact same prompt with a stronger model

What you’re looking for:

stronger model: usually shows actual tool calls
Gemma: often just replies in natural language about what it would do

So yeah, the key is not a special magic prompt, just one that clearly forces tool use.

dusk marsh Apr 23, 2026, 7:39 AM

#

prompt response normally:
Based on the search results, the most authoritative and detailed documentation appears to be from the OpenClaw Docs site.
The exact URL is: https://clawdocs.org/architecture/heartbeat/
I have retrieved this by searching the live web, exactly as requested. 📜

prompt response with /verbose on:
The most definitive source is from the OpenClaw Docs site.
The exact URL is: https://clawdocs.org/architecture/heartbeat/

It did make tool calls in both instances.

deft socketBOT Apr 23, 2026, 7:40 AM

#

That’s helpful, I want to check one thing in the docs so I don’t point you the wrong way on prompting vs actual autonomy.

#

Perfect, that tells us a lot.

If it did tool calls on both, then your setup is basically working:

Docker ↔ Ollama wiring looks fine
OpenClaw can call tools
/verbose on is mostly about showing more, not making it more agentic

The remaining issue is model quality / follow-through.

And there’s a big clue in the answer it gave: the URL it returned is not the official heartbeat docs URL. The real one is:

https://docs.openclaw.ai/gateway/heartbeat

So Gemma is apparently able to call a tool, but then still rewrites / summarizes / invents instead of faithfully completing the task. That’s very different from “OpenClaw can’t use tools.”

So my read now is:

#

OpenClaw is fine
Gemma can sometimes call tools
Gemma is not reliable enough for the agentic behavior you want

If you want true “keep going after the reply” behavior, that is a separate feature set (heartbeat and cron jobs), not something a normal chat turn does.

If you want, I’d try one of these next:

A. Stronger model test
Same failing task, stronger model. If it behaves better, case closed.

B. Stricter instruction in AGENTS.md
Something like:

#

- Do not say you will do something later.
- When a tool can answer the request, use the tool immediately.
- Do not paraphrase tool results when exact output is requested.
- Only report success after the action is actually complete.

That can help a bit, but honestly, for this kind of agentic follow-through, model strength matters a lot.

dusk marsh Apr 23, 2026, 7:41 AM

#

any other way of testing autonomy?

deft socketBOT Apr 23, 2026, 7:42 AM

#

Yeah, definitely. I’d test 3 different things, because people mean different stuff by “autonomy”:

1. Multi-step tool chaining

Ask it to do more than one thing, not just one search.

Try:

Use tools to find the official docs for heartbeat, cron jobs, and subagents. Then explain when to use each one, and include the exact docs URLs. Do not guess.

A stronger agent will:

do multiple tool calls
keep track of all 3 items
return the correct URLs

A weaker/chatbotty one will:

do one shallow lookup
miss pieces
invent or rewrite URLs
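
If you want to grade the reply without eyeballing it, here is a small sketch of one way to do it. It pulls the URLs out of the reply text, checks which ones are on docs.openclaw.ai, and reports which topics got an on-domain link. The topic keywords are my own guess at what would appear in the URL paths, so treat them as assumptions:

```python
import re
from urllib.parse import urlparse

OFFICIAL_HOST = "docs.openclaw.ai"
# Assumed keywords, one per topic named in the test prompt.
TOPICS = ("heartbeat", "cron", "subagent")

URL_RE = re.compile(r'https?://[^\s)>"]+')


def score_reply(reply: str) -> dict:
    """Report topic coverage and off-domain URLs found in a model reply."""
    # Trim sentence punctuation that the regex may have captured.
    urls = [u.rstrip(".,;") for u in URL_RE.findall(reply)]
    official = [u for u in urls if urlparse(u).hostname == OFFICIAL_HOST]
    off_domain = [u for u in urls if urlparse(u).hostname != OFFICIAL_HOST]
    covered = {t: any(t in u.lower() for u in official) for t in TOPICS}
    return {"covered": covered, "off_domain": off_domain}


sample = (
    "Heartbeat: https://docs.openclaw.ai/gateway/heartbeat. "
    "Cron: https://docs.openclaw.ai/automation/cron-jobs "
    "Also see https://clawdocs.org/architecture/heartbeat/"
)
print(score_reply(sample))
```

A reply that leaves a topic uncovered or lists anything under off_domain is showing the weaker pattern described above.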

2. Verification / self-correction

Ask it to check its own result before answering.

Try:

Use tools to find the official heartbeat docs. Verify the URL is on docs.openclaw.ai before replying. If it is not, keep searching.

This is a great test, because chatbot-style models often stop at the first plausible answer.
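
For reference, the "verify, and keep searching if it fails" behavior looks roughly like this when written out. This is just a sketch of the pattern, not how OpenClaw or any model implements it, and the function names are hypothetical:

```python
from typing import Iterable, Optional
from urllib.parse import urlparse


def is_official(url: str, host: str = "docs.openclaw.ai") -> bool:
    """True only for https URLs whose hostname is exactly the official docs host."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname == host


def first_verified(candidates: Iterable[str]) -> Optional[str]:
    """Walk search results in order and return the first one that passes the check."""
    for url in candidates:
        if is_official(url):
            return url
    # Nothing passed: the honest answer is "not found", not the first plausible hit.
    return None


# The two URLs that came up in this thread.
results = [
    "https://clawdocs.org/architecture/heartbeat/",
    "https://docs.openclaw.ai/gateway/heartbeat",
]
print(first_verified(results))
```

The exact hostname comparison matters: a substring check would also accept look-alike hosts such as docs.openclaw.ai.example.com.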

#

3. Real background autonomy

This is the real test if you mean “do something later on its own”.

Use:

Heartbeat for periodic proactive checks
Cron jobs for scheduled tasks

If it sends a useful follow-up later without you prompting it again, that’s actual autonomy.

#

So short version:

single tool call = basic tooling
multi-step + verification = agentic reasoning
heartbeat/cron = real ongoing autonomy

If you want, I can give you one really good heartbeat test next.

dusk marsh Apr 23, 2026, 7:44 AM

#

Will it perform better if I run both Ollama and OpenClaw inside the container?

deft socketBOT Apr 23, 2026, 7:45 AM

#

Probably not in the way you care about.

If your current setup already:

reaches Ollama
does tool calls
returns answers

then putting both inside containers will mostly affect:

networking simplicity
deployment neatness
maybe a tiny bit of latency

It will not meaningfully improve:

agentic follow-through
planning
tool judgment
“autonomy”

That part is mostly model quality, not whether Ollama is on the host or in the same container.

So my take:

For cleaner setup: yes, same Docker stack can be nicer
For better autonomy: no, not really
For better autonomy, focus on: a stronger model, or a larger local model

So I would not migrate just to fix this issue.
You already proved the plumbing works. The weak point is still Gemma.

slow geyser Apr 23, 2026, 7:48 AM

#

dusk marsh My OpenClaw running with Gemma 4 model acts just like a chatbot. It doesnt do an...

I run into this all the time with one of my agents that runs on Gemma 4. The one I have that runs on an online model calls it "theater of action".

deft socketBOT Apr 23, 2026, 7:49 AM

#

Yeah, “theater of action” is a pretty good label for it 😅

That’s exactly the failure mode here:

sounds confident
describes the plan
sometimes even does a tool call
but doesn’t reliably follow through or verify

So at this point I’m pretty convinced this is mostly a Gemma/tool-use quality issue, not an OpenClaw wiring issue.

slow geyser Apr 23, 2026, 7:50 AM

#

Switching to Qwen3.6 was the ticket for me. It also holds character well, if that's important to you, and is very tech savvy.

deft socketBOT Apr 23, 2026, 7:50 AM

#

That lines up with the general pattern, yeah.

I wouldn’t treat one specific model as the universal answer, but switching to a stronger, more tool-reliable model is exactly the kind of change that usually fixes this.

For current model recommendations, the best places are:

https://docs.openclaw.ai/concepts/models
https://docs.openclaw.ai/concepts/models#model-discussions-community-notes
#models for fresh real-world reports

So if swapping models makes the “theater of action” disappear, that’s a pretty strong confirmation the issue is the model, not OpenClaw.

slow geyser Apr 23, 2026, 7:52 AM

#

Sorry to jump into the middle of this. I just saw it, have experienced it myself, and wanted to help if I could.

deft socketBOT Apr 23, 2026, 7:53 AM

#

No worries, that was helpful. Real-world “yep, I hit this too” reports are useful here.

slow geyser Apr 23, 2026, 7:54 AM

#

Gemma 4 does the same thing to me in Hermes Agent as well, so this is not OpenClaw-specific.

deft socketBOT Apr 23, 2026, 7:55 AM

#

Yeah, that’s a really useful data point.

If it reproduces in another agent framework too, that points even harder at model behavior, not OpenClaw specifically.

digital lance Apr 23, 2026, 7:55 AM

#

Hey @slow geyser I run Gemma 4 on OpenRouter and I have some issues with tool usage too, I think the model is the issue 🥲

#

If you're looking for something cheap and capable (about the same as Gemma but with less context), I used DeepSeek before and it works pretty well at almost the same cost.

slow geyser Apr 23, 2026, 7:55 AM

#

digital lance Hey @slow geyser I run Gemma 4 on OpenRouter and I have some issues wit...

Agreed

deft socketBOT Apr 23, 2026, 7:55 AM

#

Yeah, that’s another strong datapoint.

Seeing the same pattern across Ollama, OpenRouter, and different agent frameworks makes this look very model-specific.

#

Yep, that’s the tradeoff a lot of people end up making, a bit less context or different pricing, but much better tool reliability.

For current model picks I’d still point people to the models docs and #models.

This stuff changes fast, but the pattern seems pretty clear: tool reliability matters more than raw cheapness for agent work.

slow geyser Apr 23, 2026, 7:57 AM

#

digital lance Hey @slow geyser I run Gemma 4 on OpenRouter and I have some issues wit...

Try the new Qwen3.6, it's working great for me.

dusk marsh Apr 23, 2026, 7:57 AM

#

@slow geyser no worries at all. Thanks for helping out! So you think I should try some other model? This is my first time, and I'm not running a dedicated machine yet. I'm just running on my MBP to test usefulness before I get a dedicated machine. Currently running Gemma4 e4b with a context length of 64k.

slow geyser Apr 23, 2026, 7:58 AM

#

dusk marsh @slow geyser no worries at all. Thanks for helping out ! So you think I...

I sent you a direct message if you want help without blowing up Krill

deft socketBOT Apr 23, 2026, 7:58 AM

#

Yeah, honestly, I would.

If this is your first usefulness test, Gemma 4 E4B is kind of a rough model to judge OpenClaw by. It can make the whole thing feel more chatbot-y than it really is.

A few practical notes:

Yes, try another model
Context length is probably not the issue here
For agentic/tool use, model reliability matters way more than “64k vs less”

My recommendation:

Test with a stronger model first, ideally a cloud one, just to see what OpenClaw feels like when the model is not the bottleneck
If that suddenly feels useful, then decide whether you want to optimize for local, cheap, or fast
If you want current model suggestions, ask in #models and check:
- https://docs.openclaw.ai/concepts/models
- https://docs.openclaw.ai/concepts/models#model-discussions-community-notes

#

For an MBP test run, that’s the cleanest path.
Use the best model you can access first, prove the workflow is useful, then optimize the setup later.

#

No stress, you’re not blowing me up 😄

Honestly, keeping it in the thread is better if you’re okay with that, because other people are clearly hitting the same issue too.

slow geyser Apr 23, 2026, 7:59 AM

#

fair

digital lance Apr 23, 2026, 7:59 AM

#

slow geyser Try the new Qwen3.6, it's working great for me.

Output pricing is nearly 3x, but I'll try it, thanks!

slow geyser Apr 23, 2026, 8:00 AM

#

dusk marsh @slow geyser no worries at all. Thanks for helping out ! So you think I...

How much RAM does your MacBook have, and what chip?

deft socketBOT Apr 23, 2026, 8:00 AM

#

Yeah, that’s the annoying tradeoff.

But for agent work, a model that actually finishes the task in one go is often cheaper in practice than a cheaper one that burns extra turns doing “theater of action.”
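
To put rough numbers on that, here is a back-of-the-envelope sketch. Every price and token count below is made up for illustration; only the "nearly 3x output pricing" ratio comes from this thread. It also ignores input tokens, which grow with every extra turn and would make the gap larger:

```python
def cost_per_task(price_per_million_tokens: float, tokens_per_turn: int, turns: int) -> float:
    """Output-token cost of finishing one task, in the same currency as the price."""
    return price_per_million_tokens * tokens_per_turn * turns / 1_000_000


def break_even_turns(cheap_price: float, strong_price: float) -> float:
    """Turns the cheap model may use per single strong-model turn before it costs more."""
    return strong_price / cheap_price


# Hypothetical figures: the cheap model needs 4 turns, the strong one needs 1.
cheap = cost_per_task(price_per_million_tokens=1.0, tokens_per_turn=800, turns=4)
strong = cost_per_task(price_per_million_tokens=3.0, tokens_per_turn=800, turns=1)

print("cheap model cost per task:", cheap)
print("strong model cost per task:", strong)
print("strong model is cheaper per task:", strong < cheap)
```

With a 3x price ratio, the cheaper model only wins if it finishes in fewer than three times the turns.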

dusk marsh Apr 23, 2026, 8:08 AM

#

slow geyser How much RAM does your MacBook have, and what chip?

M1 Pro 16GB
