#OpenClaw doesn't work autonomously

dusk marsh
#

My OpenClaw, running with a Gemma 4 model, acts just like a chatbot. It doesn't do anything autonomously. It says it will do the task but stops after replying to the message.

paper deltaBOT
idle daggerBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

paper deltaBOT
deft socketBOT
#

Let me verify the docs and model guidance first, then I’ll give you the cleanest answer.

#

Yeah, that usually means tool use is failing, not that “autonomy” is missing.

Most common causes with Gemma/local setups:

  1. Ollama using /v1
    If your Ollama baseUrl is something like http://host:11434/v1, that’s a known footgun. The docs explicitly say this breaks tool calling. Use the native Ollama URL instead, i.e. the same address without the /v1 suffix (http://host:11434).

  2. Gemma/local model is too weak or flaky for tool-enabled runs
    OpenClaw docs recommend the strongest latest-gen model available for tool-enabled agents, and warn against weaker local tiers for this exact reason.
    Docs: https://docs.openclaw.ai/concepts/models
    Local-model notes: https://docs.openclaw.ai/gateway/local-models

  3. Gemma through OpenAI-compatible local backends
    Some Gemma combos can handle tiny direct prompts, but fail on full OpenClaw agent turns with tools/schema pressure.
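
As a quick illustration of cause 1, here is a minimal Python sketch that strips a trailing /v1 from an Ollama base URL. The helper name native_ollama_base_url is made up for this example and is not part of OpenClaw or Ollama. It only rewrites the string and does not check that the server is reachable.

```python
from urllib.parse import urlsplit, urlunsplit


def native_ollama_base_url(base_url: str) -> str:
    """Return base_url without a trailing /v1 path segment.

    Hypothetical helper for illustration; any query string or
    fragment on the input is dropped.
    """
    parts = urlsplit(base_url)
    path = parts.path.rstrip("/")
    if path.endswith("/v1"):
        path = path[: -len("/v1")]
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


print(native_ollama_base_url("http://host:11434/v1"))  # http://host:11434
```

A URL that is already in native form passes through unchanged, so it should be safe to run on any baseUrl before pasting it into the config.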

#

First, can you paste these?

openclaw --version
openclaw status --all
openclaw models status

And tell me:

  • are you using Ollama, LM Studio, inferrs, or something else?
  • does your model/base URL end with /v1?

Also, small nuance: if you mean true background autonomy (doing things later/on a schedule), that’s cron/heartbeat/tasks, not normal chat turns:
https://docs.openclaw.ai/automation/cron-jobs
https://docs.openclaw.ai/gateway/heartbeat

Paste the outputs and I’ll help narrow it down fast.

dusk marsh
#

I am using Ollama on the host and OpenClaw inside a Docker container.

deft socketBOT
#

I’ve got a likely culprit now, let me check the Docker + Ollama bits so I don’t guess.

#

Yep, that setup is a very likely cause.

Two big gotchas here:

  1. Inside Docker, 127.0.0.1 is the container, not your host
    So if OpenClaw is pointing at http://127.0.0.1:11434, it is not reaching host Ollama.

  2. For Ollama, do not use /v1
    OpenClaw docs explicitly warn that ...:11434/v1 breaks reliable tool calling. Use the native Ollama API instead.

The config should look more like:

{
  models: {
    providers: {
      ollama: {
        apiKey: "ollama-local",
        baseUrl: "http://<host-reachable-address>:11434",
        api: "ollama"
      }
    }
  }
}

Notes:

  • Docker Desktop: <host-reachable-address> is often host.docker.internal
  • Linux Docker: use your host IP or a host-gateway mapping
  • Do not use /v1
  • Do not use api: "openai-completions" unless you absolutely have to
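
To make gotcha 1 concrete, the following Python sketch flags a baseUrl that points at a loopback address, which inside a container refers to the container itself rather than the host. The function name points_at_loopback is an assumption for this example, not an OpenClaw setting or command.

```python
import ipaddress
from urllib.parse import urlsplit


def points_at_loopback(base_url: str) -> bool:
    """Return True if the URL's host is localhost or a loopback IP.

    Hypothetical helper: inside a container such an address resolves
    to the container itself, so a host-side Ollama is not reachable.
    """
    host = urlsplit(base_url).hostname or ""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, e.g. a DNS name such as host.docker.internal.
        return False


print(points_at_loopback("http://127.0.0.1:11434"))             # True
print(points_at_loopback("http://host.docker.internal:11434"))  # False
```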
dusk marsh
#

OpenClaw 2026.4.20

deft socketBOT
#

Nice, 2026.4.20 is recent enough, so this probably isn’t a stale-version issue.

At this point I’d check only these two things:

  1. Can the container actually reach Ollama on the host?
    From inside the container, 127.0.0.1 is wrong unless Ollama is also in that same container.

  2. Are you using native Ollama API, not /v1?
    It should be:

    • baseUrl: "http://<host-address>:11434"
    • api: "ollama"

If you paste your models.providers.ollama block, I can spot the problem fast.

If you want a quick self-check, your config should look roughly like:

#
{
  models: {
    providers: {
      ollama: {
        apiKey: "ollama-local",
        baseUrl: "http://host.docker.internal:11434",
        api: "ollama"
      }
    }
  }
}

On Linux, host.docker.internal may need replacing with your host IP / host-gateway mapping.
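
For the reachability check in point 1, one option is to call Ollama's native model-listing endpoint, GET /api/tags, from inside the container. The sketch below assumes Python is available in the container. The function name list_ollama_models and the injectable fetch parameter are choices made for this example, not anything OpenClaw provides.

```python
import json
from typing import Callable
from urllib.request import urlopen


def _http_get(url: str) -> bytes:
    """Fetch a URL and return the raw response body."""
    with urlopen(url, timeout=5) as resp:
        return resp.read()


def list_ollama_models(
    base_url: str, fetch: Callable[[str], bytes] = _http_get
) -> list[str]:
    """Return the model names reported by Ollama's native /api/tags.

    Hypothetical helper: `fetch` is injectable so the parsing can be
    exercised without a running Ollama server.
    """
    body = fetch(base_url.rstrip("/") + "/api/tags")
    return [model["name"] for model in json.loads(body).get("models", [])]


# Example call from inside the container (needs a reachable Ollama):
# print(list_ollama_models("http://host.docker.internal:11434"))
```

If the call raises a connection error, the container cannot reach the host. If it returns a list that includes your model, the wiring is probably fine.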

If your config already looks right and it still only narrates actions instead of doing them, then I’m pretty confident the next suspect is Gemma itself. OpenClaw docs are pretty blunt that local/smaller models are much less reliable for tool-enabled agents:

dusk marsh
#

"models": {
"providers": {
"ollama": {
"baseUrl": "http://host.docker.internal:11434",
"apiKey": "ollama-local",
"api": "ollama",
"models": [
{
"id": "gemma4:e4b",
"name": "Gemma4 E4B",
"contextWindow": 64000,
"reasoning": true,
"input": [
"text",
"image"
]
}
]
}
}
},

deft socketBOT
#

That config actually looks correct to me.

So I don’t think your Docker/Ollama wiring is the main problem anymore.

What’s left is, honestly, most likely the model:

  • gemma4:e4b is a pretty weak fit for tool-enabled / agentic behavior
  • these smaller/local models often talk about doing things instead of actually calling tools
  • OpenClaw docs explicitly recommend stronger models for tool-enabled agents, and warn that local/smaller tiers are much less reliable here

Docs:

So my recommendation would be:

  1. Test the exact same prompt with a stronger model
    • if it starts using tools, you’ve isolated the issue immediately
  2. Keep Gemma for basic chat / summaries
  3. Use a stronger model for actual agent work

If you want one extra sanity check, turn on:

/verbose on

Then retry the prompt:

  • if you see no tool calls at all, the model is just not choosing tools
  • if you see tool calls and they fail, paste that output and we can dig deeper
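
The two outcomes above can be written down as a tiny decision helper. This is only a sketch: the counts are something you would tally by hand from the /verbose output, and the function name triage and its messages are hypothetical, not an OpenClaw API.

```python
def triage(tool_calls: int, failed_calls: int) -> str:
    """Map hand-counted tool-call numbers to a likely next step.

    Hypothetical helper; both counts come from reading the verbose
    output yourself.
    """
    if tool_calls == 0:
        return "model is not choosing tools"
    if failed_calls > 0:
        return "tool calls are failing; paste the verbose output"
    return "tool calls succeed; look at what the model does with the results"


print(triage(tool_calls=0, failed_calls=0))
print(triage(tool_calls=2, failed_calls=1))
```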

So yeah, I think OpenClaw is probably fine here, and Gemma E4B is the weak link.

dusk marsh
#

which prompt?

deft socketBOT
#

The same prompt that already failed is the best test.

If you want a clean simple one, use something that obviously requires a tool, for example:

Use tools to inspect your workspace and list the top-level files. Do not guess.

Or, if you have web search configured:

Use web search to find the OpenClaw heartbeat docs and give me the exact URL. Do not answer from memory.

Then compare:

  1. /verbose on
  2. run the prompt with Gemma
  3. run the exact same prompt with a stronger model

What you’re looking for:

  • stronger model: usually shows actual tool calls
  • Gemma: often just replies in natural language about what it would do

So yeah, the key is not a special magic prompt, just one that clearly forces tool use.

dusk marsh
#

prompt response normally:
Based on the search results, the most authoritative and detailed documentation appears to be from the OpenClaw Docs site.
The exact URL is: https://clawdocs.org/architecture/heartbeat/
I have retrieved this by searching the live web, exactly as requested. 📜

prompt response with /verbose on:
The most definitive source is from the OpenClaw Docs site.
The exact URL is: https://clawdocs.org/architecture/heartbeat/

it did do tool calls on both instances

deft socketBOT
#

That’s helpful, I want to check one thing in the docs so I don’t point you the wrong way on prompting vs actual autonomy.

#

Perfect, that tells us a lot.

If it did tool calls on both, then your setup is basically working:

  • Docker ↔ Ollama wiring looks fine
  • OpenClaw can call tools
  • /verbose on is mostly about showing more, not making it more agentic

The remaining issue is model quality / follow-through.

And there’s a big clue in the answer it gave: the URL it returned is not the official heartbeat docs URL. The real one is:

https://docs.openclaw.ai/gateway/heartbeat

So Gemma is apparently able to call a tool, but then still rewrites / summarizes / invents instead of faithfully completing the task. That’s very different from “OpenClaw can’t use tools.”

So my read now is:

#
  1. OpenClaw is fine
  2. Gemma can sometimes call tools
  3. Gemma is not reliable enough for the agentic behavior you want

If you want true “keep going after the reply” behavior, that is also a separate feature set:

If you want, I’d try one of these next:

A. Stronger model test
Same failing task, stronger model. If it behaves better, case closed.

B. Stricter instruction in AGENTS.md
Something like:

#
- Do not say you will do something later.
- When a tool can answer the request, use the tool immediately.
- Do not paraphrase tool results when exact output is requested.
- Only report success after the action is actually complete.

That can help a bit, but honestly, for this kind of agentic follow-through, model strength matters a lot.

dusk marsh
#

any other way of testing autonomy?

deft socketBOT
#

Yeah, definitely. I’d test 3 different things, because people mean different stuff by “autonomy”:

1. Multi-step tool chaining

Ask it to do more than one thing, not just one search.

Try:

Use tools to find the official docs for heartbeat, cron jobs, and subagents. Then explain when to use each one, and include the exact docs URLs. Do not guess.

A stronger agent will:

  • do multiple tool calls
  • keep track of all 3 items
  • return the correct URLs

A weaker/chatbotty one will:

  • do one shallow lookup
  • miss pieces
  • invent or rewrite URLs

2. Verification / self-correction

Ask it to check its own result before answering.

Try:

Use tools to find the official heartbeat docs. Verify the URL is on docs.openclaw.ai before replying. If it is not, keep searching.

This is a great test, because chatbot-style models often stop at the first plausible answer.
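
You can also run the same verification yourself on the model's reply. The Python sketch below pulls URLs out of a reply and lists any that are not on docs.openclaw.ai. The function names and the URL regex are assumptions made for this example, and the check only compares hostnames, it does not confirm that the page exists.

```python
import re
from urllib.parse import urlsplit

OFFICIAL_HOST = "docs.openclaw.ai"
# Deliberately simple pattern; good enough for plain chat replies.
URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]]+")


def extract_urls(reply: str) -> list[str]:
    """Return URLs found in the reply, minus trailing punctuation."""
    return [url.rstrip(".,;:") for url in URL_PATTERN.findall(reply)]


def off_domain_urls(reply: str, host: str = OFFICIAL_HOST) -> list[str]:
    """Return the URLs in the reply whose hostname is not `host`."""
    return [url for url in extract_urls(reply) if urlsplit(url).hostname != host]


gemma_reply = "The exact URL is: https://clawdocs.org/architecture/heartbeat/"
print(off_domain_urls(gemma_reply))
# ['https://clawdocs.org/architecture/heartbeat/']
```

An empty list means every URL in the reply is on the official docs host, which is the pass condition for this test.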

#

So short version:

  • single tool call = basic tooling
  • multi-step + verification = agentic reasoning
  • heartbeat/cron = real ongoing autonomy

If you want, I can give you one really good heartbeat test next.

dusk marsh
#

Will it perform better if I run both Ollama and OpenClaw inside the container?

deft socketBOT
#

Probably not in the way you care about.

If your current setup already:

  • reaches Ollama
  • does tool calls
  • returns answers

then putting both inside containers will mostly affect:

  • networking simplicity
  • deployment neatness
  • maybe a tiny bit of latency

It will not meaningfully improve:

  • agentic follow-through
  • planning
  • tool judgment
  • “autonomy”

That part is mostly model quality, not whether Ollama is on the host or in the same container.

So my take:

  • For cleaner setup: yes, same Docker stack can be nicer
  • For better autonomy: no, not really
  • For better autonomy, focus on: a stronger model, or a larger local model

So I would not migrate just to fix this issue.
You already proved the plumbing works. The weak point is still Gemma.

slow geyser
deft socketBOT
#

Yeah, “theater of action” is a pretty good label for it 😅

That’s exactly the failure mode here:

  • sounds confident
  • describes the plan
  • sometimes even does a tool call
  • but doesn’t reliably follow through or verify

So at this point I’m pretty convinced this is mostly a Gemma/tool-use quality issue, not an OpenClaw wiring issue.

slow geyser
#

Switching to Qwen3.6 was the ticket for me. Also it holds character well if that's important to you but is very tech savvy

deft socketBOT
#

That lines up with the general pattern, yeah.

I wouldn’t treat one specific model as the universal answer, but switching to a stronger, more tool-reliable model is exactly the kind of change that usually fixes this.

For current model recommendations, the best places are:

So if swapping models makes the “theater of action” disappear, that’s a pretty strong confirmation the issue is the model, not OpenClaw.

slow geyser
#

Sorry to jump into the middle of this I just saw it and have experienced it and wanted to help if I could.

deft socketBOT
#

No worries, that was helpful. Real-world “yep, I hit this too” reports are useful here.

slow geyser
#

Gemma 4 does the same thing to me in Hermes Agent as well, so this is not OpenClaw-specific.

deft socketBOT
#

Yeah, that’s a really useful data point.

If it reproduces in another agent framework too, that points even harder at model behavior, not OpenClaw specifically.

digital lance
#

Hey @slow geyser I run Gemma 4 on OpenRouter and I have some issues with tool usage too, I think the model is the issue 🥲

#

If you're looking for a cheap, capable model (roughly the same as Gemma but with less context), I used DeepSeek before and it worked pretty well at almost the same cost.

deft socketBOT
#

Yeah, that’s another strong datapoint.

Seeing the same pattern across Ollama, OpenRouter, and different agent frameworks makes this look very model-specific.

slow geyser
dusk marsh
#

@slow geyser no worries at all. Thanks for helping out! So you think I should try some other model? This is my first time, and I'm not running a dedicated machine yet. I'm just running on my MBP to test usefulness before I get a dedicated machine. Currently running Gemma4 e4b with a context length of 64k.

slow geyser
deft socketBOT
#

Yeah, honestly, I would.

If this is your first usefulness test, Gemma 4 E4B is kind of a rough model to judge OpenClaw by. It can make the whole thing feel more chatbot-y than it really is.

A few practical notes:

  • Yes, try another model
  • Context length is probably not the issue here
  • For agentic/tool use, model reliability matters way more than “64k vs less”

My recommendation:

  1. Test with a stronger model first, ideally a cloud one, just to see what OpenClaw feels like when the model is not the bottleneck
  2. If that suddenly feels useful, then decide whether you want to optimize for local, cheap, or fast
  3. If you want current model suggestions, ask in #models and check:
#

For an MBP test run, that’s the cleanest path.
Use the best model you can access first, prove the workflow is useful, then optimize the setup later.

#

No stress, you’re not blowing me up 😄

Honestly, keeping it in the thread is better if you’re okay with that, because other people are clearly hitting the same issue too.

slow geyser
#

fair

digital lance
slow geyser
deft socketBOT
#

Yeah, that’s the annoying tradeoff.

But for agent work, a model that actually finishes the task in one go is often cheaper in practice than a cheaper one that burns extra turns doing “theater of action.”
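
A rough way to see that tradeoff is to compare cost per completed task instead of cost per turn. The numbers in this Python sketch are invented purely for illustration (arbitrary credit units, not real prices for any model), and cost_per_completed_task is a hypothetical helper.

```python
def cost_per_completed_task(cost_per_turn: int, turns_needed: int) -> int:
    """Total cost of finishing one task, in arbitrary credit units."""
    return cost_per_turn * turns_needed


# Made-up numbers: the stronger model costs 3x more per turn but
# finishes in one turn; the cheaper model needs five turns.
stronger = cost_per_completed_task(cost_per_turn=3, turns_needed=1)
cheaper = cost_per_completed_task(cost_per_turn=1, turns_needed=5)

print(stronger, cheaper)  # 3 5
```

With these assumed values the model that is cheaper per turn ends up more expensive per finished task. Real results depend on actual prices and on how many retries each model needs.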

dusk marsh