#WSL Ollama and OpenClaw impossible to run local agents


small arrow
#

I am getting warnings, and my local subagent doesn't work well when it runs locally. When it uses the cloud fallback it works, but no matter which model I set up in Ollama, it won't work.

10:50:04+00:00 warn agent/embedded {"subsystem":"agent/embedded"} {"runId":"d961822e-3e4e-4e32-a8d8-d0cee75b097b","provider":"ollama","model":"qwen2.5-coder:14b","pattern":"json_tool_call","toolName":"edit","registeredTool":true} Assistant reply looks like a tool call, but no structured tool invocation was emitted; treating it as text.

rancid pivotBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

misty ibexBOT
#

That warning is pretty specific: OpenClaw registered the edit tool, but Ollama returned something that merely looked like a JSON tool call in text instead of emitting a real structured tool invocation. So the agent can’t safely execute it and treats it as text.
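
To make the distinction concrete, here is a minimal Python sketch of the two reply shapes. It assumes the message layout of Ollama's native `/api/chat` response, where a real invocation arrives in `message.tool_calls` and plain text in `message.content`. The helper name `classify_reply` and the "looks like a tool call" heuristic are illustrative assumptions, not OpenClaw's actual detection code.

```python
import json


def classify_reply(message: dict) -> str:
    """Classify an assistant message from an Ollama-style chat response.

    Hypothetical helper for illustration; not OpenClaw's real logic.
    """
    # A real structured invocation is carried in the tool_calls field.
    if message.get("tool_calls"):
        return "structured_tool_call"

    content = (message.get("content") or "").strip()
    try:
        parsed = json.loads(content)
    except json.JSONDecodeError:
        return "text"

    # JSON in the text body that merely resembles a call: a name plus arguments.
    if isinstance(parsed, dict) and "name" in parsed and "arguments" in parsed:
        return "text_resembling_tool_call"
    return "text"


# Sample messages (made up for this sketch).
real = {"content": "", "tool_calls": [
    {"function": {"name": "edit", "arguments": {"path": "a.txt"}}}]}
fake = {"content": '{"name": "edit", "arguments": {"path": "a.txt"}}'}

print(classify_reply(real))
print(classify_reply(fake))
```

Only the first shape is safe to execute. The second is what the warning describes: JSON that sits in the text body, so it gets treated as text.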

The two most common causes are:

  1. Ollama is configured through the OpenAI-compatible /v1 endpoint. For OpenClaw tool calling, use the native Ollama API URL, e.g. http://127.0.0.1:11434 — no /v1 — and preferably set api: "ollama".
  2. The local model just isn’t reliable enough at OpenClaw-style tool calling. qwen2.5-coder:14b can chat/code, but local 14B-class models often fumble structured tools, especially for subagents/file edits. Cloud fallback working is a strong sign this is model/tool-call reliability, not your task itself.

First thing: what does openclaw --version show?

Then I’d test in this order:

openclaw models status
openclaw models list --provider ollama
openclaw infer model run --local --model ollama/qwen2.5-coder:14b --prompt "Reply with exactly: pong" --json
openclaw infer model run --gateway --model ollama/qwen2.5-coder:14b --prompt "Reply with exactly: pong" --json

If those pass but real subagent runs fail, try local-model lean mode:

{
  agents: {
    defaults: {
      experimental: {
        localModelLean: true
      }
    }
  }
}

If it still emits fake tool JSON, either use a stronger tool-capable local model / larger context, or mark that model as not tool-capable with compat.supportsTools: false and keep tool-heavy subagents on a cloud fallback.

Relevant docs:

small arrow
#

bc@pc:/mnt/c/Users/BC$ ollama list
NAME ID SIZE MODIFIED
qwen3-coder:30b 06c1097efce0 18 GB 15 hours ago
qwen2.5-coder:14b 9ec8897f747e 9.0 GB 18 hours ago
qwen2.5:14b 7cdf5a0187d5 9.0 GB 18 hours ago
qwen2.5:7b 845dbda0ea48 4.7 GB 18 hours ago
qwen2.5-coder:7b dae161e27b0e 4.7 GB 19 hours ago

Maybe I can try other models that are 100% suitable for OpenClaw?

misty ibexBOT
#

There isn’t a “100% suitable with OpenClaw” Ollama model, sadly. The hard part is not coding ability — it’s reliable structured tool calling. Local models can look smart and still emit fake JSON tool calls like your warning shows.

From the models you already have, I’d try them in this order:

  1. ollama/qwen3-coder:30b — best shot from your list, because it’s larger/newer.
  2. ollama/qwen2.5-coder:14b — usable for coding text, but the warning suggests it’s shaky for tools in your setup.
  3. Skip the 7B models for tool-heavy subagents; they’re very likely to be brittle.

But first, please confirm openclaw --version — version mismatch matters a lot here.

Also make sure your Ollama config is native API, not /v1:

models: {
  providers: {
    ollama: {
      baseUrl: "http://127.0.0.1:11434",
      apiKey: "ollama-local",
      api: "ollama",
      timeoutSeconds: 300
    }
  }
}

Then test qwen3-coder:30b with:

openclaw infer model run --local --model ollama/qwen3-coder:30b --prompt "Reply with exactly: pong" --json
openclaw infer model run --gateway --model ollama/qwen3-coder:30b --prompt "Reply with exactly: pong" --json

If simple inference works but subagents still produce fake tool calls, enable localModelLean. If even then it fails, I’d keep local Ollama for chat/summarization and use cloud fallback for tool-heavy subagents. That’s not you doing anything wrong — local tool-calling is just still spiky. 🦞

Docs: https://docs.openclaw.ai/providers/ollama and https://docs.openclaw.ai/gateway/local-models

For fresh “which local model is best this week?” advice, #models is the better place — people there will have more current hands-on results.

small arrow
#

I am running this on my desktop computer in WSL: 64GB RAM, AMD Ryzen 7 5800X, NVIDIA GeForce RTX 3060

misty ibexBOT
#

That machine is decent, but the RTX 3060 is probably the limiting factor here — especially if it’s the 12GB VRAM model. Your qwen3-coder:30b download is 18GB, so it likely won’t fully fit in VRAM. Ollama may offload part of it to system RAM/CPU, which can make it slow and much worse at long/tool-heavy OpenClaw turns.
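
As a rough back-of-the-envelope check, the sketch below estimates how much of a model has to live outside VRAM. It treats the download size as a lower bound on memory needed and takes the 12GB card as an assumption. The function name `offload_estimate` and the `overhead_gb` parameter (standing in for KV cache and runtime buffers) are hypothetical, and real usage depends on quantization and context length.

```python
def offload_estimate(model_gb: float, vram_gb: float, overhead_gb: float = 0.0):
    """Return (spill_gb, spill_fraction) for a model that may not fit in VRAM.

    overhead_gb stands in for KV cache and runtime buffers; its size depends
    on context length and is an assumption the caller must supply.
    """
    needed = model_gb + overhead_gb
    spill = max(0.0, needed - vram_gb)
    return spill, (spill / needed if needed else 0.0)


# qwen3-coder:30b is listed at 18 GB; a 12 GB RTX 3060 is assumed here.
spill, fraction = offload_estimate(model_gb=18, vram_gb=12)
print(f"at least {spill:.0f} GB ({fraction:.0%}) must live outside VRAM")
```

Even with zero overhead, about a third of the 18GB model cannot sit on a 12GB card, which is why the 9.0GB 14B models are the more realistic candidates for running fully on the GPU.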

For OpenClaw subagents, the painful combo is: large prompt + tools schema + transcript/context + local model. Even if the model can answer normal prompts, it may fail structured tool calls under that load.

I’d check these first:

openclaw --version
nvidia-smi
ollama ps

ollama ps is useful because it shows whether the model is mostly on GPU or split CPU/GPU. If qwen3-coder:30b is not fitting cleanly, try the best model that fits mostly/fully in VRAM, with a smaller context first.
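
If you want to read that split programmatically, here is a small sketch that turns the PROCESSOR column of `ollama ps` into a GPU percentage. It assumes the column uses values such as `100% GPU`, `100% CPU`, or `48%/52% CPU/GPU`. That format is an assumption based on commonly seen output and may differ between Ollama versions, and `gpu_share` is a hypothetical helper name.

```python
import re


def gpu_share(processor: str) -> int:
    """Return the percentage of the model resident on the GPU.

    Assumes PROCESSOR values like "100% GPU", "100% CPU" or "48%/52% CPU/GPU".
    """
    processor = processor.strip()

    # Split case: the first number is the CPU share, the second the GPU share.
    split = re.fullmatch(r"(\d+)%/(\d+)% CPU/GPU", processor)
    if split:
        return int(split.group(2))

    # Single-device case.
    single = re.fullmatch(r"(\d+)% (GPU|CPU)", processor)
    if single:
        percent = int(single.group(1))
        return percent if single.group(2) == "GPU" else 100 - percent

    raise ValueError(f"unrecognized PROCESSOR value: {processor!r}")


for value in ("100% GPU", "48%/52% CPU/GPU", "100% CPU"):
    print(value, "->", gpu_share(value))
```

Anything well below 100 suggests the model is being offloaded, which matches the slow, flaky tool-heavy turns described above.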

Also add explicit Ollama params for the model entry, something like:

params: {
  num_ctx: 8192,
  keep_alive: "15m",
  thinking: false
}

Then enable lean mode:

agents: {
  defaults: {
    experimental: {
      localModelLean: true
    }
  }
}

If your goal is “local subagent that edits files reliably”, I’d be honest: on a 3060, this may remain flaky. Local Ollama is better suited for chat/summarization/simple code suggestions; cloud fallback is still much more reliable for OpenClaw tool-heavy work.

If you paste openclaw --version, openclaw models status, and whether your Ollama baseUrl has /v1 in it, we can narrow this down without guessing.

small arrow
#

12:35:20+00:00 warn diagnostic {"subsystem":"diagnostic"} liveness warning: reasons=event_loop_delay interval=30s eventLoopDelayP99Ms=288.6 eventLoopDelayMaxMs=7658.8 eventLoopUtilization=0.542 cpuCoreRatio=0.63 active=1 waiting=0 queued=1 phase=channels.telegram.start-account recentPhases=sidecars.subagent-recovery:6ms,sidecars.main-session-recovery:9ms,post-attach.update-sentinel:0ms,sidecars.session-locks:43ms,sidecars.model-prewarm:1162ms,post-ready.maintenance:7ms work=[active=agent:main:dashboard:50da4520-3e3e-4d14-9ba3-fe84c7c8d4bc(processing,q=1,age=12s) queued=agent:main:dashboard:50da4520-3e3e-4d14-9ba3-fe84c7c8d4bc(processing,q=1,age=12s)]
12:35:42+00:00 warn memory {"subsystem":"memory"} memory embeddings rate limited; retrying in 573ms
12:35:43+00:00 warn memory {"subsystem":"memory"} memory embeddings rate limited; retrying in 1160ms
12:35:44+00:00 warn memory {"subsystem":"memory"} memory embeddings rate limited; retrying in 1157ms
12:35:44+00:00 warn memory {"subsystem":"memory"} memory embeddings rate limited; retrying in 1029ms
12:35:44+00:00 warn memory {"subsystem":"memory"} memory embeddings rate limited; retrying in 2350ms
12:35:45+00:00 warn memory {"subsystem":"memory"} memory embeddings rate limited; retrying in 2109ms
12:35:45+00:00 warn memory {"subsystem":"memory"} memory embeddings rate limited; retrying in 2245ms
12:35:46+00:00 warn memory {"subsystem":"memory"} memory embeddings rate limited; retrying in 2149ms
12:35:50+00:00 warn memory {"subsystem":"memory"} memory sync failed (search-bootstrap): Error: openai embeddings failed: 429 {
"error": {
"message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.",
"type": "insufficient_quota",
"param": null,
"code": "insufficient_quota"
}
}

misty ibexBOT
#

This log is a second, separate problem from the Ollama tool-call warning.

The important line is:

openai embeddings failed: 429 ... insufficient_quota

So OpenClaw is trying to use OpenAI for memory embeddings, and that OpenAI account/key has no quota. That can make startup/memory sync noisy and add load, but it’s not proof that Ollama itself is broken.
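
The log shows several backoff retries before the final failure, and it helps to see why waiting cannot fix this particular 429. The sketch below separates ordinary rate limiting from quota exhaustion using the `error.code` field that appears in the log. The helper name `is_retryable_429` is hypothetical, the `rate_limit_exceeded` body is an example value, and none of this is OpenClaw's own retry logic.

```python
import json


def is_retryable_429(status: int, body: str) -> bool:
    """Return True if a 429 response is worth retrying after a delay."""
    if status != 429:
        return False
    try:
        code = json.loads(body)["error"]["code"]
    except (ValueError, KeyError, TypeError):
        # Unknown body shape: treat it as ordinary rate limiting.
        return True
    # An exhausted quota does not recover by waiting a few seconds.
    return code != "insufficient_quota"


# Error body shaped like the one in the log above (message shortened).
quota_body = json.dumps({"error": {
    "message": "You exceeded your current quota",
    "type": "insufficient_quota",
    "param": None,
    "code": "insufficient_quota",
}})

print(is_retryable_429(429, quota_body))
print(is_retryable_429(429, '{"error": {"code": "rate_limit_exceeded"}}'))
```

The first call reports that retrying is pointless, so the practical fix is to add billing to the OpenAI account or move embeddings to a local provider.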

Since you’re trying to run local, switch memory embeddings to Ollama too:

ollama pull nomic-embed-text

Then configure memory search roughly like this:

{
  agents: {
    defaults: {
      memorySearch: {
        provider: "ollama",
        model: "nomic-embed-text",
        remote: {
          nonBatchConcurrency: 1
        }
      }
    }
  }
}

If your Ollama provider uses a custom/LAN URL, include it there too:

remote: {
  baseUrl: "http://127.0.0.1:11434",
  apiKey: "ollama-local",
  nonBatchConcurrency: 1
}

The event_loop_delay warning means the gateway’s Node process was getting stalled for a bit. In WSL + Ollama, that can happen when the machine is under CPU/RAM pressure, the model is partly offloaded to CPU, or startup jobs like Telegram + memory sync are all happening at once.

I’d do these in order:

  1. Fix embeddings so they stop hitting OpenAI quota.
  2. Use native Ollama URL, no /v1.
  3. Use qwen3-coder:30b only if ollama ps shows it’s not thrashing CPU/RAM; otherwise test qwen2.5-coder:14b with smaller context.
  4. Enable localModelLean for local agent runs.
  5. Keep cloud fallback for file-edit/tool-heavy subagents if local still emits fake tool JSON.
small arrow
#

OK, can you please summarize what you just said, and add something for this as well?

12:40:23+00:00 error [tools] web_search failed: DuckDuckGo returned a bot-detection challenge. raw_params={"query":"Business Central AL Item fields Unit Width Unit Height Unit Length Net Weight","count":5,"language":"en"}

misty ibexBOT
#

Yep — short version:

You have three separate issues/logs happening:

  1. Ollama tool-call warning
    Your local model is writing text that looks like a tool call, but it is not emitting a real structured tool call. OpenClaw refuses to execute that, correctly.

    Fix/check:

    • Use native Ollama URL: http://127.0.0.1:11434
    • Do not use /v1
    • Set api: "ollama"
    • Try qwen3-coder:30b first, then qwen2.5-coder:14b
    • Enable localModelLean
    • Use smaller num_ctx like 8192
    • If local still fakes tool JSON, keep cloud fallback for tool-heavy subagents.
  2. OpenAI embeddings quota error


This line means memory search is trying to use OpenAI embeddings:

openai embeddings failed: 429 insufficient_quota

Since you want local, switch memory embeddings to Ollama:

ollama pull nomic-embed-text

Config idea:

agents: {
  defaults: {
    memorySearch: {
      provider: "ollama",
      model: "nomic-embed-text",
      remote: {
        baseUrl: "http://127.0.0.1:11434",
        apiKey: "ollama-local",
        nonBatchConcurrency: 1
      }
    }
  }
}

  3. DuckDuckGo bot-detection error
    This one is unrelated to Ollama. It means OpenClaw used DuckDuckGo as the key-free web_search fallback, and DuckDuckGo blocked the automated request with a challenge.

    web_search failed: DuckDuckGo returned a bot-detection challenge

Best fix: configure a real search provider instead of DuckDuckGo. I’d recommend Brave or another API-backed provider.

Example:

tools: {
  web: {
    search: {
      enabled: true,
      provider: "brave"
    }
  }
},
plugins: {
  entries: {
    brave: {
      config: {
        webSearch: {
          apiKey: "YOUR_BRAVE_API_KEY"
        }
      }
    }
  }
}

Or use env var:

small arrow
#

how can I check this?

Use native Ollama URL: http://127.0.0.1:11434
Do not use /v1

How do I check whether the Ollama base URL in WSL is baseUrl: "http://127.0.0.1:11434"?

for brave:

  • tools.web.search.provider: web_search provider is not available: brave (install or enable plugin "brave", then run openclaw doctor --fix)
  • plugins.entries.brave: plugin not installed: brave — install the official external plugin with: openclaw plugins install @openclaw/brave-plugin
misty ibexBOT
#

Yep — you can check both from WSL with the OpenClaw config CLI.

First, find the active config file:

openclaw config file

Then check the Ollama provider config:

openclaw config get models.providers.ollama --json
openclaw config get models.providers.ollama.baseUrl
openclaw config get models.providers.ollama.api

What you want to see is:

baseUrl = http://127.0.0.1:11434
api = ollama

Bad / suspicious:

http://127.0.0.1:11434/v1
http://localhost:11434/v1
api = openai
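
The good and bad values above can also be told apart mechanically. This is a minimal sketch that flags a base URL ending in `/v1`. The function name `check_ollama_base_url` is hypothetical and it only inspects the string; it does not contact Ollama or read the OpenClaw config.

```python
from urllib.parse import urlparse


def check_ollama_base_url(base_url: str) -> list[str]:
    """Return a list of problems found in an Ollama base URL (empty if none)."""
    problems = []
    parsed = urlparse(base_url)
    if parsed.scheme not in ("http", "https"):
        problems.append("missing http:// or https:// scheme")
    # A trailing /v1 selects the OpenAI-compatible endpoint, not the native API.
    if parsed.path.rstrip("/").endswith("/v1"):
        problems.append("ends in /v1 (OpenAI-compatible endpoint)")
    return problems


for url in ("http://127.0.0.1:11434", "http://localhost:11434/v1"):
    print(url, "->", check_ollama_base_url(url) or "looks native")
```

You could paste in the value printed by the `config get` command for `baseUrl` to double-check it by eye or by script.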

Also verify Ollama itself is reachable from WSL:

curl http://127.0.0.1:11434/api/tags

If that works, OpenClaw should use the same base URL.

To set/fix it:

openclaw config set models.providers.ollama.baseUrl "http://127.0.0.1:11434"
openclaw config set models.providers.ollama.api "ollama"
openclaw config set models.providers.ollama.apiKey "ollama-local"
openclaw gateway restart

For Brave: your error means the Brave search plugin is not installed yet. Install it exactly like the error says:

openclaw plugins install @openclaw/brave-plugin
openclaw doctor --fix

Then configure the key/provider:

openclaw config set tools.web.search.provider "brave"
openclaw config set plugins.entries.brave.config.webSearch.apiKey "YOUR_BRAVE_API_KEY"
openclaw gateway restart

Then verify:

openclaw config get tools.web.search.provider
openclaw config get plugins.entries.brave.config.webSearch.apiKey
openclaw status --all

And please still check:

openclaw --version

Version matters a lot for this specific Ollama/tool-calling path.

small arrow
#

13:40:28+00:00 warn plugins {"subsystem":"plugins"} [plugins] plugins.allow is empty; discovered non-bundled plugins may auto-load: brave (/home/bc/.openclaw/npm/node_modules/@openclaw/brave-plugin/dist/index.js). Set plugins.allow to explicit trusted ids.

misty ibexBOT
#

That warning is not fatal. It means Brave is installed and discoverable, but your config has no explicit plugin trust list, so OpenClaw is warning that non-bundled plugins may auto-load.

Since you intentionally installed Brave, add it to plugins.allow:

openclaw config set plugins.allow '["brave"]' --strict-json
openclaw doctor --fix
openclaw gateway restart

Then verify:

openclaw config get plugins.allow --json
openclaw plugins list
openclaw config validate

Important caveat: plugins.allow is exclusive. If you later rely on other plugin-owned features and something disappears, you may need to add those plugin IDs too. For your current Brave web search case, brave is the one that matters.

After that, this should be valid:

openclaw config set tools.web.search.provider "brave"

Docs: https://docs.openclaw.ai/tools/plugin

small arrow
#

14:05:34+00:00 warn diagnostic {"subsystem":"diagnostic"} long-running session: sessionId=d7a3f8a8-ce09-4fb8-acc2-aa88c03af89f sessionKey=agent:main:dashboard:2364a078-35cf-4b63-82f2-d4e8d09df231 state=processing age=130s queueDepth=1 reason=queued_behind_active_work classification=long_running activeWorkKind=model_call lastProgress=model_call:started lastProgressAge=9s recovery=none

misty ibexBOT
#

That warning means the Gateway has one active dashboard session stuck/busy in a model call for ~130s, and another request is queued behind it.

Key parts:

state=processing
age=130s
queueDepth=1
reason=queued_behind_active_work
activeWorkKind=model_call
lastProgress=model_call:started

So this is not a plugin/config validation error. It usually means the local Ollama model call is taking a long time or not streaming progress back quickly enough. With WSL + RTX 3060 + larger Qwen models, this is very plausible, especially if the model is partly CPU/RAM offloaded.
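
If you want to watch for these warnings yourself, the `key=value` fields are easy to pull apart. The sketch below parses an abridged copy of the log line above. The helper name `parse_diagnostic` is hypothetical, and the format is inferred from this one line rather than from any documented OpenClaw log schema.

```python
def parse_diagnostic(line: str) -> dict[str, str]:
    """Split a diagnostic log line into its key=value fields."""
    return dict(tok.split("=", 1) for tok in line.split() if "=" in tok)


# Abridged copy of the warning above (sessionKey and classification omitted).
line = ("long-running session: sessionId=d7a3f8a8-ce09-4fb8-acc2-aa88c03af89f "
        "state=processing age=130s queueDepth=1 "
        "reason=queued_behind_active_work activeWorkKind=model_call "
        "lastProgress=model_call:started lastProgressAge=9s recovery=none")

fields = parse_diagnostic(line)
age_seconds = int(fields["age"].rstrip("s"))
print(fields["activeWorkKind"], age_seconds, fields["queueDepth"])
```

A model call that is already 130 seconds old with a queue behind it points at model speed rather than at config.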

Check what Ollama is doing:

ollama ps
nvidia-smi

If you see qwen3-coder:30b loaded and GPU/CPU is pegged, that’s likely the cause. For debugging, stop the current stuck run/session from the UI if possible, then try a tiny direct test:

openclaw infer model run --gateway --model ollama/qwen2.5-coder:14b --prompt "Reply with exactly: pong" --json

If even that is slow/hangs, it’s Ollama/WSL/model performance. If that works but full agents hang, reduce the agent load:

agents: {
  defaults: {
    experimental: { localModelLean: true }
  }
}

And set smaller Ollama params for the local model:

params: {
  num_ctx: 8192,
  keep_alive: "15m",
  thinking: false
}

Practical recommendation: use local Ollama for simple chat/summarization first, and keep cloud fallback for tool-heavy subagents until the local model can pass small gateway tests reliably.

small arrow
#

How can I check the models' usage quota from OpenClaw?

misty ibexBOT
#

Use this:

openclaw status --usage

That’s the main CLI command for provider usage/quota. In chat you can also run:

/status

and for local session cost/tokens:

/usage full