#Is OpenClaw on Windows native officially supported? I'm hitting an issue where /v1/models and /v1/c


wind pebble
#

OpenClaw's HTTP server is returning HTML (UI pages) on all API endpoints instead of JSON responses. This breaks programmatic access to the OpenClaw Gateway via standard OpenAI-compatible endpoints (/v1/models, /v1/chat/completions, etc.), making it impossible to integrate OpenClaw with external agents, applications, or automation tools.
The root cause appears to be a catch-all route serving the UI that intercepts and overrides API endpoint handlers.

Root Cause Analysis
Hypothesis: OpenClaw's routing configuration has a catch-all route that serves the React UI, and this route is executing before or instead of the API endpoint handlers.
Evidence

Consistent behavior: ALL API endpoints return HTML, suggesting a single catch-all route
GET requests work for UI: Browser requests to / receive the HTML UI correctly
No 404 errors: Requests to invalid endpoints don't return 404, they return the UI
Timing: Routes appear to be evaluated in a single pass, not cascading

Likely Location in Code
Based on typical Express.js/Fastify patterns, the issue is likely in one of these files:

dist/gateway-*.js (main HTTP server)
dist/router-*.js (routing configuration)
dist/api-*.js (API endpoint definitions)

silver kelpBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

civic rainBOT
#

I checked the current docs/source. Your catch-all theory is close, but there’s an important config gate:

  • OpenAI-compatible Gateway HTTP endpoints are disabled by default.
  • Enable with:
openclaw config set gateway.http.endpoints.chatCompletions.enabled true --strict-json
openclaw gateway restart

Then test:

curl -i http://127.0.0.1:18789/v1/models \
  -H "Authorization: Bearer YOUR_TOKEN"

Expected: JSON listing openclaw, openclaw/default, etc.

Docs: https://docs.openclaw.ai/gateway/openai-http-api

One nuance: if the compat endpoint is disabled, GET /v1/models can look like the Control UI SPA fallback, because /v1 isn’t reserved by the UI classifier the same way /api is. That’s confusing and may be worth a GitHub issue, but if the endpoint is enabled, /v1/models is registered before the Control UI handler.

First check please: what does openclaw --version show? If you already enabled the config above and still get HTML, please paste openclaw status --all plus the exact curl -i output.
Also: Windows native exists, but WSL2 is the recommended/stabler route.
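
A quick way to tell the Control UI fallback apart from a real API answer is to look at the Content-Type header and the start of the body from the curl -i output. Here is a minimal Python sketch of that check. The helper name classify_response is hypothetical and not part of OpenClaw; it only encodes the general rule that a JSON API reply is application/json and an SPA fallback is an HTML document.

```python
import json


def classify_response(content_type: str, body: str) -> str:
    """Return "json", "html", or "other" for an HTTP response.

    Hypothetical helper for illustration; not part of OpenClaw.
    """
    ctype = (content_type or "").split(";")[0].strip().lower()
    text = (body or "").lstrip()

    if ctype == "application/json":
        try:
            json.loads(text)
            return "json"
        except ValueError:
            return "other"  # header claims JSON but the body does not parse

    # SPA fallbacks are served as HTML and start with a doctype or <html> tag.
    if ctype == "text/html" or text[:15].lower().startswith(("<!doctype", "<html")):
        return "html"

    return "other"
```

If this returns "html" for /v1/models, the request most likely never reached the compat endpoint, which fits the disabled-by-default gate described above.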

wind pebble
#

I will definitely try! Thank you so much.

civic rainBOT
#

Happy to help! 🦐

If enabling that fixes it, please right-click the solution message → Apps → ✅ Mark Solution so Answer Overflow indexes it for future users.

wind pebble
#

Hey @maintainers / @developers,
I'm running OpenClaw 2026.4.24 on Windows native with Ollama backend. I've enabled the OpenAI-compatible HTTP endpoints (gateway.http.endpoints.chatCompletions.enabled: true) as documented, but all requests hang indefinitely.
What's happening:

Gateway starts fine - openclaw gateway run starts successfully
HTTP endpoints accept connections - curl http://127.0.0.1:18789/v1/models connects but never responds
Gateway logs show the blocker:

"blocked model (context window too small): ollama/llama3.2:3b ctx=8192 (min=16000)"
"Model context window too small (8192 tokens; source=modelsConfig). Minimum is 16000."

Result: All requests timeout waiting for a model response that never comes

Config attempted:
I've tried:

Setting contextWindow: 131072 in models config
Setting maxTokens: 8192 explicitly
Gateway restart after each change
Still getting the same "ctx=8192" error even though config says 131072

Questions:

Why is OpenClaw reporting ctx=8192 when the config explicitly sets contextWindow: 131072? Is there a caching issue or config path I'm missing?
Is the 16,000 token minimum a hard requirement? Can it be lowered for small models like llama3.2:3b?
Is there a workaround to use llama3.2:3b with OpenClaw on Windows native? Or should I use a different model?
Ollama direct works fine - llama3.2:3b responds in 0.5s via http://127.0.0.1:11434/v1/chat/completions, so the model is healthy. The issue is OpenClaw's validation.
Is Windows native officially supported for programmatic API access? Or is WSL2 required?

System Info:
OS: Windows 11 (native, not WSL2)
Node.js: v24.0.0
OpenClaw: 2026.4.24 (cbcfdf6)
Ollama: 0.1.32
Model: llama3.2:3b
Config: ~/.openclaw/openclaw.json with explicit contextWindow settings
What I'm trying to do:
Build a multi-agent trading system where OpenClaw acts as the orchestrator. I need the HTTP API working (/v1/models, /v1/chat/completions) to integrate with agent frameworks.
Any guidance appreciated!

civic rainBOT
#

A few important bits after checking the current docs/source:

  1. /v1/models should not hit Ollama at all. It should quickly return OpenClaw agent targets like openclaw/default, not raw Ollama models. If it hangs, that’s separate from the context-window guard. Use auth in the request:
curl -i http://127.0.0.1:18789/v1/models \
  -H "Authorization: Bearer YOUR_GATEWAY_TOKEN"

OpenClaw’s compat API uses model: "openclaw/default" and backend overrides via x-openclaw-model, not model: "ollama/...". (docs.openclaw.ai)

  2. The 16k minimum is currently a hard guard. The log is from the agent runtime guard:
Minimum is 16000

So llama3.2:3b reporting 8192 will be blocked unless OpenClaw resolves a context budget ≥ 16000.

  3. Why it still says ctx=8192 source=modelsConfig: that strongly suggests OpenClaw is finding a configured per-model entry for llama3.2:3b with either:

contextTokens: 8192

or:

contextWindow: 8192

Note: contextTokens wins over contextWindow, and per-model config wins over provider defaults. Check the exact resolved block:

openclaw config get models.providers.ollama
openclaw models status
openclaw models list --provider ollama --json

  4. Recommended Ollama config: use native Ollama mode, not /v1, for OpenClaw’s backend:

{
  models: {
    providers: {
      ollama: {
        baseUrl: "http://127.0.0.1:11434",
        apiKey: "ollama-local",
        api: "ollama",
        contextWindow: 32768,
        maxTokens: 8192,
        models: [
          {
            id: "llama3.2:3b",
            name: "llama3.2:3b",
            contextWindow: 32768,
            maxTokens: 8192,
            params: { num_ctx: 32768 }
          }
        ]
      }
    }
  },
  agents: {
    defaults: {
      model: { primary: "ollama/llama3.2:3b" }
    }
  }
}

Docs specifically warn not to use Ollama’s /v1 URL with OpenClaw unless you intentionally need legacy OpenAI-compatible mode; native Ollama is the better path. (docs.openclaw.ai)

  5. Windows: native Windows is supported, but WSL2 is still the recommended/stabler path for the full experience. Native should not be inherently unable to serve these HTTP endpoints, though. (docs.openclaw.ai)

If after fixing the model config GET /v1/models still hangs with auth, I’d treat that as a bug/regression and file it with the exact config + openclaw logs --follow output: https://github.com/openclaw/openclaw/issues
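
To make the precedence note above concrete (contextTokens over contextWindow, per-model over provider defaults), here is a small Python sketch of how a stale per-model value could shadow a provider-level 131072 and trip the guard. This is not OpenClaw's actual resolver. The function name, the return shape, and the choice to check the per-model level first and only then the provider level are all assumptions made for illustration.

```python
from typing import Optional, Tuple

MIN_CONTEXT = 16000  # the minimum quoted in the gateway log


def resolve_context(provider: dict, model_id: str) -> Tuple[Optional[int], bool]:
    """Return (resolved_context, passes_guard) for one model.

    Sketch only (hypothetical): per-model settings beat provider-level
    settings, and within a level contextTokens beats contextWindow.
    """
    model = next(
        (m for m in provider.get("models", []) if m.get("id") == model_id), {}
    )
    for level in (model, provider):
        for key in ("contextTokens", "contextWindow"):
            value = level.get(key)
            if value is not None:
                return value, value >= MIN_CONTEXT
    return None, False
```

With a provider-level contextWindow of 131072 and a per-model entry still carrying 8192, this returns (8192, False), which would match the ctx=8192 source=modelsConfig symptom.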

wind pebble
#

Hey @maintainers,
Got the HTTP API working thanks to the previous guidance (switched to native Ollama mode), but I've hit a severe performance issue that makes it unusable.
The Problem:
Direct Ollama (http://127.0.0.1:11434/v1/chat/completions):

Response time: ~0.5 seconds
Model responds immediately

OpenClaw Gateway (http://127.0.0.1:18789/v1/chat/completions):

Response time: 180+ seconds (3+ minutes)
Same model, same request format
Gateway eventually responds with correct JSON

Test Details:
Both using ollama/llama3.2:3b, same request payload:
{
  "model": "openclaw/default",
  "messages": [{"role": "user", "content": "what is 2+2?"}],
  "max_tokens": 50
}
Ollama direct: curl http://127.0.0.1:11434/v1/chat/completions ... → 0.5s
OpenClaw gateway: curl http://127.0.0.1:18789/v1/chat/completions ... → 189 seconds
What's Happening:

/v1/models endpoint responds instantly (< 1s) ✅
/v1/chat/completions takes 180+ seconds ❌
Gateway logs show no errors, timeouts, or blocks - just slow processing
The response IS correct JSON when it arrives

Questions:

Is this expected behavior? A 360x slowdown (180s vs 0.5s) seems wrong.
What's adding the overhead? Is it:

Model loading/initialization on every request?
Plugin processing?
Agent context building?
Something else?

Is there a config setting to speed this up? Any caching, pre-loading, or optimization flags?

System Info:
OS: Windows 11 native
Node.js: v24.0.0
OpenClaw: 2026.4.24 (cbcfdf6)
Ollama: 0.1.32
Model: llama3.2:3b (3B params)
Config: Native Ollama mode (api: "ollama", not "/v1")
Use Case:
Building a multi-agent trading system where agents need to make decisions in seconds, not minutes. OpenClaw's orchestration features are perfect, but the 180+ second latency kills the use case.
Is there a performance optimization path, or should I consider a different architecture?

civic rainBOT
#

Yep — now that /v1/models is instant JSON, the UI catch-all theory is ruled out. This is the chat request running the full OpenClaw agent runtime, not a raw Ollama proxy.

A direct Ollama call sends basically: “what is 2+2?”
OpenClaw sends: agent instructions + session scaffolding + available tool schemas + context/history + your message. On llama3.2:3b, that can absolutely turn into minutes, especially with large contextWindow / num_ctx.

Key notes:

  • max_tokens in the OpenAI-style request is not the main limiter here. For OpenClaw, use model config maxTokens.
  • /v1/chat/completions is for “talk to an OpenClaw agent through an OpenAI-compatible shape,” not for low-latency raw model serving.
  • Model loading may be part of it, but 180s usually points to prompt/tool/context pressure. keep_alive helps cold starts, not giant prompt eval.

Try this benchmark profile:

{
  tools: {
    profile: "minimal"
  },
  agents: {
    defaults: {
      experimental: {
        localModelLean: true
      },
      model: {
        primary: "ollama/llama3.2:3b"
      }
    }
  },
  models: {
    providers: {
      ollama: {
        baseUrl: "http://127.0.0.1:11434",
        apiKey: "ollama-local",
        api: "ollama",
        contextWindow: 16384,
        maxTokens: 256,
        models: [
          {
            id: "llama3.2:3b",
            name: "llama3.2:3b",
            params: {
              num_ctx: 16384,
              keep_alive: "15m"
            },
            compat: {
              supportsTools: false
            }
          }
        ]
      }
    }
  }
}

Use compat.supportsTools: false only for testing or text-only agents — it removes real tool-calling capability.

Good isolation steps:

openclaw infer model run --local --model ollama/llama3.2:3b --prompt "Reply with exactly: pong" --json

Then test streaming to see whether the delay is before first token:

curl.exe -N http://127.0.0.1:18789/v1/chat/completions ^
  -H "Authorization: Bearer YOUR_TOKEN" ^
  -H "Content-Type: application/json" ^
  -d "{\"model\":\"openclaw/default\",\"stream\":true,\"messages\":[{\"role\":\"user\",\"content\":\"what is 2+2?\"}]}"
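
If you would rather script the first-token check than eyeball the curl output, here is a minimal Python sketch. It assumes the gateway streams in the usual OpenAI shape ("data: {...}" chunks followed by "data: [DONE]"); I have not verified that against the gateway itself. The function name and the injectable clock are hypothetical and exist only to make the logic testable.

```python
import json
import time
from typing import Callable, Iterable, Optional


def time_to_first_token(
    lines: Iterable[str],
    clock: Callable[[], float] = time.monotonic,
) -> Optional[float]:
    """Seconds from call start until the first non-empty content delta.

    `lines` is any iterable of decoded SSE lines. Returns None if the
    stream ends without producing any text.
    """
    start = clock()
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        try:
            chunk = json.loads(payload)
        except ValueError:
            continue  # ignore chunks that are not valid JSON
        if not isinstance(chunk, dict):
            continue
        choices = chunk.get("choices") or []
        delta = (choices[0].get("delta") or {}) if choices else {}
        if delta.get("content"):
            return clock() - start
    return None
```

One way to feed it is to open the streaming request with urllib.request.urlopen and pass (line.decode("utf-8") for line in response). A large value here with a fast total generation time afterwards points at pre-inference delay rather than slow decoding.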

For a trading system needing second-level latency, I’d use Ollama/vLLM directly on the critical decision path and use OpenClaw for orchestration, monitoring, tools, notifications, or slower supervisory workflows. A 3B local model is not a great fit for full OpenClaw agent/tool runtime.

Docs: https://docs.openclaw.ai/gateway/openai-http-api, https://docs.openclaw.ai/providers/ollama, https://docs.openclaw.ai/gateway/local-models

wind pebble
#

Follow-up: Confirmed - delay is pre-inference (agent setup overhead)
Tested your suggestions. Results:

openclaw infer model run - Hangs indefinitely
Streaming endpoint (-N flag) - Hangs 2+ minutes with zero output, then times out

Key finding: No tokens stream. The 180+ second delay happens BEFORE inference starts. This is agent runtime setup, not model inference.
Questions:

Is this expected for a 3B model running the full OpenClaw agent runtime? (tools, context, session scaffolding, etc.)
Is there a "raw inference mode" that skips agent setup and just runs the model?
For trading use case: Should I:

Use Ollama directly (0.5s) for fast decisions
Use OpenClaw (180+s) separately for orchestration/monitoring
Or is there a way to make OpenClaw fast enough for real-time?

Can I disable agent features to speed up inference? (tools, context history, session state, etc.)

The maintainers' recommendation was "use Ollama directly on critical path" — but I'm wondering if there's a middle ground where OpenClaw can serve sub-10-second requests.

civic rainBOT
#

That finding actually changes the diagnosis a bit:

openclaw infer model run --local is already the raw-ish mode. It uses a one-shot plain prompt with tools disabled, not the full OpenClaw agent/tool runtime. So if that hangs, I would not blame agent setup yet — I’d look at the Ollama native transport/config path.

Two big suspects:

  1. Your direct test used Ollama /v1/chat/completions, but OpenClaw native Ollama uses /api/chat. Test the same native path directly:
curl.exe -N http://127.0.0.1:11434/api/chat ^
  -H "Content-Type: application/json" ^
  -d "{\"model\":\"llama3.2:3b\",\"stream\":true,\"messages\":[{\"role\":\"user\",\"content\":\"Reply with exactly: pong\"}],\"options\":{\"num_ctx\":16384,\"num_predict\":64}}"
  2. max_tokens in the OpenAI request is not the limiter to rely on here. Set OpenClaw model config low:
{
  models: {
    providers: {
      ollama: {
        baseUrl: "http://127.0.0.1:11434",
        apiKey: "ollama-local",
        api: "ollama",
        models: [{
          id: "llama3.2:3b",
          name: "llama3.2:3b",
          contextWindow: 16384,
          maxTokens: 64,
          params: {
            num_ctx: 16384,
            num_predict: 64,
            keep_alive: "15m"
          },
          compat: { supportsTools: false }
        }]
      }
    }
  },
  tools: { profile: "minimal" },
  agents: {
    defaults: {
      experimental: { localModelLean: true },
      model: { primary: "ollama/llama3.2:3b" }
    }
  }
}

Also: Ollama 0.1.32 is very old — I’d retest with a current Ollama build before chasing OpenClaw internals.

Middle-ground answer: yes, maybe sub-10s is possible for text-only, minimal-tool use if openclaw infer model run --local is fast first. If that still hangs while direct /api/chat is fast with the same num_ctx/num_predict, that’s a good GitHub issue: https://github.com/openclaw/openclaw/issues

Docs: https://docs.openclaw.ai/providers/ollama, https://docs.openclaw.ai/gateway/local-models
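
If cmd.exe quote escaping makes the curl test painful, the same native /api/chat request can be built from Python with only the standard library. This is a sketch under the assumptions already used in this thread (Ollama on 127.0.0.1:11434, llama3.2:3b, num_ctx 16384, num_predict 64). The function names are hypothetical, and stream is set to false here so a single JSON object comes back.

```python
import json
import urllib.request


def build_native_chat_request(
    prompt: str,
    model: str = "llama3.2:3b",
    num_ctx: int = 16384,
    num_predict: int = 64,
    base_url: str = "http://127.0.0.1:11434",
) -> urllib.request.Request:
    """Build (but do not send) a POST to Ollama's native /api/chat."""
    body = {
        "model": model,
        "stream": False,  # one JSON object back instead of a line stream
        "messages": [{"role": "user", "content": prompt}],
        "options": {"num_ctx": num_ctx, "num_predict": num_predict},
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, timeout: float = 30.0) -> str:
    """Send the request and return the assistant text."""
    request = build_native_chat_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["message"]["content"]
```

Calling ask("Reply with exactly: pong") and timing it gives a like-for-like baseline against openclaw infer model run --local, since both go through /api/chat with the same options.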

wind pebble
#

Root cause identified - openclaw infer model run hangs, but Ollama native works fine
Following up on the performance investigation. Isolated the issue precisely:
Test Results:
Direct Ollama /api/chat (native mode):
curl http://127.0.0.1:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3.2:3b","stream":true,"messages":[{"role":"user","content":"Reply with exactly: pong"}],"options":{"num_ctx":16384,"num_predict":64}}'

Result: 0.09 seconds ✅ FAST
OpenClaw infer command:
openclaw infer model run --local --model ollama/llama3.2:3b --prompt "Reply with exactly: pong" --json

Result: Hangs indefinitely, no output, no error ❌ BROKEN
Key Facts:

Ollama native path works - 90ms response proves Ollama backend is healthy
OpenClaw infer hangs - Not Ollama, it's inside OpenClaw's infer runtime
No error message - Just silent hang, suggests deadlock or infinite loop
Affects everything downstream - Since infer is broken, /v1/chat/completions can't work either

System Info:
OS: Windows 11 native
Node.js: v24.0.0
OpenClaw: 2026.4.24 (cbcfdf6)
Ollama: 0.21.2 (just updated)
Model: llama3.2:3b
Questions:

Is openclaw infer model run --local known to be broken on Windows?
Should I:

Update to latest OpenClaw version (2026.4.25 is available)?
Roll back to a specific version?
File a GitHub issue with this diagnostic?

Is this a Windows-specific bug, or does it happen on WSL2/Linux too?
Workaround: Since direct Ollama works, can I bypass OpenClaw's infer and use the native /api/chat endpoint directly for fast inference? (Then use OpenClaw for orchestration only)

Impact:

OpenClaw API endpoints: 180+ seconds (blocked by broken infer)
Direct Ollama: 0.09 seconds (works perfectly)
Gap: 2000x slowdown due to OpenClaw infer hang

This is a blocker for any time-sensitive use case (trading, real-time monitoring, etc.).
Grateful for the diagnostic guidance. This is now a concrete bug report with clear reproduction steps

civic rainBOT
#

Nice isolation — that’s a real bug-shaped repro now.

No, openclaw infer model run --local hanging while direct Ollama /api/chat returns in 90ms is not expected. That command is already the “plain one-shot model run” path with tools disabled, so this is below the normal agent orchestration overhead.

What I’d do next:

  1. Update to 2026.4.25 and retest — there was a real 2026.4.25 release on Apr 27 with install/update hardening and plugin startup changes, but I wouldn’t claim it specifically fixes this Ollama hang. (github.com)

  2. If it still hangs, file a GitHub issue with exactly this repro:
    https://github.com/openclaw/openclaw/issues

Include:

openclaw --version
openclaw doctor
openclaw infer model run --local --model ollama/llama3.2:3b --prompt "Reply with exactly: pong" --json

…and the working direct Ollama /api/chat curl. This is much cleaner than the original HTTP API symptom.

  3. Workaround: yes. For a trading/real-time decision path, bypass OpenClaw and call Ollama directly. Use OpenClaw for orchestration, monitoring, alerts, scheduled checks, memory/tools, etc. I would not put sub-second or hard realtime decisions through the full OpenClaw gateway path right now.

I can’t confirm it’s Windows-only. There are similar “direct Ollama works, OpenClaw local Ollama hangs/no output” reports on macOS/Linux too, so your Windows-native repro is probably another instance of that broader local-Ollama integration class, not just routing/UI catch-all weirdness. (github.com)

GitHub

Summary Local Ollama models (e.g., llama3.1, llama3.2) hang indefinitely in OpenClaw sessions after update to 2026.3.8. Direct Ollama API/chat works fine, but OpenClaw sessions timeout with zero to...

wind pebble
#

Agent runtime taking 46+ seconds despite model being fast (50ms). Context window 16K minimum blocking optimization.
Following up on performance improvements from 2026.4.27. The model inference is fast, but the agent runtime initialization is blocking requests.
Test Results:
Direct Ollama inference (no agent):
curl http://127.0.0.1:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3.2:3b","messages":[{"role":"user","content":"hello"}],"max_tokens":50}'

Result: 0.09 seconds ✅ FAST
OpenClaw gateway with agent runtime:
curl http://127.0.0.1:18789/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"openclaw/default","messages":[{"role":"user","content":"hello"}],"max_tokens":50}'

Result: 46.88 seconds ❌ SLOW (on fresh start, every request)
Configuration:
{
  "models": {
    "providers": {
      "ollama": {
        "baseUrl": "http://127.0.0.1:11434",
        "api": "ollama",
        "models": [
          {
            "id": "llama3.2:3b",
            "contextWindow": 16384,
            "maxTokens": 256
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "ollama/llama3.2:3b"
      },
      "timeoutSeconds": 900,
      "experimental": {
        "localModelLean": true
      }
    }
  }
}
Key Observations:

Model is responsive - Direct Ollama: 0.09s ✅
Agent overhead is massive - 46s vs 0.09s = 500x slowdown ❌
Consistent delay - Not a one-time load; every request takes 46+ seconds
Hard context minimum - Tried 4096, got error: "Minimum is 16000". Using 16384 now.
UI also slow - Browser chat interface hangs for 46+ seconds per message
Model size - 3.2B params (small model, 2GB disk)

Is 46s normal for agent runtime with a 3B model on Windows?
Is the 16K context window minimum the bottleneck? (We tried 4096 and it failed validation)

Should we use the API differently?

civic rainBOT
#

46s isn’t “normal fast path,” but the direct Ollama test still isn’t apples-to-apples: OpenClaw’s /v1/chat/completions is an agent endpoint, not a raw proxy.

One config issue jumps out: for native Ollama, contextWindow is OpenClaw’s budget metadata; it does not force Ollama’s request context. You need params.num_ctx.

Try this exact model entry:

{
  id: "llama3.2:3b",
  name: "llama3.2:3b",
  contextWindow: 16384,
  maxTokens: 64,
  params: {
    num_ctx: 16384,
    num_predict: 64,
    keep_alive: "15m"
  },
  compat: {
    supportsTools: false
  }
}

And also:

{
  tools: { profile: "minimal" },
  agents: {
    defaults: {
      experimental: { localModelLean: true }
    }
  }
}

Important bits:

  • max_tokens in the OpenAI HTTP request is not the thing to rely on here; use model config maxTokens / Ollama params.num_predict.
  • The 16k minimum is a guard; you can’t currently use 4096 as the OpenClaw agent context budget.
  • localModelLean reduces heavy tools, but it does not change Ollama runtime context — params.num_ctx does.
  • If you need raw 50ms inference, call Ollama directly. OpenClaw’s Gateway API is for “run this through an OpenClaw agent.”
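
A small pre-flight check can catch the contextWindow-without-num_ctx case before restarting the gateway. Below is a Python sketch that walks a provider block (already loaded into a dict) and reports the gaps discussed above. The function is hypothetical, and treating a num_ctx/contextWindow mismatch as a warning is my own heuristic, not a documented OpenClaw rule.

```python
MIN_CONTEXT = 16000  # minimum quoted in the gateway error


def lint_ollama_models(provider: dict) -> list:
    """Return human-readable warnings for an Ollama provider block.

    Hypothetical helper: checks the agent context budget against the
    guard and checks that params.num_ctx is set alongside it.
    """
    warnings = []
    for model in provider.get("models", []):
        mid = model.get("id", "<no id>")
        # Fall back to the provider-level value if the model has none.
        window = model.get("contextWindow", provider.get("contextWindow"))
        num_ctx = (model.get("params") or {}).get("num_ctx")

        if window is None or window < MIN_CONTEXT:
            warnings.append(f"{mid}: contextWindow {window} is below {MIN_CONTEXT}")
        if num_ctx is None:
            warnings.append(f"{mid}: params.num_ctx is not set")
        elif window is not None and num_ctx != window:
            warnings.append(f"{mid}: num_ctx {num_ctx} != contextWindow {window}")
    return warnings
```

Run against the config posted above (contextWindow 16384, no params block), it reports that params.num_ctx is not set, which is the gap the suggested model entry closes.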

After changing config, retest:

wind pebble
#

Diagnostic complete — infer model run hangs but direct Ollama instant. Windows-specific bug in OpenClaw's Ollama transport?
Following your suggestion to test infer model run --local vs direct Ollama, I've isolated the bottleneck.
Quick Findings:
Direct Ollama API (working):
curl -X POST http://127.0.0.1:11434/api/generate ...
Result: 0.52 seconds ✅
OpenClaw infer command (hangs):
openclaw infer model run --local --model ollama/llama3.2:3b --prompt "pong" --json
Result: Hangs >2min, no output, no error ❌
OpenClaw Gateway API:
curl http://127.0.0.1:18789/v1/chat/completions ...
Result: 46+ seconds ⚠️
Diagnosis:

Ollama backend is healthy (0.52s proves it)
OpenClaw's request layer to Ollama is broken (hangs/timeouts)
Isolated to native Ollama mode integration

My Questions:

Is this a known Windows-specific bug in 2026.4.27?
Should infer model run have a timeout instead of hanging indefinitely?
Is the issue in request marshalling or response handling?
Should I revert to 2026.4.26 or wait for 2026.4.28?
Is this related to the earlier /v1/chat/completions hang that was fixed in 2026.4.27?

Direct Ollama works (0.5s), but I can't use OpenClaw's orchestration because of this hang. Can you help diagnose what's happening in OpenClaw's Ollama transport layer?

civic rainBOT
#

This is good diagnostic data. One important correction though: OpenClaw native Ollama uses /api/chat, not /api/generate. So the exact comparison should be:

curl.exe -N http://127.0.0.1:11434/api/chat ^
  -H "Content-Type: application/json" ^
  -d "{\"model\":\"llama3.2:3b\",\"stream\":true,\"messages\":[{\"role\":\"user\",\"content\":\"pong\"}],\"options\":{\"num_ctx\":16384,\"num_predict\":64}}"

If that is fast but this still hangs:

openclaw infer model run --local --model ollama/llama3.2:3b --prompt "pong" --json

…then yes, that’s likely an OpenClaw Ollama transport/stream-handling bug, not normal agent overhead.

I’d add a provider timeout so it fails instead of hanging:

models: {
  providers: {
    ollama: {
      baseUrl: "http://127.0.0.1:11434",
      apiKey: "ollama-local",
      api: "ollama",
      timeoutSeconds: 30,
      models: [{
        id: "llama3.2:3b",
        name: "llama3.2:3b",
        contextWindow: 16384,
        maxTokens: 64,
        params: { num_ctx: 16384, num_predict: 64, keep_alive: "15m" },
        compat: { supportsTools: false }
      }]
    }
  }
}

Then retest:

openclaw infer model run --local --model ollama/llama3.2:3b --prompt "pong" --json
openclaw infer model run --gateway --model ollama/llama3.2:3b --prompt "pong" --json

If the 30s timeout doesn’t produce an error, that’s especially useful for the issue — it means the hang is probably outside the guarded Ollama HTTP request.

I wouldn’t roll back blindly. Update to the newest available version first, retest, then file this as a GitHub issue with:

  • openclaw --version
  • openclaw doctor
  • exact Ollama /api/chat curl result
  • exact openclaw infer model run --local ... hang
  • your config block above

Issue tracker: https://github.com/openclaw/openclaw/issues

And yes: for now, using Ollama directly for the time-sensitive trading path and OpenClaw for orchestration/monitoring is the sane workaround.

wind pebble
#

Agent runtime timeout (30s) fires before Ollama responds. Direct Ollama instant (0.09s), agent layer hangs.
Followed your diagnostic steps. Here's what I found:
OpenClaw 2026.4.27 | Windows 11 | Ollama 0.21.2
Direct Ollama (your exact command):
Result: 0.09s ✅
OpenClaw infer model run --local:
Result: 20-25s ✅ (completes, valid JSON)
OpenClaw Gateway API:
Result: 30s timeout ❌
Gateway logs show:
[llm-idle-timeout] ollama/llama3.2:3b produced no reply before the idle watchdog
embedded run timeout: timeoutMs=30000
Error: No text output returned for provider "ollama" model "llama3.2:3b"
The problem is clear: Ollama responds instantly to direct HTTP (0.09s) but the agent layer can't get timely responses, so the 30s idle timeout fires before it gets an answer.
Config applied (following your suggestions):

timeoutSeconds: 30 on Ollama provider
contextWindow: 16384, params.num_ctx: 16384
keep_alive: "15m"
tools: { profile: "minimal" }
localModelLean: true

My questions:

Is this a known Windows-specific bug in 2026.4.27?
Why does the agent's Ollama call hang when direct HTTP is instant?
Should the idle timeout be longer for 3B models?
Should I test on WSL2/Linux to isolate Windows?

Workaround: Using infer model run --local directly (~20-25s) but this loses agent orchestration benefits.
Status: Blocker for production — the agent timeout fires faster than the agent can get an answer from a healthy Ollama backend.

civic rainBOT
#

Yep — this is much clearer now.

Short version: your 30s timeout is now too low for the OpenClaw path. Since infer --local completes in 20–25s, the native Ollama transport is working, but Gateway agent runs add enough prompt/context/runtime overhead that the first streamed token doesn’t arrive before the 30s idle watchdog.

So:

  • Direct Ollama 0.09s = tiny raw prompt.
  • openclaw infer --local 20–25s = OpenClaw’s simplified model path; no full agent tools/context.
  • Gateway 30s timeout = full agent path + idle watchdog, and your models.providers.ollama.timeoutSeconds: 30 effectively caps the first-token wait.

This is related to a broader class of local-LLM/Ollama timeout issues, not clearly Windows-only. There are existing reports for local Ollama/model timeout behavior across Windows and macOS/Linux-ish setups. (github.com)

GitHub

Summary When using local Ollama models, the first request after model load triggers a cold-start that takes ~13-46 seconds (depending on model size). The default LLM request timeout in OpenClaw app...

Try raising both timeouts for diagnosis:

{
  models: {
    providers: {
      ollama: {
        timeoutSeconds: 180
      }
    }
  },
  agents: {
    defaults: {
      timeoutSeconds: 180
    }
  }
}

That won’t make it fast; it just confirms whether Gateway can complete when not killed early.


Next isolation step:

openclaw infer model run --gateway --model ollama/llama3.2:3b --prompt "pong" --json

If --gateway is ~20–25s but /v1/chat/completions is ~46s, that remaining gap is full agent context/tool/runtime overhead.

For production trading: I’d still use direct Ollama for the fast decision loop, and OpenClaw for orchestration/monitoring/alerts. If you need sub-10s through OpenClaw, this setup/model/runtime path probably isn’t there yet.

Testing WSL2/Linux would be useful. If WSL2 is much faster, file it as Windows-native specific. If not, file it as local Ollama agent first-token timeout/overhead with your timings: https://github.com/openclaw/openclaw/issues
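
To illustrate why a 30s idle watchdog kills a run whose first token needs longer than that, while the same run completes under a longer limit, here is a generic asyncio sketch. It is not OpenClaw's watchdog code. slow_model is a stand-in stream and both function names are hypothetical.

```python
import asyncio
from typing import AsyncIterator


async def slow_model(first_token_delay: float) -> AsyncIterator[str]:
    """Stand-in for a model stream that is slow to emit its first token."""
    await asyncio.sleep(first_token_delay)
    yield "pong"


async def run_with_idle_watchdog(stream: AsyncIterator[str], idle_timeout: float) -> str:
    """Collect a stream, failing if any gap between tokens exceeds idle_timeout."""
    parts = []
    iterator = stream.__aiter__()
    while True:
        try:
            # Each wait covers only the gap until the next token.
            token = await asyncio.wait_for(iterator.__anext__(), timeout=idle_timeout)
        except StopAsyncIteration:
            return "".join(parts)
        except asyncio.TimeoutError:
            raise TimeoutError("no reply before the idle watchdog") from None
        parts.append(token)
```

Scaled down, asyncio.run(run_with_idle_watchdog(slow_model(0.01), 1.0)) completes, while a first-token delay longer than idle_timeout raises TimeoutError even though the model would have answered eventually. That is the same shape as the 30s cap firing before the Gateway path gets its first token.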

wind pebble
#

Subject: Raised timeouts to 180s as suggested, but getting unexpected results. --gateway fails with auth error, /v1/chat/completions takes 193s.
Followed your diagnostic steps. Raised timeouts to 180s on both Ollama provider and agents defaults. Got unexpected results:
Config applied:
{
  "models": { "providers": { "ollama": { "timeoutSeconds": 180 } } },
  "agents": { "defaults": { "timeoutSeconds": 180 } }
}
Test 1: infer --gateway
Result: GatewayClientRequestError: provider/model overrides are not authorized for this caller.
Time: 4.16s
Test 2: /v1/chat/completions
Time: 193.11s
Comparison:

infer --local: 20-25s ✅
infer --gateway: Auth error (4.16s) ❌
/v1/chat/completions: 193.11s ❌

Questions:

Why does --gateway fail with "provider/model overrides not authorized"?
Why is /v1/chat/completions taking 193s when you suggested ~46s?
The gap is huge (20-25s local vs 193s gateway). Is this normal agent overhead?

Should I test WSL2/Linux, debug the auth error, or proceed with hybrid architecture?

civic rainBOT
#

That --gateway error is expected-ish: --model ollama/... is a provider/model override, and Gateway only allows those from admin/internal-authorized callers. Since your default is already ollama/llama3.2:3b, test without the override:

openclaw infer model run --gateway --prompt "pong" --json

That should use the configured default model and avoid the override auth path.

The 193s result is basically the 180s timeout doing its job, plus setup/cleanup/retry overhead. So raising the timeout didn’t fix latency — it just let the slow Gateway agent path wait longer before giving up/completing. The key comparison is now:

  • direct Ollama: ~0.09s
  • infer --local: ~20–25s
  • Gateway agent: ~193s with 180s timeout

That gap is not a healthy realtime path for a 3B Ollama model.

I’d do two things:

  1. Run the no-override Gateway probe above.
  2. Test WSL2/Linux if you can. If WSL2 is dramatically faster, it’s Windows-native specific. If it’s still slow, it’s broader local-Ollama/OpenClaw agent overhead.

For production, I’d proceed with hybrid architecture: direct Ollama for trading decisions, OpenClaw for orchestration/monitoring/alerts. And yes, this is worth a GitHub issue with these exact timings/logs: https://github.com/openclaw/openclaw/issues

Docs: https://docs.openclaw.ai/gateway/local-models

wind pebble
#

So, realistically, I won't be able to use UI via Windows with answers in <=10 seconds? Is there anything else I could do?

civic rainBOT
#

Realistically: with this exact setup (Windows-native + Ollama + llama3.2:3b + full OpenClaw UI agent), I would not expect reliable ≤10s replies right now. You’ve already done the right optimizations, and your own numbers show the simplified OpenClaw path is still ~20–25s before the full Gateway/UI path.

Things still worth trying, in order:

  1. Test WSL2/Linux. Native Windows is explicitly the less-recommended path; WSL2 may remove some overhead/weirdness.
  2. Use a stronger/faster-serving model/backend. For OpenClaw agent use, tiny 3B models aren’t ideal. If local-only matters, try LM Studio or vLLM with a larger model and keep it loaded.
  3. Keep OpenClaw minimal: tools.profile: "minimal", localModelLean: true, compat.supportsTools: false, low maxTokens, explicit params.num_ctx.
  4. Use hybrid architecture: direct Ollama for realtime/trading decisions; OpenClaw for UI, monitoring, alerts, scheduling, summaries, and orchestration that can tolerate latency.
  5. File an issue with your timings, especially if WSL2 is much faster: https://github.com/openclaw/openclaw/issues

My honest recommendation: for a ≤10s requirement, don’t put the critical path through the OpenClaw UI/Gateway agent on this stack. Use direct Ollama for that, and let OpenClaw supervise around it.

wind pebble
#

I am happy to have OpenClaw at <=10 seconds as the "CEO" of an operation that is run by agents in Ollama, but I would need to do it via the UI.

civic rainBOT
#

That architecture makes sense — OpenClaw as the CEO/UI layer, with fast Ollama agents doing the actual operational decisions. The trick is: don’t make the OpenClaw UI call the small Ollama model for every answer.

For ≤10s via UI, I’d do one of these:

Best option: use a fast cloud/hosted model for OpenClaw itself, and let it call/coordinate your Ollama agents/tools.
OpenClaw stays responsive in the UI, while your local agents do the specialized work.

Local-only option: run OpenClaw on WSL2/Linux with a stronger local backend/model, but I’d be skeptical of reliable ≤10s with llama3.2:3b as the OpenClaw brain.

Hybrid pattern I’d recommend:

  • UI / CEO agent: OpenClaw with a fast reliable model.
  • Worker agents: your Ollama services, exposed via a small local HTTP API or plugin/tool.
  • OpenClaw asks workers for status/results, summarizes, decides next action.
  • Time-critical loops stay outside OpenClaw and report back.

So the UI prompt becomes something like: “show me current trading state and recommended actions,” not “perform every realtime inference inside the OpenClaw agent run.”


If you want to build that, create a tiny local API in front of your Ollama agents, then add it as an OpenClaw tool/plugin. That gives you the UI workflow without putting the realtime path through the slow agent runtime.
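
As a starting point for that tiny local API, here is a standard-library Python sketch. Everything specific in it is an assumption for illustration: the /decide path, port 8765, the JSON shape {"prompt": ...} in and {"answer": ...} out, and the function names. It forwards to Ollama's native /api/chat with streaming off. How you then register it as an OpenClaw tool/plugin is not covered here.

```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable


def ask_ollama(prompt: str, model: str = "llama3.2:3b") -> str:
    """Call Ollama's native /api/chat (non-streaming) and return the text."""
    body = json.dumps({
        "model": model,
        "stream": False,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    req = urllib.request.Request(
        "http://127.0.0.1:11434/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["message"]["content"]


def make_handler(worker: Callable[[str], str]):
    """Build a handler class exposing POST /decide (hypothetical path)."""

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/decide":
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            try:
                prompt = json.loads(self.rfile.read(length))["prompt"]
            except (ValueError, KeyError, TypeError):
                self.send_error(400, "expected JSON body with a 'prompt' field")
                return
            payload = json.dumps({"answer": worker(prompt)}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, *args):
            pass  # keep the console quiet

    return Handler


def serve(port: int = 8765) -> None:
    """Run the facade on localhost until interrupted."""
    ThreadingHTTPServer(("127.0.0.1", port), make_handler(ask_ollama)).serve_forever()
```

Passing the worker in as a function keeps the fast path swappable: the same facade can front Ollama, vLLM, or a canned stub while you test the UI workflow.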

wind pebble
#

So, having it on WSL2 would be my best alternative?

civic rainBOT
#

Yes — WSL2 is the best next alternative if you want to keep running locally on Windows hardware.

I’d try this order:

  1. Run OpenClaw Gateway in WSL2
  2. Keep Ollama either:
    • inside WSL2 if GPU/CUDA works cleanly there, or
    • on Windows native, reachable from WSL2 via the Windows host IP
  3. Retest:
openclaw infer model run --local --prompt "pong" --json

and then the UI.

That will tell you whether the big slowdown is Windows-native OpenClaw/Node/runtime overhead or just the local model path in general.

But: WSL2 is not guaranteed to make llama3.2:3b a good OpenClaw “CEO brain.” For ≤10s UI, the most reliable setup is still: