#OpenClaw + Ollama local timing out on CPU VPS - 500 error from Ollama

12 messages · Page 1 of 1 (latest)

mossy path
#

Hi, running OpenClaw 2026.4.14 on VPS 12 vCPU 24GB RAM no GPU, Ollama local with qwen2.5:7b.

Every agent turn returns 500 from Ollama after exactly 1m59s. Direct curl works fine in 7s.

openclaw --version: OpenClaw 2026.4.14 (323493f)

openclaw models status:
Default: ollama/qwen2.5:7b

Fallbacks: none
Auth: ollama:default api_key configured
openclaw config get models.providers.ollama:
baseUrl: http://127.0.0.1:11434
api: ollama (no /v1)
model: qwen2.5:7b
contextWindow: 32768
supportsTools: false
idleTimeoutSeconds: 300
openclaw config get agents.defaults:
primary: ollama/qwen2.5:7b
llm.idleTimeoutSeconds: 300

Ollama logs:
POST /api/chat returns 500 after 1m59s
completion request prompt=26848 tokens
num_ctx=32768
aborting completion request due to client closing the connection

System:
RAM total 23GB, available 21GB
Ollama model size 5.8 GiB
CPU only, no GPU

ollama list: qwen2.5:7b 4.7GB
Direct curl test works in 7s:
curl http://127.0.0.1:11434/api/generate -d {"model":"qwen2.5:7b","prompt":"hola","stream":false}
Response: Hola! Como estas?
Already tried: supportsTools false reduced prompt from 66K to 26K tokens but still 500. idleTimeoutSeconds 300 set and gateway restarted.

Question: why does Ollama return 500 after exactly 1m59s with 21GB free RAM, prompt 26K tokens under the 32K context window, and direct curl working fine?

tame parrotBOT
#

I’m checking the code path for a hard ~120s client timeout, because that exact 1m59s pattern usually means OpenClaw is closing the request, not Ollama running out of RAM.

#

Found it. OpenClaw has a separate LLM idle watchdog, and if it sees no streamed chunks it aborts the request. That fits your Ollama log exactly: “client closing the connection”. Let me grab the relevant docs/source lines so I can give you the cleanest fix.

#

It’s almost certainly not RAM. It’s OpenClaw aborting the Ollama request after ~120s of no streamed output, and Ollama is surfacing that as a 500 because the client closed the connection.

Why I’m pretty confident:

  • Your Ollama log literally says: “aborting completion request due to client closing the connection”
  • OpenClaw has an LLM idle timeout. If no response chunks arrive, it aborts.
  • The default fallback is 120s. Docs/code both match that.

So the real story is:

  1. 26,848 prompt tokens on CPU-only qwen2.5:7b is huge
  2. Ollama is probably spending most of that time in prompt prefill before first token
  3. OpenClaw sees no chunks for ~120s, aborts
  4. Ollama logs client disconnect and returns 500

Also, your curl test is not comparable:

  • tiny prompt
#
  • /api/generate
  • 7s response

That does not prove a 26k-token /api/chat request will produce a first token within 120s.

The interesting bit is this: you already set agents.defaults.llm.idleTimeoutSeconds: 300, but the behavior is still ~120s. That usually means the running gateway is not using the config file you edited.

I’d check these next:

openclaw gateway status
openclaw status --all
openclaw logs --follow

Specifically look for:

  • Config (cli) vs Config (service) mismatch
#
  • log lines mentioning LLM idle timeout

If Config (cli) and Config (service) differ, reinstall service metadata from the same profile/env:

openclaw gateway install --force
openclaw gateway restart

Practical fixes:

  • make sure agents.defaults.llm.idleTimeoutSeconds is actually applied
  • for slow local CPU inference, set it higher, like 600, or 0 to disable
  • reduce prompt pressure further:
    • agents.defaults.localModelMode: "lean"
    • start a fresh session or /compact
    • trim bootstrap/history if possible
mossy path
#

Update: tried all suggestions. Still timing out.
Config cli and Config service are same file, no drift confirmed.
idleTimeoutSeconds set to 600, gateway reinstalled with --force and restarted. Still times out after 5 minutes (timeoutMs=300000).
agents.defaults.localModelMode lean rejected: Error: Config validation failed: agents.defaults: Unrecognized key: "localModelMode"
Ollama log pattern unchanged: client closing the connection, 500 after 1m59s.
Current config:
baseUrl: http://127.0.0.1:11434
api: ollama (no /v1)
supportsTools: false
contextWindow: 32768
idleTimeoutSeconds: 600
prompt tokens: 26848
num_ctx: 32768
RAM available: 21GB
CPU: 12 vCPU, no GPU
Bootstrap file: PRESENT (agents show 1 bootstrapping)
Note: agents.defaults shows 1 bootstrapping, 0 active. Could the bootstrap file be causing extra prompt size?
Direct questions:

Is 26K tokens on CPU too slow for first token within 5 minutes on qwen2.5:7b?
agents.defaults.localModelMode is rejected as unrecognized key, is there another way to reduce prompt size for local models?
Should I delete BOOTSTRAP.md to reduce context?
What is the minimum viable local setup for OpenClaw that actually works?

tame parrotBOT
#

I’m checking two things: whether localModelMode is newer than your version, and whether that lingering BOOTSTRAP.md can keep inflating the prompt or trap the agent in bootstrapping.

#

Yep, I think you’ve found the real problem now.

Short answers:

  1. Is 26K tokens on CPU too slow for first token within 5 minutes on qwen2.5:7b?
    Yes, absolutely plausible. On CPU, this is mostly a prefill / first-token latency problem, not a RAM problem. 21 GB free RAM does not help much once the model is loaded.

  2. agents.defaults.localModelMode is rejected, is there another way to reduce prompt size?
    On your exact build, that knob is effectively not available. Safest ways to shrink prompt on your version are:

    • remove the lingering BOOTSTRAP.md
    • start a fresh session
    • trim injected workspace files (AGENTS.md, SOUL.md, USER.md, TOOLS.md, etc.)
  3. Should I delete BOOTSTRAP.md?
    Yes, or at least move it aside.
    Bootstrapping docs say it is one-time only and should be removed after completion. Seeing “1 bootstrapping, 0 active” is not normal. I’m pretty suspicious that you’re stuck in first-run mode and paying for that extra context/state every time.

#
  1. Minimum viable local setup that actually works?
    Honest answer, not a CPU-only 7B VPS.
    For OpenClaw, the practical floor is a real GPU box, and even then local can still be fragile with large prompts. The docs are pretty blunt here.

What I would do next, in order:

mv ~/.openclaw/workspace/BOOTSTRAP.md ~/.openclaw/workspace/BOOTSTRAP.done.md
openclaw gateway restart

Then start a brand-new session and re-test.

Why:

  • BOOTSTRAP.md is part of injected workspace/bootstrap context
  • if the agent is still marked bootstrapping, it can keep dragging first-run ritual state around
  • docs say BOOTSTRAP.md should disappear after setup anyway
#

So yes, the lingering bootstrap file can contribute to prompt size, and more importantly, it strongly suggests the agent never left bootstrap mode cleanly.

My honest read on your setup:

  • qwen2.5:7b
  • ~26k prompt tokens
  • 12 vCPU
  • no GPU

That is below the practical floor for normal OpenClaw agent turns. It may work for tiny prompts, but not reliably for real agent context.

Best path today:

  • if you want OpenClaw to be usable now, use a hosted primary model
  • if you want local, move to a GPU-backed setup and keep sessions/prompt size much smaller

Docs: