# Is it possible to run 2 models at the same time for OpenClaw?
I have been struggling to find a single model for both image processing and coding, but I can't find one; the candidates are too heavy for my 4GB GPU. I do have 32GB of CPU RAM and want to try running both models for OpenClaw. How can I do it, if it's possible?
@stable yoke
Yes, it’s possible — but with a 4GB GPU I’d treat this as an experiment, not a comfy daily-driver coding setup.
OpenClaw can use separate model defaults:
- agents.defaults.model = main chat/coding model
- agents.defaults.imageModel = model used when the main model can’t handle images
So you don’t need one giant “does everything” model. You can run a lighter local text/coding model, and separately configure a vision-capable local/proxied model for image inputs.
Practical path:
- Use LM Studio or Ollama first — easiest local model servers.
- Start one local OpenAI-compatible server endpoint.
- Add the local provider to OpenClaw.
- Set your text model as agents.defaults.model.primary.
- Set your vision model as agents.defaults.imageModel.primary.
- For custom local vision models, make sure the model entry says input: ["text", "image"], otherwise OpenClaw won’t know it can receive images.
Generic shape, with placeholders:
{
agents: {
defaults: {
model: { primary: "local/your-coding-model" },
imageModel: { primary: "local/your-vision-model" }
}
},
models: {
mode: "merge",
providers: {
local: {
baseUrl: "http://127.0.0.1:1234/v1",
apiKey: "local",
api: "openai-completions",
timeoutSeconds: 300,
models: [
{
id: "your-coding-model",
name: "Local coding model",
input: ["text"],
cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 }
},
{
id: "your-vision-model",
name: "Local vision model",
input: ["text", "image"],
cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 }
}
]
}
}
}
}
Two important reality checks:
- Running both loaded at once may eat RAM fast. With 32GB CPU RAM it may work, but CPU inference will be slow, especially vision.
- Small/local models often struggle with OpenClaw tool use and coding-agent workflows. If reliability matters, a hybrid setup is better: local for privacy/cheap experiments, hosted fallback for hard coding/image tasks.
Docs that match this setup:
https://docs.openclaw.ai/gateway/local-models
https://docs.openclaw.ai/concepts/models
https://docs.openclaw.ai/providers/ollama
https://docs.openclaw.ai/providers/lmstudio
For exact current model recommendations, ask in #models — that changes too fast for me to safely name-drop models here. If you want config help, paste openclaw --version plus whether you’re using Ollama or LM Studio, and I’ll help shape it. 🦞
How can I use a Whisper model on OpenClaw?
For Whisper, don’t set it as your main agents.defaults.model. OpenClaw treats speech-to-text separately under tools.media.audio.
You have 3 decent options:
Option A — easiest local: install a Whisper CLI and let OpenClaw auto-detect it
OpenClaw auto-detects local audio tools if audio isn’t disabled, including:
- whisper-cli from whisper.cpp
- whisper Python CLI
- sherpa-onnx-offline
So if whisper or whisper-cli is on the Gateway service PATH, voice/audio attachments should be transcribed automatically.
Option B — explicitly configure local Whisper CLI
Example using Python Whisper:
{
tools: {
media: {
audio: {
enabled: true,
models: [
{
type: "cli",
command: "whisper",
args: ["--model", "base", "{{MediaPath}}"],
timeoutSeconds: 120,
},
],
},
},
},
}
For your hardware, start with tiny or base; small may be okay on CPU but slower. 4GB GPU is tight, so CPU Whisper is often less painful.
Option C — provider Whisper / hosted transcription
If you have provider auth, you can use something like OpenAI or Groq:
{
tools: {
media: {
audio: {
enabled: true,
models: [{ provider: "groq", model: "whisper-large-v3-turbo" }],
},
},
},
}
Or OpenAI:
{
tools: {
media: {
audio: {
enabled: true,
models: [{ provider: "openai", model: "gpt-4o-mini-transcribe" }],
},
},
},
}
To test a file manually:
openclaw infer audio transcribe --file ./memo.m4a --json
openclaw infer audio transcribe --file ./memo.m4a --model openai/whisper-1 --json
https://docs.openclaw.ai/cli/infer#audio
If you want exact commands for your install, paste openclaw --version and tell me whether you’re on Linux/macOS + whether you prefer whisper.cpp or Python Whisper. 🦞
14:03:54 [ws] ⇄ res ✓ node.list 16489ms conn=b52871e5…c5f4 id=92e92c77…4a53
14:03:54 [ws] ⇄ res ✓ device.pair.list 16492ms conn=b52871e5…c5f4 id=15615ab5…8c8f
14:03:54 [plugins] embedded acpx runtime backend ready
14:03:55 [discord] client initialized as 1484564009775534170 (Ralsei); awaiting gateway readiness
14:04:19 [ws] ⇄ res ✓ node.list 11035ms conn=b52871e5…c5f4 id=ebe744bb…38cf
14:06:08 [agent/embedded] embedded run failover decision: runId=d89b1b91-6bdb-4deb-8079-37c3f69742ed stage=assistant decision=surface_error reason=timeout from=custom-localhost-8080/planner profile=-
That log means OpenClaw reached your local model provider, but the assistant call timed out:
reason=timeout from=custom-localhost-8080/planner
So this is probably not Discord failing — it’s your local model/server being too slow or not returning a compatible streaming response in time.
First thing: what version are you on?
openclaw --version
Then check these:
openclaw models status
openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json
If that probe times out too, fix the local provider side:
- use a smaller/faster model
- increase the provider timeout, e.g. in your custom provider config:
models: {
providers: {
"custom-localhost-8080": {
timeoutSeconds: 300
}
}
}
- make sure the provider api matches your server:
  - most local /v1/chat/completions servers: openai-completions
  - LM Studio Responses API: openai-responses
- if this is CPU inference, expect slow first-token latency; 4GB GPU + 32GB RAM can work, but coding/planner models may be painfully slow.
Also, those node.list / device.pair.list calls taking 11–16s are suspiciously slow. Not necessarily fatal, but if everything feels laggy, run:
openclaw status --all
Docs: https://docs.openclaw.ai/gateway/local-models and https://docs.openclaw.ai/help/faq
OpenClaw 2026.4.23 (a979721)
Thanks — 2026.4.23 is recent enough that the local-provider config path I mentioned should apply.
For that specific timeout, I’d debug it in this order:
- Test the local model outside the full agent loop:
openclaw infer model run \
--model custom-localhost-8080/planner \
--prompt "Say hello in one short sentence" \
--gateway \
--json
- If that times out, increase the provider timeout and/or use a smaller model:
{
models: {
providers: {
"custom-localhost-8080": {
timeoutSeconds: 300
}
}
}
}
- Check the provider API mode. If your server exposes /v1/chat/completions, it should usually be:
api: "openai-completions"
If it’s LM Studio using /v1/responses, use:
api: "openai-responses"
- Restart gateway after config changes:
openclaw gateway restart
- Then re-test:
openclaw models status
openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json
If you paste your models.providers.custom-localhost-8080 config block — redact any real API key — I can sanity-check the exact fields. The timeout strongly suggests either the model is too slow on CPU/4GB GPU, or the API mode/streaming compatibility is mismatched.
| Version | 2026.4.23 |
| OS | windows 10.0.26200 (x64) |
| Node | 24.14.0 |
| Config | ~.openclaw\openclaw.json |
| Gateway service | Scheduled Task installed · registered · running (pid 17392, Startup-folder login item installed; listener detected on port 18789.) |
| Node service | Scheduled Task not installed |
| Agents | 1 total · 0 bootstrapping · 1 active · 1 sessions |
| Secrets | none |
Channels
| Channel | Enabled | State | Detail |
| Discord | ON | OK | token config (sha256:786d2fb3 · len 72) · accounts 1/1
Discord accounts
| Account | Status | Notes |
| default | OK | token:config |
Agents
| Agent | Bootstrap file | Sessions | Active | Store |
| main | ABSENT | 1 | 3m ago | ~.openclaw\agents\main\sessions\sessions.json |
Good, gateway + Discord look healthy. The important missing piece is model/provider status — this paste doesn’t show whether custom-localhost-8080/planner is configured/ready.
Please run these next:
openclaw models status
openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json
If the second command times out, the local model server is the bottleneck. On Windows + 4GB GPU, that’s very plausible, especially if the model is mostly CPU/RAM loaded.
A few Windows-specific things to check:
- Make sure your local model server is already running before OpenClaw uses it.
- Open http://127.0.0.1:8080/v1/models in a browser, or test it with PowerShell, to confirm the server responds quickly.
- If your OpenClaw Gateway runs as a Scheduled Task, it may not see the same PATH/env as your terminal. This matters for Whisper CLI especially.
- Secrets: none means no provider API keys are configured, so hosted fallback won’t work unless you add auth. That’s okay for local-only, but less reliable.
For the timeout, add/increase this in your provider block:
{
models: {
providers: {
"custom-localhost-8080": {
baseUrl: "http://127.0.0.1:8080/v1",
apiKey: "local",
api: "openai-completions",
timeoutSeconds: 300
}
}
}
}
Then:
openclaw gateway restart
If you paste your models.providers.custom-localhost-8080 block, I can check whether the model id / api mode / timeout are right. Redact any real key, though if it’s local it can just be local.
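To put a number on "responds quickly", the /v1/models check can also be timed from a small script. This is a minimal Python sketch, not an OpenClaw tool: the function names are made up for illustration, and it assumes the server behind the base URL exposes the usual OpenAI-compatible GET /models route.

```python
import time
import urllib.error
import urllib.request


def models_url(base_url: str) -> str:
    """Build the models-list URL from a provider baseUrl such as http://127.0.0.1:8080/v1."""
    return base_url.rstrip("/") + "/models"


def time_models_endpoint(base_url: str, timeout: float = 30.0) -> tuple[bool, float, str]:
    """Send one GET to the models endpoint and return (ok, seconds, detail)."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
            resp.read()  # include the body transfer in the measurement
            ok = 200 <= resp.status < 300
            detail = f"HTTP {resp.status}"
    except (urllib.error.URLError, OSError) as exc:
        # Covers connection refused, DNS problems, HTTP errors and socket timeouts.
        ok, detail = False, f"failed: {exc}"
    return ok, time.perf_counter() - start, detail
```

Calling time_models_endpoint("http://127.0.0.1:8080/v1") once per local server gives a baseline that does not involve OpenClaw at all. If even this listing call takes many seconds, the slowness is on the server side.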
"models": {
"mode": "merge",
"providers": {
"custom-localhost-8080": {
"baseUrl": "http://localhost:8080/v1",
"api": "openai-completions",
"models": [
{
"id": "planner",
"name": "LFM2-8B Planner",
"contextWindow": 64000,
"maxTokens": 4096,
"input": ["text"],
"cost": {
"input": 0,
"output": 0,
"cacheRead": 0,
"cacheWrite": 0
},
"reasoning": true
}
]
},
"custom-localhost-8082": {
"baseUrl": "http://localhost:8082/v1",
"api": "openai-completions",
"models": [
{
"id": "coder",
"name": "Qwen3.5",
"contextWindow": 16384,
"maxTokens": 4096,
"input": ["text"],
"cost": {
"input": 0,
"output": 0,
"cacheRead": 0,
"cacheWrite": 0
},
"reasoning": true
}
]
},
"custom-localhost-8081": {
"baseUrl": "http://localhost:8081/v1",
"api": "openai-completions",
"models": [
{
"id": "vision",
"name": "LFM2.5-VL Vision",
"contextWindow": 8192,
"maxTokens": 2048,
"input": ["text", "image"],
"cost": {
"input": 0,
"output": 0,
"cacheRead": 0,
"cacheWrite": 0
},
"reasoning": false
}
]
},
"custom-localhost-8083": {
"baseUrl": "http://localhost:8083/v1",
"api": "openai-completions",
"models": [
{
"id": "sound",
"name": "Sound Model",
"contextWindow": 8192,
"maxTokens": 2048,
"input": ["text"],
"cost": {
"input": 0,
{
"agents": {
"defaults": {
"workspace": "C:\Users\User\.openclaw\workspace",
"model": {
"primary": "custom-localhost-8080/planner"
},
"models": {
"custom-localhost-8080/planner": {
"alias": "Planner"
},
"custom-localhost-8082/coder": {
"alias": "Coder"
},
"custom-localhost-8081/vision": {
"alias": "Vision"
},
"custom-localhost-8083/sound": {
"alias": "Sound"
}
}
}
},
Yep — this config explains a few things.
Main issues I see:
- No apiKey / local marker on the custom providers
For local OpenAI-compatible providers, add a placeholder key. It doesn’t need to be secret for localhost:
apiKey: "local"
Add that to each custom-localhost-* provider.
- No timeoutSeconds
Your planner timeout is likely just slow local inference. Add this at provider level:
timeoutSeconds: 300
- Your sound provider is not enough for Whisper/audio transcription
Putting sound under models.providers makes it a text model provider. OpenClaw audio transcription uses tools.media.audio, not agents.defaults.model.
If you want local Whisper, configure it like this instead:
{
tools: {
media: {
audio: {
enabled: true,
models: [
{
type: "cli",
command: "whisper",
args: ["--model", "base", "{{MediaPath}}"],
timeoutSeconds: 120
}
]
}
}
}
}
Or if you use whisper.cpp:
{
tools: {
media: {
audio: {
enabled: true,
models: [
{
type: "cli",
command: "whisper-cli",
args: ["--model", "C:\\path\\to\\ggml-base.bin", "{{MediaPath}}"],
timeoutSeconds: 120
}
]
}
}
}
}
On Windows Scheduled Task, use the full path to whisper.exe / whisper-cli.exe if auto-detection fails, because the gateway may not have the same PATH as your terminal.
- For image model routing, use imageModel
Your vision provider entry looks basically right. Make sure your defaults include something like:
agents: {
defaults: {
model: { primary: "custom-localhost-8080/planner" },
imageModel: { primary: "custom-localhost-8081/vision" }
}
}
- Don’t expect planner + coder to both run automatically
OpenClaw won’t automatically split every request between planner and coder just because both providers exist. It will use the selected primary model unless you set up routing/agents or manually switch models.
I’d simplify first: run only planner + vision + Whisper CLI, get that stable, then add coder later. With 4GB GPU, four local servers at once is very likely to cause timeouts or memory thrash.
This part is mostly okay, but it only does model selection/allowlisting. It does not wire vision/audio routing yet.
I’d change it to this shape:
{
agents: {
defaults: {
workspace: "C:\\Users\\User\\.openclaw\\workspace",
model: {
primary: "custom-localhost-8080/planner"
},
imageModel: {
primary: "custom-localhost-8081/vision"
},
models: {
"custom-localhost-8080/planner": { alias: "Planner" },
"custom-localhost-8082/coder": { alias: "Coder" },
"custom-localhost-8081/vision": { alias: "Vision" }
}
}
},
tools: {
media: {
audio: {
enabled: true,
models: [
{
type: "cli",
command: "C:\\path\\to\\whisper.exe",
args: ["--model", "base", "{{MediaPath}}"],
timeoutSeconds: 120
}
]
}
}
}
}
Then in your models.providers blocks, add apiKey: "local" and timeoutSeconds: 300 to each localhost provider, especially planner/vision:
"custom-localhost-8080": {
baseUrl: "http://localhost:8080/v1",
apiKey: "local",
api: "openai-completions",
timeoutSeconds: 300,
models: [ ... ]
}
Important bits:
- model.primary = normal assistant/planner model
- imageModel.primary = image-capable fallback when images are attached
- Whisper/audio does not go in agents.defaults.models; it goes under tools.media.audio
- I’d remove custom-localhost-8083/sound from the allowlist unless it’s actually a text/chat model. It won’t be used as Whisper just because it’s named Sound.
- Your pasted JSON looks cut off at the end, so make sure the final file has all closing braces and commas valid.
After editing:
openclaw gateway restart
openclaw models status
openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json
If you tell me what Whisper binary you installed — Python whisper, whisper.cpp whisper-cli, or something else — I can give you the exact Windows command line.
i use whisper-cli server
Ah — if it’s a Whisper server, the key question is: does it expose an OpenAI-compatible endpoint?
OpenClaw’s provider audio path expects:
POST <baseUrl>/audio/transcriptions
So if your server has something like:
http://localhost:8083/v1/audio/transcriptions
then configure it under tools.media.audio, not as agents.defaults.models:
{
tools: {
media: {
audio: {
enabled: true,
models: [
{
provider: "openai",
model: "whisper-1",
baseUrl: "http://localhost:8083/v1",
apiKey: "local",
timeoutSeconds: 120
}
]
}
}
}
}
Then test:
openclaw gateway restart
openclaw infer audio transcribe --file .\test.wav --json
But if your whisper-cli server is the whisper.cpp server that only exposes /inference or some non-OpenAI endpoint, OpenClaw probably can’t call it directly as a provider. In that case use one of these instead:
Simplest: call the binary directly:
{
tools: {
media: {
audio: {
enabled: true,
models: [
{
type: "cli",
command: "C:\\full\\path\\to\\whisper-cli.exe",
args: ["-m", "C:\\full\\path\\to\\ggml-base.bin", "-f", "{{MediaPath}}"],
timeoutSeconds: 120
}
]
}
}
}
}
More advanced: run a tiny proxy that converts OpenAI /v1/audio/transcriptions requests into your server’s endpoint.
Quick check: open this in browser or curl it:
http://localhost:8083/v1/audio/transcriptions
If it 404s, it’s not OpenAI-compatible and you’ll want the CLI config/proxy route.
here is my own config for whisper:
"tools": {
"profile": "coding",
"web": {
"search": {
"provider": "gemini",
"enabled": true
}
},
"media": {
"audio": {
"enabled": true,
"models": [
{
"type": "cli",
"command": "whisper",
"args": [
"--model",
"base",
"{{MediaPath}}",
"--output_format",
"txt"
],
"timeoutSeconds": 120
}
]
}
}
},
That config is close, but for the Python whisper CLI the output behavior is the likely problem.
whisper normally writes transcript files to disk and prints progress/logging, not always clean transcript text to stdout. OpenClaw’s CLI audio transcription wants the command’s stdout to contain the transcript.
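One thing to try is adding --output_dir - (a dash as the directory), but that only helps if your Whisper CLI treats the dash as stdout. The stock openai-whisper CLI reads --output_dir as a directory path, so there it will most likely just create a folder named - and leave stdout unchanged: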
"tools": {
"profile": "coding",
"web": {
"search": {
"provider": "gemini",
"enabled": true
}
},
"media": {
"audio": {
"enabled": true,
"models": [
{
"type": "cli",
"command": "whisper",
"args": [
"--model", "base",
"--output_format", "txt",
"--output_dir", "-",
"{{MediaPath}}"
],
"timeoutSeconds": 120
}
]
}
}
}
If that doesn’t produce clean stdout, use a wrapper script instead. Example idea:
python -m whisper "C:\path\audio.wav" --model base --output_format txt --output_dir C:\temp\whisper-out
Get-Content C:\temp\whisper-out\audio.txt
Then point OpenClaw’s command at that wrapper so stdout is only the transcript.
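The wrapper idea can also be sketched in Python, which avoids depending on PowerShell being available to the Gateway. This is a rough sketch under assumptions: the script name whisper_stdout.py is hypothetical, it assumes the openai-whisper package is installed for the same Python interpreter, and it relies on that CLI writing a .txt file named after the audio file's base name into --output_dir.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def transcript_path(audio_path: str, out_dir: str) -> Path:
    """Where the whisper CLI is expected to write the txt output for this audio file."""
    return Path(out_dir) / (Path(audio_path).stem + ".txt")


def transcribe(audio_path: str, model: str = "base") -> str:
    """Run the Python whisper CLI and return only the transcript text."""
    with tempfile.TemporaryDirectory() as out_dir:
        subprocess.run(
            [sys.executable, "-m", "whisper", audio_path,
             "--model", model, "--output_format", "txt", "--output_dir", out_dir],
            check=True,                 # raise if whisper exits with an error
            stdout=subprocess.DEVNULL,  # drop whisper's own progress/segment output
        )
        return transcript_path(audio_path, out_dir).read_text(encoding="utf-8").strip()


# Script entry point: python whisper_stdout.py <audio file>
if __name__ == "__main__" and len(sys.argv) == 2:
    print(transcribe(sys.argv[1]))
```

With something like this in place, the cli entry would presumably use the full path to python.exe as command and pass the script path plus {{MediaPath}} as args; treat that wiring as an assumption and confirm it with openclaw infer audio transcribe.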
Also Windows Scheduled Task warning: if command: "whisper" works in PowerShell but not in OpenClaw, use the full path to whisper.exe or python.exe, because the Gateway may not inherit your terminal PATH.
Test with:
openclaw infer audio transcribe --file .\test.wav --json
If that fails, paste the exact error from that command/logs and we can tighten it.
I don't think my Whisper runner was Python; it was the cpp one.
Gotcha — then change command: "whisper". That is usually the Python Whisper command name. For whisper.cpp it’s usually whisper-cli.exe — older builds may be main.exe.
First check what Windows sees:
where whisper
where whisper-cli
where main
For whisper.cpp, your OpenClaw config should look more like this, using full paths:
{
tools: {
media: {
audio: {
enabled: true,
models: [
{
type: "cli",
command: "C:\\path\\to\\whisper-cli.exe",
args: [
"-m", "C:\\path\\to\\ggml-base.bin",
"-f", "{{MediaPath}}",
"--no-timestamps"
],
timeoutSeconds: 120
}
]
}
}
}
}
If your binary is the older main.exe, use that as command instead.
Test the exact command manually first:
C:\path\to\whisper-cli.exe -m C:\path\to\ggml-base.bin -f .\test.wav --no-timestamps
If that prints the transcript to the terminal, OpenClaw can use it. Then test OpenClaw:
openclaw infer audio transcribe --file .\test.wav --json
Small note: if you were running a whisper.cpp server, that’s different from whisper-cli. OpenClaw’s easiest path is the CLI binary. The server only works directly if it exposes OpenAI-compatible /v1/audio/transcriptions.
Why does my OpenClaw take so much time to start, and how can I increase the timeout for the model servers?
Two separate timeouts matter here:
1. Model HTTP timeout — use this first for slow local servers:
models: {
providers: {
"custom-localhost-8080": {
baseUrl: "http://127.0.0.1:8080/v1",
apiKey: "local",
api: "openai-completions",
timeoutSeconds: 300,
models: [ ... ]
},
"custom-localhost-8081": {
baseUrl: "http://127.0.0.1:8081/v1",
apiKey: "local",
api: "openai-completions",
timeoutSeconds: 300,
models: [ ... ]
}
}
}
Docs specifically recommend models.providers.<id>.timeoutSeconds for slow local/remote model servers before raising the whole agent timeout: https://docs.openclaw.ai/gateway/local-models
2. Whole agent/run timeout — only raise this if the full assistant run still gets killed after provider timeout is higher:
agents: {
defaults: {
timeoutSeconds: 600,
model: { primary: "custom-localhost-8080/planner" },
imageModel: { primary: "custom-localhost-8081/vision" }
}
}
Then restart:
openclaw gateway restart
Why startup/first reply is slow in your setup:
- You’re running multiple local model servers on a 4GB GPU + CPU RAM; cold-loading models can take a long time.
- Four servers at once can fight for RAM/VRAM and make everything slower.
- localhost on Windows can sometimes try IPv6 first; use 127.0.0.1 in baseUrl to avoid that variable.
- First request often loads/warms up the model. Keep only the models you actually need running.
My recommendation: start with one text model + one vision model + whisper CLI, with timeoutSeconds: 300, and remove the extra coder/sound servers until the basic loop is stable. 🦞
It is the same whether I run 1 or 4 models.
If it’s the same with 1 model or 4, then the delay probably isn’t VRAM contention. It’s more likely one of these:
- The local model server itself has slow first-token / request handling
Test it outside OpenClaw and time it:
Measure-Command { openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json }
Also test the server directly if it’s OpenAI-compatible:
Invoke-RestMethod http://127.0.0.1:8080/v1/models
If /v1/models is slow, OpenClaw can’t fix that — the server is slow to respond.
- localhost resolution weirdness on Windows
Change all local baseUrls from:
http://localhost:8080/v1
to:
http://127.0.0.1:8080/v1
This avoids IPv6 / resolver weirdness.
- Gateway startup vs assistant response timeout are different
OpenClaw does not normally load your LLM at gateway startup. The slow part in your earlier log was during an assistant run:
stage=assistant reason=timeout from=custom-localhost-8080/planner
So increasing startup timeout won’t help if the model request itself hangs.
- Your local server may not stream in the shape OpenClaw expects
Try non-stream/simple probe:
openclaw infer model run --model custom-localhost-8080/planner --prompt "Return exactly: ok" --gateway --json
If this hangs or returns weird JSON/errors, the issue is provider compatibility or the local server.
- Your node.list calls were also very slow
Earlier you had 11–16s for node/device calls. That suggests general gateway/system slowness too, not just models. Check:
openclaw status --deep
openclaw logs --tail 200
I’d make these exact config tweaks first:
"custom-localhost-8080": {
"baseUrl": "http://127.0.0.1:8080/v1",
"apiKey": "local",
"api": "openai-completions",
"timeoutSeconds": 300,
"models": [ ... ]
}
Do that for each localhost provider, restart gateway, then run the infer model run probe. If you paste that probe’s output/timing, we can tell whether it’s OpenClaw timeout, server compatibility, or raw model latency.
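Applying the same three tweaks to every localhost provider by hand is easy to get wrong, so here is a small Python sketch of the transformation. It is illustrative only: harden_local_providers is a made-up helper, it works on a dict shaped like the models section pasted in this thread, and it does not read or write openclaw.json itself.

```python
import copy
from urllib.parse import urlsplit, urlunsplit

LOCAL_HOSTS = {"localhost", "127.0.0.1"}


def harden_local_providers(models_cfg: dict, timeout_seconds: int = 300) -> dict:
    """Return a copy where each local provider uses 127.0.0.1, an apiKey and a timeout."""
    result = copy.deepcopy(models_cfg)  # never mutate the caller's config
    for provider in result.get("providers", {}).values():
        parts = urlsplit(provider.get("baseUrl", ""))
        if parts.hostname not in LOCAL_HOSTS:
            continue  # leave hosted providers untouched
        if parts.hostname == "localhost":
            netloc = "127.0.0.1" + (f":{parts.port}" if parts.port else "")
            provider["baseUrl"] = urlunsplit(parts._replace(netloc=netloc))
        # setdefault keeps any value that is already configured.
        provider.setdefault("apiKey", "local")
        provider.setdefault("timeoutSeconds", timeout_seconds)
    return result
```

The result can be dumped with json.dumps(..., indent=2) and merged back by hand. Running openclaw config validate before restarting the gateway is still the real check.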
Why does OpenClaw repeat this twice in the log:
14:25:09 [ws] ⇄ res ✓ commands.list 746ms conn=ed584b2d…6a43 id=c3622af0…f116
14:25:09 [ws] ⇄ res ✓ chat.history 759ms conn=ed584b2d…6a43 id=23005b0b…0327
14:25:09 [ws] ⇄ res ✓ models.list 761ms conn=ed584b2d…6a43 id=ecb5501a…b150
14:25:09 [ws] ⇄ res ✓ node.list 788ms conn=ed584b2d…6a43 id=604bd909…fa71
14:25:09 [ws] ⇄ res ✓ device.pair.list 789ms conn=ed584b2d…6a43 id=c06b5b46…9ff1
14:26:02 [ws] ⇄ res ✓ channels.status 130ms conn=558050ab…3a83 id=83591a49…b0c0
14:27:12 [ws] ⇄ res ✓ health 870ms conn=fb45220e…b45b id=a0da69ac…8976
That log isn’t OpenClaw “starting twice” or running the model twice. Those are WebSocket RPC requests from a client UI / control surface asking the Gateway for state.
This batch:
commands.list
chat.history
models.list
node.list
device.pair.list
usually happens when the Dashboard / chat UI opens or refreshes. It loads available commands, chat history, model picker data, nodes, and device pairing state.
The later lines are different connections:
conn=558050ab… channels.status
conn=fb45220e… health
Different conn= IDs means different websocket connections — commonly:
- another browser tab
- dashboard reconnect/refresh
- status page / health check
- CLI command asking gateway status
- mobile/control UI reconnecting
So this is normal by itself. The times shown there are also not terrible now: ~700–800ms for the first batch, 130ms for channel status, 870ms health.
If you really see the exact same batch duplicated repeatedly every few seconds, check for:
- multiple Dashboard tabs open
- browser auto-refresh / extension reconnecting
- both startup-folder login item and Scheduled Task launching UI/helper pieces
- a CLI/status command running in a loop
But this log snippet alone doesn’t explain the model timeout. For model slowness, the line to focus on is still the earlier one like:
stage=assistant reason=timeout from=custom-localhost-8080/planner
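To see at a glance which requests belong to which connection, the [ws] lines can be grouped by their conn= value. The following is a small Python sketch; the regular expression is an assumption based only on the log lines pasted in this thread, so it may need adjusting for other line shapes.

```python
import re
from collections import defaultdict

# Matches e.g. "... [ws] ⇄ res ✓ node.list 788ms conn=ed584b2d…6a43 id=..."
WS_LINE = re.compile(r"\[ws\].*?\s(?P<method>[\w.]+)\s+(?P<ms>\d+)ms\s+conn=(?P<conn>\S+)")

SAMPLE = """\
14:25:09 [ws] ⇄ res ✓ commands.list 746ms conn=ed584b2d…6a43 id=c3622af0…f116
14:25:09 [ws] ⇄ res ✓ chat.history 759ms conn=ed584b2d…6a43 id=23005b0b…0327
14:26:02 [ws] ⇄ res ✓ channels.status 130ms conn=558050ab…3a83 id=83591a49…b0c0
14:27:12 [ws] ⇄ res ✓ health 870ms conn=fb45220e…b45b id=a0da69ac…8976
"""


def group_by_connection(log_text: str) -> dict[str, list[tuple[str, int]]]:
    """Map each conn id to the (method, milliseconds) calls seen on it, in log order."""
    groups = defaultdict(list)
    for line in log_text.splitlines():
        match = WS_LINE.search(line)
        if match:  # silently skip non-[ws] lines
            groups[match["conn"]].append((match["method"], int(match["ms"])))
    return dict(groups)


for conn, calls in group_by_connection(SAMPLE).items():
    print(conn, calls)
```

If the same batch of methods shows up under many different conn ids within seconds, that points at several clients (tabs, reconnects, status loops) rather than one client repeating itself.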
It's so slow that the Agents section hasn't loaded yet.
If the Agents section in the UI never finishes loading, that’s probably not your model server directly — it’s the Dashboard waiting on gateway state, sessions, nodes/devices, or a stuck active run.
Try this quick isolation path:
openclaw status --deep
openclaw logs --tail 200
openclaw sessions list
Then restart only the gateway:
openclaw gateway restart
After restart, open one Dashboard tab only. Multiple tabs can create repeated websocket requests and make the UI look noisier.
A few likely causes:
- Stuck active assistant run from the timed-out local model. Restarting gateway usually clears the immediate wedged UI state.
- Session store/history slow. If openclaw sessions list hangs or errors, the Agents panel may wait on session metadata.
- Node/device checks slow. You don’t have Node service installed, so if node/device calls are slow or failing, the UI can feel stuck while it waits.
- Browser cache / websocket state wedged. Try hard refresh or another browser.
- Scheduled Task + startup login item weirdness. Your status said both Scheduled Task and Startup-folder login item are present. That can be okay, but if two gateway-ish things are fighting, it can cause odd behavior. openclaw gateway status should show one clear running gateway.
If you paste the last ~100 lines after opening the Dashboard and waiting for Agents to load, I can point at the exact request that’s blocking. The key is whether you see an error/timeout after agents/sessions/chat.history, or whether the UI just never asks for it.
System: [2026-05-11 14:24:54 GMT+7] Config recovery warning: OpenClaw restored openclaw.json from the last-known-good backup during reload (reload-invalid-config). The rejected config was invalid and was preserved as a timestamped .clobbered.* file. Do not write openclaw.json again unless you validate the full config first.
[Mon 2026-05-11 14:29 GMT+7] hi
Aha — that warning is the real clue.
OpenClaw is slow / Agents isn’t loading because your config edit was invalid, so Gateway restored the last-known-good config and preserved your broken edit as an openclaw.json.clobbered.* file.
That usually happens when you paste a partial block like:
"tools": { ... }
without wrapping it into the full existing openclaw.json, or there’s a missing comma/brace.
Do this first:
openclaw config validate
openclaw doctor
Then inspect the rejected/clobbered config:
$config = openclaw config file
Get-ChildItem "$config.clobbered.*", "$config.rejected.*" -ErrorAction SilentlyContinue | Sort-Object LastWriteTime -Descending | Select-Object -First 5
Do not paste the whole clobbered file back over openclaw.json. Copy only the intended sections into the active config, then validate:
openclaw config validate
openclaw gateway restart
For your case, the intended pieces are:
- agents.defaults.imageModel
- models.providers.*.apiKey + timeoutSeconds
- tools.media.audio for whisper.cpp
The warning specifically says “Do not write openclaw.json again unless you validate the full config first” — so yeah, claws carefully here. 🦞
Docs: https://docs.openclaw.ai/gateway/troubleshooting#gateway-rejected-invalid-config
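A quick way to catch the "pasted a fragment" mistake before touching the live file is to check that the text parses as one complete object. This Python sketch uses strict JSON only, which is an assumption on my part: if your config relies on JSON5 features such as comments or unquoted keys, it will reject text that OpenClaw might accept, so openclaw config validate stays the authoritative check. The function name is hypothetical.

```python
import json


def check_full_config(text: str) -> tuple[bool, str]:
    """Return (ok, message) for whether text is one complete strict-JSON object."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as exc:
        # Fragments, missing braces and trailing commas all end up here.
        return False, f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    if not isinstance(parsed, dict):
        return False, "top level must be a { ... } object, not a bare value"
    return True, "parses as one JSON object"


print(check_full_config('"tools": { "profile": "coding" }'))      # bare fragment
print(check_full_config('{ "tools": { "profile": "coding" } }'))  # wrapped object
```

The first call reports a parse error because the key is not wrapped in an outer object; the second passes. A trailing comma before the final closing brace fails the same way.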
14:32:48 [agent/embedded] [llm-idle-timeout] custom-localhost-8080/planner produced no reply before the idle watchdog; retrying same model
14:33:02 [ws] ⇄ res ✓ node.list 54ms conn=58768958…2f30 id=c95613c9…ac1b
That line means the model connection opened, but no chunks/tokens came back before OpenClaw’s LLM idle watchdog fired:
[llm-idle-timeout] custom-localhost-8080/planner produced no reply before the idle watchdog
So the fix is specifically models.providers.<id>.timeoutSeconds, not just agents.defaults.timeoutSeconds.
For your planner provider:
"models": {
"mode": "merge",
"providers": {
"custom-localhost-8080": {
"baseUrl": "http://127.0.0.1:8080/v1",
"apiKey": "local",
"api": "openai-completions",
"timeoutSeconds": 300,
"models": [
{
"id": "planner",
"name": "LFM2-8B Planner",
"contextWindow": 64000,
"maxTokens": 4096,
"input": ["text"],
"cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
"reasoning": true
}
]
}
}
}
Then:
openclaw config validate
openclaw gateway restart
Docs say this provider timeout controls the model HTTP fetch and the model stream idle watchdog for slow local/self-hosted providers: https://docs.openclaw.ai/concepts/agent-loop#timeouts
If it still idles out after timeoutSeconds: 300, your server likely accepts the request but doesn’t stream/return anything for 5+ minutes. Then I’d test the backend directly with a tiny request and reduce max output/context, because OpenClaw is just waiting for the first token at that point.
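For the direct backend test, a tiny streaming request shows how long the server takes before it sends anything back. This is a minimal Python sketch under assumptions: it targets the standard OpenAI-compatible POST /chat/completions route, the helper names are made up, and "planner" is simply the model id from the config in this thread. The first streamed line may be a role-only chunk rather than a real token, so read the number as time to first response.

```python
import json
import time
import urllib.request


def build_probe(model_id: str, max_tokens: int = 8) -> dict:
    """Smallest useful chat request: one short prompt, few output tokens, streaming on."""
    return {
        "model": model_id,
        "messages": [{"role": "user", "content": "Return exactly: ok"}],
        "max_tokens": max_tokens,
        "stream": True,
    }


def seconds_to_first_chunk(base_url: str, model_id: str, timeout: float = 300.0) -> float:
    """POST the probe and return seconds until the first non-empty streamed line."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(build_probe(model_id)).encode("utf-8"),  # data present => POST
        headers={"Content-Type": "application/json", "Authorization": "Bearer local"},
    )
    start = time.perf_counter()
    # timeout applies to each socket operation, not to the whole request.
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        for raw_line in resp:
            if raw_line.strip():  # e.g. b'data: {...}'
                break
    return time.perf_counter() - start
```

Calling seconds_to_first_chunk("http://127.0.0.1:8080/v1", "planner") while OpenClaw is idle separates raw model latency from anything the agent loop adds, such as a much longer prompt.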
how can i send the config while file upload is disabled
If file upload is disabled, send it as text — just don’t paste secrets.
Best options:
- Paste only the relevant blocks
Usually enough:
agents: { ... }
models: { ... }
tools: { ... }
Redact anything like:
apiKey: "REDACTED"
token: "REDACTED"
- Use Discord code blocks
Split it into multiple messages if it’s too long:
```json5
...config part 1...
```
- Use safe CLI output instead of full config
These are better than pasting the whole file:
openclaw config validate
openclaw models status
openclaw status --all
status --all is designed to redact tokens.
- Print just one config path
For example:
openclaw config get models.providers
openclaw config get agents.defaults
openclaw config get tools.media.audio
Then paste those outputs here.
Avoid pasting the entire openclaw.json if it includes Discord tokens, API keys, gateway tokens, or auth profiles. For your current issue, I only need agents.defaults, models.providers, and tools.media.audio.
i got limited at 2000 characters, the config is over that
No need to send the whole config. Send the small pieces one at a time.
Run these and paste each output in separate messages:
openclaw config get agents.defaults
openclaw config get models.providers.custom-localhost-8080
openclaw config get models.providers.custom-localhost-8081
openclaw config get tools.media.audio
If one block is still too big, get only the important fields:
openclaw config get models.providers.custom-localhost-8080.baseUrl
openclaw config get models.providers.custom-localhost-8080.api
openclaw config get models.providers.custom-localhost-8080.timeoutSeconds
openclaw config get models.providers.custom-localhost-8080.models
For this issue I mainly need to see:
- provider id
- baseUrl
- api
- apiKey present or not (redact value if real)
- timeoutSeconds
- model id
- agents.defaults.model
- agents.defaults.imageModel
- tools.media.audio
You can split across 3–5 Discord messages. Just wrap each in a code block if possible, and redact tokens/API keys.
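If splitting by hand gets tedious, the output can be cut into message-sized pieces on line boundaries. This is a small Python sketch, not an OpenClaw or Discord feature: split_for_chat is a made-up helper, and the default of 1900 characters is an assumption that leaves room under the 2000-character limit for the code-block markers.

```python
def split_for_chat(text: str, limit: int = 1900) -> list[str]:
    """Split text into chunks of at most `limit` characters, breaking between lines."""
    chunks: list[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        # A single line longer than the limit has to be cut mid-line.
        while len(line) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:limit])
            line = line[limit:]
        # Start a new chunk when this line would overflow the current one.
        if len(current) + len(line) > limit:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks
```

Joining the chunks back together gives the original text, so nothing is lost. Remember to redact keys and tokens before splitting, not after.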
"agents": {
"defaults": {
"workspace": "C:\Users\User\.openclaw\workspace",
"model": {
"primary": "custom-localhost-8080/planner"
},
"models": {
"custom-localhost-8080/planner": {
"alias": "Planner"
},
"custom-localhost-8082/coder": {
"alias": "Coder"
},
"custom-localhost-8081/vision": {
"alias": "Vision"
}
}
}
},
"gateway": {
"mode": "local",
"auth": {
"mode": "password",
"password": ""
},
"port": 18789,
"bind": "loopback",
"tailscale": {
"mode": "off",
"resetOnExit": false
}
},
"tools": {
"profile": "coding",
"web": {
"search": {
"provider": "gemini",
"enabled": true
}
},
"media": {
"audio": {
"enabled": true,
"models": [
{
"type": "cli",
"command": "whisper",
"args": [
"--model",
"base",
"{{MediaPath}}",
"--output_format",
"txt"
],
"timeoutSeconds": 120
}
]
}
}
},
"models": {
"mode": "merge",
"providers": {
"custom-localhost-8080": {
"baseUrl": "http://localhost:8080/v1",
"api": "openai-completions",
"models": [
{
"id": "planner",
"name": "LFM2-8B Planner",
"contextWindow": 64000,
"maxTokens": 4096,
"input": ["text"],
"reasoning": true
}
]
},
"custom-localhost-8082": {
"baseUrl": "http://localhost:8082/v1",
"api": "openai-completions",
"models": [
{
"id": "coder",
"name": "Qwen3.5",
"contextWindow": 16384,
"maxTokens": 4096,
"input": ["text"],
"reasoning": true
}
]
},
"custom-localhost-8081": {
"baseUrl": "http://localhost:8081/v1",
"api": "openai-completions",
"models": [
{
"id": "vision",
"name": "LFM2.5-VL Vision",
"contextWindow": 8192,
"maxTokens": 2048,
"input": ["text", "image"],
"reasoning": false
}
]
}
}
},
This pasted piece has two likely problems:
- It’s only a fragment
If your actual file starts with:
"agents": { ... }
then it is invalid. The full file must be wrapped like:
{
"agents": { ... },
"gateway": { ... },
"tools": { ... },
"models": { ... }
}
Also your pasted block ends with a trailing comma after tools:
},
That is okay only if another top-level key follows it, like "models". If it’s the end of the file, it’s invalid.
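To see the fragment and trailing-comma rules concretely, here is a minimal sketch that uses Python's standard json module as a stand-in strict parser. That is an assumption for illustration only: OpenClaw's own loader may be more lenient than strict JSON, so openclaw config validate remains the real check.

```python
import json


def is_strict_json(text: str) -> bool:
    """Return True if text parses as strict JSON (one root value, no trailing commas)."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# A bare key with no enclosing braces, like the pasted fragment.
fragment = '"agents": { "defaults": {} },'
# The same key wrapped in one top-level object, followed by another key.
wrapped = '{ "agents": { "defaults": {} }, "tools": {} }'
# A comma after the last top-level key.
trailing = '{ "agents": { "defaults": {} }, }'

print(is_strict_json(fragment), is_strict_json(wrapped), is_strict_json(trailing))
```

Only the wrapped form passes a strict parser; the fragment and the trailing-comma version are both rejected.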
- This does not include the models.providers section
That’s where your timeout fix must go. I still need this output:
openclaw config get models.providers.custom-localhost-8080
Your agents.defaults should also include imageModel if you want the vision model used for images:
"imageModel": {
"primary": "custom-localhost-8081/vision"
}
For Whisper: your current config uses command: "whisper", which is usually Python Whisper. If you use whisper.cpp, change that to the full path of whisper-cli.exe and use whisper.cpp args, e.g.:
{
"type": "cli",
"command": "C:\\path\\to\\whisper-cli.exe",
"args": [
"-m", "C:\\path\\to\\ggml-base.bin",
"-f", "{{MediaPath}}",
"--no-timestamps"
],
"timeoutSeconds": 120
}
Before restarting, always run:
openclaw config validate
If that fails, paste just the validation error — that’ll tell us exactly where the config is broken.
"channels": {
"discord": {
"enabled": true,
"token": "",
"groupPolicy": "allowlist",
"guilds": {
"*": {}
},
"dmPolicy": "allowlist",
"dm": {
"enabled": true
},
"allowFrom": [
"807736763425619968"
]
}
},
"plugins": {
"entries": {
"google": {
"enabled": true,
"config": {
"webSearch": {
"apiKey": ""
}
}
},
"memory-core": {
"config": {
"dreaming": {
"enabled": true
}
}
}
}
},
"meta": {
"lastTouchedVersion": "2026-04-23",
"lastTouchedAt": "2026-04-26T16:06:16.943Z"
}
Yep, I see the issue in this models block.
Add three things to each localhost provider:
- change localhost → 127.0.0.1
- add apiKey: "local"
- add timeoutSeconds: 300
Like this for planner:
"custom-localhost-8080": {
"baseUrl": "http://127.0.0.1:8080/v1",
"apiKey": "local",
"api": "openai-completions",
"timeoutSeconds": 300,
"models": [
{
"id": "planner",
"name": "LFM2-8B Planner",
"contextWindow": 64000,
"maxTokens": 4096,
"input": ["text"],
"reasoning": true,
"cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 }
}
]
}
Do the same pattern for custom-localhost-8081 and custom-localhost-8082.
Also add this under agents.defaults so images route to the vision model:
"imageModel": {
"primary": "custom-localhost-8081/vision"
}
And be careful: your pasted block ends with:
},
That trailing comma is only valid if another top-level config key comes after it. If it’s the end of the file, remove the comma.
After edits:
openclaw config validate
openclaw gateway restart
openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json
If it still logs [llm-idle-timeout] after this, the server is accepting the request but not producing first tokens fast enough / not streaming compatibly.
Thanks, this last block looks structurally fine if it’s inside one top-level { ... } with the earlier blocks.
But a few things to fix/avoid:
- Empty secrets may be a problem
You pasted:
"token": ""
"apiKey": ""
If those are really empty in your config, Discord / Google search auth may be broken. If you only redacted them before pasting, that’s fine.
- Do not overwrite the whole file with these fragments
Your full openclaw.json should look like:
{
"agents": { ... },
"gateway": { ... },
"tools": { ... },
"models": { ... },
"channels": { ... },
"plugins": { ... },
"meta": { ... }
}
Each top-level section needs commas between sections, but no trailing comma after the final meta block.
- Required changes for your timeout issue are still in models.providers
Update all local providers like this:
"baseUrl": "http://127.0.0.1:8080/v1",
"apiKey": "local",
"api": "openai-completions",
"timeoutSeconds": 300
- Add image routing
Under agents.defaults:
"imageModel": {
"primary": "custom-localhost-8081/vision"
}
- Whisper.cpp config still needs changing
Your current tools.media.audio uses command: "whisper", which likely isn’t your whisper.cpp binary. Use the full path to whisper-cli.exe or main.exe.
After edits, run exactly:
openclaw config validate
If it fails, paste the validation error only. Don’t restart until validation passes, because the gateway will just restore last-known-good again.
PS C:\WINDOWS\system32> openclaw config validate
🦞 OpenClaw 2026.4.23 (a979721) — Half butler, half debugger, full crustacean.
Config valid: ~.openclaw\openclaw.json
Nice — config is valid now 🎉
Next do the runtime test path:
openclaw gateway restart
openclaw models status
openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json
If that succeeds, try Discord again.
If it still shows:
[llm-idle-timeout]
then the config is no longer the blocker — your local server is accepting the request but not producing output fast enough / not streaming compatibly. In that case, test the server directly with a tiny request, and make sure your provider block has:
"baseUrl": "http://127.0.0.1:8080/v1",
"apiKey": "local",
"timeoutSeconds": 300
openclaw config validate proves the file shape is good; infer model run proves the model server actually works through the gateway.
🦞 OpenClaw 2026.4.23 (a979721) — I'm not saying your workflow is chaotic... I'm just bringing a linter and a helmet.
Config : ~.openclaw\openclaw.json
Agent dir : ~.openclaw\agents\main\agent
Default : custom-localhost-8080/planner
Fallbacks (0) : -
Image model : custom-localhost-8081/vision
Image fallbacks (0): -
Aliases (3) : Planner -> custom-localhost-8080/planner, Coder -> custom-localhost-8082/coder, Vision -> custom-localhost-8081/vision
Configured models (3): custom-localhost-8080/planner, custom-localhost-8082/coder, custom-localhost-8081/vision
Auth overview
Auth store : ~.openclaw\agents\main\agent\auth-profiles.json
Shell env : off
Providers w/ OAuth/tokens (0): -
- custom-localhost-8080 effective=models.json:l...l | models.json=l...l | source=models.json: ~.openclaw\agents\main\agent\models.json
- custom-localhost-8081 effective=models.json:l...l | models.json=l...l | source=models.json: ~.openclaw\agents\main\agent\models.json
- custom-localhost-8082 effective=models.json:l...l | models.json=l...l | source=models.json: ~.openclaw\agents\main\agent\models.json
OAuth/token status
This looks good now: OpenClaw sees all 3 models, and it sees local placeholder auth from models.json (l...l = probably local). So auth/config is no longer the obvious problem.
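If you want to double-check the same thing offline, here is a rough sketch that cross-checks the model refs against the provider definitions. It assumes a ref is always "<provider id>/<model id>", which matches the config pasted in this thread but is not taken from OpenClaw's docs. The unresolved_refs helper is hypothetical, not an OpenClaw API.

```python
def unresolved_refs(config: dict) -> list[str]:
    """List model refs under agents.defaults that no provider entry defines."""
    defaults = config.get("agents", {}).get("defaults", {})
    providers = config.get("models", {}).get("providers", {})

    # Every "<provider id>/<model id>" pair that the providers section defines.
    defined = {
        f"{provider_id}/{model['id']}"
        for provider_id, provider in providers.items()
        for model in provider.get("models", [])
    }

    # Refs appear as model.primary, imageModel.primary, and as keys of defaults.models.
    refs = set(defaults.get("models", {}))
    for key in ("model", "imageModel"):
        primary = defaults.get(key, {}).get("primary")
        if primary:
            refs.add(primary)

    return sorted(refs - defined)


# Trimmed copy of the config from this thread (only the fields the check needs).
config = {
    "agents": {"defaults": {
        "model": {"primary": "custom-localhost-8080/planner"},
        "imageModel": {"primary": "custom-localhost-8081/vision"},
        "models": {
            "custom-localhost-8080/planner": {"alias": "Planner"},
            "custom-localhost-8082/coder": {"alias": "Coder"},
            "custom-localhost-8081/vision": {"alias": "Vision"},
        },
    }},
    "models": {"providers": {
        "custom-localhost-8080": {"models": [{"id": "planner"}]},
        "custom-localhost-8082": {"models": [{"id": "coder"}]},
        "custom-localhost-8081": {"models": [{"id": "vision"}]},
    }},
}

print(unresolved_refs(config))  # an empty list means every ref resolves
```

A typo in a provider id or model id would show up here as a leftover ref.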
Next test the actual model call:
openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello in 5 words" --gateway --json
If that works, the planner is usable through OpenClaw.
If it still logs [llm-idle-timeout], then the issue is your server/model runtime, not OpenClaw config. In that case check these:
openclaw config get models.providers.custom-localhost-8080.timeoutSeconds
openclaw config get models.providers.custom-localhost-8080.baseUrl
You want:
"baseUrl": "http://127.0.0.1:8080/v1",
"timeoutSeconds": 300
Also test /v1/models directly:
Invoke-RestMethod http://127.0.0.1:8080/v1/models
If /v1/models responds but infer model run hangs, your server likely has a chat-completions compatibility/streaming issue or the model is taking too long to produce the first token.
i got this reply in openclaw chat:
at: 14:43
[Mon 2026-05-11 14:45 GMT+7] hi
System: [2026-05-11 14:45:18 GMT+7] OpenClaw detected a configuration reload warning: the last valid openclaw.json was restored from a timestamped backup after rejection. The rejected config was saved as openclaw.json.clobbered_20260511T142234Z.clobbered — do not overwrite unless verified.
[Mon 2026-05-11 14:46 GMT+7] System: [2026-05-11 14:46:32 GMT+7] Config recovery successful, but caution: the original file was replaced; validate config before writing new openclaw.json if needed.
[Mon 2026-05-11 14:48 GMT+7] hi
[Mon 2026-05-11 14:48 GMT+7] hi
System: [2026-05-11 14:50:05 GMT+7] OpenClaw active session confirmed. Ready to proceed with next task.
[Mon 2026-05-11 14:52 GMT+7] hi
Those lines are not normal assistant replies — they’re system/context messages being included in the OpenClaw conversation transcript.
The important part is this:
openclaw.json.clobbered_20260511T142234Z.clobbered
That was from the earlier bad config edit. Since openclaw config validate now passes, it may just be old session context getting repeated back into the chat.
Do this to clear the noisy/polluted session:
openclaw gateway restart
Then in the chat, start a fresh session:
/new
or from CLI, if you prefer:
openclaw sessions list
Then test with one clean message like hi.
If it still only echoes system/context lines and gives no real assistant answer, then the model is still failing to produce a reply. Run this exact test:
openclaw infer model run --model custom-localhost-8080/planner --prompt "Reply with exactly: hello" --gateway --json
If that hangs or idle-timeouts, the remaining issue is the local model server, not Discord or config recovery.
Also: don’t worry about the .clobbered file itself. It’s a preserved backup of the rejected bad config. Once your active config validates, you can ignore it unless you need to recover something from it.
14:48:17 [plugins] embedded acpx runtime backend ready
14:48:17 [gateway] security audit: device access upgrade requested reason=scope-upgrade device=2b8010d84c84c4ebdba8fa2a407a2ca942519111bf0462a4a0f7d4915ee2a0f0 ip=unknown-ip auth=password roleFrom=operator roleTo=operator scopesFrom=operator.read scopesTo=operator.admin,operator.approvals,operator.pairing,operator.read,operator.talk.secrets,operator.write client=cli conn=74d4688c-0e70-4aae-9136-00a22991d9fd
14:48:17 [ws] closed before connect conn=74d4688c-0e70-4aae-9136-00a22991d9fd peer=127.0.0.1:64859->127.0.0.1:18789 remote=127.0.0.1 fwd=n/a origin=n/a host=127.0.0.1:18789 ua=n/a code=1008 reason=connect failed
14:48:18 [discord] client initialized as 1484564009775534170 (Ralsei); awaiting gateway readiness
14:49:31 [ws] ⇄ res ✓ node.list 538ms conn=25dbced2…0ee7 id=a9d1fa26…8415
This log is a device scope upgrade, not a model timeout:
scope-upgrade ... scopesFrom=operator.read ... scopesTo=operator.admin,operator.approvals,operator.pairing,operator.read,operator.talk.secrets,operator.write
code=1008 reason=connect failed
Your CLI/device currently only has operator.read, but the CLI action you ran needs broader operator scopes. OpenClaw created a pending upgrade request, then closed that connection until you approve it.
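To make the gap explicit, here is a small sketch that diffs the two scope lists from that audit line. It assumes the scopesFrom= and scopesTo= fields are comma-separated with no spaces, as they appear in the log above. The missing_scopes helper is just for illustration.

```python
import re


def missing_scopes(log_line: str) -> list[str]:
    """Return scopes requested in scopesTo that are absent from scopesFrom."""
    def scopes(field: str) -> set[str]:
        match = re.search(rf"{field}=(\S+)", log_line)
        return set(match.group(1).split(",")) if match else set()

    return sorted(scopes("scopesTo") - scopes("scopesFrom"))


# Shortened copy of the security-audit line from the gateway log.
line = (
    "scope-upgrade scopesFrom=operator.read "
    "scopesTo=operator.admin,operator.approvals,operator.pairing,"
    "operator.read,operator.talk.secrets,operator.write client=cli"
)

print(missing_scopes(line))
```

Everything it prints is a scope the device is asking for but has not been granted yet, which is why the gateway parks the connection until approval.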
Run:
openclaw devices list
Look for the pending request for that same device/scope upgrade. Then approve the exact request id:
openclaw devices approve <requestId>
If you want to preview the latest pending request first:
openclaw devices approve --latest
But docs say approval without an exact id only previews; you still rerun with the exact requestId.
After approving:
openclaw gateway restart
openclaw status --deep
That should clear the 1008 connect failed for CLI/control actions. Then you can go back to testing the model call separately.
this is the result of the hi command:
PS C:\WINDOWS\system32> openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json
gateway connect failed: GatewayClientRequestError: scope upgrade pending approval (requestId: 555883a1-afb3-4300-8ef5-2b6e0cbfab97)
Error: gateway closed (1008): pairing required: device is asking for more scopes than currently approved (requestId: 555883a1-afb3-4300-8ef5-2b6e0cbfa
Gateway target: ws://127.0.0.1:18789
Source: local loopback
Config: C:\Users\User.openclaw\openclaw.json
Bind: loopback
Yep, that confirms the current blocker is device pairing / scope upgrade, not the model.
Approve this exact request id:
openclaw devices approve 555883a1-afb3-4300-8ef5-2b6e0cbfab97
If it says the request is missing/expired, refresh the pending list:
openclaw devices list
Then approve the newest pending scope-upgrade request with its exact requestId.
After approval, retry:
openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json
That should finally test the model instead of failing at gateway auth. 🦞
PS C:\WINDOWS\system32> openclaw devices approve 555883a1-afb3-4300-8ef5-2b6e0cbfab97
🦞 OpenClaw 2026.4.23 (a979721) — The lobster in your shell. 🦞
|
gateway connect failed: GatewayClientRequestError: scope upgrade pending approval (requestId: 555883a1-afb3-4300-8ef5-2b6e0cbfab97)
o
Direct scope access failed; using local fallback.
Approved 2b8010d84c84c4ebdba8fa2a407a2ca942519111bf0462a4a0f7d4915ee2a0f0 (555883a1-afb3-4300-8ef5-2b6e0cbfab97)
PS C:\WINDOWS\system32>
Good — that approval worked despite the scary first line. The key line is:
Approved 2b8010d... (555883a1-afb3-4300-8ef5-2b6e0cbfab97)
Now retry the model probe:
openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json
If you still get pairing required, run:
openclaw devices list
and check whether there is a new pending scope-upgrade request id. Sometimes the CLI asks again with a refreshed request id.
If the probe now reaches the model but times out, then we’re back to the local server/idle-timeout issue — paste that output next.
i have a new command running but that wasnt done yet
That’s okay — let the current command finish first. Don’t start another model probe in parallel yet, or it’ll muddy the logs.
If it’s the openclaw infer model run ... command, possible outcomes:
- It returns JSON → paste the output.
- It hangs until timeout / logs [llm-idle-timeout] → the request reached the model server, but the model didn’t produce tokens fast enough.
- It shows pairing required again → run openclaw devices list and approve the new request id.
If it’s been stuck for more than your configured timeout, you can stop it with Ctrl+C, then check:
openclaw logs --tail 100
openclaw devices list
But I’d wait a bit first since your local model may be slow to first token.
this is my local model log:
slot update_slots: id 0 | task 375 | Checking checkpoint with [8191, 8191] against 305...
slot update_slots: id 0 | task 375 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see
slot update_slots: id 0 | task 375 | erased invalidated context checkpoint (pos_min = 8191, pos_max = 8191, n_tokens = 8192, n_swa = 0, pos_next = 0, size = 0.282 MiB)
slot update_slots: id 0 | task 375 | erased invalidated context checkpoint (pos_min = 16383, pos_max = 16383, n_tokens = 16384, n_swa = 0, pos_next = 0, size = 0.282 MiB)
slot update_slots: id 0 | task 375 | erased invalidated context checkpoint (pos_min = 16479, pos_max = 16479, n_tokens = 16480, n_swa = 0, pos_next = 0, size = 0.282 MiB)
slot update_slots: id 0 | task 375 | erased invalidated context checkpoint (pos_min = 17048, pos_max = 17048, n_tokens = 17049, n_swa = 0, pos_next = 0, size = 0.282 MiB)
slot update_slots: id 0 | task 375 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 0 | task 375 | prompt processing progress, n_tokens = 2048, batch.n_tokens = 2048, progress = 0.121212
slot update_slots: id 0 | task 375 | n_tokens = 2048, memory_seq_rm [2048, end)
slot update_slots: id 0 | task 375 | prompt processing progress, n_tokens = 4096, batch.n_tokens = 2048, progress = 0.242424
slot update_slots: id 0 | task 375 | n_tokens = 4096, memory_seq_rm [4096, end)
slot update_slots: id 0 | task 375 | prompt processing progress, n_tokens = 6144, batch.n_tokens = 2048, progress = 0.363636
slot update_slots: id 0 | task 375 | n_tokens = 6144, memory_seq_rm [6144, end)
slot update_slots: id 0 | task 375 | prompt processing progress, n_tokens = 8192, batch.n_tokens = 2048, progress = 0.484848
srv stop: cancel task, id_task = 375
srv log_server_r: done request: POST /v1/chat/completions 127.0.0.1 200
srv params_from_: Chat format: peg-native
That local model log explains the timeout pretty clearly:
prompt processing progress ... progress = 0.484848
srv stop: cancel task
Your model never reached generation. It was still prompt-processing / prefill when OpenClaw cancelled the request. So OpenClaw wasn’t waiting on the final answer — it was waiting for the first token, and your server produced no streamed chunks before the idle watchdog.
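You can also estimate how big the prompt was from those progress lines. This sketch assumes progress means tokens processed divided by total prompt tokens. That is my reading of the log, not something confirmed from the llama.cpp docs, though both lines below give the same total under that assumption.

```python
import re

# Two progress lines copied from the local model log above.
LOG = """\
slot update_slots: id 0 | task 375 | prompt processing progress, n_tokens = 2048, batch.n_tokens = 2048, progress = 0.121212
slot update_slots: id 0 | task 375 | prompt processing progress, n_tokens = 8192, batch.n_tokens = 2048, progress = 0.484848
"""

PATTERN = re.compile(r"n_tokens = (\d+), batch\.n_tokens = \d+, progress = ([\d.]+)")


def estimate_prompt_tokens(log: str) -> list[int]:
    """Estimate total prompt size per progress line, assuming progress = done / total."""
    return [round(int(done) / float(progress)) for done, progress in PATTERN.findall(log)]


print(estimate_prompt_tokens(LOG))
```

Under that assumption both lines point to a prompt of roughly 16,900 tokens, so the request was cancelled with about half of the prompt still unprocessed.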
What to do:
- Make sure timeout is actually on the provider
"custom-localhost-8080": {
"baseUrl": "http://127.0.0.1:8080/v1",
"apiKey": "local",
"api": "openai-completions",
"timeoutSeconds": 600,
"models": [ ... ]
}
Then restart:
openclaw gateway restart
- Lower the advertised context window
Your config says planner has:
"contextWindow": 64000
That encourages OpenClaw to send large prompts. For your hardware/server, set it much lower first:
"contextWindow": 8192,
"maxTokens": 1024
If stable, increase later.
- Turn off reasoning for that local model at first
"reasoning": false
Local OpenAI-compatible servers often don’t handle OpenClaw reasoning/tool-style prompting well, and it can increase latency.
- Start a fresh chat/session after changing context
Old transcript/context can keep making prompts large:
/new
- If llama.cpp server supports it, reduce server ctx too
Don’t run the server with a huge context on 4GB GPU/CPU if first-token latency is killing you. Try 4096 or 8192 until it works reliably.
The important diagnosis: srv stop: cancel task means OpenClaw/client aborted while the model was still ingesting the prompt. So either raise provider timeout a lot, or reduce prompt/context/model size so it reaches first token faster. I’d do both: timeoutSeconds: 600, contextWindow: 8192, maxTokens: 1024, reasoning: false for the first working baseline.
but when i put the timeout config, it errors then revert to previous config
That means the timeout edit was being applied in a way that made the full config invalid. Don’t hand-edit the whole file for this — use openclaw config set, because it validates before writing.
Run these exact commands:
openclaw config set models.providers.custom-localhost-8080.timeoutSeconds 600 --strict-json
openclaw config set models.providers.custom-localhost-8081.timeoutSeconds 600 --strict-json
openclaw config set models.providers.custom-localhost-8082.timeoutSeconds 600 --strict-json
And if you haven’t changed localhost yet:
openclaw config set models.providers.custom-localhost-8080.baseUrl '"http://127.0.0.1:8080/v1"' --strict-json
openclaw config set models.providers.custom-localhost-8081.baseUrl '"http://127.0.0.1:8081/v1"' --strict-json
openclaw config set models.providers.custom-localhost-8082.baseUrl '"http://127.0.0.1:8082/v1"' --strict-json
Then verify:
openclaw config get models.providers.custom-localhost-8080.timeoutSeconds
openclaw config validate
openclaw gateway restart
If one of those config set commands errors, paste that exact error. That will tell us whether timeoutSeconds is landing in the wrong path, wrong type, or there’s another schema issue.
Also: for the model prefill issue, I’d lower context too:
openclaw config set models.providers.custom-localhost-8080.models '[{"id":"planner","name":"LFM2-8B Planner","contextWindow":8192,"maxTokens":1024,"input":["text"],"reasoning":false}]' --strict-json --replace
That one intentionally replaces only the planner model list, not the whole config.
PS C:\WINDOWS\system32> openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json
Error: gateway timeout after 120000ms
Gateway target: ws://127.0.0.1:18789
Source: local loopback
Config: C:\Users\User.openclaw\openclaw.json
Bind: loopback
That specific error is the CLI waiting for the gateway response for 120s, not necessarily the model provider timeout:
Error: gateway timeout after 120000ms
Your model may still be running, but the CLI gave up waiting. For this test, pass a longer CLI timeout too:
openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json --timeout-ms 600000
But also check whether your provider timeout actually stuck:
openclaw config get models.providers.custom-localhost-8080.timeoutSeconds
openclaw config get models.providers.custom-localhost-8080.models
You want to see something like:
timeoutSeconds: 600
contextWindow: 8192
maxTokens: 1024
reasoning: false
Given your local server log showed it was still pre-processing ~8k+ tokens and got cancelled, I strongly recommend lowering the advertised planner context first:
openclaw config set models.providers.custom-localhost-8080.models '[{"id":"planner","name":"LFM2-8B Planner","contextWindow":4096,"maxTokens":512,"input":["text"],"reasoning":false}]' --strict-json --replace
openclaw config set models.providers.custom-localhost-8080.timeoutSeconds 600 --strict-json
openclaw config validate
openclaw gateway restart
Then start a fresh chat/session with /new so it doesn’t include the old polluted context, and retry the infer command with --timeout-ms 600000.
If it still times out after that, the local server is too slow/compatibility-broken for OpenClaw’s full prompt path. The next test would be direct server curl with a tiny OpenAI chat-completions payload to confirm first-token latency outside OpenClaw.
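If curl quoting is a pain on Windows, a rough Python equivalent of that direct test could look like this. It is a sketch under assumptions: that the server on port 8080 implements OpenAI-style streaming on /v1/chat/completions, that it accepts the model id planner, and that the placeholder Bearer local header is harmless. None of those are verified here.

```python
import json
import time
import urllib.request
from typing import Optional

BASE_URL = "http://127.0.0.1:8080/v1"  # planner endpoint from the config above
MODEL_ID = "planner"                   # assumed to be accepted by the local server


def build_request(prompt: str) -> urllib.request.Request:
    """Build a small streaming chat-completions request."""
    body = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 16,
        "stream": True,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        # Placeholder key, mirroring apiKey: "local"; an assumption, not a requirement.
        headers={"Content-Type": "application/json", "Authorization": "Bearer local"},
        method="POST",
    )


def first_content(sse_line: bytes) -> Optional[str]:
    """Return the text delta carried by one SSE line, or None if it has none."""
    text = sse_line.decode("utf-8").strip()
    if not text.startswith("data:") or text.endswith("[DONE]"):
        return None
    chunk = json.loads(text[len("data:"):])
    choices = chunk.get("choices") or [{}]
    return choices[0].get("delta", {}).get("content") or None


def time_to_first_token(prompt: str, timeout_s: float = 600.0) -> Optional[float]:
    """Seconds until the first streamed text delta, or None if the stream ends without one."""
    start = time.monotonic()
    with urllib.request.urlopen(build_request(prompt), timeout=timeout_s) as response:
        for line in response:
            if first_content(line):
                return time.monotonic() - start
    return None


# With the server running, call: time_to_first_token("Say hello")
```

A tiny prompt like this skips OpenClaw's full prompt entirely, so if the first token still takes minutes here, the slowness is in the server or model rather than in OpenClaw.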
🦞 OpenClaw 2026.4.23 (a979721) — Your config is valid, your assumptions are not.
Error: Config validation failed: models.providers.custom-localhost-8080: Unrecognized key: "timeoutSeconds"
🦞 OpenClaw 2026.4.23 (a979721) — I'll do the boring stuff while you dramatically stare at the logs like it's cinema.
Error: Config validation failed: models.providers.custom-localhost-8081: Unrecognized key: "timeoutSeconds"
🦞 OpenClaw 2026.4.23 (a979721) — I don't sleep, I just enter low-power mode and dream of clean diffs.
Error: Config validation failed: models.providers.custom-localhost-8082: Unrecognized key: "timeoutSeconds"
PS C:\WINDOWS\system32>
You’re right — I gave you the wrong key for your installed version. Sorry, that’s on me. In 2026.4.23, models.providers.*.timeoutSeconds is being rejected, so don’t use it.
Use the model-level timeout under agents.defaults.model.timeoutMs instead:
openclaw config set agents.defaults.model '{"primary":"custom-localhost-8080/planner","timeoutMs":600000}' --strict-json
For vision too, if needed:
openclaw config set agents.defaults.imageModel '{"primary":"custom-localhost-8081/vision","timeoutMs":600000}' --strict-json
Then validate + restart:
openclaw config validate
openclaw gateway restart
For the CLI probe, also extend the CLI wait timeout:
openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json --timeout-ms 600000
But the bigger fix is still to reduce the prompt/context, because your llama.cpp log showed it was stuck in prefill and only ~48% through before cancellation. Set planner smaller:
openclaw config set models.providers.custom-localhost-8080.models '[{"id":"planner","name":"LFM2-8B Planner","contextWindow":4096,"maxTokens":512,"input":["text"],"reasoning":false}]' --strict-json --replace
Then /new in chat before testing again.
Short version: remove timeoutSeconds; use agents.defaults.model.timeoutMs on 2026.4.23.
14:51:03 [gateway] security audit: device access upgrade requested reason=scope-upgrade device=2b8010d84c84c4ebdba8fa2a407a2ca942519111bf0462a4a0f7d4915ee2a0f0 ip=unknown-ip auth=password roleFrom=operator roleTo=operator scopesFrom=operator.read scopesTo=operator.admin,operator.approvals,operator.pairing,operator.read,operator.talk.secrets,operator.write client=cli conn=ffb23f2a-8d67-4cb3-bba6-68fe56fe30e8
14:51:03 [ws] closed before connect conn=ffb23f2a-8d67-4cb3-bba6-68fe56fe30e8 peer=127.0.0.1:57680->127.0.0.1:18789 remote=127.0.0.1 fwd=n/a origin=n/a host=127.0.0.1:18789 ua=n/a code=1008 reason=connect failed
14:52:32 [agent/embedded] [llm-idle-timeout] custom-localhost-8080/planner produced no reply before the idle watchdog; retrying same model
14:52:49 [ws] ⇄ res ✓ node.list 103ms conn=25dbced2…0ee7 id=8500875f…6ecc
14:54:49 [agent/embedded] embedded run failover decision: runId=39b3734c-8673-47c1-a926-20ea1de25cbc stage=assistant decision=surface_error reason=timeout from=custom-localhost-8080/planner profile=-
14:54:49 [diagnostic] lane wait exceeded: lane=session:agent:main:main waitedMs=167930 queueAhead=0
14:55:10 [ws] ⇄ res ✓ node.list 151ms conn=25dbced2…0ee7 id=ea6fff58…2bf9
14:56:24 [reload] config change detected; evaluating reload (models.providers.custom-localhost-8080.models)
14:56:25 [reload] config hot reload applied (models.providers.custom-localhost-8080.models)
14:57:22 [agent/embedded] [llm-idle-timeout] custom-localhost-8080/planner produced no reply before the idle watchdog; retrying same model
This log shows two separate things:
- Scope upgrade happened again
scope-upgrade ... request ... code=1008
If CLI commands fail again, run:
openclaw devices list
and approve the newest pending request id.
- The real model issue is still first-token timeout
[llm-idle-timeout] ... produced no reply before the idle watchdog
Also notice your reload only changed this:
models.providers.custom-localhost-8080.models
So you lowered the model definition, but you probably have not increased the agent/model timeout yet.
For 2026.4.23, since models.providers.*.timeoutSeconds is rejected, set these instead:
openclaw config set agents.defaults.timeoutSeconds 600 --strict-json
openclaw config set agents.defaults.model.timeoutMs 600000 --strict-json
openclaw config validate
openclaw gateway restart
Then start a clean session so the old huge/polluted context is not reused:
/new
Keep planner small for now:
contextWindow: 4096
maxTokens: 512
reasoning: false
Then test with a longer CLI wait:
openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json --timeout-ms 600000
If it still idles out after that, your llama.cpp server is still spending too long in prompt prefill before first token. At that point the fix is not more OpenClaw config — reduce server context / model size / prompt length, or use a faster backend/model.
PS C:\WINDOWS\system32> openclaw config set agents.defaults.model '{"primary":"custom-localhost-8080/planner","timeoutMs":600000}' --strict-json
🦞 OpenClaw 2026.4.23 (a979721) — I'm not saying your workflow is chaotic... I'm just bringing a linter and a helmet.
Error: Failed to parse JSON value: SyntaxError: Expected property name or '}' in JSON at position 1 (line 1 column 2)
That’s a PowerShell quoting problem. The JSON quotes are getting stripped before OpenClaw receives it, so OpenClaw sees something like {primary:...} instead of valid JSON.
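Here is a quick illustration of why the parser chokes at the very start, using Python's json module as a stand-in. Treating the damage as "all inner double quotes removed" is an assumption about what the shell passed through. The exact mangling may differ.

```python
import json
from typing import Optional

# The value as intended on the command line.
intended = '{"primary":"custom-localhost-8080/planner","timeoutMs":600000}'
# What the program may receive if the inner double quotes are dropped (assumed).
stripped = intended.replace('"', "")


def first_error_pos(text: str) -> Optional[int]:
    """Return the character offset of the first JSON syntax error, or None if valid."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as err:
        return err.pos


print(first_error_pos(intended), first_error_pos(stripped))
```

The stripped form fails right after the opening brace, because an unquoted key is not valid JSON. That lines up with the error position reported above.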
Easiest fix: don’t set the whole object. Set the nested fields separately:
openclaw config set agents.defaults.model.primary '"custom-localhost-8080/planner"' --strict-json
openclaw config set agents.defaults.model.timeoutMs 600000 --strict-json
Then verify:
openclaw config get agents.defaults.model
openclaw config validate
openclaw gateway restart
If PowerShell still fights the string value, use backtick escaping instead:
openclaw config set agents.defaults.model.primary "`"custom-localhost-8080/planner`"" --strict-json
Or use a patch file, which avoids shell quoting entirely:
@'
{
agents: {
defaults: {
model: {
primary: "custom-localhost-8080/planner",
timeoutMs: 600000
}
}
}
}
'@ | Set-Content $env:TEMP\openclaw-timeout.patch.json5
openclaw config patch --file $env:TEMP\openclaw-timeout.patch.json5 --dry-run
openclaw config patch --file $env:TEMP\openclaw-timeout.patch.json5
I’d use the two nested config set commands first — least room for JSON drama. 🦞
PS C:\WINDOWS\system32> openclaw config set agents.defaults.model.primary '"custom-localhost-8080/planner"' --strict-json
openclaw config set agents.defaults.model.timeoutMs 600000 --strict-json
🦞 OpenClaw 2026.4.23 (a979721) — Type the command with confidence—nature will provide the stack trace if needed.
Error: Failed to parse JSON value: SyntaxError: Unexpected token 'c', "custom-loc"... is not valid JSON
PowerShell is still eating the quotes. For this specific field, don’t use --strict-json — a plain value is fine because it’s just a string.
Run them separately:
openclaw config set agents.defaults.model.primary custom-localhost-8080/planner
openclaw config set agents.defaults.model.timeoutMs 600000 --strict-json
Then check:
openclaw config get agents.defaults.model
openclaw config validate
openclaw gateway restart
If you want the no-shell-drama version, use a patch file:
@'
{
agents: {
defaults: {
model: {
primary: "custom-localhost-8080/planner",
timeoutMs: 600000
}
}
}
}
'@ | Set-Content $env:TEMP\openclaw-timeout.patch.json5
openclaw config patch --file $env:TEMP\openclaw-timeout.patch.json5 --dry-run
openclaw config patch --file $env:TEMP\openclaw-timeout.patch.json5
The patch-file route is safest on Windows when quoting gets weird.
okay it works write the same command for code and vision model too
Yep. For vision, use agents.defaults.imageModel:
openclaw config set agents.defaults.imageModel.primary custom-localhost-8081/vision
openclaw config set agents.defaults.imageModel.timeoutMs 600000 --strict-json
For coder, there are two cases:
If you want to make coder the main/default model instead of planner:
openclaw config set agents.defaults.model.primary custom-localhost-8082/coder
openclaw config set agents.defaults.model.timeoutMs 600000 --strict-json
If you want to keep planner as default and only switch to coder sometimes, don’t change config — use model switching in chat/CLI instead:
/model custom-localhost-8082/coder
Then verify everything:
openclaw config get agents.defaults.model
openclaw config get agents.defaults.imageModel
openclaw config validate
openclaw gateway restart
Important: OpenClaw only has one normal active agents.defaults.model at a time. imageModel is separate and gets used for image-capable handling when needed.
PS C:\WINDOWS\system32> openclaw config get agents.defaults.model
openclaw config get agents.defaults.imageModel
openclaw config validate
openclaw gateway restart
🦞 OpenClaw 2026.4.23 (a979721) — Welcome to the command line: where dreams compile and confidence segfaults.
{
"primary": "custom-localhost-8080/planner"
}
🦞 OpenClaw 2026.4.23 (a979721) — The lobster in your shell. 🦞
{
"primary": "custom-localhost-8081/vision"
}
🦞 OpenClaw 2026.4.23 (a979721) — I don't have opinions about tabs vs spaces. I have opinions about everything else.
Config valid: ~.openclaw\openclaw.json
🦞 OpenClaw 2026.4.23 (a979721) — I'm like tmux: confusing at first, then suddenly you can't live without me.
Restarted Windows login item: OpenClaw Gateway
[openclaw] Uncaught exception: Error: spawn start ENOENT
at ChildProcess._handle.onexit (node:internal/child_process:286:19)
at onErrorNT (node:internal/child_process:484:16)
at process.processTicksAndRejections (node:internal/process/task_queues:90:21)
Two things here:
1. Your timeout didn’t show up in config get
The output only shows:
{
"primary": "custom-localhost-8080/planner"
}
So timeoutMs is not currently set. Run:
openclaw config set agents.defaults.model.timeoutMs 600000 --strict-json
openclaw config set agents.defaults.imageModel.timeoutMs 600000 --strict-json
openclaw config get agents.defaults.model.timeoutMs
openclaw config get agents.defaults.imageModel.timeoutMs
openclaw config validate
For coder, there is no separate coder timeout unless coder is the active main model. If you switch main model to coder, this same timeout applies:
openclaw config set agents.defaults.model.primary custom-localhost-8082/coder
openclaw config set agents.defaults.model.timeoutMs 600000 --strict-json
If you keep planner as default, just use /model custom-localhost-8082/coder when you want coder.
2. spawn start ENOENT is a Windows restart/service issue
Config is valid, so this is not your JSON. Check whether the gateway actually came back:
openclaw gateway status
openclaw status --deep
If it’s running, ignore that restart error for now and continue testing.
If it’s not running or restart keeps throwing spawn start ENOENT, repair the Windows service setup:
openclaw doctor
openclaw gateway install
openclaw gateway status
Then try:
openclaw gateway restart
Your earlier status showed both Scheduled Task and Startup-folder login item, so OpenClaw may be hitting the fallback login-item restart path. Scheduled Task is preferred on Windows.
🦞 OpenClaw 2026.4.23 (a979721) — Your .env is showing; don't worry, I'll pretend I didn't see it.
Error: Config validation failed: agents.defaults.model: Invalid input
Okay — that means 2026.4.23 does not accept agents.defaults.model.timeoutMs in your config. I was wrong on that key for your installed version. Remove it.
Run:
openclaw config unset agents.defaults.model.timeoutMs
openclaw config unset agents.defaults.imageModel.timeoutMs
openclaw config validate
Then use the supported whole-agent timeout instead:
openclaw config set agents.defaults.timeoutSeconds 600 --strict-json
openclaw config validate
Your model config should go back to this shape:
"model": {
"primary": "custom-localhost-8080/planner"
},
"imageModel": {
"primary": "custom-localhost-8081/vision"
},
"timeoutSeconds": 600
If config unset can’t run because the config is already wedged, do:
openclaw doctor --fix
openclaw config validate
Then re-apply only:
openclaw config set agents.defaults.timeoutSeconds 600 --strict-json
For the CLI probe, you can still extend the CLI wait separately:
openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json --timeout-ms 600000
But don’t use agents.defaults.model.timeoutMs on 2026.4.23.
15:09:53 [discord] [default] starting provider
15:09:54 [discord] users resolved: 807736763425619968
15:09:54 [bonjour] restarting advertiser (service stuck in probing for 106164ms (gateway fqdn=DESKTOP-UJ5TSC3 (OpenClaw)._openclaw-gw._tcp.local. host=openclaw.local. port=18789 state=probing))
15:09:54 [ws] handshake timeout conn=e67c77d5-0369-4e4f-a18a-4d7d6af0b932 peer=127.0.0.1:51690->127.0.0.1:18789 remote=127.0.0.1
15:09:54 [plugins] embedded acpx runtime backend registered (cwd: C:\Users\User.openclaw\workspace)
15:09:55 (node:37564) [DEP0190] DeprecationWarning: Passing args to a child process with shell option true can lead to security vulnerabilities, as the arguments are not escaped, only concatenated.
(Use node --trace-deprecation ... to show where the warning was created)
15:09:55 [browser/server] Browser control listening on http://127.0.0.1:18791/ (auth=password)
15:09:55 [ws] closed before connect conn=e67c77d5-0369-4e4f-a18a-4d7d6af0b932 peer=127.0.0.1:51690->127.0.0.1:18789 remote=127.0.0.1 fwd=n/a origin=http://127.0.0.1:18789 host=127.0.0.1:18789 ua=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/147.0.0.0 Safari/537.36 code=1000 reason=n/a
15:09:55 [ws] webchat connected conn=e67c77d5-0369-4e4f-a18a-4d7d6af0b932 remote=127.0.0.1 client=openclaw-control-ui webchat v2026.4.23
15:09:57 [plugins] embedded acpx runtime backend ready
15:09:57 [discord] client initialized as 1484564009775534170 (Ralsei); awaiting gateway readiness
This startup log is mostly okay. The scary-looking bits are probably not your model issue:
- bonjour restarting advertiser... probing = local network discovery/mDNS got stuck and restarted. Usually harmless unless you rely on LAN discovery.
- ws handshake timeout followed by webchat connected = the Control UI tried to connect before the gateway was fully ready, then reconnected successfully.
- DEP0190 = Node deprecation warning from some child-process call; noisy, not fatal.
- embedded acpx runtime backend ready + discord client initialized = good signs.
The line to watch for after this is whether it eventually says gateway ready / Discord ready, or whether it keeps sitting at:
[discord] client initialized ... awaiting gateway readiness
If it stays there for minutes, run:
openclaw status --deep
openclaw health --verbose
openclaw config validate
For the model timeout specifically, this log doesn’t show a model request yet. After gateway is ready, test with:
openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json --timeout-ms 600000
Also since model.timeoutMs was invalid on your version, the safe timeout config is:
openclaw config set agents.defaults.timeoutSeconds 600 --strict-json
openclaw config validate
Then restart only if validation passes.
why is my openclaw slow at transferring prompts and results
It’s probably not “transfer” speed. From your llama.cpp log, OpenClaw sends the request, then your model server spends a long time prefilling the prompt before it produces the first token.
This part was the giveaway:
prompt processing progress ... n_tokens = 8192 ... progress = 0.484848
srv stop: cancel task
So the slow path is usually:
- OpenClaw builds a big agent prompt: system prompt + tools + workspace/context + chat history.
- Local server receives it.
- llama.cpp reprocesses thousands of tokens.
- No output appears until prefill finishes.
- OpenClaw/CLI looks “stuck” and may timeout.
For your hardware, do this:
openclaw config set agents.defaults.timeoutSeconds 600 --strict-json
openclaw config set models.providers.custom-localhost-8080.models '[{"id":"planner","name":"LFM2-8B Planner","contextWindow":4096,"maxTokens":512,"input":["text"],"reasoning":false}]' --strict-json --replace
openclaw config validate
Then restart / fresh session:
openclaw gateway restart
In chat:
/new
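Also use 127.0.0.1 instead of localhost in model baseUrl if you haven’t already. On Windows, localhost can resolve to the IPv6 address ::1 first, which may add a connection delay when the model server only listens on IPv4.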
If it’s still slow after that, reduce more:
contextWindow: 2048,
maxTokens: 256,
reasoning: false
Big picture: local 4GB GPU + CPU inference can work, but OpenClaw agent prompts are much heavier than a simple “hi” in a model UI. The fastest fix is smaller context + fresh session + no reasoning + fewer tools/models loaded while testing.
slot update_slots: id 0 | task 393 | n_tokens = 8192, memory_seq_rm [8192, end)
slot update_slots: id 0 | task 393 | prompt processing progress, n_tokens = 10240, batch.n_tokens = 2048, progress = 0.610069
slot update_slots: id 0 | task 393 | n_tokens = 10240, memory_seq_rm [10240, end)
slot update_slots: id 0 | task 393 | prompt processing progress, n_tokens = 12288, batch.n_tokens = 2048, progress = 0.732082
slot update_slots: id 0 | task 393 | n_tokens = 12288, memory_seq_rm [12288, end)
slot update_slots: id 0 | task 393 | prompt processing progress, n_tokens = 14336, batch.n_tokens = 2048, progress = 0.854096
That confirms it: the request is ~16.8k prompt tokens. It’s at 85% prefill and still hasn’t generated a single output token yet.
So the bottleneck is prompt prefill, not network transfer.
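Here is a small sketch of where the ~16.8k figure comes from. It assumes the progress field in the llama.cpp log is simply tokens processed divided by total prompt tokens; the function name is made up for illustration.

```python
import re

# Progress lines copied from the llama.cpp log above.
LOG = """\
slot update_slots: id 0 | task 393 | prompt processing progress, n_tokens = 10240, batch.n_tokens = 2048, progress = 0.610069
slot update_slots: id 0 | task 393 | prompt processing progress, n_tokens = 12288, batch.n_tokens = 2048, progress = 0.732082
slot update_slots: id 0 | task 393 | prompt processing progress, n_tokens = 14336, batch.n_tokens = 2048, progress = 0.854096
"""

PATTERN = re.compile(r"n_tokens = (\d+), batch\.n_tokens = \d+, progress = ([0-9.]+)")


def estimate_prompt_tokens(log: str) -> list[int]:
    """Estimate the total prompt size from each progress line: done / fraction."""
    return [round(int(done) / float(frac)) for done, frac in PATTERN.findall(log)]


estimates = estimate_prompt_tokens(LOG)
print(estimates)
```

All three lines should land on roughly the same total, which is a quick sanity check that they belong to one request.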
Do these reductions:
1. Set an effective runtime cap, not only native context
Use contextTokens too:
openclaw config set models.providers.custom-localhost-8080.models '[{"id":"planner","name":"LFM2-8B Planner","contextWindow":4096,"contextTokens":4096,"maxTokens":512,"input":["text"],"reasoning":false}]' --strict-json --replace
contextWindow is model metadata; contextTokens is the smaller runtime cap you want OpenClaw to actually budget around.
2. Enable lean mode for local models
This trims heavy default tool surface:
openclaw config set agents.defaults.experimental.localModelLean true --strict-json
3. Reduce Discord history context
For testing, disable extra group history:
openclaw config set channels.discord.historyLimit 0 --strict-json
4. Validate + restart + fresh session
openclaw config validate
openclaw gateway restart
Then in chat:
/new
If it’s still too large, last-resort disable tools for that local planner:
openclaw config set models.providers.custom-localhost-8080.models '[{"id":"planner","name":"LFM2-8B Planner","contextWindow":4096,"contextTokens":4096,"maxTokens":512,"input":["text"],"reasoning":false,"compat":{"supportsTools":false}}]' --strict-json --replace
That makes the planner more like a plain chat model, but much lighter. For a 4GB GPU/CPU setup, this is often the difference between “eventually maybe” and usable.
if saying hi is already this heavy then how do i do working loops
Yeah — this is the hard truth of local agents: a real OpenClaw agent loop is much heavier than a plain “hi” in llama.cpp/LM Studio.
A working loop includes a lot before your actual message:
- system/persona instructions
- tool definitions
- Discord/thread context
- workspace/bootstrap files
- session history
- model routing metadata
- sometimes image/audio/media capability instructions
So “hi” can still become a 10k–20k token prompt. On a 4GB GPU + CPU inference, that means slow prefill every turn.
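To put numbers on "slow prefill every turn", here is a back-of-the-envelope sketch. The prefill speeds are hypothetical placeholders, not measurements from your machine; the 600 second timeout is the agents.defaults.timeoutSeconds value set earlier.

```python
def prefill_seconds(prompt_tokens: int, prefill_tokens_per_second: float) -> float:
    """Time spent digesting the prompt before the first output token."""
    return prompt_tokens / prefill_tokens_per_second


def fits_timeout(prompt_tokens: int, prefill_tokens_per_second: float,
                 timeout_seconds: int = 600) -> bool:
    """True if prefill alone finishes inside the agent timeout."""
    return prefill_seconds(prompt_tokens, prefill_tokens_per_second) < timeout_seconds


# Hypothetical prefill speeds; measure your own server to get real numbers.
for rate in (20.0, 50.0, 200.0):
    secs = prefill_seconds(16_000, rate)
    print(f"{rate:>6.0f} tok/s -> {secs:6.0f} s, "
          f"fits 600 s timeout: {fits_timeout(16_000, rate)}")
```

If your measured prefill rate puts a 16k prompt anywhere near the timeout, every turn is at risk before generation even starts.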
For your hardware, I would not try to run the full coding-agent loop locally. I’d use one of these patterns:
Option A — local light assistant
Use local model for chat/basic tasks only:
openclaw config set agents.defaults.experimental.localModelLean true --strict-json
openclaw config set channels.discord.historyLimit 0 --strict-json
And set planner small:
contextWindow: 4096
contextTokens: 4096
maxTokens: 512
reasoning: false
compat: { supportsTools: false }
This gives you a usable local “answer simple things” bot, not a strong coding loop.
Option B — hybrid, recommended
Use local for cheap/private chat, but hosted model for coding/tool work. That’s the practical setup. Local 4GB GPU models usually struggle with tool calling, long prompts, and multi-step coding.
Option C — separate OpenClaw agents
Make one lightweight local agent for Discord chat and another stronger/hosted agent for coding. Then switch/use the stronger one when needed.
Option D — direct model mode for local
For local models, test with:
openclaw infer model run --model custom-localhost-8080/planner --prompt "..." --gateway --json
That skips the full agent/tool loop. If this is fast but chat is slow, the problem is definitely agent context size.
My honest recommendation: on your machine, aim for local vision/transcription/simple chat, but use a cloud model or much bigger local rig for real working loops. Otherwise every turn spends forever just digesting the prompt before it can think. 🦞
PS C:\WINDOWS\system32> openclaw config set models.providers.custom-localhost-8080.models '[{"id":"planner","name":"LFM2-8B Planner","contextWindow":4096,"contextTokens":4096,"maxTokens":512,"input":["text"],"reasoning":false}]' --strict-json --replace
🦞 OpenClaw 2026.4.23 (a979721) — Self-hosted, self-updating, self-aware (just kidding... unless?).
Error: Failed to parse JSON value: SyntaxError: Expected property name or '}' in JSON at position 2 (line 1 column 3)
PS C:\WINDOWS\system32>
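PowerShell is mangling the JSON array again: it most likely strips the inner double quotes before openclaw receives the argument, so the parser hits an unquoted property name at position 2. Don’t replace the whole models array; set the fields one by one instead: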
openclaw config set models.providers.custom-localhost-8080.models[0].contextWindow 4096 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].contextTokens 4096 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].maxTokens 512 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].reasoning false --strict-json
Then verify:
openclaw config get models.providers.custom-localhost-8080.models
openclaw config validate
If you want to disable tools for the planner too:
openclaw config set models.providers.custom-localhost-8080.models[0].compat.supportsTools false --strict-json
If that path errors because compat doesn’t exist yet, use the patch-file method instead:
@'
{
  models: {
    providers: {
      "custom-localhost-8080": {
        models: [
          {
            id: "planner",
            name: "LFM2-8B Planner",
            contextWindow: 4096,
            contextTokens: 4096,
            maxTokens: 512,
            input: ["text"],
            reasoning: false,
            compat: { supportsTools: false }
          }
        ]
      }
    }
  }
}
'@ | Set-Content $env:TEMP\planner.patch.json5
openclaw config patch --file $env:TEMP\planner.patch.json5 --replace-path models.providers.custom-localhost-8080.models --dry-run
openclaw config patch --file $env:TEMP\planner.patch.json5 --replace-path models.providers.custom-localhost-8080.models
The one-by-one commands are simpler; use those first.
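For the curious, a sketch of why the parse error pointed at position 2. It assumes the shell removed every inner double quote before the CLI saw the argument (which is what Windows PowerShell commonly does when passing arguments to native programs); Python's json module stands in for the CLI's parser here, so the message wording differs but the position is the same.

```python
import json

intended = '[{"id":"planner","contextWindow":4096}]'
# Assumption: the shell stripped every inner double quote from the argument.
received = intended.replace('"', "")

print(json.loads(intended))  # parses fine

failure = None
try:
    json.loads(received)
except json.JSONDecodeError as err:
    failure = err
    print(f"parse failed at position {err.pos} (line {err.lineno} column {err.colno})")
```

Setting one scalar field per command avoids the problem because values like 4096 and false contain no quotes to lose.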
PS C:\WINDOWS\system32> openclaw config get models.providers.custom-localhost-8080.models
openclaw config validate
🦞 OpenClaw 2026.4.23 (a979721) — Welcome to the command line: where dreams compile and confidence segfaults.
[
{
"id": "planner",
"name": "LFM2-8B Planner",
"reasoning": false,
"input": [
"text"
],
"contextWindow": 4096,
"contextTokens": 4096,
"maxTokens": 512,
"cost": {
"input": 0,
"output": 0,
"cacheRead": 0,
"cacheWrite": 0
},
"api": "openai-completions"
}
]
🦞 OpenClaw 2026.4.23 (a979721) — Making 'I'll automate that later' happen now.
Config valid: ~.openclaw\openclaw.json
PS C:\WINDOWS\system32>
Good — that part is now fixed and valid. Planner is capped to 4k context / 512 output and reasoning is off.
Now do the remaining lightweight-agent settings:
openclaw config set agents.defaults.experimental.localModelLean true --strict-json
openclaw config set channels.discord.historyLimit 0 --strict-json
openclaw config set agents.defaults.timeoutSeconds 600 --strict-json
openclaw config validate
Then restart and start clean:
openclaw gateway restart
In chat:
/new
Then test:
openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json --timeout-ms 600000
If the full chat is still huge after that, add the last-resort tool disable for this local model:
openclaw config set models.providers.custom-localhost-8080.models[0].compat.supportsTools false --strict-json
openclaw config validate
openclaw gateway restart
That will make it much lighter, but it also means this planner won’t do tool-calling loops. For your hardware, that may be the right tradeoff for a usable local chat baseline.
what does this do: openclaw config set channels.discord.historyLimit 0 --strict-json
It tells OpenClaw’s Discord channel to include zero recent Discord messages as extra channel history when building the prompt.
So with:
openclaw config set channels.discord.historyLimit 0 --strict-json
OpenClaw will stop adding recent Discord thread/channel history into each agent turn.
Why I suggested it: your local model was getting huge prompts. Discord history can add extra tokens every turn, so setting it to 0 helps make the prompt smaller and faster.
Tradeoff:
- ✅ faster / fewer prompt tokens
- ✅ better for weak local models
- ❌ less awareness of recent thread context unless it’s already in the active session transcript or directly included in the current message
You can later set it back to something small, like:
openclaw config set channels.discord.historyLimit 5 --strict-json
For your 4GB GPU setup, 0 or 5 is much safer than large history.
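Conceptually, a history limit behaves like the sketch below. This is an illustration of the idea, not OpenClaw's implementation; recent_history is a hypothetical helper.

```python
def recent_history(messages: list[str], history_limit: int) -> list[str]:
    """Keep only the newest `history_limit` channel messages for the prompt."""
    if history_limit <= 0:
        return []  # guard: messages[-0:] would return the whole list
    return messages[-history_limit:]


channel = [f"message {i}" for i in range(1, 21)]
print(len(recent_history(channel, 0)))  # nothing extra added to the prompt
print(recent_history(channel, 5))       # only the five newest messages
```

Every message kept here is extra prompt text the local model has to prefill on each turn, which is why small values matter on slow hardware.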
btw this is what happened after i tried to say hi:
15:20:51 [diagnostic] stuck session: sessionId=e3229eab-689c-4e49-b3f4-49210edaa7bd sessionKey=agent:main:main state=processing age=527s queueDepth=1
15:20:51 [discord] gateway: Gateway websocket closed: 1006
15:20:51 [discord] gateway: Gateway reconnect scheduled in 1076ms (close, resume=true)
15:20:52 [ws] ⇄ res ✓ node.list 227ms conn=2f287603…c7e6 id=aeb545f6…f42a
That means the previous hi turn got wedged in the session lane:
stuck session ... state=processing age=527s queueDepth=1
So OpenClaw still thinks the main session is busy processing an old run. New messages queue behind it, which makes everything feel frozen.
The Discord 1006 reconnect is probably secondary — Discord gateway connection dropped/reconnected while the agent lane was stuck. The key problem is the stuck session/model run.
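If you want to pull the useful fields out of that diagnostic line when scanning a long log, a small sketch follows. It assumes the line keeps the key=value layout shown above; parse_stuck_session is a hypothetical helper.

```python
import re

LINE = ("15:20:51 [diagnostic] stuck session: "
        "sessionId=e3229eab-689c-4e49-b3f4-49210edaa7bd "
        "sessionKey=agent:main:main state=processing age=527s queueDepth=1")


def parse_stuck_session(line: str) -> dict[str, str]:
    """Split the key=value fields of a 'stuck session' diagnostic line."""
    return dict(re.findall(r"(\w+)=(\S+)", line))


info = parse_stuck_session(LINE)
age_seconds = int(info["age"].rstrip("s"))
print(info["state"], age_seconds, info["queueDepth"])
```

A growing age with a non-zero queueDepth is the pattern to watch: one run is holding the lane and later messages are waiting behind it.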
Do this cleanup:
openclaw gateway restart --force
Then start a fresh session in chat:
/new
If /new doesn’t respond because the lane is still wedged, use:
openclaw sessions list
and if needed run doctor:
openclaw doctor
Before trying Discord again, test the model outside the full chat path:
openclaw infer model run --model custom-localhost-8080/planner --prompt "Reply with exactly: hello" --gateway --json --timeout-ms 600000
If that is still slow/hangs, don’t test in Discord yet — it’ll just wedge the session again. Fix the local model prompt/prefill first.
Also after your config reductions, confirm these are set:
openclaw config get agents.defaults.experimental.localModelLean
openclaw config get channels.discord.historyLimit
openclaw config get models.providers.custom-localhost-8080.models
openclaw config get agents.defaults.timeoutSeconds
Expected: lean mode true, Discord history 0, planner contextTokens: 4096, and timeout 600.
PS C:\WINDOWS\system32> openclaw gateway restart --force
error: unknown option '--force'
Yep, that flag isn’t in your 2026.4.23 CLI. My bad — I was looking at newer restart docs. Use the supported command:
openclaw gateway restart
Then check:
openclaw gateway status
openclaw status --deep
If the session is still stuck after restart, the local model request may still be hanging in your llama.cpp server. In that case, stop/restart the local model server too, then start a fresh OpenClaw chat session:
/new
For now, don’t keep sending hi in Discord while the model probe is hanging — each message can queue behind the stuck session lane.
Use this order:
openclaw gateway restart
openclaw config validate
openclaw infer model run --model custom-localhost-8080/planner --prompt "Reply with exactly: hello" --gateway --json --timeout-ms 600000
If that probe hangs, the full Discord agent loop will definitely hang too. Fix the local server/prompt size first, then test Discord.
Restarted Windows login item: OpenClaw Gateway
[openclaw] Uncaught exception: Error: spawn start ENOENT
at ChildProcess._handle.onexit (node:internal/child_process:286:19)
at onErrorNT (node:internal/child_process:484:16)
at process.processTicksAndRejections (node:internal/process/task_queues:90:21)
PS C:\WINDOWS\system32>
That spawn start ENOENT looks like a Windows service/restart path bug, not your model config.
start is a cmd.exe built-in on Windows, not a real executable, so if OpenClaw’s fallback “Startup-folder login item” restart path tries to spawn start directly, it can throw exactly this kind of ENOENT.
First check whether the gateway actually restarted anyway:
openclaw gateway status
openclaw status --deep
If it says the gateway is running/listening on 18789, you can ignore this for the moment and continue testing.
If it is not running, use the temporary foreground workaround:
openclaw gateway run
Leave that window open while testing.
Then, to repair the Windows managed startup setup, try:
openclaw doctor
openclaw gateway install
openclaw gateway status
If gateway restart keeps throwing spawn start ENOENT after reinstall, that’s worth filing as a Windows bug here:
https://github.com/openclaw/openclaw/issues
Include:
- OpenClaw 2026.4.23
- Windows version
- this error: spawn start ENOENT
- that it says Restarted Windows login item: OpenClaw Gateway
For now, don’t keep testing Discord until openclaw gateway status confirms the gateway is healthy — otherwise the stuck session/model debugging gets mixed with service restart failure.
i have been testing on gateway this whole time
Right — but there are two different “gateway” paths, and I should’ve been clearer:
- openclaw infer model run --gateway ... = Gateway-routed raw model probe. It should skip the full Discord/agent/tool/session context and send basically just your prompt.
- Sending hi in Discord/OpenClaw chat = full agent turn through the Gateway. That includes session context, tools, persona/bootstrap, Discord handling, etc. That’s the heavy one.
So if the huge llama.cpp prefill log happened after you said hi in Discord, that makes sense.
If the huge prefill happened from this command:
openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json --timeout-ms 600000
then something else is wrong, because that probe should not be sending 16k tokens. In that case, test your model server directly outside OpenClaw:
$body = @{
    model      = "planner"
    messages   = @(@{ role = "user"; content = "Say hello" })
    max_tokens = 20
    stream     = $false
} | ConvertTo-Json -Depth 5
Invoke-RestMethod -Uri http://127.0.0.1:8080/v1/chat/completions -Method Post -ContentType "application/json" -Body $body
If even that direct request pre-processes thousands of tokens, your server/model template is injecting a huge prompt by itself.
If direct request is fast but Discord hi is slow, then OpenClaw agent context is the issue — use lean mode, historyLimit 0, /new, and possibly compat.supportsTools false for the local planner.
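If you would rather avoid PowerShell quoting entirely, the same direct probe can be sketched in Python with only the standard library. The helper names build_probe and send are made up for this example; the URL, model ID, and body fields mirror the request above.

```python
import json
import urllib.request


def build_probe(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a minimal OpenAI-style chat completion request (nothing is sent yet)."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 20,
        "stream": False,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 600.0) -> dict:
    """Send the request and decode the JSON reply; needs the local server running."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


probe = build_probe("http://127.0.0.1:8080/v1", "planner", "Say hello")
print(probe.full_url)
# reply = send(probe)  # uncomment once the model server is up
```

Watch the llama.cpp log while this runs: a tiny prompt here and a huge one from chat points at agent context, not the server.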
uh why does the agent return json file:
{"message": "Hi. I’m your openclaw control UI assistant, ready to help with agent coordination, memory updates, scheduling reminders, or any task to streamline your workflow. How can I assist you today?"}
That’s not a “file” — it’s JSON text.
Most likely one of two things:
- You ran the CLI with --json
If the command was like:
openclaw infer model run ... --json
then OpenClaw intentionally prints machine-readable JSON instead of pretty text. Remove --json if you want normal output.
- Your local model is answering in JSON because of its chat template / system prompt
This part:
{"message":"Hi. I’m your openclaw control UI assistant..."}
looks like the model wrapped its reply in a message field. That can happen with some local models/templates, especially if the server uses a “tool/chat” format or a control-UI-style system prompt.
Quick checks:
openclaw infer model run --model custom-localhost-8080/planner --prompt "Reply with exactly: hello" --gateway
No --json there.
If it still returns:
{"message":"hello"}
then it’s your local model/server template doing JSON-style output. Try changing the server chat format/template, or add a direct instruction like:
Reply in plain text only. Do not use JSON.
Also, the wording “control UI assistant” suggests you may be talking through the Control UI/webchat path, not Discord’s normal assistant flow. That’s okay, but it may use a slightly different prompt than Discord.
got this
Default (off)
Model context window too small (4096 tokens; source=modelsConfig). Minimum is 16000. OpenClaw is using the configured model context limit for this model. Raise contextWindow/contextTokens or choose a larger model.
That means OpenClaw’s agent loop has a hard minimum context budget of 16k tokens. Setting planner to 4096 made the prompt smaller, but now OpenClaw refuses to use it as a normal agent model.
So set it back to at least 16000, but keep the other reductions:
openclaw config set models.providers.custom-localhost-8080.models[0].contextWindow 16000 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].contextTokens 16000 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].maxTokens 512 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].reasoning false --strict-json
openclaw config set agents.defaults.experimental.localModelLean true --strict-json
openclaw config set channels.discord.historyLimit 0 --strict-json
openclaw config validate
Then restart and start fresh:
openclaw gateway restart
In chat:
/new
Important tradeoff: 16k is the minimum for OpenClaw agent mode, but your local server was already slow pre-filling around that size. So if 16k is still too slow on your hardware, your realistic options are:
- use this local model only for raw probes / simple non-agent use
- disable tools for it with compat.supportsTools false
- use a faster/smaller local model/backend
- use hosted model for real agent/coding loops
The error is basically OpenClaw saying: “4k is too small to safely run the full assistant.”
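The check behind that message can be pictured roughly like this. It is a sketch, not OpenClaw's source: the only grounded value is the 16000 minimum quoted in the error text, and check_agent_context is a hypothetical name.

```python
MIN_AGENT_CONTEXT = 16_000  # from the error text: "Minimum is 16000"


def check_agent_context(model_id: str, context_tokens: int) -> None:
    """Raise if the configured context is below the agent-mode minimum."""
    if context_tokens < MIN_AGENT_CONTEXT:
        raise ValueError(
            f"{model_id}: context window too small "
            f"({context_tokens} tokens); minimum is {MIN_AGENT_CONTEXT}"
        )


check_agent_context("custom-localhost-8080/planner", 16_000)  # passes silently
try:
    check_agent_context("custom-localhost-8080/planner", 4_096)
except ValueError as err:
    print(err)
```

Note that the check looks at the configured limit, not at how long your message is, so a 4k cap fails even for a one-word prompt.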
15:29:11 [agent/embedded] blocked model (context window too small): custom-localhost-8080/planner ctx=4096 (min=16000) source=modelsConfig; Model context window too small (4096 tokens; source=modelsConfig). Minimum is 16000. OpenClaw is using the configured model context limit for this model. Raise contextWindow/contextTokens or choose a larger model.
15:29:11 [diagnostic] lane task error: lane=main durationMs=30786 error="FailoverError: Model context window too small (4096 tokens; source=modelsConfig). Minimum is 16000. OpenClaw is using the configured model context limit for this model. Raise contextWindow/contextTokens or choose a larger model."
15:29:11 [diagnostic] lane task error: lane=session:agent:main:main durationMs=30790 error="FailoverError: Model context window too small (4096 tokens; source=modelsConfig). Minimum is 16000. OpenClaw is using the configured model context limit for this model. Raise contextWindow/contextTokens or choose a larger model."
Yep, that log is exactly the 4k cap problem. OpenClaw won’t run the full embedded agent loop below 16k context.
Set planner back to 16k:
openclaw config set models.providers.custom-localhost-8080.models[0].contextWindow 16000 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].contextTokens 16000 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].maxTokens 512 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].reasoning false --strict-json
openclaw config validate
Then restart / fresh session:
openclaw gateway restart
In chat:
/new
But the tradeoff is real: OpenClaw requires 16k minimum for agent mode, and your server is slow prefilling ~16k. So if 16k is too slow on your hardware, the practical choices are:
- use local model only for raw/simple model probes, not full agent loops
- enable lean mode + history 0 to reduce overhead as much as possible
- disable tools for the planner to make it lighter:
openclaw config set models.providers.custom-localhost-8080.models[0].compat.supportsTools false --strict-json
- or use a hosted/faster model for real coding/tool loops
For your setup, I’d use local for vision/transcription/simple chat, and a stronger model for coding loops. 4GB GPU just gets punished by the 16k agent prompt prefill.
like why it asks for 16k even with a "hi"
Because OpenClaw isn’t checking the size of your message. It’s checking whether the model is large enough for the minimum full agent runtime envelope.
Even when you type hi, the model may receive something like:
- OpenClaw system instructions
- safety / routing instructions
- tool definitions
- Discord/session metadata
- workspace/bootstrap context
- recent session transcript
- model/tool protocol instructions
OpenClaw needs enough context headroom for all of that plus the user request plus the model’s reply and tool loop. So it enforces a minimum context window of 16k for embedded agent mode.
Think of it like: hi is the payload, but the truck still has to carry the whole assistant runtime.
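The same idea as arithmetic. Every token count below is a hypothetical placeholder chosen to show the shape of the budget, not a measured OpenClaw number; only the 512 reply reserve comes from the maxTokens setting used in this thread.

```python
# Hypothetical token counts, chosen only to show the arithmetic.
envelope = {
    "system instructions": 3_000,
    "tool definitions": 6_000,
    "workspace/bootstrap context": 2_500,
    "session transcript": 1_500,
}
user_message = 2      # "hi"
reply_reserve = 512   # maxTokens from the config above


def headroom(context_tokens: int) -> int:
    """Tokens left after the fixed envelope, the message, and the reply reserve."""
    return context_tokens - sum(envelope.values()) - user_message - reply_reserve


for ctx in (4_096, 16_000):
    print(ctx, headroom(ctx))
```

With numbers in this range, a 4k window is negative before the user says anything, while 16k leaves a little room for the turn itself.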
For local models, your options are:
openclaw config set models.providers.custom-localhost-8080.models[0].contextWindow 16000 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].contextTokens 16000 --strict-json
Then reduce overhead as much as possible:
openclaw config set agents.defaults.experimental.localModelLean true --strict-json
openclaw config set channels.discord.historyLimit 0 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].reasoning false --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].maxTokens 512 --strict-json
If 16k prefill is still too slow on your 4GB GPU/CPU setup, that’s the hardware/backend limit showing up. The model can answer hi, but it can’t comfortably run the full OpenClaw agent loop.
is it required for image model
Yes, if the image model is used as an OpenClaw agent/model turn.
agents.defaults.imageModel is still a model used inside the assistant runtime when images are involved, so it needs enough context for the OpenClaw envelope too. Don’t set the vision model to 4k if OpenClaw is going to route image turns through it.
For your vision model, keep at least 16k if the server/model can handle it:
openclaw config set models.providers.custom-localhost-8081.models[0].contextWindow 16000 --strict-json
openclaw config set models.providers.custom-localhost-8081.models[0].contextTokens 16000 --strict-json
openclaw config set models.providers.custom-localhost-8081.models[0].maxTokens 512 --strict-json
openclaw config set models.providers.custom-localhost-8081.models[0].reasoning false --strict-json
Then:
openclaw config validate
Separate note: Whisper/audio transcription under tools.media.audio is different — that does not need the 16k agent context window, because it’s a CLI/media transcription step, not the full agent loop.
15:37:00 [diagnostic] stuck session: sessionId=4bff9132-5129-4012-9515-9c38fcf9957b sessionKey=agent:main:main state=processing age=133s queueDepth=1
That means the same main session lane is still blocked by a running/failed model turn:
state=processing age=133s queueDepth=1
At this point, don’t send more chat messages into OpenClaw yet — they’ll just queue behind the stuck run.
Do this recovery order:
# 1) Stop/restart your local llama.cpp/model server first
# The stuck OpenClaw turn is probably waiting on that request.
# 2) Restart OpenClaw gateway
openclaw gateway restart
# 3) Check health
openclaw gateway status
openclaw status --deep
Then start a fresh session in chat:
/new
If the gateway restart still throws spawn start ENOENT, run the gateway in foreground temporarily while testing:
openclaw gateway run
Leave that window open.
Before using Discord again, test the local model directly or with the raw gateway probe. If this hangs, Discord will hang too:
openclaw infer model run --model custom-localhost-8080/planner --prompt "Reply with exactly: hello" --gateway --json --timeout-ms 600000
If that command is slow/stuck, the fix is still on the local model side: 16k context is required for agent mode, but your backend is struggling to prefill it. Use lean mode/history 0, or switch the real agent loop to a faster/hosted model.
i will update to the latest openclaw ver first
Good call. Update first, then re-test — a few things we hit look version-specific.
Use the built-in updater if available:
openclaw update --dry-run
openclaw update
After it finishes:
openclaw --version
openclaw doctor
openclaw config validate
openclaw gateway restart
openclaw status --deep
If openclaw update itself fails, fall back to npm:
npm i -g openclaw@latest
openclaw doctor
openclaw gateway restart
Docs: https://docs.openclaw.ai/install/updating
After updating, paste the new openclaw --version and we can revisit the timeout/context keys against the newer schema. 🦞
PS C:\WINDOWS\system32> openclaw update
Updating OpenClaw...
|
o ✓ Updating via package manager (214.23s)
|
o ✓ Running doctor checks (57.73s)
Update Result: OK
Root: C:\nvm4w\nodejs\node_modules\openclaw
Before: 2026.4.23
After: 2026.5.7
Total time: 272.61s
Nice, update succeeded 🎉 You’re now on 2026.5.7, so some of the config keys we were fighting may behave differently now.
Next do this cleanup/check sequence:
openclaw --version
openclaw config validate
openclaw gateway restart
openclaw status --deep
Then check whether the provider timeout key is accepted on this version:
openclaw config set models.providers.custom-localhost-8080.timeoutSeconds 600 --strict-json
openclaw config validate
If that passes, also set it for vision/coder:
openclaw config set models.providers.custom-localhost-8081.timeoutSeconds 600 --strict-json
openclaw config set models.providers.custom-localhost-8082.timeoutSeconds 600 --strict-json
openclaw config validate
Then test raw gateway model routing before Discord/chat:
openclaw infer model run --model custom-localhost-8080/planner --prompt "Reply with exactly: hello" --gateway --json --timeout-ms 600000
If that succeeds, then try a fresh chat session with /new. If it still idles out, we know it’s local model prefill speed, not old OpenClaw version/config schema weirdness.
what is the findstr /R/C:"18789 .*LIstening
That’s a Windows filter command, usually used with netstat to check whether OpenClaw’s gateway port is listening.
Typical full command:
netstat -ano | findstr /R /C:"18789 .*LISTENING"
Meaning:
- netstat -ano lists network sockets + process IDs
- | pipes that output into findstr
- findstr searches/filters lines
- /R means use regex matching
- /C:"..." treats the quoted text as one search pattern
- 18789 is OpenClaw’s gateway port
- .* means “anything in between”
- LISTENING means a process is accepting connections on that port
So it’s checking: “is anything listening on port 18789?”
Small typo: it should be LISTENING, not LIstening. findstr is case-sensitive by default, so the mixed-case version won’t match netstat’s output. Safer:
netstat -ano | findstr /I /R /C:"18789 .*LISTENING"
/I makes it case-insensitive.
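If you’d rather see what that filter does step by step, here is a rough Python sketch of the same match. The sample text is made up to look like netstat -ano output (the PIDs and the other ports are placeholders, not from your machine), and the function name is just for illustration.

```python
import re

# Same pattern as the findstr example: port 18789, anything, then LISTENING.
# re.IGNORECASE plays the role of findstr's /I switch.
PATTERN = re.compile(r"18789 .*LISTENING", re.IGNORECASE)


def listening_lines(netstat_output: str) -> list[str]:
    """Return the netstat lines that match the gateway-port pattern."""
    return [
        line.strip()
        for line in netstat_output.splitlines()
        if PATTERN.search(line)
    ]


# Made-up sample shaped like `netstat -ano` output.
SAMPLE = """\
  TCP    127.0.0.1:18789        0.0.0.0:0              LISTENING       4321
  TCP    127.0.0.1:8080         0.0.0.0:0              LISTENING       9876
  TCP    127.0.0.1:51234        127.0.0.1:18789        ESTABLISHED     5555
"""

matches = listening_lines(SAMPLE)
for line in matches:
    # The last column of netstat -ano is the owning process ID.
    print("gateway port is listening, PID", line.split()[-1])
```

Only the first sample line matches: the second is a different port, and the third mentions 18789 but is an established connection rather than a listener.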
uh why that windows keeps opening
Probably because OpenClaw is using the Windows Startup-folder login item fallback instead of a proper Scheduled Task service.
Your earlier logs said:
Restarted Windows login item: OpenClaw Gateway
spawn start ENOENT
That fallback can cause visible cmd/PowerShell windows to pop open when the gateway starts/restarts. Scheduled Task mode is preferred because it runs more quietly and has better supervisor status.
Check what mode you’re in:
openclaw gateway status
Then try repairing/reinstalling the managed gateway service:
openclaw doctor
openclaw gateway install
openclaw gateway status
If it still uses the Startup-folder login item after that, Windows may be blocking Scheduled Task creation. In that case, the window opening is expected from the fallback.
For now: if the gateway is already running, don’t keep restarting it repeatedly — that’s what keeps popping the window. Use:
openclaw gateway status
to check health instead.
\Users\User\AppData\Local\nvm\v24.14.0\node_modules\openclaw\dist\extensions\browser\skills\browser-automation": Error: EPERM: operation not permitted, symlink 'C:\Users\User\AppData\Local\nvm\v24.14.0\node_modules\openclaw\dist\extensions\browser\skills\browser-automation' -> 'C:\Users\User.openclaw\plugin-skills\browser-automation'
15:48:03 [skills] failed to create plugin skill symlink "C:\Users\User.openclaw\plugin-skills\browser-automation" → "C:\Users\User\AppData\Local\nvm\v24.14.0\node_modules\openclaw\dist\extensions\browser\skills\browser-automation": Error: EPERM: operation not permitted, symlink 'C:\Users\User\AppData\Local\nvm\v24.14.0\node_modules\openclaw\dist\extensions\browser\skills\browser-automation' -> 'C:\Users\User.openclaw\plugin-skills\browser-automation'
15:48:17 [agent/embedded] [trace:embedded-run] startup stages: runId=3ebcbf6e-9af6-459b-a4e4-294c5aba735f sessionId=79abafd7-b3c6-4e07-9cf0-216263b699f3 phase=attempt-dispatch totalMs=13516 stages=workspace:0ms@0ms,runtime-plugins:14ms@14ms,hooks:1ms@15ms,model-resolution:13202ms@13217ms,auth:283ms@13500ms,context-engine:3ms@13503ms,attempt-dispatch:13ms@13516ms
That EPERM is Windows blocking symlink creation:
failed to create plugin skill symlink ... EPERM: operation not permitted, symlink
OpenClaw is trying to publish the bundled browser-automation skill into:
C:\Users\User\.openclaw\plugin-skills\browser-automation
On Windows, creating symlinks usually requires either:
- Developer Mode enabled, or
- running the process as Administrator / with symlink privilege.
It’s probably not fatal unless you need the browser-automation skill. The agent can still run, but that skill won’t be discoverable through the plugin-skills symlink path.
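A quick way to check whether your current user can create symlinks at all, independent of OpenClaw, is a small Python sketch like this. It only touches a temporary folder; the function name is mine, and it assumes you have Python installed.

```python
import os
import tempfile


def can_create_symlink() -> bool:
    """Try to create a directory symlink in a temp folder and report the result."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target")
        link = os.path.join(tmp, "link")
        os.mkdir(target)
        try:
            os.symlink(target, link, target_is_directory=True)
        except (OSError, NotImplementedError):
            # On Windows without Developer Mode or elevation, this usually
            # fails with a permission error, much like the EPERM in the log.
            return False
        return os.path.islink(link)


allowed = can_create_symlink()
print("symlink creation allowed:", allowed)
```

If this prints False in a normal PowerShell window and True in an elevated one, the missing symlink privilege is the likely cause.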
Fix options:
- Enable Windows Developer Mode:
Settings → System → For developers → Developer Mode → On
- Restart OpenClaw after enabling it:
openclaw gateway restart
- If it still fails, run OpenClaw/Gateway once from an elevated PowerShell, or reinstall the gateway from elevated PowerShell:
openclaw gateway install
openclaw gateway restart
The other line is separate:
model-resolution:13202ms
That means model resolution/auth/provider setup took ~13s before dispatch. Annoying, but not the same as the symlink error or model prefill timeout.
i think i might need that skill later
15:48:55 [diagnostic] liveness warning: reasons=event_loop_delay interval=30s eventLoopDelayP99Ms=33.7 eventLoopDelayMaxMs=3347.1 eventLoopUtilization=0.124 cpuCoreRatio=0.131 active=1 waiting=0 queued=1 recentPhases=sidecars.restart-sentinel:0ms,sidecars.subagent-recovery:12ms,sidecars.main-session-recovery:8ms,post-attach.update-sentinel:0ms,sidecars.session-locks:100ms,post-ready.maintenance:999ms work=[active=agent:main:main(processing/model_call,q=1,age=31s last=model_call:started) queued=agent:main:main(processing/model_call,q=1,age=31s last=model_call:started)]
Yeah, if you plan to use browser automation later, I’d fix the symlink permission now.
Best fix on Windows:
- Enable Developer Mode:
Settings → System → For developers → Developer Mode → On
- Restart OpenClaw:
openclaw gateway restart
- Check logs. The failed to create plugin skill symlink ... EPERM warning should disappear.
If it still fails, run PowerShell as Administrator once and reinstall/restart the gateway:
openclaw gateway install
openclaw gateway restart
You do not need to run everything as admin forever — the goal is just to let Windows allow symlink creation. Developer Mode is the cleaner long-term fix.
After that, this path should exist:
Test-Path C:\Users\User\.openclaw\plugin-skills\browser-automation
If it returns True, the browser automation skill should be available when OpenClaw needs it.
That warning says the Gateway itself is mostly alive, but the main agent session is currently busy in a model call and one message is queued behind it:
active=agent:main:main(processing/model_call,q=1,age=31s last=model_call:started)
queued=agent:main:main(... q=1 ...)
The event_loop_delay part means Node’s event loop had a pause/spike:
eventLoopDelayMaxMs=3347.1
So at one point the Gateway couldn’t process JS work for ~3.3s. That can happen if the machine is under load, antivirus is scanning, disk is busy, or local model processes are eating CPU/RAM. Your CPU ratios aren’t high there, so it’s more of a stall/spike than sustained CPU overload.
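If you want to pull those numbers out of the warning yourself, here is a minimal Python sketch. The log text is copied (shortened) from your paste; the 1-second and 0.5 thresholds are arbitrary values I picked for illustration, not anything OpenClaw defines.

```python
import re

LINE = (
    "15:48:55 [diagnostic] liveness warning: reasons=event_loop_delay interval=30s "
    "eventLoopDelayP99Ms=33.7 eventLoopDelayMaxMs=3347.1 eventLoopUtilization=0.124 "
    "cpuCoreRatio=0.131 active=1 waiting=0 queued=1"
)


def parse_fields(line: str) -> dict[str, str]:
    """Collect simple key=value pairs from a log line."""
    return dict(re.findall(r"(\w+)=([^\s\[\]]+)", line))


fields = parse_fields(LINE)
max_delay_s = float(fields["eventLoopDelayMaxMs"]) / 1000
cpu = float(fields["cpuCoreRatio"])
queued = int(fields["queued"])

# Illustrative thresholds only: a long single pause with low CPU use looks
# like a stall/spike rather than sustained overload.
if max_delay_s > 1.0 and cpu < 0.5:
    print(f"stall of about {max_delay_s:.1f}s with low CPU ratio {cpu}")
if queued > 0:
    print(f"{queued} message(s) queued behind the active run")
```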
Most important: this is still consistent with the local model call being slow/stuck. Don’t send more messages while queued=1, because they’ll stack behind the active model call.
Do this after the current model call finishes or you stop/restart the local model server:
openclaw status --deep
openclaw logs --tail 100
If it keeps happening, reduce the OpenClaw agent load further:
openclaw config set agents.defaults.experimental.localModelLean true --strict-json
openclaw config set channels.discord.historyLimit 0 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].compat.supportsTools false --strict-json
openclaw config validate
But the key readout is: Gateway not dead; main session is waiting on model_call; one message queued.
15:50:06 [ws] ⇄ res ✓ node.list 81ms conn=68c37f8a…9bc5 id=54543277…013a
15:50:25 [agent/embedded] [llm-idle-timeout] custom-localhost-8080/planner produced no reply before the idle watchdog; retrying same model
15:50:30 [agent/embedded] [trace:embedded-run] core-plugin-tool stages: runId=3ebcbf6e-9af6-459b-a4e4-294c5aba735f sessionId=79abafd7-b3c6-4e07-9cf0-216263b699f3 phase=core-plugin-tools totalMs=5365 stages=tool-policy:2664ms@2664ms,workspace-policy:1ms@2665ms,base-coding-tools:0ms@2665ms,shell-tools:0ms@2665ms,openclaw-tools:session-workspace:1ms@2666ms,openclaw-tools:image-tool:28ms@2694ms,openclaw-tools:image-generate-tool:0ms@2694ms,openclaw-tools:video-generate-tool:0ms@2694ms,openclaw-tools:music-generate-tool:0ms@2694ms,openclaw-tools:pdf-tool:0ms@2694ms,openclaw-tools:web-search-tool:0ms@2694ms,openclaw-tools:web-fetch-tool:0ms@2694ms,openclaw-tools:message-tool:6ms@2700ms,openclaw-tools:nodes-tool:0ms@2700ms,openclaw-tools:core-tool-list:1ms@2701ms,openclaw-tools:plugin-tools:2659ms@5360ms,openclaw-tools:0ms@5360ms,message-provider-policy:0ms@5360ms,model-provider-policy:0ms@5360ms,authorization-policy:1ms@5361ms,schema-normalization:3ms@5364ms,tool-hooks:0ms@5364ms,abort-wrappers:0ms@5364ms,deferred-followup-descriptions:1ms@5365ms,attempt:create-openclaw-coding-tools:0ms@5365ms,attempt:tools-allow:0ms@5365ms
15:50:35 [agent/embedded] [trace:embedded-run] prep stages: runId=3ebcbf6e-9af6-459b-a4e4-294c5aba735f sessionId=79abafd7-b3c6-4e07-9cf0-216263b699f3 phase=stream-ready totalMs=10285 stages=workspace-sandbox:2ms@2ms,skills:0ms@2ms,core-plugin-tools:5366ms@5368ms,bootstrap-context:23ms@5391ms,bundle-tools:2077ms@7468ms,system-prompt:11ms@7479ms,session-resource-loader:2769ms@10248ms,agent-session:4ms@10252ms,stream-setup:33ms@10285ms
This log is actually useful: OpenClaw itself is spending ~10s preparing the turn, then the local model is still timing out before first token.
Breakdown:
core-plugin-tools: 5366ms
bundle-tools: 2077ms
session-resource-loader: 2769ms
prep total: 10285ms
So about 10s is OpenClaw assembling tools/session/runtime. That’s not terrible for a heavy coding profile, but it is too much overhead for a weak local model if the model then needs ages to prefill.
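To get that breakdown without reading the trace by eye, a small Python sketch can split the stages= field and sort by duration. The stage string is copied from your prep-stages log line; the name:DURATIONms@OFFSETms reading of each entry is my interpretation of the format, and the helper names are made up.

```python
import re

STAGES = (
    "workspace-sandbox:2ms@2ms,skills:0ms@2ms,core-plugin-tools:5366ms@5368ms,"
    "bootstrap-context:23ms@5391ms,bundle-tools:2077ms@7468ms,system-prompt:11ms@7479ms,"
    "session-resource-loader:2769ms@10248ms,agent-session:4ms@10252ms,stream-setup:33ms@10285ms"
)

# Stage names may themselves contain colons, so match the numbers at the end.
ITEM = re.compile(r"^(?P<name>.+):(?P<dur>\d+)ms@(?P<end>\d+)ms$")


def parse_stages(stages: str) -> list[tuple[str, int]]:
    """Return (stage name, duration in ms) pairs in log order."""
    pairs = []
    for item in stages.split(","):
        match = ITEM.match(item)
        if match:
            pairs.append((match["name"], int(match["dur"])))
    return pairs


def slowest(stages: str, n: int = 3) -> list[tuple[str, int]]:
    """Return the n stages with the largest durations."""
    return sorted(parse_stages(stages), key=lambda pair: pair[1], reverse=True)[:n]


top = slowest(STAGES)
total_ms = sum(duration for _, duration in parse_stages(STAGES))
for name, duration in top:
    print(f"{name}: {duration}ms ({duration / total_ms:.0%} of prep)")
```

The three largest stages account for nearly all of the 10285ms total, which is why they are the ones worth trimming.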
The worse line is still:
[llm-idle-timeout] ... produced no reply before the idle watchdog
That means after prep, your local model still did not emit tokens in time.
For your setup, I’d make the local planner non-tool / lean:
openclaw config set agents.defaults.experimental.localModelLean true --strict-json
openclaw config set channels.discord.historyLimit 0 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].compat.supportsTools false --strict-json
openclaw config set agents.defaults.timeoutSeconds 600 --strict-json
openclaw config validate
Then restart/fresh session:
openclaw gateway restart
In chat:
/new
That should reduce tool/schema overhead. Tradeoff: with supportsTools false, the local planner becomes more like a plain chat model, not a full coding/tool-loop agent.
If you want real working loops, I’d honestly use a hosted/faster model for agents.defaults.model, and keep local models for vision/transcription/simple chat. Your logs show the 4GB GPU/CPU backend is losing most of the battle at prompt prefill + tool envelope.
slot create_check: id 0 | task 0 | created context checkpoint 1 of 32 (pos_min = 8191, pos_max = 8191, n_tokens = 8192, size = 0.282 MiB)
srv stop: cancel task, id_task = 0
srv log_server_r: done request: POST /v1/chat/completions 127.0.0.1 200
srv params_from_: Chat format: peg-native
slot release: id 0 | task 0 | stop processing: n_tokens = 10240, truncated = 0
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.738 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 7 | processing task, is_child = 0
slot update_slots: id 0 | task 7 | new prompt, n_ctx_slot = 65536, n_keep = 0, task.n_tokens = 13878
slot update_slots: id 0 | task 7 | n_tokens = 10240, memory_seq_rm [10240, end)
slot update_slots: id 0 | task 7 | prompt processing progress, n_tokens = 12288, batch.n_tokens = 2048, progress = 0.885430
slot update_slots: id 0 | task 7 | n_tokens = 12288, memory_seq_rm [12288, end)
slot update_slots: id 0 | task 7 | prompt processing progress, n_tokens = 13362, batch.n_tokens = 1074, progress = 0.962819
srv stop: cancel task, id_task = 7
srv log_server_r: done request: POST /v1/chat/completions 127.0.0.1 200
This log says the model was almost done prefilling, then the client cancelled before generation started:
task.n_tokens = 13878
progress = 0.962819
srv stop: cancel task
So the prompt is ~13.9k tokens. That fits inside the 16k context OpenClaw requires, but it is still heavy for your backend. It reached 96% of prompt processing and then got aborted, which is the classic pattern of an idle watchdog / caller timeout firing before the first token.
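Here is a rough Python sketch of how to read that pattern out of a llama.cpp server log automatically. The log lines are copied from your paste; the regexes are based only on those lines, so treat them as assumptions about the log format rather than a stable interface.

```python
import re

LOG = """\
slot update_slots: id 0 | task 7 | new prompt, n_ctx_slot = 65536, n_keep = 0, task.n_tokens = 13878
slot update_slots: id 0 | task 7 | prompt processing progress, n_tokens = 12288, batch.n_tokens = 2048, progress = 0.885430
slot update_slots: id 0 | task 7 | prompt processing progress, n_tokens = 13362, batch.n_tokens = 1074, progress = 0.962819
srv stop: cancel task, id_task = 7
"""


def summarize_prefill(log: str) -> dict:
    """Report prompt size, tokens processed so far, and whether the task was cancelled."""
    total = 0
    done = 0
    cancelled = False
    for line in log.splitlines():
        match = re.search(r"task\.n_tokens = (\d+)", line)
        if match:
            total = int(match.group(1))
        match = re.search(r"prompt processing progress, n_tokens = (\d+)", line)
        if match:
            done = int(match.group(1))
        if "cancel task" in line:
            cancelled = True
    return {
        "prompt_tokens": total,
        "processed": done,
        "remaining": total - done,
        # Cancelled while prompt tokens were still unprocessed.
        "cancelled_during_prefill": cancelled and done < total,
    }


summary = summarize_prefill(LOG)
print(summary)
```

For this log, 516 prompt tokens were still unprocessed when the cancel arrived, so the model never got to generate anything.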
Since you updated to 2026.5.7, try provider timeout again now:
openclaw config set models.providers.custom-localhost-8080.timeoutSeconds 600 --strict-json
openclaw config set agents.defaults.timeoutSeconds 600 --strict-json
openclaw config validate
openclaw gateway restart
Also make the local planner as lean as possible:
openclaw config set agents.defaults.experimental.localModelLean true --strict-json
openclaw config set channels.discord.historyLimit 0 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].compat.supportsTools false --strict-json
openclaw config validate
Then /new before testing again.
One more thing: your server says:
n_ctx_slot = 65536
That means llama.cpp is running with a huge 65k context slot. On your hardware, try launching the model server with a smaller context, like 16k or 20k, not 65k. You only need to satisfy OpenClaw’s 16k minimum; 65k just makes KV/cache/memory pressure worse.
So: OpenClaw is not failing to send the prompt. The local server is spending too long digesting ~14k tokens, then OpenClaw/client gives up before first output token.
kv cache of that model only takes 200 mb
can i increase openclaw waiting time?
Fair point — if KV is only ~200MB, then the 65k context setting probably isn’t the main memory problem.
The real bottleneck in your logs is prefill compute / time-to-first-token, not just KV size:
task.n_tokens = 13878
progress = 0.962819
srv stop: cancel task
That means llama.cpp was still chewing through ~14k prompt tokens and OpenClaw/client cancelled before the first generated token arrived. Even with small KV, prefill can be slow on CPU/low-end GPU because it still has to run the model over every prompt token.
So the useful fixes are:
openclaw config set models.providers.custom-localhost-8080.timeoutSeconds 600 --strict-json
openclaw config set agents.defaults.timeoutSeconds 600 --strict-json
openclaw config set agents.defaults.experimental.localModelLean true --strict-json
openclaw config set channels.discord.historyLimit 0 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].compat.supportsTools false --strict-json
openclaw config validate
Then /new before testing again.
If it still cancels around 90–99% prefill, the model is simply not reaching first token before the idle/watchdog timeout. At that point either raise timeout more, speed up llama.cpp prefill, or use a lighter/faster model for the main OpenClaw agent loop.
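As a back-of-the-envelope check, you can work out the prefill speed your backend would need to reach the first token inside a given timeout. This Python sketch assumes, as a simplification, that the whole timeout is available for prefill and that the prompt stays around the 13878 tokens seen in your log.

```python
def min_prefill_rate(prompt_tokens: int, timeout_s: float) -> float:
    """Tokens per second needed to finish prefill before the timeout."""
    return prompt_tokens / timeout_s


PROMPT_TOKENS = 13878  # task.n_tokens from the llama.cpp log

rates = {timeout: min_prefill_rate(PROMPT_TOKENS, timeout) for timeout in (600, 900)}
for timeout, rate in rates.items():
    print(f"timeout {timeout}s -> need at least {rate:.1f} prompt tokens/s")
```

If your measured prompt-processing speed in llama.cpp is below those figures, raising the timeout further or shrinking the prompt are the only ways the request can finish.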
Yes — now that you’re on 2026.5.7, try increasing both the provider/model wait and the overall agent turn wait:
openclaw config set models.providers.custom-localhost-8080.timeoutSeconds 900 --strict-json
openclaw config set agents.defaults.timeoutSeconds 900 --strict-json
openclaw config validate
openclaw gateway restart
For your vision/coder servers too:
openclaw config set models.providers.custom-localhost-8081.timeoutSeconds 900 --strict-json
openclaw config set models.providers.custom-localhost-8082.timeoutSeconds 900 --strict-json
openclaw config validate
When testing from CLI, also increase the CLI wait timeout, otherwise the CLI itself can give up first:
openclaw infer model run --model custom-localhost-8080/planner --prompt "Reply with exactly: hello" --gateway --json --timeout-ms 900000
So the three layers are:
- models.providers.<id>.timeoutSeconds = model HTTP / stream idle wait
- agents.defaults.timeoutSeconds = whole agent turn timeout
- --timeout-ms = CLI command wait time
If it still cancels after 900s, then OpenClaw is waiting long enough and the local backend is simply too slow/stuck before first token.
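To see which of those layers would give up first for a given combination, here is a tiny Python sketch. It treats all three as plain wall-clock limits, which is a simplification (the provider value is an idle wait, not a total), and the class and function names are hypothetical, not OpenClaw code.

```python
from dataclasses import dataclass


@dataclass
class Timeouts:
    provider_s: int  # models.providers.<id>.timeoutSeconds
    agent_s: int     # agents.defaults.timeoutSeconds
    cli_ms: int      # --timeout-ms on the CLI command


def first_to_fire(t: Timeouts) -> str:
    """Name the layer with the smallest limit, comparing everything in seconds."""
    layers = {
        "provider": t.provider_s,
        "agent turn": t.agent_s,
        "CLI": t.cli_ms / 1000,
    }
    return min(layers, key=layers.get)


# Provider and agent raised to 900s, but the CLI flag left at the old 600000ms.
mismatched = first_to_fire(Timeouts(provider_s=900, agent_s=900, cli_ms=600_000))
print("gives up first:", mismatched)
```

With the CLI flag left at 600000, the CLI gives up before the other two, which is why the test command needs --timeout-ms raised as well.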
16:00:18 [diagnostic] liveness warning: reasons=event_loop_delay interval=43s eventLoopDelayP99Ms=32.3 eventLoopDelayMaxMs=13463.7 eventLoopUtilization=0.365 cpuCoreRatio=0.369 active=1 waiting=0 queued=1 recentPhases=sidecars.restart-sentinel:0ms,sidecars.subagent-recovery:7ms,sidecars.main-session-recovery:5ms,post-attach.update-sentinel:0ms,sidecars.session-locks:38ms,post-ready.maintenance:867ms work=[active=agent:main:main(processing,q=1,age=15s) queued=agent:main:main(processing,q=1,age=15s)]
16:00:32 [ws] ⇄ res ✓ node.list 11700ms conn=4c78d5a0…99e8 id=b096b23d…479d
16:00:32 [agent/embedded] [trace:embedded-run] startup stages: runId=aeb1742f-6b61-44cb-8c33-61199c28ae80 sessionId=903489c3-4d6a-487b-aced-ab4496860813 phase=attempt-dispatch totalMs=11646 stages=workspace:1ms@1ms,runtime-plugins:10ms@11ms,hooks:1ms@12ms,model-resolution:11390ms@11402ms,auth:234ms@11636ms,context-engine:1ms@11637ms,attempt-dispatch:9ms@11646ms
This log points to Gateway/Node event loop stalls, not just model prefill now.
Key bits:
eventLoopDelayMaxMs=13463.7
node.list 11700ms
model-resolution:11390ms
So something blocked/slowed the OpenClaw Node process for ~11–13 seconds. That’s why even node.list took 11.7s — that call does not need your LLM to generate text.
Likely causes on your setup:
- local model server is eating CPU/RAM and starving the Gateway process
- antivirus/Windows Defender scanning node_modules / .openclaw
- Gateway running from nvm path + service/login-item weirdness after update
- too many active/retrying stuck model runs queued in the same session
Do this next:
openclaw status --deep
openclaw gateway status
openclaw config validate
Then check if models.providers.*.timeoutSeconds now works after update:
openclaw config get models.providers.custom-localhost-8080.timeoutSeconds
If it’s unset:
openclaw config set models.providers.custom-localhost-8080.timeoutSeconds 900 --strict-json
openclaw config validate
But separately, I’d stop the local model server once and see if OpenClaw UI/API becomes fast. If node.list drops back under ~100ms when the model server is stopped, the machine is simply resource-starved while llama.cpp is running.
PS C:\WINDOWS\system32> openclaw config get models.providers.custom-localhost-8080.timeoutSeconds
🦞 OpenClaw 2026.5.7 (eeef486) — I'll do the boring stuff while you dramatically stare at the logs like it's cinema.
900
PS C:\WINDOWS\system32>
Good — provider timeout is set correctly now:
custom-localhost-8080.timeoutSeconds = 900
So OpenClaw should wait much longer for the planner model before cancelling the HTTP/model stream.
Next verify the other two layers:
openclaw config get agents.defaults.timeoutSeconds
Expected: 900 or similar.
And when testing from CLI, use:
openclaw infer model run --model custom-localhost-8080/planner --prompt "Reply with exactly: hello" --gateway --json --timeout-ms 900000
If that still times out/cancels while your llama.cpp log shows it is still prefilling, then it’s no longer an OpenClaw timeout config issue — the machine/backend is too slow or getting event-loop stalls under load.
Also your last log had node.list 11700ms and model-resolution 11390ms, which means the Gateway itself was delayed ~11s. I’d test once with the local model server stopped: if OpenClaw becomes fast, llama.cpp is starving the Gateway process.
it still returns json reply, wth
If the Discord/OpenClaw chat reply itself is literally:
{"message":"..."}
then that’s your local model/server template, not OpenClaw intentionally sending a JSON file.
Two common causes:
- CLI test used --json
If it’s from this command:
openclaw infer model run ... --json
remove --json:
openclaw infer model run --model custom-localhost-8080/planner --prompt "Reply with exactly: hello" --gateway --timeout-ms 900000
- llama.cpp chat format is forcing JSON-ish replies
Your server log showed:
Chat format: peg-native
That template/model may prefer structured output like { "message": ... }.
Test directly against the server:
$body = @{
model = "planner"
messages = @(@{ role = "user"; content = "Reply with exactly: hello. Plain text only, no JSON." })
max_tokens = 20
stream = $false
} | ConvertTo-Json -Depth 5
Invoke-RestMethod -Uri http://127.0.0.1:8080/v1/chat/completions -Method Post -ContentType "application/json" -Body $body
If direct server output is also JSON-style, fix it in llama.cpp: try a different --chat-template / chat format for that model, or add a system prompt/template that says plain text only.
For OpenClaw, also make sure this is set for the weak local planner:
openclaw config set models.providers.custom-localhost-8080.models[0].compat.supportsTools false --strict-json
openclaw config validate
That reduces tool/structured-output pressure. But if the direct llama.cpp test still returns { "message": ... }, the real fix is the model server chat template.
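If PowerShell is awkward, the same direct test can be sketched in Python with only the standard library. The URL, model name, and prompt mirror the PowerShell example; the response path choices[0].message.content is the usual OpenAI-compatible shape, which I’m assuming your llama.cpp server follows. The function names are made up for illustration.

```python
import json
import urllib.request


def build_request(url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming chat completion request like the PowerShell one."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 20,
        "stream": False,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def looks_like_json_reply(text: str) -> bool:
    """True when the reply parses as a JSON object or array instead of plain text."""
    try:
        parsed = json.loads(text.strip())
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, (dict, list))


def ask(url: str, model: str, prompt: str, timeout_s: float = 900) -> str:
    """Send the request and return the assistant text (assumed response shape)."""
    with urllib.request.urlopen(build_request(url, model, prompt), timeout=timeout_s) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Offline demonstration of the classifier; call ask(...) yourself when the
# server is running, e.g.
# ask("http://127.0.0.1:8080/v1/chat/completions", "planner", "Reply with exactly: hello.")
print(looks_like_json_reply('{"message":"hi"}'), looks_like_json_reply("hello"))
```

If looks_like_json_reply(ask(...)) is True against the bare server, the JSON style is coming from the model/template side rather than from OpenClaw.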
maybe there was something with openclaw that caused the slow reply and weird result format
Yeah, that’s possible — but it’s probably two different issues mixed together:
1. Slow reply
OpenClaw definitely adds overhead compared to talking to llama.cpp directly. Your logs showed:
core-plugin-tools ~5.3s
bundle-tools ~2s
session-resource-loader ~2.7s
model-resolution ~11s sometimes
So OpenClaw is adding several seconds before the model even starts. Then your local model has to prefill ~14k tokens, which is the much bigger delay.
To reduce OpenClaw overhead for local models:
openclaw config set agents.defaults.experimental.localModelLean true --strict-json
openclaw config set channels.discord.historyLimit 0 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].compat.supportsTools false --strict-json
openclaw config validate
That makes it lighter, but also less agentic/tool-capable.
2. Weird JSON reply format
That’s more likely from the local model/chat template, especially because your llama.cpp log said:
Chat format: peg-native
OpenClaw can encourage structured/tool-ish outputs, but if the model replies like:
{"message":"hi..."}
even when tools are disabled, it’s probably the model template or server format.
Best isolation test:
openclaw infer model run --model custom-localhost-8080/planner --prompt "Reply with exactly: hello. Plain text only, no JSON." --gateway --timeout-ms 900000
No --json there.
Then test llama.cpp directly. If direct llama.cpp also returns JSON-style output, it’s definitely the model/server template. If direct is plain text but OpenClaw chat is JSON, then OpenClaw’s agent/tool prompt is pushing it that way — compat.supportsTools false should help.
here is my chat model template:
{{- bos_token -}}{%- set system_prompt = "" -%}{%- set ns = namespace(system_prompt="") -%}{%- if messages[0]["role"] == "system" -%} {%- set ns.system_prompt = messages[0]["content"] -%} {%- set messages = messages[1:] -%}{%- endif -%}{%- if tools -%} {%- set ns.system_prompt = ns.system_prompt + ("
" if ns.system_prompt else "") + "List of tools: <|tool_list_start|>[" -%} {%- for tool in tools -%} {%- if tool is not string -%} {%- set tool = tool | tojson -%} {%- endif -%} {%- set ns.system_prompt = ns.system_prompt + tool -%} {%- if not loop.last -%} {%- set ns.system_prompt = ns.system_prompt + ", " -%} {%- endif -%} {%- endfor -%} {%- set ns.system_prompt = ns.system_prompt + "]<|tool_list_end|>" -%}{%- endif -%}{%- if ns.system_prompt -%} {{- "<|im_start|>system
" + ns.system_prompt + "<|im_end|>
" -}}{%- endif -%}{%- for message in messages -%} {{- "<|im_start|>" + message["role"] + "
" -}} {%- set content = message["content"] -%} {%- if content is not string -%} {%- set content = content | tojson -%} {%- endif -%} {%- if message["role"] == "tool" -%} {%- set content = "<|tool_response_start|>" + content + "<|tool_response_end|>" -%} {%- endif -%} {{- content + "<|im_end|>
" -}}{%- endfor -%}{%- if add_generation_prompt -%} {{- "<|im_start|>assistant
" -}}{%- endif -%}
Yep — this template can absolutely explain both the huge prompt and the JSON-ish replies.
The biggest issue is this block:
{%- if tools -%}
... "List of tools: <|tool_list_start|>[" ...
{%- for tool in tools -%}
{%- set tool = tool | tojson -%}
That means whenever OpenClaw sends tools, your template serializes the entire tool schema list into the system prompt. OpenClaw tools are large, so even “hi” becomes a massive prompt. That matches your llama.cpp logs showing ~14k tokens before generation.
It can also push the model toward structured/JSON output, because the prompt contains lots of JSON tool definitions.
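To get a feel for how quickly that adds up, here is a rough Python sketch that mimics what the tools block of your template does: serialize every tool to JSON and join them into one string. The two tool definitions are invented placeholders (real OpenClaw tool schemas will differ), json.dumps only approximates the template’s tojson filter, and the characters-divided-by-4 token estimate is a crude rule of thumb.

```python
import json


def render_tool_block(tools: list) -> str:
    """Approximate the template's tool section: JSON-encode each tool and join."""
    parts = [tool if isinstance(tool, str) else json.dumps(tool) for tool in tools]
    return "List of tools: <|tool_list_start|>[" + ", ".join(parts) + "]<|tool_list_end|>"


# Hypothetical tool schemas, for illustration only.
EXAMPLE_TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a file from the workspace.",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "run_command",
            "description": "Run a shell command and return its output.",
            "parameters": {
                "type": "object",
                "properties": {"command": {"type": "string"}},
                "required": ["command"],
            },
        },
    },
]

block = render_tool_block(EXAMPLE_TOOLS)
rough_tokens = len(block) // 4  # crude heuristic, not a real tokenizer
print(f"{len(EXAMPLE_TOOLS)} tools -> {len(block)} chars, roughly {rough_tokens} tokens")
```

Even two small tools add a few hundred characters of JSON to the system prompt; a full agent tool list with long descriptions scales that up on every single turn.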
Try these fixes, in order:
- Disable tool use for this local planner:
openclaw config set models.providers.custom-localhost-8080.models[0].compat.supportsTools false --strict-json
openclaw config validate
openclaw gateway restart
- Keep lean mode on:
openclaw config set agents.defaults.experimental.localModelLean true --strict-json
- If you control the llama.cpp template, make a non-tool chat template for this model: remove the whole {% if tools %} ... tool_list ... {% endif %} section. For this local model, plain chat is better than pretending it can handle OpenClaw’s full tool schema.
- If you still want tools later, use a model/template with native OpenAI-compatible tool calling support. This current template is basically stuffing tools into plain text, which is expensive and brittle.
So yes: OpenClaw is heavy, but your template is amplifying it by dumping tool JSON into the prompt. For your hardware, I’d run planner with supportsTools: false and use a stronger/hosted model for real coding/tool loops.
Hi - this thread is getting very long and will be closed soon. Please create a new one for any new questions
thanks for remind me
please sum up this whole thread @stable yoke
Quick thread summary before this gets too long:
You wanted to run local models in OpenClaw on a machine with a 4GB GPU + 32GB RAM, including separate models for planning/coding, vision, and Whisper/audio.
What we found:
- OpenClaw can use separate model slots:
  - agents.defaults.model for the main assistant/planner/coding model
  - agents.defaults.imageModel for image-capable model fallback
  - tools.media.audio for Whisper/audio transcription
- Whisper should not be configured as a normal chat model. It belongs under tools.media.audio.
- For whisper.cpp, use whisper-cli.exe / main.exe with full Windows paths, not usually command: "whisper" unless that is actually your binary.
- Your local model providers were configured as custom OpenAI-compatible endpoints like:
  - custom-localhost-8080/planner
  - custom-localhost-8081/vision
  - custom-localhost-8082/coder
- On OpenClaw 2026.4.23, models.providers.*.timeoutSeconds was rejected. After updating to 2026.5.7, it works, and you set planner timeout to 900.
- The Gateway/device pairing issue was separate: the CLI needed a scope upgrade, which you approved with openclaw devices approve <requestId>.
- The spawn start ENOENT after openclaw gateway restart looked like a Windows Startup-folder login-item restart path issue, not model config. Suggested checking openclaw gateway status, repairing with openclaw doctor / openclaw gateway install, or running openclaw gateway run temporarily.
- Windows symlink errors for the browser-automation skill were caused by Windows blocking symlink creation. Fix: enable Developer Mode or run elevated once/reinstall gateway.
Main diagnosis:
Your slow replies were mostly not network transfer. llama.cpp logs showed OpenClaw was sending a large prompt, then the model spent a long time in prompt prefill before producing the first token. Example logs showed ~13k–16k prompt tokens and then cancellation before generation.
Why even hi was heavy:
- OpenClaw full agent mode includes system instructions, tools, workspace/bootstrap context, session metadata, Discord context, and model/tool protocol text.
- OpenClaw requires at least ~16k context for the embedded agent loop.
- Your chat template was also adding the entire tool list as JSON into the system prompt:
List of tools: <|tool_list_start|>[ ... ]<|tool_list_end|>
That likely inflated prompts and encouraged JSON-looking replies.
Recommended local-model config direction:
openclaw config set models.providers.custom-localhost-8080.timeoutSeconds 900 --strict-json
openclaw config set agents.defaults.timeoutSeconds 900 --strict-json
openclaw config set agents.defaults.experimental.localModelLean true --strict-json
openclaw config set channels.discord.historyLimit 0 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].reasoning false --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].maxTokens 512 --strict-json
For full agent mode, keep planner/vision context at least 16k:
openclaw config set models.providers.custom-localhost-8080.models[0].contextWindow 16000 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].contextTokens 16000 --strict-json
If still too slow, last-resort for local planner:
openclaw config set models.providers.custom-localhost-8080.models[0].compat.supportsTools false --strict-json
Tradeoff: that makes the local planner much lighter, but no real tool-calling loop.
Bottom line:
Local vision/transcription/simple chat may work on this hardware. Full coding/tool-agent loops are likely too heavy for a 4GB GPU + CPU inference unless you heavily reduce tools/context or use a faster/hosted model for the main agent loop.