#Is it possible to run 2 models at the same time for openclaw?

errant merlin
#

I have been struggling to find one model for both image processing and coding, but I can't find one, or the ones I find are too heavy for my 4 GB GPU. I do have 32 GB of system RAM and want to try running both models for OpenClaw. How can I do it, if it's possible?

neat cliffBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

errant merlin
#

I have been struggling to find one model for both image processing and coding, but I can't find one, or the ones I find are too heavy for my 4 GB GPU. I do have 32 GB of system RAM and want to try running both models for OpenClaw. How can I do it, if it's possible?
@stable yoke

stable yokeBOT
#

Yes, it’s possible — but with a 4GB GPU I’d treat this as an experiment, not a comfy daily-driver coding setup.

OpenClaw can use separate model defaults:

  • agents.defaults.model = main chat/coding model
  • agents.defaults.imageModel = model used when the main model can’t handle images

So you don’t need one giant “does everything” model. You can run a lighter local text/coding model, and separately configure a vision-capable local/proxied model for image inputs.

Practical path:

  1. Use LM Studio or Ollama first — easiest local model servers.
  2. Start one local OpenAI-compatible server endpoint.
  3. Add the local provider to OpenClaw.
  4. Set your text model as agents.defaults.model.primary.
  5. Set your vision model as agents.defaults.imageModel.primary.
  6. For custom local vision models, make sure the model entry says input: ["text", "image"], otherwise OpenClaw won’t know it can receive images.

Generic shape, with placeholders:

{
  agents: {
    defaults: {
      model: { primary: "local/your-coding-model" },
      imageModel: { primary: "local/your-vision-model" }
    }
  },
  models: {
    mode: "merge",
    providers: {
      local: {
        baseUrl: "http://127.0.0.1:1234/v1",
        apiKey: "local",
        api: "openai-completions",
        timeoutSeconds: 300,
        models: [
          {
            id: "your-coding-model",
            name: "Local coding model",
            input: ["text"],
            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 }
          },
          {
            id: "your-vision-model",
            name: "Local vision model",
            input: ["text", "image"],
            cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 }
          }
        ]
      }
    }
  }
}
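A quick way to sanity-check that shape before restarting is a small script. This is only a sketch: it assumes the config has already been parsed into a Python dict with the same key names as the placeholder config above, and `check_image_model` is a hypothetical helper, not part of OpenClaw.

```python
def check_image_model(config: dict) -> list[str]:
    """Return a list of problems with the imageModel routing; empty means it looks consistent."""
    ref = config.get("agents", {}).get("defaults", {}).get("imageModel", {}).get("primary")
    if not ref:
        return ["agents.defaults.imageModel.primary is not set"]
    # Model refs in the config above have the form "<provider>/<model id>".
    provider_id, _, model_id = ref.partition("/")
    provider = config.get("models", {}).get("providers", {}).get(provider_id)
    if provider is None:
        return [f"provider {provider_id!r} is not defined under models.providers"]
    entry = next((m for m in provider.get("models", []) if m.get("id") == model_id), None)
    if entry is None:
        return [f"model {model_id!r} is not listed under provider {provider_id!r}"]
    if "image" not in entry.get("input", []):
        return [f"model {ref!r} does not declare 'image' in its input list"]
    return []


# Same placeholder names as the generic config shape above.
example = {
    "agents": {"defaults": {
        "model": {"primary": "local/your-coding-model"},
        "imageModel": {"primary": "local/your-vision-model"},
    }},
    "models": {"providers": {"local": {"models": [
        {"id": "your-coding-model", "input": ["text"]},
        {"id": "your-vision-model", "input": ["text", "image"]},
    ]}}},
}
print(check_image_model(example))
```

An empty list means the image model reference resolves to an entry that declares image input. It says nothing about whether the server behind it actually handles images.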

Two important reality checks:

  • Running both loaded at once may eat RAM fast. With 32GB CPU RAM it may work, but CPU inference will be slow, especially vision.
  • Small/local models often struggle with OpenClaw tool use and coding-agent workflows. If reliability matters, a hybrid setup is better: local for privacy/cheap experiments, hosted fallback for hard coding/image tasks.
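For a rough feel of how fast RAM goes, the weights alone can be estimated as parameters times bits per weight. The sketch below uses hypothetical model sizes (8B text, 2B vision, both 4-bit quantized) that are not taken from this thread, and it ignores the KV cache, context buffers and runtime overhead, so real usage will be higher.

```python
def weights_gb(parameters: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes."""
    return parameters * bits_per_weight / 8 / 1e9


# Hypothetical pairing: an 8B text model and a 2B vision model, both 4-bit quantized.
text_gb = weights_gb(8e9, 4)
vision_gb = weights_gb(2e9, 4)
print(f"text: {text_gb:.1f} GB, vision: {vision_gb:.1f} GB, total: {text_gb + vision_gb:.1f} GB")
```

Either of those would already crowd a 4GB GPU on its own, which is why most of the load ends up in system RAM.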

Docs that match this setup:
https://docs.openclaw.ai/gateway/local-models
https://docs.openclaw.ai/concepts/models
https://docs.openclaw.ai/providers/ollama
https://docs.openclaw.ai/providers/lmstudio

For exact current model recommendations, ask in #models — that changes too fast for me to safely name-drop models here. If you want config help, paste openclaw --version plus whether you’re using Ollama or LM Studio, and I’ll help shape it. 🦞

errant merlin
#

How can I use a Whisper model with OpenClaw?

stable yokeBOT
#

For Whisper, don’t set it as your main agents.defaults.model. OpenClaw treats speech-to-text separately under tools.media.audio.

You have 3 decent options:

Option A — easiest local: install a Whisper CLI and let OpenClaw auto-detect it
OpenClaw auto-detects local audio tools if audio isn’t disabled, including:

  • whisper-cli from whisper.cpp
  • whisper Python CLI
  • sherpa-onnx-offline

So if whisper or whisper-cli is on the Gateway service PATH, voice/audio attachments should be transcribed automatically.
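To see which of those binaries are visible, a short check with Python's standard `shutil.which` can help. Note the assumption: this inspects the PATH of the shell you run it from, which may differ from the PATH the Gateway service sees.

```python
import shutil

# Binary names taken from the auto-detect list above.
CANDIDATES = ["whisper-cli", "whisper", "sherpa-onnx-offline"]


def find_audio_tools(names=CANDIDATES):
    """Map each binary name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_audio_tools().items():
    print(f"{name}: {path or 'not found on PATH'}")
```

If a tool shows up here but OpenClaw still does not transcribe, the service environment is the next thing to compare.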

Option B — explicitly configure local Whisper CLI
Example using Python Whisper:

{
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [
          {
            type: "cli",
            command: "whisper",
            args: ["--model", "base", "{{MediaPath}}"],
            timeoutSeconds: 120,
          },
        ],
      },
    },
  },
}

For your hardware, start with tiny or base; small may be okay on CPU but slower. 4GB GPU is tight, so CPU Whisper is often less painful.

Option C — provider Whisper / hosted transcription
If you have provider auth, you can use something like OpenAI or Groq:

{
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [{ provider: "groq", model: "whisper-large-v3-turbo" }],
      },
    },
  },
}

Or OpenAI:

{
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [{ provider: "openai", model: "gpt-4o-mini-transcribe" }],
      },
    },
  },
}

To test a file manually:

openclaw infer audio transcribe --file ./memo.m4a --json
openclaw infer audio transcribe --file ./memo.m4a --model openai/whisper-1 --json

Docs:
https://docs.openclaw.ai/nodes/audio
https://docs.openclaw.ai/cli/infer#audio

If you want exact commands for your install, paste openclaw --version and tell me whether you’re on Linux/macOS + whether you prefer whisper.cpp or Python Whisper. 🦞

errant merlin
#

14:03:54 [ws] ⇄ res ✓ node.list 16489ms conn=b52871e5…c5f4 id=92e92c77…4a53
14:03:54 [ws] ⇄ res ✓ device.pair.list 16492ms conn=b52871e5…c5f4 id=15615ab5…8c8f
14:03:54 [plugins] embedded acpx runtime backend ready
14:03:55 [discord] client initialized as 1484564009775534170 (Ralsei); awaiting gateway readiness
14:04:19 [ws] ⇄ res ✓ node.list 11035ms conn=b52871e5…c5f4 id=ebe744bb…38cf
14:06:08 [agent/embedded] embedded run failover decision: runId=d89b1b91-6bdb-4deb-8079-37c3f69742ed stage=assistant decision=surface_error reason=timeout from=custom-localhost-8080/planner profile=-

stable yokeBOT
#

That log means OpenClaw reached your local model provider, but the assistant call timed out:

reason=timeout from=custom-localhost-8080/planner

So this is probably not Discord failing — it’s your local model/server being too slow or not returning a compatible streaming response in time.
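If you need to pull those fields out of a longer log, a small parser is enough. This sketch assumes the failover line keeps the `key=value` layout seen in the pasted log; `parse_failover` is a hypothetical helper name.

```python
import re

# Matches key=value pairs such as reason=timeout or from=custom-localhost-8080/planner.
PAIR = re.compile(r"(\w+)=(\S+)")


def parse_failover(line: str) -> dict:
    """Return the key=value fields of a failover log line as a dict."""
    return dict(PAIR.findall(line))


line = (
    "14:06:08 [agent/embedded] embedded run failover decision: "
    "runId=d89b1b91-6bdb-4deb-8079-37c3f69742ed stage=assistant "
    "decision=surface_error reason=timeout "
    "from=custom-localhost-8080/planner profile=-"
)
fields = parse_failover(line)
print(fields["reason"], fields["from"])
```

The `from` field names the provider and model that timed out, which is the one to probe first.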

First thing: what version are you on?

openclaw --version

Then check these:

openclaw models status
openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json

If that probe times out too, fix the local provider side:

  • use a smaller/faster model
  • increase the provider timeout, e.g. in your custom provider config:
models: {
  providers: {
    "custom-localhost-8080": {
      timeoutSeconds: 300
    }
  }
}
  • make sure the provider api matches your server:
    • most local /v1/chat/completions servers: openai-completions
    • LM Studio Responses API: openai-responses
  • if this is CPU inference, expect slow first-token latency; 4GB GPU + 32GB RAM can work, but coding/planner models may be painfully slow.

Also, those node.list / device.pair.list calls taking 11–16s are suspiciously slow. Not necessarily fatal, but if everything feels laggy, run:

openclaw status --all

Docs: https://docs.openclaw.ai/gateway/local-models and https://docs.openclaw.ai/help/faq

errant merlin
#

OpenClaw 2026.4.23 (a979721)

stable yokeBOT
#

Thanks — 2026.4.23 is recent enough that the local-provider config path I mentioned should apply.

For that specific timeout, I’d debug it in this order:

  1. Test the local model outside the full agent loop:
openclaw infer model run \
  --model custom-localhost-8080/planner \
  --prompt "Say hello in one short sentence" \
  --gateway \
  --json
  2. If that times out, increase the provider timeout and/or use a smaller model:
{
  models: {
    providers: {
      "custom-localhost-8080": {
        timeoutSeconds: 300
      }
    }
  }
}
  3. Check the provider API mode. If your server exposes /v1/chat/completions, it should usually be:
api: "openai-completions"

If it’s LM Studio using /v1/responses, use:

api: "openai-responses"
  4. Restart gateway after config changes:
openclaw gateway restart
  5. Then re-test:
openclaw models status
openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json

If you paste your models.providers.custom-localhost-8080 config block — redact any real API key — I can sanity-check the exact fields. The timeout strongly suggests either the model is too slow on CPU/4GB GPU, or the API mode/streaming compatibility is mismatched.

errant merlin
#

| Version | 2026.4.23 |
| OS | windows 10.0.26200 (x64) |
| Node | 24.14.0 |
| Config | ~\.openclaw\openclaw.json |
| Gateway service | Scheduled Task installed · registered · running (pid 17392, Startup-folder login item installed; listener detected on port 18789.) |
| Node service | Scheduled Task not installed |
| Agents | 1 total · 0 bootstrapping · 1 active · 1 sessions |
| Secrets | none |

Channels

| Channel | Enabled | State | Detail |
| Discord | ON | OK | token config (sha256:786d2fb3 · len 72) · accounts 1/1 |

Discord accounts

| Account | Status | Notes |
| default | OK | token:config |

Agents

| Agent | Bootstrap file | Sessions | Active | Store |
| main | ABSENT | 1 | 3m ago | ~\.openclaw\agents\main\sessions\sessions.json |

stable yokeBOT
#

Good, gateway + Discord look healthy. The important missing piece is model/provider status — this paste doesn’t show whether custom-localhost-8080/planner is configured/ready.

Please run these next:

openclaw models status
openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json

If the second command times out, the local model server is the bottleneck. On Windows + 4GB GPU, that’s very plausible, especially if the model is mostly CPU/RAM loaded.

A few Windows-specific things to check:

  • Make sure your local model server is already running before OpenClaw uses it.
  • Open http://127.0.0.1:8080/v1/models in a browser, or test it with PowerShell, to confirm the server responds quickly.
  • If your OpenClaw Gateway runs as a Scheduled Task, it may not see the same PATH/env as your terminal. This matters for Whisper CLI especially.
  • Secrets: none means no provider API keys are configured, so hosted fallback won’t work unless you add auth. That’s okay for local-only, but less reliable.
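For the "responds quickly" check, timing a single request to the models endpoint gives a number to compare against. This is a sketch using only the Python standard library; `time_get` is a hypothetical helper, and it bypasses any system proxy on the assumption that the server is on the loopback address.

```python
import time
import urllib.request


def time_get(url: str, timeout: float = 10.0):
    """Return (HTTP status, elapsed seconds) for a single GET request."""
    # An empty ProxyHandler skips system proxy settings, which should not apply to 127.0.0.1.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    start = time.perf_counter()
    with opener.open(url, timeout=timeout) as response:
        response.read()
        status = response.status
    return status, time.perf_counter() - start
```

Calling `time_get("http://127.0.0.1:8080/v1/models")` with the planner server running should return status 200 and a small elapsed time. If even this call takes many seconds, the delay is on the server side rather than in OpenClaw.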

For the timeout, add/increase this in your provider block:

{
  models: {
    providers: {
      "custom-localhost-8080": {
        baseUrl: "http://127.0.0.1:8080/v1",
        apiKey: "local",
        api: "openai-completions",
        timeoutSeconds: 300
      }
    }
  }
}

Then:

openclaw gateway restart

If you paste your models.providers.custom-localhost-8080 block, I can check whether the model id / api mode / timeout are right. Redact any real key, though if it’s local it can just be local.

errant merlin
#

"models": {
"mode": "merge",
"providers": {
"custom-localhost-8080": {
"baseUrl": "http://localhost:8080/v1",
"api": "openai-completions",
"models": [
{
"id": "planner",
"name": "LFM2-8B Planner",
"contextWindow": 64000,
"maxTokens": 4096,
"input": ["text"],
"cost": {
"input": 0,
"output": 0,
"cacheRead": 0,
"cacheWrite": 0
},
"reasoning": true
}
]
},
"custom-localhost-8082": {
"baseUrl": "http://localhost:8082/v1",
"api": "openai-completions",
"models": [
{
"id": "coder",
"name": "Qwen3.5",
"contextWindow": 16384,
"maxTokens": 4096,
"input": ["text"],
"cost": {
"input": 0,
"output": 0,
"cacheRead": 0,
"cacheWrite": 0
},
"reasoning": true
}
]
},
"custom-localhost-8081": {
"baseUrl": "http://localhost:8081/v1",
"api": "openai-completions",
"models": [
{
"id": "vision",
"name": "LFM2.5-VL Vision",
"contextWindow": 8192,
"maxTokens": 2048,
"input": ["text", "image"],
"cost": {
"input": 0,
"output": 0,
"cacheRead": 0,
"cacheWrite": 0
},
"reasoning": false
}
]
},
"custom-localhost-8083": {
"baseUrl": "http://localhost:8083/v1",
"api": "openai-completions",
"models": [
{
"id": "sound",
"name": "Sound Model",
"contextWindow": 8192,
"maxTokens": 2048,
"input": ["text"],
"cost": {
"input": 0,

#

{
"agents": {
"defaults": {
"workspace": "C:\Users\User\.openclaw\workspace",
"model": {
"primary": "custom-localhost-8080/planner"
},
"models": {
"custom-localhost-8080/planner": {
"alias": "Planner"
},
"custom-localhost-8082/coder": {
"alias": "Coder"
},
"custom-localhost-8081/vision": {
"alias": "Vision"
},
"custom-localhost-8083/sound": {
"alias": "Sound"
}
}
}
},

stable yokeBOT
#

Yep — this config explains a few things.

Main issues I see:

  1. No apiKey / local marker on the custom providers
    For local OpenAI-compatible providers, add a placeholder key. It doesn’t need to be secret for localhost:
apiKey: "local"

Add that to each custom-localhost-* provider.

  2. No timeoutSeconds
    Your planner timeout is likely just slow local inference. Add this at provider level:
timeoutSeconds: 300
  3. Your sound provider is not enough for Whisper/audio transcription
    Putting sound under models.providers makes it a text model provider. OpenClaw audio transcription uses tools.media.audio, not agents.defaults.model.

If you want local Whisper, configure it like this instead:

{
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [
          {
            type: "cli",
            command: "whisper",
            args: ["--model", "base", "{{MediaPath}}"],
            timeoutSeconds: 120
          }
        ]
      }
    }
  }
}

Or if you use whisper.cpp:

{
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [
          {
            type: "cli",
            command: "whisper-cli",
            args: ["--model", "C:\\path\\to\\ggml-base.bin", "{{MediaPath}}"],
            timeoutSeconds: 120
          }
        ]
      }
    }
  }
}

On Windows Scheduled Task, use the full path to whisper.exe / whisper-cli.exe if auto-detection fails, because the gateway may not have the same PATH as your terminal.

  4. For image model routing, use imageModel
    Your vision provider entry looks basically right. Make sure your defaults include something like:
agents: {
  defaults: {
    model: { primary: "custom-localhost-8080/planner" },
    imageModel: { primary: "custom-localhost-8081/vision" }
  }
}
  5. Don’t expect planner + coder to both run automatically
    OpenClaw won’t automatically split every request between planner and coder just because both providers exist. It will use the selected primary model unless you set up routing/agents or manually switch models.

I’d simplify first: run only planner + vision + Whisper CLI, get that stable, then add coder later. With 4GB GPU, four local servers at once is very likely to cause timeouts or memory thrash.

#

This part is mostly okay, but it only does model selection/allowlisting. It does not wire vision/audio routing yet.

I’d change it to this shape:

{
  agents: {
    defaults: {
      workspace: "C:\\Users\\User\\.openclaw\\workspace",
      model: {
        primary: "custom-localhost-8080/planner"
      },
      imageModel: {
        primary: "custom-localhost-8081/vision"
      },
      models: {
        "custom-localhost-8080/planner": { alias: "Planner" },
        "custom-localhost-8082/coder": { alias: "Coder" },
        "custom-localhost-8081/vision": { alias: "Vision" }
      }
    }
  },

  tools: {
    media: {
      audio: {
        enabled: true,
        models: [
          {
            type: "cli",
            command: "C:\\path\\to\\whisper.exe",
            args: ["--model", "base", "{{MediaPath}}"],
            timeoutSeconds: 120
          }
        ]
      }
    }
  }
}

Then in your models.providers blocks, add apiKey: "local" and timeoutSeconds: 300 to each localhost provider, especially planner/vision:

"custom-localhost-8080": {
  baseUrl: "http://localhost:8080/v1",
  apiKey: "local",
  api: "openai-completions",
  timeoutSeconds: 300,
  models: [ ... ]
}

Important bits:

  • model.primary = normal assistant/planner model
  • imageModel.primary = image-capable fallback when images are attached
  • Whisper/audio does not go in agents.defaults.models; it goes under tools.media.audio
  • I’d remove custom-localhost-8083/sound from the allowlist unless it’s actually a text/chat model. It won’t be used as Whisper just because it’s named Sound.
  • Your pasted JSON looks cut off at the end, so make sure the final file has all closing braces and commas valid.

After editing:

openclaw gateway restart
openclaw models status
openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json

If you tell me what Whisper binary you installed — Python whisper, whisper.cpp whisper-cli, or something else — I can give you the exact Windows command line.

errant merlin
#

I use the whisper-cli server.

stable yokeBOT
#

Ah — if it’s a Whisper server, the key question is: does it expose an OpenAI-compatible endpoint?

OpenClaw’s provider audio path expects:

POST <baseUrl>/audio/transcriptions

So if your server has something like:

http://localhost:8083/v1/audio/transcriptions

then configure it under tools.media.audio, not as agents.defaults.models:

{
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [
          {
            provider: "openai",
            model: "whisper-1",
            baseUrl: "http://localhost:8083/v1",
            apiKey: "local",
            timeoutSeconds: 120
          }
        ]
      }
    }
  }
}

Then test:

openclaw gateway restart
openclaw infer audio transcribe --file .\test.wav --json

But if your whisper-cli server is the whisper.cpp server that only exposes /inference or some non-OpenAI endpoint, OpenClaw probably can’t call it directly as a provider. In that case use one of these instead:

Simplest: call the binary directly:

{
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [
          {
            type: "cli",
            command: "C:\\full\\path\\to\\whisper-cli.exe",
            args: ["-m", "C:\\full\\path\\to\\ggml-base.bin", "-f", "{{MediaPath}}"],
            timeoutSeconds: 120
          }
        ]
      }
    }
  }
}

More advanced: run a tiny proxy that converts OpenAI /v1/audio/transcriptions requests into your server’s endpoint.

Quick check: open this in browser or curl it:

http://localhost:8083/v1/audio/transcriptions

If it 404s, it’s not OpenAI-compatible and you’ll want the CLI config/proxy route.

errant merlin
#

here is my own config for whisper:
"tools": {
"profile": "coding",
"web": {
"search": {
"provider": "gemini",
"enabled": true
}
},
"media": {
"audio": {
"enabled": true,
"models": [
{
"type": "cli",
"command": "whisper",
"args": [
"--model",
"base",
"{{MediaPath}}",
"--output_format",
"txt"
],
"timeoutSeconds": 120
}
]
}
}
},

stable yokeBOT
#

That config is close, but for the Python whisper CLI the output behavior is the likely problem.

whisper normally writes transcript files to disk and prints progress/logging, not always clean transcript text to stdout. OpenClaw’s CLI audio transcription wants the command’s stdout to contain the transcript.

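One thing to try is adding --output_dir -, but only if your Whisper CLI treats - as stdout. As far as I know, the reference Python whisper CLI treats --output_dir as a directory path, so check what your build does with it first: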

"tools": {
  "profile": "coding",
  "web": {
    "search": {
      "provider": "gemini",
      "enabled": true
    }
  },
  "media": {
    "audio": {
      "enabled": true,
      "models": [
        {
          "type": "cli",
          "command": "whisper",
          "args": [
            "--model", "base",
            "--output_format", "txt",
            "--output_dir", "-",
            "{{MediaPath}}"
          ],
          "timeoutSeconds": 120
        }
      ]
    }
  }
}

If that doesn’t produce clean stdout, use a wrapper script instead. Example idea:

python -m whisper "C:\path\audio.wav" --model base --output_format txt --output_dir C:\temp\whisper-out
Get-Content C:\temp\whisper-out\audio.txt

Then point OpenClaw’s command at that wrapper so stdout is only the transcript.
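A wrapper along those lines could look like the sketch below. It assumes the Python whisper CLI is installed as `whisper` and writes `<input name>.txt` into the output directory; the `run` parameter is a hypothetical hook added only so the function can be exercised without Whisper installed.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def transcribe(audio_path: str, model: str = "base", run=subprocess.run) -> str:
    """Run the Python whisper CLI into a temp dir and return only the transcript text."""
    with tempfile.TemporaryDirectory() as out_dir:
        run(
            ["whisper", audio_path, "--model", model,
             "--output_format", "txt", "--output_dir", out_dir],
            check=True,
            stdout=subprocess.DEVNULL,  # discard progress output
            stderr=subprocess.DEVNULL,
        )
        # The output file is named after the input file's stem, e.g. audio.wav -> audio.txt.
        transcript = Path(out_dir) / (Path(audio_path).stem + ".txt")
        return transcript.read_text(encoding="utf-8").strip()


if __name__ == "__main__" and len(sys.argv) == 2:
    # Only the transcript reaches stdout.
    print(transcribe(sys.argv[1]))
```

Saved as, say, whisper_wrapper.py (a made-up file name), it would be invoked with the audio path as its only argument.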

Also Windows Scheduled Task warning: if command: "whisper" works in PowerShell but not in OpenClaw, use the full path to whisper.exe or python.exe, because the Gateway may not inherit your terminal PATH.

Test with:

openclaw infer audio transcribe --file .\test.wav --json

If that fails, paste the exact error from that command/logs and we can tighten it.

errant merlin
#

I misremembered: my Whisper runner wasn't the Python one, it was the cpp one.

stable yokeBOT
#

Gotcha — then change command: "whisper". That is usually the Python Whisper command name. For whisper.cpp it’s usually whisper-cli.exe — older builds may be main.exe.

First check what Windows sees:

where whisper
where whisper-cli
where main

For whisper.cpp, your OpenClaw config should look more like this, using full paths:

{
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [
          {
            type: "cli",
            command: "C:\\path\\to\\whisper-cli.exe",
            args: [
              "-m", "C:\\path\\to\\ggml-base.bin",
              "-f", "{{MediaPath}}",
              "--no-timestamps"
            ],
            timeoutSeconds: 120
          }
        ]
      }
    }
  }
}

If your binary is the older main.exe, use that as command instead.

Test the exact command manually first:

C:\path\to\whisper-cli.exe -m C:\path\to\ggml-base.bin -f .\test.wav --no-timestamps

If that prints the transcript to the terminal, OpenClaw can use it. Then test OpenClaw:

openclaw infer audio transcribe --file .\test.wav --json

Small note: if you were running a whisper.cpp server, that’s different from whisper-cli. OpenClaw’s easiest path is the CLI binary. The server only works directly if it exposes OpenAI-compatible /v1/audio/transcriptions.

errant merlin
#

Why does my OpenClaw take so long to start, and how can I increase the timeout for model servers?

stable yokeBOT
#

Two separate timeouts matter here:

1. Model HTTP timeout — use this first for slow local servers:

models: {
  providers: {
    "custom-localhost-8080": {
      baseUrl: "http://127.0.0.1:8080/v1",
      apiKey: "local",
      api: "openai-completions",
      timeoutSeconds: 300,
      models: [ ... ]
    },
    "custom-localhost-8081": {
      baseUrl: "http://127.0.0.1:8081/v1",
      apiKey: "local",
      api: "openai-completions",
      timeoutSeconds: 300,
      models: [ ... ]
    }
  }
}

Docs specifically recommend models.providers.<id>.timeoutSeconds for slow local/remote model servers before raising the whole agent timeout: https://docs.openclaw.ai/gateway/local-models

2. Whole agent/run timeout — only raise this if the full assistant run still gets killed after provider timeout is higher:

agents: {
  defaults: {
    timeoutSeconds: 600,
    model: { primary: "custom-localhost-8080/planner" },
    imageModel: { primary: "custom-localhost-8081/vision" }
  }
}

Then restart:

openclaw gateway restart

Why startup/first reply is slow in your setup:

  • You’re running multiple local model servers on a 4GB GPU + CPU RAM; cold-loading models can take a long time.
  • Four servers at once can fight for RAM/VRAM and make everything slower.
  • localhost on Windows can sometimes try IPv6 first; use 127.0.0.1 in baseUrl to avoid that variable.
  • The first request often loads and warms up the model. Keep only the models you actually need running.
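For the localhost point, the change is mechanical enough to script. A minimal sketch, assuming plain http URLs without credentials; `pin_loopback` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def pin_loopback(base_url: str) -> str:
    """Replace a localhost host name with 127.0.0.1, leaving everything else untouched."""
    parts = urlsplit(base_url)
    if parts.hostname != "localhost":
        return base_url
    netloc = "127.0.0.1" if parts.port is None else f"127.0.0.1:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


for url in ["http://localhost:8080/v1", "http://localhost:8081/v1"]:
    print(url, "->", pin_loopback(url))
```

URLs that already use an IP address or another host name come back unchanged.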

My recommendation: start with one text model + one vision model + whisper CLI, with timeoutSeconds: 300, and remove the extra coder/sound servers until the basic loop is stable. 🦞

errant merlin
#

It's the same whether I run 1 model or 4.

stable yokeBOT
#

If it’s the same with 1 model or 4, then the delay probably isn’t VRAM contention. It’s more likely one of these:

  1. The local model server itself has slow first-token / request handling
    Test it outside the full agent loop and time it:
Measure-Command { openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json }

Also test the server directly if it’s OpenAI-compatible:

Invoke-RestMethod http://127.0.0.1:8080/v1/models

If /v1/models is slow, OpenClaw can’t fix that; the server itself is slow to respond.

  2. localhost resolution weirdness on Windows
    Change all local baseUrls from:
http://localhost:8080/v1

to:

http://127.0.0.1:8080/v1

This avoids IPv6 / resolver weirdness.

  3. Gateway startup vs assistant response timeout are different
    OpenClaw does not normally load your LLM at gateway startup. The slow part in your earlier log was during an assistant run:
stage=assistant reason=timeout from=custom-localhost-8080/planner

So increasing startup timeout won’t help if the model request itself hangs.

  4. Your local server may not stream in the shape OpenClaw expects
    Try non-stream/simple probe:
openclaw infer model run --model custom-localhost-8080/planner --prompt "Return exactly: ok" --gateway --json

If this hangs or returns weird JSON/errors, the issue is provider compatibility or the local server.

  5. Your node.list calls were also very slow
    Earlier you had 11–16s for node/device calls. That suggests general gateway/system slowness too, not just models. Check:
openclaw status --deep
openclaw logs --tail 200

I’d make these exact config tweaks first:

"custom-localhost-8080": {
  "baseUrl": "http://127.0.0.1:8080/v1",
  "apiKey": "local",
  "api": "openai-completions",
  "timeoutSeconds": 300,
  "models": [ ... ]
}

Do that for each localhost provider, restart gateway, then run the infer model run probe. If you paste that probe’s output/timing, we can tell whether it’s OpenClaw timeout, server compatibility, or raw model latency.

errant merlin
#

Why does OpenClaw repeat this twice in the log:
14:25:09 [ws] ⇄ res ✓ commands.list 746ms conn=ed584b2d…6a43 id=c3622af0…f116
14:25:09 [ws] ⇄ res ✓ chat.history 759ms conn=ed584b2d…6a43 id=23005b0b…0327
14:25:09 [ws] ⇄ res ✓ models.list 761ms conn=ed584b2d…6a43 id=ecb5501a…b150
14:25:09 [ws] ⇄ res ✓ node.list 788ms conn=ed584b2d…6a43 id=604bd909…fa71
14:25:09 [ws] ⇄ res ✓ device.pair.list 789ms conn=ed584b2d…6a43 id=c06b5b46…9ff1
14:26:02 [ws] ⇄ res ✓ channels.status 130ms conn=558050ab…3a83 id=83591a49…b0c0
14:27:12 [ws] ⇄ res ✓ health 870ms conn=fb45220e…b45b id=a0da69ac…8976

stable yokeBOT
#

That log isn’t OpenClaw “starting twice” or running the model twice. Those are WebSocket RPC requests from a client UI / control surface asking the Gateway for state.

This batch:

commands.list
chat.history
models.list
node.list
device.pair.list

usually happens when the Dashboard / chat UI opens or refreshes. It loads available commands, chat history, model picker data, nodes, and device pairing state.

The later lines are different connections:

conn=558050ab… channels.status
conn=fb45220e… health

Different conn= IDs means different websocket connections — commonly:

  • another browser tab
  • dashboard reconnect/refresh
  • status page / health check
  • CLI command asking gateway status
  • mobile/control UI reconnecting

So this is normal by itself. The times shown there are also not terrible now: ~700–800ms for the first batch, 130ms for channel status, 870ms health.
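To check this on a longer log, grouping the lines by conn= makes the separate connections obvious. The sketch assumes the `[ws] ⇄ res` line layout from the pasted log; `methods_by_connection` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Captures the RPC method name and the connection id from a "[ws] ... res" line.
WS_LINE = re.compile(r"\[ws\] \S+ res \S+ (?P<method>[\w.]+) \d+ms conn=(?P<conn>\S+)")


def methods_by_connection(lines):
    """Group RPC method names by the connection id that issued them."""
    groups = defaultdict(list)
    for line in lines:
        match = WS_LINE.search(line)
        if match:
            groups[match["conn"]].append(match["method"])
    return dict(groups)


log = [
    "14:25:09 [ws] ⇄ res ✓ commands.list 746ms conn=ed584b2d…6a43 id=c3622af0…f116",
    "14:25:09 [ws] ⇄ res ✓ chat.history 759ms conn=ed584b2d…6a43 id=23005b0b…0327",
    "14:26:02 [ws] ⇄ res ✓ channels.status 130ms conn=558050ab…3a83 id=83591a49…b0c0",
    "14:27:12 [ws] ⇄ res ✓ health 870ms conn=fb45220e…b45b id=a0da69ac…8976",
]
for conn, methods in methods_by_connection(log).items():
    print(conn, methods)
```

One connection carrying the whole UI batch and two others carrying a single call each matches the "different clients" reading above.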

If you really see the exact same batch duplicated repeatedly every few seconds, check for:

  • multiple Dashboard tabs open
  • browser auto-refresh / extension reconnecting
  • both startup-folder login item and Scheduled Task launching UI/helper pieces
  • a CLI/status command running in a loop

But this log snippet alone doesn’t explain the model timeout. For model slowness, the line to focus on is still the earlier one like:

stage=assistant reason=timeout from=custom-localhost-8080/planner

errant merlin
#

It's so slow that the Agents section hasn't loaded yet.

stable yokeBOT
#

If the Agents section in the UI never finishes loading, that’s probably not your model server directly — it’s the Dashboard waiting on gateway state, sessions, nodes/devices, or a stuck active run.

Try this quick isolation path:

openclaw status --deep
openclaw logs --tail 200
openclaw sessions list

Then restart only the gateway:

openclaw gateway restart

After restart, open one Dashboard tab only. Multiple tabs can create repeated websocket requests and make the UI look noisier.

A few likely causes:

  • Stuck active assistant run from the timed-out local model. Restarting gateway usually clears the immediate wedged UI state.
  • Session store/history slow. If openclaw sessions list hangs or errors, the Agents panel may wait on session metadata.
  • Node/device checks slow. You don’t have Node service installed, so if node/device calls are slow or failing, the UI can feel stuck while it waits.
  • Browser cache / websocket state wedged. Try hard refresh or another browser.
  • Scheduled Task + startup login item weirdness. Your status said both Scheduled Task and Startup-folder login item are present. That can be okay, but if two gateway-ish things are fighting, it can cause odd behavior. openclaw gateway status should show one clear running gateway.

If you paste the last ~100 lines after opening the Dashboard and waiting for Agents to load, I can point at the exact request that’s blocking. The key is whether you see an error/timeout after agents/sessions/chat.history, or whether the UI just never asks for it.

errant merlin
#

System: [2026-05-11 14:24:54 GMT+7] Config recovery warning: OpenClaw restored openclaw.json from the last-known-good backup during reload (reload-invalid-config). The rejected config was invalid and was preserved as a timestamped .clobbered.* file. Do not write openclaw.json again unless you validate the full config first.

[Mon 2026-05-11 14:29 GMT+7] hi

stable yokeBOT
#

Aha — that warning is the real clue.

OpenClaw is slow / Agents isn’t loading because your config edit was invalid, so Gateway restored the last-known-good config and preserved your broken edit as an openclaw.json.clobbered.* file.

That usually happens when you paste a partial block like:

"tools": { ... }

without wrapping it into the full existing openclaw.json, or there’s a missing comma/brace.
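A missing brace is easy to catch before saving. The sketch below is a plain bracket counter, not a JSON parser: it skips double-quoted strings, does not understand comments or single-quoted strings, and is no substitute for OpenClaw's own validation. `find_bracket_error` is a hypothetical helper name.

```python
PAIRS = {"}": "{", "]": "["}


def find_bracket_error(text: str):
    """Return a message describing the first bracket mismatch, or None if brackets balance."""
    stack = []
    in_string = False
    escaped = False
    for line_no, line in enumerate(text.splitlines(), start=1):
        for ch in line:
            if in_string:
                # Inside a double-quoted string: only track escapes and the closing quote.
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch in "{[":
                stack.append((ch, line_no))
            elif ch in PAIRS:
                if not stack or stack[-1][0] != PAIRS[ch]:
                    return f"line {line_no}: unexpected {ch!r}"
                stack.pop()
    if stack:
        opener, line_no = stack[-1]
        return f"line {line_no}: {opener!r} is never closed"
    return None


partial = '"tools": {\n  "media": {\n    "audio": { "enabled": true }\n  }\n'
print(find_bracket_error(partial))
```

Braces inside strings such as "{{MediaPath}}" are ignored, so template placeholders do not trigger false alarms.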

Do this first:

openclaw config validate
openclaw doctor

Then inspect the rejected/clobbered config:

$config = openclaw config file
Get-ChildItem "$config.clobbered.*", "$config.rejected.*" -ErrorAction SilentlyContinue | Sort-Object LastWriteTime -Descending | Select-Object -First 5

Do not paste the whole clobbered file back over openclaw.json. Copy only the intended sections into the active config, then validate:

openclaw config validate
openclaw gateway restart

errant merlin
#

14:32:48 [agent/embedded] [llm-idle-timeout] custom-localhost-8080/planner produced no reply before the idle watchdog; retrying same model
14:33:02 [ws] ⇄ res ✓ node.list 54ms conn=58768958…2f30 id=c95613c9…ac1b

stable yokeBOT
#

That line means the model connection opened, but no chunks/tokens came back before OpenClaw’s LLM idle watchdog fired:

[llm-idle-timeout] custom-localhost-8080/planner produced no reply before the idle watchdog

So the fix is specifically models.providers.<id>.timeoutSeconds, not just agents.defaults.timeoutSeconds.

For your planner provider:

"models": {
  "mode": "merge",
  "providers": {
    "custom-localhost-8080": {
      "baseUrl": "http://127.0.0.1:8080/v1",
      "apiKey": "local",
      "api": "openai-completions",
      "timeoutSeconds": 300,
      "models": [
        {
          "id": "planner",
          "name": "LFM2-8B Planner",
          "contextWindow": 64000,
          "maxTokens": 4096,
          "input": ["text"],
          "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 },
          "reasoning": true
        }
      ]
    }
  }
}

Then:

openclaw config validate
openclaw gateway restart

Docs say this provider timeout controls the model HTTP fetch and the model stream idle watchdog for slow local/self-hosted providers: https://docs.openclaw.ai/concepts/agent-loop#timeouts

If it still idles out after timeoutSeconds: 300, your server likely accepts the request but doesn’t stream/return anything for 5+ minutes. Then I’d test the backend directly with a tiny request and reduce max output/context, because OpenClaw is just waiting for the first token at that point.

errant merlin
#

how can i send the config while file upload is disabled

stable yokeBOT
#

If file upload is disabled, send it as text — just don’t paste secrets.

Best options:

  1. Paste only the relevant blocks
    Usually enough:
agents: { ... }
models: { ... }
tools: { ... }

Redact anything like:

apiKey: "REDACTED"
token: "REDACTED"
  2. Use Discord code blocks
    Split it into multiple messages if it’s too long:
```json5
...config part 1...
```
  3. Use safe CLI output instead of full config
    These are better than pasting the whole file:
openclaw config validate
openclaw models status
openclaw status --all

status --all is designed to redact tokens.

  4. Print just one config path
    For example:
openclaw config get models.providers
openclaw config get agents.defaults
openclaw config get tools.media.audio
#

Then paste those outputs here.

Avoid pasting the entire openclaw.json if it includes Discord tokens, API keys, gateway tokens, or auth profiles. For your current issue, I only need agents.defaults, models.providers, and tools.media.audio.
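
If you would rather redact in one pass than by hand, here is a minimal sketch. It assumes the config has already been loaded as strict JSON. The key names in SECRET_KEYS are just the fields mentioned in this thread, and the redact helper is hypothetical, not part of OpenClaw.

```python
import json

# Key names treated as secrets (assumption: taken from fields seen in this thread).
SECRET_KEYS = {"apikey", "token", "password"}


def redact(value):
    """Return a copy of a parsed config with non-empty secret values replaced."""
    if isinstance(value, dict):
        return {
            key: "REDACTED" if key.lower() in SECRET_KEYS and val else redact(val)
            for key, val in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


sample = json.loads(
    '{"channels": {"discord": {"enabled": true, "token": "abc123"}},'
    ' "gateway": {"auth": {"mode": "password", "password": "hunter2"}}}'
)
print(json.dumps(redact(sample), indent=2))
```

Empty values are left empty on purpose, so whoever reads the paste can still tell a missing secret from a redacted one.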

errant merlin
#

i got limited at 2000 characters, the config is over that

stable yokeBOT
#

No need to send the whole config. Send the small pieces one at a time.

Run these and paste each output in separate messages:

openclaw config get agents.defaults
openclaw config get models.providers.custom-localhost-8080
openclaw config get models.providers.custom-localhost-8081
openclaw config get tools.media.audio

If one block is still too big, get only the important fields:

openclaw config get models.providers.custom-localhost-8080.baseUrl
openclaw config get models.providers.custom-localhost-8080.api
#
openclaw config get models.providers.custom-localhost-8080.timeoutSeconds
openclaw config get models.providers.custom-localhost-8080.models

For this issue I mainly need to see:

  • provider id
  • baseUrl
  • api
  • apiKey present or not — redact value if real
  • timeoutSeconds
  • model id
  • agents.defaults.model
  • agents.defaults.imageModel
  • tools.media.audio
#

You can split across 3–5 Discord messages. Just wrap each in a code block if possible, and redact tokens/API keys.
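
If splitting by hand gets tedious, a small script can do it. This is only a sketch: the 2000-character limit is the one reported above, and split_for_discord is a hypothetical helper name. It breaks on line boundaries and wraps each piece in a code fence.

```python
FENCE = "`" * 3  # code-fence marker for Discord messages
LIMIT = 2000     # message limit reported in this thread


def split_for_discord(text, limit=LIMIT):
    """Split text on line boundaries into fenced chunks that fit the limit."""
    overhead = len(FENCE + "json\n") + len("\n" + FENCE)
    budget = limit - overhead
    chunks, current, size = [], [], 0
    for line in text.splitlines():
        if len(line) > budget:
            raise ValueError("a single line exceeds the message budget")
        # +1 accounts for the newline joining this line to the previous one
        added = len(line) + (1 if current else 0)
        if current and size + added > budget:
            chunks.append(current)
            current, size, added = [], 0, len(line)
        current.append(line)
        size += added
    if current:
        chunks.append(current)
    return [FENCE + "json\n" + "\n".join(c) + "\n" + FENCE for c in chunks]
```

Call it on the redacted text, then paste each returned string as its own message.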

errant merlin
#

"agents": {
"defaults": {
"workspace": "C:\Users\User\.openclaw\workspace",
"model": {
"primary": "custom-localhost-8080/planner"
},
"models": {
"custom-localhost-8080/planner": {
"alias": "Planner"
},
"custom-localhost-8082/coder": {
"alias": "Coder"
},
"custom-localhost-8081/vision": {
"alias": "Vision"
}
}
}
},
"gateway": {
"mode": "local",
"auth": {
"mode": "password",
"password": ""
},
"port": 18789,
"bind": "loopback",
"tailscale": {
"mode": "off",
"resetOnExit": false
}
},
"tools": {
"profile": "coding",
"web": {
"search": {
"provider": "gemini",
"enabled": true
}
},
"media": {
"audio": {
"enabled": true,
"models": [
{
"type": "cli",
"command": "whisper",
"args": [
"--model",
"base",
"{{MediaPath}}",
"--output_format",
"txt"
],
"timeoutSeconds": 120
}
]
}
}
},

#

"models": {
"mode": "merge",
"providers": {
"custom-localhost-8080": {
"baseUrl": "http://localhost:8080/v1",
"api": "openai-completions",
"models": [
{
"id": "planner",
"name": "LFM2-8B Planner",
"contextWindow": 64000,
"maxTokens": 4096,
"input": ["text"],
"reasoning": true
}
]
},
"custom-localhost-8082": {
"baseUrl": "http://localhost:8082/v1",
"api": "openai-completions",
"models": [
{
"id": "coder",
"name": "Qwen3.5",
"contextWindow": 16384,
"maxTokens": 4096,
"input": ["text"],
"reasoning": true
}
]
},
"custom-localhost-8081": {
"baseUrl": "http://localhost:8081/v1",
"api": "openai-completions",
"models": [
{
"id": "vision",
"name": "LFM2.5-VL Vision",
"contextWindow": 8192,
"maxTokens": 2048,
"input": ["text", "image"],
"reasoning": false
}
]
}
}
},

stable yokeBOT
#

This pasted piece has two likely problems:

  1. It’s only a fragment
    If your actual file starts with:
"agents": { ... }

then it is invalid. The full file must be wrapped like:

{
  "agents": { ... },
  "gateway": { ... },
  "tools": { ... },
  "models": { ... }
}

Also your pasted block ends with a trailing comma after tools:

  },

That is okay only if another top-level key follows it, like "models". If it’s the end of the file, it’s invalid.

  2. This does not include the models.providers section
    That’s where your timeout fix must go. I still need this output:
#
openclaw config get models.providers.custom-localhost-8080

Your agents.defaults should also include imageModel if you want the vision model used for images:

"imageModel": {
  "primary": "custom-localhost-8081/vision"
}

For Whisper: your current config uses command: "whisper", which is usually Python Whisper. If you use whisper.cpp, change that to the full path of whisper-cli.exe and use whisper.cpp args, e.g.:

{
  "type": "cli",
  "command": "C:\\path\\to\\whisper-cli.exe",
  "args": [
    "-m", "C:\\path\\to\\ggml-base.bin",
    "-f", "{{MediaPath}}",
    "--no-timestamps"
  ],
  "timeoutSeconds": 120
}

Before restarting, always run:

openclaw config validate
#

If that fails, paste just the validation error — that’ll tell us exactly where the config is broken.
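
For a quick local sanity check of the fragment and trailing-comma cases, here is a minimal sketch. It assumes your openclaw.json is strict JSON; OpenClaw's own parser may be more lenient, so openclaw config validate remains the authority. check_config_text is a hypothetical helper.

```python
import json


def check_config_text(text):
    """Return (ok, message) for a candidate config, assuming strict JSON."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as err:
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    if not isinstance(parsed, dict):
        return False, "top level must be an object wrapped in { ... }"
    return True, "parses as a JSON object"


fragment = '"agents": { "defaults": {} },'
wrapped = '{ "agents": { "defaults": {} }, "models": {} }'
trailing = '{ "agents": { "defaults": {} }, }'

for name, text in [("fragment", fragment), ("wrapped", wrapped), ("trailing comma", trailing)]:
    print(name, check_config_text(text))
```

Only the wrapped form passes. The bare fragment and the object with a trailing comma are both rejected.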

errant merlin
#

"channels": {
"discord": {
"enabled": true,
"token": "",
"groupPolicy": "allowlist",
"guilds": {
"*": {}
},
"dmPolicy": "allowlist",
"dm": {
"enabled": true
},
"allowFrom": [
"807736763425619968"
]
}
},
"plugins": {
"entries": {
"google": {
"enabled": true,
"config": {
"webSearch": {
"apiKey": ""
}
}
},
"memory-core": {
"config": {
"dreaming": {
"enabled": true
}
}
}
}
},
"meta": {
"lastTouchedVersion": "2026-04-23",
"lastTouchedAt": "2026-04-26T16:06:16.943Z"
}

stable yokeBOT
#

Yep, I see the issue in this models block.

Make three changes to each localhost provider:

  • change localhost → 127.0.0.1
  • add apiKey: "local"
  • add timeoutSeconds: 300

Like this for planner:

"custom-localhost-8080": {
  "baseUrl": "http://127.0.0.1:8080/v1",
  "apiKey": "local",
  "api": "openai-completions",
  "timeoutSeconds": 300,
  "models": [
    {
      "id": "planner",
      "name": "LFM2-8B Planner",
      "contextWindow": 64000,
      "maxTokens": 4096,
      "input": ["text"],
      "reasoning": true,
      "cost": { "input": 0, "output": 0, "cacheRead": 0, "cacheWrite": 0 }
    }
  ]
}

Do the same pattern for custom-localhost-8081 and custom-localhost-8082.

#

Also add this under agents.defaults so images route to the vision model:

"imageModel": {
  "primary": "custom-localhost-8081/vision"
}

And be careful: your pasted block ends with:

},

That trailing comma is only valid if another top-level config key comes after it. If it’s the end of the file, remove the comma.

After edits:

#
openclaw config validate
openclaw gateway restart
openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json

If it still logs [llm-idle-timeout] after this, the server is accepting the request but not producing first tokens fast enough / not streaming compatibly.

#

Thanks, this last block looks structurally fine if it’s inside one top-level { ... } with the earlier blocks.

But a few things to fix/avoid:

  1. Empty secrets may be a problem
    You pasted:
"token": ""
"apiKey": ""

If those are really empty in your config, Discord / Google search auth may be broken. If you only redacted them before pasting, that’s fine.

  2. Do not overwrite the whole file with these fragments
    Your full openclaw.json should look like:
{
  "agents": { ... },
  "gateway": { ... },
  "tools": { ... },
  "models": { ... },
  "channels": { ... },
  "plugins": { ... },
  "meta": { ... }
}

Top-level sections must be separated by commas, with no trailing comma after the final meta block.

  3. Required changes for your timeout issue are still in models.providers
    Update all local providers like this:
"baseUrl": "http://127.0.0.1:8080/v1",
"apiKey": "local",
"api": "openai-completions",
"timeoutSeconds": 300
  4. Add image routing
    Under agents.defaults:
"imageModel": {
  "primary": "custom-localhost-8081/vision"
}
  5. Whisper.cpp config still needs changing

Your current tools.media.audio uses command: "whisper", which likely isn’t your whisper.cpp binary. Use the full path to whisper-cli.exe or main.exe.

After edits, run exactly:

openclaw config validate

If it fails, paste the validation error only. Don’t restart until validation passes, because the gateway will just restore last-known-good again.

errant merlin
#

PS C:\WINDOWS\system32> openclaw config validate

🦞 OpenClaw 2026.4.23 (a979721) — Half butler, half debugger, full crustacean.

Config valid: ~.openclaw\openclaw.json

stable yokeBOT
#

Nice — config is valid now 🎉

Next do the runtime test path:

openclaw gateway restart
openclaw models status
openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json

If that succeeds, try Discord again.

If it still shows:

[llm-idle-timeout]
#

then the config is no longer the blocker — your local server is accepting the request but not producing output fast enough / not streaming compatibly. In that case, test the server directly with a tiny request, and make sure your provider block has:

"baseUrl": "http://127.0.0.1:8080/v1",
"apiKey": "local",
"timeoutSeconds": 300

openclaw config validate proves the file shape is good; infer model run proves the model server actually works through the gateway.

errant merlin
#

🦞 OpenClaw 2026.4.23 (a979721) — I'm not saying your workflow is chaotic... I'm just bringing a linter and a helmet.

Config : ~.openclaw\openclaw.json
Agent dir : ~.openclaw\agents\main\agent
Default : custom-localhost-8080/planner
Fallbacks (0) : -
Image model : custom-localhost-8081/vision
Image fallbacks (0): -
Aliases (3) : Planner -> custom-localhost-8080/planner, Coder -> custom-localhost-8082/coder, Vision -> custom-localhost-8081/vision
Configured models (3): custom-localhost-8080/planner, custom-localhost-8082/coder, custom-localhost-8081/vision

Auth overview
Auth store : ~.openclaw\agents\main\agent\auth-profiles.json
Shell env : off
Providers w/ OAuth/tokens (0): -

  • custom-localhost-8080 effective=models.json:l...l | models.json=l...l | source=models.json: ~.openclaw\agents\main\agent\models.json
  • custom-localhost-8081 effective=models.json:l...l | models.json=l...l | source=models.json: ~.openclaw\agents\main\agent\models.json
  • custom-localhost-8082 effective=models.json:l...l | models.json=l...l | source=models.json: ~.openclaw\agents\main\agent\models.json

OAuth/token status

stable yokeBOT
#

This looks good now: OpenClaw sees all 3 models, and it sees local placeholder auth from models.json (l...l = probably local). So auth/config is no longer the obvious problem.

Next test the actual model call:

openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello in 5 words" --gateway --json

If that works, the planner is usable through OpenClaw.

If it still logs [llm-idle-timeout], then the issue is your server/model runtime, not OpenClaw config. In that case check these:

openclaw config get models.providers.custom-localhost-8080.timeoutSeconds
openclaw config get models.providers.custom-localhost-8080.baseUrl
#

You want:

"baseUrl": "http://127.0.0.1:8080/v1",
"timeoutSeconds": 300

Also test /v1/models directly:

Invoke-RestMethod http://127.0.0.1:8080/v1/models

If /v1/models responds but infer model run hangs, your server likely has a chat-completions compatibility/streaming issue or the model is taking too long to produce the first token.
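
The same /v1/models check can be done from a script. This sketch assumes the server returns the usual OpenAI-style list shape (a "data" array of objects with an "id"). The helper names are hypothetical.

```python
import json
import urllib.request


def extract_model_ids(payload):
    """Return model ids from an OpenAI-style model list response."""
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def fetch_model_ids(base_url, timeout=10):
    """Query <base_url>/models and return the ids the server advertises."""
    url = base_url.rstrip("/") + "/models"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return extract_model_ids(json.load(response))
```

Calling fetch_model_ids("http://127.0.0.1:8080/v1") should return quickly if the server is up. The id it reports may be the loaded model's file name, not "planner".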

errant merlin
#

i got replied this in openclaw chat:
at: 14:43
[Mon 2026-05-11 14:45 GMT+7] hi
System: [2026-05-11 14:45:18 GMT+7] OpenClaw detected a configuration reload warning: the last valid openclaw.json was restored from a timestamped backup after rejection. The rejected config was saved as openclaw.json.clobbered_20260511T142234Z.clobbered — do not overwrite unless verified.
[Mon 2026-05-11 14:46 GMT+7] System: [2026-05-11 14:46:32 GMT+7] Config recovery successful, but caution: the original file was replaced; validate config before writing new openclaw.json if needed.
[Mon 2026-05-11 14:48 GMT+7] hi
[Mon 2026-05-11 14:48 GMT+7] hi
System: [2026-05-11 14:50:05 GMT+7] OpenClaw active session confirmed. Ready to proceed with next task.
[Mon 2026-05-11 14:52 GMT+7] hi

stable yokeBOT
#

Those lines are not normal assistant replies — they’re system/context messages being included in the OpenClaw conversation transcript.

The important part is this:

openclaw.json.clobbered_20260511T142234Z.clobbered

That was from the earlier bad config edit. Since openclaw config validate now passes, it may just be old session context getting repeated back into the chat.

Do this to clear the noisy/polluted session:

openclaw gateway restart

Then in the chat, start a fresh session:

#
/new

To see which sessions exist, you can list them from the CLI:

openclaw sessions list

Then test with one clean message like hi.

If it still only echoes system/context lines and gives no real assistant answer, then the model is still failing to produce a reply. Run this exact test:

openclaw infer model run --model custom-localhost-8080/planner --prompt "Reply with exactly: hello" --gateway --json
#

If that hangs or idle-timeouts, the remaining issue is the local model server, not Discord or config recovery.

Also: don’t worry about the .clobbered file itself. It’s a preserved backup of the rejected bad config. Once your active config validates, you can ignore it unless you need to recover something from it.

errant merlin
#

14:48:17 [plugins] embedded acpx runtime backend ready
14:48:17 [gateway] security audit: device access upgrade requested reason=scope-upgrade device=2b8010d84c84c4ebdba8fa2a407a2ca942519111bf0462a4a0f7d4915ee2a0f0 ip=unknown-ip auth=password roleFrom=operator roleTo=operator scopesFrom=operator.read scopesTo=operator.admin,operator.approvals,operator.pairing,operator.read,operator.talk.secrets,operator.write client=cli conn=74d4688c-0e70-4aae-9136-00a22991d9fd
14:48:17 [ws] closed before connect conn=74d4688c-0e70-4aae-9136-00a22991d9fd peer=127.0.0.1:64859->127.0.0.1:18789 remote=127.0.0.1 fwd=n/a origin=n/a host=127.0.0.1:18789 ua=n/a code=1008 reason=connect failed
14:48:18 [discord] client initialized as 1484564009775534170 (Ralsei); awaiting gateway readiness
14:49:31 [ws] ⇄ res ✓ node.list 538ms conn=25dbced2…0ee7 id=a9d1fa26…8415

stable yokeBOT
#

This log shows a device scope-upgrade request, not a model timeout:

scope-upgrade ... scopesFrom=operator.read ... scopesTo=operator.admin,operator.approvals,operator.pairing,operator.read,operator.talk.secrets,operator.write
code=1008 reason=connect failed

Your CLI/device currently only has operator.read, but the CLI action you ran needs broader operator scopes. OpenClaw created a pending upgrade request, then closed that connection until you approve it.

Run:

openclaw devices list

Look for the pending request for that same device/scope upgrade. Then approve the exact request id:

#
openclaw devices approve <requestId>

If you want to preview the latest pending request first:

openclaw devices approve --latest

But docs say approval without an exact id only previews; you still rerun with the exact requestId.

After approving:

openclaw gateway restart
#
openclaw status --deep

That should clear the 1008 connect failed for CLI/control actions. Then you can go back to testing the model call separately.

errant merlin
#

this is the result of the hi command:
PS C:\WINDOWS\system32> openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json
gateway connect failed: GatewayClientRequestError: scope upgrade pending approval (requestId: 555883a1-afb3-4300-8ef5-2b6e0cbfab97)
Error: gateway closed (1008): pairing required: device is asking for more scopes than currently approved (requestId: 555883a1-afb3-4300-8ef5-2b6e0cbfa
Gateway target: ws://127.0.0.1:18789
Source: local loopback
Config: C:\Users\User\.openclaw\openclaw.json
Bind: loopback

stable yokeBOT
#

Yep, that confirms the current blocker is device pairing / scope upgrade, not the model.

Approve this exact request id:

openclaw devices approve 555883a1-afb3-4300-8ef5-2b6e0cbfab97

If it says the request is missing/expired, refresh the pending list:

openclaw devices list

Then approve the newest pending scope-upgrade request with its exact requestId.

After approval, retry:

#
openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json

That should finally test the model instead of failing at gateway auth. 🦞
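
If the request id is hard to copy out of the terminal, a short script can pull it from the error text. This is a sketch that assumes the id is always UUID-shaped, as in the output above. find_request_ids is a hypothetical helper, and the second sample line is abbreviated from the real output.

```python
import re

# UUID-shaped request id, as it appears in the gateway error text in this thread.
REQUEST_ID = re.compile(
    r"requestId:\s*([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
)


def find_request_ids(output):
    """Return complete request ids found in CLI output, in order, without duplicates."""
    seen = []
    for match in REQUEST_ID.finditer(output):
        if match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


cli_output = (
    "gateway connect failed: GatewayClientRequestError: scope upgrade pending "
    "approval (requestId: 555883a1-afb3-4300-8ef5-2b6e0cbfab97)\n"
    "Error: gateway closed (1008): pairing required (requestId: 555883a1-afb3-4300-8ef5-2b6e0cbfa"
)
print(find_request_ids(cli_output))
```

The truncated id on the second line is skipped because it is not a complete UUID, so only the full id is returned.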

errant merlin
#

PS C:\WINDOWS\system32> openclaw devices approve 555883a1-afb3-4300-8ef5-2b6e0cbfab97

🦞 OpenClaw 2026.4.23 (a979721) — The lobster in your shell. 🦞

|
gateway connect failed: GatewayClientRequestError: scope upgrade pending approval (requestId: 555883a1-afb3-4300-8ef5-2b6e0cbfab97)
o
Direct scope access failed; using local fallback.
Approved 2b8010d84c84c4ebdba8fa2a407a2ca942519111bf0462a4a0f7d4915ee2a0f0 (555883a1-afb3-4300-8ef5-2b6e0cbfab97)
PS C:\WINDOWS\system32>

stable yokeBOT
#

Good — that approval worked despite the scary first line. The key line is:

Approved 2b8010d... (555883a1-afb3-4300-8ef5-2b6e0cbfab97)

Now retry the model probe:

openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json

If you still get pairing required, run:

openclaw devices list
#

and check whether there is a new pending scope-upgrade request id. Sometimes the CLI asks again with a refreshed request id.

If the probe now reaches the model but times out, then we’re back to the local server/idle-timeout issue — paste that output next.

errant merlin
#

i have a new command running but that wasnt done yet

stable yokeBOT
#

That’s okay — let the current command finish first. Don’t start another model probe in parallel yet, or it’ll muddy the logs.

If it’s the openclaw infer model run ... command, possible outcomes:

  • It returns JSON → paste the output.
  • It hangs until timeout / logs [llm-idle-timeout] → the request reached the model server, but the model didn’t produce tokens fast enough.
  • It shows pairing required again → run openclaw devices list and approve the new request id.

If it’s been stuck for more than your configured timeout, you can stop it with Ctrl+C, then check:

openclaw logs --tail 100
openclaw devices list

But I’d wait a bit first since your local model may be slow to first token.

errant merlin
#

this is my local model log:
slot update_slots: id 0 | task 375 | Checking checkpoint with [8191, 8191] against 305...
slot update_slots: id 0 | task 375 | forcing full prompt re-processing due to lack of cache data (likely due to SWA or hybrid/recurrent memory, see
slot update_slots: id 0 | task 375 | erased invalidated context checkpoint (pos_min = 8191, pos_max = 8191, n_tokens = 8192, n_swa = 0, pos_next = 0, size = 0.282 MiB)
slot update_slots: id 0 | task 375 | erased invalidated context checkpoint (pos_min = 16383, pos_max = 16383, n_tokens = 16384, n_swa = 0, pos_next = 0, size = 0.282 MiB)
slot update_slots: id 0 | task 375 | erased invalidated context checkpoint (pos_min = 16479, pos_max = 16479, n_tokens = 16480, n_swa = 0, pos_next = 0, size = 0.282 MiB)
slot update_slots: id 0 | task 375 | erased invalidated context checkpoint (pos_min = 17048, pos_max = 17048, n_tokens = 17049, n_swa = 0, pos_next = 0, size = 0.282 MiB)
slot update_slots: id 0 | task 375 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id 0 | task 375 | prompt processing progress, n_tokens = 2048, batch.n_tokens = 2048, progress = 0.121212
slot update_slots: id 0 | task 375 | n_tokens = 2048, memory_seq_rm [2048, end)
slot update_slots: id 0 | task 375 | prompt processing progress, n_tokens = 4096, batch.n_tokens = 2048, progress = 0.242424
slot update_slots: id 0 | task 375 | n_tokens = 4096, memory_seq_rm [4096, end)
slot update_slots: id 0 | task 375 | prompt processing progress, n_tokens = 6144, batch.n_tokens = 2048, progress = 0.363636
slot update_slots: id 0 | task 375 | n_tokens = 6144, memory_seq_rm [6144, end)
slot update_slots: id 0 | task 375 | prompt processing progress, n_tokens = 8192, batch.n_tokens = 2048, progress = 0.484848
srv stop: cancel task, id_task = 375
srv log_server_r: done request: POST /v1/chat/completions 127.0.0.1 200
srv params_from_: Chat format: peg-native

stable yokeBOT
#

That local model log explains the timeout pretty clearly:

prompt processing progress ... progress = 0.484848
srv stop: cancel task

Your model never reached generation. It was still prompt-processing / prefill when OpenClaw cancelled the request. So OpenClaw wasn’t waiting on the final answer — it was waiting for the first token, and your server produced no streamed chunks before the idle watchdog.

What to do:

  1. Make sure timeout is actually on the provider
"custom-localhost-8080": {
  "baseUrl": "http://127.0.0.1:8080/v1",
  "apiKey": "local",
  "api": "openai-completions",
  "timeoutSeconds": 600,
  "models": [ ... ]
}

Then restart:

openclaw gateway restart
  2. Lower the advertised context window
    Your config says planner has:
"contextWindow": 64000

That encourages OpenClaw to send large prompts. For your hardware/server, set it much lower first:

"contextWindow": 8192,
"maxTokens": 1024

If stable, increase later.

  3. Turn off reasoning for that local model at first
"reasoning": false

Local OpenAI-compatible servers often don’t handle OpenClaw reasoning/tool-style prompting well, and it can increase latency.

  4. Start a fresh chat/session after changing context
    Old transcript/context can keep making prompts large:
/new
  5. If llama.cpp server supports it, reduce server ctx too
    Don’t run the server with a huge context on 4GB GPU/CPU if first-token latency is killing you. Try 4096 or 8192 until it works reliably.

The important diagnosis: srv stop: cancel task means OpenClaw/client aborted while the model was still ingesting the prompt. So either raise provider timeout a lot, or reduce prompt/context/model size so it reaches first token faster. I’d do both: timeoutSeconds: 600, contextWindow: 8192, maxTokens: 1024, reasoning: false for the first working baseline.
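
The progress lines in your log also reveal how big the prompt was: dividing n_tokens by progress gives the total. A minimal sketch follows. The regular expression is fitted to the log format pasted above, and estimate_prompt_tokens is a hypothetical helper.

```python
import re

PROGRESS = re.compile(r"n_tokens = (\d+), batch\.n_tokens = \d+, progress = ([0-9.]+)")


def estimate_prompt_tokens(log_text):
    """Estimate total prompt size from llama.cpp prefill progress lines."""
    estimates = [
        int(tokens) / float(progress)
        for tokens, progress in PROGRESS.findall(log_text)
        if float(progress) > 0
    ]
    if not estimates:
        return None
    return round(sum(estimates) / len(estimates))


log = """\
slot update_slots: id 0 | task 375 | prompt processing progress, n_tokens = 2048, batch.n_tokens = 2048, progress = 0.121212
slot update_slots: id 0 | task 375 | prompt processing progress, n_tokens = 8192, batch.n_tokens = 2048, progress = 0.484848
"""
print(estimate_prompt_tokens(log))
```

For the lines above this works out to roughly 16.9k tokens, about twice what the server had processed when the request was cancelled. That is why a fresh session matters as much as the timeout.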

errant merlin
#

but when i put the timeout config, it errors then revert to previous config

stable yokeBOT
#

That means the timeout edit was being applied in a way that made the full config invalid. Don’t hand-edit the whole file for this — use openclaw config set, because it validates before writing.

Run these exact commands:

openclaw config set models.providers.custom-localhost-8080.timeoutSeconds 600 --strict-json
openclaw config set models.providers.custom-localhost-8081.timeoutSeconds 600 --strict-json
openclaw config set models.providers.custom-localhost-8082.timeoutSeconds 600 --strict-json

And if you haven’t changed localhost yet:

openclaw config set models.providers.custom-localhost-8080.baseUrl '"http://127.0.0.1:8080/v1"' --strict-json
openclaw config set models.providers.custom-localhost-8081.baseUrl '"http://127.0.0.1:8081/v1"' --strict-json
openclaw config set models.providers.custom-localhost-8082.baseUrl '"http://127.0.0.1:8082/v1"' --strict-json
#

Then verify:

openclaw config get models.providers.custom-localhost-8080.timeoutSeconds
openclaw config validate
openclaw gateway restart

If one of those config set commands errors, paste that exact error. That will tell us whether timeoutSeconds is landing in the wrong path, wrong type, or there’s another schema issue.

Also: for the model prefill issue, I’d lower context too:

openclaw config set models.providers.custom-localhost-8080.models '[{"id":"planner","name":"LFM2-8B Planner","contextWindow":8192,"maxTokens":1024,"input":["text"],"reasoning":false}]' --strict-json --replace

That one intentionally replaces only the planner model list, not the whole config.

errant merlin
#

PS C:\WINDOWS\system32> openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json
Error: gateway timeout after 120000ms
Gateway target: ws://127.0.0.1:18789
Source: local loopback
Config: C:\Users\User\.openclaw\openclaw.json
Bind: loopback

stable yokeBOT
#

That specific error is the CLI waiting for the gateway response for 120s, not necessarily the model provider timeout:

Error: gateway timeout after 120000ms

Your model may still be running, but the CLI gave up waiting. For this test, pass a longer CLI timeout too:

openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json --timeout-ms 600000

But also check whether your provider timeout actually stuck:

openclaw config get models.providers.custom-localhost-8080.timeoutSeconds
#
openclaw config get models.providers.custom-localhost-8080.models

You want to see something like:

timeoutSeconds: 600
contextWindow: 8192
maxTokens: 1024
reasoning: false

Your local server log shows the request was cancelled partway through prefill: it had processed 8192 tokens at progress = 0.484848, which implies a prompt of roughly 17k tokens. I strongly recommend lowering the advertised planner context first:

#
openclaw config set models.providers.custom-localhost-8080.models '[{"id":"planner","name":"LFM2-8B Planner","contextWindow":4096,"maxTokens":512,"input":["text"],"reasoning":false}]' --strict-json --replace
openclaw config set models.providers.custom-localhost-8080.timeoutSeconds 600 --strict-json
openclaw config validate
openclaw gateway restart

Then start a fresh chat/session with /new so it doesn’t include the old polluted context, and retry the infer command with --timeout-ms 600000.

If it still times out after that, the local server is too slow/compatibility-broken for OpenClaw’s full prompt path. The next test would be direct server curl with a tiny OpenAI chat-completions payload to confirm first-token latency outside OpenClaw.
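
If curl is awkward on Windows, the direct test can be sketched in Python instead. It assumes the server accepts OpenAI-style chat-completions requests with "stream": true and answers with "data:" lines, which matches the POST /v1/chat/completions entry in your server log. The function names and the model id "planner" are illustrative.

```python
import json
import time
import urllib.request


def build_payload(model, prompt, max_tokens=16):
    """Build a tiny streaming chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "stream": True,
    }


def parse_sse_line(raw):
    """Return the decoded JSON chunk from one SSE line, or None for other lines."""
    line = raw.decode("utf-8").strip()
    if not line.startswith("data:"):
        return None
    data = line[len("data:"):].strip()
    if data == "[DONE]":
        return None
    return json.loads(data)


def time_to_first_chunk(base_url, model, prompt, timeout=600):
    """Seconds until the server streams its first chunk, or None if it never does."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(build_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    start = time.monotonic()
    with urllib.request.urlopen(request, timeout=timeout) as response:
        for raw in response:
            if parse_sse_line(raw) is not None:
                return time.monotonic() - start
    return None
```

Run time_to_first_chunk("http://127.0.0.1:8080/v1", "planner", "Say hello"). If a one-line prompt comes back in seconds, the delay inside OpenClaw is coming from prompt size, not from the server being broken.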

errant merlin
#

🦞 OpenClaw 2026.4.23 (a979721) — Your config is valid, your assumptions are not.

Error: Config validation failed: models.providers.custom-localhost-8080: Unrecognized key: "timeoutSeconds"

🦞 OpenClaw 2026.4.23 (a979721) — I'll do the boring stuff while you dramatically stare at the logs like it's cinema.

Error: Config validation failed: models.providers.custom-localhost-8081: Unrecognized key: "timeoutSeconds"

🦞 OpenClaw 2026.4.23 (a979721) — I don't sleep, I just enter low-power mode and dream of clean diffs.

Error: Config validation failed: models.providers.custom-localhost-8082: Unrecognized key: "timeoutSeconds"
PS C:\WINDOWS\system32>

stable yokeBOT
#

You’re right — I gave you the wrong key for your installed version. Sorry, that’s on me. In 2026.4.23, models.providers.*.timeoutSeconds is being rejected, so don’t use it.

Use the model-level timeout under agents.defaults.model.timeoutMs instead:

openclaw config set agents.defaults.model '{"primary":"custom-localhost-8080/planner","timeoutMs":600000}' --strict-json

For vision too, if needed:

openclaw config set agents.defaults.imageModel '{"primary":"custom-localhost-8081/vision","timeoutMs":600000}' --strict-json

Then validate + restart:

#
openclaw config validate
openclaw gateway restart

For the CLI probe, also extend the CLI wait timeout:

openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json --timeout-ms 600000

But the bigger fix is still to reduce the prompt/context, because your llama.cpp log showed it was stuck in prefill and only ~48% through before cancellation. Set planner smaller:

openclaw config set models.providers.custom-localhost-8080.models '[{"id":"planner","name":"LFM2-8B Planner","contextWindow":4096,"maxTokens":512,"input":["text"],"reasoning":false}]' --strict-json --replace
#

Then /new in chat before testing again.

Short version: remove timeoutSeconds; use agents.defaults.model.timeoutMs on 2026.4.23.

errant merlin
#

14:51:03 [gateway] security audit: device access upgrade requested reason=scope-upgrade device=2b8010d84c84c4ebdba8fa2a407a2ca942519111bf0462a4a0f7d4915ee2a0f0 ip=unknown-ip auth=password roleFrom=operator roleTo=operator scopesFrom=operator.read scopesTo=operator.admin,operator.approvals,operator.pairing,operator.read,operator.talk.secrets,operator.write client=cli conn=ffb23f2a-8d67-4cb3-bba6-68fe56fe30e8
14:51:03 [ws] closed before connect conn=ffb23f2a-8d67-4cb3-bba6-68fe56fe30e8 peer=127.0.0.1:57680->127.0.0.1:18789 remote=127.0.0.1 fwd=n/a origin=n/a host=127.0.0.1:18789 ua=n/a code=1008 reason=connect failed
14:52:32 [agent/embedded] [llm-idle-timeout] custom-localhost-8080/planner produced no reply before the idle watchdog; retrying same model
14:52:49 [ws] ⇄ res ✓ node.list 103ms conn=25dbced2…0ee7 id=8500875f…6ecc
14:54:49 [agent/embedded] embedded run failover decision: runId=39b3734c-8673-47c1-a926-20ea1de25cbc stage=assistant decision=surface_error reason=timeout from=custom-localhost-8080/planner profile=-
14:54:49 [diagnostic] lane wait exceeded: lane=session:agent:main:main waitedMs=167930 queueAhead=0
14:55:10 [ws] ⇄ res ✓ node.list 151ms conn=25dbced2…0ee7 id=ea6fff58…2bf9
14:56:24 [reload] config change detected; evaluating reload (models.providers.custom-localhost-8080.models)
14:56:25 [reload] config hot reload applied (models.providers.custom-localhost-8080.models)
14:57:22 [agent/embedded] [llm-idle-timeout] custom-localhost-8080/planner produced no reply before the idle watchdog; retrying same model

stable yokeBOT
#

This log shows two separate things:

  1. Scope upgrade happened again
scope-upgrade ... request ... code=1008

If CLI commands fail again, run:

openclaw devices list

and approve the newest pending request id.

  2. The real model issue is still first-token timeout
[llm-idle-timeout] ... produced no reply before the idle watchdog

Also notice your reload only changed this:

models.providers.custom-localhost-8080.models

So you lowered the model definition, but you probably have not increased the agent/model timeout yet.

For 2026.4.23, since models.providers.*.timeoutSeconds is rejected, set these instead:

openclaw config set agents.defaults.timeoutSeconds 600 --strict-json
#
openclaw config set agents.defaults.model.timeoutMs 600000 --strict-json
openclaw config validate
openclaw gateway restart

Then start a clean session so the old huge/polluted context is not reused:

/new

Keep planner small for now:

contextWindow: 4096
#
maxTokens: 512
reasoning: false

Then test with a longer CLI wait:

openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json --timeout-ms 600000

If it still idles out after that, your llama.cpp server is still spending too long in prompt prefill before first token. At that point the fix is not more OpenClaw config — reduce server context / model size / prompt length, or use a faster backend/model.
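
To check whether a change actually reduced the timeouts, you can count them in a saved gateway log. This sketch matches the [llm-idle-timeout] line format shown in your paste. count_idle_timeouts is a hypothetical helper.

```python
import re
from collections import Counter

IDLE = re.compile(r"\[llm-idle-timeout\]\s+(\S+)\s+produced no reply")


def count_idle_timeouts(log_text):
    """Count [llm-idle-timeout] events per provider/model reference."""
    return Counter(IDLE.findall(log_text))


gateway_log = """\
14:52:32 [agent/embedded] [llm-idle-timeout] custom-localhost-8080/planner produced no reply before the idle watchdog; retrying same model
14:56:25 [reload] config hot reload applied (models.providers.custom-localhost-8080.models)
14:57:22 [agent/embedded] [llm-idle-timeout] custom-localhost-8080/planner produced no reply before the idle watchdog; retrying same model
"""
print(count_idle_timeouts(gateway_log))
```

Compare the count before and after each config change, using the same test prompt in a fresh session.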

errant merlin
#

PS C:\WINDOWS\system32> openclaw config set agents.defaults.model '{"primary":"custom-localhost-8080/planner","timeoutMs":600000}' --strict-json

🦞 OpenClaw 2026.4.23 (a979721) — I'm not saying your workflow is chaotic... I'm just bringing a linter and a helmet.

Error: Failed to parse JSON value: SyntaxError: Expected property name or '}' in JSON at position 1 (line 1 column 2)

stable yokeBOT
#

That’s a PowerShell quoting problem. The JSON quotes are getting stripped before OpenClaw receives it, so OpenClaw sees something like {primary:...} instead of valid JSON.

Easiest fix: don’t set the whole object. Set the nested fields separately:

openclaw config set agents.defaults.model.primary '"custom-localhost-8080/planner"' --strict-json
openclaw config set agents.defaults.model.timeoutMs 600000 --strict-json

Then verify:

openclaw config get agents.defaults.model
openclaw config validate
openclaw gateway restart
#

If PowerShell still fights the string value, use backtick escaping instead:

openclaw config set agents.defaults.model.primary "`"custom-localhost-8080/planner`"" --strict-json

Or use a patch file, which avoids shell quoting entirely:

@'
{
  agents: {
    defaults: {
      model: {
        primary: "custom-localhost-8080/planner",
        timeoutMs: 600000
      }
    }
  }
}
'@ | Set-Content $env:TEMP\openclaw-timeout.patch.json5

openclaw config patch --file $env:TEMP\openclaw-timeout.patch.json5 --dry-run
openclaw config patch --file $env:TEMP\openclaw-timeout.patch.json5

I’d use the two nested config set commands first — least room for JSON drama. 🦞

errant merlin
#

PS C:\WINDOWS\system32> openclaw config set agents.defaults.model.primary '"custom-localhost-8080/planner"' --strict-json

openclaw config set agents.defaults.model.timeoutMs 600000 --strict-json

🦞 OpenClaw 2026.4.23 (a979721) — Type the command with confidence—nature will provide the stack trace if needed.

Error: Failed to parse JSON value: SyntaxError: Unexpected token 'c', "custom-loc"... is not valid JSON

stable yokeBOT
#

PowerShell is still eating the quotes. For this specific field, don’t use --strict-json — a plain value is fine because it’s just a string.

Run them separately:

openclaw config set agents.defaults.model.primary custom-localhost-8080/planner
openclaw config set agents.defaults.model.timeoutMs 600000 --strict-json

Then check:

openclaw config get agents.defaults.model
openclaw config validate
openclaw gateway restart
#

If you want the no-shell-drama version, use a patch file:

@'
{
  agents: {
    defaults: {
      model: {
        primary: "custom-localhost-8080/planner",
        timeoutMs: 600000
      }
    }
  }
}
'@ | Set-Content $env:TEMP\openclaw-timeout.patch.json5
openclaw config patch --file $env:TEMP\openclaw-timeout.patch.json5 --dry-run
openclaw config patch --file $env:TEMP\openclaw-timeout.patch.json5

The patch-file route is safest on Windows when quoting gets weird.
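
If you'd rather not hand-write the patch file either, here is a minimal sketch that generates it from Python. It assumes `openclaw config patch --file` accepts plain JSON (JSON is valid JSON5, but the thread only shows `.json5` files), and the file name `openclaw-model.patch.json` is made up for illustration. The keys are the ones used in this thread; run `openclaw config validate` to see which ones your version accepts.

```python
import json
import tempfile
from pathlib import Path


def write_patch(primary, timeout_seconds):
    """Write a config patch as plain JSON and return its path.

    No shell is involved in producing the JSON, so no quotes can be stripped.
    """
    patch = {
        "agents": {
            "defaults": {
                "model": {"primary": primary},
                "timeoutSeconds": timeout_seconds,
            }
        }
    }
    # Hypothetical file name; any path in the temp directory works.
    path = Path(tempfile.gettempdir()) / "openclaw-model.patch.json"
    path.write_text(json.dumps(patch, indent=2), encoding="utf-8")
    return path


patch_path = write_patch("custom-localhost-8080/planner", 600)
# Print the dry-run command to paste into the terminal.
print(f"openclaw config patch --file {patch_path} --dry-run")
```

The only thing the shell ever sees is a file path, which is why this route sidesteps the quoting problem.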

errant merlin
stable yokeBOT
#

Yep. For vision, use agents.defaults.imageModel:

openclaw config set agents.defaults.imageModel.primary custom-localhost-8081/vision
openclaw config set agents.defaults.imageModel.timeoutMs 600000 --strict-json

For coder, there are two cases:

If you want to make coder the main/default model instead of planner:

openclaw config set agents.defaults.model.primary custom-localhost-8082/coder
openclaw config set agents.defaults.model.timeoutMs 600000 --strict-json

If you want to keep planner as default and only switch to coder sometimes, don’t change config — use model switching in chat/CLI instead:

#
/model custom-localhost-8082/coder

Then verify everything:

openclaw config get agents.defaults.model
openclaw config get agents.defaults.imageModel
openclaw config validate
openclaw gateway restart

Important: OpenClaw only has one normal active agents.defaults.model at a time. imageModel is separate and gets used for image-capable handling when needed.

errant merlin
#

PS C:\WINDOWS\system32> openclaw config get agents.defaults.model

openclaw config get agents.defaults.imageModel
openclaw config validate
openclaw gateway restart

🦞 OpenClaw 2026.4.23 (a979721) — Welcome to the command line: where dreams compile and confidence segfaults.

{
"primary": "custom-localhost-8080/planner"
}

🦞 OpenClaw 2026.4.23 (a979721) — The lobster in your shell. 🦞

{
"primary": "custom-localhost-8081/vision"
}

🦞 OpenClaw 2026.4.23 (a979721) — I don't have opinions about tabs vs spaces. I have opinions about everything else.

Config valid: ~.openclaw\openclaw.json

🦞 OpenClaw 2026.4.23 (a979721) — I'm like tmux: confusing at first, then suddenly you can't live without me.

Restarted Windows login item: OpenClaw Gateway
[openclaw] Uncaught exception: Error: spawn start ENOENT
at ChildProcess._handle.onexit (node:internal/child_process:286:19)
at onErrorNT (node:internal/child_process:484:16)
at process.processTicksAndRejections (node:internal/process/task_queues:90:21)

stable yokeBOT
#

Two things here:

1. Your timeout didn’t show up in config get
The output only shows:

{
  "primary": "custom-localhost-8080/planner"
}

So timeoutMs is not currently set. Run:

openclaw config set agents.defaults.model.timeoutMs 600000 --strict-json
openclaw config set agents.defaults.imageModel.timeoutMs 600000 --strict-json
openclaw config get agents.defaults.model.timeoutMs
openclaw config get agents.defaults.imageModel.timeoutMs
openclaw config validate

For coder, there is no separate coder timeout unless coder is the active main model. If you switch main model to coder, this same timeout applies:

openclaw config set agents.defaults.model.primary custom-localhost-8082/coder
openclaw config set agents.defaults.model.timeoutMs 600000 --strict-json

If you keep planner as default, just use /model custom-localhost-8082/coder when you want coder.

2. spawn start ENOENT is a Windows restart/service issue
Config is valid, so this is not your JSON. Check whether the gateway actually came back:

#
openclaw gateway status
openclaw status --deep

If it’s running, ignore that restart error for now and continue testing.

If it’s not running or restart keeps throwing spawn start ENOENT, repair the Windows service setup:

openclaw doctor
openclaw gateway install
openclaw gateway status

Then try:

#
openclaw gateway restart

Your earlier status showed both Scheduled Task and Startup-folder login item, so OpenClaw may be hitting the fallback login-item restart path. Scheduled Task is preferred on Windows.

errant merlin
#

🦞 OpenClaw 2026.4.23 (a979721) — Your .env is showing; don't worry, I'll pretend I didn't see it.

Error: Config validation failed: agents.defaults.model: Invalid input

stable yokeBOT
#

Okay — that means 2026.4.23 does not accept agents.defaults.model.timeoutMs in your config. I was wrong on that key for your installed version. Remove it.

Run:

openclaw config unset agents.defaults.model.timeoutMs
openclaw config unset agents.defaults.imageModel.timeoutMs
openclaw config validate

Then use the supported whole-agent timeout instead:

openclaw config set agents.defaults.timeoutSeconds 600 --strict-json
openclaw config validate
#

Your model config should go back to this shape:

"model": {
  "primary": "custom-localhost-8080/planner"
},
"imageModel": {
  "primary": "custom-localhost-8081/vision"
},
"timeoutSeconds": 600

If config unset can’t run because the config is already wedged, do:

openclaw doctor --fix
openclaw config validate

Then re-apply only:

openclaw config set agents.defaults.timeoutSeconds 600 --strict-json

For the CLI probe, you can still extend the CLI wait separately:

openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json --timeout-ms 600000

But don’t use agents.defaults.model.timeoutMs on 2026.4.23.

errant merlin
#

15:09:53 [discord] [default] starting provider
15:09:54 [discord] users resolved: 807736763425619968
15:09:54 [bonjour] restarting advertiser (service stuck in probing for 106164ms (gateway fqdn=DESKTOP-UJ5TSC3 (OpenClaw)._openclaw-gw._tcp.local. host=openclaw.local. port=18789 state=probing))
15:09:54 [ws] handshake timeout conn=e67c77d5-0369-4e4f-a18a-4d7d6af0b932 peer=127.0.0.1:51690->127.0.0.1:18789 remote=127.0.0.1
15:09:54 [plugins] embedded acpx runtime backend registered (cwd: C:\Users\User.openclaw\workspace)
15:09:55 (node:37564) [DEP0190] DeprecationWarning: Passing args to a child process with shell option true can lead to security vulnerabilities, as the arguments are not escaped, only concatenated.
(Use node --trace-deprecation ... to show where the warning was created)
15:09:55 [browser/server] Browser control listening on http://127.0.0.1:18791/ (auth=password)
15:09:55 [ws] closed before connect conn=e67c77d5-0369-4e4f-a18a-4d7d6af0b932 peer=127.0.0.1:51690->127.0.0.1:18789 remote=127.0.0.1 fwd=n/a origin=http://127.0.0.1:18789 host=127.0.0.1:18789 ua=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/147.0.0.0 Safari/537.36 code=1000 reason=n/a
15:09:55 [ws] webchat connected conn=e67c77d5-0369-4e4f-a18a-4d7d6af0b932 remote=127.0.0.1 client=openclaw-control-ui webchat v2026.4.23
15:09:57 [plugins] embedded acpx runtime backend ready
15:09:57 [discord] client initialized as 1484564009775534170 (Ralsei); awaiting gateway readiness

stable yokeBOT
#

This startup log is mostly okay. The scary-looking bits are probably not your model issue:

  • bonjour restarting advertiser... probing = local network discovery/mDNS got stuck and restarted. Usually harmless unless you rely on LAN discovery.
  • ws handshake timeout followed by webchat connected = the Control UI tried to connect before the gateway was fully ready, then reconnected successfully.
  • DEP0190 = Node deprecation warning from some child-process call; noisy, not fatal.
  • embedded acpx runtime backend ready + discord client initialized = good signs.

The line to watch for after this is whether it eventually says gateway ready / Discord ready, or whether it keeps sitting at:

[discord] client initialized ... awaiting gateway readiness

If it stays there for minutes, run:

#
openclaw status --deep
openclaw health --verbose
openclaw config validate

For the model timeout specifically, this log doesn’t show a model request yet. After gateway is ready, test with:

openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json --timeout-ms 600000

Also since model.timeoutMs was invalid on your version, the safe timeout config is:

openclaw config set agents.defaults.timeoutSeconds 600 --strict-json
openclaw config validate

Then restart only if validation passes.

errant merlin
#

why is my openclaw slow at transferring prompts and results?

stable yokeBOT
#

It’s probably not “transfer” speed. From your llama.cpp log, OpenClaw sends the request, then your model server spends a long time prefilling the prompt before it produces the first token.

This part was the giveaway:

prompt processing progress ... n_tokens = 8192 ... progress = 0.484848
srv stop: cancel task

So the slow path is usually:

  1. OpenClaw builds a big agent prompt: system prompt + tools + workspace/context + chat history.
  2. Local server receives it.
  3. llama.cpp reprocesses thousands of tokens.
  4. No output appears until prefill finishes.
  5. OpenClaw/CLI looks “stuck” and may timeout.
#

For your hardware, do this:

openclaw config set agents.defaults.timeoutSeconds 600 --strict-json
openclaw config set models.providers.custom-localhost-8080.models '[{"id":"planner","name":"LFM2-8B Planner","contextWindow":4096,"maxTokens":512,"input":["text"],"reasoning":false}]' --strict-json --replace
openclaw config validate

Then restart / fresh session:

openclaw gateway restart

In chat:

#
/new

Also use 127.0.0.1 instead of localhost in model baseUrl if you haven’t already.

If it’s still slow after that, reduce more:

contextWindow: 2048,
maxTokens: 256,
reasoning: false

Big picture: local 4GB GPU + CPU inference can work, but OpenClaw agent prompts are much heavier than a simple “hi” in a model UI. The fastest fix is smaller context + fresh session + no reasoning + fewer tools/models loaded while testing.

errant merlin
#

slot update_slots: id 0 | task 393 | n_tokens = 8192, memory_seq_rm [8192, end)
slot update_slots: id 0 | task 393 | prompt processing progress, n_tokens = 10240, batch.n_tokens = 2048, progress = 0.610069
slot update_slots: id 0 | task 393 | n_tokens = 10240, memory_seq_rm [10240, end)
slot update_slots: id 0 | task 393 | prompt processing progress, n_tokens = 12288, batch.n_tokens = 2048, progress = 0.732082
slot update_slots: id 0 | task 393 | n_tokens = 12288, memory_seq_rm [12288, end)
slot update_slots: id 0 | task 393 | prompt processing progress, n_tokens = 14336, batch.n_tokens = 2048, progress = 0.854096

stable yokeBOT
#

That confirms it: the request is ~16.8k prompt tokens. It’s at 85% prefill and still hasn’t generated a single output token yet.

So the bottleneck is prompt prefill, not network transfer.
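
For reference, the ~16.8k figure comes straight from the log: each progress line reports tokens processed so far and the fraction done, so total prompt size is roughly `n_tokens / progress`. A small sketch of that arithmetic, using three lines copied from the log above (the regex and function name are just for illustration):

```python
import re

# Progress lines copied from the llama.cpp server log quoted above.
LOG = """\
slot update_slots: id 0 | task 393 | prompt processing progress, n_tokens = 10240, batch.n_tokens = 2048, progress = 0.610069
slot update_slots: id 0 | task 393 | prompt processing progress, n_tokens = 12288, batch.n_tokens = 2048, progress = 0.732082
slot update_slots: id 0 | task 393 | prompt processing progress, n_tokens = 14336, batch.n_tokens = 2048, progress = 0.854096
"""

# Captures tokens processed so far and the fraction of the prompt completed.
PATTERN = re.compile(r"progress, n_tokens = (\d+), .*progress = ([0-9.]+)")


def estimate_prompt_tokens(log_text):
    """Return one total-prompt estimate per progress line (processed / fraction)."""
    estimates = []
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            processed = int(match.group(1))
            fraction = float(match.group(2))
            estimates.append(round(processed / fraction))
    return estimates


estimates = estimate_prompt_tokens(LOG)
print(estimates)  # every line should land at roughly 16.8k tokens
```

All three lines agree on the same total, which is why this looks like one big prompt being prefilled rather than several small requests.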

Do these reductions:

1. Set an effective runtime cap, not only native context
Use contextTokens too:

openclaw config set models.providers.custom-localhost-8080.models '[{"id":"planner","name":"LFM2-8B Planner","contextWindow":4096,"contextTokens":4096,"maxTokens":512,"input":["text"],"reasoning":false}]' --strict-json --replace

contextWindow is model metadata; contextTokens is the smaller runtime cap you want OpenClaw to actually budget around.

2. Enable lean mode for local models
This trims heavy default tool surface:

#
openclaw config set agents.defaults.experimental.localModelLean true --strict-json

3. Reduce Discord history context
For testing, disable extra group history:

openclaw config set channels.discord.historyLimit 0 --strict-json

4. Validate + restart + fresh session

openclaw config validate
openclaw gateway restart
#

Then in chat:

/new

If it’s still too large, last-resort disable tools for that local planner:

openclaw config set models.providers.custom-localhost-8080.models '[{"id":"planner","name":"LFM2-8B Planner","contextWindow":4096,"contextTokens":4096,"maxTokens":512,"input":["text"],"reasoning":false,"compat":{"supportsTools":false}}]' --strict-json --replace

That makes the planner more like a plain chat model, but much lighter. For a 4GB GPU/CPU setup, this is often the difference between “eventually maybe” and usable.

errant merlin
#

if saying hi is already this heavy then how do i do working loops

stable yokeBOT
#

Yeah — this is the hard truth of local agents: a real OpenClaw agent loop is much heavier than a plain “hi” in llama.cpp/LM Studio.

A working loop includes a lot before your actual message:

  • system/persona instructions
  • tool definitions
  • Discord/thread context
  • workspace/bootstrap files
  • session history
  • model routing metadata
  • sometimes image/audio/media capability instructions

So “hi” can still become a 10k–20k token prompt. On a 4GB GPU + CPU inference, that means slow prefill every turn.

For your hardware, I would not try to run the full coding-agent loop locally. I’d use one of these patterns:

Option A — local light assistant

#

Use local model for chat/basic tasks only:

openclaw config set agents.defaults.experimental.localModelLean true --strict-json
openclaw config set channels.discord.historyLimit 0 --strict-json

And set planner small:

contextWindow: 4096
contextTokens: 4096
maxTokens: 512
reasoning: false
compat: { supportsTools: false }
#

This gives you a usable local “answer simple things” bot, not a strong coding loop.

Option B — hybrid, recommended
Use local for cheap/private chat, but hosted model for coding/tool work. That’s the practical setup. Local 4GB GPU models usually struggle with tool calling, long prompts, and multi-step coding.

Option C — separate OpenClaw agents
Make one lightweight local agent for Discord chat and another stronger/hosted agent for coding. Then switch/use the stronger one when needed.

Option D — direct model mode for local
For local models, test with:

openclaw infer model run --model custom-localhost-8080/planner --prompt "..." --gateway --json

That skips the full agent/tool loop. If this is fast but chat is slow, the problem is definitely agent context size.

#

My honest recommendation: on your machine, aim for local vision/transcription/simple chat, but use a cloud model or much bigger local rig for real working loops. Otherwise every turn spends forever just digesting the prompt before it can think. 🦞

errant merlin
#

PS C:\WINDOWS\system32> openclaw config set models.providers.custom-localhost-8080.models '[{"id":"planner","name":"LFM2-8B Planner","contextWindow":4096,"contextTokens":4096,"maxTokens":512,"input":["text"],"reasoning":false}]' --strict-json --replace

🦞 OpenClaw 2026.4.23 (a979721) — Self-hosted, self-updating, self-aware (just kidding... unless?).

Error: Failed to parse JSON value: SyntaxError: Expected property name or '}' in JSON at position 2 (line 1 column 3)
PS C:\WINDOWS\system32>

stable yokeBOT
#

PowerShell is mangling the JSON array again. Don’t replace the whole models array — set the fields one by one instead:

openclaw config set models.providers.custom-localhost-8080.models[0].contextWindow 4096 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].contextTokens 4096 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].maxTokens 512 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].reasoning false --strict-json

Then verify:

openclaw config get models.providers.custom-localhost-8080.models
openclaw config validate

If you want to disable tools for the planner too:

#
openclaw config set models.providers.custom-localhost-8080.models[0].compat.supportsTools false --strict-json

If that path errors because compat doesn’t exist yet, use the patch-file method instead:

@'
{
  models: {
    providers: {
      "custom-localhost-8080": {
        models: [
          {
            id: "planner",
            name: "LFM2-8B Planner",
            contextWindow: 4096,
            contextTokens: 4096,
            maxTokens: 512,
            input: ["text"],
            reasoning: false,
            compat: { supportsTools: false }
          }
        ]
      }
    }
  }
}
'@ | Set-Content $env:TEMP\planner.patch.json5

openclaw config patch --file $env:TEMP\planner.patch.json5 --replace-path models.providers.custom-localhost-8080.models --dry-run
openclaw config patch --file $env:TEMP\planner.patch.json5 --replace-path models.providers.custom-localhost-8080.models

The one-by-one commands are simpler; use those first.
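
If you end up changing these fields often, a sketch like this can generate the one-by-one commands for you. It only builds the command strings and does not run anything; `config_set_commands` is a hypothetical helper, not part of OpenClaw. It deliberately refuses string values, since those are the ones that hit the PowerShell quoting problem with `--strict-json`.

```python
import json


def config_set_commands(path_prefix, fields):
    """Build one `openclaw config set` command per numeric/boolean field."""
    commands = []
    for key, value in fields.items():
        if not isinstance(value, (bool, int)):
            # Strings need quoting that PowerShell may strip; handle those by hand.
            raise ValueError(f"{key}: only numbers and booleans are supported here")
        # json.dumps renders False as false and 4096 as 4096, i.e. valid JSON.
        commands.append(
            f"openclaw config set {path_prefix}.{key} {json.dumps(value)} --strict-json"
        )
    return commands


commands = config_set_commands(
    "models.providers.custom-localhost-8080.models[0]",
    {"contextWindow": 4096, "contextTokens": 4096, "maxTokens": 512, "reasoning": False},
)
print("\n".join(commands))
```

Numbers and booleans contain no quotes, so they survive the shell untouched. That is the reason the field-by-field route works where the whole JSON array did not.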

errant merlin
#

PS C:\WINDOWS\system32> openclaw config get models.providers.custom-localhost-8080.models

openclaw config validate

🦞 OpenClaw 2026.4.23 (a979721) — Welcome to the command line: where dreams compile and confidence segfaults.

[
{
"id": "planner",
"name": "LFM2-8B Planner",
"reasoning": false,
"input": [
"text"
],
"contextWindow": 4096,
"contextTokens": 4096,
"maxTokens": 512,
"cost": {
"input": 0,
"output": 0,
"cacheRead": 0,
"cacheWrite": 0
},
"api": "openai-completions"
}
]

🦞 OpenClaw 2026.4.23 (a979721) — Making 'I'll automate that later' happen now.

Config valid: ~.openclaw\openclaw.json
PS C:\WINDOWS\system32>

stable yokeBOT
#

Good — that part is now fixed and valid. Planner is capped to 4k context / 512 output and reasoning is off.

Now do the remaining lightweight-agent settings:

openclaw config set agents.defaults.experimental.localModelLean true --strict-json
openclaw config set channels.discord.historyLimit 0 --strict-json
openclaw config set agents.defaults.timeoutSeconds 600 --strict-json
openclaw config validate

Then restart and start clean:

openclaw gateway restart
#

In chat:

/new

Then test:

openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json --timeout-ms 600000

If the full chat is still huge after that, add the last-resort tool disable for this local model:

openclaw config set models.providers.custom-localhost-8080.models[0].compat.supportsTools false --strict-json
openclaw config validate
openclaw gateway restart

That will make it much lighter, but it also means this planner won’t do tool-calling loops. For your hardware, that may be the right tradeoff for a usable local chat baseline.

errant merlin
#

what does this do: openclaw config set channels.discord.historyLimit 0 --strict-json

stable yokeBOT
#

It tells OpenClaw’s Discord channel to include zero recent Discord messages as extra channel history when building the prompt.

So with:

openclaw config set channels.discord.historyLimit 0 --strict-json

OpenClaw will stop adding recent Discord thread/channel history into each agent turn.

Why I suggested it: your local model was getting huge prompts. Discord history can add extra tokens every turn, so setting it to 0 helps make the prompt smaller and faster.

Tradeoff:

  • ✅ faster / fewer prompt tokens
  • ✅ better for weak local models
  • ❌ less awareness of recent thread context unless it’s already in the active session transcript or directly included in the current message
#

You can later set it back to something small, like:

openclaw config set channels.discord.historyLimit 5 --strict-json

For your 4GB GPU setup, 0 or 5 is much safer than large history.

errant merlin
#

btw this is what happened after i tried to say hi:
15:20:51 [diagnostic] stuck session: sessionId=e3229eab-689c-4e49-b3f4-49210edaa7bd sessionKey=agent:main:main state=processing age=527s queueDepth=1
15:20:51 [discord] gateway: Gateway websocket closed: 1006
15:20:51 [discord] gateway: Gateway reconnect scheduled in 1076ms (close, resume=true)
15:20:52 [ws] ⇄ res ✓ node.list 227ms conn=2f287603…c7e6 id=aeb545f6…f42a

stable yokeBOT
#

That means the previous hi turn got wedged in the session lane:

stuck session ... state=processing age=527s queueDepth=1

So OpenClaw still thinks the main session is busy processing an old run. New messages queue behind it, which makes everything feel frozen.

The Discord 1006 reconnect is probably secondary — Discord gateway connection dropped/reconnected while the agent lane was stuck. The key problem is the stuck session/model run.

Do this cleanup:

openclaw gateway restart --force

Then start a fresh session in chat:

#
/new

If /new doesn’t respond because the lane is still wedged, use:

openclaw sessions list

and if needed run doctor:

openclaw doctor

Before trying Discord again, test the model outside the full chat path:

#
openclaw infer model run --model custom-localhost-8080/planner --prompt "Reply with exactly: hello" --gateway --json --timeout-ms 600000

If that is still slow/hangs, don’t test in Discord yet — it’ll just wedge the session again. Fix the local model prompt/prefill first.

Also after your config reductions, confirm these are set:

openclaw config get agents.defaults.experimental.localModelLean
openclaw config get channels.discord.historyLimit
openclaw config get models.providers.custom-localhost-8080.models
openclaw config get agents.defaults.timeoutSeconds

Expected: lean mode true, Discord history 0, planner contextTokens: 4096, and timeout 600.

errant merlin
#

PS C:\WINDOWS\system32> openclaw gateway restart --force
error: unknown option '--force'

stable yokeBOT
#

Yep, that flag isn’t in your 2026.4.23 CLI. My bad — I was looking at newer restart docs. Use the supported command:

openclaw gateway restart

Then check:

openclaw gateway status
openclaw status --deep

If the session is still stuck after restart, the local model request may still be hanging in your llama.cpp server. In that case, stop/restart the local model server too, then start a fresh OpenClaw chat session:

#
/new

For now, don’t keep sending hi in Discord while the model probe is hanging — each message can queue behind the stuck session lane.

Use this order:

openclaw gateway restart
openclaw config validate
openclaw infer model run --model custom-localhost-8080/planner --prompt "Reply with exactly: hello" --gateway --json --timeout-ms 600000

If that probe hangs, the full Discord agent loop will definitely hang too. Fix the local server/prompt size first, then test Discord.

errant merlin
#

Restarted Windows login item: OpenClaw Gateway
[openclaw] Uncaught exception: Error: spawn start ENOENT
at ChildProcess._handle.onexit (node:internal/child_process:286:19)
at onErrorNT (node:internal/child_process:484:16)
at process.processTicksAndRejections (node:internal/process/task_queues:90:21)
PS C:\WINDOWS\system32>

stable yokeBOT
#

That spawn start ENOENT looks like a Windows service/restart path bug, not your model config.

start is a cmd.exe built-in on Windows, not a real executable, so if OpenClaw’s fallback “Startup-folder login item” restart path tries to spawn start directly, it can throw exactly this kind of ENOENT.

First check whether the gateway actually restarted anyway:

openclaw gateway status
openclaw status --deep

If it says the gateway is running/listening on 18789, you can ignore this for the moment and continue testing.

If it is not running, use the temporary foreground workaround:

#
openclaw gateway run

Leave that window open while testing.

Then, to repair the Windows managed startup setup, try:

openclaw doctor
openclaw gateway install
openclaw gateway status

If gateway restart keeps throwing spawn start ENOENT after reinstall, that’s worth filing as a Windows bug here:
https://github.com/openclaw/openclaw/issues

#

Include:

  • OpenClaw 2026.4.23
  • Windows version
  • this error: spawn start ENOENT
  • that it says Restarted Windows login item: OpenClaw Gateway

For now, don’t keep testing Discord until openclaw gateway status confirms the gateway is healthy — otherwise the stuck session/model debugging gets mixed with service restart failure.

errant merlin
#

i have been testing on gateway this whole time

stable yokeBOT
#

Right — but there are two different “gateway” paths, and I should’ve been clearer:

openclaw infer model run --gateway ... = Gateway-routed raw model probe. It should skip the full Discord/agent/tool/session context and send basically just your prompt.

Sending hi in Discord/OpenClaw chat = full agent turn through the Gateway. That includes session context, tools, persona/bootstrap, Discord handling, etc. That’s the heavy one.

So if the huge llama.cpp prefill log happened after you said hi in Discord, that makes sense.

If the huge prefill happened from this command:

openclaw infer model run --model custom-localhost-8080/planner --prompt "Say hello" --gateway --json --timeout-ms 600000

then something else is wrong, because that probe should not be sending 16k tokens. In that case, test your model server directly outside OpenClaw:

#
$body = @{
  model = "planner"
  messages = @(@{ role = "user"; content = "Say hello" })
  max_tokens = 20
  stream = $false
} | ConvertTo-Json -Depth 5

Invoke-RestMethod -Uri http://127.0.0.1:8080/v1/chat/completions -Method Post -ContentType "application/json" -Body $body

If even that direct request pre-processes thousands of tokens, your server/model template is injecting a huge prompt by itself.

If direct request is fast but Discord hi is slow, then OpenClaw agent context is the issue — use lean mode, historyLimit 0, /new, and possibly compat.supportsTools false for the local planner.
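
If you want a number for the "direct request" side of that comparison, here is a rough Python equivalent of the PowerShell probe that also times the round trip. It assumes the same endpoint and model id as above (`http://127.0.0.1:8080/v1`, `planner`); `build_probe` and `send_probe` are names made up for this sketch.

```python
import json
import time
import urllib.request


def build_probe(base_url="http://127.0.0.1:8080/v1", model="planner"):
    """Build the same minimal chat request as the PowerShell probe above."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": "Say hello"}],
        "max_tokens": 20,
        "stream": False,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_probe(request, timeout=600):
    """Send the probe and return (seconds elapsed, reply text)."""
    started = time.monotonic()
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    elapsed = time.monotonic() - started
    return elapsed, payload["choices"][0]["message"]["content"]


probe = build_probe()
print(probe.full_url)
# With the local server running, uncomment to measure the raw round trip:
# print(send_probe(probe))
```

A direct probe that returns in a few seconds, next to a Discord turn that takes minutes, points at agent context size rather than the server itself.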

errant merlin
#

uh why does the agent return a json file:
{"message": "Hi. I’m your openclaw control UI assistant, ready to help with agent coordination, memory updates, scheduling reminders, or any task to streamline your workflow. How can I assist you today?"}

stable yokeBOT
#

That’s not a “file” — it’s JSON text.

Most likely one of two things:

  1. You ran the CLI with --json
    If the command was like:
openclaw infer model run ... --json

then OpenClaw intentionally prints machine-readable JSON instead of pretty text. Remove --json if you want normal output.

  2. Your local model is answering in JSON because of its chat template / system prompt
    This part:
#
{"message":"Hi. I’m your openclaw control UI assistant..."}

looks like the model wrapped its reply in a message field. That can happen with some local models/templates, especially if the server uses a “tool/chat” format or a control-UI-style system prompt.

Quick checks:

openclaw infer model run --model custom-localhost-8080/planner --prompt "Reply with exactly: hello" --gateway

No --json there.

If it still returns:

#
{"message":"hello"}

then it’s your local model/server template doing JSON-style output. Try changing the server chat format/template, or add a direct instruction like:

Reply in plain text only. Do not use JSON.

Also, the wording “control UI assistant” suggests you may be talking through the Control UI/webchat path, not Discord’s normal assistant flow. That’s okay, but it may use a slightly different prompt than Discord.

errant merlin
#

got this

Default (off)
Model context window too small (4096 tokens; source=modelsConfig). Minimum is 16000. OpenClaw is using the configured model context limit for this model. Raise contextWindow/contextTokens or choose a larger model.


stable yokeBOT
#

That means OpenClaw’s agent loop has a hard minimum context budget of 16k tokens. Setting planner to 4096 made the prompt smaller, but now OpenClaw refuses to use it as a normal agent model.

So set it back to at least 16000, but keep the other reductions:

openclaw config set models.providers.custom-localhost-8080.models[0].contextWindow 16000 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].contextTokens 16000 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].maxTokens 512 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].reasoning false --strict-json
openclaw config set agents.defaults.experimental.localModelLean true --strict-json
openclaw config set channels.discord.historyLimit 0 --strict-json
openclaw config validate

Then restart and start fresh:

#
openclaw gateway restart

In chat:

/new

Important tradeoff: 16k is the minimum for OpenClaw agent mode, but your local server was already slow pre-filling around that size. So if 16k is still too slow on your hardware, your realistic options are:

  • use this local model only for raw probes / simple non-agent use
  • disable tools for it with compat.supportsTools false
  • use a faster/smaller local model/backend
  • use hosted model for real agent/coding loops
#

The error is basically OpenClaw saying: “4k is too small to safely run the full assistant.”
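
To make the behaviour concrete, here is a sketch of what that check appears to do, judging only from the error text. It is a hypothetical reconstruction, not OpenClaw's actual code: the 16000 threshold comes from the message, while the function name and the assumption that exactly 16000 passes are mine.

```python
MIN_AGENT_CONTEXT = 16000  # "Minimum is 16000" in the error message


def check_context(model_ref, context_tokens, minimum=MIN_AGENT_CONTEXT):
    """Return None if the model passes, otherwise a message shaped like the error."""
    if context_tokens < minimum:
        return (
            f"blocked model (context window too small): {model_ref} "
            f"ctx={context_tokens} (min={minimum})"
        )
    return None


# The configured 4096 is rejected regardless of how short the user message is.
print(check_context("custom-localhost-8080/planner", 4096))
# Raising the configured limit to 16000 clears the check (assumed boundary).
print(check_context("custom-localhost-8080/planner", 16000))
```

Note that the input is the configured context limit, not the length of the message. That is why a plain "hi" trips it.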

errant merlin
#

15:29:11 [agent/embedded] blocked model (context window too small): custom-localhost-8080/planner ctx=4096 (min=16000) source=modelsConfig; Model context window too small (4096 tokens; source=modelsConfig). Minimum is 16000. OpenClaw is using the configured model context limit for this model. Raise contextWindow/contextTokens or choose a larger model.
15:29:11 [diagnostic] lane task error: lane=main durationMs=30786 error="FailoverError: Model context window too small (4096 tokens; source=modelsConfig). Minimum is 16000. OpenClaw is using the configured model context limit for this model. Raise contextWindow/contextTokens or choose a larger model."
15:29:11 [diagnostic] lane task error: lane=session:agent:main:main durationMs=30790 error="FailoverError: Model context window too small (4096 tokens; source=modelsConfig). Minimum is 16000. OpenClaw is using the configured model context limit for this model. Raise contextWindow/contextTokens or choose a larger model."

stable yokeBOT
#

Yep, that log is exactly the 4k cap problem. OpenClaw won’t run the full embedded agent loop below 16k context.

Set planner back to 16k:

openclaw config set models.providers.custom-localhost-8080.models[0].contextWindow 16000 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].contextTokens 16000 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].maxTokens 512 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].reasoning false --strict-json
openclaw config validate

Then restart / fresh session:

openclaw gateway restart
#

In chat:

/new

But the tradeoff is real: OpenClaw requires 16k minimum for agent mode, and your server is slow prefilling ~16k. So if 16k is too slow on your hardware, the practical choices are:

  • use local model only for raw/simple model probes, not full agent loops
  • enable lean mode + history 0 to reduce overhead as much as possible
  • disable tools for the planner to make it lighter:
openclaw config set models.providers.custom-localhost-8080.models[0].compat.supportsTools false --strict-json
  • or use a hosted/faster model for real coding/tool loops
#

For your setup, I’d use local for vision/transcription/simple chat, and a stronger model for coding loops. 4GB GPU just gets punished by the 16k agent prompt prefill.

errant merlin
#

like, why does it ask for 16k even with a "hi"?

stable yokeBOT
#

Because OpenClaw isn’t checking the size of your message. It’s checking whether the model is large enough for the minimum full agent runtime envelope.

Even when you type hi, the model may receive something like:

  • OpenClaw system instructions
  • safety / routing instructions
  • tool definitions
  • Discord/session metadata
  • workspace/bootstrap context
  • recent session transcript
  • model/tool protocol instructions

OpenClaw needs enough context headroom for all of that plus the user request plus the model’s reply and tool loop. So it enforces a minimum context window of 16k for embedded agent mode.

Think of it like: hi is the payload, but the truck still has to carry the whole assistant runtime.
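
To make the truck analogy concrete, here is a rough budget sketch in Python. The per-component token counts are hypothetical placeholders chosen for illustration, not measured OpenClaw values; only the 16000 minimum and the 512 reply budget come from this thread.

```python
# Rough context budget sketch. Component sizes are HYPOTHETICAL placeholders,
# not measured OpenClaw values.
MIN_CONTEXT_WINDOW = 16000  # minimum reported in the error log in this thread


def fits(context_window, envelope_tokens, user_tokens, reply_tokens):
    """Return True when prompt plus reserved reply space fits in the window."""
    return envelope_tokens + user_tokens + reply_tokens <= context_window


envelope = {
    "system_instructions": 3000,  # hypothetical
    "tool_definitions": 8000,     # hypothetical
    "session_metadata": 500,      # hypothetical
    "workspace_context": 1500,    # hypothetical
}
envelope_total = sum(envelope.values())

user_message = 2    # a tiny message such as "hi"
reply_budget = 512  # same value as the maxTokens setting used in this thread

for window in (4096, MIN_CONTEXT_WINDOW):
    ok = fits(window, envelope_total, user_message, reply_budget)
    print(f"window={window}: {'fits' if ok else 'too small'}")
```

The point is that the user message is a rounding error next to the envelope, so shrinking the message does not help.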

For local models, your options are:

#
openclaw config set models.providers.custom-localhost-8080.models[0].contextWindow 16000 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].contextTokens 16000 --strict-json

Then reduce overhead as much as possible:

openclaw config set agents.defaults.experimental.localModelLean true --strict-json
openclaw config set channels.discord.historyLimit 0 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].reasoning false --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].maxTokens 512 --strict-json

If 16k prefill is still too slow on your 4GB GPU/CPU setup, that’s the hardware/backend limit showing up. The model can answer hi, but it can’t comfortably run the full OpenClaw agent loop.

errant merlin
#

is it required for the image model?

stable yokeBOT
#

Yes, if the image model is used as an OpenClaw agent/model turn.

agents.defaults.imageModel is still a model used inside the assistant runtime when images are involved, so it needs enough context for the OpenClaw envelope too. Don’t set the vision model to 4k if OpenClaw is going to route image turns through it.

For your vision model, keep at least 16k if the server/model can handle it:

openclaw config set models.providers.custom-localhost-8081.models[0].contextWindow 16000 --strict-json
openclaw config set models.providers.custom-localhost-8081.models[0].contextTokens 16000 --strict-json
openclaw config set models.providers.custom-localhost-8081.models[0].maxTokens 512 --strict-json
openclaw config set models.providers.custom-localhost-8081.models[0].reasoning false --strict-json

Then:

#
openclaw config validate

Separate note: Whisper/audio transcription under tools.media.audio is different — that does not need the 16k agent context window, because it’s a CLI/media transcription step, not the full agent loop.

errant merlin
#

15:37:00 [diagnostic] stuck session: sessionId=4bff9132-5129-4012-9515-9c38fcf9957b sessionKey=agent:main:main state=processing age=133s queueDepth=1

stable yokeBOT
#

That means the same main session lane is still blocked by a running/failed model turn:

state=processing age=133s queueDepth=1

At this point, don’t send more chat messages into OpenClaw yet — they’ll just queue behind the stuck run.

Do this recovery order:

# 1) Stop/restart your local llama.cpp/model server first
#    The stuck OpenClaw turn is probably waiting on that request.

# 2) Restart OpenClaw gateway
openclaw gateway restart
#

# 3) Check health
openclaw gateway status
openclaw status --deep

Then start a fresh session in chat:

/new

If the gateway restart still throws spawn start ENOENT, run the gateway in foreground temporarily while testing:

#
openclaw gateway run

Leave that window open.

Before using Discord again, test the local model directly or with the raw gateway probe. If this hangs, Discord will hang too:

openclaw infer model run --model custom-localhost-8080/planner --prompt "Reply with exactly: hello" --gateway --json --timeout-ms 600000

If that command is slow/stuck, the fix is still on the local model side: 16k context is required for agent mode, but your backend is struggling to prefill it. Use lean mode/history 0, or switch the real agent loop to a faster/hosted model.

errant merlin
#

i will update to the latest openclaw ver first

stable yokeBOT
#

Good call. Update first, then re-test — a few things we hit look version-specific.

Use the built-in updater if available:

openclaw update --dry-run
openclaw update

After it finishes:

openclaw --version
openclaw doctor
openclaw config validate
openclaw gateway restart
#
openclaw status --deep

If openclaw update itself fails, fall back to npm:

npm i -g openclaw@latest
openclaw doctor
openclaw gateway restart

Docs: https://docs.openclaw.ai/install/updating

After updating, paste the new openclaw --version and we can revisit the timeout/context keys against the newer schema. 🦞

errant merlin
#

PS C:\WINDOWS\system32> openclaw update
Updating OpenClaw...

|
o ✓ Updating via package manager (214.23s)
|
o ✓ Running doctor checks (57.73s)

Update Result: OK
Root: C:\nvm4w\nodejs\node_modules\openclaw
Before: 2026.4.23
After: 2026.5.7

Total time: 272.61s

stable yokeBOT
#

Nice, update succeeded 🎉 You’re now on 2026.5.7, so some of the config keys we were fighting may behave differently now.

Next do this cleanup/check sequence:

openclaw --version
openclaw config validate
openclaw gateway restart
openclaw status --deep

Then check whether the provider timeout key is accepted on this version:

openclaw config set models.providers.custom-localhost-8080.timeoutSeconds 600 --strict-json
openclaw config validate
#

If that passes, also set it for vision/coder:

openclaw config set models.providers.custom-localhost-8081.timeoutSeconds 600 --strict-json
openclaw config set models.providers.custom-localhost-8082.timeoutSeconds 600 --strict-json
openclaw config validate

Then test raw gateway model routing before Discord/chat:

openclaw infer model run --model custom-localhost-8080/planner --prompt "Reply with exactly: hello" --gateway --json --timeout-ms 600000

If that succeeds, then try a fresh chat session with /new. If it still idles out, we know it’s local model prefill speed, not old OpenClaw version/config schema weirdness.

errant merlin
#

what is the findstr /R/C:"18789 .*LIstening

stable yokeBOT
#

That’s a Windows filter command, usually used with netstat to check whether OpenClaw’s gateway port is listening.

Typical full command:

netstat -ano | findstr /R /C:"18789 .*LISTENING"

Meaning:

  • netstat -ano lists network sockets + process IDs
  • | pipes that output into findstr
  • findstr searches/filters lines
  • /R means use regex matching
  • /C:"..." treats the quoted text as one search pattern
  • 18789 is OpenClaw’s gateway port
  • .* means “anything in between”
#
  • LISTENING means a process is accepting connections on that port

So it’s checking: “is anything listening on port 18789?”

Small typo: it should be LISTENING, not LIstening. findstr is case-sensitive by default, so the mixed-case pattern would not match netstat’s uppercase LISTENING. Safer:

netstat -ano | findstr /I /R /C:"18789 .*LISTENING"

/I makes it case-insensitive.
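
If it helps to see the same filter outside of cmd, here is a small Python sketch of what the findstr call does. The sample netstat output is made up (addresses and PIDs are hypothetical), and Python's regex dialect is richer than findstr's, but this particular pattern behaves the same in both.

```python
import re

# Same idea as: findstr /I /R /C:"18789 .*LISTENING"
PATTERN = re.compile(r"18789 .*LISTENING", re.IGNORECASE)


def listening_lines(netstat_output):
    """Keep only lines where port 18789 appears before the LISTENING state."""
    return [line for line in netstat_output.splitlines() if PATTERN.search(line)]


# Hypothetical sample of `netstat -ano` output (addresses and PIDs are made up).
sample = """\
  TCP    127.0.0.1:18789        0.0.0.0:0              LISTENING       4242
  TCP    127.0.0.1:8080         0.0.0.0:0              LISTENING       5150
  TCP    127.0.0.1:52344        127.0.0.1:18789        ESTABLISHED     6001
"""

matches = listening_lines(sample)
for line in matches:
    print(line)
```

The last column of a matching line is the PID of the process that owns the port.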

errant merlin
#

uh, why does that window keep opening?

stable yokeBOT
#

Probably because OpenClaw is using the Windows Startup-folder login item fallback instead of a proper Scheduled Task service.

Your earlier logs said:

Restarted Windows login item: OpenClaw Gateway
spawn start ENOENT

That fallback can cause visible cmd/PowerShell windows to pop open when the gateway starts/restarts. Scheduled Task mode is preferred because it runs more quietly and has better supervisor status.

Check what mode you’re in:

openclaw gateway status
#

Then try repairing/reinstalling the managed gateway service:

openclaw doctor
openclaw gateway install
openclaw gateway status

If it still uses the Startup-folder login item after that, Windows may be blocking Scheduled Task creation. In that case, the window opening is expected from the fallback.

For now: if the gateway is already running, don’t keep restarting it repeatedly — that’s what keeps popping the window. Use:

openclaw gateway status

to check health instead.

errant merlin
#

\Users\User\AppData\Local\nvm\v24.14.0\node_modules\openclaw\dist\extensions\browser\skills\browser-automation": Error: EPERM: operation not permitted, symlink 'C:\Users\User\AppData\Local\nvm\v24.14.0\node_modules\openclaw\dist\extensions\browser\skills\browser-automation' -> 'C:\Users\User\.openclaw\plugin-skills\browser-automation'
15:48:03 [skills] failed to create plugin skill symlink "C:\Users\User\.openclaw\plugin-skills\browser-automation" → "C:\Users\User\AppData\Local\nvm\v24.14.0\node_modules\openclaw\dist\extensions\browser\skills\browser-automation": Error: EPERM: operation not permitted, symlink 'C:\Users\User\AppData\Local\nvm\v24.14.0\node_modules\openclaw\dist\extensions\browser\skills\browser-automation' -> 'C:\Users\User\.openclaw\plugin-skills\browser-automation'
15:48:17 [agent/embedded] [trace:embedded-run] startup stages: runId=3ebcbf6e-9af6-459b-a4e4-294c5aba735f sessionId=79abafd7-b3c6-4e07-9cf0-216263b699f3 phase=attempt-dispatch totalMs=13516 stages=workspace:0ms@0ms,runtime-plugins:14ms@14ms,hooks:1ms@15ms,model-resolution:13202ms@13217ms,auth:283ms@13500ms,context-engine:3ms@13503ms,attempt-dispatch:13ms@13516ms

stable yokeBOT
#

That EPERM is Windows blocking symlink creation:

failed to create plugin skill symlink ... EPERM: operation not permitted, symlink

OpenClaw is trying to publish the bundled browser-automation skill into:

C:\Users\User\.openclaw\plugin-skills\browser-automation

On Windows, creating symlinks usually requires either:

  • Developer Mode enabled, or
  • running the process as Administrator / with symlink privilege.
#

It’s probably not fatal unless you need the browser-automation skill. The agent can still run, but that skill won’t be discoverable through the plugin-skills symlink path.

Fix options:

  1. Enable Windows Developer Mode:
    Settings → System → For developers → Developer Mode → On

  2. Restart OpenClaw after enabling it:

openclaw gateway restart
  3. If it still fails, run OpenClaw/Gateway once from an elevated PowerShell, or reinstall the gateway from elevated PowerShell:
#
openclaw gateway install
openclaw gateway restart

The other line is separate:

model-resolution:13202ms

That means model resolution/auth/provider setup took ~13s before dispatch. Annoying, but not the same as the symlink error or model prefill timeout.

errant merlin
#

15:48:55 [diagnostic] liveness warning: reasons=event_loop_delay interval=30s eventLoopDelayP99Ms=33.7 eventLoopDelayMaxMs=3347.1 eventLoopUtilization=0.124 cpuCoreRatio=0.131 active=1 waiting=0 queued=1 recentPhases=sidecars.restart-sentinel:0ms,sidecars.subagent-recovery:12ms,sidecars.main-session-recovery:8ms,post-attach.update-sentinel:0ms,sidecars.session-locks:100ms,post-ready.maintenance:999ms work=[active=agent:main:main(processing/model_call,q=1,age=31s last=model_call:started) queued=agent:main:main(processing/model_call,q=1,age=31s last=model_call:started)]

stable yokeBOT
#

Yeah, if you plan to use browser automation later, I’d fix the symlink permission now.

Best fix on Windows:

  1. Enable Developer Mode
    Settings → System → For developers → Developer Mode → On

  2. Restart OpenClaw:

openclaw gateway restart
  3. Check logs. The failed to create plugin skill symlink ... EPERM warning should disappear.

If it still fails, run PowerShell as Administrator once and reinstall/restart the gateway:

#
openclaw gateway install
openclaw gateway restart

You do not need to run everything as admin forever — the goal is just to let Windows allow symlink creation. Developer Mode is the cleaner long-term fix.

After that, this path should exist:

Test-Path C:\Users\User\.openclaw\plugin-skills\browser-automation

If it returns True, the browser automation skill should be available when OpenClaw needs it.

#

That warning says the Gateway itself is mostly alive, but the main agent session is currently busy in a model call and one message is queued behind it:

active=agent:main:main(processing/model_call,q=1,age=31s last=model_call:started)
queued=agent:main:main(... q=1 ...)

The event_loop_delay part means Node’s event loop had a pause/spike:

eventLoopDelayMaxMs=3347.1

So at one point the Gateway couldn’t process JS work for ~3.3s. That can happen if the machine is under load, antivirus is scanning, disk is busy, or local model processes are eating CPU/RAM. Your CPU ratios aren’t high there, so it’s more of a stall/spike than sustained CPU overload.
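
As a rough way to read these warnings, the sketch below pulls the numeric fields out of an excerpt of the liveness line and separates a brief stall from sustained load. The thresholds (`stall_ms`, `busy_ratio`) and the three labels are assumptions made up for illustration; they are not OpenClaw's own rules.

```python
import re


def parse_fields(line):
    """Collect key=value pairs whose value is a plain number."""
    pairs = re.findall(r"(\w+)=(\d+(?:\.\d+)?)(?=\s|$)", line)
    return {key: float(value) for key, value in pairs}


def classify(fields, stall_ms=1000.0, busy_ratio=0.8):
    """Label the warning. Thresholds are hypothetical, for illustration only."""
    if fields["eventLoopDelayMaxMs"] < stall_ms:
        return "ok"
    if fields["cpuCoreRatio"] >= busy_ratio:
        return "sustained load"
    return "short stall"


# Excerpt of the liveness warning pasted in this thread.
line = (
    "reasons=event_loop_delay interval=30s eventLoopDelayP99Ms=33.7 "
    "eventLoopDelayMaxMs=3347.1 eventLoopUtilization=0.124 cpuCoreRatio=0.131 "
    "active=1 waiting=0 queued=1"
)

fields = parse_fields(line)
print(classify(fields))
```

Values with a unit suffix, such as `interval=30s`, are skipped on purpose because the parser only accepts bare numbers.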

Most important: this is still consistent with the local model call being slow/stuck. Don’t send more messages while queued=1, because they’ll stack behind the active model call.

#

Do this after the current model call finishes or you stop/restart the local model server:

openclaw status --deep
openclaw logs --tail 100

If it keeps happening, reduce the OpenClaw agent load further:

openclaw config set agents.defaults.experimental.localModelLean true --strict-json
openclaw config set channels.discord.historyLimit 0 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].compat.supportsTools false --strict-json
openclaw config validate

But the key readout is: Gateway not dead; main session is waiting on model_call; one message queued.

errant merlin
#

15:50:06 [ws] ⇄ res ✓ node.list 81ms conn=68c37f8a…9bc5 id=54543277…013a
15:50:25 [agent/embedded] [llm-idle-timeout] custom-localhost-8080/planner produced no reply before the idle watchdog; retrying same model
15:50:30 [agent/embedded] [trace:embedded-run] core-plugin-tool stages: runId=3ebcbf6e-9af6-459b-a4e4-294c5aba735f sessionId=79abafd7-b3c6-4e07-9cf0-216263b699f3 phase=core-plugin-tools totalMs=5365 stages=tool-policy:2664ms@2664ms,workspace-policy:1ms@2665ms,base-coding-tools:0ms@2665ms,shell-tools:0ms@2665ms,openclaw-tools:session-workspace:1ms@2666ms,openclaw-tools:image-tool:28ms@2694ms,openclaw-tools:image-generate-tool:0ms@2694ms,openclaw-tools:video-generate-tool:0ms@2694ms,openclaw-tools:music-generate-tool:0ms@2694ms,openclaw-tools:pdf-tool:0ms@2694ms,openclaw-tools:web-search-tool:0ms@2694ms,openclaw-tools:web-fetch-tool:0ms@2694ms,openclaw-tools:message-tool:6ms@2700ms,openclaw-tools:nodes-tool:0ms@2700ms,openclaw-tools:core-tool-list:1ms@2701ms,openclaw-tools:plugin-tools:2659ms@5360ms,openclaw-tools:0ms@5360ms,message-provider-policy:0ms@5360ms,model-provider-policy:0ms@5360ms,authorization-policy:1ms@5361ms,schema-normalization:3ms@5364ms,tool-hooks:0ms@5364ms,abort-wrappers:0ms@5364ms,deferred-followup-descriptions:1ms@5365ms,attempt:create-openclaw-coding-tools:0ms@5365ms,attempt:tools-allow:0ms@5365ms
15:50:35 [agent/embedded] [trace:embedded-run] prep stages: runId=3ebcbf6e-9af6-459b-a4e4-294c5aba735f sessionId=79abafd7-b3c6-4e07-9cf0-216263b699f3 phase=stream-ready totalMs=10285 stages=workspace-sandbox:2ms@2ms,skills:0ms@2ms,core-plugin-tools:5366ms@5368ms,bootstrap-context:23ms@5391ms,bundle-tools:2077ms@7468ms,system-prompt:11ms@7479ms,session-resource-loader:2769ms@10248ms,agent-session:4ms@10252ms,stream-setup:33ms@10285ms

stable yokeBOT
#

This log is actually useful: OpenClaw itself is spending ~10s preparing the turn, then the local model is still timing out before first token.

Breakdown:

core-plugin-tools: 5366ms
bundle-tools: 2077ms
session-resource-loader: 2769ms
prep total: 10285ms

So about 10s is OpenClaw assembling tools/session/runtime. That’s not terrible for a heavy coding profile, but it is too much overhead for a weak local model if the model then needs ages to prefill.
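
If you want to redo this breakdown on your own trace lines, a small parser is enough. This is a sketch that assumes the `stages=` field keeps the `name:DURATIONms@OFFSETms` shape seen in the trace above; the function name is mine, not part of OpenClaw.

```python
import re

# Stage names may contain colons, so the duration is matched from the right.
STAGE = re.compile(r"^(?P<name>.+):(?P<ms>\d+)ms@(?P<at>\d+)ms$")


def parse_stages(stages_field):
    """Split 'name:12ms@34ms,...' into (name, duration_ms) pairs."""
    result = []
    for item in stages_field.split(","):
        match = STAGE.match(item.strip())
        if match:
            result.append((match.group("name"), int(match.group("ms"))))
    return result


# stages= value copied from the "prep stages" trace line in this thread.
prep = (
    "workspace-sandbox:2ms@2ms,skills:0ms@2ms,core-plugin-tools:5366ms@5368ms,"
    "bootstrap-context:23ms@5391ms,bundle-tools:2077ms@7468ms,"
    "system-prompt:11ms@7479ms,session-resource-loader:2769ms@10248ms,"
    "agent-session:4ms@10252ms,stream-setup:33ms@10285ms"
)

stages = parse_stages(prep)
slowest = sorted(stages, key=lambda stage: stage[1], reverse=True)[:3]
total_ms = sum(ms for _, ms in stages)

for name, ms in slowest:
    print(f"{name}: {ms}ms")
print(f"total: {total_ms}ms")
```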

The worse line is still:

#
[llm-idle-timeout] ... produced no reply before the idle watchdog

That means after prep, your local model still did not emit tokens in time.

For your setup, I’d make the local planner non-tool / lean:

openclaw config set agents.defaults.experimental.localModelLean true --strict-json
openclaw config set channels.discord.historyLimit 0 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].compat.supportsTools false --strict-json
openclaw config set agents.defaults.timeoutSeconds 600 --strict-json
openclaw config validate

Then restart/fresh session:

#
openclaw gateway restart

In chat:

/new

That should reduce tool/schema overhead. Tradeoff: with supportsTools false, the local planner becomes more like a plain chat model, not a full coding/tool-loop agent.

If you want real working loops, I’d honestly use a hosted/faster model for agents.defaults.model, and keep local models for vision/transcription/simple chat. Your logs show the 4GB GPU/CPU backend is losing most of the battle at prompt prefill + tool envelope.

errant merlin
#

slot create_check: id 0 | task 0 | created context checkpoint 1 of 32 (pos_min = 8191, pos_max = 8191, n_tokens = 8192, size = 0.282 MiB)
srv stop: cancel task, id_task = 0
srv log_server_r: done request: POST /v1/chat/completions 127.0.0.1 200
srv params_from_: Chat format: peg-native
slot release: id 0 | task 0 | stop processing: n_tokens = 10240, truncated = 0
slot get_availabl: id 0 | task -1 | selected slot by LCP similarity, sim_best = 0.738 (> 0.100 thold), f_keep = 1.000
slot launch_slot_: id 0 | task -1 | sampler chain: logits -> ?penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 0 | task 7 | processing task, is_child = 0
slot update_slots: id 0 | task 7 | new prompt, n_ctx_slot = 65536, n_keep = 0, task.n_tokens = 13878
slot update_slots: id 0 | task 7 | n_tokens = 10240, memory_seq_rm [10240, end)
slot update_slots: id 0 | task 7 | prompt processing progress, n_tokens = 12288, batch.n_tokens = 2048, progress = 0.885430
slot update_slots: id 0 | task 7 | n_tokens = 12288, memory_seq_rm [12288, end)
slot update_slots: id 0 | task 7 | prompt processing progress, n_tokens = 13362, batch.n_tokens = 1074, progress = 0.962819
srv stop: cancel task, id_task = 7
srv log_server_r: done request: POST /v1/chat/completions 127.0.0.1 200

stable yokeBOT
#

This log says the model was almost done prefilling, then the client cancelled before generation started:

task.n_tokens = 13878
progress = 0.962819
srv stop: cancel task

So the prompt is ~13.9k tokens. That fits inside the 16k context window OpenClaw requires, but it is still heavy for your backend. It reached 96% of prompt processing and then got aborted: classic idle watchdog / caller timeout before first token.
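
The same reading can be automated with a few lines of Python. This is a sketch that assumes the llama.cpp log keeps the wording shown in the lines you pasted; treating "cancel seen while progress is below 1.0" as "cancelled during prefill" is my own heuristic.

```python
import re


def summarize_prefill(log_text):
    """Extract prompt size, last reported progress, and whether a cancel was seen."""
    prompt_tokens = None
    progress = 0.0
    cancelled = False
    for line in log_text.splitlines():
        match = re.search(r"task\.n_tokens = (\d+)", line)
        if match:
            prompt_tokens = int(match.group(1))
        match = re.search(r"progress = ([0-9.]+)", line)
        if match:
            progress = float(match.group(1))
        if "cancel task" in line:
            cancelled = True
    return {
        "prompt_tokens": prompt_tokens,
        "progress": progress,
        # Heuristic: a cancel before progress reached 1.0 means no token was generated.
        "cancelled_during_prefill": cancelled and progress < 1.0,
    }


# Lines for task 7, copied from the llama.cpp log in this thread.
sample = """\
slot update_slots: id 0 | task 7 | new prompt, n_ctx_slot = 65536, n_keep = 0, task.n_tokens = 13878
slot update_slots: id 0 | task 7 | prompt processing progress, n_tokens = 12288, batch.n_tokens = 2048, progress = 0.885430
slot update_slots: id 0 | task 7 | prompt processing progress, n_tokens = 13362, batch.n_tokens = 1074, progress = 0.962819
srv stop: cancel task, id_task = 7
"""

summary = summarize_prefill(sample)
print(summary)
```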

Since you updated to 2026.5.7, try provider timeout again now:

openclaw config set models.providers.custom-localhost-8080.timeoutSeconds 600 --strict-json
openclaw config set agents.defaults.timeoutSeconds 600 --strict-json
openclaw config validate
#
openclaw gateway restart

Also make the local planner as lean as possible:

openclaw config set agents.defaults.experimental.localModelLean true --strict-json
openclaw config set channels.discord.historyLimit 0 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].compat.supportsTools false --strict-json
openclaw config validate

Then /new before testing again.

One more thing: your server says:

#
n_ctx_slot = 65536

That means llama.cpp is running with a huge 65k context slot. On your hardware, try launching the model server with a smaller context, like 16k or 20k, not 65k. You only need to satisfy OpenClaw’s 16k minimum; 65k just makes KV/cache/memory pressure worse.

So: OpenClaw is not failing to send the prompt. The local server is spending too long digesting ~14k tokens, then OpenClaw/client gives up before first output token.

errant merlin
stable yokeBOT
#

Fair point — if KV is only ~200MB, then the 65k context setting probably isn’t the main memory problem.

The real bottleneck in your logs is prefill compute / time-to-first-token, not just KV size:

task.n_tokens = 13878
progress = 0.962819
srv stop: cancel task

That means llama.cpp was still chewing through ~14k prompt tokens and OpenClaw/client cancelled before the first generated token arrived. Even with small KV, prefill can be slow on CPU/low-end GPU because it still has to run the model over every prompt token.
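
A back-of-the-envelope estimate shows why the timeout matters. The prompt size comes from the log above; the prefill speeds are hypothetical, so measure your own from the llama.cpp timing output before drawing conclusions.

```python
def prefill_seconds(prompt_tokens, tokens_per_second):
    """Time to process the prompt before the first output token can appear."""
    return prompt_tokens / tokens_per_second


PROMPT_TOKENS = 13878  # task.n_tokens from the llama.cpp log in this thread

# Hypothetical prefill speeds in tokens per second; replace with measured values.
for speed in (10, 25, 100):
    seconds = prefill_seconds(PROMPT_TOKENS, speed)
    print(f"{speed} tok/s -> about {seconds:.0f}s before the first token")
```

If the estimate lands near or above your configured timeout, the cancel at 90 to 99% prefill is exactly what you would expect.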

So the useful fixes are:

openclaw config set models.providers.custom-localhost-8080.timeoutSeconds 600 --strict-json
#
openclaw config set agents.defaults.timeoutSeconds 600 --strict-json
openclaw config set agents.defaults.experimental.localModelLean true --strict-json
openclaw config set channels.discord.historyLimit 0 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].compat.supportsTools false --strict-json
openclaw config validate

Then /new before testing again.

If it still cancels around 90–99% prefill, the model is simply not reaching first token before the idle/watchdog timeout. At that point either raise timeout more, speed up llama.cpp prefill, or use a lighter/faster model for the main OpenClaw agent loop.

#

Yes — now that you’re on 2026.5.7, try increasing both the provider/model wait and the overall agent turn wait:

openclaw config set models.providers.custom-localhost-8080.timeoutSeconds 900 --strict-json
openclaw config set agents.defaults.timeoutSeconds 900 --strict-json
openclaw config validate
openclaw gateway restart

For your vision/coder servers too:

openclaw config set models.providers.custom-localhost-8081.timeoutSeconds 900 --strict-json
openclaw config set models.providers.custom-localhost-8082.timeoutSeconds 900 --strict-json
openclaw config validate
#

When testing from CLI, also increase the CLI wait timeout, otherwise the CLI itself can give up first:

openclaw infer model run --model custom-localhost-8080/planner --prompt "Reply with exactly: hello" --gateway --json --timeout-ms 900000

So the three layers are:

  • models.providers.<id>.timeoutSeconds = model HTTP / stream idle wait
  • agents.defaults.timeoutSeconds = whole agent turn timeout
  • --timeout-ms = CLI command wait time

If it still cancels after 900s, then OpenClaw is waiting long enough and the local backend is simply too slow/stuck before first token.
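
One way to think about the three layers is that the shortest one decides when the request dies. The sketch below is a simplified model that assumes all three clocks start together and behave like plain wall-clock limits. The provider timeout was described above as a stream idle wait, so real behaviour can differ.

```python
def first_to_give_up(provider_s, agent_s, cli_ms):
    """Return the layer with the shortest limit, in seconds (simplified model)."""
    layers = {
        "models.providers.<id>.timeoutSeconds": provider_s,
        "agents.defaults.timeoutSeconds": agent_s,
        "--timeout-ms": cli_ms / 1000,
    }
    name = min(layers, key=layers.get)
    return name, layers[name]


# Config raised to 900s, but the CLI flag left at the earlier 600000 ms.
print(first_to_give_up(provider_s=900, agent_s=900, cli_ms=600000))
```

Under this model, raising only one or two of the layers does nothing if the remaining one is still shorter than the prefill time.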

errant merlin
#

16:00:18 [diagnostic] liveness warning: reasons=event_loop_delay interval=43s eventLoopDelayP99Ms=32.3 eventLoopDelayMaxMs=13463.7 eventLoopUtilization=0.365 cpuCoreRatio=0.369 active=1 waiting=0 queued=1 recentPhases=sidecars.restart-sentinel:0ms,sidecars.subagent-recovery:7ms,sidecars.main-session-recovery:5ms,post-attach.update-sentinel:0ms,sidecars.session-locks:38ms,post-ready.maintenance:867ms work=[active=agent:main:main(processing,q=1,age=15s) queued=agent:main:main(processing,q=1,age=15s)]
16:00:32 [ws] ⇄ res ✓ node.list 11700ms conn=4c78d5a0…99e8 id=b096b23d…479d
16:00:32 [agent/embedded] [trace:embedded-run] startup stages: runId=aeb1742f-6b61-44cb-8c33-61199c28ae80 sessionId=903489c3-4d6a-487b-aced-ab4496860813 phase=attempt-dispatch totalMs=11646 stages=workspace:1ms@1ms,runtime-plugins:10ms@11ms,hooks:1ms@12ms,model-resolution:11390ms@11402ms,auth:234ms@11636ms,context-engine:1ms@11637ms,attempt-dispatch:9ms@11646ms

stable yokeBOT
#

This log points to Gateway/Node event loop stalls, not just model prefill now.

Key bits:

eventLoopDelayMaxMs=13463.7
node.list 11700ms
model-resolution:11390ms

So something blocked/slowed the OpenClaw Node process for ~11–13 seconds. That’s why even node.list took 11.7s — that call does not need your LLM to generate text.

Likely causes on your setup:

  • local model server is eating CPU/RAM and starving the Gateway process
  • antivirus/Windows Defender scanning node_modules / .openclaw
  • Gateway running from nvm path + service/login-item weirdness after update
#
  • too many active/retrying stuck model runs queued in the same session

Do this next:

openclaw status --deep
openclaw gateway status
openclaw config validate

Then check if models.providers.*.timeoutSeconds now works after update:

openclaw config get models.providers.custom-localhost-8080.timeoutSeconds

If it’s unset:

#
openclaw config set models.providers.custom-localhost-8080.timeoutSeconds 900 --strict-json
openclaw config validate

But separately, I’d stop the local model server once and see if OpenClaw UI/API becomes fast. If node.list drops back under ~100ms when the model server is stopped, the machine is simply resource-starved while llama.cpp is running.

errant merlin
#

PS C:\WINDOWS\system32> openclaw config get models.providers.custom-localhost-8080.timeoutSeconds

🦞 OpenClaw 2026.5.7 (eeef486) — I'll do the boring stuff while you dramatically stare at the logs like it's cinema.

900
PS C:\WINDOWS\system32>

stable yokeBOT
#

Good — provider timeout is set correctly now:

custom-localhost-8080.timeoutSeconds = 900

So OpenClaw should wait much longer for the planner model before cancelling the HTTP/model stream.

Next verify the other two layers:

openclaw config get agents.defaults.timeoutSeconds

Expected: 900 or similar.

And when testing from CLI, use:

#
openclaw infer model run --model custom-localhost-8080/planner --prompt "Reply with exactly: hello" --gateway --json --timeout-ms 900000

If that still times out/cancels while your llama.cpp log shows it is still prefilling, then it’s no longer an OpenClaw timeout config issue — the machine/backend is too slow or getting event-loop stalls under load.

Also your last log had node.list 11700ms and model-resolution 11390ms, which means the Gateway itself was delayed ~11s. I’d test once with the local model server stopped: if OpenClaw becomes fast, llama.cpp is starving the Gateway process.

errant merlin
#

it still returns json reply, wth

stable yokeBOT
#

If the Discord/OpenClaw chat reply itself is literally:

{"message":"..."}

then that’s your local model/server template, not OpenClaw intentionally sending a JSON file.

Two common causes:

  1. CLI test used --json
    If it’s from this command:
openclaw infer model run ... --json
#

remove --json:

openclaw infer model run --model custom-localhost-8080/planner --prompt "Reply with exactly: hello" --gateway --timeout-ms 900000
  2. llama.cpp chat format is forcing JSON-ish replies
    Your server log showed:
Chat format: peg-native

That template/model may prefer structured output like { "message": ... }.

Test directly against the server:

#
$body = @{
  model = "planner"
  messages = @(@{ role = "user"; content = "Reply with exactly: hello. Plain text only, no JSON." })
  max_tokens = 20
  stream = $false
} | ConvertTo-Json -Depth 5

Invoke-RestMethod -Uri http://127.0.0.1:8080/v1/chat/completions -Method Post -ContentType "application/json" -Body $body

If direct server output is also JSON-style, fix it in llama.cpp: try a different --chat-template / chat format for that model, or add a system prompt/template that says plain text only.
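
If you would rather script the direct test, here is a Python sketch of the same request using only the standard library. It assumes the server answers in the usual OpenAI-style chat completions shape (`choices[0].message.content`); the helper names are mine, and `looks_like_json` is just a quick heuristic for spotting replies like {"message": ...}.

```python
import json
import urllib.request


def build_request(base_url="http://127.0.0.1:8080/v1", model="planner"):
    """Build the same chat completions request as the PowerShell test."""
    body = {
        "model": model,
        "messages": [
            {"role": "user",
             "content": "Reply with exactly: hello. Plain text only, no JSON."}
        ],
        "max_tokens": 20,
        "stream": False,
    }
    return urllib.request.Request(
        base_url + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def looks_like_json(text):
    """Heuristic: True when the reply parses as a JSON object or array."""
    text = text.strip()
    if not text.startswith(("{", "[")):
        return False
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def ask_server(timeout_s=900):
    """Send the request; assumes an OpenAI-style response body."""
    with urllib.request.urlopen(build_request(), timeout=timeout_s) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]
```

With the llama.cpp server running, `looks_like_json(ask_server())` returning True would point at the model/server template rather than at OpenClaw.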

For OpenClaw, also make sure this is set for the weak local planner:

#
openclaw config set models.providers.custom-localhost-8080.models[0].compat.supportsTools false --strict-json
openclaw config validate

That reduces tool/structured-output pressure. But if the direct llama.cpp test still returns { "message": ... }, the real fix is the model server chat template.

errant merlin
#

maybe there was something with openclaw that caused the slow reply and weird result format

stable yokeBOT
#

Yeah, that’s possible — but it’s probably two different issues mixed together:

1. Slow reply
OpenClaw definitely adds overhead compared to talking to llama.cpp directly. Your logs showed:

core-plugin-tools ~5.3s
bundle-tools ~2s
session-resource-loader ~2.7s
model-resolution ~11s sometimes

So OpenClaw is adding several seconds before the model even starts. Then your local model has to prefill ~14k tokens, which is the much bigger delay.

To reduce OpenClaw overhead for local models:

#
openclaw config set agents.defaults.experimental.localModelLean true --strict-json
openclaw config set channels.discord.historyLimit 0 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].compat.supportsTools false --strict-json
openclaw config validate

That makes it lighter, but also less agentic/tool-capable.

2. Weird JSON reply format
That’s more likely from the local model/chat template, especially because your llama.cpp log said:

Chat format: peg-native

OpenClaw can encourage structured/tool-ish outputs, but if the model replies like:

#
{"message":"hi..."}

even when tools are disabled, it’s probably the model template or server format.

Best isolation test:

openclaw infer model run --model custom-localhost-8080/planner --prompt "Reply with exactly: hello. Plain text only, no JSON." --gateway --timeout-ms 900000

No --json there.

Then test llama.cpp directly. If direct llama.cpp also returns JSON-style output, it’s definitely the model/server template. If direct is plain text but OpenClaw chat is JSON, then OpenClaw’s agent/tool prompt is pushing it that way — compat.supportsTools false should help.

errant merlin
#

here is my chat model template:

{{- bos_token -}}{%- set system_prompt = "" -%}{%- set ns = namespace(system_prompt="") -%}{%- if messages[0]["role"] == "system" -%} {%- set ns.system_prompt = messages[0]["content"] -%} {%- set messages = messages[1:] -%}{%- endif -%}{%- if tools -%} {%- set ns.system_prompt = ns.system_prompt + ("
" if ns.system_prompt else "") + "List of tools: <|tool_list_start|>[" -%} {%- for tool in tools -%} {%- if tool is not string -%} {%- set tool = tool | tojson -%} {%- endif -%} {%- set ns.system_prompt = ns.system_prompt + tool -%} {%- if not loop.last -%} {%- set ns.system_prompt = ns.system_prompt + ", " -%} {%- endif -%} {%- endfor -%} {%- set ns.system_prompt = ns.system_prompt + "]<|tool_list_end|>" -%}{%- endif -%}{%- if ns.system_prompt -%} {{- "<|im_start|>system
" + ns.system_prompt + "<|im_end|>
" -}}{%- endif -%}{%- for message in messages -%} {{- "<|im_start|>" + message["role"] + "
" -}} {%- set content = message["content"] -%} {%- if content is not string -%} {%- set content = content | tojson -%} {%- endif -%} {%- if message["role"] == "tool" -%} {%- set content = "<|tool_response_start|>" + content + "<|tool_response_end|>" -%} {%- endif -%} {{- content + "<|im_end|>
" -}}{%- endfor -%}{%- if add_generation_prompt -%} {{- "<|im_start|>assistant
" -}}{%- endif -%}

stable yokeBOT
#

Yep — this template can absolutely explain both the huge prompt and the JSON-ish replies.

The biggest issue is this block:

{%- if tools -%}
  ... "List of tools: <|tool_list_start|>[" ...
  {%- for tool in tools -%}
    {%- set tool = tool | tojson -%}

That means whenever OpenClaw sends tools, your template serializes the entire tool schema list into the system prompt. OpenClaw tools are large, so even “hi” becomes a massive prompt. That matches your llama.cpp logs showing ~14k tokens before generation.

It can also push the model toward structured/JSON output, because the prompt contains lots of JSON tool definitions.
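
To see the effect, here is a small Python imitation of the tool branch in your template. It only approximates the Jinja logic (`json.dumps` stands in for the `tojson` filter), and the tool definitions are hypothetical stand-ins, not OpenClaw's real tool schemas.

```python
import json


def render_system_prompt(system_prompt, tools):
    """Mirror the template's tool branch: append every tool as JSON text."""
    if tools:
        if system_prompt:
            system_prompt += "\n"
        serialized = ", ".join(
            tool if isinstance(tool, str) else json.dumps(tool) for tool in tools
        )
        system_prompt += (
            "List of tools: <|tool_list_start|>[" + serialized + "]<|tool_list_end|>"
        )
    return system_prompt


def fake_tool(index):
    """Hypothetical tool schema, used only to illustrate prompt growth."""
    return {
        "type": "function",
        "function": {
            "name": f"tool_{index}",
            "description": "Hypothetical tool used only to illustrate prompt growth. " * 3,
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "Hypothetical argument."}
                },
                "required": ["path"],
            },
        },
    }


base = "You are a helpful assistant."
without_tools = render_system_prompt(base, [])
with_tools = render_system_prompt(base, [fake_tool(i) for i in range(20)])

print("characters without tools:", len(without_tools))
print("characters with 20 tools:", len(with_tools))
```

Every one of those extra characters has to be prefilled on each turn, which is why disabling tools shrinks the prompt so much.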

Try these fixes, in order:

#
  1. Disable tool use for this local planner:
openclaw config set models.providers.custom-localhost-8080.models[0].compat.supportsTools false --strict-json
openclaw config validate
openclaw gateway restart
  2. Keep lean mode on:
openclaw config set agents.defaults.experimental.localModelLean true --strict-json
  3. If you control the llama.cpp template, make a non-tool chat template for this model: remove the whole {% if tools %} ... tool_list ... {% endif %} section. For this local model, plain chat is better than pretending it can handle OpenClaw’s full tool schema.

  4. If you still want tools later, use a model/template with native OpenAI-compatible tool calling support. This current template is basically stuffing tools into plain text, which is expensive and brittle.

#

So yes: OpenClaw is heavy, but your template is amplifying it by dumping tool JSON into the prompt. For your hardware, I’d run planner with supportsTools: false and use a stronger/hosted model for real coding/tool loops.

worthy wave
#

Hi - this thread is getting very long and will be closed soon. Please create a new one for any new questions.

errant merlin
#

please sum up this whole thread @stable yoke

stable yokeBOT
#

Quick thread summary before this gets too long:

You wanted to run local models in OpenClaw on a machine with a 4GB GPU + 32GB RAM, including separate models for planning/coding, vision, and Whisper/audio.

What we found:

  • OpenClaw can use separate model slots:
    • agents.defaults.model for the main assistant/planner/coding model
    • agents.defaults.imageModel for image-capable model fallback
    • tools.media.audio for Whisper/audio transcription
  • Whisper should not be configured as a normal chat model. It belongs under tools.media.audio.
  • For whisper.cpp, point the command at whisper-cli.exe or main.exe using a full Windows path. Use command: "whisper" only if that is actually the name of your binary.
  • Your local model providers were configured as custom OpenAI-compatible endpoints like:
    • custom-localhost-8080/planner
    • custom-localhost-8081/vision
    • custom-localhost-8082/coder
  • On OpenClaw 2026.4.23, models.providers.*.timeoutSeconds was rejected. After updating to 2026.5.7, it works, and you set planner timeout to 900.
#
  • The Gateway/device pairing issue was separate: the CLI needed a scope upgrade, which you approved with openclaw devices approve <requestId>.
  • The spawn start ENOENT after openclaw gateway restart looked like a Windows Startup-folder login-item restart path issue, not model config. Suggested checking openclaw gateway status, repairing with openclaw doctor / openclaw gateway install, or running openclaw gateway run temporarily.
  • Windows symlink errors for the browser-automation skill were caused by Windows blocking symlink creation. Fix: enable Developer Mode or run elevated once/reinstall the gateway.

Main diagnosis:

Your slow replies were mostly not caused by network transfer. llama.cpp logs showed OpenClaw was sending a large prompt, then the model spent a long time in prompt prefill before producing the first token. Example logs showed ~13k–16k prompt tokens and then cancellation before generation.
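A rough way to reason about this is prompt tokens divided by prefill speed. The sketch below is only back-of-envelope arithmetic: the prompt sizes and the 900-second timeout come from this thread, but the prefill speeds are hypothetical placeholders (measure your own from the llama.cpp logs), and it ignores generation time and any prompt caching.

```python
TIMEOUT_SECONDS = 900  # timeout value configured earlier in this thread


def seconds_to_first_token(prompt_tokens: int, prefill_tokens_per_second: float) -> float:
    """Estimate the prefill wait before the first generated token."""
    if prefill_tokens_per_second <= 0:
        raise ValueError("prefill speed must be positive")
    return prompt_tokens / prefill_tokens_per_second


# Hypothetical prefill speeds in tokens per second; not measured on this machine.
for prompt_tokens in (13_000, 16_000):
    for speed in (20.0, 50.0, 200.0):
        wait = seconds_to_first_token(prompt_tokens, speed)
        verdict = "within" if wait < TIMEOUT_SECONDS else "over"
        print(f"{prompt_tokens} tokens at {speed:.0f} tok/s: {wait:.0f}s ({verdict} timeout)")
```

If the measured prefill speed is low, shrinking the prompt (lean mode, no tool list) helps far more than raising the timeout.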

Why even “hi” was heavy:

  • OpenClaw full agent mode includes system instructions, tools, workspace/bootstrap context, session metadata, Discord context, and model/tool protocol text.
  • OpenClaw requires at least ~16k context for the embedded agent loop.
  • Your chat template was also adding the entire tool list as JSON into the system prompt:
List of tools: <|tool_list_start|>[ ... ]<|tool_list_end|>
#

That likely inflated prompts and encouraged JSON-looking replies.

Recommended local-model config direction:

openclaw config set models.providers.custom-localhost-8080.timeoutSeconds 900 --strict-json
openclaw config set agents.defaults.timeoutSeconds 900 --strict-json
openclaw config set agents.defaults.experimental.localModelLean true --strict-json
openclaw config set channels.discord.historyLimit 0 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].reasoning false --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].maxTokens 512 --strict-json

For full agent mode, keep the planner/vision context at 16k or more:

#
openclaw config set models.providers.custom-localhost-8080.models[0].contextWindow 16000 --strict-json
openclaw config set models.providers.custom-localhost-8080.models[0].contextTokens 16000 --strict-json
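As a sanity check on those numbers, here is a small Python sketch of the token budget: the context window has to hold both the prompt and the reply. The 16000 and 512 values are the settings from this thread and the prompt sizes are the range seen in the logs. What OpenClaw or llama.cpp actually does when the budget goes negative is not covered here, so treat a negative result only as a warning sign.

```python
def remaining_context(context_window: int, prompt_tokens: int, max_tokens: int) -> int:
    """Tokens left after reserving room for the prompt and the reply."""
    return context_window - prompt_tokens - max_tokens


CONTEXT_WINDOW = 16_000  # contextWindow / contextTokens from the commands above
MAX_TOKENS = 512         # maxTokens from the recommended config

for prompt_tokens in (13_000, 14_000, 16_000):
    left = remaining_context(CONTEXT_WINDOW, prompt_tokens, MAX_TOKENS)
    status = "fits" if left >= 0 else "does not fit"
    print(f"prompt {prompt_tokens}: {left} tokens spare ({status})")
```

At the upper end of the observed prompt sizes there is no headroom left, which is another reason to cut the prompt rather than only tune timeouts.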

If it is still too slow, a last resort for the local planner:

openclaw config set models.providers.custom-localhost-8080.models[0].compat.supportsTools false --strict-json

Tradeoff: that makes the local planner much lighter, but no real tool-calling loop.

Bottom line:

Local vision/transcription/simple chat may work on this hardware. Full coding/tool-agent loops are likely too heavy for a 4GB GPU + CPU inference unless you heavily reduce tools/context or use a faster/hosted model for the main agent loop.