"meta": {
"lastTouchedVersion": "2026.2.19-2",
"lastTouchedAt": "2026-02-21T16:24:48.063Z"
},
"wizard": {
"lastRunAt": "2026-02-21T16:24:47.981Z",
"lastRunVersion": "2026.2.19-2",
"lastRunCommand": "configure",
"lastRunMode": "local"
},
"models": {
"mode": "merge",
"providers": {
"vllm": {
"baseUrl": "http://77.215.165.xxx:yyyy/v1",
"apiKey": "VLLM_API_KEY",
"api": "openai-completions",
"models": [
{
"id": "nemotron-3-nano-30b-a3b",
"name": "nemotron-3-nano-30b-a3b",
"reasoning": false,
"input": [
"text"
],
"cost": {
"input": 0,
"output": 0,
"cacheRead": 0,
"cacheWrite": 0
},
"contextWindow": 64000,
"maxTokens": 8192
}
]
}
}
},
"agents": {
"defaults": {
"model": {
"primary": "vllm/nemotron-3-nano-30b-a3b"
},
"models": {
"vllm/nemotron-3-nano-30b-a3b": {}
},
"workspace": "C:\Users\openc\.openclaw\workspace",
"compaction": {
"mode": "safeguard"
},
"maxConcurrent": 4,
"subagents": {
"maxConcurrent": 8
}
}
},
"messages": {
"ackReactionScope": "group-mentions"
},
"commands": {
"native": "auto",
"nativeSkills": "auto",
"restart": true
#LM Studio load nemotron-3-nano-30b-a3b
1 messages · Page 1 of 1 (latest)
That LLM request timed out line usually means: the Gateway did reach “a model provider”, but the HTTP call to the model API never produced a valid response before the Gateway’s request deadline.
For vLLM/OpenAI-compatible local servers, the most common gotcha is the API mode:
1) If your provider is named vllm, make sure you set api: "openai-completions"
For custom providers (models.providers.*), OpenClaw will default to openai-responses if you don’t specify api. vLLM typically supports Chat Completions, not the newer Responses API → that mismatch often shows up as “hangs then times out”.
Config should look like:
models: {
mode: "merge",
providers: {
vllm: {
baseUrl: "http://127.0.0.1:8000/v1",
apiKey: "sk-local", // whatever you use
Config-wise this looks basically right now (api: "openai-completions" is the important bit), so if you’re still getting LLM request timed out it’s almost always (A) the Gateway can’t actually reach that URL or (B) the server is reachable but it’s too slow to produce the first token.
1) First check: can the Gateway machine reach the model endpoint?
On the same machine where OpenClaw is running:
curl -v http://77.215.165.171:2222/v1/models
Then try an actual completion:
curl -s http://77.215.165.171:2222/v1/chat/completions ^
-H "Content-Type: application/json" ^
-H "Authorization: Bearer VLLM_API_KEY" ^
-d "{\"model\":\"nemotron-3-nano-30b-a3b\",\"messages\":[{\"role\":\"user\",\"content\":\"say hi\"}],\"max_tokens\":32,\"stream\":false}"
Your config doesn’t look “wrong” at a glance — the api: "openai-completions" part is exactly what I’d expect for an OpenAI-compatible server.
When you still get LLM request timed out, it’s usually one of these:
1) apiKey is literal, not an env var
Right now you have:
apiKey: "VLLM_API_KEY"
That is the actual key string OpenClaw will send.
If you meant “read this from my environment”, it must be:
api: "openai-completions", // important for vLLM
models: [{ id: "nemotron-3-nano-30b-a3b", name: "Nemotron 3 Nano 30B A3B" }],
},
},
}
Docs: https://docs.openclaw.ai/gateway/local-models (and config reference for the api field)
2) Quick connectivity check from the Gateway machine
If vLLM is supposed to be local/same box:
curl http://127.0.0.1:8000/v1/models
PS C:> curl -v http://77.215.165.171:2222/v1/models
VERBOSE: GET with 0-byte payload
- If either of these hangs/fails → it’s network/firewall/bind/NAT, not OpenClaw.
- If they return quickly → we look at OpenClaw/logs + streaming.
2) Don’t use your public IP if the model is actually local/LAN
If LM Studio/vLLM is running on the same Windows box as OpenClaw, use:
http://127.0.0.1:<port>/v1(best), or- your LAN IP (
192.168.x.x), not the public IP.
A lot of home routers don’t support NAT hairpin reliably, and it can manifest as “timeouts”.
3) Reduce concurrency while testing
Local models + OpenClaw with maxConcurrent: 4 and subagents can overwhelm the server and cause timeouts. For debugging, set:
agents: { defaults: { maxConcurrent: 1, subagents: { maxConcurrent: 1 } } }
apiKey: "${VLLM_API_KEY}"
(or just paste the real key value there).
Also: if you’re running the Gateway as a Windows service/daemon, it often doesn’t inherit your shell env vars, so ${VLLM_API_KEY} will fail unless you set it for the service/user environment.
2) The Gateway can’t reach http://77.215.165.171:2222/v1 (network/NAT/firewall)
From the same machine running OpenClaw, run:
curl -v http://77.215.165.171:2222/v1/models
If that hangs/fails, OpenClaw will time out too (and the fix is networking, not config).
If that’s slow/hanging, OpenClaw will definitely time out too.
3) If it still times out, the next 3 things to check
- vLLM logs: look for OOM / model still loading / KV cache issues.
- Cold-start latency: first request can be very slow if the model isn’t already loaded.
- Token/context limits: Nemotron 30B class models can be painfully slow if your prompt is big and you ask for lots of output.
4) What I need to pinpoint it
Paste (redact keys):
openclaw --versionopenclaw status --all- your
models.providers.vllm { ... }block - ~20 lines of
openclaw logs --followaround the timeout
That’ll make it obvious whether it’s (a) API-mode mismatch, (b) baseUrl reachability, or (c) the model just can’t answer fast enough on the available hardware.
PS C:> openclaw --version
2026.2.19-2
Diagnosis (read-only)
Gateway connection details:
Gateway target: ws://127.0.0.1:8989
Source: local loopback
Config: C:\Users\openc.openclaw\openclaw.json
Bind: loopback
âś“ Config: C:\Users\openc.openclaw\openclaw.json
âś“ Restart sentinel: none
âś“ Port 8989
! Tailscale: off · unknown
error: Error: spawn tailscale ENOENT
✓ Skills: 3 eligible · 0 missing · C:\U
Restart the gateway and test again.
4) Tiny config nit: apiKey
If you intended to read an env var, it should be ${VLLM_API_KEY} (env substitution), not "VLLM_API_KEY" (literal string). If your server doesn’t enforce auth, you can also just set a dummy key.
Docs worth skimming: https://docs.openclaw.ai/gateway/local-models
If you paste ~30 lines around the timeout from openclaw logs --follow (and confirm whether that 77.215…:2222 server is local, LAN, or truly remote), I can tell you which branch you’re on immediately.
If LM Studio/vLLM is actually on the same PC, don’t go out to a public IP — use http://127.0.0.1:<port>/v1 instead.
3) Try disabling streaming + lowering concurrency (just for debugging)
Some local servers are flaky with streaming, and maxConcurrent: 4 can overload the model box.
agents: {
defaults: {
maxConcurrent: 1,
subagents: { maxConcurrent: 1 },
models: {
"vllm/nemotron-3-nano-30b-a3b": { streaming: false },
},
},
}
I think it the nemotron-3-nano-30b-a3b that is very slow
2026-02-21 17:58:30 [INFO]
[nemotron-3-nano-30b-a3b] Running chat completion on conversation with 7 messages.
2026-02-21 17:58:30 [INFO]
[nemotron-3-nano-30b-a3b] Streaming response...
2026-02-21 17:58:30 [DEBUG]
LlamaV4::predict slot selection: session_id=<empty> server-selected (LCP/LRU)
2026-02-21 17:58:30 [DEBUG]
slot get_availabl: id 1 | task -1 | selected slot by LRU, t_last = -1
slot launch_slot_: id 1 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 1 | task 0 | processing task, is_child = 0
slot update_slots: id 1 | task 0 | new prompt, n_ctx_slot = 64000, n_keep = 15215, task.n_tokens = 15465
slot update_slots: id 1 | task 0 | n_tokens = 0, memory_seq_rm [0, end)
2026-02-21 17:58:30 [DEBUG]
slot update_slots: id 1 | task 0 | prompt processing progress, n_tokens = 1024, batch.n_tokens = 1024, progress = 0.066214
2026-02-21 17:58:30 [INFO]
[nemotron-3-nano-30b-a3b] Prompt processing progress: 0.0%
What model shall i use insted
If you paste 2 things, we can pinpoint it fast
- Output of:
curl -v http://77.215.165.171:2222/v1/models - ~30 lines around the timeout from:
openclaw logs --follow(redact any keys)
That will tell us immediately whether it’s auth/env, connectivity, or the model server stalling.
That “Diagnosis (read-only)” snippet is good news: your Gateway is up and reachable on ws://127.0.0.1:8989 and the config file is being loaded. So the timeout is almost certainly happening when calling the model endpoint, not a Gateway startup/config parse issue.
Next steps (these usually pinpoint it in 2 minutes):
1) Verify the model server is reachable from the OpenClaw machine
Run on the same box as OpenClaw:
curl.exe -v --max-time 10 http://77.215.165.171:2222/v1/models
What to look for:
- Hangs / timeout → networking/firewall/NAT/port-forwarding problem (OpenClaw will time out too).
- 401 Unauthorized → your server expects a real bearer token, but OpenClaw is not sending the right one.
- 200 + JSON list of models → connectivity is fine; move to step 2.
2) Fix apiKey (yours is currently a literal string)
Cool — version is current (2026.2.19-2), so we can stop worrying about “old bug”.
Now we need to prove whether the model endpoint is reachable + responding.
1) From that same PowerShell, run:
curl.exe -v --max-time 10 http://77.215.165.171:2222/v1/models
Paste the output (even if it errors). This will tell us immediately if it’s:
- can’t connect / timeout (network/firewall/NAT),
- 401/403 (auth),
- or OK (then it’s model latency/streaming/etc).
2) Also run and paste: