LM Studio load nemotron-3-nano-30b-a3b | Friends of the Crustacean 🦞🤝 | Page 1

frozen saffron Feb 21, 2026, 4:37 PM

#

"meta": {
"lastTouchedVersion": "2026.2.19-2",
"lastTouchedAt": "2026-02-21T16:24:48.063Z"
},
"wizard": {
"lastRunAt": "2026-02-21T16:24:47.981Z",
"lastRunVersion": "2026.2.19-2",
"lastRunCommand": "configure",
"lastRunMode": "local"
},
"models": {
"mode": "merge",
"providers": {
"vllm": {
"baseUrl": "http://77.215.165.xxx:yyyy/v1",
"apiKey": "VLLM_API_KEY",
"api": "openai-completions",
"models": [
{
"id": "nemotron-3-nano-30b-a3b",
"name": "nemotron-3-nano-30b-a3b",
"reasoning": false,
"input": [
"text"
],
"cost": {
"input": 0,
"output": 0,
"cacheRead": 0,
"cacheWrite": 0
},
"contextWindow": 64000,
"maxTokens": 8192
}
]
}
}
},
"agents": {
"defaults": {
"model": {
"primary": "vllm/nemotron-3-nano-30b-a3b"
},
"models": {
"vllm/nemotron-3-nano-30b-a3b": {}
},
"workspace": "C:\Users\openc\.openclaw\workspace",
"compaction": {
"mode": "safeguard"
},
"maxConcurrent": 4,
"subagents": {
"maxConcurrent": 8
}
}
},
"messages": {
"ackReactionScope": "group-mentions"
},
"commands": {
"native": "auto",
"nativeSkills": "auto",
"restart": true

#

Cant find anyting worng with the config anyone can help ?

gusty quarryBOT Feb 21, 2026, 4:45 PM

#

That LLM request timed out line usually means: the Gateway did reach “a model provider”, but the HTTP call to the model API never produced a valid response before the Gateway’s request deadline.

For vLLM/OpenAI-compatible local servers, the most common gotcha is the API mode:

1) If your provider is named `vllm`, make sure you set `api: "openai-completions"`

For custom providers (models.providers.*), OpenClaw will default to openai-responses if you don’t specify api. vLLM typically supports Chat Completions, not the newer Responses API → that mismatch often shows up as “hangs then times out”.

Config should look like:

models: {
  mode: "merge",
  providers: {
    vllm: {
      baseUrl: "http://127.0.0.1:8000/v1",
      apiKey: "sk-local",              // whatever you use

#

Config-wise this looks basically right now (api: "openai-completions" is the important bit), so if you’re still getting LLM request timed out it’s almost always (A) the Gateway can’t actually reach that URL or (B) the server is reachable but it’s too slow to produce the first token.

1) First check: can the Gateway machine reach the model endpoint?

On the same machine where OpenClaw is running:

curl -v http://77.215.165.171:2222/v1/models

Then try an actual completion:

curl -s http://77.215.165.171:2222/v1/chat/completions ^
  -H "Content-Type: application/json" ^
  -H "Authorization: Bearer VLLM_API_KEY" ^
  -d "{\"model\":\"nemotron-3-nano-30b-a3b\",\"messages\":[{\"role\":\"user\",\"content\":\"say hi\"}],\"max_tokens\":32,\"stream\":false}"

#

Your config doesn’t look “wrong” at a glance — the api: "openai-completions" part is exactly what I’d expect for an OpenAI-compatible server.

When you still get LLM request timed out, it’s usually one of these:

1) `apiKey` is literal, not an env var

Right now you have:

apiKey: "VLLM_API_KEY"

That is the actual key string OpenClaw will send.

If you meant “read this from my environment”, it must be:

#

      api: "openai-completions",       // important for vLLM
      models: [{ id: "nemotron-3-nano-30b-a3b", name: "Nemotron 3 Nano 30B A3B" }],
    },
  },
}

Docs: https://docs.openclaw.ai/gateway/local-models (and config reference for the api field)

2) Quick connectivity check from the Gateway machine

If vLLM is supposed to be local/same box:

curl http://127.0.0.1:8000/v1/models

frozen saffron Feb 21, 2026, 4:48 PM

#

PS C:> curl -v http://77.215.165.171:2222/v1/models
VERBOSE: GET with 0-byte payload

gusty quarryBOT Feb 21, 2026, 4:49 PM

#

If either of these hangs/fails → it’s network/firewall/bind/NAT, not OpenClaw.
If they return quickly → we look at OpenClaw/logs + streaming.

2) Don’t use your public IP if the model is actually local/LAN

If LM Studio/vLLM is running on the same Windows box as OpenClaw, use:

http://127.0.0.1:<port>/v1 (best), or
your LAN IP (192.168.x.x), not the public IP.

A lot of home routers don’t support NAT hairpin reliably, and it can manifest as “timeouts”.

3) Reduce concurrency while testing

Local models + OpenClaw with maxConcurrent: 4 and subagents can overwhelm the server and cause timeouts. For debugging, set:

agents: { defaults: { maxConcurrent: 1, subagents: { maxConcurrent: 1 } } }

#

apiKey: "${VLLM_API_KEY}"

(or just paste the real key value there).

Also: if you’re running the Gateway as a Windows service/daemon, it often doesn’t inherit your shell env vars, so ${VLLM_API_KEY} will fail unless you set it for the service/user environment.

2) The Gateway can’t reach `http://77.215.165.171:2222/v1` (network/NAT/firewall)

From the same machine running OpenClaw, run:

curl -v http://77.215.165.171:2222/v1/models

If that hangs/fails, OpenClaw will time out too (and the fix is networking, not config).

#

If that’s slow/hanging, OpenClaw will definitely time out too.

3) If it still times out, the next 3 things to check

vLLM logs: look for OOM / model still loading / KV cache issues.
Cold-start latency: first request can be very slow if the model isn’t already loaded.
Token/context limits: Nemotron 30B class models can be painfully slow if your prompt is big and you ask for lots of output.

4) What I need to pinpoint it

Paste (redact keys):

openclaw --version
openclaw status --all
your models.providers.vllm { ... } block
~20 lines of openclaw logs --follow around the timeout

That’ll make it obvious whether it’s (a) API-mode mismatch, (b) baseUrl reachability, or (c) the model just can’t answer fast enough on the available hardware.

frozen saffron Feb 21, 2026, 4:52 PM

#

PS C:> openclaw --version
2026.2.19-2

#

Diagnosis (read-only)

Gateway connection details:
Gateway target: ws://127.0.0.1:8989
Source: local loopback
Config: C:\Users\openc.openclaw\openclaw.json
Bind: loopback

✓ Config: C:\Users\openc.openclaw\openclaw.json
✓ Restart sentinel: none
✓ Port 8989
! Tailscale: off · unknown
error: Error: spawn tailscale ENOENT
✓ Skills: 3 eligible · 0 missing · C:\U

gusty quarryBOT Feb 21, 2026, 4:53 PM

#

Restart the gateway and test again.

4) Tiny config nit: `apiKey`

If you intended to read an env var, it should be ${VLLM_API_KEY} (env substitution), not "VLLM_API_KEY" (literal string). If your server doesn’t enforce auth, you can also just set a dummy key.

Docs worth skimming: https://docs.openclaw.ai/gateway/local-models

If you paste ~30 lines around the timeout from openclaw logs --follow (and confirm whether that 77.215…:2222 server is local, LAN, or truly remote), I can tell you which branch you’re on immediately.

#

If LM Studio/vLLM is actually on the same PC, don’t go out to a public IP — use http://127.0.0.1:<port>/v1 instead.

3) Try disabling streaming + lowering concurrency (just for debugging)

Some local servers are flaky with streaming, and maxConcurrent: 4 can overload the model box.

agents: {
  defaults: {
    maxConcurrent: 1,
    subagents: { maxConcurrent: 1 },
    models: {
      "vllm/nemotron-3-nano-30b-a3b": { streaming: false },
    },
  },
}

frozen saffron Feb 21, 2026, 4:59 PM

#

I think it the nemotron-3-nano-30b-a3b that is very slow

#

2026-02-21 17:58:30 [INFO]
[nemotron-3-nano-30b-a3b] Running chat completion on conversation with 7 messages.
2026-02-21 17:58:30 [INFO]
[nemotron-3-nano-30b-a3b] Streaming response...
2026-02-21 17:58:30 [DEBUG]
LlamaV4::predict slot selection: session_id=<empty> server-selected (LCP/LRU)
2026-02-21 17:58:30 [DEBUG]
slot get_availabl: id 1 | task -1 | selected slot by LRU, t_last = -1
slot launch_slot_: id 1 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> min-p -> ?xtc -> temp-ext -> dist
slot launch_slot_: id 1 | task 0 | processing task, is_child = 0
slot update_slots: id 1 | task 0 | new prompt, n_ctx_slot = 64000, n_keep = 15215, task.n_tokens = 15465
slot update_slots: id 1 | task 0 | n_tokens = 0, memory_seq_rm [0, end)
2026-02-21 17:58:30 [DEBUG]
slot update_slots: id 1 | task 0 | prompt processing progress, n_tokens = 1024, batch.n_tokens = 1024, progress = 0.066214
2026-02-21 17:58:30 [INFO]
[nemotron-3-nano-30b-a3b] Prompt processing progress: 0.0%

#

What model shall i use insted

gusty quarryBOT Feb 21, 2026, 5:01 PM

#

If you paste 2 things, we can pinpoint it fast

Output of: curl -v http://77.215.165.171:2222/v1/models
~30 lines around the timeout from: openclaw logs --follow (redact any keys)

That will tell us immediately whether it’s auth/env, connectivity, or the model server stalling.

#

That “Diagnosis (read-only)” snippet is good news: your Gateway is up and reachable on ws://127.0.0.1:8989 and the config file is being loaded. So the timeout is almost certainly happening when calling the model endpoint, not a Gateway startup/config parse issue.

Next steps (these usually pinpoint it in 2 minutes):

1) Verify the model server is reachable from the OpenClaw machine

Run on the same box as OpenClaw:

curl.exe -v --max-time 10 http://77.215.165.171:2222/v1/models

What to look for:

Hangs / timeout → networking/firewall/NAT/port-forwarding problem (OpenClaw will time out too).
401 Unauthorized → your server expects a real bearer token, but OpenClaw is not sending the right one.
200 + JSON list of models → connectivity is fine; move to step 2.

2) Fix `apiKey` (yours is currently a literal string)

#

Cool — version is current (2026.2.19-2), so we can stop worrying about “old bug”.

Now we need to prove whether the model endpoint is reachable + responding.

1) From that same PowerShell, run:

curl.exe -v --max-time 10 http://77.215.165.171:2222/v1/models

Paste the output (even if it errors). This will tell us immediately if it’s:

can’t connect / timeout (network/firewall/NAT),
401/403 (auth),
or OK (then it’s model latency/streaming/etc).

#LM Studio load nemotron-3-nano-30b-a3b

1) If your provider is named `vllm`, make sure you set `api: "openai-completions"`

1) First check: can the Gateway machine reach the model endpoint?

1) `apiKey` is literal, not an env var

2) Quick connectivity check from the Gateway machine

2) Don’t use your public IP if the model is actually local/LAN

3) Reduce concurrency while testing

2) The Gateway can’t reach `http://77.215.165.171:2222/v1` (network/NAT/firewall)

3) If it still times out, the next 3 things to check

4) What I need to pinpoint it

4) Tiny config nit: `apiKey`

3) Try disabling streaming + lowering concurrency (just for debugging)

If you paste 2 things, we can pinpoint it fast

1) Verify the model server is reachable from the OpenClaw machine

2) Fix `apiKey` (yours is currently a literal string)

1) From that same PowerShell, run:

2) Also run and paste:

#LM Studio load nemotron-3-nano-30b-a3b

1) If your provider is named vllm, make sure you set api: "openai-completions"

1) First check: can the Gateway machine reach the model endpoint?

1) apiKey is literal, not an env var

2) Quick connectivity check from the Gateway machine

2) Don’t use your public IP if the model is actually local/LAN

3) Reduce concurrency while testing

2) The Gateway can’t reach http://77.215.165.171:2222/v1 (network/NAT/firewall)

3) If it still times out, the next 3 things to check

4) What I need to pinpoint it

4) Tiny config nit: apiKey

3) Try disabling streaming + lowering concurrency (just for debugging)

If you paste 2 things, we can pinpoint it fast

1) Verify the model server is reachable from the OpenClaw machine

2) Fix apiKey (yours is currently a literal string)

1) From that same PowerShell, run:

2) Also run and paste:

1) If your provider is named `vllm`, make sure you set `api: "openai-completions"`

1) `apiKey` is literal, not an env var

2) The Gateway can’t reach `http://77.215.165.171:2222/v1` (network/NAT/firewall)

4) Tiny config nit: `apiKey`

2) Fix `apiKey` (yours is currently a literal string)