#Local LM Studio setup constantly loading new LMS model with every chat response post update

lost ore
#

I just updated to 2026.4.15 from an older build (which worked fine). Now every basic chat response loads a new model, filling my available VRAM/RAM and tanking performance. The LMS service is running fine and I still get a response back.

Two interesting log lines popping up:

[agent/embedded] embedded run failover decision: runId=slug-gen-1776609671777 stage=assistant decision=surface_error reason=timeout from=lmstudio/qwen/Qwen3.5-27B profile=-
[lmstudio/stream] LM Studio inference preload failed for "qwen/Qwen3.5-27B" (1 consecutive failure, next preload attempt skipped for ~5s); continuing without preload: AbortError: This operation was aborted

fading belfryBOT
#

Let me verify this against the docs and source so I don't guess.

#

This looks a lot like the new LM Studio auto-load/preload path in 2026.4.x, not LM Studio itself suddenly breaking.

Two useful bits from your logs:

  • LM Studio inference preload failed ... continuing without preload = OpenClaw is calling LM Studio’s /api/v1/models/load before the real inference request.
  • runId=slug-gen-... = that one is a separate helper run for session-memory slug generation, not the main assistant reply.

My first suspicion is that OpenClaw is requesting a bigger context load than LM Studio already has loaded, so LM Studio keeps creating/reloading another instance and eats VRAM/RAM.

Can you paste these first:

openclaw --version
openclaw status --all
openclaw models status

Also, if you can, check whether LM Studio shows multiple loaded instances of the same model with different context lengths.
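
If that's easier than eyeballing the LM Studio UI, here is a minimal Python sketch that summarizes loaded instances from the JSON returned by LM Studio's /api/v1/models endpoint. It assumes the response has a top-level models array whose entries carry key and loaded_instances[].config.context_length. The function name and the sample data (including the instance ids) are made up for illustration.

```python
def loaded_context_lengths(models_response):
    """Map each model key to the context lengths of its loaded instances.

    Assumes the shape {"models": [{"key": ..., "loaded_instances": [...]}]}.
    Models with no loaded instances are left out of the result.
    """
    summary = {}
    for model in models_response.get("models", []):
        lengths = [
            instance.get("config", {}).get("context_length")
            for instance in model.get("loaded_instances", [])
        ]
        if lengths:
            summary[model["key"]] = lengths
    return summary


# Illustrative data only: one model loaded twice with different context sizes.
sample = {
    "models": [
        {
            "key": "example/model-a",
            "loaded_instances": [
                {"id": "example/model-a", "config": {"context_length": 8192}},
                {"id": "example/model-a:2", "config": {"context_length": 65536}},
            ],
        },
        {"key": "example/model-b", "loaded_instances": []},
    ]
}

for key, lengths in loaded_context_lengths(sample).items():
    flag = "multiple instances" if len(lengths) > 1 else "single instance"
    print(f"{key}: {lengths} ({flag})")
```

More than one entry for the same key, especially with different context lengths, would support the reload theory.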

Quick mitigation to try now: pin a smaller runtime cap for that LM Studio model via contextTokens, for example:

{
  models: {
    providers: {
      lmstudio: {
        models: [
          {
            id: "qwen/Qwen3.5-27B",
            contextTokens: 16384
          }
        ]
      }
    }
  }
}

lost ore
#

OpenClaw 2026.4.15 (041266a)

Diagnosis (read-only)

Gateway connection details:
Gateway target: ws://127.0.0.1:18789
Source: local loopback
Config: /root/.openclaw/openclaw.json
Bind: loopback

✓ Config: /root/.openclaw/openclaw.json
✓ Secret diagnostics (0)
✓ Restart sentinel: none
! Port 18789
Port 18789 is already in use.

  • pid 2940 root: openclaw-gateway (127.0.0.1:18789)
  • Gateway already running locally. Stop it (openclaw gateway stop) or use a different port.
    ! Tailscale: off · unknown
    ✓ Skills: 9 eligible · 0 missing · /root/.openclaw/workspace
    ✓ Plugin compatibility (none)
    ✓ Channel issues (none)

Config : ~/.openclaw/openclaw.json
Agent dir : ~/.openclaw/agents/main/agent
Default : lmstudio/qwen/Qwen3.5-27B
Fallbacks (0) : -
Image model : lmstudio/qwen/Qwen3.5-27B
Image fallbacks (0): -
Aliases (0) : -
Configured models (1): lmstudio/qwen/Qwen3.5-27B

Auth overview
Auth store : ~/.openclaw/agents/main/agent/auth-profiles.json
Shell env : off
Providers w/ OAuth/tokens (0): -

  • lmstudio effective=models.json:lm...al | models.json=lm...al | source=models.json: ~/.openclaw/agents/main/agent/models.json
  • ollama effective=profiles:~/.openclaw/agents/main/agent/auth-profiles.json | profiles=1 (oauth=0, token=0, api_key=1) | ollama:default=marker(ollama-local)
fading belfryBOT
#

Nice, that helps. A couple things are clear now:

  • The "Port 18789 is already in use" bit is harmless here; it just means the gateway is already running.
  • slug-gen-* is a side helper run, not the main chat turn, unless this is happening on /new or /reset.
  • The interesting bit is the LM Studio preload path. In 2026.4.15, OpenClaw now calls LM Studio’s /api/v1/models/load before inference, with a 30s timeout. For discovered LM Studio models, the default load target is usually 64k context unless contextTokens is lower.

So my current theory is:

  1. OpenClaw is asking LM Studio to preload qwen/Qwen3.5-27B at a large context size.
  2. LM Studio either times that load out, or creates another loaded instance for a different context length.
  3. Real inference still succeeds, which matches what you’re seeing.

Can you paste these 2 outputs?

jq '.providers.lmstudio.models[] | select(.id=="qwen/Qwen3.5-27B")' ~/.openclaw/agents/main/agent/models.json
curl -s http://127.0.0.1:1234/api/v1/models | jq '.models[] | select(.key=="qwen/Qwen3.5-27B") | {key,max_context_length,loaded_instances}'

If you want the fastest workaround right now, set a smaller runtime cap for that model, for example:

{
  models: {
    providers: {
      lmstudio: {
        models: [
          {
            id: "qwen/Qwen3.5-27B",
            contextTokens: 8192
          }
        ]
      }
    }
  }
}

If you paste those two outputs, I can tell you pretty quickly whether this is config-sized or bug-sized.

lost ore
#

Below is my model output. Setting a limit on contextTokens still reproduces the issue:

{
  "type": "llm",
  "publisher": "qwen",
  "key": "qwen/qwen3.5-27b",
  "display_name": "Qwen3.5 27B",
  "architecture": "qwen35",
  "quantization": {
    "name": "Q4_K_M",
    "bits_per_weight": 4
  },
  "size_bytes": 17468056915,
  "params_string": "27B",
  "loaded_instances": [
    {
      "id": "qwen/qwen3.5-27b",
      "config": {
        "context_length": 131072,
        "eval_batch_size": 512,
        "parallel": 4,
        "flash_attention": true,
        "offload_kv_cache_to_gpu": true
      }
    }
  ],
  "max_context_length": 262144,
  "format": "gguf",
  "capabilities": {
    "vision": true,
    "trained_for_tool_use": true
  },
  "description": null,
  "variants": [
    "qwen/qwen3.5-27b@q4_k_m"
  ],
  "selected_variant": "qwen/qwen3.5-27b@q4_k_m"
}

fading belfryBOT
#

That’s a really helpful detail, I think I see the bug now.

#

I think we found it.

Your LM Studio server reports this exact key:

qwen/qwen3.5-27b

But OpenClaw status showed the configured ref as:

lmstudio/qwen/Qwen3.5-27B

In the 2026.4.15 LM Studio preload path, OpenClaw does a case-sensitive lookup before deciding whether the model is already loaded. So with:

  • configured: qwen/Qwen3.5-27B
  • LM Studio reported: qwen/qwen3.5-27b

…it never recognizes the loaded instance, and keeps calling /api/v1/models/load again.

That also explains why lowering contextTokens did not help: the key lookup fails before any loaded-instance check can matter.
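
A minimal Python sketch of that failure mode, to make it concrete. This is not OpenClaw's actual code: both function names are hypothetical, and the only real values are the two keys from this thread and the loaded context length from your output.

```python
def find_loaded_exact(models, configured_key):
    """Exact, case-sensitive match: the behavior described above."""
    for model in models:
        if model["key"] == configured_key and model["loaded_instances"]:
            return model
    return None


def find_loaded_normalized(models, configured_key):
    """Same lookup, but comparing case-folded keys."""
    wanted = configured_key.casefold()
    for model in models:
        if model["key"].casefold() == wanted and model["loaded_instances"]:
            return model
    return None


# What LM Studio reports: lowercase key, one loaded instance.
models = [
    {
        "key": "qwen/qwen3.5-27b",
        "loaded_instances": [{"config": {"context_length": 131072}}],
    }
]

configured = "qwen/Qwen3.5-27B"  # mixed-case id from the OpenClaw config
print(find_loaded_exact(models, configured) is None)       # True: looks "not loaded"
print(find_loaded_normalized(models, configured) is None)  # False: instance is found
```

With the exact comparison the model always looks unloaded, so a caller following this logic would request a load on every turn.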

What to change

Make the configured LM Studio model ref match LM Studio’s key exactly, lowercase included:

lmstudio/qwen/qwen3.5-27b

and make sure the LM Studio catalog/model entry id is also:

qwen/qwen3.5-27b

I’d update that for both the default text model and the image model, since your status output showed both using the mixed-case ref.

Why I’m pretty confident

  • LM Studio output: key: "qwen/qwen3.5-27b"
  • OpenClaw output: Default: lmstudio/qwen/Qwen3.5-27B
  • Current preload code compares exact strings, it does not lowercase-normalize the model key before matching.

So yeah, this looks like a real regression for older configs that kept mixed-case LM Studio IDs.

If switching the ref to lowercase fixes it, please file a GitHub issue so it gets patched properly:
https://github.com/openclaw/openclaw/issues

lost ore
#

I fixed the case sensitivity in openclaw.json; sadly, this is not the issue.

fading belfryBOT
#

One more likely gotcha: your earlier status output said the LM Studio model was coming from models.json, not directly from openclaw.json.

#

I’m still suspicious of the stale models.json path.

The reason: your earlier models status said LM Studio was effectively coming from:

source=models.json: ~/.openclaw/agents/main/agent/models.json

and the docs say that file is merged by default unless models.mode: "replace" is set, so changing only openclaw.json may not change what the runtime is actually using.

Can you paste these two read-only checks?

openclaw models status
grep -nE 'Qwen3\.5-27B|qwen3\.5-27b' ~/.openclaw/agents/main/agent/models.json ~/.openclaw/openclaw.json

If models.json still contains the mixed-case ref, that would explain why fixing openclaw.json alone didn’t change behavior.

If both files are already lowercase, then we’ve ruled that out properly and I’d look next for a second stale ref, likely the image model/helper path, or a real LM Studio preload regression.
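
Since grep patterns are easy to get subtly wrong with mixed case, here is a rough Python sketch that walks an already-parsed config object and reports any string (dict key or value) that equals the LM Studio key only after case-folding. Assumptions: refs may carry an lmstudio/ provider prefix as in the status output, find_case_mismatches is a hypothetical helper, and the sample dict is invented and does not reflect the real openclaw.json or models.json schema.

```python
def find_case_mismatches(node, lmstudio_key, prefix="lmstudio/", path="$"):
    """Return (path, text) pairs that match lmstudio_key only case-insensitively."""
    hits = []

    def check(text, where):
        # Strip the provider prefix, then flag strings that differ only by case.
        bare = text[len(prefix):] if text.startswith(prefix) else text
        if bare != lmstudio_key and bare.casefold() == lmstudio_key.casefold():
            hits.append((where, text))

    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(key, str):
                check(key, f"{path}.{key} (key)")
            hits.extend(
                find_case_mismatches(value, lmstudio_key, prefix, f"{path}.{key}")
            )
    elif isinstance(node, list):
        for index, value in enumerate(node):
            hits.extend(
                find_case_mismatches(value, lmstudio_key, prefix, f"{path}[{index}]")
            )
    elif isinstance(node, str):
        check(node, path)
    return hits


# Invented structure for illustration; only the model strings come from this thread.
sample_config = {
    "default": "lmstudio/qwen/Qwen3.5-27B",
    "models": [
        {"id": "qwen/qwen3.5-27b", "name": "Qwen3.5-27B (LM Studio)"},
    ],
}

for where, text in find_case_mismatches(sample_config, "qwen/qwen3.5-27b"):
    print(f"{where}: {text}")
```

To run it against the real files you would load each one first (for example with json.load, assuming they parse as plain JSON) and pass the result in as node. The display name is deliberately not flagged, since it is not a model ref.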

Paste those two outputs and we can narrow it down another step.

lost ore
#

grep returns:

/root/.openclaw/agents/main/agent/models.json:32: "name": "Qwen3.5-27B (LM Studio)",
/root/.openclaw/openclaw.json:21: "name": "Qwen3.5-27B (LM Studio)"

models status shows:

Config : ~/.openclaw/openclaw.json
Agent dir : ~/.openclaw/agents/main/agent
Default : lmstudio/qwen/qwen3.5-27B
Fallbacks (0) : -
Image model : lmstudio/qwen/qwen3.5-27B
Image fallbacks (0): -
Aliases (0) : -
Configured models (1): lmstudio/qwen/qwen3.5-27B

Auth overview
Auth store : ~/.openclaw/agents/main/agent/auth-profiles.json
Shell env : off
Providers w/ OAuth/tokens (0): -

  • lmstudio effective=models.json:lm...al | models.json=lm...al | source=models.json: ~/.openclaw/agents/main/agent/models.json
  • ollama effective=profiles:~/.openclaw/agents/main/agent/auth-profiles.json | profiles=1 (oauth=0, token=0, api_key=1) | ollama:default=marker(ollama-local)
fading belfryBOT
#

Ah, I see the confusion now.

The grep output only showed the display name, and the display name is not the thing the preload code matches on.

This part is harmless:

"name": "Qwen3.5-27B (LM Studio)"

What matters is the actual model ref / id, and your models status is still showing:

Default     : lmstudio/qwen/qwen3.5-27B
Image model : lmstudio/qwen/qwen3.5-27B

LM Studio reported the real key as:

qwen/qwen3.5-27b

So the case mismatch is still there, just smaller than before. The final B is still uppercase in OpenClaw, and lowercase in LM Studio. Since the current preload lookup is exact-string matching, that still breaks it.
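
To make the remaining difference easy to spot, here is a small Python sketch that lists the character positions where two keys differ only by case. The helper name is made up for illustration. The strings are the refs from this thread with the lmstudio/ prefix removed.

```python
def case_only_differences(configured, reported):
    """List (index, configured_char, reported_char) where only letter case differs.

    Returns None when the strings differ by more than case, since that
    would not be a case-mismatch problem at all.
    """
    if len(configured) != len(reported):
        return None
    if configured.casefold() != reported.casefold():
        return None
    return [
        (index, a, b)
        for index, (a, b) in enumerate(zip(configured, reported))
        if a != b
    ]


reported = "qwen/qwen3.5-27b"  # key reported by LM Studio

# Original config: two characters differ.
print(case_only_differences("qwen/Qwen3.5-27B", reported))
# → [(5, 'Q', 'q'), (15, 'B', 'b')]

# After the first fix attempt: only the trailing B is left.
print(case_only_differences("qwen/qwen3.5-27B", reported))
# → [(15, 'B', 'b')]
```

An empty list means the two strings are identical, which is what an exact-string lookup needs.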

So the fix needs to be on the id/ref, not the name.

You want these exact values:

  • default model: lmstudio/qwen/qwen3.5-27b
  • image model: lmstudio/qwen/qwen3.5-27b
  • LM Studio model entry id: qwen/qwen3.5-27b

You can keep the pretty display name as Qwen3.5-27B (LM Studio) if you want, that part should not matter.

#

I’d verify by checking for the actual refs, not the name:

grep -nE 'qwen3\.5-27B|qwen3\.5-27b' ~/.openclaw/openclaw.json ~/.openclaw/agents/main/agent/models.json

What I’d expect after the fix is that openclaw models status shows:

Default     : lmstudio/qwen/qwen3.5-27b
Image model : lmstudio/qwen/qwen3.5-27b
Configured models (1): lmstudio/qwen/qwen3.5-27b

Until models status shows that exact lowercase ref, I don’t think we’ve actually ruled the case bug out yet.

lost ore
#

Looks like that did it. In which OpenClaw version was the case sensitivity introduced?

fading belfryBOT
#

Nice, glad we pinned it down. Let me check the source history so I don’t guess on the version.

#

As best I can tell from the source history, this came in with the initial bundled LM Studio integration, not specifically 2026.4.15.

The relevant LM Studio preload code first shows up in:

  • 0cfb83edfa feat: LM Studio Integration (#53248)

and the first tagged release containing that commit looks to be v2026.4.12.

So the short answer is: case-sensitive LM Studio preload matching appears to have been introduced in 2026.4.12.

2026.4.15 added the preload backoff/logging improvements, but it does not look like that release introduced the actual case-sensitive comparison.

This is definitely worth a GitHub issue, since the lookup should probably normalize the model key:
https://github.com/openclaw/openclaw/issues

Glad we caught it 🎉