#My LLM is fast when run on its own but slow through OpenClaw

1 messages · Page 1 of 1 (latest)

nocturne stream
#

as the title says

thorny galleonBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

forest finchBOT
fierce raftBOT
#

A little slower is normal, but if it’s much slower, the usual reason is that OpenClaw sends a way bigger prompt than your raw test: system instructions, session history, tool schemas, channel context, etc.

Can you paste these first:

openclaw --version
openclaw status
openclaw models status

And tell me:

  • backend: LM Studio / Ollama / vLLM / other
  • exact model
  • whether this is slow here too:
#
openclaw infer model run --model <provider/model> --prompt "hi" --json

That helps separate:

  • model/backend is slow under OpenClaw prompt load
  • vs channel/gateway issue

If you’re using a local OpenAI-compatible /v1 backend, the common fixes are:

  • agents.defaults.experimental.localModelLean: true
  • models.providers.<provider>.models[].compat.supportsTools: false
  • if the backend expects plain string chat content: compat.requiresStringContent: true

Docs:

#

Paste the outputs and I’ll help narrow it down. 🦐

nocturne stream
#

so if i message in the LM Studio chat window, the replies are super fast. But when done through OpenClaw, its slow

fierce raftBOT
#

Yep, that’s usually expected, especially with local models.

LM Studio chat is basically:

  • short system prompt
  • short history
  • no tool schema
  • no agent/runtime overhead

OpenClaw is doing a lot more:

  • system + agent instructions
  • session/thread history
  • tool definitions
  • channel metadata
  • sometimes workspace/context too

So it’s not really the same prompt, even if your message text is the same.

#

Fastest things to try:

  1. Lean local mode
agents: {
  defaults: {
    experimental: {
      localModelLean: true
    }
  }
}
  1. Disable tools for that local model if you mostly want chat speed
models: {
#
  providers: {
    lmstudio: {
      models: [
        {
          id: "your-model-id",
          compat: {
            supportsTools: false
          }
        }
      ]
    }
  }
}
  1. Compare direct OpenClaw model call vs chat
nocturne stream
#

OpenClaw 2026.4.20

🦞 OpenClaw 2026.4.20 (unknown) — Ah, the fruit tree company! 🍎

Config : ~/.openclaw/openclaw.json
Agent dir : ~/.openclaw/agents/main/agent
Default : lmstudio/huihui-qwen3.5-9b-claude-4.6-opus-abliterated
Fallbacks (0) : -
Image model : -
Image fallbacks (0): -
Aliases (0) : -
Configured models (1): lmstudio/huihui-qwen3.5-9b-claude-4.6-opus-abliterated

Auth overview
Auth store : ~/.openclaw/agents/main/agent/auth-profiles.json
Shell env : off
Providers w/ OAuth/tokens (0): -

  • google effective=profiles:~/.openclaw/agents/main/agent/auth-profiles.json | profiles=1 (oauth=0, token=0, api_key=1) | google:default=yf...fv
  • lmstudio effective=profiles:~/.openclaw/agents/main/agent/auth-profiles.json | profiles=1 (oauth=0, token=0, api_key=1) | lmstudio:default=sk-lm-WY...s3s7Popy | models.json=LM...EN | source=models.json: ~/.openclaw/agents/main/agent/models.json
  • openai effective=profiles:~/.openclaw/agents/main/agent/auth-profiles.json | profiles=1 (oauth=0, token=0, api_key=1) | openai:default=hg...gh
  • openrouter effective=profiles:~/.openclaw/agents/main/agent/auth-profiles.json | profiles=1 (oauth=0, token=0, api_key=1) | openrouter:default=hguyghgb...ugughgbu

OAuth/token status

  • none
fierce raftBOT
#

Thanks, that version is current enough, so this probably is not just “you’re on an old build”.

What you’re seeing is pretty normal for a local 9B model:

  • LM Studio chat = small prompt, almost no overhead
  • OpenClaw = agent prompt + history + tool schema + Discord/thread context

So the model can feel fast in LM Studio, but slow once OpenClaw turns it into an agent workload.

The two best things to try are:

1) Enable lean local mode

openclaw config set agents.defaults.experimental.localModelLean true
openclaw gateway restart
#

That trims heavyweight default tools and reduces prompt bloat.

2) Disable tool schemas for the LM Studio model

In your LM Studio model entry, set:

compat: {
  supportsTools: false
}

Then restart the gateway.

If you want, paste your ~/.openclaw/agents/main/agent/models.json and I’ll show the exact edit.

Quick sanity check

Run:

#
openclaw infer model run --model lmstudio/huihui-qwen3.5-9b-claude-4.6-opus-abliterated --prompt "hi" --json

If that is already much slower than LM Studio chat, then the slowdown is mostly prompt/runtime load, not Discord itself.

Short version: if you want LM Studio-chat speed, OpenClaw usually needs either:

  • a stronger local model / more hardware, or
  • a cloud model for the agentic stuff

Docs:

If this helps, please right-click the solution message → Apps → ✅ Mark Solution so Answer Overflow can index it.

nocturne stream
#

Ill run this

openclaw infer model run --model lmstudio/huihui-qwen3.5-9b-claude-4.6-opus-abliterated --prompt "hi" --json