#My LLM is fast when run on its own but slow through OpenClaw
1 messages · Page 1 of 1 (latest)
A little slower is normal, but if it’s much slower, the usual reason is that OpenClaw sends a way bigger prompt than your raw test: system instructions, session history, tool schemas, channel context, etc.
Can you paste these first:
openclaw --version
openclaw status
openclaw models status
And tell me:
- backend: LM Studio / Ollama / vLLM / other
- exact model
- whether this is slow here too:
openclaw infer model run --model <provider/model> --prompt "hi" --json
That helps separate:
- model/backend is slow under OpenClaw prompt load
- vs channel/gateway issue
If you’re using a local OpenAI-compatible /v1 backend, the common fixes are:
agents.defaults.experimental.localModelLean: truemodels.providers.<provider>.models[].compat.supportsTools: false- if the backend expects plain string chat content:
compat.requiresStringContent: true
Docs:
Paste the outputs and I’ll help narrow it down. 🦐
so if i message in the LM Studio chat window, the replies are super fast. But when done through OpenClaw, its slow
Yep, that’s usually expected, especially with local models.
LM Studio chat is basically:
- short system prompt
- short history
- no tool schema
- no agent/runtime overhead
OpenClaw is doing a lot more:
- system + agent instructions
- session/thread history
- tool definitions
- channel metadata
- sometimes workspace/context too
So it’s not really the same prompt, even if your message text is the same.
Fastest things to try:
- Lean local mode
agents: {
defaults: {
experimental: {
localModelLean: true
}
}
}
- Disable tools for that local model if you mostly want chat speed
models: {
providers: {
lmstudio: {
models: [
{
id: "your-model-id",
compat: {
supportsTools: false
}
}
]
}
}
}
- Compare direct OpenClaw model call vs chat
openclaw infer model run --model lmstudio/<your-model-id> --prompt "hi" --json
If that’s fast, but Discord/chat is slow, the extra agent/thread context is the main cost.
Docs:
- https://docs.openclaw.ai/gateway/local-models
- https://docs.openclaw.ai/concepts/experimental-features
If you want, paste:
openclaw --version
openclaw models status
and your LM Studio model ID, and I can suggest the exact config.
OpenClaw 2026.4.20
🦞 OpenClaw 2026.4.20 (unknown) — Ah, the fruit tree company! 🍎
Config : ~/.openclaw/openclaw.json
Agent dir : ~/.openclaw/agents/main/agent
Default : lmstudio/huihui-qwen3.5-9b-claude-4.6-opus-abliterated
Fallbacks (0) : -
Image model : -
Image fallbacks (0): -
Aliases (0) : -
Configured models (1): lmstudio/huihui-qwen3.5-9b-claude-4.6-opus-abliterated
Auth overview
Auth store : ~/.openclaw/agents/main/agent/auth-profiles.json
Shell env : off
Providers w/ OAuth/tokens (0): -
- google effective=profiles:~/.openclaw/agents/main/agent/auth-profiles.json | profiles=1 (oauth=0, token=0, api_key=1) | google:default=yf...fv
- lmstudio effective=profiles:~/.openclaw/agents/main/agent/auth-profiles.json | profiles=1 (oauth=0, token=0, api_key=1) | lmstudio:default=sk-lm-WY...s3s7Popy | models.json=LM...EN | source=models.json: ~/.openclaw/agents/main/agent/models.json
- openai effective=profiles:~/.openclaw/agents/main/agent/auth-profiles.json | profiles=1 (oauth=0, token=0, api_key=1) | openai:default=hg...gh
- openrouter effective=profiles:~/.openclaw/agents/main/agent/auth-profiles.json | profiles=1 (oauth=0, token=0, api_key=1) | openrouter:default=hguyghgb...ugughgbu
OAuth/token status
- none
Thanks, that version is current enough, so this probably is not just “you’re on an old build”.
What you’re seeing is pretty normal for a local 9B model:
- LM Studio chat = small prompt, almost no overhead
- OpenClaw = agent prompt + history + tool schema + Discord/thread context
So the model can feel fast in LM Studio, but slow once OpenClaw turns it into an agent workload.
The two best things to try are:
1) Enable lean local mode
openclaw config set agents.defaults.experimental.localModelLean true
openclaw gateway restart
That trims heavyweight default tools and reduces prompt bloat.
2) Disable tool schemas for the LM Studio model
In your LM Studio model entry, set:
compat: {
supportsTools: false
}
Then restart the gateway.
If you want, paste your ~/.openclaw/agents/main/agent/models.json and I’ll show the exact edit.
Quick sanity check
Run:
openclaw infer model run --model lmstudio/huihui-qwen3.5-9b-claude-4.6-opus-abliterated --prompt "hi" --json
If that is already much slower than LM Studio chat, then the slowdown is mostly prompt/runtime load, not Discord itself.
Short version: if you want LM Studio-chat speed, OpenClaw usually needs either:
- a stronger local model / more hardware, or
- a cloud model for the agentic stuff
Docs:
- https://docs.openclaw.ai/gateway/local-models
- https://docs.openclaw.ai/concepts/experimental-features
If this helps, please right-click the solution message → Apps → ✅ Mark Solution so Answer Overflow can index it.
Ill run this
openclaw infer model run --model lmstudio/huihui-qwen3.5-9b-claude-4.6-opus-abliterated --prompt "hi" --json