#Performance issue
Let me verify the local docs paths, then I’ll answer with the likely bottlenecks and what to measure.
Mostly because this is not the same workload.
Raw Ollama CLI is measuring near-pure model decode.
OpenClaw adds several layers on top:
- a much larger prompt: system prompt + tool schemas + session/thread history
- Discord thread history (default `channels.discord.historyLimit` is `20`)
- agent/runtime bookkeeping before the model starts answering
- Discord streaming is `off` by default, so you often see the reply only after generation finishes
On a 4 GB VRAM card, that prompt prefill cost gets brutal. So 30 tps raw can easily feel like “5 tps” end-to-end through OpenClaw.
Also, double-check this one because it matters a lot:
- if you configured Ollama as `http://...:11434/v1`, that is the OpenAI-compatible path
- for OpenClaw + Ollama, use the native Ollama API: `http://...:11434` with no `/v1`
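If it helps to check a configured URL mechanically, here is a minimal Python sketch that strips a trailing `/v1` from a base URL. The helper name `native_ollama_base_url` is made up for illustration; it is not part of OpenClaw or Ollama, and it only rewrites the string, it does not touch any config file.

```python
from urllib.parse import urlsplit, urlunsplit


def native_ollama_base_url(url: str) -> str:
    """Return `url` with a trailing /v1 path segment removed.

    Hypothetical helper for illustration only: it normalizes the string
    and assumes the native API lives at the host root.
    """
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    if path.endswith("/v1"):
        path = path[: -len("/v1")]
    # Drop query and fragment; a base URL should not carry either.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


if __name__ == "__main__":
    print(native_ollama_base_url("http://localhost:11434/v1"))
```

A URL that already has no `/v1` suffix comes back unchanged, so the helper is safe to apply unconditionally.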
Docs:
- https://docs.openclaw.ai/providers/ollama
- https://docs.openclaw.ai/gateway/local-models
- https://docs.openclaw.ai/channels/discord
Fastest isolation test:
openclaw infer model run --model ollama/<your-model> --prompt "hi" --json
If that is already much slower than ollama run, the slowdown is prompt/tool overhead.
If that is fast, but Discord feels slow, it is mostly delivery/streaming config.
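To put numbers on the raw Ollama side of that comparison, a non-streaming call to Ollama's native `/api/generate` endpoint returns timing fields (`prompt_eval_count`, `prompt_eval_duration`, `eval_count`, `eval_duration`, with durations in nanoseconds). The sketch below turns those into prefill and decode tokens per second. The function names are illustrative, and the base URL and model name you pass in are your own values, not anything assumed here.

```python
import json
import urllib.request

NS_PER_S = 1e9  # Ollama reports durations in nanoseconds


def ollama_speeds(stats: dict) -> dict:
    """Compute prefill and decode tokens/s from an /api/generate response."""

    def rate(count_key: str, duration_key: str):
        count = stats.get(count_key, 0)
        duration_ns = stats.get(duration_key, 0)
        # Fields can be missing or zero (for example when the prompt was cached).
        return count / (duration_ns / NS_PER_S) if duration_ns else None

    return {
        "prefill_tps": rate("prompt_eval_count", "prompt_eval_duration"),
        "decode_tps": rate("eval_count", "eval_duration"),
    }


def measure(base_url: str, model: str, prompt: str) -> dict:
    """Send one non-streaming request to the native API and return speeds."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return ollama_speeds(json.load(resp))
```

Calling `measure` with a one-word prompt and then with a prompt of a few thousand tokens should show whether prefill, rather than decode, is where the time goes on your card.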
Best knobs to try:
- `channels.discord.streaming: "partial"`
- lower `channels.discord.historyLimit`
- `agents.defaults.experimental.localModelLean: true`
- if you do not need tools for that model, `compat.supportsTools: false`
If you paste your Ollama provider config block, I can tell you pretty quickly which of those is biting you hardest.
Discord thread history, default channels.discord.historyLimit is 20?
What does Discord have to do with the performance of my local OpenClaw instance? I have no Discord configured as a channel. Are you hallucinating?
You’re right, that Discord bit was wrong. I mixed up the support thread we’re in with your local runtime. Sorry.
If your local OpenClaw instance is not using Discord, then Discord has nothing to do with the slowdown.
The real reason is usually this:
- `ollama run` = thin raw chat call
- OpenClaw = full agent runtime
So OpenClaw sends a much fatter prompt:
- system/developer instructions
- tool schemas
- session history
- workspace/bootstrap context
- sometimes memory/context plumbing
On a 4 GB VRAM card, that extra prompt ingestion (prefill) often hurts far more than decode speed. So raw Ollama can show ~30 tps while OpenClaw feels much slower end-to-end, especially on time to first token.
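A rough way to see why this happens: end-to-end throughput is output tokens divided by prefill time plus decode time. The Python sketch below works that out. Every number in it is an illustrative assumption (prompt sizes, a 200 tps prefill rate), not a measurement of OpenClaw or of any particular GPU; only the ~30 tps decode figure comes from this thread.

```python
def effective_tps(prompt_tokens: int, output_tokens: int,
                  prefill_tps: float, decode_tps: float) -> float:
    """Output tokens per second as seen by the user, including prefill time."""
    prefill_s = prompt_tokens / prefill_tps   # time spent ingesting the prompt
    decode_s = output_tokens / decode_tps     # time spent generating the reply
    return output_tokens / (prefill_s + decode_s)


if __name__ == "__main__":
    # Assumed values for illustration only.
    thin = effective_tps(prompt_tokens=20, output_tokens=200,
                         prefill_tps=200.0, decode_tps=30.0)
    fat = effective_tps(prompt_tokens=8000, output_tokens=200,
                        prefill_tps=200.0, decode_tps=30.0)
    print(f"thin prompt: {thin:.1f} tps")  # roughly 29.6
    print(f"fat prompt:  {fat:.1f} tps")   # roughly 4.3
```

Under these assumed numbers the decode rate never changes, yet the perceived rate drops from about 30 tps to about 4 tps purely because of the 8000-token prompt.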
A few concrete gotchas:
- If you pointed OpenClaw at Ollama’s `/v1` OpenAI-compatible endpoint, that can be worse. Native Ollama URL is better: `http://host:11434` with no `/v1`.
- First turn is often the worst because OpenClaw may load workspace/session context.
- Small local backends often struggle with OpenClaw’s full tool surface. The docs explicitly call out `agents.defaults.experimental.localModelLean: true` as a workaround.
Best isolation test:
openclaw infer model run --model ollama/<your-model> --prompt "hi" --json
That uses the normal agent runtime. If this is already much slower than ollama run, the gap is prompt/runtime overhead, not the transport.
Things worth trying:
- native Ollama API, not `/v1`
- `agents.defaults.experimental.localModelLean: true`
- if you do not need tools, `compat.supportsTools: false`
- benchmark in a minimal workspace / fresh session
Docs:
- https://docs.openclaw.ai/providers/ollama
- https://docs.openclaw.ai/gateway/local-models
- https://docs.openclaw.ai/cli/infer
If you want, paste your models.providers.ollama block and I’ll point at the most likely bottleneck instead of guessing.