# Ollama 70B Crashes OpenClaw on first prompt?
That symptom lines up with OpenClaw driving Ollama very differently from `ollama run`:
- Ollama streaming is disabled by default in OpenClaw, so you won’t see tokens trickle in; you only get the reply once the whole generation finishes (so a long generation looks like a “hang”).
- If you enabled Ollama auto‑discovery, OpenClaw will often set a very large `maxTokens` (the docs say it defaults to 10× the context window). For an 8k-context model that can be ~80k tokens of generation budget, which can easily peg GPU/RAM and “never” finish on a 70B model.
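
To see why an uncapped budget reads as a hang, here is the back-of-envelope arithmetic as a small Python sketch. It assumes the 10× rule quoted from the docs and a hypothetical steady rate of 2 tokens/s for a 70B model; substitute the eval rate you actually measure with `ollama run --verbose`. Prompt processing time is ignored.

```python
# Back-of-envelope: how long could one non-streamed reply take if the model
# never emits a stop token before hitting its token budget?
# Assumption: auto-discovery sets maxTokens = 10 x context window (per the OpenClaw docs).
# Assumption: TOKENS_PER_SECOND is hypothetical; measure yours with `ollama run --verbose`.

DISCOVERY_MULTIPLIER = 10
TOKENS_PER_SECOND = 2.0  # hypothetical eval rate for a 70B model


def worst_case_budget(context_window: int, multiplier: int = DISCOVERY_MULTIPLIER) -> int:
    """Generation budget that auto-discovery would allow for a given context window."""
    return context_window * multiplier


def worst_case_hours(max_tokens: int, tokens_per_second: float = TOKENS_PER_SECOND) -> float:
    """Wall-clock hours to emit max_tokens at a steady generation rate."""
    return max_tokens / tokens_per_second / 3600


if __name__ == "__main__":
    # Compare the auto-discovered budget for an 8k model with a manual 1024 cap.
    for cap in (worst_case_budget(8192), 1024):
        print(f"maxTokens={cap}: up to {worst_case_hours(cap):.2f} h")
```

Under those assumptions an 81,920-token budget works out to over 11 hours in the worst case, while a 1024 cap bounds the same reply to under 9 minutes. Most replies stop much earlier on an end-of-sequence token; the cap only matters when the model rambles.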

Docs: https://docs.openclaw.ai/providers/ollama (see the “Model discovery” and “Streaming Configuration” sections)
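
If you want to see what the streaming setting changes, you can call Ollama's `/api/generate` endpoint directly, outside OpenClaw. With `"stream": true` Ollama sends newline-delimited JSON, one object per line, each carrying a `response` text fragment and a `done` flag, so text can be printed as it arrives; with `"stream": false` you get a single object only after generation ends, which is the “hang” described above. A minimal sketch, assuming Ollama on its default `http://localhost:11434` and using `llama3.3:70b` as an example tag:

```python
import json
import urllib.request
from typing import Iterable, Iterator

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default Ollama address
MODEL = "llama3.3:70b"  # example tag; use the one you actually pulled


def iter_stream_text(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from Ollama's newline-delimited JSON stream."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines between objects
        chunk = json.loads(raw)
        yield chunk.get("response", "")
        if chunk.get("done"):
            break  # final object: generation finished


def stream_prompt(prompt: str) -> str:
    """Send a streaming request, print fragments as they arrive, return the full text."""
    body = json.dumps({"model": MODEL, "prompt": prompt, "stream": True}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    parts = []
    with urllib.request.urlopen(req) as resp:  # iterating the response yields lines
        for text in iter_stream_text(resp):
            print(text, end="", flush=True)
            parts.append(text)
    print()
    return "".join(parts)
```

Calling `stream_prompt("Say hello in five words.")` should show text appearing incrementally; flipping `"stream"` to `False` in the same request makes you wait for the whole reply, which is what OpenClaw's default looks like from the outside.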

## What I’d try first (fastest)

### A. Check what OpenClaw thinks the limits are
Run:

```sh
openclaw --version
openclaw models status
```

(or in the TUI: `/model status`)
You’re looking for your `ollama/...` model showing a huge `maxTokens` and `streaming: false`.
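
To cross-check the number you see there, you can ask Ollama for the model's native context length. Ollama's `POST /api/show` returns a `model_info` map in which the context length sits under a key prefixed by `general.architecture` (e.g. `llama.context_length`). The sketch below assumes that response shape. Whether OpenClaw's discovery uses this native value or something smaller (Ollama's runtime context is often configured lower) isn't stated above, so treat `10 * n` as a rough comparison point, not OpenClaw's own figure.

```python
import json
import urllib.request
from typing import Optional

OLLAMA_SHOW_URL = "http://localhost:11434/api/show"  # assumed default Ollama address


def context_length_from_show(show_response: dict) -> Optional[int]:
    """Pull the native context length out of an /api/show response, if present."""
    info = show_response.get("model_info", {})
    arch = info.get("general.architecture")  # e.g. "llama"
    if arch is None:
        return None
    value = info.get(f"{arch}.context_length")  # e.g. "llama.context_length"
    return int(value) if value is not None else None


def fetch_show(model: str) -> dict:
    """POST /api/show for one model tag and return the decoded JSON body."""
    body = json.dumps({"model": model}).encode()
    req = urllib.request.Request(
        OLLAMA_SHOW_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `n = context_length_from_show(fetch_show("llama3.3:70b"))`, then compare `10 * n` with the `maxTokens` shown by `/model status`.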

### B. Cap `maxTokens` for that Ollama model

The easiest fix is to override the model’s request params with a sane cap (e.g. 512–2048 tokens) so a single reply can’t run for ages.

In `~/.openclaw/openclaw.json` (JSON5), add an override like:

```json5
{
  agents: {
    defaults: {
      models: {
        "ollama/<your-model-id-here>": {
          params: { maxTokens: 1024 },
        },
      },
    },
  },
}
```

(Replace `<your-model-id-here>` with the exact model ID that `/model status` shows, e.g. `llama3.3:70b`.)
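
To confirm the model itself finishes promptly once capped, you can send the equivalent request straight to Ollama. Ollama's native API limits generation length with `options.num_predict`; the sketch assumes OpenClaw's `maxTokens` plays the same role for its Ollama requests, which is an assumption rather than something the docs above state. It mirrors OpenClaw's non-streaming default and times the round trip.

```python
import json
import time
import urllib.request
from typing import Tuple

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default Ollama address


def capped_payload(model: str, prompt: str, max_tokens: int = 1024) -> dict:
    """Non-streaming request whose generation length is capped via num_predict."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # mirror OpenClaw's non-streaming default
        "options": {"num_predict": max_tokens},
    }


def timed_generate(
    model: str, prompt: str, max_tokens: int = 1024, timeout: float = 900.0
) -> Tuple[str, float]:
    """Send one capped request and return (reply text, elapsed seconds)."""
    body = json.dumps(capped_payload(model, prompt, max_tokens)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    start = time.monotonic()
    # timeout is the socket timeout; with stream=False no bytes arrive until
    # generation ends, so it effectively bounds how long we wait for the reply.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        reply = json.load(resp)
    return reply.get("response", ""), time.monotonic() - start
```

If `timed_generate("llama3.3:70b", "Say hello.")` comes back in reasonable time but the same prompt still stalls through OpenClaw, that points back at the OpenClaw-side limits rather than Ollama or the model.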