Ollama 70B Crashes OpenClaw on first prompt? | Friends of the Crustacean 🦞🤝 | Page 1

That symptom lines up with OpenClaw driving Ollama very differently than ollama run:

Ollama streaming is disabled by default in OpenClaw, so you won’t see tokens dribbling in; you only get the reply when the whole generation finishes. (So a long generation looks like a “hang”.)
If you enabled Ollama auto‑discovery, OpenClaw will often set a very large maxTokens (docs say it defaults to 10× the context window). For an 8k-context model that works out to ~80k tokens of generation budget, which can absolutely peg GPU/RAM and “never” finish on a 70B model.

Docs: https://docs.openclaw.ai/providers/ollama (see Model discovery + Streaming Configuration)
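
To put numbers on that budget, here is a rough back-of-the-envelope sketch. The 10× multiplier is the figure the docs give. The 2 tokens/second throughput is purely my assumption for a 70B model on modest hardware, so plug in whatever ollama run actually gives you. The function names are just for illustration.

```python
def default_max_tokens(context_window: int, multiplier: int = 10) -> int:
    """Generation budget if maxTokens defaults to multiplier x context window."""
    return context_window * multiplier


def worst_case_hours(max_tokens: int, tokens_per_second: float) -> float:
    """Hours needed if the model generates until it exhausts the whole budget."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return max_tokens / tokens_per_second / 3600


budget = default_max_tokens(8192)  # 8k context window
print(budget)  # 81920
# Assumed throughput of 2 tok/s (hypothetical; measure your own).
print(round(worst_case_hours(budget, 2.0), 1))  # 11.4
```

With streaming off, a run like that shows nothing until it ends, which is indistinguishable from a hang.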

What I’d try first (fastest)

A. Check what OpenClaw thinks the limits are
Run:

openclaw --version
openclaw models status (or in TUI: /model status)

You’re looking for your ollama/... model showing a huge maxTokens and streaming: false.

B. Cap maxTokens for that Ollama model
The easiest fix is to override the model’s request params with something sane (e.g. 512–2048) so it can’t run for ages.

In ~/.openclaw/openclaw.json (JSON5), add an override like:

{
  agents: {
    defaults: {
      models: {
        "ollama/<your-model-id-here>": {
          params: { maxTokens: 1024 },
        },
      },
    },
  },
}

(Replace <your-model-id-here> with the model ID exactly as /model status shows it, e.g. llama3.3:70b.)
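
If you want to confirm the cap helps before touching OpenClaw, you can hit Ollama directly with a non-streaming, capped request. This is a minimal sketch against Ollama's own /api/generate endpoint on its default local port. Whether OpenClaw maps maxTokens onto Ollama's num_predict option is my assumption, and the helper names here are made up for the example.

```python
import json
import urllib.request

# Ollama's default local address; adjust if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str, max_tokens: int) -> dict:
    """Build a non-streaming generate request with a capped output length."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # reply arrives only when generation finishes
        "options": {"num_predict": max_tokens},  # cap on generated tokens
    }


def generate(payload: dict, url: str = OLLAMA_URL, timeout: float = 600.0) -> dict:
    """POST the payload to Ollama and return the decoded JSON reply."""
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Calling generate(build_request("llama3.3:70b", "Say hi in one sentence.", 64)) and printing the "response" field should come back in a reasonable time. If it does, but the same call with a much larger cap stalls, the oversized budget is the likely culprit rather than OpenClaw itself.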
