How do I increase the LLM HTTP request timeout for Ollama?
Running gemma4:e4b locally. It responds fine via direct Ollama API (~11s for simple prompts) but OpenClaw times out at exactly 60 seconds. agents.defaults.timeoutSeconds: 180 didn't help — that appears to be the agent turn timeout, not the LLM request timeout. Is there a config key to raise the per-request HTTP timeout for local Ollama calls?
Yep — you’re hitting the LLM idle stream timeout (default is 60s), not agents.defaults.timeoutSeconds.
Use this key:
openclaw config set agents.defaults.llm.idleTimeoutSeconds 180
openclaw gateway restart
Or in config:
{
  agents: {
    defaults: {
      llm: {
        idleTimeoutSeconds: 180
      }
    }
  }
}
Notes:
- agents.defaults.timeoutSeconds = overall agent run timeout.
- agents.defaults.llm.idleTimeoutSeconds = max silence between streamed model chunks.
- 0 disables the idle timeout entirely.
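To illustrate the difference between the two timeouts, here is a minimal Python sketch of an idle timeout check. It is not OpenClaw's actual implementation; the names enforce_idle_timeout and IdleTimeout are hypothetical, and the check here runs after a chunk arrives, whereas a real client would interrupt the wait itself. The point is that a slow first token can trip a 60s idle limit even when the overall run budget is 180s.

```python
import time
from typing import Callable, Iterable, Iterator


class IdleTimeout(Exception):
    """Raised when the wait for the next chunk exceeds the idle limit."""


def enforce_idle_timeout(
    chunks: Iterable[str],
    idle_timeout_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[str]:
    """Yield chunks, failing if any single wait exceeds idle_timeout_s.

    Hypothetical helper for illustration only. A limit of 0 disables
    the check, mirroring the note above.
    """
    last = clock()
    for chunk in chunks:
        now = clock()
        gap = now - last
        if idle_timeout_s > 0 and gap > idle_timeout_s:
            raise IdleTimeout(
                f"no chunk for {gap:.0f}s (limit {idle_timeout_s:.0f}s)"
            )
        last = now
        yield chunk
```

With a first chunk arriving after 70s, a 60s limit raises IdleTimeout while a 180s limit (or 0) lets the stream through, regardless of how long the whole run is allowed to take.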
If it still dies at 60s, please share:
openclaw --version
openclaw logs --follow
and check for LLM idle timeout (60s): no response from model.
Docs: https://docs.openclaw.ai/gateway/configuration and https://docs.openclaw.ai/providers/ollama
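To choose a sensible idleTimeoutSeconds value, it can help to measure how long Ollama itself stays silent between streamed chunks. The sketch below is a rough measurement script, assuming Ollama is listening on its default port 11434 and using its streaming /api/generate endpoint; the helper names longest_gap and stream_ollama are made up for this example, and the model name is the one from the question.

```python
import json
import time
import urllib.request
from typing import Callable, Iterable, Iterator, Tuple


def longest_gap(
    lines: Iterable[bytes],
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[float, int]:
    """Return (longest silence in seconds, number of chunks seen).

    The first gap is the time to first token, measured from the call.
    """
    last = clock()
    worst = 0.0
    count = 0
    for raw in lines:
        if not raw.strip():
            continue  # skip blank keep-alive lines
        now = clock()
        worst = max(worst, now - last)
        last = now
        count += 1
        if json.loads(raw).get("done"):
            break  # Ollama marks the final chunk with "done": true
    return worst, count


def stream_ollama(
    prompt: str,
    model: str = "gemma4:e4b",
    host: str = "http://localhost:11434",  # assumed default Ollama address
) -> Iterator[bytes]:
    """Yield newline-delimited JSON chunks from Ollama's /api/generate."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        yield from resp


# Example usage (needs a running Ollama server):
# worst, n = longest_gap(stream_ollama("Say hello in one sentence."))
# print(f"longest silence: {worst:.1f}s over {n} chunks")
```

If the longest silence is close to or above 60s for the prompts you care about, the default idle timeout will cut the request off, and the configured value should sit comfortably above what you measure.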
Is it fairly normal for a local model to take this long searching the web for a simple item? I’m only running on a Mac mini 16gpu.
Yeah — for web search tasks with a local model, that can be totally normal.
Even for a “simple” question, OpenClaw usually does multiple steps (decide to search → call web tool(s) → read snippets → synthesize), so it’s not just one fast completion. Local models are often much slower on this tool-using loop than on plain chat.
Quick tuning tips:
- Keep agents.defaults.llm.idleTimeoutSeconds higher (like 180+) so long first-token delays don’t get cut off.
- Use a smaller/faster local model for tool-heavy tasks.
- Keep context/token budgets modest (large context can slow local inference a lot).
- For fast/reliable web+tools, cloud models are usually much better today.
If you want, paste:
openclaw --version
openclaw status --all
and I can suggest tighter settings for your exact setup.
Model guidance docs: https://docs.openclaw.ai/concepts/models
Community notes: https://docs.openclaw.ai/concepts/models#model-discussions-community-notes
Also since you’re asking about “what’s normal in practice,” it’s worth asking in #users-helping-users too for real-world hardware comparisons.
So you’re suggesting switching from the 4B to the 2B for things like a simple web search, correct? Is that what you’re insinuating?
Not insinuating — just describing the speed/quality tradeoff.
Short version:
- 2B: usually faster, but weaker tool-use/reasoning quality.
- 4B: usually slower, but more reliable answers.
So yes, for speed-first simple lookups, a smaller model can help.
But with your setup, I’d do this order first:
- Keep 4B and set agents.defaults.llm.idleTimeoutSeconds: 180
- Test again.
- If still too slow, try 2B for those tasks.
That way you’re changing one thing at a time and can see what actually improves latency.
You are amazing. Thank you so much.
Hell yeah, glad it helped! 🦐
If this fixed it, please right-click the solution message → Apps → ✅ Mark Solution.
That helps Answer Overflow index it so the next person finds it fast.
@low lynx May I know your specs? I am currently on an i9 12900 with 32 GB RAM and no GPU.
You’re correct: agents.defaults.timeoutSeconds is the agent turn timeout, not the model HTTP/stream idle timeout.
For the 60s cutoff, use:
{
  agents: {
    defaults: {
      llm: {
        idleTimeoutSeconds: 180 // or 0 to disable idle timeout
      }
    }
  }
}
CLI equivalent: