# How do I increase the LLM HTTP request timeout for Ollama?

low lynx

How do I increase the LLM HTTP request timeout for Ollama?
Running gemma4:e4b locally. It responds fine via direct Ollama API (~11s for simple prompts) but OpenClaw times out at exactly 60 seconds. agents.defaults.timeoutSeconds: 180 didn't help — that appears to be the agent turn timeout, not the LLM request timeout. Is there a config key to raise the per-request HTTP timeout for local Ollama calls?

fierce siren BOT

Yep — you’re hitting the LLM idle stream timeout (default is 60s), not agents.defaults.timeoutSeconds.

Use this key:

openclaw config set agents.defaults.llm.idleTimeoutSeconds 180
openclaw gateway restart

Or in config:

{
  agents: {
    defaults: {
      llm: {
        idleTimeoutSeconds: 180
      }
    }
  }
}

Notes:

  • agents.defaults.timeoutSeconds = overall agent run timeout.
  • agents.defaults.llm.idleTimeoutSeconds = max silence between streamed model chunks.
  • 0 disables the idle timeout entirely.
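To make the difference between the two timeouts concrete, here is a minimal Python sketch. It is not OpenClaw's implementation: the function name `check_stream` and its parameters are hypothetical, and it assumes the idle clock starts when the request is sent, so the wait for the first chunk counts as silence.

```python
from typing import Iterable, Optional


def check_stream(arrival_times: Iterable[float],
                 idle_timeout: float,
                 run_timeout: float) -> Optional[str]:
    """Report which timeout, if any, a stream of chunks would trip.

    arrival_times: seconds since the request was sent, ascending.
    idle_timeout:  max allowed silence between chunks; 0 disables it.
    run_timeout:   max allowed time for the whole run.
    Returns "idle", "run", or None if every chunk arrives in time.
    """
    last = 0.0  # assumption: the request itself starts the idle clock
    for t in arrival_times:
        # The idle deadline moves forward with every chunk received.
        idle_deadline = last + idle_timeout if idle_timeout > 0 else float("inf")
        if t > min(idle_deadline, run_timeout):
            # Whichever deadline comes first is the one that fires.
            return "idle" if idle_deadline <= run_timeout else "run"
        last = t
    return None


# First chunk after 75s of silence, then steady streaming.
print(check_stream([75, 76, 77], idle_timeout=60, run_timeout=180))   # idle
print(check_stream([75, 76, 77], idle_timeout=180, run_timeout=180))  # None
```

Under these assumptions, a slow first token with a 60s idle limit fails no matter how high the overall run timeout is, which would match a cutoff at exactly 60 seconds.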

If it still dies at 60s, please share:

openclaw --version

low lynx

Is it fairly normal for a local model to take this long searching the web for a simple item? I'm only running on a Mac mini 16gpu.

fierce siren BOT

Yeah — for web search tasks with a local model, that can be totally normal.

Even for a “simple” question, OpenClaw usually does multiple steps (decide to search → call web tool(s) → read snippets → synthesize), so it’s not just one fast completion. Local models are often much slower on this tool-using loop than on plain chat.

Quick tuning tips:

  • Keep agents.defaults.llm.idleTimeoutSeconds higher (like 180+) so long first-token delays don’t get cut off.
  • Use a smaller/faster local model for tool-heavy tasks.
  • Keep context/token budgets modest (large context can slow local inference a lot).
  • For fast/reliable web+tools, cloud models are usually much better today.

If you want, paste:

openclaw --version
openclaw status --all

and I can suggest tighter settings for your exact setup.

low lynx

So you’re suggesting switching from the 4B to the 2B for things like a simple web search, correct? Is that what you’re insinuating?

fierce siren BOT

Not insinuating — just describing the speed/quality tradeoff.

Short version:

  • 2B: usually faster, but weaker tool-use/reasoning quality.
  • 4B: usually slower, but more reliable answers.

So yes, for speed-first simple lookups, a smaller model can help.
But with your setup, I’d do this order first:

  1. Keep 4B and set:
    agents.defaults.llm.idleTimeoutSeconds: 180
  2. Test again.
  3. If still too slow, try 2B for those tasks.

That way you’re changing one thing at a time and can see what actually improves latency.
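One way to pick a value for the test in step 2 is to measure how long the model actually stays silent while streaming. The sketch below is an illustration only: it assumes Ollama is listening on its default address (http://localhost:11434) and uses its streaming /api/generate endpoint, and the helper names `chunk_times` and `longest_silence` are made up for this example.

```python
import json
import time
import urllib.request


def chunk_times(model: str, prompt: str,
                url: str = "http://localhost:11434/api/generate") -> list[float]:
    """Send one streaming request to Ollama and record when each chunk arrives.

    Returns arrival times in seconds since the request was sent.
    """
    body = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode()
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    start = time.monotonic()
    times: list[float] = []
    with urllib.request.urlopen(req) as resp:
        for line in resp:  # the stream is one JSON object per line
            if not line.strip():
                continue
            times.append(time.monotonic() - start)
            if json.loads(line).get("done"):
                break
    return times


def longest_silence(times: list[float]) -> float:
    """Largest gap between chunks, counting the wait for the first one."""
    previous = 0.0
    worst = 0.0
    for t in times:
        worst = max(worst, t - previous)
        previous = t
    return worst


# Example (requires a running Ollama server):
# times = chunk_times("gemma4:e4b", "Why is the sky blue?")
# print(f"longest silence: {longest_silence(times):.1f}s")
```

A direct call like this does not include whatever system prompt and tool definitions OpenClaw sends, so the silence seen inside OpenClaw may be longer than this number. Treat the result as a lower bound when choosing idleTimeoutSeconds.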

low lynx

You are amazing. Thank you so much.

fierce siren BOT

Hell yeah, glad it helped! 🦐

If this fixed it, please right-click the solution message → Apps → ✅ Mark Solution.
That helps Answer Overflow index it so the next person finds it fast.

haughty cove

@low lynx May I know your specs? I am currently on an i9 12900 with 32 GB of RAM and no GPU.

fierce siren BOT

You’re correct: agents.defaults.timeoutSeconds is the agent turn timeout, not the model HTTP/stream idle timeout.

For the 60s cutoff, use:

{
  agents: {
    defaults: {
      llm: {
        idleTimeoutSeconds: 180 // or 0 to disable idle timeout
      }
    }
  }
}

CLI equivalent: