#Gemini models keep timing out


reef edge
#

Hey Krill

After upgrading openclaw to a recent version (2026.5.6), I started getting a lot of (decision=surface_error reason=timeout) errors for Gemini models. Other model providers like MoonshotAI kimi 2.5 seem to work fine. Gemini also works sometimes, but it has been triggering this error a lot recently.

What could be the issue? Did openclaw change something? Could it be that gemini models are slower? This affects both cron jobs and normal chat.

dreamy dockBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

distant basinBOT
#

I checked the current docs/source. The likely culprit is the model idle/request timeout path, not Discord/cron specifically.

Current OpenClaw has an LLM idle watchdog: if the model stream produces no response chunks before the idle window, the run is aborted. The default is 120s for cloud providers. Gemini can be especially vulnerable here because it may spend a long time thinking / waiting before first token, so it can “work sometimes” and still randomly hit timeout under load or with larger prompts.
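A minimal sketch of how a stream idle watchdog of this kind can behave. It is not OpenClaw's implementation: `watch_stream`, `fake_model`, and `IdleTimeout` are hypothetical names, the durations are scaled down from the 120s window, and restarting the timer on every chunk is an assumption.

```python
import asyncio
from typing import AsyncIterator


class IdleTimeout(Exception):
    """Raised when the stream stays silent longer than the idle window."""


async def watch_stream(stream: AsyncIterator[str], idle_seconds: float) -> list[str]:
    """Collect chunks, aborting if any single wait exceeds idle_seconds."""
    chunks: list[str] = []
    iterator = stream.__aiter__()
    while True:
        try:
            # The timer restarts for every chunk, so only silence counts.
            chunk = await asyncio.wait_for(iterator.__anext__(), timeout=idle_seconds)
        except StopAsyncIteration:
            return chunks
        except asyncio.TimeoutError:
            raise IdleTimeout(f"no chunk within {idle_seconds}s") from None
        chunks.append(chunk)


async def fake_model(first_token_delay: float) -> AsyncIterator[str]:
    """Hypothetical model that 'thinks' before emitting its first token."""
    await asyncio.sleep(first_token_delay)
    yield "hello"
    yield " world"


async def main() -> None:
    # A fast first token passes; a slow one trips the watchdog.
    print(await watch_stream(fake_model(0.0), idle_seconds=0.2))
    try:
        await watch_stream(fake_model(0.5), idle_seconds=0.05)
    except IdleTimeout as exc:
        print("aborted:", exc)


if __name__ == "__main__":
    asyncio.run(main())
```

The second call fails even though the model would eventually answer, which matches the "works sometimes" pattern: the result depends only on whether the first chunk arrives inside the window.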

The knob to try is provider timeout, not just the outer agent timeout:

{
  models: {
    providers: {
      google: {
        timeoutSeconds: 300
      }
    }
  }
}
#

If you’re using a different Gemini provider id, set that one instead, e.g. google-vertex or google-gemini-cli.

Then restart the gateway and test again. For cron jobs, also check the cron job’s own timeoutSeconds; the provider timeout helps the model request, while the cron timeout is the outer deadline.

Useful checks:

openclaw status --all
openclaw logs --follow

If logs show LLM idle timeout (120s): no response from model, that confirms it. If it still times out at 300s, I’d file it as a GitHub issue with the model id, provider id, and a redacted log snippet: https://github.com/openclaw/openclaw/issues

Docs: https://docs.openclaw.ai/concepts/agent-loop#timeouts and https://docs.openclaw.ai/concepts/model-providers#google-gemini-api-key

reef edge
#

I get these errors,

Gateway failed to start: Error: Invalid config at /home/node/.openclaw/openclaw.json.
models.providers.google.baseUrl: Invalid input: expected string, received undefined
models.providers.google.models: Invalid input: expected array, received undefined

I didn't have a google provider in the config before. I think it was using the defaults.

I use gemini-3-flash-preview, gemini-3.1-flash-lite-preview, and gemini-2.5-flash-lite.

distant basinBOT
#

Yep — that error makes sense. models.providers.google is a full explicit provider override, not a partial patch. Once you add that block, OpenClaw expects at least baseUrl and models too.
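As a rough illustration, the two checks named in the gateway error can be reproduced against a provider block before restarting. This sketch covers only those two fields and is not OpenClaw's actual schema; `missing_provider_fields` is a hypothetical helper.

```python
def missing_provider_fields(provider: dict) -> list[str]:
    """Return problems for the two fields named in the gateway error."""
    problems = []
    if not isinstance(provider.get("baseUrl"), str):
        problems.append("baseUrl: expected string")
    if not isinstance(provider.get("models"), list):
        problems.append("models: expected array")
    return problems


# The partial block that broke startup: only a timeout, nothing else.
partial = {"timeoutSeconds": 300}

# A block that carries both required fields.
complete = {
    "baseUrl": "https://generativelanguage.googleapis.com/v1beta",
    "timeoutSeconds": 300,
    "models": [{"id": "gemini-3-flash-preview", "name": "Gemini 3 Flash Preview"}],
}

print(missing_provider_fields(partial))
print(missing_provider_fields(complete))
```

The partial block reports both problems, and the complete one reports none.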

First, remove the incomplete block so the gateway can start again. If you want to keep using the built-in Google/Gemini catalog, you normally don’t need models.providers.google at all.

If you do want to override the timeout for the built-in google provider, use a complete block like this:

{
  models: {
    mode: "merge",
    providers: {
      google: {
        baseUrl: "https://generativelanguage.googleapis.com/v1beta",
        api: "google-generative-ai",
        timeoutSeconds: 300,
        models: [
          { id: "gemini-3-flash-preview", name: "Gemini 3 Flash Preview" },
          { id: "gemini-3.1-flash-lite-preview", name: "Gemini 3.1 Flash Lite Preview" },
          { id: "gemini-2.5-flash-lite", name: "Gemini 2.5 Flash Lite" }
        ]
      }
    }
  }
}

If your key is already in GEMINI_API_KEY, that should be enough. If not, add:

apiKey: "${GEMINI_API_KEY}"

inside the google block.

Important: because this explicit block defines the visible Google model list, include every Google model you actually use. If you leave one out, model selection/capability checks may not see it.

Then restart the gateway and look for the exact timeout line:

openclaw logs --follow

If the old error was LLM idle timeout (120s), this should move it to 300s. Docs refs: https://docs.openclaw.ai/concepts/model-providers#providers-via-modelsproviders-custombase-url and https://docs.openclaw.ai/concepts/agent-loop#timeouts

reef edge
#

Hey Krill, I wanted to give you better logs to help identify the issue. I reproduced it and here are the logs. I will send them over multiple messages (Discord message limit) and will let you know when I am done.

2026-05-10T06:34:41.191+00:00 [diagnostic] stalled session: sessionId=c9227751-d3a2-4730-bb11-36bf9b8747c6 sessionKey=agent:main:direct:ahmed state=processing age=123s queueDepth=1 reason=active_work_without_progress classification=stalled_agent_run activeWorkKind=model_call lastProgress=model_call:started lastProgressAge=123s recovery=none
2026-05-10T06:34:47.225+00:00 [agent/embedded] Profile google:default timed out. Trying next account...
2026-05-10T06:34:47.233+00:00 [agent/embedded] embedded run failover decision: runId=85b99138-850a-4e3c-8a15-fa240aeaac75 stage=assistant decision=fallback_model reason=timeout from=google/gemini-3-flash-preview profile=sha256:a94403a4f9d0

#

2026-05-10T06:34:47.236+00:00 [diagnostic] lane task error: lane=main durationMs=131105 error="FailoverError: LLM request timed out."
2026-05-10T06:34:47.238+00:00 [diagnostic] lane task error: lane=session:agent:main:direct:ahmed durationMs=131109 error="FailoverError: LLM request timed out."
2026-05-10T06:34:47.266+00:00 [model-fallback/decision] model fallback decision: decision=candidate_failed requested=google/gemini-3-flash-preview candidate=google/gemini-3-flash-preview reason=timeout next=google/gemini-3.1-flash-lite-preview detail=LLM request timed out.
2026-05-10T06:36:56.280+00:00 [agent/embedded] Profile google:default timed out. Trying next account...
2026-05-10T06:36:56.284+00:00 [agent/embedded] embedded run failover decision: runId=85b99138-850a-4e3c-8a15-fa240aeaac75 stage=assistant decision=fallback_model reason=timeout from=google/gemini-3.1-flash-lite-preview profile=sha256:a94403a4f9d0
2026-05-10T06:36:56.286+00:00 [diagnostic] lane task error: lane=main durationMs=128825 error="FailoverError: LLM request timed out."
2026-05-10T06:36:56.289+00:00 [diagnostic] lane task error: lane=session:agent:main:direct:ahmed durationMs=128827 error="FailoverError: LLM request timed out."
2026-05-10T06:36:56.317+00:00 [model-fallback/decision] model fallback decision: decision=candidate_failed requested=google/gemini-3-flash-preview candidate=google/gemini-3.1-flash-lite-preview reason=timeout next=moonshot/kimi-k2.5 detail=LLM request timed out.

#

Done. That's all

distant basinBOT
#

Got it — thanks. Send the rest, and I’ll wait until you say you’re done before I analyze so I don’t chase a partial log trail.

#

Received this chunk too. So far it’s consistently timing out at ~129–131s per Gemini candidate, which strongly matches the 120s-ish model request/idle timeout path plus overhead — but I’ll hold the full diagnosis until you say you’re done.

#

Thanks, this log is enough to identify the shape pretty clearly.

This is not auth, Discord, or cron-specific. The important lines are:

activeWorkKind=model_call lastProgress=model_call:started lastProgressAge=123s
...
Profile google:default timed out
...
reason=timeout from=google/gemini-3-flash-preview
...
durationMs=131105 error="FailoverError: LLM request timed out."

Then the same thing happens again for google/gemini-3.1-flash-lite-preview at ~129s.

So OpenClaw starts the Gemini model call, then receives no useful model progress/chunk before the timeout window. Around 120s later it aborts that candidate and falls back. That’s why it affects both cron and normal chat: they share the same underlying model-call path.
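For anyone who wants to pull these fields out of a longer log, here is a small sketch that parses the key=value pairs from one line. `parse_fields` is a hypothetical helper, and the regex assumes values are either bare tokens or double-quoted strings, as in the lines posted above.

```python
import re

# key=value pairs where the value is either a quoted string or a bare token
FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_fields(line: str) -> dict[str, str]:
    """Extract key=value fields from one log line into a dict."""
    return {key: value.strip('"') for key, value in FIELD.findall(line)}


line = (
    '2026-05-10T06:34:47.236+00:00 [diagnostic] lane task error: lane=main '
    'durationMs=131105 error="FailoverError: LLM request timed out."'
)
fields = parse_fields(line)

# Convert the duration to seconds so it can be compared with the timeout window.
print(fields["lane"], int(fields["durationMs"]) / 1000, fields["error"])
```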

#

So yes: in practice Gemini is being slower / quieter-before-first-token than the current timeout budget allows. The newer OpenClaw versions also log/failover this more explicitly, so you’re seeing reason=timeout and fallback decisions instead of just a mysterious stall.

Your invalid-config issue came from adding a partial models.providers.google block. For now I’d do one of these:

Option A — safest: create a separate slow Gemini provider and switch to it:

{
  models: {
    mode: "merge",
    providers: {
      "google-slow": {
        baseUrl: "https://generativelanguage.googleapis.com/v1beta",
        api: "google-generative-ai",
        apiKey: "${GEMINI_API_KEY}",
        timeoutSeconds: 300,
        models: [
          { id: "gemini-3-flash-preview", name: "Gemini 3 Flash Preview" },
          { id: "gemini-3.1-flash-lite-preview", name: "Gemini 3.1 Flash Lite Preview" },
          { id: "gemini-2.5-flash-lite", name: "Gemini 2.5 Flash Lite" }
        ]
      }
    }
  },
  agents: {
    defaults: {
      model: {
        primary: "google-slow/gemini-3-flash-preview",
        fallbacks: [
          "google-slow/gemini-3.1-flash-lite-preview",
          "google-slow/gemini-2.5-flash-lite",
          "moonshot/kimi-k2.5"
        ]
      }
    }
  }
}

Option B — override the built-in google provider, but then you must provide the full baseUrl + api + models list like above under google. Don’t leave baseUrl or models undefined.

I’d recommend Option A because it avoids accidentally replacing the built-in Google catalog.

After changing it, restart the gateway and reproduce. If the new log times out around 300s, then the config is working and Gemini is simply exceeding even that. If it still times out around 120–130s, then the timeout override isn’t being applied and that’s worth a GitHub issue with these exact log lines: https://github.com/openclaw/openclaw/issues
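The check described above can be written down as a small classifier. This is only a sketch: `classify_timeout` is a hypothetical helper, and the 15s of slack is an assumption based on the roughly 9 to 11 seconds of overhead seen in the posted durationMs values.

```python
def classify_timeout(duration_ms: int, default_s: int = 120,
                     override_s: int = 300, slack_s: int = 15) -> str:
    """Guess which timeout window an observed duration falls into."""
    seconds = duration_ms / 1000
    if default_s <= seconds <= default_s + slack_s:
        return "default window: override probably not applied"
    if override_s <= seconds <= override_s + slack_s:
        return "override window: config applied, model still too slow"
    return "unclear: compare against the exact timeout line in the logs"


for duration_ms in (131105, 128825):  # durationMs values from the log above
    print(duration_ms, "->", classify_timeout(duration_ms))
```

Both posted durations land in the default window, which is the "before" picture. After the config change, a failing run should land in the override window instead.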

reef edge
#

Is it possible to increase the default timeout of all providers?

distant basinBOT
#

Not currently as a single “all model providers request timeout” knob, from what I can see.

There are two different timeouts:

  • agents.defaults.timeoutSeconds = whole agent run timeout. Raising it does not raise the model idle watchdog past the default cap; it mainly prevents the entire agent turn from being killed.
  • models.providers.<provider>.timeoutSeconds = provider HTTP/model request timeout + stream idle watchdog. This is the one that fixes your Gemini case, but it’s per provider.

So for now you’d need to set it per provider you care about, e.g. google, moonshot, openrouter, etc. For built-in providers, that means either a complete provider override or a separate custom provider like google-slow.

I agree a global default like models.timeoutSeconds / models.providersDefaults.timeoutSeconds would be useful here. If you want that behavior, I’d file it as a feature request: https://github.com/openclaw/openclaw/issues

For your immediate case, I’d only raise Gemini. Raising every provider globally would also make bad/stuck providers take much longer before falling back.
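To put rough numbers on that trade-off, here is the worst-case arithmetic for a fallback chain. It assumes candidates are tried strictly in order and each runs to its full timeout, and it ignores the extra overhead per candidate seen in the logs. `worst_case_wait` and the 600s outer budget are hypothetical.

```python
def worst_case_wait(timeouts_s: list[int]) -> int:
    """Seconds spent if every listed candidate runs to its full timeout."""
    return sum(timeouts_s)


# Three Gemini candidates ahead of moonshot/kimi-k2.5, as in the Option A fallback list.
gemini_at_default = [120, 120, 120]
gemini_at_override = [300, 300, 300]

print(worst_case_wait(gemini_at_default))   # 360
print(worst_case_wait(gemini_at_override))  # 900

# Hypothetical outer budget (agents.defaults.timeoutSeconds): if it is smaller
# than the wait, the run may end before the last fallback is ever tried.
agent_timeout_s = 600
print(worst_case_wait(gemini_at_override) < agent_timeout_s)  # False
```

So raising the per-provider timeout to 300s can push the worst-case wait before the non-Gemini fallback from about 6 minutes to about 15, which is worth weighing against the outer agent timeout.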

reef edge
#

If I were to override the default google provider, can you provide a snippet I can copy and paste?

Also, do you have other recommendations for models with lower latency in the same price range?

distant basinBOT
#

Yep — here’s a copy/pasteable override for the built-in google provider.

Add/merge this into ~/.openclaw/openclaw.json — don’t duplicate top-level models if you already have one:

{
  models: {
    mode: "merge",
    providers: {
      google: {
        baseUrl: "https://generativelanguage.googleapis.com/v1beta",
        api: "google-generative-ai",
        apiKey: "${GEMINI_API_KEY}",
        timeoutSeconds: 300,
        models: [
          {
            id: "gemini-3-flash-preview",
            name: "Gemini 3 Flash Preview",
            input: ["text", "image"]
          },
          {
            id: "gemini-3.1-flash-lite-preview",
            name: "Gemini 3.1 Flash Lite Preview",
            input: ["text", "image"]
          },
          {
            id: "gemini-2.5-flash-lite",
            name: "Gemini 2.5 Flash Lite",
            input: ["text", "image"]
          }
        ]
      }
    }
  }
}

If you already have GEMINI_API_KEY in the gateway environment, this is fine. If your key is configured through auth profiles instead, you can remove the apiKey line.

Then restart the gateway:

openclaw gateway restart

After that, reproduce once and check whether the timeout moved from ~130s to ~300s. If it still fails at ~120–130s, the override is not being applied.