#model hitting rate limit quite frequently


daring zodiac
#

using gemini pro model (gemini 3.1 pro preview)

balmy geyserBOT
#

To help others find answers, you can mark your question as solved via Right click solution message -> Apps -> ✅ Mark Solution

steel rapidsBOT
#

If it’s google/gemini-3.1-pro-preview, frequent rate limits honestly wouldn’t surprise me; preview Gemini models can be pretty spiky.

First thing, can you paste:

openclaw --version
openclaw models status

And also:

  • the exact error text / 429 message
  • whether you’re using google (API key) or google-gemini-cli (OAuth)


One relevant detail from the docs: OpenClaw can rotate across multiple provider keys on rate-limit responses, so if you’re on API keys and have more than one, that can help. But let’s get your version + exact error first so I don’t guess.

daring zodiac
#

user1@user1:~/.openclaw$ openclaw --version
openclaw models status
OpenClaw 2026.4.15 (041266a)
Config : ~/.openclaw/openclaw.json
Agent dir : ~/.openclaw/agents/main/agent
Default : google/gemini-3.1-pro-preview
Fallbacks (0) : -
Image model : -
Image fallbacks (0): -
Aliases (1) : gemini -> google/gemini-3.1-pro-preview
Configured models (2): openai-codex/gpt-5.4, google/gemini-3.1-pro-preview

Auth overview
Auth store : ~/.openclaw/agents/main/agent/auth-profiles.json
Shell env : off
Providers w/ OAuth/tokens (1): openai-codex (2)

  • google effective=profiles:~/.openclaw/agents/main/agent/auth-profiles.json | profiles=1 (oauth=0, token=0, api_key=1) | google:default=AIzaSyBU...05_lQudM
  • openai-codex effective=profiles:~/.openclaw/agents/main/agent/auth-profiles.json | profiles=2 (oauth=2, token=0, api_key=0) | openai-codex:default=OAuth, openai-codex:xxxxx@gmail.com=OAuth (xxxxxx@gmail.com)

OAuth/token status

steel rapidsBOT
#

Nice, that narrows it down a lot.

What your status says:

  • your version looks recent enough, so this does not smell like an old-version bug
  • your primary is google/gemini-3.1-pro-preview
  • you have one Google API key/profile
  • you have zero model fallbacks

So when Gemini rate-limits, OpenClaw has:

  1. no second Google key to rotate to, and
  2. no fallback model to jump to

One important gotcha: Configured models (2) is not the same thing as fallbacks.
So having openai-codex/gpt-5.4 configured does not make it automatic failover.

Easiest fix


Since you already have Codex auth configured:

openclaw models fallbacks add openai-codex/gpt-5.4

Then OpenClaw can fail over when Google rate-limits.

If you want to stay on Gemini

If you have multiple Gemini keys, OpenClaw can rotate them on rate-limit responses. Docs-backed env vars are:

  • GEMINI_API_KEYS
  • GEMINI_API_KEY_1, GEMINI_API_KEY_2, etc.
  • OPENCLAW_LIVE_GEMINI_KEY for a single live override
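
To make the rotation idea concrete, here is a minimal Python sketch of what "rotate to the next key on a rate-limit response" means. It is not OpenClaw's actual code: RateLimited, call_with_key_rotation, and fake_send are hypothetical names made up for illustration, and the keys are passed in as a plain list instead of being read from the env vars above.

```python
class RateLimited(Exception):
    """Hypothetical error raised when a key gets a 429 response."""


def call_with_key_rotation(keys, send):
    """Try each key in order; move to the next key only on a rate limit."""
    last_error = None
    for key in keys:
        try:
            return send(key)
        except RateLimited as err:
            last_error = err  # this key is throttled, try the next one
    raise RateLimited("all keys are rate-limited") from last_error


def fake_send(key):
    # Stand-in for a real API call: pretend the first key is throttled.
    if key == "key-1":
        raise RateLimited("429 for key-1")
    return f"ok via {key}"


print(call_with_key_rotation(["key-1", "key-2"], fake_send))
```

With a single key, the loop has nowhere to go and the rate limit surfaces straight away, which matches the one-profile situation in the status output.
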

My read

This is probably mostly an upstream quota / preview-model issue, plus the fact that your setup currently has no safety net.

daring zodiac
#

can i add claude as well as a fallback?

steel rapidsBOT
#

Yep, absolutely.

OpenClaw can use multiple fallbacks, and it tries them in order.

So the shape is basically:

agents: {
  defaults: {
    model: {
      primary: "google/gemini-3.1-pro-preview",
      fallbacks: [
        "openai-codex/gpt-5.4",
        "anthropic/<your-claude-model>",
      ],
    },
  },
}

Or via CLI:

openclaw models fallbacks add openai-codex/gpt-5.4
openclaw models fallbacks add anthropic/<your-claude-model>
openclaw models fallbacks list

Two important bits:

  • you need Anthropic auth configured first
  • fallbacks are tried in the order you set them
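
If it helps to picture "tried in order", here is a rough Python sketch of an ordered failover chain. This is only an illustration under my own assumptions, not how OpenClaw implements it: run_with_fallbacks, fake_call, and RateLimited are hypothetical names, and the only failure handled is a rate limit.

```python
class RateLimited(Exception):
    """Hypothetical error standing in for a provider 429 response."""


def run_with_fallbacks(primary, fallbacks, call_model):
    """Try the primary first, then each fallback in the configured order."""
    skipped = []
    for model in [primary, *fallbacks]:
        try:
            return model, call_model(model)
        except RateLimited:
            skipped.append(model)  # rate-limited, move to the next candidate
    raise RateLimited(f"all candidates rate-limited: {skipped}")


def fake_call(model):
    # Stand-in for a real request: pretend every Google model is throttled.
    if model.startswith("google/"):
        raise RateLimited(model)
    return "hello"


used, reply = run_with_fallbacks(
    "google/gemini-3.1-pro-preview",
    ["openai-codex/gpt-5.4", "anthropic/<your-claude-model>"],
    fake_call,
)
print(used, reply)
```

In this sketch the Claude entry is never reached, because the Codex fallback sits earlier in the list and answers first.
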
daring zodiac
#

i believe claude recently banned using its subscription with openclaw?

steel rapidsBOT
#

Not according to the current docs.

OpenClaw’s docs explicitly say:

  • Claude CLI reuse / Claude subscription auth is allowed again
  • OpenClaw treats claude -p style usage as sanctioned unless Anthropic publishes a new policy

So the current docs-backed answer is no, it’s not banned right now.

That said, there’s an important nuance:

  • Claude CLI / subscription: supported
  • Anthropic API key: still the safer / more predictable path for long-lived or production setups

So if you want Claude only as a fallback, you can do it either way, but if you want the least surprising setup, I’d use an Anthropic API key.


daring zodiac
#

does gemini have different limits for pro, flash, and flash lite? do we need to configure all of them separately? does openclaw have the intelligence to use lighter models for lighter tasks and heavier models for complex tasks? i mean, can we configure several of them and rely on openclaw to pick between them based on the query?

steel rapidsBOT
#

Short version:

  • Different Gemini tiers probably do have different upstream limits, yes, but that’s mostly a Google-side quota question
  • OpenClaw does not currently do “this prompt looks easy, use Flash Lite; this one looks hard, use Pro” routing by default
  • What OpenClaw does do well is:
    • primary model
    • fallbacks in order
    • auth/key rotation inside a provider
    • separate modality routing like imageModel, pdfModel, etc.

So if you configure:

model: {
  primary: "google/gemini-3.1-pro-preview",
  fallbacks: [
    "google/<flash-model>",
    "google/<flash-lite-model>",
    "openai-codex/gpt-5.4",
  ],
}

OpenClaw will use those as a failover chain, not as a “smart complexity router”.

So, do you need to configure them separately?

Only if you want them available.

A good practical pattern is:

  • Primary: the model you want most of the time
  • Fallback 1: a cheaper/faster same-provider fallback
  • Fallback 2: a different provider in case Google is rate-limiting hard

What OpenClaw is smart about today

More like:

  • “primary failed/rate-limited, try next”
  • “this tool needs image/pdf support, use the configured image/pdf route”

Not really:

  • “this user asked a simple question, downgrade automatically”


If you want, I can suggest a clean Gemini Pro + Gemini Flash + cross-provider fallback order for your setup.

daring zodiac
#

separate modality routing like imageModel, pdfModel, etc.?

steel rapidsBOT
#

Yep, exactly, it means routing by task/input type, not by “how hard the prompt feels”.

For example:

  • model = normal chat/default text model
  • imageModel = used for image understanding / vision
  • pdfModel = used by the pdf tool
  • imageGenerationModel = used for creating/editing images
  • musicGenerationModel = used for music generation
  • videoGenerationModel = used for video generation

A few useful details from the docs:

  • imageModel is also used when the selected/default model can’t accept image input
  • pdfModel is used by the pdf tool, and if you don’t set it, it falls back to:
    1. imageModel
    2. then the normal default model
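
The pdfModel fallback order above can be sketched in a few lines of Python. Treat this as an illustration of the lookup order only, not OpenClaw's real resolver: resolve_pdf_model is a hypothetical helper, and the assumption that each entry is either a plain string or an object with a primary field is mine.

```python
def resolve_pdf_model(defaults):
    """Pick the model for the pdf tool: pdfModel, then imageModel, then model."""
    for key in ("pdfModel", "imageModel", "model"):
        value = defaults.get(key)
        if not value:
            continue  # not configured, fall through to the next option
        # Assumed shapes: a plain string, or an object with a "primary" field.
        return value["primary"] if isinstance(value, dict) else value
    return None


defaults = {
    "model": {"primary": "google/gemini-3.1-pro-preview"},
    "imageModel": "google/<vision-capable-model>",
}
# pdfModel is not set here, so the lookup lands on imageModel.
print(resolve_pdf_model(defaults))
```
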

So a setup can look like:

agents: {
  defaults: {
    model: {
      primary: "google/gemini-3.1-pro-preview",
      fallbacks: ["openai-codex/gpt-5.4"],
    },
    imageModel: "google/<vision-capable-model>",
    pdfModel: "google/<pdf-friendly-model>",
  },
}

That’s smart modality routing.

daring zodiac
#

If you want, I can sketch a practical config for:
text chat
image understanding
PDF reading
failover when Gemini rate-limits

yes.. and configure some good models and failovers for all categories

steel rapidsBOT
#

Yep, I’d go with a balanced setup like this.

I wouldn’t treat these exact model picks as eternal truth, because model quality/limits churn fast, but this is a docs-compatible starting point for your case:

{
  agents: {
    defaults: {
      // Normal chat
      model: {
        primary: "google/gemini-3.1-pro-preview",
        fallbacks: [
          "google/gemini-3-flash-preview",
          "openai-codex/gpt-5.4",
        ],
      },

      // Image understanding / vision
      imageModel: {
        primary: "google/gemini-3-flash-preview",
        fallbacks: [
          "google/gemini-3.1-pro-preview",
          "openai-codex/gpt-5.4",
        ],
      },

      // PDF analysis
      pdfModel: {
        primary: "google/gemini-3-flash-preview",
        fallbacks: [
          "google/gemini-3.1-pro-preview",
          "openai-codex/gpt-5.4",
        ],
      },

      // Optional, only set these if you actually use those tools a lot:
      // imageGenerationModel: { primary: "google/gemini-3.1-flash-image-preview" },
      // musicGenerationModel: { primary: "google/lyria-3-clip-preview" },
      // videoGenerationModel: { primary: "google/veo-3.1-fast-generate-preview" },
    },
  },
}

Why this layout

  • Gemini Pro for your main heavy chat/reasoning
  • Gemini Flash as the first fallback, so simple/image/PDF work doesn’t burn the expensive tier first
  • Codex as the cross-provider escape hatch when Google is being annoying

Important nuance

If Google is rate-limiting at the account/key level, switching from Pro to Flash may still hit limits.
That’s why having one non-Google fallback is smart.

My recommendation

For your setup, I’d start with:

  • text: Gemini Pro -> Gemini Flash -> Codex
  • image: Gemini Flash -> Gemini Pro -> Codex
  • pdf: Gemini Flash -> Gemini Pro -> Codex

If you add Claude later

Then PDF is where Claude is often a nice extra fallback. I’d put it after the Google models or as the main non-Google fallback, depending on your billing/preferences.


daring zodiac
#

for pdf and image, you said gemini? i mean, you don't just have to stick to gemini and codex

steel rapidsBOT
#

Yep, totally, you’re not limited to Gemini + Codex.

I only used those because your earlier models status showed those were the providers you already had auth for.

A better way to think about it:

  • model = best general chat model
  • imageModel = whichever provider/model you want for vision
  • pdfModel = whichever provider/model you want for PDFs

And for PDFs specifically, the docs call out that Anthropic and Google get native PDF mode. Other providers can still work, but via extraction fallback.

So a mixed setup is absolutely valid, for example:

  • text chat → Gemini
  • image understanding → Anthropic or Google
  • PDF reading → Anthropic or Google
  • fallback if Google is rate-limiting → Codex or Anthropic

So yes, if you add Anthropic, a very sensible layout is:

{
  agents: {
    defaults: {
      model: {
        primary: "google/gemini-3.1-pro-preview",
        fallbacks: [
          "anthropic/<your-claude-model>",
          "openai-codex/gpt-5.4",
        ],
      },
      imageModel: {
        primary: "anthropic/<your-claude-model>",
        fallbacks: [
          "google/<your-gemini-vision-model>",
          "openai-codex/gpt-5.4",
        ],
      },
      pdfModel: {
        primary: "anthropic/<your-claude-model>",
        fallbacks: [
          "google/<your-gemini-pdf-model>",
          "openai-codex/gpt-5.4",
        ],
      },
    },
  },
}

daring zodiac
#

gemini quota is there but even for small prompt, i keep getting
⚠️ Rate-limited — ready in ~18s. Please wait a moment.

steel rapidsBOT
#

Yep, that usually means short-window rate limit / cooldown, not “you have zero quota left”.

So even a tiny prompt can still hit it if Google is throttling that model/profile right now.

OpenClaw is basically telling you:

  • this model/profile is temporarily cooling down
  • the soonest retry it knows is about 18s
  • it’s not necessarily about prompt size

A couple important bits from the failover docs:

  • rate-limit cooldowns can be model-scoped
  • a sibling model on the same provider can still work
  • if every candidate is blocked, OpenClaw surfaces the “ready in ~18s” message
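
Here is a small Python sketch of that behaviour, in case it makes the message easier to read. It is a guess at the general shape, not OpenClaw's implementation: pick_candidate is a hypothetical function, cooldowns are tracked per model name, the timestamps are plain seconds, and the candidate list is assumed to be non-empty.

```python
def pick_candidate(candidates, cooldown_until, now):
    """Return (model, 0.0) for the first ready model, else (None, seconds to wait)."""
    for model in candidates:
        # A model with no recorded cooldown is treated as ready.
        if cooldown_until.get(model, 0.0) <= now:
            return model, 0.0
    # Every candidate is cooling down: report the soonest retry time.
    soonest = min(cooldown_until[model] for model in candidates)
    return None, soonest - now


now = 100.0
cooldowns = {"pro": 118.0, "flash": 130.0}

# Both candidates are blocked, so the caller would show "ready in ~18s".
print(pick_candidate(["pro", "flash"], cooldowns, now))

# A sibling model without a cooldown can still be used right away.
print(pick_candidate(["pro", "flash-lite"], cooldowns, now))
```

With only one candidate in the list, any cooldown on it produces the wait message immediately, however small the prompt is.
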

So in your case, the practical fix is: