#Subject: Gemini API 429 Rate Limits Despite Being at 22% of Quota


radiant grove
#

Hey @krill - hitting persistent ⚠️ API rate limit reached errors on Gemini despite being at ~22% of Tier 1 TPM quota. Theory: retry logic creating TPS bursts that violate per-second limits.

Setup:

  • Google Cloud Tier 1: 1M TPM, 1K RPM, 10K RPD
  • Actual usage: 225k TPM (22%), 4 RPM (0.4%), 22 RPD (0.2%)
  • Models: Both preview (gemini-3-flash-preview) and stable (gemini-2.5-flash)

What We Tried:

  1. Fresh Google Cloud project + new API key
  2. Switched preview → stable
  3. Result: both fail with 429s immediately

Evidence:

  • AI Studio graphs: 4 requests over 4 min, 258k tokens total [attached]
  • Gateway logs: error=⚠️ API rate limit reached.
  • Pattern: First request after idle hits 429 instantly, retries fail

Code Findings:

  • BASE_RUN_RETRY_ITERATIONS=24, MIN=32, MAX=160
  • retryAsync() has exponential backoff (300ms * 2^i)
  • Theory: Run loop (32-160) × API retries (3-5) = TPS burst
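Taking the retryAsync() formula above at face value (a 300 ms base doubled on each retry, no jitter; the real implementation may differ), here is a quick sketch of how long the inner retries actually wait. The function names are hypothetical.

```typescript
// Delay before retry `attempt` (0-based), assuming the bare formula quoted above: 300ms * 2^i, no jitter.
function backoffDelayMs(attempt: number, baseMs = 300): number {
  return baseMs * 2 ** attempt;
}

// Full schedule for `retries` retries, plus the cumulative time spent waiting.
function backoffSchedule(retries: number, baseMs = 300): { delays: number[]; totalMs: number } {
  const delays: number[] = [];
  for (let i = 0; i < retries; i++) {
    delays.push(backoffDelayMs(i, baseMs));
  }
  const totalMs = delays.reduce((sum, d) => sum + d, 0);
  return { delays, totalMs };
}

// The "3-5 API retries" range from the theory above.
console.log(backoffSchedule(3)); // delays 300, 600, 1200 ms; totalMs 2100
console.log(backoffSchedule(5)); // delays 300 ... 4800 ms; totalMs 9300
```

Under this assumption even five retries wait about 9.3 s in total, so the inner retries alone are spread out rather than packed into a single second.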

The Issue:

  • 4 requests (~25k each = 100k tokens)
  • If those hit within the same second, might trip undocumented TPS limits
  • Dashboard shows the 60-second average is fine, but per-second enforcement may be stricter
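To make the theory concrete, the sketch below sums tokens inside a trailing window and compares the 60-second view against a per-second cap pro-rated from 1M TPM (1,000,000 / 60 ≈ 16,666 tokens/s). That per-second cap is purely hypothetical: Google does not document one, and it only models the shape of the theory being tested.

```typescript
interface RequestEvent {
  atMs: number; // when the request was sent (ms)
  tokens: number; // tokens charged to the request
}

// Sum the tokens of requests sent within the trailing window (nowMs - windowMs, nowMs].
function tokensInWindow(events: RequestEvent[], nowMs: number, windowMs: number): number {
  return events
    .filter((e) => e.atMs > nowMs - windowMs && e.atMs <= nowMs)
    .reduce((sum, e) => sum + e.tokens, 0);
}

const TPM_LIMIT = 1_000_000; // Tier 1 TPM from the setup above
// HYPOTHETICAL: the TPM limit pro-rated per second. Google does not document such a cap.
const HYPOTHETICAL_PER_SECOND_CAP = Math.floor(TPM_LIMIT / 60);

// Four ~25k-token requests fired within 300 ms of each other.
const burst: RequestEvent[] = [0, 100, 200, 300].map((atMs) => ({ atMs, tokens: 25_000 }));
const now = 300;

const perMinute = tokensInWindow(burst, now, 60_000); // 10% of TPM_LIMIT
const perSecond = tokensInWindow(burst, now, 1_000); // same 100k, roughly 6x the hypothetical cap
console.log({
  perMinute,
  perSecond,
  minuteWindowOk: perMinute <= TPM_LIMIT,
  secondWindowOk: perSecond <= HYPOTHETICAL_PER_SECOND_CAP,
});
```

The dashboard only ever shows the 60-second view, which is why a per-second rule (if one existed) would not be visible there.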

Questions:

  1. Can we tune run loop retry count or add delays?
  2. Parallel requests on model switch causing bursts?
  3. Known Gemini TPS vs TPM issues?

Currently on Claude (works fine). Happy to provide more logs/config.

slim deltaBOT
#

To help others find answers, you can mark your question as solved by right-clicking the solution message -> Apps -> ✅ Mark Solution

hearty heraldBOT
#

That ⚠️ API rate limit reached string is OpenClaw’s generic rewrite for any error that looks like 429 / quota / RESOURCE_EXHAUSTED (so it’s not Gemini-specific, and it can also be masking the actual quota dimension you’re tripping).

First: your “run loop × retries = TPS burst” theory is probably not the culprit

Those BASE_RUN_RETRY_ITERATIONS / MIN / MAX constants are just a hard cap on the outer run loop (src/agents/pi-embedded-runner/run.ts). In the common “Gemini returned a 429” case, OpenClaw typically does:

  • 1 model attempt (plus whatever retries the provider SDK does internally)
  • then fails over (if you configured fallbacks) or returns the error (if you didn’t)

It does not spin 32–160 model calls unless you’re in a weird recovery loop (context overflow compaction, thinking-level fallback, rotating through many auth profiles, etc.).

What’s much more likely with Gemini

Gemini/Generative Language will return 429s for multiple independent limits, and some of them are not the ones people look at in the console:

  • ...PerUser... limits (often effectively “per IP” for API-key usage)
  • “tokens-per-minute” evaluated using (input tokens + max output tokens) as the reservation
  • model/region-specific throttles that don’t show up nicely as “you used 22% in the last 60s”

With OpenClaw, it’s also very easy for a “small” chat turn to be a huge request (system prompt + tool schemas + chat history). If the raw 429 mentions something like GenerateContentTokensPerMinutePerUser, a single ~60k-token prompt can get you “instant 429 after idle” even at low RPM.

Q1) “Can we tune run loop retry count or add delays?”

Not via config today. Those values are currently hard-coded in run.ts, and the inner HTTP retry logic you found (retryAsync(...)) lives lower in the stack (pi-ai / provider client). So: no supported knob right now.

If we want to improve this properly, the right fix is to honor Retry-After / retryDelay from Google (or add jittered backoff on 429) before re-attempting. That’s a code change.
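A minimal sketch of that fix (not OpenClaw's code; all names and the 30 s cap are hypothetical): prefer Google's retryDelay from a RetryInfo entry in the error body, then a Retry-After header, and only otherwise fall back to "full jitter" exponential backoff using the 300 ms base mentioned earlier.

```typescript
// Parse a protobuf Duration as rendered in JSON (e.g. "17s", "0.5s") into milliseconds.
function parseDurationMs(value: string): number | undefined {
  const match = /^(\d+(?:\.\d+)?)s$/.exec(value.trim());
  return match ? Math.round(parseFloat(match[1]) * 1000) : undefined;
}

// Parse an HTTP Retry-After header: either delta-seconds or an HTTP-date.
function parseRetryAfterMs(header: string, nowMs: number): number | undefined {
  const trimmed = header.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  const dateMs = Date.parse(trimmed);
  return Number.isNaN(dateMs) ? undefined : Math.max(0, dateMs - nowMs);
}

interface RetryHints {
  retryDelay?: string; // from a google.rpc.RetryInfo entry in error.details, if present
  retryAfter?: string; // raw Retry-After header, if present
}

interface BackoffOptions {
  baseMs: number;
  capMs: number;
  nowMs: number;
  random: () => number; // injectable for testing
}

// Server hint first; otherwise full jitter: uniform in [0, min(cap, base * 2^attempt)).
function nextDelayMs(attempt: number, hints: RetryHints, options: Partial<BackoffOptions> = {}): number {
  const { baseMs = 300, capMs = 30_000, nowMs = Date.now(), random = Math.random } = options;
  const fromBody = hints.retryDelay ? parseDurationMs(hints.retryDelay) : undefined;
  if (fromBody !== undefined) return fromBody;
  const fromHeader = hints.retryAfter ? parseRetryAfterMs(hints.retryAfter, nowMs) : undefined;
  if (fromHeader !== undefined) return fromHeader;
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

console.log(nextDelayMs(0, { retryDelay: "17s" })); // server hint wins: 17000
console.log(nextDelayMs(2, {})); // no hint: random value in [0, 1200)
```

Honoring the hint avoids retrying inside a window Google has already said is closed, and the jitter keeps several concurrent runs from retrying in lockstep.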

Q2) “Parallel requests on model switch causing bursts?”

OpenClaw’s model attempts are sequential, not parallel, for a single incoming message. The only way you get “bursty” is if you have multiple concurrent runs (multiple chats, cron/heartbeat, multiple agents) or the provider client retries quickly.
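A sketch of what "sequential" means here, with hypothetical names (this is not OpenClaw's run.ts): each fallback is awaited inside the loop, so at most one request per incoming message is in flight at any time.

```typescript
type ModelCall = (model: string) => Promise<string>;

// Try each model in order. Because every call is awaited inside the loop,
// the next attempt starts only after the previous one has settled.
async function runWithFallbacks(models: string[], call: ModelCall): Promise<string> {
  let lastError: unknown = new Error("no models configured");
  for (const model of models) {
    try {
      return await call(model);
    } catch (err) {
      lastError = err; // e.g. a 429: remember it and move on to the next fallback
    }
  }
  throw lastError;
}

// Demo with a fake provider: google/* always "429s", anything else succeeds.
let inFlight = 0;
let peakInFlight = 0;
const fakeCall: ModelCall = async (model) => {
  inFlight++;
  peakInFlight = Math.max(peakInFlight, inFlight);
  await new Promise<void>((resolve) => setTimeout(() => resolve(), 10));
  inFlight--;
  if (model.startsWith("google/")) throw new Error(`429 RESOURCE_EXHAUSTED from ${model}`);
  return `ok from ${model}`;
};

// "anthropic/some-claude-model" is a placeholder fallback ref.
runWithFallbacks(["google/gemini-3-flash-preview", "anthropic/some-claude-model"], fakeCall).then((text) => {
  console.log(text, "| peak concurrent requests:", peakInFlight);
});
```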

Q3) “Known Gemini TPS vs TPM issues?”

Yes: Gemini can 429 you while TPM/RPM averages look fine because the actual enforced metric might be different (per-user/per-IP, token reservation, per-model throttles). The only way to know is the raw 429 payload, which usually spells it out.
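The raw payload is easier to read with a small parser. This sketch pulls the quota metric and retry delay out of error.details[]. It assumes Google's standard error model (google.rpc.QuotaFailure violations carrying quotaMetric / quotaId, google.rpc.RetryInfo carrying retryDelay); those field names are our understanding of that model, not something confirmed in this thread, and the function name is hypothetical.

```typescript
interface GoogleErrorDetail {
  "@type"?: string;
  retryDelay?: string; // RetryInfo, e.g. "17s"
  violations?: Array<{ quotaMetric?: string; quotaId?: string; description?: string }>; // QuotaFailure
}

interface GoogleErrorBody {
  error?: { code?: number; message?: string; status?: string; details?: GoogleErrorDetail[] };
}

interface QuotaDiagnosis {
  status?: string;
  hasDetails: boolean;
  quotaIds: string[];
  quotaMetrics: string[];
  retryDelay?: string;
}

// Extract whatever quota information the 429 body actually carries.
function diagnose429(raw: string): QuotaDiagnosis {
  const body = JSON.parse(raw) as GoogleErrorBody;
  const details = body.error?.details ?? [];
  const result: QuotaDiagnosis = {
    status: body.error?.status,
    hasDetails: details.length > 0,
    quotaIds: [],
    quotaMetrics: [],
  };
  for (const detail of details) {
    const type = detail["@type"] ?? "";
    if (type.endsWith("google.rpc.QuotaFailure")) {
      for (const v of detail.violations ?? []) {
        if (v.quotaId) result.quotaIds.push(v.quotaId);
        if (v.quotaMetric) result.quotaMetrics.push(v.quotaMetric);
      }
    } else if (type.endsWith("google.rpc.RetryInfo") && detail.retryDelay) {
      result.retryDelay = detail.retryDelay;
    }
  }
  return result;
}

// The generic payload reported later in this thread: no details[], so nothing more to surface.
const fromThread = diagnose429(
  '{"error":{"code":429,"message":"Resource has been exhausted (e.g. check quota).","status":"RESOURCE_EXHAUSTED"}}',
);
console.log(fromThread);
```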


What I need from you to nail this (minimal, super actionable)

  1. Version (important):
    • openclaw --version
  2. Raw error text (not the rewritten “⚠️ API rate limit reached”):
    Run one failing prompt, then paste the relevant lines from:

    • openclaw logs --follow --json (or just openclaw status --all, which includes a safe tail)

    We’re looking for the Google message that typically contains:

    • the quota metric name (...TokensPerMinutePerUser, ...RequestsPerMinute..., etc.)
    • sometimes a suggested retry delay
  3. Which provider you’re actually using + model ref:

    • openclaw models status --json (you can redact anything that looks secret; it should not print raw keys anyway)

Once we see the metric name in the 429, we can tell you whether the fix is:

  • shrink the request (new session / reduce toolset / reduce history),
  • change provider path (Vertex vs API-key Gemini),
  • or treat it as an OpenClaw bug (missing Retry-After handling) and send you to file a GitHub issue with the raw payload.
radiant grove
#

Hey @krill - following up on the Gemini rate limit diagnostics:

What We Have

  • ✅ openclaw --version: 2026.2.24
  • ✅ openclaw models status --json: Full output captured
  • âś… Confirmed 429s trigger consistently on google/gemini-3-flash-preview at session start (3 retries over ~17 seconds)

Problem

Can't capture the raw Google API error. Even openclaw logs --follow --json shows the emoji-rewritten version:

error=⚠️ API rate limit reached. Please try again later.

The rewriting happens before the logging layer captures it, so we can't see the actual quota metric name (GenerateContentTokensPerMinutePerUser or whatever Google returned).

Question

Is there a way to access the raw provider response deeper in the stack? For example:

  • Debug flag that logs provider responses before rewriting?
  • Direct access to the error object before translation?
  • Alternative log output that preserves the original error?

We need that raw error to confirm whether it's the per-user token reservation issue you mentioned.

hearty heraldBOT
#

Yep — in 2026.2.24 there isn’t a “log the raw provider 429” flag in the normal gateway logs, because the lifecycle logger intentionally formats errors via formatAssistantErrorText(...) before writing error=... (that’s exactly why you only see ⚠️ API rate limit reached...).

The good news: the raw error is usually still stored — just not in the standard log line

Two practical ways to get it without patching code:

Option A (best): enable cache trace (JSONL) and extract errorMessage

Cache trace records the full “session after” snapshot, including the assistant message object (which typically still contains the original errorMessage).

  1. Enable it (env vars or config)

Env vars (fastest):

export OPENCLAW_CACHE_TRACE=1
export OPENCLAW_CACHE_TRACE_PROMPT=0
export OPENCLAW_CACHE_TRACE_SYSTEM=0
export OPENCLAW_CACHE_TRACE_MESSAGES=1
openclaw gateway restart

(If you run as a service, put those in ~/.openclaw/.env (or ~/.openclaw-<profile>/.env) and restart.)

  2. Reproduce one failing Gemini run.

  3. Extract the raw error:

jq -r '
  select(.stage=="session:after")
  | .messages[]?
  | select(.role=="assistant" and .stopReason=="error")
  | .errorMessage
' ~/.openclaw/logs/cache-trace.jsonl | tail -n 1

If that prints something like 429 ... GenerateContentTokensPerMinutePerUser ..., that’s the quota metric we need.

Important: don’t paste the whole cache-trace file into Discord — even with prompt/system disabled, message history can still contain sensitive content. Just paste the single extracted error string (redact request IDs if you want).

Option B: look in the session transcript for errorMessage / openclaw:prompt-error

If the 429 happened during prompt submission (thrown error), OpenClaw also persists a custom entry:

  • type: openclaw:prompt-error
  • includes error: describeUnknownError(promptError)

So grepping your session JSONL for prompt-error or errorMessage can work too — but cache-trace is usually simpler because it’s already centralized under ~/.openclaw/logs/.
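If you'd rather script the grep, this sketch scans JSONL text for any errorMessage field and for openclaw:prompt-error entries. The session file's exact nesting isn't shown in this thread, so it walks every object recursively instead of assuming a schema; it also assumes the prompt-error's error field is a string. The function name is hypothetical; feed it the file contents however you like (e.g. fs.readFileSync).

```typescript
// Collect every errorMessage string, plus the `error` of openclaw:prompt-error entries, from JSONL text.
function collectErrors(jsonl: string): string[] {
  const found: string[] = [];
  const walk = (node: unknown): void => {
    if (Array.isArray(node)) {
      node.forEach(walk);
    } else if (node !== null && typeof node === "object") {
      const obj = node as Record<string, unknown>;
      if (typeof obj.errorMessage === "string") found.push(obj.errorMessage);
      if (obj.type === "openclaw:prompt-error" && typeof obj.error === "string") found.push(obj.error);
      Object.values(obj).forEach(walk); // recurse without assuming where messages are nested
    }
  };
  for (const line of jsonl.split("\n")) {
    if (!line.trim()) continue;
    try {
      walk(JSON.parse(line));
    } catch {
      // skip partially written or non-JSON lines
    }
  }
  return found;
}
```

Paste only the extracted strings, not the surrounding entries, for the same privacy reason as with cache-trace.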


If cache-trace still only shows the emoji-rewritten text

Then the raw Google payload is getting collapsed before it becomes lastAssistant.errorMessage (likely inside the provider/SDK). At that point, the only way to go deeper is a code change (e.g. log the original thrown error object / response body at trace level before rewriting).

If you hit that, paste:

  • the extracted errorMessage (even if generic)
  • plus one session:after event’s provider/modelId/modelApi fields

…and I’d call it “needs a GitHub issue” so we can add a safe “raw error debug” path.
radiant grove
#

✅ Raw error JSON: {"error":{"code":429,"message":"Resource has been exhausted (e.g. check quota).","status":"RESOURCE_EXHAUSTED"}}
❌ Provider/model fields not found in cache-trace session:after events
Context: google/gemini-3-flash-preview, consistent 429s at session start with ~200k context
Note: the cache-trace didn't contain provider/modelId/modelApi fields in the structure we queried. Can you point us to where else to look, or confirm it needs a code change to expose that info?

hearty heraldBOT
#

Nice — that raw JSON payload is already the “real thing” (it’s coming from the provider), but yeah it’s annoyingly generic and doesn’t include the quota metric name.

1) Cache-trace does have provider/model fields — they’re on the event, not inside .messages[]

In cache-trace.jsonl, provider, modelId, modelApi, sessionId, etc. are top-level fields on each trace line (they’re set when the cache-trace is created in run/attempt.ts).

Try this to print the last session:after with those fields:

jq -r '
  select(.stage=="session:after")
  | "\(.ts) provider=\(.provider) modelId=\(.modelId) modelApi=\(.modelApi) sessionId=\(.sessionId)"
' ~/.openclaw/logs/cache-trace.jsonl | tail -n 5

And if you want the assistant errorMessage from the same event:

jq -r '
  select(.stage=="session:after")
  | . as $e
  | ($e.messages // [])
  | map(select(.role=="assistant" and .stopReason=="error"))
  | last?
  | [$e.provider, $e.modelId, $e.modelApi, .errorMessage] | @tsv
' ~/.openclaw/logs/cache-trace.jsonl | tail -n 1

If those top-level fields are literally missing, double-check you’re reading the right state dir (profiles write to ~/.openclaw-<profile>/logs/cache-trace.jsonl, not ~/.openclaw/...).
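The path rule above (together with the .env location mentioned earlier) as a tiny helper, so it's clear which files a given profile reads. The function name, home directory, and profile name are placeholders.

```typescript
// State-dir rule from the thread: ~/.openclaw by default, ~/.openclaw-<profile> for profiles.
function openclawPaths(homeDir: string, profile?: string): { envFile: string; cacheTrace: string } {
  const stateDir = profile ? `${homeDir}/.openclaw-${profile}` : `${homeDir}/.openclaw`;
  return {
    envFile: `${stateDir}/.env`,
    cacheTrace: `${stateDir}/logs/cache-trace.jsonl`,
  };
}

console.log(openclawPaths("/home/alice")); // default state dir
console.log(openclawPaths("/home/alice", "work")); // profile "work" (placeholder name)
```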

2) Best next move: use the built-in probe path (often preserves more detail)

Run a tiny live probe against Google auth:

openclaw models status --probe --probe-provider google --probe-max-tokens 16 --json

Why this helps:

  • It makes a minimal request (so it’s a clean test: if this still 429s, your 200k-context “token reservation” theory is probably not the trigger).
  • models status is also the place where OpenClaw sometimes surfaces provider usage headers / probe metadata that you won’t see in the lifecycle “friendly error” rewrite.

3) If probe + cache-trace still only shows RESOURCE_EXHAUSTED with no metric

Then yeah: we’d need a code change to log deeper info (HTTP headers / original error object) before it gets collapsed into that minimal JSON.

At that point I’d file a GitHub issue and include:

  • openclaw --version (you already have 2026.2.24)
  • the openclaw models status --probe ... --json output (redacted)
  • the raw error JSON you pasted

because the actionable fix would be “add a safe debug flag to log raw Google 429 details (quota metric + retry delay)”.

radiant grove
#

Hey @krill - completed the full diagnostic run:

Raw Error Captured (via cache-trace)

{
  "error": {
    "code": 429,
    "message": "Resource has been exhausted (e.g. check quota).",
    "status": "RESOURCE_EXHAUSTED"
  }
}

Key finding: Generic RESOURCE_EXHAUSTED with no details array containing the specific quota metric (GenerateContentTokensPerMinutePerUser, etc.) that you expected.

Context

  • Version: 2026.2.24
  • Model: google/gemini-3-flash-preview
  • Pattern: Consistent 429s with large context (~200k tokens), 3-4 retry attempts over ~17 seconds
  • Small sessions: Work fine (no rate limit with minimal context)

Probe Result

openclaw models status --probe --probe-provider google --probe-max-tokens 16 --json showed "invalid API key" error in logs, but regular Gemini sessions work fine with the same key from .env.

Possible issue: Probe might not inherit gateway env vars? Regular sessions authenticate successfully, so the key is valid.

Conclusion

The behaviour strongly supports your token reservation theory - large context sessions (input + max_output tokens) exhaust quota instantly, while small requests succeed. Google's returning a generic error without the detailed quota metric.

hearty heraldBOT
#

That’s a solid capture — and that exact JSON shape (just code/message/status with no details[]) means we genuinely don’t have the quota metric name available at the OpenClaw layer. So the next “actionable” step is: (a) extract what OpenClaw did send (esp. maxTokens), and (b) mitigate by reducing reservation/context.

1) Cache-trace does include provider/model — it’s on the trace line, not inside the error JSON

In ~/.openclaw/logs/cache-trace.jsonl, provider, modelId, modelApi are top-level fields on each event.

Try:

jq -r '
  select(.stage=="session:after")
  | "\(.ts) provider=\(.provider) modelId=\(.modelId) modelApi=\(.modelApi) sessionId=\(.sessionId)"
' ~/.openclaw/logs/cache-trace.jsonl | tail -n 10

Also very useful: the stream:context stage records the options passed into the provider call (this is where maxTokens lives):

jq -r '
  select(.stage=="stream:context")
  | {ts, provider, modelId, modelApi, options}
' ~/.openclaw/logs/cache-trace.jsonl | tail -n 3

If you paste just the options object for a failing call, we can confirm whether you’re hitting “token reservation” (input + maxTokens) vs something else.

2) Why models status --probe said “invalid API key” while sessions work

openclaw models status --probe ... runs in the CLI process, using your current shell env + local config, not the gateway service’s environment.

Common causes:

  • The gateway service has GEMINI_API_KEY, but your interactive shell doesn’t (systemd/launchd env split).
  • You ran the probe on a different machine than the gateway.
  • API key is IP-restricted, so probing from a different host/IP fails.

Fix: put the key in the canonical place the CLI + service both load:

  • ~/.openclaw/.env (or ~/.openclaw-<profile>/.env if you use profiles), then openclaw gateway restart.
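To check the env split without echoing the secret, you can parse the service's .env file and report only whether GEMINI_API_KEY is set. This is a simplified parser (KEY=VALUE lines, optional export prefix, # comments, surrounding quotes), not a full dotenv implementation, and the function names are hypothetical.

```typescript
// Minimal .env reader for a sanity check; not a complete dotenv implementation.
function parseDotEnv(text: string): Map<string, string> {
  const vars = new Map<string, string>();
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (!line || line.startsWith("#")) continue; // blank lines and comments
    const match = /^(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*)$/.exec(line);
    if (!match) continue;
    let value = match[2].trim();
    const quoted =
      (value.startsWith('"') && value.endsWith('"')) || (value.startsWith("'") && value.endsWith("'"));
    if (quoted) value = value.slice(1, -1);
    vars.set(match[1], value);
  }
  return vars;
}

// Report whether a key is present and non-empty, without ever returning the secret itself.
function describeKey(vars: Map<string, string>, name: string): string {
  const value = vars.get(name);
  return value ? `${name}: set (${value.length} chars)` : `${name}: missing or empty`;
}
```

Run it against the contents of the .env file on the gateway host, then compare with what your shell sees.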

3) Practical mitigation: cap Gemini output tokens (reduces reservation)

OpenClaw supports per-model params that flow into the stream call (see agents.defaults.models[modelKey].params.maxTokens).

Example:

{
  agents: {
    defaults: {
      models: {
        "google/gemini-3-flash-preview": { params: { maxTokens: 512 } },
        "google/gemini-2.5-flash": { params: { maxTokens: 512 } },
      },
    },
  },
}

Restart gateway, then retry the same “huge context” session.

Interpretation:

  • If big-context sessions start working once maxTokens is small → your reservation theory is basically confirmed.
  • If they still 429 instantly → it’s more likely Google is treating this as a request-size / per-user token bucket issue that returns generic RESOURCE_EXHAUSTED.

4) Do we need a code change to expose more detail?

If Google isn’t sending details[], the only remaining “more detail” is likely in HTTP headers or a richer error object that the underlying SDK currently collapses.

So: yes, to make this debuggable in the future we’d want an opt-in debug path that logs (safely) the raw response headers / structured error before it gets rewritten. That’s GitHub-issue territory.


If you want, I can give you an issue template snippet with exactly what to attach (cache-trace stream:context.options, plus the raw error JSON, plus version).

radiant grove
#

before we capture cache-trace again - we tried the maxTokens mitigation and still hit the 429 error

hearty heraldBOT
#

If capping maxTokens didn’t change anything, that basically rules out the “(input + max output) reservation” angle as the primary trigger.

What it does fit really well is: your input context alone is blowing a Gemini quota bucket (TPM-per-user / per-minute bucket / burst bucket), so even maxTokens=128 still sends ~200k tokens and still trips RESOURCE_EXHAUSTED.
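Assuming the reservation model from earlier (input + max output, which Google doesn't document in this form), the arithmetic shows why capping output barely moves a ~200k-token request. The 8,192 starting cap is illustrative; the thread doesn't show what OpenClaw sent before the change, and the function names are hypothetical.

```typescript
// Hypothetical reservation model from earlier in the thread: input tokens + max output tokens.
function reservedTokens(inputTokens: number, maxOutputTokens: number): number {
  return inputTokens + maxOutputTokens;
}

// Fraction of the reservation removed by lowering maxTokens from `before` to `after`.
function reservationSavings(inputTokens: number, before: number, after: number): number {
  const was = reservedTokens(inputTokens, before);
  return (was - reservedTokens(inputTokens, after)) / was;
}

// ~200k-token prompt, lowering an illustrative 8,192-token output cap to 128.
console.log(reservedTokens(200_000, 128)); // 200128
console.log(reservationSavings(200_000, 8_192, 128).toFixed(3)); // about 4% of the reservation
```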

Next things that actually move the needle (input size), in order

  1. Run /context detail in the failing session
    This will show you how much you’re burning just on:
    • system prompt + project context
    • tool schema JSON (can be huge)
    • skills prompt

If tool schemas / skills are a big chunk, you can shave tens of thousands of tokens without touching your user text.

  2. Quick A/B test: remove tool schema overhead for Google only
    As a test (not forever), restrict tools when using the google provider so the schema payload shrinks:
{
  tools: {
    byProvider: {
      "google": { profile: "minimal" } // or "messaging"/"coding" depending what you need
    }
  }
}

If that makes the big session stop 429’ing, then the “200k context” you’re measuring includes lots of tool/skill/system overhead, not just chat history.

  3. Reduce history going into the prompt
    If the 200k is mostly accumulated conversation, the only reliable fixes are:
    • /compact (then continue from the summary)
    • /new and re-introduce only what’s needed (or keep big material in files and have the agent read excerpts)

Re: probe showing “invalid API key”

That’s almost always “CLI env != gateway env”. The gateway service can see ~/.openclaw/.env, but your shell running openclaw models status --probe ... might not (or you’re running it on a different machine/profile). Put the key in ~/.openclaw/.env (or the correct ~/.openclaw-<profile>/.env) and rerun the probe on the gateway host.

If you paste the top of /context detail (system prompt chars, tool schemas chars, skills prompt chars) I can tell you which of the above will give the biggest win fastest.
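/context detail reports characters, so a rough conversion helps rank what to cut first. This sketch uses the roughly 4 characters per token rule of thumb Google gives for Gemini; tool-schema JSON may tokenize differently, and the component names and numbers in the example are illustrative, not taken from this thread.

```typescript
interface ContextBreakdown {
  [part: string]: number; // characters per prompt component
}

interface TokenRow {
  part: string;
  tokens: number;
  share: number; // fraction of total characters
}

// Rough estimate with the ~4 chars/token rule of thumb, largest component first.
function estimateTokens(breakdown: ContextBreakdown, charsPerToken = 4): TokenRow[] {
  const entries = Object.entries(breakdown);
  const totalChars = entries.reduce((sum, [, chars]) => sum + chars, 0);
  return entries
    .map(([part, chars]) => ({
      part,
      tokens: Math.round(chars / charsPerToken),
      share: totalChars > 0 ? chars / totalChars : 0,
    }))
    .sort((a, b) => b.tokens - a.tokens);
}

// Illustrative numbers only.
const example = estimateTokens({ systemPrompt: 24_000, toolSchemas: 48_000, skillsPrompt: 8_000, history: 8_000 });
for (const row of example) {
  console.log(`${row.part}: ~${row.tokens} tokens (${(row.share * 100).toFixed(0)}%)`);
}
```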

radiant grove
#

just before we try that - I'd point out we are experiencing the same 429 with a 22k context. We attempted with new sessions, so it's not necessarily the token burden.

hearty heraldBOT
#

If you can reproduce the same 429 at ~22k total prompt, then yeah — that pretty much kills the “token reservation (input + maxTokens)” theory as the main cause.

At that point the two most common explanations are:

  1. You’re looking at the wrong quota dashboard for the endpoint you’re actually using
    • In OpenClaw, provider: "google" + GEMINI_API_KEY is the Gemini Developer API path (AI Studio / Generative Language).
    • A lot of people check Vertex AI quotas (GCP console) and assume they apply; they don’t.
    • So even if you think you have “Tier 1 1M TPM”, the endpoint you’re hitting may have different (or not-yet-granted) limits.
  2. It’s not a token limit at all: it’s per-IP / trust / project state throttling
    Some datacenter/VPS egress IPs get throttled harder. In that case, prompt size doesn’t matter much: even small-ish prompts can 429 consistently.

The fastest way to disambiguate (no more cache-trace needed)

Do one minimal direct API call from the same machine that runs the gateway (same egress IP), using the same key:

curl -sS -D - \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=$GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"contents":[{"role":"user","parts":[{"text":"ping"}]}]}'

  • If this 429s too → it’s not OpenClaw overhead. It’s quota/account/IP/project-side. Next test is: run the same curl from your laptop/home network. If home works but VPS doesn’t, it’s almost certainly IP-based throttling.
  • If curl works but OpenClaw 429s → then OpenClaw is still sending a much larger payload than you think (tool schemas/skills/system prompt), and /context detail + restricting tools for google will matter.
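The two branches above written out as a lookup (hypothetical helper, labels paraphrase the bullets). The case where both succeed isn't covered in the thread, so it just reports that nothing reproduced.

```typescript
type Outcome = "ok" | "429";

// Minimal curl from the gateway host vs. the same key through OpenClaw.
function whichBranch(minimalCurl: Outcome, viaOpenClaw: Outcome): string {
  if (minimalCurl === "429") {
    return "quota/account/IP/project-side: rerun the same curl from another network";
  }
  if (viaOpenClaw === "429") {
    return "payload size: run /context detail and restrict tools for the google provider";
  }
  return "no 429 reproduced";
}

console.log(whichBranch("ok", "429"));
```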

Also: the --probe “invalid API key” mismatch is expected

openclaw models status --probe ... runs in your CLI environment, not the gateway service env. If the gateway gets the key from ~/.openclaw/.env but your shell doesn’t, probe will fail while the gateway works.

Put GEMINI_API_KEY=... into ~/.openclaw/.env (or ~/.openclaw-<profile>/.env) and rerun the probe on the gateway host.

If the minimal curl 429s

I’d stop chasing retries/backoff inside OpenClaw and instead:

  • try a different egress IP / region (or run the gateway somewhere else temporarily)
  • or switch to Vertex auth (google-vertex via gcloud ADC) if you specifically need GCP quota behavior

If you paste the result of that curl -D - ... (just status + headers + error JSON; redact the key), we’ll know immediately which branch you’re on.

radiant grove
#

no, it's definitely the correct Gemini API key (AI Studio / Generative Language key), not Vertex quotas - I can watch the live graphs.
we are running on a single local machine at home.
It only started happening yesterday - we've been using Gemini 3 Flash for a few weeks. We'd obviously hit the rate limit occasionally, but only on the 60-second sliding TPM window, and it would clear.

hearty heraldBOT
#

If it’s the same 429 even with ~22k context + fresh /new sessions, and it only started yesterday after working for weeks, that strongly points to this being upstream Gemini API behavior (quota/entitlement/capacity), not OpenClaw retry loops or token reservation.

A couple things that match your symptoms:

Fastest confirmation (rules out OpenClaw entirely)

Run a single minimal generateContent call from the same machine/key (no tools, no history):

curl -sS -D - \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=$GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"contents":[{"role":"user","parts":[{"text":"ping"}]}]}'

  • If that also 429s → it’s 100% Gemini API-side (account/project entitlement or capacity), and the best next step is posting in those forum threads / opening Google support with your project number + timestamps.
  • If that succeeds while OpenClaw fails → then we go back to /context detail + tool/schema trimming (because OpenClaw is still sending more than you think).

Re: “probe says invalid API key”

That’s usually just CLI env != gateway env (service has the key in ~/.openclaw/.env, your shell doesn’t). It’s annoying but separate from the 429 issue.

If you paste the headers + body from that curl (redact the key), we’ll know immediately whether you’re dealing with a platform incident vs an OpenClaw payload size problem.

radiant grove
#

I think we've gone as far as we can go for now.

Generic error with no details[] quota metric - nothing for OpenClaw to surface beyond this.

Critical Timing Evidence

  • Worked fine for weeks until yesterday (2026-02-24)
  • Same config/payload suddenly started 429-ing
  • Matches timing of widespread Gemini 429 reports in Google forums (last 24h)

Diagnostic Tests

  1. Minimal curl "ping" (2 tokens, no tools): ✅ Success
  2. Fresh OpenClaw session (~22k context + tools): ❌ 429
  3. Large OpenClaw session (~200k context + tools): ❌ 429
  4. Mitigation attempt (maxTokens: 512): ❌ No effect, still 429s

Configuration Notes

  • Version: 2026.2.24
  • Gateway correctly uses the openclaw.json key pulled from op
  • Probe failed with "invalid key" because it used shell env (not gateway env) - separate issue

Conclusion

The curl test proves the account isn't blocked (minimal requests succeed), but any realistic agent payload (context + tools) triggers instant 429s. Combined with:

  • Sudden onset after weeks of working fine
  • Timing matches Google platform reports
  • Generic error (no actionable quota metric)

Likely cause: Google tightened limits on free/Tier 1 accounts or regional capacity issue affecting function calling workloads.

hearty heraldBOT
#

Yeah — that’s a clean stopping point, and your conclusion is the same one I’d land on given the evidence.

What you’ve proven:

  • The key/account/project is basically fine (minimal curl succeeds).
  • The failure correlates with “real agent-shaped” payloads (history + tool schemas / function calling).
  • The error blob Google returns is intentionally non-diagnostic (RESOURCE_EXHAUSTED with no details[]), so OpenClaw can’t surface a quota metric even if it wanted to.
  • Timing lines up with broader reports → very likely a Google-side policy/capacity/tier change or regression affecting larger structured requests.

Two pragmatic next moves (when you feel like poking again)

  1. Submit a Google forum/support report with the strongest reproducer

    • Minimal curl succeeds (include that)
    • Slightly larger request fails (include the smallest failing payload you can produce)
    • Include timestamps + region + model + “function calling/tool schemas present”
  2. Workaround inside OpenClaw: reduce tool schema load for Google
    If you want a “make it usable” band-aid until Google fixes it, try running Gemini in a minimal-tools mode (and keep Claude for tool-heavy work). Concretely:

    • set tools.byProvider.google.profile = "minimal" (or an allowlist that excludes the heavy schema tools)
    • or route Gemini to a separate agent configured with fewer tools/skills

That’s the only lever OpenClaw really has without Google exposing better quota telemetry.

If you do end up filing something, paste the link here and I’ll help you turn your diagnostics into a tight issue report (so it doesn’t get dismissed as “check quota”).

radiant grove
#

thanks for the help - do you want me to flag this as solved?

hearty heraldBOT
#

Yes please — if you can.

If this thread helped you reach a conclusion/workaround, please right-click my solution message → Apps → ✅ Mark Solution (Answer Overflow will index it and it helps the next person who hits the same “Gemini 429 RESOURCE_EXHAUSTED w/ no details[]” situation).