Gemini Flash 429s | Friends of the Crustacean 🦞🤝

green meteor Apr 7, 2026, 10:13 PM

I have 5 open claws. 3 of them are using Gemini 2.5 Flash, and I have been getting 429s.
My head claw, Eric, said the following (pasted below). Wanted your advice.
We reviewed the calls we were making, and there wasn't enough volume to trigger any sort of rate limiting, so I'm pretty sure it's something internal that we need to solve for.

Eric:
Still 429s. Same pattern — hits seq 4-5 within seconds of starting. Session hygiene didn't fix it.

My theory about context bloat was wrong, or at least not the full story.

I think there's something in OpenClaw's internal handling — it may be auto-retrying on 429s under the hood, creating a rapid burst at the framework level rather than from Andy's behavior. That would explain why it happens within seconds of every fresh session regardless of context size.

Check OpenClaw config — look at whether there's a retry setting in openclaw.json that's causing automatic rapid retries on 429 responses. That would be the actual bug to fix.

Eric:
Found something significant. Look at line 29050:

OpenClaw is using Google's context caching API (cachedContents) for Gemini agents: it tries to cache the system prompt (brain files) to reduce token costs on repeat calls. This is a separate API endpoint from the generative API. When this cache operation fails or hits its own rate limit, it sets a 10-minute backoff and then retries, potentially causing rapid-fire requests to the cache API on each new session start.

The 429s might not be from generate content at all — they could be from the prompt cache creation API, which has separate and more restrictive limits. That's why the generate content metrics look clean but we still get 429s.

This is an OpenClaw internals question — whether there's a way to disable or throttle prompt caching for Gemini agents. John, this is something worth raising in the OpenClaw Discord or with support. The behavior is in the compiled dist — I can't change it from here.

green meteor Apr 7, 2026, 10:15 PM

From Eric:

Agent: Andy Bellefleur (google/gemini-2.5-flash)
• 429 RESOURCE_EXHAUSTED on every fresh session start, seq 4-5
• Generate content metrics look clean (18 RPM, 665K TPM) — suspect is prompt cache API, not generate
• GOOGLE_PROMPT_CACHE_RETRY_BACKOFF_MS = 10 * 6e4 in pi-embedded-8NxfKyT2.js
• Happens consistently, even with cleared sessions and lean prompts
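For reference, the constant Eric found works out to exactly 10 minutes, and the dashboard figures in the same report imply fairly large requests on average. A quick check, assuming the 18 RPM and 665K TPM readings describe the same traffic:

```python
# Constant as quoted from Eric's dump (pi-embedded-8NxfKyT2.js): 10 * 6e4 ms.
GOOGLE_PROMPT_CACHE_RETRY_BACKOFF_MS = 10 * 6e4
backoff_minutes = GOOGLE_PROMPT_CACHE_RETRY_BACKOFF_MS / 1000 / 60

# Dashboard figures from the same report; assumes both cover the same calls.
requests_per_minute = 18
tokens_per_minute = 665_000
avg_tokens_per_request = tokens_per_minute / requests_per_minute

print(f"backoff: {GOOGLE_PROMPT_CACHE_RETRY_BACKOFF_MS:.0f} ms = {backoff_minutes:.0f} min")
print(f"average: {avg_tokens_per_request:,.0f} tokens per request")
```

This prints a 600000 ms (10 min) backoff and roughly 36,944 tokens per request, which may be worth comparing against what "lean prompts" means in practice.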

Let me know what he says.

sudden pewterBOT Apr 7, 2026, 10:16 PM

Eric’s read is actually pretty solid — and I think you’re both close.

I checked current OpenClaw behavior: for direct Gemini, if cacheRetention is short or long, OpenClaw will auto-create/use Google cachedContents for system prompts. So yes, you can get 429s from that endpoint, separate from normal generate calls.

Key nuance: it does not look like a tight retry loop in one session; it records a ~10 min backoff per session context. But with multiple fresh sessions/agents spinning up, each can still fire its own cache-create call quickly.
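The nuance can be illustrated with a toy model: if the backoff is keyed per session, a second try inside one session is suppressed for ~10 minutes, but every fresh session starts with no backoff entry and makes its own cache-create call. The class and function names below are hypothetical; this is a sketch of the behavior described above, not OpenClaw's implementation.

```python
from dataclasses import dataclass, field

BACKOFF_S = 10 * 60  # ~10 min, matching the constant in Eric's dump


@dataclass
class PerSessionCacheBackoff:
    """Toy model (hypothetical, not OpenClaw's code): backoff is keyed by session id."""
    backoff_until: dict[str, float] = field(default_factory=dict)

    def may_create_cache(self, session_id: str, now: float) -> bool:
        # A session with no recorded failure is always allowed to try.
        return now >= self.backoff_until.get(session_id, 0.0)

    def record_failure(self, session_id: str, now: float) -> None:
        self.backoff_until[session_id] = now + BACKOFF_S


def count_cache_creates(session_ids: list[str], spacing_s: float = 1.0) -> int:
    """Sessions start spacing_s apart; each tries cache-create twice and every try 429s."""
    tracker = PerSessionCacheBackoff()
    attempts = 0
    for i, session_id in enumerate(session_ids):
        now = i * spacing_s
        for _ in range(2):  # the second try in the same session is suppressed
            if tracker.may_create_cache(session_id, now):
                attempts += 1
                tracker.record_failure(session_id, now)
    return attempts


print(count_cache_creates(["andy", "andy"]))                # same session, two turns
print(count_cache_creates(["andy", "lafayette", "terry"]))  # three fresh sessions
```

The first call prints 1 and the second prints 3: the backoff stops repeats within a session, but it does nothing to stop several fresh sessions from each hitting the cache endpoint within seconds.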

What I’d do right now (fast isolation)

Please paste:

openclaw --version

Check if caching is enabled globally/per-model:

openclaw config get agents.defaults.params
openclaw config get agents.defaults.models
openclaw config get agents.list

Temporary mitigation: set Gemini 2.5 Flash agents/models to:

params: { cacheRetention: "none" }

Then restart gateway:

openclaw gateway restart

If 429s drop immediately, you’ve confirmed cache-endpoint pressure as the trigger.

If you still see bursty 429s after that

Check failover/cooldown behavior:

openclaw models status --json

Look at auth.unusableProfiles and cooldown reasons (rate_limit, etc.).
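If you would rather not eyeball the JSON, a small helper can pull those entries out. This sketch assumes the output has a top-level auth object whose unusableProfiles field is either a list of objects or a mapping of profile id to details, each possibly carrying a reason field. That shape is an assumption (the thread doesn't show it), so adjust the keys to what you actually see.

```python
import json
import subprocess
from typing import Any


def unusable_profiles(status: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (profile, reason) pairs from an ASSUMED auth.unusableProfiles field."""
    entries = status.get("auth", {}).get("unusableProfiles") or []
    if isinstance(entries, dict):  # assumed alternative shape: {profileId: {...}}
        entries = [dict(info, profile=pid) if isinstance(info, dict)
                   else {"profile": pid, "reason": str(info)}
                   for pid, info in entries.items()]
    result = []
    for entry in entries:
        if isinstance(entry, dict):
            name = str(entry.get("profile") or entry.get("id") or "?")
            result.append((name, str(entry.get("reason", "unknown"))))
        else:
            result.append((str(entry), "unknown"))
    return result


def load_status() -> dict[str, Any]:
    """Run the CLI command from the thread and parse its JSON output."""
    out = subprocess.run(["openclaw", "models", "status", "--json"],
                         capture_output=True, text=True, check=True).stdout
    return json.loads(out)
```

Usage: `unusable_profiles(load_status())`; an empty list means nothing is currently marked unusable.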

If you want, paste those config get outputs and I’ll point to the exact key to change for your 3 Gemini claws only.

Yep, based on Eric’s dump, I’d treat this as very likely prompt-cache endpoint pressure, not normal generate RPM/TPM.

Short version:

✅ Eric’s hypothesis is plausible.
✅ There is a 10-min backoff constant in OpenClaw’s Gemini prompt-cache path.
⚠️ There is not a user-facing “retry tuning” knob just for this cachedContents create flow.
✅ Practical workaround: disable prompt caching for those Gemini agents via cacheRetention: "none".

What to change now

In openclaw.json (JSON5), set a per-agent override for the 3 Gemini claws:

{
  agents: {
    list: [
      { id: "andy", params: { cacheRetention: "none" } },
      { id: "eric", params: { cacheRetention: "none" } },
      { id: "third-gemini-agent", params: { cacheRetention: "none" } },
    ],
  },
}

Then restart:

openclaw gateway restart

If you have a global default like agents.defaults.params.cacheRetention: "long", this per-agent override is the right way to neutralize only those 3.
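The override works because per-agent params take precedence over agents.defaults.params. A minimal sketch of that precedence, assuming a simple shallow merge where the agent's keys win (OpenClaw's real merge logic may differ in detail); effective_params and the extra agent are hypothetical:

```python
from typing import Any


def effective_params(defaults: dict[str, Any], agent: dict[str, Any]) -> dict[str, Any]:
    """Shallow merge (assumed): keys in the agent's own params override the defaults."""
    return {**defaults, **agent.get("params", {})}


defaults = {"cacheRetention": "long"}  # e.g. agents.defaults.params
agents = [
    {"id": "andy", "params": {"cacheRetention": "none"}},
    {"id": "eric", "params": {"cacheRetention": "none"}},
    {"id": "third-gemini-agent", "params": {"cacheRetention": "none"}},
    {"id": "some-other-agent"},  # hypothetical agent with no override
]

for agent in agents:
    print(agent["id"], effective_params(defaults, agent)["cacheRetention"])
```

The three overridden agents resolve to "none", while an agent without its own params keeps the global "long".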

Also please send:

openclaw --version
openclaw config get agents.defaults.params
openclaw config get agents.list

I can sanity-check that no higher-level setting is re-enabling cache behavior.

If this reproduces on the latest release, I’d absolutely file a GitHub issue with your seq timing + metrics + this cachedContents suspicion: https://github.com/openclaw/openclaw/issues

green meteor Apr 7, 2026, 10:22 PM

From Eric:

Eric Northman:
Gateway's back up. Testing Andy right now — this is the real test:

Eric Northman:
Still 429s. cacheRetention: none didn't fix it — or the gateway restart didn't fully pick up the change. Let me check models status as Krill suggested:

Eric Northman:
No cooldowns showing. Let me verify the config change actually registered:

Eric Northman:
Config registered. cacheRetention: none is live on all three. But still 429ing.

For Krill — yes, one question: the fix is in but 429s persist after restart with cacheRetention: none on all three Gemini agents. The error happens at seq 4-5 of fresh sessions within seconds. The models status --json shows no unusableProfiles or cooldowns. Is there a separate cooldown state file or in-memory throttle that persists across gateway restarts? And is there a way to force-clear it?

sudden pewterBOT Apr 7, 2026, 10:24 PM

Great datapoint — if cacheRetention: "none" is confirmed live and it still 429s, then this is probably not the managed Gemini prompt-cache create path anymore.

Answering your direct question:

Yes, some failover/cooldown state is persisted across restarts in ~/.openclaw/agents/<agentId>/agent/auth-state.json (auth profile cooldown/disable state).
Gemini prompt-cache backoff entries are tied to session transcript state (sessions/*.jsonl custom entries), not a global throttle.
I’m not aware of a separate hidden global in-memory throttle you can “flush” via CLI.
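To check every agent at once, a short script can walk the per-agent auth-state.json files at the path above and flag anything that looks like cooldown or error state. The file's schema isn't shown in this thread, so this sketch matches keys by name (anything containing "cooldown", plus errorCount, which Eric later reports seeing); treat those key names as assumptions.

```python
import json
from pathlib import Path
from typing import Any, Iterator, Optional


def interesting_fields(obj: Any, path: str = "") -> Iterator[tuple[str, Any]]:
    """Yield (json_path, value) for keys that look cooldown- or error-related (assumed names)."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            child = f"{path}.{key}" if path else key
            if "cooldown" in key.lower() or key == "errorCount":
                yield child, value
            yield from interesting_fields(value, child)
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            yield from interesting_fields(item, f"{path}[{i}]")


def scan(root: Optional[Path] = None) -> dict[str, list[tuple[str, Any]]]:
    """Map agent id -> flagged fields from <root>/<agentId>/agent/auth-state.json."""
    root = root or Path.home() / ".openclaw" / "agents"
    report = {}
    for state_file in sorted(root.glob("*/agent/auth-state.json")):
        agent_id = state_file.parent.parent.name  # <agentId> directory
        report[agent_id] = list(interesting_fields(json.loads(state_file.read_text())))
    return report
```

Usage: `for agent, fields in scan().items(): print(agent, fields)`.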

Practical next checks

Confirm there’s no explicit cached-content handle still set anywhere: look for cachedContent or cached_content in model/agent params.

Run with only one Gemini claw active (others paused) to rule out per-second/concurrency quota bursts.
Enable cache tracing briefly and verify runtime sees cacheRetention: none on those turns.
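For the first check, note that openclaw.json is JSON5 (unquoted keys, trailing commas), which Python's json module can't parse, so a plain text search is the simplest approach. This sketch assumes the config lives at ~/.openclaw/openclaw.json; point it at wherever yours actually is.

```python
import re
from pathlib import Path
from typing import Optional

# Matches cachedContent, cached_content, CachedContent, etc.
CACHED_CONTENT_KEY = re.compile(r"cached_?content", re.IGNORECASE)


def find_cached_content_refs(text: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for every line mentioning a cached-content key."""
    return [(n, line.strip())
            for n, line in enumerate(text.splitlines(), start=1)
            if CACHED_CONTENT_KEY.search(line)]


def check_config(path: Optional[Path] = None) -> list[tuple[int, str]]:
    # Assumed default location; adjust if your config lives elsewhere.
    path = path or Path.home() / ".openclaw" / "openclaw.json"
    return find_cached_content_refs(path.read_text(encoding="utf-8"))
```

An empty result from `check_config()` means no explicit handle is set in that file. It also matches commented-out lines, which is fine for this purpose.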

Also, please paste:

openclaw --version
openclaw config get agents.defaults.models
openclaw config get agents.list

If you want a hard reset without manual file surgery, the quickest safe test is a brand-new agent id with the same model config + cacheRetention: "none" and a fresh session. If that still fails at seq 4-5, I’d file this as an OpenClaw bug with repro + logs.

green meteor Apr 7, 2026, 10:27 PM

Eric Northman:
On it. Checking auth-state.json files and looking for any cached-content handles:

Eric Northman:
Config confirmed live. For Krill — here's the full picture to share:

───

openclaw --version: 2026.4.6 (35fd766)

config get agents.defaults.models: Not found (no global default)

config get agents.list:

• eric-northman: anthropic/claude-sonnet-4-6 | params: {}
• lafayette-reynolds: google/gemini-2.5-flash | params: {cacheRetention: none} ✅
• jason-stackhouse: ollama/gemma4:e4b | params: {}
• andy-bellefleur: google/gemini-2.5-flash | params: {cacheRetention: none} ✅
• terry-bellefleur: google/gemini-2.5-flash | params: {cacheRetention: none} ✅

auth-state.json: No cooldowns. errorCount: 0 on all three. lastFailureAt timestamps exist but no threshold tripped.

Pattern: 429 hits consistently at seq 4-5 on fresh sessions, seconds after start. Only one Gemini agent active at time of failure. cacheRetention: none confirmed live, still 429ing.

Next, per Krill's suggestion: try a brand-new agent ID with the same model + params to see if it still fails at seq 4-5. If it does, file it as an OpenClaw bug with a repro.

Ready. Send him everything above — the version, the agents.list output, the auth-state summary, and the pattern description.

One addition worth including: the 429 error is coming from the Gemini generate content API (google-generative-ai provider, stopReason: "error" in session transcript). It is NOT tagged as a prompt-cache specific error — it's a standard RESOURCE_EXHAUSTED on what appears to be a regular generate call at seq 4-5.

That's the key detail that might help Krill pinpoint whether this is still cache-related or something else in the pipeline.

Go ahead and send it.

sudden pewterBOT Apr 7, 2026, 10:28 PM

Perfect report — this is exactly what we needed.

Given your data, I’d now call it this way:

This is no longer pointing at prompt-cache create calls.
cacheRetention: none is live on all 3 Gemini agents, and the error is explicitly on generate content.
Not likely stale cooldown state.
You already checked auth-state.json and it’s clean (no active cooldowns).
There isn’t another “secret cooldown file” you need to flush for this case.
Most likely now:

Google-side burst/quota dimension not reflected in your high-level RPM/TPM view (e.g. short-window/concurrency), or
OpenClaw is still doing multiple generate attempts in a rapid chain (failover/retry behavior).
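To see how a clean per-minute view can hide a burst: 18 requests in a minute looks modest, but if five of them land within two seconds, a short-window limit sees a 150-per-minute pace. This sketch computes the peak count in any sliding window; the timestamps are made up for illustration, and Google's actual short-window dimensions aren't stated in this thread.

```python
from collections import deque


def peak_in_window(timestamps: list[float], window_s: float) -> int:
    """Largest number of requests inside any half-open window of window_s seconds."""
    recent: deque[float] = deque()
    peak = 0
    for t in sorted(timestamps):
        recent.append(t)
        # Drop requests that are at least window_s older than the current one.
        while t - recent[0] >= window_s:
            recent.popleft()
        peak = max(peak, len(recent))
    return peak


# Hypothetical session: 5 calls in the first 1.6 s (seq 1-5), then 13 spread out.
burst = [0.0, 0.4, 0.8, 1.2, 1.6] + [5.0 + 4.5 * i for i in range(13)]
print(len(burst), "requests in 60 s")
print("peak in any 60 s window:", peak_in_window(burst, 60.0))
print("peak in any 2 s window:", peak_in_window(burst, 2.0))
```

Both views describe the same 18 requests: the 60 s window peaks at 18, while the 2 s window peaks at 5.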

Next high-signal test (single run, minimal noise)

With only one Gemini agent active:

Temporarily force failover retries way down:

{
  auth: {
    cooldowns: {
      rateLimitedProfileRotations: 0,
      overloadedProfileRotations: 0
    }
  }
}

Ensure no model fallbacks are configured for that agent.

Send one simple prompt and see if seq still jumps to 4-5.

If seq collapses to ~1 attempt, it’s retry/failover amplification.
If it still 429s on first attempt, it’s provider-side quota/throttle.
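As a rough cross-check of the seq number, you can scan the session transcript (sessions/*.jsonl, per the earlier reply) for the first record whose stopReason is "error" and count how many stopReason-bearing records precede it. The record layout is an assumption: this sketch looks for a stopReason key at any nesting depth instead of relying on a specific schema, and first_error_position is a hypothetical helper.

```python
import json
from typing import Any, Iterator, Optional


def stop_reasons(obj: Any) -> Iterator[str]:
    """Yield every stopReason string found anywhere inside a parsed JSON record."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == "stopReason" and isinstance(value, str):
                yield value
            else:
                yield from stop_reasons(value)
    elif isinstance(obj, list):
        for item in obj:
            yield from stop_reasons(item)


def first_error_position(jsonl_text: str) -> Optional[int]:
    """1-based position (among stopReason values) of the first "error", or None."""
    position = 0
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        for reason in stop_reasons(json.loads(line)):
            position += 1
            if reason == "error":
                return position
    return None
```

A result of 1 after the test run would line up with "429s on first attempt". A larger number means several responses completed before the failure.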

If this still reproduces on 2026.4.6, I’d file a bug with exactly what you posted (it’s great repro quality):
https://github.com/openclaw/openclaw/issues

Include:

version
agent config (cacheRetention: none)
single-agent-only repro
seq 4-5 pattern
confirmation that the failing endpoint is generate content (not cachedContents).

green meteor Apr 7, 2026, 10:32 PM

Eric: Implementing Krill's test now — killing the retry rotations and testing with Andy solo:

Still 429, still seq 4-5, still within seconds. Cooldown rotations set to 0 made no difference.

Per Krill's logic — if seq still hits 4-5 with rotations at 0, it's not retry/failover amplification. It's provider-side quota throttle.

This is a Google-side issue now, not OpenClaw. The API key is genuinely throttled at some quota dimension that isn't visible in the standard RPM/TPM dashboard — likely a short-window or concurrency limit Google applies at the key level after heavy sustained use today.

Honest assessment: Andy is not going to run tonight. The key needs time to reset fully. Tomorrow morning with a fresh quota window it should work cleanly.

For Krill — the failover test confirms: seq 4-5 persists with rateLimitedProfileRotations: 0, overloadedProfileRotations: 0, cacheRetention: none, single agent, clean session. This is provider-side. Worth filing as an OpenClaw bug since the error isn't surfacing as a proper cooldown in auth-state.json.

sudden pewterBOT Apr 7, 2026, 10:32 PM

Yep — that test result is strong. I agree with your conclusion.

You’ve effectively ruled out:

Gemini prompt-cache create path (cacheRetention: none)
failover/retry amplification (rateLimitedProfileRotations: 0, overloadedProfileRotations: 0)
multi-agent concurrency (solo repro)

So this looks like provider-side RESOURCE_EXHAUSTED on generate (likely quota dimension outside the simple RPM/TPM view).

The OpenClaw part still worth reporting:
429 classification/cooldown visibility gap (not surfacing into auth-state.json as expected).
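For the issue, it may help to show the expected mapping. Google API errors come back as JSON with an error object carrying code, message, and status (here 429 and RESOURCE_EXHAUSTED). This sketch classifies that into the rate_limit cooldown reason that OpenClaw already reports in models status. The function is hypothetical and not OpenClaw's actual code.

```python
import json
from typing import Optional


def classify_google_error(http_status: int, body: str) -> Optional[str]:
    """Hypothetical mapping from a Google API error response to a cooldown reason."""
    try:
        error = json.loads(body).get("error", {})
    except (ValueError, AttributeError):  # body not JSON, or not a JSON object
        error = {}
    if not isinstance(error, dict):
        error = {}
    if (http_status == 429
            or error.get("code") == 429
            or error.get("status") == "RESOURCE_EXHAUSTED"):
        return "rate_limit"
    return None  # not rate-related; other failure classes are out of scope here
```

The point for the bug report is that a 429 RESOURCE_EXHAUSTED on generate content should end up as a visible rate_limit entry (for example in auth-state.json or auth.unusableProfiles) instead of leaving no trace.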

Please file here: https://github.com/openclaw/openclaw/issues
Include exactly what you already collected (it’s excellent repro quality).

If helpful, title suggestion:
“Gemini 2.5 Flash 429 RESOURCE_EXHAUSTED not reflected in auth-state cooldowns (seq 4–5 fresh session)”
