Hey @krill - hitting persistent ⚠️ API rate limit reached errors on Gemini despite being at ~22% of Tier 1 TPM quota. Theory: retry logic is creating tokens-per-second (TPS) bursts that violate per-second limits.
Setup:
- Google Cloud Tier 1: 1M TPM, 1K RPM, 10K RPD
- Actual usage: 225k TPM (22%), 4 RPM (0.4%), 22 RPD (0.2%)
- Models: Both preview (gemini-3-flash-preview) and stable (gemini-2.5-flash)
What We Tried:
- Fresh Google Cloud project + new API key
- Switched preview → stable
- Both fail with 429s immediately
Evidence:
- AI Studio graphs: 4 requests over 4 min, 258k tokens total [attached]
- Gateway logs: error=⚠️ API rate limit reached.
- Pattern: First request after idle hits 429 instantly, retries fail
Code Findings:
- BASE_RUN_RETRY_ITERATIONS=24, MIN=32, MAX=160
- retryAsync() has exponential backoff (300ms * 2^i)
- Theory: Run loop (32-160) × API retries (3-5) = TPS burst
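To put rough numbers on that theory, here's a quick Python sketch of the backoff schedule and the worst-case call count. It is not the gateway's actual code: the function names are made up, and it assumes no jitter, no delay cap, zero request latency, and that "3-5 API retries" means total attempts per run-loop iteration.

```python
BASE_DELAY_MS = 300  # from the findings above: 300ms * 2^i


def backoff_delays_ms(retries: int, base_ms: int = BASE_DELAY_MS) -> list[int]:
    """Delay before retry i (i = 0..retries-1), assuming no jitter and no cap."""
    return [base_ms * 2 ** i for i in range(retries)]


def attempts_within(window_ms: int, retries: int) -> int:
    """Attempts (first call + retries) that start inside window_ms.

    Assumes each failed request returns instantly, which is roughly what an
    immediate 429 looks like.
    """
    elapsed = 0
    count = 1  # the initial attempt at t=0
    for delay in backoff_delays_ms(retries):
        elapsed += delay
        if elapsed < window_ms:
            count += 1
    return count


def worst_case_calls(run_iterations: int, api_attempts: int) -> int:
    """Upper bound on API calls if every run-loop iteration uses all its attempts."""
    return run_iterations * api_attempts


if __name__ == "__main__":
    print(backoff_delays_ms(5))                               # [300, 600, 1200, 2400, 4800]
    print(attempts_within(1000, 5))                           # 3
    print(worst_case_calls(32, 3), worst_case_calls(160, 5))  # 96 800
```

Under those assumptions a single failing call already makes 3 attempts inside its first second (t=0, 0.3s, 0.9s), and the run loop could multiply that to somewhere between 96 and 800 calls in the worst case.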
The Issue:
- 4 requests (~25k each = 100k tokens)
- If those hit within the same second, might trip undocumented TPS limits
- Dashboard shows the 60-second average is fine, but per-second enforcement may be stricter
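Rough illustration of the gap between the per-minute view and a per-second view, as a Python sketch. The timestamps are invented, and the "even per-second share" (TPM / 60) is only a hypothetical yardstick; I have no confirmation that Gemini enforces anything like it.

```python
from collections import Counter

TPM_LIMIT = 1_000_000  # Tier 1 TPM from the setup above


def peak_tokens_per_second(events: list[tuple[float, int]]) -> int:
    """events are (timestamp_seconds, tokens); returns the busiest 1-second bucket."""
    buckets: Counter = Counter()
    for ts, tokens in events:
        buckets[int(ts)] += tokens
    return max(buckets.values()) if buckets else 0


def total_tokens(events: list[tuple[float, int]]) -> int:
    """Total tokens across all events (what a per-minute graph would sum)."""
    return sum(tokens for _, tokens in events)


# Hypothetical burst: 4 requests of ~25k tokens landing within one second.
burst = [(0.0, 25_000), (0.2, 25_000), (0.5, 25_000), (0.9, 25_000)]

even_share_per_second = TPM_LIMIT / 60  # ~16,667 tokens/s if the quota were spread evenly

if __name__ == "__main__":
    peak = peak_tokens_per_second(burst)
    print(total_tokens(burst) / TPM_LIMIT)        # 0.1 -> 10% of the minute quota
    print(round(peak / even_share_per_second, 1)) # 6.0 -> 6x an even per-second share
```

So the same 100k tokens look like 10% of TPM on a dashboard but 6x an evenly spread per-second budget, if such a budget exists.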
Questions:
- Can we tune run loop retry count or add delays?
- Parallel requests on model switch causing bursts?
- Known Gemini TPS vs TPM issues?
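For the "add delays" question, this is roughly the kind of thing I have in mind, sketched in Python with asyncio rather than in the gateway's own code. `MinIntervalLimiter` and `jittered_delay_s` are hypothetical names, and the 0.3s base just mirrors the 300ms backoff above; the interval and cap values are placeholders.

```python
import asyncio
import random
import time


class MinIntervalLimiter:
    """Spaces out request starts so no two begin closer than min_interval_s."""

    def __init__(self, min_interval_s: float) -> None:
        self._min = min_interval_s
        self._next_ok = 0.0  # earliest monotonic time the next request may start

    async def wait(self) -> None:
        # No await between reading and updating _next_ok, so concurrent
        # callers on one event loop each reserve a distinct slot.
        now = time.monotonic()
        start_at = max(now, self._next_ok)
        self._next_ok = start_at + self._min
        await asyncio.sleep(start_at - now)


def jittered_delay_s(attempt: int, base_s: float = 0.3, cap_s: float = 30.0) -> float:
    """Capped exponential backoff: half the delay is fixed, half is random."""
    exp = min(cap_s, base_s * 2 ** attempt)
    return exp / 2 + random.uniform(0, exp / 2)


async def demo(n: int = 4, min_interval_s: float = 0.05) -> list[float]:
    """Fires n fake requests at once and returns their actual start times."""
    limiter = MinIntervalLimiter(min_interval_s)
    starts: list[float] = []

    async def fake_request() -> None:
        await limiter.wait()
        starts.append(time.monotonic())

    await asyncio.gather(*(fake_request() for _ in range(n)))
    return sorted(starts)


if __name__ == "__main__":
    times = asyncio.run(demo())
    print([round(b - a, 2) for a, b in zip(times, times[1:])])
```

The limiter would sit in front of every outbound model call (including retries), so parallel requests on a model switch get serialized into spaced starts instead of landing in the same second.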
Currently on Claude (works fine). Happy to provide more logs/config.