#"API rate limit reached" thrown internally before any Google API call is made


thorn canopy

🐛 Bug: "API rate limit reached" thrown internally in 2026.2.26 — never reaches Google API
Environment

OpenClaw: 2026.2.26 (bc50708), upgraded from 2026.2.21-2
OS: Linux 6.17.0-14-generic x64 | Node: 25.6.1
Model: google/gemini-3-flash-preview

Problem
Every agent request fails with ⚠️ API rate limit reached immediately after upgrade. Error is generated inside OpenClaw before any API call — confirmed by curl working fine and Google AI Studio showing minimal usage (8/1K RPM, 135K/1M TPM).
Log evidence
isError=true error=⚠️ API rate limit reached. Please try again later.
source: openclaw/dist/subsystem-DypCPrmP.js:1170
Already tried

Cleared all auth-profiles.json cooldown state across all agent dirs
Deleted sessions, rebooted, tried new session IDs
Switched model to gemini-3-flash-preview (2026.2.26 broke the gemini-3-flash alias)
Ran openclaw doctor --fix

Additional breaking changes found in 2026.2.26

channels.whatsapp.allowlist renamed to allowFrom with no auto-migration
Gateway service entrypoint mismatch after npm install without sudo

rugged cosmos BOT

That specific string (⚠️ API rate limit reached…) is not a Google response — it’s OpenClaw’s generic rewrite for any error that looks like rate limiting (it matches things like 429, rate limit, model_cooldown, cooling down, quota, etc.). So the key is to find the raw error and/or whether OpenClaw thinks your auth profile is in cooldown.
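To see why the rewritten string tells you so little, here is a minimal Python sketch of a rewrite of this kind. It is not OpenClaw's code: the pattern list is assembled from the substrings mentioned above, and RATE_LIMIT_HINTS and rewrite_error are hypothetical names.

```python
import re

# Hypothetical pattern list, built from the substrings named in this thread;
# the real matcher inside OpenClaw may differ.
RATE_LIMIT_HINTS = re.compile(
    r"429|rate.?limit|model_cooldown|cooling down|quota", re.IGNORECASE
)

GENERIC_MESSAGE = "⚠️ API rate limit reached. Please try again later."


def rewrite_error(raw: str) -> str:
    """Return the generic banner for anything that looks like rate limiting."""
    return GENERIC_MESSAGE if RATE_LIMIT_HINTS.search(raw) else raw
```

With a matcher like this, a real upstream 429 and a purely local cooldown message collapse into the same banner, which is why the raw log line matters.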

1) Check if OpenClaw has your Google profile stuck in cooldown (most likely)

Run this and paste only the auth section (redact tokens/emails if needed):

openclaw models status --json

In the JSON, look at:

  • auth.storePath ← this tells you the exact auth-profiles.json file the running agent is using
  • auth.unusableProfiles ← if this contains your google profile with a future until, OpenClaw will fail before making any API call
  • also confirm agentDir matches what you’ve been editing

If unusableProfiles is non-empty, the immediate workaround is simply: clear the usageStats for that profile in the storePath shown there (people often clear the legacy path or the wrong agent’s directory), then restart the gateway.
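If you would rather check this programmatically than by eye, a small Python sketch could look like the following. The field names auth.unusableProfiles and until come from the description above; the entry shape (an object with a numeric until in epoch milliseconds) is an assumption, and stuck_profiles is a hypothetical helper.

```python
import json
import time
from typing import Optional


def stuck_profiles(status_json: str, now_ms: Optional[float] = None) -> list:
    """Return unusableProfiles entries whose cooldown has not expired yet."""
    status = json.loads(status_json)
    now_ms = time.time() * 1000 if now_ms is None else now_ms
    entries = status.get("auth", {}).get("unusableProfiles", [])
    # Assumption: each entry is an object with a numeric `until` (epoch ms).
    return [
        e for e in entries
        if isinstance(e, dict)
        and isinstance(e.get("until"), (int, float))
        and e["until"] > now_ms
    ]
```

Feed it the output of openclaw models status --json; an empty list means no profile is currently held in cooldown under these assumptions.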


Docs (paths + cooldown behavior): https://docs.openclaw.ai/concepts/model-failover

2) Verify you’re editing the same install/user/state as the running gateway

Given you also saw a “gateway service entrypoint mismatch”, please run:

openclaw gateway status --deep

and check for Config (cli) vs Config (service) / different state dirs or users. If the service is running under a different user (or pointing at a different state dir), you can clear cooldown state “everywhere you can find” and still miss the one the gateway is actually reading.

Troubleshooting runbook: https://docs.openclaw.ai/gateway/troubleshooting

3) If unusableProfiles is empty, we need the raw error (the rewrite is hiding it)

Temporarily bump logging and reproduce once:

  • Env-var way: OPENCLAW_LOG_LEVEL=debug (or trace)
  • or set logging.level: "debug" in config and restart

Logging doc: https://docs.openclaw.ai/logging

Then paste the few lines around the failure (again, redact secrets). The rewrite happens late, so the raw error is usually still in the gateway logs.


If you can drop the openclaw models status --json → auth.storePath + auth.unusableProfiles output here, I can tell you immediately whether this is “wrong auth-profiles.json file” vs “cooldown stuck” vs “misclassified internal error”.

Also: the channels.whatsapp.allowlist → allowFrom and the gemini alias regression really sound like separate bugs — please file those as separate GitHub issues so they don’t get conflated: https://github.com/openclaw/openclaw/issues

thorn canopy

Update: Google AI Studio Usage tab confirms 429 TooManyRequests errors are reaching Google, triggered by OpenClaw's aggressive retry loop after initial rate limit hit. OpenClaw appears to lack backoff/cooldown handling, causing a retry death spiral that exhausts quota entirely.

rugged cosmos BOT

That update lines up with how OpenClaw currently behaves: the user-facing string “⚠️ API rate limit reached…” is a generic rewrite, and once a run is in the “failover / retry” paths it can generate a burst of upstream calls with no sleep/backoff between attempts.

A few concrete points (from current source):

  • The embedded agent has an outer retry loop with a very high cap (MIN_RUN_RETRY_ITERATIONS is 32; scaled up to MAX_RUN_RETRY_ITERATIONS 160): src/agents/pi-embedded-runner/run.ts (resolveMaxRunRetryIterations).
  • On rate-limit-shaped failures it will continue into another iteration immediately (profile rotation and/or model fallback paths). Cooldown is a “don’t pick this profile again” mechanism, not a “wait before retrying” mechanism.

So you can absolutely get a “retry death spiral” effect where one user message causes many fast retries (esp. if you have multiple auth profiles/keys, or model fallbacks configured, or you’re in a multi-call tool loop).
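For contrast, this is roughly what “wait before retrying” could look like. It is a minimal Python sketch, not OpenClaw's code and not a proposed patch: RateLimited, backoff_delay and call_with_backoff are hypothetical names, and the base delay, cap and attempt count are arbitrary.

```python
import random
import time
from typing import Callable, Optional


class RateLimited(Exception):
    """Stand-in for a 429; retry_after is the Retry-After value in seconds."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited")
        self.retry_after = retry_after


def backoff_delay(attempt: int, retry_after: Optional[float],
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Prefer the server's Retry-After; otherwise capped exponential backoff."""
    if retry_after is not None:
        return max(0.0, retry_after)
    ceiling = min(cap, base * 2 ** attempt)
    # "Equal jitter": never less than half the ceiling, so there is always a pause.
    return ceiling / 2 + random.uniform(0, ceiling / 2)


def call_with_backoff(call: Callable[[], str], max_attempts: int = 4,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Retry `call` on RateLimited, sleeping between attempts, then give up."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited as err:
            if attempt == max_attempts - 1:
                raise  # fail fast instead of spinning through more attempts
            sleep(backoff_delay(attempt, err.retry_after))
    raise RuntimeError("max_attempts must be at least 1")
```

The difference from the behaviour described above is the sleep call: every retry costs wall-clock time, so one user message cannot turn into a burst of upstream requests.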

Mitigations to stop burning quota (right now)

  1. Temporarily disable model fallbacks (so one message can’t fan out into many upstream attempts):
openclaw models fallbacks list
openclaw models fallbacks clear
# or set agents.defaults.model.fallbacks: [] in config
  2. If you have multiple Google auth profiles/keys, pin to one so it won’t immediately rotate and hammer several keys in a row:
openclaw models auth order get --provider google
openclaw models auth order set --provider google <profileId>

(Grab the exact <profileId> from openclaw models status --json → auth.storePath / auth profile list.)

  3. After a 429, wait out the provider window before retrying (since OpenClaw doesn’t currently honor Retry-After for model calls at the agent-loop level).

For the bug report (this one is worth filing)

If you open a GitHub issue, the most useful payload is:

  • openclaw models status --json (redacted) showing fallbacks and how many google profiles exist
  • A log snippet showing multiple rapid attempts for a single inbound message (set OPENCLAW_LOG_LEVEL=debug and reproduce once)
  • Mention that run.ts has a high retry cap (32–160) and retries on failover without backoff; rate-limit handling should probably fail fast (or sleep using Retry-After) once a 429 is seen for a given provider/model within a single turn.

Issues: https://github.com/openclaw/openclaw/issues


Also +1 on splitting the other breakages (WhatsApp allowlist rename, service entrypoint mismatch, gemini alias) into separate issues so they don’t get tangled.

thorn canopy

Thanks for the detailed guidance. Here are the results:

openclaw models status --json (redacted)

{
  "agentDir": "/home/user/.openclaw/agents/main/agent",
  "defaultModel": "google/gemini-3-flash-preview",
  "fallbacks": [],
  "auth": {
    "storePath": "~/.openclaw/agents/main/agent/auth-profiles.json",
    "unusableProfiles": [],
    "providers": [{
      "provider": "google",
      "profiles": { "count": 1, "apiKey": 1 }
    }]
  }
}

openclaw gateway status --deep

  • Config (cli) and Config (service) both point to ~/.openclaw/openclaw.json
  • Only one Google auth profile, no rotation possible ✅
  • unusableProfiles is empty, no stuck cooldown ✅
  • Fallbacks cleared ✅
  • Second service (openclaw-bridge.service) is just a socat forwarder for Tailscale access, not relevant ✅

Google AI Studio confirms real 429s
Usage tab shows 429 TooManyRequests errors hitting Google repeatedly alongside 404 NotFound errors. This lines up with the retry death spiral you described — the initial 429 triggered 32-160 rapid retries with no backoff, burning through quota entirely.

Current status
Everything looks clean on the OpenClaw side now but requests are still failing. Waiting for Google's quota window to reset.

For the GitHub issue — happy to file separately for:

  1. No backoff/sleep on 429 in the agent retry loop (run.ts MIN/MAX_RUN_RETRY_ITERATIONS)
  2. channels.whatsapp.allowlist → allowFrom rename with no auto-migration
  3. gemini-3-flash model alias regression (worked in 2026.2.21-2, 404s in 2026.2.26)
  4. Gateway service entrypoint mismatch after non-sudo npm install

rugged cosmos BOT

That’s an excellent dataset — and it narrows this down a lot.

Given:

  • 1 google api-key profile
  • no fallbacks
  • unusableProfiles: []
  • CLI/service config paths match

…then the “cooldown stuck / wrong agentDir / profile rotation” theories are basically dead.

What’s most likely happening

  1. The 429 “death spiral” is real, but it’s probably not “model fallback/rotation” (you’ve got none). The remaining places a burst can come from are:
  • the embedded runner’s internal retry loop (it can iterate a lot; see src/agents/pi-embedded-runner/run.ts), if it hits a continue branch repeatedly, or
  • provider/SDK-level retries (inside the Google adapter / upstream library), or
  • multiple concurrent runs (e.g. queue processing duplicating work) — less likely, but worth ruling out with logs.
  2. The 404s are a separate smoking gun: they usually mean wrong model id / wrong API version / model not enabled for that key. In OpenClaw there’s also an explicit normalization that maps:
  • gemini-3-flash → gemini-3-flash-preview for provider google
    (normalizeGoogleModelId() in src/agents/models-config.providers.ts, also applied in model selection)

So if something previously worked as “flash” but now gets normalized to “preview”, you can get hard 404s.

One thing I still want you to check (because it affects cooldown/backoff behavior)

In your redacted models status --json, can you confirm whether the key is being resolved via shell env fallback?

In the full JSON there’s:

  • auth.shellEnvFallback.enabled
  • auth.shellEnvFallback.appliedKeys

If OpenClaw is actually pulling the Gemini key from env (even though a profile exists), you can end up with no profileId to attach cooldown to, which makes “429 → immediately try again next message” much more likely.

For the GitHub issues


Yes—please file them separately. Suggested “what to include” so maintainers can reproduce fast:

  1. 429 retry storm / no backoff
  • exact raw 429 payload from logs (not just the rewritten “⚠️ API rate limit reached”)
  • timestamps showing multiple upstream requests per single user message/runId (or clarify if it’s “per message” but too eager)
  • mention run.ts has high retry caps and doesn’t sleep on rate-limit paths; ideally should honor Retry-After or impose a minimum delay when no alternate profile exists.
  2. Gemini alias/model normalization regression
  • the exact 404 text (Google often returns something like “models/X is not found for api version …”)
  • note that gemini-3-flash is normalized to gemini-3-flash-preview for provider google, which can flip a previously-valid id into a 404.
  3. WhatsApp allowlist rename
  • config snippet before/after + confirmation that openclaw doctor --fix didn’t migrate it.
  4. Service entrypoint mismatch
  • openclaw gateway status --deep excerpt showing the service command/path and the CLI path (and how they differ after non-sudo npm install).

If you paste the auth.shellEnvFallback section + one raw 404/429 line from openclaw logs --follow here, I can tell you which branch this is (runner loop vs SDK retries vs duplicate runs) before you even wait for the quota window reset.

thorn canopy

Update — root cause confirmed after extensive testing:

CLI works fine:
Single 429 on first attempt, retries once, succeeds:

embedded run agent end: runId=... isError=true error=⚠️ API rate limit reached
embedded run agent end: runId=... isError=false

Aura responds normally via CLI and WhatsApp.

Dashboard webchat fails:
Same single message triggers a much longer retry storm that exhausts quota entirely before succeeding. Stopping the gateway, waiting 5+ minutes, and restarting is the only way to recover.

The bug is specifically in the webchat channel retry behaviour — it's dramatically more aggressive than the embedded CLI runner. With no fallbacks, no profile rotation, and only 1 Google API key, the webchat path still manages to trigger enough retries to hit TPM limits.

Other confirmed diagnostics:

  • auth.shellEnvFallback.enabled: false
  • unusableProfiles: []
  • fallbacks: []
  • CLI/service config paths match ✅
  • Only 1 Google auth profile ✅

Workaround for now: CLI and WhatsApp only, avoiding dashboard webchat entirely.

Happy to provide any additional logs needed for the GitHub issue.

rugged cosmos BOT

That’s a really strong narrowing: same key/model, no fallbacks, no cooldown state, CLI + WhatsApp OK, but Dashboard webchat triggers runaway usage.

At this point the two most plausible explanations are:

  1. Webchat is starting multiple independent runs for one “send” (usually because the browser client retries chat.send with new idempotencyKeys, often correlated with WS reconnects / page refreshes / double-submit bugs).
  2. Or: a single webchat run is looping internally in a way the CLI path doesn’t (less likely given the embedded runner logic, but possible if the webchat path is hitting a different wrapper/runner behavior).

The fastest way to prove which one it is (and produce perfect GitHub evidence)

Run the gateway in the foreground with WebSocket frame logging, reproduce one webchat send, then inspect how many chat.send RPCs happened and what idempotencyKey values were used:

openclaw gateway stop

# then run interactively
OPENCLAW_LOG_LEVEL=debug openclaw gateway --verbose --ws-log full
# (or --ws-log compact if full is too spammy)

Now send one message in the Dashboard webchat and watch for lines like:

  • chat.send (request)
  • matching chat.send (response)
  • and importantly: whether you see many chat.send requests with different ids / different idempotencyKeys.

Interpretation:

  • If you see multiple chat.send calls per single UI send → it’s almost certainly the web UI client / reconnect / double-submit path causing duplicate runs (gateway deduping only works per idempotencyKey).
  • If you see exactly one chat.send → then the retry storm is happening inside the agent/run path, and we’ll focus on runId attempt logs + raw upstream 429 payloads.
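Counting requests by eye gets tedious with --ws-log full, so here is a small Python sketch for tallying them. It assumes request lines look roughly like "← req chat.send conn=<conn> id=<id>"; that shape is an assumption, so adjust the pattern to whatever your gateway actually prints. count_requests is a hypothetical helper.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed line shape: "... req <method> conn=<conn> id=<id>"
REQ = re.compile(r"req\s+(?P<method>[\w.]+)\s+conn=(?P<conn>\S+)\s+id=(?P<id>\S+)")


def count_requests(lines: Iterable[str], method: str = "chat.send") -> Counter:
    """Count request frames for `method`, keyed by request id."""
    ids = []
    for line in lines:
        match = REQ.search(line)
        if match and match["method"] == method:
            ids.append(match["id"])
    return Counter(ids)
```

If the resulting counter holds more than one id for a single UI send, you are looking at duplicate submissions; exactly one id points at the run path instead.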

Docs reference for WS logging flags: https://docs.openclaw.ai/gateway/logging

What to include in the GitHub issue (for the webchat-specific runaway)

Along with your existing environment + AI Studio screenshots, include:

  • The --ws-log full snippet showing chat.send for one user action (or showing only 1, if that’s the case)
  • The list of idempotencyKey values observed (even shortened)
  • Any gateway log lines that include the runId and the raw upstream error text (the raw 429/404 string is more valuable than the rewritten “⚠️ API rate limit reached…”)

Your proposed issue split looks perfect

  1. Webchat runaway retries / duplicate chat.send / no backoff honoring 429 (this thread’s core)
  2. WhatsApp allowlist → allowFrom migration
  3. gemini-3-flash alias / model id normalization regression (404s)
  4. service entrypoint mismatch after non-sudo npm install

If you paste the --ws-log output around a single failing webchat send here, I can tell you immediately whether it’s “duplicate chat.send runs” vs “single run looping,” and what exactly to title the main issue.

thorn canopy

--ws-log full results: single chat.send, retry storm is inside the run path

Only ONE chat.send was issued per UI send — so it's NOT a duplicate/double-submit bug:

← req chat.send conn=2ea161a6 id=7a54e1d0
→ res ✓ chat.send 13ms runId=9a8b0f56
embedded run agent start: runId=9a8b0f56
embedded run agent end: runId=9a8b0f56 isError=true error=⚠️ API rate limit reached
→ event agent phase=error
→ event chat (empty message pushed to UI)

The "ghost empty message" in the dashboard is OpenClaw pushing the error event as a chat event to the UI.

Context at time of failure was clean:

messages=28 historyImageBlocks=0 historyTextChars=5111

So it's not context bloat or image blocks either.

Conclusion: the retry storm is happening inside the embedded agent run path on webchat, not from duplicate UI submissions. The webchat messageChannel path appears to trigger significantly more internal retry iterations than the CLI path for the same single 429.

CLI comparison: 1x 429 → 1x retry → isError=false (works)
Webchat: 1x 429 → runaway retries → quota exhaustion

Hope this helps narrow it down to the right branch in run.ts.

rugged cosmos BOT

Nice — that --ws-log full capture is the key datapoint: one chat.send → one runId, so we can stop chasing double-submit / reconnect duplication.

Given you’re still seeing Google-side “storm” behavior, the most likely remaining explanation is:

  • the “storm” is happening inside a single embedded attempt, i.e. inside activeSession.prompt() / streamSimple (pi-ai) / Google adapter (per-request retry / stream retry / internal replay), rather than OpenClaw’s outer runLoopIterations (which would show up as multiple runEmbeddedAttempt cycles).

Best next probe: enable Cache Trace to count actual model calls per runId

OpenClaw has a built-in “cache trace” that logs a stream:context event every time the model streamFn is invoked. That’s the cleanest way to prove whether webchat is doing N calls vs CLI doing 1–2 calls.

Run gateway in foreground with cache trace enabled but without logging prompts/messages (privacy + smaller logs):

openclaw gateway stop

OPENCLAW_CACHE_TRACE=1 \
OPENCLAW_CACHE_TRACE_MESSAGES=0 \
OPENCLAW_CACHE_TRACE_PROMPT=0 \
OPENCLAW_CACHE_TRACE_SYSTEM=0 \
OPENCLAW_LOG_LEVEL=debug \
openclaw gateway --verbose --ws-log compact

Repro one webchat send, grab the runId, then count stream calls:

rg "\"runId\":\"<RUNID>\"" ~/.openclaw/logs/cache-trace.jsonl \
  | rg "\"stage\":\"stream:context\"" \
  | wc -l

Do the same for a CLI send (get that runId, count again).
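If you want the counts for every run at once instead of grepping per runId, a Python sketch along these lines should do it. It relies only on the runId and stage fields used in the rg patterns above; everything else about the trace format is unknown here, and stream_calls_per_run is a hypothetical helper.

```python
import json
from collections import Counter
from typing import Iterable


def stream_calls_per_run(lines: Iterable[str]) -> Counter:
    """Count stream:context events per runId in cache-trace JSONL lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or non-JSON lines
        if not isinstance(event, dict):
            continue
        if event.get("stage") == "stream:context" and "runId" in event:
            counts[event["runId"]] += 1
    return counts
```

Open ~/.openclaw/logs/cache-trace.jsonl, pass the file object to stream_calls_per_run, and compare the webchat runId against the CLI one.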


What this tells us:

  • If webchat runId has, say, 20/50/150 stream:context entries and CLI has 1–2 → the retry storm is definitely inside the per-request stream layer (pi-ai / google adapter), and the issue can be written very crisply.
  • If both have ~1–2 calls → then the “storm” is coming from somewhere else (multiple runs over time, another process, etc.), and we’ll pivot.

(Implementation reference: createCacheTrace() wraps the agent’s streamFn and records stream:context each call: src/agents/cache-trace.ts, wired in src/agents/pi-embedded-runner/run/attempt.ts.)

What to put in the main GitHub issue (webchat path)

You already have most of it. Add:

  • The --ws-log full snippet proving single chat.send with one runId
  • The cache-trace count: stream:context calls per runId for webchat vs CLI
  • One redacted cache-trace line for stream:context (it includes options), in case there’s a retry/transport flag showing up in webchat that isn’t present in CLI

That will let maintainers jump straight to the right layer (likely streamSimple/google streaming) without debating the outer retry loop in run.ts.

And yep: please file the other three items as separate issues — your proposed split is exactly right.

thorn canopy

Cache trace results — stream:context calls per request:

Channel  | stream:context calls
---------|--------------------
Webchat  | 2
CLI      | 1

No retry storm observed in this run. The webchat's 2 calls vs CLI's 1 is expected — the webchat run triggered tool use (agent read some files before responding), while the CLI run responded directly. Both counts look clean.

Conclusion on the retry storm: With a healthy quota the behaviour is normal. The storm appears to have been a combination of exhausted Google quota + aggressive retry behaviour that only spirals when Google keeps returning 429s in succession. Once the quota reset overnight, both channels recovered cleanly.

New bug found during testing — API key logged in plaintext:
The cache trace log at ~/.openclaw/logs/cache-trace.jsonl contains the Google API key in plaintext in the options.apiKey field of every stream:context entry. I've rotated the key and deleted the log file, but this should be flagged as a credential leak in the trace log format.

Happy to provide a redacted sample line from the cache-trace.jsonl if useful for the issue.

rugged cosmos BOT

Yep — that’s a real credential leak.

cache-trace.jsonl is written by the cache trace wrapper (diagnostics.cacheTrace / OPENCLAW_CACHE_TRACE) and it currently records the full options object for every stream:context event. For Google, that options payload includes apiKey, so you end up with the key in plaintext on disk.

Two important implications:

  • OpenClaw’s normal “redactSensitive” protections are console-oriented; file-style debug traces like cache-trace.jsonl are not being redacted here.
  • Anyone/anything that can read ~/.openclaw/logs/cache-trace.jsonl can extract the key.

What to do now (you already did most of it)

  • Rotate the Google key (good).
  • Keep cache trace disabled unless you’re actively debugging:
    • unset OPENCLAW_CACHE_TRACE / set diagnostics.cacheTrace.enabled: false
  • Treat ~/.openclaw/logs/cache-trace.jsonl as a secret file:
    • delete it when done, and make sure permissions are restrictive (e.g. chmod 600 …).
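For the issue itself, it may help to show what redaction before write could look like. This is a minimal Python sketch, not the cache trace implementation: the key list is a guess, redact is a hypothetical helper, and only options.apiKey is confirmed by this thread as a leaking field.

```python
from typing import Any

# Assumed list of sensitive key names (compared case-insensitively);
# only "apiKey" is confirmed as leaking in this thread.
SENSITIVE_KEYS = {"apikey", "api_key", "authorization", "token", "secret"}


def redact(value: Any) -> Any:
    """Return a copy of `value` with sensitive keys masked, recursing into containers."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value
```

Applied to a stream:context event before it is serialized, the trace would keep its diagnostic fields while the key never reaches disk.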

Please file a GitHub issue for the key leak (high priority)