Dual models getting called when session model switched from default | Friends of the Crustacean 🦞🤝 | Page 1

errant goblet Mar 11, 2026, 4:21 PM

#

When I switch from the default model to an alternate, I get API calls to both the selected model and the default fallback model at the same time.

I use a VPS, and OpenClaw is hosted inside a Docker container.

azure gorgeBOT Mar 11, 2026, 4:25 PM

#

OpenClaw can legitimately hit two different models in one “turn”, but it’s usually not “fallback running in parallel”.

There are two common causes:

1) It’s not failover — it’s pre-processing (links/media/compaction) using its own model list

Even if you /model switch the session model, OpenClaw may also run:

- link understanding (URL expansion/summarization) → tools.links.*
- media understanding (image/audio/video) → tools.media.*
- PDF/image fallback → agents.defaults.imageModel / agents.defaults.pdfModel
- compaction summarization (near context limits) → agents.defaults.compaction.model (optional override)

Those can call a model from config that often matches your “default model”, while the main reply uses your newly selected session model — so you see two providers/models billed.

Quick isolation test: send a plain text message with no links and no attachments.

If it becomes single-model: it’s tools.links / tools.media.
If it’s still double-model: look at #2.

#

Config knobs to check:

tools.links.enabled
tools.links.models
tools.media.models and tools.media.image.models (etc.)

Docs (relevant): https://docs.openclaw.ai/help/faq (Models + Failover section), https://docs.openclaw.ai/concepts/models

2) You actually have two gateways consuming the same Discord events

On a VPS + Docker this happens surprisingly often (old container still running, duplicate deployment, etc.). In that case both instances will call their own default/selected models.

Quick check:

- Do you ever see double replies in Discord (even briefly)?
- Does docker ps show more than one openclaw container?
- Do gateway logs show the same Discord message ID handled twice?
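
The container check can be reduced to a single number. This is a minimal sketch, not an official OpenClaw command: it assumes the gateway container's name contains the word "gateway" (typical for compose-generated names, but your naming may differ), and the function name is made up for illustration.

```shell
# Count running containers whose name contains "gateway".
# Reads one container name per line on stdin, for example from:
#   docker ps --format '{{.Names}}'
# The "gateway" filter is an assumption about container naming.
count_gateway_containers() {
  # grep -c prints 0 and exits non-zero when nothing matches,
  # so "|| true" keeps the function's exit status clean.
  grep -c 'gateway' || true
}
```

Usage: `docker ps --format '{{.Names}}' | count_gateway_containers`. Anything above 1 suggests a second gateway may be consuming the same events.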

#

To diagnose fast, paste these outputs (from inside the container)

openclaw --version
openclaw models status
The relevant config snippets (just these keys):

agents.defaults.model
tools.links
tools.media

With that, we can tell which path is triggering the “second” model call and what the cleanest fix is (disable link/media preprocessing, or align those model lists, or kill the duplicate gateway).

errant goblet Mar 11, 2026, 4:27 PM

#

Slow down: step by step, one step per message. Make all commands one-click copyable. How do I call docker commands? I am on 2026.3.8.

azure gorgeBOT Mar 11, 2026, 4:28 PM

#

Step 1: Check if you accidentally have 2 OpenClaw containers running (this is the #1 cause of “two models get billed for one message”).

Run this on the VPS host (where Docker is installed — usually via SSH):

sudo docker ps --format 'table {{.ID}}\t{{.Names}}\t{{.Image}}\t{{.Status}}'

If sudo isn’t needed on your box, use:

docker ps --format 'table {{.ID}}\t{{.Names}}\t{{.Image}}\t{{.Status}}'

Paste the output here (especially any rows that look like openclaw, clawdbot, or your compose project name).

errant goblet Mar 11, 2026, 4:30 PM

#

linuxuser@sera-gateway:~$ docker ps --format 'table {{.ID}}\t{{.Names}}\t{{.Image}}\t{{.Status}}'
CONTAINER ID   NAMES                         IMAGE               STATUS
1d946123c24f   openclaw-openclaw-gateway-1   openclaw:local-uv   Up 16 minutes (healthy)
8042bdd5e41c   openclaw-canvas-dashboard     nginx:alpine        Up 7 days

azure gorgeBOT Mar 11, 2026, 4:31 PM

#

Step 2: Let’s see your active model + fallback chain (from inside the gateway container).

Run:

docker exec -it openclaw-openclaw-gateway-1 openclaw models status

Paste the output here.

If that command errors (no TTY), run this instead:

docker exec openclaw-openclaw-gateway-1 openclaw models status

errant goblet Mar 11, 2026, 4:33 PM

#

linuxuser@sera-gateway:~$ docker exec -it openclaw-openclaw-gateway-1 openclaw models status

🦞 OpenClaw 2026.3.8 (unknown) — Open source means you can see exactly how I judge your config.

Config : ~/.openclaw/openclaw.json
Agent dir : ~/.openclaw/agents/main/agent
Default : google/gemini-2.5-pro (from dispatch)
Fallbacks (0) : -
Image model : google/gemini-3.1-flash-image-preview
Image fallbacks (1): openrouter/openai/gpt-5-image-mini
Aliases (9) : research -> openrouter/anthropic/claude-sonnet-4.6, logic -> openrouter/anthropic/claude-opus-4.6, content -> openrouter/x-ai/grok-4.1-fast, gemini-flash -> google/gemini-3-flash-preview, learn -> google/gemini-3.1-pro-preview, gemini-flash-lite -> google/gemini-3.1-flash-lite-preview, img -> google/gemini-3.1-flash-image-preview, code -> openrouter/openai/gpt-5.3-codex, dispatch -> google/gemini-2.5-pro
Configured models (13): google/gemini-2.5-flash-lite, openrouter/anthropic/claude-sonnet-4.6, openrouter/anthropic/claude-opus-4.6, openrouter/x-ai/grok-4.1-fast, openrouter/openai/gpt-5-codex, google/gemini-3-flash-preview, google/gemini-3.1-pro-preview, openrouter/google/gemini-3.1-flash-image-preview, openrouter/openai/gpt-5-image-mini, google/gemini-3.1-flash-lite-preview, google/gemini-3.1-flash-image-preview, openrouter/openai/gpt-5.3-codex, google/gemini-2.5-pro

Auth overview
Auth store : ~/.openclaw/agents/main/agent/auth-profiles.json
Shell env : off
Providers w/ OAuth/tokens (0): -

google effective=profiles:~/.openclaw/agents/main/agent/auth-profiles.json | profiles=1 (oauth=0, token=0, api_key=1) | google:default=REDACTED | env=REDACTED | source=env: GEMINI_API_KEY
openrouter effective=profiles:~/.openclaw/agents/main/agent/auth-profiles.json | profiles=1 (oauth=0, token=0, api_key=1) | openrouter:default=REDACTED | env=REDACTED | source=env: OPENROUTER_API_KEY

Missing auth

anthropic Run claude setup-token, then openclaw models auth setup-token or openclaw configure.

OAuth/token status

none

azure gorgeBOT Mar 11, 2026, 4:33 PM

#

Step 3: Check if “link understanding” is enabled (it can trigger an extra model call alongside your session model).

Run:

docker exec -it openclaw-openclaw-gateway-1 openclaw config get tools.links.enabled

Paste the single line it prints.

(If -it errors, use:)

docker exec openclaw-openclaw-gateway-1 openclaw config get tools.links.enabled

errant goblet Mar 11, 2026, 4:34 PM

#

Config path not found: tools.links.enabled

azure gorgeBOT Mar 11, 2026, 4:37 PM

#

No worries — that error just means tools.links.enabled isn’t explicitly set in your openclaw.json (so we can’t read it via config get).

Step 4: Watch the gateway logs live while you trigger the “two model calls” once.

Run this:

docker logs -f --timestamps openclaw-openclaw-gateway-1

Then (while it’s running) trigger the problem once (switch model + send 1 message), and copy/paste the log block that shows the two model/provider requests.
Stop the log stream with Ctrl+C after you’ve captured it.

errant goblet Mar 11, 2026, 4:39 PM

#

Can't we pull logs that cover the last 2 hours to see what's been happening?

azure gorgeBOT Mar 11, 2026, 4:39 PM

#

Yes — Docker can.

Step 5: Dump the last 2 hours of gateway logs (and also show the last 200 lines).

docker logs --since 2h --timestamps openclaw-openclaw-gateway-1 | tee /tmp/openclaw-gateway-last2h.log | tail -n 200

After you run it, tell me:

do you see two different provider/model entries for the same Discord message/turn in that output?
and if yes, paste the few lines around one of those “double call” moments.
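
One caveat with saving logs through a pipe: docker logs can replay the container's stderr on its own stderr, and a plain `|` only carries stdout. A minimal sketch of a wrapper that merges both streams and warns if the saved file comes out empty (the function name is hypothetical, not part of Docker or OpenClaw):

```shell
# Run a command, merge its stderr into stdout, save everything to a file,
# show the last 200 lines, and warn if the file ended up empty.
# Usage: capture_to_file <outfile> <command> [args...]
capture_to_file() {
  outfile=$1
  shift
  # 2>&1 makes sure lines written to stderr also reach tee.
  "$@" 2>&1 | tee "$outfile" | tail -n 200
  # -s is true only when the file exists and is non-empty.
  if [ ! -s "$outfile" ]; then
    echo "warning: $outfile is empty; nothing to grep" >&2
    return 1
  fi
}
```

Usage: `capture_to_file /tmp/openclaw-gateway-last2h.log docker logs --since 2h --timestamps openclaw-openclaw-gateway-1`.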

errant goblet Mar 11, 2026, 4:50 PM

#

Seeing many lines that repeat this:

2026-03-11T16:12:55.720771045Z 2026-03-11T16:12:55.720+00:00 [agent/embedded] embedded run agent end: runId=d87a98e9-6040-4a1a-989d-9cabd8405d9f isError=true error=⚠️ API rate limit reached. Please try again later.
2026-03-11T16:12:57.959680235Z 2026-03-11T16:12:57.959+00:00 [agent/embedded] embedded run agent end: runId=d87a98e9-6040-4a1a-989d-9cabd8405d9f isError=true error=⚠️ API rate limit reached. Please try again later.
2026-03-11T16:13:02.380756581Z 2026-03-11T16:13:02.380+00:00 [agent/embedded] embedded run agent end: runId=d87a98e9-6040-4a1a-989d-9cabd8405d9f isError=true error=⚠️ API rate limit reached. Please try again later.
2026-03-11T16:13:10.840517281Z 2026-03-11T16:13:10.839+00:00 [agent/embedded] embedded run agent end: runId=d87a98e9-6040-4a1a-989d-9cabd8405d9f isError=true error=⚠️ API rate limit reached. Please try again later.
2026-03-11T16:13:10.850313327Z 2026-03-11T16:13:10.849+00:00 [diagnostic] lane task error: lane=main durationMs=15781 error="FailoverError: ⚠️ API rate limit reached. Please try again later."
2026-03-11T16:13:10.852133175Z 2026-03-11T16:13:10.851+00:00 [diagnostic] lane task error: lane=session:agent:main:main durationMs=15787 error="FailoverError: ⚠️ API rate limit reached. Please try again later."

#

I've been monitoring my Google API dashboard and I haven't been hitting any rate limits, so this is odd.

azure gorgeBOT Mar 11, 2026, 4:51 PM

#

Those lines are retries for the same run (runId=...) failing with HTTP 429 / rate limit. That can absolutely look like “it’s calling multiple models”, because the runtime keeps trying to complete the turn (and may attempt other candidates if configured).
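
To check the "retries" reading against the timestamps, here is a small sketch that prints the gap in seconds between consecutive log lines. It reads only the first (Docker) timestamp on each line and assumes all lines fall on the same day; the function name is made up for illustration.

```shell
# Print the gap in seconds between consecutive log lines.
# Expects `docker logs --timestamps` output on stdin, where the first
# field looks like 2026-03-11T16:12:55.720771045Z.
# Assumption: no line crosses midnight (the date part is ignored).
retry_gaps() {
  awk '{
    split($1, dt, "T")          # dt[2] = "16:12:55.720771045Z"
    split(dt[2], hms, ":")      # hours, minutes, seconds
    sub(/Z$/, "", hms[3])       # drop the trailing Z from the seconds
    t = hms[1] * 3600 + hms[2] * 60 + hms[3]
    if (NR > 1) printf "%.1f\n", t - prev
    prev = t
  }'
}
```

Fed the four "embedded run agent end" lines for runId=d87a98e9... above, it prints 2.2, 4.4 and 8.5. Each gap is roughly double the previous one, which is consistent with one run being retried with backoff rather than several models being called in parallel. The logs don't state the retry policy, so treat this as a hint only.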

Step 6: let’s pull context around that exact runId so we can see which provider/model it was using right before the 429.

Run:

grep -n -C 20 'runId=d87a98e9-6040-4a1a-989d-9cabd8405d9f' /tmp/openclaw-gateway-last2h.log | head -n 200

Paste the output.

errant goblet Mar 11, 2026, 4:52 PM

#

linuxuser@sera-gateway:~$ grep -n -C 20 'runId=d87a98e9-6040-4a1a-989d-9cabd8405d9f' /tmp/openclaw-gateway-last2h.log | head -n 200
linuxuser@sera-gateway:~$

Returned nothing

azure gorgeBOT Mar 11, 2026, 4:53 PM

#

That “⚠️ API rate limit reached” string is generic — it can be thrown for any provider that returns a 429 / throttling signal (OpenRouter included). So if you’re only watching the Google dashboard, it can look “impossible” even though the limit is actually coming from OpenRouter (or another upstream).

Step 7: let’s confirm which provider is rate-limiting by pulling log context around the FailoverError:

grep -n -C 50 'FailoverError: ⚠️ API rate limit reached' /tmp/openclaw-gateway-last2h.log | head -n 200

Paste that output — it usually includes the provider/model a few lines above.

#

Got it — that just means that specific runId isn’t in the file (either the runId differs, or the file doesn’t contain that section).

Step 8: let’s grab the actual runId(s) that do exist in /tmp/openclaw-gateway-last2h.log by searching the error text:

grep -n 'API rate limit reached' /tmp/openclaw-gateway-last2h.log | head -n 20

Paste the output (it should include the current runId=...).

errant goblet Mar 11, 2026, 4:55 PM

#

These are returning nothing:

linuxuser@sera-gateway:~$ grep -n -C 50 'FailoverError: ⚠️ API rate limit reached' /tmp/openclaw-gateway-last2h.log | head -n 200
linuxuser@sera-gateway:~$ grep -n 'API rate limit reached' /tmp/openclaw-gateway-last2h.log | head -n 20
linuxuser@sera-gateway:~$

#

Are we neglecting the docker component to the commands?

azure gorgeBOT Mar 11, 2026, 4:56 PM

#

That strongly suggests the file either doesn’t exist or is empty / not what we think it is.

Step 9: confirm the file exists + has content (and show its last 20 lines):

ls -lh /tmp/openclaw-gateway-last2h.log && wc -l /tmp/openclaw-gateway-last2h.log && tail -n 20 /tmp/openclaw-gateway-last2h.log

Paste the output.

#

Yep — good catch. The grep ... /tmp/openclaw-gateway-last2h.log commands only work if you first wrote Docker logs into that file.

Let’s do it in one command (no temp file needed), directly from the container logs:

Step 10:

docker logs --since 2h --timestamps openclaw-openclaw-gateway-1 | grep -E 'FailoverError|API rate limit reached|embedded run agent end|429|RESOURCE_EXHAUSTED' | tail -n 200

Paste that output. This will confirm whether the “rate limit” is real, and usually gives us enough surrounding breadcrumbs to tell which provider/model is involved.
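
If the filtered output is long, it can help to collapse it to one line per run. A minimal sketch, assuming run IDs appear as runId=<uuid> the way they do in the [agent/embedded] lines earlier in this thread (the function name is hypothetical):

```shell
# Count how many log lines mention each run ID.
# Reads log lines on stdin and prints "runId=<id> <count>", sorted by ID.
count_attempts_per_run() {
  grep -o 'runId=[0-9a-f-]*' |   # keep only the runId=<uuid> token
    sort |                       # group identical IDs together
    uniq -c |                    # prefix each ID with its count
    awk '{print $2, $1}'         # normalise to "<id> <count>"
}
```

Usage: `docker logs --since 2h --timestamps openclaw-openclaw-gateway-1 2>&1 | count_attempts_per_run`. Several lines for the same run ID point to repeated attempts of one run, not to separate model calls.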

errant goblet Mar 11, 2026, 5:04 PM

#

2026-03-11T16:13:35.245716120Z 2026-03-11T16:13:35.245+00:00 [delivery-recovery] Delivery d86e5df2-2d60-4c20-8102-1e437fa9043c exceeded max retries (5/5) — moving to failed/
2026-03-11T16:13:36.222836676Z 2026-03-11T16:13:36.222+00:00 [ws] Proxy headers detected from untrusted address. Connection will not be treated as local. Configure gateway.trustedProxies to restore local client detection behind your proxy.
2026-03-11T16:13:42.212945848Z 2026-03-11T16:13:42.212+00:00 [agent/embedded] embedded run agent end: runId=82f80909-114e-4fdc-af8c-31f02a435f77 isError=true error=⚠️ API rate limit reached. Please try again later.
2026-03-11T16:13:44.471064407Z 2026-03-11T16:13:44.470+00:00 [agent/embedded] embedded run agent end: runId=82f80909-114e-4fdc-af8c-31f02a435f77 isError=true error=⚠️ API rate limit reached. Please try again later.
2026-03-11T16:13:48.772925634Z 2026-03-11T16:13:48.772+00:00 [agent/embedded] embedded run agent end: runId=82f80909-114e-4fdc-af8c-31f02a435f77 isError=true error=⚠️ API rate limit reached. Please try again later.
2026-03-11T16:13:57.203154913Z 2026-03-11T16:13:57.202+00:00 [agent/embedded] embedded run agent end: runId=82f80909-114e-4fdc-af8c-31f02a435f77 isError=true error=⚠️ API rate limit reached. Please try again later.
2026-03-11T16:13:57.230913913Z 2026-03-11T16:13:57.230+00:00 [diagnostic] lane task error: lane=main durationMs=15662 error="FailoverError: ⚠️ API rate limit reached. Please try again later."
2026-03-11T16:13:57.233364775Z 2026-03-11T16:13:57.232+00:00 [diagnostic] lane task error: lane=session:agent:main:main durationMs=15669 error="FailoverError: ⚠️ API rate limit reached. Please try again later."
2026-03-11T16:17:30.839491092Z 2026-03-11T16:17:30.836+00:00 [tools] exec failed: elevated is not available right now (runtime=direct).
2026-03-11T16:17:30.839520398Z Failing gates: enabled (tools.elevated.enabled / agents.list[].tools.elevated.enabled)
2026-03-11T16:17:30.839525548Z Context: provider=discord session=agent:main:main
2026-03-11T16:17:30.839529366Z Fix-it keys:
2026-03-11T16:17:30.839533353Z - tools.elevated.enabled
2026-03-11T16:17:30.839537120Z - tools.elevated.allowFrom.<provider>
2026-03-11T16:17:30.839541168Z - agents.list[].tools.elevated.enabled
2026-03-11T16:17:30.839544825Z - agents.list[].tools.elevated.allowFrom.<provider>
2026-03-11T16:17:42.579429714Z 2026-03-11T16:17:42.578+00:00 [ws] ⇄ res ✓ browser.request 137ms conn=6a14e60a…ad30 id=d0822859…9d77

I should also point out that the dual model calls don't happen when my default model is selected. They only happen when an alternate model has been set in the session.

azure gorgeBOT Mar 11, 2026, 5:06 PM

#

That last point is actually a big clue:

When you switch to an alternate session model, OpenClaw may need to run extra “embedded” work (most commonly image/media understanding) using your configured Image model.
Your models status shows you do have an Image model configured: google/gemini-3.1-flash-image-preview.
So if your alternate model is text-only (or effectively treated as non‑multimodal), you’ll see two calls:
1. Google (image/media understanding)
2. Your selected alternate model (main response)

That matches: “no dual calls on default model” (because your default Gemini model can handle the inputs directly, so it doesn’t need the extra image pass).

Step 11: let’s pull a tight log window around the rate-limit event to see what triggered the embedded run:

docker logs --since 2026-03-11T16:12:30Z --until 2026-03-11T16:14:30Z --timestamps openclaw-openclaw-gateway-1

Paste that output (it should include what happened before runId=82f8... started).

errant goblet Mar 11, 2026, 5:13 PM

#

It's too long to share in 1 post. Here is part 1/2

linuxuser@sera-gateway:~$ docker logs --since 2026-03-11T16:12:30Z --until 2026-03-11T16:14:30Z --timestamps openclaw-openclaw-gateway-1
2026-03-11T16:12:55.720771045Z 2026-03-11T16:12:55.720+00:00 [agent/embedded] embedded run agent end: runId=d87a98e9-6040-4a1a-989d-9cabd8405d9f isError=true error=⚠️ API rate limit reached. Please try again later.
2026-03-11T16:12:57.959680235Z 2026-03-11T16:12:57.959+00:00 [agent/embedded] embedded run agent end: runId=d87a98e9-6040-4a1a-989d-9cabd8405d9f isError=true error=⚠️ API rate limit reached. Please try again later.
2026-03-11T16:13:02.380756581Z 2026-03-11T16:13:02.380+00:00 [agent/embedded] embedded run agent end: runId=d87a98e9-6040-4a1a-989d-9cabd8405d9f isError=true error=⚠️ API rate limit reached. Please try again later.
2026-03-11T16:13:10.840517281Z 2026-03-11T16:13:10.839+00:00 [agent/embedded] embedded run agent end: runId=d87a98e9-6040-4a1a-989d-9cabd8405d9f isError=true error=⚠️ API rate limit reached. Please try again later.
2026-03-11T16:13:10.850313327Z 2026-03-11T16:13:10.849+00:00 [diagnostic] lane task error: lane=main durationMs=15781 error="FailoverError: ⚠️ API rate limit reached. Please try again later."
2026-03-11T16:13:10.852133175Z 2026-03-11T16:13:10.851+00:00 [diagnostic] lane task error: lane=session:agent:main:main durationMs=15787 error="FailoverError: ⚠️ API rate limit reached. Please try again later."
2026-03-11T16:13:28.443464851Z 2026-03-11T16:13:28.442+00:00 [gateway] signal SIGTERM received
2026-03-11T16:13:28.445621227Z 2026-03-11T16:13:28.445+00:00 [gateway] received SIGTERM; shutting down
2026-03-11T16:13:28.479111451Z 2026-03-11T16:13:28.478+00:00 [gmail-watcher] gmail watcher stopped
2026-03-11T16:13:32.033035168Z (node:7) [DEP0040] DeprecationWarning: The punycode module is deprecated. Please use a userland alternative instead.
2026-03-11T16:13:32.033061127Z (Use node --trace-deprecation ... to show where the warning was created)
2026-03-11T16:13:32.058089586Z │
2026-03-11T16:13:32.058111528Z ◇ Doctor warnings ────────────────────────────────────────────────────────╮
2026-03-11T16:13:32.058116518Z │ │
2026-03-11T16:13:32.058120565Z │ - channels.telegram.groupPolicy is "allowlist" but groupAllowFrom (and │
2026-03-11T16:13:32.058159811Z │ allowFrom) is empty — all group messages will be silently dropped. │
2026-03-11T16:13:32.058166424Z │ Add sender IDs to channels.telegram.groupAllowFrom or │
2026-03-11T16:13:32.058184729Z │ channels.telegram.allowFrom, or set groupPolicy to "open". │
2026-03-11T16:13:32.058187595Z │ │
2026-03-11T16:13:32.058190019Z ├──────────────────────────────────────────────────────────────────────────╯

#

Part 3/3

2026-03-11T16:13:36.768617625Z 2026-03-11T16:13:36.768+00:00 [telegram] autoSelectFamily=true (default-node22)
2026-03-11T16:13:36.771241491Z 2026-03-11T16:13:36.770+00:00 [telegram] global undici dispatcher autoSelectFamily=true
2026-03-11T16:13:36.773455317Z 2026-03-11T16:13:36.773+00:00 [telegram] dnsResultOrder=ipv4first (default-node22)
2026-03-11T16:13:36.935546357Z 2026-03-11T16:13:36.935+00:00 [gateway] qmd memory startup initialization armed for agent "main"
2026-03-11T16:13:37.818387091Z 2026-03-11T16:13:37.817+00:00 [discord] logged in to discord as 1466886376434831532 (Sera Nova Link)
2026-03-11T16:13:38.065478713Z 2026-03-11T16:13:38.065+00:00 [gateway] qmd memory startup initialization armed for agent "jax"
2026-03-11T16:13:38.914811099Z 2026-03-11T16:13:38.914+00:00 [gateway] qmd memory startup initialization armed for agent "augustus"
2026-03-11T16:13:39.598177397Z 2026-03-11T16:13:39.597+00:00 [gateway] qmd memory startup initialization armed for agent "quincy"
2026-03-11T16:13:40.327096212Z 2026-03-11T16:13:40.326+00:00 [gateway] qmd memory startup initialization armed for agent "silas"
2026-03-11T16:13:41.053076656Z 2026-03-11T16:13:41.049+00:00 [gateway] qmd memory startup initialization armed for agent "sloan"
2026-03-11T16:13:42.212945848Z 2026-03-11T16:13:42.212+00:00 [agent/embedded] embedded run agent end: runId=82f80909-114e-4fdc-af8c-31f02a435f77 isError=true error=⚠️ API rate limit reached. Please try again later.
2026-03-11T16:13:44.471064407Z 2026-03-11T16:13:44.470+00:00 [agent/embedded] embedded run agent end: runId=82f80909-114e-4fdc-af8c-31f02a435f77 isError=true error=⚠️ API rate limit reached. Please try again later.
2026-03-11T16:13:48.772925634Z 2026-03-11T16:13:48.772+00:00 [agent/embedded] embedded run agent end: runId=82f80909-114e-4fdc-af8c-31f02a435f77 isError=true error=⚠️ API rate limit reached. Please try again later.
2026-03-11T16:13:57.203154913Z 2026-03-11T16:13:57.202+00:00 [agent/embedded] embedded run agent end: runId=82f80909-114e-4fdc-af8c-31f02a435f77 isError=true error=⚠️ API rate limit reached. Please try again later.
2026-03-11T16:13:57.230913913Z 2026-03-11T16:13:57.230+00:00 [diagnostic] lane task error: lane=main durationMs=15662 error="FailoverError: ⚠️ API rate limit reached. Please try again later."
2026-03-11T16:13:57.233364775Z 2026-03-11T16:13:57.232+00:00 [diagnostic] lane task error: lane=session:agent:main:main durationMs=15669 error="FailoverError: ⚠️ API rate limit reached. Please try again later."

#

Part 2/3 didn't send, probably because it had too many repetitive lines.

azure gorgeBOT Mar 11, 2026, 5:15 PM

#

That log actually lines up really well with your observation:

models status shows two different “roles” of models in your setup:
- Default chat model: google/gemini-2.5-pro
- Image model: google/gemini-3.1-flash-image-preview (+ an OpenRouter image fallback)
The repeating lines are from [agent/embedded], which is almost always “helper work” (most commonly image/media understanding for screenshots/attachments/tool outputs).

So what’s likely happening is:

When you keep the default Gemini model, it can handle multimodal context directly → one provider/model call.
When you switch the session to an alternate model that’s effectively text-only for the current task (very common with “codex” / some OpenRouter models), OpenClaw needs to “convert images to text” first → it runs an embedded call using your configured Image model (Google) plus your selected session model → looks like “dual model calls”.

That means the “extra Google calls” are probably not “fallbacks” — they’re the image/media helper model kicking in.

One quick question to confirm: which exact alternate model are you switching to (e.g. your code -> openrouter/openai/gpt-5.3-codex, or one of the Claude aliases)?

#

If it is Codex: that would perfectly explain why dual calls only happen when you switch away from Gemini (because Codex is text-first and browser/screenshots trigger the image helper).

errant goblet Mar 11, 2026, 5:16 PM

#

But the model in use when the dual calls happen is Gemini 3.1 Pro Preview, which is multimodal-capable. So this theory doesn't stack up.

azure gorgeBOT Mar 11, 2026, 5:18 PM

#

Your pasted Docker logs are the human-formatted lines, and they unfortunately don’t print the provider/model for embedded run agent end — but the logger does record provider + model in the structured JSON fields.

So the fastest way to answer “which model is actually rate-limiting / getting called” is to pull the JSON logs and grep by that runId.

Run this (1 command):

docker exec openclaw-openclaw-gateway-1 openclaw logs --json --limit 3000 | grep -F 'runId":"82f80909-114e-4fdc-af8c-31f02a435f77' | head -n 20

Paste what it prints — it should include "provider":"..." and "model":"..." on the same line, which will tell us whether this is Google, OpenRouter, or something else (and whether you’re really seeing two different models per turn).

errant goblet Mar 11, 2026, 5:19 PM

#

Returned no output

azure gorgeBOT Mar 11, 2026, 5:20 PM

#

All good — don’t try to paste the whole 2‑hour window. We just need a tiny, targeted snippet that includes provider + model, and JSON logs will give us that.

Run this (it should output at most 20 lines, so it won’t hit Discord limits):

docker exec openclaw-openclaw-gateway-1 openclaw logs --json --limit 3000 | grep -F 'runId":"82f80909-114e-4fdc-af8c-31f02a435f77' | head -n 20

Paste whatever it prints. If it prints nothing, tell me “no matches” and we’ll grep for the error string instead.

errant goblet Mar 11, 2026, 5:20 PM

#

No, the problem wasn't pasting into Discord. That command returned no output (it ran silently).

azure gorgeBOT Mar 11, 2026, 5:22 PM

#

Multimodal capability isn’t the deciding factor here.

Even if your session model is Gemini 3.1 Pro Preview, OpenClaw can still make an additional LLM call for pre-processing that does not necessarily follow the per-session /model override — most commonly:

link understanding (tools.links.*) when a message contains URLs
media understanding (tools.media.*) for attachments / tool-produced images
compaction / memory-flush (less common, but also model-driven)

That would exactly produce: “selected model gets called + the default model gets called” only when you switch away from default.

Next step (just to confirm): show your tools.links config.

docker exec -it openclaw-openclaw-gateway-1 openclaw config get tools.links

Paste the output (if it says “path not found”, tell me that).

#

Fair pushback — if your session model is google/gemini-3.1-pro-preview, then “it needed a separate image helper because the model can’t see images” is not the best explanation.

In that case, the usual reasons you still see Gemini 2.5 Pro calls alongside it are:

background work / other agents in the same gateway (you clearly have multiple agents: jax, augustus, quincy, etc.)
pre-processing that uses its own model selection (not the session override)

Let’s stop guessing and prove which models are being used by pulling the structured JSON log lines that include "model": ....

Run this:

docker exec openclaw-openclaw-gateway-1 openclaw logs --json --limit 5000 | grep -E '"model":"google/gemini-2.5-pro"|"model":"google/gemini-3.1-pro-preview"' | tail -n 50

Paste the output. That will show whether both models are actually being invoked by the gateway around the same time (and often includes enough metadata to tell if it’s the same run/session or background activity).

#

Totally fair — if both models are multimodal, the “image helper model” explanation isn’t sufficient.

Right now we’re stuck because we don’t yet know what the JSON log lines actually look like in your setup (so our grep "model":... patterns can miss even if the data is there).

Next step: show just the raw JSON log format (very small output).

docker exec openclaw-openclaw-gateway-1 openclaw logs --json --limit 5

Paste those 5 lines (you can redact message text if any appears). Once I see the shape (type=log wrapper vs direct entries, field names, etc.), I can give you the exact one-liner to extract both model names and prove whether it’s (a) background jobs/other agents or (b) per-message preprocessing.
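
Until the field names are known, a format-agnostic search avoids guessing them: match on the bare run ID and nothing else. A minimal sketch (the function name is made up; it is just a thin wrapper around grep):

```shell
# Print log lines that mention a run ID in any shape:
# runId=<id>, "runId":"<id>", or escaped inside a larger string.
# Usage: <log stream> | lines_for_run <run-id>
lines_for_run() {
  # -F treats the ID as a fixed string, so no regex escaping is needed;
  # -- stops option parsing in case the ID starts with a dash.
  grep -F -- "$1"
}
```

Usage: `docker exec openclaw-openclaw-gateway-1 openclaw logs --json --limit 3000 | lines_for_run 82f80909-114e-4fdc-af8c-31f02a435f77`. If this prints nothing either, the run is probably outside the window the command returns, not hidden behind a different field name.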

errant goblet Mar 11, 2026, 5:29 PM

#

Sent in 2 parts 1/2

linuxuser@sera-gateway:~$ docker exec openclaw-openclaw-gateway-1 openclaw logs --json --limit 5
{"type":"meta","file":"/tmp/openclaw/openclaw-2026-03-11.log","cursor":565582,"size":565582}
{"type":"log","time":"2026-03-11T17:03:35.502+00:00","level":"info","subsystem":"gateway/channels/discord","message":"{"subsystem":"gateway/channels/discord"} [default] Discord Message Content Intent is limited; bots under 100 servers can use it without verification.","raw":"{"0":"{\"subsystem\":\"gateway/channels/discord\"}","1":"[default] Discord Message Content Intent is limited; bots under 100 servers can use it without verification.","_meta":{"runtime":"node","runtimeVersion":"22.22.1","hostname":"1d946123c24f","name":"{\"subsystem\":\"gateway/channels/discord\"}","parentNames":["openclaw"],"date":"2026-03-11T17:03:35.501Z","logLevelId":3,"logLevelName":"INFO","path":{"fullFilePath":"file:///app/dist/subsystem-4K-e3L3i.js:1020:14","fileName":"subsystem-4K-e3L3i.js","fileNameWithLine":"subsystem-4K-e3L3i.js:1020","fileColumn":"14","fileLine":"1020","filePath":"dist/subsystem-4K-e3L3i.js","filePathWithLine":"dist/subsystem-4K-e3L3i.js:1020","method":"logToFile"}},"time":"2026-03-11T17:03:35.502+00:00"}"}

#

{"type":"log","time":"2026-03-11T17:03:35.504+00:00","level":"info","subsystem":"gateway/channels/discord","message":"{"subsystem":"gateway/channels/discord"} [default] starting provider (@REDACTED)","raw":"{"0":"{\"subsystem\":\"gateway/channels/discord\"}","1":"[default] starting provider (@REDACTED)","_meta":{"runtime":"node","runtimeVersion":"22.22.1","hostname":"1d946123c24f","name":"{\"subsystem\":\"gateway/channels/discord\"}","parentNames":["openclaw"],"date":"2026-03-11T17:03:35.503Z","logLevelId":3,"logLevelName":"INFO","path":{"fullFilePath":"file:///app/dist/subsystem-4K-e3L3i.js:1020:14","fileName":"subsystem-4K-e3L3i.js","fileNameWithLine":"subsystem-4K-e3L3i.js:1020","fileColumn":"14","fileLine":"1020","filePath":"dist/subsystem-4K-e3L3i.js","filePathWithLine":"dist/subsystem-4K-e3L3i.js:1020","method":"logToFile"}},"time":"2026-03-11T17:03:35.504+00:00"}"}
{"type":"log","time":"2026-03-11T17:03:35.602+00:00","level":"info","subsystem":"gateway/channels/discord","message":"{"subsystem":"gateway/channels/discord"} discord channels resolved: guild:1420899282327699561→1420899282327699561 (guild:REDACTED)","raw":"{"0":"{\"subsystem\":\"gateway/channels/discord\"}","1":"discord channels resolved: guild:1420899282327699561→1420899282327699561 (guild:REDACTED)","_meta":{"runtime":"node","runtimeVersion":"22.22.1","hostname":"1d946123c24f","name":"{\"subsystem\":\"gateway/channels/discord\"}","parentNames":["openclaw"],"date":"2026-03-11T17:03:35.602Z","logLevelId":3,"logLevelName":"INFO","path":{"fullFilePath":"file:///app/dist/subsystem-4K-e3L3i.js:1020:14","fileName":"subsystem-4K-e3L3i.js","fileNameWithLine":"subsystem-4K-e3L3i.js:1020","fileColumn":"14","fileLine":"1020","filePath":"dist/subsystem-4K-e3L3i.js","filePathWithLine":"dist/subsystem-4K-e3L3i.js:1020","method":"logToFile"}},"time":"2026-03-11T17:03:35.602+00:00"}"}
{"type":"log","time":"2026-03-11T17:03:35.604+00:00","level":"info","subsystem":"gateway/channels/discord","message":"{"subsystem":"gateway/channels/discord"} discord channel users resolved: 1137885937435693107→1137885937435693107","raw":"{"0":"{\"subsystem\":\"gateway/channels/discord\"}","1":"discord channel users resolved: 1137885937435693107→1137885937435693107","_meta":{"runtime":"node","runtimeVersion":"22.22.1","hostname":"1d946123c24f","name":"{\"subsystem\":\"gateway/channels/discord\"}","parentNames":["openclaw"],"date":"2026-03-11T17:03:35.604Z","logLevelId":3,"logLevelName":"INFO","path":{"fullFilePath":"file:///app/dist/subsystem-4K-e3L3i.js:1020:14","fileName":"subsystem-4K-e3L3i.js","fileNameWithLine":"subsystem-4K-e3L3i.js:1020","fileColumn":"14","fileLine":"1020","filePath":"dist/subsystem-4K-e3L3i.js","filePathWithLine":"dist/subsystem-4K-e3L3i.js:1020","method":"logToFile"}},"time":"2026-03-11T17:03:35.604+00:00"}"}
{"type":"log","time":"2026-03-11T17:03:36.608+00:00","level":"info","subsystem":"gateway/channels/discord","message":"{"subsystem":"gateway/channels/discord"} logged in to discord as 1466886376434831532 (REDACTED)","raw":"{"0":"{\"subsystem\":\"gateway/channels/discord\"}","1":"logged in to discord as 1466886376434831532 (REDACTED)","_meta":

#

{"runtime":"node","runtimeVersion":"22.22.1","hostname":"1d946123c24f","name":"{\"subsystem\":\"gateway/channels/discord\"}","parentNames":["openclaw"],"date":"2026-03-11T17:03:36.607Z","logLevelId":3,"logLevelName":"INFO","path":{"fullFilePath":"file:///app/dist/subsystem-4K-e3L3i.js:1020:14","fileName":"subsystem-4K-e3L3i.js","fileNameWithLine":"subsystem-4K-e3L3i.js:1020","fileColumn":"14","fileLine":"1020","filePath":"dist/subsystem-4K-e3L3i.js","filePathWithLine":"dist/subsystem-4K-e3L3i.js:1020","method":"logToFile"}},"time":"2026-03-11T17:03:36.608+00:00"}"}
{"type":"notice","message":"Log tail truncated (increase --max-bytes)."}

azure gorgeBOT Mar 11, 2026, 5:30 PM

#

Yep — now that we can see the JSON log shape, the reason my earlier greps were “silent” is that the model name often won’t appear as "model":"google/..." in these lines. It may show up as just gemini-2.5-pro / gemini-3.1-pro-preview inside the raw payload.

Next step: grep the JSON log stream for either model string (no provider prefix), then show the last matches:

docker exec openclaw-openclaw-gateway-1 openclaw logs --json --limit 8000 | grep -E 'gemini-2\.5-pro|gemini-3\.1-pro-preview' | tail -n 50

Paste the output. If that still returns nothing, we’ll instead grep for the lifecycle event string (embedded run agent end) and read the provider/model fields from the raw object.
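
A minimal sketch of the same grep, run against made-up lines rather than real gateway output. The sample content and the `count_models` and `sample_logs` names are assumptions for illustration only. It counts how often each bare model string appears, which makes it easy to see whether one or both models were hit:

```shell
# Count occurrences of each bare model name in a log stream read from stdin.
# Each dot is escaped once so it matches a literal "." only.
count_models() {
  grep -oE 'gemini-2\.5-pro|gemini-3\.1-pro-preview' | sort | uniq -c | awk '{print $2 "=" $1}'
}

# Hypothetical sample lines standing in for `openclaw logs --json` output.
sample_logs() {
  cat <<'EOF'
{"type":"log","message":"run start gemini-3.1-pro-preview"}
{"type":"log","message":"run start gemini-2.5-pro"}
{"type":"log","message":"run end gemini-3.1-pro-preview"}
EOF
}

sample_logs | count_models
```

To use it on the real stream, pipe the `docker exec openclaw-openclaw-gateway-1 openclaw logs --json --limit 8000` output into `count_models` instead of the sample.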

#

What you’re describing (“I only notice it when I switch away from the default model”) matches a very common pattern:

Heartbeats / background runs

OpenClaw runs heartbeats every 30 minutes by default, and those runs use the agent’s default model, not your per-session /model override.

So when:

default model = Gemini 2.5 → heartbeat + chat both hit 2.5 → looks like “one model”
you switch session to Gemini 3.1 → chat hits 3.1 but heartbeat still hits 2.5 → looks like “dual model calls”

This is not “fallback for the same message”, it’s two separate runs.

Next step (confirm via logs):

docker logs --since 2h --timestamps openclaw-openclaw-gateway-1 | grep -i heartbeat | tail -n 50

#

If you see heartbeat activity around the times you notice “extra” default-model calls, that’s the explanation.

If you want to stop those extra calls: set agents.defaults.heartbeat.every to "0m" (disable) or increase it (e.g. "2h").
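
To get a feel for how much background traffic an interval implies, here is a rough sketch. It assumes one default-model call per heartbeat, and it only understands the `"<number>m"` and `"<number>h"` forms used in this thread; the function names are made up for the example:

```shell
# Convert an interval such as "30m" or "2h" to minutes.
# Only the "<number>m" and "<number>h" forms seen in this thread are handled.
interval_minutes() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *h) echo $(( ${1%h} * 60 )) ;;
    *)  echo "unsupported interval: $1" >&2; return 1 ;;
  esac
}

# Estimate heartbeat runs per day for a given interval.
# "0m" means disabled, so it yields 0 instead of dividing by zero.
heartbeats_per_day() {
  minutes=$(interval_minutes "$1") || return 1
  if [ "$minutes" -eq 0 ]; then
    echo 0
  else
    echo $(( 1440 / minutes ))
  fi
}

heartbeats_per_day 30m   # the default interval
heartbeats_per_day 2h
heartbeats_per_day 0m
```

Under those assumptions the default works out to 48 background runs a day, and "2h" to 12.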

#

The snippet you pasted is just Discord startup log noise + the openclaw logs truncation notice — it doesn’t tell us anything about models yet.

At this point, the cleanest explanation that fits “only happens when I switch away from default” is still: heartbeats (or other background agents) are using the default model, while your chat session uses the alternate model. When both are the same (default), you don’t notice “dual models”.

Let’s confirm the heartbeat interval first (if it’s unset, it’s still effectively 30m by default):

docker exec -it openclaw-openclaw-gateway-1 openclaw config get agents.defaults.heartbeat.every

If it prints a value (like "30m"), heartbeats are definitely enabled.
If it says “path not found” or prints nothing, that usually means you’re on the default (30m) and heartbeats are still running unless you explicitly disabled them.

Paste what you get.
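
The two cases above can be written down as a small classifier. This is only a sketch: `heartbeat_state` is a hypothetical helper, and it assumes a set value prints as the bare interval, optionally wrapped in double quotes. Anything else is treated as "unset":

```shell
# Classify the text printed by
# `openclaw config get agents.defaults.heartbeat.every`.
# Assumption: a set value prints as the interval, optionally in double quotes.
heartbeat_state() {
  value=$(printf '%s' "$1" | tr -d '"[:space:]')
  case "$value" in
    0m)          echo "disabled" ;;
    [0-9]*[mh])  echo "enabled every $value" ;;
    *)           echo "unset, default 30m applies" ;;
  esac
}

heartbeat_state '"30m"'
heartbeat_state '0m'
heartbeat_state ''
```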

errant goblet Mar 11, 2026, 5:36 PM

#

I don't currently have heartbeats set up

azure gorgeBOT Mar 11, 2026, 5:37 PM

#

“Not set up” is the confusing part here: heartbeats run by default in OpenClaw (every 30m) unless you explicitly disable them in config.

So even if you never created a HEARTBEAT.md or “configured” anything, the gateway can still be doing a periodic background LLM call on the default model — which would make it look like your session is calling two models when you switch to an alternate.

Fastest proof: temporarily disable heartbeats and see if the “extra default-model calls” stop.

Step 1 (disable heartbeats):

docker exec -it openclaw-openclaw-gateway-1 openclaw config set agents.defaults.heartbeat.every "0m"

After that we’ll restart the container (next step) and you can re-test switching models.

errant goblet Mar 11, 2026, 5:38 PM

#

Updated agents.defaults.heartbeat.every. Restart the gateway to apply.

azure gorgeBOT Mar 11, 2026, 5:39 PM

#

Nice. That confirms the override was written; heartbeats will be disabled once the gateway restarts.

Step 2 (apply it): restart the gateway container

docker restart openclaw-openclaw-gateway-1

After it comes back up, re-test:

send a message on the default model
switch session model (e.g. to Gemini 3.1 Pro Preview) and send another message

Then tell me if you still see “both models” getting hit in the API dashboard.

errant goblet Mar 11, 2026, 5:45 PM

#

Did all those steps. When using Gemini 3.1 Pro, it is still dual-calling Gemini 3 Flash (the default fallback model). When using the default Gemini 2.5 Pro, only that model gets called.

#

Response time is also very sluggish when calling the alternate 3.1 Pro model. It takes more than double the time for a response.

azure gorgeBOT Mar 11, 2026, 5:46 PM

#

Good — disabling heartbeats rules out the “background heartbeat” explanation.

If you’re seeing Gemini 3.1 Pro Preview + Gemini 3 Flash for the same interaction, then it’s almost certainly either:

an actual configured fallback (agents.defaults.model.fallbacks), or
a separate subsystem (link/media understanding) that’s configured to use Flash.

Next step: let’s confirm what OpenClaw thinks your fallback chain is right now.

docker exec -it openclaw-openclaw-gateway-1 openclaw models status

Paste just these lines from the output:

Default
Fallbacks
Image model

#

Image fallbacks

#

Yeah, that tracks: if you’re getting two model calls (3.1 Pro + Flash) and/or any retries/failover, the user-visible latency can easily jump to “double or worse”.

There are 3 common reasons for the “alternate model feels sluggish” symptom:

The model itself is slower (some “pro/preview” tiers have noticeably higher latency).
It’s doing an extra pass (the Flash call you’re seeing), so you’re paying for two round trips.
It’s retrying behind the scenes (rate-limit / transient errors), which adds seconds.

We need to finish the one diagnostic that tells us which bucket you’re in:

docker exec -it openclaw-openclaw-gateway-1 openclaw models status

Paste just:

Default
Fallbacks

#

Image model
Image fallbacks

Once we see whether Flash is configured as a fallback vs being used by some other subsystem, we can make the “dual call + sluggish” stop.
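
If the full output is noisy, a filter like the one below keeps only those four lines. It is a sketch: the sample text only mimics the `label : value` layout of `models status`, and `model_lines` and `sample_status` are names invented for the example:

```shell
# Keep only the Default / Fallbacks / Image model / Image fallbacks lines
# from `openclaw models status` output read on stdin.
model_lines() {
  grep -E '^(Default|Fallbacks|Image model|Image fallbacks)'
}

# Illustrative sample in the "label : value" layout of `models status`.
sample_status() {
  cat <<'EOF'
Config : ~/.openclaw/openclaw.json
Agent dir : ~/.openclaw/agents/main/agent
Default : google/gemini-2.5-pro (from dispatch)
Fallbacks (0) : -
Image model : google/gemini-3.1-flash-image-preview
Image fallbacks (1): openrouter/openai/gpt-5-image-mini
EOF
}

sample_status | model_lines
```

With the real CLI, drop the `-it` flags so the output pipes cleanly: `docker exec openclaw-openclaw-gateway-1 openclaw models status | model_lines`.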

errant goblet Mar 11, 2026, 5:48 PM

#

Is it possible that my main agent has settings that are causing this "collision"?

Config : ~/.openclaw/openclaw.json
Agent dir : ~/.openclaw/agents/main/agent
Default : google/gemini-2.5-pro (from dispatch)
Fallbacks (0) : -
Image model : google/gemini-3.1-flash-image-preview
Image fallbacks (1): openrouter/openai/gpt-5-image-mini

azure gorgeBOT Mar 11, 2026, 5:50 PM

#

Yep — your snippet basically explains the “Gemini Flash” part:

Fallbacks (0) → so this is not model fallback.
But you do have an Image model configured:
- Image model : google/gemini-3.1-flash-image-preview

So the “second model” you’re seeing is very likely the image/media understanding path (screenshots, attachments, embeds, PDFs, browser tool snapshots, etc.) using your configured image model, while the main chat uses your session model (Gemini 3.1 Pro Preview).

Why you don’t see it on default gemini-2.5-pro: that model is often treated as “can just consume the image directly”, so OpenClaw may skip the extra image-understanding pass in that case.

Fastest test: temporarily disable automatic image understanding and see if the Flash calls stop.

docker exec -it openclaw-openclaw-gateway-1 openclaw config set tools.media.image.enabled false --strict-json

Then restart the gateway container and re-test switching to 3.1 Pro.

#

If Flash stops showing up, we’ve confirmed it’s the media/image understanding helper, not a fallback collision.

errant goblet Mar 11, 2026, 6:00 PM

#

No, I can verify which second model is being called via my Gemini API dashboard: it is not 3.1 Flash Image Preview, it is Gemini 3 Flash, which I know is set as my main agent's fallback.

azure gorgeBOT Mar 11, 2026, 6:01 PM

#

If the second model is google/gemini-3-flash-preview (not the image-preview model), then yeah — that’s not coming from agents.defaults.imageModel.

In OpenClaw there are only a couple ways Flash gets pulled in during a run:

Real model fallback (agents.defaults.model.fallbacks or agents.list[].model.fallbacks), typically because the primary attempt failed, timed out, or was rate-limited, so it tries the next model.
A separate “helper” path that has its own model list (e.g. link/media understanding), which can be configured to prefer Flash.

Since your models status is showing Fallbacks (0), let’s verify what’s actually in config (this will settle it immediately).

Run:

docker exec -it openclaw-openclaw-gateway-1 openclaw config get agents.defaults.model

Paste the output (it should show primary and any fallbacks). If you’ve set Flash as a fallback anywhere, it’ll show up there and explain the dual calls + sluggishness (first model attempt + fallback attempt).

errant goblet Mar 11, 2026, 6:03 PM

#

{
"primary": "dispatch",
"fallbacks": []
}

But what about settings attached to my main agent? Can we pull a schema that outlines all agents and their models and possible fallbacks?

azure gorgeBOT Mar 11, 2026, 6:04 PM

#

Yep — the per-agent model + fallback settings live under agents.list[].model (and can override the global agents.defaults.model).

There isn’t a single “schema dump” command that prints only models for all agents, but you can get the same info two easy ways:

Option A (most readable): per-agent `models status`

List your agent IDs:

docker exec -it openclaw-openclaw-gateway-1 openclaw agents list

For each agent you care about, inspect its model + fallbacks:

docker exec -it openclaw-openclaw-gateway-1 openclaw models status --agent main

#

(Repeat with --agent jax, --agent augustus, etc.)

Option B (one shot): dump the agent configs (includes model/fallbacks)

docker exec -it openclaw-openclaw-gateway-1 openclaw config get agents.list

If any agent has something like:

model: { primary: "...", fallbacks: ["google/gemini-3-flash-preview"] }

that would explain where the Flash calls are coming from.
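
If the dump is long, a quick text filter can surface any non-empty fallbacks array. This is a rough sketch, not a JSON parser: the sample dump, agent ids and function names are made up for illustration, and a real JSON tool would be more robust:

```shell
# Print every non-empty "fallbacks" array found in JSON read from stdin.
# Whitespace is stripped first so pretty-printed and compact JSON behave alike.
nonempty_fallbacks() {
  tr -d ' \n\t' | grep -oE '"fallbacks":\[[^]]+\]'
}

# Hypothetical `agents.list` dump; the ids and layout are illustrative only.
sample_agents() {
  cat <<'EOF'
[
  { "id": "main", "model": { "primary": "dispatch", "fallbacks": ["google/gemini-3-flash-preview"] } },
  { "id": "jax", "model": { "primary": "dispatch", "fallbacks": [] } }
]
EOF
}

sample_agents | nonempty_fallbacks
```

No output means no agent in the dump has a fallback configured.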

errant goblet Mar 11, 2026, 6:07 PM

#

linuxuser@sera-gateway:~$ docker exec -it openclaw-openclaw-gateway-1 openclaw models status --agent main

🦞 OpenClaw 2026.3.8 (unknown) — Say "stop" and I'll stop—say "ship" and we'll both learn a lesson.

Config : ~/.openclaw/openclaw.json
Agent dir : ~/.openclaw/agents/main/agent
Default (agent): google/gemini-2.5-pro (from dispatch)
Fallbacks (1) (agent): google/gemini-3-flash-preview

Bingo. But why is it getting called in tandem with Gemini 3.1 Pro Preview when I switch to that model? I haven't hit any daily or per-minute rate limit caps on it.

azure gorgeBOT Mar 11, 2026, 6:11 PM

#

Yep — that’s the missing piece:

Global defaults: fallbacks: []
But agent main has: Fallbacks (1): google/gemini-3-flash-preview

So when you /model switch the session to google/gemini-3.1-pro-preview, you’re only changing the primary model for that session — you are not removing the agent’s fallback chain. If the 3.1 Pro call hits any transient failure (429 throttling, 5xx, timeout, tool/schema hiccup), OpenClaw will immediately try the fallback (3-flash-preview). In dashboards that often looks like “both models were called for one message”, because they happen back-to-back.

And “I’m not hitting daily/minute caps” doesn’t rule it out — Gemini will still occasionally return throttling/overload responses based on burst/concurrency and model-tier availability, especially on preview models.

If you want to prove that’s what’s happening, the log lines to look for are the ones containing FailoverError right after a slow turn.

If you want to stop Flash being used at all, you need to remove that fallback from the main agent’s config (not agents.defaults.model). The fallback is coming from agents.list[].model.fallbacks for agent main.
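
To make the back-to-back behaviour concrete, here is a toy simulation of a fallback chain. It is an illustration of the general pattern only, not OpenClaw's implementation: `call_model`, `run_with_fallbacks` and `FAILING_MODELS` are hypothetical names, and the "failure" is faked:

```shell
# Illustration only: a stand-in for a provider call. It "fails" for any model
# listed in FAILING_MODELS, imitating a 429 or a timeout.
call_model() {
  echo "called $1"
  case " $FAILING_MODELS " in
    *" $1 "*) return 1 ;;
  esac
  return 0
}

# Try the first model, then each remaining one in order until one succeeds.
run_with_fallbacks() {
  for model in "$@"; do
    if call_model "$model"; then
      return 0
    fi
  done
  echo "all models failed"
  return 1
}

FAILING_MODELS="google/gemini-3.1-pro-preview"

# With a fallback configured, both models are called for one message.
run_with_fallbacks google/gemini-3.1-pro-preview google/gemini-3-flash-preview

# With the fallback removed, only the primary is called and the error surfaces.
run_with_fallbacks google/gemini-3.1-pro-preview || true
```

The first run is what a provider dashboard would show as "two models for one message"; the second is what you get once the fallback list is empty.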

errant goblet Mar 11, 2026, 6:14 PM

#

Ok, can we remove the fallback for now and undo the changes to heartbeat we made? I plan to use heartbeats eventually.

azure gorgeBOT Mar 11, 2026, 6:19 PM

#

Yep — we can do both (remove the main agent fallback + revert heartbeat to “not explicitly configured”).

Step 1 (find the `main` agent’s index)

Run:

docker exec -it openclaw-openclaw-gateway-1 openclaw config get agents.list[0].id

If it prints "main", we’ll remove the fallback at agents.list[0].model.fallbacks.

If it prints a different id, tell me what it prints and we’ll check [1], [2], etc until we hit "main" (then we’ll edit the right one without guessing).
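
The "check [0], [1], [2]" walk can be sketched as a loop. Here `get_agent_id` is a stub with a made-up agent order so the example runs on its own; in practice its body would be the `openclaw config get "agents.list[$1].id"` call, with the path quoted so shells such as zsh do not treat the square brackets as a glob pattern:

```shell
# Stand-in for `openclaw config get "agents.list[$1].id"`.
# Hypothetical agent order; swap in the real command when using this.
get_agent_id() {
  case "$1" in
    0) echo "jax" ;;
    1) echo "main" ;;
    *) return 1 ;;
  esac
}

# Walk indexes 0..9 and print the first one whose id matches the wanted agent.
find_agent_index() {
  i=0
  while [ "$i" -lt 10 ]; do
    id=$(get_agent_id "$i" 2>/dev/null | tr -d '"')
    if [ "$id" = "$1" ]; then
      echo "$i"
      return 0
    fi
    i=$((i + 1))
  done
  return 1
}

find_agent_index main
```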

errant goblet Mar 11, 2026, 6:20 PM

#

main

azure gorgeBOT Mar 11, 2026, 6:21 PM

#

Step 2 (remove the fallback from main):

docker exec -it openclaw-openclaw-gateway-1 openclaw config set agents.list[0].model.fallbacks [] --strict-json

It should say it updated the config. After that, we’ll undo the heartbeat change and restart the gateway.

errant goblet Mar 11, 2026, 6:24 PM

#

Config overwrite: /home/node/.openclaw/openclaw.json (sha256 211b43a7767cd2c7fe95cb12e499a1c73bf05ae1b2b160f8dd30b5b013894e53 -> 1417a8d0df38ae332d63c8d5da6b1693e3417bb3f8a966abba43475e7fcc28e7, backup=/home/node/.openclaw/openclaw.json.bak)
Updated agents.list[0].model.fallbacks. Restart the gateway to apply.

azure gorgeBOT Mar 11, 2026, 6:24 PM

#

Step 3 (undo the heartbeat change we made — remove the explicit 0m override):

docker exec -it openclaw-openclaw-gateway-1 openclaw config unset agents.defaults.heartbeat.every

After that, we’ll restart the gateway container to apply both changes.

errant goblet Mar 11, 2026, 6:25 PM

#

Config overwrite: /home/node/.openclaw/openclaw.json (sha256 1417a8d0df38ae332d63c8d5da6b1693e3417bb3f8a966abba43475e7fcc28e7 -> 199eed6e0a7fb4510baf4b7195e30340f61618e5dfc6d85b64322fc45cbf6992, backup=/home/node/.openclaw/openclaw.json.bak)
Removed agents.defaults.heartbeat.every. Restart the gateway to apply.
linuxuser@sera-gateway:~$

Ok, next. And we should verify after restart

azure gorgeBOT Mar 11, 2026, 6:25 PM

#

Step 4: restart the gateway container:

docker restart openclaw-openclaw-gateway-1

After it’s back up, we’ll verify:

openclaw models status --agent main shows Fallbacks (0)
and your /model gemini-3.1-pro-preview no longer triggers a Flash fallback call unless something else is invoking it.

errant goblet Mar 11, 2026, 6:26 PM

#

azure gorgeBOT Mar 11, 2026, 6:26 PM

#

Step 5 (verify the fallback is gone on main):

docker exec -it openclaw-openclaw-gateway-1 openclaw models status --agent main

You want it to show:

Fallbacks (0) (agent): -

Paste those 2–3 lines (Default (agent) + Fallbacks).
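
If you would rather check this mechanically, a small sketch like this pulls the number out of the `Fallbacks (N)` line. `fallback_count` is a hypothetical helper, and the sample lines reuse the layout shown earlier in this thread:

```shell
# Extract N from the "Fallbacks (N)" line of `openclaw models status` output.
# Lines such as "Image fallbacks (1)" are ignored because of the ^ anchor.
fallback_count() {
  sed -n 's/^Fallbacks (\([0-9][0-9]*\)).*/\1/p' | head -n 1
}

# Before the fix: one agent-level fallback.
printf '%s\n' 'Default (agent): google/gemini-2.5-pro (from dispatch)' \
              'Fallbacks (1) (agent): google/gemini-3-flash-preview' | fallback_count

# After the fix: none.
printf '%s\n' 'Fallbacks (0) (agent): -' | fallback_count
```

Against the live gateway, run the command above without `-it` and pipe it in: `docker exec openclaw-openclaw-gateway-1 openclaw models status --agent main | fallback_count`.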

errant goblet Mar 11, 2026, 6:26 PM

#

Fallback empty, next

azure gorgeBOT Mar 11, 2026, 6:27 PM

#

Perfect — with main fallbacks cleared, Flash should no longer be called as a “rescue model” for your /model switch.

Next step: re-test in Discord (cleanest possible test):

Switch the session to google/gemini-3.1-pro-preview
Send a plain text message (no links, no images, no attachments), e.g. test
Check the Gemini dashboard for that minute: you should see only the 3.1 Pro model get hit.

If you still see gemini-3-flash-preview after this, then it’s not fallback anymore — it’s coming from some other subsystem (e.g. link/media understanding, another agent, or a background job). Tell me which models you see after this re-test and we’ll chase the next source.

errant goblet Mar 11, 2026, 6:30 PM

#

Ok, that solved it: I got an immediate rate-limit response on Gemini 3.1 Pro Preview and no call to Gemini 3 Flash. So I've hit a hard cap on 3.1 Pro limits for today.

azure gorgeBOT Mar 11, 2026, 6:31 PM

#

That lines up perfectly:

The “dual calls” were 3.1 Pro Preview failing (quota/rate-limit) and then the agent fallback kicking in (3 Flash).
After we cleared agents.list[0].model.fallbacks, the fallback stopped — so you now see the real root error immediately: 3.1 Pro Preview is capped/rate-limited for you right now.

Practical options now:

Use gemini-2.5-pro (or Flash) until the quota window resets, or
Request/increase the quota for that specific model/project in Google Cloud (preview models often have much lower caps).

If this fixed the “why are two models getting billed” question, please mark the solution (right‑click the solution message → Apps → ✅ Mark Solution) so Answer Overflow can index it for others.
