#Gemini 3 Flash

1275 messages · Page 2 of 2 (latest)

deep python
#

so you are telling me, that google didn't knew what they were doing when they named their image model nano banana?

ruby flower
#

See, Google said 3 Flash is a different animal from 3 Pro entirely

deep python
#

pro will get upgrade, rumors say

ruby flower
#

Apparently, 3 Flash is built on newer tech that was too late for 3 Pro

#

So it's exciting that they have more ammo whenever they feel like for a 3 Pro upgrade based on this

onyx rover
#

if they can manage this kind of upgrade for pro, they are gonna leave all the other models behind

civic hull
#

is anyone having issues with cache hits with this model?

bitter galleon
#

seems to be a 2048 token minimum for 3 Flash, which is odd because google docs state 1024 for 2.5 Flash and 4096 for 2.5/3 Pro (doesn't list 3 Flash)

lament tulip
opal thicket
elfin oak
#

using 3 flash feels like using an open sota model

#

it kinda reminds me of deepseek r1 vibes

#

like its a smart model but hallucination is still biting me

#

its good model but all i wanted is they learn how to make the model honest but it may regress because its also tough to fight hallucinations

#

gpt-5.2 already feels retarded to use, it is good overall but it kinda feels like when chatting, its like it always policing your grammar and choices and keeps hedging itself

drowsy knoll
#

gpt-5.2 feels retarded?

#

are oyu doing roleplaying

elfin oak
#

not doing role play (and gpt models are sloppy with that anyway and i have life)

#

like even the smaller mistakes from my prompts it always cautiously tells me about my statements wordings as if its the end of the world

#

i only said "shocks the world" because personally for me 3 flash is an impressive model, but c'mon gpt-5.2 you dont have to question my life choices

#

but i still use gpt-5.2 for high stakes tasks that i dont mind actually being corrected

#

the only chat models I use is either 4.5 sonnet or 3 flash and k2

civic hull
opal thicket
civic hull
#

What do you mean?

#

I meant , it has auto caching based on prefix. If you use the prefix the second time , it'll auto cache hit

opal thicket
#

I don't get prefix. Like model endpoint prefix, or prefill?

civic hull
#

Prefix means message Prefix.

So lets say my first message is :

"Lorem Ipsum 123 dada , hello"

And hte second message is :

"Lorem ipsum 123 dada, hi"

The prefix is the same in both messages: Lorem ipsum 123 dada,

So this part gets cached, so in the second message you get cache hit for the "Lorem ipsum 123 dada," part and pay normal for "hi" part.

#

This is just an example, in reality there's a minimum 64 token or some size for cache hit

opal thicket
#

Oh, that's just default multiturn behaviour for auto caching with other models. Got it

distant plinth
#

finally getting my moneys worth out of google code assist

opal thicket
#

Cache...

coarse swallow
#

i love how it can solve math fast

zenith girder
#

Have you achieved CHIM yet?

agile jacinth
zenith girder
agile jacinth
bold berry
#

ohhhh limbo of the lost

#

This stole from elder scrolls something right?

opal thicket
#

They stole almost everything. That's gif from ending song, sung by single guy

bold berry
#

lol

#

are they playing the entertainer or something lol

#

Is this really from the game?

#

JFC this is awful

opal thicket
#

You shoulf watch full review from MandaloreGaming

#

Legendary

bold berry
gray oyster
#

This thing hallucinates so much I regret using it for baking with grounding. I used the thinking mode too. I'm scared. I'm too deep in to back down now...

#

I explicityly told it not to make one up, and only use search results...

low gulch
#

yeah i noticed that too

gray oyster
#

Update: the vegan banana bread was actually pretty okay. Chana besan was hallucinated, so I added more chia egg when I realized the AI made that part up. I wonder if G3 flash is better than G3 pro at baking with niche ingredients...

drowsy knoll
#

bro how are ai models getting recipies wrong 🥀

gray oyster
#

Yeah they tend to be pretty bad at cooking and such.

Interestingly Monad 56m has baking as one of the category es (creative writing, memorization, etc.) in its training data. I once asked it for a recipe for cookies and I think one of the procedures involved a flamethrower tho...

jaunty flume
#

flash is so good for the price

#

don't even need to use anything else right now

noble zenith
#

Yea

#

It feels like, 2.5 pro used to search for everything but 3 pro and flash just know stuff without searching

drowsy knoll
#

idk why

forest valve
#

Is it able to use implicit/explicit caching?

#

I used over 1k conversation with more than 2048 input token. But it never use cache.

forest valve
#

yes i use fixed cachedPrompt for it
{
role: "system",
content: [
{
type: "text",
text: cachedPrompt,
cache_control: { type: "ephemeral" },
},
],
},

faint narwhal
#

Implicit caching works, but it is quite short-lived, depends on the time of the day when you have a bit more time... And it works better if your context length is bigger. I never tried explicit caching

forest valve
faint narwhal
#

but works

opal thicket
faint narwhal
#

around ten seconds, some with tool calls, some without

#

but it is really a gamble if you get caching or not at the moment

#

can't really rely on it

#

and caching started for me at around 3000 tok

#

some days ago it worked much better, I guess the servers are just a bit overloaded at the moment and don't have that much spare resources to cache

drowsy knoll
#

i dont like gemini caching

#

openai caching is easily the best

#

and for pro its still explicit caching 🥀

elfin oak
#

Using Gemini 3 flash in api really feels like I'm using a SOTA open model

#

or probably I'm spoiled with 2.0 flash and 2.5 flash pricing

#

also kinda eats my credits quickly

#

even with minimal reasoning effort set

true wharf
# elfin oak even with minimal reasoning effort set

interesting. probably ton of input? do you ever see any reasoning charged at minimal ? for me they were about even though that is with mostly output and little input. still, bottom line price shouldn't be so drastic difference.

elfin oak
#

on average i am usually charged $0.001 - $0.09

#

not bad

#

but

#

i kinda miss how they priced 2.0 flash

true wharf
#

oh 2.0, yes, that model was very cheap. 2.5/3 cannot hold a candle. however, different class of model. 2.0 flash would nowadays constitute a flash lite model in terms of end 2025 capability.

agile jacinth
gray oyster
#

I accidentally got it to leak all it's CoT on a question and now I wondering if there's a consistent method? Anyone?

rich forge
#

well I think if you paste a long fake chat with fake thinking it will spit out without the real tag

#

it's happened to me when I copied and pasted a huge ui (Ctrl a Ctrl c Ctrl v)

#

which had "thought for 0.5s" etc

elfin oak
#

one thing i hate with gemini 3 models is despite you provide tools and explicit instructions, it just wonr follow

drowsy knoll
drowsy knoll
elfin oak
#

gemini 3 is ass at IF and agentic tool use

umbral chasm
#

I still think 2.0 flash > 2.5 flash lite

true wharf
opal thicket
#

So, Gemini 3 Flash Lite should be ~2.5 Flash level?

elfin oak
#

most likely

true wharf
elfin oak
#

doubt

#

3 flash doesn't beat 5.2 i swear despite I've been using it daily, I still found 5.2 way more reliable at not making things up or doing shit things, it doesn't try to compete with current minis either so its smarter I'd say within the sonnet level

#

most likely 3 flash lite would compete with at the level of gpt5 mini/4.5 haiku

#

but idk if google still deserves "flash lite" to be lite if they're planning to raise prices again

elfin oak
# umbral chasm yeah and the "lite" models for me have been near unusable

2.5 flash lite is really decent for video summarization, its quite useable, but yeah its not even close to 2.0 flash
it also doesn't follow instructions well and less token efficient, so if you try to ingest tons data and ask it to summarize in one sentence only, it will fail and end up being a word salad, compared to gpt5 nano which surprisingly still follows instructions better

#

and if you add the fact gemini 3 models still suffers from poor IF, i have no hopes 3 flash lite might be improved within that part

cursive moss
#

wow this model is horrible at IF and tool use also hallucinates af
and feels retarded

cursive moss
#

Like what the fuck is that

i just decided to write about random unrelated topics, standard QA. And then i asked about high grade math and this is what it gave me in response. i didnt ask to be answered like some braindead person with adhd. It might as well have asked me to turn on tiktok and subway surfers for authenticity

usually it answers with analogies for pretty much everything, not like whats in the screenshot, but i was too lazy to push it that far. i just did a quick 10 question QA.

cursive moss
#

rgr.
Tried via api (chatroom). U owe me $0.052 (5 rubles) 😄

#

Without system prompt btw

#

I can share the whole chat if u want but there's barely any point since it failed at IF.
I clearly told it to answer in english only, but it kept going in russian.
Face it, this model is retarded

elfin oak
#

as much as i like gemini 3 flash being smart

#

tool calling is very shallow lmao

#

its garbage

#

no matter how elaborate your prompt is how to use the tools

#

idk what google is doing

#

like i asked to generate a research report and it only ran 4 tool calls, 3 search tool and one browse and call it a "report"

#

meanwhile glm 4.7, it literally does tools a lot

#

its the only model how to use tools based on description and schema

last hornet
#

Guys gemini 3.0 flash keeps writing <tool_code> in the user facing text instead of actually calling the tools every then and now.

System prompt clearly asks to call the right tools, tools are correctly passed. Even mentioning to NOT use <tool_code> makes it worse.

#

Guys I please need a fix quick. I have a presentation to make.

wise furnace
#

and in debugger.. check if the tools are actually getting passed to the LLM

last hornet
#

I have instructions explaining what tool to call when, not HOW to call them btw

#

Tools are getting passed

elfin oak
#

gemini is not great at tool calls

#

nothing you can do but maybe manually strip it

elfin oak
#

Gemini 3 flash Instant vs GPT 5 Mini High
its not even close
3 flash managed to spot irony instantly

#

i gotta say, its really good chat model, but really not great for reliability and tool use

#

gpt-5 mini, while reliable on precise prompting such as step by step tool use execution, its still o-mini series model smell, its not great

bitter galleon
drowsy knoll
#

vision w flash is good

#

but

#

hate the tool use

elfin oak
#

yeah its good

#

they still have time to fix things

zenith girder
#

Mildly interesting, you don't see this kind of typo from top models very often:

The reason you can't always do this yourself is that when you try to open your mouth, your muscles automatically tensing up to protect the joint (this is called "guarding").

deep python
#

interesting

elfin oak
#

yeah that mostly explains why

#

the benchmarks from official google site is NOTHING from what ive been dealing with

agile jacinth
#

sooo.... how do we fix caching

#

literally not getting any cache hits on agentic tasks

#

could've at least cached the 1.7k tokens from the first request

rich forge
#

i believe theres a minimum, but using explicit caching (anthropic style markers) works a lot better

#

but still has ocasional times where it misses almost every time

agile jacinth
#

i do have explicit

#

but literally speaking ive not had a single cache hit on gemini for more than a week

#

this is genuinely insane.

rich forge
#

my cache just randomy misses like in 1/5 requests, ive tried with both vertex and ai studio

agile jacinth
#

okay nvm, but it seems they raised minimum?

#

it used to cache with 1k tokens min

umbral chasm
#

still better than grok - I don't think I've ever had a full cache hit

rich forge
#

now gone

lament tulip
#

Found the first thing that gemini 3 flash seems unambiguously superior at compared to every other model I've tried, really surprising result imo, curious if other people have had similar experiences.

I gave this very open-ended refactoring prompt to every major LLM, across multiple different harnesses (gpt-5.2-codex xhigh in Codex CLI, gpt-5.2 high in PI, gemini 3 pro in PI, gemini 3 flash in PI, opus 4.5 in CC):

https://pastebin.com/EGFQBc29

Surprisingly, I was quite disappointed with all the results from almost all those models. They'd do bad refactorings, make it more convoluted, less readable in the effort to "deduplicate" stuff that shouldn't have, or vice-versa. Split up stuff that shouldn't have been split up. Lots of like "this kinda looks like I'm doing the job, right?" vibes.

BUT! gemini 3 flash seemed to actually have good taste. really surprised. It also had the highest % reduction in LoC without any regression in functionality (+1000 / -1600 LoC, manually tested and pretty thoroughly reviewed each one. It reduced LoC imo "properly", not through code golfing or anything, but by making the code actually simpler)

rich forge
#

is max reasoning tokens still supported like it was in 2.5 on this model?

turbid steppe
#

only level

#

you can pass the param up but it isn't enforced. google just maps it to an effort enum value

fervent mountain
#

OpenRouter team @turbid steppe , is there any chance you could increase the rate limit for Gemini 3 Flash? I am receiving around 500 errors a day about it, and it affects users.

fervent mountain
#

same problem with Google AI Studio

drowsy knoll
#

yes this would be good

#

+1

turbid steppe
#

working on it

drowsy knoll
#

thanks king

#

how do you even get higher limits

civic folio
#

is there any setting to enable cache?
looking at metadata in Activity I always get "native_tokens_cached": 0 despite reuse of messages

fervent mountain
#

@turbid steppe any progress?

bold berry
deep python
#

claude if it can't solve stuff one-shot, it will begin breaking more and more

bold berry
#

I have the total opposite experience lol

#

I mostly use opencode what are you suing?

deep python
#

i write:
⁨```issues:

  • sdsdsds
  • sdsdsds
#

then copy paste the above long prompt

#

then when done i ask:
⁨```
does all the requirements solved now? if so explain how, and is it good? and if not, how to solve it?

#

i use gemini 3 pro more despite the benchmarks, but they are almost the same level (but i'm betting on "bigger model smarter")

bold berry
#

It one shots frontends like nobodys business tho

#

And I looooove deepresearch and nano banana

deep python
bold berry
#

Whatever the one is that comes with the sub

deep python
bold berry
#

I have plus or premium or whatever the $20/mth is (got it on black fri or new years) and occasionally buy the calls here when I get throttled or where it's a PitA

visual crest
#

Unhinged google models

deep python
#

when gemini 3 flash is convinced that it is right, it is just repeats itself. -_-

bold berry
deep python
bold berry
#

its funny g2.5 flash was good g3 flash is waaaay worse than pro

drowsy knoll
#

big lie from benchmarks putting it that close

#

it’s a LOT worse than pro

cursive moss
#

degenerative pre-trained model

fervent mountain
#

Hey @turbid steppe , seeing a lot of 429 errors today for gemini 3 flash. Any potential fix you're working on? Should we wait, or is there nothing really you can do on your side?

shy minnow
#

I also encontered the same question yesterday. And I didn't find any description about rate limit description in openrouter docs.

ruby flower
#

This model will refuse the most random stuff, lol

#

In other news, I seem to not get charged for a refusal even if I do get an output (which's cut off mid stream by the content filter), dunno if this is intended

exotic pebble
#

Gemini 3 flash is a delight to code using this simple sentence
Be concise , 0 yapping , don't try to one-shot a problem. Try to understand the problem and don't jump to solutions. If you need more context/files to understand the problem ask for them rather than giving me half baked solutions. No comments inside the code.

sour gulch
#

anyone having problems with 503 overloaded errors lately?

storm garden
fervent mountain
storm garden
#

Why is AI studio so much more reliable ?

zenith girder
#

They aren't serving other companies models on AI Studio

drowsy knoll
#

i still dont understand how ai studio is a fundamentally different provider than vertex

harsh pendant
#

more of an office politics thing i suppose

ruby flower
#

What's the content blocking level for this? I haven't gotten this many unexplainable CONTENT_PROHIBITED refusals since Claude 3

#

For example, this is a silly roleplay chatbot I let loose in a public server, this triggers a refusal but I cannot see any reason, the other refusals are pretty similar

feral sinew
ruby flower
#

Well, I'm fairly sure it's ultimately Google

#

Though I'm wondering if OR has the level at BLOCK_LOW_AND_ABOVE

stark aspen
#

Did something happen since yesterday? Getting a lot more blank outputs than ever before.

ruby flower
#

Is the finish reason prohibited content by any chance?

stark aspen
#

Finish reason just says "stop"

elfin oak
#

happens to me while back with 2.5 flash lite model, not the first time

umbral gorge
#

why the fuck is 3 flash so slow right now

#

holy moly

ruby flower
#

The word "grooming" being present in the context of grooming a pet will trigger the censorship more often

#

If you call the bot old and it answers "no, I'm young" it'll cut off mid-stream due to the content filter

tired mortar
#

Lolita fashion also gives it issues. Because of the first word.

zenith girder
#

Ok that one's at least more understandable lol

drowsy knoll
#

chat

#

do you think that giving gemini 3 flash 1000 images would be detrimental

#

or would the ocr still be good

ruby flower
#

Uh

#

Are you referring to stitching these images together?

drowsy knoll
#

basically i have a document thats pretty long (more like 100 images sorry not 100)

#

and it has like parts to it

#

and i want to split it up by each part

#

so i basically need gemini 3 flash to be like "part 1 is on pages 1-3" "part 2 on 4-7" or wtv

#

right now i basically just do like python pdf conversion to text and then feed that to gemini

#

i wonder if an image of every page + the text would help

low gulch
#

i don't think you can even do that

#

i think OCR would be best indeed, but you can try it

#

for science

#
The Gemini API lets you include multiple images in one request by adding multiple image “parts” in contents (mixed inline bytes/URLs and File API references).​

If you send images inline (base64/bytes), Google notes it’s best for smaller files, with a total request size under 20 MB (prompt + inline media).​

For larger or reusable images, use the Files API upload flow instead of inline data.​

The docs explicitly state that Gemini 2.5 Pro/Flash and 2.0 Flash support up to 3,600 image files per request; the Gemini 3 docs on that same page don’t list a separate “max images per request” number, and instead highlight controlling per-image token budget via media_resolution.​

perplexity

drowsy knoll
#

i wish there was a gemini 3 xhigh level

rich forge
#

made a typo, Sheild instead of Shield

deep python
viral ridge
#

goes the same in aistudio

zenith girder
zenith girder
#

Oh wait, that's web app? Lmaooo

deep python
#

my friend sent

#

he says it is the pro model

#

came to wrong chat

graceful sentinel
#

What's the default reasoning effort level for 3 flash

true wharf
cinder sand
#

To live in a world where geminis thinking isn’t obfuscated 🥀

patent pond
#

Is there maybe an undocumented way to control media_resolution and fps when sending youtube urls? 🥹🥹

low gulch
#

hm nope

dense surge
#

no way to set media_resolution yet? I don't want to be charged 1000 tokens for a tiny image

visual crest
#

People say that gemini flash 3.5 / 3.1 is already active in Antigravity? I can believe it, insane model

#

Wtf, that model is on drugs