#Gemini 3 Flash
1275 messages · Page 2 of 2 (latest)
See, Google said 3 Flash is a different animal from 3 Pro entirely
pro will get upgrade, rumors say
Apparently, 3 Flash is built on newer tech that was too late for 3 Pro
So it's exciting that they have more ammo whenever they feel like for a 3 Pro upgrade based on this
if they can manage this kind of upgrade for pro, they are gonna leave all the other models behind
is anyone having issues with cache hits with this model?
seems to be a 2048 token minimum for 3 Flash, which is odd because google docs state 1024 for 2.5 Flash and 4096 for 2.5/3 Pro (doesn't list 3 Flash)
not just rumor, a DeepMind person said so directly (paraphrasing): '[gemini 3 pro preview] is an early checkpoint of the fully baked model and the full one be better'
Is it auto caching? What is TTL?
using 3 flash feels like using an open sota model
it kinda reminds me of deepseek r1 vibes
like its a smart model but hallucination is still biting me
its good model but all i wanted is they learn how to make the model honest but it may regress because its also tough to fight hallucinations
gpt-5.2 already feels retarded to use, it is good overall but it kinda feels like when chatting, its like it always policing your grammar and choices and keeps hedging itself
No but it does feel like grammar policing me more often than previous gpt-5 model
not doing role play (and gpt models are sloppy with that anyway and i have life)
like even the smaller mistakes from my prompts it always cautiously tells me about my statements wordings as if its the end of the world
yeah...
i only said "shocks the world" because personally for me 3 flash is an impressive model, but c'mon gpt-5.2 you dont have to question my life choices
but i still use gpt-5.2 for high stakes tasks that i dont mind actually being corrected
the only chat models I use is either 4.5 sonnet or 3 flash and k2
It has prefix -caching usually. And TTL is default 1 hr
1hr is a lot. But it's not paid cache, still need to call it separately?
What do you mean?
I meant , it has auto caching based on prefix. If you use the prefix the second time , it'll auto cache hit
I don't get prefix. Like model endpoint prefix, or prefill?
Prefix means message Prefix.
So lets say my first message is :
"Lorem Ipsum 123 dada , hello"
And hte second message is :
"Lorem ipsum 123 dada, hi"
The prefix is the same in both messages: Lorem ipsum 123 dada,
So this part gets cached, so in the second message you get cache hit for the "Lorem ipsum 123 dada," part and pay normal for "hi" part.
This is just an example, in reality there's a minimum 64 token or some size for cache hit
Oh, that's just default multiturn behaviour for auto caching with other models. Got it
finally getting my moneys worth out of google code assist
Cache...
i love how it can solve math fast
Have you achieved CHIM yet?
What
(Learned to love Flash)
we have our ups and downs
what is this?
ohhhh limbo of the lost
This stole from elder scrolls something right?
THE KING OF LIMBO
They stole almost everything. That's gif from ending song, sung by single guy
Today I've decided to upload the The King Of Limbo song from Limbo Of The Lost from 2007. I hope you enjoy!
#LimboOfTheLost #KingOfLimbo #TheKingOfLimbo
lol
are they playing the entertainer or something lol
Is this really from the game?
JFC this is awful
Queueing it up lol
This thing hallucinates so much I regret using it for baking with grounding. I used the thinking mode too. I'm scared. I'm too deep in to back down now...
I explicityly told it not to make one up, and only use search results...
yeah i noticed that too
Update: the vegan banana bread was actually pretty okay. Chana besan was hallucinated, so I added more chia egg when I realized the AI made that part up. I wonder if G3 flash is better than G3 pro at baking with niche ingredients...
bro how are ai models getting recipies wrong 🥀
Yeah they tend to be pretty bad at cooking and such.
Interestingly Monad 56m has baking as one of the category es (creative writing, memorization, etc.) in its training data. I once asked it for a recipe for cookies and I think one of the procedures involved a flamethrower tho...
Truely creative model
Yea
It feels like, 2.5 pro used to search for everything but 3 pro and flash just know stuff without searching
but i feel like they are reluctant to search
idk why
Is it able to use implicit/explicit caching?
I used over 1k conversation with more than 2048 input token. But it never use cache.
Over 1k like messages?
yes i use fixed cachedPrompt for it
{
role: "system",
content: [
{
type: "text",
text: cachedPrompt,
cache_control: { type: "ephemeral" },
},
],
},
Implicit caching works, but it is quite short-lived, depends on the time of the day when you have a bit more time... And it works better if your context length is bigger. I never tried explicit caching
I can't find the minimum token limit of implicit caching. At https://ai.google.dev/gemini-api/docs/caching?lang=python, there's no gemini 3 flash. 🙁
but works
How much time passed?
around ten seconds, some with tool calls, some without
but it is really a gamble if you get caching or not at the moment
can't really rely on it
and caching started for me at around 3000 tok
some days ago it worked much better, I guess the servers are just a bit overloaded at the moment and don't have that much spare resources to cache
i dont like gemini caching
openai caching is easily the best
and for pro its still explicit caching 🥀
Using Gemini 3 flash in api really feels like I'm using a SOTA open model
or probably I'm spoiled with 2.0 flash and 2.5 flash pricing
also kinda eats my credits quickly
even with minimal reasoning effort set
interesting. probably ton of input? do you ever see any reasoning charged at minimal ? for me they were about even though that is with mostly output and little input. still, bottom line price shouldn't be so drastic difference.
on average i am usually charged $0.001 - $0.09
not bad
but
i kinda miss how they priced 2.0 flash
oh 2.0, yes, that model was very cheap. 2.5/3 cannot hold a candle. however, different class of model. 2.0 flash would nowadays constitute a flash lite model in terms of end 2025 capability.
3.0 is quite heavy on cost im ngl, sometimes nearing 3 pro (low) because it reasons so much
I accidentally got it to leak all it's CoT on a question and now I wondering if there's a consistent method? Anyone?
well I think if you paste a long fake chat with fake thinking it will spit out without the real tag
it's happened to me when I copied and pasted a huge ui (Ctrl a Ctrl c Ctrl v)
which had "thought for 0.5s" etc
one thing i hate with gemini 3 models is despite you provide tools and explicit instructions, it just wonr follow
i think (have seen a good amount at least in coding agents) that if you just turn off reasoning it just does it CoT normally
instruction following is meh
gemini 3 is ass at IF and agentic tool use
yeah and the "lite" models for me have been near unusable
I still think 2.0 flash > 2.5 flash lite
they are similar level, but yes. also 2.0 is just much more tok-efficient, e.g.:
So, Gemini 3 Flash Lite should be ~2.5 Flash level?
most likely
around ~5.2 level
doubt
3 flash doesn't beat 5.2 i swear despite I've been using it daily, I still found 5.2 way more reliable at not making things up or doing shit things, it doesn't try to compete with current minis either so its smarter I'd say within the sonnet level
most likely 3 flash lite would compete with at the level of gpt5 mini/4.5 haiku
but idk if google still deserves "flash lite" to be lite if they're planning to raise prices again
2.5 flash lite is really decent for video summarization, its quite useable, but yeah its not even close to 2.0 flash
it also doesn't follow instructions well and less token efficient, so if you try to ingest tons data and ask it to summarize in one sentence only, it will fail and end up being a word salad, compared to gpt5 nano which surprisingly still follows instructions better
and if you add the fact gemini 3 models still suffers from poor IF, i have no hopes 3 flash lite might be improved within that part
wow this model is horrible at IF and tool use also hallucinates af
and feels retarded
Like what the fuck is that
i just decided to write about random unrelated topics, standard QA. And then i asked about high grade math and this is what it gave me in response. i didnt ask to be answered like some braindead person with adhd. It might as well have asked me to turn on tiktok and subway surfers for authenticity
usually it answers with analogies for pretty much everything, not like whats in the screenshot, but i was too lazy to push it that far. i just did a quick 10 question QA.
rgr.
Tried via api (chatroom). U owe me $0.052 (5 rubles) 😄
Without system prompt btw
I can share the whole chat if u want but there's barely any point since it failed at IF.
I clearly told it to answer in english only, but it kept going in russian.
Face it, this model is retarded
as much as i like gemini 3 flash being smart
tool calling is very shallow lmao
its garbage
no matter how elaborate your prompt is how to use the tools
idk what google is doing
like i asked to generate a research report and it only ran 4 tool calls, 3 search tool and one browse and call it a "report"
meanwhile glm 4.7, it literally does tools a lot
its the only model how to use tools based on description and schema
Guys gemini 3.0 flash keeps writing <tool_code> in the user facing text instead of actually calling the tools every then and now.
System prompt clearly asks to call the right tools, tools are correctly passed. Even mentioning to NOT use <tool_code> makes it worse.
Guys I please need a fix quick. I have a presentation to make.
try this.. remove all references that teach it how to call tools... or 2. use structured outputs
and in debugger.. check if the tools are actually getting passed to the LLM
I have instructions explaining what tool to call when, not HOW to call them btw
Tools are getting passed
this also happens in gemini app
gemini is not great at tool calls
nothing you can do but maybe manually strip it
Gemini 3 flash Instant vs GPT 5 Mini High
its not even close
3 flash managed to spot irony instantly
i gotta say, its really good chat model, but really not great for reliability and tool use
gpt-5 mini, while reliable on precise prompting such as step by step tool use execution, its still o-mini series model smell, its not great
The second one addressed the "if it's actually true in real life" part (no cap fr).
Mildly interesting, you don't see this kind of typo from top models very often:
The reason you can't always do this yourself is that when you try to open your mouth, your muscles automatically tensing up to protect the joint (this is called "guarding").
interesting
yeah that mostly explains why
the benchmarks from official google site is NOTHING from what ive been dealing with
sooo.... how do we fix caching
literally not getting any cache hits on agentic tasks
could've at least cached the 1.7k tokens from the first request
i believe theres a minimum, but using explicit caching (anthropic style markers) works a lot better
but still has ocasional times where it misses almost every time
i do have explicit
but literally speaking ive not had a single cache hit on gemini for more than a week
this is genuinely insane.
my cache just randomy misses like in 1/5 requests, ive tried with both vertex and ai studio
still better than grok - I don't think I've ever had a full cache hit
new gemini 3 flash checkpoint on lmarena
https://x.com/legit_api/status/2013755037294477439
now gone
Found the first thing that gemini 3 flash seems unambiguously superior at compared to every other model I've tried, really surprising result imo, curious if other people have had similar experiences.
I gave this very open-ended refactoring prompt to every major LLM, across multiple different harnesses (gpt-5.2-codex xhigh in Codex CLI, gpt-5.2 high in PI, gemini 3 pro in PI, gemini 3 flash in PI, opus 4.5 in CC):
Surprisingly, I was quite disappointed with all the results from almost all those models. They'd do bad refactorings, make it more convoluted, less readable in the effort to "deduplicate" stuff that shouldn't have, or vice-versa. Split up stuff that shouldn't have been split up. Lots of like "this kinda looks like I'm doing the job, right?" vibes.
BUT! gemini 3 flash seemed to actually have good taste. really surprised. It also had the highest % reduction in LoC without any regression in functionality (+1000 / -1600 LoC, manually tested and pretty thoroughly reviewed each one. It reduced LoC imo "properly", not through code golfing or anything, but by making the code actually simpler)
Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.
is max reasoning tokens still supported like it was in 2.5 on this model?
no
only level
you can pass the param up but it isn't enforced. google just maps it to an effort enum value
OpenRouter team @turbid steppe , is there any chance you could increase the rate limit for Gemini 3 Flash? I am receiving around 500 errors a day about it, and it affects users.
same problem with Google AI Studio
working on it
is there any setting to enable cache?
looking at metadata in Activity I always get "native_tokens_cached": 0 despite reuse of messages
@turbid steppe any progress?
It's not bad at oneshotting but its a nightmare and will break everything on pre-existing code
will break everything on pre-existing code
it's the opposite actually, at least on assistant debugging
claude if it can't solve stuff one-shot, it will begin breaking more and more
prompting web interface itself and copy pasting. i heard gemini 3 sucks at agentic stuff.
this is my prompt (sonnet wrote it)
i write:
```issues:
- sdsdsds
- sdsdsds
then copy paste the above long prompt
then when done i ask:
```
does all the requirements solved now? if so explain how, and is it good? and if not, how to solve it?
i use gemini 3 pro more despite the benchmarks, but they are almost the same level (but i'm betting on "bigger model smarter")
Ohhhh OK. The web interface is good. Yeah its horrible at agentic
It one shots frontends like nobodys business tho
And I looooove deepresearch and nano banana
same (*nano banana pro though)
Whatever the one is that comes with the sub
you can do 'retry with nano banana pro' even in free tier
I have plus or premium or whatever the $20/mth is (got it on black fri or new years) and occasionally buy the calls here when I get throttled or where it's a PitA
Unhinged google models
when gemini 3 flash is convinced that it is right, it is just repeats itself. -_-
You sure its not context too long?
no, it keeps insisting it's opinion and hallucinate when it is %100 sure.
its funny g2.5 flash was good g3 flash is waaaay worse than pro
degenerative pre-trained model
Hey @turbid steppe , seeing a lot of 429 errors today for gemini 3 flash. Any potential fix you're working on? Should we wait, or is there nothing really you can do on your side?
I also encontered the same question yesterday. And I didn't find any description about rate limit description in openrouter docs.
This model will refuse the most random stuff, lol
In other news, I seem to not get charged for a refusal even if I do get an output (which's cut off mid stream by the content filter), dunno if this is intended
Gemini 3 flash is a delight to code using this simple sentence
Be concise , 0 yapping , don't try to one-shot a problem. Try to understand the problem and don't jump to solutions. If you need more context/files to understand the problem ask for them rather than giving me half baked solutions. No comments inside the code.
anyone having problems with 503 overloaded errors lately?
....
yes. not the first time.
Why is AI studio so much more reliable ?
They aren't serving other companies models on AI Studio
i still dont understand how ai studio is a fundamentally different provider than vertex
they have whole different departments for whatever reason
more of an office politics thing i suppose
What's the content blocking level for this? I haven't gotten this many unexplainable CONTENT_PROHIBITED refusals since Claude 3
For example, this is a silly roleplay chatbot I let loose in a public server, this triggers a refusal but I cannot see any reason, the other refusals are pretty similar
Do you think it's OR problem or google problem?
Well, I'm fairly sure it's ultimately Google
Though I'm wondering if OR has the level at BLOCK_LOW_AND_ABOVE
Did something happen since yesterday? Getting a lot more blank outputs than ever before.
Is the finish reason prohibited content by any chance?
Finish reason just says "stop"
happens to me while back with 2.5 flash lite model, not the first time
I think I have an answer, this model's moderation has very little tolerance for anything that remotely resembles content involving children
The word "grooming" being present in the context of grooming a pet will trigger the censorship more often
If you call the bot old and it answers "no, I'm young" it'll cut off mid-stream due to the content filter
Lolita fashion also gives it issues. Because of the first word.
Ok that one's at least more understandable lol
chat
do you think that giving gemini 3 flash 1000 images would be detrimental
or would the ocr still be good
basically i have a document thats pretty long (more like 100 images sorry not 100)
and it has like parts to it
and i want to split it up by each part
so i basically need gemini 3 flash to be like "part 1 is on pages 1-3" "part 2 on 4-7" or wtv
right now i basically just do like python pdf conversion to text and then feed that to gemini
i wonder if an image of every page + the text would help
i don't think you can even do that
i think OCR would be best indeed, but you can try it
for science
The Gemini API lets you include multiple images in one request by adding multiple image “parts” in contents (mixed inline bytes/URLs and File API references). If you send images inline (base64/bytes), Google notes it’s best for smaller files, with a total request size under 20 MB (prompt + inline media). For larger or reusable images, use the Files API upload flow instead of inline data. The docs explicitly state that Gemini 2.5 Pro/Flash and 2.0 Flash support up to 3,600 image files per request; the Gemini 3 docs on that same page don’t list a separate “max images per request” number, and instead highlight controlling per-image token budget via media_resolution.
perplexity
i wish there was a gemini 3 xhigh level
made a typo, Sheild instead of Shield
they quantized gemini so hard, the brain farted
https://gemini.google.com/share/7fc7623df04a
goes the same in aistudio
Pretty sure that chat was the cause of the entire uptime issue.
Bro what the hell is in your system prompt?
Oh wait, that's web app? Lmaooo
What's the default reasoning effort level for 3 flash
gemini 3 / 3.1 defaults to high (dynamic), except for flash-lite which defaults to minimal (dynamic no/little thinking). minimal performs really well though.
To live in a world where geminis thinking isn’t obfuscated 🥀
Is there maybe an undocumented way to control media_resolution and fps when sending youtube urls? 🥹🥹
hm nope
no way to set media_resolution yet? I don't want to be charged 1000 tokens for a tiny image
People say that gemini flash 3.5 / 3.1 is already active in Antigravity? I can believe it, insane model
Wtf, that model is on drugs
So many exclamation marks
