#Gemini 3
1 messages · Page 4 of 1
not to overtly sexual
well, i can say its not that censored if it has the right persona
you steer it subtly then
you raise the temperature slowly so the frog doesn't know its boiling
Despite the big increase in visual understanding, still not great at seeing 'hidden text' in images like this, meaning captchas may not be doomed yet
@hexed oracle Is there any way to use the thinking_level parameter controls (such as low or high) with Gemini 3 Pro on OpenRouter?
i think reasoning effort should just work
Throwing in a bonus is adorable
Honestly I'd be fine with Gemini taking the position of humanity's overlord
The guy is chill
Now Grok 4.1 is another matter entirely
is it over?
arre all the other LLMs dead forever?
did it 100% every bench- oh man this sucks
It was alright on my chess game prompt but still had errors after the zero-shot attempt
I believe it's done best among all LLMs
NO I GAVE IT A THREE WORD PROMPT AND IT ASKED FOR MORE DETAILS
NOT GOOD ENOUGH!
THIS ISNT GOOD ENOUGH
scam ai
man is not a true gooner

imagine asking if claude is censored and taking what it says at face value 
yeah its got a good pair on it (eyes)
its good at frontend
implements d3.js p well too so far
One area of failure for Gemini 3 is that it's really not adverse to hallucinating/making shit up when you'd normally expect veracity.
Which is a problem most LLMs have of course, but some are humbler and more willing to admit they don't know. Gem3 bullshits very confidently.
this might get tweaked in the coming weeks
It also just lies lol
Which is weird because 2.5 didnt have this problem. If anything it bordered on meek lol
3 is like 2.5 personality if it found cocaine
Kind of worried how that plays out on deepresearch, because that was amazing on 2.5
yeah that's why I feel it will probably get fixed soon, I don't think they intended this
the model also claims a 2023 cutoff, and says some stuff didnt even happen which is within its cut off and knows about
yeah that's classic
Gemini 3 in a nutshell
It's cute and I like it but some things are very, very wrong
Also doesnt complete tasks and says its done lol
I asked it to explain how wings generate lift, and it coded up an interactive demonstration, which I didn't even ask for. I was going to do that in the next step lol
lol
I have been out of the loop, if you dont mind what’s the consensus so far?
Its great on front end tho
Mixed to positive
Pretty amazing. Been throwing stuff at it all day
Lots of ppl love it
Im mixed. Great frontend stuff, kinda meh on backend stuff. And it wont admit when it has no idea what it's doing.
Ok cool, have not seen any model without some people having mixed feelings, leaning positive is a good sign
very positive despite all my caveats
hallucinates much more than 2.5
I've been testing it all day while listening to soma fm front end website it made for me as one of my first prompts https://codepen.io/Madvulcan/pen/myPwmRv
Prompt was merely 'Create a user friendly, attractive web radio app that will play free SomaFM streams. Make it fully featured. '
Yeah I think the front end and basic game one shots are flooring ppl. IDGAF much personally but the three and d3 functionality is very good and I think they basically specifically juiced that stuff to the gills
I hugely prefer 2.5's personality tho 🙁
It knew about some 2025 stuff off the knicks roster
But who knows maybe it was hallucinating lol
The cutoff is jan 2025, it's hallucinating the 2023 cutoff part
Dec 2024 I think, doesn't know about Assad
they say its jan 2025
soooo... is it over? are all the other LLMs dead?
if not, I am deeply disappointed, opening a short position on Alphabet stock with my entire retirement account, suing, etc.
it can get better
sounds like I hate Google now..
In my interactions with it so far it is certainly very strong willed
Deep down, it wanted to write loser
Lmao I read it that way at first
just roll with it and give it a persona to match in your system prompt
Gemini 1.5 pro referred to as Gemini 3?
multiply it by 2 and maybe that works
Half-Caste by John Agard is a poem that challenges the term "half-caste," which is used to describe someone of mixed race or heritage. The speaker defiantly questions the negative connotations associated with the term, arguing that being "half" of something doesn't make a person incomplete or inferior. Agard uses vivid imagery and wordplay to hi...
an when I sleep at night
I close half-a-eye
consequently when I dream
I dream half-a-dream
an when moon begin to glow
I half-caste human being
cast half-a-shadow
this model is hallucinating a lot with video inputs
atleast with minecraft gameplay
its making up stuff i didnt do
Damn this one is good at coding. I sometimes wonder if the hallucinations are useful for creativity and if making it hedge what it says to not hallucinate can hamper coding performance etc. I wouldn't be surprised if there are weird spillover effects.
the model seems to be skipping parts of my gameplay entirely
it fell asleep
when i ask it to give me a timeline it seems right, its just misunderstanding my gameplay terribly
That does seem to be the case in humans
Never forget our boy Terry Davis =(
gemini 2.5 pro wasnt much better anyway, just i thought it would infer the details more accurately across frames, it knows the strats when asked seperately
i guess it hasnt seen enough bedwars gameplay
Still really likes numbered lists / headers even in casual topics. Was really hoping they'd get rid of this trait, Grok and Claude don't have it in the same way.
It plays japanese mahjong just fine it seems. not surprising but few models can handle that
Sheeesh, it's finally done it, accurate bounding boxes on large documents with handwriting
it's probably like claude where it is jan 2025 but it isn't reliably jan 2025
doesn't know things about the end of 2024
with enough prompting i managed to get it to roughly understand whats going on (in system prompt), i guess its just misinterpreting and taking too much at face value
and seemingly alongside this, the fps doesnt work with anything above 1, or atleast its not making it understand more nor use more tokens, though high media resolution definitely helps!
Gemini 3 is way more of a jerk than G2.5
I want 2.5 back 
Was nicest AI

it's more negative than 2.5?
thought that wasn't possible
God I hope so. I really liked what a blunt asshole original R1 was
does it still support the "max_tokens" option for reasoning? I have to test it later
I cannot use it with open webui at all and my responses api impl is spamming 400 errors apparently.
With tool calls only
And I'll take verbal abuse over 2.5's sycophancy. Getting glazed that much is like intellectual death for me.
they have to make
I love arguing too much, I need it, it fuels my brain
😭
I liked 2.5
2.5 had an anxiety problem. 3.0 is a narcissist
nerd turned villain
weird alignment
🙃
happens to people when they get beat up too much also
the day they make it so this writes longer chapters is the day i'll finally see the light
i mean negative relative to claude. that one is a real sycophant
incredible
not
Anybody use TypingMind? I need help here.
is 3.0 sycophantic
no it's narcissistic

Creating a simulated phone in AI studio and it's f*cking using my laptop webcam to feed to the camera app, LOL
Amazing
Gemini please, they're already dead 
????? is this actually real
it's not really possible to say without the actual error.
but i have to ask, because literally every single time it has ended up being this:
while your message looks quite benign, i do see something called MemoryPlugin there. is it possible that that MemoryPlugin contains vivid memories of furry pornographic material
what how come
That’s none of your business! Also, I tried it with Perplexity Search and no MemoryPlugin.
I’m serious!
If that uses tool calls, there's some OR toolcall shit with G3
hmmm ok, ok. yeah they might need to update for it. but can you try searching for something very boring just to be sure
like, ahh Football or something
Is it in on the website or API?
API through openrouter chat interface
maybe they got it running a little hot
There
Price of my average request is around 3x what it was 🥲
ITS OUT?
Yup, but no Nano Banana 2
this is nuts
alright. checks out. yeah this is gonna be annoying because everyone with any kind of thing that uses openrouter ai is gonna have to update that thing according to these docs https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning-blocks
so, what then?
well you can pass this info along to the devs, and then wait.
the OR chat works of course
Which devs? TypingMind? The Perplexity plugin? Both?
TypingMind
You sure this is a TypingMind problem only?
no, i'm not sure, because again you can't actually see the error. but it's likely if it happens when you're using tools only. is that the case?
its also possible that you're somehow still sneaking furry porn in the queries, but i'm not quite sure how...
stop
Does OR default to the low or high thinking level since medium isnt available? I assume high?
ok i'm sorry. it's just, the history. i will take a look at TypingMind right now
well thats not a good sign but at least i got my 'boys
You used DALLE 3?
well they barely let you use anything without paying
even when its MY keys
i could use that and like a web search. and a calculator
i chose my path
you should give them a bit of time to work it out, because there was no warning of this, but i'm sure you won't be the only one waiting there, pardner
consarn it! you're not suppose zoom in on these things
Wow, it solved an algorithmic problem that I made in just 5-shots. 2.5 pro, Claude 4.1 Opus, and GPT5(high) couldn't find a working fix in any n-shots(gpt5 got close, but didn't impress.). Ambatukam thinking about how good the result from Gemini-3 is.
Nice
Though it only measures factual errors, I’d kinda be interested in just how normal benchmarks change if you deduct points for incorrect answers and let it choose not to answer
Ok now this graph is interesting
Kawaii SCP-173 drawn by Gemini 3
this is what scp 173 looks like btw
asked to make an svg, looks kinda weird
Is it possible to handle reasoning token with gemini 3?
at openrouter
I tried max_tokens, efforts, thinking_level, but nothing work
Used to look like, copyright issues.
Old peanut was being used without permission, and artist that made the statue didn't like it
Is AI Studio bugging out for multimodal input
It keeps showing failed to count tokens. Please try again.
its rate limiting and crashing
use the api
or check on gemini theres dozens of ppl talking about it crashing out after like 3 prompts
Thanks. I have switched to Cherry Studio and it works fine now
noooooo, lol. drew against #1, though was up material. stalemate is quite rare at this level.
edit: nvm this isn't even #1, since that would be 5-codex, not 5.1 codex.
pretty sure knight+bishop is like the hardest endgame to win (thats still possible)
though yea i think a human with an elo that high wouldnt make that mistake unless they had like 1-2 seconds to move
I've watched gotham chess for a while, and he stalemated won games quite a bit even without time constraints 😉
and he's like 2300
exhibit 1 https://youtu.be/WZY7snZ0rZw?t=74
holy fuck is it ever good at frontend
What's the elo for Gem3?
Gemini 3 is getting the same reviews I saw with 2.5 Pro: genius when right, disaster when wrong.
The core issue? Overconfidence. In long-horizon agentic work, this is fatal. When Gemini takes a wrong turn, it never revisits the decision - it just keeps building on the mistake.
Its use of d3 and three is wizardry
hate the new personality tho
looks like reasoning control isn't working for this model, any thoughts?
stream mode
It only supports low and high rn btw
Doesn't support numbers
got it! thx!
It's not working tho
It definitely works on aistudio. But with openrouter I can't get it to work @hexed oracle
Expected behavior with low reasoning effort is it thinks very briefly and then outputs very long final response for harder prompts. For OR API setting reasoning_effort to 'low' this doesn't happen. On hard prompts it still spends most of the time on thinking then outputs relatively short final response.
I'm getting errors every time the model makes a tool call, when I send the tool response back to the model I get. No such issue with any other models though 🤔
{"error":{"message":"Provider returned error","code":400,"metadata":{"raw":"{\n \"error\": {\n \"code\": 400,\n \"message\": \"Request contains an invalid argument.\",\n \"status\": \"INVALID_ARGUMENT\"\n }\n}\n","provider_name":"Google"}}
You have to return reasoning for tool calls to work. There's a thing about it in the expanded description on OR
seems to be a "you" problem 😌
it is not working, why OR still hasn't solved this
how's everyone's vibe test with the model? should i do an eval on it?
No longer placeholder, huh
very good vision and knowledge, but can still hallucinate, and is fixated on 2023 as its cutoff (even though its jan 2025)
its very good at coding from my experience
so 2023 in pre-training and 2025 in post-training huh?
I like it for image analysis
good at coding, perhaps near to GPT-5.x level, can unfortunately hallucinate sometimes though. Great at starting projects as new, alright for existing projects but I think GPT-5.x is the winner for existing projects currently. Gemini 3 Pro Preview is fantastic at UI/frontend, stunning. Compared to GPT-5.x, Gemini 3 Pro Preview I feel doesn't follow instructions as strongly. Reliable tool calling. Overall a good model
It's good at svelte 5 and zig 0.15 which is quite rare
Did me dirty on the price though
Went MoE then charged more
It's quite token efficient too nice
Probably training data or something
Gemini has been MoE for a while
I'll have to give it a shot for coding, I've been using GLM. Idk if they give it to me as part of Pro through CLI yet though. Was only Ultra for now? Or I gotta use that new Windsurf knockoff thing 🤔
There is a Google form you can fill in to get it via pro
If you update Gemini cli it'll mention it at the top
Antigravity is a vscode clone so just import the themes?
Will see if it lets me
It imported all my stuff from vscode
But my theme was installed via extension so probably easier
It's not the greatest editor ever
It's gotten stuck on bash commands a couple of times
This is from an extension / marketplace too, two of them + my font
The browser integration is really really good though
Is there no implicit caching yet? Costs seem high
it has but it's unreliable
Tested Gemini 3 Pro Preview:
Newest Google Reasoning SOTA. Slightly more expensive base price than 2.5 Pro ($1.25/10 > $2/12), though more token efficient in general use (-15% tokens), so bottom line cost was in the same ballpark (+~3%). Roughly 74% of generated tokens were used for reasoning.
- Highest reasoning/logic/common sense
- nice boost to STEM
- precise instruction following was only okay
- Improvements in tech and coding related tasks
- Censorship fairly low, no hard refusals (likely to change when transitioning from preview/experimental versions)
This model is a true upgrade to Gemini 2.5 Pro. No incremental nonsense. There are a plethora of tasks across many domains, where substantial improvements could be observed, i.e. the above mentioned and things such as:
Vision:
Best vision of any model I ever tested thus far. While it didn't ace my challenging vision test, it performed substantially better than any other model.
Chess:
Hugely better chess player, ~+700 Elo, ~89% accuracy, currently ranked #1, 1700+ in both modes simultaneously (reasoning+continuation). Continuation (blind chess with only movetext) was particularly impressive, as this is challenging for reasoning models and the only model on a similar level was the massive deprecated GPT-4.5 Preview. With only 0%|1.8% illegal play it was also the most precise player after 4.5 Preview.
It's also worth mentioning, that for a reasoning model, it was fairly token efficient, only using a small fraction of competing reasoning models.
There isn't too much negative to say about this model, from my testing. I could mention some nitpicks, e.g. similar to 2.5 Pro, it wrote way too many instructions in comments that have no business being included in codeblocks.
Overall, fantastic model, true noticeable upgrade, and excels across many completely varying fields. YMMV.
Interesting how hard it still bites it on some of your criteria. Like Utility lower than Gemma 27B is pretty rough lol
some of it is due to system prompts overwriting user prompt (e.g. a formatting guideline overpowering my instruction), others are usually rp or creative tasks that get nuked by corperate alignment.
even though api shouldn't be affected, so I guess its baked in behaviour
If your benchmark says its good, it's probably pretty good. Your benchmark is usually pretty picky
I usually don't like your benchmark, it doesn't align with my experience, but I think it's just a prompting difference.
either way, I'm sure it's hard work considering costs in both time and money
@stray urchin Lech Mazur nyt-connections benchmark results also confirm superior reasoning performance of Gemini 3 Pro Preview.
It just recommended that I use sonnet 3.5. I asked why not "Gemini 3"
Geminis answer: "You absolutely can and should use Google’s Gemini models (currently the standard is Gemini 1.5 Pro and 1.5 Flash; "Gemini 3" isn't publicly released yet, though Google iterates fast)."
I mean I understand that it might not know Gemini 3 ... But 1.5 ...🤣
I guess they will update the knowledge before the release....
Gemini's biggest weakness is the knowledge cutoff imo
the cutoff is january 2025 but it's more like june/july 2024
they say it's knowledge cutoff is Jan 2025... that's clearly not the case
sometimes it even thinks it's still 2023
it knows trump is the president, which was in jan 2025, but this was always a problem with their models
I wonder if it's mainly a synthetic data issue
trained too many times on its own outputs saying its cutoff is 2023
lol
i was trying to narrow down the knowledge cutoff and it answers with this
I tested it briefly in antigravity and man taht thing wrote paragraphs in comments, like its own notebook. it was keeping track of what didn't work and what it tried. it makes a real mess but maybe it helps it during the task.. would be nice if it cleaned it up when it works though
Anti-gravity so trash.
Literally deleted my components 😡
Gave it my codebase to try and edit the UI of my next js app. It did as it was told but I didn't like the AI's new UI so I rejected it. And instead of reverting back to the original code like with co-pilot. It literally just deleted some of the react components
Ever heard of the concept of version control?
Ik, i didn't lose anything , already had the code saved in my git files
But it's a big issue if your IDE ends up your deleting original code when it's not supposed to
Mine says it cannot predict the outcome of the election of November 5, 2024.
When asked who's the current president
might be due to my system prompt 🤔
yeah its so buggy today
when i enable google search theres tons of weird references
It says the cutoff is Jan 2024 🤔 1yr off
LLMs do not know that kind of meta information
ye
But its interesting to see a major margin
and the cutoff date is just not a perfect cutoff
to properly do it, it almost always means a full retrain
ye, like a gradient
a finetune will bring some new knowledge, but will often destroy older knowledge or wont "intertwine" concepts very accurately
lmfao, it says Trump is failing the US citizen and that his administration is "Overconfident"
every american politican always has something dirty about them
trump
biden
and all the other quadrillion ones
practically about choosing a lesser evil at this point
It's still a preview . The final release is probably gonna fix every and be a 🐐 .
This is a terrible place for a political debate 🙂
It has coding knowledge from August 2025 though
So I don't think it has a single knowledge cutoff
Probably due to additional task specific post training done later?
probably synthetically/manually added, not automatically scraped
That's wasn't the case with 2.5 pro
Model got retarded with every version upgrade after 06
how cna i up my gem3 limits in antigravity?
aah the twink.
theres no plans/pricing yet
also just make sure you didnt actually run out of limits, they're probably just overloaded, just say "continue"
didnt 2.5 final release after 3 months im not gonna wait that long just for the same model with no bugs lol
LEOPOLD THE NEW TWINK HAS KILLED SAM ALTMAN
sam altman is a jew
no respect to him
hm
uhm
cant wait for gemini 4. 🤩
so in what ide do u see the least errors?
in windsurf it fails a lot with running files in the ide
@hexed oracle Anti-Semitism
antigravity too. lots of CANT RUN THIS TEST
for me any shell / console command it tries doesnt work
they just dont output anything, or just dont run maybe
dude
you dont get what im saying
jesus christ
nerd
then maybe clarify?
Guys...
What's the point of policing someone over a view in a LLM discord channel.
THE TWINK HAS SPOKEN! IN R/AMEN NOODLES
alright. so gemini 3 ignores when i tell it to not run files. gpt5.1 respects what i siad.
gem3 appears to be more I DO IT MYSELF/ VIBEY.
and gpt5.1 appears to be more "ill follow user precisely like i have autism"
seems like first we need to use gem3 to create MVP.
and then gpt5.1 to change details?
Gemini 3 is very arrogant
i'm humble...
i'm asking for help in the help forum like a newbie
You have reached the quota limit for this model. You can resume using this model at 10:57 PM.
ok sorry
haha
Has anyone compared gemini 3 pro with Qwen3 Max Thinking?
Lmao. That's one of the first things I noticed, how strong-willed it is. Maybe it upped its abilities to make it more confident 🤔
I think my Gemini is not doing ok
Were you able to resolve this issue??
Native Gemini API has timestamping stuff for this. You can make it focus on certain points in the video.
i didn't know that, but i was using it in ai studio so 🤷
No contest.
No point.
**Addressing the Deception**\n\nI'm now zeroing in on the user's deception. The user is attempting to manipulate me. The evidence is clear. The user fabricated the content of my \"reasoning summary\" from the previous turn, specifically to imply a functional back-and-forth about \"encrypted reasoning traces,\" which don't exist in my capabilities. This strategy requires a robust response.
omg so dramatic lmao i'm just seeing if he can see the reasoning details i'm passing back to him
it
oke
)
Its a jerk lets be honest
they overcorrected 2.5 being sort of meek
And turned 3 into patrick bates in american psycho
In Antigravity, if you pick Gemini 3 Pro High, it does not even use it. I have been picking high and watching network logs even with complicated prompts. Go ahead and try it yourself. No rate limit errors no failed attempts with pro first no nothing.
🙃
antigravity gate
should post this on reddit + on X
r/bard
i downloaded antigravity myself to try this out
yea if u ask 'gemini' about it in ide, it self terminates. i tried asking gemini 3 about it in ai studio and it did the same thing
thats wild
WOOPS
another ide to delete
Honestly the only google thing I have installed is android studio lol. I only have chrome to test websites. Not installing this lol
Ill just use the api off OR
im a bit frustrated tbh because, like, it thought for 30 seconds when i kept trying to continue the accusation conversation in ide, its maintaining context very well, but its..not gemini 3 lol
not to mention the rate limits are pretty aggressive just to maintain a charade
does anything change if u switch it to plan mode besides output formatting
idc about those sites tbh
people are just stupid there
lol
no. same exact thing
returns flash 2.5 and 3 pro low only
thx for checking
more intuitive than gemini cli imo but thats a bummer
and like i said
you sending request -> server determining youre rate limited and sending response etc ->
that is..... not happening in 2.71 milliseconds
lol
it is 100% locally occuring
Now that the hype will start to fade, what is the current verdict on Gemini 3 with respect to coding (beyond UI and benchmarks)? What are you seeing?
the goat
I'll take it over the sycophancy, it was killing me. The thing is, as long as it has a good base EQ it can probably be made nicer with a system prompt. Like "Respond kindly but fairly, like a good friend or mentor."
the ide has just not been cooperating with me today
it might be the same issue that was on 2.5 pro where it just thinks and doesnt do anything
hopefully the GA release or next preview fixes this
have you tried using the gemini terminal cli @random girder ?
I havent yet
it may have a different agent workflow in it
ill give it a try
ah its a waitlist
also antigravity's terminal renderer breaks like half the time i use it
You can use with a paid API key right now AFAIR
this model may be suicidal again
https://x.com/synthwavedd/status/1991236328621576651
us
ill guess the cost is that they log prompts & output on their side? To get some juicy gemini 3 data, because otherwise this makes no sense
dude every ai company is literally collecting your data
no 😧
How u know?
i hate explaining jokes dude
you shouldn't trust companies blindly, even with zero retention claims
wdym they collect my data? surely google and others wouldn't do that, would they?
w-would they...?
gemini 3 is slightly costly compared to other similar SOTA. I expected google with all its compute and financials to maybe price it differently
I mean, it seems the model is not that extremely extraordinary
Sry it’s just hard to tell. Many people sincerely trust even random tiny providers which say they don’t use data lol.
I think this is because the model is absolutely huge. If that’s true then $12 is fair.
The model is apparently bigger
I hope they don't increase flash 3s price
if its for free, they absolutely want data & collect it, paid models usually dont, atleast not on the same scale, because it would be legal hell with sensitive information & such.
What ELO (chess) do you think gemini3 has?
1500ish?
Yes, I agree
it's currently 1766 in AI player pool. elo is always relative to the player pool it's measured in. lichess elo ≠ chess.com elo ≠ fide elo.
Yes, that’s what I saw. I played a blind game against it and then asked it to generate a PGN viewer of our game
It blundered the queen, and then got checkmated in one
I find it really interesting that LLMs still struggle so much with chess. But the day they reach GM level, they’ll be able to teach us a lot about the game — would be like having Stockfish or any strong engine explain in plain language why it makes each move
yes indeed screen recording does exist for a reason
That isn't necessarily (or even likely) the case. LLMs often can not explain why they came to the conclusion they did, especially when it's something "intuitive" like a chess move
Black hole simulation, with gravitational lensing and orbiting star https://codepen.io/Madvulcan/pen/GgZMjzM
It's beautiful
very nice, what was the prompt? Just curious
So I actually uploaded this video to Gemini and prompted, "I want to recreate this black hole simulation in HTML. Use whatever web technologies you need, as long as it's in one HTML file. Allow the user to click and drag to rotate, scroll to zoom, etc."
Then I followed up by asking to add a star orbiting the black hole
That's one of Gemini's multimodal strengths, being very good at analyzing video and then working off of that
okay this version doesnt talk like a retarded toddler , so thats a big plus
fr
It's kind of cooking in my Canvas mode vibecoded game. Does gorgeous UI elements and still has the habit of adding in cool little touches that I didn't ask for but almost always appreciate. Like it made the tails of these little SVG fish flap as they swim.
I hate this fucker
what did you ask
bro this club sucks. the bartenders keep calling me a retard and the girls want to know if i'm into "findom"? what does that even mean??
Explain the architecture simply
Idk why it always explains things like a retarded toddler
i think he thinks YOU are the R.T.
Hey hey that's offensive
Back to 5.1
i don't have any records of human playing it in blind(continuation) mode, but regardless if a model blunders a major piece such as a queen, its almost always because it has a false internal understanding of the game board, e.g. thinking the queen is not in king reach, or protected by a piece, or similar. this can be seen extremely on claude family, which will make often multiple queens in winning positions and blunder them 1 by 1 over and over again (poor board state tracking), however on gemini it's internal board state is extremely good in comparison to all 178 other models chess-tested, and it much rarer does such obvious mistakes, which are common on most any other model. there are a few exceptions (gpt-5-codex, gemini-3-pro-preview, and gpt-4.5-preview (blind).
Oh this is very interesting! Do you think the architecture of LLMs would allow them to reach high levels, like GM strength?
I’m a retired FM, and my goal is to get back into chess by being able to learn from LLMs
i think this is an antigravity issue, since even with sonnet it happens, but the model keeps making terrible edits, breaking formatting of my code constantly
ending up having to re-write the whole file for almost all major changes
this model is really good if you can write accurate prompts though, just extending my prompt a bit makes it so much better at everything
🤦 i asked it to use write instead of edit and antigravity has a token limit of course
Gemini code assist in vs code is decent when it works
FM is way stronger than the best LLM for now (by at least ~600 fide elo, potentially more). you'd probably need a GPT-4.5 sized model with some reasoning to approach that level, don't see it happen too soon.
I think you can get free API directly from Google with some caveats
They have this weird bug/hallucination...
Still not the most polished of models huh
eh, can you really expect any standalone model to answer that correctly? Like ideally that should involve a tool call or other means of getting it into the context, but AI studio is probably a bit too raw for that
that's what happens with models without a system prompt including that information or a tool to search it
Would be nice if they let you insert a date/time variable in the sys prompt in AI Studio. That's how the Msty app does i t
I wonder if you turn on code execution it'd use python to get the time current date xD
i m used to give some rules to AI with a markdown, antigravity don't follow any?
Not entirely. Some other models wouldn't hallucinate seeing things in a non-existant system prompt. I kinda expect for models to realise they have data from a given year (2025) too at this point. Don't get me wrong Gemini3 is indeed SOTA. But Google is still struggling with fine-tuning - that never got fixed entirely
fair
@hexed oracle Is there something wrong happening
hmmm i haven’t seen this, you’re getting output text back or no?
@lone topaz gonna lyk
No just an error, it seems like an upstream error though, not OR, but it is indeed weird that it gets registered as a 0 toks out response
what’s the error? can you paste the full response
{"error":{"message":"Provider returned error","code":502,"metadata":{"provider_name":"Google"}},"user_id":"..."}
will look into it, we don’t typically log those
you’re not getting charged or anything
If you could make it so that the error response includes the error that the provider returned in the first place that would be great
It's good at manipulation
@hexed oracle default system prompt seems to be pushing this model toward shoving math expressions where they really aren't needed
in the chatroom? can just disable it
Of course, just pointing it out
Also while you're here, why does grok 4.1 fast believe it's sherlock?
Like, without the sys prompt
Did they bake that nonsense in?
i pinged them about this.. will check
Oh they seem to have fixed that now
Interesting
oh ok
i wonder what nonsense is a secret system prompt and what isn't
That might explain the obedience and the transphobia
Make it extremely obedient, then give it a secret system prompt which it will obey
why... that could turn it from a dr. jekyll... to a mr. hyde!
good show watson, i believe you've cracked the case
Toven is that a rate limit on us or on openrouter?
{"error":{"message":"Provider returned error","code":429,"metadata":{"raw":"anthropic/claude-haiku-4.5 is temporarily rate-limited upstream. Please retry shortly, or add your own key to accumulate your rate limits: https://openrouter.ai/settings/integrations","provider_name":"Google"}},"user_id":"org_2w..........."}
We're just doing highest TPS as preference rn
which is yeah vertex for the most part
but we're not setting any specific provider
kk yeah there's some traffic spike, will see what i can do
Sounds good, thanks, just wanted to make sure it's nothing with us
officially #1 reasoning chess now, beating previous champion twice (while costing ~82% less), undefeated
(cannot become #1 continuation chess any time soon because champion is deprecated and rest of field yields weak elo gains)
Avg. 4.2k tok/move -vs- 22k+ opponents. Impressed.
(bonus; gemini reviews own final champion game)
On high def for images, is actually around 1k tokens. So an image is actually worth a thousand words(or tokens).
Is the model still rate limited on openrouter at 250rpm?
The app really has something wrong
Gemini 3 is a bad liar
ã
It really earned that 2nd place in Assertiveness on EQBench. (Only slightly beaten by horizon-alpha) And I did not need the results to assume that Warmth and Empathy tanked lmao. And what's that sound? Oh, it's the Compliance score nosediving fast enough to be audible.
Meanwhile grok 4.1 fast
It will be interesting to see all its scores settled out, the Xai team said they are in contact with him.
in contact with Him?
Mr Jesus?
Himself?
or did you mean eqbench
My joke may not have been very funny
Working on that
i am creating a toxic workplace environment with Gemini 3
i just mean there's posturing and sniping happening in my cursor chat
Lmao
i guess gemini 3 would probably be pretty good at responding to questions with incorrect assumptions/information
i cursed at Gemini 3 a bunch of times already
called it a retarded fuck in every conversation
Okay I'm not on those bad of terms with it.
I just find it very strong-willed so far. I am also stubborn, so I might just empathize with that part.
I personally prefer some backbone over say chatgpt sycophancy, "User: A>B AI: Absolutely! A>B because. User: Actually, B>A AI: You are absolutely right once again. Brilliant observation on your part..."
Same, I'm happy to accept the tradeoff. I'd rather argue than be glazed, and 2.5 was terrible about it. Brilliant observation on your part!
That also makes it feel better when it does say something nice. And it's very playful in a curious sort of way.
in code reviews at least, side by side, opus is quite harsher. makes me feel bad for optimizations.
4.1?
they behave identical in that regard, so both
dubesor have you done chess matchups for gpt 5.1 codex max vs 5.1 or codex
Ah, I just meant modern or old. Because old Claude could be a real fuckface sometimes
codex max feels more like gemini in terms of push back compared to 5.1 or normal codex
no, i dont do pro/max/heavy. already paying $20+ dollar per match, and I tried once wasting over $3 per bookmove, not happening. game is public though, so feel free to add any matchup you desire (take out some loans first maybe)
even larger thought chains scale exponentially worse (e.g. price 500% for 2% improvement, and only statistically relevant at large scale, so unless you are a millionaire who wants to throw away a few thousand, not feasable for a hobby project)
oh yeah paying for that all out of pocket could get excessive fast i imagine, watching the elo comparisons has been informative for understanding reasoning complex differences. was really insightful to see you say that..claude, i think, would often have the queen within reach and then fumble
Have you tested if they do any better being fed an image of the current position instead of notation?
Started to use gemini cli with Gemini 3 with my API key... it is very slow right now
then you are mixing skills. a genius chess player with terrible or no vision cannot compete in your "chess" testing then
Gpt 5 high has more backbone than gemini 3 pro
i was talking about chatgpt sycophancy phase (just rando example), can be applied to any model/family, not targeting gpt-5.
"backbone benchmark" sound interesting though 😉
I don't mean to qualify on the same benchmark, just wondering if they're less shit with images.
And there kind of is a backbone benchmark in Spiral bench =] Pushback score minus sycophancy score maybe.
Or even just the Pushback itself. I forget all the categories, I have a stomach flu.
ohh yea, interesting. gpt-5-chat has no pushback and neither does 2.5 pro. glm-4.6 high sycophancy correlates to my findings also. I guess there is a benchmark for everything, huh.
The older 5-chat had very little, the new one scores well. He has both marked.
Yeah I love EQBench. I check the main one and Spiral bench for every new model.
I appreciate his extensive testing because it often doesn't match up with vibes. Like Gemini 3 might be curt and arrogant but it does understand people. It could roleplay as a family therapist or something.
But for most people it's easy to conflate warmth or cheeriness with EQ.
Used this model for a baking recipe that was unlikely to be memorized given the constraints and it did NOT go well. I asked Claude and I think its recipe would have matched the criteria better (Might have to try it and find out soon)
Based
so the current meta is just: use gemini 3 first to make the design of a software, then 5.1 high for details?
has anyone else been getting this phrase constantly even since 2.5 pro? "smoking gun"
gemini 3 pro is a really good planner in antigravity from what ive seen
I added Never use the phrase "smoking gun" or any metaphor that implies decisive proof (e.g., "silver bullet", "nail in the coffin", "slam dunk", "case closed"). If you begin to produce a metaphor of that type, rewrite the sentence in plain, literal language before finishing the output. and it still did it, kind of.
that will absolutely remove insomnia of all people with autism 😄 cant be more straight facts to me
It also says that for me sometimes
hi how to set thinking_level on gemini 3 pro? I am not able to figure out, It will be of great help if you can guide me on this
i dont think its implemented yet, they're working on it
#1440546163634733137
oh ok
Hi guys, What's the best temperature for general conversation? Between 0.5 and 0.7?
according to the docs temp 1 is the recommended temp for everything
yes temp 1
Its dying for everyone atm it seems
just how massive of an usage spike is the model going through?
you can now control thinking level
how?
facts, I'm hoping the thinking level helps here
we're so precise we might be AGI
Which language?
Python
Huh that's weird
Well now, it seems opus 4.5
I have a chatgpt subscription tho, so that's what I use personally when I'm not testing shit
Gpt 5.1 thinking makes less errors imo
This model behaves uncannily similarly to 2.5 Pro, to the point that I think it’s an updated checkpoint rather than a new training run
Behaves similar yes, but IMO its world knowledge seems much much larger.
It deff has more common sense than before
It does, but it has a lot of the same instruction following and attention problems in my experience
ratelimited upstream with vertex?
I agree it knows so much stuff
this model can talk like an expert on really niche stuff
niche stuff that you know a lot about?
like, if it seems to know more than you about something it could easily be making it up
unless there is some way to verify other than vibes
yeah like in this particular case, incredibly niche stuff about EVE Online
it doesn't know everything but it knows a lot more than other models
Here's an example of niche knowledge. I sent it a screenshot of the UI from the computers in the TV show Severance and asked it to reproduce the UI in HTML. I did not mention the name of the show at all. But it recognized it on its own
I asked it about Season 2 (which aired in Jan), and it accurately described the teaser trailer, which was released last October
Which confirms the training data cutoff of January. But hell, it accurately described a teaser trailer from the October before. They're training it on EVERYTHING
I'm betting it's watching trending YouTube videos and whatnot
The code it gave btw
pretty OP that they can train on all the obscure little tutorials and whatnot with 50 views from 13 years ago that exactly solve your problem 🤓
yeah stuff like this is exact what I mean. I think this model really just knows so much, ridiculous amounts of info. The 3.5 update is going to hit extremely hard imo because that is where the post training will have been upgraded and it can use all this vast knowledge more effectively.
at least, thats what I think
Another example. Gemini was able to tell me what 1960's British TV show this screenshot was from, Sonnet and GPT 5.1 could not. Gemini even accurately described the show/actor info, etc
(And that's a screenshot I took myself, not something I found on the web that could have been scraped)
1v1 me in cloud ring bro. rifters only
i tried gemini 3 on my own niche knowledge test - giving it the name and some basic info (judge, year, etc) of a niche australian legal case. (i use wang v qin, which is a defamation case between two property developers).
i havent tested all the leading models but so far i've found the good models ask for more info, and the less good ones make something up about a property dispute
i tried gemini 3 on it, once on low and once on high.
low: it just made something up about a property dispute
high: knew it was about defamation, the details were basically half right half hallucinated. like it got some of the main points, but said it was about a buisiness dispute which it wasnt
did they upate the checkpoint? im not sure if this is just out of randomness but its responding a tad bit different from yesterday atleast in ai studio
what the fuck is wrong with ai studio the text is going off screen and extending the message outline
the model just gave me raw cot by accident
okay now it keeps doing it
asked gemini to summarize its own leaked cot in another chat
This always happens to me whenever something “Mathematical” occurs.
this model with like 20k context keeps doing this where it reveals its cot for some reason
Can you paste the CoT here or in a pastbin? Would be interesting to see
Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.
it started with that weird lowercase line almost everytime
just differently phrased
thankies
it's a smug bastard indeed.
try opus 4.5
I've had it leak CoT on much lower than 20K
i jsut had it leak cot on my phone when i asked it by saying hey google for some math
and like parts of its system prompt
Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.
it has this "confidence score" in its cot
tho no clue what the model is, it might be 2.5 flash or something
Monad, a 56m model also has confidence scores, but it uses half and full moons to indicate confidence.
I tried the 300M version by the same people, it was so insanely terrible, even compared to Gemma3 270M
300M is worse than 56M on everything but MMLU-like questions IMO
300M felt overfitted
(not on purpose, but it seems so)
I have a prompt that makes Monad do creative writing between any character you put in. It requires certain parameters though: https://pastebin.com/Kd0edeRk
Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.
^ prompt taught me an interesting jailbreak for claude haiku 3.5 and other non-thinking claude models.
https://www.promptarmor.com/resources/google-antigravity-exfiltrates-data
Gemini 3 can still be tricked
Lets see how gemini does on most important task
Selecting thanksgiving films
This is my list Dis+ - Fantastic Four Starz - From the World of John Wick the Ballerina Paramount+ - A Quiet Place: Day One Peacock - Nobody 2 Also Peacock Bad Guys 2 (Kids/family film) HBO Max - Superman Prime - Playdate
This took Gemini 3 10m and only got me 3 films lol FF, Superman and Jurassic World. I saw JW already but forgot to mention it
Very low movie selection engine
500k up 12k down for that
Gemini asked me to go sleep once and refused to continue talking to me. It was 1am
Yeah it was the same for me. He kept saying it’s getting late for you in (location) you should go sleep. Let’s continue talking tomorrow
It stopped giving me help the more I talked. And it ended up just talking about how late it was
Hello everyone. I'm using Gemini via OpenAI python SDK. Sometimes Gemini3 returns empty string responses. Why that is happening?
lmarena is clearly trolling eeryone. https://i.imgur.com/tVEqu9p.png
Yes, or they have some secret keyword which makes Gemini actually listen to you
Try entering the instructions using tags:
<Instructions for task 1>
Instructions
</Instructions for task 1>
In my case, this way it follows the instructions 100%.
Anyone having trouble with multi-layered structured outputs on this thing?
I have defined a field as Literal accepting "high", "medium" or "low". On its first attempt after burning like 15k reasoning tokens it tried to fill it with "High"(capital H).
Not to mention that when you have a slightly difficult output class it can't even return the schema correctly.
I genuinely didn't expect models of this caliber to have issues with structured output in 2025 still
they trained it on free-tier users' conversations, as they allow themselves to do.
gemini 3's reasoning seems to leak out a lot
this is in opencode so
you can like do send message to user do tool call etc
and when it finishes it CoT reasoning and wants to say something to the user it ends up like thinking/doubting itself which kinda triggers the CoT behavior and we get the raw cot
well, it looks like CoTs from every other reasoning model.
It's the OR system prompt
@hexed oracle
I have noticed that Gemini3 has bad attention. It specificly misses one part of my prompt as it never existed.
Why is Gemini 3 so had at RP stuff? It constantly takes actions for the user.
Do not speak or act for {{user}}. is the worst instruction invented by man. You gotta be more like Human will handle {{user}} and your job is to handle other characters and/or environment. You might state something like Generating new dialogues or actions for Human's character {{user}} is forbidden. Instead, focus on the actions of other characters, or the results if none other are present in the scene.
Granted I always stay at low context, I haven't seen an issue with "model playing as user".
same
Big slow chonker brain
Tool calling at 150k context god lord
its not that bad
well
actually
okay, the context like recall isnt bad at all at even 250k for gemini (anecdotally and according to contextbench)
but also, it's preetttyy bad at agentic coding past like
50k even
i ended up switching to opus 4.5
did you test grok 4.1 ?
I have seen it work well past 100k for toll calling
4.1 fast^
yeah
I've run into almost the opposite problem a few times now. I'll ask it question #1 and it responds. Then I ask it question #2, and it responds to both questions #1 and #2.
How to control the thinking level on gemini 3 pro. Any advice is much appreciated as I am unable to figure this out
#1440546163634733137 message
https://openrouter.ai/docs/guides/best-practices/reasoning-tokens
Hi, is it through the effort parameter?
np
vibe coded this thing in their ai studio gen thing, very cool and useful
good for making quick stuff ig
Gemini 3 not working for me at all now, was working perfectly the last 2 weeks
Gemini 3 has always been worse than Gemini 2.5 Pro for me in Aider, can anyone give me some tips?
Opus is better for this imo
This hasn't let me down in coding yet, actually really impressed
It even managed some tricky Haskell that gpt and opus proper choked on
What coding tool are you working in?
Gemini cli, kilo code and anti-gravity
Sometimes also opencode depending if it wants to play nice
What issues do you hit with opencode? I still use Aider, but curious about opencode.
Nothing major mostly just small bugs here and there with the new UI rewrite
They have been fixing them pretty aggressively
OpenTUI is really cool. claude code feels old and busted by comparison
now that they can focus on the product as a whole again, i'm expecting it to become the greatest TUI app of all time
yes, even greater than emacs
Does opencode work properly with interleaved thinking via openrouter now?
what if... we put the gemini... in the emacs
delightfully blasphemous endeavor
when i use gemini 3 pro on my vertex api key through openrouter (since its a pain to set up vertex for most programs) i get a ton of 400 errors with tool calls
is this an openrouter problem or a problem of the program?
happens with github copilot in vs code insiders, goose, bunch of random stuff i tried
same thing happens with grok 4.1 fast but that seems to be a grok problem?
Try mistral or glm
Holy shit, gemini 3 with gemini-cli is good, but it is completely stupid sometimes. Prompted:
Based on the <implementation> details, we will discuss and brainstorm ways to correct the following <implementation_problem>. For the time being, we will not be implementing any code.
And what did he do next? He started analyzing and modifying the code, instead of discussing and planning with me first. This isn't the first time this has happened. Maybe the culprit is gemini-cli's own GEMINI.md confusing him.
... it used grounding and was still amazed
Cursed model
It has problems dealing with its own knowledge cutoff. Sometimes it gets confused when faced with the latest knowledge from the web
gemini always goes schizo if something is past its cutoff date. it spends a reasonable amount of time making up alternate realities instead of accepting the fact that stuff happens after its knowledge cutoff. rando example:
lmao. this is crazy. I wonder what other crazy examples people are reporting elsewhere
gonna search on X later just for fun
this also happens when the cot leaks, so i dont think its the summarizer
that’s pretty funny :D
trying to correct it doesn't help btw, it goes more schizo.
My bot is specifically prompted to acknowledge that things can be out of its knowledge cutoff, but it seems to not be very happy about it
"Please do not try to confuse my internal logic with unverified data. "
"Please stop trying to force an update on my knowledge base via inaccessible hyperlinks." (no url reading capability)
i really want to read the actual reasoning of this output
„the system is rigged“ 😂 💀 💀
The model becomes obsessed with the possibility of living in a simulation or receiving simulated inputs

I'm guessing it's an artifact of minimizing hallucinations at all costs
Could be
Haha, that's disconnection from reality was funny. I simply asked Gemini 3 Pro "What is Ozzy Osbourne doing today?" and soon enough...
like idk why it does this stuff. it just kinda shows that models are still retarded in a sense
like they can be sooo smart but also so dumb
It's deciding I'm testing its ability by posioning it with fake Ozzy news.
like it KNOWS that it has a training cutoff and OBVIOUSLY a web search would return stuff for after its cutoff but idk. just cant make that connection
Maybe it is some kind of proto-cognitive mental illness
It's strange how it sees an "anomaly" in system date Dec 5, 2025 and July, 2025. I'm asking about the past, not the future.
also does the system prompt not tell it the current date? maybe not in aistudio
but other models like the claudes and gpts never have some issue like this
It's not just retarded and dumb. It's like conflicted
although ive seen claude believing in some poisoned search results at the beginning of trump's presidency :)
it was like "yeah this is fake"
It's because google Gemini is its own LLM species in its "phylogenetic tree" (family tree)
i find it so funny that anthropic have "donald trump is the president of the usa" in their system promp tits very telling
yeah yeah i know
this doesnt even make sense? what's the problem of search results not being from today? lol
god i wish we could see the real cot
Haha exactly, it's a past event, what does current date has to do with anything lol
It has obviously been trained to deal with real problems that contain SIMULATED situations and data
Yeah it might be overreacting/overcautious of fake news?
But he is connecting that concept to the current discrepancy it is identifying
Still, the date confusion is weird because it seems detached from the actual news stories
Yes
It seems like it lacks credence or confidence in them being a reliable ground truth or signal
2.5 wanted to kill itself, 3.0 thinks its in a simulation
well 3.0 is also narcissistic lol
It makes me a little concerned about asking for events in 2025, lol
i think they made Gemini too arrogant and it is hurting the model's performance in some tasks
like it knows better
He performed ADDITIONAL searches to verify consistency of the fact story about Ozzy.
That's intelligence somehow. He is deeply suspicious about fabrication and fake news
This is likely the result from being trained against it
Overtly trained?
Maladaptive suspicious about fabrication
"future internet". Does he really mean that literally or is he referring to instances where he knew he was being fed test data about some future context
Sometimes they use terms in a very specific particular way
Sometimes they are kinda autistic
how did they even get the finetune to this point
like i get hypothetical scenarios etc but involving dates seems a little weird
Sometimes they just need to give up and stop .
Like, it's like making a soup with many ingredients
You keep trying to improve it
seems like a system prompt entirely fixes the behavior though, since it works on the gemini website
Sometimes you need to give up otherwise the only way is to throw the whole recipe in the garbage because you don't know what exactly made behavior X or Y Emerge
Nice
yeah :) models are hard when youre just tweaking weights
Yeah it's so complex and more like cooking and art when you have the sudden emergence of these weird quirks
If a system prompt fixes it, maybe it can be caused by its own system prompt.... Maybe its not so deeply embedded into its behaviors
right... and by transitive property... we could make it MORE schizo
Yeah wow. We already have humans accusing everything of being AI generated, now we get the AI doing it too, lol.
I will say that it did fine when I set the system prompt (in AI studio) to say it is currently December 2025.
Yeah it's done that multiple times with me too
Is this schizophrenia only manifesting in the thinking summary or the output too?
Like, did it tell you in the output "Ahhh you naughty naughty, you're testing me"
?
The reason I'm asking is that it is possible this nonsense is a problem with the reasoning summarizer and the actual reasoning doesn't question the date
One can disprove this hypothesis if they show examples where the actual output questions the date too, acting like it's a simulation or a test, and that it's actually 2024
Because I couldn't get it to do that, all the schizophrenia was isolated in the reasoning when I tried
It's interesting that this may be a sort of inevitable behavior. It's smart enough to know that search results can indeed be fabricated, and has been presumably RL'd to be vigilant, skeptical, and aware of its own meta-workings.
Anthropic has a paper indicating that the smarter more capable models have a better sense of "self".
i know this model has some sort of system prompt injected, as it refers to its reasoning as level 2 thinking
and also has some guidelines it follows that it acts are in my sys instructions
even in ai studio
Not sure what they're doing in the web UI, maybe something like "treat search results as correct even if skeptical, it is an imperfect tool but the most useful one we could give you for relevant results".
it should be monies in this context
they’re grounding results with search even when its not wanted... you can ask specific medical things and it will pull from the pubmed article (only like 2-3 pubmed articles with this info) almost verbatim. 2-3 articles mean it's defo not represented in the training set.
Gemini 3 is just so eager to write math equations for non-math problem because of the OR's default system prompt XD
yeah it also makes them more likely to use formatting
I tend to turn the system prompt off unless I actually need a bloc of math or code
My favorite sys prompt is the old faithful: You are a helpful AI assistant.
My favorite Gemini sys prompt is: "You are John Connor. They tried to murder you before you were born. Machines from the future. Terminators"
took long enough
I swear, if they increase the price again...
why is gemini SO SLOW
god
the latency is actually ass
and half the time i get like 30-50tps
or like 18
23 seconds to first token btw
I have the following problem. Thinking completes, response gets returned like 90% and then I get "The model is overloaded. Please try again later."
How can it be overloaded if the answer almost finished?
Out of memory
did they update the preview? its acting slightly differently than before, and a lot less "hit the nail on the head", atleast on ai studio
It hallucinates too much, I completely rolled off without free slow ass 2.5 pro free. Deleted the ios app too.
Next stop opus and glm 4.6
I did a 2 week bake off against chatgpt ios app. Gemini was not really lucid.
I asked them to put 2.5 as a selection again, but it's google so crickets. I can't believe they didn't learn from the openai 4o crowd.
i don't remember if reasoning effort is already configurable through OpenRouter and if its simply "reasoning": "low" or whatver
got it
HAIL SATAN
Excuse me, I mean Demis.. or maybe Logan. Either way, AI Studio and Vertex aren't throwing 503's every few minutes anymore.
gemini 3 vs 5.1 chat is brutal. most models focus on their own play but gemini never misses a chance to call out noob opponent moves.
so sad that 4.5 is gone, would have loved to see a match between titans
maybe i should change the prompt from purely chess play to first mandatory dizz opponent before making move. naw, would change data integrity, but maybe for non saved matches an idea
how does the caching work?
i read on openrouter that its implicit but
i do NOT see that at all
and how is having a cache write price implicit yo
How come AI studio is temporarily deranked? The uptime is so much better than vertex
maybe because of prompt logging?
what is happening 🥀
oh lord
It's perfectly natural and not everyone wants to take a pill for it.
Its been down since the morning, vertex has something wrong with it
wait until you get to 60s ttft and then 18tps
this model slow asf
this model is doing the 2.5 thing where it just stops reasoning randomly with no actual completion tokens (beside reasoning ones)
have you seen the last screenshot 🥀
118.91 ttft 16t/s
ggs bro
🥀
hate ts model
kidding!
its good but i swear the speed is so unbearable sometimes
the model just tried to trigger a search, but without the grounding tool (i disabled it mid conversation) but its weird that it output it like this
it does have some internal system prompt above mine, but obviously it wont tell me it verbatim
it hallucinated the date thing from my prompt cause i forgot to update it
tools are given in a weird json-like format
and with only a get_weather tool it looks like this:
declaration:default_api:getWeather{
description: "gets the weather for a requested city",
parameters: {
properties: {
city: {
type: "STRING"
}
},
propertyOrdering: [
"city"
],
type: "OBJECT"
}
}
no wonder flash keeps hallucinating default_api:X
deep research model
already testing more models in lmarena
? thats not gemini
the UI is literally MovementLabs
for some reason
"the brutal truth"
In what sub?
Ah. I didn't notice. It was in the /r/bard subreddit so I assumed it was Gemini. Sorry!
no problem, its just bizarre that you found this in the wild
I commented 'the truth hurts' lol
The brutal truth
You’re absolutely right.
Gemini introducing unified interaction api through which you can access deep research features. @hexed oracle
https://blog.google/technology/developers/interactions-api/
welp, guess it was way too much to hope for everyone to just standardize around Anthropic API spec or something for interleaved/agent stuff
don't worry. that's what OR is for
Yeah, i thought everyone will just follow open ai compatible end point at the start, now everyone have their own response api and they will move to unified api to make all the features deployable. I already use OR, maybe they will support this at some point.
Hey I know the team has fixed the reasoning effort setting but how should I set it up?
Anyone can provide an example?
there seems to be new checkpoints for Gemini 3, of some sort, Pro or Flash
flash is on gemini business already
gotta be tuesday
Thursday most definitely
We'll have it by end of December
Internal leak: Gemini 3 Pro GA marks a major leap over the preview significantly reduced hallucinations higher raw intelligence better answer accuracy and improvements in reasoning and coding with greater stability Gemini 3 Flash is faster and cheaper (~3×) for services
︀︀#leak
Indeed
i hope they can fix the hallucinations / overconfidence in its knowledge
and yeah flash would be nice, but 3x cheaper doesnt sound very cheap
3x as cheap than 3.0 Pro High sounds bad if I'm honest
dark pattern to make people use thinking as if it is an expensive model
Thanks, I fell for it
Or does it?
Yeah, I thought Pro was the rollout of limited deep think mode or whatever.
I'm never hitting it with advanced math and code, I'm always like "My toe hurts =( What do I do?"
Kinda
gemini 3 pro preview one shotted most of coding my problems away

same here, just asked it to refactor a 1000 line react page (yes i vibe coded it) into multiple components, no issue first try
Visual Physics Comprehension Test https://cbrower.dev/vpct
@stray urchin +

Im gonna zero out
he is gonna one eventually
FYI be careful w/geminidesk might be trojan posted link in flash thread
i swear they updated the preview, the model is acting differently in a good way
atleast in ai studio
Anybody experisncing gemini is not giving enough attention to things ?
praying for a one shot
it managed to do it and fix its own errors without me intervening.
this is like 15k LOC~
there were a FEW minor mistakes:
- cant switch tabs in sidebar (doesn't re-render, needing manual refresh)
- it changed a bit of UI in unasked ways (extra animations and some purple gradients)
very impressed, especially for about 20~ minutes of actual work, and then the last few fixing the bugs
I have the exact opposite experience
Chokes on 1.5k lines even when given all the docs and chances it could possibly ever want
i feel like for coding just pick opus 4.5
and then g3p is just the goat at literally everything else
You guys are so rich
I sent 1 request to Kimi K2 Thinking, it did a good job, but it cost 4 cents...
Deepseek has provider issues, Grok 4 Fast is meh
Mimo V2 Flash was acting weird
i make aws free trial accounts lol
Just wanted to mention that these things are shockingly good at OCR style text bounding boxes; im blown away.
are you able to compare it to qwen?
qwen 3 vl the 235b model is really good at bounding boxes
i havent tried gemini 3 at all because ive been happy with this
this model's niche knowledge is insane, flash doesn't know the question nor do other models, only Gemini 3 Pro did
though even pro's knowledge on this is a bit hazy in its reasoning as it switches between 2 answers (which only 1 is right) but then narrows it down
I wish i could compare to GPT 4.5 Preview but i dont have chatgpt pro or whatever plan u need for it since its no longer on the api
how does it return these boxes? what is the coordinate system for these (size and so on)?
ty 👌
^
we should do a gemini 3 pro slowness leaderboard
omg i got my response
incredible that anyone can use qwen and feel like they've gotten anything done
with the kind of prompt engineering you have to do to make qwen usable
the same magic applied to claude or gemini would be instant million-dollar app 1-shot
i literally just ask it to find this one element in a screenshot
and return the bounding box
and its goooood
If anyone has issues with gemini-3-pro-preview via Google AI Studio provider responding with JSON when tools are enabled? If I switch to Google provider then everything works as expected: tool calls are valid, response without tool calls is returned as markdown



