#Gemini 3
the default CLI system prompt is huge and tells the model to act all professional
yep definitely has different knowledge
asked it "What is the latest season of the anime..." that ran this summer, and it knows while 2.5 doesn't
what's the better sysprompt
am I supposed to give it something
cuz that .env doesn't work on its own
You are a helpful assistant.
how are people getting it to work in gemini cli
with google acc auth it 403 errors and with gemini api key it 404 errors
is it only vertex?
we have access to gemini 3?
yeah a .md with smth simple like that seems to work and have Gemini's more "natural" behavior
it works when not passing the model flag
we're able to access a model that definitely has more up to date knowledge than gemini 2.5 pro
ah cool, is it any better? 🙂
idk yet, testing some stuff
are you in the US?
I think that's the issue
works for me with google ai pro, australia
does not work with my google code assist standard account
here's from cli
'safety'
not working for me with my ai pro acc or code assist enterprise acc 🙁
they called my post fake on the bard subreddit 💔
check ur version too
same
prompt: "Make me an SVG of a pelican riding a bicyle, at pelican.svg. Make it super detailed, and think long and hard about it."
definitely better
Better but what was that one with the feathers
NYC skyline SVG
and ur sure that you're trying with the ai pro login? you can check by making sure the settings.json has selectedType set to oauth-personal
damnit that dont mean anything
could be prompting
yep, looks like the wall is here
prompt: Create an SVG of the Tokyo skyline at night. Think hard about how to do this, and make it visually stunning and beautiful.
US VPN makes it work
2 are 3 pro the other is 2.5 pro
oh wait the png rendered incorrectly for one of the pro ones lemme redo the export
looks like gpt 5 is roughly as good as it's ever gonna get
in terms of intelligence

ya'll are wild
"looks like the latest major new architecture is the best it's ever going to be"
just because gemini 3 isn't catching up on pelican in this one test
why are you assuming gpt 6 wouldn't be better

it will be marginally better
of course
it's always marginally better
that's how improvement works
marginal gains, repeatedly
yes but it's clearly diminishing
apple is more important
I mean the pelican is miles better than gpt 5
price-performance wise I don't think we're seeing diminishing returns
well yes, I'm talking about pushing the frontier
anyway it's not bad, but yeah gemini 3 isn't blowing anything out of the water
it was extremely overhyped ngl
it's better but not by miles
yeah it's not bad but they hyped it too much
we haven't seen top price models yet
the main thing for me is it seems to actually follow tool calls and not just give up
like 2.5 did
I'll put $30 on the frontier not improving by more than 25% on SWE-Rebench in 6 months
it's hard to say on a benchmark which changes
GPT5 via codex
same prompt as 3 pro in gemini cli xD
took 5 times longer too
going to let it try again
I'm just saying I'll put my money where my mouth is lol, I don't think the frontier is getting pushed much anymore
I'm not sure what I'd use quantitatively for the bet, but I'd also bet something similar
that gpt 5.5 / claude 5 will be significantly better than o3/claude 4
if you compare it actual generation to generation e.g 2.5 to 3 it is kind of a big leap for this particular model series
sus mouth shape aside that's actually weirdly more accurate than most
yeah lmao
I was about to say "the most anatomically correct one"

some guy on fiverr
On his way to spread schizophrenia
Look at the streamlined posture though
Mmm
Modern art
I do wonder if this one was fake
and/or super prompted
I was looking back in this chat history and didn't see anything nearly that impressive
That's what I'm saying
What if you try to generate a very detailed description first?
devmode server is full of larpers so i wouldnt be surprised
this was one of the first ones that people got really excited about
here's from gemini cli
could this have been a checkpoint for ultra?
maybe
they need preference data for ultra too, even if they don't release it, so I think it could be possible for sure
could also have been some base model that they have then distilled/quant
Use a VPN, select a USA server and it will work
The model still working for you guys?
404 appears now when trying to use it
gemini --model jesus-god-69 works
^ Can you get me access?
no, but I can post the same thing crypto-bros do every day until it comes out
wen gemini 3 pls
wennnnnnnnnnnnnn
Ohhh, thank you very much!!!
Pls, I will appreciate that
Thank you this works
Doesn't work for me now
I was kidding
it's working for me, it's literally AGI. zero-shot my first SAAS
aw

Make it one-shot a super fast inference chip
unfortunately i need a sheikh to buy me credits
Cerebras already did it
the day before i come out with gemini 4 it will probably be up to maybe 1% faster than gemini 3
nobody seems to understand this
but wen
2.5 pro has 100% got dumber , gemini 3 is coming
Apparently Nov 18 now
can we get gemini 3 already
how. benchmarks will note down how good a model is.
livebench etc
WEN BARD 4
I feel like there's a hype bell curve and they've gone way into a long tail
the hype is dead
Optimal is probably one week, maybe two
i am very much alive
We're at what, a month and a half?
Gemini 3 in November! (2026)
That's way, way too long to tease a model. They made that mistake with GPT-5. It was like oh man they've been cooking so long, it's gonna be a banger. It was nice, but not a new paradigm.
Honestly wtf happened with o3? It was kicking ass on so many metrics, then failed to make it the free default in app despite it being priced cheaper than Sonnet.
how
Even with pro the usage limits were too low for me.
are ya guys accessing it
gpt 5 is good enough for what it is
cheaper coding model
i can spend easily 20 usd daily via codex on 20 usd plan
Afaik the only Gemini access is happening via A/B tests on lmarena
honestly i just want it a bit better than 2.5 pro, anything marginally better works good enough
Yeah, I mean codex seems to be great for people, but they kept going with 4o as main flagship in webUI while sort of silently having o3 crush even private benchmarks.
or updated knowledge
claude code i tested
Yeah. I want 2.5 Pro with a better personality lmao
i think it was 7 usd
Some better general smarts maybe
on 20 usd plan
mate don’t let them ship mediocre models
they should be making breakthroughs with how much money they all are burning
otherwise ehhhhh yeah
Because funny enough, 2.5 pro is still basically second place for reasoning on Dube and simplebench
For sure, they should
i mean 2.5 pro works good enough
but
breaks often
mediocre at best
gpt and claude is good
i don't get the hype for the grok4 and grok for code one tbh
lot of people are using it
I'd really like Google to solve linear context pricing / coherence. That's my dream, my #1 and #2.
Grok is good. Dry, but I like it enough
for coding?
No
I'm poor rn so I use GLM plan for coding
It works for the projects I'm doing
i think the whole hype is mostly just really cheap drafting, then fix the bugs or do more complex stuff with the bigger more expensive models, more usage for little loss
plus the models are fast
Also it was (is?) free in Kilo
either give us amazing API models or 2.5 in a bottle (a 30B parameter bottle)
Google's tooling/ecosystem is one of the bigger draws for me rn despite the meh personality. Bundled with Google one / Workspace and notebook LM is nuts. I feel like they have so much shit that everyone can find a really cool thing they like. Deep Research + long audio overview is pretty much the greatest thing ever for me.
@celest cypress if u are building good stuff
hmu i will give u chatgpt pro
👍 u can use codex via that
I am building mid games and benchmarks to get back into project mode xD
But super appreciated!
feel free to hmu with your email
same here. And I use gemini 2.5, gpt-5-codex and sonnet 4.5.
but google's ecosystem really give you some really nice advantages
Speaking of this, I think this recent blog post and Google research was interesting! In particular for long context performance, where it might even be a paradigm shift. https://research.google/blog/introducing-nested-learning-a-new-ml-paradigm-for-continual-learning/
sorry i missed this. I can basically just kinda tell
where???
actually theo says a bunch of good stuff here (as usual)
- gemini 3 has been ready for a while but they keep delaying it because competitors keep releasing new models that beat it in benchmarks
- they're using cool ui specific training data that they've purchased, which makes you guys think it's gonna be AGI
which is cool, like i would enjoy using that myself. but can it actually write rust/c/svelte/robloxlua/medium complexity typescript backend code without making a mess? i don't know...
the langs u mentioned are irrelevant
only game development matters
That's actually funny
They're letting apple use their model for apple intelligence
Why doesn't apple have their own apple intelligence model?
they've allegedly managed to train a model up to 150B params internally, maybe it didnt meet their standards?
Mayhaps
Our server-based model performs favorably against Llama-4-Scout, whose total size and active number of parameters are comparable to our server model
https://machinelearning.apple.com/research/apple-foundation-models-2025-updates
comparing with scout 🤕
mayhaps gemini is overcooked 
i think you can make games in roblox. it's not just for meeting people
only thing one can do is take mini games from roblox. and make actual great games out of it.
very profitable
their ai division has screwed up badly. they seem to be "parting ways" with their SVP of ai, and they've lost a bunch of staff already. if they're like what, 1-1.5 years behind the pack, you can't release anything. it would just be highly embarrassing.
i think i found my interest.. how i can make money, while using runway and ai...
you know all these videos where younger people like under 30yo, sell their products or are spreading stuff like Looksmaxxing?
ill debunk their videos. should be very easy if i use their original title of video in my title
youtube algo should know immediately who to show the vids
for apple anyway, they get publicly shamed by the media for any misstep. they're not like, say, amazon, where their B2B focus means their reputation doesn't really matter
Crows Vs. Looksmaxxing
looksmaxxing only works with makeup and surgery.
that bitch of a "pretty man" tells teens to consume 3:1 ratio of potassium:sodium
to lose bloat in face 💀
r/stroke
also, OpenAI provides ChatGPT services to apple platforms for FREE!! when you can make deals like that, there's no need to rush into anything
it boils down to genetics, and what you eat as a kid tbh 😭
but you can do skin care + hair care to improve some stuff ig
yeah like nourishing a garden.
if ur "pArEnTs" do it wrong, ur fucked as adult.
would you be recessed if your parents fed you raw meat and milk at 2
duno im gona make debunk videos now
😳
That doesn't quite add up though with what the checkpoints have been like. They would clearly eval well on the benchmarks already. But let's be real these companies always cherry pick anyway so it's kind of irrelevant.
Also 2.5 pro still scores near the top on a lot of the mainstream benchmarks (which just proves how unreliable the benchmarks are lmao)
To me it feels more like they have been waiting for infrastructure as well as some tuning for that infra maybe - like the new generation of TPUs they are rolling out.
The UI training is interesting but not sure what relevance it has unless you are doing a computer use agent (and Google already have a fine tuned model for specifically that)
only livebench gives 2.5 pro as sota. the others dont. probably because livebench is the agi measurement. and 2.5 is missing intelligence but is closer to agi?
true. i mean who knows, it's a theory. but benchmarks are the ridiculous game they have to play; people are making decisions based off them, and it's what the headlines are all about. even crows like them. and it fits the 'plateau' scenario which i find very believable.
the hardware argument does seem solid, if that's what it ends up being. but requiring this hardware to run gemini 3? why wait?
are the TPUs borked? did they not consider MPUs? sad
i'm sure it's not something technical, more like a business/corpo decision
it's time for Momentum to pull ahead...
the UI stuff is mostly what i've seen that allegedly showcases gemini 3's advanced capabilities. is this the new benchmaxxing?
or LOOKSMAXXING??
🫨
Yes, this obsession with aesthetics has radicalized me against pelicans
yeah. you run the software on hardware you have. meanwhile, maybe there's a middle-management war of epic proportions currently playing out
Well it's twofold really, you have to adjust the model for the new hardware and they may want/need that if the model is larger etc to maintain even their current uptimes etc
Plus they did just sign agreements to provide hardware to others so makes sense to put new models on a combination of new and old then slowly partition off the older hardware as newer ones come online to fulfill their contracts
minmaxxing hardware is for poor people
I don't know either, but they're up to something
There's another leak of this sort of UI card but I don't know if it's real:
that UI is very lithium-flow and orion-mist
Really thought it will release earlier
It's rumored that this thing is using a better than 2.5 model
rumored
it's confirmed kek
there's no way a new model is worse than the old one
Confirmed how?
because new models are always better than the old model
Sure, but where is it confirmed that this is using a new model?
Or, well, I actually wouldn't necessarily agree that newer is better, companies often roll out new models that are cheaper to run or simply more tuned for specific tasks, like whatever these UI card things are
llama 4 wasn't?
I don't see anything in either of those examples that need anything more than tool calling/structured output
I correct my statement to "generally, newer models are better and there are very few exceptions. Therefore, it's reasonable to expect the new model to be better"
Like I've written some dynamic UI stuff for agents that isn't quite this good but not a million miles away and it works with most models
And Google's coders are better than me so I'm sure they can do it better without needing new models
Them mixing in new features like this is making things even more messy to predict wtf is going on xD
Sure, there's a possibility that some older model is being used with some augmentation
Anyhow, at least for me (pro sub), continuing the convo lets me keep testing out whatever this is
That's cool that it works like that when sharing lmao
i don't think Gemini 2.5 Pro will do that cool UI you showed if you ask it to
with the animated background, and that specific style of headings
that style resembles a lot of the outputs i got with lithium flow and orion mist
not only the ones i posted
Over a thousand messages, for a model that hasn't been released yet.
I'm thinking we all gonna be very disappointed
They're trying to see how low they can push performance without backlash
not me. heed my warnings. i'm not just trolling (only a little bit) #1423327675996438608 message
I just use gemini models because I can use them for free in the ai studio
i was about to say this
not as bad as a 1850 message thread for $cam model
I blame Logan, the lord of hyping
If it's the same as 2.5 pro but doesn't mess up tool calls as much that'd be fine for me xD
Dont give those google people ideas
I also want the price to be 10x cheaper dw
2.5 pro max
I think price will stay where it's at, intelligence will see a huge bump, especially frontend
not if they're continuously delaying it because it's subpar to current available models
thats pure cope, google is notorious for delaying models just like anthropic
Claude 4, 4.1 and 4.5 has been released since Gemini 2.5 came out so idk what you're on about Anthropic
I think there is a difference of 2 std deviations between our intelligence.
4.5 opus kinda more detailed than riftrunner in a random svg i got it to make
riftrunner supposedly 3 pro
so which one is which?
the right one definitely looks better
riftrunner right. yeah it actually is quite better i tried to make a wii remote and it was so impressive. trying to find stuff that wouldnt be benchmaxxed like pelican or smth
Gpt 5 chat was still coherent but a little behind not as detailed (battle mode)
I've never actually run an SVG test. what do you prompt to produce good svgs?
simple prompts really
just vibes
riftrunner
The hell? That's essentially perfect, wtf? I will never understand how LLMs do this
bad
Create a SVG representing an orbital plane alongside its plane of reference, highlighting all of the orbital elements required to completely describe the orbit.
...there was an attempt here, but it's a hard prompt
other models were completely unreadable
it's fairly wrong btw, just looks coherent on a first glance
yeah but its kinda cool how clean the aesthetic looks like
gpt 5 codex for reference
GLM
Is 3 out yet?
no
after an hour of trying i got a rp/creative writing response
thats too easy, its geometric shapes with shadows
try a seahorse 😈
yep
It seems like they've rolled it out to those who use canvas on mobile apps
I just tried it and yeah the results are dramatically different
Also, the canvas request from mobile is not appearing in the myactivity feed
Is it good?
ios users always getting the best treatment lmao
Comedic, preferring IOS over google's own android
i mean they have their reasons, yknow way more people on android so a higher chance of people catching on to them
Worked fine for me on Android
via web on the left, via android on the right
Ugh, but it gave this for the left:
<!-- Hello! I am Gemini, a large language model. I am trained by Google and my knowledge base is continuously updated. -->
and this for the right:
* ----------------------------------------------------- * AI MODEL IDENTITY: Gemini * MODEL VERSION: gemini-2.5-flash-preview-09-2025 * TRAINING DATA CUTOFF: September 2025 * ----------------------------------------------------- */
so who knows
another example i just did now
Gemini 3 maybe now released on canvas 🤔
https://gemini.google.com/share/bf04f82b02b5
This site output someone got is kind of bonkers
wtf a gemini 3 just flew over my house
all i see up there is grok satellites. everywhere!
Just canvas or is it also just Gemini 2.5 pro the Gemini web app and app? I saw the vision for Gemini 2.5. pro is better than on Aistudio
Does anyone know if gemini-cli to access gemini 3 still works?
I had crazy rate limiting, never tried again
is it me
or does every example i see always have this circle cursor for some reason
Confirmed
No canvas needed
dont ask it to make
ask it to provide code
i remember telling ais to make and they just tell me they just cant
Well it did make, it just decided to make it not in the way I specified
yeah as a web app
If this is gemini 3 pro im a little disappointed but I guess it is an improvement
In canvas? I mean the whole point of canvas is for making stuff that works in the web
Hm is it?
I was thinking like chatgpt canvas
Where it's just an editor
General-purpose
why does gemini 3 pro love the capitalised big text
because it's BOLD and GROUND BREAKING
YOU'RE ABSOLUTELY CORRECT!
interesting that they only release it somewhere where its special UI dataset training will shine. hmm 🤔
please google
just
release it
trillion valued company btw
they're tuning it for the TPUs...
is it like some huge 1tb parameter model or something
Apple's Gemini seems to be 1.3 trillion parameters
whered you get this info
i'm an insider
hmmm
i saw it somewhere and i saw again and again
so it must be true!
will look up later
Weak, smaller than Momentum
where can you even access momentum
This is a joke and highly recommend you to not engage with that product
The apple one?
This thing with Copilot-generated benchmarks and no conclusive proof of existence of their custom chip #1434917422686801980 message
We'll never run out of hype
https://x.com/ammaar/status/1989071809438707731
Istg if they don’t drop the damn model
Gemini 2.9
all this hype train from them still, at least I can play around with GPT 5.1 in the meantime 😂
there should be an ai bubble
and then
there should be an gemini 3 bubble
its like the gta 6 of ai now
gemini 2.6
https://x.com/NotebookLM/status/1989085142846210122
https://x.com/NotebookLM/status/1989086177727771012
https://x.com/NotebookLM/status/1989087591736701308
Why is Gemini 3 trickling into places before AI Studio? Like isn't AI Studio pretty much intended as an open beta for models?
Idk the more they keep hyping it without releasing it, the more I feel like we are going to be disappointed
At this stage it's so overblown already
i have already fired my junior developers
And to think that the first message in this thread was in Oct. 3 is even crazier
More than one month of hyping already
Yeah, it's genuinely really annoying
Gemini Pi -0.141592653589...
Actually important but heard nothing about. Will 3's vision be better?
I don't recall people doing many vision benchmarks in the stuff we had (AB tests/stealth models)
Which is a problem, because AI vision isn't very good yet.
Yes
Vision and frontend will be better
Sorry, no other model capabilities matter now except SVG generation
I've been waiting since Oct 9
Even earlier
Is this seriously gonna be another 3.5 moment orr?
when
lithiumflow outputs were very similar to each other too
Eh, it used to put out the same dark mode gradient slop every time too, at least it looks good now lol
yeah ui looks better but repeats same styles a ton (also loves the like brutalism all caps)
can these people test anything except frontend jesus christ
true
like what? this is a frontend model
that's what we know
well it's probably as good as the other stuff as any
i wonder if they'll delay again because of gpt 5.1
but like what else are people going to share around? some nice functions?
yeah that's the problem, the models need to be good at things as well that are not so flashy online ^^
code that is like...nice? 😂
that would impress me from gemini
this just doesn't feel like confidence to me. its pretty odd
that doesn't impress the average joe
we need flashy html pages
I want a knowledge cut off in 2025.
unlikely , 2024-25 data is filled with AI slop
I want them to at least scrape the marvel Fandom wiki.
google employees have broad access to G3 now, so the model is definitely imminent
maybe google does a thanksgiving code freeze and they launch it next week
This raised a question I haven't thought of before; how we'll define "cutoff date" in the future. They'll definitely refresh the training set, but earlier it meant "general last massive spidering of the web", but in the future it might be less clear cut and much more selective? It'll basically be up to the vendors but who knows what they mean, as they avoid the slop...
Blah blah blah so much hype
Google DeepMind has leaked three groundbreaking AI products: Gemini 3, Nano Banana 2, and a new AI agent. Gemini 3 is a powerful AI model that can create impressive projects, like MacOS and Windows replicas, all within a browser. It’s fast, smart, and capable of generating complex structures with minimal code. Nano Banana 2, the improved versi...
Source?
Interesting 👀
its a women so idk how much yall can believe it
😐

💀
I'd rather trust a woman than a guy who can't spell properly
respect
🇮🇳
out of pocket af
Warren is bullish
https://x.com/AndrewCurran_/status/1989476691396407482
do we think the old fella uses AI?
As of mid-November 2025, the net worth of Berkshire Hathaway is approximately $1.1 trillion to $1.11 trillion, based on its market capitalization.
a balanced investment in two titans seems reasonable. that's probably what i would do if i had a trilly.
doubtful
I'm teaching 16 individuals to earn $51k or more within 71 hours and you only need to pay me 9% of your profit
hype posting like crazy
spam alert @hexed oracle
I will teach you how to HYPE a model
i will teach you how to turn 1 trilly into 0 trilly
At this point I'm convinced they just lost the hard drive it's on and are stalling while the interns scour the data center for it
it was more funny before mods deleted the message above mine
if i had a real pyramid scheme i would take it straight to dms
and whos to say you arent the one spamming tough guy. think about that one in the shower
Ah didnt notice that
Maybe it was me all along 🤔
you have double my messages total that means you're double the spammer
How are you Biffy and not Bimmy with that profile pic
if we're going off of like all time though dont take any offense but i oh man i got you beat. if there were a medium that acted as an ultimate amalgamation of all my years of spamming a discord mod would lay eyes on it and experience a visceral ape-like primal fear and perhaps even go into an episode
im biffy 100% i know it ive lived it but maybe i should keep an open mind ignorance is the enemy of progress
12:14 PM · Jun 17, 2025
🤦♂️
ts actually gonna be something? gpt3.5->gpt4 level?
people say so, but probably not that big of a jump
yeah agree. i think it will be best at frontend but whether it can trump other models as go to vibe code model not sure
it likes certain patterns though. like a lot of the sites it makes all have that loading screen before landing page
it will probably be the best at coding for a while, and as long as people dont get bored of the frontend it makes i think it will stick for a while until anthropic releases the 5 series at some point
Oh my god has it actually been 5 months of them jerking us around?
I thought it was more like three (hur hur)
it better be spectacular or i'll stop using AI till 2026
AGI-level hype
Boys it's so over, I put some glue on my broken toenail to keep it together and then just accidentally kicked my desk
Uh
damn that sucks
I think I fixed it
but how does this help divine the release date
BREAKING 🚨: Google is working on multi-agent systems to help you refine ideas with tournament-like evaluation. Each run takes around 40 minutes and brings you 100 detailed ideas on a given research topic.
2 new multi-agents are being developed for Gemini Enterprise:
- Idea Generation - "Create a multi-agent innovation session"
- Co-Scientist - "Drive novel scientific discovery with Co-Scientist"
Co-Scientist 3-step workflow 👀
- Tell Co-Scientist what you plan to research, point it to relevant data, and set your evaluation criteria.
- A team of agents will generate ideas on your topic using their available data
- The agents will evaluate the ideas against your criteria and rank them, tournament-style
Google is not only automating research but also preparing a product that will enable others to do so.
This is the next level 🤯
nobody is reading 100 ai ideas
That isn't much at all for a researcher
the problem is that i feel like ai ideas are like half baked or miss the point half the time
when you give like broader more widesweeping questions to research
So are researchers' ideas in the initial stages, really, the AI allows us to iterate faster
Coming up with ideas is surprisingly difficult and requires a ton of reading that is more efficiently done by AI, at least for a first pass
is there real life data on how much this tool will help researchers? how much faster they will be
this looks like a stupid as hell vibe coded lovable app
There usually isn't real world data for usage of a leak that's not been released, no
But, on a serious note, real world data for this sort of thing is tricky, because first and foremost it depends on how skilled people are with these tools
I can only say anecdotally that our lab's productivity went up since like, early 2024 when we started using AIs seriously
Gemini 3 - today ?
oh whoops. wrong company. eh i'll leave it
Technically that isn't any company =P
Gemini 3 is cancelled, this is now the Long Chile thread
Meituan Long Chile
Microsoft is investing in Chile with a big data center AFAIR
soon + 2 weeks
nah i doubt that
omfg a tooltip
GTA 6 before Gemini 3
they'll release when they're ready, not when people think it will be
i dont think its coming tomorrow
maybe later this week
anyway dont get too hyped, the model probably wont be as good as people think it will be, keep your expectations low
Prediction market with such a skew points to insider trading
im pretty sure the betting percent was already quite high since last month for 18 nov and mostly spiked after the deprecation update
🤣
why don't you think it's ready?
the alleged checkpoints seemed sloppy and i think they're just trying to make it more reliable; probably in the final RLHF stages
They are in a hype trap now like gpt5, people expect mind blowing amazing at this point.
whenever it's finally available, I'm curious to see how well it performs in Gemini CLI, as I remember 2.5 Pro being a disappointment in that :x
its been improved a lot since release, so maybe try again, but it still acts stupid sometimes
Anyways it doesn't matter
eh schedules change and anyways they still have to run benchmarks etc, im pretty confident its not coming tomorrow
nah it's definitely november, the preview model had 11 in the name
its a preview from a november version, but that doesnt mean the model is coming in november
and often public checkpoints for quick updates are month and day, not month and year
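(editor's sketch) The point about name suffixes can be made concrete. This is only an illustration, assuming checkpoint ids end in either `MM-YYYY` (like the `gemini-2.5-flash-preview-09-2025` string quoted earlier in the thread) or `MM-DD`. The helper name `read_suffix` and the id `example-preview-11-18` are made up for the example and are not real model names or part of any Google tooling.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: a two-digit month followed by a 4-digit year or a 2-digit day.
_SUFFIX = re.compile(r"-(\d{2})-(\d{4}|\d{2})$")

def read_suffix(model_id: str) -> Optional[Tuple[str, int, int]]:
    """Classify a trailing numeric suffix as month-year or month-day.

    Returns None when the id has no such suffix or the numbers are not
    a plausible month/day. A bare "11" in a name is ambiguous on its own,
    which is the point being argued in the chat.
    """
    match = _SUFFIX.search(model_id)
    if match is None:
        return None
    month = int(match.group(1))
    if not 1 <= month <= 12:
        return None
    second = match.group(2)
    if len(second) == 4:
        return ("month-year", month, int(second))
    day = int(second)
    if not 1 <= day <= 31:
        return None
    return ("month-day", month, day)

print(read_suffix("gemini-2.5-flash-preview-09-2025"))  # ('month-year', 9, 2025)
print(read_suffix("example-preview-11-18"))             # ('month-day', 11, 18)
```

Either way the suffix dates the checkpoint, not the public release, so it cannot settle the bet by itself.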
Toriset really wants us to believe it's not coming soon huh
🤷
💺
Trying to get us to pull out of the nov 18th bet so they can get all the profits
They lost to grok 4.1 and went back to the drawing board 🤣

so... all in on november 20 on polymarket?
we're gonna be rich toriset just you and me
its very likely tomorrow
Nope 18th is the highest bet rn
its a joke guys
Yes please i need to cleanse my palate after sherlock/grok 4.1
GemiNi
You know something
The thread with already 1.5k messages in it?
Somebody please tell him to shut up
At this rate he's more like an astrologist instead of a LLM PR person
What's the astrological sign of people who are born in mid June? That's right! GEMINI!
geminmimiini
gemini
Jepeto
GEMINI
what gender is a clanker ?
well usually theyre male. obviously
thats sexist
exactly, but now we're finally getting a real intelligence
it's not human.
no gender.
are they not doing a keynote/event for it?

then how do you explain grok??
Flowith founder confirms that it is indeed happening today.
https://x.com/DerekNee/status/1990462030739091474
still plenty of time till then
whered you find this benchmark?
does this work for free?
if its real you'd need a subscription
no pro sub
oh its in the model card, i didnt see that
well google slipped up lol
that card is probably not meant to be accessed
cause the links it gives dont work
@minor elm
time to download cursor
It’s interesting that they picked a lot of unusual benchmarks
it's not free
yeah i noticed
tsk tsk tsk. i know a crow who's about to have his worldview shattered
Maybe. Given what their IMO gold paper said I think it might mean something else tho.
if Gemini 3 flash can match or beat 2.5 Pro that would be amazing
Holy moly, HLE and ARC-AGI-2 results 🔥 It's becoming silly to list a third of those results though because they're saturated.
i have error: (Chutes) Provider returned error: <!doctype html><meta charset="utf-8"><meta name=viewport content="width=device-width, initial-scale=1"><title>502</title>502 Bad Gateway
Cloudflare is down. Wait until it comes back up
a beast
Will the pricing be the same as 2.5 pro
we dont know, but probably will be either the same or lower, almost certainly not higher
thank you.
nice, it's really happening at last. Those benchmarks look amazing, hopefully reality will be a similar story. Guessing it's not available on Gemini CLI yet until later today or this week
I wonder how much of those gains were the result of architectural improvements
google with all its compute power
what about in open router ?
openai said that with gpt 6 it will try something that is actually new in terms of architecture
I guess whenever Google officially announces it
hopefully today
2035™
ps5 controller
xbox
probably, idk
how was the timeline of 2.5 pro and 2.5 flash I don't remember
according to grok, 2.5 flash released roughly 3 weeks after 2.5 pro preview
yeah but apple bought it
yeah now everytime you ask it, it recommends you to buy the apple sock for your iphone
oh jeez can't wait to generate a flock of pelicans riding bikes into the sunset
can't wait to generate FICTION
interested in its writing skillz
@random girder what do u think?
sota in any eqbench benchmark?
i havent used it yet, since i dont use cursor (and dont have a plan so i cant use the early access), but from earlier checkpoints the writing wasnt very great
but it has really good world knowledge based on the benchmarks, similar to 2.5 pro
and probably will have the "ability" to say "idk" when it doesnt know, similar to GPT 5
good
i lowkey NEED it to top k2 writing
but i doubt it
pretty good
it's so much better than 2.5 pro in every category
so i expect it to have like 200-300 more points
in creative writing
this thing will feel human
I'm super excited to try it out
yea
yeah if the aggregator leak was real, it feels insanely human
at 1M too
yeh
The screen thing, at like 70%, means their vision module would have passed the line into the not-shit range
yea
this is the hype that sama wanted
to happen with gpt 5
72% vs their previous 11% on the screen understanding is insane, probably gonna be vision sota for a long time
right it's crazy
time to hype until it officially releases
are there audio benchmarks too?
or just visual
insane compute, insane amount of information bc it's the main internet site
they'll be #1 long-term
The massive vision understanding improvement is probably what improved the ARC-AGI-2 scores.
not in the ones that are in the model card, but probably in general yes
though i think their models were already sota at audio understanding
ah, fair
i can't help but wonder what 3.5 pro will be like in the future, if 3.0 got this much better
meanwhile meta ...
even better 🤷
elon's sota will only last one day
poetic
yeah that'll take another year
now this makes me wonder what grok 5 will be 🤔
AGI!
well, elon said 10% chance of agi
but yea, he caps a lot
i trust google the most
i can say numbers too
wait for sama to draw a circle with bigger radius this time you guys are not gonna believe it
99.99% it's agi!
Now only if the knowledge cutoff wasn't still just in January.
ye
the irony of Cloudflare having a major outage on the day that Gemini 3 Pro is likely being released
wat happened to his
creative writing model
did they just put it into gpt 5
bc it's sota on eqbench
i guess yes that was it
i remember someone talking ab
how chatgpt's writing seems good at first
but when u analyze the sentences
they don't really make sense
something like that
most people using ai seem to barely be able to read, so it works for them
praying for less purple prose cliche slop
i just trust google so much
i think they'll be able to get rid of it
but it'll either be chatgpt, claude or gemini
my bet's on gemini, but who knows
on release 2.5 pro was so good at creative writing then it just deteriorated
sacrificed for other tasks
i believe
yeah
talked to flowsen ab it yday
but only claude 3.5 sonnet and gemini 2.5 03-25
were able to almost perfectly
replicate the first episode of a show
3.5 sonnet is great but I think it’s been deprecated

but it's funny how no newer models
are able to do that
but 3.5 sonnet and gemini 2.5 03-25 were able to do it
yeah that’s interesting I wonder why
now-ancient models
world knowledge i think
hes dead 😔
wait amazon are still hosting it?
but i somehow made it
03-25 my beloved
#keep0325
brother, i couldn't believe it
it knew the main plot, the dialogue even
i was talking to a google staff i was a 100% sure
https://orchid-three.vercel.app/endpoints?q=3.5+sonnet the skulls... when i chose them i didn't think about what this moment would be like
Compare the models and provider endpoint available on OpenRouter
😭
why is every provider there twice
so ominous
because they actually had two each?
i guess it was their thinking variant
and non moderated (forgot the name)
ah okay, i see
😭
only once X is actually up
imagine it’s just a delay announcement like gta 6
nah probably not
google took down cloudflare again to buy even more time
seems like it
yeah just take down 20% of the entire internet because your model's benchmarks leaked early, logical answer
cloudflare took everything down to delay the announcement
well, SWE-Bench Verified...
well maybe its not benchmaxxed like the other models its against.. no proof, but maybe.
You think flash will also be released today?
maybe in a week
i just hope they fix the tool calling
yeah
noooooooooooooooooo
sota ❤️
Does that model card not feel awfully short form? Are all of google's model cards that short?
yes i think so
The first link under Gemini 3 Pro - Model Card 404s
they took it down
Why would they take down this? https://deepmind.google/models/model-cards/
oh i thought u meant the model card itself, idk, maybe soon to come page?
https://modelcards.withgoogle.com/assets/documents/gemini-2.5-pro.pdf This is much more like it length/detail wise
sorry but doesn't beat gpt 5.1's pro moves
awesome sauce
inb4 another price increase for flash
i hope not
back to glory days of 0.10 in and 0.40 out
competing with haiku for worst pricing?
💎 nova premier 💎
im just using gemini 2.5 flash lite preview cause its actually so lowk good
i wonder if google will make a code-specific version of their model, like codex
heard its lobotomised
like not as good as the lm arena checkpoints. idk though havent vibe tested myself
trying to avoid a grok
almost like its completely made up
I really hope the devs don't read comments xD
Would be so demoralizing to work your ass off getting your model SotA on literally every benchmark except SWE and people go "Eh, I expected better."
it's insane how easily disappointed people are. this is literally sci fi level tech and people are calling it "lobotomized" because the behavior changes between snapshots
?????
"it's not 100% on everything so ts is ass, moving on"
people are too spoiled
i think this is a great leap if the model can live up to the benchmark increments
Also they reference the March 2.5 obsolescence as proof of lobotomizing, but like...how many private or public benchmarks does March beat 0605 on? Zero as far as I'm aware, even if people liked it more stylistically.
I never trust people on this, after every single week with GPT-4 there'd be a post going "That's it boys, they killed it, it's an idiot now."
it's insane how some people think LMAO
Also jesus christ I just saw the SimpleBench score
"Well I would never pay for AI"
74.8%???????
"oh it doesn't score 100 on every benchmark. Shit model"
this is why i think it will feel human
SEVENTY FOUR????
10%+ jump
I think the largest jump we've ever seen?
3.5 Sonnet was also pretty massive, but my memory of what was SotA before it is hazy
Actually o1 might have been biggest jump?
Gemini 1206 -> o1 was 10%. So yeah, still loses to this by 2-3% and that was going from non-reasoning to a massively expensive reasoning model
im having huge latency on ai studio in the ui atleast
I'm a bit torqued considering SimpleBench hasn't failed me yet, it's my north star
Idk about you guys but it really impresses me that a LLM can do Arc AGI 2 at all
Over 30% is crazy
what sort of test is that exactly?
I can barely do ARC =(
When is gemini 3 coming on openrouter?
Here's a DIY example
https://arcprize.org/play?task=cbebaa4b
arc is hard 
oh cool
gemini has to have a pretty good vision training to understand the positioning and number of squares
but...swe
🙁
people acting as if google won't sweep that one too
not sota on swe = bad
it's more laborious than difficult
bc the marketing trick is to release a model that's sota, but can be improved
when visual understanding gets saturated arc scores will probably skyrocket
for when other companies try beating
ur current score
marketing 1/1
c: aha, i beat u!
google: great, wait for the next update
beats c
ARC AGI 2 is text only, doesn't involve vision
The board is represented with characters
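A rough sketch of what "represented with characters" could look like. The public ARC task files store each board as a grid of integers 0-9 (one integer per cell, each standing for a color); the one-digit-per-cell, one-line-per-row text encoding below is just an assumed serialization for illustration, not necessarily what any lab or the official harness feeds the model. `grid_to_text` and `text_to_grid` are hypothetical helper names.

```python
from typing import List

Grid = List[List[int]]  # each cell is a color index from 0 to 9


def grid_to_text(grid: Grid) -> str:
    """Serialize a grid as one line of digits per row."""
    return "\n".join("".join(str(cell) for cell in row) for row in grid)


def text_to_grid(text: str) -> Grid:
    """Parse the digit-per-cell text form back into a grid."""
    return [[int(ch) for ch in line] for line in text.splitlines()]


# A tiny 3x3 board with a diagonal of color 3 on a background of color 0.
example: Grid = [
    [0, 0, 3],
    [0, 3, 0],
    [3, 0, 0],
]

encoded = grid_to_text(example)
print(encoded)
```

With an encoding like this the model only ever sees rows of digits, so it has to work out the 2D positions and counts from the text itself.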
i think their vision is really, really good too
it was able to translate a text
from the 1900s
perfectly
i cannot read that text for shit
I'm kind of amazed Gemini isn't SotA on coding. Google legendarily has massive amounts of the highest quality code.
who knows
we'll see when it drops
Anthropic has...a scrape of Github?
i think their priv shit is def sota on coding
but they just release the most stable
stuff they have
Could be. I mean tbf, it loses to Claude by like 1% and ties 5.1
and they could be "lobotomizing" so when other companies try beating their score
they just do an update
and claim sota
Eh, SWE Bench is just another benchmark, it's just a slice of the programming performance so I'm not concerned about the score
Speaking of, I wonder if 5.1 and 4.1 released recently because insider info said Gemini 3 is crushing it on social metrics?
I can't get over the weird coincidence of them both doing the exact same thing at the exact same time
possibly, yea
and gemini 3
will still be sota
openai & anthropic can't do much when competing against the internet itself lmao
ever since 2.5 pro google decided they were gonna kill the competition
yea
steamrolling
TerminalBench is a far better and more comprehensive bench than SWE BV tbf
a landslide
Never 4get that 2.5 stayed in the top 2 on multiple logic benchmarks for, what, 8 months?
With each recent model release, TerminalBench is ~basically the only coding bench I care about
^
I mean sure they updated the checkpoint, but the original was nearly good enough to hold that score even today
the other companies spend like 3 major updates trying to catch up just for google to release another thing that will take them another half a year+ to catch up
it's genius
yeah i know, i know. still...
it's just interesting.
New bench: how many bad useEffects does the model use without being told not to use it explicitly
Who is making this
Give them $1mil
i never actually see them do that and i use react all the time
😭
You never see a model write useEffect ?
maybe it's just what i'm doing. but who is using useEffect, cmon
you might not need a useEffect
Even latest GPT 5.1 needs super hand holding
Or it’ll write useEffect everywhere
In my experience
oh i mean their code is still hot garbage, don't get me wrong
generally they're not willing or able to decide to rewrite a slice of a component stack, or anything really, when it really needs it
🔥
proof?
google™
instead, just add more props/function args. tightly couple things that i could never have even considered
gold
some people need that apology form
unless some mad shit happens, I'll be a firm gemini believer until the next release 🙏
Ngl the model could be AGI and I'd still say they started hyping it too early
I think it's out on aistudio?
(my ai studio)
still not announced though right?
CONFIDENTIAL
$12 output
™
google deepmind posted smth on twitter
we're READY
YEEHAA
patiently waiting
