#Gemini 2.5 Pro
The new Gemini 2.5 is still very good if not better for my use cases.
I see people rarely talk about the "hidden" costs of running reasoning models
I mean sure, it's $10/Mtoken, which is cheaper than Sonnet, but Sonnet doesn't think.
It's always been like this. It's super annoying.
damn this new bastard thinks a lot
Crazy drop 😂
Thinks longer than the previous model, but dumber when the task isn't about coding
Just noticed the new 2.5 Pro no longer inserts hidden reasoning when prefilling, which was wonky before.
Understood, here's the summary:
Before: 1k reasoning, 1k response (doubling the output cost)
After: 1k response (faster vs no prefill too, so it's not just not reporting it)
I've been thinking about this a lot. Maybe the providers should not charge for thinking tokens if they are not exposed. The non-exposed nature makes counting and tracking usage very tricky for both developers and users.
i try to account for this in my personal project lmb's price scatter chart by using dubesor's reasoning token usage data as a multiplier for the output cost
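A minimal sketch of that multiplier idea, assuming hidden reasoning tokens are billed at the output rate. The function name and the `reasoningRatio` input (reasoning tokens divided by visible response tokens) are hypothetical, not taken from any provider's API.

```typescript
// Effective price per million *visible* output tokens once hidden reasoning
// tokens are billed as output. reasoningRatio is an assumed, measured input:
// reasoning tokens / visible response tokens.
function effectiveOutputPrice(pricePerMTok: number, reasoningRatio: number): number {
  if (pricePerMTok < 0 || reasoningRatio < 0) {
    throw new RangeError("price and ratio must be non-negative");
  }
  return pricePerMTok * (1 + reasoningRatio);
}

// 1k reasoning + 1k response => ratio 1 => visible output costs twice as much.
const withReasoning = effectiveOutputPrice(10, 1);
// No hidden reasoning (e.g. the prefill case above) => list price.
const withoutReasoning = effectiveOutputPrice(10, 0);
```

With a ratio of 1, the $10/Mtoken list price behaves like $20/Mtoken for the text you actually receive.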
I mean its expensive for them to serve a reasoning model and competitors WILL distill their model
still you got a point there.
streaming a summarized COT seems the only way to combat this
On the UX side, maybe the API can return something like: "<think_stats>Thought for 2.2 seconds using 2548 tokens.</think_stats>" as part of streaming response.
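If an API did emit such a tag, a client could pull the numbers out roughly like this. The `<think_stats>` format is only the proposal from the message above, not something any provider is known to return, and `parseThinkStats` is a hypothetical helper.

```typescript
interface ThinkStats {
  seconds: number;
  tokens: number;
}

// Parses the *proposed* <think_stats> tag; returns null when the chunk has none.
function parseThinkStats(chunk: string): ThinkStats | null {
  const pattern =
    /<think_stats>Thought for ([\d.]+) seconds using (\d+) tokens\.<\/think_stats>/;
  const match = pattern.exec(chunk);
  if (match === null) {
    return null;
  }
  return { seconds: Number(match[1]), tokens: Number(match[2]) };
}

const sample =
  "<think_stats>Thought for 2.2 seconds using 2548 tokens.</think_stats>";
const stats = parseThinkStats(sample);
```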
3.5 is still the best model imo
You are not alone
On my personal evals, Gemini 2.5 Pro is behind GPT-4.1 and Claude 3.5 #1354107710437724221 message
I should really add back Claude 3.5 to my tests, since 3.5 is indeed better than 3.7 in some.
new v3 is slept on
I hope deepseek focuses more on long context
It's also doing well on my eval. Just that humans have a hard time focusing on more than 3 things.
I wish there's only 3 labs putting out SOTA models, but now we have 4.
problem with deepseek v3 is it's slow as fuck on all providers. why use it when gemini 2.5 can hit 500 tps? it really limits its use cases in comparison. I ain't waiting 2 minutes for it to write a dozen lines of code
[funny you mention that...](#general message)
nice, good it's progressing but not immediately useful. no mention of serverless offerings?
I don't have the $2000 a day they want for an endpoint
Honestly DeepSeek V3.1 is not that slow. It can get to 60 tokens per second on Fireworks. On par with Claude 3.5 Sonnet, etc. See my tests here: #1369678362330529875 message
It's just some providers don't optimize it well enough to make it fast.
you're right, some providers were hidden
@restive locust we really need better UX for the provider list. The best providers in terms of speed are sometimes hidden and neglected, which gives a wrong impression of how fast the model can be. 👆
yep good point. you can always sort by throughput!
Guys, if I add $10 billing to open router, do I get 1000 RPD for 2.5 Pro?
Anyone else find Gemini 2.5 Pro not great in practice? It is consistently worse than other SOTA models for me in coding and writing tasks. I mainly care about instruction following and whether the response was concise.
try "be concise" in system instructions, otherwise you're essentially comparing default styles instead of the model's capabilities
thanks. this indeed improves the style to my personal liking. i tend to not mess with system instructions, but this works well enough in normal prompts.
I personally recommend treating default response style and word choices orthogonal to the model performance itself
can change response style, can't change how smart it is
makes sense. maybe not performance, but i'd say still a consideration for personal preference.
well actually adding "be concise" improved response style for coding tasks, but made writing tasks perform way worse, now the response is way too short. so this is not a universal silver bullet for fixing gemini.
i have to think about this more, whether this makes sense and how to go about evaluating the models.
well, there are many things you can put into the system message to control what you get, "be concise" is just an example. It can be "give a response of at least 200 words, but no more than 500" (it won't give the exact number of words, but it'll be longer than with "be concise")
style control is a thing, and instructions for the LLM are in plain English, so simpler to come up than, say, python
imagine LLM as an evil genie - if you don't tell it what to do, it'll do the worst possible way. If you tell it what to do, it'll follow the instruction but in worst possible way. Just like programming - you need to be precise in what you want and cover as many bases as possible.
i am very aware of giving specific instructions. in fact i already had "minimize prose" in all prompts since 2023. this had worked well for all previous models until gemini 2.5 pro came out, which forces me to add "be concise".
every model treats same phrase differently, so you can't have universal prompt that works on every model, despite OpenRouter offering easy switch between them
even model versions in same family will treat same phrases differently
due to the nature of the machine learning, that's inevitable
yeah i am acutely aware of those
so in your case "minimise prose" worked in model 1, but won't work in model 2, and you'll need to find a different phrasing of "minimise prose" that works
"be concise" in my experience is more universal and worked since gpt3.5-turbo, but again, how exactly concise - depends on the model
some treat it as "respond in two words", some treat it as "two paragraphs"
yeah thanks for sharing. i am just thinking about how to evaluate them objectively given this understanding
also location of the instruction is important. system instruction > end of prompt > beginning of prompt
in case you weren't aware
not as relevant for CoT models, but non-CoT models give priority to instructions in the bottom
this is not universal, OpenAI has different recommendations from Anthropic
different recommendations yes, but in practice instruction in the end works better than instruction in the beginning for both openai, anthropic and gemini (all non-cot)
due to how SFT teaches them, that's what they infer:
- question 1
- answer 1
- question 2
- answer 2
- question 3
which question should LLM answer to? 1, 2 or 3? the one in the bottom. Almost all LLM's generalize that for instructions.
Sometimes they generalize that question 3 should take into account question 1, so they also prioritize question 1 + question 3 when answering, therefore instructions at top and bottom have more strength than ones in the middle. openai's cookbook has same recommendation about top+bottom
(This is for non-CoT)
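To compare placements on your own prompts, a small helper like this keeps everything else constant. It is only a sketch: `buildPrompt` and the `Placement` names are made up for illustration, and which placement wins is an empirical question per model.

```typescript
type Placement = "front" | "back" | "both";

// Assembles one user prompt with the instruction before the document,
// after it, or in both positions.
function buildPrompt(instruction: string, document: string, placement: Placement): string {
  const parts: string[] = [];
  if (placement === "front" || placement === "both") {
    parts.push(instruction);
  }
  parts.push(document);
  if (placement === "back" || placement === "both") {
    parts.push(instruction);
  }
  return parts.join("\n\n");
}

const instruction = "Is the headline below clickbait? Answer yes or no.";
const article = "You won't believe what this model did next";
const back = buildPrompt(instruction, article, "back");
const both = buildPrompt(instruction, article, "both");
```

Running the same eval set through all three variants gives a like-for-like comparison of instruction position.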
but because this is thread about gemini 2.5 pro, that doesn't apply, since you can't have non-CoT version of it
but still I think it's a generally good advice and something to look out for when comparing models
thanks for sharing. I have been experimenting with F vs B vs F+B for a while and observed no significant difference.
I've found it to depend on complexity of the prompt. If it's "2+2=?", then it won't matter. If it's something out-of-distribution, then in my experience it's bottom for non-CoT models (in 2023-2024, didn't recheck in 2025).
one example is clickbait detection. gpt-4-0613 consistently detected at better rate, with less false positives, if instruction was after the document.
but that's off-topic for gemini 2.5 pro
your issue still seems to be style control (not enough or too much text)
so I recommend taking a typical task for your usecase, and then make separate style control prompts for each model until responses match. If you want strict template, you can provide that template for it to fill. If you parse programmatically, then you can ask for JSON with JSON schema or JSON template. With JSON make sure to disable penalties and set temperature to 0 wherever you can, or use JSON mode (gemini and openai support that).
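A sketch of what that could look like as a request body, assuming an OpenAI-style chat completions endpoint that accepts `temperature`, `frequency_penalty`, `presence_penalty`, and `response_format`. The per-model style prompts and the fallback are placeholders to be tuned by hand, and whether a given provider honors every field is an assumption.

```typescript
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface JsonChatRequest {
  model: string;
  messages: ChatMessage[];
  temperature: number;
  frequency_penalty: number;
  presence_penalty: number;
  response_format: { type: "json_object" };
}

// Hypothetical per-model style prompts, adjusted until responses match.
const STYLE_PROMPTS = new Map<string, string>([
  ["google/gemini-2.5-pro-exp-03-25", "Be concise."],
]);
const DEFAULT_STYLE = "Minimize prose.";

function buildJsonRequest(model: string, task: string): JsonChatRequest {
  const style = STYLE_PROMPTS.get(model) ?? DEFAULT_STYLE;
  return {
    model,
    messages: [
      { role: "system", content: style + " Reply with a single JSON object." },
      { role: "user", content: task },
    ],
    temperature: 0,        // deterministic-leaning output for parsing
    frequency_penalty: 0,  // penalties can mangle repeated JSON keys
    presence_penalty: 0,
    response_format: { type: "json_object" }, // JSON mode where supported
  };
}

const request = buildJsonRequest(
  "google/gemini-2.5-pro-exp-03-25",
  "Summarize this changelog as {\"summary\": string}.",
);
```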
believe or not, that's exactly what i do as my main job now 😆
Do you mind if I dm you to chat more on the topic?
I don't
is it possible to use a paid version of 03-25 on OR?
i hope google brings back gemini 03-25, the latest snapshot is quite stupid imo
We are working on fully supporting Gemini 2.5 Pro implicit caching, but for now, if you route to AI Studio you will get the implicit cache (read at .625 price, since we are currently implementing context length cache costs)
hey! what's the difference between Google's Vertex and AI studio providers in practice?
how quickly they support new features mostly haha
otherwise basically the same
huh ok! thanks
So the api key from aistudio or one created at gcp ?
ai studio
I’m using it from https://aistudio.google.com/apikey
and doesn’t see any caching
Or is it behind the scene ?
what model are you using? how are you making the calls? how are you checking for caching?
in my testing it works, our data shows it's working, but it's not always going to just happen automatically, it is not very consistent
2.5 pro preview through roo code and checking in my activity tab
So you think it’s maybe fault on roo code side ?
does your activity show you hitting AI Studio or Vertex?
Both
you have to be consistently hitting AI Studio
and if it switches from your key to ours it could break
so doesn't seem to be roo code issue
just tricky to get it to happen consistently until we have full support with cache stickiness
Actually I can’t move from page 1 to page 2 at the activity to go there where it was jumping
Have set it up now. Let’s check
@restive locust this issue is at your side too at the activity while switching sides ?
yes I just flagged to the team
I would prefer using explicit caching, implicit caches are hit or miss
implicits are great. no work required, automatic, no downside. explicit caching with gemini is a lot of work to implement and expensive
why is it expensive? also implicit caches are known to miss, they aren't reliable.
of course they're known to miss you have to start at the same prefix and it only goes as far as your content is static
it's expensive for gemini because you have to rent the cache. it's not just pay for write like anthropic
you need a good amount of usage to justify the explicit cache before it saves you any money
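A rough break-even sketch for that "rent" model. All prices are parameters because none are quoted here; the example numbers are placeholders, not Google's rates, and the one-off cost of writing the cache is ignored.

```typescript
interface CachePricing {
  inputPerMTok: number;       // normal input price, per million tokens
  cachedInputPerMTok: number; // discounted price when tokens are read from cache
  storagePerMTokHour: number; // rent for keeping one million tokens cached for an hour
}

// Number of requests, within the rented period, that must reuse the cached
// prefix before explicit caching is cheaper than sending it uncached.
// The prefix size cancels out, so it does not appear as a parameter.
function breakEvenRequests(pricing: CachePricing, hours: number): number {
  const savingPerRequest = pricing.inputPerMTok - pricing.cachedInputPerMTok;
  if (savingPerRequest <= 0) {
    return Infinity; // no discount means the rent is never recovered
  }
  return (pricing.storagePerMTokHour * hours) / savingPerRequest;
}

// Placeholder prices purely for illustration.
const placeholder: CachePricing = {
  inputPerMTok: 1.0,
  cachedInputPerMTok: 0.25,
  storagePerMTokHour: 4.5,
};
const neededPerHour = breakEvenRequests(placeholder, 1);
```

With these made-up numbers you would need 6 cache-hitting requests per rented hour just to break even, which is the "good amount of usage" point.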
you dont pay for storage in implicit cache?
btw, even with the same prefix you will sometimes miss the cache. It's been a known issue with claude, oai and deepseek
no it does not appear you have to pay for it. it's probably significantly lower TTL than explicit (likely 5min)
Implicit caching is enabled by default for all Gemini 2.5 models. We automatically pass on cost savings if your request hits caches. There is nothing you need to do in order to enable this. It is effective as of May 8th, 2025. The minimum input token count for context caching is 1,024 for 2.5 Flash and 2,048 for 2.5 Pro.
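A quick way to reason about whether two requests could share an implicit cache entry, using the minimums quoted above. This sketch assumes the shared prefix itself has to reach the minimum and that you already have token IDs for both requests; `eligibleForImplicitCache` is a hypothetical helper, and passing it only means a hit is possible, not guaranteed.

```typescript
// Minimum token counts quoted for implicit caching.
const MIN_CACHE_TOKENS = { flash: 1024, pro: 2048 } as const;
type GeminiTier = keyof typeof MIN_CACHE_TOKENS;

// Number of leading tokens two requests have in common.
function sharedPrefixLength(a: readonly number[], b: readonly number[]): number {
  const limit = Math.min(a.length, b.length);
  let i = 0;
  while (i < limit && a[i] === b[i]) {
    i++;
  }
  return i;
}

// True when the static prefix is long enough that a cache hit is possible.
function eligibleForImplicitCache(
  previous: readonly number[],
  next: readonly number[],
  tier: GeminiTier,
): boolean {
  return sharedPrefixLength(previous, next) >= MIN_CACHE_TOKENS[tier];
}
```

The practical takeaway is to keep static content (system prompt, documents) at the front and put anything that changes per request at the end.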
still I'm not going to complain about free, mostly reliable implicit caching as long as it has no downsides
having support for explicit is great as well of course
idk, will have to test
true that
Logan confirmed that there's no storage cost for implicit caching, see https://x.com/OfficialLoganK/status/1920530345553748480 👍
@ClassicMain @mag_pl No storage cost for implicit caching
Interesting
Oh wow, that's really nice to hear, and makes OR's life a lot easier
yes, yes it does
Now all we have to wait for is TTL
i'm testing our pending PR now, hard to say what the TTL is or anything
I'll time it now...
🙏
hmm , makes sense if its implicits TTL is less than explicits
i mean with explicit you set the TTL
by default it's 1hr but you can pay for 1281298219328 hours if you want lol
but if you want thinking summaries then you need to route to vertex 
hey, does anyone know how to change the gemini safety settings so it isn't so aggressive at rejecting input?
before the update my customers' input didn't seem to be a problem, but now it just keeps rejecting their input. thanks if anyone can help me with both Google AI Studio and Google Vertex.
we default to safety settings OFF now btw. If you are getting the "PROHIBITED_CONTENT" finish reason there's no way to adjust the safety settings to prevent that
it means you are being flagged as breaking TOS
stop gooning so much yal
from google:
No worries! It's closer to the latter - it's best effort. For guaranteed cache hits and TTL, we'd recommend using explicit caching
guaranteed TTL ????
if you create your own cache
yeah
you set your own TTL
but obviously you pay for the cache token input price + storage price
implicit has no guaranteed TTL? and we only have the option of a 5 min TTL for explicit through OR
?
we need AGI to parse through all of ORs if and else
implicit has no guaranteed TTL. Explicit through OR has a 5m TTL yes
For the meme, I've counted the if statements in your codebase's .ts files. The grand total is 17,692!
Gemini 2.5 pro via Cline w/ grep
implicit caching with full proper pricing (long context etc) should be live through AI Studio in ~5 mins.
If tool calling can be more consistent with Gemini I'd use it 100%, with zed no Gemini model can do file edits through openrouter, while Claude and 4.1 can do the same through openrouter. It's quite weird
Already feels so much better. 10c calls become 4c calls.
Hahaah
So to confirm , there is no specific TTL for implicit caching ?
yeah
it varies
Seems weird
same thing as openai really
"5-10 minutes of inactivity, though sometimes lasting up to a maximum of one hour during off-peak periods."
Openai has some clue atleast
Does google too ?
not officially
2.5 pro has gone to shit
unnecessary thinking tokens
lol , and the code doesnt even work.
They seem to have focused the fine-tune on coding, which leaves it with less knowledge of wider-domain coding problems
Their previous version is better imo, and right now this feels like a downgrade
context7 fixes this sir?
Ratatatatata
https://developers.googleblog.com/en/gemini-2-5-video-understanding/
The Gemini API now offers a 'low' media resolution parameter enabling Gemini 2.5 Pro to process ~6 hours of video with 2 million token context.
but google, 2.5 pro is still limited to 1M context
@restive locust Caching not working with ai studio after ignoring vertex provider
I mean there's no way for me to debug this, it's not something we do at all
it either works or it doesn't, not really up to me haha
Ok
Implicit caching has dynamic TTL and is approx 6 min
-Logan
yep
meanwhile I've never been able to trigger the implicit cache ever
cache 
Same
@restive locust does it ONLY work in the chatroom, because I'm sending IDENTICAL prompts and there is no cache hit. I assume that if the prompt is exactly the same, there would be a cache hit.
There's nothing OpenRouter is doing for it to work or not work
it's not consistent or guaranteed even
sometimes you have to send the same thing 3, 4, 5 times for it to work
if you keep going on a long multiturn convo, sometimes it works and hits like half of the convo
it also helps if you send the requests pretty quickly one after the other
okay, so that sounds like basically I shouldn't count on it then.
Could use an extension if using ST @floral skiff https://github.com/OneinfinityN7/Cache-Refresh-SillyTavern
That's Claude only.
Won't it work with any models that have a TTL-based cache? Damn
oh my bad I didn't read it (I haven't used it, just remembered someone made it for Claude)
While designed primarily for Claude Sonnet, it works with other models as well.
sounds like it will just send again every x minutes
It literally is unusable ☹️ Slow, overthinking, etc.
We need a thinking budget at least
Yeah it's so slow now lol
Do you have a sample request? Or are you asking about the implicit caching?
The model supports thinking budget including zero budget right? Is it not supported in OR?
thinking budget is only for 2.5 flash afaik
Ah I see. This was not obvious in the docs. As usual.
Vertex AI is quite explicit on only Flash, but Google AI Studio doesn't mention it.
so i tested @google/genai, setting thinkingBudget = 0 for Gemini 2.5 Pro doesn't actually cause any errors, but it indeed doesn't stop the model from thinking. interesting behavior.
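For reference, a sketch of that experiment that builds only the config object and checks whether the budget was honored. The `thinkingConfig.thinkingBudget` shape follows the `@google/genai` request config and `thoughtsTokenCount` is read from the response's usage metadata; the helper names are made up, and the actual API call is left as a comment so the snippet runs without the SDK.

```typescript
interface ThinkingRequestConfig {
  thinkingConfig: { thinkingBudget: number };
}

// Builds the config fragment that requests a given thinking budget.
function withThinkingBudget(budget: number): ThinkingRequestConfig {
  if (!Number.isInteger(budget) || budget < 0) {
    throw new RangeError("thinking budget must be a non-negative integer");
  }
  return { thinkingConfig: { thinkingBudget: budget } };
}

// True when a zero budget was requested but the model still reported
// thinking tokens, i.e. the setting was silently ignored.
function budgetIgnored(requestedBudget: number, thoughtsTokenCount: number): boolean {
  return requestedBudget === 0 && thoughtsTokenCount > 0;
}

const config = withThinkingBudget(0);
// Intended use (not executed here):
//   const res = await ai.models.generateContent({ model, contents, config });
//   budgetIgnored(0, res.usageMetadata?.thoughtsTokenCount ?? 0);
```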
maybe they do plan to support it in the future
Have it within roo code and it’s nearly the same every time. Pricing at OR fits the normal pricing without caching (implicit/explicit).
The tokensize was made down to 2048 for 2.5 pro and 1024 for flash by Google
„To make more requests eligible for cache hits, we reduced the minimum request size for 2.5 Flash to 1024 tokens and 2.5 Pro to 2048 tokens.“
Speed is okay today, but implicit caching is shaky. Had it work for the first few times then suddenly stopped, even during swipes.
I have the impression that new Pro 05-06 consumes slightly more thinking tokens than before
You would be right, and I wish it was "slightly more"
Sigh, I'm disappointed in the new 2.5 pro
Aider bench confirms that it's taking 2x as long to complete each task.
basically the same as my experience
sorry, 3x as long
Seconds per case : 165.3 (new)
Seconds per case : 45.3 (old)
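The arithmetic behind the "3x" correction, assuming 165.3 is the new snapshot and 45.3 is the earlier one:

```typescript
// Seconds per case as quoted above; which figure belongs to which snapshot
// is an assumption based on the "3x as long" remark.
const newSecondsPerCase = 165.3;
const oldSecondsPerCase = 45.3;

// Ratio of new to old time per benchmark case.
function slowdown(newSeconds: number, oldSeconds: number): number {
  if (oldSeconds <= 0) {
    throw new RangeError("old time must be positive");
  }
  return newSeconds / oldSeconds;
}

const ratio = slowdown(newSecondsPerCase, oldSecondsPerCase);
// Rounded to one decimal place for reporting.
const rounded = Math.round(ratio * 10) / 10;
```

That works out to a bit over three and a half times as long per case.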
Yup same experience
cant do much , will have to use this shitter model.
Nice way for google to curb multiple requests and make each requests cost more
Can’t you just go back to exp?
I feel like this version is smarter, at least for coding, but it is way slower than the previous version
Nope , no endpoints for old models.
way slower means more thinking, more cost.
yup, yes endpoints for old models
I am aware, I think I would've preferred some versioning so we can use the older version if we need quick answers
there are no endpoints for old models
there are endpoints for old models
now 2.5 pro is less attractive for my use case
where?
on major platforms
can you give the link?
"OpenRouter Free": {
model: "google/gemini-2.5-pro-exp-03-25",
fixedPrice: OR_PRICE,
},
"Vertex": {
model: "google/gemini-2.5-pro-exp-03-25",
fixedPrice: equivalentPrice(1000),
},
"Google": {
model: "gemini-2.5-pro-exp-03-25",
fixedPrice: equivalentPrice(25),
maxTokens: 250000,
},
heavily limited on openrouter
somewhat limited on ai studio
barely limited on vertex
this is still the new model
03 25
OR didnt fix the name
No, it’s 03 25
it is accessible on all platforms as 03 25 ^
exp didn’t update
@restive locust can you confirm?
But idk why people think it did
You can just go on the website where the model comes from and check
Because logan said there are no endpoints for old models
exp didn’t update
the experimental endpoint does not point to the new model. Only preview
from our vertex rep
Hmm got it
Ugh this model is such a pain to use now
sometimes it thinks for so long that it times out
It's not good choice, they heavily limited it.
I hope someone from OR actually contacts Google and tells them that the updated model is worse than the older one for a lot of people, then asks them to redeploy the older checkpoint.
That would give us 2 endpoints, and exp could go away and be replaced by it.
T_T
gemini 2.5 pro is unusable for me.
see all request have the same cache, (looks like its only a system instruction)
all problem i hate are existing on gemini 2.5 (slow,expensive,nocache)
it seems google Implicit caching are very bad.
the screenshot only shows 4 requests, but i made around 10, and they all have the same cached tokens / usage_cache
exactly
exactly
Logan is working on thinking budget btw
I think they dumbed the model down to save cost and upped the thinking to mitigate some of the retardness
will give us thinking budget to milk more money
Thanks for the info.
yeah , its available for flash they are implementing it for pro
thinking budget? so it will have the option to toggle thinking?
it seems to now be able to skip the thinking process entirely (I have seen this on 05-06 multiple times, but never on 03-25). It makes sense for super mundane question ("hello") but I have seen it do it on more complex stuff, too. Will see how it impacts overall capability.
Yup, experiencing the same thing
any source on this btw?
the newest gemini 2.5 seems to overthink almost everything
Logan said it
gemini 2.5 pro is the first model that could fix overlapping UI elements in an app I gave it
pygame app
Just to make it absolutely clear, https://openrouter.ai/google/gemini-2.5-pro-exp-03-25
This points to the March variant?
yep
it likely won't last though
TBF they alias / forward the march preview endpoint to may
just not the exp

ugh, i hate when they do that. aliasing a generic name (e.g. 2.5 pro preview) is fine, but if I explicitly call 03-25, then an alias to a non 03-25 is dodgy
Don't worry everyone does
At the very least I would like for Google to return the actual model name in modelVersion of the response. I.e. aliasing is fine (as opposed to outright error), but tell us what it's aliased to.
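If the response did carry the real name, detecting an alias on the client would be trivial. A sketch, assuming the served name arrives in a `modelVersion` string and that `models/` or `google/` prefixes should be ignored when comparing; `checkAlias` is a hypothetical helper and the 05-06 name below is only an illustrative value.

```typescript
interface AliasCheck {
  requested: string;
  served: string | null;
  aliased: boolean;
}

// Drops provider-style prefixes so "models/x", "google/x" and "x" compare equal.
function normalizeModelName(name: string): string {
  return name.replace(/^models\//, "").replace(/^google\//, "");
}

// Compares the model you asked for with the one the response says it used.
function checkAlias(requested: string, modelVersion: string | undefined): AliasCheck {
  const served = modelVersion === undefined ? null : modelVersion;
  const aliased =
    served !== null && normalizeModelName(served) !== normalizeModelName(requested);
  return { requested, served, aliased };
}

const forwarded = checkAlias(
  "gemini-2.5-pro-preview-03-25",
  "gemini-2.5-pro-preview-05-06",
);
const direct = checkAlias(
  "google/gemini-2.5-pro-exp-03-25",
  "models/gemini-2.5-pro-exp-03-25",
);
```

Logging the `served` value next to each eval result would also remove the need to cross-reference timestamps later.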
Frustrating part is everyone went silent on this from Google's end
I feel there’s a small team handling most of this and everyone else is just sitting in the dark
makes any data collection a pain in the butt. need to carefully inspect each timestamp and cross reference.
Na it's all hands on deck for I/O so they're probably making mental notes to hopefully have this not repeat going forwards
Well that's the guess anyway
Yours is as good as mine
1 more month pleaseeeeeeeeeeee
so many rate limits, insane
Technically it's still a preview model so Google is entitled to point it to a new version. They won't do it with a stable model I assume.
But I agree Google is lacking experience in terms of rolling out models compared to OAI or Anthropic.
2.5 pro is basically unusable right now
true
i think OR should update the coloring for uptime, i wouldn't consider 96.61% to be green, maybe slightly more yellowish?
lol google ai docs is also down it seems, and dashboard is returning 500
(Google AI Studio) Provider returned error: {"error":{"code":500,"status":"INTERNAL"}}
Hello gemini down for u too?
I am interested in setting the thinking budget to 0 to add iteration speed
so gemini is down rn then?
Ye
for how long?
it's fine here
@restive locust Any solution?
Gemini still out for me too
Gemini is heavily rated limited on OR, also through AI Studio i think
But vertex has been slightly better from my experience
no we're not being rate limited on the preview endpoint
this is a google error unfortunately
Im using preview paid one. No rate limit there
Oh
Why it’s working with directly api access ?
Well OR is getting rate limited on that endpoint (even if you paid)
no we're not
Not 429 issue
Oops
Yeah they aren’t limited on the preview endpoint, my bad
Are you sure ? As it’s related to a timeout error 🤔
Sorry I completely misread it
Why i have 429 return i have account with credits ?
#announcements message
@restive locust it only affects exp or also preview?
Arf okay
New quota ?
Announcement is actually clear, sorry for pressing you with questions but I somehow connected the preview conversation we had with this exp announcement
My bad
no worries! I added a sentence to note that it doesn't impact preview
if you had the question I am sure others would too
well free high demand model, I'm surprised they haven't cut it off already
Shipped implicit caching and reasoning summaries through vertex, model should be more stable now
Nope, but the tasks I use it for are dependent on training data (Godot)
When I do normal webdev I find it's better, but DeepSeek R1 and 3.5 can compete
3.7 is gold too
so pro exp is being deprecated?
not yet
wdym not yet? its only erroring out now with no actual response it seems unusable
seems its down on ai studio @restive locust , also its very rate limited today @carmine spoke
@carmine spoke #announcements message
yes i saw the announcement about it being further rate limited and being down for half a day, plus it seems like they are getting rid of it
i hope not, im still using it on vertex i hope its back up tomorrow on ai studio
i be sad if they completely remove it
I think people come back to using exp because the new version isn't the same as the previous pro preview lol
nope
sadge
they are seeing what they can do, but it does sound like they need the capacity for the paid endpoint
the 429 error now directs you to the paid preview model
"You exceeded your current quota. Please migrate to Gemini 2.5 Pro Preview (models/gemini-2.5-pro-preview-03-25) for higher quota limits."
noooo please give me 1 more week 😔
Could we get a representative to talk with Google so they also deploy the older version of pro preview?
I mean they can deploy multiple Sonnet versions, so is there any reason they can't host multiple pro preview versions?
these are preview models, sonnet models are production models
they are not comparable
yes but this is the message direct from google. not OR. I tried bypassing OR and hitting this as of today or maybe it was yesterday.
if it does get deleted is there any similar free models?
i have no problem with gemini-2.5-pro-exp-03-25 via @google/genai directly. occasionally 429, but otherwise pretty fast response (faster than yesterday in fact)
nvm i take that back, i am getting all 429 now...
😭😭
I'm receiving empty strings as response now.
{"error":{"code":500,"message":"An internal error has occurred. Please retry or report in https://developers.generativeai.google/guide/troubleshooting","status":"INTERNAL"}}
anyone having issues with gemini 2.5 pro tool call? i get random 500 errors but i am not sure what is the issue. same prompt sometimes work but sometimes return 500.
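Since the same prompt sometimes succeeds, retrying transient failures is the usual workaround. A minimal sketch of the decision logic only, assuming 429 and 5xx responses are worth retrying with exponential backoff; the helper names, the base delay, and the cap are arbitrary choices, not provider guidance.

```typescript
// 429 (rate limit / quota) and 5xx (server-side) are treated as transient.
function isRetryable(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

// Exponential backoff: base, 2x base, 4x base, ... capped at capMs.
// attempt is zero-based.
function backoffDelayMs(attempt: number, baseMs: number = 500, capMs: number = 30000): number {
  if (attempt < 0) {
    throw new RangeError("attempt must be non-negative");
  }
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Delays for the first five retries with the default settings.
const schedule: number[] = [];
for (let attempt = 0; attempt < 5; attempt++) {
  schedule.push(backoffDelayMs(attempt));
}
```

Note that a quota-exhausted 429 like the one quoted earlier will not clear with a few seconds of backoff, so a retry limit is still needed.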
Is it tho? From the way things look, it seems like Google has already put it into production
It's already in other Google products, and they've even partnered with some companies on AI coding products.
If we add Google Studio a free API key as a provider, does OpenRouter accumulate its API and that of Google?
how come the pro preview previously didnt have reasoning, and now it does?
Previously they have it but it's on the background
It’s strange that when trying to use exp through OR, you get rate limited with a specific “OpenRouter traffic is heavily throttled” message (even with BYOK)
But when you run through vertex directly you don’t have any issue at all
No 429 on vertex?
Nope
Sometimes a 500 will come up here and there, but mostly smooth sailing
OR on the other side just flat out doesn’t work
nice let me try. i am using the google official sdk, not sure how to get it to work with vertex
I just switched roo to Vertex instead of OR
Vertex has a 10RPM quota across all Gemini experimental models, so OR permanently has no quota for it given the amount of demand.
I just checked, seems 10 RPM yes
429?
Vertex does use a different set of credentials from MLDev it seems, so maybe we can double the quota by using both. 🤔
I’m way over 10 RPM lol
let me try
are you using the same api for vertex as ai studio? or are you authenticating with gcloud cli?
Hello, why is the 2.5 pro exp model blocked for those who have 10 credits, even when we import our own key from Gemini AI Studio?
I’m using what roo code uses under the hood, so let me check
@restive locust
It’s not gonna work through OR, they’re blocked
You can get around it with Vertex (I think, works for me but not verified)
But i have a gemini api key
It doesn’t matter
I also have a BYOK setup and it doesn’t change the outcome
i think it is the same api key for google ai studio and vertex, let me check...
Yes on my account with more than 10$ and add my gemini key it's working
But not on my account with less than 10$
I’m using the “@anthropic-ai/vertex-sdk”
It won’t work for me and I have more than $10
So I don’t think it’s a question of credits
Working for me are u sure ??
I have a return message limited to people who have more than 10$...
But want to use my own api key
Why were integrations blocked
Is blocked
Is Gemini 2.5 Pro available as an API in Google AI Studio? I can only see older flash versions
Google is currently scrubbing all references to 2.5 Pro Experimental from their docs, it's very likely they'll pull it entirely (my guess is during Google IO next week)
@restive locust google just yoinked all quota for all users on experimental on AI Studio
makes sense, launch 2.5 pro officially, remove exp and preview
well preview becomes launched
Idk
2.5 pro is gonna go public at I/O
Vertex still works?
You mean out of experimental / preview into GA?
yeah
I wasn't able to test vertex because I don't have access to vertex express mode (I'm an existing GCP user). Will test out the proper gcloud cli authentication required for vertex api soon.
i just tested via vertex ai and gcloud cli authentication, gemini-2.5-pro-exp-03-25 still works there. but strangely, all the stats (except QPS) are empty.
if you have vertex express mode access (via the same Gemini api key as Google AI Studio), you likely can also use it, though i can't test it.
how to use gemini-2.5-pro-exp-03-25 , get vertex key?
if you have never used GCP before, you should be able to access the express mode and use the same api key as google ai studio: https://cloud.google.com/vertex-ai/generative-ai/docs/start/express-mode/overview
thanks sir i will follow your guide
i wonder how long "paused" means
How use vertex ?
O.o
I'm going to guess until I/O day when they release something else and 2.5 pro is no longer the new hotness
Just use the vertex anthropic sdk it should be pretty easy that way
I still don’t understand why vertex lives in a realm of its own when it comes to models and limits
They should unify ai studio and vertex into a single product
Or atleast the AI side of stuff
How get an api key i dont understand... @midnight venture
2. Open the Google Cloud Console.
3. Create a new Google Cloud project.
4. Enable billing for your newly created Google Cloud project.
5. Enable the Vertex AI API.
6. Enable the Gemini API from the API overview page.
7. In your project dashboard, navigate to APIs & Services → Credentials.
8. Click "Create Credentials" → "API Key".
9. Copy the generated API key and save it securely
Nah..
It's better if they separated, I already see the difference between them
Took this from Reddit, idk how accurate it is
I think there’s some extra steps involving service accounts but I’m not sure tbh
What’s the difference between the two?
Rate limit and capacities.
Then there also things that you need to check and figure it out by yourself, this are the most important difference
From my understanding the differences in rate limits and capacity are minimal
Idk about what the check and figure out part is
Thanks, but I saw a lot of people complaining that their Google invoice jumped after activating billing
I never had that issue, but then again I set a $1 spend limit so idk how google could even charge me lol
I’ve also been using gcloud for like almost 10 years so maybe it’s different for me
I’m still a noob at it but I got a lot of the basics over a long time ago
How did you set a $1 limit? It's very hard to see all the options, Vertex has a lot of things we can do
This looks wrong. For Google cloud project route you don't have any api key, you authenticate via gcp cli. For API key you need to use vertex express mode which I linked docs above.
I never used gcp cli so not sure how that works
You linked vertex AI using anthropic sdk, which is simpler. It also uses gcloud auth application-default login
Anthropic’s Claude models are now generally available through Vertex AI.
I had no idea Claude models were available on Vertex AI. TIL.
So now you can have a first-party model served by another first-party provider that didn't develop the model. 🤯
I didn’t know, that’s cool
As long as you use the free endpoint why would they charge you?
👀 (cropped)
.
whats the most similar model to 2.5 while its down? thats also free, i tried a few but they all seem kinda worse
that's a good question
It's the best out there. There's nothing as good.
I think Claude 3.7 is still extremely popular but I have never tried it.
flash thinking
huh? which one?
T_T
gemini 2.5 flash thinking
not free though isn't it?
comes close, and I mean like 60% of that quality
yeah it's not
yeah he asked for free
ahh nvm
what gui is that for the api
SIllyTavern
ty
The gemini reasonings are worthless
I suspect they arent even summaries of the reasoning
it seems the Gemini 2.5 preview is now so much better compared to last week.
- very smart in coding task (Gemini pro 2.5 vs Claude is like : Gundam vs robocop (claude) )
- Fast ( sometimes it uses reasoning delta, sometimes not)
- Cache is getting better, but there's still room for big improvement; it's still not consistent for at least 5 minutes, most of the time it's only 2-3 minutes
i hope they make the cache renew itself if it's actively being used (like claude)
efficiency-wise Claude is still probably cheaper for long running tasks.
kudos to google team.
yes I agree
thanks
Hearing rumours of Anthropic dropping something next week

But yeah 3.7 can't cut it atm
even grok is dropping a model or two
insane rate limits on this model these days
there's a banner on the model page explaining
isnt that saying just use the paid version the free is dead now?
it's not an ex model yet, it has not ceased to be
the banner does say to use the paid version and twitter does have it paused
hoping its better than 2.5 pro
Think Opus tier
https://www.cnbc.com/2025/05/14/youtube-gemini-ai-feature-will-target-ads-when-viewers-most-engaged.html they had to cuck it a little to spare some compute for the real moneymaker
Wow
Trough full of slop
Vertex still works so this is just for AI studio I assume
yea but vertex is a bit more complicated to sign up for than Ai studio its probably why its still up to get people buying into their stuff
Does 2.5 pro support tool calling now?
Gemini 2.5 preview is faster here too.
They just have had some serious problems. It was very slow.
it does and i think it always did
Using 3.7 in Claude code has been a revelation, it's just so much better and more useful than ones built into editors.
But I think that's more a combination of the software and Claude.
o.o
thanks sir chapel @distant shell
it does help I paid for Claude max, and as such feel like I don't have to worry about nickel-and-diming myself
I've had 3 going at the same time, work + two personal projects
just setup a task list and let them go
as long as you're okay picking up the pieces if shit hits the fan
Curious what's the max duration Claude Code can go on autonomously and continuously, while making progress without human intervention?
Umm, well that kind of depends on how you want to define it. On time, I've let it go for a while (like over 30 minutes) but I can't say to the clock how long it was running because I was doing other things and it may have completed before then. The other aspect is sometimes their AI is congested and slower, so it isn't always 100% active even if still working. But if it's told to keep working, has a clear todo list and actions to complete, and in general has enough user instructions about when it should touch base, it will do quite a bit, for better or worse, on its own. I've queued up a job from 0 with a fleshed out multi document design foundation, had it make a task list and go. It did about 30% of the total on its own by the time I came back in the morning. Mind you that 30% was the bulk of making something useful, well beyond what other tools have done in one shot. It was also not a frontend/web app.
There's no coded limit to what it can do for how long though like cursor and others. It will churn through tokens, and with the max plan, they have their own tracking so that happens at the API layer and not on the client. It's actually quite nice to get full claude capacity, versus other tools that restrict it, which is probably why it feels so dumb.
Nice. So definitely can go beyond 10 minutes on average with a clear todo list?
I'm trying to figure out where Claude Code fits in the spectrum of Devin to Cursor.
its more a partner than a tool on that front
but not fully autonomous, though tbh not that far from it, with some type of meta layer to drive it
I've thought about creating a wrapper that uses gemini or something, and have it think about the big picture and meta stuff, and when claude comes back it does checks and verifies things and then sends claude back out
two cool things about claude code, built in todos (I've been leveraging it to put things in even for myself) and it can batch tool calls, so if it decides it needs to do a bunch of things at once, it can set them up to go, that includes changes and I think it can trigger sub agents, there's been at least one time where it batched a bunch of things, then went "I need to edit this file... looks at file.... oh I guess my batch task already did it..." heh I was like wat?
note I don't have many mcp servers, I have tried context7, it can be useful, but claude can search the web too and that can be just as useful in some cases, the postgres server is useful for having them discover data and schemas themselves when writing code that touches it
I barely explicitly give context (including a file). I almost always just refer to it by name or vague naming, or sometimes just describe what I think it is, much like I would with a coworker, and let it find it itself. If I were paying for the api calls, I might be a bit more conservative, but I'm gonna make that $100 worth it
context7 is good but i think using crawl4ai and building RAG content for your llm might be worth a shot
Quick question here... I tried using google vertex in openrouter but it just doesn't work. Can someone help?
How do I use latest gemini-2.5 pro weights through openrouter? Coz I think the gemini 2.5 pro listed on openrouter points to gemini-2.5-pro-preview-03-25 and not to gemini-2.5-pro-preview-05-06.
you can see what endpoint we use in this icon here
our version points to the latest checkpoint. there is no way to hit the march checkpoint for the preview model
(this is a google limitation)
ohh, great. Thanks for correcting me. @restive locust
no worries!
@restive locust Do you find this most recent Gemini difficult to control, or is it just me? I've invested a lot of effort in carefully guiding it to produce the results in the format I want. Gemini has bothered me more than any other model.
I definitely think the newer reasoning models are worse at instruction following, yeah
I have seen people use gemini to plan / architect, and GPT-4.1 or Sonnet 3.5 to implement
AI Studio has just rolled out batch requests for the Gemini 2.5 and 2.0 series of models
j
2.5 Pro Experimental is officially deprecated
I probably used hundreds of dollars worth of tokens
yeah I did the math. it was a lot of inference. like a lot.
same
We all did
Long live exp
Good model
Looking forward to the next experimental / stealth model!

Vertex preview doesn't point to a different model like the google post says?
Same friend, same
cause gemini is not LOTS OF MONEY..... ill start more video games now...
and enjoy my unemployment payment
and anime
until next vibecoding free sota model
It’s not that easy, everything is pretty thrashed now
Wtf invite me
Same
inv to what
Unemployment payment and video games
ben stop it
Whoaaaa I'm watching it stream thought summaries as a single bolded header plus paragraph every 3 seconds, meaning they're processing their thoughts near real time. There's 35 headers in this one for a total of 14.8k output.
Hi !
Is there any way to make gemini-2.5-pro-preview on OR also give its reasoning tokens ?
No, Google has to provide it and they only provide it to allowlisted large accounts
any similar model to gemini 2.5 03 25 for coding on openai or any other?
So what are the odds we'll see a 3.0 Gemini pro experimental sometime within the month?
maybe in i/o google?
That's pretty much what I was angling around
If OpenAI, Claude and DeepSeek are good examples, it won't be 3.0, but a new checkpoint for 2.5.
gemini-2.5-pro-05-20 or something. GA version, not preview or experimental.
RIP experimental mentions and references are completely gone in the docs.
nooo
Haven't scrubbed it everywhere 😅
Their marketing department is weird. I'm literally using it everyday and I still get ads for Gemini everyday.
gemini-2.5-pro-deepthink 
claude ultrathink vibes
no free tier though
gemini-2.5-ultra-5-20

gemini-2.5-ultra-pro-max-6-09
lol the new Google AI Studio usage tab is classifying 2.5 Pro Preview as 2.5 Pro Exp
It's lifting the data from the quota service, so gemini-2.0-pro-exp maps to gemini-exp-1206, gemini-2.0-pro-exp-02-05 and gemini-2.5-pro-exp-03-25, and gemini-2.5-pro-exp maps to gemini-2.5-pro-preview-03-25 and gemini-2.5-pro-preview-05-06.
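The mapping described above can be written down as a small lookup table. A minimal sketch in Python: `QUOTA_BUCKETS` and `bucket_for` are made-up names for illustration, and the contents are only what the message above lists, not an official Google mapping.

```python
# Quota bucket -> model IDs, exactly as listed in the message above.
# QUOTA_BUCKETS and bucket_for() are hypothetical names for illustration.
QUOTA_BUCKETS = {
    "gemini-2.0-pro-exp": [
        "gemini-exp-1206",
        "gemini-2.0-pro-exp-02-05",
        "gemini-2.5-pro-exp-03-25",
    ],
    "gemini-2.5-pro-exp": [
        "gemini-2.5-pro-preview-03-25",
        "gemini-2.5-pro-preview-05-06",
    ],
}


def bucket_for(model_id):
    """Return the quota bucket a model ID is counted under, or None if unlisted."""
    for bucket, models in QUOTA_BUCKETS.items():
        if model_id in models:
            return bucket
    return None
```

This is why the usage tab would show a preview model under an "exp" label: the lookup returns the bucket name, not the model name.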
i can imagine the Google AI Studio team taking every chance to cut corners and ship fast while the GCP team sighs and shakes head lol
must be a nightmare to get the GenAI APIs working with GCP infra
Looks like Gemini is getting a urlContext built in tool that can fetch the contents of URLs to feed into the model.
the built in googleSearch tool is also getting the ability to specify a time range of results to search
Damn. So many startups killed again.
Might need to think extra hard what AI to build now.
- the built in search tool also now allows specifying a lat/lon location to geolocate searches
- the Google SDKs are getting MCP support
- you will be able to set your own video FPS to sample videos at instead of the fixed 1FPS
- live API is getting multi speaker support
urlContext built in tool is now live.
{"contents":[{"role":"user","parts":[{"text":"Hi there! What are the headlines on https://bbc.com?"}]}],"generationConfig":{"thinkingConfig":{"includeThoughts":true},"temperature":0,"seed":0},"tools":{"urlContext":{}}}
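For anyone scripting this, the same request body can be built as a Python dict and serialized. This is only a sketch of the payload quoted above: the endpoint, model name, and auth are left out because they aren't in the message, and `build_url_context_request` is a hypothetical helper name.

```python
import json


def build_url_context_request(prompt, include_thoughts=True):
    """Build the request body from the message above for a given prompt."""
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"includeThoughts": include_thoughts},
            "temperature": 0,
            "seed": 0,
        },
        # Shape copied as-is from the message; an empty object enables the tool.
        "tools": {"urlContext": {}},
    }


body = build_url_context_request("Hi there! What are the headlines on https://bbc.com?")
payload = json.dumps(body)  # string to send as the POST body
```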
damn
Gemini has the most confusing names, everything else I can follow just fine
Basically
Model + family number + fluff + date
Model is going to be Gemini
The currently relevant family numbers are 2.0 and 2.5
Some fluff we've seen:
- exp: Experimental, free models. Huge no for production use (Google prohibits that in their terms), very few guarantees
- preview: Slightly less experimental, paid models. Still not fit for production use, though at least google doesn't straight up prohibit it
- thinking: Used in the 2.0 family, as these were not hybrid models. There's a separate 2.0 thinking model, unlike 2.5 Flash
- flash: Fast
- pro: Slower but better
- lite: Cheaper and worse than Flash
And then the date is given in format mm-dd. In actual production releases, they may use incremental numbers instead of dates (like they e.g. did with Gemini 1.5 Pro and Gemini 1.5 Pro 002)
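The "Model + family number + fluff + date" breakdown above can be turned into a small parser. This is a rough sketch based only on the scheme as described in this chat, not on any official naming spec; `MODEL_NAME` and `parse_model_name` are hypothetical names, and IDs that don't follow the scheme (like `gemini-exp-1206`) simply come back as `None`.

```python
import re

# model - family - fluff words - optional mm-dd date or 3-digit revision
MODEL_NAME = re.compile(
    r"^(?P<model>gemini)"
    r"-(?P<family>\d+\.\d+)"
    r"-(?P<fluff>[a-z]+(?:-[a-z]+)*)"
    r"(?:-(?P<date>\d{2}-\d{2})|-(?P<revision>\d{3}))?$"
)


def parse_model_name(name):
    """Split a model ID into its parts, or return None if it doesn't fit the scheme."""
    m = MODEL_NAME.match(name)
    if m is None:
        return None
    return {
        "model": m.group("model"),
        "family": m.group("family"),
        "fluff": m.group("fluff").split("-"),  # e.g. ["pro", "preview"]
        "date": m.group("date"),               # "mm-dd" or None
        "revision": m.group("revision"),       # e.g. "002" or None
    }
```

For example, `parse_model_name("gemini-2.5-pro-preview-05-06")` gives family `2.5`, fluff `["pro", "preview"]` and date `05-06`, while `gemini-1.5-pro-002` lands in `revision` instead of `date`.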
thanks kyle
thanks kyle
thanks kyle
Gemini 2.5 Pro is getting audio output today at $20/million audio tokens
Damn
Damn
Damn
🥲
Damn
Damn
They have removed raw thoughts from aistudio, replacing it with summaries only. This is a major bummer 😦
Of course they're doing that
They don't want people stealing reasoning via training 
but I didn't train on it, I just liked reading it, it was valuable content.
That is a bummer.
Unfortunately they called it a feature earlier in I/O

In other news flash 2.5 05-20 better than 2.5 Pro for RP 
The new Gemini Flash seems to have the full thinking in AI studio
How? Did you test this?
lmao flash is trash for RP
unless we doing RP in 2023 with the first LLMs
wow
no thinking budget for 2.5 pro?
2.5 Pro thinking budget is coming in June, closer to when it goes GA
anyone else find that the new pro preview absolutely sucks compared to the march version ? Its such a frustrating model to use now
Yes
Google has made the model dumber by making it tamer
yes you and 95% of the ai community
like @potent coral said enshittification
It's actually quite funny that the older version was smarter. It was actually able to see the bad in good and the good in bad, and didn't totally reject a concept when you argued with it; when it did, it provided a good argument while still being able to see and understand the different views and possibilities.
But the new one just rejects it without even providing a good argument.
so now gemini 2.5 flash thinking is good or o4 mini
they are def pushing more compute into the new shiny model for a few weeks (flash)
so keep that in mind :p although 2.5 pro > flash generally no matter wat imo
yeah for now 2.5 flash very good, in a few weeks it will get the same treatment and become useless again
2.5 pro 03 25 was so fucking good and so fast, i fucking miss it 😭
oh well when 2.5 flash goes to shit ill go with either claude 3.7 thinking or o4 mini/o3 if i still have free tokens
damn this new model is dogshit
the new model is as smart as a rock
thought 120 seconds for a basic python task.
Yep, same experience here
what use now sir
I'm glad im not the only one finding this. my twitter feed was uncharacteristically quiet on the matter but seems a common experience here & on /bard reddit
price to performance IDK. flash is killing it for me still in the usecases where I would want to use flash, but pro sucks at pro usecases... pro now feels like a 10x price flash for 1.2x performance .... when previous iteration was a huge improvement for those usecases
enshittification
definitely feels that way
you're not alone, everyone feels the same
still no thinking budget
Google's going down the Anthropic route of providing signed thoughts to be able to reuse thoughts in subsequent requests.
I just came here investigating the same. We pushed out Gemini pro 2.5 a couple of weeks back and now everything is breaking in production. It is randomly stopping in between a response, refusing to do stuff, and sometimes just filling up the thinking response with repetitive garbage. Shocking move by google. Do you think flash is better than Sonnet 3.5 ?
#1375116913109372968 message
Thanks !
huh wdym?
Having a similar experience. I assume its turbulence before making it GA.
Token speed dropped from 400 to ~100 now, which is an artificial limit. Summaries have much more BS and the model is a bit dumber.
The number of times I've seen the summary repeat the same thing over and over is too high.
Its crazy. Was just trying to confirm what I am seeing . - i am seeing more errors - repetitive garbage, replies being cut short - rather than issues with logic.
RP/ERP has also worsened compared to EXP-03-25 (now with very long contexts it suffers from repetitions).
It seemed strange to me that Google was getting them all right!
welcome to the club, 03 25 was the best they dumb down the model.. now we wait
bro i'm tired of this shit it's insane
they keep ruining good shit
Yes brother, maybe they will release the gemini 3.0 or 2.5 GA soon
big doubt
Hehehe
where is 2.5 thinking budget ?
what mean
not quite sure
I'm curious about what the raw # of calls looks like for this
assuming it stayed pretty constant, this chart seems like great evidence of how 2.5 pro has become such a yapper 😂
interesting how the gemini webapp after deepresearch will offer to generate an infographic - this was one: https://www.jdoodle.com/ih/1HBq
and the prompt is something like this
- Tailwind CSS and Chart.js loaded via CDN.
- The "Brilliant Blues" color palette applied throughout.
- Responsive design with a grid layout for content sections.
- Chart.js visualizations for Context Window Comparison, Architectural Pillars (Doughnut), and MRCR Benchmark Performance. These charts include the required label wrapping for labels longer than 16 characters and the specified tooltip configuration.
- Chart containers are styled according to the requirements (full width of parent, max-width, centered, controlled responsive height).
- HTML/CSS diagrams for the "Thinking Model" paradigm and "Context Caching" process, avoiding SVG and Mermaid JS.
- Content derived from the "Gemini 2.5 Long Context Excellence" report, with introductory paragraphs for each section and explanatory text for all visualizations.
- No SVG or Mermaid JS has been used.
- The output starts with <!DOCTYPE html> and ends with </html>, with no extraneous characters or comments (the planning comments present in the <style> block during generation are not functional HTML/CSS/JS comments and are for context; they wouldn't appear in a rendered page's comment section and are within the rules provided).
could you please fix the link , it aint working
google just converted it into a money printing machine
fixed soz
gemini 2.5 pro degraded a lot in performance
and sadly its still the best
yep
aider benchmarks have been retried
-10%
jeez
why do companies love to do that
they didn't have any issues with capacity
i was really betting on google...
Only theory which makes sense to me is Google realised they hit a ceiling on Gemini improvement, quickly retired the insanely good experimental checkpoint in favour for a lighter counterpart
Next release will be an improved exp checkpoint, so people will feel the exp rush all over again as it will easily crush all other competitors and show a "massive" improvement over previous versions
When in reality its just a better exp checkpoint which was retired early
Maybe they are nerfing the pro model for their upcoming ultra model
So the difference is larger
Maybe they made the old pro the new ultra xD
definitely something along those lines
isnt it better to just quantize the model , make it think more to get back some intelligence and then sell this version?
idk if you can just make a model think extra hard to avoid quantisation loss
google has been doing a lot of work towards quantisation and training, but thats a separate topic and requires training from scratch
On top of that, your flagship model shouldnt ideally be quantised, especially if you're google
where are you seeing that?
ran by Paul himself or other people?
I wouldn't trust it, people get very varied results on aider benchmarks that Paul doesn't get
up to and exceeding 10% often
(I don't know why it varies, but the same thing happened with GPT 4.1 / Quasar, people reporting very different results than what he got)
I ran the bad run, it was just 1 run, we would need more runs to draw conclusions.
anyone has problem with json_schema structured output on gemini? somehow if I use @google/genai directly to aistudio, the JSON response is correct; but using openai sdk via openrouter, the structure got messed up (especially with literals)
any way to get gemini 2.5 pro free on open router like before?
the free version was good, the current version is ass
Is the current version not free? What changed
See above
google wants money now
So no more Gemini for free? I heard new DeepSeek was good
The google gemini (2.5 pro) API is very weird sometimes. One complex prompt, takes almost 2 minutes to complete, gives a very very high quality response. Shortly after, I give it another even harder prompt. Instantly, almost real-time, it replies with a very high quality response. lmao
Thinking isn’t actually very good, there are multiple papers proving this
Experienced this too, it's weird indeed
Yeah... Maybe the tokens per second went BRRR suddenly
The thinking is rough after the update, it used almost 14k tokens one time
RE: thinking, I tried flash without thinking and was getting some weird behavior. I will keep playing with it
Excited for june release of pro thinking budgets
another checkpoint in a few days 
Still no GA?
Source
Pro suddenly aint giving reasoning summaries?
Signs point to 2.5 Flash going GA with the current 05-20 checkpoint, but we're getting another preview model for Pro before GA
I thought @wheat quest you said all your info was from public sources
😄
Honestly thank god, it’s been such a drought dealing with much dumber models like Claude 4 sonnet
It is public, but saying where exactly tends to get things patched (like how Google avoids updates to their open source SDKs after flash thinking got leaked)
True true
at least secondary sources align
1000 requests for free, insane if true
couple of weeks? 😭
whaaaa?
My guess would be it has thinking budget
Seems even people outside of this community also realise how bad of a downgrade the new 2.5 pro is in terms of knowledge and understanding outside of the coding domain.
Makes you wonder if they didn't even do an AB test and instead only looked at benchmarks
or did they purposefully degrade it so deepseek-like models can't use its data?
🥸
🤷
Is it true?
In their own model card they show it dropping on literally every benchmark aside from code.
Honestly every model release has been weird recently, none of them just a straightforward upgrade.
R1 drops on EQBench's creative writing. Then beats the original on long context until 64k where it drops horribly? Maybe a fluke? And that's probably the most uncontested pure upgrade
o.o
The 2.5 Pro upgrade seems to flipflop on like, everything
huh, what upgrade?
you mean 05 06?
Yeah
ah, just wait for the new endpoint in some weeks
according to gosucoder, 05 06 performs a bit better on cline
i sent you a video @runic ibex
I saw his other video on the new R1 but I'll check it out. Trying out windsurf rn, free so why not. Already used cursor and cline
Yet they still decided to release it lol
Hi
Code is important. I think every top lab is mad that Claude is just crushing it and has been for quite a long time now
it's theorized that it's more efficient
Smaller model?
some would say
Their servers were pretty badly under load a while ago, so could be legit
It went up on the UGI knowledge benchmark though, and that's usually positively correlated with model size. So who the hell knows
What's GA?
I hope so but with the kind of rugpull google did with 03-25 people aren't going to trust them until it is stable for what? At least 3 months? 
general availability
General availability. A term that cloud companies use to signal that the product is out of beta and can be used for production workload with SLAs and proper support.
in other words, "give us your money now if you weren't already"
but also they know you were already
Where did the thinking summaries go 😑
anyone having horrible gemini hallucinations today
what the heck all gemini models, especially this one just having a bad time
well if @wheat quest is right, your woes should be over tomorrow :)
can i get some context XD
who is deathmax
new 2.5 pro should be released tmrw
or very soon if not tomorrow
but leaks and semi-public info suggest tmrw
do we know why like
a bunch of the gemini models today in general
have kind of been tweaking
nope
thursday*
not sure if this has been discussed here before but i just run into this https://discord.com/channels/1091220969173028894/1379302807320137728
looks like google turned on thinking for 2.5 flash and now the first tokens streamed are thinking by default
If plans don't change, we'll get a new checkpoint in a few hours.
@vital locust
wen
Looks like model is landing on Thursday instead, with thinking budget support
Thinking budget for 2.5 Pro will be disabled or 64-32768
whats 64-32768?
also , 2.5 flash GA ? Hope they release the model on batch api
batch api is still stuck with 2.0 flash 001
the thinking budget can be set between 64 tokens and 32K tokens.
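If you want to guard against sending an out-of-range value, a clamp is enough. A minimal sketch assuming the 64 to 32768 range quoted above, which comes from a pre-release leak and may not match what actually ships; `clamp_thinking_budget` and the two constants are hypothetical names.

```python
# Range as quoted in the chat above (unconfirmed, pre-release info).
MIN_THINKING_BUDGET = 64
MAX_THINKING_BUDGET = 32768


def clamp_thinking_budget(requested):
    """Force a requested token budget into the quoted 64..32768 range."""
    return max(MIN_THINKING_BUDGET, min(MAX_THINKING_BUDGET, requested))
```

So a request for 0 tokens would become 64, and anything above 32K would be capped at 32768.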
What is this
Found on reddit post
Hmm
#1354107710437724221 message
And from my teaser on another server
wow, benchmaxxed slop?
What is diff-fenced
wait a sec so despite not being an insider, you got it - the api for new gemini is exposed to the public? 😂
👀
insane
i will continue to vouch that he is not making this stuff up 
yeah
give a review, does it think more? Sucks at frontend? Dumber than before?
I've seen independent confirmation of the same as deathmax in another server from an insider
deathmax is not an insider haha
yeah I know
wait deathmax might be an insider here
just saying it confirms that deathmax is the goat :)
they turned it off
calling it fake , puts on deathmax
I bet 100$ toven leaked it
window wasn't open that long
do you just have scripts going monitoring this kind of thing
like how plinny gets the system prompt changes

so... is this more or less completion_tokens than 05-06
because the cost is more
and I'm concerned that it's gonna still be a slow loser
Seconds per case : 45.3
gemini 2.5 pro 03-25...
Seconds per case : 165.3
05-06...
so only a slight improvement over current 05-06
that's quite sad
well this model sucked at everything but coding, so I assume the next model is going to be better at the rest of the tasks with slight coding degradation?
I wouldn't read too much into the test time
this is showing +10 points over 05-06 on both pass 1 / pass 2
why not?
throughput would have been jank given the situation
fair
rip no token count at this point
it is interesting that the total_cost went up assuming costs are accurate... are they raising prices?
🤑
i refused to use 2.5 flash thinking simply due to the thinking tax. I will disenjoy it
disenjoy it so much
I understand you... What's your use case? Do you ever use the 2.5 pro?
honestly with the chatgpt plus subscription i dont rly use anything else
cuz like, why use api if i got a model right there ready to answer, that i already paid for
chatgpt subs is a good vfm
but even if i did mainly use api i'd be pissed to find out there is a price markup for no reason
chatgpt plus that good huh?
Why do reasoning models cost more than non-reasoning ones even though they have the same architecture? This video provides a great explanation!
I am seeing a lot of people confused about why reasoning models cost more than their non-reasoning counterparts even if they share exactly the same architecture. It has everything to do with the fact th...
for the Nth time , there is nothing called as thinking tax
does it go on about token count?
i'm talking about price per token
if it's about context length, it looks like they could just implement context length-specific pricing, like they've already done with 2.5 pro
otherwise i could rack up a lot of context on 2.5 flash non-thinking, and have it cost them just the same, but for some reason they'd be charging me less
it just smells like they have it cost more only because the user gets better results, and nothing else.
just watch the video
I think I answered all the points on the slide on the video
Deepseek has somewhat disproved this
with R1
and them releasing their figures on it
famous deathmax
Google wouldn't be so stupid to fail its community a second time in the last 30 days.
my prediction is its like 03-25
but overfitted
Especially when they have the edge
i mean i think r1 has the edge imo
but eh
btw evidence shows deepseek distilled from gemini lol
i saw an article proving it
2.5 pro has completed a couple coding tasks I gave it that no model that i tested before it was able to
One of them was figuring out which elements in a pygame app overlapped and fixing the UI
i mean for me i like switching models
sometimes o3 or o4-mini can solve a problem r1 cant
sometimes 2.5 pro is better
and sometimes sonnet 4 takes the win
but in general ive been using r1 the most
Did you open source the code somewhere? Would love to take a look.
Aider at this point is probably leaked, reward hacked, overfitted and outdated for agentic flows.
Probably need a aider benchmark V3 to become useful again.
I have a folder called "funnygpt" but i recently cleared it of anything i didnt wish to keep. I will check if that app is still there.
I've stopped using 2.5 pro ENTIRELY recently
the fact that it takes 3+ minutes on many tasks
is insane
even if it gets way better in the next update, if it's not a lot faster, I'm not sure I'll use it!
I read this with trump voice
The recent r1? R1 05 28?
Why you need it to be so fast sir?
3 minutes per little task is insane
Not sure why for me it fast sir
yes
aider benchmark is open source lol
not leaked
Yeah that's what data leak means for pre-training
I remember some benchmarks have hidden or withheld datasets. Can't remember which one.
ARC-AGI being one
Seconded. 2.5 pro can do things no one else can.
I switch between o3, Gemini pro 2.5, r1 and sometimes sonnet 4
It all depends
Pro 2.5 and sonnet 4 explain code better on average
King fall has only 64k context weird…
2.5 pro is the explainor 100%, very easy to follow what it says
not working for me lol
small context -> less resources -> more compute (?)
64k is still a lot tbh
i think just some intern messed up probably, not the actual new 2.5 pro model
and it's gone!
gone
nah why would they label CONFIDENTIAL in a publicly available service lmao
messed up hard or genius marketing
cheap way to generate hype and get attention lol
should have been named Kingfall YOLO 360 noscope GPT Killer x
to scare people even more
sam altman or elon musk would
That’s why we have dubesor!
i dont see it
It’s gone
oh
ah lol
i mean if arc was open source llms prob wouldve gotten like 50%
That makes me wonder if it's an open source model and not Gemini. Like maybe a Gemma-based thinking model perhaps
i think kingfall is not gemma
gemma doesnt support structured output, code execution, etc
unless its gemma 4?
but it seems a bit early
Google has launched exp models with 32-64k context lengths before.
I dont know if this was here before but seems like ai studio now has framerate and time options for video attachments
Google has been cooking on absolutely everything except 05-06 so I'm expecting good things.
o.o
So today is the expected new model?
Well Logan K didn't tweet "Gemini" yet
Which he usually does before releasing something
Thanks for this BTW
so where is this new model at
Is it out ?
not yet
gemini-2.5-pro-preview-06-05 is now rolling out.
Looks great
so you can set thinking budget you just can't turn it off 
👀
Noo
when will it NOT be a preview version?
damn
good thing 2.5 flash can be used in batch api now
05-06 and 06-05 is great naming, guys
LMAO
But we can not disable thinking right?
I tried passing "'extra_json': {'reasoning': {'max_tokens': 0}}", but got a 400 error.
{'error': {'message': 'Provider returned error', 'code': 400, 'metadata': {'raw': '{\n "error": {\n "code": 400,\n "message": "The thinking budget (0) is invalid.",\n "status": "INVALID_ARGUMENT"\n }\n}\n', 'provider_name': 'Google AI Studio'}}
But Google says the new pro model already supports budget control.
Ok, so the 2.5-pro model can not disable thinking just like the flash one?
Alright, I've confirmed this from Google's doc. Thank you.
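Side note for anyone debugging this: the useful part of that 400 is buried in `metadata.raw`, which is itself a JSON string. A small sketch for digging it out, assuming the error shape shown above; `provider_error_message` is a hypothetical helper, not part of any SDK.

```python
import json


def provider_error_message(error_response):
    """Return the upstream provider's message from a nested error, if present.

    Falls back to the top-level message when metadata.raw is missing or
    isn't parseable JSON.
    """
    error = error_response.get("error", {})
    raw = error.get("metadata", {}).get("raw")
    if raw is None:
        return error.get("message")
    try:
        return json.loads(raw)["error"]["message"]
    except (ValueError, KeyError, TypeError):
        return error.get("message")


# The error from the message above, reproduced as a Python dict.
response = {
    "error": {
        "message": "Provider returned error",
        "code": 400,
        "metadata": {
            "raw": '{\n "error": {\n "code": 400,\n "message": "The thinking budget (0) is invalid.",\n "status": "INVALID_ARGUMENT"\n }\n}\n',
            "provider_name": "Google AI Studio",
        },
    }
}

detail = provider_error_message(response)
```

Here `detail` ends up as the Google-side message about the invalid budget rather than the generic "Provider returned error".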
better? worse? can't be worse right?
Lol that's literally the worst thing you could have done. At least it's ISO order.
lol what's the point of thinking mode button if you can't disable thinking lol
Minimum thinking: "Alright, the user wants me to [whatever user just input]." -> [begin response] 🙄
so we can only get one version of gemini 2.5 pro via openrouter, it always points to latest?

