#general | Arena | Page 4

torn mantle Mar 25, 2025, 2:02 PM

#

Actually gemini has a different reasoning approach

#

So it will be interesting

keen beacon Mar 25, 2025, 2:02 PM

#

this seems different compared to flash thinking

gentle plinth Mar 25, 2025, 2:03 PM

#

yeah openai hiding the reasoning chain for which you are paying is just ridiculous

torn mantle Mar 25, 2025, 2:03 PM

#

keen beacon this seems different compared to flash thinking

Probably modified or smth like oai one

keen beacon Mar 25, 2025, 2:03 PM

#

in my personal experience flash thinking had several limitations on ood

brittle tiger Mar 25, 2025, 2:03 PM

#

torn mantle Actually gemini has a different reasoning approach

deepseek reasoing outputs > gemini > openai

for me at least

sick mountain Mar 25, 2025, 2:03 PM

#

flash thinking:

torn mantle Mar 25, 2025, 2:03 PM

#

brittle tiger deepseek reasoing outputs > gemini > openai for me at least

Yea for now

torn mantle Mar 25, 2025, 2:03 PM

#

brittle tiger deepseek reasoing outputs > gemini > openai for me at least

I always return to deepseek just to read the reasoning process tbh

#

Everytime you find interesting stuff

#

I think it may be time to sub to gemini advanced

#

Its not a bad plan

sick mountain Mar 25, 2025, 2:06 PM

#

if 2.5 is SOTA then it is for sure worth it

torn mantle Mar 25, 2025, 2:06 PM

#

And also they improved the deep research

keen beacon Mar 25, 2025, 2:06 PM

#

the gemini models on gemini the product kinda suck tho

#

aistudio is better

sick mountain Mar 25, 2025, 2:07 PM

#

i feel like they've likely toned down the censorship recently no testing tho just vibes

torn mantle Mar 25, 2025, 2:07 PM

#

keen beacon the gemini models on gemini the product kinda suck tho

That's the issue with google, they have great models, imagen 3 & veo 2, but the marketing and implementation is just not it

keen beacon Mar 25, 2025, 2:08 PM

#

holy hell google moves fast now lol

rigid widget Mar 25, 2025, 2:09 PM

#

I'm genuinely surprised that even people within this community are believing the "safety" garbage coming from these artificial intelligence corporations.

#

They can mark any requests they want as dangerous, using the excuse of security. Weapons, porn, drugs, workarounds, modifications... you can mark all of these as potentially dangerous. You can even consider giving information about yt-dlp and ffmpeg as "potentially dangerous," and even "content blockers, VPNs, userscripts" as "potentially dangerous."

calm sequoia Mar 25, 2025, 2:10 PM

#

It was anomaly from the start that Google is not the No. 1 at everything LLM and AI

alpine coral Mar 25, 2025, 2:10 PM

#

rigid widget I'm genuinely surprised that even people within this community are believing the...

im not saying i'm agreeing with it

#

just that safety is not the same as propaganda

north vale Mar 25, 2025, 2:11 PM

#

Fwiw i think the 2.5 thing is prolly real now

alpine coral Mar 25, 2025, 2:11 PM

#

yeah me too

keen beacon Mar 25, 2025, 2:11 PM

#

ya it is

#

its just an unbelievable pace

#

they did gemini 2 sooo fast

#

then pivot to this that quickoly

#

they didnt even release 2.0 pro stable

#

😉

rigid widget Mar 25, 2025, 2:16 PM

#

alpine coral just that safety is not the same as propaganda

please just look at elon musk

silk haven Mar 25, 2025, 2:16 PM

#

I thought that the 2.5 Flash and 2.5 Pro would be released near Google I/O in May, like last year

brittle tiger Mar 25, 2025, 2:16 PM

#

keen beacon they didnt even release 2.0 pro stable

would make sense for this to be today. bc yea 2.5 before GA 2.0 Pro would be funny

alpine coral Mar 25, 2025, 2:17 PM

#

rigid widget please just look at elon musk

tbf this is a good point lol

keen beacon Mar 25, 2025, 2:17 PM

#

brittle tiger would make sense for this to be today. bc yea 2.5 before GA 2.0 Pro would be fun...

i dont think so tbh. i think theyre skipping it

silk haven Mar 25, 2025, 2:17 PM

#

Google's Sergey Brin: Google’s AI products “are overrun with filters and punts of various kinds.” -> Google’s co-founder tells AI staff to stop ‘building nanny products’

Google’s AI products “are overrun with filters and punts of various kinds.” According to Brin, Google needs to “trust our users” and “can’t keep building nanny products.”

silk haven Mar 25, 2025, 2:18 PM

#

silk haven Google's Sergey Brin: Google’s AI products “are overrun with filters and punts o...

Last month

keen beacon Mar 25, 2025, 2:18 PM

#

silk haven Google's Sergey Brin: Google’s AI products “are overrun with filters and punts o...

that certainly didnt stop them from developing stuff (other than safety filters) this fast lol

north vale Mar 25, 2025, 2:19 PM

#

silk haven Google's Sergey Brin: Google’s AI products “are overrun with filters and punts o...

Very Bullish

leaden palm Mar 25, 2025, 2:20 PM

#

top 5 google naming fails of all time

north vale Mar 25, 2025, 2:20 PM

#

Yeah this doesnt make any sense

#

Surely they add thinking to the name

keen beacon Mar 25, 2025, 2:20 PM

#

cot models will probably be the default going forward

north vale Mar 25, 2025, 2:21 PM

#

Nah

keen beacon Mar 25, 2025, 2:21 PM

#

north vale Nah

this is what openai says

#

they wont release a non cot model anymore

leaden palm Mar 25, 2025, 2:21 PM

#

well gpt-5 won't use chain of thought if unneeded as i understand

keen beacon Mar 25, 2025, 2:22 PM

#

leaden palm well gpt-5 won't use chain of thought if unneeded as i understand

yea but its gonna be a hybrid like sonnet 3.7 presumably

north vale Mar 25, 2025, 2:22 PM

#

Idk if it’s hybrid and mostly doesnt use reasoning for most completions it doesnt count as cot model to me

#

So ig depends how u look at it

#

Maybe models with cot capability will be defaukt

#

But asking it how are u and it reasoning about it before answering will not be default

#

Bc that’s useless

keen beacon Mar 25, 2025, 2:23 PM

#

silk haven Mar 25, 2025, 2:23 PM

#

keen beacon that certainly didnt stop them from developing stuff (other than safety filters)...

Sergey Brin full note:

“It has been 2 years of the Gemini program and GDM. We have come a long way in that time with many efforts we should feel very proud of. At the same time competition has accelerated immensely and the final race to AGI is afoot. I think we have all the ingredients to win this race but we are going to have to turbocharge our efforts.

Code matters most — AGI will happen with takeoff, when the Al improves itself. Probably initially it will be with a lot of human help so the most important is our code performance. Furthermore this needs to work on our own 1p code. We have to be the most efficient coder and Al scientists in the world by using our own Al.

Productivity — In my experience about 60 hours a week is the sweet spot of productivity. Some folks put in a lot more but can burn out or lose creativity. A number of folks work less than 60 hours and a small number put in the bare minimum to get by. This last group is not only unproductive but also can be highly demoralizing to everyone else.

Location — It is important to work in the office because physically being together is far more effective for communication than gve etc. And, therefore you need to be physically colocated with others working on the same thing. We need to minimize reporting lines across countries, cities, and buildings. I recommend being in the office at least every week day.

Organization — We need to have clear responsibility and organization with high functioning groups with shared management and technology leadership.

Simplicity — Lets use simple solutions where we can. Eg if prompting works, just do that, don’t posttrain a separate model. No unnecessary technical complexities (such as lora). Ideally we will truly have one recipe and one model which can simply be prompted for different uses.

Excellence — whether it’s an eval or a data source or a dashboard or a message in an internal Ul, please make sure they all work and all are good.

rigid widget Mar 25, 2025, 2:23 PM

#

keen beacon they didnt even release 2.0 pro stable

because thanks to AI Studio, they want to create models that are constantly being tested with new data and are always getting better. They don't want to offer something as "stable" without doing something really big.

silk haven Mar 25, 2025, 2:23 PM

#

silk haven Sergey Brin full note: “It has been 2 years of the Gemini program and GDM. We h...

Speed — we need our products, models, internal tools to be fast. Can’t wait 20 minutes to run a bit of python on borg.

Iterate at small scale — we need lots of ideas that we can test quickly. The best way to do this is small scale experiments until you can ramp up and hopefully see increasing advantage at scale. This is an excellent validation. Working too much at just large scale has a habit of minor tweaking and overfitting to evals, checkpoint sniping, etc. We need real wins that scale.

No punting — we can’t keep building nanny products. Our products are overrun with filters and punts of various kinds. We need capable products and [to] trust our users.“

north vale Mar 25, 2025, 2:24 PM

#

https://x.com/testingcatalog/status/1904539290899533838?s=46 lol is this real chat

TestingCatalog News 🗞 (@testingcatalog) on X

Is OpenAI ready with GPT-5? 👀👀👀

silk haven Mar 25, 2025, 2:24 PM

#

https://www.theverge.com/command-line-newsletter/622045/google-ai-nanny-products

The Verge

Google’s cofounder tells AI staff to stop ‘building nanny produ...

He also thinks they should be working 60-hour weeks to build AGI.

rigid widget Mar 25, 2025, 2:26 PM

#

Folks, please resist the hype and be patient. Real-world tests are consistently the most crucial.

keen beacon Mar 25, 2025, 2:27 PM

#

well people have been testing nebula here for a while and its been good

#

^

#

although it could be possible phantom is 2.5 pro exp and nebula is something else, or vice versa

pure nova Mar 25, 2025, 2:29 PM

#

I cant believe they release nebula already

sick mountain Mar 25, 2025, 2:29 PM

#

how long has it been in lmarena?

keen beacon Mar 25, 2025, 2:29 PM

#

GDM employees have been hinting it nonstop for the last day or two

pure nova Mar 25, 2025, 2:29 PM

#

Its in aistudio

keen beacon Mar 25, 2025, 2:29 PM

#

sick mountain how long has it been in lmarena?

~5 days

keen beacon Mar 25, 2025, 2:29 PM

#

keen beacon although it could be possible phantom is 2.5 pro exp and nebula is something els...

specter/phantom/nebula i think are the same

#

maybe different temperatures?

brittle tiger Mar 25, 2025, 2:30 PM

#

north vale https://x.com/testingcatalog/status/1904539290899533838?s=46 lol is this real ch...

looks very fake

sick mountain Mar 25, 2025, 2:30 PM

#

it is fake

keen beacon Mar 25, 2025, 2:30 PM

#

keen beacon maybe different temperatures?

dont think it would be noteworthy enough to split into different names i think

#

just different revisions

brittle tiger Mar 25, 2025, 2:30 PM

#

pure nova Its in aistudio

screencap?

keen beacon Mar 25, 2025, 2:30 PM

#

pure nova Its in aistudio

wrong

pure nova Mar 25, 2025, 2:30 PM

#

Im in the uk and i have it

keen beacon Mar 25, 2025, 2:30 PM

#

send a screenshot

pure nova Mar 25, 2025, 2:31 PM

#

keen beacon Mar 25, 2025, 2:31 PM

#

lol no

north vale Mar 25, 2025, 2:31 PM

#

brittle tiger looks very fake

Agreed

sick mountain Mar 25, 2025, 2:31 PM

#

lmao

keen beacon Mar 25, 2025, 2:31 PM

#

it's not actually called nevila

#

nebula

pure nova Mar 25, 2025, 2:31 PM

#

thats what it says on my studio

keen beacon Mar 25, 2025, 2:31 PM

#

right

#

🙄

#

u changed pro to nebula lol

#

with inspect element

#

its supposed to be 2.5 anyway

pure nova Mar 25, 2025, 2:32 PM

#

03-25

#

yeah thats what it says

#

hmm thats odd 🤔

brittle tiger Mar 25, 2025, 2:32 PM

#

I'll feel bad for my laugh emoji if you aren't BSing but that would be pretty strange

rigid widget Mar 25, 2025, 2:36 PM

#

pure nova yeah thats what it says

bruh nebula it's anon name

pure nova Mar 25, 2025, 2:41 PM

#

im not sure

#

thats what its saying on my studio

#

oh its out on gemini now

brittle tiger Mar 25, 2025, 2:56 PM

#

lfg

pure nova Mar 25, 2025, 2:56 PM

#

so any announcement or any news / changes for this ?

#

when can we expect official benchmarks

sick mountain Mar 25, 2025, 2:56 PM

#

probably at 11 am or 12 pm est

north vale Mar 25, 2025, 2:56 PM

#

ok but fr why is polymarket not summing up to 100%

#

ik they all resolve no from a tie but a tie seems very unlikely?

#

it seems priced at 10%+ rn

pure nova Mar 25, 2025, 2:58 PM

#

holy f

#

2.5 is so good wtf

#

every other ai model i asked, it just made complete ass

#

but 2.5 nailed it

sick mountain Mar 25, 2025, 2:59 PM

#

prompt?

pure nova Mar 25, 2025, 2:59 PM

#

#

it was legit the simplest html website

#

and every other model couldnt do it for its life

barren prairie Mar 25, 2025, 3:13 PM

#

pure nova yeah thats what it says

I don t have it on my google ai studio 🥲🥲🥲🥲🥲😥😥😥

scarlet flint Mar 25, 2025, 3:13 PM

#

have you tried deepseek?

#

i saw on internet today

#

that they released new model and its very good

scarlet flint Mar 25, 2025, 3:13 PM

#

pure nova

Can you share picture in higher resolution?

scarlet flint Mar 25, 2025, 3:13 PM

#

barren prairie I don t have it on my google ai studio 🥲🥲🥲🥲🥲😥😥😥

me too

rigid widget Mar 25, 2025, 3:16 PM

#

scarlet flint have you tried deepseek?

of course and too much soon i release my comparison for too many tasks

pure nova Mar 25, 2025, 3:24 PM

#

its my own website?

scarlet flint Mar 25, 2025, 3:26 PM

#

pure nova its my own website?

can you share the image? i wanted to try it on other models?

pure nova Mar 25, 2025, 3:26 PM

#

scarlet flint Mar 25, 2025, 3:27 PM

#

thanks

#

yeah newest deepseek model can't replicate that

#

not even close

elder rapids Mar 25, 2025, 3:30 PM

#

they must be super confident

#

2.5 is crazy

scarlet flint Mar 25, 2025, 3:31 PM

#

yeah

pure nova Mar 25, 2025, 3:31 PM

#

scarlet flint yeah newest deepseek model can't replicate that

Gemini did it perfectly pretty much

#

try to plug it in gemini

#

compare it with me

scarlet flint Mar 25, 2025, 3:32 PM

#

pure nova Gemini did it perfectly pretty much

i don't have the pro model

elder rapids Mar 25, 2025, 3:32 PM

#

saying it because I read the previous messages

#

so

pure nova Mar 25, 2025, 3:32 PM

#

scarlet flint i don't have the pro model

show me what deepseek gave to ou

#

this is what i got fm gemini

north vale Mar 25, 2025, 3:32 PM

#

if anyone wants to run a prompt and doesn't have access, i got access to 2.5 pro

elder rapids Mar 25, 2025, 3:32 PM

#

I have access too

#

it's pretty good, but it's nerfed

pure nova Mar 25, 2025, 3:33 PM

#

elder rapids it's pretty good, but it's nerfed

how

elder rapids Mar 25, 2025, 3:33 PM

#

they're not allowing it to think long enough

#

lmao

#

it's cutting short its own CoT

scarlet flint Mar 25, 2025, 3:33 PM

#

pure nova show me what deepseek gave to ou

pure nova Mar 25, 2025, 3:33 PM

#

yeah wtf

#

compared to gemini

#

its dogshit

elder rapids Mar 25, 2025, 3:33 PM

#

alright what's going on tho

#

2.5 is insane

pure nova Mar 25, 2025, 3:34 PM

#

wow how the hell did deepseek mess it up THAT bad

elder rapids Mar 25, 2025, 3:34 PM

#

for a lot of tasks

#

but they're cutting off the cot

loud leaf Mar 25, 2025, 3:34 PM

#

nebula = 2.5 pro?

elder rapids Mar 25, 2025, 3:34 PM

#

I think so yeah

north vale Mar 25, 2025, 3:34 PM

#

ye

#

prolly

pure nova Mar 25, 2025, 3:34 PM

#

2.5 can also do geogussr

#

its cool

#

i asked it and it got it right first attempt

scarlet flint Mar 25, 2025, 3:34 PM

#

pure nova wow how the hell did deepseek mess it up THAT bad

i think its bad to replicate images

#

i've tested it on my JFrame design in java

#

and it upgraded it heavly

elder rapids Mar 25, 2025, 3:36 PM

#

2.5 has to be SOTA

#

I'm asking it to think longer

#

and it's actually getting these puzzles right lmfao

rigid widget Mar 25, 2025, 3:36 PM

#

pure nova its my own website?

joke bro

elder rapids Mar 25, 2025, 3:36 PM

#

flash doesn't think longer when I ask it to either

pure nova Mar 25, 2025, 3:37 PM

#

elder rapids I'm asking it to think longer

ask it to think for a certain amt of time eg 30sec

brittle tiger Mar 25, 2025, 3:38 PM

#

I want it out in AI studio. have access in gemini but ai studio better for testing when not hooked up to memories and apps like in main app

elder rapids Mar 25, 2025, 3:39 PM

#

are they gonna add 2.5 to AI studio?

north vale Mar 25, 2025, 3:39 PM

#

likely

rigid widget Mar 25, 2025, 3:39 PM

#

left to right: deepseekv3, gemini2.0pro, claude3.7sonnet, deepseekv3-0324

Screenshot_2025-03-25-18-24-48-695_org.mozilla.firefox.png

Screenshot_2025-03-25-18-24-58-443_org.mozilla.firefox.png

keen beacon Mar 25, 2025, 3:39 PM

#

elder rapids it's pretty good, but it's nerfed

system prompt related, this has always been an issue with models on gemini

#

wait for it to launch on ai studio

#

Gemini version is weaker

elder rapids Mar 25, 2025, 3:40 PM

#

ye

#

has to be affecting the thinking length

scarlet flint Mar 25, 2025, 3:41 PM

#

pure nova wow how the hell did deepseek mess it up THAT bad

i aksed it to upgrade the design

📎 index.html

rigid widget Mar 25, 2025, 3:42 PM

#

keen beacon system prompt related, this has always been an issue with models on gemini

By the way, Gemini didn't do that thing we talked about earlier, they've clearly restricted it.

pure nova Mar 25, 2025, 3:43 PM

#

scarlet flint i aksed it to upgrade the design

yes but its not about that its about copying the image

keen beacon Mar 25, 2025, 3:43 PM

#

4o image gen coming

pure nova Mar 25, 2025, 3:43 PM

#

gemini could dothat too

rigid widget Mar 25, 2025, 3:43 PM

#

My English is not very good. Which poem is better?

Screenshot_2025-03-25-16-34-08-862_md.obsidian.png

rigid widget Mar 25, 2025, 3:44 PM

#

keen beacon 4o image gen coming

Dall-e 3 is totally garbage let's see

scarlet flint Mar 25, 2025, 3:46 PM

#

pure nova yes but its not about that its about copying the image

yeah i know, can you show me version of your page upgraded by the gemini 2.5

#

?

#

i don't have access to it since i don't have the subscription i guess

keen beacon Mar 25, 2025, 3:46 PM

#

rigid widget Dall-e 3 is totally garbage let's see

my theory is that it's upgraded compared to the initial preview of 4o image gen

#

otherwise it just looks embarrassing

elder rapids Mar 25, 2025, 3:46 PM

#

what if 2.5 pro is an entirely different and new model

#

it doesn't say it's thinking

gentle plinth Mar 25, 2025, 3:47 PM

#

keen beacon Mar 25, 2025, 3:47 PM

#

yeah we know

elder rapids Mar 25, 2025, 3:47 PM

#

nah what I mean is

keen beacon Mar 25, 2025, 3:47 PM

#

qwen 3 on thursday 👀 apparently

scarlet flint Mar 25, 2025, 3:47 PM

#

elder rapids it doesn't say it's thinking

there was this new thinking method for models

#

more efficient

elder rapids Mar 25, 2025, 3:47 PM

#

what if it's better for context

#

that all the other models

scarlet flint Mar 25, 2025, 3:48 PM

#

scarlet flint there was this new thinking method for models

draft something

pure nova Mar 25, 2025, 3:48 PM

#

scarlet flint yeah i know, can you show me version of your page upgraded by the gemini 2.5

📎 index.html 📎 style.css

scarlet flint Mar 25, 2025, 3:48 PM

#

and its like internal thinking

elder rapids Mar 25, 2025, 3:48 PM

#

ye

pure nova Mar 25, 2025, 3:48 PM

#

its not as good as deepseek but all i said was upgrade the page

scarlet flint Mar 25, 2025, 3:48 PM

#

pure nova its not as good as deepseek but all i said was upgrade the page

yeah its nice

#

it kept the style of original page

pure nova Mar 25, 2025, 3:49 PM

#

yeah

elder rapids Mar 25, 2025, 3:49 PM

#

I'm predicting it becomes the best long context reasoner

#

by a large margin

scarlet flint Mar 25, 2025, 3:49 PM

#

scarlet flint it kept the style of original page

chain-of-draft

keen beacon Mar 25, 2025, 3:50 PM

#

keen beacon qwen 3 on thursday 👀 apparently

great week for acceleration

#

hopefully we get o3 soon given it's been threatened

pure nova Mar 25, 2025, 3:51 PM

#

i wonder if 2.5 pro is better than 3.7 thinking at its max both

rigid widget Mar 25, 2025, 3:51 PM

#

keen beacon my theory is that it's upgraded compared to the initial preview of 4o image gen

As long as there is that "security" nonsense I mentioned before, they can never compete with others in image generation.

silk haven Mar 25, 2025, 3:51 PM

#

https://x.com/officiallogank/status/1904559860378915127?s=46&t=P8-tRi_JAVcI6l5U6nOT4A

Logan Kilpatrick (@OfficialLoganK) on X

🌌🥎👍

keen beacon Mar 25, 2025, 3:51 PM

#

https://x.com/OfficialLoganK/status/1904561688134967357?t=zVeue3sku3MQJM3XIRKcyA&s=19 LOL

Logan Kilpatrick (@OfficialLoganK) on X

@OpenAI : )

rigid widget Mar 25, 2025, 3:51 PM

#

rigid widget My English is not very good. Which poem is better?

can anyone help?

keen beacon Mar 25, 2025, 3:52 PM

#

they potentially have a year lead on native image gen though

pure nova Mar 25, 2025, 3:52 PM

#

silk haven https://x.com/officiallogank/status/1904559860378915127?s=46&t=P8-tRi_JAVcI6l5U6...

whats the prompt for 2.5 to do this

#

oh wait that is 2.5 right

elder rapids Mar 25, 2025, 3:53 PM

#

pure nova i wonder if 2.5 pro is better than 3.7 thinking at its max both

from my experiments with nebula

#

i think it's better

#

but we'll have to wait and see for the un nerfed version

pure nova Mar 25, 2025, 3:54 PM

#

how are u so sure that this is nerfed at all

willow grail Mar 25, 2025, 3:55 PM

#

keen beacon https://x.com/OfficialLoganK/status/1904561688134967357?t=zVeue3sku3MQJM3XIRKcyA...

finally gpt4o releases today

rigid widget Mar 25, 2025, 3:55 PM

#

Can someone who speaks English help me?

pure nova Mar 25, 2025, 3:55 PM

#

lol no way
i just asked gemini 2.5
Simulate a gravity-affected ball bouncing inside a rotating square using Python, with realistic velocity, collision, and rotation-aware physics.
and it gave me a syntax error

#

insane

scarlet flint Mar 25, 2025, 3:56 PM

#

i see that google has this tendency to release very good models from time to time like that experimental model that got almost to the top of leaderboard, i think it was sitting on second place, from my tests it was better than 2.0 flash they released into gemini website

elder rapids Mar 25, 2025, 3:56 PM

#

pure nova how are u so sure that this is nerfed at all

because it's worse than nebula by a ton and doesn't think long enough

#

either it's not nebula at all

#

or it's nerfed

#

only possible cases

keen beacon Mar 25, 2025, 3:57 PM

#

just wait for the release on aistudio 🙈

elder rapids Mar 25, 2025, 3:57 PM

#

https://twitter.com/Oimachi6020_En/status/1904555690246685046 is this true

Oimachi@𝕏 (@Oimachi6020_En) on X

🚨BREAKING🚨
Google is preparing to release up to four models in addition to the Gemini 2.5 Pro

The Gemini website has already prepared it🔥

pure nova Mar 25, 2025, 3:57 PM

#

braindead?

elder rapids Mar 25, 2025, 3:58 PM

#

keen beacon just wait for the release on aistudio 🙈

ong

rigid widget Mar 25, 2025, 3:58 PM

#

elder rapids but we'll have to wait and see for the un nerfed version

bro this is the unnerfed version, we will see the nerfed version in the future.

elder rapids Mar 25, 2025, 3:58 PM

#

dawg

#

if it's saying things differently, not reasoning as long as nebula

#

it's nerfed

#

this isn't skepticism or anything

#

😭

pure nova Mar 25, 2025, 4:00 PM

#

holy sht every question i ask it im getting syntax error

rigid widget Mar 25, 2025, 4:00 PM

#

rigid widget My English is not very good. Which poem is better?

did i shadow banned? Why doesn't anyone see what I post?

pure nova Mar 25, 2025, 4:00 PM

#

write a Python program that shows a ball bouncing inside a spinning hexagon. The ball should be affected by gravity and friction, and it must bounce off the rotating walls realistically
and
Simulate a gravity-affected ball bouncing inside a rotating square using Python, with realistic velocity, collision, and rotation-aware physics.

keen beacon Mar 25, 2025, 4:00 PM

#

the gemini product models suck

scarlet flint Mar 25, 2025, 4:00 PM

#

📎 test.py

elder rapids Mar 25, 2025, 4:00 PM

#

pure nova holy sht every question i ask it im getting syntax error

apparently it's unstable idk

keen beacon Mar 25, 2025, 4:00 PM

#

the aistudio release will be good

scarlet flint Mar 25, 2025, 4:00 PM

#

pure nova lol no way i just asked gemini 2.5 Simulate a gravity-affected ball bouncing ins...

i used your prompt in deepseek

scarlet flint Mar 25, 2025, 4:01 PM

#

scarlet flint

this is what he gave me

#

it loooks like

#

animation

#

or something XD

#

always the same

sick mountain Mar 25, 2025, 4:02 PM

#

might be a tokenization issue ive seen other gemini models miss empty brackets a lot, also see https://www.reddit.com/r/Bard/comments/1jjmta6/gemini_25_cannot_write/

From the Bard community on Reddit

Explore this post and more from the Bard community

pure nova Mar 25, 2025, 4:02 PM

#

ok yeah wtf

#

deepseek one works instanly

scarlet flint Mar 25, 2025, 4:02 PM

#

pure nova deepseek one works instanly

not really

pure nova Mar 25, 2025, 4:02 PM

#

ok nvm

scarlet flint Mar 25, 2025, 4:02 PM

#

keep watching

pure nova Mar 25, 2025, 4:02 PM

#

the ball fell through the square

scarlet flint Mar 25, 2025, 4:02 PM

#

its glitching and also play it again

#

its 1:1

#

the same simulation

#

the same path

pure nova Mar 25, 2025, 4:02 PM

#

but at least it didnt get syntax errors like gemini

scarlet flint Mar 25, 2025, 4:02 PM

#

the same colission hit

loud leaf Mar 25, 2025, 4:02 PM

#

keen beacon just wait for the release on aistudio 🙈

how long's the typical lag to add to aistudio? longer thinking time will help

keen beacon Mar 25, 2025, 4:02 PM

#

in a few hours

scarlet flint Mar 25, 2025, 4:02 PM

#

like the animation was hard coded but it wasn't

elder rapids Mar 25, 2025, 4:02 PM

#

loud leaf how long's the typical lag to add to aistudio? longer thinking time will help

hours

scarlet flint Mar 25, 2025, 4:03 PM

#

pure nova but at least it didnt get syntax errors like gemini

yeah

sick mountain Mar 25, 2025, 4:03 PM

#

sick mountain might be a tokenization issue ive seen other gemini models miss empty brackets a...

@pure nova

scarlet flint Mar 25, 2025, 4:03 PM

#

i wonder if deepseek r2 will be one of the best if not the best thinking model or if it will be total trash and will fail every expectation

pure nova Mar 25, 2025, 4:03 PM

#

sick mountain <@1267562494004564010>

yeah this is probably why , cause it was giving a syntax error of an empty variable

#

it was literally vertices =#comment

#

which doesnt make sense

sick mountain Mar 25, 2025, 4:04 PM

#

ive gotten that same thing with 2.0 ft before

rigid widget Mar 25, 2025, 4:05 PM

#

scarlet flint i wonder if deepseek r2 will be one of the best if not the best thinking model o...

Look at Deepseek V3 0324 and you will understand

elder rapids Mar 25, 2025, 4:06 PM

#

rigid widget Look at Deepseek V3 0324 and you will understand

from my testing it's alright

#

seems to have an aptitude for coding

#

2.0 pro does seem better still tbh

#

3.7 sonnet and 2.0 pro are visibly better than the other non thinking models

#

no one talks about it for some reason

keen beacon Mar 25, 2025, 4:11 PM

#

keen beacon the gemini product models suck

yeah i just found the sysprompt

#

it's literally an entire book

#

when will they understand how badly this impacts performance

calm sequoia Mar 25, 2025, 4:11 PM

#

poll_question_text

Nebula in general leaderboard. Which place after update?

victor_answer_votes

13

total_votes

17

victor_answer_id

1

victor_answer_text

1

victor_answer_emoji_name

🤩

keen beacon Mar 25, 2025, 4:11 PM

#

https://www.reddit.com/r/singularity/comments/1jjm9s9/gemini_25_pro_internal_instructions/

From the singularity community on Reddit: Gemini 2.5 Pro Internal I...

Explore this post and more from the singularity community

keen beacon Mar 25, 2025, 4:12 PM

#

keen beacon https://www.reddit.com/r/singularity/comments/1jjm9s9/gemini_25_pro_internal_ins...

wtf

#

it seems theres a lot more to that 🤣

lime coral Mar 25, 2025, 4:16 PM

#

#

Legit or not we will see

keen beacon Mar 25, 2025, 4:18 PM

#

lime coral

lol no he's not a Google employee

#

the sentence doesn't even really make sense

#

that guy is a weirdo

wintry tinsel Mar 25, 2025, 4:22 PM

#

keen beacon lol no he's not a Google employee

The information he has is reliable but he stole it from somewhere else

#

I have a question about the new deep seek V3 is the api for V3 updated or do I need a new checkpoint

#

As in a new API

scarlet flint Mar 25, 2025, 4:30 PM

#

i mean can't they like train model to behave in some way

#

instead of feeding it with system prompt that breaks performance by like 60%

scarlet flint Mar 25, 2025, 4:38 PM

#

wintry tinsel I have a question about the new deep seek V3 is the api for V3 updated or do I n...

i think its just "deepseek-chat"

torn mantle Mar 25, 2025, 4:44 PM

#

@keen beacon what am i reading here? Did they mess with the model again? Does it feel different than what we got on lmarena?

keen beacon Mar 25, 2025, 4:46 PM

#

it's out!!

keen beacon Mar 25, 2025, 4:46 PM

#

torn mantle <@456226577798135808> what am i reading here? Did they mess with the model again...

nah it's just the Gemini system prompt sucking

#

omg

silk haven Mar 25, 2025, 4:47 PM

#

I https://x.com/sundarpichai/status/1904575384466710607?s=46&t=P8-tRi_JAVcI6l5U6nOT4A

Sundar Pichai (@sundarpichai) on X

Nebula

#

🚀🚀🚀

torn mantle Mar 25, 2025, 4:47 PM

#

keen beacon it's out!!

finally

keen beacon Mar 25, 2025, 4:48 PM

#

this version is NOT a thinking model, that appears to be still on its way on studio

brittle tiger Mar 25, 2025, 4:48 PM

#

LIve in AI STUDIO

pure nova Mar 25, 2025, 4:48 PM

#

why is ai studio somehow better than normal site

keen beacon Mar 25, 2025, 4:48 PM

#

keen beacon this version is NOT a thinking model, that appears to be still on its way on stu...

???

pure nova Mar 25, 2025, 4:48 PM

#

i dont understand what they do to it

keen beacon Mar 25, 2025, 4:48 PM

#

its thinking for me

#

oh it's a hybrid

#

interesting

keen beacon Mar 25, 2025, 4:48 PM

#

keen beacon oh it's a hybrid

its thinking for everything for me

pure nova Mar 25, 2025, 4:49 PM

#

wtf the google ai studio got the hexagon wrong

#

insane

#

it does it fine for like 10 seconds then it falls through

thorny drum Mar 25, 2025, 4:49 PM

#

wow sundar tweeting nebula

#

discord hype goes far

brittle tiger Mar 25, 2025, 4:50 PM

#

glad it launched with 1m context

#

def better than gemini app version for some reason. arc-agi test i used that failed gemini app over and over one shotted it in ai studio like it did in arena as nebula

keen beacon Mar 25, 2025, 4:52 PM

#

keen beacon its thinking for everything for me

yeah it was just bugged

north vale Mar 25, 2025, 4:57 PM

#

wild

barren prairie Mar 25, 2025, 4:58 PM

#

Is pro2.5 nebula confirmed ? Or maybe specter ? And nebula is still didn t come

north vale Mar 25, 2025, 4:58 PM

#

sick mountain Mar 25, 2025, 4:58 PM

#

wow

north vale Mar 25, 2025, 4:58 PM

#

that's nuts

keen beacon Mar 25, 2025, 4:58 PM

#

barren prairie Is pro2.5 nebula confirmed ? Or maybe specter ? And nebula is still didn t come

specter nebula phantom are the same just different revisions afaik

lime coral Mar 25, 2025, 4:58 PM

#

barren prairie Is pro2.5 nebula confirmed ? Or maybe specter ? And nebula is still didn t come

lime coral Mar 25, 2025, 4:59 PM

#

keen beacon specter nebula phantom are the same just different revisions afaik

No confirmation

keen beacon Mar 25, 2025, 4:59 PM

#

theres never gonna be official confirmation

lime coral Mar 25, 2025, 4:59 PM

#

Can as well be flash 2.5 or flash in the app or whatever

sick mountain Mar 25, 2025, 4:59 PM

#

sometimes there is

keen beacon Mar 25, 2025, 5:00 PM

#

on unreleased variants that never get released?

sick mountain Mar 25, 2025, 5:00 PM

#

oh i mean if it is released

keen beacon Mar 25, 2025, 5:00 PM

#

lime coral Can as well be flash 2.5 or flash in the app or whatever

ok but theres no evidence for that

#

while there is some for them being the same

silk haven Mar 25, 2025, 5:01 PM

#

https://x.com/sundarpichai/status/1904579419496386736?s=46&t=P8-tRi_JAVcI6l5U6nOT4A

Sundar Pichai (@sundarpichai) on X

1/ Gemini 2.5 is here, and it’s our most intelligent AI model ever.

Our first 2.5 model, Gemini 2.5 Pro Experimental is a state-of-the-art thinking model, leading in a wide range of benchmarks – with impressive improvements in enhanced reasoning and coding and now #1 on

lime coral Mar 25, 2025, 5:01 PM

#

None in both case. All I see is « their way to answer looks the same ». Not enough

keen beacon Mar 25, 2025, 5:03 PM

#

lime coral None in both case. All I see is « their way to answer looks the same ». Not enou...

what? did u account for when they arrived in the arena? anyway i feel like the community here is extremely good at this

silk haven Mar 25, 2025, 5:04 PM

#

https://x.com/jeffdean/status/1904580112248693039?s=46&t=P8-tRi_JAVcI6l5U6nOT4A

Jeff Dean (@JeffDean) on X

🥁Introducing Gemini 2.5, our most intelligent model with impressive capabilities in advanced reasoning and coding.

Now integrating thinking capabilities, 2.5 Pro Experimental is our most performant Gemini model yet. It’s #1 on @lmarena_ai leaderboard. 🥇

keen beacon Mar 25, 2025, 5:04 PM

#

lime coral None in both case. All I see is « their way to answer looks the same ». Not enou...

whats ur track record btw?

brittle tiger Mar 25, 2025, 5:05 PM

#

1443 dam

pure nova Mar 25, 2025, 5:05 PM

#

can we see the questions that it gets asked though

#

it would be nice to see what it got wrong/right etc

keen beacon Mar 25, 2025, 5:06 PM

#

https://x.com/OfficialLoganK/status/1904580368432586975?t=fKVOERgBUn3dfxTBvbtOgA&s=19

Logan Kilpatrick (@OfficialLoganK) on X

Introducing Gemini 2.5 Pro, the world's most powerful model, with unified reasoning capabilities + all the things you love about Gemini (long context, tools, etc)

Available as experimental and for free right now in Google AI Studio + API, with pricing coming very soon!

silk haven Mar 25, 2025, 5:06 PM

#

pure nova Mar 25, 2025, 5:06 PM

#

why cant we see the claude 3.7 one for code stuff lol

#

he hid it?

keen beacon Mar 25, 2025, 5:07 PM

#

GOOGLE IS SO BACK

pure nova Mar 25, 2025, 5:07 PM

#

keen beacon GOOGLE IS SO BACK

dog whistle

#

so back

keen beacon Mar 25, 2025, 5:07 PM

#

huh

pure nova Mar 25, 2025, 5:07 PM

#

those benchmarks are insane

#

gemini is like 5x better

#

apart frm coding somehow

silk haven Mar 25, 2025, 5:08 PM

#

2.5 free, OpenAI is cooked

pure nova Mar 25, 2025, 5:09 PM

#

idk if its forever tho

#

how is o3 mini high still better at code

#

insane

keen beacon Mar 25, 2025, 5:09 PM

#

o3 mini is based on 4o mini too lol

lime coral Mar 25, 2025, 5:11 PM

#

pure nova why cant we see the claude 3.7 one for code stuff lol

Are you blind haha

red sluice Mar 25, 2025, 5:11 PM

#

Nebula (Gemini 2.0 Pro Thinking) is really inconsistent. It can produce never seen before results for an LLM, but sometimes it totally cracks up & produces language that is too familiar for a professional result... I'm not so sure, it's certainly a very good model, and no wonder it destroys chatgpt already. It also struggles a lot with formatting on elaborated prompts, o3-mini is (unfortunately) better for my usage.

lime coral Mar 25, 2025, 5:11 PM

#

pure nova Mar 25, 2025, 5:12 PM

#

https://i.imgur.com/MGVl272.png

Imgur

gentle plinth Mar 25, 2025, 5:12 PM

#

red sluice Nebula (Gemini 2.0 Pro Thinking) is really inconsistent. It can produce never se...

maybe try a lower temperature?

red sluice Mar 25, 2025, 5:12 PM

#

red sluice Nebula (Gemini 2.0 Pro Thinking) is really inconsistent. It can produce never se...

At this point I'm not even sure Google will dare to release it in this version

lime coral Mar 25, 2025, 5:12 PM

#

red sluice Nebula (Gemini 2.0 Pro Thinking) is really inconsistent. It can produce never se...

lol it’s always strange with new reveal. Wait 1 week. Also why they label « experimental » and serve it free on the api

red sluice Mar 25, 2025, 5:12 PM

#

gentle plinth maybe try a lower temperature?

The issue is that temperature is already something normies don't use.

gentle plinth Mar 25, 2025, 5:13 PM

#

thats why its called experimental

keen beacon Mar 25, 2025, 5:13 PM

#

google's experimental models are free its not a new thing

sick mountain Mar 25, 2025, 5:13 PM

#

tested under code name nebula: https://x.com/lmarena_ai/status/1904581128746656099

lmarena.ai (formerly lmsys.org) (@lmarena_ai) on X

BREAKING: Gemini 2.5 Pro is now #1 on the Arena leaderboard - the largest score jump ever (+40 pts vs Grok-3/GPT-4.5)! 🏆

Tested under codename "nebula"🌌, Gemini 2.5 Pro ranked #1🥇 across ALL categories and UNIQUELY #1 in Math, Creative Writing, Instruction Following, Longer

pure nova Mar 25, 2025, 5:13 PM

#

lol , this week is ai week, 3 new models will come out and beat gemini 2.5 mark my words

keen beacon Mar 25, 2025, 5:14 PM

#

openai should drop o3

#

they just got smoked

cloud meadow Mar 25, 2025, 5:14 PM

#

north vale

woah

lime coral Mar 25, 2025, 5:14 PM

#

keen beacon google's experimental models are free its not a new thing

Like the instability

willow grail Mar 25, 2025, 5:14 PM

#

gemini 2.5 pro destroy chatgpt o3-mini-high

hmmmmmmm i hope its just a benchmark..... thingy....
if its gooder than o3 high then... omg
dont give me hopes

pure nova Mar 25, 2025, 5:14 PM

#

c 3.7 still remains undefeated for web

keen beacon Mar 25, 2025, 5:14 PM

#

pure nova c 3.7 still remains undefeated for web

lol that jump is insane tho

lime coral Mar 25, 2025, 5:15 PM

#

pure nova c 3.7 still remains undefeated for web

The confidence is too large right now

keen beacon Mar 25, 2025, 5:15 PM

#

^

cloud meadow Mar 25, 2025, 5:15 PM

#

pure nova c 3.7 still remains undefeated for web

For now

pure nova Mar 25, 2025, 5:15 PM

#

lime coral The confidence is too large right now

meaning?

lime coral Mar 25, 2025, 5:15 PM

#

Would not validate anything on lmsys until way more people try the model

#

Confidence interval +-15 pts for gemini and +-10 pts for Claude.

cloud meadow Mar 25, 2025, 5:16 PM

#

Claude didn't do anything too crazy with 3.7. Soon enough R2 will drop. I can't wait.

keen beacon Mar 25, 2025, 5:16 PM

#

yall are sleeping on qwen

gentle plinth Mar 25, 2025, 5:16 PM

#

lime coral Confidence interval +-15 pts for gemini and +-10 pts for Claude.

with 95% certainty

lime coral Mar 25, 2025, 5:17 PM

#

Anyway 2k vote is not enough. Same think for the global ranking. Every new model is #1 because of this

north vale Mar 25, 2025, 5:17 PM

#

2k is enough

#

every new model is not #1

lime coral Mar 25, 2025, 5:18 PM

#

Most of the big labs model are #1

#

Where is 4.5 now?

north vale Mar 25, 2025, 5:18 PM

#

lime coral Most of the big labs model are #1

doubtful

silk haven Mar 25, 2025, 5:18 PM

#

keen beacon openai should drop o3

They don't have enough GPUs for that

north vale Mar 25, 2025, 5:18 PM

#

would you predict 2.5 pro gets off of #1?

red sluice Mar 25, 2025, 5:18 PM

#

Weird that on multi turn, it's not being that dominant so far.

keen beacon Mar 25, 2025, 5:19 PM

#

rl is usually done only in single turn

cloud meadow Mar 25, 2025, 5:19 PM

#

Where is Samuel Altman's moat?

#

Evaporated

#

Meta needs to release soon

keen beacon Mar 25, 2025, 5:19 PM

#

cloud meadow Where is Samuel Altman's moat?

o3 mini which is based on 4o mini being competitive with a much larger model 🤔

lime coral Mar 25, 2025, 5:19 PM

#

silk haven They don't have enough GPUs for that

If they are making it somehow part of gpt5 it’s maybe better for them like that. Imagine they release o3 and then gpt5 has like 0.1% better reasoning stats

#

No more hype for hypeman

red sluice Mar 25, 2025, 5:20 PM

#

OpenAI is cooked if they don't have an unreleased model to quickly drop asap lol

keen beacon Mar 25, 2025, 5:21 PM

#

they have o3 they're just stalling on it

red sluice Mar 25, 2025, 5:21 PM

#

Doesn't need to be o3 or 5, just needs to be something that can keep up

hardy pecan Mar 25, 2025, 5:21 PM

#

red sluice OpenAI is cooked if they don't have an unreleased model to quickly drop asap lol

They have a stream in 40 mins... to show off image editing most likely lol

lime coral Mar 25, 2025, 5:22 PM

#

Full stats

hardy pecan Mar 25, 2025, 5:22 PM

#

oh dear...

keen beacon Mar 25, 2025, 5:22 PM

#

oh wow

#

that simpleqa score

#

is bonkers

#

gemini has always been great at simpleqa

#

but it appears with 2.5

#

they literally

#

leapt for almost every benchmark

#

they have been cooking

north vale Mar 25, 2025, 5:24 PM

#

wish they'd shown their prev top model in the benchmark table

red sluice Mar 25, 2025, 5:24 PM

#

Is o3 that good? Is there any decent benchmark available somewhere? Couldn't test it, too expensive 💀

north vale Mar 25, 2025, 5:24 PM

#

to compare the improvement

cloud meadow Mar 25, 2025, 5:24 PM

#

Imagine Llama 4 drops and it overtakes 2.5 lmfao

#

Unlikely to happen but it would be crazy

north vale Mar 25, 2025, 5:24 PM

#

red sluice Is o3 that good? Is there any decent benchmark available somewhere? Couldn't tes...

seems likely that it's pretty good at using more test-time compute

keen beacon Mar 25, 2025, 5:24 PM

#

have u gone on the arena?

north vale Mar 25, 2025, 5:25 PM

#

so maybe like not that good in single completions

keen beacon Mar 25, 2025, 5:25 PM

#

cloud meadow Imagine Llama 4 drops and it overtakes 2.5 lmfao

llama 4 seems to be disappointing

barren prairie Mar 25, 2025, 5:25 PM

#

cloud meadow Imagine Llama 4 drops and it overtakes 2.5 lmfao

Impossible

keen beacon Mar 25, 2025, 5:25 PM

#

based on the meta model spam

cloud meadow Mar 25, 2025, 5:25 PM

#

keen beacon llama 4 seems to be disappointing

If it's anything like those new anonymous models, yeah true.

#

Still holding out hope

#

Trust the Zucc

keen beacon Mar 25, 2025, 5:26 PM

#

i doubt they will be able to beat qwen 3 even when releasing one month later

cloud meadow Mar 25, 2025, 5:27 PM

#

What sizes do you think they will release?

keen beacon Mar 25, 2025, 5:28 PM

#

confirmed sizes are 8b and 15b moe for now. i would expect a successor to the 32b model, maybe moe

silk haven Mar 25, 2025, 5:28 PM

#

I’m testing 2.5 Pro on the Gemini app and the experience is better than in AI Studio, the integration with Google Search and YouTube is insane

lime coral Mar 25, 2025, 5:29 PM

#

With the new DeepSeek v3 llama got postponed for at least six more months

keen beacon Mar 25, 2025, 5:29 PM

#

lime coral With the new DeepSeek v3 llama got postponed for at least six more months

no way they dont announce llama 4 at their first llamacon

red sluice Mar 25, 2025, 5:29 PM

#

lime coral Mar 25, 2025, 5:29 PM

#

keen beacon no way they dont announce llama 4 at their first llamacon

I know. It was a joke lol

keen beacon Mar 25, 2025, 5:29 PM

#

red sluice

depends on when oai launch o3

keen beacon Mar 25, 2025, 5:30 PM

#

lime coral I know. It was a joke lol

but yeah even with it being delayed 6 months i dont think meta will be able to beat deepseek 🤣

lime coral Mar 25, 2025, 5:30 PM

#

Depends on when Google drop the experimental

#

Or will they ship gemini ultra 3 instead

keen beacon Mar 25, 2025, 5:30 PM

#

i think the qwen team are the dark horse in this, but i dont think they will outright be sota

red sluice Mar 25, 2025, 5:32 PM

#

Is there a simple way to see against which models an LLM loses the most on average?

silk haven Mar 25, 2025, 5:34 PM

#

The 🐐
https://x.com/noamshazeer/status/1904581813215125787?s=46&t=P8-tRi_JAVcI6l5U6nOT4A

Noam Shazeer (@NoamShazeer) on X

Introducing Gemini 2.5 Pro Experimental.

The 2.5 series marks a significant evolution: Gemini models are now fundamentally thinking models.

This means the model reasons before responding, to maximize accuracy -- and it’s our best Gemini model yet.

Blog -

elder rapids Mar 25, 2025, 5:35 PM

#

lime coral Full stats

predicted the long context leap

elder rapids Mar 25, 2025, 5:35 PM

#

silk haven The 🐐 https://x.com/noamshazeer/status/1904581813215125787?s=46&t=P8-tRi_JAVcI...

no way I predicted this too

#

lmaooo

elder rapids Mar 25, 2025, 5:35 PM

#

elder rapids what if 2.5 pro is an entirely different and new model

^

elder rapids Mar 25, 2025, 5:36 PM

#

elder rapids I'm predicting it becomes the best long context reasoner

^

keen beacon Mar 25, 2025, 5:36 PM

#

i said that here first

elder rapids Mar 25, 2025, 5:36 PM

#

quiet buddy

#

I'm taking it

keen beacon Mar 25, 2025, 5:36 PM

#

#general message

#

shhhh

rigid widget Mar 25, 2025, 5:36 PM

#

wintry tinsel I have a question about the new deep seek V3 is the api for V3 updated or do I n...

yes it's updated both api and app

elder rapids Mar 25, 2025, 5:37 PM

#

keen beacon shhhh

ig good predictions tho

#

I was gonna analogize 1.0 → 1.5 distinction tho

#

and then the evolution of 1.5 → 2.0

#

since they probably started completely from scratch on each one

#

so if it doesn't signify thinking, it's probably inherently a pure thinking model

#

that was my thought process

red sluice Mar 25, 2025, 5:38 PM

#

🤔 What's that

silk haven Mar 25, 2025, 5:39 PM

#

#

https://x.com/googleaidevs/status/1904586624333471975?s=46&t=P8-tRi_JAVcI6l5U6nOT4A

Google AI Developers (@googleaidevs) on X

Join the team behind Gemini 2.5 as they dive into the model’s thinking and coding advancements.

🎙️Space starts at 12:20pm PT. Drop your questions below.
https://t.co/wBOHiC0n9k

north vale Mar 25, 2025, 5:39 PM

#

do we know api cost for 2.5 pro

keen beacon Mar 25, 2025, 5:39 PM

#

red sluice 🤔 What's that

lmao what

lime coral Mar 25, 2025, 5:39 PM

#

Asking about native image. Native audio is a myth now

lime coral Mar 25, 2025, 5:39 PM

#

north vale do we know api cost for 2.5 pro

0 for now. Still experimental

keen beacon Mar 25, 2025, 5:39 PM

#

4o avm is native audio tho if ur talking bout that

lime coral Mar 25, 2025, 5:40 PM

#

Speaking about gemini. They teased it for gemini 2.0 and then ~

keen beacon Mar 25, 2025, 5:40 PM

#

oh

lime coral Mar 25, 2025, 5:40 PM

#

https://youtu.be/qE673AY-WEI?si=XsJ1AQqyriRzlv-Y

YouTube

Google for Developers

Building with Gemini 2.0: Native audio output

Gemini 2.0 introduces multilingual native audio output. Watch this demo to see how this new capability can help developers build multimodal AI agents. These new output modalities are available to early testers, with wider rollout expected next year. Start building with Gemini 2.0 at aistudio.google.com.

Learn more about Gemini 2.0 → https://...

▶ Play video

#

Crazy demo

elder rapids Mar 25, 2025, 5:41 PM

#

also just tried 2.5 pro on AI studio

#

and it's clearly different from the product

#

ngl I don't even know where to start, I was like an hour late on discovering 2.5 pro

torn mantle Mar 25, 2025, 5:42 PM

#

still havent tried it

#

but i said they tend to nerf their model

keen beacon Mar 25, 2025, 5:43 PM

#

That simpleqa score is crazy

torn mantle Mar 25, 2025, 5:43 PM

#

maybe its not nebula

#

specter was google model right?

keen beacon Mar 25, 2025, 5:43 PM

#

Gpt 4.5 is much much larger and 2.5 pro is somewhat competitive

keen beacon Mar 25, 2025, 5:43 PM

#

torn mantle maybe its not nebula

They said it was nebula

torn mantle Mar 25, 2025, 5:43 PM

#

keen beacon They said it was nebula

mm

#

did you notice any difference?

lime coral Mar 25, 2025, 5:47 PM

#

https://x.com/alexandr_wang/status/1904589984591695874?s=46

Alexandr Wang (@alexandr_wang) on X

🚨 Gemini 2.5 Pro Exp dropped and it's now #1 across SEAL leaderboards:

🥇 Humanity’s Last Exam
🥇 VISTA (multimodal)
🥇 (tie) Tool Use
🥇 (tie) MultiChallenge (multi-turn)
🥉 (tie) Enigma (puzzles)

Congrats to @demishassabis @sundarpichai & team!

🔗 https://t.co/pVIgk6rIcL

olive mesa Mar 25, 2025, 5:51 PM

#

just saw this wow

#

i dont know its context length

#

im guessing 2 million or 3-4 million since its 2.5

north vale Mar 25, 2025, 5:52 PM

#

not 3-4 million

olive mesa Mar 25, 2025, 5:52 PM

#

oh only 1m

#

well i mean thats still huge but i guess since it's experimental rn

lime coral Mar 25, 2025, 5:53 PM

#

It will increased with time

olive mesa Mar 25, 2025, 5:53 PM

#

yeah

lime coral Mar 25, 2025, 5:53 PM

#

OG 1.5 was first released with 128k even though they teased the 1-2M

elder rapids Mar 25, 2025, 5:55 PM

#

it seems to brute force the 30 hares 20 wolfs thing

#

and it gets it correctly

keen beacon Mar 25, 2025, 5:57 PM

#

um tf it has a cut off of january 2025?? wtf the turn around is insane

elder rapids Mar 25, 2025, 5:57 PM

#

did Google find the secret sauce 😭

keen beacon Mar 25, 2025, 5:58 PM

#

the gemini 2-2.5 timeline is absolutely insane

keen ferry Mar 25, 2025, 5:59 PM

#

opinion's on gemini 2.5 pro?

keen beacon Mar 25, 2025, 5:59 PM

#

keen beacon um tf it has a cut off of january 2025?? wtf the turn around is insane

im not sure if its correct. ill have to see if it knows events after june 2024

keen ferry Mar 25, 2025, 5:59 PM

#

was it worth the wait

keen beacon Mar 25, 2025, 5:59 PM

#

keen beacon um tf it has a cut off of january 2025?? wtf the turn around is insane

meanwhile gpt-4.5:

#

what a joke

keen beacon Mar 25, 2025, 6:00 PM

#

keen ferry was it worth the wait

yes

keen beacon Mar 25, 2025, 6:01 PM

#

keen beacon meanwhile gpt-4.5:

bro they continue pretrained the model/did all this stuff in like a month/two 🤣 if thats actually true

elder rapids Mar 25, 2025, 6:01 PM

#

keen ferry was it worth the wait

didn't even have to wait lmao

keen beacon Mar 25, 2025, 6:01 PM

#

i thought the gemini 2 timelines were short but this is CRAZY

elder rapids Mar 25, 2025, 6:01 PM

#

this really is crazy

#

vibes + insane reasoning

torn mantle Mar 25, 2025, 6:02 PM

#

are we sure this is the same model?

brittle tiger Mar 25, 2025, 6:02 PM

#

gemini image editing looks better than openai feature coming out today

https://x.com/wintermoat/status/1904593298008006924

Alphabetting (@wintermoat) on X

@testingcatalog Seems like it changed his face. Gemini w native multimodal doesn't do that.

elder rapids Mar 25, 2025, 6:02 PM

#

told you guys you werent glazing it enough lmao

keen beacon Mar 25, 2025, 6:02 PM

#

torn mantle are we sure this is the same model?

yes, this is nebula

lime coral Mar 25, 2025, 6:03 PM

#

Gpt4o also failed the hands

keen beacon Mar 25, 2025, 6:03 PM

#

embarassing

olive mesa Mar 25, 2025, 6:04 PM

#

keen ferry was it worth the wait

the wait wasnt even long lmao

rigid widget Mar 25, 2025, 6:04 PM

#

wow 😲

olive mesa Mar 25, 2025, 6:05 PM

#

it gives me a different vibe

elder rapids Mar 25, 2025, 6:06 PM

#

ong

torn mantle Mar 25, 2025, 6:06 PM

#

olive mesa it gives me a different vibe

its different

olive mesa Mar 25, 2025, 6:06 PM

#

yeah

#

i like how it writes code now

#

it just overall looks better

elder rapids Mar 25, 2025, 6:08 PM

#

Claude and Google's vibe switched

#

talking about the models

silk haven Mar 25, 2025, 6:08 PM

#

2.5 is a Breakthrough

elder rapids Mar 25, 2025, 6:08 PM

#

3.7 became more robotic, 2.5 pro is so creative

elder rapids Mar 25, 2025, 6:08 PM

#

silk haven 2.5 is a Breakthrough

has to be

oblique flint Mar 25, 2025, 6:08 PM

#

Damn I was wrong about 2.5 pro lol, it's actually better at coding than I initially anticipated. Would be great if it's cheaper than claude too

elder rapids Mar 25, 2025, 6:08 PM

#

look at that long context lmao

torn mantle Mar 25, 2025, 6:12 PM

#

the model is so good

olive mesa Mar 25, 2025, 6:12 PM

#

fr

pure nova Mar 25, 2025, 6:12 PM

#

coding though?

torn mantle Mar 25, 2025, 6:13 PM

#

pure nova coding though?

good so far

pure nova Mar 25, 2025, 6:13 PM

#

anyone tried c/c++ ?

torn mantle Mar 25, 2025, 6:13 PM

#

pure nova anyone tried c/c++ ?

yea

pure nova Mar 25, 2025, 6:13 PM

#

i dont think any ai is fit for c/c++ right now tbh to build an actual decent project

#

its getting somewhat closer but its still lacking a lot

elder rapids Mar 25, 2025, 6:16 PM

#

2.5 pro brute forcing webdev is crazy

silk haven Mar 25, 2025, 6:17 PM

#

2.5 series… not only 2.5 pro
When 2.5 flash? Maybe phantom?

#

olive mesa Mar 25, 2025, 6:18 PM

#

#1 on lmarena jeez

#

anybody know what it gets on arc-agi-2?

elder rapids Mar 25, 2025, 6:18 PM

#

probably similar to sonnet 3.7 thinking

olive mesa Mar 25, 2025, 6:20 PM

#

yeah its close to 64k

torn mantle Mar 25, 2025, 6:20 PM

#

olive mesa anybody know what it gets on arc-agi-2?

not yet

olive mesa Mar 25, 2025, 6:21 PM

#

so far ive only seen it compared to 16k and 32k and it's a lot better

lime coral Mar 25, 2025, 6:21 PM

#

Arc agi is useless

#

I prefer humanity last exam

#

At least you solve practical things

oblique flint Mar 25, 2025, 6:23 PM

#

Crazy that 3.7 sonnet is still 90 points ahead of 2.5 pro in webdev arena

elder rapids Mar 25, 2025, 6:24 PM

#

oblique flint Crazy that 3.7 sonnet is still 90 points ahead of 2.5 pro in webdev arena

p sure Claude is made for these kinds of tasks specifically

#

it's worse in other things compared to 2.5 pro

lime coral Mar 25, 2025, 6:24 PM

#

oblique flint Crazy that 3.7 sonnet is still 90 points ahead of 2.5 pro in webdev arena

What is crazy is Google jump

elder rapids Mar 25, 2025, 6:25 PM

#

ye

lime coral Mar 25, 2025, 6:25 PM

#

Claude was like the ultimate King there

#

3.5 only dethroned by 3.7

elder rapids Mar 25, 2025, 6:25 PM

#

ong

cloud meadow Mar 25, 2025, 6:40 PM

#

This new Google Gemini 2.5 model is insane

📎 message.txt

#

No other model has continously followed my instructions this well

#

It's also picked up things better than any other model

#

Might be my new favorite

silk haven Mar 25, 2025, 6:41 PM

#

2.0 pro was removed from Gemini app

silk haven Mar 25, 2025, 6:41 PM

#

silk haven 2.0 pro was removed from Gemini app

Rip

cloud meadow Mar 25, 2025, 6:42 PM

#

cloud meadow No other model has continously followed my instructions this well

📎 message.txt

#

The reasoning is interesting too

eager crater Mar 25, 2025, 6:42 PM

#

today is crazy

#

we got dalle 4 and the best overall llm

cloud meadow Mar 25, 2025, 6:43 PM

#

I remember just a few months ago (I think before deepseek r1) some dude talking about how everything has been boring since the finetune days and then r1 and distills dropped

#

It was pretty funny lol

rigid widget Mar 25, 2025, 6:44 PM

#

They've taken user data in aistudio seriously. I tried to make them do this dozens of times, but they couldn't.

Screenshot_2025-03-25-21-41-30-241_ru.zdevs.zarchiver.png

rigid widget Mar 25, 2025, 6:45 PM

#

eager crater we got dalle 4 and the best overall llm

Even though most of you aren't aware of it yet, the best non-reasoning model also

cloud meadow Mar 25, 2025, 6:46 PM

#

rigid widget They've taken user data in aistudio seriously. I tried to make them do this doze...

You are definitely right. I tried feeding a list of names of java obfuscators into the model since it had no idea past something like proguard and now it lists the top 5 I've continously asked questions about lmfao

#

Idk if I am the sole reason but I think I made it into the dataset

keen beacon Mar 25, 2025, 6:48 PM

#

cloud meadow Idk if I am the sole reason but I think I made it into the dataset

theres a huge jump in world knowledge

#

maybe not

cloud meadow Mar 25, 2025, 6:49 PM

#

I should hope so considering the fact they own a search engine

torn mantle Mar 25, 2025, 6:49 PM

#

rigid widget They've taken user data in aistudio seriously. I tried to make them do this doze...

yea i noticed that

#

i was trying like some niche prompts on aistudio

#

and seems like they improved on them a lot

#

way way better than grok 3 + reasoning

#

blows it out of the water

oblique flint Mar 25, 2025, 6:50 PM

#

From reddit:

Just a couple of days ago I wrote this:

This is my exact experience. Long context windows are barely any use. They are vaguely helpful for "needle in a haystack" problems, not much more.

I have a "test" which consists in sending it a collection of almost 1000 poems, which currently sit at around ~230k tokens, and then asking a bunch of stuff which requires reasoning over them. Sometimes, it's something as simple as "identify key writing periods and their differences" (the poems are ordered chronologically). More often than not, it doesn't even "see" the final poems, and it has this exact feeling of "seeing the first ones", then "skipping the middle ones", "seeing some a bit ahead" and "completely ignoring everything else".

I see very few companies tackling the issue of large context windows, and I fully believe that they are key for some significant breakthroughs with LLMs. RAG is not a good solution for many problems. Alas, we will have to keep waiting...

Having just tried this model, I can say that this is a breakthrough moment. A leap. This is the first model that can consistently comb through these poems (200k+ tokens) and analyse them as a whole, without significant issues or problems. I have no idea how they did it, but they did it.

Finally they're starting to utilize that context window

torn mantle Mar 25, 2025, 6:50 PM

#

so consistent too

eager crater Mar 25, 2025, 6:53 PM

#

rigid widget Even though most of you aren't aware of it yet, the best non-reasoning model als...

deepseek v3?

thorny drum Mar 25, 2025, 6:54 PM

#

that's crazy

#

imagine being in middle school rn

#

you can literally copy paste your whole book in for your book report

rigid widget Mar 25, 2025, 6:57 PM

#

v3 0325 You can really move and place blocks but it laaaags a lot.

keen beacon Mar 25, 2025, 6:57 PM

#

lmao

red sluice Mar 25, 2025, 6:58 PM

#

Totally a bug, not openai's core prompt 🤔

rigid widget Mar 25, 2025, 7:00 PM

#

meanwhile 2.5 pro

Screenshot_2025-03-25-21-58-22-790_com.foxdebug.acode.png

#

left deepseek v3 0324 (non-reasoning model) right gemini 2.5 pro (reasoning model)

Screenshot_2025-03-25-22-01-42-452_ru.zdevs.zarchiver.png

cloud meadow Mar 25, 2025, 7:02 PM

#

oneshot minecraft?

rigid widget Mar 25, 2025, 7:08 PM

#

it create a good Hacker News clone, but does Hacker News have anything to do with hackers at all?

Screenshot_2025-03-25-21-35-34-164-edit_org.mozilla.firefox.jpg

keen beacon Mar 25, 2025, 7:08 PM

#

it has hackers in the name

#

this thing is great at web design

#

(w/ custom cursor)

cloud meadow Mar 25, 2025, 7:09 PM

#

It made that?

keen beacon Mar 25, 2025, 7:09 PM

#

it made the entire landing

#

that's just a part

#

it can be pretty bold.. personally i think this is cool

rigid widget Mar 25, 2025, 7:13 PM

#

keen beacon lmao

Hahaha Security

brittle tiger Mar 25, 2025, 7:18 PM

#

"Gemini 2.5 Pro just zero-shotted a task o3-mini-high made no progress on after burning through millions of credits via Aider"

https://x.com/_clashluke/status/1904612478199173346

Lucas Nestler (@_clashluke) on X

tbc, this is 100% real

keen beacon Mar 25, 2025, 7:19 PM

#

holy moly. there are some bugs here but i'm confident it could solve them in 1 prompt

#

fully built by gemini 2.5 pro

#

(the svg is naturally pretty bad, llms can't do word svgs just yet)

fleet lintel Mar 25, 2025, 7:22 PM

#

keen beacon holy moly. there are some bugs here but i'm confident it could solve them in 1 p...

what was your prompt?

keen beacon Mar 25, 2025, 7:23 PM

#

"Write full HTML, CSS and JavaScript for a very beautiful, bold, creative, sleek, polished landing page for Cosine, an AI lab", then "Make it much more beautiful, bold, creative, sleek, and polished. Do not use comments." x2

olive mesa Mar 25, 2025, 7:24 PM

#

google is progressing faster than openai

rigid widget Mar 25, 2025, 7:24 PM

#

what x2 mean?

olive mesa Mar 25, 2025, 7:24 PM

#

well from the public view

#

i know they have a lot better stuff they're working on rn

keen beacon Mar 25, 2025, 7:24 PM

#

rigid widget what x2 mean?

sent it twice

#

iteration

olive mesa Mar 25, 2025, 7:25 PM

#

keen beacon holy moly. there are some bugs here but i'm confident it could solve them in 1 p...

woah

keen beacon Mar 25, 2025, 7:27 PM

#

lol

elder rapids Mar 25, 2025, 7:27 PM

#

oblique flint From reddit: >>> Just a couple of days ago I wrote this: This is my exact exper...

glaze me I predicted this exactly

keen beacon Mar 25, 2025, 7:28 PM

#

bro wants that validation

elder rapids Mar 25, 2025, 7:28 PM

#

glaze me dawg

#

I need it

rigid widget Mar 25, 2025, 7:28 PM

#

Aistudio crashed and I can't even access my other prompts.

elder rapids Mar 25, 2025, 7:28 PM

#

I said ts verbatim

keen beacon Mar 25, 2025, 7:28 PM

#

elder rapids glaze me dawg

if you insist?

elder rapids Mar 25, 2025, 7:28 PM

#

yo tell me why I'm a genius

keen beacon Mar 25, 2025, 7:28 PM

#

rigid widget Aistudio crashed and I can't even access my other prompts.

yeah it did for a sec for me but it's working again now (?)

keen beacon Mar 25, 2025, 7:28 PM

#

elder rapids yo tell me why I'm a genius

u just joined bro

elder rapids Mar 25, 2025, 7:29 PM

#

pre 2.5 pro

#

I came here to talk about my observations with nebula

#

enuff speaking chat, lift me onto my pedestal

elder rapids Mar 25, 2025, 7:30 PM

#

keen beacon holy moly. there are some bugs here but i'm confident it could solve them in 1 p...

0 shot?

keen beacon Mar 25, 2025, 7:31 PM

#

i didn't give it any examples but i did ask it to iterate on itself

elder rapids Mar 25, 2025, 7:31 PM

#

damn

#

this is crazy

#

alright so wait

#

this implies it can reason through granularity now through 1m context

#

this is nuts

keen beacon Mar 25, 2025, 7:36 PM

#

did somebody say nuts 🗣️

#

but fr

rigid widget Mar 25, 2025, 7:40 PM

#

Hey, you're Google, snap out of it!

Screenshot_2025-03-25-22-34-45-296_org.mozilla.firefox.png

elder rapids Mar 25, 2025, 7:42 PM

#

keen beacon but fr

granularity has been the no. 1 problem for ages lmao

#

there was definitely a breakthrough

#

also, I think sometimes it still breaks, in certain CoT processes it stops putting in the numbers for a calculation, but keeps the surround formatting

torn mantle Mar 25, 2025, 7:45 PM

#

keen beacon holy moly. there are some bugs here but i'm confident it could solve them in 1 p...

looks great

olive mesa Mar 25, 2025, 7:55 PM

#

0 shot?

ocean vortex Mar 25, 2025, 7:56 PM

#

what's your prompt? Curious how this compares with gpt4.5 and grok3

pure nova Mar 25, 2025, 8:02 PM

#

no way someone has already jailbrken gemini

olive mesa Mar 25, 2025, 8:05 PM

#

pure nova no way someone has already jailbrken gemini

that's 2.5 pro exp??

pure nova Mar 25, 2025, 8:05 PM

#

ye

olive mesa Mar 25, 2025, 8:05 PM

#

damn

eager crater Mar 25, 2025, 8:10 PM

#

ocean vortex what's your prompt? Curious how this compares with gpt4.5 and grok3

Code SVG of a detailed crab

eager crater Mar 25, 2025, 8:10 PM

#

olive mesa 0 shot?

yes

elder rapids Mar 25, 2025, 8:30 PM

#

just gave 2.5 pro 800k tokens worth of material and it processed it faster than flash and pro, and gave extraordinary summary results, and didn't miss a single granular thing, and also gave interpretive results rather than just data points

#

Google did something

wintry tinsel Mar 25, 2025, 8:34 PM

#

pure nova no way someone has already jailbrken gemini

Dm me the jailbreak?

elder rapids Mar 25, 2025, 8:34 PM

#

and then I said I was surprised and that its crazy it's able to do things like that over long context and it pinpointed exactly why it was different, just from the quality of its own output

pure nova Mar 25, 2025, 8:34 PM

#

wintry tinsel Dm me the jailbreak?

cant he told me to not send it to anyone

#

he's good with SE and stuff

#

so he done it many times before with claude too and everything

elder rapids Mar 25, 2025, 8:34 PM

#

this model is literally 1 of 1

wintry tinsel Mar 25, 2025, 8:34 PM

#

K can you give me his contact than, I know Pliny will release a jailbreak but his stuff is annoying in how it’s formatted

pure nova Mar 25, 2025, 8:35 PM

#

elder rapids just gave 2.5 pro 800k tokens worth of material and it processed it faster than ...

such as?

elder rapids Mar 25, 2025, 8:35 PM

#

pure nova such as?

such as what

pure nova Mar 25, 2025, 8:35 PM

#

like whatd u ask

elder rapids Mar 25, 2025, 8:35 PM

#

the type of information?

pure nova Mar 25, 2025, 8:35 PM

#

wintry tinsel K can you give me his contact than, I know Pliny will release a jailbreak but hi...

he wont give it to no one lol

elder rapids Mar 25, 2025, 8:35 PM

#

it was just a book lol

wintry tinsel Mar 25, 2025, 8:36 PM

#

pure nova he wont give it to no one lol

Finee

pure nova Mar 25, 2025, 8:36 PM

#

if u wanna try his dc is

#

access44

rigid widget Mar 25, 2025, 8:36 PM

#

What is gpt4o-lmsys-0315a-ev3-text

#

0315? is this mistake

keen beacon Mar 25, 2025, 8:36 PM

#

march 15th?

rigid widget Mar 25, 2025, 8:36 PM

#

first time out

keen beacon Mar 25, 2025, 8:37 PM

#

weird ass name

rigid widget Mar 25, 2025, 8:37 PM

#

it can't be march 15

#

but it not good at translation

keen beacon Mar 25, 2025, 8:37 PM

#

rigid widget it can't be march 15

doesnt have to be trained just today

rigid widget Mar 25, 2025, 8:39 PM

#

keen beacon doesnt have to be trained just today

Shouldn't it be according to their release date?

keen beacon Mar 25, 2025, 8:39 PM

#

could be but i think thats an internal name lol

barren prairie Mar 25, 2025, 8:41 PM

#

It looks alike sarcoptes scabei 🤔

lime coral Mar 25, 2025, 8:57 PM

#

https://x.com/paulgauthier/status/1904637913411031410?s=46

Paul Gauthier (@paulgauthier) on X

Gemini 2.5 Pro sets SOTA on the aider polyglot leaderboard with a score of 73%.

This is well ahead of thinking/reasoning models. A huge jump from prior Gemini models. The first Gemini model to effectively use efficient diff-like editing formats.

https://t.co/mBVaUPGHPl

rigid widget Mar 25, 2025, 9:03 PM

#

who is rhea from?

#

it's very good

cedar tide Mar 25, 2025, 9:06 PM

#

rigid widget who is rhea from?

Big model from meta

brittle tiger Mar 25, 2025, 9:15 PM

#

lime coral https://x.com/paulgauthier/status/1904637913411031410?s=46

Damn that's wild

north vale Mar 25, 2025, 9:18 PM

#

prolly llama 4 checkpoint

torn mantle Mar 25, 2025, 9:26 PM

#

rigid widget who is rhea from?

its bad

torn mantle Mar 25, 2025, 9:26 PM

#

lime coral https://x.com/paulgauthier/status/1904637913411031410?s=46

impressive

elder rapids Mar 25, 2025, 9:32 PM

#

rigid widget it's very good

only for coding, but it's not better than the SOTA models in that either

#

not that good

north vale Mar 25, 2025, 9:34 PM

#

https://x.com/petarv_93/status/1904643818030317579?s=46

Petar Veličković (@PetarV_93) on X

Gemini models are now capable enough to assist with fundamental AI research!

Several theorems featured in our recent ICML submissions were co-proved with Gemini's help.

2.5 Pro is a really good model; give it a try if you haven't already :)

elder rapids Mar 25, 2025, 9:38 PM

#

crazy

rocky wing Mar 25, 2025, 9:44 PM

#

Hi all

mint relic Mar 25, 2025, 9:48 PM

#

Hello, anyone tried R2 ? Is there any place to use it ?

elder rapids Mar 25, 2025, 9:51 PM

#

it isn't out

sour spindle Mar 25, 2025, 10:09 PM

#

What do you actually get with paid gemini vs. just using the models in ai studio

rigid widget Mar 25, 2025, 10:15 PM

#

Deepseek V3 vs Deepseek V3 0325 real outputs. (Claude, almost there!)
here is https://rentry.org/deepseekv3-vs-v3-0325

Deepseek V3 vs V3 0325

same prompt, same temperature, one shot
V3
V3 0325

rigid widget Mar 25, 2025, 10:18 PM

#

mint relic Hello, anyone tried R2 ? Is there any place to use it ?

Unfortunately no

elder rapids Mar 25, 2025, 10:24 PM

#

@keen beacon do any other models get this right?

RDT_20250111_1254287202857936932807272.jpg.jpg

#

besides 2.5 pro

keen beacon Mar 25, 2025, 10:25 PM

#

not as far as i know

elder rapids Mar 25, 2025, 10:27 PM

#

do you have access to o1

#

ive been trying a ton of these puzzles and it seems like 2.5 pro is way ahead in this aspect

keen beacon Mar 25, 2025, 10:30 PM

#

elder rapids do you have access to o1

yeah

elder rapids Mar 25, 2025, 10:36 PM

#

even when made into text form

#

including o3 mini and deepseek

#

they just can't get them right

willow grail Mar 25, 2025, 10:38 PM

#

eu didnt get the new 4o image?

elder rapids Mar 25, 2025, 10:44 PM

#

keen beacon yeah

rigid widget Mar 25, 2025, 10:44 PM

#

elder rapids ive been trying a ton of these puzzles and it seems like 2.5 pro is way ahead in...

gemini modals really good at multimodal

elder rapids Mar 25, 2025, 10:45 PM

#

rigid widget gemini modals really good at multimodal

yeah they tend to understand things more, but Im making them into text form

#

as I said

#

and the gap isn't THAT large

brittle tiger Mar 25, 2025, 10:46 PM

#

sour spindle What do you actually get with paid gemini vs. just using the models in ai studio

a worse experience until they leave experimental mode. no reason to use them in gemini app until they are no longer experimental. when that happens tho you get integration with search, drive, gmail, youtube, image generation, etc

keen beacon Mar 25, 2025, 10:47 PM

#

dont they still suck even if theyre not experimental?

rigid widget Mar 25, 2025, 10:47 PM

#

elder rapids yeah they tend to understand things more, but Im making them into text form

I also do not like using artificial intelligence directly in a multimodal manner because I always get worse results. So I OCR it into text first.

keen beacon Mar 25, 2025, 10:47 PM

#

in the gemini product

elder rapids Mar 25, 2025, 10:48 PM

#

ye that tends to be a good option

elder rapids Mar 25, 2025, 10:49 PM

#

keen beacon in the gemini product

not anymore tbh

#

once they leave experimental

rigid widget Mar 25, 2025, 10:49 PM

#

sour spindle What do you actually get with paid gemini vs. just using the models in ai studio

I think it's just speed and less error Otherwise, models are trained with data whether you pay for or not.

keen beacon Mar 25, 2025, 10:52 PM

#

elder rapids

added to eqbench

#

also

#

im not keeping track but i feel like i should've hit the ai studio RPD rate limit by now

#

it is nowhere to be seen

keen beacon Mar 25, 2025, 10:55 PM

#

keen beacon im not keeping track but i feel like i should've hit the ai studio RPD rate limi...

i think the models are unlimited on the site but limited on the api

#

cuz i use aistudio all the time

#

random ocr? random tests conversations etc

elder rapids Mar 25, 2025, 10:56 PM

#

keen beacon added to eqbench

damn that's really crazy

#

wonder if they did the same thing deepseek did, training for specifically eq

keen beacon Mar 25, 2025, 10:56 PM

#

elder rapids damn that's really crazy

pretty big jump

sour spindle Mar 25, 2025, 11:06 PM

#

keen beacon im not keeping track but i feel like i should've hit the ai studio RPD rate limi...

I feel like this too. I sometimes get this message: "failed to list tuned models user has exceeded quota" but it says I am still using the model.

#

Maybe they are simply bypassing the rates for the time being?

elder rapids Mar 25, 2025, 11:07 PM

#

keen beacon pretty big jump

you can definitely see it too, the way it speaks is pretty great

#

after some hours with it

#

it has moments where it resembles Claude

keen beacon Mar 25, 2025, 11:08 PM

#

sour spindle I feel like this too. I sometimes get this message: "failed to list tuned models...

ive been getting that since yesterday i think

#

its just bugged

elder rapids Mar 25, 2025, 11:08 PM

#

or at least, a very large, intelligent model

#

makes me wonder how Big pro is

#

if this model is below 100b that would be really crazy

keen beacon Mar 25, 2025, 11:10 PM

#

it is not that would be absurd tbh

silk haven Mar 25, 2025, 11:11 PM

#

Apricot-exp-v1?? Amazon model?

hazy quest Mar 25, 2025, 11:12 PM

#

Finaly midnight in Europe. What a day this has been lmao

elder rapids Mar 25, 2025, 11:14 PM

#

keen beacon it is not that would be absurd tbh

wym?

#

isn't sonnet and 4o at least 100b

keen beacon Mar 25, 2025, 11:15 PM

#

4o is estimated to be 200b, sonnet is 400b

#

pro is definitely within that range

sour spindle Mar 25, 2025, 11:15 PM

#

hazy quest Finaly midnight in Europe. What a day this has been lmao

"There are decades where nothing happens; and there are weeks where decades happen"

Could change this to days and weeks with AI development lol

elder rapids Mar 25, 2025, 11:15 PM

#

damn fr?

keen beacon Mar 25, 2025, 11:16 PM

#

total params. theyre all moe i think

#

https://epoch.ai/gradient-updates/frontier-language-models-have-become-much-smaller

Epoch AI

Frontier language models have become much smaller

In this Gradient Updates weekly issue, Ege discusses how frontier language models have unexpectedly reversed course on scaling, with current models an order of magnitude smaller than GPT-4.

torn mantle Mar 25, 2025, 11:17 PM

#

yea this new model is on next level

elder rapids Mar 25, 2025, 11:19 PM

#

keen beacon pro is definitely within that range

8b flash to 200b would be wild tbh

#

I expect pro to be 120~

keen beacon Mar 25, 2025, 11:19 PM

#

elder rapids 8b flash to 200b would be wild tbh

flash is not 8b

elder rapids Mar 25, 2025, 11:19 PM

#

or 150

keen beacon Mar 25, 2025, 11:19 PM

#

8b flash is a different model

elder rapids Mar 25, 2025, 11:19 PM

#

keen beacon flash is not 8b

flash 8b dawg

keen beacon Mar 25, 2025, 11:19 PM

#

its a different model

#

theres flash and flash 8b

elder rapids Mar 25, 2025, 11:20 PM

#

yeah duh

#

😭

#

I said that

#

but anyways

brittle tiger Mar 25, 2025, 11:20 PM

#

north vale https://x.com/petarv_93/status/1904643818030317579?s=46

Think ppl discount how much stuff like this will drive progress as models inprove

keen beacon Mar 25, 2025, 11:20 PM

#

1.5 line:
flash (larger)
flash 8b
pro

2.0
flash lite (direct technical successor of flash)
flash (larger than flash lite/1.5 flash)
pro

elder rapids Mar 25, 2025, 11:20 PM

#

🙏

#

anyways

#

8b flash to 200b would be wild tbh

north vale Mar 25, 2025, 11:21 PM

#

yeah i think they're calling them flash and pro based on the speed and cost more than the size being comparable to 1.5 's flash and pro

#

basically flash could be 200B with 40B active params

#

and pro could be 1.3T with 150B active params

#

really uncertain but that would make sense to me

keen beacon Mar 25, 2025, 11:21 PM

#

north vale and pro could be 1.3T with 150B active params

lol this is wild

elder rapids Mar 25, 2025, 11:21 PM

#

yeah but we know that's not true so it's kinda trivial

north vale Mar 25, 2025, 11:22 PM

#

oh ok, i don't know it to not be true

#

how do you know

keen beacon Mar 25, 2025, 11:22 PM

#

economics lol they are not increasing model size to that level anymore

#

they didnt even release 1.0 ultra access and a google employee confirmed it wasnt even close to og gpt 4 afaik

elder rapids Mar 25, 2025, 11:22 PM

#

north vale oh ok, i don't know it to not be true

there are traits of models + that would be heavy and unnecessary + ton of money for no reason

sick mountain Mar 25, 2025, 11:22 PM

#

why not? hardware is getting better too

elder rapids Mar 25, 2025, 11:22 PM

#

if models are 27b with similar performance

#

yeah that's not true lmao

#

you still need the total params to run it

north vale Mar 25, 2025, 11:23 PM

#

i really don't think 27B models have similar perf

sick mountain Mar 25, 2025, 11:23 PM

#

ultra did not use MoE iirc

elder rapids Mar 25, 2025, 11:23 PM

#

north vale i really don't think 27B models have similar perf

wym?

#

-10b are visibly worse

north vale Mar 25, 2025, 11:24 PM

#

27B models have much lower perf than gemini pro

elder rapids Mar 25, 2025, 11:24 PM

#

but 30~ is just fine

north vale Mar 25, 2025, 11:24 PM

#

is what i'm saying

elder rapids Mar 25, 2025, 11:24 PM

#

yeah but I'm not talking about Gemini pro

north vale Mar 25, 2025, 11:24 PM

#

oh ok

atomic locust Mar 25, 2025, 11:24 PM

#

I want to give out my MacBook Air 2020 &** for free, it's in perfect health and good as, alongside a charger so it's perfect, I want to give it out because I just got a new model and I thought of giving out the old one to someone who can't afford one and is in need of it... Strictly First come first serve !
DM IF YOU ARE INTERESTED

keen beacon Mar 25, 2025, 11:24 PM

#

north vale yeah i think they're calling them flash and pro based on the speed and cost more...

no they arent btw. they directly said flash lite is based on 1.5 flash size/architecture/whatever i dont recall the exact quote

elder rapids Mar 25, 2025, 11:25 PM

#

but anyways back to the point

#

I do think Google has always had special models, and the speed both perform at is crazy

#

so they can't be that big

north vale Mar 25, 2025, 11:26 PM

#

I just don't see gemini 2.5 pro being within 30% of the size of gemini 1.5 pro

#

elo 140 pts apart

keen beacon Mar 25, 2025, 11:26 PM

#

north vale I just don't see gemini 2.5 pro being within 30% of the size of gemini 1.5 pro

one was pretrained way before and wasnt even a thinking model + modern stuff

elder rapids Mar 25, 2025, 11:27 PM

#

ye

north vale Mar 25, 2025, 11:27 PM

#

yeah good point

elder rapids Mar 25, 2025, 11:27 PM

#

I think the pros are maximum 10b params deviation

north vale Mar 25, 2025, 11:27 PM

#

that'd be completely wild google dominance

elder rapids Mar 25, 2025, 11:27 PM

#

but I have no idea how large they are

#

ye

keen beacon Mar 25, 2025, 11:28 PM

#

the pros are still around the same size. but it is quite plausible they increased the size a little but its not a trillion parameter model

elder rapids Mar 25, 2025, 11:28 PM

#

and I don't think 1.5 pro is above 200b

#

it's both faster and seemed like it had less "raw" intelligence than Claude, which was similar in time, and 4o

#

seemed to know less stuff without search as well

#

completely up to you whether you agree

#

but I do think 200b+ models tend to just feel heavier

#

so I'm inclined to believe it's at most 150b

keen beacon Mar 25, 2025, 11:30 PM

#

elder rapids it's both faster and seemed like it had less "raw" intelligence than Claude, whi...

no it isnt faster lol

#

its the same

elder rapids Mar 25, 2025, 11:31 PM

#

wym?

#

it's way faster

#

especially now

keen beacon Mar 25, 2025, 11:31 PM

#

2.0 pro and 1.5 pro are the same speed

elder rapids Mar 25, 2025, 11:31 PM

#

dawg

keen beacon Mar 25, 2025, 11:31 PM

#

i cant tell right now for 2.5 pro because there are no measurements for 2.5 pro

elder rapids Mar 25, 2025, 11:31 PM

#

I'm not talking about 2.0 pro vs 1.5 pro

#

I'm talking about 1.5 pro vs 4o

#

and then equating to 2.0 pro

keen beacon Mar 25, 2025, 11:31 PM

#

well its relevant because 2.5 pro is highly likely to be continued pretrained from 2.0 pro

#

the timeline seems absurd if it isnt

elder rapids Mar 25, 2025, 11:32 PM

#

I don't think it's absurd at all tbh

sick mountain Mar 25, 2025, 11:32 PM

#

google does have the most compute

elder rapids Mar 25, 2025, 11:32 PM

#

working on both 2.0 and 2.5 at the same time is super reasonable

#

if they're going for completely different architectures

keen beacon Mar 25, 2025, 11:33 PM

#

you just pretrained gemini 2.0 pro spending millions and ur gonna throw it away and rush a model from scratch in a month or two??:?

elder rapids Mar 25, 2025, 11:33 PM

#

as they explained that it's inherently a reasoning model

keen beacon Mar 25, 2025, 11:33 PM

#

elder rapids as they explained that it's inherently a reasoning model

ok, thats still on top of a base model. not relevant

elder rapids Mar 25, 2025, 11:33 PM

#

keen beacon you just pretrained gemini 2.0 pro spending millions and ur gonna throw it away ...

ye they did that with 1.5 pro 002 to 2.0

keen beacon Mar 25, 2025, 11:33 PM

#

yes but that was a sizable amount of time

elder rapids Mar 25, 2025, 11:33 PM

#

that was also like 2 months

#

bruh

keen beacon Mar 25, 2025, 11:33 PM

#

????

elder rapids Mar 25, 2025, 11:34 PM

#

002 came out in like October

#

2.0 pro was in experimental in November

#

so ig one month

#

preceding 1206

#

oh yeah it was 2 months too I'm tripping

#

002 came in September

#

2.0 pro came in November

keen beacon Mar 25, 2025, 11:37 PM

#

elder rapids 002 came out in like October

they were working on gemini 2 in parallel

elder rapids Mar 25, 2025, 11:37 PM

#

it was on lmsys too, everyone was talking about it

elder rapids Mar 25, 2025, 11:37 PM

#

keen beacon they were working on gemini 2 in parallel

yeah I know

#

that's what I mean here

#

it's not like they're throwing away progress

#

since the "progress" is research itself

#

so they could completely ditch 2.5 tomorrow if they find another breakthrough

keen beacon Mar 25, 2025, 11:38 PM

#

002 wasnt a new pretrained model it was just another tune afaik

elder rapids Mar 25, 2025, 11:38 PM

#

yeah I know

#

but 2.0 is completely different

#

and then jumping from 2.0 to 2.5 within a couple months seems reasonable

#

that's how they managed going from "bard" to "ultra 1.0" and then a month later, into 1.5 pro

#

and then ditched ultra

keen beacon Mar 25, 2025, 11:39 PM

#

elder rapids 2.0 pro was in experimental in November

thats like 5 months, they started work after june 2024. the supposed cut off of 2.5 pro is january 2025

elder rapids Mar 25, 2025, 11:39 PM

#

keen beacon thats like 5 months, they started work after june 2024. the supposed cut off of ...

that's not crazy tho

#

they've done this more than once

keen beacon Mar 25, 2025, 11:40 PM

#

so ur saying they pretrained a new 2.5 pro model from scratch, did reasoning rl, safety, etc. in 2 months??

elder rapids Mar 25, 2025, 11:41 PM

#

saying the best AI compute in the world can't do ts is wild

#

safety aligning is the hardest part of that process

#

and I'm pretty sure past models would be insanely informative of that process

#

they probably wanted to get 2.0 over with, with a breakthrough

meager sun Mar 25, 2025, 11:42 PM

#

🧱 🤣 👍

elder rapids Mar 25, 2025, 11:42 PM

#

and then follow up with 2.5 to use it

#

the fact that 2.5 isn't actually that affected by the transformer context drop off is insane

#

it has to be different, there's no other way tbh

#

what if it's TITANS

#

that'd be crazy

#

we'll literally never know, they could have something that actually performs with titans techniques

#

etc

#

2.5 is simply different from 2.0

verbal nimbus Mar 25, 2025, 11:51 PM

#

elder rapids the fact that 2.5 isn't actually that affected by the transformer context drop o...

Found this interesting benchmark

keen beacon Mar 25, 2025, 11:52 PM

#

yeah he posted it here earlier lol

verbal nimbus Mar 25, 2025, 11:52 PM

#

Gemini is indeed the best at 120K

verbal nimbus Mar 25, 2025, 11:52 PM

#

keen beacon yeah he posted it here earlier lol

Ah

#

It struggles a bit at 16K (typical transformer behavior)

elder rapids Mar 25, 2025, 11:54 PM

#

verbal nimbus It struggles a bit at 16K (typical transformer behavior)

that might be a testing issue rather than it's actual performance

verbal nimbus Mar 25, 2025, 11:54 PM

#

Didn't notice V3 0324 is on there too

elder rapids Mar 25, 2025, 11:54 PM

#

damn

#

eclipsed in context

verbal nimbus Mar 25, 2025, 11:54 PM

#

elder rapids that might be a testing issue rather than it's actual performance

There's a reason why transformers struggle in the middle: https://www.youtube.com/watch?v=FAspMnu4Rt0

elder rapids Mar 25, 2025, 11:55 PM

#

ah yeah I know, but I mean

#

I'm not sure it would be so sudden

#

and THAT great

#

the other models don't seem to be affected

verbal nimbus Mar 25, 2025, 11:56 PM

#

Hmm yeah that's a bit odd

elder rapids Mar 25, 2025, 11:57 PM

#

that's really crazy tho tbh, 90 vs 65.6

#

2.5 pro vs 4o

keen beacon Mar 25, 2025, 11:58 PM

#

2.5 pro can do 2m-10m context+, 4o total context is 128k-200k

verbal nimbus Mar 25, 2025, 11:58 PM

#

elder rapids that's really crazy tho tbh, 90 vs 65.6

Sonnet is surprisingly even worse

elder rapids Mar 25, 2025, 11:58 PM

#

keen beacon 2.5 pro can do 2m-10m context+, 4o total context is 128k-200k

ye, but 1.5 and 2.0 pro still struggled in granularity

#

it would be more like need in a haystack

#

rather than actual reasoning

#

but with 2.5 pro that kinda just stopped existing as a problem

#

I tested it too

#

I don't think you guys realize how crazy this is tbh

verbal nimbus Mar 25, 2025, 11:59 PM

#

elder rapids that'd be crazy

That would be, but I don't think they'd start with such a big model for Titan?

elder rapids Mar 26, 2025, 12:00 AM

#

I'm really not sure

keen beacon Mar 26, 2025, 12:00 AM

#

verbal nimbus That would be, but I don't think they'd start with such a big model for Titan?

its highly likely its just 2.0 pro with continued pretraining

elder rapids Mar 26, 2025, 12:00 AM

#

2.5 pro just kinda shook me, especially testing it on lmsys with nebula

keen beacon Mar 26, 2025, 12:00 AM

#

at least the base model

verbal nimbus Mar 26, 2025, 12:00 AM

#

keen beacon its highly likely its just 2.0 pro with continued pretraining

It's a thinking model too, probably RL trained?

elder rapids Mar 26, 2025, 12:01 AM

#

ye ofc

keen beacon Mar 26, 2025, 12:01 AM

#

verbal nimbus It's a thinking model too, probably RL trained?

yes they updated the base model then tuned it for reasoning/rl on it

verbal nimbus Mar 26, 2025, 12:01 AM

#

I wish there was more news on Titan/Mamba-variants

elder rapids Mar 26, 2025, 12:01 AM

#

it has a unique cot too tho

keen beacon Mar 26, 2025, 12:01 AM

#

verbal nimbus I wish there was more news on Titan/Mamba-variants

i dont think mamba is good

elder rapids Mar 26, 2025, 12:01 AM

#

read it's reasoning process

#

it uses weird code words

#

symbols

#

etc

verbal nimbus Mar 26, 2025, 12:02 AM

#

keen beacon i dont think mamba is good

Google made two variants based on Mamba that performed better, but I haven't heard anything since.

keen beacon Mar 26, 2025, 12:02 AM

#

verbal nimbus Google made two variants based on Mamba that performed better, but I haven't hea...

this is a good article from what i recall https://magic.dev/blog/100m-token-context-windows

100M Token Context Windows — Magic

Research update on ultra-long context models, our partnership with Google Cloud, and new funding.

#

why mamba/etc dont actually work

elder rapids Mar 26, 2025, 12:02 AM

#

wonder how this is gonna go in notebook llm

#

I've had problems with it

#

nobody seems to care since it's trivial

#

but I think the products could be so much better

keen beacon Mar 26, 2025, 12:04 AM

#

keen beacon this is a good article from what i recall https://magic.dev/blog/100m-token-cont...

its been a while since i read this tho 🤣

verbal nimbus Mar 26, 2025, 12:04 AM

#

keen beacon its been a while since i read this tho 🤣

I'll check it out

#

These were the Mamba variants Google made: https://www.reddit.com/r/MachineLearning/comments/1b3leks/deepmind_introduces_hawk_and_griffin_r/

#

Haven't heard anything since though 🤷. Same with Titans.

keen beacon Mar 26, 2025, 12:06 AM

#

verbal nimbus Haven't heard anything since though 🤷. Same with Titans.

transformers keep being improved and improved tbh i dont see anything replacing it lol

verbal nimbus Mar 26, 2025, 12:07 AM

#

keen beacon transformers keep being improved and improved tbh i dont see anything replacing ...

Probably not anytime soon. Diffusion LLMs seemed interesting though.

elder rapids Mar 26, 2025, 12:07 AM

#

keen beacon transformers keep being improved and improved tbh i dont see anything replacing ...

ye but I guess still not technically the same architecture

#

as current thinking models

#

but probably gonna remain the base

#

I think the problems we currently have now will eventually be fixed, like better reasoning by creating a CoT

#

and then more attachments

ocean vortex Mar 26, 2025, 12:12 AM

#

elder rapids as current thinking models

What. Thinking models are literally your standard transformer architecture with some fine-tuning. Nothing under the hood is changed

keen beacon Mar 26, 2025, 12:13 AM

#

yeah this guy is wild man

elder rapids Mar 26, 2025, 12:20 AM

#

ocean vortex What. Thinking models are literally your standard transformer architecture with ...

what's with the lack of reading comprehension here

#

I don't want to be rude, this has happened more than once too

#

but goddamn

elder rapids Mar 26, 2025, 12:22 AM

#

elder rapids but probably gonna remain the base

^ literally said this

#

🙏😔

leaden palm Mar 26, 2025, 12:41 AM

#

elder rapids ye but I guess still not technically the same architecture

??

rigid widget Mar 26, 2025, 12:52 AM

#

leaden palm ??

good luck

elder rapids Mar 26, 2025, 12:54 AM

#

leaden palm ??

because it isn't technically the same architecture lmao you guys are confusing transformer with what we have now, which has been established as a change for a while now, as with gpt or native multimodality

#

now I'm wondering if you guys are trolling lmao

#

this is getting ridiculous

leaden palm Mar 26, 2025, 1:02 AM

#

elder rapids because it isn't technically the same architecture lmao you guys are confusing t...

what we have now, which has been established as a change for a while now

are you saying "all modern models (even llama) have tweaks and improvements over the original gpt, and gpt is a large improvement over transformers" (pedantic) or "thinking models have an architecturally different way of generating text" (incorrect, see r1)

willow grail Mar 26, 2025, 1:04 AM

#

what the rate limit for gemini 2.5?

keen beacon Mar 26, 2025, 1:05 AM

#

willow grail what the rate limit for gemini 2.5?

I've been using it a lot and haven't encountered it yet

#

If there is one it's very high. I don't think aistudio is limited like the free api offering

willow grail Mar 26, 2025, 1:06 AM

#

keen beacon If there is one it's very high. I don't think aistudio is limited like the free ...

but ai studio is also free same as openrouter

keen beacon Mar 26, 2025, 1:06 AM

#

willow grail but ai studio is also free same as openrouter

Ya but u have low rpd

willow grail Mar 26, 2025, 1:06 AM

#

have u connted a ide with 2.5?

keen beacon Mar 26, 2025, 1:06 AM

#

Requests per day

#

Idk about openrouter tho

keen beacon Mar 26, 2025, 1:06 AM

#

willow grail but ai studio is also free same as openrouter

I mean on the aistudio website there aren't limits

leaden palm Mar 26, 2025, 1:06 AM

#

willow grail what the rate limit for gemini 2.5?

willow grail Mar 26, 2025, 1:07 AM

#

keen beacon I mean on the aistudio website there aren't limits

oke so u copy paste everything into your ide

willow grail Mar 26, 2025, 1:07 AM

#

leaden palm

top rpm vs bottom rpm vs req day?

keen beacon Mar 26, 2025, 1:07 AM

#

I don't use ai to code yet they suck at rust

leaden palm Mar 26, 2025, 1:07 AM

#

openrouter actually gives you more limits lol

leaden palm Mar 26, 2025, 1:07 AM

#

willow grail top rpm vs bottom rpm vs req day?

top one is if you have a payment method

#

(which is weird because it's free either way)

willow grail Mar 26, 2025, 1:08 AM

#

leaden palm openrouter actually gives you more limits lol

doesnt openrouter share the one 2.5 one with all users?

#

so everybody has less prompts

leaden palm Mar 26, 2025, 1:08 AM

#

willow grail doesnt openrouter share the one 2.5 one with all users?

they contacted google for higher limits

elder rapids Mar 26, 2025, 1:11 AM

#

leaden palm > what we have now, which has been established as a change for a while now are ...

this would make sense if this premise weren't my own claim lol, they suggested fundemental architectural change but I said it isn't technically the same but it doesn't matter since with or without inherent limitations (transformer, or not), we can optimize for other specific tasks like we did with CoT, and what were already doing (for agentic use)

willow grail Mar 26, 2025, 1:13 AM

#

leaden palm they contacted google for higher limits

thanks honey

elder rapids Mar 26, 2025, 1:13 AM

#

since it's architectural identity wasn't a primary claim, and what I said operates on its lack of relevance already, this is just a comprehension issue

leaden palm Mar 26, 2025, 1:14 AM

#

elder rapids since it's architectural identity wasn't a primary claim, and what I said operat...

comprehension??

#

you're the one saying that thinking models use different architectures

#

and don't get that r1 is just v3 RLd on thinking

elder rapids Mar 26, 2025, 1:20 AM

#

leaden palm you're the one saying that thinking models use different architectures

I explicitly said "remain the base" dawg 😭

#

and even clarified "not technically the same" so I consider what I'm saying pedantic posturing, but for rhetorical purposes

#

since the discussion is operating primarily on the CORE architecture, ie titans vs transformers and I'm explicitly stepping away from that dialectic, what do you think I'm saying

#

not only that, I even clarified why they can technically be distinguished between the base transformer architecture (Ie gpt, multimodality) and since yes comprehension is an issue, you dismissed it with "pedantry" knowing that's the premise, not my rebuttal towards what they're saying

leaden palm Mar 26, 2025, 1:30 AM

#

nobody asked

keen beacon Mar 26, 2025, 1:33 AM

#

can someone with access to o1 pro give it this

#

the answer is permanent

#

but gemini 2.5 pro, grok 3 thinking and claude 3.7 sonnet thinking all fail

leaden palm Mar 26, 2025, 1:36 AM

#

keen beacon can someone with access to o1 pro give it this

Question 5

A particle P, of mass m, is attached to one end of a light elastic string of natural length 0.5 m and modulus of elasticity 2mg. The other end of the string is attached to a fixed point A on a rough horizontal surface.

P is held at a point B, where |AB|=0.5 m and given a speed of 1.4 ms⁻¹ in the direction AB.

P comes at rest at the point C.

Determine whether this position of rest is instantaneous or permanent.

heres the transcription

keen beacon Mar 26, 2025, 1:37 AM

#

looks like 2.5 pro gets it with code execution