#general | Arena | Page 58

torn mantle Jun 16, 2025, 8:58 AM

#

also please refrain from sharing them pls

#

im noticing google is patching them lightly

small haven Jun 16, 2025, 8:58 AM

#

torn mantle also please refrain from sharing them pls

it's not like there a whole ass forum talking about it, but will do, are u female tho

#

@deep adder

torn mantle Jun 16, 2025, 8:59 AM

#

you didnt catch that

#

hehe

small haven Jun 16, 2025, 8:59 AM

#

huh sure

torn mantle Jun 16, 2025, 8:59 AM

#

wth

#

@small haven btw i couldnt get any of them to work

small haven Jun 16, 2025, 9:07 AM

#

torn mantle @small haven btw i couldnt get any of them to work

me neither

#

theres a trick, u have to remove token count or something like that

#

but i couldnt bother

torn mantle Jun 16, 2025, 9:08 AM

#

small haven theres a trick, u have to remove token count or something like that

huh

#

thats just the modal info

small haven Jun 16, 2025, 9:10 AM

#

torn mantle thats just the modal info

idk have u seen the link?

torn mantle Jun 16, 2025, 9:12 AM

#

yea someone sent me that already

small haven Jun 16, 2025, 9:12 AM

#

mhmm

torn mantle Jun 16, 2025, 9:12 AM

#

the thing is, they're talking about models that require the thinking budget to be turned off in order to work

keen beacon Jun 16, 2025, 9:13 AM

#

use flash as a template instead of pro so you can do that

#

the models suck anyway

#

the ones that remain

torn mantle Jun 16, 2025, 9:14 AM

#

yea

keen beacon Jun 16, 2025, 9:21 AM

#

small haven it's not like there a whole ass forum talking about it, but will do, are u femal...

google employees are probably reading that 🤣

calm sequoia Jun 16, 2025, 10:55 AM

#

Guys, has anyone here used Copilot 365?

#

I need an agentic tool like Cursor but for documents.

#

When I hear "copilot" i get ick. But it seems to be only solution

ocean vortex Jun 16, 2025, 11:13 AM

#

I think they need to add smth like SimpleQA into their test suite. We need at least 1 benchmark where model size does make a difference

#

otherwise people looking at this may just as well conclude that there's no point paying for larger models lol

#

2.5 Flash 26% vs 54% for 2.5 Pro

#

kinda same story for other labs as well

tall summit Jun 16, 2025, 11:18 AM

#

torn mantle also please refrain from sharing them pls

whatd they send

ocean vortex Jun 16, 2025, 11:19 AM

#

tall summit whatd they send

udes

tall summit Jun 16, 2025, 11:19 AM

#

real

soft kernel Jun 16, 2025, 12:25 PM

#

calm sequoia When I hear "copilot" i get ick. But it seems to be only solution

I'm pretty sure there are better ones out there

olive mesa Jun 16, 2025, 12:34 PM

#

yo!!

torn mantle Jun 16, 2025, 12:35 PM

#

olive mesa yo!!

eeeeeh

cedar tide Jun 16, 2025, 12:37 PM

#

olive mesa yo!!

Fake

fleet lintel Jun 16, 2025, 12:38 PM

#

olive mesa yo!!

100% fake... why do you folks do that?

calm sequoia Jun 16, 2025, 12:52 PM

#

soft kernel I'm pretty sure there are better ones out there

That's what I thought, but there doesn't seem to be anything.

hazy quest Jun 16, 2025, 12:57 PM

#

https://www.reddit.com/r/singularity/comments/1lcqy8a/the_mysterious_kangaroo_video_model_on_artificial/

From the singularity community on Reddit: The mysterious "Kangaroo"...

Explore this post and more from the singularity community

cedar tide Jun 16, 2025, 2:55 PM

#

New open source large reasoning model, from minimax
https://huggingface.co/MiniMaxAI/MiniMax-M1-80k

MiniMaxAI/MiniMax-M1-80k · Hugging Face

civic flame Jun 16, 2025, 3:03 PM

#

2.5 pro beat it in literally every benchmark in that image lol

cloud venture Jun 16, 2025, 3:07 PM

#

4 out of 5

#

yeah ig o3 is "winning"...

#

...the second place

calm sequoia Jun 16, 2025, 3:14 PM

#

cedar tide New open source large reasoning model, from minimax https://huggingface.co/MiniM...

These kinds of charts must be made by Apple UI developers

dusky aurora Jun 16, 2025, 3:18 PM

#

again,"there was an error" to everything

atomic pagoda Jun 16, 2025, 3:20 PM

#

I’m getting “there was an error” to everything too

dusky aurora Jun 16, 2025, 3:20 PM

#

as George Martin said, "outage is coming"

atomic pagoda Jun 16, 2025, 3:21 PM

#

Great

echo aurora Jun 16, 2025, 3:21 PM

#

uh oh

#

team is looking into blobsalute

dusky aurora Jun 16, 2025, 3:23 PM

#

echo aurora team is looking into blobsalute

team gets my thanks

boreal saddle Jun 16, 2025, 3:23 PM

#

dusky aurora again,"there was an error" to everything

"There was an error." is the new "As a AI language model..." at this point.

late path Jun 16, 2025, 3:25 PM

#

500

indigo hazel Jun 16, 2025, 3:27 PM

#

the error is happening on both mobile and pc

late path Jun 16, 2025, 3:27 PM

#

thought it was my problem and cleared all my cookies

#

didn't help

fleet lintel Jun 16, 2025, 3:27 PM

#

cedar tide New open source large reasoning model, from minimax https://huggingface.co/MiniM...

why no ultra model from gemini yet? They don't have like a $200 per month subscription. they should have one, even if it's only 10-20% better

echo aurora Jun 16, 2025, 3:27 PM

#

my apologies for the inconvenience everyone! we are looking into getting a fix out asap

indigo hazel Jun 16, 2025, 3:29 PM

#

echo aurora my apologies for the inconvenience everyone! we are looking into getting a fix o...

but why, if i can ask, arena doesn't allow the use of o3-pro as well as it didn't do for the previous o1-pro? because it costs too much?

echo aurora Jun 16, 2025, 3:30 PM

#

indigo hazel but why, if i can ask, arena doesn't allow the use of o3-pro as well as it didn'...

I won't be able to share details sorry to say 😦 but note this has been passed along to the team to consider

indigo hazel Jun 16, 2025, 3:31 PM

#

echo aurora I won't be able to share details sorry to say 😦 but note this has been passed a...

oh ok, no problem. thank you for the answer, appreciated

jovial heath Jun 16, 2025, 3:33 PM

#

Hi, is there an error on the site?? I'm receiving there was an error message everytime xD or is just me??

sudden cloud Jun 16, 2025, 3:35 PM

#

jovial heath Hi, is there an error on the site?? I'm receiving there was an error message eve...

Ye me too

indigo hazel Jun 16, 2025, 3:36 PM

#

they re working on it i think

sudden cloud Jun 16, 2025, 3:36 PM

#

Right thanks 🙏🏼

echo aurora Jun 16, 2025, 3:40 PM

#

jovial heath Hi, is there an error on the site?? I'm receiving there was an error message eve...

we are aware of these issues and working hard to get it sorted soon, our apologies!!

jovial heath Jun 16, 2025, 3:48 PM

#

echo aurora we are aware of these issues and working hard to get it sorted soon, our apologi...

Thank youu 😄 I'll just be patient then 😄

indigo hazel Jun 16, 2025, 3:52 PM

#

pineapple is really patient xD, great person

echo aurora Jun 16, 2025, 3:57 PM

#

indigo hazel pineapple is really patient xD, great person

you're too kind!

#

ablobcheer okay should be working again!!!

#

get back to battling!

indigo hazel Jun 16, 2025, 3:59 PM

#

echo aurora get back to battling!

yes it works

#

thank you very much

echo aurora Jun 16, 2025, 4:01 PM

#

no, thank you all for flagging! truly helps us so much. we couldn't be more thankful for an active community ❤️

calm sequoia Jun 16, 2025, 4:06 PM

#

indigo hazel but why, if i can ask, arena doesn't allow the use of o3-pro as well as it didn'...

It's because nobody gonna pay 200 for +5 ELO points 😄

whole wagon Jun 16, 2025, 4:21 PM

#

They use it as a defense somehow. Like o3 pro would top the simple bench if only they benchmarked it 😂 like barely anyone can afford to bench it lol

late path Jun 16, 2025, 4:32 PM

#

new deepseek r1 got #2 on webdev arena

torn mantle Jun 16, 2025, 4:39 PM

#

?

#

i dont like blackbooth

blazing coyote Jun 16, 2025, 4:41 PM

#

Blacktooth feels noticeably worse than Kingfall

whole wagon Jun 16, 2025, 4:49 PM

#

I saw some others saying they like it

cedar tide Jun 16, 2025, 4:51 PM

#

Qwen no thinking better than thinking

Screenshot_2025-06-16-18-39-14-271_com.android.chrome-edit.jpg

#

https://x.com/ArtificialAnlys/status/1934654306839613560?t=XbykZywntRz0lBqSG3i-_g&s=19

Artificial Analysis (@ArtificialAnlys)

GPT-4o and FLUX.1 Kontext are the leading image editing models after more than 20,000 votes in the Artificial Analysis Image Editing Arena!

Here are the key takeaways:
➤ OpenAI's GPT-4o and @bfl_ml FLUX.1 Kontext (both Pro and Max) sit close together at the top of the

surreal creek Jun 16, 2025, 4:55 PM

#

Wow we may be entering the Chinese century

sour spindle Jun 16, 2025, 4:55 PM

#

calm sequoia It's because nobody gonna pay 200 for +5 ELO points 😄

Speak for yourself. FOMO is a helluva drug lol

whole wagon Jun 16, 2025, 4:55 PM

#

What checkpoint is dropping in 3 days? Is it kingfall?

surreal creek Jun 16, 2025, 4:56 PM

#

surprising overperformance by DeepSeek and Alibaba!

whole wagon Jun 16, 2025, 4:56 PM

#

Is black tooth Gemini ultra I saw ppl say that

keen beacon Jun 16, 2025, 4:57 PM

#

if u mean ga 2.5 pro, it will be the same as the 0605 preview just renamed to ga

whole wagon Jun 16, 2025, 4:57 PM

#

Kingfall is ultra?

surreal creek Jun 16, 2025, 4:57 PM

#

also, when the leaderboards were fully transferred over to the new LMArena site, did they just standardize them around 2.5 Pro 05-06 being 1446 Elo in every category?

surreal creek Jun 16, 2025, 4:58 PM

#

whole wagon Is black tooth Gemini ultra I saw ppl say that

it’s exceptionally smart so I wouldn’t be surprised, not sure what this “kingfall” model is that ppl are talking about

whole wagon Jun 16, 2025, 5:16 PM

#

blacktooth made me this plane svg which is pretty cool

#

It even did some really nice details like the hint of the far wing

#

https://pastebin.com/5GNAVpnU

#

Here is another (https://pastebin.com/TM5CuvKm)

haughty tangle Jun 16, 2025, 5:30 PM

#

v-jepa 2 having only 1.2b parameters but being near o3 is crazy

whole wagon Jun 16, 2025, 5:32 PM

#

This is gemini-2.5-pro-preview-05-06 😂 no way blacktooth is just another checkpoint of 2.5 pro

late path Jun 16, 2025, 5:34 PM

#

blacktooth does better than 0605 on SVG, but still not as good as kingfall

fleet lintel Jun 16, 2025, 5:35 PM

#

cedar tide https://x.com/ArtificialAnlys/status/1934654306839613560?t=XbykZywntRz0lBqSG3i-_...

why comparing with Flash version? And not imagen 3 or imagen 4? 🤔

whole wagon Jun 16, 2025, 5:36 PM

#

i dont really see much difference with the robot face ngl

fleet lintel Jun 16, 2025, 5:36 PM

#

late path new deepseek r1 got #2 on webdev arena

Wow! Deepseek is doing great

whole wagon Jun 16, 2025, 5:36 PM

#

the plane one is so much more obvious

fleet lintel Jun 16, 2025, 5:36 PM

#

ohk

#

what is this?

#

Looks like a new model.. do we know which company Blacktooth is from?

late path Jun 16, 2025, 5:39 PM

#

whole wagon the plane one is so much more obvious

Can you share the prompt you used?

whole wagon Jun 16, 2025, 5:39 PM

#

"Generate an SVG of a plane. Make it as detailed as possible" This is it

fleet lintel Jun 16, 2025, 5:40 PM

#

oh..looks like blacktooth is gemini

late path Jun 16, 2025, 5:41 PM

#

haha I found old messages. This one seems to be kingfall as well

echo aurora Jun 16, 2025, 5:42 PM

#

reminds me of:

potent snow Jun 16, 2025, 5:43 PM

#

Is there some way to use images as reference to generate?

whole wagon Jun 16, 2025, 5:52 PM

#

late path haha I found old messages. This one seems to be kingfall as well

is knightfall still showing in battle?

#

i dont seem to get it yet

verbal nimbus Jun 16, 2025, 5:55 PM

#

whole wagon "Generate an SVG of a plane. Make it as detailed as possible" This is it

Did you try Claude?

#

It was generally the best model at drawing in TikZ.

whole wagon Jun 16, 2025, 5:56 PM

#

claude-sonnet-4-20250514-thinking-32k

echo aurora Jun 16, 2025, 5:56 PM

#

potent snow Is there some way to use images as reference to generate?

yup! there is the capability to do image edit, you can learn more here: - https://x.com/lmarena_ai/status/1929953954554884211

lmarena.ai (@lmarena_ai)

Image Editing just got real on LMArena 🖼️✨

Introducing Image Edit Arena: where AI editing models go head-to-head on your images. Upload, edit, vote. It's that simple.

Who edits it best? You decide🫵

Learn how it works in thread 🧵

verbal nimbus Jun 16, 2025, 5:57 PM

#

This is Grok? Pretty good

verbal nimbus Jun 16, 2025, 5:58 PM

#

whole wagon `claude-sonnet-4-20250514-thinking-32k`

Hmmm not great, lol. How does Opus compare?

whole wagon Jun 16, 2025, 5:58 PM

#

i had one with opus but it was just as bad so it lost

late path Jun 16, 2025, 5:58 PM

#

whole wagon is knightfall still showing in battle?

It's not available anymore, but I really hope @echo aurora and google can bring it back to the arena😭

verbal nimbus Jun 16, 2025, 5:59 PM

#

On Claude Web, you can feed back the image and ask it to iterate on the design.

#

Seemed to work on a unicorn example.

atomic pagoda Jun 16, 2025, 6:00 PM

#

I’m still getting the error message when I try to send something or something went wrong with this response try again

verbal nimbus Jun 16, 2025, 6:03 PM

#

Claude 3.5 I think. Someone made a timelapse from feeding the outputs of each iteration back to the model. In TikZ (very difficult to draw in, but used to prevent data contamination).

atomic pagoda Jun 16, 2025, 6:05 PM

#

Is anyone else getting the error message or something went wrong with this response try again

echo aurora Jun 16, 2025, 6:05 PM

#

atomic pagoda I’m still getting the error message when I try to send something or something we...

you are? is it for all models? in all modes?

atomic pagoda Jun 16, 2025, 6:06 PM

#

I don’t know, it’s for Claude opus 4 in direct chat mode

verbal nimbus Jun 16, 2025, 6:06 PM

#

atomic pagoda Is anyone else getting the error message or something went wrong with this respo...

I keep getting at least 1 blank output for 95+% of WebDev Arena battles. Idk how the results would even be useful.

#

In 20+ battles, I think I only got a response from both models once. All the other times, one of the models would think for a long time but output nothing.

echo aurora Jun 16, 2025, 6:08 PM

#

atomic pagoda I don’t know, it’s for Claude opus 4 in direct chat mode

👍 we have a thread spun up for this so will tag you there with followup questions.

atomic pagoda Jun 16, 2025, 6:08 PM

#

Ok

calm sequoia Jun 16, 2025, 6:40 PM

#

The only loser here is Musk 🙂

#

Musk: 200k SOTA GPUs, DeepSeek: 0 SOTA GPUS (smuggling)

verbal nimbus Jun 16, 2025, 6:44 PM

#

verbal nimbus I keep getting at least 1 blank output for 95+% of WebDev Arena battles. Idk how...

These are literally my last 12 rounds on WebDev Arena:

R1 0528 blank
R1 0528 & prowlridge blank
R1 0528 & prowlridge blank
R1 0528 blank
blacktooth blank
blacktooth & gpt-4.1-mini-2025-04-14 blank
R1 0528 & blacktooth blank
R1 0528 blank
R1 0528 blank
R1 0528 & blacktooth blank
R1 0528 & blacktooth blank
R1 0528 & prowlridge blank

#

These are in consecutive order too, I'm not cherry picking.

small haven Jun 16, 2025, 6:55 PM

#

late path

kingfall will never be topped

torn mantle Jun 16, 2025, 6:57 PM

#

Not a fan of blackbooth

elder rapids Jun 16, 2025, 7:06 PM

#

imo that's just nonsensical, blacktooth is really good

#

it gaps all the other models

keen beacon Jun 16, 2025, 7:06 PM

#

personally i like kingfall better

elder rapids Jun 16, 2025, 7:07 PM

#

yeah but they're likely just different models and as far as I can tell, blacktooth is much more refined

#

no syntax problems in its output, respects the thinking process unlike kingfall, doesn't jump to conclusions, still understands everything

#

I can't trust that you guys experienced the same thing I did with kingfall, but it's not the insane model you guys are making it out to be

#

it has insane spatial abilities imo and excels in some tasks, but it performed worse usually in plain context tasks where it has to track maybe an argument, or conclude something within that box of context without just "I feel like overall x is better", stuff that necessitates an inherent grasp before even thinking about it

#

o3 imo WAS the best at this, kingfall couldn't fill that, but 0605 accomplishes these "if you know, you know" tasks

jade egret Jun 16, 2025, 7:13 PM

#

guys

#

plz help which llm is best for pygame?

elder rapids Jun 16, 2025, 7:13 PM

#

and that's the same thing I'm getting with blacktooth, which kingfall ultimately failed

small haven Jun 16, 2025, 7:14 PM

#

blacktooth is a lmarena proc, not at all functional compared to kingfall when u look at coding, hence why the svg's were also not as high-fidelity

elder rapids Jun 16, 2025, 7:16 PM

#

cool but I don't get how that's relevant

#

I don't get the glaze tbh, I abused that model a ton, it was so good I can't tell if it just had a different thinking summary, or that thinking summary just really liked what it was seeing

keen beacon Jun 16, 2025, 7:17 PM

#

kingfall felt like it had less work done on it. it had a lot of magical moments/etc where it wasn't diluted by the post training and was unintentional. (being amazing at svgs which isn't a usual post training thing you usually do, i think, and a case where it consistently started solving two 6x6 zebra puzzles when it consistently failed when given one, by spontaneously making a system when faced with increased complexity) blacktooth feels like an overcorrection/overdone post training cooking the model on stuff outside of distribution whereas your task might be in distribution for this revision. (i believe kingfall/blacktooth use the same base model, they're just different post training revisions)

small haven Jun 16, 2025, 7:18 PM

#

elder rapids I don't get the glaze tbh, I abused that model a ton, it was so good I can't tel...

were your prompts mostly riddles/simplebench q's?

elder rapids Jun 16, 2025, 7:19 PM

#

keen beacon kingfall felt like it had less work done on it. it had a lot of magical moments/...

this could be the case ye

elder rapids Jun 16, 2025, 7:19 PM

#

small haven were your prompts mostly riddles/simplebench q's?

who does that lmfao

small haven Jun 16, 2025, 7:19 PM

#

elder rapids who does that lmfao

hmm one guy

#

donald trump is delayed, damn it

elder rapids Jun 16, 2025, 7:20 PM

#

id already moved on from kingfall with riddles or tests

#

after like a couple hours of release

elder rapids Jun 16, 2025, 7:21 PM

#

elder rapids I don't get the glaze tbh, I abused that model a ton, it was so good I can't tel...

anyone feel this way too

#

did nobody see how different the summary was

#

or nah

#

it was way different

keen beacon Jun 16, 2025, 7:22 PM

#

i heard someone talk about that

#

i dont pay attention to the summary that much though

#

i leak cot if i want to read it

elder rapids Jun 16, 2025, 7:22 PM

#

ye I'd expect that, I just ignore it

#

but it caught my eye when it started naturally capitalizing and emphasizing things

#

and placing things where like, damn you can really understand it

#

and not the roboticness that the current summary has in aistudio

keen beacon Jun 16, 2025, 8:10 PM

#

iirc it will remain on chatgpt tho

primal orbit Jun 16, 2025, 8:57 PM

#

I got blacktooth in general arena, not dev

meager harbor Jun 16, 2025, 9:07 PM

#

calm sequoia The only loser here is Musk 🙂

we'll see with grok 3.5, google had so much more computational power than OAI but since they joined the LLM party late, they were behind for like 2 years. Same should apply for xAI and it's even worse since xAI started from scratch while it wasn't the case for google, you need some nuance in your thinking.
But yeah I see OpenAI and google on top of xAI in the long run.

meager harbor Jun 16, 2025, 9:09 PM

#

late path new deepseek r1 got #2 on webdev arena

why don't AI companies know how versioning works? couldn't they call it deepseek r1.1 or r1.5?

#

it's annoying as hell

#

Gemini with their 0506 and 0605 confusion

#

why are they trying to mess with us so badly ? a simple versioning would have done the trick

#

but NO they want to f***k with us it seems

#

Not even mentioning OAI who are the jerks king when it comes to bad versioning and confuse the hell out of the user

brittle tiger Jun 16, 2025, 9:22 PM

#

https://x.com/testingcatalog/status/1934718555041276237?t=vEZ4J5JTatCLWC1C0SzRxQ&s=19

TestingCatalog News 🗞 (@testingcatalog)

BREAKING 🚨: Google is preparing Gemini 2.5 Pro Deep Think for a release!

It will appear as a new option in the toolbar and will take several minutes to finalise.

Kingfall? 👀

small haven Jun 16, 2025, 9:25 PM

#

donald trump is delayed

#

apparently

meager harbor Jun 16, 2025, 9:27 PM

#

brittle tiger https://x.com/testingcatalog/status/1934718555041276237?t=vEZ4J5JTatCLWC1C0SzRxQ...

o3 pro counter attack

sacred quail Jun 16, 2025, 9:27 PM

#

its so sad that LLMs are focusing codes more and focusing writing less

#

i can understand why but still

small haven Jun 16, 2025, 9:28 PM

#

its so good that they removed it, holy moly

sacred quail Jun 16, 2025, 9:28 PM

#

i'd say gemini 06/05 best(even better than opus 4) but im a gemini fan soo i could be biased

meager harbor Jun 16, 2025, 9:29 PM

#

why aren't 01 pro and 03 pro on lm arena ?

sacred quail Jun 16, 2025, 9:29 PM

#

expensive, too much time for answer

misty vault Jun 16, 2025, 9:30 PM

#

sacred quail i'd say gemini 06/05 best(even better than opus 4) but im a gemini fan soo i cou...

after months of shtting on gemini
this MIGHT actually be real now

small haven Jun 16, 2025, 9:30 PM

#

the great flippening

meager harbor Jun 16, 2025, 9:33 PM

#

meager harbor why aren't 01 pro and 03 pro on lm arena ?

well but then they can use any marketing BS to say its the best model in the world ever. I don't understand why OAI don't give free access for 03 pro to lm arena for Battle (and not direct access), it's not like they don't have the computational power to treat 500 requests per day for 03 pro. I call this BS. OAI could if they wanted to

small haven Jun 16, 2025, 9:35 PM

#

huh i get temporarily limited on o3 pro occasionally, they are still compute constrained

sacred quail Jun 16, 2025, 9:35 PM

#

I remember anthropic dissed lmarena because they think human feedback turns models lame
I mean, people like emojis and charts and they're listening to us about that but
still useful to see people's feedback

meager harbor Jun 16, 2025, 9:37 PM

#

small haven huh i get temporarily limited on o3 pro occasionally, they are still compute con...

yeah but its because i'm sure they're like hundred of thousands of prompt per day for 03pro alone

wintry tinsel Jun 16, 2025, 9:57 PM

#

sacred quail i'd say gemini 06/05 best(even better than opus 4) but im a gemini fan soo i cou...

They are tied each one has its own skills

#

At writing

#

Gemini is more Descriptive Opus feels more natural

storm needle Jun 16, 2025, 10:11 PM

#

small haven huh i get temporarily limited on o3 pro occasionally, they are still compute con...

openai scammed you

small haven Jun 16, 2025, 10:11 PM

#

its not really "unlimited"

#

as they say

meager harbor Jun 16, 2025, 10:12 PM

#

sacred quail I remember anthropic did diss against lmarena because they think human feedbacks...

best benchmark i saw so far is lm arena but yeah the problem is people often ask AI about something they're not well versed in. it's like 2 ai giving you 2 different answers on what the best move in chess is but since you know nothing or not enough about chess, you can't know who's right

#

you pay for my subscription then ?

storm needle Jun 16, 2025, 10:13 PM

#

small haven its not really "unlimited"

well you have gpt 4.5 unlimited at least

small haven Jun 16, 2025, 10:13 PM

#

storm needle well you have gpt 4.5 unlimited at least

they took that away too

storm needle Jun 16, 2025, 10:13 PM

#

small haven they took that away too

what is the limit?

small haven Jun 16, 2025, 10:13 PM

#

storm needle what is the limit?

zero

#

tbf i spam like a bot

meager harbor Jun 16, 2025, 10:14 PM

#

😂 for real ?

small haven Jun 16, 2025, 10:15 PM

#

better off holding it for deepthink

#

how do u know though

#

where is o3 pro usamo 2025

ornate agate Jun 16, 2025, 10:17 PM

#

all I want to know is how this happened

small haven Jun 16, 2025, 10:18 PM

#

hmm

small haven Jun 16, 2025, 10:19 PM

#

ornate agate all I want to know is how this happened

is that open source too?

#

bing chilling wants to see ur id

haughty tangle Jun 16, 2025, 10:23 PM

#

there's going to be a decent amount of people literally worshipping ASI once it's created, there's already a religion for people who think current AI's are gods

small haven Jun 16, 2025, 10:24 PM

#

https://tenor.com/view/xi-jinping-clapping-applause-gif-17581118

Tenor

elder rapids Jun 16, 2025, 10:26 PM

#

ornate agate all I want to know is how this happened

the video generators that are apparently "better" on all these leaderboards are pretty fake imo

#

I've tried them, none of them are nearly as good as veo 3

#

and it's not even close, most of them aren't as good as veo 2

#

it's really strange how high rated they are though

#

I do notice that they excel off of an image prompt or a flow-esque setup, and do pretty well for fantasy (because they're fine-tuned for that) but it shouldn't be this close

#

ye but I can't imagine that's anything but a flaw of the benchmark

#

rather than how good the model is in reality

#

I haven't used seedance

#

I'm just going off of all the other AI like Kling etc

wintry tinsel Jun 16, 2025, 10:32 PM

#

haughty tangle there's going to be a decent amount of people literally worshipping ASI once it'...

Naaaah that’s just as symbolic as saying money is your god, ai is just silicon or bismuth if we get there

jade egret Jun 16, 2025, 10:35 PM

#

hola

#

guys

#

why do i think claude 4 opus is much more creative than 2.5 pro?

small haven Jun 16, 2025, 10:39 PM

#

jade egret why do i think claude 4 opus is much more creative than 2.5 pro?

yes marginally better

ornate agate Jun 16, 2025, 10:39 PM

#

also I think claudes are dense models, whereas gemini is MoE, so it can use all those parameters for creativity

unborn ocean Jun 16, 2025, 10:40 PM

#

no, but they have higher quality human feedback and also a higher quantity of it

#

that is the main part

#

it is also why the models are really good in these short human preference evaluators

that said, bytedance still build a very impressive model, mainly because they can achieve lower prices than competitors (the models we get is already greatly distilled and technically 480p with a 1080p upscale)

jade egret Jun 16, 2025, 10:40 PM

#

ornate agate also I think claudes are dense models, whereas gemini is MoE, so it can use all ...

true

#

yall

#

do you think kingsfall is gonna be better than claude 4 opus (at those specific coding)

small haven Jun 16, 2025, 10:42 PM

#

jade egret do you think kingsfall is gonna be better than claude 4 opus (at those specific ...

100%

jade egret Jun 16, 2025, 10:42 PM

#

small haven 100%

W

#

what about

#

blacktooth?

small haven Jun 16, 2025, 10:42 PM

#

mhmmm, not rlly

jade egret Jun 16, 2025, 10:42 PM

#

dang

unborn ocean Jun 16, 2025, 10:43 PM

#

they are not as strong in the llm space

elder rapids Jun 16, 2025, 10:43 PM

#

jade egret why do i think claude 4 opus is much more creative than 2.5 pro?

it's not much more creative but it tends to play into fun and cohesive tropes more than 2.5 pro so it can appear that way, it's more natural, to a point

unborn ocean Jun 16, 2025, 10:43 PM

#

but i was (and i assumed you aswell) refering to their video models

elder rapids Jun 16, 2025, 10:44 PM

#

isn't a secret that 0325 was extremely creative

unborn ocean Jun 16, 2025, 10:48 PM

#

deepseek is a very respected newcomer, but there are other companies that capture more of the consumer market there

#

it should also be noted that bytedance operates a ranking platform almost identical to the one used by artificial analysis but for the chinese market (also including competitors, very similar to artificial analysis)

#

and that is probably also a reason why they are so good at the specific format

patent aspen Jun 16, 2025, 10:52 PM

#

One thing I haven't seen many people point out: if the video leaderboards included native audio generation, there would only be one model

unborn ocean Jun 16, 2025, 10:55 PM

#

patent aspen One thing I haven't seen many people point out: if the video leaderboards includ...

"Google facing the worst stock crash in its history, because its new veo 3 only reaches a 50% win rate!!!"

#

the market is in shambles

#

reporting live

misty vault Jun 16, 2025, 11:00 PM

#

yo my internet is bugged and user pfps not loading

#

oh wait

sour spindle Jun 16, 2025, 11:05 PM

#

Anyone know what model this is: prowlridge

patent aspen Jun 16, 2025, 11:09 PM

#

Just realized flash lite could be a pun on flashlight

small haven Jun 16, 2025, 11:22 PM

#

allegedly

potent pilot Jun 16, 2025, 11:56 PM

#

I'm just curious, has anyone opted out of the arbitration section of the ToS?

elder rapids Jun 17, 2025, 12:19 AM

#

sour spindle Anyone know what model this is: prowlridge

2.5 flash lite

late path Jun 17, 2025, 12:22 AM

#

https://x.com/OfficialLoganK/status/1934766679138951593

Logan Kilpatrick (@OfficialLoganK)

gemini
gemini
gemini

small haven Jun 17, 2025, 12:24 AM

#

late path https://x.com/OfficialLoganK/status/1934766679138951593

v3?

cedar tide Jun 17, 2025, 12:25 AM

#

deep think
2.5 flash lite - prowlridge
2.5 ultra - blacktooth

small haven Jun 17, 2025, 12:25 AM

#

so dt is still coming early? did that get revised alrdy?

small haven Jun 17, 2025, 12:26 AM

#

cedar tide deep think 2.5 flash lite - prowlridge 2.5 ultra - blacktooth

that would be a dream come true tbh

small haven Jun 17, 2025, 12:29 AM

#

cedar tide deep think 2.5 flash lite - prowlridge 2.5 ultra - blacktooth

it seems like it's going to be blacktooth aint it

#

hmm damn

patent aspen Jun 17, 2025, 12:34 AM

#

I know right

torn mantle Jun 17, 2025, 12:36 AM

#

So you're not sleeping?

#

Idk if ultra thingy is a new model or not

cedar tide Jun 17, 2025, 12:37 AM

#

#

torn mantle Jun 17, 2025, 12:37 AM

#

Sigh

#

Blackbooth is the next update?

#

Didn't like it

cedar tide Jun 17, 2025, 12:38 AM

#

what the horse ?

torn mantle Jun 17, 2025, 12:38 AM

#

Its like using 2.0

cedar tide Jun 17, 2025, 12:38 AM

#

torn mantle So you're not sleeping?

soon

torn mantle Jun 17, 2025, 12:38 AM

#

Flash lite

torn mantle Jun 17, 2025, 12:39 AM

#

cedar tide soon

Ok

cedar tide Jun 17, 2025, 12:43 AM

#

flash not today
but deepthink

#

prove that flash ga not today @patent aspen

patent aspen Jun 17, 2025, 12:45 AM

#

cedar tide prove that flash ga not today @patent aspen

Huh?

#

I have no idea what you're talking about

sacred quail Jun 17, 2025, 12:47 AM

#

cedar tide what the horse ?

must be about "speed"

cedar tide Jun 17, 2025, 12:49 AM

#

Wasn't there a code name for the Mystery Gemini models on the arena related to a horse?

cedar tide Jun 17, 2025, 12:49 AM

#

patent aspen I have no idea what you're talking about

?

patent aspen Jun 17, 2025, 12:49 AM

#

cedar tide ?

?

cedar tide Jun 17, 2025, 12:55 AM

#

@patent aspen You deleted your "prediction", did you think people were stupid?

patent aspen Jun 17, 2025, 12:55 AM

#

cedar tide @patent aspen You deleted your "prediction", did you think people were st...

?

#

Are you doing okay? I hope the rest of your day goes well!

torn mantle Jun 17, 2025, 12:58 AM

#

cedar tide Wasn't there a code name for the Mystery Gemini models on the arena related to a...

no

#

none

small haven Jun 17, 2025, 1:00 AM

#

cedar tide what the horse ?

should tweet a 👑

small haven Jun 17, 2025, 1:03 AM

#

torn mantle So you're not sleeping?

So you speak French

torn mantle Jun 17, 2025, 1:06 AM

#

small haven So you speak French

Your eyes played tricks on you

small haven Jun 17, 2025, 1:06 AM

#

torn mantle Your eyes played tricks on you

Are you a female

torn mantle Jun 17, 2025, 1:06 AM

#

Female pfffft

small haven Jun 17, 2025, 1:06 AM

#

lol

elder rapids Jun 17, 2025, 1:11 AM

#

cedar tide deep think 2.5 flash lite - prowlridge 2.5 ultra - blacktooth

just seems like flash lite, pro, and flash lmao

elder rapids Jun 17, 2025, 1:14 AM

#

cedar tide what the horse ?

the horse could be work horse, or horse for speed but can probably be inferred speed due to the strength emoji

#

so it could go like

flash lite - super speed, pro - powerful model, flash - fast model

#

deepthink seems to be on the way but I don't think it's guaranteed GA like with the other models, that was never set in stone

#

nor alluded to

patent aspen Jun 17, 2025, 1:16 AM

#

Yeah now that you mentioned it, I think the horse emoji is regular Flash because it's the "work horse" model

elder rapids Jun 17, 2025, 1:16 AM

#

just simply "comes later"

small haven Jun 17, 2025, 1:17 AM

#

f's

elder rapids Jun 17, 2025, 1:19 AM

#

ye but it could also be that it's previewed first tmr

#

and the info you're getting is different because while that's technically true

#

unless they don't want a preview

#

and it's just simply that dangerous

late path Jun 17, 2025, 1:22 AM

#

I think they were waiting for o3pro, but o3pro turned out to be garbage and completely not worth competing with😂

elder rapids Jun 17, 2025, 1:23 AM

#

o3 pro isn't garbage imo

#

do you think deepthink will be SOTA

late path Jun 17, 2025, 1:24 AM

#

well at least it falls far short of the expected o1 to o1pro leap

elder rapids Jun 17, 2025, 1:24 AM

#

late path well at least it falls far short of the expected o1 to o1pro leap

ye

whole wagon Jun 17, 2025, 1:24 AM

#

Does an ultra exist

elder rapids Jun 17, 2025, 1:24 AM

#

ye

whole wagon Jun 17, 2025, 1:24 AM

#

Is that what this blacktooth is lmao

#

There's no way it can be deep think, it responds way too fast

elder rapids Jun 17, 2025, 1:25 AM

#

whole wagon Is that what this blacktooth is lmao

nah I don't think so

#

if they never planned to release it, they would've likely never put it in the arena

whole wagon Jun 17, 2025, 1:25 AM

#

I think it's coming ngl

#

Sundar said before they were considering releasing the ultra models

cedar tide Jun 17, 2025, 1:26 AM

#

small haven Are you a female

Are you a chocolatine?

elder rapids Jun 17, 2025, 1:26 AM

#

whole wagon Sundar said before they were considering releasing the ultra models

and an ultra would be, by the CEO's mouth, inefficient and overall not something they'd want to serve

small haven Jun 17, 2025, 1:26 AM

#

cedar tide Are you a chocolatine?

no, that's asura

elder rapids Jun 17, 2025, 1:26 AM

#

via "next generation performance would already close that gap"

whole wagon Jun 17, 2025, 1:26 AM

#

Up until a point

#

Eventually the size will just win

elder rapids Jun 17, 2025, 1:27 AM

#

whole wagon Eventually the size will just win

sure but if that's not the reality, then there's no point in speculating

#

true, pro could just be getting an upgrade and therefore that's blacktooth

#

but not quite ultra size

#

ye and also it's not far fetched to say that, they could be coming up with some really good efficiency innovation

#

hope they don't want profit over retention

#

benefits me more that they make things cheap

small haven Jun 17, 2025, 1:38 AM

#

off topic, but will google ever create a claude code equivalent (gemini code assistant/jules don't compare), but with gemini models? that's where they would rlly win against anthropic imo

elder rapids Jun 17, 2025, 1:39 AM

#

guess so but they could see AI as something that has nothing to do with profit or retention, and or just restrict chat AI access and go fully into subtle AI

#

where it's still beneficial for them, but removes the hope people have so when prior they could say "no profit over retention please"

#

ye but that just sidesteps the question I was asking lmao

#

"early"? if this is a distinction then it concedes user retention

small haven Jun 17, 2025, 1:45 AM

#

especially when its cli/terminal bound

jade egret Jun 17, 2025, 1:46 AM

#

what the best for coding rn

elder rapids Jun 17, 2025, 1:46 AM

#

there is no "early" or "for a time"

small haven Jun 17, 2025, 1:46 AM

#

jade egret what the best for coding rn

opus 4 unfortunately

elder rapids Jun 17, 2025, 1:46 AM

#

if it's not perpetually free then it isn't user retention in AI, simple

jade egret Jun 17, 2025, 1:48 AM

#

small haven opus 4 unfortunately

but i already spend all my uses ; (

#

2nd best?

#

fr?

#

yooo w

#

i can use 4.5 then

elder rapids Jun 17, 2025, 1:49 AM

#

jade egret 2nd best?

best for coding imo is 2.5 pro but by virtue of his assertion, then 2.5 pro still

elder rapids Jun 17, 2025, 1:49 AM

#

jade egret fr?

no

jade egret Jun 17, 2025, 1:49 AM

#

hm

elder rapids Jun 17, 2025, 1:49 AM

#

Craig just says stuff

small haven Jun 17, 2025, 1:49 AM

#

for coding? nope

elder rapids Jun 17, 2025, 1:49 AM

#

don't look at what he says

jade egret Jun 17, 2025, 1:49 AM

#

ima test both out?

#

ima try both out

elder rapids Jun 17, 2025, 1:49 AM

#

jade egret ima test both out?

nah don't, 4.5 isn't a coder at all

jade egret Jun 17, 2025, 1:50 AM

#

elder rapids nah don't, 4.5 isn't a coder at all

oh

elder rapids Jun 17, 2025, 1:50 AM

#

you'd be wasting your time trying to find a middle ground

#

he's just saying random shi

small haven Jun 17, 2025, 1:50 AM

#

u could have at least said o3/o3 pro

elder rapids Jun 17, 2025, 1:50 AM

#

deadass

jade egret Jun 17, 2025, 1:50 AM

#

small haven u could have at least said o3/o3 pro

but

#

i have a question

#

whenever i use o3

#

it doesn't even put code in the canvas

jade egret Jun 17, 2025, 1:51 AM

#

jade egret it doesn't even put code in the canvas

for some reason

#

: 0

elder rapids Jun 17, 2025, 1:51 AM

#

jade egret it doesn't even put code in the canvas

don't worry about it

#

it's simple, don't use o3, gpt 4.5, 4o

#

for coding

#

cool 90% of people would agree

#

there's a sentiment behind asking them not to monetize or concern themselves with major profit

#

that's why the initial question was asked at all

#

yo, id rather use 4o mini

#

😭

elder rapids Jun 17, 2025, 1:54 AM

#

elder rapids hope they don't want profit over retention

^

#

retention includes a lot of things implicitly, I don't really care about how they view it

#

or their philosophy, or their definition of retention

small haven Jun 17, 2025, 1:54 AM

#

100m users organically with/without forced google integrations? 😂

elder rapids Jun 17, 2025, 1:55 AM

#

small haven 100m users organically with/without forced google integrations? 😂

this is already the case

#

with forced Google integration it would prob be close to a billion

#

lol

#

define natural acquisition

#

yo what does that even mean

#

yo

#

you know that's contradictory

#

😭

#

bro not even trying

jade egret Jun 17, 2025, 1:57 AM

#

i dont think 4.5 worked

small haven Jun 17, 2025, 1:57 AM

#

ok that was funny

jade egret Jun 17, 2025, 1:57 AM

#

jade egret i dont think 4.5 worked

help

elder rapids Jun 17, 2025, 1:57 AM

#

cool an outlier on the biggest stage where by your definition would be unnatural

small haven Jun 17, 2025, 1:58 AM

#

ive been using less of oai models fwiw

elder rapids Jun 17, 2025, 1:58 AM

#

do you not think this possibly alludes to other things

#

like maybe the ability to generate other ads

#

you can simply not count them yourself

#

the numbers are there brochacho

#

great time to use AI

#

ngl

small haven Jun 17, 2025, 2:01 AM

#

whats ur risk appetite

elder rapids Jun 17, 2025, 2:01 AM

#

no reason for that to be valid if we have no idea where openAI stands currently since its visible growth was around a year or two ago, and Google might start going uphill given their innovations

#

like with that alphadolphin bs

jade egret Jun 17, 2025, 2:02 AM

#

what that

small haven Jun 17, 2025, 2:02 AM

#

i mean obv oai then

jade egret Jun 17, 2025, 2:02 AM

#

use gpt 4.5 for prompt?

elder rapids Jun 17, 2025, 2:02 AM

#

jade egret use gpt 4.5 for prompt?

no

small haven Jun 17, 2025, 2:02 AM

#

google as a company is alrdy mature

jade egret Jun 17, 2025, 2:02 AM

#

elder rapids no

wut does it mean

small haven Jun 17, 2025, 2:02 AM

#

so deepmind isolated?

#

deepmind > oai

elder rapids Jun 17, 2025, 2:03 AM

#

small haven google as a company is alrdy mature

different now

elder rapids Jun 17, 2025, 2:03 AM

#

small haven so deepmind isolated?

oh fr?

#

then DeepMind

#

easy

#

lmao

#

dude just trapped himself

#

😭

small haven Jun 17, 2025, 2:03 AM

#

lol

#

thought experiment -> deepmind

elder rapids Jun 17, 2025, 2:03 AM

#

😭

small haven Jun 17, 2025, 2:04 AM

#

in the ai scope, deepmind is considered infant

jade egret Jun 17, 2025, 2:04 AM

#

jade egret wut does it mean

^ pls help

small haven Jun 17, 2025, 2:04 AM

#

oai is mature

#

relatively

elder rapids Jun 17, 2025, 2:04 AM

#

small haven in the ai scope, deepmind is considered infant

yep, very infant

#

a lot younger than openAI

jade egret Jun 17, 2025, 2:04 AM

#

oh

#

it actually follows that?

#

dang

#

alr brb

elder rapids Jun 17, 2025, 2:04 AM

#

jade egret wut does it mean

ignore Craig completely

zinc ore Jun 17, 2025, 2:04 AM

#

small haven in the ai scope, deepmind is considered infant

Huh

small haven Jun 17, 2025, 2:05 AM

#

elder rapids yep, very infant

i meant as in valuation lol

#

500b

whole wagon Jun 17, 2025, 2:05 AM

#

It would be crashing rn

zinc ore Jun 17, 2025, 2:05 AM

#

I'd guess 80% but slowly going down

whole wagon Jun 17, 2025, 2:05 AM

#

small haven 500b

At its peak maybe

small haven Jun 17, 2025, 2:06 AM

#

whole wagon At its peak maybe

private equity

whole wagon Jun 17, 2025, 2:06 AM

#

These days the landscape is far different

small haven Jun 17, 2025, 2:06 AM

#

so prolly even $1b at ipo, i wouldnt be surprised

elder rapids Jun 17, 2025, 2:06 AM

#

small haven i meant as in valuation lol

what does that have to do with what I said lol

small haven Jun 17, 2025, 2:07 AM

#

elder rapids what does that have to do with what I said lol

deepmind is older than oai, but not as infant in that regard

whole wagon Jun 17, 2025, 2:08 AM

#

Bro has to resort to mind share now 💀

small haven Jun 17, 2025, 2:09 AM

#

its obviously >80%

elder rapids Jun 17, 2025, 2:12 AM

#

small haven deepmind is older than oai, but not as infant in that regard

holy retardo, if that's what you meant then you're wrong lmao, you mentioned maturity then narrowed it down to DeepMind in specific, and then narrowed it down more to AI, which can ONLY mean productization and that's why I agreed

#

all centered on the fact we were talking about implicit growth

#

it's getting old Craig

jade egret Jun 17, 2025, 2:15 AM

#

guys

#

plz tell me

#

is there any model that is the same as claude 4 opus

small haven Jun 17, 2025, 2:16 AM

#

elder rapids holy retardo, if that's what you meant then you're wrong lmao, you mentioned mat...

so what are u trying to accomplish/prove here? deepmind age in aggregate, as a research comp, push to market? i'm talking about in aggregate, deepmind is founded earlier than oai, and thats what i meant about not as young as oai.

elder rapids Jun 17, 2025, 2:16 AM

#

jade egret is there any model that is the same as claude 4 opus

sonnet 4

jade egret Jun 17, 2025, 2:16 AM

#

elder rapids sonnet 4

welp

#

i ran out budget for that too

#

well

#

i can try

#

brb

patent aspen Jun 17, 2025, 2:17 AM

#

What if OAI just loses?

#

Seriously

jade egret Jun 17, 2025, 2:18 AM

#

why ; (

jade egret Jun 17, 2025, 2:18 AM

#

patent aspen What if OAI just loses?

then deepmind very happy (:

#

:0

#

i want google win (:

patent aspen Jun 17, 2025, 2:19 AM

#

Me too orange me too

small haven Jun 17, 2025, 2:20 AM

#

patent aspen Seriously

then google buys it out, *its staff

elder rapids Jun 17, 2025, 2:21 AM

#

small haven so what are u trying to accomplish/prove here? deepmind age in aggregate, as a r...

I don't get what you mean lol, that's completely removed from context, DeepMind is way older than openAI so that's obviously not what I mean myself by age. The ONLY way it could possibly be an infant compared to openAI is productization, which is also the only thing relevant in context as well given the dichotomy presented by Craig lmao

small haven Jun 17, 2025, 2:22 AM

#

elder rapids I don't get what you mean lol, that's completely removed from context, DeepMind ...

yes obviously, gemini is younger, im agreeing to that..

jade egret Jun 17, 2025, 2:23 AM

#

guys

#

how about deepseek r1 for coding

#

that seems like the only one works idk why

small haven Jun 17, 2025, 2:24 AM

#

theres a lot of markets, craig's market is polymarket

#

yea or the latter

elder rapids Jun 17, 2025, 2:26 AM

#

small haven yes obviously, gemini is younger, im agreeing to that..

dawg you know that's not retrospective framing on my part either, if what I meant was nothing else but that, and that was what you meant too, are you wrong or am I wrong

#

for asking for clarity

jade egret Jun 17, 2025, 2:26 AM

#

deepseek server busy..

jade egret Jun 17, 2025, 2:26 AM

#

jade egret deepseek server busy..

im bad at spelling ik

#

wait i spelt it right

#

:0

small haven Jun 17, 2025, 2:27 AM

#

elder rapids dawg you know that's not retrospective framing on my part either, if what I mean...

yea my bad for not being clear about it, my point is if deepmind is isolated from google, then its an obvious pick to invest "$10m" on deepmind vs. oai...

patent aspen Jun 17, 2025, 2:27 AM

#

@deep adder I think you tend to decide what you want to be true and then ignore everything that doesn't conform to that

elder rapids Jun 17, 2025, 2:28 AM

#

small haven yea my bad for not being clear about it, my point is if deepmind is isolated fro...

ye

#

we agreed

elder rapids Jun 17, 2025, 2:28 AM

#

patent aspen @deep adder I think you tend to decide what you want to be true and th...

Craig is always right

#

it doesn't have to be a stop sign if he can't read

#

get with the program brochacho

jade egret Jun 17, 2025, 2:30 AM

#

guys..

Gemini didn't work
chatGPT didn't work
I ran out of claude
DeepSeek server busy
I don't think grok is gonna work

what should i try next (for coding, pygame for exact)

small haven Jun 17, 2025, 2:30 AM

#

craig is funny imma give him that

lapis light Jun 17, 2025, 2:31 AM

#

I personally think, as soon as Google starts tackling personalization, OpenAI is cooked

#

At least, that's what I want to see happen.

patent aspen Jun 17, 2025, 2:32 AM

#

I would add one caveat that it's hard for DeepMind to exist in a vacuum. The computing power, supporting teams, data from products, etc all play a role. If we assume they retain those advantages, I would definitely bet on DeepMind

elder rapids Jun 17, 2025, 2:33 AM

#

demis and sundar is a good combo tbh

#

sundar seems bright

patent aspen Jun 17, 2025, 2:34 AM

#

Granted OAI can't exist in a vacuum either. They need Microsoft to bankroll them to remain competitive

#

I doubt they want to bite the hand that feeds them

small haven Jun 17, 2025, 2:46 AM

#

but does msft own 49 or 51% of oai?

patent aspen Jun 17, 2025, 2:47 AM

#

small haven but does msft own 49 or 51% of oai?

I think OAI would probably issue new shares when raising money to avoid losing voting control

#

Did OAI actually sue Microsoft?

small haven Jun 17, 2025, 2:50 AM

#

https://techcrunch.com/2025/06/16/the-cracks-in-the-openai-microsoft-relationship-are-reportedly-widening/

TechCrunch

Maxwell Zeff

The cracks in the OpenAI-Microsoft relationship are reportedly wide...

OpenAI is reportedly considering accusing Microsoft, its largest backer, of anticompetitive behavior throughout their partnership.

patent aspen Jun 17, 2025, 2:51 AM

#

Right I knew about that, but I think it's more like OAI is doing the bare minimum where Microsoft will still let them use their cloud at a discount

small haven Jun 17, 2025, 2:52 AM

#

mainly started when oai asked to allocate some gcp servers

patent aspen Jun 17, 2025, 2:52 AM

#

I guess OAI will start paying GCP to have some negotiation leverage

patent aspen Jun 17, 2025, 2:53 AM

#

small haven mainly started when oai asked to allocate some gcp servers

The rift definitely started earlier than that

small haven Jun 17, 2025, 2:53 AM

#

patent aspen The rift definitely started earlier than that

oh yea, but this is literally nail in the coffin

small haven Jun 17, 2025, 2:54 AM

#

patent aspen I guess OAI will start paying GCP to have some negotiation leverage

i wonder if they will get relatively the same discount considering theyre a big volume client

patent aspen Jun 17, 2025, 2:54 AM

#

small haven oh yea, but this is literally nail in the coffin

Maybe. I think if Microsoft stops providing any servers for a discount, that would truly be the nail in the coffin

small haven Jun 17, 2025, 2:55 AM

#

google is seeping into oai ecosystem, what u gonna do about it craig

patent aspen Jun 17, 2025, 2:55 AM

#

Then their losses for 2025 would likely be closer to $18B

small haven Jun 17, 2025, 2:56 AM

#

holy

patent aspen Jun 17, 2025, 2:57 AM

#

I think OAI lost around $8B in 2024 and was on pace to lose $14B in 2025 before all of this Microsoft stuff

small haven Jun 17, 2025, 2:58 AM

#

just another funding round, no biggie

hollow ocean Jun 17, 2025, 2:58 AM

#

Will deep think dethrone o3 pro tmr?

#

Why not

small haven Jun 17, 2025, 2:59 AM

#

@patent aspen have u tried o3 pro, how does it compare to dt

hollow ocean Jun 17, 2025, 3:00 AM

#

What’s coming tmr

small haven Jun 17, 2025, 3:00 AM

#

hmm damn

hollow ocean Jun 17, 2025, 3:00 AM

#

Will pro GA be good

small haven Jun 17, 2025, 3:01 AM

#

isnt it literally 0605

hollow ocean Jun 17, 2025, 3:01 AM

#

I think so

#

So same model

small haven Jun 17, 2025, 3:02 AM

#

preview to generally available 🤷

#

oh

#

craigbenched

patent aspen Jun 17, 2025, 3:09 AM

#

What if @deep adder was actually Craig Federighi?

small haven Jun 17, 2025, 3:09 AM

#

can i have an autograph

patent aspen Jun 17, 2025, 3:11 AM

#

ngl I think the real Craig Federighi has the most punchable smirk

#

Like I'm not an angry person at all

small haven Jun 17, 2025, 3:12 AM

#

do u have the same agility like him when he goes down the stairs at apple hq

#

wow

#

lemme know when, so i can short

patent aspen Jun 17, 2025, 3:14 AM

#

Sometimes I think Apple is like a mustache twirling villain

#

Like when they designed their own protocol for AirPods that didn't allow third-party buds. Then they made the excuse that they couldn't allow third-party buds because it wouldn't be secure, with the protocol that they designed

#

And they will never have an opportunity to try until the EU gets tired of their anti-competitive practices

#

Like with lightning cables

#

small haven Jun 17, 2025, 3:20 AM

#

oh naw @deep adder how do u respond to this

topaz edge Jun 17, 2025, 3:20 AM

#

had to check why i blocked that dude

#

looks like it was justified

#

#

#

guess they should focus less on design and more on hardware

#

lol

#

back to the block list

small haven Jun 17, 2025, 3:23 AM

#

wait what

#

craig is also getting hate from other servers 😭

#

that must be some sort of achievement

topaz edge Jun 17, 2025, 3:23 AM

#

mods should ban him

small haven Jun 17, 2025, 3:24 AM

#

tbh i enjoy his company

topaz edge Jun 17, 2025, 3:26 AM

#

why

small haven Jun 17, 2025, 3:27 AM

#

hes just funny, make it less stale

#

im not taking it serious

topaz edge Jun 17, 2025, 3:27 AM

#

funny is an interesting way to describe it

prisma bison Jun 17, 2025, 3:48 AM

#

Hey folks

echo aurora Jun 17, 2025, 3:50 AM

#

howdy ablobwave

topaz edge Jun 17, 2025, 3:57 AM

#

elder rapids Jun 17, 2025, 4:03 AM

#

patent aspen

holy 😭😭

#

insane response

dusky aurora Jun 17, 2025, 6:30 AM

#

LMArena is my only way of stress relief these days

#

after yet another missile in my city this morning, my mind wants to rest

civic flame Jun 17, 2025, 6:40 AM

#

☹️

meager harbor Jun 17, 2025, 6:50 AM

#

#

This is AGI

#

Seriously I laugh when i see people saying AI will replace 20% of jobs in the near future

fleet lintel Jun 17, 2025, 7:03 AM

#

Will I be able to use deep think Gemini without the ultra subscription??

torn mantle Jun 17, 2025, 7:17 AM

#

fleet lintel Will I be able to use deep think Gemini without the ultra subscription??

Will i be able to get a decent job if i don't study?

elder rapids Jun 17, 2025, 7:45 AM

#

meager harbor

too trained on the classical riddle or simply decided to forgive the possible mistake the riddle gave lol, models are like that and the thinking process would probably point that out, but since it likes brevity it'll just spit out the total corrected answer

#

which includes the model forgiveness factor

whole wagon Jun 17, 2025, 7:48 AM

#

meager harbor

Go to people on the street and ask them this question. Report back what % gets it right

#

Human intelligence is so overestimated

#

I tried asking some basic times tables to some people and they messed them all up

#

Meanwhile LLMs can multiply two 8 digit numbers with no tools easily lol

#

Sometimes I get the feeling AGI is already here. And it's just not that world changing is all

#

Maybe when simple bench falls. That will be the marker

drifting thorn Jun 17, 2025, 7:55 AM

#

Need a WeChat account

whole wagon Jun 17, 2025, 7:58 AM

#

Google cooking like crazy ATM

#

Like bruh I thought every 3 months was fast now they releasing a new batch of models and it's not even been 2 weeks since the last

drifting thorn Jun 17, 2025, 7:59 AM

#

Thanks to the integration of AI departments of Google

whole wagon Jun 17, 2025, 7:59 AM

#

Meanwhile over 2 years between chatGPT 4 (march 2023) and upcoming chatGPT 5 in July

#

This cadence won't work now

#

With Google on the scene

drifting thorn Jun 17, 2025, 8:00 AM

#

Deepmind is a very strong team

#

And I guess the next step for Google is combining V-JEPA 2, Veo, Imagen and LLM to create the base of AGI

whole wagon Jun 17, 2025, 8:02 AM

#

Google says they don't think the LLM arch can reach AGI

#

Probably they have something else developing for that

#

Imagine Google drops Gemini 3 same time as gpt5

#

That would be so brutal kek

leaden sun Jun 17, 2025, 8:07 AM

#

whole wagon Google says they don't think the LLM arch can reach AGI

no llm can reach AGI alone

indigo hazel Jun 17, 2025, 8:08 AM

#

and at the same time deepseek r2 lmao

whole wagon Jun 17, 2025, 8:13 AM

#

Imagine if human reasoning really isn't special and a LLM is enough to reach it

#

Dunno whether that would be good or bad lol

#

It seems increasingly likely as the days pass

leaden sun Jun 17, 2025, 8:18 AM

#

whole wagon Imagine if human reasoning really isn't special and a LLM is enough to reach it

am afraid if human reasoning wasn't special, we wouldn't have survived til today as a species 😅

drifting thorn Jun 17, 2025, 8:22 AM

#

V-JEPA 2 is a world model

#

And it can predict future events well, much better than current video-generating AIs

#

I just asked Skywork to do a deep research on this topic.

📎 vjepa2_agi_research_report.docx

elder rapids Jun 17, 2025, 8:32 AM

#

whole wagon Go to people on the street and ask them this question. Report back what % gets i...

pretty sure 100% of people are getting it right, and if they're somehow not, it's not an intelligence problem, they just don't have the information in their face. they heard it once and now have to answer, so "the surgeon who's the boy's father" could've just never registered

#

this is not by any means a simple bench esque problem either

#

so many problems that aren't "solved" by these AI is because there's a level of forgiveness because it has the necessary "truth" in their training data

#

whether this is a flaw is random

leaden sun Jun 17, 2025, 9:11 AM

#

whole wagon I tried asking some basic times tables to some people and they messed them all u...

was it done statistically using huge random sample groups (>=1000 people, this is the sample size that is common in clinical trials for testing new medication for example, the smallest being 100, you need at least 10 to test new cosmetics products sigh)

or you just went on street somewhere randomly and asked random people passing by? here in my place, if i do that, everyone will solve correctly, but i need to admit my town is famous for academics 😆
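To put rough numbers on the sample-size point, here is a minimal sketch. It assumes a simple random sample and the normal approximation for a proportion; the 95% z-value of 1.96 and the worst-case p = 0.5 are assumptions of this sketch, not something stated in the thread.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from n random samples.

    Assumes simple random sampling and the normal approximation;
    p=0.5 is the worst case (widest interval).
    """
    return z * math.sqrt(p * (1 - p) / n)

# Sample sizes mentioned above: 10, 100 and 1000 people.
for n in (10, 100, 1000):
    print(f"n={n}: +/- {margin_of_error(n) * 100:.1f} percentage points")
```

Under those assumptions, asking 10 random passers-by leaves an uncertainty of roughly 31 percentage points, 100 people about 10, and 1000 people about 3, which is why a handful of street answers says little about the general population.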

ocean vortex Jun 17, 2025, 9:29 AM

#

meager harbor Seriously I laugh when i see people saying AI will replace 20% of jobs in the ne...

lmao. Yeah this is a classic example of changing the riddle it was trained for just marginally to change the whole meaning but the model stays with the same answer. Used to be a bigger problem in the past, kinda surprised o3-pro fell for it. 4.5 and Opus would probably answer this correct

#

yeah Opus does get it right. Seems that I overestimated 4.5 though since that doesn't...

alpine coral Jun 17, 2025, 9:40 AM

#

elder rapids whether this is a flaw is random

i dont think totally random - the more well known the 'truth' (or in this case, the riddle) is, the more likely it is to result in an overfitted, wrong answer to a slight variation of the question asked related to it (and designed to get an alternative response as the solution)

#

with yeah assumed typos etc on the part of the user as the rationale

#

the son mother doctor setup is as 'classic' as they come imo

#

o3 gets it wrong, unless instructed to interpret the question literally (funnily, its reasoning summaries indicate that the CoT was entirely wrong.. but ig the model corrected for the actual answer)

#

o3-pro, when told to interpret it literally (i.e. not assume typos on the part of the user or that it is a failed attempt at asking the 'classic' version of the question) also gets it 'right' (but not really ofc, as Dom points out)

#

oh that's o1-pro..whoops

alpine coral Jun 17, 2025, 9:52 AM

#

ocean vortex yeah Opus does get it right. Seems that I overestimated 4.5 though since that do...

this is pretty good from opus (it hedges and gives both answers, but not unreasonably)

ocean vortex Jun 17, 2025, 9:53 AM

#

alpine coral o3-pro, when told to interpret it literally (i.e. not assume typos on the part o...

you can't tell it to take literally or not, that's a huge hint

#

I tried this on 2.5Pro, no go, it failed as well catgrin

ocean vortex Jun 17, 2025, 9:57 AM

#

elder rapids pretty sure 100% of people are getting it right, and if they're somehow not, it'...

it's either a model capacity problem (not big enough to avoid falling back to the more common interpretation), or it was overfitted on the original riddle. Neither cause is very desirable tbh

#

fundamentally it means the model is less flexible

alpine coral Jun 17, 2025, 10:02 AM

#

ocean vortex you can't tell it to take literally or not, that's a huge hint

yeah obviously (i'm not saying they get it 'right') - demonstrating the point that they don't interpret it literally, unless told to, and instead assume it's meant to be the 'classic' formulation, and the user made a typo or something, and then go on to spit out the classic but wrong answer

ocean vortex Jun 17, 2025, 10:11 AM

#

alpine coral yeah obviously (i'm not saying they get it 'right') - demonstrating the point th...

I think that is hard to say definitively though. Models will often arrive at the answer one way and then "justify it" another. Like if it was overfitted it could still output reasoning making assumptions that the user made a mistake or whatever

alpine coral Jun 17, 2025, 10:13 AM

#

yeah tho look at o3's ~~response~~ reasoning summary above; it doesn't say typo, but it says "the user's phrasing is a bit off", which to my mind is basically the same thing no?

ocean vortex Jun 17, 2025, 10:16 AM

#

alpine coral yeah tho look at o3's ~~response~~ reasoning summary above; it doesn't say typo,...

the issue is that it still settles on the wrong answer with no disclaimers or caveats of any kind. You can not do that. Something swayed it from disregarding that initial reasoning path to the point of ignoring it completely lol

#

could still be just the lack of capacity tbh. You can see reasoning helps it, but still not enough and it goes with the easy answer the path of least resistance eventually

#

the fact that o3-pro gets the original wording wrong, would suggest that virtually no attempts of o3 get it right. They probably only contemplate it in reasoning but I'm not sure reasoning traces are even considered with parallel compute for response ranking - would make it much more expensive
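A toy sketch of that argument, assuming (and this is only an assumption, nobody here knows how o3-pro actually aggregates its parallel attempts) that the final response is picked by a plain majority vote over final answers, with reasoning traces ignored. The sample counts are made up for illustration.

```python
from collections import Counter

def majority_vote(final_answers):
    """Return the most common final answer and its share of the samples."""
    counts = Counter(final_answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(final_answers)

# Hypothetical samples: 8 of 10 attempts fall back to the classic answer,
# even if some of them briefly considered "father" while reasoning.
samples = ["mother"] * 8 + ["father"] * 2
winner, share = majority_vote(samples)
print(winner, share)  # mother 0.8
```

Under that assumption, the aggregated answer can only be right if the correct answer already wins among the individual attempts, so a wrong aggregated answer hints that most single attempts were wrong too.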

#

I think the bottom line is, if the model is to assume the user made a typo or intended to say something else, at the very least that should be in the final response when giving an answer for alternative interpretation

#

If it assumes things wrong (not how it was written) and then just provides you with a concise simple answer - that's automatically incorrect, in my book.

alpine coral Jun 17, 2025, 10:27 AM

#

well i mean this is why i like riddles ahah

ocean vortex Jun 17, 2025, 10:27 AM

#

regardless of the reasoning even

alpine coral Jun 17, 2025, 10:27 AM

#

or twists on riddles.. they trip LLMs up

#

(as was mentioned before tho, in a way that might trip up many humans too, like pretty easy to gloss over the phrasing of the question and assume it's the 'classic' and the answer is the mother.. but kinda beside the point to the discussion.. nvm / carry on aha)

alpine coral Jun 17, 2025, 10:32 AM

#

ocean vortex the issue is that it still settles on the wrong answer with no disclaimers or ca...

but yeah, opus' response was good because it did provide those caveats etc - i do hear your point, and agree

ocean vortex Jun 17, 2025, 10:34 AM

#

yeah.. the reason I even tested it is because from my experience bigger models are considerably better at things like this. They are somewhat less likely to assume things that sound similar and already exist in training data but lead to different outcome 👀

fleet lintel Jun 17, 2025, 10:51 AM

#

I am excited about deep think models. Is there any insider or trusted tester info out on Gemini Deep think models? .. @patent aspen

hollow ocean Jun 17, 2025, 10:51 AM

#

getting ultra as soon as its out

#

can't wait for sota reasoning

fleet lintel Jun 17, 2025, 10:53 AM

#

I am not paying 3000$ per year yet but might be able to get shared access from time to time

hollow ocean Jun 17, 2025, 10:54 AM

#

fleet lintel I am not paying 3000$ per year yet but might be able to get share access from ti...

its worth it

fleet lintel Jun 17, 2025, 10:55 AM

#

3000 $ is not a small amount. I can afford it but I'd rather not unless it's necessary.

#

Honestly, my company should provide me an ultra subscription.

alpine coral Jun 17, 2025, 11:37 AM

#

whole wagon Maybe when simple bench falls. That will be the marker

i really like simple bench. when it becomes impossible to create new, more challenging 'simple' (for human, hard for AI) tests, i feel like that might be some kind of marker

#

was reading through this from ARC the other day, thought this was on the money:

You'll know AGI is here when the exercise of creating tasks that are easy for regular humans but hard for AI becomes simply impossible.

#

lol yeah prob tbf ha

#

but i do think for 'general' intelligence, some expectation that you can give it a basic af question or riddle, that the overwhelming majority of literate humans would get right, and it too gets them right, is reasonable

#

anyway.. not a can of worms i wanted to open.. let's move on ha

#

AGI is perhaps my most disliked term.. it means [so many different things to different people that it basically means] nothing

golden ocean Jun 17, 2025, 11:41 AM

#

i would like to open that can of worms

alpine coral Jun 17, 2025, 11:41 AM

#

go on 😉

willow grail Jun 17, 2025, 11:46 AM

#

only real wo/men use claude max/code

sacred plaza Jun 17, 2025, 12:11 PM

#

fleet lintel I am excited about deep think models. Is there any insider or trusted tester inf...

Deep think seems like it is just a graph of thoughts prompting technique so you can get it now lol via manual requests to Gemini models

#

The Claude code running for 7 hours was a bit odd. What the hell was it doing for 7 hours??? Was it just stuck accessing a database for that long haha

unborn ocean Jun 17, 2025, 12:15 PM

#

sacred plaza Deep think seems like it is just a graph of thoughts prompting technique so you c...

I am sure they will be doing RL / SFT on top of that

leaden sun Jun 17, 2025, 12:16 PM

#

alpine coral o3 gets it wrong, unless instructed to interpret the question literally (funnily...

I simply added a ":" in that version of the riddle and most llms got it right suddenly, sadly chatgpt still got it wrong 😔

unborn ocean Jun 17, 2025, 12:16 PM

#

Or I hope, bc otherwise it might be a bit boring

alpine coral Jun 17, 2025, 12:17 PM

#

leaden sun I simply added a ":" in that version of the riddle and most llms got it right s...

oh nice! that's interesting - did you literally just prepend or append a colon to it?

leaden sun Jun 17, 2025, 12:19 PM

#

alpine coral oh nice! that's interesting - did you literally just prepend or append a colon t...

The surgeon, who is the boy's father, says*:*"I cannot operate on this boy, he's my son." Who is the surgeon to the boy?

#

the colon somehow helped make clear that the sentence in quotation marks is what the surgeon literally said, hence explicitly stating he's the father. but:

alpine coral Jun 17, 2025, 12:21 PM

#

ahh i see

#

thanks for clarifying!

#

yeah that's kinda interesting ey

leaden sun Jun 17, 2025, 12:22 PM

#

Bildschirmfoto_2025-06-17_um_14.21.33.png

alpine coral Jun 17, 2025, 12:22 PM

#

oh love that..

(assuming the twist and traditional riddle intent):

#

which model?

leaden sun Jun 17, 2025, 12:23 PM

#

dear chat, whats going on with you 😔

#

4o the latest

alpine coral Jun 17, 2025, 12:23 PM

#

leaden sun dear chat, whats going on with you 😔

wdym

leaden sun Jun 17, 2025, 12:24 PM

#

alpine coral wdym

i meant chatgpt

alpine coral Jun 17, 2025, 12:24 PM

#

ahh gotcha gotcha

leaden sun Jun 17, 2025, 12:26 PM

#

the punctuation does matter, i knew it right away

alpine coral Jun 17, 2025, 12:28 PM

#

leaden sun

i dont get the way it starts with "Ah, I see now!" - it's like a follow-up or from reasoning?

leaden sun Jun 17, 2025, 12:29 PM

#

i used the version from above first, most got it wrong, then I added the colon, suddenly, it's clear for most but not all

alpine coral Jun 17, 2025, 12:30 PM

#

i cant reproduce with 4o after adding the colon

brittle tiger Jun 17, 2025, 12:30 PM

#

https://x.com/OfficialLoganK/status/1934767666239004994?t=2iIud5gAUdwJebqpQRAZMw&s=19

Flash lite preview, deep think, 2.5 Pro GA

This what we think?

Logan Kilpatrick (@OfficialLoganK)

⚡️, 💪, 🐎

keen beacon Jun 17, 2025, 12:30 PM

#

fast, strong, workhorse

#

i guess

alpine coral Jun 17, 2025, 12:31 PM

#

i like how we're basically reading tarot cards now

#

kidding aha

leaden sun Jun 17, 2025, 12:31 PM

#

alpine coral i cant reproduce with 4o after adding the colon

strange...

alpine coral Jun 17, 2025, 12:31 PM

#

i agree.. his tweets are worth something.. and that interpretation of the emojis makes sense ig

leaden sun Jun 17, 2025, 12:32 PM

#

i use the battle mode, so the previous rounds might have influenced it

ornate agate Jun 17, 2025, 12:32 PM

#

lightning bolt = flash; arm = pro; horse = fast = flashlite

leaden sun Jun 17, 2025, 12:37 PM

#

leaden sun i used the version from above first, most got it wrong, then I added the colon, ...

i take back this statement, the previous round might indeed have influenced it, so I started with new chat every time, most still got it wrong despite the colon 😅

willow grail Jun 17, 2025, 12:44 PM

#

while u cope and hope of free lmarena models...

#

only real wo/men use claude max/code for swe.

#

🙂

willow grail Jun 17, 2025, 12:46 PM

#

brittle tiger https://x.com/OfficialLoganK/status/1934767666239004994?t=2iIud5gAUdwJebqpQRAZMw...

how boring. if u stop dreaming and wanna be actually productive u do cc/cm

#

/s

indigo hazel Jun 17, 2025, 12:54 PM

#

patent aspen Jun 17, 2025, 12:54 PM

#

The horse is regular Flash. They've said dozens of times that it's the "workhorse model".

alpine coral Jun 17, 2025, 12:54 PM

#

that does ring true

willow grail Jun 17, 2025, 12:55 PM

#

patent aspen The horse is regular Flash. They've said dozens of times that it's the "workhors...

so still no affordable deving with gemini 2.5 so sad

#

go claude max/code

ocean vortex Jun 17, 2025, 2:39 PM

#

you can do your own with parallel requests

#

but it's gonna cost you 😇
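For anyone curious what "do your own with parallel requests" could look like, here is a minimal sketch: send the same prompt several times at once and keep the most common answer. This is only a guess at the idea, not how Deep Think actually works. `parallel_vote` and `make_fake_model` are hypothetical names, and the fake model stands in for a real API call so the example runs offline.

```python
import itertools
import threading
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def parallel_vote(ask: Callable[[str], str], prompt: str, n: int = 5) -> tuple[str, int]:
    """Send the same prompt n times in parallel; return (top answer, vote count)."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        answers = list(pool.map(ask, [prompt] * n))
    # Normalize so "Father" and "father " count as the same vote.
    normalized = [a.strip().lower() for a in answers]
    winner, votes = Counter(normalized).most_common(1)[0]
    return winner, votes


def make_fake_model() -> Callable[[str], str]:
    """Hypothetical stand-in for a real model call (assumption, not a real API)."""
    replies = itertools.cycle(["Father", "mother", "father "])
    lock = threading.Lock()

    def ask(prompt: str) -> str:
        # The lock keeps the shared iterator safe across worker threads.
        with lock:
            return next(replies)

    return ask


if __name__ == "__main__":
    print(parallel_vote(make_fake_model(), "Who is the surgeon to the boy?", n=5))
```

The cost side of the joke is real: n parallel samples means roughly n times the tokens billed for one request.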

keen beacon Jun 17, 2025, 2:41 PM

#

logan didn't say that?

patent aspen Jun 17, 2025, 2:44 PM

#

Maybe interpreting the emojis or 3 x gemini

jade egret Jun 17, 2025, 2:54 PM

#

hi

kind cloud Jun 17, 2025, 2:56 PM

#

on vertex

patent aspen Jun 17, 2025, 3:01 PM

#

I think it depends on what is being predicted tbh

brittle tiger Jun 17, 2025, 3:06 PM

#

Can you give an example of one and what odds look like?

indigo hazel Jun 17, 2025, 3:08 PM

#

kind cloud on vertex

hot lmao

placid charm Jun 17, 2025, 3:11 PM

#

@echo aurora any news about lmarena test garden you allowed to disclose?

patent aspen Jun 17, 2025, 3:12 PM

#

brittle tiger Can you give an example of one and what odds look like?

I'll have to take a look at it later

brittle tiger Jun 17, 2025, 3:12 PM

#

There for me too

willow grail Jun 17, 2025, 3:14 PM

#

crow collage videos https://photos.app.goo.gl/kd7TeRDdHE7YV6aq8

Google Photos

11 new videos · Monday, Jun 16 🎬

Tap to view!

willow grail Jun 17, 2025, 3:14 PM

#

brittle tiger There for me too

that isnt aistudio or

civic flame Jun 17, 2025, 3:14 PM

#

brittle tiger There for me too

still missing the 3rd

#

oh nvm no we're not

willow grail Jun 17, 2025, 3:15 PM

#

if u would need berberine u would know the answer

jade egret Jun 17, 2025, 3:15 PM

#

woah

willow grail Jun 17, 2025, 3:15 PM

#

no idea what that is

jade egret Jun 17, 2025, 3:15 PM

#

so 2.5 pro is officially on vertex studio?

willow grail Jun 17, 2025, 3:16 PM

#

ew no. i dont do drugs

#

i prefer safe streets, superb public transport system.
money if u get sick, so many holiday weeks as employee.
money if u lose job.
money if you cant work.
money if. if if if

#

SOURCE, old man?

#

oh ok

#

oh ok

civic flame Jun 17, 2025, 3:21 PM

#

okay yeah this seems to be blacktooth

#

damn

willow grail Jun 17, 2025, 3:21 PM

#

the user?

cedar tide Jun 17, 2025, 3:32 PM

#

meager harbor

vs qwen 1.5 0.5B

jade egret Jun 17, 2025, 3:33 PM

#

yea

#

but

#

blacktooth better than 2.5 pro 0605

civic flame Jun 17, 2025, 3:34 PM

#

lol no

#

this is better

cedar tide Jun 17, 2025, 3:35 PM

#

brittle tiger There for me too

on the ga version is it possible to turn off the thinking for 2.5 pro?

#

@kind cloud

keen beacon Jun 17, 2025, 3:37 PM

#

interested in that too, havent gotten the chance to play with it yet

brittle tiger Jun 17, 2025, 3:37 PM

#

cedar tide on the ga version is it possible to turn off the thinking for 2.5 pro?

I don't see that but you can set thinking budget to as low as 128 tokens.
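A rough sketch of what setting that budget might look like in a raw request body. The 128-token floor is the number from the message above; the upper bound, the helper name `build_request`, and the exact JSON field names (`generationConfig.thinkingConfig.thinkingBudget`) are assumptions based on my reading of the public Gemini REST docs, so check them before relying on this.

```python
import json

# Floor taken from the chat ("as low as 128 tokens").
MIN_BUDGET = 128
# Upper bound is an illustrative assumption, not confirmed in the chat.
MAX_BUDGET = 32768


def build_request(prompt: str, thinking_budget: int) -> dict:
    """Build a request body with a thinking budget (field names are assumed)."""
    if not MIN_BUDGET <= thinking_budget <= MAX_BUDGET:
        raise ValueError(
            f"thinking_budget must be between {MIN_BUDGET} and {MAX_BUDGET}"
        )
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_request("hello", 128), indent=2))
```

Rejecting a budget of 0 mirrors the point in the thread: on Pro you can shrink thinking but apparently not switch it off.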

keen beacon Jun 17, 2025, 3:38 PM

#

its the same as 0605 then

cedar tide Jun 17, 2025, 3:38 PM

#

brittle tiger I don't see that but you can set thinking budget to as low as 128 tokens.

thx

#

so no price reduction when you are without reasoning like on 2.5 flash 🥴

#

@brittle tiger you have the price of 2.5 flash lite ?

hybrid locust Jun 17, 2025, 3:44 PM

#

ai studio is changing

#

they're releasing them

cedar tide Jun 17, 2025, 3:45 PM

#

hybrid locust ai studio is changing

screen ?

hybrid locust Jun 17, 2025, 3:45 PM

#

#

the other models are gone

keen beacon Jun 17, 2025, 3:45 PM

#

same

cedar tide Jun 17, 2025, 3:45 PM

#

06 05 and 05 20 gone

torn mantle Jun 17, 2025, 3:46 PM

#

cedar tide 06 05 and 05 20 gone

what did you do

cedar tide Jun 17, 2025, 3:46 PM

#

torn mantle what did you do

?

keen beacon Jun 17, 2025, 3:47 PM

#

you removed those models 🥲

wintry tinsel Jun 17, 2025, 3:47 PM

#

Yo wtf

#

lol

#

My daily driver is gone

civic flame Jun 17, 2025, 3:48 PM

#

lol google are going minimalist

#

trust the process

hazy quest Jun 17, 2025, 3:49 PM

#

"Gemma 3n E4B" was this model there already?

cedar tide Jun 17, 2025, 3:49 PM

#

hazy quest "Gemma 3n E4B" was this model there already?

yes

hazy quest Jun 17, 2025, 3:49 PM

#

Whats the logic behind the name?

jade egret Jun 17, 2025, 3:53 PM

#

dang

whole wagon Jun 17, 2025, 3:53 PM

#

Chill the new ones are getting added very soon lol

jade egret Jun 17, 2025, 3:53 PM

#

whole wagon Chill the new ones are getting added very soon lol

yay

#

today?

whole wagon Jun 17, 2025, 3:53 PM

#

Yes

jade egret Jun 17, 2025, 3:53 PM

#

W

#

blacktooth right

#

hopefully better than opus 4 at coding

#

because i ran out of credit soooo fast 😭

brittle tiger Jun 17, 2025, 3:57 PM

#

cedar tide @brittle tiger you have the price of 2.5 flash lite ?

The pricing page it takes you to hasn't been updated to include flash lite yet

whole wagon Jun 17, 2025, 4:00 PM

#

They are in Google ai studio now

keen beacon Jun 17, 2025, 4:00 PM

#

yup

leaden meteor Jun 17, 2025, 4:00 PM

#

LMArena updated its leaderboard just now but 06-05 still top....where is stable 2.5pro?

whole wagon Jun 17, 2025, 4:02 PM

#

Well. They just need to change the name lol

balmy mist Jun 17, 2025, 4:02 PM

#

there is a new 2.5 pro?

keen beacon Jun 17, 2025, 4:02 PM

#

ga version is supposed to be the same as 0605

#

afaik

balmy mist Jun 17, 2025, 4:02 PM

#

ahhh

#

anyone test them yet?

whole wagon Jun 17, 2025, 4:02 PM

#

Ppl not understanding blacktooth/kingfall is part of a different model line

#

Gemini 2.5 pro is done

keen beacon Jun 17, 2025, 4:03 PM

#

no there might be more revisions of gemini 2.5 pro still

#

but blacktooth and kingfall are different

#

i agree

whole wagon Jun 17, 2025, 4:03 PM

#

There won't be more revisions

keen beacon Jun 17, 2025, 4:03 PM

#

there will be

whole wagon Jun 17, 2025, 4:03 PM

#

That's the point of the ga. Logan already said it's final release

patent aspen Jun 17, 2025, 4:04 PM

#

whole wagon That's the point of the ga. Logan already said it's final release

No there will be

whole wagon Jun 17, 2025, 4:04 PM

#

Is Gemini 3 pro not cooking kek

keen beacon Jun 17, 2025, 4:04 PM

#

things are done in parallel

#

gemini 3 probably still pretraining

whole wagon Jun 17, 2025, 4:04 PM

#

Hm

hazy quest Jun 17, 2025, 4:05 PM

#

There it is

#

On AI Studio

#

nimble trail Jun 17, 2025, 4:05 PM

#

whole wagon That's the point of the ga. Logan already said it's final release

Non-thinking update still not here tho.

#

So I think there will be revision soon ig

keen beacon Jun 17, 2025, 4:05 PM

#

yeah thats a future revision i think

#

one of the side by side ab tests had it

#

not sure if it will be that though

whole wagon Jun 17, 2025, 4:06 PM

#

IMO, a revision should go under a 2.6 name or smth

#

It's confusing to keep updating 2.5 pro

keen beacon Jun 17, 2025, 4:06 PM

#

thats kinda surprising for 2.5 pro kinda i guess

elder rapids Jun 17, 2025, 4:06 PM

#

yo

keen beacon Jun 17, 2025, 4:06 PM

#

i would think they would keep incremental updates

elder rapids Jun 17, 2025, 4:07 PM

#

it's using new formatting

balmy mist Jun 17, 2025, 4:07 PM

#

whats the difference?

Screenshot_2025-06-17_at_12.07.05_PM.png

elder rapids Jun 17, 2025, 4:07 PM

#

I thought it was just 0605

balmy mist Jun 17, 2025, 4:07 PM

#

like which one is GA? which one is better

elder rapids Jun 17, 2025, 4:07 PM

#

??

keen beacon Jun 17, 2025, 4:07 PM

#

0506 is older. 0605 is gemini 2.5 pro

#

logan said it would be the same im kinda confused

#

ga and 0605

elder rapids Jun 17, 2025, 4:07 PM

#

it's different

keen beacon Jun 17, 2025, 4:08 PM

#

are people tripping balls?

balmy mist Jun 17, 2025, 4:08 PM

#

which is deep think?

whole wagon Jun 17, 2025, 4:08 PM

#

Iirc GA is about the user experience. The model intelligence is the same

balmy mist Jun 17, 2025, 4:08 PM

#

keen beacon 0506 is older. 0605 is gemini 2.5 pro

why are the numbers so similar lol

patent aspen Jun 17, 2025, 4:09 PM

#

fwiw it's possible there may be updates other than the model itself that result in differences in performance

elder rapids Jun 17, 2025, 4:09 PM

#

patent aspen fwiw it's possible there may be updates other than the model itself that result ...

brian explain

#

it's not just 0605

#

and it's not 2m context

balmy mist Jun 17, 2025, 4:09 PM

#

ohh i see:
https://x.com/OfficialLoganK/status/1935005571016544332

Logan Kilpatrick (@OfficialLoganK)

Introducing the Gemini 2.5 model family:

- Gemini 2.5 Pro (Stable, no changes from 06-05)
- Gemini 2.5 Flash (Stable, updated pricing from 05-20)
- Gemini 2.5 Flash-Lite (Preview, small reasoning model)

More info in 🧵

elder rapids Jun 17, 2025, 4:09 PM

#

what's flash lites pricing

whole wagon Jun 17, 2025, 4:09 PM

#

Updated pricing 👀

elder rapids Jun 17, 2025, 4:10 PM

#

ye because efficiency is part of the preview

hazy quest Jun 17, 2025, 4:10 PM

#

Lmao so 2.5 Pro is 06-05, no changes

elder rapids Jun 17, 2025, 4:10 PM

#

btw it is different its using line breaks more often, like how 4o spams it

hazy quest Jun 17, 2025, 4:11 PM

#

Updated system prompt. It has been doing that earlier today already

balmy mist Jun 17, 2025, 4:11 PM

#

that new lite model is fast af

elder rapids Jun 17, 2025, 4:12 PM

#

yeah I'm averaging like 400 t/s with it

patent aspen Jun 17, 2025, 4:12 PM

#

elder rapids it's not just 0605

I don't know what to say. It is 0605. There could be differences in tools use, cascading strategies, etc

civic flame Jun 17, 2025, 4:12 PM

#

whole wagon Ppl not understanding blacktooth/kingfall is part of a different model line

yeah i dont see why blacktooth would be a new model on arena if it wasn't a new model

patent aspen Jun 17, 2025, 4:12 PM

#

Oh latency? Yeah that could definitely change without a model update

jade egret Jun 17, 2025, 4:13 PM

#

guys

#

is gemini 2.5 pro equal to gemini 2.5 pro 0605?

whole wagon Jun 17, 2025, 4:13 PM

#

Yep

jade egret Jun 17, 2025, 4:13 PM

#

bruh

#

than

#

nvm

#

so technically we didn't get new stuff

whole wagon Jun 17, 2025, 4:14 PM

#

Flash had a pricing change

keen beacon Jun 17, 2025, 4:14 PM

#

patent aspen fwiw it's possible there may be updates other than the model itself that result ...

Ic so probably inference engine changes/related changes

whole wagon Jun 17, 2025, 4:15 PM

#

New stuff doesn't drop on Tuesdays :p

echo aurora Jun 17, 2025, 4:15 PM

#

placid charm @echo aurora any news about lmarena test garden you allowed to disclose...

no new news that I can share, but know the program is moving along

jade egret Jun 17, 2025, 4:15 PM

#

whole wagon New stuff doesn't drop on Tuesdays :p

when new stuff drop ; (

patent aspen Jun 17, 2025, 4:15 PM

#

Right there's a bunch of other teams that spend all day optimizing inference that don't work on the model itself

whole wagon Jun 17, 2025, 4:16 PM

#

$0.30/$2.50 for 2.5 flash
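A quick way to turn those two numbers into a per-request estimate, reading them as USD per 1M input tokens and per 1M output tokens. The units are an assumption, since the message doesn't state them, and `request_cost` is a hypothetical helper.

```python
def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float = 0.30,   # assumed: USD per 1M input tokens
    output_price_per_m: float = 2.50,  # assumed: USD per 1M output tokens
) -> float:
    """Estimate the USD cost of one request from its token counts."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


if __name__ == "__main__":
    # 10k tokens in, 2k tokens out
    print(f"${request_cost(10_000, 2_000):.4f}")
```

With output priced at more than 8x input under this reading, long responses dominate the bill well before long prompts do.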

brittle tiger Jun 17, 2025, 4:17 PM

#

pricing

jade egret Jun 17, 2025, 4:17 PM

#

brittle tiger pricing

where'd you see that

#

plz link

brittle tiger Jun 17, 2025, 4:17 PM

#

jade egret plz link

https://x.com/_philschmid/status/1935005153444139169

Philipp Schmid (@_philschmid)

Gemini 2.5 is production ready! We just launched 3 new Gemini models with 2.5 Pro and Flash being now generally available and a new Gemini 2.5 Flash Lite preview! 🧠⚡️🔦

Here is all you need to know:
🔦 New Gemini 2.5 Flash Lite (Preview) with Thinking, 1M context, only