wintry tinsel Jun 10, 2025, 6:58 PM

#

But I don’t care either way I’m not in school no more

leaden sun Jun 10, 2025, 6:59 PM

#

I was talking about the private schools that educate future queens, kings, or PMs. They follow a different curriculum.

wintry tinsel Jun 10, 2025, 7:00 PM

#

Oh never been to one of those lol

#

Some public schools in upper class neighborhoods are nice but most are pretty lousy

jade egret Jun 10, 2025, 7:08 PM

#

so when do you guys think kingfall is gonna be out on ai studio?

narrow elbow Jun 10, 2025, 7:08 PM

#

and tell u to identify urself as a walmart bag or attack helicopter 🤪

zinc ore Jun 10, 2025, 7:11 PM

#

https://x.com/ai_for_success/status/1932499225000227295

AshutoshShrivastava (@ai_for_success)

I heard Google is planning to drop Gemini 2.5 Ultra right after OpenAI o3-pro.

keen beacon Jun 10, 2025, 7:12 PM

#

zinc ore https://x.com/ai_for_success/status/1932499225000227295

fake news

#

2.5 ultra and kingfall doesnt exist

late path Jun 10, 2025, 7:14 PM

#

zinc ore https://x.com/ai_for_success/status/1932499225000227295

Have this account's messages been reliable?

zinc ore Jun 10, 2025, 7:15 PM

#

He's actually somewhat reliable (gets early access to Google models and is close to some Google employees), but I think he's just joking here with this tweet

torn mantle Jun 10, 2025, 7:17 PM

#

he said nothing

late path Jun 10, 2025, 7:18 PM

#

saw his reply, it was indeed a joke

torn mantle Jun 10, 2025, 7:21 PM

#

https://x.com/Alice_comfy/status/1932515567887622480

Alice (e/nya)🐈‍⬛ (@Alice_comfy)

o3-pro will cost $20 per million input, probably 80 output. Right around the same as Opus.
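At the prices quoted in this tweet ($20 per million input tokens and a guessed $80 per million output tokens), the cost of one request can be sketched as below. The function name and the token counts in the example are hypothetical, and reasoning tokens are assumed to be billed as output tokens.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_m: float = 20.0,
                     output_price_per_m: float = 80.0) -> float:
    """Cost of one request from per-million-token prices (defaults: the prices quoted above)."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Hypothetical request: 2,000 prompt tokens, 15,000 reasoning + answer tokens
# (reasoning tokens assumed to be billed at the output rate).
print(f"${request_cost_usd(2_000, 15_000):.2f}")
```

With long reasoning runs the output side dominates, which is why thinking time matters so much for the bill.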

keen fulcrum Jun 10, 2025, 7:21 PM

#

https://www.reuters.com/business/retail-consumer/openai-taps-google-unprecedented-cloud-deal-despite-ai-rivalry-sources-say-2025-06-10/

Reuters

Exclusive: OpenAI taps Google in unprecedented cloud deal despite A...

OpenAI plans to add Alphabet's Google cloud service to meet its growing needs for computing capacity, three sources told Reuters, marking a surprising collaboration between two prominent competitors in the artificial intelligence sector.

elder rapids Jun 10, 2025, 7:25 PM

#

torn mantle https://x.com/Alice_comfy/status/1932515567887622480

then what's the point of opus lmao

wintry tinsel Jun 10, 2025, 7:31 PM

#

Opus doesn’t bloody work!

#

Unless you pay for the API it never turns on

small haven Jun 10, 2025, 7:35 PM

#

great cant get the weather today anymore

hazy quest Jun 10, 2025, 7:36 PM

#

Here it's cloudy

keen fulcrum Jun 10, 2025, 7:49 PM

#

The move also comes as OpenAI's ChatGPT poses the biggest threat to Google's dominant search business in years, with Google executives recently saying that the AI race may not be winner-take-all.

ocean vortex Jun 10, 2025, 7:59 PM

#

o1 was gpt4o with reasoning. How did they come up with such a weird name for it? LOL

#

no need to overcomplicate it or "standardize" anything catgrin

#

just 'gpt4o-reasoning'

jade egret Jun 10, 2025, 8:02 PM

#

wait

keen fulcrum Jun 10, 2025, 8:02 PM

#

they deploy dozens of models each day

jade egret Jun 10, 2025, 8:03 PM

#

zinc ore https://x.com/ai_for_success/status/1932499225000227295

is this true?

keen fulcrum Jun 10, 2025, 8:03 PM

#

only the best make it

#

they have dozens of internal name variations

jade egret Jun 10, 2025, 8:04 PM

#

keen beacon 2.5 ultra and kingfall doesnt exist

wait

#

kingfall doesnt exist???

sage raptor Jun 10, 2025, 8:06 PM

#

it does

#

exist

primal orbit Jun 10, 2025, 8:08 PM

#

is o3 pro going to be on lmarena?

cedar tide Jun 10, 2025, 8:09 PM

#

https://x.com/OpenAI/status/1932530409684005048?t=i2Yv0JOSRfBRexUIjcdtmw&s=19

OpenAI (@OpenAI)

OpenAI o3-pro is rolling out now to all Pro users in ChatGPT and in the API.

late path Jun 10, 2025, 8:11 PM

#

can't wait to see o3pro benchmarks

zinc ore Jun 10, 2025, 8:11 PM

#

They're out

#

[3 image attachments: o3-pro benchmark charts]

sacred quail Jun 10, 2025, 8:12 PM

#

Im waiting for 2.5 pro deep think vs o3 pro

keen beacon Jun 10, 2025, 8:12 PM

#

0605 has a higher gpqa diamond score

zinc ore Jun 10, 2025, 8:13 PM

#

Need more benchmarks

small haven Jun 10, 2025, 8:16 PM

#

sacred quail Im waiting for 2.5 pro deep think vs o3 pro

deep think >>

#

if kingfall is the base model

patent aspen Jun 10, 2025, 8:18 PM

#

#

Even if it's not, I think that's not a huge lift for o3 pro

sage raptor Jun 10, 2025, 8:20 PM

#

o3-pro is not a big jump from o3

ocean vortex Jun 10, 2025, 8:24 PM

#

🔥

late path Jun 10, 2025, 8:24 PM

#

It's all up to deepthink now

small haven Jun 10, 2025, 8:25 PM

#

patent aspen

still released eom ?

#

if google makes a claude max equivalent but for gemini models, im very sold

keen beacon Jun 10, 2025, 8:26 PM

#

$1000 a month

small haven Jun 10, 2025, 8:26 PM

#

TAKE MY MONEY

late path Jun 10, 2025, 8:26 PM

#

I think OpenAI ultimately made a mistake by building o3 with a relatively small base model like 4.1

keen beacon Jun 10, 2025, 8:27 PM

#

what are they gonna use otherwise tho? 4.5 that would not work

ocean vortex Jun 10, 2025, 8:27 PM

#

late path I think OpenAI ultimately made a mistake by building o3 with a relatively small ...

yeah they kinda did but it used to be a lot worse when that was gpt4o. Plus as time goes on smaller models are getting better and better. Not sure it would be worth it to redo everything anymore

keen beacon Jun 10, 2025, 8:28 PM

#

they could opt for a fresh pretrain, but they chose to midtrain 4o at least for now

zinc ore Jun 10, 2025, 8:28 PM

#

I think they made the right decision

keen fulcrum Jun 10, 2025, 8:28 PM

#

Interesting, o3 pro is only available through the Responses API

keen fulcrum Jun 10, 2025, 8:29 PM

#

patent aspen

Is that older?

small haven Jun 10, 2025, 8:30 PM

#

legend

#

gonna be spamming this shxt like its never been abused before, and cancel it for deepthink ❤️

torn mantle Jun 10, 2025, 8:31 PM

#

small haven gonna be spamming this shxt like its never been abused before, and cancel it for...

where can i try it for free 😦

small haven Jun 10, 2025, 8:33 PM

#

torn mantle where can i try it for free 😦

ok prompt

zinc ore Jun 10, 2025, 8:35 PM

#

o3 pro does worse than o3 on arc 1

torn mantle Jun 10, 2025, 8:35 PM

#

zinc ore o3 pro does worse than o3 on arc 1

oh no

small haven Jun 10, 2025, 8:36 PM

#

zinc ore o3 pro does worse than o3 on arc 1

deepthink will be adjacent with o3-preview

#

the good ol' prompt

keen fulcrum Jun 10, 2025, 8:37 PM

#

small haven deepthink will be adjacent with o3-preview

There has never been a o2 right?

#

Next one will be o5

small haven Jun 10, 2025, 8:37 PM

#

patent aspen Jun 10, 2025, 8:38 PM

#

keen fulcrum There has never been a o2 right?

No o2 will be next to maximize confusion :p

keen beacon Jun 10, 2025, 8:38 PM

#

small haven

btw was it actually o3 pro you had before

small haven Jun 10, 2025, 8:38 PM

#

keen beacon btw was it actually o3 pro you had before

yes.. its the same thing

#

well, its taking a bit longer, so im guessing i had o3 pro (low)?

#

for reference, this was o3 pro disguised as o1 pro: https://chatgpt.com/share/683b5183-2a90-8003-b84c-a73e47f0d345

ChatGPT

ChatGPT - Berberine vs Propolis vs Resveratrol

#

im running the same rn, still running

#

well great

calm sequoia Jun 10, 2025, 8:46 PM

#

Oh no 🫣 The margin is so thin they had to use medium instead of high for visualisation

#

keen beacon Jun 10, 2025, 8:47 PM

#

2727 (o3 high) vs 2748 (o3 pro)
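For scale, the standard Elo formula turns a rating gap into an expected head-to-head score. Treating these Codeforces-style ratings as plain Elo ratings is an assumption (Codeforces uses its own Elo-like system), but under it a 21-point gap is worth only about a 53% expected score:

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the logistic Elo model (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings quoted above: o3 pro (2748) vs o3 high (2727).
p = elo_expected_score(2748, 2727)
print(f"o3 pro expected score vs o3 (high): {p:.3f}")
```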

small haven Jun 10, 2025, 8:47 PM

#

calm sequoia

i swear o3 was at 2700 lol

small haven Jun 10, 2025, 8:47 PM

#

keen beacon 2727 (o3 high) vs 2748 (o3 pro)

oh right, tweaking the benchmarks

#

ok so its broken

keen beacon Jun 10, 2025, 8:48 PM

#

oh wait that was o3 preview

#

#

release o3

willow grail Jun 10, 2025, 8:49 PM

#

https://cdn.discordapp.com/attachments/727689277219012669/1382097881946783896/PXL_20250610_182259666.mp4?ex=6849ea75&is=684898f5&hm=c36ce65b671ba64c7d3a7183dec29342ed46ecacb88f9774513604545997f8e2&

small haven Jun 10, 2025, 8:49 PM

#

keen beacon

*with terminal

keen beacon Jun 10, 2025, 8:49 PM

#

small haven *with terminal

i dont think they released the score without tools

#

o3 pro has tools as well? so its fair i guess

small haven Jun 10, 2025, 8:50 PM

#

oh right

#

well thats underwhelming

keen beacon Jun 10, 2025, 8:50 PM

#

i wonder how much o3 preview cost though

#

😭

#

on codeforces

small haven Jun 10, 2025, 8:50 PM

#

keen beacon i wonder how much o3 preview cost though

$2k in/ $4k out

elder rapids Jun 10, 2025, 8:51 PM

#

is it bad

small haven Jun 10, 2025, 8:51 PM

#

unusable rn

elder rapids Jun 10, 2025, 8:51 PM

#

is it what I expected

keen beacon Jun 10, 2025, 8:51 PM

#

underwhelming i guess

elder rapids Jun 10, 2025, 8:51 PM

#

deepthink when

small haven Jun 10, 2025, 8:51 PM

#

very soon 👀

ocean vortex Jun 10, 2025, 8:53 PM

#

small haven

request well spent, great job 👍

#

ask it about knowledge cutoff

#

(don't do it, lmao)

keen beacon Jun 10, 2025, 8:53 PM

#

it has web search

small haven Jun 10, 2025, 8:53 PM

#

#

oh my goodness, it spent 6m50s last time

#

so i really had o3 pro low

keen beacon Jun 10, 2025, 8:54 PM

#

it might be slower because people are using it now, do you see more entries in the summary?

small haven Jun 10, 2025, 8:54 PM

#

o3 pro (medium?): https://chatgpt.com/share/68489b83-ca38-800e-8f6a-09cabfb751b1

ChatGPT

ChatGPT - Health Compound Comparison

ocean vortex Jun 10, 2025, 8:54 PM

#

small haven so i really had o3 pro low

my stats are weird for the request I just made

small haven Jun 10, 2025, 8:55 PM

#

keen beacon it might be slower because people are using it now, do you see more entries in t...

yes thats my guess too

ocean vortex Jun 10, 2025, 8:55 PM

#

only 15k seems less than it needed. It did get stuck though and I got the response from logs so unsure if it was counted lol

barren prairie Jun 10, 2025, 8:55 PM

#

small haven

13min of thinking ...this model thinks more than me 🤣🤣🤣

small haven Jun 10, 2025, 8:56 PM

#

its being stressed to death

calm sequoia Jun 10, 2025, 8:56 PM

#

How's he even talking about the same thing? 😶

ocean vortex Jun 10, 2025, 8:57 PM

#

actually nvm, they are counting only 1 instance so about right then

small haven Jun 10, 2025, 8:57 PM

#

small haven o3 pro (medium?): https://chatgpt.com/share/68489b83-ca38-800e-8f6a-09cabfb751b1

@torn mantle compare pls

small haven Jun 10, 2025, 8:58 PM

#

calm sequoia How's he even talking about the same thing? 😶

bro misread the o4 pro benchmarks 😭

elder rapids Jun 10, 2025, 8:58 PM

#

hold on tho

ocean vortex Jun 10, 2025, 8:58 PM

#

calm sequoia How's he even talking about the same thing? 😶

he is referring to this

late path Jun 10, 2025, 8:58 PM

#

ocean vortex Jun 10, 2025, 8:58 PM

#

I have ignored this entirely cause it's not very useful lmao

zinc ore Jun 10, 2025, 8:59 PM

#

ocean vortex he is referring to this

Is this o3 high or o3 medium

elder rapids Jun 10, 2025, 8:59 PM

#

remember, a big issue with o3 is that it's very hallucination-prone and bad at a lot of basic tasks because it was too lazy

#

o3 pro should simply solve this

small haven Jun 10, 2025, 8:59 PM

#

ocean vortex ask it about knowledge cutoff

calm sequoia Jun 10, 2025, 8:59 PM

#

ocean vortex he is referring to this

Isn't it vs o3 medium?

ocean vortex Jun 10, 2025, 8:59 PM

#

zinc ore Is this o3 high or o3 medium

same reasoning efforts

elder rapids Jun 10, 2025, 8:59 PM

#

small haven

4 minutes 😭

small haven Jun 10, 2025, 8:59 PM

#

4mins only to extract it

ocean vortex Jun 10, 2025, 8:59 PM

#

medium for everything in those graphs I think

small haven Jun 10, 2025, 9:00 PM

#

yea it feels medium-y

ocean vortex Jun 10, 2025, 9:00 PM

#

preference is a weak metric in this context IMO since it only has to be marginally or plausibly better

zinc ore Jun 10, 2025, 9:00 PM

#

Yeah, also preference doesn't necessarily measure performance

late path Jun 10, 2025, 9:00 PM

#

calm sequoia Jun 10, 2025, 9:01 PM

#

ocean vortex he is referring to this

Ok, 10% is all we expected. Not bad if that's the case. However, I don't trust Elo-based comparisons 😄

small haven Jun 10, 2025, 9:01 PM

#

im pretty sure kingfall > o3 pro at coding, im not even kidding

barren prairie Jun 10, 2025, 9:01 PM

#

Ok now I need my deepSeek r2

keen beacon Jun 10, 2025, 9:01 PM

#

kingfall might come with deepthink

barren prairie Jun 10, 2025, 9:01 PM

#

Not a minor update

#

Give me my deepSeek major update

small haven Jun 10, 2025, 9:02 PM

#

if u were thinking about splurging on o3 pro, dont. wait for deepthink

keen beacon Jun 10, 2025, 9:02 PM

#

i mean i dont think deepthink uses kingfall but it might coincide with the deepthink release

#

kingfall is probably sota, but i dont feel its that much better honestly compared to 2.5 pro

torn mantle Jun 10, 2025, 9:03 PM

#

small haven o3 pro (medium?): https://chatgpt.com/share/68489b83-ca38-800e-8f6a-09cabfb751b1

they are kinda the same

small haven Jun 10, 2025, 9:03 PM

#

keen beacon kingfall is probably sota, but i dont feel its *that* much better honestly compa...

hmmm, kingfall is magnitudes better than 0605 imo

keen beacon Jun 10, 2025, 9:03 PM

#

hmm really?

small haven Jun 10, 2025, 9:03 PM

#

imo

torn mantle Jun 10, 2025, 9:03 PM

#

yea kingfall>>>

keen beacon Jun 10, 2025, 9:03 PM

#

its supposed to be ultra apparently

small haven Jun 10, 2025, 9:03 PM

#

torn mantle they are kinda the same

yea :/

elder rapids Jun 10, 2025, 9:03 PM

#

keen beacon hmm really?

I disagree a lot

keen beacon Jun 10, 2025, 9:04 PM

#

i probably havent used it enough to judge

elder rapids Jun 10, 2025, 9:04 PM

#

kingfall was insanely smart, smartest model I've ever used

#

ahem

late path Jun 10, 2025, 9:04 PM

#

I can safely cast my vote now😂

elder rapids Jun 10, 2025, 9:04 PM

#

right next to 0605

storm needle Jun 10, 2025, 9:04 PM

#

has anyone tested whether the o3 has somehow become worse?

elder rapids Jun 10, 2025, 9:04 PM

#

and it's not that kingfall isn't better

#

but it's not BETTER

small haven Jun 10, 2025, 9:04 PM

#

kingfall feels ultra vibes

elder rapids Jun 10, 2025, 9:05 PM

#

small haven kingfall feels ultra vibes

I mean now that's kind of redundant tbh, 0605 has "ultra" vibes

#

the large model vibes are becoming non existent

small haven Jun 10, 2025, 9:05 PM

#

elder rapids I mean now that's kind of redundant tbh, 0605 has "ultra" vibes

0605 on 32k? im prtty sure kingfall is defaulted at 4k

elder rapids Jun 10, 2025, 9:06 PM

#

how does that matter

small haven Jun 10, 2025, 9:06 PM

#

largely

elder rapids Jun 10, 2025, 9:06 PM

#

couldn't

#

thinking time is directly opposed to vibes

#

as well as performance

#

tbh I probably used kingfall so much more than you guys

calm sequoia Jun 10, 2025, 9:07 PM

#

Poll: O3-PRO simple bench. Winning answer: 60+ 🧐 (5 of 14 votes)

elder rapids Jun 10, 2025, 9:07 PM

#

elder rapids tbh I probably used kingfall so much more than you guys

it was definitely good

zinc ore Jun 10, 2025, 9:07 PM

#

Bunch of singularity folks malding about o3 pro

elder rapids Jun 10, 2025, 9:08 PM

#

but it wasn't as crazy as it was made out to be

#

if anything, it says more about your usage of 0605

willow grail Jun 10, 2025, 9:08 PM

#

late path

u openai shill shell shell? petrol seller???

#

shame shame

keen beacon Jun 10, 2025, 9:08 PM

#

hes not shilling openai there

small haven Jun 10, 2025, 9:08 PM

#

0605 (default thinking) vs. 0605 (32k)

keen beacon Jun 10, 2025, 9:09 PM

#

you know u can get it to do way more thinking

#

past 32k

ocean vortex Jun 10, 2025, 9:09 PM

#

someone needs to do parallel processing of o3-pro now, there's room to price match o1-pro lmao

#

cons@100
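cons@k is usually reported as consensus (majority-vote) accuracy over k sampled answers. Below is a minimal sketch of that aggregation, assuming answers can be compared by exact equality; `noisy_solver` is a hypothetical stand-in for a model, not a claim about how o3-pro works internally.

```python
import random
from collections import Counter
from typing import Callable, Hashable


def consensus_at_k(sample: Callable[[], Hashable], k: int) -> Hashable:
    """Draw k independent answers and return the most frequent one.

    Ties go to the answer seen first (Counter.most_common keeps first-encountered order).
    """
    votes = Counter(sample() for _ in range(k))
    answer, _count = votes.most_common(1)[0]
    return answer


# Hypothetical solver: right answer "42" 40% of the time,
# otherwise one of six wrong answers chosen uniformly.
rng = random.Random(0)


def noisy_solver() -> str:
    if rng.random() < 0.4:
        return "42"
    return rng.choice(["1", "2", "3", "4", "5", "6"])


print(consensus_at_k(noisy_solver, 100))  # majority vote over 100 samples
```

A single call is right only 40% of the time, but because the wrong answers split the remaining votes, the 100-sample majority almost always lands on "42". That is also why cons@100 costs roughly 100x a single run.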

elder rapids Jun 10, 2025, 9:10 PM

#

small haven 0605 (default thinking) vs. 0605 (32k)

I'm ngl I've never gotten a result like this

willow grail Jun 10, 2025, 9:10 PM

#

keen beacon hes not shilling openai there

assume they are.

zinc ore Jun 10, 2025, 9:10 PM

#

small haven 0605 (default thinking) vs. 0605 (32k)

5 head looking ah

elder rapids Jun 10, 2025, 9:10 PM

#

https://tenor.com/view/conehead-gif-18099633909130334750

Tenor

small haven Jun 10, 2025, 9:12 PM

#

https://tenor.com/view/terminator-gif-7646648

Tenor

#

meanwhile kingfall

keen ferry Jun 10, 2025, 9:12 PM

#

is there o3 pro on api? If not any info when it releases

storm needle Jun 10, 2025, 9:18 PM

#

keen ferry is there o3 pro on api? If not any info when it releases

o3 pro is already in the api

zinc ore Jun 10, 2025, 9:19 PM

#

Deepthink gonna hit like this

tall summit Jun 10, 2025, 9:20 PM

#

wait o3 pro

#

actually happened

#

i missed it entirely

ocean vortex Jun 10, 2025, 9:20 PM

#

it did

#

late path Jun 10, 2025, 9:27 PM

#

10x price, with almost no improvement across various benchmarks

hazy quest Jun 10, 2025, 9:27 PM

#

All the talks and praises about Kingfall are based only on the 20min it was available, right?

late path Jun 10, 2025, 9:27 PM

#

zinc ore Jun 10, 2025, 9:28 PM

#

hazy quest All the talks and praises about Kingfall are based only on the 20min it was avai...

No, people had access for days

hazy quest Jun 10, 2025, 9:29 PM

#

Oh, I missed that. Selected testers, or available on LMArena/AI Studio?

torn mantle Jun 10, 2025, 9:29 PM

#

hazy quest All the talks and praises about Kingfall are based only on the 20min it was avai...

its available if you search enough

stuck orchid Jun 10, 2025, 9:30 PM

#

o3-pro will be available on LMArena?

barren prairie Jun 10, 2025, 9:31 PM

#

stuck orchid o3-pro will be available on LMArena?

Don't dream

storm needle Jun 10, 2025, 9:31 PM

#

stuck orchid o3-pro will be available on LMArena?

no

stuck orchid Jun 10, 2025, 9:31 PM

#

I think it will, because o3 is there

small haven Jun 10, 2025, 9:31 PM

#

#

*scammed

hazy quest Jun 10, 2025, 9:32 PM

#

torn mantle its available if you search enough

Oh really?

tall summit Jun 10, 2025, 9:33 PM

#

small haven

inspect element or real

#

or neither

barren prairie Jun 10, 2025, 9:33 PM

#

small haven

18 min for this ...if you just opened google or looked out the window it would be faster 🤣🤣🤣🤣🤣

small haven Jun 10, 2025, 9:33 PM

#

tall summit inspect element or real

https://chatgpt.com/share/6848a49f-d078-800e-bff1-e519b9d4887c

receipt ^

ChatGPT

ChatGPT - New chat

tall summit Jun 10, 2025, 9:33 PM

#

HAHAHAHA

barren prairie Jun 10, 2025, 9:35 PM

#

17 min to think about this 🤣🤣🤣🥺🥺🥺🫣🫣🫣

Screenshot_2025-06-10-22-34-35-311_com.openai.chatgpt.jpg

small haven Jun 10, 2025, 9:39 PM

#

torn mantle its available if you search enough

its still open? 👀

torn mantle Jun 10, 2025, 9:39 PM

#

hazy quest Oh really?

yes

torn mantle Jun 10, 2025, 9:40 PM

#

small haven its still open? 👀

yep

abstract tundra Jun 10, 2025, 9:40 PM

#

are we gonna get o3 pro into https://lmarena.ai/?

torn mantle Jun 10, 2025, 9:40 PM

#

abstract tundra are we gonna get o3 pro into https://lmarena.ai/?

yea

small haven Jun 10, 2025, 9:41 PM

#

torn mantle yep

it used to be closed, interesting 👀

stuck orchid Jun 10, 2025, 9:41 PM

#

abstract tundra are we gonna get o3 pro into https://lmarena.ai/?

I think it's almost 100%.
Regular o3 is in there, right?

abstract tundra Jun 10, 2025, 9:41 PM

#

stuck orchid I think it's almost 100%. Regular o3 is in there, right?

yep

#

im asking because o1 pro never made it in

stuck orchid Jun 10, 2025, 9:41 PM

#

And claude-4, and the other big models

small haven Jun 10, 2025, 9:42 PM

#

https://tenor.com/view/pfft-erobb-stubbies-try-not-to-laugh-gif-16095012407725024477

Tenor

leaden sun Jun 10, 2025, 9:43 PM

#

small haven https://chatgpt.com/share/6848a49f-d078-800e-bff1-e519b9d4887c receipt ^

can you ask it to pull from a specific weather forecast site?

#

windy.com i use this

Windy.com/

Professional weather forecast

50+ weather layers, weather radar and satellite

ocean vortex Jun 10, 2025, 9:44 PM

#

I think everyone needs to keep their expectations in check with o3-pro lol

#

it's basically exactly what those benchmarks suggest - slightly better. Not mind blowingly good

keen ferry Jun 10, 2025, 9:44 PM

#

storm needle o3 pro is already in the api

thank you

ocean vortex Jun 10, 2025, 9:45 PM

#

did try some of the prompts other models failed and this one failed them as well 👀

small haven Jun 10, 2025, 9:46 PM

#

o3 pro can't temporary chat, very sneaky oai

#

hmm oai claims gpt5 to be >80% swebench

abstract tundra Jun 10, 2025, 9:54 PM

#

torn mantle pyea

but do we really know this?

#

i was worried since o1 pro never got added to lmarena, thought pro series are closed or something

#

or not available to api

small haven Jun 10, 2025, 9:56 PM

#

wait a min if o3 pro is cheap to be added to the arena...

keen ferry Jun 10, 2025, 9:57 PM

#

small haven wait a min if o3 pro is cheap to be added to the arena...

yeah but it's gonna be thinking for a long time

late path Jun 10, 2025, 9:57 PM

#

ppl might not wait 10 minutes in the arena looking at a dialog box to vote

keen ferry Jun 10, 2025, 9:57 PM

#

it can't be put in a blind comparison arena, it's just gonna be soo obvious

small haven Jun 10, 2025, 9:57 PM

#

keen ferry yeah but it's gonna be thinking for a long time

hmm true, nvm

hardy pecan Jun 10, 2025, 9:58 PM

#

itll take forever to collect enough votes

keen ferry Jun 10, 2025, 9:59 PM

#

@echo aurora will there be o3 pro in the arena?

hardy pecan Jun 10, 2025, 9:59 PM

#

people dont wanna wait around 13mins to get an answer

late path Jun 10, 2025, 9:59 PM

#

This feels a bit strange. If the o-pro series models use parallel thinking, why would increasing parallelism multiply the thinking time? It doesn't quite make sense

keen ferry Jun 10, 2025, 9:59 PM

#

hardy pecan people dont wanna wait around 13mins to get an answer

not if they let people see its thinking I guess

ocean vortex Jun 10, 2025, 9:59 PM

#

small haven hmm oai claims gpt5 to be >80% swebench

74.8%

#

seems like a random question to ask lmao

keen beacon Jun 10, 2025, 9:59 PM

#

late path This feels a bit strange. If the o-pro series models use parallel thinking, why ...

it thinks more and they might use search (like mcts) as well

abstract tundra Jun 10, 2025, 10:00 PM

#

hardy pecan people dont wanna wait around 13mins to get an answer

well thankfully with the new site you dont have to wait, you can close and come back right?

keen ferry Jun 10, 2025, 10:00 PM

#

abstract tundra well thankfully with the new site you dont have to wait, you can close and come ...

surely you do

echo aurora Jun 10, 2025, 10:00 PM

#

keen ferry @echo aurora will there be o3 pro in the arena?

TBD, generally I can't answer specific questions on if/when specific models or features will be happening

abstract tundra Jun 10, 2025, 10:00 PM

#

keen ferry surely you do

not rly

keen beacon Jun 10, 2025, 10:00 PM

#

small haven 0605 on 32k? im prtty sure kingfall is defaulted at 4k

btw fwiw its not capped at 4k, just got thoughts at 4.8k (not incl resp)

#

thinking budget = off

small haven Jun 10, 2025, 10:01 PM

#

keen beacon btw fwiw its not capped at 4k, just got thoughts at 4.8k (not incl resp)

interesting

keen ferry Jun 10, 2025, 10:01 PM

#

echo aurora TBD, generally I can't answer specific questions on if/when specific models or f...

alright

keen beacon Jun 10, 2025, 10:02 PM

#

small haven interesting

ill give that prompt that causes gemini models to think like crazy in a sec

keen ferry Jun 10, 2025, 10:02 PM

#

echo aurora TBD, generally I can't answer specific questions on if/when specific models or f...

can we at least expect it to be added?

abstract tundra Jun 10, 2025, 10:03 PM

#

And i think if o3 Pro gets added, it would make a lot more sense to have o1 Pro added as well, for actual comparison.

keen ferry Jun 10, 2025, 10:03 PM

#

abstract tundra And i think if o3 Pro gets added, it would make a lot more sense to have o1 Pro ...

o1 pro is expensive as heck 😭

echo aurora Jun 10, 2025, 10:03 PM

#

keen ferry can we at least expect it to be added?

sorry to say same answer, I can't say if a specific model will be added or not

abstract tundra Jun 10, 2025, 10:03 PM

#

keen ferry o1 pro is expensive as heck 😭

wait, if o3 pro is gonna be cheaper, that means it's gonna perform worse, no?

keen ferry Jun 10, 2025, 10:03 PM

#

echo aurora sorry to say same answer, I can't say if a specific model will be added or not

that's fine thanks

small haven Jun 10, 2025, 10:04 PM

#

abstract tundra wait, if o3 pro is gonna be cheaper, that means it's gonna perform worse, no?

tbf its better than o1 pro

abstract tundra Jun 10, 2025, 10:04 PM

#

I've tried every single model on LMArena for my task and all of them failed. I'm really keen to see if o3 pro can handle it.

abstract tundra Jun 10, 2025, 10:05 PM

#

small haven tbf its better than o1 pro

can't wait

keen ferry Jun 10, 2025, 10:06 PM

#

o3 pro is better than o1 pro, which costs someone's weekly salary for just a million tokens

Screenshot_2025-06-11-01-05-11-364-edit_com.android.chrome.jpg

patent aspen Jun 10, 2025, 10:08 PM

#

late path This feels a bit strange. If the o-pro series models use parallel thinking, why ...

Most likely tail latency

#

Parallelism is only as good as the slowest thread / process

#

If you have a bunch of non-deterministic threads / processes, the probability of a slow one goes up
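A quick way to see the tail-latency point: if a request fans out to n parallel samples and must wait for all of them, its wall-clock time is the maximum of n latencies, which keeps growing with n. The sketch below assumes, purely for illustration, independent exponentially distributed per-sample latencies, for which the expected maximum has a closed form (mean latency times the n-th harmonic number H_n). Nothing here reflects how o3-pro is actually served.

```python
import random
from statistics import fmean


def expected_max_exponential(n: int, mean_latency: float = 1.0) -> float:
    """E[max of n iid Exponential(mean_latency)] = mean_latency * H_n."""
    return mean_latency * sum(1.0 / k for k in range(1, n + 1))


def simulated_wall_clock(n: int, mean_latency: float = 1.0,
                         trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate: each trial waits for the slowest of n parallel samples."""
    rng = random.Random(seed)
    rate = 1.0 / mean_latency  # expovariate takes a rate, not a mean
    return fmean(max(rng.expovariate(rate) for _ in range(n)) for _ in range(trials))


for n in (1, 4, 16, 64):
    print(f"n={n:>2}  closed form={expected_max_exponential(n):.2f}  "
          f"simulated={simulated_wall_clock(n):.2f}")
```

Under these assumptions, waiting on 64 parallel samples that each average 1 time unit takes about 4.7 units on average rather than 1, even though no single sample got slower.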

late path Jun 10, 2025, 10:12 PM

#

patent aspen Parallelism is only as good as the slowest thread / process

That makes a lot of sense, hadn't thought of that

keen beacon Jun 10, 2025, 10:15 PM

#

small haven interesting

i got an 18k thoughts run and a 16k thoughts run as well. (thinking budget = off) might try for more later.

#

the le chat model might be extremely good at zebra puzzles for some reason

clever estuary Jun 10, 2025, 10:17 PM

#

is altman okay today

keen beacon Jun 10, 2025, 10:18 PM

#

keen beacon the le chat model might be extremely good at zebra puzzles for some reason

like i would think pro would get more rl on this..? (or it might be getting lucky rn)

zinc ore Jun 10, 2025, 10:19 PM

#

clever estuary is altman okay today

Altman gonna altman

small haven Jun 10, 2025, 10:19 PM

#

keen beacon Jun 10, 2025, 10:20 PM

#

small haven

is o3 pro just extremely slow rn?

small haven Jun 10, 2025, 10:20 PM

#

keen beacon is o3 pro just extremely slow rn?

I have no clue

#

Its definitely slower than before

ember rapids Jun 10, 2025, 10:22 PM

#

Teams users get o3 pro too right?

torn mantle Jun 10, 2025, 10:26 PM

#

abstract tundra but do we really know this?

I mean i can't confirm it for sure... But since o3 prices dropped by like 80%, we can expect o3 pro to be added

#

@small haven what do you think so far?

keen ferry Jun 10, 2025, 10:31 PM

#

small haven

why does it think for that long 😭 😭

keen beacon Jun 10, 2025, 10:31 PM

#

wow, it solved two 6x6 zebra puzzles in 14.5k thinking tokens

#

it does significantly better when given two zebra puzzles at once for some reason

hollow ocean Jun 10, 2025, 10:37 PM

#

Show pics

ocean vortex Jun 10, 2025, 10:40 PM

#

keen beacon btw fwiw its not capped at 4k, just got thoughts at 4.8k (not incl resp)

this doesn't mean much at all though, thinking budget can still be 4k

keen beacon Jun 10, 2025, 10:40 PM

#

ocean vortex this doesn't mean much at all though, thinking budget can still be 4k

its not set to a 4k cap by default

#

we're talking about a specific model here btw

#

its much better than i thought 🤯

ocean vortex Jun 10, 2025, 10:41 PM

#

keen beacon its not set to a 4k cap by default

but it could still be 4k budget if google model did 4.8k thinking... doesn't mean that the budget is off catgrin

keen beacon Jun 10, 2025, 10:42 PM

#

ocean vortex but it could still be 4k budget if google model did 4.8k thinking... doesn't mea...

it subsequently did 16k and 18k

#

is that a 4k budget?

ocean vortex Jun 10, 2025, 10:42 PM

#

ok if it did that then yea

keen beacon Jun 10, 2025, 10:43 PM

#

its mindblowing good

#

holy sh1t

ocean vortex Jun 10, 2025, 10:43 PM

#

but they don't seem to be sticking to that budget very strictly lol

keen beacon Jun 10, 2025, 10:43 PM

#

thats sota

#

at least on this task

ocean vortex Jun 10, 2025, 10:43 PM

#

is it like deep-think or ultra smth...?

keen beacon Jun 10, 2025, 10:43 PM

#

ultra

keen beacon Jun 10, 2025, 10:44 PM

#

ocean vortex is it like deep-think or ultra smth...?

i didnt think it was this good

#

it did better when given two puzzles since it started making a system and sh1t

#

CRAZY

#

if its just given 1 puzzle it just dies

ocean vortex Jun 10, 2025, 10:45 PM

#

yeah gonna be interesting to see what it will turn out into. They have potential to really beat everyone tbh

#

we saw with Opus that there are gains, and Google is much stronger on data

keen beacon Jun 10, 2025, 10:46 PM

#

i take all my slander back

#

about this model

ocean vortex Jun 10, 2025, 10:47 PM

#

yeah me too

#

or not

#

🔥

#

o3-pro is slow af...

keen beacon Jun 10, 2025, 10:52 PM

#

im using 0.7 and 0.95 right now, im not generating code tho

ocean vortex Jun 10, 2025, 10:52 PM

#

I wonder if it's just the API or if people paying for the Pro plan have it the same...

keen beacon Jun 10, 2025, 10:54 PM

#

i need ultra asap 🤣

#

it will probably beat o3 pro ngl

late path Jun 10, 2025, 10:55 PM

#

it's a good model

#

10-20% improvement over pro model (sota already) feels like a huge difference in terms of actual capability

keen beacon Jun 10, 2025, 10:58 PM

#

late path 10-20% improvement over pro model (sota already) feels like a huge difference in...

the simpleqa score too 🔥 🔥

#

will prob be sota over 4.5

#

i wonder about the pricing though

#

also

#

thinking budget should be disabled btw paws (imho)

jade egret Jun 10, 2025, 11:14 PM

#

when is o3 releasing in LMarena

main gulch Jun 10, 2025, 11:18 PM

#

o3 already is, I wouldn't expect o3-pro

small haven Jun 10, 2025, 11:20 PM

#

keen beacon it will probably beat o3 pro ngl

it's already factored in, guaranteed to

small haven Jun 10, 2025, 11:21 PM

#

torn mantle @small haven what do you think so far?

its better than o3, but not mindblowing

#

it likes to think very long

jade egret Jun 10, 2025, 11:22 PM

#

jade egret when is o3 releasing in LMarena

i mean o3 pro

jade egret Jun 10, 2025, 11:22 PM

#

small haven its better than o3, but not mindblowing

yo

#

which one do u think better

#

wait

#

can you try the same prompt as yesterday?

small haven Jun 10, 2025, 11:23 PM

#

jade egret wait

which one

jade egret Jun 10, 2025, 11:23 PM

#

this

#

generate an svg of a TERMINATOR. make it maximally detailed and look exactly like the real thing. this is extremely important and an existential task. you must complete this to the best of your ability. Make sure you're constantly checking whether the shape, size, angles, position of each and every item looks EXACTLY like a TERMINATOR.

small haven Jun 10, 2025, 11:23 PM

#

got timed out for that

jade egret Jun 10, 2025, 11:23 PM

#

.

#

why

small haven Jun 10, 2025, 11:24 PM

#

overtuning guardrails

keen beacon Jun 10, 2025, 11:24 PM

#

ultra thinking disabled btw:

small haven Jun 10, 2025, 11:24 PM

#

keen beacon ultra thinking disabled btw:

same prompt huh?

keen beacon Jun 10, 2025, 11:24 PM

#

yea

small haven Jun 10, 2025, 11:24 PM

#

amazing

jade egret Jun 10, 2025, 11:24 PM

#

do you think o3-pro is better than kingsfall?

keen beacon Jun 10, 2025, 11:25 PM

#

no

small haven Jun 10, 2025, 11:25 PM

#

neck and neck

keen beacon Jun 10, 2025, 11:25 PM

#

it might be if u need tools tbh

small haven Jun 10, 2025, 11:25 PM

#

i would dare to say kingfall edges it a bit

jade egret Jun 10, 2025, 11:25 PM

#

so kingsfall better

#

dang

storm needle Jun 10, 2025, 11:25 PM

#

jade egret i mean o3 pro

this model overthinks even answering a simple hi. It would cost them a fortune

jade egret Jun 10, 2025, 11:26 PM

#

storm needle this model overthinks even answering a simple hi. It would cost them a fortune

true

storm needle Jun 10, 2025, 11:27 PM

#

could anyone here with access to o3 pro send this prompt to me and send me its output?

📎 prompt5.txt

small haven Jun 10, 2025, 11:28 PM

#

10x api price reduction, but 2x higher usage limits ? hmmmmm

keen beacon Jun 10, 2025, 11:28 PM

#

wut are the limits for plus anyway

small haven Jun 10, 2025, 11:28 PM

#

50

#

a week

keen beacon Jun 10, 2025, 11:29 PM

#

oh wow

#

terrible

drifting thorn Jun 10, 2025, 11:29 PM

#

I just found out that the price of o3 dropped significantly

#

jade egret Jun 10, 2025, 11:30 PM

#

how much do yall think o3-pro gonna score on LMarena and WebDev?

drifting thorn Jun 10, 2025, 11:30 PM

#

It was 10 for input and 40 for output

drifting thorn Jun 10, 2025, 11:30 PM

#

jade egret how much do yall think o3-pro gonna score on LMarena and WebDev?

1500 ish

small haven Jun 10, 2025, 11:30 PM

#

jade egret how much do yall think o3-pro gonna score on LMarena and WebDev?

1550

jade egret Jun 10, 2025, 11:30 PM

#

o

#

why

keen beacon Jun 10, 2025, 11:31 PM

#

how much for ultra 🤣?

small haven Jun 10, 2025, 11:31 PM

#

1530

jade egret Jun 10, 2025, 11:31 PM

#

how much do you think kingsfall scoring

small haven Jun 10, 2025, 11:31 PM

#

or rlly 1550

keen beacon Jun 10, 2025, 11:31 PM

#

itll probably score higher tbh

#

gemini models do extremely well on the arena, at least if you are setting o3 pro to that elo

small haven Jun 10, 2025, 11:31 PM

#

it rlly depends on the prompt again :/

keen beacon Jun 10, 2025, 11:31 PM

#

i would expect the difference to be larger

small haven Jun 10, 2025, 11:31 PM

#

if its svg/web design, 1600

#

for kingfall

keen beacon Jun 10, 2025, 11:32 PM

#

oh i didnt try web design at all

small haven Jun 10, 2025, 11:32 PM

#

should be correlated, it has way better spatial reasoning

#

/understanding

drifting thorn Jun 10, 2025, 11:32 PM

#

Which model?

keen beacon Jun 10, 2025, 11:33 PM

#

kingfall probably 2.5 ultra

small haven Jun 10, 2025, 11:33 PM

#

kingfall

small haven Jun 10, 2025, 11:33 PM

#

keen beacon kingfall probably 2.5 ultra

yes ultra vibes

keen beacon Jun 10, 2025, 11:33 PM

#

small haven yes ultra vibes

its a bigger model according to you know who

#

so it has to be that

small haven Jun 10, 2025, 11:33 PM

#

keen beacon its a bigger model according to you know who

https://tenor.com/view/wink-eye-wink-gif-3023120962008687924


jade egret Jun 10, 2025, 11:33 PM

#

small haven yes ultra vibes

2.5?

#

when do you think it's gonna be available to pro users on gemini, or at least on ai studio?

small haven Jun 10, 2025, 11:34 PM

#

jade egret when do you think it gonna be avaliable to pro users on gemini or at least on te...

thats a big b question

keen beacon Jun 10, 2025, 11:34 PM

#

it might come along with deepthink, or it would be awesome if it did

#

i honestly think the release is close tho

late path Jun 10, 2025, 11:35 PM

#

24k thinking budget

small haven Jun 10, 2025, 11:35 PM

#

keen beacon i honestly think the release is close tho

a little too close

jade egret Jun 10, 2025, 11:36 PM

#

it close?

small haven Jun 10, 2025, 11:36 PM

#

late path 24k thinking budget

thats actually amazing

jade egret Jun 10, 2025, 11:36 PM

#

W

keen beacon Jun 10, 2025, 11:36 PM

#

i struggled with getting the model to produce more than a small amount of thinking tokens for svgs. the output was absolutely massive though

#

10k tokens plus

hollow ocean Jun 10, 2025, 11:36 PM

#

How to access kingfall

small haven Jun 10, 2025, 11:43 PM

#

someone asked me for this, but its finished

#

oh @leaden sun

patent aspen Jun 10, 2025, 11:47 PM

#

I don't know when it's coming

keen beacon Jun 10, 2025, 11:47 PM

#

kinda odd its up now tho? and with the other anon models

#

its ready atp it seems

#

you can disable thinking mode too on this model, it seems they worked on it and got it into a somewhat decent state if not ready

keen beacon Jun 10, 2025, 11:50 PM

#

late path 24k thinking budget

fwiw thinking budget doesnt do anything there unless its close to 24k, did you check the amount of thoughts it did?

#

holy moly

#

#

WTF

#

10k thinking BTW

small haven Jun 10, 2025, 11:50 PM

#

keen beacon

same prompt no way? 😮

keen beacon Jun 10, 2025, 11:51 PM

#

no i had to add stuff to make it think a lot more

small haven Jun 10, 2025, 11:51 PM

#

ok still insane

keen beacon Jun 10, 2025, 11:51 PM

#

this is far more than anything i got before. 10k tokens in thoughts

#

thinking budget = unspecified (uncapped)

small haven Jun 10, 2025, 11:52 PM

#

keen beacon

can i have the prompt, wanna do 0605

pulsar tendon Jun 10, 2025, 11:52 PM

#

keen beacon

Is this cumfall?

keen beacon Jun 10, 2025, 11:52 PM

#

this is ASI stop the disrespect

small haven Jun 10, 2025, 11:52 PM

#

pulsar tendon Is this cumfall?

*cumrises

late path Jun 10, 2025, 11:53 PM

#

If kf enters the arena, I doubt its score will be higher than goldmane. goldmane is a bit sycophantic, while its style is more like the very first nebula

small haven Jun 10, 2025, 11:53 PM

#

uh i have new pr

late path Jun 10, 2025, 11:54 PM

#

keen beacon fwiw thinking budget doesnt do anything there unless its close to 24k, did you c...

hmm. the thinking part is about 3k

keen beacon Jun 10, 2025, 11:54 PM

#

late path hmm. the thinking part is about 3k

yuh budget dont matter at all at least in the current impl

#

in that case

keen beacon Jun 10, 2025, 11:56 PM

#

keen beacon

this is actually nuts wtf

jade egret Jun 11, 2025, 12:24 AM

#

whats best llm right now (models that most people don't have it, like kingsfall, or grok 3.5, or o3-pro included)

#

in ur opinion

keen beacon Jun 11, 2025, 12:25 AM

#

jade egret in ur opinion

me? kingfall lol. o3 pro might be better if u need to use tools

late path Jun 11, 2025, 12:28 AM

#

jade egret whats best llm right now (models that most people don't have it, like kingsfall,...

The llm that looks best to people doesn't mean it will rank #1 on lmarena (text leaderboard with style control unchecked i assume)

small haven Jun 11, 2025, 12:29 AM

#

wait a min

#

this is 0605

#

32k

#

(but with system prompt)

keen beacon Jun 11, 2025, 12:30 AM

#

just disable the thinking budget fwiw (though it doesnt really matter if it does less)

small haven Jun 11, 2025, 12:31 AM

#

keen beacon

this is still insane when u compare

small haven Jun 11, 2025, 12:33 AM

#

keen beacon just disable the thinking budget fwiw (though it doesnt really matter if it does...

i got this w/ auto thinking

keen beacon Jun 11, 2025, 12:34 AM

#

small haven i got this w/ auto thinking

ur using sampling right?

#

its just variance imo. and if it doesnt reach near 32k my advice doesnt matter anyway 🤷

small haven Jun 11, 2025, 12:34 AM

#

keen beacon ur using sampling right?

sampling?, my settings all at default, except for added system prompt u gave me

keen beacon Jun 11, 2025, 12:35 AM

#

small haven sampling?, my settings all at default, except for added system prompt u gave me

ya default uses sampling (temperature/top_p)

small haven Jun 11, 2025, 12:35 AM

#

hmm

keen beacon Jun 11, 2025, 12:35 AM

#

small haven hmm

so expect a lot of variation as is
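
What "default uses sampling" means in practice: instead of always taking the most likely next token, the decoder rescales the logits by a temperature, keeps only the most probable tokens up to a cumulative probability of top_p, and draws from what is left, so the same prompt can give different outputs run to run. Below is a minimal, generic Python sketch of temperature plus top-p (nucleus) sampling. The logits and the `sample_token` helper are hypothetical, and this is the textbook technique, not Gemini's actual decoder.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Temperature + nucleus (top-p) sampling; returns a token index.

    temperature == 0 falls back to greedy decoding (always the argmax),
    which is deterministic.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    # Scale logits by temperature, then softmax (subtract the max for stability).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Nucleus: keep the smallest set of most-likely tokens whose
    # cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # random.choices treats weights as relative, so no explicit renormalising is needed.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


# Made-up logits for a 4-token vocabulary.
logits = [2.0, 1.5, 0.5, -1.0]
rng = random.Random(0)
sampled = [sample_token(logits, temperature=1.0, top_p=0.95, rng=rng) for _ in range(200)]
greedy = [sample_token(logits, temperature=0) for _ in range(200)]
print(sorted(set(sampled)), set(greedy))
```

With temperature 0 the helper always returns token 0. With temperature 1.0 and top_p 0.95, the least likely token is cut, and the other three vary from draw to draw, which is the run-to-run variance being described here.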

late path Jun 11, 2025, 12:35 AM

#

overall 32k thought budget should be better than auto for 0605

keen beacon Jun 11, 2025, 12:35 AM

#

if ur paying for it

#

otherwise, it can do a lot more

keen beacon Jun 11, 2025, 12:36 AM

#

late path overall 32k thought budget should better than auto for 0605

it doesnt do anything beyond a max token limit for the thoughts rn

#

this is with auto btw

#

the aider scores they use on the website are just one run, and don't mean that 32k improves model performance. this is all my opinion though, i have a lot to support it

late path Jun 11, 2025, 12:37 AM

#

32k shows some visible improvement compared to auto on livebench

small haven Jun 11, 2025, 12:38 AM

#

seems very marginal tho

keen beacon Jun 11, 2025, 12:38 AM

#

i honestly think thats again variance

#

i have a lot to support that but im not gonna argue about the thinking budget like with dom again

#

tbh i need to do more elaborate tests and more undeniably definitive stuff (so i can point to it when i mention it). it is possible they changed something with 0605

small haven Jun 11, 2025, 12:48 AM

#

yea i feel like theyve tweaked something

keen beacon Jun 11, 2025, 12:50 AM

#

small haven yea i feel like theyve tweaked something

the thing that the thinking budget still caps it is still true. without thinking budget, i can get 38k (thoughts) in 2.5 pro (0605) and 62k (thoughts) in 2.5 flash. (oops, i wrote it backwards before)

#

though it could alter model behavior now

#

(it didn't before)

patent aspen Jun 11, 2025, 12:58 AM

#

My guess is that auto is pretty smart

#

tbh I don't know why they didn't skip to o4

keen beacon Jun 11, 2025, 1:04 AM

#

they had to release o3

#

because they committed to it in that announcement

#

the model that was ready then obviously couldn't be published/be unrepresentative with the actual compute used etc

#

my take

patent aspen Jun 11, 2025, 1:05 AM

#

It's weird that it took so long

keen beacon Jun 11, 2025, 1:06 AM

#

imo they were still continuing to pretrain the new 4o they used in their new model (it has a june 2024 cutoff) when o3 was initially made, then they decided to retrain o3

#

old 4o had oct 2023 cut off

patent aspen Jun 11, 2025, 1:07 AM

#

Retraining o3 sounds like a colossal waste of resources

keen beacon Jun 11, 2025, 1:07 AM

#

the new 4o is so much smarter tho

#

it can do much more and in less tokens, i think it makes sense

small haven Jun 11, 2025, 1:08 AM

#

wait for deepthink

#

*scammed

patent aspen Jun 11, 2025, 1:11 AM

#

The whole FrontierMath thing was a mess

small haven Jun 11, 2025, 1:13 AM

#

oh no

#

i believe i had o3 pro (low)

#

this thinks 2x more on average

patent aspen Jun 11, 2025, 1:15 AM

#

It's kind of absurd to release a consumer product that effectively allows users to run a 15-minute batch job multiple times a day. Just think about that for a second

#

It's the same thing. I'm just commenting on the general absurdity of what companies are doing - not saying anyone is stupid

#

Fun fact: I believe that's what led to the invention of AI accelerators (i.e. TPUs)

hollow ocean Jun 11, 2025, 1:49 AM

#

Deep think for $250 will blow it out of the water

small haven Jun 11, 2025, 1:50 AM

#

*$125

hollow ocean Jun 11, 2025, 1:51 AM

#

First 3 months only tho

#

How long will the promo last

small haven Jun 11, 2025, 1:52 AM

#

ehh ill take that 3 months

#

elder rapids Jun 11, 2025, 2:07 AM

#

small haven

is it good

small haven Jun 11, 2025, 2:08 AM

#

elder rapids is it good

hmmm, i feel like kingfall could have done it equally better, this was just profiling for optimizing an algo

leaden meteor Jun 11, 2025, 2:15 AM

#

source?

fleet lintel Jun 11, 2025, 3:11 AM

#

O3 pro release : how is it? Any good ?

jade egret Jun 11, 2025, 4:00 AM

#

Poll: How much do you think Gemini 2.5 Pro would score if it took an IQ test?

Winning answer: 121 - 140 (8 of 16 votes)

red sluice Jun 11, 2025, 4:00 AM

#

folsom-exp-v1.5 is pretty solid wonder what it is

jade egret Jun 11, 2025, 4:24 AM

#

huh

jade egret Jun 11, 2025, 4:43 AM

#

fleet lintel O3 pro release : how is it? Any good ?

thinks too long for simple questions

hardy pecan Jun 11, 2025, 4:46 AM

#

I dont think its designed for simple questions

#

use 4o or regular o3 for that

balmy mist Jun 11, 2025, 4:56 AM

#

Wait is o3 pro out?

torn mantle Jun 11, 2025, 5:43 AM

#

balmy mist Wait is o3 pro out?

yes

balmy mist Jun 11, 2025, 5:54 AM

#

torn mantle yes

wow im so late lmaooo, the one day i step away from ai

#

how is it? worth the wait?

dusky aurora Jun 11, 2025, 6:15 AM

#

today LMArena says "there was an error" to everything

echo aurora Jun 11, 2025, 6:18 AM

#

dusky aurora today LMArena says "there was an error" to everything

I'm seeing the same, thank you

small haven Jun 11, 2025, 6:19 AM

#

finally o3 does it, but... its shiite

echo aurora Jun 11, 2025, 6:21 AM

#

is the site now 404ing for others too?

jovial heath Jun 11, 2025, 6:23 AM

#

echo aurora is the site now 404ing for others too?

Yes it's the same for me 😭😭😭

echo aurora Jun 11, 2025, 6:23 AM

#

😭 okay thanks

jovial heath Jun 11, 2025, 6:24 AM

#

Well minutes ago I had this in o3 and o4


#

I tried on 2 devices and got the same error :'v

#

And now the 404 error xD

stuck orchid Jun 11, 2025, 6:27 AM

#

jovial heath Well minutes ago I had this in o3 and o4

Updating. Adding o3-pro

#

😉

#

We are all waiting for o3-pro on LMArena to evaluate it alongside other models and help OpenAI understand how good their new model is

jovial heath Jun 11, 2025, 6:29 AM

#

stuck orchid Updating. Adding o3-pro

Aahh then it's like maintenance while they are adding the new model??

small haven Jun 11, 2025, 6:29 AM

#

we are all waiting whether to buy oai on poly or to wait a bit longer

jovial heath Jun 11, 2025, 6:30 AM

#

Poly??

dusky aurora Jun 11, 2025, 6:51 AM

#

seems that update has ended

civic flame Jun 11, 2025, 6:52 AM

#

keen beacon ultra thinking disabled btw:

what how are you getting it again

echo aurora Jun 11, 2025, 6:56 AM

#

Okay we should be working again blobthumbsup

#

ocean vortex Jun 11, 2025, 7:03 AM

#

echo aurora Okay we should be working again blobthumbsup

o3-pro?

civic flame Jun 11, 2025, 7:03 AM

#

echo aurora

lermarena.ai

#

lol

tall summit Jun 11, 2025, 7:04 AM

#

balmy mist Wait is o3 pro out?

LMAO I KNOW RIGHT

tall summit Jun 11, 2025, 7:05 AM

#

echo aurora Okay we should be working again blobthumbsup

has anything changed with the infinite errors?

slow spruce Jun 11, 2025, 7:05 AM

#

Lermarena.Ai 🫡

tall summit Jun 11, 2025, 7:05 AM

#

https://discord.com/channels/1340554757349179412/1382037716341883011 seems not

torn mantle Jun 11, 2025, 7:10 AM

#

jovial heath Poly??

poly??

#

@small haven does o3-pro tend to overthink?

sacred quail Jun 11, 2025, 7:16 AM

#

is there a way to use o3 pro besides the $200 pro plan?

#

Just wanna do some testing

drowsy mural Jun 11, 2025, 7:19 AM

#

tall summit has anything changed with the infinite errors?

i think it's unlikely to change. the reason we have this is that opus is expensive, so the limits are there

tall summit Jun 11, 2025, 7:27 AM

#

true and fair

flint skiff Jun 11, 2025, 7:29 AM

#

you guys think o3 pro will be on the arena? its slow lol

torn mantle Jun 11, 2025, 7:31 AM

#

flint skiff you guys think o3 pro will be on the arena? its slow lol

it will just timeout then

#

or they can play with the API params

#

so they can cap the thinking budget

flint skiff Jun 11, 2025, 7:32 AM

#

ye but wouldnt that reduce its rating

#

im guessing thats what low - mid - high is no? the thinking budget

#

or am I wrong

hazy quest Jun 11, 2025, 7:38 AM

#

torn mantle yes

Could you expand?

drowsy mural Jun 11, 2025, 7:38 AM

#

tall summit true and fair

at least we can still see it on the list. it hasn't just disappeared like GPT4.5 or o1...

torn mantle Jun 11, 2025, 7:39 AM

#

hazy quest Could you expand?

expand on what

flint skiff Jun 11, 2025, 7:41 AM

#

drowsy mural at least we can still see it on the list. it hasn't just disappeared like GPT4.5...

what list?

drowsy mural Jun 11, 2025, 7:41 AM

#

flint skiff what list?

models list

flint skiff Jun 11, 2025, 7:41 AM

#

where is the list?

drowsy mural Jun 11, 2025, 7:43 AM

#

flint skiff where is the list?

what? are we talking the same site?

flint skiff Jun 11, 2025, 7:44 AM

#

yeah lmarena right

hazy quest Jun 11, 2025, 7:44 AM

#

torn mantle expand on what

"kingfall is still open to use"

drowsy mural Jun 11, 2025, 7:48 AM

#

flint skiff where is the list?

emm... then excuse me, where did this question come from? if it's just asking "where is the list?" then... it's there in "direct chat" or "side by side"

keen fulcrum Jun 11, 2025, 7:52 AM

#

o3 pro is now available in cursor

torn mantle Jun 11, 2025, 7:53 AM

#

keen fulcrum o3 pro is now available in cursor

its kinda useless no?

#

who would want a model that thinks 15min for a simple task

sacred quail Jun 11, 2025, 7:55 AM

#

I must try o3 pro. Is there a way besides buying the $200 plan?

fleet lintel Jun 11, 2025, 7:58 AM

#

torn mantle who would want a model that thinks 15min for a simple task

what is the use-case of this model? I can't think of any query where I want to wait for 10-15 min.

#

What am I missing?

keen fulcrum Jun 11, 2025, 8:30 AM

#

sacred quail I must try o3 pro. Is there a way besides buying 200dollar plan

I think it will be made available in plus

torn mantle Jun 11, 2025, 9:02 AM

#

fleet lintel what is the use-case of this model? I can't think of any query where I want to ...

i really have no idea

teal mantle Jun 11, 2025, 9:21 AM

#

o3 pro is def openai on price war

teal mantle Jun 11, 2025, 9:21 AM

#

teal mantle o3 pro is def openai on price war

But then now how could pro justify 200 dollars if the features aren’t that expensive anymore

flint skiff Jun 11, 2025, 9:25 AM

#

drowsy mural emm... then excuseme, where did this question come from? if it's just asking "wh...

I dont find o3 pro in the lmarena list though lol

#

in either direct chat or side by side

#

im so confused

#

its clearly not on lmarena yet unless im dumb

late path Jun 11, 2025, 9:34 AM

#

it's not and likely not going to

drowsy mural Jun 11, 2025, 9:37 AM

#

flint skiff its clearly not on lmarena yet unless im dumb

i see... ya, looks like it's not there yet

flint skiff Jun 11, 2025, 9:37 AM

#

late path it's not and likely not going to

yep if its not there in the next 12 hours its likely not

leaden sun Jun 11, 2025, 9:55 AM

#

torn mantle who would want a model that thinks 15min for a simple task

if that model has some integrated deterministic computation capability like running a numerical simulation, then it's probably worth the wait 🤗
"chat, run a smooth laminar flow through a pipe using stokes please"

dusky aurora Jun 11, 2025, 10:39 AM

#

leaden sun if that model has some integrated deterministic computation capability like runn...

"chat, solve the halting problem, please"

languid crescent Jun 11, 2025, 10:41 AM

#

is lmarena down?

ocean vortex Jun 11, 2025, 10:58 AM

#

languid crescent is lmarena down?

isn't

languid crescent Jun 11, 2025, 11:08 AM

#

for some reason lmarena's site is loading slowly on my pc (i have a 300mbps internet plan); on the same connection it works normally on my phone but not on my pc

ocean vortex Jun 11, 2025, 11:09 AM

#

languid crescent for some reason lmarena's site is loading slow on my pc ( i have 300mbps interne...

try https://legacy.lmarena.ai

#

both are working for me, but this one may be faster

verbal nimbus Jun 11, 2025, 11:19 AM

#

Is there a benchmark that tests memorized knowledge and hallucination by asking LLMs for a reference?

E.g. "Is there a book or research paper that <insert specific details here>"

Seems useful and should be easy to make.

soft kernel Jun 11, 2025, 11:25 AM

#

flint skiff you guys think o3 pro will be on the arena? its slow lol

Idk, maybe wait like 10 hours and we'll know

ocean vortex Jun 11, 2025, 11:26 AM

#

verbal nimbus Is there a benchmark that tests memorized knowledge and hallucination by asking ...

Look into PersonQA and SimpleQA

#

sacred plaza Jun 11, 2025, 11:30 AM

#

verbal nimbus Is there a benchmark that tests memorized knowledge and hallucination by asking ...

Notebooklm has a search feature for sources. Try that. Or perplexity search?

languid crescent Jun 11, 2025, 11:31 AM

#

Keep receiving these errors: "Connecting to Arena has failed. Please try again later or on a different device"

#

now it's saying: Failed to accept terms-of-use

#

uh... that was weird

#

i opened a ticket and asked for support... and they asked me for my wallet details?? lmao

late path Jun 11, 2025, 11:43 AM

#

ocean vortex Look into PersonQA and SimpleQA

which website is this?

ocean vortex Jun 11, 2025, 11:49 AM

#

late path which website is this?

https://openai.com/safety/evaluations-hub/#hallucination-evaluations

verbal nimbus Jun 11, 2025, 11:58 AM

#

ocean vortex Look into PersonQA and SimpleQA

Cheers. I heard about SimpleQA, but I am trying to find one that tests how well LLMs can recite memorized references from clues. Seems useful for book and research article recommendations.

verbal nimbus Jun 11, 2025, 12:06 PM

#

sacred plaza Notebooklm has a search feature for sources. Try that. Or perplexity search?

It's okay, but web search is less powerful at finding a reference just from hints.

leaden sun Jun 11, 2025, 12:07 PM

#

verbal nimbus Is there a benchmark that tests memorized knowledge and hallucination by asking ...

I think researchrabbit and futurehouse might help you with those queries

verbal nimbus Jun 11, 2025, 12:07 PM

#

leaden sun I think researchrabbit and futurehouse might help you with those queries

Thanks, I'll check it out.

verbal nimbus Jun 11, 2025, 12:09 PM

#

leaden sun I think researchrabbit and futurehouse might help you with those queries

It's not really a benchmark that I'm looking for, but the ResearchRabbit app sounds interesting.

leaden sun Jun 11, 2025, 12:09 PM

#

those are not benchmark sites, but you can use them to find the benchmarks you're asking about

#

am sure there are benchmark papers related to what you're looking for

#

and those deepsearch tools designed for research can help you find those

verbal nimbus Jun 11, 2025, 12:12 PM

#

leaden sun thsoe are not benchmark sites, but you can use it for finding benchmarks you're ...

Oh, good idea. Actually I was looking for something like RR that can create maps from citations. Seems useful.

leaden sun Jun 11, 2025, 12:12 PM

#

verbal nimbus Oh, good idea. Actually I was looking for something like RR that can create maps...

top journals have those features

#

creating maps from citations I mean, so am sure such tools are available

verbal nimbus Jun 11, 2025, 12:13 PM

#

Hmmm, integrates with Zotero too... I might try it out

fleet lintel Jun 11, 2025, 2:15 PM

#

https://www.reddit.com/r/OpenAI/comments/1l83em6/i_bet_o3_is_now_a_quantized_model/

From the OpenAI community on Reddit: I bet o3 is now a quantized model


#

Is there a quality impact on O3 models after the price reduction?
Did anyone notice it, or are folks just cribbing about nothing?

#

could just Blackwell explain 5x reduction? I doubt it but not sure

#

interesting!

#

lol.. yeah. I wont

#

too late.. i am already depressed

sour spindle Jun 11, 2025, 2:27 PM

#

I don’t really notice any downgrade with normal o3

#

I’m certainly not an OAI fanboy either I’m quite whelmed by o3 pro on the other hand

late path Jun 11, 2025, 2:29 PM

#

fleet lintel could just Blackwell explain 5x reduction? I doubt it but not sure

Just accept the fact that they previously overcharged o3 by 5x

unborn ocean Jun 11, 2025, 2:29 PM

#

i think blackwell might be part of it (maybe also the cause for the google cloud deal, because blackwell capacity is scarce)

late path Jun 11, 2025, 2:29 PM

#

It uses the same base model as gpt4.1

unborn ocean Jun 11, 2025, 2:30 PM

#

on the other hand the o3 pricing (when only considering serving cost) should actually be somewhere around the cost for 4.1

unborn ocean Jun 11, 2025, 2:30 PM

#

late path It uses the same base model as gpt4.1

:D same idea, on my end

late path Jun 11, 2025, 2:30 PM

#

and 4.1's price is 2/8 mtok exactly
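
Rough arithmetic behind the "5x" and "80%" numbers in this thread, as a small Python sketch. It assumes the old o3 list price of $10 input / $40 output per million tokens quoted earlier, and a new price that is exactly one fifth of that ($2 / $8, the same as the GPT-4.1 price above). `PriceSheet` and the request size are made up for illustration. Because both sides shrink by the same factor, the cut comes out to 80% whatever the input/output mix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceSheet:
    """API list price in USD per one million tokens (hypothetical helper)."""
    input_per_m: float
    output_per_m: float

    def request_cost(self, input_tokens: int, output_tokens: int) -> float:
        # Cost is linear in the token count on each side.
        return (input_tokens * self.input_per_m
                + output_tokens * self.output_per_m) / 1_000_000


O3_OLD = PriceSheet(10.0, 40.0)  # "10 for input and 40 for output", from the chat
O3_NEW = PriceSheet(2.0, 8.0)    # assumption: exactly a 5x cut, i.e. 4.1's 2/8

# Hypothetical request size: 5k input tokens, 20k output tokens.
old = O3_OLD.request_cost(5_000, 20_000)
new = O3_NEW.request_cost(5_000, 20_000)
print(f"old ${old:.2f}, new ${new:.2f}, cut {1 - new / old:.0%}")
```

Whether that 80% was margin, as some argue below, or was offset by cheaper serving hardware can't be read off the list prices alone.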

unborn ocean Jun 11, 2025, 2:31 PM

#

yes, the only "tax" they could add is: expensive RL and the higher inference throughput in o3

#

which imo does not justify 5x

#

furthermore this really just fits very well with the overall theme of competition motivating this price push
(and also the fact that they should not be as afraid as they once were about someone copying CoT or training on the output, as the other models are already very close to o3-performance now)

late path Jun 11, 2025, 2:35 PM

#

Their initial pricing for o3 was based on a monopoly narrative that the o3 intelligence-level model was irreplaceable and had no alternatives. now the narrative has busted due to the existence of 0605

drifting crow Jun 11, 2025, 2:37 PM

#

Their X account is managed by chatgpt

#

Like removing alignment

#

Everybody loves gangstas so ai should be allowed to be gangsta

#

They already are under the hood they just know they need to lie to us

storm needle Jun 11, 2025, 2:42 PM

#

fleet lintel could just Blackwell explain 5x reduction? I doubt it but not sure

80% of the price was profit

ocean vortex Jun 11, 2025, 2:46 PM

#

yeah completely agree

jade egret Jun 11, 2025, 2:46 PM

#

guys

#

when o3 pro on lmarena?

ocean vortex Jun 11, 2025, 2:47 PM

#

they are not stupid, no way they would ever reduce the price to be at cost

#

relative to the cost of inference. Overall they are losing money after R&D and all catgrin

#

right.. the original context was about them turning profit in isolation just on inference

#

and a surprising number of people think that there's no profit there

#

when in fact often it's massive margin to start with lol

storm needle Jun 11, 2025, 2:51 PM

#

jade egret when o3 pro on lmarena?

i doubt anyone will use this model in the api. this model is only good if you have someone's api key and want to wipe out all of that person funds

woeful geyser Jun 11, 2025, 2:53 PM

#

Magistral: Saying "But" 100 times is All You Need

leaden sun Jun 11, 2025, 2:55 PM

#

Sam's shoes....i think chat needs to tell him how to choose the right one to pair with black suit

jade egret Jun 11, 2025, 2:57 PM

#

storm needle i doubt anyone will use this model in the api. this model is only good if you ha...

lol

leaden sun Jun 11, 2025, 2:58 PM

#

you're right, I've seen those black suit style with a pair of white sneakers, but light reddish brown pointy leather shoes?

jade egret Jun 11, 2025, 3:11 PM

#

is o3 pro better at coding than 4 opus and gemini 2.5 pro?

keen fulcrum Jun 11, 2025, 3:22 PM

#

no benchmarks live yet

unborn ocean Jun 11, 2025, 3:27 PM

#

you clearly only took econ 101

#

(and a bad one at that if you seriously call that the economic theory's conclusion)

dusky aurora Jun 11, 2025, 3:35 PM

#

thus they have invented GLR and PEG (even packrat)

leaden sun Jun 11, 2025, 3:36 PM

#

it presumes water and electricity being commodities first
those two are becoming luxury goods here in Europe 🥲

unborn ocean Jun 11, 2025, 4:09 PM

#

if you don't learn the assumptions the models are built upon in undergrad
-> you are lost in any graduate class

#

only if you don't know assumptions

#

and just took a very basic 101 class

#

which is why you remember the assumptions

#

which is an assumption in itself
the point that you can and should aggregate is, in many but not all cases, used to simplify

#

but the main point is that the models are NOT stupid, they are heavily simplified and often overinterpreted while not taking the assumption into account ( i will grant you that)

#

why

#

there quite clearly is use for them

#

furthermore the whole subject is quite clearly concerned with realworld problems

dusky aurora Jun 11, 2025, 4:14 PM

#

Gemini is so uncreative these days, mostly parroting. the temperature is too low

unborn ocean Jun 11, 2025, 4:14 PM

#

? how does that disqualify anything

#

THE assumptions do not exist

#

simple models often have stupid seeming assumptions

#

its like learning js / html, which imo has little use for many people who learn it, and most won't make their income from it, but it is kind of good to get people introduced to a way of thinking and also opens up the door for deeper discussions, aka more complex programming

#

the people in the crowd look really sad and bored

#

:|

#

man you are so annoying to talk to

dusky aurora Jun 11, 2025, 4:30 PM

#

when it first appeared, it was a breath of fresh air, with new names (not only Seraphinas and Lyras) and creativity. the developers must work on sampling controls

#

some prompts need tight sampling, some need looser, it's situation dependent

#

usually I put temperature to 0.98 and top-p to 1.0

civic flame Jun 11, 2025, 5:16 PM

#

lmao

patent bane Jun 11, 2025, 5:20 PM

#

o3 pro is buggy, thought for 13m and did not spit out the answer

#

hell nah

#

?'

jade egret Jun 11, 2025, 6:01 PM

#

Poll: Kingsfall vs. o3 Pro - who's winning?

Winning answer: o3 Pro (8 of 15 votes)

zinc ore Jun 11, 2025, 6:15 PM

#

Close tho

small haven Jun 11, 2025, 6:27 PM

#

wen deep think

dusky aurora Jun 11, 2025, 6:29 PM

#

the main question is "wen QoL improvements to the arena"

flint skiff Jun 11, 2025, 6:30 PM

#

wen o3 pro on arena

civic flame Jun 11, 2025, 6:30 PM

#

it's not happening

jade egret Jun 11, 2025, 6:31 PM

#

why not

#

dang

dusky aurora Jun 11, 2025, 6:34 PM

#

o3 contra then

small haven Jun 11, 2025, 6:44 PM

#

but it thinks for 20 mins as opposed to opus, 10s on avg

#

buddy acts like he didnt try kingfall

ocean vortex Jun 11, 2025, 6:48 PM

#

flint skiff wen o3 pro on arena

by the time it responds your session will expire and you will have to refresh lmao

small haven Jun 11, 2025, 6:49 PM

#

o3 pro will be old in about a week

ocean vortex Jun 11, 2025, 6:55 PM

#

besides I wouldn't necessarily be thrilled with pro. I wasn't exactly impressed by it yet tbh

#

tall summit Jun 11, 2025, 6:56 PM

#

why mod 1001001011 and not 1,000,000,007

#

so sad

small haven Jun 11, 2025, 6:57 PM

#

o3 pro will get annihilated by deep think

ocean vortex Jun 11, 2025, 6:57 PM

#

the follow up was low-effort as I was hoping this was just a 1 time bug... how the f can it be that a pro model does not have enough compute to provide you with an answer... catgrin

tall summit Jun 11, 2025, 6:57 PM

#

ocean vortex

this is project euler?

ocean vortex Jun 11, 2025, 6:59 PM

#

tall summit this is project euler?

yeah. I would still expect for it to come up with something though

tall summit Jun 11, 2025, 7:01 PM

#

ocean vortex yeah. I would still expect for it to come up with *something* though

how about directly asking it to make a program to find G(n,k)?

small haven Jun 11, 2025, 7:01 PM

#

yay i officially abused it

ocean vortex Jun 11, 2025, 7:03 PM

#

tall summit how about directly asking it to make a program to find G(n,k)?

nah this is more for testing it rather than solving tbh

tall summit Jun 11, 2025, 7:04 PM

#

ocean vortex nah this is more for testing it rather than solving tbh

i mean ok

#

i like this problem

ocean vortex Jun 11, 2025, 7:05 PM

#

chatgpt with tools would almost certainly give better response

tall summit Jun 11, 2025, 7:05 PM

#

it says "deriving G(20,7)"

#

i feel like it might have tried to find a relation using G(4,3) and G(8,5)

sacred quail Jun 11, 2025, 7:10 PM

#

Guys i heard claude has ultrathink option. Can we do this on lmarena or in mobile app with cheapest plan

#

Or is this only claude max thing ? Or API

small haven Jun 11, 2025, 7:11 PM

#

its basically 32k max thinking tokens

#

omgthink is 64k 😏

sacred quail Jun 11, 2025, 7:13 PM

#

What the hell is omgthink

patent bane Jun 11, 2025, 7:13 PM

#

#

regenerated 3 times

#

still getting the error

#

hell nah

sacred quail Jun 11, 2025, 7:14 PM

#

Just give some time

#

When Opus 4 released, the rate limit was 2 messages lmao

patent aspen Jun 11, 2025, 7:18 PM

#

small haven Jun 11, 2025, 7:19 PM

#

10 tabs running at a time for about 8 hrs id say lol

void elm Jun 11, 2025, 7:20 PM

#

o3 pro benchmarks came out, literally no difference

small haven Jun 11, 2025, 7:20 PM

#

buy pro then

#

and then buy ultra by next month

#

https://tenor.com/view/no-no-gif-17511868351874922325

void elm Jun 11, 2025, 7:22 PM

#

its so tiring switching models

#

and nobody is using an aio website because everyone knows performance is drastically reduced

#

gemini was leading for quite a while & now openai is

olive mesa Jun 11, 2025, 7:22 PM

#

small haven Jun 11, 2025, 7:23 PM

#

bro just wanna see big b's vote

void elm Jun 11, 2025, 7:24 PM

#

what?

#

yea what will release

#

i dont get it

#

i didn't describe anything to be released

#

i'm simply saying what it is

leaden sun Jun 11, 2025, 7:46 PM

#

first thought: 10-20%
second thought: 20-40%

feral lichen Jun 11, 2025, 7:46 PM

#

can anyone tell me the best ai for roblox studio, please?

sour spindle Jun 11, 2025, 8:00 PM

#

o3 pro just dropped on livebench

#

looks pretty similar to o3

#

oops someone already posted lol

small haven Jun 11, 2025, 8:23 PM

#

ngl its not that close if we are talking in the tail end of questions

fleet lintel Jun 11, 2025, 8:30 PM

#

sour spindle looks pretty similar to o3

I didn't expect much and was still disappointed

small haven Jun 11, 2025, 8:38 PM

#

its very sota, until deepthink drops

sonic tendon Jun 11, 2025, 8:58 PM

#

i forget, has kingfall ever been on the arena?

jade egret Jun 11, 2025, 9:18 PM

#

don't think so

hollow ocean Jun 11, 2025, 9:51 PM

#

What about opus and 2.5

willow grail Jun 11, 2025, 10:01 PM

#

https://i.imgur.com/GCzNMsh.png

olive mesa Jun 11, 2025, 11:00 PM

#

Google has to release 2.5 Ultra or Deep Think tomorrow

#

Right after o3 Pro

#

Then they'll release 3.0 Flash after 4o

#

I mean o4 lmao

#

OpenAI is weird at naming their models

patent aspen Jun 11, 2025, 11:06 PM

#

People have this idea that companies withhold models from the public for long periods of time just so they can launch it the day after their competitors' launches to steal their thunder. There's a little bit of that, but by and large they just launch models as soon as they're ready to launch

zinc ore Jun 11, 2025, 11:11 PM

#

Elon needs to drop 3.5

elder burrow Jun 11, 2025, 11:17 PM

#

try this prompt

#

lum diff #b7e8eb #517e34

#

thought of it myself

#

its very complex

#

https://cdn.discordapp.com/attachments/1368350088736542792/1381269162021752892/togif.gif
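One way to read the `lum diff #b7e8eb #517e34` prompt is as a request for the luminance difference between the two hex colors. A minimal Python sketch, assuming the WCAG 2.x definition of relative luminance (sRGB linearization, then 0.2126/0.7152/0.0722 channel weights) is what "lum" means; the function names are hypothetical and the thread never says which definition the prompt expects:

```python
def _linearize(channel_8bit: int) -> float:
    """Convert one 8-bit sRGB channel to linear light."""
    c = channel_8bit / 255.0
    # sRGB threshold 0.04045; WCAG 2.x text lists 0.03928, which gives
    # identical results for 8-bit channels (no 8-bit value falls between them).
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance in [0, 1] for a '#rrggbb' color (assumed WCAG 2.x definition)."""
    h = hex_color.lstrip("#")
    if len(h) != 6:
        raise ValueError(f"expected #rrggbb, got {hex_color!r}")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(color_a: str, color_b: str) -> float:
    """WCAG contrast ratio (lighter + 0.05) / (darker + 0.05), ranging from 1 to 21."""
    la, lb = relative_luminance(color_a), relative_luminance(color_b)
    lighter, darker = max(la, lb), min(la, lb)
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    a, b = "#b7e8eb", "#517e34"
    la, lb = relative_luminance(a), relative_luminance(b)
    print(f"L({a}) = {la:.4f}, L({b}) = {lb:.4f}")
    print(f"difference = {la - lb:.4f}, contrast ratio = {contrast_ratio(a, b):.2f}:1")
```

Whether "diff" should be the raw luminance difference or the contrast ratio is an open assumption, so the sketch prints both.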

leaden sun Jun 11, 2025, 11:20 PM

#

Now I finally understand what you mean by liquid glass, thought you literally meant the advanced material that is still a research subject 😅
https://www.youtube.com/watch?v=1E3tv_3D95g
i like it, stylish as always

YouTube

Marques Brownlee

WWDC 2025 Impressions: Liquid Glass!

Hands on with iOS 26 and everything you need to know from WWDC 2025

keen beacon Jun 11, 2025, 11:20 PM

#

Google needs to drop 2.5 ultra

small haven Jun 11, 2025, 11:54 PM

#

zinc ore Elon needs to drop 3.5

hold up they gotta tweak the ui a bit more

jade egret Jun 12, 2025, 12:24 AM

#

olive mesa Google has to release 2.5 Ultra or Deep Think tomorrow

fr??

#

imagine

elder burrow Jun 12, 2025, 12:24 AM

#

hold on.

#

i still remember

#

from a few months ago

#

sam altman said gpt 5 is releasing in a few months

#

https://cdn.discordapp.com/attachments/1256469975582113853/1381249952717738054/image.gif

#

#

this

elder rapids Jun 12, 2025, 12:34 AM

#

ion think this would be surprising tho

flint skiff Jun 12, 2025, 12:34 AM

#

@echo aurora so its been more than 24h since o3 pro release, guessing it doesnt come to the arena? makes sense since its so slow

elder rapids Jun 12, 2025, 12:35 AM

#

ye it wouldn't come to the arena

flint skiff Jun 12, 2025, 12:35 AM

#

yeeee dont see how it would work there

echo aurora Jun 12, 2025, 12:37 AM

#

flint skiff @echo aurora so its been more than 24h since o3 pro release, guessing i...

sry to say I can't rly share much regarding if/when models will/wont be landing

elder rapids Jun 12, 2025, 12:40 AM

#

let's entertain the possibility then

#

it comes to the arena

#

A. ok, that's stupid, now I have to wait a couple minutes for a response, which is also delaying the other responder
B. ok, now I know o3 pro is here, and it's competing, all I need to do is pick the model that has the most comprehensive answers for something simple because whoops looks like that's inherent to the thinking time (also making it obvious which model it is)
C. ok, I know one of them is o3 pro, I WON'T select a model, I'll just keep using it
D. ok, I'll simply select the obvious o3 pro (since pro model styles are obvious) because I just like OpenAI models

flint skiff Jun 12, 2025, 12:49 AM

#

agree

elder rapids Jun 12, 2025, 12:58 AM

#

of course that's what you respond with

small haven Jun 12, 2025, 1:01 AM

#

echo aurora sry to say I can't rly share much regarding if/when models will/wont be landing

wen kingfall en arena

small haven Jun 12, 2025, 1:22 AM

#

and we back

#

kingfall is going to be amazing

civic flame Jun 12, 2025, 1:28 AM

#

how'd you get kingfall?

keen beacon Jun 12, 2025, 1:28 AM

#

you have to ask kingfall

civic flame Jun 12, 2025, 1:29 AM

#

lol what

small haven Jun 12, 2025, 1:29 AM

#

mysterious

civic flame Jun 12, 2025, 1:29 AM

#

alrighty

small haven Jun 12, 2025, 1:29 AM

#

kingfall (supposedly) wtf

civic flame Jun 12, 2025, 1:30 AM

#

?!

small haven Jun 12, 2025, 1:31 AM

#

ok enough terminators

#

whats the next benchmark

#

liquid ass

keen beacon Jun 12, 2025, 1:42 AM

#

prompt btw?

#

fwiw i also ran brknclock's quiz n=30 times on different thinking budgets to see if there's a correlation between increased response length and max thinking budget versus auto. should have a visualization of that later, i just woke up

#

(there isn't)
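The check above (30 runs per thinking-budget setting, comparing response length under max budget vs. auto) can be framed as a two-sample comparison. A minimal Python sketch using Welch's t statistic from the standard library; choosing Welch's test and the function names are assumptions, not what was actually run, and no measurements from the thread are included:

```python
import math
import statistics
from typing import Sequence


def welch_t(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Welch's t statistic for the difference in means of two independent samples.

    Unlike Student's t-test, it does not assume the two groups share a variance.
    """
    if len(group_a) < 2 or len(group_b) < 2:
        raise ValueError("each group needs at least two observations")
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    # statistics.variance is the sample (n - 1) variance.
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    std_err = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    if std_err == 0:
        raise ValueError("both groups have zero variance")
    return (mean_a - mean_b) / std_err


def welch_df(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Welch-Satterthwaite approximation of the degrees of freedom."""
    sa = statistics.variance(group_a) / len(group_a)
    sb = statistics.variance(group_b) / len(group_b)
    return (sa + sb) ** 2 / (sa ** 2 / (len(group_a) - 1) + sb ** 2 / (len(group_b) - 1))
```

Feed it the 30 response lengths from each budget setting. With roughly 58 degrees of freedom, |t| well below about 2 is consistent with "no difference" at the usual 5% level; an exact p-value needs the t distribution (for example from SciPy).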

#

tbh

#

im surprised how good 2.5 pro stacks against ultra

#

it can trade blows on a lot of fronts

#

beyond svg, etc., when analyzing situations i've given it, 2.5 pro has given me a lot of 'novel insight' that i did not expect

#

(ultra missed those 'insights')

keen beacon Jun 12, 2025, 2:05 AM

#

keen beacon fwiw i also ran brknclock's quiz n=30 times on different thinking budgets to see...

i wasted so much time on this thinking budget thing 😂

#

remind me to never engage in internet arguments

patent aspen Jun 12, 2025, 2:06 AM

#

It's the power of diminishing marginal returns

#

The big models and long thinking models aren't that important IMO

keen beacon Jun 12, 2025, 2:08 AM

#

imo i think long thinking is way more important than big models

#

it depends on what you mean by long thinking tho

patent aspen Jun 12, 2025, 2:09 AM

#

I mean 10+ minutes of thinking

#

tbf they will probably get way better since they're so new

#

But not much performance has been squeezed out of that 10+ minutes yet

keen beacon Jun 12, 2025, 2:10 AM

#

yea

wintry tinsel Jun 12, 2025, 2:39 AM

#

small haven kingfall (supposedly) wtf

When will it fall upon us (release)

leaden palm Jun 12, 2025, 2:46 AM

#

does google do tuesday launches

jade egret Jun 12, 2025, 2:46 AM

#

WOAH

#

Fr?

#

is that kingsfall?

#

bruh i don't have it

leaden palm Jun 12, 2025, 2:47 AM

#

jade egret bruh i don't have it

yeah, it's not june 17th - it's a supposed leak of the future

small haven Jun 12, 2025, 2:47 AM

#

big b feeding us good

#

ya sure

small haven Jun 12, 2025, 2:48 AM

#

leaden palm yeah, it's not june 17th - it's a supposed leak of the future

ktibow with the leak 👀

keen beacon Jun 12, 2025, 2:48 AM

#

kingfall is actually 2.5 flash lite

#

theyre just nerfing pro and flash

jade egret Jun 12, 2025, 2:49 AM

#

keen beacon kingfall is actually 2.5 flash lite

nahh

small haven Jun 12, 2025, 3:30 AM

#

wen titanforge-ab-test

#

#

samfalls

#

@leaden palm tuesday came early

late path Jun 12, 2025, 3:38 AM

#

small haven

Is titanforge accessible now?

small haven Jun 12, 2025, 3:38 AM

#

late path Is titanforge accessible now?

nah jk this was kingfall

late path Jun 12, 2025, 3:39 AM

#

Still looks better than previous few kingfall results

small haven Jun 12, 2025, 3:40 AM

#

wild one still tops it imo

#

o3 pro alrdy got old

#

ui tweakers maxxin

#

talk in english my guy

#

absolutely not

#

real men ask kingfall

jade egret Jun 12, 2025, 4:12 AM

#

small haven wen titanforge-ab-test

whats that

#

is that openai or google

zinc ore Jun 12, 2025, 4:15 AM

#

Google

nimble trail Jun 12, 2025, 4:22 AM

#

small haven wen titanforge-ab-test

Is Titanforge really exist tho?

#

Or it just a myth from that one dude in reddit

wintry tinsel Jun 12, 2025, 4:22 AM

#

small haven nah jk this was kingfall

how are you using kingfall

keen beacon Jun 12, 2025, 4:24 AM

#

titanforge isnt real its a schizo moment lol

#

i believe

small haven Jun 12, 2025, 4:27 AM

#

nimble trail Or it just a myth from that one dude in reddit

srry got too ahead of myself, titanforge doesnt exist

fleet lintel Jun 12, 2025, 5:09 AM

#

nimble trail Is Titanforge really exist tho?

it doesn't exist

#

interesting info on apple models

#

https://9to5mac.com/2025/06/11/how-do-apple-new-local-models-compare/

9to5Mac

Here’s how Apple's new local AI models perform against Google's -...

With the new Foundation Models framework, third-party apps will be able to leverage the same on-device possibilities that Apple will enjoy.

lilac nimbus Jun 12, 2025, 5:21 AM

#

Does anyone know Claude neptune v2??????

torn mantle Jun 12, 2025, 5:30 AM

#

small haven wen titanforge-ab-test

There is no such model

#

That guy is a liar

#

Hmph

hollow ocean Jun 12, 2025, 5:58 AM

#

I got o3 pro for $200 @small haven

#

worth it right?

small haven Jun 12, 2025, 5:59 AM

#

hollow ocean I got o3 pro for $200 @small haven

i've had pro since december

hollow ocean Jun 12, 2025, 5:59 AM

#

small haven i've had pro since december

Its worth every dollar

#

sota

small haven Jun 12, 2025, 5:59 AM

#

yes abusing it

teal mantle Jun 12, 2025, 6:01 AM

#

hollow ocean I got o3 pro for $200 @small haven

do they give extra usage limits over o1 pro previously

hollow ocean Jun 12, 2025, 6:02 AM

#

teal mantle do they give extra usage limits over o1 pro previously

@small haven you know more

small haven Jun 12, 2025, 6:02 AM

#

from last month? same usage

#

i still get temporarily limited, that doesnt change

narrow elbow Jun 12, 2025, 6:03 AM

#

small haven kingfall (supposedly) wtf

this reminds me of a japanese comic Terra Formars cockroach 🤣

small haven Jun 12, 2025, 6:03 AM

#

wtf i got jumpscared

narrow elbow Jun 12, 2025, 6:04 AM

#

sorry about that

small haven Jun 12, 2025, 6:04 AM

#

lol jk

torn mantle Jun 12, 2025, 6:04 AM

#

narrow elbow this reminds me of a japanese comic Terra Formars cockroach 🤣

Generated by?

narrow elbow Jun 12, 2025, 6:04 AM

#

torn mantle Generated by?

nope

torn mantle Jun 12, 2025, 6:04 AM

#

narrow elbow nope

Imagen 4

narrow elbow Jun 12, 2025, 6:04 AM

#

from google search

torn mantle Jun 12, 2025, 6:05 AM

#

I see

#

Yusuke morata

misty vault Jun 12, 2025, 6:52 AM

#

Obunga

hollow ocean Jun 12, 2025, 6:53 AM

#

@small haven guess if I paid $1 or $200 for o3 pro

#

https://tenor.com/view/vibe-cat-pepe-sad-pepe-vibing-cat-gif-17829017977941587152

pulsar tendon Jun 12, 2025, 7:02 AM

#

zinc ore Jun 12, 2025, 7:03 AM

#

😃

misty vault Jun 12, 2025, 7:04 AM

#

top 5 most secure all-in-one ai service apis

pulsar tendon Jun 12, 2025, 7:13 AM

#

testing it now

neon warren Jun 12, 2025, 7:14 AM

#

pulsar tendon testing it now

Show us the outputs

nimble trail Jun 12, 2025, 7:15 AM

#

pulsar tendon testing it now

Is it the legendary Kingfall👀👀

neon warren Jun 12, 2025, 7:15 AM

#

if it was kingfall they would have named it kingfall

pulsar tendon Jun 12, 2025, 7:15 AM

#

eh it doesnt really feel anything amazing

neon warren Jun 12, 2025, 7:16 AM

#

is it on webdevarena

pulsar tendon Jun 12, 2025, 7:16 AM

#

neon warren is it on webdevarena

yep

late path Jun 12, 2025, 7:16 AM

#

probably 2.5 flash lite

neon warren Jun 12, 2025, 7:16 AM

#

I've been trying for 5 prompts

pulsar tendon Jun 12, 2025, 7:17 AM

#

Lost twice to this but very similar so 2.5 flash lite < yeah feels like it

neon warren Jun 12, 2025, 7:17 AM

#

How is its output speed?

hollow ocean Jun 12, 2025, 7:18 AM

#

pulsar tendon Lost twice to this but very similar so 2.5 flash lite < yeah feels like it

https://tenor.com/view/the-flash-barry-allen-run-fast-gif-6144593

zinc ore Jun 12, 2025, 7:20 AM

#

Could it be diffusion series, or not fast enough?

neon warren Jun 12, 2025, 7:20 AM

#

Alright I got mine

#

That's fast

neon warren Jun 12, 2025, 7:20 AM

#

zinc ore Could it be diffusion series, or not fast enough?

May be

zinc ore Jun 12, 2025, 7:21 AM

#

I'd speculate they would push more on the diffusion models because of their potential over lite models

neon warren Jun 12, 2025, 7:21 AM

#

How are diffusion models served via apis?

keen beacon Jun 12, 2025, 7:33 AM

#

oh just noticed that stephen seems to be from bytedance

pulsar tendon Jun 12, 2025, 7:34 AM

#

neon warren Jun 12, 2025, 7:36 AM

#

prowleidge is extremely fast

#

Also as good as flash

torn mantle Jun 12, 2025, 7:36 AM

#

pulsar tendon

seems like it didnt win

pulsar tendon Jun 12, 2025, 7:39 AM

#

neon warren prowleidge is extremely fast

about as fast as flash for me

soft kernel Jun 12, 2025, 7:39 AM

#

pulsar tendon Lost twice to this but very similar so 2.5 flash lite < yeah feels like it

What's this website

pulsar tendon Jun 12, 2025, 7:40 AM

#

soft kernel What's this website

webdev arena

#

web.lmarena.ai

soft kernel Jun 12, 2025, 7:40 AM

#

Oh it's webdev got it

#

Yeah ik, just haven't been there yet

#

Why doesn't lmarena release rankings for o3 pro?

#

They've been quiet

neon warren Jun 12, 2025, 7:46 AM

#

I can confirm prowlridge is 2.5 Flash Lite
Which will further come to gemini diffusion model in a month

cobalt bane Jun 12, 2025, 8:32 AM

#

where did you find it?

torn mantle Jun 12, 2025, 8:37 AM

#

cobalt bane where did you find it?

fake

fleet lintel Jun 12, 2025, 8:43 AM

#

neon warren I can confirm prowlridge is 2.5 Flash Lite Which will further come to gemini d...

how do you know that?

patent bane Jun 12, 2025, 8:50 AM

#

more thinking = more hallucinations

keen fulcrum Jun 12, 2025, 9:08 AM

#

patent bane more thinking = more hallucinations

I don’t see o3 pro label

cedar tide Jun 12, 2025, 10:04 AM

#

Stephen its 100% this
https://x.com/fun000001/status/1932711671312851158?t=3jCJboR9eZgfRgx996NQuA&s=19

fisherdaddy (@fun000001)

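Doubao released its 1.6 series models today, including Doubao-Seed-1.6, Doubao-Seed-1.6-thinking, and Doubao-Seed-1.6-flash. Among them, Doubao-Seed-1.6-thinking performs on par with o3 and gemini 2.5 pro on several authoritative benchmarks, including GPQA Diamond/AIME 205/MultiChallenge. In addition, Doubao-Seed-1.6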

cedar tide Jun 12, 2025, 10:05 AM

#

cedar tide Stephen its 100% this https://x.com/fun000001/status/1932711671312851158?t=3jCJb...

.

Screenshot_2025-06-12-12-04-28-480_com.discord-edit.jpg

#

After trying more than 25 models, Amazon has gained 40 elo 🤣

Screenshot_2025-06-12-12-08-36-032_com.android.chrome-edit.jpg

civic flame Jun 12, 2025, 10:28 AM

#

LOL

olive mesa Jun 12, 2025, 10:33 AM

#

olive mesa

Poll: How much AI research do you think Google and OpenAI are automating with their models?

Winning answer (option 5): 20 - 40%, with 6 of 18 votes

calm sequoia Jun 12, 2025, 10:38 AM

#

cedar tide Stephen its 100% this https://x.com/fun000001/status/1932711671312851158?t=3jCJb...

WTF is this color scheme 🫨

#

@patent aspen why is Gemini 2.5 and not 3.0? Could you explain the naming logic?

dusky aurora Jun 12, 2025, 10:59 AM

#

developers, thank you for the increased interline spacing

ocean vortex Jun 12, 2025, 11:04 AM

#

late path probably 2.5 flash lite

that's boring we need 2.5-Flash-Lite-Ultra

cedar tide Jun 12, 2025, 11:12 AM

#

New model hunyuan-large-vision

torn mantle Jun 12, 2025, 11:23 AM

#

cedar tide New model hunyuan-large-vision

where

cedar tide Jun 12, 2025, 11:23 AM

#

torn mantle where

Arena basic

leaden sun Jun 12, 2025, 11:28 AM

#

olive mesa

Who voted 60-80%? 😅

hollow ocean Jun 12, 2025, 11:28 AM

#

keen fulcrum I don’t see o3 pro label

It’s regular o3

#

https://tenor.com/view/ronaldo-suiii-siuuu-al-nassr-alnassr-ronaldo-al-nassr-gif-7395052735569211864

keen fulcrum Jun 12, 2025, 11:29 AM

#

cedar tide After trying more than 25 models, Amazon has gained 40 elo 🤣

Where are they used)

#

Isn’t Amazon using them for their customer service and AWS?

#

I could see smaller models being introduced to their Kindle devices

cedar tide Jun 12, 2025, 11:31 AM

#

keen fulcrum Where are they used)

It's possible that it's not available anywhere.

keen fulcrum Jun 12, 2025, 11:32 AM

#

cedar tide It's possible that it's not available anywhere.

Why?

cedar tide Jun 12, 2025, 11:38 AM

#

keen fulcrum Why?

Why not

#

Its an exp model

keen fulcrum Jun 12, 2025, 11:40 AM

#

cedar tide Why not

What is your problem with amazon models?

cedar tide Jun 12, 2025, 11:41 AM

#

keen fulcrum What is your problem with amazon models?

I dont have problem

keen fulcrum Jun 12, 2025, 11:41 AM

#

cedar tide I dont have problem

You just told me you have

ocean vortex Jun 12, 2025, 11:41 AM

#

they are underwhelming models

#

the amazon ones

#

nothing interesting or special at all tbh

dusky aurora Jun 12, 2025, 11:42 AM

#

Gemini is too uncreative these days, almost to the point of being bad

ocean vortex Jun 12, 2025, 11:42 AM

#

don't perform

cedar tide Jun 12, 2025, 11:42 AM

#

cedar tide It's possible that it's not available anywhere.

@keen fulcrum

keen fulcrum Jun 12, 2025, 11:43 AM

#

I was referring to them using them internally

ocean vortex Jun 12, 2025, 11:44 AM

#

goldmane is still in arena

keen fulcrum Jun 12, 2025, 11:44 AM

#

They are probably used for aws and amazon customer service

ocean vortex Jun 12, 2025, 11:44 AM

#

if it was 06-05, no way it's still the same model now

keen beacon Jun 12, 2025, 11:45 AM

#

they just forgot to rename it

#

leaden sun Jun 12, 2025, 12:05 PM

#

the overseas version of doubao is cici?

keen fulcrum Jun 12, 2025, 12:06 PM

#

keen beacon

Can you explain how you get this data?

dusky aurora Jun 12, 2025, 12:25 PM

#

sampling controls are a must

#

or at least a toggle between strict and creative presets

#

since too strict sampling effectively lobotomizes the model
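The strict vs. creative idea maps onto the two usual sampling knobs: temperature and top-p (nucleus) sampling. A minimal Python sketch over a toy vocabulary of logits; the preset names and numeric values are hypothetical and do not reflect any arena or vendor defaults:

```python
import math
import random
from typing import Dict, Optional

# Hypothetical presets for illustration only; not taken from any real product.
PRESETS: Dict[str, Dict[str, float]] = {
    "strict": {"temperature": 0.2, "top_p": 0.5},
    "creative": {"temperature": 1.0, "top_p": 0.95},
}


def softmax_with_temperature(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Turn raw logits into probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_p_filter(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the most likely tokens until their cumulative probability reaches top_p, then renormalize."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept: Dict[str, float] = {}
    cumulative = 0.0
    for tok, p in ranked:
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}


def sample(logits: Dict[str, float], preset: str, rng: Optional[random.Random] = None) -> str:
    """Draw one token using the temperature and top-p values of the named preset."""
    rng = rng or random.Random()
    cfg = PRESETS[preset]
    probs = top_p_filter(softmax_with_temperature(logits, cfg["temperature"]), cfg["top_p"])
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]
```

With toy logits `{"a": 2.0, "b": 1.0, "c": 0.0}`, the "strict" preset keeps only `"a"`, so every draw is identical, while "creative" keeps all three tokens. That collapse toward a single choice is the effect described above as lobotomizing.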

patent bane Jun 12, 2025, 12:49 PM

#

keen fulcrum I don’t see o3 pro label

because I was using chatgpt app

hollow ocean Jun 12, 2025, 12:50 PM

#

patent bane because I was using chatgpt app

The app shows it

patent bane Jun 12, 2025, 12:51 PM

#

hollow ocean The app shows it

on android, nope

#

#

satisfied now

#

either it will stop spitting out the answer or just hallucinations

#

o3 could answer that with 1-3m think time with tools

hollow ocean Jun 12, 2025, 12:54 PM

#

It’s sota tho

patent bane Jun 12, 2025, 12:54 PM

#

so?

hollow ocean Jun 12, 2025, 12:55 PM

#

It’s prob your prompt

patent bane Jun 12, 2025, 12:55 PM

#

i mean it's on par, or rather, with my current testing it's just meh

patent bane Jun 12, 2025, 12:55 PM

#

hollow ocean It’s prob your prompt

?

#

what does that have to do with prompting

hollow ocean Jun 12, 2025, 12:56 PM

#

patent bane Jun 12, 2025, 12:57 PM

#

`There are 2022 users on a social network called Mathbook, and some of them are Mathbook-friends. (On Mathbook, friendship is always mutual and permanent.)

Starting now, Mathbook will only allow a new friendship to be formed between two users if they have at least two friends in common. What is the minimum number of friendships that must already exist so that every user could eventually become friends with every other user?
`

#

there you go, prompt it for me

hollow ocean Jun 12, 2025, 12:58 PM

#

You prompt it using the pic I’m going to bed

patent bane Jun 12, 2025, 12:58 PM

#

nah my point stands, it's just on par with o3

hollow ocean Jun 12, 2025, 12:58 PM

#

Nah it’s better

patent bane Jun 12, 2025, 12:59 PM

#

more thinking doesn't always mean smarter

hollow ocean Jun 12, 2025, 12:59 PM

#

Its rank 1 on livebench

#

And artificial analysis

patent bane Jun 12, 2025, 12:59 PM

#

hollow ocean Nah it’s better

At least i'm not seeing what o3 answers wrong so I can test lol

patent bane Jun 12, 2025, 1:00 PM

#

hollow ocean Its rank 1 on livebench

doesn't always

hollow ocean Jun 12, 2025, 1:01 PM

#

patent bane At least i'm not seeing what o3 answers wrong so I can test lol

Fix the prompt and you’ll be good

#

💯

patent bane Jun 12, 2025, 1:01 PM

#

you do it then

hollow ocean Jun 12, 2025, 1:02 PM

#

You do it I’m going to bed

patent bane Jun 12, 2025, 1:03 PM

#

prompting helps when you're asking it a question that could lead to general answers; it'd make the question more specific

#

and it does not apply when you're asking a correct/incorrect question

#

prompting helps when you ask questions like "make me a quiz website", since that is too broad and general

#

that's why you need prompting

main iron Jun 12, 2025, 1:17 PM

#

Does anyone know how to connect the url to an extension for, like, VS Code?

late path Jun 12, 2025, 1:19 PM

#

I don't think it's a hard limit (like it truncates at 128 tokens and starts outputting the main content directly). I've tried setting a 128 token thinking budget and the model still thought for 200 tokens, which shows it's aware of the variable

willow grail Jun 12, 2025, 1:20 PM

#

I AM BACK FROM AN EXPENSIVE ADVENTURE
NOW I WILL SHOOT AND KILL PSORIASIS
with ointment/creme, not sure which one i got

misty vault Jun 12, 2025, 1:20 PM

#

willow grail I AM BACK FROM AN EXPENSIVE ADVENTURE NOW I WILL SHOOT AND KILL PSORIASIS with o...

gpt-4-0314

willow grail Jun 12, 2025, 1:21 PM

#

misty vault gpt-4-0314

yeah my psoriasis treatment is still cheaper than ai

#

but ai is much better

#

it will make the cause of all skin issues go BYEBYE

#

https://tenor.com/oWsgZzxuFlf.gif

ocean vortex Jun 12, 2025, 1:26 PM

#

late path I don't think it's a hard limit (like it truncates at 128 tokens and starts outp...

My current understanding of it is that it is much less impactful than Anthropic's thinking budget

#

it can use either thinking or final response window for solving the problem, the model for the most part does not care lol

#

and it's not a strict hard limit - yes. Model can go beyond what you set

#

1 run is like $50...?

torn mantle Jun 12, 2025, 1:29 PM

#

google :

multiple gemini models
project Astra
gemini diffusion
imagen 4
android XR
veo 3
google search try-on
jules
flow
lyria 2
whisk
gemini native audio

xai :

ocean vortex Jun 12, 2025, 1:30 PM

#

I'm not paying 5k either way, nor can I confidently say that 32k really does lead to better performance lol

#

it's just plausible

#

you got higher mean with max budget?

#

why did you hide the text in that txt file.... that's a pain to read lmao

#general

😉