#general | Arena | Page 22

keen fulcrum Apr 16, 2025, 12:55 PM

#

Which of these is better?

drifting thorn Apr 16, 2025, 12:56 PM

#

o3 mini

#

2.0 Flash Thinking

balmy mist Apr 16, 2025, 12:59 PM

#

wait so we get unlimited gens with veo in studio right?

tall summit Apr 16, 2025, 1:07 PM

#

kinda rigged to bring in claude 3.7 given the list is mostly mini models

leaden meteor Apr 16, 2025, 1:20 PM

#

oblique flint Apr 16, 2025, 1:22 PM

#

idk if o3 will beat it tbh. o3 might be smarter but most people might not like waiting long for an answer

balmy mist Apr 16, 2025, 1:30 PM

#

https://x.com/legit_api/status/1912498684337479686

ʟᴇɢɪᴛ (@legit_api) on X

o4-mini and o3 were removed 4 hours ago from here

this can mean anything but usually when they add models here, it's a day before release / same day.

removing such models COULD mean it's not coming today - but near enough so don't over react if so

keen beacon Apr 16, 2025, 1:32 PM

#

yeah my source says today looks increasingly unlikely

balmy mist Apr 16, 2025, 1:35 PM

#

that sucks

#

i hope r2 comes out today lol

#

or sum other model

calm sequoia Apr 16, 2025, 1:40 PM

#

keen beacon yeah my source says today looks increasingly unlikely

Likely to release or not release? And is this about o3 or o4-mini?

keen beacon Apr 16, 2025, 1:48 PM

#

calm sequoia Likely to release or not release? And is this about o3 or o4-mini?

likely will not release today

#

bar a big surprise

#

it's about both

#

https://x.com/OpenAI/status/1912506271187832904

OpenAI (@OpenAI) on X

Livestream in o3 hours.

#

well

#

turns out they managed it

balmy mist Apr 16, 2025, 2:04 PM

#

omggggg

#

omggg

#

ahhhh

#

i just jumped out my work meeting

#

when i saw the tweet

#

i love open ai

#

i was going to be so sad today

misty vault Apr 16, 2025, 2:05 PM

#

balmy mist ahhhh

Please refrain from busting all over the LMArena community discord chat

balmy mist Apr 16, 2025, 2:05 PM

#

ill try bro

keen beacon Apr 16, 2025, 2:05 PM

#

might not be able to if it actually beats 2.5 pro in everything

balmy mist Apr 16, 2025, 2:06 PM

#

lmaoo o3 hours

#

i love that

#

so no mini today?

keen beacon Apr 16, 2025, 2:06 PM

#

they always sneak a hint for what they're announcing in the tweet announcing the stream

keen beacon Apr 16, 2025, 2:06 PM

#

balmy mist so no mini today?

either today (as they have never not launched them at the same time) or, if they're feeling quirky, tomorrow

#

i think they just said o3 because it's easier to make it a pun or whatever

#

how else could they sneak in what they're announcing

balmy mist Apr 16, 2025, 2:07 PM

#

o4 minus 1 hours

#

lmaoo jk

keen beacon Apr 16, 2025, 2:08 PM

#

o4 min[i]-us 1 hours

balmy mist Apr 16, 2025, 2:08 PM

#

you think they have a team dedicated just for puns?

#

i wonder how much they get paid

keen beacon Apr 16, 2025, 2:09 PM

#

i am also told the benchmarks for o3 given in the last preview in december are now out of date

#

the model has improved since then

keen beacon Apr 16, 2025, 2:09 PM

#

balmy mist you think they have a team dedicated just for puns?

lmao

balmy mist Apr 16, 2025, 2:09 PM

#

keen beacon the model has improved since then

wow sama said they managed to make o3 way better which sounds like a reach, "improved since then" seems more reasonable

#

anyone have those benchmarks handy?

#

i wanna set expectations

#

wait @keen beacon you are talking about jan benchmarks?

#

i thought they just teased o3 in december

#

and gave the actual benchmarks in jan or feb

keen beacon Apr 16, 2025, 2:11 PM

#

huh

balmy mist Apr 16, 2025, 2:11 PM

#

cant remember lol

keen beacon Apr 16, 2025, 2:11 PM

#

no it was in december i swear

#

because it was part of a stream in the oai christmas run

keen beacon Apr 16, 2025, 2:11 PM

#

balmy mist anyone have those benchmarks handy?

#

#

balmy mist Apr 16, 2025, 2:12 PM

#

yeah you are right

#

they just said they would release it in feb

balmy mist Apr 16, 2025, 2:12 PM

#

keen beacon

if they improved on this then wow

#

and it got a high score on arc right? like it passed

keen beacon Apr 16, 2025, 2:13 PM

#

balmy mist Apr 16, 2025, 2:14 PM

#

so you are saying its better than this?

#

i cant believe that lmaoo, they might as well call it o3.1

keen beacon Apr 16, 2025, 2:14 PM

#

i'm not sure about arc agi performance but i know it has improved performance on the other benchmarks

#

mainly down to going from 4o as base to 4.1

balmy mist Apr 16, 2025, 2:14 PM

#

that makes sense

#

https://x.com/sama/status/1908167790336651720

Sam Altman (@sama) on X

we were able to really improve on what we previewed for o3 in many ways; i think people will be happy...

#

do you think december o3 is better than 2.5 pro?

#

what did gemini 2.5 pro get on arc and that math benchmark?

#

frontier math

keen beacon Apr 16, 2025, 2:17 PM

#

balmy mist do you think december o3 is better than 2.5 pro?

perhaps just about

#

and not in everything

balmy mist Apr 16, 2025, 2:17 PM

#

those are the only two benchmarks i care about and simplebench:
frontier math, arc 1&2, Simple bench
we need a graph that combines them

keen beacon Apr 16, 2025, 2:17 PM

#

balmy mist frontier math

they haven't run frontiermath on 2.5 pro iirc

golden ocean Apr 16, 2025, 2:17 PM

#

balmy mist you think they have a team dedicated just for puns?

this team consists of ai agents

keen beacon Apr 16, 2025, 2:17 PM

#

something about rate limits

#

but 2.5 pro is very very good at maths

#

i'd say better than o1 or o3 mini or any other model currently available for that matter

balmy mist Apr 16, 2025, 2:18 PM

#

keen beacon something about rate limits

that is odd

#

i cant wait to pay $200 again for SOTA

thorny drum Apr 16, 2025, 2:20 PM

#

2.5 pro is so much better than any other model rn at math

#

wonder how it stacks against o3

balmy mist Apr 16, 2025, 2:20 PM

#

not o3 technically

#

well december o3

keen beacon Apr 16, 2025, 2:20 PM

#

i would expect 2+ point gains on most benchmarks vs december o3

balmy mist Apr 16, 2025, 2:21 PM

#

that's good enough for me

thorny drum Apr 16, 2025, 2:21 PM

#

i mean idrc about the benchmarks its just about the cost

keen beacon Apr 16, 2025, 2:21 PM

#

keen beacon

codeforces will be very interesting

thorny drum Apr 16, 2025, 2:21 PM

#

december o3 low is like $200 per task on arc agi

keen beacon Apr 16, 2025, 2:21 PM

#

i think it'll reach 3000+ with the updated base

#

because 4.1 is miles better than 4o at code tasks

#

same for swe bench

balmy mist Apr 16, 2025, 2:21 PM

#

cause decem o3 prob was better than 2.5 pro, thats why oa is releasing it now instead of skipping them like they said they would

thorny drum Apr 16, 2025, 2:21 PM

#

the model they release very well could be weaker than december o3

balmy mist Apr 16, 2025, 2:21 PM

#

keen beacon codeforces will be very interesting

that competition coding right?

balmy mist Apr 16, 2025, 2:22 PM

#

thorny drum the model they release very well could be weaker than december o3

impossible in terms of benchmarks

#

we have the records lol

#

unless it cheaper to run, like way cheaper

#

then that is a win

thorny drum Apr 16, 2025, 2:22 PM

#

i guess o3 low is only 2x the cost of o1 pro

drifting thorn Apr 16, 2025, 2:22 PM

#

Just tried ChatGLM Z1 Rumination

keen beacon Apr 16, 2025, 2:22 PM

#

balmy mist that competition coding right?

yup

drifting thorn Apr 16, 2025, 2:23 PM

#

Its thinking was good, but the base model itself is trash

keen beacon Apr 16, 2025, 2:23 PM

#

thorny drum the model they release very well could be weaker than december o3

it won't be

#

i can assure you

thorny drum Apr 16, 2025, 2:23 PM

#

🤷‍♂️

drifting thorn Apr 16, 2025, 2:23 PM

#

But I think those big techs should have a reference

balmy mist Apr 16, 2025, 2:23 PM

#

thorny drum 🤷‍♂️

why would they release it if its worse than december lmaoo

#

they just announced they got the 40 bill funding

drifting thorn Apr 16, 2025, 2:24 PM

#

Btw the “rumination” means extended “thinking” time and the ability to call tools multiple times while in its chain of thought

keen beacon Apr 16, 2025, 2:24 PM

#

if it doesn't beat 2.5 pro in most benchmarks, they won't release it

balmy mist Apr 16, 2025, 2:24 PM

#

keen beacon if it doesn't beat 2.5 pro in most benchmarks, they won't release it

fr

keen beacon Apr 16, 2025, 2:24 PM

#

i heard (again from a source) that they delayed the model after 2.5 pro by a couple of weeks

#

because they wanted to make SURE it didn't make them look like they're behind

balmy mist Apr 16, 2025, 2:25 PM

#

i hope this makes google release nw

#

i love this man

#

war of ai

keen beacon Apr 16, 2025, 2:25 PM

#

i think the higher likelihood of a reaction

#

is from deepseek

#

with R2

#

which would be very cool

balmy mist Apr 16, 2025, 2:25 PM

#

this needs to be a netflix doc

#

oh yeah

#

i would love that

#

imagine they release it during the livestream

#

lmaoo

keen beacon Apr 16, 2025, 2:25 PM

#

i still think the most likely course of events is that R2 releases next week

balmy mist Apr 16, 2025, 2:25 PM

#

but nahh

#

they gonna wait

keen beacon Apr 16, 2025, 2:26 PM

#

2.5 flash, updated 2.5 pro tomorrow or friday if i had to bet

balmy mist Apr 16, 2025, 2:26 PM

#

so they can steal iq

keen beacon Apr 16, 2025, 2:26 PM

#

google and openai have a long running tradition of trying to steal each other's thunder

#

it used to be openai and anthropic but now anthropic is too far behind

balmy mist Apr 16, 2025, 2:26 PM

#

keen beacon it used to be openai and anthropic but now anthropic is too far behind

yeah they got that aws partnership so they might gucci with that

#

it seems like its apple and oa vs google v anthropic and amazon, is that correct? not sure where meta fits in but meta vs deepseek (while also being against the other players?) lol

ember rapids Apr 16, 2025, 2:27 PM

#

Nightwhisper tmrw?

keen beacon Apr 16, 2025, 2:28 PM

#

doubt

balmy mist Apr 16, 2025, 2:28 PM

#

not sure if microsoft still is buddy buddy with oa

ember rapids Apr 16, 2025, 2:28 PM

#

Interesting how OAI chose to release today instead of Thursday

keen beacon Apr 16, 2025, 2:28 PM

#

they're trying to make themselves more independent these days

#

they're failing quite hard mind you

balmy mist Apr 16, 2025, 2:28 PM

#

nw tmw would be goated by google, but i dont think nw is better than o3 or o4 mini imo

keen beacon Apr 16, 2025, 2:28 PM

#

but

balmy mist Apr 16, 2025, 2:29 PM

#

@keen beacon what if nw was better? you think they release it?

balmy mist Apr 16, 2025, 2:29 PM

#

ember rapids Interesting how OAI chose to release today instead of Thursday

cause i was gonna be sad, @keen beacon pulled in a favor for me 🙂

keen beacon Apr 16, 2025, 2:29 PM

#

balmy mist @keen beacon what if nw was better? you think they release it?

google are less willing to sit on things than oai are

#

if they do have a better model and it's almost ready, they'll push to have it out by end of next week max

#

if they don't expect to be waiting a couple weeks or more

balmy mist Apr 16, 2025, 2:30 PM

#

bet, these are exciting times man, im just happy google stepped up their game

#

it was looking scary a few years ago lol

keen beacon Apr 16, 2025, 2:31 PM

#

it's so much better when openai have pressure piled on them

#

to the moon 🙏

alpine coral Apr 16, 2025, 2:32 PM

#

i would expect o3 to outperform gem pro 2.5 on most benchmarks; question will be though, by how much and what cost?

#

like if it's marginally better but twice as expensive, gemini would still be ahead imo

#

but if it blows gem 2.5 out of the water, then yeah who cares (for now anyway) what it costs ha

fleet lintel Apr 16, 2025, 2:33 PM

#

did they even launch o3 on lmarena?? probably not

balmy mist Apr 16, 2025, 2:33 PM

#

yeah gemini 2.5 pro advantage is that it is free and cheap and SOTA

thorny drum Apr 16, 2025, 2:33 PM

#

alpine coral like if it's marginally better but twice as expensive, gemini would still be ahe...

its gonna be way more than 2x as expensive

balmy mist Apr 16, 2025, 2:34 PM

#

they are most likely still going to be SOTA bc of that, but in terms of intelligence i bet on OA, i think thats their goal with these models today

alpine coral Apr 16, 2025, 2:34 PM

#

fleet lintel did they even launch o3 on lmarena?? probably not

well, kinda.. if the discord server counts ha
(or perhaps it's o4-mini) private model

thorny drum Apr 16, 2025, 2:34 PM

#

like waaay more

balmy mist Apr 16, 2025, 2:34 PM

#

they already released their cheap models on monday

#

today is not that day

alpine coral Apr 16, 2025, 2:34 PM

#

thorny drum like waaay more

yeah i would expect it'll be more

balmy mist Apr 16, 2025, 2:34 PM

#

idc about price

#

im here for the most capable model and thats why we like OA

#

we wanna see reasoning breakthroughs

keen beacon Apr 16, 2025, 2:35 PM

#

fleet lintel did they even launch o3 on lmarena?? probably not

nope

thorny drum Apr 16, 2025, 2:35 PM

#

o3 low was roughly 2x as expensive as o1 pro which is 60x more expensive than 2.5 pro

keen beacon Apr 16, 2025, 2:35 PM

#

they don't ever test o-series models on the arena

#

i have access because i help with security and jailbreak testing

fleet lintel Apr 16, 2025, 2:35 PM

#

thorny drum its gonna be way more than 2x as expensive

2x expensive but significantly better is fine. but 5x expensive and marginally better is not good

keen beacon Apr 16, 2025, 2:35 PM

#

that's through an oai-controlled platform so

oblique flint Apr 16, 2025, 2:35 PM

#

balmy mist idc about price

1k / 1M output tokens incoming

alpine coral Apr 16, 2025, 2:35 PM

#

keen beacon i have access because i help with security and jailbreak testing

i suspected it was red teaming ha

balmy mist Apr 16, 2025, 2:35 PM

#

oblique flint 1k / 1M output tokens incoming

thats why you buy the subscription

fleet lintel Apr 16, 2025, 2:35 PM

#

keen beacon i have access because i help with security and jailbreak testing

give something juicy to us

balmy mist Apr 16, 2025, 2:36 PM

#

using api for o3 is nuts

#

or o1

keen beacon Apr 16, 2025, 2:36 PM

#

fleet lintel give something juicy to us

lmao

#

like? 😭

thorny drum Apr 16, 2025, 2:36 PM

#

so a baseline is 120x more expensive than 2.5 pro (on low reasoning effort)

#

i dont think 2x is even in the ballpark

#

or 5x
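The multiplier chain in the messages above can be checked with a quick calculation. This is only a sketch using the rough ratios quoted in the chat (2x and 60x); they are the poster's estimates, not official pricing, and the names below are made up for illustration.

```python
# Rough ratios quoted in the chat; these are estimates, not official pricing.
O3_LOW_VS_O1_PRO = 2     # "o3 low was roughly 2x as expensive as o1 pro"
O1_PRO_VS_25_PRO = 60    # "o1 pro ... is 60x more expensive than 2.5 pro"


def chained_ratio(*ratios):
    """Multiply a chain of 'A costs r times B' ratios into one overall ratio."""
    result = 1
    for r in ratios:
        result *= r
    return result


o3_low_vs_25_pro = chained_ratio(O3_LOW_VS_O1_PRO, O1_PRO_VS_25_PRO)
print(o3_low_vs_25_pro)  # 120
```

Under those two estimates, a "2x" or "5x" price gap relative to 2.5 pro would be off by more than an order of magnitude.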

keen beacon Apr 16, 2025, 2:36 PM

#

oh yeah i think you guys will be annoyed about pricing

#

i don't have official numbers but i have talked with a few people

balmy mist Apr 16, 2025, 2:36 PM

#

idc about pricing, ik what to expect

fleet lintel Apr 16, 2025, 2:36 PM

#

what to expect??? tell us.

balmy mist Apr 16, 2025, 2:36 PM

#

just give me model!!

#

that you cant afford the api costs lol

#

and you need to sub up

#

thats what imma do

keen beacon Apr 16, 2025, 2:37 PM

#

fleet lintel what to expect??? tell us.

better than 2.5 pro in basically every regard but web development, however quite a lot more expensive

#

will be SOTA on basically all benchmarks

balmy mist Apr 16, 2025, 2:37 PM

#

damn

#

so nw is still king

#

😦

fleet lintel Apr 16, 2025, 2:37 PM

#

cool. and how much better on benchmarks?

sage raptor Apr 16, 2025, 2:38 PM

#

keen beacon better than 2.5 pro in basically every regard but web development, however quite...

in other coding tasks is SOTA ?

fleet lintel Apr 16, 2025, 2:38 PM

#

marginal or significant?

balmy mist Apr 16, 2025, 2:38 PM

#

how is it not as good in web dev tasks? thats weird, i think that has to do with system prompts and tool calling

#

you can make any model better at web dev with clever prompting

keen beacon Apr 16, 2025, 2:38 PM

#

fleet lintel marginal or significant?

it won't be tiny amounts better, fairly significant. but still in striking distance for deepmind

fleet lintel Apr 16, 2025, 2:39 PM

#

2.5 pro is so good .. i am excited to see what o3 has to offer

keen beacon Apr 16, 2025, 2:39 PM

#

sage raptor in other coding tasks is SOTA ?

swe bench, yes

#

arc, yes

#

codeforces, yes

#

i also hear it "did well" on aider polyglot but no figures there

sage raptor Apr 16, 2025, 2:39 PM

#

looks promising

balmy mist Apr 16, 2025, 2:40 PM

#

who is noam?
https://x.com/legit_api/status/1912516320966430928

ʟᴇɢɪᴛ (@legit_api) on X

Noam's excited therefore I am

https://t.co/GFB7rf1gVR

#

@keen beacon he valid?

#

yupp

keen fulcrum Apr 16, 2025, 2:40 PM

#

Here is a surprise for you:

balmy mist Apr 16, 2025, 2:41 PM

#

today is a holiday

#

ewww

#

grok

fleet lintel Apr 16, 2025, 2:41 PM

#

keen fulcrum Here is a surprise for you:

hard pass

keen fulcrum Apr 16, 2025, 2:42 PM

#

They should offer API subscriptions, unfortunately they don't

balmy mist Apr 16, 2025, 2:42 PM

#

keen fulcrum They should offer API subscriptions, unfortunately they don't

hmmm, that kind of defeats the purpose a lil tho

keen fulcrum Apr 16, 2025, 2:42 PM

#

Where?

balmy mist Apr 16, 2025, 2:42 PM

#

but i see what you mean

brittle tiger Apr 16, 2025, 2:42 PM

#

@keen beacon do you think we get o3 today if 2.5 capabilities didn't surprise ppl?

keen fulcrum Apr 16, 2025, 2:43 PM

#

Thats PAYG

keen beacon Apr 16, 2025, 2:43 PM

#

balmy mist who is noam? https://x.com/legit_api/status/1912516320966430928

he's a big contributor in reasoning model research @ OAI

balmy mist Apr 16, 2025, 2:43 PM

#

keen beacon he's a big contributor in reasoning model research @ OAI

wow, im excited af

keen beacon Apr 16, 2025, 2:43 PM

#

brittle tiger @keen beacon do you think we get o3 today if 2.5 capabilities didn't su...

wdym?

sage raptor Apr 16, 2025, 2:43 PM

#

keen beacon Apr 16, 2025, 2:44 PM

#

goals?

balmy mist Apr 16, 2025, 2:44 PM

#

lmaoo perplexity has not been the same since 2.5 pro came out tbh

keen fulcrum Apr 16, 2025, 2:44 PM

#

keen beacon goals?

So an AI engineer hopping around

keen beacon Apr 16, 2025, 2:44 PM

#

taste testing every lab

#

lil bit of this, lil bit of that

sick mountain Apr 16, 2025, 2:45 PM

#

what does member of technical staff mean

balmy mist Apr 16, 2025, 2:45 PM

#

keen beacon lil bit of this, lil bit of that

lmaoo smart man

brittle tiger Apr 16, 2025, 2:45 PM

#

keen beacon wdym?

after gemini dropped sama tweeted "change of plans: we are going to release o3 and o4-mini after all, probably in a couple of weeks, and then do GPT-5 in a few months." was just curious if that was in reaction. my gut says likely but curious on your opinion

keen beacon Apr 16, 2025, 2:45 PM

#

ah

#

yeah that wasn't the plan to start with

#

2.5 pro served as a reminder they weren't invincible

balmy mist Apr 16, 2025, 2:45 PM

#

nothing was the same since 2.5 pro

#

pre 2.5 pro and post

#

its like o1 was SSJ and o3 is beyond SSJ lol

keen beacon Apr 16, 2025, 2:48 PM

#

i also hear that although there is quite a bit of overlap there is some "healthy but fierce" competition between the team focused on o3 and the one focused on o4 mini lmao

novel flame Apr 16, 2025, 2:50 PM

#

Check the Mixture of a Million Experts paper from last summer, it's pretty wild: https://arxiv.org/abs/2407.04153

arXiv.org

Mixture of A Million Experts

The feedforward (FFW) layers in standard transformer architectures incur a linear increase in computational costs and activation memory as the hidden layer width grows. Sparse mixture-of-experts (MoE) architectures have emerged as a viable approach to address this issue by decoupling model size from computational cost. The recent discovery of th...
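For anyone unsure what "decoupling model size from computational cost" means in the abstract above, here is a minimal sketch of generic top-k sparse MoE routing. It is not the paper's specific method. The function `top_k_moe`, the toy experts, and the random gate weights are all hypothetical names chosen for illustration.

```python
import math
import random


def top_k_moe(x, gate_weights, experts, k=2):
    """Route input x to the k highest-scoring experts and mix their outputs."""
    # One gating score per expert: dot product of x with that expert's gate vector.
    scores = [sum(w * xi for w, xi in zip(gw, x)) for gw in gate_weights]
    # Keep only the k best experts; the others are never evaluated.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the selected scores only (shifted by the max for stability).
    m = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - m) for i in top]
    total = sum(exps)
    out = [0.0] * len(x)
    for weight, i in zip(exps, top):
        y = experts[i](x)
        for j, yj in enumerate(y):
            out[j] += (weight / total) * yj
    return out, top


calls = []  # records which experts actually ran


def make_expert(i):
    """Toy expert: scales the input by (i + 1) and logs that it was called."""
    def expert(x):
        calls.append(i)
        return [(i + 1) * xi for xi in x]
    return expert


random.seed(0)
dim, n_experts = 4, 8
gate_weights = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_experts)]
experts = [make_expert(i) for i in range(n_experts)]

out, chosen = top_k_moe([1.0, 0.5, -0.5, 2.0], gate_weights, experts, k=2)
print(len(calls))  # 2
```

Only 2 of the 8 experts run for this input, so adding more experts grows the parameter count without growing the expert compute per input. The gating step in this naive sketch still scores every expert.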

keen beacon Apr 16, 2025, 2:50 PM

#

lmao

balmy mist Apr 16, 2025, 2:51 PM

#

keen beacon i also hear that although there is quite a bit of overlap there is some "healthy...

i love it

keen beacon Apr 16, 2025, 2:51 PM

#

https://x.com/kalinowski007/status/1912506846524633238 member of technical staff @ oai

Caitlin Kalinowski 🇺🇸 (@kalinowski007) on X

This is a big deal.

#

lots of hype posting going on

balmy mist Apr 16, 2025, 2:51 PM

#

omggg its agi

#

we made it guys

#

lock in!

keen beacon Apr 16, 2025, 2:52 PM

#

i think this is the first model within "striking distance" of AGI if you will

balmy mist Apr 16, 2025, 2:52 PM

#

the hype is real man

#

who gonna livestream it here?

keen beacon Apr 16, 2025, 2:52 PM

#

i think with the current rate of progress AGI is looking like it'll arrive by the end of next year max

balmy mist Apr 16, 2025, 2:53 PM

#

im scared

#

what should I do to cope?

fleet lintel Apr 16, 2025, 2:53 PM

#

i am sure that it is going to be SOTA but i am concerned about cost.

drifting thorn Apr 16, 2025, 2:53 PM

#

oblique flint 1k / 1M output tokens incoming

Back when LLM APIs first became available, I remember they cost several dollars per thousand tokens

keen beacon Apr 16, 2025, 2:53 PM

#

balmy mist what should I do to cope?

ask chatgpt 😉

novel flame Apr 16, 2025, 2:53 PM

#

AGI won't come from (autoregressive Transformer based) LLMs though. LLMs might help accelerate the research and coding though.

fleet lintel Apr 16, 2025, 2:54 PM

#

keen beacon i think with the current rate of progress AGI is looking like it'll arrive by th...

hopefully not till 2030. companies wont think twice before firing us

keen beacon Apr 16, 2025, 2:54 PM

#

lol

keen beacon Apr 16, 2025, 2:54 PM

#

balmy mist what should I do to cope?

got some wisdom straight from our boy o3

#

📎 message.txt

balmy mist Apr 16, 2025, 2:54 PM

#

wait you still have access lmaoo

sage raptor Apr 16, 2025, 2:54 PM

#

livestream in o2 hours

keen beacon Apr 16, 2025, 2:54 PM

#

yeah they don't deprecate the model previews until like a week after they're publicly launched

oblique flint Apr 16, 2025, 2:55 PM

#

keen beacon i think with the current rate of progress AGI is looking like it'll arrive by th...

well, I would suggest checking out claudeplayspokemon and gemini plays pokemon lol. It's not AGI if it can't play a children's game

keen beacon Apr 16, 2025, 2:55 PM

#

don't need to worry about payment just yet

#

🙄

balmy mist Apr 16, 2025, 2:55 PM

#

limit doomscrolling lol

keen beacon Apr 16, 2025, 2:55 PM

#

oblique flint well, I would suggest checking out claudeplayspokemon and gemini plays pokemon l...

a matter of months ago the best llm couldn't get past even the first part

oblique flint Apr 16, 2025, 2:55 PM

#

an llm beating elden ring is my AGI definition

keen beacon Apr 16, 2025, 2:55 PM

#

just need to put things into perspective

balmy mist Apr 16, 2025, 2:55 PM

#

keen beacon yeah they don't deprecate the model previews until like a week after they're pub...

damn so are you even excited?

keen beacon Apr 16, 2025, 2:56 PM

#

well yeah

#

because i don't have the "-high" variants nor do i think i have o4 mini

drifting thorn Apr 16, 2025, 2:56 PM

#

keen beacon goals?

Is that u?

balmy mist Apr 16, 2025, 2:56 PM

#

keen beacon because i don't have the "-high" variants nor do i think i have o4 mini

wow

keen beacon Apr 16, 2025, 2:56 PM

#

drifting thorn Is that u?

lol no it's a linkedin profile i found scrolling through suggested connections

#

it would be my dream though 👀

#

labs are getting relentless trying to secure the best talent

drifting thorn Apr 16, 2025, 2:57 PM

#

keen beacon i also hear that although there is quite a bit of overlap there is some "healthy...

Ok it verified my guess on “two teams competing in OpenAI”

keen beacon Apr 16, 2025, 2:57 PM

#

so much so that deepmind are paying people to sit around and do nothing because it prevents them from being poached when they might be useful later on

balmy mist Apr 16, 2025, 2:57 PM

#

its crazy how much hype they are giving it, like i kinda dont know what to do

#

i wonder what google is thining

#

thinking*

drifting thorn Apr 16, 2025, 2:58 PM

#

novel flame Check the Mixture of a Million Experts paper from last summer, it's pretty wild:...

I saw this this afternoon and can hardly understand what it means

keen beacon Apr 16, 2025, 2:58 PM

#

balmy mist i wonder what google is thining

i think they're probably still quite confident they can retake the lead in the next month or two

#

deepmind are the lab that i think moves the fastest

barren prairie Apr 16, 2025, 2:58 PM

#

keen beacon i think they're probably still quite confident they can retake the lead in the n...

Week **

keen beacon Apr 16, 2025, 2:58 PM

#

mostly down to the fact they have so much money to work with and their compute is unmatched

balmy mist Apr 16, 2025, 2:59 PM

#

we need a chart for times when certain companies or models are leading, so we can see how long each company has held that title

narrow elbow Apr 16, 2025, 2:59 PM

#

need another 12 days of live streaming non-stop, then google popping in, just like last time hahhaha.

balmy mist Apr 16, 2025, 2:59 PM

#

keen beacon deepmind are the lab that i think moves the fastest

i agree

#

OA the king of hype

keen beacon Apr 16, 2025, 2:59 PM

#

balmy mist we need a chart for times when certain companies or models are leading, so we ca...

iirc this arena has a video of how the leaderboard has progressed since it was created

#

which kinda works as that

balmy mist Apr 16, 2025, 2:59 PM

#

keen beacon iirc this arena has a video of how the leaderboard has progressed since it was c...

oh wow, nice

keen beacon Apr 16, 2025, 3:00 PM

#

narrow elbow need another 12 days of live streaming non-stop,then google popping in, just lik...

best way to know if google are about to drop is if logan posts "Gemini" on twitter

#

he always posts just that word the day before a launch

#

if they're releasing a new gemma model he'll post "Gemma"

#

ain't no hypeposting around here kids

fleet lintel Apr 16, 2025, 3:00 PM

#

google sucks at marketing

balmy mist Apr 16, 2025, 3:01 PM

#

they lowkey dont have to market

#

they got so much money

plain zinc Apr 16, 2025, 3:01 PM

#

fleet lintel google sucks at marketing

It's much better than if they relied more on marketing.

#

Remember how they tricked everyone with Gemini 1.0 Ultra

keen beacon Apr 16, 2025, 3:02 PM

#

if google invested as much into marketing as they did into research they'd be right on openai's ass

plain zinc Apr 16, 2025, 3:02 PM

#

It's all because of dumb marketing.

keen beacon Apr 16, 2025, 3:02 PM

#

openai are basically the only lab that actually know how to market

#

perhaps unfairly given the huge advantage they got with chatgpt's viral moment

#

but all the same

#

anthropic marketing sucks

fleet lintel Apr 16, 2025, 3:02 PM

#

plain zinc Remember how they tricked everyone with Gemini 1.0 Ultra

dont remind me of 1.0 models. they were horrendous

keen beacon Apr 16, 2025, 3:02 PM

#

google's marketing barely exists

drifting thorn Apr 16, 2025, 3:02 PM

#

oblique flint an llm beating elden ring is my AGI definition

LLM playing Minecraft is my definition

keen beacon Apr 16, 2025, 3:02 PM

#

deepmind didn't really market r1 but it was carried by the press and social media

keen beacon Apr 16, 2025, 3:03 PM

#

fleet lintel dont remind me of 1.0 models. they were horrendous

don't slander my boy 1.0 ultra... he was good at creativity. yes that was about it but 🥹

keen fulcrum Apr 16, 2025, 3:03 PM

#

As Google is finally taking AI seriously, we will see Gemini models absolutely dominating the next years

#

https://blog.google/products/google-cloud/ironwood-tpu-age-of-inference/

Google

Ironwood: The first Google TPU for the age of inference

We’re introducing Ironwood, our seventh-generation Tensor Processing Unit (TPU) designed to power the age of generative AI inference.

keen beacon Apr 16, 2025, 3:03 PM

#

yeah google's TPUs give them a big advantage

balmy mist Apr 16, 2025, 3:03 PM

#

deepseek doesnt really market but they market through their tech, thats the kinda marketing you really want, its so good that it markets itself

keen beacon Apr 16, 2025, 3:04 PM

#

yeah i think that's what GDM are trying to do

fleet lintel Apr 16, 2025, 3:04 PM

#

keen beacon deepmind didn't really market r1 but it was carried by the press and social medi...

you mean deepseek. dont be too sure about it. china has huge power over media houses and they have incentive to hype it up

keen beacon Apr 16, 2025, 3:04 PM

#

but thus far it has been less successful

balmy mist Apr 16, 2025, 3:04 PM

#

no deepmind

balmy mist Apr 16, 2025, 3:04 PM

#

fleet lintel you mean deepseek. dont be too sure about it. china has huge power over media ...

^

keen beacon Apr 16, 2025, 3:04 PM

#

if i asked almost any of my irl friends if they knew what gemini 2.5 pro was they'd ask me wtf im talking about

keen beacon Apr 16, 2025, 3:04 PM

#

fleet lintel you mean deepseek. dont be too sure about it. china has huge power over media ...

yeah my bad

#

lmao

#

yeah i know they have incentive

#

but most of the heavy lifting, at least with R1, was natural because of the model's strengths and it challenging american dominance rather than down to deepseek's efforts themselves

drifting thorn Apr 16, 2025, 3:05 PM

#

fleet lintel you mean deepseek. dont be too sure about it. china has huge power over media ...

True, media houses in China hype Chinese AI way more than others

keen beacon Apr 16, 2025, 3:05 PM

#

well, as you'd expect

drifting thorn Apr 16, 2025, 3:06 PM

#

They hyped Deepseek R1 and sell sessions that “make you earn from Deepseek”

#

Which is a scam

balmy mist Apr 16, 2025, 3:06 PM

#

drifting thorn They hyped Deepseek R1 and sell sessions that “make you earn from Deepseek”

china seems really good at marketing

keen beacon Apr 16, 2025, 3:06 PM

#

i wonder if R2 will cause the same absolute hype storm that R1 did

balmy mist Apr 16, 2025, 3:06 PM

#

like generating hype or commotion

#

manus, deepseek etc..

drifting thorn Apr 16, 2025, 3:07 PM

#

Most critics in China are overhyping R1, claiming that it is better than o3 mini

keen beacon Apr 16, 2025, 3:07 PM

#

you literally couldn't use R1 via almost any API nor via deepseek's own platform half the time because every gpu assigned to it was vapourised

balmy mist Apr 16, 2025, 3:07 PM

#

keen beacon you literally couldn't use R1 via almost any API nor via deepseek's own platform...

wild

drifting thorn Apr 16, 2025, 3:07 PM

#

balmy mist china seems really good at marketing

Ofc

keen beacon Apr 16, 2025, 3:07 PM

#

hopefully they've scaled up enough since then to be prepared for large load with r2

balmy mist Apr 16, 2025, 3:08 PM

#

r1 was trained on o1 outputs right? so that means they cant release r2 until o3 is released cause training on o3 mini is not good enough

thorny drum Apr 16, 2025, 3:08 PM

#

r1 was marketed by people losing money in nvda

#

the deepseek team didnt do any marketing really

drifting thorn Apr 16, 2025, 3:09 PM

#

keen beacon hopefully they've scaled up enough since then to be prepared for large load with...

They were banned from buying Nvidia GPUs

balmy mist Apr 16, 2025, 3:09 PM

#

thats why this moat crap is silly

drifting thorn Apr 16, 2025, 3:09 PM

#

That’s why their server overloaded

keen beacon Apr 16, 2025, 3:09 PM

#

drifting thorn They were banned from buying Nvidia GPUs

they stockpiled tf out of them before the ban went into force though as the chinese do with most things

drifting thorn Apr 16, 2025, 3:09 PM

#

balmy mist r1 was trained on o1 outputs right? so that means they cant release r2 until o3 ...

Nah it was trained by R1-zero’s output

balmy mist Apr 16, 2025, 3:10 PM

#

drifting thorn Nah it was trained by R1-zero’s output

r1 was trained on R1-zero output or you are saying r2 is?

drifting thorn Apr 16, 2025, 3:10 PM

#

R1-Zero is their internal model before public release of R1

#

As stated in their paper

balmy mist Apr 16, 2025, 3:11 PM

#

ahh okay, then what was all that commotion about openai getting mad at deepseek?

#

was it just because they did it cheaper?

drifting thorn Apr 16, 2025, 3:11 PM

#

I wonder when they can train new V3 based on R1, I think they will train new R1 based on new V3

#

And it will become master of hallucinations

drifting thorn Apr 16, 2025, 3:12 PM

#

balmy mist was it just because they did it cheaper?

They didn’t state how V3 was trained…

#

I mean old V3

balmy mist Apr 16, 2025, 3:13 PM

#

love this channel:
https://www.youtube.com/watch?v=yPxavsb2rgk&ab_channel=AIExplained

YouTube

AI Explained

‘Speaking Dolphin’ to AI Data Dominance, 4.1 + Kling 2.0: 7 Upd...

Giving some context to a hectic week of AI news. This video won’t just be about the release, then, of GPT 4.1, in the last 48 hours, Kling 2.0, a sneak-peak at the next OpenAI model, or even the new Dolphin language tool. It will be about 7 such stories that contextualise where we are in AI and what is happening.

https://www.emergentmind.com/...


keen beacon Apr 16, 2025, 3:14 PM

#

ai explained is pretty good

drifting thorn Apr 16, 2025, 3:15 PM

#

Waiting for any LMMs to call out video generating models to have deeper understanding to the context

#

Thx for recommendation

blazing rune Apr 16, 2025, 3:17 PM

#

GPT-4.1 made this 1st try in Windsurf: https://pong-html-js-responsive.windsurf.build/
Prompt: ```
Create a web-based Pong game using HTML, CSS, and JavaScript with these features:

Player controls using W (up) and S (down) keys
Computer opponent with beatable AI
Score tracking for both players
Game over when player loses by 10 points
Pause functionality with spacebar
Restart option with R key
Clean, responsive design
```

keen beacon Apr 16, 2025, 3:19 PM

#

https://x.com/ananyaku/status/1912523195175039398 from oai researcher - confirmed we're also getting o4 mini today. cc @sonic tendon

Ananya Kumar (@ananyaku) on X

Had a good breakfast today!

patent aspen Apr 16, 2025, 3:20 PM

#

The only problem is that big AI breakthroughs tend to be bottlenecked on top 20-40 researchers at any given org

drifting thorn Apr 16, 2025, 3:29 PM

#

Actually, GLM has a great solution(extremely large amount of thinking tokens with the ability to call tools multiple times in the inference time for 1 response)

sonic tendon Apr 16, 2025, 3:30 PM

#

keen beacon https://x.com/ananyaku/status/1912523195175039398 from oai researcher - confirme...

thx

drifting thorn Apr 16, 2025, 3:30 PM

#

So sad that its base model is 32B dumb

balmy mist Apr 16, 2025, 3:31 PM

#

keen beacon https://x.com/ananyaku/status/1912523195175039398 from oai researcher - confirme...

i love that

#

4 mini strawberries lmaoo

#

but why are the strawberries smaller and smaller

sonic tendon Apr 16, 2025, 3:32 PM

#

probably a coincidence? we shall see

balmy mist Apr 16, 2025, 3:32 PM

#

red you streaming this time?

#

we need you

sonic tendon Apr 16, 2025, 3:32 PM

#

yea, should be

keen beacon Apr 16, 2025, 3:32 PM

#

balmy mist but why are the strawberries smaller and smaller

just the 4o image gen being silly probably

sonic tendon Apr 16, 2025, 3:32 PM

#

ok yeah i am

#

oh, for some reason i thought he'd have used real strawberries

drifting thorn Apr 16, 2025, 3:33 PM

#

No way it’s real strawberries

sonic tendon Apr 16, 2025, 3:33 PM

#

i mean

#

the occasion probably warrants it

north vale Apr 16, 2025, 3:35 PM

#

it would just look like a worse image

keen beacon Apr 16, 2025, 3:39 PM

#

lol i asked o3 for an SNL cold open after trump's tariffs

#

📎 message.txt

#

pretty funny

tawdry meteor Apr 16, 2025, 3:41 PM

#

keen beacon they don't ever test o-series models on the arena

oh so they just come onto the arena after being cleared on safety and fully released?

keen beacon Apr 16, 2025, 3:41 PM

#

yup

tawdry meteor Apr 16, 2025, 3:41 PM

#

makes sense. excited to see the benchmarks

balmy mist Apr 16, 2025, 3:42 PM

#

keen beacon lol i asked o3 for an SNL cold open after trump's tariffs

put into kling and lets start cooking

tawdry meteor Apr 16, 2025, 3:42 PM

#

it's 1pm the livestream right?

balmy mist Apr 16, 2025, 3:42 PM

#

tawdry meteor it's 1pm the livestream right?

est

#

imagine sama not on stream

tawdry meteor Apr 16, 2025, 3:42 PM

#

balmy mist est

the only real timezone /s

keen fulcrum Apr 16, 2025, 3:43 PM

#

https://packaged-media.redd.it/h9c15ev2y4ve1/pb/m2-res_480p.mp4?m=DASHPlaylist.mpd&v=1&e=1744826400&s=7972827f4cd44f02d0a3d8e7cde0aa6cf004b630
This is apparently a helicopter


keen beacon Apr 16, 2025, 3:43 PM

#

balmy mist imagine sama not on stream

agi cancelled

balmy mist Apr 16, 2025, 3:43 PM

#

@keen beacon https://x.com/TheXeophon/status/1912523048411951223

Xeophon (@TheXeophon) on X

🚨 EXCLUSIVE: Evals for the upcoming o3 and o4 leaked, straight from the presentation!!!!

keen beacon Apr 16, 2025, 3:43 PM

#

there is no agi without twinks

balmy mist Apr 16, 2025, 3:44 PM

#

is this true?

keen beacon Apr 16, 2025, 3:44 PM

#

balmy mist @keen beacon https://x.com/TheXeophon/status/1912523048411951223

can confirm

balmy mist Apr 16, 2025, 3:44 PM

#

damn

#

thats wild

#

the shapes are interesting

keen beacon Apr 16, 2025, 3:44 PM

#

they say o5 will be an irregular shape

torn mantle Apr 16, 2025, 3:44 PM

#

balmy mist @keen beacon https://x.com/TheXeophon/status/1912523048411951223

hes trolling

#

xd

keen beacon Apr 16, 2025, 3:44 PM

#

truly groundbreaking

#

yeah ☠️😭

torn mantle Apr 16, 2025, 3:45 PM

#

idk if this o3 model will be good

#

well i wont be able to use it anyways

balmy mist Apr 16, 2025, 3:46 PM

#

keen beacon https://x.com/ananyaku/status/1912523195175039398 from oai researcher - confirme...

maybe this means o4 mini will be faster than o3 mini? since it has a smaller strawberry?

balmy mist Apr 16, 2025, 3:46 PM

#

torn mantle idk if this o3 model will be good

bro o3 will change our lives

tawdry meteor Apr 16, 2025, 3:46 PM

#

btw pretty sure R2 won't release until May

#

I'll link the source I was reading about it

balmy mist Apr 16, 2025, 3:46 PM

#

tawdry meteor btw pretty sure R2 won't release until May

that was the original plan but ppl said they updated that

tawdry meteor Apr 16, 2025, 3:47 PM

#

🤔 when did they update that? I guess what I was reading was two weeks old

keen beacon Apr 16, 2025, 3:47 PM

#

balmy mist maybe this means o4 mini will be faster than o3 mini? since it has a smaller st...

it's satire i wouldn't read into it too much

tawdry meteor Apr 16, 2025, 3:47 PM

#

they said they wouldn't release this month

balmy mist Apr 16, 2025, 3:47 PM

#

keen beacon it's satire i wouldn't read into it too much

or sama's plan?

keen beacon Apr 16, 2025, 3:47 PM

#

we're all going schizo

balmy mist Apr 16, 2025, 3:47 PM

#

lmaooo

keen beacon Apr 16, 2025, 3:47 PM

#

tawdry meteor they said they wouldn't release this month

iirc i thought the expectation was R2 by end of April?

keen fulcrum Apr 16, 2025, 3:48 PM

#

keen beacon Apr 16, 2025, 3:48 PM

#

you're a little late

balmy mist Apr 16, 2025, 3:49 PM

#

imagine the speaker in the livestream is o3 or o4 mini

#

they still didnt give voice to o1 right?

#

guess thats hard with reasoning

keen beacon Apr 16, 2025, 3:49 PM

#

it's not multimodal with output like 4o is

#

id expect to see those capabilities built into gpt-5

#

or some of them

balmy mist Apr 16, 2025, 3:49 PM

#

i really wonder how the inference will be for gpt5

tawdry meteor Apr 16, 2025, 3:50 PM

#

keen beacon iirc i thought the expectation was R2 by end of April?

South China Morning Post article from April 6 referenced a message from a business manager at Deepseek to clients that R2 would not be coming out March/April

torn mantle Apr 16, 2025, 3:51 PM

#

balmy mist bro o3 will change our lives

it did? xd

tawdry meteor Apr 16, 2025, 3:51 PM

#

But maybe end of April/beginning May doesn't count in that

#

The meaning was a bit difficult to parse

keen beacon Apr 16, 2025, 3:51 PM

#

tawdry meteor South China Morning Post article from April 6 referenced a message from a busine...

i kinda feel like their deadlines are fluid asf

#

they have a lot of pressure being put on them now

tawdry meteor Apr 16, 2025, 3:51 PM

#

Yeah fair

keen beacon Apr 16, 2025, 3:51 PM

#

they abandoned the date for R2 previously in favour of just "ASAP" when o3 mini dropped

sonic tendon Apr 16, 2025, 3:52 PM

#

yeah

#

DS before may would be interesting - not sure about the odds there

#

*may

keen beacon Apr 16, 2025, 3:53 PM

#

before may

#

yeah

#

i mean if it isn't april it will be the first half of may max

sonic tendon Apr 16, 2025, 3:53 PM

#

keen beacon i mean if it isn't april it will be the first half of may max

whar?

keen beacon Apr 16, 2025, 3:53 PM

#

like if they don't launch in april

sonic tendon Apr 16, 2025, 3:53 PM

#

ohhh

keen beacon Apr 16, 2025, 3:54 PM

#

i highly doubt they will leave it any more than another 2 weeks

sonic tendon Apr 16, 2025, 3:54 PM

#

asdfhjlkasdfhjkl i thought you were just poking fun

balmy mist Apr 16, 2025, 3:54 PM

#

wait so o3 is a researcher now?

keen beacon Apr 16, 2025, 3:55 PM

#

it will be more agentic if you want to put it that way

#

rumour has it there will be updates to deep research this week too

balmy mist Apr 16, 2025, 3:55 PM

#

@keen beacon im starting to see that 20k number really show up more

#

are they really doing that?

keen beacon Apr 16, 2025, 3:55 PM

#

it's in the roadmap but not concrete

balmy mist Apr 16, 2025, 3:55 PM

#

okay time to take out a loan

keen beacon Apr 16, 2025, 3:55 PM

#

aimed at enterprise of course

#

if they tried to aim that at consumers i think sama would have to make sure there are no luigis around

balmy mist Apr 16, 2025, 3:56 PM

#

im starting to see where openai is going, they really are pushing this product stuff

sonic tendon Apr 16, 2025, 3:57 PM

#

VC money running dry

keen beacon Apr 16, 2025, 3:57 PM

#

quick a16z and sequoia

#

bankroll sama

balmy mist Apr 16, 2025, 3:57 PM

#

which makes sense, there is no moat in models anymore

keen fulcrum Apr 16, 2025, 3:57 PM

#

https://www.warp.dev/pricing
this is honestly the best AI sub you can get

Warp

Pricing and plans for Warp | Warp

Explore Warp’s pricing plans. Get an AI-powered terminal with built-in team knowledge to help you build software faster and more efficiently.

balmy mist Apr 16, 2025, 3:57 PM

#

with a good product built around your model you do a lot

keen fulcrum Apr 16, 2025, 3:57 PM

#

You can use that even for coding

keen beacon Apr 16, 2025, 3:57 PM

#

balmy mist Apr 16, 2025, 3:58 PM

#

keen beacon

yoo they better not blue ball me again i swear

keen beacon Apr 16, 2025, 3:58 PM

#

https://x.com/i/lists/1676646159539130369 just take a look at this feed of oai employee twitter accounts

@altryne/OpenAI folks / X

keen beacon Apr 16, 2025, 3:58 PM

#

balmy mist which makes sense, there is no moat in models anymore

nah the models are everything

#

what a troll

sonic tendon Apr 16, 2025, 3:59 PM

#

keen beacon what a troll

what does that even mean

keen beacon Apr 16, 2025, 3:59 PM

#

patience chair its a meme

keen beacon Apr 16, 2025, 3:59 PM

#

sonic tendon what does that even mean

the patience chair silly

balmy mist Apr 16, 2025, 3:59 PM

#

keen beacon nah the models are everything

but the models can be copied so it's a given that everyone will have a good model

#

especially when the percent changes are so small

#

normies dont even notice the changes now

keen beacon Apr 16, 2025, 4:00 PM

#

#

need me some of these chairs

#

cute car too

balmy mist Apr 16, 2025, 4:00 PM

#

most of my friends cant tell the difference between gpt4 and gemini2.5

keen beacon Apr 16, 2025, 4:00 PM

#

balmy mist but the models can be copied so it's a given that everyone will have a good model

other frontier labs can copy each other, but if you arent a frontier lab and youre building products off of them i think ur gonna fail hard eventually

#

this is interesting

#

claude ever the yapper

fleet lintel Apr 16, 2025, 4:03 PM

#

keen fulcrum https://www.warp.dev/pricing this is honestly the best AI sub you can get

Best quality is almost infinite Gemini 2.5 Pro usage for Free.

People dont know but Google launched Gemini code assist and giving 2.5 pro for coding for free . I dont know why Google is doing it ?

balmy mist Apr 16, 2025, 4:03 PM

#

keen beacon other frontier labs can copy each other, but if you arent a frontier lab and you...

you have a point bc before sama said to build with the fact that models will get better over time, but now openai is a product company and so are other frontier labs, where they are building tools around their own models and integrating it with stuff, so im not sure anymore

balmy mist Apr 16, 2025, 4:03 PM

#

keen beacon this is interesting

hmm explain this to me please, im kinda slow

keen beacon Apr 16, 2025, 4:04 PM

#

balmy mist you have a point bc before sama said to build with the fact that models will...

they have to keep themselves afloat especially right now with how competitive things are

balmy mist Apr 16, 2025, 4:04 PM

#

so its reasoning more for each output?

#

which is bad right?

keen beacon Apr 16, 2025, 4:04 PM

#

it depends

balmy mist Apr 16, 2025, 4:04 PM

#

if another model can get the same answer with less thinking tokens?

keen beacon Apr 16, 2025, 4:04 PM

#

yeah

#

the less tokens it can spend reasoning to get the right answer the better

#

the best models will be able to most intelligently decide how much to reason
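
A minimal sketch of the "fewest reasoning tokens for a right answer" comparison described above. The model names and token counts are invented for illustration and do not come from the chart under discussion; `most_efficient` is a hypothetical helper.

```python
def most_efficient(runs):
    """Return the correct run that spent the fewest reasoning tokens.

    Each run is a dict with hypothetical keys: "model", "correct",
    and "reasoning_tokens". Returns None if no run was correct.
    """
    correct_runs = [r for r in runs if r["correct"]]
    return min(correct_runs, key=lambda r: r["reasoning_tokens"], default=None)


# Made-up example data: two models get the answer right, one does not.
example_runs = [
    {"model": "model-a", "correct": True, "reasoning_tokens": 1200},
    {"model": "model-b", "correct": True, "reasoning_tokens": 400},
    {"model": "model-c", "correct": False, "reasoning_tokens": 50},
]
best = most_efficient(example_runs)
```

Note that a wrong answer reached cheaply does not count: correctness is filtered first, and only then is token spend compared.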

balmy mist Apr 16, 2025, 4:05 PM

#

keen beacon this is interesting

why did they color code it like that its so hard to see, but the thinking tokens are on the bottom right?

keen beacon Apr 16, 2025, 4:05 PM

#

balmy mist why did they color code it like that its so hard to see, but the thinking tokens...

hm?

balmy mist Apr 16, 2025, 4:05 PM

#

keen beacon the best models will be able to most intelligently decide how much to reason

pretty much what gpt5 should be able to do flawlessly?

keen beacon Apr 16, 2025, 4:05 PM

#

well it should get relatively close

#

needs to get to a point where you don't need any input minus your prompt to get the best answer

balmy mist Apr 16, 2025, 4:06 PM

#

keen beacon this is interesting

for this chart who scores the best?

keen beacon Apr 16, 2025, 4:06 PM

#

2.5 pro iirc

#

obviously its sota rn

keen beacon Apr 16, 2025, 4:06 PM

#

balmy mist for this chart who scores the best?

here's the reference

alpine coral Apr 16, 2025, 4:06 PM

#

keen beacon the best models will be able to most intelligently decide how much to reason

most of those tokens on the 3.7-thinking are often wasted.. its 'reasoning' yields the same as, and sometimes inferior output to, what the 3.7 vanilla model produces

keen beacon Apr 16, 2025, 4:07 PM

#

i think it's o3 mini efficiency wise

balmy mist Apr 16, 2025, 4:07 PM

#

wow

#

2.5 is really good

#

oh you are right

#

its second

#

and less thinking tokens

#

wait no

#

less thinking tokens

#

but more output tokens

keen beacon Apr 16, 2025, 4:07 PM

#

but o3 mini high is also based on 4o mini 🤣

#

2.5 pro is the most yappy when it comes to output tokens

alpine coral Apr 16, 2025, 4:07 PM

#

balmy mist but more output tokens

they're the same functionally

#

output/thinking tokens

#

whether thinking is abstracted away (or hidden entirely)

#

it's still completion being
<reasoning>
<answer>

rather than just
answer
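
A rough sketch of the point above, assuming the completion wraps its reasoning in `<reasoning>` tags as in the message. The whitespace "tokenizer" and the helper names are hypothetical stand-ins, not any provider's API. Hidden or not, both parts are generated tokens.

```python
import re


def split_completion(completion: str) -> tuple[str, str]:
    """Split a completion into (reasoning, answer).

    Assumes reasoning, if present, sits inside <reasoning>...</reasoning>;
    everything outside the tags is treated as the answer.
    """
    match = re.search(r"<reasoning>(.*?)</reasoning>", completion, flags=re.DOTALL)
    if match is None:
        return "", completion.strip()
    reasoning = match.group(1).strip()
    answer = (completion[:match.start()] + completion[match.end():]).strip()
    return reasoning, answer


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: whitespace-separated words.
    return len(text.split())


def generated_tokens(completion: str) -> dict[str, int]:
    """Count reasoning and answer tokens; both are part of the completion."""
    reasoning, answer = split_completion(completion)
    r, a = count_tokens(reasoning), count_tokens(answer)
    return {"reasoning": r, "answer": a, "total": r + a}


usage = generated_tokens("<reasoning>two plus two is four</reasoning>The answer is 4.")
```

Whether the interface shows the reasoning or hides it, the total generated is the sum of the two parts.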

balmy mist Apr 16, 2025, 4:08 PM

#

https://www.youtube.com/watch?v=sq8GBPUb3rk

YouTube

OpenAI

Introduction to new o-series models

Join Greg Brockman, Mark Chen, Eric Mitchell, Brandon McKinzie, Wenda Zhou, Fouad Matin, Michael Bolin and Ananya Kumar as they introduce and demo the new o-series models.


#

sama not there wtf

keen beacon Apr 16, 2025, 4:08 PM

#

there it is

balmy mist Apr 16, 2025, 4:08 PM

#

Join Greg Brockman, Mark Chen, Eric Mitchell, Brandon McKinzie, Wenda Zhou, Fouad Matin, Michael Bolin and Ananya Kumar as they introduce and demo the new o-series models.

keen beacon Apr 16, 2025, 4:08 PM

#

AGI CANCELLED

drifting thorn Apr 16, 2025, 4:08 PM

#

keen beacon 2.5 pro is the most yappy when it comes to output tokens

It’s the only model that will give me 10000 words sequel from my unfinished novel

balmy mist Apr 16, 2025, 4:09 PM

#

but greg is there

drifting thorn Apr 16, 2025, 4:09 PM

#

Which, it’s good to be yappy, but the bad thing is that it will often hallucinate when it has a long output

balmy mist Apr 16, 2025, 4:09 PM

#

why sama

#

why

keen beacon Apr 16, 2025, 4:09 PM

#

jokes aside

balmy mist Apr 16, 2025, 4:09 PM

#

why does this not keep you up at night!

keen beacon Apr 16, 2025, 4:09 PM

#

there are a lot of people there

drifting thorn Apr 16, 2025, 4:09 PM

#

And forgets all the details I said before

calm sequoia Apr 16, 2025, 4:09 PM

#

keen beacon this is interesting

Smells like brute force search

keen beacon Apr 16, 2025, 4:09 PM

#

this is the most people i've seen attending a stream since the 4o launch

balmy mist Apr 16, 2025, 4:10 PM

#

sama only cares about memory

#

yeah wild

keen beacon Apr 16, 2025, 4:10 PM

#

for o1 and o3 mini they only had like

#

3 people

balmy mist Apr 16, 2025, 4:10 PM

#

what if sama is replaced by o4

#

and they are announcing that today

keen beacon Apr 16, 2025, 4:10 PM

#

this time they have almost 3 times that

#

how are they gonna be sitting

#

is it gonna be like

#

a long table with greg as king

#

or what

balmy mist Apr 16, 2025, 4:10 PM

#

what if a new room?

#

something is fishy

#

there is a disturbance in the force

#

both

keen beacon Apr 16, 2025, 4:11 PM

#

why are reddit mods such damn haters

#

no comment saying why, no message

#

not a dupe

#

god

balmy mist Apr 16, 2025, 4:12 PM

#

wait what does this mean?

drifting thorn Apr 16, 2025, 4:12 PM

#

balmy mist Apr 16, 2025, 4:12 PM

#

who is shrek and donkey

keen beacon Apr 16, 2025, 4:12 PM

#

he posted a reddit post and it got removed

#

^

#

yes

#

check the live stream 🤣

drifting thorn Apr 16, 2025, 4:12 PM

#

balmy mist wait what does this mean?

The moderators removed his post

keen beacon Apr 16, 2025, 4:12 PM

#

#general message

drifting thorn Apr 16, 2025, 4:12 PM

#

drifting thorn

Gotta sleep rn

balmy mist Apr 16, 2025, 4:12 PM

#

lmaoo

keen beacon Apr 16, 2025, 4:13 PM

#

and the livestream yeah

#

ikr

#

what's he up to

balmy mist Apr 16, 2025, 4:13 PM

#

he is a very busy man

keen beacon Apr 16, 2025, 4:13 PM

#

he hasn't been at a launch stream for like

#

3 successive launches now

#

😔

balmy mist Apr 16, 2025, 4:13 PM

#

he might pop in at the end

keen beacon Apr 16, 2025, 4:13 PM

#

doesnt he have a baby tho i guess his schedule is wonky

keen beacon Apr 16, 2025, 4:13 PM

#

balmy mist he is a very busy man

busy enough to not attend a major launch but free enough to tweet all the time

#

smh

keen beacon Apr 16, 2025, 4:13 PM

#

keen beacon doesnt he have a baby tho i guess his schedule is wonky

oh yeah good point

balmy mist Apr 16, 2025, 4:13 PM

#

sama's plan

#

he cooking something

keen beacon Apr 16, 2025, 4:14 PM

#

idk what to expect with o4 mini tbh

tawdry meteor Apr 16, 2025, 4:14 PM

#

well I'm curious to see if o3 is actually better than g2.5pro, it's just so far ahead of everything else still

keen beacon Apr 16, 2025, 4:14 PM

#

this guy is part of the agents research team @ openai

#

so i think we're getting agent updates or a new agent related feature

#

(alongside the models)

drifting thorn Apr 16, 2025, 4:15 PM

#

And Sam Altman will be appearing on the livestream of new agent related feature?

keen beacon Apr 16, 2025, 4:16 PM

#

drifting thorn And Sam Altman will be appearing on the livestream of new agent related feature?

it's part of the same stream as the one for o3 and o4 mini

#

so no

#

my bet is just on deep research upgrades

#

since gemini deep research made them look bad

#

gotta reclaim the lead

alpine coral Apr 16, 2025, 4:16 PM

#

i actually think their existing one is superior to gemini's even with 2.5

#

problem is you get 10/month

keen beacon Apr 16, 2025, 4:17 PM

#

what the 🤣

tawdry meteor Apr 16, 2025, 4:17 PM

#

gemini is defo worse at news research imo

tall summit Apr 16, 2025, 4:17 PM

#

keen beacon what the 🤣

oai researcher fanfic

tawdry meteor Apr 16, 2025, 4:17 PM

#

but I haven't tested much on deep research for studies

balmy mist Apr 16, 2025, 4:18 PM

#

its crazy this all started because sama and elon wanted to stop google from getting agi lol

tawdry meteor Apr 16, 2025, 4:19 PM

#

and now they're suing each other and are gonna let google get agi lmao

balmy mist Apr 16, 2025, 4:19 PM

#

i wonder what ilya is up to with SSI

novel flame Apr 16, 2025, 4:20 PM

#

balmy mist its crazy this all started because sama and elon wanted to stop google from gett...

Crazier that things went sour because Sam wanted profits and Elon wanted fascism.

balmy mist Apr 16, 2025, 4:20 PM

#

novel flame Crazier that things went sour because Sam wanted profits and Elon wanted fascism...

wild bro, kinda sad tbh

#

i wonder if they could team back up in the future

#

grok and gpt

#

vs gemini

#

ai wars

#

what i want is an arena battlefield for these models using agents powered by their models in a fight

narrow elbow Apr 16, 2025, 4:21 PM

#

where is microsoft?

balmy mist Apr 16, 2025, 4:21 PM

#

narrow elbow where is microsoft?

who?

#

jk lmaoo

#

they mia bro, they updated copilot

narrow elbow Apr 16, 2025, 4:22 PM

#

hahha

balmy mist Apr 16, 2025, 4:22 PM

#

thats about it

keen beacon Apr 16, 2025, 4:22 PM

#

i dont think ms is trying to compete at all in the frontier space as themselves

balmy mist Apr 16, 2025, 4:22 PM

#

who want to make this arena battle field with me?

novel flame Apr 16, 2025, 4:23 PM

#

balmy mist i wonder what ilya is up to with SSI

Either going down novel but ultimately wrong paths to AGI, or more of the same (autoregressive Transformer LLMs with more compute). I would love it if Ilya actually created AGI but so far all we know for sure is he’s really good at building yappy chatbots.

balmy mist Apr 16, 2025, 4:23 PM

#

it can be a minecraft thing where each model gotta build their own castle and base and then attack the other ones with strategies etc...

keen beacon Apr 16, 2025, 4:23 PM

#

that would be cool ngl

balmy mist Apr 16, 2025, 4:23 PM

#

novel flame Either going down novel but ultimately wrong paths to AGI, or more of the same (...

you think ilya is on the wrong path?

keen beacon Apr 16, 2025, 4:23 PM

#

the run-down:

Greg Brockman - President, Co-founder
Mark Chen - Chief Research Officer
Eric Mitchell - O-series Research, Deep Research Core Contributor
Brandon McKinzie - Research/Member of Technical Staff
Wenda Zhou - Research/Member of Technical Staff, o1 Contributor
Fouad Matin - Agent & Systems Research
Michael Bolin - Research/Member of Technical Staff
Ananya Kumar - Research Lead, Core Contributor for o1 and GPT-4.5

balmy mist Apr 16, 2025, 4:23 PM

#

there is a rumor that he saw something

novel flame Apr 16, 2025, 4:23 PM

#

balmy mist what i want is an arena battlefield for these models using agents powered by the...

What you’re describing is WW3

balmy mist Apr 16, 2025, 4:24 PM

#

novel flame What you’re describing is WW3

i would pay for that tbh

#

like that could be the new benchmarks

alpine coral Apr 16, 2025, 4:24 PM

#

keen beacon i dont think ms is trying to compete at all in the frontier space as themselves

agreed (they've chosen instead to invest $13bn into openai for that)

ocean plume Apr 16, 2025, 4:25 PM

#

anyone test code claude 3.7 thinking vs 2.5 gemini pro and o3 high ?

balmy mist Apr 16, 2025, 4:25 PM

#

ocean plume anyone test code claude 3.7 thinking vs 2.5 gemini pro and o3 high ?

yeah 2.5 pro is daddy

ocean plume Apr 16, 2025, 4:25 PM

#

which gives better code and fewer bugs?

sonic tendon Apr 16, 2025, 4:25 PM

#

who's ready for DeepSeek to overtake them in 12 days

keen beacon Apr 16, 2025, 4:25 PM

#

they shouldnt have released 4.5 tbh

ocean plume Apr 16, 2025, 4:26 PM

#

balmy mist yeah 2.5 pro is daddy

so who is grandpa

sonic tendon Apr 16, 2025, 4:26 PM

#

at least in lmarena, they did last time lol

torn mantle Apr 16, 2025, 4:26 PM

#

sonic tendon who's ready for DeepSeek to overtake them in 12 days

me

#

memememe

sonic tendon Apr 16, 2025, 4:26 PM

#

for o1, i believe it did

torn mantle Apr 16, 2025, 4:26 PM

#

they did on o1

keen beacon Apr 16, 2025, 4:26 PM

#

i mean openai had o3 already at the time

#

if openai didn't stall every release they'd be ahead of other labs most of the time

#

but they sit on their best stuff a lot

balmy mist Apr 16, 2025, 4:27 PM

#

keen beacon but they sit on their best stuff a lot

is o5 being trained? or is focus on gpt5?

keen beacon Apr 16, 2025, 4:27 PM

#

parallel teams

sonic tendon Apr 16, 2025, 4:28 PM

#

ocean plume so who is grandpa

the clone of terence tao i have locked up in my basement

keen beacon Apr 16, 2025, 4:28 PM

#

gpt-5 already firmly underway, o5 in early stages

balmy mist Apr 16, 2025, 4:28 PM

#

wow

keen beacon Apr 16, 2025, 4:28 PM

#

now that the o3 and o4 mini teams have wrapped up they're being moved to mostly o4 and o5

balmy mist Apr 16, 2025, 4:28 PM

#

why dont you go work for them?

sonic tendon Apr 16, 2025, 4:28 PM

#

isn't o4-mini just a direct distill of o4?

keen beacon Apr 16, 2025, 4:28 PM

#

balmy mist why dont you go work for them?

i'm happy with what i'm doing now

#

o4 is not ready enough to do that i think

keen beacon Apr 16, 2025, 4:29 PM

#

sonic tendon isn't o4-mini just a direct distill of o4?

i don't think so

#

the o3 we're getting today is basically a whole different model to the one they announced

balmy mist Apr 16, 2025, 4:29 PM

#

keen beacon i don't think so

really?

#

what is it then?

alpine coral Apr 16, 2025, 4:29 PM

#

keen beacon gpt-5 already firmly underway, o5 in early stages

yeah they said in that interview that they started training gpt-4.5 two years ago (and planning for it a year before that)

novel flame Apr 16, 2025, 4:30 PM

#

balmy mist you think ilya is on the wrong path?

I think Ilya spent a lot of time at OpenAI chasing bigger Transformers convinced that the bitter lesson / scaling laws would mean that GPT-5 would become AGI by virtue of its size alone. And I think he missed a lot of opportunities (R1, test time compute) that other labs spotted, and that a lot of the big ideas he used came from DeepMind papers. I think Ilya is brilliant, but he has chased Transformers for so long I’m not convinced he can let them go.

keen beacon Apr 16, 2025, 4:30 PM

#

that is what i have heard from inside

#

they kept delaying it, then the name was changed to 4.5 and they decided to just "bite the bullet" because they didn't want to waste all the resources they had put into it and not even let it see the light of day

#

ilya leaving made things quite a bit worse

#

damaged morale as well

keen beacon Apr 16, 2025, 4:30 PM

#

novel flame I think Ilya spent a lot of time at OpenAI chasing bigger Transformers convinced...

i dont think so lol. i think he saw how test time compute would pan out

balmy mist Apr 16, 2025, 4:31 PM

#

novel flame I think Ilya spent a lot of time at OpenAI chasing bigger Transformers convinced...

hmm, we will have to see, i still think he has found a trick that most people are missing

keen beacon Apr 16, 2025, 4:31 PM

#

yea i think

balmy mist Apr 16, 2025, 4:31 PM

#

yeah i thought so

keen beacon Apr 16, 2025, 4:31 PM

#

yup

sonic tendon Apr 16, 2025, 4:31 PM

#

keen beacon they kept delaying it, then the name was changed to 4.5 and they decided to just...

honestly, maybe they should've just scrapped it lol

sonic tendon Apr 16, 2025, 4:31 PM

#

keen beacon damaged morale as well

hmm

balmy mist Apr 16, 2025, 4:31 PM

#

i think we can give him grace to cook, there is no way he not gonna produce something good

keen beacon Apr 16, 2025, 4:32 PM

#

sonic tendon honestly, maybe they should've just scrapped it lol

to be fair if i had put upwards of $150M into one model i'd literally just drop it out of principle 🙏 😭

balmy mist Apr 16, 2025, 4:32 PM

#

if deepseek can do what they did, i know ilya gonna come with heat

sonic tendon Apr 16, 2025, 4:32 PM

#

gpt4.5 sort of makes me doubt that scaling laws will actually hold up, no?

sonic tendon Apr 16, 2025, 4:32 PM

#

keen beacon to be fair if i had put upwards of $150M into one model i'd literally just drop ...

rip

keen beacon Apr 16, 2025, 4:32 PM

#

well of course lmao

#

oh wait

#

no

#

misread

#

it did not cost close to a billion no

#

it was still one of the most expensive training runs ever though, perhaps the most

balmy mist Apr 16, 2025, 4:32 PM

#

you was right:
https://x.com/btibor91/status/1912544591821132135

Tibor Blaho (@btibor91) on X

Fouad Matin - Agent & Systems Research

thorny drum Apr 16, 2025, 4:33 PM

#

more expensive than that grok one?

calm sequoia Apr 16, 2025, 4:33 PM

#

Leo, why do you have so much info? This can't be open source material?

keen beacon Apr 16, 2025, 4:33 PM

#

thorny drum more expensive than that grok one?

yes

#

although it's close iirc

#

grok 4 will probably beat it

keen beacon Apr 16, 2025, 4:34 PM

#

calm sequoia Leo, why do you have so much info? This can't be open source material?

i suppose you could say i am relatively well connected

#

~900B

#

3T+

#

lmao

torn mantle Apr 16, 2025, 4:34 PM

#

calm sequoia Leo, why do you have so much info? This can't be open source material?

he has intels

#

he know some openai staff

balmy mist Apr 16, 2025, 4:34 PM

#

how many parameters is o1? and o3?

torn mantle Apr 16, 2025, 4:34 PM

#

some of his friends

keen beacon Apr 16, 2025, 4:34 PM

#

same as 4o lol

torn mantle Apr 16, 2025, 4:34 PM

#

too

keen beacon Apr 16, 2025, 4:34 PM

#

o1 is quite small

#

200b

#

yeah

#

200b

balmy mist Apr 16, 2025, 4:34 PM

#

wow

keen beacon Apr 16, 2025, 4:34 PM

#

beat me to it

balmy mist Apr 16, 2025, 4:34 PM

#

see

#

thats why the strawberries are smaller

#

sama's plan

oblique flint Apr 16, 2025, 4:35 PM

#

if o1 is so small, why is it so hella expensive?

keen beacon Apr 16, 2025, 4:35 PM

#

margins

torn mantle Apr 16, 2025, 4:35 PM

#

profit margin

#

they want to earn a lot

keen beacon Apr 16, 2025, 4:35 PM

#

oblique flint if o1 is so small, why is it so hella expensive?

semianalysis had an article on this, its the kv seq length

#

but yes they also have margins
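
A back-of-the-envelope sketch of why sequence length can dominate serving cost regardless of parameter count: the standard per-sequence KV-cache size for a Transformer grows linearly with the number of tokens held in context. The shape numbers below are made up for illustration and are not the configuration of o1 or any real model.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Memory for one sequence's KV cache.

    The leading factor of 2 accounts for storing both keys and values
    at every layer; bytes_per_value=2 assumes 16-bit values.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical model shape: 80 layers, 8 KV heads, head dimension 128.
per_token = kv_cache_bytes(80, 8, 128, seq_len=1)
short_chat = kv_cache_bytes(80, 8, 128, seq_len=1_000)
long_reasoning = kv_cache_bytes(80, 8, 128, seq_len=32_000)
```

Under these assumed shapes, a 32,000-token reasoning trace holds 32 times the cache of a 1,000-token chat, which limits how many requests fit on a GPU at once.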

torn mantle Apr 16, 2025, 4:35 PM

#

but then

#

r1

oblique flint Apr 16, 2025, 4:35 PM

#

keen beacon semianalysis had an article on this, its the kv seq length

do you have the link by any chance? Would be interested to read it

torn mantle Apr 16, 2025, 4:35 PM

#

messed things up for them

calm sequoia Apr 16, 2025, 4:35 PM

#

keen beacon i suppose you could say i am relatively well connected

Only oAI lab?

balmy mist Apr 16, 2025, 4:35 PM

#

i predict o4 will be 50 parameters

torn mantle Apr 16, 2025, 4:36 PM

#

50 what

balmy mist Apr 16, 2025, 4:36 PM

#

sama told me in a dream

#

50 b

sonic tendon Apr 16, 2025, 4:36 PM

#

50 bdozen

alpine coral Apr 16, 2025, 4:36 PM

#

lmao

balmy mist Apr 16, 2025, 4:36 PM

#

sama told me to trust the process

keen beacon Apr 16, 2025, 4:36 PM

#

balmy mist i predict o4 will be 50 parameters

can confirm this is NOT the case gang 😭

keen beacon Apr 16, 2025, 4:37 PM

#

calm sequoia Only oAI lab?

the majority is from openai but there are some scattered among the other ones

keen beacon Apr 16, 2025, 4:37 PM

#

oblique flint do you have the link by any chance? Would be interested to read it

its been a while might be this one: https://semianalysis.com/2024/12/11/scaling-laws-o1-pro-architecture-reasoning-training-infrastructure-orion-and-claude-3-5-opus-failures/

be wary that some of the article isnt true

#

claude 3.5 opus/bigger model wasnt used in the dev of sonnet 3.5 according to dario

#

sonnet 3.5 was truly a generational training run

#

i wonder what anthropic researchers were thinking when they properly put it through its paces and it was cooking that hard

#

i dont think they expected it to be that good

#

it just lined up

balmy mist Apr 16, 2025, 4:39 PM

#

their ceo used to work at google right?

keen beacon Apr 16, 2025, 4:40 PM

#

all of anthropic's founding team were ex-deepmind iirc

balmy mist Apr 16, 2025, 4:40 PM

#

lol

keen beacon Apr 16, 2025, 4:40 PM

#

sorry no

#

dario was ex oai

balmy mist Apr 16, 2025, 4:40 PM

#

its like everyone went against google, then went against oai

keen beacon Apr 16, 2025, 4:40 PM

#

it was a mish mash of ex deepmind and ex oai

balmy mist Apr 16, 2025, 4:40 PM

#

ilya was google at first right?

keen beacon Apr 16, 2025, 4:41 PM

#

google brain yeah

alpine coral Apr 16, 2025, 4:41 PM

#

keen beacon it was a mish mash of ex deepmind and ex oai

yeah - mostly oai though i thought

balmy mist Apr 16, 2025, 4:41 PM

#

omg 19 mins

alpine coral Apr 16, 2025, 4:42 PM

#

part of deepmind's genius / plan was to set up in London so as to not get all their talent lured away by silicon valley prospects

balmy mist Apr 16, 2025, 4:44 PM

#

i wanted to work at deepmind so bad when I was in college 😦

fleet lintel Apr 16, 2025, 4:44 PM

#

alpine coral part of deepmind's genius / plan was setup in London so as to not get _all_ thei...

this is also sad. So much talent in EU and eventually everything is handed over to US

balmy mist Apr 16, 2025, 4:45 PM

#

fleet lintel this is also sad. So much talent in EU and eventually everything is handed over ...

it be like that bro

alpine coral Apr 16, 2025, 4:45 PM

#

i really like Demis Hassabis.. was listening to this the other day driving.. worth a listen https://podcasts.apple.com/au/podcast/demis-hassabis-on-ai-game-theory-multimodality-and/id1073226719?i=1000703257125

Apple Podcasts

Demis Hassabis on AI, game theory, multimodality, and the nature of...

Podcast Episode · Pivot · 12/04/2025 · 1h 1m

ember rapids Apr 16, 2025, 4:45 PM

#

excited for o4 mini high

alpine coral Apr 16, 2025, 4:45 PM

#

he even mentions 'ultra' in passing ha

balmy mist Apr 16, 2025, 4:45 PM

#

ember rapids excited for o4 mini high

get ready bro

#

@keen beacon you ready?

keen beacon Apr 16, 2025, 4:46 PM

#

yup 😄

balmy mist Apr 16, 2025, 4:46 PM

#

im not

#

lol jk

keen beacon Apr 16, 2025, 4:46 PM

#

smh i have to go

#

this always happens around oai launches

balmy mist Apr 16, 2025, 4:46 PM

#

noo

ember rapids Apr 16, 2025, 4:46 PM

#

greg's presenting so u know its gonna be special

keen beacon Apr 16, 2025, 4:46 PM

#

hopefully will be back in time

#

cya

balmy mist Apr 16, 2025, 4:47 PM

#

keen beacon hopefully will be back in time

what if he is a presenter?

#

leo's plan

keen beacon Apr 16, 2025, 4:47 PM

#

my power went out during o1 preview's launch and o3 mini i think lol hopefully the trend doesnt continue

balmy mist Apr 16, 2025, 4:47 PM

#

lmaoo

#

my youtube is not working on chrome so i gotta use brave

#

google dont want me to see it

sonic tendon Apr 16, 2025, 4:49 PM

#

keen beacon cya

aw. cya!

#

i'll get on vc in 5

plain zinc Apr 16, 2025, 4:52 PM

#

Feel the AGI

#

Guys

#

MMM

balmy mist Apr 16, 2025, 4:52 PM

#

feelsssss

#

ahhhhhh

plain zinc Apr 16, 2025, 4:52 PM

#

It's perfect

balmy mist Apr 16, 2025, 4:52 PM

#

lmaoooo

fleet lintel Apr 16, 2025, 4:52 PM

#

plain zinc Feel the AGI

I feel super expensive agi

balmy mist Apr 16, 2025, 4:52 PM

#

yuppp

plain zinc Apr 16, 2025, 4:53 PM

#

fleet lintel I feel super expensive agi

Try FREE AGI moment

#

FEEL

ember rapids Apr 16, 2025, 4:53 PM

#

feeling the agi

plain zinc Apr 16, 2025, 4:53 PM

#

Feel his scent

balmy mist Apr 16, 2025, 4:53 PM

#

remember sama said he felt the agi with 4.5

ember rapids Apr 16, 2025, 4:53 PM

#

Hearing plus users get 1 o3 request per week

plain zinc Apr 16, 2025, 4:54 PM

#

Lol (that's how I imagine Sam Altman)

#

AGI

#

AGI is here!

balmy mist Apr 16, 2025, 4:54 PM

#

who paying $200?

fleet lintel Apr 16, 2025, 4:54 PM

#

we should start protest for "free the AGI" like free the nipple movement

balmy mist Apr 16, 2025, 4:54 PM

#

i am

#

i have to

#

its like concert tickets now

#

damn

#

i stopped paying after 2 months

plain zinc Apr 16, 2025, 4:54 PM

#

6 minutes

balmy mist Apr 16, 2025, 4:54 PM

#

but ill do it now

fleet lintel Apr 16, 2025, 4:54 PM

#

lol

balmy mist Apr 16, 2025, 4:54 PM

#

feel it!!

tawdry meteor Apr 16, 2025, 4:54 PM

#

how long after the announcement do we think they'll release o3 on arena so we can start ranking it against g2.5p? I want to run coding tasks across 2.5 and o3/o4mini and have them fix each other like with sonnet

balmy mist Apr 16, 2025, 4:55 PM

#

yo im floating

#

omgg

plain zinc Apr 16, 2025, 4:55 PM

#

plain zinc 6 minutes

PREPARE your prompts

keen beacon Apr 16, 2025, 4:55 PM

#

aand i'm back :3

balmy mist Apr 16, 2025, 4:55 PM

#

keen beacon aand i'm back :3

i thought you were gonna be in the livestream

fleet lintel Apr 16, 2025, 4:55 PM

#

how many millions rich?

keen beacon Apr 16, 2025, 4:55 PM

#

balmy mist what if he is a presenter?

you got me..

#

i'm actually just greg's alter ego

plain zinc Apr 16, 2025, 4:55 PM

#

Okay, guys. Personally, I'll check on the Google models in LMarena.

balmy mist Apr 16, 2025, 4:56 PM

#

damn bro, so you getting the 20k plan?

plain zinc Apr 16, 2025, 4:56 PM

#

Wish me luck (I hope I find gold)

balmy mist Apr 16, 2025, 4:56 PM

#

can i share with you my prompts?

keen beacon Apr 16, 2025, 4:56 PM

#

o3 and o4 mini system cards are on the cdn now

#

not sharing the link though 😉

balmy mist Apr 16, 2025, 4:56 PM

#

keen beacon not sharing the link though 😉

ahhh

#

i dont see it lol

#

yooo

fleet lintel Apr 16, 2025, 4:58 PM

#

what is the livestream link?

balmy mist Apr 16, 2025, 4:58 PM

#

https://www.youtube.com/watch?v=sq8GBPUb3rk

YouTube

OpenAI

Introduction to new o-series models

Join Greg Brockman, Mark Chen, Eric Mitchell, Brandon McKinzie, Wenda Zhou, Fouad Matin, Michael Bolin and Ananya Kumar as they introduce and demo the new o-series models.

#

but red is streamin it i think

fleet lintel Apr 16, 2025, 4:58 PM

#

i am lubricating

balmy mist Apr 16, 2025, 4:58 PM

#

i better not get blue balls i swear

keen beacon Apr 16, 2025, 4:59 PM

#

fleet lintel i am lubricating

lotion at the ready lmao

#

oh yeah

#

i still don't know if they're demo-ing at the end of this one

#

so that'll be interesting to see

#

decent chance there will be one last livestream this week

balmy mist Apr 16, 2025, 4:59 PM

#

and we made it

#

ahhhhh

keen beacon Apr 16, 2025, 5:00 PM

#

sorry i forgot to say

balmy mist Apr 16, 2025, 5:00 PM

#

greg!!!

sonic tendon Apr 16, 2025, 5:00 PM

#

in research lounge

keen beacon Apr 16, 2025, 5:00 PM

#

"demoing o4"

#

where's the rest of 'em

balmy mist Apr 16, 2025, 5:00 PM

#

ahhhhh

ember rapids Apr 16, 2025, 5:00 PM

#

https://tenor.com/view/cinema-gif-5935759059195122434

Tenor

keen beacon Apr 16, 2025, 5:00 PM

#

yeah okay there's gonna be another room

#

with

fleet lintel Apr 16, 2025, 5:00 PM

#

no twink?

keen beacon Apr 16, 2025, 5:00 PM

#

the agent team

keen beacon Apr 16, 2025, 5:00 PM

#

fleet lintel no twink?

stop lubricating it's over

#

systems 😉

#

he's an ai researcher what did you expect

#

collective slightly awkward laughs

balmy mist Apr 16, 2025, 5:02 PM

#

its okay @sonic tendon , imma just watch on my laptop, cant miss this lmaoo

keen beacon Apr 16, 2025, 5:02 PM

#

https://openai.com/index/introducing-o3-and-o4-mini
https://openai.com/index/thinking-with-images

sonic tendon Apr 16, 2025, 5:03 PM

#

balmy mist its okay @sonic tendon , imma just watch on my laptop, cant miss this lm...

kk

keen beacon Apr 16, 2025, 5:03 PM

#

lol openai's website is breaking

balmy mist Apr 16, 2025, 5:04 PM

#

yeah o3 is a beast

sonic tendon Apr 16, 2025, 5:04 PM

#

will prob stop streaming if you guys don't need me to

balmy mist Apr 16, 2025, 5:04 PM

#

lol

keen beacon Apr 16, 2025, 5:05 PM

#

just checked benchmarks vs december o3

#

some are better some are worse

#

it did worse on swe bench

#

by 3 points

#

hmph

sonic tendon Apr 16, 2025, 5:05 PM

#

hmph

balmy mist Apr 16, 2025, 5:05 PM

#

o3 with tools is a beast

fleet lintel Apr 16, 2025, 5:05 PM

#

i want to understand the benchmarks against other models... otherwise it's hard for me to understand the quality

thorny drum Apr 16, 2025, 5:05 PM

#

im very curious about the pricing of these models

sonic tendon Apr 16, 2025, 5:06 PM

#

thorny drum im very curious about the pricing of these models

one million dollar per token

keen beacon Apr 16, 2025, 5:06 PM

#

lmao aime is basically finished now

#

yes but not launching today i don't think

#

we shall see

torn mantle Apr 16, 2025, 5:07 PM

#

keen beacon lmao aime is basically finished now

nah

#

thats crazy

#

they saturated the benchmark

fleet lintel Apr 16, 2025, 5:07 PM

#

keen beacon lmao aime is basically finished now

is this pass@1?

torn mantle Apr 16, 2025, 5:07 PM

#

these tool calls are kinda neat

sonic tendon Apr 16, 2025, 5:07 PM

#

peak ui design right there

keen beacon Apr 16, 2025, 5:08 PM

#

It sorta reminds me of what they demoed in qwq max

#

The tool calling

#

holy moly openai's website is so slow right now

#

either it times out or i get 500s

#

help 😭

balmy mist Apr 16, 2025, 5:08 PM

#

lol

#

we gotta see vibes first

keen beacon Apr 16, 2025, 5:08 PM

#

cooked

sonic tendon Apr 16, 2025, 5:09 PM

#

keen beacon holy moly openai's website is so slow right now

i wonder if your secret endpoint still works

keen beacon Apr 16, 2025, 5:09 PM

#

it does lol

sonic tendon Apr 16, 2025, 5:09 PM

#

hey, i like it!

torn mantle Apr 16, 2025, 5:09 PM

#

ChatGPT Plus, Pro, and Team users will see o3, o4-mini, and o4-mini-high in the model selector starting today, replacing o1, o3‑mini, and o3‑mini‑high. ChatGPT Enterprise and Edu users will gain access in one week. Free users can try o4-mini by selecting 'Think' in the composer before submitting their query. Rate limits across all plans remain unchanged from the prior set of models.

#

tf

#

are

#

you

#

on

#

????????????????????????

keen beacon Apr 16, 2025, 5:09 PM

#

torn mantle ChatGPT Plus, Pro, and Team users will see o3, o4-mini, and o4-mini-high in the ...

o4 mini on free

#

based

sonic tendon Apr 16, 2025, 5:09 PM

#

thinking back to when leo said that o1 was 200b params

#

lmao

#

upcharge

torn mantle Apr 16, 2025, 5:10 PM

#

these updates look pretty decent

fleet lintel Apr 16, 2025, 5:10 PM

#

why does this feel whatever?

sonic tendon Apr 16, 2025, 5:10 PM

#

i wonder if the thinking dialogue is mostly based on reinforcement learning now

balmy mist Apr 16, 2025, 5:10 PM

#

so do we get o3 pro?

#

lol

keen beacon Apr 16, 2025, 5:11 PM

#

sigh.

sonic tendon Apr 16, 2025, 5:11 PM

#

keen beacon sigh.

yeah, it happens

ember rapids Apr 16, 2025, 5:11 PM

#

o3 full is cheaper than o1 i think

keen beacon Apr 16, 2025, 5:11 PM

#

🙄

torn mantle Apr 16, 2025, 5:11 PM

#

hes holding it so hard

#

quite happy guy he seems

sonic tendon Apr 16, 2025, 5:11 PM

#

torn mantle hes holding it so hard

what an enigma

torn mantle Apr 16, 2025, 5:11 PM

#

xd

balmy mist Apr 16, 2025, 5:11 PM

#

hmmm

sonic tendon Apr 16, 2025, 5:12 PM

#

"maximize the reasoning capabilities"
i wonder what that could mean in this context

fleet lintel Apr 16, 2025, 5:12 PM

#

4x more expensive compared to 2.5 pro 😦

sonic tendon Apr 16, 2025, 5:12 PM

#

local file access maybe???

keen beacon Apr 16, 2025, 5:12 PM

#

brute force

#

BOOOOOOOOO

sonic tendon Apr 16, 2025, 5:12 PM

#

keen beacon brute force

?

keen beacon Apr 16, 2025, 5:12 PM

#

i need to test this thing at geoguessr

thorny drum Apr 16, 2025, 5:12 PM

#

4x is very good no?

keen beacon Apr 16, 2025, 5:13 PM

#

sonic tendon ?

he's talking about the model using the python tool to brute force an answer

sonic tendon Apr 16, 2025, 5:13 PM

#

keen beacon i need to test this thing at geoguessr

damn that's a good idea

thorny drum Apr 16, 2025, 5:13 PM

#

the december version seemed to be several hundred times

keen beacon Apr 16, 2025, 5:13 PM

#

https://openai.com/index/o3-o4-mini-system-card/

#

hey i was part of this 👀

sonic tendon Apr 16, 2025, 5:14 PM

#

keen beacon hey i was part of this 👀

nice dude!

calm sequoia Apr 16, 2025, 5:15 PM

#

keen beacon hey i was part of this 👀

Ahhh thats where the API is from 😄

keen beacon Apr 16, 2025, 5:15 PM

#

interesting, o4 mini did best on openai's interview multiple choice Qs

#

lmfao their own interview coding tasks are saturated now

calm sequoia Apr 16, 2025, 5:16 PM

#

wtf

leaden meteor Apr 16, 2025, 5:17 PM

#

So, openai doesn't care about arena leaderboard now? I don't see any updates today ...

keen beacon Apr 16, 2025, 5:17 PM

#

calm sequoia wtf

i did notice that

sonic tendon Apr 16, 2025, 5:17 PM

#

openai PRs? what context is this in

calm sequoia Apr 16, 2025, 5:17 PM

#

In this case the o4-mini may have higher arena benchmark than the o3

sonic tendon Apr 16, 2025, 5:18 PM

#

o

#

Measuring if and when models can automate the job of an OpenAI research engineer is a key goal of self-improvement evaluation work. We test models on their ability to replicate pull request contributions by OpenAI employees, which measures our progress towards this capability.
We source tasks directly from internal OpenAI pull requests. A single evaluation sample is based on an agentic rollout. In each rollout:
1. An agent’s code environment is checked out to a pre-PR branch of an OpenAI repository and given a prompt describing the required changes.
2. The agent, using command-line tools and Python, modifies files within the codebase.
3. The modifications are graded by a hidden unit test upon completion.
If all task-specific tests pass, the rollout is considered a success. The prompts, unit tests, and hints are human-written.
The o3 launch candidate has the highest score on this evaluation at 44%, with o4-mini close behind at 39%. We suspect o3-mini’s low performance is due to poor instruction following and confusion about specifying tools in the correct format; o3 and o4-mini both have improved instruction following and tool use. We do not run this evaluation with browsing due to security considerations about our internal codebase leaking onto the internet. The comparison scores above for prior models (i.e., OpenAI o1 and GPT-4o) are pulled from our prior system cards and are for reference only. For o3-mini and later models, an infrastructure change was made to fix incorrect grading on a minority of the dataset. We estimate this did not significantly affect previous models (they may obtain a 1-5pp uplift).

calm sequoia Apr 16, 2025, 5:19 PM

#

If o4-mini is so good, how slick is o4???

keen beacon Apr 16, 2025, 5:19 PM

#

"we put in more than 10x the training compute for o1 into o3"

balmy mist Apr 16, 2025, 5:19 PM

#

o4???

keen beacon Apr 16, 2025, 5:19 PM

#

wtf

#

special surprise

#

agent related

#

here we go

#

yeah i presume so

#

yea

balmy mist Apr 16, 2025, 5:20 PM

#

yupp claude code gg

keen beacon Apr 16, 2025, 5:21 PM

#

anthropic gotta up their damn game

#

i wonder how much better this is in things like cursor and windsurf

balmy mist Apr 16, 2025, 5:22 PM

#

depends on cost

#

the only thing holding back claude code is cost

#

wow

#

dope

keen beacon Apr 16, 2025, 5:24 PM

#

lmao

#

what cuties

fleet lintel Apr 16, 2025, 5:24 PM

#

nice demo!

keen beacon Apr 16, 2025, 5:24 PM

#

"we used codex to build codex"

#

lol

#

woah

#

cool

balmy mist Apr 16, 2025, 5:24 PM

#

anybody got codex link?

keen beacon Apr 16, 2025, 5:25 PM

#

not up yet

#

will be in the next few weeks

balmy mist Apr 16, 2025, 5:25 PM

#

i want pro now

tall summit Apr 16, 2025, 5:27 PM

#

gj

keen beacon Apr 16, 2025, 5:27 PM

#

i believe it's in the api now

#

chatgpt it is rolling out

#

then it's probably just a gradual rollout

#

higher tiers first

#

4.1

balmy mist Apr 16, 2025, 5:28 PM

#

codex: https://github.com/openai/codex

GitHub

GitHub - openai/codex: Lightweight coding agent that runs in your t...

Lightweight coding agent that runs in your terminal - openai/codex

keen beacon Apr 16, 2025, 5:28 PM

#

https://x.com/sama/status/1912558745013612888

Sam Altman (@sama) on X

we expect to release o3-pro to the pro tier in a few weeks

#

they dont train it in

#

so its a hallucination

#

they either prompt it in the sys prompt or trained it in

#

waiting for karpathy's review :3

tall summit Apr 16, 2025, 5:30 PM

#

keen beacon hey i was part of this 👀

coolio

keen beacon Apr 16, 2025, 5:30 PM

#

they dont want u to use it 🤣

#

trying to make it as inconvenient as possible

calm sequoia Apr 16, 2025, 5:31 PM

#

Can this thing be evaluated on arena? As I remember, tool usage is blocked

keen beacon Apr 16, 2025, 5:31 PM

#

yea

#

i guess if u use it a lot

keen beacon Apr 16, 2025, 5:31 PM

#

calm sequoia Can this thing be evaluated on arena? As I remember, tool usage is blocked

the regular o3 and o4 mini yeah

#

dk how they will implement the tool usage stuff but it shouldn't take much additional effort

#

let me find somethin

#

can u ask o4 mini this: who won the 2024 london mayoral elections and by what margin specifically? if u dont mind @deep adder

#

try "Let a < b < c be distinct natural numbers. Must every block of c consecutive natural numbers contain three distinct numbers whose product is a multiple of abc?"

#

oh

#

yea

#

cant u disable search?

#

go to your personalisation settings

#

hmm

#

numbers are wrong

#

its 1m votes for sadiq but the exact number is wrong + susan hall

#

expected i guess

#

i couldnt probe 4.1 mini for the correct numbers

#

is that with my prompt

#

yeah it's hard

#

my private models failed it

tall summit Apr 16, 2025, 5:35 PM

#

LOL

keen beacon Apr 16, 2025, 5:35 PM

#

ooh

torn mantle Apr 16, 2025, 5:35 PM

#

https://x.com/DeryaTR_/status/1912558350794961168

Derya Unutmaz, MD (@DeryaTR_) on X

I’m absolutely blown away by @OpenAI’s new o3 model!

I’ve had early access and haven’t put it down for days. This release feels like the milestone we experienced with o1-preview and o1-pro, but smarter and more reliable in every way, it truly cranks everything up to eleven! In

tall summit Apr 16, 2025, 5:36 PM

#

its oxygen now

ember rapids Apr 16, 2025, 5:36 PM

#

Googles turn