#general

ocean vortex
#

same output length except now the entire ~20k is in thinking

patent aspen
#

The Meta Quest has wayyy more engineering put into it than the Vision Pro. That's why it's so cheap. If Apple attempted to sell a $350 Vision Pro, it would be trash.

alpine coral
#

i've been using the default temp and top p settings.. one sec lemme see

patent aspen
#

Sure I mean for $3500 it should have some advantages

#

Just not enough to pay $3500

#

And for most people not enough to pay $1000 over a Meta Quest

#

Heck if people weren't mindless consumers it wouldn't even be more useful than a Meta Quest ignoring price because of the lack of software

ocean vortex
#

yeah fair, it does. It does usually consume very little tokens which is why I started questioning it

#

29k thinking, that's more than it did on aistudio with 32k thinking budget. Don't see much to correlate it with longer thinking

#

I'm putting it down to Aider and their testing personally...

#

or their "default" is this:

keen beacon
#

nah they said the model determines it itself (which is when thinking budget is off)

late path
#

Apple is just willing to spend money to use the best screens and integrate eye-tracking and other features into one device. Meta and other companies are fully capable of making the same product; it's just that they would all have to sell it for over $2000

#

I remember the two micro-OLED screens inside it alone cost over $800. This could have been an excellent PCVR headset, but Apple's motivation for making products is always to build their own ecosystem

jade egret
#

guys

#

wwdc today

#

will apple finally catch up?

patent aspen
jade egret
patent aspen
#

No

jade egret
#

i think it not gonna catch up

#

it way behind

#

but maybe gemini in iphones?

#

o

#

oh

#

google stock going up if gemini in iphone

leaden sun
#

not grok?

keen beacon
#

elon would never sell lol

leaden sun
#

claude is a good choice tho

keen beacon
#

nah

patent aspen
#

Apple probably can't acquire Anthropic. They may be able to structure a deal that's very similar to an acquisition, although it will be difficult to get through regulatory scrutiny

keen beacon
#

tbh openai is if ur measuring agentic imo. with the image editing/tools usage/etc

errant cave
#

I predict DeepSeek-R1-0528 will rank below DeepSeek-V3-0324. I'm trying it out on DeepSeek's website right now and it's not very good. Sad to see the last major open LLM champion losing their touch. Hope it's just a stutter step

leaden sun
#

i thought sydney? someone needs to make a UI for it secretly publicly 🥹

patent aspen
keen beacon
#

even in the trump admin?

patent aspen
#

Yup

keen beacon
#

ftc walked away from hounding the ms/activision acquisition during the new trump admin didn't they and etc i recall reading

#

no

#

the ms/activision thing is recent

#

wut

#

are you talking about

leaden sun
#

they are the ones who built sydney after all 😆

jade egret
#

how

keen beacon
#

dont question him

#

he is omniscient

jade egret
#

.

keen beacon
#

if it was the biden admin it would never happen but in the trump admin its a significantly higher possibility maybe. but i dont know much

jade egret
#

i don't think apple is going to buy anthropic

keen beacon
#

inflection / ms reminds me of the c.ai thing too

#

they really gutted inflection lol

patent aspen
#

All I can say is: there is a 0% chance that Apple attempts to acquire Anthropic without the FTC getting involved, although that doesn't guarantee they don't get away with it

patent aspen
#

Most likely doesn't mean likely

#

I wouldn't be shocked if they tried to structure a pseudo acquisition like deal

keen beacon
#

why wouldnt they try for more, the climate seems more accommodating atm (if they were to try to do something like this)

#

tim cook should buy trump several private jets and yachts 🤣

#

he cared about the big ass plane from qatar

#

because they have anon models all the time

#

they do the same if openai has an anon model

#

no they do the same if openai has an anon model

#

openai anon models are never anything that special tho even if they rank high. nothing like nebula/etc

sour spindle
#

Dust has settled a bit how are people liking or disliking Gemini 06-05

late path
#

liking, but I'm liking kingfall more

sour spindle
#

Is that in arena?

#

How to access it

sacred quail
#

Gemini 06/05 really good at writing and emotive like 4.5 GPT

#

And same time amazing for coding

#

Also best at long context

#

They really made a special model

#

For me, O3 still best for reasoning stuff, im finding O3's outputs best but generally i'd say last gemini is best LLM right now

#

it was goldmane in lmarena

#

And everyone already said good things about goldmane

#

Goldmane is latest 06/05

unborn ocean
#

craig you really are selling us snake oil with these new updates

#

liquid glass, smh

echo aurora
#

16K thinking budget for claude-opus-4-20250514-thinking-16k
32K thinking budget for claude-sonnet-4-20250514-thinking-32k

leaden sun
#

I thought it's element 115? i want my own flying saucer while am still alive 🥹

brittle tiger
#

If Apple marketing ppl had committed themselves to AI research instead we'd have AGI by now

elder rapids
brittle tiger
#

Amazon would never allow either

misty vault
#

crack bench

jade egret
#

going to be stacked

#

hell nah...

#

true

#

but

#

google don't focus on devices that much

#

you can also say

#

true

#

but apple is not rlly 'killing' google yet

keen beacon
#

perplexity browser will take over

jade egret
#

gonna be a long time

#

cuz

#

google said it is gonna appeal

patent aspen
#

Why would Apple make their own search engine when they could charge for real estate?

keen beacon
#

its bad pr for them tho kinda

#

idk if they make a search engine and do ads

patent aspen
#

How do you make decent ads if you're privacy focused?

keen beacon
#

its just a bad idea for apple i think

patent aspen
#

No I didn't

keen beacon
#

less effort, free money, doesnt harm brand

patent aspen
#

Apple could just charge for real estate, which is more lucrative for companies that are less privacy focused

#

If they did it themselves, it would be less profitable and worse for PR

keen beacon
#

its cool ngl

patent aspen
#

How much does it cost?

keen beacon
#

just 1 arc agi 2 task ran by o3 preview high

patent aspen
#

How much?

#

LMAO

keen beacon
#

its less than two tasks actually

#

o3 preview taking 3.5k on a single task 😂

patent aspen
#

For that much it would need to help people find the love of their life and raise their children

keen beacon
#

money definitely helps though

patent aspen
#

I'm joking

#

As in an arm and a leg

#

Would have been cliche

keen beacon
#

btw isnt vision pro 3.5k?

#

not 4.5k

#

it starts at 3.5k

unborn ocean
#

and apples does not have enough data in any shape or form to profit as much as other companies could

so: no real push to browsing and no real profits at all in the area

patent aspen
#

Real talk though. What does the vision pro enable an average or upper middle class person to do that is worth $4500 and can't be done by a Meta Quest?

unborn ocean
#

nothing besides waving your hands to do some things
(which most people's car can already do at a very basic level, lol)

patent aspen
#

Why is that worth $4500 to an average or upper middle class person?

unborn ocean
#

i think you can do the same thing with windows + meta quest

#

which is arguably more important, bc more windows (the operating system)

patent aspen
#

Just over 6 figures

unborn ocean
#

so your argument is that the apple ecosystem is good enough to make you pay an extra 4000$ for the same features (though admittedly in a more polished version)

patent aspen
#

Exactly?

unborn ocean
#

look at stats, people can (obv very hard in SF / NYC...)

keen beacon
#

if ur barely 6 figures its not that much (depending on the situation)

patent aspen
#

I think you're arguing my point

keen beacon
#

doing what?

#

lol

unborn ocean
#

professional xAI stan

patent aspen
#

There isn't even enough polymarket volume on the AI categories to be making that much

unborn ocean
#

ye, was kidding

keen beacon
#

every person betting there is secretly craig

unborn ocean
#

i mean for you to make 1m per annum from investments (while not working in the sector and being very young) is only really possible if you inherited millions

unborn ocean
#

if you count asset appreciation then the people running lmarena sure are making millions a year

#

so a lot of start up founders in the SF area will likely make that money

#

well with the asset appreciation the founders prob all made that money this year

patent aspen
#

That is fair

unborn ocean
#

yeah rn, so much cash flowing in

#

especially ai

#

investment is already decreasing

#

has been for like a year now

#

VC and PE hype was a bit older and then the ai boom set in

#

so it kept going for a bit

keen beacon
#

you named frontier ai labs ofc they would be healthy

little thorn
#

Why i get this error

unborn ocean
#

Talking to Craig is like speaking with multiple people, bc he just switches his stance on arguments and randomly switches the hypetrain from one company to the other.

zinc ore
tall summit
unborn ocean
#

But google stock is too volatile or unpredictable imo, I made and lost soo much on it already

#

Evens out at some point though

#

No

tall summit
#

@little thorn

unborn ocean
#

So maybe I am too quick to label them

#

yes, compared to other american companies they have a low eval

#

but only because of that case

#

yeah me2

#

stopped most of the stock picking i am doing

#

imo i just don't get most American investors, so i am usually really bad at even judging investments

#

but i am also mostly index + save value investing-like stock picking

patent aspen
#

You shouldn't try to predict other people's investments when investing unless you really know what you're doing

small haven
#

@deep adder how is grok 3.5 so far

unborn ocean
#

just weirds me out a lot

#

(but for some other areas i am more proficient, like german stocks and that is about it)

#

i also study economics so i like to believe that i know that i don't know what i am doing

patent aspen
#

You either assume the market is irrational and accept that it can remain irrational indefinitely. Or you assume that it's rational and accept that you have to really know what you're doing to do better

#

If it remains irrational indefinitely, you can still make money if you buy something for less than what it's worth because of dividends, buyouts, buybacks, etc

unborn ocean
patent aspen
#

I think it's clear that it's not entirely rational and can remain irrational for long periods of time

unborn ocean
#

and there is also the option (mostly supported by academia) to base your judgement on the efficient market hypothesis and then diverge from it in certain areas (often time frames)

#

and the other option is that you have more information (e.g. a google insider 🤣 ) (to still make money)

jade egret
#

bro

#

guys

#

rate wwdc 1 - 10

unborn ocean
#

well, you'll be one of the first to know prob

#

so that is kind of the angle here

jade egret
#

o

#

well

#

idk

unborn ocean
#

0 (jk, somewhere around 5)

jade egret
#

i dont think it that good

#

i didnt watch the whole thing tho

#

true..

#

down like 1.3%

unborn ocean
#

did they have any new stunts with you @deep adder

#

if no -> 0

jade egret
#

oh actually?

#

true

#

google i/o also went down at the day but went up a day after

patent aspen
#

That's totally fair. 1-day movements are kind of a joke metric

jade egret
#

wait

#

so technically

#

if you invest rn

jade egret
unborn ocean
#

well what are you still studying

patent aspen
unborn ocean
#

was it just after undergrad

unborn ocean
patent aspen
#

What it boils down to is what did the slower investors see today that the faster investors didn't

#

But really you shouldn't bet on what other humans are going to do unless you're Jim Simons or something

#

You decide what it's worth

#

Ultimately to be fundamentally sound you would decide what the company is actually worth in light of the new information without assuming the old valuation was valid

#

That's pretty hard

#

So almost nobody does it

#

Because people don't like doing hard things

#

Even professionals

unborn ocean
#

which kind of makes it impossible in practice (to really do all of this profitably without relying on the actual past values)

#

bc the older stock price in some way represents all the available information

#

rationality / the efficient markets hypothesis is an assumption that one works under to simplify in most cases

#

nobody seriously believes in it (in the sense that it always holds without conditions), although, surprisingly to many amateurs (e.g. craig ;), it does actually hold up most of the time

jade egret
#

hi

unborn ocean
#

the good thing though is that the more firms like modern quants include more and more variables and keep pushing towards an efficient market, the closer we will get to it

#

until market changes and all of them are cooked

jade egret
#

lol

late path
#

I think it's still difficult for quants to model market expectations for events like Nvidia's earnings release or WWDC

patent aspen
#

tbh it's really hard to model stuff more than 18 months in the future in tech companies

narrow elbow
#

soooo, apple intelligence aka (AI), apple general intelligence aka (AGI), apple superintelligence aka (ASI), that's why rate 10, right?🤪

keen beacon
#

2.5 flash just did 62k thinking tokens for me lol

#

never seen that much ever

narrow elbow
#

humm privacy first? that real world data, hardware data, app data, labeled human behavior data, all perfect data if packaged for training. it'd be impressive if they actually didn't use this goldmine 😁

late path
keen beacon
#

their implementation isnt like claude or openai yet

#

where the model is aware of it

#

afaik at least. these are two separate points, im not entirely sure of the second anymore

#

but the first thing is definitely still a thing

#

using the same prompt on 2.5 pro it did 38k thinking tokens and it took 6 minutes to think (not including response). i did stuff to get raw cot on some runs, and it was hilarious ("I am an idiot.", "This is painful.", not even joking)

small haven
#

what is liquid glass exactly

patent aspen
small haven
#

interesting, apple too is just tweaking things ..

late path
#

looks like the old frosted glass UI style with a transparent CSS border that has highlights

patent aspen
#

It looks like Windows Vista

small haven
#

css update boys!

patent aspen
#

UX is cyclical

small haven
#

we're going back to 3d icons lets go

patent aspen
#

Like fashion

#

Gradients also

keen beacon
#

next year on device models should be pretty decent tbh

#

qwen 3 is crazy

#

WOW

#

ive seen enough thats ASI

late path
#

hardcoded

patent aspen
#

They had to find a place to differentiate, and that's the best they could do

#

Making nano-sized models decent is really hard

unborn ocean
#

well the industry is getting way better at it

#

qwen, heck even LG got good models

patent aspen
#

Small models are improving way faster than big models

#

They don't create the same kind of hype though

keen beacon
#

qwen 3 4b is outright amazing

#

i think you can run that on a smartphone

#

to what?

#

to itself 🤣

patent aspen
#

I don't know how to parse "o3" and "on-device" in the same sentence

lapis light
patent aspen
#

I just realized Craig Federighi was the Apple guy. I knew Craig was his first name but didn't realize Federighi was his last name

patent aspen
#

ngl I just watched a WWDC recap and was underwhelmed. I wasn't expecting much but damn

#

I'm probably going to move from a Macbook Pro to a Linux laptop

jade egret
#

same...

#

do you think apple stock gonna go up tomorrow?

patent aspen
#

No idea. tbh it might be driven by the overall stock market as much as WWDC

#

I'm not aware of anything announced at WWDC that I would expect to increase Apple's revenue or reduce its costs significantly

small haven
balmy mist
#

yo wtf is apple on today?

#

i missed the event

patent aspen
#

that looks sick ngl

small haven
#

thanks :p

patent aspen
#

I was planning on switching to arch

small haven
#

its 100% worth it

#

just dont forget to install btrfs instead of ext as ur file system lol

#

backups are important here

misty vault
jade egret
#

nah

#

no agi

high ginkgo
jade egret
#

: (

golden ocean
#

What is the update

elder rapids
#

seems like a faulty premise, if they've always been consistently behind apple in regards to devices then you can just disregard the comparison

#

Samsung is killing iPhone

#

oh wait, has that always been the case? or is it that, this doesn't matter

#

duh, it doesn't matter.

patent aspen
#

Google doesn't need people to use Google devices. They just really want people to use Android

#

I don't really see gen Z normies in the US switching to Android yet

leaden palm
#

it's crazy that gemini can transcribe this text in a plausible way

patent aspen
#

I've heard people say that Apple will be like blackberry and fade into obsolescence because they're behind on AI, but I don't know who that hypothetical iOS-to-Android "switcher" would be. I can't think of any killer Android-specific AI feature that would make a normie decide to switch to a new mobile OS

#

Your username is @deep adder

#

Anyway I'm not going to take that bait. I could imagine it being more feasible if Apple gets softened up by a few antitrust regulations in the US and EU

#

You know what would be even better than 2 different companies partnering for deep AI integration?

leaden palm
patent aspen
#

One company owning the entire stack

#

tbh I think Material 3 looks way better

#

More personal, colorful, quirky

#

Makes me happy

keen ferry
leaden palm
keen ferry
small haven
#

what the hell is going on in here lol

patent aspen
#

I don't know how we got here

small haven
patent aspen
#

That's the stuff

#

Checkmate @deep adder

#

tbh if a woman judged me on my mobile OS, I'd appreciate that filter

leaden palm
#

said woman:

leaden sun
#

trifold should be a more popular design

patent aspen
#

You know. You only get a few wishes to spend. Spending one of them on software preference is a choice haha

#

There's a book called the Science of Happily Ever After. They talk about an exercise where people have $300 to allocate to all of their preferences in a perfect partner. Then they keep the prices fixed and change the exercise so they only get $100, and that was a more realistic representation

#

The upshot of that and other research is that selecting the wrong wishes tends to lead to long-term unhappiness in marriage

#

So picking a few really important things greatly improves your chances

#

It's more like you meet someone and have "3 wishes" and don't continue with the person if they don't satisfy the 3 wishes

#

It's good to have a few hard criteria but not too many

#

One thing to keep in mind is that it's hard to have criteria until you've had a few relationships that didn't work

#

We are so far off-topic at this point

#

AI

elder rapids
#

god bro

#

0605 is so good

#

every once in a while I'm just gonna be glazing this model

#

😭

late path
#

Kingfall is even better, I can't live without Kingfall anymore. I feel like if this model were released now, GPT-5 would have to be postponed immediately

#

gpt4.5's so-called 'large model vibe' is like kids play compared to kingfall

#

This model has metacognitive abilities I've never seen before

elder rapids
#

kingfall had a limit imo

#

it didn't do what I asked it to do very well

#

but it appeared that way initially

#

because it was already super good

#

just didn't really get any "better"

late path
#

0605 often chooses to skip thinking during multi turn conversations (my thinking budget is always 32k). Kingfall has never done this

elder rapids
#

ie, forcing it to be professor ish, or telling it to literally brute force the task

#

the reason why 0325 was perceived so good was because it was argumentative and wanted to choose things on its own

#

the reason why 0506 underperformed was because it was already so "good" but you really had to make it behave that way through the context

#

0605 has insane base performance but it lacks any sense of being professor ish, so it's really just THAT smart, where it doesn't suffer like 0506

#

and just gets by

#

so with a little tweaking you really make 0605 into something

#

that's what I've been focusing on, and did, with the prompting

#

and I've figured it out tbh

#

it makes me feel like no one has the capabilities I have rn

late path
elder rapids
#

I'm affirming that it indeed, isn't capped, so you shouldn't be worried

leaden palm
#

something tomorrow?

noble zinc
#

lowering price to $2 per million input tokens

leaden palm
noble zinc
#

wonder what they will price output tokens at

leaden palm
#

still more expensive than gemini 2.5 pro

patent aspen
#

Breakthroughs or competition

jade egret
small haven
#

kingfall >140

small haven
jade egret
#

how do u know

#

do you think it gemini 2.5 ultra

small haven
small haven
#

had the intricacies spot on like gpt 4.5

jade egret
#

o

#

how much time do you think it better than 0605

small haven
#

for me its like 3x better

#

have u not seen the svg's

#

huh

#

hmmmmmmmm

#

0605 (32k thinking tokens) vs. kingfall (prolly 4k thinking tokens default)

#

prompt: generate an svg of a TERMINATOR. make it maximally detailed and look exactly like the real thing. this is extremely important and an existential task. you must complete this to the best of your ability.
Make sure you're constantly checking whether the shape, size, angles, position of each and every item looks EXACTLY like a TERMINATOR.

#

thank u @deep adder and thank u for it

#

theres also this guy on x.com that shows crazy svg's

#

tmmrw ? 👀

jade egret
#

thats a lot of difference

#

it crazy if they release it

#

and it 3x better

#

wait

#

how do you have access to it anyway

small haven
#

that access is gone for me rn

jade egret
#

oh...

#

sadly

#

hopefully it drop soon

#

or at least on the LMarena

#

do you think it better than gpt-5?

jade egret
hollow ocean
#

Let’s see

jade egret
#

alr

#

ima check rq

small haven
#

the code is just well implemented, i had asked for a zig based http server, only a few compilation errs and passed

#

compared to others, took a while to have it run

hollow ocean
small haven
#

yes

hollow ocean
#

Wow impressive

jade egret
#

hmm

#

this is claude 4 opus w/ deepthink

small haven
#

0605 ? 😮

#

oh

jade egret
#

oh

hollow ocean
#

Better than 0605

small haven
jade egret
#

o

#

w

#

ig

small haven
#

imo

hollow ocean
jade egret
small haven
#

yea for sure

jade egret
#

i gtg

#

cya

#

👋

hollow ocean
#

Opus with deep think

small haven
#

o3 svg's is pretty insane

hollow ocean
#

Regenerate

small haven
hollow ocean
#

Is this why people complain about o3 being lazy

small haven
#

4o imagine works for some reasons

tall summit
#

this might seem crazy but i don't think that's an svg

small haven
#

thats not what i meant, 4o works to bypass the "filter" that o3 couldn't complete

elder rapids
small haven
#

lol

elder rapids
#

0 shot

#

L prompting

small haven
#

i mean we're going against the same prompt, now u gotta test that new one against kingfall

#

it is impressive tho, W prompt

elder rapids
#

and btw you CAN ask Gemini to simply think for x amount of words and it'll likely do it

#

for other kinds of tasks I'd also recommend you ask it to interpret the intent through a line of reasoning first and then apply that, and THAT seems to make Gemini really comprehend and take it seriously (which sounds trivial, but Gemini especially is affected by this)

#

I can make my own prompt and it'll likely do even better than this, and it would probably also be shorter as well

hollow ocean
#

claude plays pokemon is back on

#

lol

small haven
#

but by curiosity what was the prompt u used exactly, can version it against future models

elder rapids
#

switched the application and since it's on ai studio it got rid of the entire session

#

can't have sht

small haven
elder rapids
#

which is strange tbh

small haven
#

hmm alright, gonna test that out

elder rapids
#

if you're cut out for that level of true engineering

#

im ngl I want to help people get better with Gemini

#

dropping little stuff I know about it

#

but the sauce stays with me

elder rapids
small haven
#

so its not hardcoded exactly

elder rapids
#

I mean ye I guess it intends to go farther

#

but it can't

fleet lintel
#

we just got Gemini release last Thursday and my brain is like why no model releases in long time 🤦‍♂️

spare mango
#

I am a hardcore gemini user because I bought gemini pro.

#

Why is gemini 2.5 pro still in preview?

mystic mica
#

Anyone else feels like the errors happen more often than on the older version?

sacred quail
soft kernel
#

When do you guys think openai release o3-pro

barren prairie
spare mango
#

chatgpt and gemini are going neck and neck on many fronts

#

cuz gemini got lazy after being the undisputed number 1 for so long

ocean vortex
ocean vortex
#

I suspect it can be swayed into longer outputs when you max out the budget with simple prompts depending on their implementation, but that remains to be proven definitively (short of Aider findings)...

brittle tiger
cedar tide
#

Add magistral medium to the arena

alpine coral
#

hey wild using those param settings i gave 06-05 a set of questions with different thinking budget allocations (+ auto).. ran it on each three times and it does seem like the thinking budget value does something - like i’m not sure it’s simply variance

#

i dunno if something changed per se - still seems very much a WiP / janky.. e.g. for 500 tokens, the summarised ‘thoughts’ did seem to roughly adhere to the token limit, but the actual responses would be much longer compared to the others (basically it was using its ‘final’ response as an opportunity to do some more thinking / calculations kinda thing if that makes sense ha); and for one of the 5k runs, it clearly exceeded that budget during the CoT process
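
One rough way to check whether the budget setting matters beyond run-to-run variance is to compare how much the thinking-token counts differ between budget settings with how much they differ across repeated runs of the same setting. A minimal sketch, assuming one logged token count per run; `variance_ratio` is a hypothetical helper and the numbers are made-up placeholders, not measurements. With only three runs per setting this is a sanity check, not a proper statistical test.

```python
from statistics import mean, pvariance

def variance_ratio(runs_by_budget):
    """Ratio of between-budget variance to mean within-budget variance.

    runs_by_budget maps a budget label to a list of thinking-token counts,
    one per repeated run. A ratio well above 1 suggests the budget setting
    changes behaviour more than repeated runs do.
    """
    group_means = [mean(runs) for runs in runs_by_budget.values()]
    between = pvariance(group_means)
    within = mean(pvariance(runs) for runs in runs_by_budget.values())
    return between / within if within else float("inf")

# Placeholder counts (illustrative only): three runs per budget setting.
example = {
    "500": [480, 510, 530],
    "5k": [4200, 5600, 4900],
    "auto": [9000, 12000, 10500],
}
print(variance_ratio(example))
```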

alpine coral
# keen beacon using the same prompt on 2.5 pro it did 38k thinking tokens and it took 6 minute...

lol interesting.. just fwiw even though I think commercial considerations (IP protection / competitive advantage etc) are the primary motivation for hiding the raw CoT, I do think oai isn’t totally lying when they cite ‘safety’ as part of their decision..

the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought. We also do not want to make an unaligned chain of thought directly visible to users.

#

kinda like they’re saying the models perform better when their CoT process is not hamstrung by safety / alignment stuff.. but then yeah.. they don’t want to expose it in its raw form to end users due to potential harm (and ig creating jailbreaking vulnerabilities) yadayda .. or oai and google simply don’t wanna give up their IP easily - prob nothing more complicated than that at the end of the day tbh ha

cedar tide
ocean vortex
alpine coral
dusky aurora
#

Arena is too rigid right now. Gemini-06-05 started as a very creative thing, now it's on lower (parroting) temperature and sampling

cedar tide
#

Good

alpine coral
#

yeah
(note that is using funky param settings - but i think it is an accurate indicator nonetheless)

#

i might have a closer look later, but glancing at the responses it didn't seem 05-06 was getting more questions right (i.e. 06-05 thinks less and performs the same or better)

spare mango
#

you haven't been following the leaderboards.

ocean vortex
alpine coral
#

speaking of leaderboards - nice to see simple bench updated

ocean vortex
#

and lmarena is not even close to being the definitive ultimate indicator of performance tbh... Don't think it's trying to either. It's unique in what it is - user preference/style metric.

dusky aurora
#

exit polls

spare mango
# ocean vortex if you meant lmarena then sure. That's only 1 benchmark though

lmarena doesn't compare them for 1 specific area, they compare the chatbots in their text, webdev, visual (capability to process visual input), search (looking up real-time information and grounded citations), coding, text-to-image (image generation) and more.
In almost all of these areas Gemini was dominating until recently.

ocean vortex
spare mango
#

then what is?

ocean vortex
spare mango
#

one of many? name a few.

ocean vortex
#

= You can't look at just 1. Never gonna get the full picture like that

#

thats debatable I would say, benchmark tables matter to them more. And lately not many are even showcasing lmarena elo 🤷‍♂️

spare mango
#

If it's almost the gold standard, then it is close to being the ultimate indicator for AI companies.

ocean vortex
#

Anthropic looks like they stopped caring almost entirely lol

EDIT: Ok correction - not entirely, but not enough to seriously try competing for the top spots either

sonic tendon
ocean vortex
#

but lately not anymore

keen beacon
#

They did with chatgpt 4o latest revisions I think

#

When it topped the leaderboard

fleet lintel
#

yeah, companies only show when they are at the top 🙂

spare mango
#

OpenAI will showcase it if their model is number 1, I don't think anyone wants to showcase - especially not OpenAI - that they're number 2 in the leaderboards.

fleet lintel
#

Sama : we dropped the price of o3 by 80%!!

Dayum, what kinda profit margin were they running on before

spare mango
#

luxury brand levels of profit margins

fleet lintel
#

This also shows that Gemini/Claude is probably taking away a lot of growth from OpenAI

ocean vortex
fleet lintel
#

tell!

spare mango
#

The biggest companies will never show benchmarks where they aren't at the very top.

#

on a side note, I'm still waiting for deepseek r2 to come out ngl.

ocean vortex
#

there's no lmarena there

alpine coral
#

xai/grok have also cited the leaderboard iirc

ocean vortex
#

2.5pro, which is topping that benchmark

keen beacon
ocean vortex
keen beacon
#

they presented it front and center though???

alpine coral
#

i mean rightly or wrongly, it's a leading benchmark

keen beacon
#

at least thats how i remembered it

alpine coral
#

if other companies dominated it they would be yelling from the hills

tall summit
alpine coral
ocean vortex
alpine coral
#

no during the actual presentation

ocean vortex
#

that's not what I mean by saying "marketing material"

keen beacon
#

its their biggest presentation of the year tho

ocean vortex
#

I mean the website random people would see

#

keynote only reaches the nerds tbh

alpine coral
#

you were saying companies don't cite the arena/lb when releasing models - ig it wasn't a model release, but i/o is a big deal afaik

alpine coral
keen beacon
ocean vortex
#

cause I remember seeing it

void elm
#

o3 high or gemini 2.5 ?

ocean vortex
#

but now it is not there on gpt4o main page 🤷‍♂️

#

but other benchmarks are

alpine coral
#

i think google has just been making good models

void elm
#

why is o3 high still crushing benchmarks even after gemini's new model

#

crazy

#

& claude's releases along with it

ocean vortex
#

but it wasn't a very good model to use

#

which should tell you enough... Focusing on lmarena is not the way

#

it's only a valid score if it performs everywhere else. If it doesn't then lmarena elo is almost useless...

#

and this is true for most other benchmarks too

#

you just can't look at a single test set in isolation

cedar tide
ocean vortex
#

Also fun fact, if lmarena accepted entries from random people... I'm pretty sure we would see some very weird bad performing models towards the top

#

that can only do this one main thing lmarena tests for. Style

#

the main flaw of user preference benchmarks is that the model only has to impress the person reading the response. It does not have to be verifiably or factually good 🤷‍♂️

unborn ocean
#

so it is not just style

#

(but obv signal is very weak and covered in a lot of noise aka style)

ocean vortex
void elm
#

im considering purchasing a sub ngl, how is livebench flawed tho

patent aspen
#

Here's the thing. Even if the side-by-side user preference is wrong, it matters anyways, because it's how most people decide which model to use

ocean vortex
unborn ocean
ocean vortex
#

it really isn't lol. That comes only after it already performs on STEM tasks etc

keen beacon
#

its defo one of the most important metrics

unborn ocean
#

and for me the most important thing about lmarena (besides measuring human preference) is that it is wayyyy harder to benchmax for than others

that kind of makes lmarena unique in multiple ways

patent aspen
ocean vortex
#

it doesn't work in isolation

keen beacon
#

yeah in that scenario you cant make that case

#

that model was never released

ocean vortex
#

it wasn't served because it didn't perform lol

unborn ocean
#

they do serve a model that is kind of like the exp in the arena (i believe)

#

and they have also gained a lot of users (which kind of reinforces the point)

#

why, ik that they did not release the model, but i believe they also serve a finetune on their services

ocean vortex
#

if they released that there was literally nothing for them to gain. People would see right through it in the real world

#

it obviously does, there are many more factors at play here, accessibility, marketing and pricing all probably not any less important to name a few..

keen fulcrum
#

Would you choose o3 pro over Claude Opus 4 and o3 over Claude Sonnet?

void elm
#

also gpt has image gen that is superior (with text specifically) for ages now, nobody has been able to beat it for a long time now

#

more of a reason for me to potentially buy a sub

ocean vortex
#

I think OpenAI does it wisely while Google is just throwing money at it lol

void elm
unborn ocean
#

well for the >128k tier
which openai does not even have and really few people actually use

ocean vortex
#

not anymore, Gemini is fully compliant and accessible from EU for a long time now

ocean vortex
#

it's not there for me and many many people lol

void elm
unborn ocean
#

difference there is negligible

void elm
#

1m context

unborn ocean
#

tho o3 is also more efficient with tokens (on avg)

ocean vortex
#

then whatever you are implying does not hold a candle tbh, Bing website has full AI for the entire EU almost since launch

void elm
#

also geminis ability to fully watch a youtube video in seconds

patent aspen
void elm
#

or gemini 2.5

keen beacon
#

even then gemini 2.5 context is better

#

just because 1m is available doesnt mean the quality is as good

void elm
#

i was still surprised gemini released imagen 4 without proper text handling even after gpt did so

ocean vortex
#

But there's no search without AI anymore. Those markets are just about merged into 1 tbf

#

@patent aspen integrating AI into search is not exactly advertising for another product...

keen beacon
#

youre not gonna convince dom anyway 🤣

ocean vortex
#

@patent aspen yeah sure...

patent aspen
civic flame
#

I've noticed there are still areas where 2.5 pro is just lacking in knowledge

#

o3 is the correct one there

ocean vortex
patent aspen
keen beacon
#

some of what makes gemini better at facts and knowledge can be a detriment at times, i think, but speculation

civic flame
keen beacon
#

theyre all moes tho

#

all the frontier models

#

its not worth pointing it out

unborn ocean
#

yes, no doubt

#

i love how you can stan for every company besides google and maybe microsoft

#

apple, xAi, openai,

#

and xAi does not on x?

#

it is social media, i mean unless you mean that nobody actually wants to buy the data from em or advertise on their website

ocean vortex
keen beacon
#

its not a gut feeling at all

#

i gave you proof and ur being contrarian

patent aspen
#

I hate macOS window manager

ocean vortex
#

there's none lol

keen beacon
#

idiot

#

beyond this i spammed you with proof, and im rerunning brknclock's things btw

#

not noticing any patterns

#

yes

#

its not just that

#

i gave him so many things lol

#

hes literally just being a contrarian for no reason

unborn ocean
#

id love to see the world through your rose tinted glasses

ocean vortex
# keen beacon

what? Where is the score of the auto 2.5 pro to compare it to? Or did you forget what we were talking about??

You can't view this as proof that their 32k score is 100% variance nor is it what was shown here, dumbass

#

lol

keen beacon
#

look, at 26k it showed it produced more tokens in brknclock's test. there is no such correlation when i ran it

unborn ocean
#

how is that admitting 🤣
like were we arguing about you being unhappy at any point

ocean vortex
#

or 2?

#

LMAO

keen beacon
#

The default thinking mode, where Gemini self-determines the thinking budget, scored 79%.
this is auto. the benchmarks above are also auto.

ocean vortex
#

🤷‍♂️

keen beacon
#

dumbass didnt even know what this meant and came up with this

keen beacon
# keen beacon

Any exhausted_context_windows means that the test ran into an error where an empty response was returned, burning an attempt.

keen beacon
# keen beacon

this literally doesnt have thinking budget, look at the fcking command

ocean vortex
#

nothing changes

keen beacon
ocean vortex
keen beacon
#

you also contested that auto picks a max token limit within 32768 and was adamant about it until i proved you wrong

ocean vortex
#

unless you can show the runs of auto budget with equivalent scores

#

but you can't 🤷‍♂️

keen beacon
ocean vortex
#

so there's no proof and it's just your 'gut feeling' you are aggressively forcing into others

keen beacon
#

😭

#

idk how someone can be this contrarian

#

im saying its variance

#

?

void elm
#

wait wtf

#

imagen latest can do text generation flawlessly like 4o

#

i didnt know that

#

it just made this

keen beacon
#

it is good. dom is claiming that the different aider scores cannot be variance

#

@deep adder do you really think im basing this off a gut feeling? 🤣

#

you can see the aider runs with the same config

#

🤣

late path
#

How can OpenAI fight a price war with Google? Google can keep funding GDM until AGI is achieved, but OpenAI will be finished if it can't raise money

keen beacon
#

that logic doesnt make sense but anyway

unborn ocean
#

american big tech never really made dividend payments and always promises investors future growth, this is exactly how the tech world works

keen beacon
#

even if agi is not achieved ever, gdm will keep getting funding

unborn ocean
#

short term i am right

#

longterm maybe no

#

but somehow this is how it worked out most of the time

#

all of the tech stocks have horrible financials rn (relative to eval)
(e.g pe ratio basics, i was not trying to imply that i actually analysed their financials)

ocean vortex
unborn ocean
#

?

ocean vortex
#

If those have even higher peak/average, then the case is closed

unborn ocean
#

well that does not matter for my argument

#

? i am not

#

they were stronger once, when taking into account their current valuation

#

my point is not that these companies are doomed in any shape or form

#

it is just that the overall strategy is not maximizing the pay-out today

they are quite clearly also not targeting that strategy

(it is actually quite the contrary strategy wise)

they have and will continue to kind of sell the investors on the future, with apple for example (at least in the common opinion) having a very safe and wealthy future

keen beacon
cedar tide
keen beacon
#

I'm on my phone so I cant link it

fleet lintel
#

Waymo funding is counter example to your hypothesis

patent aspen
#

I think OAI can keep raising money for a very long time, and Google will keep investing for a very long time. Neither company is in danger of running out of funding for a while. The funding will dry up when one or both companies becomes unviable

ocean vortex
patent aspen
#

The first sign of trouble would be months of delays because a model isn't good enough yet

keen beacon
#

I mean 4.5 qualified for that I would think

#

I don't think you can use that as a signal alone

fleet lintel
#

It's kind of the same. If they see value in the long term then they are willing to spend.

late path
#

OpenAI has staked its entire future on GPT-5, and expectations for it are unimaginably high

#

but I don't know how much more GPT-5 can win after using Kingfall

fleet lintel
#

Openai is a master of hyping things up. And so far they have been quite successful with it

keen beacon
#

Yeah it's not that much better

#

It's a 2.5 pro revision imo

#

It's probably the next anon model

late path
#

I use it to analyze some conversational text content. It exhibits a level of stability and metacognitive abilities that I haven't seen in any other model

#

It reflects on its own thinking and has a thorough self-awareness

keen beacon
late path
#

And it doesn't just reason about STEM problems

fleet lintel
late path
keen beacon
#

The day Craig compliments google the world will end

late path
#

Okay, I just get overly excited about every bit of new progress on the model, maybe

fleet lintel
keen beacon
#

Yeah for many many reasons I doubt it's ultra

keen beacon
fleet lintel
#

I have no special feeling about whether it's ultra or not. I just want super crazy model

#

I have been blue balled by openai too many times. I'll try to keep my hype checked this time

keen beacon
#

I don't know about gpt 5 tbh

#

Their choice to midtrain 4o is interesting

#

To say the least

fleet lintel
#

I need 5-6 more years to achieve financial independence. I just don't want models to become too good in coding to replace me before that 🙂

keen beacon
#

Did you know at the time when you wrote that hint?

ocean vortex
fleet lintel
#

How accurate is this info?

#

Ah.. interesting. I didn't know that

late path
fleet lintel
#

Is increased model performance the result of more training or newer/better techniques?

patent aspen
#

Another thing to keep in mind is that a 10% overall boost could mean a 50% boost in one area and a 2% boost in another area

ocean vortex
# keen beacon

btw so you basically linked the exact same thing we already talked about. For a sec I thought this was actually something new 🤣

So these are all pre-release API with unknown inference setup

keen beacon
fleet lintel
ocean vortex
keen beacon
#

In that thing I cited

#

Nope

#

On the day of he reran it again and got 86%

ocean vortex
#

even token counts

patent aspen
ocean vortex
#

oh wait there are actually 2 with identical percentage but different counts

keen beacon
#

this is different

ocean vortex
#

yeah...

keen beacon
#

🤦‍♂️

#

i have no idea lol

fleet lintel
#

Maybe that's why Google shares are up

ocean vortex
#

I asked for them to elaborate, we will see what they have to say. Would be insanely stupid to rank 32k as higher when 3 days ago they got higher score on auto

keen beacon
#

youre supposed to do several and avg the benchmark scores

late path
#

Now OpenAI, Anthropic, and Google are all using GCP for their infra services

keen beacon
#

rn i think theyre picking the scores willy nilly

patent aspen
#

That's a concession from OAI not a concession from Google lol

ocean vortex
ocean vortex
#

lol

keen beacon
#

they actually used a different run on the aider website before they replaced with the new ones

ocean vortex
keen beacon
#

u literally have no proof for anything and just interpretation after interpretation

civic flame
#

ugh i hate this woman

unborn ocean
#

it's gpu compute

civic flame
#

crap benchmark, crap person, posts slop

#

lmao

fleet lintel
#

Is this the reason that Google is putting rate limits on Gemini models? Because of the openai deal? 🤔

fleet lintel
#

Live bench?

ocean vortex
# keen beacon dont worry i can prove it unlike anything u say

does not really look like it. The first thing I asked for proof and you couldn't do it 💀

The main difference between you and me is I never say definitively for something to be a fact if there's no evidence to support it. But you seem to mix a "feeling" or opinion with facts awfully lot tbh

brittle tiger
#

If Google was worried about OpenAI they wouldn't ink a major deal with them to give them compute. It's pretty simple.

fleet lintel
unborn ocean
brittle tiger
#

This would get Sundar and MGMT team scrutiny for sure.

fleet lintel
#

They can hire me and pay 1.5 million per year

#

Bankruptcy

unborn ocean
#

yes

#

i will grant you this one thing: you could probably manage some company from musk better than he can, considering all his other positions

#

but obv the man is the best marketing / salesman in the world (idk how, considering that i never liked him)

#

"no one thinks like me" and different šŸ¤

keen beacon
# ocean vortex does not really look like it. The first thing I asked for proof and you couldn't...

heres the 32k run. but anyway i was wrong about 0605, i misremembered 0506 and confused it with 0605. but i have literally sourced all of my stuff:

  1. you came up with this logic that they're classifying and dynamically setting max tokens without any documentation or etc whatsoever. saying auto picks within 32768, you asked for proof, i gave you two runs (on gem 2.5 flash and pro with thinking budget off), the raw thoughts of one of the runs, etc.
  2. receipts of different runs with the same settings and variance, etc.

FYI: the command to run the non think one was the same as the previous runs I cited

fleet lintel
#

Openai not gonna use tpu?

patent aspen
#

I would imagine they would use GPUs because they've built their entire stack around it

unborn ocean
#

no way they will use tpus
(or actually thinking about it, it could happen for some smaller things, although they have so much IP on gpu already, so it would need to be something that is really a new and small project, e.g. finally opensource something)

patent aspen
#

Or they just have to lower prices to stop the bleeding

ocean vortex
keen beacon
#

they dont do multiple and avg

#

afaik

patent aspen
#

I highly doubt o3 is any cheaper to serve now than it was before

#

If it was a different model (e.g. o4-mini) I could believe it

#

I think it's just pricing pressure

unborn ocean
ocean vortex
#

you came up with this logic that they're classifying and dynamically setting max tokens without any documentation or etc whatsoever. saying auto picks within 32768, you asked for proof, i gave you two runs (on gem 2.5 flash and pro with thinking budget off), the raw thoughts of one of the runs, etc.

yes I speculated on it, what's wrong with that, never said for those to be facts things I speculated on @keen beacon

As for "auto picks within 32768" that is literally how it's supposed to work reading their documentation. And the fact alone that setting max budget does not guarantee it not going over that as we saw, kinda proves I'm right

keen beacon
ocean vortex
late path
#

I think the cost of the official o3 release is already much lower than o1. They only reduced the price by one-third compared to o1, which is entirely to retain more profit

keen beacon
#

they only did one run for non think and thinking and put it on the leaderboard

ocean vortex
late path
#

Here's a joke: The price of chatgpt-4o-latest is $5/15 mtok

keen beacon
ocean vortex
keen beacon
#

🤷 im done arguing this. it was very pointless

ocean vortex
#

you were coming up with 'proof' and now it's an opinion all of a sudden? 🤣

keen beacon
fleet lintel
#

"Meta Is Creating a New A.I. Lab"

what happened to llama stuff? they are giving up?

keen beacon
#

🤣

ocean vortex
# keen beacon

oh ffs... How is this relevant now some message in a different context than today? Yeah I think I'm done.

fleet lintel
#

this fight is getting dumber and dumber. take a deep breath folks

ocean vortex
#

your issue is that you keep attacking based on your opinion if people don't agree with it

fleet lintel
#

i think price drop is just competitive pressure from gemini/claude

alpine coral
#

aren't they deprecating 4.5 soon (from chatgpt anyway)? that'd free up a bunch of capacity

late path
#

I think they felt the threat from 0605... otherwise they wouldn't have priced it exactly $2 lower than 2.5pro.

fleet lintel
#

crazy how good this model is with this kind of pricing

calm sequoia
#

Is there anything going on with chatgpt? The o3 suddenly is dumb 👀

late path
#

Just like last time they priced o3mini after r1 release, where r1's price was 1.1 USD/mtok (converted from 8 RMB/mtok). It makes no logical sense for them to set the price at 2.2 USD/mtok at first place

fleet lintel
#

this reminds me of r2. when is r2 coming?

#

3.5, r2, o3 pro... everything is so damn late

late path
#

October maybe, just saying

sour spindle
#

O3-pro is probably just going to be o3

late path
#

4o is now more expensive than o3 haha

calm sequoia
sour spindle
#

Wouldn't shock if it ever came out these companies were rerouting questions to dumber models, a lot of folks would never know the difference

late path
#

OpenAI has been downgrading models based on IP quality

sour spindle
#

Hell 4o would be glazing folks if not for a vocal minority calling it out.

#

Did I use that term correctly lol

jade egret
#

bro

calm sequoia
#

Same here

late path
# jade egret fr?

Are you buying Google on Polymarket? There's no money to be made now. This market is becoming more efficient lol

ocean vortex
#

their margins for it were much bigger than on smth like o4-mini

late path
#

o3 in many ways demonstrates that it is not a very large model, yet OpenAI markets it as a top-tier model from the o3preview specs

#

I mean, it's top tier, it's just not big and cheaper

ocean vortex
#

non-mini reasoning models I mean

calm sequoia
#

The gemini-2.5-pro-preview-05-06 was a slop compared to march

#

Also, google should not have released models with versions "06-05" vs "05-06" 😄 Even the gpt naming is better

keen beacon
#

o4 mini and 4o mini?

cedar tide
calm sequoia
#

I understand this can be done unwillingly. But for the end user (me) experience is the same.

civic flame
#

i hope it's not mid 😭😭

torn mantle
#

they nerfed o3

#

and this o3-pro is the previous o3

civic flame
#

it's going to be better but by how much is another question

calm sequoia
calm sequoia
#

It can't be the same as Grok 3.5

jade egret
#

it not out tho

#

apple stock aint growing that much

#

fr?

ocean vortex
#

they overdid it with glass, looks cheap in certain conditions

fleet lintel
#

WWDC was an embarrassment. how is apple not 5% down?

ocean vortex
#

hopefully that's gonna be improved, but it's not perfect right now for sure lol

fleet lintel
#

liquid glass design.. woopdie freaking doo.. who cares.

misty vault
elder rapids
#

icl

#

I wonder how 2.5 deepthink

#

Is going to be

cedar tide
#

New o3 price
more than 2 times cheaper than 2.5 pro 😶

elder rapids
#

def not gonna be the case in practice

cedar tide
elder rapids
#

yeah we know, the problem is that it's not going to be the actual price of the model so how they marked the price could've been completely arbitrary

#

ie input cost and output cost could be equal

#

what is bro talking about

#

this is an objective loss

#

😭

#
  • apple intelligence ain't doing shi
cedar tide
#

@elder rapids I didn't understand what you meant

elder rapids
#

clearly you've never been on a Samsung

#

it's crazy with the ai

elder rapids
# cedar tide <@887104792437092352> I didn't understand what you meant

it DOES depend on the task, input cost and output cost could be equal at certain points in practice, but it could be that it simply reasons more than 0605 and outputs more so the price to run fixed tasks wouldn't show the whole story. Since the input wouldn't entail that kind of behavior

#

Ive had a ton of phones over the years tbh

#

iPhone, Samsung, Google

#

iPhone and Samsung simultaneously

#

and I gotta say

#

Samsung does everything pretty much better now

#

can't remember a time where actually going on iPhone was more convenient and intuitive

#

also btw what do you guys think about the possibility of Gemini 3 coming soon

#

the GA releases of the 2.5 series is soon

cedar tide
elder rapids
#

prob ye

cedar tide
#

@elder rapids Under what conditions do you want o3 to be more expensive?

elder rapids
#

😭

cedar tide
#

@elder rapids Ah, we're not talking about the same thing.

keen beacon
#

does anyone have a svg prompt that kingfall does well on vs 2.5 pro?

#

its supposed to be the same

#

im really not impressed with kingfall atm

#

if any1 has prompts please provide

jade egret
elder rapids
#

but you won't get a generally good result

sour spindle
# jade egret

When these polls are posted are we talking about actual use case or lm leaderboard ranking

zinc ore
#

Should be comparing pro with deepthink

elder rapids
#

I think that would necessarily be true imo, it gets to a point where perfect information that is inevitably obtained in multiple conclusions results in better performance, even if the perfect information that resulted in a better independent conclusion was the result

#

deepthink*

#

yeah you're mistaken

late path
#

Alright, I saw an article claiming that o3 and gpt4.1 share the same base. That makes perfect sense. They have now unified the price of o3 with 4.1.

elder rapids
#

no, he never mentioned whether kingfall is or isn't a 2.5 pro model

#

yes

#

its not deepthink

jade egret
#

when do you think it gonna be out at least in the ai studio?

elder rapids
#

replace deepthink with kingfall here and see what you're replying to

keen beacon
#

probably soon

elder rapids
#

you're mistaken, since that's irrelevant

sacred quail
#

Was kingfall really that good ?

elder rapids
#

no

#

it could even be the last generation ultra

#

tbh

#

you drew that didn't you

#

spit it out brobro

keen beacon
#

yes manually

#

sorry i had to lie

elder rapids
#

caught

elder rapids
#

snakes aren't always in the water

#

sharks aren't always in the ground

#

💯🙏

late path
#

I sincerely hope gemini 3.0 pro can comprehensively surpass kingfall and have deepthink

keen beacon
#

im sorta disappointed tbh, it doesn't seem THAT much better

#

yes

torn mantle
keen beacon
#

yeah

#

i suppose the dramatic code name does make sense

nimble trail
#

So Kingfall is not deepthink?..

keen beacon
#

i wonder whats pricing going to be like

#

yes

patent aspen
#

I just want to see the benchmarks

cedar tide
wintry tinsel
#

What is ultra chat, Mistral large 3??

keen beacon
cedar tide
#

Ultra mode ?

#

What do you say ?

keen beacon
#

nothing important

elder rapids
#

he likely had access

keen fulcrum
#

So, AI will change education forever. Kids who are not educated using latest tech will fall behind

elder rapids
#

or he could've

late path
#

previously OpenAI made some changes to o1-pro in pro accounts, adding a search function, and it claims to be o3

keen beacon
#

o1 pro doesn't have web access. his did and claimed to be o3

late path
#

But we're not sure if that's really o3pro

keen fulcrum
leaden sun
elder rapids
#

I don't suspect o3 pro is going to be much more than a high high thinking mode, as opposed to o1 pro being an insane gap

keen beacon
keen fulcrum
keen beacon
#

i think, at least based on his past comments

leaden sun
keen fulcrum
#

Due to bureaucracy and politics AI won't be introduced in public schools in the next 10-20 years

keen beacon
#

🤣

leaden sun
keen fulcrum
#

The future is either homeschooling or private schools for education

patent aspen
#

Why did OAI announce o3 pro this morning and then wait several hours to launch?

#

Maybe the ChatGPT outage?

#

Or maybe the rollout caused the outage

cedar tide
wintry tinsel
#

It's really not for kids, it's very good for a tight knit community and traditional values with more focused education, well depends on the private school

wintry tinsel
leaden sun
wintry tinsel
#

Classical private schools give a way better education, you learn actual critical thinking, philosophy, and unbiased non lobotomized history

#

Public schools are excruciatingly leftist or stem focused