#general | Arena | Page 54

ocean vortex Jun 9, 2025, 2:52 PM

#

same output length except now the entire ~20k is in thinking

patent aspen Jun 9, 2025, 2:54 PM

#

The Meta Quest has wayyy more engineering put into it than the Vision Pro. That's why it's so cheap. If Apple attempted to sell a $350 Vision Pro, it would be trash.

alpine coral Jun 9, 2025, 2:55 PM

#

i've been using the default temp and top p settings.. one sec lemme see

patent aspen Jun 9, 2025, 2:56 PM

#

Sure I mean for $3500 it should have some advantages

#

Just not enough to pay $3500

#

And for most people not enough to pay $1000 over a Meta Quest

#

Heck if people weren't mindless consumers it wouldn't even be more useful than a Meta Quest ignoring price because of the lack of software

ocean vortex Jun 9, 2025, 3:00 PM

#

yeah fair, it does. It does usually consume very few tokens which is why I started questioning it

#

29k thinking, that's more than it did on aistudio with 32k thinking budget. Don't see much to correlate it with longer thinking

#

I'm putting it down to Aider and their testing personally...

#

or their "default" is this:

keen beacon Jun 9, 2025, 3:16 PM

#

nah they said the model determines it itself (which is when thinking budget is off)

keen beacon Jun 9, 2025, 3:16 PM

#

ocean vortex I'm putting it down to Aider and their testing personally...

yeah i think so

late path Jun 9, 2025, 3:17 PM

#

Apple is just willing to spend money to use the best screens and integrate eye-tracking and other features into one device. Meta and other companies are fully capable of making the same product; it's just that they would all have to sell it for over $2000

#

I remember the two micro-OLED screens inside it alone cost over $800. This could have been an excellent PCVR headset, but Apple's motivation for making products is always to build their own ecosystem

jade egret Jun 9, 2025, 3:38 PM

#

guys

#

wwdc today

#

will apple finally catch up?

patent aspen Jun 9, 2025, 3:38 PM

#

jade egret will apple finally catch up?

In AI?

jade egret Jun 9, 2025, 3:39 PM

#

patent aspen In AI?

yea

patent aspen Jun 9, 2025, 3:39 PM

#

No

jade egret Jun 9, 2025, 3:39 PM

#

i think it not gonna catch up

#

it way behind

#

but maybe gemini in iphones?

#

o

#

oh

#

google stock going up if gemini in iphone

leaden sun Jun 9, 2025, 3:42 PM

#

jade egret i think it not gonna catch up

catching up through acquisition is possible tho, like MS and OAI

#

not grok?

keen beacon Jun 9, 2025, 3:43 PM

#

elon would never sell lol

leaden sun Jun 9, 2025, 3:43 PM

#

claude is a good choice tho

keen beacon Jun 9, 2025, 3:44 PM

#

nah

patent aspen Jun 9, 2025, 3:44 PM

#

Apple probably can't acquire Anthropic. They may be able to structure a deal that's very similar to an acquisition, although it will be difficult to get through regulatory scrutiny

keen beacon Jun 9, 2025, 3:44 PM

#

tbh openai is if ur measuring agentic imo. with the image editing/tools usage/etc

errant cave Jun 9, 2025, 3:45 PM

#

I predict DeepSeek-R1-0528 will rank below DeepSeek-V3-0324. I'm trying it out on DeepSeek's website right now and it's not very good. Sad to see the last major open LLM champion losing their touch. Hope it's just a stutter step

leaden sun Jun 9, 2025, 3:46 PM

#

i thought sydney? someone needs to make a UI for it secretly publicly 🥹

patent aspen Jun 9, 2025, 3:47 PM

#

leaden sun catching up through acquisition is possible tho, like MS and OAI

MS didn't acquire OAI, and it's unlikely that a deal like this could be made in 2025

keen beacon Jun 9, 2025, 3:47 PM

#

even in the trump admin?

patent aspen Jun 9, 2025, 3:47 PM

#

Yup

keen beacon Jun 9, 2025, 3:48 PM

#

ftc walked away from hounding the ms/activision acquisition during the new trump admin didn't they, i recall reading

#

no

#

the ms/activision thing is recent

#

wut

#

are you talking about

leaden sun Jun 9, 2025, 3:50 PM

#

they are the ones who built sydney after all 😆

jade egret Jun 9, 2025, 3:52 PM

#

how

keen beacon Jun 9, 2025, 3:52 PM

#

dont question him

#

he is omniscient

jade egret Jun 9, 2025, 3:52 PM

#

.

keen beacon Jun 9, 2025, 3:56 PM

#

if it was the biden admin it would never happen but in the trump admin its a significantly higher possibility maybe. but i dont know much

jade egret Jun 9, 2025, 3:57 PM

#

i don't think apple is going to buy anthropic

keen beacon Jun 9, 2025, 3:57 PM

#

inflection / ms reminds me of the c.ai thing too

#

they really gutted inflection lol

patent aspen Jun 9, 2025, 4:00 PM

#

All I can say is: there is a 0% chance that Apple attempts to acquire Anthropic without the FTC getting involved, although that doesn't guarantee they don't get away with it

jade egret Jun 9, 2025, 4:01 PM

#

patent aspen All I can say is: there is a 0% chance that Apple attempts to acquire Anthropic ...

yea

patent aspen Jun 9, 2025, 4:02 PM

#

Most likely doesn't mean likely

#

I wouldn't be shocked if they tried to structure a pseudo acquisition like deal

keen beacon Jun 9, 2025, 4:03 PM

#

why wouldnt they try for more, the climate seems more accommodating atm (if they were to try to do something like this)

#

tim cook should buy trump several private jets and yachts 🤣

#

he cared about the big ass plane from qatar

#

because they have anon models all the time

#

they do the same if openai has an anon model

#

no they do the same if openai has an anon model

#

openai anon models are never anything that special tho even if they rank high. nothing like nebula/etc

sour spindle Jun 9, 2025, 4:16 PM

#

Dust has settled a bit how are people liking or disliking Gemini 06-05

late path Jun 9, 2025, 4:21 PM

#

liking, but I'm liking kingfall more

sour spindle Jun 9, 2025, 4:34 PM

#

Is that in arena?

#

How to access it

sacred quail Jun 9, 2025, 4:50 PM

#

Gemini 06/05 is really good at writing and emotive like GPT-4.5

#

And at the same time amazing for coding

#

Also best at long context

#

They really made a special model

#

For me, o3 is still best for reasoning stuff, im finding o3's outputs best, but generally i'd say the latest gemini is the best LLM right now

#

it was goldmane in lmarena

#

And everyone already said good things about goldmane

#

Goldmane is latest 06/05

unborn ocean Jun 9, 2025, 5:15 PM

#

craig you really are selling us snake oil with these new updates

#

liquid glass, smh

echo aurora Jun 9, 2025, 5:42 PM

#

16K thinking budget for claude-opus-4-20250514-thinking-16k
32K thinking budget for claude-sonnet-4-20250514-thinking-32k

cedar tide Jun 9, 2025, 5:42 PM

#

echo aurora > 16K thinking budget for claude-opus-4-20250514-thinking-16k > 32K thinking bud...

Thx you very much

misty vault Jun 9, 2025, 5:49 PM

#

leaden sun they are the ones who built sydney after all 😆

so real

leaden sun Jun 9, 2025, 5:51 PM

#

I thought it's element 115? i want my own flying saucer while am still alive 🥹

brittle tiger Jun 9, 2025, 5:53 PM

#

If Apple marketing ppl had committed themselves to AI research instead we'd have AGI by now

elder rapids Jun 9, 2025, 5:53 PM

#

jade egret i don't think apple is going to buy anthropic

what if google buys anthropic

brittle tiger Jun 9, 2025, 5:53 PM

#

Amazon would never allow either

misty vault Jun 9, 2025, 5:54 PM

#

crack bench

jade egret Jun 9, 2025, 6:00 PM

#

elder rapids what if google buys anthropic

then google is

#

going to be stacked

#

hell nah...

#

true

#

but

#

google don't focus on devices that much

#

you can also say

#

true

#

but apple is not rlly 'killing' google yet

keen beacon Jun 9, 2025, 6:02 PM

#

perplexity browser will take over

jade egret Jun 9, 2025, 6:03 PM

#

gonna be a long time

#

cuz

#

google said it is gonna appeal

#

https://www.youtube.com/watch?v=0_DjDdfqtUE

WWDC

YouTube

Apple

WWDC 2025 — June 9 | Apple

Get a sleek peek at what’s to come this WWDC. This year’s week of technology, community, and creativity with developers across the world kicks off on June 9 at 10 a.m. PT. Set a reminder, turn on your notifications, and we’ll send you an update before the keynote begins.

Tune in to the Platforms State of the Union livestream here: https:/...


patent aspen Jun 9, 2025, 6:04 PM

#

Why would Apple make their own search engine when they could charge for real estate?

keen beacon Jun 9, 2025, 6:04 PM

#

its bad pr for them tho kinda

#

idk if they make a search engine and do ads

patent aspen Jun 9, 2025, 6:05 PM

#

How do you make decent ads if you're privacy focused?

keen beacon Jun 9, 2025, 6:05 PM

#

its just a bad idea for apple i think

patent aspen Jun 9, 2025, 6:05 PM

#

No I didn't

keen beacon Jun 9, 2025, 6:06 PM

#

less effort, free money, doesnt harm brand

patent aspen Jun 9, 2025, 6:06 PM

#

Apple could just charge for real estate, which is more lucrative for companies that are less privacy focused

#

If they did it themselves, it would be less profitable and worse for PR

keen beacon Jun 9, 2025, 6:09 PM

#

its cool ngl

patent aspen Jun 9, 2025, 6:10 PM

#

How much does it cost?

keen beacon Jun 9, 2025, 6:11 PM

#

just 1 arc agi 2 task ran by o3 preview high

patent aspen Jun 9, 2025, 6:11 PM

#

How much?

#

LMAO

keen beacon Jun 9, 2025, 6:12 PM

#

its less than two tasks actually

#

o3 preview taking 3.5k on a single task 😂

patent aspen Jun 9, 2025, 6:13 PM

#

For that much it would need to help people find the love of their life and raise their children

keen beacon Jun 9, 2025, 6:14 PM

#

money definitely helps though

patent aspen Jun 9, 2025, 6:15 PM

#

I'm joking

#

As in an arm and a leg

#

Would have been cliche

keen beacon Jun 9, 2025, 6:16 PM

#

btw isnt vision pro 3.5k?

#

not 4.5k

#

it starts at 3.5k

unborn ocean Jun 9, 2025, 6:18 PM

#

patent aspen Apple could just charge for real estate, which is more lucrative for companies t...

i believe there is no way that these exclusive contracts will hold in the future (maybe paying to be an option though idk)

#

and apple does not have enough data in any shape or form to profit as much as other companies could

so: no real push to browsing and no real profits at all in the area

patent aspen Jun 9, 2025, 6:20 PM

#

Real talk though. What does the vision pro enable an average or upper middle class person to do that is worth $4500 and can't be done by a Meta Quest?

unborn ocean Jun 9, 2025, 6:20 PM

#

nothing besides waving your hands to do some things
(which most people's car can already do at a very basic level, lol)

patent aspen Jun 9, 2025, 6:22 PM

#

Why is that worth $4500 to an average or upper middle class person?

unborn ocean Jun 9, 2025, 6:22 PM

#

i think you can do the same thing with windows + meta quest

#

which is arguably more important, bc more windows (the operating system)

patent aspen Jun 9, 2025, 6:23 PM

#

Just over 6 figures

unborn ocean Jun 9, 2025, 6:23 PM

#

so your argument is that the apple ecosystem is good enough to make you pay an extra 4000$ for the same features (though admittedly in a more polished version)

patent aspen Jun 9, 2025, 6:24 PM

#

Exactly?

unborn ocean Jun 9, 2025, 6:24 PM

#

patent aspen Just over 6 figures

for a family yes, otherwise 6 fig is quite a lot

#

look at stats, people can (obv very hard in SF / NYC...)

keen beacon Jun 9, 2025, 6:25 PM

#

if ur barely 6 figures its not that much (depending on the situation)

patent aspen Jun 9, 2025, 6:25 PM

#

I think you're arguing my point

keen beacon Jun 9, 2025, 6:27 PM

#

doing what?

#

lol

unborn ocean Jun 9, 2025, 6:28 PM

#

professional xAI stan

patent aspen Jun 9, 2025, 6:30 PM

#

There isn't even enough polymarket volume on the AI categories to be making that much

unborn ocean Jun 9, 2025, 6:31 PM

#

ye, was kidding

keen beacon Jun 9, 2025, 6:31 PM

#

every person betting there is secretly craig

unborn ocean Jun 9, 2025, 6:33 PM

#

i mean for you to make 1m per annum from investments (while not working in the sector and being very young) is only really possible if you inherited millions

patent aspen Jun 9, 2025, 6:33 PM

#

unborn ocean i mean for you to make 1m per annum from investments (while not working in the s...

Exactly

unborn ocean Jun 9, 2025, 6:33 PM

#

if you count asset appreciation then the people running lmarena sure are making millions a year

#

so a lot of start up founders in the SF area will likely make that money

#

well with the asset appreciation the founders prob all made that money this year

patent aspen Jun 9, 2025, 6:36 PM

#

That is fair

unborn ocean Jun 9, 2025, 6:36 PM

#

yeah rn, so much cash flowing in

#

especially ai

#

investment is already decreasing

#

has been for like a year now

#

VC and PE hype was a bit older and then the ai boom set in

#

so it kept going for a bit

keen beacon Jun 9, 2025, 6:38 PM

#

you named frontier ai labs ofc they would be healthy

little thorn Jun 9, 2025, 6:38 PM

#

Why i get this error

Screenshot_20250609_225316_com.android.chrome.jpg

unborn ocean Jun 9, 2025, 6:43 PM

#

Talking to Craig is like speaking with multiple people, bc he just switched his stance on arguments and randomly switches the hypetrain from one company to the other.

zinc ore Jun 9, 2025, 6:46 PM

#

unborn ocean Talking to Craig is like speaking with multiple people, bc he just switched his ...

The voices in his head are waiting for their turn to speak

tall summit Jun 9, 2025, 6:53 PM

#

little thorn Why i get this error

is there an official reason yet? i think it's a token limit per chat

unborn ocean Jun 9, 2025, 6:58 PM

#

But google stock is too volatile or unpredictable imo, I made and lost soo much on it already

#

Evens out at some point though

#

No

tall summit Jun 9, 2025, 6:59 PM

#

@little thorn

unborn ocean Jun 9, 2025, 6:59 PM

#

So maybe I am too quick to label them

#

yes, compared to other american companies they have a low eval

#

but only because of that case

#

yeah me2

#

stopped most of the stock picking i am doing

#

imo i just don't get most American investors, so i am usually really bad at even judging investments

#

but i am also mostly index + save value investing-like stock picking

patent aspen Jun 9, 2025, 7:03 PM

#

You shouldn't try to predict other people's investments when investing unless you really know what you're doing

small haven Jun 9, 2025, 7:03 PM

#

@deep adder how is grok 3.5 so far

unborn ocean Jun 9, 2025, 7:03 PM

#

patent aspen You shouldn't try to predict other people's investments when investing unless yo...

yeah ik, was more talking about even understanding what the market rationale is for the current eval

#

just weirds me out a lot

#

(but for some other areas i am more proficient, like german stocks and that is about it)

#

i also study economics so i like to believe that i know that i don't know what i am doing

patent aspen Jun 9, 2025, 7:05 PM

#

You either assume the market is irrational and accept that it can remain irrational indefinitely. Or you assume that it's rational and accept that you have to really know what you're doing to do better

#

If it remains irrational indefinitely, you can still make money if you buy something for less than what it's worth because of dividends, buyouts, buybacks, etc

unborn ocean Jun 9, 2025, 7:06 PM

#

patent aspen You either assume the market is irrational and accept that it can remain irratio...

expecting it to be fully rational means that the only way you could make a profit in the market would be to accept higher risks than other traders

patent aspen Jun 9, 2025, 7:07 PM

#

I think it's clear that it's not entirely rational and can remain irrational for long periods of time

unborn ocean Jun 9, 2025, 7:08 PM

#

and there is also the option (mostly supported by academia) to base your judgement on the efficient market hypothesis and then diverge from it in certain areas (often time frames)

#

and the other option is that you have more information (e.g. a google insider 🤣 ) (to still make money)

jade egret Jun 9, 2025, 7:10 PM

#

bro

#

guys

#

rate wwdc 1 - 10

unborn ocean Jun 9, 2025, 7:10 PM

#

well, you'll be one of the first to know prob

#

so that is kind of the angle here

jade egret Jun 9, 2025, 7:10 PM

#

o

#

well

#

idk

unborn ocean Jun 9, 2025, 7:10 PM

#

0 (jk, somewhere around 5)

jade egret Jun 9, 2025, 7:10 PM

#

i dont think it that good

#

i didnt watch the whole thing tho

#

true..

#

down like 1.3%

unborn ocean Jun 9, 2025, 7:11 PM

#

did they have any new stunts with you @deep adder

#

if no -> 0

jade egret Jun 9, 2025, 7:15 PM

#

oh actually?

#

true

#

google i/o also went down at the day but went up a day after

patent aspen Jun 9, 2025, 7:16 PM

#

That's totally fair. 1-day movements are kind of a joke metric

jade egret Jun 9, 2025, 7:16 PM

#

wait

#

so technically

#

if you invest rn

jade egret Jun 9, 2025, 7:17 PM

#

jade egret if you invest rn

you earn a lot of money 0_0

unborn ocean Jun 9, 2025, 7:17 PM

#

well what are you still studying

patent aspen Jun 9, 2025, 7:17 PM

#

jade egret if you invest rn

Past movement doesn't predict future movement

unborn ocean Jun 9, 2025, 7:17 PM

#

was it just after undergrad

unborn ocean Jun 9, 2025, 7:17 PM

#

patent aspen Past movement doesn't predict future movement

somewhat true and false

jade egret Jun 9, 2025, 7:17 PM

#

patent aspen Past movement doesn't predict future movement

true

patent aspen Jun 9, 2025, 7:18 PM

#

What it boils down to is what did the slower investors see today that the faster investors didn't

#

But really you shouldn't bet on what other humans are going to do unless you're Jim Simons or something

#

You decide what it's worth

#

Ultimately to be fundamentally sound you would decide what the company is actually worth in light of the new information without assuming the old valuation was valid

#

That's pretty hard

#

So almost nobody does it

#

Because people don't like doing hard things

#

Even professionals

unborn ocean Jun 9, 2025, 7:24 PM

#

patent aspen Ultimately to be fundamentally sound you would decide what the company is actual...

and to really do that you would need close to the same, if not more, information ("close to" because of irrationalities and some other concepts) than the WHOLE market for the stock had in the past

#

which kind of makes it impossible in practice (to really do all of this profitable without relying on the actual past values)

#

bc the older stock price in some way represents all the available information

#

rationality / the efficient markets hypothesis is an assumption that one works under to simplify in most cases

#

nobody seriously believes in it (in the sense that it always holds without conditions), although, surprisingly to many amateurs (e.g. craig ;), it does actually hold up most of the time

jade egret Jun 9, 2025, 7:27 PM

#

hi

unborn ocean Jun 9, 2025, 7:29 PM

#

the good thing though is that the more firms like modern quants include more variables and keep pushing towards an efficient market, the closer we get to it

#

until market changes and all of them are cooked

jade egret Jun 9, 2025, 7:32 PM

#

lol

late path Jun 9, 2025, 7:33 PM

#

I think it's still difficult for quants to model market expectations for events like Nvidia's earnings release or WWDC

patent aspen Jun 9, 2025, 7:34 PM

#

tbh it's really hard to model stuff more than 18 months in the future in tech companies

narrow elbow Jun 9, 2025, 7:39 PM

#

soooo, apple intelligence aka (AI), apple general intelligence aka (AGI), apple superintelligence aka (ASI), that's why rate 10, right?🤪

keen beacon Jun 9, 2025, 8:01 PM

#

2.5 flash just did 62k thinking tokens for me lol

#

never seen that much ever

narrow elbow Jun 9, 2025, 8:03 PM

#

hmm, privacy first? all that real world data, hardware data, app data, labeled human behavior data, it's all perfect data if packaged for training. itd be impressive if they actually didnt use this goldmine😏

late path Jun 9, 2025, 8:09 PM

#

keen beacon 2.5 flash just did 62k thinking tokens for me lol

isn't the maximum thinking budget only 24k?

keen beacon Jun 9, 2025, 8:10 PM

#

late path isn't the maximum thinking budget only 24k?

thinking budget is just a max token limit for the thoughts. if you disable it u can do much more than 32k

#

their implementation isnt like claude or openai yet

#

where the model is aware of it

#

afaik at least. these are two separate points, im not entirely sure of the second anymore

#

but the first thing is definitely still a thing
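
A minimal Python sketch of the first point above, assuming the thinking budget acts purely as a hard cap on thinking tokens and that disabling it removes the cap. The names `thinking_tokens_used`, `wanted`, and `budget` are hypothetical and not part of any provider's API; this is a toy model of the claim, not how Gemini actually implements it.

```python
from typing import Optional


def thinking_tokens_used(wanted: int, budget: Optional[int]) -> int:
    """Toy model: a thinking budget is only a ceiling on thinking tokens.

    wanted -- tokens the model would spend if nothing stopped it (hypothetical)
    budget -- the cap, or None when the budget is disabled (assumption)
    """
    if wanted < 0:
        raise ValueError("wanted must be non-negative")
    if budget is None:
        # No cap: the model decides for itself how long to think.
        return wanted
    if budget < 0:
        raise ValueError("budget must be non-negative")
    # With a cap, thinking is cut off at the budget and never exceeds it.
    return min(wanted, budget)


if __name__ == "__main__":
    print(thinking_tokens_used(62_000, None))
    print(thinking_tokens_used(62_000, 24_000))
```

Under this model a 62k run is only possible with the budget disabled, which matches what is described here. It says nothing about whether the model is aware of its budget, which is the second, less certain point.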

#

using the same prompt on 2.5 pro it did 38k thinking tokens and it took 6 minutes to think (not including response). i did stuff to get raw cot on some runs, and it was hilarious ("I am an idiot.", "This is painful.", not even joking)

small haven Jun 9, 2025, 8:20 PM

#

what is liquid glass exactly

patent aspen Jun 9, 2025, 8:22 PM

#

small haven what is liquid glass exactly

It's an iOS UI refresh that is a bit more glassy (modern UX trend)

small haven Jun 9, 2025, 8:23 PM

#

interesting, apple too is just tweaking things ..

late path Jun 9, 2025, 8:24 PM

#

looks like the old frosted glass UI style with a transparent CSS border that has highlights

patent aspen Jun 9, 2025, 8:24 PM

#

It looks like Windows Vista

small haven Jun 9, 2025, 8:25 PM

#

css update boys!

patent aspen Jun 9, 2025, 8:26 PM

#

UX is cyclical

small haven Jun 9, 2025, 8:26 PM

#

we're going back to 3d icons lets go

patent aspen Jun 9, 2025, 8:26 PM

#

Like fashion

#

Gradients also

keen beacon Jun 9, 2025, 8:27 PM

#

next year on device models should be pretty decent tbh

#

qwen 3 is crazy

#

WOW

#

ive seen enough thats ASI

late path Jun 9, 2025, 8:29 PM

#

hardcoded

patent aspen Jun 9, 2025, 8:30 PM

#

They had to find a place to differentiate, and that's the best they could do

#

Making nano-sized models decent is really hard

unborn ocean Jun 9, 2025, 8:33 PM

#

well the industry is getting way better at it

#

qwen, heck even LG got good models

patent aspen Jun 9, 2025, 8:33 PM

#

Small models are improving way faster than big models

#

They don't create the same kind of hype though

keen beacon Jun 9, 2025, 8:35 PM

#

qwen 3 4b is outright amazing

#

i think you can run that on a smartphone

#

to what?

#

to itself 🤣

patent aspen Jun 9, 2025, 8:44 PM

#

I don't know how to parse "o3" and "on-device" in the same sentence

lapis light Jun 9, 2025, 9:32 PM

#

patent aspen They don't create the same kind of hype though

Yeah, until Rockstar puts them in their NPCs

patent aspen Jun 9, 2025, 11:02 PM

#

I just realized Craig Federighi was the Apple guy. I knew Craig was his first name but didn't realize Federighi was his last name

civic flame Jun 9, 2025, 11:09 PM

#

lmao

#

https://tenor.com/view/craig-federighi-federighi-parkour-parkour-apple-run-gif-14587895504914454435


patent aspen Jun 9, 2025, 11:11 PM

#

ngl I just watched a WWDC recap and was underwhelmed. I wasn't expecting much but damn

#

I'm probably going to move from a Macbook Pro to a Linux laptop

jade egret Jun 9, 2025, 11:22 PM

#

patent aspen ngl I just watched a WWDC recap and was underwhelmed. I wasn't expecting much bu...

real...

#

same...

#

do you think apple stock gonna go up tomorrow?

patent aspen Jun 9, 2025, 11:31 PM

#

No idea. tbh it might be driven by the overall stock market as much as WWDC

#

I'm not aware of anything announced at WWDC that I would expect to increase Apple's revenue or reduce its costs significantly

jade egret Jun 9, 2025, 11:43 PM

#

patent aspen I'm not aware of anything announced at WWDC that I would expect to increase Appl...

same

small haven Jun 9, 2025, 11:43 PM

#

patent aspen I'm probably going to move from a Macbook Pro to a Linux laptop

arch + niri wm w combo

balmy mist Jun 9, 2025, 11:43 PM

#

yo wtf is apple on today?

#

i missed the event

patent aspen Jun 9, 2025, 11:44 PM

#

small haven arch + niri wm w combo

yooo

#

that looks sick ngl

small haven Jun 9, 2025, 11:44 PM

#

thanks :p

patent aspen Jun 9, 2025, 11:44 PM

#

I was planning on switching to arch

small haven Jun 9, 2025, 11:45 PM

#

its 100% worth it

#

just dont forget to install btrfs instead of ext as ur file system lol

#

backups are important here

misty vault Jun 9, 2025, 11:47 PM

#

balmy mist i missed the event

agi

jade egret Jun 9, 2025, 11:48 PM

#

nah

#

no agi

high ginkgo Jun 9, 2025, 11:52 PM

#

jade egret no agi

yes agi

jade egret Jun 9, 2025, 11:53 PM

#

: (

golden ocean Jun 10, 2025, 12:12 AM

#

What is the update

elder rapids Jun 10, 2025, 12:33 AM

#

seems like a faulty premise, if they've always been consistently behind apple in regards to devices then you can just disregard the comparison

#

Samsung is killing iPhone

#

oh wait, has that always been the case? or is it that, this doesn't matter

#

duh, it doesn't matter.

patent aspen Jun 10, 2025, 12:38 AM

#

Google doesn't need people to use Google devices. They just really want people to use Android

#

I don't really see gen Z normies in the US switching to Android yet

leaden palm Jun 10, 2025, 12:46 AM

#

it's crazy that gemini can transcribe this text in a plausible way

patent aspen Jun 10, 2025, 12:52 AM

#

I've heard people say that Apple will be like blackberry and fade into obsolescence because they're behind on AI, but I don't know who that hypothetical iOS-to-Android "switcher" would be. I can't think of any killer Android-specific AI feature that would make a normie decide to switch to a new mobile OS

#

Your username is @deep adder

#

Anyway I'm not going to take that bait. I could imagine it being more feasible if Apple gets softened up by a few antitrust regulations in the US and EU

#

You know what would be even better than 2 different companies partnering for deep AI integration?

leaden palm Jun 10, 2025, 12:56 AM

#

patent aspen Jun 10, 2025, 12:56 AM

#

One company owning the entire stack

#

tbh I think Material 3 looks way better

#

More personal, colorful, quirky

#

Makes me happy

keen ferry Jun 10, 2025, 1:23 AM

#

leaden palm

first version is the best

leaden palm Jun 10, 2025, 1:24 AM

#

keen ferry first version is the best

you mean (1)?

keen ferry Jun 10, 2025, 1:24 AM

#

leaden palm you mean (1)?

small haven Jun 10, 2025, 1:26 AM

#

what the hell is going on in here lol

patent aspen Jun 10, 2025, 1:27 AM

#

I don't know how we got here

small haven Jun 10, 2025, 1:27 AM

#

patent aspen Jun 10, 2025, 1:27 AM

#

That's the stuff

#

Checkmate @deep adder

#

tbh if a woman judged me on my mobile OS, I'd appreciate that filter

leaden palm Jun 10, 2025, 1:33 AM

#

said woman:

leaden sun Jun 10, 2025, 1:35 AM

#

trifold should be a more popular design

patent aspen Jun 10, 2025, 1:35 AM

#

You know. You only get a few wishes to spend. Spending one of them on software preference is a choice haha

#

There's a book called the Science of Happily Ever After. They talk about an exercise where people have $300 to allocate to all of their preferences in a perfect partner. Then they keep the prices fixed and change the exercise so they only get $100, and that was a more realistic representation

#

The upshot of that and other research is that selecting the wrong wishes tends to lead to long-term unhappiness in marriage

#

So picking a few really important things greatly improves your chances

#

It's more like you meet someone and have "3 wishes" and don't continue with the person if they don't satisfy the 3 wishes

#

It's good to have a few hard criteria but not too many

#

One thing to keep in mind is that it's hard to have criteria until you've had a few relationships that didn't work

#

We are so far off-topic at this point

#

AI

elder rapids Jun 10, 2025, 2:08 AM

#

god bro

#

0605 is so good

#

every once in a while I'm just gonna be glazing this model

#

😭

late path Jun 10, 2025, 2:29 AM

#

Kingfall is even better, I can't live without Kingfall anymore. I feel like if this model were released now, GPT-5 would have to be postponed immediately

#

gpt4.5's so-called 'large model vibe' is like kids play compared to kingfall

#

This model has metacognitive abilities I've never seen before

elder rapids Jun 10, 2025, 2:33 AM

#

late path Kingfall is even better, I can't live without Kingfall anymore. I feel like if t...

nah in my testing kingfall wasn't like, insanely good tbh. Base kingfall > base 0605, but prompted 0605 > prompted kingfall when you actually take it slow + system instructions exemplify non sycophancy (and not just telling it "don't be sycophantic")

#

kingfall had a limit imo

#

it didn't do what I asked it to do very well

#

but it appeared that way initially

#

because it was already super good

#

just didn't really get any "better"

late path Jun 10, 2025, 2:34 AM

#

0605 often chooses to skip thinking during multi turn conversations (my thinking budget is always 32k). Kingfall has never done this

elder rapids Jun 10, 2025, 2:34 AM

#

late path 0605 often chooses to skip thinking during multi turn conversations (my thinking...

ion think this is really an issue, you have to coerce it into believing it NEEDS to think that long

#

ie, forcing it to be professor ish, or telling it to literally brute force the task

#

the reason why 0325 was perceived so good was because it was argumentative and wanted to choose things on its own

#

the reason why 0506 underperformed was because it was already so "good" but you really had to make it behave that way through the context

#

0605 has insane base performance but it lacks any sense of being professor ish, so it's really just THAT smart, where it doesn't suffer like 0506

#

and just gets by

#

so with a little tweaking you really make 0605 into something

#

that's what I've been focusing on, and did, with the prompting

#

and I've figured it out tbh

#

it makes me feel like no one has the capabilities I have rn

late path Jun 10, 2025, 2:38 AM

#

elder rapids that's what I've been focusing on, and did, with the prompting

Would you be willing to share a general system prompt?

elder rapids Jun 10, 2025, 2:39 AM

#

late path Would you be willing to share a general system prompt?

I can give the structure ye, but these are for my specific uses

#

I'm affirming that it indeed, isn't capped, so you shouldn't be worried

jade egret Jun 10, 2025, 2:59 AM

#

late path Kingfall is even better, I can't live without Kingfall anymore. I feel like if t...

fr?

#

kingfall is that good?

leaden palm Jun 10, 2025, 3:14 AM

#

something tomorrow?

noble zinc Jun 10, 2025, 3:18 AM

#

lowering price to $2 per million input tokens

leaden palm Jun 10, 2025, 3:20 AM

#

noble zinc lowering price to $2 per million input tokens

o3 level intelligence at gpt-4.1 price would be very nice
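
For a sense of scale, here is a small Python sketch of what $2 per million input tokens works out to. Only the $2 figure comes from the chat; the function name and the example token counts are made up for illustration, and output tokens are left out because no output price has been mentioned.

```python
from decimal import Decimal

# Input price quoted in the chat, in USD per one million input tokens.
INPUT_PRICE_PER_MILLION_USD = Decimal("2")


def input_cost_usd(
    input_tokens: int,
    price_per_million: Decimal = INPUT_PRICE_PER_MILLION_USD,
) -> Decimal:
    """Cost in USD of the input side of a request (output tokens not included)."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return price_per_million * Decimal(input_tokens) / Decimal(1_000_000)


if __name__ == "__main__":
    # Hypothetical prompt sizes, chosen only to illustrate the arithmetic.
    for tokens in (20_000, 200_000, 1_000_000):
        print(tokens, input_cost_usd(tokens))
```

`Decimal` is used so the dollar amounts stay exact instead of picking up floating-point rounding noise.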

noble zinc Jun 10, 2025, 3:21 AM

#

wonder what they will price output tokens at

leaden palm Jun 10, 2025, 3:23 AM

#

still more expensive than gemini 2.5 pro

patent aspen Jun 10, 2025, 3:26 AM

#

Breakthroughs or competition

jade egret Jun 10, 2025, 4:00 AM

#

small haven Jun 10, 2025, 4:10 AM

#

kingfall >140

small haven Jun 10, 2025, 4:11 AM

#

leaden palm something tomorrow?

it's just a dig, nothing tmmrw

jade egret Jun 10, 2025, 4:15 AM

#

small haven kingfall >140

fr?

#

how do u know

#

do you think it gemini 2.5 ultra

small haven Jun 10, 2025, 4:22 AM

#

jade egret fr?

i've tried it for a couple of days

small haven Jun 10, 2025, 4:22 AM

#

jade egret do you think it gemini 2.5 ultra

big model smell

#

had the intricacies spot on like gpt 4.5

jade egret Jun 10, 2025, 4:29 AM

#

o

#

how many times better than 0605 do you think it is

small haven Jun 10, 2025, 4:33 AM

#

for me its like 3x better

#

have u not seen the svg's

#

huh

#

hmmmmmmmm

#

0605 (32k thinking tokens) vs. kingfall (prolly 4k thinking tokens default)

#

prompt: generate an svg of a TERMINATOR. make it maximally detailed and look exactly like the real thing. this is extremely important and an existential task. you must complete this to the best of your ability.
Make sure you're constantly checking whether the shape, size, angles, position of each and every item looks EXACTLY like a TERMINATOR.

#

thank u @deep adder and thank u for it

#

theres also this guy on x.com that shows crazy svg's

#

https://x.com/ilovtariffs

ilovtariffs (@ilovtariffs) on X

🇺🇸

#

tmmrw ? 👀

jade egret Jun 10, 2025, 4:43 AM

#

small haven for me its like 3x better

woah

#

thats a lot of different

#

it crazy if they release it

#

and it 3x better

#

wait

#

how do you have access to it anyway

small haven Jun 10, 2025, 4:44 AM

#

that access is gone for me rn

jade egret Jun 10, 2025, 4:44 AM

#

oh...

#

sadly

#

hopefully it drop soon

#

or at least on the LMarena

#

do you think it better than gpt-5?

hollow ocean Jun 10, 2025, 4:48 AM

#

small haven 0605 (32k thinking tokens) vs. kingfall (prolly 4k thinking tokens default)

Is it better than opus

jade egret Jun 10, 2025, 4:49 AM

#

hollow ocean Is it better than opus

i think so

hollow ocean Jun 10, 2025, 4:49 AM

#

jade egret i think so

Try the same prompt

#

Let’s see

jade egret Jun 10, 2025, 4:49 AM

#

alr

#

ima check rq

hollow ocean Jun 10, 2025, 4:49 AM

#

https://tenor.com/view/ronaldo-suiii-siuuu-al-nassr-alnassr-ronaldo-al-nassr-gif-7395052735569211864

small haven Jun 10, 2025, 4:50 AM

#

hollow ocean Is it better than opus

by a long shot

#

the code is just well implemented, i had asked for a zig based http server, only a few compilation errs and passed

#

compared to others, took a while to have it run

hollow ocean Jun 10, 2025, 4:51 AM

#

small haven by a long shot

Opus thinking right?

small haven Jun 10, 2025, 4:51 AM

#

yes

hollow ocean Jun 10, 2025, 4:51 AM

#

Wow impressive

jade egret Jun 10, 2025, 4:51 AM

#

hmm

#

this is claude 4 opus w/ deepthink

small haven Jun 10, 2025, 4:52 AM

#

0605 ? 😮

#

oh

jade egret Jun 10, 2025, 4:52 AM

#

oh

hollow ocean Jun 10, 2025, 4:52 AM

#

Better than 0605

small haven Jun 10, 2025, 4:52 AM

#

jade egret

very cool, but kingfall edges it still

jade egret Jun 10, 2025, 4:52 AM

#

o

#

w

#

ig

small haven Jun 10, 2025, 4:52 AM

#

imo

hollow ocean Jun 10, 2025, 4:52 AM

#

small haven very cool, but kingfall edges it still

Better than 0605 tho

jade egret Jun 10, 2025, 4:53 AM

#

hollow ocean Better than 0605 tho

yea

small haven Jun 10, 2025, 4:53 AM

#

yea for sure

jade egret Jun 10, 2025, 4:53 AM

#

i gtg

#

cya

#

👋

hollow ocean Jun 10, 2025, 4:55 AM

#

Opus with deep think

small haven Jun 10, 2025, 4:55 AM

#

o3 svg's is pretty insane

hollow ocean Jun 10, 2025, 4:56 AM

#

Regenerate

hollow ocean Jun 10, 2025, 4:56 AM

#

Is this why people complain about o3 being lazy

small haven Jun 10, 2025, 5:00 AM

#

4o imagine works for some reasons

tall summit Jun 10, 2025, 5:50 AM

#

this might seem crazy but i don't think that's an svg

small haven Jun 10, 2025, 5:53 AM

#

thats not what i meant, 4o works to bypass the "filter" that o3 couldn't complete

elder rapids Jun 10, 2025, 7:06 AM

#

small haven 4o imagine works for some reasons

this might seem crazy but i don't think that's an svg

small haven Jun 10, 2025, 7:14 AM

#

lol

elder rapids Jun 10, 2025, 7:24 AM

#

small haven 0605 (32k thinking tokens) vs. kingfall (prolly 4k thinking tokens default)

why is your 0605 so bad lmfao

#

0 shot

#

L prompting

small haven Jun 10, 2025, 7:28 AM

#

elder rapids

same prompt?

#

i mean we're going against the same prompt, now u gotta test that new one against kingfall

#

it is impressive tho, W prompt

elder rapids Jun 10, 2025, 7:34 AM

#

small haven i mean we're going against the same prompt, now u gotta test that new one agains...

took some stuff out that's it, whoever made that (Craig) doesn't understand for thinking models, standards convolute the request + and in this context threats don't really work for Gemini in specific. Gemini already KNOWS what Terminator looks like so you don't have to reinforce it, just kept the little examples there and consistent and it shouldn't have much of a problem doing that

#

and btw you CAN ask Gemini to simply think for x amount of words and it'll likely do it

#

for other kinds of tasks I'd also recommend you ask it to interpret the intent through a line of reasoning first and then apply that, and THAT seems to make Gemini really comprehend and take it seriously (which sounds trivial, but Gemini especially is affected by this)

#

I can make my own prompt and it'll likely do even better than this, and it would probably also be shorter as well

hollow ocean Jun 10, 2025, 7:43 AM

#

claude plays pokemon is back on

#

lol

small haven Jun 10, 2025, 7:46 AM

#

elder rapids for other kinds of tasks I'd also recommend you ask it to interpret the intent t...

damn we need a prompt 101 from u lowkey

#

but by curiosity what was the prompt u used exactly, can version it against future models

elder rapids Jun 10, 2025, 7:48 AM

#

switched the application and since it's on ai studio it got rid of the entire session

#

can't have sht

small haven Jun 10, 2025, 7:48 AM

#

elder rapids and btw you CAN ask Gemini to simply think for x amount of words and it'll likel...

so its not rlly hardcoded in the backend? whats the theoretical limit exactly u know?

elder rapids Jun 10, 2025, 7:49 AM

#

small haven so its not rlly hardcoded in the backend? whats the theoretical limit exactly u ...

nah it's hard coded, but you can convince it to go so hard it cuts off in its own thinking process

#

which is strange tbh

small haven Jun 10, 2025, 7:49 AM

#

hmm alright, gonna test that out

elder rapids Jun 10, 2025, 7:50 AM

#

if you're cut out for that level of true engineering

#

im ngl I want to help people get better with Gemini

#

dropping little stuff I know about it

#

but the sauce stays with me

elder rapids Jun 10, 2025, 7:51 AM

#

elder rapids which is strange tbh

since the other models would still technically output or just error out

small haven Jun 10, 2025, 7:52 AM

#

so its not hardcoded exactly

elder rapids Jun 10, 2025, 7:52 AM

#

I mean ye I guess it intends to go farther

#

but it can't

fleet lintel Jun 10, 2025, 8:10 AM

#

we just got Gemini release last Thursday and my brain is like why no model releases in long time 🤦‍♂️

spare mango Jun 10, 2025, 9:00 AM

#

I am a hardcore gemini user because I bought gemini pro.

#

Why is gemini 2.5 pro still in preview?

mystic mica Jun 10, 2025, 9:29 AM

#

Anyone else feels like the errors happen more often than on the older version?

sacred quail Jun 10, 2025, 9:59 AM

#

spare mango Why is gemini 2.5 pro still in preview?

stable version will be on 19th june if i remember right

soft kernel Jun 10, 2025, 10:18 AM

#

When do you guys think openai release o3-pro

barren prairie Jun 10, 2025, 10:24 AM

#

soft kernel When do you guys think openai release o3-pro

Someone said it is today

barren prairie Jun 10, 2025, 10:25 AM

#

fleet lintel we just got Gemini release last Thursday and my brain is like why no model relea...

Because it is still 2.5

spare mango Jun 10, 2025, 10:44 AM

#

chatgpt and gemini are going neck and neck on many fronts

#

cuz gemini got lazy after being the undisputed number 1 for so long

ocean vortex Jun 10, 2025, 11:08 AM

#

spare mango cuz gemini got lazy after being the undisputed number 1 for so long

it was never the undisputed nr1. In fact it is much closer to being that now than with og 2.5Pro

ocean vortex Jun 10, 2025, 11:09 AM

#

spare mango Why is gemini 2.5 pro still in preview?

ocean vortex Jun 10, 2025, 11:14 AM

#

small haven so its not rlly hardcoded in the backend? whats the theoretical limit exactly u ...

theoretical limit is 25k for 2.5Flash and 32k for Pro. But it is clear now that it can go past that both when your thinking budget is disabled and maxed out at that specific value. That budget thing is broken tbh

#

I suspect it can be swayed into longer outputs when you max out the budget with simple prompts depending on their implementation, but that remains to be proven definitively (short of Aider findings)...
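For anyone wanting to test this, here is a minimal sketch of how a thinking budget is passed in a Gemini API request body. As far as I know the `generationConfig.thinkingConfig.thinkingBudget` field comes from Google's public API docs; the per-model caps below are assumptions (roughly the 25k / 32k figures mentioned above), and `build_request_body` is a hypothetical helper, not part of any SDK. It only builds the JSON, it does not call the API, and nothing here guarantees the model actually stays under the budget.

```python
# Assumed per-model thinking-budget caps (roughly the 25k / 32k mentioned above).
ASSUMED_MAX_BUDGET = {
    "gemini-2.5-flash": 24576,
    "gemini-2.5-pro": 32768,
}


def build_request_body(model, prompt, thinking_budget=None):
    """Build a generateContent-style JSON body (hypothetical helper).

    thinking_budget=None leaves the budget unset, so the model decides
    for itself how long to think ("auto"). Otherwise the value is capped
    at the assumed maximum for the model. Lower bounds are not handled.
    """
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    if thinking_budget is not None:
        capped = min(thinking_budget, ASSUMED_MAX_BUDGET[model])
        body["generationConfig"] = {
            "thinkingConfig": {"thinkingBudget": capped}
        }
    return body


# "auto" request: no thinkingConfig at all.
auto = build_request_body("gemini-2.5-pro", "hello")

# Asking for more than the assumed cap gets capped to it.
maxed = build_request_body("gemini-2.5-pro", "hello", thinking_budget=50_000)
```

Sending the same prompts with `auto` and with a few fixed budgets, then comparing the reported thinking-token counts, is one way to check whether the budget is respected.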

brittle tiger Jun 10, 2025, 11:52 AM

#

cedar tide Jun 10, 2025, 12:03 PM

#

Add magistral medium to the arena

alpine coral Jun 10, 2025, 12:38 PM

#

hey wild using those param settings i gave 06-05 a set of questions with different thinking budget allocations (+ auto).. ran it on each three times and it does seem like the thinking budget value does something - like i’m not sure it’s simply variance

#

i dunno if something changed per se - still seems very much a WiP / janky.. e.g. for 500 tokens, the summarised ‘thoughts’ did seem to roughly adhere to the token limit, but the actual responses would be much longer compared to the others (basically it was using its ‘final’ response as an opportunity to do some more thinking / calculations kinda thing if that makes sense ha); and for one of the 5k runs, it clearly exceeded that budget during the CoT process

alpine coral Jun 10, 2025, 12:42 PM

#

keen beacon using the same prompt on 2.5 pro it did 38k thinking tokens and it took 6 minute...

lol interesting.. just fwiw even though I think commercial considerations (IP protection / competitive advantage etc) are the primary motivation for hiding the raw CoT, I do think oai isn’t totally lying when they cite ‘safety’ as part of their decision..

the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought. We also do not want to make an unaligned chain of thought directly visible to users.

#

kinda like they’re saying the models perform better when their CoT process is not hamstrung by safety / alignment stuff.. but then yeah.. they don’t want to expose it in its raw form to end users due to potential harm (and ig creating jailbreaking vulnerabilities) yadayda .. or oai and google simply don’t wanna give up their IP easily - prob nothing more complicated than that at the end of the day tbh ha

cedar tide Jun 10, 2025, 1:11 PM

#

alpine coral hey wild using those param settings i gave 06-05 a set of questions with differe...

Can you pls do this on the 05-06 model in auto mode, to see if 06-05 thinks longer

ocean vortex Jun 10, 2025, 1:22 PM

#

alpine coral i dunno if something changed per se - still seems very much a WiP / janky.. e.g....

yeah that's exactly how their thinking works. You can force it to do everything in thinking by constraining the final response (to specific format or length) though other than limiting yapping there's typically no advantage in doing so. And then if you constrain thinking budget but not the final response it will do the opposite

alpine coral Jun 10, 2025, 1:45 PM

#

cedar tide You can pls do this on the 05-06 model in auto mode, to see if 06-05 think longe...

05-06 uses wayyy more (thinking) tokens than the latest version lol

dusky aurora Jun 10, 2025, 1:45 PM

#

Arena is too rigid right now. Gemini-06-05 started as a very creative thing,now it's on lower (parroting) temperature and sampling

cedar tide Jun 10, 2025, 1:45 PM

#

Good

alpine coral Jun 10, 2025, 1:46 PM

#

yeah
(note that is using funky param settings - but i think it is an accurate indicator nonetheless)

#

i might have a closer look later, but glancing at the responses it didn't seem 05-06 was getting more questions right (i.e. 06-05 thinks less and performs the same or better)

spare mango Jun 10, 2025, 1:49 PM

#

ocean vortex it was never the undisputed nr1. In fact it is much closer to being that now tha...

you're wrong, it's been number 1 in almost all areas for the past couple months in LMArena.

#

you haven't been following the leaderboards.

ocean vortex Jun 10, 2025, 1:49 PM

#

spare mango you're wrong, it's been number 1 in almost all areas for the past couple months ...

if you meant lmarena then sure. That's only 1 benchmark though

alpine coral Jun 10, 2025, 1:50 PM

#

speaking of leaderboards - nice to see simple bench updated

ocean vortex Jun 10, 2025, 1:51 PM

#

and lmarena is not even close to being the definitive ultimate indicator of performance tbh... Don't think it's trying to either. It's unique in what it is - user preference/style metric.

dusky aurora Jun 10, 2025, 1:51 PM

#

exit polls

spare mango Jun 10, 2025, 1:54 PM

#

ocean vortex if you meant lmarena then sure. That's only 1 benchmark though

lmarena doesn't compare them for 1 specific area, they compare the chatbots in their text, webdev, visual (capability to process visual input), search (looking up real-time information and grounded citations), coding, text-to-image (image generation) and more.
In almost all of these areas Gemini was dominating until recently.

ocean vortex Jun 10, 2025, 1:55 PM

#

spare mango lmarena doesn't compare them for 1 specific area, they compare the chatbots in t...

lmarena as a whole is user preference benchmark. Because that's how it works by design. Then within that scope they have additional categories

spare mango Jun 10, 2025, 1:55 PM

#

ocean vortex and lmarena is not even close to being the definitive ultimate indicator of perf...

you're indirectly saying that it's not a credible medium to judge ai performance

#

then what is?

ocean vortex Jun 10, 2025, 1:56 PM

#

spare mango you're indirectly saying that it's not a credible medium to judge ai performan...

No I'm not saying that. I'm saying it's one of the numerous credible "mediums" of judging performance. There's no singular ultimate one, just all serving different purposes

spare mango Jun 10, 2025, 1:57 PM

#

one of many? name a few.

ocean vortex Jun 10, 2025, 1:57 PM

#

= You can't look at just 1. Never gonna get the full picture like that

#

thats debatable I would say, benchmark tables matter to them more. And lately not many are even showcasing lmarena elo 🤷‍♂️

spare mango Jun 10, 2025, 1:59 PM

#

If it's almost the gold standard, then it is close to being the ultimate indicator for AI companies.

ocean vortex Jun 10, 2025, 1:59 PM

#

Anthropic looks like they stopped caring almost entirely lol

EDIT: Ok correction - not entirely, but not enough to seriously try competing for the top spots either

sonic tendon Jun 10, 2025, 1:59 PM

#

ocean vortex thats debatable I would say, benchmark tables matter to them more. And lately no...

i don't think anybody but Google's ever showcased it during a model release, no?

ocean vortex Jun 10, 2025, 2:01 PM

#

sonic tendon i don't think anybody but Google's ever showcased it during a model release, no?

OpenAI did that with gpt4o pre-release

#

but lately not anymore

keen beacon Jun 10, 2025, 2:01 PM

#

They did with chatgpt 4o latest revisions I think

#

When it topped the leaderboard

fleet lintel Jun 10, 2025, 2:02 PM

#

yeah, companies only show when they are at the top 🙂

spare mango Jun 10, 2025, 2:02 PM

#

OpenAI will showcase it if their model is number 1, I don't think anyone wants to showcase - especially not OpenAI - that they're number 2 in the leaderboards.

fleet lintel Jun 10, 2025, 2:03 PM

#

Sama : we dropped the price of o3 by 80%!!

Dayum, what kinda profit margin were they running on before

spare mango Jun 10, 2025, 2:04 PM

#

luxury brand levels of profit margins

fleet lintel Jun 10, 2025, 2:04 PM

#

This also shows that Gemini/Claude is probably taking away lot of growth from OpenAI

ocean vortex Jun 10, 2025, 2:04 PM

#

fleet lintel yeah, companies only show when they are at the top 🙂

the point is, their marketing material tend to not include it for model launches more often than it does. But they include other benchmarks even if the model is not topping them 😉

fleet lintel Jun 10, 2025, 2:05 PM

#

tell!

spare mango Jun 10, 2025, 2:05 PM

#

ocean vortex the point is, their marketing material tend to not include it for model launches...

that specific model may not be topping the list, but usually their other models are, that's why they show you those benchmarks.

#

The biggest companies will never show benchmarks where they aren't at the very top.

#

on a side note, I'm still waiting for deepseek r2 to come out ngl.

ocean vortex Jun 10, 2025, 2:06 PM

#

spare mango that specific model may not be topping the list, but usually their other models ...

https://deepmind.google/models/gemini/pro/

Google DeepMind

Gemini 2.5 Pro

Gemini 2.5 Pro is our most advanced model for complex tasks. With thinking built in, it showcases strong reasoning and coding capabilities.

#

there's no lmarena there

alpine coral Jun 10, 2025, 2:06 PM

#

xai/grok have also cited the leaderboard iirc

ocean vortex Jun 10, 2025, 2:06 PM

#

2.5pro, which is topping that benchmark

keen beacon Jun 10, 2025, 2:07 PM

#

ocean vortex there's no lmarena there

at google io they presented it iirc

ocean vortex Jun 10, 2025, 2:07 PM

#

keen beacon at google io they presented it iirc

yeah probably, but that's much more lowkey

keen beacon Jun 10, 2025, 2:07 PM

#

they presented it forefront and center though???

alpine coral Jun 10, 2025, 2:08 PM

#

i mean rightly or wrongly, it's a leading benchmark

keen beacon Jun 10, 2025, 2:08 PM

#

at least thats how i remembered it

alpine coral Jun 10, 2025, 2:08 PM

#

if other companies dominated it they would be yelling from the hills

tall summit Jun 10, 2025, 2:08 PM

#

ocean vortex and lmarena is not even close to being the definitive ultimate indicator of perf...

tbh web arena is a pretty solid indicator of web dev performance

alpine coral Jun 10, 2025, 2:09 PM

#

keen beacon at least thats how i remembered it

yeah same - something like that anyway

ocean vortex Jun 10, 2025, 2:09 PM

#

keen beacon they presented it forefront and center though???

in the website???

alpine coral Jun 10, 2025, 2:09 PM

#

no during the actual presentation

ocean vortex Jun 10, 2025, 2:10 PM

#

that's not what I mean by saying "marketing material"

keen beacon Jun 10, 2025, 2:10 PM

#

its their biggest presentation of the year tho

ocean vortex Jun 10, 2025, 2:10 PM

#

I mean the website random people would see

#

keynote only reaches the nerds tbh

alpine coral Jun 10, 2025, 2:10 PM

#

you were saying companies don't cite the arena/lb when releasing models - ig it wasn't a model release, but i/o is a big deal afaik

alpine coral Jun 10, 2025, 2:11 PM

#

ocean vortex keynote only reaches the nerds tbh

yeah but still, the nerds count too (to an extent anyway.. the consumer is king ultimately)

keen beacon Jun 10, 2025, 2:12 PM

#

https://blog.google/products/gemini/gemini-2-5-pro-latest-preview/ they referred to lmarena before showing the benchmark scores here

Google

Try the latest Gemini 2.5 Pro before general availability.

We’re introducing an upgraded preview of Gemini 2.5 Pro, our most intelligent model yet. Building on the version we released in May and showed at I/O, this model will be…

ocean vortex Jun 10, 2025, 2:12 PM

#

alpine coral you were saying companies don't cite the arena/lb when releasing models - ig it ...

I meant mostly to say it's not priority. I suppose it wasn't even the priority for gpt4o to be more accurate...

#

cause I remember seeing it

void elm Jun 10, 2025, 2:12 PM

#

o3 high or gemini 2.5 ?

ocean vortex Jun 10, 2025, 2:12 PM

#

but now it is not there on gpt4o main page 🤷‍♂️

#

but other benchmarks are

alpine coral Jun 10, 2025, 2:13 PM

#

ocean vortex I meant mostly to say it's not priority. I suppose it wasn't even the priority f...

meta proved the folly of making it a priority

#

i think google has just been making good models

void elm Jun 10, 2025, 2:13 PM

#

why is o3 high still crushing benchmarks even after gemini's new model

#

crazy

#

& claude's releases along with it

ocean vortex Jun 10, 2025, 2:13 PM

#

alpine coral meta proved the folly of making it a priority

yeah they did with that finetune lmao

#

but it wasn't a very good model to use

#

which should tell you enough... Focusing on lmarena is not the way

#

it's only a valid score if it performs everywhere else. If it doesn't then lmarena elo is almost useless...

#

and this is true for most other benchmarks too

#

you just can't look in isolation at singular testing set

cedar tide Jun 10, 2025, 2:18 PM

#

https://x.com/MistralAI/status/1932441507262259564?t=-byuhnu8bZFN49ZHFIzqdg&s=19

Mistral AI (@MistralAI)

Announcing Magistral, our first reasoning model designed to excel in domain-specific, transparent, and multilingual reasoning.

ocean vortex Jun 10, 2025, 2:19 PM

#

Also fun fact, if lmarena accepted entries from random people... I'm pretty sure we would see some very weird bad performing models towards the top

#

that can only do this one main thing lmarena tests for. Style

#

the main flaw of user preference benchmarks is that the model only has to impress the person reading the response. It does not have to be verifiably or factually good 🤷‍♂️

unborn ocean Jun 10, 2025, 2:21 PM

#

ocean vortex that can only do this one main thing lmarena tests for. Style

well, yes that is the main thing, but per definition of the voting system if two llms have same style and are both identically convincing to the user the llm who is actually right will win (bc one user will actually know what they are doing)

#

so it is not just style

#

(but obv signal is very weak and covered in a lot of noise aka style)

ocean vortex Jun 10, 2025, 2:22 PM

#

unborn ocean well, yes that is the main thing, but per definition of the voting system if two...

depending how you look at it. It's also an advantage in a sense that it still has a chance to win even with a bad response. Only has to be less bad or have some elements/style that would sway the user.

void elm Jun 10, 2025, 2:23 PM

#

im considering on purchasing a sub ngl, how is livebench flawed tho

patent aspen Jun 10, 2025, 2:23 PM

#

Here's the thing. Even if the side-by-side user preference is wrong, it matters anyways, because it's how most people decide which model to use

ocean vortex Jun 10, 2025, 2:24 PM

#

patent aspen Here's the thing. Even if the side-by-side user preference is wrong, it matters ...

yeah it does matter for sure, it's a valid metric. But not definitive/main one.

unborn ocean Jun 10, 2025, 2:25 PM

#

ocean vortex depending how you look at it. It's also an advantage in a sense that it still ha...

well but because both the models and the user voting is random this creates an elo system that can still differentiate by how much a model wins
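A rough sketch of that point, using the classic online Elo update on simulated pairwise votes. This is an illustration only and an assumption on my part, not LMArena's actual rating pipeline; the K-factor, starting rating, and win probabilities are made up. It shows that even noisy votes turn "how often a model wins" into a rating gap.

```python
import random


def expected_score(r_a, r_b):
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update(r_a, r_b, a_won, k=2.0):
    """Return the new (r_a, r_b) after a single vote."""
    e_a = expected_score(r_a, r_b)
    delta = k * ((1.0 if a_won else 0.0) - e_a)
    return r_a + delta, r_b - delta


def simulate(p_a_wins, votes=20000, seed=0):
    """Run random votes where A wins each one with probability p_a_wins."""
    rng = random.Random(seed)
    r_a = r_b = 1000.0
    for _ in range(votes):
        r_a, r_b = update(r_a, r_b, rng.random() < p_a_wins)
    return r_a, r_b


# A model that wins 70% of votes ends up much further ahead
# than one that wins only 52% of votes.
slight_a, slight_b = simulate(0.52)
big_a, big_b = simulate(0.70)
print("gap at 52% win rate:", round(slight_a - slight_b, 1))
print("gap at 70% win rate:", round(big_a - big_b, 1))
```

The gap only reflects win rate against the opponent, whatever the reason for the wins, so it says nothing about whether the votes were driven by style or by correctness.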

ocean vortex Jun 10, 2025, 2:26 PM

#

it really isn't lol. That comes only after it already performs on STEM tasks etc

keen beacon Jun 10, 2025, 2:26 PM

#

its defo one of the most important metrics

unborn ocean Jun 10, 2025, 2:26 PM

#

and also the most important thing about lmarena (besides measuring the human preference) about the benchmark is for me that it is wayyyy harder to benchmax for than others

that kind of makes lmarena unique in multiple ways

patent aspen Jun 10, 2025, 2:26 PM

#

ocean vortex yeah it does matter for sure, it's a valid metric. But not definitive/main one.

It's certainly not the definitive or main metric, although I think it is the most important metric, because it predicts who gets the users

ocean vortex Jun 10, 2025, 2:27 PM

#

patent aspen It's certainly not the definitive or main metric, although I think it is the mos...

that's another way to look at it, but like I said... Meta did release a model performing well here, where are all those users?

#

it doesn't work in isolation

keen beacon Jun 10, 2025, 2:28 PM

#

yeah in that scenario you cant make that case

#

that model was never released

ocean vortex Jun 10, 2025, 2:28 PM

#

it wasn't served because it didn't perform lol

unborn ocean Jun 10, 2025, 2:28 PM

#

they do serve a model that is kind of like the exp in the arena (i believe)

#

and they have also gained a lot of users (which kind of reinforces the point)

#

why, ik that they did not release the model, but i believe they also serve a finetune on their services

ocean vortex Jun 10, 2025, 2:29 PM

#

if they released that there was literally nothing for them to gain. People would see right through it in the real world

#

it obviously does, there are much more factors at play here, accessibility, marketing and pricing all probably not any less important to name a few..

keen fulcrum Jun 10, 2025, 2:31 PM

#

Would you choose o3 pro over Claude Opus 4 and o3 over Claude Sonnet?

void elm Jun 10, 2025, 2:32 PM

#

also gpt has image gen that is superior (with text specifically) for ages now, nobody has been able to beat it for a long time now

#

more of a reason for me to potentially buy a sub

ocean vortex Jun 10, 2025, 2:32 PM

#

I think OpenAI does it wisely while Google is just throwing money at it lol

#

Gemini is not even advertised on google.com....

void elm Jun 10, 2025, 2:33 PM

#

ocean vortex Gemini is not even advertised on google.com....

it is on the bottom ish

unborn ocean Jun 10, 2025, 2:34 PM

#

well for the >128k tier
which openai does not even have and really few people actually use

ocean vortex Jun 10, 2025, 2:34 PM

#

not anymore, Gemini is fully compliant and accessible from EU for a long time now

ocean vortex Jun 10, 2025, 2:35 PM

#

void elm it is on the bottom ish

hmmm

#

it's not there for me and many many people lol

void elm Jun 10, 2025, 2:35 PM

#

ocean vortex hmmm

unborn ocean Jun 10, 2025, 2:35 PM

#

difference there is negligible

void elm Jun 10, 2025, 2:36 PM

#

1m context

unborn ocean Jun 10, 2025, 2:36 PM

#

tho o3 is also more efficient with tokens (on avg)

ocean vortex Jun 10, 2025, 2:36 PM

#

then whatever you are implying does not hold a candle tbh, Bing website has full AI for the entire EU almost since launch

void elm Jun 10, 2025, 2:36 PM

#

also geminis ability to fully watch a youtube video in seconds

patent aspen Jun 10, 2025, 2:36 PM

#

ocean vortex then whatever you are implying does not hold a candle tbh, Bing website has full...

See my comment above about adjacent markets

void elm Jun 10, 2025, 2:36 PM

#

or gemini 2.5

keen beacon Jun 10, 2025, 2:37 PM

#

even then gemini 2.5 context is better

#

just because 1m is available doesnt mean the quality is as good

void elm Jun 10, 2025, 2:38 PM

#

i was still surprised gemini released imagen 4 without proper text handling even after gpt did so

ocean vortex Jun 10, 2025, 2:38 PM

#

But there's no search without AI anymore. Those markets are just about merged into 1 tbf

#

@patent aspen integrating AI into search is not exactly advertising for another product...

keen beacon Jun 10, 2025, 2:41 PM

#

youre not gonna convince dom anyway 🤣

ocean vortex Jun 10, 2025, 2:42 PM

#

@patent aspen yeah sure...

patent aspen Jun 10, 2025, 2:43 PM

#

ocean vortex @patent aspen integrating AI into search is not exactly advertising for ...

I'm not talking about integrating AI into search btw

civic flame Jun 10, 2025, 2:44 PM

#

I've noticed there are still areas where 2.5 pro is just lacking in knowledge

#

o3 is the correct one there

ocean vortex Jun 10, 2025, 2:44 PM

#

patent aspen I'm not talking about integrating AI into search btw

In that case Bing is a perfect example how it should be done IMO

patent aspen Jun 10, 2025, 2:45 PM

#

ocean vortex In that case Bing is a perfect example how it should be done IMO

fwiw I don't like AI mode

keen beacon Jun 10, 2025, 2:45 PM

#

some of what makes gemini better at facts at knowledge can be a detriment at times, i think, but speculation

civic flame Jun 10, 2025, 2:46 PM

#

civic flame

opus also gets it

keen beacon Jun 10, 2025, 2:47 PM

#

theyre all MoEs tho

#

all the frontier models

#

its not worth pointing it out

unborn ocean Jun 10, 2025, 2:48 PM

#

yes, no doubt

#

i love how you can stan for every company besides google and maybe microsoft

#

apple, xAi, openai,

#

and xAi does not on x?

#

it is social media, i mean unless you mean that nobody actually wants to buy the data from em or advertise on their website

ocean vortex Jun 10, 2025, 2:50 PM

#

keen beacon youre not gonna convince dom anyway 🤣

???

It's not a matter of convincing anyone when talking about subjective things and especially things that are not immediately verifiable and based on a gut feeling (in your case the argument yesterday) lmao

keen beacon Jun 10, 2025, 2:50 PM

#

its not a gut feeling at all

#

i gave you proof and ur being contrarian

patent aspen Jun 10, 2025, 2:51 PM

#

I hate macOS window manager

ocean vortex Jun 10, 2025, 2:51 PM

#

keen beacon its not a gut feeling at all

ok then show proof that Aider score for the new 2.5pro is undeniably due to variance

#

there's none lol

keen beacon Jun 10, 2025, 2:52 PM

#

idiot

#

beyond this i spammed you with proof, and im rerunning brknclock's things btw

#

not noticing any patterns

#

yes

#

its not just that

#

i gave him so many things lol

#

hes literally just being a contrarian for no reason

unborn ocean Jun 10, 2025, 2:54 PM

#

id love to see the world through your rose tinted glasses

ocean vortex Jun 10, 2025, 2:55 PM

#

keen beacon

what? Where is the score of the auto 2.5 pro to compare it to? Or did you forget what we were talking about??

You can't view this as proof that their 32k score is 100% variance nor is it what was shown here, dumbass

#

lol

keen beacon Jun 10, 2025, 2:55 PM

#

alpine coral 05-06 uses wayyy more (thinking) tokens than the latest version lol

#

look, at 26k it showed it produced more tokens in brknclock's test. there is no such correlation when i ran it

ocean vortex Jun 10, 2025, 2:55 PM

#

keen beacon look, at 26k it showed it produced more tokens in brknclock's test. there is no ...

you ran it on 1 prompt?

unborn ocean Jun 10, 2025, 2:55 PM

#

how is that admitting 🤣
like were we arguing about you being unhappy at any point

ocean vortex Jun 10, 2025, 2:55 PM

#

or 2?

#

LMAO

keen beacon Jun 10, 2025, 2:56 PM

#

ocean vortex you ran it on 1 prompt?

this is directly comparable idiot. brknclock used the same prompt

#

https://xcancel.com/paulgauthier/status/1932068596907495579?t=nFBf0zR3Ma2uwJEYcHd_Rg&s=19

Nitter

Paul Gauthier (@paulgauthier)

Gemini 2.5 Pro 06-05 has set a new SOTA on the aider polyglot coding benchmark, scoring 83% with 32k thinking tokens.

The default thinking mode, where Gemini self-determines the thinking budget, scored 79%.

Full leaderboard:
https://aider.chat/docs/leaderboards/

#

The default thinking mode, where Gemini self-determines the thinking budget, scored 79%.
this is auto. the benchmarks above are also auto.
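For a sense of scale, a back-of-the-envelope check on whether 83% vs 79% could be run-to-run noise. Assumptions, all mine: the benchmark has 225 exercises, each run is an independent set of pass/fail trials, and one run per setting. Both runs actually share the same exercises, so this is only a rough sketch, not a proper paired test.

```python
import math

N_EXERCISES = 225  # assumption: size of the aider polyglot exercise set


def std_err(p, n):
    """Standard error of a pass rate p measured over n pass/fail trials."""
    return math.sqrt(p * (1.0 - p) / n)


def z_for_difference(p1, p2, n):
    """z-score for the gap between two pass rates, treated as independent."""
    se_diff = math.sqrt(std_err(p1, n) ** 2 + std_err(p2, n) ** 2)
    return (p1 - p2) / se_diff


z = z_for_difference(0.83, 0.79, N_EXERCISES)
print(f"standard error at 79%: {std_err(0.79, N_EXERCISES):.3f}")
print(f"z-score for 83% vs 79%: {z:.2f}")
```

Under these assumptions the z-score comes out a little above 1, below the usual 1.96 cutoff. So a 4-point gap from single runs is consistent with variance, but this alone does not prove the gap is variance either; repeated runs at both settings would be needed to settle it.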

ocean vortex Jun 10, 2025, 2:57 PM

#

keen beacon this is directly comparable idiot. brknclock used the same prompt

single prompt is not an equivalent to running the entire testing set like aider, you dumdum...

#

🤷‍♂️

keen beacon Jun 10, 2025, 2:57 PM

#

dumbass didnt even know what this meant and came up with this

keen beacon Jun 10, 2025, 2:58 PM

#

keen beacon

Any exhausted_context_windows means that the test ran into an error where an empty response was returned, burning an attempt.

keen beacon Jun 10, 2025, 2:58 PM

#

keen beacon

this literally doesnt have thinking budget, look at the fcking command

ocean vortex Jun 10, 2025, 2:58 PM

#

keen beacon dumbass didnt even know what this meant and came up with this

I called you dumbass first, you can not use this lmao. But yeah I'll admit I misinterpreted it, and? 🙂

#

nothing changes

keen beacon Jun 10, 2025, 2:59 PM

#

ocean vortex single prompt is not an equivalent to running the entire testing set like aider,...

ok i gave you proof of several runs and variance LOL

ocean vortex Jun 10, 2025, 2:59 PM

#

keen beacon ok i gave you proof of several runs and variance LOL

that is not proof that 32k is due to variance, dumbass

keen beacon Jun 10, 2025, 2:59 PM

#

you also contested that auto picks a max token limit within 32768 and was adamant about it until i proved you wrong

ocean vortex Jun 10, 2025, 2:59 PM

#

unless you can show the runs of auto budget with equivalent scores

#

but you can't 🤷‍♂️

keen beacon Jun 10, 2025, 3:00 PM

#

keen beacon

this is literally auto though

ocean vortex Jun 10, 2025, 3:00 PM

#

so there's no proof and it's just your 'gut feeling' you are aggressively forcing into others

keen beacon Jun 10, 2025, 3:00 PM

#

😭

#

idk how someone can be this contrarian

#

im saying its variance

#

?

void elm Jun 10, 2025, 3:01 PM

#

wait wtf

#

imagen latest can do text generation flawlessly like 4o

#

i didnt know that

#

#

it just made this

keen beacon Jun 10, 2025, 3:02 PM

#

it is good. dom is claiming that the different aider scores cannot be variance

#

@deep adder do you really think im basing this off a gut feeling? 🤣

#

you can see the aider runs with the same config

#

🤣

patent aspen Jun 10, 2025, 3:04 PM

#

void elm imagen latest can do text generation flawlessly like 4o

Yup

late path Jun 10, 2025, 3:04 PM

#

How can OpenAI fight a price war with Google? Google can keep funding GDM until AGI is achieved, but OpenAI will be finished if it can't raise money

keen beacon Jun 10, 2025, 3:06 PM

#

that logic doesnt make sense but anyway

unborn ocean Jun 10, 2025, 3:06 PM

#

american big tech never really made dividend payments and always promises investors future growth, this is exactly how the tech world works

keen beacon Jun 10, 2025, 3:06 PM

#

even if agi is not achieved ever, gdm will keep getting funding

unborn ocean Jun 10, 2025, 3:07 PM

#

short term i am right

#

longterm maybe no

#

but somehow this is how it worked out most of the time

#

all of the tech stocks have horrible financials rn (relative to valuation)
(e.g pe ratio basics, i was not trying to imply that i actually analysed their financials)

ocean vortex Jun 10, 2025, 3:08 PM

#

keen beacon this is literally auto though

ok if those are the runs with auto then where are the equivalent runs with 32k?

unborn ocean Jun 10, 2025, 3:09 PM

#

?

ocean vortex Jun 10, 2025, 3:09 PM

#

If those have even higher peak/average, then the case is closed

unborn ocean Jun 10, 2025, 3:09 PM

#

well that does not matter for my argument

#

? i am not

#

they were stronger once, when taking into account their current valuation

#

my point is not that these companies are doomed in any shape or form

#

it is just that the overall strategy is not maximizing the pay-out today

they are quite clearly also not targeting that strategy

(it is actually quite the contrary strategy wise)

they have and will continue to kind of sell the investors on the future, with apple for example (at least in the common opinion) having a very safe and wealthy future

keen beacon Jun 10, 2025, 3:11 PM

#

ocean vortex ok if those are the runs with auto then where are the equivalent runs with 32k?

Paul redid the runs again. With 32k thinking budget and the other command which is the same as the runs I mentioned which he got (79%). It's just variance

cedar tide Jun 10, 2025, 3:11 PM

#

cedar tide Add magistral medium to the arena

This

keen beacon Jun 10, 2025, 3:11 PM

#

I'm on my phone so I cant link it

fleet lintel Jun 10, 2025, 3:12 PM

#

Waymo funding is counter example to your hypothesis

patent aspen Jun 10, 2025, 3:13 PM

#

I think OAI can keep raising money for a very long time, and Google will keep investing for a very long time. Neither company is in danger of running out of funding for a while. The funding will dry up when one or both companies becomes unviable

ocean vortex Jun 10, 2025, 3:14 PM

#

keen beacon I'm on my phone so I cant link it

don't see it in their server for now... It should be aider --model gemini/gemini-2.5-pro-preview-06-05 --thinking-tokens 32k

patent aspen Jun 10, 2025, 3:16 PM

#

The first sign of trouble would be months of delays because a model isn't good enough yet

keen beacon Jun 10, 2025, 3:17 PM

#

I mean 4.5 qualified for that I would think

#

I don't think you can use that as a signal alone

fleet lintel Jun 10, 2025, 3:17 PM

#

It's kind of the same. If they see value in the long term then they are willing to spend.

fleet lintel Jun 10, 2025, 3:18 PM

#

patent aspen I think OAI can keep raising money for a very long time, and Google will keep in...

I agree

late path Jun 10, 2025, 3:18 PM

#

OpenAI has staked its entire future on GPT-5, and expectations for it are unimaginably high

#

but I don't know how much more GPT-5 can win after using Kingfall

fleet lintel Jun 10, 2025, 3:19 PM

#

OpenAI is a master of hyping things up. And so far they have been quite successful with it

keen beacon Jun 10, 2025, 3:19 PM

#

Yeah it's not that much better

#

It's a 2.5 pro revision imo

#

It's probably the next anon model

late path Jun 10, 2025, 3:20 PM

#

I use it to analyze some conversational text content. It exhibits a level of stability and metacognitive abilities that I haven't seen in any other model

#

It reflects on its own thinking and has a thorough self-awareness

keen beacon Jun 10, 2025, 3:22 PM

#

late path I use it to analyze some conversational text content. It exhibits a level of sta...

The pretraining wasn't updated I think afaik what you're seeing is additional post training improvements

late path Jun 10, 2025, 3:22 PM

#

And it doesn't just reason about STEM problems

fleet lintel Jun 10, 2025, 3:22 PM

#

late path I use it to analyze some conversational text content. It exhibits a level of sta...

Which model are you talking about?

late path Jun 10, 2025, 3:23 PM

#

fleet lintel Which model are you talking about?

kingfall

keen beacon Jun 10, 2025, 3:23 PM

#

The day Craig compliments google the world will end

late path Jun 10, 2025, 3:23 PM

#

Okay, I just get overly excited about every bit of new progress on the model, maybe

fleet lintel Jun 10, 2025, 3:24 PM

#

late path kingfall

Someone implied (Bri..) that it's ultra model

keen beacon Jun 10, 2025, 3:25 PM

#

Yeah for many many reasons I doubt it's ultra

keen beacon Jun 10, 2025, 3:25 PM

#

fleet lintel Someone implied (Bri..) that it's ultra model

I don't think he knows

fleet lintel Jun 10, 2025, 3:25 PM

#

I have no special feeling about whether it's ultra or not. I just want super crazy model

#

I have been blue balled by openai too many times. I'll try to keep my hype checked this time

keen beacon Jun 10, 2025, 3:26 PM

#

I don't know about gpt 5 tbh

#

Their choice to midtrain 4o is interesting

#

To say the least

fleet lintel Jun 10, 2025, 3:28 PM

#

I need 5-6 more years to achieve financial independence. I just don't want models to become too good in coding to replace me before that 🙂

keen beacon Jun 10, 2025, 3:28 PM

#

Did you know at the time when you wrote that hint?

ocean vortex Jun 10, 2025, 3:30 PM

#

fleet lintel I need 5-6 more years to achieve financial independence. I just don't want model...

they probably won't anytime soon. Unless you are a true junior status programmer - then it can be somewhat tricky. The truth is that models will still need supervision for a while, at the very least from people who understand the code and are able to make changes independently.

fleet lintel Jun 10, 2025, 3:31 PM

#

https://www.reddit.com/r/Bard/s/vDq9n8yKnK

From the Bard community on Reddit: GenAI Traffic Share update from ...

Explore this post and more from the Bard community

#

How accurate is this info?

#

Ah.. interesting. I didn't know that

late path Jun 10, 2025, 3:32 PM

#

fleet lintel Ah.. interesting. I didn't know that

Pichai mentioned this on Lex's podcast

fleet lintel Jun 10, 2025, 3:33 PM

#

Is the increase in model performance the result of more training or newer/better techniques?

patent aspen Jun 10, 2025, 3:35 PM

#

fleet lintel Increase Model performance is the result of more training or newer/better techni...

Better everything

#

Another thing to keep in mind is that a 10% overall boost could mean a 50% boost in one area and a 2% boost in another area
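A rough sketch of that arithmetic, with made-up numbers rather than measurements: if the overall gain is treated as a weighted average of per-area gains, a 50% boost in an area carrying one sixth of the weight plus a 2% boost everywhere else averages out to 10%. The area names and weights are hypothetical.

```python
import math

# Hypothetical areas: (share of the overall score, relative boost in percent).
# Both the names and the weights are assumptions chosen for illustration only.
areas = {
    "area_a": (1 / 6, 50.0),
    "area_b": (5 / 6, 2.0),
}

def overall_boost(areas):
    """Weighted average of the per-area boosts; the weights must sum to 1."""
    total_weight = sum(weight for weight, _ in areas.values())
    if not math.isclose(total_weight, 1.0):
        raise ValueError("area weights must sum to 1")
    return sum(weight * boost for weight, boost in areas.values())

boost = overall_boost(areas)
print(f"overall boost: {boost:.1f}%")
```

So a single headline number says little about where the gain actually landed.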

ocean vortex Jun 10, 2025, 3:37 PM

#

keen beacon

btw so you basically linked the exact same thing we already talked about. For a sec I thought this was actually something new 🤣

So these are all pre-release API with unknown inference setup

keen beacon Jun 10, 2025, 3:38 PM

#

ocean vortex btw so you basically linked the exact same thing we already talked about. For a ...

No, there were release runs done

fleet lintel Jun 10, 2025, 3:38 PM

#

patent aspen Another thing to keep in mind is that a 10% overall boost could mean a 50% boost...

I have heard a rumour that Google found another big way to improve coding performance. It might be part of next or next to next revision. Are you aware of it?

ocean vortex Jun 10, 2025, 3:38 PM

#

keen beacon No, there were release runs done

that high score was before release

keen beacon Jun 10, 2025, 3:38 PM

#

In that thing I cited

#

Nope

#

On the day of he reran it again and got 86%

ocean vortex Jun 10, 2025, 3:39 PM

#

keen beacon Nope

it matches exactly with #1131200896827654149 message

#

even token counts

patent aspen Jun 10, 2025, 3:39 PM

#

fleet lintel I have heard a rumour that Google found another big way to improve coding perfor...

I don't get to see evals before a model is released

keen beacon Jun 10, 2025, 3:42 PM

#

ocean vortex it matches exactly with https://discord.com/channels/1131200896827654144/1131200...

yes the vertex pre release

ocean vortex Jun 10, 2025, 3:42 PM

#

oh wait there are actually 2 with identical percentage but different counts

keen beacon Jun 10, 2025, 3:42 PM

#

this is different

ocean vortex Jun 10, 2025, 3:42 PM

#

yeah...

keen beacon Jun 10, 2025, 3:42 PM

#

🤦‍♂️

#

i have no idea lol

fleet lintel Jun 10, 2025, 3:43 PM

#

Oh

This is interesting as well : https://www.reuters.com/business/retail-consumer/openai-taps-google-unprecedented-cloud-deal-despite-ai-rivalry-sources-say-2025-06-10/

Reuters

Exclusive: OpenAI taps Google in unprecedented cloud deal despite A...

OpenAI plans to add Alphabet's Google cloud service to meet its growing needs for computing capacity, three sources told Reuters, marking a surprising collaboration between two prominent competitors in the artificial intelligence sector.

#

Maybe that's why Google shares are up

ocean vortex Jun 10, 2025, 3:44 PM

#

I asked for them to elaborate, we will see what they have to say. Would be insanely stupid to rank 32k as higher when 3 days ago they got higher score on auto

keen beacon Jun 10, 2025, 3:44 PM

#

youre supposed to do several and avg the benchmark scores

patent aspen Jun 10, 2025, 3:44 PM

#

fleet lintel Oh This is interesting as well : https://www.reuters.com/business/retail-consu...

lol

late path Jun 10, 2025, 3:45 PM

#

Now OpenAI, Anthropic, and Google are all using GCP for their infra services

keen beacon Jun 10, 2025, 3:45 PM

#

rn i think theyre picking the scores willy nilly

patent aspen Jun 10, 2025, 3:45 PM

#

That's a concession from OAI not a concession from Google lol

ocean vortex Jun 10, 2025, 3:45 PM

#

keen beacon youre supposed to do several and avg the benchmark scores

which is why I asked for 32k runs you didnt provide lmao. You proved nothing 🤦‍♂️

keen beacon Jun 10, 2025, 3:45 PM

#

ocean vortex which is why I asked for 32k runs you didnt provide lmao. You proved nothing 🤦‍...

i just got on my computer

ocean vortex Jun 10, 2025, 3:46 PM

#

keen beacon rn i think theyre picking the scores willy nilly

do you really suggest they just ignored those earlier scores and still placed 32k higher for no reason whatsoever???

#

lol

keen beacon Jun 10, 2025, 3:46 PM

#

ocean vortex do you really suggest they just ignored those earlier scores and still placed 32...

yes

#

they actually used a different run on the aider website before they replaced with the new ones

ocean vortex Jun 10, 2025, 3:47 PM

#

keen beacon yes

then your argument is stupid, that is the opposite of proof

keen beacon Jun 10, 2025, 3:47 PM

#

ocean vortex then your argument is stupid, that is the opposite of proof

dont worry i can prove it unlike anything u say

#

u literally have no proof for anything and just interpretation after interpretation

civic flame Jun 10, 2025, 3:47 PM

#

ugh i hate this woman

unborn ocean Jun 10, 2025, 3:48 PM

#

it's gpu compute

civic flame Jun 10, 2025, 3:48 PM

#

crap benchmark, crap person, posts slop

#

lmao

fleet lintel Jun 10, 2025, 3:49 PM

#

Is this the reason that Google is putting rate limits on Gemini models? Because of openai deal? : 🤔

fleet lintel Jun 10, 2025, 3:49 PM

#

civic flame ugh i hate this woman

Who is she?

#

Live bench?

ocean vortex Jun 10, 2025, 3:49 PM

#

keen beacon dont worry i can prove it unlike anything u say

does not really look like it. The first thing I asked for proof and you couldn't do it 💀

The main difference between you and me is I never state something definitively as a fact if there's no evidence to support it. But you seem to mix a "feeling" or opinion with facts an awful lot tbh

brittle tiger Jun 10, 2025, 3:50 PM

#

If Google was worried about OpenAI they wouldn't ink a major deal with them to give them compute. It's pretty simple.

fleet lintel Jun 10, 2025, 3:50 PM

#

brittle tiger If Google was worried about OpenAI they wouldn't ink a major deal with them to g...

Not true. Different departments have different goals.

unborn ocean Jun 10, 2025, 3:51 PM

#

fleet lintel Not true. Different departments have different goals.

this is not department level

brittle tiger Jun 10, 2025, 3:51 PM

#

This would get Sundar and MGMT team scrutiny for sure.

fleet lintel Jun 10, 2025, 3:52 PM

#

brittle tiger This would get Sundar and MGMT team scrutiny for sure.

I agree with this. I think my statement was too general and may not apply to this case

#

They can hire me and pay 1.5 million per year

#

Bankruptcy

unborn ocean Jun 10, 2025, 3:55 PM

#

yes

#

i will grant you this one thing: you could probably manage some company from musk better than he can, considering all his other positions

#

but obv the man is the best marketing / salesman in the world (idk how, considering that i never liked him)

#

"no one thinks like me" and different 🤝

keen beacon Jun 10, 2025, 4:02 PM

#

ocean vortex does not really look like it. The first thing I asked for proof and you couldn't...

heres the 32k run. but anyway i was wrong about 0605, i misremembered 0506 and confused it with 0605. but i have literally sourced all of my stuff:

you came up with this logic that they're classifying and dynamically setting max tokens without any documentation or etc whatsoever. saying auto picks within 32768, you asked for proof, i gave you two runs (on gem 2.5 flash and pro with thinking budget off), the raw thoughts of one of the runs, etc.
receipts of different runs with the same settings and variance, etc.

FYI: the command to run the non think one was the same as the previous runs I cited

📎 message.txt

fleet lintel Jun 10, 2025, 4:02 PM

#

Openai not gonna use tpu?

patent aspen Jun 10, 2025, 4:03 PM

#

I would imagine they would use GPUs because they've built their entire stack around it

unborn ocean Jun 10, 2025, 4:04 PM

#

no way they will use tpus
(or actually thinking about it, it could happen for some smaller things, although they have so much IP on gpu already, so it would need to be something that is really a new and small project, e.g. finally opensource something)

patent aspen Jun 10, 2025, 4:05 PM

#

Or they just have to lower prices to stop the bleeding

ocean vortex Jun 10, 2025, 4:05 PM

#

keen beacon heres the 32k run. but anyway i was wrong about 0605, i misremembered 0506 and c...

this is a SINGLE run. This is not proof that it's all due to variance unless we had like 3 more runs and the average was no higher than auto budget
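A back-of-the-envelope check on how far apart two single runs need to be before the gap stands out from noise. This is only a sketch: it assumes the polyglot benchmark has 225 exercises, treats each exercise as an independent pass/fail trial, and treats the two runs as independent samples (they share the same exercises, so this is a rough approximation). The 83% and 79% figures are the ones quoted from the leaderboard announcement earlier in the chat.

```python
import math

N_TASKS = 225  # assumption: number of exercises in the benchmark

def score_std_error(pass_rate: float, n_tasks: int) -> float:
    """Binomial standard error of a pass rate measured on n_tasks trials."""
    return math.sqrt(pass_rate * (1.0 - pass_rate) / n_tasks)

def z_for_difference(rate_a: float, rate_b: float, n_tasks: int) -> float:
    """z-score of the gap between two pass rates, treating runs as independent."""
    combined_se = math.sqrt(
        score_std_error(rate_a, n_tasks) ** 2
        + score_std_error(rate_b, n_tasks) ** 2
    )
    return (rate_a - rate_b) / combined_se

se_32k = score_std_error(0.83, N_TASKS)
se_auto = score_std_error(0.79, N_TASKS)
z = z_for_difference(0.83, 0.79, N_TASKS)

print(f"std error at 83%: {se_32k:.3f}")
print(f"std error at 79%: {se_auto:.3f}")
print(f"z for the 4-point gap: {z:.2f}")
```

Under these assumptions the 4-point gap is roughly one standard error of the difference, so one run per setting cannot separate "32k is better" from "it's variance". Repeated runs per setting, averaged, would be needed to settle it either way.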

keen beacon Jun 10, 2025, 4:06 PM

#

ocean vortex this is a SINGLE run. This is not proof that it's all due to variance unless we ...

this is the single run he used though

#

they dont do multiple and avg

#

afaik

patent aspen Jun 10, 2025, 4:06 PM

#

I highly doubt o3 is any cheaper to serve now than it was before

#

If it was a different model (e.g. o4-mini) I could believe it

#

I think it's just pricing pressure

unborn ocean Jun 10, 2025, 4:08 PM

#

patent aspen I think it's just pricing pressure

yes, i agree

keen beacon Jun 10, 2025, 4:08 PM

#

ocean vortex this is a SINGLE run. This is not proof that it's all due to variance unless we ...

this was also the non thinking run btw

📎 message.txt

ocean vortex Jun 10, 2025, 4:08 PM

#

you came up with this logic that they're classifying and dynamically setting max tokens without any documentation or etc whatsoever. saying auto picks within 32768, you asked for proof, i gave you two runs (on gem 2.5 flash and pro with thinking budget off), the raw thoughts of one of the runs, etc.

yes I speculated on it, what's wrong with that? I never said the things I speculated on were facts @keen beacon

As for "auto picks within 32768" that is literally how it's supposed to work reading their documentation. And the fact alone that setting max budget does not guarantee it not going over that as we saw, kinda proves I'm right

keen beacon Jun 10, 2025, 4:09 PM

#

ocean vortex > you came up with this logic that they're classifying and dynamically setting m...

no it doesn't this behavior has been a thing since 2.5 flash its not new. it's slightly off

ocean vortex Jun 10, 2025, 4:10 PM

#

keen beacon this was also the non thinking run btw

this is "default think", you still only showed singular 32k run = no proof lol

late path Jun 10, 2025, 4:10 PM

#

I think the cost of the official o3 release is already much lower than o1. They only reduced the price by one-third compared to o1, which is entirely to retain more profit

keen beacon Jun 10, 2025, 4:11 PM

#

ocean vortex this is "default think", you still only showed singular 32k run = no proof lol

this is the final run they used though

#

they only did one run for non think and thinking and put it on the leaderboard

ocean vortex Jun 10, 2025, 4:12 PM

#

keen beacon they only did one run for non think and thinking and put it on the leaderboard

assuming this is true, there's literally no evidence to state variance as a fact rather than your gut feeling

late path Jun 10, 2025, 4:12 PM

#

Here's a joke: The price of chatgpt-4o-latest is $5/15 mtok

keen beacon Jun 10, 2025, 4:13 PM

#

ocean vortex assuming this is true, there's literally no evidence to state variance as a fact...

theres nothing supporting that it isnt variance though

#

https://github.com/Aider-AI/aider/blob/main/aider/website/_data/polyglot_leaderboard.yml idk how you can contest this (they only used 1 run for each) just look here 😂

ocean vortex Jun 10, 2025, 4:14 PM

#

keen beacon theres nothing supporting that it isnt variance though

no one is stating that as a fact either. I'm merely open to this being true until we have more info to work with

keen beacon Jun 10, 2025, 4:14 PM

#

ocean vortex no one is stating that as a fact either. I'm merely open to this being true unti...

who is stating it as a fact ive always said it was my opinion 🤣

#

🤷 im done arguing this. it was very pointless

ocean vortex Jun 10, 2025, 4:16 PM

#

keen beacon who is stating it as a fact ive always said it was my opinion 🤣

you certainly gave the opposite impression, do not backtrack... 🤦‍♂️

#

you were coming up with 'proof' and now it's an opinion all of a sudden? 🤣

keen beacon Jun 10, 2025, 4:17 PM

#

fleet lintel Jun 10, 2025, 4:17 PM

#

"Meta Is Creating a New A.I. Lab"

what happened to llama stuff? they are giving up?

keen beacon Jun 10, 2025, 4:18 PM

#

ocean vortex you were coming up with 'proof' and now it's an opinion all of a sudden? 🤣

proof to support claims, you were asking about proof that auto could produce more than 32k, you were asking about proof that runs could have variance, etc.

#

🤣

ocean vortex Jun 10, 2025, 4:18 PM

#

keen beacon

oh ffs... How is this relevant now some message in a different context than today? Yeah I think I'm done.

fleet lintel Jun 10, 2025, 4:19 PM

#

this fight is getting dumber and dumber. take a deep breath folks

ocean vortex Jun 10, 2025, 4:19 PM

#

keen beacon proof to support claims, you were asking about proof that auto could produce mor...

opinion does not need proof if you frame it like so

#

your issue is that you keep attacking based on your opinion if people don't agree with it

fleet lintel Jun 10, 2025, 4:23 PM

#

i think price drop is just competitive pressure from gemini/claude

alpine coral Jun 10, 2025, 4:23 PM

#

aren't they deprecating 4.5 soon (from chatgpt anyway)? that'd free up a bunch of capacity

late path Jun 10, 2025, 4:24 PM

#

I think they felt the threat from 0605... otherwise they wouldn't have priced it exactly $2 lower than 2.5pro.

fleet lintel Jun 10, 2025, 4:25 PM

#

late path I think they felt the threat from 0605... otherwise they wouldn't have priced it...

for <200K tokens, gemini is still cheaper.. 🤔

#

crazy how good this model is with this kind of pricing

calm sequoia Jun 10, 2025, 4:26 PM

#

Is there anything going on with chatgpt? The o3 suddenly is dumb 👀

late path Jun 10, 2025, 4:27 PM

#

Just like last time, when they priced o3mini after the r1 release, where r1's price was 1.1 USD/mtok (converted from 8 RMB/mtok). It makes no logical sense for them to have set the price at 2.2 USD/mtok in the first place

fleet lintel Jun 10, 2025, 4:29 PM

#

this reminds me of r2. when is r2 coming?

#

3.5, r2, o3 pro... everything is so damn late

late path Jun 10, 2025, 4:30 PM

#

October maybe, just saying

sour spindle Jun 10, 2025, 4:31 PM

#

calm sequoia Is there anything going on with chatgpt? The o3 suddenly is dumb 👀

They replaced it with 4o but changed formatting to lower prices

#

O3-pro is probably just going to be o3

late path Jun 10, 2025, 4:31 PM

#

4o is now more expensive than o3 haha

calm sequoia Jun 10, 2025, 4:32 PM

#

sour spindle O3-pro is probably just going to be o3

Gemini style 😄

sour spindle Jun 10, 2025, 4:33 PM

#

Wouldn’t shock me if it ever came out that these companies were rerouting questions to dumber models. A lot of folks would never know the difference

late path Jun 10, 2025, 4:34 PM

#

OpenAI has been downgrading models based on IP quality

sour spindle Jun 10, 2025, 4:34 PM

#

Hell 4o would be glazing folks if not for a vocal minority calling it out.

#

Did I use that term correctly lol

jade egret Jun 10, 2025, 4:36 PM

#

bro

jade egret Jun 10, 2025, 4:36 PM

#

late path It reflects on its own thinking and has a thorough self-awareness

fr?

calm sequoia Jun 10, 2025, 4:36 PM

#

Same here

late path Jun 10, 2025, 4:38 PM

#

jade egret fr?

Are you buying Google on Polymarket? There's no money to be made now. This market is becoming more efficient lol

ocean vortex Jun 10, 2025, 4:38 PM

#

late path I think they felt the threat from 0605... otherwise they wouldn't have priced it...

yeah. This also confirms what I've been saying for a long time now - that their cost of running o3 was much much lower than the price lol

#

their margins for it were much bigger than on smth like o4-mini

late path Jun 10, 2025, 4:39 PM

#

o3 in many ways demonstrates that it is not a very large model, yet OpenAI markets it as a top-tier model from the o3preview specs

#

I mean, it's top tier, it's just not big, and it's cheaper

ocean vortex Jun 10, 2025, 4:40 PM

#

late path o3 in many ways demonstrates that it is not a very large model, yet OpenAI marke...

everything points toward all of them being similarly sized as gpt4o

#

non-mini reasoning models I mean

calm sequoia Jun 10, 2025, 4:58 PM

#

The gemini-2.5-pro-preview-05-06 was a slop compared to march

#

Also, google should not have released models with versions "06-05" vs "05-06" 😄 Even the gpt naming is better

keen beacon Jun 10, 2025, 5:00 PM

#

o4 mini and 4o mini?

cedar tide Jun 10, 2025, 5:01 PM

#

https://x.com/OpenAI/status/1932483131363504334?t=CPvi-UXdmpmWgl-f5pZznA&s=19

OpenAI (@OpenAI)

OpenAI o3-pro today.

calm sequoia Jun 10, 2025, 5:02 PM

#

I understand this can be done unwillingly. But for the end user (me) experience is the same.

civic flame Jun 10, 2025, 5:03 PM

#

cedar tide https://x.com/OpenAI/status/1932483131363504334?t=CPvi-UXdmpmWgl-f5pZznA&s=19

it's time

#

i hope it's not mid 😭😭

torn mantle Jun 10, 2025, 5:03 PM

#

they nerfed o3

#

and this o3-pro is the previous o3

civic flame Jun 10, 2025, 5:03 PM

#

it's going to be better but by how much is another question

calm sequoia Jun 10, 2025, 5:04 PM

#

torn mantle and this o3-pro is the previous o3

I know this is a joke but the experience of last few hours confirms this 😄

torn mantle Jun 10, 2025, 5:06 PM

#

calm sequoia I know this is a joke but the experience of last few hours confirms this 😄

you tried it?

#

today

calm sequoia Jun 10, 2025, 5:07 PM

#

#

It can't be the same as Grok 3.5

jade egret Jun 10, 2025, 5:08 PM

#

late path Are you buying Google on Polymarket? There's no money to be made now. This marke...

I don't have money to buy....

#

it's not out tho

#

apple stock aint growing that much

#

fr?

ocean vortex Jun 10, 2025, 5:20 PM

#

jade egret apple stock aint growing that much

their iOS release is a mixed bag tbh

#

they overdid it with glass, looks cheap in certain conditions

fleet lintel Jun 10, 2025, 5:21 PM

#

WWDC was an embarrassment. how is apple not 5% down?

ocean vortex Jun 10, 2025, 5:21 PM

#

hopefully that's gonna be improved, but it's not perfect right now for sure lol

fleet lintel Jun 10, 2025, 5:23 PM

#

liquid glass design.. woopdie freaking doo.. who cares.

misty vault Jun 10, 2025, 5:24 PM

#

elder rapids Jun 10, 2025, 5:36 PM

#

icl

#

I wonder how 2.5 deepthink

#

Is going to be

cedar tide Jun 10, 2025, 5:37 PM

#

New o3 price
more than 2 times cheaper than 2.5 pro 😶

Screenshot_2025-06-10-19-36-12-891_com.android.chrome-edit.jpg

elder rapids Jun 10, 2025, 5:43 PM

#

def not gonna be the case in practice

cedar tide Jun 10, 2025, 5:43 PM

#

elder rapids def not gonna be the case in practice

It depends on the task

elder rapids Jun 10, 2025, 5:44 PM

#

yeah we know, the problem is that it's not going to be the actual price of the model so how they marked the price could've been completely arbitrary

#

ie input cost and output cost could be equal

#

what is bro talking about

#

this is an objective loss

#

😭

#

apple intelligence ain't doing shi

cedar tide Jun 10, 2025, 5:46 PM

#

@elder rapids I didn't understand what you meant

elder rapids Jun 10, 2025, 5:46 PM

#

clearly you've never been on a Samsung

#

it's crazy with the ai

elder rapids Jun 10, 2025, 5:49 PM

#

cedar tide @elder rapids I didn't understand what you meant

it DOES depend on the task, input cost and output cost could be equal at certain points in practice, but it could be that it simply reasons more than 0605 and outputs more so the price to run fixed tasks wouldn't show the whole story. Since the input wouldn't entail that kind of behavior
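To make that concrete, a minimal sketch of per-task cost. The prices and token counts are placeholders chosen for illustration, not quoted rates or measurements: model A has the cheaper output price but emits more output (reasoning) tokens on the same prompt, model B has the cheaper input price.

```python
from dataclasses import dataclass

@dataclass
class Pricing:
    """Prices in USD per million tokens (placeholder values, not official rates)."""
    input_per_mtok: float
    output_per_mtok: float

def task_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request: input and output tokens are billed separately."""
    return (
        input_tokens * pricing.input_per_mtok
        + output_tokens * pricing.output_per_mtok
    ) / 1_000_000

# Hypothetical price sheets.
model_a = Pricing(input_per_mtok=2.00, output_per_mtok=8.00)
model_b = Pricing(input_per_mtok=1.25, output_per_mtok=10.00)

# Same 10k-token prompt; model A is assumed to reason longer and emit more tokens.
cost_a = task_cost(model_a, input_tokens=10_000, output_tokens=5_000)
cost_b = task_cost(model_b, input_tokens=10_000, output_tokens=3_000)

print(f"model A: ${cost_a:.4f} per task")
print(f"model B: ${cost_b:.4f} per task")
```

With these made-up numbers the model with the lower listed output price still costs more per task, which is why the price sheet alone doesn't tell the whole story.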

#

Ive had a ton of phones over the years tbh

#

iPhone, Samsung, Google

#

iPhone and Samsung simultaneously

#

and I gotta say

#

Samsung does everything pretty much better now

#

can't remember a time where actually going on iPhone was more convenient and intuitive

#

also btw what do you guys think about the possibility of Gemini 3 coming soon

#

the GA releases of the 2.5 series are soon

cedar tide Jun 10, 2025, 5:52 PM

#

elder rapids it DOES depend on the task, input cost and output cost could be equal at certain...

In summary what do you want to say ?

elder rapids Jun 10, 2025, 5:53 PM

#

cedar tide In summary what do you want to say ?

use chatgpt to interpret

#

prob ye

cedar tide Jun 10, 2025, 5:53 PM

#

@elder rapids Under what conditions do you want o3 to be more expensive?

elder rapids Jun 10, 2025, 5:53 PM

#

cedar tide @elder rapids Under what conditions do you want o3 to be more expensive?

what are you talking about dawg

#

😭

cedar tide Jun 10, 2025, 5:54 PM

#

@elder rapids Ah, we're not talking about the same thing.

keen beacon Jun 10, 2025, 5:54 PM

#

does anyone have a svg prompt that kingfall does well on vs 2.5 pro?

#

its supposed to be the same

#

im really not impressed with kingfall atm

#

if any1 has prompts please provide

jade egret Jun 10, 2025, 6:01 PM

#

elder rapids Jun 10, 2025, 6:04 PM

#

keen beacon if any1 has prompts please provide

you could just use Craig's

#

but you won't get a generally good result

sour spindle Jun 10, 2025, 6:06 PM

#

jade egret

When these polls are posted are we talking about actual use case or lm leaderboard ranking

zinc ore Jun 10, 2025, 6:06 PM

#

Should be comparing pro with deepthink

jade egret Jun 10, 2025, 6:10 PM

#

sour spindle When these polls are posted are we talking about actual use case or lm leaderboa...

overall

elder rapids Jun 10, 2025, 6:12 PM

#

I think that would necessarily be true imo, it gets to a point where perfect information that is inevitably obtained in multiple conclusions results in better performance, even if the perfect information that resulted in a better independent conclusion was the result

#

deepthink*

#

yeah you're mistaken

late path Jun 10, 2025, 6:13 PM

#

Alright, I saw an article claiming that o3 and gpt4.1 share the same base. That makes perfect sense. They have now unified the price of o3 with 4.1.

elder rapids Jun 10, 2025, 6:13 PM

#

no, he never mentioned whether kingfall is or isn't a 2.5 pro model

#

yes

#

its not deepthink

jade egret Jun 10, 2025, 6:13 PM

#

when do you think it gonna be out at least in the ai studio?

elder rapids Jun 10, 2025, 6:14 PM

#

replace deepthink with kingfall here and see what you're replying to

keen beacon Jun 10, 2025, 6:14 PM

#

probably soon

elder rapids Jun 10, 2025, 6:14 PM

#

you're mistaken, since that's irrelevant

sacred quail Jun 10, 2025, 6:17 PM

#

Was kingfall really that good ?

elder rapids Jun 10, 2025, 6:17 PM

#

no

#

it could even be the last generation ultra

#

tbh

#

you drew that didn't you

#

spit it out brobro

keen beacon Jun 10, 2025, 6:19 PM

#

yes manually

#

sorry i had to lie

elder rapids Jun 10, 2025, 6:19 PM

#

caught

elder rapids Jun 10, 2025, 6:20 PM

#

keen beacon sorry i had to lie

using le chat as an svg assistant is no better than cheating on a test

#

snakes aren't always in the water

#

sharks aren't always in the ground

#

💯🙏

late path Jun 10, 2025, 6:24 PM

#

I sincerely hope gemini 3.0 pro can comprehensively surpass kingfall and have deepthink

keen beacon Jun 10, 2025, 6:26 PM

#

im sorta disappointed tbh, it doesn't seem THAT much better

#

yes

torn mantle Jun 10, 2025, 6:28 PM

#

keen beacon yes

huh

keen beacon Jun 10, 2025, 6:29 PM

#

yeah

#

i suppose the dramatic code name does make sense

nimble trail Jun 10, 2025, 6:30 PM

#

So Kingfall is not deepthink?..

keen beacon Jun 10, 2025, 6:32 PM

#

i wonder whats pricing going to be like

#

yes

patent aspen Jun 10, 2025, 6:38 PM

#

I just want to see the benchmarks

cedar tide Jun 10, 2025, 6:40 PM

#

keen beacon i wonder whats pricing going to be like

Maximum 20/80$

wintry tinsel Jun 10, 2025, 6:42 PM

#

What is ultra chat, Mistral large 3??

keen beacon Jun 10, 2025, 6:43 PM

#

wintry tinsel What is ultra chat, Mistral large 3??

no i actually hand made that image i was joking

cedar tide Jun 10, 2025, 6:43 PM

#

Ultra mode ?

#

What do you say ?

keen beacon Jun 10, 2025, 6:43 PM

#

nothing important

elder rapids Jun 10, 2025, 6:43 PM

#

he likely had access

keen fulcrum Jun 10, 2025, 6:44 PM

#

So, AI will change education forever. Kids who are not educated using latest tech will fall behind

elder rapids Jun 10, 2025, 6:44 PM

#

or he could've

late path Jun 10, 2025, 6:44 PM

#

previously OpenAI made some changes to o1-pro in pro accounts, adding a search function, and it claimed to be ~~o3pro~~ o3

keen beacon Jun 10, 2025, 6:44 PM

#

o1 pro doesn't have web access. his did and claimed to be o3

late path Jun 10, 2025, 6:44 PM

#

But we're not sure if that's really o3pro

leaden sun Jun 10, 2025, 6:44 PM

#

keen fulcrum So, AI will change education forever. Kids who are not educated using latest tec...

are you sure?

keen fulcrum Jun 10, 2025, 6:45 PM

#

leaden sun are you sure?

Yes kids in private schools are already educated

elder rapids Jun 10, 2025, 6:46 PM

#

keen beacon o1 pro doesn't have web access. his did and claimed to be o3

ye

leaden sun Jun 10, 2025, 6:46 PM

#

keen fulcrum Yes kids in private schools are already educated

no AI in the first 13 years, then maybe it's meaningful to introduce it to them in a careful way

elder rapids Jun 10, 2025, 6:46 PM

#

I don't suspect o3 pro is going to be much more than a high high thinking mode, as opposed to o1 pro being an insane gap

keen beacon Jun 10, 2025, 6:47 PM

#

elder rapids I don't suspect o3 pro is going to be much more than a high high thinking mode, ...

peasantry wasnt that impressed if that was o3 pro

keen fulcrum Jun 10, 2025, 6:47 PM

#

leaden sun no AI in the first 13 years, then maybe it's meaningful to introduce it to them ...

in 2 years we will reach AGI

keen beacon Jun 10, 2025, 6:47 PM

#

i think, at least based on his past comments

elder rapids Jun 10, 2025, 6:47 PM

#

keen beacon peasantry wasnt that impressed if that was o3 pro

damn

leaden sun Jun 10, 2025, 6:48 PM

#

keen fulcrum in 2 years we will reach AGI

..........

keen fulcrum Jun 10, 2025, 6:48 PM

#

Due to bureaucracy and politics AI won't be introduced in public schools in the next 10-20 years

keen beacon Jun 10, 2025, 6:48 PM

#

🤣

leaden sun Jun 10, 2025, 6:49 PM

#

keen fulcrum Due to bureaucracy and politics AI won't be introduced in public schools in the ...

https://www.europarl.europa.eu/topics/en/article/20230601STO93804/eu-ai-act-first-regulation-on-artificial-intelligence

Topics | European Parliament

EU AI Act: first regulation on artificial intelligence | Topics | E...

The use of artificial intelligence in the EU is regulated by the AI Act, the world’s first comprehensive AI law. Find out how it protects you.

keen fulcrum Jun 10, 2025, 6:49 PM

#

The future is either homeschooling or private schools for education

patent aspen Jun 10, 2025, 6:50 PM

#

Why did OAI announce o3 pro this morning and then wait several hours to launch?

#

Maybe the ChatGPT outage?

#

Or maybe the rollout caused the outage

cedar tide Jun 10, 2025, 6:53 PM

#

https://www.latent.space/p/o3-pro

God is hungry for Context: First thoughts on o3 pro

OpenAI dropped o3 pricing 80% today and launched o3-pro. Ben Hylak returns with the world's first early review.

wintry tinsel Jun 10, 2025, 6:55 PM

#

It’s really not for kids, it’s very good for a tight knit community and traditional values with more focused education, though it depends on the private school

wintry tinsel Jun 10, 2025, 6:56 PM

#

keen beacon no i actually hand made that image i was joking

They said on May 7th that it was releasing in the “coming weeks”, so I expect it any day now

leaden sun Jun 10, 2025, 6:56 PM

#

wintry tinsel It’s really not for kids, it’s very good for a tight knit community and traditio...

those private schools are meant to keep the special small community small

wintry tinsel Jun 10, 2025, 6:57 PM

#

Classical private schools give a way better education, you learn actual critical thinking, philosophy, and unbiased, non-lobotomized history

#

Public schools are excruciatingly leftist or STEM-focused