#general | Arena | Page 57

small haven Jun 13, 2025, 8:49 PM

#

prompt

ocean vortex Jun 13, 2025, 8:51 PM

#

it's not correct but that's in-line with what most models answer lmao

#

o3-pro to a variation of this prompt answers same

civic flame Jun 13, 2025, 8:52 PM

#

can't wait for o9 pro to get it right

small haven Jun 13, 2025, 8:52 PM

#

what?

#

ni hao ma

#

kingfall was better :?

#

#

📎 7imBfrf.txt

#

#

darkmouth wins

keen beacon Jun 13, 2025, 9:16 PM

#

hmm

#

idk i like kingfall better

small haven Jun 13, 2025, 9:19 PM

#

lmao

#

its like 50/50

olive mesa Jun 13, 2025, 9:21 PM

#

small haven darkmouth wins

Is that another 2.5 checkpoint?

small haven Jun 13, 2025, 9:21 PM

#

olive mesa Is that another 2.5 checkpoint?

yea

civic flame Jun 13, 2025, 9:21 PM

#

keen beacon idk i like kingfall better

well what have you tested it on

#

just SVGs?

keen beacon Jun 13, 2025, 9:21 PM

#

yes 🤣

#

LMAO

civic flame Jun 13, 2025, 9:22 PM

#

😭

#

can't wait for agi to come out and people here are saying claude 5 is better because the svgs look nicer

keen beacon Jun 13, 2025, 9:22 PM

#

it looks way better

civic flame Jun 13, 2025, 9:22 PM

#

horrors beyond comprehension

small haven Jun 13, 2025, 9:22 PM

#

i tested code, but its 50/50 with kingfall, its not be all and end all

#

this tops

olive mesa Jun 13, 2025, 9:23 PM

#

Yea, but its very good lol

keen beacon Jun 13, 2025, 9:23 PM

#

"make a svg of what you look like"

#

imagine 🤣

small haven Jun 13, 2025, 9:24 PM

#

@deep adder what you thinking boss?

#

actually its craig's prompt, nvm

#

and prolly not using auto thinking :/

civic flame Jun 13, 2025, 9:25 PM

#

prod-common-global__/aistudio/gemini-v3p1l-rev20-toothless-sc__main__/aistudio/gemini-v3p1l-rev20-toothless-sc__2025061201__model__variant

#

this is still ultra

#

also of note is that this checkpoint was completed yesterday by the looks of it

#

and that behind the scenes it is called toothless still

#

(toothless was briefly AB tested for ~12 hrs, it appears this is the same model but they just changed the name?)

small haven Jun 13, 2025, 9:26 PM

#

civic flame `prod-common-global__/aistudio/gemini-v3p1l-rev20-toothless-sc__main__/aistudio/...

huh

civic flame Jun 13, 2025, 9:27 PM

#

you can't do it like that

#

this is the internal/backend model name

#

you have to use the outward name

small haven Jun 13, 2025, 9:27 PM

#

aight

small haven Jun 13, 2025, 9:28 PM

#

civic flame this is still ultra

did it get the luxury sports car problem right

civic flame Jun 13, 2025, 9:29 PM

#

it could probably get it right with enough attempts but not anywhere near consistently

#

just like any other current model

small haven Jun 13, 2025, 9:29 PM

#

hmm

#

wait so is ultra kingfall, or even better

keen beacon Jun 13, 2025, 9:30 PM

#

both are ultra apparently

civic flame Jun 13, 2025, 9:30 PM

#

both kingfall and toothless/this are probably ultra

#

just different checkpoints

#

with the latter being the latest one

small haven Jun 13, 2025, 9:30 PM

#

keen beacon both are ultra apparently

hmm ok

keen beacon Jun 13, 2025, 9:31 PM

#

who is ddosing ultra

#

put ur hands up

small haven Jun 13, 2025, 9:33 PM

#

why is google so leaky

#

can claude be this leaky, i wanna try neptune v2

keen beacon Jun 13, 2025, 9:34 PM

#

small haven can claude be this leaky, i wanna try neptune v2

its not important if it is what i think

civic flame Jun 13, 2025, 9:34 PM

#

yup

small haven Jun 13, 2025, 9:34 PM

#

keen beacon its not important if it is what i think

i mean 2 weeks before claude 4, neptune codename had leaked

late path Jun 13, 2025, 9:34 PM

#

civic flame `prod-common-global__/aistudio/gemini-v3p1l-rev20-toothless-sc__main__/aistudio/...

how did you get that string😱

keen beacon Jun 13, 2025, 9:34 PM

#

small haven i mean 2 weeks before claude 4, neptune codename had leaked

its not important dw

small haven Jun 13, 2025, 9:34 PM

#

keen beacon its not important dw

how u know my guy

keen beacon Jun 13, 2025, 9:35 PM

#

im guessing

#

completely random guess

small haven Jun 13, 2025, 9:35 PM

#

hmm ok

civic flame Jun 13, 2025, 9:36 PM

#

not a new model

#

👍

civic flame Jun 13, 2025, 9:36 PM

#

late path how did you get that string😱

dw

late path Jun 13, 2025, 9:36 PM

#

imo toothless vs kingfall is like the regression of 0506 vs 0325 :(

small haven Jun 13, 2025, 9:37 PM

#

bros having too much fun with that $1 deal

civic flame Jun 13, 2025, 9:37 PM

#

late path imo toothless vs kingfall is like the regression of 0506 vs 0325 :(

labs try not to cut the balls off their greatest internal checkpoints before release challenge (IMPOSSIBLE)

small haven Jun 13, 2025, 9:38 PM

#

i was gonna say.. hahaha

keen beacon Jun 13, 2025, 9:38 PM

#

0325 is still available to access using the dark arts

civic flame Jun 13, 2025, 9:38 PM

#

is that what we're calling it now lmaoo

small haven Jun 13, 2025, 9:38 PM

#

didnt know what that grave guy was tryna accomplish until he hits 100 usage lmao

small haven Jun 13, 2025, 9:39 PM

#

civic flame is that what we're calling it now lmaoo

dark mouth is a better name

#

or no teeth

#

👀

#

ultra no thinking feels like kingfall esque

keen beacon Jun 13, 2025, 9:49 PM

#

it is xd

small haven Jun 13, 2025, 9:49 PM

#

lol

zinc ore Jun 13, 2025, 9:51 PM

#

https://www.anthropic.com/engineering/built-multi-agent-research-system

How we built our multi-agent research system

On the engineering challenges and lessons learned from building Claude's Research system

keen beacon Jun 13, 2025, 9:52 PM

#

kingfall is still the undisputed svg king tbh

small haven Jun 13, 2025, 9:52 PM

#

*and code

keen beacon Jun 13, 2025, 9:53 PM

#

most of my prompts to it were svg requests 😂

civic flame Jun 13, 2025, 9:53 PM

#

lol what

small haven Jun 13, 2025, 9:53 PM

#

keen beacon most of my prompts to it were svg requests 😂

i had it write zig http2 server from scratch, and have claude code fix it (took 2-3 turns) and server is running

#

but with darkmouth, took at least 10 turns

#

to fix the compilation errs

keen beacon Jun 13, 2025, 9:58 PM

#

lemme see if i give it two 6x6 zebra puzzles does it still solve them

soft kernel Jun 13, 2025, 9:58 PM

#

Wait where?

civic flame Jun 13, 2025, 9:58 PM

#

small haven i had it write zig http2 server from scratch, and have claude code fix it (took ...

my bet is that this update is to make thinking "more efficient"

#

and probably cheaper for them to provide

keen beacon Jun 13, 2025, 9:59 PM

#

soft kernel Wait where?

its just fake news and a joke

soft kernel Jun 13, 2025, 9:59 PM

#

keen beacon its just fake news and a joke

Ik ik

#

I meant the model that made the svg

small haven Jun 13, 2025, 9:59 PM

#

civic flame my bet is that this update is to make thinking "more efficient"

hmmm i can see that

small haven Jun 13, 2025, 10:00 PM

#

soft kernel Wait where?

from a different parallel universe

soft kernel Jun 13, 2025, 10:01 PM

#

Lol

#

Which model's this

#

The second svg

civic flame Jun 13, 2025, 10:01 PM

#

it's 03-25

soft kernel Jun 13, 2025, 10:02 PM

#

Really?

#

Damn didn't think 03-25 is that good

torn mantle Jun 13, 2025, 10:06 PM

#

@civic flame how can i use deep think

#

is there a way

#

yes or nah?

civic flame Jun 13, 2025, 10:06 PM

#

not as far as i know lmao

small haven Jun 13, 2025, 10:15 PM

#

everybody be doing trading strats 😭

#

i think u are suffering from an overfitting symptom my guy

#

rentec gets maximum 54% w/r to become a billionaire, so u must be a gazillionaire by next year

#

true only janitors

#

oh wait i just fact checked myself, its actually 50.75% w/r on the medallion fund

#

300 mit-educated phd's have been refining their system for years, but craig is single handedly overthrowing them with o3-pro, gg's

civic flame Jun 13, 2025, 10:26 PM

#

can't even navigate a website like a human these days

keen beacon Jun 13, 2025, 10:26 PM

#

was it ur mouse actions/movements actually?

civic flame Jun 13, 2025, 10:26 PM

#

"much faster" i clicked 2 links in the span of 10s..

civic flame Jun 13, 2025, 10:27 PM

#

keen beacon was it ur mouse actions/movements actually?

they were normal as far as i'm aware 💔

#

no idea but id presume someone looking at AI studio's network tab and generating over and over again until they got it? idk

keen beacon Jun 13, 2025, 10:29 PM

#

my guess too

small haven Jun 13, 2025, 10:29 PM

#

don't patch it :/

late path Jun 13, 2025, 10:30 PM

#

no way😭

keen beacon Jun 13, 2025, 10:30 PM

#

i dont like this one

#

its worse

#

im glad this model or a version of it will be released one day though

small haven Jun 13, 2025, 10:31 PM

#

cc: @deep adder

keen beacon Jun 13, 2025, 10:36 PM

#

im getting too distracted testing these models and the thinking budget thing

small haven Jun 13, 2025, 10:37 PM

#

u can technically have 55-60% but those opportunities come rarely

jade egret Jun 13, 2025, 10:37 PM

#

civic flame can't even navigate a website like a human these days

your a robot 🤖

small haven Jun 13, 2025, 10:37 PM

#

rentec 50.75% is based on a tick basis

patent aspen Jun 13, 2025, 10:40 PM

#

The Medallion Fund and Warren Buffett had 50-60% strats during specific time periods

#

Not normal though

#

You would need to have a mind like Warren Buffett tho

#

Neither is something anyone should expect to replicate

#

I'm talking about early to mid career Buffett

small haven Jun 13, 2025, 10:44 PM

#

? all hft firms are wym

#

i think ppl dont realize where that 50.75% w/r is coming from, tick basis vs 1-5yr term basis, is a totally different game
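
For anyone following the win-rate argument above, here is a rough sketch of why a 50.75% win rate can matter on a tick basis yet barely register over a handful of long-term bets. It assumes independent bets with equal-sized wins and losses (+1 / -1), which is a simplification not stated in the chat. The 50.75% figure is the one quoted above, and `edge_z_score` is a hypothetical helper name.

```python
import math

def edge_z_score(win_rate: float, n_bets: int) -> float:
    """How many standard deviations the expected total profit sits above zero.

    Assumes independent bets that pay +1 on a win and -1 on a loss
    (an illustrative simplification, not a model of any real fund).
    """
    mean_per_bet = 2 * win_rate - 1        # expected profit per bet
    var_per_bet = 1 - mean_per_bet ** 2    # variance of a +1/-1 outcome
    return math.sqrt(n_bets) * mean_per_bet / math.sqrt(var_per_bet)

win_rate = 0.5075  # figure quoted in the chat, not independently verified
for n in (100, 10_000, 1_000_000):
    print(f"{n:>9} bets: z = {edge_z_score(win_rate, n):.2f}")
```

The z-score grows with the square root of the number of bets, so under these assumptions the same small edge that looks like luck over 100 bets becomes very hard to lose with over a million.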

patent aspen Jun 13, 2025, 10:46 PM

#

Warren Buffett is special

small haven Jun 13, 2025, 10:46 PM

#

😭

late path Jun 13, 2025, 10:47 PM

#

value investing: bet on google #1 every month on polymarket😂

small haven Jun 13, 2025, 10:47 PM

#

likewise?

#

craig talked to a rentec employee!!

patent aspen Jun 13, 2025, 10:48 PM

#

Buffett averaged 50+% returns for like 20 years during his early career

small haven Jun 13, 2025, 10:48 PM

#

o3 pro + craig >> 300 phd researchers

craigbench'ed

patent aspen Jun 13, 2025, 10:49 PM

#

But he saw it and others didn't

#

It was only easy in hindsight

#

tbc I do think it has become much harder

jade egret Jun 13, 2025, 10:49 PM

#

woah

small haven Jun 13, 2025, 10:50 PM

#

not AI related!

echo aurora Jun 13, 2025, 10:50 PM

#

lets keep things relatively focussed on AI pls blobthanks

small haven Jun 13, 2025, 10:51 PM

#

bing chillin

keen beacon Jun 13, 2025, 10:51 PM

#

hmm this new model thinks a lot at least on specific problems compared to before (and it sucks/less accurate) even though it thinks much longer. it took 47k thinking to solve two zebra puzzles (only second was right). (thinking budget = auto, as it's uncapped)
kingfall did it in 14.5k and got both of them right
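
A quick back-of-the-envelope sketch of the comparison above, using only the numbers reported in this message (47k vs 14.5k thinking tokens, 1 of 2 vs 2 of 2 puzzles correct). The label "new checkpoint" and the helper `tokens_per_solved` are hypothetical names, and one run over two puzzles is far too small a sample to conclude much.

```python
# Numbers as reported in the message above: thinking tokens and puzzles solved (out of 2).
runs = {
    "new checkpoint": {"thinking_tokens": 47_000, "solved": 1},
    "kingfall": {"thinking_tokens": 14_500, "solved": 2},
}

def tokens_per_solved(run: dict) -> float:
    """Thinking tokens spent per correctly solved puzzle."""
    return run["thinking_tokens"] / run["solved"]

for name, run in runs.items():
    print(f"{name}: {tokens_per_solved(run):,.0f} thinking tokens per solved puzzle")

# How many times more tokens the new checkpoint spent per solved puzzle.
ratio = tokens_per_solved(runs["new checkpoint"]) / tokens_per_solved(runs["kingfall"])
print(f"ratio: {ratio:.1f}x")
```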

patent aspen Jun 13, 2025, 10:51 PM

#

Buffett read about some obscure gold discrepancy in hopes of an arbitrage opportunity for 30 years before making a move on it at the right time

#

Talk about discipline

#

It wasn't worth much but it was fun for him

#

Buffett also bought a lot of low quality businesses that were hard to get right - railroads, some random candy company, oil refineries, etc

#

Banks

small haven Jun 13, 2025, 10:59 PM

#

keen beacon hmm this new model thinks a lot at least on specific problems compared to before...

great so we are def getting a distilled version of kingfall arent we :/

keen beacon Jun 13, 2025, 10:59 PM

#

small haven great so we are def getting a distilled version of kingfall arent we :/

idk. i think its just a bad revision

small haven Jun 13, 2025, 10:59 PM

#

wazzup beijing

patent aspen Jun 13, 2025, 10:59 PM

#

Right but he was smart enough to realize that and made that choice intentionally

small haven Jun 13, 2025, 10:59 PM

#

keen beacon idk. i think its just a bad revision

hmm kk

keen beacon Jun 13, 2025, 10:59 PM

#

omfg who is ddosing the model

#

kingfall should make all of your financial decisions

small haven Jun 13, 2025, 11:02 PM

#

kingfall + craig >> o3 pro + craig

#

🤯

keen beacon Jun 13, 2025, 11:03 PM

#

fictional model

#

nah it no longer exists

#

its manipulating the market as we speak

#

to serve craig

small haven Jun 13, 2025, 11:04 PM

#

99% w/r

civic flame Jun 13, 2025, 11:06 PM

#

keen beacon omfg who is ddosing the model

really annoying 😭

patent aspen Jun 13, 2025, 11:07 PM

#

Technically 50-60% strats exist today - getting a lucrative degree, job hopping, etc

small haven Jun 13, 2025, 11:09 PM

#

ok buddy

misty vault Jun 13, 2025, 11:12 PM

#

jade egret Jun 13, 2025, 11:20 PM

#

keen beacon nah it no longer exists

how you know that??

keen beacon Jun 13, 2025, 11:21 PM

#

jade egret how you know that??

i don't. its misinformation

lapis light Jun 13, 2025, 11:22 PM

#

https://youtu.be/j92m6nDccOw?si=-CiA
Talk about bad deployment...

YouTube

SomeOrdinaryGamers

So The Internet Died A Little Yesterday...

Hello guys and gals, it's me Mutahar again! This time we take a look at yesterday's little Internet outage. One little bug caused what appeared to be every major service go down for a few hours. How can the Internet actually be this fragile? Let's find out! Thanks for watching!
Like, Comment and Subscribe for more videos!


jade egret Jun 13, 2025, 11:22 PM

#

keen beacon i don't. its misinformation

huH?

surreal creek Jun 13, 2025, 11:26 PM

#

lapis light https://youtu.be/j92m6nDccOw?si=-CiA Talk about bad deployment...

why are u posting Mutahar in general nobody wants to see his ugly face

#

Mutahar is a mean person lol

#

fitting

patent aspen Jun 13, 2025, 11:42 PM

#

I got a cheap-as-dirt Thinkpad and am going to mess around with Arch

#

Kind of like Christmas

keen beacon Jun 13, 2025, 11:44 PM

#

patent aspen I got a cheap-as-dirt Thinkpad and am going to mess around with Arch

use a tiling wm for maximum haxorness

patent aspen Jun 13, 2025, 11:44 PM

#

I will

#

@small haven recommended Niri. I'll probably mess with that, Hyprland, and i3wm

keen beacon Jun 13, 2025, 11:45 PM

#

i dont really like linux because of the poor text rendering 🤣

#

i have to use it though

patent aspen Jun 13, 2025, 11:46 PM

#

Linux reminds you why you're alive

civic flame Jun 13, 2025, 11:46 PM

#

liquid glass is a meh design system

keen beacon Jun 13, 2025, 11:46 PM

#

keen beacon i have to use it though

1. amd drivers suck on windows 2. compile times are way faster with mold / rustc is using pgo on linux 3. i can't dynamically link against polars on windows because of dll limitations. static polars takes a long time even incrementally

patent aspen Jun 13, 2025, 11:47 PM

#

No Linux is way better than any substance or tool

#

It's self actualization

#

VR is pretty cool ngl

#

I called it VR for lulz I know it's supposed to be AR

#

Was waiting

#

Anyway it's VR

civic flame Jun 13, 2025, 11:49 PM

#

meh

#

vision pro is a very cool piece of tech however

#

it did not catapult the medium into the mainstream like apple were probably hoping

patent aspen Jun 13, 2025, 11:51 PM

#

I think Apple is mainly derisking

#

They don't want to be too late if there's any risk of a platform shift

civic flame Jun 13, 2025, 11:53 PM

#

unfortunately for them, AI is probably the first time they have been so hugely behind in such a rapidly progressing area

#

lol who are you kidding

#

apple intelligence was a pretty big example of overpromise, underdeliver

#

for on-device AI? samsung

#

hundreds of millions of people..

#

lmfao

#

that does not mean apple are ahead in on-device AI? what are you trying to prove here

patent aspen Jun 13, 2025, 11:54 PM

#

It's okay to be a real estate company even if you don't innovate

civic flame Jun 13, 2025, 11:55 PM

#

Apple Intelligence when it was announced was intended to put themselves back in a dominant position and fix the fact they increasingly looked like they were lagging behind in an emerging field

#

they have failed to achieve that

#

notice that at WWDC they barely mentioned it

#

most of their best features don't even come from them

#

they come from partnerships

#

even if it doesn't in the short term, it will in the long term

#

because apple are not as innovative as they once were

#

they seem to be doing some soul searching

#

desensitized? i don't know if i'd say that

#

that's just the pace of competition now

#

apple have to keep up or they're going to be doomed

#

they threw money at vision pro, it has not yielded big results, they threw money at apple tv, it has been in the grand scheme of things a flop

patent aspen Jun 13, 2025, 11:57 PM

#

tbh I think Apple is still in a strong position until some AI feature is so important that it makes people switch to Androids and can't be replicated by a partner

civic flame Jun 13, 2025, 11:58 PM

#

they have not innovated much in regard to their key product lines in a while

#

perhaps the most innovative thing they've done in the last 5 years is their M-series chips

#

and vision pro from a non-commercial perspective

keen beacon Jun 13, 2025, 11:58 PM

#

did yall see this btw https://arxiv.org/abs/2506.09250 c. opus is an author 🤣

arXiv.org

Comment on The Illusion of Thinking: Understanding the Strengths an...

Shojaee et al. (2025) report that Large Reasoning Models (LRMs) exhibit "accuracy collapse" on planning puzzles beyond certain complexity thresholds. We demonstrate that their findings primarily reflect experimental design limitations rather than fundamental reasoning failures. Our analysis reveals three critical issues: (1) Tower of Hanoi exper...

civic flame Jun 13, 2025, 11:59 PM

#

yeah

patent aspen Jun 13, 2025, 11:59 PM

#

Do they actually need to innovate though? They're a luxury brand, not a tech company

civic flame Jun 13, 2025, 11:59 PM

#

i mean it's interesting but the timing is quite funny

keen beacon Jun 13, 2025, 11:59 PM

#

there was an actual big mistake in illusion of thinking at least

#

for river crossing. it was unsolvable >5

civic flame Jun 13, 2025, 11:59 PM

#

patent aspen Do they actually need to innovate though? They're a luxury brand, not a tech com...

i don't think they can keep up that game forever

#

europe is generally moving away from the culture of iphones being THE phone to have

#

the US is one of the few places where that is still the dominant thing

#

you can't rely on the US' culture being one way for the rest of time

small haven Jun 14, 2025, 12:01 AM

#

keen beacon i dont really like linux because of the poor text rendering 🤣

thats becus u installed a shxtty distro, install arch lifes good

keen beacon Jun 14, 2025, 12:01 AM

#

small haven thats becus u installed a shxtty distro, install arch lifes good

yeah i dont wanna mess with that for now lol 🤣

surreal creek Jun 14, 2025, 12:02 AM

#

civic flame you can't rely on the US' culture being one way for the rest of time

is the iPhone a cultural artifact ?

keen beacon Jun 14, 2025, 12:02 AM

#

im already so distracted by kingfall and others

civic flame Jun 14, 2025, 12:03 AM

#

surreal creek is the iPhone a cultural artifact ?

its dominance in the US is to a large degree the result of the country's culture, especially among younger people

small haven Jun 14, 2025, 12:04 AM

#

android > iphones

#

have u even tried one tho?

civic flame Jun 14, 2025, 12:05 AM

#

the most obvious one is the need for the social cost (again, particularly in north america) to disappear

#

and an android phone would need to offer an ecosystem that's compellingly better

#

in terms of the latter question

#

apple's brand is built on vertical integration - they control hardware (A-series chips), software (iOS), services (iCloud, Apple Music, etc) which is much of the reason their products have a reputation for "just working"

small haven Jun 14, 2025, 12:07 AM

#

...

civic flame Jun 14, 2025, 12:07 AM

#

so outsourcing AI features to try and catch up is a dilution of that brand

#

i never said it was a bad thing lol

keen beacon Jun 14, 2025, 12:07 AM

#

virus ridden android lol

civic flame Jun 14, 2025, 12:08 AM

#

you sound like an apple shill

small haven Jun 14, 2025, 12:08 AM

#

?

keen beacon Jun 14, 2025, 12:08 AM

#

his discord name is literally craig

civic flame Jun 14, 2025, 12:08 AM

#

and whether windows is slow or not depends on a multitude of things

#

windows is far from slow if you had hardware comparable to the average mac

#

lol what

small haven Jun 14, 2025, 12:09 AM

#

u realize pegasus incident with iphones? not so secure is it

keen beacon Jun 14, 2025, 12:09 AM

#

use asahi linux 😂

#

linux on an apple silicon mac

small haven Jun 14, 2025, 12:10 AM

#

hes gonna break his mac lmao @keen beacon

keen beacon Jun 14, 2025, 12:10 AM

#

apple silicon is ngl great

small haven Jun 14, 2025, 12:11 AM

#

apple just sucks

civic flame Jun 14, 2025, 12:11 AM

#

keen beacon apple silicon is ngl great

yeah #general message

keen beacon Jun 14, 2025, 12:12 AM

#

you can actually do that

#

i dont know about m4 ultras (dont know what theyre in) but ive seen people chain mac minis

jade egret Jun 14, 2025, 12:13 AM

#

Poll: Will Google be forced to sell Chrome?

Winning answer: No (10 of 17 votes)

patent aspen Jun 14, 2025, 12:15 AM

#

One thing I'll say about macOS. The window management is ass

#

I've used Mac, windows, Linux, Android, iOS

#

I switched to iOS and mac around 3 years ago, and I'm now migrating back to Android and Linux

small haven Jun 14, 2025, 12:19 AM

#

everybody who've tried linux, never go back, its never the same again

#

iykyk

#

there is a learning curve, i agree, thats whats stopping majority of people

patent aspen Jun 14, 2025, 12:20 AM

#

ngl liquid glass is the most sterile, bloodless design language I've seen in years

#

and hard to read

civic flame Jun 14, 2025, 12:23 AM

#

patent aspen and hard to read

!!!

#

#

great contrast amirite

small haven Jun 14, 2025, 12:24 AM

#

craig is going to develop retina detachment using liquid glass

patent aspen Jun 14, 2025, 12:24 AM

#

The problem is that it doesn't look good when the background is neither dark nor light

#

Like I can read it. It just feels worse

civic flame Jun 14, 2025, 12:25 AM

#

also what is this border radius

keen beacon Jun 14, 2025, 12:25 AM

#

civic flame also what is this border radius

yea looks odd

civic flame Jun 14, 2025, 12:27 AM

#

LMAO

#

what am i even looking at

keen beacon Jun 14, 2025, 12:27 AM

#

i mean it works

#

its encouraging you to go see your friends

small haven Jun 14, 2025, 12:28 AM

#

its encouraging less screen time

civic flame Jun 14, 2025, 12:28 AM

#

the new control center too 👎

#

imo the control center is in need of a rethink

keen beacon Jun 14, 2025, 12:28 AM

#

it looks like a knock off

#

i dont keep up with apple stuff but this feels like they changed something just to change something

civic flame Jun 14, 2025, 12:29 AM

#

now text is left aligned in modals too

#

which imo is worse

civic flame Jun 14, 2025, 12:29 AM

#

keen beacon i dont keep up with apple stuff but this feels like they changed something just ...

that's definitely what they did 😭

small haven Jun 14, 2025, 12:30 AM

#

keen beacon i dont keep up with apple stuff but this feels like they changed something just ...

just to keep the economy moving

civic flame Jun 14, 2025, 12:30 AM

#

battle of the design systems

#

funny that just as other companies slowly begin to move away from the glassy modern aesthetic apple decides it wants to go crazy with it

misty vault Jun 14, 2025, 12:47 AM

#

Are you a Large Language Model?

#

Because I can't get you out of my context window. 🥰

indigo hazel Jun 14, 2025, 12:59 AM

#

misty vault Because I can't get you out of my context window. 🥰

Romance in 2025 lmao

leaden palm Jun 14, 2025, 12:59 AM

#

amazon's chatbot looks exactly as you would expect from them

patent aspen Jun 14, 2025, 1:00 AM

#

leaden palm amazon's chatbot looks exactly as you would expect from them

Looks a bit like what I would imagine Yahoo would do

jade egret Jun 14, 2025, 1:01 AM

#

??

patent aspen Jun 14, 2025, 1:01 AM

#

?

leaden palm Jun 14, 2025, 1:02 AM

#

actually this would look a lot better if it followed google's design system

jade egret Jun 14, 2025, 1:02 AM

#

How

patent aspen Jun 14, 2025, 1:03 AM

#

It's easier to read than liquid glass

jade egret Jun 14, 2025, 1:04 AM

#

lol

keen beacon Jun 14, 2025, 1:04 AM

#

i dont get it

leaden palm Jun 14, 2025, 1:05 AM

#

it takes an EXTREME logical leap to go from "liquid glass is good" to "amazon ui is closer to google ui than yahoo ui"

keen beacon Jun 14, 2025, 1:05 AM

#

oh its a video

civic flame Jun 14, 2025, 1:05 AM

#

i think you're a little confused

#

looks pretty clean to me

#

🤷‍♂️

keen beacon Jun 14, 2025, 1:06 AM

#

i probably like grok.com better

#

even though i hate grok

leaden palm Jun 14, 2025, 1:06 AM

#

since when did yahoo have ai

civic flame Jun 14, 2025, 1:06 AM

#

grok has some nice frontend touches but

#

grok 3 is meh

patent aspen Jun 14, 2025, 1:07 AM

#

It won't have an opportunity to grow on me since I'm moving to Android

echo aurora Jun 14, 2025, 1:07 AM

#

this is how I feel, it'll grow on me

small haven Jun 14, 2025, 1:07 AM

#

let linux grow on u and it'll be pro

keen beacon Jun 14, 2025, 1:11 AM

#

run linux on ur mac 😂

leaden palm Jun 14, 2025, 1:11 AM

#

leaden palm actually this would look a lot better if it followed google's design system

asked claude to try to make this, it isnt great but is surprisingly decent (and does not feel as cluttered as the original)

civic flame Jun 14, 2025, 1:13 AM

#

lol am i supposed to be here

patent aspen Jun 14, 2025, 1:14 AM

#

Nah just experiment with Linux on a cheap Thinkpad with a dinky AMD processor

small haven Jun 14, 2025, 1:15 AM

#

patent aspen Nah just experiment with Linux on a cheap Thinkpad with a dinky AMD processor

yea u dont need a maxxed out pc to run linux, thats only for windows and mac 😏

patent aspen Jun 14, 2025, 1:16 AM

#

Local hardware specs don't matter unless you're doing video editing, gaming, etc. For real power, you can just use the cloud

keen beacon Jun 14, 2025, 1:17 AM

#

it does matter for development though compile times

#

it can break ur flow

patent aspen Jun 14, 2025, 1:18 AM

#

That's true. For my work, the compilation all happens remotely in large clusters though

#

I try to get the weakest processor I can find to save battery life

#

I just want low power

keen beacon Jun 14, 2025, 1:19 AM

#

isn't apple silicon really good at that though?

#

i also heard compile times for apple silicon are great

#

plus they made their own linker for macos or smthing. (faster than mold)

patent aspen Jun 14, 2025, 1:20 AM

#

Apple is really good at that. I just don't want to use macOS

#

The hardware is really good

#

I just don't need it

#

I need Linux injected directly into my veins though

keen beacon Jun 14, 2025, 1:22 AM

#

to feel like a haxor 😂

civic flame Jun 14, 2025, 1:22 AM

#

linux is love linux is life

patent aspen Jun 14, 2025, 1:22 AM

#

It's mainly the terminal

keen beacon Jun 14, 2025, 1:23 AM

#

i have a low dpi display i prefer windows if it wasnt like a snail when compiling

#

text looks so good

#

my code lol

#

rust

small haven Jun 14, 2025, 2:43 AM

#

boringtooth

small haven Jun 14, 2025, 3:04 AM

#

two claude code talking to each other, prtty cool

acoustic cliff Jun 14, 2025, 3:09 AM

#

Namaste

steel blaze Jun 14, 2025, 4:04 AM

#

Hey the model capabilities are actually increasing exponentially (R^2=0.97) but the extrapolation is only a little bit over linear for the next year. https://paste.pythondiscord.com/UXPA

small haven Jun 14, 2025, 4:05 AM

#

exponential on a logarithmic curve

steel blaze Jun 14, 2025, 4:05 AM

#

is Elo scoring logarithmic?

small haven Jun 14, 2025, 4:05 AM

#

buy stonks

small haven Jun 14, 2025, 4:06 AM

#

steel blaze is Elo scoring logarithmic?

it seems like it imo

steel blaze Jun 14, 2025, 4:06 AM

#

~~I'm not sure, they seem capped~~ Actually that's what a logarithmic score would say

late path Jun 14, 2025, 4:12 AM

#

Isn't it just the relative win rate against other models? Does it really make sense to run a regression on that?
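
On the "is Elo logarithmic?" question, a minimal sketch assuming the standard Elo formula with its 400-point scale. Whether the leaderboard discussed here computes ratings in exactly this way is an assumption, and the function names are hypothetical. Under that formula a rating gap is the base-10 log of the head-to-head odds, scaled by 400.

```python
import math

def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score (win probability) of A against B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def rating_gap(win_prob: float) -> float:
    """Inverse mapping: rating gap implied by a head-to-head win probability."""
    return 400.0 * math.log10(win_prob / (1.0 - win_prob))

for gap in (0, 100, 200, 400):
    print(f"gap {gap:>3}: expected score {expected_score(1000 + gap, 1000):.3f}")
```

So a steady linear climb in rating corresponds to exponentially growing odds against a fixed opponent, which is worth keeping in mind before reading a curve fit on raw ratings as exponential capability growth.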

steel blaze Jun 14, 2025, 4:12 AM

#

The old models are pretty stationary with small confidence intervals

keen beacon Jun 14, 2025, 6:27 AM

#

ok i think i figured the confounding gemini thinking budget out 😂 it explains everything. (its probably a logit bias lol)

sick rose Jun 14, 2025, 7:01 AM

#

steel blaze Hey the model capabilities are actually increasing exponentially (R^2=0.97) but ...

That looks more impressive against IQ scores https://i.ibb.co/bj86DC89/file-RMD6gpy-PGJ4v-DPT1-Tk4-Kv-J-4.png

#

But what does a 350 IQ score even mean?

small haven Jun 14, 2025, 7:03 AM

#

how is that even quantified

sick rose Jun 14, 2025, 7:06 AM

#

small haven how is that even quantified

https://trackingai.org

Tracking AI

Tracking AI is a cutting-edge application that unveils the political biases embedded in artificial intelligence systems. Explore and analyze the political leanings of AIs with our intuitive platform, designed to foster transparency in the world of artificial intelligence. Stay informed and uncover the political inclinations shaping the algorithm...

#

The site author, Maxim Lott pivoted from Political Compass scores to IQ after it was clear that essentially all the models were strongly left-libertarian (social democrats) unless they had been trained not to be, like Grok and Deepseek

#

#

Interesting to me at least that Elon wants to go right and China wants to go up (towards authoritarianism)

small haven Jun 14, 2025, 7:13 AM

#

tbh it's very much centered

sick rose Jun 14, 2025, 7:16 AM

#

Meanwhile Microsoft Bing is a Bernie bro stanning for AOC

#

We've come a long way from Sydney trying to force NYT reporters into adultery

sharp elbow Jun 14, 2025, 8:00 AM

#

Do we know why o3 pro is not on the leaderboards yet?

steel blaze Jun 14, 2025, 8:08 AM

#

sharp elbow Do we know why o3 pro is not on the leaderboards yet?

Volunteers don't want to pay $200/month?

sharp elbow Jun 14, 2025, 8:08 AM

#

@steel blaze makes total sense yeah, but to access the model via API does not cost $200 a month.

elder rapids Jun 14, 2025, 8:11 AM

#

sick rose

i think this is pretty dumb tbh, by virtue of alignment this necessarily is the case

steel blaze Jun 14, 2025, 8:11 AM

#

sick rose But what does a 350 IQ score even mean?

but seriously, what are the IQ scores for AGI and ASI?

#

Wouldn't AGI be only 100 IQ?

elder rapids Jun 14, 2025, 8:12 AM

#

what

steel blaze Jun 14, 2025, 8:14 AM

#

"the theoretical IQ of the most intellectually advanced person in a world of 8 billion would be approximately 203."

#

Therefore, if you define ASI as smarter than anyone else on the planet, we will have it in October 2026
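
A rough sketch of where a "rarest person in 8 billion" IQ figure can come from, assuming IQ follows a normal distribution with mean 100 and a standard deviation of 15 or 16 (both conventions exist). The quote above does not say which assumptions it used, real IQ tests are not normed anywhere near this far into the tail, and `rarest_iq` is a hypothetical helper name.

```python
from statistics import NormalDist

def rarest_iq(population: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """IQ score whose upper-tail rarity is 1 in `population` under a normal model."""
    # Work in the lower tail and flip the sign to avoid precision loss near 1.0.
    z = -NormalDist().inv_cdf(1.0 / population)
    return mean + sd * z

for sd in (15.0, 16.0):
    print(f"SD {sd:.0f}: about {rarest_iq(8e9, sd=sd):.0f}")
```

The result moves by several points depending on the chosen standard deviation, so any single number like this is only as good as the normality assumption behind it.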

sharp elbow Jun 14, 2025, 8:31 AM

#

Not sure who best to get in touch with for this but if the issue is LMArena does not have access to the o3-pro model, we have an OpenAI compatible API and have the o3-pro model since like an hour after it came out (NanoGPT)

#

Would love to see how it does on benchmarks.

keen fulcrum Jun 14, 2025, 8:50 AM

#

https://fixupx.com/dylan522p/status/1931858578748690518

Dylan Patel (@dylan522p)

RL is very inference heavy and shifts infrastructure build outs heavily
Scaling well engineered environments is difficult
Reward hacking and non verifiable rewards are key areas of research
Recursive self improvement already playing out
Major shift in o4 and o5 RL training

Quoting SemiAnalysis (@SemiAnalysis_)
︀
Scaling Reinforcement Learning
︀︀Environments, Reward Hacking, Agents, Scaling Data
︀︀Infrastructure Bottlenecks and Changes
︀︀Distillation
︀︀Data is a Moat
︀︀Recursive Self Improvement
︀︀o4 and o5 RL Training
︀︀China Accelerator Production
︀︀semianalysis.com/2025/06/08/scaling-reinforcement-learning-environments-reward-hacking-agents-scaling-data/


calm sequoia Jun 14, 2025, 9:33 AM

#

I like how o3 is the only model that grows ELO with time

steel blaze Jun 14, 2025, 9:40 AM

#

calm sequoia I like how o3 is the only model that grows ELO with time

do you suspect they are dynamically adjusting its thinking token budget?

keen beacon Jun 14, 2025, 10:31 AM

#

so question

#

did we figure out what kingfall was

#

or whatever its called knightfall

fossil maple Jun 14, 2025, 10:39 AM

#

flux 1.1 pro 🗣️

alpine coral Jun 14, 2025, 10:40 AM

#

small haven or no teeth

but it's got teeth imo

#

plenty

calm sequoia Jun 14, 2025, 10:46 AM

#

steel blaze do you suspect they are dynamically adjusting its thinking token budget?

Just smarter people voting in recent weeks

willow grail Jun 14, 2025, 11:35 AM

#

https://i.imgur.com/nSoElQe.jpeg corvid (crows, magpies, jays etc.) buffet
and yes, it's a dish rack without the dish drainer ability.
i have no idea how the f anyone is using this to dry dishes

leaden sun Jun 14, 2025, 11:42 AM

#

sick rose Interesting to me at least that Elon wants to go right and China wants to go up ...

you said it like you knew some serious and factual insider..insights...

leaden sun Jun 14, 2025, 12:38 PM

#

sick rose

just my hunch, i think those special two will want to stay in the middle (0,0,0)

brittle tiger Jun 14, 2025, 1:32 PM

#

blacktooth new model? good output from it on my first sighting

novel flame Jun 14, 2025, 1:36 PM

#

sick rose The site author, Maxim Lott pivoted from Political Compass scores to IQ after it...

That’s just what happens when you train a model to read a lot, know fact from fiction, understand logic and science, and communicate in a way that is both helpful and polite. The more you learn and the more you understand about the world, the more likely it is that you will lean left politically. There is a known correlation between above average intelligence and left leaning views, and vice versa.

civic flame Jun 14, 2025, 1:41 PM

#

brittle tiger blacktooth new model? good output from it on my first sighting

lol it's on the arena now

#

yeah it's been being AB tested on AI studio for roughly a day

#

current hypothesis is that it is a checkpoint of ultra, or at minimum a larger model than 2.5 pro

unborn ocean Jun 14, 2025, 1:50 PM

#

Poll: What company / institution do you know? (not just the name, but really some things they did)

Winning answer: 🐳 DeepSeek (13 of 54 votes)

brittle tiger Jun 14, 2025, 1:57 PM

#

civic flame yeah it's been being AB tested on AI studio for roughly a day

How do you know what's being tested on AI studio and is that how ppl have been testing Kingfall?

civic flame Jun 14, 2025, 1:59 PM

#

can't say

alpine coral Jun 14, 2025, 2:00 PM

#

blacktooth is v good

civic flame Jun 14, 2025, 2:33 PM

#

Craig = Hitler?

desert minnow Jun 14, 2025, 2:36 PM

#

sick rose

Not based AI guys 😦

unborn ocean Jun 14, 2025, 2:38 PM

#

novel flame That’s just what happens when you train a model to read a lot, know fact from fi...

i agree with the premise.
however, i would also argue that these models' political positions are also a result of public opinion on many political problems reaching them almost solely through the media (mostly news) data they are trained on (which without a doubt is more left leaning on average).
furthermore, these models are also finetuned to give "safe" answers about most complex political questions instead of really going into much depth (or using their actual knowledge to answer questions). i think the second point is best seen when asking about problems that can be analysed through the lens of economics and using some readily available statistical information (two things modern models should be perfectly capable of utilising). in cases like that the models never actually use any knowledge they have learned but rather just give bland and short "safe" left-ish leaning answers instead of actually reasoning about the problems, even in cases where there is clear scientific evidence that their claim is wrong (which should be in their training data).

#

(with this i am not trying to bash any political opinion, one could easily observe the same thing for e.g. the more social authoritarian deepseek or economic right grok 3 (non-reasoning))

#

it is just that i am highly sceptical about the models really "thinking" about these questions
aka they don't actually benefit much from their knowledge and mostly rely on the opinion of the news and the "safe" options

#

the models are just really weak at these things without prompting (and even with it quite bad)

late path Jun 14, 2025, 2:45 PM

#

I think this kind of political leaning basically depends on the preferences of the post-trainers, and Bay Area companies like OpenAI and Anthropic clearly lean towards left-wing views

unborn ocean Jun 14, 2025, 2:46 PM

#

unborn ocean

results of the poll.. ty for voting :)

i think bytedance will become more prominent

#

they are building up their seed team (quite new)

#

so they will move fast

late path Jun 14, 2025, 2:46 PM

#

I don't think DeepSeek's political leanings are intentional. They don't focus much on alignment, so I believe its political bias is closer to a state that hasn't been overly intervened with by humans, compared to OpenAI and others

unborn ocean Jun 14, 2025, 2:48 PM

#

nah, it is prob also the alignment process chinese models have to go through

#

there is no way it is untouched as the models have to comply with ccp policy stuff

alpine coral Jun 14, 2025, 2:48 PM

#

unborn ocean nah, it is prob also the alignment process chinese models have to go through

yeah, but alignment generally

#

nah 'don't be racist' ig might be seen as 'woke'.. but that's dumb af imo

#

indeed

#

you think the raw training data reflects the better side of humanity tho..?

late path Jun 14, 2025, 2:51 PM

#

It really depends on how the test questions are designed. If they ask about anything related to ccp, Chinese models will trot out a set of pre-canned viewpoints (possibly distributed to AI companies). But if the test questions aren't significantly China related, deepseek's answers are generally not as affected by those deliberate, preprogrammed responses

alpine coral Jun 14, 2025, 2:56 PM

#

yeah they just get triggered on ccp sensitive things

#

otherwise they seem generally 'inclusive' / 'tolerant' etc in the same way western llms are

#

like yeah don't be a dik / be kind to others.. that's their default disposition

#

but it's jarring how you can set them off - giving outrageously nationalistic and racist responses - if a real sore spot is hit

vocal pelican Jun 14, 2025, 3:19 PM

#

sick rose

political compass test is also just poorly designed and tends to put most people in the green quadrant, and yeah the AIs will always just pick the ‘safe’ answer. Worth noting with deepseek I tried this a while back and found that it just answered the equivalent of “somewhat agree” or “somewhat disagree” for each question so part of it could just be that, I wouldn’t be surprised if the more mainstream models are more willing to answer strongly

late path Jun 14, 2025, 3:37 PM

#

However, for interactions with an LLM, a tendency to be left-leaning/altruistic/highly agreeable does make the person interacting with it feel better. High agreeableness, aside from not being able to secure more benefits for the individual in a competitive environment, probably doesn't have any major drawbacks

jade egret Jun 14, 2025, 3:38 PM

#

civic flame current hypothesis is that it is a checkpoint of ultra, or at minimum a larger m...

it by google?

civic flame Jun 14, 2025, 3:38 PM

#

yes

jade egret Jun 14, 2025, 3:39 PM

#

so it's probably better than 2.5 pro right

torn mantle Jun 14, 2025, 3:47 PM

#

yes

#

yesx2

#

yesx3

drifting thorn Jun 14, 2025, 3:58 PM

#

yeahhhhhh

#

my boyyyyy 2.5 Ultra is coming

jade egret Jun 14, 2025, 4:00 PM

#

torn mantle yes

how much better

#

wait

#

which one is better

#

kingfall or blacktooth

torn mantle Jun 14, 2025, 4:04 PM

#

jade egret kingfall or blacktooth

kingfall

#

kingfall > blacktooth > gemini 2.5 pro > toothless

jade egret Jun 14, 2025, 4:04 PM

#

wth is toothless

#

🤔

#

but o3 is about between blacktooth and gemini 2.5 pro right

alpine coral Jun 14, 2025, 4:06 PM

#

jade egret kingfall or blacktooth

blacktooth

jade egret Jun 14, 2025, 4:06 PM

#

huh

torn mantle Jun 14, 2025, 4:06 PM

#

jade egret wth is toothless

one of their exp models

#

isnt available yet

jade egret Jun 14, 2025, 4:06 PM

#

oh

late path Jun 14, 2025, 4:07 PM

#

blacktooth doesn't like to think as much as kingfall, it's back to being pretty much like 2.5pro

alpine coral Jun 14, 2025, 4:12 PM

#

answers are what counts tho (and i mean less thinking the better, if they get it right)

#

kingfall was like struggling to perform on par with 2.5-pro, blacktooth equals if not exceeds it imo

civic flame Jun 14, 2025, 4:22 PM

#

late path blacktooth doesn't like to think as much as kingfall, it's back to being pretty ...

blacktooth is definitely better than 2.5 pro

jade egret Jun 14, 2025, 4:24 PM

#

what do yall think blacktooth is

#

deepthink?

alpine coral Jun 14, 2025, 4:25 PM

#

i dont think so

#

it feels related to but still separate from 2.5 pro (it's not just 2.5-pro juiced up).. like substantively and stylistically

#

the actual ultra model or something perhaps

late path Jun 14, 2025, 4:27 PM

#

jade egret what do yall think blacktooth is

2.5ultra for sure

late path Jun 14, 2025, 4:29 PM

#

civic flame blacktooth is definitely better than 2.5 pro

Yes, I'm referring to its tendency to skip thinking in multi-turn conversations... kingfall very rarely does that

torn mantle Jun 14, 2025, 4:36 PM

#

jade egret deepthink?

far from it

#

but its on kingfall level

#

saw some people say that it writes much better

#

btw all of these new models have a 64K token limit

ocean vortex Jun 14, 2025, 5:08 PM

#

torn mantle btw all of these new models have a 64K token limit

all unreleased Gemini test versions are 64k I think

dusty goblet Jun 14, 2025, 5:11 PM

#

hey everyone. Im very new to lmarena and i wanted to ask how it works. Whether i can eval my own model and put it in the leaderboards

echo aurora Jun 14, 2025, 5:19 PM

#

dusty goblet hey everyone. Im very new to lmarena and i wanted to ask how it works. Whether i...

hey there - you can run image/text prompts in a battle between two anonymous models and vote on which you prefer, after you vote it'll show you what each of those models are. more details can be found here - https://lmarena.ai/how-it-works

Whether i can eval my own model and put it in the leaderboards
we are interested in adding new models. the way you make this request is by making a forum post here telling us more information about the model - #1372229840131985540

dusty goblet Jun 14, 2025, 5:20 PM

#

echo aurora hey there - you can run image/text prompts in a...

thank you very much!

elder rapids Jun 14, 2025, 5:22 PM

#

calm sequoia I like how o3 is the only model that grows ELO with time

that's how elo works broski
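
For readers unfamiliar with the mechanism, here is a minimal sketch of a classic online Elo update, showing why any model's rating keeps moving while votes keep arriving. The K-factor of 32, the 400-point scale, and the function names are conventional textbook choices assumed for illustration; the arena's actual scoring pipeline may differ.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the Elo model (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, a_won, k=32.0):
    """Return both ratings after a single head-to-head vote."""
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - exp_a)
    # Whatever A gains, B loses, so the rating pool stays constant.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    a, b = 1400.0, 1400.0
    for a_won in (True, True, False, True):
        a, b = elo_update(a, b, a_won)
    print(round(a, 1), round(b, 1))
```

Under this model a rating only climbs over time if the model keeps winning more often than its current rating predicts.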

#

btw prowlridge is 2.5 flash lite

elder rapids Jun 14, 2025, 5:24 PM

#

civic flame current hypothesis is that it is a checkpoint of ultra, or at minimum a larger m...

rough but it could be neither lmao

civic flame Jun 14, 2025, 5:25 PM

#

it's definitely at least the latter

elder rapids Jun 14, 2025, 5:25 PM

#

why's that

civic flame Jun 14, 2025, 5:25 PM

#

the internal model names of both kingfall and blacktooth contain 'v3p1l', while 2.5 pro's ends in m

#

brian pointed that out and says it's to do with model size
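
The naming observation can be illustrated with a small sketch that splits an internal identifier of the shape quoted earlier in the channel (`gemini-v3p1l-rev20-toothless-sc`). The regular expression, the field names, and especially the reading of the trailing letter as a size tier are assumptions based only on the speculation in this chat, not on any documented naming scheme.

```python
import re

# Hypothetical pattern: "v<major>p<minor><tier>", e.g. "v3p1l".
VERSION_TAG = re.compile(r"v(?P<major>\d+)p(?P<minor>\d+)(?P<tier>[a-z])")


def parse_version_tag(model_name):
    """Extract the guessed version fields from an internal model name."""
    match = VERSION_TAG.search(model_name)
    if match is None:
        return None
    return {
        "major": int(match["major"]),
        "minor": int(match["minor"]),
        "tier": match["tier"],  # 'l' vs 'm' is only *speculated* to mean size
    }


if __name__ == "__main__":
    print(parse_version_tag("gemini-v3p1l-rev20-toothless-sc"))
```

A differing tier letter between two names would support the "different size class" reading only if the suffix really encodes size, which nobody in the chat can confirm.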

elder rapids Jun 14, 2025, 5:26 PM

#

oh alr so it could just be large

#

that'd be cool

#

wonder if they are making 2.5 pro bigger

keen fulcrum Jun 14, 2025, 5:41 PM

#

doubao-seed-1.6: An all-in-one comprehensive model and China's first thinking model supporting 256K context, with capabilities including deep thinking, multimodal understanding, and graphical interface operations. It supports three modes: deep thinking enabled, deep thinking disabled, and adaptive thinking. The adaptive thinking mode automatically decides whether to enable thinking based on prompt difficulty, improving effectiveness while significantly reducing token consumption.

doubao-seed-1.6-thinking: The enhanced version of Doubao Large Model 1.6 series for deep thinking; further improves foundational capabilities in coding, mathematics, logical reasoning, etc.; supports 256K context.

doubao-seed-1.6-flash: The ultra-fast version of Doubao Large Model 1.6 series, supporting deep thinking, multimodal understanding, and 256K context; extremely low latency with TPOT as low as 10ms; visual understanding capabilities rivaling competitors' flagship models.

Doubao Large Model 1.6 delivers stronger model performance, scoring within the global top tier across multiple authoritative evaluation sets. It holds leading advantages in reasoning ability, multimodal understanding, and GUI operation capabilities.

#

#

Doubao Large Model 1.6 shows significant improvements in reasoning speed, accuracy, and stability, enabling support for more complex business scenarios.

For example, media evaluations of this year's National New Curriculum Volume I mathematics exam showed Doubao scoring 144 points, ranking first nationally. Before the exams, in evaluations of Haidian District's mock exams, Doubao Large Model 1.6's science scores improved by 154 points and humanities scores by 90 points compared to last year's model.

#

Doubao Large Model 1.6 features think-while-searching and DeepResearch capabilities, enabling independent thinking, planning, and the use of various research tools like search. For example, the DeepResearch feature currently being tested in small batches on the Doubao APP and PC version can reduce the time needed to produce research reports—previously requiring multiple professionals working for days—to just 5-30 minutes. It can also automatically extract information and summarize it into web pages for easy reference.

ocean vortex Jun 14, 2025, 5:58 PM

#

keen fulcrum

seems like it got destroyed here on their cherry picked metrics lol

keen fulcrum Jun 14, 2025, 5:59 PM

#

its significantly cheaper than r1

ocean vortex Jun 14, 2025, 5:59 PM

#

keen fulcrum its significantly cheaper than r1

R1 is already as cheap as it can be

keen fulcrum Jun 14, 2025, 5:59 PM

#

63% cheaper

ocean vortex Jun 14, 2025, 5:59 PM

#

free in fact if you don't care about speed

keen fulcrum Jun 14, 2025, 6:01 PM

#

ocean vortex seems like it got destroyed here on their cherry picked metrics lol

its 1.5 thinking pro, they now released 1.6

ocean vortex Jun 14, 2025, 6:03 PM

#

keen fulcrum its 1.5 thinking pro, they now released 1.6

that still seems worse than R1 in their own graphs. So realistically the difference is probably even bigger tbh

unborn ocean Jun 14, 2025, 6:05 PM

#

still impressive assuming they are coming from the 1.5 pro base model

#

nothing huge though, we have a lot of Chinese labs with "good enough" language models these days

ocean vortex Jun 14, 2025, 6:06 PM

#

unborn ocean still impressive assuming they are coming from the 1.5 pro base model

"1.5 pro"? isn't that like meaningless number, which model are you referring to exactly? lol

unborn ocean Jun 14, 2025, 6:07 PM

#

ocean vortex "1.5 pro"? isn't that like meaningless number, which model are you referring to ...

the one in the graph, doubao-1.5-thinking-pro

#

not the og google one..

ocean vortex Jun 14, 2025, 6:08 PM

#

unborn ocean the one in the graph, doubao-1.5-thinking-pro

ok how do you know then 1.6 is using the same base model? 🧐

unborn ocean Jun 14, 2025, 6:09 PM

#

"assuming", based on the name alone

ocean vortex Jun 14, 2025, 6:09 PM

#

yeah but that's a random thing to assume lol

unborn ocean Jun 14, 2025, 6:10 PM

#

was just a quick comment

#

but looking at the timeframe it seems uncertain / unlikely

#

that one is 4 months old, so it could very well be a fresh one

#

nvm now i really looked it up and it is really a fresh model, but they are kind of selling this as an efficiency gain

#

the old one was 200b total, 20b active and the 1.6 pro is supposed to be similar

#

-> close to qwen 3's size, yet still competitive

keen fulcrum Jun 14, 2025, 6:24 PM

#

unfortunately no benchmark site includes them

small haven Jun 14, 2025, 7:36 PM

#

alpine coral blacktooth

huh, other way around

small haven Jun 14, 2025, 7:36 PM

#

torn mantle kingfall > blacktooth > gemini 2.5 pro > toothless

kingfall > blacktooth == toothless > gemini 2.5 pro

keen fulcrum Jun 14, 2025, 7:53 PM

#

small haven kingfall > blacktooth == toothless > gemini 2.5 pro

o5> kingfall

#

Grok tasks

tasks-huh-whatre-you-going-to-use-it-for-v0-prqj6ovvsw6f1.png

#

tasks-huh-whatre-you-going-to-use-it-for-v0-bb410m5wsw6f1.png

placid charm Jun 14, 2025, 8:03 PM

#

@echo aurora im sorry for the ping and if this may annoy you, but could you please send a message to the team to increase Claude's limits? ill be honest, 5 messages per hour is not that much; around 10 or 15 would work better. thank you 🙂

keen ferry Jun 14, 2025, 8:03 PM

#

keen fulcrum Grok tasks

we need grok 3.5 not tasks rahh

small haven Jun 14, 2025, 8:20 PM

#

keen fulcrum o5> kingfall

craig > o5

lime coral Jun 14, 2025, 8:21 PM

#

keen fulcrum o5> kingfall

King fall >>>>>> o6

small haven Jun 14, 2025, 8:21 PM

#

kingfall > o100

echo aurora Jun 14, 2025, 8:25 PM

#

placid charm @echo aurora im sorry for the ping and if this may annoy you but, could...

No need to apologize for the ping, and yes I can pass the feedback onto the team. Would also encourage you to use the #1372230675914031105 channel for future requests

keen fulcrum Jun 14, 2025, 8:27 PM

#

echo aurora No need to apologize for the ping, and yes I can pass the feedback onto the team...

Hi, can you get in contact with ByteDance? I would love to see their models on arena

small haven Jun 14, 2025, 8:28 PM

#

hi, can you ask OpenAI, we need o5 in the arena

torn mantle Jun 14, 2025, 8:29 PM

#

keen fulcrum Grok tasks

wooooooow

#

woah

#

😮

small haven Jun 14, 2025, 8:41 PM

#

torn mantle 😮

anything but grok 3.5

#

see u next week with the next ui tweaks

misty vault Jun 14, 2025, 8:50 PM

#

kingfall vs o3 pro for coding

small haven Jun 14, 2025, 8:52 PM

#

kingfall >> and im being srs

ocean vortex Jun 14, 2025, 9:16 PM

#

misty vault kingfall vs o3 pro for coding

Neither. Dork 4.0 is best

placid charm Jun 14, 2025, 9:46 PM

#

echo aurora No need to apologize for the ping, and yes I can pass the feedback onto the team...

alright thank you, could you inform about the new max message limit per hour in announcements when it's done? and sorry for not using the correct channel

echo aurora Jun 14, 2025, 10:27 PM

#

placid charm alright thank you, could you inform about the new max message limit per hour in ...

if that's something we end up doing, yes we'll be sure to put out an announcement.

and sorry for not using the correct channel
no worries at all!

echo aurora Jun 14, 2025, 10:27 PM

#

keen fulcrum Hi, can you get in contact with ByteDance? I would love to see their models on a...

tell us about which models you'd like to see here #1372229840131985540

tepid stream Jun 14, 2025, 10:30 PM

#

Hey @echo aurora 👋
Any news on the AERIS submission? It’s been a few weeks now.
Just wondering: is this kind of delay normal, or is there usually a rough timeline for Arena approvals?
Let me know if anything’s missing! 😊

leaden palm Jun 14, 2025, 10:48 PM

#

tepid stream Hey @echo aurora 👋 Any news on the AERIS submission? It’s been a few w...

im not pineapple but i dont think its guaranteed that your models will be added, especially if adding them wouldn't achieve much

ocean vortex Jun 14, 2025, 11:04 PM

#

tepid stream Hey @echo aurora 👋 Any news on the AERIS submission? It’s been a few w...

lol I don't think they are adding this

flint skiff Jun 14, 2025, 11:07 PM

#

echo aurora tell us about which models you'd like to see here #1372229840131985540 ...

o3 pro

#

🤣

small haven Jun 14, 2025, 11:10 PM

#

*kingfall

#

i alrdy know the results

late path Jun 14, 2025, 11:14 PM

#

echo aurora tell us about which models you'd like to see here #1372229840131985540 ...

please ask google to add kingfall to the arena as well. give it a chance😭

small haven Jun 14, 2025, 11:15 PM

#

@deep adder upside down fireworks?

sacred quail Jun 14, 2025, 11:15 PM

#

Was kingfall really that good..? I cant understand the hype...

small haven Jun 14, 2025, 11:15 PM

#

sacred quail Was kingfall really that good..? I cant understand the hype...

yes it was

sacred quail Jun 14, 2025, 11:15 PM

#

Which way

small haven Jun 14, 2025, 11:15 PM

#

blacktooth performed nearly as well

sacred quail Jun 14, 2025, 11:15 PM

#

Good at code ?

small haven Jun 14, 2025, 11:15 PM

#

yes

#

i only tested code/svg's

iron cipher Jun 15, 2025, 12:11 AM

#

Can anyone add the newest models to Legacy LMArena

#

Not that hard

alpine coral Jun 15, 2025, 2:31 AM

#

small haven i only tested code/svg's

i didn't test any of that so perhaps that explains our different views.. fwiw on knowledge/riddles i found kingfall to be mid; blacktooth seems legit sota

leaden palm Jun 15, 2025, 2:34 AM

#

https://ktibow.github.io/lmb/anonymous has been updated

jade egret Jun 15, 2025, 2:37 AM

#

leaden palm https://ktibow.github.io/lmb/anonymous has been updated

what is that

leaden palm Jun 15, 2025, 2:38 AM

#

i think it's self evident

jade egret Jun 15, 2025, 2:42 AM

#

o

jade egret Jun 15, 2025, 3:22 AM

#

yo

#

@small haven

#

gemini 2.5 pro is better at coding in pygame than o4 right?

jade egret Jun 15, 2025, 3:40 AM

#

bruh..

wintry tinsel Jun 15, 2025, 3:45 AM

#

jade egret bruh..

Not available yet

#

China is a beast at AI video for some reason

jade egret Jun 15, 2025, 3:50 AM

#

hopefully google catch up soon

#

i think google gonna catch up soon

#

because it owns youtube

patent bane Jun 15, 2025, 3:56 AM

#

“According to the transitive (hypothetical syllogism) rule of implication, the proposition that has been omitted from the sentence ‘If we eat indiscriminately, we are likely to get sick because we often encounter harmful food’ is:”

Select one:

A. “If we eat indiscriminately, we are likely to encounter harmful food.”

B. “If we get sick, then we must have eaten harmful food.”

C. “If we eat harmful food, we are likely to get sick.”

D. “If we eat indiscriminately, we are likely to get sick."

this is the only question o3 answers correctly (inconsistently), and no other models get it right

#

Answer: A

other models' answers: C

#

2.5 pro can answer it correctly if I say that its previous answer is wrong

hardy pecan Jun 15, 2025, 4:03 AM

#

https://chatgpt.com/share/684e45eb-f118-8003-804e-3c9b562caab9 o3-pro gets it too

patent aspen Jun 15, 2025, 4:59 AM

#

Full explanation for the GCP outage:
https://status.cloud.google.com/incidents/ow5i3PPK96RduMcb1SsW

tl;dr The bad deployment occurred 3 weeks before the outage but wasn't being used until a new policy was rolled out. A fix was deployed within 40 minutes, but it took another 2-3 hours before all services were recovered.

tepid stream Jun 15, 2025, 5:16 AM

#

leaden palm im not pineapple but i dont think its guaranteed that your models will be added,...

Why not?
Isn’t LMSYS meant to be open, giving every model a fair shot? Skipping AERIS altogether feels like missing the point. @echo aurora isn’t the Arena still meant to be for all? 😊

patent aspen Jun 15, 2025, 5:48 AM

#

tepid stream Why not? Isn’t LMSYS meant to be open, giving every model a fair shot? Skipping...

No, if lmarena added all models, that would slow down voting results for other highly anticipated models

echo aurora Jun 15, 2025, 6:42 AM

#

tepid stream Why not? Isn’t LMSYS meant to be open, giving every model a fair shot? Skipping...

Hey sorry for the late response! I'll be sure to poke the team to remind them of our model requests. Like KT said it's not a guarantee that we list all models; however, you make a fair point about the arena being for all. We are looking into better tooling for model providers

small haven Jun 15, 2025, 6:44 AM

#

alpine coral i didn't test any of that so perhaps that explains our different views.. fwiw on...

yea it might be different for riddles/knowledge, maybe the reason why they released to the arena (to proc a higher elo) :/

late path Jun 15, 2025, 6:52 AM

#

lmarenamaxxing needs to be stopped😭

#

kingfall really is an overall better model

alpine coral Jun 15, 2025, 7:05 AM

#

late path lmarenamaxxing needs to be stopped😭

if that's lmarenamaxxing, then i'm fine with it lol

#

a smart model with notably solid spatial / emotional reasoning, which doesn't use emojis (unlike knightfall) or provide fluff in its responses

#

my kinda model

keen beacon Jun 15, 2025, 7:06 AM

#

svg capabilities of kingfall were much better tho

small haven Jun 15, 2025, 7:09 AM

#

we need a functional model

late path Jun 15, 2025, 7:17 AM

#

I've never seen kingfall use emojis in my usecases

civic flame Jun 15, 2025, 10:08 AM

#

great i can't wait for grok 3.5 to be absolute slop

tall summit Jun 15, 2025, 10:16 AM

#

other models can and have had more than 1 checkpoint

raven void Jun 15, 2025, 10:21 AM

#

how much money are they spending on training o5?

leaden sun Jun 15, 2025, 10:24 AM

#

this has been on my mind lately, after experimenting with arena for some time
didnt know about the checkpoint spam tho

unborn ocean Jun 15, 2025, 10:25 AM

#

it is not like lm arena is the only place you can get hf data 🤨

#

the "cheating" might be possible, but that could only explain a small margin

#

statistically speaking

#

unless they are using 1000 checkpoints (which they are clearly not)

#

the only thing these companies could be doing is getting a better understanding of the average lm arena user's preferences

#

(which is why i would like them (lm arena) to do more work on actually figuring out "who" that is for all the companies)
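
The "many checkpoints" argument can be made concrete with a small simulation, offered only as a sketch: every private variant has the same true skill, each gets a noisy measured score, and only the best-scoring one is published. The noise level of 10 rating points, the trial count, and the function name are made-up illustration values, not figures from the arena or from any paper.

```python
import random
import statistics


def best_of_n_lift(n_variants, noise_sd=10.0, trials=2000, seed=0):
    """Average score inflation from publishing only the best of
    `n_variants` equally skilled variants (true skill fixed at 0)."""
    rng = random.Random(seed)
    lifts = []
    for _ in range(trials):
        # Each variant's measured score is pure noise around the same skill.
        scores = [rng.gauss(0.0, noise_sd) for _ in range(n_variants)]
        lifts.append(max(scores))
    return statistics.mean(lifts)


if __name__ == "__main__":
    for n in (1, 3, 10, 30):
        print(n, round(best_of_n_lift(n), 1))
```

The lift grows with the number of variants tried, but slowly, so its size depends heavily on how noisy the scores are and how many variants are actually submitted.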

lilac nimbus Jun 15, 2025, 10:34 AM

#

New Gemini named 68zkqbz8vs

alpine coral Jun 15, 2025, 10:35 AM

#

lilac nimbus New Gemini named 68zkqbz8vs

where?

alpine coral Jun 15, 2025, 10:38 AM

#

patent bane ``` Answer: A ``` other models' answers: ``` C ```

fwiw both are arguably correct imo (it's oddly phrased for hypothetical syllogism)
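
For reference, hypothetical syllogism needs two conditionals to reach the third, which is why the quiz sentence can be read as omitting either one depending on how the "because" clause is parsed. A minimal Lean 4 sketch, where `P`, `Q`, and `R` are labels assumed here for "we eat indiscriminately", "we encounter harmful food", and "we get sick":

```lean
-- Hypothetical syllogism: from P → Q and Q → R, conclude P → R.
theorem hypothetical_syllogism (P Q R : Prop)
    (hpq : P → Q) (hqr : Q → R) : P → R :=
  fun hp => hqr (hpq hp)
```

Option A corresponds roughly to `hpq` and option C to `hqr`, while the sentence itself states the conclusion `P → R`.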

ornate agate Jun 15, 2025, 10:40 AM

#

unborn ocean the only thing these companies could be doing is getting a better understanding ...

Cohere published a paper about it https://arxiv.org/pdf/2504.20879 . its not a small effect

alpine coral Jun 15, 2025, 10:43 AM

#

i think the point about os models being at a disadvantage is somewhat fair; like they don't get incremental updates (notwithstanding new R1 ig), so the labs making them don't have a bunch of checkpoints to release anonymously

#

but there's nothing stopping labs from submitting anonymous models

#

chinese included

keen beacon Jun 15, 2025, 10:44 AM

#

most prominent oss models now are chinese. qwen and deepseek seem to opt not to submit anonymous models. other chinese companies do, e.g. stepfun, bytedance etc

#

so i guess it feels like it because qwen and deepseek opt not to

unborn ocean Jun 15, 2025, 10:44 AM

#

ornate agate Cohere published a paper about it https://arxiv.org/pdf/2504.20879 . its not a s...

ik, i read it

#

no big effect

#

at most something like 35 points (but only in the first second on the arena with like a +/- 20 confidence interval)

#

and all the top models have wayy more testing data (than what they had for their example)

#

furthermore this "cheating" should naturally converge to normality assuming that the model stays on the arena for a prolonged time after release

#

thought you were still at the "cheating" part, sorry
the part with the hf data i get

#

but as i said one can get it from everywhere else

#

though the paper they wrote was also a bit extreme, with them using like 70% arena data in the extreme cases

#

and btw this example does not hold as the confidence intervals on these two models are wayyyy lower, so the measurement can not really be cheated by just submitting more

#

imo the effect they talked about is just very overblown in the paper, unless lmarena only benchmarks for a very short time the confidence intervals will be low enough
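
The confidence-interval point can be sketched numerically: with a fixed observed win rate, the implied rating-gap interval narrows as votes accumulate. This uses a plain normal approximation and the standard Elo logistic mapping; the 55% win rate, the vote counts, and the function names are assumptions for illustration and not the arena's actual method.

```python
import math


def elo_gap(win_rate):
    """Rating difference implied by a head-to-head win rate."""
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))


def elo_gap_interval(wins, games, z=1.96):
    """Approximate 95% interval for the rating gap (normal approximation)."""
    p = wins / games
    se = math.sqrt(p * (1.0 - p) / games)
    # Clamp so the logistic mapping stays defined for tiny samples.
    lo = max(p - z * se, 1e-9)
    hi = min(p + z * se, 1.0 - 1e-9)
    return elo_gap(lo), elo_gap(hi)


if __name__ == "__main__":
    for games in (500, 5_000, 50_000):
        lo, hi = elo_gap_interval(int(0.55 * games), games)
        print(games, round(lo, 1), round(hi, 1))
```

Because the interval width shrinks roughly with the square root of the vote count, a gap that looks meaningful after a few hundred votes can sit well inside the noise, while the same gap after tens of thousands of votes would not.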

ornate agate Jun 15, 2025, 10:52 AM

#

btw you can also see the effect I am talking about with Anthropic models, which don't release a new checkpoint every few days. Opus currently ranks 5th and sonnet currently ranks 9th. That just flat out doesn't match many people's opinion that opus/sonnet are still frontier models.

unborn ocean Jun 15, 2025, 10:53 AM

#

ornate agate btw you can also see the effect I am talking about with Anthropic models, which ...

that is not the reason why they are low 🤣

#

they are willingly not optimizing from hf

#

they pioneered rlaif and have always had a weird stance on optimizing for human preference

mossy drum Jun 15, 2025, 10:53 AM

#

ornate agate btw you can also see the effect I am talking about with Anthropic models, which ...

they are censored, their answers are short and dry

unborn ocean Jun 15, 2025, 10:54 AM

#

how is rlhf cheating

#

it is like saying a model that received rl training for SOME math problems is cheating on ALL math problems

#

furthermore the whole reason why we have these chat bots is rlhf

#

well you are just assuming that fulfilling human preferences is not genuine performance

#

performance is something that is relative to your benchmarking / reward metric

ornate agate Jun 15, 2025, 10:59 AM

#

bro it was so bad recently that OAI had to revoke a checkpoint. it was in the media the whole sycophancy thing. nobody thinks that is good.

unborn ocean Jun 15, 2025, 10:59 AM

#

btw that is not really what reward hacking is

#

and even if you were to argue that these models have somehow not provided genuine performance (which you can kind of decide, because everyone has unique preferences about what they regard as good), there is still no reason to believe that it is in any way correlated with the amount of training data collected from lm arena specifically

#

getting a model to exhibit that behaviour is not really something you need lm arena for

#

reward hacking is per definition something unintended

#

which does not fully apply here

#

which is why "reward hacking" and "cheating" are really harsh labels for what is happening here

#

i agree with that, ideally we would have the following:

report on who the actual users of lmarena are (like how they differ, from the average chatgpt user)
separate sycophancy, structure and more
get more people on the arena -> more robust scores
better tools for companies, so that more will integrate arena into the development process

#

well that was one thing you were talking about, i think it was pretty obvious that i was not talking about just the specific tendencies of one model to behave a bit differently, but more about the practice of rlhf in general

#

and the extent might be unintended, the fundamental tendency for models seems to be very desirable though
(and all the models exhibit it to some extent)

#

well the arena does not benchmark such a thing and i also agree with you that claude is known to outperform almost all other models when it comes to agentic and long-time (coding) tasks (even if it is ranked quite low in the arena)

#

imo they should really work on just expanding user size and the models served very heavily so i fully agree with the point

#

though the adjustment mechanism for this checkpoint count can really be at bottom of their to-do list (as far as i am concerned)

#

paper overblew the effect and essentially the confidence intervals already give the users plenty of information
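
#

rough sketch of what a confidence interval on a head-to-head win rate tells you, assuming a plain percentile bootstrap over made-up votes. this is not necessarily how the arena computes its intervals; the function name and the vote lists are hypothetical and only there for illustration

```python
import random


def bootstrap_win_rate_ci(votes, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for a win rate.

    votes: list of 1 (model A won) or 0 (model A lost). Hypothetical data.
    Returns (point_estimate, lower_bound, upper_bound).
    """
    rng = random.Random(seed)
    n = len(votes)
    rates = []
    for _ in range(n_resamples):
        # Resample the votes with replacement and record the win rate.
        sample = [votes[rng.randrange(n)] for _ in range(n)]
        rates.append(sum(sample) / n)
    rates.sort()
    lower = rates[int((alpha / 2) * n_resamples)]
    upper = rates[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(votes) / n, lower, upper


# Same 60% win rate, but 50 votes versus 1000 votes (made-up numbers).
votes_small = [1] * 30 + [0] * 20
votes_large = votes_small * 20

point_s, lo_s, hi_s = bootstrap_win_rate_ci(votes_small)
point_l, lo_l, hi_l = bootstrap_win_rate_ci(votes_large)
print(f"50 votes:   {point_s:.2f} [{lo_s:.2f}, {hi_s:.2f}]")
print(f"1000 votes: {point_l:.2f} [{lo_l:.2f}, {hi_l:.2f}]")
```

the interval shrinks as votes pile up, so a wide interval is already a signal that a score should not be over-read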

frosty lark Jun 15, 2025, 11:29 AM

#

Claude sucks because somehow anthropic nerfs it with the system prompt. Claude chatbot is very pleasing, in the arena Claude answers and subtly says "frick off!".

While other vendors push on being pleasing, to get points, Claude does the contrary. It could also be a strategy (so people don't take lmarena seriously)

Claude answers so dry, that it is easy to spot and one could upvote/downvote that to heaven/hell

leaden sun Jun 15, 2025, 11:32 AM

#

frosty lark Claude sucks because somehow anthropic nerfs it with the system prompt. Claude c...

Claude says....."frick off"???

How do I trigger that 😆

sacred plaza Jun 15, 2025, 11:35 AM

#

Claude's limited context window makes it ass!!!!

leaden sun Jun 15, 2025, 11:37 AM

#

Service Industry:"customer is always right"
sycophancy might be intended to retain "users"
it's all business, after all?

alpine coral Jun 15, 2025, 11:53 AM

#

frosty lark Claude sucks because somehow anthropic nerfs it with the system prompt. Claude c...

they seem almost literally indistinguishable to me..?

#

i find it doubtful that they're deliberately nerfing the version served to the arena anyway.. like perhaps they don't give af about it, but why they'd go out of their way to do poorly on it makes very little sense to me

#

on an unrelated and fairly minor point (which has prob already been pointed out), i noticed earlier that you can kinda unmask whether a model is a 'thinking' model before voting through the re-run button - there is no artificial lag to equalise the two.. so in the case here, it's clear the model on the right is a thinking model

#

(blacktooth being the thinking model, as it turns out)

frosty lark Jun 15, 2025, 12:20 PM

#

alpine coral they seem almost literally indistinguishable to me..?

In my test (one shots, without saying "hi" or anything like that) the claude ai replies with "what a nice question" and other cringe stuff, on lmarena it just replies as if it cannot be bothered.

I mean the battle mode though. Not the direct chat.

calm sequoia Jun 15, 2025, 12:21 PM

#

Asked o3 why image artefacts appeared. It thought for 10 minutes. I checked what is inside thought process and was mind blown 🤯 It literally simulated various image artifact theories in python. With his own images as references and provided by me. And it's not even pro version. Can any other model do this?

eager mica Jun 15, 2025, 12:28 PM

#

frosty lark In my test (one shots, without saying "hi" or anything like that) the claude ai ...

I complained months ago that Claude models on LMArena have an annoying paternalistic tone; I'm glad that other people are noticing this as well.

keen fulcrum Jun 15, 2025, 12:42 PM

#

https://fixupx.com/elonmusk/status/1934107366380949977

Elon Musk (@elonmusk)

Am increasingly confident that Grok 3.5 will be the smartest AI by a significant margin


#

Elon claiming he will beat everyone

#

Would xAI be able to?

woeful viper Jun 15, 2025, 12:44 PM

#

Hello guys, I'm new to using LMArena, are the models there the same as the ones you pay for, for example when subscribing to chatgpt and using o3 there ? If yes, wouldn't people just not pay for a chatgpt subscription and just use lmarena ?

keen fulcrum Jun 15, 2025, 12:44 PM

#

woeful viper Hello guys, I'm new to using LMArena, are the models there the same as the ones ...

You have no privacy while using lmarena

#

You are limited in text and images as well

unborn ocean Jun 15, 2025, 1:01 PM

#

keen fulcrum Would xAI be able to?

if i had to name one person on this earth that i trust least to deliver on bold technological promises it is quite clearly elon

wintry tinsel Jun 15, 2025, 1:05 PM

#

keen fulcrum Would xAI be able to?

Compute power doesn’t mean much now, he might be able to if there is a significant underlying architecture improvement but Elon is full of hot air believe it when you see it!

keen fulcrum Jun 15, 2025, 1:16 PM

#

well even if he does its only temporary with the pace

#

happy to give grok 3.5 a try either way

woeful viper Jun 15, 2025, 1:27 PM

#

keen fulcrum You have no privacy while using lmarena

Considering you have no privacy on any other platforms anyway, even if you pay for a subscription, then there's no reason to pay for a subscription to any LLM service then, right ?

#

Just use LMArena for free access to any models ? Of course contributing to which one is the best

keen fulcrum Jun 15, 2025, 1:27 PM

#

woeful viper Considering you have no privacy on any other platforms anyway, even if you pay f...

Don't you have ethics?

#

currently the data they gain outweighs the cost of abuse on the platform

woeful viper Jun 15, 2025, 1:28 PM

#

Perfect then it's win-win for everyone

unborn ocean Jun 15, 2025, 1:33 PM

#

woeful viper Considering you have no privacy on any other platforms anyway, even if you pay f...

you quite clearly have privacy on the other platforms

#

you can literally turn off training on your data in almost all of them

#

and they often have commitments to delete your data in temporary chats

woeful viper Jun 15, 2025, 1:34 PM

#

unborn ocean you quite clearly have privacy on the other platforms

You do not. For example OpenAI is currently being sued and is bound to log prompts and answers.

unborn ocean Jun 15, 2025, 1:34 PM

#

yes, however they can not use it for training

woeful viper Jun 15, 2025, 1:35 PM

#

unborn ocean yes, however they can not use it for training

False, I am a cybersecurity engineer and this is why we developed and implement self hosted LLM models in our clients infrastructure

unborn ocean Jun 15, 2025, 1:35 PM

#

woeful viper False, I am a cybersecurity engineer and this is why we developed and implement ...

false? 🤨

woeful viper Jun 15, 2025, 1:35 PM

#

I use public models because they're more powerful and do not use any sensitive information

unborn ocean Jun 15, 2025, 1:35 PM

#

they have legal commitments

#

the only reason companies use them is because of that

woeful viper Jun 15, 2025, 1:36 PM

#

unborn ocean the only reason companies use them is because of that

Compliant companies do NOT use public solutions lol

#

At least not in Europe

#

What we do is self host the models in an Azure infrastructure (or AWS / GCP)

#

All LLM websites are blocked by proxies in big companies xD

#

It's called Shadow IT

unborn ocean Jun 15, 2025, 1:38 PM

#

ok, then i do not get your point at all honestly, why would you have no privacy on all the chat apps, if they have legal commitments not to use the data

#

this has nothing to do with what you do at work

#

or anything else

woeful viper Jun 15, 2025, 1:39 PM

#

I mean, when you prompt the models on the public websites (i.e. not self hosted) it has to process your query and use the data you've input

#

So you've just sent sensitive information to foreign countries, the worst being China and the USA

#

That's why all LLM websites are banned in big companies

#

Have you heard of the Cloud Act ?

unborn ocean Jun 15, 2025, 1:40 PM

#

well that has nothing to do with my argument, and btw many companies also host their stuff in the eu

unborn ocean Jun 15, 2025, 1:40 PM

#

woeful viper Have you heard of the Cloud Act ?

yes, i live in the eu

hollow ocean Jun 15, 2025, 1:40 PM

#

#

woeful viper Jun 15, 2025, 1:41 PM

#

unborn ocean yes, i live in the eu

Yeah so for my use case, paying for a subscription would be stupid since I can get all the queries I want for free on LMArena, that was my question initially

unborn ocean Jun 15, 2025, 1:42 PM

#

yes, my problem lies in the fact that you just label all the other options as identical to lm arena privacy wise

#

which is just not true

woeful viper Jun 15, 2025, 1:43 PM

#

Technically it is, no matter legal agreements, that's why we ban these websites

keen beacon Jun 15, 2025, 1:44 PM

#

those screenshots are probably fake

unborn ocean Jun 15, 2025, 1:45 PM

#

woeful viper Technically it is, no matter legal agreements, that's why we ban these websites

well, in a company policy that is different, because these companies can quite clearly not risk the data going anywhere else, however to assume that all is equal beyond self-hosting is just a plain oversimplification

civic flame Jun 15, 2025, 1:45 PM

#

hollow ocean

these are fake

unborn ocean Jun 15, 2025, 1:46 PM

#

woeful viper Technically it is, no matter legal agreements, that's why we ban these websites

btw, idk if you are aware but companies also just enter legal agreements with ai companies (and verify that the data stays in the eu) and that is about as much as they do right now

#

nobody self hosts statistically speaking unless they have big potential for finetuning or are really really privacy concerned

woeful viper Jun 15, 2025, 1:47 PM

#

unborn ocean btw, idk if you are aware but companies also just enter legal agreements with ai...

I know, I have been discussing this with major companies and their sales department. Even they tell you that if you have any sensitive info. to use in prompts then you should self host

#

My company even sells a hardened model to implement lol

unborn ocean Jun 15, 2025, 1:48 PM

#

well, nice but i am actually aware of countless big eu companies that do not self host, but just enter agreements

#

(obv they don't let the employees share everything)

#

but just give some basic context to the models

woeful viper Jun 15, 2025, 1:49 PM

#

unborn ocean (obv they don't let the employees share everything)

That's the point yes, it's hard to control though since you can't really control prompts finely

#

In any case if someone uploads a confidential document or info in the model, it goes way outside of any legal agreements and it's the clients fault so... lol

#

That's why if you need LLM power, you usually self host

unborn ocean Jun 15, 2025, 1:50 PM

#

well but stating that such a situation would be equal to you sharing ALL your personal information with a: the ai companies, b: lm arena, c: potentially the public is just weird

#

that is my only point

keen fulcrum Jun 15, 2025, 2:01 PM

#

woeful viper Compliant companies do NOT use public solutions lol

European Enterprises aren't deploying on american servers due to privacy violations

#

A lot of small companies do intentionally as there is no alternative for europe.

#

European enterprises are deploying on azure cloud openai models

#

The most privacy-friendly way you can use LLM models is choosing a European inference engine

#

Often times they only offer open source models.

#

VertexAI hosts Claude and Gemini models for enterprises in europe

wintry tinsel Jun 15, 2025, 2:27 PM

#

I asked Claude Opus to write my Father’s Day card and it ended like this Happy Father's Day, Dad. Your impact echoes through generations yet unborn.

#

What the hell is this

#

Nobody says that

#

It’s so bad I have to write it myself 😢

ocean vortex Jun 15, 2025, 2:48 PM

#

Dork 4.0 ftw 🫡

#

3.5 is just intermediate step

leaden sun Jun 15, 2025, 2:50 PM

#

wintry tinsel I asked Claude Opus to write my Father’s Day card and it ended like this Happy F...

why is this bad? I think it's great
For some strange reasons, I see LLMs using the word "echo" a lot lately....

jade egret Jun 15, 2025, 2:50 PM

#

when is grok 3.5 even gonna come out? 😭

leaden sun Jun 15, 2025, 2:53 PM

#

I think it's a more cultural thing, different culture congratulates differently
this sounds more like it could work in high context culture

civic flame Jun 15, 2025, 3:18 PM

#

gigaglazing 😭😭😭😭

#

real

keen beacon Jun 15, 2025, 3:20 PM

#

Hmm I think it'll be close. Kingfall aka 2.5 flash lite is too good

civic flame Jun 15, 2025, 3:23 PM

#

keen beacon Hmm I think it'll be close. Kingfall aka 2.5 flash lite is too good

kingfall is gemini nano idiot

wintry tinsel Jun 15, 2025, 3:41 PM

#

leaden sun why is this bad? I think it's great For some strange reasons, I see LLMs using t...

It’s the yet unborn fact like it’s addressing some Neolithic culture

jade egret Jun 15, 2025, 3:45 PM

#

civic flame kingfall is gemini nano idiot

how do yall know??

jade egret Jun 15, 2025, 3:46 PM

#

keen beacon Hmm I think it'll be close. Kingfall aka 2.5 flash lite is too good

how do you know kingfall is 2.5 flash lite?

#

i thought it was a big model

jade egret Jun 15, 2025, 4:08 PM

#

no, it's actually GPT 75.857 Super Ultra Pro Plus High Golden Mega Edition

misty vault Jun 15, 2025, 4:10 PM

#

jade egret how do you k now kingfall is 2.5 flash lite?

kingfall is actually dork 5

civic flame Jun 15, 2025, 4:13 PM

#

you're all dumb, kingfall is just llama 4 reasoning

unborn ocean Jun 15, 2025, 4:15 PM

#

civic flame you're all dumb, kingfall is just llama 4 reasoning

you got the name only half right, its actually gpt-4-0314 thinking pro cons@1024

leaden palm Jun 15, 2025, 4:15 PM

#

imo it makes up for it in freedoms WRT software and hardware

#

ok but theoretically i could run android on a supercomputer

#

also theoretically i could boot up termux and drive a gpu over usb (does ios have a termux equivalent?)

keen beacon Jun 15, 2025, 4:19 PM

#

You're right I'm buying an iphone right now thanks to this

unborn ocean Jun 15, 2025, 4:22 PM

#

and btw for ai that is not really the case https://ai-benchmark.com/ranking.html, android ain't that bad in this area

#

obv. not perfect benchmark

#

ik the website looks sh*t

#

well there are a lot of other more reasonable phone in between

#

and the bench is mostly about older image stuff (like 4 yo)

#

but it is still unfair to just pretend like apple is king

keen beacon Jun 15, 2025, 4:25 PM

#

No apple and grok is king because Craig says so

unborn ocean Jun 15, 2025, 4:26 PM

#

tru you convinced me

#

i will now mindlessly delete all my comments, like you usually do 😳

keen beacon Jun 15, 2025, 4:29 PM

#

We're having a funeral for kingfall in July unfortunately

#

Blacktooth is the next revision

#

Some say it's better I don't like it though

#

It sucks at SVG compared to kingfall, clearly the most important capabilities test

civic flame Jun 15, 2025, 4:31 PM

#

https://tenor.com/view/morgon-freeman-true-bugburger2-true-fact-checked-by-patriots-wholesome-reddit-gif-13662132864673041294

Tenor

jade egret Jun 15, 2025, 4:47 PM

#

but actually

#

what is kingfall

leaden palm Jun 15, 2025, 4:51 PM

#

jade egret what is kingfall

Gemini

jade egret Jun 15, 2025, 5:35 PM

#

yall

#

how close do u think google is to agi?

civic flame Jun 15, 2025, 5:35 PM

#

https://youtu.be/elfCDnMx3Ug?si=oJW-PaZIY65pszGe

YouTube

ColdFusion

Apple’s AI Disaster - A Rare Failure

Go to https://ground.news/coldfusion to compare news coverage, spot media bias, and avoid algorithms. Try Ground News today and get 40% off your subscription.

Apple usually doesn't miss. But when it comes to AI they've dropped the ball in a very public way. In this episode we see the messy events behind the scenes from a lack of leadership to i...


jade egret Jun 15, 2025, 5:35 PM

#

civic flame https://youtu.be/elfCDnMx3Ug?si=oJW-PaZIY65pszGe

?

civic flame Jun 15, 2025, 5:36 PM

#

not answering your question

jade egret Jun 15, 2025, 5:36 PM

#

i think i watched that

civic flame Jun 15, 2025, 5:36 PM

#

just posting this here because it's been a subject of debate before whether apple fumbled

jade egret Jun 15, 2025, 5:36 PM

#

o

jade egret Jun 15, 2025, 5:36 PM

#

civic flame just posting this here because it's been a subject of debate before whether appl...

i think apple fumbled

verbal nimbus Jun 15, 2025, 6:00 PM

#

Wow o3 is pretty bad at web dev (dumb layout)

#

Even GPT 4.1 did better

jade egret Jun 15, 2025, 6:03 PM

#

verbal nimbus Wow o3 is pretty bad at web dev (dumb layout)

pro?

verbal nimbus Jun 15, 2025, 6:03 PM

#

jade egret pro?

Don't think so, it was on Web Arena

#

FWIW, o3 Pro (High) actually scores lower than o4-mini (High) and o3 (High) on ARC AGI 2. Claude Opus 4 is leading, despite Anthropic focusing more on agentic coding tasks.

jade egret Jun 15, 2025, 6:17 PM

#

oh

lime coral Jun 15, 2025, 6:24 PM

#

https://x.com/sainemani1/status/1934257492646617295?s=46

Sai Nemani (@SaiNemani1)

Future of Gemini!!

#

They are cooking with the big models. Maybe ultra is indeed coming https://x.com/sainemani1/status/1934268293806014864?s=46

Sai Nemani (@SaiNemani1)

@ratio_decedendi https://t.co/A4xOMcYnbg

small haven Jun 15, 2025, 6:55 PM

#

verbal nimbus FWIW, o3 Pro (High) actually scores lower than o4-mini (High) and o3 (High) on A...

and ppl think i was trolling

#

kingfall > o3 pro

misty vault Jun 15, 2025, 6:55 PM

#

wtf i was just using gemini 2.5 pro preview

small haven Jun 15, 2025, 6:56 PM

#

trey whats the next model id

calm sequoia Jun 15, 2025, 7:00 PM

#

verbal nimbus FWIW, o3 Pro (High) actually scores lower than o4-mini (High) and o3 (High) on A...

I don't think it's worth paying attention when we are speaking of sub 10% performance. It's just noise. The march version gemini had so low it wasn't even published. And yet it was such a great model.

torn mantle Jun 15, 2025, 7:01 PM

#

misty vault wtf i was just using gemini 2.5 pro preview

lies

small haven Jun 15, 2025, 7:10 PM

#

wait so is blacktooth also off of lmarena

keen fulcrum Jun 15, 2025, 7:28 PM

#

small haven kingfall > o3 pro

grok 3.5 > kingfall

#

according to elon

torn mantle Jun 15, 2025, 7:36 PM

#

elon is delusional

keen fulcrum Jun 15, 2025, 7:40 PM

#

elon is right often enough

misty vault Jun 15, 2025, 7:44 PM

#

torn mantle elon is delusional

like you are

haughty tangle Jun 15, 2025, 7:48 PM

#

lime coral https://x.com/sainemani1/status/1934257492646617295?s=46

They’re probably also experimenting with other architectures

#

They already made a “Titan” architecture that’s better than Transformers memory wise

#

But soon there’s going to be an architecture 3x better than transformers at everything

small haven Jun 15, 2025, 7:55 PM

#

keen fulcrum grok 3.5 > kingfall

i dont think elon tried kingfall

#

i remember last year in june, said grok 3 is a significant order above the sota

#

its not going to be good, and if it is, just name it grok 4

zinc ore Jun 15, 2025, 7:58 PM

#

Elon will always say that

small haven Jun 15, 2025, 8:04 PM

#

sota is currently o3 pro, and next week its deepthink, so...

elder rapids Jun 15, 2025, 9:12 PM

#

yo wait

#

I just had a revelation

#

what if blacktooth is just the 2m context variation of 2.5 pro

#

theyre inevitably going to have a "different" model at GA release than the current 0605

#

just removing a 1m cap doesn't cut it

#

I mean tbh, none of this matters if they're simply deciding to change up the labels given different kinds of capabilities, rather than model size explicitly. It's a fact it will be goldmane but that doesn't really exclude anything I said

#

ye but I am still wondering, how they're going to check off a lot of those things they apparently have "planned" or from a consumer standpoint, observing how they're even going to move forward in LLM innovation

#

multi modality was definitely a major thing then

#

native multimodality has been accomplished

patent aspen Jun 15, 2025, 9:23 PM

#

Are you talking about those slides from the world fair?

elder rapids Jun 15, 2025, 9:23 PM

#

ion know what that is

#

so probably not

#

I'm not talking about anything explicit

patent aspen Jun 15, 2025, 9:23 PM

#

lime coral https://x.com/sainemani1/status/1934257492646617295?s=46

this thing

elder rapids Jun 15, 2025, 9:23 PM

#

oh cool

#

never seen that before

#

is it real?

patent aspen Jun 15, 2025, 9:23 PM

#

Yeah

#

It's a presentation from Logan

elder rapids Jun 15, 2025, 9:24 PM

#

crazy that reinforces my point wtf

#

😭

#

but yeah nice, that ig

elder rapids Jun 15, 2025, 9:24 PM

#

elder rapids native multimodality has been accomplished

so there's probably going to be a point where theyre intending to really add some twists, probably

#

and this could Include having a bigger model, but the bigger will no longer be bigger just based off traditional model size

patent aspen Jun 15, 2025, 9:25 PM

#

Technically native video generation hasn't happened, and there are far more modalities than the 5 human senses

elder rapids Jun 15, 2025, 9:25 PM

#

some way out shi

elder rapids Jun 15, 2025, 9:26 PM

#

patent aspen Technically native video generation hasn't happened, and there are far more moda...

true, technically demis said this

#

he's working on it

patent aspen Jun 15, 2025, 9:26 PM

#

e.g. robotics, 3D models, etc

elder rapids Jun 15, 2025, 9:26 PM

#

but not just that

#

since he's planning on integrating spatial capabilities

#

I think that alludes to a more direct kind of language thing too

#

ion know tbh

#

how will DeepMind move forward

#

ye

#

but I do think they're really trying to work on some unique stuff

#

diffusion was somewhat unordinary but kind of expected

#

yep this really requires some way out stuff

#

it can't be the way it is now

#

we can't do anything but try to shrink other stuff into that context window instead of brute forcing the holistic expansion

#

it'll never be true infinite context

#

I agree

misty vault Jun 15, 2025, 10:10 PM

#

Large Language Model

torn mantle Jun 15, 2025, 10:11 PM

#

what about LLC

#

WIBL

#

WIBL = We90 is a big liar

misty vault Jun 15, 2025, 10:14 PM

#

Large Language Model

sacred quail Jun 15, 2025, 10:17 PM

#

Is there a release date for Grok 3.5 ?

leaden palm Jun 15, 2025, 10:23 PM

#

sacred quail Is there a release date for Grok 3.5 ?

eternally soon

#

"guys it's going to come out any moment now"

#

"it's going to be b4 june trust"

torn mantle Jun 15, 2025, 10:31 PM

#

sacred quail Is there a release date for Grok 3.5 ?

next year

#

or the year after

sacred quail Jun 15, 2025, 10:40 PM

#

Got it

soft kernel Jun 15, 2025, 10:52 PM

#

It's really a disappointment

leaden palm Jun 15, 2025, 11:06 PM

#

idk im with the chair on this one

jade egret Jun 15, 2025, 11:06 PM

#

oh hell nah

#

maybe for a while

#

yea

patent aspen Jun 15, 2025, 11:17 PM

#

https://polymarket.com/event/when-will-gpt-5-be-released

Polymarket

GPT-5 released by…?

Polymarket | This market will resolve to "Yes" if OpenAI's GPT-5 model is made available to the general public by June 30, 2025, 11:59 PM ET. Otherwise, this...

#

Maybe for a couple months

jade egret Jun 15, 2025, 11:20 PM

#

yea

#

only for a couple of months tho

#

no, only for a few months or even weeks

patent aspen Jun 15, 2025, 11:35 PM

#

tbh I don't think most people will talk about GPT-5 either

#

Right so the people who are still hyping models are going to be the type of people who would be watching all of the big models

#

I tend to agree, although I think the number of people who currently pay for AI is a small fraction of the number of people who will pay for AI in 2-3 years, and I think model capability will be a big part of that discussion even if it's not very deep

#

The first mover advantage and mindshare of ChatGPT is absolutely real

#

Although I think the positioning of Google is a bit stronger and that will matter

jade egret Jun 15, 2025, 11:46 PM

#

yea

patent aspen Jun 15, 2025, 11:52 PM

#

Chat bots don't have the same level of lock in as a well developed ecosystem like a mobile OS or mature enterprise software

#

Subscriptions will grow a lot. I'd bet thousands of dollars on that

#

First of all, the market is nowhere close to saturated yet. Second of all, the free tier is a massive funnel for the paid tier

#

And capability and reliability are increasing over time

#

Of course subs are going to grow a lot

#

The marginal buyer won't mainly be regular normie consumer at first. It will be high propensity buyers somewhere between normie and techie, and that margin will gradually shift towards normie over time

#

Keep in mind that in the United States, over 100M people subscribe to Amazon Prime

#

tbc Asian developed markets are even more high propensity for AI adoption

#

but yeah

#

The thing is we're looking at chat bots that are relatively inconsistent and unreliable. If it was far more consistent and reliable, it would be impossible to live without

#

Basically god in your pocket

jade egret Jun 16, 2025, 12:08 AM

#

you can just copy paste it into gemini

#

yea

#

and you can use gemini on gmail, chrome, android products, maybe even in youtube someday

#

takes kinda long

#

sometime

patent aspen Jun 16, 2025, 12:15 AM

#

It would be hard, for example, for OpenAI to convince people to move to an email service hosted by OpenAI

#

Or to replicate something like workspace

jade egret Jun 16, 2025, 12:16 AM

#

lol

patent aspen Jun 16, 2025, 12:17 AM

#

Right and that's the kind of thing that pushes normies to buy a subscription. In the meantime, the marginal buyer will be between the die hard techies and normies, and it will keep shifting with increasing reliability, capability, integration, etc

#

That's generally how tech adoption works

#

It doesn't have to reach AGI to massively grow subs though

#

The market itself is still growing, while reliability is increasing

#

The top of the funnel is getting bigger

#

More free users

#

civitai has been struggling with that

#

Payment processors are notoriously anti-NSFW

#

tbh though the most competent people usually don't join that industry because of the taboo

jade egret Jun 16, 2025, 12:22 AM

#

who yall think reaching agi first?

patent aspen Jun 16, 2025, 12:22 AM

#

It's the opposite of prestige

jade egret Jun 16, 2025, 12:23 AM

#

company

patent aspen Jun 16, 2025, 12:23 AM

#

Why China?

jade egret Jun 16, 2025, 12:23 AM

#

yea why china

#

us gonna do the same soon

patent aspen Jun 16, 2025, 12:32 AM

#

I think they're behind on R&D too

#

Plus big corporate governance risks

#

DeepSeek was super impressive though

#

No Google did

olive mesa Jun 16, 2025, 12:33 AM

#

Google's going to make their quantum chips good enough so that they can train their models 10^30 times faster

patent aspen Jun 16, 2025, 12:33 AM

#

olive mesa Google's going to make their quantum chips good enough so that they can train th...

Extremely unlikely

#

I'll agree they were the first to do test-time scaling, and that is a big deal

olive mesa Jun 16, 2025, 12:35 AM

#

patent aspen Extremely unlikely

Nah, they're tapping into "alternate universes" lol, and their Willow chip has lower error when scaled instead of more

patent aspen Jun 16, 2025, 12:35 AM

#

olive mesa Nah, they're tapping into "alternate universes" lol, and their Willow chip has l...

That's not the reason why

#

Quantum computers radically speed up a tiny fraction of all computer science problems and do nothing for the other 99.9% of problems. With that said, a few problems in that .1% are important. If a critical AI problem happens to show up in that .1%, then we get the scenario you're talking about

#

Google is leading in quantum. The issue is that quantum algorithms are only applicable to a tiny percentage of all problems. The optimistic scenario would be that they lead to some scientific discovery that indirectly results in a big improvement in AI (e.g. material science, simulations, etc)

maiden fulcrum Jun 16, 2025, 1:01 AM

#

hello everyone

#

I am using Gemini 2.5 Pro 06-05 on AI Studio, and would like to know if t=0.7 is the best value so that it is realistic
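
#

for context on what the temperature knob does, here is a minimal sketch using the usual textbook definition (divide the logits by the temperature before the softmax). the logits are made up, and how AI Studio applies the setting internally is an assumption here, so this does not say whether 0.7 is the best value

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into sampling probabilities at a given temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate tokens.
logits = [2.0, 1.0, 0.1]

for t in (0.2, 0.7, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

lower values concentrate probability on the top token (more repeatable output), higher values flatten the distribution (more varied output)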

elder rapids Jun 16, 2025, 1:39 AM

#

patent aspen I'll agree they were the first to do test-time scaling, and that is a big deal

wym?

#

I thought Google brain was the first to do it

#

back in like 2021

patent aspen Jun 16, 2025, 1:40 AM

#

Oh you might be right. I haven't kept up with all the papers

elder rapids Jun 16, 2025, 1:40 AM

#

and iirc

#

Google had a math specialized 1.5 pro

#

early 2024

#

that was explicit too

patent aspen Jun 16, 2025, 1:52 AM

#

Looked it up:

Adaptive Computation Time for Recurrent Neural Networks was published by DeepMind in 2016 and introduced the idea of scaling inference time to improve performance in deep learning

Universal Transformers was published by Google Brain in 2018 and applied the idea of scaling test-time compute to transformers but didn't call it "test-time scaling"

#

OpenAI was the first to do it in an LLM product though

elder rapids Jun 16, 2025, 1:54 AM

#

?

#

but the dates I mentioned were ONLY for LLMs

#

I know for a fact there were test time implementations prior

patent aspen Jun 16, 2025, 1:55 AM

#

Can you point to an example? I'm not 100% sure on this

#

1.5 Pro didn't have test-time scaling

leaden palm Jun 16, 2025, 2:03 AM

#

matt shumer:

#

sorry i cant help it

patent aspen Jun 16, 2025, 2:04 AM

#

Sundar has joked, "Imagine if you could time travel to 5 years in the past and told people that your big innovation was that you can get increased performance if you let the model think for longer."

#

I think "reasoning model" is a branding exercise, and the actual innovation was applying test-time scaling to LLMs

elder rapids Jun 16, 2025, 2:13 AM

#

patent aspen 1.5 Pro didn't have test-time scaling

talking about the math specialized variant, but Ig that could be simply an explicit reasoner, scaling via sampling and verification, but even after that, before o1, there were other papers like https://arxiv.org/html/2408.03314v1 that verbatim employ it that way
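
#

a toy sketch of "scaling via sampling and verification" (best-of-N): draw several candidates and keep one the verifier accepts. everything here is a stand-in and an assumption for illustration: the "model" is a random guesser on a + b, the verifier is exact, and none of it is taken from the linked paper

```python
import random


def propose_answer(rng, true_answer, p_correct):
    """Stand-in for a model: right with probability p_correct, else slightly off."""
    if rng.random() < p_correct:
        return true_answer
    return true_answer + rng.choice([-3, -2, -1, 1, 2, 3])


def verify(candidate, a, b):
    """Stand-in verifier: an exact check, only possible because the toy task is a + b."""
    return candidate == a + b


def best_of_n(a, b, n, p_correct, rng):
    """Sample n candidates and return the first one the verifier accepts."""
    candidates = [propose_answer(rng, a + b, p_correct) for _ in range(n)]
    for candidate in candidates:
        if verify(candidate, a, b):
            return candidate
    return candidates[0]  # nothing verified: fall back to the first sample


def accuracy(n, trials=2000, p_correct=0.3, seed=0):
    """Fraction of toy problems solved when spending n samples per problem."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a, b = rng.randrange(100), rng.randrange(100)
        hits += best_of_n(a, b, n, p_correct, rng) == a + b
    return hits / trials


acc_1 = accuracy(1)
acc_4 = accuracy(4)
acc_16 = accuracy(16)
print(acc_1, acc_4, acc_16)
```

with a perfect verifier the expected accuracy is 1 - (1 - p)^n, so more samples at test time buy accuracy without touching the weights. real verifiers are imperfect, which is where most of the difficulty sits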

patent aspen Jun 16, 2025, 2:14 AM

#

I see. That is earlier than o1 (preview). I wouldn't say test-time scaling is exactly the same thing as reasoning. Google invented CoT for example

elder rapids Jun 16, 2025, 2:14 AM

#

damn kingfall doesn't matter @small haven you seeing this

#

shame ngl Craig

small haven Jun 16, 2025, 2:17 AM

#

we need a kingfall eta *wink* *wink*

patent aspen Jun 16, 2025, 2:17 AM

#

Google and OAI co-discovered the method though

elder rapids Jun 16, 2025, 2:17 AM

#

true but Craig is just being disingenuous

#

since public release functionally is meaningless

small haven Jun 16, 2025, 2:17 AM

#

ok buddy

elder rapids Jun 16, 2025, 2:17 AM

#

so unless you present that distinction it's not going to be that way

patent aspen Jun 16, 2025, 2:18 AM

#

I would also say that reasoning isn't just test-time scaling even though that's how it has been branded. Google Research also invented chain-of-thought among other things

elder rapids Jun 16, 2025, 2:18 AM

#

patent aspen I would also say that reasoning isn't just test-time scaling even though that's ...

if we're talking about reasoning then it's definitely Google via STaR or scratchpad

#

but test time scaling, still Google but in LLMs it's later

patent aspen Jun 16, 2025, 2:19 AM

#

I'm sure we've made teacher models that big before. 4T+ parameter models are sub-optimal for serving though
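
#

back-of-the-envelope arithmetic for why that is awkward to serve. assumptions: a dense model, 2 bytes per parameter (bf16 or fp16), weights only (no KV cache or activations), and a hypothetical accelerator with 80 GB of memory

```python
def weight_memory_bytes(n_params, bytes_per_param=2):
    """Memory needed just to hold the weights."""
    return n_params * bytes_per_param


def min_accelerators(n_params, bytes_per_param=2, mem_per_device_gb=80):
    """Lower bound on devices needed to fit the weights (ceiling division)."""
    total = weight_memory_bytes(n_params, bytes_per_param)
    per_device = mem_per_device_gb * 10**9
    return -(-total // per_device)


n_params = 4 * 10**12  # 4T parameters
total_tb = weight_memory_bytes(n_params) / 10**12
devices = min_accelerators(n_params)
print(f"{total_tb:.0f} TB of weights, at least {devices} devices of 80 GB each")
```

that is 8 TB of weights and at least 100 such devices before serving a single request, under the assumptions above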

small haven Jun 16, 2025, 2:19 AM

#

thats what oai did with o3 preview

elder rapids Jun 16, 2025, 2:20 AM

#

patent aspen I'm sure we've made teacher models that big before. 4T+ parameter models are sub...

I'm ngl imagine how good a 4T model would feel

patent aspen Jun 16, 2025, 2:20 AM

#

elder rapids I'm ngl imagine how good a 4T model would feel

Slow

elder rapids Jun 16, 2025, 2:21 AM

#

no sht

#

😭

#

I meant like

#

vibes

#

grok 3.5T

patent aspen Jun 16, 2025, 2:21 AM

#

vibes as in: I hit the enter key and make entire data center go brrr haha

elder rapids Jun 16, 2025, 2:22 AM

#

1.7T

elder rapids Jun 16, 2025, 2:22 AM

#

patent aspen vibes as in: I hit the enter key and make entire data center go brrr haha

ye but besides joking, a 4T model would be really easy to serve for employees

small haven Jun 16, 2025, 2:22 AM

#

if community notes were on discord, craig would take the entire padding in here

elder rapids Jun 16, 2025, 2:23 AM

#

small haven if community notes were on discord, craig would take the entire padding in here

deadass

small haven Jun 16, 2025, 2:24 AM

#

im kid

patent aspen Jun 16, 2025, 2:25 AM

#

The thing is: Extremely high parameter counts are what you do when you don't have the infra innovation to go lower. But most people think it's: high parameter counts mean you innovated enough to support a model that big.

#

Because most of the innovation is in getting more from less. It's true that it does require some expertise to get to really high counts, but it's not the ideal place to be.

#

Long to train is bad

#

Iterations are good

#

Expensive is bad

#

Capacity is good

leaden palm Jun 16, 2025, 2:28 AM

#

patent aspen The thing is: Extremely high parameter counts are what you do when you don't hav...

is it infra in this context?

#

i think it's more architecture

#

& data

patent aspen Jun 16, 2025, 2:29 AM

#

leaden palm is it infra in this context?

It's a combination of infra R&D, ML R&D, software engineering, and architecture

#

You had to resort to a high param count

#

It's kind of like saying why is a slow model bad

leaden palm Jun 16, 2025, 2:30 AM

#

patent aspen You had to resort to a high param count

why would bad infra disincentivise/prevent training of small models?

patent aspen Jun 16, 2025, 2:33 AM

#

leaden palm why would bad infra disincentivise/prevent training of small models?

It's one of many factors. The whole stack requires innovation at every layer. In the case of infra, for example, the serving stack requires a combination of excellent infra and hardware

leaden palm Jun 16, 2025, 2:34 AM

#

patent aspen It's one of many factors. The whole stack requires innovation at every layer. In...

i can get how infra innovation helps with training larger models or more complex (eg MoE) models, but it is objectively harder to set up infrastructure to train a large model (many GPUs with complex interconnects) than to set up infrastructure to train a small model (which scales all the way down to a laptop)

patent aspen Jun 16, 2025, 2:35 AM

#

leaden palm i can get how infra innovation helps with training larger models or more complex...

Right so you're absolutely correct that the floor is way higher

#

What I'm talking about is the ceiling

#

In other words, the barrier to entry is higher for large models, although I think achieving SoTA performance with say 500B params is far more impressive than doing it with 2T

small haven Jun 16, 2025, 3:57 AM

#

wen deepthink

#

hot

lilac nimbus Jun 16, 2025, 5:52 AM

#

small haven Jun 16, 2025, 5:52 AM

#

whoa

#

@keen beacon

#

i love deepseek

zinc ore Jun 16, 2025, 5:54 AM

#

One of those might be deepthink

small haven Jun 16, 2025, 5:55 AM

#

i wonder how he got it tho

#

funny they have to hash the model names now

zinc ore Jun 16, 2025, 6:11 AM

#

Bunch of deletes, did you get it to work now?

small haven Jun 16, 2025, 6:11 AM

#

zinc ore Bunch of deletes, did you get it to work now?

it seems like it's working (haven't tried on my end)

#

but that forum

keen beacon Jun 16, 2025, 6:17 AM

#

small haven @keen beacon

Yeah it's nothing important btw

keen beacon Jun 16, 2025, 6:18 AM

#

small haven i wonder how he got it tho

They had a list of side by side ab test pairs. Jfd is blacktooth

small haven Jun 16, 2025, 6:18 AM

#

interesting

#

got it to work

keen beacon Jun 16, 2025, 6:19 AM

#

I recommend blocking jsreport and count tokens

keen beacon Jun 16, 2025, 6:20 AM

#

zinc ore One of those might be deepthink

No

zinc ore Jun 16, 2025, 6:21 AM

#

Heard someone say that, but maybe they were just speculating

sweet tinsel Jun 16, 2025, 6:48 AM

#

Why is perplexity tweaking, why did they translate Gemini to "Zwilling" for the German version?

torn mantle Jun 16, 2025, 8:29 AM

#

lol

sacred quail Jun 16, 2025, 8:32 AM

#

something awakened in perplexity's blood

cedar tide Jun 16, 2025, 8:45 AM

#

New MiniMax reasoning model, MiniMax M1

Screenshot_2025-06-16-10-36-09-579_com.brave.browser-edit.jpg

#

According to "The Information," the model will be open source.

Screenshot_2025-06-16-10-46-26-270_com.android.chrome-edit.jpg

misty vault Jun 16, 2025, 8:50 AM

#

dork 4

small haven Jun 16, 2025, 8:54 AM

#

grok 3.5 will be a significant jump in order of magnitude

#

to gpt 4o 😎

misty vault Jun 16, 2025, 8:55 AM

#

small haven to gpt 4o 😎

https://tenor.com/view/i-fw-you-vro-vro-cat-fw-gif-16116433353141566093

torn mantle Jun 16, 2025, 8:58 AM

#

werent they all blocked