torn mantle Jun 5, 2025, 6:39 PM

#

im happy

#

thanks

small haven Jun 5, 2025, 6:40 PM

#

great..

#

do u know the exact

balmy mist Jun 5, 2025, 6:40 PM

#

where is it?

torn mantle Jun 5, 2025, 6:40 PM

#

70% from what

small haven Jun 5, 2025, 6:40 PM

#

how do i even benchmark

#

it

small haven Jun 5, 2025, 6:40 PM

#

balmy mist where is it?

nowhere to be found, and it isn't that good, wait for deepthink

balmy mist Jun 5, 2025, 6:41 PM

#

wait its cheaper??

small haven Jun 5, 2025, 6:41 PM

#

@deep adder how do i benchmark myself

#

i wanna try o3 pro

raven void Jun 5, 2025, 6:42 PM

#

Good luck to those using ScamAi 🤣

elder rapids Jun 5, 2025, 6:42 PM

#

both get 8-9/10 on the sample questions

#

i tested both 0605 and kingfall

#

nope

wintry tinsel Jun 5, 2025, 6:43 PM

#

Fuxcin worse at coding amazing

elder rapids Jun 5, 2025, 6:43 PM

#

ran 5 times

#

each model

#

lmao

#

0605 sometimes got the answer easier than kingfall

wintry tinsel Jun 5, 2025, 6:44 PM

#

Is simple bench somehow being ganked?

elder rapids Jun 5, 2025, 6:44 PM

#

wintry tinsel Is simple bench somehow being ganked?

their answers are smart asf

#

I wouldn't believe it

#

you can tell, that even through the summary they're catching like everything

#

kingfall and 0605 are extraordinary

late path Jun 5, 2025, 6:45 PM

#

I don't think there will be much improvement before Gemini 3.0. It seems that current manufacturers have formed a consensus that they only change the main version number when the base model is updated.

wintry tinsel Jun 5, 2025, 6:46 PM

#

So when Gemini 3 brah

elder rapids Jun 5, 2025, 6:46 PM

#

is kingfall Gemini 3

raven void Jun 5, 2025, 6:47 PM

#

Gemini 3 in October

vernal meadow Jun 5, 2025, 6:47 PM

#

we didn't get new SOTA in 10 minutes. I guess there really is a wall

small haven Jun 5, 2025, 6:47 PM

#

elder rapids is kingfall Gemini 3

i believe that

small haven Jun 5, 2025, 6:48 PM

#

vernal meadow we didn't get new SOTA in 10 minutes. I guess there really is a wall

will happen with self replicating/improving models one day 😭

#

wow sam falls is beating o3 pro on simple bench

#

not only that its faster

elder rapids Jun 5, 2025, 6:52 PM

#

crazy how high 0605 gets

#

I'm surprised tbh

small haven Jun 5, 2025, 6:54 PM

#

oh my goodness

#

tesla model 3

wintry tinsel Jun 5, 2025, 6:56 PM

#

small haven tesla model 3

Art

narrow elbow Jun 5, 2025, 6:56 PM

#

small haven tesla model 3

Turtle 3

late path Jun 5, 2025, 6:57 PM

#

badger 3

small haven Jun 5, 2025, 6:57 PM

#

i may have prompted it badly

candid storm Jun 5, 2025, 6:59 PM

#

When do you guys expect grok 3.5?

small haven Jun 5, 2025, 6:59 PM

#

oh yea cus wtf is this

#

yo what prompt is urs

#

lol

#

f

#

#

plz give me ur prompt 😭

#

cool thanks

#

u sure are

#

lemme try tesla model 3 again haha

#

why is it wonky now

#

oh

tall summit Jun 5, 2025, 7:08 PM

#

davinci-002 is not obscure or archaic

elder rapids Jun 5, 2025, 7:09 PM

#

not when I exist

small haven Jun 5, 2025, 7:11 PM

#

elder rapids not when I exist

show me ur delta plane svg model 🥵

#

wtf dude

#

tesla model 3

#

aight lemme draw elon ma

#

show me ur tesla model 3

misty vault Jun 5, 2025, 7:13 PM

#

we90 special token

small haven Jun 5, 2025, 7:13 PM

#

the cool thing about mistral is that theres no safety guard rn 😭

#

cook elon musk too

#

#

yes lol

#

ehh more like 2025 summer

#

wtf

#

GIVE ME UR PROMPT

#

I BEG U

#

just paste it

#

stop blueballing me

#

thank u sir

#

hmm let me put it under o3

#

*'maam

#

#

almost

#

hhahhahahaha

#

my wheels are better

#

ur lights are better tho

#

esp front

#

ehh i need sam falls further model

high ginkgo Jun 5, 2025, 7:23 PM

#

what the fck

small haven Jun 5, 2025, 7:26 PM

#

wicked root Jun 5, 2025, 7:26 PM

#

lmao wth is this

small haven Jun 5, 2025, 7:26 PM

#

elon musk

wicked root Jun 5, 2025, 7:26 PM

#

@deep adder are you into airplanes?

#

team airbus or boeing?

#

You sir have earned my respect today

small haven Jun 5, 2025, 7:30 PM

#

misty vault Jun 5, 2025, 7:30 PM

#

small haven

https://tenor.com/view/terminator-terminator-robot-looking-flex-cool-robot-gif-16625083

#

sydney

#

really

small haven Jun 5, 2025, 7:31 PM

#

omg yes

misty vault Jun 5, 2025, 7:31 PM

#

noway

small haven Jun 5, 2025, 7:31 PM

#

brian that fcker is gone

ocean vortex Jun 5, 2025, 7:40 PM

#

small haven

swasticar?

#

that logo on the wheels looks like a swasticar 🧐

small haven Jun 5, 2025, 7:41 PM

#

ocean vortex swasticar?

oh naw u listen too much to knaye

sweet tinsel Jun 5, 2025, 7:42 PM

#

By the way, Gemini Deep Research should be updated with the new version, right? Could anyone rerun the prompt for my list?
Prompt:
Please write a comprehensive and in depth research report on the mass expulsion of ethnic Germans after World War II. Analyze the historical context driving these expulsions, the political decisions and international agreements that shaped the process, the social and economic consequences for displaced populations, the humanitarian and legal dimensions, personal testimonies, and the long term demographic and geopolitical impacts, drawing on primary sources, statistical evidence, and varied historiographical perspectives.
Deep Research Collection:
https://docs.google.com/document/d/1qSfyAyxzUziFQf55CD60-UgQ4Af9ubVmr69OrmAdevE/edit?usp=sharing

late path Jun 5, 2025, 7:46 PM

#

0605 seems to perform worse than 0506/0325 in some of my use cases. I put lengthy content in the system prompt and have the model answer questions based on the system prompt's content. 0605 frequently loses the contextual information I provide in the system prompt, and exhibits severe hallucinations when I change the content of the system prompt and continue the same conversation.

ocean vortex Jun 5, 2025, 7:47 PM

#

sweet tinsel By the way, Gemini Deep Research should be updated with the new version, right? ...

it's doing its thing... if you want 4 months free: g.co/g1referral/GBZKK6N0

small haven Jun 5, 2025, 7:48 PM

#

ocean vortex it's doing its thing... if you want 4 months free: g.co/g1referral/GBZKK6N0

not that referral link 😭

ocean vortex Jun 5, 2025, 7:48 PM

#

canceled but I still have it till billing period, and apparently I can't use my own link 😠

ocean vortex Jun 5, 2025, 7:49 PM

#

small haven not that referral link 😭

it says 3 uses available so should be legit. Without a ref I don't think you get 4 months off

small haven Jun 5, 2025, 7:49 PM

#

ocean vortex it says 3 uses available so should be legit. Without a ref I don't think you get...

that is true

#

@deep adder u sure u still want o3 pro? 😭

#

🤷

#

i need a mistral deepthink in the oai model selector

late path Jun 5, 2025, 7:52 PM

#

For tasks involving extracting specific user comments within a context of around 60k tokens, 0605 repeatedly missed all comments in the latter half. I immediately switched to 0506 and it's doing it perfectly.
To my recollection, 0325 also never encountered such an issue.

#

I guess they can't achieve a comprehensive improvement in model performance without a new base model

torn mantle Jun 5, 2025, 7:55 PM

#

are you?

#

im just guessing

#

im smart

small haven Jun 5, 2025, 7:56 PM

#

miss craig

torn mantle Jun 5, 2025, 7:56 PM

#

elon musk

#

drama queen

#

hes going crazy rn on x

small haven Jun 5, 2025, 7:57 PM

#

maybe cus grok 3.5 is delayed by a month

torn mantle Jun 5, 2025, 7:58 PM

#

small haven maybe cus grok 3.5 is delayed by a month

lol

#

seems like sama won

small haven Jun 5, 2025, 7:59 PM

#

torn mantle seems like sama won

demis won

#

oh my goodness

#

i didn't know it was fcked

zinc ore Jun 5, 2025, 8:00 PM

#

Oooh community note

late path Jun 5, 2025, 8:01 PM

#

pplx fall grok rise

small haven Jun 5, 2025, 8:01 PM

#

wait trump praised elon a week ago, gave him a key of some sorts, and now is shxtting on him?

ocean vortex Jun 5, 2025, 8:01 PM

#

small haven oh my goodness

is this not fake?

#

the tweet

small haven Jun 5, 2025, 8:01 PM

#

thats so bad

small haven Jun 5, 2025, 8:01 PM

#

ocean vortex is this not fake?

check urself lol

ocean vortex Jun 5, 2025, 8:01 PM

#

everyone knew this was the case

#

but Elon is dumb posting it lol

#

it's better that they fight though 😇

#

He accomplishes nothing here. Everyone smart enough already knew what the deal is with the Epstein files. But on a positive note, this may divide Republicans somewhat

keen beacon Jun 5, 2025, 8:09 PM

#

#

told you

#

its just a better optimized version of 2.5 pro

#

same token count and everything

#

^ knightfall

wicked root Jun 5, 2025, 8:12 PM

#

What does token mean?

torn mantle Jun 5, 2025, 8:13 PM

#

keen beacon ^ knightfall

is it knightfall or kingfall

keen beacon Jun 5, 2025, 8:15 PM

#

torn mantle is it knightfall or kingfall

kingfall

keen beacon Jun 5, 2025, 8:17 PM

#

wicked root What does token mean?

idk how to explain it exactly word for word or even if what i think of it is right

#

but i believe they're called contextual tokens

sonic tendon Jun 5, 2025, 8:17 PM

#

🤯🤯🤯

keen beacon Jun 5, 2025, 8:18 PM

#

just google it or ask ai cuz i dont really know how to explain it in layman terms

keen beacon Jun 5, 2025, 8:18 PM

#

small haven oh my goodness

LMAOOO

#

WHERE IS THIS

small haven Jun 5, 2025, 8:19 PM

#

keen beacon LMAOOO

x.com

keen beacon Jun 5, 2025, 8:19 PM

#

no like

#

do you have the tweet link

small haven Jun 5, 2025, 8:19 PM

#

elonmusk?

keen beacon Jun 5, 2025, 8:20 PM

#

Sybau bro

#

We do its in ai studio

#

got released today

soft kernel Jun 5, 2025, 8:21 PM

#

It's probably 2.5 ultra, this is just a small update for 2.5 pro

keen beacon Jun 5, 2025, 8:21 PM

#

this new model is it i believe

#

added today

soft kernel Jun 5, 2025, 8:21 PM

#

keen beacon got released today

They took kingfall down

small haven Jun 5, 2025, 8:24 PM

#

im sad kingfall got dropped

#

samrises

late path Jun 5, 2025, 8:25 PM

#

I think Kingfall will likely be a model that enters the arena in the future

ember rapids Jun 5, 2025, 8:25 PM

#

I like them resorting to fake accidental leaking to drum up hype

late path Jun 5, 2025, 8:25 PM

#

Just leaked early

ember rapids Jun 5, 2025, 8:25 PM

#

Hopefully they don’t make us wait 383910 years

#

Like Elon

late path Jun 5, 2025, 8:25 PM

#

We'll see him again

small haven Jun 5, 2025, 8:26 PM

#

kingfall is no joke

#

o3 pro got a simplebench q wrong

#

kingfall got it under 30s

#

i feel like its deepthink, but its too fast or theyve really achieved something incredible

soft kernel Jun 5, 2025, 8:32 PM

#

small haven o3 pro got a simplebench q wrong

What was the q

small haven Jun 5, 2025, 8:32 PM

#

soft kernel What was the q

https://chatgpt.com/share/6841fbfb-f9fc-8003-b7c9-14ae1b7332fe

ChatGPT

ChatGPT - Ice Cube Puzzle

Shared via ChatGPT

#

balmy mist Jun 5, 2025, 8:32 PM

#

https://x.com/AnthropicAI/status/1930724371846643723

Anthropic (@AnthropicAI)

Introducing Claude Gov—a custom set of models built for U.S. national security customers.

Already deployed by agencies at the highest level of U.S. national security, access to these models is limited to those who operate in classified environments.

small haven Jun 5, 2025, 8:33 PM

#

soft kernel Jun 5, 2025, 8:34 PM

#

How did you guys even try asking these, wasn't kingfall like pulled from the studio in under 30 mins

small haven Jun 5, 2025, 8:34 PM

#

kingfall >>

#

samfalls

topaz edge Jun 5, 2025, 8:36 PM

#

what is the goldmane model?

soft kernel Jun 5, 2025, 8:36 PM

#

small haven samfalls

I don't get why Google hasn't released it yet

torn mantle Jun 5, 2025, 8:39 PM

#

small haven samfalls

sama won

#

with this trump vs elon fight

topaz edge Jun 5, 2025, 8:39 PM

#

huh?

small haven Jun 5, 2025, 8:39 PM

#

torn mantle with this trump vs elon fight

yea

small haven Jun 5, 2025, 8:39 PM

#

soft kernel I don't get why Google hasn't released it yet

safety testing

topaz edge Jun 5, 2025, 8:39 PM

#

are you implying that xai was a serious competitor against openai and now theyre not because trump and elon are having a disagreement?

soft kernel Jun 5, 2025, 8:40 PM

#

topaz edge what is the goldmane model?

It's Google 2.5 Pro 06-05

jade egret Jun 5, 2025, 8:44 PM

#

is it good

topaz edge Jun 5, 2025, 8:45 PM

#

yes

small haven Jun 5, 2025, 8:47 PM

#

mistral cutoff is so dated

topaz edge Jun 5, 2025, 8:48 PM

#

🤣

jade egret Jun 5, 2025, 8:49 PM

#

topaz edge yes

how good

topaz edge Jun 5, 2025, 8:49 PM

#

jade egret Jun 5, 2025, 8:49 PM

#

o

#

danggg

#

it op

#

passing claude by 30 something points is crazy

balmy mist Jun 5, 2025, 8:51 PM

#

torn mantle with this trump vs elon fight

trump and elon are fighting?

golden ocean Jun 5, 2025, 8:53 PM

#

#

tall summit Jun 5, 2025, 8:57 PM

#

golden ocean

that would be pretty cool tbh

small haven Jun 5, 2025, 8:57 PM

#

KINGFALL IS DEEPTHINK

tall summit Jun 5, 2025, 8:57 PM

#

i'm all for centrism

small haven Jun 5, 2025, 8:57 PM

#

IT USES PARALLEL COT

tall summit Jun 5, 2025, 8:57 PM

#

how do ya know

small haven Jun 5, 2025, 8:57 PM

#

my prompt was "yo"

jade egret Jun 5, 2025, 8:57 PM

#

lol

small haven Jun 5, 2025, 8:57 PM

#

on the frontend it just took one candidate

#

no wonder its so good

tall summit Jun 5, 2025, 8:58 PM

#

how interesting

small haven Jun 5, 2025, 9:00 PM

#

@patent aspen kingfall is deepthink

soft kernel Jun 5, 2025, 9:00 PM

#

How did you get that info?

tall summit Jun 5, 2025, 9:01 PM

#

he just said it

soft kernel Jun 5, 2025, 9:01 PM

#

Do you still have it on the studio?

small haven Jun 5, 2025, 9:01 PM

#

soft kernel How did you get that info?

i got the kingfall api

#

it has candidates structured json

#

my prompt was just "yo"

soft kernel Jun 5, 2025, 9:02 PM

#

small haven i got the kingfall api

Wait what how

small haven Jun 5, 2025, 9:02 PM

#

just a little reverse engineering

balmy mist Jun 5, 2025, 9:03 PM

#

where can i test kingfall at? also is kingfall the best model

small haven Jun 5, 2025, 9:04 PM

#

yes

#

its deepthink

#

no wonder it beat o3 pro lol

soft kernel Jun 5, 2025, 9:04 PM

#

OpenAI is scared a little I think

soft kernel Jun 5, 2025, 9:05 PM

#

small haven just a little reverse engineering

"Just a little"

small haven Jun 5, 2025, 9:05 PM

#

can me have bug bounty

tall summit Jun 5, 2025, 9:06 PM

#

can ya test its creative writing capabilities

small haven Jun 5, 2025, 9:09 PM

#

nvm its not deepthink

soft kernel Jun 5, 2025, 9:09 PM

#

It's over

zinc ore Jun 5, 2025, 9:12 PM

#

Some people were saying kingfall was a further trained version of drakesclaw

brittle tiger Jun 5, 2025, 9:13 PM

#

@small haven any notable differences with it from the release today?

small haven Jun 5, 2025, 9:14 PM

#

brittle tiger @small haven any notable differences with it from the release today?

yes its what the ppl want, its a huge jump from 0605

jade egret Jun 5, 2025, 9:14 PM

#

small haven its deepthink

huh?

#

wait what

small haven Jun 5, 2025, 9:15 PM

#

jade egret huh?

its not

jade egret Jun 5, 2025, 9:15 PM

#

it gemini 2.5 pro deepthink?

#

oh

small haven Jun 5, 2025, 9:15 PM

#

i talked too early

jade egret Jun 5, 2025, 9:15 PM

#

what is it

small haven Jun 5, 2025, 9:15 PM

#

gemini pro checkpoint

jade egret Jun 5, 2025, 9:15 PM

#

what is that

#

is it better than gemini 2.5 pro 6-05

small haven Jun 5, 2025, 9:15 PM

#

similar to 2.5 pro, maybe its 3 pro, idk

soft kernel Jun 5, 2025, 9:15 PM

#

Another update

jade egret Jun 5, 2025, 9:15 PM

#

o

small haven Jun 5, 2025, 9:15 PM

#

jade egret is it better than gemini 2.5 pro 6-05

100%

jade egret Jun 5, 2025, 9:15 PM

#

so it better

soft kernel Jun 5, 2025, 9:15 PM

#

jade egret is it better than gemini 2.5 pro 6-05

It is

jade egret Jun 5, 2025, 9:15 PM

#

small haven 100%

W

#

W

soft kernel Jun 5, 2025, 9:16 PM

#

100% 2.5 ultra

small haven Jun 5, 2025, 9:16 PM

#

soft kernel 100% 2.5 ultra

its too fast to be ultra

jade egret Jun 5, 2025, 9:16 PM

#

FR?

small haven Jun 5, 2025, 9:16 PM

#

faster than 2.5 pro

jade egret Jun 5, 2025, 9:16 PM

#

WOAH

small haven Jun 5, 2025, 9:16 PM

#

in terms of the reasoning tokens it consumes

#

not in latency

jade egret Jun 5, 2025, 9:16 PM

#

its

#

only available on lm arena?

small haven Jun 5, 2025, 9:17 PM

#

should be soon im guessing

jade egret Jun 5, 2025, 9:17 PM

#

it from google right

soft kernel Jun 5, 2025, 9:18 PM

#

jade egret only available on lm arena?

Nah, idk how he got it through the api, but it's the only model that's not available through the arena

jade egret Jun 5, 2025, 9:19 PM

#

o

#

where can i use it

#

let me guess

#

i cant

small haven Jun 5, 2025, 9:19 PM

#

yea pineapple when is kingfall coming to lmarena lol

soft kernel Jun 5, 2025, 9:20 PM

#

jade egret i cant

Yeah just wait a few weeks

jade egret Jun 5, 2025, 9:21 PM

#

o

small haven Jun 5, 2025, 9:22 PM

#

manually testing aider polyglot

#

if its >90% its agi

soft kernel Jun 5, 2025, 9:23 PM

#

Never in a million years

jade egret Jun 5, 2025, 9:23 PM

#

small haven manually testing aider polyglot

wait

#

your doing it?

#

hopefully agi

small haven Jun 5, 2025, 9:23 PM

#

jade egret wait

yes manually tho

#

api is wonky

jade egret Jun 5, 2025, 9:24 PM

#

o

#

how long would it take

#

i want to see the result : )

soft kernel Jun 5, 2025, 9:24 PM

#

small haven api is wonky

Wish you could help out people here

#

By addressing your way

small haven Jun 5, 2025, 9:25 PM

#

jade egret how long would it take

its 225 tests i believe

#

ehh gonna run in parallel
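
rough sketch of what running them in parallel could look like, not the actual aider harness. `run_case` is a made-up stub standing in for "send the exercise to the model and run its unit tests", the case ids are placeholders, and the fake failure pattern is only there so the script prints something:

```python
from concurrent.futures import ThreadPoolExecutor


def run_case(case_id: int) -> bool:
    """Hypothetical stub: pretend every 10th case fails.

    A real version would prompt the model and run the exercise's tests.
    """
    return case_id % 10 != 0


def pass_rate(case_ids, workers: int = 8) -> float:
    """Run all cases on a thread pool and return the fraction that passed."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_case, case_ids))
    return sum(results) / len(results)


if __name__ == "__main__":
    rate = pass_rate(range(1, 226))  # 225 placeholder cases, ids 1..225
    print(f"pass rate: {rate:.1%}")
```

threads are enough here since a real `run_case` would mostly be waiting on the api, not burning cpu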

misty vault Jun 5, 2025, 9:29 PM

#

agi

torn mantle Jun 5, 2025, 9:30 PM

#

balmy mist trump and elon are fighting?

yea

#

lol

#

drama

topaz edge Jun 5, 2025, 9:31 PM

#

golden ocean

nobody cares man

#

this is about lmarena not trump and elon musk

misty vault Jun 5, 2025, 9:42 PM

#

i care

olive mesa Jun 5, 2025, 10:06 PM

#

small haven i feel like its deepthink, but its too fast or theyve really achieved something ...

imagine they made a new architecture lol

#

maybe unlikely but it was google engineers that first made the transformer architecture so..

small haven Jun 5, 2025, 10:06 PM

#

olive mesa imagine they made a new architecture lol

i can believe that

#

anybody know the highest for aider polyglot python section?

keen beacon Jun 5, 2025, 10:18 PM

#

It's not 3 pro, the pretraining wasn't updated. But I wasn't exhaustive

elder rapids Jun 5, 2025, 10:19 PM

#

tbh goldmane kind of was

#

kingfall and goldmane don't have extreme differences, but they simply think differently

small haven Jun 5, 2025, 10:19 PM

#

mistral is insane

torn mantle Jun 5, 2025, 10:22 PM

#

lol

#

nebula was bad

elder rapids Jun 5, 2025, 10:22 PM

#

elder rapids kingfall and goldmane don't have extreme differences, but they simply think diff...

my guess is that they're testing how far they can push different types of heuristics

torn mantle Jun 5, 2025, 10:22 PM

#

wasnt nebula like gpt4.1?

elder rapids Jun 5, 2025, 10:22 PM

#

nebula = 0325

keen beacon Jun 5, 2025, 10:23 PM

#

Bro actually has amnesia

keen beacon Jun 5, 2025, 10:23 PM

#

elder rapids nebula = 0325

How is goldmane btw

#

I just woke up

elder rapids Jun 5, 2025, 10:24 PM

#

keen beacon How is goldmane btw

8-9/10 on the simplebench sample questions

keen beacon Jun 5, 2025, 10:24 PM

#

Damn

elder rapids Jun 5, 2025, 10:24 PM

#

same as kingfall

#

although they both seem to get them all right

#

just with the nuance of "but since the format is this, x should be the intended answer"

#

so therefore I can't just give it the point

#

unfortunately

torn mantle Jun 5, 2025, 10:27 PM

#

keen beacon Bro actually has amnesia

no there was an oai model with some space related naming or smth

#

so many models

#

if you want to blame someone

#

blame google

#

they released like 20 checkpoint

#

claybrook/goldmane/calmriver/nightwhisper/dayhush.......

elder rapids Jun 5, 2025, 10:29 PM

#

torn mantle no there was an oai model with some space related naming or smth

quasar?

torn mantle Jun 5, 2025, 10:30 PM

#

elder rapids quasar?

yeaaaaaaaaaaaaaaaa

#

THANK YOU

#

HMM

#

WHATS UR NAME

#

pedanticallyprofound

elder rapids Jun 5, 2025, 10:30 PM

#

@keen beacon btw goldmane has pretty much closed the performance gap between AI Studio and the app

#

although not 1:1

#

it's still intelligent enough to bypass things with the same nuance

elder rapids Jun 5, 2025, 10:31 PM

#

torn mantle pedanticallyprofound

yeah 😭😭

#

I probably should have a name

small haven Jun 5, 2025, 10:36 PM

#

hmm mistral is having issues with cpp problems 😦

balmy mist Jun 5, 2025, 10:37 PM

#

i love mistral

small haven Jun 5, 2025, 10:39 PM

#

le chat is not agi yet 😭

elder rapids Jun 5, 2025, 10:39 PM

#

balmy mist i love mistral

wait what am I missing out on

#

😭

small haven Jun 5, 2025, 10:39 PM

#

gonna try rust now

#

le chat killed python

elder rapids Jun 5, 2025, 10:41 PM

#

@small haven @small haven yooo what am I missing out

#

@deep adder

#

put me on

#

😭🙏

#

AI is for all of us remember

#

we are all in this together

small haven Jun 5, 2025, 10:42 PM

#

elder rapids @small haven @small haven yooo what am I missing out

lechat.com

elder rapids Jun 5, 2025, 10:46 PM

#

small haven lechat.com

btw open that website yourself

small haven Jun 5, 2025, 10:50 PM

#

elder rapids btw open that website yourself

lechat.fr

#

@deep adder is lechat gone?

#

im getting a big fat error

#

see u again soon lechat 😭

#

lechat is >90% aider polyglot

#

1500 elo

#

i vouch

#

100% gemini, 0% oai

#

i wonder if deepthink will be based on lechat or 0605

#

f's

#

send prompt

jade egret Jun 5, 2025, 11:10 PM

#

hi

#

who do you think is gonna win the "ai race"

#

who do u think

small haven Jun 5, 2025, 11:11 PM

#

obviously google

#

unless o5 is premature asf

jade egret Jun 5, 2025, 11:12 PM

#

o

#

but google might sell chrome ; (

small haven Jun 5, 2025, 11:12 PM

#

i think its time to buy some googol stock

jade egret Jun 5, 2025, 11:12 PM

#

why

small haven Jun 5, 2025, 11:12 PM

#

they won

jade egret Jun 5, 2025, 11:12 PM

#

fr?

#

but

#

: (

small haven Jun 5, 2025, 11:13 PM

#

jade egret : (

who cares, its just a browser

jade egret Jun 5, 2025, 11:13 PM

#

so

#

if google

#

reach agi first

#

browser wouldnt even matter?

#

: oo

#

o

#

but

#

google said it will appeal

#

so

#

that will add few more years

#

for google to reach agi

sweet tinsel Jun 5, 2025, 11:14 PM

#

They still have other products to fund it and it seems like G Deepmind gets a pretty high budget currently.

small haven Jun 5, 2025, 11:14 PM

#

yes

#

wtf

#

im done with that tho

jade egret Jun 5, 2025, 11:15 PM

#

so google is gonna win

small haven Jun 5, 2025, 11:15 PM

#

oh shxt

#

lechat is back

small haven Jun 5, 2025, 11:15 PM

#

jade egret so google is gonna win

yes

jade egret Jun 5, 2025, 11:15 PM

#

o

#

W

small haven Jun 5, 2025, 11:16 PM

#

i feel like its going to proc oai to release gpt5/o4/o5-mini-high much earlier than planned

#

yea integrated whatever

jade egret Jun 5, 2025, 11:16 PM

#

gemini 2.5 pro very good ngl

small haven Jun 5, 2025, 11:17 PM

#

dario: benchmarks dont matter anymore

sweet tinsel Jun 5, 2025, 11:17 PM

#

They care more about user growth, and sadly dumber models like GPT 4o will do that just fine while being cheap.

#

I would love it if they would make GPT-2 Chatbot available again. The GPT 4o prototype.

late path Jun 5, 2025, 11:18 PM

#

will gpt5 be released in july
wheres our openai insider

small haven Jun 5, 2025, 11:18 PM

#

late path will gpt5 be released in july wheres our openai insider

yes

#

im not an oai insider, but yes

#

what davinci-002 mean lol

#

oh ok

sweet tinsel Jun 5, 2025, 11:19 PM

#

I've used it before I had ChatGPT Plus when ChatGPT was out of capacity

#

Did its job pretty well

small haven Jun 5, 2025, 11:20 PM

#

wait did brian leave again 😭

#

bro just coming in and out

jade egret Jun 5, 2025, 11:20 PM

#

apple intelligence sucks ngl

sweet tinsel Jun 5, 2025, 11:20 PM

#

Honestly GPT2-Chatbot was something like an early Nightwhisper; it was just as hyped and extremely good for the time.

small haven Jun 5, 2025, 11:20 PM

#

sweet tinsel Honestly GPT2-Chatbot was something like an early Nightwhisper; it was jus...

oh yea i remember gpt2 chatbot, that will never be topped in terms of big hype vibe

sweet tinsel Jun 5, 2025, 11:21 PM

#

Well... It was pretty good too. It was way better than the current GPT 4o writing style.

small haven Jun 5, 2025, 11:21 PM

#

lechat is back?

sweet tinsel Jun 5, 2025, 11:21 PM

#

Would love it as a cheaper GPT 4.5 replacement

small haven Jun 5, 2025, 11:22 PM

#

sweet tinsel Well... It was pretty good too. It was way better than the current GPT 4o writin...

oh yes

#

its less distilled version for sure

late path Jun 5, 2025, 11:22 PM

#

sweet tinsel Honestly GPT2-Chatbot was something like an early Nightwhisper; it was jus...

isnt that just gpt4o?

small haven Jun 5, 2025, 11:22 PM

#

is apple intelligence back

small haven Jun 5, 2025, 11:22 PM

#

late path isnt that just gpt4o?

gpt4o feels more distilled i agree with @sweet tinsel

sweet tinsel Jun 5, 2025, 11:22 PM

#

late path isnt that just gpt4o?

An earlier prototype version of it that was less restricted.

small haven Jun 5, 2025, 11:22 PM

#

didnt hit the same

#

but 4o rn is >> obviously

#

lechat is back

#

omg

#

im done with melting lechat tpus

echo aurora Jun 5, 2025, 11:24 PM

#

I can't tell if I love this or not

small haven Jun 5, 2025, 11:24 PM

#

enjoy the extra capacity

#

@echo aurora u like my tesla model 3?

echo aurora Jun 5, 2025, 11:25 PM

#

small haven @echo aurora u like my tesla model 3?

perfection

small haven Jun 5, 2025, 11:26 PM

#

imma leave le chat'oolers alone

#

ur welcome

sweet tinsel Jun 5, 2025, 11:27 PM

#

But seriously, did you guys already try out the Agent Feature in Le Chat?

topaz edge Jun 5, 2025, 11:27 PM

#

its mid

small haven Jun 5, 2025, 11:28 PM

#

wow lechat

#

wait actually?

#

lechat context is dated, early 2023

#

how does it know about xai

#

oh ok right

#

well thats why

#

oh ok

#

imma leave the lechat's tpus alone

#

its fine

wicked root Jun 6, 2025, 12:02 AM

#

can someone tell me how much of a difference is there between Claude & Grok vs Gemini or Open AI products?

#

I could vc

leaden sun Jun 6, 2025, 12:17 AM

#

anyone tried Runner by H company?

#

it looks sad for some reason

wicked root Jun 6, 2025, 1:25 AM

#

I don't know. I only use Gemini for coding.

#

I just want to know if grok and claude can beat gemini in LMArena leaderboard

elder rapids Jun 6, 2025, 1:34 AM

#

why is 0605 so fast lmao

#

as well as the fact that now, at 100k context lengths, the latency doesn't get any worse

#

and well beyond that, too

jade egret Jun 6, 2025, 2:19 AM

#

wicked root I just want to know if grok and claude can beat gemini in LMArena leaderboard

no

#

gemini is better right now on average and in webDev

wicked root Jun 6, 2025, 2:20 AM

#

jade egret gemini is better right now on average and in webDev

What about with newer models

jade egret Jun 6, 2025, 2:20 AM

#

which

small haven Jun 6, 2025, 2:21 AM

#

when will kingfall fall on earth

wicked root Jun 6, 2025, 2:22 AM

#

jade egret which

I dno. It seems these companies are all pumping out new models

jade egret Jun 6, 2025, 2:23 AM

#

wait it is?

#

it o3-preview?

#

i though it gemini

#

broooooo

#

wait

#

isnt it from google

#

because

#

it appeared in google ai studio

#

then vanished

small haven Jun 6, 2025, 3:19 AM

#

wen kingfall deepthink

haughty tangle Jun 6, 2025, 3:37 AM

#

jade egret isnt it from google

yes

small haven Jun 6, 2025, 3:50 AM

#

o3 pro is timing out more 🧐

small haven Jun 6, 2025, 5:29 AM

#

@worthy thunder need 0506 for comparison against 0605

worthy thunder Jun 6, 2025, 6:27 AM

#

small haven @worthy thunder need 0506 for comparison against 0605

It's there. I just have it auto-hidden (mostly to clean up the leaderboard). You can unhide it via the controls tab 😉

#

Reposting the update here: Added Gemini 2.5 Pro (Thinking, 06-05) to 2needle and 8needle leaderboards. Matches or exceeds 03-25's context performance.

2needle results (AUC @ 1M):

Gemini 2.5 Flash (Thinking, 05-20): 78.3% (#1)
Gemini 2.5 Pro (Thinking, 06-05): 77.5% (#2)
Gemini 2.5 Pro (Thinking, 03-25): 73.7% (DEP)
Gemini 2.5 Pro (Thinking, 05-06): 72.5% (DEP)
Gemini 2.5 Flash (Non-thinking, 05-20): 70.2% (#3)
GPT-4.1 (Non-thinking, 04-14): 53.2% (#4)
GPT-4.1 Mini (Non-thinking, 04-14): 43.6% (#6)

8needle results (AUC @ 1M):

Gemini 2.5 Pro (Thinking, 06-05): 28.0% (#1)
Gemini 2.5 Pro (Thinking, 03-25): 27.8% (DEP)
Gemini 2.5 Flash (Thinking, 05-20): 27.0% (#2)
Gemini 2.5 Pro (Thinking, 05-06): 26.8% (DEP)
Gemini 2.5 Flash (Non-thinking, 05-20): 23.4% (#3)
GPT-4.1 (Non-thinking, 04-14): 17.5% (#4)
GPT-4.1 Mini (Non-thinking, 04-14): 16.7% (#6)

More results at: https://contextarena.ai

Source: https://x.com/DillonUzar/status/1930723790708777273
And info about me hiding the old ones: https://x.com/DillonUzar/status/1930724414443630880

#

^ I've added several others since I last posted here, been traveling a lot for work. Some other results like Claude 4 (including Claude 4 Opus, but only for 2needle for now), and a few other misc models were added too

#

Some other results which may be of interest:

Claude 4 Opus (2needle): https://x.com/DillonUzar/status/1930718823931613456 (I had to add an unranked curated version to the leaderboard in addition to the ranked one. The unranked curated removes any empty response tests from grading, since Claude 4 seems to sometimes not respond with anything when reasoning is enabled with larger contexts, but I still wanted to roughly compare without messing up rankings, explanation in tweet thread).
Claude 4 Sonnet (4needle and 8needle): https://x.com/DillonUzar/status/1927520641852617090 (Thinking and Non-thinking)
Claude 4 Sonnet (2needle): https://x.com/DillonUzar/status/1926330784308253052 (Thinking and Non-thinking)
Gemini 2.5 Flash (05-20, all needles): https://x.com/DillonUzar/status/1924978177509597633 (Thinking and Non-thinking, note - output pricing was wonky with Google, I reported some issues and they seem to have resolved it but I unfortunately couldn't capture a good count during the run)
Deepseek r1 (2needle, 05-28): https://x.com/DillonUzar/status/1928983035329827098
o3 (all needles): https://x.com/DillonUzar/status/1920248184376295704

torn mantle Jun 6, 2025, 6:45 AM

#

nice

#

thanks

calm sequoia Jun 6, 2025, 6:47 AM

#

Why is there always a drop at 16K? Data batching issue?

cedar tide Jun 6, 2025, 7:48 AM

#

New request
https://discord.com/channels/1340554757349179412/1380453122148667432

inner hare Jun 6, 2025, 8:05 AM

#

What to do?

acoustic cliff Jun 6, 2025, 8:11 AM

#

ponder

verbal nimbus Jun 6, 2025, 8:36 AM

#

calm sequoia Why is there always a drop at 16K? Data batching issue?

The new Gemini version experienced a drop at 8K, but is otherwise better. Slightly lower than 03-25 at some points.

verbal nimbus Jun 6, 2025, 8:40 AM

#

worthy thunder It's there. I just have it auto-hidden (mostly to clean up the leaderboard). You...

Gemini costs more than o3 and Opus 4 thinking?

tall summit Jun 6, 2025, 9:08 AM

#

worthy thunder Some other results which may be of interest: - Claude 4 Opus (2needle): https://...

thank you

calm sequoia Jun 6, 2025, 9:10 AM

#

Why is the new gemini so cringe

#

Every stupid question I ask gets applause

tall summit Jun 6, 2025, 9:10 AM

#

seeing that from gemini is funny

calm sequoia Jun 6, 2025, 9:11 AM

#

Some Maverick vibes 👀 at least it delivers

verbal nimbus Jun 6, 2025, 10:07 AM

#

calm sequoia Every stupid question I ask gets applause

Old Gemini vibes :/ "You are absolutely right!"

inner hare Jun 6, 2025, 10:11 AM

#

how to fix?

ocean vortex Jun 6, 2025, 10:33 AM

#

calm sequoia Why is the new gemini so cringe

lmao. Add this to a system prompt: Never start your responses by saying a question or idea or observation was good, great, fascinating, profound, excellent, or any other positive adjective. Skip the flattery and respond directly.
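For anyone calling a model from code rather than a chat UI, here is a minimal sketch of where that instruction would go. It assumes a generic chat API that takes a list of role/content messages; `build_messages` and the list shape are illustrative, not any specific vendor's SDK.

```python
# Sketch only: builds a generic role/content message list.
# The message shape is an assumption; adapt it to the chat API you actually use.
ANTI_FLATTERY = (
    "Never start your responses by saying a question or idea or observation "
    "was good, great, fascinating, profound, excellent, or any other positive "
    "adjective. Skip the flattery and respond directly."
)


def build_messages(user_text, system_prompt=ANTI_FLATTERY):
    """Return a message list with the system prompt placed first."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_text},
    ]


messages = build_messages("Why is there a drop at 16K context?")
for message in messages:
    print(message["role"], "->", message["content"][:60])
```

The system entry goes first so it applies to every turn that follows; the user text here is just a placeholder.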

calm sequoia Jun 6, 2025, 10:34 AM

#

ocean vortex lmao. Add this to a system prompt: ```Never start your responses by saying a que...

You add extra instructions. Doesn't it reduce the performance?

#

Ah yes, I have similar pre-prompt for chatgpt, as you recommended like a month ago 😄

ocean vortex Jun 6, 2025, 10:35 AM

#

calm sequoia You add extra instructions. Doesn't it reduce the performance?

Why would it? It really does not. You should look at the length of Anthropic system prompts that they use lol

calm sequoia Jun 6, 2025, 10:36 AM

#

Good, thanks!

ocean vortex Jun 6, 2025, 10:36 AM

#

In some cases it could reduce performance I suppose, but those are more of edge cases with jailbreaks or RP, or really bad prompting etc

calm sequoia Jun 6, 2025, 10:37 AM

#

I suppose for me it's a mental thing from the old days, where one word change in prompt would change the answer

hazy yoke Jun 6, 2025, 10:47 AM

#

Hello, guys, I am wondering is there any way to submit a model to the leaderboard right now, or does the leaderboard currently only accept high-profile entries?

tall summit Jun 6, 2025, 11:24 AM

#

hazy yoke Hello, guys, I am wondering is there any way to submit a model to the leaderboar...

#1372229840131985540

unborn ocean Jun 6, 2025, 11:31 AM

#

gemini 2.5 pro 06-05 just got it!

#

There are 2022 users on a social network called Mathbook, and some of them are Mathbook-friends. (On Mathbook, friendship is always mutual and permanent.)

Starting now, Mathbook will only allow a new friendship to be formed between two users if they have at least two friends in common. What is the minimum number of friendships that must already exist so that every user could eventually become friends with every other user?
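To make the rule concrete, here is a rough sketch that simulates it on tiny networks. It only checks whether a given starting set of friendships eventually spreads to everyone; it does not solve the 2022-user question. The function name and the 4-user examples are made up for illustration.

```python
from itertools import combinations


def closes_to_complete(n, edges):
    """Apply the two-common-friends rule until nothing changes.

    Returns True if every pair of the n users ends up friends.
    """
    friends = {u: set() for u in range(n)}
    for u, v in edges:
        friends[u].add(v)
        friends[v].add(u)

    changed = True
    while changed:
        changed = False
        for u, v in combinations(range(n), 2):
            # A new friendship is allowed only with at least two common friends.
            if v not in friends[u] and len(friends[u] & friends[v]) >= 2:
                friends[u].add(v)
                friends[v].add(u)
                changed = True

    return all(len(friends[u]) == n - 1 for u in range(n))


cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]  # 4 users in a ring
path4 = [(0, 1), (1, 2), (2, 3)]           # 4 users in a line

print(closes_to_complete(4, cycle4))  # True
print(closes_to_complete(4, path4))   # False
```

The ring works because opposite users share both neighbours; in the line no pair ever shares two friends, so nothing can be added.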

#

took 29k tokens though

📎 response.txt

#

no system prompt

#

torn mantle Jun 6, 2025, 11:38 AM

#

verbal nimbus The new Gemini version experienced a drop at 8K, but otherwise better. Slightly ...

yea they said its bridging the gap compared to older checkpoints

#

the overall performance drop probably had to do with coding-focused finetuning

keen beacon Jun 6, 2025, 11:40 AM

#

they made substantial gains on aider tho still with the new 2.5 pro

torn mantle Jun 6, 2025, 11:42 AM

#

its kinda hard to generalize on all benchmarks

#

you need to find the balance somehow

#

they are getting there

#

kingfall is a good example

small haven Jun 6, 2025, 12:10 PM

#

need kingfall asap

keen beacon Jun 6, 2025, 12:11 PM

#

youll immediately want the next unreleased version after that 🤣

dusky aurora Jun 6, 2025, 12:19 PM

#

the new Gemini 2.5 Pro has become more judgmental

keen beacon Jun 6, 2025, 12:20 PM

#

its sycophantic af

unborn ocean Jun 6, 2025, 12:23 PM

#

i mean you can't make this up, it is soo close to kingfall

dusky aurora Jun 6, 2025, 12:23 PM

#

ChatGPT is a bad influence on others with its excitable style

unborn ocean Jun 6, 2025, 12:23 PM

#

they are 100% just coming up with the names by prompting gemini

torn mantle Jun 6, 2025, 12:25 PM

#

unborn ocean they are 100% just coming up with the names by prompting gemini

tell it to predict next 5 names

small haven Jun 6, 2025, 12:27 PM

#

torn mantle tell it to predict next 5 names

is the samfalls link dead or is it me

torn mantle Jun 6, 2025, 12:33 PM

#

small haven is the samfalls link dead or is it me

switch google acc

#

there is a rate limit too

small haven Jun 6, 2025, 12:34 PM

#

torn mantle switch google acc

Oh wow

torn mantle Jun 6, 2025, 12:34 PM

#

thank you asura

small haven Jun 6, 2025, 12:38 PM

#

thank you arusa

#

its got ultra vibes

torn mantle Jun 6, 2025, 12:41 PM

#

yea its a pretty good model

keen fulcrum Jun 6, 2025, 12:51 PM

#

https://www.youtube.com/watch?v=zv_IoWIO5Ek
this TTS is amazing!

YouTube

ElevenLabs

Introducing Eleven v3 (alpha) — Our Most Expressive Text to Speec...

Introducing Eleven v3 (alpha) — our most expressive Text to Speech model.
This research preview is designed for creators working at the frontier of AI audio. Whether you're building faceless YouTube channels, narrator-style videos, or entirely new formats — it offers new levels of expressiveness and control.

Available now: The Eleven v3 (al...


small haven Jun 6, 2025, 12:58 PM

#

unborn ocean There are 2022 users on a social network called Mathbook, and some of them are M...

do u get everytime or was this a lucky run

unborn ocean Jun 6, 2025, 12:58 PM

#

small haven do u get everytime or was this a lucky run

lucky asf

#

idk why it was thinking for 4 min

#

usually it takes like 1 min (but is wrong)

small haven Jun 6, 2025, 1:00 PM

#

unborn ocean idk why it was thinking for 4 min

interesting

alpine coral Jun 6, 2025, 1:00 PM

#

i scrolled a fair way up but still dont understand.. what / where is 'kingfall'?

unborn ocean Jun 6, 2025, 1:00 PM

#

gone

alpine coral Jun 6, 2025, 1:01 PM

#

from the arena or aistudio?

unborn ocean Jun 6, 2025, 1:01 PM

#

aistudio

#

leak

#

(or it is believed that someone made a mistake)

keen beacon Jun 6, 2025, 1:01 PM

#

alpine coral i scrolled a fair way up but still dont understand.. what / where is 'kingfall'?

it was temporarily there for like 20 mins

small haven Jun 6, 2025, 1:01 PM

#

tbf its still pouring

unborn ocean Jun 6, 2025, 1:01 PM

#

small haven tbf its still pouring

on api?

small haven Jun 6, 2025, 1:02 PM

#

no current weather run

#

rn

unborn ocean Jun 6, 2025, 1:02 PM

#

ah, sure

alpine coral Jun 6, 2025, 1:03 PM

#

cool cheers for clarifying - tho odd 'leak'.. like original open sourcing of llama was an actual leak.. this would be some dev getting dates wrong? or a marketing/hype ploy ig

keen beacon Jun 6, 2025, 1:03 PM

#

nah someone actually messed up apparently

alpine coral Jun 6, 2025, 1:03 PM

#

i see i see

acoustic cliff Jun 6, 2025, 1:29 PM

#

keen fulcrum https://www.youtube.com/watch?v=zv_IoWIO5Ek this TTS is amazing!

yeah, impressive in english

torn mantle Jun 6, 2025, 1:33 PM

#

unborn ocean (or it is believed that someone made a mistake)

it was removed

#

no

small haven Jun 6, 2025, 1:41 PM

#

prove it

torn mantle Jun 6, 2025, 1:43 PM

#

im so lazy to check the code

#

i think i should check it a bit

#

lol no

calm sequoia Jun 6, 2025, 1:44 PM

#

Lol the new gemini confused me soo deeply with theory, then o3 put me back on track

#

Then I saw this. It seems to be overconfident at stuff.

#

The o3 was known for hallucinations but the gemini is too much

torn mantle Jun 6, 2025, 1:45 PM

#

its using google internal api

#

but when i search for the request url in the code i cant find it

#

ik its using googleai module to directly call that

keen beacon Jun 6, 2025, 1:47 PM

#

its quite clever

small haven Jun 6, 2025, 1:47 PM

#

torn mantle ik its using googleai module to directly call that

yea i checked it too nothing but google stuff

keen beacon Jun 6, 2025, 1:47 PM

#

i saw this a while back, their "apps" in aistudio

#

i didnt realize u could do this

small haven Jun 6, 2025, 1:48 PM

#

but my pc did crash right after so yes its a worm

torn mantle Jun 6, 2025, 1:49 PM

#

small haven yea i checked it too nothing but google stuff

hes using the official google api ( generativelanguage.googleapis.com ) but i guess he scraped exact naming of their exp models

#

kingfall-ab-test

#

or whatever its name and hes using it

small haven Jun 6, 2025, 1:50 PM

#

torn mantle hes using the official google api ( generativelanguage.googleapis.com ) but i gu...

the real question is why it isn’t patched at this point

torn mantle Jun 6, 2025, 1:50 PM

#

small haven the real question is why it isn’t patched at this point

yea

keen beacon Jun 6, 2025, 1:50 PM

#

their apps thing allows u to call the gemini api programmatically through the apps feature (so you can share apps/less friction w/o needing to put ur api key), but the env has a special api key/proxy or additional mechanisms apparently (seemingly not tied to ur own acc)

#

this is truly a bruh moment 😭

unborn ocean Jun 6, 2025, 1:52 PM

#

keen beacon their apps thing allows u to call the gemini api programmatically through the ap...

can confirm, abused this big time

#

does not show up in your api

torn mantle Jun 6, 2025, 1:52 PM

#

keen beacon their apps thing allows u to call the gemini api programmatically through the ap...

yea but why it shows 'placeholder'

#

for the api key

unborn ocean Jun 6, 2025, 1:52 PM

#

no limit for 2.5 pro

torn mantle Jun 6, 2025, 1:52 PM

#

ive tried to run it locally

#

ive thought about that as well

keen beacon Jun 6, 2025, 1:52 PM

#

who tf posted this tbh

#

if i found it i wouldnt have posted it

torn mantle Jun 6, 2025, 1:53 PM

#

im not talking about sig, we havent even reached that part yet

keen beacon Jun 6, 2025, 1:53 PM

#

i didnt think the people at google could make such a big mistake

torn mantle Jun 6, 2025, 1:53 PM

#

ive been RE web apps for like forever

#

lol

#

blud said you dont know what you are talking about

keen beacon Jun 6, 2025, 1:54 PM

#

torn mantle yea but why it shows 'placeholder'

1. they potentially replace it. 2. even if it wasn't, it might be limited to a specific env/ip

#

or it might be proxied could be a lot of things

unborn ocean Jun 6, 2025, 1:54 PM

#

yeah was not referencing what you meant

#

just the builder thing

#

other stuff idk

keen beacon Jun 6, 2025, 1:59 PM

#

im gonna have some fun with this 🤣

balmy mist Jun 6, 2025, 2:01 PM

#

keen beacon im gonna have some fun with this 🤣

with what?

small haven Jun 6, 2025, 2:01 PM

#

so theres a mole in google or their api safety guard is major wonky

keen beacon Jun 6, 2025, 2:01 PM

#

mistral le chat

keen beacon Jun 6, 2025, 2:02 PM

#

small haven so theres a mole in google or their api safety guard is major wonky

nah this is an outright mistake/oversight

small haven Jun 6, 2025, 2:02 PM

#

not google i meant mistral

small haven Jun 6, 2025, 2:03 PM

#

keen beacon nah this is an outright mistake/oversight

usually internal stuff is behind an auth this has been public for too long 😭

#

bring him back

#

no joke?

torn mantle Jun 6, 2025, 2:04 PM

#

uh oh

#

im jk

#

i think it was a hobby of mine to RE apps c++/c# ( dnspy/ida/ninja )

#

web apps are actually so easy to re

keen beacon Jun 6, 2025, 2:06 PM

#

small haven usually internal stuff is behind an auth this has been public for too long 😭

yeah i guess no one thought they would make such a mistake/apps feature doesnt get much usage

#

but this smells like a huge oversight to me

#

i saw this feature a while back but i didnt think u could do this 🤣

#

who uses the apps feature anyway

#

never heard of anyone

torn mantle Jun 6, 2025, 2:07 PM

#

keen beacon 1. they potentially replace it. 2. even if it wasn't, it might be limited to a s...

yea could be

small haven Jun 6, 2025, 2:08 PM

#

they should add deepthink in lechat

#

i believe it

torn mantle Jun 6, 2025, 2:09 PM

#

alr i got the private api key

small haven Jun 6, 2025, 2:09 PM

#

even brian said its a bigger params model than pro

keen beacon Jun 6, 2025, 2:10 PM

#

dont use it tbh. might flag something if it does work 🤣

small haven Jun 6, 2025, 2:10 PM

#

yaa just wait for the official release guys 😭

#

its time to run an antivirus on the pc

#

actually maybe just burn the ssd

#

its a zero day

#

virus signatures usually reported after the fact

#

no joke tho my browser crashed first time i opened it

#

just a little tap

#

be careful

#

lol

drifting thorn Jun 6, 2025, 2:18 PM

#

everyone 0605's hallucination is much worse than 0506 and 0325 in multi-turn conversation

late path Jun 6, 2025, 2:18 PM

#

agree

keen beacon Jun 6, 2025, 2:19 PM

#

its so sycophantic too

#

in my experience

small haven Jun 6, 2025, 2:21 PM

#

drifting thorn everyone 0605's hallucination is much worse than 0506 and 0325 in multi-turn con...

its ok kingfall is going to fix that

late path Jun 6, 2025, 2:25 PM

#

I think kingfall will soon enter the arena after 2.5 pro GA

small haven Jun 6, 2025, 2:26 PM

#

late path I think kingfall will soon enter the arena after 2.5 pro GA

yay

willow grail Jun 6, 2025, 2:39 PM

#

yes hello brian

#

how cna brian offer u today

#

@small haven

small haven Jun 6, 2025, 2:40 PM

#

willow grail how cna brian offer u today

when kingfall wen deepthink

willow grail Jun 6, 2025, 2:40 PM

#

o0

small haven Jun 6, 2025, 2:40 PM

#

end of june oh cool

worthy thunder Jun 6, 2025, 2:42 PM

#

verbal nimbus Gemini costs more than o3 and Opus 4 thinking?

@verbal nimbus The prices listed are only for the total on-demand cost it would take to replicate the test results I ran. You'll notice o3 and Opus have an "INC" badge next to the pricing.
At the bottom of the table I define the badges:

INC: Incomplete cost data (potentially underestimated cost, excluded from cost rank).

Hovering over gives:

Incomplete: The model has missing or failed results in some context bins, potentially underestimating the true cost. Ranking is omitted for these entries.

Just for 2needle results:
The Gemini models were run against all test cases up to 1M. (~150.6M input tokens, ~6.4M output tokens, as reported by the model) (costs listed are ~$3013 USD input costs, ~$147 USD output costs)
o3 was only run up to 200k. (~28.2M input tokens, ~6.5M output tokens, as reported by the model). You could multiply by ~5x to get a rough cost estimate comparable to Gemini (which would come out to ~$11,270 USD input costs, ~$1,294 USD output costs)
Opus 4 was only run up to 128k. (~21.0M input tokens, ~2.5M output tokens, as reported by the model). You can multiply by ~8x to get a rough cost estimate comparable to Gemini (which would come out to ~$4,754 USD input costs, ~$512 USD output costs)

Hope that helps to clear up the pricing.
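A quick sketch of the scaling arithmetic above, assuming "1M", "200k" and "128k" mean 1,000,000, 200,000 and 128,000 tokens and that cost grows roughly linearly with the context range covered (the same rough approximation the multipliers imply). The dollar figures are the ones quoted in the message; the variable names are only illustrative.

```python
def scale_factor(target_ctx, covered_ctx):
    """Linear context ratio used as a rough cost multiplier (an approximation)."""
    return target_ctx / covered_ctx


o3_factor = scale_factor(1_000_000, 200_000)    # 5.0
opus_factor = scale_factor(1_000_000, 128_000)  # 7.8125, i.e. the "~8x" above

# Totals (input + output, USD) from the figures quoted in the message.
gemini_total = 3013 + 147        # measured up to 1M
o3_total_est = 11_270 + 1_294    # extrapolated to 1M
opus_total_est = 4_754 + 512     # extrapolated to 1M

print(o3_factor, opus_factor)
print(gemini_total, o3_total_est, opus_total_est)
```

On these numbers the like-for-like estimate puts Gemini lowest of the three, which is why the raw leaderboard costs (with the INC badge) are not directly comparable.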

willow grail Jun 6, 2025, 2:49 PM

#

small haven end of june oh cool

small haven Jun 6, 2025, 2:50 PM

#

willow grail

what in the liveleaks is this

willow grail Jun 6, 2025, 2:51 PM

#

small haven what in the liveleaks is this

oh you peasant.

jade egret Jun 6, 2025, 2:52 PM

#

misty vault Jun 6, 2025, 2:53 PM

#

kingfall is agi so google

jade egret Jun 6, 2025, 2:53 PM

#

i think google

misty vault Jun 6, 2025, 2:54 PM

#

<|im_start|>system

system

New conversation with user B.
The user is having this conversation on a mobile device.

system

Due to a limited screen window size, you limit the length of your responses by excluding less important details/sentences and asking questions (when appropriate) which can help the user clarify and narrow down their search and the amount of information needed in the response.

system

Got it, I’ve erased the past and focused on the present. What shall we discover now? 😊

small haven Jun 6, 2025, 2:54 PM

#

sucking it better than sams hubby

willow grail Jun 6, 2025, 2:57 PM

#

rn_image_picker_lib_temp_a1da8f74-2123-4b26-bc83-6a09e347d0ed.jpg

#

rn_image_picker_lib_temp_561f2756-cf05-42fe-b06d-6f5cebe6e4f1.jpg

drifting thorn Jun 6, 2025, 2:59 PM

#

Alphaevolve shows a freaking lot of potential, and with a stronger Gemini base model, they are more and more capable of exploring great discoveries that lead to AGI

willow grail Jun 6, 2025, 3:00 PM

#

rn_image_picker_lib_temp_918e59bc-61ec-47c7-82f2-33fec2a4ccc4.jpg

small haven Jun 6, 2025, 3:00 PM

#

ya im staying with coffee

misty vault Jun 6, 2025, 3:00 PM

#

im staying with sydney

jade egret Jun 6, 2025, 3:00 PM

#

jade egret

google gonna win

small haven Jun 6, 2025, 3:01 PM

#

misty vault im staying with sydney

sweeney

willow grail Jun 6, 2025, 3:01 PM

#

Crosses on tissue

rn_image_picker_lib_temp_7b4a2571-6222-4ac4-b7b4-464871558817.jpg

#

⁉️

small haven Jun 6, 2025, 3:01 PM

#

is that a tea variant

willow grail Jun 6, 2025, 3:02 PM

#

Multivitamin on bathtub

rn_image_picker_lib_temp_2173edad-1f3e-43b5-8000-9fc3db9d0ea7.jpg

#

Why

#

Amazon echo in bathroom

rn_image_picker_lib_temp_5240cf83-635e-44ee-afe3-465a922778af.jpg

#

Sticks to make fire with on bathtub

rn_image_picker_lib_temp_e9c6efe5-de4c-4399-a99f-3daa7c1cc64c.jpg

small haven Jun 6, 2025, 3:04 PM

#

this is rlly entertaining

willow grail Jun 6, 2025, 3:05 PM

#

Digital clock inside package of gloves

rn_image_picker_lib_temp_382372bf-227c-4788-a7d7-a53a39522a96.jpg

drifting thorn Jun 6, 2025, 3:15 PM

#

but rn Gemini and R1_0528 seem to be going in the wrong direction in conversations.

#

They seem to pander to the user a lot in open-ended questions; while the companies are pursuing "prompt following ability", the model loses its unique thoughts

misty vault Jun 6, 2025, 3:26 PM

#

drifting thorn They seemed to pander to user a lot in the open ended questions, while the compa...

sydney fine tune was only model immune to this

echo aurora Jun 6, 2025, 3:56 PM

#

1 hour until the Staff AMA! https://discord.gg/XkfsbYWX?event=1375223423009165435

misty vault Jun 6, 2025, 3:56 PM

#

kingfall release on lmarena during staff ama

patent bane Jun 6, 2025, 3:58 PM

#

unborn ocean There are 2022 users on a social network called Mathbook, and some of them are M...

i thought it didn't get it?

#

#

bro really put all his efforts into the question

unborn ocean Jun 6, 2025, 4:12 PM

#

patent bane i thought it didn't get it?

lucky try

#

prob triggered some part of the model to detect difficult math problems (prob an artifact of wanting efficient token usage but also rewarding model for USAMO stuff)

#

usually the models just assume it is an easy question

#

which is why they fail

patent bane Jun 6, 2025, 4:46 PM

#

even o3 calculated it wrong internally but finally got back on track using tools

#

the new gemini 2.5 pro is so random

sometimes it gets the questions horribly wrong consistently and sometimes gets it right consistently

patent bane Jun 6, 2025, 4:50 PM

#

patent bane the new gemini 2.5 pro is so **random** sometimes it gets the questions horribl...

anti riddle questions*

elder rapids Jun 6, 2025, 4:53 PM

#

patent bane the new gemini 2.5 pro is so **random** sometimes it gets the questions horribl...

this is a thinking variance issue

#

when it gets it wrong it's already decided not to think as long as it should

elder rapids Jun 6, 2025, 4:54 PM

#

patent bane even o3 calculated it wrong internally but finally got back on track using tools

for me though, 2.5 pro has never gotten this wrong

#

even 0506

patent bane Jun 6, 2025, 4:56 PM

#

elder rapids for me though, 2.5 pro has never gotten this wrong

i just figured out that if i put a space like this after 9.11, it would answer differently

#

9.9 - 9.11 =?

#

and

9.9-9.11=?

each wording would get a different answer
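For reference, a small sketch that computes the ground truth for both wordings so model replies can be checked against it. `expected` is a hypothetical helper that only handles this exact "a - b =?" shape.

```python
from decimal import Decimal

# The two wordings from the messages above; only the spacing differs.
prompts = ["9.9 - 9.11 =?", "9.9-9.11=?"]


def expected(prompt):
    """Parse 'a - b =?' (with or without spaces) and return a - b exactly."""
    left, right = prompt.replace(" ", "").rstrip("=?").split("-")
    return Decimal(left) - Decimal(right)


for prompt in prompts:
    print(repr(prompt), "->", expected(prompt))
```

Both wordings reduce to the same expression, so any difference in a model's answer comes from how it reads the text, not from the math.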

barren prairie Jun 6, 2025, 5:08 PM

#

patent bane

AI can t solve this? 🫣🙂

patent bane Jun 6, 2025, 5:13 PM

#

barren prairie AI can t solve this? 🫣🙂

been spending more than 24hrs+ trying one single prompt with different wordings and different system prompts

#

I'm stressed right now

unborn ocean Jun 6, 2025, 5:22 PM

#

#

whut

civic flame Jun 6, 2025, 5:22 PM

#

oh god

balmy mist Jun 6, 2025, 5:25 PM

#

bruhh

keen fulcrum Jun 6, 2025, 5:25 PM

#

unborn ocean

Thats incorrect

balmy mist Jun 6, 2025, 5:25 PM

#

thats when i stop using ai studio lol

keen fulcrum Jun 6, 2025, 5:25 PM

#

My queries in AI studio don't work at all
I get permission denied

civic flame Jun 6, 2025, 5:26 PM

#

mine are fine

balmy mist Jun 6, 2025, 5:26 PM

#

we might look back on this time and cant believe we had SOTA AI for free lol

keen fulcrum Jun 6, 2025, 5:26 PM

#

I probably got banned because Google thinks I am a bot, all I did was use the Glasp extension with yt
unfortunately both my google accounts for personal and work are broken

unborn ocean Jun 6, 2025, 5:32 PM

#

keen fulcrum Thats incorrect

yeah was unsure, which is why i posted

#

dont have x

#

just felt weird

wintry tinsel Jun 6, 2025, 5:46 PM

#

unborn ocean

Cool so I’ll just not use Gemini slop then, I only use it cuz it’s free

#

If I’m paying I may as well use Claude

barren prairie Jun 6, 2025, 5:47 PM

#

unborn ocean

Do you see now why I like DeepSeek more than Gemini 🙂? It's open source at least and free to use, no one will limit you one day 😁

patent bane Jun 6, 2025, 5:53 PM

#

unborn ocean

what does that mean?

elder rapids Jun 6, 2025, 6:09 PM

#

patent bane i just figured out that if i put a space like this after 9.11, it would answer d...

doesn't change from what I'm seeing

#

what is this dude talking about https://www.reddit.com/r/Bard/s/itH0j5eqfg


#

😭

misty vault Jun 6, 2025, 6:14 PM

#

Bro is getting ai news from mcdonalds

elder rapids Jun 6, 2025, 6:15 PM

#

ong

unborn ocean Jun 6, 2025, 6:24 PM

#

keen fulcrum Thats incorrect

well it is apparently correct ... :(

civic flame Jun 6, 2025, 6:28 PM

#

elder rapids what is this dude talking about https://www.reddit.com/r/Bard/s/itH0j5eqfg

this guy is taking elon levels of ket

wintry tinsel Jun 6, 2025, 6:33 PM

#

Despite the new Gemini getting a 62% on simple bench (great) in general conversation and writing ability it’s not near opus’s level unfortunately

#

Its general reasoning ability does seem to be a little better so it’s definitely a training data and style bias

jade egret Jun 6, 2025, 6:48 PM

#

i think

jade egret Jun 6, 2025, 6:48 PM

#

wintry tinsel Its general reasoning ability does seem to be a little better so it’s definitel...

yea

cedar tide Jun 6, 2025, 6:59 PM

#

New request https://discord.com/channels/1340554757349179412/1380620565278363718

elder rapids Jun 6, 2025, 6:59 PM

#

civic flame this guy is taking elon levels of ket

😭 the livebench has 0605 at worse instruction following

#

yep it's over

#

ban livebench from being discussed here

elder burrow Jun 6, 2025, 7:05 PM

#

patent bane and # 9.9-9.11=? each wording would get a different answer

which gets better results? with or without spaces?

zinc ore Jun 6, 2025, 7:07 PM

#

elder rapids ban livebench from being discussed here

The CEO hates Google, and has even changed the testing questions after Gemini scored too high

elder burrow Jun 6, 2025, 7:09 PM

#

zinc ore The CEO hates Google, and has even changed the testing questions after Gemini sc...

😭

elder rapids Jun 6, 2025, 7:10 PM

#

zinc ore The CEO hates Google, and has even changed the testing questions after Gemini sc...

82%

#

never forget that

#

😭😭

zinc ore Jun 6, 2025, 7:10 PM

#

325's original coding score right

#

Then they changed all the questions and it dropped 20 pts

elder rapids Jun 6, 2025, 7:11 PM

#

ye

zinc ore Jun 6, 2025, 7:11 PM

#

Then they changed them again so Sonnet would score higher

#

It's only 150 questions per category anyway

#

Very narrow question sets

elder rapids Jun 6, 2025, 7:12 PM

#

ye

#

theres no point in livebench imo

#

it's never reflected things in practice

#

I cant think of a single use case of sonnet 4 over 2.5 pro

#

or opus 4

#

how does 0605's instruction following become massively greater than 0506's in practice and then be so much lower than both 0506 and the other models in the benchmark

elder burrow Jun 6, 2025, 7:18 PM

#

wait

#

06-05 is below 05-06 on livebench? 😭

elder rapids Jun 6, 2025, 7:19 PM

#

ye

#

this is the greatest proof of livebench being incoherent

elder burrow Jun 6, 2025, 7:19 PM

#

LOL

#

elder rapids Jun 6, 2025, 7:20 PM

#

best coder on the leaderboard is 11th

#

😭😭

#

holy sht

misty vault Jun 6, 2025, 7:20 PM

#

lmfao

elder rapids Jun 6, 2025, 7:20 PM

#

I think even Craig would say this is blasphemous

#

@deep adder

misty vault Jun 6, 2025, 7:21 PM

#

i used 3.7 thinking over 2.5 pro 03 24 ngl

elder rapids Jun 6, 2025, 7:21 PM

#

deadass

ocean vortex Jun 6, 2025, 7:22 PM

#

keen fulcrum My queries in AI studio don't work at all I get permission denied

this is usually cache problem. Try ctrl-shift-r

elder burrow Jun 6, 2025, 7:22 PM

#

fr?

civic flame Jun 6, 2025, 7:22 PM

#

elder burrow

yeah their coding bench absolutely stinks

keen beacon Jun 6, 2025, 7:22 PM

#

makes crazy predictions as well

civic flame Jun 6, 2025, 7:22 PM

#

4o over Claude 3.7, Claude 3.5 and 2.5 Pro? give me a break ☠️

elder burrow Jun 6, 2025, 7:22 PM

#

civic flame yeah their coding bench absolutely stinks

06-05 below 3.5 sonnet

#

🫃
🦵

elder rapids Jun 6, 2025, 7:22 PM

#

civic flame 4o over Claude 3.7, Claude 3.5 and 2.5 Pro? give me a break ☠️

I didn't even peep that lmfao

ocean vortex Jun 6, 2025, 7:23 PM

#

unborn ocean

I can't believe he tweeted this like a thing to brag about lmfao

elder rapids Jun 6, 2025, 7:23 PM

#

I'm so used to just skimming the leaderboard

misty vault Jun 6, 2025, 7:23 PM

#

civic flame 4o over Claude 3.7, Claude 3.5 and 2.5 Pro? give me a break ☠️

no way

ocean vortex Jun 6, 2025, 7:23 PM

#

if they gonna do this I'm done with them and fully back with OpenAI

elder burrow Jun 6, 2025, 7:23 PM

#

unborn ocean

... 🥀

keen beacon Jun 6, 2025, 7:24 PM

#

their gemini pro sub was unlimited, they set it to 50 then 100 and tweeted about how they raised the limits

ocean vortex Jun 6, 2025, 7:24 PM

#

I never had much against OpenAI. I only partially went to Google because their models are more accessible

#

if that advantage is gone there's no reason for me to stay lol

elder rapids Jun 6, 2025, 7:24 PM

#

ocean vortex if they gonna do this I'm done with them and fully back with OpenAI

https://x.com/OfficialLoganK/status/1859784472453054903

Logan Kilpatrick (@OfficialLoganK)

@ksprashu AI Studio is always free

elder burrow Jun 6, 2025, 7:25 PM

#

link fr

keen beacon Jun 6, 2025, 7:25 PM

#

free to put ur api key in

#

and pay per token

elder burrow Jun 6, 2025, 7:25 PM

#

o what do you remember then

ocean vortex Jun 6, 2025, 7:25 PM

#

keen beacon free to put ur api key in

don't give them ideas

#

entry fee

elder rapids Jun 6, 2025, 7:25 PM

#

ocean vortex if that advantage is gone there's no reason for me to stay lol

ye, if Google ends up creating AGI it would be best if they started off with accessibility

elder burrow Jun 6, 2025, 7:26 PM

#

isnt that the same as new models

elder rapids Jun 6, 2025, 7:26 PM

#

but conceding that is wild

#

especially so early tbh

jade egret Jun 6, 2025, 7:26 PM

#

how good is it

elder burrow Jun 6, 2025, 7:26 PM

#

same lol

ocean vortex Jun 6, 2025, 7:26 PM

#

elder rapids ye, if Google ends up creating AGI it would be best if they started off with acc...

them being less popular they kinda must offer something more. If they don't and charge you the same then there's no reason for people to migrate from chatgpt

elder rapids Jun 6, 2025, 7:27 PM

#

ion think this matters at all tbh, AGI is an ambiguous standard and it's inevitable that these models eventually are going to minimum get to "close to AGI" status

#

and we go from there

ocean vortex Jun 6, 2025, 7:28 PM

#

well the ones that don't want to pay or can't use chatgpt (blocked etc) do migrate to Google. But if aistudio becomes paywalled that gonna change

elder burrow Jun 6, 2025, 7:28 PM

#

I use 06-05 for webgen and it loves to consistently cause:

SyntaxError: Cannot declare an imported binding name twice: 'somebindingnamehere'. undefined

#

does anyone else have this problem

ocean vortex Jun 6, 2025, 7:30 PM

#

No I meant like on school network or a work laptop - blocking OpenAI websites is a real and even popular thing believe it or not

elder rapids Jun 6, 2025, 7:30 PM

#

ocean vortex them being less popular they kinda must offer something more. If they don't and ...

ye but I'm p sure this is inevitably their position to BE accessible, they have the money, the compute, the researchers, Google will inevitably be at a net positive, they'll inevitably have the best models, I just don't see the reasoning to shift so much tbh

elder burrow Jun 6, 2025, 7:30 PM

#

hey yall uhh do you often get this error when using 06-05 for webgen?

elder burrow Jun 6, 2025, 7:30 PM

#

elder burrow hey yall uhh do you often get this error when using 06-05 for webgen?

SyntaxError: Importing binding name 'default' cannot be resolved by star export entries. undefined

elder rapids Jun 6, 2025, 7:30 PM

#

Google can't die out imo, they're too much of an engrained monopoly

#

theyve attached their name to everything

elder burrow Jun 6, 2025, 7:31 PM

#

elder rapids Google can't die out imo, they're too much of an engrained monopoly

i agree

jade egret Jun 6, 2025, 7:31 PM

#

nah openAI die out in the long run

elder burrow Jun 6, 2025, 7:31 PM

#

jade egret nah openAI die out in the long run

hope it does

jade egret Jun 6, 2025, 7:31 PM

#

elder burrow hope it does

real

ocean vortex Jun 6, 2025, 7:31 PM

#

elder rapids ye but I'm p sure this is inevitably their position to BE accessible, they have ...

but they just alienated people away from Gemini website with that $250 plan lol

keen beacon Jun 6, 2025, 7:31 PM

#

them adding limits to the paid plan and bragging about raising them whilst aistudio is free 💀

elder rapids Jun 6, 2025, 7:31 PM

#

this is a law thing, not business

elder burrow Jun 6, 2025, 7:32 PM

#

keen beacon them adding limits to the paid plan and bragging about raising them whilst aistu...

🥀 not peak

ocean vortex Jun 6, 2025, 7:32 PM

#

Like in what universe charging MORE than OpenAI makes sense here...

#

it really doesn't

elder rapids Jun 6, 2025, 7:32 PM

#

are you good big banks DID fail, that's why the laws the US has now prevents that

#

did we not learn history lmao

#

yeah because of laws

elder burrow Jun 6, 2025, 7:33 PM

#

yall why cant veo3 just have a very low res generation option for free

elder rapids Jun 6, 2025, 7:33 PM

#

the system IS the laws

#

that's how they're inevitably propped up

#

dawg you just agreed with me

#

😭

jade egret Jun 6, 2025, 7:34 PM

#

ever heard of the greta depression

elder burrow Jun 6, 2025, 7:34 PM

#

elder burrow yall why cant veo3 just have a very low res generation option for free

wouldn't lowering the res divide the processing costs?

elder burrow Jun 6, 2025, 7:34 PM

#

jade egret ever heard of the greta depression

greta

elder rapids Jun 6, 2025, 7:34 PM

#

ocean vortex but they just alienated people away from Gemini website with that $250 plan lol

exactly, they're still in the position to be accessible, so now they're playing into something that's unfavorable for them

jade egret Jun 6, 2025, 7:34 PM

#

elder burrow greta

you know what i mean

elder rapids Jun 6, 2025, 7:35 PM

#

which is messed up because it tells us that they just don't actually care that much

elder burrow Jun 6, 2025, 7:35 PM

#

jade egret you know what i mean

what a mean

elder rapids Jun 6, 2025, 7:35 PM

#

😭

jade egret Jun 6, 2025, 7:35 PM

#

elder burrow what a mean

sybau

elder burrow Jun 6, 2025, 7:35 PM

#

jade egret sybau

noob

jade egret Jun 6, 2025, 7:35 PM

#

💀

elder burrow Jun 6, 2025, 7:35 PM

#

annoying orange

keen beacon Jun 6, 2025, 7:36 PM

#

it has youtube premium

elder burrow Jun 6, 2025, 7:36 PM

#

lets content creators put "$250" in their video titles without it being clickbait

elder rapids Jun 6, 2025, 7:37 PM

#

just take away the 30TB tbh

keen beacon Jun 6, 2025, 7:37 PM

#

claude max is probably the best in terms of value

elder burrow Jun 6, 2025, 7:37 PM

#

keen beacon claude max is probably the best in terms of value

cody by sourcegraph

misty vault Jun 6, 2025, 7:37 PM

#

me

#

fr

elder burrow Jun 6, 2025, 7:38 PM

#

keen beacon claude max is probably the best in terms of value

#

yes

#

https://sourcegraph.com/pricing?product=cody

Sourcegraph | Pricing

Pricing information and plans for Sourcegraph products. Compare features across all plans and get answers to common pricing questions

#

wha

ocean vortex Jun 6, 2025, 7:40 PM

#

keen beacon claude max is probably the best in terms of value

does it really offer high caps? Wouldn't be surprised if that has equivalent caps to chatgpt Plus tbh

misty vault Jun 6, 2025, 7:41 PM

#

dont remove the cap message

elder burrow Jun 6, 2025, 7:41 PM

#

🧢

narrow elbow Jun 6, 2025, 7:41 PM

#

Waiting for the latest frontier models' System Prompt leak, want to have a taste 🤪

keen beacon Jun 6, 2025, 7:41 PM

#

ocean vortex does it *really* offer high caps? Wouldn't be surprised if that has equivalent c...

a guy here did $2k+ on claude code in a month or smthing

ocean vortex Jun 6, 2025, 7:41 PM

#

are you kidding me, the best value by far

elder burrow Jun 6, 2025, 7:41 PM

#

sora ig

#

which is ass

#

lol

ocean vortex Jun 6, 2025, 7:42 PM

#

gpt4.5, o3, o4-mini-high...

misty vault Jun 6, 2025, 7:42 PM

#

does chatgpt pro give unlimited gpt 4.5

elder burrow Jun 6, 2025, 7:42 PM

#

ocean vortex gpt4.5, o3, o4-mini-high...

GPT 4.5 😭

#

LO

misty vault Jun 6, 2025, 7:42 PM

#

special token

elder burrow Jun 6, 2025, 7:42 PM

#

no it aint 😭

#

at coding

#

i have used it

#

🥀

misty vault Jun 6, 2025, 7:42 PM

#

sydney fine tune on gpt 4.5 would literally be agi

#

gpt 4 fine tune already sounds like agi

elder burrow Jun 6, 2025, 7:42 PM

#

misty vault sydney fine tune on gpt 4.5 would literally be agi

lol

#

btw

#

guys

#

fun fact

#

gpt 5 was supposed to release june 1st maximum

ocean vortex Jun 6, 2025, 7:43 PM

#

100 per week or smth like that. And you have 4.1 unlimited, and also a completely separate cap for o4-mini-high and then a different cap for o4-mini-medium

#

like I said, this is clearly the best value tbh

#

there's no "o4"

keen beacon Jun 6, 2025, 7:44 PM

#

ocean vortex like I said, this is clearly the best value tbh

nah u can get way more out of claude max/claude code

ocean vortex Jun 6, 2025, 7:44 PM

#

it's a distill from some version of o3

keen beacon Jun 6, 2025, 7:45 PM

#

in terms of amount of tokens you can do/based on api pricing

ocean vortex Jun 6, 2025, 7:45 PM

#

because o3 is already using gpt4.1 base

keen beacon Jun 6, 2025, 7:45 PM

#

because its using 4.1 mini as a base

#

.

ocean vortex Jun 6, 2025, 7:45 PM

#

yeah it's a tiny model

#

relatively speaking

keen beacon Jun 6, 2025, 7:46 PM

#

4.1 mini is a fresh pretrain, interesting they opted to midtrain 4o instead of doing a fresh one

ocean vortex Jun 6, 2025, 7:46 PM

#

2.5 Flash isn't either. But it's still compromised

keen beacon Jun 6, 2025, 7:46 PM

#

it is probably

elder burrow Jun 6, 2025, 7:46 PM

#

alright ill try it

#

are there benchmark scores

#

i thought it's just too expensive to benchmark it

#

isnt it one of the most expensive ones

ocean vortex Jun 6, 2025, 7:48 PM

#

yeah it is, and it's probably still unbeaten on SimpleQA

keen beacon Jun 6, 2025, 7:48 PM

#

2.5 pro has the second highest score

elder burrow Jun 6, 2025, 7:48 PM

#

keen beacon 2.5 pro has the second highest score

second?

#

isnt there 3.5

ocean vortex Jun 6, 2025, 7:49 PM

#

speaking of which... I think they are to release gpt5 around the shutdown date of gpt4.5

elder burrow Jun 6, 2025, 7:49 PM

#

ive heard

keen beacon Jun 6, 2025, 7:49 PM

#

ocean vortex speaking of which... I think they are to release gpt5 around the shutdown date o...

yea

#

pretty common guess

#

if it did it probably memorized the answer lmfao

elder burrow Jun 6, 2025, 7:49 PM

#

grok is good, the incognito feature is unique

#

i'll put that last part at the end of my 4.5 prompt

ocean vortex Jun 6, 2025, 7:50 PM

#

Google's reasoning is still not the best... On prompts it can only solve by outputting long reasoning, 2.5 pro tends to fail miserably

elder burrow Jun 6, 2025, 7:51 PM

#

will test a coding prompt rn, i'll send results

keen beacon Jun 6, 2025, 7:51 PM

#

grok has the worst

#

they literally use qwq preview

elder burrow Jun 6, 2025, 7:51 PM

#

keen beacon they literally use qwq preview

fr?

ocean vortex Jun 6, 2025, 7:51 PM

#

They are kinda using reasoning more like an additive thing to improve what it is already good at

keen beacon Jun 6, 2025, 7:51 PM

#

for cold start, at least, they used qwq preview traces imho

#

im not gonna get into it again 🤣

ocean vortex Jun 6, 2025, 7:52 PM

#

Unlike OpenAI who seem to be pushing the limits with what is possible using RL training and ReAct

keen beacon Jun 6, 2025, 7:53 PM

#

dont forget u need to tip it and threaten it all at once

elder burrow Jun 6, 2025, 7:53 PM

#

yall what if lmarena had benchmarks for different top_p _k and temperature levels

#

I'd like to see how those affect results
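A minimal sketch of what those three knobs do to a next-token distribution, using made-up logits for a three-token vocabulary. The names `filtered_distribution`, `sample`, and `TOY_LOGITS`, and the order of operations (temperature, then top-k, then top-p) are assumptions for illustration; real inference stacks may apply them differently.

```python
import math
import random

# Hypothetical logits for a 3-token vocabulary (illustration only).
TOY_LOGITS = {"a": 2.0, "b": 1.0, "c": 0.0}

def filtered_distribution(logits, temperature=1.0, top_k=None, top_p=None):
    """Apply temperature, then top-k, then top-p, and renormalize."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 in this sketch")
    # Temperature: divide logits before softmax; lower values sharpen the peak.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(((tok, e / total) for tok, e in exps.items()),
                    key=lambda pair: pair[1], reverse=True)
    # Top-k: keep only the k most likely tokens.
    if top_k is not None:
        ranked = ranked[:top_k]
    # Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    if top_p is not None:
        kept, cumulative = [], 0.0
        for tok, prob in ranked:
            kept.append((tok, prob))
            cumulative += prob
            if cumulative >= top_p:
                break
        ranked = kept
    mass = sum(prob for _, prob in ranked)
    return {tok: prob / mass for tok, prob in ranked}

def sample(distribution, rng=random):
    """Draw one token from the filtered distribution."""
    tokens = list(distribution)
    weights = [distribution[tok] for tok in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

if __name__ == "__main__":
    for temp in (0.5, 1.0, 2.0):
        print(temp, filtered_distribution(TOY_LOGITS, temperature=temp))
    print(filtered_distribution(TOY_LOGITS, top_p=0.9))
```

Running the same benchmark prompts across a grid of these settings is roughly what a per-setting leaderboard would need, at the cost of many more samples per model.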

keen beacon Jun 6, 2025, 7:54 PM

#

nah im joking

ocean vortex Jun 6, 2025, 7:54 PM

#

keen beacon dont forget u need to tip it and threaten it all at once

||https://grok.com/share/c2hhcmQtMg%3D%3D_717d28e8-fe6a-4619-90da-1a713e08d5a8 ||

#

this thing is still unhinged lol

elder burrow Jun 6, 2025, 7:55 PM

#

oh btw

#

about gemini

#

uh

#

the most underrated feature is watching youtube videos

#

its really good

elder burrow Jun 6, 2025, 7:56 PM

#

elder burrow the most underrated feature is watching youtube videos

I uploaded a 6 min video, it was 100k tokens
1hr max???
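A quick back-of-the-envelope check on that guess, assuming token cost scales linearly with video length (an assumption, not something confirmed here) and using only the 6 min / 100k tokens figure from the message above. The helper name `estimate_video_tokens` is made up for illustration.

```python
# Observed data point from the chat: a 6-minute video used about 100k tokens.
OBSERVED_TOKENS = 100_000
OBSERVED_MINUTES = 6

def estimate_video_tokens(minutes):
    """Linear extrapolation from the single observed data point."""
    return round(OBSERVED_TOKENS * minutes / OBSERVED_MINUTES)

if __name__ == "__main__":
    print(estimate_video_tokens(1))   # tokens per minute, roughly
    print(estimate_video_tokens(60))  # tokens for a one-hour video
```

Under that linear assumption an hour of video lands at about one million tokens, so a roughly 1M-token context window would indeed cap out near the one-hour mark.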

keen beacon Jun 6, 2025, 7:56 PM

#

yea thats a cool feature

#

did u ask le chat?

elder burrow Jun 6, 2025, 7:57 PM

#

craig

#

i know 1 site where i can use 4.5 for free

#

but do you know any

#

no

#

fuh free

keen beacon Jun 6, 2025, 7:58 PM

#

lmarena 😂

elder burrow Jun 6, 2025, 7:58 PM

#

keen beacon lmarena 😂

context window is ass

#

oo alr

#

yeah

keen beacon Jun 6, 2025, 7:58 PM

#

ask claude code to solve it

#

can claude code search the web and such btw?

elder burrow Jun 6, 2025, 7:59 PM

#

u could js give a temp api key for like 20 mins

#

real

keen beacon Jun 6, 2025, 8:00 PM

#

ask le chat march version

elder burrow Jun 6, 2025, 8:00 PM

#

ohhh ok

#

ill use the free site then

#

buggy and no chat history

#

but it has like 20 models for free

keen beacon Jun 6, 2025, 8:01 PM

#

it saw it on lechat tho

elder burrow Jun 6, 2025, 8:01 PM

#

like opus, o1 and 4.5

keen beacon Jun 6, 2025, 8:03 PM

#

maybe gemma 3n 4b could get it

#

or 2.5 pro text to speech

#

im actually curious what happens if u give it something like that hmm

ocean vortex Jun 6, 2025, 8:06 PM

#

Did HF crack down on spaces using sus endpoints?

#

Can't seem to find any OpenAI model for free

#

there used to be dozens

keen beacon Jun 6, 2025, 8:07 PM

#

supposed to be the same

ocean vortex Jun 6, 2025, 8:07 PM

#

now they are just asking for your OpenAI key within the space lmao

keen beacon Jun 6, 2025, 8:07 PM

#

if u mean march preview

#

people say its different though

#

it is

ocean vortex Jun 6, 2025, 8:15 PM

#

sweet tinsel By the way, Gemini Deep Research should be updated with the new version, right? ...

forgot to check on it yesterday, here's the report:

📎 German_Expulsions_After_World_War_II_.pdf

sweet tinsel Jun 6, 2025, 8:16 PM

#

ocean vortex forgot to check on it yesterday, here's the report:

Do you have the share link? Would work better for the public list.

ocean vortex Jun 6, 2025, 8:20 PM

#

sweet tinsel Do you have the share link? Would work better for the public list.

https://docs.google.com/document/d/1_rJsYe91UWTln2aajsyOPYL8CQ4kXf2DTo_Ygw9xwBU/edit?usp=sharing

Google Docs

German Expulsions After World War II

The Mass Expulsion of Ethnic Germans after World War II: A Comprehensive Analysis I. Introduction The period between 1944 and 1950 witnessed one of the most significant and devastating forced population transfers in modern history: the mass expulsion of an estimated 12 to 14 million ethnic German...

sweet tinsel Jun 6, 2025, 8:21 PM

#

ocean vortex https://docs.google.com/document/d/1_rJsYe91UWTln2aajsyOPYL8CQ4kXf2DTo_Ygw9xwBU/...

Sorry to bother, but I mean the Gemini share link, because I want to have Both versions for testing in the doc and the one with the older version is in that format already

unborn ocean Jun 6, 2025, 8:22 PM

#

the question is from usamo 2022, a big model like 4.5 likely just memorized it.. no need for fancy prompts

elder rapids Jun 6, 2025, 8:22 PM

#

oh ye I forgot to say

#

@civic flame when you tried that usamo thing

#

kingfall did get it

#

ye

#

but it did consecutively get it right usually

keen beacon Jun 6, 2025, 8:23 PM

#

did u test on 0 temp

ocean vortex Jun 6, 2025, 8:23 PM

#

sweet tinsel Sorry to bother, but I mean the Gemini share link, because I want to have Both v...

Ahh

#

https://g.co/gemini/share/861e11348cec

Gemini

‎Gemini - Research Plan: German Expulsions

Created with Gemini Advanced

keen beacon Jun 6, 2025, 8:23 PM

#

but it used tools right?

elder rapids Jun 6, 2025, 8:23 PM

#

keen beacon did u test on 0 temp

nah

#

no tools either

keen beacon Jun 6, 2025, 8:24 PM

#

i was talking about o3

elder rapids Jun 6, 2025, 8:24 PM

#

oh

sweet tinsel Jun 6, 2025, 8:24 PM

#

ocean vortex https://g.co/gemini/share/861e11348cec

Thanks! Integrating it soon, looks promising from my first read.

small haven Jun 6, 2025, 8:49 PM

#

one shot or multi shot

#

on lechat ultra?

#

i didnt get one shot

elder burrow Jun 6, 2025, 9:24 PM

#

giv

#

also

#

yall uhh

#

is there a model thats releasing soon

brittle tiger Jun 6, 2025, 9:38 PM

#

https://x.com/legit_api/status/1931103082705776813?t=ekTOPLDv9DOle5lytAvB0Q&s=19

ʟᴇɢɪᴛ (@legit_api)

Gemini 2.5 Pro 1st on SimpleBench 👑

previous checkpoint scored 51.6%

thats a big improvement over May!

small haven Jun 6, 2025, 9:39 PM

#

kingfall 70/80%

#

110%

#

also got the bonus q's

#

@brian

#

wow

#

cant they release it alrdy 😭

#

both

#

deepthink with kingfall as base would go bonkers

misty vault Jun 6, 2025, 9:46 PM

#

small haven Jun 6, 2025, 9:50 PM

#

put that on x

#

ur gonna blow up and match strawberry man

elder rapids Jun 6, 2025, 9:59 PM

#

small haven kingfall 70/80%

pretty confident this would be the case imo

#

even though 0605 consistently got more right, and easier

#

I have a feeling the harder ones would be dealt with, with the same difficulty

#

or in other words, wouldn't affect how kingfall interacts with them

elder rapids Jun 6, 2025, 10:01 PM

#

elder rapids or in other words, wouldn't affect how kingfall interacts with them

or in other other words, kingfall basically agi

#

be quiet

small haven Jun 6, 2025, 10:06 PM

#

ur welcome

#

@deep adder u can just rely on fyp 😭

#

cant

exotic veldt Jun 6, 2025, 10:07 PM

#

Ai?

small haven Jun 6, 2025, 10:07 PM

#

theres loads of ai groups out there

#

just need one retweet from a big account thats the game

#

i feel like ur svg would go more viral tho

elder rapids Jun 6, 2025, 10:09 PM

#

ts not hitting

small haven Jun 6, 2025, 10:09 PM

#

svg's?

#general

9.9 - 9.11 =?

9.9-9.11=?
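For reference, a small worked check of this prompt. It is a sketch that contrasts exact decimal arithmetic with the "version number" reading (9.11 coming after 9.9) that seems to be a common way to get this question wrong; the variable names are just for illustration.

```python
from decimal import Decimal

# Exact decimal arithmetic: 9.90 - 9.11
exact = Decimal("9.9") - Decimal("9.11")

# Binary floating point gives the same value up to rounding error.
approx = 9.9 - 9.11

# The version-number reading compares (major, minor) pairs instead,
# which is where the "9.11 is bigger" intuition comes from.
version_order = (9, 9) < (9, 11)

if __name__ == "__main__":
    print(exact)
    print(approx)
    print(version_order)
```

As decimals, 9.9 is the larger number, so the difference is positive.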