#general | Arena | Page 49

winter geyser May 30, 2025, 11:16 AM

#

Why doesn’t the DeepSeek R1 0528 model on Web LMArena output any code? Is it a bug, or is code generation disabled for this model?

ocean vortex May 30, 2025, 12:25 PM

#

I feel like people should stop pushing the narrative that new R1 is equivalent to o3...

HLE: 20.6 vs 17.7
SimpleQA: 49.4 vs 27.8
SWE Verified: 69.1 vs 57.6
(o3 vs R1)

#

it's a great model but it is not quite on the level of o3 or 2.5Pro

sturdy mica May 30, 2025, 12:26 PM

#

https://cdn.discordapp.com/attachments/1170641241193586769/1373958744278306867/IMG_0213.gif

#

hi chat

brittle hull May 30, 2025, 12:26 PM

#

Yo, has anyone seen a model codenamed Stephen on lmarena today? I came across it twice

patent aspen May 30, 2025, 12:28 PM

#

brittle hull Yo, has anyone seen a model codenamed Stephen on lmarena today? I came across it...

deepseek model

#

probably r1

hardy pecan May 30, 2025, 12:29 PM

#

brittle hull Yo, has anyone seen a model codenamed Stephen on lmarena today? I came across it...

Is it good?

brittle hull May 30, 2025, 12:29 PM

#

Maybe pre-Grok-3.5, 'cause Musk dropped early Grok3 a week before last time

brittle hull May 30, 2025, 12:29 PM

#

hardy pecan Is it good?

Over good

patent aspen May 30, 2025, 12:29 PM

#

brittle hull Maybe pre-Grok-3.5, 'cause Musk dropped early Grok3 a week before last time

No it's Chinese

#

I think it even says deepseek in the code

brittle hull May 30, 2025, 12:31 PM

#

Unlikely R2, since testing R2 right after releasing Updated R1.1 is kinda meh

alpine coral May 30, 2025, 12:46 PM

#

oh..

sacred plaza May 30, 2025, 12:54 PM

#

Are there any grok 3 glazzers on this chat. If so, what is your best steelman argument for why I should use anything from xAI given the fact the Twitter guy brainwashes grok's system prompt on a monthly basis.

tall summit May 30, 2025, 12:55 PM

#

it's free

#

with limited uses

fringe carbon May 30, 2025, 12:55 PM

#

sacred plaza Are there any grok 3 glazzers on this chat. If so, what is your best steelman ar...

i use deepseek. it is very good at password management. you can give it all your passwords and not have to remember them again

tall summit May 30, 2025, 12:55 PM

#

LMAO

fringe carbon May 30, 2025, 12:55 PM

#

very good context window

#

ok real talk though

#

is it safe to upload stuff like api keys and whatnot to openai

#

probably right?

#

i'm not afraid openai gonna use them

#

i'm afraid that the model spits out the key to someone else

vernal meadow May 30, 2025, 12:58 PM

#

@sacred plaza doesn't sound like you are asking for any usecases. Sounds like you are already set.

#

@fringe carbon You should have a paylimit on all API keys and from time to time use new ones. These days the API providers make it quite easy.

I wouldn't post my passwords in there tho. Not sure in which context that would make sense : P

fringe carbon May 30, 2025, 1:08 PM

#

vernal meadow @fringe carbon You should have a paylimit on all API keys and from time t...

well i write super spaghetti code

#

and hard code my api keys at the top of files

#

so in that context
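
One common alternative to hard-coding keys at the top of a file is to read them from the environment and to mask them before pasting code into a chat. A minimal sketch in Python, not something anyone in the chat posted: the variable name `EXAMPLE_API_KEY` and the helpers `load_api_key` and `redact` are hypothetical names chosen for illustration.

```python
import os


def load_api_key(var_name: str = "EXAMPLE_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    The variable name is a made-up placeholder; use whatever name
    your provider or project expects.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key


def redact(key: str, visible: int = 4) -> str:
    """Mask all but the last few characters of a key before sharing it."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

With this shape the source file never contains the key itself, so uploading the file to a chatbot does not expose it, and `redact` covers the case where a key has to appear in a log or a pasted snippet.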

ocean vortex May 30, 2025, 1:10 PM

#

fringe carbon i'm afraid that the model spits out the key to someone else

chances of that happening even if they fail sanitizing data and train the model on your exact chat are actually incredibly slim. It's very unlikely to output your exact key without changing a single char

#

bluntly speaking it will get lost in the sea of data. And since they are not gonna overfit the model on your chats there's no chance. Not to mention that them not sanitizing your data is not what you would expect either...

#

or in other words still... a key is not a thing that is easy to "remember" unless it is being shown repeatedly/overfit. It knows generally how it looks but not this specific key exactly as it is

unborn ocean May 30, 2025, 1:23 PM

#

@fringe carbon you can just opt out of data training (in the settings for chatgpt / gemini) and they are legally not allowed to train on the stuff

#

that is how i do it

#

and with aistudio I am just extra careful

sacred plaza May 30, 2025, 1:33 PM

#

vernal meadow @sacred plaza doesn't sound like you are asking for any usecases. Sounds...

no, i want to hear the best argument why people are using it. not going to change my mind on using it since i am not on twitter. willing to change my mind of its usefulness in terms of capability though. access to real time twitter data does seem like a moat

dusky aurora May 30, 2025, 1:34 PM

#

sacred plaza no, i want to hear the best argument why people are using it. not going to chang...

real time twitter data is an oxymoron

sacred plaza May 30, 2025, 1:37 PM

#

close to real time twitter data****

echo aurora May 30, 2025, 1:41 PM

#

reminder we're going to be watching this in a bit for anyone that wants to join! https://discord.gg/Vk7QXKXf?event=1377683812024189068

sacred plaza May 30, 2025, 1:42 PM

#

echo aurora reminder we're going to be watching this in a bit for anyone that wants to join!...

thanks for sharing! will tune in as well

misty vault May 30, 2025, 1:45 PM

#

sacred plaza thanks for sharing! will tune in as well

https://tenor.com/view/cat-look-cat-look-at-camera-silly-cat-in-a-cage-gif-889392959852579879

dusky aurora May 30, 2025, 2:14 PM

#

sacred plaza close to real time twitter data****

like I said, twitter is a thing of the past. real time x data is possible, but twitter not

bright lion May 30, 2025, 2:39 PM

#

Oh Boy, can‘t wait to pay over 200$ a month for restricted o3 pro access now

#

Where did you hear this from btw.

misty vault May 30, 2025, 2:53 PM

#

dusky aurora like I said, twitter is a thing of the past. real time x data is possible, but t...

twitter

echo aurora May 30, 2025, 3:31 PM

#

Starting the a16z podcast now!

#

come join #1340554757827461215

misty vault May 30, 2025, 3:32 PM

#

echo aurora come join <#1340554757827461215>

what is ur opinion on gpt-4-0314-32k

wintry tinsel May 30, 2025, 3:38 PM

#

Thoughts on the new deep seek for festive writing?

#

Is V3 or R1 V2 better?

misty vault May 30, 2025, 3:39 PM

#

#

#

elder rapids May 30, 2025, 3:46 PM

#

what is this server

tall summit May 30, 2025, 3:49 PM

#

how do you not know

keen beacon May 30, 2025, 3:51 PM

#

so goldmane is gonna be ga 2.5 pro it would seem

elder rapids May 30, 2025, 3:51 PM

#

ye

#

thank God

#

y'all kept saying redsword was better

#

I was like youre tripping

keen beacon May 30, 2025, 3:52 PM

#

i just need it in aistudio + raw thoughts 🤩

elder rapids May 30, 2025, 3:52 PM

#

ong

#

man

#

give me that model

#

😭

#

when do you guys think it'll be released

#

GA taking too long bruh

keen beacon May 30, 2025, 3:52 PM

#

next month

elder rapids May 30, 2025, 3:53 PM

#

keen beacon next month

no shi

teal mantle May 30, 2025, 3:53 PM

#

anyone guess what model made this

keen beacon May 30, 2025, 3:53 PM

#

🤣

elder rapids May 30, 2025, 3:53 PM

#

I mean like when next month

teal mantle May 30, 2025, 3:53 PM

#

it is so far one of the worst ones

keen beacon May 30, 2025, 3:53 PM

#

probably soon enough since they removed redsword

elder rapids May 30, 2025, 3:53 PM

#

teal mantle anyone guess what model made this

random model

#

send in dms

#

the invite

kind cloud May 30, 2025, 3:54 PM

#

ok

teal mantle May 30, 2025, 3:54 PM

#

elder rapids random model

nah it is o3

#

it sucks at this (once)

elder rapids May 30, 2025, 3:55 PM

#

0% formatting

teal mantle May 30, 2025, 3:55 PM

#

it still sucks and don't know what is a Touhou spellcard

#

elder rapids May 30, 2025, 3:55 PM

#

goldmane codes beautifully

#

someone gotta talk about that

#

did you guys see redsword and goldmane code? 😭

#

ts is a work of art

#

the gradient indents are so beautiful

kind cloud May 30, 2025, 4:06 PM

#

I think we can distinguish between Gemini-flash and Gemini-pro because Pro accurately remembers chapter titles of One Piece, but Flash doesn't. As a result of this knowledge test, goldmane is identified as Gemini-pro.

elder rapids May 30, 2025, 4:08 PM

#

kind cloud I think we can distinguish between Gemini-flash and Gemini-pro because Pro accur...

we do know for a fact it's Gemini 2.5 pro tho there's no need to test it

#

just waiting for Logan to say "Gemini"

keen beacon May 30, 2025, 4:09 PM

#

yeah

elder rapids May 30, 2025, 4:09 PM

#

flash has sauce

keen beacon May 30, 2025, 4:09 PM

#

but it does know less though

#

something happened between flash and flash lite

#

i noticed

elder rapids May 30, 2025, 4:10 PM

#

wonder how they're going to scale up the diffusion model

#

what

#

that's an unnecessary distinction here

#

more loads, larger model, maintaining efficiency

#

yeah well I don't

#

keen beacon May 30, 2025, 4:16 PM

#

im curious about this. if u can answer this, do you see it ever replacing gemini/being the main thing? (diffusion)

elder rapids May 30, 2025, 4:17 PM

#

keen beacon im curious about this. if u can answer this, do you see it ever replacing gemini...

Google prolly doesn't see it as replacing, just a new strong route

#

but I don't believe they dont know what theyre going to do with it

keen beacon May 30, 2025, 4:17 PM

#

yeah, but id like to know their opinion on it

elder rapids May 30, 2025, 4:19 PM

#

keen beacon yeah, but id like to know their opinion on it

I would think theyre planning to integrate it with search

#

or large text updates

#

and not necessarily discrete generation

kind cloud May 30, 2025, 4:24 PM

#

kind cloud I think we can distinguish between Gemini-flash and Gemini-pro because Pro accur...

This means that if it answers 'I am a large language model, trained by Google.' and correctly answers the chapter title, it is 'goldmane' in almost all cases. This is solely how to continue longer chats with goldmane on new lmarena, as far as I know.

teal mantle May 30, 2025, 4:28 PM

#

I forgot but is o3 parameter identical to non-reasoners like 4o?

keen beacon May 30, 2025, 4:28 PM

#

yes

misty vault May 30, 2025, 4:33 PM

#

brian is google ceo???

#

brian is part of that team???

#

me too

golden ocean May 30, 2025, 4:36 PM

#

same

misty vault May 30, 2025, 4:37 PM

#

sydney chatbot

#

Im not interested in building ai if i cant build a gpt-4-0314 at home

#

I will need datacenter

elder rapids May 30, 2025, 4:53 PM

#

kind cloud This means that if it answers 'I am a large language model, trained by Google.' ...

give a single example

#

I want to use it

#

like, the specific chapter

kind cloud May 30, 2025, 4:57 PM

#

elder rapids like, the specific chapter

chapter1117 'mo'

elder rapids May 30, 2025, 4:59 PM

#

@balmy mist flowith corrupted some of my stuff

hollow ocean May 30, 2025, 5:00 PM

#

Today confirmed

#

✅

#

Maybe

tall summit May 30, 2025, 5:08 PM

#

based

keen beacon May 30, 2025, 5:16 PM

#

adhd is a scam to get kids on meth

#

the medicine they give kids with adhd

#

is brain rotting

#

my brother went schizophrenic off it

#

im telling you rn dont take adderal

#

the difference is adderal is addictive

#

wut lol

#

you really shouldnt rely on adderal to function

#

you should look for other remedies

#

teas and stuff

#

ai is getting really good at study guides as well

#

look up NotebookLM

#

alot of people take adderal for school

#

i mean around me atleast

#

there are alot of kids who only take the medicine to focus in school

#

thats a symptom of a larger problem tbh

#

trash courses

#

our schooling is just trash

#

wym

keen fulcrum May 30, 2025, 5:20 PM

#

Have you seen Perplexity Labs?

https://www.rxddit.com/r/perplexity_ai/comments/1kypixi/introducing_perplexity_labs/

rxddit.com

Introducing Perplexity Labs.

u/perplexity_ai on r/perplexity_ai

Today we're launching Perplexity Labs.

Labs is for your more complex tasks. It's like having an entire team at your disposal.

Build anything from analytical reports and presentations to dynamic dashboards. Now available for all Pro users.

While Deep Research remains the fastest way to get comprehensive answers to in-depth questions, Labs ...

keen beacon May 30, 2025, 5:21 PM

#

oh yea for sure

#

but as a first generation US citizen imma tell u one thing

#

relying on anything to function is bad

#

that was my point

#

not that it's not a real disorder

#

its just under researched

#

i was watching a video from Neil deGrasse Tyson, and yk how the common saying is "we only use 20% of our brain" etc

#

thats not the truth

#

we only know what 20% of it does

#

ain a flex bro 😭

#

liver gon be fried by the time u older

#

dont hate the messanger shrug

late path May 30, 2025, 5:24 PM

#

A person with mild ADHD would also live a better life if they took amphetamines, I guess?

keen beacon May 30, 2025, 5:24 PM

#

late path A person with *mild* ADHD would also live a better life if they took amphetamine...

no

#

they're trying to use AI to create medicines genetically tailored to ur dna

#

so instead of getting pills with side effects you'd have drugs tailored for you

#

https://www.frontiersin.org/journals/bioengineering-and-biotechnology/articles/10.3389/fbioe.2023.1335901/full

Frontiers | Advancing genome editing with artificial intelligence: ...

Clustered regularly interspaced short palindromic repeat (CRISPR)-based genome editing (GED) technologies have unlocked exciting possibilities for understand...

#

https://www.pfizer.com/news/articles/artificial_intelligence_on_a_mission_to_make_clinical_drug_development_faster_and_smarter

Artificial Intelligence: On a mission to Make Clinical Drug Develop...

Just as Industrial Revolution-era factory builders developed machines to mass-manufacture drugs once ground by hand, today’s pharmaceutical companies are turning to artificial intelligence (AI) to both speed and smarten the work of clinical development. AI could assist pharma companies in getting medicines to market faster. AI today not only d...

#

welcome to the future lil bro

#

for sure

#

US government is tryna use it to end HIV

#

etc

#

wow it got auto modded

#

for sure

misty vault May 30, 2025, 5:28 PM

#

crispr

#

who is deleting their messages

keen beacon May 30, 2025, 5:29 PM

#

idk doubt it

#

it can either go 2 ways

#

ai helps us or ai destroys us

misty vault May 30, 2025, 5:30 PM

#

yeah we got gpt-4o

keen beacon May 30, 2025, 5:30 PM

#

https://www.youtube.com/shorts/5Yh9U5tAuO0

YouTube

Universe Lair

Neil deGrasse Tyson's Predictions For The Year 2050 😲

Joe Rogan Experience #1904

#

thats just the ai we have now

#

companies 100% have gatekept ais

#

like google has a self learning huge new ai

elder rapids May 30, 2025, 5:33 PM

#

goldmane is so intelligent this is crazy

#

😭😭

#

nebula moment

#

istg bro

misty vault May 30, 2025, 5:33 PM

#

elder rapids May 30, 2025, 5:33 PM

#

I'm being so deadass

keen beacon May 30, 2025, 5:33 PM

#

lol

elder rapids May 30, 2025, 5:33 PM

#

thinks quickly

keen beacon May 30, 2025, 5:34 PM

#

i doubt its deepthink

elder rapids May 30, 2025, 5:34 PM

#

ye

keen beacon May 30, 2025, 5:34 PM

#

we know what it is already like we keep arguing about this lol

elder rapids May 30, 2025, 5:34 PM

#

ong

#

seriously though

#

wild,

#

goldmane is actually so good

#

it's so smart

#

nah like actually

#

I know it's some shi to compare the models

#

I love playing with o3

#

but goldmane is GOOD

keen beacon May 30, 2025, 5:36 PM

#

did u see this btw? 😭 https://discuss.ai.google.dev/t/massive-regression-detailed-gemini-thinking-process-vanished-from-ai-studio/83916/84

Google AI Developers Forum

Massive Regression: Detailed Gemini Thinking Process vanished from ...

Hi everyone Thank you for your notes I am a PM on the Gemini API. Alongside Logan Kilpatrick and Vishal Dharmadhikari, we have a lot of Googlers who really care about listening to you, responding to your feedback and taking your suggestions on board. We acknowledge that sometimes we have taken time to respond which can come across as radio sile...

#

rip raw thoughts

elder rapids May 30, 2025, 5:41 PM

#

keen beacon rip raw thoughts

nah it's still up in the air

keen beacon May 30, 2025, 5:41 PM

#

> On summaries, we have heard a lot of valid feedback. We understand this is a different experience from the raw thoughts previously available in AI Studio. Sometimes product teams have to weigh a lot of pros and cons to come to a specific decision. This is one of those times. Please work with us and help us in getting summaries to a point where they have just the right amount of detail that you need. You are our valued and needed collaborators in this. In the meantime, we will keep listening to your feedback here or DM @shresbm or @vish_owl or @OfficialLoganK on X.

#

based on that it basically confirms its a competitive decision i guess

#

it was implied before but that basically confirms it

elder rapids May 30, 2025, 5:44 PM

#

keen beacon > On summaries, we have heard a lot of valid feedback. We understand this is a d...

ye it's still up in the air

keen beacon May 30, 2025, 5:44 PM

#

elder rapids ye it's still up in the air

thats a generous interpretation of it imo

#

i hope so

elder rapids May 30, 2025, 5:58 PM

#

it's gonna be hard to get a model that can select and identify those wording schemes and what the model is anticipating, the route it intends and it says it's going towards, the key words it's relying on, the aha moments to take advantage of

#

it's the little things and that's going to be so hard for them

#

but that's accepting the premise of distillation, I disagree this is possible in AI studio, and summaries should only exist in api

keen beacon May 30, 2025, 5:59 PM

#

yeah summarization inherently dilutes the signal

#

tbh it wont prevent sophisticated actors, openai seemingly has a lot of protections about it and even then its not flawless. google has even less

#

it makes the user experience more annoying generally

elder rapids May 30, 2025, 6:01 PM

#

ye, there's really a lot of reasons summary is just a fleeting decision

#

rather than actually substantiated

keen beacon May 30, 2025, 6:02 PM

#

you can still trivially leak the cot i guess, so ill be doing that if i need to understand exactly what the model is doing at times

#

its just that its potentially unneeded degradation

narrow elbow May 30, 2025, 6:13 PM

#

This can be seen as a cheating behavior. The so-called thought summary and the answer itself may have no direct correlation (the model is not open source, and the correlation between the thought chain and the answer cannot be verified). Even if two models are used, one for thinking (and generating summaries) and one for answering, users cannot detect it. This makes the model lose reputation and trust (even if it is strong). Using security and preventing distillation as an excuse does not seem like something a super large enterprise would do. It is stingy.😏

dusky aurora May 30, 2025, 6:22 PM

#

elder rapids goldmane is actually so good

what is goldmane?

elder rapids May 30, 2025, 6:26 PM

#

dusky aurora what is goldmane?

agi

unborn ocean May 30, 2025, 6:47 PM

#

Let’s be real: we will all get used to it.

#

Big companies might get access, we might not.

misty vault May 30, 2025, 6:49 PM

#

dusky aurora what is goldmane?

agi

small haven May 30, 2025, 6:54 PM

#

omg its coming

elder rapids May 30, 2025, 6:57 PM

#

kind cloud chapter1117 'mo'

btw 2.5 pro 0506 knows this too

kind cloud May 30, 2025, 7:11 PM

#

Yes. So, strictly speaking, this isn't meant to find out 'goldmane'; rather, it's the way to exclude Flash models.
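
The two-question check described in this thread can be written down as a small decision rule. This is only a sketch of the chatters' heuristic, which is unverified: the function name `classify_arena_model` and the returned labels are hypothetical, and the expected title 'mo' for chapter 1117 is taken from the chat as given.

```python
def classify_arena_model(self_description: str, chapter_answer: str,
                         expected_title: str = "mo") -> str:
    """Apply the chat's heuristic to an anonymous arena model's replies.

    self_description: the model's answer to "who trained you?"
    chapter_answer:   the model's answer for the chapter title
    expected_title:   title quoted in the chat (assumed correct here)
    """
    says_google = "trained by google" in self_description.lower()
    cleaned = chapter_answer.strip().strip("'\"").lower()
    knows_title = cleaned == expected_title.lower()

    if not says_google:
        return "not identified as Gemini"
    if knows_title:
        # Per the thread, 2.5 Pro 0506 also passes, so this only rules out Flash.
        return "Pro-class Gemini (Flash excluded)"
    return "possibly Flash"
```

As the thread itself points out, a pass does not single out goldmane; it only separates Pro-class answers from Flash-class ones.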

#

https://fixupx.com/AiBattle_/status/1926323159126487528

AiBattle (@AiBattle_)

Interesting behavior from the Gemini model Goldmane

When prompting the Goldmane model to generate design concepts or plans, it often attempts to include multiple images in its response.

This behavior is not present in the Redsword model, nor have I seen it in other Gemini models.

**💬 8 🔁 4 ❤️ 107 👁️ 8.0K **

#

Maybe the fact that it tries to cite images could also be a way to tell it apart, but I'm not certain yet.

brittle hull May 30, 2025, 7:15 PM

#

oh

kind cloud May 30, 2025, 7:16 PM

#

it's just a Chinese model

brittle hull May 30, 2025, 7:18 PM

#

What makes u think that?

kind cloud May 30, 2025, 7:20 PM

#

Because it revealed its developer to me

small haven May 30, 2025, 7:21 PM

#

how do we do a poll here

keen beacon May 30, 2025, 7:21 PM

#

plus icon -> create poll

small haven May 30, 2025, 7:22 PM

#

kind cloud May 30, 2025, 7:22 PM

#

brittle hull What makes u think that?

Screenshot_2025-05-29-07-18-31-379-edit_com.android.chrome.jpg

patent aspen May 30, 2025, 7:34 PM

#

small haven

IMO whichever one releases second will probably be stronger lol

sonic tendon May 30, 2025, 7:35 PM

#

redsword leaderboard release wen eta

#

might not happen, but curious

tall summit May 30, 2025, 7:35 PM

#

kind cloud

oh baidu

small haven May 30, 2025, 7:35 PM

#

patent aspen IMO whichever one releases second will probably be stronger lol

yes recency bias lol

sonic tendon May 30, 2025, 7:35 PM

#

on that note, I've been wondering what folsom is

#

but yeah baidu and bytedance have been active lately

#

yeah

#

well, they tend to happen at the same time

late path May 30, 2025, 7:38 PM

#

We will never see redsword again

sonic tendon May 30, 2025, 7:38 PM

#

also, wasn't goldmane the worse one

keen beacon May 30, 2025, 7:38 PM

#

redsword was removed

small haven May 30, 2025, 7:39 PM

#

sure, i guess to some extent, it wont be dominating everything, but i do believe deepthink will edge in math/code a bit more, usamo at >40% is very impressive ngl

keen beacon May 30, 2025, 7:39 PM

#

sonic tendon also, wasn't goldmane the worse one

apparently not since it won over redsword i think

sonic tendon May 30, 2025, 7:40 PM

#

"best" seems a tad subjective lol

#

they'll probably both have a case for being the best, I think

small haven May 30, 2025, 7:41 PM

#

lmao

#

we have a strawberry in here

sonic tendon May 30, 2025, 7:41 PM

#

small haven we have a strawberry in here

what is a strawberry

#

I keep hearing that term

keen beacon May 30, 2025, 7:42 PM

#

strawberry man?

#

idk

sonic tendon May 30, 2025, 7:42 PM

#

the twitter guy

#

i assume

#

heh

small haven May 30, 2025, 7:45 PM

#

my issue here is why would an actual google engineer be here to actually talk about insider info? what is there to gain, other than attention and losing ur cushy job lol

keen beacon May 30, 2025, 7:45 PM

#

nah hes just a massive google fan

sonic tendon May 30, 2025, 7:46 PM

#

aren't we all

small haven May 30, 2025, 7:46 PM

#

not wannabes

#

ok bud

sonic tendon May 30, 2025, 7:49 PM

#

i doubt anyone cares that much

floral merlin May 30, 2025, 8:16 PM

#

Hello, where is the price / score chart in the new UI?

#

This (Price Analysis):

#

It is the most usable for me chart so far.

keen beacon May 30, 2025, 8:20 PM

#

it might not be added yet in the new site idk

floral merlin May 30, 2025, 8:20 PM

#

keen beacon it might not be added yet in the new site idk

Ok, thanks for the info 👍

echo aurora May 30, 2025, 8:29 PM

#

floral merlin Hello, where is the price / score chart in the new UI?

Yeah sry to say isn't currently on the regular site

floral merlin May 30, 2025, 8:30 PM

#

echo aurora Yeah sry to say isn't currently on the regular site

No need to be sorry. I know that the art of creating software is always an art of sacrifices.

elder rapids May 30, 2025, 8:30 PM

#

keen beacon apparently not since it won over redsword i think

goldmane was better than redsword in most of my tests

keen beacon May 30, 2025, 8:31 PM

#

i havent used the model yet lmao i just read the chat 🤣

#

there were people saying both things i dont know

jade egret May 30, 2025, 8:31 PM

#

hi

elder rapids May 30, 2025, 8:31 PM

#

ion know bro, I've been accurate about basically every assessment ive projected here

#

which isn't much projections at all

#

but still

echo aurora May 30, 2025, 8:32 PM

#

jade egret hi

hello meowwavepeek

elder rapids May 30, 2025, 8:32 PM

#

I KNOW models

misty vault May 30, 2025, 8:32 PM

#

do u know that gpt-4-0314 is agi

elder rapids May 30, 2025, 8:33 PM

#

gpt 40 is agi

golden ocean May 30, 2025, 8:33 PM

#

gpt 4 is agi

high ginkgo May 30, 2025, 8:36 PM

#

then we go to a more recent model gpt 4o and we're back to narrow ai ☹️

sturdy mica May 30, 2025, 8:50 PM

#

misty vault do u know that gpt-4-0314 is agi

https://tenor.com/view/funny-cool-2023-funny-memes-hahaguffawchuckleleamalus-this-is-the-hahaguffawchuckleleamalus-gif-4886308197560020132

small haven May 30, 2025, 9:05 PM

#

no comment

sour spindle May 30, 2025, 10:23 PM

#

Opus 4 just took the top spot on SimpleBench

primal orbit May 30, 2025, 10:33 PM

#

is opus in the lmarena direct chat thinking or not?

torn mantle May 30, 2025, 10:38 PM

#

its good

#

although the radar chart is a bit broken visually

#

its def better than whatever they were providing previously

#

deep research feature or wtvr...

elder rapids May 30, 2025, 11:11 PM

#

sour spindle Opus 4 just took the top spot on SimpleBench

just about what I expected tbh

#

also I don't think opus 4 or sonnet 4 nonthinking are going to be much higher than 3.7 nonthinking

misty vault May 30, 2025, 11:14 PM

#

real

small haven May 31, 2025, 12:29 AM

#

these tweets about o3 pro is making me 🥵

#

@deep adder why can't i paste image into claude code anymore

#

it used to work

#

i guess they disabled it :/

#

or my settings is fcked

hollow ocean May 31, 2025, 1:11 AM

#

@misty star Disappointment

small haven May 31, 2025, 1:46 AM

#

claude code is a fcking beast

#

ya not ai related, if they can do, why can i post a xi pic

#

*cant

small haven May 31, 2025, 2:13 AM

#

oh hell naw

#

wait i was using haiku?

#

tf

#

#

lmao, i was running multi agents

#

i think it's because i ran above limits

#

it defaults to haiku

#

i was wondering why i was getting shxtty results

hollow ocean May 31, 2025, 2:52 AM

#

https://x.com/p4mui/status/1928234595385819605?s=46&t=AH7sIlIv16Z3Kdb6j3cjfg

P4mui  (@P4mui)

BREAKING: XChat is now rolling out in beta for a select few users

It will roll out to more users soon

As a reminder XChat is the new generation of DMs with new features including:

• End-to-end encryption

• Ability to send files

• Ability to unread!!!

• Ability to

pseudo hemlock May 31, 2025, 5:08 AM

#

Is the "prompt to best model" feature gone?

late path May 31, 2025, 5:10 AM

#

pseudo hemlock Is the "prompt to best model" feature gone?

legacy.lmarena.ai

pseudo hemlock May 31, 2025, 5:11 AM

#

I'm on there and went to "prompt-specific leaderboard", put in the prompt, and it doesn't load anything after I press send

#

I even tried https://github.com/lmarena/p2l and went to "Try on Chatbot Arena at the Prompt-to-Leaderboard tab!" and still nothing

GitHub

GitHub - lmarena/p2l: Prompt-to-Leaderboard

Prompt-to-Leaderboard. Contribute to lmarena/p2l development by creating an account on GitHub.

echo aurora May 31, 2025, 5:21 AM

#

pseudo hemlock Is the "prompt to best model" feature gone?

like cap said isn't a part of the current site atm, I'm going to flag to the team regarding p2l on the legacy site. sorry for the inconvenience!

pseudo hemlock May 31, 2025, 5:26 AM

#

echo aurora like cap said isn't a part of the current site atm, I'm going to flag to the team...

its okay! thanks

#

and not a big deal, i'm sure i'm the only person who wants it enough to try it on the legacy site lmao

elder rapids May 31, 2025, 7:29 AM

#

pseudo hemlock and not a big deal, i'm sure i'm the only person who wants it enough to try it o...

nah

#

it's actually pretty fun

elder solar May 31, 2025, 7:47 AM

#

why i dont have this feature in my chatgpt logged account?

#

it shows up when logged out

torn escarp May 31, 2025, 9:23 AM

#

echo aurora Yeah sry to say isn't currently on the regular site

Is the price analysis chart planned in the new UI?

drifting thorn May 31, 2025, 1:57 PM

#

Flux Konnect is good as hell

misty vault May 31, 2025, 2:12 PM

#

fluxsydney

echo aurora May 31, 2025, 2:50 PM

#

torn escarp Is the price analysis chart planned in the new UI?

That’s tbd, but good to know something you’d like to see brought to current site

ocean vortex May 31, 2025, 3:12 PM

#

primal orbit is opus in the lmarena direct chat thinking or not?

I don't think it is, but you can kinda sorta trick it by adding <thinking> at the end of your prompt

#

native thinking:

#

with a prompt, thinking 'disabled':

#

the thing with Opus is that it usually doesn't reason for long either way

#

write me a poem that doesn't rhyme. <thinking>

echo aurora May 31, 2025, 3:34 PM

#

pseudo hemlock its okay! thanks

prompt-to-leaderboard is working again blobthumbsup

pliant cypress May 31, 2025, 3:39 PM

#

New deepseek R1 score 10% higher on SimpleBench wow

torn mantle May 31, 2025, 3:44 PM

#

the new deepseek is actually crazy

#

its the closest model to o3 in terms of formatting

#

they are trying to mimic that

#

pretty sure its trained on o3 outputs too

tall summit May 31, 2025, 3:45 PM

#

i hate o3 formatting

torn mantle May 31, 2025, 3:45 PM

#

i love the arrows explanation, straight to the point

#

i myself use that a lot

keen beacon May 31, 2025, 4:00 PM

#

nah this new r1 is closer to gemini

#

in terms of output

torn mantle May 31, 2025, 4:00 PM

#

gemini is like the biggest yapper

#

it doesnt write with emojis nor arrows

#

but its packed with so much knowledge

#

so we got the formatting + knowledge from both models

keen beacon May 31, 2025, 4:01 PM

#

https://eqbench.com/creative_writing.html (see new r1 and click the slop metric)

#

#

for creative writing probably personally

#

but eqbench's creative writing leaderboard is a useful metric

torn mantle May 31, 2025, 4:05 PM

#

keen beacon

thats on creative writing

dusky aurora May 31, 2025, 4:05 PM

#

torn mantle gemini is like the biggest yapper

I like yapping models

tall summit May 31, 2025, 4:05 PM

#

keen beacon for creative writing probably personally

↑

keen beacon May 31, 2025, 4:06 PM

#

torn mantle thats on creative writing

still the slop metric is somewhat useful to determine provenance

#

if u understand how its calculated

dusky aurora May 31, 2025, 4:06 PM

#

keen beacon for creative writing probably personally

opus writes such great scenes

tall summit May 31, 2025, 4:06 PM

#

opus my beloved

torn mantle May 31, 2025, 4:13 PM

#

lol

#

grok 3.5 is coming

#

just wait a year or two

wintry tinsel May 31, 2025, 5:37 PM

#

It is lol

#

My metric is different though, I jailbreak and unlock the model first, then I judge its full underlying ability

#

Opus is so far ahead its in a league of its own

#

Like a fundamentally different class of performance, I’ve struggled to step down even to sonnet after using it

#

For creative writing I mean

small haven May 31, 2025, 6:08 PM

#

opus is built diff

ocean vortex May 31, 2025, 6:08 PM

#

pliant cypress New deepseek R1 score 10% higher on SimpleBench wow

He finally added Opus as well 🧐

ocean vortex May 31, 2025, 6:10 PM

#

wintry tinsel Like a fundamentally different class of performance, I’ve struggled to step down...

I think Opus is easily the biggest available reasoning model right now tbh. That's not to say that it is the best overall since it clearly isn't, but there are things it's gonna take the lead at

small haven May 31, 2025, 6:20 PM

#

omfg i have it

#

craig smart

#

7mins for news is crazy tho

#

i thought u was trolling but dayum, o1 pro in the api boys

#

*o3 pro

tall summit May 31, 2025, 6:22 PM

#

wintry tinsel Like a fundamentally different class of performance, I’ve struggled to step down...

i knowwwwwww

wintry tinsel May 31, 2025, 6:23 PM

#

ocean vortex I think Opus is easily the biggest available reasoning model right now tbh. That...

Lots of things. I’d say it's gotta take the lead on at least half of the things people do with LLMs

small haven May 31, 2025, 6:31 PM

#

keen ferry May 31, 2025, 6:38 PM

#

small haven

you got pro subscription?

torn mantle May 31, 2025, 6:40 PM

#

keen ferry you got pro subscription?

yes

small haven May 31, 2025, 6:40 PM

#

keen ferry you got pro subscription?

no its a free account

torn mantle May 31, 2025, 6:41 PM

#

stop it

#

or dont

#

idkdidkidkdi

small haven May 31, 2025, 6:42 PM

#

its free, i just had to splurge $200

#

o3 pro is super long tho, way longer than the old o1 pro

torn mantle May 31, 2025, 6:44 PM

#

small haven its free, i just had to splurge $200

what can we do without you?

#

first one that will have o3 pro access in this server

#

what a privilege

#

😖

small haven May 31, 2025, 6:45 PM

#

ok lemme put ur berberine prompt in

torn mantle May 31, 2025, 6:45 PM

#

lol

#

is it out yet or what?

small haven May 31, 2025, 6:45 PM

#

not officially

grim axle May 31, 2025, 6:57 PM

#

anyone having problems?

elder rapids May 31, 2025, 6:57 PM

#

keen beacon https://eqbench.com/creative_writing.html (see new r1 and click the slop metric)

I hate this benchmark so much

#

😭

#

whoever made it has no idea how to judge + the model capabilities to judge

#

look at the bias control clarifications lmao

grim axle May 31, 2025, 6:59 PM

#

small haven May 31, 2025, 6:59 PM

#

https://chatgpt.com/share/683b5183-2a90-8003-b84c-a73e47f0d345 @torn mantle

ChatGPT

ChatGPT - Berberine vs Propolis vs Resveratrol

Shared via ChatGPT

elder rapids May 31, 2025, 7:02 PM

#

grim axle

it is broken ye

echo aurora May 31, 2025, 7:02 PM

#

grim axle

Yeah I'm seeing this also, thank you!

#

Sorry for missing models everyone! Our team is looking into it

grim axle May 31, 2025, 7:06 PM

#

echo aurora Sorry for missing models everyone! Our team is looking into

Claude 4 Opus was having problems too

misty vault May 31, 2025, 7:10 PM

#

elder rapids whoever made it has no idea how to judge + the model capabilities to judge

fr

echo aurora May 31, 2025, 7:13 PM

#

grim axle Claude 4 Opus was having problems too

Yeah I've been hearing the same as well, altho I can't repro because I can't access models 😭

misty vault May 31, 2025, 7:13 PM

#

echo aurora Yeah I've been hearing the same as well, altho I can't repro because I can't acc...

sorry for making the models unavailable

small haven May 31, 2025, 7:14 PM

#

ok so wen is o4

#

and ultimately o4 pro

grim axle May 31, 2025, 7:17 PM

#

echo aurora Yeah I've been hearing the same as well, altho I can't repro because I can't acc...

it’s probably API fault

misty vault May 31, 2025, 7:18 PM

#

no

#

if u haven't reloaded since before the issue

#

then everything still works fine

grim axle May 31, 2025, 7:19 PM

#

press ctrl w I found a way

misty vault May 31, 2025, 7:19 PM

#

small haven May 31, 2025, 7:22 PM

#

small haven

Poll "which one wins": winner o3 pro (answer 2), 10 of 15 votes

misty vault May 31, 2025, 7:42 PM

#

grim axle press ctrl w I found a way

small haven May 31, 2025, 7:45 PM

#

oh my goodness, o3 pro is no joke

civic flame May 31, 2025, 7:48 PM

#

@small haven you taking requests?

small haven May 31, 2025, 7:49 PM

#

civic flame @small haven you taking requests?

ok

civic flame May 31, 2025, 7:51 PM

#

o3 surprises me by how meh it is at correctly formatting realistic wikipedia articles so try this:

"Write full Wikitext for a very realistic Wikipedia article for the 2028 Republican primaries, after the primary has finished."

small haven May 31, 2025, 7:52 PM

#

queued

split olive May 31, 2025, 7:55 PM

#

fix lmarena

glad peak May 31, 2025, 7:55 PM

#

Anyone else having this issue? Was working fine 5 mins ago now all my chats are gone

misty vault May 31, 2025, 7:55 PM

#

glad peak Anyone else having this issue? Was working fine 5 mins ago now all my chats are ...

you got permanently banned

split olive May 31, 2025, 7:56 PM

#

i get connection failed error

glad peak May 31, 2025, 7:58 PM

#

Should've just scrolled up lol

echo aurora May 31, 2025, 8:01 PM

#

glad peak Anyone else having this issue? Was working fine 5 mins ago now all my chats are ...

Really sorry about that! Team is looking into a lot of widespread issues atm.

glad peak May 31, 2025, 8:03 PM

#

Bruh you guys give me free access to all the best models with ease. It working 3% of the time would be a blessing

small haven May 31, 2025, 8:03 PM

#

@civic flame what did you get usually for both candidates

#

their names

tall summit May 31, 2025, 8:03 PM

#

glad peak Bruh you guys give me free access to all the best models with ease. It working 3...

well you have to show all your prompts to the world

small haven May 31, 2025, 8:04 PM

#

https://chatgpt.com/share/683b6035-9634-8003-92e2-69522597c214 @civic flame

ChatGPT

ChatGPT - 2028 GOP Primary Simulation

Shared via ChatGPT

#

tim scott is that out of the scope

grim axle May 31, 2025, 8:04 PM

#

echo aurora Really sorry about that! Team is looking into a lot of widespread issues atm.

Ask AI to fix the issues. Problem solved 👍

echo aurora May 31, 2025, 8:06 PM

#

grim axle Ask AI to fix the issues. Problem solved 👍

I shoulda thought of that! On it!

grim axle May 31, 2025, 8:06 PM

#

split olive i get connection failed error

It means you got banned

echo aurora May 31, 2025, 8:07 PM

#

Lol not true!! ^

split olive May 31, 2025, 8:07 PM

#

💀💀

civic flame May 31, 2025, 8:07 PM

#

small haven https://chatgpt.com/share/683b6035-9634-8003-92e2-69522597c214 <@133813616834406...

interesting

grim axle May 31, 2025, 8:07 PM

#

What model should they release next?

civic flame May 31, 2025, 8:07 PM

#

it went with a scenario i think is not particularly likely but all the same interesting to read

misty vault May 31, 2025, 8:08 PM

#

grim axle What model should they release next?

gpt-4-0314

civic flame May 31, 2025, 8:08 PM

#

unfortunately it looks like formatting is only a little better than o3's attempt

#

nonexistent template

grim axle May 31, 2025, 8:08 PM

#

misty vault gpt-4-0314

I don’t see why they haven't added it yet

civic flame May 31, 2025, 8:08 PM

#

skipped out a bunch 👎 lazy

small haven May 31, 2025, 8:09 PM

#

damn

misty vault May 31, 2025, 8:09 PM

#

openai not giving lmarena gpt-4-0314 access 😔

small haven May 31, 2025, 8:09 PM

#

if 2028 is tim scott, im eating my shorts

civic flame May 31, 2025, 8:10 PM

#

i think approximately zero of these people would decline to run lmao

#

claude 4 opus' prediction i think makes more sense, it went with vance

small haven May 31, 2025, 8:11 PM

#

civic flame claude 4 opus' prediction i think makes more sense, it went with vance

tbh vance seems like a sequential choice, pretty logical, but tim scott as the answer is diff and seems like it thought a bit for it

civic flame May 31, 2025, 8:11 PM

#

i presume somewhere in the CoT it was like

#

"this is likely to be a very competitive primary without trump, and like in 2016 the winner tends to be hard to predict, so..."

#

which i suppose makes sense

#

shame they don't expose much of the CoT though

echo aurora May 31, 2025, 8:23 PM

#

slight gestures towards

✅ Avoid political and religious content.

misty vault May 31, 2025, 8:25 PM

#

I would vote for gpt-4-0314 if it were running for president ngl

civic flame May 31, 2025, 8:30 PM

#

echo aurora *slight gestures towards* > ✅ Avoid political and religious content.

https://tenor.com/view/1984-gif-19260546

Tenor

small haven May 31, 2025, 8:34 PM

#

echo aurora *slight gestures towards* > ✅ Avoid political and religious content.

its ai related tho lol

grim axle May 31, 2025, 8:34 PM

#

I need to try the new flux model please turn back on

#

also is the new model any good?

elder rapids May 31, 2025, 8:40 PM

#

yo 2.5 pro in the app has better instruction following now

#

sum changed

grim axle May 31, 2025, 8:40 PM

#

elder rapids yo 2.5 pro in the app has better instruction following now

?

keen beacon May 31, 2025, 8:41 PM

#

gemini product or aistudio? (or both)

elder rapids May 31, 2025, 8:41 PM

#

it's interpreting my instructions and applying them in a much better way than before

elder rapids May 31, 2025, 8:41 PM

#

keen beacon gemini product or aistudio? (or both)

product/the app

keen beacon May 31, 2025, 8:41 PM

#

kk

tall summit May 31, 2025, 8:42 PM

#

opus bad at translation compared to gemini 2.5 pro D:

elder rapids May 31, 2025, 8:42 PM

#

keen beacon kk

oh ye they fixed the formatting issues on mobile too

elder rapids May 31, 2025, 8:43 PM

#

tall summit opus bad at translation compared to gemini 2.5 pro D:

sonnet imo is better for a lot of translation tasks

#

strangely enough

#

but 2.5 pro is a god at translation

#

same with 4o

torn mantle May 31, 2025, 8:43 PM

#

small haven https://chatgpt.com/share/683b5183-2a90-8003-b84c-a73e47f0d345 <@295243581818404...

nice one

elder rapids May 31, 2025, 8:44 PM

#

although less nuance when pushed

torn mantle May 31, 2025, 8:44 PM

#

unfortunately i cant tell if its any different from o1 pro

elder rapids May 31, 2025, 8:44 PM

#

ask if it's agi

torn mantle May 31, 2025, 8:44 PM

#

leo been playing roblox all day

small haven May 31, 2025, 8:44 PM

#

o3 pro is less verbose

elder rapids May 31, 2025, 8:45 PM

#

o3 is much less verbose in general tbh

small haven May 31, 2025, 8:45 PM

#

its not a bad thing tbh

#

ya next year

#

lol

#

now we wait for deepthink :/

elder rapids May 31, 2025, 8:46 PM

#

ngl high hopes for deepthink

#

if it actually pushes 2.5 pro further

leaden palm May 31, 2025, 8:47 PM

#

you guys know you could code/orchestrate your own deepthink right

elder rapids May 31, 2025, 8:47 PM

#

then that's something really to hope for

small haven May 31, 2025, 8:47 PM

#

leaden palm you guys know you could code/orchestrate your own deepthink right

too lazy, i think im gonna put $250 in

elder rapids May 31, 2025, 8:47 PM

#

leaden palm you guys know you could code/orchestrate your own deepthink right

bro acting like we're not the consumers 😭

grim axle May 31, 2025, 8:47 PM

#

it’s pretty easy
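
A minimal sketch of what "orchestrating your own deepthink" could look like, assuming it means sampling several answers in parallel and keeping the most common one (self-consistency voting). This is not Google's actual Deep Think method, and `ask_model` is a hypothetical stand-in for a real API call.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def ask_model(prompt: str, seed: int) -> str:
    """Hypothetical stand-in for one sampled model call (not a real API).

    To keep the sketch deterministic, every fourth sample is a wrong guess.
    """
    if seed % 4 == 0:
        return f"guess-{seed}"
    return "42"


def parallel_vote(prompt: str, n_samples: int = 16) -> str:
    """Sample n answers in parallel and return the most common one."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        answers = list(pool.map(lambda s: ask_model(prompt, s), range(n_samples)))
    # Majority vote: scattered wrong guesses lose to the repeated answer.
    return Counter(answers).most_common(1)[0][0]
```

With this stub a single sample (seed 0) lands on a wrong guess, while the 16-sample vote settles on the repeated answer. The cost is `n_samples` times the tokens of one call.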

pseudo hemlock May 31, 2025, 8:49 PM

#

echo aurora prompt-to-leaderboard is working again :blobthumbsup:

THATS WHY HES THE GOAT

rigid crescent May 31, 2025, 8:53 PM

#

sorry if this has been asked to death already but is repochat planned for the new ui?

echo aurora May 31, 2025, 8:57 PM

#

rigid crescent sorry if this has been asked to death already but is repochat planned for the ne...

generally I won't be able to say if a specific feature is or isn't incoming; however, it's something we're putting thought into

rigid crescent May 31, 2025, 9:00 PM

#

understood, thank you for clarifying 🙏

small haven May 31, 2025, 9:01 PM

#

o3 pro uses big fat arrows 😮

atomic pagoda May 31, 2025, 9:08 PM

#

Is anyone else getting the connection error?

meager lintel May 31, 2025, 9:10 PM

#

New Gemini 2.5 pro checkpoint in a few days

#

Probably goldmane?

#

Where’d Tuesday come from?

#

😄

#

Well, I heard from someone else who saw it leaked from semi-public info too so makes sense

#

LMArena staff sandbagging the leaderboard update until Tuesday would be 💀

#

Just like Grok 3…

#

In this case, the other source I think is from something similar to the feature flags leak of Claude 4

torn escarp May 31, 2025, 9:20 PM

#

echo aurora That’s tbd, but good to know something you’d like to see brought to current site

100%.
I don't know anything else close to it that so distinctly compares price/performance of LLMs

errant thorn May 31, 2025, 9:20 PM

#

it was working for a few secs just now but now its back down

keen beacon May 31, 2025, 9:26 PM

#

they should call it o2 pro instead for maximum confusion

small haven May 31, 2025, 9:31 PM

#

i literally have it 😭

#

o1 pro with search enabled kek

#

o3 pro gave me alpha omg

patent aspen May 31, 2025, 9:39 PM

#

When is the last time someone working at OAI said they were still working on o3 pro?

#

Looks like April 16

small haven May 31, 2025, 9:41 PM

#

yesterday

patent aspen May 31, 2025, 9:41 PM

#

Oh where?

small haven May 31, 2025, 9:42 PM

#

oai cpo

#

on x.com

keen beacon May 31, 2025, 9:42 PM

#

https://xcancel.com/btibor91/status/1928522990599172141#m

Nitter

Tibor Blaho (@btibor91)

For everyone asking for an update about o3-pro, it is coming soon (possibly already in testing - see below)

small haven May 31, 2025, 9:43 PM

#

theres no way ppl still think o3 pro is fake 😭

#

o3 pro vs baseline

#

ok so u saying o3 pro but integrated into gpt 5 lol

#

i mean its still o3 pro

#

just a router

patent aspen May 31, 2025, 9:48 PM

#

It sounds like o3 pro is coming out eventually

small haven May 31, 2025, 9:50 PM

#

i mean yes if u scroll up a bit

patent aspen May 31, 2025, 9:50 PM

#

I know. I'm just reaffirming based on the posts above

small haven May 31, 2025, 9:51 PM

#

99.99% confidence band lol

echo aurora May 31, 2025, 9:51 PM

#

site should be up and working again btw 👍

keen beacon May 31, 2025, 9:52 PM

#

gpt 5 release might coincide/be somewhat correlated with gpt 4.5 being shut down on the api too maybe

#

which is in july

small haven May 31, 2025, 9:52 PM

#

day 1 with o3 pro

#

gpt5 at the end of the day is just a router

keen beacon May 31, 2025, 9:55 PM

#

small haven day 1 with o3 pro

was it worth the wait lmao?

small haven May 31, 2025, 9:55 PM

#

keen beacon was it worth the wait lmao?

yes

misty vault May 31, 2025, 9:55 PM

#

gpt-5-preview-0314

small haven May 31, 2025, 9:56 PM

#

? i think its just going to explode more? bc majority is just using 4o as default

misty vault May 31, 2025, 9:56 PM

#

true

keen beacon May 31, 2025, 9:57 PM

#

they spent a lot of time on 4o (mid train). whilst 4.1 mini/etc are fresh. it seems 4o is gonna be used for a while

#

(they talked about this in a podcast btw about the mid-train/fresh train)

#

yeah

small haven May 31, 2025, 9:58 PM

#

yea, no one cares about the rest lol

#

wb grok 3.5

#

even when elon ma has a black eye

#

i feel like grok 3.5 bigbrain is just going to at most match o3

#

u rlly believe that?

keen beacon May 31, 2025, 10:00 PM

#

lmao

patent aspen May 31, 2025, 10:00 PM

#

Is bigbrain some meme?

small haven May 31, 2025, 10:01 PM

#

i can not vouch for this

keen beacon May 31, 2025, 10:01 PM

#

patent aspen Is bigbrain some meme?

its xai's version of o3 pro/deepthink

#

they named it bigbrain

small haven May 31, 2025, 10:01 PM

#

oh right xai will release a $200/mo plan 🧠

#

lol ahh

elder rapids May 31, 2025, 10:06 PM

#

4o is spitting out images embedded into the chat

#

man where is goldmane

#

cool that's when my big ass TV arrives

#

ion believe that tbh

misty vault May 31, 2025, 10:09 PM

#

wtf is ion

elder rapids May 31, 2025, 10:09 PM

#

I don't

small haven May 31, 2025, 10:11 PM

#

are u saying gemini 2.5 pro updated version is gonna match goldmane

#

noice

echo aurora May 31, 2025, 10:47 PM

#

hey after the site was down are you now seeing your chat history ( blobyes ) or are you NOT seeing your chat history ( blobno )?

elder rapids May 31, 2025, 11:22 PM

#

goldmane is an explanation god

#

it's crazy intelligent and it's subject to being influenced more now

#

less dogmatic when it comes to uncertain things at first and brute forces conclusions to be more certain

#

what it says

#

it's subject to being influenced more now

#

no I mean it's different from 0506

elder rapids May 31, 2025, 11:26 PM

#

elder rapids less dogmatic when it comes to uncertain things at first and brute forces conclu...

it thinks for a while and sometimes doesn't think at all

#

which in the cases It was in

#

was surprisingly appropriate

patent aspen May 31, 2025, 11:29 PM

#

Is it less verbose than 0506? It's supposed to be

#

Or at least that was highly requested

elder rapids May 31, 2025, 11:31 PM

#

patent aspen Is it less verbose than 0506? It's supposed to be

much less

patent aspen May 31, 2025, 11:31 PM

#

I'm excited. I haven't got around to trying it yet

keen beacon May 31, 2025, 11:40 PM

#

I hope the not thinking bug is fixed

#

It's really annoying when it does it

patent aspen May 31, 2025, 11:41 PM

#

keen beacon I hope the not thinking bug is fixed

What is the bug?

misty vault May 31, 2025, 11:41 PM

#

I think the not thinking part is the bug like he literally just said

keen beacon May 31, 2025, 11:41 PM

#

patent aspen What is the bug?

It just doesn't think before the reply

small haven May 31, 2025, 11:42 PM

#

ok after playing with o3 pro for a few hours, its pure insanity

keen beacon May 31, 2025, 11:42 PM

#

Long conversations

#

Since they exclude the previous thoughts iirc in prev turns, at some point the model just doesn't have the tendency to do it. Weirdly they don't prefill the thinking delimiter so it can just do that

#

Might be fixed since their logic will be different with the next update I think since they're adding the toggle

#

Yeah I don't think so

#

The 'fix' is to ask it to think it's just annoying to do so

#

Sometimes it won't work and you have to rephrase it in weird ways to get it to think etc

#

I looked at the latency for the first token so it's not visual
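
A rough sketch of the mechanism described in the messages above, assuming a plain-text chat template: earlier thinking blocks are dropped from previous turns, and the next reply either is or is not prefilled with the opening thinking delimiter. The `<think>` delimiters and the `role: text` template are hypothetical, not Gemini's real format.

```python
import re

THINK_OPEN, THINK_CLOSE = "<think>", "</think>"  # hypothetical delimiters


def strip_thoughts(text: str) -> str:
    """Remove thinking blocks from an earlier assistant turn."""
    pattern = re.escape(THINK_OPEN) + r".*?" + re.escape(THINK_CLOSE)
    return re.sub(pattern, "", text, flags=re.DOTALL).strip()


def build_prompt(turns: list[tuple[str, str]], prefill_thinking: bool) -> str:
    """Render (role, text) turns into a prompt for the next assistant reply."""
    lines = []
    for role, text in turns:
        if role == "assistant":
            # Previous thoughts are excluded, so a long history shows the
            # model many replies that appear to start without thinking.
            text = strip_thoughts(text)
        lines.append(f"{role}: {text}")
    # With the prefill the reply must begin inside a thinking block;
    # without it the model is free to start answering directly.
    lines.append("assistant: " + (THINK_OPEN if prefill_thinking else ""))
    return "\n".join(lines)
```

Under these assumptions, `prefill_thinking=False` matches the behaviour complained about above, where the model can skip thinking, and `True` would force it.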

grim axle May 31, 2025, 11:52 PM

#

Okay yeah the flux model sucks ass

keen beacon May 31, 2025, 11:52 PM

#

It could be that but it mostly starts happening in long conversations, and I think the mechanism is as above. It's been a thing since flash thinking exp

#

Also it can think twice or get into thinking loops (multiple thinking blocks per reply) so I think it's a model thing

keen beacon May 31, 2025, 11:55 PM

#

keen beacon Also it can think twice or get into thinking loops (multiple thinking blocks per...

For me the thinking twice thing/etc happens sometimes when I ask it to think

#

#general message here's an instance of it doing it (fyi I was wrong here about it being a special token)

tall summit May 31, 2025, 11:58 PM

#

small haven ok after playing with o3 pro for a few hours, its pure insanity

REAL

meager lintel Jun 1, 2025, 12:11 AM

#

whatever X-preview is sucks lol

meager lintel Jun 1, 2025, 12:12 AM

#

patent aspen Or at least that was highly requested

thank god

#

after making 05/06 the ONLY option with no option to still use 03/25, I was half worried the thinking spam was some kind of intentional thing to pump out output tokens.....

#

yea makes sense

#

I'm hoping that whatever the "new research stuff" got into the new 2.5 flash is in goldmane too

#

new 2.5 flash is absolutely amazing

#

just not quite smart enough

#

but it hits so far above its weight it's insane

#

I saw some google researcher on twitter saying something like "a ton of new research ideas (which I can't talk about) were successful and got into 2.5 flash", so it's got my hopes up 😄

elder rapids Jun 1, 2025, 12:34 AM

#

Logan confirmed it's coming in the next few weeks

keen beacon Jun 1, 2025, 12:36 AM

#

He said that a few days earlier

#

They removed redsword I assume release is imminent

#

https://xcancel.com/OfficialLoganK/status/1927791900817317992#m

Nitter

Logan Kilpatrick (@OfficialLoganK)

new 2.5 pro should close the gaps in a couple of weeks

patent aspen Jun 1, 2025, 12:37 AM

#

Weird that he said a couple weeks on May 28th

elder rapids Jun 1, 2025, 12:41 AM

#

it's either you or him

#

ye

keen beacon Jun 1, 2025, 12:42 AM

#

Maybe Logan is talking about an even later revision

#

But this update is substantial

#

It would be strange

elder rapids Jun 1, 2025, 12:43 AM

#

I mean, in my eyes it's better than 0325 all around tbh

#

0506 still had the same capability but you had to prompt it more

#

goldmane simply just does it

patent aspen Jun 1, 2025, 12:44 AM

#

tbh if Logan said a couple weeks on May 28th, I trust him more than myself

elder rapids Jun 1, 2025, 12:44 AM

#

btw I believe 2.5 flash was an exception

#

they didn't actually serve it

#

as far as I know, I don't use vertex

keen beacon Jun 1, 2025, 12:45 AM

#

patent aspen tbh if Logan said a couple weeks on May 28th, I trust him more than myself

Why remove redsword this early though

elder rapids Jun 1, 2025, 12:45 AM

#

keen beacon Why remove redsword this early though

difference could've been major

keen beacon Jun 1, 2025, 12:46 AM

#

Maybe but more time couldn't have hurt

elder rapids Jun 1, 2025, 12:49 AM

#

hopefully

#

I anticipate googles releases a lot

small haven Jun 1, 2025, 12:49 AM

#

keen beacon https://xcancel.com/OfficialLoganK/status/1927791900817317992#m

against what model lol

elder rapids Jun 1, 2025, 12:50 AM

#

small haven against what model lol

goldmane vs 0506

leaden palm Jun 1, 2025, 12:55 AM

#

small haven against what model lol

idea is that where 0506 is worse than 0325, new gemini will be better

grim axle Jun 1, 2025, 2:16 AM

#

FLUX AI IS HAVING PROBLEMS

leaden palm Jun 1, 2025, 2:19 AM

#

grim axle FLUX AI IS HAVING PROBLEMS

curious why you're still on beta.lmarena.ai and why you're using an ai built for image to image generation as a textual ai

#

perhaps it's because it didn't force you to attach anything

grim axle Jun 1, 2025, 2:20 AM

#

leaden palm curious why you're still on beta.lmarena.ai and why you're using an ai built for...

I use it to edit images like this

leaden palm Jun 1, 2025, 2:20 AM

#

well you didn't attach an image there did you

#

probably why you're getting problems

#

it works fine for me with an image

grim axle Jun 1, 2025, 2:20 AM

#

I tried and it’s still not working

leaden palm Jun 1, 2025, 2:20 AM

#

then that's odd

grim axle Jun 1, 2025, 2:21 AM

#

is there a limit?

#

because I used the model 20 times

keen fulcrum Jun 1, 2025, 2:56 AM

#

Claude is making these graphs and Cursor isn't great at displaying them 😄

#

Flux kontext amazing
https://fixupx.com/AngryTomtweets/status/1928509452493246911

Angry Tom (@AngryTomtweets)

Today @bfl_ml dropped FLUX.1 Kontext, a new multimodal model that understands both image and text inputs.

It's now available in @LTXStudio for you to try!

Try here: ltx.studio

elder rapids Jun 1, 2025, 3:14 AM

#

lmao why does 2.5 flash thinking identify as Claude 4 sonnet, with all of the up to date Claude information on the arena

#

it says the model string too, like the regular Claude models

#

I've also been seeing different models act in a way that don't align with their personality

#

there's definitely a bug going on rn

#

where it's showing the wrong model name

#

or its routing to a different model

#

even the other models, like "Stephen" or "x-preview" are doing it

#

Claude is sometimes identifying as a Google model, too

#

yo this is DEFINITELY happening

hollow ocean Jun 1, 2025, 4:22 AM

#

elder rapids it's crazy intelligent and it's subject to being influenced more now

Is it better than opus 4

elder rapids Jun 1, 2025, 5:57 AM

#

hollow ocean Is it better than opus 4

opus 4 isn't a very high standard so ye ofc

small haven Jun 1, 2025, 7:59 AM

#

o3 pro currently has a 64k context window 😦

#

if deepthink matches o3 pro, but offers 1m context window, google wins

hollow ocean Jun 1, 2025, 8:13 AM

#

small haven if deepthink matches o3 pro, but offers 1m context window, google wins

will you switch to team google if deepthink does?

calm sequoia Jun 1, 2025, 8:32 AM

#

Someone on twitter posted this interesting table

small haven Jun 1, 2025, 8:35 AM

#

hollow ocean will you switch to team google if deepthink does?

yes, im not a dickrider for any, i want to use the most frontier model

late path Jun 1, 2025, 8:43 AM

#

they should add phantom and nebula too

#

the beginning of google's legendary comeback😁

dusky aurora Jun 1, 2025, 9:16 AM

#

calm sequoia Someone on twitter posted this interesting table

ah,so goldmane is a Gemini version?

vernal meadow Jun 1, 2025, 10:06 AM

#

yes they both are and they are both very good

dusky aurora Jun 1, 2025, 10:21 AM

#

so they say Goldmane will be released in the coming days?

alpine coral Jun 1, 2025, 10:35 AM

#

elder rapids lmao why does 2.5 flash thinking identify as Claude 4 sonnet, with all of the up...

that seems v odd

#

i can't remember the last time a google (or anthropic or oai for that matter) model identified itself as a model from a different lab.. and it happened repeatedly?

primal orbit Jun 1, 2025, 10:42 AM

#

calm sequoia Someone on twitter posted this interesting table

dragontail was better than claybrook, yet they picked claybrook.

keen beacon Jun 1, 2025, 11:25 AM

#

alpine coral i can't rememember the last time a google (or anthropic or oai for that matter) ...

If it identifies as Claude 4 sonnet at the very least the system prompts are all mixed up. Or the model names are switched/messed up or it's both

alpine coral Jun 1, 2025, 11:39 AM

#

keen beacon If it identifies as Claude 4 sonnet at the very least the system prompts are all...

oh true...

#

that would fs be the most obvious / likely explanation

keen beacon Jun 1, 2025, 11:42 AM

#

There were issues with lmarena earlier this is probably related

sacred plaza Jun 1, 2025, 12:46 PM

#

anyone know what Ilya Sutskever has been up to at his new startup?

patent aspen Jun 1, 2025, 12:58 PM

#

sacred plaza anyone know what Ilya Sutskever has been up to at his new startup?

Raising money

#

Using Google TPUs for research

#

tbh I don't expect any company starting so late to become relevant, although I also didn't expect DeepSeek or xAI, so take my word with a grain of salt

misty vault Jun 1, 2025, 1:09 PM

#

me when gemini 2.5 pro

#

no gemini 2.5 pro is king

#

i learned from u

#

no

#

u worshipped gemini 2.5 pro in may still 🥰

high ginkgo Jun 1, 2025, 1:15 PM

#

gemini 2.5 pro is cancer

misty vault Jun 1, 2025, 1:19 PM

#

it is though

quiet folio Jun 1, 2025, 1:22 PM

#

sacred plaza Jun 1, 2025, 1:24 PM

#

high ginkgo gemini 2.5 pro is cancer

Elaborate and you not going to wait for deep think?

high ginkgo Jun 1, 2025, 1:24 PM

#

i am going to wait for deep think

dusky aurora Jun 1, 2025, 1:33 PM

#

I still see nothing better in the Direct Chat list. Even the May version of Gemini is a viceroy compared to others

#

Opus is good but so laconic

#

somehow modern LLMs tend toward giving short, token-hoarding replies

keen fulcrum Jun 1, 2025, 1:39 PM

#

Did anyone hear of https://sambanova.ai before?

SambaNova Systems | Revolutionize AI Workloads

Unlock the power of AI for your business with SambaNova's enterprise-grade generative AI platform. Discover how to achieve 10x lower costs & unmatched security.

patent aspen Jun 1, 2025, 1:41 PM

#

You think SSI will be relevant?

keen fulcrum Jun 1, 2025, 1:42 PM

#

late path Jun 1, 2025, 1:46 PM

#

keen fulcrum Did anyone hear of https://sambanova.ai before?

I heard their context window is too small, making them almost useless in practice

late path Jun 1, 2025, 2:08 PM

#

I remember Ilya saying their first product would be safe ASI.
We won't see them until ASI

patent aspen Jun 1, 2025, 2:09 PM

#

Wouldn't they just get outpaced by Google if that's their approach?

#

That's why I don't think it's particularly likely that new entrants will catch up

#

I think DeepSeek will remain relevant because it's based in China, and the US may eventually ban China from using US models

#

I don't think it will surpass the top US model in capability

dusky aurora Jun 1, 2025, 2:13 PM

#

keen fulcrum Did anyone hear of https://sambanova.ai before?

https://cloud.sambanova.ai/playground

SambaNova Cloud

Preview AI-enabled Fastest Inference APIs in the world.

late path Jun 1, 2025, 2:14 PM

#

I'm not sure if every model Deepseek releases is only slightly worse than SOTA. Is this a coincidence, or is it because the distilled data they used fundamentally limits their upper bound?

tall summit Jun 1, 2025, 2:27 PM

#

dusky aurora Opus is good but so laconic

which is great

#

and it is wordy when you ask

dusky aurora Jun 1, 2025, 2:36 PM

#

I prefer long replies. Current Gemini is all about bullet points, which is readable but too abrupt

sacred plaza Jun 1, 2025, 2:44 PM

#

Ssi isn't trying to make money or sell stuff though. Why are you comparing these AI labs with SSI? It seems more like a research facility

sacred plaza Jun 1, 2025, 3:08 PM

#

This seems probably false. Agree that they are competing on the same pool of resources when it comes to gpus though but the goal for SSI does not seem to be AGI

#

Very cool. Have not heard of blue sky research but would not be surprised given Google deepmind as institutional.

Given how hard it was to get even 20% of the compute for OpenAI for safety-specific testing that led to the creation of Anthropic, I don't see the market incentives promoting safety work for its own sake, like SSI.

#

Would be glad to be proved wrong tho!

#

Wish I knew about that DeepSeek release before I bought the ultra plan, lol 😭. Agree that AI labs are doing safety work, which is definitely promising!

#

I agree with this take. It just seems like Google is adding a graph-of-thoughts prompting technique into the internal model to create deepthink. This seems similar to how AI labs put chain-of-thought prompting into their models to develop their reasoning models.

feral lichen Jun 1, 2025, 3:34 PM

#

best ai for roblox studio?

late path Jun 1, 2025, 3:50 PM

#

it would be a remarkable achievement if they could replicate the huge elo improvement that Alphago achieved by using mcts over raw DNN in LLM

narrow elbow Jun 1, 2025, 4:17 PM

#

in the pursuit of technological dominance, tech giants and capitalists always prioritize capability over safety. this race mentality remains unchanged. just like cold war nukes.

patent aspen Jun 1, 2025, 4:18 PM

#

narrow elbow pursuit of technological dominance, tech giants and capitalists always prioritiz...

They're terrified of safety incidents though

narrow elbow Jun 1, 2025, 4:18 PM

#

yea

elder rapids Jun 1, 2025, 4:28 PM

#

keen beacon If it identifies as Claude 4 sonnet at the very least the system prompts are all...

could be

#

but this is crazy tbh

cedar tide Jun 1, 2025, 4:28 PM

#

When is the next leaderboard update? There are 6 models in the arena not yet on the leaderboard 🥴
(Two Claude 4, new R1, grok 3 mini,
qwen 3 no think, glm 4 air)

#

They added it to the battle arena recently

#

@deep adder
Now in battle arena

leaden palm Jun 1, 2025, 4:52 PM

#

depends on how long a while is

elder rapids Jun 1, 2025, 5:02 PM

#

leaden palm depends on how long a while is

couple days

tulip meadow Jun 1, 2025, 5:08 PM

#

Hello can someone help me?

#

In Which section, Can I use as ai?

cedar tide Jun 1, 2025, 5:13 PM

#

Nope

torn mantle Jun 1, 2025, 5:14 PM

#

cedar tide @deep adder Now in battle arena

lmao

#

i thought the next grok model on lmarena will be the 3.5 ver

#

ig we just have to wait a little longer

echo aurora Jun 1, 2025, 5:15 PM

#

tulip meadow In Which section, Can I use as ai?

BlobWave at https://lmarena.ai/ you can use ai through: battle, side-by-side, and direct chat

tulip meadow Jun 1, 2025, 5:15 PM

#

echo aurora <a:BlobWave:1199039210938708048> at <https://lmarena.ai/> you can use ai through...

Thank you

#

Anyone have torrentleech site access? I need invitation

verbal nimbus Jun 1, 2025, 5:24 PM

#

Would be cool if the Web Arena had a Svelte mode. Curious to see if the rankings would stay the same.

keen fulcrum Jun 1, 2025, 5:53 PM

#

So no r2 for the foreseeable future?
How long will we be stuck on R1?

small haven Jun 1, 2025, 5:54 PM

#

o3 pro before grok 3.5 is crazy

#

it happened

#

troll?

elder rapids Jun 1, 2025, 6:04 PM

#

elder rapids but this is crazy tbh

btw

#

I'm not getting goldmane NEARLY as much as in the legacy website

#

which is super strange tbh

small haven Jun 1, 2025, 6:05 PM

#

@deep adder https://chatgpt.com/share/683c9650-6a04-8003-9b6f-d18411d02799

ChatGPT

ChatGPT - O3 vs O1 Pro

Shared via ChatGPT

#

ok ya officially, who tf cares

#

its already here

#

yes

#

o3 pro + claude code is the meta

#

yo but imagine deepthink matches o3 pro, right.. but with 1m context window, that would go insane

elder rapids Jun 1, 2025, 6:08 PM

#

small haven yo but imagine deepthink matches o3 pro, right.. but with 1m context window, tha...

ngl this WOULD go insane

small haven Jun 1, 2025, 6:08 PM

#

no

elder rapids Jun 1, 2025, 6:08 PM

#

but hold on I want to know

#

deepthink isn't going to be 2.5 pro 0506

#

with more thinking

#

it's going to be goldmane lvl probably

#

with parallel

small haven Jun 1, 2025, 6:08 PM

#

elder rapids deepthink isn't going to be 2.5 pro 0506

its official name is literally gemini 2.5 pro + deep think

elder rapids Jun 1, 2025, 6:08 PM

#

which could be crazier

small haven Jun 1, 2025, 6:09 PM

#

exactly lmao

#

u can retire at any age

elder rapids Jun 1, 2025, 6:09 PM

#

small haven its official name is literally gemini 2.5 pro + deep think

yeah?

small haven Jun 1, 2025, 6:09 PM

#

thats why i keep myself busy

surreal warren Jun 1, 2025, 6:13 PM

#

Any idea what tools to try for deep research or scraping sites
to find me items matching specs?

Chatgpt is fabricating

small haven Jun 1, 2025, 7:09 PM

#

u can batch tasks in parallel in claude code, amazing

#

yup

#

u can run as many as u want, but for my case, 3 was enough

#

add coffee

candid harbor Jun 1, 2025, 7:30 PM

#

try quadruple espressos

feral lichen Jun 1, 2025, 7:56 PM

#

How can I continue a conversation if you keep standing like that?

grim axle Jun 1, 2025, 10:42 PM

#

feral lichen How can I continue a conversation if you keep standing like that?

make a new chat

hollow ocean Jun 1, 2025, 11:42 PM

#

GPT-5 July confirmed ✅

elder rapids Jun 1, 2025, 11:50 PM

#

hollow ocean GPT-5 July confirmed ✅

you can suspect it via the model deprecations

#

but ion think it's absolutely confirmed

hollow ocean Jun 1, 2025, 11:52 PM

#

elder rapids you can suspect it via the model deprecations

https://x.com/btibor91/status/1929241704873308253?s=46&t=AH7sIlIv16Z3Kdb6j3cjfg

Tibor Blaho (@btibor91)

@Angaisb_ July

surreal creek Jun 2, 2025, 12:25 AM

#

is Stephen a different version of R1? it keeps answering in Chinese which is a pretty obvious giveaway, but there’s a May version of R1 not codenamed in the arena

#

unless it’s an undisclosed version of Qwen

elder rapids Jun 2, 2025, 12:30 AM

#

surreal creek is Stephen a different version of R1? it keeps answering in Chinese which is a p...

it's not

#

they're just random small Chinese models

#

same as X preview

#

Gemini 2.7 soon

#

get it right

patent aspen Jun 2, 2025, 12:32 AM

#

elder rapids Gemini 2.7 soon

Knowing how models are named, it honestly wouldn't shock me

elder rapids Jun 2, 2025, 12:32 AM

#

ion think Google would do that tbh

patent aspen Jun 2, 2025, 12:32 AM

#

I don't either

elder rapids Jun 2, 2025, 12:32 AM

#

they have some sort of philosophy of design

drifting thorn Jun 2, 2025, 12:45 AM

#

grim axle also is the new model any good?

Extremely good at image consistency

drifting thorn Jun 2, 2025, 12:46 AM

#

leaden palm you guys know you could code/orchestrate your own deepthink right

How?

leaden palm Jun 2, 2025, 1:10 AM

#

drifting thorn How?

openai style:

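// Sample 9 answers in parallel, then ask the model to merge them.
// `generate` is assumed to be your own async (prompt) => string LLM call.
const solve = async (prompt) => {
  const results = await Promise.all(Array.from({length: 9}, () => generate(prompt)));
  // The prompt asks for a merged answer, so return that text directly
  // rather than parsing it as an index into `results`.
  return generate(
    `We tried to figure out the answer to the prompt <prompt>${prompt}</prompt> 9 times. Write a final answer incorporating the best aspects from all of these: <answers>${results.join("\n\n---\n\n")}</answers>`
  );
}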

open source style:

// Single call: the prompt itself tells the model to draft, critique, and keep thinking.
const solve = async (prompt) => {
  return generate(`${prompt}

Note: whenever you are about to end thinking, don't. Instead, first write out what you were about to respond with, then critique it in depth, then keep thinking. You are only allowed to end thinking once this has happened 5 times.`);
}

tree of thought style:

// Recursively branch on candidate decisions, then merge the branch results.
// Caveat: nothing caps the recursion depth, so add a limit before real use.
const solve = async (prompt, decisions = []) => {
  const trials = (await generate(`You are in the process of solving the prompt <prompt>${prompt}</prompt>. You've made these decisions so far: <decisions>${decisions.join("\n\n---\n\n")}</decisions> You now need to either list some possible paths you can take (separated with the separator ---) or only list the final answer.`)).split("---").map(x => x.trim());
  if (trials.length > 1) {
    // Several paths were listed: explore each one in parallel.
    const results = await Promise.all(trials.map(d => solve(prompt, [...decisions, d])));
    const best = await generate(`You are in the process of solving the prompt <prompt>${prompt}</prompt>. You've made the decisions <decisions>${decisions.join("\n\n---\n\n")}</decisions> so far. Now, you made some more decisions, resulting in these results: <results>${results.join("\n\n---\n\n")}</results> Write your final answer to this prompt, combining the best aspects of each.`);
    return best;
  }
  // A single entry means the model gave its final answer.
  return trials[0];
}
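
if you want to dry-run the parallel version without burning API calls, here's a rough sketch. Everything here is an assumption for illustration: `buildMergePrompt`, `solveParallel` and `stubGenerate` are made-up names, and `generate` is passed in as an argument so you can swap the stub for whatever async LLM call you actually use.

```javascript
// Hypothetical helper names; none of these come from a real library.
const SEP = "\n\n---\n\n";

// Build the merge prompt that is sent after sampling several answers.
const buildMergePrompt = (prompt, answers) =>
  `Prompt: <prompt>${prompt}</prompt>\n` +
  `Candidate answers: <answers>${answers.join(SEP)}</answers>\n` +
  `Write one final answer combining the best aspects of each.`;

// Sample `n` answers in parallel, then make one extra call to merge them.
// `generate` is any async (text) => string function supplied by the caller.
const solveParallel = async (generate, prompt, n = 3) => {
  const answers = await Promise.all(
    Array.from({ length: n }, () => generate(prompt))
  );
  return generate(buildMergePrompt(prompt, answers));
};

// Stub model for a dry run: it only reports the length of its input.
// It is not a real LLM, just a stand-in to check the control flow.
const stubGenerate = async (text) => `stub(${text.length})`;

// Usage: n sampling calls plus 1 merge call, so n + 1 calls in total.
solveParallel(stubGenerate, "2+2?", 3).then(console.log);
```

passing `generate` in instead of using a global makes it easy to count calls or swap models, and the cost is always n + 1 calls per question.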

wintry tinsel Jun 2, 2025, 1:43 AM

#

After opus my reaction to AI news has been

#

🫤 😕 😑 😐

#

Like there is nothing interesting happening

leaden palm Jun 2, 2025, 1:49 AM

#

wintry tinsel 🫤 😕 😑 😐

me after the sota isn't beaten for 11 days:

hardy pecan Jun 2, 2025, 2:18 AM

#

we are in a golden age

#

but have trouble perceiving it

leaden palm Jun 2, 2025, 2:43 AM

#

idk its said that logan is responsible for accelerating gemini's availability and ai studio dev

surreal creek Jun 2, 2025, 2:46 AM

#

leaden palm me after the sota isn't beaten for 11 days:

bahahahaha

hollow ocean Jun 2, 2025, 3:28 AM

#

Opus sota

#

trophy3d

small haven Jun 2, 2025, 3:59 AM

#

they alrdy nerfed o3 pro great

elder rapids Jun 2, 2025, 5:25 AM

#

hardy pecan but have trouble perceiving it

I mean it is kind of magical tbh

#

like, could you imagine being able to speak to something that isn't human but can actually, coherently, and with extraordinary articulation

#

talk to you about things that are extremely implicit, beyond the syntax it's built off of

#

and so reminiscent of human thoughts

#

and then now it can access and see your screen, and think about what it's looking at with the necessary context, to actually figure it out

#

rather than a narrow program that key logs then executes that in repetition

#

all in the span of a year

keen fulcrum Jun 2, 2025, 7:46 AM

#

LLMs & Models:

DeepSeek: R1-0528 released: 64K context, efficient quantization.
OpenAI: Deprecating GPT-4 32k for GPT-4o, chat log & censorship concerns.
Anthropic: Claude Opus safety report, mechanistic interpretability tools.
Google: Veo 3 video model, SignGemma for sign language. Gemini 2.5 Pro: large context, UI/creative limits.
Mistral: Agents API for orchestration.
AI21: Jamba model reception good, details limited.
XAI: $300M for Grok on Telegram, skepticism remains.

Agents & Tools:

Perplexity: Labs for multi-tool workflows, new features.
LlamaIndex: Agents in Finance workshop, advanced RAG.
VerbalCodeAI: AI terminal tool for code analysis.
Latent Space: Collab on autonomous engineers.

Infra & Hardware:

Unsloth: Optimized DeepSeek models for limited hardware.
AMD: Max+ 365 GPU (128GB VRAM).
NVIDIA: Blackwell optimizations for DeepSeek R1.

Open Source:

Ollama: Naming issues, SDK instabilities.
Hugging Face: Diffusers enhanced, LightEval v0.10.

Challenges:

Cursor: Backlash over slow pool removal.
Manus: Instability, network issues.
Nomic.ai: Cloud security concerns.

New:

Black Forest Labs: New AI lab, Flux-1-Kontext image model.
Factory AI: Autonomous software engineers.

Insights:

Mary Meeker: AI industry report: accelerating adoption.
Microsoft: Early Sora API access.
Cohere: AI automation gains.
Gradio: MCP hackathon.

misty vault Jun 2, 2025, 8:50 AM

#

gpt

keen fulcrum Jun 2, 2025, 9:45 AM

#

small haven Jun 2, 2025, 9:47 AM

#

I think they tried o4 pro

willow grail Jun 2, 2025, 12:12 PM

#

ayo!!!!
TO ALL OUTSIDE VIBE CODERS!!!!

u know which laptops are best for vibe coding with our lovely cc/cm in parks?

calm sequoia Jun 2, 2025, 1:12 PM

#

I think "vibe-coding" does not impact your laptop choice anyhow.

#

It all depends on what you code. Web - buy something nice to hold and use. Heavy workload (apps, data processing) - buy something powerful, even if ugly.

willow grail Jun 2, 2025, 1:15 PM

#

calm sequoia It all depends on what you code. Web - buy something nice to hold and use. Heavy...

so any website would be not heavy workload?

#

even amazon?

calm sequoia Jun 2, 2025, 1:15 PM

#

Idk I haven't seen a website that requires a lot of cores or GPU

#

And amazon website is 💩 wdym?

willow grail Jun 2, 2025, 1:17 PM

#

calm sequoia And amazon website is 💩 wdym?

hmmm okey... but like lets go one step earlier..

how to vibe code with claude code on ANDROID?
do i even need a laptop to use claude code on ubuntu? or is screensharing from pc to outside device enough?

#

WE DONT KNOW

#

stuff like laptop and tablet is an issue cause it probably will be impossible to use without a table high enough

#

i am feeding crows at graveyard. this takes time. like 2 hours daily.

so i wanna code while at it

#

i could bring a mobile table with me if laptop/tablet

#

slept 8 hours R.

#

claude code is the best agentic vibe coder out there

#

ok then why did my lower arms hurt a lot 10 years ago after 5h usage of macbook??

#

weighted 1.6kg

#

i can watch websites on android yes.

#

just websites and perhaps games up to phaser.js level

misty vault Jun 2, 2025, 1:28 PM

#

no way

willow grail Jun 2, 2025, 1:30 PM

#

misty vault no way

like how? u would need to rotate your head a lot down

#

thats unnatural and thus will lead to pain

patent aspen Jun 2, 2025, 2:38 PM

#

If I didn't have a full time job and had the energy and will to code in my free time, I'd probably switch from a Macbook Pro to a Framework running Arch Linux

queen siren Jun 2, 2025, 3:24 PM

#

anyone recommend good libraries for running multiagent llm systems?

patent aspen Jun 2, 2025, 3:36 PM

#

The Linux terminal feels like putting my hand on the third rail of the universe, and no amount of money, build quality, or brand value can give me that high

glad jackal Jun 2, 2025, 3:53 PM

#

Guys currently which llm has the least hallucinations is it Gemini 2.5?

#

Lmarena?

#

What abt Gemini 2.5

#

It has Been 1 in leader board

grim axle Jun 2, 2025, 4:25 PM

#

can’t wait until Claude opus is open source

elder rapids Jun 2, 2025, 4:35 PM

#

glad jackal Guys currently which llm has the least hallucinations is it Gemini 2.5?

it is Gemini 2.5, the reason why gpt 4.5 is said to not have much hallucinations is because it doesn't assert much

elder rapids Jun 2, 2025, 4:56 PM

#

yes bro opus never hallucinates

#

Claude 4 opus is a hallucination goblin btw

acoustic cliff Jun 2, 2025, 4:58 PM

#

CIuncanny

echo aurora Jun 2, 2025, 5:14 PM

#

going to be listening to lofi most of the day in #1340554757827461215 for anyone interested

feral lichen Jun 2, 2025, 5:16 PM

#

can anyone tell me best ai for lua coding?

warped sequoia Jun 2, 2025, 5:40 PM

#

echo aurora going to be listening to lofi most of the day in <#1340554757827461215> for anyo...

you love lofi so much 😭

balmy mist Jun 2, 2025, 6:07 PM

#

anything new?

sour spindle Jun 2, 2025, 6:18 PM

#

Dumb question that’s probably been answered: is the claude 4 opus on the leaderboard thinking or base?

ocean vortex Jun 2, 2025, 6:33 PM

#

balmy mist anything new?

dork 4.0 agi confirmed

echo aurora Jun 2, 2025, 6:36 PM

#

sour spindle Dumb question that’s probably been answered is the claude 4 opus on leaderboard ...

it is the base version

sour spindle Jun 2, 2025, 6:36 PM

#

Pretty good score for the base version

ocean vortex Jun 2, 2025, 6:36 PM

#

there's no such thing as claude base though, both are the same exact model

#

chat

#

just different prompt template 😉

#

And if you add <thinking> at the end of your prompt, from my experience this is gonna be largely the same as thinking natively enabled

small haven Jun 2, 2025, 6:39 PM

#

same day as o3 pro official release 🥳

ocean vortex Jun 2, 2025, 6:40 PM

#

small haven same day as o3 pro official release 🥳

I'm thinking they are fundamentally revising their implementation maybe. Otherwise it makes no sense why it is taking them so long...

#

Straight forward parallel compute, you could implement that in a short evening lol

keen beacon Jun 2, 2025, 6:40 PM

#

apparently o3 pro is already rolling out

ocean vortex Jun 2, 2025, 6:41 PM

#

and it doesn't need safety testing additionally etc

keen beacon Jun 2, 2025, 6:41 PM

#

just recently

small haven Jun 2, 2025, 6:51 PM

#

ocean vortex I'm thinking they are to revise their implemention fundamentally maybe. Otherwis...

i mean i have it alrdy, just need them to officially release, bc they kept tweaking it its kinda annoying

small haven Jun 2, 2025, 7:06 PM

#

me

#

bro got amnesia

ocean vortex Jun 2, 2025, 7:07 PM

#

small haven i mean i have it alrdy, just need them to officially release, bc they kept tweak...

I'm using o4-high early access

#

seems on par with 4.0 dork

small haven Jun 2, 2025, 7:07 PM

#

ocean vortex I'm using o4-high early access

im not even trolling

keen beacon Jun 2, 2025, 7:07 PM

#

hes not trolling

ocean vortex Jun 2, 2025, 7:08 PM

#

catgrin

#

are you not? Is it good then?

small haven Jun 2, 2025, 7:08 PM

#

ok let me get some proof, sigh

#

gimme a prompt @ocean vortex

#

that involves the internet lol

ocean vortex Jun 2, 2025, 7:09 PM

#

is it with tools or no tools?

small haven Jun 2, 2025, 7:09 PM

#

no clue, it doesn't show explicitly in the cot

#

it does use web tho

ocean vortex Jun 2, 2025, 7:10 PM

#

small haven no clue, it doesn't show explicitly in the cot

we need smth it can't cheat with tools on then... try this:

approximately:
A. 4km eastward
B. 30 km northward
C. >30km away north-westerly
D. <1 km northward
E. >30 km away north-easterly.
F. 5 km+ eastward
G. The glove is exactly where the car was at the time it slipped out.
H. Neither option is correct.

small haven Jun 2, 2025, 7:11 PM

#

queued

marsh stratus Jun 2, 2025, 7:12 PM

#

is Claude 4 Opus on the leaderboard the thinking Claude or nonthinking?

ocean vortex Jun 2, 2025, 7:13 PM

#

marsh stratus is Claude 4 Opus on the leaderboard the thinking Claude or nonthinking?

neither

#

seriously though read several messages above lol

small haven Jun 2, 2025, 7:13 PM

#

ocean vortex we need smth it can't cheat with tools on then... try this: ```A luxury sports-...

pretty sure it does use tools in the backend, just doesnt show anything like o3

ocean vortex Jun 2, 2025, 7:14 PM

#

small haven pretty sure it does use tools in the backend, just doesnt show anything like o3

it probably does. O3 non-pro early access did use tools as well

patent aspen Jun 2, 2025, 7:15 PM

#

What is the source for o3 pro rolling out on Thursday?

humble heath Jun 2, 2025, 7:15 PM

#

will o3 pro be added to the arena?? i assume not bc the model is so expensive

small haven Jun 2, 2025, 7:15 PM

#

patent aspen What is the source for o3 pro rolling out on Thursday?

i have it alrdy stealth released 🤷‍♂️

#

and thursday is where big releases happen

small haven Jun 2, 2025, 7:19 PM

#

ocean vortex it probably does. O3 non-pro early access did use tools as well

https://chatgpt.com/share/683df8dd-3fa8-8003-a892-27d46a39e79c

ChatGPT

ChatGPT - Glove drift analysis

Shared via ChatGPT

ocean vortex Jun 2, 2025, 7:25 PM

#

small haven https://chatgpt.com/share/683df8dd-3fa8-8003-a892-27d46a39e79c

https://chatgpt.com/share/683dfaa6-ff4c-800b-bd59-1dc3642799a2 lmfao

ChatGPT

ChatGPT - Glove drift analysis

Shared via ChatGPT

small haven Jun 2, 2025, 7:26 PM

#

ocean vortex https://chatgpt.com/share/683dfaa6-ff4c-800b-bd59-1dc3642799a2 lmfao

it continues based off ur account, not mine

ocean vortex Jun 2, 2025, 7:27 PM

#

yeah ik, but I don't have 3.5 obviously, it's still hilarious why it responded this way. It's just showing "chatgpt" for this chat no model name

small haven Jun 2, 2025, 7:27 PM

#

let's be honest tho, do u really think 3.5 could have done that

small haven Jun 2, 2025, 7:27 PM

#

ocean vortex yeah ik, but I don't have 3.5 obviously, it's still hilarious why it responded t...

is it even correct thats my q lol

ocean vortex Jun 2, 2025, 7:27 PM

#

small haven Jun 2, 2025, 7:27 PM

#

no i meant the answer

#

is it A

ocean vortex Jun 2, 2025, 7:28 PM

#

small haven is it A

it's D

small haven Jun 2, 2025, 7:28 PM

#

damnn

ocean vortex Jun 2, 2025, 7:29 PM

#

I haven't seen any model get this right yet though. So it doesn't mean it's sht. Just that maybe it is not significantly better than normal o3

small haven Jun 2, 2025, 7:29 PM

#

deepthink next

small haven Jun 2, 2025, 7:30 PM

#

ocean vortex I haven't seen any model get this right yet though. So it doesn't mean it's sht....

grok 3.5 on big brain could prolly crack it ngl

#

wen deepthink sir

#

yes

#

im actually curious

#

yes i believe that

#

its brian

patent aspen Jun 2, 2025, 7:33 PM

#

It's a new person

small haven Jun 2, 2025, 7:33 PM

#

nah but when is deepthink

#

gimme the exact date

patent aspen Jun 2, 2025, 7:33 PM

#

Anat

#

I remember Ruth because she made our perks suck. I don't think about Anat much

#

She's kind of just there

small haven Jun 2, 2025, 7:34 PM

#

brian gimme deepthink exact date release

keen beacon Jun 2, 2025, 7:34 PM

#

he probably doesnt know xd

small haven Jun 2, 2025, 7:35 PM

#

go into that deepmind office and ask everybody for the date

keen beacon Jun 2, 2025, 7:35 PM

#

yeah but idk what you expect from him if u keep asking about it

small haven Jun 2, 2025, 7:35 PM

#

hes talking quiet guys

#

so basically end of june

#

ur not in that slack group? come on bro

#

i have a genuine question, will deepthink have a 1m context window like the rest of the gemini models ?

keen beacon Jun 2, 2025, 7:39 PM

#

theyre releasing two revs in a month?

#

or naming it ga later

#

i guess its the latter

small haven Jun 2, 2025, 7:43 PM

#

wait they can actually afford deepthink on 2m context window? i believe it for regular gemini models

keen beacon Jun 2, 2025, 7:43 PM

#

load i guess

small haven Jun 2, 2025, 7:43 PM

#

im guessing keeping compute for research/training

#

we need an oai insider in here now

#

yoooo wen is o4 pro

#

ok bud

#

2m context window on deepthink is absolutely going to crush o3 pro, sorry sam

keen beacon Jun 2, 2025, 7:47 PM

#

can deepthink use tools?

#

if not there will still be a use for o3 pro

small haven Jun 2, 2025, 7:48 PM

#

maybe not, but 2m context window is a huge deal

#

o3 pro currently has 64k

#

i maxxed it out

#

128k got timed out

#

80k~ timed out

#

58k ish passed

#

original o1 pro could fit 128k

#

o1 as well

#

o3-mini-high too, but now it's limited to 64k

#

yes...

#

cuz of the huge spike in new members

#

google has no users, so 2m context window it is

#

no "loyal" users

#

😭

patent aspen Jun 2, 2025, 7:57 PM

#

I mean that's their first mover advantage

#

It is

small haven Jun 2, 2025, 7:57 PM

#

"forced"

#

nk style

patent aspen Jun 2, 2025, 7:58 PM

#

You are correct Gemini will not be ChatGPT

keen beacon Jun 2, 2025, 7:58 PM

#

4o image gen was probably bigger than 2.5 pro

small haven Jun 2, 2025, 7:59 PM

#

400m in ai studio? or 400m android users who have no choice to pass by a gemini feature lol

#

geographic 100% india?

#

and other third world countries

#

i believe tides will turn when deepthink release all jokes aside

patent aspen Jun 2, 2025, 8:00 PM

#

And you are a shining example @deep adder

small haven Jun 2, 2025, 8:01 PM

#

craig singh

#

@patent aspen also when is jules actually going to be GA? would gladly pay for unlimited like codex

patent aspen Jun 2, 2025, 8:03 PM

#

small haven @patent aspen also when is jules actually going to be GA? would gladly p...

No clue

#

Your politeness is incomparable

small haven Jun 2, 2025, 8:04 PM

#

its a fact, not conspiracy

#

western world is iphone dominated

green oak Jun 2, 2025, 8:05 PM

#

I would like to but the business

#

I have 3 dollars

small haven Jun 2, 2025, 8:05 PM

#

wtf

patent aspen Jun 2, 2025, 8:05 PM

#

"third world country type" comes off a bit differently haha

#

Kind of like poors or plebs

#

It does. I'm just remarking on your word choice

#

Oh absolutely

#

More so in the United States among gen Z but yes

#

Sure

small haven Jun 2, 2025, 8:10 PM

#

i think thats a you thing 😂

ocean vortex Jun 2, 2025, 8:48 PM

#

small haven google has no users, so 2m context window it is

agree

#

but they still thought $250 sub was great idea

sour spindle Jun 2, 2025, 9:43 PM

#

Gemini app is atrocious

#

Normies love ChatGPT vast majority don’t even know there are other models

misty vault Jun 2, 2025, 10:07 PM

#

https://tenor.com/view/cat-look-cat-look-at-camera-silly-cat-in-a-cage-gif-889392959852579879

Tenor

patent aspen Jun 2, 2025, 10:32 PM

#

People (especially rich people) have tended to value materialistic, branded status symbols less over time

#

Why do you think that?

#

Interesting. It does seem there was a counter-trend among millennials toward experiences, health, and well-being, although the current trend among younger people does seem to be towards materialism

elder rapids Jun 2, 2025, 11:02 PM

#

this IS Gemini usage...

red sluice Jun 2, 2025, 11:06 PM

#

Gemini pro is winning on most benchmarks

hidden quartz Jun 2, 2025, 11:07 PM

#

Hey guys wanted to know if images generated from lmarena can be used commercially? And if not would editing it help??

red sluice Jun 2, 2025, 11:08 PM

#

hidden quartz Hey guys wanted to know if images generated from lmarena can be used commerciall...

Depends on the models I guess but since the results are open source if someone really wants to prove it’s from their model, they can find and prove it’s theirs

hidden quartz Jun 2, 2025, 11:10 PM

#

Umm currently using flux to generate product photos and then changing the subject with my own and keeping the background. Should I be worried?

zinc ore Jun 2, 2025, 11:22 PM

#

AI Overviews is 1.5b if they were counting that. Per IO

#

I think Meta claims their AI usage is 1b (or close to that figure)