#general | Arena | Page 31

ocean vortex Apr 26, 2025, 8:03 PM

#

it's at a clear disadvantage against competition in some areas, and this includes all their reasoning models as they are based on it

#

I wouldn't go as far as that, as the trend lately has been downsizing... So maybe in the long run progress will catch up to it and it's not gonna be too small. But for now it does seem suboptimal, as people just can't get spatial awareness at a top level without a decently big model size

#

though we also need to keep in mind that we just do not know what could potentially be done with RL training using a notably bigger model. Maybe it's diminishing returns, but maybe the gains are actually more substantial

lavish orchid Apr 26, 2025, 8:10 PM

#

anyone else have the problem of Gemini 2.5 Pro not using terminal in Cursor?

#

Claude 3.7 runs it 20 times and does everything I ask it, Gemini 2.5 Pro is unsure and asks to install libraries etc

#

prompted it in user rules too no change

ocean vortex Apr 26, 2025, 8:10 PM

#

it would be unlikely to scale the same way as normal classic chat LLMs

#

so I would think "perfect size" for that is something different (bigger), even with the current metrics and their limitations

tall summit Apr 26, 2025, 8:18 PM

#

WHAT

#

AAAAAAAAA

small haven Apr 26, 2025, 8:19 PM

#

wen

#

u just want to see the world burn dont you 😭

ocean vortex Apr 26, 2025, 8:39 PM

#

how is this real... 😭💀

small haven Apr 26, 2025, 8:42 PM

#

it was half that last week

#

o1 pro is unlimited lessss gooo

wintry locust Apr 26, 2025, 9:00 PM

#

for o1 it's called juice

ocean vortex Apr 26, 2025, 9:07 PM

#

small haven o1 pro is unlimited lessss gooo

to my surprise I'm getting the same 8192 score nonsense on playground with completely empty user defined system prompt for o3-high. Now that's 1 way to ruin a model...

balmy mist Apr 26, 2025, 9:16 PM

#

we still aint got no new stuff?

ocean vortex Apr 26, 2025, 9:16 PM

#

people gonna have to worry about this stupid yap score now when trying to eval openai models 💀💀

balmy mist Apr 26, 2025, 9:16 PM

#

what happened to r2?

leaden palm Apr 26, 2025, 9:16 PM

#

balmy mist what happened to r2?

there were rumors about it today

#

probably wont be released super soon tho

#

polymarket: 9% chance in apr, 90% chance apr/may/jun

#

manifold: 25% chance in apr, 88% chance in apr/may, 93% chance in apr/may/june

ocean vortex Apr 26, 2025, 9:18 PM

#

leaden palm polymarket: 9% chance in apr, 90% chance apr/may/jun

there are only 3 days left in April though lol

leaden palm Apr 26, 2025, 9:19 PM

#

@keen beacon alt?

keen beacon Apr 26, 2025, 9:21 PM

#

ive never seen a model do this 🤣

oblique flint Apr 26, 2025, 9:21 PM

#

lavish orchid anyone else have the problem of Gemini 2.5 Pro not using terminal in Cursor?

It still kinda sucks in cursor unfortunately, just not as good at toolcalling

small haven Apr 26, 2025, 9:36 PM

#

leaden palm manifold: 25% chance in apr, 88% chance in apr/may, 93% chance in apr/may/june

that discrepancy in apr

#

how tf is this not being arbitraged
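For context, the April gap quoted above (9% on one market, 25% on the other) can be turned into a rough arbitrage margin. This is a minimal sketch that assumes both markets pay out 1 unit of the same real currency and charge no fees; that assumption likely does not hold here (Manifold trades largely in play money, as far as I know), which may be why the gap persists. The function name `arbitrage_margin` is hypothetical.

```python
def arbitrage_margin(p_yes_low: float, p_yes_high: float) -> float:
    """Risk-free profit per 1-unit payout, under idealized assumptions.

    Buy YES where it is cheap (p_yes_low) and NO where YES is expensive
    (NO costs 1 - p_yes_high). Exactly one of the two contracts pays out 1.
    Assumes no fees and a shared currency (both are assumptions).
    """
    cost = p_yes_low + (1.0 - p_yes_high)
    return 1.0 - cost


# Prices taken from the messages above: 9% vs 25% for an April release.
margin = arbitrage_margin(0.09, 0.25)
print(f"{margin:.2f}")  # prints 0.16
```

A positive margin only means the prices disagree; whether it can actually be captured depends on fees, liquidity, and whether the two markets resolve on identical criteria.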

keen beacon Apr 26, 2025, 9:49 PM

#

leaden palm @keen beacon alt?

@olive mesa hi twin

leaden palm Apr 26, 2025, 9:49 PM

#

ah forgot that exists

knotty jetty Apr 26, 2025, 10:00 PM

#

Is there a reason why they don’t include Claude’s web search function for the web search leaderboard

keen beacon Apr 26, 2025, 10:00 PM

#

yea to make sure it hits 1345 words. i didnt request that 🤣

knotty jetty Apr 26, 2025, 10:00 PM

#

Claude.ai but premium

#

Ok

#

But it’s only in premium just so you know

#

What do you mean

#

The lmarena api

#

Ok

#

So do you know why they haven’t added it

#

Oh

#

No it’s really good

#

I have perplexity pro and I feel like Claude has gotten many more answers right

#

For real

#

Oop

#

Well my dad uses perplexity too

#

Should I tell him to stop paying for it

elder rapids Apr 26, 2025, 10:04 PM

#

it's crazy how media is so far off from the reality of these products

#

and fake news is the reason why a lot of them still live

#

perplexity should ong be dead rn

#

dawg

#

why are you saying it's a scam

#

because you agree with me

#

😭

#

ye

#

that's what I'm saying

keen beacon Apr 26, 2025, 10:05 PM

#

u used to get like 600 messages a day on perplexity

knotty jetty Apr 26, 2025, 10:06 PM

#

My dad uses its API, he runs a music player company and he uses it to find creators numbers and their emails and tells them stuff. Is he cooked or nah

keen beacon Apr 26, 2025, 10:06 PM

#

with a lot of models, even if u dont use it for search

elder rapids Apr 26, 2025, 10:06 PM

#

knotty jetty My dad uses its API, he runs a music player company and he uses it to find creat...

nah

#

there's always going to be replacements

#

in the AI industry

#

maybe even made by openAI

knotty jetty Apr 26, 2025, 10:06 PM

#

elder rapids nah

No like do you think it’s getting it wrong

elder rapids Apr 26, 2025, 10:06 PM

#

knotty jetty No like do you think it’s getting it wrong

oh

#

depends

#

that kind of stuff is hard to get wrong

#

especially for an LLM

knotty jetty Apr 26, 2025, 10:07 PM

#

Ok thank god bro

keen beacon Apr 26, 2025, 10:07 PM

#

u cant disable it now?

#

thats dumb

knotty jetty Apr 26, 2025, 10:08 PM

#

My dad has been working on ts for 10 years and I’m scared ai is gonna screw him up

elder rapids Apr 26, 2025, 10:08 PM

#

knotty jetty My dad has been working on ts for 10 years and I’m scared ai is gonna screw him ...

nah it's alr

#

if he adapts

#

he'll be elevated

#

there's not much downsides

#

still depends

knotty jetty Apr 26, 2025, 10:08 PM

#

elder rapids if he adapts

Dw he’s already a millionaire but I’m just scared he is gonna be sad if this idea fails

elder rapids Apr 26, 2025, 10:09 PM

#

since it'll get to a point where not taking jobs

#

becomes unethical

elder rapids Apr 26, 2025, 10:09 PM

#

knotty jetty Dw he’s already a millionaire but I’m just scared he is gonna be sad if this ide...

he could be impacted yeah, but he might find a whole new interest in AI itself

#

who knows

#

ong

earnest parcel Apr 26, 2025, 10:10 PM

#

gemini is absolutely not impressed by sonnet 3.7:thinking chess moves it seems 😄

elder rapids Apr 26, 2025, 10:12 PM

#

earnest parcel gemini is absolutely not impressed by sonnet 3.7:thinking chess moves it seems ...

if you ask 2.5 pro to reason through the spatial task of the board + think like a grandmaster and focus on recollection, itll stomp these other models with the same prompt

knotty jetty Apr 26, 2025, 10:13 PM

#

elder rapids he could be impacted yeah, but he might find a whole new interest in AI itself

He’s in the screen burning industry and he doesn’t use ai for that but now he’s branching to a music player website and app which uses ai for song recommendations but I think he will be fine even if it fails because he is one of the leaders in screen burning services.

elder rapids Apr 26, 2025, 10:13 PM

#

it seems like the best chess model currently tbh, though as base

#

the gpt models

#

might be the best

#

especially 4.5

earnest parcel Apr 26, 2025, 10:13 PM

#

elder rapids if you ask 2.5 pro to reason through the spatial task of the board + think like ...

i am currently running them against one another, with full info and full reasoning, so I'll see. definitely a huge difference in behaviour though, gemini is confident and claude is constantly passive/doubting itself

elder rapids Apr 26, 2025, 10:13 PM

#

but when you tweak them

#

2.5 pro is the best

#

nahhh

knotty jetty Apr 26, 2025, 10:14 PM

#

elder rapids but when you tweak them

Are u talking bout studio

elder rapids Apr 26, 2025, 10:14 PM

#

the industry doesn't allow that

elder rapids Apr 26, 2025, 10:14 PM

#

knotty jetty Are u talking bout studio

AIstudio yeah

#

but I mean

#

prompting wise

#

not temp control

#

and other dev stuff

knotty jetty Apr 26, 2025, 10:14 PM

#

elder rapids not temp control

Temp is also important tho

elder rapids Apr 26, 2025, 10:14 PM

#

knotty jetty He’s in the screen burning industry and he doesn’t use ai for that but now he’s ...

nice

#

he'll find his way

knotty jetty Apr 26, 2025, 10:15 PM

#

elder rapids he'll find his way

Yeah

#

Disappointing

#

See this is why screen burning is gonna fail bro

elder rapids Apr 26, 2025, 10:16 PM

#

knotty jetty Temp is also important tho

ye but when it comes to thinking models this becomes more arbitrary, especially puzzle tasks and spatial tasks like chess

knotty jetty Apr 26, 2025, 10:17 PM

#

elder rapids ye but when it comes to thinking models this becomes more arbitrary, especially ...

Yeah I only use it for research tasks so I don’t face those issues

knotty jetty Apr 26, 2025, 10:19 PM

#

elder rapids nice

https://arenaprints.com/pages/pre-burned-screens-screen-burning-services-and-screen-print-screens

Arena Prints

Pre-burned Screens, Screen Burning Services and Screen Print Screens

Say goodbye to the headaches of print prep and hello to a streamlined process that allows you to fulfill more orders in less time. Our screen print screens are designed to simplify your workflow, so you can focus on what matters most: delivering exceptional products to your customers.

#

Proudly black owned business if that interests anyone at all💀

elder rapids Apr 26, 2025, 10:20 PM

#

send that to him not me 💯

knotty jetty Apr 26, 2025, 10:20 PM

#

Bro come on

keen beacon Apr 26, 2025, 10:20 PM

#

yea 🤣 qwq 32b preview

knotty jetty Apr 26, 2025, 10:20 PM

#

Blacks is wild, just call us black people bro😭

elder rapids Apr 26, 2025, 10:20 PM

#

knotty jetty Proudly black owned business if that interests anyone at all💀

why proudly smh

keen beacon Apr 26, 2025, 10:21 PM

#

they added a sh1t ton of rl on top of it probably more than qwq full tbh

meager sun Apr 26, 2025, 10:21 PM

#

👹 evil

knotty jetty Apr 26, 2025, 10:21 PM

#

Dw I don’t really care

elder rapids Apr 26, 2025, 10:21 PM

#

deadass

#

can't even say type sh**

#

you can suspect it, but not reasonably believe it imo

#

they're not in the same position DeepMind is

#

they don't have the data scientists

#

they don't have the researchers

#

ye but I think we can assume they have the researchers, but not insane data scientists

#

nah not in comparison

#

just as it is

#

ye ofc

#

but it's not necessarily sacrifice

#

in the way it's suggested

#

deepmind

#

anthropic

#

private institutions

#

universities

#

ion think anyone who works at these companies primarily subscribe to the ideals

#

ye

#

I would say only the really top guys

#

that represent those ideals

#

which is inherent to the ideology themselves

keen beacon Apr 26, 2025, 10:29 PM

#

twink

elder rapids Apr 26, 2025, 10:29 PM

#

I mean, if I were a standard worker

#

I wouldn't care about these things

#

I'm trying to work hard and get research in lmao

#

for money

elder rapids Apr 26, 2025, 10:31 PM

#

elder rapids in the way it's suggested

since generally, specific researchers aren't valued in the sense they can continually output high level stuff

#

but can output quality for the direction the company intends

#

ion know about the specific situation too much with Ilya

#

but that's prob what happened, and he likely shifted

ocean vortex Apr 26, 2025, 10:33 PM

#

no "secret sauce". Just a head start when it mattered + userbase and some really smart engineers. Funding helps as well ofc

keen fulcrum Apr 26, 2025, 10:33 PM

#

elder rapids Apr 26, 2025, 10:33 PM

#

keen fulcrum

#

this is the same thing someone sent earlier lmao

#

"(This content is from public information and is for reference only and does not constitute investment advice) Investment is risky, please be cautious when entering the market!"

#

just in a different format

#

or actually prob where they got it from

#

keen beacon Apr 26, 2025, 10:35 PM

#

damn u know chinese lol? or just guessed it immediately

elder rapids Apr 26, 2025, 10:35 PM

#

guessed it immediately

#

but I know a little Chinese

keen fulcrum Apr 26, 2025, 10:38 PM

#

I believe this to be the case
let's await next week and hopefully get some news
R2 and Qwen 3 are set to release soon

keen beacon Apr 26, 2025, 10:39 PM

#

the qwen 3 release seems to be significant, they did llama cpp prs/transformers prs/vllm prs/mobile apps/etc far before the release of qwen 3

keen fulcrum Apr 26, 2025, 10:39 PM

#

I believe Google will drop theirs soon after R2

keen beacon Apr 26, 2025, 10:40 PM

#

im hoping they release a qwen 3 reasoning model off the bat, but im most excited for new pretrained base models for fine-tuning, etc. qwen 2.5 was exceptional

keen fulcrum Apr 26, 2025, 10:40 PM

#

The coder models

elder rapids Apr 26, 2025, 10:41 PM

#

keen fulcrum I believe this to be the case let's await next week and hopefully get some news R...

it literally says it's not a leak, just an accumulation of already public information

#

which undermines its value as a concept stock

#

since it's not new Information

keen beacon Apr 26, 2025, 10:41 PM

#

yea theyre releasing smaller models too

#

maybe a 32b alternative but moe so itll inference faster

elder rapids Apr 26, 2025, 10:42 PM

#

I truly don't think it's going to get that much better from deepseek

keen fulcrum Apr 26, 2025, 10:42 PM

#

Oh indeed browser integrated llms soon to be the next thing

elder rapids Apr 26, 2025, 10:42 PM

#

get rid of 2.5 pro, get rid of o3s and o4 minis release

#

let r2 release

#

do you seriously think the gap would've become THAT wide

keen beacon Apr 26, 2025, 10:43 PM

#

i dont know what to expect with r2 tbh

elder rapids Apr 26, 2025, 10:43 PM

#

without those 2 crazy releases

#

nah

keen fulcrum Apr 26, 2025, 10:43 PM

#

There is the possibility that qwen 3 will outperform R2, let's see

elder rapids Apr 26, 2025, 10:43 PM

#

I can't believe deepseek would've accomplished that

keen beacon Apr 26, 2025, 10:43 PM

#

i dont think r2 will outperform 2.5 pro at least in simpleqa i think

elder rapids Apr 26, 2025, 10:43 PM

#

let alone at the level of o3

keen beacon Apr 26, 2025, 10:44 PM

#

i use 2.5 pro on stuff that requires a lot of world knowledge/niche world knowledge

#

its exceptional (compared to other reasoning models)

elder rapids Apr 26, 2025, 10:44 PM

#

especially when r1 wasn't really that good

keen fulcrum Apr 26, 2025, 10:44 PM

#

elder rapids especially when r1 wasn't really that good

R1 forced the industry to release their newest models

elder rapids Apr 26, 2025, 10:44 PM

#

keen fulcrum R1 forced the industry to release their newest models

that's straight up propaganda

#

😭 🙏

keen fulcrum Apr 26, 2025, 10:45 PM

#

Why?
AI got significantly better as soon as R1 released

elder rapids Apr 26, 2025, 10:45 PM

#

keen fulcrum Why? AI got significantly better as soon as R1 released

nothing occurred when deepseek released that was meaningful

#

that's literally impossible

#

the time period is too narrow

#

that means they weren't planning on releasing o3 mini after the announcement

#

and it takes a ton of time

#

for them to prepare it

#

won't release it on a whim like that

#

unless it's truly done

#

especially with how integrated it was

#

take a look at 2.0 flash thinking

#

lol

#

same thing

keen beacon Apr 26, 2025, 10:47 PM

#

it defo was the reason and the reason why they started working on improving the reasoning summary

elder rapids Apr 26, 2025, 10:47 PM

#

keen beacon it defo was the reason and the reason why they started working on improving the ...

this is exactly the only thing they did tho

keen beacon Apr 26, 2025, 10:48 PM

#

i dont recall r1/o3 mini timelines that much tbh ive no idea about timeline

elder rapids Apr 26, 2025, 10:48 PM

#

since people were whining about it

#

you cant attribute any of these AI things to the release of deepseek r1

#

timelines don't add up

#

understanding what even goes on behind these ai

#

and what entails the integration

keen beacon Apr 26, 2025, 10:48 PM

#

vision model and btw thats a bad way to test lol

brittle tiger Apr 26, 2025, 10:50 PM

#

grok has good team from what i can tell. i heard they pay way more than other labs, basically a working for elon tax

elder rapids Apr 26, 2025, 10:50 PM

#

probably ye

#

but

#

the thing is, they're not as old

#

as other labs

torn mantle Apr 26, 2025, 10:51 PM

#

I cant get enough of o3

elder rapids Apr 26, 2025, 10:51 PM

#

and that's a serious factor

elder rapids Apr 26, 2025, 10:51 PM

#

torn mantle I cant get enough of o3

I can, ts cannot comprehend a thing im saying 😭 🙏

#

jkjkjk

#

nahhh

#

I think it's actually starting

#

you cant get the jump from 2.0 flash thinking to 2.5 pro

#

without a major breakthrough

#

oh

#

wait wym?

#

era of huge growth

#

oh ye

#

if r2 doesn't close the gap tbh

#

I can reasonably assume

#

open source is going to be pretty bad

#

for a little while

#

until they come up with something

keen beacon Apr 26, 2025, 10:54 PM

#

elder rapids open source is going to be pretty bad

nah the qwen team will deliver

#

its their time this time

#

hopefully

#

im not sure about deepseek but the qwen chat website was updated with strings of a qwen plus sub with video gen, image gen, access to qwen 3, etc

knotty jetty Apr 26, 2025, 10:56 PM

#

elder rapids for a little while

Yo what ai should I use for research rn

#

I’ve tried everything aside from deepseek tbh

#

No like not deep research

#

Just like general search

#

What’s tavily?

#

How do I use it

keen beacon Apr 26, 2025, 11:00 PM

#

bruh if u pay those prices lol

#

paying for o4 mini/o3's api and enabling first party web tools, etc on the api is more reasonable tbh

ember rapids Apr 26, 2025, 11:09 PM

#

Sam did say r1 made them move several releases

#

I wonder what impact r2 will have

earnest parcel Apr 26, 2025, 11:10 PM

#

elder rapids if you ask 2.5 pro to reason through the spatial task of the board + think like ...

https://lichess1.org/game/export/gif/white/mLb7m0Zn.gif?theme=brown&piece=cburnett

The game between Gemini 2.5 Pro Preview and Claude 3.7 Sonnet Thinking finally concluded, and with 8 and 7 blunders respectively, ultimately ended in a draw!

elder rapids Apr 27, 2025, 12:29 AM

#

knotty jetty Yo what ai should I use for research rn

grok or AI studio 2.5 pro grounding

elder rapids Apr 27, 2025, 12:30 AM

#

earnest parcel https://lichess1.org/game/export/gif/white/mLb7m0Zn.gif?theme=brown&piece=cburne...

probably to be expected without any prompting

#

I got 2.5 pro to play at around 1900~ ish

#

as that's below my elo

small haven Apr 27, 2025, 12:33 AM

#

bro what is happening to chatgpt, everything is 64k context max

willow grail Apr 27, 2025, 12:40 AM

#

Cervical Spine Risk: Rotating your head 180 degrees is generally not recommended. It puts significant stress on the cervical vertebrae, discs, ligaments, and potentially the vertebral arteries that run through the neck bones to supply the brain. Doing this forcefully or if you have underlying neck issues (even unknown ones) could risk injury, nerve impingement, or (rarely) vascular problems causing dizziness or pain.

olive mesa Apr 27, 2025, 12:50 AM

#

keen beacon @olive mesa hi twin

hi!

earnest parcel Apr 27, 2025, 12:52 AM

#

elder rapids probably to be expected without any prompting

this is with prompting, and also there is no way 2.5 Pro (or any language model) comes even remotely close to 1900 Elo. I have tested and played matches and tournaments around 200 times by now (using all types of different methods), and the strongest any LLM ever got was GPT-3.5 Instruct in movetext continuation (aka chess notation recall from training data). Other language models play more in the 400 Elo range, even SOTA.
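To illustrate what "movetext continuation" means: the game so far is written out as PGN-style movetext and a completion model is asked to continue the text, so its next tokens are read as the next move. The sketch below only builds such a prompt string. The helper names `to_movetext` and `continuation_prompt` are hypothetical, and the exact prompt format used in the tests described above is not given in the chat, so this format is an assumption.

```python
def to_movetext(san_moves: list[str]) -> str:
    """Format a list of SAN moves as PGN-style movetext, e.g. '1. e4 c5 2. Nf3'."""
    parts = []
    for i, move in enumerate(san_moves):
        if i % 2 == 0:
            # White's move starts a new numbered full move.
            parts.append(f"{i // 2 + 1}.")
        parts.append(move)
    return " ".join(parts)


def continuation_prompt(san_moves: list[str]) -> str:
    """Build a prompt that ends exactly where the next move should be written.

    If White is to move, the next move number is appended so the model
    continues with White's move; if Black is to move, the text already
    ends after White's move.
    """
    text = to_movetext(san_moves)
    if len(san_moves) % 2 == 0:
        next_number = f"{len(san_moves) // 2 + 1}."
        text = f"{text} {next_number}" if text else next_number
    return text


print(continuation_prompt(["e4", "c5"]))  # prints: 1. e4 c5 2.
```

The string would then be sent to a text-completion endpoint and the first legal-looking SAN token in the reply taken as the move; that step is left out here because it depends on the specific model API.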

keen beacon Apr 27, 2025, 12:52 AM

#

olive mesa hi!

where were you 💔

olive mesa Apr 27, 2025, 12:53 AM

#

keen beacon where were you 💔

my discord was muted 💔

keen beacon Apr 27, 2025, 12:54 AM

#

don't leave me like that smh..

olive mesa Apr 27, 2025, 12:55 AM

#

sorry lol.. ill try not to in the future

keen beacon Apr 27, 2025, 12:58 AM

#

(i'm joking)

#

don't stress 😭

olive mesa Apr 27, 2025, 1:04 AM

#

lmao ok 😭

neat apex Apr 27, 2025, 1:05 AM

#

earnest parcel this is with prompting, and also there is no way 2.5 Pro (or any language model)...

By my calculations gpt 4o is at 1000 and o3 mini at 1400 at THE MOST (likely 250 lower)

#

Relying a lot on book moves or gimmick moves anyway

earnest parcel Apr 27, 2025, 1:05 AM

#

neat apex By my calculations gpt 4o is at 1000 and o3 mini at 1400 at THE MOST (likely 25...

tell me where to play against 1000 elo gpt 4o, and I'll show you it's not nearly 1000 elo

neat apex Apr 27, 2025, 1:06 AM

#

Ah yes, at bullet

#

I am describing the most optimistic scenario ever

#

Who said 1900?

neat apex Apr 27, 2025, 1:07 AM

#

earnest parcel tell me where to play against 1000 elo gpt 4o, and I'll show you it's not nearly 100...

I put some games I found into the analysis and it said 1000 elo

#

For some reason at defense it holds very well

earnest parcel Apr 27, 2025, 1:08 AM

#

neat apex I put some games I found into the analysis and it said 1000 elo

which analysis?

neat apex Apr 27, 2025, 1:08 AM

#

That chess.com one that gives the estimated rating

#

Just like the gif shows, at developing it is great but in the endgame it falls apart

#

That must be the reason it gave a high value

earnest parcel Apr 27, 2025, 1:12 AM

#

neat apex That must be the reason it gave a high value

I don't know of any system that can take a game and determine "Elo" based on it.... that would be super inaccurate. Elo is based on your opponents' strength and the outcome, not on how good your moves looked in isolation..... Either way, I have tested a ton, and recorded a lot of games, and most SOTA models play around 400 Elo level (when compared to Lichess opponents), and are unable to beat the weakest Stockfish 14 level (sub 800 Elo)
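The point about Elo depending on opponent strength and results can be made concrete with the standard Elo expected-score formula. This is a minimal sketch of textbook Elo, not the rating system of any particular site; the K-factor of 20 is an illustrative assumption, and the function names are hypothetical.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Expected score (between 0 and 1) of a player against one opponent."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def update_rating(rating: float, opponent: float, score: float, k: float = 20.0) -> float:
    """New rating after one game.

    score is 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    k=20 is an assumed K-factor; real systems pick their own values.
    """
    return rating + k * (score - expected_score(rating, opponent))


# A 400-rated player facing an 800-rated opponent is expected to score 1/11.
print(round(expected_score(400, 800), 3))
```

Only the opponent's rating and the game result enter the update, which is why a rating inferred from move quality in a single game is an estimate of something different.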

neat apex Apr 27, 2025, 1:12 AM

#

Oh

#

What a twist

I think the 1000 elo number is somewhat accurate but limited to hard matches, since chess youtubers played against it

#

And for some reason it plays way better in the middle game, likely due to memorizing moves

elder rapids Apr 27, 2025, 1:28 AM

#

earnest parcel this is with prompting, and also there is no way 2.5 Pro (or any language model)...

yeah this isn't true at all lmao, minimum 1400 if you prompt it right + urge it to use opening repertoire, emphasizing move recollection, these models can easily be that good. can you give me your prompts lol

#

you either don't know what level they really are playing at with lack of experience in chess or you don't know how to prompt

#

has to be one of those two

earnest parcel Apr 27, 2025, 1:29 AM

#

elder rapids you either don't know what level they really are playing at with lack of experie...

talk is cheap, give me your 1400 minimum ELO prompt, and I can directly show you its not 1400 ELO.

#

(also you just lowered your threshold by 500 Elo, impressive)

elder rapids Apr 27, 2025, 1:30 AM

#

yk what just get up a game lmao

elder rapids Apr 27, 2025, 1:30 AM

#

earnest parcel (also you just lowered your threshold by 500 Elo, impressive)

no?

#

that's an entirely different claim lmao

#

I'm not saying 2.5 pro plays at that level

raven void Apr 27, 2025, 1:31 AM

#

GPT 5 at home

elder rapids Apr 27, 2025, 1:31 AM

#

I'm saying if you urge recollection, it'll be at least 1400

#

not restrictive to 2.5 pro

earnest parcel Apr 27, 2025, 1:31 AM

#

#general message

I got 2.5 pro to play at around 1900~ ish

elder rapids Apr 27, 2025, 1:31 AM

#

and the longer it goes

#

it'll definitely deteriorate

elder rapids Apr 27, 2025, 1:31 AM

#

earnest parcel https://discord.com/channels/1340554757349179412/1340554757827461211/13658475011...

great now can you quote the claim made in the larger passage

earnest parcel Apr 27, 2025, 1:31 AM

#

really? so I can get any model to play 2k elo also. they played E4

elder rapids Apr 27, 2025, 1:31 AM

#

earnest parcel really? so I can get any model to play 2k elo also. they played E4

no, it loses too much context

earnest parcel Apr 27, 2025, 1:33 AM

#

well unlike you I already showed that I have collected data (169 games as of now, across multiple modes). would love to see anything besides baseless claims about your 1900 or 1400 elo LLMs

elder rapids Apr 27, 2025, 1:33 AM

#

earnest parcel really? so I can get any model to play 2k elo also. they played E4

c5

elder rapids Apr 27, 2025, 1:42 AM

#

earnest parcel well unlike you I already provided I have collected data (169 games as of now, b...

🤷

#

just saying

#

not a lot of people are that good at understanding models

#

entire runs can easily be invalidated if you don't adjust prompt techniques respective to the model

earnest parcel Apr 27, 2025, 1:44 AM

#

i am not interested in troll statements. either provide proof for baseless claims or I got nothing to discuss with you

elder rapids Apr 27, 2025, 1:44 AM

#

wym?

#

I thought we were playing it already

#

that's why I said c5 lmao

earnest parcel Apr 27, 2025, 1:44 AM

#

saying c5 is not proof of LLM playing 1400 or previously claimed 1900 elo.

elder rapids Apr 27, 2025, 1:45 AM

#

earnest parcel saying c5 is not proof of LLM playing 1400 or previously claimed 1900 elo.

I'll just send all the thought processes

#

and outputs

#

lmao

#

it's not that deep

#

dude actually blocked me lmaoooo

#

😭

#

anyone want to go against 2.5 pro?

#

just for fun, and for the sake of testing

keen beacon Apr 27, 2025, 1:52 AM

#

They need to delete this new 4o personality

elder rapids Apr 27, 2025, 1:53 AM

#

keen beacon They need to delete this new 4o personality

ye

#

it's creative

#

but saying anything to it poisons the well

#

adjusting to the user is cool, that's what I like about 2.5 pro

#

but goddamn

#

I have to prompt it everytime I talk to it

#

to be the way I like it

keen beacon Apr 27, 2025, 1:56 AM

#

2.5 pro is great to talk to in comparison. They are trying way too hard with the new 4o etc. I didn't think they'd keep trying to force it this hard

elder rapids Apr 27, 2025, 1:57 AM

#

keen beacon 2.5 pro is great to talk to in comparison. They are trying way too hard with the...

I'm seeing people say they really like how warm it is

#

and how it's better than 3.7 sonnet

#

etc

#

but it's kind of surreal

keen beacon Apr 27, 2025, 1:57 AM

#

Maybe they like the sycophancy lol

elder rapids Apr 27, 2025, 1:57 AM

#

knowing how synthetic 4o is

#

compared to the sonnet models

leaden palm Apr 27, 2025, 1:58 AM

#

elder rapids I'm seeing people say they really like how warm it is

where

small haven Apr 27, 2025, 1:58 AM

#

what they need to do is release

#

o3 pro

elder rapids Apr 27, 2025, 1:58 AM

#

leaden palm where

subreddits and stuff

elder rapids Apr 27, 2025, 1:58 AM

#

small haven o3 pro

ong

#

back to the pro plan I go

#

🙏

small haven Apr 27, 2025, 1:58 AM

#

its gon be worth it

#

its worth it rn but i want worthier

leaden palm Apr 27, 2025, 1:59 AM

#

if o3 is deep research lite whats o3 pro

elder rapids Apr 27, 2025, 1:59 AM

#

if Google releases another model after that tho

#

ion know what I'm gonna do

#

imagine anthropic releases 4.0

leaden palm Apr 27, 2025, 1:59 AM

#

elder rapids if Google releases another model after that tho

May 20

small haven Apr 27, 2025, 1:59 AM

#

leaden palm if o3 is deep research lite whats o3 pro

o4 mini is deep research lite lol

#

o3 is deep research

elder rapids Apr 27, 2025, 1:59 AM

#

leaden palm May 20

ye io

small haven Apr 27, 2025, 1:59 AM

#

o3 pro is deep anus research

elder rapids Apr 27, 2025, 2:00 AM

#

have you guys tested the deep researches enough

#

I don't use the Gemini one or the openAI one that much

small haven Apr 27, 2025, 2:00 AM

#

yes

elder rapids Apr 27, 2025, 2:00 AM

#

so I'm not sure which one is better

leaden palm Apr 27, 2025, 2:00 AM

#

somehow my stupid scaffold with exa and gemini flash+pro outperforms all the other free ones

small haven Apr 27, 2025, 2:01 AM

#

elder rapids so I'm not sure which one is better

oai deep research is meta rn

#

nothing beats it

elder rapids Apr 27, 2025, 2:01 AM

#

leaden palm somehow my stupid scaffold with exa and gemini flash+pro outperforms all the oth...

damn

elder rapids Apr 27, 2025, 2:01 AM

#

small haven nothing beats it

have you tried the Gemini one?

#

or nah

small haven Apr 27, 2025, 2:01 AM

#

elder rapids have you tried the Gemini one?

yes ..

#

obviously

elder rapids Apr 27, 2025, 2:01 AM

#

you have to get the subscription ig

#

to even use it

keen beacon Apr 27, 2025, 2:01 AM

#

What do y'all usually use deep research for

small haven Apr 27, 2025, 2:01 AM

#

elder rapids to even use it

its free tiral

#

*trial first month

elder rapids Apr 27, 2025, 2:01 AM

#

small haven its free tiral

oh ye

small haven Apr 27, 2025, 2:02 AM

#

but it just spits out mumbo jumbo, not wurf

elder rapids Apr 27, 2025, 2:02 AM

#

wym?

#

its done what I've asked it

#

very well

small haven Apr 27, 2025, 2:02 AM

#

oai deep research has high entropy info for every sentence

#

gemini dr is just stale and a bunch of unneeded detail

leaden palm Apr 27, 2025, 2:03 AM

#

keen beacon What do y'all usually use deep research for

elder rapids Apr 27, 2025, 2:03 AM

#

small haven gemini dr is just stale and a bunch of unneeded detail

alr give me a prompt

small haven Apr 27, 2025, 2:03 AM

#

elder rapids alr give me a prompt

like any prompt

elder rapids Apr 27, 2025, 2:04 AM

#

actually nah

#

I'll just read back

#

and do one for 2.5 pro with the same prompt

keen beacon Apr 27, 2025, 2:04 AM

#

leaden palm

Hmm what are you expecting out of "demo results" btw lol

leaden palm Apr 27, 2025, 2:05 AM

#

keen beacon Hmm what are you expecting out of "demo results" btw lol

it just loads a pregenerated one (im the one who set up that ui)

elder rapids Apr 27, 2025, 2:05 AM

#

I disagree a ton with the density of info, but the formatting seems more consistent in openAI DR

#

they seem too similar to compare on that part, or its unnecessary comparison

#

since there's necessarily limited amount of info for a topic

small haven Apr 27, 2025, 2:06 AM

#

elder rapids I disagree a ton with the density of info, but the formatting seems more consist...

u are pure coping, its not

#

gemini dr is like a student trying to fill up the minimum word count

elder rapids Apr 27, 2025, 2:07 AM

#

small haven u are pure coping, its not

wym coping? if I were asking about that I wouldn't be dismissing what you said lmao

small haven Apr 27, 2025, 2:08 AM

#

oh mb i read it wrong

elder rapids Apr 27, 2025, 2:08 AM

#

I'm asking which one is better, or which one sufficiently describes the information, not summarize it

small haven Apr 27, 2025, 2:08 AM

#

sorry

#

yea i agree

elder rapids Apr 27, 2025, 2:09 AM

#

in openAI DR it fetches good asf insight

#

and I've seen the same with Geminis

#

Gemini seems to verbosify a ton tho

#

but it doesn't seem like I'm getting less info

#

just more yap

leaden palm Apr 27, 2025, 2:10 AM

#

small haven gemini dr is like a student trying to fill up the minimum word count

meanwhile perplexity:

small haven Apr 27, 2025, 2:10 AM

#

"prioritize verbosity" loool

elder rapids Apr 27, 2025, 2:10 AM

#

prioritize verbosity 😭🙏

#

nerfing the search

small haven Apr 27, 2025, 2:26 AM

#

well great o3 solved a unit test where o1 pro couldnt... nice

raven void Apr 27, 2025, 2:27 AM

#

o3 is just so good

#

Gemini got the answer to my problem wrong even on the meta synthesis only o3 solved it

small haven Apr 27, 2025, 2:30 AM

#

yea..

#

ok so wen full o4 tho 😭

patent bane Apr 27, 2025, 2:30 AM

#

raven void Gemini got the answer to my problem wrong even on the meta synthesis only o3 sol...

whats the problem

raven void Apr 27, 2025, 2:31 AM

#

when they have o5 pro internally 🤣

raven void Apr 27, 2025, 2:31 AM

#

patent bane whats the problem

code for a part of my project

small haven Apr 27, 2025, 2:31 AM

#

to be working at oai, that must be insane

woeful geyser Apr 27, 2025, 3:27 AM

#

I feel that the vibe of o3 is too easy to recognize as well.

fleet lintel Apr 27, 2025, 4:49 AM

#

April was kind of disappointing.. no new top model was released. bunch of hype but nothing materialised.

earnest parcel Apr 27, 2025, 5:07 AM

#

fleet lintel April was kind of disappointing.. no new top model was released. bunch of hype b...

yea compared to march where we got Gemma 3 27B, Mistral Small 3.1, QwQ, Gemini 2.5 Pro, and Nemotron 49B, it's quite an uneventful month.
Llama 4 and GPT-4.1 are kind of duds, at least for me.

plain zinc Apr 27, 2025, 5:19 AM

#

fleet lintel April was kind of disappointing.. no new top model was released. bunch of hype b...

Gemini 2.5 Pro, o3, o4-mini-high: 🤨🤨🤨

drifting thorn Apr 27, 2025, 5:27 AM

#

2.5 Pro is the SOTA...

solar nebula Apr 27, 2025, 5:28 AM

#

yes

earnest parcel Apr 27, 2025, 5:29 AM

#

03-25, it doesn't count as April release (even though it got rebranded from exp to preview)

fleet lintel Apr 27, 2025, 5:30 AM

#

plain zinc Gemini 2.5 Pro, o3, o4-mini-high: 🤨🤨🤨

2.5 pro was in march. and nothing really better came after that

#

o3, o4-mini-high : disappointing. OAI tried to play game by releasing a bit prematurely to take on google but honestly they are just meh compared to 2.5 pro

plain zinc Apr 27, 2025, 6:23 AM

#

fleet lintel 2.5 pro was in march. and nothing really better came after that

Oh, right. It was in March.

#

https://x.com/seti_park/status/1915996238979453023

SETI Park (@seti_park) on X

1️⃣ US20250131254A1:

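A patent from @GoogleDeepMind on "function-preserving neural network expansion". It presents an innovative way to scale up an AI model, or specialize it for a specific purpose, without losing previously learned knowledge. This is like how a person learning new knowledge ...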

#

From this post, you can understand where Google gets several models in LMarena.

#

So much more in such large numbers

calm sequoia Apr 27, 2025, 7:04 AM

#

raven void GPT 5 at home

Where is this from and what is synthesis

torn mantle Apr 27, 2025, 7:11 AM

#

@balmy mist i think i may be right again, we may get r2 this monday / next week

#

https://x.com/ClementDelangue/status/1916345020791001181

clem 🤗 (@ClementDelangue) on X

👀👀👀 https://t.co/mekr0Drodq

#

lot of hints as well

small haven Apr 27, 2025, 7:49 AM

#

day 11 , where is o3 pro

#

lemme guess when r2 gets released? smh

hardy pecan Apr 27, 2025, 7:53 AM

#

small haven day 11 , where is o3 pro

he said a few weeks. have more patience.

fleet lintel Apr 27, 2025, 8:00 AM

#

does anyone know of insider Apple info? Are they planning to compete in AI space at all?

mossy drum Apr 27, 2025, 8:12 AM

#

New model in Arena: llama-4-scout-17b-16e-instruct

keen fulcrum Apr 27, 2025, 8:20 AM

#

https://arstechnica.com/tech-policy/2025/04/elon-musks-xai-accused-of-lying-to-black-communities-about-harmful-pollution/

Ars Technica

Thermal imaging shows xAI lied about supercomputer pollution, group...

xAI faces calls to deny permits to power gas turbines at supercomputer facility.

small haven Apr 27, 2025, 8:24 AM

#

hardy pecan he said a few weeks. have more patience.

easy to say when ur on a plus sub

hardy pecan Apr 27, 2025, 8:25 AM

#

small haven easy to say when ur on a plus sub

Complaining about things you can't change is a wasted effort. Would be good to see it out at the end of this week, provided they are on time

sage raptor Apr 27, 2025, 8:28 AM

#

keen fulcrum https://arstechnica.com/tech-policy/2025/04/elon-musks-xai-accused-of-lying-to-b...

Pollution is everywhere

unborn ocean Apr 27, 2025, 9:01 AM

#

plain zinc https://x.com/seti_park/status/1915996238979453023

They had this research for a while now apparently: https://arxiv.org/pdf/2308.06103

#

Maybe they already used it for the early Gemini 2.0 iterations back in 2024 (Like all the ones that were aistudio only)

unborn ocean Apr 27, 2025, 10:42 AM

#

Poll: Deepseek R2 before May?

Winning answer: No ❌ (7 of 17 votes)

ocean vortex Apr 27, 2025, 11:33 AM

#

small haven it was half that last week

for what model? I'm absolutely blown away after I found out this silly score applies to playground and API too. If what you are saying is true that means any benchmarks that people did earlier might not even be possible to reproduce now. Changing stuff like that can always have unintended consequences, even if in theory higher value should be better. There is no place in API for crazy sht like that lol

ocean vortex Apr 27, 2025, 12:07 PM

#

full kite Apr 27, 2025, 12:09 PM

#

Guys I'm better

vagrant field Apr 27, 2025, 12:26 PM

#

hi all

#

hi !

#

depends , but I use gemini*, o4-mini-high , and claude 3.*

#

gemini 2.5 pro and claude 3.7

alpine coral Apr 27, 2025, 12:35 PM

#

ocean vortex for what model? I'm absolutely blown away after I found out this silly score app...

the api was what i found most curious about it when we were discussing it yesterday.. you could have been blown away 12hrs earlier if you read my messages lol

alpine coral Apr 27, 2025, 12:35 PM

#

ocean vortex

this is consistent with oai's chain of command i guess

#

System / platform (oai)
Developer (who can add what we call a system message)
User

#

if there's a conflict, the higher one has authority

#

as it explains in its reasoning (and hence why it said 8192; the 2903 in the developer message was overridden)
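
#

A minimal sketch of that precedence rule, assuming each layer can set the same named setting and the higher layer wins on a conflict. The `Instruction` type, the `resolve` helper and the `yap_score` key are made up for illustration; this is not a claim about how oai actually implements it.

```python
from dataclasses import dataclass

# Lower number = higher authority (order taken from the list above).
AUTHORITY = {"platform": 0, "developer": 1, "user": 2}


@dataclass
class Instruction:
    role: str   # "platform", "developer" or "user"
    key: str    # hypothetical setting name
    value: int


def resolve(instructions):
    """Return one value per key, letting the highest-authority role win."""
    resolved = {}
    # Apply the lowest authority first so higher layers overwrite it.
    for ins in sorted(instructions, key=lambda i: AUTHORITY[i.role], reverse=True):
        resolved[ins.key] = ins.value
    return resolved


settings = resolve([
    Instruction("developer", "yap_score", 2903),
    Instruction("platform", "yap_score", 8192),
])
print(settings["yap_score"])  # 8192
```

With this ordering the developer value only survives when the platform layer says nothing about that key.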

keen beacon Apr 27, 2025, 1:10 PM

#

this seems about right

brittle tiger Apr 27, 2025, 1:55 PM

#

Is yap score unique for each user preference or are they just changing it for everyone on the fly looking for a sweet spot?

alpine coral Apr 27, 2025, 2:07 PM

#

keen beacon this seems about right

i think they might be from the same platform-level prompt (oai's instructions, which [are meant to] override anything given in the Developer Prompt (API) as well as end user messages)

#

the 8192 comes from the platform-level prompt (as do, I think, those instructions about mirroring style etc)

knotty jetty Apr 27, 2025, 2:10 PM

#

alpine coral the 8192 comes from the platform-level prompt (as do, I think, those instruction...

What website is this

alpine coral Apr 27, 2025, 2:10 PM

#

https://platform.openai.com/playground/prompts?models=o3

knotty jetty Apr 27, 2025, 2:11 PM

#

Ok thanks bro

alpine coral Apr 27, 2025, 2:11 PM

#

np

brittle tiger Apr 27, 2025, 2:48 PM

#

full kite Apr 27, 2025, 3:10 PM

#

what the fk is a yap score

#

I NEED SMART MIND HERE Please

#

I have this test tomorrow

#

It's like the final test of the year right

#

But it's a mock one

#

Idk sht about what the subjects are

#

I don't go to classes and everything

#

What would be the best method to know how to solve them

#

Like I have access to all the past mock tests from the last 4 years

#

and the correction of them

#

And also videos about ppl solving them

#

Like Idk what to do chat

#

Please 😭

keen beacon Apr 27, 2025, 3:19 PM

#

Get ready @full kite

full kite Apr 27, 2025, 3:19 PM

#

keen beacon Get ready @full kite

ready for what

full kite Apr 27, 2025, 3:20 PM

#

keen beacon Get ready @full kite

Are you a smart person

keen beacon Apr 27, 2025, 3:20 PM

#

full kite ready for what

To be unemployed

leaden palm Apr 27, 2025, 3:20 PM

#

full kite Like I have access to all the past mock tests from the last 4 years

read through them then

full kite Apr 27, 2025, 3:21 PM

#

keen beacon To be unemployed

😭 😭 😭 😭

full kite Apr 27, 2025, 3:21 PM

#

leaden palm read through them then

And then what

leaden palm Apr 27, 2025, 3:21 PM

#

see what you don't know

#

what you don't understand

#

look for patterns in that

keen beacon Apr 27, 2025, 3:22 PM

#

ur in ai discord bro just cheat

#

all my classes online

#

4.0 gpa

full kite Apr 27, 2025, 3:22 PM

#

keen beacon 4.0 gpa

Is that good

keen beacon Apr 27, 2025, 3:22 PM

#

4.0 gpa out of 4

#

thats the highest u can get

full kite Apr 27, 2025, 3:23 PM

#

that's like good

keen beacon Apr 27, 2025, 3:23 PM

#

ye

#

i have a spare laptop i put in the corner

#

then use parsec on main desktop

#

remote into it

#

gemini free trial gives u unlimited screenshots too

flint sand Apr 27, 2025, 3:24 PM

#

keen beacon 4.0 gpa

won't work for people with relative grading

#

also don't cheat dude

keen beacon Apr 27, 2025, 3:24 PM

#

flint sand won't work for people with relative grading

whats that

full kite Apr 27, 2025, 3:24 PM

#

keen beacon gemini free trial gives u unlimited screenshots too

what free trial are you talking about

#

I'm using google ai studio

flint sand Apr 27, 2025, 3:24 PM

#

keen beacon whats that

your gpa isn't directly based on your marks, it's based on how much you scored relative to the highest marks obtained in class
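
#

Roughly like this, assuming the simplest version where your mark is scaled against the class top score and mapped onto a 4.0 scale. Real schemes (curves, percentile bands) differ by school, so `relative_gpa` is just a hypothetical helper.

```python
def relative_gpa(score, class_scores, scale=4.0):
    """Scale a raw score against the highest score in the class."""
    top = max(class_scores)
    if top <= 0:
        raise ValueError("top score must be positive")
    return round(scale * score / top, 2)


# Same raw mark, different result depending on how the best student did.
print(relative_gpa(72, [72, 80, 90]))  # 3.2
print(relative_gpa(72, [60, 65, 72]))  # 4.0
```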

keen beacon Apr 27, 2025, 3:25 PM

#

full kite I'm using google ai studio

go to https://gemini.google.com/app

top right corner you should see something like this

click it

1 month free trial no payment

cancel before they charge u

Gemini

‎Gemini

Bard is now Gemini. Get help with writing, planning, learning, and more from Google AI.

#

p sure ai studio's 2.5 pro without premium is only like 500k tokens then you get rate limited for a day

leaden palm Apr 27, 2025, 3:29 PM

#

keen beacon go to https://gemini.google.com/app top right corner you should see something l...

or use the openrouter + vertex + ai studio + copilot stack

#

or use the official student free trial if you're in university

full kite Apr 27, 2025, 3:30 PM

#

keen beacon go to https://gemini.google.com/app top right corner you should see something l...

bro what tf

#

Dude 2.5 pro is free and 1 million token per chat

#

what is pro even about 🙏 😭

#

I have the flash one too

#

faster

leaden palm Apr 27, 2025, 3:31 PM

#

keen beacon p sure ai studio's 2.5 pro without premium is only like 500k tokens then you get...

2.5 pro in ai studio is 250k/min and 25/day

full kite Apr 27, 2025, 3:31 PM

#

#

gemini.google.com is a scam

#

ong

leaden palm Apr 27, 2025, 3:32 PM

#

full kite Apr 27, 2025, 3:32 PM

#

what does RPM means

leaden palm Apr 27, 2025, 3:32 PM

#

requests per minute

leaden palm Apr 27, 2025, 3:32 PM

#

leaden palm

actually maybe ai studio has higher limits than the ai studio api... idk

full kite Apr 27, 2025, 3:33 PM

#

leaden palm requests per minute

so

#

GUYS

#

I need to study jessu

#

please help me

leaden palm Apr 27, 2025, 3:34 PM

#

full kite I need to study jessu

what

#

jesus?

full kite Apr 27, 2025, 3:34 PM

#

yeahuh

#

help me studying

leaden palm Apr 27, 2025, 3:34 PM

#

lmao

full kite Apr 27, 2025, 3:34 PM

#

😡

ocean vortex Apr 27, 2025, 3:39 PM

#

alpine coral the api was what i found most curious about it when we were discussing it yester...

yeah sorry didn't see that 👀

ocean vortex Apr 27, 2025, 3:40 PM

#

brittle tiger Is yap score unique for each user preference or are they just changing it for ev...

I think it's more on the fly thing, especially if this was at 4k earlier. I also very much doubt it was evaled with this in the context. But regardless it's unwanted flood for API as far as I'm concerned which makes "developer message" much less powerful and relevant

frail thorn Apr 27, 2025, 3:52 PM

#

Guys does anybody have a chatgpt premium account that could be shared with me?

ocean vortex Apr 27, 2025, 3:53 PM

#

frail thorn Guys does anybody have a chatgpt premium account that could be shared with me?

No. But I can run a prompt for you if you want

full kite Apr 27, 2025, 3:54 PM

#

ocean vortex No. But I can run a prompt for you if you want

why do you pay for chatgpt

frail thorn Apr 27, 2025, 3:54 PM

#

ocean vortex No. But I can run a prompt for you if you want

Well a "prompt" wouldn't solve everything

#

I need full access

#

😭😭😭😭

ocean vortex Apr 27, 2025, 3:54 PM

#

frail thorn I need full access

well then buy it

full kite Apr 27, 2025, 3:55 PM

#

frail thorn Well a "prompt" wouldn't solve everything

what do you need to do lil bro

frail thorn Apr 27, 2025, 3:55 PM

#

ocean vortex well then buy it

I'M NOT SPENDING 20 DOLLARS ON IT, I ONLY NEED IT FOR LIKE 4 DAYS

#

Its whatever

ocean vortex Apr 27, 2025, 3:55 PM

#

well then @full kite could give it to you maybe

full kite Apr 27, 2025, 3:55 PM

#

frail thorn I'M NOT SPENDING 20 DOLLARS ON IT, I ONLY NEED IT FOR LIKE 4 DAYS

WHAT DO YOU NEED CHATGPT FOR

frail thorn Apr 27, 2025, 3:56 PM

#

REALLY

full kite Apr 27, 2025, 3:56 PM

#

😡

ocean vortex Apr 27, 2025, 3:56 PM

#

he doesn't need his it seems

frail thorn Apr 27, 2025, 3:56 PM

#

full kite WHAT DO YOU NEED CHATGPT FOR

school stuff.....

full kite Apr 27, 2025, 3:56 PM

#

Ok tell me what

#

I'll see if I accept

frail thorn Apr 27, 2025, 3:56 PM

#

full kite Ok tell me what

well I'm doing a research

#

I just need chatgpt to help me

#

the normal one doesn't accept all of my files

#

and so on

full kite Apr 27, 2025, 3:57 PM

#

how many files

frail thorn Apr 27, 2025, 3:57 PM

#

many...

#

like 10 at least

ocean vortex Apr 27, 2025, 3:57 PM

#

frail thorn like 10 at least

have you tried aistudio?

frail thorn Apr 27, 2025, 3:57 PM

#

ocean vortex have you tried aistudio?

yeah

ocean vortex Apr 27, 2025, 3:57 PM

#

it's free and the king for file uploads

frail thorn Apr 27, 2025, 3:57 PM

#

but like

full kite Apr 27, 2025, 3:57 PM

#

frail thorn yeah

No you did not

frail thorn Apr 27, 2025, 3:57 PM

#

full kite No you did not

I DID

full kite Apr 27, 2025, 3:57 PM

#

okay what happen

frail thorn Apr 27, 2025, 3:58 PM

#

wait

full kite Apr 27, 2025, 3:58 PM

#

🤨 🫃

frail thorn Apr 27, 2025, 3:58 PM

#

full kite 🤨 🫃

HEY THATS MY EMOJI

#

🫃🫃🫃🫃🫃

full kite Apr 27, 2025, 3:59 PM

#

frail thorn HEY THATS MY EMOJI

Dude

#

listen I'm going to quit

#

if you don't tell me

#

😡

frail thorn Apr 27, 2025, 3:59 PM

#

TELL YOU WHAT

#

oh my days

#

you know what

#

QUIT

full kite Apr 27, 2025, 3:59 PM

#

WHAT HAPPEN WITH GOOGLE GEMINI

#

IT HAS 2 000 000 TOKENS FREE

#

PER CHAT

#

CHATGPT PRO IS 128 000

frail thorn Apr 27, 2025, 4:00 PM

#

😇

full kite Apr 27, 2025, 4:00 PM

#

😡

full kite Apr 27, 2025, 4:00 PM

#

frail thorn 😇

Ok we should start over

#

I'll help you

#

I want to help you

#

@frail thorn

#

?

#

😔

frail thorn Apr 27, 2025, 4:01 PM

#

😀

full kite Apr 27, 2025, 4:02 PM

#

frail thorn 😀

Okay so do you need to upload a large amount of pdfs?

keen fulcrum Apr 27, 2025, 4:02 PM

#

frail thorn HEY THATS MY EMOJI

Are you pregnant?

frail thorn Apr 27, 2025, 4:02 PM

#

full kite Okay so do you need to upload a large amount of pdfs?

just forget it I already bought a subscription

#

🙄

frail thorn Apr 27, 2025, 4:03 PM

#

keen fulcrum Are you pregnant?

yes

timber kiln Apr 27, 2025, 4:03 PM

#

frail thorn just forget it I already bought a subscription

Now you gotta use it overtime so they lose money on your sub

frail thorn Apr 27, 2025, 4:03 PM

#

timber kiln Now you gotta use it overtime so they lose money on your sub

hell yeah

ocean vortex Apr 27, 2025, 4:03 PM

#

frail thorn just forget it I already bought a subscription

I'M NOT SPENDING 20 DOLLARS ON IT, I ONLY NEED IT FOR LIKE 4 DAYS

frail thorn Apr 27, 2025, 4:03 PM

#

I'll include the thank you

frail thorn Apr 27, 2025, 4:04 PM

#

ocean vortex > I'M NOT SPENDING 20 DOLLARS ON IT, I ONLY NEED IT FOR LIKE 4 DAYS

yeah well I did

full kite Apr 27, 2025, 4:04 PM

#

frail thorn I'll include the thank you

leaden palm Apr 27, 2025, 4:04 PM

#

full kite CHATGPT PRO IS 128 000

wait does the app still lack the 1m context from 4.1 lmao

ocean vortex Apr 27, 2025, 4:04 PM

#

🫃

full kite Apr 27, 2025, 4:04 PM

#

full kite

And it's free

#

You fkn looser

frail thorn Apr 27, 2025, 4:04 PM

#

someones mad

full kite Apr 27, 2025, 4:04 PM

#

✡️

frail thorn Apr 27, 2025, 4:04 PM

#

🤗

full kite Apr 27, 2025, 4:05 PM

#

🥀

frail thorn Apr 27, 2025, 4:05 PM

#

full kite ✡️

you greedy juice...

full kite Apr 27, 2025, 4:05 PM

#

frail thorn you greedy juice...

say that to sam the juice

ocean vortex Apr 27, 2025, 4:05 PM

#

I wouldn't really say chatgpt is better for what you are trying to do even... But ok whatever 😂

frail thorn Apr 27, 2025, 4:05 PM

#

JUST CHILL OUT GUYS

#

its just 20 dollars

#

Its not like I'm going to die from it

full kite Apr 27, 2025, 4:05 PM

#

frail thorn its just 20 dollars

yeah 20 dollars or free

ocean vortex Apr 27, 2025, 4:06 PM

#

frail thorn its just 20 dollars

kids in Africa could have eaten those dollars

frail thorn Apr 27, 2025, 4:06 PM

#

LOLLLLLLL

full kite Apr 27, 2025, 4:06 PM

#

WE GOT ONE

frail thorn Apr 27, 2025, 4:06 PM

#

now that made me giggle

full kite Apr 27, 2025, 4:06 PM

#

coins clipper

frail thorn Apr 27, 2025, 4:06 PM

#

gold detector

full kite Apr 27, 2025, 4:07 PM

#

the

#

why have you buy the chatgpt

frail thorn Apr 27, 2025, 4:07 PM

#

🤗

full kite Apr 27, 2025, 4:07 PM

#

thing

frail thorn Apr 27, 2025, 4:07 PM

#

I'm just talking about me

#

I'm not including any other reference

full kite Apr 27, 2025, 4:07 PM

#

frail thorn I'm not including any other reference

what does that mean

frail thorn Apr 27, 2025, 4:08 PM

#

😭

#

I NEED TO GO BACK TO RESEARCHING

#

cya

full kite Apr 27, 2025, 4:09 PM

#

frail thorn I NEED TO GO BACK TO RESEARCHING

RESEARCHING WHAT

#

🤨

frail thorn Apr 27, 2025, 4:09 PM

#

full kite RESEARCHING WHAT

👹

full kite Apr 27, 2025, 4:09 PM

#

are you doing research about the juice

#

😔

#

yes ?

#

Sez u

#

aware

native shoreBOT Apr 27, 2025, 4:32 PM

#

dynoSuccess diana12493_32 has been warned.

full kite Apr 27, 2025, 4:33 PM

#

Hello??

#

Can the other guy be warn too

#

😡

keen beacon Apr 27, 2025, 4:34 PM

#

oy vey stop noticing

native shoreBOT Apr 27, 2025, 4:34 PM

#

dynoSuccess thekingofnothing_ has been warned.

#

dynoSuccess m0_0d_ai has been warned.

leaden palm Apr 27, 2025, 4:50 PM

#

which is why if you build anything and want it to be free you have to get like 3 other providers

#

25 rpd does not serve 2-3 users actively building things

#

25 rpd is the documented limit

#

and i've ran into it before
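
#

A rough sketch of the "get other providers" idea, assuming each provider only has a requests-per-day cap and you fall through to the next one when a cap is hit. The class, the provider names other than the 25 rpd figure, and the second limit are all hypothetical.

```python
from collections import Counter


class DailyBudgetRouter:
    """Pick the first provider that still has daily requests left."""

    def __init__(self, limits):
        self.limits = dict(limits)   # provider name -> max requests per day
        self.used = Counter()        # provider name -> requests used today

    def pick(self):
        for name, limit in self.limits.items():
            if self.used[name] < limit:
                self.used[name] += 1
                return name
        return None  # every provider is exhausted for today

    def reset(self):
        """Call once per day when the quotas roll over."""
        self.used.clear()


# 25 rpd is the limit mentioned above; "provider_b" and its cap are made up.
router = DailyBudgetRouter({"ai_studio": 25, "provider_b": 50})
picks = [router.pick() for _ in range(30)]
print(picks.count("ai_studio"), picks.count("provider_b"))  # 25 5
```

Three users sending about ten requests each would already burn through the first provider, which is the point being made above.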

full kite Apr 27, 2025, 4:55 PM

#

keen beacon oy vey stop noticing

💀

ocean vortex Apr 27, 2025, 5:56 PM

#

wtf happened here

frail thorn Apr 27, 2025, 5:59 PM

#

full kite Can the other guy be warn too

😢 🖕

full kite Apr 27, 2025, 5:59 PM

#

frail thorn 😢 🖕

🫃

zinc ore Apr 27, 2025, 7:23 PM

#

https://x.com/Teknium1/status/1916398151914950687

Teknium (e/λ) (@Teknium1) on X

Okay come on.. lmao

small haven Apr 27, 2025, 7:27 PM

#

lmao

bright kayak Apr 27, 2025, 7:49 PM

#

small haven lmao

pliant cypress Apr 27, 2025, 8:25 PM

#

Logan just hint 2.5 ultra and remove tweet very fast 😆

keen beacon Apr 27, 2025, 8:26 PM

#

what did he say

small haven Apr 27, 2025, 8:28 PM

#

pliant cypress Logan just hint 2.5 ultra and remove tweet very fast 😆

screenshot or didnt happen

#

buddy perma works at google and says this

pliant cypress Apr 27, 2025, 8:43 PM

#

keen beacon what did he say

Something about making custom t-shirts with text "1400+ ELO club", "smell big model", "AGI when". But maybe its just coping

sage raptor Apr 27, 2025, 8:49 PM

#

maybe he is drunk

full kite Apr 27, 2025, 8:57 PM

#

pliant cypress Something about making custom t-shirts with text "1400+ ELO club", "smell big mo...

1400 elo?

#

what does that mean

knotty jetty Apr 27, 2025, 9:03 PM

#

Yo gemini 2.5 pro in perplexity is peak

#

Also I hate perplexity but its really good with the llm

full kite Apr 27, 2025, 9:07 PM

#

knotty jetty Yo gemini 2.5 pro in perplexity is peak

whats perplexity

knotty jetty Apr 27, 2025, 9:07 PM

#

full kite whats perplexity

New gen

full kite Apr 27, 2025, 9:07 PM

#

what

#

dude I know nothing of the sht

#

I'm using google studio ai

#

whats perplexity

knotty jetty Apr 27, 2025, 9:08 PM

#

Just look it up bro

#

Perplexity.ai

full kite Apr 27, 2025, 9:08 PM

#

omfg

leaden palm Apr 27, 2025, 9:16 PM

#

full kite what does that mean

why are you in this server if you arent an arena user lol

small haven Apr 27, 2025, 9:17 PM

#

deepwiki's deep research is underrated

wintry tinsel Apr 27, 2025, 9:17 PM

#

Remember how they murdered the old server

#

🥲

leaden palm Apr 27, 2025, 9:17 PM

#

small haven deepwiki's deep research is underrated

the one from devin?

small haven Apr 27, 2025, 9:18 PM

#

leaden palm the one from devin?

yup

#

#

took about 3-4 mins to finish

full kite Apr 27, 2025, 9:21 PM

#

leaden palm why are you in this server if you arent an arena user lol

bro I just use arena to see the scoreboard I'm not jerkng off to it

leaden palm Apr 27, 2025, 9:21 PM

#

full kite bro I just use arena to see the scoreboard I'm not jerkng off to it

so you do use the leaderboard

#

which is... elo data

full kite Apr 27, 2025, 9:21 PM

#

yeah what about it

#

aok

#

I thought it was a chess thing, they were talking about that earlier

leaden palm Apr 27, 2025, 9:22 PM

#

anything with pairwise comparisons can be measured with elo
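
#

For reference, a minimal sketch of the classic Elo update after one pairwise vote. The K-factor of 32 and the function names are assumptions for illustration, and the actual leaderboard may fit its ratings differently.

```python
def expected_score(r_a, r_b):
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def elo_update(r_a, r_b, score_a, k=32):
    """score_a is 1.0 if A won the vote, 0.0 if B won, 0.5 for a tie."""
    e_a = expected_score(r_a, r_b)
    new_a = r_a + k * (score_a - e_a)
    new_b = r_b + k * ((1 - score_a) - (1 - e_a))
    return new_a, new_b


# Two models at the same rating; A wins one head-to-head vote.
print(elo_update(1000, 1000, 1.0))  # (1016.0, 984.0)
```

Nothing in it is chess-specific: any stream of "A vs B, who won" results works.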

full kite Apr 27, 2025, 9:22 PM

#

like llm will never be able to play chess

full kite Apr 27, 2025, 9:23 PM

#

leaden palm so you do use the leaderboard

no but fr what else can we do on arena thing

#

can we play on it or something

leaden palm Apr 27, 2025, 9:23 PM

#

full kite no but fr what else can we do on arena thing

you can chat with llms and vote to help build the leaderboard

wintry tinsel Apr 27, 2025, 9:23 PM

#

LLM’s dead end 😱

#

LLM’s failed project

leaden palm Apr 27, 2025, 9:24 PM

#

you can also compare other things:

llms using github repos
llms using search
llms making websites
llms writing code
image generation

wintry tinsel Apr 27, 2025, 9:24 PM

#

Rip LLM’s

#

Support Yann LeCun

torn mantle Apr 27, 2025, 9:24 PM

#

pliant cypress Logan just hint 2.5 ultra and remove tweet very fast 😆

where

leaden palm Apr 27, 2025, 9:24 PM

#

wintry tinsel Rip LLM’s

torn mantle Apr 27, 2025, 9:25 PM

#

pliant cypress Something about making custom t-shirts with text "1400+ ELO club", "smell big mo...

oh its probably riverhollow/sunstrike

#

im not impressed with these models tbh

#

they just seem like gemini 2.5 pro 03

full kite Apr 27, 2025, 9:26 PM

#

I don't know what a repos is

leaden palm Apr 27, 2025, 9:26 PM

#

thats ok, a lot of lm arena is for coders and you might not be one

full kite Apr 27, 2025, 9:28 PM

#

I'm a coder

leaden palm Apr 27, 2025, 9:31 PM

#

full kite I'm a coder

have you used github before?

full kite Apr 27, 2025, 9:32 PM

#

leaden palm have you used github before?

to download yt dlp

#

I know about the green board

#

contributions sht

small haven Apr 27, 2025, 9:33 PM

#

my iq is dropping

elder rapids Apr 27, 2025, 9:35 PM

#

where's Nw 🙏😭

sage raptor Apr 27, 2025, 9:35 PM

#

soon

keen beacon Apr 27, 2025, 9:35 PM

#

in the void

elder rapids Apr 27, 2025, 9:35 PM

#

the longer these models are unreleased

#

the more there's a chance people catch up

#

and it's unimpressive

wintry tinsel Apr 27, 2025, 9:36 PM

#

We need human brain computer

#

Neurons in the circuit

#

Then true AGI

#

🔥

golden ocean Apr 27, 2025, 9:36 PM

#

lets dissect urs so we can start the experiments immediately

wintry tinsel Apr 27, 2025, 9:37 PM

#

That would be cheating I’m already AGI

misty vault Apr 27, 2025, 9:37 PM

#

more like artificial general stupidity

#

Ok that sounds more like a direct insult rather than a silly joke

#

no more funny

wintry tinsel Apr 27, 2025, 9:38 PM

#

Are LLM’s capable of being stupid?

#

Perhaps stupidity is a byproduct of intelligence

leaden palm Apr 27, 2025, 9:38 PM

#

wintry tinsel Are LLM’s capable of being stupid?

llama 1b:

#

qwen 500m:

golden ocean Apr 27, 2025, 9:39 PM

#

wintry tinsel Are LLM’s capable of being stupid?

wintry tinsel Apr 27, 2025, 9:39 PM

#

Well you have a point

small haven Apr 27, 2025, 9:44 PM

#

ok but wen o3 pro

#

baited, just wanted to spawn my man craig

elder rapids Apr 27, 2025, 9:48 PM

#

nah ion think so

#

a major part of what made o1 pro so good was its ability for pure longer context reasoning

small haven Apr 27, 2025, 9:48 PM

#

ye im using o3 more than o1 pro..

#

what we know is its 10x compute (so thinking for 10x longer than o1)

keen beacon Apr 27, 2025, 9:51 PM

#

lol

small haven Apr 27, 2025, 9:51 PM

#

and the fact it is outputting as a one shot answer rather than streaming tokens, means it is using an internal canvas in the backend

#

so it is constantly iterating its final answer, thats my hypothesis

elder rapids Apr 27, 2025, 9:52 PM

#

we can infer

#

longer reasoning

#

whereas pro is probably fine tuned

small haven Apr 27, 2025, 9:55 PM

#

and internal canvas

elder rapids Apr 27, 2025, 9:55 PM

#

specifically to force a longer reasoning chain + better initial instruction retention

small haven Apr 27, 2025, 9:55 PM

#

internal canvas, checking its own answer on a pass@1, then reiterates it, until it is satisfied with a hard limit of 10x compute
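
#

To make the hypothesis concrete, a toy sketch of a draft, check, revise loop with a hard step budget. This only illustrates the guess above; the function names and the `max_steps=10` budget are hypothetical and say nothing about how the real model works.

```python
def refine_until_satisfied(draft, critique, revise, max_steps=10):
    """Draft once, then revise until the critique is empty or the budget runs out."""
    answer = draft()
    steps = 1  # the first draft counts against the budget
    while steps < max_steps:
        problems = critique(answer)
        if not problems:
            break  # the checker is satisfied
        answer = revise(answer, problems)
        steps += 1
    return answer, steps


# Toy stand-ins: the "answer" is a number that has to reach 3.
result = refine_until_satisfied(
    draft=lambda: 0,
    critique=lambda a: ["too small"] if a < 3 else [],
    revise=lambda a, problems: a + 1,
)
print(result)  # (3, 4)
```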

elder rapids Apr 27, 2025, 9:55 PM

#

there won't be an o3 pro if it's not better than o1 pro

#

just saying lmao

willow grail Apr 27, 2025, 10:03 PM

#

golden ocean

great for cleaning my ass. yes. thats it.

small haven Apr 27, 2025, 10:04 PM

#

can sam stop riding his husband and release o3 pro

willow grail Apr 27, 2025, 10:06 PM

#

small haven can sam stop riding his husband and release o3 pro

ceos dont have a healthy relationship

#

its a ceo lol?

#

watch movies

#

then u know

#

rich people are very unhappy

#

thats why i am happy

#

nonetheless you are not rich

#

cause its adverse to not say it

#

youre stuffed with adverse lies

#

/s

#

ceo's dont live on prairies, therefore they are unhappy creatures.

terse shuttle Apr 27, 2025, 10:10 PM

#

Should we expect a new image generation model from openAI in the arena?

willow grail Apr 27, 2025, 10:10 PM

#

you have cute attributes, federighi

#

ceos dont have small things.

#

see it as necessary.

#

we said the same word... lol

#

._. we are models?

#

i will weigh 81kg after water fluctuations in exactly 30 days

#

now i am 82.6kg

#

having diseases which make losing weight slower is bad

#

i think u just wanna believe they are as happy as a nanny in a village

#

cause u wanna be rich too

#

so u have a goal to go for right now

ocean vortex Apr 27, 2025, 10:45 PM

#

No it’s more like multiple attempts and consensus system. You can still set low-med-high for pro.
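
#

A toy sketch of what "multiple attempts and consensus" could look like: sample several answers and keep the most common one. The `consensus` helper and the sample answers are made up, and the real system is not public.

```python
from collections import Counter


def consensus(attempts):
    """Return the most common answer and the share of attempts that agree with it."""
    counts = Counter(attempts)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(attempts)


# Five independent attempts at the same question (made-up outputs).
print(consensus(["42", "41", "42", "42", "17"]))  # ('42', 0.6)
```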

tall summit Apr 27, 2025, 10:48 PM

#

oh hell yeah #announcements cute

small haven Apr 27, 2025, 10:53 PM

#

omg sam tweeted, plz be o3 pro

golden ocean Apr 27, 2025, 11:03 PM

#

its him riding his husband

tall summit Apr 27, 2025, 11:21 PM

#

golden ocean Apr 28, 2025, 12:17 AM

#

hi

leaden palm Apr 28, 2025, 1:34 AM

#

probably a refreshed pretrain

keen beacon Apr 28, 2025, 2:23 AM

#

its likely to be a cpt of 2.0 pro

#

it isnt. even openai cant increase simpleqa that much yet thru reasoning

small haven Apr 28, 2025, 3:14 AM

#

i love o3

#

but i would love o3 pro even more

dapper aspen Apr 28, 2025, 3:32 AM

#

hi

balmy mist Apr 28, 2025, 3:39 AM

#

we got deepseek yet?

#

imma go to china and fight Winnie if we dont get r2 this week

alpine coral Apr 28, 2025, 4:31 AM

#

just got folsom-exp-v1, which i haven't seen or heard of before - new anon model?

#

presumably related to cobalt, apricot

#

so amazon ig

torn mantle Apr 28, 2025, 4:47 AM

#

balmy mist we got deepseek yet?

https://x.com/ZhaoTing1024/status/1916678514180497416

Zhao Tianyu (@ZhaoTing1024) on X

Official announcement: Qwen 3 this week. Reasoning and non-reasoning in one.

small haven Apr 28, 2025, 6:36 AM

#

when o3 shows these traces in the cot, i kinda leak a bit

cedar tide Apr 28, 2025, 6:59 AM

#

alpine coral just got `folsom-exp-v1`, which i haven't seen or heard of before - new anon mo...

Amazon reasoning model no?

keen beacon Apr 28, 2025, 7:09 AM

#

small haven when o3 shows these traces in the cot, i kinda leak a bit

you do what now

calm sequoia Apr 28, 2025, 7:10 AM

#

torn mantle https://x.com/ZhaoTing1024/status/1916678514180497416

Any news on possible specs?

alpine coral Apr 28, 2025, 7:42 AM

#

cedar tide Amazon reasoning model no?

don't think so

#

it seemed quite fast and dumb

cedar tide Apr 28, 2025, 7:43 AM

#

https://x.com/btibor91/status/1916756247422124353?t=nmHdYoncI3U1Sdle6XHOlg&s=19

Tibor Blaho (@btibor91) on X

Amazon Bedrock already lists "Amazon Nova Premier" with a release date of 2025-04-30, along with a new model "Writer Palmyra X5" (2025-04-28) and "Llama 4 Maverick 17B Instruct" and "Llama 4 Scout 17B Instruct" (both 2025-04-28)

small haven Apr 28, 2025, 8:20 AM

#

why is o3 limited at 64k context, absolute dogwater

keen beacon Apr 28, 2025, 9:25 AM

#

omg qwen 3

#

apparently pre trained on 36 trillion tokens 😮 (2x qwen 2.5). multiple moe models?

stuck orchid Apr 28, 2025, 9:27 AM

#

Hi.
Is there a chat limit in https://beta.lmarena.ai? Or is it possible to test ai even on long requests (10K+ tokens)?

keen beacon Apr 28, 2025, 9:37 AM

#

keen beacon apparently pre trained on 36 trillion tokens 😮 (2x qwen 2.5). multiple moe mode...

curious about the super low amnt of active params on the moe models. qwen 15b in the huggingface pr has 2b active, qwen 30b has 3b active
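
#

Quick arithmetic on those numbers, taking the sizes from the PR at face value (15b total with 2b active, 30b total with 3b active). The helper name is just for illustration.

```python
def active_fraction(total_b, active_b):
    """Share of parameters used per token in a mixture-of-experts model."""
    return active_b / total_b


for name, total, active in [("qwen 15b", 15, 2), ("qwen 30b", 30, 3)]:
    print(f"{name}: {active_fraction(total, active):.1%} of params active")
# qwen 15b: 13.3% of params active
# qwen 30b: 10.0% of params active
```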

torn mantle Apr 28, 2025, 10:03 AM

#

calm sequoia Any news on possible specs?

https://x.com/OpulentByte/status/1916794351025352859

Opulent Byte (@OpulentByte) on X

@ai_for_success Looks like smaller Qwen 3 models today only.

#

keen fulcrum Apr 28, 2025, 10:13 AM

#

https://hiddenlayer.com/innovation-hub/novel-universal-bypass-for-all-major-llms/

HiddenLayer | Security for AI

Novel Universal Bypass for All Major LLMs

HiddenLayer’s latest research uncovers a universal prompt injection bypass impacting GPT-4, Claude, Gemini, and more, exposing major LLM security gaps.

torn mantle Apr 28, 2025, 10:33 AM

#

So llama 4 reasoning models werent added on lmarena given the recent controversies

#

And its kinda weird too qwen3 isnt added yet in an anonymous battle mode

ocean vortex Apr 28, 2025, 10:39 AM

#

keen beacon it isnt. even openai cant increase simpleqa that much yet thru reasoning

em?

gpt-4.1-2025-04-14 --> 41.6%
o3-low --> 49.4%

in both cases just shy of 20% of the earlier score increase
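
The arithmetic behind "just shy of 20%", reading it as the relative increase over the gpt-4.1 score (that reading is an assumption; the absolute gap is 7.8 points):

```python
def relative_increase(before, after):
    """Percentage increase of `after` over `before`."""
    return (after - before) / before * 100


# SimpleQA scores quoted above: gpt-4.1 at 41.6%, o3-low at 49.4%.
print(round(relative_increase(41.6, 49.4), 2))  # 18.75
```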

keen beacon Apr 28, 2025, 10:42 AM

#

ocean vortex em? gpt-4.1-2025-04-14 --> 41.6% o3-low --> 49.4% in both cases just shy of ...

2.5 flash got a worse simpleqa tho compared to gemini 2 flash, i would expect openai to lead there in terms of factual reasoning. but yeah i guess its out of date, since we have o3. i was comparing it with o1 when i formed that opinion initially

keen fulcrum Apr 28, 2025, 10:47 AM

#

torn mantle So llama 4 reasoning models werent added on lmarena given the recent controvers...

They were

torn mantle Apr 28, 2025, 10:52 AM

#

https://x.com/JustinLin610/status/1916805525171494965

Junyang Lin ✈️ ICLR (@JustinLin610) on X

see if we can finish the job tonight #Qwen3 👻

humble sonnet Apr 28, 2025, 11:15 AM

#

https://code-arena.fly.dev
what is this link ?

balmy mist Apr 28, 2025, 11:43 AM

#

torn mantle https://x.com/JustinLin610/status/1916805525171494965

Omggg

neon anchor Apr 28, 2025, 11:46 AM

#

Sunstrike is not generating code at all

balmy mist Apr 28, 2025, 11:47 AM

#

Anybody tested qwen3?

keen beacon Apr 28, 2025, 11:48 AM

#

balmy mist Anybody tested qwen3?

its not out yet

#

the 235b moe (apparently) with reasoning mode might be very impressive

sage raptor Apr 28, 2025, 11:51 AM

#

No electricity in europe 😭

keen beacon Apr 28, 2025, 11:54 AM

#

torn mantle https://x.com/OpulentByte/status/1916794351025352859

no lmao

#

it's just that those were the ones that accidentally appeared on modelscope briefly

#

anyway, to summarise, this week we are getting:

qwen 3 (likely today)
amazon nova premier (wed)
deepseek R2 (somewhat likely, depends on qwen 3)

brittle tiger Apr 28, 2025, 12:00 PM

#

Would be cool if r2 went on arena before debuting

torn mantle Apr 28, 2025, 12:00 PM

#

keen beacon it's just that those were the ones that accidentally appeared on modelscope brie...

Yea the big one is 235b

keen beacon Apr 28, 2025, 12:02 PM

#

i expect that to be a very good model

balmy mist Apr 28, 2025, 12:06 PM

#

keen beacon its not out yet

ah okay, its like we get a fire week, then break, then back to fire lol

#

i kinda like that, give us time to collect ourselves

torn mantle Apr 28, 2025, 12:14 PM

#

Qwen 3 is what people expected llama 4 to be like

calm sequoia Apr 28, 2025, 12:30 PM

#

The GPT 4o doesn't allow altering real photos due to their stupid policies. Anyone have the jailbreak code or the alternative tool?

sonic tendon Apr 28, 2025, 12:35 PM

#

torn mantle Yea the big one is 235b

oh damn
was wondering if qwen3 was going to include a big model
seems sort of odd that they didn't include it in the huggingface pr from a week or two ago, but maybe they're trying to keep it closed?

keen beacon Apr 28, 2025, 12:35 PM

#

sonic tendon oh damn was wondering if qwen3 was going to include a big model seems sort of od...

a lot of models werent in the pr

sonic tendon Apr 28, 2025, 12:35 PM

#

ah

keen beacon Apr 28, 2025, 12:36 PM

#

they just briefly put out a fp8 quantized qwen 3 0.6b on hf and removed it

torn mantle Apr 28, 2025, 12:36 PM

#

keen beacon they just briefly put out a fp8 quantized qwen 3 0.6b on hf and removed it

Yea

#

Will 235b be the biggest model released by them so far?

#

Or they released smth bigger before

keen beacon Apr 28, 2025, 12:37 PM

#

torn mantle Will 235b be the biggest model released by them so far?

yea. 110b dense was their largest before

#

im so hyped rn

calm sequoia Apr 28, 2025, 12:58 PM

#

I think nothing will be better for at least a month

calm sequoia Apr 28, 2025, 12:58 PM

#

calm sequoia The GPT 4o doesn't allow altering real photos due to their stupid policies. Anyo...

If anyone will need the same: https://sider.ai

Sider: ChatGPT Sidebar + GPT-4.1, Claude 3.5, Gemini 2.5 & AI Tools

Sider, the most advanced AI assistant, helps you to chat, write, read, translate, explain, text to image with AI, including GPT-4.1 & GPT-4.1 mini, Gemini and Claude, on any webpage.

ocean vortex Apr 28, 2025, 1:02 PM

#

it should at least be mostly as good as 2.5 I think.

#

probably. But it's not gonna be worth using it given the price that's for sure lol

#

technically sota now is o3 anyway

keen beacon Apr 28, 2025, 1:05 PM

#

i dont think they will put o3 pro in the arena

barren prairie Apr 28, 2025, 1:05 PM

#

keen beacon i dont think they will put o3 pro in the arena

Maybe to beat Gemini 🙂

keen beacon Apr 28, 2025, 1:07 PM

#

imho the most likely thing to dethrone gemini 2.5 pro will be gemini 2.5 pro (at least in the arena) 🤣

torn mantle Apr 28, 2025, 1:08 PM

#

In all seriousness o3 model is a lot better than gemini 2.5 pro at general tasks

#

You can still feel the robotic vibes from gemini

full kite Apr 28, 2025, 1:12 PM

#

Just got r2 it's good

ocean vortex Apr 28, 2025, 1:12 PM

#

full kite Just got r2 it's good

old model I'm already using r3

full kite Apr 28, 2025, 1:13 PM

#

ocean vortex old model I'm already using r3

bro what

#

Guys can IA do my homework pls

#

it's a math test

calm sequoia Apr 28, 2025, 1:14 PM

#

Why did you change your username from Mango to Diana? 🙂

full kite Apr 28, 2025, 1:14 PM

#

calm sequoia Why did you change your username from Mango to Diana? 🙂

cause diana rules

#

https://tenor.com/view/shoebill-diana-hello-gif-25289056

Tenor

calm sequoia Apr 28, 2025, 1:15 PM

#

if R2 > Behemot, Zuck is cooked

keen beacon Apr 28, 2025, 1:15 PM

#

thats obviously gonna be the case tbh

full kite Apr 28, 2025, 1:15 PM

#

calm sequoia if R2 > Behemot, Zuck is cooked

what is behemot

calm sequoia Apr 28, 2025, 1:15 PM

#

keen beacon thats obviously gonna be the case tbh

Obvious for you, but not for Zuck or his investors 😄

full kite Apr 28, 2025, 1:16 PM

#

what is behemot

keen beacon Apr 28, 2025, 1:16 PM

#

behemot is agi

alpine coral Apr 28, 2025, 1:16 PM

#

keen beacon i dont think they will put o3 pro in the arena

yeah zero chance

calm sequoia Apr 28, 2025, 1:16 PM

#

Behemot is minotaur shrek

severe tinsel Apr 28, 2025, 1:17 PM

#

Hi, I’m not a pro, will Qwen3 compete with Qwen 2.5 Max at the top of the leaderboard or its not the same category?

full kite Apr 28, 2025, 1:17 PM

#

agi doesn't exist

full kite Apr 28, 2025, 1:17 PM

#

severe tinsel Hi, I’m not a pro, will Qwen3 compete with Qwen 2.5 Max at the top of the leader...

nuhuh

keen beacon Apr 28, 2025, 1:17 PM

#

severe tinsel Hi, I’m not a pro, will Qwen3 compete with Qwen 2.5 Max at the top of the leader...

qwen 3 is a line of models its not even released

#

or in the arena

severe tinsel Apr 28, 2025, 1:17 PM

#

Yea i mean one of them

keen beacon Apr 28, 2025, 1:17 PM

#

severe tinsel Yea i mean one of them

yes at least one of them will compete i think

#

probably the big one will be competitive in the leaderboard

severe tinsel Apr 28, 2025, 1:18 PM

#

Okay thanks!

stuck orchid Apr 28, 2025, 1:19 PM

#

Qwen 3 may be better than Gemini-2.5-pro 👍

full kite Apr 28, 2025, 1:20 PM

#

qwen is slow asf

ocean vortex Apr 28, 2025, 1:24 PM

#

full kite what is behemot

this

full kite Apr 28, 2025, 1:24 PM

#

ocean vortex this

this is a bear

ocean vortex Apr 28, 2025, 1:24 PM

#

it's behemoth

full kite Apr 28, 2025, 1:25 PM

#

like a bear with teef

#

4 legged bear

ocean vortex Apr 28, 2025, 1:25 PM

#

https://sciifii.fandom.com/wiki/Behemoth

SciiFii Wiki

Behemoth

The Behemoth (Megasus mammothoides, name meaning "great mammoth pig") is a species of large land mammal that originally didn't exist, but was created by SciiFii and introduced to the African...

alpine coral Apr 28, 2025, 1:25 PM

#

bro... it's trolling....

#

(i assume / hope lol)

ocean vortex Apr 28, 2025, 1:26 PM

#

alpine coral bro... it's trolling....

I think we both are. I hope he does too lol

alpine coral Apr 28, 2025, 1:27 PM

#

lol

keen beacon Apr 28, 2025, 1:28 PM

#

try it yourself lol

#

direct chat / side by side?

#

i think side by side was configured differently a while back it might have better limits

#

yea

alpine coral Apr 28, 2025, 1:29 PM

#

i dunno about coding (let alone for a specific programming language), but i feel 2.5 pro is just solid as af all round

keen beacon Apr 28, 2025, 1:30 PM

#

its incredible i still main it over anything else

alpine coral Apr 28, 2025, 1:30 PM

#

it's more usable than o3 (esp o3 high)

#

yes

keen beacon Apr 28, 2025, 1:32 PM

#

its definitely seen less c++ than the others and its more things to manage

#

yea

#

if u look at it historically probably. it depends on how much of it was curated, but i think its likely to be even less

#

im not very sure about the others. but 1 and 2 is highly likely to be python/javascript (not sure which one is which though)

#

it depends really but with a gc and such its generally slower/much slower i think

#

did u try?

#

2.5 pro has the best context retention/usage in a model i think anyway, it helps a lot

#

you should probably upload the code into the repo instead of it being in a zip

#

do u have git installed?

#

i think u can also upload folders directly on the website

#

maybe ask 2.5 pro to teach u git

#

its more convenient and allows u to have version control

alpine coral Apr 28, 2025, 1:51 PM

#

i'm getting so many errors in the arena atm

#

i feel like the yap score has been there all along (and at 8192).. recently discovered / noticed, rather than added..

#

also the other models' responses.. 3.7-sonn and v3 do well; sunstrike also (though verbose af)

#

folsom-exp-v1 assumes it's a bitcoin or something - pretty terrible response imo

tall summit Apr 28, 2025, 2:10 PM

#

alpine coral

what.

alpine coral Apr 28, 2025, 2:12 PM

#

oai reasoning models have a top-level prompt that gives guidance about how to interact and includes this part about a 'yap score' (which has always been at 8192, so far as i can tell)

tall summit Apr 28, 2025, 2:13 PM

#

alpine coral oai reasoning models have a top-level prompt that gives guidance about how to in...

is that real.

alpine coral Apr 28, 2025, 2:13 PM

#

yeah i mean, to the extent there's some oai-imposed instructions that include this thing called a yap score, it's real

tall summit Apr 28, 2025, 2:14 PM

#

wow okay.

alpine coral Apr 28, 2025, 2:14 PM

#

whether it's concerning though im not really sure tbh

#

like esp if it's been there all along

#

it may just be for stylistic purposes

#

but yeah that said.. my initial reaction was to think it was a way for oai to dynamically throttle outputs on o models in chatgpt, to like manage costs / compute

keen beacon Apr 28, 2025, 2:15 PM

#

iirc it was a thing since launch

#

o3/o4 mini launch

alpine coral Apr 28, 2025, 2:16 PM

#

yeah i only learnt about it here a couple of days ago

#

but now i think it's likely been there all along (and am not really bothered by it.. like there's no indication of it being used to nerf the models or whatever... yet anyway aha)

#

i kinda wonder if, in cases where lots of reasoning tokens are used especially, the final outputs could be super lengthy, and this was just their solution (prompting), rather than it actually being meant to be dynamic

brittle tiger Apr 28, 2025, 2:41 PM

#

2.5 Pro tops another benchmark

https://geobench.org/

GeoBench

GeoBench is an LLM/LVLM benchmark for GeoGuessr.

cedar tide Apr 28, 2025, 2:50 PM

#

Qwen 3 235b will be better than deepseek 3.1 671b ? (And Maverick 400b)

torn mantle Apr 28, 2025, 3:13 PM

#

cedar tide Qwen 3 235b will be better than deepseek 3.1 671b ? (And Maverick 400b)

Probably

#

The question is : will the smaller models be even better than Maverick?

#

Imagine 30b > 400b

cedar tide Apr 28, 2025, 3:16 PM

#

torn mantle The question is : will the smaller models be even better than Maverick?

below 235b there is only a dense of 14b and a moe of 30b
I don't think it can be better than maverick

#

apart from their version with reasoning

keen beacon Apr 28, 2025, 3:18 PM

#

i think all/most of them are hybrid reasoning models

cedar tide Apr 28, 2025, 3:18 PM

#

👀
https://x.com/alexalbert__/status/1916874027756666981?t=eC1sQXsZWHold8XGlIlxbw&s=19
https://x.com/alexalbert__/status/1916874039769120904?t=UGLhieKvS3ariM5WPQxCwQ&s=19

Alex Albert (@alexalbert__) on X

"let's train our model to get higher chat slop ELO scores"

*model starts exclusively outputting pure chat slop*

Alex Albert (@alexalbert__) on X

not meant to be a jab at any one lab in particular, just highlighting a particularly bad incentive structure I see rn. there's a reason you don't find Claude at #1 on chat slop leaderboards. it's the LLM equivalent of optimizing for video watch time in a social media algo.

cedar tide Apr 28, 2025, 3:19 PM

#

keen beacon i think all/most of them are hybrid reasoning models

Yes, it has already been leaked, we know that.

keen beacon Apr 28, 2025, 3:19 PM

#

cedar tide below 235b there is only a dense of 14b and a moe of 30b I don't think it can b...

there is 15b moe according to the hf pr

#

might be a placeholder but the pr also mentioned the 8b model which was confirmed

keen beacon Apr 28, 2025, 3:21 PM

#

cedar tide 👀 https://x.com/alexalbert__/status/1916874027756666981?t=eC1sQXsZWHold8XGlIlxb...

LOOL he deleted them

cedar tide Apr 28, 2025, 3:21 PM

#

keen beacon there is 15b moe according to the hf pr

Yes, I was thinking the same thing and I said it to several people, but in the end, whether it's the leaks from ModelScope or Hugging Face, there’s no trace of this model anywhere.

keen beacon Apr 28, 2025, 3:22 PM

#

cedar tide Yes, I was thinking the same thing and I said it to several people, but in the e...

ya i guess its a placeholder

#

wont be surprised if its real tho

terse shuttle Apr 28, 2025, 3:31 PM

#

terse shuttle Should we expect a new image generation model from openAI in the arena?

.

keen fulcrum Apr 28, 2025, 3:49 PM

#

Qwen 3 dropping

keen beacon Apr 28, 2025, 3:51 PM

#

yup they're trickling in

torn mantle Apr 28, 2025, 3:51 PM

#

cedar tide 👀 https://x.com/alexalbert__/status/1916874027756666981?t=eC1sQXsZWHold8XGlIlxb...

Gemini 2.5 pro is a solid model

keen beacon Apr 28, 2025, 3:51 PM

#

will probably be all done in the next hour or two

#

I'm just waiting for the big one

torn mantle Apr 28, 2025, 3:51 PM

#

Its either o3 or gemini 2.5 pro that deserves #1 spot tbh

keen fulcrum Apr 28, 2025, 3:52 PM

#

https://fixupx.com/kimmonismus/status/1916818352485413038

Chubby♨️ (@kimmonismus)

Qwen 3 released

Qwen3-8B

Qwen3 Highlights

Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Building upon extensive advancements in training data, model architecture, and optimization techniques, Qwen3 delivers the following key improvements over the previously released Qwen2.5:

• Expanded Higher-Quality Pre-training Corpus: Qwen3 is pre-trained on 36 trillion tokens across 119 languages — tripling the language coverage of Qwen2.5 — with a much richer mix of high-quality data, including coding, STEM, reasoning, book, multilingual, and synthetic data.
• Training Techniques and Model Architecture: Qwen3 incorporates a series of training techniques and architectural refinements, including global-batch load balancing loss for MoE models and qk layernorm for all models, leading to improved stability and ov…

keen beacon Apr 28, 2025, 3:52 PM

#

lmao someone leaked qwen 3 32b it seems

#

seemingly one of the randos (non qwen team) in the hf qwen org

cedar tide Apr 28, 2025, 3:52 PM

#

keen beacon lmao someone leaked qwen 3 32b it seems

32b where ?

keen beacon Apr 28, 2025, 3:52 PM

#

eh they've started dropping them now anyway

#

doesn't matter much

#

apparently: https://huggingface.co/second-state/Qwen3-32B-GGUF

second-state/Qwen3-32B-GGUF · Hugging Face

#

not many details there tho

#

why are there random people in the qwen hf org lol

#

it'll be officially released literally in the next hour probably

#

I wonder if we also see something from deepseek this week

#

it would make sense but who knows

#

the 235b possibly

#

depends on how strong their reasoner is

keen beacon Apr 28, 2025, 3:54 PM

#

keen beacon the 235b possibly

yeah

#

I do expect it to beat R1 minimum really

#

if they can't even do that it's a flop

#

llama 4 reasoning releasing at llamacon tomorrow by the looks of it, frontend is ready

#

i would also expect behemoth

tall summit Apr 28, 2025, 4:01 PM

#

i think o3 is better at translation than g2.5pro

tall summit Apr 28, 2025, 4:01 PM

#

keen beacon llama 4 reasoning releasing at llamacon tomorrow by the looks of it, frontend is...

sounds fun

keen beacon Apr 28, 2025, 4:01 PM

#

its gonna be a huge flop

#

if it's anything like the rest

#

there will be a lot of memes about qwen and llama i suspect

#

idk about qwen

#

i have a lot more faith in them than i do meta

#

yann lecooked

keen beacon Apr 28, 2025, 4:02 PM

#

keen beacon i have a lot more faith in them than i do meta

yea i meant clowning on meta lol

#

how qwen was the llama 4 people expected

#

oh

#

yeah nevermind

keen fulcrum Apr 28, 2025, 4:08 PM

#

https://modelscope.cn/collections/Qwen3-9743180bdc6b48
still no entries

Qwen3

通义千问3系列

keen beacon Apr 28, 2025, 4:08 PM

#

its a shame that guy is uploading the ggufs publicly before qwen officially announces it

keen fulcrum Apr 28, 2025, 4:09 PM

#

Must be some employee with access to the system

keen beacon Apr 28, 2025, 4:09 PM

#

keen fulcrum Must be some employee with access to the system

no its a random guy in the hf org i believe

keen fulcrum Apr 28, 2025, 4:09 PM

#

Oh lol

keen beacon Apr 28, 2025, 4:11 PM

#

I also hope that 15b moe is real, it's an awesome size

#

It's insane how many models they are gonna release. With 36 trillion tokens in pretraining not to mention the reasoning training etc

keen fulcrum Apr 28, 2025, 4:13 PM

#

Is R2 coming out this week too?

keen beacon Apr 28, 2025, 4:13 PM

#

Idk lol

#

Its qwens week probably

barren prairie Apr 28, 2025, 4:15 PM

#

keen fulcrum Is R2 coming out this week too?

DeepSeek team always works silently, no one knows what they are planning 🙃

keen fulcrum Apr 28, 2025, 4:17 PM

#

There was a leak for deepseek too
if its not this week, must be next one

sonic tendon Apr 28, 2025, 4:20 PM

#

keen beacon i have a lot more faith in them than i do meta

qwen 2.5 is already higher than maverick lmao

keen fulcrum Apr 28, 2025, 4:20 PM

#

Behemoth

sonic tendon Apr 28, 2025, 4:21 PM

#

yeah but

#

eh

sonic tendon Apr 28, 2025, 4:22 PM

#

keen beacon Its qwens week probably

one of the chief researchers is tweeting aggressively about it

#

i would personally expect q3 within the next 24 hours

torn mantle Apr 28, 2025, 4:24 PM

#

keen beacon llama 4 reasoning releasing at llamacon tomorrow by the looks of it, frontend is...

Yea

#

Weird they didnt add any new models on lmarena