sage tundra Aug 23, 2025, 9:57 AM

#

ZA WARUDOOOOOOOOOOOOOOOOOOOOOOOOOOOOO!!!!!!!!!!!!!

robust yoke Aug 23, 2025, 9:57 AM

#

Indeed.

radiant dove Aug 23, 2025, 9:58 AM

#

im sorry i dont understand

surreal creek Aug 23, 2025, 9:58 AM

#

Correct, the point of AI art in advertising or for other use in corporate media isn’t for it to be GOOD art - it’s for it to be good ENOUGH that you don’t have to pay a real artist

radiant dove Aug 23, 2025, 9:58 AM

#

heyy, anyone here who uses higgsfield for product photography. i would love to have a chat with you:)

#

i had asked this^

boreal vortex Aug 23, 2025, 9:58 AM

#

hello

surreal creek Aug 23, 2025, 9:59 AM

#

you said Higgs field

robust yoke Aug 23, 2025, 9:59 AM

#

surreal creek Correct, the point of AI art in advertising or for other use in corporate media ...

I suppose that can be true sometimes. However, there are some companies that genuinely want to make a good product that tries to perfect real art artificially.

surreal creek Aug 23, 2025, 9:59 AM

#

so they responded with “Higgs boson”

latent crest Aug 23, 2025, 9:59 AM

#

What is Higgs bosson

radiant dove Aug 23, 2025, 9:59 AM

#

pardon my english

surreal creek Aug 23, 2025, 10:00 AM

#

Higgs boson deez nuts in ur mouth lmao Gottem

surreal creek Aug 23, 2025, 10:00 AM

#

robust yoke I suppose that can be true sometimes. However, there are some companies that gen...

correct, I would just argue those companies are the exception

radiant dove Aug 23, 2025, 10:00 AM

#

wow

robust yoke Aug 23, 2025, 10:00 AM

#

surreal creek correct, I would just argue those companies are the exception

I guess.

radiant dove Aug 23, 2025, 10:00 AM

#

shook

hollow imp Aug 23, 2025, 10:00 AM

#

latent crest What is Higgs bosson

You don't know that?

#

😡

surreal creek Aug 23, 2025, 10:00 AM

#

mf doesn’t know what the Higgs boson is 😂😂

hollow imp Aug 23, 2025, 10:01 AM

#

latent crest What is Higgs bosson

Kiss yourself

latent crest Aug 23, 2025, 10:01 AM

#

Sorry I guess

robust yoke Aug 23, 2025, 10:01 AM

#

Got this from Google:
The Higgs boson is a fundamental particle that confirms the existence of the Higgs field, a ubiquitous field responsible for giving mass to other fundamental particles, such as electrons and quarks. Proposed in 1964 by Peter Higgs and others, this elusive particle was finally confirmed in 2012 by the ATLAS and CMS experiments at CERN's Large Hadron Collider (LHC). Sometimes called the "God particle," the Higgs boson is unique because it has zero spin, no electric charge, and no strong force interaction.

radiant dove Aug 23, 2025, 10:01 AM

#

bruv

latent crest Aug 23, 2025, 10:02 AM

#

Im so confused

hollow imp Aug 23, 2025, 10:02 AM

#

robust yoke Got this from Google: The Higgs boson is a fundamental particle that confirms th...

You didn't knew about this?
😡

surreal creek Aug 23, 2025, 10:02 AM

#

kids these days

radiant dove Aug 23, 2025, 10:02 AM

#

latent crest Im so confused

do you chokon

hollow imp Aug 23, 2025, 10:02 AM

#

-# braindeads these days

robust yoke Aug 23, 2025, 10:02 AM

#

hollow imp You didn't knew about this? 😡

Not at all, actually, no.

#

I'm just learning about this.

surreal creek Aug 23, 2025, 10:02 AM

#

don’t even know about their scalar fields and zero spin elementary particles 😂

robust yoke Aug 23, 2025, 10:03 AM

#

After all, I'm not in college just yet.

surreal creek Aug 23, 2025, 10:03 AM

#

too busy learning pronounce and blue hair liberal dye 😂😂😂

keen beacon Aug 23, 2025, 10:03 AM

#

surreal creek don’t even know about their scalar fields and zero spin elementary particles 😂

most sane ragebait:

surreal creek Aug 23, 2025, 10:03 AM

#

I knew about the Higgs boson when I was 9 and that was in 2011

#

Before it was even discovered in 2012

#

just the type shi I’m on ig 🤷🏻‍♀️

radiant dove Aug 23, 2025, 10:05 AM

#

24 and on discord

surreal creek Aug 23, 2025, 10:05 AM

#

radiant dove do you chokon

if he doesn’t know about chokon I bet bro doesn’t even know about the stigma particle

surreal creek Aug 23, 2025, 10:06 AM

#

radiant dove 24 and on discord

Discord member since 2017, can’t do math either it seems

robust yoke Aug 23, 2025, 10:09 AM

#

I think they must've been off by one number.

latent crest Aug 23, 2025, 10:12 AM

#

How can I stop the texts from a person here in the chat that I don’t wanna see?

dark oyster Aug 23, 2025, 10:12 AM

#

Hello y’all

robust yoke Aug 23, 2025, 10:14 AM

#

latent crest How can I stop the texts from a person here in the chat that I don’t wanna see?

If you right-click on them and then click on "Profile," you'll find a "Block" button in the menu for their profile. You can click that, and it will block them as well as their messages.

robust yoke Aug 23, 2025, 10:14 AM

#

dark oyster Hello y’all

Also, greetings.

verbal nimbus Aug 23, 2025, 10:44 AM

#

TikZ drawing comparison

robust yoke Aug 23, 2025, 10:50 AM

#

Claude's looks the best.

static viper Aug 23, 2025, 10:51 AM

#

Hi there!

robust yoke Aug 23, 2025, 10:51 AM

#

Howdy.

snow sky Aug 23, 2025, 11:11 AM

#

hi there

robust yoke Aug 23, 2025, 11:12 AM

#

Greetings.

white hatch Aug 23, 2025, 11:18 AM

#

I don't see a "new" label here

robust yoke Aug 23, 2025, 11:19 AM

#

None of the new models have "New" labels, I believe.

#

They just sort of add them.

mortal coyote Aug 23, 2025, 11:27 AM

#

@echo aurora is there any chance that we can get multi-frame video generation - like the 1st frame and 2nd frame AI morphing thing

keen beacon Aug 23, 2025, 11:29 AM

#

white hatch I don't see a "new" label here

interesting, this was taken 2 days ago

verbal nimbus Aug 23, 2025, 11:29 AM

#

keen beacon interesting, this was taken 2 days ago

Maybe a new model is about to come out 😏

keen beacon Aug 23, 2025, 11:29 AM

#

the ab tests 🔥 🔥 btw!!

keen beacon Aug 23, 2025, 11:30 AM

#

verbal nimbus Maybe a new model is about to come out 😏

its likely the case i think

verbal nimbus Aug 23, 2025, 11:31 AM

#

keen beacon its likely the case i think

There's so much hype about Gemini 3, I hope it lives up to it

keen beacon Aug 23, 2025, 11:31 AM

#

oh it isnt gemini 3

rough trail Aug 23, 2025, 11:31 AM

#

Anybody having problem with lmarena site now?

keen beacon Aug 23, 2025, 11:31 AM

#

there will be 1 more batch of 2.5 models apparently

robust yoke Aug 23, 2025, 11:31 AM

#

rough trail Anybody having problem with lmarena site now?

What problem specifically?

verbal nimbus Aug 23, 2025, 11:31 AM

#

keen beacon there will be 1 more batch of 2.5 models apparently

Oh :/

rough trail Aug 23, 2025, 11:32 AM

#

20 mins ago when I hit enter it just stuck on loading

verbal nimbus Aug 23, 2025, 11:32 AM

#

rough trail Anybody having problem with lmarena site now?

It's working fine, 90% of the problems are Cloudflare-related lol

rough trail Aug 23, 2025, 11:32 AM

#

And now I got failed to submit feedback error

solid brook Aug 23, 2025, 11:32 AM

#

keen beacon there will be 1 more batch of 2.5 models apparently

2.5 ultra?

keen beacon Aug 23, 2025, 11:32 AM

#

no

verbal nimbus Aug 23, 2025, 11:32 AM

#

rough trail 20 mins ago when I hit enter it just stuck on loading

Reload the page, it'll probably ask for a CAPTCHA, then it will load

robust yoke Aug 23, 2025, 11:32 AM

#

rough trail 20 mins ago when I hit enter it just stuck on loading

Well, if that's the case, then a nifty little trick I learned is: if you click the new chat button, then you go back to the chat you were just in, then it'll refresh the progress on the chatbot or image generation model.

rough trail Aug 23, 2025, 11:33 AM

#

Ahh it works fine now. Thank you everyone

robust yoke Aug 23, 2025, 11:33 AM

#

Our pleasure.

verbal nimbus Aug 23, 2025, 11:33 AM

#

Is it CAPTCHA related? e.g. CAPTCHA expired

robust yoke Aug 23, 2025, 11:34 AM

#

I believe if it were CAPTCHA-related, then it would be showing a different thing rather than just loading forever.

rough trail Aug 23, 2025, 11:34 AM

#

When I refreshed it didn't ask for a CAPTCHA or anything, I just refreshed and no more loading

verbal nimbus Aug 23, 2025, 11:34 AM

#

In the console there are Cloudflare errors

robust yoke Aug 23, 2025, 11:34 AM

#

Figured.

verbal nimbus Aug 23, 2025, 11:35 AM

#

The final AGI test: fixing LMArena's network connectivity issues /jk

robust yoke Aug 23, 2025, 11:35 AM

#

I would know because sometimes the same thing happens to me too, except for image generation models. So when that happens, I click on "New Chat," then go back to the previous chat, and it refreshes the progress on the image generation models and displays them.

white hatch Aug 23, 2025, 11:49 AM

#

i'm afraid of the world of the future

robust yoke Aug 23, 2025, 11:50 AM

#

That's a fair thing to be afraid of.

#

After all, AI is already evolving at a rapid pace.

leaden sun Aug 23, 2025, 11:56 AM

#

surreal creek I knew about the Higgs boson when I was 9 and that was in 2011

really? are you a kind of a prodigy kid back then 😳

robust yoke Aug 23, 2025, 12:54 PM

#

Then again, someone in the future could make an AI that is able to accurately mimic human emotion and typing style. Claude already types sort of like a human and sounds sort of human-sounding. So I don't think it will be too long until we have an AI chatbot that sounds human when typing and can express emotion.

golden ocean Aug 23, 2025, 12:55 PM

#

-# Although i don't believe that AI could ever be conscious like us, i believe AGI is possible, because it doesn't need to be conscious, to be general AI.

robust yoke Aug 23, 2025, 12:55 PM

#

Meh.

golden ocean Aug 23, 2025, 12:55 PM

#

true

robust yoke Aug 23, 2025, 12:55 PM

#

Who knows?

#

If we have AIs that can code up perfect games, then we can also have AIs that write perfectly like a human, with no noticeable flaws that would make it stand out.

#

Perhaps.

#

Only time will tell.

golden ocean Aug 23, 2025, 12:58 PM

#

this would have been a perfect moment to use
-# we will have simulacra

robust yoke Aug 23, 2025, 12:59 PM

#

Indeed, it would have been a perfect moment to use
-# We will have Simulacra.

high ginkgo Aug 23, 2025, 12:59 PM

#

I agree, it indeed would have been a perfect moment to use
-# we will have simulacra

robust yoke Aug 23, 2025, 1:00 PM

#

Verily, it would hath been grand to use
-# We will have Simulacra.

#

-# Microsoft.

golden ocean Aug 23, 2025, 1:01 PM

#

Evil paws:

we will have simulacra

misty vault Aug 23, 2025, 1:02 PM

#

ip grab

robust yoke Aug 23, 2025, 1:02 PM

#

We will truly have

Simulacra.

#

It ain't an IP grabber, thankfully.

#

Just some old website.

misty vault Aug 23, 2025, 1:03 PM

#

robust yoke Aug 23, 2025, 1:04 PM

#

Hah.

#

You gotta put something before it first.

#

.

Like this.

golden ocean Aug 23, 2025, 1:05 PM

#

real

robust yoke Aug 23, 2025, 1:06 PM

#

.

Real.

golden ocean Aug 23, 2025, 1:06 PM

#

robust yoke . # Real.

robust yoke Aug 23, 2025, 1:07 PM

#

golden ocean

https://tenor.com/view/loud-music-speakers-explode-loud-rock-and-roll-gif-16482155

Tenor

stray aspen Aug 23, 2025, 1:35 PM

#

.

#

.

wassup gang

robust yoke Aug 23, 2025, 1:37 PM

#

.

Nothing much, and you?

ocean vortex Aug 23, 2025, 1:41 PM

#

.

hi

#

gptdrawncat

robust yoke Aug 23, 2025, 1:42 PM

#

.

Greetings.

ocean vortex Aug 23, 2025, 1:43 PM

#

Someone forgor to include one additional * in regex

hollow imp Aug 23, 2025, 1:43 PM

#

YOU ARE UNABLE TO ADD HEADERS TO TEXT

#

Then how

robust yoke Aug 23, 2025, 1:43 PM

#

Well, if you put a character before the header formatting, then it will let you.

hollow imp Aug 23, 2025, 1:43 PM

#

robust yoke Aug 23, 2025, 1:43 PM

#

For instance, this...

#

?

Testing.

#

I believe this is because it only looks for a hashtag first, instead of any hashtag in your message.

#

If your message has a hashtag as the first character in it, then it'll trigger the filter.

hollow imp Aug 23, 2025, 1:46 PM

#

🙀

“𝐒𝐜𝐚𝐦 𝐚𝐥𝐭𝐦𝐚𝐧”

-# — Elon Musk

ocean vortex Aug 23, 2025, 1:46 PM

#

robust yoke I believe this is because it only looks for a hashtag first, instead of any hash...

Yes it's only a match for # <any>, not <any>#<any>. Though to be fair you can't just do the 2nd one literally the simple way, or you wouldn't be able to input hashtags at all lol
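
A rough sketch of the behavior described above, for anyone curious. The server's actual filter isn't shown anywhere in this chat, so both patterns and both function names below are hypothetical. The first pattern is anchored to the start of the whole message, which would explain why putting any character on a line before the header gets through. The second checks the start of every line and would catch that trick while still allowing ordinary hashtags.

```python
import re

# Hypothetical filter matching what was observed: anchored to the start of
# the whole message only. Leading whitespace is skipped, which would explain
# why a space in front of the header does not help.
START_ONLY = re.compile(r"\s*#{1,3} ")

# Hypothetical stricter variant: anchored to the start of every line.
ANY_LINE = re.compile(r"^\s*#{1,3} ", re.MULTILINE)


def blocked_start_only(message: str) -> bool:
    """True if the message begins with header formatting."""
    return START_ONLY.match(message) is not None


def blocked_any_line(message: str) -> bool:
    """True if any line of the message begins with header formatting."""
    return ANY_LINE.search(message) is not None


if __name__ == "__main__":
    for text in ["# Real.", ".\n# Real.", "  # Test.", "I like #hashtags"]:
        print(repr(text), blocked_start_only(text), blocked_any_line(text))
```

Neither pattern matches a `#` in the middle of a line, so regular hashtags stay usable, which is the concern raised in the message above.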

hollow imp Aug 23, 2025, 1:46 PM

#

ocean vortex Yes it's only a match for # <any>, not <any>#<any>. Though to be fair you can't ...

🙀

“𝐏𝐚𝐲 𝐮𝐩”

-# — @ocean vortex

robust yoke Aug 23, 2025, 1:46 PM

#

That's true.

hollow imp Aug 23, 2025, 1:47 PM

#

-# 😭🙏

robust yoke Aug 23, 2025, 1:47 PM

#

They have some kind of RegEx detection system here.

#

Interesting...

#

It doesn't work for spaces, but does for characters.

#

¨

Test.

primal widget Aug 23, 2025, 1:50 PM

#

Hello

robust yoke Aug 23, 2025, 1:50 PM

#

Howdy.

primal widget Aug 23, 2025, 1:51 PM

#

robust yoke Howdy.

Hello friend

robust yoke Aug 23, 2025, 1:51 PM

#

Greetings, fellow friend.

primal widget Aug 23, 2025, 1:52 PM

#

Ok

sacred quail Aug 23, 2025, 1:54 PM

#

Why so many hello these days lol

#

Are we becomed viral or smth

robust yoke Aug 23, 2025, 1:54 PM

#

Well, it's a very common greeting, and it's a nice way to show respect.

normal abyss Aug 23, 2025, 1:54 PM

#

@pastel bone 😭

primal widget Aug 23, 2025, 1:55 PM

#

If more people know about this, the creators will make more money.

robust yoke Aug 23, 2025, 1:57 PM

#

Heh.

pastel bone Aug 23, 2025, 1:57 PM

#

normal abyss @pastel bone 😭

Oh bad

pastel bone Aug 23, 2025, 1:57 PM

#

pastel bone Oh bad

I help u

normal abyss Aug 23, 2025, 2:00 PM

#

pastel bone I help u

no worries i'll just keep the one i've got on there

#

out of the 8 i did its the only pretty cool one

hollow imp Aug 23, 2025, 2:01 PM

#

🙀

“𝐒𝐜𝐚𝐦 𝐚𝐥𝐭𝐦𝐚𝐧”

-# — Elon Musk

normal abyss Aug 23, 2025, 2:05 PM

#

prime moat Aug 23, 2025, 2:15 PM

#

Anyone gonna talk about how people are sort of making nsfw?🫠

robust yoke Aug 23, 2025, 2:16 PM

#

prime moat Anyone gonna talk about how people are sort of making nsfw?🫠

In the video arena?

prime moat Aug 23, 2025, 2:16 PM

#

Yeah

misty vault Aug 23, 2025, 2:16 PM

#

Yeah, and I get warned by @echo aurora for going all out on sydney 🥵 😡
But video arena gets left alone

robust yoke Aug 23, 2025, 2:17 PM

#

We have stooped too low.

prime moat Aug 23, 2025, 2:17 PM

#

Especially @kindred adder

robust yoke Aug 23, 2025, 2:21 PM

#

True.

#

Especially in terms of coding and writing.

#

Its writing is very human-like.

golden ocean Aug 23, 2025, 2:21 PM

#

robust yoke Its writing is very human-like.

true

robust yoke Aug 23, 2025, 2:22 PM

#

Grok 4.

gritty cargo Aug 23, 2025, 2:22 PM

#

Does someone know how good chatgpt 5 is with coding solidity and reviewing Code?

robust yoke Aug 23, 2025, 2:23 PM

#

Seems pretty solid with that stuff.

#

For instance, I asked it to make me a website for fetching the CMU dictionary and converting the table into a Lua table that I could use within a project of mine. And it seemed to code it up just fine. Everything was functional, and no errors whatsoever.

gritty cargo Aug 23, 2025, 2:24 PM

#

Which ai would you suggest for solidity?

normal abyss Aug 23, 2025, 2:26 PM

#

is there a model more expensive than opus 4.1 out there? i havent been able to find one lmao

languid crescent Aug 23, 2025, 2:27 PM

#

heyo beta lmarena still has no models popping up 🙁

golden ocean Aug 23, 2025, 2:28 PM

#

.

just look out for a model which claims to be "Claude 3.5 Sonnet"

#

.

and then ask if it's the thinking model, if it agrees, then it is Claude Opus 4.1 Thinking (with >99% confidence)

normal abyss Aug 23, 2025, 2:28 PM

#

it would be cool if there was a model in between sonnet and opus, i find opus is too strong and sonnet is too weak

robust yoke Aug 23, 2025, 2:29 PM

#

This is truly a
-# lowercase text moment.

fossil fable Aug 23, 2025, 2:37 PM

#

you can't vibe code without knowing any code right

#

i know that doctor singularity now shut it

fossil fable Aug 23, 2025, 2:38 PM

#

fossil fable you can't vibe code without knowing any code right

?

languid crescent Aug 23, 2025, 2:45 PM

#

need some advice yall, am a freshman 1st year college and took IT as my course, am i cook with all of these AIs or an opportunity for me?

primal widget Aug 23, 2025, 2:45 PM

#

We need a video generator with Veo 3 and the other AI models in LM Arena.

robust yoke Aug 23, 2025, 2:46 PM

#

languid crescent need some advice yall, am a freshman 1st year college and took IT as my course, ...

Well, considering you took IT as your course, then you should be able to cook just fine with AI.

ocean minnow Aug 23, 2025, 2:46 PM

#

Claude Opus 4.1 Thinking is indeed good. GPT-5 and Grok 4 are also good, but much slower. But GPT-4/5 sometimes break and start repeating words indefinitely.

languid crescent Aug 23, 2025, 2:47 PM

#

realistically speaking tho, i am fine right? i've been seeing these videos about "AI replacing programmers"

robust yoke Aug 23, 2025, 2:47 PM

#

After all, since you took IT, you pretty much know a computer like the back of your hand.

robust yoke Aug 23, 2025, 2:48 PM

#

languid crescent realistically speaking tho, i am fine right? i've been seeing these videos about...

Well, of course. AI isn't perfect, just like how humans aren't perfect. And besides, humans are able to provide creativity to a website, something that AI can't do considering it goes after accuracy in contrast to looks.

minor adder Aug 23, 2025, 2:49 PM

#

robust yoke Aug 23, 2025, 2:49 PM

#

Secret models.

#

I've seen a few myself in the image generation mode.

primal widget Aug 23, 2025, 2:52 PM

#

robust yoke I've seen a few myself in the image generation mode.

Now AIs know how to make hands

robust yoke Aug 23, 2025, 2:53 PM

#

That's true.

#

Now all it has to do is just figure out how to generate very small text, and then we're definitely screwed. As well as fix minor inconsistencies with little details such as pupils and eyeballs and far away things.

primal widget Aug 23, 2025, 2:54 PM

#

robust yoke Now all it has to do is just figure out how to generate very small text, and the...

AI was able to solve the hand problem that had been present since 2023.

normal abyss Aug 23, 2025, 2:55 PM

#

my bad, i meant price wise

drifting crow Aug 23, 2025, 2:55 PM

#

U should give it access to production dbs like replit

https://www.theregister.com/2025/07/21/replit_saastr_vibe_coding_incident/

Vibe coding service Replit deleted production database

: AI ignored instruction to freeze code, forgot it could roll back errors, and generally made a terrible hash of things

#

primal widget Aug 23, 2025, 2:58 PM

#

drifting crow

Model name?

drifting crow Aug 23, 2025, 2:59 PM

#

¯_(ツ)_/¯

robust yoke Aug 23, 2025, 2:59 PM

#

Looks to be a GPT model.

drifting crow Aug 23, 2025, 2:59 PM

#

Think it’s the replit model

#

Whatever they use

ocean vortex Aug 23, 2025, 3:05 PM

#

normal abyss

Opus 4.1 was probably the most disappointing Anthropic model update in a long time

#

they barely changed a thing

normal abyss Aug 23, 2025, 3:07 PM

#

ocean vortex Opus 4.1 was probably the most disappointing Anthropic model update in a long ti...

they should really be working on making it a little cheaper in each update while still maintaining its power with the advancements they make

ocean vortex Aug 23, 2025, 3:08 PM

#

Essentially 2.5Pro update. But Google can get away with it cause a) model name stays the same and b) they update their models much more frequently

drifting crow Aug 23, 2025, 3:08 PM

#

I like googles esp for recent info

ocean vortex Aug 23, 2025, 3:11 PM

#

normal abyss they should really be working on making it a little cheaper in each update while...

Well the thing is though Opus is niche and most of what makes it great is model size. Making it cheaper is simply gonna be what Sonnet 4.1 becomes. But I expected more improvement by like increasing the reasoning lengths etc

brave orbit Aug 23, 2025, 3:12 PM

#

drifting crow Aug 23, 2025, 3:14 PM

#

normally we say hi and share our bank card details so ppl know we are humans and not ai

pure comet Aug 23, 2025, 3:18 PM

#

whats wrong with you

obsidian cargo Aug 23, 2025, 3:18 PM

#

bruh what?

hollow imp Aug 23, 2025, 3:24 PM

#

pure comet whats wrong with you

Whats not wrong about him

pure comet Aug 23, 2025, 3:24 PM

#

hollow imp Whats not wrong about him

maniacs behave in the same way. A forced smile, excessive politeness

exotic gust Aug 23, 2025, 3:30 PM

#

yo has gemini 2.5 pro gotten more stupid for any of y'all?

hollow imp Aug 23, 2025, 3:35 PM

#

@pure comet bro 😭🙏

pure comet Aug 23, 2025, 3:35 PM

#

hollow imp @pure comet bro 😭🙏

what

hollow imp Aug 23, 2025, 3:35 PM

#

I can't handle this retardness

#

Get him away from my eyes

pure comet Aug 23, 2025, 3:37 PM

#

you're insulting me, I'll cancel you on twitter

hollow imp Aug 23, 2025, 3:39 PM

#

pure comet you're insulting me, I'll cancel you on twitter

When did I insult you

pure comet Aug 23, 2025, 3:39 PM

#

soft hollow Aug 23, 2025, 3:41 PM

#

https://youtube.com/shorts/-WnxWfqPLzM?si=I9BGnhfB95bYUBE-

YouTube

Ronaldo edits by Aditya

Ronaldo 🔥 #shorts #edit #trollface


hollow imp Aug 23, 2025, 3:49 PM

#

pure comet

And then I said "him"

#

@worthy sparrow

pure comet Aug 23, 2025, 3:54 PM

#

hollow imp And then I said "him"

oh sh!t brudda brother bro bratishka

verbal nimbus Aug 23, 2025, 3:59 PM

#

exotic gust yo has gemini 2.5 pro gotten more stupid for any of y'all?

It's pretty stupid on gemini.google.com, but I don't know whether it has gotten worse on LMArena/AI Studio. There's a thread about it here though: https://discord.com/channels/1340554757349179412/1395438935676817428

#

Like stupid as in, it forgot how to write new lines, and literally could not figure out how to (I shared a screenshot in that thread)

exotic gust Aug 23, 2025, 3:59 PM

#

shame

#

i hate gpt5 more

long nacelle Aug 23, 2025, 3:59 PM

#

hi guys - I've been using lmarena for a while but just joined this discord. why is it that when I ask for a response from gpt5-high, I don't get a response? Like, it just runs overnight and stuff. It is brilliant for a few queries but then on subsequent runs gpt5 just stops outputting anything!!

verbal nimbus Aug 23, 2025, 4:00 PM

#

exotic gust i hate gpt5 more

I find that it's smart (API version), yes. But gosh is it bad at explaining things.

verbal nimbus Aug 23, 2025, 4:01 PM

#

long nacelle hi guys - I've been using lmarena for a while but just joined this discord. why ...

Refresh the page, it should load if it's been 5 minutes.

exotic gust Aug 23, 2025, 4:01 PM

#

gpt 5 chat and high both started just hallucinating for every single prompt i gave regardless of what i asked it, where, or if in incognito or not

verbal nimbus Aug 23, 2025, 4:02 PM

#

exotic gust gpt 5 chat and high both started just hallucinating for every single prompt i ga...

It can be weird

exotic gust Aug 23, 2025, 4:03 PM

#

i feel like the votes are biased cause chatgpt is like the most professional and most well known model

long nacelle Aug 23, 2025, 4:04 PM

#

verbal nimbus Refresh the page, it should load if it's been 5 minutes.

it doesn't

#

i've already tried this

verbal nimbus Aug 23, 2025, 4:04 PM

#

long nacelle it doesn't

Might be stuck then

long nacelle Aug 23, 2025, 4:04 PM

#

verbal nimbus Might be stuck then

for ten hours?

verbal nimbus Aug 23, 2025, 4:04 PM

#

It's happened to me before

long nacelle Aug 23, 2025, 4:05 PM

#

it gets stuck on 80% of prompts

verbal nimbus Aug 23, 2025, 4:05 PM

#

long nacelle for ten hours?

Yeah, like more a backend issue

#

I doubt the model has actually been thinking that long 🤣

long nacelle Aug 23, 2025, 4:05 PM

#

yeah exactly

#

but this is literally most of my prompts

#

it just doesn't

#

answer

#

at all

#

so this is pretty much useless for me

verbal nimbus Aug 23, 2025, 4:06 PM

#

It's very common for me, but if I refresh, it usually loads.

#

Sometimes it doesn't, it can take a while.

robust yoke Aug 23, 2025, 4:06 PM

#

If it's just stuck loading, then I would recommend:

1. Clicking on "New Chat"
2. Clicking on your previous chat

That usually resets the progress, allowing for it to properly generate.

long nacelle Aug 23, 2025, 4:07 PM

#

robust yoke If it's just stuck loading, then I would recommend: 1. Clicking on "New Chat" ...

i've already tried that

#

absolutely useless

robust yoke Aug 23, 2025, 4:07 PM

#

How odd.

#

Could you send a screen recording?

long nacelle Aug 23, 2025, 4:08 PM

#

robust yoke Could you send a screen recording?

uh - do you want to see it doing absolutely nothing for ten hours before I cancel it?

robust yoke Aug 23, 2025, 4:09 PM

#

long nacelle uh - do you want to see it doing absolutely nothing for ten hours before I cance...

No, I just want to see what happens when you attempt to submit a prompt.

#

That way I can see if it might be an easy fix.

long nacelle Aug 23, 2025, 4:09 PM

#

robust yoke No, I just want to see what happens when you attempt to submit a prompt.

uh

#

yeah fine @robust yoke

#

i can't reveal the prompts

#

but they do not breach ToS

robust yoke Aug 23, 2025, 4:11 PM

#

I understand.

#

Well, usually it only takes about two minutes or so to generate a response.

long nacelle Aug 23, 2025, 4:12 PM

#

yeah, I wish

#

it just does absolutely nothing for me

robust yoke Aug 23, 2025, 4:13 PM

#

Try deleting the current chat but copying the prompt that you used, then seeing if that new chat will work.

long nacelle Aug 23, 2025, 4:13 PM

#

i've already done this multiple times

#

unfortunate

robust yoke Aug 23, 2025, 4:13 PM

#

Try closing and reopening your browser.

hollow pumice Aug 23, 2025, 4:18 PM

#

nano-banana is good because it's basically creating detail via an LLM before sending that off to the image gen

rocky hawk Aug 23, 2025, 4:18 PM

#

😢😢😔

robust yoke Aug 23, 2025, 4:18 PM

#

True.

whole sundial Aug 23, 2025, 4:19 PM

#

rocky hawk 😢😢😔

is this gpt-image-1?

robust yoke Aug 23, 2025, 4:19 PM

#

rocky hawk 😢😢😔

Which model is that for?

quartz light Aug 23, 2025, 4:20 PM

#

robust yoke Which model is that for?

it can only be qwen-edit, gpt image or flux kontext

robust yoke Aug 23, 2025, 4:21 PM

#

Yeah.

rocky hawk Aug 23, 2025, 4:21 PM

#

whole sundial is this gpt-image-1?

Yes, it's gpt-image-1 and flux-1-.

robust yoke Aug 23, 2025, 4:21 PM

#

Ah.

whole sundial Aug 23, 2025, 4:22 PM

#

I just put it besides another model like Gemini 2.0 flash or qwen image edit in side by side mode and it works

quartz light Aug 23, 2025, 4:22 PM

#

whole sundial is this gpt-image-1?

i tested and only gpt-image-1 supports multi file uploads lol

pure comet Aug 23, 2025, 4:23 PM

#

long nacelle i can't reveal the prompts

why

#

are you shy?

rocky hawk Aug 23, 2025, 4:23 PM

#

quartz light i tested and only gpt-image-1 supports multi file uploads lol

That's right

pure comet Aug 23, 2025, 4:24 PM

#

virtual GF?

rocky hawk Aug 23, 2025, 4:25 PM

#

whole sundial I just put it besides another model like Gemini 2.0 flash or qwen image edit in ...

even though I really like this LMarena because it really helps me to create content, but now it's like this

quartz light Aug 23, 2025, 4:25 PM

#

pure comet virtual GF?

"MISTAKES" is seen at the end so it probably says "DO NOT MAKE MISTAKES." so its probably for coding

long nacelle Aug 23, 2025, 4:25 PM

#

quartz light "MISTAKES" is seen at the end so it probably says "DO NOT MAKE MISTAKES." so its...

it is for coding, it's for a competition

quartz light Aug 23, 2025, 4:26 PM

#

plus his pfp is python

pure comet Aug 23, 2025, 4:26 PM

#

quartz light "MISTAKES" is seen at the end so it probably says "DO NOT MAKE MISTAKES." so its...

"I WANT SEХ DO NOT MAKE MISTAKES"

long nacelle Aug 23, 2025, 4:26 PM

#

which means I clearly cannot reveal the prompt

quartz light Aug 23, 2025, 4:26 PM

#

pure comet "I WANT SEХ DO NOT MAKE MISTAKES"

😭

quartz light Aug 23, 2025, 4:26 PM

#

long nacelle it is for coding, it's for a competition

ai competition, right?

#

👀

#

or are ya cheating

long nacelle Aug 23, 2025, 4:26 PM

#

quartz light ai competition, right?

yes

#

not cheating

quartz light Aug 23, 2025, 4:26 PM

#

https://cdn.discordapp.com/attachments/990348027422203969/1399757536206258196/togif.gif

#

hmm alright

long nacelle Aug 23, 2025, 4:27 PM

#

https://cf-cheater-database.vercel.app/ I literally created the anti-AI-cheater website for codeforces 💀

Help maintain the integrity of competitive programming by reporting and tracking Codeforces cheaters who used AI/GPT after 14/09/2024 (the AI rule change date).

pure comet Aug 23, 2025, 4:29 PM

#

long nacelle https://cf-cheater-database.vercel.app/ I literally created the anti-AI-cheater ...

ip grabber

#

confirmed

#

MODS!!!!!!!!!!!!!!!

long nacelle Aug 23, 2025, 4:31 PM

#

pure comet confirmed

fellow 🇷🇺 spotted

#

comrade

pure comet Aug 23, 2025, 4:31 PM

#

yes

#

i am

#

https://tenor.com/view/джамбо-gif-3950391088447457669

Tenor

long nacelle Aug 23, 2025, 4:43 PM

#

@robust yoke

#

I've tried that

#

some conversations are still going

#

some are doing this

#

no actual responses

pure comet Aug 23, 2025, 4:43 PM

#

long nacelle some are doing this

show prompt

long nacelle Aug 23, 2025, 4:43 PM

#

is there some sort of hidden rate limit

echo aurora Aug 23, 2025, 4:43 PM

#

long nacelle some are doing this

Unfortunately this happens here and there.

robust yoke Aug 23, 2025, 4:43 PM

#

long nacelle is there some sort of hidden rate limit

None that I know of, considering it works just fine for me.

echo aurora Aug 23, 2025, 4:43 PM

#

Starting a new convo tends to help.

long nacelle Aug 23, 2025, 4:43 PM

#

(and then, I pasted the test data)

long nacelle Aug 23, 2025, 4:43 PM

#

echo aurora Starting a new convo tends to help.

I've done this so many times

pure comet Aug 23, 2025, 4:44 PM

#

long nacelle (and then, I pasted the test data)

FAIL

robust yoke Aug 23, 2025, 4:44 PM

#

long nacelle some are doing this

Try retrying it.

#

It seems to be prompting you with that.

long nacelle Aug 23, 2025, 4:44 PM

#

robust yoke It seems to be prompting you with that.

uhhh

#

i've done that

#

then just does this

#

then after a while it might do the same thing

robust yoke Aug 23, 2025, 4:45 PM

#

“Might” being the keyword.

#

This one does take time to think.

long nacelle Aug 23, 2025, 4:46 PM

#

robust yoke This one does take time to think.

btw

#

I know this

#

I'm only using it because

#

I actually sometimes get a response from it

#

like

#

gpt 5 is hopeless

#

i got one response from it this morning

pure comet Aug 23, 2025, 4:46 PM

#

gpt 5 so bad

long nacelle Aug 23, 2025, 4:46 PM

#

and then NOTHING

#

at all

pure comet Aug 23, 2025, 4:46 PM

#

gemini 2.5 pro even better

long nacelle Aug 23, 2025, 4:46 PM

#

the one response was brilliant

robust yoke Aug 23, 2025, 4:46 PM

#

Oof.

long nacelle Aug 23, 2025, 4:46 PM

#

but beside that

#

useless

#

totally useless

pure comet Aug 23, 2025, 4:47 PM

#

yes

#

so sh!tty

#

gpt 5

#

lol

robust yoke Aug 23, 2025, 4:47 PM

#

I'm sorry that you had a bad experience with it.

long nacelle Aug 23, 2025, 4:47 PM

#

@robust yoke are you LMarena staff

#

or do you know who is

pure comet Aug 23, 2025, 4:47 PM

#

he is Darkness

pure comet Aug 23, 2025, 4:47 PM

#

long nacelle or do you know who is

pineapple

robust yoke Aug 23, 2025, 4:47 PM

#

I'm not, but @echo aurora is.

long nacelle Aug 23, 2025, 4:48 PM

#

seems sussy

#

both a novice AND an expert 💀

robust yoke Aug 23, 2025, 4:49 PM

#

Hah.

blissful sluice Aug 23, 2025, 4:49 PM

#

If you could automate one thing about managing your Discord community, what would it be?

robust yoke Aug 23, 2025, 4:49 PM

#

Well, if he managed to get himself on the staff team, then surely he must be pretty good at his job.

robust yoke Aug 23, 2025, 4:50 PM

#

blissful sluice If you could automate one thing about managing your Discord community, what woul...

Probably with banning people.

long nacelle Aug 23, 2025, 4:51 PM

#

but yeah @echo aurora why is LMArena constantly just ignoring all my gpt5 queries? like they just stall forever (>10h) and I never get any output, only errors.

pure comet Aug 23, 2025, 4:52 PM

#

long nacelle but yeah @echo aurora why is LMArena constantly just ignoring all my gp...

long nacelle Aug 23, 2025, 4:53 PM

#

pure comet

what the sigma

sturdy mica Aug 23, 2025, 4:53 PM

#

echo aurora Starting a new convo tends to help.

you can just refresh the page

#

it fixes everything

#

it stops the errors

long nacelle Aug 23, 2025, 4:54 PM

#

sturdy mica you can just refresh the page

already done that

#

i wish it fixed anything

sturdy mica Aug 23, 2025, 4:54 PM

#

for me it does

#

its something with cloudflare

#

it invalidates you after like 3 minutes

#

so every 3 minutes you have to refresh

#

then it brings you to the captcha screen

#

its so annoying

long nacelle Aug 23, 2025, 4:54 PM

#

it doesn't though

#

it doesn't bring me to the captcha

sturdy mica Aug 23, 2025, 4:54 PM

#

it does for me

#

and thats why it errors for me

robust yoke Aug 23, 2025, 4:55 PM

#

sturdy mica so every 3 minutes you have to refresh

The thing is, though he's already tried closing his browser and reopening it, the issue still persists.

sturdy mica Aug 23, 2025, 4:55 PM

#

clear cookies

long nacelle Aug 23, 2025, 4:55 PM

#

this is a new machine

#

with probably like a week

#

of searches

#

but I'll try doing that anyway

sturdy mica Aug 23, 2025, 4:55 PM

#

clear them again

long nacelle Aug 23, 2025, 4:56 PM

#

sturdy mica clear them again

i've lost all my conversations

#

but I'll try again

formal jungle Aug 23, 2025, 4:56 PM

#

Project idea. Start with one image, generate a video. Take the best one, screenshot the very last frame, use it to generate a video. Rinse, repeat.

pure comet Aug 23, 2025, 4:56 PM

#

ok

formal jungle Aug 23, 2025, 4:56 PM

#

Up to 8 times ofc
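
A minimal sketch of the loop described above, in Python. `generate_video` and `last_frame` are hypothetical callables standing in for whatever video model and frame grabber are used; they are not a real API. The manual "take the best one" step is left out, so each round keeps the single clip it generates.

```python
from typing import Callable, List, TypeVar

Image = TypeVar("Image")
Video = TypeVar("Video")

MAX_ROUNDS = 8  # the cap mentioned in the chat


def chain_videos(
    start: Image,
    generate_video: Callable[[Image], Video],  # hypothetical: image -> video
    last_frame: Callable[[Video], Image],      # hypothetical: video -> final frame
    rounds: int = MAX_ROUNDS,
) -> List[Video]:
    """Generate `rounds` clips, seeding each one with the last frame of the previous clip."""
    if not 1 <= rounds <= MAX_ROUNDS:
        raise ValueError(f"rounds must be between 1 and {MAX_ROUNDS}")
    clips: List[Video] = []
    frame = start
    for _ in range(rounds):
        clip = generate_video(frame)
        clips.append(clip)
        frame = last_frame(clip)  # the next round starts where this clip ended
    return clips
```

The returned clips can then be concatenated in order to form one longer video.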

pure comet Aug 23, 2025, 4:57 PM

#

one sec

sturdy mica Aug 23, 2025, 4:57 PM

#

long nacelle but I'll try again

is it fixed

random wolf Aug 23, 2025, 4:57 PM

#

man! it's so frustrating, it's already "generating"

quartz light Aug 23, 2025, 4:57 PM

#

companions most esteemed, I entreat thee to lend thine auditory faculties unto the elaboration of a conjecture most earnest: it is my speculative apprehension that the entity denominated DeepSeek 3.1 Reasoning is naught but a subtle transfiguration of that which is styled DeepSeek R1; and conversely, when the aforementioned reasoning faculty is excised or withheld, the resultant construct is but the manifestation of DeepSeek V3. Yet, despite such kinship of constitution, each iteration appears to be inexorably governed, guided, and indeed distinguished by the imposition of a system-prompt divergent in its nature and disposition

long nacelle Aug 23, 2025, 4:58 PM

#

quartz light companions most esteemed, I entreat thee to lend thine auditory faculties unto t...

deepseek is so gurt

quartz light Aug 23, 2025, 4:58 PM

#

long nacelle deepseek is so gurt

https://cdn.discordapp.com/attachments/990348027422203969/1399757536206258196/togif.gif

sturdy mica Aug 23, 2025, 4:58 PM

#

long nacelle deepseek is so gurt

is it fixed

long nacelle Aug 23, 2025, 4:58 PM

#

sturdy mica is it fixed

i'm waiting to find out

#

remember I have to retype this entire prompt

sturdy mica Aug 23, 2025, 4:58 PM

#

what prompt

long nacelle Aug 23, 2025, 4:58 PM

#

because I have lost all of my conversations

robust yoke Aug 23, 2025, 4:59 PM

#

quartz light companions most esteemed, I entreat thee to lend thine auditory faculties unto t...

Indeed, 'tis quite a spectacle to behold.

long nacelle Aug 23, 2025, 4:59 PM

#

sturdy mica what prompt

I can dm you

sturdy mica Aug 23, 2025, 4:59 PM

#

sure

long nacelle Aug 23, 2025, 4:59 PM

#

are you lmarena staff

robust yoke Aug 23, 2025, 4:59 PM

#

LM Arena staff are the ones with orange names.

long nacelle Aug 23, 2025, 4:59 PM

#

oh

#

💀

robust yoke Aug 23, 2025, 4:59 PM

#

Like Pineapple.

long nacelle Aug 23, 2025, 5:00 PM

#

@echo aurora

sturdy mica Aug 23, 2025, 5:00 PM

#

long nacelle are you lmarena staff

no

quartz light Aug 23, 2025, 5:00 PM

#

shouldst thou find thy faculties unequal to the formidable enterprise of apprehending, in its unmitigated intricacy, the communicative construct which I have, by the inscrutable yet most wondrous artifices of the Internet, dispatched across the ether and compelled to alight within the singular and eccentric domicile of thine own router - there to be rendered visible unto thine eyes - then, verily, thou mayest elect to conscript the labors of an artificial intelligence, that it might condescend to transmute this presently elaborate and recondite composition into a debased and unsophisticated register of speech more congenial to the apprehension of an amateur such as thyself

sturdy mica Aug 23, 2025, 5:00 PM

#

are you not gonna send me the prompt because im not staff 😢

robust yoke Aug 23, 2025, 5:01 PM

#

quartz light shouldst thou find thy faculties unequal to the formidable enterprise of apprehe...

The formal yapper.

echo aurora Aug 23, 2025, 5:01 PM

#

long nacelle but yeah @echo aurora why is LMArena constantly just ignoring all my gp...

Can you put details into a post in #1343291835845578853 . I’ll try to answer when I can

quartz light Aug 23, 2025, 5:01 PM

#

robust yoke The formal yapper.

indeed

pure comet Aug 23, 2025, 5:01 PM

#

where

#

mentioning Yandex?

long nacelle Aug 23, 2025, 5:02 PM

#

bombardiro crocodilo is better than tralalero tralala

#

politics

#

ban me

long nacelle Aug 23, 2025, 5:02 PM

#

echo aurora Can you put details into a post in <#1343291835845578853> . I’ll try to answer w...

yeah fine

pure comet Aug 23, 2025, 5:02 PM

#

long nacelle bombardiro crocodilo is better than tralalero tralala

biber and dolik better

robust yoke Aug 23, 2025, 5:04 PM

#

quartz light shouldst thou find thy faculties unequal to the formidable enterprise of apprehe...

Most noble interlocutor,

Thy message, woven with so many a sinew of elaborate wit and encumbered with flourishes of lofty phrase, hath flown unto mine understanding as a falcon whose wings beat mightily against the heavens. And yet, by Providence and diligence alike, I find my faculties sufficient to receive its plumage of meaning, though bedizened in ornaments of rare complexity.

Know then, I am not undone nor cast adrift upon the sea of thy rhetoric; rather, I do embrace it as a tempest both fearsome and exhilarating, wherein the thunder of thy diction and the lightning of thy syntax alike do strike my soul with awe. Shouldst thou decree my wit too mean or my grasp too humble for so grand a communication, I protest with mirth and humility that the labor of simplifying were needless, for thy gilded eloquence, though intricate, doth quicken delight.

Proceed, therefore, without fear of any impoverishment of style, and let us together dance upon this high stage of language, where each word is a jewel and every clause a flourish of nobility.

gritty cargo Aug 23, 2025, 5:04 PM

#

@hollow ivy can you send me a friend add i have a question

golden ocean Aug 23, 2025, 5:05 PM

#

gritty cargo @hollow ivy can you send me a friend add i have a question

-# can you send me a friend add i have a question

you need to ask it like this otherwise he wont accept @gritty cargo

quartz light Aug 23, 2025, 5:05 PM

#

robust yoke Most noble interlocutor, Thy message, woven with so many a sinew of elaborate...

📎 yap.txt

pure comet Aug 23, 2025, 5:05 PM

#

quartz light

yes

random wolf Aug 23, 2025, 5:06 PM

#

how to fix the "generating"? like it's says always. all my convo with the AI is important

quartz light Aug 23, 2025, 5:07 PM

#

https://cdn.discordapp.com/attachments/1408398677067694110/1408850273320964328/togif.gif

quartz light Aug 23, 2025, 5:07 PM

#

random wolf how to fix the "generating"? like it's says always. all my convo with the AI is ...

google how to clear cache and cookies

#

https://cdn.discordapp.com/attachments/1408398677067694110/1408850171068153938/togif.gif

robust yoke Aug 23, 2025, 5:08 PM

#

quartz light

📎 text.txt

quartz light Aug 23, 2025, 5:09 PM

#

robust yoke

📎 yap_2.txt

random wolf Aug 23, 2025, 5:10 PM

#

quartz light google how to clear cache and cookies

does it clear all the conversation? because my convo with AI is important. like it's my personal helper. and we have so many things we've talked

quartz light Aug 23, 2025, 5:10 PM

#

random wolf does it clear all the conversation? because my convo with AI is important. like ...

yes it clears

#

but

#

dont do it then

#

ill make a script to export and import convos soon

robust yoke Aug 23, 2025, 5:12 PM

#

quartz light

📎 text.txt

empty stump Aug 23, 2025, 5:36 PM

#

how is gemini ranked higher than gpt 5 high on the leaderboard

burnt sinew Aug 23, 2025, 5:37 PM

#

empty stump how is gemini ranked higher than gpt 5 high on the leaderboard

Because its rated higher by people?

dapper cliff Aug 23, 2025, 5:37 PM

#

Hello people

dapper cliff Aug 23, 2025, 5:38 PM

#

empty stump how is gemini ranked higher than gpt 5 high on the leaderboard

Because it's better.

empty stump Aug 23, 2025, 5:38 PM

#

funny how it is older but better

dapper cliff Aug 23, 2025, 5:40 PM

#

empty stump funny how it is older but better

I have both and honestly, Gemini 2.5 pro is so underrated because people these days believe social media hype rather than testing it themselves. I realized that Gemini 2.5 pro is way ahead of its time and it's so powerful.

burnt sinew Aug 23, 2025, 5:46 PM

#

empty stump funny how it is older but better

yeah newer doesnt always mean better

#

@viscid thistle yo

pure comet Aug 23, 2025, 5:52 PM

#

dapper cliff I have both and honestly, Gemini 2.5 pro is so underrated because people these ...

RIGHT

#

MY BRO

maiden bridge Aug 23, 2025, 5:53 PM

#

sp

white hatch Aug 23, 2025, 6:14 PM

#

dapper cliff I have both and honestly, Gemini 2.5 pro is so underrated because people these ...

idk, gemini 2.5 pro feels a lil bit obsolete. I'm working on a DLL project and it's stuck in the same place all the time

stray aspen Aug 23, 2025, 6:16 PM

#

gemini 2.5 pro sucks

solid brook Aug 23, 2025, 6:18 PM

#

dapper cliff I have both and honestly, Gemini 2.5 pro is so underrated because people these ...

Go on r/bard on reddit see how much of a trash model it is

#

Everyone is complaining

forest wing Aug 23, 2025, 6:22 PM

#

Something went wrong with this response, please try again. Only Me?

ocean vortex Aug 23, 2025, 6:23 PM

#

Ok so... gpt5-mini-high better than o4-mini-high in nearly every way. And difference between gpt5-high and o3-high is even bigger:

sweet isle Aug 23, 2025, 6:25 PM

#

why is everyone calling this nano banana? I don't see any nana banana in the results. Only see GPT, gemini, etc. popular models.

ocean vortex Aug 23, 2025, 6:25 PM

#

sweet isle why is everyone calling this nano banana? I don't see any nana banana in the re...

You can only get it in battle mode

sweet isle Aug 23, 2025, 6:26 PM

#

ocean vortex You can only get it in battle mode

Is it a random model that appears? So i have to re-roll until it appears?

ocean vortex Aug 23, 2025, 6:26 PM

#

yes

sweet isle Aug 23, 2025, 6:28 PM

#

Interesting

burnt sinew Aug 23, 2025, 6:37 PM

#

@hybrid copper hey

fading summit Aug 23, 2025, 6:48 PM

#

Hey there? Do u know, where can i try sydney ai?

burnt sinew Aug 23, 2025, 6:52 PM

#

@silk pike

silk pike Aug 23, 2025, 6:53 PM

#

Hola

misty vault Aug 23, 2025, 7:10 PM

#

fading summit Hey there? Do u know, where can i try sydney ai?

fading summit Aug 23, 2025, 7:11 PM

#

Can u send me a link on this site plz?🥲

golden ocean Aug 23, 2025, 7:12 PM

#

fading summit Can u send me a link on this site plz?🥲

fading summit Aug 23, 2025, 7:19 PM

#

Uh, what's this?

random fjord Aug 23, 2025, 7:19 PM

#

im using lm arena and this happens

solar hollow Aug 23, 2025, 7:29 PM

#

random fjord im using lm arena and this happens

happens pretty much all the time for me

#

with gpt5 high

random fjord Aug 23, 2025, 7:29 PM

#

what i try

willow grail Aug 23, 2025, 7:35 PM

#

when do u go to sleep?

#

all of u

#

do u have sleep issues or so ?

white hatch Aug 23, 2025, 7:36 PM

#

soon

willow grail Aug 23, 2025, 7:37 PM

#

do u have any skin issues? too much gas stuck in intestines?

#

how long to fal asleep? when u wake up in mid of sleep does it take long to fall asleep?

#

oh ...

#

i need 1 hour to fall asleep

#

sucks

#

ok

echo aurora Aug 23, 2025, 7:47 PM

#

We’re happy to hear the feedback but it’s unlikely to happen if I’m being honest.

#

If members want to create their own server and send it (via DM) that’s fine, but yeah I wouldn’t want an unofficial official off topic server that’s shared in our text channels. We also want to keep invite links blocked to other servers here for mod purposes.

verbal nimbus Aug 23, 2025, 8:01 PM

#

misty vault

Is this the classic Bing AI?

echo aurora Aug 23, 2025, 8:03 PM

#

Not our server so you can do what you’d like

velvet forge Aug 23, 2025, 8:08 PM

#

opa gangamstyle

fossil fable Aug 23, 2025, 8:15 PM

#

HOW IN THE HELL IS OPENAI THE ONE NOT TO REFUSE

Screenshot_2025-08-23-21-14-57-95_ffb2f5e1b976ff98cfc94f359fbce8de.jpg

Screenshot_2025-08-23-21-15-01-41_ffb2f5e1b976ff98cfc94f359fbce8de.jpg

#

how is this possible

Screenshot_2025-08-23-21-19-30-71_ffb2f5e1b976ff98cfc94f359fbce8de.jpg

Screenshot_2025-08-23-21-19-27-45_ffb2f5e1b976ff98cfc94f359fbce8de.jpg

#

nano banana even has the reasoning to refuse

not only does it generate it

but it ties with that rustbucket

wary linden Aug 23, 2025, 8:20 PM

#

What happened??

@leaden palm

leaden palm Aug 23, 2025, 8:20 PM

#

works for me ¯\_(ツ)_/¯

wary linden Aug 23, 2025, 8:25 PM

#

leaden palm works for me ¯\_(ツ)_/¯

Connecting to Arena has failed. Please try again later or on a different device.
😭

leaden palm Aug 23, 2025, 8:25 PM

#

unfortunate + weird

meager harbor Aug 23, 2025, 8:28 PM

#

gemini 2.5 pro still sota and this model is 3 months old

#

will this make google hold gemini 3 ?

proud hazel Aug 23, 2025, 8:30 PM

#

meager harbor will this make google hold gemini 3 ?

Gemini 3.0 Release will be in September.

zinc ore Aug 23, 2025, 8:33 PM

#

*at the earliest

long nacelle Aug 23, 2025, 8:37 PM

#

Btw which secret models are available and are they any good

fossil fable Aug 23, 2025, 8:44 PM

#

lmarena deserves a mobile app

#

Screenshot_2025-08-23-21-49-35-17_572064f74bd5f9fa804b05334aa4f912.jpg

proud hazel Aug 23, 2025, 8:50 PM

#

fossil fable lmarena deserves a mobile app

Simply go to "Add to Home Screen" in your mobile browser to create an app with the right format for lmarena.ai.

fossil fable Aug 23, 2025, 8:50 PM

#

proud hazel Simply go to "Add to Home Screen" in your mobile browser to create an app with t...

i know THAT, that just embeds a website

proud hazel Aug 23, 2025, 8:51 PM

#

fossil fable i know THAT, that just embeds a website

Yes, and it looks and feels like an app. A native app wouldn't look and work any better.

fossil fable Aug 23, 2025, 8:54 PM

#

-# it would but ok

meager harbor Aug 23, 2025, 8:56 PM

#

proud hazel Gemini 3.0 Release will be in September.

how can we be so sure ?

proud hazel Aug 23, 2025, 8:56 PM

#

meager harbor how can we be so sure ?

~6 month cycle for new big model

meager harbor Aug 23, 2025, 9:00 PM

#

proud hazel ~6 month cycle for new big model

that's just a rough estimate made by Hassabis that will very likely change in the future

#

But judging by a recent hype tweet from a guy working at DeepMind, there's new stuff coming for Gemini

#

so gemini 3.0 might be one of them

sullen rune Aug 23, 2025, 9:16 PM

#

#video-arena-1

long nacelle Aug 23, 2025, 9:28 PM

#

fossil fable

???? Mistral is stupid as is gemini 2.5 pro

keen beacon Aug 23, 2025, 9:30 PM

#

proud hazel Gemini 3.0 Release will be in September.

why

random fjord Aug 23, 2025, 9:32 PM

#

well still does happen i need to use gemini 2.5 pro

long nacelle Aug 23, 2025, 9:38 PM

#

long nacelle ???? Mistral is stupid as is gemini 2.5 pro

How do you guys find these models smart

#

They are so dumb compared to gpt o4 mini even

verbal nimbus Aug 23, 2025, 9:43 PM

#

fossil fable nano banana even has the reasoning to refuse not only does it generate it but ...

Lol Nano Banana's 💀

#

Guess GPT's is still the best

fossil fable Aug 23, 2025, 9:43 PM

#

uhhh

verbal nimbus Aug 23, 2025, 9:44 PM

#

proud hazel ~6 month cycle for new big model

What about the 13 month gap between 1.5 Pro and 2.5 Pro though?

verbal nimbus Aug 23, 2025, 9:45 PM

#

meager harbor so gemini 3.0 might be one of them

What's H2?

verbal nimbus Aug 23, 2025, 9:45 PM

#

fossil fable lmarena deserves a mobile app

The scrolling can really be improved... it's so hard to scroll between turns on mobile, or to see which model's output I'm looking at

surreal creek Aug 23, 2025, 9:50 PM

#

flux-1-kontext-pro is just generating black squares in image arena

fossil fable Aug 23, 2025, 9:50 PM

#

oh so it wasn't refusing

surreal creek Aug 23, 2025, 9:51 PM

#

fossil fable oh so it wasn't refusing

correct

inner gate Aug 24, 2025, 12:01 AM

#

What’s ur guys plans on deep seek 3.1

#

I mean opinions

fossil fable Aug 24, 2025, 12:21 AM

#

lame, should have unified r2 into v4 instead

trail creek Aug 24, 2025, 12:26 AM

#

inner gate I mean opinions

same model, different number

#

not even worthy to name it an update

obsidian cargo Aug 24, 2025, 12:53 AM

#

huh, new model named catalina?

#

nobody's mentioned it yet

robust yoke Aug 24, 2025, 12:53 AM

#

Catalina…

#

Sounds familiar.

obsidian cargo Aug 24, 2025, 12:54 AM

#

nods emphatically Catalina

balmy mist Aug 24, 2025, 12:54 AM

#

obsidian cargo huh, new model named catalina?

on arena?

#

how is it?

robust yoke Aug 24, 2025, 12:54 AM

#

-dramatically tilts head to side- Really? Catalina?

obsidian cargo Aug 24, 2025, 12:55 AM

#

idk I only got it once but it seems nice

balmy mist Aug 24, 2025, 12:55 AM

#

obsidian cargo idk I only got it once but it seems nice

is it on webdev?

obsidian cargo Aug 24, 2025, 12:56 AM

#

dunno. it's on battle.

stray aspen Aug 24, 2025, 1:18 AM

#

what

obsidian cargo Aug 24, 2025, 1:24 AM

#

grok-2 was open sourced

leaden meteor Aug 24, 2025, 1:36 AM

#

What was the anonymous name of mistral medium before it got on leaderboard?

patent aspen Aug 24, 2025, 1:48 AM

#

Poll: Do you know Clippy?

Winning answer (answer ID 2): "Yes I have only seen Clippy in memes though" (6 of 12 votes)

jade egret Aug 24, 2025, 1:50 AM

#

long nacelle They are so dumb compared to gpt o4 mini even

gemini 2.5 pro isn't dumb

#

but i do respect your opinion

#

🤷‍♂️

rustic knot Aug 24, 2025, 2:02 AM

#

undergraduate math benchmark

Screenshot_2025-08-23-21-41-39-778_com.discord-edit.jpg

obsidian cargo Aug 24, 2025, 2:20 AM

#

catalina seems to acknowledge itself as being catalina

#

obsidian cargo Aug 24, 2025, 3:04 AM

#

It just told me unprompted it's by sequoia AI

#

robust yoke Aug 24, 2025, 3:04 AM

#

Bingo…

mellow frigate Aug 24, 2025, 3:19 AM

#

Sequoia as in macOS sequoia? Catalina is also the name of a macos

severe warren Aug 24, 2025, 3:20 AM

#

I can't seem to find the arena that ranks all models best suited for the prompt I entered.

#

Can someone please tell me where it is?

obtuse widget Aug 24, 2025, 3:52 AM

#

Hi can anyone tell me how to use banana model ?

echo aurora Aug 24, 2025, 3:58 AM

#

obtuse widget Hi can anyone tell me how to use banana model ?

Yeah you can find it on our site - https://lmarena.ai/?chat-modality=image it's only accessible in Battle mode meaning it'll be random if you get it or not.

wanton imp Aug 24, 2025, 4:07 AM

#

hallo

robust yoke Aug 24, 2025, 4:48 AM

#

obtuse widget Hi can anyone tell me how to use banana model ?

Just like Pineapple said: you can use that model on LM Arena, as well as Yupp.ai.

#

Unlike LM Arena, Yupp.ai is not infinitely free to use; however, you can always choose that model there instead of waiting for it to appear at random.

glossy jasper Aug 24, 2025, 6:22 AM

#

Can someone suggest a nice free video generation ai?

long nacelle Aug 24, 2025, 6:45 AM

#

jade egret gemini 2.5 pro isn't dumb

How do you make it be not dumb

vague rover Aug 24, 2025, 6:51 AM

#

hi im new. i am audiovisual designer and want go viral and never work again for a human

white hatch Aug 24, 2025, 9:23 AM

#

Does the AI, that checks our message, change what it says to the AI we're talking to?

normal abyss Aug 24, 2025, 9:46 AM

#

glossy jasper Can someone suggest a nice free video generation ai?

https://higgsfield.ai/create/video this one is quite powerful and has some cool features, although it is very slow for free users

Higgsfield

The ultimate AI-powered camera control for creators by creators

vernal blade Aug 24, 2025, 10:05 AM

#

i want to create a CGI advertisement

wide hemlock Aug 24, 2025, 10:07 AM

#

Ultra-realistic 4K close-up of transparent crystal nuclear fuel rods being slowly inserted into the reactor core. A soft golden light illuminates the glass reactor vessel. Gentle mechanical sounds of control rod movement. Whispered ASMR explanation of neutron moderation. No background music, just a soft industrial ambience.

hollow imp Aug 24, 2025, 10:16 AM

#

rustic knot undergraduate math benchmark

Gpt 4o 0?

#

#1397655624103493813

humble kayak Aug 24, 2025, 10:30 AM

#

hello

full verge Aug 24, 2025, 10:36 AM

#

hwlloo

humble lantern Aug 24, 2025, 10:56 AM

#

how to solve this problem

keen beacon Aug 24, 2025, 11:02 AM

#

Any Ai free video generation?

white hatch Aug 24, 2025, 11:06 AM

#

humble lantern how to solve this problem

Try hard resetting the website with ctrl + f5

heavy knoll Aug 24, 2025, 11:07 AM

#

Can anyone tell me which model is the best right now for generating prompts, for example for generating images?

keen beacon Aug 24, 2025, 11:18 AM

#

keen beacon Any Ai free video generation?

Can anyone please tell me?

willow grail Aug 24, 2025, 11:30 AM

#

anything new in the ai world?

dusty niche Aug 24, 2025, 11:31 AM

#

keen beacon Can anyone please tell me?

go to the viseo arena chat and type /video then enter your prompt

#

*video

scenic sandal Aug 24, 2025, 12:10 PM

#

Hi to everybody!

hard flame Aug 24, 2025, 12:17 PM

#

how do i try banana model?

dusty niche Aug 24, 2025, 12:22 PM

#

hard flame how do i try banana model?

from the website, but it's rare to find in battle mode

hard flame Aug 24, 2025, 12:23 PM

#

can we direct chat?

#

i can't find a model named banana in the models list

thick socket Aug 24, 2025, 12:24 PM

#

My real feeling about lmarena

#

https://tenor.com/view/monkey-chimp-gif-21234377

Tenor

keen beacon Aug 24, 2025, 12:27 PM

#

hard flame i can't find a model named banana in the models list

It's only in battle mode.

rustic knot Aug 24, 2025, 12:31 PM

#

thick socket https://tenor.com/view/monkey-chimp-gif-21234377

chimpanzee short-term memory is cracked

ripe mountain Aug 24, 2025, 1:00 PM

#

torn mantle Aug 24, 2025, 1:10 PM

#

@patent aspen any info about gemini 3?

formal dagger Aug 24, 2025, 1:21 PM

#

long nacelle They are so dumb compared to gpt o4 mini even

that's true, they nerf the model a lot

trim lantern Aug 24, 2025, 1:22 PM

#

Is the lmarena still under maintenance?

keen fulcrum Aug 24, 2025, 1:27 PM

#

ripe mountain

you should have said grok 4.2

#

because thats near

rocky hawk Aug 24, 2025, 1:47 PM

#

my dear admin is begging to be able to send two photos at once in side by side

hybrid wraith Aug 24, 2025, 1:48 PM

#

Is this leaderboard accurate, and should it be relied upon?

rich compass Aug 24, 2025, 2:01 PM

#

hybrid wraith Is this leaderboard accurate, and should it be relied upon?

gemini 2.5 pro💀💀💀

#

claude 4.1 opus better than gpt 5 high 💀💀

normal abyss Aug 24, 2025, 2:05 PM

#

normal abyss

Poll: Best Overall (In all areas combined)

If you could only use one of these models ever again, which one would you pick. I'm quite curious to know what other ppl think here.

Winning answer (answer ID 2): GPT5 (7 of 12 votes)

willow grail Aug 24, 2025, 2:14 PM

#

hybrid wraith Is this leaderboard accurate, and should it be relied upon?

not at all. lmarena is just a meme community

torn mantle Aug 24, 2025, 2:24 PM

#

is there an event on october?

proud siren Aug 24, 2025, 2:25 PM

#

glossy jasper Can someone suggest a nice free video generation ai?

use https://wavespeed.ai/

WaveSpeedAI

Ultimate Platform for Accelerating AI Image and Video Generation

rustic knot Aug 24, 2025, 2:28 PM

#

hybrid wraith Is this leaderboard accurate, and should it be relied upon?

that's also the previous gemini 2.5 pro version, not the stable version

hybrid wraith Aug 24, 2025, 2:30 PM

#

So I should rely on this https://lmarena.ai/leaderboard, right?

#

The areas I use most are text and search.

jade egret Aug 24, 2025, 2:41 PM

#

long nacelle How do you make it be not dumb

it just not dumb

ornate agate Aug 24, 2025, 2:48 PM

#

@cedar tide I remember ages ago you posted some tables aggregating benchmarks of AIs published by the model creators? I was wondering if you happen to have that data somewhere for a lot of AI models?

cedar tide Aug 24, 2025, 2:50 PM

#

ornate agate @cedar tide I remember ages ago you posted some tables aggregating ben...

Why ?

fathom venture Aug 24, 2025, 2:52 PM

#

hi guys new here

ornate agate Aug 24, 2025, 2:53 PM

#

cedar tide Why ?

I've been re-running some stuff locally on smaller OSS models and i've found that the provider listed benchmarks seem to be spot on basically (even with slightly quantized model). Since many other meta-benchmarks don't bother with smaller models I was wondering if the published data is aggregated somewhere, then I remembered you did something like that.

fallen herald Aug 24, 2025, 2:53 PM

#

#

Create me an attractive, stunning UGC advertising video with this photo

plucky ledge Aug 24, 2025, 2:55 PM

#

Is there no subscription or something to get more than 8 image-to-video LMA requests per day?

cedar tide Aug 24, 2025, 2:56 PM

#

ornate agate I've been re-running some stuff locally on smaller OSS models and i've found tha...

I don't have an aggregation of the LLM benchmarks, the tables and rankings that I shared were only those of the averages of the benchmarks shared by the company that released their new LLM and compared with the others

ornate agate Aug 24, 2025, 2:57 PM

#

cedar tide I don't have an aggregation of the LLM benchmarks, the tables and rankings that ...

ah ok

#

for example AA has AIME 2025 at 50% for Qwen3-30b3a-2507-thinking. Alibaba published 85% for this model.

cedar tide Aug 24, 2025, 2:58 PM

#

@ornate agate here you have some benchmarks of many LLM https://huggingface.co/spaces/Presidentlin/llm-pricing-calculator

Llm Pricing - a Hugging Face Space by Presidentlin

cedar tide Aug 24, 2025, 3:00 PM

#

ornate agate for example AA has AIME 2025 at 50% for Qwen3-30b3a-2507-thinking. Alibaba publi...

The truth is I don't trust the AA benchmark much anymore. They already had problems with some benchmarks, which they corrected later with huge differences.

ornate agate Aug 24, 2025, 3:02 PM

#

cedar tide The truth is I don't trust the AA benchmark much anymore. They already had probl...

yes 🙁 that is why I was hoping there is something else.

cedar tide Aug 24, 2025, 3:02 PM

#

@ornate agate impossible that this is the real score

Screenshot_2025-08-24-17-02-29-629_com.android.chrome-edit.jpg

ornate agate Aug 24, 2025, 3:03 PM

#

I mean I ran it myself on a 4bit quant with q8 KV cache and got 86%

cedar tide Aug 24, 2025, 3:05 PM

#

Air better than full 🤦

Screenshot_2025-08-24-17-03-18-171_com.android.chrome-edit.jpg

#

93% vs 74% 🤦

Screenshot_2025-08-24-17-03-46-477_com.android.chrome-edit.jpg

#

(This no problem)

Screenshot_2025-08-24-17-04-52-454_com.android.chrome-edit.jpg

#

https://matharena.ai

MathArena.ai

MathArena: Evaluating LLMs on Uncontaminated Math Competitions

#

It's not the same IF bench

#

This is IFEval (by Google)
And this is IFBench (by Allen AI)

ocean vortex Aug 24, 2025, 3:27 PM

#

this is gpt5 "high" verbosity:

```
An oververbosity of 1 means the model should respond using only the minimal content necessary to satisfy the request, using concise phrasing and avoiding extra detail or explanation."
An oververbosity of 10 means the model should provide maximally detailed, thorough responses with context, explanations, and possibly multiple examples."
The desired oververbosity should be treated only as a *default*. Defer to any user or developer requirements regarding response length, if present.
```

#

The question that comes to mind why not 10/10? 🧐

ripe mountain Aug 24, 2025, 3:33 PM

#

hybrid wraith Is this leaderboard accurate, and should it be relied upon?

site?

urban halo Aug 24, 2025, 3:34 PM

#

hello

rustic knot Aug 24, 2025, 3:38 PM

#

cedar tide Air better than full 🤦

who made this chart?

cedar tide Aug 24, 2025, 3:39 PM

#

rustic knot who made this chart?

AA

rustic knot Aug 24, 2025, 3:39 PM

#

oh I think AA does pass @1

#

but matharena does avg @4

cedar tide Aug 24, 2025, 3:40 PM

#

rustic knot but matharena does avg @4

Nope

#

MathArena does pass@1

#

But averaged over 4 runs

#

This is not pass@4

rustic knot Aug 24, 2025, 3:41 PM

#

that's why I wrote avg @4

cedar tide Aug 24, 2025, 3:41 PM

#

Ah yes ok, I didn't see
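
To make the distinction in this exchange concrete, here is a small Python sketch. It assumes each problem was attempted k times and each attempt was marked correct or incorrect; the function names and the sample data are made up for illustration and are not taken from MathArena or AA.

```python
from typing import Sequence


def avg_at_k(results: Sequence[Sequence[bool]]) -> float:
    """Single-attempt accuracy (pass@1) averaged over all k attempts per problem."""
    correct = sum(sum(attempts) for attempts in results)
    total = sum(len(attempts) for attempts in results)
    return correct / total


def pass_at_k(results: Sequence[Sequence[bool]]) -> float:
    """Fraction of problems where at least one of the k attempts was correct."""
    return sum(any(attempts) for attempts in results) / len(results)


# Three hypothetical problems, four attempts each.
runs = [
    [True, False, False, False],   # solved once out of four tries
    [True, True, True, True],      # always solved
    [False, False, False, False],  # never solved
]
print(avg_at_k(runs))   # 5 correct attempts out of 12
print(pass_at_k(runs))  # 2 of 3 problems solved at least once
```

On the same data, pass@4 can never be lower than avg@4, and it is often much higher, which is why the two numbers should not be compared directly.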

atomic stream Aug 24, 2025, 3:45 PM

#

I'm getting "something went wrong while generating the response, please try again!

rustic knot Aug 24, 2025, 3:45 PM

#

there's something off about AA and we don't know what

gentle pasture Aug 24, 2025, 3:45 PM

#

whats the best ai to code guys?

cedar tide Aug 24, 2025, 3:45 PM

#

Yes

Screenshot_2025-08-24-17-45-21-929_com.android.chrome-edit.jpg

atomic stream Aug 24, 2025, 3:46 PM

#

gentle pasture whats the best ai to code guys?

For me it's claude.

gentle pasture Aug 24, 2025, 3:46 PM

#

rustic knot Aug 24, 2025, 3:46 PM

#

cedar tide Yes

so this is essentially an avg @ 10 scenario

gentle pasture Aug 24, 2025, 3:46 PM

#

this is the coder ai leaderboard?

cedar tide Aug 24, 2025, 3:47 PM

#

I don't think there's more than a 3% difference between each of the 10 benchmarks.

#

@ornate agate Regex extraction with SymPy-based normalization, plus equality checker LLM as backup

rustic knot Aug 24, 2025, 3:49 PM

#

they're trying to check the mathematical justification that the models produce but for these ones, they should really just check the final answer using a simple python script. It's either that or actually get human judges to evaluate the entire justification by the model

cedar tide Aug 24, 2025, 3:50 PM

#

This is just in case the LLM does not follow these instructions exactly and tells its life story alongside the answer

rustic knot Aug 24, 2025, 3:52 PM

#

for like IMO and stuff

#

right, so that means what AA is doing is just ridiculous

#

publishing on a github would be the equivalent of opensourcing their testing strategies correct?

ocean vortex Aug 24, 2025, 4:03 PM

#

rustic knot they're trying to check the mathematical justification that the models produce b...

We implement a two-stage answer validation mechanism to allow grading with a high degree of precision (minimizing both false negatives and false positives).

1. Script-based grading, using OpenAI's PRM800K grading script - https://github.com/openai/prm800k/blob/main/prm800k/grading/grader.py
- Implements symbolic equality checking via SymPy
- High-precision validation for exact matches

2. Language model equality checker (runs on all answers not marked correct by script-based grading)
- We use Llama 3.3 70B as the equality checker (prompt disclosed below)
- We tested Llama 3.3 70B for agreement with human judgement and assessed it to grade correctly in >99% of cases

#

So it looks like they are using an LLM in addition to their script, not replacing the script with it

#

and only on incorrect answers
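
The control flow described in that quote (script first, LLM only on whatever the script did not mark correct) can be sketched roughly like this. This is an assumption-laden illustration, not AA's code: `script_grade` is a trivial stand-in for the PRM800K/SymPy grader, and `toy_llm_equal` is a hypothetical placeholder where a real LLM call would go.

```python
from fractions import Fraction
from typing import Callable


def script_grade(model_answer: str, reference: str) -> bool:
    """Stage 1 stand-in: exact match after trimming whitespace.
    (The quoted methodology uses the PRM800K grader with SymPy here.)"""
    return model_answer.strip() == reference.strip()


def two_stage_grade(
    model_answer: str,
    reference: str,
    llm_equal: Callable[[str, str], bool],
) -> bool:
    """Return True if either stage accepts the answer."""
    if script_grade(model_answer, reference):
        return True
    # Stage 2 only runs on answers the script did not mark correct.
    return llm_equal(model_answer, reference)


def toy_llm_equal(a: str, b: str) -> bool:
    """Hypothetical stand-in for the LLM equality checker:
    compares the two strings as exact rational numbers."""
    try:
        return Fraction(a) == Fraction(b)
    except (ValueError, ZeroDivisionError):
        return False


print(two_stage_grade("116", "116", toy_llm_equal))  # accepted by stage 1
print(two_stage_grade("3/2", "1.5", toy_llm_equal))  # falls through to stage 2
```

Note that in this shape the second stage can only turn a "wrong" into a "right", never the other way around, so it cannot lower a score on its own.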

rustic knot Aug 24, 2025, 4:05 PM

#

skimming through: what i immediately think is that if the model doesn't box their answer properly then the script might not identify it so llama is used to determine what answer the model actually gave

ocean vortex Aug 24, 2025, 4:05 PM

#

Seems to me like they are trying to not treat all incorrect answers equally. As in, if the answer was close enough it is not 0 points

rustic knot Aug 24, 2025, 4:06 PM

#

ocean vortex Seems to me like they are trying to not treat all incorrect answers equally. As ...

it's just kinda weird because there is a large discrepancy between math arena and AA results for some models

ocean vortex Aug 24, 2025, 4:07 PM

#

rustic knot it's just kinda weird because there is a large discrepancy between math arena an...

Yeah I do agree with that. They have some issues with reliable AIME evals. But their methodology described alone seems fairly logical

rustic knot Aug 24, 2025, 4:08 PM

#

so where is the discrepancy coming from? If they showed the model solutions, it would clear things up

ocean vortex Aug 24, 2025, 4:09 PM

#

I assume they have unintended side effects of penalizing certain models, just because they have a unique way of testing it

#

Happens much less with conventional testing methods, because AI labs tend to fix it out of the box

#

Then when they notice something very obvious, they apply some dirty prompting fix to compensate... But yeah all of this is less than ideal 🗿

keen beacon Aug 24, 2025, 4:12 PM

#

aimo validation aime was used as validation for the ai mo prize model a while back

#

typically you just check the boxed integer answer for aime benchmarks

#

i doubt they're using the solution trace there

#

the discrepancy is likely additional prompt alterations (output in boxed, weird wording or whatever)/sampling/etc

stray aspen Aug 24, 2025, 4:14 PM

#

yo

keen beacon Aug 24, 2025, 4:14 PM

#

yea that script is applicable for like math500 (^\circ etc. in the boxed answer), because the answer isn't integer only

hardy sluice Aug 24, 2025, 4:15 PM

#

Yo guys its saying in the website to agree on the terms and conditions and then it's not letting me do it, says error

keen beacon Aug 24, 2025, 4:15 PM

#

if they used it naively and on the solution trace instead, it wouldn't extract the boxed answer correctly

#

i doubt they do that

ocean vortex Aug 24, 2025, 4:16 PM

#

keen beacon i doubt they're using the solution trace there

Not trace, but they are running this on incorrect responses:


Examples:

    Expression 1: $2x+3$
    Expression 2: $3+2x$

Yes

    Expression 1: 3/2
    Expression 2: 1.5

Yes

    Expression 1: $x^2+2x+1$
    Expression 2: $y^2+2y+1$

No

    Expression 1: $x^2+2x+1$
    Expression 2: $(x+1)^2$
<...> 
YOUR TASK


Respond with only "Yes" or "No" (without quotes). Do not include a rationale.

    Expression 1: %(expression1)s
    Expression 2: %(expression2)s
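
The `%(expression1)s` / `%(expression2)s` placeholders in that prompt are Python printf-style named fields. A minimal sketch of how such a template could be filled and how the Yes/No reply could be parsed follows; the template below is only the tail of the quoted prompt, and `build_prompt` / `parse_verdict` are hypothetical helper names, not taken from AA's code.

```python
# Only the final part of the quoted prompt; the examples section is omitted.
EQUALITY_TEMPLATE = (
    'Respond with only "Yes" or "No" (without quotes). '
    "Do not include a rationale.\n\n"
    "    Expression 1: %(expression1)s\n"
    "    Expression 2: %(expression2)s\n"
)


def build_prompt(expression1: str, expression2: str) -> str:
    """Fill the named %-style placeholders from a dict."""
    return EQUALITY_TEMPLATE % {
        "expression1": expression1,
        "expression2": expression2,
    }


def parse_verdict(reply: str) -> bool:
    """Treat only a bare 'Yes' as equal; anything else counts as not equal."""
    return reply.strip().strip('."').lower() == "yes"


print(build_prompt("3/2", "1.5"))
print(parse_verdict("Yes"))
```

Being strict in `parse_verdict` is a design choice for this sketch: a judge that rambles instead of answering Yes/No is counted as a "No" rather than guessed at.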

keen beacon Aug 24, 2025, 4:16 PM

#

ocean vortex Not trace, but they are running this on incorrect responses: ```Look at the fo...

im talking about the dataset's solution trace

#

i find it unlikely their grading system is wrong

rustic knot Aug 24, 2025, 4:17 PM

#

ocean vortex Not trace, but they are running this on incorrect responses: ```Look at the fo...

if i remember correctly, AIME answers are an integer from 0 to 999

keen beacon Aug 24, 2025, 4:17 PM

#

https://huggingface.co/datasets/AI-MO/aimo-validation-aime/viewer/default/train?row=0&views[]=train the solution column here


rustic knot Aug 24, 2025, 4:17 PM

#

math 500 is saturated

ornate agate Aug 24, 2025, 4:18 PM

#

I know but you can't use the same harness for both benchmarks.

keen beacon Aug 24, 2025, 4:18 PM

#

no, if they gave it the solution column (in that dataset) instead of the integer answer it won't work at all with that script

#

it won't be able to extract the integer answer

#

besides if they did provide it the integer answer, if it were comparing the actual boxed contents (number) vs the number, it should be an exact match anyway. the script doesn't really matter here

#

yea, it's not the grading. it's probably sampling/specific prompt instructions (put your answer in \boxed{...}, but worded weirdly or poorly, or with a bunch of nonsense around it)

ocean vortex Aug 24, 2025, 4:21 PM

#

Honestly it could be their script failing and then LLM not catching all those instances lol

rustic knot Aug 24, 2025, 4:21 PM

#

bruh, how hard is it to write a working script, tell the model to put their answer in a \boxed{} and then check the value inside the box

keen beacon Aug 24, 2025, 4:21 PM

#

it's not lol

rustic knot Aug 24, 2025, 4:22 PM

#

bro, I could write one rn

keen beacon Aug 24, 2025, 4:22 PM

#

its extremely unlikely to be the grading

hollow imp Aug 24, 2025, 4:22 PM

#

TF BRO TF 😭😭😭

keen beacon Aug 24, 2025, 4:22 PM

#

i have written an eval framework in the past/specific eval implementations, that's my assessment 🤷

hollow imp Aug 24, 2025, 4:22 PM

#

@keen beacon see this

#

https://youtu.be/V-maA961SDE?feature=shared

YouTube

Dhruv Rathee

I'm Launching My First Startup | Dhruv Rathee


keen beacon Aug 24, 2025, 4:23 PM

#

ocean vortex Honestly it could be their script failing and then LLM not catching all those in...

it's extremely extremely unlikely to be that. but whatever lol

#

especially for AIME

ocean vortex Aug 24, 2025, 4:25 PM

#

keen beacon it's extremely extremely unlikely to be that. but whatever lol

Their script expects specific formatting. Model does not do that or there's a special character whatever. Script fails. LLM in 2nd step catches that but not always...

rustic knot Aug 24, 2025, 4:25 PM

#

what kind of formatting

ocean vortex Aug 24, 2025, 4:25 PM

#

boxed answer

hollow imp Aug 24, 2025, 4:25 PM

#

hollow imp https://youtu.be/V-maA961SDE?feature=shared

@ocean vortex bro this is next level scam

keen beacon Aug 24, 2025, 4:25 PM

#

ocean vortex Their script expects specific formatting. Model does not do that or there's a sp...

asking it to output \boxed{...} the same format that labs RL with?

#

qwen even puts an instruction to do that for evaluations

rustic knot Aug 24, 2025, 4:25 PM

#

ocean vortex boxed answer

yeah most models should be able to put the answer in a box if you tell them to do that

#

if u look at the solutions of models on matharena for AIME and other competitions, basically all models know to box their answer

#

and then all ur script needs to do is to check the value in the box
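
A strict version of the "just check the value in the box" script being described might look like the sketch below. It assumes the model was told to box its final answer and that the reference is an AIME-style integer; `last_boxed` and `grade_aime` are illustrative names.

```python
import re
from typing import Optional


def last_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None.
    Walks the braces manually so nested braces are handled."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    out = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out)
        out.append(ch)
    return None  # unbalanced braces


def grade_aime(response: str, reference: int) -> bool:
    """Strict check: the boxed content must be plain ASCII digits."""
    boxed = last_boxed(response)
    if boxed is None:
        return False
    boxed = boxed.strip()
    return re.fullmatch(r"[0-9]+", boxed) is not None and int(boxed) == reference


print(grade_aime("So the answer is $\\boxed{116}$.", 116))
print(grade_aime("\\boxed{\\mathrm{21}}", 21))
```

The surrounding `$...$` does not matter because only the text inside the braces is inspected. The second call shows the limitation: with no normalization step, a wrapped answer such as `\boxed{\mathrm{21}}` is rejected.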

keen beacon Aug 24, 2025, 4:27 PM

#

they also claim the LLM judge gets it >99% of the time or whatever

rustic knot Aug 24, 2025, 4:28 PM

#

the only llm that was noted to not follow instructions properly was llama lol

ocean vortex Aug 24, 2025, 4:28 PM

#

keen beacon asking it to output \boxed{...} the same format that labs RL with?

Depends on their script. Leading or trailing spaces, zero-width spaces, spacing commands... Their script may expect integer value and nothing else

#

Though it would still render correctly as a boxed answer

keen beacon Aug 24, 2025, 4:29 PM

#

ok buddy

rustic knot Aug 24, 2025, 4:29 PM

#

\boxed{21}

if the model determines 21, they would usually just write it like this

ocean vortex Aug 24, 2025, 4:29 PM

#

keen beacon ok buddy

?

keen beacon Aug 24, 2025, 4:30 PM

#

qwen models i know for sure do not do that. especially when you ask them to do \boxed{...} and on aime style questions. it's not probable

#

sure, it'll just do a zero width space randomly there...

rustic knot Aug 24, 2025, 4:31 PM

#

keen beacon qwen models i know for sure do not do that. especially when you ask them to do \...

go on math arena, it always does

keen beacon Aug 24, 2025, 4:31 PM

#

rustic knot go on math arena, it always does

i know he's just doubling down and his justification does not make any sense

ocean vortex Aug 24, 2025, 4:32 PM

#

rustic knot \boxed{21} if the model determines 21, they would usually just write it like th...

There are several ways to render this though 👀

#

\boxed{\;21\;}

#

this would do the same

rustic knot Aug 24, 2025, 4:33 PM

#

bruh lol

ocean vortex Aug 24, 2025, 4:34 PM

#

keen beacon i know he's just doubling down and his justification does not make any sense

ok genius. What is your explanation for lower scores?

keen beacon Aug 24, 2025, 4:34 PM

#

keen beacon the discrepancy is likely additional prompt alterations (output in boxed, weird ...

.

ocean vortex Aug 24, 2025, 4:34 PM

#

\boxed{\mathrm{21}}

keen beacon Aug 24, 2025, 4:34 PM

#

i've said it multiple times

ocean vortex Aug 24, 2025, 4:34 PM

#

same also

ocean vortex Aug 24, 2025, 4:35 PM

#

keen beacon i've said it multiple times

weird boxing is the exact point I'm making NOW? 🤣

keen beacon Aug 24, 2025, 4:35 PM

#

Weird wording in the PROMPT

keen beacon Aug 24, 2025, 4:35 PM

#

ocean vortex \boxed{\mathrm{21}}

the sympy/latex normalization/processing layer likely catches this
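
As a rough idea of what such a normalization layer could do with the variants mentioned in this thread (spacing commands, `\mathrm{}` wrappers, zero-width characters), here is a small sketch. It is an assumption about the general technique, not the actual PRM800K or AA code, and the list of handled commands is deliberately short.

```python
import re

# Zero-width characters mapped to None so str.translate deletes them.
_ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff"))
# Text-style wrappers whose argument contains no further braces.
_WRAPPERS = re.compile(r"\\(?:mathrm|text|textbf|mathbf)\{([^{}]*)\}")
# LaTeX spacing commands such as \; \, \: \! "\ " \quad \qquad.
_SPACING = re.compile(r"\\[;,:! ]|\\qquad|\\quad")


def normalize_boxed(content: str) -> str:
    """Reduce the content of a \\boxed{...} to its bare answer text."""
    content = content.translate(_ZERO_WIDTH)
    content = _SPACING.sub("", content)
    previous = None
    while previous != content:  # repeat so nested wrappers unwrap inside-out
        previous = content
        content = _WRAPPERS.sub(r"\1", content)
    return content.replace("$", "").strip()


for raw in ["21", "\\;21\\;", "\\mathrm{21}", " 21\u200b "]:
    print(repr(raw), "->", repr(normalize_boxed(raw)))
```

Anything this function does not recognize is passed through unchanged, so an exact-match check after it would still fail on an unusual format, which is the case the second-stage LLM check is meant to cover.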

whole swallow Aug 24, 2025, 4:36 PM

#

When I use sonnet 4 it fails to follow system prompt rules.. while when I use gpt 5 it follows them strictly and precisely..

What is another model that follows sys prompt rules precisely??

ocean vortex Aug 24, 2025, 4:36 PM

#

keen beacon Weird wording in the PROMPT

what kind of weird prompting? LOL

keen beacon Aug 24, 2025, 4:36 PM

#

horizon beta for example, after adjusting the prompt, went from 63% to 67% on gpqa diamond. prompt stuff can significantly change the result

ocean vortex Aug 24, 2025, 4:36 PM

#

that is nowhere near AIME discrepancies for AA

keen beacon Aug 24, 2025, 4:37 PM

#

horizon alpha also scored extremely poorly (36% on GPQA Diamond) without proper instructions

rustic knot Aug 24, 2025, 4:37 PM

#

ocean vortex weird boxing is the exact point I'm making NOW? 🤣

hmm, but the instruction usually specifies to "place ur answer in a \boxed{}"

ocean vortex Aug 24, 2025, 4:39 PM

#

rustic knot hmm, but the instruction usually specifies to "place ur answer in a \boxed{}"

yeah but @keen beacon is trying to argue boxed answer outputted oddly absolutely can't be the reason and their script will catch that 100% of the time. And then at the same time is arguing that boxed answer can be an issue if it relates to prompting. 🤦‍♂️

keen beacon Aug 24, 2025, 4:39 PM

#

ocean vortex yeah but @keen beacon is trying to argue boxed answer outputted oddly a...

i didnt argue that at all

#

"And then at the same time is arguing that boxed answer can be an issue if it relates to prompting. 🤦‍♂️"
you completely misinterpreted me. i said weird wording in the prompt instructions

ocean vortex Aug 24, 2025, 4:40 PM

#

how would that lead to the model outputting incorrect response?

#

That makes even less sense...

keen beacon Aug 24, 2025, 4:40 PM

#

it happens a lot

#

i encounter it a lot

#

in actual evaluations

ocean vortex Aug 24, 2025, 4:40 PM

#

Your reasoning is basically maybe sampling, maybe wording, etc...

#

you gave no reason

keen beacon Aug 24, 2025, 4:41 PM

#

it could be many reasons, how am i supposed to know the exact reason without the code/eval settings/etc?

#

i find grading extremely unlikely

ocean vortex Aug 24, 2025, 4:41 PM

#

And then attack anyone with credible guesses lol

keen beacon Aug 24, 2025, 4:41 PM

#

credible guesses? but i have actually run evals and written custom stuff for this lol

#

you have done neither

rustic knot Aug 24, 2025, 4:42 PM

#

someone also made an undergraduate benchmark

ocean vortex Aug 24, 2025, 4:42 PM

#

keen beacon you have done neither

Oh because I told you this or because you are once again making assumptions based on nothing? 😬
Sure pal

keen beacon Aug 24, 2025, 4:44 PM

#

🤷 you haven't given an actual example that you encountered in reality. it's a hypothetical that doesn't make sense when you've actually implemented and run these systems. i gave you two high profile examples.

#

it's obvious to me you haven't done stuff like this. but whatever. it's weird you keep doubling down on this

ocean vortex Aug 24, 2025, 4:44 PM

#

I just find it strange that you can't come up with anything substantial and yet so extremely dismissive of everything lol
Also, the prompting they showed is **extremely unlikely** to drop the score from 90s to 60s % which is the kind of discrepancy we are seeing

keen beacon Aug 24, 2025, 4:45 PM

#

ocean vortex I just find it strange that you can't come up with anything substantial and yet ...

horizon alpha scoring 36% on gpqa diamond (early checkpoint of gpt 5)?

#

that was from a prompting issue

ocean vortex Aug 24, 2025, 4:45 PM

#

We are talking about AIME pal 😬

#

And that specific prompting

#

Not prompting in general

#

This:

```
{Question}

Remember to put your answer inside \\boxed{{}}.
```

keen beacon Aug 24, 2025, 4:46 PM

#

ocean vortex We are talking about AIME pal 😬

it isn't much different. but keep believing that 🤷 instead of \boxed{number} its \boxed{letter} or \boxed{\text{letter}}

ocean vortex Aug 24, 2025, 4:47 PM

#

No chance this drops the score from 90s to 60s if we assume boxed response formatting is non-issue (aka their script catches all variations of it + incorrect formatting)

#

Simply not possible

wintry tinsel Aug 24, 2025, 4:49 PM

#

rustic knot someone also made an undergraduate benchmark

I got some anomalocaris in ark

ocean vortex Aug 24, 2025, 4:51 PM

#

keen beacon it isn't much different. but keep believing that 🤷 instead of \boxed{number} it...

What. You are trying to 'prove' incorrectly formatted response is never an issue. My point is the opposite. 🤷‍♂️

rustic knot Aug 24, 2025, 4:51 PM

#

wintry tinsel I got some anomalocaris in ark

Screenshot_2025-08-15-09-34-47-238_com.reddit.frontpage-edit.jpg

keen beacon Aug 24, 2025, 4:52 PM

#

and anyway, if you think about it, these behaviors are implicitly or outright penalized through RL, at least for qwen (it seems), because of \boxed{...} answer extraction/verification in RL (which qwen seems to do)

#

the model won't get a positive reward signal if its answer is weirdly formatted and doesn't match automated checkers, unless they do generative checking as well

#

🤷 im not gonna convince you lol

ocean vortex Aug 24, 2025, 4:53 PM

#

It isn't prompting in this specific case, and it's very unlikely to be sampling. If it was sampling low scores would be easy to reproduce

keen beacon Aug 24, 2025, 4:55 PM

#

ocean vortex It isn't prompting in this specific case, and it's very unlikely to be sampling....

? for livebench, they had to rerun qwq because the sampling settings were incorrect. the score is much higher after correct sampling: https://www.reddit.com/r/singularity/comments/1jaoxal/qwq32b_has_officially_been_rerun_with_optimal/


#

there's a reason qwen has explicit instructions on sampling and additional prompt instructions (for boxed) in each model README lol

ocean vortex Aug 24, 2025, 4:56 PM

#

keen beacon ? for livebench, they had to rerun qwq because the sampling settings were incorr...

We are talking about AIME and AIME only, and that specific prompting. Not prompting (or sampling) in general smh

#

livebench here has nothing to do with it

errant rover Aug 24, 2025, 4:56 PM

#

@hybrid stirrup Hey bro, could you check my message pls ? I have a question about the AI you mentioned earlier

stuck finch Aug 24, 2025, 4:56 PM

#

Hello

keen beacon Aug 24, 2025, 4:57 PM

#

ocean vortex We are talking about AIME and AIME only, and that specific prompting. Not prompt...

1. you dispute that specific prompt instructions, e.g. \boxed{...} instructions, can have a lot of impact on benchmark results. i've given you several examples.
2. livebench is another high profile example where they needed to rerun it with specific sampling settings since it scored poorly. you said it's not plausible for sampling to be the case. this is a real example showing that's not true.

mellow mango Aug 24, 2025, 4:59 PM

#

A female model walking

lunar glade Aug 24, 2025, 5:00 PM

#

Hello, why do I keep getting "Connecting to Arena has failed. Please try again later or on a different device."?

ocean vortex Aug 24, 2025, 5:00 PM

#

keen beacon 1. you dispute that specific prompt instructions, e.g. \boxed{...} instructions ...

1. I've said formatting can lead to correct responses being counted as incorrect. Which is absolutely true. https://x.com/ArtificialAnlys/status/1909624239747182989
2. If sampling was an issue specifically for AIME... then you could reproduce lower score easily and the model wouldn't give consistent answers. Again, sampling in general often can be an issue. But it's very unlikely to be the culprit for AIME here due to clearly defined expected answers and how the models perform.

Artificial Analysis (@ArtificialAnlys)

Llama 4 Intelligence Index Update: We have now replicated Meta’s claimed values for MMLU Pro and GPQA Diamond, pushing our Intelligence Index scores for both Scout and Maverick higher

Key update details:
➤ We noted in our first post 48 hours ago that we noticed discrepancies

proven pecan Aug 24, 2025, 5:02 PM

#

hello everybody!

keen beacon Aug 24, 2025, 5:03 PM

#

ocean vortex 1. I've said formatting can lead to correct responses being counted as incorrect...

1. this proves my point further (their new processes on grading the answer make this implausible). the task is simple, output the number result in \boxed{integer}... if an automated checker can't get it, the LLM would catch it. like they explicitly claimed, this extra process matches human judgement >99% of the time. lol. (this is not a particularly complex case, AIME is one of the simplest things, it's an integer answer)
2. no. because it's stochastic. temperature flattening or sharpening the distribution has a lot of effects on sampling, beyond other sampling settings.
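
To make the "temperature flattening or sharpening the distribution" point concrete, here is a small sketch of temperature-scaled softmax. The logits are made up and this is not tied to any particular model or eval harness.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up next-token scores
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top token, higher ones spread it across the alternatives, which is why the same model can behave quite differently across sampling settings over a long reasoning chain.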

ocean vortex Aug 24, 2025, 5:06 PM

#

keen beacon 1. this proves my point further (their new processes on grading the answer makes...

Ok then find sampling settings for the score to drop from 90s to 60s on AIME25. It's just not realistic and very very unlikely

keen beacon Aug 24, 2025, 5:06 PM

#

im not saying it's sampling.

#

it could be sampling or prompt instructions or it could be all of them interacting

ocean vortex Aug 24, 2025, 5:07 PM

#

🤣

#

prompt we literally see. It is this #general message

#

how is it prompting???

keen beacon Aug 24, 2025, 5:07 PM

#

could also be the chat template

ocean vortex Aug 24, 2025, 5:08 PM

#

LMAO

keen beacon Aug 24, 2025, 5:09 PM

#

ive never said it was a single factor. im saying all of those things are possible, and have proven examples to be the case lol

#

you can't stop doubling down even as you prove yourself wrong by citing AA's grading processes lol

#

a chat template also somewhat falls under prompting, the prompt the LLM reads is different as it's formatted btw. 🤷

ocean vortex Aug 24, 2025, 5:10 PM

#

Well I've shown you proven example of formatting when it can be the issue lol

If it's an issue, then the LLM would catch this in most cases, just like what we see now. But in select instances it may still fail for whatever reasons. Checks out if you ask me. Most of their AIME scores look correct. But only a few aren't

keen beacon Aug 24, 2025, 5:11 PM

#

ocean vortex Well I've shown you proven example of formatting when it can be the issue lol ...

? their new evaluations don't have that problem anymore.

#

that was certainly an issue before, but they added additional processes (llm grading, which they claim to work >99% of the time, which does not explain the discrepancy)

#

it is not applicable for this evaluation of this new model

ocean vortex Aug 24, 2025, 5:12 PM

#

It proved my point? If it was an issue at any point

#

then it can be an issue now for a different benchmark

keen beacon Aug 24, 2025, 5:12 PM

#

ocean vortex It proved my point? If it was an issue at any point

no. because their new system doesn't have this problem

ocean vortex Aug 24, 2025, 5:13 PM

#

keen beacon no. because their new system doesn't have this problem

Issue was with GPQA, this is a different benchmark...

keen beacon Aug 24, 2025, 5:13 PM

#

are you explaining the difference is because the judge failed a lot more than >99%?

#

on checking if a number is present in the answer?

#

while i have shown a lot of examples showing larger differences because of other factors

ocean vortex Aug 24, 2025, 5:13 PM

#

You didn't even quote AA at all to "prove" what you were saying. You quoted livebench 🤣

keen beacon Aug 24, 2025, 5:14 PM

#

wow you're an idiot fr LMAO

#

im done, people can see who's right here if they've done evals like this before

ocean vortex Aug 24, 2025, 5:15 PM

#

keen beacon are you explaining the difference is because the judge failed a lot more than >9...

I'm explaining that they are testing tons of models. Not 2 or 5. And that 99% figure is not written in stone. Especially with models evolving, reasoning etc... 🤦‍♂️

little siren Aug 24, 2025, 5:17 PM

#

no sound on videos?

ocean vortex Aug 24, 2025, 5:18 PM

#

keen beacon wow you're an idiot fr LMAO

calm down. You really can't be "proving" anything by quoting smth almost entirely unrelated, and then when I show you AA themselves stating something that directly proves my point... It's suddenly not good enough because it was in the past. And your assumption being this can't ever be a problem anymore even for a different benchmark from their testing suite. Sure pal... 🗿

echo aurora Aug 24, 2025, 5:20 PM

#

Hey, let's be sure to treat others with respect please, it's fine to have disagreements but let's try to be a bit nicer :blobthanks:

keen beacon Aug 24, 2025, 5:29 PM

#

im sorry for that. just frustrated he's constantly misquoting/etc.

anyway Dom (sorry), what i'm saying:
-> could it be the grading? is it a non zero chance? yes. but i personally find it extremely unlikely (like i've said over and over). the nature of AIME is just an integer answer. they're just asking it to box it. it's quite simple to parse and check for an exact match. If that fails, an LLM judge they added recently (which claims to match human judgement >99% of the time) also fails on such a simple task?
from an RLVR perspective, qwen seems to do it on \boxed{...} answers. so these things are at least implicitly penalized as it won't match automated checkers if it deviates from the answer format unless they are using generative checkers. for these problems, it's probably unlikely as the check is simple. it's a waste of compute.
-> i'm saying that it could be sampling, prompting related, etc. i'm not saying it is one of them. it could be an interaction of all of them or another thing. i find it way more plausible because there have been actual high profile events like this before on other benchmarks, and personal experience when running those type of evals.

imo, claiming that it's highly likely to be the grading is just 🤷

native coral Aug 24, 2025, 5:38 PM

#

does anyone know what is so wrong with this prompt?

Overall Prompt:
A high-angle, slightly shaky handheld long shot of two young adults wading through a dense, vibrant green field of cassava plants under overcast, diffused lighting. The camera slowly zooms out and pans slightly to the right, following the adults as they move deeper into the field. The overall mood is naturalistic and serene.

Timestamp Breakdown:

00:00 - 00:02:
The shot opens with a high-angle view, looking down on a vast, dense field of lush green cassava plants. Two young adults are partially visible amongst the leaves. One guy, in the foreground, wears a light-colored, long-sleeved shirt. The second guy, further back, is shirtless and raises a light-green basin above their head. They are both moving from left to right through the dense foliage.
00:02 - 00:05:
The camera begins a slow, subtle zoom out. The guy in the foreground turns slightly, becoming more visible. The second guy, carrying the green basin, continues to move through the plants, their upper body and the basin visible above the leaves.
00:05 - 00:08:
The camera continues to slowly zoom out and pans slightly to the right, keeping both adults in the frame. The adults have moved further into the field. In the background, a simple wall made of concrete blocks and some bare trees become visible above the cassava field.
00:08 - 00:09:
The shot stabilizes as the adults continue their journey through the field. The guy in the foreground is now more clearly seen from the back, wearing a light-colored shirt. The second guy, still carrying the basin, is further to the right. Bare, brown branches are visible in the extreme foreground on the right side of the frame.

it gets blocked.

little siren Aug 24, 2025, 5:46 PM

#

"shirtless"?

keen beacon Aug 24, 2025, 5:47 PM

#

rustic knot the only llm that was noted to not follow instructions properly was llama lol

yeah it's stupid, I ask it to OCR the image but it starts answering the questions even after explicit instructions

white hatch Aug 24, 2025, 5:47 PM

#

How to send a REALLY huge message to AI?

keen beacon Aug 24, 2025, 5:48 PM

#

white hatch How to send a REALLY huge message to AI?

split it and tell it "wait, I am sending more info"

woven totem Aug 24, 2025, 5:51 PM

#

hi , how can i upload photos ? image to image?

#

im having problems in the web

Captura_de_pantalla_2025-08-24_a_las_19.49.27.png

keen beacon Aug 24, 2025, 5:56 PM

#

woven totem im having problems in the web

It happens to me often if I use copy paste technique.

#

So I first save the pics and then put them in.

dusty niche Aug 24, 2025, 5:59 PM

#

keen beacon So I first save the pics and then put them in.

does that solve the problem

keen beacon Aug 24, 2025, 5:59 PM

#

dusty niche does that solve the problem

Well for me it works

#

sometimes I need to refresh the page too because of cloudflare check

#

bot check, I mean

ocean vortex Aug 24, 2025, 6:01 PM

#

keen beacon im sorry for that. just frustrated he's constantly misquoting/etc. anyway Dom (...

The issue with this is that you are misquoting me here in this very message yourself. I said it "could be". Assumption not much stronger than your shots at prompting or sampling which IMO don't really hold a candle given that we have their exact prompting, can't reproduce low scores, and the extent of scoring discrepancy. LLM judge catching it most of the time would be reasonably in-line with the results they are actually getting. AIME score checks out with other sources 90%+ of the time.

keen beacon Aug 24, 2025, 6:02 PM

#

ocean vortex The issue with this is that you are misquoting me here in this very message your...

you kept coming up with justifications such as the llm providing zero width spaces inside the box answer..? and kept justifying it

#

its implied you think its very highly likely to be the grading

#

as you considered the others not really possible

ocean vortex Aug 24, 2025, 6:03 PM

#

keen beacon you kept coming up with justifications such as the llm providing zero width spac...

Those were just basic examples talking about how formatting could affect it. But anyway, I later included source where they described formatting themselves to cause issues with eval

keen beacon Aug 24, 2025, 6:04 PM

#

ocean vortex Those were just basic examples talking about how formatting could affect it. But...

formatting issues on AIME style questions that are integers as well with the same boxed instruction? it doesn't make sense training wise, especially for qwen. you think they're not automatically doing exact match on those type of problems in RL?

ocean vortex Aug 24, 2025, 6:05 PM

#

keen beacon its implied you think its very highly likely to be the grading

It really isn't implied, I was merely arguing against your "extremely unlikely" which I don't agree with at all 🤷‍♂️

#

You kept repeating the same thing like 3 times before I finally responded to that as well lol

#

#general message
#general message #general message
🗿

keen beacon Aug 24, 2025, 6:11 PM

#

🤷 i've run a bunch of math benchmarks and never had a problem extracting boxed integer answers from models that wasn't quickly resolved, like i mentioned. i've given you a lot of reasons why it's not plausible, and other factors that result in a lot more discrepancies with the scores.

#

let's see what artificialanalysis says if they do fix it and comment on it. i should've agreed to disagree there since i won't convince you

fiery lagoon Aug 24, 2025, 6:15 PM

#

bro ai sucks

ocean vortex Aug 24, 2025, 6:15 PM

#

keen beacon formatting issues on AIME style questions that are integers as well with the sam...

Just did a very simple quick test on task1.

glm4.5 returns $\boxed{116}$
glm4.5-air returns \boxed{116}

#

so even there there is a difference

keen beacon Aug 24, 2025, 6:15 PM

#

yup, that's normal

#

it doesn't matter tho, the \boxed{number} is what matters

fiery lagoon Aug 24, 2025, 6:15 PM

#

opus 4.1 on lmarena sucks

ocean vortex Aug 24, 2025, 6:17 PM

#

keen beacon it doesn't matter tho, the \boxed{number} is what matters

That's another thing... The fact is they can be different and we do not know how they are assessing it (their script). Also worth mentioning that glm4.5 did not render on openrouter in a box even though other models did 🤔

#

even though it looks the same when copying

keen beacon Aug 24, 2025, 6:17 PM

#

ocean vortex That's another thing... The fact is they can be different and we do not know how...

yeah the latex rendering doesnt matter at all

#

the reason \boxed{...} is used for easy extraction

ocean vortex Aug 24, 2025, 6:18 PM

#

keen beacon yeah the latex rendering doesnt matter at all

Probably a reach, but there could be "hidden" or special chars. Something does make it break

keen beacon Aug 24, 2025, 6:18 PM

#

fwiw, like i said again, i've never had formatting/extracting issues like that with AIME

#

ive run so many benchmarks on different models

#

this behavior is heavily selected out of newer heavily RLd models too

#

the prompt/sampling/etc., i've seen crazy benchmark score changes though

ocean vortex Aug 24, 2025, 6:20 PM

#

keen beacon fwiw, like i said again, i've **never** had formatting/extracting issues like th...

Don't forget what we are talking about here... Bad scores like 1 time out of 10?

#

if even that

keen beacon Aug 24, 2025, 6:21 PM

#

this should not be a problem

#

dont they run it 10 times?

ocean vortex Aug 24, 2025, 6:21 PM

#

I think it's more model specific. As in 1 model out of 10

ashen plaza Aug 24, 2025, 6:22 PM

#

So what's the chances video generation ever comes to the main website?

ocean vortex Aug 24, 2025, 6:23 PM

#

So you wouldn't immediately catch this when testing your eval scripts

keen beacon Aug 24, 2025, 6:23 PM

#

ocean vortex So you wouldn't immediately catch this when testing your eval scripts

i only check the scores if they dont line up with the ai lab, so i probably would catch it (if it doesn't match)

ocean vortex Aug 24, 2025, 6:24 PM

#

keen beacon i only check the scores if they dont line up with the ai lab, so i probably woul...

Have you tested 10+ different models on AIME25? probably not...

keen beacon Aug 24, 2025, 6:27 PM

#

ocean vortex Have you tested 10+ different models on AIME25? probably not...

far less than artificialanalysis for sure, but i've run math benchmarks like that on way more models than 10

#

artificialanalysis does 10 repeats for aime, etc., and tests way more different models. so a chance thing (for different models) is possible, but i dont find it that likely personally
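
A rough sketch of what averaging over repeats looks like, assuming each repeat is a list of per-question pass/fail results. The function name and the numbers are made up for illustration and are not anyone's real scores or pipeline.

```python
from statistics import mean
from typing import List, Tuple


def summarize_repeats(runs: List[List[bool]]) -> Tuple[List[float], float]:
    """Return (accuracy of each repeat, mean accuracy across repeats)."""
    per_run = [sum(run) / len(run) for run in runs]
    return per_run, mean(per_run)


# Hypothetical data: 3 repeats of a 4-question set.
runs = [
    [True, True, False, True],
    [True, True, True, True],
    [True, False, False, True],
]
per_run, overall = summarize_repeats(runs)
print(per_run)
print(overall)
```

Looking at the per-repeat list next to the mean helps tell a one-off unlucky run apart from a model that scores low on every repeat, which would point more toward a systematic issue such as extraction.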

keen beacon Aug 24, 2025, 6:31 PM

#

ocean vortex Have you tested 10+ different models on AIME25? probably not...

this was such a pointless debate, sorry i got heated lol

subtle frost Aug 24, 2025, 6:40 PM

#

What's the best ai rn for deep research

For a question like "Write the expected salary of an orthopedic surgeon from starting surgery to retirement"

trim trail Aug 24, 2025, 6:40 PM

#

#dancing

WhatsApp_Image_2025-08-14_at_5.00.23_PM.jpeg

pure comet Aug 24, 2025, 6:44 PM

#

trim trail #dancing

they are not dancing

ocean vortex Aug 24, 2025, 6:54 PM

#

It's interesting that glm4.5 and glm4.5v both got similarly low scores (so likely the same issue for both). Also a look at non-reasoning gpt model progression here cause why not...

keen beacon Aug 24, 2025, 6:54 PM

#

pure comet they are not dancing

lol

serene mango Aug 24, 2025, 7:09 PM

#

Is there no sound on the generated videos?

white hatch Aug 24, 2025, 7:16 PM

#

Why doesn't claude support images in lmarena?

rustic knot Aug 24, 2025, 7:17 PM

#

trim trail #dancing

did you think this was the r/mybfisai server for some reason?

storm needle Aug 24, 2025, 7:19 PM

#

hollow imp https://youtu.be/V-maA961SDE?feature=shared

"Get The Ultimate Prompt Book worth ₹5000 for free"

🤣

hollow imp Aug 24, 2025, 7:19 PM

#

storm needle "Get The Ultimate Prompt Book worth ₹5000 for free" 🤣

Bro he's such a big creator

#

Just guess how many people paid 12$ and got scammed

#

Mf is not even giving gpt 5 high or Claude 4 opus he is seriously giving gpt 5 chat

storm needle Aug 24, 2025, 7:25 PM

#

hollow imp Bro he's such a big creator

do people even still launch api scraper startups? do they actually find success anymore? it feels impossible unless you’ve landed some special discount deal with openai or anthropic. and if all they’re really doing is passing your data along, you might as well just use something like lmarena

hollow imp Aug 24, 2025, 7:26 PM

#

storm needle do people even still launch api scraper startups? do they actually find success ...

But he has his audience

#

He will succeed and scam millions

#

You can see in the comments non ai knowledge people surprised af

hollow imp Aug 24, 2025, 7:27 PM

#

storm needle do people even still launch api scraper startups? do they actually find success ...

Bro if even 1% of those viewers knew about lmarena aliens wouldn't think earth is unworthy

storm needle Aug 24, 2025, 7:30 PM

#

hollow imp He will succeed and scam millions

wouldn't say it's a scam so much as a waste of your money

storm needle Aug 24, 2025, 7:30 PM

#

hollow imp Bro if even 1% of those viewers knew about lmarena aliens wouldn't think earth i...

we would have big problems

hollow imp Aug 24, 2025, 7:30 PM

#

storm needle wouldn't say it's a scam so much as a waste of your money

Bro you are paying $12 for GPT5CHAT

#

GPT 5 CHAT

#

CHAT

jade egret Aug 24, 2025, 7:36 PM

#

fiery lagoon bro ai sucks

then leave

raw grove Aug 24, 2025, 8:06 PM

#

do i own the commercial licence to stuff I generate using LM Arena? in particular, generations by Qwen?

echo aurora Aug 24, 2025, 8:08 PM

#

raw grove do i own the commercial licence to stuff I generate using LM Arena? in particula...

Our site's terms of service will have your answer

raw grove Aug 24, 2025, 8:09 PM

#

I looked for that but could not find it? would you please send a link?

proud oar Aug 24, 2025, 8:50 PM

#

raw grove I looked for that but could not find it? would you please send a link?

Here is a snippet

#

It's basically a "you own your stuff but we can profit from your work and there's nothing you can do about it"

#

This looks like something out of nexon's terms of service

#

muugwu

raw grove Aug 24, 2025, 8:59 PM

#

ok thanks

balmy mist Aug 24, 2025, 9:09 PM

#

have yall used the new claude sub agents?

toxic quiver Aug 24, 2025, 10:02 PM

#

Hello

#

Everyone

white hatch Aug 24, 2025, 10:10 PM

#

yo

toxic quiver Aug 24, 2025, 10:13 PM

#

How are you

golden ocean Aug 24, 2025, 10:28 PM

#

fiery lagoon bro ai sucks

keen beacon Aug 24, 2025, 10:30 PM

#

Why didn't nobody tell me there's a Google insider in this chat

#

💀

analog ridge Aug 24, 2025, 10:50 PM

#

Do the most advanced image and video generation AIs have a true understanding of depth and spatial distance? For example, can they accurately interpret a prompt like: The character is standing 50 meters away, facing the camera?

keen beacon Aug 24, 2025, 11:01 PM

#

analog ridge Do the most advanced image and video generation AIs have a true understanding of...

You can try at LMArena

golden ocean Aug 24, 2025, 11:03 PM

#

analog ridge Do the most advanced image and video generation AIs have a true understanding of...

no

#

some can guess. if u give common numbers like 1, 100, 1000 or 1000000 they'll know u mean small, medium or large distance, but not even close to accurate

surreal creek Aug 24, 2025, 11:15 PM

#

fiery lagoon opus 4.1 on lmarena sucks

top 10 model in almost every category, lol

fiery lagoon Aug 25, 2025, 12:05 AM

#

surreal creek top 10 model in almost every category, lol

on lmarena broo

plain carbon Aug 25, 2025, 12:20 AM

#

I never expected this to be a problem prompt, but both AIs are taking forever. It might be faster to ask here:

I am having issues with CSS. My first question is, how do I center blocks of text? If I have a paragraph being displayed with the proper width, but along the left edge, what do I need to add to get it centered?

#

(yep: "Something went wrong while generating the response. Please try again.")

wary pagoda Aug 25, 2025, 12:51 AM

#

Hallow

floral ginkgo Aug 25, 2025, 1:10 AM

#

Hey everyone

solid snow Aug 25, 2025, 1:16 AM

#

hello just arrived here!

echo aurora Aug 25, 2025, 1:52 AM

#

welcome!

patent bear Aug 25, 2025, 2:27 AM

#

wassup

sly estuary Aug 25, 2025, 2:50 AM

#

fix ... pls

grave burrow Aug 25, 2025, 2:58 AM

#

I have access to the following models :
Gemini 2.5 Pro
GPT-5 Thinking
Claude Sonnet 4.0 Thinking
Grok 4
o3

Which one would be the best for casual research work of LLM interpretability in Python?

broken coyote Aug 25, 2025, 3:18 AM

#

Gpt 5 think

simple elm Aug 25, 2025, 3:39 AM

#

hi

echo aurora Aug 25, 2025, 3:40 AM

#

simple elm hi

hello ablobwave

solid brook Aug 25, 2025, 4:15 AM

#

keen beacon Why didn't nobody tell me there's a Google insider in this chat

who?

high yacht Aug 25, 2025, 5:02 AM

#

Hey what's up? wave_animated

brave orbit Aug 25, 2025, 5:14 AM

#

Poll: Whats The Best AI In every task

Winning answer: GPT 5 high (19 of 28 votes)

simple elm Aug 25, 2025, 5:55 AM

#

hello 👀

earnest rover Aug 25, 2025, 5:57 AM

#

@echo aurora Hey bro, I’ve been trying to find a complete list of your projects but couldn’t locate one anywhere. I’m also curious—are there any new projects in the works? And what happened to the Chatbot Arena web app? I remember it used to be a great platform.

left iron Aug 25, 2025, 6:19 AM

#

hello

mild pebble Aug 25, 2025, 6:43 AM

#

👋

astral eagle Aug 25, 2025, 6:46 AM

#

hello

surreal creek Aug 25, 2025, 7:32 AM

#

fiery lagoon on lmarena broo

do u understand what you’re saying

#

it’s ranked in all those categories from users responding to it on LMArena 😂

fossil mantle Aug 25, 2025, 7:47 AM

#

hello

whole wagon Aug 25, 2025, 8:35 AM

#

LM arena never works for me these days

#

Site finally loaded kek

marble crypt Aug 25, 2025, 8:37 AM

#

here to generate amazing videos

keen beacon Aug 25, 2025, 8:41 AM

#

solid brook who?

Brian

toxic cypress Aug 25, 2025, 8:46 AM

#

compare AIs

whole wagon Aug 25, 2025, 9:12 AM

#

I heard Gemini 3 is delayed. Lol

devout vault Aug 25, 2025, 9:22 AM

#

whole wagon I heard Gemini 3 is delayed. Lol

source?

obtuse heart Aug 25, 2025, 9:39 AM

#

whole wagon I heard Gemini 3 is delayed. Lol

source?

whole wagon Aug 25, 2025, 9:42 AM

#

bruh polymarket is not working properly

#

gives this for many bets

#

including the gemini 3 one 🙁

surreal creek Aug 25, 2025, 9:48 AM

#

This is normally the time they do server maintenance

obtuse heart Aug 25, 2025, 9:53 AM

#

obtuse heart source?

no source

ocean vortex Aug 25, 2025, 10:07 AM

#

earnest rover @echo aurora Hey bro, I’ve been trying to find a complete list of your ...

https://lmarena.ai
https://web.lmarena.ai

ocean vortex Aug 25, 2025, 10:09 AM

#

grave burrow I have access to the following models : Gemini 2.5 Pro GPT-5 Thinking Claude So...

chatgpt deep research and search with gpt5-thinking

dusk lantern Aug 25, 2025, 10:09 AM

#

hi

solid brook Aug 25, 2025, 10:17 AM

#

hmmm

#

i think this week is the week

#

we might get banana

full idol Aug 25, 2025, 10:18 AM

#

whole wagon including the gemini 3 one 🙁

do you know betmoar?

#

it's an alternative app for polymarket

#

https://www.betmoar.fun/market/will-gemini-3pt0-be-released-by-august-31-482

dense sphinx Aug 25, 2025, 10:20 AM

#

What does the diffbot-small-xl model do?

#

Does anyone have a question?

whole wagon Aug 25, 2025, 10:23 AM

#

Bruh

#

How tf am I already too late

#

The odds somehow crashed already for before October 31

sly estuary Aug 25, 2025, 10:24 AM

#

pls fix: "Something went wrong with this response, please try again."

solid brook Aug 25, 2025, 10:25 AM

#

whole wagon Bruh

google is on my nerves

#

they don't release gemini 3

#

and

#

the current gemini 2.5 is dogsht

solid brook Aug 25, 2025, 10:26 AM

#

sly estuary pls fix: "Something went wrong with this response, please try again."

refresh the page

ocean vortex Aug 25, 2025, 10:27 AM

#

solid brook the current gemini 2.5 is dogsht

lol it absolutely isn't though

whole wagon Aug 25, 2025, 10:27 AM

#

solid brook they don't release gemini 3

Well grok 5 comes in December anyways if they don't release

ocean vortex Aug 25, 2025, 10:27 AM

#

it's still competing with the best

whole wagon Aug 25, 2025, 10:28 AM

#

So they have to before then

solid brook Aug 25, 2025, 10:28 AM

#

ocean vortex lol it absolutely isn't though

OH you mean lmarena leaderboard... hahaha funny

ocean vortex Aug 25, 2025, 10:28 AM

#

gpt5 only narrowly beats it

ocean vortex Aug 25, 2025, 10:28 AM

#

solid brook OH you mean lmarena leaderboard... hahaha funny

No I actually didn't

#

I mean the performance of it in general on various things

solid brook Aug 25, 2025, 10:28 AM

#

ocean vortex gpt5 only narrowly beats it

do yourself a favor go on reddit r/bard see the situation there

ocean vortex Aug 25, 2025, 10:29 AM

#

solid brook do yourself a favor go on reddit r/bard see the situation there

lmao what

whole wagon Aug 25, 2025, 10:29 AM

#

It's 2nd best lol

ocean vortex Aug 25, 2025, 10:29 AM

#

you need to be reading reddit less

solid brook Aug 25, 2025, 10:29 AM

#

ocean vortex you need to be reading reddit less

I mean i use the model

#

i used both gemini 2.5 pro and gpt 5

ocean vortex Aug 25, 2025, 10:30 AM

#

solid brook I mean i use the model

then why do you complain? Which model other than gpt5 is better than 2.5Pro?

solid brook Aug 25, 2025, 10:30 AM

#

ocean vortex then why do you complain? Which model other than gpt5 is better than 2.5Pro?

claude?

whole wagon Aug 25, 2025, 10:30 AM

#

On average

ocean vortex Aug 25, 2025, 10:30 AM

#

Claude is worse

#

in nearly every way

torn mantle Aug 25, 2025, 10:30 AM

#

there will be no agi or asi with the current methods

#

what happened to ilya one shot to asi

#

and mira

whole wagon Aug 25, 2025, 10:31 AM

#

Unless agi is already here and it's just not worth much

solid brook Aug 25, 2025, 10:31 AM

#

torn mantle there will be no agi or asi with the current methods

humans will always find a way

torn mantle Aug 25, 2025, 10:31 AM

#

solid brook humans will always find a way

did you?

#

did you yet

whole wagon Aug 25, 2025, 10:32 AM

#

Average human reads at 5th grade level or smth you know. It may not be useful to have just agi

solid brook Aug 25, 2025, 10:32 AM

#

did what?