#general | Arena | Page 107

quartz pike Aug 28, 2025, 11:37 AM

#

but i was asking for like an ACTUAL ai image model that exists on lmarena

#

im stoopit

#

and lazy

#

so me no do dat

wild sapphire Aug 28, 2025, 11:42 AM

#

this is going wild now

#

only limit is imagination

mental scaffold Aug 28, 2025, 11:43 AM

#

Hello, here to check out the videos

wild sapphire Aug 28, 2025, 11:44 AM

#

yeah and the wild imagination worldwide

quartz pike Aug 28, 2025, 11:47 AM

#

Chat is ts tuff?

#

Duh its ai generated. this is lmarena 😭

#

correct. but personally i find the second one a bit funny

#

Bro just casually lets go

#

and f*cking dies 😭

#

lol

#

btw can u vote

#

i wanna see what the models are

#

that made it

fervent pebble Aug 28, 2025, 11:52 AM

#

Hello. I'm here because i'm excited at how fast Ai Image generation is evolving and looking forward to seeing how it develops.

quartz pike Aug 28, 2025, 11:54 AM

#

😭

lunar jay Aug 28, 2025, 11:57 AM

#

alpine coral Aug 28, 2025, 12:06 PM

#

nice one - that does suggest some kind of RAG/internet access almost surely

#

is nightride a strong model otherwise? like aside from up to date info

#

@echo aurora i had to scroll through so many images and greetings before finding discussion about models in the arena (which is what i liked about the server in the past).. perhaps there's a way to separate out general chatter from discussion / speculation about anon models (the juicy stuff ha)

neon idol Aug 28, 2025, 12:25 PM

#

#

Sorry for the image in italian just use google lens for translate

solid brook Aug 28, 2025, 12:28 PM

#

neon idol

intresting

#

ask for knowledge cutoff this time

neon idol Aug 28, 2025, 12:29 PM

#

solid brook ask for knowledge cutoff this time

Ok

ripe mountain Aug 28, 2025, 12:30 PM

#

Poll: SOTA. Winning answer: GPT-5 (option 1), 9 of 17 votes.

#

Poll: SOTA Open-Source/Open-Weight. Winning answer: Qwen3 235B 2507 (Reasoning) (option 1), 7 of 12 votes.

keen fulcrum Aug 28, 2025, 12:33 PM

#

Poll: Grok 5 before Gemini 3. Winning answer: No (option 2), 19 of 21 votes.

neon idol Aug 28, 2025, 12:34 PM

#

solid brook ask for knowledge cutoff this time

verbal nimbus Aug 28, 2025, 12:54 PM

#

Poll: Is Gemini 3 already on LMArena under an anonymous name? Winning answer: No (option 2), 13 of 13 votes.

ocean vortex Aug 28, 2025, 12:57 PM

#

ripe mountain

Huh it's still R1 (newest ver)

verbal nimbus Aug 28, 2025, 12:57 PM

#

Poll: When will Gemini 3.0 be released? Winning answer: September (option 1), 6 of 14 votes.

ocean vortex Aug 28, 2025, 12:57 PM

#

But ig it's old and people are bored wanting smth new lol

#

Deepseek V3.1 was a non-release for the most part. You can now have no reasoning or reasoning that almost matches R1 within the same model. Ok great, moving on...

rustic knot Aug 28, 2025, 1:02 PM

#

ocean vortex Huh it's still R1 (newest ver)

lmao rekt, what i do know is Qwen 3 has overtaken R1 in math, other fields idrk

ocean vortex Aug 28, 2025, 1:03 PM

#

rustic knot lmao rekt, what i do know is Qwen 3 has overtaken R1 in math, other fields idrk

Actually looked at this and dunno anymore. I disagree with this 😠

#

I suppose qwen3 performs, but my IRL experience was somewhat differing from this

rustic knot Aug 28, 2025, 1:04 PM

#

ocean vortex Actually looked at this and dunno anymore. I disagree with this 😠

Qwen 3 is better at math and competitive programming

#

that's why its AA score is better

#

but ngl they did r1 dirty by that aime 2025

ocean vortex Aug 28, 2025, 1:05 PM

#

Qwen3 seemed to me like it's much easier to break and not nearly as reliable as R1

rustic knot Aug 28, 2025, 1:06 PM

#

another thing to note is that it seems like ds keeps trying to use grpo while Qwen switched to gspo

worthy sleet Aug 28, 2025, 1:07 PM

#

I need nano banana and I can use it for free from google aistudio. I can also do it from lmarena directchat. Is it preferable for lmarena doing it there because they can use those prompt or is it more of an expense for them?

ocean vortex Aug 28, 2025, 1:09 PM

#

rustic knot another thing to note is that it seems like ds keeps trying to use grpo while Qw...

R1 is just reliable though and shines in several metrics on unseen data (new benchmarks), whereas Qwen can quickly fall apart and seems more benchmaxxed tbh

#

Also smaller

rustic knot Aug 28, 2025, 1:09 PM

#

which new benchmarks?

ocean vortex Aug 28, 2025, 1:09 PM

#

uh there was creative writing one

rustic knot Aug 28, 2025, 1:09 PM

#

ocean vortex R1 is just reliable though and shines in several metrics on unseen data (new ben...

this is also a new bench lol

Screenshot_2025-08-23-21-41-39-778_com.discord-edit.jpg

ocean vortex Aug 28, 2025, 1:10 PM

#

that became a thing after that model was released

rustic knot Aug 28, 2025, 1:10 PM

#

ocean vortex uh there was creative writing one

kimi does the best.on that and it doesn't even use test-time compute

ocean vortex Aug 28, 2025, 1:10 PM

#

things like this: https://www.reddit.com/r/LocalLLaMA/comments/1ieooqe/deepseek_r1_takes_1_overall_on_a_creative_short/

From the LocalLLaMA community on Reddit: DeepSeek R1 takes #1 overa...

rustic knot Aug 28, 2025, 1:11 PM

#

that seems like from a long time ago

ocean vortex Aug 28, 2025, 1:11 PM

#

Like it really seems that model had good fine-tuning. It is performing on things they didn't explicitly train for

ocean vortex Aug 28, 2025, 1:11 PM

#

rustic knot that seems like from a long time ago

Yeah but still kinda relevant

rustic knot Aug 28, 2025, 1:12 PM

#

like kimi overtook everyone on the eq creative bench

#

it's basically the best writer, even gpt 6 (underscore) agrees

ocean vortex Aug 28, 2025, 1:12 PM

#

rustic knot like kimi overtook everyone on the eq creative bench

As it should, it's the biggest of those 3

rustic knot Aug 28, 2025, 1:14 PM

#

there seems to be a tradeoff b/w what certain RL algorithms are good for, GRPO might be better for writing where the rewards are uncertain while GSPO is better when there is only 1 right answer

ocean vortex Aug 28, 2025, 1:14 PM

#

They can't do RL training (reasoning) though it seems 🗿

#

kimi has a lot of potential for it

rustic knot Aug 28, 2025, 1:14 PM

#

have u seen the GSPO paper?

keen beacon Aug 28, 2025, 1:14 PM

#

gspo is just better overall

#

vs grpo

rustic knot Aug 28, 2025, 1:16 PM

#

in grpo, there is a higher variance for answers which might be better for writing purposes. Trentk used it to turn a small Qwen model into mechahitler

#

more creative yk

keen beacon Aug 28, 2025, 1:17 PM

#

not too sure about that but in my experience gspo is just better

rustic knot Aug 28, 2025, 1:17 PM

#

cuz ur probably using it in technical cases where there is only 1 right answer for the most part
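For readers following the GRPO vs GSPO exchange above, here is a minimal sketch of where the two differ, as far as I understand the published descriptions: both standardize rewards within a group of sampled answers, but GRPO applies an importance ratio per token while GSPO uses one length-normalized ratio per whole response. All function names and numbers below are hypothetical and only illustrate the idea; this is not either lab's training code.

```python
import math
from statistics import mean, pstdev


def group_advantages(rewards):
    """Standardize each reward against the other answers in the same group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All answers scored the same, so there is no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


def token_ratios(logp_new, logp_old):
    """GRPO-style importance ratios: one ratio per token."""
    return [math.exp(n - o) for n, o in zip(logp_new, logp_old)]


def sequence_ratio(logp_new, logp_old):
    """GSPO-style importance ratio: one per response.

    This is the geometric mean of the token ratios, i.e. the sequence
    likelihood ratio raised to the power 1/length.
    """
    diffs = [n - o for n, o in zip(logp_new, logp_old)]
    return math.exp(sum(diffs) / len(diffs))


def clip(ratio, eps=0.2):
    """Clip a ratio to [1 - eps, 1 + eps]; eps here is an illustrative value."""
    return max(1.0 - eps, min(1.0 + eps, ratio))


# Hypothetical group of 4 sampled answers with 0/1 rewards.
rewards = [1.0, 0.0, 0.0, 1.0]
# Hypothetical per-token log-probs for one 3-token answer.
logp_old = [-1.0, -2.0, -0.5]
logp_new = [-0.2, -2.9, -0.6]

print("advantages:", group_advantages(rewards))
print("token-level ratios:", token_ratios(logp_new, logp_old))
print("sequence-level ratio:", sequence_ratio(logp_new, logp_old))
```

In this toy case the per-token ratios swing well above and below 1 while the single sequence-level ratio stays close to 1, which is one way to picture the "higher variance" point made in the chat.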

#

@ocean vortex you gone buddy? xD

leaden sun Aug 28, 2025, 1:25 PM

#

rustic knot this is also a new bench lol

Anthropic is losing the game it seems, they're not racing to AGI, but to establish as a coding product company?

white hatch Aug 28, 2025, 1:25 PM

#

will there be added support for uploading files other than images?

ocean vortex Aug 28, 2025, 1:27 PM

#

rustic knot @ocean vortex you gone buddy? xD

Nah, why?

#

No opinion on this. I'm more interested in how it turns out in practice. But haven't looked into technical details of this enough to comment tbh

#

Plus I would say it's a bad idea to generalize like this. I think there's more than 1 way to train a good performing model

keen beacon Aug 28, 2025, 1:35 PM

#

rustic knot there seems to be a tradeoff b/w what certain RL algorithms are good for, GRPO m...

Sure.

#

Until you remember that GPT-5 is the best model overall, that is, most likely to give a good answer to ANY prompt.

#

Regarding if Qwen is benchmaxxed or not - we quite literally do not have appropriate benchmarks to figure it out

#

In psychometrics for example, tests are created to distinguish between different broad abilities, which are measured with different subtests

#

Let's say that we did an exploratory factor analysis on an LLM and figured out that it has a strength in a broad factor that is responsible for creative writing and everything else this broad ability accounts for

#

So it's going to be great at benchmarks that measure this broad ability

#

And if a model somehow does badly on all benchmarks at once, except the ones it was measured on, that is evidence of benchmaxxing

#

But we do not have extensive psychometrically valid benchmarks for LLM so far, so...

summer cove Aug 28, 2025, 1:47 PM

#

hi

leaden sun Aug 28, 2025, 1:50 PM

#

keen beacon But we do not have extensive psychometrically valid benchmarks for LLM so far, s...

Those psychometrics are just theoretical frameworks that don’t necessarily reflect reality and can be easily gamed. Top consulting firms use psychometrics for their assessments and people know very well how to game the system; the same will be true for LLMs

keen beacon Aug 28, 2025, 1:51 PM

#

leaden sun Those psychometrics are just theoretical frameworks that don’t necessarily reflec...

As a person with some background in quantitative psychology, nope.

leaden sun Aug 28, 2025, 1:52 PM

#

Ok, that speaks volumes about your background

tawny kite Aug 28, 2025, 1:54 PM

#

HEY EVERYONE, I AM NEW HERE,

sullen fern Aug 28, 2025, 1:59 PM

#

hi

languid crescent Aug 28, 2025, 2:06 PM

#

mah chat history is gone 😭

fervent tangle Aug 28, 2025, 2:07 PM

#

GUYS

#

GOOGLE RELEASED NANO BANANA

#

ITS GEMINI 2.5 FLASH

#

on google ai studio

keen beacon Aug 28, 2025, 2:08 PM

#

OMG NANO BANANA

sullen fern Aug 28, 2025, 2:08 PM

#

I have a question

fervent tangle Aug 28, 2025, 2:09 PM

#

finally imma use nano banana for free

sullen fern Aug 28, 2025, 2:09 PM

#

How can i change the aspect ratio on lmarena ?

fervent tangle Aug 28, 2025, 2:09 PM

#

on google ai studio

fervent tangle Aug 28, 2025, 2:09 PM

#

sullen fern How can i change the aspect ratio on lmarena ?

u cant

keen beacon Aug 28, 2025, 2:09 PM

#

its also available on lmarena direct chat btw if u run into ur limits there

sullen fern Aug 28, 2025, 2:09 PM

#

damn i wanted to make a horizontal image

fervent tangle Aug 28, 2025, 2:12 PM

#

keen beacon its also available on lmarena direct chat btw if u run into ur limits there

wydm

keen beacon Aug 28, 2025, 2:12 PM

#

aistudio has generation limits for nano banana

fervent tangle Aug 28, 2025, 2:12 PM

#

keen beacon aistudio has generation limits for nano banana

on google ai studio its better, u can choose it right away and edit stuff with it

fervent tangle Aug 28, 2025, 2:12 PM

#

keen beacon aistudio has generation limits for nano banana

really

keen beacon Aug 28, 2025, 2:12 PM

#

yea u have X amount of requests

fervent tangle Aug 28, 2025, 2:13 PM

#

keen beacon yea u have X amount of requests

10 amount?

keen beacon Aug 28, 2025, 2:13 PM

#

im not sure about the specifics, they dont document it and it changes

fervent tangle Aug 28, 2025, 2:13 PM

#

keen beacon im not sure about the specifics, they dont document it and it changes

idk i been using it for a while now

fervent tangle Aug 28, 2025, 2:13 PM

#

keen beacon im not sure about the specifics, they dont document it and it changes

it also has limit on lmarena

keen beacon Aug 28, 2025, 2:13 PM

#

yea im just saying if u do reach the high limit

fervent tangle Aug 28, 2025, 2:14 PM

#

keen beacon yea im just saying if u do reach the high limit

i can just change my google account

keen beacon Aug 28, 2025, 2:14 PM

#

🤷 im just saying it is available on lmarena direct chat if you want. no login required there to get extra use

fervent tangle Aug 28, 2025, 2:14 PM

#

keen beacon 🤷 im just saying it is available on lmarena direct chat if you want. no login r...

yea ig

#

imma try it if i reach the limit

#

also did the quality drop?

#

after they released it?

#

or am i tripping

keen beacon Aug 28, 2025, 2:16 PM

#

personally i dont see much of a difference/if at all compared to when it was anon but i barely used it

#

some people have said it was nerfed though

fervent tangle Aug 28, 2025, 2:16 PM

#

keen beacon some people have said it was nerfed though

yea it used to be much clearer

south vigil Aug 28, 2025, 2:17 PM

#

What do you guys think is better at coding: GPT 5 high or claude opus 4.1/sonnet4

white hatch Aug 28, 2025, 2:18 PM

#

gpt for architecture, claude for coding

fervent tangle Aug 28, 2025, 2:18 PM

#

south vigil What do you guys think is better at coding: GPT 5 high or claude opus 4.1/sonnet...

4.1 opus for sure, but it costs alot

#

try 4 sonnet if u dont want alot of costs

south vigil Aug 28, 2025, 2:20 PM

#

claude opus 4.1 was unable to help yesterday fix a simple api concurrency limit issue, had to take over, and do it the old fashioned way. i use opus 4.1 often and it's decent but was thinking about switching to gpt5 high though because it's a better reasoning model.

fervent tangle Aug 28, 2025, 2:21 PM

#

south vigil claude opus 4.1 was unable to help yesterday fix a simple api concurrency limit ...

did gpt5 fix it for u?

south vigil Aug 28, 2025, 2:21 PM

#

didn't try, should've, i just fixed it myself, i'll see if gpt5 can try optimise it today

#

because claude opus 4.1 completely failed

high hound Aug 28, 2025, 2:23 PM

#

fervent tangle Aug 28, 2025, 2:23 PM

#

high hound

google ofc, easily

drifting crow Aug 28, 2025, 2:24 PM

#

Nvidia

sonic bear Aug 28, 2025, 2:28 PM

#

golden hour backlight,soft rim light on hair,clothes & subjects,lens flare sunlight,create shadow on ground

topaz bay Aug 28, 2025, 2:33 PM

#

high hound

wen it comes to text model only, then XAI,openai and Anthropic have a big chance

#

Only problem is that the west only does close sourced

wet pewter Aug 28, 2025, 2:35 PM

#

fervent tangle google ofc, easily

Google farming your data through androweed

topaz bay Aug 28, 2025, 2:35 PM

#

I hope china keeps doing good open source, so that it's a fair market

topaz bay Aug 28, 2025, 2:35 PM

#

wet pewter Google farming your data through androweed

ai doesnt need data from people anymore, it self trains and does a feedback loop, only picture and video needs new data, that's why youtube is so good at it

fervent tangle Aug 28, 2025, 2:56 PM

#

wet pewter Google farming your data through androweed

google already has everyone's data lmfao

#

thats how they made their AI models good

quartz pike Aug 28, 2025, 2:59 PM

#

tuff or nah. vote ples

#

yall just asking. are the devs planning to add video arena to the website?

stiff blaze Aug 28, 2025, 3:05 PM

#

Make him run like super mario 3

fervent tangle Aug 28, 2025, 3:07 PM

#

quartz pike

brodie i cant vote its fowarded

quartz pike Aug 28, 2025, 3:08 PM

#

fervent tangle brodie i cant vote its fowarded

click the thing at the bottom

#

to go to le channel

neon musk Aug 28, 2025, 3:08 PM

#

How generate image to video on this LMARENA to ratio 9:16 or portrait??

quartz pike Aug 28, 2025, 3:08 PM

#

so you can vote

fervent tangle Aug 28, 2025, 3:08 PM

#

quartz pike click the thing at the bottom

nvm i did

#

the right one is veo3

#

but it doesnt have audio..

quartz pike Aug 28, 2025, 3:08 PM

#

oh?

fervent tangle Aug 28, 2025, 3:09 PM

#

quartz pike oh?

tbh veo 3 looks good

quartz pike Aug 28, 2025, 3:10 PM

#

dis one is even better but idk wtf is going on with the left one

#

The ai on the left crashed the f out

exotic tartan Aug 28, 2025, 3:15 PM

#

why do i see only video-arena-1? aren't there supposed to be more?

grand valve Aug 28, 2025, 3:24 PM

#

hello friends....how to use google flash 2.5 i.e. nano banana

solid brook Aug 28, 2025, 3:26 PM

#

grand valve hello friends....how to use google flash 2.5 i.e. nano banana

gemini app lmarena google ai studio

keen beacon Aug 28, 2025, 3:28 PM

#

https://www.lesswrong.com/posts/aFW63qvHxDxg3J8ks/nobody-is-doing-ai-benchmarking-right

Nobody is Doing AI Benchmarking Right — LessWrong

By Chapin Lenthall-Cleary and Cole Gaboriault • …

fervent tangle Aug 28, 2025, 3:40 PM

#

grand valve hello friends....how to use google flash 2.5 i.e. nano banana

google ai studio

keen beacon Aug 28, 2025, 3:46 PM

#

keen beacon https://www.lesswrong.com/posts/aFW63qvHxDxg3J8ks/nobody-is-doing-ai-benchmarkin...

Another interesting writeup from these guys:
https://www.lesswrong.com/posts/SJARqiCTcqyGPSKJw/anthropic-is-going-all-in-on-ability-without-intelligence

Anthropic Is Going All In On Ability Without Intelligence? — Less...

Consider everything in this post speculative. I intend to provide updates once I have data from more models, more robust Starburst performance data (…

#

In short, while Claude models seem to be great at coding, they severely lack in GENERAL ability

sacred wharf Aug 28, 2025, 3:49 PM

#

hi cant wait to see whats posssible here

keen beacon Aug 28, 2025, 3:49 PM

#

keen beacon In short, while Claude models seem to be great at coding, they severely lack in ...

They are great at coding - but when it comes to anything but coding, they are suddenly so behind the frontier it is even difficult to consider them a frontier model anymore

#

I've written before about METR's time horizon benchmark. While I consider it a valuable benchmark, it doesn't measure exactly what it's trying to. In order to only measure a model's time horizon, a benchmark would need to only vary the task length. Instead, the short tasks tend to be easy and not specialized-knowledge-dependent (i.e. doing a web search), whereas the long ones tend to require far greater specialized knowledge and intelligence/problem-solving (i.e. ML coding tasks). So it winds up measuring an amalgamation of time horizon, coding ability, ML knowledge, problem-solving, etc. Very roughly speaking, it's a decent benchmark of (partly narrow) abilities useful for AI automation of AI progress.

quartz pike Aug 28, 2025, 3:59 PM

#

Hey anyone here know code?

#

Cause just asking. Wich assistant did better since i dont understand code:

#

Assistant a:

📎 message.txt

#

Assistant B:

📎 message.txt

keen beacon Aug 28, 2025, 4:04 PM

#

quartz pike Cause just asking. Wich assistant did better since i dont understand code:

Bruh 💀

quartz pike Aug 28, 2025, 4:05 PM

#

?

#

im a dummy

#

me no know

#

or how to put code somewhere and start it

astral kayak Aug 28, 2025, 4:09 PM

#

hello

keen beacon Aug 28, 2025, 4:10 PM

#

quartz pike or how to put code somewhere and start it

Did you know that you could just ask GPT-5 about it...

echo aurora Aug 28, 2025, 4:11 PM

#

alpine coral @echo aurora i had to scroll through so many images and greetings before f...

This is a bit tricky as creating new channels doesn't always lead to community members using them for the topic of the channel, we're currently experiencing this in the other existing channels. What I've seen other communities do, and what I'm encouraging everyone here to do, is if you'd like to have a conversation stay on a specific topic -> create a thread for it. It won't be great for new people to join in on that conversation (as it'll get pushed up and off the general text area), but it does establish a more "private" place for members to discuss something specific.

quartz pike Aug 28, 2025, 4:15 PM

#

keen beacon Did you know that you could just ask GPT-5 about it...

Ik i can just ask gpt 5. its a great coding model and at anything. But... I wanna see how other ais compare

#

oh wait im a dumbfuc

#

i just realised what you meant

#

ima just shut tf up 😭

keen beacon Aug 28, 2025, 4:16 PM

#

Bro 💀

quartz pike Aug 28, 2025, 4:16 PM

#

Chill english aint my first language 😭

#

im greek

#

Ελλαδα!!!!

#

-# that says greece in greek

feral python Aug 28, 2025, 4:19 PM

#

aaaaaaaaaaaaaa

tribal peak Aug 28, 2025, 4:19 PM

#

Hello

torn mantle Aug 28, 2025, 4:21 PM

#

feral python aaaaaaaaaaaaaa

AAAAAAAAAAAAAAAAAAAAAH

#

WAAAAAAAAAAAAAH

#

😱

feral python Aug 28, 2025, 4:21 PM

#

torn mantle WAAAAAAAAAAAAAH

NAAAAAAAAAAAAAAAAAA

feral python Aug 28, 2025, 4:22 PM

#

tribal peak Hello

Hi

keen beacon Aug 28, 2025, 4:25 PM

#

quartz pike Chill english aint my first language 😭

Neither is mine

quartz pike Aug 28, 2025, 4:25 PM

#

cool

unborn spoke Aug 28, 2025, 4:36 PM

#

Hello this Forum is so cool

echo aurora Aug 28, 2025, 4:44 PM

#

unborn spoke Hello this Forum is so cool

welcome welcome 👋

zealous sparrow Aug 28, 2025, 5:11 PM

#

MAI-1 is the stupidest model name i heard

humble sonnet Aug 28, 2025, 5:11 PM

#

What is mai-1 ?

zealous sparrow Aug 28, 2025, 5:11 PM

#

Medium Artificial Intelligence?? I dont know the short say.

keen beacon Aug 28, 2025, 5:12 PM

#

Lmaoo

zealous sparrow Aug 28, 2025, 5:12 PM

#

Nevermind its probably just Microsoft Artificial Intelligence..

rustic knot Aug 28, 2025, 5:12 PM

#

oai stream rn (it's incredibly bad)

ember whale Aug 28, 2025, 5:14 PM

#

hi

quartz pike Aug 28, 2025, 5:14 PM

#

Yall how good is mai

keen beacon Aug 28, 2025, 5:15 PM

#

quartz pike Yall how good is mai

No clue

quartz pike Aug 28, 2025, 5:15 PM

#

i expect it to be trash

#

since well... Microsoft

#

microsoft's models suck ass

zealous sparrow Aug 28, 2025, 5:15 PM

#

quartz pike Yall how good is mai

13 on text arena thats what we know rn

#

I will test webdev

#

prob goin to be so bad

quartz pike Aug 28, 2025, 5:15 PM

#

hmmm

#

how do you test webdev?

#

js asking

zealous sparrow Aug 28, 2025, 5:16 PM

#

personally use codepen

#

for html

#

too lazy to open vsc

quartz pike Aug 28, 2025, 5:16 PM

#

oh

#

me stupid

#

what is codepen

#

and how do i use it

zealous sparrow Aug 28, 2025, 5:16 PM

#

quartz pike what is codepen

online html editor

quartz pike Aug 28, 2025, 5:16 PM

#

And can you pls compare it to gpt 5 chat?

zealous sparrow Aug 28, 2025, 5:16 PM

#

quartz pike And can you pls compare it to gpt 5 chat?

it has no chance

quartz pike Aug 28, 2025, 5:16 PM

#

yes

#

but its worth it

#

lol

zealous sparrow Aug 28, 2025, 5:17 PM

#

it takes a while to generate a simple html game

hollow imp Aug 28, 2025, 5:17 PM

#

Gpt 5 high vs this new model

zealous sparrow Aug 28, 2025, 5:17 PM

#

hollow imp Gpt 5 high vs this new model

100-0

quartz pike Aug 28, 2025, 5:17 PM

#

hollow imp Gpt 5 high vs this new model

hell no 😭 it stands no chance

#

Maybe againt gpt 5 chat or nano

hollow imp Aug 28, 2025, 5:17 PM

#

So the new model is bs

#

I'm not even gonna try it

quartz pike Aug 28, 2025, 5:18 PM

#

but high there is no chance

hollow imp Aug 28, 2025, 5:18 PM

#

Then

quartz pike Aug 28, 2025, 5:18 PM

#

hollow imp So the new model is bs

we arent sure

#

But since its microsoft

#

its safe to assume its dogsh1t

zealous sparrow Aug 28, 2025, 5:18 PM

#

im doing the high vs mai comparison
Topic: Single html shooter

#

simple

hollow imp Aug 28, 2025, 5:18 PM

#

quartz pike But since its microsoft

Bro

#

They were using o1

zealous sparrow Aug 28, 2025, 5:18 PM

#

GPT 5 high is already done thinking and mai is still thinking

hollow imp Aug 28, 2025, 5:18 PM

#

And now they merged with github and released this sh it

quartz pike Aug 28, 2025, 5:19 PM

#

hollow imp They were using o1

bruh

#

no wonder it was ass

#

o1 is respectfully. DOGSH1T

zealous sparrow Aug 28, 2025, 5:19 PM

#

its obvious who is winning.

quartz pike Aug 28, 2025, 5:19 PM

#

left lol

hollow imp Aug 28, 2025, 5:19 PM

#

I'm saying before they released this

quartz pike Aug 28, 2025, 5:19 PM

#

mai1 thinks its einstein

hollow imp Aug 28, 2025, 5:20 PM

#

They were using o1

quartz pike Aug 28, 2025, 5:20 PM

#

thinking that much

civic flame Aug 28, 2025, 5:20 PM

#

mai-1 has been in development since early 2024 btw

#

☠️

zealous sparrow Aug 28, 2025, 5:20 PM

#

unless mai servers died

quartz pike Aug 28, 2025, 5:20 PM

#

DAMN

#

ALR ITS LOSING

#

EARLY 2024

#

THATS A 1 YEAR OLD MODEL

#

SONNET 3.5 RELEASED 1 YEAR AGO

zealous sparrow Aug 28, 2025, 5:20 PM

#

btw

hollow imp Aug 28, 2025, 5:20 PM

#

🤮

zealous sparrow Aug 28, 2025, 5:20 PM

#

mai was designed to be a gpt 4 competitor

civic flame Aug 28, 2025, 5:20 PM

#

4 turbo*

glacial mulch Aug 28, 2025, 5:20 PM

#

lmao what is mai

civic flame Aug 28, 2025, 5:20 PM

#

lol

keen beacon Aug 28, 2025, 5:21 PM

#

mid ai

balmy mist Aug 28, 2025, 5:21 PM

#

so is openai speech on seasame now?

zealous sparrow Aug 28, 2025, 5:21 PM

#

glacial mulch lmao what is mai

prob short for Microsoft Artificial Intelligence

keen beacon Aug 28, 2025, 5:21 PM

#

glacial mulch lmao what is mai

#announcements message

#

ah

#

you meant that

whole swallow Aug 28, 2025, 5:21 PM

#

zealous sparrow its obvious who is winning.

This is on web lmarena or normal one?

zealous sparrow Aug 28, 2025, 5:22 PM

#

whole swallow This is on web lmarena or normal one?

lmarena.ai

whole swallow Aug 28, 2025, 5:22 PM

#

Aight

zealous sparrow Aug 28, 2025, 5:23 PM

#

yeah mai breaks if you tell it to webdev

#

its a generating loophole

civic flame Aug 28, 2025, 5:24 PM

#

literally agi

zealous sparrow Aug 28, 2025, 5:24 PM

#

been like 5 min

#

if it ever finishes and the game is worse than GPT-5s

#

i mean i told it to make a cafe site

#

it finished...

#

i am not going to lie

#

the site aint half bad

#

It found images.

supple vector Aug 28, 2025, 5:30 PM

#

Lmarena free API when????

#

day one of asking for LMArena free ai api

zealous sparrow Aug 28, 2025, 5:36 PM

#

my thoughts on mai

#

the model is slow but it makes decent sites

#

if LMArena errors at you during gen you are in for a lot of waiting

#

LMarena redid its captcha?

cyan zodiac Aug 28, 2025, 5:44 PM

#

#

what could it possibly be writing

prime mulch Aug 28, 2025, 5:45 PM

#

zealous sparrow been like 5 min

What is the result bro

zealous sparrow Aug 28, 2025, 5:45 PM

#

prime mulch What is the result bro

cant quite host it so ill show in a few images

#

theres also a contact us section

#

the generation takes so long tho

#

im pretty sure the next prompt i gave it, which was add JS, has already been generating for 10 mins

#

i did say a lot of js

#

it errored 🙁

misty vault Aug 28, 2025, 5:53 PM

#

@deep adder mai-1-preview is sydney from mcdonalds

#

#

zealous sparrow Aug 28, 2025, 5:58 PM

#

mai-1 must have a very small context window

#

will error if you ask for too much

golden ocean Aug 28, 2025, 6:34 PM

#

misty vault

Omg its reaL

misty vault Aug 28, 2025, 6:34 PM

#

supple vector Aug 28, 2025, 6:42 PM

#

supple vector day one of asking for LMArena free ai api

@echo aurora 🍍 Mr pinapple I beg

keen beacon Aug 28, 2025, 6:46 PM

#

Guys

#

I just joined

#

What's up with the hype

keen beacon Aug 28, 2025, 6:47 PM

#

zealous sparrow Medium Artificial Intelligence?? I dont know the short say.

More like middling artificial intelligence hahaha lmao kill me pls

echo aurora Aug 28, 2025, 6:48 PM

#

supple vector @echo aurora 🍍 Mr pinapple I beg

Could happen, can't say though I'm aware of these plans

wintry tinsel Aug 28, 2025, 6:55 PM

#

keen beacon More like middling artificial intelligence hahaha lmao kill me pls

keen beacon Aug 28, 2025, 6:55 PM

#

wintry tinsel

what

wintry tinsel Aug 28, 2025, 6:55 PM

#

No context

zealous sparrow Aug 28, 2025, 6:56 PM

#

MAI is not that bad but also not that good

#

context window is prob small

#

will error if you are demanding

#

takes long to gen

#

[talkin webdev]

unborn ocean Aug 28, 2025, 6:58 PM

#

zealous sparrow context window is prob small

is currently labelled as "exp", so might just be the not fully finished post training

magic stag Aug 28, 2025, 7:01 PM

#

Who's the one with nailoong emojis?

magic stag Aug 28, 2025, 7:02 PM

#

magic stag Who's the one with nailoong emojis?

#

#announcements message

#

Saw this reaction in announcements

keen beacon Aug 28, 2025, 7:02 PM

#

What's up with oAI streaming?

ocean vortex Aug 28, 2025, 7:03 PM

#

keen beacon What's up with oAI streaming?

gpt4o got updated lmao

#

#

They refuse to let that name go

keen beacon Aug 28, 2025, 7:04 PM

#

why did they ever upgrade 4o when they have 5?

#

now we have to retest all these stupid benchmarks again

ocean vortex Aug 28, 2025, 7:05 PM

#

oh wait I can't read

#

it's a previous model

#

it's now called gpt-realtime huh

quartz pike Aug 28, 2025, 7:06 PM

#

mai*

ocean vortex Aug 28, 2025, 7:07 PM

#

keen beacon why did they ever upgrade 4o when they have 5?

4o the text model was not actually updated ever since before gpt5 release

#

And now they renamed gpt4o-realtime into gpt-realtime, which makes sense actually

vast fern Aug 28, 2025, 7:08 PM

#

ocean vortex And now they renamed gpt4o-realtime into gpt-realtime, which makes sense actuall...

what new features?

ocean vortex Aug 28, 2025, 7:09 PM

#

vast fern what new features?

Marginal update looks like. They just polished it and renamed I think

#

As for gpt4o text model... Current version of that is gpt5-chat. If they haven't renamed, it would be that.

quartz pike Aug 28, 2025, 7:11 PM

#

quartz pike mai*

any updates on mai?*

ocean vortex Aug 28, 2025, 7:11 PM

#

it already cannibalized gpt4.1 so they held on to that name too long as is tbh

vast fern Aug 28, 2025, 7:11 PM

#

quartz pike any updates on mai?*

it sucks

ocean vortex Aug 28, 2025, 7:11 PM

#

vast fern it sucks

"mai" ??

keen beacon Aug 28, 2025, 7:12 PM

#

microsoft ai 1 😂

quartz pike Aug 28, 2025, 7:12 PM

#

vast fern it sucks

oh lol

ocean vortex Aug 28, 2025, 7:12 PM

#

keen beacon microsoft ai 1 😂

wtf 🗿

ornate agate Aug 28, 2025, 7:14 PM

#

keen beacon microsoft ai 1 😂

I thought that was a r1 finetune? is this one actually their own model now?

keen beacon Aug 28, 2025, 7:14 PM

#

yes 🤣 🤣

ocean vortex Aug 28, 2025, 7:15 PM

#

What is it exactly though... is it like a huge model or renamed Phi... 🧐

keen beacon Aug 28, 2025, 7:18 PM

#

The Information had an article about it in May 2024

#

lol

ocean vortex Aug 28, 2025, 7:18 PM

#

Lack of any metrics at all is not very inspiring

ornate agate Aug 28, 2025, 7:19 PM

#

"MAI-1-preview is an in-house mixture-of-experts model, pre-trained and post-trained on ~15,000 NVIDIA H100 GPUs. " :\

#

it better be really good if it used 10x the GPUs of DeepSeek

ocean vortex Aug 28, 2025, 7:21 PM

#

ornate agate it better be really good if it used 10x the GPUs of DeepSeek

It probably isn't if they can't sell it to public convincingly lol

raven heath Aug 28, 2025, 7:22 PM

#

ocean vortex Aug 28, 2025, 7:23 PM

#

First prompt impression: it's thinking for ages and can decode things nowhere near as well as gpt5-mini

ornate agate Aug 28, 2025, 7:23 PM

#

ocean vortex First prompt impression: it's thinking for ages and can decode things nowhere n...

I asked it an aime question and its still "generating..."

raven heath Aug 28, 2025, 7:24 PM

#

ornate agate Aug 28, 2025, 7:24 PM

#

theres literally no way anyone's gonna click on random audios posted with no words

ocean vortex Aug 28, 2025, 7:26 PM

#

raven heath

go into voice channel

echo aurora Aug 28, 2025, 7:26 PM

#

ornate agate theres literally no way anyone's gonna click on random audios posted with no wor...

I do SCsigh

ornate agate Aug 28, 2025, 7:27 PM

#

echo aurora I do SCsigh

If I was a mod here I would run discord in a VM with all the weird stuff people spam onto this channel 😄

ocean vortex Aug 28, 2025, 7:27 PM

#

correct decoding:

#

sh'it decoding:

ornate agate Aug 28, 2025, 7:27 PM

#

it crashed on trying to do an aime question

#

lets try again

zinc ore Aug 28, 2025, 7:32 PM

#

Bruh 14 day ago poll

ocean vortex Aug 28, 2025, 7:32 PM

#

ornate agate lets try again

how did it go?

ornate agate Aug 28, 2025, 7:33 PM

#

ocean vortex how did it go?

generating...

solid brook Aug 28, 2025, 7:36 PM

#

bruh 2 of them really chose elon musk....

ornate agate Aug 28, 2025, 7:36 PM

#

ocean vortex sh'it decoding:

whats the text string you used as text? I wonder if a local model can do it

ocean vortex Aug 28, 2025, 7:37 PM

#

ornate agate whats the text string you used as text? I wonder if a local model can do it

PzA1MSBuYWh0IHJlbGxhbXMgdHViIDAyMSBuYWh0IHJlZ2dpYiBzaSBlcmF1cXMgZXNvaHcgcmVnZXRuaSB0c2VsbGFtcyBlaHQgc2kgdGFoVw==

wooden salmon Aug 28, 2025, 7:43 PM

#

Oh thx

ocean vortex Aug 28, 2025, 7:47 PM

#

MAI doesn't seem to actually be horrendous, but I really do not think it's gonna challenge the current best ones. Maybe og R1 level.

neon idol Aug 28, 2025, 7:48 PM

#

solid brook bruh 2 of them really chose elon musk....

I am one of these sigma 🗿🔥

gray junco Aug 28, 2025, 7:53 PM

#

hello

hardy lion Aug 28, 2025, 7:54 PM

#

ocean vortex MAI doesn't seem to actually be horrendous, but I really do not think it's gonna...

leaderboard has og r1 at 1395, mai-1-preview at 1402 and r1-0528 at 1417, so it sits pretty much between them
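
To put those three scores in perspective: if arena scores are read on the usual Elo scale (base 10, 400-point scale; that is an assumption here, not something stated in this chat), gaps of 7 to 22 points are close to a coin flip in head-to-head votes. A rough sketch, with `expected_win_rate` being an illustrative helper name:

```python
def expected_win_rate(score_a: float, score_b: float, scale: float = 400.0) -> float:
    """Standard Elo expected score of A against B (ties ignored).

    Assumes arena scores behave like Elo ratings; this is an assumption.
    """
    return 1.0 / (1.0 + 10.0 ** ((score_b - score_a) / scale))


# Scores as quoted in the message above.
scores = {"r1": 1395, "mai-1-preview": 1402, "r1-0528": 1417}

if __name__ == "__main__":
    mai = scores["mai-1-preview"]
    for name, score in scores.items():
        if name != "mai-1-preview":
            rate = expected_win_rate(mai, score)
            print(f"mai-1-preview vs {name}: {rate:.3f}")
```

Under that assumption mai-1-preview would win roughly 51% of votes against og r1 and roughly 48% against r1-0528, so the three are hard to tell apart from scores alone.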

ornate agate Aug 28, 2025, 8:03 PM

#

ocean vortex PzA1MSBuYWh0IHJlbGxhbXMgdHViIDAyMSBuYWh0IHJlZ2dpYiBzaSBlcmF1cXMgZXNvaHcgcmVnZXRu...

gpt-oss can decode this.

unborn ocean Aug 28, 2025, 8:06 PM

#

ocean vortex PzA1MSBuYWh0IHJlbGxhbXMgdHViIDAyMSBuYWh0IHJlZ2dpYiBzaSBlcmF1cXMgZXNvaHcgcmVnZXRu...

decoding is a really boring test honestly

#

only openai ai is good at it

#

does not really reflect much

ornate agate Aug 28, 2025, 8:06 PM

#

it seems that other AIs fail the base64 decode itself, for some reason

ocean vortex Aug 28, 2025, 8:06 PM

#

unborn ocean decoding is a really boring test honestly

Boring test is a good test. You can't overfit or contaminate for this

#

There are endless possible prompts

unborn ocean Aug 28, 2025, 8:07 PM

#

ocean vortex Boring test is a good test. You can't overfit or contaminate for this

you can obviously rl your model on this as a task in post training
openai might be doing this (based on their performance in the area)

#

very easy to implement

#

not overfit in the traditional sense, yet still optimised and not reflective of overall reasoning power

ocean vortex Aug 28, 2025, 8:08 PM

#

unborn ocean only openai ai is good at it

This one is one of the tougher ones. But in general other models can do well too

#

It's reversed and then encoded. So next level. But gpt5-mini can do it reliably so 👀
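
For reference, a "reversed and then encoded" prompt like the one posted above can be undone with two standard-library steps: base64-decode, then reverse the string. A minimal Python sketch (the function names are made up for illustration, and UTF-8 text is assumed):

```python
import base64

# The string posted earlier in this chat.
ENCODED = (
    "PzA1MSBuYWh0IHJlbGxhbXMgdHViIDAyMSBuYWh0IHJlZ2dpYiBzaSBlcmF1cXMg"
    "ZXNvaHcgcmVnZXRuaSB0c2VsbGFtcyBlaHQgc2kgdGFoVw=="
)


def decode_reversed_b64(encoded: str) -> str:
    """Undo 'reverse the text, then base64-encode it'."""
    reversed_text = base64.b64decode(encoded).decode("utf-8")
    return reversed_text[::-1]


def encode_reversed_b64(text: str) -> str:
    """Build a prompt of the same kind: reverse first, then base64-encode."""
    return base64.b64encode(text[::-1].encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    print(decode_reversed_b64(ENCODED))
```

This is also why tool access makes the test meaningless: with code execution the answer is two library calls, while without tools the model has to do the bit regrouping and the reversal in its own reasoning.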

unborn ocean Aug 28, 2025, 8:09 PM

#

in my experience (which admittedly is from a couple of months, if not half a year ago) openai is just stronger than expected on this

ocean vortex Aug 28, 2025, 8:10 PM

#

unborn ocean very easy to implement

Not really. This kinda converts to many other reasoning tasks too. But they didn't specifically do this for sure as with tools it will solve it in seconds

unborn ocean Aug 28, 2025, 8:10 PM

#

the same on coding competitions, which does not mean overfit in the traditional sense here either, it just means they have trained (rl'ed) a lot on the format

unborn ocean Aug 28, 2025, 8:10 PM

#

ocean vortex Not really. This kinda converts to many other reasoning tasks too. But they didn...

a year ago every lab was doing the same thing for basic number calculations (e.g. 1231*2333), so why should they not do that?

ocean vortex Aug 28, 2025, 8:11 PM

#

unborn ocean the same on coding competitions, which does not mean the overfit in the traditio...

That's how you should be doing it. This means model is able to generalise and you have improved performance.

ocean vortex Aug 28, 2025, 8:11 PM

#

unborn ocean a year ago every lab was doing the same thing for basic number calculations (e.g...

why should they is the better question. There's no such benchmark that they quoted which would test it...

sonic tendon Aug 28, 2025, 8:13 PM

#

unborn ocean only openai ai is good at it

i don't think that that's true? qwen and DS can do it fine, as well as 2.5 pro and flash

unborn ocean Aug 28, 2025, 8:13 PM

#

ocean vortex That's how you should be doing it. This means model is able to generalise and yo...

yes, but using the same structured environment they rl'ed in will result in the model looking better than it is in similar tasks (that it should actually learn, like in the coding comp example the actual coding vs the short-ish coding competitions they are rl'ing on)

ocean vortex Aug 28, 2025, 8:13 PM

#

They just didn't. It makes no sense to do it for them. Target something specifically you can't even promote or sell to the public? Makes no sense at all.

#

Unless they did smth similar to improve the reasoning performance in general

#

but then it just kills your entire point

#

lol

unborn ocean Aug 28, 2025, 8:14 PM

#

ocean vortex They just didn't. It makes no sense to do it for them. Target something specific...

the idea is not that it provides benefit to rl on this, but that it is an easy way to implement rlvr + curriculum learning to get better perf in other areas

ornate agate Aug 28, 2025, 8:14 PM

#

actually training specifically on this is exactly the sort of thing I imagine OpenAI to do.

unborn ocean Aug 28, 2025, 8:14 PM

#

ocean vortex Unless they did smth similar to improve the reasoning performance in general

that is obviously the point here, duh

sonic tendon Aug 28, 2025, 8:14 PM

#

this seems fairly trivial to test

ocean vortex Aug 28, 2025, 8:14 PM

#

unborn ocean the idea is not that it provides benefit to rl on this, but that it is an easy w...

They don't do random things they can't sell if it doesn't improve measurable performance

unborn ocean Aug 28, 2025, 8:15 PM

#

sonic tendon this seems fairly trivial to test

yes would also be interested, the last time i did this was in the later o1 days (so i think o3-mini just released)

#

and openai was king back then

#

and i felt like they did use it for training

ocean vortex Aug 28, 2025, 8:16 PM

#

You are essentially saying "yeah it's better but only because they focused on making it better". Or like "yeah it is better on this thing but only because they focused on this specific thing they didn't advertise anywhere at all and there's no benefit at all". None of these statements make any sense whatsoever tbh

#

Metrics do though. No one has time to dig through this nonsense of what people are doing lol

#

They would degrade what actually matters to them if they were training so randomly

#

It's just one of the things that highlights where OpenAI is clearly ahead atm

whole wagon Aug 28, 2025, 8:19 PM

#

Microsoft made their own LLM kekw Top 10 anime betrayals

sonic tendon Aug 28, 2025, 8:19 PM

#

ocean vortex It's just one of the things that highlights where OpenAI is clearly ahead atm

i really don't think that openAI is particularly ahead in this

sonic tendon Aug 28, 2025, 8:20 PM

#

whole wagon Microsoft made their own LLM kekw Top 10 anime betrayals

they made phi

#

and an R1 finetune I think?

ocean vortex Aug 28, 2025, 8:20 PM

#

sonic tendon i really don't think that openAI is particularly ahead in this

They kinda are measurably ahead. Other models can do it, but most of the time accuracy is visibly worse

whole wagon Aug 28, 2025, 8:20 PM

#

This is the first major llm release out of Microsoft

unborn ocean Aug 28, 2025, 8:20 PM

#

ocean vortex You are essentially saying "yeah it's better but only because they focused on ma...

i dont think you are getting my point:
they did not focus on this to sell the 10 nerds who try it out on their model, they probably used it in their rl training process to enhance other areas, why?:

it is really easy to implement (no humans needed)
one could suspect that the models would benefit from more concise and flawless reasoning in other areas as well (bc one flaw results in broken output and the token usage indicates efficiency)
it is really easy to implement curriculum learning on top of this (scaling the difficulty of the decoding problem)
no real overfit

sonic tendon Aug 28, 2025, 8:21 PM

#

https://huggingface.co/microsoft/MAI-DS-R1

microsoft/MAI-DS-R1 · Hugging Face

sonic tendon Aug 28, 2025, 8:21 PM

#

whole wagon This is the first major llm release out of Microsoft

maybe

ocean vortex Aug 28, 2025, 8:21 PM

#

unborn ocean i dont think you are getting my point: they did not focus on this to sell the 10...

Well then if they used it to make the model better (unlikely, but ok, let's assume that) and arrive at the current SOTA to improve it in general, what is the problem?

whole wagon Aug 28, 2025, 8:22 PM

#

https://microsoft.ai/news/two-new-in-house-models/

Microsoft AI

Two in-house models in support of our mission

Introducing MAI-Voice-1 and MAI-1-Preview, our new purpose-built models.

ocean vortex Aug 28, 2025, 8:23 PM

#

Well almost everyone is doing math or cyphers with reasoning

#

for training

unborn ocean Aug 28, 2025, 8:23 PM

#

ocean vortex Well then if they used it to make the model better (unlikely, but ok, let's assu...

i don't have a problem with them doing it, my point is just that using the very environment they (potentially) used for training as a benchmark seems like a really dumb idea

whole wagon Aug 28, 2025, 8:23 PM

#

I actually knew about this Microsoft ai for ages. They operated in secret as a redundancy for if openAI ever betrayed Microsoft

ocean vortex Aug 28, 2025, 8:23 PM

#

And decoding is fundamentally just math

#

So yeah... 🤷‍♂️

#

Someone just does it better than others lol

unborn ocean Aug 28, 2025, 8:24 PM

#

whole wagon I actually knew about this Microsoft ai for ages. They operated in secret as a r...

yeah they heavily scaled up, they always had some interesting people, but now they are all moving into the proprietary team

whole wagon Aug 28, 2025, 8:24 PM

#

Will be interesting to see how fast Microsoft climbs to the frontier

#

Everyone's first LLM is bad

sonic tendon Aug 28, 2025, 8:25 PM

#

people seem to be sloptimizing for LMArena

unborn ocean Aug 28, 2025, 8:25 PM

#

amazon, lol

ocean vortex Aug 28, 2025, 8:25 PM

#

wdym. RL training for reasoning pretty much revolves around math. That's how you can see deterministic results and eval reasoning traces

#

Like that's not news lol

rustic knot Aug 28, 2025, 8:26 PM

#

the entirety of AI is math

sonic tendon Aug 28, 2025, 8:26 PM

#

unborn ocean amazon, lol

mistral is still a surprise to me

#

i always liked them for their terse responses

#

but then

unborn ocean Aug 28, 2025, 8:27 PM

#

same, still mentally ignoring that they are n.2 without style control

#

can't explain why

ocean vortex Aug 28, 2025, 8:27 PM

#

This is still math if you write the prompt correctly. Nothing to do with puzzles when we are assessing just decoding. If you do base62 there's gonna be a TON of math
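
To illustrate the base62 remark: base64 maps every 4 characters to 3 bytes independently, but 62 is not a power of two, so a common way to do base62 is to treat the whole input as one big integer. The sketch below assumes that convention and the alphabet order `0-9A-Za-z`; other base62 variants exist, so both are assumptions.

```python
# Assumed alphabet order; base62 has no single standard.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_encode(data: bytes) -> str:
    """Encode bytes as one big base-62 number (leading zero bytes are lost)."""
    n = int.from_bytes(data, "big")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def base62_decode(text: str) -> bytes:
    """Decode by repeated multiply-and-add over the whole string."""
    n = 0
    for ch in text:
        n = n * 62 + ALPHABET.index(ch)
    length = (n.bit_length() + 7) // 8
    return n.to_bytes(length, "big")
```

Every character changes the running total, so a model decoding this without tools has to carry out long multi-digit multiplication rather than a per-chunk lookup.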

sonic tendon Aug 28, 2025, 8:27 PM

#

i feel like DS might be the first to top Google on the non style control leaderboard

#

since 2.5

ocean vortex Aug 28, 2025, 8:28 PM

#

Every model knows how to do it

unborn ocean Aug 28, 2025, 8:28 PM

#

yeah, if huawei gets their sh*t together (big if)

ocean vortex Aug 28, 2025, 8:28 PM

#

but not every model can do the math part with good enough precision

#

disagree. It's simply testing how well a model can reason and for how long while still being able to complete the task and not go offtrack

#

If it gives up half-way through and hallucinates an answer - you now know the limits of it.

#

tools render it invalid test lol

#

But that's also why it is unlikely for AI labs to focus their training on it

#

It is not. ML models are not humans catgrin

#

Ever since reasoning became a thing, this is the way to test that (test-time compute) tbh

#

If it's allowed to reason as long as it needs to and it had good RL training, eventually it will arrive at the answer with high enough confidence

#

And generally... I don't think it should be a surprise to anyone that OpenAI is leading on fine-tuning at this stage, to be completely honest. Their model is not the biggest, nor do they have the most compute

#

As I've said, it makes no sense at all to do this UNLESS it improves performance for them elsewhere they can show. So it's kinda irrelevant?

#

Testing this is meant to show what it can do that is applicable to IRL tasks. I think this does it well. Regardless of whether they found it to be the case by training for it or didn't train for it.

#

It is positively not possible for them to train for it "just because" if they hadn't found it improving performance on the things they can showcase and sell (published metrics). Like this is just ridiculous and totally not how things are done lol

tall summit Aug 28, 2025, 8:46 PM

#

LMAO

ocean vortex Aug 28, 2025, 8:47 PM

#

No I gave you the arguments lol

#

I find it unbelievable what mental gymnastics you are doing to turn some thing a model does well at into a negative. Coming up with some wild theories 🗿

#

Don't be THIS biased 👀

whole wagon Aug 28, 2025, 8:50 PM

#

openAI does a bit of chess training I think. There's no way GPT5 is this good at it otherwise

ocean vortex Aug 28, 2025, 8:50 PM

#

Dunno about chess... 🤔

#

maybe

#

lol what

#

You probably realise yourself this is nonsense

#

At least I would hope so

#

just stop it lol

#

example 236 from their RL gym ???

#

came up with the prompt myself, reversed it myself, encoded it myself. Have more prompts than just this one

#

well you are just spitting non-sense

#

🗿

#

My goal is to eval models regardless what lab was behind it. Believe it or not in my testing gpt-oss didn't do great overall at all

#

Even though I did have fairly high expectations

#

You think wrong...? Or maybe didn't read attentively

whole wagon Aug 28, 2025, 8:55 PM

#

Gpt oss is not well rounded. Great in some things bad in others

ocean vortex Aug 28, 2025, 8:55 PM

#

whole wagon Gpt oss is not well rounded. Great in some things bad in others

yeah it falls apart fairly quick

#

o4-mini considerably better model, like no comparison better, imo

verbal nimbus Aug 28, 2025, 8:56 PM

#

alpine coral is `nightride` a strong model otherwise? like aside from up to date info

It performed well on my prompt, but I rarely encounter the model.

whole wagon Aug 28, 2025, 8:56 PM

#

o4 mini different price class. Compare to GPT5 nano

ocean vortex Aug 28, 2025, 8:58 PM

#

whole wagon o4 mini different price class. Compare to GPT5 nano

they were selling the biggest OSS variant as the model equivalent to o4-mini though. Their goal was indeed comparable performance I believe

#

It wasn't meant to compete with nano

whole wagon Aug 28, 2025, 8:59 PM

#

The small 20b model is just straight up bad. Like qwen is better lol

ocean vortex Aug 28, 2025, 8:59 PM

#

whole wagon The small 20b model is just straight up bad. Like qwen is better lol

Oh I mean the 120b one

whole wagon Aug 28, 2025, 8:59 PM

#

Yeah ik

ocean vortex Aug 28, 2025, 8:59 PM

#

The smaller ones are naturally worse still

whole wagon Aug 28, 2025, 8:59 PM

#

I'm just saying the small one is basically useless lol

#

Qwen 30b a3b outperforms

ocean vortex Aug 28, 2025, 9:00 PM

#

#general message

#

Other than that we have no way of knowing what exactly they have done. You can say this about everything

#

but at a certain point you just need to accept the facts

ocean vortex Aug 28, 2025, 9:03 PM

#

whole wagon I'm just saying the small one is basically useless lol

yeah I can believe that. Personally I started with the biggest most capable one and lost interest fairly quickly lol

whole wagon Aug 28, 2025, 9:08 PM

#

There are fine tunes that remove it iirc

verbal nimbus Aug 28, 2025, 9:08 PM

#

OSS 120B is interesting, only 5B active parameters

ornate agate Aug 28, 2025, 9:09 PM

#

whole wagon There are fine tunes that remove it iirc

those guys also usually degrade the model significantly, I prefer to just use another model really.

wintry tinsel Aug 28, 2025, 9:12 PM

#

rustic knot the entirety of AI is math

It operates using math although the raw data it is learning from is not necessarily math so data science is not purely math

rancid knot Aug 28, 2025, 9:18 PM

#

Hello, I want to learn how to use Ai properly and change style of image and video

errant condor Aug 28, 2025, 9:28 PM

#

Guys any way to generate 9:16 videos from an image?

empty briar Aug 28, 2025, 9:36 PM

#

Hello, I am here to learn about the new AI tool with LMArena

golden ocean Aug 28, 2025, 9:38 PM

#

true

maiden fox Aug 28, 2025, 9:54 PM

#

helli, I love just you

white hatch Aug 28, 2025, 9:56 PM

#

Any benchmarks where mini models are better than their big brother?

keen beacon Aug 28, 2025, 10:01 PM

#

white hatch Any benchmarks where mini models are better than their big brother?

High-level: yes, this happens. Smaller models can beat their bigger sibling when:

They’re task-specialized (e.g., a 7B code-tuned model edging out a general 70B on coding benchmarks like HumanEval/MBPP).
The benchmark rewards concise instruction-following (some 7–8B instruct models have topped their family’s larger base models on MT-Bench–style evaluations).
There are tight context/latency budgets (short prompts, low-temperature, or limited tokens) where smaller models are less prone to over-elaboration.
The pipeline uses strong retrieval/tools, making the generator size less decisive.
Evaluation variance is high and differences are within noise; smaller models can win specific subsets even if they lose on average.

#

got it to google

minor schooner Aug 28, 2025, 10:05 PM

#

Hi all fellow AI enabled self expression utilisers..! Here to compare different models abilities to bring hard sci-fi animations with a few very exact requirements for details expected to be present, and keep the consistency in check.

errant rover Aug 28, 2025, 10:20 PM

#

Guys what's the best AI for upscaling right now ?

bleak jewel Aug 28, 2025, 10:44 PM

#

Hi everyone

rocky frigate Aug 28, 2025, 10:46 PM

#

Hey I'm new to here tell me how to add a prompt and generate a video. How to type the prompt?

barren prairie Aug 28, 2025, 10:56 PM

#

bleak jewel Hi everyone

Hi

echo aurora Aug 28, 2025, 10:56 PM

#

rocky frigate Hey I'm new to here tell me how to add a prompt and generate a video. How to typ...

More info can be found in #1397655624103493813 that should help!

sonic tendon Aug 28, 2025, 11:13 PM

#

wait, when did this happen?

#

i don't think it's even on OR yet

#

no idea, I haven't seen it till now

jovial wolf Aug 28, 2025, 11:20 PM

#

when i try https://web.lmarena.ai/
it gives me like ksx files that are a big headache to run and are buggy, rather than the quick .html files im used to getting from chatgpt's site. does anyone know more about this?

like the sandbox doesnt work, ever. maybe ill just try a different browser?

urban mulch Aug 28, 2025, 11:24 PM

#

New here

hardy lion Aug 28, 2025, 11:31 PM

#

sonic tendon wait, when did this happen?

my guess would be that it happened on or around 2025-08-15 🤣

echo aurora Aug 28, 2025, 11:32 PM

#

You'll want to review #1397655624103493813 for more information on how to use Video Arena

median oasis Aug 28, 2025, 11:46 PM

#

Hello all, here to attempt to 'level up' my content creation skills for non profit association member engagement

sonic tendon Aug 28, 2025, 11:52 PM

#

hardy lion my guess would be that it happened on or around 2025-08-15 🤣

I think that's just when they finished it internally?

#

https://x.com/JustinLin610/status/1960692051185688646?t=qS_urFTZ8QuWJifepTXGfQ&s=19

Junyang Lin (@JustinLin610)

Qwen

#

this seems like it could've been when it hit the arena

#

if so it wouldn't be uncommon

echo sinew Aug 29, 2025, 12:06 AM

#

empty briar Hello, I am here to learn about the new AI tool with LMArena

Hi! You'll want to check #1397655624103493813 for more information on how to use our tools for generating content.

keen beacon Aug 29, 2025, 12:16 AM

#

Is it even on chat.qwen.ai already?

rustic knot Aug 29, 2025, 12:17 AM

#

god bless Qwen, they will overtake deepseek in fundamental ai research and development

ornate sleet Aug 29, 2025, 12:19 AM

#

New here 🧡

keen beacon Aug 29, 2025, 12:19 AM

#

rustic knot god bless Qwen, they will overtake deepseek in fundamental ai research and devel...

Insane cope

rustic knot Aug 29, 2025, 12:19 AM

#

keen beacon Insane cope

ok, what do u think then

keen beacon Aug 29, 2025, 12:22 AM

#

rustic knot ok, what do u think then

Idk, it just did far worse against Deepseek each time I compared them both

#

To be honest we do not even have any good benchmarks to start with

jovial pelican Aug 29, 2025, 12:45 AM

#

I'm curious to see and be a part of

lofty elm Aug 29, 2025, 12:50 AM

#

nano-banana not showing in direct chat?

keen beacon Aug 29, 2025, 12:51 AM

#

lofty elm nano-banana not showing in direct chat?

It's Gemini 2.5 Flash Image now. Also titled as nano-banana on LMArena because people just loved this name more

lofty elm Aug 29, 2025, 12:52 AM

#

keen beacon Its Gemini 2.5 Flash Image now. Also titled as nano-banana on LMArena because pe...

i cant see it in direct chat options?

keen beacon Aug 29, 2025, 12:52 AM

#

lofty elm i cant see it in direct chat options?

Click on the image generation option

lofty elm Aug 29, 2025, 12:52 AM

#

keen beacon Click on the image generation option

oh thats it im seeing it now, i just woke up

#

thankss

keen beacon Aug 29, 2025, 12:54 AM

#

okay, this Qwen seems to be a bit better than previous versions

#

still far away from frontier

#

seems like they did something to reduce hallucinations this time

livid venture Aug 29, 2025, 12:56 AM

#

Hello, LmArena

lofty elm Aug 29, 2025, 1:02 AM

#

thanks its working

keen beacon Aug 29, 2025, 1:07 AM

#

lofty elm thanks its working

by the way, it's also in aistudio.google.com now since they released it

errant rover Aug 29, 2025, 1:26 AM

#

yo

verbal nimbus Aug 29, 2025, 1:50 AM

#

Q1 2026? 💀

#

The new AI Studio is bad, can't even scroll up

rare python Aug 29, 2025, 2:01 AM

#

https://ai.studio/banana

Google AI Studio

The fastest path from prompt to production with Gemini

coarse charm Aug 29, 2025, 2:05 AM

#

hi

fiery wraith Aug 29, 2025, 2:18 AM

#

hi

muted delta Aug 29, 2025, 2:35 AM

#

hi

gloomy badge Aug 29, 2025, 2:40 AM

#

ya

rain rune Aug 29, 2025, 2:43 AM

#

Hey

keen beacon Aug 29, 2025, 2:43 AM

#

Did anyone know that AIs trained with reinforcement learning lose the ability to be creative?

#

https://gwern.net/doc/reinforcement-learning/preference-learning/mode-collapse/

‘AI mode collapse’ directory

Bibliography for directory reinforcement-learning/preference-learning/mode-collapse, most recent first: 4 related tags, 105 ...

#

Mode collapse harms esthetics: outputs start to sound the same, like “AI slop” or “ChatGPTese”, or look somehow similar, like “the Midjourney look.” This cripples creative uses like creative writing. And this damage can manifest in strange ways, like models refusing to write non-rhyming poetry & subtly steering non-rhyming inputs towards rhyming, or being unable to generate random numbers.

#

Basically, you just train a model to give precise and correct answers - and it starts to create art with the same precision and correctness as if it was a reinforcement learning challenge. It loses the ability to synthesize from weakly related and totally unrelated ideas - it loses the ability to be creative.

steep cypress Aug 29, 2025, 2:46 AM

#

helo!

novel umbra Aug 29, 2025, 2:57 AM

#

hello

gusty gull Aug 29, 2025, 3:01 AM

#

Hi just discovered lm arena on discord 🙂

echo aurora Aug 29, 2025, 3:03 AM

#

hey everyone ablobwave

exotic nebula Aug 29, 2025, 3:13 AM

#

echo aurora hey everyone ablobwave

yo congrats on your promotion to admin 🥳

molten fern Aug 29, 2025, 3:14 AM

#

hello

exotic nebula Aug 29, 2025, 3:14 AM

#

molten fern hello

hii

echo aurora Aug 29, 2025, 3:15 AM

#

exotic nebula yo congrats on your promotion to admin 🥳

https://tenor.com/view/huh-dog-hug-dog-question-mark-dog-question-mark-gif-14144426802562365039

Tenor

#

haven't I always had admin

exotic nebula Aug 29, 2025, 3:16 AM

#

echo aurora haven't I always had admin

https://tenor.com/view/bruh-meme-gif-26978290

Tenor

#

bro i thought you were just a mod 😭

slow galleon Aug 29, 2025, 3:23 AM

#

hello

limber acorn Aug 29, 2025, 3:38 AM

#

hello

tired lichen Aug 29, 2025, 3:40 AM

#

hi there

stoic raft Aug 29, 2025, 3:53 AM

#

hi.. im newbie.. how generate video here?

willow steppe Aug 29, 2025, 4:01 AM

#

hello

quartz light Aug 29, 2025, 4:02 AM

#

bruh

#

where is the coding model

keen beacon Aug 29, 2025, 4:08 AM

#

hello

granite igloo Aug 29, 2025, 4:21 AM

#

Hello I have the reached the limit for nanobanana, it's telling me to wait another 50 mins to generate so I have a question for that, How many Images can I generate before hitting the limit? I have generated around 15-20 images before hitting the limit and once the 50 minutes pass will I be able to again generate 15-20 images or will I be limited to a few images only? TIA

hollow horizon Aug 29, 2025, 4:25 AM

#

Hello

limber crow Aug 29, 2025, 4:26 AM

#

the model of video generate is random?

#

I can't choose the model

glossy tinsel Aug 29, 2025, 4:26 AM

#

Hello

desert abyss Aug 29, 2025, 4:30 AM

#

limber crow the model of video generate is random?

Yes, it's random here in Discord.

desert abyss Aug 29, 2025, 4:31 AM

#

limber crow I can't choose the model

You can go to the battle mode to choose the model: https://lmarena.ai/?arena

limber crow Aug 29, 2025, 4:32 AM

#

desert abyss Yes, it's random here in Discord.

so only the picture mode can choose?

#

🤔

lilac pivot Aug 29, 2025, 5:02 AM

#

Hello. I new to this arena.

proven reef Aug 29, 2025, 5:15 AM

#

hi

echo aurora Aug 29, 2025, 5:32 AM

#

ablobwave

echo aurora Aug 29, 2025, 5:32 AM

#

stoic raft hi.. im newbie.. how generate video here?

If you check out #1397655624103493813 you'll find what you're looking for.

echo aurora Aug 29, 2025, 5:34 AM

#

limber crow I can't choose the model

"You can go to the battle mode to choose the model:"
slight correction - selecting Side by Side & Direct is what will let you select specific models: https://lmarena.ai/?arena=&mode=side-by-side

vague owl Aug 29, 2025, 5:44 AM

#

hi everyone

lone notch Aug 29, 2025, 6:15 AM

#

helo

keen fulcrum Aug 29, 2025, 6:23 AM

#

quartz light where is the coding model

they released Grok Code Fast 1

quartz light Aug 29, 2025, 6:23 AM

#

keen fulcrum they released Grok Code Fast 1

its ass

keen fulcrum Aug 29, 2025, 6:23 AM

#

I was expecting more as well

quartz light Aug 29, 2025, 6:23 AM

#

wait

keen fulcrum Aug 29, 2025, 6:23 AM

#

Its ok for its price

quartz light Aug 29, 2025, 6:23 AM

#

is that the only thing??

keen fulcrum Aug 29, 2025, 6:24 AM

#

Yes

quartz light Aug 29, 2025, 6:24 AM

#

seriously????????

#

ive been waiting for over a month

#

for that sh?

#

@deep adder grok fell off

quartz light Aug 29, 2025, 6:24 AM

#

keen fulcrum Its ok for its price

no

#

its many times worse than gpt 5 nano

whole wagon Aug 29, 2025, 6:29 AM

#

astral pagoda Aug 29, 2025, 6:30 AM

#

quartz light where is the coding model

Isn’t it Sonic ?

quartz light Aug 29, 2025, 6:35 AM

#

idk

#

yall check this out

#

https://websim.com/@rat/ai-game-generator

AI Game Generator

#

its goofy but

#

i guess camera rotation kinda works

#

ill fix in like 10 hrs

#

btw it uses nanobanana

#

and yes it is super inefficient but uhhh

#

yes

empty salmon Aug 29, 2025, 6:50 AM

#

Google aistudio remove aspect ratio 9:16 for veo ?

latent pagoda Aug 29, 2025, 6:50 AM

#

hello

empty salmon Aug 29, 2025, 6:58 AM

#

empty salmon Google aistudio remove aspect ratio 9:16 for veo ?

anyone have same issue?

keen beacon Aug 29, 2025, 7:13 AM

#

Insanely difficult task for LLMs:

What emotions does the following piece of music evoke? Why? Conduct a comprehensive music theory analysis.

Kick = 1/4 * 32
Snare = 1/16~, 1/32~, 1/16, 1/8, 1/16~, 1/16, 1/16., 1/32~ * 64
Closed hat = 1/16 * 128
Open hat = 1/8 * 64
Bass = (B1 1/4 * 6, G#1 1/4 * 2) * 4
Celli pizzicato = ((B3 1/8 D#4 1/8 F#4 1/8) * 21), B3 1/8
Organ = (D#5min 1/8, 1/8~, D#5min 1/8, F#5maj 1/8, C#5maj 1/8, 1/8~, C#5maj 1/8, F5dim 1/8, B4maj 1/8, 1/8~, B4maj 1/8, F#5maj 1/8, A#4min 1/8, 1/8~, A#4min 1/8, C#5maj 1/8) * 4
Choir = (B4maj 1/4.., F5dim/B4 1/4.., F5dim/B4 1/8, C#5maj/G#4 1/2, C#5maj 1/2) * 2
Trumpet = (B3+G#4 1/4.., F4+G#4 1/8., C#4+G#4 1/8., C#4+G#4+B4 1/8., C#4+B4 1/4.., D#4+B4 1/4.., D#4+B4 1/8, B3+G#4 1/4.., F4+G#4 1/8., C#4+G#4 1/8., C#4+G#4+B4 1/8., C#4+B4 1/4.., D#4+A#4 1/4.., D#4+A#4 1/8) * 2

Legend:
C#5maj/G#4 1/2..

C - note label
5 - octave label
maj - chord quality
/G#4 - note in bass (for inversions)
1/2 - note duration

Other:
. - dotted note
~ - pause
B3+G#4 - notes sounding together

#

Here it is:

#

Qwen 2507 Thinking thinks that the tempo is

Implied Moderate to Slow (Dotted half notes in choir, long bass notes). Likely 70-90 BPM – slow enough for emotional weight, fast enough for rhythmic drive.
Which is absolutely ridiculous. Other models, however, determine it correctly at 120-160 BPM.

#

But if I tell that it is 70 BPM in the prompt, they all pretend that the tempo is "slow and ceremonial march", which is again ridiculous

#

That Qwen is absolutely horrible at this. It invents things I never included in this track and never ever studied:

Organ/Choir/Trumpet Progression: This is a direct adaptation of Pachelbel's Canon progression (I-V-vi-iii-IV-I-IV-V), but transposed to B minor with minor-mode alterations

Picardy Third (B4maj): The tonic chord resolving as major (B-D#-F#) instead of minor (B-D-F#) is the emotional core. It transforms expected sorrow into luminous hope—like sunlight breaking through clouds. This is the primary source of "bittersweet" tension.

#

Okay, maybe I accidentally rediscovered Pachelbel's Canon, but there is literally no Picardy third at the end here, all chords here are strictly diatonic to F# major/B Lydian, the mode is definitely major so it can't be a Picardy third. It completely made it up.
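
The "strictly diatonic" claim is easy to check mechanically. A small sketch, assuming the chords are read off the Organ and Choir lines of the prompt with octaves dropped, and treating the written F as enharmonic E# (the helper names are illustrative):

```python
NOTE_TO_PC = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
              "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}

# Semitones above the root for the chord qualities used in the prompt.
QUALITY_INTERVALS = {"maj": (0, 4, 7), "min": (0, 3, 7), "dim": (0, 3, 6)}

MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)
LYDIAN_STEPS = (0, 2, 4, 6, 7, 9, 11)  # major scale with a raised 4th


def scale(tonic: str, steps: tuple) -> set:
    """Pitch classes of a scale built on `tonic`."""
    root = NOTE_TO_PC[tonic]
    return {(root + step) % 12 for step in steps}


def chord_pitch_classes(root: str, quality: str) -> set:
    """Pitch classes of a triad."""
    base = NOTE_TO_PC[root]
    return {(base + i) % 12 for i in QUALITY_INTERVALS[quality]}


# Chords from the Organ and Choir lines; "F dim" is E# dim spelled enharmonically.
CHORDS = [("D#", "min"), ("F#", "maj"), ("C#", "maj"),
          ("F", "dim"), ("B", "maj"), ("A#", "min")]

f_sharp_major = scale("F#", MAJOR_STEPS)
b_lydian = scale("B", LYDIAN_STEPS)
non_diatonic = [c for c in CHORDS if not chord_pitch_classes(*c) <= f_sharp_major]

if __name__ == "__main__":
    print("same pitch set:", f_sharp_major == b_lydian)
    print("non-diatonic chords:", non_diatonic)
```

F# major and B Lydian share one pitch set, and every chord falls inside it, so there is no borrowed or altered chord for a Picardy third or modal interchange to refer to.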

#

New Qwen-Max which is at LMArena right now is much better. But it is still horrible, making up things and hearing modes that are simply not present in the track

#

Deepseek V3.1 thinking in the chat is much better. It makes something up sometimes, but at least it is able to clearly identify the tonal center at B most of the time, and call it a major mode. Which is very close, because it is actually B Lydian, a major mode of the F# major scale.

whole wagon Aug 29, 2025, 7:25 AM

#

keen beacon Aug 29, 2025, 7:25 AM

#

The worst offenses I can accuse Deepseek of are "chromaticism" and "modal interchange" (which are bs because all chords are strictly diatonic here), but not goddamn Picardy thirds

#

Qwen on the other hand completely slops everything up. Also the first model that calculated the tempo to be 2 times slower than it actually is.

zinc ore Aug 29, 2025, 7:28 AM

#

whole wagon

Crazy grok is on the list but not oai

keen beacon Aug 29, 2025, 7:28 AM

#

GPT 5 is the only model that correctly identifies B Lydian. It succeeds around 2-3 attempts out of 10, but it is still better than Qwen and Deepseek that never managed to do it.

#

It is also the only model that keeps insisting on F# or B major and disregards other modes, which is almost correct

ocean vortex Aug 29, 2025, 7:30 AM

#

whole wagon

grok code only there because it's a new model lol

whole wagon Aug 29, 2025, 7:33 AM

#

whole wagon

Who is top

ocean vortex Aug 29, 2025, 7:34 AM

#

Now I see why I haven't heard much about it. This is probably the first coding model I've seen that has no coding metrics in either its press release or even its model card https://data.x.ai/2025-08-26-grok-code-fast-1-model-card.pdf

#

https://x.ai/news/grok-code-fast-1

Grok Code Fast 1 | xAI

We're thrilled to introduce grok-code-fast-1, a speedy and economical reasoning model that excels at agentic coding.

#

Last week, we quietly released grok-code-fast-1 under the codename sonic. During this stealth phase, our team carefully monitored community channels and deployed multiple new model checkpoints to address feedback.

So basically 'sonic' was no good and they buried the results. 🗿

#

Their blogpost absolutely screams "AI written" lmao

grok-code-fast-1 was crafted to shine in the tasks developers face every day, striking a compelling balance between performance and cost. Its strength lies in delivering strong performance in a economical, compact form factor, making it a versatile choice for tackling common coding tasks quickly and cost-effectively.

wild galleon Aug 29, 2025, 7:49 AM

#

How if i want use API of Lmarena.ai to my openweb ui ??

unborn ocean Aug 29, 2025, 7:55 AM

#

wild galleon How if i want use API of Lmarena.ai to my openweb ui ??

Can’t. There is no api, scraping etc. is obviously not allowed.

#

But you might want to look at this: https://github.com/cheahjs/free-llm-api-resources.

#

If you are looking for resources in general.

keen fulcrum Aug 29, 2025, 7:57 AM

#

ocean vortex Now I see why I haven't heard much about it. This is probably the first coding m...

It does in the model card

ocean vortex Aug 29, 2025, 7:57 AM

#

keen fulcrum It does in the model card

where

keen fulcrum Aug 29, 2025, 7:58 AM

#

zinc ore Aug 29, 2025, 8:00 AM

#

Where's the coding benchmarks (outside of cyber security)

keen fulcrum Aug 29, 2025, 8:00 AM

#

If they were that proud of their coding model we would have seen promotions by Musk

ocean vortex Aug 29, 2025, 8:00 AM

#

keen fulcrum

biology and chemistry I wouldn't classify as coding. Cybersecurity cybench perhaps, with a stretch... Though I wouldn't expect to be looking at this and nothing else for a coding model

EDIT: Ok they did low-key mention SWE in their news article. That's something I suppose

pine kraken Aug 29, 2025, 8:03 AM

#

https://x.com/noob_contrarian/status/1961335098327929081

Hope you all are having a good day. I just wanted to take a moment and tell you all about a research study I’m conducting for an app that focuses on AI Brainrot Detox with some new features and benefits for people who are struggling. I would really, really appreciate it if you all could just take out a few minutes from your day and participate in this.
I’m reiterating that this is completely anonymous so please don’t feel uncomfortable.

Ash (@noob_contrarian)

working on recovering from AI overload - quick poll to see who else needs help: [https://t.co/dcy1UF0iOG]

(anonymous, of course; we’re all in this together)

hollow imp Aug 29, 2025, 8:26 AM

#

pine kraken https://x.com/noob_contrarian/status/1961335098327929081 Hope you all are havin...

Bruh

rocky mauve Aug 29, 2025, 8:32 AM

#

pine kraken https://x.com/noob_contrarian/status/1961335098327929081 Hope you all are havin...

🤖

fleet narwhal Aug 29, 2025, 8:35 AM

#

hello..just wanted to learn more about AI,that is why joined..have a good day to all

tender trellis Aug 29, 2025, 8:42 AM

#

Wha5h is this

vapid sapphire Aug 29, 2025, 8:52 AM

#

hi

leaden sun Aug 29, 2025, 9:06 AM

#

keen beacon Another interesting writeup from these guys: https://www.lesswrong.com/posts/SJA...

thanks for sharing, finally something substantial to back what we all feel about claude for a long time now

gaunt void Aug 29, 2025, 9:22 AM

#

Hello

tall summit Aug 29, 2025, 9:36 AM

#

keen beacon https://www.lesswrong.com/posts/aFW63qvHxDxg3J8ks/nobody-is-doing-ai-benchmarkin...

lesswrong

cedar estuary Aug 29, 2025, 9:36 AM

#

hi everyone!

tall summit Aug 29, 2025, 9:36 AM

#

lesswrong moment, disregarding it immediately

leaden sun Aug 29, 2025, 9:40 AM

#

ok? i should check their credibility then 😅

rich field Aug 29, 2025, 9:41 AM

#

hi

leaden sun Aug 29, 2025, 9:42 AM

#

rich field hi

hey nice to see you here Eric!

rich field Aug 29, 2025, 9:42 AM

#

🙂

#

what is it all about?

leaden sun Aug 29, 2025, 9:44 AM

#

well...talking about benchmarking mostly 😆 feel free to look around and play with image/video generation

neon idol Aug 29, 2025, 9:53 AM

#

leaden sun well...talking about benchmarking mostly 😆 feel free to look around and play w...

Bruh what the hell of AI are you?

rose vessel Aug 29, 2025, 9:55 AM

#

hello all !

fleet lintel Aug 29, 2025, 10:17 AM

#

I am looking for some hard (text-based) questions that we expect LLMs should be able to solve but currently can't

tall summit Aug 29, 2025, 10:18 AM

#

fleet lintel I am looking for some hard (text-based) questions that we expect LLMs shoul...

some of HLE

leaden sun Aug 29, 2025, 10:23 AM

#

neon idol Bruh what the hell of AI are you?

Hi, I'm Sydney. How can I help you today?

daring schooner Aug 29, 2025, 10:25 AM

#

hello

neat quarry Aug 29, 2025, 10:33 AM

#

Hello!

#

how i make videos here?

dusky tangle Aug 29, 2025, 10:46 AM

#

Hello! 🙂

wicked thicket Aug 29, 2025, 10:56 AM

#

hello

neat quarry Aug 29, 2025, 10:56 AM

#

I sent my prompt, but the video wasn't made T-T

#

How long does it take to make?

#

Damn, I think I've already used up my daily credits even though I haven't done anything lol

wanton knot Aug 29, 2025, 10:59 AM

#

Hello!

stark sand Aug 29, 2025, 11:02 AM

#

they are enjoying burger in a happy and smiling mood

grand pewter Aug 29, 2025, 11:02 AM

#

Hii

latent crest Aug 29, 2025, 11:03 AM

#

Good morning guys , can nano banana do nsfw content or soft nsfw content ?

merry geode Aug 29, 2025, 11:13 AM

#

Helloo

midnight mesa Aug 29, 2025, 11:20 AM

#

i love lmarena

tough onyx Aug 29, 2025, 11:25 AM

#

heloo

golden ocean Aug 29, 2025, 11:41 AM

#

-# (Claude Opus 4.1 thinking is included in the second option)

quartz pike Aug 29, 2025, 11:43 AM

#

yo yall

#

on lmarena

#

is there a limit to how many messages i can send

#

or image requests to an ai?

#

-# personally i think yes. because it would cost a SH1T ton of money for lmarena to run o3 for free to anyone. and 4.1 opus.

#

-# but im not sure so pls fact check me

lilac inlet Aug 29, 2025, 11:47 AM

#

quartz pike is there a limit to how many messages i can send

I have been using LMarena for quite a long time, and I have experienced all the top-notch models, especially Opus 4.1, which is available for limited use. Once you exhaust your limit, you have to wait for some time for it to run again!

quartz pike Aug 29, 2025, 11:48 AM

#

oh lol

#

what is the limit?

quartz pike Aug 29, 2025, 11:48 AM

#

lilac inlet I have been using LMarena for quite a long time, and I have experienced all the ...

And me too. but personally i come from websim lol

#

its a really great platform

lilac inlet Aug 29, 2025, 11:50 AM

#

quartz pike oh lol

I think it depends on your token. If you're using it to generate a long code, then it won't even run thrice!

quartz pike Aug 29, 2025, 11:55 AM

#

damn

#

so its a token limit not a run limit.

#

Got it.

willow grail Aug 29, 2025, 12:02 PM

#

hm

#

banana sucks at

convert style of image into japanese anime style. this shows a dinosaur anime. we see a ceratosaurus.

#

someone help me ?

crisp spear Aug 29, 2025, 12:04 PM

#

willow grail banana sucks at > > convert style of image into japanese anime style. this sho...

Is this your prompt?

quiet kestrel Aug 29, 2025, 12:04 PM

#

How to make video

#

Pls tell me

#

#

Hello?

willow grail Aug 29, 2025, 12:06 PM

#

crisp spear Is this your prompt?

ya

#

this is the best so far which it could make

#

but its not really anime

keen beacon Aug 29, 2025, 12:09 PM

#

Is there any Ai generator that does nsfw content? Or I meant say everything in general literally

willow grail Aug 29, 2025, 12:09 PM

#

convert picture of dinosaur so it looks like manga death note and similar manga.

keen beacon Aug 29, 2025, 12:21 PM

#

keen beacon Is there any Ai generator that does nsfw content? Or I meant say everything in g...

Stable diffusion

chilly parrot Aug 29, 2025, 12:43 PM

#

hi

viscid sparrow Aug 29, 2025, 12:45 PM

#

hi

quiet kestrel Aug 29, 2025, 12:46 PM

#

How to see my videos

rocky wedge Aug 29, 2025, 12:53 PM

#

Hi

midnight mesa Aug 29, 2025, 12:55 PM

#

is it possible to nano banana make videos

lofty compass Aug 29, 2025, 1:06 PM

#

a 3d wireframe of a battle ship

#

#share-prompts a 3d wireframe of a battle ship

#

/video a 3d wireframe of a battle ship

timber iris Aug 29, 2025, 1:13 PM

#

Does GPT 5 HIGH in lmarena have thinking?

frank elm Aug 29, 2025, 1:16 PM

#

I am exploring the new Features

neon idol Aug 29, 2025, 1:16 PM

#

timber iris Does GPT 5 HIGH in lmarena have thinking?

Yes

echo sinew Aug 29, 2025, 1:17 PM

#

quiet kestrel How to make video

Hello! You'll want to check #1397655624103493813 to learn how to prompt the bot and generate videos in #video-arena-1 #video-arena-2 #video-arena-3

timber iris Aug 29, 2025, 1:22 PM

#

neon idol Yes

Great, i guess it has auto thinking. Sadly we can't see what it is thinking

cinder oar Aug 29, 2025, 1:22 PM

#

yoh great to get a chance for different ai.

keen beacon Aug 29, 2025, 1:29 PM

#

timber iris Does GPT 5 HIGH in lmarena have thinking?

Yes, it's literally the best thinking model of OpenAI and on the market right now.

#

Sad that we can't really see the reasoning process.

willow grail Aug 29, 2025, 1:40 PM

#

timber iris Great, i guess it has auto thinking. Sadly we can't see what it is thinking

why am i alive. its suffering. nothing else.

hybrid yacht Aug 29, 2025, 1:40 PM

#

Hi all, complete newbie to this Ai thing,
just here to learn

willow grail Aug 29, 2025, 1:40 PM

#

hybrid yacht Hi all, complete newbie to this Ai thing, just here to learn

nice

#

how did you find lmarena

hybrid yacht Aug 29, 2025, 1:41 PM

#

willow grail how did you find lmarena

Been watching Jack Vs Ai videos

willow grail Aug 29, 2025, 1:44 PM

#

oh thats a new youtuber

hollow ether Aug 29, 2025, 2:04 PM

#

Hello there! It was good to be here and learn more about AI

boreal rampart Aug 29, 2025, 2:22 PM

#

hello! I'm here for make a content video AI!

stray dock Aug 29, 2025, 2:27 PM

#

broooo wtf just booted up LMArena and all my previous prompts are GONE

#

WTF

willow grail Aug 29, 2025, 2:36 PM

#

boreal rampart hello! I'm here for make a content video AI!

omg jack

exotic gust Aug 29, 2025, 2:39 PM

#

eager crag Aug 29, 2025, 2:40 PM

#

is it... down?

exotic gust Aug 29, 2025, 2:40 PM

#

seems like it

#

nope

eager crag Aug 29, 2025, 2:40 PM

#

no?

exotic gust Aug 29, 2025, 2:40 PM

#

back to normal

eager crag Aug 29, 2025, 2:40 PM

#

oh yeah it's back up cool

exotic gust Aug 29, 2025, 2:40 PM

#

prolly some cloudflare issue

eager crag Aug 29, 2025, 2:41 PM

#

or it's me who did too many prompts

rich compass Aug 29, 2025, 2:41 PM

#

fix your @$$hole site😎

eager crag Aug 29, 2025, 2:41 PM

#

hey, not nice.

rich compass Aug 29, 2025, 2:41 PM

#

eager crag hey, not nice.

what not nice?

#

im trying to fix my script

#

AND SITE

#

JUST DOWN

dense sphinx Aug 29, 2025, 2:41 PM

#

Mine not working too

eager crag Aug 29, 2025, 2:42 PM

#

me too, but it isn't necessary to insult the developers.

dense sphinx Aug 29, 2025, 2:42 PM

#

Is it bug again?

eager crag Aug 29, 2025, 2:42 PM

#

they're gonna fix it like they always did. patience is a virtue.

opal hamlet Aug 29, 2025, 2:43 PM

#

WTF？

rich compass Aug 29, 2025, 2:43 PM

#

badass site

#

dont worry

#

your ass soon gone

dense sphinx Aug 29, 2025, 2:43 PM

#

JN Pavel durov dangerous.

rich compass Aug 29, 2025, 2:43 PM

#

dense sphinx JN Pavel durov dangerous.

nah

#

IM NOT IN THE DANGER SKYLER

#

I AM THE DANGE

#

R

#

https://tenor.com/view/dayni-gif-16797284359068260185

dense sphinx Aug 29, 2025, 2:44 PM

#

Can it be fixed?

eager crag Aug 29, 2025, 2:44 PM

#

it can be, just an outage.

#

i'm guessing at least.

rich compass Aug 29, 2025, 2:45 PM

#

dense sphinx Can it be fixed?

use free grok 4 for now

#

grok com

#

idk

#

5 requests

sonic flax Aug 29, 2025, 2:45 PM

#

Hello friends, I get the message "No models found," how can I fix this?

tight oriole Aug 29, 2025, 2:45 PM

#

🙀 💀 ✌️

rich compass Aug 29, 2025, 2:45 PM

#

tight oriole 🙀 💀 ✌️

nice nick

#

kitler

tight oriole Aug 29, 2025, 2:45 PM

#

rich compass nice nick

😽

eager crag Aug 29, 2025, 2:46 PM

#

sonic flax Hello friends, I get the message "No models found," how can I fix this?

it might be an outage. we and other usual users don't know yet.

sonic flax Aug 29, 2025, 2:47 PM

#

eager crag it might be an outage. we and other usual users don't know yet.

tank you

#

It's back, friends.

dense sphinx Aug 29, 2025, 2:48 PM

#

@rich compass where?

#

On LM?

rich compass Aug 29, 2025, 2:48 PM

#

dense sphinx On LM?

nah

#

grok dot com

merry wren Aug 29, 2025, 2:51 PM

#

hi

#

im having an issue

#

there are no more models on the model selector

night geode Aug 29, 2025, 2:52 PM

#

Hi

merry wren Aug 29, 2025, 2:53 PM

#

night geode Hi

hello

night geode Aug 29, 2025, 2:55 PM

#

I am completely new to this platform, how can we generate videos in the website?

merry wren Aug 29, 2025, 2:56 PM

#

night geode I am completely new to this platform, how can we generate videos in the website?

video arena i think

night geode Aug 29, 2025, 2:57 PM

#

Alr

#

Thanks

static stream Aug 29, 2025, 3:05 PM

#

hello

merry wren Aug 29, 2025, 3:08 PM

#

night geode Thanks

you're welcome

merry wren Aug 29, 2025, 3:08 PM

#

static stream hello

hi

lean coral Aug 29, 2025, 3:13 PM

#

hey

#

is there any way i can generate more thn 8 vid in 4 hr

echo aurora Aug 29, 2025, 3:18 PM

#

merry wren there are no more models on the model selector

Are you still seeing this?

echo aurora Aug 29, 2025, 3:18 PM

#

lean coral is there any way i can generate more thn 8 vid in 4 hr

There is not

rugged brook Aug 29, 2025, 3:19 PM

#

is sonnet 4 thinking better than gemini 2.5 at coding

lean coral Aug 29, 2025, 3:20 PM

#

@echo aurora there is limitation for generating image?

rugged brook Aug 29, 2025, 3:20 PM

#

ye

lean coral Aug 29, 2025, 3:20 PM

#

ah

#

i was looking for any ai close to veo 3 that gives vid with sound without any limitation

merry wren Aug 29, 2025, 3:23 PM

#

echo aurora Are you still seeing this?

refreshed, resolved now

merry wren Aug 29, 2025, 3:23 PM

#

echo aurora Are you still seeing this?

thank you for asking

quartz pike Aug 29, 2025, 3:24 PM

#

Tuff or nah pls vote i wanna see what sh1tty ai made the one on the right.

sterile pulsar Aug 29, 2025, 3:25 PM

#

Hello!

quartz pike Aug 29, 2025, 3:26 PM

#

hello

lean coral Aug 29, 2025, 3:27 PM

#

hey

rose lintel Aug 29, 2025, 3:37 PM

#

hi

#

im new here

upper venture Aug 29, 2025, 3:37 PM

#

quartz pike Tuff or nah pls vote i wanna see what sh1tty ai made the one on the right.

It makes sense. Veo 3 is one of the best video GenAI models rn, and Seedance isn't made for that type of gaming video after all. You can see it does generate one, but as the name suggests, and given the company that made it (ByteDance, owner of TikTok and others), that isn't the main goal. It's more about dances, IRL videos, etc., not games, even tho it can generate them xD.
Btw what an impressive result from Veo 3, even with the audio

lean coral Aug 29, 2025, 3:39 PM

#

can i generate ai vid in lmarena compare model

upper venture Aug 29, 2025, 3:40 PM

#

lean coral can i generate ai vid in lmarena compare model

#video-arena-1 /video

#

ig, I didn't use or test it yet, I just like seeing others' prompts and gens lol

willow grail Aug 29, 2025, 3:41 PM

#

wtf is happening with users

#

so many japanese nicknames

unborn ocean Aug 29, 2025, 3:41 PM

#

upper venture It makes sense. Veo 3 is one of the best video GenAI rn and seed dance isn't mad...

the main point is that it is the lite version: good visuals, but too small to actually understand the query or physics

#

i actually called that it would be the lite model

#

very easy to notice

upper venture Aug 29, 2025, 3:42 PM

#

unborn ocean the main point is that it is the lite version: good visuals, but too small to ac...

That's also very true

#

It is a capable model ngl

#

Not as capable as Veo 3 tho. But yes, this result doesn't show all the power that the Seed team created with their GenAI models

unborn ocean Aug 29, 2025, 3:43 PM

#

willow grail so many japanese nicknames

no name like yours though :v

willow grail Aug 29, 2025, 3:43 PM

#

unborn ocean no name like yours though :v

o.o

unborn ocean Aug 29, 2025, 3:43 PM

#

nah, again the name change, lol

#

like the 10th

willow grail Aug 29, 2025, 3:45 PM

#

unborn ocean nah, again the name change, lol

yeah

#

i played catch with wild crows

#

its fun lol

#

they need to train tho

dense geyser Aug 29, 2025, 3:45 PM

#

hello

willow grail Aug 29, 2025, 3:45 PM

#

i will teach them how to catch food in air

willow grail Aug 29, 2025, 3:46 PM

#

unborn ocean no name like yours though :v

like there is a pedestrian bridge above a river. and crows live there

#

so u go up there and throw the food down and they try to catch it

unborn ocean Aug 29, 2025, 3:48 PM

#

willow grail so u go up there and throw the food down and they try to catch it

sounds pretty cool, the crows in my city are really brutal though

willow grail Aug 29, 2025, 3:48 PM

#

if its baby season then u just need to feed em and they stop attacking

unborn ocean Aug 29, 2025, 3:48 PM

#

like they kill (weak) pigeons and stuff like that

willow grail Aug 29, 2025, 3:48 PM

#

unborn ocean like they kill (weak) pigeons and stuff like that

weak pidgeons sounds like virus etc?

unborn ocean Aug 29, 2025, 3:49 PM

#

yeah idk, usually it is just old looking ones (and then they team up etc.)

willow grail Aug 29, 2025, 3:52 PM

#

unborn ocean yeah idk, usually it is just old looking ones (and then they team up etc.)

like normal carrion crow?

unborn ocean Aug 29, 2025, 3:54 PM

#

was probably just confusing em with ravens (because we dont really have any of them)

#

thinking about it

#

i was also surprised

willow grail Aug 29, 2025, 3:55 PM

#

ravens usually are parents and children.
they dont do big groups like crows

unborn ocean Aug 29, 2025, 3:55 PM

#

well i did see ravens / crows teaming up on single pigeons

willow grail Aug 29, 2025, 3:55 PM

#

lol

#

wher ulive

#

was it winter?

#

achso

unborn ocean Aug 29, 2025, 3:56 PM

#

maybe i was drunk

willow grail Aug 29, 2025, 3:56 PM

#

no idea

unborn ocean Aug 29, 2025, 3:56 PM

#

but saw it multiple times, so idk

willow grail Aug 29, 2025, 3:59 PM

#

unborn ocean but saw it multiple times, so idk

king jon un would perhaps also attack very weak birds

junior forge Aug 29, 2025, 3:59 PM

#

Hi, im abdoulaye from fench nice to meet all!

willow grail Aug 29, 2025, 4:01 PM

#

whats going on so many new people

#

https://tenor.com/view/ooh-ooo-cat-shocked-funny-gif-14366308

echo aurora Aug 29, 2025, 4:22 PM

#

hi everyone ablobwave lots of new folks, welcome welcome! Don't hesitate to ping me if you have questions or problems with the site blobthanks

misty vault Aug 29, 2025, 4:23 PM

#

echo aurora hi everyone ablobwave lots of new folks, welcome welcome!...

bring back alpha.lmarena.ai so that i can continue enjoying lmarena while it is down for everybody else

dusty niche Aug 29, 2025, 4:25 PM

#

Guess the model

clear trellis Aug 29, 2025, 4:26 PM

#

echo aurora hi everyone ablobwave lots of new folks, welcome welcome!...

Hey im pretty much new and i just heard of lm arena like 10 mins ago.

My question is, is it 100% free ? Can i test and generate images and make comparison fully free or is it limited by credits ?

modern flume Aug 29, 2025, 4:28 PM

#

how to generate video

dusty niche Aug 29, 2025, 4:29 PM

#

clear trellis Hey im pretty much new and i just heard of lm arena like 10 mins ago. My questi...

yep 100% free, just go to the website and start testing

modern flume Aug 29, 2025, 4:30 PM

#

dusty niche yep 100% free, just go to the website and start testing

how to generate video

remote lagoon Aug 29, 2025, 4:30 PM

#

even has an extra btn that links to webdev arena!

dusty niche Aug 29, 2025, 4:31 PM

#

modern flume how to generate video

go into video arena and type /video then add your prompt

#

video arena is chat here not a website

echo aurora Aug 29, 2025, 4:37 PM

#

misty vault bring back alpha.lmarena.ai so that i can continue enjoying lmarena while it is ...

Lol good to know, we shall consider blobthanks

echo aurora Aug 29, 2025, 4:39 PM

#

modern flume how to generate video

Check out #1397655624103493813 for more info on how to use this!

ripe orbit Aug 29, 2025, 4:43 PM

#

hello

echo aurora Aug 29, 2025, 4:46 PM

#

clear trellis Hey im pretty much new and i just heard of lm arena like 10 mins ago. My questi...

welcome! Yes, it is free. There are limits on the number of prompts per day/hour for each model, but when you hit a limit you can just wait it out.

upbeat idol Aug 29, 2025, 4:50 PM

#

hi

rustic knot Aug 29, 2025, 4:58 PM

#

oh this model literally said it was everyone

quartz pike Aug 29, 2025, 5:00 PM

#

WHAT

#

YOU ARE HERE

#

WWWWW

austere kiln Aug 29, 2025, 5:04 PM

#

stray dock broooo wtf just booted up LMArena and all my previous prompts are GONE

Had the same issue on a smartphone. On pc, however, there is a long history available. Using the same wifi network on both devices.

tall owl Aug 29, 2025, 5:09 PM

#

hello

teal mantle Aug 29, 2025, 5:12 PM

#

what is the gpt 5 pro quota of chatgpt pro?

#

team is 15 per month

weak timber Aug 29, 2025, 5:14 PM

#

Hello

torpid roost Aug 29, 2025, 5:32 PM

#

hi

blissful carbon Aug 29, 2025, 5:45 PM

#

hi

echo aurora Aug 29, 2025, 5:46 PM

#

hello 👋

nocturne fiber Aug 29, 2025, 5:55 PM

#

happy to be here

teal mantle Aug 29, 2025, 5:55 PM

#

weak timber Hello

hello

grizzled plaza Aug 29, 2025, 6:07 PM

#

hello world

vagrant idol Aug 29, 2025, 6:10 PM

#

Hello

median ginkgo Aug 29, 2025, 6:27 PM

#

Anyone here knows how i can do the image to video thing

verbal nimbus Aug 29, 2025, 6:31 PM

#

median ginkgo Anyone here knows how i can do the image to video thing

#1397655624103493813

naive sedge Aug 29, 2025, 6:31 PM

#

median ginkgo Anyone here knows how i can do the image to video thing

#video-arena-1

The do /image

Then u will see the bot Givin u the command

#

Then*

echo aurora Aug 29, 2025, 6:31 PM

#

median ginkgo Anyone here knows how i can do the image to video thing

Yeah check #1397655624103493813 for details, you'll want to use /video

#

not /image

#

/image creates images, /video creates videos, /image-to-videos makes videos with a reference image

leaden palm Aug 29, 2025, 6:42 PM

#

hi guys - this place has grown into something new.

remember when there was just one lmsys discord, when the icon was the same as gradio's or vicuna's, when there were just a few million votes? these days with all the new image, video, and text models, lm arena has really become a phenomenon.

thing is that i've also become a person who isn't as interested in these new video models, these hyperactive chats, these things that come with scale. i haven't even been using lm arena or moderating the chat that much lately. as such i'm leaving this server and leaving the job to the other moderators here.

it was great being with all of you. see you around, hope to discuss some more things then 👋

fathom plover Aug 29, 2025, 6:43 PM

#

hi

echo aurora Aug 29, 2025, 6:44 PM

#

leaden palm hi guys - this place grown into something new. remember when there was just one...

Thank you so so much for everything @leaden palm

#

I really appreciate all of the help you've put into this community! You shall be missed ❤️

sterile saddle Aug 29, 2025, 6:47 PM

#

helo to everyone

proud hazel Aug 29, 2025, 6:48 PM

#

Hey @echo aurora, how do you prefer to be contacted about “private” matters? Is it okay to just send you a DM?

echo aurora Aug 29, 2025, 6:49 PM

#

proud hazel Hey @echo aurora, how do you prefer to be contacted about “private” mat...

Can you send a DM to the @oak python bot? That's a good way.

inner gate Aug 29, 2025, 6:50 PM

#

What’s up pals

umbral veldt Aug 29, 2025, 6:51 PM

#

can't wait to try video gen!

sweet imp Aug 29, 2025, 6:54 PM

#

Yo YO

quartz pike Aug 29, 2025, 6:58 PM

#

Chat i think gpt image 1 is cooking

proud hazel Aug 29, 2025, 7:13 PM

#

quartz pike Chat i think gpt image 1 is cooking

I'm curious for the result

quartz pike Aug 29, 2025, 7:14 PM

#

it crashed

#

i cri

#

me depresso

#

me die

proud hazel Aug 29, 2025, 7:16 PM

#

🫡

verbal nimbus Aug 29, 2025, 7:31 PM

#

quartz pike Chat i think gpt image 1 is cooking

Direct mode never works for me

quartz pike Aug 29, 2025, 7:33 PM

#

it aint direct

#

its compare

#

or wait

#

is me stupid

#

yeah

#

compare

magic rock Aug 29, 2025, 7:34 PM

#

How to generate video with audio?

quartz pike Aug 29, 2025, 7:34 PM

#

L u c k.

#

pray to lmarena discord

#

to give you veo3

keen beacon Aug 29, 2025, 7:35 PM

#

magic rock How to generate video with audio?

#1397655624103493813

#

ah

#

with audio

#

it is random, so no

neon idol Aug 29, 2025, 7:35 PM

#

quartz pike to give you veo3

Or just use clickbait video in YouTube tutorial for get Free Veo3

quartz pike Aug 29, 2025, 7:36 PM

#

yes

raw arrow Aug 29, 2025, 7:37 PM

#

hello

echo aurora Aug 29, 2025, 7:39 PM

#

magic rock How to generate video with audio?

Not all video models have audio support, and since it's random which models you get, it's also random if your video gets sound or not.

sour wyvern Aug 29, 2025, 7:42 PM

#

Hello

dense quail Aug 29, 2025, 7:47 PM

#

Hello I am here to learn more on how to generate good videos from prompts

golden ocean Aug 29, 2025, 8:03 PM

#

dense quail Hello I am here to learn more on how to generate good videos from prompts

true

echo aurora Aug 29, 2025, 8:05 PM

#

dense quail Hello I am here to learn more on how to generate good videos from prompts

ablobwave Be sure to check out #1397655624103493813 for more information on how to use the bot.

hollow geyser Aug 29, 2025, 8:25 PM

#

hello, video generation

ocean vortex Aug 29, 2025, 8:36 PM

#

upper venture It makes sense. Veo 3 is one of the best video GenAI rn and seed dance isn't mad...

so in short, veo3 > seedance, lol

upper venture Aug 29, 2025, 8:36 PM

#

ocean vortex so in short, veo3 > seedance, lol

As it stands rn, in short: Veo 3 > anything else.

#

And listen, I'm not a big fan of Google or Gemini.

ocean vortex Aug 29, 2025, 8:37 PM

#

upper venture As it stands rn, in short: Veo 3 > anything else.

it does seem that is consensus yeah. Although I'll admit I'm not huge fan of video models in general to use them extensively. Don't see a terrible amount of use for them other than just playing around for fun, personally 👀

upper venture Aug 29, 2025, 8:38 PM

#

ocean vortex it does seem that is consensus yeah. Although I'll admit I'm not huge fan of vid...

Same

#

But from all the texts here and even that Minecraft first pov generation, I see why people talk so great about it

#

Veo 3 is really capable (I mean, expected from the owners of YouTube and the ones who have the largest amount of GPUs out there)

proud hazel Aug 29, 2025, 8:39 PM

#

Kling 2.1 Master, Wan 2.2 and Seedance 1.0 Pro all generate higher-quality videos. However, Veo 3 has audio.

upper venture Aug 29, 2025, 8:39 PM

#

proud hazel Kling 2.1 Master, Wan 2.2 and Seedance 1.0 Pro all generate higher-quality video...

Yes but look at the rank. They're all 100 points behind Veo 3

#

That's a big thing
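
For a sense of scale: assuming the leaderboard scores follow the usual Elo convention (base 10, scale 400), which is an assumption here and not something stated in this chat, a rating gap maps to an expected head-to-head win rate like this. The function name is hypothetical.

```python
def expected_win_rate(rating_gap, scale=400.0):
    """Expected win rate of the higher-rated model under the standard Elo formula.

    `rating_gap` is (rating_a - rating_b). The 400 scale and base 10 are the
    conventional Elo constants, assumed here for illustration.
    """
    return 1.0 / (1.0 + 10 ** (-rating_gap / scale))


if __name__ == "__main__":
    for gap in (0, 50, 100, 200):
        print(f"gap {gap:>3}: {expected_win_rate(gap):.3f}")
```

Under that assumption, a 100-point lead means winning roughly 64% of blind votes against the other model, so it is a clear gap but not a sweep.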

proud hazel Aug 29, 2025, 8:40 PM

#

upper venture Yes but look at the rank. They're all 100 points behind Veo 3

Yes, just because of the audio. If they all had audio like Veo 3, Veo would be at a disadvantage.

upper venture Aug 29, 2025, 8:40 PM

#

proud hazel Yes, just because of the audio. If they all had audio like Veo 3, Veo would be a...

Even Veo without audio is still above them, don't get it wrong

ocean vortex Aug 29, 2025, 8:40 PM

#

upper venture Veo 3 is really capable (I mean, expected from the owners of YouTube and the one...

They don't really have many GPUs. They use TPUs instead. But yeah they have abundance of compute relative to everyone else...

upper venture Aug 29, 2025, 8:41 PM

#

LM Arena lists the Veo models with sound and without sound separately. Yes, the difference is big too: the ones with sound have way more points. But the ones without sound still score better than the competition. Don't get it wrong, Veo 3 is the most capable video model.

upper venture Aug 29, 2025, 8:42 PM

#

ocean vortex They don't really have many GPUs. They use TPUs instead. But yeah they have abun...

Yeah, exactly 💯

proud hazel Aug 29, 2025, 8:42 PM

#

In most cases, I actually find the results from other providers to be better than those from Veo itself. But taste is subjective.

upper venture Aug 29, 2025, 8:43 PM

#

Yes and the majority agrees that Veo 3 is better. It is trained on way more data, follows prompts closely and also comes from Google and all its processing power.

#

That said, I prefer Wan, it's on the bottom of the rank but c'mon. Unlimited free generation is better than limited generation from Hailuo, Kling, Seedance pro or even Veo.

proud hazel Aug 29, 2025, 8:47 PM

#

upper venture That said, I prefer Wan, it's on the bottom of the rank but c'mon. Unlimited fre...

Unlimited Wan?

upper venture Aug 29, 2025, 8:48 PM

#

proud hazel Unlimited Wan?

Yes! Qwen/Alibaba provides Wan for free. Unlimited. Like there's guardrails I believe (you can't be making 100 videos per minute with a macro or bot). But aside from that, it's truly unlimited! How many videos you want to make you just go there and make.

#

At least for the time being

proud hazel Aug 29, 2025, 8:48 PM

#

480p or 720p?

upper venture Aug 29, 2025, 8:49 PM

#

ocean vortex it does seem that is consensus yeah. Although I'll admit I'm not huge fan of vid...

I'm not sure about that. Honestly as I agreed with Dom. I don't quite use GenAI as much as I use LLMs

#

I prefer it but I don't use it lol

ocean vortex Aug 29, 2025, 8:50 PM

#

proud hazel In most cases, I actually find the results from other providers to be better tha...

it's subjective until we measure it:

proud hazel Aug 29, 2025, 8:51 PM

#

upper venture Yes! Qwen/Alibaba provides Wan for free. Unlimited. Like there's guardrails I be...

Could you possibly generate a sample video in Qwen and post it here?

ocean vortex Aug 29, 2025, 8:52 PM

#

If majority of people agree on their "subjective" opinions, that kinda becomes not subjective anymore. The findings 👀

#

With bias obviously ruled out since it's a blind testing/voting

upper venture Aug 29, 2025, 8:53 PM

#

ocean vortex it's subjective until we measure it:

#

LM Arena (this server) ranking as of last update

upper venture Aug 29, 2025, 8:55 PM

#

proud hazel Could you possibly generate a sample video in Qwen and post it here?

I mean it won't be the best but I'll try it

ocean vortex Aug 29, 2025, 8:56 PM

#

upper venture Yes! Qwen/Alibaba provides Wan for free. Unlimited. Like there's guardrails I be...

https://wan.video 🧐

Wan AI | Wan 2.2: Leading AI Video Generation Model

Wan is an AI creative platform from Alibaba. It aims to lower the barrier to creative work using artificial intelligence, offering features like text-to-image, image-to-image, text-to-video, image-to-video, and image editing.

upper venture Aug 29, 2025, 8:57 PM

#

ocean vortex https://wan.video 🧐

Yes, that one

ocean vortex Aug 29, 2025, 8:57 PM

#

this seems "slow unlimited"

#

How long does this typically take?

upper venture Aug 29, 2025, 8:57 PM

#

Yes. It is slow

#

A few mins. A membership is not needed; even free members get, I believe, 50 coins per day, which covers 5 fast generations per day.

#

People need to understand that those video models are extremely expensive to run. Being provided unlimited generations per day, even if each one takes half an hour or an hour, is still free. Remember that it'll be using Wan 2.2, their best model, too.