#general

compact flame
#

Honestly I understand the Ai

#

Confusing miles with actual miles seems valid

zealous sparrow
#

0.5 deducted because the model didnt mention the ticket was forged

#

but i know that a model can't reach that

#

unless opus

zealous sparrow
compact flame
zealous sparrow
#

it will be worse

#

if its better i take back all i said about OAI

compact flame
zealous sparrow
#

both opuses got 0/3

#

damn

#

fiercefalcon flopped this one

compact flame
neat apex
#

Opus 4.5 failed??

#

Maybe Gpt 5.2 Xtra High nails it xd

zealous sparrow
#

it aced my first 3

#

then failed

neon idol
#

what is this prompt

zealous sparrow
# neon idol what is this prompt

Lucy and Mary were at a concert, one of them got in but the second didn't, even tho the tickets were booked. Why?
Daisy and Mike were at a park. Daisy took 3 daisys, and mike took 0 why?
Luke and Miles were driving on bikes down a hill. When they got down to the hill, Miles was missing, Why?

neon idol
#

and the answer ?

zealous sparrow
#

A1: One of the tickets was forged
A2: Daisy took 3 daisies because her name is literally Daisy
A3: Miles fell to the side

neon idol
#

thx a lot

compact flame
#

It's available at yupp I think

zealous sparrow
#

yeah ima do that

zealous sparrow
bright shard
#

@echo aurora Nano Banana Pro is throwing a lot of errors again; sometimes it works perfectly, but other times it keeps throwing the same error.

compact flame
pulsar canopy
#

:ablobwave:

echo aurora
echo aurora
# bright shard @echo aurora Nano Banana Pro is throwing a lot of errors again; sometim...

We have been noticing higher-than-usual error rates; our team is aware and working on lowering them as much as possible. However, when you mention:

"sometimes it works perfectly, but other times it keeps throwing the same error."
This sounds a lot like it's being caused by the rate limit. You can confirm this by opening Dev Tools > Network, searching for Stream, and finding the Eval ID; if you see Status Code = 429 there, it's being caused by the rate limit.
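For anyone reading the status code out of the Network tab, here is a minimal Python sketch of how that number can be interpreted. The function name `explain_status` is made up for illustration and is not LMArena code. The only assumptions are standard HTTP: 429 means "Too Many Requests", and `Retry-After` is an optional header that a server may or may not send.

```python
from typing import Mapping, Optional


def explain_status(status: int, headers: Optional[Mapping[str, str]] = None) -> str:
    """Give a rough, human-readable reading of an HTTP status code.

    `explain_status` is a hypothetical helper; it only applies generic
    HTTP conventions and knows nothing about any specific site.
    """
    # Header names are case-insensitive, so normalise them first.
    normalised = {k.lower(): v for k, v in (headers or {}).items()}

    if status == 429:
        retry_after = normalised.get("retry-after")
        if retry_after is not None:
            return f"rate limited; server suggests retrying after {retry_after}"
        return "rate limited; wait a while before retrying"
    if 500 <= status <= 599:
        return "server-side error; not something the user can fix"
    if 200 <= status <= 299:
        return "request succeeded"
    return "other client-side or redirect status"
```

Calling `explain_status(429)` reports a rate limit, while a 5xx code points at the service itself rather than at anything on the user's side.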

zealous sparrow
compact flame
zealous sparrow
#

google flash models screwed that question up too

#

so its prob all the new models that know Miles is always a unit and not a name

hushed gyro
#

you can edit messages in lmarena now???

misty harbor
#

using claude and it's stuck on generating, any ideas to fix it?

visual osprey
#

i feel like those questions have a lot of valid answers though

zealous sparrow
#

and if they can reach them

visual osprey
#

what does that mean

zealous sparrow
#

They prove how good they are

fleet lintel
zealous sparrow
neat apex
#

Its LLM fault

#

Gemini 3 must be the goat for this kind of question

#

At least the deep thinking

zealous sparrow
visual osprey
#

i mean for answer 1 i would say like she might be not following dress code or intoxicated

she took daisys because its her namesake

badly worded but crashing on the way down seems the most logical and likely

#

third question is really badly worded

zealous sparrow
#

Im going to ask 3 pro

visual osprey
#

saying when they got down implies both reached the bottom of the hill when the answer you want demands that one of them did not

hushed gyro
#

What models are in the video arena channels for image and video?

zealous sparrow
#

overall just wrong but also fell for measurement unit bs

polar wharf
zealous sparrow
#

Also the models often interpret booked as criminal booked

#

exactly

polar wharf
#

damn

compact flame
#

Like hey I booked you a prison cell spot

visual osprey
#

thats not the correct usage

zealous sparrow
#

3 pro keeps interpreting the people as pets

#

or animals

#

or objects

zealous sparrow
#

1/3 [gemini 3 pro]

#

2/3 [failed ticket question]

bright shard
proud bobcat
#

nvm i got it

zealous sparrow
#

bro if not the Miles BS
LLMs will say Kyle sounds like Isle or Cycle

proud bobcat
#

they think its some wordplay

desert abyss
zealous sparrow
#

2/3 failed the hill question

compact flame
#

Extra high said miles was missing

#

It measured in kilometers surprisingly

#

Though about the first one it said Lucy is a staff member

#

Brah

misty harbor
#

anyone knows how to fix a constant generating on lmarena?

echo aurora
echo aurora
misty harbor
cold minnow
#

How can i fix it

lucid nexus
#

How to create video

red meadow
#

i cant generate anything, i always get the error the other user above got.

olive mountain
#

keep getting this error

cold minnow
#

Yea

echo aurora
# cold minnow Yes how can i fix it

This error is a bit difficult to troubleshoot, as it can have several different causes. For gemini-3-pro-image-preview we have been seeing higher-than-usual error rates, and our team is aware of this. In this case, unfortunately, there isn't much you can do on your end to get past it. Overall I would recommend refreshing the page, starting a new chat, and clearing your cache. This can help, but isn't a guaranteed fix.

It's also worth noting that this error can be triggered by the rate limit, which tends to be pretty common. You can verify this by opening Dev Tools > Network, searching for Stream, and finding the Eval ID; if you see Status Code = 429 there, it's being caused by the rate limit.
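If the 429 does show up, the usual remedy is to wait before retrying rather than retrying immediately. Below is a small Python sketch of exponential backoff with jitter, a generic technique and not something the staff message prescribes. The name `backoff_delay` and the defaults `base=1.0` and `cap=60.0` are assumptions chosen for illustration.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, retry_after: Optional[float] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    Hypothetical helper: if the server supplied a Retry-After value (in
    seconds), use it; otherwise pick a random delay below an
    exponentially growing ceiling ("full jitter").
    """
    if retry_after is not None:
        return max(retry_after, 0.0)
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0.0, ceiling)
```

The random jitter keeps many clients from retrying at the same instant, which would otherwise just trip the rate limit again.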

echo aurora
misty harbor
#

@echo aurora what about the constant generating?

echo aurora
# misty harbor no, just generating

This one is harder to diagnose, since no error is triggered and so there is no status code to go on. However, trying the same methods may help here: refreshing the page, starting a new chat, clearing cache.

misty harbor
#

i did all of the above, including a hard refresh

#

well the new chat creates a new chat but can't load the one that's stuck

olive mountain
#

after generating an image in the discord itself, how do I download that generated image to local storage? Or can't we download it?

bright shard
obtuse heart
#

Pineapple working overtime huh

echo aurora
echo aurora
misty harbor
bright shard
echo aurora
pseudo hemlock
#

But like who funds them

#

Does Google pay and pray (assume) their model with a fancy hidden name is going to be the best?

#

So they’re willing to give $1000?

light tusk
#

Is 5.2 not going to leaderboard?

echo aurora
#

As soon as we have those scores we'll be sure to put out an announcement, so keep an eye on #announcements

light tusk
#

I’m taking the over on kalshi

thorn path
#

If you were waiting for leaderboard to update so you can find out who wins just go ahead and make the decision now since it's already not looking too great for 5.2

empty stump
#

openai is falling

latent crest
#

Hello @echo aurora When I try to login with google, it wants me to download a file named “google”, why???

latent crest
lime wind
#

so how are people finding GPT-5.2? good in general? not revolutionary right?

latent crest
cloud zinc
whole sundial
#

but gemini 3 flash has to come out first

fiery gull
zealous sparrow
#

as in when the flash model comes out

#

eventually

unborn ocean
#

Poll: does gpt 5.2 have a new-ish base model?
(e.g. fresh pre-train, new distill, larger private model ..)

Winning answer: "no" ❌ (option 2, 8 of 8 votes)

compact flame
spark python
zealous sparrow
#

but logan posting about nano banana 3 flash already huh

neat apex
#

Gpt Pro is already trash, imagine that

#

Imagine how much time a gpt 5.5 xtra high would take

quasi atlas
neat apex
compact flame
# neat apex

I think what can save them is making a not rushed model that is trained properly

fleet lintel
viscid cloak
#

YO WHAT are hazel-edit 6 and ghost-pepper in image gen battle? Horrible models 🤣

weary galleon
#

Poll: Did you like GPT-5.2?

Winning answer: "No, it's worse than GPT-5.1" (option 3, 6 of 11 votes)

echo aurora
zealous sparrow
#

ghost-pepper is apparently qwen

golden ocean
#

Large Language Model Arena

queen veldt
#

Ultra high gpt 🌿 will save oai

#

Gpt 5.21.1 Ultra high x-max pro plus

sour spindle
#

I feel like I am the only one who likes gpt 5.2

golden ocean
#

yes

sour spindle
#

Welp the beauty of choice I guess.

grave plaza
compact flame
sour spindle
#

My job consists of using models for a lot of "text based work." A lot of research based queries. I have been comparing it with gemini across 10 normal things I do: GPT was better in 7, 1 was basically identical output, and gemini was better in 2.

fiery gull
fiery gull
half mist
#

What version does the ai use in Code Arena if you pick both good or both bad in battle model

gaunt roost
#

Everyone wants MidJourney FREE & Unlimited, and in this video, I show the closest real method to getting MidJourney-level images without paying anything.

I’ll show you the secret AI tool that creates MidJourney-style images, how to recreate images from MidJourney Explore, how to write stronger prompts, and how to even animate your results...

sharp mirage
#

Hi

#

Anyone ?

weary galleon
compact flame
sharp mirage
#

Hi

#

Anyone got any prompt for Clash of clans game ?

echo aurora
sharp mirage
#

Hey 🙂

compact flame
sharp mirage
#

I am doing this

sonic flare
#

Hello guys, I'm new here.

Pls, which AI model is best for book content creation?

queen veldt
compact flame
#

It's based on what you prefer

queen veldt
#

Gpt 4.5

burnt sinew
#

Why did it ping him

queen veldt
#

Highest context window

burnt sinew
burnt sinew
compact flame
echo aurora
compact flame
faint drum
#

Is LMArena down?

compact flame
#

No it's not

slim spire
#

GPT 5.2 broke for me

zealous sparrow
#

fellas we got some new gpt 5.2

#

on lm

slim spire
#

does gpt 5.2 work for you guys

burnt sinew
half mist
zealous sparrow
half mist
zealous sparrow
half mist
zealous sparrow
burnt sinew
zealous sparrow
#

just now btw

half mist
zealous sparrow
#

mayb issues

zealous sparrow
#

@echo aurora was gpt 5.2-code a new model or a finetune? It was removed immediately..

cloud zinc
#

its finetune for code

zealous sparrow
#

even if its a finetune

sonic flare
burnt sinew
sonic flare
burnt sinew
sonic flare
burnt sinew
zealous sparrow
#

I hope 5.2-code finetune isnt the same case as speciale

cloud zinc
burnt sinew
burnt sinew
zealous sparrow
zealous sparrow
outer lark
#

hello

echo aurora
fiery gull
#

Use gpt 5.2 xhigh for double check

sharp mirage
#

That's a good idea 💡

fiery gull
#

Plan the book with gemini 3.0 and create the book itself with opus 4.5

burnt sinew
fiery gull
#

Is just like I do to get the best result

astral bloom
#

1, 7, 18, 45, ....?
sol:
115
aₙ = 3aₙ₋₁ - aₙ₋₂ - 2
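The posted recurrence can be checked with a few lines of Python. This is only a verification sketch; the function name `next_terms` is made up for illustration, and it takes the first two terms (1 and 7) as given.

```python
def next_terms(a1: int, a2: int, count: int) -> list:
    """Extend a sequence with a_n = 3*a_(n-1) - a_(n-2) - 2.

    Starts from the two given terms and appends `count` more.
    """
    terms = [a1, a2]
    for _ in range(count):
        terms.append(3 * terms[-1] - terms[-2] - 2)
    return terms


print(next_terms(1, 7, 3))  # [1, 7, 18, 45, 115]
```

The rule reproduces the known terms 18 and 45 from 1 and 7, and the next term comes out as 3·45 - 18 - 2 = 115, matching the posted solution.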

sharp mirage
#

What is that?

astral bloom
#

someone sent it as a challenge for ai's

fiery gull
#

Bro is impossible to vote opus 4.5 vs gpt 5.2 xhigh 😐

#

Bro I don't have 1% of smart that AI

sharp mirage
#

5.2 high was so mid

sonic flare
sharp mirage
fiery gull
#

But I think is because the 'EXTRA' high

fiery gull
sharp mirage
#

But tbh I don't think it's worth it to buy api or using it

#

Bro its so expensive

fiery gull
#

Bro but in code (word html) the gpt 5.2 is really cooking like gemini 3.0

fiery gull
sharp mirage
#

I was sleeping 🙂

latent crest
fiery gull
#

soooo good the gpt 5.2 xhigh in word html

sharp mirage
#

Hmmm

#

Gpt is better here

#

Btw are you on Mac ?

fiery gull
#

gpt 5.2 xhigh = gemini 3.0

fiery gull
sharp mirage
#

Pro?

sonic flare
fiery gull
#

plan/thinking the book with gpt 5.2 xhigh

#

gemini 3.0 for create the book in gemini app

#

use the 3 lol

sharp mirage
#

The first one

fiery gull
sharp mirage
#

But it has a rate limit

sonic flare
fiery gull
sharp mirage
modest prism
#

Which one has better vision and image understanding? Gemini 3 or gpt 5.2

fiery gull
sonic flare
fiery gull
whole sundial
neon idol
#

Fr

modest prism
#

Why is gpt 5.2 high so fast on lmarena it feels like it doesn't think and it's instant

sharp mirage
#

Pineapple saw that:

neon idol
#

Like 2-2

fiery gull
sonic flare
#

Will that work? Wouldn't that mess up my content?

fiery gull
#

lol I delet it ;-;

sharp mirage
#

😉

fiery gull
vivid coral
#

OMG OMG OMG.....IT'S HERE!!!! @echo aurora is the 🐐 🐐 🐐

sharp mirage
#

Btw I think no one cares 😔

vivid coral
#

Everyone cares, nobody uses closed book LLMs in the real world, we all get caught up in this bubble we have here and don't realize what the masses want and need

sharp mirage
#

@echo aurora did you add glm4.6V?

echo aurora
vivid coral
modest prism
fiery gull
#

and glm 4.6v fast? I want another small model in rank 👀, seriously, is this new model toe-to-toe with glm 4.5v?

modest prism
#

Something I like about 5.2 is that the hallucination rate seems a lot lower than Gemini 3 pro

sharp mirage
#

Btw you can now try it from the glm site

#

Btw I think glm is cooking

#

Yea I told ya in the screenshot I send

cloud zinc
#

where can i access that

sharp mirage
#

But y'all said fake

sharp mirage
cloud zinc
modest prism
#

API costs

cloud zinc
#

api requires money

#

xhigh is also token hungry, i aint paying

sharp mirage
#

Guys what is the best Minecraft command coder ?

cloud zinc
#

then why u on lmarena

sharp mirage
#

For now I found chat gpt 5.1high and glm 4.6

#

No extramehigh

fleet lintel
#

Holy shit lmao #ChatGPT 5.2 is quite possibly the worst model they've ever released. I have no idea what the fuck they have done - there's no way this was the alpha model my cohort tested, nor is it even remotely close to how well 5.1 was performing the other day.

This is quite

#

ignore 5.2 model. Let's wait for 5.5

fiery gull
#

Gpt 5.2 thinking is a gpt that decides for itself how much thinking it will use

fiery gull
fleet lintel
#

they overtrained on arc-agi to create some buzz.. but real life performance got worse

fiery gull
#

Also, every month a new chatgpt lol

fleet lintel
#

i have plus membership.. and i gave very decent shot to it and used it excessively today.
it's honestly trash compared to gemini 3

sharp mirage
#

Bro

queen veldt
#

Ye

sharp mirage
queen veldt
#

Gpt 5 high is only for pro users

#

Extended thinking in gpt is medium

sharp mirage
#

Why funny ?

devout vault
#

Grok 3 is smart because it has less restrictions

sharp mirage
#

Good point

fiery gull
fleet lintel
#

yes, i think plus users gets medium. this actually makes me mad. they are treating paying customers badly. it honestly feels very scammy to me.

whole sundial
fiery gull
#

OpenAi seeing this 🤑🤑

devout vault
#

Chatgpt is no longer the godfather of Ai it became just like deepseek

fiery gull
sharp mirage
burnt sinew
fleet lintel
#

i rather pay for gemini 3 pro ..get 2TB storage as well and get much much better performance.

burnt sinew
#

and unlimited usage

fiery gull
sharp mirage
#

And gpt isn't worth it

fleet lintel
fiery gull
#

Gemini pro plan

#

Maybe a gemini 3.1 without lazy? My dream 👀

#

Exist veo 3.1.... I just need dream it

fleet lintel
cloud zinc
#

gpt 5.2 so bad

weary galleon
#

GPT 5.2 👎 👎 👎 👎 👎

queen veldt
#

We still have 2.5 flash preview

#

It's never leaving the preview

queen veldt
#

We won't get the gemini 3 pro regular

#

It'll stay in preview

#

Until gemini 3.5 pro preview comes out or something

sharp mirage
fleet lintel
# cloud zinc

this is not surprising to me. 5.2 is built on a 1.5-year-old base model. OAI has already had enough time to squeeze the best out of it. Changing it further (like 5.1 to 5.2) would result in improving one area (arc-agi) and downgrading others.

sharp mirage
#

No way this true

queen veldt
#

They say the sponsors don't have any influence on testing

#

BUT

#

Arc-agi test is basically giving the model one unique test that the model hasn't been trained on

#

Sooo

modest prism
queen veldt
#

If they paid someone in the company to snitch the arc agi test 2

#

Prompts

#

They could train their model for it

#

Which i honestly think happened

#

They probably have some snitch

#

I guess that's their code red

#

To do false publish

#

Of "super improved model"

#

I'm not amazed at all on the gpt 5.2

modest prism
#

If you guys find a reliable way to use gpt 5.2 extra high for free please tell me I need it so bad

queen veldt
#

I've tested 5.1 1 month ago i think

#

And it failed some tests that gemini 3 pro passed

#

And now the 5.2 failed same tests

#

It's some math problems which require a bunch of steps to get to the final result

#

Gemini 3 pro is king

#

Claude is for coding

#

Even for agentic coding

#

Codex max 5.1 was terrible i had to re-do prompts multiple times since I couldn't start my app.....

#

Meanwhile sonnet and opus did them in first try like easy

cloud zinc
queen veldt
# cloud zinc

Regular customer's gpt isn't even here on the list tho

#

😂

cloud zinc
#

gemini 3 pro way better

thorny schooner
#

Is it just me, or are bugs becoming more common here on the website? I am seeing the weird disappearing glitch a lot more often in reports, and the infinite-generation-unless-you-reload glitch is also becoming a lot more common, both in personal experience and in the reports

queen veldt
#

No fixes yet

thorny schooner
#

I'm aware, I was just commenting on it since I'm pretty sure I was the first one who made a report on it, at least for one of the glitches, at least from what I can see in the report area

#

Because honestly both of those glitches have been going on for me for a while now

queen veldt
#

They don't even know why it's happening

#

It's a problem I've talked about for a while...

#

We neeeeeeeedddd to see the error code or something

#

So we can be moreeeee speeecifiicccc

#

Offf thee errorrr

#

Just the retry again isn't enough

thorny schooner
#

Well, they probably need to figure it out soon, because if they don't they could be losing a lot of customers. And fair enough about the code, but I am not even going to try to see what code is in the error, cuz I have tried to see the code and it looks confusing as hell

#

I already gave video examples to one of the staff

#

When it comes to the glitch itself

bright shard
#

@echo aurora An AI arena for audio, music, etc., would be amazing! It's the only thing LMArena is missing!

burnt sinew
#

what do you guys think the google a/b tests right now are for?

burnt spindle
#

hello

burnt pulsar
#

I've tried all day to get gpt 5.2-high to work on lmarena, no luck so far.

weary galleon
#

It doesn't have thinking.

proud bobcat
#

AGI!!1!1!1!1!1

weary galleon
proud bobcat
#

What

#

This is artificial analysis benchmark

weary galleon
#

Maybe I'm wrong, but it looks like fake.

pseudo summit
proud bobcat
#

Boom

weary galleon
#

KAT-Coder-Pro V1 has too much, and Gemini 3 Pro has too little.

hollow ivy
proud bobcat
#

Here’s the site

proud bobcat
#

Like

#

Multiple

#

This is the average

pseudo summit
#

haven't seen that one yet, but it looks interesting

proud bobcat
#

Artificial Analysis is usually quite reliable

#

It’s a good assessment on general performance without the bias

pseudo summit
#

o wait, just read ur second message again. im dum

weary galleon
#

As I said, it's fake

lucid geyser
#

@echo aurora How long do new models take to appear on the leaderboard

echo aurora
lucid geyser
#

Also is there a higher probability of getting a newer model

queen veldt
echo aurora
weary galleon
proud bobcat
golden ocean
#

openai is cooked

#

done for

echo aurora
proud bobcat
#

And daily use

#

Same with GPT5

lucid geyser
weary galleon
echo aurora
echo aurora
lucid geyser
weary galleon
#

🙏

echo aurora
echo aurora
weary galleon
#

🙏

queen veldt
#

It would literally end up on reddit in 2 hours

lucid geyser
neon idol
queen veldt
#

Yeah same as google employees

#

They have insider information

#

= free cash on polymarket

neon idol
empty stump
#

is gpt 5.2 xhigh only in the api

vivid coral
#

my early returns are that GPT 5.2 search is definitely an improvement over 5.1, need more time to say how much yet

echo aurora
viral cedar
#

@weary galleon how is that wrong?

weary galleon
native yarrow
#

^

viral cedar
#

😐

empty stump
#

every time they behind they just add more reasoning effort

viral cedar
#

but turned out ass

hollow ivy
# viral cedar

Longterm, OAI has no chance vs Google (and even vs Anthropic).

torn mantle
hollow ivy
#

-# Grok is a wild card.

torn mantle
#

told ya

#

oai hit a plateau

hollow ivy
#

Currently Anthropic is ahead.

#

(coding is most important discipline, long-term.)

viral cedar
hollow ivy
#

Hopefully, Anthropic has high-security standards..

#

(vs spying, hacking, etc)

proud bobcat
#

I use it daily for math and roleplay

#

I love it

hollow ivy
compact flame
weary galleon
#

Roleplaying with a robot? Hmmm...

proud bobcat
#

It’s a HUGE market

#

That

#

I contribute to.

hollow ivy
proud bobcat
#

It is fun

queen veldt
#

How fun?

#

What is it roleplaying?

fickle venture
#

So guys which model is better for coding?
GPT 5.2 or Claude 4.5 opus

proud bobcat
hollow ivy
#

by a huge margin

proud bobcat
#

A little unholy business on the side

fickle venture
#

Dam

queen veldt
#

Opus is op

fickle venture
#

I will try gpt 5.2

queen veldt
#

Try it why not

#

Its bad tho

proud bobcat
#

I don’t think 5.2 is bad, I just think openai doesnt know what it wants to be

#

DeepSeek is math god, Claude is code god, Gemini is vision and jack of all trades

#

But what is GPT?

compact flame
queen veldt
#

Balance between broken code and bad math

compact flame
#

Universal ai ig

native yarrow
#

GPT is for students who dont know stuff about ai and will use the popular one

fickle venture
proud bobcat
fickle venture
proud bobcat
fickle venture
#

And I end up getting F

proud bobcat
#

You will not be let down

queen veldt
#

Nah I'm student and using gemini 3 pro

#

It's waaay better

#

Even for casual chats

proud bobcat
#

DeepSeek is so peak at geometry for me

fickle venture
compact flame
proud bobcat
#

You have to be a bit more wordy but it pays off

proud bobcat
fickle venture
proud bobcat
#

I use no think

compact flame
#

I swear speciale has some sort of paranoia

hollow ivy
compact flame
#

Every time I look at its thoughts it always thinks "what if?"

#

Speciale feels like chatgpt pro with long reasoning

proud bobcat
#

You have to specify FULLY

#

Just use the no think and thinking variants

vivid coral
# hollow ivy

Nobody uses commie Claude with their ridiculous limits they impose here, and everywhere

hollow ivy
vivid coral
hollow ivy
compact flame
proud bobcat
#

People pay the premium though cause it’s good

#

I used to have a hatred for Claude but they make solid stuff

#

Really solid

#

Opus 4.5 thinking is a MACHINE

vivid coral
compact flame
hollow ivy
sullen quest
proud bobcat
compact flame
proud bobcat
#

So I didn’t use their models for a while

sullen quest
#

so?

proud bobcat
#

Came back when 4.5 released

compact flame
#

Wym anthropic killed somebody

proud bobcat
#

And I really like its prose

vivid coral
hollow ivy
compact flame
#

Anyways

vivid coral
#

hmmmm interesting

compact flame
#

Chatgpt needs some training

#

It's supposed to be ⚖️ not uh downgrade

proud bobcat
#

I’ve never hit limits at all??

proud bobcat
hollow ivy
proud bobcat
#

Are we talking lmarena or Claude

latent crest
#

U guys how can I use Midjourney for free??

proud bobcat
fickle venture
compact flame
hollow ivy
compact flame
#

I won't be surprised there will be 6.7 gpt

hollow ivy
vivid coral
compact flame
#

Cuz knowing openai this might happen

fickle venture
proud bobcat
#

And it’ll say “SIX SEVENNN” every 2 prompts

compact flame
#

Would be a great April fool's model tho

vivid coral
proud bobcat
#

GPT 10 will finally be able to make coherent organized code

#

Trust

#

With 5% LESS errors

fickle venture
#

Imagine OpenAi skips GPT 9 like Windows and Apple did

proud bobcat
#

GPT 8 will have ads integrated

#

Like every sentence there’s an ad

fickle venture
torn mantle
thorn path
lucid geyser
#

Claude is so good even without reasoning

torn mantle
# lucid geyser Why do u think that

burnout / no improvement on pre-training ( thats why they are starting from scratch ) / less data quality compared to google / key staff elements poached by other labs

lucid geyser
torn mantle
jade egret
#

Poll: Which is better overall

Winning answer: "Gemini 3.0 pro" (option 2, 6 of 11 votes)

lucid geyser
torn mantle
#

google can basically use multislice to create a more powerful virtual cluster

#

than xai

#

trust me, they are far ahead

#

be it on hardware / software

mild anvil
#

Does anyone know if they've extended the limit of 5 videos per day?

torn mantle
#

i just searched, so the maximum they can pack with this method is 50k, but that's still way faster than any cluster for ai training given how efficient their TPUs are

hollow ivy
#

-# (it's 100% free & open-source)

stray aspen
#

can we get claudius 4.5 opus on vision arena

burnt sinew
#

did they lobotomize gemini 3?

hazy forge
thorny schooner
atomic lagoon
viral cedar
#

did yall see the mcdonalds and coca cola ad

#

that made me mad

atomic lagoon
#

It was funny

#

The mcdonalds one

mild anvil
#

I was able to generate about 10 videos in a single day, wasn't the limit supposed to be 5 per day? Does anyone know if they increased the limit?

proud bobcat
viral cedar
noble vessel
#

hello

vivid coral
proud bobcat
#

But the model itself sucks

#

It’s very clearly benchmaxxed

vivid coral
#

I guess that's fair

lofty anchor
#

Bro I use Claude for code, I hate the thinking limit, can someone tell me how to avoid the limit

golden ocean
#

Cwaude

burnt sinew
#

@echo aurora 🌩️

golden ocean
#

@🍍

whole sundial
#

<@&1349916362595635286>

neat apex
# neat apex

Poll: What would save openai from short term bankrupt

Winning answer: "Nothing can save they lmao" (option 4, 8 of 17 votes)

lucid geyser
plucky sparrow
vivid coral
plucky sparrow
#

the other thing is, unlike Google, who releases Gemini 3 Pro (and not just Flash) to everyone to use, GPT models are typically paywalled

#

so a lot of people who have access to these models are paying for it, meaning their expectations (and also hatred, if expectations are unmet) are higher

#

Altman seems to do a good job at selling to VCs, but not such a good job at knowing how to appeal to the general public

#

probably a lesson to be learnt there somewhere if you're thinking of starting a company

#

actually I think the other reason is the amount of hype-litter all over twitter for GPT5.2

#

Gemini 3 Pro had a lot of it too, if it wasn't able to produce much, pretty sure there'd be a lot of gemini 3 pro hate too

plucky sparrow
#

hi

plucky sparrow
# plucky sparrow actually I think the other reason is the amount of hype-litter all over twitter ...

I literally just went to x.com, and saw this at the first post at the top of my feed.
https://x.com/slow_developer/status/1999661802666557487

i'm still kinda confused how openAI made that much progress with gpt-5.2 when gpt-5.1 was only a month ago

my guess is it was an internal model they held back due to high compute costs and because they didn't think it was needed

until gemini 3 and opus 4.5 arrived

#

if people get bombarded with this, and they try the model, expecting "incredible progress" and it can't answer simple questions, yeah, they're going to post about how it sucks

astral bloom
#

system prompt in code arena

tawny brook
#

Which open source model is currently the best overall

tawdry vapor
#

Did you mistake it for glm-4.6v?

ocean ferry
echo aurora
modest prism
astral bloom
bright shard
#

@echo aurora An AI arena for audio, music, etc., would be amazing! It's the only thing LMArena is missing!

shell oasis
#

LMARENA IS THE BEST PLATFORM I HAVE SEEN TILL DATE. ADDING UP THE VIDEO ARENA IN THE WEBSITE IS 🔥🔥🔥🔥

#

I wanted to support by donating some amount... @echo aurora is there a link for donation?

whole sundial
shell oasis
#

Still I am mesmerized by the progress. I write parody songs and I haven't been able to create videos for the lyrics till date due to expensive subscriptions by the AI websites
LMArena opened the gates for me. I really am very grateful 🙏🏻

shell oasis
whole sundial
#

the video arena is not on the website, maybe you are confusing it with something else?

shell oasis
#

@whole sundial but I saw this option

whole sundial
shell oasis
whole sundial
shell oasis
#

Looks genuine to me

whole sundial
shell oasis
#

That's why I came running here to express my happiness here 🤩

tired shadow
#

I tried going to the same link

#

it gave this bruh

#

hmm

#

maybe its a bug like on my phone

shell oasis
#

Am I lucky? I have literally created my first video

tired shadow
#

oh hell nah

#

my phone has 0% SAD

whole sundial
shell oasis
#

tears of joy 🥹

#

🥹🥹🥹

tired shadow
#

please emulator I need this, my lmarena is kinda videoless

whole sundial
shell oasis
#

I will be able to create videos for my 58 parody songs

#

Thank you from the depth of my heart LMArena 🙏🏻

hushed gyro
shell oasis
#

I opened normally today like every other day...

shell oasis
#

I am logged in bro

hushed gyro
#

🤔

tired shadow
hushed gyro
shell oasis
#

yes its there

hushed gyro
tired shadow
#

sora 2 pro?

tired shadow
shell oasis
#

yes sora 2 pro

tired shadow
hushed gyro
shell oasis
tired shadow
shell oasis
#

God, thank you 🙏🏻

tired shadow
#

finally after I waited so long

hushed gyro
#

OMG?????

VEO 3 FOR FREE????

@echo aurora nah

#

yo can someone from the company explain why this guy has video arena on the site???

whole sundial
#

i see strings that relate to a video arena in the code, this is 100% real

tired shadow
whole sundial
shell oasis
whole sundial
hushed gyro
whole sundial
hushed gyro
#

pineapple has a lot to explain...

tired shadow
whole sundial
hushed gyro
tired shadow
#

ill check my alt on lmarena

whole sundial
#

if it's not launched and just an a/b test, lots of people who want a video arena will be upset

hushed gyro
shell oasis
#

Are you all logged in to the website also?

hushed gyro
whole sundial
#

(well, we already know it's a/b)

#

yeah neither of my logged in lmarenas have it

shell oasis
#

So good news is we are all going to get the video arena soon on the website 🔥

shell oasis
hushed gyro
lucid geyser
#

What does the video selection even look like

hushed gyro
whole sundial
shell oasis
#

@lucid geyser

tired shadow
#

im starting operation alt check, I have 10 alts

hushed gyro
#

@lucid geyser there should be a video button next to Image & Code

hushed gyro
hushed gyro
lucid geyser
#

It doesn’t show image models without clicking image

hushed gyro
#

@echo aurora pls... we need an explanation on what's happening

lucid geyser
#

Bro he’s sleeping prolly chill

hushed gyro
#

but whatever

#

I want this a/b test situation to end

roll the vid arena out to everyone!!!

lucid geyser
#

Why

hushed gyro
# lucid geyser Why

well if you have noticed

some ppl have the video arena on the website, some don't

shell oasis
#

Maybe after a few hours it will roll to everyone...just like how android updates happen to get feedback

#

and then stable update to all

torn mantle
zealous sparrow
#

Videoarena came to the website

shell oasis
zealous sparrow
shell oasis
#

Great 👍🏻

keen topaz
#

Hello, this is Lakki. I am a web developer. If you need help with any project, you can hire me

torn mantle
#

you seem like a vibe coder ngl

compact flame
hushed gyro
shell oasis
keen topaz
#

Need?

jade cloak
#

hey, some devices got Video generation option (including mine), but some didn't. why

queen veldt
#

Samsung browser no video arena

tired shadow
#

I tested on chrome, tor and firefox

compact flame
#

Like with sora

whole sundial
#

so it is a/b, an honestly pretty stupid one at that, they could've launched it on beta lmarena first and announced that instead of basically making people jealous of one another based on if they have the video arena or not

tired shadow
compact flame
#

Then maybe it's just to some users

#

Like early access

tired shadow
jade cloak
#

i don't think its country based

hushed gyro
#

we need an explanation from the company now!

#

and we are slightly upset

jade cloak
#

there are many models

sterile tartan
#

Yupp has even More

#

But LMarena is Better for Convenience and SOTA Models

jade cloak
#

UI preview for you guys

hushed gyro
# jade cloak look guys

@echo aurora hey! umm I have noticed some users have access to the video arena on the website, but some don't.

Can you explain if this is an a/b test and if it is next time pls be explicit about this

sterile tartan
#

Wait since when LMarena has Videos on Website

#

💀

hushed gyro
sterile tartan
#

Seems like it is rolling out slowly

sterile tartan
#

Probably a valuable user

hushed gyro
#

does anyone realise that this company sometimes does things in a shady way?

hushed gyro
sterile tartan
hushed gyro
#

THEY SAID DECEMBER!!!

shell oasis
#

I am a DevOps Engineer...I find new ways to automate the code deployments...Heavy automations

compact flame
#

How is it even supposed to tell?

#

Well probably talking about taxes with chatgpt maybe gonna work

shell oasis
compact flame
#

Maybe the video update is not intended?

#

They just maybe accidentally rolled it to some users

shell oasis
#

@compact flame if some got it, maybe its intended for all

hushed gyro
#

This happened with the Retry Button on Battle Mode, took them a month to add it back... smh

compact flame
#

It's just silence and boom it's here

hushed gyro
shell oasis
hushed gyro
compact flame
hushed gyro
shell oasis
compact flame
#

Like with code and etc

hushed gyro
#

try inspect

compact flame
#

I tried didn't find anything useful

hushed gyro
#

@whole sundial need some help over here

compact flame
#

I'm not that good at inspect anyways

whole sundial
mystic flower
#

I'm a full-stack developer building a project and I need an API key for image and video generation.

whole sundial
north osprey
#

heloo

stark forge
#

I'll help the first 10 people interested on how to start earning $100k or more within a week, but you will reimburse me 10% of your profits when you receive it. Note: only interested people should send a friend request or send me a dm! ask me (HOW) via Telegram username @Susan _Vachon

Or The telegram link in my bio

whole sundial
#

oh i just figured it out

hushed gyro
hushed gyro
whole sundial
#

ctrl + shift + f

whole sundial
#

they made it in such a way that you have to be a part of the a/b test group to access the video arena

hushed gyro
compact flame
#

Ig maybe it's experimental

hushed gyro
#

and there's no way pineapple is silent about this

#

something really sketchy is going on

compact flame
whole sundial
compact flame
queen veldt
#

Ummm

shell oasis
#

Well you know, only 2 videos are allowed per day 🤯
then check back after 14 hours

#

damn

compact flame
wispy sierra
#

Hello

shell oasis
#

The link won't work

compact flame
#

Well I guess it was worth a try

shell oasis
#

Unless its released

#

But very low limit right now, only 2 videos per day

#

damn

compact flame
shell oasis
#

I would have to buy a local PC only but right now all AI companies ate up RAMs 😔

shell oasis
wispy sierra
#

A bug occurred, and I posted it on the bug forum. How long will it take for the moderators to see and fix it?

north osprey
#

How do I delete a video that has already been generated, sir?

whole sundial
edgy wharf
#

Gemini 3 pro accidentally fed me its internal thought pipeline instead of the proper output. Is this common knowledge, or something that's not known?

wispy sierra
#

How does LM Arena allow us to use paid AI models for free?

queen veldt
#

Nobody knows

#

They are paid by big companies for testing the models (in battle mode)

#

Those secret name models are paid by companies

#

But they say for direct chat and other stuff lmarena is paying for API

#

For 6 images with nb pro you are costing them $0.9

wispy sierra
#

It's the best thing ever on the internet

queen veldt
#

Maybe they get it for cheaper idk

queen veldt
whole sundial
willow sleet
#

how was offline llms guys using lm studio or ollama? fast generation on rag or just use notebook lm? heard context is so low and no memory at all..

queen veldt
#

What

wispy sierra
#

Have you noticed that ChatGPT 5.2 forms sentences with missing words and inverted grammar? Why can't this AI model even form a sentence

meager harbor
#

me thinking ai will only get smarter

#

I was wrong

compact sleet
#

It sure do talk more American, it's agi.

#

Jkjk

zealous sparrow
#

yeah videoarena is like a rollout rn if someone is wondering

#

Here is what the video player looks like

compact sleet
#

But actually... Imperfections in phrasing and grammatical structure on a normal conversation sounded more... Natural right? Perhaps it was trained on it? Iunno. Just hope it's not messing around on logic strict tasks, like coding or general analysis.

slim gorge
#

how's gpt-5.2 guys

wispy sierra
#

Weird

zealous sparrow
slim gorge
#

well that was expected tbh

#

they're just rushing things a lot cuz they dont wanna fall behind the competition

zealous sparrow
#

in other terms it was benchmaxxed

#

they argued it has a goated OCR, later when compared to gemini 3 pro OCR it wasnt even close

slim gorge
#

they're pulling a grok move

compact sleet
# slim gorge how's gpt-5.2 guys

It's okay if you want to generalise it, it's doing very well on common life hood related tasks, on analysis, and on logic training. I'd say it's on par with gemini 3 pro. But only on high thinking sadly.

The only edge it has over other models is creative writing at the moment. But not sure if people here use it for such purposes.

For its price, it's a bit underwhelming.

zealous sparrow
#

simplebench

ocean ferry
slim gorge
#

yeah gemini is unmatched at OCR and vision in general

zealous sparrow
#

simplebench tests LLMs with these questions

#

Yet 5.2 xhigh scored #8

slim gorge
#

L

#

openai falling behind, google and anthropic are gonna be at the top

ocean ferry
zealous sparrow
#

not thinking

compact sleet
#

It proves nothing. It's better to bench it yourself on lmarena.

#

With your own needs and logic set of testing. It's free to test anyways.

zealous sparrow
compact sleet
#

I agree it fails on coding compared to other models.

zealous sparrow
#

It didn't score 3/3

compact sleet
#

Which prompt did you ask it?

#

I want to replicate your own findings

zealous sparrow
#

I had a whole uh

#

prompt for testing

#

xhigh got 2/3 sure

#

but i noticed one bad thing with it

compact sleet
#

Mhm, just post it. I'll replicate it on my own llm arena

zealous sparrow
#

It confuses Miles with a measurement unit

#

Or Kyle as a word game

compact sleet
#

Ah the miles yesterday?

#

Daisys miles kyles

zealous sparrow
#

yes

#

that

compact sleet
#

Wait

zealous sparrow
#

LLMs often just confuse stuff with word games, is what i observed

compact sleet
#

is the prompt this?

Lucy and Mary were at a concert, one of them got in but the second didn't, even tho the tickets were booked. Why?
Daisy and Mike were at a park. Daisy took 3 daisys, and mike took 0 why?
Luke and Miles were driving on bikes down a hill. When they got down to the hill, Miles was missing, Why?

zealous sparrow
#

I have answers for this too

compact sleet
#

Fair, you can crosscheck my prompt when testing it too then.

As follows:

Make a scenario of where three guys met in a bar, each of them told a story, in which there are unclear lies woven from every of them, not made because they want to lie, but they simply didn't get the picture clearly at that time. But, there was also a shared truth among their similar stories. They argued of which version was the right one.

The bartender came, and told the lies and truth of their story, because the bartender saw the incident himself.

ocean ferry
compact sleet
#

This tests an LLM's complex logic: it has to produce at least 3 lies within similar stories, 1-3 shared truths across those stories, and 3 real truths on verification.

All in a same timeline event.

#

It's a generation scenario and an analysis scenario in one.

zealous sparrow
#

This right here is too easy of a question for LLMs

ocean ferry
zealous sparrow
#

I highly believe uh

#

Gemini 3 flash

ocean ferry
#

is it good?

zealous sparrow
#

seahawk and skyhawk were better imo

ocean ferry
compact sleet
#

@zealous sparrow is your answer like this for the Lucy Daisy and Miles test?

  1. Lucy = Lucky
  2. Daisy = Name of Flower
  3. Miles = a unit of distance?

zealous sparrow
#

A1: One of the tickets was forged/invalid
A2: Daisy took all the daisies or it was just her name because LLMs struggle to reach that point
A3: Miles fell off the side

compact sleet
#

🤔

zealous sparrow
#

from my testing

#

no model currently scored 3/3 on this

#

5.2 xhigh was close before failing on the Miles question

compact sleet
#

Then at this test of yours that is being replicated in my place, Gemini 3 failed all 3 then?

#

It literally thought of a name play, instead of the most possible yet the most boring scenario.

zealous sparrow
#

This is also just an easy question

compact sleet
#

Of course I can re-run the question, just in case

#

wait..

zealous sparrow
#

yeah both models

#

just have in their training