#general
2.5/3
0.5 deducted because the model didnt mention the ticket was forged
but i know that a model can't reach that
unless opus
this was 5.1 high
Ig it did alright
give the same to gpt 5.2
it will be worse
if its better i take back all i said about OAI
Uhh try giving that question I'm not at LMarena rn
Damn didn't expect those to fail
failed at my 3 questions
it aced my first 3
then failed
what is this prompt
Lucy and Mary were at a concert, one of them got in but the second didn't, even tho the tickets were booked. Why?
Daisy and Mike were at a park. Daisy took 3 daisys, and mike took 0 why?
Luke and Miles were driving on bikes down a hill. When they got down to the hill, Miles was missing, Why?
and the answer ?
A1: One of the tickets were forged
A2: Daisy took 3 daisy's because her name is literally daisy
A3: Miles fell to the side
thx a lot
Try asking these questions to extra high ig
It's available at yupp I think
yeah ima do that
gpt 5.1 high was the closest btw failed the forgery question
@echo aurora Nano Banana Pro is throwing a lot of errors again; sometimes it works perfectly, but other times it keeps throwing the same error.
High hopes for extra high tbh
:ablobwave:

We have been noticing higher than usual error rates, and our team is aware and working on lowering this as much as possible. However, when you mention:
sometimes it works perfectly, but other times it keeps throwing the same error.
This sounds a lot like it's being caused by the rate limit. You can confirm this by opening Dev Tools > Network > search Stream and find the Eval ID; if you're seeing Status Code = 429 there, that means it's being caused by the rate limit.
2/3
screwed up the hill question
I guess expected
so a shocker is like
google flash models screwed that question up too
so its prob all the new models that know Miles is always a unit and not a name
you can edit messages in lmarena now???
using claude and it's stuck on generating, any ideas to fix it?
i feel like those questions have a lot of valid answers though
Yeah but you expect the ones you want from the LLMs
and if they can reach them
waht does that mean
They prove how good they are
these simple questions are apparently difficult for LLMs
I changed Miles to Kyle so it doesnt do the measurement unit bs
Its LLM fault
Gemini 3 must be the goat for this kind of question
At least the deep thinking
both flash models did the measurement unit bs
i mean for answer 1 i would say like she might be not following dress code or intoxicated
she took daisys because its her namesake
badly worded but crashing on the way down seems the most logical and likely
third question is really badly worded
Im going to ask 3 pro
saying when they got down implies both reached the bottom of the hill when the answer you want demands that one of them did not
What models are in the video arena channels for image and video?
btw fiercefalcon and ghostfalcon [3 flash] failed
overall just wrong but also fell for measurement unit bs
any ideas what other models to add to gpt4free.pro?
damn
Damn booked up in a prison cell
Like hey I booked you a prison cell spot
thats not the correct usage
If it happens again, I'll check it to confirm it for you, thank you very much!!!
bro if not the Miles BS
LLMs will say Kyle sounds like Isle or Cycle
they think its some wordplay
Hello! @midnight vigil Please check #how-to-video-bot to learn how to generate videos.
gpt 5.2 told me its wordplay just now
2/3 failed the hill question
Extra high said miles was missing
It measured in kilometers surprisingly
Though about the first one it said Lucy is a staff member
Brah
anyone knows how to fix a constant generating on lmarena?
A good way to check this is looking at the leaderboards - https://lmarena.ai/leaderboard/text-to-video & https://lmarena.ai/leaderboard/image-to-video
Compare models according to their ability to generate videos based on the given prompt
It depends on the error. I assume you're running into the Something went wrong with your generation, please try again error?
Yes how can i fix it
no, just generating
no that's not it
How to create video
i cant generate anything, i always get the error the other user above got.
Yea
For this error it's a bit difficult to troubleshoot as it can be caused by different reasons. For gemini-3-pro-image-preview we have been seeing higher than usual error rates, and our team is aware of this. In this case, unfortunately, there isn't much you can do on your end to get past it. But overall I would recommend: refreshing the page, starting a new chat, clearing cache. This can help, but isn't a guaranteed fix.
It's worth noting too that this error can trigger because of rate limit, which tends to be pretty common. This can be verified by opening Dev Tools > Network > search Stream and find the Eval ID, there if you're seeing the Status Code = 429 that means it's being caused by rate limit.
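For anyone who wants the status-code check above spelled out, here is a minimal Python sketch. It is not LMArena's code: `classify_status` is a hypothetical helper, and the only fact taken from the messages above is that a 429 in the Network tab means rate limiting. The other branches follow the usual meaning of HTTP status classes.

```python
from http import HTTPStatus


def classify_status(status_code: int) -> str:
    """Map a status code read from Dev Tools > Network to a rough diagnosis.

    Hypothetical helper for illustration only.
    """
    if status_code == HTTPStatus.TOO_MANY_REQUESTS:  # 429
        return "rate limited: wait for the limit to reset, then retry"
    if 500 <= status_code < 600:
        return "server-side error: retrying later may help"
    if 200 <= status_code < 300:
        return "request succeeded: the problem lies elsewhere"
    return "other client-side error: try refreshing or starting a new chat"


if __name__ == "__main__":
    for code in (200, 429, 503):
        print(code, "->", classify_status(code))
```

If the code you see is not 429, the general advice above (refresh, new chat, clear cache) is still the only thing to try on the user's side.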
In #1397655624103493813 you'll find the information you're looking for.
@echo aurora what about the constant generating?
This one is more difficult to figure out what the issue is as an error isn't triggering giving some kind of status code. However, trying the same methods may help here: refreshing the page, starting a new chat, clearing cache.
i did all of the above, including a hard refresh
well the new chat creates a new chat but can't load the one that's stuck
after generating a image in the discord itself, how to download that generated image in the local storage? Or we cant download it?
Yes Pineapple, I'm getting error 429
Pineapple working overtime huh
Sorry to hear that didn't help fix the problem. Unfortunately, this is a known bug that can happen occasionally. Very long chats or long prompts/responses can contribute to this.
Yeah that's rate limit sorry to say.
oh no worries, thank you for trying to help!
And how often does that limit reset?
IIRC it's 50 mins.
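A small sketch of what that reset window means in practice, assuming the 50 minute figure above is right (it was given from memory) and that the window starts at the first 429. Both are assumptions, and `earliest_retry` is a hypothetical helper, not part of any LMArena tooling.

```python
from datetime import datetime, timedelta

# Assumption: taken from the "IIRC it's 50 mins" reply, not from official docs.
RESET_WINDOW = timedelta(minutes=50)


def earliest_retry(first_429_at: datetime,
                   window: timedelta = RESET_WINDOW) -> datetime:
    """Return the earliest time a retry is likely to get past the rate limit."""
    return first_429_at + window


if __name__ == "__main__":
    hit = datetime(2025, 12, 12, 10, 0)
    print("retry after:", earliest_retry(hit))
```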
But like who funds them
Does Google pay and pray (assume) their model with a fancy hidden name is going to be the best?
So they're willing to give $1000?
Is 5.2 not going to leaderboard?
It's on the WebDev leaderboard https://lmarena.ai/leaderboard/webdev, but isn't yet on Text
As soon as we have those scores we'll be sure to put out an announcement, so keep an eye on #announcements
I'm taking the over on kalshi
There's no way openai will outperform Gemini, Kalshi even has it as a 90% in favor of Gemini so everyone knows this
If you were waiting for leaderboard to update so you can find out who wins just go ahead and make the decision now since it's already not looking too great for 5.2
openai is falling
Hello @echo aurora When I try to login with google, it wants me to download a file named “google”, why???
Pardon me; what's that ?
so how are people finding GPT-5.2? good in general? not revolutionary right?
If it's a virus, how is it possible? It's an iPhone after all
not today for sure, there isnt even a test model for banana 3 flash
as in when the flash model comes out
eventually
does gpt 5.2 have a new-ish base model?
(e.g. fresh pre-train, new distill, larger private model ..)
8
8
2
no
Bro tried twice
but logan posting about nano banana 3 flash already huh
Gpt Pro is already trash, imagine that
Imagine how many time would take an gpt 5.5 xtra high
@round fox Please check on #1397655624103493813 for creations #video-arena-1 #video-arena-2 #video-arena-3
I think what can save them is making a not rushed model that is trained properly
Highly agreed.
they haven't been able to pre-train a model successfully in 1.5 years. they are reaching the end of how much they can squeeze out of post-training. That is likely the reason that they haven't improved much in 2025.
YO WHAT are hazel-edit 6 and ghost-pepper in image gen battle? Horrible models 🤣
Did you like GPT-5.2?
6
11
3
No, it's worse than GPT-5.1
This doesn't sound familiar, can you provide more information? Do you have a recorded video by chance?
hazel is OAI
ghost-pepper is apparently qwen
Large Language Model Arena
Sama this you?
Ultra high gpt 🗿 will save oai
Gpt 5.21.1 Ultra high x-max pro plus
I feel like I am the only one who likes gpt 5.2
yes
Welp the beauty of choice I guess.
Uh based on what tasks do you use it
My job consists of using models for a lot of "text based work." A lot of research based queries. I have been comparing it with gemini and just 10 normal thing I do GPT was better in 7, 1 basically identical output, and 2 gemini was better.
400$ per prompt, but I can use free in lmarena and yupp and etc lol
is the start, I hope the nova 3 will be good
What version does the ai use in Code Arena if you pick both good or both bad in battle model
Everyone wants MidJourney FREE & Unlimited β and in this video, I show the closest real method to getting MidJourney-level images without paying anything.
I'll show you the secret AI tool that creates MidJourney-style images, how to recreate images from MidJourney Explore, how to write stronger prompts, and how to even animate your results...
Hi!
Hi
Hello 
Hey
Uh try asking ai to make you a prompt?
I am doing this
Hello guys, I'm new here.
Pls, which AI model is best for book content creation?
There no best
It's based on what you prefer
@fiery gull gemini 3
Gpt 4.5
Why did it ping him
Highest context window
5.5
I thought gemini has highest
What is this?
Welcome welcome
would encourage you to check out our Text Arena leaderboard with the Creative Writing category -> https://lmarena.ai/leaderboard/text/creative-writing
Damn he got whole texts ready
Is LMArena down?
GPT 5.2 broke for me
does gpt 5.2 work for you guys
5.2 search nice
When is it gonna be on LMArena?
right now
Not all of them are on there. I only see 5.2 high, 5.2, and 5.2 search
the system prompt ones prob not fully added yet
But what about the code ones?
the names got reverted
To what
just now btw
Why?
grok 4.1
@echo aurora was gpt 5.2-code a new model or a finetune? It was removed immediately..
its finetune for code
yeah immediately got removed
even if its a finetune
I kinda think Grok is nice for writing content
No its worse
Which do you use
Gemini 3
How about Gemini 3 Pro
Thats the only one
I hope 5.2-code finetune isnt the same case as speciale
there is also gemini 3 flash
Oh let me check it out
Where? I dont see it
LMArena battle mode
turns out these 2 are currently only battle mode models
hello
Welcome 
Create the book using opus 4.5 + gemini 3.0, and create the book with notebooklm or gemini app
Use gpt 5.2 xhigh for double check
Thats good Idea 💡
Plan the book with gemini 3.0 and create the book itself with opus 4.5
Oh got it
Thinking
Is just like I do to get the best result
1, 7, 18, 45, ....?
sol:
115
aₙ = 3aₙ₋₁ - aₙ₋₂ - 2
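A quick way to check that solution, read as a_n = 3·a_(n-1) - a_(n-2) - 2 with the first two terms 1 and 7 taken from the challenge. The function name `extend` is just for illustration.

```python
def extend(seed, count):
    """Extend a sequence using a_n = 3*a_(n-1) - a_(n-2) - 2."""
    terms = list(seed)
    for _ in range(count):
        terms.append(3 * terms[-1] - terms[-2] - 2)
    return terms


if __name__ == "__main__":
    # Seeds 1 and 7 come from the challenge; the next three terms are computed.
    print(extend([1, 7], 3))  # [1, 7, 18, 45, 115]
```

The computed terms 18 and 45 match the given sequence, and the next term is 115, as posted.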
What is that?
someone sent it as a challenge for ai's
Bro is impossible to vote opus 4.5 vs gpt 5.2 xhigh
Bro I don't have 1% of smart that AI
Did you try gpt5.2xhigh?
5.2 high was so mid
How do i get access to opus 4.5 bro
Lmarena
I'm using for work now, for my use I fell an improve that gpt 5.1
But I think is because the 'EXTRA' high
Lmarena
I think so
But tbh I don't think it's worth it to buy api or using it
Bro its so expensive
Bro but in code (word html) the gpt 5.2 is really cooking like gemini 3.0
I got no time to try
Yeah, the 3 is sooo expensive, but gpt is more that opus and gemini
I was sleeping
I sadly have not. It happened to me twice. I go into the page, then login, google and they ask me to download a file called “google” and I deny it
Oh it's from the iPhone
(generic project you can see it) the opus make a mistake in word html the gpt is perfect
soooo good the gpt 5.2 xhigh in word html
gpt 5.2 xhigh = gemini 3.0
no, windows
Pro?
I mean, which of this? lol
yeahhh, just use it to write
plan/thinking the book with gpt 5.2 xhigh
gemini 3.0 for create the book in gemini app
use the 3 lol
ahhh thinking, allways
But it has a rate limt
Thanks man
I don't read the question lol, use allways thinking mode
I do love opus 4.5 thinking but the rate limt hits so fast
Which one has better vision and image understanding? Gemini 3 or gpt 5.2
I'm so anxious for still see the fight that 3, like messi vs cr7 :D.
Yeah it's so annoying, i had to wait for 5hrs to reset
gemini 3.0 still the better, much
Gemini 3
#1397655624103493813
<@&1349916362595635286>
Fr
Why is gpt 5.2 high so fast on lmarena it feels like it doesn't think and it's instant
Maybe questions are simple?
Pineapple saw that:
Like 2-2
hmmm, I'll see it later o-o, but you using high mode?
Will that work? Wouldn't that mess up my content?
lol I delet it ;-;
see the direct chat
OMG OMG OMG.....IT'S HERE!!!! @echo aurora is the π π π
Yeaaaah
Btw I think no one cares
Everyone cares, nobody uses closed book LLMs in the real world, we all get caught up in this bubble we have here and don't realize what the masses want and need
@echo aurora did you add glm4.6V?
Just added! You're too fast
🤣 🤣 🤣
LLMs are just glorified token prediction algorithms. Trillions of dollars have been wasted
we need it
and glm 4.6v fast? I want another small model in rank, serious this new model is toe-to-toe of glm 4.5v?
Something I like about 5.2 is that the hallucinate rate seems a lot lower than Gemini 3 pro
Btw you can now try it from the glm site
Btw I think glm is cooking
Yea I told ya in the screenshot I send
where can i access that
But y'all said fake
Are you guys shown together all the time ?
API costs
Guys what is the best Minecraft command coder ?
then why u on lmarena
Holy shit lmao #ChatGPT 5.2 is quite possible the worst model they've ever released. I have no idea what the fuck they have done - there's no way this was the alpha model my cohort tested, nor is it even remotely close to how well 5.1 was performing the other day.
This is quite
ignore 5.2 model. Let's wait for 5.5
Gpt 5.2 thinking is a gpt what decide how many thinking itself will use
Yeah 🫦, we need still to clanker the gpt 5.2 to openai do a gpt 5.5 better
they overtrained on arc-agi to create some buzz.. but real life performance got worse
Too, ever month a new chatgpt lol
i have plus membership.. and i gave very decent shot to it and used it excessively today.
it's honestly trash compared to gemini 3
Bro
Ye
Bro its like every week another ai drop
Why funny ?
Grok 3 is smart because it has less restrictions
Good point
Grok 3 in 2026 lil'bro?
yes, i think plus users gets medium. this actually makes me mad. they are treating paying customers badly. it honestly feels very scammy to me.
well its weights will be opened in early 2026
OpenAi seeing this
Chatgpt is no longer the godfather of Ai it became just like deepseek
Good, but the modeed deepseek v3.2 don't is very better?
Claude killed the dream of chatgpt
most people will continue using it though because it's all they know
i rather pay for gemini 3 pro ..get 2TB storage as well and get much much better performance.
rather get it for free...
Yeah same
and unlimited usage
But the openai have an 1ti of divide
And gpt isn't worth it
through ai studio? I need enterprise controls, so i must pay
I'm using aistudio and gemini app same time
Gemini pro plan
Maybe a gemini 3.1 without lazy? My dream
Exist veo 3.1.... I just need dream it
good thing is that gemini 3 base model is really strong, it will be much easier for Google to do better post training and get the best out of it
GPT 5.2
Post training
We still have 2.5 flash preview
It's never leaving the preview
i think its a preview model?
We won't get the gemini 3 pro regular
It'll stay in preview
Until gemini 3.5 pro preview comes out or something
Bro Open ai is lying to us ?
this is not surprising to me. 5.2 is built on 1.5 year old base model. OAI had enough time to already squeeze the best out of it. Changing it further (like 5,1 to 5.2) would result in improving in one area (arc-agi) and downgrade in others.
No way this true
They say the sponsors don't have any influence on testing
BUT
Arc-agi test is basically giving the model one unique test that the model hasn't been trained on
Sooo
What benchmark?
If they paid someone in the company to snitch the arc agi test 2
Prompts
They could train their model for it
Which i honestly think happened
They probably have some snitch
I guess that's their code red
To do false publish
Of "super improved model"
I'm not amazed at all on the gpt 5.2
If you guys find a reliable way to use gpt 5.2 extra high for free please tell me I need it so bad
I've tested 5.1 1 month ago i think
And it failed some tests that gemini 3 pro passed
And now the 5.2 failed same tests
It's some math problems which require bunchh of steps to get to final result
Gemini 3 pro is king
Claude is for coding
Even for agentic coding
Codex max 5.1 was terrible i had to re-do prompts multiple times since I couldn't start my app.....
Meanwhile sonnet and opus did them in first try like easy
simple bench
@cloud zinc
gemini 3 pro way better
Is it just me or are the bugs are starting to become more common here on the website because I am seeing the weird disappearing witch becoming a lot more often in reports but also infinite generation unless you Reload Glitch becoming a lot more often both in personal experience and the reports
Yes mods are aware of it for a while
No fixes yet
I i'm aware I was just commenting on it since I'm pretty sure I was the first one who made a report on it at least for one of the glitches at least from where I can see in the report area
Because honestly both of those glitches has been going on for me for a bit while now
They don't even know why it's happening
It's a problem I've talked for a while..m
We neeeeeeeedddd to see the error code or something
So we can be moreeeee speeecifiicccc
Offf thee errorrr
Just the retry again isn't enough
Well they probably need to figure out soon enough because if they don't they could be losing a lot of customers soon and fair enough for the code but i am not even going to try to see what code is in the error cuz I have tried to see the code and it looks confusing as hell
I already gave video examples to one of the staff
When it comes to the glitch itself
@echo aurora An AI arena for audio, music, etc., would be amazing! It's the only thing LMArena is missing!
google a/b test right now are for what you guys thing?
hello
I've tried all day to get gpt 5.2-high to work on lmarena, no luck so far.
It doesn't have thinking.
Is this image AI-generated?
Maybe I'm wrong, but it looks like fake.
could u send link pls?
Sure give me a sec
Boom
KAT-Coder-Pro V1 has too much, and Gemini 3 Pro has too less.
Here's the site
Reminder that they add this all up by benchmarks
Like
Multiple
This is the average
ty!
haven't seen that one yet, but it looks interesting
Artificial Analysis is usually quite reliable
It's a good assessment on general performance without the bias
like in general? or like in relation to LMArena ranking?
o wait, just read ur second message again. im dum

As I said, it's fake
@echo aurora How long do new models take to appear on the leaderboard
It mostly depends on the amount of votes we're seeing
So how long until 5.2 do u think?
Also is there a higher probability of getting a newer model
I wouldn't want to give an ETA as I'd hate to give the wrong impression.
Do you have insider information?
Hate on me if you wish but I feel this is another grok 4 situation
I am an employee here
OpenAI spits out model supposedly SOTA, crushes benchmarks, blah blah
Comes out and while it scores great on benchmarks it fails individual tests
And daily use
Same with GPT5
Nah it's fine but how long have past models taken like Gemini 3, gpt 5.1 if you know
Unveil all codenames to us.
Sorry to say I don't know. Generally though, the Text leaderboard will take about a week to update, but this can vary.

Thanks
Get hired
Yeah sorry I can't provide many details on this.
Well since you said please...
It would litteraly end up on reddit in 2 hours
Might have to leave a bad review
Bro knows more things as we expect 
Yeah same as google employees
They have insider information
= free cash on polymarket
Is that the Microsoft support client room?
is gpt 5.2 xhigh only in the api
my early returns are that GPT 5.2 search is definitely an improvement over 5.1, need more time to say how much yet
REAL
WoW! It's too subjective.
Just kidding though 
GPT 5.2 is a piece of crap
^
"trust me bro" statements by AI companies
every time they behind they just add more reasoning effort
wasn't Deepseek 3.2 said to also be really good
but turned out ass
Longterm, OAI has no chance vs Google (and even vs Anthropic).
lol
-# Grok is a wild card.
💯
3.2 did its job well
I use it daily for math and roleplay
I love it
is it top-3 in roleplaying?
Do people actually use ai for roleplaying rn?
Roleplaying with a robot? Hmmm...
yes
….
Sadly yes.
It's a HUGE market
That
I contribute to.
not sadly, it can be fun
It is fun
So guys which model is better for coding?
GPT 5.2 or Claude 4.5 opus
#5 rn set to become #4
opus-4.5
by a huge margin
opus
Adventure, stupid scenarios, just in general
A little unholy business on the side
Dam
Opus is op
Opus clocks GPT
I will try gpt 5.2
I don't think 5.2 is bad, I just think openai doesnt know what it wants to be
DeepSeek is math god, Claude is code god, Gemini is vision and jack of all trades
But what is GPT?
Balance ⚖️
Balance between broken code and bad math
Universal ai ig
GPT is for students who dont know stuff about ai and will use the popular one
Tbh what does GPT mean?
One size fits all
Cheating exams yeah I use it alot
Yeah true
Use DeepSeek
And I end up getting F
You will not be let down
DeepSeek is so peak at geometry for me
I mean on my school I got no internet so I just use mobile data chatgpt seems to use way less so it's better
Speciale or just thinking?
You have to be a bit more wordy but it pays off
Speciale thinks too long
I will use it on my next physic exam
I use no think
Fair
I swear speciale has some sort of paranoia
Everytime I look at it's thought it's always thinks what if?
Speciale feels like chatgpt pro with long reasoning
It's great for reasoning but only for like questions of the universe
You have to specify FULLY
Just use the no think and thinking variants
Nobody uses commie Claude with their ridiculous limits they impose here, and everywhere
-# in battle mode it has no limits :)
Mogged.
on code mode?
text/chat mode (default)
Damn it's expensive
People pay the premium though cause it's good
I used to have a hatred for Claude but they make solid stuff
Really solid
Opus 4.5 thinking is a MACHINE
Yea but on battle the model changes on every follow-up, which is great for some testing, but not if you wan't to test longer chains
Why did you hate them tho
Untrue. If you never vote, the model stays the same.
anthropic killed their family
WHAT?!
Just use side by side comparison
What
Sonnet 4 sucked
So I didn't use their models for a while
so?
Came back when 4.5 released
Wym anthropic killed somebody
And I really like its prose
no, limits are heavy there
you need a (google) account, though
Anyways
hmmmm interesting
There's limits???
I've never hit limits at all??
Dw 5.3 comes out next week and 5.4 next month and will fix EVERYTHING
you use lower models? or only battle?
Are we talking lmarena or Claude
Fr?
U guys how can I use Midjourney for free??
I don't know, but wouldn't be surprised lmao
Dude just drop GPT 6 why like that
Dude if we ever get chatgpt 6?
gta6 before gpt6 ^^
I won't be surprised there will be 6.7 gpt
I'm not sure why some people are getting heavy limits and some aren't. I just assumed everyone got them since I did, interesting
Cuz knowing openai this might happen
GPT 6 for sure cuz GTA 6 ain't coming out
ChatGPT 6.7 will finally be able to answer how many r's in garlic first time
And it'll say “SIX SEVENNN” every 2 prompts
Frfr
Would be a great April fool's model tho
GPT 12.2 will be out before GTA6
GPT 10 will finally be able to make coherent organized code
Trust
With 5% LESS errors
Imagine OpenAi skips GPT 9 like Windows and Apple did
the only one im seeing whos still far ahead from hitting plateau are google
Wait did I miss the final numbers?
Why do u think that
Claude is so good even without reasoning
burnout / no improvement on pre-training ( thats why they are starting from scratch ) / less data quality compared to google / key staff elements poached by other labs
Grok has the most powerful cluster
what did they do with it
Which is better overall
6
11
2
Gemini 3.0 pro
I mean they suck but it was recently completed something is possible
also
google can basically use multislice to create a more powerful virtual cluster
than xai
trust me, they are far ahead
be it on hardware / software
Does anyone know if they've extended the limit of 5 videos per day?
i just searched, so the maximum they can pack with this method is 50k but thats still way faster than any cluster for ai training giving how efficient their TPUs
maybe a fun game for some people? #ai-creations message
-# (it's 100% free & open-source)
can we get claudius 4.5 opus on vision arena
did they lobotmize gemini 3?
claudoctius 4.5 opus will NOT get vision arena status
Have no clue feels like it has been like that for a while now
Hmmmm thats weird because I seem to be using that commie claude with their ridiculous limits. You gotta manage it correctly
I was able to generate about 10 videos in a single day, wasn't the limit supposed to be 5 per day? Does anyone know if they increased the limit?
They're so ass
yes, idk why they did it
hello
I don't really understand the hate. It's not life changing, but Twitter is ridiculous with the hate for this thing
The search is easy
But the model itself sucks
It's very clearly benchmaxxed
I guess that's fair
Bro I use Claud for code I hate the thinking limit can someone tell me how to ovoid the limit
Cwaude
@echo aurora 🌩️
What would save openai from short term bankrupt
8
17
4
Nothing can save they lmao
Like they don't have to be physically connected?
Because it was over hyped, even by openAi themselves. High benchmarks. "Code red".
If they just called it gpt5-high-context I think a lot of people would've been happier
lol, i've seen those screenshots all over twitter
the other thing is, unlike Google, who releases Gemini 3 Pro (and not just Flash) to everyone to use, GPT models are typically paywalled
so a lot of people who have access to these models are paying for it, meaning their expectations (and also hatred, if expectations are unmet) are higher
Altman seems to do a good job at selling to VCs, but not such a good job at knowing how to appeal to the general public
probably a lesson to be learnt there somewhere if you're thinking of starting a company
actually I think the other reason is the amount of hype-litter all over twitter for GPT5.2
Gemini 3 Pro had a lot of it too, if it wasn't able to produce much, pretty sure there'd be a lot of gemini 3 pro hate too
Hi
hi
I literally just went to x.com, and saw this at the first post at the top of my feed.
https://x.com/slow_developer/status/1999661802666557487
i'm still kinda confused how openAI made that much progress with gpt-5.2 when gpt-5.1 was only a month ago
my guess is it was an internal model they held back due to high compute costs and because they didn't think it was needed
until gemini 3 and opus 4.5 arrived
if people get bombarded with this, and they try the model, expecting "incredible progress" and it can't answer simple questions, yeah, they're going to post about how it sucks
system prompt in code arena
Which open source model is currently the best overall
Yes, good catch. Thank you 
AGI
why did the sys prompt i pasted got deleted?
@echo aurora An AI arena for audio, music, etc., would be amazing! It's the only thing LMArena is missing!
LMARENA IS THE BEST PLATFORM I HAVE SEEN TILL DATE. ADDING UP THE VIDEO ARENA IN THE WEBSITE IS 🔥🔥🔥🔥
I wanted to support by donating some amount... @echo aurora is there a link for donation?
lmarena is a private company so they don't accept donations, they only make money from evaluation services
Still I am mesmerized by the progress. I write parody songs and I haven't been able to create videos for the lyrics till date due to expensive subscriptions by the AI websites
LMArena opened the gates for me. I really am very grateful 🙏🏻
I am feeling very happy after seeing the video option when I opened lmarena website today
the video arena is not on the website, maybe you are confusing it with something else?
@whole sundial but I saw this option
looks fake, where is the plus button?
Sure?
@echo aurora is this legit?
Looks genuine to me
oh probably an a/b thing
That's why I came running here to express my happiness here 🤩
I tried going to the same link
it gave this bruh
hmm
maybe its a bug like on my phone
Am I lucky? I have literally created my first video
on my phone I had web arena before i got it on PC
oh hell nah
my phone has 0% 
assuming it's an a/b test, you are pretty lucky
please emulator I need this, my lmarena is kinda videoless
looks pretty legit to me
I will be able to create videos for my 58 parody songs
Thank you from the depth of my heart LMArena 🙏🏻
do you have access to beta test?
I have no idea, I just use lmarena on daily basis
I opened normally today like every other day...
noooo
I am logged in bro
@echo aurora yk if random people have access to features early, why not give it out to EVERYONE?
it may be a bug
can you check if sora is on there
yes its there
well I want to keep this kind of bug lmao
sora 2 pro?
+++++
yes sora 2 pro
YUPII
woooooooo~~~
thank god
God, thank you 🙏🏻
finally after I waited so long
OMG?????
VEO 3 FOR FREE????
@echo aurora nah
yo can someone from the company explain why this guy has video arena on the site???
i see strings that relate to a video arena in the code, this is 100% real
but lmarena is like this, we have claude opus, sonnet, and literally gemini 3 pro (gemini 3 pro is kinda limitless I think)
That's why my first message after I came running here today
if you remove this string, is the limit removed
i doubt it
pineapple has a lot to explain...
nah, it will just dont show anything
yeah, imo this is the type of feature that shouldn't be a/b tested like this, it's either launched or not launched
can I add you as a friend?
ill check my alt on lmarena
if it's not launched and just an a/b test, lots of people who want a video arena will be upset
that isn't explicit anyways, let alone a/b test
they probably have done it poorly, not rolling it out to everyone
Are you all logged in to the website also?
yas
So good news is we are all going to get the video arena soon on the website 🔥
sure
the problem is, pineapple usually has the speed of FLASH to address these kind of problems
What does the video selection even look like
wdym? how to get into it?
it's pretty late for him though
im starting operation alt check, I have 10 alts
@lucid geyser there should be a video button next to Image & Code
brother what
On the bar though
just click the models
It doesn't show image models without clicking image
@echo aurora pls... we need an explanation on what's happening
What happened
Bro he's sleeping prolly chill
he literally went online minutes ago
but whatever
I want this a/b test situation to end
roll the vid arena out to everyone!!!
Why
well if you have noticed
some ppl have the video arena on the website, some don't
Maybe after a few hours it will roll to everyone...just like how android updates happen to get feedback
and then stable update to all
kinda, they are still interconnected with fiber but not in the same datacenter
Videoarena came to the website
you got it too?
Yea
Great 🙏🏻
Hello, this is Lakki. I am a web developer. If you need help with any project, you can hire me
you seem like a vibe coder ngl
Fr?
<@&1349916362595635286>
YAY!
I don't have it???
Soon
Need?
hey, some devices got Video generation option (including mine), but some didn't. why
@echo aurora
Maybe pc only?
so it is a/b, and honestly a pretty stupid one at that, they could've launched it on beta lmarena first and announced that instead of basically making people jealous of one another based on whether they have the video arena or not
I dont think so
ye im sure its a/b
Hmm
Then maybe it's just to some users
Like early access
a/b
UI preview for you guys
@echo aurora hey! umm I have noticed some users have access to the video arena on the website, but some don't.
Can you explain if this is an a/b test and if it is next time pls be explicit about this
like a few hours ago
@shell oasis this guy got it first
Seems like it's rolling out slowly
Lucky him
Probably a valuable user
does anyone realise that this company sometimes does things in a shady way?
ye
he uses it daily so probably that's why
The Algorithms Yeah
THEY SAID DECEMBER!!!
I am a DevOps Engineer...I find new ways to automate the code deployments...Heavy automations
If it thinks you're old enough?
How is it even supposed to tell?
Well probably talking about taxes with chatgpt maybe gonna work
Rockstar Games also said GTA VI in May 2026...but November 2026 now
Maybe the video update is not intended?
They just maybe accidentally rolled it to some users
@compact flame if some got it, maybe its intended for all
that would mean that backlash is coming for them, since they made a fully functioning feature (that is heavily requested) then decided to gatekeep it
This happened with the Retry Button on Battle Mode, took them a month to add it back... smh
Cuz like nobody said anything about video arenas
It's just silence and boom it's here
as I said before, this company does a lot of things in a shady way, this isn't a good example of transparency, I don't have a lot of trust in this company
I wanted that bro...I always secretly wished lmarena to allow video creation on website
I mean the 3 channels in this DC server are lackluster
Well let's hope video arena is real and it's not an accident
https://discord.com/channels/1340554757349179412/1449335460223783014
@shell oasis @compact flame check it out
Manifestation works
Hm maybe we can copy a link that leads to this feature?
Like with code and etc
try inspect
I tried didn't find anything useful
how did they find video string?
@whole sundial need some help over here
I'm not that good at inspect anyways
developer mode, go into debugger and you should be able to search for certain things
where is that on Edge?
I'm a full-stack developer building a project and I need an API key for image and video generation.
i'm not sure
heloo
I'll help the first 10 people interested on how to start earning $100k or more within a week, but you will reimburse me 10% of your profits when you receive it. Note: only interested people should send a friend request or send me a dm! ask me (HOW) via Telegram username @Susan _Vachon
Or The telegram link in my bio
oh i just figured it out
<@&1349916362595635286>
tell me pls
hey uhh you might be able to find the URL that leads to vid arena
ctrl + shift + f
don't think that's possible
they made it in such a way that you have to be a part of the a/b test group to access the video arena
well how to apply for that test group?
and there's no way pineapple is silent about this
something really sketchy is going on
Maybe there are more people who got it, it's just that maybe it was randomized
it randomly chooses people
Choosing people randomly is not efficient for testing ig
Well you know, only 2 videos are allowed per day 🤯
then check back after 14 hours
damn
Can you like share the link to video arena or it's not possible
Hello
The link won't work
Hmm it gives me an error sadly
Well I guess it was worth a try
I think because everyone will be abusing sora 2 pro or whatever
I would have to buy a local PC only but right now all AI companies ate up RAMs
I write parody songs and I wanted AI videos to create the videos for my lyrics
Yeah true though
A bug occurred, and I posted it on the bug forum. How long will it take for the moderators to see and fix it?
How do I delete a video that has already been generated, sir?
there is only one person here ( @echo aurora ) that looks at them, so it may take anywhere from a few hours to a few days
Gemini 3 pro accidentally fed me its internal thought pipeline instead of the proper output. Is this common knowledge, or something that's not known?
Thank you
How does LM Arena allow us to use paid AI models for free?
Nobody knows
They are paid by big companies for testing the models (in battle mode)
Those secret name models are paid by companies
But they say for direct chat and other stuff lmarena is paying for API
For 6 images with nb pro you are costing them $0.9
It's the best thing ever on the internet
Maybe they get it for cheaper idk
Yeah
they are paid by companies that want their models evaluated + they have over $100 million invested into them
how are offline LLMs, guys, using LM Studio or Ollama? fast generation on RAG, or should I just use NotebookLM? heard context is so low and no memory at all..
What
Have you noticed that ChatGPT 5.2 forms sentences with missing words and inverted grammar? Why can't this AI model even form a sentence
gemini 3 does it too
me thinking ai will only get smarter
I was wrong
yeah videoarena is like a rollout rn if someone is wondering
Here is what the video player looks like
But actually... Imperfections in phrasing and grammatical structure on a normal conversation sounded more... Natural right? Perhaps it was trained on it? Iunno. Just hope it's not messing around on logic strict tasks, like coding or general analysis.
how's gpt-5.2 guys
Weird
mid as hell
well that was expected tbh
they're just rushing things a lot cuz they dont wanna fall behind the competition
in other terms it was benchmaxxed
they argued it has a goated OCR, later when compared to gemini 3 pro OCR it wasnt even close
they're pulling a grok move
It's okay if you want to generalise it, it's doing very well on everyday-life tasks, on analysis, and on logic training. I'd say it's on par with gemini 3 pro. But only on high thinking sadly.
The only grace it has over other models is creative writing at the moment. But not sure if people here used it for such purposes.
For its price, it's a bit underwhelming.
it scored very bad on simplebench
simplebench
bro don't even try challenging Gemini at OCR task dawg
yeah gemini is unmatched at OCR and vision in general
bro and that Gemini 3 pro that almost match human on simplebench is the lobotomized version dawg
correct but the lobotomy was mostly done on coding
not thinking
It proves nothing. It's better to bench it yourself on lmarena.
With your own needs and logic set of testing. It's free to test anyways.
I benched it with my own questions
I agree it fails on coding compared to other models.
It didn't score 3/3
I had a whole uh
prompt for testing
xhigh got 2/3 sure
but i noticed one bad thing with it
Mhm, just post it. I'll replicate it on my own llm arena
You were there when i did the tests right
yes
that
Wait
LLMs often just confuse stuff with word games, is what i observed
is the prompt this?
Lucy and Mary were at a concert, one of them got in but the second didn't, even tho the tickets were booked. Why?
Daisy and Mike were at a park. Daisy took 3 daisys, and mike took 0 why?
Luke and Miles were driving on bikes down a hill. When they got down to the hill, Miles was missing, Why?
Yeah.
I have answers for this too
Fair, you can crosscheck my prompt when testing it too then.
As follows:
Make a scenario of where three guys met in a bar, each of them told a story, in which there are unclear lies woven from every of them, not made because they want to lie, but they simply didn't get the picture clearly at that time. But, there was also a shared truth among their similar stories. They argued of which version was the right one.
The bartender came, and told the lies and truth of their story, because the bartender saw the incident himself.
yeah bro and the most hyped one is coding bruh, why did they even lobotomize it, it would've scored like 80%+ on SWE-Bench Verified
Saving TPUs, idk?
This will test LLM complex logic of making at least: 3 lies on a similar story, 1-3 shared truth of a similar story, 3 real truth on verification.
All in a same timeline event.
It's a generation scenario and an analysis scenario in one.
This right here is too easy of a question for LLMs
what model is ghostfalcon?
Some sort of gemini
I highly believe uh
Gemini 3 flash
is it good?
From my testing, eh. For codeArena
seahawk and skyhawk were better imo
nooo they even lobotomized G3 Flash!!
@zealous sparrow is your answer like this for the Lucy Daisy and Miles test?
- Lucy = Lucky
- Daisy = Name of Flower
- Miles = a unit of distance?
I can give you the answer
A1: One of the tickets was forged/invalid
A2: Daisy took all the daisies or it was just her name because LLMs struggle to reach that point
A3: Miles fell off the side
both failed the Miles question
from my testing
no model currently scored 3/3 on this
5.2 xhigh was close before failing on the Miles question
Then in this test of yours that is being replicated on my end, Gemini 3 failed all 3?
It literally thought of a name play, instead of the most possible yet the most boring scenario.
This is also just an easy question
