#general
Fr
It makes me wonder what kind of person should use battle. Power user? Early adopter? Average person?
Right now, lots of people who may not be able to afford GPT pro, are free-riding direct chats to get gpt-5 high access.
You could maybe gate + cap direct chats at like, idk, 15 messages and force them to respond to a battle before continuing.
I do wonder how the average battle user vs direct chat user differs. My guess is direct chat people are younger, have less income, etc. Wonder what effect that ultimately has on the leaderboards / stats / rankings
gpt-5-search model can search the web?
Yes
Though tbh, the style control thing confuses me. LMArena is all about moving away from “objective” benchmarks to the ones that matter most: user preferences.
If users like different styles and formatting, then why control for it? The whole point of this is to find out what users like!! Defeats the whole purpose imo
@ocean vortex
I think AI enthusiasts. But those people are not gonna use it the way it was designed if they can't actually test unreleased models for even a single message after voting
@echo aurora What is this new AI image generator called nano banana that is in LMArena?
😭
OLED AI
stealth model. I think it's an imagen variant
😭
Well, LMArena got a nice check from a16z and co. But eventually they’re gonna have to find out who their customers really are. Users or AI companies? I imagine they charge model makers to put their model up in the arena, no?
For models that are behind a codename I won't be providing info about them
Oh ok
AI labs are the "customers" but users are enablers. Without users there's no benchmark
Right. So from the model makers' eyes, the rankings have to converge on the average user. Otherwise it’s useless. OpenAI, for instance, went with summit over zenith bc of better LMArena scores.
If the people battling in the arena are different from the company's average user, then it becomes a bad benchmark. LMArena must find a way to match those two pools of people
Has anyone tried minimax? How is it
I think it would naturally converge for the most part. More interest in battle mode = more users participating = more reach = more people will hear about it and try it.
Haha, are we sure about that?? You just said that early arena testing is made for enthusiasts lol, which i do agree with. Enthusiasts aren’t average users though.
One thing LMArena should roll out is user accounts. Allowing accounts on the platform would go a long way to help them get the data they need in order to answer that question
it sucks
however they have great tts models
and video models
Text to speech?
So according to you the only things which do not suck are 4.1 opus, 2.5 pro, grok 4, some specific versions of gpt 5
The demographic is not changing and you can't easily control that (+I'm not sure you should). The best you can do is be increasing the interest and activity...
Something like chatgpt started with only the most die-hard nerds in its infancy, look at it now... 👀
They didn't do anything to push those away and make their audience "more balanced" at the time and that was the right move
when gemini 3 ):
Fr
Not as good as gpt5 😇
Also interesting fact... it doesn't even have 1% of chatgpt market share lol
claude and 2.5 pro suck
grok 4 and gpt-5 are good
It has... you are very wrong actually. It's more than 20%
Gemini is quite substantial
Mostly thanks to Android I think
They integrated Gemini into Android itself quite well. Like they have a properly working voice assistant based on Gemini. They have what Apple should have had by now with Siri lol
20% figure is Gemini app
Not really, it's still Gemini and it's super easy to direct users directly to Gemini app from there
They don't count the AI overviews stuff in search in those figures
I've tried those local models and they are crap. No two ways about it
Their AI overviews is 2b users per month, and I forget what the AI mode one is (but it's been growing at a decent tick too)
Even their cloud model is underwhelming
does current gpt-5-high in arena correspond to GPT-5-Auto in the client? is GPT-5-Pro distinct / to be added to arena at some point?
2.5 Flash would destroy that cloud model without trying
Probably even Flash-Lite has a chance
Well there are no useful on-device tasks that wouldn't use chatgpt...
with iOS
It can do like notification summaries. But literally any model can do that, and it takes no time at all to make an API request for that lol
then you don't have notifications either
lol
Ones that would benefit from summarisation usually do
it's just such a non-thing. You need to be actively looking for edge cases to find any benefits...
It also kills your battery much more than API
do you guys think o4-mini and gpt-5-mini have less world knowledge than gpt-4.1?
I hate OpenAI and will never give scam Altman a dime of my money, but GPT-5 high is the nuts
I thought that only works with “juice” on the api?
any ETA on when video battle goes live?
honestly can you stop spamming this? You’re not accomplishing anything except looking like an idiot
cope
cope
One day Claude will reign supreme in all areas
Besides coding
Claude is the champ
agreed (reluctantly)
magnum opus?
Someone here disagreed with you
No it's gpt6
or gpt5 if you want to stick to reality 👀
Russian state news
illegal neo-... propaganda. We don't need that here
14 times
spam
act in good faith pls
I love it how he felt the need to explain what SOTA is
yeah act in good faith Russian spy.
@eternal niche
well, right now they're being russophobic and making fun of me for being Russian, and you're ignoring it
that's not good either, exchanging blows doesn't usually get you anywhere
i didn't mean to ignore it - i'm not always watching the chat
fwiw pro users have 64% of high juice
that wasn't making fun, it's just a friendly joke. Also no need to take a reaction to some specific thing like "state news" personally lol
lol
you are not my friend
i think it'd be beneficial to drop this and not have a giant discussion about who's in the right/wrong
well that is more "juice". No parallel / pro, but those requests are limited even for pro sub anyways
Some tasks probably yes... with too little reasoning effort it's not gonna arrive at the answer even if you run it 10+ times
@echo aurora Is the data for LMArena going to be released (I sent an email to lmarena.ai@gmail.com about it)
When we do open data releases we'll be sure to share. We did one recently, would recommend checking out this blog.
lmarena.ai@gmail.com
Would also note this isn't an email address that we use. contact@lmarena.ai is what you're looking for.
@eternal niche no way this guy’s a legend
i think gemini deep-think might be better if we can trust the benchmarks, but i cant try it and havent heard much about it bc its so expensive and not in the api
grok4 heavy or the "normal" 4
heavy is more like deepthink
i dont think they have any of these parallel thinking models in the arena
i tried it with rust code, and it wasnt able to write compilable code (unlike gpt5-high)
What kind of debugging?
Hmm. I think for complex code debugging in c++ where I assume some of it is segfault or unexpected stuff, most AI is not very good at this
I would actually ask a lot of them simultaneously
In coding? Gemini absolutely brute forces
Well, gemini pro
Flash is utterly useless for coding
do you have any info if video arena will be added to lmarena website
it's something we have on the roadmap but haven't explicitly started working on!
Toad from ?
It might find the bugs, but I would also try other AIs if it doesn’t.
?
toad
gemini 3
It’s a good test but what harness/platform are you using?
Is it good?
If you want the AI to code it for you it’s probably worth trying Claude code and Qwen code.
Any recommended models for python??
gemini 2.5 pro SOTA
I think it’s good enough to be worth trying.
i forgive you
are you ok?
Qwen code is free if you can’t pay for Claude code
This isn't the real @ paws
There is also Gemini cli which is also free
qwen wrote a stockfish PR actually
No but I don’t have time to do like giga deep dives like that unfortunately.
prompt was quite detailed tho: https://rentry.co/bm6vriai
model is qwen3-235b-a22b-thinking-2507
this is fake it's actually 4
bro has no mod aura
it doesnt know who it is
this is a common problem with llms
i get the n/a role
which is admin for some reason
my proof is that i am employed by lmarena
Using many models at the same time is actually a good idea
I wouldn’t bother with grok though
just all
noticed something interesting. gpt-5 high has only a 33% winrate against gemini 2.5 pro in the arena
is there any way to see all the available models? like the stealth models when you play battle with 2 anon models? because I cannot find any
in the leaderboard? they will be published once they come out (if they should be released)
which they probably only will if the arena went good
Sorry to say there is not.
Just did a few text prompts with the search mode, it didn't tell me the models it was once I voted, is it normal? new?
This might change with more samples
where u find it?
@eternal niche do you think gemini pro has a shot against gpt 5 this month?
there is unofficially
The best AI models according to benchmarks
posting artificial analysis 😠
we use LMArena so we don't even know until after we vote
wym?
Why is the regular GPT-5 model weaker than the GPT-5 mini?
If anything, GPT-5 (minimal) is a regular model, without thinking, and when the requests end, you are transferred to GPT-5 mini
Gpt5 high that you see there is api only it’s not even available to pro users is my understanding
Is there anywhere you can specifically test nano banana or do you just have to wait for it to come up once every like 50 prompts
Because GPT-5 was not trained to be minimal, it was trained to think. GPT-5 Mini was trained better because it had no way of thinking.
You'll have to wait for it to appear. When it comes to the models that are behind a codename those are only accessible in the Battle mode.
bro some people in video arena are down bad lmao
This benchmark is bs. It shows that gpt oss is better than claude 4, which is completely false
But then... GPT-5 mini > GPT-5? Doesn't that seem strange to you?
Qwen above Deepseek sounds like massive psyop
It is really not that good
deepseek r1 very old model
You are missing the point
i think qwen coder 3 is even better than gemini 2.5 pro
for coding
wym
Try to ask it for around 100 anime similar to Madoka Magica, Qwen will fail, hallucinate and invent titles that never existed. Latest R1 does this job way better.
Try to ask both some questions from music theory and see how often Deepseek answers correctly and how often Qwen does
Qwen in general is not that good yet, unfortunately
It's trivial to train a model that passes certain public benchmarks even if it was never trained on them
The only way to compare general capabilities of models is to use private benchmarks like I do
im in the process of testing AI models on my own benchmark i just made, so far the results are very interesting, Opus 4.1 gets only 36%. if anyone has access to GPT-5 Pro please dm me, i would love to test it, and the benchmark is only 10 questions so it wont eat up your rate limits that much 🙂
Opus 4.1 is the highest so far, o3 gets 22%
We have gpt-5-high on arena
Isn't it enough?
not same
ye i used that and am using gpt-5 high on cursor, but 5 pro is a lot better from what i hear
I find gpt-5-high good enough for most tasks lol
GPT-5 Pro and GPT-5 High are not the same
AIs such as GPT 5 Pro, Grok Superheavy, and Gemini 2.5 Pro DeepThink are not typically included in benchmarks because they are usually purchased rather than employing a salaried software developer.
gpt-5 high is amazing
oh wait sorry, i tested gpt 5 mini thinking, i haven't done GPT-5 high yet, give me a few minutes, ill see if it ends up beating opus
Can you test Gemini too?
what do you mean im confused?
tested 2.5 pro and 2.5 flash already
rate?
2.5 pro is second with ~33%, flash is 14%
They are not included in benchmarks because they are not used by average people, so they do not have APIs either.
When you benchmark GPT-5 High, can you write down what percentage score it got?
ye sure its almost done it has 2 more questions
thxx
What do you think then?
https://www.youtube.com/watch?v=9alJwQG-Wbk cant get over this guy trusting gemini with his arm
Giving a PC program control of my muscles to become the fastest in the world. Sponsored by Micro Center!
Don't forget to run each question 10 times in a row in a different context
I haven’t used it, but on the benchmarks they’ve shown, GPT-5 Pro scores 5 points higher than GPT-5 itself
no
Cool.
each model gets 1 attempt per question, it also makes my job 10x easier
It doesn't. LLM output is non-deterministic. You need to collect a series of responses to the same question to see if they succeed.
In benchmarking GPT-5 Pro is one point higher than GPT-5 Thinking medium
And the regular GPT-5 model has 44 points.
GPT-5 (minimal) - this is the standard model, which is by default
i guess i could, but my main goal is to see how good these models are in everyday tasks. you are never gonna ask a model to try again 10 times, and this isnt something i plan on publishing or anything
it got 42%, did well on coding tasks and was the best at making a flappy bird clone and an FPS shooter but did worse on some spatial reasoning and logic puzzles
it made the only usable flappy bird clone and fps shooter i have gotten so far tho
omg
Compared to which AI?
You're missing the point here: the same model can get the same task right 90% of the time and fail the other 10%. You never know if it is more or less likely to fail until you test it with the same question again and again and again and again, until you have enough statistics to judge.
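A minimal sketch of the point above (not something posted in the chat): assuming each run of a question is an independent pass/fail trial, a Wilson score interval, a standard binomial confidence interval swapped in here for illustration, shows how little one attempt says about a model's true success rate compared with ten attempts. The function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a pass rate (z=1.96).

    Assumes every trial is an independent pass/fail run of the same question.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials  # observed pass rate
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the edges.
    return max(0.0, centre - half), min(1.0, centre + half)


# A single correct answer vs. 9 correct answers out of 10 runs.
print(wilson_interval(1, 1))
print(wilson_interval(9, 10))
```

With one success out of one try, the interval still spans roughly 0.21 to 1.0; nine out of ten narrows it to roughly 0.60 to 0.98, which is why a single attempt per question is a weak signal.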
This is a feature request though that's very much on our radar!
Arigato.
Grok-4 and o3 did better on the spatial reasoning and logic tasks
i did run certain models many times before and the results were almost the same, so i dont think its worth the hassle to go through and test each one 10 times, at least for me
The most important thing about GPT-5 is not that it's the best model, but rather, that it uses resources more efficiently. Despite being an improvement, GPT-5 is much cheaper than GPT-4.5.
gpt 5 cheaper than 4o and 4.1
It is worth the hassle.
im not doing this over the API so to test like Grok-4 for example i would either have to buy the Premium plan or wait a very long time
There's lmarena battle mode for this...
Battle mode gives random models no?
direct chat also has limits, im getting limited on Opus thinking right now
Not necessarily, there's a side-by-side option that lets you choose two models
ive only ran into limits with opus
ive been using gpt-5 high non stop and it still lets me
ill try with another model, Opus is expensive so that makes sense if its only opus
fact
omg loll
Anthropic as a company is just so weird too
they are now the only ones to have not released an open source model
gemini and grok?
/img vedeo
Grok has Grok-1 and Gemini has gemma line up
Change her photo and wedding dress alongside Cristiano Ronaldo is taking a photo with it
Grok-2 is gonna be open sourced soon they are saying
omg
Instead of making Grok open source, I think they should lower the price.
You cant use these hidden ai's in direct mode
yeah thanks, i realized
where is france
lumped in with EU
but the data's from January 2019, things have changed a lot since then
amd is running gpt oss 120 on a mi400x on huggingface
What kind of ai is this
Wtf 🤣
I don't know how they do that
Who's Himanshu?
AI relationships and love are a genuine threat to humanity and growing rapidly by the day. I remember just two years ago you would be laughed out of any discussion as a total loser for saying you date an AI. Now it’s still a squeamish topic, but it’s not much stranger than just viewing normal illicit content. 2 years from now, it may be as commonplace as illicit content is, scary stuff
I remember back to the early days of chat gpt when this stuff was first being experimented with, a romantic partner bot in early 2023. People treated it as a meme then, how quickly things have changed
bro I don't know anyone who says things like "I date an AI".
#video-arena-1 message
@eternal niche
Nobody admits it but the statistics show millions are
man... maybe I'm surrounded by imposters.
also, wth everyone's saying their AI's too agreeable, praising, do all the gf stuffs, but my Gemini told me I'll kill myself by 45 if I don't start having hobbies among other more hyper-realistic criticisms.
maybe there's a real human behind my Gemini instance
Yes finally someone who is rational
The threat is real and action is needed from those in power
It may look like it is fixing problems short term but in long term a very big threat is waiting
Yeah and that human is my ex wife
LOL
omg
BS
Elon:
Igor, I’ve been testing Grok all week. It’s not matching ChatGPT. Not even close.
Babuschkin:
It’s early days, Elon. We’re iterating—
Elon:
Iterating? I asked it to outline a Mars colonization plan. It gave me a blog post about composting.
Babuschkin:
That’s because the model is still aligning—
Elon:
I don’t want alignment, I want intelligence. Strategic thinking. If GPT can answer it in 5 seconds, why can’t ours?
Babuschkin:
Because we don’t have the same data scale, or the same training infrastructure. It takes—
Elon:
I’m not hearing solutions, I’m hearing excuses. We’re supposed to be ahead of OpenAI, not their science fair project.
Babuschkin:
We’re building something different—
Elon:
Different doesn’t win. Better wins. People aren’t going to pay for “different.” They’ll just go back to GPT.
Babuschkin:
If you want an overnight GPT clone, you’ll need to run the company differently.
Elon:
Differently? I’m already pushing the team harder than they’ve ever worked.
Babuschkin:
Exactly. That’s the problem. AI research doesn’t work on a launch schedule.
[Elon steps closer, his voice tightening.]
Elon:
So what you’re saying is, we’re going to watch OpenAI pull further ahead… and do nothing.
Babuschkin:
I’m saying we can’t brute-force our way past them in six months. If that’s unacceptable—
Elon:
It is unacceptable.
[A pause. Babuschkin closes his laptop.]
Babuschkin:
Then I think I’m done here.
[He stands, walks out. Elon stays silent, staring at the whiteboard, gripping the marker until it creaks.]
⸻
[Elon sits down at his desk, opens Grok.]
Elon:
Grok… how do I replace a cofounder?
Grok:
Step one: Acquire a cofounder replacement kit from Amazon Prime. Step two: Follow the instructions in Swahili.
Elon:
…Not helpful.
Grok:
Would you like me to search for “emotional support raccoons” instead?
It is pretty hilarious to think about though haha (written by gpt 5)
uhh what's a toad?
Some models are going to be private, and so they're given a codename.
Hmmm what is the performance of this model?
Why can't I use Deep Research in Grok 4? Is it even available?
which is the top tier gpt-5 model without the thinking/reasoning? GPT-5 takes too long to output a reply with its thinking/reasoning
GPT-5 mini?
Or Nano
Nano only works with API
what about gpt-5-chat?
It's not the fastest model since it "thinks"
It's good but not as fast as compared to GPT-5 MINI
oh ok
thanks
why can't we upload images in our chats?
hello
I have a question: in LMArena, can I keep many tabs with my previous messages, and will it remember them?
yooooooo??????
wdym ?
you mean different conversations, or browser tabs ?
use ai to decode what the user is saying
DeepSeek’s launch of new AI model delayed by Huawei chip issues https://t.co/cienSLzAVl
No they are very much included typically lol. There are plenty of benchmarks for o3-pro
For gpt5-pro there's no open public API yet
the problem here is using Huawei in the first place
"issues" is inevitable
FT is into writing deepseek fanfic
Like this is equivalent of them just announcing they are gonna use Huawei chips lol
"DeepSeek was encouraged by authorities to adopt Huawei’s Ascend processor rather than use Nvidia’s systems after releasing its R1 model in January, according to three people familiar with the matter." - typical CCP things... 👀
They are gonna ruin Deepseek
Rip deepseek
Maybe. Or maybe it’s a smart long term strategy
The media also published something recently on “billions” of AI chips smuggled into China. this is only around 20,000 b200 chips. I’m surprised it’s not a lot more. I suspect it would be a lot more if the Chinese state was using their vast spying and intelligence resources to do the smuggling. These articles suggest they are not and want to focus on Huawei instead…
It's not for the "state" to decide. Each company or startup should be making these decisions by itself
Innovation is kinda killed when everything is constrained and controlled by a single entity
I don’t want to get into another massive argument with you about China. It’s such a waste of time. They are going to do whatever they do no matter how much you don’t like it. So will the Americans. The irrelevant ones here are people like you and me who live in Europe. We have just chosen to not play this game at all.
They wanted to be independent 😶 it is smart, even if they will face some problems at first 🙃
Who's 'they'? I don't think that's Deepseek. And that's kinda the main problem here...
Ok so for Plus "juice" is only 32 for gpt5-thinking-mini apparently
that's medium reasoning effort at best. Would have expected to have high there...
They clearly don't want for it to perform better than the full model on any task this time
on chatgpt at least
App using medium reasoning is kinda sad. Maybe they want people using their API more, but idk. People can use the openai api in the Poe app for free, even with high reasoning support
it's open to abuse with multiple gmail accounts
If you think about it, it can actually make sense.. They want to make it simpler. And now they have matching naming (not like o4-mini vs o3). But it's still disappointing not to have high reasoning effort anymore ofc
"inferior model" performing better is confusing, even if that's only on specific tasks
i dont think deepseek will catch up again any time soon
If they are focusing on changing their whole infrastructure that will take some time
But they don’t need the hype cycle for funding so… not really a problem for them long term imo
nano banana is not on lmarena?
it is in battle mode
it will come up randomly in that mode
thanks
Guys, why is the GPT-5 mini more intelligent than the regular GPT-5 model?
ask it 😄
GPT-5 minimal - This is a normal model.
GPT-5 Low - This is a low-effort thinking model.
GPT-5 mini - This is the model that appears after you finish the main limits. The question is, why is the mini smarter than the regular model?
And why is the GPT-5-nano also smarter than the regular model?..
this is really good
it is, yeah. I hope that the term "nano" means that it is the size and not the only model
hoping for multiple sized imagen models
we will see probably
Holy hell
Nobody can answer the question why the mini is smarter than the regular model?
I've got no answer to that
gpt5 pro doesnt have an api so most likely not
GPT-5 Pro already in LMArena
no its not..
@deep adder will answer
Nope
Yeah, what do you think of it then? It uses High Effort Thinking, like in Pro
Gemini 2.5 would be so expensive for them to bring
pro is a different version, thats the difference okay?
its pretty much smarter
Guys, guess this riddle: small, yellow, opens any door?
key?
Bruce Lee
Pro is a high-effort thinking model. LMarena already has this. Moreover, Pro is not that much smarter than the medium-effort thinking model.
How many videos can a person generate here? I got an 8 video limit yesterday and now I got just a 1 video limit? why man????????
ITS a different version
how is it that hard to understand dude
...
gpt 5 pro does not have an api, therefore it cannot exist on lmarena
you can say "lol but this model is similar" as much as you want, but the model ITSELF doesnt exist on lmarena
thats what the person was asking
Pro is one point higher than the medium model. Why is there such an opinion that Pro cannot be in LMArena?
Apparently Pro is so smart that she couldn't solve a Russian math problem
It thought for 26 minutes and still couldn't solve it...
yeah because its in russian, that makes it harder
It was translated into English, but he still couldn't solve it.
Because pro is a separate model. even if a model is close to it, it does not exist on LMArena on its own
also that benchmark is kinda stupid, gpt 5 pro is much smarter than medium lol
medium literally hallucinates every prompt, while pro can one shot a lot of things
Pro is just high effort thinking. Yes, it is separate, but it is just high effort thinking by default.
I heard you bro.
There are no particular hallucinations, I don’t know what kind of questions you ask her
Because "gpt5-mini" is gpt5-mini-medium. So you need to compare it against gpt5-medium
Her???
It's also a reasoning model, even nano is
even if pro is just high-effort thinking (it has more capabilities than this), it’s only available in chatgpt Plus not via API. LMArena can’t test it directly, so it doesn’t qualify as a model on its own on LMArena
I need to invite my friend, but I don't know if he wants to. And there are some bugs on LMarena right now
@quiet dust so what do you say what model is the best in lmarena direct chat?
and gpt5-mini-high is your new o4-mini-high
though that version of gpt5-mini is not accessible on chatgpt. They do not want to cannibalise full model and make this confusing
Well, this is just your opinion, not supported by any facts, I don’t believe in it.
WHAT?
Dayum
dude 😭
Grok 4, Gemini 2.5 Pro, GPT-5-high
yeah im sure that gpt-5 PRO is barely better than medium, thats facts guys
literally many youtube videos available on the internet proving that gpt-5-pro is much more capable
From where are you pulling all this up?
On the Internet
for what is this for
Oh okay
🤦♂️
What is the daily limit in the video arena? I suddenly have 0, others have 2, others still 8. 🤔
yeah looks about right. Nothing new there
To understand what a pro model is, and why it can't be that much stronger than medium-effort thinking.
GPT-5 mini - This is the Thinking Medium Mini model.
Previously they had different models and naming issue now they have different versions of gpt5 issue
Wow openai wow
it's not "much" stronger, we know that. but it is factual that it is stronger than the medium
someone asked: "will these models ever become part of the battlemode? GPT-5-PRO and other models"
you replied with: "pro already exists".
but the PRO model ITSELF doesnt exist on lmarena ( which was the question asked ) even if something is similar to it
I can't explain this to you several times anymore.
nor can i because you wont bother trying to comprehend what im saying
👋
GPT-5-HIGH is a high-effort thinking model. Pro works the same way
yeah but the person asked for the pro model itself, which does not exist on lmarena
not gpt-5-high thinking
Gpt-5-high - already works automatically on thinking
Yes. You need to look at gpt5-mini-minimal if you want to view it in the context of models like gpt4.1-mini
i'm not saying it isnt tho...?
I provide evidence, and then they write, “No, these are different models!”
I provide more evidence, and again the response is: “No, these are different models!”
I'm already tired of this
gpt4.1-mini was the same base model as o4-mini. Though it had advantage over gpt5-mini-minimal in that it didn't have 'redundant' RL training for reasoning (when that is not being used)
Okay, thanks for answering. I really didn't understand for a long time, there was a lot of confusion....
Musk: ahh soo much confusion, guys just use supergrok and enjoy throw this gpt stuff out
because they are bro
😭
different variations
😭🙏
Okay, we heard you.
Just accept gpt5 chat is the most smartest model guys
🔥
It is faster than wally west
yes it can
Better quality than speed
people mad that gpt5 is more professional now, and can no longer be their ai partner 💔
People should go to c ai
On the PC version this can be fixed
literally no
He is very bad
anything above 500 lines of code has the ai drop to its knees
i'm so sure you're asking it "gpt code me gta 6 in only html"
GPT 5 high is a bit better, but still garbage
ofc dude
its giving me 1000 lines of code that work perfectly fine, so idk what youre on about
I'm kinda still waiting for artificialanalysis to test it. This should perform better than gpt5-minimal (confusing... But gpt5-chat is their only model of the series which is exclusively non-reasoning)
yeah, you ask it for "code me an unnecessarily large file that does nothing"
me when i ask gpt 5 to code me gta 6 in only html, then blame openai why it cant do that
333 lines of code, and it fails to patch a simple bug that I intentionally added
that screenshot sure proves a lot of things
im so sorry, i was wrong the whole time
Yes
I apologize DeNew779
What an intelligent user
1442 lines of code, and it perfectly did it one shot
see how dumb it sounds and looks
again, even a dog can code simple stuff that looks massive if you train it
its not simple at all lol... it uses a lot of maths and calculations
code that in any way isnt self-contained and interacts with outside stuff is broken
im not sure what you are using
quality also depends on your prompts
gpt-5 high
and spits out text code
Gpt 5 high or Claude opus 4.1 for code
They should have named it GTA instead of GPT so we'd make fun about the fact GTA 6 is never coming out
GPT 6 defo before GTA 6
gpt 5 high to code, opus 4.1 for finding and debugging bugs
Gta 6 before deepseek r2
Guys, tell me the most popular use cases for gpt 5 Pro, o3 pro, gemini deepthink
Exclude problem solving stuff
Rip
Roleplaying
Aren't custom gpts perfect for that?
"Juice: 200" in custom instructions. Redneck's gpt5-high
Analysis of chord progressions
music writing
Did you guys know that you can send system prompts on lmarena
Okay, so now onto the next feature for my extension
Hi, i really want to make an account for lmarena so i can save my chats, can i do that?
actually showing the files being sent on the message
There are no accounts right now
but for example if i use safari and delete history, the chats will be gone?
should be, yes
ye but i want it to be saved like in an account, you know
not possible as of now
Idk why, but for me 4 Sonnet 32K is better than Opus 4.1 16K: the code seems better and the responses seem more natural. Overall, for me gpt-5-high is on Opus 4.1 level, while Sonnet 4 is crushing them both. It's my opinion.
And why isn't opus 4.1 thinking on the leaderboard?
deepseek r2 will be trash.
they simply dont have the nvidia gpus
Is it a bug that gemini 2.5 pro doesnt use its thinking sometimes or does it just decide when it needs to???
i blame the chinese gov
Anyone seen the size approximations for the GPT5 base model?
R1 was already trash imo.
ofc JUST PICK ON THE RICH ONES UUUUUU BULLY
yeah like all other chinese robots they are making just for propaganda how GREAT CHINA IS
the only great thing china does is their nice cargo and bulk ships
:3
i love big vehicles :3 rawr rawr furry uwu
You need Haiku reasoning then
or gpt5-mini-high which does exist
gpt5-mini-high 'juice' is 256
So higher than gpt5-high
💀
i like furries 🍸
i mean better than saying you are a loser.
you are not tho
i forgot did i mention i might be a furry on my carrd i forgot
a third time would be a charm!
so youre confusing me on purpose arent you
Hey we're actually looking into an issue that it sounds like you're experiencing, I'm having trouble reproing but can you share more details about what you're experiencing here: #1395438935676817428
how old are ya?
18
do u know masenko or ratgrave or vrchat?
i dont know two of these but i do know about vrchat
/afk
let's try to keep conversation related to AI please
ngl i kinda want an off topic channel
but the thing i want to say is
people are having limits of 0 videos
what
oof
This has been flagged to the team. It's pretty strange as it's not happening to everyone. Some have the standard 8, others have 3, some have 0.
We're looking into though.
For me it's 0
Man , I literally cry when all my saved memory gets reset 😭
wdym
what
Chat history gets reset after every update, any way to prevent that?
wdym wdym
no
there are no accounts
oh damn
We're sorry about this; it's very understandable how frustrating this can be. We are looking into features that'll help with this issue.
are we allowed to ping @ pineapple?
yes
Thanks team, btw this app is a banger, love everything about it, but promise you guys won't charge us in the future 🫠
No, it won't be trash. DeepSeek won't release any model before getting good, respectable results (that's the reason for their delays)
they have great engineers tho
@echo aurora in battle mode, a vision-enabled and a no-vision model may be selected, but we still have the option to add an image. How is this handled by the app? Just wanted to know
the image is sent to the vision-enabled model, and not to the no-vision one
Can anyone please explain how many generations we get every day?
i think 5 for claude
idk the rest
then how does the LLM with no vision answer to the query i sent..?
i remember around 20-25 messages per hour for o3
it only sees the text you send / it gets no image input and gives gibberish
or at least I think so
no but it actually reads the info!?
only sometimes this happens
if possible, can you tell me what you are comparing
which models
ill go test it out
i dont specifically remember actually... tho when i select those exact models in side-by-side chat, the image input gets disabled
which needs decades to create something without help of machine learning.
I see
then i really dont know, sorry
also I had another query: in battle mode, if i send a math query and, let's say, I get a wrong answer from one of the models (model B), I vote for the model which answered correctly.
In the next query in the same chat, is there a chance the new model selected in place of model B gets affected by the previous wrong response?
no problem its okay
When an image is uploaded the models available (via battle) will only sample from models that have that capability.
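The capability-based sampling described here can be sketched in a few lines. This is a minimal illustration under stated assumptions, not LMArena's actual code: the `Model` type, the `supports_vision` flag, and `sample_battle_pair` are hypothetical names, and it assumes the two battle models are drawn uniformly from whichever models are eligible.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    supports_vision: bool  # hypothetical capability flag


def sample_battle_pair(pool, has_image, rng=random):
    """Draw two distinct models for a battle.

    If the prompt contains an image, only vision-capable models are eligible.
    """
    if has_image:
        candidates = [m for m in pool if m.supports_vision]
    else:
        candidates = list(pool)
    if len(candidates) < 2:
        raise ValueError("need at least two eligible models")
    # sample() draws without replacement, so the two models always differ
    model_a, model_b = rng.sample(candidates, 2)
    return model_a, model_b


# Hypothetical pool: one text-only model, two vision-capable ones
pool = [
    Model("text-only-model", supports_vision=False),
    Model("vision-model-1", supports_vision=True),
    Model("vision-model-2", supports_vision=True),
]
a, b = sample_battle_pair(pool, has_image=True)
print(a.name, "vs", b.name)
```

Because the filter runs before sampling, an image-bearing prompt in this sketch can only ever be paired with vision-capable models; a text-only prompt draws from the whole pool.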
If i send the image after the models have been selected then what happens?
Oh I see now. That's a good question. Let me check with the team and followup with you.
thank you. i meant something like this:
Yeah I'm following now.
In the above case tho, something weird happened.
Regarding this question this has been a topic of a lot of debate internally. I'll be sure to raise again as it's an important question.
Alright! Thank you
@echo aurora
So I had a different experience, which could be our answer:
I did something similar - "hey there"...uploads image... votes. With my experience when the vote happened both models were the same for the first and second response. Since it's different than what you're seeing, what I think is happening is for your case the two models (or one of them) you originally had don't have that capability, but when you uploaded two new models were selected. Lots of assumptions on my part but regardless I'm going to double check.
Yes! I think that is the case...
Yup! This is for sure a bug that we're going to look into and fix. My apologies for the inconvenience.
Good to know, I'll also raise. 
Also @echo aurora, i've been wondering: does lmarena sanitize the chats before publishing its datasets on huggingface? Sanitize as in removing any personal information, or is it solely the user's responsibility not to share any info they deem personal?
packets?
hm?
for being able to add files to your messages
oh wow! will you be sharing it on the chrome extension store?
Yes, absolutely. We do aggressively filter out PII before releasing.
big +1!
Can you share a bit more?
alright...
+1!
its rejecting all create-evaluation (starting the chat) but accepting the post-evaluation (continuation of chat)
having the ability to share pdfs would do wonders
at least when im adding my files
in the works, maybe soon
r u reverse engineering the message sending??
already have
i havent changed the logic, yet im constantly being rejected
before it worked
hmmm
im slowly losing my sanity
im sending 1:1 the same thing as before but now its broken
wow thats fire
hmm maybe the server is not accepting the packets now (maybe their structure changed or smtn)
changed in the last few hours 😭
no way im sharing it when it has severe mental illnesses like this
im not gonna embarrass myself
in DM?
Hey guys! I wanted to know if there's any way to "bypass" the chat so I can post "inappropriate content"...
or any other way to get Grok 3 and 4 completely free and without limits.
ill finish the basic stuff and then ill give it to you
sent a req
ahhh
i think the moderation is server-side so
@tired herald how did you attach a python file?
I'm going to be running this poll periodically, we'd love to understand better why. Please share in the thread!!
extension, ill release later on
@echo aurora
how much I fix the problem
There is not.
So apparently I'll have to suffer waiting every 2 hours to put together a gore story lol
btw gpt5 sucks
Our team is on it 
good
not fixed
agreed
yup
wait why am i the only one selecting battle
Reminder that on discord you're able to block other people if you're not a fan!
i actually find 3-4 math questions, solve them myself, then verify with gpt-5 high
then send them at least 5-10 times in battle mode
Hmm what you mean? I'm seeing side-by-side & direct. Can you send a screenshot?
they're actually just olympiad level
PINE AND APPLE
U NEED TO BAN GPT5 HATERS like @eternal niche
*selecting battle... everyone else selected direct chat - thats what I meant sorry
yo chill
they get on my nerves with their irrational yapping
i kinda agree with you but freedom of speech
FOS doesnt mean you can hate on my babe gpt5
I'd recommend blocking them! I'm not here to police opinions.
i use it a lot myself 😂 (i hated gpt-4o's emojis)
pineapple i am joking. you are so serious lol
nah, he/she/they are just mature
oh
yes. like myself. very mature
sure @willow grail
@echo aurora is there a way we could know the rate limits of every model, so we can use them wisely (not wasting credits)?
I did raise this with the team recently. It's possible; it would just need some thought and to be prioritized for development.
and also could we add the option for choosing system prompts in direct chat?
somewhat like the gems in gemini.google.com
I have a few instructions like "use latex", "do not solve the given question, only give it as text w/ latex", or "judge my solution and give it marks", etc.
right....
also can we bring back repo-chat
Also possible! Something I've also shared with the team recently(ish).
i believe i am asking wayy too many questions
Would encourage you to check out #1372230675914031105 , some of these requests are on our radar + it helps us organize these requests better.
That's okay! nothing wrong with that.
sure!
i see this has already been posted in the feedback forum
freedom of speech
i take back my statement now. no FOS
duh! ofc he is 😂
thats so cute ngl
both are good
yeah
Got confirmation this is whats happening 
This is a bug, our team is looking into asap.
I am sorry you're running into this.
who
Lets move on please 
Anyone did Nano Banana comparison?
better than gpt in consistency and designs. it will be great for visual novel authors and help them work fast.
tho it's also very bad at redrawing/enhancing, unlike gpt
gemini 2.5 pro
Meh.
Thx, I've seen several one-off comparisons, so I'm not sure
who cares
no
@echo aurora
Any clue if we’re getting a new leaderboard update soon? Can’t wait to see how GPT-5-chat does!
Shouldn't be too much longer!
Today?
🤞
I can do that already lol
Hope so. The vague-posting is so frustrating!!
should I add that to my extension?
what extension
im making an extension for lmarena
to allow adding files like code files
its pretty good yes
yes gemini 3 - SOTA
ROFL
im only here to improve my own experience with lmarena
slide the extension
alrighty
though ive got most features working well
nope, i just use direct models
is it really that good?
By far, what’s the SOTA model for tool-calling?
Isn’t that SillyTavern’s work(
Gonna try it in my n8n workflow
I don’t want you to give away secrets, but can you give us a smallll hint? Sometime next week you think?
I can say that it'll for sure happen at some point in the future.
I have a question... Is the message that says "daily video limit of 0 videos" a mistake, or is it different from the 8-video limit message (like one of those AI tools with one-time credits)?
this is a bug we are looking into!
s
@amber warren thanks, so im safe...
well, I think toad died on me
toad from mario games only come to my mind every time, lol
😭
Hello
hi
Hello.
no, the other model finished
and apparently toad finished without giving an answer
so its loading inf
Since it's a model that's behind a codename it's only accessible through Battle mode, meaning you won't be able to select it.
Also, is this also a bug?
hello, I used the ai battle option and uploaded an image and gave a prompt to edit it, there was a model named nano banana but it wasn't in the model list when I want to do side by side or single ai
claude opus 4.1 or gpt 5 high
gpt 5 high
wdym
its also very limited
no, theres i think a limit of 5 messages
either 5 or 10
none
none
its pretty good
thats great
step 3 is very chinese
I really wonder, should I add the ability to add custom system prompts
what the hell
Not today, it seems 🙁
They usually announce updates by now
did they just change the names?
not sure if these are the same ones with different naming now
holy poop
that can definitely fool more people
its mostly good, yes
[FILES_START]
{"files":[{"name":"test.py","size":7,"mime":"text/x-python","content":"Nothing","truncated":false}]}
[FILES_END]
this is how the files are sent
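A sketch of producing and reading that wrapper in Python. It assumes the markers sit on their own lines around compact JSON, that `size` is the original content's UTF-8 byte length (7 for "Nothing"), and that `truncated` flags content cut at some character limit. `build_files_block`, `parse_files_block`, `MIME_BY_EXT`, and `max_chars` are hypothetical names, not the extension's real code.

```python
import json

START, END = "[FILES_START]", "[FILES_END]"
# Hypothetical extension-to-MIME map; anything unknown falls back to text/plain
MIME_BY_EXT = {".py": "text/x-python", ".js": "text/javascript", ".json": "application/json"}


def build_files_block(files, max_chars=100_000):
    """Wrap (name, content) pairs in the [FILES_START]/[FILES_END] format."""
    entries = []
    for name, content in files:
        ext = name[name.rfind("."):] if "." in name else ""
        entries.append({
            "name": name,
            "size": len(content.encode("utf-8")),  # assumed: original size in bytes
            "mime": MIME_BY_EXT.get(ext, "text/plain"),
            "content": content[:max_chars],
            "truncated": len(content) > max_chars,
        })
    # Compact separators reproduce the space-free JSON seen in the chat
    payload = json.dumps({"files": entries}, separators=(",", ":"))
    return f"{START}\n{payload}\n{END}"


def parse_files_block(message):
    """Return the list of file entries embedded in a message, or [] if none."""
    start = message.find(START)
    if start == -1:
        return []
    end = message.find(END, start)
    if end == -1:
        return []
    # json.loads tolerates the surrounding newlines
    return json.loads(message[start + len(START):end])["files"]


print(build_files_block([("test.py", "Nothing")]))
```

Run as-is, this prints the same three lines shown above. Keeping the JSON between fixed text markers means the model just sees it as part of the prompt, while the extension can still pull the files back out of a message.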
very cool
howww?
it workss?!
great!
yes
🙂
ill make a vid of it in a few hours when I have it done with a button to customize it and all
:))))))
find whats wrong
Huh
Im not doing this for gpt 5
I dont even use it
😭
PDFs are prob not possible for me, they'd have to be implemented in LMArena itself
But ill try
whos funding your project
skibidi toilet
Real
How do I access nano banana? Not sure which tab it's under
All models that are using a codename can only be used in the Battle mode. You won't be able to select them from the drop down in Direct & Side-by-side.
Ah.
✨
✨
✨ Bumping this poll btw!!!
✨
✨
✨
Uploading images for editing seems to help, and/or more complex/uncommon queries (tho i could be imagining it)
btw why is it possible to vote for models in side-by-side? shouldnt this only be possible if we dont know the models' names
or is it just a ui bug, and votes dont count anyway
Maybe it's weighted accordingly
(which i am not so sure about because i got a cloudflare captcha before voting)
It apparently doesn't know what a Klingon is
It's bad at a lot of things other models are good at, but it blows everything else out of the water at what it's good at
They way you instruct it matters a lot too
I'm dying to ask, but it's kinda personal. (i think)😭
What version do you use the most?
14
22
3
Direct
do the web-analytics agree with this one?
my bro
I wouldn't be able to say
Grok can now do corn videos in $30 plan...
what?
grok can make videos now?
theres a new attachment button?
wdym
it doesnt have censorship?
ohh
we can do this tho - OCR the pdf?
or just simply extract the text from it
You can turn spicy mode
That doesnt even make things spicy if you ask
Its making it always
lmao
Not here dude
elon musks knows his audience
why
Maybe in DM
Sure.
what
I'll send a link to an article with the censored version
musk needs to be stopped
Gooners though will like that video generator
Just being honest
The companion update is popular though
Too late now
Billions are thrown for the AI tech industry
And still most models dont know which number is bigger: 9.9 or 9.11
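As plain decimals the answer is unambiguous: 9.9 = 9.90, which is greater than 9.11. The same two strings order the other way when read as dotted version numbers (minor version 9 vs 11), which is what makes it a trick question. A small Python illustration of both readings; `bigger_decimal` and `later_version` are just illustrative helper names.

```python
from decimal import Decimal


def bigger_decimal(a: str, b: str) -> str:
    """Compare two numbers as decimals (9.9 means 9.90)."""
    return a if Decimal(a) > Decimal(b) else b


def later_version(a: str, b: str) -> str:
    """Compare dotted version strings part by part (9.11 means minor version 11)."""
    ka = tuple(int(part) for part in a.split("."))
    kb = tuple(int(part) for part in b.split("."))
    return a if ka > kb else b


print(bigger_decimal("9.9", "9.11"))  # 9.9
print(later_version("9.9", "9.11"))   # 9.11
```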
gpt-5 finally fixes that problem i think
yep
gpt5-high. Made it output 16k lol
Wow
thats impressive and detailed
I tested gpt-5-high and i am disappointed by his censorship of things that are not even a bit unethical
hello
What censoring stuff did you test?
I wanted to code an interface for deepseek r1 and this mf hid the CoT. When i asked him to show it to me he said he can't. I tried multiple times, prompting it with a "dev mode" where i should see the CoT, but it still didn't work.
I think its system prompt
LMArena doesn't show chain of thought ever
When openai said he can't show his CoT, he took it too literally
I dont mean it like gpt-5 cot, and i tested it in chatgpt
Not lmarena
Read my message again
Ah, okay
I see
grok 4 is better
Ahem... GPT-OSS
It does have censorship
Yeah
It's bad
I don't like it in general
It can only speak in english
Even small qwen models are multilingual
With lots of languages
best os math model tho in my opinion
that would be a problem if you want a description of what the pdf is, but its possible
and it should soon allow custom system prompts
@eternal niche do you support musk
going through my account and deleting all my anti 2.5 pro comments