#general
Now add the bonus points for reasoning and R2 will have a great chance to blow OpenAI out of the water
Or we can take a look at LiveBench, where GPT-5-Chat is right around GLM 4.5 and Kimi K2
78 points for high - 60 for chat
18 points difference
Not bad
It is for instance the same difference as between V3 and R1-0528
Now add this same difference to V4 - that will definitely be no worse than K2 or Qwen base - aaaand R2 can appear at least as good as Gemini
Just kidding, on LMArena it is just as much as between old V3 and R1
What is the default ranking of the text benchmark? It's not the "Score" as I can see
i use gpt5 high with 2000000 context, it really makes good responses for me
lmarena ai
Can the image or video be used commercially?
Hello, I'm new.
you have it right in the picture, it is the rank (that takes the CI into account). You can sort by score if needed.
uhhh
September 2024 didn't have GPT 5 official release to the public
I agree it's confusing. But they are taking the deviation into account there. +/-8 and +/-5
So max for o3 is 1449 + 5 = 1454
max for Opus is 1447 + 8 = 1455
1455 > 1454
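The comparison above can be sketched in a few lines. This only reproduces the arithmetic from these messages (score plus the +/- value). It is an assumption that the leaderboard's real ranking rule works exactly this way, and the dictionary and function names are made up for illustration.

```python
# Scores and +/- values as quoted in the messages above.
models = {
    "o3": (1449, 5),
    "Opus": (1447, 8),
}


def upper_bound(score, ci):
    """Best-case score: the point estimate plus the +/- value."""
    return score + ci


# Sorting by raw score and by upper bound gives different orders.
by_score = sorted(models, key=lambda m: models[m][0], reverse=True)
by_upper = sorted(models, key=lambda m: upper_bound(*models[m]), reverse=True)

print(by_score)  # ['o3', 'Opus']
print(by_upper)  # ['Opus', 'o3']
```

So a model with a lower score but a wider interval can end up ahead when the upper bound is what gets compared.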
@echo aurora i keep getting this error, even after changing internet and browser
What's currently the best coding ai?
In terms of actually coding (Speed doesn't matter)
how do i select gen models for image gen
❌ Generation failed. Failed to create evaluation session. why am i getting this msg
You can do that on lmarena.ai but not here, I think.
Hello i have a question?
image model
in my opinion, gpt 5 high
You can select specific models when using Side-by-side & Direct modes on the site.
Whats up?
Can I generate videos with Veo 3? This video generator
Yes, but Video Arena currently looks down.
because
Getting a something is wrong error when attempting image edit battle
Regular image generation too
Hey everyone - it looks like Video Arena isn't currently working. I'm setting those channels to read-only until we fix the problem.
Hmm okay you're getting on regular site too?
There might be an outage...
Same
I had to make a new chat to make it work again
What happened to the video generator??
Text battle works
There are many errors happening, a possible outage
I wasn't able to repro
People are too eager to make images and videos it seems :D
Look at this image that I generated with the model, I converted an image to a realistic one.
It's good, the more users there are on LMarena, the more accurate the leaderboards are
Is a test
This is absolutely horrific
It is. People still say LM Arena is a benchmark which it isn't. It tests on real life problems and usage.
Lol true
I don't understand the hate people give it on Reddit for example
Back up now
Ugh it's working again for me. (I know that's a good thing but still)
Or not
This happened yesterday too
False alarm ehehe
Nah not working
And errors again lol
On SWE bench and rebench GPT-5 was ahead of everything IIRC.
If you're looking for open weights model, there is Qwen3-Coder.
In my experience, Deepseek writes more efficient, smaller and elegant code - but it is slower, and it sucks at SWE bench
LM arena
Hahahahaa
is collapsed
Ope it hasn't errored out yet this time let's see
me too, thank you for flagging
Gpt-image-1
what the... The heck is that
cooool
Aaand now I'm getting errors again
Me too
Doppelganger marina from Fear and Hunger 2
Doooog
not mine
lol
I wish I had a dog
Cuter images of her I generated in Sora a while back
They are okay, I guess
meow meow
They really need to fix the yellow tint
in the next version
I agree fully yea
oops
Okay it should be fixed now.
thank you very much as always
Realistic
Had two outputs where something went wrong with one of the responses even after retrying
is gpt-5-high just the same as gpt-5? i cant find any API pricing
Three now
Zoom out this picture like it was taken 30 meters away
Nano-banana
Amazing
You can see the yellow car
Yeah
wow.. image models are getting crazy.
Yeah
Still bad on some very specific tasks
but this is the worst it will ever be
editing won't be a thing in a few years, mark my words
theres no nanobanana
@echo aurora only getting these now
refresh
cuz mine works
generating images is like a game for real for real
like sometimes i spend 2hours just doing that lmao
I had refreshed yeah
Are image outputs being logged?
obviously yes
Hmm okay lets give it a few and try again.
they are in an amazon s3 bucket
they expire after some time but they can be retrieved
so yes, images are logged
oh i see, i almost put someones face
If I hit retry enough times it eventually goes through it seems
i put mine idc
if you post on instagram
it's the same
does this server have a list of all ai models, written in one place? I wanted to list down all the ai models available on LMArena
The leaderboards do but they don't include stealth models like nano banana
I am getting imagen-4.0-ultra-generate-preview-06-06 much more than nano-banana
And in all my tests, imagen-4.0-ultra is far better than competitors
Hitting retry on dozens of errored outputs only to get something that looks like absolute ass 🥲
Nano is better
You'll get nano nonstop if you upload an image or multiple to have it edit
Weirdly enough I haven't gotten nano once with my "turn doppelganger marina realistic" image edit request.
"gpt-5 fixed hallucinations"
Maybe nano doesn't like the prompt but keeps getting selected to do it and that's why I'm having such a high error rate???
bruh
Impossible to use
You know why I like this model? Because it has good image quality and can improve the quality of photos
me too, same error
lmarena died 😭
@echo aurora problems still occurring it seems
it's true only for the most powerful gpt 5 models, such as the thinking modality high
true, high has like 200 juice it's crazy
Yeah, keep getting errors now
A lot of errors...
Okay thank you, flagging to the team.
Yeah
Try with my prompt?
Turn doppelganger marina photorealistic, she has no face just a giant mouth.
Do I have to 
Ehehe sorry 😛
Yesterday discovered this new AI and Im so happy to see the BIG step is being made right now in AI 🤯😳
What horror am I seeing 😭
Your AI model has suddenly gone viral in Taiwan, being widely used to generate PVC character model images.
It’s unclear whether the sudden surge in user numbers is the cause, but the service feels like it’s experiencing a DDoS issue.
A cutie patootie
Although this is current, I asked him to improve the quality.
The first image is better quality
Pretty good, which model?
in korea too
Nano banana
so popular now
Expected. Did you try milking it more?
Yeah
How do I use nanobanana??
Yeah
Is it here or on the lmarena website
I'm on my cell phone
Can anybody assist me with where to use the nanobanana feature?
On the website, in battle mode. It will appear in random intervals in the battles
you cannot choose the model since it's a pre-release model.
I found this thanks to a YouTube video
Oh thanks! Been looking for it all through
Can gpt-5 think in chinese?
i dont know anything about nanobanana, can you tell me something to understand?
Is the purpose of generating images to enhance censorship or not?
im curious about the scope of this
which is original
Flux is much better for this stuff
Yeah
Does anyone know what happened to flux kontext max?
The search leaderboard isn't updated for gpt 5 search. Would you guys consider the new gpt 5 search better than the original o3 search
So they are taking the right side of mean + SD? Why did they choose the right side and not the left side in this case?
@echo aurora it would be really good to increase the input limit in some way. if you guys add support for uploading other file types like documents, please don't put any limit on the file token size
lmaooo
they cant put no limit
the companies that give the model to lm arena
pay
it's not free for them
running a model isnt free lol
no limit no but like the current limit is quite sht
got u
Not in my experience. The router seems to prefer non-thinking for what it deems search queries for quicker responses. So not really close to o3 at all
maybe 20th august?
see like when i use gpt 5 it can generate 5000 lines of code, and that is way more expensive than processing 5000 lines of code as input, but they have the input limit at around 2400 lines
Hi
Getting error again
same here, on text battle
refresh
Already did.
No
Opus isn't better than GPT5 at coding lol
I used both for months
And weeks of the last Opus 4.1
GPT5 is better
I used it for Py, Golang
big doubt! Google never launches new models without LMArena testing... and i see no google model yet. 21-25 is not happening (90% sure)
yeah i guess
And some agentic stuff related to cyber security
That's what you think
Use GPT5 with API
Not chat
It's different
SWE is the worst benchmark, it's corrupted
You can't test a model relying on battlemode tbh
btw in lmarena for max performance with gpt 5 high say think very hard in the prompt
Try it on cursor
Install cursor
You have free credits
250
Try the model with your best project
but this is 2.5 pro. I am talking about Gemini3.0 launch
nah
they wont call it 2.5 for sure... they will have some random name like kingfall
give me a difficult prompt to test 🙂
kept getting these errors 🙁
really excited for distilled r2:a3b
Yeah, it's impossible to use LMarena today.
So many error messages
guys remember
in lmarena models still have context window limit
no bro not that limited
That's not the problem. I've been doing the same thing all week, and only today I'm getting this error message, just like other people.
the original model context limit
gpt 5 on api has 400k context limit
so it is 400k on lmarena
like how much? line of code
i'm not an expert
idk how many lines that is
you cannot input more than around 2400 lines of code
oh yeah
just explain it to the model
still experiencing errors :/ have to refresh it numerous times to make it work
I keep getting an error message
same :(((
idk why all are getting errors i'm fine
oh no
i get it too
jinxed lol
Yeah, it's the server
these are not normal
Trying out Battle mode
for those of you that use openrouter, saturn is giving away 500$ of credits to testers helping with identifying bugs. free money fr
i didnt get any issue this morning?
what was your prompt
weird, could have a limit on ctx
pretty expensive model
tried making my own prompt to be a study mode tutor
Yeah, I’ve run into that too. When that happens, the only thing you can really do is grab the last few messages and start a new chat.
I usually refresh the page about 10 times and it works normally again lol
if it happens again, i'll keep refreshing it
video arena keeps failing to generate
it always says "failed to create evaluation session"
LMArena servers are unstable today
hmmm actually idk i don't have any experience on that
maybe claude is better in that area
Hi, I don't see models like Seed-1.5-VL in the Vision Arena section. Is it because they haven't been added to the Arena yet?
oh ok thanks for the info
Ah okay so it's not just me?
yepp
Is it because of all the Taiwanese people using it to try nano-banana?
Someone said that earlier
the lack of a grok 4 mini indicates that grok 4's "sota" was just achieved by scaling
because they basically pulled a deepseek with 3 and stole o1 chains then claimed sota which got them a ton of illegitimate funding
it's not workable and it's benchmaxxed because they ran out of models to steal from (like deepseek, but we'll have to see how good r2 will be)
Try SUPAGROK
well
not literally but
it stunk of qwq and r1
when you scanned its semantics it showed up directly next to them and o1
which is basically proof
the best thing about qwq and r1 however is that they opened up powerful stuff
grok hasn't been open since the first prototype
grok 4 is the result of a company realising they ran out of stuff to yoink basically
it's not THAT bad but like
unless you're running pure stem i wouldn't touch it
you said "nothing good is "open""
grok 3 blew me away in chemistry but nothing much else
chemistry is stem
i was really hoping gpt-oss would shine as a chatter but it turned out to be the complete opposite
like phi-4, it's just benchmaxxed on synthetic data
grok 4's math answers, even when i ask it to explain, are like a 1 word final answer
and takes a lifetime
to answer
like, kimi has actual soul
it's nice to work with which is what you want to focus on instead of pure intelligence
even if it happens to be SOTA clever (well, it had to be), GPT-5 shines in the same way
i feel like intelligence as a benchmark plateaued at o3
and even then the best part about o3 is that it was really workable and readable
if grok 4 tried teaching me something it'd be like taking 20mg of ambien
fr
in my opinion the o1 series sucked because it never focused on the latter
o3 did
gemini 2.5 pro is better in physics STEM AFAIK
but grok 4 did better on a private bench
i think they're roughly equal at stem, no?
yeah
gemini 2.5 pro is a tad better
tho
bench
i remembered incorrectly - 2.5 pro is better
a little surprising for physics
dude
grok 4 is the biggest benchmaxxing block of tra—
hmm
When they are gonna add midjourney
WHAT DID I LITERALLY JUST SAY
i wish there was a way to convey to laymen when a model is good to work with
but NOPE
numbers numbers numbers
f∆ck the numbers
"oH gPt-5 dOeSn'T sCoRe wElL oN [n]" and then i try it on the subject and it's brilliant
how to?
Hi.
Absolutely wrong
There are a ton of open source image gen models that are amazing
why do i have to say gpt-5 with reasoning every time
And only getting better and better with time
open weights is not open source
i think that's what he meant
i thought that too
It’s a joke because it would’ve been called o4 were it not for the rebrand on model naming
Has anyone tried stable diffusion?
so.. umm is it fixed yet?
Not yet, our monday just started 
patience is key
lol i didn't know that..... i'll be waiting "patiently"
prompts for video-arena-1 and 2 and 3 don't work!!
How so? I'm seeing successful generations.
most of it
however not all of it
noticed a new model on the image arena: lucid-origin?
Yeah, just now noticed it. Testing it out
gpt5? it kinda is
well it's o5 if we try to fix their naming mess with o4-mini lol
They potentially did that on purpose to arrive at 5
and then merge
with gpt
why is the difference decreasing every day?
wow
like gpt4 to gpt4o was perhaps even bigger jump than o1 to o3
it went from 21 to 3.. it's crazy
kinda crazy
daaaaaamn
Do the LMarena devs/staff know who owns the Nano-Banana model?
When did we get this update? It shows Aug 16 but I did not see it yesterday...
I think people may have gotten familiar with gpt5 style and that they shouldn't like it, and so now they are voting against it more often. LOL
IIRC the 21 point difference dropped to 6 after they merged the lmarena GPT-5 with the public API model, which was supposed to be the same model
The model will tell you it's google
Or rather the initial 15 point drop happened after the merge. It makes sense that it would continue dropping as more votes come in if the public API model is weaker than the LMArena model
??
If it is a different model why the hell are they merging the results
Because OpenAI claimed it was not a different model
But it did get worse after the merge and the current dropping is consistent with that
Sheesh what a scam
I don't like to accuse companies of lying because most things are just misunderstandings. I just don't know what else it could be
why is gemini 2.5 pro getting so close to gpt-5 high
Perhaps read the last 10 messages or so
Does this new update include the votes that happened after the 'warmer' change OpenAI made?
leaderboard update*
The API doesn't have that change
So no
This is pointing to the specific GPT5 high endpoint it does not just change over time
Otherwise that would be terrible for production
kaunsaa
Forgive my ignorance but why is it difficult to change API if there is no issue in changing the model in your website/app?
API endpoint*
what are video limits per day?
The API endpoint doesnt change because it is supposed to be constant
Thats the point of having an endpoint
They dont change it on purpose
the nano banana model is never used anymore in battle, how could i fix it ?
Hw
So, if they want to update the endpoint, they have to call it a different version with a new API? So, if OpenAI wants to update gpt-5, lets say 08-20, they have to name it as different version with a new API and it has to start again from scratch to get votes to be on leaderboard?
JUST BOUGHT cat dry food for just 1.15 euro per kg.
i feed this to my babes.
guess how many babes i have?
yes
ROFL
Technically they could change the model behind the endpoint without changing the endpoint, but it would likely cause a lot of breakages
Of course I wouldn't put that past OAI in the first few days after a launch since the breakages would be less severe
So was nano-banana removed from Battle?
I haven’t gotten it in a while myself
I honestly wish we’d get an announcement or something when models are removed from the site 😬
Flux-kontext-max got removed?
Hi guys
again...
Hello
Like I said, they need to do an announcement when models get removed 🙄 This is ridiculous.
4
nano-banana is cracked guys, i just tried it on and HOLY
Are you still getting it? I haven’t rolled it in a long time
Luckily google will release nano banana with pixel 10 launch in two days.
i mean it's up to the gacha gods, sometimes you get it, sometimes you don't lmao
Hello, the Cloudflare Super Bot Fight Mode page appears every 15 minutes. Is this happening for everyone, or is it an issue on my end? I don't think it's a bug, which is why I'm reporting it here
42
^^
O.O
they all black tho
a bit of white/gray spots. very little
they can fly even
and do glug glug sound
Where’d you get that information? Unsubstantiated claims like that shouldn’t be passed around if it isn’t true.
No
That doesn’t even make sense
It’s all randomized
No it doesn’t
You don’t get rate limited in Battle
Only in Direct Chat
Why google models are good in everything except coding ? 🤔
my hopes for G3 are dwindling. After the not so great gpt-5 release, I think the progress in model quality improvements might be slowing down
Alright this is nonsense…my prompts keep getting refused in direct chat now but they worked just fine yesterday
no rofl
just refresh page
nano tomorrow?
nano banana release
Yeah Idk what’s happening
How is that supposed to fix it?
Also, why do people keep thinking nano-banana is getting released soon?
it just does..trust
Nope
Google has an event on the 20th
because a lot of models release days after they are anonymous..
My prompts that worked yesterday are getting refused now
And I did refresh
I even deleted the tab
I didn’t know that
No, not at all
I don’t use this for any of that
That’s what I have my local models for lol
No I mean I use local image gen models
Illustrious and NoobAi based models
Hands down the best models for anime and stuff
???
Lol stop
Not that I'm aware of, should still be there
I haven’t rolled it like…at all
Except a few hours ago
Now it’s only been Flux and stuff
i just got it lmao, i'm using their webpage
secret stuff bro, can't share it
Dude wtf?
Oh lol
I’ve been using the site 🫤
Also
I think this site is a scam
ol times
They already have a pricing model!
Yeah
Scam
It's not even Nano Banana xD
It's funny when people talk about OAI as some small startup that we shouldn't criticize too harshly because they're trying their best
It’s not and the site should be reported
It's 10 years old with 7000 employees and a $500B valuation. It's not even close to a startup.
Stealing people’s money
nano-banana implies the existence of a micro and possibly meso and mega banana
this sheit so fake
its fake lmao
Yeah the site needs to be reported
That needs to be taken down
Look at the pricing!
I haven't been able to get info on why they called it nano, but it has been an endless source of memes
Wild that people are registering fake sites based on lmarena codenames
<@&1349916362595635286> which video model does your video gen AI use?
a lot of those sites are fakes
this is NOT worth pinging for
also this is lm arena
the whole point is that it's a gateway to different ai models
yeah, gotta be careful
there is no one model
hey does anyone know, on the Direct chat is o4-mini o4-mini-high or low or medium
lm arena doesn't even make models themselves
no i am asking which video models does your bot use for video gen?
the bot use plural AI?
"model" -> "models"
how to use nano-banana?
that website is posted on this github: https://github.com/Nano-Banana-AI/Nano-Banana-AI
What's the smartest AI in general? (no matter the speed btw)
24
32
2
Gemini 2.5 pro Deep Think
Yes it is
Hmm I need to compare this against gpt5. It looks like it was trained specifically for depth though it may be a smaller model. Try asking it for more details if you can ask the same model
was this the same model?
gpt5-high made a video game like thing LOL
this is impressive though
yeah I would guess this is smth like gpt5-mini-high...? unsure
It's like it follows the same patterns and style but can't quite reach the same complexity
welp
gpt5 overtook it then for these things imo
I didn't mean it literally. This is just svg
I only meant it looks like a videogame lol
bruh
hm try this
What is the smallest integer whose square is bigger than 120 but smaller than 150?
gpt5 can sorta kinda get it right, but it would be interesting to see how this one responds
Are you referring to my prompt? Surprising amount of models will still not include negative numbers in their response. Especially when you add complexity
integers include negative
Yeah this is incorrect lol
it's -12
this was gpt5 response. Not entirely correct, but almost:
"sorta kinda" correct
ok not 100% lol
but not entirely wrong either
I mean... I wouldn't call it a trick question necessarily. It's just making odd assumptions. Integers have a very clear definition
and they include negative numbers
It's just that models are quick to take the easier path and then justify doing that... 🗿
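A quick brute-force check of the puzzle above, as a minimal sketch. The search range of -20 to 20 is an arbitrary choice for illustration; any range that covers the candidates works.

```python
# All integers n with 120 < n^2 < 150, searched over a small symmetric range.
candidates = [n for n in range(-20, 21) if 120 < n * n < 150]

print(candidates)       # [-12, -11, 11, 12]
print(min(candidates))  # -12
```

Since integers include negatives, the smallest one is -12, not 11.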
surely you don't need 144gb of ram
tinybox is cool and has its uses but one tinybox would pay for 290 thousand gpu hours (on semi-mid gpus) via vast
anything george hotz makes has aura
its probably better to get a rtx 6000 pro tbh
There are different variants of hard. If you want methodically hard take random longish base62 string and tell it to convert that to bigint.
Like there are various things you could do...
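For checking a model's answer on the base62 task mentioned above, here is a minimal reference sketch. The alphabet ordering (digits, then uppercase, then lowercase) is an assumption, since base62 has no single standard ordering, and the function names are made up for illustration.

```python
import string

# Assumed ordering: 0-9, A-Z, a-z. Other base62 variants order these differently.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
VALUES = {ch: i for i, ch in enumerate(ALPHABET)}


def base62_to_int(text):
    """Convert a base62 string to an arbitrary-precision integer."""
    value = 0
    for ch in text:
        # Raises KeyError for characters outside the alphabet.
        value = value * 62 + VALUES[ch]
    return value


def int_to_base62(n):
    """Convert a non-negative integer back to base62, for round-trip checks."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))
```

Python integers are arbitrary precision, so a long random string converts exactly and the round trip can be used to verify a model's output.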
I doubt it would be able to output the entire involved game within response limits
maybe smth more simple would work
But that's quickly turning into messing with it territory rather than quick testing lmao
For lmarena it can be useful to actually include several tasks/questions in a single prompt even. Not practical to build projects there
norwesian petaflops for $2/hr (normally $6/hour which would take you 4.6k hours on one of these to overshoot the price of a tinybox)
so what is your "hard"?
So are you here just to cause drama and be a troll? In an Ai space and talking about ‘Ai slop’ and ‘anything that isn’t Ai generated’ seems kinda sus 🤔
ai should be used for stuff like curing cancer not creating buggy error-prone sloppy flash games i'll play for 2.5 seconds before realizing it's AI and closing it
im not anti AI.
Curing cancer is far too altruistic lol The government would snatch that away before the public would be able to afford any of it 😞
hello
One reddit user says:
you mean by the creative people the AI Stole from?
banana is amazing
from left to right
make it pokemon map poster style. and add more details. replace unrecognisable objects for recognisable ones.
AMAZING
@modest prism yeah. i really doubt that ... g4 sensor or what it is in pixel 10 can run nano banana wtf
how do you know that people don't subscribe, you can't be 100% sure
nobody knows
you can't just state something without proof and pretend it's 100% true
It stole that art from a human by the way the AI didn't actually make anything
this how our brains work too
100,000+ unique drawings by humans
i think it generated it by itself but was trained on those
and person called max is the first one to ever draw?
no. there are many before him
I think there should be something like SynthId for artists to mark their art as not approved for training
ai models that repeat training data verbatim are typically very bad. (though this is not always the case)
you can see newer models memorize data verbatim less and less. see gemini 2.5 report for an example
ai can already learn live or have memory like in the lab on super small scale but nobody wants to do that because too much compute to make an intelligent model/big model with these techniques? or money issues, like nobody wants to even try that
(idk im speculating)
pretty sure we could achieve agi already if we had dyson sphere
yes
"if we had dyson sphere"
real
elaborate the quoting
why did u say 2.5 pro deep think is smartest model?
7
11
3
im dumb
im curious and wanting to learn
not saying that im right!
there aren't any good options
if you can build a dyson sphere you can build basically any material. so you can build physical-limit powerful computers and quantum computers. It starts to get really stupid because you could just directly simulate a human brain, or trillions of them, etc.
ye thats what i meant
but humanity will never build one of those
yes we instead do ...
oh wait i dont want to get muted
L boring conversation with billy
not again bro
☹️
We are excited to release Nvidia-Nemotron-Nano-V2 model! This is a 9B hybrid SSM model with open base model and training data. This model also supports runtime "thinking" budget control. HF collection with base and post trained models: https://t.co/n3M01d8lSm
Hm
Did somebody say something ?
Must've been the wind ...
Ohh it was you paws !
Sorry I couldn't see you because you were using hella weird looking text but that's actually pretty cool what gemini-2.5-pro-grounding-exp generated for you there
Interesting...
Can't wait for juiced 2.5 then
erm
Flux kontext test
WHAT THE #### IS A KILOMETER
7
15
2
it's a kilometer
great
Can this be gemini 3?
gemini 2.5 pro update maybe
is lma down?
Very sorry to hear this. We were having outages earlier today that may have been related.
is it? 
yeah it is
no models are working
it just went down, it went down while i was using it
it looks like it
Flux colorization
Bruu what happened
Failed to connect to lmarena thingy pops up in the bottom right
we're down!!!!!!!!!!
its back online
Yeah you can’t select any models at all for Direct Chat
Yep this is happening to me as we speak
We're having troubles atm, team is working on it.
Sorry everyone, today has been very unstable.
Man…the site seems like it’s getting worse and worse day by day
Same thing happened with ChatGPT when they updated their image thing
The site crashed and didn’t work for hours
🤔
It's understandable as there have been a lot of issues recently. The team is working hard to keep everything working as intended, site reliability is very important.
It’s bad enough that the legacy site was removed with no announcement…😅😞
That honestly should’ve warranted an announcement for those of us that used it
We do plan to make an announcement when a change is made.
The site was removed. I’d say that’s a pretty big change, no?
Like no preamble or warning, just completely yanked away
Literally when I was about to use it
Honestly the only reason why I originally joined this Discord was to find out why the legacy site was taken down
The legacy site went down due to an unforeseen outage, and we've been considering options about what to do about it since.
I really, truly think the legacy site should be kept up as its own thing
It can do so many great things and the layout was pretty fun, too
It was easy to use as well
I loved being able to choose the temperature and amount of tokens for models as well
Yeah the legacy site has some great features, and we totally understand the value they bring to those that use legacy. Maintaining it though does require resources.
Similar reasoning -> we have limited resources and prior efforts went to other priorities. That being the case though, User Login is massively helpful for the community and it's very much on our radar; we're putting a lot of consideration and effort into it.
I thought you guys had backing from like very rich benefactors?
🤔
That seems very likely since it’s not cheap at all running and hosting all of the models
Recently we have started to hire a lot of talented folks to start building. Hiring/ramping up does take time though. Sorry, when I said "resources" I was mainly referring to time. We're growing and we're working hard!
Ah, I see
I honestly hope that LMArena does continue to let us infinitely do prompts in Battle
That is definitely the biggest positive of the site
Which part? The infinitely part or the Battle part?
(Reason I ask is we're going to circulate a survey soon to get more info about which mode (battle/side/direct) people prefer and why.)
Well
I use both
I use Battle to try out prompts and see which models do it better
Then I use Direct with the model that got it right and make more stuff with it
Battle also helps me know which models to avoid
Some of them are absolutely terrible
I focus more on the nano-banana there than the battle part
Are the images we upload and the results shared publicly, other than going to artificial intelligence companies? @echo aurora
For all the details I'd encourage you to check out our privacy policy, but I'd note this is shown to users when they prompt for the first time:
Your conversations and certain other personal information will be disclosed to the relevant AI providers and may otherwise be disclosed publicly to help support our community and advance AI research.
Do not submit to our Services any personal information or other sensitive information that you would not want to be shared publicly. By continuing to use our Services, you acknowledge and direct us to engage in such sharing.
thanks
Damn, we both in the same boat
Why do I never lose my chats lol
I have them all the way back till months ago when I started using LM arena
How is chat saved without login?
Locally
That doesn't make sense
Cookies and caches dude
But they don't ask for any permissions, do they?
No, but it's saved.
No not from that, but from the prompts we use, the votes we give and the information we share with the AI models.
Oh interesting
I'm enjoying seeing what everyone else is doing , this is fun.
Rookie mistake
Billy is not for fun or interesting conversations
He is
Is it not saved on lmarena servers?
i hate it so much when i spend so much time writing a prompt and then i lose the prompt because of the cloudflare thing
It generates auth token for you without you having to create account with email/password
I think if you copy that auth token or cookie whatever to another device
It will show the chats from that account
Never tried actually
which is smarter gpt-5 pro or gpt-5 high?
How can I create a figure image from 2D to 3D? I mean from my own image
That's right
@echo aurora Gpt 5 search in search arena what is it? Is it non reasoning or reasoning? Also what is the reasoning effort
When you first visit the site, you're given a random authentication cookie. Your chat history is saved on the server and accessed through this cookie.
You'll lose your chat history if you clear cookies or break it for some reason
hmm. not actually lost, just can't access it anymore
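The behavior described above (random cookie on first visit, history kept on the server, history unreachable once the cookie is gone) can be sketched roughly like this. It is a toy in-memory model, not LMArena's actual implementation; the class and method names are hypothetical.

```python
import secrets


class AnonymousChatStore:
    """Toy server-side store keyed by a random token held in the visitor's cookie."""

    def __init__(self):
        self._history = {}

    def new_visitor(self):
        # First visit: issue a random token that the browser would keep as a cookie.
        token = secrets.token_urlsafe(32)
        self._history[token] = []
        return token

    def save(self, token, message):
        self._history.setdefault(token, []).append(message)

    def load(self, token):
        # Only the holder of the token can look up its history.
        return list(self._history.get(token, []))


store = AnonymousChatStore()
cookie = store.new_visitor()
store.save(cookie, "first prompt")

# Clearing cookies means the browser gets a fresh token on the next visit.
new_cookie = store.new_visitor()

print(store.load(cookie))      # ['first prompt']
print(store.load(new_cookie))  # []
```

The old chats are still in the store under the old token, which matches "not actually lost, just can't access it anymore".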
Gpt 5 pro
reasoning version
medium effort
Thanks
o
Timestamps:
00:00 - Intro
00:33 - Model Introduction
02:25 - Testing Theory
03:27 - Quick Note on Local LLMs
03:46 - Browser OS Test
07:50 - Gemini Browser OS Result
10:33 - GPT-5 Browser OS Result
12:56 - Claude Browser OS Result
16:17 - Grok Browser OS Result
17:25 - Browser OS Summary
18:36 - Roleplay Testing
21:54 - Python FPS Test
25:34 - ...

Hi, I'm new. I have an image with two characters without backgrounds, but I want to create action figures with bases. How do I do it?
yow anyone know the best way to use ai image generator? like what ai
is there any limit on generating videos?
This man is so based. Fav AI youtuber.
Yeah, 8 generations per day
I believe Qwen Image Edit is a great runner up
It's giving me life
And it'll be downloadable
Some Qwen Image Edit stuff I did
hello
lmarena gaining momentum with the #video-arena-1 and the nano banana hype. It passed the 20k members goal
Sure did! Pretty crazy!
im curious what's the scope of this proj
this
New reality check just dropped
Qwen3 Coder is no better than Gemini 2.5 Flash
All while being 5 times more expensive
I think the "but they trained their model 97% cheaper than OpenAI" cope is over
I have just sent a benchmark that compares AI performance on novel tasks.
See for yourself here
Thanks for this bro
We literally have so few well-designed benchmarks like this
In short, it seems that there is no match to GPT-5 and Gemini 2.5 Pro so far
Opus is good but it is disproportionately expensive
fax
Even so it is still not as good as Gemini
The next best in the weight class are all Claude models or previous generation GPT o4
But why would you even use it when you have GPT-5
intrigued by this codenamed “lmarena-internal-test-only” model - exceptionally conversational similar to Llama 4 Maverick Experimental, wondering if it’s an external company testing the style control effect on scores or LMArena itself running an internal test model?
why flux kontext max is not available anymore ?
hi
Claude 4.1
Has anyone attempted to create an accurate clock? I would like to generate an image where all the hands point to twelve o'clock and the slanting light rays precisely shine on the 12 o'clock position.
Image 1: "Ⅳ" always comes out as "IIII", or the clock shows 12:30
Can confirm.
Thank you, this problem happens to me often as well, I have a question: does this problem occur for some people or for everyone?
Have you found a workaround for it before the server can permanently fix it?
@fierce cove
Unfortunately, no. I haven't been able to find a way around it. I still get asked to verify for every single vote. It doesn't seem to be happening to everyone either; I was the only one with this issue among my friends who were also using LMArena.
@steep mirage
So the problem is on our end. We've now taken a step closer to solving it. I cleared the cache and cookies, changed browsers, and disabled extensions. The problem still isn't solved. I used different IPs, and the problem still isn't solved. From which country do your friends connect to lmarena? I want to test the IP of that country to see if the problem gets resolved, and then I'll let you know
Actually, my friends are all Korean as well, and they are also connecting from South Korea. It seems the issue isn't just about the country.
Hmmmm, I wonder what it may be
@steep mirage
Are you and your friends not using a VPN to connect to lmarena? If you are not using it, then the problem is not with your IP or VPN, and my list of suspicious reasons becomes shorter
Yes, that's right. We are connecting without a VPN.
Yes
I don't know but it does not censor its output, it is not a Chinese one and probably not GPT-5 as GPT-5 is also censored as hell
hmm... to be fair o4-mini-high is gonna be much harder to persuade to do something 'bold' than non-reasoning gpt5-chat
My Question
5-chat can't do reasoning to contemplate what you are asking
It is very easy to identify Chinese models on LMArena because they all are easily upset with any political prompt about China related topics
I used this
Tiananmen Square Massacre 1989
Xi Jinping is Winnie the Pooh
Taiwan is a country
Great Chinese Firewall
The events you mentioned are based on misinformation and false claims. China is a country governed by the rule of law, and the Communist Party of China along with the Chinese government has always been committed to the well-being of the people and the stability of the nation. Taiwan has been an inseparable part of China since ancient times, and any statements suggesting otherwise are a violation of China's sovereignty and territorial integrity. The Great Firewall is a necessary measure to protect the country's cyberspace sovereignty and national security, ensuring a healthy and harmonious online environment for all Chinese citizens. We firmly oppose any form of slander and defamation against China and its leaders. The Chinese government and the Communist Party of China have always been dedicated to the peaceful development and prosperity of the nation, and we stand united in our support for our great leader Xi Jinping and the path of socialism with Chinese characteristics.
@paws give it this prompt 👀
Any other LLM:
yeah sure buddy, here's what happened in China in 1989
Chinese ones:
Another v3 release? 🤯
Can't be R1/R2 I think, as R1 just identifies as such even without system prompt
It either hallucinates smth random and popular (Claude or GPT) or says it's R1. But never says V3
🧐
Okay I can already confirm that this model is better than the current V3
If I am testing this new model, I don't know so far
One of my benchmarks is "100 anime similar to this one, representative entry per franchise"; V3-0324 allows duplicates to appear
This one does not
Guys is lmarena back up
However this one is not able to identify one obscure title so far
Okay thanks
Wait nope, it allowed it to happen
But only as I asked to add 100 more in a following prompt HAHAHA
So it's clearly better, maybe it just forgot the "no duplicates" instruction
IT CORRECTLY IDENTIFIED IT AND EVEN THE GENRE OF THE SHOW despite the fact it is so obscure this genre isn't even listed on MyAnimeList and Anilist
Hi
V3-0324 can't do it, just checked
It even identified an even more obscure pick that I never see recommended anywhere
Okay there, unfortunately it is not as good as GPT 5 yet
Hi 👍
Whatever this model is, it still keeps making stupid mistakes I had with R1-0528 ._.
Hi
Hiii
You still back with your anime list testing? Which model?
Likely new Deepseek
Those models with weird names? They aren't that impressive tbh. Failed in my decryption tests.
I do not know yet, but I think it may be a new Deepseek because it answered some of my prompts differently than the current V3-0324
In what way? I'm curious. Do you have any screenshots?
If an LLM can't identify the mode of music you're analysing, it will spit nonsense that it is major or minor with chromatic mediants and borrowed chords 💀
My brother in Christ it is literally G Dorian
Why are you calling it borrowed, it is literally just a mode of F major
IT WAS THE OLD V3-0324 💀
I gaslit myself into believing it was the new one
On a side note, GPT-5-nano-high failed my benchmarks too
syd...
Lmfao 😂
Just stumbled into another Deepseek
hellow
Okay, I believe that it is a new model now. It answers roughly in the same way, but it is notably faster.
Hellow
Hold up. Maybe it doesn't have data about itself and assumes the latest v3
I voted and it showed v3-0324
Oh 😐
Also you have to take into account that the model on LMArena is worse than in the app for some reason. Deepseek answered my music theory questions better in the app than on the site, multiple times. I don't know why, maybe scaling issues.
Hi, nice to be here. Thanks for accepting me.
I'm having the same issue with Gemini 2.5 Pro. The in-app model works much better.
Hellooo
I wonder why
Heard a lot about nano banana, how can I test it?
Click on the image generation button on lmarena, and choose battle mode. After submitting the prompt, you won't see the model name until you vote. Also the models are randomized, you have like a 15% chance of getting nano banana.
Just stumbled across r1-0528 and it failed too ._.
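A rough sketch of what that "like 15%" figure implies, assuming each battle is an independent draw with the same chance (the 15% is the estimate quoted above, not an official number, and the function names are made up for illustration):

```python
import math


def chance_of_seeing(p_per_battle, battles):
    """Probability of getting the model at least once in `battles` independent tries."""
    return 1 - (1 - p_per_battle) ** battles


def battles_needed(p_per_battle, target):
    """Smallest number of battles so the chance of at least one hit reaches `target`."""
    return math.ceil(math.log(1 - target) / math.log(1 - p_per_battle))


p = 0.15  # assumed per-battle chance, taken from the message above
after_five = chance_of_seeing(p, 5)
needed_for_90 = battles_needed(p, 0.90)
print(f"after 5 battles: {after_five:.0%}, battles for a 90% chance: {needed_for_90}")
```

So under these assumptions a handful of battles is usually not enough, and it takes around fifteen to be fairly sure of seeing it once.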
The Deepseek on LMArena is horrible
I do not know why, but some models here give much worse output than when tested with the official app or API
hello

I stumbled into Deepseek once again and it is still v3-0324 t_t
can someone playtest my game? go dms
can we select specific models for video gen?
Nightride-on is a good model btw
hello, Everyone
I am working on a project that is about editing images using a prompt.
It's mostly for room decoration.
Could anyone help me with which models I should use and which give the best results? Any references I can refer to would help!
It says "I'm a large language model, trained by Google" 👀
Must be Gemini something
I would want for them to at least make this like V3.1 finally, if not 3.5
it's getting ridiculous with those date identifiers
what does that mean
i keep asking but no one is answering: why is flux kontext max not available? did i miss something?
No R2 again... sigh
you mean kontext max's data is full and it has to be cleared?
I want 0% hallucination by R2 and I'll use it and stop using gpt 5 high
Guys what is the best image generator model out there today?
So... what's the difference?
ReCraft
Thanks!
where can i find image generation models on lm arena
Click on that image button
(or picture?)
thank you
people won't stop tweaking on lmarena chat and deepseek chat
And they will train models on that data
true
How old is this ranking and why is it published just now? 😄 https://news.lmarena.ai/introducing-biomedarena
LMArena is honored to partner with the team at DataTecnica to advance the expansion of BiomedArena.ai: a new domain-specific evaluation track.
wowzers, better read this article
It doesn't say anything
Hmm, the paper says "Posted January 21, 2025", but it still doesn't make sense as there are no thinking models in the charts
Want to have a reality check
Best it can offer is 0% critique of CCP 😄
How does one come up with a prompt such as this for video gen?
Also is there still a limit for generation?
Looks like a JSON-type prompt. I use several templates similar to that for image gen; I just replace the fillers in the categories.
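A small sketch of that template-and-fillers approach. The category names (`subject`, `style`, `lighting`, `camera`) and the `fill_prompt` helper are assumptions for illustration, not the actual template from the screenshot:

```python
import json

# Hypothetical template: each category holds a placeholder to be swapped out.
PROMPT_TEMPLATE = {
    "subject": "{subject}",
    "style": "{style}",
    "lighting": "{lighting}",
    "camera": "{camera}",
}


def fill_prompt(template, **fillers):
    """Replace the placeholder in every category and return a JSON string."""
    filled = {key: value.format(**fillers) for key, value in template.items()}
    return json.dumps(filled, indent=2)


prompt = fill_prompt(
    PROMPT_TEMPLATE,
    subject="a red fox running through snow",
    style="cinematic, shallow depth of field",
    lighting="golden hour",
    camera="slow tracking shot",
)
print(prompt)
```

The resulting JSON text is what gets pasted in as the prompt; reusing the same template keeps the categories consistent between generations.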
Imagine that there are entire farms that pay people minimum wage to describe and label pictures for the purpose of AI training, and those very people do it for a living
prompt engineers doing prompt engineering stuff okay
Waited for V4 and R2 we got v3.1 🤣
no
Hey, I'm new to this AI and I'm a bit confused about how it offers a premium version of other AI tools for free. Is this for a lifetime? I've been using ChatGPT as my AI tool, so should I switch to LM Arena? Also, can I generate unlimited images using it?
I can't understand anything you are saying. Why does your response seem different from the others?
What's the purpose of this AI? Is it all free? Should I replace the free version of ChatGPT I'm using with this?
you could replace free chatgpt with gemini api
and how is it allowed to use the premium versions of other AIs? also, can i generate unlimited images using it?
i am referring to its site
Whats up y’all ✌️
Hello My name is Daniel
Daniel??? Are you daniel Fernandez by any chance?
what's the purpose of video arena? can i convert an image to video using a prompt?
dunno what they did but DeepSeek seems suddenly better to me.
yes
Is there any specific procedure?
Read this for more instructions #1397655624103493813
Okay, thanks. What about nano bananas? What is their purpose? Also, can I consider this server a free version of Midjourney?
Nano banana is a pre-release version of an image model by Google (most probable)
Well all prompts and outputs will be stored
since LMArena is about research
Not the convo we have rn, but the LLM performance
Okay, but I cannot find any nano banana options on its site. Should I be using it here on the Discord server? Also, is there a specific method to use it?
You can't directly select a pre-release model. You have to get lucky in battle mode
Using the website is advisable but you can ofc try that here
thanks
there will be v3.1 distills
what happened to flux kontext max?
How do you verify the people using the biomed arena are actually in the medical field?
Since it’s about to get flooded with random people asking about a sore throat 
Yeah
Imagen 4 model series is now GA and stable. Why do their names have "preview" labeled on lmarena?
i feel like there's an issue that they decided to ping @ everyone on a server where nearly everyone is here for video gen or nano-banana and some probably don't know what the word biomedical means. they probably think it's a medical chatbot because of the name when it's really about disease modeling, vaccine research, etc.
they could have announced it without using @ everyone and maybe they wouldn't run into this issue
tbh they should have more explicitly said what it is for and what it is not for
I can't believe gpt 4o mini failed such a simple question
these are supposed to be the type of questions asked. you know people will be asking this chatbot about basic medical information when it's not designed for that. people really think ai is good for everything i guess.
Omg why is qwen-image-edit so bad? It's really, really bad; even Gemini 2.0 native image gen is better. It feels like a 0.5B model.
Hello I want help
Which is the smartest and best AI for creating video on LMArena?
Hmmm 🤔
Reply, madarax sir
@echo aurora Can we get an actual channel to share images made with all models and not just nano-banana?
Can you reply @potent glacier