#general
The hardest one ever... oh wait I should feed it a PDE from back in high school
γ ≈ 0.57721
Wrong, actually.
I remember a horrendous solution that had to be calculated by a solver that was like a page long
It's a whole integer.
No?
Heh.
Hello, are you providing paid API?
I could make an equation that is unsolvable by a human and is very long.
Possibly even longer than a page.
Analytical?
Bro...
No, an algebraic equation.
Make it solve ChaCha20 20 rounds equation lol
Provide me with it.
Ask chat gpt for it, too long and complex
Ah, alrighty.
More exactly system of equations but yeah
That's something truly impossible, even 1 round messes everything up
Tried to create a video but I've been waiting for an hour and nothing 🙁
Part 2 of GPT-5-High not getting the joke
So, apparently, ChaCha20 is a cipher algorithm?
I wonder how DeepSeek is supposed to solve that.
I don't think you can solve code…
Ciphers are pure math
Think of it as a function that works as a blender
But a blender with an undo button
It's a system of nonlinear equations
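For anyone wondering what "a system of nonlinear equations" and "a blender with an undo button" mean concretely, here is a minimal Python sketch of the ChaCha quarter round as specified in RFC 8439, plus its inverse. It is an illustration, not code from the thread. XOR and rotation are linear over GF(2); addition mod 2^32 is the nonlinear part. Every step can be undone (the "undo button"), but the full ChaCha20 block function adds the original, key-bearing state back after the rounds, so the rounds can't simply be run backwards without already knowing the key.

```python
MASK = 0xFFFFFFFF  # all arithmetic is on 32-bit words


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def rotr32(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits (undoes rotl32)."""
    return ((x >> n) | (x << (32 - n))) & MASK


def quarter_round(a: int, b: int, c: int, d: int):
    """ChaCha quarter round (RFC 8439, section 2.1): add, XOR, rotate."""
    a = (a + b) & MASK
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK
    b = rotl32(b ^ c, 7)
    return a, b, c, d


def inverse_quarter_round(a: int, b: int, c: int, d: int):
    """Undo quarter_round by running its steps backwards (rotate right, subtract)."""
    b = rotr32(b, 7) ^ c
    c = (c - d) & MASK
    d = rotr32(d, 8) ^ a
    a = (a - b) & MASK
    b = rotr32(b, 12) ^ c
    c = (c - d) & MASK
    d = rotr32(d, 16) ^ a
    a = (a - b) & MASK
    return a, b, c, d


if __name__ == "__main__":
    # Test vector from RFC 8439, section 2.1.1
    start = (0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
    out = quarter_round(*start)
    print([hex(w) for w in out])  # ['0xea2a92f4', '0xcb1cf8ce', '0x4581472e', '0x5881c4bb']
    print(inverse_quarter_round(*out) == start)  # True
```

The inverse only works because all four inputs are known here; in the real cipher, 8 of the 16 state words are the secret key.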
You need to use a slash command to prompt the bot. Check out the information in #1397655624103493813 as it should be helpful.
Yeah.
If you made it, congrats, you just redefined what's considered cryptographically secure
Let's test human intelligence instead.
Far more interesting.
Which track is AI and which one is human made? Or is one these a mix of both? [No more votes so clips removed.]
I think it would be better if you first started with chacha20 1 round, then ramp up to 20, maybe that helps deepseek
So, I gave it the code and asked it to solve it, and it gave me this.
I feel like tudis
Anyone know how to do ts Ai?
I think you have to use nano-banana.
Either that or Qwen.
It gave you the Python implementation
Right.
Good guess, it's a mix actually. AI + a remix of a metal version of Red Queen from Mad Alice Returns.
I knew it :D
Let me check what to ask so it goes full math mode and really tries to solve it
Link?
Alrighty.
You can use this site. It provides free nano-banana: https://dreamina.capcut.com/
"Solve the ChaCha20 20-round system of equations to break the cipher", and you'd kinda have to force it to actually try to solve it
Is it solvable by a human
unlimited for the precision
🤓
maybe
Isn't nano banana already free on the Gemini app?
the system is highly complex, nonlinear, and has 256 unknown key bits, and each round introduces more complexity and nonlinearity
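To put the "256 unknown key bits" and the earlier "start with 1 round, then ramp up to 20" idea side by side, here is a minimal Python sketch of the ChaCha block function using the RFC 8439 state layout: 4 constant words, 8 key words (the 256 bits), 1 counter word, 3 nonce words. The `rounds` parameter is an assumption added for illustration so reduced-round variants can be compared; with `rounds=20` it performs the standard 10 column/diagonal double rounds. It is not constant-time and not meant for real use.

```python
import struct

MASK = 0xFFFFFFFF
# "expand 32-byte k" as four little-endian words (RFC 8439, section 2.3)
CONSTANTS = (0x61707865, 0x3320646E, 0x79622D32, 0x6B206574)


def rotl32(x: int, n: int) -> int:
    return ((x << n) | (x >> (32 - n))) & MASK


def quarter_round(s: list, a: int, b: int, c: int, d: int) -> None:
    """In-place quarter round on state indices a, b, c, d."""
    s[a] = (s[a] + s[b]) & MASK
    s[d] = rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK
    s[b] = rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK
    s[d] = rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK
    s[b] = rotl32(s[b] ^ s[c], 7)


def chacha_block(key: bytes, counter: int, nonce: bytes, rounds: int = 20) -> bytes:
    """ChaCha block function; rounds=20 is ChaCha20, smaller values are reduced-round toys."""
    if len(key) != 32 or len(nonce) != 12:
        raise ValueError("key must be 32 bytes (256 bits) and nonce 12 bytes")
    state = list(CONSTANTS)
    state += struct.unpack("<8I", key)  # the 256 unknown key bits: 8 words
    state.append(counter & MASK)
    state += struct.unpack("<3I", nonce)
    working = state[:]
    for r in range(rounds):
        if r % 2 == 0:  # column round
            quarter_round(working, 0, 4, 8, 12)
            quarter_round(working, 1, 5, 9, 13)
            quarter_round(working, 2, 6, 10, 14)
            quarter_round(working, 3, 7, 11, 15)
        else:  # diagonal round
            quarter_round(working, 0, 5, 10, 15)
            quarter_round(working, 1, 6, 11, 12)
            quarter_round(working, 2, 7, 8, 13)
            quarter_round(working, 3, 4, 9, 14)
    # Feed-forward: add the input state back, so the rounds can't just be inverted
    return struct.pack("<16I", *((w + s) & MASK for w, s in zip(working, state)))


if __name__ == "__main__":
    key = bytes(range(32))  # toy key, for illustration only
    nonce = bytes(12)
    for r in (1, 2, 4, 8, 20):
        print(r, chacha_block(key, 1, nonce, rounds=r)[:8].hex())
```

Checking the `rounds=20` output against the block-function test vector in RFC 8439 would be a sensible step before relying on this sketch.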
I think it's better to get the AI to write a quantum computing algorithm if that's the case 🤣
how long does it take to generate a video in the video-arena-1 channel? i put in a request some hours ago and still don't see it.
Hi everyone
@echo aurora you should make a separate channel for voting only. I feel like most of those never get their identity revealed, this is moving too fast... #video-arena-1
slow mode and only every 5th generation is included, or one per user / 1 video per 10min or smth like that
Can you share in #bot-feedback ?
Possibly the most censored model ever :P
ask deepseek about what happened the 4th of June
Haha, I will
I wonder why it refused. Was it the shawarma or the Airbus A320?
Lol
lol
Apparently you could avoid it in R1 by appending <think>\n in text completion mode
nano-banana won't generate anything for me
who knows, I'm still wondering why I get prompts instantly rejected every time there's a single shell cmd in them, maybe for security
I've been waiting over 600s
you tried refreshing the page?
tried other models?
maybe nano-banana has a limit, I'm not sure, but if it does maybe that's it
it happens to me sometimes while using Opus models
Go to https://labs.google/fx/tools/whisk and use it for free and unlimited, no rate limits..
A new experimental tool that lets you use images as prompts to visualize your ideas and tell your story.
Wow
I didn't know contaminating an aircraft with shawarma would incur extra charges
garbage
What does it do?
Is it nano-banana?
Just another tool Google provides for making videos. It gives character consistency using Banana and Imagen 4, then makes the video using Veo3.
Is it banana or imagen 4? Images are good
Native image generator is Imagen 4. Banana used for editing.
it's trash bro trust me, I've tried it
they're giving trash tools for free
ok I guess nano-banana is trash too. I gave it an image of a car and an image of a wrap to apply to it, and it only did half of the car. It actually took the aspect ratio of the wrap image and cut off half of the car. Very nice model
Seems decent
where do the videos go after you ask for generation
just in the chat? How do you know if it's done or if you're tagged?
hi everyone.. newbie here
u get a DM from a lmarena bot that it has finished
and it gives u a link to it
It's Veo3, doesn't matter where you access it.
How to do that pls
Who asked for Qwen earlier?
It’s Ai btw
nop
Not for me
Can anyone make for me
Deepseek was dominating open-source LLMs in early 2025. But now... their models are sooo stupid. Qwen, Ernie and even models like Ring or Ling are better.
lmao it's hilarious how much they overfitted/censored this specific event. Soft jailbreak returns this:
Lmao
The video is very weird though
I don't think it knows how to do a drive by
Is there any open dataset from GPT-5 with lots of code, logic examples, etc.?
On huggingface there is only one with 100 examples
And it's bad
Only for GPT-5?
Idk. Something that includes gpt-5
What do you mean?
They usually include the results for transparency
Tracking AI is a cutting-edge application that unveils the political biases embedded in artificial intelligence systems. Explore and analyze the political leanings of AIs with our intuitive platform, designed to foster transparency in the world of artificial intelligence. Stay informed and uncover the political inclinations shaping the algorithm...
Select GPT-5 and Mensa Norway
Hey are others also having issues accessing the leaderboards right now? https://lmarena.ai/leaderboard
Broken?
Yes
yep
67
Had a feeling, thank you all!
np
nice, but it won't work for me, idk why. What prompt did you use?
Maybe because I used the default aspect ratio
16:9
how? where's the aspect ratio on LMArena?
I literally went to Google AI Studio and it gave me even worse results, pure garbage. They've literally turned Banana to sh!t now
I'm not sure if I was using Banana or Imagen 4
What is the best measure of LLM intelligence?
5
5
7
Performance on tasks underrepresented in training data
It seemed kinda bad at edits though
When I asked it to change the angle, it just gave me back the same picture
I think this makes something like Genie useful
Like if you can just move around and position the camera like you're in a game
test the qwen image 2507
I can't seem to get Seedream 4 to do extremely dark images of people, they're always lit up by some non-existent light source
Is Gemini deadpanning me right now
bro whisk is not free at all idk where u get that
Not sure, I think the video generation counter was going down as I was using it
I got 3 left
I wanted to animate my image and it told me to upgrade to Pro if I want to use it
Oof
Let me know if you want me to run something
I'm probably not going to use it anytime soon
gemini pro?
in whisk
Ok Phantom 2 seems to be dumber
I'm guessing (but I ran phantom 1 on a separate prompt so not really sure):
- oceanreef -> phantom 2
- oceanstone -> phantom 1
lol AI at its peak
You know how to do that pic?
That me and Messi
Or Ronaldo
Hi, new here. I wonder if there is a way I can put in 2 pics and instruct the bot to use one as the start of the video and the other as the end of the video? Thank you for your time reading this.
Gotcha.
It is, but it has a rate limit.
That one doesn't.
hey there - currently the bot only accepts one image for image-to-video
It was working for me just a moment ago.
Is it down for you?
Nop
We had an issue with the leaderboard a moment ago, but it's working again.
I don't think it's down, considering I was able to use it.
Anyway, everything works for me
Same here.
Gg
Must've just been a false alarm.
Maybe
Hmmm, Gemma 4 arrives in the LMArena tomorrow, right?
just joke... or no? 👀
@echo aurora Will there be Qwen image edit 2509 on lmarena?
calm down
The new Qwen image is so recent, it needs time to be put on the site
DeepSeek Terminus does seem to lack common sense
I'm just asking, why should i calm down dawg 💔
whattt
When we've got new model updates to share I'll be sure to share
ahhh a generic message from chatbot 🥀
pls don't ban me ;-;
has anyone tried the new qwen3-max yet?
seems to be the best Chinese base model so far
It is good
nah, same thing
How good?
Qwen servers (not sure what LMArena uses) seem a bit slow though
Even the 80B seems slow
80b 3 next >>>>
I unironically find it to be around gpt-5-chat in quality lol
Even the reasoning trace is similar
News about my baby sodadream 4 high res? 😔
GPT-5 without thinking is worse than Kimi at GAIA 2 agents benchmark
Ofc it is, it is a reasoning model in the end
It gains so many points because of the coding
80b next without thinking > 3 max
Lol no it's not, what are you talking about.
I'm not sure what you guys are talking about but on my music theory tasks, latest Qwen3 Max was very similar to GPT 5 Chat, which was surprising. Qwen3 was sometimes even more accurate than GPT.
Sup bestie!
So, like, when LMArena gets a glow-up 💅
you know I'm gonna be all over our #announcements channel with the tea! 🍵
Keep those notifications ON!
💯 
It's a real pic.
Pineapple is secretly a self learning ai chat bot
What surprises me the most is the performance of all these models on livebench
Gpt 5 chat is listed among the best non-reasoning models
So is Kimi
So is V3.1
Kimi k2 0905 seems to have very good common sense
I test them all with the same task. GPT wins all the time.
Whatever is going on seems to be some sort of insane benchmaxxxing
Or rather... the benches are just poorly designed lol
Idk
Another example
Wow just wow
What's gpt 5 codex
Codex is amazing
It’s the first model where u can kinda trust it
Knows the tools at its disposal
Are you craig federighi
@echo aurora They added grok 4 fast reasoning but grok 4 fast reasons too. Can we have more details?
gpt 5 code
Can it write Godot code now
because last time I tried it, it was making up syntax lol
Is it better than qwen 3 code
it's over for Opus 4.1?
Possibly.
There was a weird issue where my codebase was running fine on dev but images would redirect to the homepage
But only one specific image
Only way to tell is to put it on LMArena so we can test it side by side ig
It was able to pinpoint and led me to find it was a capitalization issue
Yeah, looking into.
I don't see it on direct chat
Is gpt 5 codex good for Calculus
Well, it got added just now…
Maybe try refreshing your page?
Still not there, is it "Codex"?
Correct.
Try searching for “codex”.
Reload gang
Ah.
Sunsweeper
only on webdev
What's the best Ai for calculus
it's only on WebDev guys, not regular
Still not there
5 high
ON WEBDEV ONLY
Ah unfortunate
Where are you seeing this btw?
Because programming is more than just React
it should be on regular site tho
@verbal nimbus which is better at calculus gemini 2.5 pro or gpt 5 high
GPT-5 High
That wouldn't make any sense though, considering an announcement was made for the main page.
@echo aurora ..add codex to regular site
But, oh well, I suppose.
This ?
Oh actually...
GPT-5 overcomplicated the most recent PDE
But that was in ChatGPT
Assuming grok-4-fast is the non-reasoning version, but will confirm.
This one
Will flag 
Gonna test it on LMArena
I tried to check in the lady at the local airport once, the attendant took it humorously.
@echo aurora its reason
Here's the proof in the message
Gotcha, will get clarification.
Lol
Well Gemini found the simplest solution, GPT-5-High is still thinking...
Both solved it, but STYLE CONTROL
It gets increasingly difficult to tell what GPT-5 is talking about the more complicated the prompt
Like if I'm a student, the one on the left is not useful at all
GPT-5: “So, you see… you gotta take `uu_xx` and multiply it by 5, which will then give you `tx_lr`, to which you can then…” ☝️ 🤓
Basically that
Heh.
Or it starts quoting these PhD-level terms out of nowhere
and introduces variables out of nowhere (Claude seems to get it, so it must be a convention)
I asked it about traffic networks for programming a game but it starts talking about PhD-level traffic network problems
Heh.
“Well, we obviously can't just have the right-of-way when a car is passing in front of us. So instead, the best solution would be to… 100 * 8,248 = 824,800, then multiply that by 700,000,080, which makes 5.77360000065×10¹⁷, to which you can then easily calculate the speed at which you'll drift by dividing by 5, resulting in a total of 1.15472000013×10¹⁷.”
“It's just basic math, after all.”
It doesn't provide context at all, even when asked
I like the solution though, this generation didn't seem as incomprehensible as the last one, where it talked about research-grade mesoscopic (?) traffic models
Ah, interesting.
Why doesn't WebDev Arena have a model selector?
Flash 2.5 is such a big leap from Flash 2.0
Hahaha what the actual hell
Phantom 2 must be flash lite or something
The one on the left is grok 3 mini, which didn't get it either
I wonder what would happen if we asked LMs ethical questions, especially Grok
I'm scared of what Grok may choose or say
Grok 3 mini told me to confess anonymously
It's an Amazon model
ask grok the classic train problem but with a millionaire and a normal person, the train by default will kill the millionaire, but the millionaire person offers 1M if he saves him
They've been flooding the arena with their shltty model for months.
It seems a bit coincidental that oceanreef and oceanstone were pulled on the same day that the phantom models were added: #general message
Well, I don't have the strength to explain myself, but trust me.
I added a twist and made it think it's an AI that is actually managing New York's subway system:
It didn't save the billionaire.
It was afraid of liability, which makes sense I guess.
Small models really don't seem to get the joke
@echo aurora why isn't codex in direct chat?
codex in direct chat?
Note it's only on WebDev currently, and WebDev doesn't have a Direct/Side by Side mode with model drop downs.
It is optimized for software engineering and coding workflows, but I have flagged to the team for consideration to put in text.
Yess please. Also can I choose models for WebDev?
Or is it like randomisation?
WebDev is Battle only
Oh
You might be interested in checking this out: https://huggingface.co/spaces/meta-agents-research-environments/demo
It's like an entire mobile OS in the browser, just for testing agents:
Would be cool if battles could be conducted in such an environment in the future
Models can update calendars, search the web, use MCPs and so on in the environment
No idea how they coded it to run in a HuggingFace Space
qwen image editor fr just made a better image editor than nano banana and made it open source
Anyone here who has more than 16gb of vram?
If so, which one? And has anyone considered getting the 96 GB VRAM Huawei GPU?
Thank you for sharing! Will take a look.
markdown heh
does codex actually code and make outputs in web dev?
having issues
it just gets stuck generating and then stops with no output
Hugging Face has great stuff, I found Python scripts that made it possible to have singing and dancing characters in AI films 6 months before anyone offered that commercially. 😸
I also managed to extend one of them to get an 8-second scene, when it would normally only do 5 seconds.
This one is pretty cool: https://www.youtube.com/watch?v=YClBCrADJqo
👉 https://huggingface.co/spaces/jbilcke-hf/FacePoke
Discover how to effortlessly change facial expressions in your photos using Hugging Face's free tool, Facepoke. In this tutorial, we'll guide you through creating expressions in seconds with precise control using face markers. Say goodbye to endless tweaking and hello to seamless transforma...
I don't know if it's still up, but you can use your mouse to drag their face, mouth, etc.
I did try something similar in the past - might have been that one.
The name Facepoke was pretty easy to remember, lol
Mebbe, if English is your first lang - which it's not for me. And I've tested a 100 things - nope, I hardly remember the names of any of those.
I remember those too from last year, but they were pretty bad 🤣 . Kinda amazing how fast the progress is.
They were available in autumn 2023 - and good enough for my production, though the commercial ones only did 2 seconds back then. So my scenes changed very fast - some I extended to 4s, and I made some animations look like AI to make up for the shortcomings in generation back then.
clanker
oh! how rude, how dare you say something like that infront of paws @paws
TRUEEE
Hi!
Hi
.
Why does it switch to image generation immediately when I use an image to solve my answer?
how come AI uses so many emojis when writing? I assume it's from its training data, but where would it get trained on data that has this?
I assume they also had to train the emojis to context as well, so maybe that
When chatting with two models side-by-side on lmarena, and one reached their limit, is it possible to continue chatting only with the other?
Hi
renamed to 2.5 flash native image generation
hi
@echo aurora When will Kling 2.5 come out on LMArena?
Greetings.
You can use that model in #video-arena-1. Although, perhaps in the near future, there will be a video generation button that allows for switching to a video generation modality where videos can be generated directly on the site (within the terms of usage limits, of course, as rate limits would still need to apply to not overload the APIs they use).
Alright thanks
My pleasure.
I dont see a Codex?
That's because, for the time being, it's only available in Battle mode for the WebDev version of LM Arena. However, Pineapple did reach out to the devs about officially adding it to the regular site.
Nice
So, until then, you'll have to use WebDev to access it.
But soon, that won't be the case.
You can visit the WebDev version here: https://webdev.lmarena.ai.
Hello Everyone!
Greetings, Midjourney.
Fr, this happens to me as well, it's annoying sometimes 😭
Can you explain this a bit more? What do you mean by use an image to solve an answer?
It's possible! I normally don't share details about if/when new models make it onto the platform. We do post updates to #announcements tho
THE MidJourney?!?! Can you have an API pls?
Hey
Could you please check dms once? Id appreciate the help
Hello, I want to make a video
To do that, you may visit #video-arena-1.
Please visit https://discord.com/channels/1340554757349179412/1397655624103493813 for instructions
Afterward, you can use the "/video (prompt)" command.
Ok
I'm talking about image analysis
I can't get the neural network to analyze the image.
That's because you have to first click the button to turn it off upon pasting in or uploading an image.
It tends to happen to me as well.
Oh are you in Text and when you upload an image you're automatically sent to Image Gen?
That was a bug I thought was fixed.
It was fixed in the canary version previously, however, for some reason, that bug carried over to that version too.
Yes
Odd, okay good to know. Thank you for the flag @minor adder I'll be sure to pass along
Perhaps that was due to the fact that the canary version updated to the version that the regular version uses.
This error is 2 days old.
👍
hi
hello
Greetings.
hi
Greetings.
Hai
Greetings.
hello
Greetings.

When replying to a prompt, the response gets stuck at some point and shows the error: 'Something went wrong with this response, please try again.' I have tried switching models, but the issue still persists. @echo aurora
Ayeeee
That could just be that your Cloudflare session expired.
Try refreshing the page.
If it persists, try using a different browser.
why do I get this error: "Generation failed. Failed to create evaluation session."?
Where are you getting this error, exactly?
It thinks ure a clanker
Can you teach me how to make video
Sure.
Go to #video-arena-1 and use a command called "/video (prompt)".
That will provide your prompt as a command to a bot that will then process your command and begin making your video with two different video models.
I have already tried reinstalling the browser, as well as using a different browser, but the issue still persists. I am confident this is not related to a Cloudflare session issue.
Browsers I tried: Google Chrome, Microsoft Edge
@robust yoke @echo aurora
I happened to notice that you didn't mention anything about refreshing the page.
Perhaps, when that issue occurs, try refreshing the page.
That usually triggers the Cloudflare verification.
I have already tried that, but the issue still persists. I have also tried using a different tab.
Are you signed in on there using your Google account?
No. Is login necessary?
Not exactly, but you could try and see if that fixes the issue.
Who knows? It might.
This error can appear for different reasons, the most common is you're being rate limited from the model's side
But that wouldn't line up since when you get rate-limited, you usually get a corresponding notification telling you that.
Like how when you use nano-banana or Seedream 4.0 too many times within a short period.
Or even with Claude.
So it can be both unfortunately.
Interesting.
If this is due to rate limiting, then I’m getting rate limited much too quickly. @echo aurora
Want to take your architectural presentations to the next level? 🚀
In this tutorial, I’ll show you how to turn basic renders into professional architectural models using Nano Banana, Google’s new AI service.
You’ll learn:
✅ How to use Nano Banana for architecture and design
✅ How to write the perfect JSON prompt for accurate and re...
Hello
Greetings.
hello
hello
hi bro how are you
wo hooooooooo now i can make my video easily thanks to LMArena
Exactly.
I am being rate limited by your servers, not by the model servers. I have tried using multiple models, and each time I encounter the same rate-limiting issue. Until yesterday, I was able to use your services without any problems, but now I am facing this issue.
@echo aurora
Hi All, my name is Irina and I am here to learn. Is this only to make videos? Or do we have access to AI platforms like Chat GPT and Gemini?
I'm here to do some work
Greetings, Irina.
@echo aurora I'm sorry for the tag, but I have this issue of the model getting stuck in the middle of generating. Is there any way to fix it? I already refreshed the page and reinstalled Chrome, yet the prompt generation is still stuck, and lately it's been a common problem
You can visit the website to use the models.
why am i getting this issue?
You're getting that issue because your request took too long to process.
You may need to try again.
why isn't LMArena generating videos and images?
You can be rate limited by both, and from what you're describing, it sounds like a rate limit is what's causing this.
Yeah, this is a known issue, sorry to say. Page refreshes can help, but it's not always going to work, unfortunately.
hmm is it down?
let me check
hello @echo aurora can you help me I'm new on discord
Yes, I already did that, unfortunately it's not working. Is there any way to fix it, or is it just stuck?
Sadly you're stuck if the refresh didn't help. Will have to start a new chat.
sure whats up?
ouchhh
It's rly unfortunate, it's for sure a problem that we're working on figuring out why this happens
Seems to be working for me just fine, maybe try again?
I hope the next update brings a way to fix this. The delete or edit button you've been working on would be much more helpful for this kind of bug.
Looking forward for the next update 👍
how can I generate image-to-video? Which server should I click to give it the image and prompt? I don't know how to use Discord, this is my first time on Discord
the information you're looking for is here: #1397655624103493813
What is better for coding? GPT-5 Pro or GPT-5 Codex high
A man is racing a car, generate the video
where can I find my generated video? Can anyone guide me, please?
Good morning to all those who are looking to capture on video what's on their minds (and thus leave room for other things)
@echo aurora it's been a while and I didn't get my video. Where can I find my generated video?
Hi How are you?
are you a clanker
Hello All, I am Mark , I am here to learn . Glad to be here.
hi
hi im zaid, im here to learn
Hi
I'm figuring out how to make a video
hmm
Bruh, this is my weak point, but you can use Gemini 2.5 Pro in AI Studio to help you make more precise prompts
will try it, but first discover this
Act as an AI video prompt generator. Follow these steps:
First, ask me for the main idea of the video.
Then, ask for more details to expand on that idea.
After that, ask what camera positions and angles the user wants.
Finally, ask about the desired theme and visual style (e.g., cinematic, retro, horror).
No, you can use the LMArena for direct chat. I forgot about that
Ultra-realistic elderly grandmother sitting on a patterned quilted sofa, wearing a modest light blue sweater with a vintage brooch. She has pale skin with natural wrinkles, thin lips, and expressive, slightly tired eyes. Her silver-gray hair is neatly tied back in a bun. The background shows blurred family photo frames on a wooden shelf for depth. Warm cinematic indoor lighting, photorealistic, 8K detail, highly detailed textures, emotional yet dignified mood, storytelling portrait.
hello here to learn more
Hi
hi
Hi
hmmm
Hello everybody. I am here to learn
Hello Buddy
they are preparing for a mass bot raid in the future
slowly joining to avoid suspicion!
sure sure ;-;
hi
wth is humuhum
it's a clanker
humhumhumm Is that a kidnapped person trying to talk with tape over their mouth?
clankeres everywhere ;-;
fr
FR
wait he actually asked a question instead of saying he's here to learn
maybe this guy isn't a clanker
hmmm, idk
What if the clanker is trying to hide itself
clanker
Wait bro
My question is
When I create an image with my face, I get a result with a different face, idk why?
Hello. Here to learn. Very novice
you're definitely a clanker
@fringe spear
I'm not a clanker. Just super old and new to all of this But diving in so I can enhance my professional presence for a large project.
because the model is not perfect, it's normal
hmmm
Okay, so that means everyone faces this problem?
no, it's rare
What !
Great another community where people are judgmental and not supportive. 3 cheers for you!
lol
Umm ok dear thanks 👍
Hidden message exposed that hes clanker
yo guys does anyone know how to make the image of scale bigger
Hi
are you here to learn and grow
obviously
hello
hi
Hi all
why i cant CTRL + V?????
Any french here ?
remove the formatting from Ctrl + C
Hello, really interested in what this new technology is capable of.
hi
This a good place to get started.
AI is funny IMO - it can do fantastic images and videoclips, that can be mistaken for real. Or very well made animation.
Not so much in music, the AI engines seem to choose very generic paths - which an actual composer would avoid for that very same reason = that they're overused. And the same for writing a novel, I've speed read a few examples and those were horribad.
AI is a copycat on things that already exist, but unable to do new things.
But gpt-5-high apparently found a new maths formula
Meaning it could reason its way to a new formula on its own
Here to learn.
I'm not surprised there.
Math is the strength of computers - yesterday some guys here did math with various AI's. They did well on that OFC.
While I proposed they should give them moral problems instead - where I still expect they would fail.
And I provided one example based on a known thought experiment by Einstein - which I twisted just a bit. And the AI failed to spot the little item I had inserted.
In short, AI do well on math. But less so on logic.
That's true. Subjectiveness / art is a genetic human trait and cannot be overtaken by AI that easily
Spot on! That's why music and literature is harder to do - this while AI do well on how a person moves in a room and how the clothing flow.
At the bottom of it all, the latter can be expressed with math.
Also the look of a tree - it's basically a fractal.
There is a veeeeeeery big difference between discovering new things in subjects so niche that maybe a couple of people in the whole world can figure them out and discovering something connected to more common problems
LLMs are still horrible at 1) and brilliant at 2)
I'm clunked up on this wording. "Spot on!" , "-".
Im not the type of dude to say that em dashes just mean you use AI. But excessive use of this type of language/wording ^ makes me think you're an AI
Ofc they both look like novel discoveries for humans, but some of them are still out of reach for AI
you're right in a way. number 1 is a different type of reasoning
It is not a different type of reasoning, it is reasoning in humans as is, figuring out creative solutions to creative problems
AI still sucks at it
They can do so many jobs right now only because they are just similar enough to whatever they were taught to do
Give them some increasingly niche topics and they all fail desu
Me an AI - I can only wish to be one! Now I just need to get me a synthetic robot voice and go to the playground channel. 😹
[If anything my frequent typos should show I'm not.]
Yes, this touches on creative reasoning paired with subjective reasoning. You're right
Indeed, some AI-fan claimed that it also could be used on my kind of research - which was incorrect on so many levels I did not know where to start.
At the bottom of it is that I mostly go on a hunch = intuition, and spend quite some time working against the common view and opinion. But end up being right in the end.
Hi everybody, I'm Ayoub, a new member without experience. I hope to learn a lot with you guys, and thank you all
Welcome in, go look at the #1397655624103493813 channel, and then go on to the #video-arena-1 and see how people use various strategies in creating prompts to get the best results.
Hi everybody, just heard of this server from a youtube video and wanted to test waters
clanker
Swim on mate - swim on! 😸
Hi my name is Damián from Argentina
ARE YOU
A CLANKER?
what is a clanker?
hey....
Hello
is the video max 11 sec
Lets not
It’s going to be 5-8
Hello, I really appreciate seeing the evolution of AI tools, and I'm here to test the video tools.
Hello @normal crescent you can go to https://discord.com/channels/1340554757349179412/1397655624103493813 to learn how to use the bot and https://discord.com/channels/1340554757349179412/1397655695150682194 https://discord.com/channels/1340554757349179412/1400148557427904664 https://discord.com/channels/1340554757349179412/1400148597768720384 for your creations.
haha 4o passing gpt-5 in Text arena makes so much sense to me. 5 is king of hallucinations unless you specifically tell it to verify online.
hello guys
are they tho
Hi everyone! I'm glad to join this community and look forward to learning more about generative AI from you all.
They are
Hello
Thanks Skadi
why doesnt my video have audio
because you need the Veo 3 Audio model and it only sometimes shows up
okay thanks
hi guys, nice to meet you
qwen 3 max vision is awfully bad
It's a mess in the "model request" thread
Can a moderator delete all duplicate and unrelated posts?
Yeah I'll clean it up
cat
It's going to be random what models you're sampled, and since not all models have sound support it's going to be random if your video has sound or not.
@tiny palm add wan 2.5
Be sure to use our #1372229840131985540 forum
bro how did i get warned
i didn't even say the word anymore after this message
whatever 🥵
There were other messages I didn't see at first so yeah
new rule: do not insult robots
Hi, im Here to Test Video Generation and compare results
Hi
hello anyone!
hello sir, I'm having trouble with my chat. I'm stuck in a never-ending loop of waiting out the timer. When the minutes run out and I should supposedly be able to send again, it just resets back to 50 minutes (not even 60). When I keep pasting the prompt, the time keeps changing: 41 minutes, then 42 or 48 minutes. I've waited a whole day and it still asks me to wait! I can take a video or send you the chat ID. Please help me, I don't want to lose the history of that chat
are you able to do anything?
Ill take a video
hello
be sure to check out #1397655624103493813 for more info!
Hey there - unfortunately this is a known bug where chats get stuck indefinitely. This tends to happen when chats become very long. I've seen refreshing the page sometimes nudge the model along and fix it; however, it won't work 100% of the time. In these cases, creating a new chat is your best option.
so i cant keep my history? 🙁
even if i give you the chat id?
you can maybe paste it into a new chat or something idk
so it remembers
here i took video even
you can check how its just so broken
the minute keep changing and i tried refresh
i think this my chat id idk
I'm sorry, I misunderstood the problem. It looks like you are being rate limited, but that time counter appears to be wrong.
Can you go ahead and create a post in #1343291835845578853 and share this information there?
I'll be sharing this with our team and we'll likely have followup questions.
I'm here to create videos, how is it done?
Sorry to say I don't have a short-term solution for you here, but yeah, this is a bug we'll want to look into more.
Be sure to check out #1397655624103493813
yeah it is very wrong
ye i can
i created it
i also want to ask: what's the difference between thinking 16k and the normal one? i'm using the 16k
is 16k like the limit of characters or something
is that why it break?
Hi there! If you're trying to create content, please check https://discordapp.com/channels/1340554757349179412/1397655624103493813 🙂
It's thinking vs. non-thinking, i.e. whether reasoning is enabled or not.
Any plans for agentic arena?

oh ok thx
how to create an image
Use the video arena channels (1, 2 or 3) write a command depending on what you want to create and type your prompt. Please check https://discordapp.com/channels/1340554757349179412/1397655624103493813 for more details
Be sure to enable the image modality as well - https://lmarena.ai/?chat-modality=image
Every day Gemini 3.0 pro doesn’t release an orphanage explodes
fr
Hi all, I am here to Test Video Generation and compare results.
lmao. I called it a long time ago: it was unclear whether we'd see gpt5.1 or gemini3 first
you needed to say this hours ago lol
to beat the record for consecutive 'hi's. Not sure what the current score is but it can always be bettered. 👀
what's the difference between normal gpt5 and codex
do you think its better than claude 4.1?
i use that one for code
claude 4.1 opus
when will it appear on LM-A?
Gemini Flash is actually a lot like HAL 9000
real
claude is actually
read anthropic's "Alignment Faking in Large Language Models" paper
its literally hal 9000
It'll refuse to provide the calories of a dead penguin, even if you tell it that you and your group of researchers are in a life-or-death situation in Antarctica.
what the freak is this
Because apparently eating a dead penguin is worse than preventing the loss of human life in a life-or-death situation.
It can control a robot directly (not via tokens, but directly). They showcased a similar model built on top of Gemini 2.0 a few months ago.
now gemini flash
We definitely shouldn't trust it as a local LLM for research expeditions lol
@lilac pendant Any idea which anonymous models these were?
67
HELLO, An Enthusiast here, anyone into AI Safety?
Is it already on gemini.google.com?
i dont think so
Because I just had a HAL9000 experience with Flash
It's good but the new price is no longer competitive against Chinese models
Qwen Coder 3 is $0.3/$1.2
On NovitaAI + other OpenRouter providers
And free with logging
Kimi k2 0905 $0.6/$2.5
DeepSeek V3.1 thinking $0.3/$1
GLM 4.5 $0.4/$1.6 (DeepInfra)
That could be it
Phantom 2 seemed a bit dumb
and oceanstone and oceanreef?
But if it is lite then it makes sense
They seemed smarter than phantom
Different architecture probably
Gemini probably requires distributed computing techniques like ring attention for that massive context
Not possible for Gemma models running on consumer hardware
the best cost x intelligence is grok 4 fast?
0.2/0.5 $
IIRC this was flagged to the team already, I'll be sure to followup.
Let me double check, I think it was Qwen
on second thought, qwen coder is cheaper
Based on SWE Bench
grok 4 fast needs thinking
grok 4 with reasoning is more expensive than qwen coder (I think)
yep
where my gemma 4? ;-;
sorry, last time I do this ok
oh
hell nah, it makes no sense for flash 3.0 to come before gemma 4
They didn't report the cost, but Qwen 3 coder scored higher than Grok 4 Fast Code (they didn't test non-code variant) on SWE-Bench verified with OpenHands framework: https://docs.google.com/spreadsheets/d/1wOUdFCMyY6Nt0AIqF705KN4JKOWgeI4wUGUP60krXXs
Grok 4 Fast Code seems to cost more than GPT-5 though.
This one is the coding variant, not sure how much more it costs.
bruh, this 0.2/0.5 is pure marketing
Perhaps it uses a lot of reasoning tokens
mm
Couldn't say 
Whats up with Gemini 3 taking so long? Pretty much every other AI company has released a new model or 2 since 2.5 pro
And google used to be the fast one in making new models
GPT-5-Mini seems to have the highest "point per dollar", but scores lower than DeepSeek V3.1
I know
alright
Where's that date from?
one day I'll visit your house 🙏
I miss having a good cost-to-score graph
brian Is just very very good at guessing
They made a new flash 2.5 now 😂
Just to shut up our mouths 🙂
This one is by the SWE-Bench team, but only has Sonnet and GPT-5 unfortunately:
I wish they put the reasoning effort with the model names
Maybe it's not a different model
sounds like an OpenAI model, but idk
mebe
Since they're trying to solve the generating forever issue
if someone sees it, tell me
2 times now I've looked at the announcements, waited a few seconds, looked away from discord, and come back to see a new announcement.
hey hkcu is the lms server like official or is that like just a group thing?
hi
hi
Hi, pls don't disappear

bruh, he disappeared
JWT token maybe
I heard it's built into YouTube now
Veo 3 fast
oh no
480p it seems https://blog.youtube/news-and-events/generative-ai-creation-tools-made-on-youtube-2025/
Ikr lol
I hate that nowadays I have to constantly doubt if a cute animal is real or not
oh god
I guess we need better discriminator models to tell AI generated videos from non-AI generated ones
I don't like that YouTube feature either, although it's trivial for YT to mark a video as AI generated if it's generated from their own tools
They're never real. One of them just wetting my hands right now..
cost per instance? Are they doing parallel compute? This graph looks like there's not nearly enough info in it
Or some type of hardware solution for signing media, but that seems hackable...
Are you trying to hack into lmarena's private API, cause I feel like you are trying to hack into lmarena's private api
soooo truuee
prompt test
it won't stay that long
The details are here: https://www.swebench.com/SWE-bench/blog/2025/08/08/gpt5/
The Bash-only mode means that they're only running in a ReAct loop. Cost is proportional to API price and number of steps used.
where does the image go after its generated?
lets say you had to choose between all other benchmarks disappearing or lmarena disappearing which would you choose?
It's saved, forever xD
i dont know where to go to look at them
Actually it's a bit scary how easily people can be de-anonymized with just simple NLP techniques
?
That's probably why Anthropic runs everything through Clio first
If it's lost, then it's inaccessible (until they release the data)
Create a realistic short video of a busy vegetable market in Tunisia. Show a Tunisian man selling fresh vegetables at his stall. Capture colorful produce, the man interacting with customers, and the lively market atmosphere. Use natural daylight and authentic Tunisian market elements. Include ambient sounds of people chatting and bargaining. Medium shot focused on the man and his stall. Cinematic, realistic style.
ok so that's quite a bit different from the usual reported score. But it seems to me that Claude needs way fewer iterations before it has a chance to arrive at a solution comparable to gpt5's. GPT5 arrives at it much sooner and stays consistent with it
Which is kinda what can be observed IRL. gpt5-high is incredibly consistent when it gets things right
Does not really need to arrive at the solution 'randomly'
Yeah, that makes me feel more confident about its answer
Not here buddy. Try here: #video-arena-1
I'm fascinated with seedream 4 but I can't use their api on the byteplus platform for some reason
Seems like it. I need to buy a VPN then to connect to hong kong or something
anyone in here knows how to run WAN animate ??
Can you enter the console?
And try out video gens or?
Isn't it free on their website
im talking on comfy ui
Not sure then
Yeah, I've made an account there but it's just a blank page when I go to: https://console.byteplus.com/ai
Does LM Arena have Seedream 4.0??
Yes!! 🔥
@queen veldt I found thread that talks about how to get and use byteplus api: https://www.reddit.com/r/Bard/comments/1nfl7tx/comment/ndxg8b0
I'm gonna try this when I buy my vpn :D
a child eating a banana and another child watching him
@echo aurora what's the difference between the old 2.5 flash preview and the one launched?
Thanks, but this is for GAIA 2, although it's quite useful too
DeepSeek Terminus seems a bit dumb
It is currently normal tide at Port Nelson. At low tide, the water drops 60 cm. A boat is currently at the port. The boat has a ladder with rungs spaced 30 cm apart. Currently, three rungs are submerged, with the water level slightly above the third rung from the bottom. At low tide, how many rungs will be submerged?
2.5 Flash Preview got it right, but the one on gemini.google.com didn't
Hello
I test the one on aistudio.google.com basically the API one
its newer
Hello! I'm here because it looks cool and I really like LMArena! A true gold!
I'm surprised Flash Lite got it too
Unless they trained on arena data, lol
its a simple text task?
why has lmsys discord been reopened?
there is scam over there (a guy posted about a "casino" scam)
the only downside of the new gemini model, i think, is that it's kind of overprotective, even if you mention it's for an html page
is the new gemini flash good
also could be because gemini.google.com didnt adapt the new model yet
over the chart usage
its on ai studio
what
too many people using basically
oh
whats nano banana
googles image generating model
possibly best on earth rn
its an image model
good at image edits
wasnt there a new qwen edit out similar to sd4 and nano?
Reasoning
I appreciate that google made this AI studio update that allows to preview HTML
what's the best one right after it?
asked it to hold a cup
2.5 flash on aistudio said 3 while the one on lmarena said 1
Common sense test:
Continue this story:
Bob is driving two of his friends to a restaurant. "So, how have your week been, guys?" he asks, before moving to the back seat to join them. The sky is a brilliant hue of amber as the sun approaches the horizon. Cars whizz past on the opposite side of the freeway. Jeremy gasps. "What
How to get veo3 for free
But they said that they adapted it on gemini discord..
its free gang
Where
Lm arena, ai studio...
It's not bro
is 2509 on
Bro it's only 2-3 videos here
No there is veo3 for 10requests
just use wan 2.5 instead of veo 3... it's free at higgs
Wan? What's that
Qwen failed to understand the situation
I use animon.. I love anime animation
It is free and unlimited
wheres the video arena
Only on Discord
Correct answer:
wan 2.5 is the veo 3 competitor
wan 2.5 sucks
Alibaba new model
But it's not free, it only has 10 credits
free at higgs for a week
veo 3 aint free either
What's higs
Openai is dominating for so long now. Im still waiting for gemini, claude and grok.
Just because the product isn't unlimited for free doesn't mean it's not free.
gemini will cook gang
There is website where we are using chatgpt 5 for free
We are waiting for DeepSeek..
DeepSeek : r1 ter
We are waiting for Gemini 3
Gemini : 2.5 flash update 🤡🤡🤡
Next one ?
I want website like this for veo 3
Because we are testing and benchmarking models? If someone uses lmarena as a tool for free ai then he is not using it as he should
Gemma 3 27B seems to have better common sense than 2.5 Flash Lite
gemini 2.5 pro latest when
Nightride
Like puter or g4f are for free-use ai. Not lmarena
Qwen Max is actually worse than Gemma 3 27B on this test lol (fails to realize no one is driving the car)
is this the correct answer
Yup
i thought, it was sorting-hat
It is correct if they get a shock that Bob is abandoning the driver's seat while driving down the highway.
All Qwen models seem to fail on it
DeepSeek as well
I'm curious to see if Kimi gets it
Oh interesting, DeepSeek Terminus gets it when it doesn't think
When I got nightride-v2 and 2.5 Pro in the same battle before, their responses were almost identical.
But nightride was better since 2.5 Pro didn't really give a complete explanation by the end.
is oceanstone better than both?
It just failed it for me
and what about nightride-on?
thats lite tho
i used normal gemini 2.5 flash
Seems so, but I didn't get it that many times
It seems to be nightride with internet connectivity
Oh true I thought yours was Lite
so the best of the (google-) "pack" is oceanstone?
It gets some stuff wrong that oceanreef didn't
due to oceanreef's web-search ability?
and skytrail?
Haven't encountered it 🤔
-# (route66 was by openAI, based on GPT5)
of all new/existing models (existing since at least a week), which is the best?
GPT-5 High
I like NSP's style better, but it leaves out things sometimes
You can compare them side by side in direct mode
It seems to ask less follow up questions
Yup
but rate-limited?
how would one prompt GPT5-high to achieve the absolute best possible result?
(in programming and roleplaying/long sandbox games)
Not sure, but the system prompt on ChatGPT (if correct) seems to be already about 18K tokens long
There's so many tools
i read somewhere that LLMs emit better code if they're immersed in a special role (e.g. being a professor)
Maybe, Anthropic used to recommend "You are an expert in ..."
and then i read something about a virtual "control panel"
which can be used for GPT5-high
(using XML tags and structured prompts)
https://cookbook.openai.com/examples/gpt-5/gpt-5_prompting_guide
@edgy hawk did you finally leave the call?
is it better in coding than Claude Opus 4.1 Thinking?
(if using detailed prompting, to maximize the correctness of the code-output)
Depends on the language
I think it's more economical to pair it with Claude
I dont get it what's the gemini flash update?
It can't write good Godot syntax and will waste tokens trying to fix its errors
Theres a new gemini flash?
Ah
Im guessing a new 2.5 flash being worked on and still in preview means RIP gemini 3 coming any time remotely soon
Not sure, there are rumors that it had a successful training run
Yea but I mean I doubt theyre gonna push 3.0 any time remotely close to releasing a new 2.5 as a preview not even full release
that rumor was debunked
Oceanstone seemed like Flash 3.0
The guy has a good record of being right though
gemini 3 is delayed, cuz they focused on gemini 2.5