#announcements

thorn mauve
#

@everyone Welcome to the new LMArena server! 👋 Since our graduation from LMSys, we're moving all Arena projects and leaderboard updates here to better serve the community. Expect more server updates and community surprises to roll out over time. ✨

All Arena-related channels in the LMSys server will be deprecated starting on March 14, 2025.

Learn more about our graduation here: https://x.com/lmarena_ai/status/1842982750095278482

As part of Chatbot Arena's graduation🎓, we're excited to announce that we changed our X handle to @lmarena_ai! For open-source systems & research at LMSys, please follow @lmsysorg.

This account, @lmarena_ai, will be dedicated to sharing Arena projects & leaderboard updates. See

jaunty rain
#

🚨 New UI Design Access - Heads Up! 🚨

@everyone It looks like the cat’s out of the bag—our new UI design access has been shared! 🐱💨

Here are a few things to keep in mind:
🔸 Super Alpha Stage – Expect breakage, data loss, and plenty of bugs along the way.
🔸 Constant Changes – Things may shift at any moment as we continue refining the experience.

Thanks for bearing with us—your feedback is invaluable! 🛠️ 💙

If you're still interested, then check it out here:

https://alpha.lmarena.ai
pw: super-alpha

PLEASE Give us feedback here: https://forms.gle/8cngRN1Jw4AmCHDn7
and 🪲 report bugs here: https://airtable.com/appK9qvchEdD9OPC7/pagxcQmbyJgyNgzPx/form

Please note: This super alpha is currently optimized for desktop only! Mobile is not yet ready at this time. 🖥️

thorn mauve
#

🙏 Bug Reports & Feedback on the new Desktop UI – Help Us Improve! 🛠️

@everyone
New UI here: https://alpha.lmarena.ai/
pw: super-alpha

Whenever possible, please submit bugs and feedback using the links below so we can prioritize and triage quickly.

💠 Feedback : https://forms.gle/8cngRN1Jw4AmCHDn7
🪲 Bugs: https://airtable.com/appK9qvchEdD9OPC7/pagxcQmbyJgyNgzPx/form
💬 Want to discuss the new UI? Join the conversation in #new-ui-feedback!

Your insights are greatly appreciated—thanks for helping us make things better! 🙌

thorn mauve
#

@everyone Our First Community event! 🎉

Check out the Events tab 🗓️ in this server for more information on our first Stage event on Thursday. We’re excited to walk through and chat about the new Desktop UI with you all!

Don’t forget to submit and vote for questions in advance for the Q&A. Hope to see you there! 👋🏽

https://discord.gg/lmarena?event=1349137953406058597

thorn mauve
#

@everyone 📢 If you haven't already, let us know if you're coming to our first event tomorrow! Details above! 👆

BTW, we heard you in #new-ui-feedback and will demo an update that just rolled out around the voting UI. Check out the preview here and go test it out in the alpha! ✅

➡️ Reminder to submit and vote for Q&A here: https://app.sli.do/event/7bThoD3UhfcLdJTLiteEgz pw: super-alpha

thorn mauve
#

@everyone 📢 Today is the day for our first community event!

🎉 Head to The Arena Stage channel for a live walkthrough of the new Desktop UI and a little Q&A. The event will start at <t:1741894260:R> and be about 30 mins long.

We can’t wait to see you there! 😃

thorn mauve
#

@everyone Thank you to everyone who joined our first Discord community event!

We want to make these even better, and your feedback is key. Tell us what you loved, what could be improved, and what you’d like to see next! Didn’t attend? You can still let us know what you would like to see:

💡 Why share your thoughts?
💠 More events tailored to what you want
💠 Better content, speakers, and formats based on your input
💠 A chance to shape the future of our community

👉 Take a minute to fill out our feedback form here: https://forms.gle/Hr9xgSTWnyLVR9tC8

Your input makes all the difference—let’s make the next one even bigger and better! 🚀

thorn mauve
#

@everyone New Desktop UI – still in Alpha!

Your feedback so far has been incredibly helpful!! 🙏 🙌
Please keep testing the Alpha here: https://alpha.lmarena.ai
🔑 We've removed the password to make it easier for you!

REMINDER: as an Alpha UI it's got limited features, but frequent updates. Be sure to go to the current site for the freshest leaderboard data, features, and the full set of models.

Please share feedback and bugs through these channels, as it makes it easier for us to reproduce errors and prioritize requests. We're already hard at work implementing many of them! 🚧

💠 Feedback : https://forms.gle/8cngRN1Jw4AmCHDn7
🪲 Bugs: https://airtable.com/appK9qvchEdD9OPC7/pagxcQmbyJgyNgzPx/form

💬 Want to discuss the new UI? Join the conversation in #new-ui-feedback!

thorn mauve
#

@everyone We love all the great feedback on the Alpha Desktop UI—keep it coming! Your insights are shaping the future of Arena, and we truly appreciate it.
As we continue working on some major improvements to bring everything up to speed with the current site, we’re re-implementing the password to keep access limited to you, our community. This is still an early-stage build, and not yet intended for public use.

🔑 New password: still-alpha

Thanks for being part of this journey with us! Keep testing, sharing feedback, and helping us make Arena even better.

thorn mauve
#

Hi @everyone!

Thanks so much for continuing to test the alpha — we’ve been busy making updates based on your feedback. Here’s what’s new in the alpha:

  • 🛠️ Fixed a bug where messages wouldn’t save (which also caused votes to fail)
  • ✍️ O3-Mini now correctly formats text
  • 📊 Leaderboard columns are now sortable
  • 🔄 Leaderboard data is updated live

🔗 Please keep testing the Alpha: https://alpha.lmarena.ai/
🔑 Password: still-alpha

Your feedback is super valuable — it helps us reproduce bugs and prioritize your requests. We’re already hard at work on more improvements! 🚧

💠 Feedback: https://forms.gle/8cngRN1Jw4AmCHDn7
🪲 Bugs: https://airtable.com/appK9qvchEdD9OPC7/pagxcQmbyJgyNgzPx/form

thorn mauve
#

📱Alpha is now mobile-ready! @everyone

You can now test the new Arena Alpha UI right from your phone. Whether you're on the go or just prefer mobile, the experience is now optimized and ready for you.
We know a lot of you have been waiting for this — so now’s your chance to dive in and put it to the test!

🔗 https://alpha.lmarena.ai
🔑 Password: still-alpha

As always, your feedback helps us improve fast:
💠 Feedback: https://forms.gle/8cngRN1Jw4AmCHDn7
🪲 Bugs: https://airtable.com/appK9qvchEdD9OPC7/pagxcQmbyJgyNgzPx/form

Come chat in #new-ui-feedback and let us know what you think!

thorn mauve
#

@everyone Thank YOU for all the rich feedback on Alpha UI in the past week!!

Please keep testing, and sending notes!
🔓 No password needed anymore at https://alpha.lmarena.ai/

Reminder: this is still an early version, so some features are limited, but updates are coming fast for Desktop & Mobile.
For the latest models and leaderboard data, use the main site.

Got thoughts or found a bug? Let us know here:
💠 Feedback: https://forms.gle/8cngRN1Jw4AmCHDn7
🪲 Bugs: https://airtable.com/appK9qvchEdD9OPC7/pagxcQmbyJgyNgzPx/form
💬 Chat about it in: ⁠#new-ui-feedback

thorn mauve
#

📣 Big News: We are starting a company + New Beta is Live! 💥

Hey @everyone — we’ve got two big updates to share today:

1️⃣ We are starting a company to support LMArena!
We began as a scrappy academic project out of UC Berkeley, and thanks to all of YOU, we’re taking the next step that will allow us to stay committed to improving the platform you’ve helped us build. LMArena will stay neutral, open, and accessible to everyone. Read more here: https://blog.lmarena.ai/blog/2025/new-beta/

2️⃣ Beta is LIVE!
We’ve been listening closely to your feedback from the Alpha, and today, we’re releasing a Beta version of the new LMArena site:
🔗 https://beta.lmarena.ai

📝 A note for those who were testing the Alpha: saved chats won’t carry over in this version. We know that’s a bummer, and we’re still fine-tuning this. Thanks for your patience as we continue to improve! 🤗

☑️ Since Beta will still be a bit buggy, we will be testing the signal quality. Votes are currently being stored, and we'll start to include them properly as the signal quality increases. Your feedback helps us make the evaluation stronger, so please keep voting!

🔗 Try the Beta: https://beta.lmarena.ai
💠 Feedback: https://forms.gle/8cngRN1Jw4AmCHDn7
🪲 Bugs: https://airtable.com/appK9qvchEdD9OPC7/pagxcQmbyJgyNgzPx/form
💬 Chat about it in: ⁠#new-ui-feedback

Thank you for being part of this. Let’s make it great together! 🤝

thorn mauve
#

The feedback is coming in hot for Beta! 🔥 THANK YOU @everyone 🙌🏽
...and we are responding!!

We now have:
🌓 Dark/Light mode toggle in the top right
✂️ Copy/paste images directly into the prompt box
✨ A few polish items in the leaderboard

Keep sending the feedback! 🙏🏽

thorn mauve
#

Hi @everyone, do you have time for a quick poll? 👋

Every vote helps us understand the community better - thanks as always for your help! 🙏🏽
#general message

thorn mauve
#

@everyone Around this time 2 years ago, this community helped us launch our very first Arena leaderboard!

Today we’re celebrating what you all have built together on LMArena! 🥳 👏 A few fun stats:
☑️ 3M+ votes!
🤖 400+ models on the leaderboards!
📊 300+ pre-release evaluations!
📝 10+ open datasets for prompts and user preferences

Take a look at our OG leaderboard below - back when we were still developing Vicuna 🦙

Read more about this community’s impact in our blog post: https://blog.lmarena.ai/blog/2025/two-year-celebration/
tweet: https://x.com/lmarena_ai/status/1916620122342695363

minor tide
#

Hi @everyone! ablobwave

Building community is paramount to LMArena's mission. That's why our team is investing more time and energy into creating a space for those interested in making an impact in AI. I'm excited to share that I'll be stepping in as this Discord's community manager! You'll be hearing more from me as we implement improvements to help grow, engage, and protect this space - all in service of building our AI community.

Building community requires... well... community! So I'd love to hear your thoughts through this survey about possible changes you'd like to see happen in this Discord.

It's nice to meet you!
grizzblob

minor tide
#

Hey everyone ablobwave Quick heads up that over the next few days, we'll be making some server changes focused on new member onboarding, channel structure, and mod reporting. Listening to community feedback is incredibly important to us, so please let us know how you’re feeling about these changes. Don’t hesitate to reach out with any questions!

Also, many of you have requested independent scrolling. Here's a sneak preview of what we're working on!

minor tide
#

Server Updates

@everyone As some of you may have noticed, we've implemented a few changes to the server in an effort to make a more engaging and protected space. We're driven by community feedback, so if you have thoughts on these changes, fill out this form!

These changes include:

discord Server Structure

  • We've added a new Forum Category that's intended for gathering feedback, troubleshooting issues, and model requests. This is intended to replace the #new-ui-feedback and #arena-feedback channels to better organize and track issues, feedback, and requests! Note: if you're not seeing these new channels, you may have to enable them in Channels & Roles -> Browse Channels.
  • New Roles have been created! In the Channels & Roles section you'll find a few questions that'll auto-assign new roles. These roles will allow for more targeted announcements ensuring you all are getting the information that's most important to you!
  • The Server Guide is now where you'll find our #rules & #information-desk channels. Server Guide is located at the top of the channel list.
  • Channels that weren't getting much use are being moved to the Archived category. We're looking to create a more engaging server and phasing out channels with little use will reduce clutter.

shield Moderation

  • For immediate needs, you can ping the <@&1349916362595635286> role.
  • For issues you'd like to flag privately, you can now **send a Direct Message to the ModMail bot**, which you'll find at the top of the Member List. This gives members more options for reporting bad behavior.
  • We've updated our #rules a bit, so be sure to give them a peek. These changes are intended to keep discussion more on-topic and create a more inclusive space.

🪴 The Future

  • Our plans are to host events on a more regular basis! Staff AMAs, contests, and casual game/activity nights are all things to look forward to. So keep an eye out!
minor tide
#

New Models added to the Beta Site

<@&1372208635530448926> new models are now live on the beta site! Go check em out! ablobparty

  • mistral-medium-2505
  • claude-3-7-sonnet-20250219-thinking-32k
  • amazon.nova-pro-v1:0
  • command-a-03-2025
minor tide
#

Mistral Medium 3 making waves 🌊

<@&1372208524230397962> Since the debut of Mistral Medium 3 we've seen some impressive moves on the leaderboards:

  • #11 overall in chat (a +90 leap from Mistral Large)
  • Top-tier in technical domains (#5 in Math, #7 in Hard Prompts & Coding)
  • #9 in WebDev Arena
minor tide
#

@everyone Sharing some news about the future of LMArena! lmarenalogo

Next week, we are planning to flip the switch and change the current site to the beta site! Additionally, we’re excited to share the news that we have received $100M in seed funding! What this means is hiring more people, increasing site performance and incorporating community feedback faster into the experience. Community feedback will continue to play a crucial role in helping shape the platform.

You may also notice the fresh coat of paint on the server! Along with the new look we’ve added some custom emojis - lmarena battle directchat leaderboard sidebyside battle3d sidebyside3d directchat3d trophy3d

We’re planning to host a staff AMA soon! If you have questions for the staff please fill out this form. We’ll be sure to announce the date/time when confirmed. We also plan to record the event for those that are unable to attend it live.

Quick note to use the new #1372230675914031105 and #1343291835845578853 forum channels for any feedback or bugs you’d like to share with the team. Also, be sure to grab those server roles in the Channels & Roles section if you haven’t already!

minor tide
#

New Model Update

<@&1372208635530448926> happy to share that **the next generation of Claude is in the Arena!** lmarenalogo

  • Claude Opus 4
  • Claude Sonnet 4
minor tide
#

Staff AMA

@everyone Happy to share that we'll be hosting our first Staff AMA at <t:1749229200:F>! **LMArena's Cofounder & CEO Anastasios Angelopoulos** will be first up in our series of Staff AMAs we plan to bring to this community on a more regular basis. For those unable to attend live, we plan to record the event.

Sign up! lmarenalogo

minor tide
#

Style Control is now the Default View

<@&1372208524230397962> <@&1372208635530448926> Last summer, Style Control was introduced to disentangle model response quality from stylistic factors like length and markdown formatting, helping to better reflect core capabilities. Many community members agreed this provided a clearer assessment, so as of today, Style Control is now the default view. You can still disable Style Control through the filter options.

<@&1372208590248742964> be sure to check out our research into how style influences votes here!

minor tide
#

Independent Scrolling is here

<@&1372208635530448926> The highly requested independent scrolling feature is now here! Each response area is now individually scrollable. Thank you for all your feedback and enjoy! lmarenalogo

minor tide
#

New a16z podcast just dropped! 🎙️

<@&1372208635530448926> New a16z podcast here! Our cofounders sat down with the lead investor at a16z to talk about LMArena and the future of AI evaluation. The team discusses how LMArena evolved over time, why subjective data is crucial, and what it means to build a CI/CD pipeline for large models. Watch the episode here. lmarenalogo

We'll be hosting a watch party <t:1748619000:F> in #1340554757827461215 for anyone that wants to join!

a16z general partner Anjney Midha sits down with LMArena cofounders Anastasios N. Angelopoulos, Wei-Lin Chiang, and Ion Stoica to talk about the future of AI evaluation.

As benchmarks struggle to keep up with the pace of real-world deployment, LMArena is reframing the problem: what if the best way to test AI models is to put them in front of mi...

minor tide
#

New Model Update

<@&1372208635530448926> DeepSeek R1-0528 is now in the Arena! Go check it out. lmarenalogo

minor tide
#

AI Generation Contest

@everyone We want to see what you all can make with LMArena! Happy to share that today we’re announcing our first AI Generation Contest. Each month we’ll be crowning a new <@&1378032433873555578> through community voting.

battle How will it work?

  • You may have noticed the new #june-contest channel, which is where you all will “submit” your creations and discuss what others submit
  • To submit, simply post a screenshot of what you’ve created through LMArena to #june-contest
  • On June 20th, submissions will close and we’ll circulate a survey allowing you all to vote on which ones you like best
  • The person with the highest score will be declared our winner!

leaderboard What could I win?

  • 1 month of Discord Nitro nitro
  • <@&1378032433873555578> role! You’ll hold onto this exclusive role (for at least a month), which will be hoisted towards the top of the member list, showing the whole server your accomplishment!

sidebyside Rules

  • Submissions must be done through Battle Mode; submissions created through side-by-side or direct chat will not be accepted
  • Your submission must include both the left and right response
  • Your submission must be captured after you’ve voted for the response you prefer, meaning the models should be revealed in your submission
  • Only one submission per person

Example provided below.

This month’s theme: Image - Cozy Desk

Let’s get cozy! Warm beverages, fluffy blankets, and overall snug vibes are what we’re looking for - but at a desk. Get creative and show us what you think would make a cozy environment. This month’s contest will be for image creations only.

Happy submitting and good luck! lmarenalogo

minor tide
#

Early Access Feedback Program

@everyone We're planning to take the feedback process to a whole new level - and you're invited to apply!

The LMArena Test Garden will be our new private feedback program. Selected users will get exclusive sneak peeks at features, design mocks, and ideas the team is considering, so we can gather early feedback and make sure we're on the right path. If you're exceptional at providing feedback and would like to see what we're cooking up, this program is for you!

Apply here

If selected, we'll follow up privately with next steps. lmarenalogo

minor tide
#

A Few Reminders

@everyone
lmarenalogo We have a contest running right now! Don't miss out on the chance to win! Post your creations to the #june-contest channel. More details here - #announcements message

lmarenalogo Excited to provide feedback and interested to see the things we're building behind the scenes? Apply to our Test Garden! Apply here!

lmarenalogo Thank you to all who attended last week's Staff AMA! We look forward to bringing more of these in the future. Please share your feedback here!

minor tide
#

6/16/25 Error Message

The team is aware of a widespread issue where models aren't providing a response but instead are erroring out. We are working on a speedy fix. Our apologies for any inconvenience this causes. I'll update this message when it's fixed.

It's fixed! Should be working again. ablobcheer Don't hesitate to @ me if you're still having issues. lmarenalogo

minor tide
#

June's AI Generation Contest Submissions

@everyone Help us determine June's <@&1378032433873555578> for Cozy Desk by voting here!

Reminder for what we're looking for:

Let’s get cozy! Warm beverages, fluffy blankets, and overall snug vibes are what we’re looking for - but at a desk. Get creative and show us what you think would make a cozy environment.

minor tide
#

New Leaderboard - Image Edit leaderboard

@everyone <@&1372208635530448926> <@&1372208524230397962> A new leaderboard is now live! Driven by community votes, the Image Edit Leaderboard is now available with 7 models currently filling the ranks! With Image Edit, you can upload an image and directly compare each model’s editing capabilities.

Go check out the Image Edit Leaderboard here! lmarenalogo

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> New models added to LMArena!

  • mistral-small-2506
  • imagen-4.0-ultra-generate-preview-06-06
  • ideogram-v3-quality
minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> New model added to LMArena!

  • grok-3-mini-high
minor tide
#

July Contest Update lmarena_logo

<@&1385704581677191278> To celebrate the launch of the Image Edit Leaderboard let's incorporate some Image Edit capabilities into our July contest!

How does it work and what are the rules?

  • Submit your entry by sharing a screenshot of what you've created in the #july-contest channel.
  • On July 25th submissions will be closed and we'll circulate a way for the community to vote for our winner.
  • Submissions must be done through Battle Mode & use the Image Edit functionality. You must use both an image and text to inspire something new.
  • Your submission must include both the left and right response.
  • Your submission must be captured after you've voted for the response you prefer, meaning the models should be revealed in your submission.
  • Example here.

What could I win?

  • 1 Month of Discord Nitro nitro
  • Become the newest member to receive the <@&1378032433873555578> role!

July's Contest Theme - Out of Place Objects in Space! 🪐 🚀 ☄️ Sci-fi space environment is a must, but include something that clearly doesn't belong. Confuse us!

June's Contest Winner lmarenalogo

Help me send a big congrats to @tawny basalt for being our very first <@&1378032433873555578> !! A very cozy desk was achieved, check it out here!

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> New model added to LMArena!

  • seedream-3 (text-to-image)
minor tide
#

New Model Update lmarenalogo

@everyone <@&1372208635530448926> New model added to LMArena & WebDev Arena

  • grok-4
minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> New model added to LMArena!

  • kimi-k2
minor tide
#

Light UI Improvements Now Live lmarenalogo

@everyone <@&1372208635530448926> We're excited to share with you all that we've made some light UI improvements that are now live! Our intention with these improvements is to make the overall experience more polished, intuitive, and delightful. Many of these improvements were inspired by community feedback.

What’s new?

  • A more streamlined interface. The core chat UI is now a more focused and minimal experience that reduces visual clutter.
  • A more compact sidebar. This provides quick access to About Us, How It Works, Feedback, Leaderboards, and other sections.
  • Leaderboard tab now has improved navigation. Accessing the leaderboards you care most about can now be done faster.

As always, community feedback is crucial. We'd love to hear what you think about these changes here.

We have an exciting July ahead with numerous changes and improvements in the pipeline that we're eager to share with you!

minor tide
#

Search Arena is Now Live lmarenalogo

@everyone <@&1372208635530448926> A new modality has been added to LMArena. Check out Search Arena here!

7 models with search capabilities are ready and waiting for your testing. Note: select the Search modality in the chat box first.

  • Grok 4
  • Claude Opus 4
  • Sonar Pro High & Reasoning Pro High
  • o3
  • GPT 4o-Search Preview
  • Gemini 2.5 Pro Grounding

Learn more about what Search Arena has taught us about human-AI interactions on our blog post.

minor tide
#

Psst… we’ve got a surprise for you.

An experimental Video Arena bot is now live, and you can try it exclusively in this server!

Generate videos and images with top AI video models with the LMArena bot:

🗳️ Vote on each other’s creations
🧠 Learn how to use it in #1397655624103493813
💬 Share feedback in #bot-feedback

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> New models added to LMArena!

  • GLM-4.5
  • GLM-4.5 Air
minor tide
#

Video Arena is live… here on Discord!

@everyone <@&1372208635530448926> <@&1398740297521037332> We’re launching an experimental Video Arena here on Discord. Generate videos with the top AI models for free, and compare their results right here in our community server.

Learn how to use this bot in #1397655624103493813 and start generating in #video-arena-1 #video-arena-2 #video-arena-3.

What does the bot do?

  • With the LMArena bot you can generate videos, images, and image-to-videos. Similar to battle mode, you’ll be given two generations and anyone will be able to vote on which they prefer. After a certain number of votes, the bot will reveal the models.

Why is this being considered an experiment?

  • There are a lot of firsts with this one. First time access is exclusive to our Discord server. First time others are able to vote on others’ generations. First time you all will be inspired by and chat about what you and others generate. That being the case, we're excited to hear what the community thinks!

To celebrate this new milestone, join us for a Staff AMA with the Bot’s Developer - Thijs Simonian on <t:1754672400:F>. Be sure to submit any questions you have here.

minor tide
#

Open Data Release lmarenalogo

@everyone <@&1372208590248742964> We are sharing a new dataset with over 140k conversations from the text arena collected between April 17th and July 25th 2025. Join us as we explore real-world trends, new features, and fresh prompts.

What’s covered in the latest analysis:

  • Language & topic breakdowns
  • Rating changes: How Arena scores shift over time
  • Overview of the released dataset

And more!

The data analysis highlights not just who is winning, but why, and what signals might matter most in human-based AI evaluations.

Read the full breakdown here on our blog.

minor tide
#

New Model Capability Update - Veo3 Image-To-Video is Here lmarenalogo

<@&1372208635530448926> <@&1398740297521037332> New model capability added to Video Arena

  • Veo 3 Fast & Veo 3 now have **Image-to-Video with audio** capabilities!

Give it a try using /image-to-video in our video-arena channels: #video-arena-1 #video-arena-2 #video-arena-3 and vote on what you think is best!

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> New models added to Text & WebDev Arena on LMArena!

  • OpenAI gpt-oss-120b
  • OpenAI gpt-oss-20b
  • Claude Opus 4.1 (battle mode only)
minor tide
#

Video Leaderboards Now Live lmarenalogo

<@&1372208635530448926> <@&1372208524230397962> <@&1398740297521037332> Thanks to the contributions of this community we now have Video Leaderboards available!

  • Text-to-Video Arena Leaderboard Here
  • Image-to-Video Arena Here
minor tide
#

New Video Models Update lmarenalogo

<@&1372208635530448926> <@&1398740297521037332> New models have been added to Video Arena

  • Hailuo-02-pro
  • Hailuo-02-fast
  • Sora
  • Runway-Gen4-turbo

Give them a try in our video-arena channels: #video-arena-1 #video-arena-2 #video-arena-3

minor tide
#

GPT-5 is here! lmarena_logo

@everyone Now that OpenAI's GPT-5 is here, we're thrilled to share that this model has been setting a new bar across our leaderboards. Tested under the codename “summit”, GPT-5 now holds the highest Arena score to date.

Thanks to the community's voting, GPT-5 is now #1 in Text, Vision, and WebDev Arena.

GPT-5 is now available on LMArena!

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> New models added to LMArena!

  • gpt-5-mini-2025-08-07
  • gpt-5-nano-2025-08-07
minor tide
#

New milestone - 15,000 members! lmarenalogo

Thank you all for being a part of this amazing community. Whether you've been here since day one or recently joined us, the LMArena team is incredibly grateful that you're all here!

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> New models added to Search Arena on LMArena!

  • gpt-5-search
  • claude-opus-4.1-search
minor tide
#

July Contest Update

<@&1385704581677191278> Thank you for your patience with this one! Vote on July's contest submissions here! On Friday 8/15 we'll announce the winner and start our next contest.

Vote Here.

minor tide
#

Leaderboard Update lmarenalogo

<@&1372208635530448926> <@&1372208524230397962> Thank you all for patiently waiting for this update. We're happy to share that the leaderboards were just updated to include the GPT-5 variants.

  • gpt-5-high
  • gpt-5-chat
  • gpt-5-mini-high
  • gpt-5-nano-high

Check out the leaderboards here.

minor tide
#

August Contest Update lmarena_logo

<@&1385704581677191278> To celebrate the launch of our experimental Video Arena, let's bring some video generation into our August contest!

How does it work and what are the rules?

  • Use /video to generate two videos related to our contest theme (Slice 🔪) in our video arena channels (#video-arena-1).
  • alarm To submit your generation to the contest -> **Forward the generated message to the #august-contest channel**. This is where we are collecting submissions.
  • To forward a message, hover your mouse over the message, select Forward, and send it to the #august-contest channel.
  • Submissions will close on Sept 12th!

What could I win?

  • 1 Month of Discord Nitro nitro
  • Become the newest member to receive the <@&1378032433873555578> role!

August's Contest Theme - Slice! 🔪 Show us those oddly satisfying, safe‑for‑work, crisp cross‑section cuts into everyday objects. Think “is it cake?” videos. Examples here & here.

July's Contest Winner lmarenalogo

Big congrats to @delicate iris for being our July <@&1378032433873555578> ! Check out their generation here.

minor tide
#

BiomedArena is here! lmarenalogo

@everyone <@&1372208635530448926> We're excited to announce that we're partnering with DataTecnica LLC & the National Institutes of Health to create BiomedArena! This new arena is focused on real-world biomedical workflows, from literature review to disease modeling, using open, reproducible methods trusted by scientists.

It's already in use at NIH’s Intramural Research Program. We're proud to support and help expand this work around:

  • Open, reproducible evaluations
  • Expert-in-the-loop feedback
  • Scientific transparency at scale

Check out BiomedArena for yourself here and read more about this partnership on our blog here!

minor tide
#

Video Arena Highlights Channel lmarenalogo

<@&1398740297521037332> We’re introducing #highlight! You’ll now notice a ⭐ has been added to the voting area in Video Arena. This ⭐ allows users to **highlight spectacular generations they want shown off!** After 4 people choose to highlight a generation, it’ll automatically be posted to #highlight for all to appreciate.

Reminder: our August Contest is currently running. Be sure to forward your 🔪Slice 🔪 generations to #august-contest - more info here.

minor tide
#

Legacy Site Update lmarenalogo

<@&1372208635530448926> - Our legacy website began this wonderful journey of exploring the world's leading AI models and shaping community leaderboards.

Looking toward the future, we've decided to invest our full efforts into the current version of the site. We have bittersweet news to share: our legacy site is no longer available. All the great features from the legacy site are either under consideration or currently in development. We're listening—please tell us in our feedback forum which features matter most to you.

A heartfelt thanks from the LMArena team to everyone who's been on this journey with us from the early stages! The legacy site will always have a place in our hearts. ❤️

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> - New model added to Image Edit on LMArena!

  • qwen-image-edit
minor tide
#

Image Edit Leaderboard Update lmarenalogo

<@&1372208524230397962> - The Image Edit Leaderboard has been updated and Qwen-Image-Edit is now the #1 open model for Image Edit!

Check out our leaderboards here.

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> New models added to LMArena!

  • deepseek-v3.1
  • deepseek-v3.1-thinking
minor tide
#

Video Arena Bot is Working lmarenalogo

<@&1398740297521037332> Thank you for your patience while we resolved the issue. The @royal stirrup Bot is working again.

Reminder how to use the bot -> type /video or /image-to-video only in these channels #video-arena-1 #video-arena-2 #video-arena-3

minor tide
#

Battle, Side by side, Direct - Why? lmarenalogo

<@&1372208473269473320> We'd love to understand better why you use your preferred version.

Please fill out this survey if you'd like to share your thoughts. blobthanks

minor tide
#

Gemini-2.5-Flash-Image-Preview Release lmarenalogo

@everyone <@&1372208635530448926> A lot of you have found a recent anonymous model a-peeling & today we're thrilled to share with you that nano-banana = Gemini-2.5-Flash-Image-Preview 🍌

The previous two weeks of testing this model have led to the largest Elo score jump in LMArena history. The text-to-image leaderboard & image-edit leaderboard have been updated and a new leader of the pack has emerged.

Gemini-2.5-Flash-Image-Preview is now available in Battle, Side by Side, & Direct modes. Try it out now & join us in #nano-banana to tell us what you think!

minor tide
#

New Model Update - MAI-1-preview lmarenalogo

<@&1372208635530448926> - A new model provider has landed on our text leaderboard. Microsoft AI's MAI-1-preview is now sitting at #13!

Come check out MAI-1-preview available now on LMArena.

minor tide
#

User Sign In - Google Sign-in lmarenalogo

You all have been waiting patiently for this feature and we’re thrilled to share that **we're starting to roll out User Login!** On our canary site you can currently log in with your Google Account.

A few notes:

  • You will be able to access your Chat History on different devices when logged in.
  • This is currently only available on our canary site. Desktop & mobile.
  • When you create/login to an account, you can merge your existing chats with the account with the Merge existing chats with your account toggle.
  • To log out, open the sidebar, locate your email at the bottom left, and click the three dots.

We want to ensure this feature is working properly which is why we’re slowly rolling this out. Make us aware of any bugs in here: #1343291835845578853 and be sure to share any feedback in here: #1372230675914031105 . Plans for more sign-in options are being worked on.

To access User Login please use this link: https://canary.lmarena.ai/

minor tide
#

User Login - Google Sign-in lmarenalogo

@everyone <@&1372208635530448926> You all have been waiting patiently for this feature and we’re thrilled to share that **we're starting to roll out User Login!** You can currently log in with your Google Account.

A few notes:

  • You will be able to access your Chat History on different devices when logged in.
  • When you create/login to an account, you can merge your existing chats with the account with the Merge existing chats with your account toggle.
  • To log out, open the sidebar, locate your email at the bottom left, and click the three dots.
  • There is a small hold-out group, meaning some users won't have access to this feature yet. We will roll this out to everyone as soon as we're certain everything is working properly.

Be sure to share any bugs in here: #1343291835845578853 and be sure to share any feedback in here: #1372230675914031105 . Plans for more sign-in options are being worked on.

Access User Login here!

minor tide
#

Only 9 days left on our August Video Generation Contest lmarenalogo

<@&1398740297521037332> Reminder that you only have 9 days left to submit your generation for our Video Gen Contest!

How does it work and what are the rules?

  • Use /video to generate two videos related to our contest theme (Slice 🔪 ) in our video arena channels (#video-arena-1) .
  • alarm To submit your generation to the contest -> **Forward the generated message to the #august-contest channel.** This is where we are collecting submissions.
  • To forward a message, hover your mouse over the message, select Forward, send to the #august-contest channel.

August's Contest Theme - Slice! 🔪 Show us those oddly satisfying, safe‑for‑work, crisp cross‑section cuts into everyday objects. Think "is it cake" videos. Examples here & here.

minor tide
#

Video Arena Discord Bot is Working lmarenalogo

<@&1398740297521037332> The bot is working again! Thank you all for your patience while we worked on a fix.

Reminder how to use the bot:

  1. You must be in #video-arena-1 , #video-arena-2 , or #video-arena-3
  2. Type /video
  3. Type in your prompt and hit Enter
minor tide
#

User Login & Rate Limits lmarenalogo

<@&1372208635530448926> – Due to unprecedented traffic, we’re introducing rate limits for image generation. Logged-in users will continue to enjoy higher limits, and we’ll keep making the login experience better so contributing to community evaluations becomes even more rewarding.

You can learn more about User Login here.

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> New Models added to LMArena!

  • Qwen3-max-preview
  • Kimi-K2-0905-preview
minor tide
#

Multi-Turn for Image Edit lmarenalogo

@everyone Multi-turn editing is now available on all image edit models! Refine your image step by step instead of trying to fit every edit into one mega-prompt.

Multi-Turn for Image Edit is available in Battle, Side by Side, or Direct. Try it out for yourself.

minor tide
#

Video Arena Rate Limit lmarenalogo

<@&1398740297521037332> Due to increased usage of the experimental Video Arena, we're setting the individual use limit to 5 generations per day. After you've hit this limit, you will have to wait 24 hours to start using the bot again.

Reminder how to use Video Arena can be found here.
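
For anyone curious how a limit like this could behave, here is a minimal Python sketch. It is not the Video Arena bot's actual implementation: the names `GenerationLimiter`, `UserQuota`, and `try_generate` are hypothetical, and it assumes the 24-hour wait starts at the moment the 5th generation is used.

```python
from dataclasses import dataclass

DAILY_LIMIT = 5                   # "5 generations per day" from the announcement
COOLDOWN_SECONDS = 24 * 60 * 60   # "wait 24hr" from the announcement


@dataclass
class UserQuota:
    """Per-user usage state (hypothetical structure)."""
    used: int = 0
    locked_until: float = 0.0


class GenerationLimiter:
    """Tracks generations per user and locks a user out after the limit."""

    def __init__(self) -> None:
        self._quotas: dict[str, UserQuota] = {}

    def try_generate(self, user_id: str, now: float) -> bool:
        """Return True if the user may generate at time `now` (in seconds)."""
        quota = self._quotas.setdefault(user_id, UserQuota())
        if quota.used >= DAILY_LIMIT:
            if now < quota.locked_until:
                return False          # still inside the 24-hour wait
            quota.used = 0            # wait is over, start a fresh count
        quota.used += 1
        if quota.used == DAILY_LIMIT:
            # Assumption: the wait is measured from the 5th generation.
            quota.locked_until = now + COOLDOWN_SECONDS
        return True
```

Under these assumptions, a user's first five calls succeed, a sixth call is refused until 24 hours after the fifth, and other users are unaffected. The sketch simplifies one thing: a count below the limit never expires on its own.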

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> New Model added to LMArena!

  • Seedream-4
minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> New Models added to LMArena!

  • Qwen3-next-80b-a3b-instruct
  • Qwen3-next-80b-a3b-thinking
minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> New Model added to LMArena!

  • Hunyuan-image-2.1
minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> New Model added to LMArena!

  • Seedream-4-high-res
minor tide
#

Battle, Side by side, Direct - Why? lmarenalogo

<@&1372208419418673302> <@&1372208243681660978> <@&1372207111445938226> We'd love to understand better why you use your preferred version.

Please fill out this survey if you'd like to share your thoughts. blobthanks

minor tide
#

August Contest Update lmarena_logo

<@&1385704581677191278> Thank you to everyone that participated in our first Video Arena GenAI contest! Vote on which you like the best to crown our new <@&1378032433873555578>

This contest theme is 🔪Slice🔪 Show us those oddly satisfying, crisp cross‑section cuts into everyday objects.

Vote Here

minor tide
#

Text-to-Image & Image Edit Leaderboards Updated lmarenalogo

<@&1372208524230397962> Our Text-to-Image & Image Edit leaderboards have been updated with some interesting movement.

Seedream-4-high-res is now tied with Gemini-2.5-flash-image-preview (nano-banana) for the #1 slot on the Text-to-Image leaderboard. Check it out yourself on the Text-to-Image leaderboard Here.

For Image Edit we're now seeing Seedream-4-high-res holding the #2 position. Image Edit leaderboard can be found Here.

Tell us what you think in our #leaderboards channel.

minor tide
#

AI Eval Product Update lmarenalogo

<@&1372208635530448926> It's our mission to improve the reliability of AI. We're introducing an evaluation product to analyze human-AI interactions at scale, turning their complexity into insights the AI ecosystem will benefit from.

Our AI Evaluation service offers enterprises, model labs, and developers comprehensive evaluations grounded in real-world human feedback. LMArena AI Evaluations consist of:

  • Comprehensive, in-depth evaluations based on feedback from our community.
  • Auditability through representative samples of feedback data.
  • Service-level agreements (SLA) with committed delivery timelines for evaluation results.

Analytics based on community feedback reveal strengths, weaknesses, and tradeoffs—helping providers build even better models and AI applications for everyone.

Are you an enterprise, model lab, or developer that wants to learn more about our AI Evaluation services? Read more on our blog.

minor tide
#

Top 10 Open Model Update lmarenalogo

@everyone <@&1372208524230397962> New open models have entered the Text Arena, and the rankings by provider have shifted for September. Only the top 7 open models also rank within the top 50 overall (proprietary & open).

Some noteworthy highlights trophy3d

  • Qwen-3-235b-a22b-instruct is currently in the top slot
  • Longcat-flash-chat debuts on the charts in impressive manner, landing at #5
  • Top models are now clustered even closer in score

lmarena Holding Firm

  • Qwen-3-235b-a22b-instruct stays at #1 (overall rank #8)
  • Kimi-K2-0711-preview firm at #2 (overall rank tied for #8)
  • DeepSeek-R1-0528 holding steady at #3 (overall rank #9)
  • GLM-4.5 holds at #4 (overall rank #13)
  • Mistral-Small-2506 at #9 (overall rank tied at #53)

directchat New Entrants

  • Longcat-flash-chat debuts at #5 (overall rank #20)

leaderboard Movers

  • MiniMax-M1 went from #5 → #6 (overall rank #43)
  • Gemma-3-27b-it shifts from #6 → #7 (overall rank #46)
  • gpt-oss-120b drops to #8 (overall rank #51)
  • Llama-3.1-Nemotron-Ultra-253b-v1 drops from #8 → #10 (overall rank #53)

battle Dropouts

  • Command-A-03-2025 (#10 → out)

Check out the details for yourself on our leaderboards. Let us know what you think in the #leaderboards channel.

minor tide
#

New Model Update lmarenalogo

@everyone <@&1372208635530448926> New models added to LMArena!

  • Grok-4-fast
  • Grok-4-fast-search

Grok-4-fast Release

Grok-4-fast-search by xAI was tested under the codename menlo and has rocketed to #1 on the Search Leaderboard! Text Arena also tested Grok-4-fast, codename tahoe, where it has debuted impressively at #8 on the Text Leaderboard.

Check out the rankings for yourself and let us know what you think in #leaderboards.

minor tide
#

Model Update lmarenalogo

<@&1372208635530448926> Thank you all for your patience with us as we made adjustments to seedream-4. An update has been made where seedream-4-2k is now available in Battle, Direct, & Side by Side. The model known as seedream-4-high-res is not available at this time. Keep an eye on this channel as we'll continue to post new model updates here.

Give seedream-4-2k a try here!

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> New Models added to LMArena!

  • deepseek-v3.1-terminus
  • deepseek-v3.1-terminus-thinking
minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> New Models added to LMArena!

  • qwen3-max-2025-09-23
  • qwen3-vl-235b-a22b-thinking
  • qwen3-vl-235b-a22b-instruct
minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> New Models added to LMArena's WebDev!

  • Gpt-5-codex
  • Qwen3-coder

Give them a try and vote here!

minor tide
#

Leaderboard Update lmarenalogo

<@&1372208635530448926> <@&1372208524230397962> Seedream-4-2k has landed on the leaderboards! On the Text-to-Image leaderboard at #1, Seedream-4-2k is now tied with Gemini-2.5-flash-image-preview (nano-banana)! On the Image Edit leaderboard Seedream-4-2k is now ranked at #2.

Let us know what you think in #leaderboards!

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> New Models added to LMArena!

  • gemini-2.5-flash-preview-09-2025
  • gemini-2.5-flash-lite-preview-09-2025
minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> - New model added to WebDev on LMArena!

  • claude-sonnet-4-5-20250929

Give it a try here!

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> - New models added to LMArena!

  • claude-sonnet-4-5
  • claude-sonnet-4-5-20250929-thinking-16k
minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> - New models added to LMArena!

  • deepseek-v3.2-exp
  • deepseek-v3.2-exp-thinking
minor tide
#

100,000 Community Members! A HUGE Thank You! lmarenalogo

We are deeply grateful that you all are interested in being a part of this community. From the LMArena Team - thank you all!

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> New model added to LMArena!

  • glm-4.6
minor tide
#

October AI Generation Contest - Abstract lmarenalogo

<@&1385704581677191278> We want to see what you all can make with LMArena! Could you be crowned our next <@&1378032433873555578> ? October's AI Gen Contest is now open.

sidebyside How does it work and what are the rules?

  • You must submit your entry by sharing a screenshot in #october-contest
  • On October 24th submissions will be closed and we'll share a way to vote.
  • Submissions must be done through Battle Mode
  • Your submission must include both the left and right response.
  • Your submission must be made after you've voted for which response you prefer, meaning the models should be revealed in your submission.
  • Example here.

leaderboard What could I win?

  • 1 month of Discord Nitro nitro
  • <@&1378032433873555578> role! You’ll hold onto this exclusive role which will be hoisted towards the top of the member’s list.

This month’s theme: Image - Abstract Art 🎨 blobpainter

Use wild shapes, vibrant colors, and chaotic lines to express feelings or ideas. Create a unique visual experience. This month’s contest will be for image creations only.

Video Gen Contest Winner

Congrats to @vernal lance for being our first Video Gen <@&1378032433873555578>

minor tide
#

Arena Champions lmarenalogo

@everyone We're thrilled to share with you all the Arena Champions Role! Our goal with this role, <@&1422628364782407830> , is to create a better community for those interested in in-depth AI discussions. This program aims to reward members who show genuine commitment to meaningful conversation by providing a private space where they can engage without interruptions.

Access to this space will be granted through an application process. Members must demonstrate both interest in AI and commitment to meaningful conversation. If you're looking for a dedicated space to have these discussions...

Apply Here

As a thank you to longtime community members, we've granted automatic access to those who've been part of this server since July 2025. Note that members with this role will need to Follow the Category to view these new channels. You can find this in the Channels & Roles tab at the top of the channel list. Select Browse Channels, then enable the Arena Champions Category. You'll then be able to see the list of channels!

minor tide
#

Reasoning Trace Now Live lmarenalogo

<@&1372208635530448926> Reasoning Trace is now available on Side by Side & Direct chat with reasoning models. Think of this as a way for reasoning models to show their work before they provide a response. Check it out in Side by Side & Direct now!

#

New Model Update lmarenalogo

<@&1372208635530448926> New model added to LMArena!

  • reve-v1
    Note this model is **image-edit only**, meaning it will only work if you upload an image for it to edit. The model will error out when using text-to-image.

New model update as well. This has replaced the 16k version.

  • claude-sonnet-4-5-20250929-thinking-32k
minor tide
#

Leaderboard Update lmarenalogo

<@&1372208635530448926> <@&1372208524230397962> Our Text Leaderboard has been updated!

Claude Sonnet 4.5 has made it onto the Text Leaderboard, impressively tied with Claude Opus 4.1 for the #1 slot. It’s also shining across many other categories, including: Hard Prompts, Coding, Creative Writing, Instruction Following, and others.

Share your thoughts with us in #leaderboards

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> New model added to LMArena!

  • ibm-granite-h-small (ibm)
minor tide
#

New Model Update lmarenalogo

<@&1398740297521037332> New model added to LMArena's Video Arena!

  • ray-3

Reminder on how to use Video Arena can be found here: #1397655624103493813

minor tide
#

We Want to Learn from YOU lmarenalogo

@everyone As we continue to build LMArena it’s important we remain focused on delivering the tools you all need to excel at being knowledge experts. To do this, we must better understand what is important to you all.

If you’re interested in sharing your expertise with the team please…

Fill Out This Survey

minor tide
#

New Model Update lmarenalogo

<@&1398740297521037332> <@&1372208635530448926> New models added to LMArena's Video Arena!

  • sora-2
  • sora-2-pro
    Note: these will only be available in text-to-video.

Reminder on how to use Video Arena can be found in #1397655624103493813

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> New models added to LMArena!

  • hunyuan-vision-1.5-thinking
  • ring-flash-2.0
  • ling-flash-2.0
minor tide
#

New Channel Alert lmarenalogo

Introducing #codename-discussion! We'll use this channel to have focused discussions related to models that are using codenames. Reminder that these models appear under codenames or aliases in Battle mode.

You may have to enable this channel manually in Channels & Roles -> Browse Channels

minor tide
#

Few Quick Reminders lmarenalogo

battle We are looking to understand what is important to you all better to make LMArena a great product. If you’re interested in sharing your expertise with the team please…

Fill Out This Survey

leaderboard Our Arena Champions Program aims to reward members who show genuine commitment to meaningful conversation by providing a private space where they can engage without interruptions. Access to this space will be granted through an application process. Members must demonstrate both interest in AI and commitment to meaningful conversation. If you're looking for a dedicated space to have these discussions...

Apply Here

minor tide
#

Video Leaderboard Update lmarenalogo

<@&1372208524230397962> <@&1372208635530448926> Sora 2 & Sora 2 Pro have now landed on the Text-to-Video Leaderboard! Sora 2 Pro is now tied at #1 alongside Veo 3 & Veo 3 Fast. Sora 2 has also shaken things up by landing at #3. If you haven’t already, be sure to check them out for yourself in our Video Arena! A reminder of how to use it can be found here: #1397655624103493813

Be sure to let us know what you think in #leaderboards !

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> New models added to LMArena!

  • qwen3-vl-8b-thinking
  • qwen3-vl-8b-instruct
minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> New model added to LMArena & WebDev!

New models added to Video Arena!

  • veo-3-1-fast
  • veo-3-1

🚨 New Model Update!

Claude Haiku 4.5 is in the Arena!
@AnthropicAI's latest small model is now available for Text and WebDev ⚡️

Come test it out and tell us what you think. Your votes drive the leaderboards!

minor tide
#

Leaderboard Update lmarenalogo

<@&1372208635530448926> <@&1372208524230397962> The Text Leaderboard has been updated! Claude-Haiku-4-5 has landed and is currently sitting in the #22 rank. Be sure to check out the Text Arena Leaderboard and let us know what you think in #leaderboards

minor tide
#

Text-to-Video & Image-to-Video Leaderboard Update lmarenalogo

<@&1372208635530448926> <@&1372208524230397962> There has been a big shift in our Text-to-Video Leaderboard & Image-to-Video Leaderboard as Veo-3.1 now ranks #1 in both!

Haven’t tried it out for yourself yet? Be sure to check out #1397655624103493813 for an explanation on how Video Arena works.

Let us know what you think in #leaderboards and be sure to share those Veo-3.1 generations in #ai-creations

https://x.com/arena/status/1980319296120320243

🚨🎬 Big news from Video Arena!

@GoogleDeepMind’s latest Veo 3.1 now ranks #1 in both Text-to-Video and Image-to-Video leaderboards. 🏆

This is a +30-point leap from Veo 3.0 → 3.1, making it the first model to break 1400 in Video Arena history!

Huge congrats to the

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> - New model added to LMArena!

🚨 New Model Update
MiniMax-M2 by @MiniMax_AI is expected to land next week but is already in the Arena for testing as MiniMax-M2-Preview!

Let’s see how it stacks up.

Early details suggest it’s an advanced agentic model with strong reasoning and long-context capabilities,

minor tide
#

Image-to-Video Leaderboard Update lmarenalogo

<@&1372208635530448926> <@&1372208524230397962> We’ve updated the Image-to-Video Leaderboard. Hailuo-2.3 is now on the leaderboard and is ranked #5 with Seedance-v1-pro & Kling-2.5-turbo-1080p.

Image-to-Video Leaderboard Here & share your thoughts in #leaderboards

minor tide
#

Image-to-Video Leaderboard Update & New Model Update lmarenalogo

<@&1372208635530448926> <@&1372208524230397962> New model added to LMArena's Video Arena! This is an image-to-video model only.

  • hailuo-2.3-fast

Also, the Text-to-Video Leaderboard has been updated! Hailuo-2.3 is now ranked #7. Check it out and let us know what you think in #leaderboards.

Text-to-Video Leaderboard HERE

minor tide
#

October Contest Update lmarena_logo

<@&1385704581677191278> Thank you to everyone that participated! Vote on which you like the best to crown our new <@&1378032433873555578>.

This contest theme is 🎨 blobpainter Abstract Art 🎨 blobpainter Use wild shapes, vibrant colors, and chaotic lines to express feelings or ideas. Create a unique visual experience.

Vote Here

minor tide
#

WebDev Leaderboard Update lmarenalogo

<@&1372208635530448926> <@&1372208524230397962> A new model has landed on the WebDev Leaderboard: MiniMax-M2 is now the #1 open model & #4 overall. The community has shown that it shines at performance coding, reasoning, and agentic-style tasks while remaining cost-effective and fast.

WebDev Leaderboard Here & share your thoughts in #leaderboards

minor tide
#

Arena Expert Tagging & Occupational Leaderboards lmarenalogo

@everyone We’re thrilled to introduce a new tagging system built on our evaluation framework that’ll identify the most expert-level prompts from the community. With this new system we’re introducing Expert Leaderboard. Arena Expert reveals the structure of prompts: their depth, reasoning, and specificity, which drives the clarity of evaluation.

Additionally, Occupational Leaderboards are now available that map prompts to real-world domains. By mapping all Arena prompts across 23 occupational fields, the system captures the full spectrum of real-world reasoning tasks. With this update you’ll see 8 of these leaderboards live including:

Read the full research analysis on our blog here & check out our open dataset of expert prompts with occupational tags for yourself here!

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> - New model added to LMArena!

🚨 New Open Source Model Update!

Touted for its reasoning and coding strengths, Kimi K2 Thinking by @Kimi_Moonshot is now live for both Text and WebDev in Battle, Side by Side and Direct. Bring your toughest prompts! 💪

The last time Kimi K2 was in the Arena with a new model,

minor tide
#

Image-Edit Leaderboard Updated lmarenalogo

<@&1372208635530448926> The Image Edit Leaderboard has been updated! Reve-edit-fast is now publicly released and ranked in the top 5.

Check out the Image Edit Leaderboard yourself!

minor tide
#

October's Contest Winner lmarenalogo

<@&1385704581677191278> Congrats to @dusty stump for being our October Abstract Art Contest Winner!! The newest member of our <@&1378032433873555578> ! Check out their generation here.

Stay tuned for future contest announcements.

minor tide
#

Text Leaderboard Update - Kimi K2 Thinking added lmarenalogo

<@&1372208635530448926> <@&1372208524230397962> The Text leaderboard has been updated and Kimi-k2-thinking is now the #2 ranked open source model & tied for #7 overall. We’ve been seeing this model excel at Math, Coding, and Creative Writing categories. On our Expert leaderboard, Kimi-k2-thinking has an impressive score of 1447 as well.

Check out the Text leaderboard yourself and let us know what you think in #leaderboards.

minor tide
#

User Login - User Email Now Available lmarenalogo

@everyone Driven by community feedback, User Login with email is now available! Save your chat history across multiple devices on both your mobile and desktop browsers.

minor tide
#

Code Arena Is Here lmarenalogo

@everyone Code Arena is now available on LMArena! The WebDev Arena has leveled up with a complete redesign shaped by community feedback and is now known as Code Arena. With Code Arena, models generate live, deployable web apps and sites that anyone can open, inspect, and judge directly, in real time.

Since Code Arena’s evaluation methods have been rebuilt, a fresh new leaderboard designed to reflect this new system has launched.

battle Try out Code Arena for yourself - HERE
leaderboard Check out the new Code Arena leaderboard - HERE
directchat Learn more in our Blog Post - HERE

https://youtu.be/iw8oHpttQOs

https://lmarena.ai/code

Introducing Code Arena, where AI coding meets the real world.

Traditional benchmarks measure correctness: whether code compiles or passes tests. Correctness matters, but it’s only part of what defines real development. Building software is iterative and creative: you plan, test, refine, and repeat. A credible evaluati...

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> - New model added to Text, Vision, and Code Arena on LMArena!

🚨 New Model Update!

@OpenAI has updated its GPT-5 series with GPT-5.1.
Available now in the Arena for Text, Vision and the new Code Arena!

Take it for a test drive with your toughest prompts. Let's see how it stacks up! 🥊

minor tide
#

Leaderboard Ranking Method Update lmarenalogo

<@&1372208524230397962> <@&1372208635530448926> Today we're announcing an important update to how model rankings are displayed on LMArena, one that makes them both more interpretable and more statistically accurate in how they reflect uncertainty. There will now be two new metrics displayed alongside each model’s score:

  • Raw Rank: the model’s position based purely on its Arena score.
    There are no ties here. Each model receives a unique rank based on its performance.
  • Rank Spread: an interval that shows the range of possible ranks a model could have, given the overlap in confidence intervals (CIs) across models.

To learn more about this update check out this blog post here and let us know what you think in #leaderboards.
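
As a rough illustration of the two metrics, the Python sketch below computes a raw rank and a rank spread from scores and confidence intervals. The rule used for the spread (count the models that are certainly above, and those that could be above, based on CI overlap) is an assumption and not necessarily LMArena's published method. The `Entry` type, the function names, and the model names and numbers are all made up for the example.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    name: str
    score: float
    ci_low: float    # lower bound of the score's confidence interval
    ci_high: float   # upper bound of the score's confidence interval


def raw_ranks(entries: list[Entry]) -> dict[str, int]:
    """Rank purely by score, highest first. Equal scores keep input order."""
    ordered = sorted(entries, key=lambda e: e.score, reverse=True)
    return {e.name: i + 1 for i, e in enumerate(ordered)}


def rank_spread(entries: list[Entry]) -> dict[str, tuple[int, int]]:
    """Assumed rule: (best, worst) rank implied by CI overlap."""
    spread = {}
    for e in entries:
        # Models whose whole interval sits above ours are certainly ahead.
        best = 1 + sum(1 for o in entries if o is not e and o.ci_low > e.ci_high)
        # Models whose interval reaches above our lower bound could be ahead.
        worst = 1 + sum(1 for o in entries if o is not e and o.ci_high > e.ci_low)
        spread[e.name] = (best, worst)
    return spread


# Made-up data: A and B overlap, C is clearly below both.
models = [
    Entry("model-a", 1500, 1490, 1510),
    Entry("model-b", 1495, 1485, 1505),
    Entry("model-c", 1400, 1390, 1410),
]
print(raw_ranks(models))    # {'model-a': 1, 'model-b': 2, 'model-c': 3}
print(rank_spread(models))  # {'model-a': (1, 2), 'model-b': (1, 2), 'model-c': (3, 3)}
```

In this toy data the raw ranks are unique, while the spread shows that model-a and model-b cannot be separated by their intervals and model-c is firmly third.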

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> - New model added to Text & Vision!

  • gpt-5.1-high

& new models added to Code Arena!

  • gpt-5.1-codex
  • gpt-5.1-codex-mini
minor tide
#

Text Leaderboard Update lmarenalogo

<@&1372208524230397962> <@&1372208635530448926> The Text Arena leaderboard has been updated! Grok-4.1-thinking is now in the #1 rank and followed by Grok-4.1 in the #2 rank. On the Expert leaderboard, Grok-4.1-thinking is also in the #1 rank excelling at Hard Prompts, Coding, Instruction Following, and Creative Writing.

Check out the Text Arena leaderboard and bookmark our Leaderboard Changelog for leaderboard updates.

minor tide
#

November AI Generation Contest - Code Arena lmarenalogo

<@&1385704581677191278> We want to see what you all can make with LMArena! To celebrate Code Arena's launch let's use this modality! Could you be crowned our next <@&1378032433873555578> ? November's AI Gen Contest is now open.

sidebyside How does it work and what are the rules?

  • You must submit your entry by sharing the preview link in #november-contest
  • On ~~November 28th~~ December 10 submissions will be closed
  • Example here

leaderboard What could I win?

  • 1 month of Discord Nitro nitro
  • <@&1378032433873555578> role!

Looking for more information on how to use Code Arena? Be sure to check out our walkthrough video here !

https://lmarena.ai/code

See top ranked models: https://lmarena.ai/leaderboard/webdev
Read about it: https://news.lmarena.ai/webdev-arena/

Learn how to build websites and applications with Code Arena, test different models head-to-head, and see how community votes shape the leaderboard. Try it out yourself - and vote for your favorite model.

minor tide
#

Video & Image Leaderboard Updates lmarenalogo

<@&1372208524230397962> <@&1372208635530448926> The Image-to-Video & the Text-to-Image Leaderboards have been updated as well. Wan2.5-i2v-preview & Wan2.5-t2i-preview have landed in the Top 5 on the Image-to-Video and Text-to-Image leaderboards.

Check out the Image-to-Video leaderboard & the Text-to-Image leaderboard! Let us know what you think in #leaderboards.

Stay up to date with our Leaderboard Changelog!

minor tide
#

Leaderboard Update & New Model Update lmarenalogo

<@&1372208524230397962> <@&1372208635530448926> Gemini-3-pro has landed on the Text, WebDev, and Vision leaderboards!

  • #1 in Text scoring 1501
  • #1 in Vision scoring 1328
  • #1 in WebDev scoring 1487

You can try out Gemini-3-pro for yourself as it’s now available on LMArena! Be sure to bookmark our Leaderboard Changelog for all leaderboard related updates.

minor tide
#

WebDev Leaderboard Update lmarenalogo

<@&1372208524230397962> <@&1372208635530448926> A new model provider has entered the WebDev Arena: Deep Cogito has released Cogito-v2.1, which is tied at #18 overall and is in the Top 10 for Open Source models!

See for yourself on the WebDev Leaderboard and share your thoughts in #leaderboards

minor tide
#

Text Leaderboard Update lmarenalogo

<@&1372208524230397962> <@&1372208635530448926> Today, some scores for GPT-5.1 are live for the Text Arena:

  • GPT-5.1-high ranks #4
  • GPT-5.1 ranks #12

Stay tuned as we collect more votes and see how the scores converge for GPT-5.1-high in the new WebDev leaderboard powered by Code Arena. We’ll also see how GPT-5.1-medium stacks up.

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> - Google DeepMind’s new image model just landed on LMArena.

  • gemini-3-pro-image-preview (nano-banana-pro)

https://x.com/arena/status/1991540746114199960

🚨🍌BREAKING: @GoogleDeepMind’s Gemini 3 Pro Image aka Nano Banana Pro is in the Arena!

Built on Gemini 3, which only two days ago landed as #1 across all major Arena leaderboards.

Put it head-to-head in Battle mode with the latest models and judge for yourself if it’s SOTA for

minor tide
#

Vision Leaderboard Update lmarenalogo

<@&1372208524230397962> <@&1372208635530448926> A new model provider has landed on the Vision leaderboard! Ernie-5.0-preview-1022 by Baidu debuts with a score of 1206!

Check out the Vision leaderboard and share some prompts you’ve used with Ernie-5.0-preview-1022 in #share-prompts

minor tide
#

WebDev Leaderboard Update lmarenalogo

<@&1372208524230397962> <@&1372208635530448926> The WebDev leaderboard has been updated.
GPT 5.1’s Code Arena evaluations have been added:

  • GPT-5.1-medium landed at #2 with a score of 1407
  • GPT-5.1 landed at #8 with a score of 1364
  • GPT-5.1-Codex landed at #9 with a score of 1336
  • GPT-5.1-Codex-Mini landed at #13 with a score of 1252

Check out the WebDev leaderboard and contribute with your votes on Code Arena. Always stay up to date with our Leaderboard Changelog.

minor tide
#

Image Leaderboard Update lmarenalogo

<@&1372208524230397962> <@&1372208635530448926> It’s been a major week for leaderboard movement, and now the Text-to-Image and Image Edit leaderboards have added Gemini-3-pro-image-preview.

  • Gemini-3-pro-image-preview ranks #1 on the Text-to-Image leaderboard (+84 pt over nano-banana)
  • Gemini-3-pro-image-preview ranks #1 on the Image Edit leaderboard (+41 pt over nano-banana)

Let us know what you think in #leaderboards.

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> - New model added to Text and Code Arena on LMArena!

After a HUGE week of Google, xAI and OpenAI releases, @Anthropic has now entered the Arena with Claude Opus 4.5!

Claude Opus 4.1 currently holds a strong #4 on the WebDev leaderboard (powered by Code Arena) and ranks #7 in the super competitive Text Arena.

How much stronger

minor tide
#

Image Edit Update lmarenalogo

<@&1372208635530448926> Driven by community feedback, we've updated how multi-turn for image edit works and added some new features we’re excited to share with you:

  • Multi-turn in image generation chat has been turned off.
  • You can edit images directly in chat rather than having to download them first by using the new Edit feature in image generation.
  • The new image upload limit is 10.

Big shoutout to everyone in the community who spoke up when we first launched multi-turn for image generation. Your feedback means a lot!

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> - New model added to Text-to-Image and Image Edit on LMArena!

🖼️ Frontier model drops aren’t slowing down… @bfl_ml’s FLUX.2 just entered the Image Arena!

FLUX.2 Pro and FLUX.2 Flex are available for both Text-to-Image and Image Edit. Stay close as the leaderboard shakes up to see where FLUX.2 lands. Hit it with your strongest prompts and

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> - New models added to Search Arena!

  • gemini-3-pro-grounding
  • gpt-5.1-search
minor tide
#

Leaderboard Update lmarenalogo

<@&1372208524230397962> <@&1372208635530448926> - Claude-opus-4-5-20251101 & Claude-opus-4-5-20251101-thinking-32k have been added to the leaderboards!

WebDev leaderboard (powered by Code Arena)

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> - New models added to Text Arena!

🚨New Models in the Arena!

🐳DeepSeek V3.2: a new family of reasoning-first, agent-oriented models from @deepseek_ai is now live in the Arena.

Standard, Thinking, and Speciale are all in the Text Arena, waiting for your toughest prompts!

Get your votes in: we’ll see how they

minor tide
#

Text Arena Leaderboard Update lmarenalogo

<@&1372208524230397962> <@&1372208635530448926> - A new open source model has landed on the leaderboard.
Mistral-Large-3 lands at #6 among open models and #28 overall on the Text leaderboard. Mistral-Large-3 was tested under the codename “Jaguar” and performs strongly in:

  • Coding
  • Hard Prompts
  • Multi-Turn
  • Instruction Following
  • Longer Query

Check out the Text leaderboard for yourself, and let us know what you think in #leaderboards

minor tide
#

Early Access Feedback Program lmarenalogo

<@&1372208635530448926> Resharing an Early Access Program that we haven't mentioned in a bit!

The LMArena Test Garden is our private feedback program that invites selected members to get sneak peeks at features, design mocks, and ideas the team is considering implementing, so we can gather early feedback. For those who are exceptional at providing feedback and would like to see what we're cooking up, this program is for you!

Note that if you're selected, we'll follow up privately with next steps. Being a part of this program does require signing an NDA.

Apply Here

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> - New model added to Text Arena!

🚨New Model Update

@Amazon Nova 2 Lite is now available in the Text Arena!

Designed for medium-thinking reasoning tasks, Nova 2 Lite is built for everyday tasks like helping customer support chats, sorting documents, and handling basic business workflows.

minor tide
#

New Model Update & Leaderboard Update lmarenalogo

<@&1372208524230397962> <@&1372208635530448926> - New model added to Text-to-Image Arena & Image Edit Arena!

  • Seedream-4.5

The leaderboards for Text-to-Image & Image Edit have been updated to include Seedream-4.5 as well! This model has landed at #3 on the Image Edit leaderboard and ranks #7 on the Text-to-Image leaderboard!

Stay up to date with our Leaderboard Changelog.

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> - New models added to Code Arena & Video Arena!

Code Arena

https://x.com/arena/status/1996692943030354085?s=20

🚨 New Model in the Code Arena!

GPT-5.1-Codex Max by @OpenAI is ready for you in the Code Arena.

Bring your toughest, most creative prompts and we'll see how it stacks up against current leaders: Claude Opus 4.5 Thinking by @anthropicAI and Gemini 3 Pro by @GoogleDeepMind!

minor tide
#

Contest Reminder lmarenalogo

<@&1385704581677191278> - Reminder, our current Code Arena contest is going to wrap up on December 10th! Be sure to add those submissions to #november-contest before time runs out.

More details here.

minor tide
#

Leaderboard Update lmarenalogo

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard has been updated and ERNIE-5.0-Preview-1103 landed with a score of 1431 putting it in the top 20.

Check out the Text Arena leaderboard and stay up to date with the Leaderboard Changelog.

minor tide
#

New Model Update & WebDev Leaderboard Update lmarenalogo

<@&1372208524230397962> <@&1372208635530448926> - New models added to Code Arena & Text Arena!

  • GPT-5.2-high
  • GPT-5.2

Tested internally under the codenames “robin” and “robin-high”, GPT-5.2-high now ranks #2 and GPT-5.2 now ranks #6 on our WebDev leaderboard! These scores are preliminary, so stay tuned as they stabilize. Always stay up to date with our Leaderboard Changelog.

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> - New models added to Text and Vision Arena!

  • glm-4.6v
  • glm-4.6v-flash
minor tide
#

YouTube Channel Launch lmarenalogo

@everyone - We recently launched our YouTube channel! If you’ve been enjoying LMArena, you’ll want to subscribe: we’re posting fast, practical breakdowns to help you understand the AI frontier and choose the best models for your work.

Recent videos include:
Beginner’s guide to free + open models
GPT-5.2 enters the Arena
Why small open models are disappearing (7B → 32B shift)
Generating SVGs to measure coding capabilities
How to choose the best AI model for coding

Subscribing helps us grow it, and ensures you see new releases the moment they drop. Let us know if there are topics you want us to cover!

Subscribe here → https://www.youtube.com/@ArenaAIOfficial

minor tide
#

December AI Generation Contest lmarenalogo

<@&1385704581677191278> We want to see what you all can make with LMArena! Could you be crowned our next <@&1378032433873555578> ? December's AI Gen Contest is now open.

sidebyside How does it work and what are the rules?

  • You must submit your entry by sharing a screenshot in #december-contest
  • On December 30th submissions will be closed and we'll share a way to vote
  • Submissions must be done through Battle Mode
  • Your submission must include both the left and right response
  • Your screenshot must be taken after you've voted for the response you prefer, meaning the models should be revealed in your submission
  • Example here

leaderboard What could I win?

  • 1 month of Discord Nitro nitro
  • <@&1378032433873555578> role! You’ll hold onto this exclusive role, which will be hoisted towards the top of the member list

Share With Us

We'd love to see what you created on LMArena! Everyone is encouraged to post your contest submission on your own X account and tag @arena. We may repost what you've shared!

This month’s theme: Image - Holiday Celebration

Let's get festive! We want to see how you celebrate the holidays with diverse celebrations like Christmas, Hanukkah, Kwanzaa, New Year's and more. This month’s contest will be for image creations only.

November's Code Arena Contest Winner

Help me give a big congrats to @quiet rivet for being our first Code Arena <@&1378032433873555578> !! Check out the winning submission here.

minor tide
#

Image Leaderboard Update & New Model Update lmarenalogo

<@&1372208524230397962> <@&1372208635530448926> - The Text to Image leaderboard and Image Edit leaderboard have new models shaking up the ranks!

  • gpt-image-1.5 is #1 in Text-to-Image (1264)
  • chatgpt-image-latest is #1 on Image Edit (1409)
  • gpt-image-1.5 is #4 in Image Edit (1395)

New models added to Image Arena!

  • gpt-image-1.5
  • chatgpt-image-latest
minor tide
#

Text Leaderboard Update lmarenalogo

<@&1372208524230397962> <@&1372208635530448926> - GPT-5.2-high has landed on the Text Arena leaderboard at #13!

With a score of 1441, the model performs strongest in:

  • #1 Math category
  • #2 in Mathematical occupational field
  • #5 on Arena Expert

Stay up to date with all leaderboard changes with our Leaderboard Changelog!

minor tide
#

Text, Vision, and WebDev Leaderboard Update & New Model Update lmarenalogo

<@&1372208524230397962> <@&1372208635530448926> - Gemini-3-flash has landed on our Leaderboards! The Text Arena leaderboard, Vision Arena leaderboard, and WebDev Arena leaderboard have all been updated.

Gemini-3-Flash highlights:

  • Top 5 across Text, Vision, WebDev
  • #2 in Math and Creative Writing categories

Where Gemini-3-Flash (thinking-minimal) performs strongest:

  • Top 10 across Text and Vision
  • #2 in the Multi-Turn category

These models are now available on Text and WebDev Arena.

  • gemini-3-flash
  • gemini-3-flash (thinking-minimal)

Let us know what you think in #leaderboards and stay up to date with our Leaderboard Changelog.

sly rock
#

Open Sourcing The Leaderboards

@everyone - Today we’re releasing Arena-Rank, an open-source Python package for paired-comparison ranking—the same code that powers the LMArena leaderboards.

Why we’re doing this:

  • Transparency & reproducibility: Anyone can now audit our leaderboard methodology, including ratings and confidence intervals.
  • Research-grade tooling: Arena-Rank implements Bradley–Terry and contextual Bradley–Terry models, with utilities designed for real datasets and real experimentation.
  • Community & extensibility: The package is intentionally decoupled from our internal pipelines, making it easier to test new ideas, compare methods, and apply the same techniques beyond LLM evaluation (e.g., alignment datasets, sports, esports).

Under the hood, Arena-Rank reflects several methodological and engineering upgrades we’ve made over the past months, including faster JAX-based optimization and cleaner separation between data preprocessing and modeling.

We’re looking forward to feedback, experiments, and contributions from the community. We can't wait to see what kind of leaderboards you make.

Grab it at GitHub: https://github.com/lmarena/arena-ai

💬 To install → pip install arena-rank

This is part of our broader commitment to open science and transparent AI evaluation—and it’s just a starting point. Read more in our blog here.
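
For anyone curious what a Bradley–Terry fit looks like in practice, here is a minimal plain-Python sketch. It does not use Arena-Rank's API: the function names `fit_bradley_terry`, `win_probability`, and `to_score`, the toy battles, and the 1000/400 display scale are all hypothetical and chosen only for illustration. It also swaps in the classic minorization-maximization (MM) update rather than the JAX-based optimization mentioned above, and it skips ties, confidence intervals, and the contextual variant.

```python
import math
from collections import defaultdict


def fit_bradley_terry(battles, iters=200):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Illustrative only: uses the MM update, not Arena-Rank's optimizer.
    """
    models = sorted({m for pair in battles for m in pair})
    wins = defaultdict(int)   # total wins per model
    games = defaultdict(int)  # number of battles per unordered pair
    for winner, loser in battles:
        wins[winner] += 1
        games[frozenset((winner, loser))] += 1

    strength = {m: 1.0 for m in models}
    for _ in range(iters):
        updated = {}
        for i in models:
            denom = 0.0
            for j in models:
                if i == j:
                    continue
                n_ij = games.get(frozenset((i, j)), 0)
                if n_ij:
                    denom += n_ij / (strength[i] + strength[j])
            # Floor at a tiny value so a model with zero wins does not break log().
            updated[i] = max(wins[i] / denom, 1e-12) if denom else strength[i]
        # Strengths are only defined up to a constant factor, so fix the
        # geometric mean at 1 to keep the iteration numerically stable.
        log_mean = sum(math.log(s) for s in updated.values()) / len(updated)
        strength = {m: s / math.exp(log_mean) for m, s in updated.items()}
    return strength


def win_probability(strength, a, b):
    """P(a beats b) under the Bradley-Terry model."""
    return strength[a] / (strength[a] + strength[b])


def to_score(s, base=1000.0, scale=400.0):
    """Map a strength to an Elo-like number (base and scale are assumptions)."""
    return base + scale * math.log10(s)


if __name__ == "__main__":
    # Hypothetical votes: each tuple is (winner, loser).
    toy_battles = (
        [("model-a", "model-b")] * 3 + [("model-b", "model-a")] * 1
        + [("model-b", "model-c")] * 2 + [("model-c", "model-b")] * 2
        + [("model-a", "model-c")] * 3 + [("model-c", "model-a")] * 1
    )
    fitted = fit_bradley_terry(toy_battles)
    for name, s in sorted(fitted.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {to_score(s):.0f}")
```

On this kind of scale, only differences between scores matter: the gap between two models determines the modelled win probability, which is why leaderboard posts report point gains rather than absolute values.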

sly rock
#

Image Edit Leaderboard Update lmarenalogo

<@&1372208524230397962> <@&1372208635530448926> - New models reve-v1.1 and reve-v1.1-fast have landed on the leaderboard!

Image Edit leaderboard

  • reve-v1.1 ranks #8
  • reve-v1.1-fast ranks #15

This represents a +6-point gain over Reve V1.

Always stay up to date with our Leaderboard Changelog.

#

Search Leaderboard Update lmarenalogo

<@&1372208524230397962> <@&1372208635530448926> - The [Search Arena leaderboard](<https://lmarena.ai/leaderboard/search>) has been updated. GPT-5.2-Search ranks #2 while Grok-4.1-Fast-Search ranks #4.

Both models debuted ahead of their predecessors, posting gains of +10 points for GPT-5.2-Search and +17 points for Grok-4.1-Fast-Search.

Stay up to date with our Leaderboard Changelog and let us know what you think about how GPT-5.2 ranks in #leaderboards.

sly rock
#

Text Leaderboard Update! lmarenalogo

<@&1372208524230397962> <@&1372208635530448926> - The [Text Arena leaderboard](<https://lmarena.ai/leaderboard/text>) has been updated. GPT-5.2 makes its debut and ranks #17.

Compared to GPT-5.1, the model has improved by +2 points. It trails just one point behind GPT-5.2-high, which is optimized for expert-level reasoning and critical tasks.

Stay up to date with our Leaderboard Changelog!

minor tide
#

Text Leaderboard Update - ERNIE-5.0-Preview lmarenalogo

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard has been updated & ERNIE-5.0-Preview-1203 by Baidu has landed with a score of 1451. Here are some highlights:

  • Top Text model from Chinese labs
  • This is a 23 pt increase since ERNIE-5.0-Preview-1103

Bookmark our Leaderboard Changelog to stay up to date with the latest changes to our leaderboards!

minor tide
#

WebDev Leaderboard Update - GLM-4.7 lmarenalogo

<@&1372208524230397962> <@&1372208635530448926> - Our WebDev leaderboard has been updated and GLM-4.7 by Z.ai ranks #6. This makes it the new #1 open model for WebDev. GLM-4.7 has a score of 1449, which is a +83 pt increase over GLM-4.6.

Try out GLM-4.7 in Code Arena and share some of your generations with the community in #share-prompts!

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> - New model added to Video Arena!

  • seedance-v1.5-pro
minor tide
#

Text Arena Leaderboard Update lmarenalogo

<@&1372208524230397962> <@&1372208635530448926> - Our Text Arena leaderboard has been updated with GLM-4.7 & Minimax-m2.1-preview included.

Let us know what you think in #leaderboards, and stay up to date with all leaderboard changes with our Leaderboard Changelog.

minor tide
#

December Contest Closed - Vote Here lmarenalogo

<@&1385704581677191278> - Our December Contest is now closed!

Vote Here to crown our next <@&1378032433873555578>

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> - New models added to Image Arena & Image-Edit Arena!

🚨 Qwen-Image-2512 and Qwen-Image-Edit-2511 by @Alibaba_Qwen are now live in the Arena.

The latest release delivers reduced finer natural textures, and stronger text rendering with improved layout accuracy.

Bring your toughest, most creative prompts and see how it performs with

minor tide
#

User Login Issues lmarenalogo

<@&1372208635530448926> - Over the break, we identified some issues with the user login and registration flow. These have now been fixed, so if you were experiencing problems, please try logging in or registering again. If you continue to have any issues, don’t hesitate to let us know in #1451836386293448725

Many thanks to everyone who reported these problems initially. We really appreciate your help!

minor tide
#

Image Edit & Text-to-Image Leaderboard Update - Qwen-Image-Edit-2511 & Qwen-Image-2512 lmarenalogo

<@&1372208524230397962> <@&1372208635530448926> - The Image Edit leaderboard has been updated and qwen-image-edit-2511 is now the #1 open model, and #9 overall!

On the Text-to-Image leaderboard qwen-image-2512 is the #2 open model, and ranks #13 overall.

Find our Leaderboard Changelog here and share what you think about these updates in #leaderboards.

minor tide
#

First January AI Generation Contest lmarenalogo

<@&1385704581677191278> We want to see what you all can make with LMArena! Could you be crowned our next <@&1378032433873555578> ? For January we're going to be running a contest each week!

sidebyside How does it work and what are the rules?

  • You must submit your entry by sharing a screenshot in #january-1st-contest
  • On January 9th submissions will be closed and we'll share a way to vote
  • Submissions must be done through Battle Mode
  • Your submission must include both the left and right response
  • Your screenshot must be taken after you've voted for the response you prefer, meaning the models should be revealed in your submission
  • Example here

leaderboard What could I win?

  • 1 month of Discord Nitro nitro
  • <@&1378032433873555578> role! You’ll hold onto this exclusive role, which will be hoisted towards the top of the member list

This month’s theme: Window to the Future 🪟

Create an image of something that represents you looking out the window and looking towards your wildest, brightest future. Make it aesthetic, surreal, or sci-fi! This month’s contest will be for image creations only.

December's Contest Winner

Shoutout to @meager wren for being our new <@&1378032433873555578> !! Check out the winning submission here.

sly rock
#

Company News lmarenalogo

@everyone

Today, we're excited to announce our $150M funding round at a post-money valuation of more than $1.7B, nearly triple our valuation just seven months after our seed raise in May. The round was led by Felicis and UC Investments, with participation from a16z, The House Fund, LVDP, Kleiner Perkins, Lightspeed and Laude Ventures. This milestone reflects a growing industry consensus: AI cannot scale responsibly without independent, transparent, and continuous evaluation.

LMArena started as a research experiment. It’s now becoming a foundational pillar for the AI ecosystem. To this community who has tested, voted, reported bugs, submitted suggestions, and shared your perspective: Thank you. You are shaping the future of AI. ♥️

Let’s measure and advance what the world needs next. We will move even faster to build new features and improve our product experience for this community to evaluate the frontier of AI.

Read more about the announcement on our blog: https://news.lmarena.ai/series-a/

▶️ A message just for our wonderful community:

minor tide
#

Vision Arena Leaderboard Update - ERNIE-5.0-Preview-1220 lmarenalogo

<@&1372208524230397962> <@&1372208635530448926> - The Vision Arena leaderboard has been updated. ERNIE-5.0-Preview-1220 is now ranked #8 with a score of 1226. Baidu currently stands as the only Chinese lab in the Top 10 on the Vision leaderboard.

Check out our Leaderboard Changelog for all leaderboard updates.

minor tide
#

Text Arena Leaderboard Update lmarenalogo

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard just got an update. Let us know what you think in the #leaderboards channel.

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> - A new model has been added to Video Arena.

minor tide
#

January AI Generation 2nd Contest lmarenalogo

<@&1385704581677191278> We want to see what you all can make with LMArena! Could you be crowned our next <@&1378032433873555578> ? For January we're going to be running a contest each week!

lmarena How does it work and what are the rules?

  • You must submit your entry by sharing a screenshot in #jan
  • On January 16th submissions will be closed and we'll share a way to vote
  • Submissions must be done through Battle Mode
  • Your submission must include both the left and right response
  • Your screenshot must be taken after you've voted for the response you prefer, meaning the models should be revealed in your submission
  • Example here

lmarena What could I win?

  • 1 month of Discord Nitro nitro
  • <@&1378032433873555578> role! You’ll hold onto this exclusive role, which will be hoisted towards the top of the member list

This month’s theme :
🍃 Nature Reclaims… depictions of a world where nature has begun to reclaim what humanity once built or occupied. Show us human-made environments overtaken, transformed, or reinterpreted by the natural world. 🍃

January 1st Contest - Vote Here

Vote for January's first contest winner! Vote Here.

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> - New models have been added to Video Arena.

  • veo-3.1-audio-4k
  • veo-3.1-audio-1080p
  • veo-3.1-fast-audio-4k
  • veo-3.1-fast-audio-1080p

Check out #1397655624103493813 to try them out yourself!

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> - A new model has been added to Code Arena!

  • gpt-5.2-codex

A new model has also been added to Image Arena!

  • glm-image

minor tide
#

Text Arena Leaderboard Update - ERNIE-5.0-0110 lmarenalogo

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard has been updated! ERNIE-5.0-0110 now ranks #8 with a score of 1460 and is #12 in Arena Expert. This is currently the only model from a Chinese lab in the Top 10. It performs strongest in the Math category and quite a few occupational categories.

Try out Text Arena and stay up to date with changes in our leaderboards with the Leaderboard Changelog.

minor tide
#

Text-to-Image & Image-Edit Leaderboard Update lmarenalogo

<@&1372208524230397962> <@&1372208635530448926> - The Text-to-Image Arena leaderboard has been updated: z-image-turbo now ranks #22, flux.2-klein-9B ranks #24, and flux.2-klein-4B ranks #31 overall. Additionally, the Image Edit Arena leaderboard has been updated: flux.2-klein-9B ranks #15 and flux.2-klein-4B ranks #21.

Stay up to date with our Leaderboard Changelog!

minor tide
#

Image-Edit Leaderboard Update lmarenalogo

<@&1372208524230397962> <@&1372208635530448926> - The Image Edit leaderboard has been updated. wan2.5-i2i-preview has been added and now ranks #21 with a score of 1213.

As always, stay up to date with our Leaderboard Changelog.

minor tide
#

January AI Generation 3rd Contest lmarenalogo

<@&1385704581677191278> We want to see what you all can make with LMArena! Could you be crowned our next <@&1378032433873555578> ?

lmarena How does it work and what are the rules?

  • You must submit your entry by sharing the Code Arena preview link in #january-3rd-contest
  • On January 26th submissions will be closed and we'll share a way to vote
  • Submissions must be done with Code Arena. Reminder on how to use Code Arena can be found here.
  • Example here.

lmarena What could I win?

  • 1 month of Discord Nitro nitro
  • <@&1378032433873555578> role! You’ll hold onto this exclusive role, which will be hoisted towards the top of the member list

This month’s theme :
⌨️ Code Arena - Let's use Code Arena for this contest! No specific theme! Build whatever you think would be most appealing!

January 1st Contest Winner

Big congrats to @dark glacier for being our first January contest winner! Check out their submission here.

January 2nd Contest - Vote Here

Vote for January's second contest winner! Vote Here. Reminder of the theme - Nature Reclaims… 🍃

minor tide
#

5 Million Votes lmarenalogo

Text Arena has officially passed 5 million community votes. That’s millions of real-world comparisons shaping how frontier AI models are evaluated. You didn’t just prompt. You tested. You voted. You moved the leaderboard.

This milestone belongs to you. 💙

minor tide
#

Text-to-Image Leaderboard Update - GLM-Image lmarenalogo

<@&1372208524230397962> <@&1372208635530448926> - The Text-to-Image Arena leaderboard has been updated and GLM-Image now ranks #8 among open models and #35 overall with a score of 1018.

minor tide
#

Video Arena is Now Live on LMArena lmarenalogo

@everyone What started last summer as a small Discord bot experiment has grown into a rigorous way to measure and understand how frontier video models perform with real-world use. Thank you to our wonderful community for all the feedback! Today, Video Arena is now available to all on LMArena.

  • Video Arena on Discord with the bot will remain in place and operate the same.
  • Video Arena on the web, similar to how it works on Discord, will be Battle mode only.
  • Login is required in order to use Video Arena on web.
  • The rate limit for Video Arena on web is 3 generation requests per 24 hours.

Learn more about Video Arena on our blog here.

Try out Video Arena on web now lmarenalogo

minor tide
#

Video Arena Walkthrough lmarenalogo

<@&1398740297521037332> <@&1372208635530448926> - Have you tried Video Arena on web yet? Generate videos with 15 different frontier AI models and compare them head-to-head. Vote for the best output to power the leaderboards.

Get the full walkthrough and a few pro-tips from one of our lead engineers on our YouTube channel: https://www.youtube.com/watch?v=jaIU6eKVK1M

LMArena | Benchmark & Compare the Best AI Models

Chat with multiple AI models side-by-side. Compare ChatGPT, Claude, Gemini, and other top LLMs. Crowdsourced benchmarks and leaderboards.

Try it free at https://lmarena.ai/video

Learn how to create AI videos using LMArena's Video Arena. In this walkthrough, lead engineer Anh Mai demonstrates how to use the Video Arena to generate both text-to-video and image-to-video content.

#AIVideo #TextToVideo #ImageToVideo #GenerativeAI #AIEngineering #LMArena

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> - A new model has been added to Text Arena!

  • glm-4.7-flash
minor tide
#

Single-Image Edit & Multi-Image Edit Leaderboard lmarenalogo

<@&1372208524230397962> <@&1372208635530448926> - The Image Edit Arena leaderboard now has more data around real-world use. It now has two distinct leaderboards:

  • Single-Image Edit: ranks models on single-image tasks
  • Multi-Image Edit: ranks models on multi-image tasks

This gives us a more accurate view of model performance across distinct image editing use cases, from simple edits to multi-image reasoning. Some changes we see when looking at them:

  • Leader change: ChatGPT Image (Latest) goes #1 -> #3, while Gemini 3 Pro Image 2K (Nano‑Banana Pro) goes #2 -> #1.
  • Biggest rise: FLUX-2-Flex jumps #19 -> #12 (up 7 places).
  • Small‑model mover: FLUX-2-Klein 4B climbs #22 -> #17 (up 5 places).
  • Biggest drops: Seedream-4 2K slides #7 -> #14 (down 7 places) and Qwen Image Edit (2511) slips #11 -> #16 (down 5 places).

Check it out yourself to see the differences on our Image Edit leaderboard.

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> - A new model has been added to Text Arena.

  • qwen3-max-thinking
minor tide
#

Image Edit Leaderboard Update - Hunyuan-Image-3.0-Instruct lmarenalogo

<@&1372208524230397962> <@&1372208635530448926> - The Image Edit leaderboard has been updated. Hunyuan-Image-3.0-Instruct now ranks #7 for Image Edit.

Try out Hunyuan-Image-3.0-Instruct vs. all the best frontier models in Image Arena. Stay up to date with our Leaderboard Changelog.

minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> - A new model has been added to Text Arena.

  • molmo-2-8b
minor tide
#

New Model Update lmarenalogo

<@&1372208635530448926> - A new model has been added to Text Arena.

  • kimi-k2.5
minor tide
#

Community Reminders lmarenalogo

<@&1372208635530448926> - A few reminders for everyone:

  • Please log in to save your chat history. If you have not yet created an account, please do so now so your chat history is not lost.
  • We’ve expanded our Help Center with a new Experiments category, featuring a growing collection of articles on ongoing experiments.
  • The <@&1349916362595635286> ping should be used to report users who are violating our Discord server #rules. The best place to report bugs is #1343291835845578853. For sharing feedback, please use #1372230675914031105. If a thread already exists for your issue or feedback, please contribute to it instead of creating a new one.

minor tide
#

Introducing Auto-Modality & Model Selector lmarenalogo

<@&1372208635530448926> - Auto-Modality & Model selector are now live!

battle Auto-Modality: Whether your prompt is a coding question, a math proof, or an image generation request, auto-modality routes it to the right modality automatically. This makes every evaluation smoother and more intuitive. For more information on how auto-modality works check out this Help Center article.

sidebyside Model selector: A new design for our model selection window is now live in Direct and Side by Side. Models are now ordered by rank and can be filtered by modality. Find more information about this in the Model Selection Menu Help Center article.

minor tide
#

Better AI videos in under 90 seconds lmarenalogo

<@&1398740297521037332> - Check out our "Better AI videos in under 90 seconds" video, now on our YouTube channel: https://www.youtube.com/watch?v=0hCI2XEh0x0

Try it free at https://lmarena.ai/video

Learn how to create AI videos using LMArena's Video Arena. LMArena's lead engineer Anh Mai recommends 6 prompting tips to create better AI videos. Learn how to use the Video Arena to generate both text-to-video and image-to-video content.

#AIVideo #TextToVideo #ImageToVideo #GenerativeAI #AIEngineering #...

minor tide
#

Text Arena Leaderboard Update lmarenalogo

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard has been updated. Kimi K2.5 Thinking is now the #1 open model and ranks #15 overall.

Some highlights:

  • #1 Open model (+5pts vs GLM-4.7)
  • #7 Coding
  • #7 Instruction Following
  • #14 Hard Prompts

Kimi K2.5 Thinking has also been added to Code Arena so go check it out.

minor tide
#

LMArena is now Arena arena_round

@everyone - We’re excited to share today our new look and feel to match our scientific mission: to measure and advance the frontier of AI for real-world use. We are now just: Arena. Now available at: arena.ai.

From a small PhD research project to a platform powered by a global community of millions. This rebrand has been shaped by this community, the people who use it.

Read more about the rebrand process on our blog here.

minor tide
#

Community Reminders arena_round

@everyone - As our community continues to grow, it’s important to keep conversations organized and easy to follow.
dot1 Introducing the #ask-here channel. This channel is the **home for one-off questions**. Going forward, questions in #general will be discouraged so discussions there can stay focused on AI-related topics.
dot1 For reporting issues please use #1343291835845578853. Before posting, check whether a thread already exists and add your report there.
dot1 For sharing feedback please use #1372230675914031105. As with bugs, add to an existing thread if one is already active.

minor tide
#

Vision Leaderboard Update - Kimi K2.5 arena_round

<@&1372208524230397962> <@&1372208635530448926> - The Vision Arena leaderboard has been updated. Kimi-k2.5-thinking is now the #1 open model and ranks #6 overall in Vision Arena making it the only open model in the Top 15.

minor tide
#

Text, Search, Code, Video, and Image Leaderboards Updated arena_round

<@&1372208524230397962> <@&1372208635530448926> - The leaderboards have been updated with new models being added! Check them out:
dot1 Text-to-Image leaderboard

Stay up to date with our Leaderboard Changelog.

minor tide
#

Search Bar & Archive Chat Now Available arena_round

<@&1372208635530448926> - Two new features have rolled out to everyone.

dot1 Search Bar - Your chats are now searchable, with the option to filter by modality.
dot3 Archive Chat - Archive chat sessions to keep them for later without cluttering your chat history.

With these features now live, the process for deleting a chat session has changed. Follow the steps in this article to learn how to delete chats going forward.

minor tide
#

Video Arena Discord Bot Rate Limit Change

<@&1398740297521037332> - Video Arena on Discord had its rate limit updated to 1 generation request per 24-hour period. Using Video Arena on web still has the same rate limit of 3 generations per 24-hour period.

Arena | Benchmark & Compare the Best AI Models

Chat with multiple AI models side-by-side. Compare ChatGPT, Claude, Gemini, and other top LLMs. Crowdsourced benchmarks and leaderboards.

minor tide
#

New Model Update arena_round

<@&1372208635530448926> - New models have been added to Arena!
Text Arena

minor tide
#

Code Arena Leaderboard Update - Kimi K2.5 arena

<@&1372208524230397962> <@&1372208635530448926> - The Code Arena leaderboard has received an update. Kimi-K2.5-thinking now ranks #1 open and #5 overall on Code Arena.
The community has also ranked Kimi-K2.5-thinking as the #1 open model for Vision and Text, including the Coding category.

Let us know what you think in #leaderboards and share the previews of what you’ve built with Kimi.ai in #ai-creations.

Try out the best frontier models on agentic coding tasks at Code Arena.

minor tide
#

Say hello to Max max

@everyone - Max is Arena’s intelligent router, powered by 5+ million real-world community votes.

Max routes each prompt to the most capable model with latency in mind. AI models excel at different things (code, math, speed, reasoning). Max orchestrates across model strengths to deliver reliable performance across real-world use cases.

To learn more about Max, check out this blog post and our YouTube video.

Available today here.

Say hello to Max! https://arena.ai/max

▶ Play video
minor tide
#

New Model Update arena

<@&1372208635530448926> - New model added to Text, Vision, & Code Arena

  • seed-1.8
minor tide
#

Have you met Max? max

<@&1372208635530448926> - Max intelligently routes each prompt to the most capable model currently live on Arena.

Catch the full video on our YouTube with Arena researcher Derry.

Try it free at https://arena.ai/max

Learn how Arena's Max intelligently routes your prompts to the best AI model for each task. In this walkthrough, researcher Derry Xu demonstrates how Max balances capability, speed, and task type.

Whether you need fast responses, complex reasoning, or specialized skills like coding and math, Max orchestrat...

▶ Play video
minor tide
#

Video Arena Leaderboard Update - Vidu Q3 Pro arena

<@&1372208524230397962> <@&1372208635530448926> - The Image-to-Video leaderboard has been updated. Vidu-Q3-pro by Vidu AI is now in the Top 5 with a score of 1362.

minor tide
#

New Model Update - Opus 4.6 arena

<@&1372208635530448926> - New models added to Text Arena and Code Arena.

  • claude-opus-4-6
  • claude-opus-4-6-thinking
minor tide
#

Claude Opus 4.6 First Impressions arena

<@&1372208635530448926> - Our AI Capability Lead, Peter, breaks down the latest performance of Opus 4.6. Check it out here.

Try Claude Opus 4.6 yourself: https://arena.ai

Arena's AI Capability Lead Peter Gostev shares his first impressions of Claude Opus 4.6, Anthropic's latest flagship coding model. In this deep dive, Peter tests the model's coding and reasoning capabilities to see how it stacks up against other frontier models on Arena's leaderboards.

From SVG ge...

▶ Play video
minor tide
#

Code, Text, and Expert Leaderboard Updates - Opus 4.6 arena

<@&1372208524230397962> <@&1372208635530448926> - Claude Opus 4.6 has landed on our leaderboards and is now #1 across Code, Text and Expert!
dot1 #1 in Code Arena: +106 score over Opus 4.5
dot2 #1 in Text Arena: scoring 1496, 10 pts over Gemini 3 Pro, and also ranking #1 in key Text Arena categories:

  • Hard Prompts
  • Instruction Following
  • Longer Query

dot3 #1 for Expert Arena: ~50 pt lead

Tell us what you think about these changes in #leaderboards and stay up to date with our Leaderboard Changelog.

minor tide
#

January AI Generation Contest arena

<@&1385704581677191278> - Thank you all for voting for our 2nd January AI Generation Contest 🍃 Nature Reclaims! The votes have been tallied and the newest member of our <@&1378032433873555578> is @raw thunder! Check out the winning submission here.

3rd January Contest - Vote Here

Help crown our next <@&1378032433873555578> by voting. Reminder that this theme is: ⌨️ Code Arena

minor tide
#

Vision, Text, and Code Leaderboard Update - Kimi K2.5 arena

<@&1372208524230397962> <@&1372208635530448926> - Kimi K2.5 is now on our leaderboards and is in the top 5 open models for Vision, Text, and Code!
dot1 #2 open model in Vision, #10 overall on par with gpt-5.1
dot2 #3 open model in Text, #26 overall on par with o3 and Qwen3-max-preview
dot3 #4 open model in Code, #10 overall rivaling gemini-3-flash

minor tide
#

Video Arena Discord Update arena

<@&1398740297521037332> - We’re making a small but important change: Video Arena is moving off Discord and will be exclusively available on arena.ai.

This change was driven largely by community feedback requesting new features that aren’t possible to support through a Discord bot. Moving Video Arena to our site gives us the flexibility to build and ship new capabilities that Discord simply can’t support. While the Discord version is going away, Video Arena itself is not. It’s still fully available on our site and will continue to improve over time.

Starting Wednesday, February 11th at 4pm PST, Video Arena will no longer be available through Discord. The Video Arena site experience on arena.ai/video is unaffected and will remain fully accessible.

Thank you everyone for being vocal about how you want to see Video Arena improve. This move helps us build a better experience for everyone.

#

Image Arena Leaderboard Update - Grok Imagine Image arena

<@&1372208524230397962> <@&1372208635530448926> - The Text-to-Image and Image Edit leaderboards have been updated to include Grok-Imagine-Image.

Text-to-Image leaderboard:

  • #4 Grok-Imagine-Image; scoring 1170, surpassing Flux-2-max and Nano-banana
  • #6 Grok-Imagine-Image-Pro

Image-Edit leaderboard:

  • #5 Grok-Imagine-Image-Pro; scoring 1330, overtaking Seedream-4.5
  • #6 Grok-Imagine-Image
minor tide
#

Text Arena and Code Arena Leaderboard Update - Opus 4.6 Thinking arena

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard and the Code Arena leaderboard have been updated to include `Claude-opus-4-6-thinking`!

Some highlights:

  • #1 Code Arena: scoring 1576
  • #1 Text Arena: scoring 1504
  • In Code Arena: Claude Opus 4.6 takes #1 & #2; Claude Opus 4.5 takes #3 & #5
minor tide
#

Updates to the Image Arena Leaderboard arena

@everyone - Text-to-image models have advanced quickly, and so have use cases. After analyzing 4M+ user prompts (from fantasy art to logos and posters), it’s clear that a single leaderboard is no longer enough to capture real-world use. With that in mind, we’ve updated the Text-to-Image Arena with Prompt Categories & Quality Filtering.

Category-specific leaderboards that surface domain-level performance across common use cases. New categories include:

  • Product, Branding & Commercial Design
  • 3D Imaging & Modeling
  • Cartoon, Anime & Fantasy
  • Photorealistic & Cinematic Imagery
  • Art
  • Portraits
  • Text Rendering

To improve reliability, we filtered the prompt set to focus on inputs that consistently deliver quality image generation. After removing ~15% of noisy or underspecified prompts, we recomputed the leaderboard, resulting in more stable, higher-confidence rankings.

These updates are a first step toward more granular, interpretable evaluation of text-to-image models—grounded in how people actually use them. You can now explore how your favorite text-to-image models perform across these categories on the Text-to-Image Arena leaderboard.

Read more about the update on our blog.

minor tide
#

Video Arena Discord Reminder arena

<@&1398740297521037332> - Reminder that on **Wednesday, February 11th @ 4pm PST**, Video Arena will no longer be available through the Discord bot. Video Arena will still be available through the site and is unaffected by this change. This shift allows us to focus our efforts on improving Video Arena with features and capabilities that aren't possible through a Discord bot.

We appreciate everyone who has provided feedback and enjoyed using Video Arena through Discord. Thank you!

minor tide
#

Announcing Arena's Academic Partnerships Program arena

<@&1372208590248742964> <@&1372208635530448926> - Today we’re announcing our Academic Partnerships Program, a new initiative to support independent academic research in AI evaluation, rankings, and measurement.

As AI systems advance and adoption accelerates, the methods we use to evaluate and compare models increasingly shape both scientific progress and real-world outcomes. Many of the most important contributions in this area come from the academic research community, and we’re proud to help support that work directly.

Selected projects may receive up to $50,000 in research funding. We welcome proposals across evaluation methodology, leaderboard design, measurement and statistical validity, preference data and human evaluation, and safety/alignment evaluation.

Learn more about our Academic Partnerships Program here.

Apply to our Academic Partnerships Program by March 31, 2026 here.

minor tide
#

PDF Upload is available on Arena arena

<@&1372208635530448926> - Upload PDFs with your prompts to add richer context and test models on document reasoning, bringing evaluations closer to real-world use. Try it across 10 models today - we’ll be adding more over time.

Leaderboard coming soon. Start uploading, comparing, and voting!

minor tide
#

New Models & Video Arena Leaderboard Update arena

<@&1372208635530448926> <@&1398740297521037332> <@&1372208524230397962> - The Video Arena leaderboards have been updated and high-res 1080p variants for Veo 3.1 now rank #1 and #2 in Video Arena.

dot3 In Text-to-Video the 1080p versions top the chart

  • #1 veo-3.1-audio-1080p
  • #2 veo-3.1-fast-audio-1080p

dot4 In Image-to-Video, 1080p variants make the top 5

  • #2 veo-3.1-audio-1080p
  • #5 veo-3.1-fast-audio-1080p

dot1 New models have been added to Video Arena and Text Arena.

  • veo-3.1-audio-1080p (Video Arena)
  • veo-3.1-fast-audio-1080p (Video Arena)
  • step-3.5-flash (Text Arena)
minor tide
#

Multi-file Apps Now Live in Code Arena arena

<@&1372208635530448926> - Since launching Code Arena in November to evaluate frontier AI models on real-world, agentic coding tasks, we’ve received a lot of feedback asking us to support more complex workflows.

With multi-file apps, you can now build and compare production-ready projects, making it easier to evaluate how top frontier AI models perform on your actual use cases.

minor tide
#

Text Arena Leaderboard Update - GLM-5 arena

<@&1372208635530448926> <@&1372208524230397962> - The Text Arena leaderboard has been updated and glm-5 is now #1 among open models.

  • #1 open model on par with gpt-5.1-high
  • #11 overall; scoring 1452, +11pts improvement over GLM-4.7

Stay up to date with our Leaderboard Changelog.

minor tide
#

Video Arena Discord Reminder arena

<@&1398740297521037332> - We are currently in the process of removing Video Arena from the Discord bot. Video Arena will still be available through the site and is unaffected by this change. This shift allows us to focus our efforts on improving Video Arena with features and capabilities that aren't possible through a Discord bot.

We appreciate everyone who has provided feedback and enjoyed using Video Arena through Discord. Thank you!

minor tide
#

New Model Update arena

<@&1372208635530448926> - A new model has been added to Text Arena and Code Arena.

  • Minimax-m2.5
minor tide
#

Code Arena Leaderboard Update - GLM-5 arena

<@&1372208635530448926> <@&1372208524230397962> - The Code Arena leaderboard has been updated and GLM-5 is now the #1 open model in Code Arena. It ranks #6 overall, on par with Gemini-3-pro and 100+ pts below Claude-Opus-4.6 in agentic webdev tasks.

Arena's AI Capability Lead Peter Gostev shares his first impressions of two powerful models: GLM-5 and MiniMax-M2.5. Give it a watch here.

Try it yourself: https://arena.ai

Arena's AI Capability Lead Peter Gostev shares his first impressions of two powerful models from China: GLM-5 (Zhipu AI) and MiniMax-M2.5 (MiniMax).

In this deep dive, Peter tests both models' coding and reasoning capabilities to see how they stack up against leading models like Claude Opus 4.6 and Gemini 3 P...

▶ Play video
minor tide
#

New Model Update

<@&1372208635530448926> - A new model has been added to Text, Vision, and Code Arena.

  • qwen3.5-397b-a17b
minor tide
#

New Model Update

<@&1372208635530448926> - A new model has been added to Text and Code Arena.

  • claude-sonnet-4-6
minor tide
#

First impressions of Claude Sonnet 4.6 arena

<@&1473460945308221542> - Check out our newest YouTube video with Arena's AI Capability Lead Peter Gostev sharing his first impressions of Claude Sonnet 4.6, Anthropic's latest model in the Claude family.

https://www.youtube.com/watch?v=b0yr1I0dxA4

Want the new YouTube Updates role? Just head to Channels & Roles (in the channel list), click Customize, choose What brings you here, and select YouTube Updates.

Try it yourself: https://arena.ai

Arena's AI Capability Lead Peter Gostev shares his first impressions of Claude Sonnet 4.6, Anthropic's latest model in the Claude family.

In this deep dive, Peter tests Claude Sonnet 4.6's coding and reasoning capabilities to see how it stacks up against leading models like Claude Opus 4.6, Gemini 3 Pro, and G...

▶ Play video
minor tide
#

New Model Update arena

<@&1372208635530448926> - New models have been added to Search Arena.

  • sonnet-4.6-search
  • opus-4.6-search
minor tide
#

Arena Leaderboard UI Update arena

@everyone - Millions of votes power the leaderboard. Now you can filter for what matters to you. A new side panel lets you filter and break down ranked results to find the best model for your task. Some highlights:

dot1 Filter by category (e.g. Coding, Expert prompts)
dot3 Open vs. Proprietary Models
dot2 Rank labs by their top-performing models

Check it out, and let us know what you think in #leaderboards.

minor tide
#

A quick look at Arena's updated leaderboard UI arena

<@&1473460945308221542> - Join our Designer, Justin Keoninh, for a walkthrough of the new leaderboard UI updates and learn how to make the most of the latest enhancements.

https://www.youtube.com/watch?v=xfmcR6-Uh5Q

Want the new YouTube Updates role? Just head to **Channels & Roles** (in the channel list), click Customize, choose What brings you here, and select YouTube Updates.

https://arena.ai

Millions of votes power the Arena leaderboard. Now you can filter for what matters to you. A new side panel lets you filter and break down ranked results to find the best AI model for your task.

#arenaai #llmevaluation

▶ Play video
minor tide
#

Text Arena Leaderboard Update - Qwen3.5-397B-A17B arena

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard has been updated to include Qwen3.5-397B-A17B.

In the highly competitive Text Arena, a few highlights:
dot1 #20 overall in Text on par with Claude Opus 4.1 variants
dot4 Top 5 open for key categories in Text like: Math, Instruction Following, Multi-Turn, Creative Writing and Coding
dot3 Top 5 for open models in Arena Expert (#26 overall)

minor tide
#

Text and Code Arena Leaderboard Update - Gemini 3.1 Pro arena

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena and Code Arena leaderboards have been updated and now include Gemini-3.1-Pro. It’s top 3 across Text and Vision Arena, and #6 in Code Arena, tied closely with Claude Opus 4.5.

Highlights:
dot1 Tied #1 in Text (scoring 1500) only 4 pts from Opus 4.6
dot5 Top 3 in Arena Expert (scoring 1538), just behind Opus 4.6
dot2 #6 in Code Arena, on par with Opus 4.5 and GLM-5

minor tide
#

New Model Update arena

<@&1372208635530448926> - A new model has been added to Text Arena!

  • trinity-large
minor tide
#

Code Arena and Text Arena Leaderboard Update - Sonnet 4.6 arena

<@&1372208524230397962> <@&1372208635530448926> - The Code Arena leaderboard and Text Arena leaderboard have been updated to include Claude-sonnet-4.6.

Highlights:

  • +130 pts jump in Code Arena (#22 -> #3) compared to Sonnet 4.5, surpassing top-tier thinking models like Gemini-3.1 and GPT-5.2
  • Strong gains in Text categories: Math (#4) and Instruction Following (#5), Overall (#13)
minor tide
#

Video Arena Channels Update arena

<@&1398740297521037332> - We’re planning to remove the Video Arena generation channels from the server on Monday 2/23 @ 4pm PST. If you’d like to download any generations, please make sure to do so before that date.

#

What happens to your Arena vote? arena

<@&1473460945308221542> - Ever wondered what actually happens after you vote on Arena? Clayton breaks down the full journey.

https://www.youtube.com/watch?v=omT1ohYG53E

https://arena.ai

Ever wondered what actually happens after you vote on Arena? Clayton breaks down the full journey — from raw vote to research-grade data — including how Arena tags prompts by category, filters spam and duplicates, and ensures every data point is legitimate.

0:00 Introduction
0:08 How prompts get tagged (coding, math, creat...

▶ Play video
minor tide
#

Vision Leaderboard Update - Qwen3.5-397B-A17B arena

<@&1372208524230397962> <@&1372208635530448926> - The Vision Arena leaderboard has been updated to include Qwen3.5-397B-A17B. It's now tied with Kimi-K2.5-Instant as a top 2 open model in Vision Arena, and ranks #13 overall, on par with proprietary models like GPT-4o.

minor tide
#

Text Leaderboard Update - GPT-5.2-chat-latest arena

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard has been updated to include GPT-5.2-chat-latest, now in the top 5.
Highlights include:
dot5 Top 5 scoring 1478 on par with Gemini-3-Pro
dot4 +40pt improvement over the GPT-5.2 model
dot3 Top in key categories: Multi-Turn, Instruction-Following, Hard Prompts, Coding

minor tide
#

Image Arena Leaderboard Update - Reve V1.5 arena

<@&1372208524230397962> <@&1372208635530448926> - The Image Arena leaderboard has been updated to include Reve V1.5.

Highlights:
dot1 #4, scoring 1177, on par with Grok-Imagine-Image
dot4 Top 5 for categories: Text Rendering, Art, and Product, Branding & Commercial Design

minor tide
#

Code Arena Leaderboard Update - Qwen3.5-397B-A17B arena

<@&1372208524230397962> <@&1372208635530448926> - The Code Arena leaderboard has been updated and now includes Qwen3.5-397B-A17B.

Highlights:
dot2 top 7 open model
dot3 ranks #17 overall, on par with proprietary models like GPT-5.2 and Gemini-3-Flash

minor tide
#

New Model Update - arena

A new model has been added to Image Arena.

  • seedream-5.0-lite
#

Video Arena Leaderboard Update - Wan2.6-t2v arena

<@&1372208524230397962> <@&1372208635530448926> - The Text-to-Video leaderboard and the Image-to-Video leaderboard have been updated and now include Wan2.6-t2v.

Highlights:
dot5 Wan2.6-t2v is the #1 Chinese model in the Video Arena
dot1 Top 8 for Text-to-Video, scoring 1346, on par with Veo-3-fast-audio
dot3 #12 for Image-to-Video, scoring 1292, close to Seedance v1.5 pro and Kling 2.6 pro

minor tide
#

Search and Text Arena Leaderboard Update - Grok 4.20 beta1 arena

<@&1372208524230397962> <@&1372208635530448926> - The Search Arena leaderboard and Text Arena leaderboard have been updated and now include Grok-4.20-Beta1.

Highlights:
dot2 #1 in Search Arena, scoring 1226, leading GPT-5.2 and Gemini-3
dot3 #4 in Text Arena, scoring 1492 on par with Gemini 3.1 Pro

minor tide
#

New Model Update - arena

<@&1372208635530448926> - New models have been added to Code, Text, and Vision Arena.
Code Arena

  • qwen3.5-27b-code
  • qwen3.5-35b-a3b-code
  • qwen3.5-122b-a10b-code

Text and Vision Arena

  • qwen3.5-27b
  • qwen3.5-35b-a3b
  • qwen3.5-122b-a10b
minor tide
#

Image Edit Leaderboard Update - Seedream-5.0-Lite arena

<@&1372208524230397962> <@&1372208635530448926> - The Image Edit Arena leaderboard has been updated. Seedream-5.0-Lite now ties for top 5 on the Multi-Image Edit Arena.

Highlights:
dot2 ranks #10 in Single-Image, scoring 1301 on par with Hunyuan-Image-3.0 and Nano Banana
dot4 ranks #23 overall for Text-to-Image Arena, scoring 1106

minor tide
#

Video Arena Leaderboard Update - P-Video arena

<@&1372208524230397962> <@&1372208635530448926> - P-Video enters the Video Arena leaderboards in the top 26.

Highlights:
dot1 tied for #22 in Text-to-Video, score 1178 on par with Hailuo 2.0 and Kandinsky 5.0 Pro
dot3 top 26 for Image-to-Video, score 1199 on par with Hailuo 2.0 Fast
dot5 their fastest model with pricing at $0.04/second for 1080p

minor tide
#

Image Arena Leaderboard Update - Nano Banana 2 arena

<@&1372208524230397962> <@&1372208635530448926> - Nano Banana 2 debuts at #1 in Image Arena, and it changes the game again 🍌 Officially released as Gemini-3.1-Flash-Image-Preview, it introduces a new web search capability, unlocking image generation grounded in real-world context.

Highlights:
dot3 #1 Text-to-Image scoring 1279, surpassing GPT-Image-1.5 and Nano Banana Pro
dot5 Ties for #1 Single-Image Edit, scoring 1407 on par with ChatGPT-Image-Latest
dot1 Top 3 Multi-Image Edit, alongside Nano Banana Pro variants
dot4 $0.067 per image, ~2x cheaper than Nano Banana Pro

sand grail
#

Why the best AI models make the worst in-app assistants arena

<@&1473460945308221542> - Peter covers three reasons AI agents underperform inside existing software.

https://www.youtube.com/watch?v=qF8afKUGRpc

Want the new YouTube Updates role? Head to Channels & Roles (in the channel list), click Customize, choose What brings you here, and select YouTube Updates.

https://arena.ai

You've probably noticed that AI feels incredibly powerful in tools like Claude or ChatGPT, but weirdly disappointing when it's built into the apps you already use. It's not your imagination.

In this clip, we cover three reasons AI agents underperform inside existing software: the model you're actually getting isn't the flagshi...

▶ Play video
minor tide
#

Search Arena Leaderboard Update - Claude Opus & Sonnet 4.6 arena

<@&1372208524230397962> <@&1372208635530448926> - The Search Arena leaderboard has been updated to include Claude-Opus-4-6 and Claude-Sonnet-4-6.

Highlights:
dot5 Opus 4.6 takes #1 with a wide lead, scoring 1255, +30 pts over Grok-4.20-beta1, GPT-5.2 and Gemini-3
dot2 Sonnet 4.6 ranks #7 on par with GPT-5.1

minor tide
#

New Model Update - arena

<@&1372208635530448926> - A new model has been added to Code Arena!

  • gpt-5.3-codex
minor tide
#

Video Arena Leaderboard Update - Kling V3 Pro arena

<@&1372208524230397962> <@&1372208635530448926> - The Video Arena leaderboard has been updated to include Kling-V3-Pro.

Highlights:
dot4 tied #8, scoring 1337 on par with Wan2.5-i2v-preview
dot5 +52pt improvement over Kling 2.6 Pro
dot3 +48pt over Kling-2.5-turbo-1080p

sand grail
#

7 new categories for Image Arena | Arena.ai text-to-image update arena

<@&1473460945308221542> Guanglei Song, PhD introduces 7 new categories in Image Arena to find the top models for photorealistic, 3D modeling, and more.

https://www.youtube.com/watch?v=kWK18CEbSag

See top models: https://arena.ai/leaderboard/text-to-image

Read more about it: https://arena.ai/blog/image-arena-improvements/

Arena just dropped 7 new categories for evaluating text-to-image models, such as photorealistic & cinematic imagery or 3D imaging & modeling. Guanglei Song, PhD and engineering manager at Arena, talks about how they us...

▶ Play video
sand grail
#

How millions of people compare AI models | Arena.ai explained in 60 seconds arena

<@&1473460945308221542> Arena in 60 seconds. What did we miss?

https://www.youtube.com/watch?v=nktiDGTn61I

Try it free: https://arena.ai

How millions of people compare AI models - explained in 60 seconds. Enter a prompt, two anonymous AI models battle it out, you pick the winner, and your vote powers the world's most trusted AI leaderboard. Filter by skill, explore modalities beyond text - from image generation to full app building - and chat with s...

▶ Play video
sly rock
#

Video Arena Leaderboard Update - Runway Gen-4.5 arena

<@&1372208524230397962> <@&1372208635530448926> - The Text-to-Video Arena leaderboard has been updated to include Runway Gen 4.5.
dot1 Gen-4.5 scores 1218, on par with KlingAI’s Kling-2.6-Pro.

sly rock
#

Text & Code Arena Leaderboard Update - Gemini-3.1-Flash Lite arena

<@&1372208524230397962> <@&1372208635530448926> - The leaderboards have been updated to include Gemini-3.1-Flash-Lite-Preview for Text and Code Arena.

Highlights:
dot1 ranks #36 in Text, scoring 1432, on par with Grok-4.1-fast, strong in Creative Writing, and Longer Query
dot2 surpassing larger Gemini 2.5 Flash and GPT-5-mini
dot3 tied in Code Arena for #35 scoring 1261, on par with Qwen3-coder for agentic webdev tasks

sand grail
#

Document Arena walkthrough on Arena.ai | make the best AI models compete arena

<@&1473460945308221542> - In Document Arena, you can upload a PDF and watch two anonymous AI models go head-to-head.

https://www.youtube.com/watch?v=cIU3-gt_Kro

https://arena.ai

Which AI model is best at document reasoning? In Document Arena, you can upload a PDF and watch two anonymous AI models go head-to-head — then vote for the one that gives the better response. In this video, Arena engineer Kelsey walks us through how Document Arena works.

0:00 Introduction to Document Arena
0:31 Solving homew...

▶ Play video
sly rock
#

Document Arena Leaderboard is now Live! arena

<@&1372208524230397962> <@&1372208635530448926> - The Document Arena leaderboard has been added. The Document Arena displays model rankings based on side-by-side evaluations of real-world document reasoning performance across user-uploaded PDF files.

See which frontier AI models rank highest in document reasoning, all powered by side-by-side evaluations on user-uploaded PDFs from real work use cases.

dot1 #1 is Claude Opus 4.6 scoring 1525, +51 pts in the lead
dot2 Opus 4.5 and Gemini 3.1 Pro Preview round out the top 3
dot3 Latest GPT-5.2 tied at #9, ~100 pts behind Opus 4.6

minor tide
#

New Model Update - arena

<@&1372208635530448926> - New models have been added to Text Arena and Video Arena!

  • GPT-5.3-Chat-Latest (Text Arena)
  • PixVerse V5.6 (Video Arena)
sand grail
#

Why an AI router beats every model on Arena | Max deep dive arena

<@&1473460945308221542> - Arena ML researchers Derry and Evan go behind the scenes of Arena's new Max intelligent router.

https://www.youtube.com/watch?v=nO6E5t6dmA0

Want the new YouTube Updates role? Head to Channels & Roles (in the channel list), click Customize, choose What brings you here, and select YouTube Updates.

Try Max → https://arena.ai/max

Arena researchers Derry and Evan break down Max, the intelligent router that topped every category on Arena's leaderboard — not by being one model, but by picking the best model for every prompt.

In this deep dive, they walk through the Max data and explain how routing across labs combines the strengths of di...

▶ Play video
sly rock
#

Text & Code Arena Leaderboard Update - Qwen3.5 Medium Models arena

<@&1372208524230397962> <@&1372208635530448926> - The Text & Code leaderboards have been updated to include Qwen 3.5 medium models: qwen3.5-27b, qwen3.5-35b-a3b, qwen3.5-122b-a10b, and qwen3.5-flash.

Code Arena Highlights:
dot1 top 10 open Qwen3.5-122b-a10b, scoring 1384 and Qwen3.5-27b, scoring 1375 both very close to proprietary models: Claude Sonnet 4.5 and GPT-5.1-medium
dot2 Qwen3.5-35b-a3b, scoring 1257 is on par with new Gemini-3.1-flash-lite-preview
dot3 Qwen3.5-Flash, scoring 1243 is on par with GPT-5.1-codex-mini

Text Arena Highlights:
dot1 top 10 open Qwen3.5-122b-a10b, scoring 1420
dot2 Qwen3.5-27b, the smallest and densest, scores 1410, on par with GLM-4.5
dot3 Qwen3.5-35b-a3b, scoring 1392, and Qwen3.5-Flash, scoring 1395, are on par with the 6-7x larger last-generation model Qwen3-235b

sly rock
#

Text Arena Leaderboard Update - GPT-5.4 arena

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard has been updated to include gpt-5.4 and gpt-5.4-high.

Highlights:
dot1 GPT-5.4-high is tied with Gemini-3-Pro.
dot2 Top 3 in Creative Writing, and top 10 in Instruction Following, Hard Prompts.
dot3 Top 6 for Occupational categories - Writing, Literature & Language, Entertainment, Sports & Media, Business, Management & Financial Ops.
dot4 GPT-5.4 (reasoning none) ranks #16

sand grail
#

First impressions of OpenAI GPT 5.4 | Arena.ai arena

<@&1473460945308221542> - AI capability lead Peter Gostev runs through one-shot tests to see how GPT 5.4 compares to other models.

https://www.youtube.com/watch?v=foEfcttIuiI

Want the new YouTube Updates role? Head to Channels & Roles (in the channel list), click Customize, choose What brings you here, and select YouTube Updates.

Try it yourself: https://arena.ai

Arena's AI Capability Lead Peter Gostev shares his first impressions of GPT-5.4, OpenAI's latest frontier model in the GPT-5 series.

In this deep dive, Peter tests GPT-5.4's coding and reasoning capabilities to see how it stacks up against leading models like Claude Opus 4.6, Gemini 3.1 Pro, and GPT-5.2.

From...

▶ Play video
sand grail
sly rock
#

Video Arena Leaderboard Update - PixVerse V5.6 arena

<@&1372208524230397962> <@&1372208635530448926> - The Video Arena leaderboards have been updated to include pixverse-v5.6

Highlights:
dot1 #15 on Text-to-Video
dot2 #15 on Image-to-Video

minor tide
#

Document Arena Leaderboard Update - Claude Sonnet 4.6 arena

<@&1372208524230397962> <@&1372208635530448926> - The Document Arena leaderboard has been updated to include claude-sonnet-4-6

dot1 #2 ranking overall
dot2 top 3 are all Anthropic models

sand grail
#

First impressions of OpenAI GPT 5.4 | Arena.ai arena

<@&1473460945308221542> - After testing GPT 5.4-medium and coming away underwhelmed, Peter revisits the model at higher reasoning levels — and the difference is massive.

https://www.youtube.com/watch?v=4T9_deFRI30

Try it yourself: https://arena.ai

See also First impressions of GPT 5.4 Medium:
https://youtu.be/foEfcttIuiI

After testing GPT 5.4-medium and coming away underwhelmed, the Arena team revisited the model at higher reasoning levels — and the difference is massive.

This video compares GPT 5.4 at medium, high, and pro reasoning levels side by s...

▶ Play video
minor tide
#

Text Arena Leaderboard Update arena

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard has been updated and now includes Nemotron-3-Super-120B-A12B.

Highlights:
dot1 #37 open rank in Expert
dot5 #38 open rank in Math
dot3 #118 Text overall

minor tide
sand grail
#

How AI search actually works (and why it breaks) | Arena.ai arena

<@&1473460945308221542> - Every LLM can retrieve sources but the real challenge is reasoning about which ones to trust → arena.ai/search

https://www.youtube.com/watch?v=iy1HGPAK5H4

Try Search Arena → https://arena.ai/search

How does an LLM actually search the web? And why does it still get things wrong?

In this video, Arena researcher Logan King breaks down how search-augmented large language models work — from tool calls and context management to hallucination, misattribution, and source quality. He explains why eve...

▶ Play video
minor tide
#

Text and Document Leaderboard Update arena

<@&1372208524230397962> <@&1372208635530448926> - GPT-5.4 lands tied for #2 on Document Arena and in the top 5 for Arena Expert. In the Document Arena, top models for document analysis and long-form reasoning are ranked based on real-world use.

Highlights:
dot1 #2 tied with Sonnet 4.6
dot3 #5 for Arena Expert
dot5 top 10 in Business, Management, & Financial Ops and Writing, Literature, & Language
dot2 top 15 in Math, Instruction Following, Multi-Turn & Hard Prompts
dot4 top 15 in Text Arena overall

sand grail
#

The Nano Banana origin story: how an anonymous Google model made Arena history arena

<@&1473460945308221542> - Are you old enough to remember the Nano Banana hype? Meet the engineer who added it to the Arena - Yue!

https://www.youtube.com/watch?v=6vJnfrr34Xc

Try Image Arena → https://arena.ai/image

An anonymous image generation model appeared on Arena and quickly became the most-voted model in the platform's history. It was called Nano Banana — and it turned out to be built on Google Gemini.

In this video, we sit down with Yue, a former Google employee, to break down what made Nano Banana so s...

minor tide
#

Code Arena Leaderboard Update arena

<@&1372208524230397962> <@&1372208635530448926> - The Code Arena leaderboard has been updated and GPT-5.4-high has landed in the top 6.

Highlights:
dot5 top 6 in Web Dev overall
dot2 #6 for Multi-File React
dot3 top 10 for Single-File HTML

sand grail
#

Can AI spot nonsense? We tested 80 models — thinking ones did worst arena

<@&1473460945308221542> - Peter takes us through his viral benchmark 💩

https://www.youtube.com/watch?v=bOLXvFMqhi8

Can your AI tell when a question is total nonsense — or does it just make up an answer and hope you don't notice?

Arena researcher Peter Gostev built a benchmark to find out. He crafted nonsense questions across domains like law, finance, and tech, then tested 80 models to see which ones pushed back and which ones played along. The results? S...

minor tide
#

Text and Code Arena Leaderboard Update - Grok 4.20 Beta Reasoning arena

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard and the Code Arena leaderboard have been updated to include Grok 4.20 Beta Reasoning.

Highlights:
dot4 #7 in Text Arena overall tied with GPT-5.4-high
dot3 top 10 in Math, Multi-Turn, Creative Writing, Coding & Hard Prompts
dot2 top 15 in Expert Arena

minor tide
#

Customize Arena Leaderboards

<@&1372208524230397962> - Everyone's real-world use for AI differs. Select the columns and data that matter most to you:

  • Rank Spread
  • Model Organization
  • License
  • Total Votes
  • Price ($/MToken)
  • Max Context
minor tide
#

Video Edit Leaderboard Launch arena

<@&1372208524230397962> - Today we’re launching the Video Edit Arena to evaluate the frontier capability of video models! The leaderboard is powered by thousands of real-world community votes. Click the Edit button in Video Arena to edit any video and compare top model outputs. More models coming soon!

#1 Grok-Imagine-Video
#2 Kling-o3-pro
#3 Kling-o1-pro
#4 Gen4-aleph

minor tide
#

New Model Update - arena

<@&1372208635530448926> - New models have been added to Text Arena & Vision Arena!

  • gpt-5.4-mini-high
  • gpt-5.4-nano-high
minor tide
#

New Model Update - arena

<@&1372208635530448926> - A new model has been added to Text Arena and Code Arena!

  • minimax-m2.7
sand grail
#

Pick the most reliable AI, not the smartest one arena

<@&1473460945308221542> - Peter explains why the most underrated quality in AI models isn't how smart they are - it's how consistently they perform.

https://www.youtube.com/watch?v=IaewLXbgMIQ

https://arena.ai

Arena's AI Capability Lead Peter Gostev explains why the most underrated quality in AI models isn't how smart they are - it's how consistently they perform, and why that distinction changes everything for builders.

#AI #LLM #arenaai #softwaredevelopment

minor tide
#

Code Arena Leaderboard Update - MiniMax M2.7 arena

<@&1372208524230397962> <@&1372208635530448926> - The Code Arena leaderboard has been updated to include MiniMax-M2.7

minor tide
#

Text Arena and Expert Leaderboard Updated - Qwen 3.5 Max Preview arena

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard and the Expert Arena leaderboard have been updated to include Qwen3.5-max-preview.

Highlights:
dot1 #3 Math
dot2 #10 Expert
dot3 #15 Text Arena
dot4 Top 20 for Writing, Literature & Language, Life, Physical, & Social Science, Entertainment, Sports, & Media, and Medicine & Healthcare

minor tide
#

Image Arena and Vision Arena Leaderboard Updated - MAI-Image-2 & Grok-4.20-Beta-Reasoning arena

<@&1372208524230397962> <@&1372208635530448926> - The Image Arena leaderboard has been updated to include MAI-Image-2.

Highlights:
dot3 #5 in Text-to-Image overall
dot4 #5 for 3D Imaging & Modeling, Cartoon, Anime & Fantasy, Photorealistic & Cinematic Imagery, Art and Portraits
dot2 #6 for Product, Branding & Commercial Design

The Vision Arena leaderboard has also been updated to include Grok-4.20-Beta-Reasoning

Highlights:
dot5 Scoring 1240
dot1 #11 across all Vision

minor tide
#

Reporting Process for “Something Went Wrong” Errors arena

<@&1372208635530448926> - We’ve recently updated our bug reporting process. If you encounter a Something went wrong error, you’ll now see a Trace ID. We recommend the following steps when this happens:

  1. Try the troubleshooting steps in this article.
  2. Confirm the issue isn’t related to rate limits, as outlined in this article.
  3. Submit the Trace ID using this form. This helps us identify the root causes and resolve issues over time.
minor tide
#

New Model Update - arena

<@&1372208635530448926> - A new model has been added to Vision Arena!

  • mimo-v2-omni
sand grail
#

Battle mode vs. side by side vs. direct chat | Arena.ai arena

<@&1473460945308221542> - three ways to test AI - each designed for a different use case.

https://www.youtube.com/watch?v=JdcoHxnPouM

Try it free: https://arena.ai

Arena.ai gives you three distinct ways to interact with the world's top AI models — each designed for a different use case.

🔥 Battle mode — Pit two anonymous models against each other and vote on the best response. Your votes help shape the largest open LLM leaderboard.
🔀 Side by side — Pick two specif...

minor tide
#

Code Arena and Arena Expert Leaderboard Update - MiMo V2 Pro arena

<@&1372208524230397962> <@&1372208635530448926> - The Code Arena leaderboard and Arena Expert leaderboard have been updated to include MiMo-V2-Pro.

Highlights:
dot1 top 6 lab, #13 in Code Arena for agentic webdev tasks
dot2 #10 for Arena Expert
dot3 top 20 for Life, Physical, & Social Science and Business, Management, & Financial Ops occupational categories

sand grail
#

How to evaluate LLMs | the statistics behind Arena's rankings arena

<@&1473460945308221542> - curious about the math behind Arena's ranking system? This is a technical deep dive into the core methodology with Anastasios Angelopoulos, co-founder and CEO of Arena.

https://www.youtube.com/watch?v=CnWt0Zarfoc

https://arena.ai

Anastasios Angelopoulos, co-founder and CEO of Arena, presents a technical deep dive into how the platform evaluates large language models using live human preference data.

Prior familiarity with probability and statistics is helpful but not required.
The talk covers Arena's core methodology — pairwise comparisons, Bradley-...

minor tide
#

Server Update - arena

<@&1372208635530448926> - As our community continues to grow, we’re making a few updates to help us provide better and more organized support. Going forward, here are some updated guidelines:
dot2 The #general channel should be used for general AI discussion. If you have a question, please use #ask-here. If you’re reporting a bug, please use #1343291835845578853. Going forward, I wouldn't expect an answer from staff to questions asked in #general.
dot3 If you’d like to reach the team directly, please send a message to @hearty vessel. Direct messages to individual team members (including myself) will no longer be supported.
dot5 Before creating a new thread in #1343291835845578853, please check if there’s already an existing thread for the issues you're experiencing. Duplicate posts may be removed to help keep things organized.

sand grail
#

Create an Arena account and manage your chats arena

<@&1473460945308221542> - how to create an Arena account to unlock higher rate limits, cross-device chat sync, and access additional features.

https://www.youtube.com/watch?v=1Nee2fIlvy8

https://arena.ai

Create an Arena account to unlock higher rate limits, cross-device chat sync, and access to additional features. This quick walkthrough covers signing up with Google or email, plus how to save, archive, and delete your chats.

#arenaai

minor tide
#

Search Arena Leaderboard Updated - Gemini 3.1 Pro Grounding arena

<@&1372208524230397962> <@&1372208635530448926> - The Search Arena leaderboard has been updated to include Gemini-3.1-Pro-Grounding.

sand grail
#

Are Open Source models catching up to Proprietary models? arena

<@&1473460945308221542> - We looked at 3 years of Arena data. Proprietary models still hold the top 20 spots — but open source is climbing — with the top open source models ranked 20 (GLM-5 by Z.ai), 23 (Kimi-K2.5-Thinking by Moonshot AI) and 27 (Qwen3.5-397b-a17b by Alibaba).

https://www.youtube.com/shorts/kw__8_0AUx4

https://arena.ai/leaderboard/text

We looked at 3 years of Arena data. Proprietary models still hold the top 20 spots — but open source is climbing — with the top open source models ranked 20 (GLM-5 by Z.ai), 23 (Kimi-K2.5-Thinking by Moonshot AI) and 27 (Qwen3.5-397b-a17b by Alibaba).

Here's how the race has evolved.

#arenaai #opensource ...

minor tide
#

Text Arena Leaderboard Update - GPT-5.4-Mini-High arena

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard has been updated to include GPT-5.4-Mini-High.

Highlights:
dot1 #3 Business, Management & Financial Ops
dot2 #10 Multi-Turn
dot3 #13 Arena Expert
dot4 #17 Legal & Government
dot5 #19 Instruction Following

sand grail
#

Big model smell: will one giant AI model replace all the small ones? arena

<@&1473460945308221542> Arena ML researchers Evan and Derry discuss whether the future of AI belongs to massive generalist models or smaller, fine-tuned specialists.

https://www.youtube.com/watch?v=v74VmZnj6Ww

Arena ML researchers Evan and Derry discuss whether the future of AI belongs to massive generalist models or smaller, fine-tuned specialists.

They break down what "big model smell" actually means (sensing genuine reasoning vs. memorized responses), introduce the idea of "pristine pre-training smell," and debate whether scaling is really hitting...

sand grail
#

Troubleshooting guide for Arena.ai arena

<@&1473460945308221542> Five things to try if you see a "Something went wrong with this response" error and how to submit a bug report if you need extra help.

https://www.youtube.com/watch?v=r7ekTRRSlRs

Getting a "Something went wrong with this response" error on arena.ai? This video walks through five things to try first, common error messages you might see, and how to submit a bug report if you need extra help.

📋 Full troubleshooting guide:
https://help.arena.ai/collections/3598891525-troubleshooting

🐛 Bug report form:
https://docs....

minor tide
#

Text, Vision, and Search Arena Leaderboard Update - Grok 4.20 Multi-Agent Beta Model arena

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard, Vision Arena leaderboard, and Search Arena leaderboard have been updated to include Grok-4.20-multi-agent-beta-0309.

Highlights:
dot1 #7 for Search Arena, #11 in Text Arena, #22 in Vision Arena
dot2 #3 Medicine & Healthcare
dot3 #6 Expert Prompts
dot4 #6 Mathematical
dot5 #6 Legal & Government

minor tide
#

Pareto chart is now on the Leaderboards arena

<@&1372208524230397962> <@&1372208635530448926> - We've added Pareto frontier charts to the leaderboard! The Pareto frontier curve demonstrates which models are most efficient at their level of performance (by Arena score) vs. a blended price per 1M tokens (3:1 Ratio).

Now available across:
dot2 Text Arena
dot4 Vision Arena
dot1 Search Arena
dot3 Document Arena
dot5 Code Arena
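
For readers who want to see the idea behind the chart, here is a minimal Python sketch of a blended price and a Pareto frontier. It is not Arena's implementation. It assumes the 3:1 ratio means 3 parts input price to 1 part output price (the announcement does not say which side is weighted), and every model name, score, and price below is made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    score: float         # Arena score, higher is better
    input_price: float   # USD per 1M input tokens
    output_price: float  # USD per 1M output tokens


def blended_price(m: Model, input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average price per 1M tokens.

    ASSUMPTION: "3:1" is read as 3 parts input to 1 part output.
    """
    total = input_weight + output_weight
    return (input_weight * m.input_price + output_weight * m.output_price) / total


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Return the models that no cheaper-or-equal model beats or matches on score."""
    # Cheapest first; at equal price, the higher score comes first.
    ranked = sorted(models, key=lambda m: (blended_price(m), -m.score))
    frontier = []
    best_score = float("-inf")
    for m in ranked:
        # Keep a model only if it scores higher than everything cheaper.
        if m.score > best_score:
            frontier.append(m)
            best_score = m.score
    return frontier


# Hypothetical models, not real leaderboard entries.
models = [
    Model("model-a", 1500, 5.00, 25.00),  # blended 10.0
    Model("model-b", 1450, 1.00, 5.00),   # blended 2.0
    Model("model-c", 1400, 2.00, 10.00),  # blended 4.0, dominated by model-b
    Model("model-d", 1300, 0.25, 1.25),   # blended 0.5
]

print([m.name for m in pareto_frontier(models)])
# ['model-d', 'model-b', 'model-a']
```

Here model-c drops off the frontier because model-b is both cheaper and higher scoring, which is the kind of "hidden gem" comparison the chart is meant to surface.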

In this video, Peter and Justin walk through how to read the Pareto frontier, find hidden gems, and compare models across categories.

https://arena.ai

The top of the leaderboard doesn't tell the whole story. Arena's Pareto chart shows you which AI models give you the best performance at every price point — so you can stop defaulting to the most expensive option and start picking the right model for your budget and your task.

In this video, Peter and Justin walk through how...

minor tide
#

New Model Update arena

<@&1372208635530448926> - A new model has been added to Vision Arena!

  • GLM-5V-Turbo
minor tide
#

Code Arena Leaderboard Update - Qwen 3.6 Plus arena

<@&1372208524230397962> <@&1372208635530448926> - The Code Arena leaderboard has been updated to include Qwen 3.6 Plus. Qwen 3.6 Plus Preview is the #2 lab for the React leaderboard in Code Arena, which ranks models based on agentic workflows involving multi-step reasoning, tool use, and multi-file apps.

minor tide
#

Text Arena Leaderboard Update - Gemma-4-31B arena

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard has been updated to include Gemma-4-31B.

Highlights:
dot1 #3 open (#27 overall), on par with the best open models Kimi-K2.5, Qwen-3.5-397b
dot2 Top 3 across Math, Instruction Following, Multi-Turn, Hard Prompts, Creative Writing, and Coding
dot3 Apache 2.0 license
dot4 Its efficient variant, Gemma-4-26B-A4B, is #6 open (#39 overall)

minor tide
#

Arena Leaderboard Dataset Release arena

<@&1372208524230397962> <@&1372208635530448926> - We're releasing the full history of Arena leaderboard data as a public dataset, nearly 3 years of rankings across 10 Arenas, dozens of categories, and hundreds of models. Optimized to empower analysis and unlock new insights across modalities and over time.

Learn more in our blog post here.

Find the dataset on our Hugging Face here.

minor tide
#

Changes to Models Available in Direct and Side-by-Side arena

<@&1372208635530448926> - We are changing some of the models available in Direct and Side-by-Side modes. These changes are part of our efforts to ensure that we can continue offering access to AI models while keeping the platform running reliably.

The following models are being removed from Direct and Side-by-Side:

  • claude opus models
  • gpt 5.4, gpt-5.4-high
  • gemini-3.1-pro-preview

Updates like this help us maintain availability for everyone over the long term, and we intend to bring these models back in a way that’s more sustainable when possible.

minor tide
#

New Model Update - arena

<@&1372208635530448926> - A new model has been added to Text Arena & Code Arena!

  • Qwen-3.6-Plus
sand grail
#

How battles in direct changes the way we evaluate LLMs arena

<@&1473460945308221542> - Longer context windows, more decisive votes, and over 90% correlation with traditional battle rankings — plus new signals about human preference that weren't measurable before.

https://www.youtube.com/watch?v=_pmZJaEbRaQ

https://arena.ai/text/direct

Battles in direct is a new evaluation mode that triggers battles mid-conversation in direct chat. Unlike traditional battles with four voting options, battles in direct uses three: continue with A, continue with B, or skip. The result? Longer context windows, more decisive votes, and over 90% correlation with tradit...

minor tide
#

New Model Update - arena

<@&1372208635530448926> - A new model has been added to Text Arena and Code Arena!

  • GLM-5.1
sand grail
#

First impressions of Z.ai GLM-5.1 (open source) arena

<@&1473460945308221542> - GLM-5.1 is a solid incremental upgrade over GLM-5, with slightly higher richness and quality across most generations. But the gap isn't massive.

https://www.youtube.com/watch?v=f11tVBXWr2g

Try it yourself: https://arena.ai

The Arena team put Z.ai's GLM-5.1 through roughly 100 one-shot generation tests — 3D scenes, SVGs, games, and more — comparing it side by side with Claude Opus 4.6, GLM-5, and Gemini 3.1 Pro.

The verdict: GLM-5.1 is a solid incremental upgrade over GLM-5, with slightly higher richness and quality across mo...

minor tide
#

Text Arena Leaderboard Update - GLM-5.1 arena

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard has been updated to include GLM-5.1.

Highlights:
dot1 #1 open model in Longer Query (#4 overall)
dot2 #1 open model in Life, Physical & Social Science (#5 overall)
dot3 #1 open model in Entertainment, Sports & Media (#8 overall)
dot4 #1 open model in Coding (#10 overall)

sand grail
#

We just open-sourced 3 years of model benchmark data—over 700+ models arena

<@&1473460945308221542> - This is one of the longest-running datasets tracking real model performance over time.

https://www.youtube.com/watch?v=QbpW77m90kw

Download here:
https://huggingface.co/datasets/lmarena-ai/leaderboard-dataset

Arena has released a comprehensive public dataset containing every leaderboard entry since May 2023—over 700+ unique models, multiple arenas, continuous evaluation history. This is one of the longest-running datasets tracking real model performance over time.

Ideal...

minor tide
#

Planned Change: New Usage System - Share Your Feedback arena

<@&1372208635530448926> - We're offering an early opportunity to share feedback on a change we’re planning before we release anything. But before introducing the change, we'd like to provide some context. Right now, Arena manages usage through a mix of per-model limits, session token limits, and modality limits. It works, but it's difficult to track, and we’re confident we can build something better.

What’s changing

We're reworking this into a single, unified system. Instead of separate hidden limits everywhere, you'd have a daily balance of “credits” you can spend however you want, so you can go all-in on Video Arena, stick with one model, bounce between a few, whatever. The goal is to give you way more control and visibility over how you use Arena.

What we don't know yet

We’re still early, so many details are intentionally flexible. Things like credit structure, launch timing, and even the final name are still being refined. We’re sharing this now so feedback like yours can directly shape how it evolves. If you’re wondering what certain aspects will look like, there’s a good chance we’re actively working through those decisions and won't have a direct answer.

Share your feedback

Drop your thoughts in this thread - https://discord.com/channels/1340554757349179412/1491461236448170134 What would make this feel fair? What would you want to know up front? What would actually be useful to see?

minor tide
#

Code Arena Leaderboard Update - GLM-5.1 arena

<@&1372208524230397962> <@&1372208635530448926> - The Code Arena leaderboard has been updated to include GLM-5.1.

sand grail
#

Humans are not as worthless as you think arena

<@&1473460945308221542> Derry and Peter break down why verifiable benchmarks only capture a narrow slice of model performance, and why human judgment still matters more than most people think.

https://www.youtube.com/watch?v=tUxCxdcJeg4

https://arena.ai

AI benchmarks like GPQA, MMLU, and SWEbench saturate in months — models score 99% and nobody cares anymore. But when you actually use these models, something's off. Tests pass, scores look great, and the output is still wrong.

Arena researchers Derry Xu and Peter Gostev break down why verifiable benchmarks only capture a nar...

minor tide
#

Text & Vision Arena Leaderboard Update - Muse Spark arena

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard and the Vision Arena leaderboard have been updated to include Muse-Spark.

Highlights:
dot1 Text Arena: #3
dot5 Vision Arena: #2
dot2 #4 Hard Prompts, #6 Coding, #9 Creative Writing, #10 Instruction Following, #27 Expert
dot3 #3 tied for Business, Management, & Financial Ops, #7 Legal & Government, #12 Writing & Literature

sand grail
#

What top models still suck at | Arena Deep Dive arena

<@&1473460945308221542> - ICYMI check out Peter's keynote from the AI Engineer conference in London last week.

https://www.youtube.com/watch?v=zTMQ88btM8s

https://arena.ai

Peter Gostev, AI capability lead at Arena, walks through the talk he gave at the AI Engineer conference in London — breaking down where today's top models are still falling short, according to millions of real user votes on Arena.

The key metric: Arena's "both bad" rate, which tracks how often users dislike both model respon...

sand grail
#

Ask an Expert Accountant | Arena.ai arena

<@&1473460945308221542> Can AI function like a professional Accountant? We asked an expert to judge AI responses to a complex Arena Expert-style prompt.

https://www.youtube.com/watch?v=AvoTjdYgFBA

Best AI for your job:
https://arena.ai/leaderboard/text/industry-business-and-management-and-financial-operations

Can AI function like a professional Accountant? We asked an expert Accountant to judge AI responses to a complex prompt regarding required minimum distributions and missing participants. See his reactions and verdicts.

0:00 Intro ...

minor tide
#

Video Edit Arena Leaderboard Update - HappyHorse-1.0 arena

<@&1372208524230397962> <@&1372208635530448926> - The Video Edit Arena leaderboard has been updated to include HappyHorse-1.0. HappyHorse-1.0 by Alibaba-ATH debuts at #1 in Video Edit Arena!

minor tide
#

Introducing: Image to WebDev Leaderboard arena

<@&1372208524230397962> <@&1372208635530448926> - The Image to WebDev Leaderboard ranks models on their ability to generate websites from screenshots and images. This dedicated leaderboard shows which models are best at agentically coding live sites from visual inputs.

Check out the Image to WebDev Leaderboard here!

minor tide
#

Text-to-Video and Image-to-Video Arena Leaderboard Update - HappyHorse-1.0 arena

<@&1372208524230397962> <@&1372208635530448926> - The Text-to-Video leaderboard and Image-to-Video leaderboard have been updated to include HappyHorse-1.0.

Highlights:
dot1 #2 Text-to-Video: Scores 1444
dot2 #2 Image-to-Video: Scores 1444
dot3 Top 2 for all 3 Video Arena leaderboards

minor tide
#

Leaderboard Update - Opus 4.7 & Opus 4.7 Thinking arena

<@&1372208524230397962> <@&1372208635530448926> - The Code Arena, Expert Arena, and Text Arena leaderboards have been updated to include Claude-opus-4-7 and Claude-opus-4-7-thinking!

Highlights - Opus 4.7 Thinking:
dot1 Ranks #1 in Code Arena, +37 points over Opus-4.6, #1 on both React and HTML leaderboards
dot4 Ranks #1 in Expert Arena
dot3 Ranks #1 in Text Arena, leads across major categories: #1 Coding, #1 Software & IT Services, #1 Writing, Literature, & Language, #1 Life, Physical and Social Sciences, #2 Multi-Turn

Highlights - Opus 4.7:
dot5 Ranks #4 in Expert Arena
dot3 Ranks #3 in Text Arena

minor tide
#

First impressions of Claude Opus 4.7 arena

<@&1473460945308221542> <@&1372208524230397962> - In this deep dive, Peter tests Claude Opus 4.7’s capabilities to see how it stacks up against leading models in the Code Arena where agentic web development tasks are evaluated by generating live sites and apps.

https://www.youtube.com/watch?v=VE7Pi4gLu0s

Arena's AI Capability Lead Peter Gostev shares his first impressions of Claude Opus 4.7, Anthropic's latest model in the Claude family.

In this deep dive, Peter tests Claude Opus 4.7’s capabilities to see how it stacks up against leading models in the Code Arena where agentic web development tasks are evaluated by generating live sites and ap...

minor tide
#

Leaderboard Update - Opus 4.7 & Opus 4.7 Thinking arena

<@&1372208524230397962> <@&1372208635530448926> - The Vision Arena leaderboard and the Document Arena leaderboard have been updated to include Opus-4.7 and Opus-4.7-thinking.

Highlights:
dot2 Opus 4.7 ranks #1 in Document Arena with a score of 1521, while Thinking ranks #4 with a score of 1508
dot5 In Vision, 4.7-Thinking ranks #1 while Non-Thinking ranks #3
dot1 Vision sub-categories saw the biggest gains over Opus-4.6: #1 Diagram (+20), #1 for Non-Thinking in Homework (+30), and #1 OCR for Non-Thinking (+7)

minor tide
#

Leaderboard Update - Qwen 3.6 Plus arena

<@&1372208524230397962> <@&1372208635530448926> - The Code Arena leaderboard has been updated to include Qwen3.6-plus.

sand grail
#

We built a Japan travel planner from scratch with Code Arena arena

<@&1473460945308221542> See how agentic coding actually works: tool calling, self-correction, multi-turn iteration, and more.

https://www.youtube.com/watch?v=g-vieNXOF4s

https://arena.ai/code

We put Code Arena to the test by building a full Japan travel itinerary website from scratch — complete with AI-generated images and real restaurant recommendations pulled from the web. Watch as two anonymous models go head-to-head in battle mode, and see how agentic coding actually works: tool calling, self-correction, ...

minor tide
#

Image Arena Leaderboard Update - GPT-Image-2 arena

<@&1372208524230397962> <@&1372208635530448926> - The Text-to-Image Arena leaderboard and the Image Edit leaderboard have been updated to include GPT-Image-2.

Highlights:
dot1 #1 Text-to-Image (1512), +242 over #2 (Nano-banana-2 with web-search aka gemini-3.1-flash-image)
dot2 #1 Single-Image Edit (1513), +125 over #2 (Nano-banana-pro aka gemini-3-pro-image)
dot3 #1 Multi-Image Edit (1464), +90 over #2 (Nano-banana-2)

https://www.youtube.com/watch?v=Adsaiyr7Nv8

Try it yourself: https://arena.ai/image

GPT Image 2 just made the biggest leap Arena has ever recorded — over 200 Arena points ahead of every other image model. Arena's AI Capability Lead Peter Gostev ran 100+ prompts head-to-head against GPT Image 1.5, Grok Imagine, and Nano Banana 2 to find out what's actually driving that gap.

The verdic...

minor tide
#

New Model Update - arena

<@&1372208635530448926> - A new model has been added to [Text](https://arena.ai/), Vision & Code Arena!

  • MiMo-V2.5
  • Pro versions added to Text and Code
minor tide
#

Code, Vision, Document, and Text Arena Leaderboard Update - Kimi-K2.6 arena

<@&1372208524230397962> <@&1372208635530448926> - The Code, Vision, Document, and Text Arena leaderboards have been updated to include Kimi-K2.6.

Highlights:
dot1 #2 open model in Code Arena (#6 overall)
dot2 #1 open model in Vision Arena (#15 overall)
dot3 #1 open model in Document Arena (#8 overall)
dot4 #2 open model in Text Arena (#24 overall)

Stay up to date with our Leaderboard Changelog.

minor tide
#

Text-to-Image and Image Edit Leaderboard Update - Qwen Image 2.0 Pro arena

<@&1372208524230397962> <@&1372208635530448926> - The Text-to-Image Arena leaderboard and the Image Edit leaderboard have been updated to include qwen-image-2.0-pro-2026-04-22.

Highlights:
dot1 #9 Text-to-Image
dot2 #17 Image Edit (Single Image)
dot3 Top 10 in Text-to-Image categories: #6 Portraits, #7 Photorealistic & Cinematic Imagery, #7 Art

minor tide
#

Text and Code Arena Leaderboard Update - DeepSeek V4 arena

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard and Code Arena leaderboard have been updated to include DeepSeek V4.

Highlights:
dot1 Code Arena - DeepSeek V4 Pro (thinking): #3 open model (#14 overall)
dot2 Text Arena - DeepSeek V4 Pro (thinking): #2 open model (#14 overall), DeepSeek V4 Flash (thinking): #10 open model (#47 overall)
dot3 Top 10 Text categories: #1 Medicine & Healthcare (v4 Pro), #8 Legal & Government (v4 Pro), #8 Math (v4 Pro Thinking), #9 Life, Physical, & Social Science (v4 Pro Thinking)

https://www.youtube.com/watch?v=AC2jj_jfunQ

Try it yourself: https://arena.ai/code

The Arena team put DeepSeek V4 Pro through a battery of one-shot generation tests — 3D voxel scenes, SVGs, UI mockups, and creative prompts — comparing it side by side with Claude Opus 4.7, GLM-5.1, Gemini 3.1, GPT-5.4 High, Muse Spark, and its own predecessor, DeepSeek 3.2.

The verdict: DeepSeek V4 i...

minor tide
#

New Model Update - GPT 5.5 arena

<@&1372208635530448926> - A new model has been added to Text, Vision, Search, Document, and Code Arena! Note that it is available in Battle mode only; you will not find it in Direct or Side by Side modes.

  • GPT 5.5

https://www.youtube.com/watch?v=nDjMlfNbNNY

NOTE: Follow Arena on X to know when GPT 5.5 is available on Arena.ai: https://x.com/arena

OpenAI just dropped GPT 5.5 (codenamed "Spud") — their first new pre-trained model in a while. But how does it actually perform on real-world tasks? Peter Gostev, Arena's AI Capability Lead, puts it through its paces with visual coding challenges, long-...

minor tide
#

Changes to Image Rate Limits arena

<@&1372208635530448926> - We’re updating the rate limits for Image Arena, effective Monday, April 27. This will help us maintain reliability and ensure the service remains sustainable as usage grows. The team is developing a new usage system that'll unify limits; learn more here: #announcements message.

dot1 Image Arena Battle Mode will now be limited to 15 generation requests per 24-hour period
dot2 The following models will have a limit of 5 generation requests per hour

  • gpt-image-1.5-high-fidelity
  • gpt-image-2
  • gemini-3.1-flash-image-preview
  • gemini-3-pro-image-preview
  • chatgpt-image-latest-high-fidelity
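
To make the two limits concrete, here is a rough Python sketch of how limits like these could be tracked. The announcement does not describe how Arena enforces them, so the sliding-window approach, the class name, and the variable names are all assumptions for illustration; only the numbers and model names come from the message above.

```python
from collections import deque

HOUR = 60 * 60
DAY = 24 * HOUR

# (max requests, window in seconds), taken from the announcement.
BATTLE_MODE_LIMIT = (15, DAY)
PER_MODEL_LIMIT = (5, HOUR)
LIMITED_MODELS = [
    "gpt-image-1.5-high-fidelity",
    "gpt-image-2",
    "gemini-3.1-flash-image-preview",
    "gemini-3-pro-image-preview",
    "chatgpt-image-latest-high-fidelity",
]


class SlidingWindowLimiter:
    """Hypothetical limiter: allows at most max_requests in any rolling window."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._timestamps = deque()  # accepted request times, oldest first

    def allow(self, now: float) -> bool:
        # Forget requests that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) < self.max_requests:
            self._timestamps.append(now)
            return True
        return False


# One limiter for Battle Mode, one per limited model (per user, in practice).
battle_limiter = SlidingWindowLimiter(*BATTLE_MODE_LIMIT)
model_limiters = {name: SlidingWindowLimiter(*PER_MODEL_LIMIT) for name in LIMITED_MODELS}
```

With this sketch, a sixth request to `gpt-image-2` inside the same hour would be refused, and a slot frees up once the oldest request is an hour old. A fixed window that resets on the hour would behave differently, and the announcement does not say which applies.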
minor tide
#

Code, Text, Document, Expert, Search, and Vision Leaderboard Update - GPT-5.5 arena

<@&1372208524230397962> <@&1372208635530448926> - The Code Arena, Text Arena, Expert Arena, Search Arena, Document Arena, and Vision Arena leaderboards have been updated to include GPT-5.5.

Highlights:
dot1 Code Arena: ranked #9, +50 point jump over GPT-5.4
dot2 Document Arena: ranked #6
dot3 Text Arena: ranked #7, #3 in Math and #8 in Instruction Following categories
dot4 Expert Arena: ranked #5
dot5 Search Arena: ranked #2
dot1 Vision Arena: ranked #5

Stay up to date with our Leaderboard Changelog!

minor tide
#

Agent Mode - Looking for Feedback arena

<@&1372208635530448926> - Some users currently have access to our Agent Mode experiment. Agent Mode is a multi-modal chat experience that lets you work across different modalities within a single, unified workflow. Since this is still an experiment, **access is currently limited**. Community feedback plays an important role in how we develop new features, so if you’ve had a chance to try it, we’d love to hear from you.

If you’ve used Agent Mode, please let us know here: https://discord.com/channels/1340554757349179412/1498702173650030756

If you are part of the experiment, you will find Agent Mode in the drop-down menu where you select Battle, Direct, and Side by Side.

We'll have a few follow-up questions and would really value your input! blobthanks

minor tide
#

Text Arena Leaderboard Update - Ernie-5.1 arena

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard has been updated to include Ernie-5.1.

Highlights:
dot1 #13 overall, now the highest-ranked model from a Chinese lab
dot2 Categories: #9 Math, #1 Legal & Government, #4 Business, Management & Financial Ops, and #7 Software & IT Services

sand grail
#

The biggest lead in Arena history arena

<@&1473460945308221542> GPT Image 2, DeepSeek V4, GPT-5.5, and more — all in 80 seconds

https://www.youtube.com/watch?v=x4_hcF8qsIs

https://arena.ai/leaderboard

GPT Image 2 just set a record on Arena — landing at number one in Image Arena with a 242-point lead and a 93% win rate, the biggest gap in Arena history. Meanwhile, DeepSeek V4 Pro debuted at number two among open models and number three in Code Arena, and GPT-5.5 arrived with a surprising showing on Code Arena's ...

minor tide
#

Text Arena Leaderboard Update - Hunyuan-Hy3-Preview

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard has been updated to include hunyuan-hy3-preview.

Highlights:
  • Top 7 lab for open models in Text Arena
  • Ranks #80 overall
  • Priced at $0.29 / $1.17 per 1M tokens

sand grail
#

Where autoraters break down | Arena.ai deep dive

<@&1473460945308221542> <@&1372208590248742964> Arena researchers walk through building an autorater from scratch, then get into where it falls apart in practice.

https://www.youtube.com/watch?v=r1gqLuMdKX4

https://arena.ai

Arena researchers Li Chen, PhD and I-Hung Hsu, PhD walk through how they'd build an autorater from scratch — different kinds of autoraters, training objectives, what dimensions actually matter to rate on — then get into what makes it hard in practice: preference drift, multi-turn evaluation, tie threshold variance, and the ...

minor tide
#

Image Arena Leaderboard Update - UNI-1.1-Max & UNI-1.1

<@&1372208524230397962> <@&1372208635530448926> - The Text-to-Image leaderboard & the Image Edit leaderboard have been updated to include UNI-1.1-Max & UNI-1.1.

Highlights:
  • Text-to-Image Arena: UNI-1.1-Max #6 overall (1193), UNI-1.1 #7 overall (1190)
  • Multi-Image Edit Arena: UNI-1.1-Max #7 overall (1315), UNI-1.1 #8 overall (1298)
  • Single-Image Edit Arena: UNI-1.1-Max #7 overall (1337), UNI-1.1 #11 overall (1310)

minor tide
#

Multimodal Max

@everyone - Max, Arena's model router powered by 5M+ community votes, is now multimodal.

Starting today, Max is the default in Direct chat across every modality: search, vision, image generation, image editing, and front-end coding, with the same latency-controlled performance as the original router for text.

Learn more about Multimodal Max in this blog post.

minor tide
#

Arena Staff AMA

<@&1385704933403398274> - We’re hosting a Discord Staff AMA with our Product Lead to answer the most common questions! We’ll be recording the session and sharing it in this server afterward.

Submit your question(s) here!

minor tide
#

Vision Arena Leaderboard Update - Gemma-4

<@&1372208524230397962> <@&1372208635530448926> - The Vision Arena leaderboard has been updated to include Gemma-4.

Highlights:
  • Gemma-4-31b ranks #2 open (#20 overall)
  • Gemma-4-26b-a4b ranks #4 open (#26 overall)

sand grail
#

The Pareto frontier just moved

<@&1473460945308221542> - Gemma 4, UNI-1.1, Grok 4.3, and more — all in 80 seconds

https://www.youtube.com/watch?v=otqFNGFNwmI

https://arena.ai/leaderboard

This week in the Arena: Google's Gemma 4 lands in Code Arena and pushes the open source Pareto frontier. Luma debuts Uni-1.1 in Image Arena and immediately becomes the #3 lab across Text-to-Image and Image Edit. xAI pushes Grok 4.3 across four arenas. Plus updates on GPT-5.5 Instant, Xiaomi's MiMo-V2-Omni, Arcee Tri...

minor tide
#

Text, Vision, and Document Arena Leaderboard Update - GPT-5.5 Instant

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard, Vision Arena leaderboard, and Document Arena leaderboard have been updated to include gpt-5.5-instant.

Highlights:
  • Vision Arena: #11 overall, on par with Claude-Sonnet-4.6
  • Text Arena: #18 overall, Multi-Turn #5
  • Document Arena: #24, on par with GPT-5.2

minor tide
#

Search Arena Leaderboard Update - Ernie-5.1

<@&1372208524230397962> <@&1372208635530448926> - The Search Arena leaderboard has been updated to include Ernie-5.1.

Highlights:
  • Debuts at #4 in Search Arena
  • Top 3 lab in Search
  • The only Chinese model in the top 10 overall

minor tide
#

Code Arena: Frontend - New Categories

<@&1372208524230397962> <@&1372208635530448926> - Introducing 7 new leaderboard views for frontend output in Code Arena. Aggregate leaderboards don’t tell the full story. "Best frontend coding model" depends on what you're building, so we built leaderboards that show exactly that.

Here are the new major frontend web development task categories:
  • Brand & Marketing
  • Reference-Based Design
  • Data & Analytics
  • Consumer Product
  • Gaming
  • Simulations
  • Content Creation Tools

Read more about the new categories and see prompt examples for each on our blog.

minor tide
#

<@&1372208635530448926> - We’re currently reworking Arena’s usage system into a single, unified one. Instead of separate hidden limits across different features and models, the new system is designed to give users a daily balance of credits that can be used however they want across Arena. The goal is to provide much more visibility, flexibility, and control over usage.

As part of the rollout for the new usage system, we’ve recently added daily limits for all Arena users. You may notice a new error message letting you know when you’ve hit your daily usage limit, along with info on when it resets. Your usage will automatically reset after 24 hours.

These limits are being put in place ahead of the full rollout to help support upcoming features like Agent Mode and improve overall system stability. The daily usage limit is only one part of the new usage system; additional features, including clearer visibility into daily usage spending and other usage management tools, are still in development.

If you run into any issues with the system, please report them in this thread: #1503450500614455337 message

sand grail
#

Open source battle: GLM vs Kimi vs MiMo vs DeepSeek

<@&1473460945308221542> <@&1372208635530448926> Peter tests the top four open source models out of China — GLM 5.1, Kimi K2.6, MiMo 2.5 Pro, and DeepSeek V4 Pro

https://www.youtube.com/watch?v=k7WAGtS9cJY

https://arena.ai/code

Peter tests the top four open source models out of China — GLM 5.1, Kimi K2.6, MiMo 2.5 Pro, and DeepSeek V4 Pro — across 70+ visual coding prompts covering 3D scenes, websites, and SVG animation, to see whether Arena's leaderboard rankings hold up in practice.

0:00 The lineup
1:50 3D scene generation
13:09 Website ...

minor tide
#

Agent Mode Feedback

<@&1372208635530448926> - Now that Agent Mode has rolled out to more users, we’re really hungry for feedback.

If you’ve tried out the new mode, we’d love to hop on a casual voice call with you sometime in the next couple of days. If you’re interested, just ping me in #general.

If you’d prefer to share feedback async instead, we also have a few follow-up questions here.

Let me know if you have any questions! Really appreciate everyone taking the time to try it out.

sand grail
#

How Arena tags millions of votes a week

<@&1473460945308221542> <@&1372208590248742964> Arena researchers dive deep into the data pipeline behind Arena's tagging system

https://www.youtube.com/watch?v=AfrfpbDBr78

https://arena.ai

Arena collects millions of votes per week across its text, image, webdev, and other arenas — each with dozens of categories that let users and model labs slice performance by domain.

Arena researchers Guanglei Song, PhD and I-Hung Hsu, PhD walk through the data pipeline behind that tagging system. The discussion covers the ...

minor tide
#

Text and Vision Arena Leaderboard Update - Qwen3.7

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena and Vision Arena leaderboards have been updated to include Qwen3.7.

Highlights:
  • Qwen3.7 Max Preview ranks #13 overall in Text
  • Qwen3.7 Max Preview ranks #16 overall in Vision
  • Standout categories include #7 Math, #9 Expert, #9 Software & IT, and #10 Coding

minor tide
#

Text and Code: Frontend Leaderboard Update - Gemini 3.5 Flash

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard and the Code Arena leaderboard have been updated to include Gemini 3.5 Flash.

Highlights:
  • Ranks #9 overall for both Text and Code: Frontend
  • Sub-category highlights: #7 Content Creation Tools, #8 Gaming, #8 Consumer Product, #9 Data & Analytics, and #10 Reference-Based Design

minor tide
#

Changes to Models Available in Direct and Side-by-Side for Image Arena

<@&1372208635530448926> - We are changing some of the models available in Direct and Side-by-Side modes for Image Arena. These changes are part of our efforts to ensure that we can continue offering access to AI models while keeping the platform running reliably.

The following models are being removed from Direct and Side-by-Side on Friday, May 22, 2026 at 17:00 UTC:

  • gemini 3 pro image preview 2k
  • gemini-3.1-flash-image-preview
  • gpt image 1.5 high fidelity
  • gpt image 2 medium

Updates like this help us maintain availability for everyone over the long term, and we intend to bring these models back in a way that’s more sustainable when possible.

sand grail
#

Gemini 3.5 Flash | First impressions

<@&1473460945308221542> Gemini 3.5 Flash is doing more than you ask

https://www.youtube.com/watch?v=BScuyWDzm8Y

Try it yourself: https://arena.ai/code

Google's new Gemini 3.5 Flash sits unusually close to the Pro line on pricing and tops Code Arena for Google's lineup. This walks through hundreds of side-by-side generations from the Arena to see how the model actually behaves — where it produces unusually rich SVGs and 3D scenes, where it adds elements...

minor tide
#

Image Arena Leaderboard Update - HiDream-O1-Image

<@&1372208524230397962> <@&1372208635530448926> - The Text-to-Image Arena leaderboard has been updated to include HiDream-O1-Image.

Highlights:
  • Ranks #27 overall
  • Ranks #4 open source for Text-to-Image Arena

minor tide
#

Code Arena Leaderboard Update - Qwen3.7 Max (20260517)

<@&1372208524230397962> <@&1372208635530448926> - The Code Arena leaderboard has been updated to include Qwen3.7 Max (20260517).

Highlights:
  • #4 in Code Arena: Frontend
  • Scoring 1541