thorn mauve Mar 4, 2025, 5:54 PM

#

@everyone Welcome to the new LMArena server! 👋 Since our graduation from LMSys, we're now moving all Arena projects and leaderboard updates here to better serve the community. Expect more server updates and community surprises to roll out over time. ✨

All Arena-related channels in the LMSys server will be deprecated starting on March 14, 2025.

Learn more about our graduation here: https://x.com/lmarena_ai/status/1842982750095278482

lmarena.ai (formerly lmsys.org) (@lmarena_ai) on X

As part of Chatbot Arena's graduation🎓, we're excited to announce that we changed our X handle to @lmarena_ai! For open-source systems & research at LMSys, please follow @lmsysorg.

This account, @lmarena_ai, will be dedicated to sharing Arena projects & leaderboard updates. See

jaunty rain Mar 8, 2025, 4:09 AM

#

🚨 New UI Design Access - Heads Up! 🚨

@everyone It looks like the cat’s out of the bag—our new UI design access has been shared! 🐱💨

Here are a few things to keep in mind:
🔸 Super Alpha Stage – Expect breakage, data loss, and plenty of bugs along the way.
🔸 Constant Changes – Things may shift at any moment as we continue refining the experience.

Thanks for bearing with us—your feedback is invaluable! 🛠️ 💙

If you're still interested, then check it out here:

https://alpha.lmarena.ai
pw: super-alpha

PLEASE Give us feedback here: https://forms.gle/8cngRN1Jw4AmCHDn7
and 🪲 report bugs here: https://airtable.com/appK9qvchEdD9OPC7/pagxcQmbyJgyNgzPx/form

Please note: This super alpha is currently optimized for desktop only! Mobile is not yet ready at this time. 🖥️

thorn mauve Mar 8, 2025, 4:14 PM

#

🙏 Bug Reports & Feedback on the new Desktop UI – Help Us Improve! 🛠️

@everyone
New UI here: https://alpha.lmarena.ai/
pw: super-alpha

Whenever possible, please submit bugs and feedback using the links below so we can prioritize and triage quickly.

💠 Feedback : https://forms.gle/8cngRN1Jw4AmCHDn7
🪲 Bugs: https://airtable.com/appK9qvchEdD9OPC7/pagxcQmbyJgyNgzPx/form
💬 Want to discuss the new UI? Join the conversation in #new-ui-feedback!

Your insights are greatly appreciated—thanks for helping us make things better! 🙌

thorn mauve Mar 11, 2025, 11:26 PM

#

@everyone Our First Community event! 🎉

Check out the Events tab 🗓️ in this server for more information on our first Stage event on Thursday. We’re excited to walk through and chat about the new Desktop UI with you all!

Don’t forget to submit and vote for questions in advance for the Q&A. Hope to see you there! 👋🏽

https://discord.gg/lmarena?event=1349137953406058597

thorn mauve Mar 12, 2025, 9:56 PM

#

@everyone 📢 If you haven't already, let us know if you're coming to our first event tomorrow! Details above! 👆

BTW, we heard you in #new-ui-feedback and will demo an update that just rolled out around the voting UI. Check out the preview here and go test it out in the alpha! ✅

➡️ Reminder to submit and vote for Q&A here: https://app.sli.do/event/7bThoD3UhfcLdJTLiteEgz pw: super-alpha

thorn mauve Mar 13, 2025, 7:20 PM

#

@everyone 📢 Today is the day for our first community event!

🎉 Head to The Arena Stage channel for a live walkthrough of the new Desktop UI and a little Q&A. The event will start at <t:1741894260:R> and be about 30 mins long.

We can’t wait to see you there! 😃

thorn mauve Mar 13, 2025, 8:01 PM

#

@everyone Thank you to everyone who joined our first Discord community event!

We want to make these even better, and your feedback is key. Tell us what you loved, what could be improved, and what you’d like to see next! Didn’t attend? You can still let us know what you would like to see:

💡 Why share your thoughts?
💠 More events tailored to what you want
💠 Better content, speakers, and formats based on your input
💠 A chance to shape the future of our community

👉 Take a minute to fill out our feedback form here: https://forms.gle/Hr9xgSTWnyLVR9tC8

Your input makes all the difference—let’s make the next one even bigger and better! 🚀

thorn mauve Mar 14, 2025, 5:48 PM

#

@everyone New Desktop UI – still in Alpha!

Your feedback so far has been incredibly helpful!! 🙏 🙌
Please keep testing the Alpha here: https://alpha.lmarena.ai
🔑 We've removed the password to make it easier for you!

REMINDER: as an Alpha UI it's got limited features, but frequent updates. Be sure to go to the current site for the freshest leaderboard data, features, and the full set of models.

Please share feedback and bugs through these channels, as it makes it easier for us to reproduce errors and prioritize requests. We're already hard at work implementing many of them! 🚧

💠 Feedback : https://forms.gle/8cngRN1Jw4AmCHDn7
🪲 Bugs: https://airtable.com/appK9qvchEdD9OPC7/pagxcQmbyJgyNgzPx/form

💬 Want to discuss the new UI? Join the conversation in #new-ui-feedback!

thorn mauve Mar 17, 2025, 5:35 PM

#

@everyone We love all the great feedback on the Alpha Desktop UI—keep it coming! Your insights are shaping the future of Arena, and we truly appreciate it.
As we continue working on some major improvements to bring everything up to speed with the current site, we’re re-implementing the password to keep access limited to you, our community. This is still an early-stage build, and not yet intended for public use.

🔑 New password: still-alpha

Thanks for being part of this journey with us! Keep testing, sharing feedback, and helping us make Arena even better.

thorn mauve Mar 24, 2025, 5:10 PM

#

Hi @everyone!

Thanks so much for continuing to test the alpha — we’ve been busy making updates based on your feedback. Here’s what’s new in the alpha:

🛠️ Fixed a bug where messages wouldn’t save (which also caused votes to fail)
✍️ O3-Mini now correctly formats text
📊 Leaderboard columns are now sortable
🔄 Leaderboard data is updated live

🔗 Please keep testing the Alpha: https://alpha.lmarena.ai/
🔑 Password: still-alpha

Your feedback is super valuable — it helps us reproduce bugs and prioritize your requests. We’re already hard at work on more improvements! 🚧

💠 Feedback: https://forms.gle/8cngRN1Jw4AmCHDn7
🪲 Bugs: https://airtable.com/appK9qvchEdD9OPC7/pagxcQmbyJgyNgzPx/form

thorn mauve Mar 31, 2025, 11:19 PM

#

📣 Alpha Arena Updates @everyone

Copy Code is now available! ✅
Image Generation is also live in the alpha 🖼️

Try it out here https://alpha.lmarena.ai/ (password: still-alpha)

Thank you for continuing to test! Keep the feedback and bugs coming 🙌🏽
💠 Feedback: https://forms.gle/8cngRN1Jw4AmCHDn7
🪲 Bugs: https://airtable.com/appK9qvchEdD9OPC7/pagxcQmbyJgyNgzPx/form

thorn mauve Apr 2, 2025, 11:03 PM

#

📱Alpha is now mobile-ready! @everyone

You can now test the new Arena Alpha UI right from your phone. Whether you're on the go or just prefer mobile, the experience is now optimized and ready for you.
We know a lot of you have been waiting for this — so now’s your chance to dive in and put it to the test!

🔗 https://alpha.lmarena.ai
🔑 Password: still-alpha

As always, your feedback helps us improve fast:
💠 Feedback: https://forms.gle/8cngRN1Jw4AmCHDn7
🪲 Bugs: https://airtable.com/appK9qvchEdD9OPC7/pagxcQmbyJgyNgzPx/form

Come chat in #new-ui-feedback and let us know what you think!

thorn mauve Apr 8, 2025, 5:53 PM

#

@everyone Thank YOU for all the rich feedback on Alpha UI in the past week!!

Please keep testing, and sending notes!
🔓 No password needed anymore at https://alpha.lmarena.ai/

Reminder this is still an early version, so some features are limited, but updates are coming fast for Desktop & Mobile.
For the latest models and leaderboard data, use the main site.

Got thoughts or found a bug? Let us know here:
💠 Feedback: https://forms.gle/8cngRN1Jw4AmCHDn7
🪲 Bugs: https://airtable.com/appK9qvchEdD9OPC7/pagxcQmbyJgyNgzPx/form
💬 Chat about it in: ⁠#new-ui-feedback

thorn mauve Apr 17, 2025, 5:11 PM

#

📣 Big News: We are starting a company + New Beta is Live! 💥

Hey @everyone — we’ve got two big updates to share today:

1️⃣ We are starting a company to support LMArena!
We began as a scrappy academic project out of UC Berkeley, and thanks to all of YOU, we’re taking the next step that will allow us to stay committed to improving the platform you’ve helped us build. LMArena will stay neutral, open, and accessible to everyone. Read more here: https://blog.lmarena.ai/blog/2025/new-beta/

2️⃣ Beta is LIVE!
We’ve been listening closely to your feedback from the Alpha, and today, we’re releasing a Beta version of the new LMArena site:
🔗 https://beta.lmarena.ai

📝 A note for those who were testing the Alpha: saved chats won’t carry over in this version. We know that’s a bummer, and we’re still fine-tuning this. Thanks for your patience as we continue to improve! 🤗

☑️ Since Beta will still be a bit buggy, we will be testing the signal quality. Votes are currently being stored, and we'll start to include them properly as the signal quality increases. Your feedback helps us make the evaluation stronger, so please keep voting!

🔗 Try the Beta: https://beta.lmarena.ai
💠 Feedback: https://forms.gle/8cngRN1Jw4AmCHDn7
🪲 Bugs: https://airtable.com/appK9qvchEdD9OPC7/pagxcQmbyJgyNgzPx/form
💬 Chat about it in: ⁠#new-ui-feedback

Thank you for being part of this. Let’s make it great together! 🤝

thorn mauve Apr 17, 2025, 8:17 PM

#

The feedback is coming in hot for Beta! 🔥 THANK YOU @everyone 🙌🏽
...and we are responding!!

We now have:
🌓 Dark/Light mode toggle in the top right
✂️ Copy/paste images directly into the prompt box
✨ A few polish items in the leaderboard

Keep sending the feedback! 🙏🏽

thorn mauve Apr 26, 2025, 12:08 AM

#

Hi @everyone, do you have time for a quick poll? 👋

Every vote helps us understand the community better - thanks as always for your help! 🙏🏽
#general message

thorn mauve Apr 27, 2025, 10:41 PM

#

@everyone Around this time 2 years ago, this community helped us launch our very first Arena leaderboard!

Today we’re celebrating what you all have built together on LMArena! 🥳 👏 A few fun stats:
☑️ 3M+ votes!
🤖 400+ models on the leaderboards!
📊 300+ pre-release evaluations!
📝 10+ open datasets for prompts and user preferences

Take a look at our OG leaderboard below - back when we were still developing Vicuna🦙

Read more about this community’s impact in our blog post: https://blog.lmarena.ai/blog/2025/two-year-celebration/
tweet: https://x.com/lmarena_ai/status/1916620122342695363

Screenshot_2025-04-27_at_10.43.32_AM.png

minor tide May 7, 2025, 5:28 PM

#

Hi @everyone! ablobwave

Building community is paramount to LMArena's mission. That's why our team is investing more time and energy into creating a space for those interested in making an impact in AI. I'm excited to share that I'll be stepping in as this Discord's community manager! You'll be hearing more from me as we implement improvements to help grow, engage, and protect this space - all in service of building our AI community.

Building community requires... well... community! So I'd love to hear your thoughts through this survey about possible changes you'd like to see happen in this Discord.

It's nice to meet you!
grizzblob

minor tide May 13, 2025, 1:31 PM

#

Hey everyone ablobwave Quick heads up that over the next few days, we'll be making some server changes focused on new member onboarding, channel structure, and mod reporting. Listening to community feedback is incredibly important to us, so please let us know how you’re feeling about these changes. Don’t hesitate to reach out with any questions!

Also, many of you have requested independent scrolling. Here's a sneak preview of what we're working on!

minor tide May 14, 2025, 5:13 PM

#

Server Updates

@everyone As some of you may have noticed, we've implemented a few changes to the server in an effort to make a more engaging and protected space. It's important to note that we're driven by community feedback, so if you have thoughts on these changes, fill out this form!

These changes include:

discord Server Structure

We've added a new Forum Category that's intended for gathering feedback, troubleshooting issues, and model requests. This is intended to replace the #new-ui-feedback and #arena-feedback channels to better organize and track issues, feedback, and requests! Note if you're not seeing these new channels you may have to enable them in: Channels & Roles -> Browse Channels.

New Roles have been created! In the Channels & Roles section you'll find a few questions that'll auto-assign new roles. These roles will allow for more targeted announcements ensuring you all are getting the information that's most important to you!

The Server Guide is now where you'll find our #rules & #information-desk channels. Server Guide is located at the top of the channel list.

Channels that weren't getting much use are being moved to the Archived category. We're looking to create a more engaging server, and phasing out channels with little use will reduce clutter.

Moderation

For immediate needs, pinging the <@&1349916362595635286> role is available.

For issues you'd like to flag privately, you can now **send a Direct Message to the ModMail bot**, which you'll find at the top of the Member List. This will provide more options for members to report bad behavior.

We've updated our #rules a bit, so be sure to give that a peek. These changes are intended to keep discussion more on-topic and create a more inclusive space.

🪴 The Future

Our plans are to host events on a more regular basis! Staff AMAs, contests, and casual game/activity nights are all things to look forward to. So keep an eye out!

minor tide May 17, 2025, 3:51 PM

#

New Models added to the Beta Site

<@&1372208635530448926> new models are now live on the beta site! Go check em out! ablobparty

mistral-medium-2505

claude-3-7-sonnet-20250219-thinking-32k

amazon.nova-pro-v1:0

command-a-03-2025

minor tide May 19, 2025, 3:39 PM

#

Mistral Medium 3 making waves 🌊

<@&1372208524230397962> Since the debut of Mistral Medium 3 we've seen some impressive moves on the leaderboards:

#11 overall in chat (a +90 leap from Mistral Large)

Top-tier in technical domains (#5 in Math, #7 in Hard Prompts & Coding)

#9 in WebDev Arena

minor tide May 21, 2025, 4:17 PM

#

@everyone Sharing some news about the future of LMArena! lmarenalogo

Next week, we are planning to flip the switch and change the current site to the beta site! Additionally, we’re excited to share the news that we have received $100M in seed funding! What this means is hiring more people, increasing site performance and incorporating community feedback faster into the experience. Community feedback will continue to play a crucial role in helping shape the platform.

You may also notice the fresh coat of paint on the server! Along with the new look we’ve added some custom emojis - lmarena battle directchat leaderboard sidebyside battle3d sidebyside3d directchat3d trophy3d

We’re planning to host a staff AMA soon! If you have questions for the staff please fill out this form. We’ll be sure to announce the date/time when confirmed. We also plan to record the event for those that are unable to attend it live.

Quick note to use the new #1372230675914031105 and #1343291835845578853 forum channels for any feedback or bugs you’d like to share with the team. Also, be sure to grab those server roles in the Channels & Roles section if you haven’t already!

minor tide May 22, 2025, 8:01 PM

#

New Model Update

<@&1372208635530448926> happy to share that **the next generation of Claude is in the Arena!** lmarenalogo

Claude Opus 4

Claude Sonnet 4

Screenshot_2025-05-22_at_12.46.13_PM.png

minor tide May 22, 2025, 9:32 PM

#

Staff AMA

@everyone Happy to share that we'll be hosting our first Staff AMA at <t:1749229200:F> ! **LMArena's Cofounder & CEO Anastasios Angelopoulos** will be first up in our series of Staff AMAs we plan to bring to this community on a more regular basis. For those unable to attend live, we plan to record the event.

Sign up! lmarenalogo

minor tide May 24, 2025, 2:49 PM

#

Style Control is now the Default View

<@&1372208524230397962> <@&1372208635530448926> Last summer, Style Control was introduced to disentangle model response quality from stylistic factors like length and markdown formatting, helping to better reflect core capabilities. Many community members agreed this provided a clearer assessment, so as of today, Style Control is now the default view. You can still disable Style Control through the filter options.

<@&1372208590248742964> be sure to check out our research into how style influences votes here!

minor tide May 25, 2025, 7:42 PM

#

Independent Scrolling is here

<@&1372208635530448926> The highly requested independent scrolling feature is now here! Each response area is now individually scrollable. Thank you for all your feedback and enjoy! lmarenalogo

minor tide May 29, 2025, 4:35 PM

#

New a16z podcast just dropped! 🎙️

<@&1372208635530448926> New a16z podcast here! Our cofounders sat down with the lead investor at a16z to talk about LMArena and the future of AI evaluation. The team discusses how LMArena evolved over time, why subjective data is crucial, and what it means to build a CI/CD pipeline for large models. Watch the episode here. lmarenalogo

We'll be hosting a watch party <t:1748619000:F> in #1340554757827461215 for anyone that wants to join!

YouTube

a16z

Beyond Leaderboards: LMArena’s Mission to Make AI Reliable

a16z general partner Anjney Midha sits down with LMArena cofounders Anastasios N. Angelopoulos, Wei-Lin Chiang, and Ion Stoica to talk about the future of AI evaluation.

As benchmarks struggle to keep up with the pace of real-world deployment, LMArena is reframing the problem: what if the best way to test AI models is to put them in front of mi...

minor tide May 29, 2025, 6:09 PM

#

New Model Update

<@&1372208635530448926> DeepSeek R1-0528 is now in the Arena! Go check it out. lmarenalogo

minor tide May 30, 2025, 3:47 PM

#

AI Generation Contest

@everyone We want to see what you all can make with LMArena! Happy to share that today we’re announcing our first AI Generation Contest. Each month we’ll be crowning a new <@&1378032433873555578> through community voting.

battle How will it work?

You may have noticed the new #june-contest channel which is where you all will “submit” your creations and discuss what others submit

To submit, simply post a screenshot of what you’ve created through LMArena to #june-contest

On June 20th submissions will be closed and we’ll circulate a survey allowing you all to vote on which ones you like best

The person with the highest score will be declared our winner!

What could I win?

1 month of Discord Nitro

<@&1378032433873555578> role! You’ll hold onto this exclusive role (for at least a month), which will be hoisted towards the top of the member list, showing the ~~world~~ server your accomplishment!

Rules

Submissions must be done through Battle Mode; submissions created through side-by-side or direct chat will not be accepted

Your submission must include both the left and right response

Your submission must be made after you’ve voted for which response you prefer, meaning the models should be revealed in your submission

Only one submission per person

Example provided below

This month’s theme: Image - Cozy Desk ☕

Let’s get cozy! Warm beverages, fluffy blankets, and overall snug vibes are what we’re looking for - but at a desk. Get creative and show us what you think would make a cozy environment. This month’s contest will be for image creations only.

Happy submitting and good luck! lmarenalogo

minor tide Jun 2, 2025, 3:52 PM

#

Leaderboards Updated

<@&1372208635530448926> Our leaderboards were recently updated. Go check them out! https://lmarena.ai/leaderboard

A reminder that this Friday we'll be hosting our Staff AMA. We are planning to record the session for those that can't make it live. https://discord.gg/XkfsbYWX?event=1375223423009165435 lmarenalogo

minor tide Jun 6, 2025, 6:25 PM

#

Early Access Feedback Program

@everyone We're planning to take the feedback process to a whole new level - and you're invited to apply!

The LMArena Test Garden is going to be our new private feedback program that'll invite selected users to get exclusive sneak peeks at features, design mocks, and ideas the team is considering implementing for early feedback to ensure we're on the right path. For those who are exceptional at providing feedback, and would like to see what we're cooking up, this program is for you!

Apply here

If selected, we'll follow up privately for next steps. lmarenalogo

minor tide Jun 13, 2025, 6:05 PM

#

Few Reminders

@everyone
lmarenalogo We have a contest running right now! Don't miss out on the chance to win! Post your creations to the #june-contest channel. More details here - #announcements message

lmarenalogo Excited to provide feedback and interested to see the things we're building behind the scenes? Apply to our Test Garden! Apply here!

lmarenalogo Thank you to all who attended last week's Staff AMA! We look forward to bringing more of these in the future. Please share your feedback here!

minor tide Jun 16, 2025, 3:47 PM

#

6/16/25 Error Message

The team is aware of a widespread issue where models aren't providing a response but instead are erroring out. We are working on a speedy fix. Our apologies for any inconvenience this causes. ~~I'll update this message when it's fixed.~~

It's fixed! Should be working again. ablobcheer Don't hesitate to @ me if you're still having issues. lmarenalogo

minor tide Jun 23, 2025, 6:10 PM

#

June's AI Generation Contest Submissions

@everyone Help us determine June's <@&1378032433873555578> for Cozy Desk by voting here!

Reminder for what we're looking for:

Let’s get cozy! Warm beverages, fluffy blankets, and overall snug vibes are what we’re looking for - but at a desk. Get creative and show us what you think would make a cozy environment.

minor tide Jul 3, 2025, 3:59 PM

#

New Leaderboard - Image Edit

@everyone <@&1372208635530448926> <@&1372208524230397962> A new leaderboard is now live! Driven by community votes, we’re happy to share that the Image Edit Leaderboard is now available, with 7 models currently filling the ranks! With Image Edit, you can upload an image and directly compare each model’s editing capabilities.

Go check out the Image Edit Leaderboard here! lmarenalogo

minor tide Jul 4, 2025, 8:38 AM

#

New Model Update

<@&1372208635530448926> New models added to LMArena!

mistral-small-2506

imagen-4.0-ultra-generate-preview-06-06

ideogram-v3-quality

minor tide Jul 7, 2025, 4:29 PM

#

New Model Update

<@&1372208635530448926> New model added to LMArena!

grok-3-mini-high

minor tide Jul 7, 2025, 7:04 PM

#

July Contest Update

<@&1385704581677191278> To celebrate the launch of the Image Edit Leaderboard let's incorporate some Image Edit capabilities into our July contest!

How does it work and what are the rules?

Submit your entry by sharing a screenshot of what you've created in the #july-contest channel.

On July 25th submissions will be closed and we'll circulate a way for the community to vote for our winner.

Submissions must be done through Battle Mode & use the Image Edit functionality. You must use both an image and text to inspire something new.

Your submission must include both the left and right response.

Your submission must be made after you've voted for which response you prefer, meaning the models should be revealed in your submission.

Example here.

What could I win?

1 Month of Discord Nitro

Become the newest member to receive the <@&1378032433873555578> role!

July's Contest Theme - Out of Place Objects in Space! 🪐 🚀 ☄️ Sci-fi space environment is a must, but include something that clearly doesn't belong. Confuse us!

June's Contest Winner

Help me send a big congrats to @tawny basalt for being our very first <@&1378032433873555578> !! A very cozy desk was achieved, check it out here!

minor tide Jul 9, 2025, 2:32 PM

#

New Model Update

<@&1372208635530448926> New model added to LMArena!

seedream-3 (text-to-image)

minor tide Jul 10, 2025, 6:02 AM

#

New Model Update

@everyone <@&1372208635530448926> New model added to LMArena & WebDev Arena

grok-4

minor tide Jul 14, 2025, 6:49 PM

#

New Model Update

<@&1372208635530448926> New model added to LMArena!

kimi-k2

minor tide Jul 16, 2025, 5:05 PM

#

Light UI Improvements Now Live

@everyone <@&1372208635530448926> We're excited to share with you all that we've made some light UI improvements that are now live! Our intention with these improvements is to make the overall experience more polished, intuitive, and delightful. Many of these improvements were inspired by community feedback.

What’s new?

A more streamlined interface. The core chat UI is now a more focused and minimal experience that reduces visual clutter.

A more compact sidebar. This provides quick access to About Us, How It Works, Feedback, Leaderboards, and other sections.

Leaderboard tab now has improved navigation. Accessing the leaderboards you care most about can now be done faster.

As always, community feedback is crucial. We'd love to hear what you think about these changes here.

We have an exciting July ahead with numerous changes and improvements in the pipeline that we're eager to share with you!

minor tide Jul 23, 2025, 4:16 PM

#

Search Arena is Now Live

@everyone <@&1372208635530448926> A new modality has been added to LMArena. Check out Search Arena here!

7 models with search capabilities are ready and waiting for your testing. Note: be sure to select the Search modality in the chat box first.

Grok 4

Claude Opus 4

Sonar Pro High & Reasoning Pro High

o3

GPT 4o-Search Preview

Gemini 2.5 Pro Grounding

Learn more about what Search Arena has taught us about human-AI interactions on our blog post.

minor tide Jul 25, 2025, 3:05 PM

#

Psst… we’ve got a surprise for you.

An experimental Video Arena bot is now live, and you can try it right here only in this server!

Generate videos and images with top AI video models with the LMArena bot:

🗳️ Vote on each other’s creations
🧠 Learn how to use it in #1397655624103493813
💬 Share feedback in #bot-feedback

minor tide Jul 28, 2025, 8:00 PM

#

New Model Update

<@&1372208635530448926> New models added to LMArena!

GLM-4.5

GLM-4.5 Air

minor tide Jul 30, 2025, 4:20 PM

#

Video Arena is live… here on Discord!

@everyone <@&1372208635530448926> <@&1398740297521037332> We’re launching an experimental Video Arena here on Discord. Generate videos with the top AI models for free, and compare their results right here in our community server.

Learn how to use this bot in #1397655624103493813 and start generating in #video-arena-1 #video-arena-2 #video-arena-3.

What does the bot do?

With the LMArena bot you can generate videos, images, and image-to-videos. Similar to battle mode, you’ll be given two generations and anyone will be able to vote on which they prefer. After a certain number of votes, the bot will reveal the models.

Why is this being considered an experiment?

There are a lot of firsts with this one. First time access is exclusive to our Discord server. First time others are able to vote on others’ generations. First time you all will be inspired and chat about what you and others generate. That being the case, we're excited to hear what the community thinks!

To celebrate this new milestone, join us for a Staff AMA with the Bot’s Developer - Thijs Simonian on <t:1754672400:F>. Be sure to submit any questions you have here.

minor tide Jul 31, 2025, 4:18 PM

#

Open Data Release

@everyone <@&1372208590248742964> We are sharing a new dataset with over 140k conversations from the text arena collected between April 17th and July 25th 2025. Join us as we explore real-world trends, new features, and fresh prompts.

What’s covered in the latest analysis:

Language & topic breakdowns

Rating changes: How Arena scores shift over time

Overview of the released dataset

And more!

The data analysis highlights not just who is winning, but why, and what signals might matter most in human-based AI evaluations.

Read the full breakdown here on our blog.

minor tide Aug 1, 2025, 5:10 PM

#

New Model Capability Update - Veo3 Image-To-Video is Here

<@&1372208635530448926> <@&1398740297521037332> New model capability added to Video Arena

Veo 3 Fast & Veo 3 now have **Image-to-Video with audio** capabilities!

Give it a try using /image-to-video in our video-arena channels: #video-arena-1 #video-arena-2 #video-arena-3 and vote on what you think is best!

minor tide Aug 5, 2025, 6:25 PM

#

New Model Update

<@&1372208635530448926> New models added to Text & WebDev Arena on LMArena!

OpenAI gpt-oss-120b

OpenAI gpt-oss-20b

Claude Opus 4.1 (battle mode only)

minor tide Aug 6, 2025, 4:30 PM

#

Video Leaderboards Now Live

<@&1372208635530448926> <@&1372208524230397962> <@&1398740297521037332> Thanks to the contributions of this community we now have Video Leaderboards available!

Text-to-Video Arena Leaderboard Here
Image-to-Video Arena Here

minor tide Aug 6, 2025, 8:35 PM

#

New Video Models Update

<@&1372208635530448926> <@&1398740297521037332> New models have been added to Video Arena

Hailuo-02-pro

Hailuo-02-fast

Sora

Runway-Gen4-turbo

Give them a try in our video-arena channels: #video-arena-1 #video-arena-2 #video-arena-3

minor tide Aug 7, 2025, 5:14 PM

#

GPT-5 is here!

@everyone Now that OpenAI's GPT-5 is here, we're thrilled to share that this model has been setting a new bar across our leaderboards. Tested under the codename “summit”, GPT-5 now holds the highest Arena score to date.

Thanks to the community's voting, GPT-5 is now #1 in Text, Vision, and WebDev Arena.

GPT-5 is now available on LMArena!

minor tide Aug 7, 2025, 8:37 PM

#

Staff AMA Reminder

<@&1398740297521037332> Reminder that tomorrow we're having our Staff AMA focussing on Video Arena. If you have questions please fill out this form.

https://discord.com/events/1340554757349179412/1400149736027328623

minor tide Aug 7, 2025, 8:54 PM

#

New Model Update

<@&1372208635530448926> New models added to LMArena!

gpt-5-mini-2025-08-07

gpt-5-nano-2025-08-07

minor tide Aug 8, 2025, 4:52 PM

#

Staff AMA Starting Soon

<@&1398740297521037332> Reminder we'll be starting this event in 10 minutes!

https://discord.com/events/1340554757349179412/1400149736027328623

minor tide Aug 11, 2025, 8:45 PM

#

New milestone - 15,000 members!

Thank you all for being a part of this amazing community. Whether you've been here since day one or recently joined us, the LMArena team is incredibly grateful that you're all here!

minor tide Aug 12, 2025, 3:26 PM

#

New Model Update

<@&1372208635530448926> New models added to Search Arena on LMArena!

gpt-5-search

claude-opus-4.1-search

minor tide Aug 13, 2025, 7:04 PM

#

July Contest Update

<@&1385704581677191278> Thank you for your patience with this one! Vote on July's contest submissions here! On Friday 8/15 we'll announce the winner and start our next contest.

Vote Here.

minor tide Aug 15, 2025, 5:02 PM

#

Leaderboard Update

<@&1372208635530448926> <@&1372208524230397962> Thank you all for patiently waiting for this update. We're happy to share that the leaderboards were just updated, including the GPT-5 variants.

gpt-5-high

gpt-5-chat

gpt-5-mini-high

gpt-5-nano-high

Check out the leaderboards here.

minor tide Aug 18, 2025, 2:01 PM

#

August Contest Update

<@&1385704581677191278> To celebrate the launch of our experimental Video Arena, let's bring some video generations into our August contest!

How does it work and what are the rules?

Use /video to generate two videos related to our contest theme (Slice 🔪 ) in our video arena channels (#video-arena-1).

To submit your generation to the contest -> **Forward the generated message to the #august-contest channel.** This is where we are collecting submissions.

To forward a message, hover your mouse over the message, select Forward, send to the #august-contest channel.

Submissions will close on Sept 12th!

What could I win?

1 Month of Discord Nitro

Become the newest member to receive the <@&1378032433873555578> role!

August's Contest Theme - Slice! 🔪 Show us those oddly satisfying, safe‑for‑work, crisp cross‑section cuts into everyday objects. Think "is it cake" videos. Examples here & here.

July's Contest Winner

Big congrats to @delicate iris for being our July <@&1378032433873555578> ! Check out their generation here.

minor tide Aug 19, 2025, 1:35 PM

#

BiomedArena is here!

@everyone <@&1372208635530448926> We're excited to announce that we're partnering with DataTecnica LLC & the National Institutes of Health to create BiomedArena! This new arena is focussed on real-world biomedical workflows, from literature review to disease modeling, using open, reproducible methods trusted by scientists.

It's already in use at NIH’s Intramural Research Program. We're proud to support and help expand this work around:

Open, reproducible evaluations

Expert-in-the-loop feedback

Scientific transparency at scale

Check out BiomedArena for yourself here and read more about this partnership on our blog here!

minor tide Aug 19, 2025, 4:47 PM

#

Video Arena Highlights Channel

<@&1398740297521037332> We’re introducing #highlight! You’ll now notice a ⭐ has been added to the area where you vote in Video Arena. This ⭐ allows users to **highlight spectacular generations they want shown off!** Once 4 people choose to highlight a generation, it’ll automatically be posted to #highlight for all to appreciate.

Reminder we have our August Contest currently running, be sure to forward your 🔪Slice 🔪 generations in #august-contest - more info here.

minor tide Aug 19, 2025, 6:32 PM

#

Legacy Site Update

<@&1372208635530448926> - Our legacy website began this wonderful journey of exploring the world's leading AI models and shaping community leaderboards.

Looking toward the future, we've decided to invest our full efforts into the current version of the site. We have bittersweet news to share: our legacy site is no longer available. All the great features from the legacy site are either under consideration or currently in development. We're listening—please tell us in our feedback forum which features matter most to you.

A heartfelt thanks from the LMArena team to everyone who's been on this journey with us from the early stages! The legacy site will always have a place in our hearts. ❤️

minor tide Aug 19, 2025, 7:03 PM

#

New Model Update

<@&1372208635530448926> - New model added to Image Edit on LMArena!

qwen-image-edit

minor tide Aug 20, 2025, 4:43 PM

#

Image Edit Leaderboard Update

<@&1372208524230397962> - The Image Edit Leaderboard has been updated and Qwen-Image-Edit is now the #1 open model for Image Edit!

Check out our leaderboards here.

minor tide Aug 21, 2025, 10:22 PM

#

New Model Update

<@&1372208635530448926> New models added to LMArena!

deepseek-v3.1

deepseek-v3.1-thinking

minor tide Aug 22, 2025, 5:50 AM

#

Video Arena Bot is Working

<@&1398740297521037332> Thank you for your patience while we resolved the issue. The @royal stirrup Bot is working again.

Reminder how to use the bot -> type /video or /image-to-video only in these channels #video-arena-1 #video-arena-2 #video-arena-3

#

https://cdn.discordapp.com/attachments/1397655624103493813/1402042969128697959/VideoArena_DiscordBot_Ho-to-video.gif

minor tide Aug 25, 2025, 7:14 PM

#

Battle, Side by side, Direct - Why?

<@&1372208473269473320> We'd love to understand better why you use your preferred version.

Please fill out this survey if you'd like to share your thoughts.

minor tide Aug 26, 2025, 2:11 PM

#

Gemini-2.5-Flash-Image-Preview Release

@everyone <@&1372208635530448926> A lot of you have found a recent anonymous model a-peeling & today we're thrilled to share with you that nano-banana = Gemini-2.5-Flash-Image-Preview 🍌

The previous two weeks of testing this model have led to the largest Elo score jump in LMArena history. The text-to-image leaderboard & image-edit leaderboard have been updated and a new leader of the pack has emerged.

Gemini-2.5-Flash-Image-Preview is now available in Battle, Side by Side, & Direct modes. Try it out now & join us in #nano-banana to tell us what you think!

minor tide Aug 28, 2025, 5:11 PM

#

New Model Update - MAI-1-preview

<@&1372208635530448926> - A new model provider has landed on our text leaderboard. Microsoft AI's MAI-1-preview is now sitting at #13!

Come check out MAI-1-preview available now on LMArena.

minor tide Aug 30, 2025, 9:04 PM

#

User Sign In - Google Sign-in

You all have been waiting patiently for this feature and we’re thrilled to share that **we're starting to roll out User Login!** On our canary site you can currently log in with your Google Account.

A few notes:

You will be able to access your Chat History on different devices when logged in.

This is currently only available on our canary site. Desktop & mobile.

When you create or log in to an account, you can merge your existing chats with the account using the "Merge existing chats with your account" toggle.

To log out, open the sidebar, locate your email at the bottom-left, and click the three dots.

We want to ensure this feature is working properly which is why we’re slowly rolling this out. Make us aware of any bugs in here: #1343291835845578853 and be sure to share any feedback in here: #1372230675914031105 . Plans for more sign-in options are being worked on.

To access User Login please use this link: https://canary.lmarena.ai/

minor tide Sep 2, 2025, 5:59 PM

#

User Login - Google Sign-in

@everyone <@&1372208635530448926> You all have been waiting patiently for this feature and we’re thrilled to share that **we're starting to roll out User Login!** You can currently log in with your Google Account.

A few notes:

You will be able to access your Chat History on different devices when logged in.

When you create or log in to an account, you can merge your existing chats with the account using the "Merge existing chats with your account" toggle.

To log out, open the sidebar, locate your email at the bottom-left, and click the three dots.

There is a small hold-out group, meaning some users won't have access to this feature yet. We will roll this out to everyone as soon as we're certain everything is working properly.

Be sure to share any bugs in here: #1343291835845578853 and be sure to share any feedback in here: #1372230675914031105 . Plans for more sign-in options are being worked on.

Access User Login here!

minor tide Sep 3, 2025, 5:04 PM

#

Only 9 days left on our August Video Generation Contest

<@&1398740297521037332> Reminder that you only have 9 days left to submit your entry for our Video Gen Contest!

How does it work and what are the rules?

Use /video to generate two videos related to our contest theme (Slice 🔪) in our video arena channels (#video-arena-1).

To submit your generation to the contest -> **Forward the generated message to the #august-contest channel.** This is where we are collecting submissions.

To forward a message, hover your mouse over the message, select Forward, and send it to the #august-contest channel.

August's Contest Theme - Slice! 🔪 Show us those oddly satisfying, safe‑for‑work, crisp cross‑section cuts into everyday objects. Think "is it cake" videos. Examples here & here.

minor tide Sep 5, 2025, 7:32 PM

#

Video Arena Discord Bot is Working

<@&1398740297521037332> The bot is working again! Thank you all for your patience while we worked on a fix.

Reminder how to use the bot:

You must be in #video-arena-1 , #video-arena-2 , or #video-arena-3

Type /video

Type in your prompt and hit Enter

#

https://cdn.discordapp.com/attachments/1397655624103493813/1402042969128697959/VideoArena_DiscordBot_Ho-to-video.gif

minor tide Sep 5, 2025, 7:54 PM

#

User Login & Rate Limits

<@&1372208635530448926> – Due to unprecedented traffic, we’re introducing rate limits for image generation. Logged-in users will continue to enjoy higher limits, and we’ll keep making the login experience better so contributing to community evaluations becomes even more rewarding.

You can learn more about User Login here.

minor tide Sep 8, 2025, 6:28 PM

#

New Model Update

<@&1372208635530448926> New Models added to LMArena!

Qwen3-max-preview

Kimi-K2-0905-preview

minor tide Sep 8, 2025, 8:34 PM

#

Multi-Turn for Image Edit

@everyone Multi-turn editing is now available on all image edit models! Refine your image step by step instead of trying to fit every edit into one mega-prompt.

Multi-Turn for Image Edit is available in Battle, Side by Side, or Direct. Try it out for yourself.

minor tide Sep 8, 2025, 8:55 PM

#

Video Arena Rate Limit

<@&1398740297521037332> Due to increased usage of the experimental Video Arena, we're setting the individual usage limit to 5 generations per day. After you've hit this limit, you will have to wait 24 hours before you can use the bot again.

Reminder how to use Video Arena can be found here.
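
One way to picture the limit described above is a per-user sliding 24-hour window. The sketch below is only an illustration: the announcement does not say whether the bot uses a sliding window or a fixed daily reset, and the class name, method name, and user IDs here are hypothetical.

```python
from collections import defaultdict, deque

DAILY_LIMIT = 5                  # generations allowed per user (from the announcement)
WINDOW_SECONDS = 24 * 60 * 60    # assumed: a sliding 24-hour window


class DailyLimiter:
    """Hypothetical per-user limiter; not the actual bot implementation."""

    def __init__(self, limit=DAILY_LIMIT, window=WINDOW_SECONDS):
        self.limit = limit
        self.window = window
        # user_id -> timestamps (in seconds) of that user's recent generations
        self._history = defaultdict(deque)

    def try_generate(self, user_id, now):
        """Return True and record the generation if the user is under the limit."""
        history = self._history[user_id]
        # Forget generations that are at least one full window old.
        while history and now - history[0] >= self.window:
            history.popleft()
        if len(history) >= self.limit:
            return False
        history.append(now)
        return True


limiter = DailyLimiter()
results = [limiter.try_generate("user-a", now=t) for t in range(6)]
print(results)
```

In this sketch the sixth request inside the window is refused, and a slot frees up 24 hours after the oldest recorded generation.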

minor tide Sep 10, 2025, 11:57 PM

#

New Model Update

<@&1372208635530448926> New Model added to LMArena!

Seedream-4

minor tide Sep 11, 2025, 6:38 PM

#

New Model Update

<@&1372208635530448926> New Models added to LMArena!

Qwen3-next-80b-a3b-instruct

Qwen3-next-80b-a3b-thinking

minor tide Sep 11, 2025, 9:36 PM

#

New Model Update

<@&1372208635530448926> New Model added to LMArena!

Hunyuan-image-2.1

minor tide Sep 13, 2025, 1:15 AM

#

New Model Update

<@&1372208635530448926> New Model added to LMArena!

Seedream-4-high-res

minor tide Sep 15, 2025, 8:04 PM

#

Battle, Side by side, Direct - Why?

<@&1372208419418673302> <@&1372208243681660978> <@&1372207111445938226> We'd love to understand better why you use your preferred version.

Please fill out this survey if you'd like to share your thoughts.

minor tide Sep 16, 2025, 3:36 PM

#

August Contest Update

<@&1385704581677191278> Thank you to everyone that participated in our first Video Arena GenAI contest! Vote on which you like the best to crown our new <@&1378032433873555578>

This contest theme is 🔪Slice🔪 Show us those oddly satisfying, crisp cross‑section cuts into everyday objects.

Vote Here

minor tide Sep 16, 2025, 5:58 PM

#

Text-to-Image & Image Edit Leaderboards Updated

<@&1372208524230397962> Our Text-to-Image & Image Edit leaderboards have been updated with some interesting movement.

Seedream-4-high-res is now tied with Gemini-2.5-flash-image-preview (nano-banana) for the #1 slot on the Text-to-Image leaderboard. Check it out yourself on the Text-to-Image leaderboard Here.

For Image Edit we're now seeing Seedream-4-high-res holding the #2 position. Image Edit leaderboard can be found Here.

Tell us what you think in our #leaderboards channel.

minor tide Sep 16, 2025, 8:10 PM

#

AI Eval Product Update

<@&1372208635530448926> It's our mission to improve the reliability of AI. We're introducing an evaluation product to analyze human-AI interactions at scale, turning their complexity into insights the AI ecosystem will benefit from.

Our AI Evaluation service offers enterprises, model labs, and developers comprehensive evaluations grounded in real-world human feedback. LMArena AI Evaluations consist of:

Comprehensive, in-depth evaluations based on feedback from our community.

Auditability through representative samples of feedback data.

Service-level agreements (SLA) with committed delivery timelines for evaluation results.

Analytics based on community feedback reveal strengths, weaknesses, and tradeoffs—helping providers build even better models and AI applications for everyone.

Are you an enterprise, model lab, or developer that wants to learn more about our AI Evaluation services? Read more on our blog.

minor tide Sep 18, 2025, 4:25 PM

#

Top 10 Open Model Update

@everyone <@&1372208524230397962> New open models have entered the Text Arena, and the rankings by provider have shifted for September. Only the top 7 open models also rank within the top 50 overall (proprietary & open).

Some noteworthy highlights

Qwen-3-235b-a22b-instruct is currently in the top slot
Longcat-flash-chat debuts on the charts in impressive manner, landing at #5
Top models are now clustered even closer in score

Holding Firm

Qwen-3-235b-a22b-instruct stays at #1 (overall rank #8)
Kimi-K2-0711-preview firm at #2 (overall rank tied for #8)
DeepSeek-R1-0528 holding steady at #3 (overall rank #9)
GLM-4.5 holds at #4 (overall rank #13)
Mistral-Small-2506 at #9 (overall rank tied at #53)

New Entrants

Longcat-flash-chat debuts at #5 (overall rank #20)

Movers

MiniMax-M1 went from #5 → #6 (overall rank #43)
Gemma-3-27b-it shifts from #6 → #7 (overall rank #46)
gpt-oss-120b drops to #8 (overall rank #51)
Llama-3.1-Nemotron-Ultra-253b-v1 drops from #8 → #10 (overall rank #53)

Dropouts

Command-A-03-2025 (#10 → out)

Check out the details for yourself on our leaderboards. Let us know what you think in the #leaderboards channel.

minor tide Sep 19, 2025, 11:41 PM

#

New Model Update

@everyone <@&1372208635530448926> New models added to LMArena!

Grok-4-fast

Grok-4-fast-search

Grok-4-fast Release

Grok-4-fast-search by xAI was tested under the codename menlo and has rocketed to #1 on the Search Leaderboard! Text Arena also tested Grok-4-fast, codename tahoe, where it has debuted impressively at #8 on the Text Leaderboard.

Check out the rankings for yourself and let us know what you think in #leaderboards.

minor tide Sep 22, 2025, 4:02 PM

#

Model Update

<@&1372208635530448926> Thank you all for your patience with us as we made adjustments to seedream-4. An update has been made, and seedream-4-2k is now available in Battle, Direct, & Side by Side. The model known as seedream-4-high-res is not available at this time. Keep an eye on this channel, as we'll continue to post new model updates here.

Give seedream-4-2k a try here!

minor tide Sep 23, 2025, 2:56 PM

#

New Model Update

<@&1372208635530448926> New Models added to LMArena!

deepseek-v3.1-terminus

deepseek-v3.1-terminus-thinking

minor tide Sep 24, 2025, 4:05 PM

#

New Model Update

<@&1372208635530448926> New Models added to LMArena!

qwen3-max-2025-09-23

qwen3-vl-235b-a22b-thinking

qwen3-vl-235b-a22b-instruct

minor tide Sep 24, 2025, 9:25 PM

#

New Model Update

<@&1372208635530448926> New Models added to LMArena's WebDev!

Gpt-5-codex

Qwen3-coder

Give them a try and vote here!

minor tide Sep 25, 2025, 7:05 PM

#

Leaderboard Update

<@&1372208635530448926> <@&1372208524230397962> Seedream-4-2k has landed on the leaderboards! On the Text-to-Image leaderboard at #1, Seedream-4-2k is now tied with Gemini-2.5-flash-image-preview (nano-banana)! On the Image Edit leaderboard Seedream-4-2k is now ranked at #2.

Let us know what you think in #leaderboards!

minor tide Sep 25, 2025, 7:25 PM

#

New Model Update

<@&1372208635530448926> New Models added to LMArena!

gemini-2.5-flash-preview-09-2025

gemini-2.5-flash-lite-preview-09-2025

minor tide Sep 29, 2025, 6:00 PM

#

New Model Update

<@&1372208635530448926> - New model added to WebDev on LMArena!

claude-sonnet-4-5-20250929

Give it a try here!

minor tide Sep 29, 2025, 6:25 PM

#

New Model Update

<@&1372208635530448926> - New models added to LMArena!

claude-sonnet-4-5

claude-sonnet-4-5-20250929-thinking-16k

minor tide Sep 29, 2025, 10:35 PM

#

New Model Update

<@&1372208635530448926> - New models added to LMArena!

deepseek-v3.2-exp

deepseek-v3.2-exp-thinking

minor tide Sep 30, 2025, 5:04 PM

#

100,000 Community Members! A HUGE Thank You!

We are deeply grateful that you all are interested in being a part of this community. From the LMArena Team - thank you all!

minor tide Sep 30, 2025, 6:04 PM

#

New Model Update

<@&1372208635530448926> New model added to LMArena!

glm-4.6

minor tide Oct 1, 2025, 3:30 PM

#

October AI Generation Contest - Abstract

<@&1385704581677191278> We want to see what you all can make with LMArena! Could you be crowned our next <@&1378032433873555578> ? October's AI Gen Contest is now open.

How does it work and what are the rules?

You must submit your entry by sharing a screenshot in #october-contest

On October 24th submissions will be closed and we'll share a way to vote.

Submissions must be done through Battle Mode

Your submission must include both the left and right response.

Your submission must be captured after you've voted for the response you prefer, meaning the models should be revealed in your submission.

Example here.

What could I win?

1 month of Discord Nitro

<@&1378032433873555578> role! You’ll hold onto this exclusive role which will be hoisted towards the top of the member’s list.

This month’s theme: Image - Abstract Art 🎨

Use wild shapes, vibrant colors, and chaotic lines to express feelings or ideas. Create a unique visual experience. This month’s contest will be for image creations only.

Video Gen Contest Winner

Congrats to @vernal lance for being our first Video Gen <@&1378032433873555578>

minor tide Oct 1, 2025, 7:26 PM

#

Arena Champions

@everyone We're thrilled to share with you all the Arena Champions Role! Our goal with this role, <@&1422628364782407830>, is to create a better community for those interested in in-depth AI discussions. This program aims to reward members who show genuine commitment to meaningful conversation by providing a private space where they can engage without interruptions.

Access to this space will be granted through an application process. Members must demonstrate both interest in AI and commitment to meaningful conversation. If you're looking for a dedicated space to have these discussions...

Apply Here

As a thank you to longtime community members, we've granted automatic access to those who've been part of this server since July 2025. Note that members with this role will need to Follow the Category to view these new channels. You can find this in the Channels & Roles tab at the top of the channel list. Select Browse Channels, then enable the Arena Champions Category. You'll then be able to see the list of channels!

minor tide Oct 1, 2025, 11:38 PM

#

Reasoning Trace Now Live

<@&1372208635530448926> Reasoning Trace is now available on Side by Side & Direct chat with reasoning models. Think of this as a way for reasoning models to show their work before they provide a response. Check it out in Side by Side & Direct now!

#

New Model Update

<@&1372208635530448926> New model added to LMArena!

reve-v1
Note this model is **image-edit only**, meaning it will only work if you upload an image for it to edit. The model will error out when using text-to-image.

New model update as well. This has replaced the 16k version.

claude-sonnet-4-5-20250929-thinking-32k

minor tide Oct 2, 2025, 7:59 PM

#

Leaderboard Update

<@&1372208635530448926> <@&1372208524230397962> Our Text Leaderboard has been updated!

Claude Sonnet 4.5 has made it onto the Text Leaderboard, impressively tied with Claude Opus 4.1 for the #1 slot. It’s also shining across many other categories, including: Hard Prompts, Coding, Creative Writing, Instruction Following, and others.

Share your thoughts with us in #leaderboards

minor tide Oct 2, 2025, 11:30 PM

#

New Model Update

<@&1372208635530448926> New model added to LMArena!

ibm-granite-h-small (ibm)

minor tide Oct 3, 2025, 5:17 PM

#

New Model Update

<@&1398740297521037332> New model added to LMArena's Video Arena!

ray-3

Reminder on how to use Video Arena can be found here: #1397655624103493813

minor tide Oct 7, 2025, 2:01 PM

#

We Want to Learn from YOU

@everyone As we continue to build LMArena, it’s important we remain focussed on delivering the tools you all need to excel at being knowledge experts. To do this, we need to better understand what is important to you all.

If you’re interested in sharing your expertise with the team please…

Fill Out This Survey

minor tide Oct 7, 2025, 5:32 PM

#

New Model Update

<@&1398740297521037332> <@&1372208635530448926> New models added to LMArena's Video Arena!

sora-2

sora-2-pro
Note: these will only be available in text-to-video.

Reminder on how to use Video Arena can be found in #1397655624103493813

minor tide Oct 8, 2025, 3:37 PM

#

New Model Update

<@&1372208635530448926> New models added to LMArena!

hunyuan-vision-1.5-thinking

ring-flash-2.0

ling-flash-2.0

minor tide Oct 8, 2025, 4:51 PM

#

New Channel Alert

Introducing #codename-discussion! We'll use this channel to have focussed discussions related to models that are using codenames. Reminder that these models appear under codenames or aliases in Battle mode.

You may have to enable this channel manually in Channels & Roles -> Browse Channels

minor tide Oct 9, 2025, 9:11 PM

#

Few Quick Reminders

We want to better understand what is important to you all so we can make LMArena a great product. If you’re interested in sharing your expertise with the team please…

Fill Out This Survey

Our Arena Champions Program aims to reward members who show genuine commitment to meaningful conversation by providing a private space where they can engage without interruptions. Access to this space will be granted through an application process. Members must demonstrate both interest in AI and commitment to meaningful conversation. If you're looking for a dedicated space to have these discussions...

Apply Here

minor tide Oct 14, 2025, 5:30 PM

#

Video Leaderboard Update

<@&1372208524230397962> <@&1372208635530448926> Sora 2 & Sora 2 Pro have now landed on the Text-to-Video Leaderboard! Sora 2 Pro is now tied for #1 alongside Veo 3 & Veo 3 Fast. Sora 2 has also shaken things up by landing at #3. If you haven’t already, be sure to check them out for yourself in our Video Arena! A reminder on how to use it can be found here: #1397655624103493813

Be sure to let us know what you think in #leaderboards!

minor tide Oct 14, 2025, 5:53 PM

#

New Model Update

<@&1372208635530448926> New models added to LMArena!

qwen3-vl-8b-thinking

qwen3-vl-8b-instruct

minor tide Oct 15, 2025, 6:23 PM

#

New Model Update

<@&1372208635530448926> New model added to LMArena & WebDev!

claude-haiku-4-5-20251001
https://x.com/arena/status/1978523872505872481

New models added to Video Arena!

veo-3-1-fast

veo-3-1

lmarena.ai (@arena)

🚨 New Model Update!

Claude Haiku 4.5 is in the Arena!
@AnthropicAI's latest small model is now available for Text and WebDev ⚡️

Come test it out and tell us what you think. Your votes drive the leaderboards!

minor tide Oct 16, 2025, 11:42 PM

#

Leaderboard Update

<@&1372208635530448926> <@&1372208524230397962> The Text Leaderboard has been updated! Claude-Haiku-4-5 has landed and is currently sitting in the #22 rank. Be sure to check out the Text Arena Leaderboard and let us know what you think in #leaderboards

minor tide Oct 20, 2025, 5:07 PM

#

Text-to-Video & Image-to-Video Leaderboard Update

<@&1372208635530448926> <@&1372208524230397962> There has been a big shift in our Text-to-Video Leaderboard & Image-to-Video Leaderboard as Veo-3.1 now ranks #1 in both!

Haven’t tried it out for yourself yet? Be sure to check out #1397655624103493813 for an explanation on how Video Arena works.

Let us know what you think in #leaderboards and be sure to share those Veo-3.1 generations in #ai-creations

https://x.com/arena/status/1980319296120320243

lmarena.ai (@arena)

🚨🎬 Big news from Video Arena!

@GoogleDeepMind’s latest Veo 3.1 now ranks #1 in both Text-to-Video and Image-to-Video leaderboards. 🏆

This is a +30-point leap from Veo 3.0 → 3.1, making it the first model to break 1400 in Video Arena history!

Huge congrats to the

minor tide Oct 25, 2025, 12:11 AM

#

New Model Update

<@&1372208635530448926> - New model added to LMArena!

minimax-m2-preview
https://x.com/arena/status/1981850766039187901

lmarena.ai (@arena)

🚨 New Model Update
MiniMax-M2 by @MiniMax_AI is expected to land next week but is already in the Arena for testing as MiniMax-M2-Preview!

Let’s see how it stacks up.

Early details suggest it’s an advanced agentic model with strong reasoning and long-context capabilities,

minor tide Oct 28, 2025, 4:33 PM

#

Image-to-Video Leaderboard Update

<@&1372208635530448926> <@&1372208524230397962> We’ve updated the Image-to-Video Leaderboard. Hailuo-2.3 is now on the leaderboard and is ranked #5 alongside Seedance-v1-pro & Kling-2.5-turbo-1080p.

Image-to-Video Leaderboard Here & share your thoughts in #leaderboards

minor tide Oct 30, 2025, 10:10 PM

#

Image-to-Video Leaderboard Update & New Model Update

<@&1372208635530448926> <@&1372208524230397962> New model added to LMArena's Video Arena! This is an image-to-video model only.

hailuo-2.3-fast

Also, the Text-to-Video Leaderboard has been updated! Hailuo-2.3 is now ranked #7. Check it out and let us know what you think in #leaderboards.

Text-to-Video Leaderboard HERE

minor tide Oct 31, 2025, 10:34 PM

#

October Contest Update

<@&1385704581677191278> Thank you to everyone that participated! Vote on which you like the best to crown our new <@&1378032433873555578>.

This contest theme is 🎨 Abstract Art 🎨 Use wild shapes, vibrant colors, and chaotic lines to express feelings or ideas. Create a unique visual experience.

Vote Here

minor tide Nov 3, 2025, 10:44 PM

#

WebDev Leaderboard Update

<@&1372208635530448926> <@&1372208524230397962> A new model has landed on the WebDev Leaderboard: MiniMax-M2 is now the #1 open model & #4 overall. The community has shown that it shines at performance coding, reasoning, and agentic-style tasks while remaining cost-effective and fast.

WebDev Leaderboard Here & share your thoughts in #leaderboards

minor tide Nov 5, 2025, 7:35 PM

#

Arena Expert Tagging & Occupational Leaderboards

@everyone We’re thrilled to introduce a new tagging system built on our evaluation framework that’ll identify the most expert-level prompts from the community. With this new system we’re introducing the Expert Leaderboard. Arena Expert reveals the structure of prompts: their depth, reasoning, and specificity, which drives the clarity of evaluation.

Additionally, Occupational Leaderboards are now available that map prompts to real-world domains. By mapping all Arena prompts across 23 occupational fields, the system captures the full spectrum of real-world reasoning tasks. With this update you’ll see 8 of these leaderboards live including:

Software & IT Services

Writing, Literature, & Language

Life, Physical, & Social Science

Entertainment, Sports, & Media

Business, Management, & Financial Ops

Mathematical

Legal & Government

Medicine & Healthcare
with more to come!

Read the full research analysis on our blog here & check out our open dataset of expert prompts with occupational tags for yourself here!

minor tide Nov 6, 2025, 5:32 PM

#

New Model Update

<@&1372208635530448926> - New model added to LMArena!

kimi-k2-thinking
https://x.com/arena/status/1986482438768673107

lmarena.ai (@arena)

🚨 New Open Source Model Update!

Touted for its reasoning and coding strengths, Kimi K2 Thinking by @Kimi_Moonshot is now live for both Text and WebDev in Battle, Side by Side and Direct. Bring your toughest prompts! 💪

The last time Kimi K2 was in the Arena with a new model,

minor tide Nov 7, 2025, 5:55 PM

#

Leaderboard Update

<@&1372208635530448926> Ernie-5.0-preview-1022 is now on the Text Arena Leaderboard and is now in the #2 rank! Let us know what you think in #leaderboards

minor tide Nov 7, 2025, 11:09 PM

#

Image-Edit Leaderboard Updated

<@&1372208635530448926> The Image Edit Leaderboard has been updated! Reve-edit-fast is now publicly released and is now ranked in the top 5.

Check out the Image Edit Leaderboard yourself!

minor tide Nov 9, 2025, 6:36 PM

#

October's Contest Winner

<@&1385704581677191278> Congrats to @dusty stump for being our October Abstract Art Contest winner! The newest member of our <@&1378032433873555578>! Check out their generation here.

Stay tuned for future contest announcements.

minor tide Nov 10, 2025, 6:21 PM

#

Text Leaderboard Update - Kimi K2 Thinking added

<@&1372208635530448926> <@&1372208524230397962> The Text leaderboard has been updated and Kimi-k2-thinking is now the #2 ranked open source model & tied for #7 overall. We’ve been seeing this model excel in the Math, Coding, and Creative Writing categories. On our Expert leaderboard, Kimi-k2-thinking has an impressive score of 1447 as well.

Check out the Text leaderboard yourself and let us know what you think in #leaderboards.

minor tide Nov 10, 2025, 10:39 PM

#

User Login - User Email Now Available

@everyone Driven by community feedback, User Login with email is now available! Save your chat history across multiple devices on both your mobile and desktop browsers.

minor tide Nov 12, 2025, 5:50 PM

#

Code Arena Is Here

@everyone Code Arena is now available on LMArena! The WebDev Arena has leveled up with a complete redesign shaped by community feedback and is now known as Code Arena. With Code Arena, models generate live, deployable web apps and sites that anyone can open, inspect, and judge directly, in real time.

Since Code Arena’s evaluation methods have been rebuilt, a fresh new leaderboard designed to reflect this new system has launched.

Try out Code Arena for yourself - HERE
Check out the new Code Arena leaderboard - HERE
Learn more in our Blog Post - HERE

https://youtu.be/iw8oHpttQOs

YouTube

LMArena_ai

Introducing: Code Arena on LMArena.ai

https://lmarena.ai/code

Introducing Code Arena, where AI coding meets the real world.

Traditional benchmarks measure correctness: whether code compiles or passes tests. Correctness matters, but it’s only part of what defines real development. Building software is iterative and creative: you plan, test, refine, and repeat. A credible evaluati...

minor tide Nov 13, 2025, 7:55 PM

#

New Model Update

<@&1372208635530448926> - New model added to Text, Vision, and Code Arena on LMArena!

gpt-5.1
https://x.com/arena/status/1989058785927950628

lmarena.ai (@arena)

🚨 New Model Update!

@OpenAI has updated its GPT-5 series with GPT-5.1.
Available now in the Arena for Text, Vision and the new Code Arena!

Take it for a test drive with your toughest prompts. Let's see how it stacks up! 🥊

minor tide Nov 14, 2025, 6:27 PM

#

Leaderboard Ranking Method Update

<@&1372208524230397962> <@&1372208635530448926> Today we're announcing an important update to how model rankings are displayed on LMArena, one that makes them both more interpretable and more statistically accurate in how they reflect uncertainty. There will now be two new metrics displayed alongside each model’s score:

Raw Rank: the model’s position based purely on its Arena score.
There are no ties here. Each model receives a unique rank based on its performance.

Rank Spread: an interval that shows the range of possible ranks a model could have, given the overlap in confidence intervals (CIs) across models.

To learn more about this update check out this blog post here and let us know what you think in #leaderboards.
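
For anyone who wants to see how the two metrics relate, here is a minimal Python sketch. It assumes each model has a score plus a confidence interval, and it assumes one plausible reading of Rank Spread: the best possible rank counts only models whose interval sits entirely above this model's interval, while the worst possible rank counts every model whose interval reaches above this model's lower bound. The class name, function names, and sample numbers are all hypothetical; the method LMArena actually uses is the one described in the blog post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelScore:
    name: str
    score: float    # Arena score
    ci_low: float   # lower bound of the confidence interval
    ci_high: float  # upper bound of the confidence interval


def raw_ranks(models):
    """Raw Rank: position by score alone, highest score first."""
    ordered = sorted(models, key=lambda m: m.score, reverse=True)
    return {m.name: position + 1 for position, m in enumerate(ordered)}


def rank_spread(models):
    """Rank Spread (assumed definition): (best, worst) rank given CI overlap."""
    spread = {}
    for m in models:
        # Models that are better even in this model's most favourable case.
        surely_better = sum(
            1 for o in models if o is not m and o.ci_low > m.ci_high
        )
        # Models that could be better in this model's least favourable case.
        possibly_better = sum(
            1 for o in models if o is not m and o.ci_high > m.ci_low
        )
        spread[m.name] = (1 + surely_better, 1 + possibly_better)
    return spread


# Made-up numbers, for illustration only.
sample = [
    ModelScore("model-a", 1500, 1490, 1510),
    ModelScore("model-b", 1495, 1485, 1505),
    ModelScore("model-c", 1400, 1390, 1410),
]
print(raw_ranks(sample))
print(rank_spread(sample))
```

In this made-up example model-a and model-b have overlapping intervals, so each gets a spread of 1 to 2 even though their raw ranks are 1 and 2, while model-c is clearly separated and stays at 3.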

minor tide Nov 15, 2025, 3:06 PM

#

New Model Update

<@&1372208635530448926> - New model added to Text & Vision!

gpt-5.1-high
& new models added to Code Arena!

gpt-5.1-codex

gpt-5.1-codex-mini

minor tide Nov 17, 2025, 9:26 PM

#

Text Leaderboard Update

<@&1372208524230397962> <@&1372208635530448926> The Text Arena leaderboard has been updated! Grok-4.1-thinking is now ranked #1, followed by Grok-4.1 at #2. On the Expert leaderboard, Grok-4.1-thinking is also ranked #1, excelling at Hard Prompts, Coding, Instruction Following, and Creative Writing.

Check out the Text Arena leaderboard and bookmark our Leaderboard Changelog for leaderboard updates.

minor tide Nov 17, 2025, 10:12 PM

#

November AI Generation Contest - Code Arena

<@&1385704581677191278> We want to see what you all can make with LMArena! To celebrate Code Arena's launch let's use this modality! Could you be crowned our next <@&1378032433873555578> ? November's AI Gen Contest is now open.

How does it work and what are the rules?

You must submit your entry by sharing the preview link in #november-contest

On ~~November 28th~~ December 10 submissions will be closed

Example here
What could I win?

1 month of Discord Nitro

<@&1378032433873555578> role!

Looking for more information on how to use Code Arena? Be sure to check out our walkthrough video here !

YouTube

LMArena_ai

Code Arena walkthrough on LMArena.ai | build, compare, and vote wit...

https://lmarena.ai/code

See top ranked models: https://lmarena.ai/leaderboard/webdev
Read about it: https://news.lmarena.ai/webdev-arena/

Learn how to build websites and applications with Code Arena, test different models head-to-head, and see how community votes shape the leaderboard. Try it out yourself - and vote for your favorite model.

minor tide Nov 17, 2025, 11:27 PM

#

Video & Image Leaderboard Updates

<@&1372208524230397962> <@&1372208635530448926> The Image-to-Video & the Text-to-Image Leaderboards have been updated as well. Wan2.5-i2v-preview & Wan2.5-t2i-preview have landed in the Top 5 on the Image-to-Video and Text-to-Image leaderboards.

Check out the Image-to-Video leaderboard & the Text-to-Image leaderboard! Let us know what you think in #leaderboards.

Stay up to date with our Leaderboard Changelog!

minor tide Nov 18, 2025, 4:13 PM

#

Leaderboard Update & New Model Update

<@&1372208524230397962> <@&1372208635530448926> Gemini-3-pro has landed on the Text, WebDev, and Vision leaderboards!

#1 in Text scoring 1501

#1 in Vision scoring 1328

#1 in WebDev scoring 1487

You can try out Gemini-3-pro for yourself as it’s now available on LMArena! Be sure to bookmark our Leaderboard Changelog for all leaderboard related updates.

minor tide Nov 19, 2025, 6:32 PM

#

WebDev Leaderboard Update

<@&1372208524230397962> <@&1372208635530448926> A new model provider has entered the WebDev Arena: Deep Cogito has released Cogito-v2.1, which ties for rank #18 overall and is in the Top 10 for Open Source models!

See for yourself on the WebDev Leaderboard and share your thoughts in #leaderboards

minor tide Nov 19, 2025, 11:31 PM

#

Text Leaderboard Update

<@&1372208524230397962> <@&1372208635530448926> Today, some scores for GPT-5.1 are live for the Text Arena:

GPT-5.1-high ranks #4
GPT-5.1 ranks #12

Stay tuned as we collect more votes and see how the scores converge for GPT-5.1-high in the new WebDev leaderboard powered by Code Arena. We’ll also see how GPT-5.1-medium stacks up.

minor tide Nov 20, 2025, 4:17 PM

#

New Model Update

<@&1372208635530448926> - Google DeepMind’s new image model just landed on LMArena.

gemini-3-pro-image-preview (nano-banana-pro)

https://x.com/arena/status/1991540746114199960

lmarena.ai (@arena)

🚨🍌BREAKING: @GoogleDeepMind’s Gemini 3 Pro Image aka Nano Banana Pro is in the Arena!

Built on Gemini 3, which only two days ago landed as #1 across all major Arena leaderboards.

Put it head-to-head in Battle mode with the latest models and judge for yourself if it’s SOTA for

minor tide Nov 21, 2025, 9:53 PM

#

Vision Leaderboard Update

<@&1372208524230397962> <@&1372208635530448926> A new model provider has landed on the Vision leaderboard! Ernie-5.0-preview-1022 by Baidu debuts with a score of 1206!

Check out the Vision leaderboard and share some prompts you’ve used with Ernie-5.0-preview-1022 in #share-prompts

minor tide Nov 21, 2025, 10:15 PM

#

WebDev Leaderboard Update

<@&1372208524230397962> <@&1372208635530448926> The WebDev leaderboard has been updated.
GPT 5.1’s Code Arena evaluations have been added:

GPT-5.1-medium landed at #2 with a score of 1407

GPT-5.1 landed at #8 with a score of 1364

GPT-5.1-Codex landed at #9 with a score of 1336

GPT-5.1-Codex-Mini landed at #13 with a score of 1252

Check out the WebDev leaderboard and contribute with your votes on Code Arena. Always stay up to date with our Leaderboard Changelog.

minor tide Nov 21, 2025, 10:33 PM

#

Image Leaderboard Update

<@&1372208524230397962> <@&1372208635530448926> It’s been a major week for leaderboard movement, and now the Text-to-Image and Image Edit leaderboards have added Gemini-3-pro-image-preview.

Gemini-3-pro-image-preview ranks #1 on the Text-to-Image leaderboard (+84 pt over nano-banana)

Gemini-3-pro-image-preview ranks #1 on the Image Edit leaderboard (+41 pt over nano-banana)

Let us know what you think in #leaderboards.
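
For a rough sense of what a point gap like the ones above means, here is a small Python sketch. It assumes Arena scores sit on an Elo-style logistic scale where a 400-point gap corresponds to 10:1 odds; the announcement does not state the scale, so treat both the formula and the function name `expected_win_rate` as assumptions.

```python
def expected_win_rate(score_gap, scale=400.0):
    """Expected head-to-head win rate for the higher-rated model.

    Assumes an Elo-style logistic curve (base 10, 400-point scale).
    This is an illustrative assumption, not a documented Arena formula.
    """
    return 1.0 / (1.0 + 10.0 ** (-score_gap / scale))


# The two gaps quoted in the announcement above.
for gap in (84, 41):
    print(f"+{gap} pt gap -> expected win rate {expected_win_rate(gap):.1%}")
```

Under this assumption a gap of zero gives an even 50% split, and larger gaps move the expected win rate toward 100%.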

minor tide Nov 24, 2025, 7:14 PM

#

New Model Update

<@&1372208635530448926> - New model added to Text and Code Arena on LMArena!

claude-opus-4-5-20251101
https://x.com/arena/status/1993035224880759068

lmarena.ai (@arena)

After a HUGE week of Google, xAI and OpenAI releases, @Anthropic has now entered the Arena with Claude Opus 4.5!

Claude Opus 4.1 currently holds a strong #4 on the WebDev leaderboard (powered by Code Arena) and ranks #7 in the super competitive Text Arena.

How much stronger

minor tide Nov 25, 2025, 8:00 PM

#

Image Edit Update

<@&1372208635530448926> Driven by community feedback, we've updated how multi-turn for image edit works and added some new features we’re excited to share with you:

Multi-turn in image generation chat has been turned off.
You can edit images directly in chat rather than having to download them first by using the new Edit feature in image generation.
The new image upload limit is 10.

Big shoutout to everyone in the community who spoke up when we first launched multi-turn for image generation. Your feedback means a lot!

minor tide Nov 25, 2025, 10:27 PM

#

New Model Update

<@&1372208635530448926> - New model added to Text-to-Image and Image Edit on LMArena!

flux-2-pro

flux-2-flex
https://x.com/arena/status/1993444903876280645

lmarena.ai (@arena)

🖼️ Frontier model drops aren’t slowing down… @bfl_ml’s FLUX.2 just entered the Image Arena!

FLUX.2 Pro and FLUX.2 Flex are available for both Text-to-Image and Image Edit. Stay close as the leaderboard shakes up to see where FLUX.2 lands. Hit it with your strongest prompts and

minor tide Nov 26, 2025, 2:05 AM

#

New Model Update

<@&1372208635530448926> - New models added to Search Arena!

gemini-3-pro-grounding

gpt-5.1-search

minor tide Nov 26, 2025, 7:51 PM

#

Leaderboard Update

<@&1372208524230397962> <@&1372208635530448926> - Claude-opus-4-5-20251101 & Claude-opus-4-5-20251101-thinking-32k have been added to the leaderboards!

WebDev leaderboard (powered by Code Arena)

#1 for Claude-Opus-4.5 (thinking-32k)

#2 for Claude-Opus-4.5
Expert leaderboard

#1 for Claude-Opus-4.5
Text leaderboard

#3 for Claude-Opus-4.5

#6 for Claude-Opus-4.5 (thinking-32k)
Stay up to date with our Leaderboard Changelog.

minor tide Dec 1, 2025, 6:50 PM

#

New Model Update

<@&1372208635530448926> - New models added to Text Arena!

deepseek-v3.2

deepseek-v3.2-thinking
https://x.com/arena/status/1995564824718442620

lmarena.ai (@arena)

🚨New Models in the Arena!

🐳DeepSeek V3.2: a new family of reasoning-first, agent-oriented models from @deepseek_ai are now live in the Arena.

Standard, Thinking, and Speciale are all in the Text Arena, waiting for your toughest prompts!

Get your votes in: we’ll see how they

minor tide Dec 1, 2025, 11:41 PM

#

Leaderboard Update

<@&1372208524230397962> <@&1372208635530448926> - new models Flux-2-pro, Flux-2-flex, and KAT-coder-pro-v1 have landed on the leaderboards!

Text-to-Image leaderboard

Flux-2-flex ranks #3

Flux-2-pro ranks #5
Image Edit leaderboard

Flux-2-pro ranks #6

Flux-2-flex ranks #7
WebDev leaderboard

KAT-coder-pro-v1 ranks #16

Always stay up to date with our Leaderboard Changelog.

minor tide Dec 2, 2025, 5:05 PM

#

Text Arena Leaderboard Update

<@&1372208524230397962> <@&1372208635530448926> - A new open source model has landed on the leaderboard.
Mistral-Large-3 lands at #6 among open models and #28 overall on the Text leaderboard. Mistral-Large-3 was tested under the codename “Jaguar” and performs strongly in:

Coding

Hard Prompts

Multi-Turn

Instruction Following

Longer Query
Check out the Text leaderboard for yourself, and let us know what you think in #leaderboards

minor tide Dec 3, 2025, 7:27 PM

#

Early Access Feedback Program

<@&1372208635530448926> Resharing an Early Access Program that we haven't mentioned in a bit!

The LMArena Test Garden is our private feedback program that invites selected members to get sneak peeks at features, design mocks, and ideas the team is considering implementing, in exchange for early feedback. For those who are exceptional at providing feedback and would like to see what we're cooking up, this program is for you!

Note that if you're selected, we'll follow up privately with next steps. Being a part of this program does require signing an NDA.

Apply Here

minor tide Dec 3, 2025, 10:54 PM

#

Search Leaderboard Update

<@&1372208524230397962> <@&1372208635530448926> - The Search Arena leaderboard has been updated!

Search Arena leaderboard

Gemini-3-pro-grounding ranks #1

Gpt-5.1-search ranks #2

Tell us what you think in #leaderboards and stay up to date with our Leaderboard Changelog.

minor tide Dec 4, 2025, 2:12 AM

#

New Model Update

<@&1372208635530448926> - New model added to Text Arena!

nova-2-lite
https://x.com/arena/status/1996396395411177920

lmarena.ai (@arena)

🚨New Model Update

@Amazon Nova 2 Lite is now available in the Text Arena!

Designed for medium-thinking reasoning tasks, Nova 2 Lite is built for everyday tasks like helping customer support chats, sorting documents, and handling basic business workflows.

minor tide Dec 4, 2025, 6:07 PM

#

New Model Update & Leaderboard Update

<@&1372208524230397962> <@&1372208635530448926> - New model added to Text-to-Image Arena & Image Edit Arena!

Seedream-4.5

The leaderboards for Text-to-Image & Image Edit have been updated to include Seedream-4.5 as well! This model has landed at #3 on the Image Edit leaderboard and ranks #7 on the Text-to-Image leaderboard!

Stay up to date with our Leaderboard Changelog.

minor tide Dec 5, 2025, 12:45 AM

#

New Model Update

<@&1372208635530448926> - New models added to Code Arena & Video Arena!

Code Arena

Gpt-5.1-codex-max
Video Arena

Kling-2.6

https://x.com/arena/status/1996692943030354085?s=20

lmarena.ai (@arena)

🚨 New Model in the Code Arena!

GPT-5.1-Codex Max by @OpenAI is ready for you in the Code Arena.

Bring your toughest, most creative prompts and we'll see how it stacks up against current leaders: Claude Opus 4.5 Thinking by @anthropicAI and Gemini 3 Pro by @GoogleDeepMind!

minor tide Dec 5, 2025, 6:21 PM

#

Contest Reminder

<@&1385704581677191278> - Reminder, our current Code Arena contest is going to wrap up on December 10th! Be sure to add those submissions to #november-contest before time runs out.

More details here.

minor tide Dec 9, 2025, 5:08 PM

#

Leaderboard Update

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard has been updated and ERNIE-5.0-Preview-1103 landed with a score of 1431 putting it in the top 20.

Check out the Text Arena leaderboard and stay up to date with the Leaderboard Changelog.

minor tide Dec 11, 2025, 1:03 AM

#

November Contest Closed - Vote Here

<@&1385704581677191278> - Our November Code Arena Contest is now closed!

Vote Here to crown our next <@&1378032433873555578>

minor tide Dec 11, 2025, 6:32 PM

#

New Model Update & WebDev Leaderboard Update

<@&1372208524230397962> <@&1372208635530448926> - New models added to Code Arena & Text Arena!

GPT-5.2-high

GPT-5.2

Tested internally under the codenames “robin” and “robin-high”, GPT-5.2-high now ranks #2 & GPT-5.2 now ranks #6 on our WebDev leaderboard! These scores are preliminary, so stay tuned as they stabilize. Always stay up to date with our Leaderboard Changelog.

minor tide Dec 13, 2025, 1:23 AM

#

New Model Update

<@&1372208635530448926> - New models added to Text and Vision Arena!

glm-4.6v

glm-4.6v-flash

minor tide Dec 16, 2025, 6:23 PM

#

YouTube Channel Launch

@everyone - We recently launched our YouTube channel! If you’ve been enjoying LMArena, you’ll want to subscribe: we’re posting fast, practical breakdowns to help you understand the AI frontier and choose the best models for your work.

Recent videos include:
• Beginner’s guide to free + open models
• GPT-5.2 enters the Arena
• Why small open models are disappearing (7B → 32B shift)
• Generating SVGs to measure coding capabilities
• How to choose the best AI model for coding

Subscribing helps us grow it, and ensures you see new releases the moment they drop. Let us know if there are topics you want us to cover!

Subscribe here → https://www.youtube.com/@ArenaAIOfficial

YouTube

LMArena

Created by researchers from UC Berkeley, LMArena is an open platform to evaluate, benchmark, compare, and test frontier AI models. Users can chat with multiple models and compare their responses across tasks. By seeing models side by side and voting on the better response, the community shapes a public leaderboard that reflects real-world perfor...

minor tide Dec 16, 2025, 6:57 PM

#

December AI Generation Contest

<@&1385704581677191278> We want to see what you all can make with LMArena! Could you be crowned our next <@&1378032433873555578> ? December's AI Gen Contest is now open.

How does it work and what are the rules?

You must submit your entry by sharing a screenshot in #december-contest

On December 30th submissions will be closed and we'll share a way to vote

Submissions must be done through Battle Mode

Your submission must include both the left and right response

Your submission must be captured after you've voted for the response you prefer, meaning the models should be revealed in your submission

Example here

What could I win?

1 month of Discord Nitro

<@&1378032433873555578> role! You’ll hold onto this exclusive role which will be hoisted towards the top of the member’s list

Share With Us

We'd love to see what you created on LMArena! Everyone is encouraged to post your contest submission on your own X account and tag @arena. We may repost what you've shared!

This month’s theme: Image - Holiday Celebration

Let's get festive! We want to see how you celebrate the holidays with diverse celebrations like Christmas, Hanukkah, Kwanzaa, New Year's and more. This month’s contest will be for image creations only.

November's Code Arena Contest Winner

Help me give a big congrats to @quiet rivet for being our first Code Arena <@&1378032433873555578> !! Check out the winning submission here.

minor tide Dec 16, 2025, 7:18 PM

#

Image Leaderboard Update & New Model Update

<@&1372208524230397962> <@&1372208635530448926> - The Text to Image leaderboard and Image Edit leaderboard have new models shaking up the ranks!

gpt-image-1.5 is #1 in Text-to-Image (1264)
chatgpt-image-latest is #1 on Image Edit (1409)
gpt-image-1.5 #4 in Image Edit (1395)

New models added to Image Arena!

gpt-image-1.5

chatgpt-image-latest

minor tide Dec 17, 2025, 1:52 AM

#

Text Leaderboard Update

<@&1372208524230397962> <@&1372208635530448926> - GPT-5.2-high has landed on the Text Arena leaderboard at #13!

With a score of 1441, the model performs strongest in:

#1 in the Math category
#2 in the Mathematical occupational field
#5 on Arena Expert

Stay up to date with all leaderboard changes with our Leaderboard Changelog!

minor tide Dec 17, 2025, 4:54 PM

#

Text, Vision, and WebDev Leaderboard Update & New Model Update

<@&1372208524230397962> <@&1372208635530448926> - Gemini-3-flash has landed on our Leaderboards! The Text Arena leaderboard, Vision Arena leaderboard, and WebDev Arena leaderboard have all been updated.

Gemini-3-Flash highlights:

Top 5 across Text, Vision, WebDev
#2 in Math and Creative Writing categories

Where Gemini-3-Flash (thinking-minimal) performs strongest:

Top 10 across Text and Vision
#2 in the Multi-Turn category

These models are now available on Text and WebDev Arena.

gemini-3-flash

gemini-3-flash (thinking-minimal)

Let us know what you think in #leaderboards and stay up to date with our Leaderboard Changelog.

sly rock Dec 18, 2025, 7:05 PM

#

Open Sourcing The Leaderboards

@everyone - Today we’re releasing Arena-Rank, an open-source Python package for paired-comparison ranking—the same code that powers the LMArena leaderboards.

Why we’re doing this:

Transparency & reproducibility: Anyone can now audit our leaderboard methodology, including ratings and confidence intervals.
Research-grade tooling: Arena-Rank implements Bradley–Terry and contextual Bradley–Terry models, with utilities designed for real datasets and real experimentation.
Community & extensibility: The package is intentionally decoupled from our internal pipelines, making it easier to test new ideas, compare methods, and apply the same techniques beyond LLM evaluation (e.g., alignment datasets, sports, esports).

Under the hood, Arena-Rank reflects several methodological and engineering upgrades we’ve made over the past months, including faster JAX-based optimization and cleaner separation between data preprocessing and modeling.

We’re looking forward to feedback, experiments, and contributions from the community. We can't wait to see what kind of leaderboards you make.

Grab it at GitHub: https://github.com/lmarena/arena-ai

💬 To install → pip install arena-rank

This is part of our broader commitment to open science and transparent AI evaluation—and it’s just a starting point. Read more in our blog here.
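
For anyone new to paired-comparison ranking, here is a minimal sketch of fitting a basic Bradley–Terry model in plain Python. It does not use the arena-rank API, which this announcement does not show; the function name `fit_bradley_terry` and the vote counts are hypothetical, and it uses the classic iterative minorization-maximization update rather than the JAX-based optimization mentioned above. It also skips ties, confidence intervals, and the contextual variant.

```python
def fit_bradley_terry(wins, n_iter=200):
    """Fit Bradley-Terry strengths from pairwise win counts.

    wins[(a, b)] is the number of times model a beat model b.
    Returns a dict of strengths normalized to sum to 1, so that
    P(a beats b) = strength[a] / (strength[a] + strength[b]).
    Illustrative sketch only; a model with zero wins ends up at strength 0.
    """
    models = sorted({m for pair in wins for m in pair})
    strength = {m: 1.0 for m in models}

    for _ in range(n_iter):
        updated = {}
        for i in models:
            total_wins = sum(count for (winner, _), count in wins.items() if winner == i)
            denom = 0.0
            for j in models:
                if j == i:
                    continue
                games = wins.get((i, j), 0) + wins.get((j, i), 0)
                if games:
                    denom += games / (strength[i] + strength[j])
            # Keep the old value if this model has no recorded comparisons.
            updated[i] = total_wins / denom if denom else strength[i]
        norm = sum(updated.values())
        strength = {m: value / norm for m, value in updated.items()}
    return strength


# Made-up votes: model-a beat model-b three times and lost once.
votes = {("model-a", "model-b"): 3, ("model-b", "model-a"): 1}
fitted = fit_bradley_terry(votes)
print(fitted)
```

With only two models the fitted win probability simply matches the observed win fraction; the model becomes more interesting once many models are compared against each other unevenly.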

sly rock Dec 18, 2025, 8:56 PM

#

Image Edit Leaderboard Update

<@&1372208524230397962> <@&1372208635530448926> - New models reve-v1.1 and reve-v1.1-fast have landed on the leaderboard!

Image Edit leaderboard

reve-v1.1 ranks #8
reve-v1.1-fast ranks #15

This represents a +6-point gain over Reve V1.

Always stay up to date with our Leaderboard Changelog.

#

Search Leaderboard Update

<@&1372208524230397962> <@&1372208635530448926> - The [Search Arena leaderboard](<https://lmarena.ai/leaderboard/search>) has been updated. GPT-5.2-Search ranks #2 while Grok-4.1-Fast-Search ranks #4.

Both models debuted ahead of their predecessors, posting gains of +10 points for GPT-5.2-Search and +17 points for Grok-4.1-Fast-Search.

Stay up to date with our Leaderboard Changelog and let us know what you think about how GPT-5.2 ranks in #leaderboards.

sly rock Dec 18, 2025, 9:48 PM

#

Text Leaderboard Update!

<@&1372208524230397962> <@&1372208635530448926> - The [Text Arena leaderboard](<https://lmarena.ai/leaderboard/text>) has been updated. GPT-5.2 makes its debut and ranks #17.

Compared to GPT-5.1, the model has improved by +2 points. It trails just one point behind GPT-5.2-high, which is optimized for expert-level reasoning and critical tasks.

Stay up to date with our Leaderboard Changelog!

minor tide Dec 22, 2025, 5:14 PM

#

Text Leaderboard Update - ERNIE-5.0-Preview

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard has been updated & ERNIE-5.0-Preview-1203 by Baidu has landed with a score of 1451. Here are some highlights:

Top Text model from Chinese labs
This is a 23 pt increase since ERNIE-5.0-Preview-1103

Bookmark our Leaderboard Changelog to stay up to date with the latest changes to our leaderboards!

minor tide Dec 22, 2025, 5:46 PM

#

WebDev Leaderboard Update - GLM-4.7

<@&1372208524230397962> <@&1372208635530448926> - Our WebDev leaderboard has been updated and GLM-4.7 by Z.ai ranks #6. This makes it the new #1 open model for WebDev. GLM-4.7 has a score of 1449, which is a +83 pt increase over GLM-4.6.

Try out GLM-4.7 in Code Arena and share with the community some of the generations made in #share-prompts !

minor tide Dec 30, 2025, 9:17 PM

#

New Model Update

<@&1372208635530448926> - New model added to Video Arena!

seedance-v1.5-pro

minor tide Dec 31, 2025, 7:28 PM

#

Text Arena Leaderboard Update

<@&1372208524230397962> <@&1372208635530448926> - Our Text Arena leaderboard has been updated with GLM-4.7 & Minimax-m2.1-preview included.

Let us know what you think in #leaderboards, and stay up to date with all leaderboard changes with our Leaderboard Changelog.

minor tide Jan 2, 2026, 10:28 PM

#

December Contest Closed - Vote Here

<@&1385704581677191278> - Our December Contest is now closed!

Vote Here to crown our next <@&1378032433873555578>

minor tide Jan 3, 2026, 2:18 AM

#

New Model Update

<@&1372208635530448926> - New models added to Image Arena & Image-Edit Arena!

qwen-image-2512

qwen-image-edit-2511
https://x.com/arena/status/2007273636512837958

lmarena.ai (@arena)

🚨 Qwen-Image-2512 and Qwen-Image-Edit-2511 by @Alibaba_Qwen are now live in the Arena.

The latest release delivers reduced finer natural textures, and stronger text rendering with improved layout accuracy.

Bring your toughest, most creative prompts and see how it performs with

minor tide Jan 5, 2026, 5:26 PM

#

User Login Issues

<@&1372208635530448926> - Over the break, we identified some issues with the user login and registration flow. These have now been fixed, so if you were experiencing problems, please try logging in or registering again. If you continue to have any issues, don’t hesitate to let us know in #1451836386293448725

Many thanks to everyone who reported these problems initially, we really appreciate your help!

minor tide Jan 5, 2026, 6:19 PM

#

Image Edit & Text-to-Image Leaderboard Update - Qwen-Image-Edit-2511 & Qwen-Image-2512

<@&1372208524230397962> <@&1372208635530448926> - The Image Edit leaderboard has been updated and qwen-image-edit-2511 is now the #1 open model, and #9 overall!

On the Text-to-Image leaderboard qwen-image-2512 is the #2 open model, and ranks #13 overall.

Find our Leaderboard Changelog here and share what you think about these updates in #leaderboards.

minor tide Jan 5, 2026, 11:32 PM

#

First January AI Generation Contest

<@&1385704581677191278> We want to see what you all can make with LMArena! Could you be crowned our next <@&1378032433873555578> ? For January we're going to be running a contest each week!

How does it work and what are the rules?

You must submit your entry by sharing a screenshot in #january-1st-contest

On January 9th submissions will be closed and we'll share a way to vote

Submissions must be done through Battle Mode

Your submission must include both the left and right response

Your submission must be captured after you've voted for the response you prefer, meaning the models should be revealed in your submission

Example here

What could I win?

1 month of Discord Nitro

<@&1378032433873555578> role! You’ll hold onto this exclusive role which will be hoisted towards the top of the member’s list

This month’s theme: Window to the Future 🪟

Create an image of something that represents you looking out the window towards your wildest, brightest future. Make it aesthetic, surreal, or sci-fi! This month’s contest will be for image creations only.

December's Contest Winner

Shoutout to @meager wren for being our new <@&1378032433873555578> !! Check out the winning submission here.

sly rock Jan 6, 2026, 4:08 PM

#

Company News

@everyone

Today, we're excited to announce our $150M funding round at a post-money valuation of more than $1.7B, nearly triple our valuation just seven months after our seed raise in May. The round was led by Felicis and UC Investments, with participation from a16z, The House Fund, LVDP, Kleiner Perkins, Lightspeed and Laude Ventures. This milestone reflects a growing industry consensus: AI cannot scale responsibly without independent, transparent, and continuous evaluation.

LMArena started as a research experiment. It’s now becoming a foundational pillar for the AI ecosystem. To this community who has tested, voted, reported bugs, submitted suggestions, and shared your perspective: Thank you. You are shaping the future of AI. ♥️

Let’s measure and advance what the world needs next. We will move even faster to build new features and improve our product experience for this community to evaluate the frontier of AI.

Read more about the announcement on our blog: https://news.lmarena.ai/series-a/

▶️ A message just for our wonderful community:

minor tide Jan 8, 2026, 12:23 AM

#

Vision Arena Leaderboard Update - ERNIE-5.0-Preview-1220

<@&1372208524230397962> <@&1372208635530448926> - The Vision Arena leaderboard has been updated. ERNIE-5.0-Preview-1220 is now ranked #8 with a score of 1226. Baidu currently stands as the only Chinese lab in the Top 10 on the Vision leaderboard.

Check out our Leaderboard Changelog for all leaderboard updates.

minor tide Jan 9, 2026, 12:29 AM

#

Text-to-Video and Image-to-Video leaderboards Update - Hunyuan-Video-1.5

<@&1372208524230397962> <@&1372208635530448926> - Hunyuan-Video-1.5 is now on our leaderboards! On the Text-to-Video leaderboard, Hunyuan-Video-1.5 ranks #18 with a score of 1193, and on the Image-to-Video leaderboard it ranks #20 with a score of 1202.

Let us know what you think in #leaderboards and always stay up to date with our Leaderboard Changelog.

minor tide Jan 9, 2026, 8:47 PM

#

Text Arena Leaderboard Update

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard just got an update. Let us know what you think in the #leaderboards channel.

minor tide Jan 13, 2026, 12:12 AM

#

New Model Update

<@&1372208635530448926> - A new model has been added to Video Arena.

ltx-2-19b
Check out #1397655624103493813 and test out ltx-2-19b now!

minor tide Jan 13, 2026, 1:04 AM

#

January AI Generation 2nd Contest

<@&1385704581677191278> We want to see what you all can make with LMArena! Could you be crowned our next <@&1378032433873555578> ? For January we're going to be running a contest each week!

How does it work and what are the rules?

You must submit your entry by sharing a screenshot in #jan

On January 16th submissions will be closed and we'll share a way to vote

Submissions must be done through Battle Mode

Your submission must include both the left and right response

Your submission must be captured after you've voted for the response you prefer, meaning the models should be revealed in your submission

Example here

What could I win?

1 month of Discord Nitro

<@&1378032433873555578> role! You’ll hold onto this exclusive role which will be hoisted towards the top of the member’s list

This month’s theme :
🍃 Nature Reclaims… depictions of a world where nature has begun to reclaim what humanity once built or occupied. Show us human-made environments overtaken, transformed, or reinterpreted by the natural world. 🍃

January 1st Contest - Vote Here

Vote for January's first contest winner! Vote Here.

minor tide Jan 14, 2026, 8:55 PM

#

New Model Update

<@&1372208635530448926> - New models have been added to Video Arena.

veo-3.1-audio-4k

veo-3.1-audio-1080p

veo-3.1-fast-audio-4k

veo-3.1-fast-audio-1080p
Check out #1397655624103493813 to try them out yourself!

minor tide Jan 14, 2026, 9:56 PM

#

New Model Update

<@&1372208635530448926> - A new model has been added to Code Arena!

gpt-5.2-codex
&
A new model has been added to Image Arena!

glm-image

minor tide Jan 14, 2026, 11:41 PM

#

Text Arena Leaderboard Update - ERNIE-5.0-0110

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard has been updated! ERNIE-5.0-0110 now ranks #8 with a score of 1460 along with being #12 in Arena Expert. This is currently the only model from a Chinese lab in the Top 10. It performs strongest in the Math category and quite a few occupational categories.

Try out Text Arena and stay up to date with changes in our leaderboards with the Leaderboard Changelog.

minor tide Jan 17, 2026, 1:00 AM

#

Text-to-Image & Image-Edit Leaderboard Update

<@&1372208524230397962> <@&1372208635530448926> - The Text-to-Image Arena leaderboard has been updated where z-image-turbo now ranks #22, flux.2-klein-9B now ranks #24, and flux.2-klein-4B ranks #31 overall. Additionally, Image Edit Arena leaderboard has been updated where flux.2-klein-9B ranks #15 and flux.2-klein-4B ranks #21.

Stay up to date with our Leaderboard Changelog!

minor tide Jan 19, 2026, 5:59 PM

#

Image-Edit Leaderboard Update

<@&1372208524230397962> <@&1372208635530448926> - The Image Edit leaderboard has been updated. wan2.5-i2i-preview has been added and now ranks #21 with a score of 1213.

As always, stay up to date with our Leaderboard Changelog.

minor tide Jan 20, 2026, 5:48 PM

#

January AI Generation 3rd Contest

<@&1385704581677191278> We want to see what you all can make with LMArena! Could you be crowned our next <@&1378032433873555578> ?

How does it work and what are the rules?

You must submit your entry by sharing the **Code Arena preview link** in #january-3rd-contest

On January 26th submissions will be closed and we'll share a way to vote

Submissions must be made with Code Arena. A reminder on how to use Code Arena can be found here.

Example here.

What could I win?

1 month of Discord Nitro

<@&1378032433873555578> role! You’ll hold onto this exclusive role which will be hoisted towards the top of the member’s list

This month’s theme :
⌨️ Code Arena - Let's use Code Arena for this contest! No specific theme! Build whatever you think would be most appealing!

January 1st Contest Winner

Big congrats to @dark glacier for being our first January contest winner! Check out their submission here.

January 2nd Contest - Vote Here

Vote for January's second contest winner! Vote Here. Reminder of the theme - Nature Reclaims… 🍃

minor tide Jan 20, 2026, 8:38 PM

#

5 Million Votes

Text Arena has officially passed 5 million community votes. That’s millions of real-world comparisons shaping how frontier AI models are evaluated. You didn’t just prompt. You tested. You voted. You moved the leaderboard.

This milestone belongs to you. 💙

minor tide Jan 21, 2026, 12:52 AM

#

Text-to-Image Leaderboard Update - GLM-Image

<@&1372208524230397962> <@&1372208635530448926> - The Text-to-Image Arena leaderboard has been updated and GLM-Image now ranks #8 among open models and #35 overall with a score of 1018.

minor tide Jan 21, 2026, 6:03 PM

#

Video Arena is Now Live on LMArena

@everyone What started last summer as a small Discord bot experiment has grown into a rigorous way to measure and understand how frontier video models perform with real-world use. Thank you to our wonderful community for all the feedback! Today, Video Arena is now available to all on LMArena.

Video Arena on Discord with the bot will remain in place and operate the same.
Video Arena on the web, similar to how it works on Discord, will be Battle mode only.
Login is required in order to use Video Arena on web.
The rate limit for Video Arena on web is 3 generation requests per 24 hours.

Learn more about Video Arena on our blog here.

Try out Video Arena on web now

minor tide Jan 22, 2026, 7:48 PM

#

Video Arena Walkthrough

<@&1398740297521037332> <@&1372208635530448926> - Have you tried Video Arena on web yet? Generate videos with 15 different frontier AI models and compare them head-to-head. Vote for the best output to power the leaderboards.

Get the full walkthrough and a few pro-tips from one of our lead engineers on our YouTube channel: https://www.youtube.com/watch?v=jaIU6eKVK1M

LMArena | Benchmark & Compare the Best AI Models

Chat with multiple AI models side-by-side. Compare ChatGPT, Claude, Gemini, and other top LLMs. Crowdsourced benchmarks and leaderboards.

YouTube

LMArena

Video Arena walkthrough on LMArena.ai | build, compare, and vote wi...

Try it free at https://lmarena.ai/video

Learn how to create AI videos using LMArena's Video Arena. In this walkthrough, lead engineer Anh Mai demonstrates how to use the Video Arena to generate both text-to-video and image-to-video content.

#AIVideo #TextToVideo #ImageToVideo #GenerativeAI #AIEngineering #LMArena

minor tide Jan 22, 2026, 8:22 PM

#

New Model Update

<@&1372208635530448926> - A new model has been added to Text Arena!

glm-4.7-flash

minor tide Jan 23, 2026, 5:29 PM

#

Single-Image Edit & Multi-Image Edit Leaderboard

<@&1372208524230397962> <@&1372208635530448926> - The Image Edit Arena leaderboard now has more data around real-world use. It now has two distinct leaderboards:

Single-Image Edit: ranks models on single-image tasks
Multi-Image Edit: ranks models on multi-image tasks
This gives us a more accurate view of model performance across distinct image editing use cases, from simple edits to multi-image reasoning. Some changes we see when looking at them:

Leader change: ChatGPT Image (Latest) goes #1 -> #3, while Gemini 3 Pro Image 2K (Nano‑Banana Pro) goes #2 -> #1.

Biggest rise: FLUX-2-Flex jumps #19 -> #12 (up 7 places).

Small‑model mover: FLUX-2-Klein 4B climbs #22 -> #17 (up 5 places).

Biggest drops: Seedream-4 2K slides #7 -> #14 (down 7 places) and Qwen Image Edit (2511) slips #11 -> #16 (down 5 places).

Check it out yourself to see the differences on our Image Edit leaderboard.
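
The "up N places" and "down N places" figures above are just the difference between a model's old rank and its new rank. A minimal Python sketch of that arithmetic, using only the ranks quoted in this post; the `RANKS` table and the `rank_moves` helper are hypothetical names for illustration, not part of any Arena API:

```python
# Old and new ranks as quoted in the announcement above.
# The table layout and helper name are illustrative assumptions.
RANKS = {
    "ChatGPT Image (Latest)": (1, 3),
    "Gemini 3 Pro Image 2K (Nano-Banana Pro)": (2, 1),
    "FLUX-2-Flex": (19, 12),
    "FLUX-2-Klein 4B": (22, 17),
    "Seedream-4 2K": (7, 14),
    "Qwen Image Edit (2511)": (11, 16),
}


def rank_moves(ranks):
    """Return {model: places gained}; positive means the model moved up."""
    return {name: old - new for name, (old, new) in ranks.items()}


if __name__ == "__main__":
    moves = rank_moves(RANKS)
    # Largest climb first, largest drop last.
    for name, delta in sorted(moves.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {delta:+d}")
```

A positive value means the model climbed that many places, a negative value means it dropped.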

minor tide Jan 23, 2026, 10:29 PM

#

New Model Update

<@&1372208635530448926> - New models added!

Text-to-Image

wan2.6-t2i

Image Edit

wan2.6-image

Code Arena

devstral-2

minor tide Jan 26, 2026, 3:40 PM

#

New Model Update

<@&1372208635530448926> - A new model has been added to Text Arena.

qwen3-max-thinking

minor tide Jan 26, 2026, 6:18 PM

#

Image Edit Leaderboard Update - Hunyuan-Image-3.0-Instruct

<@&1372208524230397962> <@&1372208635530448926> - The Image Edit leaderboard has been updated. Hunyuan-Image-3.0-Instruct now ranks #7 for Image Edit.

Try out Hunyuan-Image-3.0-Instruct vs. all the best frontier models in Image Arena. Stay up to date with our Leaderboard Changelog.

minor tide Jan 26, 2026, 8:41 PM

#

New Model Update

<@&1372208635530448926> - A new model has been added to Text Arena.

molmo-2-8b

minor tide Jan 27, 2026, 4:26 PM

#

New Model Update

<@&1372208635530448926> - A new model has been added to Text Arena.

kimi-k2.5

minor tide Jan 27, 2026, 5:03 PM

#

Community Reminders

<@&1372208635530448926> - A few reminders for everyone:

Please log in to save your chat history. If you have not yet created an account, please do so now so your chat history is not lost.
We’ve expanded our Help Center with a new Experiments category, featuring a growing collection of articles on ongoing experiments.
The <@&1349916362595635286> ping should be used to report users who are violating our Discord server #rules. The best place to report bugs is #1343291835845578853. For sharing feedback, please use #1372230675914031105. If a thread already exists for your issue or feedback, please contribute to it instead of creating a new one.

minor tide Jan 27, 2026, 7:11 PM

#

Introducing Auto-Modality & Model Selector

<@&1372208635530448926> - Auto-Modality & Model selector are now live!

• Auto-Modality: Whether your prompt is a coding question, a math proof, or an image generation request, auto-modality routes it to the right modality automatically. This makes every evaluation smoother and more intuitive. For more information on how auto-modality works, check out this Help Center article.

• Model selector: A new design for our model selection window is now live in Direct and Side by Side. Models are now ordered by rank and can be filtered by modality. Find more information about this in the Model Selection Menu Help Center article.

minor tide Jan 27, 2026, 9:50 PM

#

Better AI videos in under 90 seconds

<@&1398740297521037332> - Check out our "Better AI videos in under 90 seconds" video, now on our YouTube channel: https://www.youtube.com/watch?v=0hCI2XEh0x0

YouTube

LMArena

6 prompting tips for creating better AI videos

Try it free at https://lmarena.ai/video

Learn how to create AI videos using LMArena's Video Arena. LMArena's lead engineer Anh Mai recommends 6 prompting tips to create better AI videos. Learn how to use the Video Arena to generate both text-to-video and image-to-video content.

#AIVideo #TextToVideo #ImageToVideo #GenerativeAI #AIEngineering #...

minor tide Jan 27, 2026, 11:49 PM

#

Text Arena Leaderboard Update

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard has been updated. Kimi K2.5 Thinking is now the #1 open model and ranks #15 overall.

Some highlights:

#1 Open model (+5pts vs GLM-4.7)
#7 Coding
#7 Instruction Following
#14 Hard Prompts

Kimi K2.5 Thinking has also been added to Code Arena so go check it out.

minor tide Jan 28, 2026, 6:25 PM

#

LMArena is now Arena

@everyone - Today we’re excited to share our new look and feel to match our scientific mission: to measure and advance the frontier of AI for real-world use. We are now just: Arena. Now available at: arena.ai.

From a small PhD research project to a platform powered by a global community of millions. This rebrand has been shaped by this community, the people who use it.

Read more about the rebrand process on our blog here.

minor tide Jan 29, 2026, 6:01 PM

#

Community Reminders

@everyone - As our community continues to grow, it’s important to keep conversations organized and easy to follow.
• Introducing the #ask-here channel. This channel is the **home for one-off questions**. Going forward, questions in #general will be discouraged so conversations there can stay focused on AI-related topics.
• For reporting issues, please use #1343291835845578853. Before posting, check whether a thread already exists and add your report there.
• For sharing feedback, please use #1372230675914031105. As with bugs, add to an existing thread if one is already active.

minor tide Jan 29, 2026, 9:29 PM

#

Vision Leaderboard Update - Kimi K2.5

<@&1372208524230397962> <@&1372208635530448926> - The Vision Arena leaderboard has been updated. Kimi-k2.5-thinking is now the #1 open model and ranks #6 overall in Vision Arena, making it the only open model in the Top 15.

minor tide Jan 29, 2026, 10:56 PM

#

Text, Search, Code, Video, and Image Leaderboards Updated

<@&1372208524230397962> <@&1372208635530448926> - The leaderboards have been updated with new models added! Check them out:

Text-to-Image leaderboard

p-image

wan2.6-t2i

Image Edit leaderboard

p-image-edit

wan2.6-image

Text-to-Video leaderboard

kling-o1-pro

ltx-2-19b

Image-to-Video leaderboard

ltx-2-19b

Code Arena leaderboard

devstral-2

Text Arena leaderboard

glm-4.7-flash

Search Arena leaderboard

gemini-3-flash-grounding

claude-sonnet-4-5-search

claude-opus-4-5-search

Gpt-5.2-search-non-reasoning

Stay up to date with our Leaderboard Changelog.

minor tide Jan 29, 2026, 11:55 PM

#

Search Bar & Archive Chat Now Available

<@&1372208635530448926> - Two new features have rolled out to everyone.

• Search Bar - Your chats are now searchable, with the option to filter by modality.
• Archive Chat - Archive chat sessions to keep them for later without cluttering your chat history.

With these features now live, the process for deleting a chat session has changed. Follow the steps in this article to learn how to delete chats going forward.

minor tide Feb 1, 2026, 4:39 PM

#

Video Arena Discord Bot Rate Limit Change

<@&1398740297521037332> - Video Arena on Discord has had its rate limit updated to 1 generation request per 24-hour period. Video Arena on web still has the same rate limit of 3 generation requests per 24-hour period.

minor tide Feb 2, 2026, 3:52 PM

#

New Model Update

<@&1372208635530448926> - New models have been added to Arena!

Text Arena

step-3.5-flash

Code Arena

qwen3-max-thinking

minor tide Feb 2, 2026, 4:12 PM

#

Code Arena Leaderboard Update - Kimi K2.5

<@&1372208524230397962> <@&1372208635530448926> - The Code Arena leaderboard has received an update. Kimi-K2.5-thinking now ranks #1 open and #5 overall on Code Arena.
The community has also ranked Kimi-K2.5-thinking as the #1 open model for Vision and Text, including the Coding category.

Let us know what you think in #leaderboards and share previews of what you’ve built with Kimi.ai in #ai-creations.

Try out the best frontier models on agentic coding tasks at Code Arena.

minor tide Feb 4, 2026, 6:21 PM

#

Say hello to Max

@everyone - Max is Arena’s intelligent router, powered by 5+ million real-world community votes.

Max routes each prompt to the most capable model with latency in mind. AI models excel at different things (code, math, speed, reasoning). Max orchestrates across model strengths to deliver reliable performance across real-world use cases.

To learn more about Max, check out this blog post and our YouTube video.

Available today here.

YouTube

Arena AI

Introducing Max: Arena's intelligent router

Say hello to Max! https://arena.ai/max

Max is Arena’s intelligent router, powered by 5+ million real-world community votes.

Max routes each prompt to the most capable model with latency in mind. AI models excel at different things (code, math, speed, reasoning). Max orchestrates across model strengths to deliver reliable performance across r...

minor tide Feb 5, 2026, 12:15 AM

#

New Model Update

<@&1372208635530448926> - New model added to Text, Vision, & Code Arena

seed-1.8

minor tide Feb 5, 2026, 4:24 PM

#

Have you met Max?

<@&1372208635530448926> - Max intelligently routes each prompt to the most capable model currently live on Arena.

Catch the full video on our YouTube with Arena researcher Derry.

YouTube

Arena AI

Max intelligent router walkthrough on Arena.ai

Try it free at https://arena.ai/max

Learn how Arena's Max intelligently routes your prompts to the best AI model for each task. In this walkthrough, researcher Derry Xu demonstrates how Max balances capability, speed, and task type.

Whether you need fast responses, complex reasoning, or specialized skills like coding and math, Max orchestrat...

minor tide Feb 5, 2026, 7:15 PM

#

Video Arena Leaderboard Update - Vidu Q3 Pro

<@&1372208524230397962> <@&1372208635530448926> - The Image-to-Video leaderboard has been updated. Vidu-Q3-pro by Vidu AI is now in the Top 5 with a score of 1362.

minor tide Feb 5, 2026, 9:29 PM

#

New Model Update - Opus 4.6

<@&1372208635530448926> - New models added to Text Arena and Code Arena.

claude-opus-4-6

claude-opus-4-6-thinking

minor tide Feb 5, 2026, 10:46 PM

#

Claude Opus 4.6 First Impressions

<@&1372208635530448926> - Our AI Capabilities Lead, Peter, breaks down the latest performance of Opus 4.6. Check it out here.

YouTube

Arena AI

First impressions of Opus 4.6 | Arena.ai

Try Claude Opus 4.6 yourself: https://arena.ai

Arena's AI Capability Lead Peter Gostev shares his first impressions of Claude Opus 4.6, Anthropic's latest flagship coding model. In this deep dive, Peter tests the model's coding and reasoning capabilities to see how it stacks up against other frontier models on Arena's leaderboards.

From SVG ge...

minor tide Feb 6, 2026, 6:42 PM

#

Code, Text, and Expert Leaderboard Updates - Opus 4.6

<@&1372208524230397962> <@&1372208635530448926> - Claude Opus 4.6 has landed on our leaderboards and is now #1 across Code, Text and Expert!
• #1 in Code Arena: +106 score over Opus 4.5
• #1 in Text Arena: scoring 1496, 10pts over Gemini 3 Pro, and also ranking #1 in key Text Arena categories:

Hard Prompts
Instruction Following
Longer Query

• #1 in Expert Arena: +~50 lead

Tell us what you think about these changes in #leaderboards and stay up to date with our Leaderboard Changelog.
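
To put the score gaps quoted above in perspective, here is a rough Python sketch that converts a gap into an expected head-to-head win rate. It assumes the scores sit on an Elo-style logistic scale (base 10, 400-point scale) and ignores ties; neither assumption comes from this announcement, and `expected_win_rate` is a hypothetical helper, not an Arena function.

```python
def expected_win_rate(score_gap, scale=400.0, base=10.0):
    """Expected win probability for the higher-scored model.

    Uses an Elo-style logistic curve. The scale and base defaults are
    assumptions for illustration, not values taken from the announcement.
    """
    return 1.0 / (1.0 + base ** (-score_gap / scale))


if __name__ == "__main__":
    gaps = [
        ("Text Arena, 10pts over Gemini 3 Pro", 10),
        ("Code Arena, +106 over Opus 4.5", 106),
    ]
    for label, gap in gaps:
        print(f"{label}: {expected_win_rate(gap):.1%}")
```

Under these assumptions, a 10-point lead is close to a coin flip (roughly 51%), while a 106-point lead corresponds to winning roughly 65% of head-to-head votes.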

minor tide Feb 6, 2026, 10:05 PM

#

January AI Generation Contest

<@&1385704581677191278> - Thank you all for voting for our 2nd January AI Generation Contest 🍃 Nature Reclaims! The votes have been tallied and the newest member of our <@&1378032433873555578> is @raw thunder! Check out the winning submission here.

3rd January Contest - Vote Here

Help crown our next <@&1378032433873555578> by voting. Reminder that this theme is: ⌨️ Code Arena

minor tide Feb 6, 2026, 10:36 PM

#

Vision, Text, and Code Leaderboard Update - Kimi K2.5

<@&1372208524230397962> <@&1372208635530448926> - Kimi K2.5 is now on our leaderboards and is in the top 5 open models for Vision, Text, and Code!
• #2 open model in Vision, #10 overall, on par with gpt-5.1
• #3 open model in Text, #26 overall, on par with o3 and Qwen3-max-preview
• #4 open model in Code, #10 overall, rivaling gemini-3-flash

minor tide Feb 7, 2026, 5:12 PM

#

Video Arena Discord Update

<@&1398740297521037332> - We’re making a small but important change: Video Arena is moving off Discord and will be exclusively available on arena.ai.

This change was driven largely by community feedback requesting new features that aren’t possible to support through a Discord bot. Moving Video Arena to our site gives us the flexibility to build and ship new capabilities that Discord simply can’t support. While the Discord version is going away, Video Arena itself is not. It’s still fully available on our site and will continue to improve over time.

Starting Wednesday, February 11th at 4pm PST, Video Arena will no longer be available through Discord. The Video Arena site experience on arena.ai/video is unaffected and will remain fully accessible.

Thank you everyone for being vocal about how you want to see Video Arena improve. This move helps us build a better experience for everyone.

#

Image Arena Leaderboard Update - Grok Imagine Image

<@&1372208524230397962> <@&1372208635530448926> - The Text-to-Image and Image Edit leaderboards have been updated to include Grok-Imagine-Image.

Text-to-Image leaderboard:

#4 Grok-Imagine-Image; scoring 1170, surpassing Flux-2-max and Nano-banana
#6 Grok-Imagine-Image-Pro

Image Edit leaderboard:

#5 Grok-Imagine-Image-Pro; scoring 1330, overtaking Seedream-4.5
#6 Grok-Imagine-Image

minor tide Feb 9, 2026, 8:26 PM

#

Text Arena and Code Arena Leaderboard Update - Opus 4.6 Thinking

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard and the Code Arena leaderboard have been updated to include `Claude-opus-4-6-thinking`!

Some highlights:

#1 Code Arena: scoring 1576
#1 Text Arena: scoring 1504
In Code Arena: Claude Opus 4.6 takes #1 & #2; Claude Opus 4.5 takes #3 & #5

minor tide Feb 9, 2026, 9:48 PM

#

Updates to the Image Arena Leaderboard

@everyone - Text-to-image models have advanced quickly, and so have use cases. After analyzing 4M+ user prompts (from fantasy art to logos and posters), it’s clear that a single leaderboard is no longer enough to capture real-world use. With that in mind, we’ve updated the Text-to-Image Arena with Prompt Categories & Quality Filtering.

Category-specific leaderboards that surface domain-level performance across common use cases. New categories include:
• Product, Branding & Commercial Design
• 3D Imaging & Modeling
• Cartoon, Anime & Fantasy
• Photorealistic & Cinematic Imagery
• Art
• Portraits
• Text Rendering

To improve reliability, we filtered the prompt set to focus on inputs that consistently deliver quality image generation. After removing ~15% of noisy or underspecified prompts, we recomputed the leaderboard, resulting in more stable, higher-confidence rankings.

These updates are a first step toward more granular, interpretable evaluation of text-to-image models—grounded in how people actually use them. You can now explore how your favorite text-to-image models perform across these categories on the Text-to-Image Arena leaderboard.

Video Arena Discord Reminder

<@&1398740297521037332> - Reminder that on **Wednesday, February 11th @ 4pm PST**, Video Arena will no longer be available through the Discord bot. Video Arena will still be available through the site and is unaffected by this change. This shift allows us to focus our efforts on improving Video Arena with features and capabilities that aren't possible through a Discord bot.

We appreciate everyone who has provided feedback and enjoyed using Video Arena through Discord. Thank you!

minor tide Feb 10, 2026, 5:53 PM

#

Announcing Arena's Academic Partnerships Program

<@&1372208590248742964> <@&1372208635530448926> - Today we’re announcing our Academic Partnerships Program, a new initiative to support independent academic research in AI evaluation, rankings, and measurement.

As AI systems advance and adoption accelerates, the methods we use to evaluate and compare models increasingly shape both scientific progress and real-world outcomes. Many of the most important contributions in this area come from the academic research community, and we’re proud to help support that work directly.

Selected projects may receive up to $50,000 in research funding. We welcome proposals across evaluation methodology, leaderboard design, measurement and statistical validity, preference data and human evaluation, and safety/alignment evaluation.

Learn more about our Academic Partnerships Program here.

Apply to our Academic Partnerships Program by March 31, 2026 here.

minor tide Feb 10, 2026, 7:16 PM

#

PDF Upload is available on Arena

<@&1372208635530448926> - Upload PDFs with your prompts to add richer context and test models on document reasoning, bringing evaluations closer to real-world use. Try it across 10 models today - we’ll be adding more over time.

Leaderboard coming soon. Start uploading, comparing, and voting!

minor tide Feb 11, 2026, 12:13 AM

#

New Models & Video Arena Leaderboard Update

<@&1372208635530448926> <@&1398740297521037332> <@&1372208524230397962> - The Video Arena leaderboards have been updated and high-res 1080p variants for Veo 3.1 now rank #1 and #2 in Video Arena.

• In Text-to-Video, the 1080p versions top the chart

#1 veo-3.1-audio-1080p

#2 veo-3.1-fast-audio-1080p

• In Image-to-Video, 1080p variants make the top 5

#2 veo-3.1-audio-1080p

#5 veo-3.1-fast-audio-1080p

• New models have been added to Video Arena and Text Arena.

veo-3.1-audio-1080p (Video Arena)

veo-3.1-fast-audio-1080p (Video Arena)

step-3.5-flash (Text Arena)

minor tide Feb 11, 2026, 5:54 PM

#

New Model Update

<@&1372208635530448926> - A new model has been added to Text Arena and Code Arena.

glm-5

minor tide Feb 11, 2026, 6:13 PM

#

Multi-file Apps Now Live in Code Arena

<@&1372208635530448926> - Since launching Code Arena in November to evaluate frontier AI models on real-world, agentic coding tasks, we’ve received a lot of feedback asking us to support more complex workflows.

With multi-file apps, you can now build and compare production-ready projects, making it easier to evaluate how top frontier AI models perform on your actual use cases.

minor tide Feb 11, 2026, 11:22 PM

#

Text Arena Leaderboard Update - GLM-5

<@&1372208635530448926> <@&1372208524230397962> - The Text Arena leaderboard has been updated and glm-5 is now #1 among open models.

#1 open model on par with gpt-5.1-high
#11 overall; scoring 1452, +11pts improvement over GLM-4.7

Stay up to date with our Leaderboard Changelog.

minor tide Feb 11, 2026, 11:59 PM

#

Video Arena Discord Reminder

<@&1398740297521037332> - We are currently in the process of removing Video Arena from the Discord bot. Video Arena will still be available through the site and is unaffected by this change. This shift allows us to focus our efforts on improving Video Arena with features and capabilities that aren't possible through a Discord bot.

We appreciate everyone who has provided feedback and enjoyed using Video Arena through Discord. Thank you!

minor tide Feb 12, 2026, 4:39 PM

#

New Model Update

<@&1372208635530448926> - A new model has been added to Text Arena and Code Arena.

Minimax-m2.5

minor tide Feb 12, 2026, 5:18 PM

#

Code Arena Leaderboard Update - GLM-5

<@&1372208635530448926> <@&1372208524230397962> - The Code Arena leaderboard has been updated and GLM-5 is now the #1 open model in Code Arena. It ranks #6 overall, on par with Gemini-3-pro and 100+pts below Claude-Opus-4.6 in agentic webdev tasks.

Arena's AI Capability Lead Peter Gostev shares his first impressions of two powerful models: GLM-5 and MiniMax-M2.5. Give it a watch here.

YouTube

Arena AI

First impressions of GLM-5 and MiniMax-M2.5 | Arena.ai

Try it yourself: https://arena.ai

Arena's AI Capability Lead Peter Gostev shares his first impressions of two powerful models from China: GLM-5 (Zhipu AI) and MiniMax-M2.5 (MiniMax).

In this deep dive, Peter tests both models' coding and reasoning capabilities to see how they stack up against leading models like Claude Opus 4.6 and Gemini 3 P...

minor tide Feb 17, 2026, 3:53 PM

#

New Model Update

<@&1372208635530448926> - A new model has been added to Text, Vision, and Code Arena.

qwen3.5-397b-a17b

minor tide Feb 17, 2026, 7:30 PM

#

New Model Update

<@&1372208635530448926> - A new model has been added to Text and Code Arena.

claude-sonnet-4-6

minor tide Feb 18, 2026, 12:23 AM

#

First impressions of Claude Sonnet 4.6

<@&1473460945308221542> - Check out our newest YouTube video with Arena's AI Capability Lead Peter Gostev sharing his first impressions of Claude Sonnet 4.6, Anthropic's latest model in the Claude family.

https://www.youtube.com/watch?v=b0yr1I0dxA4

Want the new YouTube Updates role? Just head to Channels & Roles (in the channel list), click Customize, choose What brings you here, and select YouTube Updates.

YouTube

Arena AI

First impressions of Claude Sonnet 4.6 | Arena.ai

Try it yourself: https://arena.ai

Arena's AI Capability Lead Peter Gostev shares his first impressions of Claude Sonnet 4.6, Anthropic's latest model in the Claude family.

In this deep dive, Peter tests Claude Sonnet 4.6's coding and reasoning capabilities to see how it stacks up against leading models like Claude Opus 4.6, Gemini 3 Pro, and G...

minor tide Feb 18, 2026, 4:33 PM

#

New Model Update

<@&1372208635530448926> - New models have been added to Search Arena.

sonnet-4.6-search

opus-4.6-search

minor tide Feb 18, 2026, 9:12 PM

#

Arena Leaderboard UI Update

@everyone - Millions of votes power the leaderboard. Now you can filter for what matters to you. A new side panel lets you filter and break down ranked results to find the best model for your task. Some highlights:

• Filter by category (e.g. Coding, Expert prompts)
• Open vs. Proprietary Models
• Rank labs by their top-performing models

Check it out, and let us know what you think in #leaderboards.

minor tide Feb 19, 2026, 12:29 AM

#

A quick look at Arena's updated leaderboard UI

<@&1473460945308221542> - Join our Designer, Justin Keoninh, for a walkthrough of the new leaderboard UI updates and learn how to make the most of the latest enhancements.

https://www.youtube.com/watch?v=xfmcR6-Uh5Q

Want the new YouTube Updates role? Just head to **Channels & Roles** (in the channel list), click Customize, choose What brings you here, and select YouTube Updates.

YouTube

Arena AI

A quick look at Arena's updated leaderboard UI | Arena.ai

https://arena.ai

Millions of votes power the Arena leaderboard. Now you can filter for what matters to you. A new side panel lets you filter and break down ranked results to find the best AI model for your task.

#arenaai #llmevaluation

minor tide Feb 19, 2026, 4:22 AM

#

Text Arena Leaderboard Update - Qwen3.5-397B-A17B

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard has been updated to include Qwen3.5-397B-A17B.

In the highly competitive Text Arena, a few highlights:
• #20 overall in Text, on par with Claude Opus 4.1 variants
• Top 5 open for key categories in Text like: Math, Instruction Following, Multi-Turn, Creative Writing and Coding
• Top 5 for open models in Arena Expert (#26 overall)

minor tide Feb 19, 2026, 4:26 PM

#

Text and Code Arena Leaderboard Update - Gemini 3.1 Pro

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena and Code Arena leaderboards have been updated and now include Gemini-3.1-Pro. It’s top 3 across Text and Vision Arena, and #6 in Code Arena, tied closely with Claude Opus 4.5.

Highlights:
• Tied #1 in Text (scoring 1500), only 4 pts from Opus 4.6
• Top 3 in Arena Expert (scoring 1538), just behind Opus 4.6
• #6 in Code Arena, on par with Opus 4.5 and GLM-5

minor tide Feb 19, 2026, 7:05 PM

#

New Model Update

<@&1372208635530448926> - A new model has been added to Text Arena!

trinity-large

minor tide Feb 20, 2026, 4:35 PM

#

Code Arena and Text Arena Leaderboard Update - Sonnet 4.6

<@&1372208524230397962> <@&1372208635530448926> - The Code Arena leaderboard and Text Arena leaderboard have been updated to include Claude-sonnet-4.6.

Highlights:

+130 pts jump in Code Arena (#22 -> #3) compared to Sonnet 4.5, surpassing top-tier thinking models like Gemini-3.1 and GPT-5.2
Strong gains in Text categories: Math (#4) and Instruction Following (#5), Overall (#13)

minor tide Feb 20, 2026, 5:48 PM

#

Video Arena Channels Update

<@&1398740297521037332> - We’re planning to remove the Video Arena generation channels from the server on Monday 2/23 @ 4pm PST. If you’d like to download any generations, please make sure to do so before that date.

#

What happens to your Arena vote?

<@&1473460945308221542> - Ever wondered what actually happens after you vote on Arena? Clayton breaks down the full journey.

https://www.youtube.com/watch?v=omT1ohYG53E

YouTube

Arena AI

How Arena turns your vote into research-grade data

https://arena.ai

Ever wondered what actually happens after you vote on Arena? Clayton breaks down the full journey — from raw vote to research-grade data — including how Arena tags prompts by category, filters spam and duplicates, and ensures every data point is legitimate.

0:00 Introduction
0:08 How prompts get tagged (coding, math, creat...

minor tide Feb 20, 2026, 7:46 PM

#

Vision Leaderboard Update - Qwen3.5-397B-A17B

<@&1372208524230397962> <@&1372208635530448926> - The Vision Arena leaderboard has been updated to include Qwen3.5-397B-A17B. It's now tied with Kimi-K2.5-Instant as a top 2 open model in Vision Arena and ranks #13 overall, on par with proprietary models like GPT-4o.

minor tide Feb 23, 2026, 4:15 PM

#

Text Leaderboard Update - GPT-5.2-chat-latest

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard has been updated to include GPT-5.2-chat-latest, now in the top 5.
Highlights include:
• Top 5, scoring 1478, on par with Gemini-3-Pro
• +40pt improvement over the GPT-5.2 model
• Top in key categories: Multi-Turn, Instruction-Following, Hard Prompts, Coding

minor tide Feb 24, 2026, 5:05 AM

#

Image Arena Leaderboard Update - Reve V1.5

<@&1372208524230397962> <@&1372208635530448926> - The Image Arena leaderboard has been updated to include Reve V1.5.

Highlights:
• #4, scoring 1177, on par with Grok-Imagine-Image
• Top 5 for categories: Text Rendering, Art, and Product, Branding & Commercial Design

minor tide Feb 24, 2026, 5:05 PM

#

Code Arena Leaderboard Update - Qwen3.5-397B-A17B

<@&1372208524230397962> <@&1372208635530448926> - The Code Arena leaderboard has been updated and now includes Qwen3.5-397B-A17B.

Highlights:
• Top 7 open model
• Ranks #17 overall, on par with proprietary models like GPT-5.2 and Gemini-3-Flash

minor tide Feb 24, 2026, 6:00 PM

#

New Model Update

A new model has been added to Image Arena.

seedream-5.0-lite

#

Video Arena Leaderboard Update - Wan2.6-t2v

<@&1372208524230397962> <@&1372208635530448926> - The Text-to-Video leaderboard and the Image-to-Video leaderboard have been updated and now include Wan2.6-t2v.

Highlights:
• Wan2.6-t2v is the #1 Chinese model in the Video Arena
• Top 8 for Text-to-Video, scoring 1346, on par with Veo-3-fast-audio
• #12 for Image-to-Video, scoring 1292, close to Seedance v1.5 pro and Kling 2.6 pro

minor tide Feb 25, 2026, 8:02 AM

#

Search and Text Arena Leaderboard Update - Grok 4.20 beta1

<@&1372208524230397962> <@&1372208635530448926> - The Search Arena leaderboard and Text Arena leaderboard have been updated and now include Grok-4.20-Beta1.

Highlights:
• #1 in Search Arena, scoring 1226, leading GPT-5.2 and Gemini-3
• #4 in Text Arena, scoring 1492, on par with Gemini 3.1 Pro

minor tide Feb 25, 2026, 6:08 PM

#

New Model Update

<@&1372208635530448926> - New models have been added to Code, Text, and Vision Arena.

Code Arena

qwen3.5-27b-code

qwen3.5-35b-a3b-code

qwen3.5-122b-a10b-code

Text and Vision Arena

qwen3.5-27b

qwen3.5-35b-a3b

qwen3.5-122b-a10b

minor tide Feb 26, 2026, 1:18 AM

#

Image Edit Leaderboard Update - Seedream-5.0-Lite

<@&1372208524230397962> <@&1372208635530448926> - The Image Edit Arena leaderboard has been updated. Seedream-5.0-Lite now ties for top 5 on the Multi-Image Edit Arena.

Highlights:
• Ranks #10 in Single-Image, scoring 1301, on par with Hunyuan-Image-3.0 and Nano Banana
• Ranks #23 overall for Text-to-Image Arena, scoring 1106

minor tide Feb 26, 2026, 3:25 PM

#

Video Arena Leaderboard Update - P-Video

<@&1372208524230397962> <@&1372208635530448926> - P-Video enters the Video Arena leaderboards in the top 26.

Highlights:
• Tied for #22 in Text-to-Video, score 1178, on par with Hailuo 2.0 and Kandinsky 5.0 Pro
• Top 26 for Image-to-Video, score 1199, on par with Hailuo 2.0 Fast
• Their fastest model with pricing at $0.04/second for 1080p

minor tide Feb 26, 2026, 4:15 PM

#

Image Arena Leaderboard Update - Nano Banana 2

<@&1372208524230397962> <@&1372208635530448926> - Nano Banana 2 debuts at #1 in Image Arena, and it changes the game again 🍌 Officially released as Gemini-3.1-Flash-Image-Preview, it introduces a new web search capability, unlocking image generation grounded in real-world context.

Highlights:
• #1 Text-to-Image, scoring 1279, surpassing GPT-Image-1.5 and Nano Banana Pro
• Ties for #1 Single-Image Edit, scoring 1407, on par with ChatGPT-Image-Latest
• Top 3 Multi-Image Edit, alongside Nano Banana Pro variants
• $0.067 per image, ~2x cheaper than Nano Banana Pro

sand grail Feb 26, 2026, 5:36 PM

#

Why the best AI models make the worst in-app assistants

<@&1473460945308221542> - Peter covers three reasons AI agents underperform inside existing software.

https://www.youtube.com/watch?v=qF8afKUGRpc

Want the new YouTube Updates role? Head to Channels & Roles (in the channel list), click Customize, choose What brings you here, and select YouTube Updates.

YouTube

Arena AI

Why the best AI models make the worst in-app assistants

https://arena.ai

You've probably noticed that AI feels incredibly powerful in tools like Claude or ChatGPT, but weirdly disappointing when it's built into the apps you already use. It's not your imagination.

In this clip, we cover three reasons AI agents underperform inside existing software: the model you're actually getting isn't the flagshi...

minor tide Feb 26, 2026, 7:01 PM

#

Search Arena Leaderboard Update - Claude Opus & Sonnet 4.6

<@&1372208524230397962> <@&1372208635530448926> - The Search Arena leaderboard has been updated to include Claude-Opus-4-6 and Claude-Sonnet-4-6.

Highlights:
• Opus 4.6 takes #1 with a wide lead, scoring 1255, +30pt over Grok-4.20-beta1, GPT-5.2 and Gemini-3
• Sonnet 4.6 ranks #7, on par with GPT-5.1

minor tide Feb 27, 2026, 12:49 AM

#

New Model Update

<@&1372208635530448926> - A new model has been added to Code Arena!

gpt-5.3-codex

minor tide Feb 27, 2026, 2:18 AM

#

Video Arena Leaderboard Update - Kling V3 Pro

<@&1372208524230397962> <@&1372208635530448926> - The Video Arena leaderboard has been updated to include Kling-V3-Pro.

Highlights:
dot4 tied #8, scoring 1337 on par with Wan2.5-i2v-preview
dot5 +52pt improvement over Kling 2.6 Pro
dot3 +48pt over Kling-2.5-turbo-1080p

sand grail Feb 27, 2026, 5:00 PM

#

7 new categories for Image Arena | Arena.ai text-to-image update

<@&1473460945308221542> Guanglei Song, PhD introduces 7 new categories in Image Arena to find the top models for photorealistic, 3D modeling, and more.

https://www.youtube.com/watch?v=kWK18CEbSag

YouTube

Arena AI

7 new categories for Image Arena | Arena.ai text-to-image update

See top models: https://arena.ai/leaderboard/text-to-image

Arena just dropped 7 new categories for evaluating text-to-image models, such as photorealistic & cinematic imagery or 3D imaging & modeling. Guanglei Song, PhD and engineering manager at Arena, talks about how they us...

sand grail Mar 2, 2026, 8:02 PM

#

How millions of people compare AI models | Arena.ai explained in 60 seconds

<@&1473460945308221542> Arena in 60 seconds. What did we miss?

https://www.youtube.com/watch?v=nktiDGTn61I

YouTube

Arena AI

How millions of people compare AI models | Arena.ai explained in 60...

Try it free: https://arena.ai

How millions of people compare AI models - explained in 60 seconds. Enter a prompt, two anonymous AI models battle it out, you pick the winner, and your vote powers the world's most trusted AI leaderboard. Filter by skill, explore modalities beyond text - from image generation to full app building - and chat with s...

sly rock Mar 2, 2026, 11:04 PM

#

Video Arena Leaderboard Update - Runway Gen-4.5

<@&1372208524230397962> <@&1372208635530448926> - The Text-to-Video Arena leaderboard has been updated to include Runway Gen 4.5.
dot1 Gen-4.5 scores 1218, on par with KlingAI’s Kling-2.6-Pro.

sly rock Mar 3, 2026, 4:57 PM

#

Text & Code Arena Leaderboard Update - Gemini-3.1-Flash Lite

<@&1372208524230397962> <@&1372208635530448926> - The leaderboards have been updated to include Gemini-3.1-Flash-Lite-Preview for Text and Code Arena.

Highlights:
dot1 ranks #36 in Text, scoring 1432, on par with Grok-4.1-fast, strong in Creative Writing, and Longer Query
dot2 surpassing larger Gemini 2.5 Flash and GPT-5-mini
dot3 tied in Code Arena for #35 scoring 1261, on par with Qwen3-coder for agentic webdev tasks

sand grail Mar 3, 2026, 6:08 PM

#

Document Arena walkthrough on Arena.ai | make the best AI models compete

<@&1473460945308221542> - in Document Arena, you can upload a PDF and watch two anonymous AI models go head-to-head.

https://www.youtube.com/watch?v=cIU3-gt_Kro

YouTube

Arena AI

Document Arena walkthrough on Arena.ai | make the best AI models co...

https://arena.ai

Which AI model is best at document reasoning? In Document Arena, you can upload a PDF and watch two anonymous AI models go head-to-head — then vote for the one that gives the better response. In this video, Arena engineer Kelsey walks us through how Document Arena works.

0:00 Introduction to Document Arena
0:31 Solving homew...

sly rock Mar 3, 2026, 7:43 PM

#

Document Arena Leaderboard is now Live!

<@&1372208524230397962> <@&1372208635530448926> - The Document Arena leaderboard has been added. The Document Arena displays model rankings based on side-by-side evaluations of real-world document reasoning performance across user-uploaded PDF files.

See which frontier AI models rank highest in document reasoning, all powered by side-by-side evaluations on user-uploaded PDFs from real work use cases.

dot1 #1 is Claude Opus 4.6 scoring 1525, +51 pts in the lead
dot2 While Opus 4.5 and Gemini 3.1 Pro Preview join in the top 3
dot3 Latest GPT-5.2 tied at #9, ~100 pts behind Opus 4.6
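For a rough sense of what these point gaps mean, here is a minimal sketch that converts a score difference into an expected head-to-head win rate. It assumes Arena scores sit on the usual Elo-style logistic scale (base 10, 400 points), which this post does not state; `expected_win_rate` is a hypothetical helper name, not part of any Arena tooling.

```python
def expected_win_rate(score_gap: float, scale: float = 400.0) -> float:
    """Expected win rate of the higher-rated model for a given score gap.

    Assumption: an Elo-style logistic curve with base 10 and a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-score_gap / scale))


# Gaps quoted in the highlights above: +51 pts for the #1 lead, ~100 pts for GPT-5.2.
lead_win_rate = expected_win_rate(51)
gpt_gap_win_rate = expected_win_rate(100)

print(f"+51 pts  -> {lead_win_rate:.1%} expected win rate")
print(f"+100 pts -> {gpt_gap_win_rate:.1%} expected win rate")
```

Under that assumption, even a 100-point gap still leaves the lower-ranked model winning a meaningful share of battles.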

minor tide Mar 3, 2026, 9:56 PM

#

New Model Update -

<@&1372208635530448926> - New models have been added to Text Arena and Video Arena!

GPT-5.3-Chat-Latest (Text Arena)

PixVerse V5.6 (Video Arena)

sand grail Mar 4, 2026, 5:52 PM

#

Why an AI router beats every model on Arena | Max deep dive

<@&1473460945308221542> - Arena ML researchers Derry and Evan go behind the scenes of Arena's new Max intelligent router.

https://www.youtube.com/watch?v=nO6E5t6dmA0

Want the new YouTube Updates role? Head to Channels & Roles (in the channel list), click Customize, choose What brings you here, and select YouTube Updates.

YouTube

Arena AI

Why an AI router beats every model on Arena | Max deep dive

Try Max → https://arena.ai/max

Arena researchers Derry and Evan break down Max, the intelligent router that topped every category on Arena's leaderboard — not by being one model, but by picking the best model for every prompt.

In this deep dive, they walk through the Max data and explain how routing across labs combines the strengths of di...

sly rock Mar 5, 2026, 1:06 AM

#

Text & Code Arena Leaderboard Update - Qwen3.5 Medium Models

<@&1372208524230397962> <@&1372208635530448926> - The Text & Code leaderboards have been updated to include the Qwen 3.5 medium models: qwen3.5-27b, qwen3.5-35b-a3b, qwen3.5-122b-a10b, and qwen3.5-flash.

Code Arena Highlights:
dot1 top 10 open Qwen3.5-122b-a10b, scoring 1384 and Qwen3.5-27b, scoring 1375 both very close to proprietary models: Claude Sonnet 4.5 and GPT-5.1-medium
dot2 Qwen3.5-35b-a3b, scoring 1257 is on par with new Gemini-3.1-flash-lite-preview
dot3 Qwen3.5-Flash, scoring 1243 is on par with GPT-5.1-codex-mini

Text Arena Highlights:
dot1 top 10 open Qwen3.5-122b-a10b, scoring 1420
dot2 Qwen3.5-27b, the smallest and densest, scores 1410, on par with GLM-4.5
dot3 Qwen3.5-35b-a3b, scoring 1392, and Qwen3.5-Flash, scoring 1395, are on par with the 6-7x larger last-generation model Qwen3-235b

sly rock Mar 5, 2026, 7:46 PM

#

Text Arena Leaderboard Update - GPT-5.4

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard has been updated to include gpt-5.4 and gpt-5.4-high.

Highlights:
dot1 GPT-5.4-high is tied with Gemini-3-Pro.
dot2 Top 3 in Creative Writing, and top 10 in Instruction Following, Hard Prompts.
dot3 Top 6 for Occupational categories - Writing, Literature & Language, Entertainment, Sports & Media, Business, Management & Financial Ops.
dot4 GPT-5.4 (reasoning none) ranks #16

sand grail Mar 5, 2026, 11:11 PM

#

First impressions of OpenAI GPT 5.4 | Arena.ai

<@&1473460945308221542> - AI capability lead Peter Gostev runs through one-shot tests to see how GPT 5.4 compares to other models.

https://www.youtube.com/watch?v=foEfcttIuiI

Want the new YouTube Updates role? Head to Channels & Roles (in the channel list), click Customize, choose What brings you here, and select YouTube Updates.

YouTube

Arena AI

First impressions of OpenAI GPT 5.4 | Arena.ai

Try it yourself: https://arena.ai

Arena's AI Capability Lead Peter Gostev shares his first impressions of GPT-5.4, OpenAI's latest frontier model in the GPT-5 series.

In this deep dive, Peter tests GPT-5.4's coding and reasoning capabilities to see how it stacks up against leading models like Claude Opus 4.6, Gemini 3.1 Pro, and GPT-5.2.

From...

sand grail Mar 6, 2026, 10:48 PM

#

OpenAI’s GPT-5.4-High is in the Arena | Arena.ai

<@&1473460945308221542> - no words, just visuals.

https://www.youtube.com/watch?v=wwtMv4hPv54

YouTube

Arena AI

OpenAI’s GPT-5.4-High is in the Arena | Arena.ai

https://arena.ai/code

Check out OpenAI’s GPT-5.4 for yourself in the Code Arena.

#arenaai #OpenAI #GPT54

sly rock Mar 7, 2026, 12:25 AM

#

Video Arena Leaderboard Update - PixVerse V5.6

<@&1372208524230397962> <@&1372208635530448926> - The Video Arena leaderboards have been updated to include pixverse-v5.6

Highlights:
dot1 #15 on Text-to-Video
dot2 #15 on Image-to-Video

minor tide Mar 9, 2026, 3:12 PM

#

Document Arena Leaderboard Update - Claude Sonnet 4.6

<@&1372208524230397962> <@&1372208635530448926> - The Document Arena leaderboard has been updated to include claude-sonnet-4-6

dot1 #2 ranking overall
dot2 top 3 are all Anthropic models

sand grail Mar 10, 2026, 9:22 PM

#

First impressions of OpenAI GPT 5.4 | Arena.ai

<@&1473460945308221542> - After testing GPT 5.4-medium and coming away underwhelmed, Peter revisits the model at higher reasoning levels — and the difference is massive.

https://www.youtube.com/watch?v=4T9_deFRI30

YouTube

Arena AI

First impressions of OpenAI GPT 5.4 High

Try it yourself: https://arena.ai

See also First impressions of GPT 5.4 Medium:
https://youtu.be/foEfcttIuiI

After testing GPT 5.4-medium and coming away underwhelmed, the Arena team revisited the model at higher reasoning levels — and the difference is massive.

This video compares GPT 5.4 at medium, high, and pro reasoning levels side by s...

minor tide Mar 11, 2026, 4:15 PM

#

Text Arena Leaderboard Update

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard has been updated and now includes Nemotron-3-Super-120B-A12B.

Highlights:
dot1 #37 open rank in Expert
dot5 #38 open rank in Math
dot3 #118 Text overall

sand grail Mar 11, 2026, 6:14 PM

#

How AI search actually works (and why it breaks) | Arena.ai

<@&1473460945308221542> - Every LLM can retrieve sources but the real challenge is reasoning about which ones to trust → arena.ai/search

https://www.youtube.com/watch?v=iy1HGPAK5H4

YouTube

Arena AI

How AI search actually works (and why it breaks) | Arena.ai

Try Search Arena → https://arena.ai/search

How does an LLM actually search the web? And why does it still get things wrong?

In this video, Arena researcher Logan King breaks down how search-augmented large language models work — from tool calls and context management to hallucination, misattribution, and source quality. He explains why eve...

minor tide Mar 11, 2026, 8:17 PM

#

Text and Document Leaderboard Update

<@&1372208524230397962> <@&1372208635530448926> - GPT-5.4 lands tied #2 on Document Arena and in top 5 for Arena Expert. In the Document Arena, top models for document analysis and long-form reasoning are ranked based on real-world use.

Highlights:
dot1 #2 tied with Sonnet 4.6
dot3 #5 for Arena Expert
dot5 top 10 in Business, Management, & Financial Ops and Writing, Literature, & Language
dot2 top 15 in Math, Instruction Following, Multi-Turn & Hard Prompts
dot4 top 15 in Text Arena overall

sand grail Mar 12, 2026, 4:03 PM

#

The Nano Banana origin story: how an anonymous Google model made Arena history

<@&1473460945308221542> - Are you old enough to remember the Nano Banana hype? Meet the engineer who added it to the Arena - Yue!

https://www.youtube.com/watch?v=6vJnfrr34Xc

YouTube

Arena AI

The Nano Banana origin story: how an anonymous Google model made Ar...

Try Image Arena → https://arena.ai/image

An anonymous image generation model appeared on Arena and quickly became the most-voted model in the platform's history. It was called Nano Banana — and it turned out to be built on Google Gemini.

In this video, we sit down with Yue, a former Google employee, to break down what made Nano Banana so s...

minor tide Mar 12, 2026, 4:51 PM

#

Code Arena Leaderboard Update

<@&1372208524230397962> <@&1372208635530448926> - The Code Arena leaderboard has been updated and GPT-5.4-high has landed in the top 6.

Highlights:
dot5 top 6 in Web Dev overall
dot2 #6 for Multi-File React
dot3 top 10 for Single-File HTML

sand grail Mar 13, 2026, 9:44 PM

#

Can AI spot nonsense? We tested 80 models — thinking ones did worst

<@&1473460945308221542> - Peter takes us through his viral benchmark 💩

https://www.youtube.com/watch?v=bOLXvFMqhi8

YouTube

Arena AI

Can AI spot nonsense? We tested 80 models — thinking ones did worst

Can your AI tell when a question is total nonsense — or does it just make up an answer and hope you don't notice?

Arena researcher Peter Gostev built a benchmark to find out. He crafted nonsense questions across domains like law, finance, and tech, then tested 80 models to see which ones pushed back and which ones played along. The results? S...

minor tide Mar 16, 2026, 9:35 PM

#

Text and Code Arena Leaderboard Update - Grok 4.20 Beta Reasoning

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena and Code Arena leaderboards have been updated to include Grok 4.20 Beta Reasoning.

Highlights:
dot4 #7 in Text Arena overall tied with GPT-5.4-high
dot3 top 10 in Math, Multi-Turn, Creative Writing, Coding & Hard Prompts
dot2 top 15 in Expert Arena

minor tide Mar 17, 2026, 6:38 PM

#

Customize Arena Leaderboards

<@&1372208524230397962> - Everyone's real-world use for AI differs. Select the columns and data that matter most to you:

Rank Spread
Model Organization
License
Total Votes
Price ($/MToken)
Max Context

minor tide Mar 17, 2026, 7:03 PM

#

Video Edit Leaderboard Launch

<@&1372208524230397962> - Today we’re launching the Video Edit Arena to evaluate the frontier capability of video models! The leaderboard is powered by thousands of real-world community votes. Click the Edit button in Video Arena to edit any video and compare top model outputs. More models coming soon!

#1 Grok-Imagine-Video
#2 Kling-o3-pro
#3 Kling-o1-pro
#4 Gen4-aleph

minor tide Mar 17, 2026, 7:53 PM

#

New Model Update -

<@&1372208635530448926> - New models have been added to Text Arena & Vision Arena!

gpt-5.4-mini-high

gpt-5.4-nano-high

minor tide Mar 18, 2026, 4:11 PM

#

New Model Update -

<@&1372208635530448926> - A new model has been added to Text Arena and Code Arena!

minimax-m2.7

sand grail Mar 18, 2026, 5:50 PM

#

Pick the most reliable AI, not the smartest one

<@&1473460945308221542> - Peter explains why the most underrated quality in AI models isn't how smart they are - it's how consistently they perform.

https://www.youtube.com/watch?v=IaewLXbgMIQ

YouTube

Arena AI

Pick the most reliable AI, not the smartest one

https://arena.ai

Arena's AI Capability Lead Peter Gostev explains why the most underrated quality in AI models isn't how smart they are - it's how consistently they perform, and why that distinction changes everything for builders.

#AI #LLM #arenaai #softwaredevelopment

minor tide Mar 18, 2026, 10:43 PM

#

Code Arena Leaderboard Update - MiniMax M2.7

<@&1372208524230397962> <@&1372208635530448926> - The Code Arena leaderboard has been updated to include MiniMax-M2.7

minor tide Mar 19, 2026, 4:36 PM

#

Text Arena and Expert Leaderboard Updated - Qwen 3.5 Max Preview

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena and Expert Arena leaderboards have been updated to include Qwen3.5-max-preview.

Highlights:
dot1 #3 Math
dot2 #10 Expert
dot3 #15 Text Arena
dot4 Top 20 for Writing, Literature & Language, Life, Physical, & Social Science, Entertainment, Sports, & Media, and Medicine & Healthcare

minor tide Mar 19, 2026, 7:39 PM

#

Image Arena and Vision Arena Leaderboard Updated - MAI-Image-2 & Grok-4.20-Beta-Reasoning

<@&1372208524230397962> <@&1372208635530448926> - The Image Arena leaderboard has been updated to include MAI-Image-2.

Highlights:
dot3 #5 in Text-to-Image overall
dot4 #5 for 3D Imaging & Modeling, Cartoon, Anime & Fantasy, Photorealistic & Cinematic Imagery, Art and Portraits
dot2 #6 for Product, Branding & Commercial Design

The Vision Arena leaderboard has also been updated to include Grok-4.20-Beta-Reasoning

Highlights:
dot5 Scoring 1240
dot1 #11 across all Vision

minor tide Mar 19, 2026, 11:52 PM

#

Reporting Process for “Something Went Wrong” Errors

<@&1372208635530448926> - We’ve recently updated our bug reporting process. If you encounter a Something went wrong error, you’ll now see a Trace ID. We recommend the following steps when this happens:

Try the troubleshooting steps in this article.
Confirm the issue isn’t related to rate limits, as outlined in this article.
Submit the Trace ID using this form. This helps us identify the root causes and resolve issues over time.

minor tide Mar 20, 2026, 3:40 PM

#

New Model Update -

<@&1372208635530448926> - A new model has been added to Vision Arena!

mimo-v2-omni

sand grail Mar 20, 2026, 5:17 PM

#

Battle mode vs. side by side vs. direct chat | Arena.ai

<@&1473460945308221542> - three ways to test AI - each designed for a different use case.

https://www.youtube.com/watch?v=JdcoHxnPouM

YouTube

Arena AI

Battle mode vs. side by side vs. direct chat | Arena.ai

Try it free: https://arena.ai

Arena.ai gives you three distinct ways to interact with the world's top AI models — each designed for a different use case.

🔥 Battle mode — Pit two anonymous models against each other and vote on the best response. Your votes help shape the largest open LLM leaderboard.
🔀 Side by side — Pick two specif...

minor tide Mar 20, 2026, 7:13 PM

#

Code Arena and Arena Expert Leaderboard Update - MiMo V2 Pro

<@&1372208524230397962> <@&1372208635530448926> - The Code Arena and Arena Expert leaderboards have been updated to include MiMo-V2-Pro.

Highlights:
dot1 top 6 lab, #13 in Code Arena for agentic webdev tasks
dot2 #10 for Arena Expert
dot3 top 20 for Life, Physical, & Social Science and Business, Management, & Financial Ops occupational categories

sand grail Mar 23, 2026, 3:24 PM

#

How to evaluate LLMs | the statistics behind Arena's rankings

<@&1473460945308221542> - curious about the math behind Arena's ranking system? This is a technical deep dive into the core methodology with Anastasios Angelopoulos, co-founder and CEO of Arena.

https://www.youtube.com/watch?v=CnWt0Zarfoc

YouTube

Arena AI

How to evaluate LLMs | the statistics behind Arena's rankings

https://arena.ai

Anastasios Angelopoulos, co-founder and CEO of Arena, presents a technical deep dive into how the platform evaluates large language models using live human preference data.

Prior familiarity with probability and statistics is helpful but not required.
The talk covers Arena's core methodology — pairwise comparisons, Bradley-...

minor tide Mar 23, 2026, 9:47 PM

#

Server Update -

<@&1372208635530448926> - As our community continues to grow, we’re making a few updates to help us provide better and more organized support. Going forward, here are some updated guidelines:
dot2 The #general channel should be used for general AI discussion. If you have a question, please use #ask-here. If you’re reporting a bug, please use #1343291835845578853. Going forward, please don't expect an answer from staff to questions asked in #general.
dot3 If you’d like to reach the team directly, please send a message to @hearty vessel. Direct messages to individual team members (including myself) will no longer be supported.
dot5 Before creating a new thread in #1343291835845578853, please check if there’s already an existing thread for the issues you're experiencing. Duplicate posts may be removed to help keep things organized.

sand grail Mar 25, 2026, 8:21 PM

#

Create an Arena account and manage your chats

<@&1473460945308221542> - how to create an Arena account to unlock higher rate limits, cross-device chat sync, and access additional features.

https://www.youtube.com/watch?v=1Nee2fIlvy8

YouTube

Arena AI

Create an Arena account and manage your chats

https://arena.ai

Create an Arena account to unlock higher rate limits, cross-device chat sync, and access to additional features. This quick walkthrough covers signing up with Google or email, plus how to save, archive, and delete your chats.

#arenaai

minor tide Mar 26, 2026, 7:14 PM

#

Search Arena Leaderboard Updated - Gemini 3.1 Pro Grounding

<@&1372208524230397962> <@&1372208635530448926> - The Search Arena leaderboard has been updated to include Gemini-3.1-Pro-Grounding.

sand grail Mar 27, 2026, 4:19 PM

#

Are Open Source models catching up to Proprietary models?

<@&1473460945308221542> - We looked at 3 years of Arena data. Proprietary models still hold the top 20 spots — but open source is climbing — with the top open source models ranked 20 (GLM-5 by Z.ai), 23 (Kimi-K2.5-Thinking by Moonshot AI) and 27 (Qwen3.5-397b-a17b by Alibaba).

https://www.youtube.com/shorts/kw__8_0AUx4

YouTube

Arena AI

Are Open Source models catching up to Proprietary models?

https://arena.ai/leaderboard/text

We looked at 3 years of Arena data. Proprietary models still hold the top 20 spots — but open source is climbing — with the top open source models ranked 20 (GLM-5 by Z.ai), 23 (Kimi-K2.5-Thinking by Moonshot AI) and 27 (Qwen3.5-397b-a17b by Alibaba).

Here's how the race has evolved.

#arenaai #opensource ...

minor tide Mar 27, 2026, 4:26 PM

#

Text Arena Leaderboard Update - GPT-5.4-Mini-High

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard has been updated to include GPT-5.4-Mini-High.

Highlights:
dot1 #3 Business, Management & Financial Ops
dot2 #10 Multi-Turn
dot3 #13 Arena Expert
dot4 #17 Legal & Government
dot5 #19 Instruction Following

sand grail Mar 30, 2026, 4:54 PM

#

Big model smell: will one giant AI model replace all the small ones?

<@&1473460945308221542> Arena ML researchers Evan and Derry discuss whether the future of AI belongs to massive generalist models or smaller, fine-tuned specialists.

https://www.youtube.com/watch?v=v74VmZnj6Ww

YouTube

Arena AI

Big model smell: will one giant AI model replace all the small ones?

Arena ML researchers Evan and Derry discuss whether the future of AI belongs to massive generalist models or smaller, fine-tuned specialists.

They break down what "big model smell" actually means (sensing genuine reasoning vs. memorized responses), introduce the idea of "pristine pre-training smell," and debate whether scaling is really hitting...

sand grail Mar 31, 2026, 4:45 PM

#

Troubleshooting guide for Arena.ai

<@&1473460945308221542> Five things to try if you see a "Something went wrong with this response" error and how to submit a bug report if you need extra help.

https://www.youtube.com/watch?v=r7ekTRRSlRs

YouTube

Arena AI

Troubleshooting guide for Arena.ai

Getting a "Something went wrong with this response" error on arena.ai? This video walks through five things to try first, common error messages you might see, and how to submit a bug report if you need extra help.

📋 Full troubleshooting guide:
https://help.arena.ai/collections/3598891525-troubleshooting

🐛 Bug report form:
https://docs....

minor tide Mar 31, 2026, 8:24 PM

#

Text, Vision, and Search Arena Leaderboard Update - Grok 4.20 Multi-Agent Beta Model

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena, Vision Arena, and Search Arena leaderboards have been updated to include Grok-4.20-multi-agent-beta-0309.

Highlights:
dot1 #7 for Search Arena, #11 in Text Arena, #22 in Vision Arena
dot2 #3 Medicine & Healthcare
dot3 #6 Expert Prompts
dot4 #6 Mathematical
dot5 #6 Legal & Government

minor tide Apr 1, 2026, 5:04 PM

#

Pareto chart is now on the Leaderboards

<@&1372208524230397962> <@&1372208635530448926> - We've added Pareto frontier charts to the leaderboard! The Pareto frontier curve demonstrates which models are most efficient at their level of performance (by Arena score) vs. a blended price per 1M tokens (3:1 Ratio).

Now available across:
dot2 Text Arena
dot4 Vision Arena
dot1 Search Arena
dot3 Document Arena
dot5 Code Arena

In this video, Peter and Justin walk through how to read the Pareto frontier, find hidden gems, and compare models across categories.

YouTube

Arena AI

Stop overpaying for AI — how to use Arena's Pareto chart

https://arena.ai

The top of the leaderboard doesn't tell the whole story. Arena's Pareto chart shows you which AI models give you the best performance at every price point — so you can stop defaulting to the most expensive option and start picking the right model for your budget and your task.

In this video, Peter and Justin walk through how...

minor tide Apr 1, 2026, 6:08 PM

#

New Model Update

<@&1372208635530448926> - A new model has been added to Vision Arena!

GLM-5V-Turbo

minor tide Apr 2, 2026, 3:41 PM

#

Code Arena Leaderboard Update - Qwen 3.6 Plus

<@&1372208524230397962> <@&1372208635530448926> - The Code Arena leaderboard has been updated to include Qwen 3.6 Plus. Qwen 3.6 Plus Preview is the #2 lab for the React leaderboard in Code Arena, which ranks models based on agentic workflows involving multi-step reasoning, tool use, and multi-file apps.

minor tide Apr 2, 2026, 4:45 PM

#

Text Arena Leaderboard Update - Gemma-4-31B

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard has been updated to include Gemma-4-31B.

Highlights:
dot1 #3 open (#27 overall), on par with the best open models Kimi-K2.5, Qwen-3.5-397b
dot2 Top 3 across Math, Instruction Following, Multi-Turn, Hard Prompts, Creative Writing, and Coding
dot3 Apache 2.0 license
dot4 Its efficient variant: Gemma-4-26B-A4B is #6 open (#39 overall)

minor tide Apr 2, 2026, 8:47 PM

#

Arena Leaderboard Dataset Release

<@&1372208524230397962> <@&1372208635530448926> - We're releasing the full history of Arena leaderboard data as a public dataset: nearly 3 years of rankings across 10 Arenas, dozens of categories, and hundreds of models. It is optimized to empower analysis and unlock new insights across modalities and over time.

Learn more in our blog post here.

Find the dataset on our Hugging Face here.

minor tide Apr 3, 2026, 12:41 AM

#

Changes to Models Available in Direct and Side-by-Side

<@&1372208635530448926> - We are changing some of the models available in Direct and Side-by-Side modes. These changes are part of our efforts to ensure that we can continue offering access to AI models while keeping the platform running reliably.

The following models are being removed from Direct and Side-by-Side:

claude opus models

gpt-5.4, gpt-5.4-high

gemini-3.1-pro-preview

Updates like this help us maintain availability for everyone over the long term, and we intend to bring these models back in a way that’s more sustainable when possible.

minor tide Apr 3, 2026, 3:26 PM

#

New Model Update -

<@&1372208635530448926> - A new model has been added to Text Arena & Code Arena!

Qwen-3.6-Plus

sand grail Apr 6, 2026, 10:03 PM

#

How battles in direct changes the way we evaluate LLMs

<@&1473460945308221542> - Longer context windows, more decisive votes, and over 90% correlation with traditional battle rankings — plus new signals about human preference that weren't measurable before.

https://www.youtube.com/watch?v=_pmZJaEbRaQ

YouTube

Arena AI

How battles in direct changes the way we evaluate LLMs

https://arena.ai/text/direct

Battles in direct is a new evaluation mode that triggers battles mid-conversation in direct chat. Unlike traditional battles with four voting options, battles in direct uses three: continue with A, continue with B, or skip. The result? Longer context windows, more decisive votes, and over 90% correlation with tradit...

minor tide Apr 7, 2026, 4:39 PM

#

New Model Update -

<@&1372208635530448926> - A new model has been added to Text Arena and Code Arena!

GLM-5.1

sand grail Apr 7, 2026, 7:30 PM

#

First impressions of Z.ai GLM-5.1 (open source)

<@&1473460945308221542> - GLM-5.1 is a solid incremental upgrade over GLM-5, with slightly higher richness and quality across most generations. But the gap isn't massive.

https://www.youtube.com/watch?v=f11tVBXWr2g

YouTube

Arena AI

First impressions of Z.ai GLM-5.1 (open source)

Try it yourself: https://arena.ai

The Arena team put Z.ai's GLM-5.1 through roughly 100 one-shot generation tests — 3D scenes, SVGs, games, and more — comparing it side by side with Claude Opus 4.6, GLM-5, and Gemini 3.1 Pro.

The verdict: GLM-5.1 is a solid incremental upgrade over GLM-5, with slightly higher richness and quality across mo...

minor tide Apr 7, 2026, 10:26 PM

#

Text Arena Leaderboard Update - GLM-5.1

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard has been updated to include GLM-5.1.

Highlights:
dot1 #1 open model in Longer Query (#4 overall)
dot2 #1 open model in Life, Physical & Social Science (#5 overall)
dot3 #1 open model in Entertainment, Sports & Media (#8 overall)
dot4 #1 open model in Coding (#10 overall)

sand grail Apr 8, 2026, 3:34 PM

#

We just open-sourced 3 years of model benchmark data—over 700+ models

<@&1473460945308221542> - This is one of the longest-running datasets tracking real model performance over time.

https://www.youtube.com/watch?v=QbpW77m90kw

YouTube

Arena AI

We just open-sourced 3 years of model benchmark data—over 700+ mo...

Download here:
https://huggingface.co/datasets/lmarena-ai/leaderboard-dataset

Arena has released a comprehensive public dataset containing every leaderboard entry since May 2023—over 700+ unique models, multiple arenas, continuous evaluation history. This is one of the longest-running datasets tracking real model performance over time.

Ideal...

minor tide Apr 8, 2026, 3:46 PM

#

Planned Change: New Usage System - Share Your Feedback

<@&1372208635530448926> - We're offering an early opportunity to give feedback on a change we’re planning, before we release anything. But before introducing the change, we'd like to provide some context. Right now, Arena manages usage through a mix of per-model limits, session token limits, and modality limits. It works, but it's difficult to track, and we’re confident we can build something better.

What’s changing

We're reworking this into a single, unified system. Instead of separate hidden limits everywhere, you'd have a daily balance of “credits” you can spend however you want, so you can go all-in on Video Arena, stick with one model, bounce between a few, whatever. The goal is to give you way more control and visibility over how you use Arena.

What we don't know yet

We’re still early, so many details are intentionally flexible. Things like credit structure, launch timing, and even the final name are still being refined. We’re sharing this now so feedback like yours can directly shape how it evolves. If you’re wondering what certain aspects will look like, there’s a good chance we’re actively working through those decisions and won't have a direct answer.

Share your feedback

Drop your thoughts in this thread - https://discord.com/channels/1340554757349179412/1491461236448170134 What would make this feel fair? What would you want to know up front? What would actually be useful to see?

minor tide Apr 10, 2026, 3:37 PM

#

Code Arena Leaderboard Update - GLM-5.1

<@&1372208524230397962> <@&1372208635530448926> - The Code Arena leaderboard has been updated to include GLM-5.1.

sand grail Apr 10, 2026, 10:28 PM

#

Humans are not as worthless as you think

<@&1473460945308221542> Derry and Peter break down why verifiable benchmarks only capture a narrow slice of model performance, and why human judgment still matters more than most people think.

https://www.youtube.com/watch?v=tUxCxdcJeg4

YouTube

Arena AI

Humans are not as worthless as you think

https://arena.ai

AI benchmarks like GPQA, MMLU, and SWEbench saturate in months — models score 99% and nobody cares anymore. But when you actually use these models, something's off. Tests pass, scores look great, and the output is still wrong.

Arena researchers Derry Xu and Peter Gostev break down why verifiable benchmarks only capture a nar...

minor tide Apr 10, 2026, 11:02 PM

#

Text & Vision Arena Leaderboard Update - Muse Spark

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena and Vision Arena leaderboards have been updated to include Muse-Spark.

Highlights:
dot1 Text Arena: #3
dot5 Vision Arena: #2
dot2 #4 Hard Prompts, #6 Coding, #9 Creative Writing, #10 Instruction Following, #27 Expert
dot3 #3 tied for Business, Management, & Financial Ops, #7 Legal & Government, #12 Writing & Literature

sand grail Apr 14, 2026, 9:02 PM

#

What top models still suck at | Arena Deep Dive

<@&1473460945308221542> - ICYMI, check out Peter's keynote from the AI Engineer conference in London last week.

https://www.youtube.com/watch?v=zTMQ88btM8s

YouTube

Arena AI

What top models still suck at | Arena Deep Dive

https://arena.ai

Peter Gostev, AI capability lead at Arena, walks through the talk he gave at the AI Engineer conference in London — breaking down where today's top models are still falling short, according to millions of real user votes on Arena.

The key metric: Arena's "both bad" rate, which tracks how often users dislike both model respon...


sand grail Apr 15, 2026, 3:15 PM

#

Ask an Expert Accountant | Arena.ai

<@&1473460945308221542> Can AI function like a professional Accountant? We asked an expert to judge AI responses to a complex Arena Expert-style prompt.

https://www.youtube.com/watch?v=AvoTjdYgFBA

YouTube

Arena AI

Ask an Expert Accountant | Arena.ai

Best AI for your job:
https://arena.ai/leaderboard/text/industry-business-and-management-and-financial-operations

Can AI function like a professional Accountant? We asked an expert Accountant to judge AI responses to a complex prompt regarding required minimum distributions and missing participants. See his reactions and verdicts.

0:00 Intro ...


minor tide Apr 15, 2026, 3:35 PM

#

Video Edit Arena Leaderboard Update - HappyHorse-1.0

<@&1372208524230397962> <@&1372208635530448926> - The Video Edit Arena leaderboard has been updated to include HappyHorse-1.0. HappyHorse-1.0 by Alibaba-ATH debuts at #1 in Video Edit Arena!

minor tide Apr 15, 2026, 7:57 PM

#

Introducing: Image to WebDev Leaderboard

<@&1372208524230397962> <@&1372208635530448926> - The Image to WebDev Leaderboard ranks models on their ability to generate websites from screenshots and images. This dedicated leaderboard shows which models are best at agentic coding of live sites from visual inputs.

Check out the Image to WebDev Leaderboard here!

minor tide Apr 16, 2026, 3:12 PM

#

New Model Update - Claude Opus 4.7

<@&1372208635530448926> - A new model has been added to Text, Vision, Code, Image-to-WebDev, and Document Arena! This model has been added to Battle mode, but is not currently available in Direct or Side-by-Side.

Claude Opus 4.7

minor tide Apr 17, 2026, 3:30 AM

#

Text-to-Video and Image-to-Video Arena Leaderboard Update - HappyHorse-1.0

<@&1372208524230397962> <@&1372208635530448926> - The Text-to-Video leaderboard and Image-to-Video leaderboard have been updated to include HappyHorse-1.0.

Highlights:
🔸 #2 Text-to-Video: Scores 1444
🔸 #2 Image-to-Video: Scores 1444
🔸 Top 2 for all 3 Video Arena leaderboards

minor tide Apr 17, 2026, 4:50 PM

#

Leaderboard Update - Opus 4.7 & Opus 4.7 Thinking

<@&1372208524230397962> <@&1372208635530448926> - The Code Arena, Expert Arena, and Text Arena leaderboards have been updated to include Claude-opus-4-7 and Claude-opus-4-7-thinking!

Highlights - Opus 4.7 Thinking:
🔸 Ranks #1 in Code Arena, +37 points over Opus-4.6, #1 on both React and HTML leaderboards
🔸 Ranks #1 in Expert Arena
🔸 Ranks #1 in Text Arena, leads across major categories: #1 Coding, #1 Software & IT Services, #1 Writing, Literature, & Language, #1 Life, Physical and Social Sciences, #2 Multi-Turn

Highlights - Opus 4.7:
🔸 Ranks #4 in Expert Arena
🔸 Ranks #3 in Text Arena

minor tide Apr 17, 2026, 5:49 PM

#

First impressions of Claude Opus 4.7

<@&1473460945308221542> <@&1372208524230397962> - In this deep dive, Peter tests Claude Opus 4.7’s capabilities to see how it stacks up against leading models in Code Arena, where agentic web development tasks are evaluated by generating live sites and apps.

https://www.youtube.com/watch?v=VE7Pi4gLu0s

YouTube

Arena AI

First impressions of Claude Opus 4.7 | Arena.ai

Arena's AI Capability Lead Peter Gostev shares his first impressions of Claude Opus 4.7, Anthropic's latest model in the Claude family.

In this deep dive, Peter tests Claude Opus 4.7’s capabilities to see how it stacks up against leading models in the Code Arena where agentic web development tasks are evaluated by generating live sites and ap...


minor tide Apr 20, 2026, 3:04 PM

#

Leaderboard Update - Opus 4.7 & Opus 4.7 Thinking

<@&1372208524230397962> <@&1372208635530448926> - The Vision Arena leaderboard and the Document Arena leaderboard have been updated to include Opus-4.7 and Opus-4.7-thinking.

Highlights:
🔸 Opus 4.7 ranks #1 in Document Arena with a score of 1521, while Opus 4.7 Thinking ranks #4 with a score of 1508
🔸 In Vision Arena, Opus 4.7 Thinking ranks #1, while Non-Thinking ranks #3
🔸 Vision sub-categories saw the biggest gains over Opus-4.6: #1 in Diagram (+20), #1 in Homework for Non-Thinking (+30), and #1 in OCR for Non-Thinking (+7)

minor tide Apr 20, 2026, 6:04 PM

#

Leaderboard Update - Qwen 3.6 Plus

<@&1372208524230397962> <@&1372208635530448926> - The Code Arena leaderboard has been updated to include Qwen3.6-plus.

minor tide Apr 20, 2026, 8:49 PM

#

New Model Update - Kimi K2.6

<@&1372208635530448926> - A new model has been added to Text Arena, Vision Arena, Code Arena, and Document Arena!

Kimi K2.6

sand grail Apr 20, 2026, 11:29 PM

#

We built a Japan travel planner from scratch with Code Arena

<@&1473460945308221542> See how agentic coding actually works: tool calling, self-correction, multi-turn iteration, and more.

https://www.youtube.com/watch?v=g-vieNXOF4s

YouTube

Arena AI

We built a Japan travel planner from scratch with Code Arena

https://arena.ai/code

We put Code Arena to the test by building a full Japan travel itinerary website from scratch — complete with AI-generated images and real restaurant recommendations pulled from the web. Watch as two anonymous models go head-to-head in battle mode, and see how agentic coding actually works: tool calling, self-correction, ...


minor tide Apr 21, 2026, 7:29 PM

#

Image Arena Leaderboard Update - GPT-Image-2

<@&1372208524230397962> <@&1372208635530448926> - The Text-to-Image Arena leaderboard and the Image Edit leaderboard have been updated to include GPT-Image-2.

Highlights:
🔸 #1 Text-to-Image (1512), +242 over #2 (Nano-banana-2 with web-search aka gemini-3.1-flash-image)
🔸 #1 Single-Image Edit (1513), +125 over #2 (Nano-banana-pro aka gemini-3-pro-image)
🔸 #1 Multi-Image Edit (1464), +90 over #2 (Nano-banana-2)

https://www.youtube.com/watch?v=Adsaiyr7Nv8

YouTube

Arena AI

First impressions of OpenAI GPT Image 2

Try it yourself: https://arena.ai/image

GPT Image 2 just made the biggest leap Arena has ever recorded — over 200 Arena points ahead of every other image model. Arena's AI Capability Lead Peter Gostev ran 100+ prompts head-to-head against GPT Image 1.5, Grok Imagine, and Nano Banana 2 to find out what's actually driving that gap.

The verdic...


minor tide Apr 22, 2026, 6:27 PM

#

New Model Update - MiMo-V2.5

<@&1372208635530448926> - A new model has been added to [Text](https://arena.ai/), Vision & Code Arena!

MiMo-V2.5

Pro versions added to Text and Code

minor tide Apr 22, 2026, 10:34 PM

#

Code, Vision, Document, and Text Arena Leaderboard Update - Kimi-K2.6

<@&1372208524230397962> <@&1372208635530448926> - The Code, Vision, Document, and Text Arena leaderboards have been updated to include Kimi-K2.6.

Highlights:
🔸 #2 open model in Code Arena (#6 overall)
🔸 #1 open model in Vision Arena (#15 overall)
🔸 #1 open model in Document Arena (#8 overall)
🔸 #2 open model in Text Arena (#24 overall)

Stay up to date with our Leaderboard Changelog.

minor tide Apr 23, 2026, 4:38 PM

#

Text-to-Image and Image Edit Leaderboard Update - Qwen Image 2.0 Pro

<@&1372208524230397962> <@&1372208635530448926> - The Text-to-Image Arena leaderboard and the Image Edit leaderboard have been updated to include qwen-image-2.0-pro-2026-04-22.

Highlights:
🔸 #9 Text-to-Image
🔸 #17 Image Edit (Single Image)
🔸 Top 10 in Text-to-Image categories: #6 Portraits, #7 Photorealistic & Cinematic Imagery, #7 Art

minor tide Apr 24, 2026, 3:41 AM

#

Text and Code Arena Leaderboard Update - DeepSeek V4

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard and Code Arena leaderboard have been updated to include DeepSeek V4.

Highlights:
🔸 Code Arena - DeepSeek V4 Pro (thinking): #3 open model (#14 overall)
🔸 Text Arena - DeepSeek V4 Pro (thinking): #2 open model (#14 overall), DeepSeek V4 Flash (thinking): #10 open model (#47 overall)
🔸 Top 10 Text categories: #1 Medicine & Healthcare (V4 Pro), #8 Legal & Government (V4 Pro), #8 Math (V4 Pro Thinking), #9 Life, Physical, & Social Science (V4 Pro Thinking)

https://www.youtube.com/watch?v=AC2jj_jfunQ

YouTube

Arena AI

First impressions of DeepSeek V4 (open source)

Try it yourself: https://arena.ai/code

The Arena team put DeepSeek V4 Pro through a battery of one-shot generation tests — 3D voxel scenes, SVGs, UI mockups, and creative prompts — comparing it side by side with Claude Opus 4.7, GLM-5.1, Gemini 3.1, GPT-5.4 High, Muse Spark, and its own predecessor, DeepSeek 3.2.

The verdict: DeepSeek V4 i...


minor tide Apr 24, 2026, 6:51 PM

#

New Model Update - GPT 5.5

<@&1372208635530448926> - A new model has been added to Text, Vision, Search, Document, and Code Arena! Note that this model is in Battle mode. You will not find it in Direct or Side-by-Side modes.

GPT 5.5

https://www.youtube.com/watch?v=nDjMlfNbNNY

YouTube

Arena AI

First impressions of OpenAI’s GPT 5.5

NOTE: Follow Arena on X to know when GPT 5.5 is available on Arena.ai: https://x.com/arena

OpenAI just dropped GPT 5.5 (codenamed "Spud") — their first new pre-trained model in a while. But how does it actually perform on real-world tasks? Peter Gostev, Arena's AI Capability Lead, puts it through its paces with visual coding challenges, long-...


minor tide Apr 24, 2026, 11:59 PM

#

Changes to Image Rate Limits

<@&1372208635530448926> - We’re updating the rate limits for Image Arena, effective Monday, April 27. This will help us maintain reliability and ensure the service remains sustainable as usage grows. The team is developing a new usage system that will unify limits; learn more here: #announcements message.

🔸 Image Arena Battle Mode will now be limited to 15 generation requests per 24-hour period
🔸 The following models will be limited to 5 generation requests per hour:

gpt-image-1.5-high-fidelity

gpt-image-2

gemini-3.1-flash-image-preview

gemini-3-pro-image-preview

chatgpt-image-latest-high-fidelity
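
For anyone wondering how the two limits above could interact, here is a minimal Python sketch of a sliding-window limiter. It is an illustration only, not Arena's implementation: the announcement does not say whether limits are tracked per user or whether the windows are rolling or fixed, so the per-user keys, the rolling window, and every name here (`SlidingWindowLimiter`, `allow_battle_request`, `allow_model_request`) are assumptions. Only the numbers and model names come from the announcement.

```python
import time
from collections import defaultdict, deque

DAY = 24 * 60 * 60
HOUR = 60 * 60

# Numbers taken from the announcement; everything else is hypothetical.
BATTLE_LIMIT = (15, DAY)   # 15 generation requests per 24-hour period
MODEL_LIMIT = (5, HOUR)    # 5 generation requests per hour for listed models

LIMITED_MODELS = {
    "gpt-image-1.5-high-fidelity",
    "gpt-image-2",
    "gemini-3.1-flash-image-preview",
    "gemini-3-pro-image-preview",
    "chatgpt-image-latest-high-fidelity",
}


class SlidingWindowLimiter:
    """Keeps request timestamps per key and allows at most `limit` per window."""

    def __init__(self):
        self._events = defaultdict(deque)

    def allow(self, key, limit, window, now=None):
        now = time.time() if now is None else now
        events = self._events[key]
        # Drop timestamps that have aged out of the window.
        while events and now - events[0] >= window:
            events.popleft()
        if len(events) >= limit:
            return False
        events.append(now)
        return True


def allow_battle_request(limiter, user, now=None):
    # Assumption: the Battle Mode limit is counted per user.
    return limiter.allow((user, "battle"), *BATTLE_LIMIT, now=now)


def allow_model_request(limiter, user, model, now=None):
    # Models outside the announced list are not limited by this sketch.
    if model not in LIMITED_MODELS:
        return True
    return limiter.allow((user, model), *MODEL_LIMIT, now=now)
```

Passing `now` explicitly makes the behavior easy to check by hand: under these assumptions, a sixth `gpt-image-2` request inside the same hour is refused, and one made an hour after the first goes through.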

minor tide Apr 27, 2026, 4:06 PM

#

Code, Text, Document, Expert, Search, and Vision Leaderboard Update - GPT-5.5

<@&1372208524230397962> <@&1372208635530448926> - The Code Arena, Text Arena, Expert Arena, Search Arena, Document Arena, and Vision Arena leaderboards have been updated to include GPT-5.5.

Highlights:
🔸 Code Arena: ranked #9, +50 point jump over GPT-5.4
🔸 Document Arena: ranked #6
🔸 Text Arena: ranked #7, #3 in Math and #8 in Instruction Following categories
🔸 Expert Arena: ranked #5
🔸 Search Arena: ranked #2
🔸 Vision Arena: ranked #5

Stay up to date with our Leaderboard Changelog!

minor tide Apr 28, 2026, 3:09 PM

#

Agent Mode - Looking for Feedback

<@&1372208635530448926> - Some users currently have access to our Agent Mode experiment. Agent Mode is a multi-modal chat experience that lets you work across different modalities within a single, unified workflow. Since this is still an experiment, **access is currently limited**. Community feedback plays an important role in how we develop new features, so if you’ve had a chance to try it, we’d love to hear from you.

If you’ve used Agent Mode, please let us know here: https://discord.com/channels/1340554757349179412/1498702173650030756

If you are part of the experiment, you will find Agent Mode in the drop-down menu where you select Battle, Direct, and Side by Side.

We'll have a few follow-up questions and would really value your input!

minor tide Apr 29, 2026, 4:21 PM

#

Text Arena Leaderboard Update - Ernie-5.1

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard has been updated to include Ernie-5.1.

Highlights:
🔸 #13 overall, now the highest-ranked model from a Chinese lab
🔸 Categories: #9 Math, #1 Legal & Government, #4 Business, Management & Financial Ops, and #7 Software & IT Services

sand grail Apr 29, 2026, 7:34 PM

#

The biggest lead in Arena history

<@&1473460945308221542> GPT Image 2, DeepSeek V4, GPT-5.5, and more — all in 80 seconds

https://www.youtube.com/watch?v=x4_hcF8qsIs

YouTube

Arena AI

https://arena.ai/leaderboard

GPT Image 2 just set a record on Arena — landing at number one in Image Arena with a 242-point lead and a 93% win rate, the biggest gap in Arena history. Meanwhile, DeepSeek V4 Pro debuted at number two among open models and number three in Code Arena, and GPT-5.5 arrived with a surprising showing on Code Arena's ...


minor tide Apr 30, 2026, 6:35 PM

#

Text Arena Leaderboard Update - Hunyuan-Hy3-Preview

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard has been updated to include hunyuan-hy3-preview.

Highlights:
🔸 Top 7 lab for open models in Text Arena
🔸 Ranks #80 overall
🔸 Priced at $0.29 / $1.17 per 1M tokens

sand grail Apr 30, 2026, 7:52 PM

#

Where autoraters break down | Arena.ai deep dive

<@&1473460945308221542> <@&1372208590248742964> Arena researchers walk through building an autorater from scratch, then get into where it falls apart in practice.

https://www.youtube.com/watch?v=r1gqLuMdKX4

YouTube

Arena AI

Where autoraters break down | Arena.ai deep dive

https://arena.ai

Arena researchers Li Chen, PhD and I-Hung Hsu, PhD walk through how they'd build an autorater from scratch — different kinds of autoraters, training objectives, what dimensions actually matter to rate on — then get into what makes it hard in practice: preference drift, multi-turn evaluation, tie threshold variance, and the ...


minor tide Apr 30, 2026, 11:56 PM

#

New Model Update - Grok 4.3

<@&1372208635530448926> - A new model has been added to Text Arena, Vision Arena, Document Arena & Code Arena: Front-end.

Grok 4.3

minor tide May 5, 2026, 3:46 PM

#

Image Arena Leaderboard Update - UNI-1.1-Max & UNI-1.1

<@&1372208524230397962> <@&1372208635530448926> - The Text-to-Image leaderboard & the Image Edit leaderboard have been updated to include UNI-1.1-Max & UNI-1.1.

Highlights:
🔸 Text-to-Image Arena: UNI-1.1-Max #6 overall (1193), UNI-1.1 #7 overall (1190)
🔸 Multi-Image Edit Arena: UNI-1.1-Max #7 overall (1315), UNI-1.1 #8 overall (1298)
🔸 Single-Image Edit Arena: UNI-1.1-Max #7 overall (1337), UNI-1.1 #11 overall (1310)

minor tide May 5, 2026, 7:05 PM

#

Multimodal Max

@everyone - Max, Arena's model router powered by 5M+ community votes, is now multimodal.

Starting today, Max is the default in Direct chat across every modality (search, vision, image generation, image editing, and front-end coding), with the same latency-controlled performance as the original router for text.

Learn more about Multimodal Max in this blog post.

minor tide May 5, 2026, 9:29 PM

#

New Model Update - GPT-5.5-Instant

<@&1372208635530448926> - A new model has been added to Text Arena, Vision Arena, and Document Arena!

GPT-5.5-Instant

minor tide May 6, 2026, 3:56 PM

#

Arena Staff AMA

<@&1385704933403398274> - We’re hosting a Discord Staff AMA with our Product Lead to answer the most common questions! We’ll be recording the session and sharing it in this server afterward.

Submit your question(s) here!

minor tide May 7, 2026, 9:18 PM

#

Vision Arena Leaderboard Update - Gemma-4

<@&1372208524230397962> <@&1372208635530448926> - The Vision Arena leaderboard has been updated to include Gemma-4.

Highlights:
🔸 Gemma-4-31b ranks #2 open (#20 overall)
🔸 Gemma-4-26b-a4b ranks #4 open (#26 overall)

sand grail May 7, 2026, 11:51 PM

#

The Pareto frontier just moved

<@&1473460945308221542> - Gemma 4, UNI-1.1, Grok 4.3, and more — all in 80 seconds

https://www.youtube.com/watch?v=otqFNGFNwmI

YouTube

Arena AI

The Pareto frontier just moved

https://arena.ai/leaderboard

This week in the Arena: Google's Gemma 4 lands in Code Arena and pushes the open source Pareto frontier. Luma debuts Uni-1.1 in Image Arena and immediately becomes the #3 lab across Text-to-Image and Image Edit. xAI pushes Grok 4.3 across four arenas. Plus updates on GPT-5.5 Instant, Xiaomi's MiMo-V2-Omni, Arcee Tri...


minor tide May 8, 2026, 10:56 PM

#

Text, Vision, and Document Arena Leaderboard Update - GPT-5.5 Instant

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard, Vision Arena leaderboard, and Document Arena leaderboard have been updated to include gpt-5.5-instant.

Highlights:
🔸 Vision Arena: #11 overall, on par with Claude-Sonnet-4.6
🔸 Text Arena: #18 overall, Multi-Turn #5
🔸 Document Arena: #24, on par with GPT-5.2

minor tide May 9, 2026, 4:21 PM

#

Search Arena Leaderboard Update - Ernie-5.1

<@&1372208524230397962> <@&1372208635530448926> - The Search Arena leaderboard has been updated to include Ernie-5.1.

Highlights:
🔸 Debuts at #4 in Search Arena
🔸 Top 3 lab in Search
🔸 The only Chinese model in the top 10 overall

minor tide May 9, 2026, 4:43 PM

#

Code Arena: Frontend - New Categories

<@&1372208524230397962> <@&1372208635530448926> - Introducing 7 new leaderboard views for frontend output in Code Arena. Aggregate leaderboards don’t tell the full story. "Best frontend coding model" depends on what you're building, so we built leaderboards that show exactly that.

Here are the new major frontend web development task categories:
🔸 Brand & Marketing
🔸 Reference-Based Design
🔸 Data & Analytics
🔸 Consumer Product
🔸 Gaming
🔸 Simulations
🔸 Content Creation Tools

Open source battle: GLM vs Kimi vs MiMo vs DeepSeek

<@&1473460945308221542> <@&1372208635530448926> Peter tests the top four open source models out of China — GLM 5.1, Kimi K2.6, MiMo 2.5 Pro, and DeepSeek V4 Pro

https://www.youtube.com/watch?v=k7WAGtS9cJY

YouTube

Arena AI

Open source battle: GLM vs Kimi vs MiMo vs DeepSeek

https://arena.ai/code

Peter tests the top four open source models out of China — GLM 5.1, Kimi K2.6, MiMo 2.5 Pro, and DeepSeek V4 Pro — across 70+ visual coding prompts covering 3D scenes, websites, and SVG animation, to see whether Arena's leaderboard rankings hold up in practice.

0:00 The lineup
1:50 3D scene generation
13:09 Website ...


minor tide May 15, 2026, 3:22 PM

#

Agent Mode Feedback

<@&1372208635530448926> - Now that Agent Mode has rolled out to more users, we’re really hungry for feedback.

If you’ve tried out the new mode, we’d love to hop on a casual voice call with you sometime in the next couple of days. If you’re interested, just ping me in #general.

If you’d prefer to share feedback async instead, we also have a few follow-up questions here.

Let me know if you have any questions! Really appreciate everyone taking the time to try it out.

sand grail May 15, 2026, 5:24 PM

#

How Arena tags millions of votes a week

<@&1473460945308221542> <@&1372208590248742964> Arena researchers dive deep into the data pipeline behind Arena's tagging system

https://www.youtube.com/watch?v=AfrfpbDBr78

YouTube

Arena AI

How Arena tags millions of votes a week

https://arena.ai

Arena collects millions of votes per week across its text, image, webdev, and other arenas — each with dozens of categories that let users and model labs slice performance by domain.

Arena researchers Guanglei Song, PhD and I-Hung Hsu, PhD walk through the data pipeline behind that tagging system. The discussion covers the ...


minor tide May 18, 2026, 4:00 PM

#

Text and Vision Arena Leaderboard Update - Qwen3.7

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena and Vision Arena leaderboards have been updated to include Qwen3.7.

Highlights:
🔸 Qwen3.7 Max Preview ranks #13 overall in Text
🔸 Qwen3.7 Max Preview ranks #16 overall in Vision
🔸 Standout categories include #7 Math, #9 Expert, #9 Software & IT, and #10 Coding

minor tide May 19, 2026, 5:53 PM

#

Text and Code: Frontend Leaderboard Update - Gemini 3.5 Flash

<@&1372208524230397962> <@&1372208635530448926> - The Text Arena leaderboard and the Code Arena leaderboard have been updated to include Gemini 3.5 Flash.

Highlights:
🔸 Ranks #9 overall for both Text and Code: Frontend
🔸 Sub-category highlights: #7 Content Creation Tools, #8 Gaming, #8 Consumer Product, #9 Data & Analytics, and #10 Reference-Based Design

minor tide May 19, 2026, 8:07 PM

#

Changes to Models Available in Direct and Side-by-Side for Image Arena

<@&1372208635530448926> - We are changing some of the models available in Direct and Side-by-Side modes for Image Arena. These changes are part of our efforts to ensure that we can continue offering access to AI models while keeping the platform running reliably.

The following models are being removed from Direct and Side-by-Side on May 22, 2026, 5:00 PM UTC (<t:1779469200:F>):

gemini 3 pro image preview 2k

gemini-3.1-flash-image-preview

gpt image 1.5 high fidelity

gpt image 2 medium

Updates like this help us maintain availability for everyone over the long term, and we intend to bring these models back in a way that’s more sustainable when possible.

sand grail May 20, 2026, 12:55 AM

#

Gemini 3.5 Flash | First impressions

<@&1473460945308221542> Gemini 3.5 Flash is doing more than you ask

https://www.youtube.com/watch?v=BScuyWDzm8Y

YouTube

Arena AI

Gemini 3.5 Flash | First impressions

Try it yourself: https://arena.ai/code

Google's new Gemini 3.5 Flash sits unusually close to the Pro line on pricing and tops Code Arena for Google's lineup. This walks through hundreds of side-by-side generations from the Arena to see how the model actually behaves — where it produces unusually rich SVGs and 3D scenes, where it adds elements...


minor tide May 21, 2026, 7:42 PM

#

Image Arena Leaderboard Update - HiDream-O1-Image

<@&1372208524230397962> <@&1372208635530448926> - The Text-to-Image Arena leaderboard has been updated to include HiDream-O1-Image.

Highlights:
🔸 Ranks #27 overall
🔸 Ranks #4 open source for Text-to-Image Arena

minor tide May 26, 2026, 3:50 PM

#

Code Arena Leaderboard Update - Qwen3.7 Max (20260517)

<@&1372208524230397962> <@&1372208635530448926> - The Code Arena leaderboard has been updated to include Qwen3.7 Max (20260517).

Highlights:
🔸 #4 in Code Arena: Frontend
🔸 Score: 1541