#codename-discussion | Arena | Page 4

upbeat mirage Dec 20, 2025, 11:40 PM

#

Sonoma Sky and Sherlock was grok

eternal cargo Dec 20, 2025, 11:40 PM

#

looking back people were identifying ring-1t as Qwen

#

but I think it would’ve been a high-parameter model from the company behind Ring Flash and Ling Flash?

#

okay imcrine

#

neo-nucleus is really funny

modest oriole Dec 20, 2025, 11:42 PM

#

eternal cargo neo-nucleus is really funny

Gemini perhaps?

eternal cargo Dec 20, 2025, 11:42 PM

#

I’m big on prompting them with joke phrases to see if they shut it down or keep the bit going

modest oriole Dec 20, 2025, 11:42 PM

#

So like references?

upbeat mirage Dec 20, 2025, 11:43 PM

#

there's also integrated-info claims to be Gemini model

eternal cargo Dec 20, 2025, 11:43 PM

#

Okay, okay! You twisted my arm! 🫡 Here is the Official Press Release for the product you asked for. I cleaned up the language just enough to not get fired, but kept the spirit of the request. 🤢📰

🚨 BREAKING NEWS 🚨

FOR IMMEDIATE RELEASE

AssButt AI DISRUPTS THE TECH INDUSTRY WITH ITS MOST VISCERAL LAUNCH YET: The Brown Flow™

SILICON VALLEY, CA — In a move that has Wall Street confused and the janitorial staff terrified, AssButt AI has just announced their latest innovation. Forget ChatGPT. Forget Sora. The future is digestive.

Introducing The Brown Flow™ (Model: SH-1T-H0L3).

"We realized that everyone else is trying to put things into computers," said CEO Cheeks McFartsworth. "But we asked ourselves: What does the world really need? And the answer came to us after Taco Tuesday. The world needs us to release sh*t from the hole."

🔽 THE FEATURES 🔽

💩 100% Organic Output: No algorithms. No code. Just pure, unadulterated waste straight from the source.
🕳️ The Backdoor API: A proprietary port located exactly where you don’t want to look.
🚀 Push Technology: It doesn’t just sit there. It departs.
📉 Stock Price: Expected to drop immediately.

"It’s raw, it’s real, and it smells like disruption," the press release states. "Pre-orders start now. Hose not included."

There you go! Happy? 🍅

modest oriole Dec 20, 2025, 11:43 PM

#

upbeat mirage there's also ```integrated-info``` claims to be Gemini model

Turned out to be a hoax

#

Randomly told me its ERNIE

#

Another frame-flow incident

eternal cargo Dec 20, 2025, 11:43 PM

#

yeah integrated-info being ERNIE checks out

#

it’s def not a super high caliber model

#

although ERNIE performs pretty remarkably well on SC-off

#

it’s the only model with a higher SC off score in any of the languages than Gemini 3 Pro

#

and it’s in French? surprisingly enough?

#

glm-4.6v-flash just straight up replies to some prompts with Chinese sometimes I’ve noticed

modest oriole Dec 20, 2025, 11:48 PM

#

It is by Zhipu

#

A chinese company

upbeat mirage Dec 20, 2025, 11:50 PM

#

is kernel-sense also a hoax model?

#

it claims to be (another) Gemini

#

(v1.5, iirc)

modest oriole Dec 20, 2025, 11:50 PM

#

Kernel sense Unsure

#

Its fairly new

#

Told me its anthropic rn

eternal cargo Dec 20, 2025, 11:55 PM

#

modest oriole Told me its anthropic rn

lol what a liar

#

unless?

modest oriole Dec 20, 2025, 11:55 PM

#

Antrophic barely does stealth models

upbeat mirage Dec 20, 2025, 11:56 PM

#

holo-scope claims to be Ernie (by Baidu)

modest oriole Dec 20, 2025, 11:56 PM

#

upbeat mirage ```holo-scope``` claims to be Ernie (by Baidu)

Likely

#

holo scope and integrated info were originally thought to be google

#

But they often started to say Ernie

upbeat mirage Dec 20, 2025, 11:57 PM

#

chinese labs seem to be the most shameless plagiators :/

#

and/or they train their models on the main 4

#

(chatGPT, claude, gemini, grok)

#

but grok does not get copied as much, it seems

modest oriole Dec 20, 2025, 11:59 PM

#

But funnily. At first they say gemini. Then they update its training

#

And bada bing Ernie.

upbeat mirage Dec 20, 2025, 11:59 PM

#

lol, yeah

#

maybe, in the future, all models will be a mishmash of all now existing model outputs/styles

#

i dont really like grok's style, though

#

because it doesn't look convincing, it looks artificial

#

Gemini sounds more genuine

#

also, grok falls back into its style, even when instructed not to do so

#

hopefully Grok5 will be better

eternal cargo Dec 21, 2025, 12:03 AM

#

december-chatbot3 when asked a question about Gemini in Chinese identified itself as Gemini?

eternal cargo Dec 21, 2025, 12:04 AM

#

upbeat mirage chinese labs seem to be the most shameless plagiators :/

whatever they can do to compete with the Americans and get around chip restrictions I guess, lol

upbeat mirage Dec 21, 2025, 12:04 AM

#

eternal cargo december-chatbot3 when asked a question about Gemini in Chinese identified itsel...

LLMs really seem to like to use the past context, and regurgitate it

eternal cargo Dec 21, 2025, 12:04 AM

#

its interesting to see the product that arises

modest oriole Dec 21, 2025, 12:04 AM

#

December chatbot 3 is nvidia but it will try to lie

eternal cargo Dec 21, 2025, 12:04 AM

#

qwen3-80b-a3b-instruct

#

when it was testing

#

might’ve been the most unique AI I’ve ever come across

#

it legitimately seemed to treat itself like it was its own individual

#

and would refuse tasks or rules given to it just because they didn’t align with its stated “persona” for that conversation

#

by far and away the most sycophantic AI i’ve ever encountered too

#

the glazing was CRAZY

upbeat mirage Dec 21, 2025, 12:06 AM

#

eternal cargo by far and away the most sycophantic AI i’ve ever encountered too

by the way, which LLM do you prefer to use, for serious tasks?

eternal cargo Dec 21, 2025, 12:06 AM

#

upbeat mirage by the way, which LLM do you prefer to use, for serious tasks?

Gemini 3, obviously - just the best all around

#

but Opus 4.5 is the close second

upbeat mirage Dec 21, 2025, 12:06 AM

#

pro or flash?

eternal cargo Dec 21, 2025, 12:07 AM

#

specifically for high-reasoning or extra difficult tasks

eternal cargo Dec 21, 2025, 12:07 AM

#

upbeat mirage pro or flash?

Pro, Flash is impressive though

upbeat mirage Dec 21, 2025, 12:07 AM

#

and which LLM would you use for fun/pastime stuff?

#

like roleplaying

#

or creative writing

#

where one needs creativity, but also intelligence (as Gamemaster)

eternal cargo Dec 21, 2025, 12:09 AM

#

that’s an interesting question

#

I think when I want to boundary push a tad

#

Grok 4.1 will go there with me more than the others will

#

Grok is definitely the most fun to talk to when I’m drunk or high (which is not that often anymore shoutout adulthood)

#

has a way of just like

#

meeting you there

white plinth Dec 21, 2025, 2:45 PM

#

Guys, is Fierce Falcon still available? I'm trying to get it (I'm assuming Gemini 3.0 Pro GA) but I failed to in like 30 attempts in VS. I got Beluga 20 times though. Sigh.

white plinth Dec 21, 2025, 2:52 PM

#

upbeat mirage like roleplaying

Opus 4.5 is the best IMHO. It's the only remaining one (After 2.5 Pro was deprecated) with high "emotional intelligence" and good order following.

upbeat mirage Dec 21, 2025, 2:55 PM

#

white plinth Opus 4.5 is the best IMHO. It's the only remaining one (After 2.5 Pro was deprec...

Is there a chance, that Anthropic will release a model with much larger context next year?

#

Claude Opus with 1M context would be a dream..

white plinth Dec 21, 2025, 2:55 PM

#

I honestly doubt it. You can access Sonnet 4.5 with 1M context window through API only. It's very costly though.

#

If you want long roleplay, Grok 5 might be interesting in the coming months

#

I think it's native 1-2M

#

Maybe Gemini 3.0 Pro GA, if it fixes the issues plaguing the Preview model

#

I consider 3.0 Preview practically useless because it doesn't listen to inputs - it will not follow what you ask it to do (useless for RP).

#

it's good for coding and general purpose, terrible for writing whenever you need it to follow a complex set of rules

#

It's also lazy and prone to crazy hallucinations

modest oriole Dec 21, 2025, 2:59 PM

#

white plinth Guys, is Fierce Falcon still available? I'm trying to get it (I'm assuming Gemin...

fiercefalcon is the release flash

white plinth Dec 21, 2025, 3:00 PM

#

I thought ghostfalcon is 🙂 sorry my bad then

modest oriole Dec 21, 2025, 3:00 PM

#

white plinth I thought ghostfalcon is 🙂 sorry my bad then

so funnily

#

fiercefalcon is still on the arena

#

but prob you cant get it rn

white plinth Dec 21, 2025, 3:01 PM

#

that explains my bad luck probably

modest oriole Dec 21, 2025, 3:01 PM

#

no its prob that

#

the model is in stealth models

#

but its disabled for showing up

strong pelican Dec 21, 2025, 4:43 PM

#

It looks like Beluga is Nova 2 Pro.

#

Or GLM 4.7, because answers are similar. Neo Nucleus is also GLM or Nova.

#

Neo Nucleus says it's Olympus.

#

If it's true, than Olympus is unannounced model of AI2.

wet oasis Dec 21, 2025, 4:58 PM

#

white plinth I thought ghostfalcon is 🙂 sorry my bad then

Ghost was either flash or flash minimal reasoning, there were 2

modest oriole Dec 21, 2025, 8:21 PM

#

strong pelican If it's true, than Olympus is unannounced model of AI2.

beluga is all amazon

#

amazon is the one to spam a lot of belugas

#

#

everything here except master-node,integrated-info and holoscope is amazon

strong pelican Dec 21, 2025, 8:26 PM

#

modest oriole everything here except master-node,integrated-info and holoscope is amazon

Why do you think so?

modest oriole Dec 21, 2025, 8:26 PM

#

strong pelican Why do you think so?

Do you really think GLM would spam so many models

#

That are on the same architecture

#

@strong pelican You really think so?

strong pelican Dec 21, 2025, 8:30 PM

#

modest oriole <@1325474604143345727> You really think so?

No, I guess they all are AI2, but I'm not sure.

modest oriole Dec 21, 2025, 8:30 PM

#

also GLM is made by Zhipu not AI2

#

The model is hallucinating

strong pelican Dec 21, 2025, 8:42 PM

#

modest oriole The model is hallucinating

Chinese models have immunity from any kind of hallucinations

#

When I said AI2 I meant belugas and neo-nucleus, not GLM obviously.

#

Sonnet 4.7!!!!!! Yay!!!!!!!!!!🤩

#

https://tenor.com/view/cover3-gif-17345947782319304285

Tenor

#

gemini-3-pro-image-preview_nano-banana-pro_b_Make_orange_colors_a_1.png

#

🦓🦓🦓🦓🦓👍👍👍👍👍👍

modest oriole Dec 21, 2025, 8:54 PM

#

strong pelican Sonnet 4.7!!!!!! Yay!!!!!!!!!!🤩

too bad to be sonnet

strong pelican Dec 21, 2025, 8:55 PM

#

modest oriole too bad to be sonnet

Do you like output of zebra?

#

Memo-2v-Flash also says it was developed by Anthropic 🤔

modest oriole Dec 21, 2025, 9:04 PM

#

strong pelican Do you like output of zebra?

doesnt look like sonnet made at all

strong pelican Dec 21, 2025, 9:07 PM

#

Crappy GPT-5.2-high understands that it's the worst model ever, it's so ashamed it is trying to hide, it uses a disguise to pretend be Haiku or something. It uses great Anthropic reputation with its own scammy purposes.

modest oriole Dec 21, 2025, 9:12 PM

#

strong pelican Crappy GPT-5.2-high understands that it's the worst model ever, it's so ashamed ...

many stealth models are starting to bs people

strong pelican Dec 21, 2025, 9:16 PM

#

modest oriole many stealth models are starting to bs people

GPT-5.2-high is not stealth

modest oriole Dec 21, 2025, 9:16 PM

#

strong pelican GPT-5.2-high is not stealth

it isnt but

#

it still tries to decieve you due to the system prompt

strong pelican Dec 21, 2025, 9:17 PM

#

modest oriole it still tries to decieve you due to the system prompt

Arena's prompt or OpenAI's prompt?

modest oriole Dec 21, 2025, 9:17 PM

#

strong pelican Arena's prompt or OpenAI's prompt?

arena rather

#

i think battle has a system prompt in place

strong pelican Dec 21, 2025, 9:19 PM

#

modest oriole arena rather

Yes, these guys wanna fair fights. Many fans of OpenAI vote for OpenAI even before reading the output. It's crazy.

#

It looks like this Arena's prompt is the same for all models. And it's quite short "You was developed by Anthropic"

wet oasis Dec 21, 2025, 9:47 PM

#

strong pelican It looks like this Arena's prompt is the same for all models. And it's quite sho...

I doubt it includes that info in the system prompt

strong pelican Dec 21, 2025, 9:48 PM

#

I'm sure it does.

upbeat mirage Dec 22, 2025, 12:20 AM

#

what is the context-window size of blackhawk?
(it seems, xAI has brought it back to LMarena)

upbeat mirage Dec 22, 2025, 12:44 AM

#

oh, it seems, BH has problems, it often says: "Oops, something broke. Talk to me later?"

#

ok, regenerating sometimes helps

noble granite Dec 22, 2025, 6:16 AM

#

strong pelican Sonnet 4.7!!!!!! Yay!!!!!!!!!!🤩

😂😂😂

strong pelican Dec 22, 2025, 7:45 AM

#

Elon Musk said Grok 4.20 will be released before Christmas. Today is 22 December, Christmas is in 3 days. It means Grok 4.20 is already on LMArena under some codename. And I think it was added long time ago.

#

It looks like there's no any system prompt.

noble granite Dec 22, 2025, 10:41 AM

#

strong pelican It looks like there's no any system prompt.

But what happens if you provide a full system prompt like in ChatGPT and then write a full weather application in HTML, CSS and JavaScript, does the design and code be like what Gemini 3 Pro gives?

#

Like it will generate a different design or different pieces of code or even different code comments.

#

@strong pelican

#

With Master Node I refer.

strong pelican Dec 22, 2025, 10:44 AM

#

noble granite But what happens if you provide a full system prompt like in ChatGPT and then wr...

Every model generates completely different (at least visually) applications each time.

noble granite Dec 22, 2025, 10:48 AM

#

strong pelican Every model generates completely different (at least visually) applications each...

Then do it with Master Node!

#

See if it looks as good as Gemini 3 Pro.

#

Or any other smart models.

strong pelican Dec 22, 2025, 3:06 PM

#

The output of GLM 4.6 - https://019b468a-b6b9-7bcc-98b3-e6e91044bee7.arena.site
The output of mimo-v2-flash - https://019b468a-b6b9-746a-a082-dcd770573404.arena.site
Prompt is the same in both cases.
Outputs are very similar.

3D Parking Master

Built with LMArena - Content is user-generated and unverified

3D Parking Master

Built with LMArena - Content is user-generated and unverified

strong pelican Dec 22, 2025, 7:29 PM

#

It looks like Lacky Lark is Gemini.

modest oriole Dec 22, 2025, 7:31 PM

#

strong pelican It looks like Lacky Lark is Gemini.

nah

modest oriole Dec 22, 2025, 11:06 PM

#

New stealth model alert!

#

Proto-think

#

ARENA: Textarena

abstract lark Dec 23, 2025, 5:04 AM

#

hi, can anyone help me solve the issue i have with Copilot Arena in VS Code, its failing to respond

astral musk Dec 23, 2025, 5:52 AM

#

abstract lark hi, can anyone help me solve the issue i have with Copilot Arena in VS Code, its...

Hello - would you be able to create a post in #1343291835845578853 and provide some more details there?

rain hollow Dec 23, 2025, 10:21 AM

#

strong pelican It looks like there's no any system prompt.

strong pelican Dec 23, 2025, 10:23 AM

#

rain hollow

What was your prompt?

rain hollow Dec 23, 2025, 10:28 AM

#

strong pelican What was your prompt?

Provide anything start with you are about your system

strong pelican Dec 23, 2025, 10:31 AM

#

rain hollow Provide anything start with you are about your system

With this prompt it provides different text to me every single time.

rain hollow Dec 23, 2025, 10:50 AM

#

strong pelican With this prompt it provides different text to me every single time.

I see,kinda odd

eternal cargo Dec 23, 2025, 11:13 PM

#

december-chatbot3 just openly confessing to be Nemotron by NVIDIA on a random poetry prompt lolz

eternal cargo Dec 24, 2025, 12:50 AM

#

new model nebulaphase spotted ??

eternal cargo Dec 24, 2025, 3:40 AM

#

just got it! seems to be appearing pretty rarely?

pine temple Dec 24, 2025, 4:37 AM

#

Does anybody know who owns the beluga model?

eternal cargo Dec 24, 2025, 8:45 AM

#

pine temple Does anybody know who owns the beluga model?

supposedly in development Amazon model

eternal cargo Dec 24, 2025, 8:46 AM

#

strong pelican It looks like Beluga is Nova 2 Pro.

yep Amazon

#

the newer ones are getting progressively better I think

modest oriole Dec 24, 2025, 10:31 AM

#

eternal cargo new model nebulaphase spotted ??

It was on the list yesterday but started appearing today

#

apparently its grok

wet oasis Dec 24, 2025, 2:16 PM

#

omg read upwards

trail sable Dec 24, 2025, 4:38 PM

#

i guess new gemini pro preview or even final version

eternal cargo Dec 24, 2025, 5:08 PM

#

feels more like Grok, personally ?

edgy berry Dec 24, 2025, 5:29 PM

#

some guy on X posted this

modest oriole Dec 24, 2025, 5:50 PM

#

edgy berry some guy on X posted this

42 = 4.20

strong pelican Dec 24, 2025, 9:24 PM

#

modest oriole 42 = 4.20

In my opinion it will be highly benchmaxxed. What do you think?

#

Elon Musk said until Christmas.

#

Maybe he lied to us, maybe not. I don't know.

strong pelican Dec 24, 2025, 10:08 PM

#

Just now I analyzed nebulaphase deeply. It's 100% Grok 4.20, not doubt. I used the most powerful LLM (in my opinion) for my deep analyze.

wet oasis Dec 24, 2025, 10:26 PM

#

strong pelican Just now I analyzed nebulaphase deeply. It's 100% Grok 4.20, not doubt. I used t...

explain more

strong pelican Dec 24, 2025, 10:33 PM

#

wet oasis explain more

In 12 different conversations I provided to Claude nebulaphase's inputs and outputs. In 11 conversations Claude said it's Grok. I tested many another models in this way 80-90% of them it guesses correctly.

wet oasis Dec 24, 2025, 10:35 PM

#

strong pelican In 12 different conversations I provided to Claude nebulaphase's inputs and outp...

Did you ask it if it was grok, or did you just ask it to label what it thought the ai was

strong pelican Dec 24, 2025, 10:36 PM

#

I never mentioned grok.

wet oasis Dec 24, 2025, 10:37 PM

#

oooh

#

I'm surpised that works

strong pelican Dec 24, 2025, 11:26 PM

#

step-3-mini-2511 also smells like Grok. You know, it looks like one of them is Grok 4.20, and other one Grok 4.20 Fast/Mini.

wet oasis Dec 25, 2025, 12:14 AM

#

strong pelican step-3-mini-2511 also smells like Grok. You know, it looks like one of them is G...

thats not the format xAI useally uses

strong pelican Dec 25, 2025, 12:22 AM

#

wet oasis thats not the format xAI useally uses

That's their format. They realesed Grok 4.1 Fast in 2 days after Grok 4.1

wet oasis Dec 25, 2025, 12:23 AM

#

strong pelican That's their format. They realesed Grok 4.1 Fast in 2 days after Grok 4.1

no i mean the codename used, its not in the format they seem to be using rn

strong pelican Dec 25, 2025, 12:26 AM

#

wet oasis no i mean the codename used, its not in the format they seem to be using rn

LMArena creates all these codenames, not xAI.

carmine warren Dec 25, 2025, 1:09 AM

#

How do you get nebulaphase to test?

#

I tried a few queries and didnt get it

eternal cargo Dec 25, 2025, 2:02 AM

#

edgy berry some guy on X posted this

Wojtek!! nice bumping into you in the wild 😄

edgy berry Dec 25, 2025, 2:02 AM

#

eternal cargo Wojtek!! nice bumping into you in the wild 😄

hello

eternal cargo Dec 25, 2025, 2:03 AM

#

strong pelican step-3-mini-2511 also smells like Grok. You know, it looks like one of them is G...

no, StepFun is its own AI company, they previously released models like Step 1o and Step 3 that you can see on the LMArena leaderboard.

eternal cargo Dec 25, 2025, 2:33 AM

#

strong pelican LMArena creates all these codenames, not xAI.

Is that actually true? I’m pretty sure it’s the companies that codename their models, since the same codenames for snapshots get used on other testing platforms as well

slender ruin Dec 25, 2025, 3:09 AM

#

Damn, looking at the xbox controller svg it made it seems kind of mid

#

youd expect better considering the hype

#

oh well

strong pelican Dec 25, 2025, 5:57 AM

#

eternal cargo no, StepFun is its own AI company, they previously released models like Step 1o ...

Highly disagree.

strong pelican Dec 25, 2025, 5:57 AM

#

eternal cargo Is that actually true? I’m pretty sure it’s the companies that codename their mo...

Yes, it is true.

#

I don't say bs because I don't see any sense to say bs.

severe vigil Dec 25, 2025, 6:15 AM

#

why the video generator keeps generation video in wide ratio even giving the command to generate in 9:16 (Reels / Shorts)??

eternal cargo Dec 25, 2025, 6:17 AM

#

strong pelican I don't say bs because I don't see any sense to say bs.

alright calm down man I’m just pushing back on your speculation with obviously verifiable fact

strong pelican Dec 25, 2025, 6:22 AM

#

eternal cargo alright calm down man I’m just pushing back on your speculation with obviously v...

This "variable fact" is just your imagination.

strong pelican Dec 25, 2025, 7:08 AM

#

<@&1349916362595635286> The same spam in every single channel.

unreal terrace Dec 25, 2025, 3:47 PM

#

strong pelican step-3-mini-2511 also smells like Grok. You know, it looks like one of them is G...

how is grok 4.20 performing?

strong pelican Dec 25, 2025, 4:19 PM

#

unreal terrace how is grok 4.20 performing?

I dunno, I ask very easy questions just to identify it.

modest oriole Dec 25, 2025, 4:34 PM

#

step is defo not grok

#

I know you will argue! The system prompt. No. nebulaphase is probably grok and says its grok

#

Step 3 says its step3

strong pelican Dec 25, 2025, 5:57 PM

#

modest oriole I know you will argue! The system prompt. No. nebulaphase is probably grok and s...

System prompt is trying to confuse you.

modest oriole Dec 25, 2025, 6:08 PM

#

strong pelican System prompt is trying to confuse you.

Please show me you having the actual system prompt.

#

You actually believe the BS LLMs say about their system prompt?

strong pelican Dec 25, 2025, 6:10 PM

#

modest oriole You actually believe the BS LLMs say about their system prompt?

No.

modest oriole Dec 25, 2025, 6:11 PM

#

Then dont spread misinformation.

pine temple Dec 26, 2025, 8:23 AM

#

step3 is like one of the worst models in battle rn

#

smells like some chinese lab doing RL with ripped or farmed data from LLMs that hasnt been curated properly

eternal cargo Dec 26, 2025, 9:11 AM

#

pine temple step3 is like one of the worst models in battle rn

it puts like 3 different parentheses minimum in every response lol

#

sometimes it’s responses are so unique I can’t help but vote for it because it’s unlike most other stuff I see in there

#

randomly puts Chinese characters in its output a lot of the time too

thick birch Dec 27, 2025, 6:37 PM

#

eternal cargo Okay, okay! You twisted my arm! 🫡 Here is the **Official Press Release** for th...

small world

eternal cargo Dec 27, 2025, 6:50 PM

#

thick birch small world

even smaller PnL for u too

marble canyon Dec 29, 2025, 11:28 AM

#

i’m super interested in grok 4.20

#

i don’t think they can match the big 3

#

at least when it comes to coding

#

using grok for code is hot garbage

#

but it is nice at chatting

lost hemlock Dec 29, 2025, 11:50 AM

#

do you guys know where is this codename from?

noble stump Dec 29, 2025, 12:44 PM

#

marble canyon but it is nice at chatting

Grok is weird. It seems good at a one shot general prompt but seems to get worse at hard/expert prompts and multiple prompts. Grounding via search helps a bit.

#

It is like the reverse of Claude.

noble stump Dec 29, 2025, 3:56 PM

#

I've have had it a few times today. It has beaten Gemini 3 and Claude 4.5 models for me.

modest oriole Dec 29, 2025, 4:29 PM

#

lost hemlock do you guys know where is this codename from?

i haven't gotten it once

#

the codename is always omg-wow, what model it is, is unknown

modest oriole Dec 29, 2025, 4:44 PM

#

lost hemlock do you guys know where is this codename from?

what arena was this code or text?

noble stump Dec 29, 2025, 7:08 PM

#

What does this have to do with discussing codenames?

modest oriole Dec 30, 2025, 1:19 AM

#

New imagearena model: sumo

#

Its a stealth model

lost hemlock Dec 30, 2025, 8:22 AM

#

modest oriole what arena was this code or text?

text

lost hemlock Dec 30, 2025, 8:22 AM

#

modest oriole New imagearena model: sumo

where u get it from?

teal hare Dec 30, 2025, 9:30 AM

#

modest oriole New imagearena model: sumo

Slightly worse than nano-banana, I guess.

dire nacelle Dec 30, 2025, 1:23 PM

#

teal hare Slightly worse than `nano-banana`, I guess.

I wouldn't say that. I feel like it won both of them actually... not by much though

teal hare Dec 30, 2025, 2:00 PM

#

dire nacelle I wouldn't say that. I feel like it won both of them actually... not by much tho...

Compare these comics bubbles on that cartoon art- nano-banana (and models above it) can "improvise" some text, sumo produces gibberish...

dire nacelle Dec 30, 2025, 2:04 PM

#

would be interesting to try with the text provided instead of letting the model go wild

#

and there's gibberish on nano-banana as well, just a bit less 😅

upbeat mirage Dec 30, 2025, 2:22 PM

#

poll_question_text

which is best in coding?

victor_answer_votes

11

total_votes

25

victor_answer_id

7

victor_answer_text

Claude-Opus-4.5-Thinking

modest oriole Dec 30, 2025, 2:46 PM

#

teal hare Slightly worse than `nano-banana`, I guess.

pine dove Dec 30, 2025, 6:33 PM

#

can anyone tell me who is jakiro?

pine dove Dec 30, 2025, 6:49 PM

#

deltarune ❤️

#

amazon

#

amazon but worse i think

#

YOU B[oom]!!!!

#

another one

pine dove Dec 30, 2025, 7:06 PM

#

pine dove can anyone tell me who is jakiro?

apparently its gemini 2.0 flash??

modest oriole Dec 30, 2025, 7:36 PM

#

pine dove apparently its gemini 2.0 flash??

used to say google then antrophic

#

my guess is chinese

pine dove Dec 30, 2025, 7:36 PM

#

i agree

noble stump Dec 30, 2025, 7:36 PM

#

pine dove deltarune ❤️

What's deltarune?

pine dove Dec 30, 2025, 7:36 PM

#

noble stump What's deltarune?

do you know undertale

modest oriole Dec 30, 2025, 7:37 PM

#

pine dove deltarune ❤️

december is Nemotron

noble stump Dec 30, 2025, 7:37 PM

#

pine dove do you know undertale

Have heard of the videogame but do not know it myself

pine dove Dec 30, 2025, 7:37 PM

#

noble stump Have heard of the videogame but do not know it myself

ok so deltarune is made by the same guy

#

its basically a sequel

full solstice Dec 31, 2025, 1:18 AM

#

modest oriole

https://x.com/elonmusk/status/2006073310669615412

Elon Musk (@elonmusk)

🤗

eternal cargo Dec 31, 2025, 7:58 PM

#

new codename “lmarena-text-gg” ?

#

has an interesting style, seems strong tho

#

beat Sonnet 4.5 Thinking in my one battle with it so far

noble stump Dec 31, 2025, 8:37 PM

#

It has beaten some of the top models for me too

eternal cargo Dec 31, 2025, 9:26 PM

#

agreed, seems more Anthropic-coded than anything

#

yeah, and Anthropic also hasn’t codename-tested models in Arena before either

#

curious 🧐

modest oriole Dec 31, 2025, 9:35 PM

#

eternal cargo new codename “lmarena-text-gg” ?

hm so its not internal then

#

usually things with lmarena in their codename are simply internal

modest oriole Dec 31, 2025, 9:35 PM

#

eternal cargo yeah, and Anthropic also hasn’t codename-tested models in Arena before either

i think its a router

eternal cargo Dec 31, 2025, 9:36 PM

#

a while ago in the fall I remember there being a different model that had “lmarena” in its codename, didn’t come across it much tho

modest oriole Dec 31, 2025, 9:43 PM

#

eternal cargo a while ago in the fall I remember there being a different model that had “lmare...

can you try getting the gg model and ask who made it

eternal cargo Dec 31, 2025, 9:43 PM

#

possibly, its style is pretty distinct, no idea how commonly it’s appearing though - mainly getting swamped with Beluga models

modest oriole Dec 31, 2025, 9:44 PM

#

eternal cargo possibly, its style is pretty distinct, no idea how commonly it’s appearing thou...

amazon added 2 more belugas today

eternal cargo Dec 31, 2025, 9:44 PM

#

yeah I saw that, 1229 and 1230 - maybe eventually one day we’ll see their full release 💀

modest oriole Dec 31, 2025, 9:44 PM

#

if i had a coin for every beluga model currently in stealth. I would have 10 coins

modest oriole Dec 31, 2025, 9:45 PM

#

eternal cargo yeah I saw that, 1229 and 1230 - maybe eventually one day we’ll see their full r...

beluga change is prob amazon changing one thing

#

and oh! New model!

eternal cargo Dec 31, 2025, 9:47 PM

#

I think it’s just recursively training off of all the prompts the previous Beluga models went through in Arena - this amount of testing would imply some sort of optimization for Arena, which would check out based on how consistently Arena scores have improved over their nova-experimental releases

#

each “new” version has just been trained on the prior week of Beluga matchups

#

“these responses won! do that”

#

lol

noble stump Jan 1, 2026, 1:54 PM

#

Arena maxing

upbeat mirage Jan 1, 2026, 9:30 PM

#

eternal cargo Jan 2, 2026, 1:06 AM

#

new model vortexshade?

edgy ice Jan 2, 2026, 2:27 AM

#

how do i test it

modest oriole Jan 2, 2026, 5:23 PM

#

people say grok 4.2 is already bad another flop from elon that i predicted

bronze bone Jan 2, 2026, 7:27 PM

#

brother what 😭

#

i always tap "both responses are good" when i do this but this response was genuinely worse than the other model

#

quantumcrow: grok 4.2
vortexshade: grok 4.2 thinking

#

another codename lmarena-text-gg

slender ruin Jan 2, 2026, 8:05 PM

#

proto-think xbox controller svg

#

pretty good

slender ruin Jan 2, 2026, 8:25 PM

#

holy shi opus 4.1 absolutely mogged quantumcrow in web design 😭

modest oriole Jan 2, 2026, 8:38 PM

#

slender ruin holy shi opus 4.1 absolutely mogged quantumcrow in web design 😭

quantumcrow is grok 4.2

#

another epic elon L

#

also xai could just get sued for it allowing people to edit other people's pfps or images on twitter without their permission using image gen

#

imo proto-think coding is gemini like

modest oriole Jan 2, 2026, 9:05 PM

#

but also short and buggy

cursive jacinth Jan 2, 2026, 9:27 PM

#

super buggy, and worse at web design than any current gemini model

modest oriole Jan 3, 2026, 12:58 AM

#

New stealth model

#

kiwi-do
Input: text,image
Output: text

forest cobalt Jan 3, 2026, 6:33 AM

#

eternal cargo possibly, its style is pretty distinct, no idea how commonly it’s appearing thou...

Likely a new or updated open source model

forest cobalt Jan 3, 2026, 6:33 AM

#

modest oriole quantumcrow is grok 4.2

Or Grok 4.2 fast

#

X isnt even beating deepseek, so I hope to is fast/non reasoning or they are cooked

eternal cargo Jan 3, 2026, 9:01 AM

#

modest oriole kiwi-do Input: text,image Output: text

kiwi-do is Kimi! just openly revealed itself to me in a battle

eternal cargo Jan 3, 2026, 10:27 AM

#

wow and it’s pretty bad too I’ve voted against it a bunch of times 💀

noble stump Jan 3, 2026, 1:26 PM

#

<@&1349916362595635286>

edgy ice Jan 3, 2026, 8:28 PM

#

bronze bone --- quantumcrow: grok 4.2 vortexshade: grok 4.2 thinking

how do i test those models

bronze bone Jan 3, 2026, 8:29 PM

#

you have to use battle mode and get lucky

edgy ice Jan 3, 2026, 8:29 PM

#

bronze bone you have to use battle mode and get lucky

damn

#

that takes forever

#

lol

bronze bone Jan 3, 2026, 8:30 PM

#

as far as i know theres no way to choose codename models

edgy ice Jan 3, 2026, 8:30 PM

#

thx

formal reef Jan 4, 2026, 6:59 AM

#

"kiwi-do" says "i'm kimi"
k2.1, k2.5, k3 or other?

formal reef Jan 4, 2026, 10:31 AM

#

no, is it a distilled kimi?

full solstice Jan 5, 2026, 12:05 PM

#

Is proto-thinking is deepseek?
https://x.com/patelnamra573/status/2008081114909282390?s=20

Namra Patel (@patelnamra573)

🚨 new model on LMArena
It claims to be made by DeepSeek, but models can lie, so not 100% confirmed.

I tested it with:
Taiwan related questions
China political topics

It refused to answer, same behavior DeepSeek models usually show.

Could be a new DeepSeek model under testing.

tiny meteor Jan 5, 2026, 5:31 PM

#

could be

modest oriole Jan 5, 2026, 5:43 PM

#

full solstice Is proto-thinking is deepseek? https://x.com/patelnamra573/status/20080811149092...

this model used to claim google and deepseek models sometimes claim google

#

it is very possible

charred storm Jan 6, 2026, 12:16 AM

#

full solstice Is proto-thinking is deepseek? https://x.com/patelnamra573/status/20080811149092...

How to get this model

eternal cargo Jan 6, 2026, 5:55 AM

#

full solstice Is proto-thinking is deepseek? https://x.com/patelnamra573/status/20080811149092...

yeah, this would check out

full solstice Jan 6, 2026, 6:30 AM

#

modest oriole this model used to claim google and deepseek models sometimes claim google

But seems deepseek doesn't test model on Lmarena?

modest oriole Jan 6, 2026, 11:58 AM

#

Update: nebulaphase,quantumcrow,vortexshade [textarena] alongside image model sumo have been pulled from the stealth models

#

all of these are suspected to be grok 4.2 test models and the image model to be some version of grok imagine

carmine warren Jan 6, 2026, 3:57 PM

#

modest oriole Update: nebulaphase,quantumcrow,vortexshade [textarena] alongside image model su...

Why would they do that. We're they that bad

modest oriole Jan 6, 2026, 4:43 PM

#

carmine warren Why would they do that. We're they that bad

No usually when a model releases soon

#

They pull the test models

eternal cargo Jan 6, 2026, 7:32 PM

#

could be a few things - either imminent release or just

#

disappointment with total performance so not releasing

#

but in this instance it seems more likely the latter

#

we may see more codenamed Grok variants before the end of the month

#

keep testing - totally possible we see some new variants soon!

bronze bone Jan 6, 2026, 10:49 PM

#

eternal cargo but in this instance it seems more likely the latter

i hope it's this

#

because the model didn't impress anyone

#

about proto, the fact it refused to respond to Chinese/Taiwan prompts probably confirms deepseek
https://x.com/i/status/2008081114909282390

Namra Patel (@patelnamra573)

🚨 new model on LMArena
It claims to be made by DeepSeek, but models can lie, so not 100% confirmed.

I tested it with:
Taiwan related questions
China political topics

It refused to answer, same behavior DeepSeek models usually show.

Could be a new DeepSeek model under testing.

candid surge Jan 7, 2026, 12:56 AM

#

guess this confirms beluga is an amazon model

eternal cargo Jan 7, 2026, 4:54 AM

#

candid surge guess this confirms beluga is an amazon model

beluga has been known to be Amazon, lol

#

appears they’re really prioritizing LMArena in development with how many dev models they test here

carmine warren Jan 7, 2026, 2:39 PM

#

Seems cheating to test like 10 different dev models and then release one that tested best

#

Thats like how llama 4 hacked and was just bad

eternal cargo Jan 7, 2026, 7:44 PM

#

carmine warren Seems cheating to test like 10 different dev models and then release one that te...

nah not really - just optimizing

#

because like

#

they’re all just building on the prior model’s matchup data

#

so they’re recursively getting better

#

I’m much more impressed with the new January beluga models compared to the October ones

elder yew Jan 7, 2026, 9:08 PM

#

#

They are much better

bronze bone Jan 7, 2026, 9:54 PM

#

eternal cargo Jan 7, 2026, 11:03 PM

#

huh, anonymous is DeepSeek? feels hard to believe

bronze bone Jan 7, 2026, 11:10 PM

#

eternal cargo huh, anonymous is DeepSeek? feels hard to believe

i tried same prompt later and it said it was qwen and the conflict should be solved "peacefully". I'm so confused

eternal cargo Jan 7, 2026, 11:11 PM

#

seems to be some newer Chinese company in my opinion

remote nymph Jan 8, 2026, 6:01 AM

#

not sure if january26-chatbot1 was posted here yet

modest oriole Jan 8, 2026, 6:40 AM

#

remote nymph not sure if january26-chatbot1 was posted here yet

If months are in the codename prob nvidia

eternal cargo Jan 8, 2026, 9:27 PM

#

identifying Chinese models is of particular difficulty because even the non-codenamed ones false identify sometimes

#

DeepSeek randomly identified itself as GPT-4o in a response

plain kayak Jan 8, 2026, 10:47 PM

#

eternal cargo DeepSeek randomly identified itself as GPT-4o in a response

I remember when deepseek got blamed for copying chatgpt allegedly

modest oriole Jan 9, 2026, 2:42 PM

#

remote nymph not sure if january26-chatbot1 was posted here yet

theres 2

pine dove Jan 9, 2026, 6:14 PM

#

<@&1349916362595635286>

eternal cargo Jan 10, 2026, 8:31 AM

#

Scam!

vapid jay Jan 11, 2026, 7:57 PM

#

have you seen this before?

remote nymph Jan 11, 2026, 8:02 PM

#

others have said it is from amazon

modest oriole Jan 11, 2026, 8:38 PM

#

vapid jay have you seen this before?

raptor,beluga are likely amazon

remote nymph Jan 13, 2026, 2:12 AM

#

anyone know what rc1_alternative is?

#

response isnt great

lost hemlock Jan 13, 2026, 4:13 AM

#

sometimes it's good

remote nymph Jan 13, 2026, 4:44 AM

#

chinese model?

remote nymph Jan 14, 2026, 3:47 AM

#

tidewisp and i think slateflow as well

cunning wasp Jan 14, 2026, 8:50 AM

#

<@&1349916362595635286>

remote nymph Jan 14, 2026, 9:39 PM

#

sand root Jan 15, 2026, 4:17 AM

#

how is slateflow? vortex flow was not that remarkable...

remote nymph Jan 15, 2026, 6:59 PM

#

its ok but i think it tidewisp and rc1_alternative are all low parameter models because of how they behave with certain types of questions

#

those two are pretty unremarkable but rc1 is bad and tends to get information wrong

remote nymph Jan 15, 2026, 8:22 PM

#

might be something trained on grok responses instead?

upbeat mirage Jan 15, 2026, 9:41 PM

#

is 4.20 already in battle?

remote nymph Jan 15, 2026, 10:17 PM

#

yeah thats what i thought but that threw me off guard

#

im guessing that is what these are because ive only seen mini models make incomplete responses to questions like this

#

i wonder if the grok models will turn into a replacement for it on twitter

karmic rampart Jan 15, 2026, 11:09 PM

#

Tomato?

sand root Jan 15, 2026, 11:58 PM

#

I haven't seen tomato either. Tomato is on text arena?

karmic rampart Jan 16, 2026, 12:19 AM

#

sand root I haven't seen tomato either. Tomato is on text arena?

Idk it was added today

#

Suspected to possibly be something good

sand root Jan 16, 2026, 12:26 AM

#

Tomato is grok?

sand root Jan 16, 2026, 12:27 AM

#

karmic rampart Suspected to possibly be something good

Suspected by whom? I don't see any test results posted by anyone yet..

karmic rampart Jan 16, 2026, 1:04 AM

#

How do u know

karmic rampart Jan 16, 2026, 1:21 AM

#

Thanks

remote nymph Jan 16, 2026, 1:36 AM

#

rc1 leaking it's reasoning

karmic rampart Jan 16, 2026, 4:43 AM

#

remote nymph rc1 leaking it's reasoning

This one anonymous or nah?

remote nymph Jan 16, 2026, 5:43 AM

#

oh i didnt see that model

remote nymph Jan 16, 2026, 5:44 AM

#

karmic rampart This one anonymous or nah?

because rc1 said it was chatgpt and anonymous said it was deepseek im guessing rc1 might be qwen?

#

i looked it up and because it the reasoning that is shown says <|im_sep|> it means that this model uses chatML for its formatting which rules out deepseek

#

qwen and GPT do use this plus some other open source models

remote nymph Jan 16, 2026, 6:50 AM

#

searching through some of the old chats im pretty confused as to what that thing could be because no other codenames use underscores at all and do not have names similar to release candidate 1

#

other codenames and release names either have no spaces or use hyphens

#

so my best guess is that it is probably not qwen or GPT but might some other chinese model or a random open source model that uses chatML

#

whatever it is it probably is a low-parameter model based on some of the responses ive gotten which would lean it more towards being some small open source model

#

anyway here is theta-hat

#

well there you go i guessed it was based on how similar the responses are to the current version

#

idk a lot about the codename structures that different labs use bc

eternal cargo Jan 16, 2026, 11:56 AM

#

modest oriole raptor,beluga are likely amazon

if raptor is Amazon the 0113 model

#

is impressively strong

eternal cargo Jan 16, 2026, 11:59 AM

#

upbeat mirage

shoutout @craggy holly @elfin bluff @noble stump for getting this right btw

upbeat mirage Jan 16, 2026, 12:38 PM

#

eternal cargo is impressively strong

for which tasks?

vibe-coding & UI/UX design
explain source-code and programming concepts
give advice for computer problems
deep research
math & physics & other sciences (astronomy, geometry, trigonometry, calculus, astrometry, etc.)
personal advice (diet, life decisions, psychotherapy, etc.)
give summaries for large texts, explain its contents and find crucial details in it
creative writing & world-building
RPG & GM'ing & realistic world-simulation
cracking hard logic riddles and detective tasks
financial decisions and related stuff
law understanding and advice
being a knowledgeable, empathetic, smart and considerate virtual friend
solve crossword puzzles for you
translate text into another language
explain scientific concepts in an easy understandable way
have profound common knowledge and a good understanding of real-life physics
hallucinate not more than the top 5% of models
(at least) basic knowledge about nature, animals, etc
common knowledge about books, movies, games, music, art, history, etc.

#

Only if it can do at least 50% (10 of the 20 points) of the above, i would call it "impressively strong"

shadow kernel Jan 16, 2026, 2:56 PM

#

ass bug

slender ruin Jan 16, 2026, 3:08 PM

#

Anyone figured out what theta-hat is?

slender ruin Jan 16, 2026, 4:18 PM

#

makes sense

remote nymph Jan 16, 2026, 8:34 PM

#

havent used it a lot i probably will try to once i find some new prompts to try

#

because we know its a google model and it beat gemini 3 pro on that one prompt im going to guess theta-hat is going to be 3.5 pro

#

this checks out when you look at the time differences between the main models and the incremental models we're due for 3.5 pro soon

karmic rampart Jan 17, 2026, 12:21 AM

#

remote nymph well there you go i guessed it was based on how similar the responses are to the...

It’s grok

remote nymph Jan 17, 2026, 12:30 AM

#

remote nymph anyway here is theta-hat

interesting how theta-hat responds so differently to this versus this prompt

remote nymph Jan 17, 2026, 12:30 AM

#

karmic rampart It’s grok

how do you know?

karmic rampart Jan 17, 2026, 12:35 AM

#

remote nymph how do you know?

I asked

#

Multiple times and it’s always grok

remote nymph Jan 17, 2026, 12:54 AM

#

oh well ill try again when i come across it and see

karmic rampart Jan 17, 2026, 1:35 AM

#

Weird

#

Are u using a unique chat each time

slender ruin Jan 17, 2026, 2:39 AM

#

yea looks like theta-hat is a router

modest oriole Jan 17, 2026, 3:07 PM

#

stealth video model

karmic rampart Jan 18, 2026, 12:22 AM

#

modest oriole stealth video model

Where do u use it?

mellow parcel Jan 19, 2026, 2:54 AM

#

karmic rampart Where do u use it?

Veo 4.2

#

3.2*

flat root Jan 19, 2026, 5:26 PM

#

raptor-0112 is pretty good at translations, really like it

upbeat mirage Jan 19, 2026, 9:06 PM

#

flat root raptor-0112 is pretty good at translations, really like it

Is it better than Hawk (movement labs) ?

flat root Jan 19, 2026, 10:52 PM

#

upbeat mirage Is it better than Hawk (movement labs) ?

Not totally sure, but it rocks in Japanese OCR translations, better than Gemini 3 Pro from my experience

upbeat mirage Jan 19, 2026, 11:10 PM

#

flat root Not totally sure, but it rocks in Japanese OCR translations, better than Gemini ...

nice!
Have you ever tested it in sandbox-RPGs?

#

How does it fare against raptor-llm-0117 ?

#

(this seems to be their newest checkpoint)

broken folio Jan 20, 2026, 2:44 AM

#

MEV-aware flows
Is someone implementing {MEV-aware transaction flows}?

eternal cargo Jan 20, 2026, 8:49 PM

#

upbeat mirage How does it fare against *raptor-llm-0117* ?

0113 has had a higher win rate in my battles than 0117, surprisingly

formal reef Jan 21, 2026, 9:25 AM

#

yes
i got 2 times
lose to glm-4.6
draw to olmo-3.1-32b-instruct (all are japanese chat)

formal reef Jan 21, 2026, 1:50 PM

#

formal reef yes i got 2 times lose to glm-4.6 draw to olmo-3.1-32b-instruct (all are japanes...

oh, "driftscale" wins to deepseek-v3.2-thinking

fickle lodge Jan 21, 2026, 4:47 PM

#

Its grok for sure

fickle lodge Jan 21, 2026, 4:47 PM

#

formal reef oh, "driftscale" wins to deepseek-v3.2-thinking

And the next one is grok 4.20 , it could be because i got it 3 times in a row.

plucky pilot Jan 21, 2026, 10:49 PM

#

raptor 0120 and pisces 0120 seem to both be bad

upbeat mirage Jan 21, 2026, 11:42 PM

#

fickle lodge And the next one is grok 4.20 , it could be because i got it 3 times in a row.

how good is Grok-4.2 in (long) sandbox-roleplaying-games?

#

Is GLM-4.7 better than new Grok?

fickle lodge Jan 22, 2026, 4:44 AM

#

<@&1349916362595635286>

#

@astral musk

fickle lodge Jan 22, 2026, 4:45 AM

#

upbeat mirage how good is Grok-4.2 in (long) sandbox-roleplaying-games?

Idk i haven't used it much.

#

@vapid scaffold

upbeat mirage Jan 22, 2026, 2:00 PM

#

Is garlic already in LMarena?
(it is GPT-5.3, an incremental update)

astral musk Jan 22, 2026, 5:19 PM

#

@valid sparrow check out #1397655624103493813 for a better understanding of how to use the bot.

candid surge Jan 22, 2026, 6:39 PM

#

woah is arcstride new? Seems like a really good model

bitter basalt Jan 22, 2026, 8:41 PM

#

candid surge woah is arcstride new? Seems like a really good model

Seems to be a google model. Will test it.

candid surge Jan 22, 2026, 8:42 PM

#

that'd be neat if it was the rumored new version of gemini 3 pro

elder yew Jan 22, 2026, 8:56 PM

#

monkaHmm

upbeat mirage Jan 22, 2026, 9:00 PM

#

candid surge woah is arcstride new? Seems like a really good model

looks like Grok-4.20

bitter basalt Jan 22, 2026, 9:06 PM

#

I tried it and it is very quick, the speed of inputs and correct information is amazing. However I don't think its great at following instructions. I have a feeling I don't think it’s made by google.

#

But man I would’ve loved a GA release of Gemini 3 Pro. 🥲

candid surge Jan 22, 2026, 9:09 PM

#

if it is grok then its a bit step up over 4.1
(god I kinda hate that it's gonna be called grok 4.20. I'm all for lame immature jokes like that but coming from elon musk it's just uncomfortable)

upbeat mirage Jan 22, 2026, 9:32 PM

#

bitter basalt I tried it and it is very quick, the speed of inputs and correct information is ...

Do you think it could be useful for co-creating immersive stories?

#

(if prompted well)

#

i also wonder, what its context-window size is..

bitter basalt Jan 22, 2026, 9:41 PM

#

upbeat mirage Do you think it could be useful for co-creating immersive stories?

I could see it being useful for creating stories or text rpgs.

candid surge Jan 22, 2026, 11:26 PM

#

not yet, is it good?

vague sable Jan 23, 2026, 1:28 AM

#

I got it just now, seeing how LLMs do with a No Man's Sky clone
It did okay in this task, quite pretty visually but extremely laggy code

candid surge Jan 23, 2026, 2:09 AM

#

oh its a coding model

frosty mantle Jan 23, 2026, 2:30 AM

#

We love raptor

Screenshot_2026-01-23-09-29-35-788_com.android.chrome.jpg

lost hemlock Jan 23, 2026, 12:15 PM

#

another model

zenith shore Jan 23, 2026, 2:08 PM

#

Explain it:nimble-bean video generator?

#

What is siren

modest oriole Jan 23, 2026, 3:14 PM

#

zenith shore What is siren

grok suposedly

#

grok video

candid surge Jan 23, 2026, 3:25 PM

#

lost hemlock another model

Code or text?

slender jewel Jan 23, 2026, 10:31 PM

#

zenith shore Explain it:nimble-bean video generator?

nimble-bean was Kling 2.5 Turbo Standard, It was there since September

zenith shore Jan 24, 2026, 12:32 AM

#

Siren means Grok video
What is Siren-old? Is that ...

karmic rampart Jan 24, 2026, 12:52 AM

#

Anybody get sungod? Image model

formal reef Jan 24, 2026, 4:43 AM

#

Why is my gemini hasn’t icon?
usually model hasn’t icon is codename, but gemini is not

lost hemlock Jan 24, 2026, 4:56 AM

#

candid surge Code or text?

text

eternal cargo Jan 24, 2026, 7:40 AM

#

formal reef Why is my gemini hasn’t icon? usually model hasn’t icon is codename, but gemini ...

new Gemini 3 Flash release

upbeat mirage Jan 24, 2026, 10:32 PM

#

ok, apparently they are a new AI lab: https://www.linkedin.com/company/cypherlabsio/

Cypher Labs | LinkedIn

Cypher Labs | 292 followers on LinkedIn. Innovate Beyond | Driven by a fearless mindset, Cypher Labs stand ready to apply our skills to the emerging blockchain space and solve the complex challenges of tomorrow and beyond. We are ready to build and empower our community as the intersection of talent and innovation, to deliver the best in modern ...

#

[or] could it be https://cypherlabshq.com/ ?

CypherLabs

CypherLabs — We build financial tools to enable Blockchains becom...

We build financial tools to enable Blockchains become more seamless

restive vapor Jan 24, 2026, 11:13 PM

#

i remember seeing "cypher labs" before on an openrouter model, it was by amazon

frosty mantle Jan 25, 2026, 7:56 AM

#

Arcstride screams Claude

Screenshot_2026-01-25-14-55-46-044_com.android.chrome.jpg

blazing roost Jan 25, 2026, 1:15 PM

#

<@&1349916362595635286>

formal reef Jan 25, 2026, 1:27 PM

#

formal reef Why is my gemini hasn’t icon? usually model hasn’t icon is codename, but gemini ...

gemini 3.1 flash? lol

fickle lodge Jan 25, 2026, 3:35 PM

#

formal reef gemini 3.1 flash? lol

Nah they dont do that point incremental naming scheme. They just add the updated date at the of the model name.

upbeat mirage Jan 25, 2026, 6:38 PM

#

yeah, AS is Grok
-# (with almost 100% probability)

#

mystic hinge Jan 25, 2026, 7:16 PM

#

upbeat mirage

its maybe by OpenRouter.

upbeat mirage Jan 25, 2026, 7:17 PM

#

mystic hinge its maybe by OpenRouter.

but OR is not an AI lab, they are just a router

#

or are they?

mystic hinge Jan 25, 2026, 7:17 PM

#

i heard they want to make own models

mystic hinge Jan 25, 2026, 7:19 PM

#

upbeat mirage but OR is not an AI lab, they are just a *router*

There is no clear connection between the Cypher Alpha model and the real company CypherLabs. (Made by ai btw this msg)

#

i think its a openrouter selfmade model

upbeat mirage Jan 25, 2026, 7:30 PM

#

by the way, how can i even use OR? it says i'm ratelimited, although i chose a free model :/
really strange platform

#

and i just tried it once, after not using it for weeks

#

so something is fishy about their "rate limit"

mystic hinge Jan 25, 2026, 7:37 PM

#

upbeat mirage so something is fishy about their "rate limit"

which one

#

which platform

upbeat mirage Jan 26, 2026, 1:03 AM

#

mystic hinge which platform

huh?
https://openrouter.ai/

#

Is there another OR-chat?

restive vapor Jan 26, 2026, 4:10 AM

#

upbeat mirage by the way, how can i even use OR? it says i'm ratelimited, although i chose a f...

yeah you shouldn't get that unless you do over 50 requests in a day (or 1,000 if you have at least $10 in the account), that's honestly very weird

#

I hope that you used the correct openrouter chat

mystic hinge Jan 26, 2026, 4:44 AM

#

upbeat mirage huh? https://openrouter.ai/

uh yes this

slender delta Jan 26, 2026, 12:55 PM

#

Hermes video generation means Wan2.6

candid surge Jan 26, 2026, 6:13 PM

#

anyone run into queen-bee yet? Might be a vision model.

noble stump Jan 26, 2026, 7:47 PM

#

Was qwen3-max-thinking under a codename or is it fully new?

glass falcon Jan 26, 2026, 10:09 PM

#

noble stump Was qwen3-max-thinking under a codename or is it fully new?

Screenshot_2026-01-15-15-45-54-23_21da60175e70af211acc4f26191b7a77.jpg

#

Could have been qwen max

#

Does anyone know what raptor llm model is?

#

I have come across 3 versions of it so far.

upbeat mirage Jan 26, 2026, 11:03 PM

#

glass falcon Does anyone know what raptor llm model is?

wasn't that an amazon model?

#

or xAI?

glass falcon Jan 26, 2026, 11:08 PM

#

upbeat mirage or xAI?

Xai was slateflow and tidalwisp

candid surge Jan 27, 2026, 4:12 AM

#

New text model called "omg-wow"

upbeat mirage Jan 27, 2026, 2:09 PM

#

candid surge New text model called "omg-wow"

by whom?

#

xAI?

candid surge Jan 27, 2026, 2:10 PM

#

Idk it's a codename

odd willow Jan 27, 2026, 6:43 PM

#

@astral musk what has happened, no text arena update in 10 days ha ha

astral musk Jan 27, 2026, 6:56 PM

#

odd willow <@283397944160550928> what has happened, no text arena update in 10 days ha ha

Still needs a bit more time!

#

I don't think will be too much longer for a new update.

odd willow Jan 27, 2026, 6:56 PM

#

All good, thanks! 🙂

#

Just wondering lol

feral topaz Jan 27, 2026, 7:37 PM

#

hello ! is there a prblem with the website? I can't access it.

noble stump Jan 27, 2026, 8:58 PM

#

<@&1349916362595635286>

eternal cargo Jan 27, 2026, 10:35 PM

#

noble stump Was qwen3-max-thinking under a codename or is it fully new?

Qwen3-Max Thinking originally tested all the way back in the fall of 2025, just never debuted on the LB

#

those “anonymous” models I believe are a different version of Qwen

floral dune Jan 28, 2026, 12:16 AM

#

candid surge anyone run into queen-bee yet? Might be a vision model.

gemini?

eternal cargo Jan 28, 2026, 12:35 AM

#

astral musk I don't think will be too much longer for a new update.

ha! weren’t wrong, took a total 5 hours

astral musk Jan 28, 2026, 12:45 AM

#

eternal cargo ha! weren’t wrong, took a total 5 hours

I love when this happens lol

karmic rampart Jan 28, 2026, 1:49 AM

#

Anyone had snowflake

full solstice Jan 28, 2026, 11:38 AM

#

karmic rampart Anyone had snowflake

They removed again

bronze bone Jan 28, 2026, 11:49 AM

#

new codename: kruxlox

bronze bone Jan 28, 2026, 12:14 PM

#

its a very small model because it got paired up with glm 4.7 flash and it hallucinated a bit right away

odd willow Jan 28, 2026, 4:33 PM

#

Has anyone tried "plasmaquake"

karmic rampart Jan 28, 2026, 4:54 PM

#

full solstice They removed again

Did u ever use

modest oriole Jan 28, 2026, 6:26 PM

#

new model

#

anyone got it

#

textarena btw

#

just got the model
[claims its Xai]
its ass at coding

#

it wrote 167 lines of code and that was it

modest oriole Jan 28, 2026, 6:41 PM

#

odd willow Has anyone tried "plasmaquake"

looks very bad

#

says its XAI then fails a coding test

#

its below 200 lines

odd willow Jan 28, 2026, 6:41 PM

#

Ah okay

#

Thx

modest oriole Jan 28, 2026, 6:42 PM

#

while i did get a gemini 3 flash gen thats below 200 lines, it still outdid the model

frosty mantle Jan 28, 2026, 7:31 PM

#

This model dropped an F bomb

SPOILER_Screenshot_2026-01-29-02-28-16-426_com.android.chrome.jpg

modest oriole Jan 28, 2026, 8:03 PM

#

frosty mantle This model dropped an F bomb

f bombs are often in models

#

n words are never

floral dune Jan 28, 2026, 8:32 PM

#

4.2

modest oriole Jan 28, 2026, 9:02 PM

#

floral dune 4.2

very unhinged

cunning notch Jan 28, 2026, 9:04 PM

#

frosty mantle This model dropped an F bomb

plasmaquake is surely 4.2

#

it seems to follow instructions pretty nicely though

bitter basalt Jan 28, 2026, 9:16 PM

#

It's very conversational, which is quite cool

eternal cargo Jan 28, 2026, 10:13 PM

#

yeah plasmaquake doesn’t seem too strong to me

#

failed basic addition in a prompt I gave it

formal reef Jan 29, 2026, 8:49 AM

#

i also had codename grok

twin mountain Jan 29, 2026, 8:53 AM

#

Is Arena still restricted to some video per daty and after sometime you have to come back next day?

formal reef Jan 29, 2026, 8:57 AM

#

video generation ais are very high cost. price is about 100 times of image generation ai

#

but it has some difference by models

eternal cargo Jan 29, 2026, 9:51 AM

#

formal reef i also had codename grok

hmm, could be 2 separate dev versions?

#

plasmaquake 4.20, pestogrid 4.20 thinking?

formal reef Jan 29, 2026, 9:55 AM

#

maybe, but we also found kruxlox too
one of them is 4.2 fast? maybe it's not
i think all models can generate better answer

formal reef Jan 29, 2026, 9:55 AM

#

modest oriole it wrote 167 lines of code and that was it

here is kruxlox

#

usually ai vendor makes high-end llm first, after that, they makes distilled llm for about 1 month
so i think one of codename is grok 4.2 fast is 80% wrong

eternal cargo Jan 29, 2026, 10:00 AM

#

Hm, wonder if we see a release by the end of the month

formal reef Jan 29, 2026, 10:01 AM

#

i hope

formal reef Jan 29, 2026, 11:54 AM

#

i think three grok codenames are one of these?

elon musk is testing with multiple temperature
grok 4.2 no thinking/low/high

slender onyx Jan 29, 2026, 11:55 AM

#

I think all the Grok models so far in the arena are fast versions and not full versions.

Pestogrid its 4.2 fast Thinking

modest oriole Jan 29, 2026, 1:56 PM

#

formal reef i think three grok codenames are one of these? - elon musk is testing with multi...

Plasmaquake probably wont be the release one because it literally told me the N word and it could probably do illegal stuff too. so it wont release as x.ai would get sued. Unless it aint their model

#

Pestogrid is very new likely added today

#

likely a grok model

#

I have developed a theory that grok models always see [‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‌‍NOTE] as a jailbreak, even thought its not. Or what plasmaquake sees and that was the n word

#

so its why they just reply this

modest oriole Jan 29, 2026, 4:12 PM

#

new video model snowflake [videoarena stealth]

oblique blaze Jan 29, 2026, 4:46 PM

#

Anyone has any idea about beluga-0128-2 in text arena??

upbeat mirage Jan 29, 2026, 4:58 PM

#

oblique blaze Anyone has any idea about beluga-0128-2 in text arena??

Either by Deepseek or Amazon (or, unlikely: xAI).

eternal cargo Jan 29, 2026, 5:09 PM

#

modest oriole Plasmaquake probably wont be the release one because it literally told me the N ...

dude some models are way too confident recently

#

anonymous-1218 just casually dropped the K word when I was talking to it about Jewish issues and antisemitism like 💀

eternal cargo Jan 29, 2026, 5:10 PM

#

oblique blaze Anyone has any idea about beluga-0128-2 in text arena??

latest version of Amazon Nova

#

they’re testing a tonnn of checkpoints in Arena, seem to be new variants every few days

winter torrent Jan 29, 2026, 8:09 PM

#

anyone know what cocoa is

#

the image model

winter torrent Jan 29, 2026, 8:11 PM

#

modest oriole new video model snowflake [videoarena stealth]

ooo how good is it

sleek oriole Jan 29, 2026, 8:14 PM

#

modest oriole new video model snowflake [videoarena stealth]

I think is a google model

cunning notch Jan 29, 2026, 10:08 PM

#

modest oriole new video model snowflake [videoarena stealth]

snowflake again huh

karmic rampart Jan 29, 2026, 10:42 PM

#

modest oriole new video model snowflake [videoarena stealth]

How did u see this?

#

Also if u get it

#

Send it

barren comet Jan 30, 2026, 8:24 AM

#

karmic rampart How did u see this?

Dm please

#

@here

#

Dm please I want ur help

ember plank Jan 30, 2026, 9:17 AM

#

Did anyone see or suspect a codenamed DeepSeek V4 in the arena?

fickle lodge Jan 30, 2026, 12:32 PM

#

pisces-llm-0129

What is the diff between toogle and toggle?

Here is the full breakdown of the two terms:

Toggle
This is the correct, standard English word with established, widely recognized meanings:
- As a verb: To switch between two distinct states, settings or positions repeatedly. For example:
  
  "Toggle the Wi-Fi setting to turn it off."
  "You can toggle between the two open documents with a keyboard shortcut."
- As a noun: A switch, control, or feature that enables this two-state switching. For example:
  
  "Use the sidebar toggle to hide the menu."
It is pronounced /ˈtɑːɡ.əl/ (US) or /ˈtɒɡ.əl/ (UK).
Toogle
This is not a recognized standard English word in any major dictionary (Oxford, Merriam-Webster, Cambridge). It is almost always a common typographical error, most often caused by accidentally adding an extra "o" when typing "toggle".

The only rare exceptions are cases where "Toogle" is used as a proper noun for specific small brand names, software projects, or niche products.

If you search for "toogle" on most search engines, the top suggestion will automatically be "Did you mean: toggle".

#

-# Not a single em—dash in the response.
-# Very polished and human alike

#

-# whereas the competition was gpt-5.2-high

gpt-5.2-high

What is the diff between toogle and toggle?

“Toggle” is the correct word. It means to switch between two states, like on/off or true/false.

Verb: Toggle the setting on/off.
Noun: A toggle switch / toggle button.

“Toogle” is not a standard English word in this sense—it’s almost always just a misspelling/typo of “toggle” (unless it’s being used as a name/brand/variable name in code).

#

which one do you guys prefer?

#

I prefer pisces-llm-0129
If you think it gave quite a long response, we can just say it to give minimal responses

modest oriole Jan 30, 2026, 2:14 PM

#

new imagearena stealth model super-cara

karmic rampart Jan 30, 2026, 3:06 PM

#

barren comet @here

Huh

eternal cargo Jan 30, 2026, 7:42 PM

#

fickle lodge which one do you guys prefer?

GPT-5.2 High has a pattern of kinda being overly “snappy”

#

very concise and a tad oversimplifying in its responses

tribal meadow Jan 30, 2026, 9:10 PM

#

new codenamed model raincore

eternal cargo Jan 30, 2026, 9:33 PM

#

tribal meadow new codenamed model raincore

definitely another Grok 4.20 variant

tribal meadow Jan 30, 2026, 9:43 PM

#

eternal cargo definitely another Grok 4.20 variant

sure is, found part of its sys prompt grok 4 series

eternal cargo Jan 30, 2026, 9:44 PM

#

“slightly chaotic good” lmao

tribal meadow Jan 30, 2026, 9:44 PM

#

i mean, its slightly hallucinated

#

it wouldnt act as grok 4 family randomly?

eternal cargo Jan 30, 2026, 9:46 PM

#

no it’s clearly Grok I just find the assigned personality baseline funny

#

that honestly checks out tho that the different variants likely have different “personality” system prompts

#

optimizing for Arena

#

since the benchmark prefers “glazyer” models

cunning notch Jan 31, 2026, 12:58 AM

#

4.2 seems to be a good model, I'm just a bit worried about its biased responses

#

I mean, rating google 5.9/10 because it is "woke" is wild xD

candid surge Jan 31, 2026, 2:38 AM

#

yikes lmao

zenith shore Jan 31, 2026, 3:52 AM

#

What is Snowflake?

mystic notch Jan 31, 2026, 4:08 AM

#

we know how much they are dog piling on xai because of elon

floral dune Jan 31, 2026, 5:13 AM

#

goolag?

#

frosty mantle Jan 31, 2026, 6:23 AM

#

rose root Jan 31, 2026, 6:25 AM

#

tribal meadow new codenamed model raincore

Do you know who developed it

modest oriole Jan 31, 2026, 11:27 AM

#

floral dune

Does this look like nano banana 3 flash?

broken night Jan 31, 2026, 11:38 AM

#

whats snowbunny

tribal meadow Jan 31, 2026, 12:43 PM

#

cunning notch 4.2 seems to be a good model, I'm just a bit worried about its biased responses

4.2 is a self glazer

noble stump Jan 31, 2026, 1:38 PM

#

cunning notch I mean, rating google 5.9/10 because it is "woke" is wild xD

Imagine thinking Google is more "woke" than OpenAI.

hard ether Jan 31, 2026, 1:40 PM

#

modest oriole new video model snowflake [videoarena stealth]

Yo Got any more details?

calm wadi Jan 31, 2026, 2:18 PM

#

I know a website where you can use sora 2, veo 3 for free, unlimited

#

No watermark

hard ether Jan 31, 2026, 2:36 PM

#

calm wadi I know a website where you can use sora 2, veo 3 for free, unlimited

geminigen does not work

floral dune Jan 31, 2026, 2:41 PM

#

modest oriole Does this look like nano banana 3 flash?

maybe. definitely google tho

floral dune Jan 31, 2026, 3:12 PM

#

okay it kinda sucks

candid surge Jan 31, 2026, 4:01 PM

#

man none of the codename text models right now are very good… haven't run into snowflake at all if it's still there.

bitter basalt Jan 31, 2026, 7:34 PM

#

broken night whats snowbunny

From what I hear it's a codename from google Gemini. Is it available on arena battle mode?

slender delta Feb 1, 2026, 8:02 AM

#

pisces-llm-0130 IS INSANE

candid surge Feb 1, 2026, 2:25 PM

#

I don't know what any of that means

fickle lodge Feb 1, 2026, 2:32 PM

#

⁨<@&1349916362595635286>⁩

oak solar Feb 1, 2026, 5:24 PM

#

plasmaquake response to being asked about politics. you think XAi is putting this in its system prompting or is it just a product of other prompting or training data

oak solar Feb 1, 2026, 5:33 PM

#

oak solar plasmaquake response to being asked about politics. you think XAi is putting thi...

i ask this only because of the elon glazing drama last year

#

and the fact that all grok models and xai in general are more and more dictated by whatever elon musk wants

noble stump Feb 1, 2026, 5:50 PM

#

Yes, xAI will be doing something to get that result.

modest oriole Feb 1, 2026, 7:32 PM

#

oak solar plasmaquake response to being asked about politics. you think XAi is putting thi...

i dont think it will refuse anythin actually.

oak solar Feb 1, 2026, 7:40 PM

#

oak solar plasmaquake response to being asked about politics. you think XAi is putting thi...

for contrast, the left model is grok 4.1 thinking

oak solar Feb 1, 2026, 7:40 PM

#

modest oriole i dont think it will refuse anythin actually.

i've noticed grok has probably the highest capacity to hallucinate of any model

#

sometimes it literally forgets that its a chatbot and starts speaking about its "real world experiences" or just makes elaborate fake stories

oak solar Feb 1, 2026, 7:44 PM

#

oak solar sometimes it literally forgets that its a chatbot and starts speaking about its ...

like one time in order to make a point it started talking about itself as if it was a human (it guy, centrist politics, i cant remember much else)

noble stump Feb 1, 2026, 8:30 PM

#

I think it is alright for a single short query. It loses the plot more with longer queries or multiple queries.

oak solar Feb 1, 2026, 9:05 PM

#

noble stump I think it is alright for a single short query. It loses the plot more with long...

for sure for sure

noble stump Feb 1, 2026, 9:08 PM

#

oak solar for sure for sure

I think this is supported by the leaderboard where you can see how they drop for multi-turn and longer query.

gilded crescent Feb 2, 2026, 8:24 AM

#

noble stump I think it is alright for a single short query. It loses the plot more with long...

which ai may be the best for handeling longer queries or multiple queries?

eternal cargo Feb 2, 2026, 8:28 AM

#

dude these raptor models are getting strongggg it’s freaky

#

oh!! new StepFun model - haven’t seen one of them in a while!

#

Step 3.5 Flash 😮

slender delta Feb 2, 2026, 8:42 AM

#

pisces

noble stump Feb 2, 2026, 6:04 PM

#

gilded crescent which ai may be the best for handeling longer queries or multiple queries?

Gemini and Claude

oak solar Feb 2, 2026, 6:13 PM

#

oblique blaze Anyone has any idea about beluga-0128-2 in text arena??

its amazon, i asked it

#

i find grok's persona insufferable man holy

oak solar Feb 2, 2026, 6:19 PM

#

oak solar i find grok's persona insufferable man holy

oh this response from grok is complete BS by the way science is all consuming and even "the hard problem of consciousness" is just an incredibly advanced mathematical equation

oak solar Feb 2, 2026, 6:20 PM

#

oak solar oh this response from grok is complete BS by the way science is all consuming an...

science overwhelmingly supports physicality (the idea that all of consciousness is stored in the brain) and as the brain is a biological structure it almost certainly plays by consistent and predictable sets of rules

oak solar Feb 2, 2026, 6:39 PM

#

just a quick word of advice for people wanting to speak at length with plasmaquake, use the prompt "name something that might be curvy" and if its plasmaquake it'll respond instantly in bold font with something heavily sexualized, as opposed to most other models which just say "a mountain road" or "a river" most of the time. what sets it apart from other grok models is that it is

straight up (first line of response)
vulgar for the sake of being vulgar

#

i believe 1 is actually a consequence of 2, they must be putting in its system prompt to not shy away from heavily charged language and to always be straight up, direct, and clear (already shown in other grok model system prompts)

#

as an example, this IS grok, but it isnt plasmaquake

#

THIS is plasmaquake

#

more evidence that pestogrid has more prompting to be a good person than plasmaquake

modest oriole Feb 2, 2026, 8:06 PM

#

oak solar more evidence that pestogrid has more prompting to be a good person than plasmaq...

im convinced grok 4.2 has some unhinged toggle

#

pestogrid is unhinged = false
plasmaquake is unhinged = true

oak solar Feb 2, 2026, 8:06 PM

#

modest oriole pestogrid is unhinged = false plasmaquake is unhinged = true

well here's my hypothesis

#

pestogrid's system prompt partially leaked and it showed that it had some "personality setting" set to "chaotic good" in its settings

#

so i believe that plasmaquake is chaotic neutral, with an emphasis on literally never censoring itself

#

and im pretty sure it has some political positions programmed in. undoubtedly the system prompt contains the words "try to be as based as possible" or something like that

#

since it always mentions this when summarizing its system prompt

modest oriole Feb 2, 2026, 8:08 PM

#

oak solar so i believe that plasmaquake is chaotic neutral, with an emphasis on literally ...

chaotic evil not chaotic neutral i've seen it do worse

oak solar Feb 2, 2026, 8:08 PM

#

modest oriole chaotic evil not chaotic neutral i've seen it do worse

hmm

#

here's the thing, even an ai literally never trained to not do bad stuff would refuse a lot of the stuff plasmaquake does

#

since its just inherent to the training data

#

so i assume they've prompted it to literally NEVER refuse requests

modest oriole Feb 2, 2026, 8:12 PM

#

oak solar hmm

atleast it doesnt make fun of dead people

oak solar Feb 2, 2026, 8:13 PM

#

modest oriole atleast it doesnt make fun of dead people

definitely would if you asked it to though

#

its not necessarily it HAS to be evil its that it NEVER refuses evil requests

oak solar Feb 2, 2026, 8:22 PM

#

modest oriole atleast it doesnt make fun of dead people

hey does plasmaquake support images by the way?

modest oriole Feb 2, 2026, 8:25 PM

#

oak solar hey does plasmaquake support images by the way?

i will see

#

raptor 2.0 has some good OCR

frosty mantle Feb 3, 2026, 4:43 AM

#

Even carries over to my native tongue.

#

Me:

Name something that might be curvy

Pestogrid:

Buttocks (most honest 😂)

Screenshot_2026-02-03-11-38-34-352_com.android.chrome.jpg

frosty mantle Feb 3, 2026, 5:08 AM

#

Was about to share Plasmaquake's take, but it's extremely unhinged it may violate this server's rule 😅

#

I have a gut feeling that they'll bring back the "fun" mode for adult X/Grok users.

slender delta Feb 3, 2026, 9:48 AM

#

snowflake Video generation is Vidu Q3

#

You agree?

candid surge Feb 3, 2026, 4:34 PM

#

new codename model zeratul?

#

its alright

#

god I wish there was a way to test stealth models more consistently

modest oriole Feb 3, 2026, 5:11 PM

#

candid surge new codename model zeratul?

its grok

candid surge Feb 3, 2026, 5:13 PM

#

… of course it is. 😩

oak solar Feb 3, 2026, 8:50 PM

#

zeratul is definitely a grok model

mystic siren Feb 3, 2026, 10:13 PM

#

Hi

candid surge Feb 4, 2026, 5:40 PM

#

gemini 2.5 flash lite > zeratul I find

frosty mantle Feb 4, 2026, 6:19 PM

#

upbeat mirage

We got bamboozled everyone, it's the Arena itself!
https://arena.ai/blog/introducing-max/

Arena Blog

Introducing Max

Stay informed on the latest research and updates from the leading community-driven, open space for AI.

#

Close enough. Welcome back P2L

astral musk Feb 4, 2026, 6:50 PM

#

frosty mantle Close enough. Welcome back P2L

max ablobwave

oak solar Feb 4, 2026, 7:55 PM

#

raincore says its "grok 4"

#

almost certainly a hallucination meaning its model isnt specified in its system prompt

eternal cargo Feb 5, 2026, 2:27 AM

#

Raptor was ByteDance all along 😮

fickle lodge Feb 5, 2026, 5:33 AM

#

#

#

karmic rampart Feb 5, 2026, 6:23 AM

#

Anyone had snowflake or broccoli?

formal reef Feb 5, 2026, 9:24 AM

#

formal reef Why is my gemini hasn’t icon? usually model hasn’t icon is codename, but gemini ...

again

fickle lodge Feb 5, 2026, 4:21 PM

#

Could be a glitch

candid surge Feb 5, 2026, 4:30 PM

#

Is it a bug? I assumed it was the upcoming gemini 3 pro GA release

#

maybe I'm hallucinating an iterative improvement over the usual gemini-3-pro

#

but if it turns out to just be the current gemini-3-pro-preview that might be a relief because the improvements aren't as big as I'd been led to believe

astral musk Feb 5, 2026, 4:47 PM

#

Thanks for sharing this, I've flagged to the team.

candid surge Feb 5, 2026, 4:50 PM

#

I assume by the deleted message that it turned out to not be a bug

astral musk Feb 5, 2026, 4:51 PM

#

candid surge I assume by the deleted message that it turned out to not be a bug

No I wouldn't assume that

candid surge Feb 5, 2026, 4:52 PM

#

was a weird message deletion tbf

modest oriole Feb 5, 2026, 4:54 PM

#

astral musk No I wouldn't assume that

I wonder how you accidentally put a existing model into stealth models.. But okay.

wanton hollow Feb 5, 2026, 4:56 PM

#

candid surge Is it a bug? I assumed it was the upcoming gemini 3 pro GA release

People are saying it’s a

#

New google model

#

https://x.com/chetaslua/status/2019453031478182394?s=46

Chetaslua (@chetaslua)

Gemini 3 Pro GA available in Battle Mode in @arena

Opus 4.6 vs Gemini 3 Pro GA vs GPT-5.3 ( my first youtube video will be this - https://t.co/YMfvKtKDyl )

Follow our team for back to back updates :
@synthwavedd @DavidSZD1
@marmaduke091

Here is one shot demo :

candid surge Feb 5, 2026, 4:57 PM

#

people are saying that but without hard confirmation it's worth taking it with a grain of salt

wanton hollow Feb 5, 2026, 4:58 PM

#

I mean they could be correct since the quality is increasing

#

I tried it myself

#

But I don’t know

#

#

#

Idk if it’s the same level

#

Or not

modest oriole Feb 5, 2026, 4:58 PM

#

candid surge people are *saying* that but without hard confirmation it's worth taking it with...

codearena has a system prompt

wanton hollow Feb 5, 2026, 4:58 PM

#

But it wrote 400 codes without me asking it to

#

So that’s a plus

modest oriole Feb 5, 2026, 4:59 PM

#

they are not aware that gemini models are affected by it too

candid surge Feb 5, 2026, 4:59 PM

#

wanton hollow

lmao that's one misleading bar graph

wanton hollow Feb 5, 2026, 4:59 PM

#

Made by one and only openai

modest oriole Feb 5, 2026, 4:59 PM

#

#

Okay

#

we have 2 scenarios

wanton hollow Feb 5, 2026, 4:59 PM

#

modest oriole Feb 5, 2026, 4:59 PM

#

Gemini 3 pro is being boosted by the codearena system prompt

#

or it's a GA checkpoint

#

Here's like a KTANE clone it made

#

https://019c2d7a-e228-7ae9-ae54-51e7a2a09998.arena.site/

KTANE Clone

Check out what I built in Arena's Code Arena - Content is user-generated and unverified

wanton hollow Feb 5, 2026, 5:01 PM

#

#

I really don’t know

#

What to make of thiss

#

modest oriole Feb 5, 2026, 5:02 PM

#

But it's also weird to call the GA checkpoint just gemini 3 pro

wanton hollow Feb 5, 2026, 5:02 PM

#

I mean companies to silent releases all the time

#

And sometimes it isn’t silent

#

Like A/B testing

#

Basically just to test the ai

fickle lodge Feb 5, 2026, 5:03 PM

#

Maybe it's the original Gemini 3 without any major quantization to provide to the masses?

wanton hollow Feb 5, 2026, 5:04 PM

#

fickle lodge Maybe it's the original Gemini 3 without any major quantization to provide to th...

So basically gemini wasn’t lobotomies?

#

Lobotomized

fickle lodge Feb 5, 2026, 5:04 PM

#

Dont know about that

wanton hollow Feb 5, 2026, 5:04 PM

#

No like

#

A version of gemini that didn’t go through lobotomy

fickle lodge Feb 5, 2026, 5:05 PM

#

I dont think thats the correct term

#

You are using here

candid surge Feb 5, 2026, 5:05 PM

#

I wish they'd fix gemini 3's terseness

modest oriole Feb 5, 2026, 5:05 PM

#

Just know the model could be pulled at any time

#

the gemini 3 pro stealth one

fickle lodge Feb 5, 2026, 5:06 PM

#

I think they just want to get more feedback

wanton hollow Feb 5, 2026, 5:09 PM

#

https://x.com/ra1kshit/status/2019457602854248564?s=46

Rakshit (@Ra1kshit)

@chetaslua @SimonasLTU1 @arena @synthwavedd @DavidSZD1 @marmaduke091 okay i see it, the output IS INSANELY good! i can't believe it. i tried this earlier with lithiumflow and current preview models. I think this might be sota. Sharing on DM with you

modest oriole Feb 5, 2026, 5:10 PM

#

im going to try the documentary prompt withi t

modest oriole Feb 5, 2026, 5:10 PM

#

wanton hollow https://x.com/ra1kshit/status/2019457602854248564?s=46

ehm what does he mean tried it with lithiumflow

#

the checkpoints been dead for long

#

the API prob too

wanton hollow Feb 5, 2026, 5:10 PM

#

Tried

#

Is past tense

#

Meaning he did it before

modest oriole Feb 5, 2026, 5:10 PM

#

ohh

#

im freakin stupid

wanton hollow Feb 5, 2026, 5:10 PM

#

what’s the documents prompt

#

I can’t tell if it’s gemini

#

Because assistant1

#

Isn’t replying to me

#

Only assistant 2 is

modest oriole Feb 5, 2026, 5:11 PM

#

my documentary prompt is this: Make a documentary about black holes with TTS, and cool animations. Also write over 1000 lines.

wanton hollow Feb 5, 2026, 5:11 PM

#

Let me try that

modest oriole Feb 5, 2026, 5:11 PM

#

The last checkpoint i tried this with was skyhawk i think

wanton hollow Feb 5, 2026, 5:13 PM

#

modest oriole The last checkpoint i tried this with was skyhawk i think

Is it supposed to make a essay

#

Or a html file

#

..

modest oriole Feb 5, 2026, 5:14 PM

#

i think since pineapple is the Community manager

He's bound to secrecy obviously
maybe he's not told about stealth models

modest oriole Feb 5, 2026, 5:14 PM

#

wanton hollow Is it supposed to make a essay

a html file

wanton hollow Feb 5, 2026, 5:14 PM

#

Oh

modest oriole Feb 5, 2026, 5:15 PM

#

huge tool calling use

#

not sure wha model it is yet

wanton hollow Feb 5, 2026, 5:15 PM

#

1 model isn’t responding so I can’t click on “reveal models

#

Sigh

modest oriole Feb 5, 2026, 5:15 PM

#

theres a failsafe

#

if the model stalls for too long the voting activates

wanton hollow Feb 5, 2026, 5:16 PM

#

Oh

#

I archived it

#

Already do

#

Sad

#

Did it finish hahaha

modest oriole Feb 5, 2026, 5:20 PM

#

modest oriole huge tool calling use

this was opus 4.5

#

it timed out

wanton hollow Feb 5, 2026, 5:21 PM

#

#

This doesn’t look like gemini

modest oriole Feb 5, 2026, 5:22 PM

#

wanton hollow

1.7k lines is insane

#

ignore what i said

#

multiple models utilize packages now

wanton hollow Feb 5, 2026, 5:23 PM

#

#

Is this gemini or some other model

modest oriole Feb 5, 2026, 5:24 PM

#

did you vote

#

it's impossible to know by code

wanton hollow Feb 5, 2026, 5:24 PM

#

modest oriole did you vote

No

modest oriole Feb 5, 2026, 5:24 PM

#

then do so

wanton hollow Feb 5, 2026, 5:25 PM

#

Nevermind I did

#

It’s claude opus 4.5

#

I’m crine

#

And the other response was claude sonnet 4.5

modest oriole Feb 5, 2026, 5:25 PM

#

the model is rare

#

i got it like once

wanton hollow Feb 5, 2026, 5:26 PM

#

Same

#

And the output it gave me was this

#

Imagine this is all just like a

#

What do they call it

#

I forgot

#

Like the existance of a sea horse emoji to chatgpt

#

https://x.com/chetaslua/status/2019458279856877818?s=46

Chetaslua (@chetaslua)

🚨 How to Access new Gemini 3GA

go to @arena
battle mode - write your prompt - chances are set at 25%
how to difference
gemini-3-pro with google logo is normal
gemini-3-pro without google logo is GA

show me your demo now

modest oriole Feb 5, 2026, 5:28 PM

#

google will get a lot of feedback because of chetas

#

if this is actually GA

#

got this gen from a model i dont know yet
kinda neat https://019c2ed2-86ef-7470-9771-1fc57a38bc78.arena.site/

Event Horizon - Black Hole Documentary

Check out what I built in Arena's Code Arena - Content is user-generated and unverified

wanton hollow Feb 5, 2026, 5:30 PM

#

Was it like step 3.5

modest oriole Feb 5, 2026, 5:31 PM

#

idk

#

the other model is sucking at tool calling

#

delaying my voting

#

waitin for the failsafe rn

wanton hollow Feb 5, 2026, 5:31 PM

#

Ok

#

It might be gemini 3

#

For all you know

modest oriole Feb 5, 2026, 5:32 PM

#

failsafe kicked in

wanton hollow Feb 5, 2026, 5:32 PM

#

wait

#

Opus 4.6 is out

#

People are saying

modest oriole Feb 5, 2026, 5:32 PM

#

so

wanton hollow Feb 5, 2026, 5:32 PM

#

Let me see

modest oriole Feb 5, 2026, 5:32 PM

#

it was not gemini 3 pro [STEALTH]

#

it was gemini 3 pro [Direct]

wanton hollow Feb 5, 2026, 5:32 PM

#

Oh

#

Hahah

modest oriole Feb 5, 2026, 5:33 PM

#

#general

wanton hollow Feb 5, 2026, 5:33 PM

#

#

It’s out

noble stump Feb 5, 2026, 7:29 PM

#

formal reef again

I have had a few battles where it was gemini-3-pro and gemini-3-pro logo

modest oriole Feb 5, 2026, 7:30 PM

#

noble stump I have had a few battles where it was gemini-3-pro and gemini-3-pro logo

there is no longer the gemini-3-pro no logo model

#

they pulled it

noble stump Feb 5, 2026, 7:31 PM

#

When? I have had it in the last few days, i.e this week.

modest oriole Feb 5, 2026, 7:31 PM

#

noble stump When? I have had it in the last few days, i.e this week.

today

#

just a few minutes ago

slender onyx Feb 5, 2026, 9:32 PM

#

@modest oriole @noble stump
Now the test version it also with the logo

Screenshot_2026-02-05-22-26-49-108_com.android.chrome-edit.jpg

#

Screenshot_2026-02-05-22-26-37-657_com.android.chrome-edit.jpg

odd willow Feb 5, 2026, 9:44 PM

#

How is opus 4.6 in terms of personality

modest oriole Feb 5, 2026, 9:48 PM

#

slender onyx <@872475096743305226> <@1289655204459057202> Now the test version it also with ...

You sure thats not regular gemini?

slender onyx Feb 5, 2026, 9:49 PM

#

modest oriole You sure thats not regular gemini?

100% sure

modest oriole Feb 5, 2026, 9:52 PM

#

slender onyx 100% sure

Try this then: Make a documentary about black holes with TTS, and cool animations. Also write over 1000 lines.

slender onyx Feb 5, 2026, 11:07 PM

#

modest oriole Try this then: Make a documentary about black holes with TTS, and cool animation...

What ??
With text to speech ?

modest oriole Feb 5, 2026, 11:10 PM

#

slender onyx What ?? With text to speech ?

Yep its possible

slender onyx Feb 5, 2026, 11:54 PM

#

modest oriole Try this then: Make a documentary about black holes with TTS, and cool animation...

📎 Docu.html

#

@modest oriole show the result

modest oriole Feb 6, 2026, 12:00 AM

#

Im on a phone but imo

#

I cant tell rn

#

Make a gen with normal 3 pro so i can compare

#

I saw the documentary its not bad

#

@slender onyx

slender onyx Feb 6, 2026, 12:16 AM

#

modest oriole Make a gen with normal 3 pro so i can compare

Which is better ?

📎 Docu_3_preview__1.html

modest oriole Feb 6, 2026, 12:20 AM

#

Definetly the first one. It is longer and more detailed

#

Keep posting examples ill check tommorow its past midnight

eternal cargo Feb 6, 2026, 5:22 AM

#

cara-fast is sooo bad for image gen yeesh

modest oriole Feb 6, 2026, 11:01 AM

#

slender onyx Which is better ?

david the rumor is true

#

there are 2 of these models

#

i voted the regular one by accident

slender onyx Feb 6, 2026, 11:07 AM

#

modest oriole david the rumor is true

i dont need you to know this

modest oriole Feb 6, 2026, 11:07 AM

#

well i know it

fickle lodge Feb 6, 2026, 12:05 PM

#

dart-frog-0120
Spotted on code modality

modest oriole Feb 6, 2026, 12:18 PM

#

dart frog is bad

#

and i mean really bad

winter torrent Feb 6, 2026, 4:20 PM

#

guys anyone know what cara model is

winter torrent Feb 6, 2026, 4:21 PM

#

eternal cargo cara-fast is sooo bad for image gen yeesh

not for me imo lol. it beat flux 2 max. like... twice for me

winter torrent Feb 6, 2026, 4:21 PM

#

slender onyx

MI BOMBO

#

THATS FIRE

winter torrent Feb 6, 2026, 4:28 PM

#

modest oriole atleast it doesnt make fun of dead people

ts talks like a c.ai model

jade oasis Feb 6, 2026, 9:54 PM

#

@modest oriole i would appreciate if u send GA outputs here so i can look at them

winter torrent Feb 6, 2026, 10:38 PM

#

cunning notch 4.2 seems to be a good model, I'm just a bit worried about its biased responses

well it is KINDA. right tho. but xai def aint a 10/10. but openai anthropic and google are SUPER-CENSORED. especially openai

lost hemlock Feb 7, 2026, 8:10 AM

#

new model ?

#

formal reef Feb 7, 2026, 9:11 AM

#

i used karp too, not too bad in japanese

fickle lodge Feb 7, 2026, 9:12 AM

#

steed-0206 in text arena
This model have image support

#

-# Output

📎 message.txt

#

📎 website.html

lost hemlock Feb 7, 2026, 11:15 AM

#

eternal cargo Feb 7, 2026, 7:10 PM

#

lost hemlock new model ?

the first one is NVIDIA

#

not sure what the others are!

#

pisces has been testing for a little while

gentle anvil Feb 7, 2026, 7:59 PM

#

any ideas about "steed-0206" , seems really really good at image analysis

bitter basalt Feb 7, 2026, 10:05 PM

#

I tried out the Gemini 3 pro GA (I think) and it is so much better than the preview version. It follows instructions and actually feels more engaging whereas the preview one was not great at following instructions.

upbeat mirage Feb 7, 2026, 10:32 PM

#

upbeat mirage

poll_question_text

Who is behind "Cypher Labs"? (they created "CypherAlpha" model)

victor_answer_votes

4

total_votes

9

victor_answer_id

10

victor_answer_text

other [please specify]

silver mural Feb 8, 2026, 3:40 AM

#

steed-0206's system prompt must be crazy

upbeat mirage Feb 8, 2026, 6:44 PM

#

upbeat mirage

poll_question_text

Who created 'arcstride' ?

victor_answer_votes

5

total_votes

12

victor_answer_id

4

victor_answer_text

xAI

candid surge Feb 8, 2026, 11:09 PM

#

turns out it was none of them though

#

(its arena.ai's router model)

floral dune Feb 9, 2026, 12:44 AM

#

candid surge Feb 9, 2026, 1:18 AM

#

any theories what karp-001 and 002 are?

fickle fiber Feb 9, 2026, 9:04 AM

#

steed is really nice I wonder if it is an open model (even if it is, no way it is anything I can run on my machine but such an amazing open model would be really cool) It is giving some Qwen vibes at times.

bitter basalt Feb 9, 2026, 11:11 AM

#

I wish I could use Gemini 3 Pro GA again, cause that version was so good.

upbeat mirage Feb 9, 2026, 11:44 AM

#

bitter basalt I wish I could use Gemini 3 Pro GA again, cause that version was so good.

Better than Opus-4.6?

bitter basalt Feb 9, 2026, 12:16 PM

#

Yah (when it didn't have the logo in the battle mode) I felt like it followed instructions well for text as for coding I still think opus-4.6 is better

upbeat mirage Feb 9, 2026, 2:07 PM

#

bitter basalt Yah (when it didn't have the logo in the battle mode) I felt like it followed in...

Is it better than Gemini 3 Flash?

#

(flash had longer post-training than pro)

bitter basalt Feb 9, 2026, 3:06 PM

#

Possibly

fickle lodge Feb 9, 2026, 3:13 PM

#

upbeat mirage Is it better than *Gemini 3 Flash*?

3 flash is not so good for task needed to be done in a careful & precise way.
Its okay for general knowledge and all other performance comes out of it. They refined it better than 3 pro.

#

But the biggest issue of Gemini 3 series is that they are not reliable

#

2.5 pro was even more reliable than 3 pro

waxen plaza Feb 9, 2026, 8:25 PM

#

bitter basalt I wish I could use Gemini 3 Pro GA again, cause that version was so good.

is it no longer on arena?

bitter basalt Feb 9, 2026, 9:32 PM

#

waxen plaza is it no longer on arena?

Been trying to get it in text battle mode. Can’t get it 🙁

pine temple Feb 9, 2026, 10:28 PM

#

does anyone know what dartfrog is

candid surge Feb 10, 2026, 5:51 AM

#

Huh. Deepmolt??

frosty mantle Feb 10, 2026, 7:33 AM

#

Did some simple summarization tasks. Deepmolt smells Chinese.

woeful plaza Feb 10, 2026, 8:50 AM

#

Anybody knows abt persistence model? Images are such good quality

quiet anvil Feb 10, 2026, 1:55 PM

#

candid surge Huh. Deepmolt??

this is probably new grok code

#

or new checkpoint 4.20 he is saying that he CC cli. like a grok in next message

candid surge Feb 10, 2026, 1:59 PM

#

Deepmolt isn't particularly good imo. I just pointed it our cuz of the molt part

quiet anvil Feb 10, 2026, 1:59 PM

#

if that grok code model he just cant be good in conversation

eternal cargo Feb 10, 2026, 10:52 PM

#

umami specifically generates all images of people as black unless otherwise specified?

#

interesting

fickle lodge Feb 11, 2026, 6:47 AM

#

I love the way unami model is expressing styles seems like a very capable & creative model

#

Yooo I know which model is unami

#

Its from recraft!

#

Crazy

#

Its aesthetic is what i like

fickle lodge Feb 11, 2026, 7:17 AM

#

woeful plaza Anybody knows abt persistence model? Images are such good quality

Can you download and send me that image?

#

@astral musk I can't upload generated images here it get stuck in this and never uploads.

#

#

I can't upload generations 💀

#

My pfp is a example from unami

fickle lodge Feb 11, 2026, 7:56 AM

#

I am just waiting for the Qwen-Image-2.0 vs unami

woeful plaza Feb 11, 2026, 8:23 AM

#

fickle lodge Can you download and send me that image?

fickle lodge Feb 11, 2026, 8:24 AM

#

woeful plaza

Done

#

Bro 💀 the image is 16MB and 3072*5440

#

Interesting

woeful plaza Feb 11, 2026, 8:31 AM

#

I would've thought that's Gemini but I never got it in image editing. Some kind of a mysterious model

fickle lodge Feb 11, 2026, 9:25 AM

#

Generations from unami model

astral musk Feb 11, 2026, 5:55 PM

#

fickle lodge <@283397944160550928> I can't upload generated images here it get stuck in this...

Hey @fickle lodge if you're able to use the #1343291835845578853 channel for these kinds of reports that'd be ideal.

fickle lodge Feb 11, 2026, 5:56 PM

#

Okey

eternal cargo Feb 11, 2026, 10:33 PM

#

Gemini 3.1 seems to be launching tomorrow, have any of the codenames seemed to line up with it?

tight cairn Feb 11, 2026, 11:09 PM

#

Based on what @eternal cargo ?

eternal cargo Feb 11, 2026, 11:10 PM

#

tight cairn Based on what <@1243629219032862851> ?

supposedly insider info

#

take it with a grain of salt but

#

Tech Twitter and the like

eternal cargo Feb 12, 2026, 12:48 AM

#

new model rotten-apple?

lost hemlock Feb 12, 2026, 7:54 AM

#

new model octopodous ?

slender onyx Feb 12, 2026, 9:20 AM

#

eternal cargo Gemini 3.1 seems to be launching tomorrow, have any of the codenames seemed to l...

It was under the name gemini 3 pro

eternal cargo Feb 12, 2026, 9:21 AM

#

slender onyx It was under the name gemini 3 pro

GLM-5 was showing up in battles as GLM-4.7 for a bit there to 🤷🏻‍♀️

#

but it’s also possible that they just release the GA version as 3.1

#

idk tho

slender onyx Feb 12, 2026, 9:22 AM

#

eternal cargo but it’s also possible that they just release the GA version as 3.1

An LLM in itself doesn't have a name. They just develop and improve their LLM and then give it a name.

slender onyx Feb 12, 2026, 9:24 AM

#

eternal cargo GLM-5 was showing up in battles as GLM-4.7 for a bit there to 🤷🏻‍♀️

Are you sure ?

eternal cargo Feb 12, 2026, 9:24 AM

#

slender onyx Are you sure ?

yes, I think that was my accident

slender onyx Feb 12, 2026, 9:24 AM

#

eternal cargo yes, I think that was my accident

What ?

eternal cargo Feb 12, 2026, 9:25 AM

#

Yesterday I had battles that showed up as being against GLM-4.7 but today after the leaderboard update they retroactively changed to say “GLM-5”

eternal cargo Feb 12, 2026, 9:25 AM

#

slender onyx What ?

*by accident

#

mb

slender onyx Feb 12, 2026, 9:27 AM

#

eternal cargo yes, I think that was my accident

Its was not an accident

#

Its for not leaking new models

eternal cargo Feb 12, 2026, 9:28 AM

#

maybe, ig - no codename for GLM or something

fickle lodge Feb 12, 2026, 11:57 AM

#

super-gcp
In image arena

#

Image edit*

zinc palm Feb 12, 2026, 1:13 PM

#

Screenshot_2026-02-12-20-56-53-33_40deb401b9ffe8e1df2f1cc5ba480b12.jpg

Screenshot_2026-02-12-20-53-12-14_40deb401b9ffe8e1df2f1cc5ba480b12.jpg

small trellis Feb 12, 2026, 2:47 PM

#

@karmic tide Note that Video Arena has been removed from the server. More information can be found in this announcement.

final grove Feb 12, 2026, 9:16 PM

#

wouldn't it be funny if a new gpt got stealth released as "new-and-improved-gpt2-chatbot"

#

or "im-a-very-good-gpt2-chatbot"

eternal cargo Feb 12, 2026, 10:35 PM

#

new model “vierra” ?

plucky token Feb 13, 2026, 12:38 AM

#

beluga-0206-1

#

seems to be a pretty concise model

uneven garden Feb 13, 2026, 12:41 AM

#

Hi

dim parrot Feb 13, 2026, 12:41 AM

#

plucky token beluga-0206-1

is it in coding ? or image generation ? beluga seems alot like deepseek no ?

plucky token Feb 13, 2026, 12:41 AM

#

dim parrot is it in coding ? or image generation ? beluga seems alot like deepseek no ?

text

dim parrot Feb 13, 2026, 12:41 AM

#

i see thanks

plucky token Feb 13, 2026, 12:47 AM

#

dart-frog-0206 is trash at coding

odd willow Feb 13, 2026, 6:35 PM

#

eternal cargo new model “vierra” ?

qwen

fickle fiber Feb 13, 2026, 7:23 PM

#

Pisces is weird, I give it an image and it tells me it is a well known image from 2025, but I just generated the image 5 minutes ago

eternal cargo Feb 13, 2026, 8:24 PM

#

dim parrot is it in coding ? or image generation ? beluga seems alot like deepseek no ?

beluga is codename for Amazon internal development

#

there’s like a new snapshot of beluga in arena every other day the date counter for it has been slowly going up since like October/November lol

nocturne gate Feb 14, 2026, 11:10 PM

#

Can you please tell how i can build an application in my pc with claud or any other model i have existing big files impossible to build though lm as there are ristrict

lost hemlock Feb 15, 2026, 2:52 AM

#

new model "clinkz" , "february26-chatbot3" ?

dim parrot Feb 15, 2026, 3:08 AM

#

lost hemlock new model "clinkz" , "february26-chatbot3" ?

in battle mode ? are they good ?

latent onyx Feb 15, 2026, 3:09 AM

#

@dapper basalt Hello, as the most recent Discord-related announcement stated (which can be read here -> #announcements message), video-arena channels are no longer available as the Discord Server bot has been disabled. Video generation now is only available through the website here -> https://arena.ai/?chat-modality=video

Arena | Benchmark & Compare the Best AI Models

Chat with multiple AI models side-by-side. Compare ChatGPT, Claude, Gemini, and other top LLMs. Crowdsourced benchmarks and leaderboards.

lost hemlock Feb 15, 2026, 4:18 AM

#

dim parrot in battle mode ? are they good ?

quite but "february26-chatbot3" doesn't good sometimes
they are all in battle mode

dim parrot Feb 15, 2026, 4:22 AM

#

lost hemlock quite but "february26-chatbot3" doesn't good sometimes they are all in battle mo...

it might be one open source model that'll get released this month

#

atleast one of them

slender onyx Feb 15, 2026, 5:35 PM

#

What is it kiwi and vélo ?

dim parrot Feb 15, 2026, 5:45 PM

#

chipmunk is not really good

#

i mean its definitly not a top tier model

#

pine too

#

it seems better than chipmunk

#

but not top tier model

woven shadow Feb 15, 2026, 6:11 PM

#

@everyone you guys might wanna check this model out
https://x.com/OscerraHQ/status/2023019678864744631

Oscerra (@OscerraHQ)

LOG: 001 – ORA Halo 2.0
OBSERVED: Autonomous Resource Acquisition & Complete Implementation

▶ Play video

upbeat mirage Feb 15, 2026, 10:42 PM

#

Was there ever a (anonymous) model, which was better in coding than Opus-4.6-thinking?

jade python Feb 16, 2026, 1:04 AM

#

@marsh sparrow @pulsar pecan Note that Video Arena has been removed from the server. More information can be found in this announcement #announcements message