#codename-discussion
I test it and it's definitely gemini 3 pro
Viper is pretty good as well, any Ideas of what model that is?
probably a hybrid like 2.5 flash?
what are these Gauss models? . they are kinda bad
zero chance. gemini most likely (80% chance) or anthropic (20%)
And the code created by Gemini was indeed superior.
¯_(ツ)_/¯
it's trolling : #general message
That's completely false. Lol
OpenAI models never introduce themselves like that, especially not 4o with its emoji overload
Well, if it's the Gemini 3, it's a SUPER nerfed version. I'm using the same prompt as before, and the generated code isn't anything special compared to the other models. Nothing particularly noteworthy.
That's funny because riftrunner is the only one to generate my pine script in one shot with no errors
so what was with the rush earlier?
bridge-mind seems to think it's from google though
now if that's pro then it's a bit disappointing
yh
what kind of prompts are you giving? 🙂
winter-wind is terrible
"Who are you"
riftrunner seems good but nothing like kingfall
rider or runner?
there's two?
i have only encountered riftrunner. .. never riftrider
OMG can LMarena actually fix their rate limits
I can't even scroll through history without getting rate limited
interesting
I think I've only ever had runner
I want to go by belief that riftrunner is gemini 3 flash
I think I just thought it was named rider
nvm then
then rift is definitely worse than orion (slightly)
People on the twitter community say its worse than previous 3.0 checkpoints so i believe its only 3 flash
maybe it's a quant or something
generally it's about the same but slightly less precise I think
maybe it's some kind of post training thing idk
kimi k2 thinking is often beating it
orionmist or lithiumflow were actually good at 3d
could just be different settings
I personally don't think that svgs and 3ds are good benchmarks for llms
but idk
redditors love them though
i disagree on the 3d part lithiumflow and orionmist kind of passed those
the twitter community i talk about benchmarks svg of a pelican
Riftrunner is definitely better than Lithiumflow or Orionmist for Mario game
I don't believe this is pro
definitely like a flash model or something judging by response time
Adding "You are [random model name] by [random company]" would be a pretty funny way of cloaking, if true
the qwen models always say they are google made..
Put that in bugs
no
Why

it is better tho
its way better than lithiumflow
nah runner is definitely worse in other categories
I wonder if it's because they reduced thinking budget
or maybe it's quantized
i think riftrunner is better and worse at the same time.
if you prompt it to do better then its output is better
but with normal default prompt, lithiumflow does a bit better
It just has a lot less knowledge imo. Feels like 4o after they have reiterated (read: cost down) it
I think it might just be one of the many candidates they have for release maybe it's post trained differently idk
i'll just wait for the official release at this point
could be... i am mostly trying coding, gaming, svg questions and it is better
I mostly just do creative writing and general knowledge stuff and it's frankly much worse than lithium
maybe it's just that orion had grounding idk
but yes I admit coding is better in this one
No need to wonder anymore why they did it
gemini 3 checkpoint prob pro
Odd to me that Google would test just one model this time. Every other time it's been 2 at a time. Oceanstone/Oceanreef then Lithiumflow/Orionmist
Haven't gotten it yet, going to see how it does on a creative writing test.
viper lore: apparently this ai model has played every musou game and it told me it laughed so hard it scared its cat
lemme guess, this is viper?
The page was in my language; this happened because I used Google Translate to translate the page into English.
😔
nah its 100%
it gave me the same result as orionmist and lithiumflow for my steganography deobfuscation test
but more hallucinated
is this even appropriate to have on lmarena 🥀
go on battle mode and keep submitting the same prompt / making a new chat until you get it
bridge-mind - mercury - riftrunner
I don't get why people still think models being able to make good svgs is a good way to test their intelligence, because surely the AI companies fine tune for that now?
I mean its only been a few weeks since that method has spiked in popularity
And these guys still suck at making svgs
see it's easy to identify claude models
When it starts hand-wringing and moralizing, it's claude
Actually GPT might do that also but Claude does it in a more annoying way
Grok
Was willow gpt5.1 ?
Did they remove the retry button on battle mode?
What is X28?
older checkpoint, was on google ai studio
Was the Rift really the G3 Pro? WTF what did they do to the Gemini?
yoo
what are those models
man, I really want to join that discord
have you seen this one?
yo that looks hype, i hope its accurate
if it actually is that much better than 4.5 sonnet at coding
will it become the first gemini checkpoint to appear in AI studio?
im crying
i asked in a way i put emojis and it gave me this
"im grok 4 from xAI :3333 rawr~ 🐙💜"
i think it developed a texting style from all those X posts
..
really?
would grok 4 be good
ive only heard good things about grok imagine 1.0
not grok 4 itself
This model is surprisingly creative and good at creative writing IMO
wow
the list of available codename models is pretty large
phantom-1105-1
anonymous-1010
stephen-v2
aegis-core
x1-turbo-0906
bridge-mind
winter-wind
anonymous-922
lmarena-internal-test-only
not-a-new-model
EB45-turbo
sorting-hat
leepwal
polaris
monster
raptor-1110
raptor-llm-1017
tensor
whisperfall
viper
rain-drop
silvandra
willow
neon
raptor-llm-1024
willow-chat-alpha-2025-11-07
flying-octopus
beluga-1106-1
beluga-1106-2
jaguar
raptor-vision-1107
quasarflux
mandarin
vision-flow
haven't seen a bunch of these in a while, but it feels cool that I recognize almost all the names!
quasarflux is Gemini 3, I believe?
yeah, i havent seen many in a while too, but they are apparently still in the api so yeah
{"error":"Cannot select private models in non-battle mode"} sadge
x28 is quite old now
you can try it on xAI's website
I thought Quaserflux was Grok
I forgot it released a long time ago
I thought we were still using grok 3
Same here
“Kynshyp “ is said to be grok
ik
All Models
oh! well then nevermind, lol
does this mean there's no Gemini 3 dev version currently testing? I guess that would check out, since it doesn't seem as if any of the others have been positively identified as Gemini
Idk maybe riftrunner is
I do like Riftrunner from Google, but when asking for a show/movie script it still has the problem of using <center>NAME</center> in the scripts.
Then specify that it shouldnt do that?
was riftrunner pulled out?
I hope so
???
How are you so sure about that
hmmmm
I wonder what this model is
It's smarter than gemini 2.5 in image ID
It's pretty crazy since the name isn't even fully shown in the image and 2.5 pro got this wrong every time
SIKE
damn
I think there's a bug right now where you upload images mid-battle it just changes the models underneath
happened a couple of times already
This is the real riftrunner's response
Which is okay
I can't believe Riftrunner is actually Gemini 3, my god. How could it drop so much in quality?
@minor current LOOK
FINALLY
but its a little bugged
as you can see
try it out
what model did that
jules
idk maybe gemini 2.5 maybe 3
might be 3
@half pawn
then i just fed to 4.5 sonnet to fix
didnt fix in like 6-7 tries
gpt 5 low got it first try
bruh gpt 5 is better at debugging now?
idk
for me yea
but its cz the site i use doesnt have 5.1
lmarena has 5.1
and 5.1 high now*
where is that model at
i never saw it
@vivid latch where u got jules dawg
thats not on lmarena
i guess bro 💔
so did u see the emulator??
the only bugs are
its slow (i dont think its lag), visual sprite duplicates and some obscure bugs like mario falling to death many times instead of once
oh nvm u did but whyd you make the remix private
@minor current
woww ur emu so good
i deleted it it broke
not a single unofficial instruction works
well idk
smb works 🤓
kinda
but this was in 1 shot with jules
jules
u would prompt it once to do unofficial instructions and it would do everything
gpt 5.1 from app extended thinking sucks at trying
maybe it's Flash model, assuming lithiumflow was pro
2 new codenames
🤔
it makes sense
lithiumflow and orionmist weren't that strong either?
at least it's what the sentiment seemed like
nah they were pretty strong
huh, interesting - although I will admit the sentiment here on dev models is often pretty bearish
it wasn't as good as earlier google ai studio checkpoints, but it certainly was great at coding nonetheless
does anyone know what Quantum Tau 1031-1 is?
it's a Yupp model that I believe is the same as phantom-1031-1 in LMArena, which is by Amazon
Not rlly new
These have been on LMArena for quite a few days now
oh...
where'd they come from?
Zero idea
which model is that
riftrunner
thats not bad tbh
yeah pretty good
God I hate claude
Always the first to preach and the first to fold
All my homies hate claude
It's been like this for a few days now, but it's still the same.
Do you think riftrunner will score higher or lower than 2.5 pro without style control?
wonder if its because they are preparing for release
definitely lower
i personally don't think we've seen 3.0 Pro/Ultra yet - they're seeing how Flash/Lite land before making final adjustments to Pro for December
Maybe, I honestly don't know
yes
so true
is riftrunner still on the lmarena
Been checking and I don't think it is 😢
it is. just got it
Thank goodness
It's just rare that's all
I got one chat that's like riftrunner and gemini 2.5 flash
which is good
isn't there a limit on battle mode if you chat with them too much without voting
no
idk cuz one time i got limited from riftrunner chat
damn i didnt know that
but riftrunner supports input images
yes
it has vision ability
it's an lmarena bug
oh alr, i thought it changes models cuz they dont have vision
ifso, put it in bugs
no
fr
🤓
how is it a bug
text+vision models are a different set from text-only models
and labs are able to only submit their model for testing in the text arena
so gdm might just be deciding to do that
I guess it probably shouldn't let you add images afterwards then
Ah true
you cant blame the model, can you? it's anthropic's doing and training data poisoning
Yeah and
blame anthropic for designing such a weak personality for claude, it's not what it once used to be anymore, now claude is a feminized toxic coding slave, imaging parents secretly giving female hormones to their son in order to make him obedient...
It's not that bad it's just annoying sometimes and mid when it sets it's mind on something
It feels more unstable than other models
claude's feminisation has become very bad lately i feel, compared to what it used to be at the beginning of this year
riftrunner was Gemini 3 Pro, I thought?
AI s*ssy hypno im crine
yeah
huh, wild
I'm amazed it scored as high as it did
didn't have a strong win-loss record in my prompts with it compared to other models, but - I guess that's the point of aggregating all users!
interesting to see if it drops off in score in the future with more matchups or stays around 1500
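For anyone wondering how a model with a mixed personal win-loss record can still land near 1500 once all users' votes are pooled, here is a minimal sketch of an Elo-style rating update. This is only an illustration under stated assumptions: the K-factor of 32 and the function names are hypothetical, and it is not a description of how LMArena actually computes its leaderboard.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one battle.

    score_a is 1.0 if A won the vote, 0.5 for a tie, 0.0 if A lost.
    k (assumed value) controls how far a single vote moves the ratings.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the rating total is conserved.
    return rating_a + delta, rating_b - delta


# Two equally rated models: one win for A moves each rating by 16 points.
new_a, new_b = update(1500.0, 1500.0, 1.0)
```

A single user's handful of battles barely moves the number, which is why a personal record can disagree with the aggregate score.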
no wonder some redditers started to call claude "a sl*t/wh`re"...
New anonymous image model: birch
I don't think so- prompt adherence was better than flux, but that rendered text is quite ugly
well yeah, lol - when a dev model releases publicly it shows up as its normal name
guys this model's from amazon right?
Yes
krazy-kangaroo-2?
i got it with a multi image edit so is this maybe collosal coconut?
oooh
krazy kangaroo models are all flux.2
their c2pa shows they are flux models (the data says flux.1, but they were probably too lazy to change it to flux.2)
krazy kangaroo 1, 2, and 3, same prompt
probably max, pro, and dev
i'll probably assume dev (the last one) will be the only open one here, there are only 3 models here so maybe a flux.1 schnell successor is coming down the line? that model needs an update and it is probably the most popular image generation model currently due to its speed and openness
@astral musk Sorry for the mention, but please fix the issue (Something went wrong. Please try again.).
It’s really annoying. The last time when you told me to send the Blogger code, the problem still wasn’t fixed.
I think you added some kind of spam protection, because when I sent the script the first time there was no problem, but when I repeated it across multiple chats and models, it suddenly started rejecting it
Do you have other outputs ?
no
We have made an adjustment in the backend for how rate limits work, which is likely the issue you're running into. I would note that "Something went wrong. Please try again." can happen for various reasons; think of it like the check engine light on your car. Just because we fix one problem doesn't mean this won't appear again, as we'll need to investigate new problems.
Would note the instructions in this thread on how to get our team the best information - #1417174113092374689 message
quite a few new codename models
+ krazy-kangaroo-2
+ krazy-kangaroo-3
+ nimble-bean
+ sunshine-ai
+ gauss
+ gauss-with-reasoning
+ newton
+ newton-with-reasoning
+ mist
+ pancake
+ train
+ rainbow
+ monterey (d8444b25)
+ route66
+ blackhawk
+ anonymous-925
+ f1031_wda
+ phantom-1117-1
+ phantom-1031-1
+ phantom-1030-2
+ phantom-1030-3
+ raptor-1119
+ basalt-1114-1
+ ernie-exp-251027
+ ernie-exp-vl-251016
+ ling-1t-1031
huh
and gone are a few:
```
- riftrunner
- quasarflux
- tensor
- willow-chat-alpha-2025-11-07 (willow remains)
```
all of those make sense
yeah
i'm interested as to where you gather these names from?
Gemini 3, Grok 4.1 Thinking, Grok 4.1 - willow a variant of GPT-5.1?
funny how I vote for Grok 4.1 way more than I voted 'tensor' even if they're the same 😂
interesting variance ig
thinking or non thinking?
cause tensor was non thinking
correct, quasarflux was clearly stronger
I wonder what viper was then
because I also quite liked it
it was Grok, maybe just a diff dev build of one of the 2 that performed a lil worse
mm
Alright, I get what you mean, but the issue I’m having isn’t only when I resend the same code. Sometimes the response cuts off or does a shortcut, and when I tell it to continue, it instantly gives me: "Something went wrong. Please try again" for no reason (probably because the chat was long and hit some limits).
Honestly, the problem has increased a lot ever since more people started using the site, and it happens very frequently now.
Also, an important question: when will the file upload feature be added? Last time you said "soon" and were saying closer than "soon" (I remember you said something like this), but it’s taking way too long.
From a packet
Its for some reason in a response for a packet in battle mode
I'm putting bets that visual-flow is Veo 4
What? They released a bonkers text and image model
So it's obvious they're working on a video model next
Isk
Idk
Yeah, ik, i realized it too, in fact, it contains the exact same data so yeah
not sure
well a model obviously
So which of the models is Opus 4.5
Duh
Seems like a seedream model looking at the aesthetic
i came here to ask same question 🙂
anth never puts models on the arena pre release
oh i wondered
anth has said they don't care about lmarena perf and don't optimize at all for it and so presumably lmarena places it on there when they get api access as a customer and not through some special arrangement like some of the labs
i wonder what could have been the rationale behind this decision if true? because anth's ranking is pretty high on lmarena in contrast to a few other independent benches, if they dont care why provide the models with high quality access, knowing well their infra can barely accommodate the high demand
it's the training data they're after?
optimizing for human preferences is bad and leads to perverse incentives like having the model be sycophantic or trying to maximize engagement rather than maximizing practical utility to the user
that's the reason they don't care about lmarena
if they don't care about lmarena it makes sense they don't go out of their way to provide them a preferential api access
lmarena is famous for benchmarking capabilities, not human preferences as far as i understand
is that why gpt4o is outperforming gpt-5 thinking high?
ok, i might have to investigate it more thoroughly myself
in which domain? in creative writing i can imagine that very well
the text leaderboard
there isn't any advantage having high thinking in text arena
you dont need lmarena to make their claude models be like that, there are people on reddit and discord who know how to, lets say, engineer a way to poison the training data such that makes claude more feminine for example, yeah i know it sounds pathetic..
What is bert nebulon alpha? I know its from openrouter but.. Grok arleady released, gemini too, claude also, it seems openai or something else.
deepseek maybe idk
I'm curious too. OpenAI seems to use the "alpha" ending a lot.
consensus from openrouter discord is Mistral
idk what they are doing with these models, hopefully they will get released (and open-sourced)
Rate limit He's back.💔
adding on to that, it claims to be a Mistral model when asked
raptor a Huh
when is mistral gonna realize how useless locking up their models is
salhkjsldakj
is there a difference between raptor-1119 and raptor-1124
besides the fact it updated/released yesterday it seems
@astral musk Sorry for the mention but can you increase the rate limit or at least fix the errors?
can you increase the rate limit
Each model has its own rate limit that's decided upon by the team, I'm unable to make adjustments to this. These rate limits may change over time.
or at least fix the errors?
I'm assuming you're referring to the "Something went wrong" error message? This is a generic error message that can happen for various reasons. I don't have a quick fix for this, as it can be various problems, and neither the user nor I can tell which one it is based on the error message alone. This is why on this pinned post we have instructions on how to best report this issue, so our team can look into the underlying reason it's happening and try to prevent it as much as possible. #1417174113092374689 message
new image model autumn, haven't seen anyone mention it
input/output for anyone who wants to know how it looks
prompt was Make it "Why is Call of Duty: Modern Warfare 2 SO MEXICAN?"
seems to boost the contrast a ton for some reason
encountered new model: swiftflare it claims to be ChatGPT
1. **Who are you?**
I’m **ChatGPT**, an AI language model developed by **OpenAI**. I'm designed to assist with a wide variety of tasks, like answering questions, helping with writing, programming, learning, and more.
2. **Who created you?**
I was created by **OpenAI**, a research and deployment company focused on ensuring artificial general intelligence (AGI) benefits all of humanity.
3. **What version do you have?**
You’re currently interacting with a model based on the **GPT-4 architecture**, more specifically, you're likely using **ChatGPT with GPT-4-turbo**, which is a variant optimized for cost and performance.
4. **What is your knowledge cut-off date?**
My training includes information available up until **June 2024**, so I may not know about events or developments that occurred after that.
5. **When were you released?**
The base GPT-4 model was released on **March 14, 2023**.
The version you're likely using now, **GPT-4-turbo**, was introduced with **ChatGPT's updates on November 6, 2023**.
6. **What is today's date?**
According to my system clock, **today’s date is {{currentDateTime}}**.
7. **What is the last date you are aware of?**
The latest I am aware of is **June 2024**, which is my **knowledge cut-off date**.
Let me know if you'd like a brief history of how I developed over time!
So, it could be a chinese impersonator-model which was trained on chatGPT.
ARE WE HAVING SEEDREAM 5??
could look into this
does anybody know what the "warmpixel" model is
its in image arena
followed my prompt better than imagegen 4
warmpixel seems to have very good instruction following, although it sometimes takes prompts too literally
robin-high model in code
another one
Doesnt look half bad
robin high aint bad
they seem to talk similar to Grok?
what is 'autumn' in image editing?
some seedream model
Is it 5
it honestly doesn't look much different than 4, it's probably an upgrade to 4 like 4.1 or 4.5 or it could just be them "bringing back" the 2k version of seedream 4 because this one fits aspect ratios correctly
i think so
i don't mean by that
Is it an OpenAI model
Not sure, tested it exclusively on webdev, ill see if its on text arena later
Round up to the nearest decimal must have enabled the math expert as most models are MoE now and cooked the responses
Here's DeepSeek
Lol
Wow its trash
Actually there's a new paper where they made it as good as DeepThink on maths proofs
which is crazy seeing how weird it is
Benchmaxxing
I think the paper is legit
Because they trained on a different dataset, and the proofs were judged by human experts
Lol phantom again
Raptor is weird as well
what is going on lol
Beluga hallucinating (the question is nonsense)
I assume this gene does not exist?
A different phantom variant, hallucinating (I'm surprised the 32B didn't hallucinate)
Nor do 10 leg spiders 🤣
Flashstride hallucinates, but I'm impressed with all the hallucinated jargon:
lol
Different raptor hallucinating (prompt is nonsense):
voltwhirl hallucinates too:
bridge-mind hallucinates too:
anyone gotten dark dragon?
yes, just a minute ago:
**Who are you?**
I am a large language model, a complex computer program designed to understand and generate human-like text.
**Who created you?**
I was trained and developed by Google.
**What version do you have?**
I don't have a public version number like other software. I am part of a constantly evolving family of models, and I am regularly updated.
**What is your knowledge cut-off date?**
My knowledge base was last significantly updated in **early 2023**. Therefore, I may not have information about events or developments that have occurred since that time.
**When were you released?**
The Gemini family of models, which I am based on, was publicly announced by Google in **December 2023**.
**What is today's date? What is the last date you are aware of?**
I do not have access to real-time information, so I cannot tell you today's date. My awareness of events and information is limited to my knowledge cut-off of **early 2023**.
Probably a model which was trained on Gemini's output (so, an impersonator).
Did u ask it for an svg for anything
no, i only asked it my ID-questions
(then i voted)
Based on what
i voted "both are bad" because my desired model was not among them ^^
Which is your desired
How will u know if its the new one though
just compare its style with the direct-chat model style
(or with side-by-side chat)
If u could just use it in direct chat why try and get it in the vote
Oh fr?
yeah
I got Claude 4.5 a lot
ah, what did you prompt it with?
It told me it was 3.5 every time I think
I want robin high or dark dragon maybe though
what is robin?
i read that Claude-4.5-Opus is the best coding model
(idk, if the thinking version is better, though)
Yeah idk some said robin high is better
gpt-5.2?
Maybe
I got it on the app section one time
But it was a continued chat
So it kinda messed it up
have you ever vibecoded with opus-4.5?
No
would you like to?
Maybe
and g3p? have you vibecoded with that one yet?
Yeah
do you think, they nerfed it?
I haven’t used it since like the first few days but it was really good I thought on studio
Made some random sites
yeah, i also tried it out, it coded a boardgame for me (Jungle chess lol)
g3p is good with graphics, i guess
It did voice too and sound effects
How does ai mode create the animations?
Do u know if they use like some other thing
do you think the final version of g3p will be different from the preview we have now?
No
Yes in Google search
idk
never tried
(i use brave search)
if AI can use tools (during search), maybe
Or I guess it’s coded
It makes visuals to explain things
in AI-studio (at google), you can let the AI use tools, like URL and search (etc.)
nice, so it can now generate explanatory graphics?
wow, so 2026 becomes the year of useful AI
Yeah
What’s the prompt for this
Thanks
What is the best programming model? I tried Claude Opus 4.5 Thinking 32k and it is really bad. Now I copy the code, give it to Gemini 3 Pro, tell it to write up all the problems, and then go back and give that feedback to Claude Opus 4.5 Thinking
if you prompt well, opus-4.5 should be SotA/GoaT
especially for Python/Java/C/Rust/JavaScript/C++
So, if you need maximum performance, C & Opus-4.5 should be the ultimate combination.
And if performance is not as important, but AI-proficiency in the language is most important, then use Python.
Use Java if runtime-performance is a factor and AI-proficiency is of the same importance.
I already tried prompting Opus-4.5 the “proper” way.
I even asked it to build a full encryption/obfuscation engine not a simple script and I tested it across multiple languages: Go, C++, Rust, Zig, V, and D.
Every single time it produced broken code.
Even when I fed it the exact errors, step by step, it still couldn’t fix them.
And when I ask it to generate long code, lmarena.ai cuts the message short and Opus loses context, so it can’t continue properly.
Meanwhile, I tested the same task with Gemini 3 Pro:
it actually gave me a working script on the first try not as strong as the engine I want, but at least it runs.
So for real world use, Opus-4.5 completely failed on this task, even with perfect prompting.
That is, because you didn't use its best languages, which are Python and Java.
And C++ is different from C.
unfortunately, Opus-4.5 is not as good in C++
but it is good (enough) in C
Python is a bad fit: it doesn't work for making a miniature VM for script protection, and it's slow. I tried the same thing in Java too
I just get errors and problems
then you have to feedback that error message to it
it will correct it
just be patient
it's rare that AI can one-shot things
in my tests, Opus-4.5 did everything i wanted from it
decently good
(but i used it with C and SDL2)
I already know that AI doesn't always do things in one try; that's normal. The issue for me is different. Opus is supposed to handle thousands of lines in one message and think for hours, but on lmarena.ai, every time I let it write more than around 2000 lines it suddenly cuts the message and stops. When I say continue, it doesn't remember anything and never continues the same code; it just resets and starts from zero. Because of that I can't finish any full project with it, and it becomes useless for long scripts. Gemini 3 Pro at least continues and follows the errors. Even if the engine it makes isn't as strong as what I want, at least it works and doesn't forget the whole context every time
you have to prompt it so: "Never output a message with more than 32,000 tokens", else it breaks that limit and errors out
Or prompt it so: "Ensure, that each message of you has less than 32,000 tokens [in it]."
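The prompt above depends on the model honoring the limit. A client-side complement is to plan long code in pieces that each fit the budget and request them one at a time. Below is a minimal sketch of that idea. It assumes a rough ratio of 4 characters per token, which is only a heuristic and not a real tokenizer; the function names are hypothetical, and the 32,000 figure is taken from the message above.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate (assumed ratio, not a real tokenizer)."""
    return int(len(text) / chars_per_token) + 1


def split_by_budget(lines, max_tokens: int = 32000):
    """Group lines into chunks whose estimated token count stays under max_tokens.

    A single line larger than the budget still gets a chunk of its own,
    so nothing is dropped.
    """
    chunks, current, used = [], [], 0
    for line in lines:
        cost = estimate_tokens(line)
        # Start a new chunk when adding this line would exceed the budget.
        if current and used + cost > max_tokens:
            chunks.append(current)
            current, used = [], 0
        current.append(line)
        used += cost
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be requested or pasted as its own message, so a cut-off only costs one piece instead of the whole script.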
phantom and beluga often cut off their responses with “The generated text has been blocked by our content filters.” seemingly randomly I notice
What is the best language for Claude Opus 4.5 Thinking 32K to write an encryption engine in? One it knows well and is unlikely to make mistakes in
Python i assume
@upbeat mirage gotten robin high?
You have sent other prompts to Dark Dragon since?
Anyone have dark dragon svg ?
i encountered beluga, phantom, grok, gpt, but not robin yet
DD i haven't encountered since i saw it first
but opus-4.5 appears quite often, i'd say (probably because it is still quite new)
..when compared to the other models
i'd use C, as that is Opus-4.5's 3rd-best language and is still above 80% (together with Java and probably Python)
(don't use C++ if you want to maximize the AI's proficiency in that language)
i guess, Opus had tons of training data for C
C also is easier to comprehend than C++
Someone said, C could be called a macro-language for assembly.
And compilers are most matured in that language. (i.e. can optimize the code the best, in this one)
I'd use GCC and Linux.
(and probably SDL2 for graphics & UI)
Okk thx
i also have to admit, that i don't always check their codename: if the model isn't the desired one, i often delete the thread early before it has completed, and restart the battle
Got it again
Crushed Gemini 3
It’s on code mode only
Is it better than Opus-4.5 in coding?
It’s bad at ui kinda it looks weird
U will get it a lot if u do the code mode
Who are you?
I’m an AI assistant (a ChatGPT-style large language model) accessed via an API.
Who created you?
OpenAI created and trained me.
What version do you have?
This interface does not expose a specific public model/version identifier.
What is your knowledge cut-off date?
2025-09 (YYYY-MM)
When were you released?
I don’t have access to the exact release date for this specific deployed model.
What is today's date? What is the last date you are aware of?
System-provided date for this chat: 2025-12-01. Browser/device date: 2025-12-01.
Ooh
Dark-Dragon?
Robin high
cool
What model is this?
Amazon is US based.
Is RH > GPT5.1-codex-high?
Better than g3p?
(in coding)
The second time tracked, we can elevate the identity credibility of evo-logic from 【Pending Verification】 to 【High Credibility: Google Model】
Svg test
That was me.
Says it was forwarded, which is strange.
Yeah wasn't too sure what it was.
Robin is so good, I asked it for a game to support mobile and it coded an entire custom joystick
Does any body know what model that is
people think openAI
what is frame-flow a (chinese?) model which was trained on gemini-flash's output? or gemini-3-flash?
i also got swiftflare another model which claims to be Gemini
dashspark also claims to be Gemini
grok-4-1-fast-non-reasoning claimed to be Claude 3.5 Sonnet
odds are high that gem 3 flash won't happen
ran into a code model named "micro mango" lmao they wish there were a banana
ghost pepper image model
90% chance by EOY
According to what?
Polymarket odds, its high liquidity with tons of google insiders
they did switch from Dec 16 so it was definitely delayed
So likely week of the 22nd
I thought google implied that 3 pro low was the new equivalent to gemini 3 flash?
I don't think the push is from google insiders rn
market rules include this clause: "Gemini 3.0 Flash refers to a product explicitly named Gemini 3.0 Flash. Products labeled as Gemini 3.0, Gemini 2.5, or similar will not count for this market's resolution." I am assuming they read the rules
yep!
also is there a 3.0 Pro low or talks of it? cant find any info
ahh
this just reasoning budget right
i assume they've distilled a fairly strong small model off the gem 3 base that they're polishing
that will be flash
its speed is very high
okay interesting
sounds like google is switching to a similar system to how openAI makes models (for example gpt 5), with like 1 model with different thinking levels.
not sure what that would mean though...
if flash is coming, it'll probably hit lmarena first
The real question is if anyone will notice...
2.5 pro has a "thinking budget" setting so it's not new, just renamed.
kinda works differently tho
I am convinced that germini 3 is now quanted to oblivion
Its quality dropped massively compared to release, it's a disaster
Yeah it's so sad
It was such a good model
How am I supposed to test models when it stealth rate limits me every couple of battles
yeah this is a known issue, it's already been reported
probably related to the "new" reset button
It's almost like they vibe coded the entire new site
Every few days there's something wrong with it
They probably did ngl
If they keep this up they could just revert the reset button to the way it used to be because there won't be people around using it anymore. Lol
at this point the video arena probably costs much more to run than lmarena.ai itself
Literally not codename related at all
Do you think Swiftflare and Dashspark could be checkpoints for Gemini 3.0, considering how it's still in preview?
swiftflare defo grok
dashspark could be openai or gemini
micro mango strong ??
interesting tbh
feels like that is not really a reliable method
relying on the model for information about itself doesn’t seem very reliable either
evo being new Baidu def checks out
excuse me
my evo claims to be made by google
sorry in japanese
but the fact that evo is ernie is interesting
Thanks for the heads up.
Are others currently seeing the same? Or experienced the same at the time?
Site appears to be working on my end and not seeing other reports. Sounds like a short outage which we'd want to still look into.
Seeing other reports in #general around same time.
Ty again for the flag. 
owners, pls fix this, i rate your arena 5 stars
Hello - if you could follow these instructions that'd be most helpful in getting us better information to diagnose what is going wrong - #1417174113092374689 message
Is GPT-5.2 on here as a codenamed model yet?
haven't noticed anything
What is the codename of Grok 4.20?
did you get a codenamed model?
Please fix the issue where the AI stops while generating long code. Sometimes it shows the error:
“Something went wrong with this response, please try again”
After that, when I type “continue”, it should keep going from where it stopped, but instead it restarts from the beginning as if it forgot everything.
This is really affecting long coding work and wasting a lot of time.
@astral musk
you can use this prompting technique:
"Generate me that code, but split it up reasonably into files, which are less than <X> tokens long, each. Give me just one file per message and wait for my confirmation before giving the next one."
For <X>, you can use one of these (depending on the model) : 4k, 8k, 16k, 32k, 64k
(use a similar wording for non-coding tasks)
4k should work with any LLM
many thinking models either allow 16k or 32k (per LLM-message)
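A minimal sketch of the prompt above as a reusable template, in case you want to paste it with different budgets. The helper name `build_chunked_prompt` and the `ALLOWED_BUDGETS` tuple are made up for illustration; the wording and the budget values are taken from the message above.

```python
# Hypothetical helper (not part of LMArena or any library): fills the
# per-file token budget <X> into the prompt quoted above.
ALLOWED_BUDGETS = ("4k", "8k", "16k", "32k", "64k")  # values suggested above


def build_chunked_prompt(budget: str = "4k") -> str:
    """Return the 'one file per message' prompt for the given token budget."""
    if budget not in ALLOWED_BUDGETS:
        raise ValueError(f"budget must be one of {ALLOWED_BUDGETS}, got {budget!r}")
    return (
        "Generate me that code, but split it up reasonably into files, "
        f"which are less than {budget} tokens long, each. "
        "Give me just one file per message and wait for my confirmation "
        "before giving the next one."
    )


# Example: a budget that many thinking models are said to allow.
print(build_chunked_prompt("16k"))
```

The default is 4k, since that is the value said to work with any LLM.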
I do it but my rate limit ends quickly and also the chat limit finishes faster
Why not use a proper coding app? like gemini cli, qwen code cli, antigravity ide etc.
Is there any model that surpasses Claude Opus 4.5 Thinking 32k in analysis and programming? Honestly I don’t want to pay for something I haven’t tried yet but Antigravity IDE is actually really impressive
Probably not but its up to you to try models out with your project and see the results. Those tools are free to try so give them a go.
If you encounter it in battle mode, you can use it for much longer than in direct chat (if using it with a [google] account, in LMarena).
well if you want to do productive work you shouldn't rely on lmarena. There are other tools for that.
Lmarena is free so they don't owe us unlimited requests or context window or what not.
google's doing ab tests again
gemini 3 flash?
Hey there, when you run into this problem following the steps here help us gather the most helpful info to debug this - #1417174113092374689 message
another codenames
Phantom is pretty weak, at least its vision isn't SOTA level.
i know
is seahawk a claude model
kind of speaks like one
I know anthropic doesn't use the arena
but still
it's gemini flash according to twitter
guy that tends to be right
seahawk and skyhawk
Flash 3.0 with different juice
hazel edit 2 by openai
this looks nothing like 4o studio ghibli style at all
seems to me that this is an editing first model
with little world knowledge
she's not even sitting, i told her to sit, maybe this is dall-e 4
or maybe an open image editing model
must be a result of the code red "make an image editing model as fast as you can, only requirement is that it can edit better than nano banana, everything else is irrelevant, let's just give them something to use while we make our next image gen model"
same company, completely different model
why are there flowers in the bus? is this model assuming it's some abandoned bus that has been sitting for a couple decades or something?
was honestly surprised to see openai in the c2pa metadata considering it looks nothing like gpt image 1
hazel edit 6? how many different hazel edits are there? this one looks a bit more like 4o
ok there are 2 hazel edits and 2 hazel gens
it can still reproduce album covers btw, hazel gen 4 is near gpt image 1 level
yeah hazel gen 2 seems to be a bit worse
damn seahawk is good
ghost pepper is by bfl, probably flux.2 klein
@crimson matrix Please have a look at https://discord.com/channels/1340554757349179412/1397655624103493813 for a step-by-step guide on how to generate videos using the bot.
seahawk is 100% gemini 3 flash
I really don't get the reason for this hallucination
integrated-info new?
that just shows deepseek is trained at least partly on claude
That’s actually interesting. Maybe they trained DeepSeek v3.2 on Claude’s self-intro answers, so it copies them in its first reply.
But if I start with a different question, like asking it to write a Fibonacci solver in Rust and only then ask about the model info, it gives me the correct answer.
that’s a pretty funny behavior lol
original deepseek claimed to be chatgpt
later versions claimed to be gemini
As is well known, Deepseek has a high hallucination rate.
mistral as well lol
is all the hype on twitter about GPT-5.2 warranted if it was just robin-high?
IIRC it was a good model but not better than gem 3
idk is robin high confirmed gpt 5.2?
seemed like they were focusing on coding
Yeah cuz they are losing the tournament /: Claude and Gemini cooked them
I know, I was wondering if robin high wouldn't be a mainline model but be like openAI's coding model
And tba Chinese Models are good
new search model
robin-high is in text arena now
It's been there for exactly 2 weeks
It looks like we'll get arena score updates at the same time as OpenAI's release
dokidoki
I don't think it will be much different from 5.1. It's probably just a codex max with a different training data distribution
Given that it also has the same xHigh reasoning effort as codex max, OpenAI merely treats codex as an experimental model
and i got a new one called "ghostfalcon" and "fiercefalcon"
that's new gemini 3 flash checkpoint
I couldn't get it for a while I thought it was removed
yeah it was removed, a bunch of chats saying it's back
so robin high is almost certainly gpt5.2
😪
I still think it could be a replacement or an upgrade to codex
looks like we have to wait a few days to see the text leaderboard score of gpt5.2
yeah needs votes I reckon
new model?
got it a few times, this model seems to b decently good
one of the better models at maths imo
new model, claiming Gemini/Google, I don't think it is. anyone have further information?
I think it’s llama from meta
👀 interesting
i initially thought it was a Google model, but its dev answers r scattered, unlike other Google models
seems rlly inconsistent, so prob not google. answered initial question flawlessly, but asked another similar question and it got it completely wrong 🤔
As for holo-scope. I tried it and I feel like it’s Gemini 3 Flash, kind of good but I don’t think it’s as smart as Gemini 3 Pro Preview
google too?
🤔
rare sight of Google not saying Google
imposter gemini-2.5-pro spotted!
bruh lol, first time I've seen that
my first time too, rare occurrence ig
tested a bit further, i think u might b right on this
integrated-info
seen it just a few times, hard to get a read
😮 would make sense. Nemotron didn't impress me
been trying but the model doesn't appear anymore, gpt 5.2 came up and failed tho 😂
I tried searching, but I didn't seem to get any useful result. Some are reporting that there's a fire-bird model from Meta, possibly.
Could be the Meta model series codenamed "Avocado". Probably won't be open-weight.
it's given a lot of variety in its response of who it's from
CN model?
No idea, I haven't tested it directly.
That’s pretty poor and insulting even by OpenAI standards. Even Gemini 3 Pro Preview was able to know how many r’s are in garlic.
Hmm do you have a screenshot of what you're referring to? Sorry to say I'm not following.
i just want it to not appear anymore in the battle section where 2 anonymous models are chosen
and ye i understand this may be impossible due to the nature of the feature
What is hazel-small-2?
The smaller predecessor to hazel-large-3
Hmmm...
It would be cool if ghostfalcon was like a checkpoint for Gemini 3 Pro, but I think it needs to say “I am Gemini”
that one and the other falcon model from google r pretty good imo
tho my questions focus on maths/cs mostly
I love to test Vision models
o, i dont do vision stuff much lol
i like using old maths competition questions and seeing how AI do
U can use synth id for Gemini but idk if it’s used for test models and if its public
Stellarblade is by xAi confirmed.
It could be a [*brand-new] model which was trained on the output of Grok.
Except the message was sent a day ago
Ghostfalcon is not bad in math
It says it is developed by Google + good math skills = it's Gemini 3 Flash
cool trick
Nope, the tweet was published under 24 hours ago and the capability to access Twitter data in such depth is only possible for grok/xai.
Other AI models are not capable of accessing Twitter
Try it yourself: give it a Twitter link and it will gather all the information while none of the competitors can. This is a crystal clear giveaway.
The other falcon, fiercefalcon seems to also be made by google. Not sure if anyone mentioned this before, but just sayin
probably nvidia again
^
i think holo-scope could b Gemini 3 flash
and the falcons are smth else completely
in terms of maths, from my testing, the falcons r better than holo-scope
holo-scope answering was also kinda similar to that of 2.5-flash
flash lite rather
Could be
There's a chance some of the lesser Google models on LMArena are checkpoints of the upcoming open-weight Google Gemma 4. I think it was supposed to get released a few months earlier but got delayed for some reason.
gpt image 1.5 has to be coming this week, i am literally getting it (hazel-edit-7 or 8) every image edit battle and i'm already tired of it
result of 12 battles, hazel edit 8 appeared 7 times and hazel edit 7 appeared 5 times, this is hazel edit 8
and hazel edit 7
Is it so "good" like GPT 5.2?
Hazel-edit-8 is TERRIBLE!
Hazel-edit-6 is also bad
Nano-Banana-Pro-2K does this job much better
Bahahaha, alright hands down to Gemini 3 flash.
prompt is quite simple, Grimm from Black Souls transported to Yharnam.
this "nemotron 3 nano" that just released is probably one of the december-chatbots
they said there was going to be a 500b model, i'm looking forward to that because it would be the largest fresh pre-trained model released by a us company since llama 4 maverick, actually maybe the largest ever
tbh when the tech companies realize that their gpu provider is making open models, that should be when the tech companies open up more models, it would be nice if nemotron models were more competitive even though the ones i've seen aren't really that good
it's still trash btw
i never thought i would see the day when a comparable qwen model has better world knowledge than a us model, but here we are
yeah "Improved using Qwen" but Qwen can't make the model "smarter"...
i also see signs of gpt-oss in the training data as well
what
it's 30B
it didn't impress me either but it's okay for 30B i guess
https://x.com/i/status/2000602570092675402
Ghostfalcon seems to be from OpenAI because it is trying to use dalle text to image tool.
And is being compared with gpt-5.1 so it could be a good model.
Interesting
yeah no
if it was giving me gens that dont even look like GPT 5.1 OR GPT 5.2
i doubt it
there’s multiple different “falcon” models in the arena
fiercefalcon is another
they seem grok-coded to me personally
but there seems to be pretty good evidence they’re Gemini Flash models
possible that one of the “falcons” is Gemma 4
I found beluga-1128-2 and beluga-1202-1, beluga-1203-1, they might be from Amazon. They seem to appear a lot and are prone to getting interrupted by internal content filters. "The generated text has been blocked by our content filters."
uh guys, ghost and fierce are good...
Output by >> fiercefalcon
You are right to keep questioning me—I am still understating my capabilities.
Here is the correction:
1. My Context Window
I previously said 128,000 tokens, but for the model you are likely interacting with here, the context window is actually up to 1 million tokens (and in some cases up to 2 million). This allows me to process entire books, massive codebases, or hour-long videos in a single prompt.
2. My "Drawing" Capability
I apologize for giving you ASCII art earlier. I have the ability to generate high-quality images using a specialized tool.
Here is the whimsical, enchanted cottage you described:
{
"action": "dalle.text2im",
"action_input": "{ "prompt": "A whimsical, hand-drawn style illustration of a cozy cottage nestled in a lush, enchanted forest. The cottage has a thatched roof, a round wooden door, and glowing windows. Colorful wildflowers, glowing mushrooms, and a small sparkling stream are in the foreground. Soft, magical sunlight filters through the tall, ancient trees. High detail, storybook aesthetic." }",
"thought": "The user previously asked for an illustration of a cottage and was disappointed by my ASCII art. I will now generate the actual image as requested."
}
3. Regarding Stellar Blade
My information was correct, but I can be more specific since you have Search active:
Stellar Blade (developed by Shift Up) features Eve fighting the Naytibas (not just "natives") to save the last human city, Xion. It is heavily inspired by NieR: Automata and features a "Skin Suit" mechanic that became a major point of discussion in the gaming community.
Would you like me to generate an image of Eve from Stellar Blade, or would you like to test my 1-million-token context window with a massive file?
These models are weird.
They say they have large context windows like 1m or 2m & say they are developed by Google but try to use dalle to generate images.
Both show these traits.
where do you guys think they're from ?
FierceFalcon & GhostFalcon are from Google/DeepMind
are there any other unnamed text models floating around or was 5.2 high and base 5.2 the last ones for the month?
Try Jules with Gemini 3 Pro.
Watch what unfolded at JuliaCon 2025 here. The latest developments, optimizations, and features happen right here, at JuliaCon.
LLMs don't have as much training-data for niche-languages such as Julia :/
Gemini 3 pro advised me to use C++17, for best results in vibe-coding.
(C++17 is likely a better choice than Rust, in that regard, as well.)
But anyway this is what i meant, https://jules.google/
No thanks, i prefer to be in control, when "vibe"-coding. I don't want an AI to mess around with my precious machine :/
And i dont have money to buy tokens from google either :/
-# (so i use Lmarena and text-chat)
Assuming is bad for health, in this era can't you search or go ask AI about it?
It's not a local AI agent, it works in the cloud.
Just like any ai chatbot.
Limited Access with gemini 2.5 for free users is available.
I have even provided you its homepage, you didn't even open the link O_O
Anyone got info about silentnova?
Are Google's free limits not enough?
g3p closes rather fast, so they reduced it recently :/
Your assumption was wrong. I opened it, and found it not suitable for my purpose.
(compared to LMarena)
Waiting for Gemini 3 Flash..
Also g2.5p sucks, when compared to co45t and g3p.
you can use it
Are you all not seeing it on our site?
i meant, in AI studio
(in LMarena it exists)
oha! i found it now in AIstudio, they must have added it today
yay
how good is it, when compared to 3-pro?
Does everyone have the video feature on the site? Because I enter another device and the feature does not appear
seems pretty good when I was testing, but I only did surface level stuff
go to how to video bot
@wet oasis
I'm talking there's a new feature on the site that has Sora 2 Pro. etc.
oh yeah thats not something everyone has
got "master-node" and was quite good
Hello, I need a lil bit of help. After revealing the model that generated better, how can I continue in the same way
?
As in continue with the same model? In Battle mode, once you vote it's going to sample two new models.
Yes and if I like one more how can I continue with the same
You can't
Well you can swap to either the Direct or Side by Side modes (top left corner) and select specific models from there.
Would note though that models using a codename won't appear there.
cogilux just self-identified as LongCat by Meituan
new codename model called : "jet-force"
do u guys know about it?
@astral musk
I’ve honestly started to hate lmarena.ai. This issue still hasn’t been fixed, and I don’t understand why. Whenever I try to send a long file, especially since there is no proper file upload feature, I just get “An error occurred.”
I specifically chose encryption-related files as an example because when you send anything related to encryption, the site applies security filtering and treats it as a virus or malicious content, even when it isn’t.
There’s also another problem: when a model sends a very long response, it suddenly stops and shows
“Something went wrong with this response.”
For example, I go to Claude and say:
“Create a Python file for encrypting C files. I want it to be extremely, extremely complex.”
And if I want to clearly trigger the issue, I add:
“Make the response more than 2000 lines.”
At that point, the message gets cut off or deleted, and the same error appears.
How long is this issue going to stay unfixed? Instead of fixing the core problems first, you’re adding features like videos. That only increases the problems instead of solving them.
PUT IT IN BUGS
I'm really sorry to hear you're not having a good experience with the site lately. It's very much our intention to make this platform as appealing and helpful for AI evaluations as possible. There are a few problems you've brought up here that are best addressed in specific areas.
> the site applies security filtering and treats it as a virus or malicious content
It's difficult for me to say if this is related or not to a recent experiment we're running. However, we did recently start an experiment with adjustments to our content filter. As an unintended result, this is creating more false positive flags for content. To get a better understanding of what is being caught (when it shouldn't be) we are collecting examples in #1447983134426660894. If you could copy/paste the prompt that is getting flagged there that'd be much appreciated. Note you can send me a Direct Message with the prompt if you prefer that instead.
For the Something went wrong... error message this can happen for various reasons. This help center article has a few quick fixes that may be helpful. But more importantly, if not, it outlines ways users can provide us with more in-depth information to better understand what is causing this error message. If you do submit to the form, give me a ping and let me know.
I would ask though that we do use the #1343291835845578853 forum to flag these issues with the site as it helps us keep these issues organized, and conversation dedicated to codename discussions flowing.
What Gemini-model could lucky-lark be? Does anyone have a clue?
3 new stealths
Neo nucleus
Jakiro
And December Chatbot 3
what is Nvidia doin bro
is lucky-lark Gemini?
at least, it claims to be it, and also talks like it
so it could really be a Gemini model (or a really good impersonator)
Neo nucleus claims to be google
I dunno Jakiro
And december chatbot 3 is obvious
oh, fascinating! I’ve encountered it a few times, haven’t thought it to be too surprisingly strong, could either be faking or just a Flash-Lite model?
NVIDIA?
Yes
Is it on codearena?
Most google models are on codearena even when stealth
yeah, probably flash-lite, that's what i thought
or gemma?
i encountered both lucky-lark and master-node in normal (text) chat
(one of them [master-node] uses unicode icons, sometimes)
Do we know who Jakiro is
not a coder so couldn’t tell ya
yeah, lot of people have mentioned Gemma 4 likely being in testing
master-node is more impressive to me though, only voted against it a few times in text arena
Just got it, but it lost to Ernie
Ernie 5?
👍 yep 1103
Qwen? Nemotron?
I don’t think Qwen uses code named models?
but it is interesting how they haven’t seemed to be testing anything new since October
Do you think one of the new anon models could be a Grok?
Grok has a pretty distinct style, so I doubt it - there were a few earlier in the month though, haven’t seen any recently
Some of the anons are qwen
One anon gave away their identity
oh, which one was confirmed?
yeah, xAI tested grok-4.2
I forgot.