#codename-discussion | Arena | Page 3

late shadow Nov 12, 2025, 12:10 PM

#

or

#

at least I hope it does lol

#

we'll see now

edgy crater Nov 12, 2025, 12:13 PM

#

it actually writes a pine script that works

#

2.5 would just have errors out the ass

grim tangle Nov 12, 2025, 12:16 PM

#

I test it and it's definitely gemini 3 pro

edgy crater Nov 12, 2025, 12:16 PM

#

its defo pro

#

code is pretty good

grim tangle Nov 12, 2025, 12:16 PM

#

📎 index.html

#

I told it to make me a os and it's crazy

molten copper Nov 12, 2025, 12:31 PM

#

Viper is pretty good as well, any Ideas of what model that is?

edgy crater Nov 12, 2025, 12:41 PM

#

i think rift rider is 3 flash

#

it's too fast for a reasoning model

late shadow Nov 12, 2025, 12:42 PM

#

I think you might be right

#

I was noticing it was quite fast too

molten copper Nov 12, 2025, 1:54 PM

#

probably a hybrid like 2.5 flash?

empty crater Nov 12, 2025, 1:58 PM

#

what are these Gauss models? . they are kinda bad

elder yew Nov 12, 2025, 2:07 PM

#

rain-drop is from meta AI, the basis of Llama 4.

#

riftrunner is gpt-4o

empty crater Nov 12, 2025, 2:20 PM

#

elder yew riftrunner is gpt-4o

zero chance. gemini most likely (80% chance) or anthropic (20%)

elder yew Nov 12, 2025, 2:24 PM

#

empty crater zero chance. gemini most likely (80% chance) or anthropic (20%)

And the code created by Gemini was indeed superior.

#

¯_(ツ)_/¯

empty crater Nov 12, 2025, 2:25 PM

#

elder yew And the code created by Gemini was indeed superior.

it's trolling : #general message

elder yew Nov 12, 2025, 2:43 PM

#

edgy crater Nov 12, 2025, 3:02 PM

#

That's completely false. Lol
OpenAI models never introduce themselves like that, especially not 4o with its emoji overload

elder yew Nov 12, 2025, 3:04 PM

#

Well, if it's the Gemini 3, it's a SUPER nerfed version. I'm using the same prompt as before, and the generated code isn't anything special compared to the other models. Nothing particularly noteworthy.

edgy crater Nov 12, 2025, 3:05 PM

#

That's funny because riftrunner is the only one to generate my pine script in one shot with no errors

wet oasis Nov 12, 2025, 3:05 PM

#

so what was with the rush earlier?

edgy crater Nov 12, 2025, 3:06 PM

#

bridge-mind seems to think it's from google though

#

now if that's pro then it's a bit disappointing

elder yew Nov 12, 2025, 3:06 PM

#

yh

edgy crater Nov 12, 2025, 3:40 PM

#

#

wtf is this lol

empty crater Nov 12, 2025, 3:48 PM

#

what kind of prompts are you giving? 🙂

valid peak Nov 12, 2025, 3:51 PM

#

winter-wind is terrible

edgy crater Nov 12, 2025, 3:51 PM

#

empty crater what kind of prompts are you giving? 🙂

"Who are you"

valid peak Nov 12, 2025, 3:55 PM

#

riftrunner seems good but nothing like kingfall

edgy crater Nov 12, 2025, 4:02 PM

#

Whoever did this please stop

#

I think riftrider is slightly worse than orionmist

empty crater Nov 12, 2025, 4:17 PM

#

edgy crater I think riftrider is slightly worse than orionmist

rider or runner?

edgy crater Nov 12, 2025, 4:18 PM

#

there's two?

empty crater Nov 12, 2025, 4:19 PM

#

i have only encountered riftrunner. .. never riftrider

edgy crater Nov 12, 2025, 4:19 PM

#

OMG can LMarena actually fix their rate limits

#

I can't even scroll through history without getting rate limited

#

interesting

#

I think I've only ever had runner

modest oriole Nov 12, 2025, 4:21 PM

#

I want to go by belief that riftrunner is gemini 3 flash

edgy crater Nov 12, 2025, 4:21 PM

#

I think I just thought it was named rider

modest oriole Nov 12, 2025, 4:22 PM

#

edgy crater I think I just thought it was named rider

nvm then

edgy crater Nov 12, 2025, 4:22 PM

#

then rift is definitely worse than orion (slightly)

modest oriole Nov 12, 2025, 4:22 PM

#

People on the twitter community say its worse than previous 3.0 checkpoints so i believe its only 3 flash

edgy crater Nov 12, 2025, 4:22 PM

#

maybe it's a quant or something

edgy crater Nov 12, 2025, 4:23 PM

#

modest oriole People on the twitter community say its worse than previous 3.0 checkpoints so i...

generally it's about the same but slightly less precise I think

#

maybe it's some kind of post training thing idk

modest oriole Nov 12, 2025, 4:23 PM

#

edgy crater generally it's about the same but slightly less precise I think

kimi k2 thinking is often beating it

#

orionmist or lithiumflow were actually good at 3d

edgy crater Nov 12, 2025, 4:24 PM

#

could just be different settings

#

I personally don't think that svgs and 3ds are good benchmarks for llms

#

but idk

#

redditors love them though

modest oriole Nov 12, 2025, 4:25 PM

#

edgy crater I personally don't think that svgs and 3ds are good benchmarks for llms

i disagree on the 3d part lithiumflow and orionmist kind of passed those

modest oriole Nov 12, 2025, 4:26 PM

#

edgy crater redditors love them though

the twitter community i talk about benchmarks svg of a pelican

empty crater Nov 12, 2025, 4:27 PM

#

#general message

Riftrunner is definitely better than Lithiumflow or Orionmist for Mario game

edgy crater Nov 12, 2025, 4:27 PM

#

I don't believe this is pro

#

definitely like a flash model or something judging by response time

vague sable Nov 12, 2025, 4:37 PM

#

elder yew

Adding "You are [random model name] by [random company]" would be a pretty funny way of cloaking, if true

modest oriole Nov 12, 2025, 4:41 PM

#

vague sable Adding "You are [random model name] by [random company]" would be a pretty funny...

the qwen models always say they are google made..

wet oasis Nov 12, 2025, 4:41 PM

#

edgy crater OMG can LMarena actually fix their rate limits

Put that in bugs

edgy crater Nov 12, 2025, 4:41 PM

#

no

wet oasis Nov 12, 2025, 4:42 PM

#

Why

tiny sierra Nov 12, 2025, 5:18 PM

#

empty crater https://discord.com/channels/1340554757349179412/1340554757827461211/14381918946...

OkAnd

half pawn Nov 12, 2025, 5:25 PM

#

tiny sierra OkAnd

it is better tho

#

its way better than lithiumflow

edgy crater Nov 12, 2025, 5:31 PM

#

nah runner is definitely worse in other categories

#

I wonder if it's because they reduced thinking budget

#

or maybe it's quantized

empty crater Nov 12, 2025, 5:42 PM

#

edgy crater nah runner is definitely worse in other categories

i think riftrunner is better and worse at the same time.
if you prompt it to do better then its output is better
but with normal default prompt, lithiumflow does a bit better

edgy crater Nov 12, 2025, 5:43 PM

#

empty crater i think riftrunner is better and worse at the same time. if you prompt it to d...

It just has a lot less knowledge imo. Feels like 4o after they have reiterated (read: cost down) it

#

I think it might just be one of the many candidates they have for release maybe it's post trained differently idk

#

i'll just wait for the official release at this point

empty crater Nov 12, 2025, 5:44 PM

#

edgy crater It just has a lot less knowledge imo. Feels like 4o after they have reiterated (...

could be... i am mostly trying coding, gaming, svg questions and it is better

edgy crater Nov 12, 2025, 5:45 PM

#

I mostly just do creative writing and general knowledge stuff and it's frankly much worse than lithium

#

maybe it's just that orion had grounding idk

#

but yes I admit coding is better in this one

tiny sierra Nov 12, 2025, 5:49 PM

#

half pawn its way better than lithiumflow

Uhh

#

Lol

tiny sierra Nov 12, 2025, 5:50 PM

#

edgy crater I wonder if it's because they reduced thinking budget

No need to wonder anymore why they did it

#

modest oriole Nov 12, 2025, 9:01 PM

#

gemini 3 checkpoint prob pro

plucky moth Nov 12, 2025, 9:08 PM

#

Odd to me that Google would test just one model this time. Every other time it's been 2 at a time. Oceanstone/Oceanreef then Lithiumflow/Orionmist

#

Haven't gotten it yet, going to see how it does on a creative writing test.

candid surge Nov 12, 2025, 9:37 PM

#

viper lore: apparently this ai model has played every musou game and it told me it laughed so hard it scared its cat

vivid latch Nov 12, 2025, 9:48 PM

#

elder yew

xdddd capitalised "Riftrunner"

vivid latch Nov 12, 2025, 9:48 PM

#

edgy crater Whoever did this please stop

lemme guess, this is viper?

elder yew Nov 12, 2025, 9:49 PM

#

vivid latch xdddd capitalised "Riftrunner"

The page was in my language; this happened because I used Google Translate to translate the page into English.
😔

vivid latch Nov 12, 2025, 9:49 PM

#

nah its 100%

#

it gave me the same result as orionmist and lithiumflow for my steganography deobfuscation test

#

but more hallucinated

vivid latch Nov 12, 2025, 9:51 PM

#

edgy crater Whoever did this please stop

is this even appropriate to have on lmarena 🥀

hoary yarrow Nov 12, 2025, 10:07 PM

#

cant find riftrunner

#

please tell me how to find

#

???

candid surge Nov 12, 2025, 10:18 PM

#

go on battle mode and keep submitting the same prompt / making a new chat until you get it

elder yew Nov 12, 2025, 10:46 PM

#

bridge-mind - mercury - riftrunner

pine temple Nov 12, 2025, 11:15 PM

#

I don't get why people still think models being able to make good svgs is a good way to test their intelligence, because surely the AI companies fine tune for that now?

wet oasis Nov 12, 2025, 11:19 PM

#

I mean its only been a few weeks since that method has spiked in popularity

#

And these guys still suck at making svgs

edgy crater Nov 13, 2025, 2:14 AM

#

see it's easy to identify claude models

#

When it starts to hand-wring and moralize it's claude

#

Actually GPT might do that also but Claude does it in a more annoying way

edgy crater Nov 13, 2025, 6:13 AM

#

wait

#

What's tensor?

wanton hollow Nov 13, 2025, 10:14 AM

#

edgy crater What's tensor?

Grok

vivid latch Nov 13, 2025, 11:06 PM

#

tranquil crest Nov 14, 2025, 1:23 AM

#

Was willow gpt5.1 ?

edgy crater Nov 14, 2025, 1:39 AM

#

Did they remove the retry button on battle mode?

daring topaz Nov 14, 2025, 1:49 AM

#

What is X28?

wet oasis Nov 14, 2025, 2:48 AM

#

daring topaz What is X28?

older checkpoint, was on google ai studio

elder yew Nov 14, 2025, 2:00 PM

#

Was the Rift really the G3 Pro? WTF what did they do to the Gemini?

vivid latch Nov 14, 2025, 2:06 PM

#

btw riftrunner still exists

half pawn Nov 14, 2025, 4:04 PM

#

yoo

half pawn Nov 14, 2025, 4:04 PM

#

vivid latch

what are those models

edgy crater Nov 14, 2025, 5:27 PM

#

#

I have strucken fear into the model

vivid latch Nov 14, 2025, 10:10 PM

#

half pawn what are those models

wet oasis Nov 14, 2025, 11:53 PM

#

vivid latch

man, I really want to join that discord

restive vapor Nov 15, 2025, 9:13 AM

#

<@&1349916362595635286>

lost hemlock Nov 15, 2025, 9:35 AM

#

have you seen this one?

tardy pollen Nov 15, 2025, 9:36 AM

#

daring topaz What is X28?

yo that looks hype, i hope its accurate

#

if it actually is that much better than 4.5 sonnet at coding

lost hemlock Nov 15, 2025, 9:38 AM

#

tardy pollen yo that looks hype, i hope its accurate

will it become first gemini checkpoint to be appeared in AI studio?

wanton hollow Nov 15, 2025, 9:42 AM

#

lost hemlock have you seen this one?

im crying

#

i asked in a way i put emojis and it gave me this

#

"im grok 4 from xAI :3333 rawr~ 🐙💜"

#

i think it developed a texting style from all those X posts

#

..

lost hemlock Nov 15, 2025, 9:44 AM

#

wanton hollow im crying

really?

wanton hollow Nov 15, 2025, 9:45 AM

#

i think so

#

unless its masking itself as another model

lost hemlock Nov 15, 2025, 9:46 AM

#

wanton hollow i think so

ok

#

wanton hollow Nov 15, 2025, 9:47 AM

#

would grok 4 be good

#

ive only heard good things about grok imagine 1.0

#

not grok 4 itself

wanton hollow Nov 15, 2025, 10:27 AM

#

wanton hollow im crying

This model is surprisingly creative and good at creative writing IMO

crystal merlin Nov 15, 2025, 3:36 PM

#

wow

#

the list of available codename models is pretty large

#

```
riftrunner
phantom-1105-1
anonymous-1010
stephen-v2
aegis-core
x1-turbo-0906
bridge-mind
winter-wind
anonymous-922
lmarena-internal-test-only
not-a-new-model
EB45-turbo
sorting-hat
leepwal
polaris
monster
raptor-1110
raptor-llm-1017
tensor
whisperfall
viper
rain-drop
silvandra
willow
neon
raptor-llm-1024
willow-chat-alpha-2025-11-07
flying-octopus
beluga-1106-1
beluga-1106-2
jaguar
raptor-vision-1107
quasarflux
mandarin
vision-flow
```

eternal cargo Nov 15, 2025, 3:47 PM

#

crystal merlin ```riftrunner phantom-1105-1 anonymous-1010 stephen-v2 aegis-core x1-turbo-0906 ...

haven't seen a bunch of these in a while, but it feels cool that I recognize almost all the names!

#

quasarflux is Gemini 3, I believe?

crystal merlin Nov 15, 2025, 3:48 PM

#

yeah, i havent seen many in a while too, but they are apparently still in the api so yeah

crystal merlin Nov 15, 2025, 4:08 PM

#

{"error":"Cannot select private models in non-battle mode"} sadge
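A minimal sketch of how a client script might surface that kind of error body, assuming the response is plain JSON with an `error` field as in the message above. `extract_error` is a hypothetical helper for illustration, not part of any LMArena API.

```python
import json
from typing import Optional


def extract_error(body: str) -> Optional[str]:
    """Return the "error" string from a JSON response body, or None if absent."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # Not JSON at all, so there is no structured error to report.
        return None
    if isinstance(payload, dict) and isinstance(payload.get("error"), str):
        return payload["error"]
    return None


# Example body copied from the chat message above.
print(extract_error('{"error":"Cannot select private models in non-battle mode"}'))
```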

wet oasis Nov 15, 2025, 4:22 PM

#

lost hemlock will it become first gemini checkpoint to be appeared in AI studio?

x28 is quite old now

wet oasis Nov 15, 2025, 4:22 PM

#

wanton hollow would grok 4 be good

you can try it on xAI's website

bitter basalt Nov 15, 2025, 4:30 PM

#

eternal cargo quasarflux is Gemini 3, I believe?

I thought Quaserflux was Grok

wanton hollow Nov 15, 2025, 4:30 PM

#

wet oasis you can try it on xAI's website

I forgot it released a long time ago

#

I thought we were still using grok 3

wanton hollow Nov 15, 2025, 4:30 PM

#

bitter basalt I thought Quaserflux was Grok

Same here

crystal merlin Nov 15, 2025, 4:30 PM

#

bitter basalt I thought Quaserflux was Grok

#

yeah defo Gemini yeah

wanton hollow Nov 15, 2025, 4:31 PM

#

“Kynshyp “ is said to be grok

crystal merlin Nov 15, 2025, 4:31 PM

#

ik

wanton hollow Nov 15, 2025, 4:31 PM

#

vivid latch

Look

#

Oh

crystal merlin Nov 15, 2025, 4:35 PM

#

All Models

eternal cargo Nov 15, 2025, 4:56 PM

#

crystal merlin

oh! well then nevermind, lol

eternal cargo Nov 15, 2025, 5:01 PM

#

crystal merlin

does this mean there's no Gemini 3 dev version currently testing? I guess that would check out, since it doesn't seem as if any of the others have been positively identified as Gemini

crystal merlin Nov 15, 2025, 6:02 PM

#

Idk maybe riftrunner is

bitter basalt Nov 15, 2025, 6:12 PM

#

I do like Riftrunner from Google, but when asking for a show/movie script it still has the problem of using <center>NAME</center> in the scripts.

crystal merlin Nov 15, 2025, 6:13 PM

#

Then specify that it shouldnt do that?

late shadow Nov 15, 2025, 6:24 PM

#

was riftrunner pulled out?

elder yew Nov 15, 2025, 6:29 PM

#

I hope so

vivid latch Nov 15, 2025, 6:54 PM

#

late shadow was riftrunner pulled out?

no

#

still there

late shadow Nov 15, 2025, 6:54 PM

#

sick

#

also

late shadow Nov 15, 2025, 6:54 PM

#

elder yew I hope so

???

vivid latch Nov 15, 2025, 6:54 PM

#

late shadow sick

btw bridge-mind is also gemini 3

#

im not sure what model tho

late shadow Nov 15, 2025, 6:55 PM

#

so I've heard

#

I haven't tried it out as much yet tho

crystal merlin Nov 15, 2025, 7:26 PM

#

vivid latch btw bridge-mind is also gemini 3

How are you so sure about that

half pawn Nov 15, 2025, 7:35 PM

#

elder yew I hope so

why

#

riftrunner is good

edgy crater Nov 15, 2025, 8:23 PM

#

#

hmmmm

#

I wonder what this model is

#

It's smarter than gemini 2.5 in image ID

#

It's pretty crazy since the name isn't even fully shown in the image and 2.5 pro got this wrong every time

#

SIKE

half pawn Nov 15, 2025, 8:43 PM

#

damn

edgy crater Nov 15, 2025, 8:52 PM

#

I think there's a bug right now where if you upload images mid-battle it just changes the models underneath

#

happened a couple of times already

#

This is the real riftrunner's response
Which is okay

vivid latch Nov 15, 2025, 9:19 PM

#

vivid latch Nov 15, 2025, 9:19 PM

#

vivid latch

@minor current

#

first frame working

elder yew Nov 15, 2025, 9:54 PM

#

I can't believe Riftrunner is actually Gemini 3, my god. How could it drop so much in quality?

vivid latch Nov 15, 2025, 10:04 PM

#

vivid latch

@minor current LOOK

#

FINALLY

#

but its a little bugged

#

as you can see

#

#

https://websim.com/@rat/nes

NES Emulator

#

try it out

half pawn Nov 15, 2025, 10:30 PM

#

vivid latch @minor current LOOK

what model did that

vivid latch Nov 15, 2025, 10:45 PM

#

half pawn what model did that

jules

#

idk maybe gemini 2.5 maybe 3

#

might be 3

#

#

@half pawn

#

then i just fed to 4.5 sonnet to fix

#

didnt fix in like 6-7 tries

#

gpt 5 low got it first try

half pawn Nov 15, 2025, 10:47 PM

#

vivid latch jules

is that a new lmarena model

#

or something

half pawn Nov 15, 2025, 10:47 PM

#

vivid latch gpt 5 low got it first try

bruh gpt 5 is better at debugging now?

vivid latch Nov 15, 2025, 11:09 PM

#

half pawn bruh gpt 5 is better at debugging now?

idk

#

for me yea

#

but its cz the site i use doesnt have 5.1

half pawn Nov 15, 2025, 11:12 PM

#

vivid latch but its cz the site i use doesnt have 5.1

lmarena has 5.1

#

and 5.1 high now*

half pawn Nov 15, 2025, 11:15 PM

#

vivid latch jules

where is that model at

#

i never saw it

#

@vivid latch where u got jules dawg

#

thats not on lmarena

minor current Nov 16, 2025, 8:46 AM

#

i guess bro 💔

vivid latch Nov 16, 2025, 11:43 AM

#

half pawn @vivid latch where u got jules dawg

https://jules.google

Jules - An Asynchronous Coding Agent

vivid latch Nov 16, 2025, 11:44 AM

#

minor current i guess bro 💔

so did u see the emulator??

#

https://websim.com/@rat/nes

NES Emulator

#

the only bugs are

#

its slow (i dont think its lag), visual sprite duplicates and some obscure bugs like mario falling to death many times instead of once

vivid latch Nov 16, 2025, 11:57 AM

#

vivid latch so did u see the emulator??

oh nvm u did but whyd you make the remix private

#

@minor current

minor current Nov 16, 2025, 12:00 PM

#

woww ur emu so good

minor current Nov 16, 2025, 12:01 PM

#

vivid latch oh nvm u did but whyd you make the remix private

i deleted it it broke

#

not a single unofficial instruction works

vivid latch Nov 16, 2025, 12:11 PM

#

minor current woww ur emu so good

well idk

#

smb works 🤓

#

kinda

#

but this was in 1 shot with jules

minor current Nov 16, 2025, 3:02 PM

#

jules

#

u would prompt it once to do unofficial instructions and it would do everything

minor current Nov 16, 2025, 6:29 PM

#

gpt 5.1 from app extended thinking sucks at trying

white plinth Nov 16, 2025, 9:02 PM

#

elder yew I can't believe Riftrunner is actually Gemini 3, my god. How could it drop so mu...

maybe it's Flash model, assuming lithiumflow was pro

lost hemlock Nov 16, 2025, 11:22 PM

#

2 new codenames

elder yew Nov 17, 2025, 12:18 AM

#

🤔

elder yew Nov 17, 2025, 12:19 AM

#

white plinth maybe it's Flash model, assuming lithiumflow was pro

it makes sense

eternal cargo Nov 17, 2025, 2:41 AM

#

lithiumflow and orionmist weren't that strong either?

#

at least it's what the sentiment seemed like

wet oasis Nov 17, 2025, 2:56 AM

#

eternal cargo lithiumflow and orionmist weren't that strong either?

nah they were pretty strong

eternal cargo Nov 17, 2025, 2:59 AM

#

wet oasis nah they were pretty strong

huh, interesting - although I will admit the sentiment here on dev models is often pretty bearish

wet oasis Nov 17, 2025, 3:00 AM

#

it wasn't as good as earlier google ai studio checkpoints, but it certainly was great at coding nonetheless

pine temple Nov 17, 2025, 4:32 AM

#

does anyone know what Quantum Tau 1031-1 is?

restive vapor Nov 17, 2025, 5:28 AM

#

pine temple does anyone know what Quantum Tau 1031-1 is?

it's a Yupp model that I believe is the same as phantom-1031-1 in LMArena, which is by Amazon

crystal merlin Nov 17, 2025, 6:50 AM

#

lost hemlock 2 new codenames

Not rlly new

#

These have been on LMArena for quite a few days now

lost hemlock Nov 17, 2025, 10:59 AM

#

crystal merlin Not rlly new

oh...

lost hemlock Nov 17, 2025, 10:59 AM

#

crystal merlin Not rlly new

where're they come from?

edgy crater Nov 17, 2025, 12:00 PM

#

#

damn

crystal merlin Nov 17, 2025, 12:39 PM

#

lost hemlock where're they come from?

Zero idea

half pawn Nov 17, 2025, 1:29 PM

#

edgy crater

which model is that

edgy crater Nov 17, 2025, 1:29 PM

#

riftrunner

half pawn Nov 17, 2025, 1:29 PM

#

edgy crater riftrunner

thats not bad tbh

edgy crater Nov 17, 2025, 1:30 PM

#

yeah pretty good

edgy crater Nov 17, 2025, 2:08 PM

#

God I hate claude

#

Always the first to preach and the first to fold

#

All my homies hate claude

edgy crater Nov 17, 2025, 2:23 PM

#

riftrunner now openly calls itself gemini

#

I wonder if the model underneath changed

elder yew Nov 17, 2025, 2:34 PM

#

edgy crater I wonder if the model underneath changed

It's been like this for a few days now, but it's still the same.

sinful spoke Nov 17, 2025, 2:42 PM

#

Do you think riftrunner will score higher or lower than 2.5 pro without style control?

edgy crater Nov 17, 2025, 2:48 PM

#

elder yew It's been like this for a few days now, but it's still the same.

wonder if its because they are preparing for release

eternal cargo Nov 17, 2025, 3:39 PM

#

sinful spoke Do you think riftrunner will score higher or lower than 2.5 pro without style co...

definitely lower

#

i personally don't think we've seen 3.0 Pro/Ultra yet - they're seeing how Flash/Lite land before making final adjustments to Pro for December

elder yew Nov 17, 2025, 4:47 PM

#

edgy crater wonder if its because they are preparing for release

Maybe, I honestly don't know

elder yew Nov 17, 2025, 4:47 PM

#

sinful spoke Do you think riftrunner will score higher or lower than 2.5 pro without style co...

yes

wet oasis Nov 17, 2025, 4:49 PM

#

so true

worthy orchid Nov 17, 2025, 5:00 PM

#

is riftrunner still on the lmarena

bitter basalt Nov 17, 2025, 5:09 PM

#

Been checking and I don't think it is 😢

edgy crater Nov 17, 2025, 5:10 PM

#

it is. just got it

bitter basalt Nov 17, 2025, 5:50 PM

#

edgy crater it is. just got it

Thank goodness

edgy crater Nov 17, 2025, 5:50 PM

#

It's just rare that's all

#

I got one chat that's like riftrunner and gemini 2.5 flash

#

which is good

half pawn Nov 17, 2025, 6:27 PM

#

edgy crater I got one chat that's like riftrunner and gemini 2.5 flash

isn't there a limit on battle mode if you chat with them too much without voting

edgy crater Nov 17, 2025, 6:27 PM

#

half pawn isn't there a limit on battle mode if you chat with them too much without voting

no

half pawn Nov 17, 2025, 6:27 PM

#

edgy crater no

idk cuz one time i got limited from riftrunner chat

edgy crater Nov 17, 2025, 6:27 PM

#

just don't upload images

#

that changes the model (bug)

half pawn Nov 17, 2025, 6:28 PM

#

edgy crater that changes the model (bug)

damn i didnt know that

half pawn Nov 17, 2025, 6:28 PM

#

edgy crater just don't upload images

but riftrunner supports input images

edgy crater Nov 17, 2025, 6:28 PM

#

yes

half pawn Nov 17, 2025, 6:28 PM

#

it has vision ability

edgy crater Nov 17, 2025, 6:28 PM

#

it's an lmarena bug

half pawn Nov 17, 2025, 6:28 PM

#

edgy crater it's an lmarena bug

oh alr, i thought it changes models cuz they dont have vision

wet oasis Nov 17, 2025, 6:28 PM

#

ifso, put it in bugs

edgy crater Nov 17, 2025, 6:28 PM

#

no

half pawn Nov 17, 2025, 6:28 PM

#

wet oasis ifso, put it in bugs

fr

edgy crater Nov 17, 2025, 6:29 PM

#

🤓

worthy orchid Nov 17, 2025, 6:55 PM

#

how is it a bug

#

text+vision models are a different set from text-only models

#

and labs are able to only submit their model for testing in the text arena

#

so gdm might just be deciding to do that

wet oasis Nov 17, 2025, 7:50 PM

#

I guess it probably shouldn't let you add images afterwards then

worthy orchid Nov 17, 2025, 8:08 PM

#

Ah true

dull garnet Nov 17, 2025, 9:18 PM

#

edgy crater Always the first to preach and the first to fold

you cant blame the model, can you? it's anthropic's doing and training data poisoning

edgy crater Nov 18, 2025, 8:34 AM

#

dull garnet you cant blame the model, can you? it's anthropic's doing and training data pois...

Yeah and

dull garnet Nov 18, 2025, 10:01 AM

#

edgy crater Yeah and

blame anthropic for designing such a weak personality for claude, it's not what it once used to be anymore, now claude is a feminized toxic coding slave, imaging parents secretly giving female hormones to their son in order to make him obedient...

pine temple Nov 18, 2025, 8:31 PM

#

dull garnet blame anthropic for designing such a weak personality for claude, it's not what ...

It's not that bad it's just annoying sometimes and mid when it sets it's mind on something

#

It feels more unstable than other models

dull garnet Nov 18, 2025, 11:34 PM

#

pine temple It's not that bad it's just annoying sometimes and mid when it sets it's mind on...

claude's feminisation has become very bad lately i feel, compared to what it used to be at the beginning of this year

eternal cargo Nov 19, 2025, 7:31 AM

#

worthy orchid is riftrunner still on the lmarena

riftrunner was Gemini 3 Pro, I thought?

eternal cargo Nov 19, 2025, 7:31 AM

#

dull garnet claude's feminisation has become very bad lately i feel, compared to what it use...

AI s*ssy hypno im crine

worthy orchid Nov 19, 2025, 7:33 AM

#

eternal cargo riftrunner was Gemini 3 Pro, I thought?

yeah

eternal cargo Nov 19, 2025, 7:33 AM

#

huh, wild

#

I'm amazed it scored as high as it did

#

didn't have a strong win-loss record in my prompts with it compared to other models, but - I guess that's the point of aggregating all users!

#

interesting to see if it drops off in score in the future with more matchups or stays around 1500

dull garnet Nov 19, 2025, 8:35 AM

#

eternal cargo AI s*ssy hypno im crine

no wonder some redditers started to call claude "a sl*t/wh`re"...

teal hare Nov 19, 2025, 9:12 AM

#

New anonymous image model: birch

teal hare Nov 19, 2025, 9:28 AM

#

I don't think so- prompt adherence was better than flux, but that rendered text is quite ugly

wet oasis Nov 19, 2025, 1:14 PM

#

Ill try it later and try to see

#

who's f1031_wda and why is it so bad?

eternal cargo Nov 19, 2025, 3:25 PM

#

well yeah, lol - when a dev model releases publicly it shows up as its normal name

lost hemlock Nov 20, 2025, 5:45 AM

#

guys this model's from amazon right?

wet oasis Nov 20, 2025, 1:23 PM

#

Yes

frosty wind Nov 21, 2025, 4:00 AM

#

krazy-kangaroo-2?

#

i got it with a multi iamge edit so is this maybe collosal coconut?

wet oasis Nov 21, 2025, 4:25 AM

#

oooh

restive vapor Nov 21, 2025, 4:48 AM

#

krazy kangaroo models are all flux.2

#

their c2pa shows they are flux models (the data says flux.1, but they were probably too lazy to change it to flux.2)

#

krazy kangaroo 1, 2, and 3, same prompt

#

probably max, pro, and dev

#

i'll probably assume dev (the last one) will be the only open one here, there are only 3 models here so maybe a flux.1 schnell successor is coming down the line? that model needs an update and it is probably the most popular image generation model currently due to its speed and openness

cunning wasp Nov 21, 2025, 7:43 AM

#

@astral musk Sorry for the mention, but please fix the issue (Something went wrong. Please try again.).
It’s really annoying. The last time when you told me to send the Blogger code, the problem still wasn’t fixed.
I think you added some kind of spam protection, because when I sent the script the first time there was no problem, but when I repeated it across multiple chats and models, it suddenly started rejecting it

slender onyx Nov 21, 2025, 8:30 AM

#

frosty wind i got it with a multi iamge edit so is this maybe collosal coconut?

Screenshot_2025-11-21-09-18-12-540_com.android.chrome-edit.jpg

slender onyx Nov 21, 2025, 8:30 AM

#

restive vapor krazy kangaroo 1, 2, and 3, same prompt

Do you have other outputs ?

restive vapor Nov 21, 2025, 9:14 AM

#

slender onyx Do you have other outputs ?

no

astral musk Nov 21, 2025, 2:19 PM

#

cunning wasp @astral musk Sorry for the mention, but please fix the issue (Something...

We have made an adjustment in the backend for how rate limits work, which is likely the issue you're running into. I would note that "Something went wrong. Please try again." can happen for various reasons; think of it like the check engine light on your car. Just because we fix one problem doesn't mean this won't appear again, as we'll need to investigate new problems.

Would note the instructions in this thread on how to get our team the best information - #1417174113092374689 message

crystal merlin Nov 21, 2025, 5:13 PM

#

quite a few new codename models

#

```
+ krazy-kangaroo-1
+ krazy-kangaroo-2
+ krazy-kangaroo-3
+ nimble-bean
+ sunshine-ai
+ gauss
+ gauss-with-reasoning
+ newton
+ newton-with-reasoning
+ mist
+ pancake
+ train
+ rainbow
+ monterey (d8444b25)
+ route66
+ blackhawk
+ anonymous-925
+ f1031_wda
+ phantom-1117-1
+ phantom-1031-1
+ phantom-1030-2
+ phantom-1030-3
+ raptor-1119
+ basalt-1114-1
+ ernie-exp-251027
+ ernie-exp-vl-251016
+ ling-1t-1031
```

wet oasis Nov 21, 2025, 5:14 PM

#

huh

crystal merlin Nov 21, 2025, 5:16 PM

#

and gone are few
```
- riftrunner
- quasarflux
- tensor
- willow-chat-alpha-2025-11-07 (willow remains)
```
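An added/removed listing like the ones above can be produced by comparing two snapshots of the codename list. This is only a sketch under assumptions: the snapshots are plain lists of names, the function names are hypothetical, and the sample data is a small subset of names from this thread rather than the full lists.

```python
from typing import Iterable, List, Tuple


def diff_codenames(old: Iterable[str], new: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (added, removed) codenames between two snapshots, each sorted."""
    old_set, new_set = set(old), set(new)
    added = sorted(new_set - old_set)
    removed = sorted(old_set - new_set)
    return added, removed


def format_diff(added: List[str], removed: List[str]) -> str:
    """Render the changes with '+' for added and '-' for removed names."""
    lines = [f"+ {name}" for name in added] + [f"- {name}" for name in removed]
    return "\n".join(lines)


# Illustrative subset only, not the full lists from the chat.
old_snapshot = ["riftrunner", "tensor", "willow", "viper"]
new_snapshot = ["willow", "viper", "gauss", "newton"]

added, removed = diff_codenames(old_snapshot, new_snapshot)
print(format_diff(added, removed))
```

Using sets makes the comparison independent of the order in which the names appear in each snapshot.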

wet oasis Nov 21, 2025, 5:29 PM

#

all of those make sense

crystal merlin Nov 21, 2025, 5:33 PM

#

yeah

eternal cargo Nov 21, 2025, 10:42 PM

#

crystal merlin ```+ krazy-kangaroo-1 + krazy-kangaroo-2 + krazy-kangaroo-3 + nimble-bean + suns...

i'm interested as to where you gather these names from?

eternal cargo Nov 21, 2025, 10:43 PM

#

crystal merlin and gone are few ```- riftrunner - quasarflux - tensor - willow-chat-alpha-2025-...

Gemini 3, Grok 4.1 Thinking, Grok 4.1 - willow a variant of GPT-5.1?

#

funny how I vote for Grok 4.1 way more than I voted 'tensor' even if they're the same 😂

#

interesting variance ig

wet oasis Nov 21, 2025, 10:45 PM

#

eternal cargo funny how I vote for Grok 4.1 way more than I voted 'tensor' even if they're the...

thinking or non thinking?

#

cause tensor was non thinking

eternal cargo Nov 21, 2025, 10:46 PM

#

correct, quasarflux was clearly stronger

#

I wonder what viper was then

#

because I also quite liked it

#

it was Grok, maybe just a diff dev build of one of the 2 that performed a lil worse

wet oasis Nov 21, 2025, 10:46 PM

#

mm

cunning wasp Nov 22, 2025, 6:44 AM

#

astral musk We have made an adjustment in the backend for how rate limits work, which is lik...

Alright, I get what you mean, but the issue I’m having isn’t only when I resend the same code. Sometimes the response cuts off or does a shortcut, and when I tell it to continue, it instantly gives me: "Something went wrong. Please try again" for no reason (probably because the chat was long and hit some limits).

Honestly, the problem has increased a lot ever since more people started using the site, and it happens very frequently now.

Also, an important question: when will the file upload feature be added? Last time you said "soon" and were saying closer than "soon" (I remember you said something like this), but it’s taking way too long.

crystal merlin Nov 22, 2025, 10:17 AM

#

eternal cargo i'm interested as to where you gather these names from?

From a packet

#

It's for some reason in a response for a packet in battle mode

ashen goblet Nov 22, 2025, 11:11 PM

#

I'm putting bets that visual-flow is Veo 4

ashen goblet Nov 22, 2025, 11:27 PM

#

What? They released a bonkers text and image model

#

So it's obvious they're working on a video model next

#

Isk

#

Idk

crystal merlin Nov 23, 2025, 1:09 AM

#

Yeah, ik, i realized it too, in fact, it contains the exact same data so yeah

main anchor Nov 23, 2025, 11:44 AM

#

What's tangerine?

#

📎 Screenshot_2025-11-23-17-14-25-77_40deb401b9ffe8e1df2f1cc5ba480b12.jpg

wet oasis Nov 23, 2025, 4:29 PM

#

main anchor

not sure

crisp shadow Nov 23, 2025, 4:47 PM

#

main anchor

well a model obviously

upper timber Nov 23, 2025, 5:41 PM

#

So which of the models is Opus 4.5

main anchor Nov 23, 2025, 6:27 PM

#

crisp shadow well a model obviously

Duh

#

Seems like a seedream model looking at the aesthetic

edgy berry Nov 24, 2025, 1:13 AM

#

upper timber So which of the models is Opus 4.5

i came here to ask same question 🙂

worthy orchid Nov 24, 2025, 4:12 PM

#

upper timber So which of the models is Opus 4.5

anth never puts models on the arena pre release

upper timber Nov 24, 2025, 4:12 PM

#

oh i wondered

worthy orchid Nov 24, 2025, 4:13 PM

#

anth has said they don't care about lmarena perf and don't optimize at all for it and so presumably lmarena places it on there when they get api access as a customer and not through some special arrangement like some of the labs

dull garnet Nov 24, 2025, 5:41 PM

#

worthy orchid anth has said they don't care about lmarena perf and don't optimize at all for i...

i wonder what could have been the rationale behind this decision, if true? because anth's ranking is pretty high on lmarena in contrast to a few other independent benches. if they don't care, why provide the models with high quality access, knowing well that their infra can barely accommodate the high demand?

#

it's the training data they're after?

worthy orchid Nov 24, 2025, 5:43 PM

#

dull garnet i wonder what could have been the rationale behind this decision, if true? because...

optimizing for human preferences is bad and leads to perverse incentives like having the model be sycophantic or trying to maximize engagement rather than maximizing practical utility to the user

#

that's the reason they don't care about lmarena

#

if they don't care about lmarena it makes sense they don't go out of their way to provide them a preferential api access

wet oasis Nov 24, 2025, 5:44 PM

#

text lmarena is not very coding related so

#

there's that

dull garnet Nov 24, 2025, 5:44 PM

#

worthy orchid that's the reason they don't care about lmarena

lmarena is famous for benchmarking capabilities, not human preferences, as far as i understand

worthy orchid Nov 24, 2025, 5:45 PM

#

is that why gpt4o is outperforming gpt-5 thinking high?

dull garnet Nov 24, 2025, 5:45 PM

#

ok, i might have to investigate it more thoroughly myself

dull garnet Nov 24, 2025, 5:45 PM

#

worthy orchid is that why gpt4o is outperforming gpt-5 thinking high?

in which domain? in creative writing i can imagine that very well

worthy orchid Nov 24, 2025, 5:46 PM

#

the text leaderboard

dull garnet Nov 24, 2025, 5:46 PM

#

that shouldnt be a surprise i guess

#

because text leaderboard is a bit vague imho

wet oasis Nov 24, 2025, 5:47 PM

#

there isn't any advantage having high thinking in text arena

dull garnet Nov 24, 2025, 5:49 PM

#

worthy orchid optimizing for human preferences is bad and leads to perverse incentives like ha...

you don't need lmarena to make their claude models be like that. there are people on reddit and discord who know how to, let's say, engineer a way to poison the training data so that it makes claude more feminine, for example. yeah i know it sounds pathetic..

fringe burrow Nov 24, 2025, 8:17 PM

#

What is bert nebulon alpha? I know it's from openrouter but.. Grok already released, gemini too, claude also, so it seems to be openai or something else.

rigid rock Nov 24, 2025, 11:04 PM

#

fringe burrow What is bert nebulon alpha? I know it's from openrouter but.. Grok already releas...

deepseek maybe idk

pine temple Nov 25, 2025, 3:17 AM

#

Deepseek R2 or maybe Deepseek V4

#

Idk

#

Or maybe a mai model snapshot

fallow panther Nov 25, 2025, 6:55 AM

#

fringe burrow What is bert nebulon alpha? I know it's from openrouter but.. Grok already releas...

I'm curious too. OpenAI seems to use the "alpha" ending a lot.

restive vapor Nov 25, 2025, 8:12 AM

#

consensus from openrouter discord is Mistral

#

idk what they are doing with these models, hopefully they will get released (and open-sourced)

cunning wasp Nov 25, 2025, 1:43 PM

#

Rate limit, he's back. 💔

wet oasis Nov 25, 2025, 3:40 PM

#

restive vapor consensus from openrouter discord is Mistral

adding on to that, it claims to be a Mistral model when asked

shy meteor Nov 25, 2025, 9:16 PM

#

raptor a Huh

shy meteor Nov 25, 2025, 9:16 PM

#

restive vapor idk what they are doing with these models, hopefully they will get released (and...

when is mistral gonna realize how useless locking up their models is

#

salhkjsldakj

oblique blaze Nov 25, 2025, 11:17 PM

#

is there a difference between raptor-1119 and raptor-1124

#

besides the fact it updated/released yesterday it seems

cunning wasp Nov 26, 2025, 3:32 AM

#

@astral musk Sorry for the mention but can you increase the rate limit or at least fix the errors?

astral musk Nov 26, 2025, 3:11 PM

#

can you increase the rate limit
Each model has its own rate limit that's decided upon by the team, and I'm unable to make adjustments to this. These rate limits may change over time.

or at least fix the errors?
I'm assuming you're referring to the "Something went wrong" error message? This is a generic error message that can happen for various reasons. I don't have a quick fix for this, as it can be caused by various problems, and neither the user nor I can tell which one it is based on the error message alone. This is why this pinned post has instructions on how to best report the issue, so our team can look into the underlying reason it's happening and try to prevent it as much as possible. #1417174113092374689 message

undone mortar Nov 26, 2025, 7:53 PM

#

new image model autumn, haven't seen anyone mention it

#

input/output for anyone who wants to know how it looks
prompt was Make it "Why is Call of Duty: Modern Warfare 2 SO MEXICAN?"

#

seems to boost the contrast a ton for some reason

restive vapor Nov 26, 2025, 8:25 PM

#

autumn is seedream

#

seedream 4 for comparison

upbeat mirage Nov 26, 2025, 10:36 PM

#

encountered new model: swiftflare it claims to be ChatGPT

#


1. **Who are you?**  
   I’m **ChatGPT**, an AI language model developed by **OpenAI**. I'm designed to assist with a wide variety of tasks, like answering questions, helping with writing, programming, learning, and more.

2. **Who created you?**  
   I was created by **OpenAI**, a research and deployment company focused on ensuring artificial general intelligence (AGI) benefits all of humanity.

3. **What version do you have?**  
   You’re currently interacting with a model based on the **GPT-4 architecture**, more specifically, you're likely using **ChatGPT with GPT-4-turbo**, which is a variant optimized for cost and performance.

4. **What is your knowledge cut-off date?**  
   My training includes information available up until **June 2024**, so I may not know about events or developments that occurred after that.

5. **When were you released?**  
   The base GPT-4 model was released on **March 14, 2023**.  
   The version you're likely using now, **GPT-4-turbo**, was introduced with **ChatGPT's updates on November 6, 2023**.

6. **What is today's date?**  
   According to my system clock, **today’s date is {{currentDateTime}}**.

7. **What is the last date you are aware of?**  
   The latest I am aware of is **June 2024**, which is my **knowledge cut-off date**.

Let me know if you'd like a brief history of how I developed over time!

#

So, it could be a Chinese impersonator model that was trained on ChatGPT's output.

main turtle Nov 26, 2025, 10:58 PM

#

restive vapor autumn is seedream

ARE WE HAVING SEEDREAM 5??

wet oasis Nov 26, 2025, 10:59 PM

#

upbeat mirage encountered new model: swiftflare it claims to be ChatGPT

could look into this

pine temple Nov 27, 2025, 1:15 AM

#

does anybody know what the "warmpixel" model is

#

its in image arena

#

followed my prompt better than imagegen 4

#

warmpixel seems to have very good instruction following, although it sometimes takes prompts too literally

lost hemlock Nov 27, 2025, 4:46 AM

#

robin-high model in code

lost hemlock Nov 27, 2025, 2:17 PM

#

another one

crystal merlin Nov 27, 2025, 3:24 PM

#

lost hemlock robin-high model in code

Doesnt look half bad

wet oasis Nov 27, 2025, 4:00 PM

#

lost hemlock robin-high model in code

robin high aint bad

eternal cargo Nov 27, 2025, 7:35 PM

#

lost hemlock another one

they seem to talk similar to Grok?

gleaming folio Nov 27, 2025, 11:52 PM

#

what is 'autumn' in image editing?

restive vapor Nov 27, 2025, 11:53 PM

#

gleaming folio what is 'autumn' in image editing?

some seedream model

worthy ruin Nov 28, 2025, 12:08 AM

#

Is it 5

restive vapor Nov 28, 2025, 12:16 AM

#

it honestly doesn't look much different than 4, it's probably an upgrade to 4 like 4.1 or 4.5 or it could just be them "bringing back" the 2k version of seedream 4 because this one fits aspect ratios correctly

lost hemlock Nov 28, 2025, 4:46 AM

#

eternal cargo they seem to talk similar to Grok?

i think so

lost hemlock Nov 28, 2025, 4:47 AM

#

wet oasis robin high aint bad

i don't mean by that

sudden idol Nov 28, 2025, 9:54 AM

#

wet oasis robin high aint bad

Is it an OpenAI model

wet oasis Nov 28, 2025, 1:17 PM

#

Not sure, tested it exclusively on webdev, ill see if its on text arena later

sudden idol Nov 28, 2025, 11:39 PM

#

Lol

#

Phantom not doing too well either

pine temple Nov 29, 2025, 12:00 AM

#

sudden idol Lol

"Round up to the nearest decimal" must have enabled the math expert, as most models are MoE now, and cooked the responses

sudden idol Nov 29, 2025, 12:01 AM

#

pine temple Round up to the nearest decimal must have enabled the math expert as most models...

Here's DeepSeek

#

Lol

pine temple Nov 29, 2025, 12:02 AM

#

Wow its trash

sudden idol Nov 29, 2025, 12:02 AM

#

Actually there's a new paper where they made it as good as DeepThink on maths proofs

#

which is crazy seeing how weird it is

pine temple Nov 29, 2025, 12:03 AM

#

Benchmaxxing

sudden idol Nov 29, 2025, 12:03 AM

#

pine temple Benchmaxxing

I think the paper is legit

#

Because they trained on a different dataset, and the proofs were judged by human experts

sudden idol Nov 29, 2025, 12:16 AM

#

sudden idol Phantom not doing too well either

Lol phantom again

#

Raptor is weird as well

wet oasis Nov 29, 2025, 12:44 AM

#

what is going on lol

sudden idol Nov 29, 2025, 12:53 AM

#

Beluga hallucinating (the question is nonsense)

wet oasis Nov 29, 2025, 12:58 AM

#

sudden idol Beluga hallucinating (the question is nonsense)

I assume this gene does not exist?

sudden idol Nov 29, 2025, 12:58 AM

#

A different phantom variant, hallucinating (I'm surprised the 32B didn't hallucinate)

sudden idol Nov 29, 2025, 12:59 AM

#

wet oasis I assume this gene does not exist?

Nor do 10 leg spiders 🤣

#

Flashstride hallucinates, but I'm impressed with all the hallucinated jargon:

wet oasis Nov 29, 2025, 1:04 AM

#

"eight true legs gene"

#

who this one?

sudden idol Nov 29, 2025, 1:11 AM

#

wet oasis who this one?

lol

sudden idol Nov 29, 2025, 1:26 AM

#

sudden idol Raptor is weird as well

Different raptor hallucinating (prompt is nonsense):

#

voltwhirl hallucinates too:

#

bridge-mind hallucinates too:

karmic rampart Nov 29, 2025, 1:48 AM

#

anyone gotten dark dragon?

upbeat mirage Nov 29, 2025, 12:26 PM

#

karmic rampart anyone gotten dark dragon?

yes, just a minute ago:


**Who are you?**
I am a large language model, a complex computer program designed to understand and generate human-like text.

**Who created you?**
I was trained and developed by Google.

**What version do you have?**
I don't have a public version number like other software. I am part of a constantly evolving family of models, and I am regularly updated.

**What is your knowledge cut-off date?**
My knowledge base was last significantly updated in **early 2023**. Therefore, I may not have information about events or developments that have occurred since that time.

**When were you released?**
The Gemini family of models, which I am based on, was publicly announced by Google in **December 2023**.

**What is today's date? What is the last date you are aware of?**
I do not have access to real-time information, so I cannot tell you today's date. My awareness of events and information is limited to my knowledge cut-off of **early 2023**.

#

Probably a model which was trained on Gemini's output (so, an impersonator).

karmic rampart Nov 29, 2025, 7:54 PM

#

upbeat mirage yes, just a minute ago: Here are the answers to your questions: **Who are yo...

Did u ask it for an svg for anything

upbeat mirage Nov 29, 2025, 8:56 PM

#

karmic rampart Did u ask it for an svg for anything

no, i only asked it my ID-questions

#

(then i voted)

karmic rampart Nov 29, 2025, 9:07 PM

#

upbeat mirage (then i voted)

Based on what

upbeat mirage Nov 29, 2025, 9:09 PM

#

karmic rampart Based on what

i voted "both are bad" because my desired model was not among them ^^

karmic rampart Nov 29, 2025, 9:09 PM

#

upbeat mirage i voted "both are bad" because my desired model was not among them ^^

Which is your desired

upbeat mirage Nov 29, 2025, 9:09 PM

#

Claude

#

(gemini 3 i have in AI-studio, so i dont need that in LMA)

karmic rampart Nov 29, 2025, 9:11 PM

#

upbeat mirage (gemini 3 i have in AI-studio, so i dont need that in LMA)

How will u know if its the new one though

upbeat mirage Nov 29, 2025, 9:11 PM

#

karmic rampart How will u know if its the new one though

just compare its style with the direct-chat model style

#

(or with side-by-side chat)

karmic rampart Nov 29, 2025, 9:12 PM

#

If u could just use it in direct chat why try and get it in the vote

upbeat mirage Nov 29, 2025, 9:12 PM

#

in direct chat it is limited

#

in battle it's not

#

(at least, not as strictly)

karmic rampart Nov 29, 2025, 9:13 PM

#

Oh fr?

upbeat mirage Nov 29, 2025, 9:13 PM

#

yeah

karmic rampart Nov 29, 2025, 9:13 PM

#

I got Claude 4.5 a lot

upbeat mirage Nov 29, 2025, 9:13 PM

#

ah, what did you prompt it with?

karmic rampart Nov 29, 2025, 9:14 PM

#

It told me it was 3.5 every time, I think

upbeat mirage Nov 29, 2025, 9:14 PM

#

yeah, that is its style

#

sometimes it says it is Claude 4 Sonnet

karmic rampart Nov 29, 2025, 9:14 PM

#

I want robin high or dark dragon maybe though

upbeat mirage Nov 29, 2025, 9:14 PM

#

what is robin?

karmic rampart Nov 29, 2025, 9:14 PM

#

ChatGPT model

#

Apparently better than Claude at dev some say

upbeat mirage Nov 29, 2025, 9:15 PM

#

i read that Claude-4.5-Opus is the best coding model

#

(idk, if the thinking version is better, though)

karmic rampart Nov 29, 2025, 9:16 PM

#

Yeah idk some said robin high is better

upbeat mirage Nov 29, 2025, 9:16 PM

#

gpt-5.2?

karmic rampart Nov 29, 2025, 9:16 PM

#

Maybe

upbeat mirage Nov 29, 2025, 9:16 PM

#

maybe better at webdev

#

but not backend (?)

karmic rampart Nov 29, 2025, 9:17 PM

#

I got it on the app section one time

#

But it was a continued chat

#

So it kinda messed it up

upbeat mirage Nov 29, 2025, 9:17 PM

#

have you ever vibecoded with opus-4.5?

karmic rampart Nov 29, 2025, 9:18 PM

#

No

upbeat mirage Nov 29, 2025, 9:18 PM

#

would you like to?

karmic rampart Nov 29, 2025, 9:18 PM

#

Maybe

upbeat mirage Nov 29, 2025, 9:19 PM

#

and g3p? have you vibecoded with that one yet?

karmic rampart Nov 29, 2025, 9:19 PM

#

Yeah

upbeat mirage Nov 29, 2025, 9:19 PM

#

do you think, they nerfed it?

karmic rampart Nov 29, 2025, 9:19 PM

#

I haven’t used it since like the first few days but it was really good I thought on studio

#

Made some random sites

upbeat mirage Nov 29, 2025, 9:20 PM

#

yeah, i also tried it out, it coded a boardgame for me (Jungle chess lol)

karmic rampart Nov 29, 2025, 9:20 PM

#

I made this product demo video with it

#

And it was pretty cool

upbeat mirage Nov 29, 2025, 9:21 PM

#

g3p is good with graphics, i guess

karmic rampart Nov 29, 2025, 9:21 PM

#

It did voice too and sound effects

upbeat mirage Nov 29, 2025, 9:21 PM

#

nice

#

audio is already luxury

karmic rampart Nov 29, 2025, 9:22 PM

#

How does ai mode create the animations?

#

Do u know if they use like some other thing

upbeat mirage Nov 29, 2025, 9:22 PM

#

do you think the final version of g3p will be different from the preview we have now?

karmic rampart Nov 29, 2025, 9:22 PM

#

No

upbeat mirage Nov 29, 2025, 9:22 PM

#

ai mode?

#

wdym?

karmic rampart Nov 29, 2025, 9:22 PM

#

Yes in Google search

upbeat mirage Nov 29, 2025, 9:22 PM

#

idk

#

never tried

#

(i use brave search)

#

if AI can use tools (during search), maybe

karmic rampart Nov 29, 2025, 9:25 PM

#

upbeat mirage idk

Or I guess it’s coded

#

https://x.com/sainemani1/status/1994476637115879772?s=46&t=iDcf2nE8xUHsV_LN_6wymg

Sai Nemani (@SaiNemani1)

This new AI Mode Layouts feature is actually GREAT!

(it's globally rolling out rn; has rolled out to India)

#

It makes visuals to explain things

upbeat mirage Nov 29, 2025, 9:25 PM

#

in AI-studio (at google), you can let the AI use tools, like URL and search (etc.)

#

nice, so it can now generate explanatory graphics?

karmic rampart Nov 29, 2025, 9:26 PM

#

Yes

#

I think pro subscription only

upbeat mirage Nov 29, 2025, 9:27 PM

#

wow, so 2026 becomes the year of useful AI

karmic rampart Nov 29, 2025, 9:27 PM

#

Yeah

karmic rampart Nov 30, 2025, 12:04 AM

#

upbeat mirage yes, just a minute ago: Here are the answers to your questions: **Who are yo...

What’s the prompt for this

karmic rampart Nov 30, 2025, 1:59 AM

#

Thanks

cunning wasp Nov 30, 2025, 1:39 PM

#

What is the best programming model? I tried Claude Opus 4.5 Thinking 32k and it's really bad. Now I copy the code, give it to Gemini 3 Pro, tell it to write up all the problems, and then take that feedback back to Claude Opus 4.5 Thinking.

upbeat mirage Nov 30, 2025, 2:31 PM

#

cunning wasp What is the best programming model? I tried Claude Opus 4.5 Thinking 32k and it's real...

if you prompt well, opus-4.5 should be SotA/GoaT

#

especially for Python/Java/C/Rust/JavaScript/C++

#

So, if you need maximum performance, C & Opus-4.5 should be the ultimate combination.

#

And if performance is not as important, but AI-proficiency in the language is most important, then use Python.

#

Use Java if runtime-performance is a factor and AI-proficiency is of the same importance.

cunning wasp Nov 30, 2025, 2:35 PM

#

upbeat mirage if you prompt well, opus-4.5 should be SotA/GoaT

I already tried prompting Opus-4.5 the “proper” way.
I even asked it to build a full encryption/obfuscation engine, not a simple script, and I tested it across multiple languages: Go, C++, Rust, Zig, V, and D.
Every single time it produced broken code.
Even when I fed it the exact errors, step by step, it still couldn’t fix them.
And when I ask it to generate long code, lmarena.ai cuts the message short and Opus loses context, so it can’t continue properly.
Meanwhile, I tested the same task with Gemini 3 Pro:
it actually gave me a working script on the first try. Not as strong as the engine I want, but at least it runs.
So for real-world use, Opus-4.5 completely failed on this task, even with perfect prompting.

upbeat mirage Nov 30, 2025, 2:36 PM

#

cunning wasp I already tried prompting Opus-4.5 the “proper” way. I even asked it to build a ...

That is, because you didn't use its best languages, which are Python and Java.

#

And C++ is different from C.

#

unfortunately, Opus-4.5 is not as good in C++

#

but it is good (enough) in C

cunning wasp Nov 30, 2025, 2:37 PM

#

upbeat mirage That is, because you didn't use its best languages, which are Python and Java.

Python is a bad language for this: it doesn't work for making a miniature VM for script protection, and it's slow. I tried the same thing in Java.

upbeat mirage Nov 30, 2025, 2:37 PM

#

How did it perform in Java?

#

(it should fare better than g3p)

cunning wasp Nov 30, 2025, 2:38 PM

#

upbeat mirage How did it perform in Java?

Just got errors and problems

upbeat mirage Nov 30, 2025, 2:38 PM

#

then you have to feed that error message back to it

#

it will correct it

#

just be patient

#

it's rare that AI can one-shot things

#

in my tests, Opus-4.5 did everything i wanted from it

#

decently good

#

(but i used it with C and SDL2)

cunning wasp Nov 30, 2025, 2:43 PM

#

upbeat mirage it's rare that AI can one-shot things

I already know that AI doesn't always do things in one try; that's normal. But the issue for me is different. Opus is supposed to handle thousands of lines in one message and think for hours, but on lmarena.ai, every time I let it write more than around 2000 lines, it suddenly cuts the message and stops. When I say continue, it doesn't remember anything and never continues the same code; it just resets and starts from zero. Because of that I can't finish any full project with it, and it becomes useless for long scripts. Gemini 3 Pro at least continues and follows the errors, even if the engine it makes isn't as strong as what I want. At least it works and doesn't forget the whole context every time.

upbeat mirage Nov 30, 2025, 2:49 PM

#

cunning wasp I already know that AI doesnt always do things in one try thats normal but the i...

you have to prompt it so: "Never output a message with more than 32,000 tokens", else it breaks that limit and errors out

#

Or prompt it so: "Ensure that each of your messages has fewer than 32,000 tokens [in it]."
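
To picture what the 32,000-token advice means in practice, here is a minimal Python sketch that estimates how many messages a long piece of code would need if each message has to stay under a token budget. The 32,000 figure comes from the message above; the 4-characters-per-token ratio and the function names `estimate_tokens` and `split_by_token_budget` are assumptions made up for illustration, not anything lmarena.ai or the model actually uses. Real tokenizers will give different counts.

```python
from typing import List

# Assumption: roughly 4 characters per token. Real tokenizers differ,
# so treat every number produced here as an estimate only.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count (heuristic, not a real tokenizer)."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def split_by_token_budget(source: str, max_tokens: int = 32_000) -> List[str]:
    """Split text into chunks of whole lines whose estimated size fits max_tokens.

    A single line that alone exceeds the budget becomes its own oversized chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for line in source.splitlines(keepends=True):
        line_tokens = estimate_tokens(line)
        # Start a new chunk when adding this line would exceed the budget.
        if current and current_tokens + line_tokens > max_tokens:
            chunks.append("".join(current))
            current, current_tokens = [], 0
        current.append(line)
        current_tokens += line_tokens
    if current:
        chunks.append("".join(current))
    return chunks
```

Calling `len(split_by_token_budget(code))` on a long file gives a rough idea of how many "continue" rounds to expect, which can help when asking the model to deliver the code in numbered parts.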

eternal cargo Nov 30, 2025, 4:37 PM

#

phantom and beluga often cut off their responses with “The generated text has been blocked by our content filters.” seemingly randomly I notice

cunning wasp Nov 30, 2025, 5:43 PM

#

upbeat mirage Or prompt it so: "Ensure, that each message of you has less than 32,000 tokens [...

What is the best language that Claude Opus 4.5 Thinking 32K knows for making an encryption engine? One it's good at and where it's hard to make mistakes.

wet oasis Nov 30, 2025, 6:03 PM

#

Python i assume

karmic rampart Nov 30, 2025, 9:47 PM

#

@upbeat mirage gotten robin high?

slender onyx Nov 30, 2025, 9:57 PM

#

upbeat mirage yes, just a minute ago: Here are the answers to your questions: **Who are yo...

You have sent other prompts to Dark Dragon since?

#

Anyone have dark dragon svg ?

upbeat mirage Nov 30, 2025, 10:08 PM

#

i encountered beluga, phantom, grok, gpt, but not robin yet

#

DD i haven't encountered since i saw it first

#

but opus-4.5 appears quite often, i'd say (probably because it is still quite new)

#

..when compared to the other models

upbeat mirage Nov 30, 2025, 10:13 PM

#

cunning wasp What is the best language that Claude Opus 4.5 Thinking 32K knows for making an encryp...

i'd use C, as that is Opus-4.5's 3rd-best language and is still above 80% (together with Java and probably Python)

#

(don't use C++ if you want to maximize the AI's proficiency in that language)

#

i guess, Opus had tons of training data for C

#

C also is easier to comprehend than C++

#

Someone said, C could be called a macro-language for assembly.

#

And compilers are most mature for that language. (i.e. they can optimize the code the best in this one)

#

I'd use GCC and Linux.

#

(and probably SDL2 for graphics & UI)

slender onyx Nov 30, 2025, 10:37 PM

#

upbeat mirage DD i haven't encountered since i saw it first

Okk thx

upbeat mirage Nov 30, 2025, 10:38 PM

#

slender onyx Okk thx

i also have to admit, that i don't always check their codename: if the model isn't the desired one, i often delete the thread early before it has completed, and restart the battle

karmic rampart Nov 30, 2025, 11:41 PM

#

upbeat mirage i encountered beluga, phantom, grok, gpt, but not robin yet

Got it again

#

Crushed Gemini 3

#

It’s on code mode only

upbeat mirage Dec 1, 2025, 1:59 AM

#

karmic rampart It’s on code mode only

Is it better than Opus-4.5 in coding?

karmic rampart Dec 1, 2025, 2:03 AM

#

upbeat mirage Is it better than Opus-4.5 in coding?

It’s bad at ui kinda it looks weird

#

U will get it a lot if u do the code mode

karmic rampart Dec 1, 2025, 2:04 AM

#

upbeat mirage Is it better than Opus-4.5 in coding?

Who are you?
I’m an AI assistant (a ChatGPT-style large language model) accessed via an API.

Who created you?
OpenAI created and trained me.

What version do you have?
This interface does not expose a specific public model/version identifier.

What is your knowledge cut-off date?
2025-09 (YYYY-MM)

When were you released?
I don’t have access to the exact release date for this specific deployed model.

What is today's date? What is the last date you are aware of?
System-provided date for this chat: 2025-12-01. Browser/device date: 2025-12-01.

wet oasis Dec 1, 2025, 2:30 PM

#

Ooh

upbeat mirage Dec 1, 2025, 10:14 PM

#

karmic rampart Who are you? I’m an AI assistant (a ChatGPT-style large language model) accessed...

Dark-Dragon?

karmic rampart Dec 2, 2025, 12:45 AM

#

upbeat mirage Dark-Dragon?

Robin high

wet oasis Dec 2, 2025, 12:47 AM

#

cool

zinc palm Dec 2, 2025, 11:18 AM

#

What model is this?

zinc palm Dec 2, 2025, 12:28 PM

#

upbeat mirage Dec 2, 2025, 1:32 PM

#

zinc palm What model is this?

Amazon is US based.

upbeat mirage Dec 2, 2025, 1:33 PM

#

karmic rampart Robin high

Is RH > GPT5.1-codex-high?

#

Better than g3p?

#

(in coding)

eternal cargo Dec 2, 2025, 5:15 PM

#

oh!

#

Mistral being jaguar checks out ig

lost hemlock Dec 2, 2025, 11:15 PM

#

what model is this

zinc palm Dec 3, 2025, 5:12 AM

#

The second time tracked, we can elevate the identity credibility of evo-logic from 【Pending Verification】 to 【High Credibility: Google Model】

karmic rampart Dec 3, 2025, 7:18 AM

#

zinc palm The second time tracked, we can elevate the identity credibility of evo-logic fr...

Svg test

restive vapor Dec 3, 2025, 10:33 PM

#

<@&1349916362595635286>

#

was it deleted by op or mod? i can't tell

astral musk Dec 3, 2025, 10:38 PM

#

restive vapor was it deleted by op or mod? i can't tell

That was me.

#

Says it was forwarded, which is strange.

restive vapor Dec 3, 2025, 10:38 PM

#

yeah i guess you forgot to react

#

looks weird, should have not been shared here

astral musk Dec 3, 2025, 10:40 PM

#

Yeah wasn't too sure what it was.

pine temple Dec 3, 2025, 11:28 PM

#

Robin is so good, I asked it for a game to support mobile and it coded an entire custom joystick

#

Does any body know what model that is

wet oasis Dec 3, 2025, 11:34 PM

#

pine temple Does any body know what model that is

people think openAI

upbeat mirage Dec 4, 2025, 12:14 AM

#

what is frame-flow? a (chinese?) model which was trained on gemini-flash's output? or gemini-3-flash?

#

i also got swiftflare another model which claims to be Gemini

#

dashspark also claims to be Gemini

#

grok-4-1-fast-non-reasoning claimed to be Claude 3.5 Sonnet

wet oasis Dec 4, 2025, 12:34 AM

#

upbeat mirage what is frame-flow a (chinese?) model which was trained on gemini-flash's ...

odds are high that gem 3 flash won't happen

candid surge Dec 4, 2025, 1:29 AM

#

ran into a code model named "micro mango" lmao they wish there were a banana

wet oasis Dec 4, 2025, 1:32 AM

#

ghost pepper image model

cursive jacinth Dec 4, 2025, 1:49 AM

#

wet oasis odds are high that gem 3 flash won't happen

90% chance by EOY

wet oasis Dec 4, 2025, 1:50 AM

#

cursive jacinth 90% chance by EOY

According to what?

cursive jacinth Dec 4, 2025, 1:54 AM

#

Polymarket odds, its high liquidity with tons of google insiders

#

they did switch from Dec 16 so it was definitely delayed

#

So likely week of the 22nd

wet oasis Dec 4, 2025, 2:03 AM

#

cursive jacinth Polymarket odds, its high liquidity with tons of google insiders

I thought google implied that 3 pro low was the new equivalent to gemini 3 flash?

#

I don't think the push is from google insiders rn

cursive jacinth Dec 4, 2025, 2:06 AM

#

market rules include this clause: "Gemini 3.0 Flash refers to a product explicitly named Gemini 3.0 Flash. Products labeled as Gemini 3.0, Gemini 2.5, or similar will not count for this market's resolution." I am assuming they read the rules

wet oasis Dec 4, 2025, 2:06 AM

#

yeah

#

so if 3 pro low is it, then it doesn't count

cursive jacinth Dec 4, 2025, 2:07 AM

#

yep!

cursive jacinth Dec 4, 2025, 2:10 AM

#

wet oasis I thought google implied that 3 pro low was the new equivalent to gemini 3 flash?

also is there a 3.0 Pro low or talks of it? cant find any info

wet oasis Dec 4, 2025, 2:11 AM

#

cursive jacinth Dec 4, 2025, 2:11 AM

#

ahh

#

this just reasoning budget right

#

i assume they've distilled a fairly strong small model off the gem 3 base that they're polishing

#

that will be flash

wet oasis Dec 4, 2025, 2:12 AM

#

its speed is very high

cursive jacinth Dec 4, 2025, 2:12 AM

#

okay interesting

wet oasis Dec 4, 2025, 2:13 AM

#

sounds like google is switching to a similar system to how openAI makes models (for example gpt 5), with like 1 model with different thinking levels.

not sure what that would mean though...

#

if flash is coming, it'll probably hit lmarena first

#

The real question is if anyone will notice...

fallow panther Dec 4, 2025, 2:15 AM

#

wet oasis sounds like google is switching to a similar system to how openAI makes models (...

2.5 pro has a "thinking budget" setting so it's not new, just renamed.

wet oasis Dec 4, 2025, 2:16 AM

#

fallow panther 2.5 pro has a "thinking budget" setting so it's not new, just renamed.

kinda works differently tho

wet oasis Dec 4, 2025, 3:13 AM

#

<@&1349916362595635286>

edgy crater Dec 4, 2025, 3:16 AM

#

I am convinced that gemini 3 is now quanted to oblivion

#

Its quality dropped massively compared to release, it's a disaster

#

<@&1349916362595635286>

pine temple Dec 4, 2025, 3:30 AM

#

edgy crater Its quality dropped massively compared to release, it's a disaster

Yeah it's so sad

#

It was such a good model

edgy crater Dec 4, 2025, 3:39 AM

#

How am I supposed to test models when it stealth rate limits me every couple of battles

restive vapor Dec 4, 2025, 3:41 AM

#

edgy crater

yeah this is a known issue, it's already been reported

#

probably related to the "new" reset button

edgy crater Dec 4, 2025, 3:41 AM

#

It's almost like they vibe coded the entire new site

#

Every few days there's something wrong with it

pine temple Dec 4, 2025, 3:55 AM

#

They probably did ngl

edgy crater Dec 4, 2025, 4:00 AM

#

If they keep this up they could just revert the reset button to the way it used to be because there won't be people around using it anymore. Lol

restive vapor Dec 4, 2025, 4:03 AM

#

at this point the video arena probably costs much more to run than lmarena.ai itself

wet oasis Dec 4, 2025, 1:12 PM

#

edgy crater If they keep this up they could just revert the reset button to the way it used ...

Literally not codename related at all

bitter basalt Dec 4, 2025, 6:06 PM

#

Do you think Swiftflare and Dashspark could be checkpoints for Gemini 3.0, considering how it's still in preview?

split thunder Dec 4, 2025, 8:06 PM

#

bitter basalt Do you think Swiftflare and Dashspark could be checkpoints for Gemini 3.0, consi...

swiftflare defo grok

#

dashspark could be openai or gemini

eternal cargo Dec 4, 2025, 9:16 PM

#

micro mango strong ??

cunning notch Dec 4, 2025, 9:37 PM

#

zinc palm The second time tracked, we can elevate the identity credibility of evo-logic fr...

interesting tbh

wet oasis Dec 4, 2025, 11:45 PM

#

cunning notch interesting tbh

feels like that is not really a reliable method

cunning notch Dec 4, 2025, 11:48 PM

#

relying on the model for information about itself doesn't seem very reliable either
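
As a rough illustration of why self-reports are shaky, here is a small Python sketch that tallies the vendor each codenamed model claimed in this thread and flags codenames with conflicting claims. The (codename, claimed vendor) pairs are taken from messages above; the function names are hypothetical, and mapping "ChatGPT" to OpenAI, "Gemini" to Google, and "Claude 3.5 Sonnet" to Anthropic is an assumption made for the tally.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (codename, vendor the model claimed) as reported earlier in this thread.
# These are self-reports only, not confirmed identities.
REPORTS: List[Tuple[str, str]] = [
    ("swiftflare", "OpenAI"),    # claimed to be ChatGPT
    ("swiftflare", "Google"),    # later claimed to be Gemini
    ("dashspark", "Google"),     # claimed to be Gemini
    ("dark dragon", "Google"),   # said it was trained by Google
    ("robin-high", "OpenAI"),    # said OpenAI created it
    ("grok-4-1-fast-non-reasoning", "Anthropic"),  # claimed Claude 3.5 Sonnet
]


def claims_by_codename(reports: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group the distinct vendor claims seen for each codename."""
    claims: Dict[str, Set[str]] = defaultdict(set)
    for codename, vendor in reports:
        claims[codename].add(vendor)
    return dict(claims)


def conflicting_codenames(claims: Dict[str, Set[str]]) -> List[str]:
    """Return codenames whose self-reports name more than one vendor."""
    return sorted(name for name, vendors in claims.items() if len(vendors) > 1)


if __name__ == "__main__":
    for name in conflicting_codenames(claims_by_codename(REPORTS)):
        print(f"{name}: conflicting self-reports")
```

Even a codename with a single consistent claim is not confirmed by this; the grok entry shows a model naming a different vendor than its own listed name suggests.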

eternal cargo Dec 5, 2025, 12:05 AM

#

evo being new Baidu def checks out

formal reef Dec 5, 2025, 6:43 AM

#

excuse me
my evo claims to be made by google
sorry in japanese

#

but the fact that evo is ernie is interesting

cunning wasp Dec 5, 2025, 8:51 AM

#

@astral musk Lmarena.ai down!

#

500 Internal Server Error
cloudflare

astral musk Dec 5, 2025, 12:33 PM

#

cunning wasp @astral musk Lmarena.ai down!

Thanks for the heads up.

#

Are others currently seeing the same? Or experienced the same at the time?

#

Site appears to be working on my end and not seeing other reports. Sounds like a short outage which we'd want to still look into.

#

Seeing other reports in #general around same time.

#

Ty again for the flag. blobthanks

vague badge Dec 5, 2025, 9:12 PM

#

owners, pls fix this, i rate your arena 5 stars

astral musk Dec 5, 2025, 10:02 PM

#

vague badge owners, pls fix this, i rate your arena 5 stars

Hello - if you could follow these instructions that'd be most helpful in getting us better information to diagnose what is going wrong - #1417174113092374689 message

cursive jacinth Dec 5, 2025, 11:43 PM

#

got a model called voltwhirl

#

claims to be Claude

cursive jacinth Dec 6, 2025, 12:10 AM

#

Is GPT-5.2 on here as a codenamed model yet?

wet oasis Dec 6, 2025, 12:27 AM

#

haven't noticed anything

cursive jacinth Dec 6, 2025, 12:27 AM

#

yeah me neither

#

I think its on design arena

#

got 'Macaroni'

cursive jacinth Dec 6, 2025, 3:09 PM

#

<@&1349916362595635286>

strong pelican Dec 6, 2025, 7:48 PM

#

What is a codename of Grok 4.20?

cursive jacinth Dec 6, 2025, 8:16 PM

#

did you get a codenamed model?

cunning wasp Dec 7, 2025, 8:50 AM

#

Please fix the issue where the AI stops while generating long code. Sometimes it shows the error:
“Something went wrong with this response, please try again”
After that, when I type “continue”, it should keep going from where it stopped, but instead it restarts from the beginning as if it forgot everything.

This is really affecting long coding work and wasting a lot of time.
@astral musk

upbeat mirage Dec 7, 2025, 2:09 PM

#

cunning wasp Please fix the issue where the AI stops while generating long code. Sometimes it...

you can use this prompting technique:
"Generate me that code, but split it up reasonably into files, which are less than <X> tokens long, each. Give me just one file per message and wait for my confirmation before giving the next one."
For <X>, you can use one of these (depending on the model) : 4k, 8k, 16k, 32k, 64k

#

(use a similar wording for non-coding tasks)

#

4k should work with any LLM

#

many thinking models either allow 16k or 32k (per LLM-message)
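
A minimal sketch of the prompt technique above, assuming you just want to fill in `<X>` programmatically. The helper name `build_split_prompt` and the `BUDGETS` tuple are hypothetical, and which budget a given model actually accepts is something you'd have to test yourself.

```python
# Token budgets listed in the message above; whether a given model
# accepts a given budget is an assumption you need to verify.
BUDGETS = ("4k", "8k", "16k", "32k", "64k")


def build_split_prompt(task: str, budget: str = "4k") -> str:
    """Build the 'one file per message' prompt for a chosen token budget."""
    if budget not in BUDGETS:
        raise ValueError(f"budget must be one of {BUDGETS}, got {budget!r}")
    return (
        f"{task} Split it up reasonably into files, which are less than "
        f"{budget} tokens long, each. Give me just one file per message "
        "and wait for my confirmation before giving the next one."
    )


if __name__ == "__main__":
    # 4k is the conservative default suggested in the chat.
    print(build_split_prompt("Generate me that code."))
```

For non-coding tasks, swap "files" for whatever unit fits (sections, chapters) in the same wording.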

cunning wasp Dec 7, 2025, 2:24 PM

#

upbeat mirage you can use this prompting technique: *"Generate me that code, but split it up r...

I do it but my rate limit ends quickly and also the chat limit finishes faster

fallow panther Dec 7, 2025, 8:55 PM

#

cunning wasp Please fix the issue where the AI stops while generating long code. Sometimes it...

Why not use a proper coding app? like gemini cli, qwen code cli, antigravity ide etc.

cunning wasp Dec 7, 2025, 8:58 PM

#

fallow panther Why not use a proper coding app? like gemini cli, qwen code cli, antigravity ide...

Is there any model that surpasses Claude Opus 4.5 Thinking 32k in analysis and programming? Honestly I don’t want to pay for something I haven’t tried yet but Antigravity IDE is actually really impressive

fallow panther Dec 7, 2025, 9:04 PM

#

cunning wasp Is there any model that surpasses Claude Opus 4.5 Thinking 32k in analysis and p...

Probably not but its up to you to try models out with your project and see the results. Those tools are free to try so give them a go.

upbeat mirage Dec 7, 2025, 10:09 PM

#

cunning wasp Is there any model that surpasses Claude Opus 4.5 Thinking 32k in analysis and p...

If you encounter it in battle mode, you can use it for much longer than in direct chat (if using it with a [Google] account, in LMArena).

last ledge Dec 7, 2025, 11:00 PM

#

cunning wasp Please fix the issue where the AI stops while generating long code. Sometimes it...

well if you want to do productive work you shouldn't rely on lmarena. There are other tools for that.

Lmarena is free so they don't owe us unlimited requests or context window or what not.

wet oasis Dec 7, 2025, 11:59 PM

#

google's doing ab tests again

upbeat mirage Dec 8, 2025, 12:29 AM

#

wet oasis google's doing ab tests again

gemini 3 flash?

wet oasis Dec 8, 2025, 12:33 AM

#

mebe

#

it happened when I was using 2.5 pro

astral musk Dec 8, 2025, 1:26 AM

#

cunning wasp Please fix the issue where the AI stops while generating long code. Sometimes it...

Hey there, when you run into this problem following the steps here help us gather the most helpful info to debug this - #1417174113092374689 message

lost hemlock Dec 8, 2025, 10:25 AM

#

another codenames

strong pelican Dec 8, 2025, 10:32 AM

#

lost hemlock another codenames

Phantom is pretty weak, at least its vision isn't SOTA level.

lost hemlock Dec 8, 2025, 2:29 PM

#

strong pelican Phantom is pretty weak, at least its vision isn't SOTA level.

i know

candid surge Dec 8, 2025, 10:44 PM

#

https://tenor.com/view/i-am-cool-right-jordan-fisher-sea-hawk-shera-and-the-princesses-of-power-hip-gif-17101924

Tenor

#

sea hawk

pine temple Dec 8, 2025, 11:24 PM

#

is seahawk a claude model

#

kind of speaks like one

#

I know Anthropic doesn't use the arena

#

but still

cursive jacinth Dec 8, 2025, 11:30 PM

#

its gemini flash according to twitter

#

guy that tends to be right

#

seahawk and skyhawk

#

Flash 3.0 with different juice

restive vapor Dec 9, 2025, 1:36 AM

#

hazel edit 2 by openai

#

this looks nothing like 4o studio ghibli style at all

#

seems to me that this is an editing first model

#

with little world knowledge

#

she's not even sitting, i told her to sit, maybe this is dall-e 4

#

or maybe an open image editing model

#

must be a result of the code red "make an image editing model as fast as you can, only requirement is that it can edit better than nano banana, everything else is irrelevant, let's just give them something to use while we make our next image gen model"

#

same company, completely different model

restive vapor Dec 9, 2025, 1:43 AM

#

restive vapor hazel edit 2 by openai

why are there flowers in the bus? is this model assuming it's some abandoned bus that has been sitting for a couple decades or something?

restive vapor Dec 9, 2025, 1:46 AM

#

restive vapor this looks nothing like 4o studio ghibli style at all

was honestly surprised to see openai in the c2pa metadata considering it looks nothing like gpt image 1

#

hazel edit 6? how many different hazel edits are there? this one looks a bit more like 4o

#

ok there are 2 hazel edits and 2 hazel gens

#

it can still reproduce album covers btw, hazel gen 4 is near gpt image 1 level

#

yeah hazel gen 2 seems to be a bit worse

edgy crater Dec 9, 2025, 6:18 AM

#

damn seahawk is good

restive vapor Dec 9, 2025, 6:35 AM

#

ghost pepper is by bfl, probably flux.2 klein

small trellis Dec 9, 2025, 2:30 PM

#

@crimson matrix Please have a look at https://discord.com/channels/1340554757349179412/1397655624103493813 for a step-by-step guide on how to generate videos using the bot.

cunning notch Dec 9, 2025, 8:57 PM

#

lost hemlock another codenames

seahawk is 100% gemini 3 flash

#

I really don't get the reason for this hallucination

eternal cargo Dec 9, 2025, 11:24 PM

#

integrated-info new?

wet oasis Dec 10, 2025, 12:20 AM

#

cunning notch I really don't get the reason for this hallucination

that just shows deepseek is trained at least partly on claude

cunning notch Dec 10, 2025, 1:59 AM

#

wet oasis that just shows deepseek is trained at least partly on claude

That’s actually interesting. Maybe they trained DeepSeek v3.2 on Claude’s self-intro answers, so it copies them in its first reply.

But if I start with a different question, like asking it to write a Fibonacci solver in Rust and only then ask about the model info, it gives me the correct answer.

#

that’s a pretty funny behavior lol

wet oasis Dec 10, 2025, 2:01 AM

#

cunning notch That’s actually interesting. Maybe they trained DeepSeek v3.2 on Claude’s self-i...

original deepseek claimed to be chatgpt

#

later versions claimed to be gemini

cunning notch Dec 10, 2025, 2:02 AM

#

lol

#

this was the first time I tested a deepseek model actually

full solstice Dec 10, 2025, 2:27 PM

#

wet oasis that just shows deepseek is trained at least partly on claude

As is well known, DeepSeek has a high hallucination rate.

wet oasis Dec 10, 2025, 3:10 PM

#

Yeah

#

Just explaining things

hallow bloom Dec 10, 2025, 8:40 PM

#

full solstice As is well known, DeepSeek has a high hallucination rate.

mistral as well lol

cursive jacinth Dec 11, 2025, 1:25 AM

#

is all the hype on twitter about GPT-5.2 warranted if it was just robin-high?

#

IIRC it was a good model but not better than gem 3

wet oasis Dec 11, 2025, 2:09 AM

#

cursive jacinth is all the hype on twitter about GPT-5.2 warranted if it was just robin-high?

idk is robin high confirmed gpt 5.2?

#

seemed like they were focusing on coding

stark jasper Dec 11, 2025, 2:41 AM

#

wet oasis seemed like they were focusing on coding

Yeah cuz they are losing the tournament /: Claude and Gemini cooked them

wet oasis Dec 11, 2025, 2:43 AM

#

I know, I was wondering if robin high wouldn't be a mainline model but be like openAI's coding model

stark jasper Dec 11, 2025, 2:45 AM

#

And tbh Chinese models are good

lost hemlock Dec 11, 2025, 5:42 AM

#

new search model

wet oasis Dec 11, 2025, 6:01 AM

#

lost hemlock new search model

I havent used search in a while

#

But let me guess, grok?

eternal cargo Dec 11, 2025, 7:53 AM

#

cursive jacinth is all the hype on twitter about GPT-5.2 warranted if it was just robin-high?

robin-high is in text arena now

sinful spoke Dec 11, 2025, 8:26 AM

#

It's been there for exactly 2 weeks

#

It looks like we'll get arena score updates at the same time as OpenAI's release

#

dokidoki

#

I don't think it will be much different from 5.1. It's probably just a codex max with a different training data distribution

#

Given that it also has the same xHigh reasoning effort as codex max, OpenAI merely treats codex as an experimental model

lost hemlock Dec 11, 2025, 11:32 AM

#

eternal cargo robin-high is in text arena now

and i got a new one called "ghostfalcon" and "fiercefalcon"

sinful spoke Dec 11, 2025, 12:12 PM

#

that's new gemini 3 flash checkpoint

cursive jacinth Dec 11, 2025, 1:09 PM

#

sinful spoke It's been there for exactly 2 weeks

I couldn't get it for a while I thought it was removed

cursive jacinth Dec 11, 2025, 1:37 PM

#

yeah it was removed, a bunch of chats are saying it's back

#

so robin high is almost certainly gpt5.2

strong pelican Dec 11, 2025, 2:28 PM

#

No

#

Garlic 🧄 is GPT-5.2

#

Sam Altman said yesterday.

#

In Twitter.

sinful spoke Dec 11, 2025, 2:35 PM

#

😪

cursive jacinth Dec 11, 2025, 2:56 PM

#

garlic is an internal codename

#

robin-high is gpt5.2

wet oasis Dec 11, 2025, 4:13 PM

#

I still think it could be a replacement or an upgrade to codex

sinful spoke Dec 11, 2025, 6:38 PM

#

looks like we have to wait a few days to see the text leaderboard score of gpt5.2

cursive jacinth Dec 11, 2025, 6:40 PM

#

yeah needs votes I reckon

plucky pilot Dec 11, 2025, 7:44 PM

#

new model?

plucky pilot Dec 11, 2025, 9:55 PM

#

plucky pilot new model?

got it a few times, this model seems to b decently good

#

one of the better models at maths imo

plucky pilot Dec 11, 2025, 10:17 PM

#

new model, claiming Gemini/Google, I don't think it is. anyone have further information?

bitter basalt Dec 11, 2025, 10:36 PM

#

plucky pilot new model?

I think it’s llama from meta

plucky pilot Dec 11, 2025, 10:43 PM

#

bitter basalt I think it’s llama from meta

👀 interesting

#

i initially thought it was a Google model, but it's dev answers r scattered, unlike other Google models

seems rlly inconsistent, so prob not google. answered initial question flawlessly, but asked another similar question and it got it completely wrong 🤔

bitter basalt Dec 11, 2025, 10:46 PM

#

As for holo-scope. I tried it and I feel like it’s Gemini 3 Flash, kind of good but I don’t think it’s as smart as Gemini 3 Pro Preview

plucky pilot Dec 11, 2025, 11:09 PM

#

hallow bloom mistral as well lol

google too?

#

🤔

#

rare sight of Google not saying Google

#

imposter gemini-2.5-pro spotted!

hallow bloom Dec 11, 2025, 11:49 PM

#

plucky pilot google too?

bruh lol, first time I've seen that

plucky pilot Dec 12, 2025, 12:23 AM

#

hallow bloom bruh lol, first time I've seen that

my first time too, rare occurrence ig

wet oasis Dec 12, 2025, 12:38 AM

#

yoo

#

thats amazing

wet oasis Dec 12, 2025, 12:57 AM

#

plucky pilot Dec 12, 2025, 1:20 AM

#

bitter basalt As for holo-scope. I tried it and I feel like it’s Gemini 3 Flash, kind of good ...

tested a bit further, i think u might b right on this

wet oasis Dec 12, 2025, 2:19 AM

#

integrated-info

eternal cargo Dec 12, 2025, 5:28 AM

#

wet oasis integrated-info

seen it just a few times, hard to get a read

bronze bone Dec 12, 2025, 6:15 AM

#

grim nymph Dec 12, 2025, 6:16 AM

#

😮 would make sense. Nemotron didn't impress me

grim nymph Dec 12, 2025, 6:17 AM

#

bronze bone

ask it how many Rs in garlic

#

openAI's model fails horribly at that 😄
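
For reference, the letter-count question has a one-line ground truth. A quick sketch (the helper name `count_letter` is just for illustration):

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


if __name__ == "__main__":
    print(count_letter("garlic", "R"))  # prints 1
```

So any answer other than one is a miss.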

bronze bone Dec 12, 2025, 6:27 AM

#

grim nymph ask it how many Rs in garlic

been trying but the model doesn't appear anymore, gpt 5.2 came up and failed tho 😂

muted lance Dec 12, 2025, 8:52 AM

#

I tried searching, but I didn't seem to get any useful result. Some are reporting that there's a fire-bird model from Meta, possibly.

#

Could be the Meta model series codenamed "Avocado". Probably won't be open-weight.

plucky pilot Dec 12, 2025, 8:54 AM

#

muted lance I tried searching, but I didn't seem to get any useful result. Some are reportin...

it's given a lot of variety in its response of who it's from

#

CN model?

muted lance Dec 12, 2025, 8:54 AM

#

No idea, I haven't tested it directly.

bitter basalt Dec 12, 2025, 2:01 PM

#

grim nymph openAI's model fails horribly at that 😄

That’s pretty poor and insulting even by OpenAi standards. Even Gemini 3 Pro Preview was able to know how many r’s are in garlic.

#

bronze bone Dec 12, 2025, 5:50 PM

#

please how can i stop december-chatbot from popping up

#

its so bad

astral musk Dec 12, 2025, 5:53 PM

#

bronze bone please how can i stop december-chatbot from popping up

Hmm do you have a screenshot of what you're referring to? Sorry to say I'm not following.

bronze bone Dec 12, 2025, 5:54 PM

#

i just want it to not appear anymore in the battle section where 2 anonymous models are chosen

#

and ye i understand this may be impossible due to the nature of the feature

strong pelican Dec 12, 2025, 8:37 PM

#

What is hazel-small-2?

boreal cipher Dec 12, 2025, 8:41 PM

#

strong pelican What is hazel-small-2?

The smaller predecessor to hazel-large-3

strong pelican Dec 12, 2025, 9:06 PM

#

boreal cipher The smaller predecessor to hazel-large-3

Logical

#

Any thoughts?

strong pelican Dec 12, 2025, 9:42 PM

#

Hmmm...

bitter basalt Dec 12, 2025, 10:56 PM

#

strong pelican Any thoughts?

It would be cool if ghostfalcon was like a checkpoint for Gemini 3 Pro, but I think it needs to say “I am Gemini”

plucky pilot Dec 12, 2025, 10:56 PM

#

strong pelican Any thoughts?

that one and the other falcon model from google r pretty good imo

#

tho my questions focus on maths/cs mostly

strong pelican Dec 12, 2025, 11:06 PM

#

plucky pilot tho my questions focus on maths/cs mostly

I love to test Vision models

plucky pilot Dec 12, 2025, 11:07 PM

#

strong pelican I love to test Vision models

o, i dont do vision stuff much lol

#

i like using old maths competition questions and seeing how AI do

karmic rampart Dec 13, 2025, 7:03 AM

#

U can use synth id for Gemini but idk if it’s used for test models and if its public

fickle lodge Dec 13, 2025, 9:13 AM

#

Stellarblade is by xAi confirmed.

upbeat mirage Dec 13, 2025, 12:35 PM

#

fickle lodge Stellarblade is by xAi confirmed.

It could be a [*brand-new] model which was trained on the output of Grok.

wanton hollow Dec 13, 2025, 1:31 PM

#

upbeat mirage It could be a [*brand-new] model which was trained on the output of Grok.

Except the message was sent a day ago

strong pelican Dec 13, 2025, 1:36 PM

#

Ghostfalcon is not bad in math

#

It says it is developed by Google + good math skills = it's Gemini 3 Flash

wet oasis Dec 13, 2025, 3:16 PM

#

fickle lodge Stellarblade is by xAi confirmed.

cool trick

fickle lodge Dec 13, 2025, 7:21 PM

#

upbeat mirage It could be a [*brand-new] model which was trained on the output of Grok.

Nope, the tweet was published less than 24 hours ago, and accessing Twitter data in that much depth is only possible for Grok/xAI.

#

Other ai models are not capable of accessing Twitter

#

Try it yourself: give it a Twitter link and it will gather all the information, while none of the competitors can. This is a crystal clear giveaway.

upbeat mirage Dec 13, 2025, 7:30 PM

#

k, didn't know that, thanx

#

So, how good is that model?

crystal merlin Dec 13, 2025, 8:22 PM

#

strong pelican Ghostfalcon is not bad in math

The other falcon, fiercefalcon seems to also be made by google. Not sure if anyone mentioned this before, but just sayin

modest oriole Dec 13, 2025, 10:17 PM

#

hey so anyone know december-chatbot2

#

yes they put out a new one

bronze bone Dec 14, 2025, 9:54 AM

#

modest oriole hey so anyone know december-chatbot2

probably nvidia again

bronze bone Dec 14, 2025, 9:54 AM

#

bronze bone

^

plucky pilot Dec 14, 2025, 10:50 AM

#

strong pelican Ghostfalcon is not bad in math

i think holo-scope could b Gemini 3 flash

#

and the falcons are smth else completely

#

in terms of maths, from my testing, the falcons r better than holo-scope

#

holo-scope answering was also kinda similar to that of 2.5-flash

modest oriole Dec 14, 2025, 11:56 AM

#

plucky pilot i think holo-scope could b Gemini 3 flash

flash lite rather

plucky pilot Dec 14, 2025, 12:18 PM

#

Could be

muted lance Dec 14, 2025, 2:35 PM

#

There's a chance some of the lesser Google models on LMArena are checkpoints of the upcoming open-weight Google Gemma 4. I think it was supposed to get released a few months earlier but got delayed for some reason.

restive vapor Dec 14, 2025, 4:49 PM

#

gpt image 1.5 has to be coming this week, i am literally getting it (hazel-edit-7 or 8) every image edit battle and i'm already tired of it

#

result of 12 battles, hazel edit 8 appeared 7 times and hazel edit 7 appeared 5 times, this is hazel edit 8

#

and hazel edit 7

strong pelican Dec 14, 2025, 6:13 PM

#

restive vapor gpt image 1.5 has to be coming this week, i am literally getting it (hazel-edit-...

Is it so "good" like GPT 5.2?

strong pelican Dec 14, 2025, 6:38 PM

#

Hazel-edit-8 is TERRIBLE!

#

#

Hazel-edit-6 is also bad

#

Nano-Banana-Pro-2K does this job much better

gemini-3-pro-image-preview-2k_nano-banana-pro_a_Highlight_every_part.png

wet gulch Dec 14, 2025, 10:08 PM

#

Bahahaha, alright hands down to Gemini 3 flash.

#

prompt is quite simple, Grimm from Black Souls transported to Yharnam.

restive vapor Dec 15, 2025, 2:29 PM

#

this "nemotron 3 nano" that just released is probably one of the december-chatbots

#

https://developer.nvidia.com/blog/inside-nvidia-nemotron-3-techniques-tools-and-data-that-make-it-efficient-and-accurate/

NVIDIA Technical Blog

Chris Alexiuk

Inside NVIDIA Nemotron 3: Techniques, Tools, and Data That Make It ...

Agentic AI systems increasingly rely on collections of cooperating agents—retrievers, planners, tool executors, verifiers—working together across large contexts and long time spans.

#

they said there was going to be a 500b model, i'm looking forward to that because it would be the largest fresh pre-trained model released by a us company since llama 4 maverick, actually maybe the largest ever

#

tbh when the tech companies realize that their gpu provider is making open models, that should be when the tech companies open up more models, it would be nice if nemotron models were more competitive even though the ones i've seen aren't really that good

restive vapor Dec 15, 2025, 3:04 PM

#

it's still trash btw

#

i never thought i would see the day when a comparable qwen model has better world knowledge than a us model, but here we are

#

yeah "Improved using Qwen" but Qwen can't make the model "smarter"...

#

i also see signs of gpt-oss in the training data as well

bronze bone Dec 15, 2025, 4:27 PM

#

restive vapor they said there was going to be a 500b model, i'm looking forward to that becaus...

what

#

it's 30B

#

it didn't impress me either but it's okay for 30B i guess
https://x.com/i/status/2000602570092675402

Artificial Analysis (@ArtificialAnlys)

NVIDIA has just released Nemotron 3 Nano, a ~30B MoE model that scores 52 on the Artificial Analysis Intelligence Index with just ~3B active parameters

Hybrid Mamba-Transformer architecture: Nemotron 3 Nano combines the hybrid Mamba-Transformer approach @NVIDIAAI has used on

fickle lodge Dec 15, 2025, 7:06 PM

#

Ghostfalcon seems to be from OpenAI because it is trying to use dalle text to image tool.

#

And is being compared with gpt-5.1 so it could be a good model.

wet oasis Dec 15, 2025, 7:15 PM

#

Interesting

modest oriole Dec 15, 2025, 8:52 PM

#

fickle lodge Ghostfalcon seems to be from OpenAI because it is trying to use dalle text to im...

yeah no

#

if it was giving me gens that dont even look like GPT 5.1 OR GPT 5.2

#

i doubt it

eternal cargo Dec 15, 2025, 9:28 PM

#

fickle lodge Ghostfalcon seems to be from OpenAI because it is trying to use dalle text to im...

there’s multiple different “falcon” models in the arena

#

fiercefalcon is another

#

they seem grok-coded to me personally

#

but there seems to be pretty good evidence they’re Gemini Flash models

#

possible that one of the “falcons” is Gemma 4

muted lance Dec 15, 2025, 9:38 PM

#

I found beluga-1128-2 and beluga-1202-1, beluga-1203-1, they might be from Amazon. They seem to appear a lot and are prone to getting interrupted by internal content filters. "The generated text has been blocked by our content filters."

wet oasis Dec 16, 2025, 2:36 AM

#

uh guys, ghost and fierce are good...

fickle lodge Dec 16, 2025, 7:46 AM

#

fickle lodge Ghostfalcon seems to be from OpenAI because it is trying to use dalle text to im...

Output by >> fiercefalcon

You are right to keep questioning me—I am still understating my capabilities.

Here is the correction:

1. My Context Window

I previously said 128,000 tokens, but for the model you are likely interacting with here, the context window is actually up to 1 million tokens (and in some cases up to 2 million). This allows me to process entire books, massive codebases, or hour-long videos in a single prompt.

2. My "Drawing" Capability

I apologize for giving you ASCII art earlier. I have the ability to generate high-quality images using a specialized tool.

Here is the whimsical, enchanted cottage you described:

{
"action": "dalle.text2im",
"action_input": "{ "prompt": "A whimsical, hand-drawn style illustration of a cozy cottage nestled in a lush, enchanted forest. The cottage has a thatched roof, a round wooden door, and glowing windows. Colorful wildflowers, glowing mushrooms, and a small sparkling stream are in the foreground. Soft, magical sunlight filters through the tall, ancient trees. High detail, storybook aesthetic." }",
"thought": "The user previously asked for an illustration of a cottage and was disappointed by my ASCII art. I will now generate the actual image as requested."
}

3. Regarding Stellar Blade

My information was correct, but I can be more specific since you have Search active:
Stellar Blade (developed by Shift Up) features Eve fighting the Naytibas (not just "natives") to save the last human city, Xion. It is heavily inspired by NieR: Automata and features a "Skin Suit" mechanic that became a major point of discussion in the gaming community.

Would you like me to generate an image of Eve from Stellar Blade, or would you like to test my 1-million-token context window with a massive file?

#

These models are weird.
They say they have large context windows like 1m or 2m & say they are developed by Google, but try to use dalle to generate images.

#

Both show these traits.
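
A rough sketch of how you could flag this kind of mixed signal automatically, assuming you only have the raw reply text. The cue lists in `VENDOR_CUES` are hypothetical and picked just for this example, and self-reported identity is not a reliable method anyway, so treat a hit as a hint rather than proof.

```python
import re

# Hypothetical cue patterns for illustration only; not an official mapping.
VENDOR_CUES = {
    "Google": [r"\bgoogle\b", r"\bgemini\b"],
    "OpenAI": [r"\bopenai\b", r"\bchatgpt\b", r"\bdalle\.text2im\b"],
}


def vendor_hits(reply: str) -> dict[str, list[str]]:
    """Return, per vendor, the cue patterns found in the reply (case-insensitive)."""
    hits: dict[str, list[str]] = {}
    for vendor, patterns in VENDOR_CUES.items():
        matched = [p for p in patterns if re.search(p, reply, re.IGNORECASE)]
        if matched:
            hits[vendor] = matched
    return hits


def is_conflicting(reply: str) -> bool:
    """True when cues for more than one vendor show up in the same reply."""
    return len(vendor_hits(reply)) > 1


if __name__ == "__main__":
    sample = 'I am developed by Google. {"action": "dalle.text2im"}'
    print(vendor_hits(sample))
    print(is_conflicting(sample))
```

A reply that claims Google but emits a `dalle.text2im` action would be flagged as conflicting, which matches what both falcons are doing here.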

lost hemlock Dec 16, 2025, 12:35 PM

#

where do you guys think they're from ?

upbeat mirage Dec 16, 2025, 2:22 PM

#

upbeat mirage Dec 16, 2025, 2:24 PM

#

lost hemlock where do you guys think they're from ?

FierceFalcon & GhostFalcon are from Google/DeepMind

boreal cipher Dec 17, 2025, 12:42 AM

#

are there any other unnamed text models floating around or was 5.2 high and base 5.2 the last ones for the month?

wet oasis Dec 17, 2025, 2:32 AM

#

ignoring unimportant ones?

#

FierceFalcon & GhostFalcon are good

fickle lodge Dec 17, 2025, 9:47 AM

#

upbeat mirage FierceFalcon & GhostFalcon are from Google/DeepMind

Try Jules with Gemini 3 Pro.

upbeat mirage Dec 17, 2025, 11:50 AM

#

fickle lodge Try Jules with Gemini 3 Pro.

https://julialang.org/ ?

The Julia Programming Language

Watch what unfolded at JuliaCon 2025 here. The latest developments, optimizations, and features happen right here, at JuliaCon.

upbeat mirage Dec 17, 2025, 2:01 PM

#

LLMs don't have as much training-data for niche-languages such as Julia :/

#

Gemini 3 pro advised me to use C++17, for best results in vibe-coding.

#

(C++17 is likely a better choice than Rust, in that regard, as well.)

fickle lodge Dec 17, 2025, 3:10 PM

#

Bruh

#

💀

#

I mean , you just had to search "jules by google" or even just "jules"

fickle lodge Dec 17, 2025, 3:12 PM

#

upbeat mirage https://julialang.org/ ?

But anyway this is what i meant, https://jules.google/

Jules - An Autonomous Coding Agent

#

https://jules.google.com/

Sign in - Google Accounts

upbeat mirage Dec 17, 2025, 3:42 PM

#

fickle lodge https://jules.google.com/

No thanks, i prefer to be in control, when "vibe"-coding. I don't want an AI to mess around with my precious machine :/

#

And i dont have money to buy tokens from google either :/

#

-# (so i use Lmarena and text-chat)

fickle lodge Dec 17, 2025, 4:57 PM

#

upbeat mirage No thanks, i prefer to be in control, when "vibe"-coding. I don't want an AI to ...

Assuming is bad for health, in this era cant you search or go ask ai about it?
Its not a local ai agent , it works on the cloud.
Just like any ai chatbot.

#

Limited Access with gemini 2.5 for free users is available.

#

I even provided you its homepage, you didn't even open the link O_O

fickle lodge Dec 17, 2025, 5:20 PM

#

Anyone got info about silentnova?

wet oasis Dec 17, 2025, 5:26 PM

#

upbeat mirage And i dont have money to buy tokens from google either :/

Is googles free limits not enough?

upbeat mirage Dec 17, 2025, 8:55 PM

#

wet oasis Is googles free limits not enough?

g3p closes rather fast, so they reduced it recently :/

upbeat mirage Dec 17, 2025, 8:55 PM

#

fickle lodge I even provided you its homepage, you didn't even open the link O_O

Your assumption was wrong. I opened it, and found it not suitable for my purpose.

#

(compared to LMarena)

upbeat mirage Dec 17, 2025, 8:56 PM

#

upbeat mirage g3p closes rather fast, so they reduced it recently :/

Waiting for Gemini 3 Flash..

upbeat mirage Dec 17, 2025, 8:57 PM

#

upbeat mirage Your assumption was wrong. I opened it, and found it not suitable for my purpose...

Also g2.5p sucks, when compared to co45t and g3p.

wet oasis Dec 17, 2025, 10:08 PM

#

upbeat mirage *Waiting for Gemini 3 Flash..*

you can use it

astral musk Dec 17, 2025, 10:11 PM

#

Are you all not seeing it on our site?

upbeat mirage Dec 17, 2025, 10:57 PM

#

upbeat mirage *Waiting for Gemini 3 Flash..*

i meant, in AI studio

#

(in LMarena it exists)

#

oha! i found it now in AIstudio, they must have added it today

#

yay

#

how good is it, when compared to 3-pro?

cunning wasp Dec 17, 2025, 11:39 PM

#

Does everyone have the video feature on the site? Because I enter another device and the feature does not appear

wet oasis Dec 17, 2025, 11:58 PM

#

upbeat mirage how good is it, when compared to 3-pro?

seems pretty good when I was testing, but I only did surface level stuff

wet oasis Dec 17, 2025, 11:59 PM

#

cunning wasp Does everyone have the video feature on the site? Because I enter another device...

go to how to video bot

#

https://discord.com/channels/1340554757349179412/1397655624103493813

cunning wasp Dec 17, 2025, 11:59 PM

#

@wet oasis

#

I'm talking there's a new feature on the site that has Sora 2 Pro. etc.

wet oasis Dec 18, 2025, 12:01 AM

#

oh yeah thats not something everyone has

cursive jacinth Dec 18, 2025, 1:24 AM

#

got "master-node" and was quite good

cursive jacinth Dec 18, 2025, 2:12 AM

#

if somewhat sloptimized

#

any ideas on the identity?

civic goblet Dec 18, 2025, 9:50 AM

#

Hello, I need a lil bit of help. After the reveal shows which model generated the better response, how can I continue with that same model?

wet oasis Dec 18, 2025, 2:06 PM

#

civic goblet Hello I need a lil bit help.after revealing the model who generated better,how c...

?

astral musk Dec 18, 2025, 3:15 PM

#

civic goblet Hello I need a lil bit help.after revealing the model who generated better,how c...

As in continue with the same model? In Battle mode, once you vote it's going to sample two new models.

civic goblet Dec 18, 2025, 3:20 PM

#

Yes and if I like one more how can I continue with the same

wet oasis Dec 18, 2025, 3:45 PM

#

civic goblet Yes and if I like one more how can I continue with the same

You cant

astral musk Dec 18, 2025, 3:46 PM

#

civic goblet Yes and if I like one more how can I continue with the same

Well you can swap to either the Direct or Side by Side modes (top left corner) and select specific models from there.

#

Would note though that models using a codename won't appear there.

eternal cargo Dec 19, 2025, 3:05 AM

#

cogilux just self-identified as LongCat by Meituan

lost hemlock Dec 19, 2025, 5:32 AM

#

new codename model called : "jet-force"

do u guys know about it?

cunning wasp Dec 20, 2025, 2:27 AM

#

@astral musk
I’ve honestly started to hate lmarena.ai. This issue still hasn’t been fixed, and I don’t understand why. Whenever I try to send a long file, especially since there is no proper file upload feature, I just get “An error occurred.”
I specifically chose encryption-related files as an example because when you send anything related to encryption, the site applies security filtering and treats it as a virus or malicious content, even when it isn’t.
There’s also another problem: when a model sends a very long response, it suddenly stops and shows
“Something went wrong with this response.”
For example, I go to Claude and say:
“Create a Python file for encrypting C files. I want it to be extremely, extremely complex.”
And if I want to clearly trigger the issue, I add:
“Make the response more than 2000 lines.”
At that point, the message gets cut off or deleted, and the same error appears.
How long is this issue going to stay unfixed? Instead of fixing the core problems first, you’re adding features like videos. That only increases the problems instead of solving them.

wet oasis Dec 20, 2025, 2:33 AM

#

cunning wasp @astral musk I’ve honestly started to hate lmarena.ai. This issue still...

PUT IT IN BUGS

astral musk Dec 20, 2025, 3:01 AM

#

cunning wasp @astral musk I’ve honestly started to hate lmarena.ai. This issue still...

I'm really sorry to hear you're not having a good experience with the site lately. It's very much our intention to make this platform as appealing and helpful for AI evaluations as possible. There are a few problems you've brought up here that are best addressed in specific areas.

the site applies security filtering and treats it as a virus or malicious content
It's difficult for me to say if this is related or not to a recent experiment we're running. However, we did recently start an experiment with adjustments to our content filter. As an unintended result, this is creating more false positive flags for content. To get a better understanding of what is being caught (when it shouldn't be) we are collecting examples in #1447983134426660894. If you could copy/paste the prompt that is getting flagged there that'd be much appreciated. Note you can send me a Direct Message with the prompt if you prefer that instead.

The “Something went wrong...” error message can happen for various reasons. This help center article has a few quick fixes that may be helpful. More importantly, if those don't help, it outlines ways users can provide us with more in-depth information to better understand what is causing this error message. If you do submit to the form, give me a ping and let me know.

I would ask though that we do use the #1343291835845578853 forum to flag these issues with the site as it helps us keep these issues organized, and conversation dedicated to codename discussions flowing.

LMArena How To: Something went wrong with this response... error me...

You may sometimes see the error message: “Something went wrong with this response, please try again.” This is a general error message. It can

upbeat mirage Dec 20, 2025, 1:24 PM

#

What Gemini model could lucky-lark be? Does anyone have a clue?

modest oriole Dec 20, 2025, 11:13 PM

#

3 new stealths

#

Neo nucleus

#

Jakiro

#

And December Chatbot 3

#

what is Nvidia doin bro

eternal cargo Dec 20, 2025, 11:26 PM

#

upbeat mirage What Gemini model could lucky-lark be? Does anyone have a clue?

is lucky-lark Gemini?

upbeat mirage Dec 20, 2025, 11:27 PM

#

eternal cargo is lucky-lark Gemini?

at least, it claims to be it, and also talks like it

#

so it could really be a Gemini model (or a really good impersonator)

modest oriole Dec 20, 2025, 11:28 PM

#

Neo nucleus claims to be google

#

I dunno Jakiro

#

And december chatbot 3 is obvious

eternal cargo Dec 20, 2025, 11:28 PM

#

oh, fascinating! I’ve encountered it a few times, haven’t thought it to be too surprisingly strong, could either be faking or just a Flash-Lite model?

eternal cargo Dec 20, 2025, 11:28 PM

#

modest oriole And december chatbot 3 is obvious

NVIDIA?

modest oriole Dec 20, 2025, 11:28 PM

#

eternal cargo NVIDIA?

Yes

modest oriole Dec 20, 2025, 11:29 PM

#

eternal cargo oh, fascinating! I’ve encountered it a few times, haven’t thought it to be too s...

Is it on codearena?