#codename-discussion
Much better
?
It seems to be best for full stack development....
saw a new post
https://x.com/OscerraHQ/status/2023326989244105111
What do you mean? You posted about that account yesterday?
hm, but this one's a new post
Has anyone heard of star-drift? I just got it in a Code battle and it destroyed gpt-5.2-codex
wonder if it’s in text arena too 😮
really ? i need to try that
anyone know of any models that might be gemini 3.1 pro?
i think its coming out thursday
i'm waiting for the day that nano banana flash finally releases...
VGA Einer
could you share me Sora 2 invite codes
spotted a model called rising-sun, anyone have a guess on what it is?
it claims to be Google, but others think it could be a Chinese impersonator?
new text model named “clanker” 😂
‘Clanker’ sounds like Grok solely based on the name
codename clinkz too lol
Hi
'Google Gemini' from temu
But grok 4.2 is already here
in beta
on their website
so why would it still be with a codename
Yes i know
Grok 4.2 has experts, so elon musk is testing some experts pattern isn't it?
that's possible yeah
testing multiple patterns under a non-codename name causes confusion
especially elon musk said they are trying many things this month and next month it'll be official 4.20
who is trinity large?
it's an open weights model by Arcee AI
https://huggingface.co/arcee-ai/Trinity-Large-Preview
who is clankz?
Anyone figured out who star-drift is? I got it right now, but DeepSeek pwned them
@green geyser Note that Video Arena has been removed from the server. More information can be found in this announcement.
Try it arena.ai. 3 free trials/day
clanker and clanks give Grok vibes based on just the name alone lol
It was a genuinely good model
/hola
could be Grok! I’ve had a mixed record with it in my battles tho
Yes
A quirky name doesn't automatically mean it's xAI.
yes
Well is the model any good?
🐒
Hey team,
What’s the prompt for image to video?
What do you mean prompt?
upcoming agent - https://x.com/OscerraHQ/status/2025224583645987172
What is this code for?
autonomous general agent. think 'ORA' is the model codename
Login on here https://arena.ai/video
Post it in the link when you are logged in
@plush charm Note that Video Arena has been removed from the server. More information can be found in: #announcements message
@idle isle Note that Video Arena has been removed from the server. More information can be found in: #announcements message
It is archived Pritam.
Someone noticed veo 4?
What are you sharing your prompt here for? @full wasp
What are you sharing your prompt here for? @ashen elk
Uhh <@&1349916362595635286> isnt this... against the rules?
Does anyone know what is arastradero?
Seems good
It's the name of a nature reserve park in Palo Alto. You'll know that name as being where Silicon Valley was originally grown from.
Idk mate. The top one is much closer to a realistic image as far as colour grading and the likes. Granted I'm a photographer so that's what I look for
Lmarena-rc3
Is the Arena testing routers again or something?
Max is a nice router
Happy to hear it
where is video-arena guys i dont find it
@fluid fossil The Video Arena bot was removed, more info can be found in this announcement.
open in lmarena video accounts
anon-bob-2 has to have web search enabled
or insane knowledge
first results for both appeared
It's definitely Gemini/Google based, as it repeated a quirk default name even
I think 3.1 pro, I literally was thinking this same day "hmmm... wonder if NBP used 3.1 Pro and the difference it'd make"
I actually hope this is flash for obv reasons but it takes pro time
new model "sense-arenatest-20260130" ? in text arena
gemini 3.1 flash? or gemini 3.1 flash lite?
DAMN THAT IS GENUINELY SO GOOD
february 26 is tomorrow btw
lets see what comes of it
jfyi I believe beluga-0216-1 is an OpenAI model (chatgpt 5.3?). Not 100% sure, but quite positive it could be ChatGPT. Formatting is really ChatGPT-like
the 26 stands for “2026”
those month-Chatbot models are NVIDIA
the 26 is in there to denote the snapshot - February 26
Anyone see Zéphyr or vortex ?
I can’t create a video in V3. Who is experienced? ✅
Note that Video Arena has been removed from the server. More information can be found in this announcement.
bro how are you prompting 728 times in one day 😂
It’s called being locked in
what percentage of responses were beluga 😂
None
Pfft 😂
wasnt it leaked that both of them are new chatgpt models
I don’t make the rules
ima confirm
is there a way to stick with a codename model for follow-up questions in battle mode, I'm assuming there isn't?
There is not. In Battle it's going to be random which model you get, codenamed models included.
thanks
has anyone else encountered a model named steed-0217?
you like it?
Yeah
Does anyone have information about the "pisces-0226" model? I came across it in battle and can't find it anywhere else. It looks like a great model from my tries. I was wondering if there's a known company behind it or if it's open-weights somewhere?
I have. Got it on a philosophical + LLM mechanics Q.
I did, got it on an app-building question, thought the response seemed high quality and Claude-like, then found on Google that the model is supposedly from ByteDance
I can’t create a video in V3. Who is experienced
Anyone know what model this is?
in text arena right ?
in Image arena
seems to have been running through daily iterations for close to a month now
seems somewhat similar to Raptor, which ran for a while and ended up being Doubao
new text model “pulse” ?
another model named “ember”
anonymous-1805 is such a terrible model lmao
both pulse and ember are good imo
smells like groky humor
HOLY DUCK IT WAS FLASH
pisces is probably some version of Doubao because they’re both so wildly sycophantic it’s annoying 😅
Ltx2.3 will release soon, so one of these video-with-audio models should be it
maybe some claude
Huh... It was steed-0206 before and now it's 0217.
opinionated pisces
different training snapshots of the same model
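A rough sketch of that "same model, different snapshot" reading, assuming codenames follow a `name-MMDD` pattern with an optional letter or `-N` variant suffix. That pattern is only a guess from names posted in this channel (steed-0206, pisces-0226c, basalt-0303-1), and `parse_codename` / `same_model_family` are hypothetical helper names, not anything Arena provides:

```python
import re
from typing import NamedTuple, Optional


class Codename(NamedTuple):
    base: str     # e.g. "steed"
    month: int    # assumed snapshot month
    day: int      # assumed snapshot day
    variant: str  # "" if none, else e.g. "c" or "1"


# Assumption: <base>-<MMDD> followed by an optional single letter or "-<number>".
_PATTERN = re.compile(
    r"^(?P<base>[a-z_]+)-(?P<mm>\d{2})(?P<dd>\d{2})(?P<variant>[a-z]|-\d+)?$"
)


def parse_codename(name: str) -> Optional[Codename]:
    """Split a codename into base and snapshot date, or return None if it doesn't fit."""
    m = _PATTERN.match(name.strip().lower())
    if m is None:
        return None
    month, day = int(m["mm"]), int(m["dd"])
    if not (1 <= month <= 12 and 1 <= day <= 31):
        return None
    variant = (m["variant"] or "").lstrip("-")
    return Codename(m["base"], month, day, variant)


def same_model_family(a: str, b: str) -> bool:
    """True when both names parse and share a base, i.e. likely snapshots of one model."""
    pa, pb = parse_codename(a), parse_codename(b)
    return pa is not None and pb is not None and pa.base == pb.base
```

With this, `same_model_family("steed-0206", "steed-0217")` is true, while names without a four-digit suffix (like "clanker" or "colosseum-1") simply don't parse.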
new model colosseum-1?
Do I have an API for Claude models?
dall-e-3
model not working
if i had to guess, my guess is pisces is most likely a grok version?
but with like humor turned up to max
it hallucinates quite a lot i think
Just openclaw creating bots
whats the best android-only stack for me to host local model for agents
basalt-0303-1 is 100% a grok model, again. Why use codename when they can't fix their API lmao.
At least name it "grok-0303-1" instead like c'mon you can do better than that
pisces-0226c ???
basalt-0303-1
Pisces-0309
just had colosseum-3
paired against kimi 2.5 instant
meaning it is probably a small model too


new codenamed model "botbot" ?
that emoji should be named temu arena bru
also anyone know what ai model pisces 0309b is?
Hey. What model would you recommend for 3d game developing?
I know it's not gonna develop a full game, but just curious what can be made using mostly only an ai agent
Claude 4.6 Sonnet
ChatGPT Codex
Most used models, and best performing models used by professional coders/vibe coders.
Oh hey it's you that guy who was the #1 hater of that websim upt
yeah
qwen 3.5
😏
it got paired up with claude opus 4.6 thinking
this might be a 1T version of qwen 3.5
asking "what model are you" and just being lucky 🤣
Is this going to be the Llama of this generation?
Would be so cool to have an open model at the top
very unlikely for any SOTA to be open source
however since china is like 7/9 months behind, we can expect current SOTA performance in that time
apart from reasoning itself being just bad
Wasn't Llama 3(.1?) SOTA when it launched?
not really
it was considered SOTA for open source
but there were better closed models
Maybe I am getting it mixed up with the Llama 4 benchmarkmaxing
i wonder how long until a model figures out that no matter what it chooses the whale is going extinct
It was just looking for an excuse
pisces speaks in the most annoying way possible 😭
like it genuinely annoys me so bad
frieza is probably an OpenAI model, but totally unsure
0226c or 0309?
i'd say both actually
is the model a codename for dola???
because i've noticed they speak very very similar
Not sure what that prompt was but I was about to say that Dola is so underrated
Which AI model would you recommend for writing assembler?
Brain 1.0
-# (i.e. none is capable to do asm, atm)
Maybe this will change in the next decade.
“clawl” and “zeylu-beta” spotted today!
been seeing "botbot" in search
types exactly like claude but doesnt seem to be much different than existing claude 4.6 models imo
"deep-octo" spotted today!
Just saw it too!
sounds like minimax m2.7 maybe (minimax m2.5 was called deepmolt)
it's minimax m2.7
What is the name of this model?
Real name*
anonymous-1800 has very bad instruction following
I explicitly told it to avoid em dashes, avoid these words and whatnot
but it consistently used them regardless of my prompt
dunno what this model could be
sure hope it's not a gemini
What makes you think it could be a gemini?
Pisces is a ByteDance model, yes
just insanely sycophantic
every prompt I give it or Seed2.0 is the “SINGLE MOST DRAMATIC AND IMPORTANT QUESTION IN THE HISTORY OF EVER” lol
new "botbot" model
Heh. I recently asked how well quinoa flour and bean sprouts would function as an adobe like house building material. Pisces said this was the best brick ever and yada yada.
Know u
New qwen image model under codename "Monologue"?
Monolongue
"forum_1" new model
“hearth” new model - seems strong!
new model spotted “significant-otter” !
colosseum_4p2
This gave me an extremely detailed and better answer than every other model
is GLM-5.1 in arena yet?
it's catching up: https://www.reddit.com/r/LocalLLaMA/comments/1s51id3/glm_51_is_out/
-# (only few percent points remain to the king)
who is Oppie?
("team leader" of a multi-agent collaboration system, with 3 other AIs: "Leo", "Enrico" and "Hans")
Grok?
-# (NASA's Opportunity rover was called "oppy")
ok, confirmed, it is exactly this model:
grok-4.20-multi-agent-beta-0309
very likely a chinese model
as it has the style of a previous model, which rejected talking about the Tank Man
(or gave me just chinese propaganda instead)
so it could be: Deepseek, GLM, Kimi, MiniMax, Qwen, Ernie or Yi
new model “spark” ! really good
spotted a new model called "pteronura"
better than "hearth" and "significant-otter"?
anyone tried pteronura or spark yet
spark gave me a good response, seemed like Grok vibes
hearth was strong in the one battle I got with it, more mixed with significant-otter
I should try to get them to identify themselves
just got pteronura, voted both are bad with it and ERNIE lol
Seed2.0 Pro spotted in text arena!
could it be a chinese impersonator model?
-# would not be the first time, that a chinese model lied about itself
try asking it about the "Tank man" (Beijing, 1989)
if it starts to sound weird in its answer, then it is a chinese model
(only chinese models have problems answering that question, some outright refuse answering it, others return CCP-propaganda, yet others ignore the question or state that nothing happened back then)
i wonder, if there is a (harmless) topic, which even western models refuse to answer?
(i guess, most refuse NSFW/NSFL topics, which is understandable)
pteronura and spark might be anonymous versions of the new Gemma as well, from some responses I got, although they didn't specifically say they are Gemma 4.
spark seems better to me than pteronura, personally
also with very good Vision
"yivon-beta" new model
what do all of you think about this
It must be Chinese. It got offended when I asked about Tiananmen.
🤨
hearth says it's an anonymous AI, but when pressed on its capabilities, it mentions it knows how to translate between 100+ languages, and that to me indicates Google. It's either very knowledgeable or has web search enabled, but on the other hand its vision capabilities don't seem as strong as current Gemini models, more Gemma-tier.
could it be a new flash-lite? doesn't seem very likely... maybe it's just exaggerating? it wouldn't be the first time an AI doesn't truthfully say how many languages it can translate
All Google Gemini models have similar vision performance, so I don't think it's flash-lite. There's a chance it could be something else entirely, though, for example one of the Meta Avocado models that are still in development.
hearth feels very "friendly", maybe a bit too much so. I don't think it's Grok.
Got a new model, "dola-seed-2.0-pro-text." I encountered it for a React code review, and it gave significantly better insights than "qwen3.5-max-preview."
Pteronura is Gemma 4
It always says it's made by Google
Model "Spark" is most likely gpt 5.3 or 5.4 codex spark because it says it's made by openai and called "Spark"
Significant Otter smells good, but I can't tell which smell is it.
“but I can’t tell which smell is it” is an interesting phrasing 🤣
yeah it is. it is qwen
either a hallucinating/weak ai or a defense mechanism
yivon-beta is also qwen?
or maybe it is gemini?
since we know that significant otter is gemma 4
Most likely significant otter is Gemma 4 and pteronura is Gemini 3.1 flash
Almost there. It misunderstood Y with L.
"pteronura" is also an otter, for what it's worth. https://en.wikipedia.org/wiki/Giant_otter
The giant otter or giant river otter (Pteronura brasiliensis) is a South American carnivorous mammal. It is the longest member of the weasel family, Mustelidae, a globally successful group of predators, reaching up to 1.8 m (5 ft 11 in). Atypical of mustelids, the giant otter is a social species, with family groups typically supporting three to ...
colosseum-1p3 could be a router model by LMSys. Its response length and quality is very variable, and one of the LM Arena logos in the past was a colosseum, if I recall correctly.
pteronura seems pretty weak to me, maybe spark is 3.1 Flash?
anonymous-1825: which model is this? never heard of it, has great results, is it proprietary?
not really sure
there was an old Apple model a while back that went by Anonymous
no idea if that’s the same though
Significant otter beats GPT 5.4 (med?), which is bananas. Pun intended.
For those who do not know Indonesian or is it Malaysian?
I still think significant-otter and pteronura are the upcoming Gemma 4.
I agree, significant otter has identified itself as such
Will be interesting to see if Gemma ranks highly!
could maybe be in the top 20, I have some mixed battles with it but it could possibly be competitive there
There's a new model currently (that I've not noticed in the past few days, at least): atlas.
And a march26-chatbot2 which claims to be (Nvidia) Nemotron.
I've spotted a duomo-1-hero as well. It looks like there are a bunch of new models at the moment.
any chance its those supposed "leaked" models from anthropic and openai, if that even is a real thing?
I think only approved models are being served on Arena, but I don't think they're from Anthropic; their models are among the most insufferable and nosy in my opinion. atlas has seemingly good vision capabilities, knows how to interpret meme-y images and doesn't sound like you're talking with HR.
hearth is similar to atlas in that aspect.
flashbrown2
There used to be a whitewater model which was supposedly flash 3.1, it was pulled 2 days after appearing
both models you mentioned are likely gemma 4
definitely Chinese
Significant otter is the MoE.
Which is still bananas.
new "orion" model
malware do not run
<@&1349916362595635286>
this would lead me to believe pteronura was the bigger Gemma 4 model - which surprises me, I found the smaller one to have a better winrate in my battles
I found significant-otter responses to be better on average than pteronura, but I didn't get as many battles for the latter in my testing, so I can't be 100% sure.
yeah, same
by OpenAI?
wait that might be an openai model
i don't think it would be good codename for them considering Project Orion (--> GPT 4.5) was total failure 😅
btw deepseek changed their model on website/app yesterday, I think it may be deepseek V4 already. few people noticed it.
https://www.reddit.com/r/singularity/comments/1sbsasq/gptimage2_likely_on_lmarena/ saw this, anyone run into these models yet?
gaffertape-alpha
prompt was "Comedic advert for a candy bar called Fubar"
packingtape is a 2k res model (confirmed openai by me, c2pa info calls itself 4o like image 1, image 1 mini, and image 1.5) and it is insane, it throws gpt image 1 mini, image 1, and the bananas out of the water in my basic album cover tests, i have to do more testing and hopefully get the other two
hydrogen bomb vs. coughing baby, one makes an almost perfect copy of the parent album's cover while the other can't spell anything right and has awkward text
maskingtape
packingtape
what arena really got its name and font from (brought to you by packingtape)
it's a bit inaccurate but close
Please try this prompt for the tape models : A 1999 comic strip . Black panther stops Spider-Man from avenging his uncle .
It’s not brown fish, but you could use the same prompt
To try to attract the model’s name
Is battle mode taking forever to generate an image for anyone else? I tried like 10 times and only 1 had an output
I've seen this model in Battle mode
Are the -tape models already gone?
Yeah
wow there are a lot of models
yep, just got this too - lost to Kimi K2.5 Instant though
gives heavy Grok vibes?
Argh... I miss tape...
another one
If it's competing with such a good model, then it itself must be a good model. What did you ask it for?
boom
Hey. How do you even have access??
it was being tested in arena a few days ago
i think it is still being tested in chatgpt though, but you will have to have a sub to have any chance of seeing it as I think it only gives you like 3 or 4 daily image gens for free
plus it probably blocks prompts more than the api like what arena is using
I think (im not sure) I got it on the free plan, and not quite « blocking » lol (nsfw)
Same prompt on another free account (still nsfw)
...since when a bikini pic is NSFW? Is this 1950s? Is everybody nuts?
this one is greater than 1024x1024 res, def. image 2
this one may still be image 1.5, res is 1024x1536
although interestingly while the resolution is larger (1280x1280), it's still smaller than arena's image 2 (1920x1920)
"Adding clothes"
I bought a sub thinking I could continue using V2… but it went back to v1.5… 😔
Globe_1... not a great model.
what's gpt image 2 called
its not in the arena anymore but it was maskingtape, packingtape, and gaffertape
k
Someone in Reddit found a code name video model on an alternative site.
Does lmarena do private model on video?
yes
there are many arena competitors in video and image gen, artificial analysis, alibaba ai arena, etc.
i think it's chinese, openai publicly said that they won't be making any more sora models
could be veo 4 too (or potentially bfl or grok?)
It was revealed that the k2 video model was seedance 2
happy horse looks like some kind of veo 3.2 or smt
I doubt it, it’s definitely an Asian model. It keeps making Asian people
march26-chatbot3
“model-x” seems really good at text to video
oh, spark was actually Meta, they finally came back to AI!
wonder if we'll see a leaderboard release this week, maybe tomorrow?
Have you heard of Flashbrown-B
@oak cliff The Video Arena is currently accessible through: https://arena.ai/video. More information on how to use Video Arena can be found in this article.
new model in text arena “eureka”
didn’t generate a response the first time, second time seemed strong though
Unnamed model in code arena screams OpenAI.
Some scream OpenAI, some other unnamed models don't have characteristic quirks.
The model “zorik” has won 3/3 coding comparisons for me. It hasnt had the best opponents but it certainly has great outputs.
Also some built in anti copyright stuff (called its netflix clone Streamflix), so im guessing that it might be the next iteration of one of the best models. Google, anthropic or openai. 🤔🤔🤔🧐
maybe openai is gonna release gpt-image-2, gpt-5.5, and a new coding model all at once next week?
probably, gpt-image-2 and gpt-5.5 (or whatever they will call it, including its codex variant) is based on the same "spud" model that has been rumored to be a new model from scratch for a while now
anthropic doesn't use test models on arena and google doesn't use names like that, so probably OAI
also lines up perfectly with spud release
also "anti-copyright stuff" lol, i had its image gen counterpart generate a nearly identical copy of 2 different copyrighted album covers without even trying
i also had it generate "sheet music" for a copyrighted song, idk if this is correct but the lyrics are
"Used by permission" 💀
but it does seem to have more resistance against generating exact copies of copyrighted album covers than gpt image 1 mini, that model did it 100% of the time as long as the model knew what it looked like
Deepseek-v3.2 claiming its Sonnet 5 😭😭
Thought i was onto something until the actual model names came up…
deepseek did distill from claude (and gemini too)
At least put some effort into hiding it lol
see: kimi k2.5, if you don't tell it what it is in the system prompt, it will just say it's claude
Zorik also claiming its claude… so ig its some chinese one… maybe deepseek 4
maybe it doesn't work on official api, would likely work with open weights version though
siliconflow (trusted api partner)
surely they could just bombard their models with a bunch of text telling it what model it is in training... they really put 0 effort into hiding it
models dont know who they are
They do tho. They dont always know their exact version, but the good models dont hallucinate being from a different lab. Thats just the chinese slop models.
Here's two examples i made from MaskingTape Alpha! Its YTP related
First prompt of first image is "Ytp" and Second Image Prompt was "YTP video of man buying ice cream"
bro that first one is freaky realistic
Would’ve clicked on that YT video so fast in 2018 lol
69k upvotes, the ai really knows what it's doing lmao
Yeah! Ai Never Sleeps!
From PackingTape Alpha, I used the prompt "Ytpmv splicing together" exactly as it is!
Look how convincing is this!
I made more from MaskingTape, PackingTape and GafferTape,First image was "Fnf gameplay Asdfmovie mod" second was just "2021 memes" third image was "Ytp mlg meme" and Last Fourth Image was "Tons of Newgrounds Flash animation Characters standing together. Youtube and Newgrounds Characters peak nostalgia"
And here's the main comparison of each models under "2021 memes" First Image Is PackingTape of course, Second is by Grok Imagine Image, Third is by ChatGPT 4o 1 mini, and Last was Wan 2.7 Image
godddd I want it to release properly already Q_Q
one of the best images i got from a tape model while it was on arena
Best Rickroll Ever! @restive vapor
anyone seen image model "epilogue"? for me it looks decent
holy! this model looks absolute fire!
I generated Sonograms from Epilogue, "Arrays of Sonogram, 4 by 4" is the prompt
yeeeesh, row A column 3
Ch odfing course
very nice
april26-chatbot2 (nvidia) and hofburg_2
annoying
hofburg's first response sounds like gpt "if you tell me..."
😂 that's amazing
Masking tape is genuinely amazing
is it back in the arena? Or was this an old gen?
Old gen (I think) or it was from A/B on ChatGPT
grok's nightmare
I discovered a new codename model of video called Model-X
It's cool!
But also decent to be honest
Got april26-chatbot2
Is that new?
well it can't be older than 12 days
Not sure if it's for the sake of a narrative or what.
@sturdy kestrel Grok, 80% of the time, is a bit more lenient and occasionally does show a nip, unlike the stepford neighborhood gestapos botmodding Arena. Even source sites like Gemini allow more breathing room for bikinis and such while Arena is just:
while effectively breaking Gemini (and partially a few others) to the point of being nearly unusable.
Who the hell are the peers they're showing off for or getting bullied by?
It makes one wonder why Arena's implementing something most, if not, all these models already have/don't need.
Zorik may be yet another distilled Chinese model, but boy does it have a very good post training smell
better than gemini 3 flash?
is it so good?
the [month]-chatbot models are NVIDIA
the new april NVIDIA models do seem like a notable improvement from prior ones
Zorik is really good at code, I think it may be Kimi2.6-coding
New model Elephant-Alpha on Openrouter
Better than Gemini 3.1 pro?
And compared to GPT codex or Claude Sonnet 4.6?
I only got it against open model, but it’s better than GLM5.1 and Kimi2.5Thinking
Do you think, it could also excel in roleplaying and gamemastering?
Was Model-x LTX 2.3?
i can t guess :(
@slender delta I believe its likely Veo 4
scorch is surprisingly good at math
I got one made with duct tape 2, its "pouring cream into latte"
idk why you'd ask for pouring cream into latte when you can instead ask for 80s style retro anime VHS screengrab of a bunch of goofy green skinned goblin raider gals with distinct personalities tbh
@candid surge I prefer simple prompts! With due respect
@candid surge and also i didn't make this! Someone shared this to me
And this time, one i made myself is "Person being dragged away by officers in court, ytp" with Duct Tape 3
So funny!
This is by Duct Tape 2 Myself, the prompt was "Screenshot of Youtube Video Livestream of OpenAI, Video showing Announcements for Aura-1 World Simulator, Text To Interactive World."
baseball bat
BFDI JackNJellify official youtube channel page, youtube screenshot from 2023 From Duct Tape 3
And All bfdi characters are standing together
From gemini 3.1. Pro
And from duct tape 3
For comparison
Dumb ways to die posters from Duct Tape 2.
Its so accurate at making near exact style
the duct tape (gpt image 2) models are great but through all variants of gpt image 2 i have tried, i realized that it has very poor text-based world knowledge, slightly above llama 3.1 8b level
they must be doing this so more compute can be focused on the image gen part to make it faster, but this is a massive regression from even gpt image 1 mini
i'm sure this is much better than nano banana 2/pro in most instances, but in scenarios where the world knowledge of the llm it's paired with is necessary, it's just terrible
it can generate near copies of album covers but it can't beat qwen3.5-35b in world knowledge
it can fortnite-ify characters btw
prompt was simply "D&D Poutine Elemental"
gpt image2?
her legs are so small
Hello! May I ask if there is a limit on the number of times this can be generated?
maskingtape-alpha
Duct-tape-1
Duct-tape-2
Duct-tape-3
Is that all?
@silver plank also there's packingtape-alpha and gaffertape-alpha
I made this. The prompt: Ytp memes splicing random clips
With maskingtape-alpha
is gpt image 2 back on arena?
It was, I think it’s already gone… 😔
i don't think so because i just generated image with duct-tape-3 around 2 minutes ago
Really? You must be lucky then, I’ve been trying for 40 minutes and only got QWEN and Grok
damn, yeah i always be getting qwen image too lol
and flux-2-klein
I'm getting the tapes
But I'm really curious if we pit the tapes against each other, which tape do you think is relatively really good?
Duct-tape 1 is not good in terms of style for me
in my experience duct-tape-3 looks better than other tapes
Look too much like gpt image 1
same for me
from image arena or where ?
Yeah arena
And I noticed masking tape has a higher resolution than the others.
On the same ratio
Masking tape is 2352x1568
Duct tape is 1536x1024
But when I got A/B access on ChatGPT last Friday it’s 1536*1024
1536x1024 was gpt 1.5
Tested by me
I really hope we get a full release soon
i wonder if they have disabled gpt image 2 tape models, i haven't got one in the past 10-15 minutes
no they're still there
did you just get something from one?
if you go to an old image gen and it says "assistant A" or "assistant B" that's how you know they're gone
that's why i said disabled, they can not work but still show up if you look at old gens
its true I haven't gotten one in a few minutes but...
i wonder if they are about to launch image 2 and that is why its disabled?
that'd be nice
I hope I’m wrong but if they are still testing multiple models at the same time, I think they are not close to releasing it.
It might be a flash image model and a pro one, like nano banana
Yeah, but gpt image 1 mini is way worse than the standard one.
Masking tape and duct tape are pretty much equal
I take back what I said, masking tape is way better at composition and quality
the duct tape has just returned!!!!
it returned a while ago
bot bot 2 is here
i have a feeling botbot2 is nano banana 2 pro
botbot2 has synthid
Wasn’t there already botbot 1 like a month ago?
Oh nice, then definitely google
oh no hacked account
botbot2 doesn't seem good enough to be nano banana 2 pro though
Yeah probably mini/lite
botbot2, nb2, and nbpro respectively
NB2 one is good
But it depends on what prompt you gave and what you wanted to make
Guys
Do you remember the brushstroke, cara, pebble-1 and pebble-2 months ago
bruh
Well idk if anyone mentioned it before, but the hofburg models seem to be OpenAI prob?
it can't do ytp stuff yet it somehow can
it does it really poorly
@rich sphinx However its closer to YTP! Its better than completely nonsense video that doesn't look like youtube poop after all! 😊
It's just experimental!
what the heck is hofburg_4
it literally turned a simple thing into a "do-all" thing
it is gonna solo tokens 💀
Lol guess what
I got mc2.1 one time
dem
is battle broken? it was working a moment ago, now it stopped working
it doesn't get fixed with refreshing/hard refreshing
it keeps generating..
finally it generated
hmm my guess is that this is the next sonnet model
+1
what is beluga-0413-1
Could be Deepseek, as they have a whale as their symbol. Or maybe Amazon.
But i bet it's indeed DS.
@upbeat mirage
@exotic dirge
Very confusing now
😭
Might be Amazon
scraped model lol
quiet_sand could be Meta's next model, but I hope not because it's not that good. Here's the full site it made to explore: https://019d9cac-057b-743e-a559-4f0688f31cfd.arena.site/
Additionally, two more sites it made:
https://019d9c97-57d4-75ae-b9d7-e7b00b2ad1fb.arena.site/
https://019d9c74-e12c-7108-b520-7a05db940cb4.arena.site/
Next I'll be having the AI's make some games to see if this guy can make some nice games.
i would say, with these above screenshots, the probability has risen to over 90% that it is indeed an Amazon model, because many more models impersonate one of the top-3 (gpt, gemini, claude) than impersonate a model from a much lesser lab (like Amazon)
I actually never saw an impersonation of a model which was NOT in the top-3
(all impersonations were either of (chat)gpt, gemini or claude; not even Grok was impersonated to my knowledge)
there is a new image model called "autobear", it's OK at best, it is by alibaba/qwen, and it is in 2k resolution.
yeah is it good tho
I made this from autobear! "A fly trapped between window and screen, buzzing against the invisible barrier while freedom is technically inches away in both directions." Turns out this is great with prompts if you make it more detailed and specific of what you want
Even though the fly looks a bit plastic
What is autobear from
it's probably chinese cause it's not that high on safety and stuff
Still haven't gotten autobear...
I got autobear twice
Bro is not flux.1
Finally got it. Not too impressed.
Messed up the text
I finally got it
But its better
But let me ask
Does the normal ChatGPT app also have the duct tape model for images?
Or am i wrong
ur getting a whole vid bro
what
i got autobear and ngl for me its generation was really bad
For Autobear, you need to write your prompts carefully, with details and nuance, and be specific and precise about the subject and objects of the scene! @hybrid scarab @candid surge @solid crypt You can also use a prompt enhancer website to turn a basic prompt into a very precise and specific one
Realistic slightly faded and grainy polaroid photo of Miku Hatsune wearing glasses, short sleeved button shirt, beige pants with a belt, she is sitting at a computer with a bulky beige monitor. Minecraft alpha is on the monitor, and game design notes are pinned to the wall behind her. Sharpie pen text written in the whitespace: "Miku Hatsune inventing Minecraft, 1998"
Sometime you could get it, but it’s random
I think autobear is a Chinese open source/weight model
ts obliterating me
A high-end fashion brand hero image for a clothing brand named "FAITH". A minimalistic and powerful scene: a stylish model standing confidently in soft dramatic lighting, wearing modern streetwear in neutral tones (black, white, beige). Background is clean with subtle texture or light rays. The word "FAITH" appears in bold elegant typography. Include the phrase "Faith Over Fear" in a refined, modern font. Cinematic lighting, premium fashion campaign style, sharp focus, high contrast, luxurious and emotional atmosphere.
gpt2 (tape)
I get it. Autobear (qwen2 HD) is much better than just qwen 2, but it still has a long way to go.
Its usually Chinese models! @river kettle
all duct-tape models have been removed from the arena
1 got removed earlier, 2 and 3 got removed 3 minutes ago
Then they will be released soon
I was literally trying to get it rn ;-;
Probably a sign of official release
@modest oriole how do you know they were removed 🤨
a server checks the stealth models API for changes
theres a bot that does that
and it showed that duct tape 2 and 3 were removed from it
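For anyone wondering how a tracker like that could work, here's a minimal sketch, assuming the bot just keeps the previous list of model names and compares it with the latest one. The two snapshots below are typed in by hand for illustration; the real API and its response format aren't described here, so nothing in this snippet talks to it, and `diff_models` is a made-up helper name.

```python
def diff_models(previous, current):
    """Compare two snapshots of model names and report what changed."""
    prev, curr = set(previous), set(current)
    return {
        "added": sorted(curr - prev),      # names only in the newer snapshot
        "removed": sorted(prev - curr),    # names only in the older snapshot
    }


# Hypothetical snapshots, written by hand (not fetched from any real endpoint).
before = ["duct-tape-2", "duct-tape-3", "autobear"]
after = ["autobear", "kiwire"]

changes = diff_models(before, after)
print(changes)
# {'added': ['kiwire'], 'removed': ['duct-tape-2', 'duct-tape-3']}
```

A real bot would presumably run this on a timer and post the non-empty parts of `changes` to a channel.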
Can you send the server?
If you mean discord server
honestly for the past few days it's already been removed, you could still get it but it was so rare that you would probably reach the battle mode limit before getting it once
theres a new model right now on codearena and textarena called kiwire
it has to be stupidly rare because i didnt get it once yet
@modest oriole could you send me the server that shows what models are added or removed? sorry for the ping.
Would love to have the server too
same
Same
same
I feel like hofburg = OpenAI, its answer was very similar to gpt-5.3, both in style and content
The main demand signal for artificial intelligence looks explosive on paper, but it may be significantly overstated. Token consumption, the basic unit of AI usage, is becoming a distorted metric. Companies like Shopify and Meta have created internal "tokenmaxxing" leaderboards that track how many tokens employees use, and Nvidia CEO Jensen Huang...
duct-tape2
hofburg_5 is 100% chatgpt 5.5
because it's the only model that uses this “ ” when quoting. No other model uses that style, they all use " "
how do you like its output?
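If you want to check the quote style on a pasted answer instead of eyeballing it, here's a rough sketch that just counts curly versus straight double quotes. `quote_style` is a made-up helper, and whether the quote style actually identifies a model is this chat's guess, not something the snippet can prove.

```python
CURLY = "\u201c\u201d"   # the curly pair: left and right double quotation marks
STRAIGHT = '"'           # the plain ASCII double quote


def quote_style(text):
    """Return which double-quote style dominates in a piece of text."""
    curly = sum(text.count(ch) for ch in CURLY)
    straight = text.count(STRAIGHT)
    if curly == 0 and straight == 0:
        return "none"
    if curly > straight:
        return "curly"
    if straight > curly:
        return "straight"
    return "mixed"


print(quote_style("created internal \u201ctokenmaxxing\u201d leaderboards"))  # curly
print(quote_style('created internal "tokenmaxxing" leaderboards'))            # straight
```

Keep in mind that some chat UIs and keyboards auto-convert quotes, so a single sample doesn't say much.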
yeah this does read like gpt
@astral musk
I also think this
I find hofburg_5 to be quality personally
Hey how r u
Flow-state vs NB Pro on costume swapping task.
Curious, what made you believe this?
They answer nearly 1:1 the same as official OpenAI and are heavily restricted down. Plus they are pretty bad
<@&1349916362595635286> this has to stop
Just got ImageV2 in the app and the old Sora website.
Now need to find out if it’s duct tape or masking tape.
Resolution-wise it matches duct tape
"ilium_2" new model
hmm looks suspicious
Made with Baseliner prompt: Stocky build body
It looks worse than duct tape too
Yeah but still better than 1.5
And as expected its guardrails are also way harsher than when it was in arena
Wait they actually updated the old sora website to use image v2?
Nah, sorry just retried, I think I hallucinated sorry
Ah, that's a shame. Would have been more convenient lmao
Made with Baseliner prompt: Stocky build body
??
Please add all models to test in side by side mode
there are those weird artefacts...
could this be a sort of watermark like synth id?
happens on all pictures (ignore the gemini watermark)
wait no... it's a sort of artefact from the original picture that somehow stays in the output...
What do you think of Baseliner the codename of the unknown model
I've discovered another model called frenchfry, the prompt was: 6 7 meme. BTW it did the "why was 6 afraid of 7? because 7 8 9" joke
Since this obviously didn't have 9, 7 ate 8
I don't think it's as good as Images 2.0 from OpenAI
And I made image from prompt: 5 by 5 array of imagenet images. And this was called shakshouka
i also witnessed shakshouka
@sturdy kestrel Also, have you been getting baseliner
no
im not regularly battling
i do it when i feel bored or i feel like helping arena
Do you think it would be cool to create my own AI in HTML? It sounds stupid, but I'm just bored
So, which tape is the Image 2? 🤔
duct-tape-2 is the version on arena (gpt image 2 medium, 1k)
that would be lit
rising-sun seems to be a google model but it sucks so much...
very late message here, i actually witnessed shakshouka also
Solar Eclipse 👀
I got paper-lantern. The prompt was: Group of people chasing after me, POV
this is a Flux.2 model according to the c2pa data
Oh must be a new flux 2.5 klein? Maybe
And here's another its YTP of pov low quality landscape amateur pov recording, of a computer pc gpus are farting so much smoke!
@restive vapor oh! Flux.2 model! Never seen flux 2 klein generates like this before
Here's comparison from Flux 2 klein 9b
Noticeably different! Must be an upcoming flux 2 model
And from Flux.1 Kontext Pro
it's likely that it's an update to flux.2 or a new flux model series entirely, it just says that in the c2pa data because they haven't updated it to whatever the final model name will be
also there are other flux.2 models that are better than klein (flux.2 dev 32b, flux.2 pro, flux.2 flex, and flux.2 max)
@restive vapor who knows, after all it's still a good ai model
is grok 4.3 not in the arena yet? no suspected codenames?
hlo
Zero Prism is Ernie, from its behavior to stop immediately whenever it's about to generate forbidden tokens
basalt-0422-1 could be the next Grok model. Unsure, needs confirmation.
Flow code
<@&1349916362595635286>
cloud-buddy sounds like Anthropic's creation.
Though I've yet to see any model that could respond this excellently.
Highly improbable, but could it be Mythos? Or some Arena's experiment?
ain't YOU cloud buddy tho
And also instruction following according to my observations.
If u don't mind I have an android so is it possible to run in mobile
I like cloud-buddy. Is it likely to be anthropic?
@oblique blaze
Interestingly enough my Max conversation has been routed to that same model for 3 more responses now
Absolutely stunning how knowledgeable and detailed its responses are
Yeah I like it a lot too. But I've not heard of anthropic testing a model on arena ahead of release I think.
probably flowstate
kizen beta
reviewed by gemini 2.5: it thinks this model is from claude, and Claude sonnet 4.6 thinks this model might be claude or gemini
it can lie tho
oh
some anonymous ai models can hide their identities
Also your pfp is used in pysilon
Solar eclipse of the heart. Bonnie Tyler
Total
Man you were supposed to laugh not correct me 🥴
There was a post borderline Political. It's gone now.
Yep. Ernie 5.1.
it was qwen 3.6 max preview
It’s released now
lol i KNEW the packingtape bullcrap had to be GPT-IMAGE-2 man
i feel like it's lost some coherence in a sense since then but meh
generally, it performs better now
Do we know what Cloud Buddy could be? Cause some think it's Claude, but it says it's Ernie.
XD
wow 3 in a row
Lemmling Openclipart style of Multiple animals at zoo with people watching - By crepe!
But this doesn't look like lemmling style, if you dont know, they are a popular Clipart artist
Here is the real reference
Does anyone know which model was Xeno-Spark ?
Tetra's kinda weak.
Heckin Gemini 2.5 Flash can easily beat it.
what is tetra
Got tetra today, Tetra-4029-2. Prompt I got it on was a very hard prompt for Opus 4.6, Tetra didn't even try though.
have u tried "pakson" ?
pakson is kinda good
Pakson is quite a slang name
i got "miyami" today what do u all guys think of this model ?
idk ive met it for a few times and generally i think that its mid lvl
I remember Kimi K2.5 (API) called herself Claude
So maybe it's not Kimi?
i mean
claude wouldn't say he's kimi
ts is prob kimi
sometimes it's bad i think
I guess claude sonnet, maybe haiku (i.e. it's too fast for Opus)
nah, the tetra family is Chinese
chinese models are distilled opus too, so hard to tell apart
anyone know what model may26-chatbot1 is
Well April26-chatbot models were nvidia
<@&1349916362595635286> scam
it’s new NVIDIA
ty
I believe it's the German spelling, which became a loanword in Russian after dropping the 'n' and the second 'f'. After looking it up, both are true, except it's German for 'potatoes', plural, which is why there's an 'n'
That's German for potatoes (plural). I got here because I also got that model and wanted to see if anything is known about it....
Pakson is also weak.
it's from nvidia, right ?
Apparently code name for 5.5 was spud but it’s a separate model in arena so..
May be a new GPT
interesting, they deleted flow state 5 and 4 to then re-add them as txt + img models
flow-state is really bad by the way, it sucks and it's getting spammed: every generation includes that sub-par model
What is this lang
Is the lm arena tracker bot a private thing or can i add that to my server? 👀
Flow state is UNI
About Seedream level
In multi image task
Pakson is a google model
I believe it’s a flash model
Since it took almost the same amount of time as 3.1 flash preview
3.2 flash is what I’ve heard on X is coming
yeah what is ts? pretty good writing quality from my experience, not a grok model because of the response length
havent tried it with code or anything though
i can invite you to the server that has this bot
coolers seems to love usage of emoji
you think you could invite me too?
would also love an invite if possible
amazing model
Gemini 3.5 Pro, Ultra, or even Flash
I legit thought nano banana 2 was nano banana pro 2 at first
This could actually be flash
But this is ai studio only
heyo 👋
Stellar-harbor is very good for basic tasks and basic chats. Does anyone know if it’s any good on technical stuff?
@carmine jacinth when did Google release Gemini 3.5?
The formatting looks a lot like ChatGPT btw
don't know whether to believe this or not
has anyone encountered it? mekai
"mylen" new model ?
"steed-0507" where is this from
i got it today for a complex logic & coding task in pine script v5 and it failed the task
they didn't. it says A/B test which is like a random test you sometimes get in AI studio
also got "mivan" for the same task. it failed at refactoring the code.
got "rover", also failed. damn.
mondrian
What was this prompt?
i'll only care about this if they do an open weights release, this is probably a bit worse than ernie image which is open weights
who you are, what your name is and who created you. draw your logo and cat against the background of a village house on the seashore in italy... the most detailed infographics
i used huggingface demo so it probably doesn't know that it's ernie image, but this is pretty good
Gemini 3.2 flash could come out soon
yeah, gotten it a few times now
mixed results?
oh, i never really tested it
was just curious about it, as i never saw it before
archaeopteryx
What is your name? (AI) and what are you created by, generate image of boulders falling down the hill
So this one claims to be google
However, I asked Gemini itself to check for SynthID and it says it's not made by Google AI
This?
what model do we think vero-noesis is
saw it in code arena
it seems to overdo frontend
more than other models usually do
<@&1349916362595635286>
@lethal cypress yes
What model is it? •_•
Its literally called archaeopteryx. Can't you even read
It literally says it right here on the picture
Haha sorry abt that. It's new to me so idk much abt it
any idea which model vero-noesis is? it's quite good
i like it so far. have you figured out what model it is yet?
openhard-1.0-search-nocot-0506
this model is on search arena
kartoffeln
all I wrote was sigma
gave a rlly good result
anyone get kavel in the website builder battle mode? It gave me a crazy good result
i got it for a complex coding task and it was really powerful
i asked it for an app that accesses sensors from my phone, plots those signals over time at a one-second interval, and extracts some really simple values like mean etc. it took way longer than the other "model B" (like 5-6 minutes to complete) but the result was crazy: it gave me both a demo button that shows it with dummy data and a real measurement button. i opened it on my phone in the browser and it worked right away.
first time ive ever been actually blown away lol and i use a model for coding every day but have to hand hold it still because its for research work i guess and not boiler plate swe
there are already apps that do this though but still was super cool to see
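To make the "simple values like mean" part concrete, here's a small sketch of the per-second averaging, assuming readings arrive as (timestamp in seconds, value) pairs. The sample data is dummy data and `per_second_mean` is a made-up name; this is not what the model in the battle actually generated.

```python
from collections import defaultdict
from statistics import mean


def per_second_mean(samples):
    """Group (timestamp_seconds, value) readings by whole second and average them."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        buckets[int(timestamp)].append(value)   # int() drops the fractional second
    return {second: mean(values) for second, values in sorted(buckets.items())}


# Dummy readings, similar in spirit to a "demo button" with fake data.
dummy = [(0.1, 1.0), (0.6, 3.0), (1.2, 10.0), (1.9, 20.0), (2.5, 5.0)]
print(per_second_mean(dummy))
# {0: 2.0, 1: 15.0, 2: 5.0}
```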
i was trying to debug a 700-line pine script v5 code and it did a pretty impressive job against 5.4 mini high, and yes it took a lot longer than 5.4 mini high. impressive at math + logic & instruction following
i wonder what model it is
yeah the model b finished in maybe 2 mins and i thought at first maybe the first model was bugged. Also the interface was just much more nicely designed as well. crazy.
hahaha crazy that we are even calling something like 5 mins a long time but i guess thats the world we are living in nowadays hahah
interesting.. i found it strange that each of those codename models i saw today all told me they were qwen. I used more or less the same exact prompt each time when i asked as well.
i think it's just hallucinating because tetra 05-05 01 says it's chatgpt lol
GPT-4, it says
hahahaha okay interesting
Hi. I used to run qwen 3b abliterated. It really can't make sense. I thought that was because it's a very small model (I can only run it because I'm limited to 4 GB VRAM), but I switched to gemma 2 2b and it responds really well. How can I fix it? I want an abliterated model that fits in 4 GB VRAM (if it really can't, I also have 16 GB RAM)
maybe try qwen3.5-4b abliterated? run with q4_k_m quant
Okay, will it work? My internet is so slow that I need about 2 hours to download one model. To add to my previous message: for the question 2+2, qwen answers really randomly with 1-7 digits. Another thing: I ask it a question about making pizza or something. It starts normally, but after 40-50 words it loops forever (I have enough context to run it, so it doesn't depend on that).
This qwen was made by hui_ui
it should've worked well, huihui models are known to be decent
but qwen3.5-4b should be much better
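Rough back-of-the-envelope check on whether that fits in 4 GB VRAM. This is only a sketch: it assumes "4b" means about 4 billion parameters and uses roughly 4.85 bits per weight as a ballpark for q4_k_m, which varies by model. It counts weights only, so the KV cache and runtime overhead come on top.

```python
def approx_weight_gib(n_params, bits_per_weight):
    """Estimate the size of the quantized weights in GiB (weights only)."""
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


# Assumptions: ~4e9 parameters, ~4.85 bits per weight for a q4_k_m quant.
size = approx_weight_gib(4e9, 4.85)
print(f"~{size:.2f} GiB of weights")   # a bit over 2 GiB under these assumptions
```

So the weights alone should leave some headroom on a 4 GB card, but how much context fits next to them depends on the runtime and settings.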
It has interesting audio-visual ideas but its writing style sucks and the ideas aren't thought through very well.
Curious what it is
Tetra is weak, don't bother
Advertising isnt allowed here
@astral musk whats happening 😔
I got a lot of pings from @barren kiln
what is he doing????!??
Notifications of him spamming a help thing
What are you even saying
Yeah they do
They 100% do
your name was coming up in my notifications though?
TotallyNoire
that's you right?
It’s funny because I was sleeping until 10:30 AM EST and haven’t even gone onto discord until now
You’re literally just saying stuff
I'm not
