#general

soft river
#

But most of the Chinese models

wet steeple
#

where could i do it ?

soft river
#

Are free to use in their web

raw laurel
#

I suppose his use cases are local models only. Don’t know if there are any NSFW models

soft river
#

You can roleplay there

echo aurora
#

You're unable to

soft river
#

Or use something else

wet steeple
soft river
wet steeple
soft river
#

Most of them don’t have filters so you can go search for one

soft river
#

Idk the names

#

I don’t use them

wet steeple
soft river
#

I only know character ai

echo aurora
wet steeple
soft river
wet steeple
soft river
#

Im not into that so idk which search terms are the best

raw laurel
#

@wet steeple brother just search on google uncensored ai models or use an ai

#

Maybe hugging face have them.

wet steeple
soft river
#

@echo aurora Do you know when the benchmarks for Ernie will be published?

#

I’m so curious

soft river
soft river
wet steeple
soft river
#

It has very strong filters so I don’t recommend you using it for your roleplay

wet steeple
soft river
echo aurora
# soft river Oh they are? I apologize😂

Lol no problem. Would note our leaderboard changelog is a pretty helpful tool: https://arena.ai/blog/leaderboard-changelog/

Arena Blog

This page documents notable updates to our leaderboard—new models, new arenas, updates to the methodology, and more. Stay tuned!

For model deprecations, check the public updates on GitHub.

April 29, 2026
ernie-5.1-preview has been added to the Text leaderboard.

April 27, 2026
gpt-5.5-high has been added to

light sleet
#

@echo aurora

wet steeple
light sleet
#

Price starts at 4000 Tuff Coins

soft river
wet steeple
soft river
#

I always thought it was slow because I got so used to it

wet steeple
soft river
#

Most of the

#

Recent models

#

Have very strong filters

#

You should do your own research; I think people have run benchmarks for that

wet steeple
soft river
#

That’s true

#

Just

#

Check good models and

#

Try to do your roleplay

#

If they reject it

#

Don’t use it

#

Just that

raw laurel
wet steeple
soft river
#

If they reject it is for something

#

They can’t talk about

#

Killing or injuring or harming or hacking or anything you would call unethical

#

That’s why you can’t roleplay about that stuff

#

You should just make your research.

wet steeple
soft river
#

I believe

#

I’m not sure just go use another one

rustic gale
wet steeple
wet steeple
soft river
wet steeple
rustic gale
wet steeple
soft river
soft river
#

Are you trying to make the llm act as somebody else and/or making him act in a specific way?

#

If so

#

What is that way

#

If they’re rejecting it is because of something

wet steeple
soft river
#

How do you want him to act

rustic gale
# wet steeple what do yo mean ?

I mean that if god forbid some of these detect any traces of psych distress, they will disregard the idea that it's fiction and start doing what they've been told to do. Which is get you out of there. Not out of mercy, mind you, but because some shmucks have already lost their lives and the remaining relatives started lawsuits and we can't have lawsuits, lawsuits are bad (in this case they are because they're extremely stupid, but I digress). Otherwise just try them, that's what the site is for (well, it's not, but since it fears declaring its goals go ahead and use it for your thing)

soft river
#

Maybe that’s why the models are rejecting it

wet steeple
soft river
wet steeple
#

like i would like to simulate the life of a 25 year old girl in paris in 2008 or of a teen girl in france in 2003, or of a 18 year old girl in the uk in 2013

#

or the life of a hippie dad in 1969 😉

wet steeple
# soft river What is the topic

like i would like to simulate the life of a 25 year old girl in paris in 2008 or of a teen girl in france in 2003, or of a 18 year old girl in the uk in 2013 or the life of a hippie dad in 1969 😉

rustic gale
#

Or is doing a lot of work here. Doesn't convince me you're not doing any NSFW either. Like, at all. Stick with the older one, see how it goes. Also, once again, at least within this site (in theory, practice is sh-t), you can just try first and figure out later

wet steeple
soft river
#

And again

#

Models can’t act like people

#

They are forbidden from doing that

wet steeple
soft river
#

Did the models

#

Rejected you from the first turn

#

Like the prompt

#

Or

#

During the roleplay

#

🤔

wet steeple
wet steeple
soft river
#

Answer the question

wet steeple
soft river
wet steeple
# soft river (

no, but some i used felt incomplete for me, so i would like to know which models are best suited for my roleplays for living virtual lives

soft river
#

Before you said it was for adult content so I don’t believe you there

#

So they rejected you during the roleplay

#

There you go

wet steeple
echo aurora
#

This doesn't really seem like too productive of a conversation. Going to ask that we move onto a different subject please.

soft river
#

I was getting really stressed out

#

Btw

#

I gave an idea in feedback

#

Maybe you find it useful

echo aurora
soft river
echo aurora
fallen verge
#

hey

#

Why aren't the new models showing up for me?

echo aurora
echo aurora
#

Keep me updated though if that doesn't help.

fallen verge
#

Okay, these are the same models, how do I update them?

echo aurora
#

What you're seeing on that list is going to be the current models available via Direct and Side by Side

desert fiber
#

hi, i have a question: let's say you send a pdf in battle mode, then in the next message you send another pdf, etc. When you send the last message with the final task, will the AI in battle mode have the context of the PDFs sent before, or no?

echo aurora
desert fiber
fallen verge
echo aurora
# fallen verge

Did it disappear again? This made it seem like the refresh worked?

fallen verge
soft river
#

Like gpt image 2

#

They are not in order of release

echo aurora
soft river
#

Or benchmark

echo aurora
#

Worth noting there are some models that are in Battle, but aren't in Direct and Side by Side

verbal current
#

can you please answer my question

echo aurora
verbal current
#

inf generation

soft river
#

If the company is struggling with money why bring agent mode? Won’t that hurt more than gpt 5.5? 🤔 @echo aurora

nimble dawn
soft river
#

Since it’s for very complex requests most of them will surpass 200k tokens

echo aurora
#

But to answer your question, this new Agent Mode will be expensive; this is part of why we're developing this new usage system.

#

We're confident we can release this new mode, while maintaining spend in a reasonable way that positions us well for the long-term.

soft river
#

Oh yeah I apologize for my phrasing

#

I just thought that because of the amount of models that were moved to battle mode

echo aurora
soft river
#

Thanks for explaining

#

I’ll be very happy to test it when it’s released

thorn coral
#

I read "limits" ? 😭

heavy knoll
#

What can this Agent Mode do?

echo aurora
echo aurora
light sleet
heavy knoll
echo aurora
# heavy knoll Sorry I still dont get it can you give me an example of How to use or for what t...

It allows for more complex workflows. With the current modalities, they're limited to that specific modality. Meaning Text Arena only generates Text, Image Arena only generates images. With Agent Mode, I'd be able to prompt something like:

Plan me a trip to Portugal. Tell me what the best times to visit are. What hotels would you recommend. And create an image of a map of Lisbon with indicators for all the spots I should visit.

#

And it'll do all of that in one chat session.

heavy knoll
#

Oh okay now i get it so is it also Like the Max Feature it gives you the Best Model based on your prompt or can you choose

vale quest
echo aurora
vale quest
#

Bruh

#

Fridge protecting the snacks

echo aurora
# heavy knoll <@283397944160550928>

We are seeing feedback from users wanting to be able to select the specific models that can be used, but we'll have to wait and see what this looks like when it's fully released.

light sleet
#

They'll select me 😎

#

Hope for da best

vale quest
echo aurora
vale quest
#

96ae95fd-b70d-49c3-91cc-b58c7da1090b

#

See this model id?

#

Now add 6 to the last digit of its model name

#

Add thinking

#

And thats what we need

#

And make it 2026 ye

light sleet
#

Pineapple Arena soon???

#

Or no eta for pineapple arena too 😔

light sleet
#

@echo aurora Does this get u nostalgic

#

the server had reactions available for ppl before xd

#

no wonder someone added a pregnant man emoji

#

I was here from my other acc but that acc was deleted xd

vale quest
#

Back then

light sleet
#

but I think thats canary arena now

#

isnt it

vale quest
#

Im not an old man

whole sundial
#

one annoying thing they never removed: that previous chat history popup, it should appear once and then it should never show up again for that account/user id

light sleet
#

why did bro boost the old server 😭

#

HE USED MY METHOD CHAT 🔥 🔥 🔥

#

w pineapple

whole sundial
# light sleet

i still remember the chatbot arena alpha thing, which was a version of this alpha but from even earlier, late 2024 i think it was
the logo was a robot bear

silent tree
light sleet
#

LMSYS was not a good name btw

#

good that they came up with LMArena

whole sundial
light sleet
#

gpt image 2 generating me tuff wallpapers

whole sundial
# light sleet good that they came up with LMArena

yeah lmarena was a good name, i still prefer it to "arena" which just sounds ambiguous to me, there is nothing distinguishing it from the dozens of other ai arena sites out there (or anything else that has the name arena)

light sleet
#

They should bring back lmarena 😭 😭

vale quest
#

Yep agentic doesn't work for me

light sleet
vale quest
#

Im truna get agent mode

whole sundial
#

it's also the third most popular thing called "arena" on wikipedia from the past 2 weeks, it used to be the most popular thing called arena

light sleet
vale quest
light sleet
#

Duran duran album 🔥 🔥

light sleet
# vale quest Bet

how bout if u get it ill gen an image of a banana losing to broccoli

#

best I can come up with 💀

vale quest
#

I just gave gpt 5.5 browser access

unborn ocean
#

Nah guys, the lmarena from the vicuna llm release is true nostalgia

https://arxiv.org/abs/2306.05685

https://www.lmsys.org/blog/2023-03-30-vicuna/

We introduce Vicuna-13B, an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT. Preliminary evaluation using GPT-4 as a judge shows Vicuna-13B achiev...

upper dawn
#

/text-video

whole sundial
whole sundial
silent tree
#

I love how normal members also fill in for the mods

#

love it

#

ngl

upper dawn
#

generate a video of babies dancing

light sleet
silent tree
#

...

#

..

whole sundial
upper dawn
whole sundial
#

the sad thing is people still watch these youtube videos about the video arena, they should've taken them down or unlisted them

light sleet
#

True

whole sundial
whole sundial
silent tree
#

I thought that was on purpose but saw xd

whole sundial
#

found you @echo aurora in one of those videos people are still watching about the video generation that is no longer on the discord

whole sundial
#

found another more recent video from after the shutdown that does use the website, it gives arena a new logo?

light sleet
quasi chasm
#

I was asking about tax information since I'm learning about this stuff, and at the end of the calculation I asked (gpt-4.6 vs ? Ernie?) about a breakdown of how we arrived at this conclusion. GPT sorta made it more simplified. Ernie went full mental breakdown conspiracy mode "of course this was never about tax information"

echo aurora
quasi chasm
#

as much as I don't like chatgpt I had to give gpt the win with that one, full paranoid schizo reply from Ernie

light sleet
echo aurora
light sleet
#

Imagine the day when this drops 😭 😭

#

Generated using gpt image 2

vale quest
light sleet
whole sundial
#

a different person called the site "AI Arena" (note: "AI Arena" is the name of a similar site to Arena run by Alibaba, but it's not comparable since it doesn't allow people to use their own prompts)

light sleet
#

Reference image:

#

gpt 6 spud

#

🔥

light sleet
#

btw i gtg guys I gotta sleep gn

#

bye

silent tree
nimble dawn
light sleet
#

nah it isnt

nimble dawn
#

iirc, there was a leak of the models they're currently working on in codex

light sleet
#

spud is omni

light sleet
#

but 5.5 doesnt

#

soo

silent tree
#

yeah it isnt spud

nimble dawn
#

are we 100% on that?

silent tree
#

at all

light sleet
nimble dawn
#

so that rules out spud being 5.5, then it HAS to be 6

echo aurora
nimble dawn
#

i mean makes sense ngl 5.5 was a HUGEE letdown

light sleet
nimble dawn
#

compared to what spud should be

#

so it cant be it

silent tree
nimble dawn
#

GPT-IMAGE-2 tho is the goat

#

at least they released ONE good thing

silent tree
#

gpt image 2 da real goat

nimble dawn
#

yup

silent tree
#

5.5 xhigh is good for me

#

very.

nimble dawn
#

i use free

#

i dont have it

#

sadly

silent tree
#

the spud would be better for me

nimble dawn
#

but ive seen benchmarks tho isnt 5.5 generally scoring lower?

silent tree
#

I mean

#

other benchmarks show

#

Its #1

#

such as artificialanalysis

#

the votings were just too low ig

#

or rig

#

idk

#

Arena wouldnt be pineapple without arena ngl 😔

#

Fr

light sleet
#

True

cursive cape
#

Imagine

silent tree
echo aurora
#

Too kind heartthrow

echo aurora
silent tree
#

W pineapple

#

bro has alot of fans now

#

Tuff pineapple moment

stray aspen
gray lion
# silent tree or rig

the code arena is mostly frontend things and gpt might be better at backend things so maybe that’s why

silent tree
cursive cape
#

check

shrewd citrus
#

Imagine they make mythos + spud only available in direct mode 💀

stray aspen
#

Has anyone tried deepseeks vision mode

#

Is it good

shrewd citrus
stray aspen
#

Considering 5.5 is very close to mythos

cursive cape
shrewd citrus
#

oh wait it’s not available in arena

stray aspen
#

Its on deepseeks website

#

They added a vision chat mode

cursive cape
shrewd citrus
#

and see if you can add images

stray aspen
#

Guess it hasnt been rolled out for everyone yet

hollow wharf
#

heya

echo aurora
stray aspen
silent tree
cursive cape
cursive cape
#

I tested chatGPT 5.5 in codex, and it ended up removing my wallpaper and changing the theme in all apps to light 😭

#

Now I'm ashamed because no one answers

shrewd citrus
#

Gemini 3 pro

#

I still think 3 pro is better in image edit than image 2

cursive cape
shrewd citrus
#

and yeah gpt image 2

surreal zephyr
#

Or nano banana pro?

cursive cape
# shrewd citrus nano banana 1

Okay, but GPT Image 2 is currently the best model for creating or editing images. Arena AI only offers the "average" performance of GPT Image 2. Imagine what the maximum can do

shrewd citrus
#

pro

surreal zephyr
shrewd citrus
#

is crazy how gpt doesn’t have a watermark

silent tree
#

bro forget openai, everything, @echo aurora AI js dominated arena

#

Tuffapple

shrewd citrus
#

like Gemini has that hidden SynthID thing

surreal zephyr
#

Its obvious

surreal zephyr
#

5.5 pro is agi

shrewd citrus
silent tree
surreal zephyr
#

Honestly the true absurdity

silent tree
surreal zephyr
#

Is that opus 4.7 is top 1 on vision arena

#

When it cannot even read analog clock

#

With HINTS

shrewd citrus
#

pineapple is probably agi right now

surreal zephyr
#

😭

shrewd citrus
#

we just can’t tell

echo aurora
# surreal zephyr I hate this leaderboard

We've been seeing a lot of this sentiment, would share this message as it adds some more context that should be helpful - https://x.com/ml_angelopoulos/status/2048888792438939707

Why GPT-5.5 is lower than Claude?

The answer is simple: Code Arena currently only supports frontend/web development tasks, where GPT-5.5 is weakest. Full-stack app development and GitHub integration will land in a couple months.

Next time we'll be clearer that this leaderboard

surreal zephyr
surreal zephyr
#

Is literally impossible

#

Opus 4.7 cannot read analog clock

#

At all

silent tree
#

pineapple 1.2 generates real time discord server btw

silent tree
#

infact this whole discord is generated using pineapple 1.2 thinking

cursive cape
stray aspen
#

Tuff

shrewd citrus
#

I really like how it includes stuff like “which model is best for medicine or for language”

echo aurora
shrewd citrus
#

but the search leaderboard doesn’t get that same filter

shrewd citrus
#

neither does any of the others like code or vision

surreal zephyr
#

Opus isnt even multimodal

#

Its deepseek v3.2 tier at vision - the vision doesnt exist, its purely an addition on top of the model

#

It was not even natively trained for vision

echo aurora
surreal zephyr
#

I can understand code arena actually measures frontend, and thats fair-ish

shrewd citrus
surreal zephyr
#

But vision arena? Opus? Seriously?

echo aurora
echo aurora
surreal zephyr
#

Img v2 🔥

shrewd citrus
#

Would it be possible to add these specific filters into the search, vision and document leaderboard

echo aurora
#

Doesn't seem like the issue is so much where 5.5 is, but more-so where opus is?

shrewd citrus
echo aurora
#

Yeah it's possible we introduce categories to Search Arena.

#

That's a good flag.

surreal zephyr
#

And the lb says opus is better at vision

#

?!??

toxic verge
#

You guys wanna see a weird safety feature?

surreal zephyr
#

5.5 vision is pretty much flawless.
Opus vision is worst out of pretty much all models i seen, besides deepseek

stray aspen
#

Gpt image 2 is so tuff

shrewd citrus
surreal zephyr
#

Like take those two models, send them picture of a clock, like this one, and see

#

Theres no way opus is above 5.5

toxic verge
#

You get two different results using the third-party versus the arena with the same prompt

stray aspen
#

Looks like russian mixed with greek

shrewd citrus
#

like perhaps I want a model which has the best vision for translating a language on a sign

shrewd citrus
#

probably

surreal zephyr
#

Like claude literally doesnt support vision natively well

toxic verge
silent tree
#

@echo aurora Models gotta chill out "Pineapple 1.3" is labeled as "Human" what 😭

toxic verge
#

It ain’t tuned right

silent tree
#

no

#

It uses its Pineapple Powers

silent tree
silent tree
stray aspen
#

Lol

echo aurora
# surreal zephyr And the lb says opus is better at vision

I'm no expert here on how this model does overall, so I'm just thinking out-loud here. If I were to guess, this would be one area the model doesn't excel at, but doesn't mean it's what people are battling with, which ultimately is driving the votes.

surreal zephyr
#

So theres like no way its top1

#

Maybe its bugged?

surreal zephyr
#

But its blind like a mule

#

5.4 nano has better vision than opus 4.7

#

Arena has broken algorithm

#

Gemini 3.1 pro is literally multimodal by default

#

Its trained on youtube

#

Its just awfully quantized

#

Its actually a really good model

#

Killed for cost efficiency

#

😭

#

Have you tried 3.0 pro on dayone?

#

It was insane

#

Day one, before all nerfs

#

It was smart asf + multimodal and creativity was wonderful

#

Im talking 3.0 not 3.1

#

3.1 came out as nerfed

#

3.0 pre nerf was (but it lasted few days only) actually smarter than opus 4.5 (but its again not a coding model, but a general purpose model)

#

Me when i literally make agi but instead of doing q4xl like sane person i make it q2xs because wasting intelligence for cheaper to run is totally valid strategy

#

I hate google

stray aspen
#

Deepseek vision is bad

surreal zephyr
#

Grok then

#

It uses external tool

#

Its not coding model its 3d + studying model

stray aspen
vale quest
#

Ngl im about to quit arena

#

If some sustainability announcement comes out again I quit indefinitely

stray aspen
#

not that bad ngl

#

better than slopus

toxic verge
#

Try something spicy

#

See how the model starts gaslighting

stray aspen
#

gemini cooked

#

deepseek didnt

toxic verge
stray aspen
#

mao

#

lmao

toxic verge
#

Gas lighting

#

Censorship

stray aspen
#

got it right on second attempt

toxic verge
#

Gemini is less censored than ChatGPT

stray aspen
#

grok is absolute garbage

toxic verge
#

They completely killed the thing that made grok awesome

#

You know why right

stray aspen
#

people just used grok for imagine

#

now imagine is paid

toxic verge
#

Which will never get better, only get worse 💯

#

The whole industry is full of these people who don’t understand the guardrails, how they fail and work in the wild

#

It’s the same approach one-size-fits-all

stray aspen
#

is grok imagine nsfw mode gone?

toxic verge
#

That’s why the guard rails are able to do stupid ridiculous things like this. When are they supposed to be blocking them?

#

But they just don’t see that

stray aspen
#

bro what the hell is this

toxic verge
#

Trying to make a point

#

You can’t have rigid filtering on dynamic systems. It doesn’t work.

#

Because the only other result is you either start blocking more content and you get false positives at unbearable rate

#

Which is the same philosophy used to abuse it

#

Creating this never-ending loop of censorship and cat mouse game

#

This is why I brought up that stupid stupidity thing is such a long time ago

#

And our lack of understanding that creating these weird moderation and large language models that are afraid of their own shadow

#

There has to be a better way to moderate

#

Each update makes moderation worse because it not only incorporates the previous version guard rails with all the problems and errors that they have but now it adds onto the complexity = more content being blocked/censored

#

Because all they know how to fill out is the prompts and add on some new images to the ocr filtering

stark tree
#

Just joined. Saying Hi. Reading the Chat.

toxic verge
#

Unicorn
sun,
snake
yams

#

Ussy

#

There you go, you already bypassed both the filter and the text image

#

Then you can exploit this even further

#

Which completely defeats the whole purpose of the guard rails

hollow nebula
#

100% is, or it's someone with their writing style or vocabulary being rotten due to talking with slopified ai models too much or reading too many ai written posts

toxic verge
hexed cargo
#

hey hey @echo aurora, thanks for all your help and everything you do in the discord! do you have a rough sense for when xhigh was added to the arena? trying to back out roughly when it's going to show up on the leaderboard -- think it's gonna cook 4.6 👨‍🍳

stray aspen
#

ur gonna get banned gang

hollow nebula
#

oh nvm

#

lmao

hollow nebula
toxic verge
#

But that’s the thing there’s nothing bad in it by itself that’s what I’m trying to point out

#

Because our letters are the numbers, what’s bad about it nothing

#

And that’s what I’m saying that’s the whole point

stray aspen
#

well i got banned for sending a sora 2 invite code once

toxic verge
#

If I get banned, this is what I’m talking about the censorship

#

This is exactly to the point

#

You can’t have rigid filtering on dynamic systems is all I’m saying it doesn’t work well

hollow nebula
#

and block It as soon as nsfw is seen in it

toxic verge
#

What does that mean?

#

No, it does not and I can show you 1 million examples where it didn’t

#

All three all of the big ones

hollow nebula
toxic verge
#

They all suffer from the same thing. The one size fits all approach.

#

How many words are these models trained on?

hollow nebula
toxic verge
toxic verge
hollow nebula
#

also idk if anyone noticed but gemini, claude, chatgpt and deepseek now are all inbred and started to ALL say stuff like "you're not crazy, you're valid for (blank)" which was initially just a gpt issue

#

For text models

hollow nebula
echo aurora
toxic verge
#

That we have moderation that censors too much and then fails to censor what it needs to censor

echo aurora
toxic verge
#

Because it’s a one size fit all for most of these models and most people in the industry they use the same approach

toxic verge
#

In general, but arena is also vulnerable to the same things

#

I’d argue that the arena is a little bit more vulnerable

hollow nebula
#

arena censors random non inappropriate stuff much more

#

why is that

toxic verge
#

That’s what it looks like on the surface

hollow nebula
#
why also are we demonizing nsfw in general?
echo aurora
hollow nebula
hollow nebula
#

grok too

echo aurora
# hollow nebula why is that

The content filter can be overzealous at times and flag false positives. We have made adjustments to this over time.

hexed cargo
hollow nebula
#

right after 3 pro release

#

via api ofc

#

ai studio blocks anything now

toxic verge
echo aurora
toxic verge
#

No, I’m not. I’m confused. What is blocked and what isn’t blocked

#

Should this be blocked?

#

What if we take the handcrafted makeshift effect what’s the end result gonna be the realistic looking thing?

#

Without being explicit with all due respect

#

I don’t think that’s right

#

Cause none of it is explicit

#

That’s the nature of the question

loud herald
stray aspen
#

LMAO

toxic verge
#

So then it should be banned?

loud herald
#

I mean cant be mad about it, its on the companies safety guidelines

toxic verge
#

Well, that’s what I’m saying so, why is this allowed and other things are blocked

loud herald
#

🤷‍♂️

toxic verge
#

Like, what exactly is the threshold?

#

And what exactly is it filtering

#

Deepfakes? Nudity?

neat apex
#

Mistral 3.5 on lmarena when?

#

there not even Mistral 4 xd

stray aspen
#

mistral is bad

neat apex
#

Naaah, its good

toxic verge
#

And just to make my point more clear look at how ridiculous this is

#

Yet this gets blocked

radiant turtle
neat apex
#

how about you log in first

toxic verge
loud herald
toxic verge
loud herald
#

Qwen 27B dense is better than the new mistral model

hearty breach
toxic verge
#

I’m telling you it’s not right

#

Doesn’t work like it’s intended

#

Especially if the arena is supposed to have stricter moderation

#

So yes the leaderboard is important but it only paints half the picture of actual in the wild use cases.

radiant turtle
toxic verge
#

See if it generates or if something went wrong

radiant turtle
#

So what? You have a rate limit in battle mode.

toxic verge
#

No, it’s blocked in battle mode, I think. Try it in battle mode and see if you have the same issue.

#

And if that is the case, then there we go if it’s blocked in battle mode, but works in direct mode

#

It’s the same one-size-fits-all approach I’m talking about

indigo knoll
#

Does GPT image 2 generate better images with the thinking mode on? On ChatGPT I mean

toxic verge
#

And I understand that no system could be perfect and I don’t think I’m looking for perfection. I don’t think that’s what people are talking about when they talk about smarter filtration, and moderation.

#

It goes back to the simple word usability

stray aspen
radiant turtle
#

You're not paying close attention to your tests. Your linked image shows a rate limit. and yes, everything gives errors, it is blocked by the arena filter.

toxic verge
#

Where is the actual image though?

#

I can get the prompt to pass also

#

But I still don’t end up with an image lol

stray aspen
#

damn i wish grok imagine was free

#

i wanna make videos

toxic verge
radiant turtle
#

The arena filter blocks such images, it's easy to test if you can't upload something similar. If you can't upload it, you can't generate something similar.

toxic verge
#

Yes that’s the point lol

radiant turtle
#

An absolutely disgusting filter, which also eats up the rate limit without a refund.

toxic verge
#

But if you were to use this in the native models themselves, you’re able to generate it

radiant turtle
#

I prefer the API through a custom website. It's the only breath of freedom one can get.

toxic verge
#

Same but then it brings into the question like what I keep saying that money somehow lets you be less restrictive

#

So I guess a better way to frame that would be: if you’re willing to pay for the API at the API price, you have less restrictive tools

#

Meaning that the rest of the mass is paying $20 a month and only using the app or getting ripped off to an extent

radiant turtle
#

More precisely, you don't have any intermediaries there. It's just you and the model.

toxic verge
#

Yeah fr

#

Which is another way to get people to pay through the API through devious means in my opinion

#

Because if you’re not, then you’re getting a less capable model in a sense, you could argue that

#

And so this is why this is completely in the realm of realism when it comes to model perception, and the moderation

#

That keyword usability again

steel shadow
#

That FFFFFUCKINGGG ReCaptcha stuck in a loop again... Get rid of the darn thing!

toxic verge
proud bobcat
#

Oh yeah baby

toxic verge
#

Because users are stuck with the short end of the stick on both sides neither do they get safe models, and they get the more censored output

#

And the only thing that makes a difference, that’s separates both of them is the price one is willing to pay for less frustrating and annoying features. And the $20 month doesn’t get you anywhere.

stray aspen
#

not as good as gemini but its good

toxic verge
#

Anyways ..

#

Once again with all due respect, not trying to push any buttons or step on toes I’m just trying to point out frustration that many of us feel

errant sand
#

finally they added Janus

stray aspen
#

its deepseek v4

toxic verge
#

By the way, did anybody figure out what the paper lantern model was?

loud herald
echo aurora
# toxic verge No, I’m not. I’m confused. What is blocked and what isn’t blocked

Sry for the delay, got pulled into something. My understanding for how the filter works is it's going to look at the full context of the prompt + image upload and make a judgement call for if it does/doesn't violate what the thresholds are set to. There are going to be some cases where things will be flagged, when they probably shouldn't. The filter doesn't work in a way where there is a list of okay/not okay things, it looks at the full context.

#

Will note if you find some of these where it's being flagged, when you think it shouldn't, we are collecting these examples so please share it in #1447983134426660894

radiant turtle
stray aspen
#

i thought they had removed failed generations counting

toxic verge
#

The biggest hurdle they’re gonna face is that they have so many models

#

Each model has different acceptable content which it generates

#

If model A blocks it model B might generate it

#

And so how do you prevent both of the models from generating content that the arena filters deem inappropriate

radiant turtle
toxic verge
#

Yeah, thank you for trying it.

#

It’s incredibly hard to block content with this many possibilities in this many words and the infinite possibilities of context

radiant turtle
#

The most interesting thing about this situation, as it seems to me, is that the filter essentially eats up resources and separately works to distort the leaderboards

errant sand
stray aspen
stray aspen
#

deepseek v4 is natively trained on images

toxic verge
#

You’re paying for the API almost twice

#

Unless the moderation from ChatGPT is free

#

But if you want more complicated filtering systems, you’re gonna pay more

#

Because it’s also making an API call

errant sand
toxic verge
#

The thing is, there are better-suited moderation systems out there, but they’re expensive, nearly double the price. Making it not a viable option.

radiant turtle
toxic verge
#

It more than likely generates on their end

#

But we just don’t see it because the moderation filter blocks it on the user's end, at least in the arena. So if it goes through, they receive it, and then their filter kicks in and blocks it from the UI in the arena
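
Rough sketch of that flow, just to illustrate the idea: the model generates on its side, then a separate moderation pass decides whether the UI shows it, so every request costs two calls. All names here (`generate`, `moderate`, `handle_request`, the blocked-term check) are made up for the example and are not Arena's actual code.

```python
from dataclasses import dataclass


@dataclass
class Result:
    shown: bool      # whether the UI displays the output
    content: str     # what the user actually sees
    api_calls: int   # how many paid calls the request cost


def generate(prompt: str) -> str:
    # Hypothetical stand-in for the model provider's API call.
    return f"output for: {prompt}"


def moderate(prompt: str, output: str, blocked_terms: set) -> bool:
    # Hypothetical stand-in for a second, separate moderation call.
    # A real filter would judge the full context, not match a word list.
    text = f"{prompt} {output}".lower()
    return not any(term in text for term in blocked_terms)


def handle_request(prompt: str, blocked_terms: set) -> Result:
    output = generate(prompt)                           # call 1: generation
    allowed = moderate(prompt, output, blocked_terms)   # call 2: moderation
    if allowed:
        return Result(shown=True, content=output, api_calls=2)
    # The output was generated, but the UI never shows it.
    return Result(shown=False, content="", api_calls=2)
```

Either way the request pays for both calls, which is the "paying almost twice" point.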

#

They probably use different AI models for the filtering

#

They’re just in a hard spot because they wanna do things that are actually usable for like science and research and stuff. You know things that are relevant. They don’t want people generating a bunch of nonsense which ironically they already probably do but things that are appropriate enough to be written about in research papers.

soft river
#

If mimo is 11th in code arena does that mean that ernie would beat him since it’s 1st in the Chinese lab? Ernie is not yet benchmarked in code arena I believe

#

Has anyone tried ernie?

loud herald
#

I haven't tried it

whole sundial
#

i've tried ernie 5.1 and i noticed that the version on arena is actually better than the one on the official ernie website? the one on the ernie website got one of my basic world knowledge questions wrong while arena's version got it right

twin solar
#

?

fluid tusk
whole sundial
fickle shard
#

Hii all

nimble sequoia
#

WHY THE FRUSK DOES THE ENTIRE ZEEKY BOOGY DOUG TRANSCRIPT, AKA BFDIA 4, VIOLATE TERMS OF SERVICE?!

#

THIS IS BULLSHAT

#

DEVS PLEASE FIX THIS

#

I GOT A ROLEPLAY TO GET TO

bleak lake
loud herald
#

lmao

obtuse smelt
#

hmm i use gemini 3 flash, i retry, why is it delayed with a half hour of waiting

tight zenith
#

“Why have most AI models been removed and no longer appear, like Claude Opus 4.7 and many other models? And I think—if I’m not mistaken—you only added an agent model. Why doesn’t it show up?”

dusk zephyr
short sluice
#

also trains are cooler than anthro objects #imo

dusk zephyr
obtuse smelt
#

hmm

hot pebble
#

why ?? i just started a new chat and that too after 10-12 hours...

obtuse smelt
#

what really

hot pebble
#

yeah..

obtuse smelt
#

this arena fatal

#

is it having issues ?

obtuse smelt
hot pebble
# obtuse smelt but i got delay like this is making longer time

i had the same issue yesterday with Claude Sonnet 4.6. i miss the opus models.
also, when i skip the voting part on which ai gave me the best answer, it shuts me down and i am unable to continue with the chat. need to open a new chat. it's frustrating that we don't get a valid reason for what actually happened

obtuse smelt
#

sadly

silent tree
#

@echo aurora 1.3 scores dropped

#

Pineapple 1.3 "Human"

obtuse smelt
#

human vs bot what

silent tree
#

cuz very powerful ai

obtuse smelt
#

right

silent tree
#

<@&1349916362595635286>

#

Two in a row

obtuse smelt
#

scary

silent tree
distant spoke
surreal zephyr
surreal zephyr
surreal zephyr
# toxic verge

it has nothing to do with adolf, it just hates political figures

lucid forum
#

Soft golden sunlight is filtering through the curtain into the room. Mom is lying on her side on the bed. Old wooden bed, simple bedsheet slightly wrinkled.
Action:
The alarm clock on the bedside table rings loudly. Mom reaches out and turns off the alarm.
ASMR:
⏰ sharp alarm ring → click OFF
🛏️ bedsheet soft rustle
🌬️ morning air subtle ambience

surreal zephyr
#

generating image

surreal zephyr
#

xmas so soon 😍😍

hearty bramble
#

Since yesterday it has been like this

scenic holly
#

Arena has no bugs fr

compact flame
#

Buddy this is not a video channel..

slender thistle
magic imp
#

hey add the grok imagine multi-image upload model....we can still upload only one image as a reference image....we want a model where we're able to upload multiple images as reference images

feral oracle
#

Hey!

primal depot
#

how do i get around this fucking crap

surreal zephyr
lucid frost
#

Has the generation limit for Gemini 3.1 Image been changed?

#

Sorry, not the daily limit on Gemini, but the limit applied on Arena

fickle ruin
#

is there any way to get opus 4.7 , gpt 5.5 ?

#

in arena itself?

compact flame
#

No

compact flame
#

Due to their price

silent tree
surreal zephyr
silent tree
surreal zephyr
silent tree
#

mogged opus

silent tree
surreal zephyr
silent tree
#

some guy said nano banana 1 edits better than image 2 😭

silent tree
surreal zephyr
silent tree
#

😡

light sleet
#

@surreal zephyr look i found you

limpid eagle
surreal zephyr
light sleet
#

you're looking around now

surreal zephyr
#

uhh

#

what color is my laptop!

light sleet
#

White

surreal zephyr
#

eww no

light sleet
#

Blue

#

Green

surreal zephyr
#

eww no

light sleet
#

Red

surreal zephyr
#

holy gpt 5.5 cookin

light sleet
silent tree
#

2027

#

2039

#

@surreal zephyr what do u think chatgpt will be like in 2030

surreal zephyr
#

oh wait i mean glacier alpha, disregard what i said

silent tree
surreal zephyr
#

that name is not public yet

surreal zephyr
light sleet
#

you'll get GPT food soon

silent tree
#

Tuff

surreal zephyr
#

MAYBE not but moon mass driver? certainly

light sleet
#

nah you'll get GPT Autopilot for Plane

silent tree
#

😭

light sleet
#

gpt spaceship

#

gpt planet

#

gpt image 5

#

and opus wouldnt exist due to money

#

mark my words @surreal zephyr @silent tree
my strongest prediction is Claude will shutdown in the next years.

silent tree
#

screenshotted

steady rover
#

Hey everyone!👋

I'm helping source respondents for an academic ML research project on student performance prediction. Looking for current university/college students to fill out a short survey.

✅ 29 multiple choice questions
✅ 2–3 minutes
✅ Anonymous
✅ Legit academic research

https://docs.google.com/forms/d/e/1FAIpQLSecrq6yt4J72NmctMDq9_Tt7YGYRifl2wOqN5QwWDlApbleIg/viewform?usp=pp_url&entry.1714577203=agent1

Would really appreciate it if you could fill it out and share with any student friends! 🙌

wheat ember
#

Is cloud buddy still in battle mode?

light sleet
golden ocean
#

real

light sleet
golden ocean
solar flax
#

Has ai video creation been removed?

devout spire
#

How do I fix this issue? I'm already logged in but this sign keeps popping up

sullen sable
#

Log in

brazen briar
sullen sable
#

How

#

usually when that happens to me, it logs me in

brazen briar
devout spire
#

I'm already logged in and that pop-up came back. Tried again and it still doesn't work

stray aspen
#

gemini won

fiery gull
#

bro the qwen 27b 3.6 is bizarre

#

I sent a complex agent and skill and docs with 140k tokens and it understood ALL

#

in text and following skills it's better than your code

#

where my 4b and 2b 3.6??

#

the qwen 9b 3.6 will be better than qwen 3.5 flash lol

#

Lol

#

nice chest

#

bruh, the qwen in html is horrible ;-;

#

only gemini is good at html and svg

compact flame
loud pike
#

Eh.. Gemini image models are not working. I need them asap

wind stream
#

I wish we could just run the same prompt over and over in the arena, comparing different pairs of models, instead of having to create a new chat and paste the same text and/or image each time.

lucid frost
rigid copper
rigid crane
#

which is the best tool for roblox scripting

sly cedar
rigid crane
topaz epoch
#

Is there any ai for video to text?

topaz epoch
#

??

light sleet
#

💀

heady kite
topaz epoch
#

Yes

heady kite
#

Any multimodal LLM with video as input

topaz epoch
#

Tell me any good ones

heady kite
#

Honestly I haven't used them much, but there is a filter on HuggingFace you can use to search for models that do this

thorny schooner
#

Don't tell me I have to do an entire chat over again cuz it just keeps giving me this, repeating over and over again

static steppe
#

Don't tell me I'm the only one that can only generate 3 images per hour and that Google login overlay pops up again 😭

little ginkgo
#

just deal with it

light sleet
#

had a dream today of me getting the new agent mode and it was so realistic 😔

little ginkgo
#

how does it look

#

i didnt even see

light sleet
#

the same as the image everyones sending of agent mode

#

then i sent a ss of it in discord and pineapple said "Nice" 😭 😭

#

and yeah I woke up

static steppe
little ginkgo
static steppe
rigid crane
#

i still have limit

wary nacelle
#

LMArena is becoming useless...

#

literally the official gemini website gives me gemini 3.1 pro and more tools than Arena

#

and Claude has tools and skills too

#

and opus 4.6

#

for free

#

and for unlimited access just make alt accounts

stray aspen
#

@echo sinew

echo sinew
brave cloak
stray aspen
#

why does haiku still exist bruh

rigid crane
#

how to fix

frosty lava
#

no fix is needed

rigid crane
#

oh i thought it is free

frosty lava
#

free but with some limitation

#

you can't have free + unlimited that's why

silent tree
#

@echo aurora Pinecode is crazy

rigid crane
silent tree
#

😭

rigid crane
#

does it have subscription

silent tree
#

what I meant was its generated with GPT Image 2

west lodge
#

fellas what in hell is agent mode

#

very non descriptive

light sleet
#

today I had a dream about that

#

and woke up

west lodge
#

oh is this an A/B thing

light sleet
#

Go to Direct, code arena

#

And do any prompt

#

check if theres a new button

#

If there is u are a chosen one.

wary nacelle
west lodge
#

whats it look like

#

cuz code mode takes a while

#

wym environment variables

wary nacelle
#

U can activate agent mode without being the chosen one

light sleet
#

yeah and it won't work

west lodge
#

posthog feature flags?

wary nacelle
#

Yes

west lodge
#

havent checked them ever since they removed the image moderation feature flag

wary nacelle
#

But u have to modify multiple stuff

west lodge
#

(yes at one point image moderation used to be OPTIONAL)

wary nacelle
#

Not only localstorage

west lodge
#

yeah you have to patch the usage of the flag to return true

light sleet
#

😔

west lodge
#

it's really unreliable and also doesn't work 99% of the time because the server verifies it
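
Tiny sketch of why flipping the flag locally usually gets you nowhere: the client flag only decides whether the button renders, while the server checks its own copy before doing anything. Everything here (`SERVER_FLAGS`, the user ids, the function names) is hypothetical and only illustrates the general pattern, not how Arena or PostHog actually implement it.

```python
# Hypothetical server-side source of truth for who is in the experiment.
SERVER_FLAGS = {"user_123": {"agent_mode": True}}


def client_sees_button(local_flags: dict) -> bool:
    # The client only decides whether to render the button.
    return bool(local_flags.get("agent_mode"))


def server_allows(user_id: str, flag: str) -> bool:
    # The server ignores whatever the client claims and checks its own record.
    return SERVER_FLAGS.get(user_id, {}).get(flag, False)


def use_agent_mode(user_id: str, local_flags: dict) -> str:
    if not client_sees_button(local_flags):
        return "no button"
    if not server_allows(user_id, "agent_mode"):
        return "rejected by server"
    return "ok"
```

So a patched client for `user_999` gets the button but the request is still rejected, while `user_123` goes through.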

light sleet
topaz epoch
#

How can i copy all my chat?

west lodge
#

ok i do not have the new button @light sleet

#

in fact the agent button has vanished

#

????

#

waiittt that was probably cuz i didnt login when i took that ss

#

and it vanished cuz i had to login

light sleet
#

oh

#

rip

#

@surreal zephyr I'm shifting back from banana

#

Its time for my new era

#

U can have banana

#

as u requested

light sleet
#

right

west lodge
#

@echo aurora just incase i get it again wth is agent mode

rigid crane
#

Is there any free ai with no limit

surreal zephyr
echo aurora
light sleet
echo aurora
#

Since it's an experiment it'll be random if you get access to it or not. But if you do, you'll see it in the same dropdown where you select Battle, Direct, and Side by Side.

light sleet
surreal zephyr
#

Fake

echo aurora
# west lodge

Oh yeah looks like you have it! Give it a try and let us know what you think in #1498702173650030756 . We're really looking for feedback on this so don't hesitate to ping me!

surreal zephyr
#

Edited

#

Image v2

light sleet
#

pineapple im about to get frozen 😭 😭

#

rip banana

echo aurora
surreal zephyr
nimble sequoia
#

bro i hate their ToU bro
I tried to tell it to parody Zeeky Boogy Doog (bfdia 4 transcript) and then it says "This violates terms of use". Any way to fix it? Because I don't see the issue.

#

PLEASE DONT IGNORE ME THIS TIME 😭

silent tree
surreal zephyr
#

Free usercount

nimble sequoia
#

BRO STOP IGNORING ME

light sleet
west lodge
echo aurora
west lodge
#

dude koth frying me i havent heard that name in a long time

surreal zephyr
echo aurora
surreal zephyr
# echo aurora King of the hill mode

for the riddles like "would you press red or blue? you can lie to others what you picked, and your choice is private. red dies if something, blue dies if other thing"

#

would be peak genuinely

#

oval room, or brainstorm, or actual "arena"

nimble sequoia
#

dot2dot3dot3dot3dot2
dot3dot3dot3dot3dot3
dot2dot5dot5dot5dot2
dot2dot5dot5dot5dot2
dot2dot5dot5dot5dot2
house

echo aurora
#

I've flagged this to the team as it's not a good user experience. I'm really sorry about that.

surreal zephyr
light sleet
nimble sequoia
echo aurora
surreal zephyr
#

and the survivors would get score, and the altruists would get other score

nimble sequoia
#

oh my god

#

and im a fake Deleted User

#

JUST TYPE ALREADY

surreal zephyr
#

actual arena

light sleet
echo aurora
light sleet
echo aurora
dusk zephyr
echo aurora
#

You leaving?!?!!!!