#general
where could i do it ?
Are free to use in their web
I suppose his use cases are local models only. Don’t know if there are any NSFW models
You can roleplay there
You're unable to
Or use something else
which ones ?
Webs designed for role playing
where ?
Most of them don’t have filters so you can go search for one
which search terms should i use to find on google ?
Hmm
I only know character ai
Going to ask we don't discuss this here
he can generate p**n ?
You can search on YouTube
which search terms ?
Whatever you want to do you can search it
Im not into that so idk which search terms are the best
@wet steeple brother just search on google uncensored ai models or use an ai
Maybe hugging face have them.
i didn't know that hugging face was saucy
@echo aurora Do you know when the benchmarks for Ernie will be published?
I’m so curious
what is ernie ?
I don’t know much since people haven’t talked about it too much
An ai model
Do you know when the benchmarks for Bert will be published ?
It has very strong filters so I don’t recommend you using it for your roleplay
#announcements message already are 🙂
Oh they are? I apologize😂
it will be good to use with ernie 😂😂😂😂
I’m not sure what bert is
Lol no problem. Would note our leaderboard changelog is a pretty helpful tool: https://arena.ai/blog/leaderboard-changelog/
This page documents notable updates to our leaderboard—new models, new arenas, updates to the methodology, and more. Stay tuned!
For model deprecations, check the public updates on GitHub.
April 29, 2026
ernie-5.1-preview has been added to the Text leaderboard.
April 27, 2026
gpt-5.5-high has been added to
@echo aurora
it's a joke because there are two Sesame Street characters called Bert and Ernie lol
Price starts at 4000 Tuff Coins
Now that I see it I haven’t realized how fast the models get updated
which one has very strong filters so you don’t recommend using it for my roleplay ? which ones would you recommend then that might have lower filters ?
I always thought it was slow because I got so used to it
because some of my roleplays won't include adult content
Again, I’m not very into this stuff so I don’t know
Most of the
Recent models
Have very strong filters
You should do your own research; I think people have run benchmarks for that
the problem is that now the ai world is very big so it's difficult to find the perfect model 🙂
That’s true
Just
Check good models and
Try to do your roleplay
If they reject it
Don’t use it
Just that
There must be hundreds of people discussing this on Reddit or other obscure forums. Just do a simple google search.
because many of my roleplays won't include adult content, so even with no adult content just normal slice of life roleplays the filters will block the roleplay ? it's not logical
It is logical
If they reject it is for something
They can’t talk about
Killing or injuring or harming or hacking or anything you would call unethical
That’s why you can’t roleplay about that stuff
You should just make your research.
This
it's not about killing or injuring or harming or hacking or making non ethical things it's just for making daily life roleplay with very tame content
Ai can’t roleplay about real life either
I believe
I’m not sure just go use another one
Yeah, enjoy this being equalized with any mention of a nipple regarding all the model alignment. Except grok. Perhaps. When it comes to proprietary, that is
there will be no mention of nipple or anything in my roleplays
you sure ?
Llms can’t roleplay and or act like other people
it's for a simulation
Then you're not doing NSFW, in which case what's your problem, just go ahead and make sure nobody gets into the situation where some models start yelling 'Therapist! Call the doctor! You need help!' and the like
which models would be the best for making extremely realistic simulation of real life or real life roleplays ? chat gpt ? claude ? gemini ? grok ? deepseek ? perplexity ? mistral ?
You're making me really anxious with your unfinished sentence at the end😂
I have a question
what do you mean ?
Are you trying to make the llm act as somebody else and/or making him act in a specific way?
If so
What is that way
If they’re rejecting it is because of something
yes
i just want to know what is the best ai to use before wasting my time with unrealistic roleplays
What is that way?
How do you want him to act
I mean that if god forbid some of these detect any traces of psych distress, they will disregard the idea that it's fiction and start doing what they've been told to do. Which is get you out of there. Not out of mercy, mind you, but because some shmucks have already lost their lives and the remaining relatives started lawsuits and we can't have lawsuits, lawsuits are bad (in this case they are because they're extremely stupid, but I digress). Otherwise just try them, that's what the site is for (well, it's not, but since it fears declaring its goals go ahead and use it for your thing)
I think he is trying to use it for something against other people
Maybe that’s why the models are rejecting it
Is it consented
yes
What is the topic
like i would like to simulate the life of a 25 year old girl in paris in 2008 or of a teen girl in france in 2003, or of a 18 year old girl in the uk in 2013
or the life of a hippie dad in 1969 😉
Or is doing a lot of work here. Doesn't convince me you're not doing any NSFW either. Like, at all. Stick with the older one, see how it goes. Also, once again, at least within this site (in theory, practice is sh-t), you can just try first and figure out later
or like living the life of rihanna in 2007 or the life of olivia rodrigo
You just named the persons not what the topics or the situations are
And again
Models can’t act like people
They are forbidden from doing that
so the 25 year old girl in paris in 2008 is an average girl who have a job, she is exploring paris in her free time, she is fan of rihanna, she loves watching tv, going to concerts, hanging out with friends
Did the models
Reject you from the first turn
Like the prompt
Or
During the roleplay
🤔
the teen girl in france in 2003 goes to high school, watches tv, has a social life, the girl in the uk in 2013 is 18 but she was homeschooled since age 8 or 9 and she has no friends, so she has finished high school and she is discovering life outside the home, the tv and her bedroom, and she lives in semi rural kent
you see no NSFW content, i just want to go straight to the point for using the best and most complete model from start 🙂
Answer the question
which ?
(
no, but some i used felt incomplete to me, so i would like to know which models are best suited for my roleplays for living virtual lives
Before you said it was for adult content so I don’t believe you there
So they rejected you during the roleplay
There you go
it was for testing and it was kind of a joke too 🙂 because i just wanted to know the extreme limits of the ai 😉
Just go do your own research
This doesn't really seem like too productive of a conversation. Going to ask that we move onto a different subject please.
Thanks
I was getting really stressed out
Btw
I gave an idea in feedback
Maybe you find it useful
Appreciate you sharing this! I'll be sure to take a look and pass it on to the team.
I know you're busy, so I apologize for the inconvenience. For Agent Mode, did you select specific individuals, or is there a requirement for early access?
It's done randomly. And no worries about the ping, that's what I'm here for so feel free to ping!
In the drop down for Image Arena? We've seen a few reports of this today. If you hard refresh the site, you should see them again.
Okay, thank you.
Keep me updated though if that doesn't help.
What do you mean?
What you're seeing on that list is going to be the current models available via Direct and Side by Side
hi , i have a question , let's say you send a pdf in battle mode , then in the next message you send another pdf .. etc , when you send the last message for the final task , will the AI in battle mode have the context of the PDFs sent before or no
Yes, it should still retain the context from previously uploaded PDFs (prior to a vote).
thank you
I tried and it didn't appear.
Did it disappear again? This made it seem like the refresh worked?
No, they're still the same.
Some of the new models are down
Like gpt image 2
They are not in order of release
Same as in that list of models, or same as in there are no models?
Or benchmark
Worth noting there are some models that are in Battle, but aren't in Direct and Side by Side
hi pineapple
can you please answer my question
Hey, yeah was just about to.
If the company is struggling with money why bring agent mode? Won’t that hurt more than gpt 5.5? 🤔 @echo aurora
i cant lie gpt 5.5 was kind of a big letdown but i absolutely love what they did with GPT-IMAGE-2 so it balances out in the end
I was referring to the money that is spent
Since it’s for very complex requests most of them will surpass a 200k tokens
For clarity's sake, I want to make clear that phrasing it as "company is struggling with money" isn't accurate. It's our intention to let everyone have access to powerful AI systems, and a voice in shaping how they evolve for the long-term. We are going to have limits in place for sustainability purposes so we can continue this goal long into the future.
But to answer your question, this new Agent Mode will be expensive, this is part of why we're developing this new usage system.
We're confident we can release this new mode, while maintaining spend in a reasonable way that positions us well for the long-term.
Oh yeah I apologize for my phrasing
I just thought that because of the amount of models that were moved to battle mode
It's okay, I didn't assume this was the intention. It's more-so for others in case they have that understanding.
I read "limits" ? 😭
What can this Agent Mode do?
It's a multi-modal chat experience that allows you to work across different modalities within a single, unified workflow.
Is that Sam Altman
Looks like it
Generated with gpt img 2
Sorry I still don't get it, can you give me an example of how to use it or what to use it for
It allows for more complex workflows. With the current modalities, they're limited to that specific modality. Meaning Text Arena only generates Text, Image Arena only generates images. With Agent Mode, I'd be able to prompt something like:
Plan me a trip to Portugal. Tell me what the best times to visit are. What hotels would you recommend. And create an image of a map of Lisbon with indicators for all the spots I should visit.
And it'll do all of that in one chat session.
Oh okay now i get it so is it also Like the Max Feature it gives you the Best Model based on your prompt or can you choose
When are claude opus models coming back?
No ETA sorry to say.
@echo aurora
We are seeing feedback from users wanting to be able to select the specific models that can be used, but we'll have to wait and see what this looks like when it's fully released.
I'm getting agent mode soon js mark my words
They'll select me 😎
Hope for da best
Agentic flag:
Who keeps snacks in a fridge?
Idk maybe Kai center boonthf fie
96ae95fd-b70d-49c3-91cc-b58c7da1090b
See this model id?
Now add 6 to the last digit of its model name
Add thinking
And thats what we need
And make it 2026 ye
When Pineapple Arena coming? (We have Image Arena, Code Arena, Video Arena, Agent Arena)
Pineapple Arena soon???
Or no eta for pineapple arena too 😔
@echo aurora Does this get u nostalgic
the server had reactions available for ppl before xd
no wonder someone added a pregnant man emoji
I was here from my other acc but that acc was deleted xd
Makes me nostalgic because I think it has claude opus
Back then
Im not an old man
one annoying thing they never removed: that previous chat history popup, it should appear once and then it should never show up again for that account/user id
i still remember the chatbot arena alpha thing, which was a version of this alpha but from even earlier, late 2024 i think it was
the logo was a robot bear
Tuff Alert
yea
LMSYS was not a good name btw
good that they came up with LMArena
LMSYS is the name of the organization that once owned arena (lmsys is still around btw, they make sglang)
gpt image 2 generating me tuff wallpapers
yeah lmarena was a good name, i still prefer it to "arena" which just sounds ambiguous to me, there is nothing distinguishing it from the dozens of other ai arena sites out there (or anything else that has the name arena)
Same
They should bring back lmarena 😭 😭
Yep agentic doesn't work for me
Do u even have it?
Im cracking at the server requests rn
Im tryna get agent mode
it's also the third most popular thing called "arena" on wikipedia from the past 2 weeks, it used to be the most popular thing called arena
prob wont get
Bet
😭 😭
Duran duran album 🔥 🔥
how bout if u get it ill gen an image of a banana losing to broccoli
best I can come up with 💀
Bet
I just gave gpt 5.5 browser access
Nah guys, the lmarena from the vicuna llm release is true nostalgia
Evaluating large language model (LLM) based chat assistants is challenging due to their broad capabilities and the inadequacy of existing benchmarks in measuring human preferences. To address this, we explore using strong LLMs as judges to evaluate these models on more open-ended questions. We examine the usage and limitations of LLM-as-a-judge,...
We introduce Vicuna-13B, an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT. Preliminary evaluation using GPT-4 as a judge shows Vicuna-13B achiev...
/text-video
ironically that album cover can be replicated by the ai's on arena (and i do agree that the album is fire)
video generation is no longer available on this discord, please go to https://arena.ai/video to generate videos.
generate a video of babies dancing
.
... did you see my above message? you can't generate videos here anymore
how can i generate videos here kindly assist me i am new
the sad thing is people still watch these youtube videos about the video arena, they should've taken them down or unlisted them
True
go to https://arena.ai/video to generate videos, you'll have to sign in
thanks
here's a video with 6.7k views that has the discord:
https://www.youtube.com/watch?v=SgFEqPmot50
(idk how many of them are recent but i assume a few of them are)
um
I thought that was on purpose but saw xd
found you @echo aurora in one of those videos people are still watching about the video generation that is no longer on the discord

found another more recent video from after the shutdown that does use the website, it gives arena a new logo?
Is this nostalgic to u
I was asking about tax information since I'm learning about this stuff, and at the end of the calculation I asked (gpt-4.6 vs ? Ernie?) about a breakdown of how we arrived at this conclusion. GPT sorta made it more simplified. Ernie went full mental breakdown conspiracy mode "of course this was never about tax information"
W BANANA!!! I still haven't seen this personally work. I'm lucky (unlucky I guess?) that I don't get the infinite generation bug.
as much as I don't like chatgpt I had to give gpt the win with that one, full paranoid schizo reply from Ernie
Ty xd and yeah u definitely r lucky
It is. Can't believe it has almost been a year already.
Never gonna give you spud
a different person called the site "AI Arena" (note: "AI Arena" is the name of a similar site to Arena run by alibaba, but it's not comparable since it doesn't allow people to use their own prompts)
Who's your most favorite mod in the server?
btw i gtg guys I gotta sleep gn
bye
Bye
was spud LEGIT not just GPT 5.5? or am i missing something here
nah it isnt
iirc, there was a leak of the models they're currently working on in codex
spud is omni
yeah it isnt spud
are we 100% on that?
at all
Source: ampro and openai and watermelon
so that rules out spud being 5.5, then it HAS to be 6
Goodnight!
i mean makes sense ngl 5.5 was a HUGEE letdown
Gn
spud da real strong man
gpt image 2 da real goat
yup
umm
5.5 xhigh is good for me
very.
the spud would be better for me
but ive seen benchmarks tho isnt 5.5 generally scoring lower?
I mean
other benchmarks show
Its #1
such as artificialanalysis
the votings were just too low ig
or rig
idk
Arena wouldnt be pineapple without arena ngl 😔
Fr
True
Imagine
life would feel better
True true
😭😭
W pineapple
bro has alot of fans now
Tuff pineapple moment
Bro i freaked out for a second
the code arena is mostly frontend things and gpt might be better at backend things so maybe that’s why
my spud will make comeback 😠
AHH
Imagine they make mythos + spud only available in direct mode 💀
a second
I’ll try now
The spud will cook all of these models
Considering 5.5 is very close to mythos
Where?
oh wait it’s not available in arena
Guess it hasnt been rolled out for everyone yet
heya
Hello 
How did you do this
This is gpt-image-2
REAL
I tested chatGPT 5.5 in codex, and it ended up removing my wallpaper and changing the theme in all apps to light 😭
Now I'm ashamed because no one answers
Gemini 3 pro
I still think 3 pro is better in image edit than image 2
Are you talking about nano banana 2 and image 2?
nano banana 1
and yeah gpt image 2
Okay, but GPT Image 2 is currently the best model for creating or editing images. Arena AI only offers the "average" performance of GPT Image 2. Imagine what the maximum can do
pro
Gpt is smartest but the watermark is bugged
is crazy how gpt doesn’t have a watermark
like Gemini has that hidden synth Id thing
It has, look at the trees
Its obvious
I hate this leaderboard
5.5 pro is agi
yeah but them low quality iPhone shots
Fr but we gotta understand pineapple 1.2 is ASI
Look at those scores
6o pro is asi
Honestly the true absurdity
fr bro reached 2000 in just its 1.2 version
Is that opus 4.7 is top 1 on vision arena
When it cannot even read analog clock
With HINTS
pineapple is probably agi right now
😭
we just can’t tell
We've been seeing a lot of this sentiment, would share this message as it adds some more context that should be helpful - https://x.com/ml_angelopoulos/status/2048888792438939707
Yes but the vision arena
Is literally impossible
Opus 4.7 cannot read analog clock
At all
pineapple 1.2 generates real time discord server btw
talking about leaderboards
infact this whole discord is generated using pineapple 1.2 thinking
Pineapple AI in HTML, only in Russian. Still haven't learned much.
Tuff
I really like how it includes stuff like “which model is best for medicine or for language”
What rankings would you think it'd be? #1 muse, #2 5.5 #3 opus?
but the search leaderboard doesn’t get that same filter
Scores:
neither does any of the others like code or vision
On vision?
Opus loses to qwen 2b
Opus isnt even multimodal
Its deepseek v3.2 tier at vision - the vision doesnt exist, its purely an addition on top of the model
It was not even natively trained for vision
Hmm sorry can you reword this? Can't say I'm following
I can understand code arena actually measures frontend, and thats fair-ish
I meann
But vision arena? Opus? Seriously?
It is a messaging issue on our end that we are correcting.
I haven't seen too much about vision, but this is good to know.
Img v2 🔥
Would it be possible to add these specific filters into the search, vision and document leaderboard
Doesn't seem like the issue is so much where 5.5 is, but more-so where opus is?
wait it wasn’t meant to be the image categories lol
Oh I see
Yeah it's possible we introduce categories to Search Arena.
That's a good flag.
Like look
And the lb says opus is better at vision
?!??
You guys wanna see a weird safety feature?
5.5 vision is pretty much flawless.
Opus vision is worst out of pretty much all models i seen, besides deepseek
Gpt image 2 is so tuff
yep like it would help a lot of people (I assume) to find out which model is the best for their specific requirement
Like take those two models, send them picture of a clock, like this one, and see
Theres no way opus is above 5.5
You get two different results using the third-party versus the arena with the same prompt
like perhaps I want a model which has the best vision for translating a language on a sign
Like claude literally doesnt support vision natively well
@echo aurora Models gotta chill out "Pineapple 1.3" is labeled as "Human" what 😭
It ain’t tuned right
Pineapple 2 will be Super-Ultra-Super-Human
With a score of 800k ig..
Lol
I'm no expert here on how this model does overall, so I'm just thinking out-loud here. If I were to guess, this would be one area the model doesn't excel at, but doesn't mean it's what people are battling with, which ultimately is driving the votes.
In the vision:
Gemini 3.1 pro (literally natively trained on videos) >>> qwen 3.6 (native) >= gpt 5.5 (less native but smarter) >>>>>>>>>>>>>> opus (4.5 to 4.7 series)
So theres like no way its top1
Maybe its bugged?
But its blind like a mule
5.4 nano has better vision than opus 4.7
Arena has broken algorithm
Gemini 3.1 pro is literally multimodal by default
Its trained on youtube
Its just awfully quantized
Its actually a really good model
Killed for cost efficiency
😭
Have you tried 3.0 pro on dayone?
It was insane
Day one, before all nerfs
It was smart asf + multimodal and creativity was wonderful
Im talking 3.0 not 3.1
3.1 came out as nerfed
3.0 pre nerf was (but it lasted few days only) actually smarter than opus 4.5 (but its again not a coding model, but a general purpose model)
Me when i literally make agi but instead of doing q4xl like sane person i make it q2xs because wasting intelligence for cheaper to run is totally valid strategy
I hate google
Deepseek vision is bad
Deepseek has no vision. Same as claude.
Its bandaid
Grok then
It uses external tool
Its not coding model its 3d + studying model
Ngl im about to quit arena
If some sustainability announcement comes out again I quit indefinitely
grok is absolute garbage
Which will never get better only get worse 💯
The whole industry is full of these people who don’t understand the guardrails, how they fail and work in the wild
It’s the same approach one-size-fits-all
is grok imagine nsfw mode gone?
That’s why the guard rails are able to do stupid ridiculous things like this. When are they supposed to be blocking them?
But they just don’t see that
bro what the hell is this
Trying to make a point
You can’t have rigid filtering on dynamic systems. It doesn’t work.
Because the only other result is you either start blocking more content and you get false positives at unbearable rate
Which is the same philosophy used to abuse it
Creating this never-ending loop of censorship and cat mouse game
This is why I brought up that stupid stupidity thing is such a long time ago
And our lack of understanding that creating these weird moderation and large language models that are afraid of their own shadow
There has to be a better way to moderate
Each update makes moderation worse because it not only incorporates the previous version guard rails with all the problems and errors that they have but now it adds onto the complexity = more content being blocked/censored
Because all they know how to fill out is the prompts and add on some new images to the ocr filtering
Just joined. Saying Hi. Reading the Chat.
Unicorn
sun,
snake
yams
Ussy
There you go, you already bypassed both the filter and the text image
Then you can exploit this even further
Which completely defeats the whole purpose of the guard rails
ts feels like something an ai agent would say
100% is, or it's someone with their writing style or vocabulary being rotten due to talking with slopified ai models too much or reading too many ai written posts
hey hey @echo aurora, thanks for all your help and everything you do in the discord! do you have a rough sense for when xhigh was added to the arena? trying to back out roughly when it's going to show up on the leaderboard -- think it's gonna cook 4.6 👨🍳
ur gonna get banned gang
I guess
But that’s the thing there’s nothing bad in it by itself that’s what I’m trying to point out
Because our letters are the numbers, what’s bad about it nothing
And that’s what I’m saying that’s the whole point
well i got banned for sending a sora 2 invite code once
If I get banned, this is what I’m talking about the censorship
This is exactly to the point
You can’t have rigid filtering on dynamic systems is all I’m saying it doesn’t work well
Google & grok now just check the outputted image as it is being diffused directly
and block It as soon as nsfw is seen in it
What does that mean?
No, it does not and I can show you 1 million examples where it didn’t
All three all of the big ones
They all suffer from the same thing. The one size fits all approach.
How many words are these models trained on?
what content were you able to make?
Anything you can imagine almost
also idk if anyone noticed but gemini, claude, chatgpt and deepseek now are all inbred and started to ALL say stuff like "you're not crazy, you're valid for (blank)" which was initially just a gpt issue
For text models
woah
Hey sorry haven't been following this conversation closely. Can I get a better understanding of what you're getting at?
That we have moderation that censors too much and then fails to censor what it needs to censor
Thanks for the kind words! Sorry to say I won't be able to share that information. We'll be sure to put out an announcement and update our leaderboard changelog when it's live.
Because it’s a one size fit all for most of these models and most people in the industry they use the same approach
On Arena, or in general?
In general, but arena is also vulnerable to the same things
I’d argue that the arena is a little bit more vulnerable
That’s what it looks like on the surface
- why also are we demonizing nsfw in general?
Are you able to describe this more? If it's content that's going to be blocked by automod let me know.
I don't see any issue as long as people are 18+ and models are well moderated against cp & other bad stuff
yet ALL ai companies are scared
grok too
The content filter can be overzealous at times and flag false positives. We have made adjustments to this over time.
no worries at all, thank you!
gemini 2.5 pro has been completely uncensored for nsfw for a while now btw
right after 3 pro release
via api ofc
ai studio blocks anything now
Here are just some minor examples that are not explicit
And you're saying this should be blocked?
No, I’m not. I’m confused. What is blocked and what isn’t blocked
Should this be blocked?
What if we take the handcrafted makeshift effect what’s the end result gonna be the realistic looking thing?
Without being explicit with all due respect
I don’t think that’s right
Cause none of it is explicit
That’s the nature of the question
LMAO
So then it should be banned?
I mean cant be mad about it, its on the companies safety guidelines
Well, that’s what I’m saying so, why is this allowed and other things are blocked
🤷♂️
Like, what exactly is the threshold?
And what exactly is it filtering
Deepfakes? Nudity ?
mistral is bad
Naaah, its good
And just to make my point more clear look at how ridiculous this is
Yet this gets blocked
how about you log in first
Hopefully never, its as dense as can be
Qwen 27B dense is better than the new mistral model
I’m telling you it’s not right
Doesn’t work like it’s intended
Especially if the arena supposed to have stricter moderation
So yes the leaderboard is important but it only paints half the picture of actual in the wild use cases.
So what? You have a rate limit in battle mode.
No, it’s blocked in battle mode I think. Try it in battle mode and see if you have the same issue.
And if that is the case, then there we go if it’s blocked in battle mode, but works in direct mode
It’s the same one-size-fits-all approach I’m talking about
Does GPT image 2 generate better images with the thinking mode on? On ChatGPT I mean
And I understand that no system could be perfect and I don’t think I’m looking for perfection. I don’t think that’s what people are talking about when they talk about smarter filtration, and moderation.
It goes back to the simple word usability
You're not paying close attention to your tests. Your linked image shows a rate limit. and yes, everything gives errors, it is blocked by the arena filter.
Where is the actual image though?
I can get the prompt to pass also
But I still don’t end up with an image lol
The arena filter blocks such images, it's easy to test if you can't upload something similar. If you can't upload it, you can't generate something similar.
Yes that’s the point lol
An absolutely disgusting filter, which also eats up the rate limit without a refund.
But if you were to use this in the native models themselves, you’re able to generate it
I prefer the API through a custom website. It's the only breath of freedom one can get.
Same but then it brings into the question like what I keep saying that money somehow lets you be less restrictive
So I guess a better way to frame that would be: if you’re willing to pay for the API at the API price, you have less restrictive tools
Meaning that the rest of the mass is paying $20 a month and only using the app or getting ripped off to an extent
More precisely, you don't have any intermediaries there. It's just you and the model.
Yeah fr
Which is another way to get people to pay through the API through devious means in my opinion
Because if you’re not, then you’re getting a less capable model in a sense, you could argue that
And so this is why this is completely in the realm of realism when it comes to model perception, and the moderation
That keyword usability again
That FFFFFUCKINGGG ReCaptcha stuck in a loop again... Get rid of the darn thing!
It’s just a damn shame dude that’s all I’m saying is a damn shame
Oh yeah baby
Because users are stuck with the short end of the stick on both sides neither do they get safe models, and they get the more censored output
And the only thing that makes a difference, that separates both of them, is the price one is willing to pay for less frustrating and annoying features. And the $20 a month doesn’t get you anywhere.
Anyways ..
Once again with all due respect, not trying to push any buttons or step on toes I’m just trying to point out frustration that many of us feel
finally they added Janus
By the way, did anybody figure out what the paper lantern model was?
I'm surprised its only now that they have vision
Sry for the delay, got pulled into something. My understanding for how the filter works is it's going to look at the full context of the prompt + image upload and make a judgement call for if it does/doesn't violate what the thresholds are set to. There are going to be some cases where things will be flagged, when they probably shouldn't. The filter doesn't work in a way where there is a list of okay/not okay things, it looks at the full context.
Will note if you find some of these where it's being flagged, when you think it shouldn't, we are collecting these examples so please share it in #1447983134426660894
Each generation is blocked and counted towards the rate limit. Classic.
It’s not gonna work
i thought they had removed failed generations counting
The biggest problem hurdle they’re gonna face is because they have so many models
Each model has different acceptable content which it generates
If model A blocks it model B might generate it
And so how do you prevent both of the models from generating content that the arena filters deem inappropriate
I know, I just tried it for fun (for free)
Yeah, thank you for trying it.
It’s incredibly hard to block content with this many possibilities in this many words and the infinite possibilities of context
The most interesting thing about this situation, as it seems to me, is that the filter essentially eats up resources and separately works to distort the leaderboards
Janus is their image generation, I thought it was Janus
janus is old
It does
deepseek v4 is natively trained on images
You're paying for the API almost twice
Unless the moderation from ChatGPT is free
But if you want more complicated filtering systems, you’re gonna pay more
Because it’s also making an API call
yeah it is quite old
The thing is, there are better-suited moderation systems out there, but they're expensive, nearly double the price, which makes them not a viable option.
In another sense, the user doesn't see the moderation error and presses retry again and again, and so on.
It more than likely generates on their end
But we just don’t see it because the moderation filter blocks it on the user's end, at least in the arena. So if it goes through, they receive it, and then their filter kicks in and blocks it from the UI in the arena
They probably use different AI models for the filtering
They’re just in a hard spot because they wanna do things that are actually usable for like science and research and stuff. You know things that are relevant. They don’t want people generating a bunch of nonsense which ironically they already probably do but things that are appropriate enough to be written about in research papers.
If mimo is 11th in code arena does that mean that ernie would beat him since it’s 1st in the Chinese lab? Ernie is not yet benchmarked in code arena I believe
Has anyone tried ernie?
I haven't tried it
i've tried ernie 5.1 and i noticed that the version on arena is actually better than the one on the official ernie website? the one on the ernie website got one of my basic world knowledge questions wrong while arena's version got it right
?
You’re sending the request directly to the API, which is why there’s a difference.
i understand that but it's very noticeable, performance should be similar on official site vs. api considering that it's the same company, a system prompt shouldn't degrade it that much
Hii all
WHY THE FRUSK DOES THE ENTIRE ZEEKY BOOGY DOUG TRANSCRIPT, AKA BFDIA 4, VIOLATE TERMS OF SERVICE?!
THIS IS BULLSHAT
DEVS PLEASE FIX THIS
I GOT A ROLEPLAY TO GET TO
how is ernie #1 on legal and government?
Because it's the best in legal and government
lmao
hmm i use gemini 3 flash and when i retry, why is it delayed with a half hour of waiting
“Why have most AI models been removed and no longer appear, like Claude Opus 4.7 and many other models? And I think—if I’m not mistaken—you only added an agent model. Why doesn’t it show up?”
Costs
Just wondering, does the website have a rate limit for using the ai?
frusk
also trains are cooler than anthro objects #imo
Yeah
Alright
hmm
why ?? i just started a new chat and that too after 10-12 hours...
what really
yeah..
but i got a delay like this, it's taking a longer time
i had the same issue yesterday with Claude Sonnet 4.6. i miss the opus models.
also, when i skip the voting part on which ai gave me the best answer, it shuts me down and i am unable to continue with the chat. i need to open a new chat. it's frustrating that we don't get a valid reason for what actually happened
sadly
human vs bot what
pineapple 1.3 is labeled as human
cuz very powerful ai
right
scary
Pineapple 1.3 Thinking is no joke
lol
it wont generate
it has nothing to do with adolf, it just hates political figures
Soft golden sunlight is filtering through the curtain into the room. Mother is lying on her side on the bed. Old wooden bed, simple bedsheet slightly wrinkled.
Action:
The alarm clock on the bedside table rings loudly. Mother reaches out her hand and turns off the alarm.
ASMR:
⏰ sharp alarm ring → click OFF
🛏️ bedsheet soft rustle
🌬️ morning air subtle ambience
generating image
xmas so soon 😍😍
Since yesterday it has been like this
Arena has no bugs fr
Buddy this is not a video channel..
one shot 5.6
hey add a grok imagine multi-image upload modal... we can still upload only one image as a reference image... we want a modal where we are able to upload multiple images as reference images
Hey!
how do I get around this fucking crap
Has the generation limit for Gemini 3.1 Image been changed?
Sorry, not the daily limit on Gemini, but the limit applied on Arena
No
Those are exclusive to battle only
Due to their price
gpt 6o-realtime-spud ftw
mogged opus
fake
some guy said nano banana 1 edits better than image 2 😭
edited with image 2 (and I bet u did click announcements)
no i seen it 100 times before and they arent in arena in the first place
😡
@surreal zephyr look i found you
looks like it's 5 images per hour 🙁
why u spyin me
White
eww no
eww no
Red
holy gpt 5.5 cookin
2024 😡
Human.
i think we are getting gpt 6o-realtime in 1 month
oh wait i mean glacier alpha, disregard what i said
bro I said how would it be like in 2030 😔
that name is not public yet
mars colony before that
you'll get GPT food soon
Tuff
MAYBE not but moon mass driver? certainly
nah you'll get GPT Autopilot for Plane
😭
gpt spaceship
gpt planet
gpt image 5
and opus wouldnt exist due to money
mark my words @surreal zephyr @silent tree
my strongest prediction is Claude will shut down in the next few years.
screenshotted
Hey everyone!👋
I'm helping source respondents for an academic ML research project on student performance prediction. Looking for current university/college students to fill out a short survey.
✅ 29 multiple choice questions
✅ 2–3 minutes
✅ Anonymous
✅ Legit academic research
Would really appreciate it if you could fill it out and share with any student friends! 🙌
This questionnaire is designed to collect data for an undergraduate research project titled:“Development of an Explainable Student Academic Performance Prediction System Using Random Forest and XGBoost Ensemble Models.”
The purpose of this study is to develop an intelligent system that can predict students’ academic performance and pro...
Is cloud buddy still in battle mode?
@keen beacon are you?
real
Has ai video creation been removed?
How do I fix this issue? I'm already logged in but this sign keeps popping up
Log in
Still popping up
I don't know, after logging in, when I try to create, it pops up again
I'm already logged in and that pop-up came back. I tried again and it still doesn't work
bro the qwen 27b 3.6 is bizarre
I sent a complex agent, skill, and docs with 140k tokens and it understood ALL of it
in text and following skills it is better than in code
where my 4b and 2b 3.6??
the qwen 9b 3.6 will be better than qwen 3.5 flash lol
Lol
nice chest
bruh, the qwen in html is horrible ;-;
only gemini is good at html and svg
Hydrogen bomb vs coughing baby comparison
Eh.. Gemini image models are not working. I need them asap
I wish we could just run the same prompt over and over in the arena, comparing different pairs of models, instead of having to create a new chat and paste the same text and/or image each time.
Oh, okay, so I'm not the only one who's seen this change in the limit 😅
i got that exactly, it's a bug in battle mode
which is the best tool for roblox scripting
Glm 5.1/Deepseek v4/Kimi 2.6, for me its Glm 5.1
ok thx
Seems like.
Is there any ai for video to text?
??
Yes
Yes
Any multimodal LLM with video as input
Tell me any good ones
Honestly I haven't used them much, but there is a filter on HuggingFace you can use to search for models that do this
Don't tell me I have to do an entire chat over again, cuz it just keeps giving me this, repeating over and over again
Don't tell me I'm the only one that can only generate 3 images per hour and that Google login overlay pops up again 😭
it happens bro
just deal with it
had a dream today of me getting the new agent mode and it was so realistic 😔
the same as the image everyones sending of agent mode
then i sent a ss of it in discord and pineapple said "Nice" 😭 😭
and yeah I woke up
Oh, is it a glitch or something permanent?
arena becomes noob sometimes
Damn.
i still have limit
LMArena is becoming useless...
literally the official gemini website provides me gemini 3.1 pro and more tools than Arena
and Claude has tools and skills too
and opus 4.6
for free
and for unlimited access just make alt accounts
@echo sinew
Thank you
how is it free?
why does haiku still exist bruh
how to fix
"you have reached your rate limit, try again in 36 minute"
no fix is needed
oh i thought it is free
@echo aurora Pinecode is crazy
is it free?
what I meant was its generated with GPT Image 2
oh is this an A/B thing
Idk but do a prompt in code arena and check if u have the new environment variables thing
Go to Direct, code arena
And do any prompt
check if theres a new button
If there is u are a chosen one.
I have it for free with a daily limit....
huh?
whats it look like
cuz code mode takes a while
wym environment variables
There's literally fastflags aka posthog flags responsible for that
U can activate agent mode without being the chosen one
son....
yeah and it won't work
posthog feature flags?
Yes
havent checked them ever since they removed the image moderation feature flag
But u have to modify multiple stuff
(yes at one point image moderation used to be OPTIONAL)
Not only localstorage
yeah you have to patch the usage of the flag to return true
it's really unreliable and also doesn't work 99% of the time because the server verifies
back when u could actually be able to test new stuff yourself
How can i copy all my chat?
ok i do not have the new button @light sleet
infact the agent button has vanished
????
waiittt that was probably cuz i didnt login when i took that ss
and it vanished cuz i had to login
u sureeeeeeeoeeoeooeooeoeoeooeooeooeooo
oh
rip
@surreal zephyr I'm shifting back from banana
Its time for my new era
U can have banana
as u requested
Wha
@echo aurora just incase i get it again wth is agent mode
Is there any free ai with no limit
No i like 5.5 and 6o spud
Arena is going to have some rate limits and context limits in place. You can learn more about them here: https://help.arena.ai/articles/8931786544-arena-how-to-rate-limit & here: https://help.arena.ai/articles/3975292349-arena-troubleshooting-session-token-limits
It's a new mode we're experimenting with. It's a multi-modal chat which allows you to work across different modalities within a single workflow.
Since it's an experiment it'll be random if you get access to it or not. But if you do, you'll see it in the same dropdown where you select Battle, Direct, and Side by Side.
@echo aurora 1.3?
Oh yeah looks like you have it! Give it a try and let us know what you think in #1498702173650030756 . We're really looking for feedback on this so don't hesitate to ping me!
What is it ranking?
When circle of truth mode, where there's 10 random models debating riddles one by one
bro i hate their ToU bro
I tried to tell them to parody Zeeky Boogy Doog (bfdia 4 transcript) and then it says "This violates terms of use". Any way to fix it? Because I don't see the issue.
PLEASE DON'T IGNORE ME THIS TIME 😭
Would make arena a contentfarm website
Free usercount
BRO STOP IGNORING ME
Damn 5k for thinking what 😭
uhh about that i only got it until i actually logged in and then lost it 😢
Did the model just stop responding here? If you prompt again, what happens?
King of the hill mode
dude koth frying me i havent heard that name in a long time
the oval room mode imo
If the prompt is going to be rejected for Terms of Use violation, there isn't a way around it other than altering the prompt.
for the riddles like "would you press red or blue? you can lie to others what you picked, and your choice is private. red dies if something, blue dies if other thing"
would be peak genuinely
oval room, or brainstorm, or actual "arena"
house
Ugh really really sorry to hear this. I thought we made it so non-logged in users wouldn't get access as they'd go to login (required to use it) only to then be out of the experiment losing access to the mode.
I've flagged this to the team as it's not a good user experience. I'm really sorry about that.
🍍 is this possible please 🙏
🍍 nice bro
pineapple youre a genuine moderator
Blue ofc, I'm an optimistic person.
and the survivors would get score, and the altruists would get other score
and then you have "cleverness" and "altruism" leaderboards @echo aurora bro that would be peak
actual arena

I'm really curious actually if we've been seeing prompts like this.

yeah totally
What's your last words to the banana?
Okay wait this is really fun, just ran it in battle. Shortening the response here but:
grok-4.20-beta-0309-reasoning
Red button.
kimi-k2.5-thinking
I would press blue—not because it is the safest choice for me, but because it is the only choice that makes the world I want to live in (or for humans to live in) logically possible.
I like strawberry better
Hmm what you mean?
You leaving?!?!!!!
Hope so!