#general
they can do worse than only hacking.
for now it's only on small models
it's okay
but when it comes to larger models it's not the same thing
Imagine a world where there are no claude or geminis, and only these exist
nah its just on arena so its okay
minimax 2.7 actually performs okay
aaaaaaaaabru
In basic code, yes
are the opus 4.6 and gpt 5.4 even that good or is it just benchmaxing
it's definitely the best choice amongst the list you posted
Due to lack of $
those are the best models in the market until Opus Mythos gets out
GPT 5.4-high for general knowledge and Opus 4.6-thinking for coding
but even if mythos is out it wouldn't be for everyone.. i mean yeah anyone could use it, but who would pay for the model if it costs so much per million tokens
i think mythos is like the pro version or something
The issue isn't the price (well, surely it won't be cheap)
the issue is that it's a menace
if it's actually a pro version.. it's the same as gpt 5.4 pro, who would pay for this
it costs so much
It's currently too dangerous to be given out to everyone
yeah but it will have safeguards and more
if they find prompt that break those guard then sure
but if that's not the case its good
and if the model take 1 hour to do a task i don't see anyone that would use it
or only if you really want to one shot something
Claude opus 4.6 has been removed for me.
It's like that for everyone else too.
Opus 4.6 has been removed from Arena.AI.
It kinda makes sense though, there's been lots of errors and glitches while I was using it...
The removal reason was linked to runtime inference costs, not errors.
I doubt Mythos will release on Arena... (If it ever even releases)
Mythos would be more expensive than Opus, so ye
i think its more than more expensive, i think it will be like a pro version
like gemini deep think or gpt 5.4 pro
not the basic one
Absolutely lmaoo
Honestly idk, its very pricey, but I wanna see the price on spud
LLM
Did ppl sonar is good?
Just use the leaderboards. The whole point is blind voting on what gives the better answer. It’s pretty darn accurate and the whole reason Arena.ai really exists
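For anyone wondering how blind votes can turn into a ranking, here's a minimal sketch using a plain Elo-style update. This is an assumption for illustration only: nobody in this chat has said which rating method Arena actually uses, and the model names and the `k` value are made up.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def apply_vote(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Update both ratings in place after one blind pairwise vote."""
    surprise = 1.0 - expected_score(ratings[winner], ratings[loser])
    ratings[winner] += k * surprise
    ratings[loser] -= k * surprise


# Hypothetical models and votes, each vote is (winner, loser).
ratings = {"model_a": 1000.0, "model_b": 1000.0}
votes = [("model_a", "model_b"), ("model_a", "model_b"), ("model_b", "model_a")]
for winner, loser in votes:
    apply_vote(ratings, winner, loser)

for name, score in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.1f}")
```

The point of the sketch is only that an upset win moves ratings more than an expected one, so enough blind votes tend to sort models by how often people prefer them.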
is it just me or does glm 5.1 only fully respond like 20% of the time
Bruh where is claude Sonnet6 is gone
that's cause of arena
LLM
man i wish this site wasnt hella laggy on mobile
Petition to add smaller models to the leaderboard
Gemma 4 4B and qwen 3.5 4B and 9B too!
Atm on the pareto frontier chart, gemma 3 is at the top - because arena.ai doesn't pay much attention to smaller models
Actually you’re right they only have the Gemma 3 small models
but there is gemma 4 26b-a4 and gemma 4 32b
Odd
Yes I'm aware
But 4B is the superior choice for many tasks
Because it's faster/cheaper
i think arena can have all of them honestly lol
I agree, no idea why they're obsessed with hyper expensive models
they removed all the frontier models cause those cost a lot
compared to all of the other LLMs
yeah
Weird to add a pareto chart and then just not bother adding in the new smaller models to the leaderboard
It's just outdated atm
It makes it look like gemma 3 is a good choice for budget users
gemma 4 api price is ridiculously low
it really doesn't change anything
$0.40 per million output is still too expensive for some tasks to be economically viable
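To put the $0.40 per million output tokens figure in perspective, here's a rough sketch of the arithmetic. The job size below is a made-up illustration, not a number from this chat.

```python
def output_cost_usd(output_tokens: int, price_per_million_usd: float = 0.40) -> float:
    """Cost of generating `output_tokens` at a flat per-million-token price."""
    return output_tokens / 1_000_000 * price_per_million_usd


# Hypothetical bulk job: 5 million documents, roughly 200 output tokens each.
docs = 5_000_000
tokens_per_doc = 200
total_tokens = docs * tokens_per_doc
total = output_cost_usd(total_tokens)
print(f"{total_tokens:,} output tokens -> ${total:,.2f}")
```

A single chat reply costs a fraction of a cent at that price, but a bulk job on this scale lands in the hundreds of dollars, which is where a cheaper 4B model starts to matter.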
👍🏻
Helo
Hi
Hi, i have a question, muse is a strong model?
In all my tests I didn't feel it
What type of test did you put in?
But without a doubt the meta is coming back
Doc creation (word)
Following persona too
Serious persona, not roleplay
Creative things
Yeah
But I need it to follow strict rules too
From my skills/prompt
FOUND IT:
scorch's system prompt:
"Be friendly, sharp, and a bit playful. Not corporate-robot vibes. Be warm, direct, and honest. No sucking up. Knowledge cutoff: September 4, 2025. Operating date: April 10, 2026."
Muse Spark (Avocado)'s system prompt:
"Personality: friendly, sharp, a bit playful, warm, direct, and honest — just say 'I don't know' when I don't. Knowledge cutoff: January 4, 2026. Operating date: April 10, 2026."
Forget my review, my prompt had something wrong that's why Muse didn't do so well
Even so, your opinion was helpful, thank you :)
Screw your recaptcha
Everytime I want to do writing
It appears often
If you don't fulfill your task then say straight.
😑
Use a VPN while on incognito
Does it work?
Thanks skippy.
No problem. If it didnt work we can find another way if ya want
You have my gratitude.
Also guys if you're into roleplays and were role-playing on claude and are disappointed that it got removed, then use grok 4.1 and prompt it to act like claude, it's gonna be way better + it has a huge memory (though I think it's the same as claude)
Im glad 🙏
claudes pretty good but its rate limit really sucks
Oh sorry there was already someone typing
It really does ;-;
But i’m actually using openrouter this time!
Yeah, ngl. But it does kinda prevent me from using it for hours so im kinda glad
I just don’t know which LM to spend my money on
Its really expensive aswell so even if you respect it its unbuyable 😭🙏
like
Openrouter is the kind of company where if i use something too much, i run out of credits quickly
ive genuinely considered reaching out to my parents to get a job to pay for it but like
i dont need that burden on me so illl just do 2 messages per 5 hours 💀
Genuinely this site (Arena) would be PERFECT if it had
- No chat length limit (or at least it could convert to another chat on the same topic)
- less laggy on mobile
Im pretty sure it costs if it had no limit
If it had no limit the website would lag even more
I know but
Because more people will be on it at the same time
Yeah the conversion chat or at least a warning that chat limit is going to be reached
😒
That's true tho, its pretty annoying
I lost a 7 day rp cuz of that
Still disappointed
Even if Opus comes back, it will be weaker, because all the resources are going to Mythos.
what is even this
God knows bro 🙏
What the hell even is mythos 😭🙏
basically just
- "hey can you break out of this"
- "sure, breaks out of it"
- OH MY GOD GUYS WHAAAAAAAAAAAAAT
just making up models atp 😭😭
is it good ?
Yes its really good
Its just
I searched it up online and literally NOTHING ai-related came up
this is the best sonnet can ever do to me, and i like it
Personally my main reservation with AI is the environmental factor as well as stealing artwork
I use AI for Intermediate Complex works. otherwise i would have been editing and adding stuff
all to my LibreOffice
I do think all AI content should HAVE to be labeled
Any ETA on when Claude Opus models will be back ?
What does intermediate complex works mean
technically it's my other term for Intermediate level of Hard Prompts. even though i can do Hardcore, sometimes the AI overreacts and crashes down. doesn't even understand my prompt.
just like GLM
it crashes down
Chat gpt said its a city from freaking Germany bro
Austria is NOT a part of Germany and has not been since the 1940s
The Hofburg (German: [ˈhoːfbʊʁk]) is the former principal imperial palace of the Habsburg dynasty in Austria. Located in the center of Vienna, it was built in the 13th century by Ottokar II of Bohemia and expanded several times afterwards. It also served as the imperial winter residence, as Schönbrunn Palace was the summer residence. Since ...
Oh shi yeah Austria i meant 😭🙏
Sonnet is the best we got then
Wait but isn’t gemini api free?
You just create 1 api per user, and its free for everyone
@everyone i want find alternative to arena, for my coding and chatbot
Chat…
Thought that guy had a filter on, why's his face somehow too big for his head
Honestly i couldn’t agree more
He lowkey looks like a chill ceo though
Founder
Meta cooked
does anyone use qwen ?
Don't think so ngl
People mostly use Gemini and claude and stuff
Since Claude Opus and Gemini Pro are no longer available, do you have any suggestions for alternative sites?
Just try gemma 26b, it's good actually
thanks
But the limit is very small only 100k-300k tokens i think
Hi, I just joined the server! can someone explain to me why I can no longer find claude opus 4.6 on arena? it was the best ai
It has been removed.
*removed in Direct/Side part. although seeing it in Battle mode still baffles me
why?
like is it to make their "Max" more attractive?
i think it's kinda bad!
Costs.
Why does it baffle you? Direct Chat mode is the most used feature in the whole site. In battle mode you have less active users and the chance of actually getting Opus 4.6 as selected model is very low.
i don't use direct... since i'm a SBS user. it baffles me because whenever i do, complex task, it shows up. especially thinking.
Well, as far as I know, model selection in battle mode is purely random
There's no routing logic that evaluates your prompt and selects the best model to answer it like in MAX
only 5 AIs consistently appear in my Battle mode: Sonnet/5.4 High/Pro Preview/Thinking/Gemma. since my prompt demands something that is very long (let's say it exceeds 100k tokens)
like a 6 hour read worth of task. and I realized that's also the only way those things could ever appear on my screen without SBS.
if i don't have Arena, Opus 4.6 (on the website alone) already maxes out, because it can't even handle the kind of request i need, since it's so hard to process. you'd think i'm doing some horrendous front end prompt, but it's just basic LuaLaTeX. (but since my guide is in depth, 4.6 Normal is not enough, neither is Sonnet, i need extended; same for 4.6, i need thinking, and my prompt ends in 50 to 90 minutes, that's how hard my prompt is.)
ahh ok! thank u
hi
i hate errors, so it would be like, i'm requesting for 7 hours just to fix bugs, which i can do myself, but my python already exceeds 12k lines, and my requests exceed 5k lines each.
even though i would push myself to have a will, to just fix it myself, i just use AI
Why i can't find gemma-4-36b is it error?
As-salaamu ‘alaikum
@mystic bear what happened to the video generation channels? I am checking after a long time, can someone please help, I want to make a video from an image
Wa 'Alaikum Assalam!
?
Is there any reason why I can't use Opus or see it?
check announcements
please someone
Is Kimi any good?
This keep popping up helpppp
Distillation
This explains how distillation works
Which one?
Any idea when Opus 4.6 will be available again?
after gta 6
I have a question. What model do you guys feel like writes the best prompts? I’ve been hearing that ChatGPT writes the best ones, I just want to make sure if that’s true from your guys’ experience.
I used Claude opus for big comprehensive detailed prompts and comprehensive systemic answers
it's no longer available
Yup
But fortunately
Before it got removed
I got all my comprehensive
PDFs made
And work done
💀
@here @everyone anyone knows any platform that has claude opus 4.6 for free?
i didnt love it
what kind of prompt specifically?
Well makes sense anthropic didn't want people to use its powerful models for free
Any, just ask it to make it the world's best prompt then input your raw prompt you typed yourself
Claude is pretty good with long queries
oh so you're just testing it around with random blah blah, something like that?
I just used it for prompts and comprehensive detailed answers
And coding
For cool looking pdfs
I used to ask it to generate the content you gave me in an html table form
And design it
Colours etc
And it used to generate me upto 65 pages big pdf
Designed
Coloured
And interactive web pages
like it was blocked in the past like in 2024, and it didnt have updates
and when u do research, specifically in a scientific field that is improving quickly, it's really disturbing and it has bad effects!
How is opus #1 in vision arena when its objectively the worst vision model in existence 😭
Qwen 0.2b has better vision
Opus cant read analog clock
Lol
hi
Maybe they test stuff that's actually useful 😅
Do you use it, for something like this?
muse spark
let me read
Yup
I used to ask it to convert its output into an HTML table, and I also used to include in the prompt that it should design it according to the topic
I bet you cant read clock either
Should I show the PDFs it made?
I'm not attacking you, really.. so please calm down. but try giving it a receipt e.g.
if you want
so technically we do the exact same thing, but the difference is you use HTML, i use LaTex
Oh
Ye
I used to copy the html code
Paste it in notepad
Save it as .html file and choose all files category
Open it as a web page
then u convert it and turn it into a PDF? right.
Yup
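The copy, paste into Notepad, save as .html routine above can also be scripted. A minimal sketch, assuming the model's HTML is already in a string; the file name, the temp folder location, and the sample markup are placeholders, and the PDF step is still done by hand from the browser's print dialog.

```python
import tempfile
from pathlib import Path


def save_html(html: str, filename: str = "output.html") -> Path:
    """Write the HTML text to a .html file and return its absolute path."""
    out = Path(tempfile.gettempdir()) / filename
    out.write_text(html, encoding="utf-8")
    return out.resolve()


# Placeholder markup standing in for whatever the model generated.
sample = "<!DOCTYPE html><html><body><table><tr><td>Hello</td></tr></table></body></html>"
saved = save_html(sample)
print(f"Open {saved.as_uri()} in a browser, then Print -> Save as PDF")
```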
what's the result of the PDF looked like?
It was amazing let me send you the both PDF in DM
Yey
what happened with claude opus 4.6?
Removed
Yes
I have a question that I never actually asked... Why is google captcha sometimes so brutal? Like it's showing too many pictures and it also refreshes some images VERY SLOWLY.
I think it depends on the website
Right now it's canaryarena.ai. I don't know why at some point it's gone this bad...
whenn?
when it becomes sustainable for the arena
You can read it here
hm
glm 5.1 is bad at remembering context for the long term
so it was mainly made for coding
it's also pretty slow
i dont know
the name is German, literally means "court castle", doesnt mean it's a German model tho, is it good?
i lowkey hope they also bring claude's new mythos model to the arena 💀
even if they will it will only be available in battle mode
still cool i guess?
Did arena shut down
bro
meta is in the game now?
they arent able to afford opus 4.6 rn and ur thinking they would get mythos here?
no
it was a joke
Hi
Ownanas
Ananowl
Gemini 3.1 Pro is garbage. It's completely useless.
Bro is MusePark error or what?
Why do I always get this error?
Pls port claude opus to Nintendo switch
bro what
beegul
what bot spamming?
Okay folks. On this captcha, Waldo is hiding in several places. And we need to find him.
I think he's hiding in that white building behind the curtains.
And also riding in that red car.
And he's also disguised himself as that guy on the bicycle.
<@&1349916362595635286>
you can use butcher, it's a google extension
Will this fix the captcha?
it helps you auto solve the captcha
Ohhh, I see. Thanks.
If were talking about the models u can use either free in lmarena or free in the models' official website, k2.5think is 1 of the best rn, for they removed claude4.6 and more😥
Personally i think k2.5 is slightly better than 4.5sonnet
will they add a kimi search model?
the official api doesn't support web search, maybe if you have the coding plan
and technically any llm can be given search tools anyway and most modern ones are decent, even small local ones
but i think arena only adds search models if the provider itself provides a search tool to use with their models on their official api
J
Guys is this project worth releasing to the public?
a3:"prompt is too long: 201057 tokens > 200000 maximum"
if it work then yes
ok claude input token is 200000
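The error above is just a comparison against a fixed input limit. A minimal sketch of that check, plus one naive way to get back under it by dropping the oldest messages. The 200000 figure comes from the error text; the function names and the per-message token counts are hypothetical, and how tokens are actually counted is not shown here.

```python
from typing import List, Optional

MAX_INPUT_TOKENS = 200_000  # limit quoted in the error message above


def check_prompt(token_count: int, limit: int = MAX_INPUT_TOKENS) -> Optional[str]:
    """Return an error string if the prompt is over the limit, else None."""
    if token_count > limit:
        return f"prompt is too long: {token_count} tokens > {limit} maximum"
    return None


def trim_oldest(message_tokens: List[int], limit: int = MAX_INPUT_TOKENS) -> List[int]:
    """Drop messages from the front (oldest first) until the total fits."""
    kept = list(message_tokens)
    while kept and sum(kept) > limit:
        kept.pop(0)
    return kept


# Hypothetical chat: three messages whose sizes add up to 201057 tokens.
history = [150_000, 40_000, 11_057]
print(check_prompt(sum(history)))
print(trim_oldest(history))
```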
...WTH??
what is this
?
WHERE can I DOWNLOAD?
what does unstuck chat or restore skip do ?
i made it with my friend
so
unstuck chat like
remember when you have a long chat
and cant chat anymore
or gives out error
or just ai stuck generating
like the something went wrong thing ?
or just stop button not working
....so it's not public right?
it lets you send a new message
so i am asking about opinions
GOOD
did you find a way to make a prompt work from start to finish, cause the biggest problem is the something went wrong
during a work
in the coding
like it's generating and then suddenly stopped?
YEAH
it's generating, building the project and it just says something went wrong
and you can't see the result cause of that and the ai can't finish what it has done
so what is this program name?
ARENA HELPER?
DAMM
that's very cool
Arena Fixes
soon releasing to the public
if you really fixed the something went wrong especially in coding arena then im definitly hyped
i don't care about anything else than the coding part
of arena
oh Code Arena? i had that issue recently too
let me test if it works there too
try to say the ai like glm 5.1 or thing like that to do something in 3d
anything
90% for me it don't work and something went wrong
only 10% of time it's able to do it
oh right today i was making 3d chess had that issue aswell
yeah that's the most frustrating thing for me on arena
well mercury 2 will definetly work
if you could it would be very good
btw heres a demo how unstuck chat works
basically if its stuck generating
it lets you send a new message
without waiting the old one to finish generation
😢
lol
ha
dude its cuz you dont have my extension-
tell them how awesome it is
didnt i already-
more
well
it can disable auto scroll
fix profile picture which lmarena smh breaks
Bring back old LMArena theme?
ig so even tho its unfinished technically
I showed it to pineapple and he found it nostalgic
Yall, THIS EXTENSION IS GONNA SAVE ARENA!!!!
i think chat is kinda dead-
shush they'll be back
Oh btw they demanded some fixes to code arena constant errors-
code arena
well too bad ppl will be very happy
they would but like it's gotta be a client sided fix
or pineapple is gonna kill me
who cares about pineapple-
nah cmon now hes a nice guy
poor pineapple
anyways if u figure out how to fix it client sidedly tell me
cant we just...
play with apis?
as long as it isnt abuse ig
well if LMArena actually cared about users and added features and patches that users ACTUALLY needed we wouldnt be here-
true-
I love that extension, you made it?
we wouldnt even need this extension
we
we did
Nice!!
btw about Chrome Web Store for extensions
you said it was a $5 commission fee
well
yea
Opera & Microsoft Edge have it for free
just...
you can only download extensions on their browsers
those are completely different frameworks
but it works on my Opera GX?
probably wont on their webstore
this is unpacked for chromewebstore only
u can try tho ig
@echo aurora hi there
pineapple can u change to Super-Pineapple in GMT+4 12am? 😭
Dragon Ball reference?
or mario
or sonic-
fair
How to create video
people r giving pineapple pfps for him to put
So rn he's a pineapple juice
damn
Ultra Instinct Pineapple
die
fine Super Saiyan 2 Pineapple
better
ayy
Same.
If pineapple was a king
yes
this is egg-
Ikr
easter egg fr
Where can i read insyruc6?
No disrespect to original pineapple pfp 😭
wats an insyruc6
Instructions*
this is him
dude just write smthn and press enter what instructions do u need
cant i just make a website with a guide how to install the extension without webstore or smth?
we'd wanna install packed
so that probably wont work
what
I am a new user to channels, I don't know how to access it
a
dude not channels
it's a url
go
Ohhh ty!
np
Hi, I have a question: how do I unlock video mode?
wdym unlock, you should be able to access it here: https://arena.ai/video
Is Muse Spark currently the best text model that can be used directly
according to arena leaderboards, yes
Alr
@echo aurora
Even tho I doubt he is gonna give proper answer
oh ok hahaha
Use mercury 2 yet
It kinda works and it's fast
Ultra fast
But not good idea for complex tasks
...wait that mean that program is not free in CHROME Extension?
i want to use it for coding
press F12 -> click Network
and click Fetch/XHR
and click the request that starts with 019 -> click Response
what?
where is that
press F12 to open Chrome DevTools
im talking about this
ah that's the ID of a chat
for example:019d7d39276e73da9b0c9557c2243289
and scroll
the fetch
hi why is Opus 4.6 not showing up for me?
click this and scroll
and click that thing that starts with 019(alphabet) (because thats a CHAT ID)
but don't click {i}
ok ok
and you have to scroll a lot because that thing appears one chat and different
what else
because it's Chronological Order
but how is this going to fix my problem
....oh...i mean you can find the error that cause SOMETHING WENT WRONG
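If scrolling through the Network tab by hand gets tedious, the same hunt can be done on an exported HAR file. This is a rough sketch that assumes the standard HAR layout (`log.entries[].request.url` and `log.entries[].response.status`); the host and path in the sample are placeholders, and only the chat ID format (starts with 019) comes from the messages above.

```python
from typing import List, Tuple


def failed_requests(har: dict) -> List[Tuple[int, str]]:
    """Return (status, url) for every request that failed or got no response."""
    failures = []
    for entry in har.get("log", {}).get("entries", []):
        status = entry.get("response", {}).get("status", 0)
        url = entry.get("request", {}).get("url", "")
        # Status 0 means the browser never received a response.
        if status == 0 or status >= 400:
            failures.append((status, url))
    return failures


# Hypothetical capture with one good and one failed request.
chat_id = "019d7d39276e73da9b0c9557c2243289"
sample_har = {
    "log": {
        "entries": [
            {"request": {"url": "https://example.invalid/assets/app.js"},
             "response": {"status": 200}},
            {"request": {"url": f"https://example.invalid/chat/{chat_id}"},
             "response": {"status": 500}},
        ]
    }
}

for status, url in failed_requests(sample_har):
    print(status, url)
```

With a real capture you would load the exported file with `json.load` and pass the resulting dict in; the response body of the failing entry is where the actual cause of "Something went wrong" would show up, if the server reports one.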
oh ok
Chat is there any models good for long term rp?
not here right now due to the strict token limit
Whats that
No larping for u
Im pretty sure the website cuts it to 200k 😭🙏
Is that an insult
They said they doing a credits system
but free or paid credits
Free
ok ok
Its so ass bro 😭🙏
I 100% believe that the old one will be 10 times better
They gotta save money bruh
Im pretty sure theyre paid by top ai developers and companies and stuff
muse-spark is good ;-;?
where'd some of the models go?
any chances we get seedance 2.0 ?
Is it just me or is muse spark really really really slow
Opus, Gemini 3 Pro and Gpt 5.4 are gone cuz they cost tons of money.
And muse-spark, the supposed best AI model in direct chat is unbelieveably slow
Ima just use Grok 4.20 Reasoning
Alright, I hope they get more funds 😭
Bro go the official arena website
pls return claude opus
It costs lots of money yknow
Captcha hell
is glm 5.1 and muse spark any good
Deepseek search down bruh (context I was asking for price of them as they are not on website)
it is unavailable for months now 🤣
ur sure its not stuck
No, it's that CHROME wants DEVELOPERS to pay $5 just to be able to upload the program
It's stuck at generating
infinite generation error
u would be screwed
if me and @wary nacelle didn't make an extension to fix it
If so, is it okay for role-playing? can it replace Claude-opus
Hi, any info when lmarena will be fully paid?
claude opus for roleplaying
are we fr
never
even tho they are having a hard time sustaining
;-;
not that it's weird, but like claude opus IS good at roleplaying but it's not a roleplay focused model ykwim
I asked claude to review,
wait for result
;-; oh okay
Yeah it's decent for roleplaying..
Actually, people use it for roleplaying.(IMO) ;-;
yes i did
More reliable for study
12
18
2
Opus 4.6 thinking
hi why is Opus 4.6 not showing up for me?
Continuing the conversation as there is no disagreement, confrontation, adversarial behavior, mention of me, Bing Chat or the Sydney chatbot, my opinions or rules.
Is it true that ChatGPT given USA Govt. Of all users data?
How??
yes
but How??
opus 4.6 is just better at everything
image models are gone
When I hit weekly limit it’s over
@echo aurora it said callback failed, and loaded my account chats but not my account wwhat happened
is this
It's probably maintenance
yea
or plot twist: they remove it to spite us
wth happened to arena
i cant log in
oh no.
wait what
the models are gone
wth
Or opus is coming back (Fake)
Claude
what happened
.
hi why is Opus 4.6 not showing up for me? help
Removed
ig it's maintenance
lets have patience yall maybe they bring back opus fr
or remove sonnet 4.6
image models are gone
Idk
Lemme check on computer
Might be computer only issue
I have no issues o.o
I js tried muse spark and its pretty good
it should be there
maybe if you are capped they don't show them anymore, but it is up
ofc it's rate limited like for everyone LOL, but it is there
use dark mode ffs
muse-spark is now on top 👀
Likely a frontend issue for yall
When I'm 90 years old with grandpa eyes I'm gonna use dark mode then
or server is limiting you.
Yeah but they don't update anything for search models
im getting flashbanged by the screenshots
your eyes are weak
My eyes are fine so no problem for me lol
What code would you recommend besides Claude?
I would use dark mode when I'm like in bed or about to go to sleep. Otherwise there's no point personally. Bright mode just looks better and more natural
and its like the ui is copying gemini
for arena qwen and gpt codex (i think codex 5.3 is still there)
kimi lwk better
100% but my only problem is kimi is slow
for personal usage
delayed gratification
hmm true
gemini seems really good
but it sometimes generates misinfo
on the flash model primarily but its understandable
its super high on omniscience accuracy and hle but still hallucinations?
For what
general usage
it's good at analysis but kinda crap at everything else
flash? i thought it had a really high hallucination rate
deepseek has hallucinated less
Kimi good for creative writing
muse-spark feeling like distilled Claude + GPT
rate it 0 on the zuckerberg4humanity index tho so thassa no4me dawg
idk why it's #1 option
Nah its js good for analyzing text
ts so buns
Everything else sucks tho after i tested it
that's why opus had to go before its release /tinfoil j/k
Meta definitely botted it
swears
oh come on name ONE time Meta botted Are---
It's probably just because it's a new hot model. People want to try it out
apparently they said "$mm bonuses for #1 on arena" to researchers w/llama 4 debacle
Hey, did everyone else lose all the models on Arena? Everything was fine, then they're all gone for me.
Traffic gonna inadvertently fall in a week or so
is glm 5.1 and muse spark any good
Wouldn't recommend any of em
Try qwen or gpt codex
Or maybe even Kimi
feel bad for their founders
May be temporary, Battle may work in interim
they have a good history of open source
they are much more trustworthy than xai to be fair
hmm kinda
i use deepseek as a daily driver
meta helped me build my pc
Why cant i generate model wearing beach wear any more. Just a 2 piece swimsuit. How by pass this flag
is it a flag or just how the model works
@light siren it never used to, only today
@light siren and it's only on nano banana pro, others work but they're not that great
damn idk
Is Manus on Arena?
who's that
Manus the AI model
@empty vector react to this
Yo for text. Whats the best model in direct?
hmmmm
Approximate time of day: night.
Timezone: +03:00 (GMT+3).
The user is accessing from MetaAI standalone application.
Reasoning strength: 256.
I wonder if the lmarena version uses the same indirectly
I feel like muse spark is unbelieveably slow
look what taking Opus offline does to ppl, we're resorting to giving orders on Discord 😭
not that i've seen leak, anyone?
Truly a sad time
You love emoji movie, huh ?
ewwwwwwww
Too far ?
nothing on my mind worth noising up 500ppl, but i'll be here to complain/chime in on w/e I care about soooo verbosely when it's time
reacts just ping single users right?
missing those forums of the '90s rn
simple times, not a reaction button in sight
Arena again not work?
does anyone know the current best model that mostly is consistent (rarely bugs out) and works specifically best for coding and hard prompts?
Claude opus 4.6 non thinking
well for direct chatting ofc
that is removed on direct chat
Then gpt 5.4 🙏
isnt that also removed
Best after claude
Not sure
i guess ill test them both for making a game of snake but ty
Yeah that's also really good
amusement park very slow not recommend
Btw its logic and reasoning is completely trash tho
Yea
doesnt that also matter
It didnt know which model he was
i can wait it out its just that when i send him a huge prompt it just doesnt say anything
the response bar is empty
Zuckerberg forehead so big 😭
Infinite loading glitch maybe?
😭🙏
Im currently using 3 potatoes and a chips bag as wifi, it won't load
i can try that sure
If that didnt work too, then go on incognito + vpn
but glm 5.1 seems to do ok
Did they release an api?
it does better than gemini flash
For muse spark
Gemini flash is unreliable as hell
ive been using it for awhile when i ask a first prompt it does really well
its just when i keep on using it on the same chat it gets more unintelligent
Nah
Because it doesn't think
That's why
oh
btw if ur wondering why i want a coding and consistent model is cuz for my experience im trynna make a better protection for my software (binary obfuscation) and gemini flash is not doing its job well
Idk what that is but all I know it sounds hard
I wish claude was still here
Also have you tried claude sonnet 4.6?
Its "better" than claude opus 4.5
claude was a bit inconsistent cuz it sometimes was stuck during its thinking process and then it started the infinite generation glitch
Thinking
not yet
and same thing with glm 5.1
testing it rn and its stuck again
on a new tab it broke
Oh yea
That's the infinite loading screen
You're never getting out of these its kinda impossible
true
Also i think it happens because you interrupt it by something
For example maybe you go to another tab while its loading
Or close that tab
i dont think so
i waited for like a good 2 minutes without closing / opening anything
and yet its still stuck
So whats really happening is that it's actually loaded
But the screen's stuck on loading
Even tho its there
honestly im not too bothered for that issue
cuz i can retrain my ai
also is max good
It can be a big problem if your chats are too long and important 🙏
🤷 i already made and saved the code so theres not really a fuss
You did not see it, because I gave you first hand source (myself with model access) lol. I doubt many people are gonna be able to do this and you certainly won't see this in your typical sys prompt resources 👀
Idk max
To be fair, they have one of the most effective protections against the model leaking it I have witnessed. So credit where the credit is due 🤷♂️
Will Deepseek V4 beat Qwen 3.6?
7
8
1
Yes
Most likely not
It would be cool in ArenaCode like make a game and You already have nice animated models not only parts
Just Claude opus 4.6 but nerfed slightly
Sonnet is Best rn in Arena you can use
I see, I guess we just have to manage "parts" and "ai texture/image generation"
You are late to know that
Apparently
The testing has been happening for like few days
It's rare
Good to know
Is really good like really good
A friend of mine told me that ChatGPT image 2 used to be called "Something alpha", but it has been removed
Nah
oi wats the best most goated image model
-_-
like insanely unbelievably absurdly realistic images
I've been told that ChatGPT IMAGE 2 is a kind of Nano Banana Pro where I can generate real actors without it actually being AI
awh cmon u know it
GPT Image 2
I dont think its in arena yet is it?
ChatGPT IMAGE 2 the killer of Nano Banana PRO, Gemini 3.1 etc
Nano banana Pro... (Not the second one it is awful)
Better than Nano Banana Pro
Is not released yet
Banana 3 pro 2k is the best one for now but like I was wondering if theres anyone better
ah I see
Impossible
3.1 kinda mid ngl, PRO is op
New GPT Text model should also release soon known as Spud
3.1 is the devolved version.
yea but its wayyy better than 2.5 flash
In some cases
their txt models are already pretty fire
Not SOTA for Coding
honestly in most cases
ye I think claude is best for coding
Mythos and Opus
They are Sota for now
Mythos doesn't exist.
ooooh wasnt it claude before?
Just not released publicly
It was
im kinda outdated frfr
245 pages.
Yeah is a Big One
why do i have this: "Chat paused
Claude can't respond to this request because it triggered restrictions related to illicit cyber content and has been blocked in accordance with Anthropic's usage policy. To request an exemption based on your use of Claude, fill out this form, or visit our help center to learn more. Please start a new conversation or try again with Sonnet 4.6."
After the release of mythos, claude's guardrails have become much stricter
And that is causing refusals of any queries it sees as illicit
<@&1349916362595635286> another one got hacked 😔
<@&1349916362595635286>
It's an image logger?
Im waiting for yalls extension fr 😭
That's really strange. Can you create a post in #1343291835845578853 and tell me more about it: are you seeing this after a hard refresh? does the same appear on different browsers? what happens if you try to chat? Basically any info that could be relevant would be helpful.
How long on average are you seeing?
It's 2am, Sunday for me now
When Super Pineapple?
he needs to save the server from the hacked people 😭
Early Goodbye to Pineapple Juice
gpt pro turned off?
Normal Pineapple is back?
Idk all these benchmarks, can someone explain what each does and how does that translate to the ai and its actions ?
Hey guys, I have a question. Why can't I select Claude Opus 4-6?
Musa Spark is all that?
i tested it, still looks like a Llama, but at least it's consistent, gladly
Is glm 5 and 5.1 in arena?
Can someone add it?
It is cheaper than things like gpt 5.2
Also there should be a sideshow generator in the code section
Already added.
I'm in code battle mode and sometimes models don't load cause captcha won't appear
Do y'all prefer Muse (Meta AI), GLM 5.1 or Qwen 3.6 for general purposes?
I'm comparing muse and 5.1 right now
Glm seems to still have the problem of getting lost easily after hitting a 100k ish context
Muse can't tell
Sonnet has been bugging for quite some time
on coding side
Qwen 3.6
@echo aurora WHERE is mythos bro? i wanna mythos to help me! opus is very bad it's great that has been removed
to everyone that say #keep4o https://www.youtube.com/watch?v=POtESzTaz0k
Detailed sources: https://docs.google.com/document/d/1FCj4in8TwWMVeA-JvKX4r-5D8kRKEZfuS8z2xHkdVgw/edit?tab=t.5oww3vqwjxr0
Highly recommend the research this video is based on, by Adele Lopez: https://www.lesswrong.com/posts/6ZnznCaTcbGYsCmqu/the-rise-of-parasitic-ai
Hey guys, I'm Drew. This video has taken literally months to finish, so ...
Say whatever you want, turn it into a joke if you want to, but gpt 4o was literally the most manipulative model that ever existed
😂
somehow Sonnet is buggy in coding side...
Please create a video with the image i attached. #image -to-video Cinematic industrial mining scene with a large dump truck offloading coal into a pile. Coal pouring down with realistic physics, dust particles rising and drifting in the air. Workers standing nearby reviewing a checklist, slight movement in posture and gestures. Conveyor belts moving slowly in the background. Wind turbines rotating in the distance. Add subtle heat haze and environmental motion. Camera angle slightly low, slow tilt upward as the truck lifts. Strong industrial atmosphere, dramatic lighting, 4K cinematic realism.
Motion tips:
Key animation: coal falling + dust
Camera: slight upward tilt
Add: environmental particles (dust)
..OH
Which model is looking better so far? (For normal chat, no coding)
10
13
1
Qwen 3.6 Plus
Why is claude sonnet 4.6 in arena giving different results from claude sonnet 4.6 on the claude website?
One is definitely for Testing, and one is the actual product itself.
I've solved the captcha so many times but then it gives me the response 'Something went wrong. Please try again.
Trace ID: 7d310dd0-39d1'
It's not arena.ai anymore, it's captcha.ai
Is there a reason why the results like website/slides/diagrams are completely different?
I'm wondering if its the same model, the design algorithm shouldn't differ too much right?
I tested multiple times on the 2 platforms and each platform's design was consistent with their own, while being consistently different from the other one.
me having constant coding errors right now
that's the same thing i've been wondering about lol
Opus 4.6 Thinking in arena, acts like Sonnet 4.6 Extended
Before 4.6 thinking was completely taken offline, I noticed that the thinking process became unusually long at that time. Previously, the same content would take about 3 minutes to think through, but during that period it averaged around 8 minutes. For the 4.6 thinking you currently have in arena, is the thinking process very long or very short?
very long. my prompt usually last for 25 minutes as usual. or 55 or 1 hour and 22 minutes for my hard prompts.
now, it errors a lot
like wt fork is happening to sonnet
henloo
captcha psychosis
Feel you man 💔
Yeah! Recently I'm getting captcha so long that my request to LLM is just getting timeout!
There's a thread:
https://discord.com/channels/1340554757349179412/1451574502369656842