#general
russian vs captchas
And Gemini 3.1
or just 4.5 thinking with File Upload. i don't need vision at all
Like not again
that's 7th time now
the code is about to end, and it just crash like that
tf?
?
i've been asking sonnet a very difficult (Difficulty: Closer to being Impossible) simple prompt. where it would create me an HTML, somehow it ends up like that, and for the 7th time in a row, (in 5 hours) it ended up like that again
hi, does anybody know how long a session goes until I hit the session limit?
been investigating here though: some people who suggest ideas here never come back after posting one, even when the idea becomes a reality. (people who basically joined one time and don't chat again, like are u a bot or something)
Is me @empty sky because my account is at risk and I accidentally phone locked it
hello there
anthropic naming Mythos as Mythos, and saying they're not gonna release it in the public, makes me think, it's just a myth and it's just Opus 4.7 lol
after all mythos means myth lol
after all every hyped model isn't really as good as they say it would be lol.
(and i just used the word "after" and "all" two times and three times as i just typed this)
wow
i would like my Side by side be like that lol
Too slow
but Opus is out... sadly
Sometimes, it didn't respond
What are you using it for
mine would not be muse at all, but either GLM (which at this time, i may gave a praise, since my opinion changed, but it just lack something i need, which is file upload) or sonnet
turns out it's a session limit (by theory)
Chat only
Just testing
Took at least 4 mins in answer
What did you ask it?
Something about GraphQL of ig
Hey
If I use only for roleplay. Which model I should use?
I usually use Glm 5.1, sonnet models. I want to try out other models as well.
@burnt sinew
Why did you set the arena's current state as your profile pic?
Literally
gemini-3-flash
what the hell is this recaptcha bruh
fix yall site
this is so buns the recaptcha aint working ive been 30min hitting everything good and still tweakin
fix yall recaptcha
sh ai
At least it works now
Unlike before, when you had to use workarounds to get it to work
Besides, the captcha is a measure to prevent abuse
its not
i don't even complain about recaptcha, i complain about this goddamn error lol
that's the thing people should be complaining about
why the hell does it randomly stop out of nowhere
wtf?
so it's a random battle mode?
the models are outdated! this data is misleading to people who look at it and think gemma3 is still best (gemma 4 is out)... also the BLEND ratio should not be fixed
there should be a slider for the blend ratio, not fixed at 3:1
i wonder if that thing is even dynamic
yes… i hope they will give it back 🥲 or at least kimik3/dsv4 by the time they're released
for Chinese models are cheaper api
Hello guys
Dude, what's with trying to add points when the site isn't even working properly? Logging in is a nightmare because it keeps glitching
Stop adding Windows 11-style features for the love of God
why is the opus model in the arena gone now?
@blissful knot
that's the reason why there are no models
Well, for about a week now
We're waiting for it to be returned
were removed from April 3
What's the worst possible model on arena? My friend is asking for ai to help him with code.
im just generating image
I think so, yes, it is possible
but I haven't tried
Although I really miss GPT 5.4
let's try
that's arena's birthday gift for me, and it was removing those models. like tf?!
the image models are gone again after login
not on my birthday like bruh
and it was good friday at the same time
Hey guys, quick question. I have a ChatGPT plus subscription right now, and I want to ask if it's worth it to cancel it and get a Claude subscription? I mainly use AI for uni projects and stuff like that. My only concern is Claude having certain token limits and ChatGPT being unlimited. What do you think?
Arena Fixes Extension Demo Preview
yes
Claude is way better for exactly Optimized and good code
When GPT is for thinking and other planning kind of tasks
GPT rn is not the best at coding
Xd
well my friend @light siren can do that
but he says pineapple will kill him
so yeah rn it only fixes client side stuff
like chat stuck
Captcha
Skip button not appearing when forced comparison in direct chat
Copy Buttons at the bottom of your prompt so you can copy easier
The classic arena theme was the best
next ai assisted feature besides Enhance Prompt is gonna be
Save whole Chat History Context
as a one prompt
which is also good
incase LMArena chat breaks so hard
that extension cant even fix it
-# which kinda happens very often on claude-
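For anyone wondering what "save the whole chat history as one prompt" could look like, here is a minimal sketch in Python. It is not the extension's code (that isn't public); the `Message` type, the `flatten_chat` name and the header wording are all made up for illustration, and it assumes the chat is already available as a list of role/text pairs.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str  # hypothetical labels, e.g. "user" or "assistant"
    text: str


def flatten_chat(messages: list[Message]) -> str:
    """Join a transcript into one prompt that can seed a fresh chat."""
    header = "Continue this conversation. Previous messages:"
    # Skip empty messages so a broken or blank reply does not pollute the prompt.
    lines = [f"{m.role}: {m.text.strip()}" for m in messages if m.text.strip()]
    return "\n".join([header, *lines])


chat = [
    Message("user", "Make me an HTML page."),
    Message("assistant", "<html>...</html>"),
]
print(flatten_chat(chat))
```

Pasting the returned string into a new chat is the whole trick; how the messages get scraped from the page is a separate problem this sketch doesn't touch.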
You can add a button to bypass endless "generating" btw
uhh we already have that
this button is universal
all it does is....
ignore that chat is already generating a response
and send a new prompt
how is it good?
FIXES ALL ISSUES
even errors
How do I get this Arena extension?
Its not public yet
A new prompt or the same prompt?
cuz Chrome Web store wants 5$ to upload extension
new prompt
Really?
yes
And... the thing is Chrome Web Store doesnt accept my currency
either @light siren 's
I've created a twitch extension to automate everything
But I haven't gotten around to making one like the one you made for arena
cuz if you did you would lose some huge amount of braincells just trying to understand how tf Arena Ai even functions on such unstable code
on hopes and prayers ig
Literally cuz LMArena is financially supported by big companies
they dont rely on us... Users
thats why their User Experience is ass
Even claude has better user experience
Literally cuz LMArena
was financially supported by big companies
a
Basically what they changed is background color and the name and the font, and removed the thing that was saying "Find The Perfect AI For You" thing with the ai logos
and made it "Chat with our FRONTIER ais" or smth
It's more fun to find vulnerabilities
Like the one I found on *** three hours ago
Yea but ngl I liked old LMARENA background color better
New fonts aren't that good too
they be trying to make themselves look more PREMIUM while they cant just fix a damn copy button to be at THE BOTTOM of the user message
instead of the middle
Like is it actually that hard?
why is it important?
What's that at the start of the last line
@wary nacelle
Add an option to completely disable the math markdowns
When AI processes a complex regex, that function breaks the regex visually and you can't copy it
just realized it had a highlight
yea a blueish one
lmarena was better
arena kinda lags too
I didnt lag in lmarena, it was simpler and better
oh its a simple reason
smth happened to their Senior Full Stack Dev
and new one is kinda bad
okay now how tf do i upload it to chrome web store..
without 5$ commission fee
Huh
release to other webstores first
like rolls out in others first then u can think what to do for chrome web
like edge
Microsoft Edge Webstore & Opera Webstore are free yes but they are completely different frameworks and do not work on Chrome
AND
Chrome webstore was specifically made
to be dominant
so google can be greedy
and earn money
Oh
and sharing the extension source code is a bad idea
yea dont
people could modify and professional ones would probably bypass arena stuff by modifying stuff
release on their websites first 🔥🔥 (I use edge)
since yall already have this
image
my wants yall to find other icons
idk
he needs reference to perfect the extension Bring Back LMArena Theme thing
idk @light siren
dont use edge it makes ur pc laggy
Mobile edge
yes but u can replace em
nope
it's a really expensive model
unless u wanna use anthropic's website directly
kiwi second option cuz it supports chrome web store
but lemurs better and non laggier and specifically made for extensions too.
kiwi is nice
I've used it myself once
yea
yeah cuz that's an edited arena lol
It's an edited arena
prove it
take a video.
It doesnt look like fake.
gatekeep your shi
uhh okay. I will stay quiet
spammers the one that r mad
Might be fake dude
If they offered claude opus 4.6 for free
gg
they are bankrupt
what ads are useful for? if they dont have 1k member
What is this site?
Top example of being slow
Is everything more or less free there?
I don't give a damn about advertising
I need a website
haha
<@&1349916362595635286> advertising and misinformation
I noticed this guy is an alt
suspicious
No generous providers offer Claude Opus 4.6 for free
yes
i do
nobody wants your fake edited arena
take your advertising to someone else
LMFAO
he's actually slow
sad
@light siren And Users
what do you guys think about
making our (mine and Liam's) extension
an App?
excuse me wat
how would that work can u explain
soo..
Yk Electron-Vite @light siren
as i said its basically Browser
but made as an app
u wanna embed arena on an app
bad idea
what is this platform?
?
our
sdfgfsd
releasing on any official platforms would be an issue in that case
cause it's basically arena but with fixes
lets stick to the extension
he is right yk
like nobody asked
we do have our own solutions to use opus
so he doesnt flood our chat
bet
so where were we-
i want back my Gemini 3.1 pro + Opus 4.6 thinking combo
the app
yeah right
Ain't no way one guy ragebaited all of u "zs" bro's a menace
so why it wasnt a good idea make LMArena app?
ragebaited who
here comes the main account
with built in extension
then creating websites back to back lol
It's his main account
u wanna recreate the entire extension framework in an app?
how bro felt
whatever block
real
all? ahh... i just joined this conversation... (baliw yarn)
anyways the app
yes its ez
yes app in sizes will be lil bit heavy like 100-200 mb but even faster than LMArena
i think
how many days would it take?
well fine
prob 7 idk
would it be available for Android too
good question
@wary nacelle can we first make auto fixes option so the app just fixes automatically
U make that
i only make frontend
fine
Oh RIGHT
on prompts
lemme give you my current worked Extension zip
dms
i wish this server had proper voice channel
I know the platform
it used to but they are gone now, they must've just removed them
for study i think chatgpt is the best
there were two voice channels, "research lounge" and another one
Well-
real probably but kinda the website itself is made with ai-
I dont see the same structure so not sure
I see
Proxy?
yes-
Knew it
well not proxy
its called bridging
i can get the api directly
I just call it Proxy
then just plug in Opus 4.6 and other peak loved ais we all love
Arena Plus
Or Arena Extra
those models might be fake
@light siren
hi
Arena+
.
sure
Long prompts don't work
imagine
yall getting fans now
no.
lol
Nah
then
They're making a chicken named arena tools, it tastes delicious
it does taste delicious
nah it tastes like heaven
insanity
better with the side of LMArena Sauce
You are welcome
ts soo real
better that u can automatically fix the chicken that is stuck in one place and u can't get it.
ChickenArena 🔥🔥
KFC Arena
nice new Chat Mode name-
wait but fr-
i already have Cooking Ai
not bad idea to implement it into Arena
lowkey just make LMarena power with ai response
and the framework of cooking
and stuff
handled by extension
I forbid
You get like 2 prompts on that website
Meta actually made something good what?
help us find the old lmarena ui yall
Prompt Enhancer preview
Bros actually cooking wth
Oh its only the start
wait till you see i add to it Ultra Fast Web Search
- Chat context extraction
so incase chat breaks
you can start a new one
@light siren is actually cooking too
but smth else
Should change it to Arena Captcha.ai cause i keep getting a captcha every prompt
oh my gosh thank you
What is that old lm arena
Random thing made for fun
np
They did?
arena.ai used to be lmarena
where is video generation option?
It was peak before, but at that time i didn't know about arena
Very peak
Too bad i didn't know about arena at that time :(
Gotta miss the times when UI was good
performance was fast
no forced comparisons
No captchas
*Recaptchas
I hate captcha
It's alright when you need one or two verifications
But then it keeps going after 4 or 5 image challenges, and even marks right answers as wrong
which tool Lmarena use for video generation?
or just Arena Ultra, jk. is that based on phones lol
finally someone understood the reference of the joke
@hollow mulch i can tag myself lol
So who makes arena ai?
The site even got more known from TikTok
a corny name would be the reverse of Arena, which is "Anera". i don't think that word has any meaning, except for it's Japanese word according to Google Ai which means angel for some reason.
But why did arena have to add captcha? Because some guys create new accounts to use limited models?
well after all, the Extension feels like a gift from an angel.
It's just my theory :/
Guys, did yupp ai close or is the site still there?
@light siren what do you think?
hi im new
Hi bro
hi a potential tortured soul of Arena Ai's new "USEFUL" features update
They don't have enough money to afford them anymore
ah
basically for readers, as i said in my first word, it's gonna be corny or a knockoff version of arena
You can read in #announcements
who even pays this
i am wondering how the website still exists
so you're experiencing them too? lol
that's my current bug now
The same ai companies
wanna know why?
CUZ EVERY SINGLE PROMPt you send
EVERY SINGLE DROP
of data
is used for their data
and models training
basically no privacy
so basically we're the research people for those ai
It has been crashing like... every day, for months at this point. It was peak years ago
our projects are just training data
whatever data that we input, the output would become the AI's knowledge? am i right?
because that's been a theory for a long time
The customer is the product, as always.
no
it is corny right
yes
why still have claude opus in leaderboard lol
cuz it has collected data
and it has 1504 user opinions
It's still in battle mode
Yep. but Human Rats as we speak. just like Middle Class in First World Like but Third World Countries.
where Poor people get money, that definitely comes from Low to Upper Middle Class. (Upper Class would be different, since they're too smart to lessen their taxes)
opus is very good ai
i used to have np 20-30 anki cards
summary of yt video doing very good
but now i have only sonnet grok or this muse spark or something
before i used to use claude opus 4.5 and gemini 3.0 pro
lucky you, i combine Gemini 2.5 Pro/Gemini 3.1 Pro/Sonnet 4.6/Opus 4.5 Thinking/Opus 4.6 Thinking/Opus 4.6 majority of the time.
now it's reduced into two
i discovered arena in october 2025
before i just using chatgpt
and randomly gemini
still great
but didn't know i can actually integrate LaTeX coding until January
then i joined this DC server because of errors of GPT 5.2 and Opus Thinking
at February
and most of those errors were caused by prompts and the results of my LaTeX
i joined because opus disappeared
and other models
i still have chats with opus 4.6 and 4.6 thinking
sameee
and some others
i created account
i discovered this through a sharing subreddit
i still benefit in that subreddit, because randomly i would find some post, that there would be 60% in some certain cinema
like a promo like this, it's a sharing subreddit after all, idk if America has a version of that
yum yum
Economy Differences
And that save 96 is just a gimmick
Just increase the price by extra and label it as discount
Add one more to the list
yep.
J-co is popular here though compared to krispy kreme
you would find random resellers across a street.
they would be selling a dozen of donuts
not cheaper, more expensive actually
because it gets sold out in 8 hours
scalpers as you call them
Opus 4.6 is nerfed ?
what do you mean by peak ?
like the sota model ?
opus
Hello
Heavily nerfed and hallucinating a lot. The Api is still performing well, but not like it used to. Ask it a simple question like I want to wash my car and the car wash is 50 meters away, should I walk or drive? it says to walk to the car wash. Earlier, it would say drive since you want to wash your car. Opus 4.5 is doing well, use that instead.
you talk in claude chat ?
I use both the Api and claude chat.
Api in antigravity.
in the api it's nerfed too ?
Yeah, not doing good as it was before but it's still manageable not like Claude chat which is dead now.
ok so the best is to use opus 4.5
but opus 4.5 is less good than sonnet 4.6 ?
Yup use Opus 4.5
Sonnet is also hallucinating a lot, Opus 4.5 is doing good so use it.
ok thanks
how are they gonna make muse spark free long term
sadly opus 4.6 he can be the best
Am I the only one who won't see Opus, 3+ Pro, what else. Was I banned from trying top llms

glm needs file upload tho
how can i get out of this loop?
Opus 4.6 nerfed in claude chat ?
14
14
1
Yes
you can't actually, that's a bug
although GLM is just like a knock off version of Sonnet (100% of the time)
but it compares itself to Opus
It achieves the result I want much faster than Opus or 3.1 for me for more real-life (not hard problem coding) tasks
Glm's agent mode is very cool. Chinese AI impresses me a lot
is there a better chinese ai than Deepseek tho (Deepseek sucks right now)
like no. 2 ai you can suggest
(I didn't say no. 1, because no. 1 is disputable)
Glm is better than 3.1 Pro and Opus as I said; in what I said it's better, and it's not bad. Meaning it can achieve everything
Deepseek is far behind, but its MAI (Microsoft's) version is impressive
Benchmarks agree on all this
I believe it's still free on openrouter btw
(mai)
(the best of deepseeks)
soooo my stuffs gone?
i was making a website on here lol
bye
@echo aurora where is mythos bro?
why spam mods asking about an unreleased model on a sunday bro?
They took Opus from me.
is there an ai good for latex? (not prism, it's just gpt 5.4, degraded version)
I dunno
me opus? are u serious by that word
Check out texpen the github maybe
i'm not technically correcting, but, and but. "Cause they took my Opus" would be a better term than "Cause they took me opus" like what does me opus mean? the hell.
Progress of Arena Fixes so far
@light siren has been working on LMArena OG theme
waiting till he send his part so we can cook
(this is Enhance Prompt button)
bruh, these are something new that's why you see them like this
Like, stay 2 weeks behind plebs
it might be smashing something, but i do not believe the "bleep" out of it..., something is cooking yall
like there's no file upload...
how the hell does it win
the site itself has file upload, WHILE IN ARENA IT DOESN'T HAVE ONE, SO WATTT EN DA INTAYR WUORLD ES GUENG ON
crash out
Random question: do you think lmarena will remove Gemini 2.5 pro?
11
22
1
Yes
Glms are really seriously good if you're fine with like two extra fixes after the first prompt
I only don't code with glms now, but for basic tasks they're my go-to now
well for coding glm is definitely good
but not good for long files
because it usually makes typos
Yeah, actually glm-5+ are really, really good at coding
Solve everything for me. 3.1 Pro just does it slightly faster and prettier for me usually
dude making a system prompt is so hard
it keeps finding a way to bypass it and use it to JUST CASUALLY CHAT WITH THE USER INSTEAD OF ENHANCING PROMPT
What I found really bad about Geminis is that 2.5 Flash Lite with a good system prompt works much better than 3.1 Flash Lite. Why?
2.5 Pro is often also more fitting even though obviously less capable and a bit unnatural
I should prolly try another account and device
you made me confused actually...
like in my head right now it's saying:
Did he read the announcement?
Was he actually banned?
What the hell is going on and why did that happen to him?
Oh lord... have mercy
Which announcement?
in announcements?
like i can't use opus too
the fact, this app gave me a disappointing birthday present, something that will make me pissed.
Oh wow they're back but only in my old chats
Yeah Opus been unavailable for quite a long time already,
Announced on April 3, Good Friday. and it was my birthday at the same time.
like sonnet?
I see sonnet 4.5, 2.5 Pro, 5.2 high and recent 5.4 mini high
Oh, 4.6 sonnet is also there. And 3 flash
These are the top for me. No better
Are the providers removing their llms from arena? I see better models on screenshots in the announcements here!
Maybe used too much without voting much?
What stopped me HARD from voting more often is captchas, so annoying, every time
I wish I also knew what is counted: only the last reply or the whole conversation starting with the last vote
Cuz it's getting harder to vote immediately this last year
that's a good theory btw
Happened pretty much all of a sudden btw, I only made like four text prompts that week, and then after a couple of days...
it has a lot of errors going on?
Half of the time actually, esp for image generation
for me it's coding generation
For text, not half of the time)
ever since I joined arena, I've been SO invested in AI bro.
i use Opus for LaTeX after all
I'm gonna become a slopmaster
me who's only been using 4 AIs before
but perplexity sucks
no way u can js ask an AI in arena to make u a web with the Opensource Model link and it can install everything for you and load the AI.
Crazy
Found about it rn
Blizzard must create Slopcraft the real-time vibe coding strategy
U think Spud will be stronger than Mythos?
Ai Studio (found it in 2023), Chatgpt (before it became limited this year, disappointingly), Bing (a very good idea in the past, it just downgraded actually, and that's the actual problem, although Microsoft can say that they own OpenAI), then some independent random AIs or meta ai whatever
Dang sora truly was the only good video generator
Or should I?..
me personally, I am waiting for GPT Image 2 and excited for it.
Mythos is worse than opus 4.6 lmao
What
Mythos is hype bait thats too unreliable to release
just like how they hype, GPT whatever it is in 2019
Mythos is literally opus but not quantized and no guardrails
🤣
Which makes it more creative but garbage to work with
No way that github misposting isn't a part of their deliberate rollout strategy to cover the expenses of Trump's attack
that would be disappointing for real
If I ever need help with GPT related stuff, you're the guy π
cuz ur smart about Openai.
after all it's just Opus 4.7 to me lol
Thats literally true, read the paper
Nah its base for opus 4.7 they literally said that
Opus 4.7 will be mythos but quantized and actually reliable
me having selective amnesia. and i can't pronounce words oh yeah
So slightly less creative
what did i say... lol
Yeah but like they actually admitted they are using it as base for 4.7
So mythos is basically a hype bait
expect madaforkers
freaking out, saying it's good
Opus is good for being objective and not agreeing to any crap you write. Besides this, I don't know why ppl actually choose Anthropic. Their tasks aren't probably hard or demanding stability maybe
the name itself already makes it look like a fool. like "Mythos", are u saying it's a myth? like you should have just named it Omega
it's so corny
Wanna be rockstars. OpenAI did it better with o for omni
or Achaemenid Empire.
The next day after I came up with choosing omni for a lot on arena btw π
is that even the right spelling, from what i just wrote
I like how Jan Assmann uses the word "mythomotor"
In essence, everyone's drivers that embody the essence of current collective memory
i want you
die
ban
bro why all llm gone
There was some paper that claimed that for most tasks "instant" simply gives better results
is it just me or did people start to use gifs from giphy more often?
discord changed it
all ia gone
wdym all?
What's that?
like what model
i would want that. but i don't have the moneee
high
PLEASE OPUS 4.6
don't you guys understand budget cuts?
like haven't you experienced financial problems....
Have you noticed Geminis became much faster in aistudio recently?
2.5 Pro begins answering within like two seconds for me very often
its free vro
its our extension
it became flash
The fact that mythos aint much better than 5.4 pro is funny
Extension me and @wary nacelle made
like that's believable. cheaper to prompt?
Flash lite really
5.4 Pro is actually a real model.
And mythos "alleged" stats arent much better
if it's cheaper to prompt. it should have been here now, and named as. Daddy's wishes.
ampro mythos was mainly supposed to be a coding model if i am right so that makes sense
Mythos has less $/token but uses MUCH MORE TOKENS
like are we in Sims 4
being milked by EA
No mythos actually is creativity model
With no guardrails
And its still bad
🔥
uhh
Did they remove Claude Sonnet as well? Are even the free models going to start getting removed?
im not sure about that one
you were the one saying that gpt 6 is coming this week
i dont see it
Yes next thursday
This week was the plan
100$
so they're different from 5.4 high?
Gpt 6 was delayed from last week because mythos wasnt released
my eyes are wide open so lets see
Gpt 6 spud was supposed to come same day as mythos to literally make fun of mythos
then why the hell i don't see 5.4 pro before
Claude Sonnet 4.5 and other versions disappeared for me.
So instead they are pushing the plan (and superapp) first
Gpt 6 spud next thursday if stuff go well
Its not in codex, only web interface
Or in codex but you pay via api
It costs 10x of normal 5.4 xhigh
i have no idea where you got that information from
@echo aurora ?
but for me that seems veryy unlikely
people tend to believe that it will release in june
I literally predicted 5.4 release up to 5 hours
what the
caine from tadc recreated
pro confirmed?
Just watch twitter and insider trading lol
i never do lol
Its all there, obvious
But yeah, i had a screenshot somewhere, tibo literally was surprised mythos is private only too
Cuz they wanted 6.0 as counter for it
ok but is gpt 6 a generation leap or not?
Its multimodal
because i have been seeing it is
And thats MASSIVE
Are you able to see a screenshot? Can you also try a different browser? Model is appearing for me.
oh thats awesome
Its new base but its going to be more opus like prolly
Because its 6.0 not 6.5
Its like fresh start again
More creative, less reliable
but it definitely is not a big leap to agi
Unless they fine tuned it strongly with the bonus time they had
Define agi
Agi is literally moving goalposts
Thats the shortcut yeah, but define it
The original definition was "better than average human at most tasks"
well i really dont know how to define it because i dont really know what agi means
Which we went past long time ago
im just saying for some reason some certain people say that it is a big leap to agi
it seems very unlikely
Can average human write working code at 200 tps?
Can average human solve frontiermath problems?
there's no way something thats a big leap to agi is going to release this week or next week
Agi is already here.
People say agi while they think asi
Arc agi 1 was agi
Arc agi 3 is literally "more than asi"
Lol
well i guess i definitely dont know what i am talking about
or maybe im wrong
or maybe im confused
Agi is a bad metric
Guess which model is "most agi" rn?
Because the answer is funny and unexpected
no idea
Gemini 3.1 pro
Why? Because its multimodal properly
Yes its stupid
But its closest to agi
Because its multimodal
Gpt 5.4 and opus 4.6 are both way better at coding than gemini 3.1
so agi is linked to multimodal

Which model can replace you at doing dishes?
yall is mercury 2 good
so basically gpt is going to beat gemini's latest model?
Thats agi
yeah
For agi you need a good vision and decent smart
Not "very smart", very smart is ASI
world models
Yes because its a new base and its multimodal
Asi = artificial super intelligence
Asi means "better than humans at everything"
from what i understood
AI = mimicking humans
AGI = acting like humans?
ASI = overpowering humans?
Ai = neural net
Its different thing
ok
AI = does anything at all without needing to teach
Ai is like a fly or mosquito
Agi is like a dumb factory worker
Asi is like Einstein
anybody got an answer
i see
Garbage
Use gemini 3.1 flash lite instead
Mercury loses to qwen 27b
die
Mercury is trash
liam you're mine
Like literally
It uses outdated architecture
fair
gay ahh
I'm getting 6 questions from the captcha and still get it wrong, why does this have so many?
this might be grok's nightmare
2.5 Flash Lite outperforms 3.1 Flash Lite for what I suppose Flash Lite models are supposed to be
Noticed smth like that for text answers and some cognition from 2.5 to 3+?
Went completely unable to give responses in the right structure I requested
This was a shocker for me
Man the flash lite 3.1 is so trash even in my practice use
I wonder how they trained them and what changed
The flash 2.5 is better lol
I assume flash lite 3.1 can produce longer text-oids?! Lol
Speed and price.
And i literally never said its a good model i said its better than mercury thats different
immediate ban
There is no latest gpt high, no claude, no gemini, so why do you post in the announcements that every model is present there
Claude opus 4.6
Check out the leaderboards & use categories for a more filtered view of rankings: https://arena.ai/leaderboard/text

More reliable for Legal work?
15
20
2
Opus 4.6 Thinking
why do they treat you like a toy
<@&1349916362595635286>
we dont
these hacked bots GOTTA stop now.
What is discord doing
opus by far. or gemini
@echo aurora delete pls.
cant you set up autofilter for when those exact 4 images are posted?
whats this
Dude the hell you mean Mercury 2 is worse than Gemini lite 3 cuz of old architecture
Dude mercury 2 is the one who uses new architecture: instead of sequential token generation it does parallel generation based on diffusion
Basically what that means is it can even fix its own previously written tokens and be very fast while saving computing power, just give that model proper training like Gemini or GPT and it literally beats anything
All ais till now were just expanding their training data
Thinking if they keep expanding it will get better
That's not how it works
Mercury 2 was the first one to change the architecture and try smth else
And guess what Google is stealing that architecture
To make new Gemini 4
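To make the "sequential vs parallel" point concrete, here's a toy sketch in Python. It is NOT how Mercury 2 or any real model works: there is no model at all, and the `target` list acts as an oracle standing in for what a trained network would predict. The function names, the `per_step` parameter and the `?` placeholder token are invented for the example. All it shows is the shape of the two loops: one token per step with no going back, versus several positions fixed per step, including ones that were already written.

```python
def autoregressive_decode(target):
    """Emit one token per step, left to right; earlier tokens are never revised."""
    out = []
    steps = 0
    for token in target:  # oracle stand-in for "predict the next token"
        out.append(token)
        steps += 1
    return out, steps


def parallel_refine_decode(target, draft, per_step=2):
    """Start from a full-length noisy draft and repair several positions per step."""
    if len(draft) != len(target):
        raise ValueError("draft and target must have the same length")
    if per_step < 1:
        raise ValueError("per_step must be at least 1")
    out = list(draft)
    steps = 0
    while out != list(target):
        # Any position may be rewritten, including tokens already "written".
        wrong = [i for i, (got, want) in enumerate(zip(out, target)) if got != want]
        for i in wrong[:per_step]:
            out[i] = target[i]  # oracle stand-in for a denoising update
        steps += 1
    return out, steps


target = "the cat sat on the mat".split()
draft = ["the", "dog", "sat", "?", "the", "?"]

print(autoregressive_decode(target)[1])          # 6 steps, one per token
print(parallel_refine_decode(target, draft)[1])  # 2 steps, "dog" gets revised to "cat"
```

Fewer steps only helps if each refinement step is actually good, which is pretty much what the rest of this argument is about.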
man release arena tools fast
Tell Google to remove commission fees
Yo Google remove
They said no
Then wait till proper obfuscation of code
So that we just give the source
And y'all put it manually as developer mode extension
Who has registered a Google web store developer account with paid 5$
So we can upload
Or smth
Idk
We are broke
- our currency isn't accepted by google
sequential is upgrade over diffusion
The hell are u saying
gemini diffusion
or D3PM (Discrete Denoising Diffusion Probabilistic Models) by Austin et al., published in 2021
diffusion is old as hell and way worse than seq
thats why it was abandoned in the first place
its like "yo lets make wheels square again and call it innovative"
Diffusion-LM (Li et al., May 2022)
This is Diffusion LLM
D3PM proved the concept but performed poorly compared to autoregressive models, about 2-3× worse than GPT-2 on language modeling benchmarks
yes those all are
how to fix or atleast redo
diffusion is faster but dumb so its useless
diffusion is not even as good as NON REASONING autoreg models
Second of all inception uses diffusion aka parallel only for token generation, that means they can still train the ai on same data as Google or Claude and get same results but faster, also it has less hallucinations because it can fix its previously written tokens
All that is different is the way ai generates tokens
all those i mentioned use diffusion.
and diffusion is just worse literally
ask claude or gpt if you dont understand it
So saying back then diffusion was dumber gpt cuz it was trained on less data than gpt
Is same as comparing grok to gemini
mercury 2 is literally worst llm rn
Obviously cuz it wasn't trained properly no wonder
When Gemini Diffusion releases then you will see its potential
how hard is it to understand that thinking > hallucinating whole blob and trying to make it coherent
gemini diffusion RELEASED, AND THEN WAS SCRAPPED
Not my problem google failed to do that architecture
Second of all you are comparing a fish to a bird
mercury2 loses to qwen 0.8b thats how bad it is
diffusion is just trash
Diffusion model purpose of mercury 2 is completely different
"fast and dumb and cheap"
U are mistaking knowledge with intelligence
All ais are just memorizing stuff and mixing it together
knowledge is useless
ok you have 0 idea how ai works. blocked
enjoy your mercury while i enjoy gpt 5.4 pro
i WONDER which one is better
And yet you keep comparing a jet with an air fryer
Look if you are building a project like idk code refactoring ai or Ai Voice Assistant
U obviously would use Mercury 2
Cuz of it's speed
And for example voice Assistant speed really matters
And for simple stuff like telling voice Assistant to disable room lights or smth with proper system prompt
U would still use mercury 2
Cuz who needs it to be smarter when I want it to do simple tasks but faster
U are comparing GPU with CPU practically
GPU can do tasks at massive scale but not complex ones
While CPU does complex tasks but not at massive scale
You can already see that when going to inception labs website they offer u api not the mercury chat
Cuz it's literally made for integration of ai to projects be a thing
And fast thing not wait 19 mins until ai decides to enable the lights of ur room or smth
By ur logic coders now are basically stupid at milking the cows at the farm
It's simple task
Yet a farmer does it better
That's ur logic
And second of all to your statement "diffusion models are 3x dumber", let me clarify diffusion models intelligence like any other ai is based on training
And diffusion models are just harder to train
Because they are more complex
They literally think differently
Chud
Whatever making ppl like u understand that comparing a motorcycle to a plane is dumb is useless
?
Are u a chud
What is chud
U dont know in big 2026?
I am not Twitter or reddit guy so nah
Literally previous guy was saying "knowledge is useless"
Idk
is there any way i can extract a chat from arena?
Bro has anyone seen the model muse spark in arena