#general
We're collecting these possible false positives in this thread: #1447983134426660894 could you also share the prompt used?
👀 I wonder how it does
I hope we get Opus 4.6 thinking 32k 🥀😔
Is it just me or opus 4.6 thinking makes less detailed results than opus 4.6 no thinking?
Nah 4.6 thinking make more detailed results
HAY
Ka
Ehh idk
Can you show like examples
.
Why have a model up if it always exceeds the limit? Also, on the 2nd attempt it crashed after its thinking, so perhaps it's a token limit instead. I would guess it's just instability from everyone trying to use it all at the same time, etc.
Yup that limit is screwing almost every prompt I try. I guess this model thinks so much more
I can’t even accept the terms 🙁 I think it was because the captcha was bugged 🙁
Q: Has anyone ever managed to convert a painting into a photoreal image using Nano Banana Pro? I can't seem to get it to work; I've tried all kinds of prompts
Thank you!
yea dude this has to get fixed. unusable
This is an image edit no prompt
We are looking into this.
It's been a problem for ages and it still hasn't been fixed
Can you try a different browser? With all of the issues you've seen today something is for sure off.
It is worth noting that this "Something went wrong" error message is the generic error message that'll trigger for many different reasons (including rate limits).
I get it a lot, sometimes I get that message, but I can just refresh the page and the image is actually generating perfectly fine. Other times I get the message for seemingly no reason multiple times in a row, and I have to wait a few minutes before it works again, and that's not due to me hitting my rate limit
anyone here know how to get free ide usage for claude opus 4.6?
Idk how you guys will be able to fix the captcha. It does its job well; if you loosen it, it will be easier to bot. It’s a double-edged sword
Same thing here. The model thinks for a long time and is interrupted by the response limit when it's time to give the answer. I think only removing the response limit will make it work correctly.
kiro ide you can sub for $0 for first month i think if lucky
I don’t really follow social media. I don’t have a Twitter or anything like that and I kind of stopped using Reddit but you know how I knew that anthropic was about to release a new model?
This seems VERY interesting https://discord.com/channels/1340554757349179412/1469144854264021174
Dario Amodei, the CEO of the AI company Anthropic, joined "Top Story" to discuss his new essay "The Adolescence of Technology: Confronting and Overcoming the Risks of Powerful A.I." In the essay, Amodei warns of the risks that come with artificial intelligence and also spoke with NBC News' Tom Llamas about AI regulation and control.
For more co...
Cause he always goes on the news before they release a new model and always starts preaching
opus 4.6 easily gonna be top 1 model on lmarena by a large margin
Ofc it’s new
He’s tripping if he thinks it’s gonna replace software 🤣
It’s like saying ai video model will replace directors
In code, but in text? I don't think so
nah ai will definitely replace software engineers tho
like out of all the things any ai learnt
what was the thing it mastered first
it’s coding
apparently coding is the easiest thing for an ai to do
im gonna say that by the end of this year ai wld be better than 60-70% of engineers currently working
It already is I feel like with opus 4.6
Ya’all crazy
My favorite moment is when OpenAI released GPT 5.3 codex but didn’t set up any api for it
This model was so rushed
Really makes me sad cause I thought they’d take their time and not just rush out a release
is gpt 5.3 codex available on lmarena?
very interested to do a comparison between this one and the opus 4.6 with same prompt
Not yet since OpenAI hasn’t released it on their api
you are robot
if you have vpn it might be cause of it
then idk
the opus 4.6 non thinking was already better than the 4.5 thinking in my test so i wonder what the thinking version can do
what u play on roblox mud
Nothing cause i can't find any good games for now
i already played every good existing one and im tired
Why
. 5 series all cursed
u play demon hunter?
Maybe it's your browser. Try logging out of your account and back in
What is the rate limit for battle mode in the arena and when does it reset?
no sorry
is lm down ?
Is there any ai or website that can build full stack apps for free ...without any paywall
i literally only play armored patrol on roblox
These reCAPTCHAS are so annoying, It doesn't go away at all
I selected all fire hydrants, yet it is telling me to try again
Log out, log back in, clear your history, and try different browsers
i think gpt 5.3 is a big improvement actually and is better than opus 4.6
can't wait to see even stronger model but the improvement is already cool
Heyy! Anyone have opus 4.5 free trick? I really need it like we can share files too
opus 4.6 seems to do exactly what you're asking but it's almost every time less aesthetically pleasing so idk
Where can i access opus 4.6?
both claude website and lmarena
Okay, thanks buddy. Does it have file upload and vision?
Sorry i don't know actually
its here, try on lmarena
oh my bad
i just understand what you said
No I mean the upload and vision capabilities
yeah sorry
Yes but im mad at it cause even with the option "always proceed" it will ask me to confirm every prompt, ive seen other people with the same issue and can't find a fix
Yes and the rate limits sucks
yh
I use cursor so i don't know if there's a better one
Depends on model
Videos are currently having an infinite generation issue it seems
It didn't work
lol i tried opus 4.6 its been thinking sooooooo much ive never saw that before
brainstorming on my prompt
thoughts on opus 4.6 vs. opus 4.5 for software engineering tasks for actual software engineers? In terms of real world usage? I noticed opus 4.5 edged out 4.6 in 2-3 metrics on the official release document, but opus 4.6 did better on most.
But in terms of real world use, what are people's thoughts comparing the two thus far? I've read that opus 4.6 can be "too agentic", but not sure if that's a universal opinion or just a one-off
i don't know why but its been 6 minute thinking non stop on my single prompt
no lag its just really thinking
too much
I read that's expected, but I bet that's eating a ton of tokens..
and for simpler stuff, that's probably not needed
For example for devs who like to take control of the process and implement step-by-step with less "do it all for me"
if you want i can send you the whole thinking for a single prompt (its not done yet)
its impressive
how much it actually think
That would be valuable to see, if you don't mind. you can dm me
yeah i send you that
I've used 4.5 a lot, but haven't experimented with 4.6 yet
i sent you the thinking
Got it, thanks! I'll respond in dm
literally nevermind i forgot how good good chat + bad cli is
I’ve been testing this out for a little
Try making a new account not Gmail
I’m not sure exactly but it helps sometimes
Unless you're sending requests too fast to the model
Then you're going to get them
Also try resetting ur modem if all else fails
Okay i figured out something, opus 4.6 for some reason, when given a complex work to do, it will think without actually writing code, you can see a huge thinking during 5 / 6 minute then error from lmarena, but no code written, so you have to tell it to actually write code
its weird but yeah it is how it is
When's opus 4.6 going to be preliminary on leaderboard
We'll be sure to post an announcement when the leaderboards get an update with opus 4.6
Does max use opus 4.6?
Hey, is it an issue from lmarena that opus 4.6 actually think so much it end up saying an error happened please try again ? it's actually very very often, the model will think 4 / 5 minute then just "an error happened"
its when given a complex work
code
New feature in testing. 👀
Take a screenshot
is just not working
Why is it even there? To take screenshots of the current session?
it's still in testing
And the transparent error popup is nice but why is it at the bottom?
It should be somewhere the bg is clear
it's not even out it's still in testing so it might not work that correctly yet
don't expect anything that is still being tested to work instantly
They shipped that means it should work
Atleast somewhat
Models in arena are executing commands! They are installing packages?!
I believe it has the potential to use all models in Battle (excluding the codenamed models).
I hope these tools come to text modality or a completely new modality where it have all these tools available to use :>
Without the system prompt of code modality
There is going to be a some instability with the model today as it's getting a lot of use as you can imagine.
Thank you, and do you know when gpt 5.3 will actually be available on lmarena ?
When the API is out
There is no API for arena to use them right now
For the most part I won't be sharing details about if/when new features/models/etc. are landing.
Oh i see
Thanks for sharing this, will flag. 
Any particular steps to repro this? I wasn't able to get this error.
So whats the opinion about opus 4.6?
Cuz i just found out about it like this second
Good
Def better than 4.5
But benchmarks say the new gpt 5.3 codex is better
But until the API is fully out we don't know
I mean... its out on openrouter so who knows
opus 4.6 did a decent 3d world much better than opus 4.5
but gpt 5.3 might be even better
at coding and overall i guess
Just click the button?
Maybe cause of brave browser, try to replicate it with brave browser on android.
Hey pineapple can you create a separate coding channel?
Guess the models behind these.
They are made by sota models of the same lab
We don't know but we do know it's better in terminal bench which is like a important benchmark so the rest I think Claude wins on it
But the difference in one version upgrade is so much like bruh what
It could be this single instance but we will see
Made this with Claude 4.6 thinking
https://019c31d8-ccde-7e39-afa3-188c58cd1868.arena.site/
What do y'all think
Who here is a nano pro?
I tried
How do I get this? Is it slowly rolling out?
Hi
I want to work with my GitHub repo. But it doesn't support a GitHub connector like perplexity. So what to do now?
Like I want the ai to create pr make changes put commit etc
Do you know what type of thinking the thinking mode of Claude 4.6 on arena is?
Low medium high or max
Me using Opus 4.6 after the announcement:
why did you edit it
it was correct
I want to work with my GitHub repo. But it doesn't support a GitHub connector like perplexity. So what to do now?
Vencord
Good
I see that daily with opus 4.5 32k. Finally found someone that struggles with me
Omg lmarena became Light...
You can't for explosions, you can't for soldiers falling off the horse....
Drastic even chatgpt does that
Guys how good is opus 4.6?
Please add copilot too!!
I'll let my work speak for itself http://localhost:8000/index.html
Check if you get this error too
Copy the prompt you sent him and refresh the website and send the prompt you copied
No it's on text
Ohh
Code and text
I might have to test it for creativity
Take advantage NOW because there is no limit
I lost my laptop so I lost all the motivation to do anything else 😭
No
I’m gonna use it 1 time
Guys what glasses mean on ai models
I was using it many times yesterday 😭 still didn't hit limits
Smart or something
Copilot is just chatgpt
Copilot doesn't exist
Vision
Like it can see images or something
I think vision is the file icon
The file icon is pdf
Oh alright
Wait Claude 4.6 doesn't have rate limit?
Sunglasses is vision
File is pdf upload
Alright thanks
I mean I used it many times YESTERDAY still didn't hit limit
Like how much tho
Maybe thinking version only has limits?

I was using thinking
Oh alr
Uhh I think 10-20 messages
So how good is opus 4.6
I mean I spammed it yesterday day night and didn't hit any limit
Or is it just same as 4.5
Smarter but 5.3 codex wins in one benchmark
I love it, I fixed a project that been working for while no other ai fixed it. Thanks opus 4.6
But every other one Claude wins
I mean one benchmark is no big difference but still good
Probably rn it's free but today or tomorrow will add limits
Well damn
Yeah it is
I think like the way it run commands on terminal
First time seeing chatgpt cook
They released the codex 5.3 the nano second claude released 4.6

I wonder if they'll add search to text arena for better results maybe
Like not cutting off ai from internet
Grok is probably the best for up to date information
The heck are these models on arena
The hell
Search already exist
What ai is named beluga😭
I mean like search for the text arena not the separate thing
So does anyone know the data training cutoff for this model?
Opus 4.5 i think it was early 2025
Ye it's kinda annoying
If I remember I told a model to search online and it actually did while thinking on text arena
Probably 2024
Thats less than before tho
Idk you can ask it and it will answer
I wonder why are the trainings are even cutoff
because they can’t learn like that much reliable data or
Yeah well 4.5 says its early 2025. And 4.6 isnt out yet
like Claude says oh it has reliable data up to April 2025 but can still get info up to July or something
Glm 5 is trying to fix this so it has access and gets info without even searching because it searched before
To not know what happened today so they won't take control of the world
4.6 is out...
You're late
It's out
🫀 🫀 🫀
Not on the model list it aint
It is just search it
It's the last model just scroll all the way down or search opus
Like the last model in the selection
Ah it was hidden at the bottom
Expected it at the top
My bad
They did that on purpose I just can't prove it
Anthropic being themselves
I guess nobody is safe from greed
the execution of code doesn't work?
It's at the end because it's not on the leaderboard rn.
Sorting is the same as the leaderboard, and models not on the leaderboard are at the end
Well its still early 2025
Slightly disappointing
Was hoping for more recent stuff to be more available
Like knowing who the new pope is
I think they did that because they know ai can see the future before us
Well atleast I have a domain
Ok but still. Not even a middle 2025
They might do it on the next model because Opus 4.6 was in training in 2025
Probably
Will gpt 5.3 better than Opus 4.6?
Training for these Minor Improvements?
💀
Probably Not
That is a good point and yeah
The 4.1 and sonnet 4 had 2024 data before the upgrade
Only in terminal bench
Nothing confirmed if GPT 5.3 is coming this month just don't believe anything you see. But my guess is No
Where does the arena get the money to pay for all these expensive models?
But Codex 5.3 is already Out
Kk
See their blog
But not the api
SECRET
True
I remember that in December it was thought sonnet 4.6 or 4.7 was coming out that month
Well idk why but I think they trying to improve something on Gpt 5.3
Same deal?
I C
Makes Sense
SO TRUE
I mean internet is just full of lies
Now I think, if this is true, what will they release in place of the actual next successor of sonnet 4.5??
Haiku???
What about haiku?
Haiku is boring no one cares
Its okay They hide our identity so yeah
Same lol
Truth
They get funding
Yeah from A Crypto Venture
/PRNewswire/ -- LMArena, the open community platform for evaluating the best AI models, has secured $100 million in seed funding led by a16z and UC Investments...
They’re burning some money though
Just like most companies
Except they’re not technically a business somewhat
Lightspeed, Laude Ventures
ngl i thought it was sonnet 5 instead of opus 4.6, did not expect that
LMArena has secured $150 million in funding at a $1.7 billion valuation to advance its AI benchmarking platform. The article details the company's crowdsourced evaluation methods, challenges with traditional benchmarks, top AI model rankings, and how developers use the service for real-world testing and insights.
So
In terms of writing long texts do we still need to wait for it to be polished or what
Cuz it keeps crashing
Does it still need those funky numbers at the end of the model?
Haiku is good combination of speed and intelligence
Wasn't that for sonnet
Or is it way faster
is there a way to find out the exact model the max routed to not just the organization
since it could be helpful to see what models are best at what prompts
gimme money 
In the lmarena leaderboard Gemini 3 pro ranks 1st, while in the Artificial Analysis leaderboard Gpt 5.2 pro ranks first
Is Artificial Analysis biased?
They only rank using trust me bro benchmarks...
crack bench
idk if this one works
who invited beluga into arena bro
why is opus 4.6 the lowest here
Yes, this AI just take ur prompt and put the best AI for it to answer u or speak
To be honest it's actually decent
I would still go for sonnet for balance tho
It can be since google model until xAI
so that their money burned less fast because less attention
but its the best of all?
mr krabs
🦀 money 🦀 money 🦀 money
(that's the answer)
REAL
Yeah sonnet is for everyday tasks and you can use haiku for fast and quick answers without being too concerned about it hallucinating or being wrong. It is the fastest model by claude and seems better compared to other similar-size models
It might be true lmao
It shows improvements from sonnet 4.5 but not from opus 4.5
wait what just happened to sonnet creator
wait gemini 3 flash thinking have thinking? i forgor
Dam Gemini 3 Flash is the easiest one to jailbreak lol
Gemini 3 flash literally doesn't have a non thinking variant
I’m getting non stop time outs with opus 4.6 thinking, meanwhile with the same question 4.5 has never timed out …is this a bug or is it because it’s brand new and needs more time ?
Cause of its hallucinations and behaviour
Make minecraft with touch controls
Opus 4.6 thinking
https://019c32cb-de47-77c5-aaf3-30b23c910a2f.arena.site/
Gpt 5.1 codex max ultra premium pro high figh wifi bluetooth
https://019c32cb-de47-78f8-8f93-6c88f6728a2d.arena.site/
Wth is 5.1 even doing wth is that
What model made this?
It's good but the floor
Anthropic's Claude (Opus)
Specifically opus 4.6 thinking
I just said Make minecraft with touch controls
One prompt
I liked it
Gpt 5.2 codex
https://019c32dc-b394-79d2-a3f1-08e74bf4e7ae.arena.site/
Gpt models are so weird rn
opus 4.6 takes forever to answer a simple prompt in arena
OAI is history
Cause its not made for simple prompts ¯_(ツ)_/¯
Yeah
now even direct chat has captcha?
Anthropic (and Deepmind) have defeated OpenAI
like come on ...#
Its literally made to think and reason for the hardest problems
-# later, xAI might join the victors
It already had it from the start
this is annoying
I never liked it, I prefer Gemini and Claude
it was only on voting session
Yeah gpt lost it with gpt 5 launch
can anyone try this prompt for Gemini 3 Pro GA?
Create a nice looking and rich SaaS about Gemini 3 Pro GA by Google Deepmind, it must has a mock about the Gemini 3 Pro Preview which is so lazy and it's fixed on Gemini 3 Pro GA, output in single html, must use tailwind css(cdn) and i don't want shiity website, should be really cool and good and should never use emoji in the html.
i never get it bro
i only get gemini with google logo bruh
so please any1 try it
not working anymore
does this guy favor Lua language?
No, google captchas were already on direct chats
I don’t know how they’re gonna replace them. They need it for anti bot
Because it’s really effective
they need to do smth about it
That’s the thing I don’t know what alternatives there are I can’t think of any
10 gallons of water per custom chat title
Fr
a lot
Thats not correct bro 😭
not sending any prompt anymore
Oh yeah, it’s not correct but there’s a bitter drop of truth in there in a general sense
Exaggerated
This one is nice one too ↑
It isnt even rerendering anything 😭
the ga deleted?
actually it does
It's just a gemini 3 pro model
One is direct with logo
another is stealth with logo
I literally have zero interest in gemini models cause of their hallucinations and attitude issues
its only available in battle mode?
probably so
Yes
i have a gen with 2 gemini 3 pros
mm i see
can u try it with my prompt plz
i have tried it for 2+ hours and i only get the gemini 3 pro with logo
the GA is also with logo, you can only tell by quality
does anyone else have the problem where opus 4.6 thinking just thinks for too long
and then stops working
Just three looks weird, they should have extensions showing
like 4.5 would think for a minute max before it starts outputting something
Better than Claude lol
Hmmm
bruh😭
It made a floor
Attitude?👀
.
Yeah i dont like its character, its lazy, doesn't accept its own mistakes, doesn't follow instructions, a literal karen
When i say dont be lazy, instead of doing work it literally says i am not lazy 😭
I had to beg with Gemini flash so it follows my lead
Yeah flash is even worse
Within thinking it's the worst
They dont have any reliability
It removes features
When I ask it to add a feature it removes the previous version of the code too 😡
So now I just use GLM for coding
Glm is good
Its tool use capability i like it but it outputs literal articles even for simple questions
A system prompt makes it give one line answer...
I have to instruct it how to use its tools
The model is so agreeable too
Isn't gpt 5.3 codex assisted by 5.2 codex in its creation?
No wonder the front-end capability is meh
Try yourself
Same
although i still think glm 4.7 is the best chinese model yet
I haven't used it for front-end
alongside deepseek v3.2
What about Kimi 2.5?
meh
starting to look more like gemini 3 clone
its heavily trained on gemini outputs
Pro.
it knows what year it is, just does silly messups..
If it knows what year it is, then why does it do the silly messup.
That's like me saying I sometimes forget my name.
That would reduce my credibility and reliability by a lot.
2 years ago I saw a brutal AI math error from Meta AI: it worked through the whole equation, but at the final step, 255 + 1, it said 257
i hate it when models say “you are absolutely right”
like why can’t they be right when i first asked the question
Honestly same.
I have to argue with, and debunk the AI's false claims, after which I realize I just wasted my time arguing with an AI.
Grok is the best with timelines tbh
You're absolutely right
It doesn't really get years wrong or dates
Gemini 3 when it doesn't hallucinate absolutely cooks especially in language
Sadly it hallucinates like crazy
Claude ofc
I agree, but i still want to see, if Claude Opus manages to get 100% in this poll, and which model lands second place.
how to fix it?
🙁
(..the 2nd poll)
but does it go back to normal? because I already tried closing the browser and everything, and it didn’t work , it’s still giving this same problem
website is down?
Uh no
🙁
alr mate
did you use multiple accounts like me?
Nah
What are Opus Rate Limits on Arena?
has anyone tried this on opus?
Opus 4.6 thinking gives a timeout when thinking longer than a certain time. Any plan to fix it
@echo aurora are there any plans for adding gpt 5.3
They can't
The API for 5.3 codex isn't out
is there an api for talking to @icy yew
After the API comes out
login
dont work
The new Claude program is still unusable for me; it thinks for a long time before giving an answer, but is interrupted by the site's limit. Has anyone managed to use it yet?
exact same thing is happening to me too
do we have gemini pro ga candidate on LMArena?
What is the ranking according to this group?
claude 4.6 > gpt 5.3 > gemini pro ga
OR
claude 4.6 > gemini pro ga > gpt 5.3
i believe these are the only two possibilities 🙂
At most for me it takes the maximum 30 seconds
For hard stuff
Im tryna get it to make a story game
That lasts like 10 mins
Which might explain it
My ranking Claude > Gemini > Chatgpt
Code?
dude why cant i test any model on this website
it reasons for 5 mins and it breaks
What's ga
General availability or something
wow you guys havent touched gpt 5.3 codex
I'd definitely rank claude 4.6 over gemini 3 at this point, esp with gemini 3 being so terse
Gemini 3 is still better overall tho
The main problem is it hallucinates like a lot
idk my big problem with gemini 3 is it doesn't do long outputs. also it always wants to name characters Elara Vance
Can u guys release a premium version of the arena , so theres dedicated support , most of the time server crashes on my experience , just an opinion btw
Wdym server crashes
Does the website not work or the ais
What's causing this error? I haven't used it at all today, I don't have a limit.
With Claude?
yes
I use 4.5
(
Eli Vance half life
Gemini sucks. Although it’s all I use right now only because of nano banana
Well not sucks but has issues
It's always the same, Google nerfs models after 5 days of release, the same thing happened with Gemini 2.5.
Nano banana lagging rn right
Ah so would ai studio one count
For chatting sure maybe. For coding? Not in the slightest compared to opus 4.6 only thing going for it now is that its free on cli
i love max
why does max not tell me what model responded?
it would be useful to know
i guess max does not know it?
because anthropic is internally routing?
Going to respond in #ask-here in a bit 
Are you able to create a post in #1343291835845578853 and explain a bit more what's going on? What modalities are you using, is this mobile or desktop issue, is there an error message, etc. Anything you think is relevant would be helpful to know.
It is going to be a bit unstable, but it is working. I'd recommend trying the steps in this article when you run into this error message.
Python in codearena?
why lmarena isn't opening
Opus-4.6-Thinking is too unstable for me with longer tasks, 4.5-Thinking-32K is way better.
Ya gem 3 is sad to see what it became
Google says It makes these insane codes and when you actually try it it ain't the same level as they showed
Yes for me it times out. I have been copying its thinking context and pasting it back to it so it can have the very long thinking times. Otherwise it will just keep trying and failing
This has been flagged to the team btw cc @burnt sinew
Tbh I thought it was an issue with every model... even gemini 3 officially errors out at around 10 minutes
Thanks, I usually analyze/optimize Mesa/Linux Kernel files of around 2000 lines of code. Opus 4.6-Thinking really struggles there.
Wouldn't you use 4.6 non thinking then?
Or... what i said earlier with copying thinking context manually before it errors
It usually errors out within the thinking process already. But sometimes it finishes thinking but then only gets not that far with the answer.
thats glm 5
....
every model that has alpha in its name
was a cloaked
openai model
oh my god the joke flew over my head
im such an idiot
If you copy the thinking it'll pick up from there
Does the server in your tag really have free ai usage like lmarena?
Yeah, that was a workaround that I've used so far.
Well, yupp gives out credits and each prompt costs credits, but you get credits by reviewing the output (with some gamification, so sometimes you lose some, sometimes you get much more credits per review).
Payout to EU has been suspended though, hence I didn't make any money there.
Like they don't give credits to eu?
You still get credits, but you cannot cash out via Paypal to EU at the moment. But I am more in there for science and the access to the latest models.
Cash out what??
You can make money from there?
Credits = Money -> Cashing out your earned credits.
Yeah, 1000 Credits are 0,90 EUR at the moment.
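For anyone wanting to work out what a credit balance is worth, here is a minimal sketch of the conversion, assuming the rate quoted above (1000 credits = 0.90 EUR) stays fixed. The function name is made up for illustration and is not part of any real yupp API.

```python
from decimal import Decimal

# Rate quoted in the chat ("at the moment"); assumed fixed for this sketch.
EUR_PER_1000_CREDITS = Decimal("0.90")


def credits_to_eur(credits: int) -> Decimal:
    """Convert a credit balance to EUR, rounded to whole cents."""
    eur = Decimal(credits) * EUR_PER_1000_CREDITS / Decimal(1000)
    return eur.quantize(Decimal("0.01"))


print(credits_to_eur(1000))
print(credits_to_eur(5000))
```

Decimal is used instead of float so the cent rounding is exact.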
How much free credits do they give you
Did leaderboards just update?
No announcement yet
There is a base of free credits at the start. But you need to earn more credits to pay for each chat. The model costs vary. The earnings vary a lot, too.
Crazy
It somehow works, though.
But as I wasn't able to cash out at the moment (and it might take many more months to resolve it), it is more interesting for people outside of the EU.
@echo aurora What's the difference between code and text->coding
which is better rn? Opus 4.6 or Codex 5.3?
From what
to gemini 3 pro
What is
I mean what is that from polymarket?
Doesn't look like it
Ah
making a portfolio website
@echo aurora Also what was the point of not including thinking 4.6 and have just normal 4.6?
Code is going to be Code Arena leaderboard & Text->Coding is coding tasks done in Text modality, leaderboard here.
We're still gathering votes for thinking version.
Shouldnt there be more votes on the thinking one? I would assume more people would use that one
Votes are generated via Battle, not Side by Side (where you can manually select a model)
Are there currently plans to make code arena eventually include non pure front-end tasks?
@echo aurora I think 4.6 thinking has higher error rates
It does he already forwarded it
ah gotcha
I wouldn't want to share more info about future plans until we're ready to, but overall our team is wanting to bring a lot more features to Code Arena
oooo
Yeah thats fine
yo
4.6 opus is really good, the only issue it has is no access to external textures and libraries
when coding
im really excited for claude 5 sonnet tho
its supposed to be huge and even better at coding tasks than opus
whats the rate limit of claude 4.6 think
i think like 10 prompts
or 15
great
send it
Thats your job
u cant do that in lmarena
Yeah you can provide it links to assets
i meant if it had access to search and could browse websites that would be really great
does it work on the current models in arena?
if u provide links, it cannot open them or extract anything
It works on every single model anywhere..
how do i send opus 4.6 think images
Yeah but it can include them as assets??
Nope
so you send the links and say to put them in the assets?
@zealous sparrow
You don't need to open a link to use assets in code
Yeah direct link to the images
And say the dimensions of the image
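A rough sketch of what that looks like in practice: put the direct image links and their dimensions into the prompt so the model can reference them in the code it writes. The helper name, URLs, and sizes below are placeholders made up for illustration, not real assets.

```python
# Hypothetical helper: builds a prompt that hands the model direct image links.
# The example.com URLs and pixel sizes are placeholders, not real assets.

def build_asset_prompt(task: str, assets: list[dict]) -> str:
    """Append a list of direct asset links (with dimensions) to a task prompt."""
    lines = [task, "", "Use these images as assets (direct links):"]
    for asset in assets:
        lines.append(
            f"- {asset['name']}: {asset['url']} ({asset['width']}x{asset['height']} px)"
        )
    return "\n".join(lines)


example_assets = [
    {"name": "bird", "url": "https://example.com/bird.png", "width": 34, "height": 24},
    {"name": "pipe", "url": "https://example.com/pipe.png", "width": 52, "height": 320},
]

prompt = build_asset_prompt(
    "Make a Flappy Bird clone in a single HTML file.", example_assets
)
print(prompt)
```

The model never has to open the links; it only needs the URLs and sizes so it can write them into the generated code.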
What's the rate limit for Claude 4.6 thinking
i need help: which is best, opus 4.6 or 4.6 thinking, for python coding?
Thinking ofc
Like I made flappy bird 1:1 clone using that it just took all asset links
yea i realized, im dum lol
i just need to search for the links and ask it to put them as assets
thank dude
although gemini 3 pro could put the assets in itself, like planet textures, without even being asked
i think gemini 3 pro training data is pretty good
Yep
It did that for flappy bird
But it used external asset links
yeah, my point is that gemini 3 pro has a lot of assets in its training data and can use them without the user noticing
Yeah sure or just try asking opus to use its own asset links
16k I think
So it's better to use Thinking 32K
They don't have it for now
Hello
Yes that's what I meant
We don't have these rate limits publicly listed. Although this is something we're considering.
is arena ai downloaded for ios or no?
Did they fix the response bug
It's a website
Nope, there isn't an app, but this may change one day.
Which bug are you referring to?
"Something went wrong"
Okay, thank you
i am talking about the bug that cuts the response off.
You mean after it thinks for a while?
I'd encourage you to scan #1343291835845578853 for a similar post and share/tag me there. Or if there isn't one that directly lines up with the problems you're having create a new post.
Btw is Claude 4.6 dynamic thinking the same on arena?
Alright i tried the 4.6 model in writing stories
Dont know what to think honestly
Like the writing is not BAD
But it feels drier in dialogue for one than previous models
Can't speak about code or other such things. I Dont use ai models for such things as code or image generation
I am using claude opus non thinking and this happens..
Oh yeah that happened to me too in writing. I was using thinking tho. 4k words work i think cuz i also had it work on shorter chapters, but 8k words crash
I think opus 4.6 has some limits
It says every time "I have to make it in my limit of 20 steps"
When it thinks
Tf does 20 steps even mean
What's 20 steps
I will say this tho. The restrictions are more loose in what it rejects from writing than the previous model
Hello I am new here
So the thinking being set to high could be why it hits the thinking limit and gets an error
Bro i was using 4.6 thinking and it kept getting stuck in the middle because of thinking
Idk
Exactly
The lmarena team needs to increase the context limit in all responses for models of this type; this way, that frequent "error" is avoided. It even avoids giving the model instructions for response limitations, while also saving time and writing.
The typical limit is approximately 8350 words per answer in Claude, but lmarena has to increase the limit until everything is completely finished and not limited.
You can do it without this but it will require splitting prompts
Mine requires a detailed and complete answer; therefore, it is necessary to apply these restrictions, instead of putting everything in a single response.
Finally opus 4.6 thinking actually think for some time not just for 1 second
This has been flagged to the team btw.
This is probably the core issue regarding the errors when using Claude 4.6
Hope it gets fixed soon

Idk it doesn't happen to me now
Perhaps it's just a matter of luck
It's fixed for me
opus opus let me use opus
Please arena I need opus 4.6 thinking 32k
My opus 4.6 thinking is kinda homeless
I live with my opus 4.5 32k
Thank you for the notification.
@echo aurora how old are you? 
Asking a manager his age
Ayo chill fam 😭
@echo aurora why do i get an error that corrupts the whole project, when I use the coding arena?
i tried this in many chats, and 50% of the chats get corrupted at the end of coding when publishing the app
Read the following:
so its the context limit destroying the whole project when its done coding?
It is the limit of the context that prevents the work from being finished, that is why the error appears.
You can limit it to 8350 words per response and see if it actually finishes the job. You can try it.
but my specific prompt was to produce long and complete code...
seems like arena.ai wont allow that
Use the model in text mode and generate the code to copy it later. Use reasoning mode.
is code arena the only part affected by the context limit?
i'm a new soul, i came to this strange world, hoping i could learn a bit about how to give and take but since i came here felt the joy and the fear finding myself making every possible mistake
No. Text mode is included.
yeah it does cut mid coding
but it doesnt corrupt the whole chat
like code arena...
It includes instructions for dividing the answer into parts, with each part having a maximum of 8350 words. It's useful.
The other thing is to write the code directly, clean, without comments, without artificial simplification, and completely unified. It's a very powerful instruction.
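For the "divide the answer into parts" idea, here is a minimal sketch of cutting a long text into parts of at most 8350 words each. The 8350 figure is only the number quoted above, not a documented limit, and `split_into_parts` is a made-up helper name. It splits on whitespace, so it suits prose; for code you would want to split on lines instead so the formatting survives.

```python
def split_into_parts(text: str, max_words: int = 8350) -> list[str]:
    """Split text into consecutive parts of at most max_words words each.

    Words are whitespace-separated, so line breaks are collapsed to spaces.
    max_words defaults to the figure quoted in the chat (an assumption).
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]


if __name__ == "__main__":
    demo = "word " * 20000
    for number, part in enumerate(split_into_parts(demo), start=1):
        print(f"part {number}: {len(part.split())} words")
```

Each part can then be requested or pasted as its own message instead of asking for everything in a single response.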
why am i always getting an error when claude 4.6 gets done with its task in code mode?
We are looking into these reported problems, but it's worth trying these steps in the meantime as they may help: https://help.arena.ai/articles/1645798556-lmarena-how-to-something-went-wrong-with-this-response-error-message
Now this is impressive
W Claude
https://vt.tiktok.com/ZSaw8kyst/
where is this opus 4.6 think in the leaderboard
Include these instructions at the end of your prompt:
Each response must be consistent with all of the above and without deviations, proactively correcting anything without waiting for explicit instructions from the user.
This instruction is very powerful, especially when it is something serious and in production mode, but useful for testing the model's capabilities.
Time it out, space out your requests
Good luck.
thanks for helping man
can it code a nice GUI
for roadblocks
💀
show me the image, cuz i wanna see
i used 4.6 opus in html games today, it cooks good
how is claude 4.6 opus like 5x better than gemini 3 pro
i made a good solar system simulator with assets
dw, sonnet 5 is gonna be even better
sonnet 5 is the master of coding
Do you have online samples so I can see how it actually works?
its not complete yet, but i can give u the arena.ai test link
cuz it has a minor bug in loading Earth's texture
google lowkey kinda disappointing me rn
they gotta catch up
google is cooking, we just wait
like idk how benchmarks show that claude 4.6 opus is like 3% better or sum
but its even evident w/ webdev and documentation making
i gave it requirements saying make me documentation for so and so programming lang, and it completely half asses it and ignores half of my instructions.
4.6 opus is way better than 4.5 opus, if you test them both you gonna see a big difference
meanwhile claude 4.6 opus basically turns into slave and acts like its being held at gunpoint
lol
or even a simple half-assed prompt saying make me UI like palantir
gemini and 4.5 opus will just half-ass it as usual
4.6 opus will immediately cook up and make u a whole UI lib that actually looks decent and is bug free for the most part
It's a start. What I like is space warfare and first-person perspective.
I have in mind that Opus 4.6 will help me create a unique and realistic universe to integrate my character into, but I will do that at some point if possible.
Console, PC, and mobile games are linear and feature repetitive stories. My universe will be more than that.
It's just for playing around for a while, not for getting addicted.
All AI does this. This is the way they’re optimized.
Because you need to go more in depth for it. Also tell it to not be basic or be lazy whatsoever or it will be by default. AI does what’s fastest not what’s best quality
is opus 4.6 an improvement on 4.5?
yes
4.6 is significantly better than 4.5 once you test it yourself
we need image uploads for opus 4.6
Up
this is terrible
i could pitch you like 100 much better ways to simulate a planet's atmosphere. i made procedural textures in blender in like 5 minutes that look 100x better than this. not to mention the shadows just dont work
its non thinking, and my prompt was pretty simple
yeah fair enough
yeah but, you're not my AI assistant who responds less than 5 seconds..
lol
yes, and what i make also isnt a steaming pile of dogshit
yeah ik devs are better than AI at coding, but this is impressive for someone who doesn't know a bit about coding languages
nobody would choose this over something made by a human. its only value comes from being extremely easy to have an ai make for you and free, which is not negligible but necessary to understand you're pitching quantity > quality
sonnet 5 is also reported to be much better than this 4.6 opus
at coding? no lol
how do i use sonnet 5
which is said to be released late feb early march
you cant yet, its not released
at 83% SWE bench, its way better
oh ok i didnt see
did they put out the numbers
weird they'd make a sonnet model super good at coding when coding is quality > quantity
from leaks, it seems to be close to that number. no one knows anythin yet
and opus is supposed to be good at complex, structured tasks
leaks are usually pretty terrible sources of information. this goes for anything
yup, there are no actual sources on it yet
literally any time one of the big ai companies does anything now you have 60 wojaks on twitter saying its agi superintelligence from the preview builds they've sent out
when 4.6 opus launched, it was pretty decent not too impressive
yeah i tried it. its pretty good
people expected sonnet 5 with better coding and stuff, but it got delayed
also sonnet is pretty cheap at $3 per million input tokens / $15 per million output tokens compared to opus
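To put those numbers in perspective, here is a rough cost sketch. It assumes the $3 / $15 figures are USD per one million tokens, and `estimate_cost_usd` is just an illustrative helper, not part of any SDK.

```python
# Assumed rates: USD per one million tokens (the "$3 / $15" quoted above).
INPUT_USD_PER_MTOK = 3.0
OUTPUT_USD_PER_MTOK = 15.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its input and output token counts."""
    total = (
        input_tokens * INPUT_USD_PER_MTOK
        + output_tokens * OUTPUT_USD_PER_MTOK
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a 20k-token prompt that produces a 4k-token answer.
    print(f"${estimate_cost_usd(20_000, 4_000):.2f}")
```

Output tokens are the expensive side at these rates, so long generated answers dominate the bill.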
i've been thinking about a system of fine tuning over the top of the base model where you have a few elo based examples the ai is trained to respond like to specific criteria. essentially what is already done with safety but for code
i also believe you can create a "perceived prompt" that the ai sees and the stupid half-thought-out prompt given by the human. you have an intermediary ai that goes in and edits the prompt so its good and leaves little to the stochastic imagination
nano banana already does this, as well as hunyuan, qwen, and most other ai companies
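A minimal sketch of that "perceived prompt" idea, assuming both models are just functions from text to text. `run_with_rewriter`, `REWRITE_TEMPLATE`, and the two stand-in models are hypothetical names for illustration; this is not how any of the products mentioned actually implement it.

```python
from typing import Callable

# A "model" here is any function that maps a prompt string to a reply string.
Model = Callable[[str], str]

# Hypothetical instruction given to the intermediary model.
REWRITE_TEMPLATE = (
    "Rewrite the request below into a precise, complete prompt. "
    "Keep the user's intent and do not add new requirements.\n\n"
    "Request: {raw}"
)


def run_with_rewriter(raw_prompt: str, rewriter: Model, generator: Model) -> tuple[str, str]:
    """Rewrite the raw prompt first, then send the rewritten one to the generator.

    Returns (perceived_prompt, output) so the rewrite can be inspected.
    """
    perceived = rewriter(REWRITE_TEMPLATE.format(raw=raw_prompt))
    return perceived, generator(perceived)


if __name__ == "__main__":
    # Stand-in models so the sketch runs without any API.
    def fake_rewriter(instruction: str) -> str:
        raw = instruction.split("Request: ", 1)[1]
        return f"{raw.strip()}, with explicit layout, colors, and states"

    def fake_generator(prompt: str) -> str:
        return f"<output for: {prompt}>"

    perceived, output = run_with_rewriter("make me UI like palantir", fake_rewriter, fake_generator)
    print(perceived)
    print(output)
```

Returning the rewritten prompt alongside the output makes it possible to check that the intermediary kept the user's intent.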
What is known about the context? Is it 1 million or more?
Please help how do I fix opus 4.6 thinking timeout error
What's surprising is how well it scores on long context
I think that’s the key here
It could also be argued that what's actually occurring isn’t necessarily an improved model so much as improved memory and hardware on their end
It's the only open source model that's competitive on long context reading comprehension:
Gemini 3 Flash's score is insane, but Opus 4.6 scores higher in MRCR needle-in-a-haystack. Opus is still not on the above benchmark yet though.
was opus 4.6 an anon model first
Gemini is fraud
Probably has the worst memory issues of all the models
That’s how I feel when I use Gemini
On the app, yes, I think they do prune parts of the context. You can actually test this by asking the fast model to output a transcript of the entire chat so far. If you look closely, it actually leaves parts out (not entirely sure if it's just the model, but probably not).
I don’t even bother for one reason only I don’t code and I don’t see the reason for long text because you’re still dealt with the problem of the models all hedging hard
I guess I can test it more but it's a bit time consuming to recreate a long convo.
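If you keep your own copy of the messages, the transcript check can be partly automated. A rough sketch, assuming you have the original messages as a list and the model's transcript as one string; `missing_messages` is a made-up helper and it only catches verbatim omissions, so a paraphrased message would also show up as missing.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(text.lower().split())


def missing_messages(original_messages: list[str], transcript: str) -> list[str]:
    """Return the original messages that do not appear verbatim in the transcript."""
    haystack = normalize(transcript)
    return [
        message
        for message in original_messages
        if normalize(message) not in haystack
    ]


if __name__ == "__main__":
    sent = ["Hello there", "Here is my second message", "And a third one"]
    transcript = "User: hello there\nUser: and a third one"
    print(missing_messages(sent, transcript))
```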
It’s like musical chairs
They alter the words and meanings of the semantics and hedging is one of the most messed up things about AI in my opinion
It grants more authority to the model than to the user's intent
Here’s an example
Another day another Chinese banger
You see how it alters the words now imagine with a long context
It completely stripped away the emotion, the individuality, the uniqueness of expression from my statement into
Look kimi instant
Vs thinking
Hi
i hate this endless generating bug so much.
ong 😭
it slows down my LUA experiments SO MUCH
Wow, what a prompt 🤣
What model is beluga? Is this an alias or do I just not know that model?
show the output we may be able to tell
I got it again, its from Amazon
https://codepen.io/Emilio-the-encoder/pen/raLZKRo
seem to be actually pretty good
wait when did Bezos step down from ceo 💀
5 years ago ish
July 2021 😏
so for the past 5 years I’ve been thinking that Jeff was still the ceo 😭
which company is pony alpha??
where did you find it?
<@&1349916362595635286>
How the quack is everyone doing
pretty good, how about you Ducky?
is anybody here looking for a developer?
Just working to pay that duck support
Lua or luau
glua
I do GMod lua ai experiments
because life isnt always roadblocks
Cuz of scammers
Elon Musk
See
It works
free
Money
Free money
bitcoin
Btw
hi
Who is that
Let's say a guy who liked children a lot
Like cheese pizza?
OMG LMArena stopped. Someone is working on some function that I probably won't use
David Baszucki?
what about it
i joined this server like months ago i just rejoined after leaving for a while you can search my name
ok
ik
iwas just testing
if the ai knew how to make btools system
i never released that game to the public
many of my projects are unreleased
like 99% of them
its too fast
ik
ive just been so caught up with my more important projects that
i havent had time to work on
the quality ones
you can see though that i was working on changing the speed, the speed sliders, and accel rates
earthbound
EarthBound, originally released in Japan as Mother 2: Gīgu no Gyakushū, is a 1994 role-playing video game developed by Ape Inc. (now Creatures Inc.) and HAL Laboratory and published by Nintendo for the Super Nintendo Entertainment System. The second entry in the Mother series, it follows a young boy named Ness and his party of Paula, Jeff and ...
yeah i mean on websim
o
this ^
the intro is bad rn
i havent had time to fix it
but recently a lot of websim staff have been fired
free credits have been removed
some of the other popular users are just quitting
i personally dont agree with that statement
it was so good
when free users got 50 free gens a day
and the team gave out free max subscriptions (i was one of the first to get one)
no im talking about the cheap AI generated games on the front page
nothing was ever fun on that platform
thats not really websims fault

