#general

hollow imp
#

Because it's exceeding the thinking limit set by arena

echo aurora
#

We're collecting these possible false positives in this thread: #1447983134426660894 could you also share the prompt used?

verbal nimbus
#

👀 I wonder how it does

uneven peak
#

I hope we get Opus 4.6 thinking 32k 🥀😔

honest verge
#

Is it just me, or does opus 4.6 thinking give less detailed results than opus 4.6 non-thinking?

uneven peak
gilded shell
#

HAY

uneven peak
#

Ka

icy yew
#

Can you show like examples

#

.

crystal rapids
summer hound
#

Yup that limit is screwing almost every prompt I try. I guess this model thinks so much more

keen beacon
#

I can’t even accept the terms 🙁 I think it was because the captcha was bugged 🙁

patent oracle
#

Q: Has anyone ever managed to convert a painting into a photoreal image using Nano Banana Pro? I can't seem to get it to work; I've tried all kinds of prompts
Thank you!

summer hound
#

yea dude this has to get fixed. unusable

loud verge
#

Guys

#

Does grok 4 search not work anymore?

toxic verge
echo aurora
thick pawn
echo aurora
echo aurora
thick pawn
burnt sinew
#

anyone here know how to get free ide usage for claude opus 4.6?

toxic verge
frosty shuttle
stone cape
toxic verge
#

I don’t really follow social media. I don’t have a Twitter or anything like that and I kind of stopped using Reddit but you know how I knew that anthropic was about to release a new model?

stone cape
toxic verge
#

Cause he always goes on the news before they release a new model and always starts preaching

primal orbit
#

opus 4.6 easily gonna be top 1 model on lmarena by a large margin

toxic verge
#

Ofc it’s new

#

He’s tripping if he thinks it’s gonna replace software 🤣

#

It’s like saying ai video model will replace directors

north obsidian
shrewd citrus
#

like out of all the things any ai learnt

#

what was the thing it mastered first

#

it’s coding

#

apparently coding is the easiest thing for an ai to do

#

im gonna say that by the end of this year ai wld be better than 60-70% of engineers currently working

delicate fable
toxic verge
#

Ya’all crazy

cloud zinc
proud bobcat
#

My favorite moment is when OpenAI released GPT 5.3 codex but didn’t set up any api for it

#

This model was so rushed

#

Really makes me sad cause I thought they’d take their time and not just rush out a release

frosty lava
#

is gpt 5.3 codex available on lmarena?

#

very interested to do a comparison between this one and the opus 4.6 with same prompt

proud bobcat
frosty lava
#

oh alright

#

thank you

nimble snow
#

i cant keep doin dis every prompt

cloud zinc
#

you are robot

frosty lava
#

if you have vpn it might be cause of it

nimble snow
#

im js cooked

frosty lava
#

then idk

#

the opus 4.6 non thinking was already better than the 4.5 thinking in my test so i wonder what the thinking version can do

nimble snow
frosty lava
#

i already played every good existing one and im tired

toxic verge
nimble snow
toxic verge
maiden fulcrum
#

What is the rate limit for battle mode in the arena and when does it reset?

frosty lava
broken elm
#

is lm down ?

past trail
#

Is there any ai or website that can build full stack apps for free ...without any paywall

sturdy mica
robust sonnet
#

Yo guys

#

What is the rate limit on opus 4.6 in code mode?

maiden fulcrum
#

These reCAPTCHAS are so annoying, It doesn't go away at all

#

I selected all fire hydrants, yet it is telling me to try again

toxic verge
#

Log in, log out, clear history, and try different browsers

frosty lava
#

i think gpt 5.3 is a big improvement actually and is better than opus 4.6

#

can't wait to see even stronger model but the improvement is already cool

stoic solar
#

Heyy! Anyone have opus 4.5 free trick? I really need it like we can share files too

frosty lava
#

opus 4.6 seems to do exactly what you're asking, but the result is almost always less aesthetically pleasing, so idk

stoic solar
#

Where can i access opus 4.6?

frosty lava
stoic solar
#

Okay, thanks buddy. Does it have file upload and vision?

frosty lava
#

Sorry i don't know actually

stoic solar
#

Oki

#

I checked it, it doesn't have 😞

frosty lava
#

oh my bad

#

i just understood what you said

stoic solar
#

No I mean the upload and vision capabilities

frosty lava
#

yeah sorry

stoic solar
#

Yeah not an issue

#

Do you guys use antigravity?

frosty lava
#

Yes but im mad at it cause even with the option "always proceed" it will ask me to confirm every prompt, ive seen other people with the same issue and can't find a fix

stoic solar
#

Yes and the rate limits sucks

frosty lava
#

yh

stoic solar
#

Any other good ide?

#

Other than vs code with Co pilot

#

Cursor

#

?

frosty lava
#

I use cursor so i don't know if there's a better one

stoic solar
#

Depends on model

thick pawn
#

Videos are currently having an infinite generation issue it seems

maiden fulcrum
frosty lava
#

lol i tried opus 4.6 its been thinking sooooooo much ive never saw that before

#

brainstorming on my prompt

thorn lantern
#

thoughts on opus 4.6 vs. opus 4.5 for software engineering tasks for actual software engineers? In terms of real world usage? I noticed opus 4.5 edged out 4.6 in 2-3 metrics on the official release document, but opus 4.6 did better on most.

But in terms of real world use, what are people's thoughts comparing the two thus far? I've read that opus 4.6 can be "too agentic", but not sure if that's a universal opinion or just a one-off

frosty lava
#

i don't know why but its been 6 minute thinking non stop on my single prompt

#

no lag its just really thinking

#

too much

thorn lantern
#

I read that's expected, but I bet that's eating a ton of tokens..

#

and for simpler stuff, that's probably not needed

#

For example for devs who like to take control of the process and implement step-by-step with less "do it all for me"

frosty lava
#

if you want i can send you the whole thinking for a single prompt (its not done yet)

#

its impressive

#

how much it actually think

thorn lantern
#

That would be valuable to see, if you don't mind. you can dm me

frosty lava
#

yeah i send you that

thorn lantern
#

I've used 4.5 a lot, but haven't experimented with 4.6 yet

frosty lava
#

i sent you the thinking

thorn lantern
#

Got it, thanks! I'll respond in dm

burnt sinew
toxic verge
#

Try making a new account not Gmail

maiden fulcrum
#

hmm

#

why

toxic verge
#

I’m not sure exactly but it helps sometimes

#

Unless you're sending requests too fast to the model

#

Then you're going to get them

#

Also try resetting ur modem if all else fails

frosty lava
#

Okay, I figured something out: when opus 4.6 is given complex work, it thinks without actually writing code. You see a huge thinking block for 5-6 minutes, then an error from lmarena, but no code written, so you have to tell it to actually write code

#

its weird but yeah it is how it is

burnt sinew
#

When's opus 4.6 going to be preliminary on leaderboard

echo aurora
frosty lava
#

its when given a complex work

#

code

left lodge
#

New feature in testing. 👀

#

Take a screenshot
is just not working

#

Why is it even there? To take screenshots of the current session?

slim spire
#

it's still in testing

left lodge
#

And the transparent error popup is nice but why is it at the bottom?
It should be somewhere the bg is clear

slim spire
#

it's not even out it's still in testing so it might not work that correctly yet

#

don't expect anything that is still being tested to work instantly

left lodge
#

They shipped that means it should work

#

Atleast somewhat

#

Models in arena are executing commands! They are installing packages?!

echo aurora
left lodge
#

I hope these tools come to text modality or a completely new modality where it have all these tools available to use :>

#

Without the system prompt of code modality

echo aurora
frosty lava
echo aurora
icy yew
#

There is no API for arena to use them right now

echo aurora
frosty lava
echo aurora
echo aurora
rigid holly
#

So whats the opinion about opus 4.6?

Cuz i just found out about it like this second

icy yew
#

Def better than 4.5

#

But benchmarks say the new gpt 5.3 codex is better

#

But until the API is fully out we don't know

rigid holly
#

I mean... its out on openrouter so who knows

frosty lava
#

opus 4.6 did a decent 3d world much better than opus 4.5

#

but gpt 5.3 might be even better

#

at coding and overall i guess

left lodge
#

Hey pineapple can you create a separate coding channel?

#

They are made by sota models of the same lab

icy yew
left lodge
#

But the difference in one version upgrade is so much like bruh what

#

It could be this single instance but we will see

icy yew
toxic verge
#

Who here is a nano pro?

atomic lagoon
royal crater
#

Hi

#

I want to work with my GitHub repo. But it doesn't support github connector like perplexity. So wht to do now ?

#

Like I want the ai to create pr make changes put commit etc

icy yew
undone saffron
#

Me using Opus 4.6 after the announcement:

sleek phoenix
#

it was correct

royal crater
#

I want to work with my GitHub repo. But it doesn't support github connector like perplexity. So wht to do now ?

undone saffron
hollow imp
golden ocean
#

real

austere sundial
#

Omg lmarena became Light...
You can't for explosions, you can't for soldiers falling off the horse....
Drastic even chatgpt does that

fresh urchin
#

Guys how good is opus 4.6?

sleek crow
#

@echo aurora

#

the thinking model doesn't work

gilded kiln
#

Please add copilot too!!

golden ocean
undone saffron
spare rune
#

Interesting

fickle venture
spare rune
#

Right

fickle venture
spare rune
#

Ohh

fickle venture
#

Code and text

spare rune
#

I might have to test it for creativity

fickle venture
spare rune
#

I lost my laptop so I lost all the motivation to do anything else 😭

spare rune
#

I’m gonna use it 1 time

compact flame
#

Guys what do the glasses mean on ai models

fickle venture
# spare rune No

I was using it many times yesterday 😭 still didn't hit limits

spare rune
icy yew
fickle venture
icy yew
#

Like it can see images or something

compact flame
icy yew
compact flame
#

Oh alright

icy yew
fickle venture
compact flame
fickle venture
compact flame
icy yew
fickle venture
compact flame
#

Oh alr

fickle venture
compact flame
#

So how good is opus 4.6

icy yew
#

I mean I spammed it yesterday day night and didn't hit any limit

compact flame
#

Or is it just same as 4.5

icy yew
fickle venture
icy yew
compact flame
fickle venture
compact flame
icy yew
# fickle venture

Also I think terminal bench is like a important one for agentic coding

#

🫀

fickle venture
#

I think like the way it run commands on terminal

compact flame
#

First time seeing chatgpt cook

icy yew
compact flame
#

I wonder if they'll add search to text arena for better results maybe

#

Like not cutting off ai from internet

icy yew
fickle venture
#

The heck are these models on arena

icy yew
fickle venture
icy yew
compact flame
rigid holly
#

So does anyone know the data training cutoff for this model?

Opus 4.5 i think it was early 2025

icy yew
fickle venture
rigid holly
#

Thats less than before tho

fickle venture
#

Idk you can ask it and it will answer

compact flame
#

I wonder why the training data even has a cutoff

shrewd citrus
#

because they can’t learn like that much reliable data or

rigid holly
#

Yeah well 4.5 says its early 2025. And 4.6 isnt out yet

shrewd citrus
#

like Claude says oh it has reliable data up to April 2025 but can still get info up to July or something

icy yew
fickle venture
fickle venture
icy yew
#

It's out

#

🫀 🫀 🫀

rigid holly
#

Not on the model list it aint

shrewd citrus
#

It is just search it

fickle venture
icy yew
#

Scroll down

fickle venture
icy yew
#

Like the last model in the selection

rigid holly
#

Ah it was hidden at the bottom

Expected it at the top

My bad

fickle venture
left lodge
#

💀

fickle venture
#

Anthropic being themselves

compact flame
#

I guess nobody is safe from greed

bright spade
#

the execution of code doesn't work?

left lodge
#

It's last because it's not on the leaderboard rn.
Sorting is the same as the leaderboard, and models not on the leaderboard go at the end

rigid holly
#

Well its still early 2025

Slightly disappointing

Was hoping for more recent stuff to be more available

Like knowing who the new pope is

fickle venture
rigid holly
#

Ok but still. Not even a middle 2025

fickle venture
#

Probably

light sleet
#

Will gpt 5.3 be better than Opus 4.6?

sterile tartan
#

💀

sterile tartan
rigid holly
#

That is a good point and yeah

The 4.1 and sonnet 4 had 2024 data before the upgrade

icy yew
fickle venture
rare fractal
#

Where does the arena get the money to pay for all these expensive models?

icy yew
sterile tartan
rigid holly
#

I remember that in December it was thought sonnet 4.6 or 4.7 was coming out that month

fickle venture
rigid holly
#

Same deal?

fickle venture
left lodge
# left lodge Broo

Now I think, if this is true, what will they release in place of the actual next successor of sonnet 4.5??

sterile tartan
#

💀

left lodge
#

Haiku???

rigid holly
#

What about haiku?

fickle venture
#

Haiku is boring no one cares

fickle venture
icy yew
#

Probably dog water

fickle venture
#

Same lol

sterile tartan
toxic verge
sterile tartan
toxic verge
#

They’re burning some money though

#

Just like most companies

#

Except they’re not technically a business somewhat

#

Lightspeed, Laude Ventures

next ivy
#

ngl i thought it was sonnet 5 instead of opus 4.6, did not expect that

toxic verge
rigid holly
#

So

In terms of writing long texts do we still need to wait for it to be polished or what

Cuz it keeps crashing

Does it still need those funky numbers at the end of the model?

left lodge
icy yew
#

Or is it way faster

high dirge
#

is there a way to find out the exact model that max routed to, not just the organization

#

since it could be helpful to see what models are best at what prompts

spare rune
#

Hahahaha omg this is so real pls send me the link

#

😂😂😂

covert iris
#

gimme money KBBQ_woww

uneven lance
#

In the lmarena leaderboard Gemini 3 pro ranks 1st, while in the Artificial Analysis leaderboard Gpt 5.2 pro ranks first

#

Is Artificial Analysis biased?

#

They only rank using trust me bro benchmarks...

golden ocean
#

crack bench

echo dome
#

idk if this one works

echo dome
frigid tusk
#

why is opus 4.6 the lowest here

north obsidian
icy yew
#

I would still go for sonnet for balance tho

north obsidian
#

It can be since google model until xAI

golden ocean
frigid tusk
echo dome
golden ocean
#

🦀 money 🦀 money 🦀 money

echo dome
#

(that's the answer)

golden ocean
#

REAL

left lodge
# icy yew I would still go for sonnet for balance tho

Yeah, sonnet is for everyday tasks, and you can use haiku for fast, quick answers without worrying too much about them being hallucinated or wrong. It is the fastest model by claude and seems better than other models of similar size

left lodge
#

It shows improvements from sonnet 4.5 but not from opus 4.5

echo dome
#

wait what just happened to sonnet creator

echo dome
fickle venture
left lodge
glacial dock
#

I’m getting non stop time outs with opus 4.6 thinking, meanwhile with the same question 4.5 has never timed out …is this a bug or is it because it’s brand new and needs more time ?

left lodge
#

Wth is 5.1 even doing wth is that

north obsidian
#

It's good but the floor

hollow ivy
left lodge
#

Specifically opus 4.6 thinking

#

I just said Make minecraft with touch controls

#

One prompt

north obsidian
#

I liked it

left lodge
#

Gpt models are so weird rn

hollow ivy
#

opus 4.6 takes forever to answer a simple prompt in arena

left lodge
wind ember
#

now even direct chat has captcha?

hollow ivy
wind ember
#

like come on ...

left lodge
hollow ivy
#

-# later, xAI might join the victors

left lodge
wind ember
#

this is annoying

north obsidian
wind ember
left lodge
ocean ferry
#

can anyone try this prompt for Gemini 3 Pro GA?

Create a nice looking and rich SaaS about Gemini 3 Pro GA by Google Deepmind, it must has a mock about the Gemini 3 Pro Preview which is so lazy and it's fixed on Gemini 3 Pro GA, output in single html, must use tailwind css(cdn) and i don't want shiity website, should be really cool and good and should never use emoji in the html.
#

i never get it bro

#

i only get gemini with google logo bruh

#

so please any1 try it

wind ember
hollow ivy
left lodge
toxic verge
#

I don’t know how they’re gonna replace them. They need it for anti bot

#

Because it’s really effective

wind ember
toxic verge
#

That’s the thing I don’t know what alternatives there are I can’t think of any

left lodge
#

Wait i just noticed chat titles are not first prompt

#

Hmm

golden ocean
#

10 gallons of water per custom chat title

toxic verge
#

Fr

wind ember
left lodge
#

Thats not correct bro 😭

wind ember
#

not sending any prompt anymore

toxic verge
#

Oh yeah, it’s not correct but there’s a bitter drop of truth in there in a general sense

#

Exaggerated

left lodge
#

This one is nice one too ↑

#

It isnt even rerendering anything 😭

ocean ferry
zealous sparrow
#

It's just a gemini 3 pro model

#

One is direct with logo
another is stealth with logo

left lodge
#

I literally have zero interest in gemini models cause of their hallucinations and attitude issues

wind ember
zealous sparrow
left lodge
zealous sparrow
#

i have a gen with 2 gemini 3 pros

wind ember
#

mm i see

ocean ferry
#

i have tried it for 2+ hours and i only get the gemini 3 pro with logo

zealous sparrow
shrewd citrus
#

does anyone else have the problem where opus 4.6 thinking just thinks for too long

#

and then stops working

left lodge
#

Just three looks weird, they should have extensions showing

shrewd citrus
#

like 4.5 would think for a minute max before it starts outputting something

left lodge
#

It is by claude

#

👀

north obsidian
#

Hmmm

north obsidian
#

I see it now

#

Claude 4.5 was better than Claude 4.6 thinking 🤔

uneven lance
#

Gpt 5.3 codex is so bad at frontend

#

But backend it shines

left lodge
# toxic verge Attitude?👀 .

Yeah i dont like its character, its lazy, doesn't accept its own mistakes, doesn't follow instructions, a literal karen

#

When i say dont be lazy, instead of doing work it literally says i am not lazy 😭

uneven lance
#

I had to beg with Gemini flash so it follows my lead

left lodge
#

Yeah flash is even worse

uneven lance
#

Within thinking it's the worst

left lodge
#

They dont have any reliability

uneven lance
#

It removes features

#

When I ask it to add a feature it removes the previous version of the code too 😡

#

So now I just use GLM for coding

left lodge
#

Glm is good

#

Its tool use capability i like it but it outputs literal articles even for simple questions

uneven lance
#

A system prompt makes it give one line answer...

#

I have to instruct it how to use its tools

#

The model is so agreeable too

#

Isn't gpt 5.3 codex assisted by 5.2 codex in it's creation?

#

No wonder the front-end capability is meh

wind ember
left lodge
#

Good is good not perfect

wind ember
#

what is it good at?

#

frontend?

left lodge
#

Try yourself

wind ember
#

i did

#

im asking you what is it good at

#

maybe im missing something

spare rune
wind ember
#

although i still think glm 4.7 is the best chinese model yet

left lodge
#

I haven't used it for front-end

wind ember
#

alongside deepseek v3.2

left lodge
#

What about Kimi 2.5?

wind ember
#

starting to look more like gemini 3 clone

#

its heavily trained on gemini outputs

spare mango
#

Gemini, the "best" chatbot.

#

"in 2024".

zealous sparrow
#

what model did you use

#

Fast or thinking

spare mango
zealous sparrow
#

it knows what year it is, just does silly messups..

spare mango
#

If it knows what year it is, then why does it do the silly messup.

#

That's like me saying I sometimes forget my name.

#

That would reduce my credibility and reliability by a lot.

north obsidian
#

2 years ago I saw a brutal AI math error with Meta AI: it worked through the whole equation, but at the final step, something like 255 + 1, it said 257

shrewd citrus
#

like why can’t they be right when i first asked the question

spare mango
#

I have to argue with, and debunk the AI's false claims, after which I realize I just wasted my time arguing with an AI.

icy yew
golden ocean
#

You're absolutely right

icy yew
#

It doesn't really get years wrong or dates

icy yew
# spare mango Pro.

Gemini 3 when it doesn't hallucinate absolutely cooks especially in language

#

Sadly it hallucinates like crazy

hollow ivy
icy yew
#

Claude ofc

hollow ivy
# icy yew Claude ofc

I agree, but i still want to see, if Claude Opus manages to get 100% in this poll, and which model lands second place.

keen beacon
#

how to fix it?

icy yew
#

Unstable

keen beacon
#

🙁

hollow ivy
hollow ivy
keen beacon
# icy yew It just happens I think

but does it go back to normal? because I already tried closing the browser and everything, and it didn’t work , it’s still giving this same problem

hollow ivy
eternal saffron
#

website is down?

icy yew
eternal saffron
keen beacon
icy yew
#

It works for me

eternal saffron
#

alr mate

frozen osprey
#

Does nano banana pro work

#

It keeps giving errors

somber sky
#

did you use multiple accounts like me?

frozen osprey
somber sky
#

or generate any feminine related?!?!!?

#

huh????!?!?!?

sterile tartan
#

What are Opus Rate Limits on Arena?

plucky sparrow
#

has anyone tried this on opus?

modest prism
#

Opus 4.6 thinking gives a timeout when thinking longer than a certain time. Any plan to fix it

vast fern
#

@echo aurora are there any plans for adding gpt 5.3

icy yew
#

The API for 5.3 codex isn't out

golden ocean
#

is there an api for talking to @icy yew

icy yew
#

After the API comes out

vast fern
keen beacon
frosty shuttle
#

The new Claude program is still unusable for me; it thinks for a long time before giving an answer, but is interrupted by the site's limit. Has anyone managed to use it yet?

shrewd citrus
fleet lintel
#

do we have gemini pro ga candidate on LMArena?

#

What is the ranking according to this group?

claude 4.6 > gpt 5.3 > gemini pro ga
OR
claude 4.6 > gemini pro ga > gpt 5.3

i believe these are the only two possibilities 🙂

glass perch
#

Why is opus4.6 thinking forever man

#

I swear it never stops thinking

icy yew
#

For hard stuff

glass perch
#

Im tryna get it to make a story game

#

That lasts like 10 mins

#

Which might explain it

north obsidian
plucky basalt
#

dude why cant i test any model on this website

#

it reasons for 5 mins and it breaks

icy yew
sturdy mica
obsidian cargo
#

I'd definitely rank claude 4.6 over gemini 3 at this point, esp with gemini 3 being so terse

celest orchid
mystic olive
icy yew
obsidian cargo
#

idk my big problem with gemini 3 is it doesn't do long outputs. also it always wants to name characters Elara Vance

cosmic falcon
#

Can u guys release a premium version of the arena , so theres dedicated support , most of the time server crashes on my experience , just an opinion btw

icy yew
#

Does the website not work or the ais

acoustic garden
#

What's causing this error? I haven't used it at all today, I don't have a limit.

acoustic garden
icy yew
#

Claude code with opus 4.6 is unstable

acoustic garden
#

I use 4.5

icy yew
#

Well
Idk

#

🫀

acoustic garden
#

(

toxic verge
#

Well not sucks but has issues

frosty shuttle
junior spoke
#

Nano banana lagging rn right

burnt sinew
burnt sinew
pulsar crystal
#

i love max
why does max not tell me what model responded?
it would be useful to know

i guess max does not know it?
because anthropic is internally routing?

echo aurora
echo aurora
dense pumice
#

why lmarena isn't opening

burnt pulsar
#

Opus-4.6-Thinking is too unstable for me with longer tasks, 4.5-Thinking-32K is way better.

toxic verge
icy yew
burnt sinew
proud bobcat
#

babe wake up

#

gpt 5.3

echo aurora
burnt sinew
burnt pulsar
#

Thanks, I usually analyze/optimize Mesa/Linux Kernel files of around 2000 lines of code. Opus 4.6-Thinking really struggles there.

burnt sinew
#

Or... what i said earlier with copying thinking context manually before it errors

burnt pulsar
#

It usually errors out during the thinking process already. But sometimes it finishes thinking and then doesn't get very far with the answer.

wind ember
proud bobcat
#

....

#

every model that has alpha in its name

#

was a cloaked

#

openai model

#

oh my god the joke flew over my head

#

im such an idiot

burnt sinew
burnt sinew
burnt pulsar
burnt pulsar
#

Payout to EU has been suspended though, hence I didn't make any money there.

burnt sinew
burnt pulsar
#

You still get credits, but you cannot cash out via Paypal to EU at the moment. But I am more in there for science and the access to the latest models.

burnt sinew
#

You can make money from there?

burnt pulsar
#

Credits = Money -> Cashing out your earned credits.

#

Yeah, 1000 Credits are 0,90 EUR at the moment.
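
As a rough sketch of that arithmetic, assuming the quoted rate of 1000 credits = 0.90 EUR still holds (it was stated as "at the moment" and may change), a balance converts like this. `credits_to_eur` is a hypothetical helper name used only for illustration, not part of any real API:

```python
from decimal import Decimal

# Rate quoted in the chat: 1000 credits = 0.90 EUR (assumption: may change).
EUR_PER_1000_CREDITS = Decimal("0.90")


def credits_to_eur(credits: int) -> Decimal:
    """Convert a credit balance to EUR at the quoted rate, rounded to cents."""
    eur = Decimal(credits) * EUR_PER_1000_CREDITS / Decimal(1000)
    return eur.quantize(Decimal("0.01"))


print(credits_to_eur(1000))   # 0.90
print(credits_to_eur(25000))  # 22.50
```

`Decimal` is used instead of floats so the cent rounding stays exact.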

burnt sinew
#

Did leaderboards just update?

#

No announcement yet

burnt pulsar
burnt sinew
#

Crazy

burnt pulsar
#

It somehow works, though.

burnt sinew
#

1502 THINKING MODEL to 1576 NON thinking

#

Aye there's the announcement

burnt pulsar
#

But as I wasn't able to cash out at the moment (and it might take many more months to resolve it), it is more interesting for people outside of the EU.

burnt sinew
#

@echo aurora What's the difference between code and text->coding

mighty surge
#

which is better rn? Opus 4.6 or Codex 5.3?

quartz pike
#

opus absolutely murdered the leaderboards lol

#

even tho it failed in my benchmark

burnt sinew
#

From what

quartz pike
#

to gemini 3 pro

burnt sinew
#

I mean what is that from polymarket?

#

Doesn't look like it

#

Ah

quartz pike
burnt sinew
echo aurora
echo aurora
burnt sinew
echo aurora
burnt sinew
zealous sparrow
#

@echo aurora I think 4.6 thinking has higher error rates

burnt sinew
zealous sparrow
echo aurora
#

I wouldn't want to share more info about future plans until we're ready to, but overall our team is wanting to bring a lot more features to Code Arena

limber panther
#

yo

#

4.6 opus is really good, the only issue it has is no access to external textures and libraries

#

when coding

#

im really excited for claude 5 sonnet tho

#

its supposed to be huge and even better at coding tasks than opus

stray aspen
#

whats the rate limit of claude 4.6 think

limber panther
#

or 15

stray aspen
#

great

limber panther
#

someone made a farm game using 4.6 opus

#

its good

stray aspen
#

send it

limber panther
burnt sinew
limber panther
#

i meant if it had access to search and could browse websites that would be really great

limber panther
#

if u provide links, it cannot open them or extract anything

burnt sinew
stray aspen
#

how do i send opus 4.6 think images

burnt sinew
burnt sinew
limber panther
#

@zealous sparrow

burnt sinew
burnt sinew
#

And say the dimensions of the image

icy yew
topaz epoch
#

i need help: which is best for python coding, opus 4.6 or 4.6 thinking?

burnt sinew
#

Like I made flappy bird 1:1 clone using that it just took all asset links

limber panther
#

i just need to search for the links and ask it to put them as assets

topaz epoch
limber panther
#

i think gemini 3 pro training data is pretty good

burnt sinew
#

It did that for flappy bird

#

But it used external asset links

uneven peak
limber panther
burnt sinew
iron laurel
#

What is the level of thinking for 4.6 Thinking?

#

< or > than 32K?

honest verge
iron laurel
honest verge
#

They don't have it for now

rancid turtle
#

Hello

honest verge
#

Not for 4.6

iron laurel
echo aurora
rancid turtle
#

is arena ai available to download on ios or no?

inner relic
#

Did they fix the response bug

icy yew
echo aurora
echo aurora
inner relic
#

"Something went wrong"

rancid turtle
inner relic
burnt sinew
echo aurora
icy yew
rigid holly
#

Alright i tried the 4.6 model in writing stories

Dont know what to think honestly

Like the writing is not BAD

But it feels drier in dialogue for one than previous models

Can't speak about code or other such things. I Dont use ai models for such things as code or image generation

inner relic
#

I am using claude opus non thinking and this happens..

rigid holly
#

Oh yeah that happened to me too in writing. I was using thinking tho. 4k words work i think cuz i also had it work on shorter chapters, but 8k words crash

honest verge
#

It says every time "I have to make it in my limit of 20 steps"

#

When it thinks

rigid holly
#

Tf does 20 steps even mean

rigid holly
#

I will say this tho. The restrictions are more loose in what it rejects from writing than the previous model

latent merlin
#

Hello I am new here

honest verge
#

But it says 20 steps

icy yew
topaz epoch
#

Bro i was using 4.6 thinking and it keeps getting stuck in the middle because of thinking

icy yew
#

Idk

prisma cipher
#

The typical limit is approximately 8350 words per answer in Claude, but lmarena has to increase the limit until everything is completely finished and not limited.

honest verge
prisma cipher
honest verge
#

Finally opus 4.6 thinking actually think for some time not just for 1 second

echo aurora
icy yew
#

Hope it gets fixed soon

gleaming roost
honest verge
gleaming roost
#

Perhaps it's just a matter of luck

icy yew
gleaming roost
granite tide
#

opus opus let me use opus

honest verge
#

Please arena I need opus 4.6 thinking 32k

#

My opus 4.6 thinking is kinda homeless

#

I live with my opus 4.5 32k

honest verge
#

Because of the limit

prisma cipher
uneven peak
#

@echo aurora how old are you? NGS_smiles

uneven peak
#

Ayo chill fam 😭

limber panther
#

@echo aurora why do i get an error that corrupts the whole project, when I use the coding arena?

#

i tried this in many chats, and 50% of the chats get corrupted at the end of coding when publishing the app

limber panther
prisma cipher
prisma cipher
limber panther
prisma cipher
limber panther
golden ocean
# icy yew

i'm a new soul, i came to this strange world, hoping i could learn a bit about how to give and take but since i came here felt the joy and the fear finding myself making every possible mistake

prisma cipher
limber panther
#

but it doesnt corrupt the whole chat

#

like code arena...

prisma cipher
#

The other thing is to write the code directly, clean, without comments, without artificial simplification, and completely unified. It's a very powerful instruction.

red meadow
#

why am i always getting an error when claude 4.6 gets done with its task in code mode?

echo aurora
# red meadow why am i always getting an error when claude 4.6 gets done with its task in code...

We are looking into these reported problems, but it's worth trying these steps in the meantime as they may help: https://help.arena.ai/articles/1645798556-lmarena-how-to-something-went-wrong-with-this-response-error-message

icy yew
hollow snow
#

where is this opus 4.6 think in the leaderboard

prisma cipher
# limber panther yeah it does cut mid coding

Include these instructions at the end of your prompt:

Each response must be consistent with all of the above and without deviations, proactively correcting anything without waiting for explicit instructions from the user.

This instruction is very powerful, especially when it is something serious and in production mode, but useful for testing the model's capabilities.
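
A minimal sketch of appending that instruction to a prompt automatically, for anyone who pastes it often. `SUFFIX` and `with_suffix` are hypothetical names made up for this example; nothing here talks to arena or to any model API, it only builds the prompt string:

```python
# The instruction text quoted in the message above.
SUFFIX = (
    "Each response must be consistent with all of the above and without "
    "deviations, proactively correcting anything without waiting for "
    "explicit instructions from the user."
)


def with_suffix(prompt: str, suffix: str = SUFFIX) -> str:
    """Return the prompt with the instruction appended once at the end."""
    body = prompt.rstrip()
    if body.endswith(suffix):
        # Already present, so do not append it a second time.
        return body
    return f"{body}\n\n{suffix}"


print(with_suffix("Make minecraft with touch controls"))
```

Whether the suffix actually stops responses from being cut off mid-coding is the poster's claim and is not verified here.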
toxic verge
prisma cipher
#

Good luck.

stray aspen
#

claude4.6 gave me a pretty nice roblocks camera system

limber panther
#

for roadblocks

#

💀

stray aspen
#

yes

#

its actually cooking lol

#

and its not even the thinking version

limber panther
limber panther
viral cedar
#

how is claude 4.6 opus like 5x better than gemini 3 pro

limber panther
#

i made a good solar system simulator with assets

limber panther
#

sonnet 5 is the master of coding

prisma cipher
limber panther
#

cuz it has a minor bug in loading Earth's texture

viral cedar
#

they gotta catch up

limber panther
viral cedar
#

but its even evident w/ webdev and documentation making

#

i gave it requirements saying make me documentation for so and so programming lang, and it completely half asses it and ignores half of my instructions.

limber panther
viral cedar
#

meanwhile claude 4.6 opus basically turns into slave and acts like its being held at gunpoint

viral cedar
#

or even a simple half-assed prompt saying make me UI like palantir

#

gemini and 4.5 opus will just half-ass it as usual

#

4.6 opus will immediately cook up and make u whole UI lib that actually looks decent and is bug free for most part

prisma cipher
#

I have in mind that Opus 4.6 will help me create a unique and realistic universe to integrate my character into, but I will do that at some point if possible.

#

Console, PC, and mobile games are linear and feature repetitive stories. My universe will be more than that.

#

It's just for playing around for a while, not for getting addicted.

toxic verge
atomic lagoon
solar hollow
#
Poll: is opus 4.6 an improvement on 4.5?

Winning answer: yes (7 of 11 votes)

limber panther
#

4.6 is significantly better than 4.5 once you test it yourself

stray aspen
#

we need image uploads for opus 4.6

north obsidian
echo aurora
#

This is being worked on

steep jewel
steep jewel
# steep jewel this is terrible

i could pitch you like 100 much better ways to simulate a planet's atmosphere. i made procedural textures in blender in like 5 minutes that look 100x better than this. not to mention the shadows just dont work

limber panther
steep jewel
limber panther
#

lol

steep jewel
limber panther
steep jewel
limber panther
stray aspen
#

how do i use sonnet 5

limber panther
steep jewel
limber panther
steep jewel
#

did they put out the numbers

#

weird they'd make a sonnet model super good at coding when coding is quality > quantity

limber panther
steep jewel
#

and opus is supposed to be good at complex, structured tasks

steep jewel
limber panther
steep jewel
#

literally any time one of the big ai companies does anything now you have 60 wojaks on twitter saying its agi superintelligence from the preview builds they've sent out

limber panther
#

when 4.6 opus launched, it was pretty decent not too impressive

steep jewel
#

yeah i tried it. its pretty good

limber panther
#

people expected sonnet 5 with better coding and stuff, but it got delayed

#

also sonnet is pretty cheap at $3 per million input tokens / $15 per million output tokens compared to opus

steep jewel
#

i've been thinking about a system of fine tuning over the top of the base model where you have a few elo based examples the ai is trained to respond like to specific criteria. essentially what is already done with safety but for code

#

i also believe you can create a "perceived prompt" that the ai sees and the stupid half-thought-out prompt given by the human. you have an intermediary ai that goes in and edits the prompt so its good and leaves little to the stochastic imagination

#

nano banana already does this, as well as hunyuan, qwen, and most other ai companies

proud bobcat
#

K2.5 instant is really strong

#

Damn

#

Thinking mogs it but it’s nice to see

prisma cipher
modest prism
#

Please help how do I fix opus 4.6 thinking timeout error

verbal nimbus
toxic verge
#

I think that’s the key here

#

It could also be argued that perhaps what is actually occurring isn’t necessarily an improved model as much as improved memory and hardware on their end

verbal nimbus
#

It's the only open source model that's competitive on long context reading comprehension:

#

Gemini 3 Flash's score is insane, but Opus 4.6 scores higher in MRCR needle-in-a-haystack. Opus is still not on the above benchmark yet though.

thorny drum
#

was opus 4.6 an anon model first

toxic verge
#

Gemini is fraud

#

Probably has the worst memory issues of all the models

#

That’s how I feel when I use Gemini

verbal nimbus
toxic verge
#

I don’t even bother for one reason only I don’t code and I don’t see the reason for long text because you’re still dealt with the problem of the models all hedging hard

verbal nimbus
#

I guess I can test it more but it's a bit time consuming to recreate a long convo.

toxic verge
#

It’s like musical chairs

#

They alter the words and meanings of the semantics and hedging is one of the most messed up things about AI in my opinion

#

It grants more authority to the model than to the user’s intent

#

Here’s an example

proud bobcat
toxic verge
#

You see how it alters the words now imagine with a long context

#

It completely stripped away the emotion, the individuality, the uniqueness of expression from my statement into

#

Look kimi instant

#

Vs thinking

stray tusk
#

Hi

molten robin
#

i hate this endless generating bug so much.

main nexus
molten robin
verbal nimbus
rugged abyss
#

What model is beluga? Is this an alias or do I just not know that model?

hazy forge
#

show the output we may be able to tell

rugged abyss
hazy forge
#

seem to be actually pretty good

shrewd citrus
green yacht
echo aurora
shrewd citrus
#

so for the past 5 years I’ve been thinking that Jeff was still the ceo 😭

balmy mist
#

which company is pony alpha??

frosty lava
stray aspen
#

<@&1349916362595635286>

glacial dock
#

How the quack is everyone doing

hardy lion
copper cape
#

is anybody here looking for a developer?

glacial dock
toxic verge
spare rune
molten robin
spare rune
#

Oh

molten robin
#

I do GMod lua ai experiments

spare rune
#

Ok

#

Omg broo

#

Why is ro * lox a banned word

#

I’m gonna die

strange sluice
toxic verge
#

Cuz of scammers

spare rune
#

See

#

It works

#

free

#

Money

#

Free money

#

bitcoin

#

Btw

old garden
toxic verge
#

Who is that

fiery gull
toxic verge
#

Like cheese pizza?

austere sundial
#

OMG Lmarena stopped Someone is work with some function that I probably won't use

sturdy mica
#

hell no

old garden
old garden
sturdy mica
#

websim is cancer

#

dude you have so much stuff

old garden
#

ik websim is lowk going bankrupt or something

#

i have 1.2k followers on there i think

sturdy mica
old garden
#

ok

sturdy mica
#

holy

#

your websim page is full of slop

#

nobody is playing this bro 🙏

sturdy mica
old garden
#

i was just testing

#

if the ai knew how to make btools system

#

i never released that game to the public

#

many of my projects are unreleased

#

like 99% of them

sturdy mica
#

yes you did

#

i was playing it

#

all your private games are public

#

theres a lot

old garden
#

lately

sturdy mica
#

its too fast

old garden
#

ik

sturdy mica
#

if you stay still you glide lol

#

wow this game is awesome

old garden
#

ive just been so caught up with my more important projects that

#

i havent had time to work on

#

the quality ones

old garden
sturdy mica
#

cool

#

what game is that

old garden
#

earthbound

#

EarthBound, originally released in Japan as Mother 2: Gīgu no Gyakushū, is a 1994 role-playing video game developed by Ape Inc. (now Creatures Inc.) and HAL Laboratory and published by Nintendo for the Super Nintendo Entertainment System. The second entry in the Mother series, it follows a young boy named Ness and his party of Paula, Jeff and ...

sturdy mica
#

yeah i mean on websim

old garden
#

o

old garden
#

the intro is bad rn

#

i havent had time to fix it

#

but recently a lot of websim staff have been fired
free credits have been removed
some of the other popular users are just quitting

sturdy mica
#

websim has always been slop

#

lol

old garden
#

i personally dont agree with that statement
it was so good
when free users got 50 free gens a day
and the team gave out free max subscriptions (i was one of the first to get one)

sturdy mica
#

nothing was ever fun on that platform

old garden
#

thats not really websims fault