#general | Arena

wicked talon Mar 4, 2026, 8:28 PM

#

Also how would you rate apple playground for image making

ocean vortex Mar 4, 2026, 8:28 PM

#

Well yeah like the main AI sub. Since their limits are very reasonable and their platform is very feature rich

wicked talon Mar 4, 2026, 8:28 PM

#

It uses different models but how would you rate it still

wicked talon Mar 4, 2026, 8:28 PM

#

ocean vortex Well yeah like the main AI sub. Since their limits are very reasonable and their...

Hmm true

#

I wonder when apple will integrate Gemini

#

Also s26 agentic ai

#

Seems like a security flaw

ocean vortex Mar 4, 2026, 8:30 PM

#

wicked talon Also how would you rate apple playground for image making

Mostly just a thing to screw around with lmao. I have 16Pro but I barely ever use this playground

#

not extremely useful

willow seal Mar 4, 2026, 8:30 PM

#

wicked talon I wonder when apple will integrate Gemini

Will take a hell of a lot of time i guess

wicked talon Mar 4, 2026, 8:30 PM

#

willow seal Will take a hell of a lot of time i guess

Yuh

#

What iOS are we on

#

iOS 26. It'll probably come out around iOS 27

ocean vortex Mar 4, 2026, 8:31 PM

#

Apple's thing isn't anywhere near as good as the best alternatives. It can forward requests to chatgpt but at that point might as well just use chatgpt app

wicked talon Mar 4, 2026, 8:32 PM

#

ocean vortex Apple's thing isn't anywhere near as good as the best alternatives. It can forwa...

True also it's bad how it asks do you want to ask chatgpt

#

Why couldn't they try to integrate it to just do it anyway

willow seal Mar 4, 2026, 8:32 PM

#

ocean vortex Apple's thing isn't anywhere near as good as the best alternatives. It can forwa...

Basically it's aesthetic cover phone

ocean vortex Mar 4, 2026, 8:33 PM

#

willow seal Basically it's aesthetic cover phone

The phone itself is not bad at all. Does everything very well and it just works. But Apple Intelligence was a fiasco for sure

wicked talon Mar 4, 2026, 8:34 PM

#

ocean vortex The phone itself is not bad at all. Does everything very well and it just works....

Apple intelligence was the worst ai I think on record

ocean vortex Mar 4, 2026, 8:34 PM

#

You almost never have issues with apps not working on an iPhone, or glitching out, or having to deal with bloatware. Everything just works from the first try properly

willow seal Mar 4, 2026, 8:35 PM

#

ocean vortex The phone itself is not bad at all. Does everything very well and it just works....

Camera wise in my opinion even Samsung doesn't compare to it. Smooth af, also a well used iOS will be way smoother than Android. But the way it basically has the same features just with different fonts is money milking

wicked talon Mar 4, 2026, 8:35 PM

#

ocean vortex You almost never have issues with apps not working on an iPhone, or glitching ou...

I'm surprised Samsung allowed Google to partner with apple like that

ocean vortex Mar 4, 2026, 8:36 PM

#

wicked talon I'm surprised Samsung allowed Google to partner with apple like that

I mean... They didn't have a choice lol. Google owns the entire OS that Samsung relies on, they did not have a say.

wicked talon Mar 4, 2026, 8:36 PM

#

A lot of the main Galaxy AI features use Gemini Nano / AI Core

wicked talon Mar 4, 2026, 8:36 PM

#

ocean vortex I mean... They didn't have a choice lol. Google owns the entire OS that Samsung ...

True but I don't think Google can restrict Samsung

#

As it's open source

#

All Google can do is probably not allow Google pre installed apps

#

I think they did that with Huawei

ocean vortex Mar 4, 2026, 8:37 PM

#

wicked talon As it's open source

Ironically this seems to be changing for the worse lately:
https://arstechnica.com/gadgets/2026/03/with-developer-verification-googles-apple-envy-threatens-to-dismantle-androids-open-legacy/

Ars Technica

With developer verification, Google's Apple envy threatens to disma...

Questions remain as Google prepares to lock down Android app distribution in the name of security.

#

Which would annul their main advantage. Not too great. Wouldn't want for them to do this even though I'm not using Android. It is supposed to be open OS...

wicked talon Mar 4, 2026, 8:39 PM

#

ocean vortex Ironically this seems to be changing for the worse lately: https://arstechnica....

Google started restricting side loading with their Advanced Protection

#

An account security program was restricting side loading

#

I would switch to grapheneos if they block side loading

#

Or install surveillance

#

Which UK government is trying to push

wicked talon Mar 4, 2026, 8:41 PM

#

ocean vortex Which would annul their main advantage. Not too great. Wouldn't want for them to...

Also yes, Android's main feature is being open source

#

And side loading

paper vortex Mar 4, 2026, 8:46 PM

#

Hello

loud verge Mar 4, 2026, 8:47 PM

#

wicked talon

Why have you kept scanning on?

#

I keep it off.

wicked talon Mar 4, 2026, 8:47 PM

#

loud verge Why have you kept scanning on?

Why?

loud verge Mar 4, 2026, 8:48 PM

#

Because it always flags mod apps.

#

Always.

wicked talon Mar 4, 2026, 8:48 PM

#

loud verge Because it always flags mod apps.

I've never been given a false flag in my life on pixel

#

And I've installed many modded apps

loud verge Mar 4, 2026, 8:48 PM

#

Hmm...

wicked talon Mar 4, 2026, 8:48 PM

#

Can't talk about piracy :/

loud verge Mar 4, 2026, 8:48 PM

#

Noted.

wicked talon Mar 4, 2026, 8:49 PM

#

loud verge Noted.

Yuh

#

Against discord tos

#

I've only ever got flagged on my old s9

#

Which gave false flags on almost every app

rich spruce Mar 4, 2026, 8:51 PM

#

Hi, does anyone have a Sora 2 code? I already have an account. I'm looking for a code

stray aspen Mar 4, 2026, 8:51 PM

#

no

#

go to openai server

rich spruce Mar 4, 2026, 8:52 PM

#

stray aspen no

I can't get in, does anyone have a solution? I've correctly set up my surfshark VPN

stray aspen Mar 4, 2026, 8:53 PM

#

i would give you one

#

but last time i did i got banned for 'scamming'

loud verge Mar 4, 2026, 8:54 PM

#

stray aspen but last time i did i got banned for 'scamming'

😭

wicked talon Mar 4, 2026, 8:54 PM

#

rich spruce I can't get in, does anyone have a solution? I've correctly set up my surfshark ...

VPN is probably blocking it

#

Discord's anti-DDoS is probably stopping you from entering

spring kelp Mar 4, 2026, 8:55 PM

#

What's the best ai for coding?

rich spruce Mar 4, 2026, 8:55 PM

#

stray aspen but last time i did i got banned for 'scamming'

Are you on WhatsApp? Or instagram

stray aspen Mar 4, 2026, 8:55 PM

#

nah

#

i sent the code here

#

and they banned me

#

lmao

#

@rich spruce

wicked talon Mar 4, 2026, 8:57 PM

#

rich spruce Are you on WhatsApp? Or instagram

Btw I don't think you can join their server due to the VPN

#

You would have to turn it off

rich spruce Mar 4, 2026, 9:00 PM

#

stray aspen and they banned me

Yes, it's blocked after the code. You can send it to me on Telegram; I don't know if it blocks it or not.

rich spruce Mar 4, 2026, 9:01 PM

#

wicked talon Btw I don't think you can join their server due to the VPN

Yes, it's the VPN, they know.

#

I'm looking for a code; I came across a YouTube video on this Discord link

rich spruce Mar 4, 2026, 9:02 PM

#

spring kelp What's the best ai for coding?

Claude 4.6 amazing

spring kelp Mar 4, 2026, 9:03 PM

#

rich spruce Claude 4.6 amazing

Thanks i will try it

stray aspen Mar 4, 2026, 9:07 PM

#

spring kelp What's the best ai for coding?

claudius

quiet skiff Mar 4, 2026, 9:23 PM

#

They added the direct mode battles back, right? Haha

stray aspen Mar 4, 2026, 9:26 PM

#

really?

ashen mauve Mar 4, 2026, 9:32 PM

#

anyone got any ideas for a good rp?

hollow ivy Mar 4, 2026, 9:56 PM

#

ashen mauve anyone got any ideas for a good rp?

roleplaying game with an AI as GM?
I can recommend a sandbox game:
You discover a hidden, ancient alien spaceship while traveling/exploring Antarctica. The spaceship has an AGI/ASI in it, which offers to make you its new (biological) pilot.
Was fun to play with Gemini.

#

Or you play an adventure, where you inherit a time-machine from your reclusive, genius uncle.

#

Claude also is really good in these games.

#

A third game: your character discovers an anomalously large spider in your house (attic, bathroom, or whatever)

#

-# third scenario not recommended for arachnophobic people ^^

golden ocean Mar 4, 2026, 10:02 PM

#

https://cdn.discordapp.com/attachments/1384341249137446992/1432602472245100544/attachment.gif

sinful thorn Mar 4, 2026, 10:09 PM

#

Everything but Video in direct chat 💔

wheat onyx Mar 4, 2026, 10:18 PM

#

5.4?

gleaming roost Mar 4, 2026, 10:18 PM

#

This is the third time Codex has reset my weekly quota to 100%. 🤔
Should I be worried?

#

Oh, now it makes sense.

wicked talon Mar 4, 2026, 10:42 PM

#

Why are 900 people watching the itv logo bro 😭

lucid geyser Mar 4, 2026, 11:02 PM

#

gpt 5.4 on arena but it sucks

fierce kelp Mar 4, 2026, 11:09 PM

#

You shocked?

fickle venture Mar 4, 2026, 11:19 PM

#

gleaming roost Oh, now it makes sense.

And free users...

frosty lava Mar 4, 2026, 11:20 PM

#

lucid geyser gpt 5.4 on arena but it sucks

where you get this info from

#

i don't see it on arena

lucid geyser Mar 4, 2026, 11:22 PM

#

frosty lava where you get this info from

galapagos

echo aurora Mar 4, 2026, 11:22 PM

#

quiet skiff They added the direct mode battles back, right? Haha

Yes, Battles in Direct experiment was added back. There is now a Skip Button added. cc @stray aspen

frosty lava Mar 4, 2026, 11:22 PM

#

lucid geyser galapagos

yes and how do you know its gpt

#

5.4

lucid geyser Mar 4, 2026, 11:23 PM

#

frosty lava yes and how do you know its gpt

it was on design arena a few days ago and it matched gpt style, and everything

#

and on lm arena it says its openai

#

its like the instant or low thinking version though

#

so it sucks

frosty lava Mar 4, 2026, 11:24 PM

#

okay i think you could be right but it might not be gpt at all either

#

wait for the real one

#

better to make a real opinion

stray aspen Mar 4, 2026, 11:28 PM

#

echo aurora Yes, Battles in Direct experiment was added back. There is now a `Skip Button` a...

thank god

lucid geyser Mar 4, 2026, 11:29 PM

#

frosty lava okay i think you can be right but it might not be gpt at all too

very very likely

#

to be gpt

frosty lava Mar 4, 2026, 11:30 PM

#

lucid geyser very very likely

yes but like you said if its a spark version or an instant version or a non thinking version then it's normal

#

if it did worse than gpt 5.3 codex then it can't be the thinking one

lucid geyser Mar 4, 2026, 11:34 PM

#

frosty lava yes but like you said if its a spark version or an instant version or a non thin...

huh

#

its pretty bad, just try it yourself

frosty lava Mar 4, 2026, 11:35 PM

#

lucid geyser its pretty bad, just try it yourself

till now we never saw any new model being worse in capabilities than the previous version :

#

cause it wouldn't make sense at all

#

i told you what it can be, but anyway we don't know since its a hidden one

fervent plank Mar 4, 2026, 11:57 PM

#

It is extremely direct and kinda funny. No fluff

#

But knowledge cutoff in july seems wrong

#

Should be something like december

frosty lava Mar 5, 2026, 12:13 AM

#

fervent plank It is extremely direct and kinda funny. No fluff

we don't have gpt 5.3 yet we only have the instant version so it might be the normal 5.3 without thinking ?

#

or directly 5.4 but without thinking

short sequoia Mar 5, 2026, 12:59 AM

#

https://sovra-mhce-lambda-lexicon.vercel.app/

sly raven Mar 5, 2026, 1:19 AM

#

Quick question: why are certain models (like Gemini 3.1) not receiving new votes anymore? It’s been a long time since I've seen the numbers move. Is this intentional?

tired crypt Mar 5, 2026, 1:37 AM

#

hey i wanna know something, how do people get this notification?

thorny schooner Mar 5, 2026, 1:43 AM

#

Is anyone else having this issue now where it just gives me an error message repeatedly, even after I logged out and everything? And no, I don't have a VPN on, I already turned it off

languid thunder Mar 5, 2026, 1:43 AM

#

tired crypt hey i wanna know something, how do people get this notification?

Same. Was about to ask.
Also, how can we use that model? On the website i tried to direct talk and search, but it didnt show up

vernal storm Mar 5, 2026, 2:31 AM

#

fervent plank It is extremely direct and kinda funny. No fluff

Do you need developer?

ashen mauve Mar 5, 2026, 2:34 AM

#

well gang, the A/B testing in direct chat is broken. whatever you do, DON'T click skip or your chat is going to get bricked

lofty frigate Mar 5, 2026, 2:58 AM

#

Okay look, I don't know if I'm the only one having this issue. When I'm using image generation with the nano banana pro model, let's say I pass an image of myself and write a prompt to include another figure next to me. When it finishes generating, it does create another figure, but for some reason it doesn't generate me; it generates someone who is wearing the same clothes as me but isn't me. Is anyone else experiencing this issue? If I hit retry, after a couple of tries it does generate me, but many times it generates a random person wearing the same clothes as me

vital cove Mar 5, 2026, 3:31 AM

#

stray aspen thank god

Do you need developer?

lofty river Mar 5, 2026, 3:41 AM

#

anyone know why its not letting me reset password ?

hardy lion Mar 5, 2026, 4:24 AM

#

the video generation is now on the website, not in Discord. check it out at https://arena.ai/video

lucid geyser Mar 5, 2026, 5:28 AM

#

tired crypt hey i wanna know something, how do people get this notification?

Bro you're in the discord

undone saffron Mar 5, 2026, 6:03 AM

#

sinful thorn Mar 5, 2026, 6:15 AM

#

Everything but video in direct chat😭😭🙏🙏

rigid holly Mar 5, 2026, 6:26 AM

#

Ayo is it just me or does it demand for you to log in your account to use it?

crisp mauve Mar 5, 2026, 6:27 AM

#

/voice

rigid holly Mar 5, 2026, 6:31 AM

#

You'd think they'd make an announcement or something if they made it so you had to be logged in to use arena

tawny brook Mar 5, 2026, 6:36 AM

#

i forgot the prompt, but i ran this again and look what it gave me LMAO

#

bright shard Mar 5, 2026, 6:57 AM

#

rigid holly Ayo is it just me or does it demand for you to log in your account to use it?

It's required; they should add that it will let you do some things even if you don't log in. They should leave it like it was before.

keen beacon Mar 5, 2026, 6:59 AM

#

Why can't I send a prompt in lmarena incognito

pastel plaza Mar 5, 2026, 7:09 AM

#

keen beacon Mar 5, 2026, 7:19 AM

#

keen beacon Why can't I send a prompt in lmarena incognito

@echo aurora

echo aurora Mar 5, 2026, 7:23 AM

#

keen beacon Why can't I send a prompt in lmarena incognito

Hmm what do you mean?

#

Why not

keen beacon Mar 5, 2026, 7:24 AM

#

i'll show you

#

why i cant do this

#

thats incognito btw

echo aurora Mar 5, 2026, 7:25 AM

#

keen beacon why i cant do this

Have Video Arena selected?

keen beacon Mar 5, 2026, 7:25 AM

#

echo aurora Have Video Arena selected?

no

#

why?

#

im doing a text

echo aurora Mar 5, 2026, 7:27 AM

#

keen beacon why?

I’m not sure. Was able to make it work without having to sign in. Can you post a bug in #1343291835845578853 ? Add the relevant info. I can’t take a look now, but will later.

glacial swan Mar 5, 2026, 7:40 AM

#

Guys, does anyone know why in Claude 4.6 I can’t upload files? Screenshots and things like that would be useful, but for some reason it doesn’t work in LMArena.

undone saffron Mar 5, 2026, 7:53 AM

#

sinful thorn Everything but video in direct chat😭😭🙏🙏

#1372230675914031105
-# Before creating a post with that proposal, check first to see if someone has already created a post with your idea

undone saffron Mar 5, 2026, 7:57 AM

#

echo aurora Have Video Arena selected?

Why is it that at a certain hour, when using AI to create a video, the captcha system is bugged and can't be passed?
Then it won't let me write on code-arena or direct chat because the captcha ruined my previous captcha token

#

There comes a point when the same problem becomes annoying

native folio Mar 5, 2026, 8:17 AM

#

fervent plank Should be something like december

I'm not sure they'd necessarily have advanced the cutoff date between dot-one models, most likely trained over the same normalized data with superior inference

light sleet Mar 5, 2026, 8:22 AM

#

finally u can stop responses, seen in canary arena

rare veldt Mar 5, 2026, 9:07 AM

#

hello

runic rain Mar 5, 2026, 9:07 AM

#

Hi

inland quest Mar 5, 2026, 9:15 AM

#

Omg

Screenshot_20260305_121518_com.huawei.browser.png

#

Absolute Cinema

Screenshot_20260305_121524_com.huawei.browser.png

#

Stop button

inland fossil Mar 5, 2026, 9:40 AM

#

Hello

whole sundial Mar 5, 2026, 9:52 AM

#

<@&1349916362595635286>

#

crypto scam

hollow mulch Mar 5, 2026, 9:56 AM

#

My main account got the stop generation button, but my other account still doesn't have that button :(

wicked sage Mar 5, 2026, 9:57 AM

#

inland quest Absolute Cinema

THEY ADDED A STOP BUTTON???

#

:LFG:

hollow mulch Mar 5, 2026, 9:57 AM

#

inland quest Absolute Cinema

Yes, I saw it today

#

Well, no more infinite generation, yay 😄

#

Gpt 5.3 code vs opus 6, which is better, guys?

manic moss Mar 5, 2026, 10:08 AM

#

How long do we have to wait to get new tokens for Claude AI free? Or is it « one shot »? mb I'm new

shrewd citrus Mar 5, 2026, 10:24 AM

#

Is opus 4-6-search broken?

#

hollow mulch Mar 5, 2026, 10:31 AM

#

Now can you guys fix the issue where a model sometimes can't be used and reports 'something went wrong with this message, please try again'?

distant idol Mar 5, 2026, 10:46 AM

#

@echo aurora when yall making the video arena a direct use or even side by side, its been months since i asked this 😭

hollow mulch Mar 5, 2026, 10:47 AM

#

@echo aurora Is he developer or discord mod?

distant idol Mar 5, 2026, 10:48 AM

#

hollow mulch @echo aurora Is he developer or discord mod?

inbetween ✌

#

king

bright shard Mar 5, 2026, 11:03 AM

#

It's a bug; if you delete the cookies and try again, the login message doesn't appear. It only appears sometimes, so I imagine it's a bug.

fervent plank Mar 5, 2026, 11:17 AM

#

frosty lava we don't have gpt 5.3 yet we only have the instant version so it might be the no...

I am pretty sure galapagos is just the upcoming open source model

#

from openai

rich spruce Mar 5, 2026, 11:18 AM

#

stray aspen thank god

Lneduo2en ?

ocean vortex Mar 5, 2026, 11:18 AM

#

inland quest Omg

how come the UI is half Russian for you?

frosty lava Mar 5, 2026, 11:36 AM

#

fervent plank I am pretty sure galapagos is just the upcoming open source model

galapagos is doing much worse than any of openai's latest thinking models so it can't be a thinking model:

#

that's probably an instant or normal model

#

so it will be for free user too

#

for sure

fervent plank Mar 5, 2026, 11:41 AM

#

frosty lava galapagos is doing much worse than any open ai last thinking model so it can't b...

That's what I meant, it is going to be the upcoming open source model that you can use locally on your own computer

fervent plank Mar 5, 2026, 11:42 AM

#

fervent plank That's what I meant, it is going to be the upcoming open source model that you can...

Either that or 5.4 instant, but it does not feel like they are releasing that

frosty lava Mar 5, 2026, 11:42 AM

#

honestly all i care about is thinking model capabilities i don't need another instant model or low cost

fervent plank Mar 5, 2026, 12:21 PM

#

frosty lava honestly all i care about is thinking model capabilities i don't need another in...

Yeah I am waiting on 5.4 pro until I renew my pro subscription.

light scroll Mar 5, 2026, 12:22 PM

#

Arena has updated something? Previously I was able to generate 3 images per day with no account

ocean vortex Mar 5, 2026, 1:34 PM

#

fervent plank Either that or 5.4 instant, but it does not feel like they are releasing that

it could be 5.3 Instant no? That one is still not on a leaderboard

tiny jolt Mar 5, 2026, 1:50 PM

#

.txt support for arena when?

humble ether Mar 5, 2026, 2:24 PM

#

does generating images need a login first now?

#

tame haven Mar 5, 2026, 2:36 PM

#

I can't chat anymore without login. Is it time to leave lmarena?

humble ether Mar 5, 2026, 2:36 PM

#

maybe

fervent plank Mar 5, 2026, 3:00 PM

#

ocean vortex it could be 5.3 Instant no? That one is still not on a leaderboard

Nah 5.3 instant uses 1 emoji per sentence. This is way more concise than 5.3 instant

wicked talon Mar 5, 2026, 3:18 PM

#

I like how qwen can image create anything except 18+ content

wicked talon Mar 5, 2026, 3:18 PM

#

tame haven I can't chat anymore without login. Is it time to leave lmarena?

To prevent people clearing their cookies + cache to get more message allowance

#

Nvm, I hate it. It doesn't make Donald Trump in an "I hate America" shirt

burnt sinew Mar 5, 2026, 3:42 PM

#

I dont know but id rather just pick opus 4.6 every time

#

And to have router show me what model its using

tired mantle Mar 5, 2026, 3:48 PM

#

so annoying

ocean vortex Mar 5, 2026, 3:48 PM

#

fervent plank Nah 5.3 instant uses 1 emoji per sentence. This is way more concise than 5.3 in...

it behaves differently on API vs chatgpt though. Chatgpt has lots of instructions they are feeding it.

Here are 10 API responses of it I generated for someone else earlier today (just a silly argument about whether it can find and output release date of itself accurately lol):

📎 message.txt

tired mantle Mar 5, 2026, 3:48 PM

#

it's broken

ocean vortex Mar 5, 2026, 3:49 PM

#

As you can see not a single emoji anywhere

#

No added sys/dev instructions here, default settings and a simple question with OpenAI's search enabled.

ocean vortex Mar 5, 2026, 3:59 PM

#

shrewd citrus

You have no more code_execution tool calls remaining this turn

errant inlet Mar 5, 2026, 4:05 PM

#

https://www.linkedin.com/company/mhn-solutions-2026/

MHN Solutions | LinkedIn

MHN Solutions | 39 followers on LinkedIn. A results-driven digital marketing and web development agency helping businesses grow, scale, and dominate online. | MHN Solutions is a results-driven digital marketing and web development agency helping businesses grow, scale, and dominate online.

We specialize in:

• Search Engine Optimization (SEO)...

fickle venture Mar 5, 2026, 4:21 PM

#

tired mantle so annoying

Make a new chat, but I am pretty sure Gemini nano banana is the only one having this issue

tired mantle Mar 5, 2026, 4:22 PM

#

fickle venture Make a new chat, but I am pretty sure Gemini nano banana is the only one having ...

Only this new version has this issue. 3. works fine

fickle venture Mar 5, 2026, 4:22 PM

#

tired mantle Only this new version has this issue. 3. works fine

Well yeah you can use it on Google flow for free

tired mantle Mar 5, 2026, 4:22 PM

#

fickle venture Well yeah you can use it on Google flow for free

With Gemini watermark?

fickle venture Mar 5, 2026, 4:23 PM

#

tired mantle With Gemini watermark?

Uhh I don't think it has it

fickle venture Mar 5, 2026, 4:24 PM

#

tired mantle With Gemini watermark?

https://labs.google/fx/tools/whisk

Whisk - labs.google/fx

A new experimental tool that lets you use images as prompts to visualize your ideas and tell your story.

#

Here it is

sharp mirage Mar 5, 2026, 4:52 PM

#

Yo

#

Hi chat

languid kernel Mar 5, 2026, 4:52 PM

#

hi new here

sharp mirage Mar 5, 2026, 4:53 PM

#

Hi

#

Wsg

#

The chat died

#

Yoooo. Chat

languid kernel Mar 5, 2026, 5:10 PM

#

yo folks, where is video arena channel? pls anyone can guide me to it

unreal tide Mar 5, 2026, 5:13 PM

#

holy plot twist

steep blaze Mar 5, 2026, 5:24 PM

#

languid kernel yo folks, where is video arena channel? pls anyone can guide me to it

Can't find it either

mortal vale Mar 5, 2026, 5:25 PM

#

@steep blaze Note that Video Arena has been removed from the server. More information can be found in #announcements. You can still generate videos on the website.

vital lake Mar 5, 2026, 6:08 PM

#

GPT 5.4 LAUNCHED
EVERYONE

#

EVERYONE

#

5.4 LAUNCHED

#

frosty lava Mar 5, 2026, 6:10 PM

#

yesssss its real

vital lake Mar 5, 2026, 6:11 PM

#

Someone addd ittttt

cloud zinc Mar 5, 2026, 6:11 PM

#

#

https://openai.com/index/introducing-gpt-5-4/

vital lake Mar 5, 2026, 6:11 PM

#

cloud zinc

thx

cloud zinc Mar 5, 2026, 6:12 PM

#

frosty lava Mar 5, 2026, 6:17 PM

#

okay so they are saying its better than 5.3 codex in coding too so now i can't wait for 5.4 codex to see even more improvement in coding task

vital lake Mar 5, 2026, 6:20 PM

#

It doesnt seem like they focused on intelligence this time

#

Like they did with 5.2

frosty lava Mar 5, 2026, 6:24 PM

#

vital lake It doesnt seem like they focused on intelligence this time

Yes this one should be as good as or better than 5.3 codex, but not by a lot, in terms of coding tasks or intelligence, that's why it's not 5.4 codex

#

but for sure they'll release the codex version if what you mean by intelligence is coding capabilities

vital lake Mar 5, 2026, 6:24 PM

#

No, general intelligence

#

Not coding

#

I value a smarter assistant more

#

hopefully 5.4 isn't a disappointment

frosty lava Mar 5, 2026, 6:25 PM

#

vital lake No, general intelligence

oh then, you're right, it seems more cost efficient and should be more intelligent, but they didn't focus on it as much as for previous models

shell pewter Mar 5, 2026, 6:25 PM

#

its not on arena yet right?

frosty lava Mar 5, 2026, 6:25 PM

#

still should be smarter

vital lake Mar 5, 2026, 6:25 PM

#

vital lake Mar 5, 2026, 6:25 PM

#

shell pewter its not on arena yet right?

It is

shell pewter Mar 5, 2026, 6:25 PM

#

vital lake It is

thx man

frosty lava Mar 5, 2026, 6:26 PM

#

vital lake I value a smarter assistant more

seeing the benchmark it should be still a good improvement in intelligence from 5.2

crisp anvil Mar 5, 2026, 6:26 PM

#

Mine isn't working

vital lake Mar 5, 2026, 6:27 PM

#

crisp anvil Mine isn't working

Refresh?

#

Or open new tab

shell pewter Mar 5, 2026, 6:27 PM

#

vital lake It is

have you (or anyone else who has used ai a lot) tested gpt 5.4/high? how is it compared to other tier 1 AI?

frosty lava Mar 5, 2026, 6:27 PM

#

knowledge cutoff seems to be august 2025

vital lake Mar 5, 2026, 6:28 PM

#

shell pewter did you (or anyone else) who has used a lot ai and tested gpt 5.4/high? how is i...

I mean it just released

#

frosty lava Mar 5, 2026, 6:29 PM

#

shell pewter Mar 5, 2026, 6:29 PM

#

crisp anvil Mine isn't working

1 saw on X that openai was already experiencing errors before the launch (lol)
2 openai just released it so their own servers must be experiencing high volume -> slow
3 same thing for arena itself

frosty lava Mar 5, 2026, 6:31 PM

#

vital lake

actually every model fails at this answer due to knowledge cutoff and what they've been trained on most

#

ask gemini to guess, it will do the same, and same for anthropic

stray aspen Mar 5, 2026, 6:32 PM

#

gpt 5.4 is out

vital lake Mar 5, 2026, 6:32 PM

#

frosty lava actually every model fail at this answer due to knowledge cutoff and what they'v...

Well they are given like zero data to determine

#

So a guess like GPT 5 era is good

vital lake Mar 5, 2026, 6:32 PM

#

stray aspen gpt 5.4 is out

We know

frosty lava Mar 5, 2026, 6:33 PM

#

yes its good that they are saying gpt 5

#

its not that old

vital lake Mar 5, 2026, 6:33 PM

#

It talks so natural and smart

frosty lava Mar 5, 2026, 6:33 PM

#

vital lake It talks so natural and smart

they said that they improved the personality to be better again so

#

i guess its good since everyone complained about it

loud verge Mar 5, 2026, 6:35 PM

#

Guys

compact flame Mar 5, 2026, 6:35 PM

#

Well gpt 5.4

#

Is it any good?

loud verge Mar 5, 2026, 6:36 PM

#

Was there no announcement for gpt 5.4 on arena?

frosty lava Mar 5, 2026, 6:36 PM

#

compact flame Well gpt 5.4

it's made for general purpose tasks and is supposed to be good at coding even tho it's not a codex model

stray aspen Mar 5, 2026, 6:36 PM

#

why does gpt 5.4 not answer

loud verge Mar 5, 2026, 6:36 PM

#

Chad model

wind ember Mar 5, 2026, 6:36 PM

#

compact flame Is it any good?

same performance as gpt 5.3 but cheaper

frosty lava Mar 5, 2026, 6:36 PM

#

should be better or as good as 5.3 codex on coding task

wind ember Mar 5, 2026, 6:36 PM

#

still same coding slops

vital lake Mar 5, 2026, 6:37 PM

#

compact flame Well gpt 5.4

Huge difference from GPT 5.2

#

In more ways than one

wind ember Mar 5, 2026, 6:37 PM

#

well its def better than gpt 5.2

#

but not that much diff from gpt 5.3

compact flame Mar 5, 2026, 6:37 PM

#

I wonder if we should expect gpt 5.5 sometime later

frosty lava Mar 5, 2026, 6:37 PM

#

the pro version is beating gemini deep think right ?

stray aspen Mar 5, 2026, 6:38 PM

#

gpt 5.4 sucks

#

we need gpt 6

fathom apex Mar 5, 2026, 6:38 PM

#

yea

#

gpt sucks

frosty lava Mar 5, 2026, 6:40 PM

#

So now openai is doing monthly releases instead of every two months?

inner relic Mar 5, 2026, 6:40 PM

#

Is gpt 5.4 good in roleplay

stray aspen Mar 5, 2026, 6:40 PM

#

inner relic Is gpt 5.4 good in roleplay

no

#

it sucks

vital lake Mar 5, 2026, 6:40 PM

#

compact flame I wonder if we should expect gpt 5.5 sometime later

They release monthly so probably around mid or late April.

frosty lava Mar 5, 2026, 6:40 PM

#

vital lake They release monthly so probably around mid or late April.

that's crazy speeding up by 2x ?

vital lake Mar 5, 2026, 6:40 PM

#

inner relic Is gpt 5.4 good in roleplay

Don't listen to him, it's been around for like 20 minutes. No one could get a good understanding of the model within that time frame.

stray aspen Mar 5, 2026, 6:40 PM

#

gpt 5.4 is literally so ass

honest verge Mar 5, 2026, 6:40 PM

#

Gpt 5.5 is going to come out April 5

stray aspen Mar 5, 2026, 6:40 PM

#

ill just go back to claude 4.6 for the 50th time this month

frosty lava Mar 5, 2026, 6:41 PM

#

stray aspen gpt 5.4 is literally so ass

me when im hating without even testing

vital lake Mar 5, 2026, 6:41 PM

#

frosty lava that's crazy speeding up by 2x ?

Yup, to keep up with Opus 5

royal sail Mar 5, 2026, 6:41 PM

#

stray aspen gpt 5.4 is literally so ass

the model just dropped broski

#

lol

stray aspen Mar 5, 2026, 6:41 PM

#

frosty lava me when im hating without even testing

wdym i just tested it

honest verge Mar 5, 2026, 6:41 PM

#

And gpt 5.6 may 4

inner relic Mar 5, 2026, 6:41 PM

#

stray aspen no

same guy who said glm-5 sucks.

#

bruh

frosty lava Mar 5, 2026, 6:41 PM

#

stray aspen wdym i just tested it

okay bro

stray aspen Mar 5, 2026, 6:41 PM

#

inner relic same guy who said glm-5 sucks.

glm-5 is actually decent

light sleet Mar 5, 2026, 6:41 PM

#

Gpt 5.4 is good in lua coding 💀

royal sail Mar 5, 2026, 6:41 PM

#

The model is quite a bit better conversationally

honest verge Mar 5, 2026, 6:41 PM

#

and maybe gpt 5.7 early-mid june

royal sail Mar 5, 2026, 6:41 PM

#

It sounds a lot less AI-ish

inner relic Mar 5, 2026, 6:41 PM

#

stray aspen glm-5 is actually decent

alright.

frosty lava Mar 5, 2026, 6:42 PM

#

it talks in a better way

honest verge Mar 5, 2026, 6:42 PM

#

When will gpt 6 come out

stray aspen Mar 5, 2026, 6:42 PM

#

lets get gpt 6 already

#

no more gpt 5.x crap

honest verge Mar 5, 2026, 6:42 PM

#

Because at this point it's only 5.x

royal sail Mar 5, 2026, 6:42 PM

#

they're scared to call anything gpt 6 right now lol

#

I doubt they have anything worth calling gpt 6

frosty lava Mar 5, 2026, 6:42 PM

#

stray aspen no more gpt 5.x crap

so you are basing your opinion on the name they give to the model not on the actual performance, nice

#

knowing every other companies are doing the same

stray aspen Mar 5, 2026, 6:42 PM

#

frosty lava so you are basing your opinion on the name they give to the model not on the act...

when did i say that

tawny canyon Mar 5, 2026, 6:42 PM

#

what do u think ,will gpt 5.4 take the first place in arena?

stray aspen Mar 5, 2026, 6:42 PM

#

i literally just said that i tested it

#

bro what the hell is this

#

even glm is better

#

glory to anthropic

royal sail Mar 5, 2026, 6:43 PM

#

stray aspen bro what the hell is this

5.4's coding skills are pretty much nearly identical to 5.3 codex

#

especially with frontend

honest verge Mar 5, 2026, 6:44 PM

#

High version is out

Screenshot_2026-03-05-21-43-28-458_com.android.chrome-edit.jpg

vital lake Mar 5, 2026, 6:44 PM

#

stray aspen bro what the hell is this

Not everyone values frontend only.

royal sail Mar 5, 2026, 6:44 PM

#

I pretty much just imagine this model to be the general-use version of 5.3 codex

stray aspen Mar 5, 2026, 6:44 PM

#

claude is way better

frosty lava Mar 5, 2026, 6:44 PM

#

this one is made for general purpose task.

inner relic Mar 5, 2026, 6:44 PM

#

Let's test creative writing

honest verge Mar 5, 2026, 6:44 PM

#

But no xhigh...

frosty lava Mar 5, 2026, 6:44 PM

#

mainly but still good at coding

inner relic Mar 5, 2026, 6:44 PM

#

since chatgpt messed up earlier

honest verge Mar 5, 2026, 6:44 PM

#

I wonder if arena will ever have xhigh

compact flame Mar 5, 2026, 6:44 PM

#

stray aspen claude is way better

Well if people can't afford Claude it's mid

honest verge Mar 5, 2026, 6:44 PM

#

Or it's too expensive

compact flame Mar 5, 2026, 6:44 PM

#

honest verge I wonder if arena will ever have xhigh

Too expensive

vital lake Mar 5, 2026, 6:44 PM

#

inner relic Let's test creative writing

It seems like it would shine there

royal sail Mar 5, 2026, 6:44 PM

#

stray aspen claude is way better

based on what?

stray aspen Mar 5, 2026, 6:45 PM

#

honest verge I wonder if arena will ever have xhigh

no

royal sail Mar 5, 2026, 6:45 PM

#

the model has barely been out for an hour lol

stray aspen Mar 5, 2026, 6:45 PM

#

royal sail based on what?

on my tests

royal sail Mar 5, 2026, 6:45 PM

#

on your 3 prompts?

stray aspen Mar 5, 2026, 6:45 PM

#

yes

vital lake Mar 5, 2026, 6:45 PM

#

royal sail the model has barely been out for an hour lol

He has only done basic frontend tests btw

royal sail Mar 5, 2026, 6:45 PM

#

congrats

#

we should hire you for benchmarking!

stray aspen Mar 5, 2026, 6:45 PM

#

i test each model for the stuff i do

#

if its not good then its trash for me cause i wont give any other use

honest verge Mar 5, 2026, 6:45 PM

#

Waiting for extreme thinking

inner relic Mar 5, 2026, 6:45 PM

#

All models are not perfect yet

#

that's reality

honest verge Mar 5, 2026, 6:46 PM

#

But I think it's not true

#

Cuz it's still not there

frosty lava Mar 5, 2026, 6:46 PM

#

stray aspen i test each model for the stuff i do

yes you are doing front end task and testing it on a model made for general purpose task

vital lake Mar 5, 2026, 6:46 PM

#

honest verge Waiting for extreme thinking

Oh yeah

stray aspen Mar 5, 2026, 6:46 PM

#

inner relic All models are not perfect yet

nah

vital lake Mar 5, 2026, 6:46 PM

#

But I think extreme thinking is a fake leak

royal sail Mar 5, 2026, 6:46 PM

#

stray aspen i test each model for the stuff i do

If you form an opinion on a model after 3 tests, then I don't really have anything else to tell you

vital lake Mar 5, 2026, 6:46 PM

#

stray aspen i test each model for the stuff i do

Frontend should be the least

stray aspen Mar 5, 2026, 6:46 PM

#

gemini 3 deepthink on arena when

frosty lava Mar 5, 2026, 6:46 PM

#

at least before judging coding capabilities wait for the coding model lol

royal sail Mar 5, 2026, 6:46 PM

#

Most benchmarks run hundreds of prompts

#

You ran 3.

frosty lava Mar 5, 2026, 6:47 PM

#

stray aspen gemini 3 deepthink on arena when

never since its from a pro subscription

stray aspen Mar 5, 2026, 6:47 PM

#

well that sucks

frosty lava Mar 5, 2026, 6:47 PM

#

just like we ain't ever getting gpt 5.4 pro on arena

stray aspen Mar 5, 2026, 6:47 PM

#

frosty lava just like we ain't ever getting gpt 5.4 pro on arena

yupp has them models tho

compact flame Mar 5, 2026, 6:48 PM

#

frosty lava never since its from a pro subscription

Isn't it for ultra?

stray aspen Mar 5, 2026, 6:48 PM

#

they had a 5.2 pro

#

i guess it was fake

honest verge Mar 5, 2026, 6:48 PM

#

And also all leaks about open source gpt are fake ...

frosty lava Mar 5, 2026, 6:48 PM

#

compact flame Isn't it for ultra?

yes my bad

#

i mean its from the highest subscription

honest verge Mar 5, 2026, 6:48 PM

#

Really hoped openai will release gpt oss 2 or something

compact flame Mar 5, 2026, 6:48 PM

#

Yeh

frosty lava Mar 5, 2026, 6:48 PM

#

so it can't be on arena

royal sail Mar 5, 2026, 6:48 PM

#

honest verge Really hoped openai will release gpt oss 2 or something

i doubt they're going to touch oss again any time soon

stray aspen Mar 5, 2026, 6:48 PM

#

honest verge Really hoped openai will release gpt oss 2 or something

for what

royal sail Mar 5, 2026, 6:48 PM

#

especially with how they're doing in the public eye right now

stray aspen Mar 5, 2026, 6:48 PM

#

people want deepseek 4

tawny canyon Mar 5, 2026, 6:49 PM

#

i did some tests and in general texting gpt 5.4 is not better compared to claude opus 4.6 or even gemini 3.1

royal sail Mar 5, 2026, 6:49 PM

#

tawny canyon i did some tests and in general texting gpt 5.4 is not better compared to claude...

did you use 5.4 high?

tawny canyon Mar 5, 2026, 6:49 PM

#

stray aspen people want deepseek 4

they are sleeping

tawny canyon Mar 5, 2026, 6:49 PM

#

royal sail did you use 5.4 high?

yes

vital lake Mar 5, 2026, 6:49 PM

#

stray aspen people want deepseek 4

They just gonna distill why would we be excited

honest verge Mar 5, 2026, 6:49 PM

#

Is o3 pro better than gpt 5.4?💀

royal sail Mar 5, 2026, 6:50 PM

#

oh hell naw

stray aspen Mar 5, 2026, 6:50 PM

#

honest verge Is o3 pro better than gpt 5.4?💀

no

#

o3 pro sucks

vital lake Mar 5, 2026, 6:50 PM

#

honest verge Is o3 pro better than gpt 5.4?💀

Ummmm source? 😭 🙏

stray aspen Mar 5, 2026, 6:50 PM

#

gpt 5.4 is miles better

royal sail Mar 5, 2026, 6:50 PM

#

bro talkin bout ancient technologies in the big 26

#

o3 is the past

vital lake Mar 5, 2026, 6:50 PM

#

stray aspen o3 pro sucks

The early pro models like o1 and o3 pro were sloppy.

#

Since post RL was new

honest verge Mar 5, 2026, 6:50 PM

#

royal sail o3 is the past

Then why openai keeps it

vital lake Mar 5, 2026, 6:50 PM

#

royal sail o3 is the past

Its still a decent model

royal sail Mar 5, 2026, 6:50 PM

#

Some labs and companies still use it

honest verge Mar 5, 2026, 6:50 PM

#

o3 pro and o3 are still live

royal sail Mar 5, 2026, 6:50 PM

#

vital lake Its still a decent model

not saying it's a bad model

#

but there are much better options now

vital lake Mar 5, 2026, 6:51 PM

#

royal sail but there are much better options now

Fo sure

royal sail Mar 5, 2026, 6:51 PM

#

and cheaper too

honest verge Mar 5, 2026, 6:51 PM

#

And openai won't remove them until 2027

vital lake Mar 5, 2026, 6:51 PM

#

honest verge And openai won't remove them until 2027

Like confirmed?

royal sail Mar 5, 2026, 6:51 PM

#

there's no real reason to use o3 right now unless you REALLY like the model for your use case

honest verge Mar 5, 2026, 6:51 PM

#

vital lake Like confirmed?

Because no official deprecation date

royal sail Mar 5, 2026, 6:51 PM

#

there are more effective options

honest verge Mar 5, 2026, 6:52 PM

#

They are deprecating gpt 5.1 but not o3

vital lake Mar 5, 2026, 6:52 PM

#

royal sail there's no real reason to use o3 right now unless you REALLY like the model for ...

Something about o3 was different idk why

honest verge Mar 5, 2026, 6:52 PM

#

Like why removing 5.1

#

But not o3

royal sail Mar 5, 2026, 6:52 PM

#

vital lake Something about o3 was different idk why

It definitely marked a huge leap in reasoning

#

One of my personal favorite models

#

just far too expensive

#

also hallucinated everything lol

vital lake Mar 5, 2026, 6:52 PM

#

royal sail It definitely marked a huge leap in reasoning

I tested it in chess and it was the first model ever to not hallucinate.

stray aspen Mar 5, 2026, 6:52 PM

#

vital lake Something about o3 was different idk why

the ridiculous pricing

honest verge Mar 5, 2026, 6:52 PM

#

royal sail just far too expensive

Especially pro version

#

Btw what's the point of using o3 pro

#

Like is it better?

vital lake Mar 5, 2026, 6:53 PM

#

stray aspen the ridiculous pricing

At least it was cheaper than o1

royal sail Mar 5, 2026, 6:53 PM

#

honest verge Like is it better?

it just thinks wayy longer for higher reasoning capabilities

vital lake Mar 5, 2026, 6:53 PM

#

Who remembers o1 pro pricing? 😭

vital lake Mar 5, 2026, 6:53 PM

#

royal sail it just thinks wayy longer for higher reasoning capabilities

Barely higher

royal sail Mar 5, 2026, 6:53 PM

#

yeah lol

vital lake Mar 5, 2026, 6:53 PM

#

I never saw any big improvements, but maybe I wasnt using it right

frosty lava Mar 5, 2026, 6:54 PM

#

that mean every month we are getting a new gpt its insane

tawny canyon Mar 5, 2026, 6:54 PM

#

will gpt5.4 search mode be added to arena?

vital lake Mar 5, 2026, 6:54 PM

#

Obviously

royal sail Mar 5, 2026, 6:54 PM

#

I feel like OpenAI is gonna have to release something extraordinary to get back into the race

#

feels like they're falling off frontier

stray aspen Mar 5, 2026, 6:54 PM

#

glory to anthropic

frosty lava Mar 5, 2026, 6:55 PM

#

royal sail I feel like OpenAI is gonna have to release something extraordinary to get back ...

i don't think it's the real capabilities of their model that people are judging but the fact that they removed 4o lol

royal sail Mar 5, 2026, 6:55 PM

#

well

tawny canyon Mar 5, 2026, 6:55 PM

#

stray aspen glory to anthropic

gemini will take 1st place in may

royal sail Mar 5, 2026, 6:55 PM

#

4o put thousands of people into psychosis so

frosty lava Mar 5, 2026, 6:55 PM

#

royal sail 4o put thousands of people into psychosis so

exactly

#

and people want it back

stray aspen Mar 5, 2026, 6:55 PM

#

tawny canyon gemini will take 1st place in may

then they will nerf the model

honest verge Mar 5, 2026, 6:56 PM

#

frosty lava and people want it back

I used it very much but it's not very impressive now

#

But in early 2025 it was peak

thorny schooner Mar 5, 2026, 6:56 PM

#

I still hate they for some reason added battle mode directly to direct mode

honest verge Mar 5, 2026, 6:56 PM

#

There was no better ai writing

royal sail Mar 5, 2026, 6:56 PM

#

I will agree that there was something unique about 4o in the sense that it just had less refusals and felt less corporate

stray aspen Mar 5, 2026, 6:56 PM

#

thorny schooner I still hate they for some reason added battle mode directly to direct mode

pineapple said theres a skip button now

frosty lava Mar 5, 2026, 6:56 PM

#

honest verge I used it very much but it's not very impressive now

people just want a model that will make them go into psychosis and agree with everything they say

royal sail Mar 5, 2026, 6:56 PM

#

but it was just too sycophantic

frosty lava Mar 5, 2026, 6:56 PM

#

they want the model to agree with everything they say that's it

royal sail Mar 5, 2026, 6:56 PM

#

yeah pretty much

honest verge Mar 5, 2026, 6:57 PM

#

frosty lava they want the model to agree with everything they say that's it

Hey 4o is still live

royal sail Mar 5, 2026, 6:57 PM

#

because if a model agrees with everything i say then it must be the best model

#

/s

honest verge Mar 5, 2026, 6:57 PM

#

Not in chatgpt but it's live

stray aspen Mar 5, 2026, 6:57 PM

#

honest verge Hey 4o is still live

yes

frosty lava Mar 5, 2026, 6:57 PM

#

honest verge Hey 4o is still live

yeah i know

honest verge Mar 5, 2026, 6:57 PM

#

So what's the point of #keep4o

royal sail Mar 5, 2026, 6:58 PM

#

honest verge Hey 4o is still live

the people who were going into psychosis can NOT afford that API pricing 😭 🙏

#

that's why they still want it in the app

frosty lava Mar 5, 2026, 6:58 PM

#

i don't wanna be in a world where 80% of people are in psychosis due to an ai

#

please

#

use ai for coding task or general purpose

distant idol Mar 5, 2026, 6:59 PM

#

any news on seedance 2 api guys

stray aspen Mar 5, 2026, 6:59 PM

#

distant idol any news on seedance 2 api guys

delayed

light sleet Mar 5, 2026, 6:59 PM

#

Gpt 5.4 is the best at lua code?

distant idol Mar 5, 2026, 6:59 PM

#

stray aspen delayed

damn, to when thoo

mossy girder Mar 5, 2026, 6:59 PM

#

Why UI is scrolling down forcefully after response is done? Before it wasn't there, it's irritating.

stray aspen Mar 5, 2026, 7:00 PM

#

light sleet Gpt 5.4 is the best at lua code?

no lmao

#

its horrible

#

claude is better

light sleet Mar 5, 2026, 7:00 PM

#

stray aspen no lmao

what's the best for lua?

stray aspen Mar 5, 2026, 7:00 PM

#

claudius

light sleet Mar 5, 2026, 7:00 PM

#

k

royal sail Mar 5, 2026, 7:00 PM

#

bro

stray aspen Mar 5, 2026, 7:00 PM

#

@daring rock

distant idol Mar 5, 2026, 7:01 PM

#

what are u doing nephew

honest verge Mar 5, 2026, 7:01 PM

#

royal sail bro

He will use ai for this 100%

#

What's the point of paying someone 5$

#

When you can just use ai

royal sail Mar 5, 2026, 7:01 PM

#

I swear these guys just ask a model "how can i make money fast with ai" and do whatever it says

stray aspen Mar 5, 2026, 7:01 PM

#

honest verge He will use ai for this 100%

its a scam bot wdym

honest verge Mar 5, 2026, 7:01 PM

#

stray aspen its a scam bot wdym

Ik

#

But still funny

distant idol Mar 5, 2026, 7:02 PM

#

how to make money fast

honest verge Mar 5, 2026, 7:02 PM

#

distant idol how to make money fast

You have to sleep less

#

Sleeping for 30 minutes is enough

stray aspen Mar 5, 2026, 7:03 PM

#

distant idol how to make money fast

work

honest verge Mar 5, 2026, 7:04 PM

#

stray aspen work

No you have to sell your money

#

So you will make money

#

Then sell it again

distant idol Mar 5, 2026, 7:04 PM

#

honest verge Sleeping for 30 minutes is enough

real shi 💥💥

honest verge Mar 5, 2026, 7:04 PM

#

And you will make money again

#

Do this forever

distant idol Mar 5, 2026, 7:05 PM

#

stray aspen work

im an officer, do i need to quit

honest verge Mar 5, 2026, 7:05 PM

#

And you will get new money

honest verge Mar 5, 2026, 7:05 PM

#

distant idol im an officer, do i need to quit

You have to sleep less

#

What's this...

Screenshot_2026-03-05-22-05-30-328_com.android.chrome-edit.jpg

#

Concord grape?

#

Mammoth newt 0226

frosty lava Mar 5, 2026, 7:06 PM

#

honest verge What's this...

where did you find this

honest verge Mar 5, 2026, 7:06 PM

#

Yupp ai

stray aspen Mar 5, 2026, 7:07 PM

#

honest verge What's this...

why is it so expensive

honest verge Mar 5, 2026, 7:07 PM

#

stray aspen why is it so expensive

Idk maybe new Claude?

vital lake Mar 5, 2026, 7:07 PM

#

#

5.4 seems smart

frosty lava Mar 5, 2026, 7:07 PM

#

personal opinion but i love how gpt 5.4 talk

distant idol Mar 5, 2026, 7:08 PM

#

vital lake

nephew what did u even tell it

vital lake Mar 5, 2026, 7:09 PM

#

distant idol nephew what did u even tell it

Debate about god

honest verge Mar 5, 2026, 7:10 PM

#

Wait mercury 2 is out?

Screenshot_2026-03-05-22-09-33-188_com.android.chrome-edit.jpg

distant idol Mar 5, 2026, 7:10 PM

#

vital lake Debate about god

gus fring is something

honest verge Mar 5, 2026, 7:10 PM

#

And it's after gpt 5.4

frosty lava Mar 5, 2026, 7:10 PM

#

honest verge Wait mercury 2 is out?

what's mercury 2

vital lake Mar 5, 2026, 7:11 PM

#

frosty lava what's mercury 2

A really fast model, I forgot what type

#

Somethin new idk

distant idol Mar 5, 2026, 7:11 PM

#

honest verge Wait mercury 2 is out?

feb and march is filled with crazy models

honest verge Mar 5, 2026, 7:11 PM

#

distant idol feb and march is filled with crazy models

Feb is my favourite

meager harbor Mar 5, 2026, 7:11 PM

#

gpt 5.4 worse than 5.2 latest

#

on arena

frosty lava Mar 5, 2026, 7:12 PM

#

meager harbor gpt 5.4 worse than 5.2 latest

actually no way

honest verge Mar 5, 2026, 7:12 PM

#

Also qwen image 2.0 and pro is out!

distant idol Mar 5, 2026, 7:12 PM

#

meager harbor gpt 5.4 worse than 5.2 latest

people when its comes to gpt 5.4 either they are glazing or hating no inbetween 😭

honest verge Mar 5, 2026, 7:12 PM

#

Crazy

inner relic Mar 5, 2026, 7:12 PM

#

They are focused on coding skill not creativity

vital lake Mar 5, 2026, 7:13 PM

#

inner relic They are focused on coding skill not creativity

Not true, its a balance

distant idol Mar 5, 2026, 7:13 PM

#

honest verge Also qwen image 2.0 and pro is out!

frosty lava Mar 5, 2026, 7:13 PM

#

inner relic They are focused on coding skill not creativity

for coding skill it would be a codex version

#

this one is for general purpose

inner relic Mar 5, 2026, 7:14 PM

#

Ok

#

Anyways.. My one question is

honest verge Mar 5, 2026, 7:14 PM

#

Gpt 5.4 pro pricing is crazy

inner relic Mar 5, 2026, 7:14 PM

#

Chatgpt 5.4 is good at writing and creative?

vital lake Mar 5, 2026, 7:14 PM

#

honest verge Mar 5, 2026, 7:14 PM

#

Like it's too expensive

#

It's unusable for normal peoples

frosty lava Mar 5, 2026, 7:15 PM

#

honest verge Gpt 5.4 pro pricing is crazy

yes but i'd say same for gemini ultra or claude max

vital lake Mar 5, 2026, 7:15 PM

#

honest verge It's unusable for normal peoples

Thats why it has pro in the name

vital lake Mar 5, 2026, 7:15 PM

#

vital lake

This lowers my expectations on 5.4...

royal sail Mar 5, 2026, 7:15 PM

#

honest verge Concord grape?

need that jelly

honest verge Mar 5, 2026, 7:15 PM

#

vital lake Thats why it has pro in the name

Then what's the point of releasing pro version in app

distant idol Mar 5, 2026, 7:15 PM

#

wharr

honest verge Mar 5, 2026, 7:15 PM

#

No one will use it

royal sail Mar 5, 2026, 7:15 PM

#

vital lake

"arc-agi-2 is reaching saturation"

#

yeah okay

vital lake Mar 5, 2026, 7:16 PM

#

honest verge Then what's the point of releasing pro version in app

For people with money

frosty lava Mar 5, 2026, 7:16 PM

#

honest verge No one will use it

some people are buying it actually

distant idol Mar 5, 2026, 7:16 PM

#

vital lake For people with money

are u rich gus

vital lake Mar 5, 2026, 7:16 PM

#

I wish

inner relic Mar 5, 2026, 7:16 PM

#

not bad not bad.

honest verge Mar 5, 2026, 7:16 PM

#

Arc agi 1 even existed?

#

Or every model just scores 100% now?

#

In it

royal sail Mar 5, 2026, 7:17 PM

#

yes

#

it did exist but it was saturated

distant idol Mar 5, 2026, 7:17 PM

#

inner relic not bad not bad.

😭😭😭😭 he learnt from the memes

royal sail Mar 5, 2026, 7:17 PM

#

inner relic not bad not bad.

I don't really like this test, because it's just abusing how tokenization works

honest verge Mar 5, 2026, 7:18 PM

#

Arc agi 3 coming soon

royal sail Mar 5, 2026, 7:18 PM

#

🤦‍♂️

distant idol Mar 5, 2026, 7:18 PM

#

royal sail 🤦‍♂️

😭😭😭😭

#

folk

compact flame Mar 5, 2026, 7:19 PM

#

royal sail 🤦‍♂️

Ai is Future they said

royal sail Mar 5, 2026, 7:19 PM

#

Like, counting letters in a word is something us humans do fine because we have a method of doing it

#

LLMs don't work letter-by-letter

inner relic Mar 5, 2026, 7:19 PM

#

lol

distant idol Mar 5, 2026, 7:19 PM

#

royal sail 🤦‍♂️

i thought they fixed the hallucination of this model

royal sail Mar 5, 2026, 7:19 PM

#

Not a hallucination problem

#

It's just how tokenization works

#

the model doesn't see every letter in the word

#

it just sees the probability of that specific token happening
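
A rough sketch of the tokenization point, assuming a toy vocabulary made up for illustration (`TOY_VOCAB`, `toy_tokenize` and `to_ids` are hypothetical names, and "strawberry" is just an example word, not necessarily the one from the screenshot). Real tokenizers learn their vocab from data, but the idea is the same: the model gets piece ids, not letters.

```python
# Toy greedy longest-match tokenizer. The vocabulary is invented for
# illustration; real tokenizers (BPE and friends) learn theirs from data.
TOY_VOCAB = {"str": 101, "aw": 102, "berry": 103}


def toy_tokenize(word, vocab=TOY_VOCAB):
    """Split `word` into the longest known pieces, left to right."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate piece first, then shrink.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No known piece starts here: fall back to one character.
            pieces.append(word[i])
            i += 1
    return pieces


def to_ids(pieces, vocab=TOY_VOCAB):
    """Map pieces to integer ids; unknown pieces get id 0 in this toy setup."""
    return [vocab.get(p, 0) for p in pieces]


pieces = toy_tokenize("strawberry")
print(pieces)          # the pieces a human can still read
print(to_ids(pieces))  # the ids are all the model receives
```

Nothing in those ids says how many times a given letter shows up, so the model has to have learned it some other way.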

meager harbor Mar 5, 2026, 7:20 PM

#

frosty lava actually no way

just look at the arena score

meager harbor Mar 5, 2026, 7:20 PM

#

distant idol people when its comes to gpt 5.4 either they are glazing or hating no inbetween ...

look at arena score

royal sail Mar 5, 2026, 7:20 PM

#

I mean, the model just came out

#

I'd give the score time to settle

frosty lava Mar 5, 2026, 7:20 PM

#

meager harbor just look at the arena score

preliminary test mean its not definitive score

#

you didn't see the warning ?

distant idol Mar 5, 2026, 7:21 PM

#

meager harbor look at arena score

it js came out

#

calm down saul goodman

honest verge Mar 5, 2026, 7:21 PM

#

WHERE IS DEEPSEEK V4

#

Deepseek is already miles away

meager harbor Mar 5, 2026, 7:21 PM

#

frosty lava preliminary test mean its not definitive score

don't expect the score to change much, it won't win 50 elos point magically

meager harbor Mar 5, 2026, 7:22 PM

#

distant idol calm down saul goodman

don't expect the score to change much, it won't win 50 elos point magically

honest verge Mar 5, 2026, 7:22 PM

#

meager harbor don't expect the score to change much, it won't win 50 elos point magically

Dementia?

frosty lava Mar 5, 2026, 7:22 PM

#

meager harbor don't expect the score to change much, it won't win 50 elos point magically

you are ragebaiting

distant idol Mar 5, 2026, 7:22 PM

#

meager harbor don't expect the score to change much, it won't win 50 elos point magically

nephew take a breath and relax

royal sail Mar 5, 2026, 7:22 PM

#

honest verge Deepseek is already miles away

it's releasing in 1 hour and 23 minutes

#

trust

#

my source is me

meager harbor Mar 5, 2026, 7:23 PM

#

distant idol nephew take a breath and relax

why should I relax ?

honest verge Mar 5, 2026, 7:23 PM

#

Are you ....

Screenshot_2026-03-05-22-22-59-567_com.android.chrome-edit.jpg

#

LOL

#

lol

royal sail Mar 5, 2026, 7:23 PM

#

honest verge Are you ....

model just dropped

inner relic Mar 5, 2026, 7:24 PM

#

Yep

meager harbor Mar 5, 2026, 7:24 PM

#

honest verge Dementia?

sleek phoenix Mar 5, 2026, 7:24 PM

#

5.8 in the next month

frosty lava Mar 5, 2026, 7:24 PM

#

meager harbor

let's ignore the fact that it just released and the warning "preliminary"

honest verge Mar 5, 2026, 7:24 PM

#

sleek phoenix 5.8 in the next month

WHEN GPT 6 IS COMING

royal sail Mar 5, 2026, 7:24 PM

#

The current leaderboard position is irrelevant solely because of the fact that the model has only 2,000 votes as of now.

honest verge Mar 5, 2026, 7:24 PM

#

IM TIRED OF 5.X

sleek phoenix Mar 5, 2026, 7:25 PM

#

honest verge WHEN GPT 6 IS COMING

next month and 15 days

royal sail Mar 5, 2026, 7:25 PM

#

That's not nearly enough to make a conclusion

honest verge Mar 5, 2026, 7:25 PM

#

I want next generation

meager harbor Mar 5, 2026, 7:25 PM

#

frosty lava let's ignore the fact that it just released and the warning "preliminary"

score won't change that much.... opus 4.6 will still be ahead by far, I mean are you new to this ?

royal sail Mar 5, 2026, 7:25 PM

#

meager harbor score won't change that much.... opus 4.6 will still be ahead by far, I mean are...

The model only has 2,000 votes

#

Opus 4.5 has 30,000+
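
Back-of-envelope sketch of why the vote count matters. This is NOT how arena computes its intervals (that method isn't described here); it assumes independent votes, a win rate near 50%, and the usual Elo logistic curve, and `elo_half_width` is a hypothetical helper name.

```python
import math


def elo_half_width(n_votes, win_rate=0.5, z=1.96):
    """Approximate 95% interval half-width, in Elo points, for a model
    whose win rate is estimated from n_votes independent battles."""
    # Standard error of an estimated proportion.
    se = math.sqrt(win_rate * (1 - win_rate) / n_votes)
    # Slope of elo(p) = 400 * log10(p / (1 - p)) at p = win_rate
    # (delta method: uncertainty in p times slope gives uncertainty in Elo).
    slope = 400 / (math.log(10) * win_rate * (1 - win_rate))
    return z * se * slope


for n in (2_000, 30_000):
    print(n, round(elo_half_width(n), 1))
```

Under those assumptions 2,000 votes gives roughly ±15 Elo and 30,000 gives roughly ±4, so the interval narrows a lot with more votes.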

sleek phoenix Mar 5, 2026, 7:25 PM

#

royal sail Opus 4.5 has 30,000+

4.5*

meager harbor Mar 5, 2026, 7:25 PM

#

royal sail The model only has 2,000 votes

can you read ? it doesn't matter, there will be max 10 elo diff

royal sail Mar 5, 2026, 7:26 PM

#

sleek phoenix 4.5*

mb lol

royal sail Mar 5, 2026, 7:26 PM

#

meager harbor can you read ? it doesn't matter, there will be max 10 elo diff

how can you be so sure?

honest verge Mar 5, 2026, 7:26 PM

#

gpt 5.4 is o5

distant idol Mar 5, 2026, 7:26 PM

#

1772738782865.Screenshot_20260305-222606.jpg

#

damn 💔

royal sail Mar 5, 2026, 7:27 PM

#

Models can do this pretty easily if you just give it terminal access

#

they'll just use a command
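
For example, something like this is all a model with code or terminal access would need to run (a minimal sketch; `count_letter` is a made-up helper and "strawberry" is just an example word):

```python
def count_letter(word, letter):
    """Count how many times `letter` appears in `word`, ignoring case."""
    return word.lower().count(letter.lower())


print(count_letter("strawberry", "r"))  # prints 3
```

The code works character by character, so tokenization never gets in the way.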

frosty lava Mar 5, 2026, 7:27 PM

#

honest verge gpt 5.4 is o5

i don't think this one is as dangerous as 4o

royal sail Mar 5, 2026, 7:27 PM

#

Same reason why LLMs can't do long form arithmetic without a calculator

meager harbor Mar 5, 2026, 7:27 PM

#

royal sail how can you be so sure?

experience, have you been looking at the ranking for just one week ? and also next to the score there is the +-12

royal sail Mar 5, 2026, 7:28 PM

#

My fault

#

Didn't know you had experience

meager harbor Mar 5, 2026, 7:28 PM

#

royal sail Didn't know you had *experience*

arena champion bro

distant idol Mar 5, 2026, 7:28 PM

#

experience 💥

honest verge Mar 5, 2026, 7:28 PM

#

Wait why gpt 5.4 is only available for text?

#

No coding?

royal sail Mar 5, 2026, 7:28 PM

#

meager harbor arena champion bro

there are 500 other arena champions

#

you are not special

distant idol Mar 5, 2026, 7:29 PM

#

royal sail there are 500 other arena champions

dude he is the chosen one

meager harbor Mar 5, 2026, 7:29 PM

#

royal sail there are 500 other arena champions

yeah well you not being arena champion speak a lot for itself

#

no it's just that you not understanding a simple thing is ragebaiting me

vital lake Mar 5, 2026, 7:29 PM

#

honest verge Mar 5, 2026, 7:29 PM

#

meager harbor yeah well you not being arena champion speak a lot for itself

How do you even become arena champion?

royal sail Mar 5, 2026, 7:29 PM

#

meager harbor yeah well you not being arena champion speak a lot for itself

I have a bachelor's in computer science and I've fine-tuned LLMs myself, and I also run my own benchmarks.

#

I'd rather have that than "arena champion"

distant idol Mar 5, 2026, 7:30 PM

#

royal sail I'd rather have that than "arena champion"

calm done uncle

#

down*

meager harbor Mar 5, 2026, 7:30 PM

#

royal sail I have a bachelor's in computer science and I've fine-tuned LLMs myself, and I a...

ok Noam Shazeer

ocean vortex Mar 5, 2026, 7:30 PM

#

royal sail The current leaderboard position is irrelevant solely because of the fact that t...

yeah but it's reasonable to expect for the chat model to be tuned more for user preference than the thinking one. Very often was the case with OpenAI. So wouldn't be too surprised if this doesn't climb much. Doesn't mean that it isn't better

royal sail Mar 5, 2026, 7:30 PM

#

I mainly bring it up because I think we're jumping to the conclusion that this model is "worse than 5.2" way too quickly

#

Once the score settles, it's probably fair to determine if the model is good or not

tawny canyon Mar 5, 2026, 7:31 PM

#

guys when will deepseek v4 be released?

honest verge Mar 5, 2026, 7:31 PM

#

WHERE IS DEEPSEEK V4

inner relic Mar 5, 2026, 7:31 PM

#

I dont know what whale is doing

royal sail Mar 5, 2026, 7:31 PM

#

tawny canyon guys when will deepseek v4 be released?

today in exactly 22 minutes

honest verge Mar 5, 2026, 7:31 PM

#

IM TIRED

inner relic Mar 5, 2026, 7:31 PM

#

They're not afraid of gemini,claude and openai

honest verge Mar 5, 2026, 7:31 PM

#

I NEED DEEPSEEK V4

frosty lava Mar 5, 2026, 7:31 PM

#

deepseek waited for gpt release to beat them lol

#

joke

distant idol Mar 5, 2026, 7:31 PM

#

frosty lava deepseek waited for gpt release to beat them lol

they always does this

tawny canyon Mar 5, 2026, 7:31 PM

#

royal sail today in exactly 22 minutes

source?

meager harbor Mar 5, 2026, 7:32 PM

#

royal sail I mainly bring it up because I think we're jumping to the conclusion that this m...

tbf the model is on par with 5.2 but it is not the big leap everyone was talking about

ocean vortex Mar 5, 2026, 7:32 PM

#

royal sail I mainly bring it up because I think we're jumping to the conclusion that this m...

It's fundamentally incorrect to align arena score with better/worse in the first place

royal sail Mar 5, 2026, 7:32 PM

#

meager harbor tbf the model is on par with 5.2 but it is not the big leap everyone was talking...

nobody was saying it was a big leap?

frosty lava Mar 5, 2026, 7:32 PM

#

distant idol they always does this

yes but no opensource model have beaten one of top 3 like anthropic, gpt gemini

royal sail Mar 5, 2026, 7:32 PM

#

if anything people were saying it was incremental lol

royal sail Mar 5, 2026, 7:32 PM

#

ocean vortex It's fundamentally incorrect to align arena score with better/worse in the first...

Of course, it just gives some insight on where the model may sit

meager harbor Mar 5, 2026, 7:32 PM

#

royal sail nobody was saying it was a big leap?

you should see Scam altman hyping it up

distant idol Mar 5, 2026, 7:32 PM

#

frosty lava yes but no opensource model have beaten one of top 3 like anthropic, gpt gemini

opensource models are 1 year back in the race

royal sail Mar 5, 2026, 7:33 PM

#

meager harbor you should see Scam altman hyping it up

well argue with him then lol

#

nobody here thought it was a huge leap

honest verge Mar 5, 2026, 7:33 PM

#

distant idol opensource models are 1 year back in the race

It's almost time for them to release something

#

Or they are getting destroyed

frosty lava Mar 5, 2026, 7:33 PM

#

distant idol opensource models are 1 year back in the race

but they are progressing very fast honestly

ocean vortex Mar 5, 2026, 7:33 PM

#

royal sail Of course, it just gives some insight on where the model may sit

Dunno how long you've been following this, but Anthropic used to suck on lmarena. That was kind of ironic. But it proves that top spots there can only be occupied by the models explicitly trained to do well in this specific environment

distant idol Mar 5, 2026, 7:34 PM

#

honest verge It's almost time for them to release something

im waiting on bytedance models tbh, they are cookinggg ngl

royal sail Mar 5, 2026, 7:34 PM

#

ocean vortex Dunno how long you've been following this, but Anthropic used to suck on lmarena...

Oh yeah I'm aware that LMArena has had some misleading scores in the past

#

hence why i take the scores with a grain of salt and try to supplement my opinion with other benchmarks as well

honest verge Mar 5, 2026, 7:34 PM

#

WE NEED A PIECE OF OPEN SOURCE!

ocean vortex Mar 5, 2026, 7:35 PM

#

royal sail Oh yeah I'm aware that LMArena has had some misleading scores in the past

Not really misleading, but people need some context to properly read them. Arena is one metric out of dozens of them. And it isn't really raw performance or capability metric

royal sail Mar 5, 2026, 7:36 PM

#

Eh, I'd say misleading in some cases. Llama 4 topped the leaderboard at some point.

honest verge Mar 5, 2026, 7:36 PM

#

royal sail Eh, I'd say misleading in some cases. Llama 4 topped the leaderboard at some poi...

Btw where is llama?

royal sail Mar 5, 2026, 7:36 PM

#

dead in the back of an alley

honest verge Mar 5, 2026, 7:36 PM

#

I haven't heard anything about it

ocean vortex Mar 5, 2026, 7:36 PM

#

royal sail Eh, I'd say misleading in some cases. Llama 4 topped the leaderboard at some poi...

yeah because they fine-tuned on arena datasets lol

honest verge Mar 5, 2026, 7:36 PM

#

They just discontinued it?

distant idol Mar 5, 2026, 7:36 PM

#

when is arena-video gonna be a direct chat 💔

honest verge Mar 5, 2026, 7:36 PM

#

distant idol when is arena-video gonna be a direct chat 💔

Actually I remember it was

#

For some hours

#

But it got deleted

distant idol Mar 5, 2026, 7:37 PM

#

honest verge But it got deleted

fuhhhhhh

honest verge Mar 5, 2026, 7:37 PM

#

And now it's only for battle arena

distant idol Mar 5, 2026, 7:37 PM

#

ive been waiting for months

#

the site can really be 10/10 if they js did that

frosty lava Mar 5, 2026, 7:38 PM

#

my dream would be deepseek v4 #1 in capabilities from all existing model till now but it's impossible right ?

ocean vortex Mar 5, 2026, 7:38 PM

#

Many of it is just style that is technically meaningless and easily changeable, some of it are also just the mere patterns... Patterns from the most active users on arena, what prompts they use and how they are voting

honest verge Mar 5, 2026, 7:38 PM

#

Gpt 5.4 is very bad for coding in arena

#

Like it's so bad

royal sail Mar 5, 2026, 7:38 PM

#

frosty lava my dream would be deepseek v4 #1 in capabilities from all existing model till no...

highly unlikely, yeah

honest verge Mar 5, 2026, 7:38 PM

#

I can't

#

Even Gemini 3 beats it

#

It's worse than gpt 5.1

#

It can't do anything

royal sail Mar 5, 2026, 7:39 PM

#

honest verge It's worse than gpt 5.1

literally can't be possible

#

It's supposed to be 5.3 Codex level

distant idol Mar 5, 2026, 7:39 PM

#

frosty lava my dream would be deepseek v4 #1 in capabilities from all existing model till no...

patience nephew 🙏

frosty lava Mar 5, 2026, 7:39 PM

#

why do i only see the medium version in coding?

#

on arena

honest verge Mar 5, 2026, 7:39 PM

#

royal sail It's supposed to be 5.3 Codex level

It's not even opus 4.5 lvl

ocean vortex Mar 5, 2026, 7:40 PM

#

honest verge Gpt 5.4 is very bad for coding in arena

there are certain coding areas it should be really good at tbh

frosty lava Mar 5, 2026, 7:40 PM

#

honest verge It's not even opus 4.5 lvl

front end on gpt is really bad

#

don't do front-end using gpt

royal sail Mar 5, 2026, 7:40 PM

#

ocean vortex there are certain coding areas it should be really good at tbh

hate how openai conveniently leaves 5.3 Codex out of so many benchmarks

honest verge Mar 5, 2026, 7:40 PM

#

Opus 4.6 is still my king

#

I'm waiting for opus 5 so hard

frosty lava Mar 5, 2026, 7:41 PM

#

how can we explain opus 4.6 have good "taste" in front-end but not other companies ?

honest verge Mar 5, 2026, 7:41 PM

#

frosty lava how can we explain opus 4.6 have good "taste" in front-end but not other compani...

Because it actually does max effort

frosty lava Mar 5, 2026, 7:42 PM

#

honest verge Because it actually does max effort

for example when i try gpt 5.3 codex on a coding task that's not front-end it does a very good job, maybe better than opus 4.6

#

so how do you explain it

ocean vortex Mar 5, 2026, 7:42 PM

#

royal sail hate how openai conveniently leaves 5.3 Codex out of so many benchmarks

I don't mind it since I prefer general purpose model and codex is this odd one out for me lmao. But I understand where you are coming from, I didn't even notice at first that they included 5.2-codex rather than 5.3-codex in that graph 🗿

frosty lava Mar 5, 2026, 7:42 PM

#

the same gpt 5.3 codex on front end is bad

honest verge Mar 5, 2026, 7:42 PM

#

frosty lava so how do you explain it

Opus 4.6 is built for frontend

#

It's capable of ambitious work

frosty lava Mar 5, 2026, 7:42 PM

#

yeah i guess that make sense

honest verge Mar 5, 2026, 7:42 PM

#

As anthropic says

frosty lava Mar 5, 2026, 7:42 PM

#

so openai need to work on this

#

it's not about coding capabilities honestly but about the taste it has

royal sail Mar 5, 2026, 7:44 PM

#

frosty lava the same gpt 5.3 codex on front end is bad

I feel like GPT models have always been pretty terrible at frontend

#

never really got good UI outputs from GPT models

frosty lava Mar 5, 2026, 7:44 PM

#

royal sail I feel like GPT models have always been pretty terrible at frontend

yes i hope they focus on it for the next one

honest verge Mar 5, 2026, 7:45 PM

#

royal sail never really got good UI outputs from GPT models

UI from gpt 5.3 just feels too plastic

#

While opus 4.6 somehow makes it alive and beautiful

ocean vortex Mar 5, 2026, 7:45 PM

#

frosty lava so openai need to work on this

to have good visuals you ideally need a bigger model. So Opus and Google gonna have natural advantage there. And if OpenAI made it bigger they wouldn't be able to sustain current caps on chatgpt. It's a reasonable trade-off. They used to be struggling considerably more with this before gpt4.1 and subsequent models

honest verge Mar 5, 2026, 7:45 PM

#

What's the secret

frosty lava Mar 5, 2026, 7:45 PM

#

it's capable of implementing features and doing great things like that but doesn't have any good visuals

frosty lava Mar 5, 2026, 7:46 PM

#

ocean vortex to have good visuals you ideally need a bigger model. So Opus and Google gonna h...

i think they can train it especially on front end task to improve it honestly

royal sail Mar 5, 2026, 7:46 PM

#

ocean vortex to have good visuals you ideally need a bigger model. So Opus and Google gonna h...

I wouldn't say this is necessarily true, since some small models can push out some great UI with proper instructions

ocean vortex Mar 5, 2026, 7:46 PM

#

frosty lava i think they can train it especially on front end task to improve it honestly

you can but only to a certain extent before degrading other things

vital lake Mar 5, 2026, 7:46 PM

#

Guys 5.4 is so good at creative writing

royal sail Mar 5, 2026, 7:46 PM

#

Size is only really important for world knowledge and superposition (relating novel concepts to each other)

vital lake Mar 5, 2026, 7:46 PM

#

Like way better than Gemini and Opus

hushed gyro Mar 5, 2026, 7:47 PM

#

WTF is 5.4???

devout vault Mar 5, 2026, 7:47 PM

#

openai will never make good models. again

ocean vortex Mar 5, 2026, 7:47 PM

#

royal sail I wouldn't say this is necessarily true, since some small models can push out so...

"with proper instructions". But if you adopted similar approach with bigger model chances are it would do better than it did before as well. Can only compare identical setups

honest verge Mar 5, 2026, 7:47 PM

#

devout vault openai will never make good models. again

Anthropic was always ahead

crude lagoon Mar 5, 2026, 7:47 PM

#

devout vault openai will never make good models. again

They reached the peak at 4o

frosty lava Mar 5, 2026, 7:47 PM

#

but the reason why i use gpt 5.3 codex instead of opus 4.6 is that its doing better job on every other thing than front end

royal sail Mar 5, 2026, 7:47 PM

#

ocean vortex "with proper instructions". But if you adopted similar approach with bigger mode...

GPT models with design skills still do pretty horrible

#

Theo.t3 made a video on the whole thing

honest verge Mar 5, 2026, 7:48 PM

#

frosty lava but the reason why i use gpt 5.3 codex instead of opus 4.6 is that its doing bet...

And because 5.3 is very cheap

round forge Mar 5, 2026, 7:48 PM

#

vital lake Like way better than Gemini and Opus

What kind of creative writing do you mean?

honest verge Mar 5, 2026, 7:48 PM

#

While opus 4.6 requires max subscription

#

Pro isn't enough

frosty lava Mar 5, 2026, 7:48 PM

#

honest verge And because 5.3 is very cheap

yes too

inner relic Mar 5, 2026, 7:48 PM

#

2nd

inner relic Mar 5, 2026, 7:48 PM

#

inner relic 2nd

2nd in creative writing

vital lake Mar 5, 2026, 7:48 PM

#

round forge What kind of creative writing do you mean?

Roleplay, story board, all of it.

ocean vortex Mar 5, 2026, 7:48 PM

#

royal sail GPT models with design skills still do pretty horrible

Wouldn't say it's horrible. With gpt4o and before o3/gpt4.1, that's when it was really bad lol. They sucked on just about every single benchmark or test that touched on visuals

vital lake Mar 5, 2026, 7:48 PM

#

Its creepily natural.

#

OpenAI 100% focused on this aspect more

royal sail Mar 5, 2026, 7:49 PM

#

ocean vortex Wouldn't say it's horrible. With gpt4o and before o3/gpt4.1, that's when it was ...

i mean, if you enjoy AI slop blue-purple gradients then sure 😭

#

5.3 codex loves doing it

#

5.3 codex (left), 5.4 (right)

honest verge Mar 5, 2026, 7:49 PM

#

Was gpt o models ever really good?

frosty lava Mar 5, 2026, 7:50 PM

#

but i love the 1m context

distant idol Mar 5, 2026, 7:50 PM

#

guys which model has the most context

royal sail Mar 5, 2026, 7:50 PM

#

honest verge Was gpt o models ever really good?

Pretty good general purpose reasoning models

zealous sparrow Mar 5, 2026, 7:50 PM

#

royal sail 5.3 codex (left), 5.4 (right)

Same lame model

royal sail Mar 5, 2026, 7:50 PM

#

Not amazing at coding

#

But they were frontier

honest verge Mar 5, 2026, 7:50 PM

#

And where's o2?

zealous sparrow Mar 5, 2026, 7:50 PM

#

royal sail Pretty good general purpose reasoning models

Its bad at simplebench

royal sail Mar 5, 2026, 7:50 PM

#

honest verge And where's o2?

skipped lol

vital lake Mar 5, 2026, 7:50 PM

#

honest verge Was gpt o models ever really good?

Yes

zealous sparrow Mar 5, 2026, 7:50 PM

#

I did my tests

vital lake Mar 5, 2026, 7:50 PM

#

royal sail skipped lol

Copyright issues

royal sail Mar 5, 2026, 7:50 PM

#

oh really?

ocean vortex Mar 5, 2026, 7:50 PM

#

royal sail i mean, if you enjoy AI slop blue-purple gradients then sure 😭

on webdev arena, arc-agi and related things they do get respectable scores nowadays. They also perform decently on svg tests

vital lake Mar 5, 2026, 7:50 PM

#

Yeah some company name

frosty lava Mar 5, 2026, 7:50 PM

#

Chat gpt is bad at making good visuals

weak dagger Mar 5, 2026, 7:50 PM

#

@alpine pasture i like cherrys

royal sail Mar 5, 2026, 7:50 PM

#

zealous sparrow Its bad at simplebench

was not aware lol

#

just basing off my usage and general consensus back then

distant idol Mar 5, 2026, 7:51 PM

#

weak dagger @alpine pasture i like cherrys

i js ate chicken

honest verge Mar 5, 2026, 7:51 PM

#

Missing

honest verge Mar 5, 2026, 7:51 PM

#

royal sail skipped lol

Or it was so good

weak dagger Mar 5, 2026, 7:51 PM

#

distant idol i js ate chicken

did you like my brother?

distant idol Mar 5, 2026, 7:51 PM

#

weak dagger did you like my brother?

yea, im still hungry too, come here nephew

ocean vortex Mar 5, 2026, 7:52 PM

#

no 5.4 on chatgpt?

#

😠

weak dagger Mar 5, 2026, 7:52 PM

#

distant idol yea, im still hungry too, come here nephew

#

RIP chicken

ocean vortex Mar 5, 2026, 7:52 PM

#

this rollout thing is kinda slow recently

#

used to be near instant

royal sail Mar 5, 2026, 7:52 PM

#

ocean vortex on webdev arena, arc-agi and related things they do get respectable scores nowad...

Sure, but arc-agi is more about reasoning, and svg tests are just a capability benchmark rather than measuring design skills

ocean vortex Mar 5, 2026, 7:52 PM

#

Didn't even have 5.3 until now

royal sail Mar 5, 2026, 7:52 PM

#

i do remember gpt 5 taking some time to roll out

honest verge Mar 5, 2026, 7:52 PM

#

Gemini namings are so done 🥀

Screenshot_2026-03-05-22-52-14-958_com.android.chrome-edit.jpg

royal sail Mar 5, 2026, 7:52 PM

#

even though it was supposed to be huge release

frosty lava Mar 5, 2026, 7:52 PM

#

ocean vortex Didn't even have 5.3 until now

because they only released 5.3 instant

#

there is no normal 5.3

royal sail Mar 5, 2026, 7:53 PM

#

honest verge Gemini namings are so done 🥀

i actually hate it

#

every company is terrible at naming models except kimi and minimax

#

and deepseek ngl

#

DeepSeek's naming feels the most straightforward

distant idol Mar 5, 2026, 7:53 PM

#

honest verge Gemini namings are so done 🥀

never seen an ai model with a good name

proud bobcat Mar 5, 2026, 7:53 PM

#

lmao gpt 5.4

#

gpt 5.3 hadnt even fully released yet

#

😭

frosty lava Mar 5, 2026, 7:54 PM

#

proud bobcat gpt 5.3 hadnt even fully released yet

they won't release a normal 5.3 it wouldn't make sense

#

only the instant version

ocean vortex Mar 5, 2026, 7:54 PM

#

royal sail Sure, but arc-agi is more about reasoning, and svg tests are just a capability b...

svg is measuring spatial awareness and design/visual skills directly. That's as good of a test as website design IMO. It forces the model to draw things manually

weak dagger Mar 5, 2026, 7:54 PM

#

Qwens video generator is just so ahh 😭💔 its like veo 3

proud bobcat Mar 5, 2026, 7:54 PM

#

IM CRYING HOW DID IT FALL BEHIND 5.2 CHAT

#

IM CRINE

frosty lava Mar 5, 2026, 7:54 PM

#

proud bobcat IM CRYING HOW DID IT FALL BEHIND 5.2 CHAT

preliminary test + its only one point behind

distant idol Mar 5, 2026, 7:54 PM

#

proud bobcat IM CRYING HOW DID IT FALL BEHIND 5.2 CHAT

it js came out

inner relic Mar 5, 2026, 7:54 PM

#

proud bobcat IM CRYING HOW DID IT FALL BEHIND 5.2 CHAT

dude

#

just test

proud bobcat Mar 5, 2026, 7:54 PM

#

imagine its like gonna be a point ahead

#

😭

royal sail Mar 5, 2026, 7:55 PM

#

ocean vortex svg is measuring spatial awareness and design/visual skills directly. That's as ...

I can agree that it's good for measuring spatial awareness, but drawing out SVGs is a completely different skill in comparison to interface design

proud bobcat Mar 5, 2026, 7:55 PM

#

inner relic just test

lets see

hazy marlin Mar 5, 2026, 7:55 PM

#

proud bobcat IM CRYING HOW DID IT FALL BEHIND 5.2 CHAT

honestly claude is OP when it comes to coding

frosty lava Mar 5, 2026, 7:55 PM

#

proud bobcat lets see

the writing is better if you care

inner relic Mar 5, 2026, 7:55 PM

#

yep

distant idol Mar 5, 2026, 7:55 PM

#

hazy marlin honestly claude is OP when it comes to coding

agree

hazy marlin Mar 5, 2026, 7:55 PM

#

the only reason I use it and not other ones is because it actually edits the code instead of rewriting it entirely every time

ocean vortex Mar 5, 2026, 7:55 PM

#

proud bobcat IM CRYING HOW DID IT FALL BEHIND 5.2 CHAT

arena score is not indicative of model capability

frosty lava Mar 5, 2026, 7:55 PM

#

hazy marlin honestly claude is OP when it comes to coding

i'd say claude is the best for front-end

proud bobcat Mar 5, 2026, 7:56 PM

#

nevermind

#

solid

#

all gpt models get choked up on this

royal sail Mar 5, 2026, 7:56 PM

#

proud bobcat nevermind

This is a pretty bad way to test a model tbf

#

LLMs don't work in letters

inner relic Mar 5, 2026, 7:56 PM

#

proud bobcat all gpt models get choked up on this

Alright test it, If it's good at roleplay

distant idol Mar 5, 2026, 7:56 PM

#

proud bobcat nevermind

proud bobcat Mar 5, 2026, 7:56 PM

#

lmao

honest verge Mar 5, 2026, 7:56 PM

#

Gemini 3.1 0326 flash lite Omni pro preview high flash TTS image veo

proud bobcat Mar 5, 2026, 7:56 PM

#

hold on i gotta see ts

hazy marlin Mar 5, 2026, 7:57 PM

#

royal sail This is a pretty bad way to test a model tbf

yeah, I will only be impressed if they were able to 1 shot an entire HTML game in 1 prompt

proud bobcat Mar 5, 2026, 7:57 PM

#

i wonder about its coding

#

hold on

#

i wonder if they fixed the schizo code of gpt 5.2 and 5.3

distant idol Mar 5, 2026, 7:58 PM

#

distant idol

do i need to tell him to explain or thats awkward

inner relic Mar 5, 2026, 7:58 PM

#

distant idol do i need to tell him to explain or thats awkward

👍

honest verge Mar 5, 2026, 7:58 PM

#

TECHNOLOGIA

#

TECHNOLOGIA

frosty lava Mar 5, 2026, 7:59 PM

#

when do you think they'll release the codex version

#

5.4 codex

distant idol Mar 5, 2026, 7:59 PM

#

yes technologia

royal sail Mar 5, 2026, 7:59 PM

#

frosty lava when do you think they'll release the codex version

in 2 hours

#

source: idk

distant idol Mar 5, 2026, 7:59 PM

#

royal sail in 2 hours

source

honest verge Mar 5, 2026, 7:59 PM

#

frosty lava when do you think they'll release the codex version

April 1

#

Source :Slam Altman

royal sail Mar 5, 2026, 8:00 PM

#

do not slam the altman

proud bobcat Mar 5, 2026, 8:00 PM

#

scam altman

distant idol Mar 5, 2026, 8:00 PM

#

royal sail do not slam the altman

double kick him

honest verge Mar 5, 2026, 8:01 PM

#

FINALLY

#

This is what I needed

#

MISTRAL VIBE CLI

#

FINALLY

#

I WAITED FOR THIS MY ENTIRE LIFE

proud bobcat Mar 5, 2026, 8:02 PM

#

ah yes

#

mistral cli

#

i needed this

#

truly

#

(mistral sucks)

distant idol Mar 5, 2026, 8:03 PM

#

proud bobcat (mistral sucks)

he will murder u

honest verge Mar 5, 2026, 8:03 PM

#

Wonder if this is even better than Gemini 2.5

#

I'll test it

proud bobcat Mar 5, 2026, 8:03 PM

#

distant idol he will murder u

probably

frosty lava Mar 5, 2026, 8:03 PM

#

why only gpt 5.4 medium on coding arena

#

i want the high version atleast

signal pelican Mar 5, 2026, 8:04 PM

#

lol.... 5.4 is behind 5.2?

proud bobcat Mar 5, 2026, 8:04 PM

#

yeah im confused too

proud bobcat Mar 5, 2026, 8:04 PM

#

signal pelican lol.... 5.4 is behind 5.2?

a bit surprising, even if its still early testing

frosty lava Mar 5, 2026, 8:04 PM

#

signal pelican lol.... 5.4 is behind 5.2?

preliminary test and its only 1 point behind

distant idol Mar 5, 2026, 8:04 PM

#

give it a day or more

frosty lava Mar 5, 2026, 8:05 PM

#

no honestly everyone saying its worse than gpt 5.2 is lying

#

that's just not true at all

signal pelican Mar 5, 2026, 8:05 PM

#

yeah... I've been watching the leaderboard for a while... I don't think 5.4 will catch up that much.

frosty lava Mar 5, 2026, 8:05 PM

#

signal pelican yeah... I've been watching the leaderboard for a while... I don't think 5.4 wil...

the leaderboard doesn't refresh every second

signal pelican Mar 5, 2026, 8:05 PM

#

oh I know

distant idol Mar 5, 2026, 8:05 PM

#

frosty lava that's just not true at all

its not a big leap tho

fierce kelp Mar 5, 2026, 8:06 PM

#

burnt sinew And to have router show me what model its using

You can click to see what the router is using

burnt sinew Mar 5, 2026, 8:06 PM

#

fierce kelp You can click to see what the router is using

Where

frosty lava Mar 5, 2026, 8:06 PM

#

distant idol its not a big leap tho

yes but its not a coding version and it's not worse like people are saying

#

but you're right, definitely not a huge leap in capabilities

signal pelican Mar 5, 2026, 8:06 PM

#

I'm sure it will beat 5.2 someday... but I'm talking about others... lol

#

I think they are actually falling behind Gemini and Claude

frosty lava Mar 5, 2026, 8:07 PM

#

signal pelican I'm sure it will beat 5.2 someday... but I'm talking about others... lol

let's see

wraith wren Mar 5, 2026, 8:07 PM

#

Quick question. Even though gpt 5.4 was out for maybe an hour? Do you think gpt 5.4 is better than opus 4.6 at coding?

fierce kelp Mar 5, 2026, 8:07 PM

#

burnt sinew Where

Click Max above the response and it will say "Response provided by ..."

cinder nexus Mar 5, 2026, 8:07 PM

#

gpt 5.2 still stands strong

frosty lava Mar 5, 2026, 8:07 PM

#

they clearly have focused on creative writing too this time

frosty lava Mar 5, 2026, 8:08 PM

#

wraith wren Quick question. Even though gpt 5.4 was out for maybe an hour? Do you think gpt ...

not front end, but for example gpt 5.3 codex was better for everything that does not involve looking good

#

so 5.4 should do same as 5.3 codex

wraith wren Mar 5, 2026, 8:08 PM

#

Interesting. So for back end gpt 5.4/5.3 codex is better?

distant idol Mar 5, 2026, 8:08 PM

#

wraith wren Quick question. Even though gpt 5.4 was out for maybe an hour? Do you think gpt ...

le gpt

frosty lava Mar 5, 2026, 8:08 PM

#

wraith wren Interesting. So for back end gpt 5.4/5.3 codex is better?

Yes definitely imo

#

for good looks go to opus

#

gpt is bad at making things look good

honest verge Mar 5, 2026, 8:09 PM

#

LOL THIS IS GPT 5.3 INSTANT

Screenshot_2026-03-05-23-08-31-787_com.android.chrome-edit.jpg

#

LOL

#

Vs mistral

Screenshot_2026-03-05-23-08-16-488_com.android.chrome-edit.jpg

wraith wren Mar 5, 2026, 8:09 PM

#

frosty lava Yes definitely imo

Oki. Ty

honest verge Mar 5, 2026, 8:09 PM

#

MISTRAL is better

royal sail Mar 5, 2026, 8:09 PM

#

honest verge Vs mistral

This looks horrendous

burnt sinew Mar 5, 2026, 8:09 PM

#

honest verge Vs mistral

Terrible

honest verge Mar 5, 2026, 8:09 PM

#

royal sail This looks horrendous

But gpt is too simple

frosty lava Mar 5, 2026, 8:09 PM

#

yes i was never able to get a gpt model to make something good looking

cinder nexus Mar 5, 2026, 8:10 PM

#

honest verge Vs mistral

the font wants me to kms

royal sail Mar 5, 2026, 8:10 PM

#

honest verge But gpt is too simple

for a website, i would rather have GPT's lol

distant idol Mar 5, 2026, 8:10 PM

#

honest verge Vs mistral

💔

royal sail Mar 5, 2026, 8:10 PM

#

The font for mistral is terrible

honest verge Mar 5, 2026, 8:10 PM

#

royal sail The font for mistral is terrible

Still good for the worst model ever

distant idol Mar 5, 2026, 8:10 PM

#

the font of doom

cinder nexus Mar 5, 2026, 8:10 PM

#

distant idol the font of doom

fr

distant idol Mar 5, 2026, 8:12 PM

#

guys stop bullying him

slim gorge Mar 5, 2026, 8:13 PM

#

why is gpt-5.4 already out??? gpt-5.3 just came out like wtf 😭

hollow ivy Mar 5, 2026, 8:13 PM

#

Poll: Which is better in coding?

Winning answer: Claude Opus 4.6 (9 of 15 votes)

distant idol Mar 5, 2026, 8:13 PM

#

oh

#

👀

honest verge Mar 5, 2026, 8:15 PM

#

Left is gpt and right is mistral. I can't decide which design for the repair station is better

Screenshot_2026-03-05-23-14-25-526_com.android.chrome-edit.jpg

Screenshot_2026-03-05-23-14-08-733_com.android.chrome-edit.jpg

#

(idk what repair station even means)

royal sail Mar 5, 2026, 8:15 PM

#

Well these two designs are completely different lol

#

not really fair to compare

#

depends what you need

proud bobcat Mar 5, 2026, 8:16 PM

#

honest verge Vs mistral

this is awful

#

also gpt 5.4 is awful

#

like wow

#

its not

#

impressive at all

honest verge Mar 5, 2026, 8:16 PM

#

proud bobcat also gpt 5.4 is awful

Over hyped model

#

Everybody was like

distant idol Mar 5, 2026, 8:16 PM

#

honest verge Over hyped model

fr

honest verge Mar 5, 2026, 8:16 PM

#

WOW IT'S GOING TO DESTROY EVERYTHING BEST MODEL EVER!

#

But we got nothing

#

It's not better at all

royal sail Mar 5, 2026, 8:17 PM

#

honest verge Over hyped model

overhyped by nobody except scam altman

proud bobcat Mar 5, 2026, 8:17 PM

#

i think openai is just "RANDOM SHI GO"

graceful vortex Mar 5, 2026, 8:17 PM

#

guys I have to ask one thing can anyone that have a knowledge with arena.ai answer me?

proud bobcat Mar 5, 2026, 8:17 PM

#

what is it brinks truck sniper

thorny schooner Mar 5, 2026, 8:17 PM

#

Is it just me, or does battle mode in direct mode increase the frequency of mistakes even when using skip? I keep getting stuck in a frozen state where it just tells me an error was made no matter what I do

distant idol Mar 5, 2026, 8:17 PM

#

graceful vortex guys I have to ask one thing can anyone that have a knowledge with arena.ai answ...

nice pfp

honest verge Mar 5, 2026, 8:17 PM

#

Openai rushed 5.4

#

I think it's clear

vital lake Mar 5, 2026, 8:17 PM

#

Bru how people only care about coding 😭

graceful vortex Mar 5, 2026, 8:17 PM

#

distant idol nice pfp

ty yours too

royal sail Mar 5, 2026, 8:18 PM

#

5.4 does feel great for writing unironically

vital lake Mar 5, 2026, 8:18 PM

#

Wait for 5.4 codex before judging on coding

vital lake Mar 5, 2026, 8:18 PM

#

royal sail 5.4 does feel great for writing unironically

Mhm

royal sail Mar 5, 2026, 8:18 PM

#

not sure why they didn't talk about how good it is at writing

#

feels a lot better

graceful vortex Mar 5, 2026, 8:18 PM

#

why is arena.ai free? I mean, aren't the APIs so expensive?

honest verge Mar 5, 2026, 8:18 PM

#

royal sail 5.4 does feel great for writing unironically

But it was supposed to be for complex tasks...

vital lake Mar 5, 2026, 8:18 PM

#

Its creepy how good it is

inner relic Mar 5, 2026, 8:18 PM

#

royal sail 5.4 does feel great for writing unironically

Price is questionable

honest verge Mar 5, 2026, 8:18 PM

#

While 5.3 was supposed to be for writing

inner relic Mar 5, 2026, 8:18 PM

#

Roleplayers don't want an expensive model

proud bobcat Mar 5, 2026, 8:18 PM

#

graceful vortex why is arena.ai free? I mean, aren't the APIs so expensive?

backed by the ai companies to get free testing on their models

thorny schooner Mar 5, 2026, 8:18 PM

#

thorny schooner Is it just me where thos battle mode in direct mode increase the frequency of m...

😭

royal sail Mar 5, 2026, 8:18 PM

#

inner relic Price is questionable

well, most models that are great at writing are expensive

proud bobcat Mar 5, 2026, 8:18 PM

#

its a worthwhile investment

distant idol Mar 5, 2026, 8:18 PM

#

inner relic Roleplayers don't want an expensive model

yes

royal sail Mar 5, 2026, 8:18 PM

#

Top writing model is Opus 4.6 lol

graceful vortex Mar 5, 2026, 8:19 PM

#

proud bobcat backed by the ai companies to get free testing on their models

so the companies are giving it as a free trial?

proud bobcat Mar 5, 2026, 8:19 PM

#

basically

graceful vortex Mar 5, 2026, 8:19 PM

#

for arena.ai to test it

thorny schooner Mar 5, 2026, 8:19 PM

#

Basically money is replaced with data as the cost

proud bobcat Mar 5, 2026, 8:19 PM

#

arena gives the rankings and companies get the data

graceful vortex Mar 5, 2026, 8:19 PM

#

and people see the models : "oh wow its a good model"

vital lake Mar 5, 2026, 8:19 PM

#

graceful vortex Mar 5, 2026, 8:19 PM

#

and company won

proud bobcat Mar 5, 2026, 8:19 PM

#

basically yeah

proud bobcat Mar 5, 2026, 8:19 PM

#

vital lake

wheres the meh option

#

its defo better than gpt 5.2

honest verge Mar 5, 2026, 8:19 PM

#

Yes

thorny schooner Mar 5, 2026, 8:19 PM

#

Basically all three of those since we all kind of answered at the same time lmao

proud bobcat Mar 5, 2026, 8:19 PM

#

but its not

vital lake Mar 5, 2026, 8:19 PM

#

proud bobcat wheres the meh option

Oops

royal sail Mar 5, 2026, 8:19 PM

#

vital lake

This question doesn't really make sense

proud bobcat Mar 5, 2026, 8:19 PM

#

really great

royal sail Mar 5, 2026, 8:19 PM

#

like

#

in comparison to what?

honest verge Mar 5, 2026, 8:20 PM

#

But not at what it was made for

vital lake Mar 5, 2026, 8:20 PM

#

royal sail This question doesn't really make sense

Overall bruh

#

Like its pretty clear

honest verge Mar 5, 2026, 8:20 PM

#

Gpt was supposed to be for coding and agentic tasks