#general
Well yeah, like the main AI sub, since their limits are very reasonable and their platform is very feature-rich
It uses different models, but how would you rate it still?
Hmm true
I wonder when Apple will integrate Gemini
Also S26 agentic AI
Seems like a security flaw
Mostly just a thing to screw around with lmao. I have a 16 Pro but I barely ever use this playground
not extremely useful
Will take a hell of a lot of time, I guess
Yuh
What iOS are we on
iOS 26. It'll probably come out around iOS 27-ish
Apple's thing isn't anywhere near as good as the best alternatives. It can forward requests to ChatGPT, but at that point you might as well just use the ChatGPT app
True, also it's bad how it asks "Do you want to ask ChatGPT?"
Why couldn't they integrate it to just do it anyway?
Basically it's aesthetic cover phone
The phone itself is not bad at all. Does everything very well and it just works. But Apple Intelligence was a fiasco for sure
Apple Intelligence was the worst AI on record, I think
You almost never have issues with apps not working on an iPhone, or glitching out, or having to deal with bloatware. Everything just works from the first try properly
Camera-wise, in my opinion even Samsung doesn't compare to it. Smooth af, also a well-used iOS will be way smoother than Android. But the way it basically has the same features, just different fonts, is money milking
I'm surprised Samsung allowed Google to partner with Apple like that
I mean... They didn't have a choice lol. Google owns the entire OS that Samsung relies on, they did not have a say.
A lot of the main Galaxy AI features use Gemini Nano / AICore
True but I don't think Google can restrict Samsung
As it's open source
All Google can do is probably not allow preinstalled Google apps
I think they did that with Huawei
Ironically this seems to be changing for the worse lately:
https://arstechnica.com/gadgets/2026/03/with-developer-verification-googles-apple-envy-threatens-to-dismantle-androids-open-legacy/
Which would annul their main advantage. Not too great. Wouldn't want for them to do this even though I'm not using Android. It is supposed to be open OS...
Google started restricting sideloading with their Advanced Protection
An account security program that was restricting sideloading
I would switch to GrapheneOS if they block sideloading
Or install surveillance
Which the UK government is trying to push
Also yes, Android's main feature is being open source
And sideloading
Hello
Why?
I've never been given a false flag in my life on pixel
And I've installed many modded apps
Hmm...
Can't talk about piracy :/
Noted.
Yuh
Against discord tos
I've only ever got flagged on my old s9
Which gave fakes almost every app
Hi, does anyone have a Sora 2 code? I already have an account. I'm looking for a code
I can't get in, does anyone have a solution? I've correctly set up my Surfshark VPN
VPN is probably blocking it
From you entering it, by Discord's anti-DDoS
What's the best ai for coding?
Are you on WhatsApp? Or instagram
Btw I don't think you can join their server due to the VPN
You would have to turn it off
Yes, it's blocked after the code. You can send it to me on Telegram; I don't know if it blocks it or not.
Yes, it's the VPN, they know.
I'm looking for a code; I came across a YouTube video on this Discord link
Claude 4.6 amazing
Thanks i will try it
claudius
They added the direct mode battles back, right? Haha
really?
anyone got any ideas for a good rp?
roleplaying game with an AI as GM?
I can recommend a sandbox game:
You discover a hidden, ancient alien spaceship, while traveling/exploring antarctica. The spaceship has an AGI/ASI in it, which offers you to become its new (biological) pilot.
Was fun to play with Gemini.
Or you play an adventure, where you inherit a time-machine from your reclusive, genius uncle.
Claude also is really good in these games.
A third game: your character discovers an anomalously large spider in your house (attic, bathroom, or whatever)
-# third scenario not recommended for arachnophobic people ^^
Everything but Video in direct chat 💔
5.4?
This is the third time Codex has reset my weekly quota to 100%. 🤔
Should I be worried?
Oh, now it makes sense.
Why are 900 people watching the ITV logo bro 😭
gpt 5.4 on arena but it sucks
You shocked?
And free users...
where did you get this info from
i don't see it on arena
galapagos
Yes, Battles in Direct experiment was added back. There is now a Skip Button added. cc @stray aspen
it was on design arena a few days ago and it matched gpt style, and everything
and on lm arena it says its openai
it's like the instant or low-thinking version though
so it sucks
okay, i think you could be right, but it might not be gpt at all too
wait for the real one
better to form a real opinion then
thank god
very very likely
to be gpt
yes, but like you said, if it's a spark version or an instant version or a non-thinking version then it's normal
if it did worse than gpt 5.3 codex then it can't be the thinking one
huh
its pretty bad, just try it yourself
till now we've never seen a new model be worse in capabilities than the previous version
cause it wouldn't make sense at all
i told you what it could be, but anyway we don't know since it's a hidden one
It is extremely direct and kinda funny. No fluff
But knowledge cutoff in july seems wrong
Should be something like december
we don't have gpt 5.3 yet we only have the instant version so it might be the normal 5.3 without thinking ?
or directly 5.4 but without thinking
Quick question: why are certain models (like Gemini 3.1) not receiving new votes anymore? It’s been a long time since I've seen the numbers move. Is this intentional?
hey i wanna know something, how do people get this notification?
Is anyone else having this issue now where it just gives me an error message repeatedly, even after I logged out and everything? And no, I don't have a VPN on, I already turned it off
Same. Was about to ask.
Also, how can we use that model? On the website I tried direct chat and search, but it didn't show up
Do you need developer?
well gang, the A/B testing in direct chat is broke. whatever you do, DON'T click skip or your chat is going to get bricked
Okay look, I don't know if I'm the only one having this issue with image generation using the Nano Banana Pro model. Let's say I pass in an image of myself and prompt it to include another figure next to me. When it finishes generating, it does create another figure, but for some reason it doesn't generate me: it generates someone who is wearing the same clothes as me but isn't me. Is anyone else experiencing this issue? If I hit try again, after a couple of tries it does generate me, but many times it generates a random person wearing the same clothes as me
anyone know why it's not letting me reset my password?
the video generation is now on the website, not in Discord. check it out at https://arena.ai/video
Bro you're in the discord
Everything but video in direct chat😭😭🙏🙏
Ayo is it just me or does it demand for you to log in your account to use it?
/voice
You'd think they'd make an announcement or something if they made it so you had to be logged in to use arena
It's required; they should add that it will let you do some things even if you don't log in. They should leave it like it was before.
Why can't I send a prompt in LMArena incognito?
@echo aurora
Hmm what do you mean?
Why not
Have Video Arena selected?
I’m not sure. Was able to make it work without having to sign in. Can you post a bug in #1343291835845578853 ? Add the relevant info. I can’t take a look now, but will later.
Guys, does anyone know why in Claude 4.6 I can’t upload files? Screenshots and things like that would be useful, but for some reason it doesn’t work in LMArena.
#1372230675914031105
-# Before creating a post with that proposal, check first to see if someone has already created a post with your idea
Why is it that at a certain hour, when using AI to create a video, the captcha system is bugged and can't be passed?
Then it won't let me write on code-arena or direct chat because the captcha ruined my previous captcha token
There comes a point when the same problem becomes annoying
I'm not sure they'd necessarily have advanced the cutoff date between point-one models. Most likely they were trained on the same normalized data, with superior inference
finally u can stop responses, seen in canary arena
hello
Hi
Hello
My main account got the stop generation button, but my other account still doesn't have that button :(
Yes, I'm seeing it today
Well, no more infinite generation yay 😄
GPT 5.3 code vs Opus 6, who is better, guys?
How long do we have to wait to get new tokens for Claude AI free? Or is it « one shot »? mb, I'm new
Can you guys fix this issue now? Sometimes a model can't be used and it reports 'something went wrong with this message, please try again'
@echo aurora when y'all making the video arena direct use or even side by side? it's been months since i asked this 😭
@echo aurora Is he developer or discord mod?
inbetween ✌
king
It's a bug; if you delete the cookies and try again, the login message doesn't appear. It only appears sometimes, so I imagine it's a bug.
I am pretty sure galapagos is just the upcoming open source model
from openai
Lneduo2en ?
how come the UI is half Russian for you?
galapagos is doing much worse than any of OpenAI's latest thinking models, so it can't be a thinking model
that's probably an instant or normal model
so it will be for free user too
for sure
That's what I meant, it is going to be the upcoming open source model that you can use locally on your own computer
Either that or 5.4 instant, but it does not feel like they are releasing that
honestly all i care about is thinking model capabilities i don't need another instant model or low cost
Yeah I am waiting on 5.4 pro until I renew my pro subscription.
Has Arena updated something? Previously I was able to generate 3 images per day with no account
it could be 5.3 Instant no? That one is still not on a leaderboard
.txt support for
when?
I can't chat anymore without login. Is it time to leave lmarena?
maybe
Nah, 5.3 instant uses one emoji per sentence. This is way more concise than 5.3 instant
I like how qwen can image create anything except 18+ content
To prevent people clearing their cookies + cache to get more message allowance
Nvm I hate it, it doesn't make Donald Trump in an "I hate America" shirt
I don't know, but I'd rather just pick Opus 4.6 every time
And to have router show me what model its using
so annoying
it behaves differently on API vs chatgpt though. Chatgpt has lots of instructions they are feeding it.
Here are 10 API responses of it I generated for someone else earlier today (just a silly argument about whether it can find and output release date of itself accurately lol):
it's broken
As you can see not a single emoji anywhere
No added sys/dev instructions here, default settings and a simple question with OpenAI's search enabled.
Make a new chat, but I am pretty sure Gemini nano banana is the only one having this issue
Only this new version has this issue. 3 works fine
Well yeah you can use it on Google flow for free
With Gemini watermark?
Uhh, I don't think it has it
A new experimental tool that lets you use images as prompts to visualize your ideas and tell your story.
Here it is
hi new here
yo folks, where is the video arena channel? can anyone guide me to it pls
holy plot twist
Can't find it either
@steep blaze Note that Video Arena has been removed from the server. More information can be found in this #announcements . You can still generate videos on the website.
yesssss its real
Someone addd ittttt
thx
okay so they are saying it's better than 5.3 codex in coding too, so now i can't wait for 5.4 codex to see even more improvement in coding tasks
Yes, this one should be as good as or better than 5.3 codex (but not by a lot) in terms of coding tasks or intelligence, that's why it's not 5.4 codex
but for sure they'll release the codex version if what you mean by intelligence is coding capabilities
No, general intelligence
Not coding
I value a smarter assistant more
hopefully 5.4 isn't a disappointment
oh then you're right, it seems more cost-efficient and should be more intelligent, but they didn't focus on that as much as for the previous model
it's not on arena yet right?
still should be smarter
It is
thx man
seeing the benchmark it should be still a good improvement in intelligence from 5.2
Mine isn't working
did you (or anyone else who has used a lot of AI) test gpt 5.4/high? how does it compare to other tier-1 AIs?
knowledge cutoff seems to be august 2025
I mean it just released
1. saw on X that OpenAI was already experiencing errors before the launch (lol)
2. openai just released it, so their own servers must be handling a high volume -> slow
3. same thing for arena itself
actually every model fails at this question due to its knowledge cutoff and what it's been trained on most
ask gemini to guess and it will do the same, same for anthropic
gpt 5.4 is out
Well they are given like zero data to determine
So a guess like GPT 5 era is good
We know
It talks so natural and smart
they said that they improved the personality to be better again so
i guess it's good since everyone complained about it
Guys
Was there no announcement for gpt 5.4 on arena?
it's made for general-purpose tasks and supposed to be good at coding even though it's not a codex model
why does gpt 5.4 not answer
Chad model
same performance as gpt 5.3 but cheaper
should be better or as good as 5.3 codex on coding task
still same coding slops
I wonder if we should expect gpt 5.5 sometime later
the pro version is beating gemini deep think right ?
So now openai are doing monthly release instead of two month ?
Is gpt 5.4 good in roleplay
They release monthly, so probably around mid or late April.
that's crazy speeding up by 2x ?
Don't listen to him, it's been around for like 20 minutes, no one could get a good understanding of the model within that time frame.
gpt 5.4 is literally so ass
Gpt 5.5 is going to come out April 5
ill just go back to claude 4.6 for the 50th time this month
me when im hating without even testing
Yup, to keep up with Opus 5
wdym i just tested it
And gpt 5.6 may 4
okay bro
glm-5 is actually decent
Gpt 5.4 is good in lua coding 💀
The model is quite better conversationally
and maybe gpt 5.7 early-mid june
It sounds a lot less AI-ish
alright.
it talk in a better way
When will GPT 6 come out
Because at this point it's only 5.x
they're scared to call anything gpt 6 right now lol
I doubt they have anything worth calling gpt 6
so you are basing your opinion on the name they give to the model not on the actual performance, nice
knowing every other company is doing the same
when did i say that
what do u think, will gpt 5.4 take first place in arena?
i literally just said that i tested it
bro what the hell is this
even glm is better
glory to anthropic
5.4's coding skills are pretty much nearly identical to 5.3 codex
especially with frontend
High version is out
Not everyone values frontend only.
I pretty much just imagine this model to be the general-use version of 5.3 codex
claude is way better
this one is made for general purpose task.
Let's test creative writing
But no xhigh...
mainly but still good at coding
since chatgpt messed up earlier
I wonder if arena will ever have xhigh
Well if people can't afford Claude it's mid
Or it's too expensive
Too expensive
It seems like it would shine there
based on what?
no
the model has barely been out for an hour lol
on my tests
on your 3 prompts?
yes
He has only done basic frontend tests btw
i test each model for the stuff i do
if it's not good then it's trash for me, cause i won't use it for anything else
Waiting for extreme thinking
yes, you are doing a front-end task and testing it on a model made for general-purpose tasks
Oh yeah
nah
But I think extreme thinking is a fake leak
If you form an opinion on a model after 3 tests, then I don't really have anything else to tell you
Frontend should be the least
gemini 3 deepthink on arena when
at least wait for the coding model before judging coding capabilities lol
never, since it's from a pro subscription
well that sucks
just like we ain't ever getting gpt 5.4 pro on arena
yupp has them models tho
Isn't it for ultra?
And also all leaks about open source gpt are fake ...
yes my bad
i mean its from the highest subscription
Really hoped openai will release gpt oss 2 or something
Yeh
so it can't be on arena
i doubt they're going to touch oss again any time soon
for what
especially with how they're doing in the public eye right now
people want deepseek 4
i did some tests and in general texting gpt 5.4 is not better compared to claude opus 4.6 or even gemini 3.1
did you use 5.4 high?
they are sleeping
yes
They just gonna distill why would we be excited
Is o3 pro better than gpt 5.4?💀
oh hell naw
Ummmm source? 😭 🙏
gpt 5.4 is miles better
The early pro models like o1 and o3 pro were sloppy.
Since post RL was new
Then why does OpenAI keep it
Its still a decent model
Some labs and companies still use it
o3 pro and o3 are still live
not saying it's a bad model
but there are much better options now
Fo sure
and cheaper too
And openai won't remove them until 2027
Like confirmed?
there's no real reason to use o3 right now unless you REALLY like the model for your use case
Because no official deprecation date
there are more effective options
They are deprecating gpt 5.1 but not o3
Something about o3 was different idk why
It definitely marked a huge leap in reasoning
One of my personal favorite models
just far too expensive
also hallucinated everything lol
I tested it in chess and it was the first model ever to not hallucinate.
the ridiculous pricing
Especially pro version
Btw what's the point of using o3 pro
Like is it better?
At least it was cheaper than o1
it just thinks wayy longer for higher reasoning capabilities
Who remembers o1 pro pricing? 😭
Barely higher
yeah lol
I never saw any big improvements, but maybe I wasn't using it right
that means every month we are getting a new gpt, it's insane
will gpt5.4 search mode be added to arena?
Obviously
I feel like OpenAI is gonna have to release something extraordinary to get back into the race
feels like they're falling off frontier
glory to anthropic
i don't think it's the real capabilities of their model that people are judging, but the fact that they removed 4o lol
well
gemini will take 1st place in may
4o put thousands of people into psychosis so
exactly
and people want it back
then they will nerf the model
I used it very much but it's not very impressive now
But in early 2025 it was peak
I still hate that for some reason they added battle mode directly to direct mode
There was no better ai writing
I will agree that there was something unique about 4o in the sense that it just had less refusals and felt less corporate
pineapple said there's a skip button now
people just want a model that will make them go into psychosis and agree with everything they say
but it was just too sycophantic
they want the model to agree with everything they say that's it
yeah pretty much
Hey 4o is still live
Not in chatgpt but it's live
yes
yeah i know
So what's the point of #keep4o
the people who were going into psychosis can NOT afford that API pricing 😭 🙏
that's why they still want it in the app
i don't wanna be in a world where 80% of people are in psychosis due to an ai
please
use ai for coding task or general purpose
any news on seedance 2 api guys
delayed
Gpt 5.4 is the best at lua code?
damn, to when thoo
Why is the UI scrolling down forcefully after the response is done? It didn't do that before, it's irritating.
no lmao
it's horrible
claude is better
what's the best for lua?
claudius
k
bro
@daring rock
what are u doing nephew
He will use ai for this 100%
What's the point of paying someone $5
When you can just use ai
I swear these guys just ask a model "how can i make money fast with ai" and do whatever it says
it's a scam bot wdym
how to make money fast
You have to sleep less
Sleeping for 30 minutes is enough
work
No you have to sell your money
So you will make money
Then sell it again
real shi 💥💥
im an officer, do i need to quit
And you will get new money
You have to sleep less
What's this...
Concord grape?
Mammoth newt 0226
where did you find this
Yupp ai
why is it so expensive
Idk maybe new Claude?
personal opinion but i love how gpt 5.4 talk
nephew what did u even tell it
Debate about god
Wait mercury 2 is out?
gus fring is something
And it's after gpt 5.4
what's mercury 2
A really fast model, I forgot what type
Somethin new idk
feb and march is filled with crazy models
Feb is my favourite
actually no way
Also qwen image 2.0 and pro is out!
people, when it comes to gpt 5.4, are either glazing or hating, no in-between 😭
Crazy
They are focused on coding skill not creativity
Not true, it's a balance
for coding skill it would be a codex version
this one is for general purpose
Gpt 5.4 pro pricing is crazy
Is ChatGPT 5.4 good at writing and creative stuff?
yes but i'd say same for gemini ultra or claude max
Thats why it has pro in the name
This lowers my expectations on 5.4...
need that jelly
Then what's the point of releasing pro version in app
wharr
No one will use it
For people with money
some people are buying it actually
are u rich gus
I wish
not bad not bad.
😭😭😭😭 he learnt from the memes
I don't really like this test, because it's just abusing how tokenization works
Arc agi 3 coming soon
🤦♂️
Ai is Future they said
Like, counting letters in a word is something us humans do fine because we have a method of doing it
LLMs don't work letter-by-letter
lol
i thought they fixed the hallucination of this model
Not a hallucination problem
It's just how tokenization works
the model doesn't see every letter in the word
it just sees the probability of that specific token happening
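To make the tokenization point concrete, here is a minimal sketch of greedy longest-match subword splitting. `TOY_VOCAB`, `TOKEN_TO_ID` and `greedy_tokenize` are hypothetical illustrations, not any real model's tokenizer; real vocabularies (BPE and similar) are learned from data and are far larger, so actual splits will differ.

```python
# Hypothetical toy vocabulary; real subword vocabularies are learned and far larger.
TOY_VOCAB = {"straw", "berry", "st", "raw", "s", "t", "r", "a", "w", "b", "e", "y"}
# Each vocab entry gets an integer ID: the model only ever receives these integers.
TOKEN_TO_ID = {tok: i for i, tok in enumerate(sorted(TOY_VOCAB))}


def greedy_tokenize(word: str, vocab: set[str]) -> list[str]:
    """Split `word` by taking the longest vocab entry that matches at each position."""
    tokens: list[str] = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink it.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # Nothing matched: fall back to a single character.
            tokens.append(word[i])
            i += 1
    return tokens


tokens = greedy_tokenize("strawberry", TOY_VOCAB)
# -1 marks a fallback character that is not in the toy vocabulary.
token_ids = [TOKEN_TO_ID.get(t, -1) for t in tokens]
print(tokens)     # ['straw', 'berry']
print(token_ids)  # two integers; the three r's never appear as separate units
```

In this toy setup the model would see two IDs, not ten letters, so answering "how many r's?" depends on having memorized what is inside those IDs rather than on reading the word.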
just look at the arena score
look at arena score
preliminary test means it's not a definitive score
didn't you see the warning?
don't expect the score to change much, it won't gain 50 Elo points magically
Dementia?
you are ragebaiting
nephew take a breath and relax
it's releasing in 1 hour and 23 minutes
trust
my source is me
why should I relax ?
model just dropped
Yep
5.8 in the next month
let's ignore the fact that it just released and the warning "preliminary"
WHEN GPT 6 IS COMING
The current leaderboard position is irrelevant solely because of the fact that the model has only 2,000 votes as of now.
IM TIRED OF 5.X
next month and 15 days
That's not nearly enough to make a conclusion
I want next generation
score won't change that much.... opus 4.6 will still be ahead by far, I mean are you new to this ?
The model only has 2,000 votes
Opus 4.5 has 30,000+
4.5*
can you read? it doesn't matter, there will be a max 10 Elo diff
mb lol
how can you be so sure?
gpt 5.4 is o5
Models can do this pretty easily if you just give them terminal access
they'll just use a command
i don't think this one is as dangerous as 4o
Same reason why LLMs can't do long form arithmetic without a calculator
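The "just use a command" point can be illustrated with the kind of deterministic helper a tool-using model could call instead of reasoning over tokens. `count_letter` is a hypothetical name for illustration, not a real tool API exposed by any model provider.

```python
def count_letter(word: str, letter: str) -> int:
    """Count occurrences of a single letter, ignoring case (hypothetical tool helper)."""
    if len(letter) != 1:
        raise ValueError("letter must be a single character")
    # str.count scans the actual characters, so tokenization never enters the picture.
    return word.lower().count(letter.lower())


print(count_letter("strawberry", "r"))   # 3
print(count_letter("Mississippi", "S"))  # 4
```

This is the same division of labor as handing arithmetic to a calculator: the model decides which tool to call, and the tool does the exact character-level work.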
experience. have you been looking at the ranking for just one week? also, next to the score there is the ±12
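For a rough sense of why 2,000 votes leaves a wider ± band than 30,000+ votes, here is a simplified sketch. It assumes a single head-to-head matchup with independent votes and a normal approximation; as far as I know Arena's leaderboard instead fits a Bradley-Terry style model across all models with bootstrapped intervals, so these numbers are illustrative only and are not expected to match the displayed ±12. `elo_ci_halfwidth` is a hypothetical helper.

```python
import math


def elo_ci_halfwidth(n_votes: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate CI half-width, in Elo points, for a win rate p estimated from n_votes.

    Simplification: all votes are treated as one matchup with independent outcomes.
    """
    # Standard error of the estimated win rate (binomial proportion).
    se_p = math.sqrt(p * (1 - p) / n_votes)
    # Elo gap d = 400 * log10(p / (1 - p)); its slope dd/dp converts win-rate error to Elo error.
    slope = 400 / (math.log(10) * p * (1 - p))
    return z * slope * se_p


for n in (2_000, 30_000):
    print(n, round(elo_ci_halfwidth(n), 1))
```

Because the half-width scales with 1/√n, 15× more votes narrows the band by about √15 ≈ 3.9×, which is why scores move more early on than once they have settled.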
arena champion bro
experience 💥
there are 500 other arena champions
you are not special
dude he is the chosen one
yeah well, you not being arena champion speaks a lot for itself
no, it's just that you not understanding a simple thing is ragebaiting me
How do you even become arena champion?
I have a bachelor's in computer science and I've fine-tuned LLMs myself, and I also run my own benchmarks.
I'd rather have that than "arena champion"
ok Noam Shazeer
yeah but it's reasonable to expect for the chat model to be tuned more for user preference than the thinking one. Very often was the case with OpenAI. So wouldn't be too surprised if this doesn't climb much. Doesn't mean that it isn't better
I mainly bring it up because I think we're jumping to the conclusion that this model is "worse than 5.2" way too quickly
Once the score settles, it's probably fair to determine if the model is good or not
guys when will deepseek v4 be released?
WHERE IS DEEPSEEK V4
I dont know what whale is doing
today in exactly 22 minutes
IM TIRED
They're not afraid of gemini,claude and openai
I NEED DEEPSEEK V4
they always do this
source?
tbf the model is on par with 5.2 but it is not the big leap everyone was talking about
It's fundamentally incorrect to align arena score with better/worse in the first place
nobody was saying it was a big leap?
yes, but no open-source model has beaten any of the top 3 like Anthropic, GPT, Gemini
if anything people were saying it was incremental lol
Of course, it just gives some insight on where the model may sit
you should see Scam altman hyping it up
open-source models are 1 year behind in the race
well argue with him then lol
nobody here thought it was a huge leap
It's almost time for them to release something
Or they are getting destroyed
but they are progressing very fast honestly
Dunno how long you've been following this, but Anthropic used to suck on lmarena. That was kind of ironic. But it proves that top spots there can only be occupied by the models explicitly trained to do well in this specific environment
im waiting on bytedance models tbh, they are cookinggg ngl
Oh yeah I'm aware that LMArena has had some misleading scores in the past
hence why i take the scores with a grain of salt and try to supplement my opinion with other benchmarks as well
WE NEED A PIECE OF OPEN SOURCE!
Not really misleading, but people need some context to properly read them. Arena is one metric out of dozens of them. And it isn't really raw performance or capability metric
Eh, I'd say misleading in some cases. Llama 4 topped the leaderboard at some point.
Btw where is llama?
dead in the back of an alley
I haven't heard anything about it
yeah because they fine-tuned on arena datasets lol
They just discounted it?
when is arena-video gonna be a direct chat 💔
Actually I remember it was
For some hours
But it got deleted
fuhhhhhh
And now it's only for battle arena
my dream would be deepseek v4 #1 in capabilities out of all existing models till now, but it's impossible right?
Much of it is just style that is technically meaningless and easily changeable, and some of it is just patterns... patterns from the most active users on arena, what prompts they use and how they vote
highly unlikely, yeah
literally can't be possible
It's supposed to be 5.3 Codex level
patience nephew 🙏
It's not even opus 4.5 lvl
there are certain coding areas it should be really good at tbh
front end on gpt is really bad
don't do front-end using gpt
hate how openai conveniently leaves 5.3 Codex out of so many benchmarks
how can we explain Opus 4.6 having good "taste" in front-end but not the other companies' models?
Because it actually does max effort
for example, when i try gpt 5.3 codex on coding tasks that aren't front-end, it does a very good job, maybe better than opus 4.6
so how do you explain it
I don't mind it since I prefer general purpose model and codex is this odd one out for me lmao. But I understand where you are coming from, I didn't even notice at first that they included 5.2-codex rather than 5.3-codex in that graph 🗿
the same gpt 5.3 codex on front end is bad
Opus 4.6 is built for frontend
It's capable of ambitious work
yeah i guess that makes sense
As anthropic says
so openai need to work on this
it's not about coding capabilities honestly but about the taste it has
I feel like GPT models have always been pretty terrible at frontend
never really got good UI outputs from GPT models
yes i hope they focus on it for the next one
UI from gpt 5.3 just feels too plastic
While opus 4.6 somehow makes it alive and beautiful
to have good visuals you ideally need a bigger model. So Opus and Google gonna have natural advantage there. And if OpenAI made it bigger they wouldn't be able to sustain current caps on chatgpt. It's a reasonable trade-off. They used to be struggling considerably more with this before gpt4.1 and subsequent models
What's the secret
it's capable of implementing features and doing great things like that, but it doesn't have good visuals
i think they can train it especially on front end task to improve it honestly
I wouldn't say this is necessarily true, since some small models can push out some great UI with proper instructions
you can but only to a certain extent before degrading other things
Guys 5.4 is so good at creative writing
Size is only really important for world knowledge and superposition (relating novel concepts to each other)
Like way better than Gemini and Opus
WTF is 5.4???
openai will never make good models. again
"with proper instructions". But if you adopted similar approach with bigger model chances are it would do better than it did before as well. Can only compare identical setups
Anthropic was always ahead
They reached the peak at 4o
but the reason why i use gpt 5.3 codex instead of opus 4.6 is that its doing better job on every other thing than front end
GPT models with design skills still do pretty horrible
Theo.t3 made a video on the whole thing
And because 5.3 is very cheap
What kind of creative writing do you mean?
yes too
2nd
2nd in creative writing
Roleplay, story board, all of it.
Wouldn't say it's horrible. With gpt4o and before o3/gpt4.1, that's when it was really bad lol. They sucked on just about every single benchmark or test that touched on visuals
i mean, if you enjoy AI slop blue-purple gradients then sure 😭
5.3 codex loves doing it
5.3 codex (left), 5.4 (right)
Were GPT o-models ever really good?
but i love the 1m context
guys which model has the most context
Pretty good general purpose reasoning models
Same lame model
And where's o2?
Its bad at simplebench
skipped lol
Yes
I did my tests
Copyright issues
oh really?
on webdev arena, arc-agi and related things they do get respectable scores nowadays. They also perform decently on svg tests
Yeah some company name
Chat gpt is bad at making good visuals
@alpine pasture i like cherrys
was not aware lol
just basing off my usage and general consensus back then
i js ate chicken
Missing
Or it was so good
did you like my brother?
yea, im still hungry too, come here nephew
Sure, but arc-agi is more about reasoning, and svg tests are just a capability benchmark rather than measuring design skills
Didn't even have 5.3 until now
i do remember gpt 5 taking some time to roll out
Gemini namings are so done 🥀
even though it was supposed to be a huge release
because they only released 5.3 instant
there is no normal 5.3
i actually hate it
every company is terrible at naming models except kimi and minimax
and deepseek ngl
DeepSeek's naming feels the most straightforward
never seen an ai model with a good name
they won't release a normal 5.3 it wouldn't make sense
only the instant version
svg is measuring spatial awareness and design/visual skills directly. That's as good of a test as website design IMO. It forces the model to draw things manually
Qwens video generator is just so ahh 😭💔 its like veo 3
preliminary test + its only one point behind
it js came out
I can agree that it's good for measuring spatial awareness, but drawing out SVGs is a completely different skill in comparison to interface design
lets see
honestly claude is OP when it comes to coding
the writing is better if you care
yep
agree
the only reason I use it and not other ones is because it actually edits the code instead of rewriting it entirely every time
arena score is not indicative of model capability
i'd say claude is the best for front-end
This is a pretty bad way to test a model tbf
LLMs don't work in letters
Alright test it, If it's good at roleplay
lmao
Gemini 3.1 0326 flash lite Omni pro preview high flash TTS image veo
hold on i gotta see ts
yeah, I will only be impressed if they were able to one-shot an entire HTML game from a single prompt
i wonder about its coding
hold on
i wonder if they fixed the schizo code of gpt 5.2 and 5.3
do i need to tell him to explain or thats awkward
yes techologia
in 2 hours
source: idk
source
April 1
Source :Slam Altman
do not slam the altman
scam altman
double kick him
FINALLY
This is what I needed
MISTRAL VIBE CLI
FINALLY
I WAITED FOR THIS MY ENTIRE LIFE
he will murder u
probably
lol.... 5.4 is behind 5.2?
yeah im confused too
a bit surprising, even if its still early testing
preliminary test and its only 1 point behind
give it a day or more
no honestly everyone saying its worse than gpt 5.2 is lying
that's just not true at all
yeah... I've been watching the leaderboard for a while... I don't think 5.4 will catch up that much.
the leaderboard doesn't refresh every second
oh I know
it's not a big leap tho
You can click to see what the router is using
Where
yes but it's not a coding version and it's not worse like people are saying
but you're right, definitely not a huge leap in capabilities
I'm sure it will beat 5.2 someday... but I'm talking about others... lol
I think they are actually falling behind Gemini and Claude
let's see
Quick question. Even though gpt 5.4 has only been out for maybe an hour, do you think gpt 5.4 is better than opus 4.6 at coding?
Click Max above the response and it will say "Response provided by ..."
gpt 5.2 still stands strong
they clearly have focused on creative writing too this time
not front end, but for example gpt 5.3 codex was better for everything that doesn't involve making things look good
so 5.4 should do the same as 5.3 codex
Interesting. So for back end gpt 5.4/5.3 codex is better?
le gpt
Yes definitely imo
for good looking go to opus
gpt is bad at making things look good
Oki. Ty
MISTRAL is better
This looks horrendous
Terrible
But gpt is too simple
yes i was never able to get a gpt model to make something good-looking
the font wants me to kms
for a website, i would rather have GPT's lol
💔
The font for mistral is terrible
Still good for the worst model ever
the font of doom
fr
why is gpt-5.4 already out??? gpt-5.3 just came out like wtf 😭
Which is better in coding?
9
15
1
Claude Opus 4.6
Left is gpt and the right is mistral. I can't decide which design for the repair station is better
(idk what repair station even means)
Well these two designs are completely different lol
not really fair to compare
depends what you need
this is awful
also gpt 5.4 is awful
like wow
its not
impressive at all
fr
WOW IT'S GOING TO DESTROY EVERYTHING BEST MODEL EVER!
But we got nothing
It's not better at all
overhyped by nobody except scam altman
i think openai is just "RANDOM SHI GO"
guys I have to ask one thing: can anyone who knows about arena.ai answer me?
what is it brinks truck sniper
Is it just me, or does battle mode in direct mode make mistakes more frequent, even when using skip? I keep getting stuck in a frozen state where it just tells me an error occurred no matter what I do
nice pfp
Bruh, how do people only care about coding 😭
ty yours too
5.4 does feel great for writing unironically
Wait for 5.4 codex before judging on coding
Mhm
But it was supposed to be for complex tasks...
It's creepy how good it is
Price is questionable
While 5.3 was supposed to be for writing
Roleplayers don't want an expensive model
backed by the ai companies to get free testing on their models
😭
well, most models that are great at writing are expensive
it's a worthwhile investment
yes
Top writing model is Opus 4.6 lol
so the companies are giving it out as a free trial?
basically
Basically money is replaced with data as the cost
arena gives the rankings and companies get the data
and people see the models : "oh wow its a good model"
and company won
basically yeah
Yes
Basically all three of those, since we all kind of answered at the same time lmao
but it's not
Oops
This question doesn't really make sense
really great
But not for what it was made for
Overall bruh
Like its pretty clear
Gpt was supposed to be for coding and agentic tasks