#general

1 messages · Page 94 of 1

ocean vortex
#

lol what

#

DId you just make this up for no reason?

#

GPT5 is significantly smaller than og GPT4 and their infra is much better now. They are doing just fine

#

They also deprecated gpt4.5 for good so probably doing better than ever on compute tbh

stray aspen
#

api or aistudio

ocean vortex
#

Define 'clearly'. If we look at all the tests and same prompt testing, gpt5 is the clear winner lol

ripe mountain
blazing bison
#

what you're talking about?

#

even on the podcast he said that they gonna need to cut something

ocean vortex
autumn cargo
ocean vortex
#

???????????????

ocean vortex
#

that 4.5 is their biggest model?

blazing bison
#

no

ocean vortex
#

Do you seriously need a source for that?

autumn cargo
blazing bison
#

that they not gonna cut things bcs they now have compute to spare

#

lmao

#

do you know that 4.5 was using training resources right?

#

And now with 4.5 disabled they gonna use it on gpt-6 related work

#

not inference

ocean vortex
#

And then dispute what I wrote

blazing bison
#

bro is blind

#

well, just wait 2 days and when he announce sora or other shi* being cut we discuss again

ocean vortex
#

that is nothing new, they are saying the same things all the time

#

that's how they stay in business... smh

#

They are always looking for ways to cut costs

blazing bison
#

it's not cut costs bro

ocean vortex
#

and this is one of the means of justifying that

blazing bison
#

they don't have enough compute

#

lmao

#

sam is crazying traveling the world to get 500 bi and bro is saying that he has compute but don't want to use to cut costs

#

are you insane or what

ocean vortex
#

No one does if you read everything literally. You really shouldn't take this for face value catgrin

#

They have plenty of it, the only question is the price/cost

blazing bison
#

lol

ocean vortex
#

smh

whole wagon
#

It has always been the case openAI are deploying compute far too slow. The stargate project is planning to have 64k GPUs by the end of the year

#

That is slow compared to other companies

wet sparrow
whole wagon
ocean vortex
#

They have their priorities on other things - like training new models. They also won't accept the higher cost than what they feel comfortable with - that's the extent of them struggling for compute

golden ocean
#

Bro is gpt 4o -

whole wagon
#

OpenAI and Oracle are expected to deploy 64,000 Nvidia GB200s at the Stargate data center in Abilene, Texas by the end of 2026.

whole wagon
ocean vortex
#

yeah they are expanding all the time and scaling, obviously

ocean vortex
#

If they wanted, their compute is 'unlimited', but cost...

#

that's a different question

whole wagon
#

That's their main stargate computer. Meant for training

wet sparrow
ocean vortex
#

Most of these tweets are meant for marketing. And to justify any potential ratelimits. They are not to be taken for face value

wet sparrow
#

Yes, you're a conspiracy theorist. There's no point in discussing it

blazing bison
#

Just let him yap

ocean vortex
#

I'm literally not. lmfao

#

I'm just not naive and know how this works

blazing bison
#

yeah bro

#

obviusly

ocean vortex
#

???

#

are you actually dumb?

wet sparrow
#

I know just enough about ML to know that he's telling the truth

ocean vortex
#

Or do you seriously think them making their models smaller and cutting models from model switcher + modernizing their infra... put them at bigger strain than before? 🤣

#

LOL

wet sparrow
#

The strain remains the same; they can simply serve more people without removing resources from research

#

ChatGPT is growing every week

twilit valve
#

hello

ocean vortex
wet sparrow
#

they use the free power for research

#

So, it's not going down

ocean vortex
#

They used it for research too since the start

wet sparrow
#

Bro, you can't have GPUs training models and doing inference at the same time

ocean vortex
#

But they also had to host bigger models and deal with older gpus

echo aurora
#
poll_question_text

What version do you use the most?

victor_answer_votes

10

total_votes

22

victor_answer_id

3

victor_answer_text

Direct

wet sparrow
#

Even if they have optimized their infrastructure and now have free GPUs for training models, the strain remains the same. They are going to use them to train models

#

When they say they don't have enough GPUs, that includes the GPUs they need for research, the GPUs they need for current inference, and the GPUs they will need for future inference if ChatGPT continues to grow at its current pace

ocean vortex
ocean vortex
ocean vortex
whole wagon
#

The plus limit went from 200 to 3000. GPT5 Thinking basically unlimited for plus users, they will never hit the limit

#

It doesn't add up with the capacity tweet ngl

blazing bison
#

Yes my friend tryed it it's misleading a little

#

because it's like 2600 mini thinkings and not 3000 gpt-5 thinking

whole wagon
#

Yeah it has the router

ocean vortex
whole wagon
#

How much do you think the router helps. What percents are going to mini or even nano

blazing bison
#

Not true because the o3 mini, o4 mini, gpt 4.5 and others was using those resources that they are giving gpt 5 mini thinking now

ocean vortex
blazing bison
#

that's why it's possible 3000 per week

#

the gpt 5 thinking model that is basically new o3 or o4 still 200 per week

ocean vortex
#

and gpt5-thinking is o3 equivalent, in terms of cost

wet sparrow
ocean vortex
#

I mean roughly, that's just common sense...

#

doesn't take a genius, but obviously this is not official confirmed info, if you need for me to spell it out just for you?

blazing bison
#

It just you making assumptions so

#

without real data there is no point

ocean vortex
#

well then there's no point in this chat...?

#

lmao

blazing bison
#

When i discuss things here, i bring tweets from researchs, papers, news

#

always based on some data

ocean vortex
#

Then look at API pricing

#

that's your data

blazing bison
#

you're just talking about things in your head and treating it as the truth

whole wagon
whole wagon
#

I mean look at the speed of it. That should be a clue lol

wet sparrow
#

"From what I hear..."

ocean vortex
#

lmfao

whole wagon
wet sparrow
#

There is public information about them using specific silicons that explains the O3 cut price and speed

whole wagon
#

How is API pricing data for the cost

#

That's literally so dumb

ocean vortex
whole wagon
#

It is not that hidden

wet sparrow
#

Yes, it's hidden from them. It's hidden from everyone

ocean vortex
whole wagon
#

I know a GTM director and he knows it

#

It's not hidden

#

How would they even hide something that fundamental that easily

wet sparrow
#

Well, I'd rather believe my friends than you

whole wagon
#

Your friends work in web dev mate

#

How tf would they know

ocean vortex
#

yeah they would have no clue tbh

wet sparrow
#

I would rather trust information from insiders than from random people on Discord.

wet sparrow
#

I'm leaving now. Bye-bye, conspiracy theorists!

ocean vortex
#

that they "don't know"?

#

that's not some secret information you can trust or not trust

#

LOL

blazing bison
#

Funny how everyone here has insiders friends lmao

#

You guys are funny

whole wagon
#

It is kinda impossible to prove. Because then you literally out them how do you want me to prove it without doing that

glass gulch
#

I feel like I know the answer but llmarena.ru is NOT you guys right?

whole wagon
#

no kek

#

i just visited the site

ripe mountain
ocean vortex
steel blaze
#

Can someone point me to research on automating the leaderboard with third LLM prompt creation and response evaluation? How close can it get to human results these days?

keen beacon
#

What

zinc ore
keen beacon
#

So most people just abuse lmarena to get top of the line models for free?

keen beacon
cedar tide
#

@echo aurora GPT 5 needs to be retested from the beginning on both arenas with the public API. Many people who had preview access say that GPT 5 was much better on that access than now via the official API.

brisk helm
keen beacon
#

wdym?

#

for the training data?

brisk helm
ocean vortex
#

Just did some back to back testing gpt5-chat vs gpt5-reasoning (API) vs gpt5-router (chatgpt)

keen beacon
tidal ginkgo
#

70% of people found out about it when a major ai came out and wanted to use if 4 free

ocean vortex
#

I would say that default model on chatgpt is halfway in between reasoning and non-reasoning variant on overall performance, but closer to the non-reasoning one

#

there are obvious gains to be had of having reasoning always on

tidal ginkgo
#

gpt-5, o4

#

grok 3

#

grok 4

ocean vortex
#

And that router does not nearly always work as it is supposed to

tidal ginkgo
#

claude opus 4

#

most of them

keen beacon
#

Wow

#

Someone's gambling the vc money away

stray aspen
#

which is the greatest on lmarena: gpt-5 nano,chat, gpt-5, or mini

tidal ginkgo
#

lol

#

not here man

stray aspen
#

go to video arena bro

tardy crown
#

yo
anyone know how to make the viral videos with baby speaking?
when I try, veo3 doesnt let me because it contains minors

tidal ginkgo
#

lol

stray aspen
#

you dont lmao

tidal ginkgo
#

use another ai

stray aspen
#

thats google policy

tardy crown
#

yea but people still bypass it

tidal ginkgo
#

well idk

tardy crown
#

there are tons of videos

tidal ginkgo
#

i´m no jailbreaker

tardy crown
#

on my fyp

tidal ginkgo
#

use another ai

echo aurora
kindred solar
#

I may be wrong, but I think the GPT-5 model needs to be handled a little differently than others, which may be why users are dissatisfied

#

So here, the model is always reasoning, unlike on the OpenAI platform, where it automatically selects the working style depending on the prompt?

#

so we test the gpt-5 model as on openai api but chatgpt model works diffrently right?

ocean vortex
#

And it’s not like the metrics are showing night and day difference o3 to gpt5. So those improvements you CAN see kinda align with them… If it was really performing worse I don’t think you would be able to tell it’s better (than o3)

low python
#

Selling ChatGPT Plus for $10 for 3 months.

misty star
verbal nimbus
vapid zinc
#

When will the leaderboards be updated?

echo aurora
stray aspen
#

dude gpt-5 is incredible for lua

#

it keeps on impressing me

#

gemini couldnt one shot this

#

this is great

#

i love gpt-5

#

lmao

#

i mean the fact that it works on the first try

#

this is crazy

wicked root
thorn vault
#

Hy

echo aurora
wicked root
solid brook
#

Yeah gemini benchmark is with unnerfed version of the model. If they benchmark with the nerfed model it would go A LOT down

#

I swear to god if they do the same with gemini 3 ......

languid crescent
#

Hey guys, I have a question

#

I've always wanted to customize the chat (the way I prompt). How do I boldened a text (like h1-h6)? Is it possible to do a strikethrough? Other methods like #, ###, **, ``, ?? How can I do that in LMarena? It makes my prompt much cleaner and clearly instructed.\

#

Is it called "Markdown Syntax"?

wicked root
#

man I'm getting nervous about google vs openai for August

dire cosmos
#

same here :<

dusky aurora
grand coral
#

📢 GLM-4.5 Technical Report is here!
We’re pulling back the curtain on how GLM-4.5 was built to excel at reasoning, coding, and agentic tasks — powered by a unique multi-stage training paradigm.

🔍 Highlights:
• Expert model iteration + self-distillation ➡️ unify reasoning, agentic, and general chat into one model
• Hybrid reasoning mode ➡️ knows when to think deeply, when to respond instantly
• Difficulty-based RL curriculum ➡️ break through performance plateaus
• Efficient function calling ➡️ more reliable tool use for code-heavy tasks

📄 Read the report: https://arxiv.org/abs/2508.06471

💬 Let us know your thoughts in the thread!

worldly inlet
#

S

cedar tide
#

Yes with all the models.

hallow ridge
#

I need help

cedar tide
hallow ridge
#

I need to get my crypto back

delicate rapids
harsh flume
#

Did openAI also discontinue the DeepResearch toggle button?

hallow ridge
#

Hey anyone in the dark net or dark web

#

???

ocean laurel
#

Hello

harsh flume
#

Yea, I tried a Deep Research prompt at chatgpt just to check if it'd maybe understand that it's a deepresearch query and work accordingly, but it doesnt work

#

Bummer, they could have kept it as a function to use on their new gpt

#

If anyone here uses their API and can confirm weather or not its still avaliable as a API tool call or something id appreciate

solid brook
wicked root
solid brook
wicked root
#

Without style control?

solid brook
#

Uhm

#

Idk what is stylencontrol?

wicked root
#

less emotes and stuff

#

you can turn it off on the lmarena website

solid brook
#

Oh

#

Btw the gemini 2.5 in benchmarks is the unnerfed one

delicate rapids
solid brook
#

Not the model we have access to

#

And

#

Gemini 2.5 benchmaxed

#

But gpt 5 benchmarks are real thing

wicked root
solid brook
#

Go reddit

#

They talk a lot about it

wicked root
#

So do you think GPT will beat Gemini this month?

solid brook
#

Yes if gemini 3 does not come out

wicked root
solid brook
#

This is one benchmark

cedar tide
#

@echo aurora
We want to know where the chatgpt router would arrive in the ranking. And if you add it with open ai, be sure that it is exactly the same version as the one available on chatgpt.
(that of the plus users and not pro )

wicked root
#

but it's the most important benchmark for me lol

solid brook
neon idol
#

Is better gpt 5 or gpt 5 pro?

solid brook
#

Gpt pro lol

wicked root
neon idol
solid brook
#

Dude the benchmarks are out

wicked root
#

Not in august

solid brook
#

Gpt 5 already beat gemini 2.5

neon idol
wicked root
#

What happens in sept is a non-issue for me.

solid brook
cedar tide
verbal nimbus
#

It didn't even beat gpt-4o here.

solid brook
vague bloom
#

Yo guys is there any way to connect LMArena With R Studio

solid brook
terse river
#

R language related?

#

i.. guess?

vague bloom
#

The R word is blocked for some reson

#

Idk why

terse river
#

oh

solid brook
vague bloom
lime coral
solid brook
tame granite
#

just make point system and see what is happening 😄

tame granite
viscid timber
#

where do i generate images? in video arena?

potent snow
#

Is it somehow possible to implment midjourny?

hollow imp
#

Are y'all really sure gpt5 is better than o3?

#

Even if it's better it would be the higher tier version
Us free tier users gpt5 isn't better than lmarena o3

#

@deep adder answer me gpt5 pfp

hardy pecan
# verbal nimbus

literally no one reads how this is scored, no due diligence, just copy paste misinformed charts

#

its a bit wild

mortal coyote
#

how can i retrieve the seed code of the image i generated using Lmarena

inner gate
tribal aspen
#

Y'all agree?

keen beacon
#

Doesn't mean it's a bad thing though

tribal aspen
#

If we have such a UI, I would happily use noting other than LMArena lol

keen beacon
#

on LMArena

tribal aspen
#

That can be dangerous sometimes

#

LoL

keen beacon
#

and translation

#

basic stuff

#

no sensitive data lol

verbal nimbus
lusty narwhal
#

hi

inner gate
#

Ah yes datasets

inner gate
verbal nimbus
keen beacon
hardy pecan
keen beacon
#

Concrete reason?

inner gate
#

Does Gemini 2.5 pro have a higher version of it?

verbal nimbus
inner gate
#

Someone said they purposely dumbed down gpt5

#

And will

#

Increase later on

verbal nimbus
hardy pecan
verbal nimbus
#

But I do think it's partly caused by unreliable routing. It might have underestimated the difficulty and routed it to a weaker model.

hardy pecan
#

If it hits 116, I can assume it'll hit around there on average once all the testing is done

#

Have to wait for all the testing to be completed first,

verbal nimbus
#

I don't think the router is very reliable.

verbal nimbus
hardy pecan
#

This is what I mean.

#

More data is needed to smooth out the variance

#

But yes the offline is better

verbal nimbus
# hardy pecan

Interesting, it increased for GPT-5 but the Thinking model was still worse on the second test.

hardy pecan
#

Yes I'm confused with the thinking result too , I assume it to be much better

ocean vortex
hardy pecan
#

I think it'll need to be tested more to smooth out variance

verbal nimbus
#

Actually, look at o3 Pro's score too. Did it get nerfed?

hardy pecan
#

I expect thinking to do far better that regular gpt5

verbal nimbus
#

IQ coincidentally drops by 45% right after GPT-5's release?

ocean vortex
ocean vortex
#

and why is o3-pro lower than o3...

willow grail
#

gpt5 high is the best ai ever

fleet lintel
#

When I use chatgpt app, I get GPT low/mid or high? how do I figure this out?

willow grail
#

its my computation. only for me.

#

and the other higher ups

#

we need to start gatekeep this more often MY KINGS AND QUEENS

#

THE COMPUTATION IS OURS ONLY

fleet lintel
# willow grail wont tell ya.

umm... I have feeling that most of my queries are going to low or mid. I am getting better responses from gemini model 🙁
How do I force high ??

willow grail
#

just to see if the reply there is diff.

gentle breach
#

hey

ocean vortex
inner gate
#

Even

ocean vortex
#

it's like..

gpt5 = gpt5-minimal/low
gpt5-thinking = gpt5-medium

neon idol
ocean vortex
#

so it would still be more capable even with matched reasoning effort

neon idol
ocean vortex
#

but yeah it could be high... For Pro sub maybe even normal gpt5-thinking is 'high', unsure

ocean vortex
neon idol
neon idol
verbal nimbus
solid brook
solid brook
#

I mean

#

.....

verbal nimbus
willow grail
#

gpt5 is best model if you have done ur research

solid brook
willow grail
verbal nimbus
#

I asked it to summarize the chat but it added in a bunch of random stuff that was never mentioned

solid brook
#

Because that is their top model

ocean vortex
#

can we put this to bed now?

autumn cargo
#

Do we know whether the GPT-5 in the leaderboard is medium or high?

#

I think it should be made clear.

willow grail
verbal nimbus
# ocean vortex

That's closer to it, but that's not the model on ChatGPT Plus

solid brook
#

Me

willow grail
verbal nimbus
#

The model on ChatGPT is very odd

solid brook
ocean vortex
autumn cargo
ocean vortex
#

gpt5-pro is not even significantly better than gpt5-thinking

solid brook
verbal nimbus
solid brook
#

Wich lmarena gpt 5 is set to high

verbal nimbus
#

Not to mention that it tests each model multiple times, which shows changes over time.

autumn cargo
verbal nimbus
autumn cargo
solid brook
#

It is sad what a world we live in. People are trashing gpt 5 just because it does not satisfy their delulu

solid brook
#

Sos they are diffrent

#

So

ocean vortex
#

gpt5-high is just gpt5 with high reasoning effort

verbal nimbus
#

Ok wth is actually GPT-5 Pro

ocean vortex
#

Pro is prompting the same model several times in parallel

#

you can also have any reasoning effort with Pro

willow grail
#

parallel compute?

verbal nimbus
#

There's no endpoint for it

solid brook
ocean vortex
verbal nimbus
solid brook
#

Had to

verbal nimbus
#

It took them so long to do that for Grok

ocean vortex
willow grail
#

i am biased.

#

i dont wanna read nazi stuff. not even for 100 euro

solid brook
ocean vortex
#

Also gemini deep think uses similar system

ocean vortex
verbal nimbus
# ocean vortex

I mean, it's nothing remarkable. That's exactly the same score as o3 without Pro.

solid brook
autumn cargo
ocean vortex
#

like cons@10 prompting except here rating of each individual response works differently. So it may choose a unique response even if it was very different from all the others

jolly kite
#

you hello guys, is on LMArena chatgpt 5 thinking? (i searched it but coudnt find it)

solid brook
#

Gpt 5 thinking is gpt 5 medium

#

Lmarena gpt 5 high

verbal nimbus
jolly kite
verbal nimbus
#

It's just High now

willow grail
#

REMINDER!!
POE gives you 1000 to 2000 GPT5 HIGH PROMPTS

ocean vortex
willow grail
#

for only 22 EURO

solid brook
#

Guys can you go and test models and see the reason time and not spread false info?

#

For gpt 5 pro go on youtube

autumn cargo
solid brook
#

Gpt 5 high go on lmarena

autumn cargo
#

That's what I'm saying.

ocean vortex
#

if they just wrote gpt5-pro that would be incomplete. You can't run a request through API (they probably have early access...) with pro without selecting specific reasoning effort

verbal nimbus
#

Coding scores are a bit confusing

solid brook
ocean vortex
verbal nimbus
verbal nimbus
ocean vortex
#

@autumn cargo same applies for gpt5-pro

#

you can't run it without some specific reasoning effort

verbal nimbus
#

Anyone know the reasoning effort of GPT-5 in GitHub Copilot?

autumn cargo
verbal nimbus
#

Because it's the successor of 4.1 😆

solid brook
#

Idk they must have wrote it somewhere

ocean vortex
verbal nimbus
#

At least in terms of input tokens, which is more significant I think

ocean vortex
#

You would call this o3-pro (high):

#

and this is o3-high:

verbal nimbus
#

Maybe the router just decided to send those class of problems to a dumber model

autumn cargo
ocean vortex
fervent jolt
#

Does anyone know when the leader board is going to be updated next?

ocean vortex
#

I noticed this when I was testing chatgpt router

#

when it routed to reasoning the responses were nearly as good as you can realistically expect from a thinking model

#

even though most definitely this is low reasoning effort

bright junco
#

Why does my gemini 2.5 pro print incompletely? Is there a way to fix it?

verbal nimbus
#

Like how can there be such a discrepancy in results between the two coding categories. I think the router is messing things up

ocean vortex
#

the gap minimal to low is insane

#

and they can't afford to lose out to gpt4.1 lol

verbal nimbus
#

I don't think the gap is just because of "less thinking"

keen beacon
#

gpt 5 base is just bad

ocean vortex
#

it is not less, it is quite literally no thinking at all with "minimal"

#

it outputs 2 times less than gpt4.1

verbal nimbus
#

Very weird

keen beacon
#

horizon alpha version of gpt 5 base (juice 0) has a worse gpqa diamond score compared to gpt 4.1 nano

verbal nimbus
#

I hope that's not the version in Copilot. It'll be a downgrade.

harsh flume
#

I havent played the arena in a week. Any cool anonym model right now?

keen beacon
#

horizon beta (juice 5, likely an early version of minimal) did worse than gpt 4.1 mini on gpqa diamond

verbal nimbus
keen beacon
#

openai didnt mention the horizon models in their announcement because everyone thought they were nano or mini models 😭

ocean vortex
# keen beacon gpt 5 base is just bad

I don't think it's bad. It's actually impressive the gains they were able to make with spatial reasoning. It's just that it's hard to make a hybrid model which would be SOTA both when maxed out and with reasoning off.

#

gpt5-minimal score is low because it's too concise when it doesn't get to use reasoning tokens

keen beacon
#

simpleqa was 33%...

keen beacon
#

but i agree about the hybrid thinking thing lol

ocean vortex
keen beacon
#

but they definitely focused on svg specifically among other things

#

because horizon models had poor benchmarks whilst people liked the svg from those

#

the svg thing is a tangent of mine sorry 🤣

tribal aspen
#

what is the limit of gpt 5 on lmarena direct chat?

ocean vortex
#

but if you look at webdev arena...

tribal aspen
#

someone please answer

ocean vortex
#

gpt5 is now SOTA there as well

#

o3 used to do poorly

tribal aspen
#

1415926535 8979323846 2643383279 5028841971 6939937510
5820974944 5923078164 0628620899 8628034825 3421170679
8214808651 3282306647 0938446095 5058223172 5359408128
4811174502 8410270193 8521105559 6446229489 5493038196
4428810975 6659334461 2847564823 3786783165 2712019091
4564856692 3460348610 4543266482 1339360726 0249141273
7245870066 0631558817 4881520920 9628292540 9171536436
7892590360 0113305305 4882046652 1384146951 9415116094
3305727036 5759591953 0921861173 8193261179 3105118548
0744623799 6274956735 1885752724 8912279381 8301194912
9833673362 4406566430 8602139494 6395224737 1907021798
6094370277 0539217176 2931767523 8467481846 7669405132
0005681271 4526356082 7785771342 7577896091 7363717872
1468440901 2249534301 4654958537 1050792279 6892589235
4201995611 2129021960 8640344181 5981362977 4771309960
5187072113 4999999837 2978049951 0597317328 1609631859
5024459455 3469083026 4252230825 3344685035 2619311881
7101000313 7838752886 5875332083 8142061717 7669147303
5982534904 2875546873 1159562863 8823537875 9375195778
1857780532 1712268066 1300192787 6611195909 2164201989

keen beacon
# ocean vortex gpt5 is now SOTA there as well

they focused on 'big model' things with the cpt/etc it seemed, svg, web dev, etc. just found it funny they focused on svg. the benefits are definitely real though in those areas. at least with the horizon checkpoints, those models were fried except in those regards

#

my points are about different things and i clustered them together for some reason and it's confusing sorry 🤣

verbal nimbus
autumn cargo
# autumn cargo Yes of course o3-pro and o3-high are two different models. I was aware of that. ...

But again there is a discussion here https://community.openai.com/t/the-least-important-question-right-now-why-is-gpt-5-pro-not-available-in-api-at-exuberant-pricing/1339471/2 that hints that gpt-5 pro is just a maxed out gpt-5. So really unless OpenAI introduce a new model named gpt-5 pro, I'm inclined to think that gpt-5 pro doesn't exist! List of models here: https://platform.openai.com/docs/models

verbal nimbus
#

Gemini and GLM 4.5 seems benchmaxxed for React + Tailwind

#

On Design Arena they drop to #9 and #10, since they can't use React

hollow imp
unborn sleet
#

can someone tell me why chat gpt doesnt host previews of their code?

verbal nimbus
#

It's like Web Dev Arena

unborn sleet
hollow imp
unborn sleet
brittle tiger
#

Native image gen finally coming to better model than 2.0 flash. Crazy how long that one has been out without improvement

#

Logan also showed it editing palmer luckeys tweet

willow grail
solid brook
#

There is no limit on chatting with the model i think

willow grail
#

box before eating: 340g

#

pls weigh again after eating. thank you

neon idol
#

And how can try it

solid brook
#

The death star

whole sundial
#

Introducing GLM-4.5V: a breakthrough in open-source visual reasoning
︀︀
︀︀GLM-4.5V delivers state-of-the-art performance among open-source models in its size class, dominating across 41 benchmarks.
︀︀
︀︀Built on the GLM-4.5-Air base model, GLM-4.5V inherits proven techniques from GLM-4.1V-Thinking while achieving effective scaling through a powerful 106B-parameter MoE architecture.
︀︀
︀︀Hugging Face: huggingface.co/zai-org/GLM-4.5V
︀︀GitHub: github.com/zai-org/GLM-V
︀︀Z.ai API: docs.z.ai/guides/vlm/glm-4.5v
︀︀Try it now: chat.z.ai

**💬 1 🔁 3 ❤️ 19 👁️ 957 **

stray aspen
#

why is gpt-5 api so forgetful

#

i cant keep track of older messages in a conversation

inner gate
#

What’s ur thoights on deep seek r1

#

Thoughts

eternal niche
#

btw guys gpt5 sucks

#

gemini 2.5 pro even better

solid brook
ornate ether
stray aspen
#

now its decent

#

but the reasoning time is just nasty

hearty pulsar
#

AI going great guys

#

10 million input tokens to do nothing but waste compute

stray aspen
#

what

keen beacon
#

New imagen model will come out today

whole sundial
#

at least something to make up for the lack of glm image gen

#

I will try it as soon as it comes out

stray aspen
#

why more image model

#

we want a llm

keen beacon
#

They'll give a taste of what's upcoming

stray aspen
#

holy

#

its the new image model

#

i hope it hass image editing

#

gemini 2.0 image editing was great

keen beacon
#

Quite sure it will. This looks like the gpt 5 teaser image with image editing

stray aspen
#

yeah

keen beacon
#

Just in case

#

Lol

barren prairie
keen beacon
stray aspen
#

i saw you in the z.ai server

inner gate
#

I feel like Gemini 2.5 pro has boosted in intelligence these past few days idk if it’s just me

stray aspen
inner gate
#

I must be lucky 😭

barren prairie
inner gate
echo aurora
#
poll_question_text

What version do you use the most?

victor_answer_votes

8

total_votes

14

victor_answer_id

3

victor_answer_text

Direct

willow grail
#

which ai product has the best lip sync?

thorny cove
#

i keep getting "Something went wrong with this response, please try again." with GPT 5 Chat

echo aurora
#

I assume it's just that model you're running into issues with?

thorny cove
#

it also happened with main gpt 5

#

but the thing is the gpt 5 chat error is only for 1 chat

#

could it be a rate limit of some sort

stray aspen
#

i dont know if its because im giving extremely long prompts

echo aurora
echo aurora
stray aspen
#

it generates stuff and then it shows that error

echo aurora
#

although I am now noticing for gpt-5 the responses are coming in pretty slow and lag.

#

were both of you seeing the same or was it just the error message?

stray aspen
#

it must be a limit lol

#

its like 1500 lines of code

#

this is new

#

on the leaderboard

#

it used to be gpt-5

keen beacon
stray aspen
#

yeah but its just on the leaderboard

#

i guess gpt-5 is using high effort on arena too

keen beacon
thorny cove
echo aurora
solemn plank
#

HOW TO CREATE VEDIOS ANYONE?

cedar tide
#

Bad webdeb

primal orbit
#

I have an error with opus 4.1 thinking when it repeates the same response it gave to the previous message. And if I try to put a new message, it gives you have to wait for 50 minutes. But the clock doesn't go down. It's 50 minutes each time.

cedar tide
#

2.5 flash lite still hasn't come back

rapid merlin
#

gpt-5 is really hit or miss with styling, either it comes with something actually good or something like this

thorny cove
cedar tide
#

And very bad overall

primal orbit
#

it was released to make gpt 5 look good.

#

and for cheap marketing points

echo aurora
bright junco
#

Why does my gemini 2.5 pro print incompletely? Is there a way to fix it?

thorny cove
#

two different chats at the same time btw

#

whats the diff between gpt 5 chat and gpt 5 high

pure falcon
#

How and Why did GPT-5 lose two votes?

cedar tide
#

Just good in math

pure falcon
#

3183 votes

#

But now the leaderboard says 3181 votes

pure falcon
#

Shouldn’t we have a lot more votes since 5 whole days passed? Why did it LOSE two when all the others gained?

keen beacon
echo aurora
pure falcon
#

Gemini added 1.3K votes. And GPT-5 loses 2? Lol something is very wrong about that

vernal meadow
#

Wow Opus 4.1 improved more than I thought on the none Agentic coding task.

Impressive. Should retest it more. Will Opus 4.1 thinking be on #1? 😮

exotic nebula
#

But it is very impressive to be honest.

#

I mean, the votes are too low to decide. Lets just wait around for a bit.

thorny cove
#

any way to fix "Something went wrong with this response, please try again." for one singular chat?

exotic nebula
thorny cove
exotic nebula
#

True. Have to agree with you there.

#

Btw, if you dont mind me asking, which model do you like the best?

wicked root
#

why was gemini updated but not gpt?

#

yeah seems like so

#

confidence interval points that way

#

you mean this isn't reliable?

#

bugged in what way?

#

sorry, I'm new to this

#

ah

echo aurora
stray aspen
#

pineapple do you know anastasios

wheat onyx
zinc ore
#

Vxtwitter where x is and it'll show the vid

floral comet
#

May i ask, how many is the token count of gpt 5 on (LM arena)

stray aspen
#

idk but it stops generating for me after 2000 lines of code

echo aurora
# pure falcon <@283397944160550928> Any ideas?

The vote is based on pre-release GPT-5 testing. After GPT-5's public launch, we created a new model entry that points to its public endpoint and collecting more votes. These additional votes we've been collecting aren't yet added to the current leaderboard. We will be merging the votes in the next leaderboard release. cc @deep adder

willow grail
floral comet
#

It's been 15 mins, I requested a code on gpt-5 and yes I refreshed the website still same.. Hope it didn't bugged or the code is just too long🤣

echo aurora
#

I believe yes, but will double check and update if that's not the case.

slow grotto
#

imagen 4.0 what are you doing man

zinc ore
#

Nooooo

stray aspen
#

lol disappointing

echo aurora
#

yes, confirmed.

red tangle
#

legit?

#

so many grifters these days that it's hard to tell what's real

#

someone claims they found it in "source code"

willow grail
neon idol
nimble trail
zinc ore
#

Definitely made up

nimble trail
mossy drum
small haven
#

is gemini 3 coming in august

wicked root
small haven
#

what are they hyping about then?

#

imagen 5?

#

lol

keen beacon
#

of the 2.0 flash image preview

#

Donno exactly

small haven
#

makes sense

keen beacon
small haven
#

ah there we go

#

imagen 5

sacred quail
#

imagen series are fine but really bad at understanding prompt

#

Needs native model like gpt 1 image

sacred quail
#

it was great at writing,long context, analyzing videos

#

Also it was fine with reasoning and coding

velvet patrol
#

Why did the amount of votes for gpt-5(-high) change from 3182 to 3181 after the update and the votes for 2.5pro increased by 2k?

#

is gpt-5 already gone from the arena?

whole wagon
#

We’ve scored highly enough to achieve gold at this year’s IOI online competition with a reasoning system — placing #6 when ranked with humans and #1 when ranked with other AIs.

In just a few weeks:
• 2nd at AtCoder
• Gold medal-level at IMO
• Gold medal-level at IOI

velvet patrol
#

hello viren 😛

whole wagon
#

Hi

ornate agate
#

What is eb45-turbo?

hollow pebble
#

look at my video Today!

willow grail
hollow pebble
#

i only have one vote left! So vote for my video please!

willow grail
#

what video?

eternal niche
#

btw guys gpt5 sucks

willow grail
hollow pebble
#

Here's a hint! A image to video prompt is in the video-arena-1: A Female woman in her 1950's cartoon was smiling, giggling & talking someone with beauty. Models: Veo-3-audio-fast vs Hailuo 02 Pro.

eternal niche
hollow pebble
#

look at video-arena-1 and vote for my video!

sacred quail
#

Claude must be doing something magical on codes because even benchmarks not looks great, everyone still using them for coding soo

stray aspen
stray aspen
hollow pebble
#

hey guys! Look at my video on video-arena-1.

sacred quail
#

Not in long context

dim pine
#

Gpt5 😂

eternal niche
willow grail
#

gpt 5 very high before gta 6 last day 2025

jade egret
wheat onyx
#

Pretty sure that was for flash

zinc ore
#

Or hallucinated AI PR I should say

#

If that's the cli one from a month back

stray aspen
#

gemini has a github emoji on lmarena

echo aurora
harsh flume
#

like once you get used to your current stack and workflow youd need a much higher threshold of improvement than mildly better to just send it all to air and get used to another tool, even if in practical terms it'd be just a slight annoyance to do

stray aspen
#

it just put that idk why

neon idol
#

@echo aurora Sorry for this stupid question but in your opinion when Gemini 3.0 will be released?

lime coral
#

In 3 sec

hardy pecan
#

GUIZ GPT 5 IS BAD, BRING BACK 4o!!!!

stray aspen
#

so o3 is smarter than o3 pro

#

thats crazy

hardy pecan
#

It's a flawed benchmark, 1 data point for o3 pro underperformed alot which brought down its average

candid storm
#

How do you know?

#

The data is not published yet right?

#

35% chance is not the same as 'gpt5 will beat gemini in remove style control'

neon idol
#

I am the only that don't care about the message taken by lm arena for testing ai?

misty vault
#

no

#

I even put my home address and credit card numbers in it

#

It is probably on that huggingface link

gentle plinth
#

Normal 5 scores worse then others

hardy pecan
#

its very scuffed, but no one actaully reads how the data is collected and presented, its bizzare. ME SEE CHART ME BELIEVE

wicked root
#

what's happening?

keen beacon
candid storm
#

Ok tnx, you seem like you know your stuff. I've redistributed my portfolio to account for the uncertainty

sacred quail
#

this is NOT long context

#

i'd say after 500k token long context starts

#

Gemini has no competitor

#

Only Minimax M1 tried to be close

#

Also yes, long context is important

#

Espicially when you try learn something, if you are a student, if you need summarize or analyze of some long text, you have no option besides gemini

wicked root
#

how so? I'm hearing mixed opinions across the board on gpt5, it seems highly polarized.

sacred quail
#

Could be true

#

As a gemini fan im just admitting gpt 5 is best model right now

#

Better than O3

#

Also i do showing big respect towards to their "less praise" model choice

#

After 06/05 goldmane update gemini turned to 4o like praising you for everything

#

im really not liking this

#

Yeah

#

I understand people had very high expectations but

#

There is so many unnecesary critizing towards to gpt 5

#

Also im not sure but probably gpt 5 is very efficent model too which is also important

#

Better than O3 but also cheaper than O3

#

Oh really

#

interesting. But they lowered O3 price soo maybe we should compare with first api price

#

If i not remember wrong gemini 2.0 flash think was before than deepseek R1 but weirdly people didnt care

#

it was quite good

#

2.0 was trash but

#

No

#

Listen

#

2.0 was a trash model but 2.0 flash think was 3x 4x better than 2.0 flash so they did really good on that reasoning thing

#

Even if base model is trash

eternal niche
ocean vortex
wicked root
#

@deep adder market’s pricing in your hypothesis

#

What? Ai market is a lot more liquid than the one I trade in

#

¯_(ツ)_/¯

wicked root
#

you seem very confident in gpt, can I ask why? I'm new to this so I don't know a lot.

devout vault
#

yes

devout vault
heavy knoll
#

Can someone Tell me Which one is Chat gpt 5 high

wicked root
#

@eternal nicheman your pfp creeps me out lmao

#

I just realized... I'm SO sorry if that's your selfie

wicked root
#

love the eccentricity, but you've now intensified my fear

#

this is the weirdest thing I've seen in my life and i mean this in the best way imaginable LOL. What's he saying?

wicked root
#

Слушать на всех площадках: https://band.link/matushka_
"Матушка" (слова и музыка - Пётр Андреев)

Подписывайтесь на соц. сети:
Сообщество ВК: https://vk.com/tatiana.kurtukova
Личная страница ВК: https://vk.com/ts_makeeva
Instagram: https://ins...

▶ Play video
#

?

eternal niche
#

yes

sacred quail
#

"gpt 5 chat" is not

ocean vortex
#

gpt5-chat is 4o-latest successor, no reasoning

#

Bluntly speaking it's probably the same as gpt5-minimal

#

just like 4o-latest kinda sorta was the same as gpt4.1

sacred quail
#

@ocean vortex do you what is gpt 5 thinking's base model ?

#

Gpt 5 chat ?

#

Or gpt 4.5 ?

marsh stratus
#

GPT 4.5 is too slow and expensive

ocean vortex
#

So like, that same model is also technically the base model

torn mantle
#

Best model is gemini

ocean vortex
#

They probably would have used gpt5-chat with no routing if it performed better… lol

#

But now by routing it occasionally to gpt5-low, it can comfortably beat gpt4.1

rugged brook
barren ermine
#

what model is best for development and code?

rapid merlin
#

I'm confused about the division between gpt-5-chat and gpt-5 minimal, low and medium, can somebody enlighten me

ocean vortex
junior sonnet
#

can i choose the model in the vid gen?

verbal nimbus
#

I think Design Arena uses pure HTML/JS instead for websites

brisk helm
misty vault
#

Large Language Model

rapid merlin
wicked root
#

someone here said grok4 is the best at coding

#

Is this for ALL coding?

#

@deep adder opinion on grok4?

wintry tinsel
#

Soooo my bros

#

When is Gemini 3

cloud zinc
#

october

wintry tinsel
#

Account was made today

#

Ban this freak

wicked root
#

what's hardcoding?

mellow frigate
#

What do you mean? Generating music?

inner gate
#

!!?!?!

misty vault
#

Yes, you have a chance to be banned when your account is made today

inner gate
#

Oh wow how come

misty vault
#

because of @hollow ivy starting a gemini 2.5 pro gooning cult

#

They will eat you alive if you say anything negative about gemini 2.5 pro

timber tulip
#

hallo

sick spire
#

How to Delete a Generation
lf you'd like to delete the initial prompt and generation from the bot, right-click the bot'smessage and select Apps > Delete Generation . Note that deleting the originalprompt will also delete its corresponding generation, but deleting just the generationwill leave the original prompt intact

golden ocean
#

ts so wholesome

sand bay
#

you know that this site is a bit deceptive

hardy lion
golden ocean
stray aspen
#

So who cares

sharp yew
echo aurora
hardy lion
#

The GPT-5 model without a system prompt does not know that it is GPT-5. This can be reproduced on OpenAI's API playground

slim mesa
#

WTF THE GPT 5 HIGH ARE ON THE CHAT?

agile bloom
#

woah way too many gpt5 which one is for what?

solid brook
agile bloom
tidal ginkgo
#

hey uhhhh

#

i was off of gpt-5 for a bit

#

why is there a gpt-5-high?

#

i thought gpt-5 was high already?

#

oh that sounds bad

rare python
#

they made it clearer

#

because new people can be confused

echo aurora
#

It's the same model, but added the high to make it more clear

tidal ginkgo
#

oh ok ty

#

lol

patent bane
#

hmm

gusty helm
#

Hey! Not sure if this was asked but i see a leaderboard update in text arena yesterday, but did the gpt5 votes remain unchanged? Score think changed a bit, but not votes? Some bug?

echo aurora
# gusty helm Hey! Not sure if this was asked but i see a leaderboard update in text arena yes...

Hello ablobwave - we did chat about this earlier but will share the response

The vote is based on pre-release GPT-5 testing. After GPT-5's public launch, we created a new model entry that points to its public endpoint and collecting more votes. These additional votes we've been collecting aren't yet added to the current leaderboard. We will be merging the votes in the next leaderboard release.

gusty helm
lime oak
#

guys what differint with vedio arena 1 and 2 and 3 and 4

echo aurora
lime oak
#

ok nice idea

quiet dust
#

Hi guys, is the gpt-5-high on LMArena the same as the regular basic gpt-5 model in ChatGPT?

hardy lion
#

not quite, gpt-5-chat should be the closest to the experience in chatgpt

quiet dust
#

So, gpt-5-high approximately at the same level as GPT-5 Thinking (medium mode)?

hardy lion
#

gpt-5-high should be gpt-5-thinking (high mode)

quiet dust
hardy lion
elder lintel
#

hey

hardy lion
#

sup

keen beacon
#

can someone help me, in lmarena its always stuck generating and idk what to do anymore, refreshing the page doesnt work for me

wicked root
#

How many of you think Gpt5 will beat Gemini with style control OFF this month?

#

CI band is insanely wide on gpt5

dawn grove
#

How can u get Lmarena unlimited free is there a method?

#

I*

drifting thorn
dawn grove
#

But it has limitation i mean this likenit has a amount of messages

acoustic cliff
#

Of course

keen beacon
ornate agate
#

A fair trade.

drifting thorn
#

A fair trade.

keen beacon
#

No imagen model yet...

#

I thought it would get released

#

Sigh.

keen beacon
#

Because people don't read and think they're gaming the system by getting gpt-5 for free

sacred quail
#

Well, they can read my conspiracy theory tests which is im testing them on every new model

#

Only problem is if they starts to believe

ocean vortex
#

oh you probably haven't verified your org since you don't have summaries either. But it's weird they aren't letting you adjust reasoning. You may be stuck on "minimal" lol

sacred quail
ocean vortex
#

This is roughly accurate, except when chatgpt decides to route your request to reasoning when you are using "GPT5". Then it is no longer gpt5-chat.

hardy lion
ocean vortex
hardy lion
#

The api calls are using reasoning_effort="high"

gentle plinth
#

we are considering giving a (very) small number of GPT-5 pro queries each month to plus subscribers so they can try it out! i like it too.

but yeah if you wanna pay us $1k a month for 2x the input tokens feels like we should find a way to make that happen...

gentle plinth
tulip cipher
#

what i need to do now..

sage heath
#

Hello

ocean vortex
tulip cipher
sleek crow
# tulip cipher ?

you can reset cookies only for lmarena on your browser to reset the limit

sleek crow
#

what browser do you use ?

tulip cipher
tulip cipher
sleek crow
#

Just find on internet how to reset cookies for a website for fiefox beacuse right now im on microsoft edge

sleek crow
#

i have no clue it think its just the same

spare rune
#

whats the rate limit on gpt5 high?

indigo hazel
eternal niche
#

btw guys gpt5 sucks

ocean vortex
eternal niche
leaden sun
keen beacon
eternal niche
gentle plinth
#

its not worth it

leaden sun
gentle plinth
#

its not that expensive to do inference at scale to justify such costs, even for larger models

leaden sun
#

not for certain niche circle of people who dont understand the details i guess

ocean vortex
#

Not to mention that with 1k per month you kinda already could comfortably rent hw to host any model all to yourself

unborn lantern
eternal niche
unborn lantern
#

Lmarena just change the name of gpt 5 to gpt 5 high or they change the api?

eternal niche
#

yes

ocean vortex
tulip cipher
#

nice nick btw

eternal niche
#

Что вершит судьбу человечества в этом мире? Некое незримое существо или закон, подобно Длани Господней парящей над миром? По крайне мере истинно то, что человек не властен даже над своей волей.

eternal niche
woven scarab
#

gm AI fam! 😁

cedar tide
#

@echo aurora Why was step 3 removed 6 hours after it was put in?

cedar tide
tall summit
#

is gpt5's context limit on the website still 8k/32k/128k?

eternal niche
tulip cipher
keen beacon
#

But gpt-5-thinking or high is the new standard for complex stuff

junior sonnet
#

Ok

viscid timber
#

is gpt 5 the best ai model now?

keen beacon
viscid timber
keen beacon
viscid timber
#

questions, basic tasks

keen beacon
#

Id say gpt thinking. For vibes kimi 2 and claude are better

barren ermine
#

what about creativity?