#general


reef pawn
#

What was the prompt?

errant cave
reef pawn
#

Ohh

golden ocean
#

LMAO

wicked root
#

If GPT5 didn't live up to expectations, why is it doing so well on the leaderboard? And why is polymarket acting like Google won the race this month?

ocean vortex
#

It's a great model

#

the best we have

#

3.5 gonna destroy 3.0

wicked root
#

Oh, because of Gemini 3.0?

keen beacon
#

gem 3.0 isnt coming this month apparently

wicked root
#

gap seems to be narrowing though

keen beacon
#

but new 2.5 update

ocean vortex
candid storm
#

I dodged a bullet lol

wicked root
candid storm
#

Fortunately i could switch teams just in time

ocean vortex
wicked root
#

@candid storm I don't see your name.

ocean vortex
#

and since 2.5Pro is technically leading at this moment on leaderboard with their settings, the odds are that

wicked root
#

hm I see

candid storm
#

Im RIC25

wicked root
whole wagon
#

Smth that really annoyed me about openai powerpoint is a lot is just totally incorrect. Like here, 50 is the number of tests they ran not the percent. Imagine getting paid 8 figures and messing up smth so trivial

candid storm
whole wagon
#

it is supposed to be this

#

How do you mess up trivial bar graphs

candid storm
#

Crazy good trade you made!

wicked root
#

🫡 I'm a bit worried about whether Google will be able to keep the lead until the end of this month

whole wagon
fickle venture
golden ocean
#

It's large
It loves language
and it's a model

fickle venture
#

Ok

candid storm
#

I think google 69% september is a good deal

whole wagon
#

anyways i have pro already

#

Not in api yet i think. or anywhere even lol

golden ocean
#

are all other models gone

whole wagon
#

i like im always the very first for literally everything for some reason lol

whole wagon
fickle venture
keen beacon
whole wagon
#

no idea

errant cave
#

It could be way better but it's a step in the right direction

#

Hope we can reach a level of aesthetics similar to Web 2 Gloss again

brittle tiger
rapid merlin
#

im a plus user and i still only have it on my phone

#

:((

whole wagon
#

ye they give me stuff early. sometimes it gives a popup to ask if i want to be early tester for features or whatever

blazing bison
#

gpt-5 is so good for code oh my god

#

i'm changing my mind completely

whole wagon
#

.

blazing bison
#

ok maybe craig was right

#

all the time

stray aspen
#

yes

#

he was

#

i was wrong too

whole wagon
#

people called it crap without trying it? thats kinda weird

blazing bison
#

bro it's one shotting things that opus was sad trying to accomplish

#

it's bcs i'm trying with real world shi* now instead of some random create a game or site

brittle tiger
#

The negativity is due to it not matching the gains seen in previous gpt 2 -> gpt 3 and gpt 3 -> gpt 4 and hype posting leading up to release. It's clearly v good. Sentiment might get better

little narwhal
blazing bison
#

when people start using gpt-5 for real world tasks they will notice the difference

#

wtf

rapid merlin
#

what about the lmarena rating, though

#

I haven't tested it thoroughly yet as i dont have it on my computer yet

stray aspen
rapid merlin
#

it was below gemini for a bit which is a bit concerning

blazing bison
stray aspen
#

anyways gpt 5 is a great model and its better than 2.5 pro

rapid merlin
#

it was some benchmark i believe

hollow ocean
rapid merlin
#

but yeah, i will see y'all tomorrow as i'm heading off to sleep

blazing bison
#

yeah it's better

#

it's sota

#

for me is underwhelming codex cli using gpt 5 on a project with 600k + tokens and identifying right files and doing modifications without creating any bug

#

and my prompt was not even good, it just did

whole wagon
#

even in the benchmarks it was usually

blazing bison
#

i was expecting more from benchs

#

but maybe they are really saturated rgn

stray aspen
#

lets see if it is SoTA in livebench

blazing bison
#

idc livebench bro, it do code good

#

i'm happy

whole wagon
#

who cares about livebench anyways

#

it had 4o as better at coding than o3 for ages

blazing bison
#

true

#

o3 was lazy

#

it's not

#

well, i didnt try it on chatgpt interface yet

#

for now i'm burning some dollars on the api

#

i think it saved me 5 hours of work for $2

#

🤓

alpine osprey
#

Can someone unmute my vc

#

@echo aurora

#

thank u

echo aurora
#

np

alpine osprey
stray aspen
#

that must be awfully expensive

echo aurora
hoary elbow
#

Is genie two at least public?

solar hollow
#

still not able to solve simple chess puzzles with human words unfortunately

wheat onyx
#

still no GPT-5 on my plus account....

#

5pm PT

fickle venture
fickle venture
#

It's just rolling out slowly

#

Since people live in the US and openai is there, they got it fast

blazing bison
#

so no

#

and thinking i think already is?

fickle venture
#

Oh I see

fickle venture
#

Like o3-pro

exotic nebula
fickle venture
#

Dang

blazing bison
#

it's not necessary btw

wheat onyx
blazing bison
#

this model takes more than 10 minutes to answer

flint skiff
#

damn is it just me or is gpt5 pretty meh at UX design?

#

claude does it a lot better for me

blazing bison
#

yeah it's meh

flint skiff
#

so better for backend stuff?

blazing bison
#

but if you give a example he is really good copycat

flint skiff
#

seems like cc is still king

#

atleast until weekly limits

blazing bison
#

idk

#

i can use a example of a design that i like and gpt-5 can create using that

#

claude is not so good with this

flint skiff
#

trying it now

blazing bison
#

but if you don't have idea of what you want, then claude is better

flint skiff
#

yeah I mean gpt 5 is useful its just not as versatile as I expected

#

where are you using it?

blazing bison
#

i think it's the best code modifier model rgn

#

i'm using api directly

flint skiff
#

its only medium reasoning on cursor rn

#

why do people even like cursor

#

I feel like they nerf models hard

#

everytime

blazing bison
#

it nerfs

#

you are not seeing the true potential

#

bcs they use like 3k tokens and rag

flint skiff
#

whats ur pipeline

blazing bison
#

rgn i'm using codex cli

flint skiff
#

codex is good?

#

I heard its eh

blazing bison
#

they updated it

#

the reason i'm testing

#

claude code still better i think, but i'm not paying for claude anymore so

flint skiff
#

does it work like cc

#

where u can login?

#

or do u just use api credits

blazing bison
#

i think yes

#

i'm using api credits rgn bcs it was not working

#

the login thing

#

but they added it

flint skiff
#

I stopped my max 20x sub on claude cuz I thought gpt 5 was gonna be way better lol

#

might sub again

blazing bison
#

well i think it is better

#

cursor is not reference

candid storm
flint skiff
#

it felt so garbage on cursor

blazing bison
flint skiff
#

holy fkkkk

blazing bison
flint skiff
#

im sure they have it super nerfed

blazing bison
#

on cursor

flint skiff
#

cuz they made it free

blazing bison
#

bro they made it free, what you think?

flint skiff
#

for a week or smth

#

yeah

#

but like thats dumb no?

#

people that use it gets a bad taste

#

like this makes me wanna stay away from cursor more lol

hoary elbow
#

When do you think GPT 5 pro will come out as an api

blazing bison
#

they didnt talk about that

#

so maybe never?

hoary elbow
#

All right

blazing bison
#

i'm not saying never ,but idk

hoary elbow
blazing bison
#

there is not any info

#

about it

flint skiff
#

probably a few months

#

but it kinda seems like openai is cooked now if this is all they got

blazing bison
#

bro

#

how do you say that without even testing

flint skiff
#

I mean ill try it yeah

#

setting it up right now

#

are u impressed?

#

like actually

#

how does it fare to opus

blazing bison
#

i'm

#

opus failed 5 tries and gpt-5 one shotted a task on 600k tokens project

#

i didnt send 600k tokens btw, it was using cc and now codex cli

flint skiff
#

hmm

blazing bison
#

gpt-5 was able to identify the correct files

#

and do the modifications

flint skiff
#

ur using vs code?

blazing bison
#

bro it was like 8k lines modifications without creating any bug

blazing bison
flint skiff
#

yeah that sounds crazy

blazing bison
#

i've never seen an AI do that many modifications without breaking anything before

flint skiff
#

its way cheaper than opus too

blazing bison
#

it's C# btw

flint skiff
#

ok thats impressive

blazing bison
#

i ran the test just for fun, i didnt believe in it bcs my parameter was that if opus and gemini 2.5 can't, no model can

stray aspen
#

livebench benchmark is out

#

gpt-5 is SoTA

sacred quail
#

i liked too

#

Still sad that O3 will disappear

#

Btw in free version on android, you can select reasoning or non reasoning

#

So which reasoning mode running in mobile app for free version ?

#

Medium reasoning ?

quartz light
#

guys

#

i figured out the true release date of gpt 5

#

```
gpt-5: 2025-08-05T20:29:37 UTC
gpt-5-mini-2025-08-07: 2025-08-05T20:31:07 UTC
gpt-5-mini: 2025-08-05T20:32:08 UTC
gpt-5-nano-2025-08-07: 2025-08-05T20:38:23 UTC
gpt-5-chat-latest: 2025-08-01T18:35:06 UTC
gpt-5-2025-08-07: 2025-08-01T19:09:20 UTC
```
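For anyone who wants to check the gaps between those timestamps, here is a minimal Python sketch. It only parses the six values exactly as posted above and sorts them. The dictionary name `created` and the helper `parse_utc` are made up for illustration, and nothing here says where the timestamps came from or what they actually mean.

```python
from datetime import datetime, timezone

# Timestamps copied verbatim from the message above (all stated as UTC).
created = {
    "gpt-5": "2025-08-05T20:29:37",
    "gpt-5-mini-2025-08-07": "2025-08-05T20:31:07",
    "gpt-5-mini": "2025-08-05T20:32:08",
    "gpt-5-nano-2025-08-07": "2025-08-05T20:38:23",
    "gpt-5-chat-latest": "2025-08-01T18:35:06",
    "gpt-5-2025-08-07": "2025-08-01T19:09:20",
}


def parse_utc(stamp):
    """Parse an ISO-like timestamp and mark it as UTC (hypothetical helper)."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)


times = {name: parse_utc(stamp) for name, stamp in created.items()}
earliest = min(times, key=times.get)

# Gap of every entry relative to the earliest one, as timedelta objects.
gaps = {name: t - times[earliest] for name, t in times.items()}

for name in sorted(times, key=times.get):
    print(f"{name}: +{gaps[name]}")
```

Sorting like this makes the two clusters easy to see: two entries on August 1 and four on August 5.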
jade egret
#

nah..

quartz light
#

so

#

yall

stray aspen
quartz light
#

connect the dots of the models which released like horizon beta n stuff

flint skiff
#

@blazing bison is it gpt 5 or gpt 5 2025-08-07

#

in the model list

wicked root
quartz light
stray aspen
quartz light
#

is livebench legit now

stray aspen
#

bro deepseek is above gemini no max think

quartz light
whole wagon
stray aspen
#

what does this mean

#

why is gpt-5 not on top

#

its greater than these models

whole wagon
#

Not in all aspects

#

Clearly

#

The benchmark is private

#

There are 10 public questions and 400 private ones. The public ones are not used in the testing

quartz light
#

grok 4 is dogshit

#

💔

#

however

#

i do have hope

#

in the new grok model

stray aspen
#

grok 4 is actually an amazing model

quartz light
#

releasing this month

#

grok 4 coder

sacred quail
#

Guys, in mobile gpt 5 thinking using which reasoning mode ? Medium reasoning or high ? Or do we need to buy plus for able to select high reasoning

quartz light
stray aspen
#

the api just says gpt-5

#

what about the gpt-5 in lmarena

quartz light
#

oh, I thought thinking isnt available on free plan at all since they removed model selection

limpid schooner
#

hey guys, so did anyone figure out here what zenith was?

quartz light
#

fine ill try it

zealous panther
#

No one can

stray aspen
#

it was deepseek r2

quartz light
zealous panther
#

Oh ok

quartz light
#

this way we can find out what zenith could be

limpid schooner
quartz light
#

i just need to find first mention of zenith

#

get the time

#

aaand

limpid schooner
#

holy

sacred quail
#

was horizon beta on openrouter gpt 5

#

?

quartz light
#

^

quartz light
#

lol

limpid schooner
quartz light
limpid schooner
#

are we sure

quartz light
#

yeah pretty sure

#

created 1st of august

#

likely just an edit of zenith

#

dingdingding

limpid schooner
#

so zenith was not a thinking model?

whole wagon
#

How do people fall for the simple bench 90% GPT5 "leak". They must be incredibly dumb

quartz light
candid storm
#

Is zenith gonna be on the leaderboard?

whole wagon
#

Without even a second of critical thoughts

quartz light
limpid schooner
#

zenith was so good

quartz light
#

so, zenith was probably gpt 5, summit could be mini and lobster could be nano

quartz light
limpid schooner
#

they said summit is gpt 5

brisk helm
#

how do we select how much we want gpt 5 to think in lmarena

limpid schooner
#

lmarena mods

quartz light
#

gpt-5-nano was made 4 days later

stray aspen
#

you dont

brisk helm
#

oh nvm

quartz light
flint skiff
#

is anyone using gpt 5 codex cli rn

quartz light
#

💔

#

give api

#

fr

stray aspen
#

fr yes cap

quartz light
#

ms paint

sour kiln
#

A heroic white police dog with shiny blue eyes, wearing a full police uniform, is bravely rescuing a brown rabbit from drowning in a fast-flowing river. The dog is standing in the water, strong and determined, holding the frightened rabbit gently in his mouth while people watch from the riverbank with admiration and awe. The scene is realistic and emotional, with splashing water, dramatic lighting, and a clear sky. The dog is the hero of the town, and everyone loves and respects him

patent aspen
#

capacity crunch issues

#

It sure is

wheat onyx
#

alright assuming I was lied to, and not getting GPT-5 on Plus today

civic flame
#

well yikes

rare python
#

🤔

stray aspen
#

6 months is a lot

#

we still have gemini 3 and grok 5

quartz light
#

i thought it would be grok 4.1 or 4.5

#

since its just a coding version

quartz light
# quartz light

i just realised the placement of the lines might actually represent the exact date of release

hollow ocean
#

@deep adder question 10 is solved

stray aspen
hollow ocean
#

simple bench

#

yeah

#

90% on public questions

mint relic
#

Brilliant

stray aspen
#

how does genspark have gpt 5 pro

#

gpt 5 on bing is out

quartz light
quartz light
quartz light
#

no signup too

#

deep research pops up on signin

#

cool

stray aspen
#

gpt 5 in microsoft copilot sucks

quartz light
stray aspen
#

yes

quartz light
#

you should ask it to think deeply

stray aspen
#

lets try

#

yes

#

it worked

quartz light
#

LOL

stray aspen
#

lol

#

this one nailed it first try

quartz light
stray aspen
#

its a great website

#

but its limited

quartz light
stray aspen
#

you basically get points from rating the AIs

#

and you spend them points to use the ais

#

gpt-5 on microsoft copilot is great

#

but you have to ask it to think deeply first

sacred quail
#

also in poe app you can select high reasoning mode

blazing bison
#

so apparently zenith was gpt-5 too, but another version, and summit won for some reason and they killed zenith

#

😢

stray aspen
stray aspen
#

how does the reasoning effort of gpt-5 work?

#

the gpt-5 of microsoft copilot is smarter than the one in lmarena

verbal nimbus
quartz light
#

but worse

stray aspen
#

yeah that website sucks

#

if we talk about using paid models for free of course

jade egret
#

if i use gpt-5 to prompt engineer a prompt for gpt-5 🤔

sacred quail
#

btw on poe app, you can use gpt 5 high reasoning with free for multiple prompts

stray aspen
sacred quail
stray aspen
#

you got about 30 prompts worth of points

#

for that model

exotic gust
#

I’m scared of what the future might bring for ai

stray aspen
#

also one thing

#

if you go here and make your chat public it reduces the cost by half

verbal nimbus
stray aspen
#

yes very interesting

sacred quail
stray aspen
#

microsoft launched a 3d model website

sacred quail
#

LM arena is still my beloved but, its interesting to see someone trying to be competitor

stray aspen
#

dude what

#

is this genie 3

stray aspen
#

its free

#

no sign up

#

what i like about yupp is that it has a lot of models

quartz light
#

THE ASSETS LOOK SO GOOD

#

even though jump doesn't work

stray aspen
#

where did you make it

quartz light
#

copilot!!!

stray aspen
#

thats crazy

#

it looks pretty decent

quartz light
stray aspen
#

@quartz light yo bro

#

do you have deep research

#

in copilot

quartz light
#

yep

stray aspen
#

thats crazy

#

i dont yet

quartz light
quartz light
#

@stray aspen

quartz light
stray aspen
#

yes

#

it hasnt rolled out for my account yet

#

probably because i created it outside of canada

quartz light
#

im in ireland

stray aspen
#

thats great

#

google has to lock in

#

openai and their partners hit the industry hard

quartz light
#

hey lneduo

#

have you been able to generate long code on yupp

stray aspen
#

yes

quartz light
#

does it cut off

stray aspen
#

no

#

it works fine

#

but the website is laggy

verbal nimbus
stray aspen
#

that fancy UI makes it laggy

stray aspen
verbal nimbus
#

I'm on mobile, Web Dev Arena throws an error like 50% of the time

#

Sandbox fails to appear, voting button disappears, lol

#

ChatGPT app can't copy

#

Claude input bar still buggy

#

Why do all these AI apps have so buggy frontends

stray aspen
#

idk

#

i dont use web dev arena

verbal nimbus
#

Gemini is so funny

#

It manages to recreate an entire component from minified and obfuscated React code

#

But then gets stuck trying how to make an inner div fill its parent

quartz light
#

i just caught a FATASS MOTH

#

i mean uh

#

ai ai ai ai

verbal nimbus
thorn valley
rare python
quartz light
#

its old

#

@stray aspen @thorn valley @verbal nimbus check this out

#

opus 4.1

verbal nimbus
#

But I can't interact with it

quartz light
verbal nimbus
quartz light
#

i'll try gpt 5 nano to implement nipplejs and a circle button for jump for mobile controls

solar galleon
#

just like yesterday i could upload images in direct chat i used gemini 2.5 pro but it doesn't let me and only shows the error anyone know why

verbal nimbus
#

IDK what's up with Web tech, nowadays stuff loads faster on mobile than my gaming PC

thorn valley
quartz light
thorn valley
#

well optimized, to tell the truth

quartz light
#

its a whole game with sprites, sounds and levels in a single html file!

thorn valley
#

but I can't interact either

quartz light
#

incredible

quartz light
#

it might be janky but

jolly raven
#

How can I use picture to picture.

worthy thunder
#

Context Arena Update: Added GPT-5 (Thinking, 08-07) to 2needle (#1 @ 128k AUC), 4needle (#1 @ 128k AUC), and 8needle (#1 @ 128k AUC) leaderboards! Also added GPT-5-Mini and GPT-5-Nano. (https://x.com/DillonUzar/status/1953660295559192919)

More model results at: http://contextarena.ai

Overall GPT-5 is great for <=128k! Only exception is 8needle, Grok 4 still performs much better at <=32k compared to GPT-5, but GPT-5's performance at higher context wins out.

2needle: Top results (AUC @ 128k):

  • GPT-5 (Thinking, 08-07): 96.7% (#1)
  • GPT-5-Mini (Thinking, 08-07): 92.6% (#2)
  • Gemini 2.5 Flash (Thinking, 06-17): 91.5% (#3)
  • Gemini 2.5 Pro (Thinking, 06-05): 89.6% (#4)
  • Gemini 2.5 Flash (Non-thinking, 06-17): 81.7% (#5)
  • Grok 4 (Thinking, 07-09): 79.5% (#6)
  • o4-mini (Thinking, 04-16): 76.0% (#7)
    ...
  • GPT-5-Nano (Thinking, 08-07): 44.2% (#34)

8needle: Top results (AUC @ 128k):

  • GPT-5 (Thinking, 08-07): 50.3% (#1)
  • Grok 4 (Thinking, 07-09): 48.4% (#2)
  • GPT-5-Mini (Thinking, 08-07): 44.7% (#3)
  • Gemini 2.5 Pro (Thinking, 06-05): 43.9% (#4)
  • Gemini 2.5 Flash (Thinking, 06-17): 33.5% (#5)
  • o4-mini (Thinking, 04-16): 30.8% (#6)
  • o3 (Thinking, 04-16): 27.9% (#7)
    ...
  • GPT-5-Nano (Thinking, 08-07): 11.9% (#22)
stray aspen
#

thats great

quartz light
#

I hope they bump it up to 1M or higher

#

^

wicked root
#

anyone know when the next set of votes will be added to the overall rankings?

quartz light
#

gpt 5 yaps too much

obsidian cargo
#

Anyone know what was up with Zenith? It seemed even better than Summit, which was GPT-5

haughty siren
#

When do you guys think Gemini 3.0 is coming?

oblique needle
#

yeah, I dunno... my guess is 3.0 is pretty much dead in the water unless someone else launches a more powerful model

brisk helm
#

or did u make it ur self?

quartz light
brisk helm
#

craazy with opus 4.1 or gpt 5?

quartz light
#

idk why i uploaded it to the gpt 5 folder lmao

brisk helm
#

yeah thats why i was asking

quartz light
brisk helm
quartz light
#

trying this

#

wait wtf thats literally my generation what?

#

LOL

#

DUDE

steady vale
#

hit my gpt-5 limit lol

#

oh 80 every 3hrs

#

damn thats low

quartz light
lilac nimbus
#

In my tests GPT5 code is next level

hasty rock
#

how often does lm update?

verbal nimbus
verbal nimbus
keen beacon
#

What is the difference between chat gpt 5 nano and gpt 5 mini?

floral comet
#

Is the gpt5 on lmarena a thinking model or not?

little narwhal
floral comet
# terse shuttle thinking

Cool thanks! I'm trying it right now I'm very impressed, anyways sorry for many question but do you know what variant of gpt5 is this?

terse shuttle
floral comet
#

Oh cool! I thought there's only 1 gpt 5 model.. Thanks !

echo aurora
echo aurora
floral comet
#

Yep i see that now, Thanks!

keen fulcrum
#

@echo aurora Its hard to keep up with stealth model names. Can you show a stealth leaderboard?

sterile dust
terse shuttle
sterile dust
#

Hmmm...... Maybe they can't really search the web.

#

Is there other models which can search the web in lmarena?

echo aurora
sterile dust
keen fulcrum
#

The elo scores are never revealed to us until model release

echo aurora
mossy drum
turbid phoenix
#

GPT 5 is not Skynet..... 🤬

molten cipher
#

gpt 5 is insane

prime mulch
#

People say gpt 5 have another update in few days is that real

restive rampart
lethal current
#

Is it normal that gpt 5 is not formatting the code? Or is that on the lma side?

pulsar rain
devout vault
primal orbit
blazing bison
#

Lol openai still let you use old models on pro plan

#

Gpt 4.5 is not dead 😮

rapid merlin
#

i really wonder what's taking them so long to roll it out to plus, especially considering i already have it on my phone

wicked root
blazing bison
#

And people is reporting that apparently gpt 5 is dumb on chatgpt

#

The gpt 5 chat model is dumb

hardy lion
rapid merlin
restive rampart
hardy lion
#

oh you're right! it ought to include tie counts in both places imo

wicked root
#

Why is this so high for gpt5

#

You guys think gpt5 will beat 2.5 this month?

hardy lion
#

That's for style control, which takes more into account than just the raw wins/losses. You can see gemini's head-to-head win-loss advantage reflected in the non-style leaderboards where it is actually above gpt-5

blazing bison
#

Plus users cooked

wicked root
primal orbit
#

could anyone explain what is "style control"?

pulsar rain
#

is it system prompt?

wicked root
restive rampart
rapid merlin
#

first test of the day on pc, and i can already say it isn't that censored which is nice

#

it also isn't nearly as lazy as o3 was, wow

hardy lion
#

It's explained in this post: https://news.lmarena.ai/style-control/
The general idea is that research has found that even if two responses contain the same information, people will vote for ones with more "stylistic features" such as markdown, lists, bold etc.

It's even been found that people will vote for more stylistic responses even if they are inaccurate or wrong. Some companies did RLHF too hard and their models were optimized just for responses that look good

So style control learns two sets of parameters: the model strengths and the importance of the style features. The model strengths are then interpreted as "the model strength if all style features were equal". Those are what is reported on the style-controlled leaderboards, which are the defaults. It's similar to controlled trials in medicine where they correct studies for differences in age or other factors.

LMArena Blog

We controlled for the effect of length and markdown, and indeed, the ranking changed. This is just a first step towards our larger goal of disentangling substance and style in Chatbot Arena leaderboard.
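To make the "two sets of parameters" idea concrete, here is a toy Python sketch. It is not LMArena's actual pipeline: it assumes a plain Bradley-Terry-style logistic regression fitted by gradient descent, a single made-up style feature (say, a markdown difference), and invented battle data. The names `fit_style_controlled_bt`, `flashy`, and `plain` are hypothetical.

```python
import math


def fit_style_controlled_bt(battles, models, n_style, lr=0.5, epochs=2000, l2=1e-3):
    """Fit model strengths and style weights jointly (toy logistic regression).

    battles: list of (model_a, model_b, style_diff, a_won), where style_diff
    is a list of length n_style (style of A minus style of B) and a_won is 1 or 0.
    """
    idx = {m: i for i, m in enumerate(models)}
    strength = [0.0] * len(models)
    style_w = [0.0] * n_style
    n = len(battles)
    for _ in range(epochs):
        # Small L2 penalty so the otherwise free offset of the strengths is pinned.
        g_strength = [l2 * s for s in strength]
        g_style = [l2 * w for w in style_w]
        for a, b, diff, a_won in battles:
            # Log-odds that A wins: strength gap plus weighted style gap.
            z = strength[idx[a]] - strength[idx[b]]
            z += sum(w * d for w, d in zip(style_w, diff))
            p_a = 1.0 / (1.0 + math.exp(-z))
            err = (p_a - a_won) / n
            g_strength[idx[a]] += err
            g_strength[idx[b]] -= err
            for k, d in enumerate(diff):
                g_style[k] += err * d
        strength = [s - lr * g for s, g in zip(strength, g_strength)]
        style_w = [w - lr * g for w, g in zip(style_w, g_style)]
    return dict(zip(models, strength)), style_w


# Invented data: "flashy" usually has more markdown (style_diff = 1) and wins
# most of those battles, but loses most battles where the style is equal.
battles = (
    [("flashy", "plain", [1.0], 1)] * 20
    + [("flashy", "plain", [1.0], 0)] * 10
    + [("flashy", "plain", [0.0], 1)] * 3
    + [("flashy", "plain", [0.0], 0)] * 7
)

raw_win_rate = sum(b[3] for b in battles) / len(battles)
strengths, style_weights = fit_style_controlled_bt(battles, ["flashy", "plain"], n_style=1)

print("raw win rate of flashy:", raw_win_rate)
print("style-controlled strengths:", strengths)
print("style weight:", style_weights)
```

In this made-up data "flashy" wins more than half of its battles overall, yet the fitted strength of "plain" comes out higher once the style feature absorbs the formatting advantage. That is the kind of ranking flip the blog post describes.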

rapid merlin
#

with no resistance whatsoever

native venture
#

Hello 👋

rapid merlin
#

hi

neon idol
#

Gpt 5 is also in copilot

#

But is it the same gpt 5 that is in the chatgpt app?

solid brook
#

After i hit my limit on gpt 5

#

It switched to gpt 4o mini

#

Not gpt 5 mini

#

Are they high

neon idol
#

@echo aurora Yo a question. I have found GPT 5 on copilot but is it the same one that is in the chatgpt app?

keen beacon
#

Does anyone know if in the future it will be possible to upload files to the LMArena project?

pulsar rain
# pulsar rain
Poll: What are your thoughts on lmarena using your data?

Winning answer (3 of 4 votes): I'm extremely careful not to reveal any sensitive infor

astral prawn
#

Dang. I'm gonna miss 4.5 🙁

keen beacon
keen beacon
#

I am pissed off. People on r/Singularity are posting about their rage about GPT-5 and other models leaving... Then unsubscribing. Some reasons were: It has less personality, answers are too short, it wasn't as big of an improvement... Etc.

I really do not understand how people are like that. They are surely going to update the model like 4o to adhere to people's needs.

#

Sorry. A bit angered at the moment.

echo aurora
neon idol
torn mantle
#

is it -chat or -thinking ?

fading moth
neon idol
fading moth
#

Also, can anyone confirm the exact model version/variant of GPT-5 that is available via direct chat in lmarena?

echo aurora
#

I'm going to double check and will followup

golden ocean
fading rover
#

I am trying lmarena web for image generation with a prompt. Somehow most engines nowadays create only a square ratio even if 16:9 is mentioned. Can anyone help with how to specifically force 16:9 or 9:16?

torn mantle
#

not the -chat ver

wicked root
#

I went long on google man. It’s imperative Altman’s products lose to Google

fading moth
#

Or not

#

So the most most basic gpt5 model nothing else, no thinking?

misty vault
torn mantle
#

im not a fan of this version

keen beacon
torn mantle
white hatch
torn mantle
#

i still think the current gemini is better than gpt5

flint skiff
#

im using gpt 5 high on cursor, through openai api

#

is it still nerfed if I do this?

#

it feels nerfed lol but maybe im just expecting too much from gpt 5

cedar tide
#

The comparisons that open ai doesn't want to show 😶

lament radish
#

someone will tell me how to generate videos here ?

blazing bison
#

Guys the thinking model of chatgpt is not gpt 5 thinking

#

The output is worse 100% of the time

#

There is something wrong with chatgpt interface thinking

#

😡

blazing bison
floral comet
#

Wow i tried vibecoding with gpt, it refused and says the feature I'm requesting is impossible for the current environment, I think that's what set apart between gpt5 and other models.. Other would just most probably agree and waste alot of tokens and my time

flint skiff
terse shuttle
#

@echo aurora just out of interest, are there any plans for a 3d llm arena?

golden ocean
#

mc bench

stoic ridge
#

why my videos doesn't have background sound,

solar hollow
floral comet
stoic ridge
#

Then why is the video only 4 and a half sec long, not 5sec🥲

sly estuary
#

GPT5 model not working right?

#

i just ask any questions but only get "Something went wrong with this response, please try again." response

upbeat owl
#

Hello

floral comet
sly estuary
#

no, not work for me...

floral comet
#

Yeah and usually it happens to me when my input is long

prime mulch
#

Try to reload the page

blazing bison
#

"Its not gonna be a router" they said

#

😆

hollow imp
#

What is google bard?

pliant cliff
keen beacon
#

I tried it when it was new. It was only good for basic stuff

#

no coding or such were good

#

Would be nice to see it back for "nostalgia" vibes

rapid merlin
#

google bard is an absolute joke

#

it told me it can't produce a script in a coding context since it can only generate text

bright kayak
#

does anyone know how much worse gpt 5 mini is to gpt 5?

barren prairie
#

Did open ai remove the old features like search and edu mode from chatgpt free tiers who knows ?

keen beacon
bright kayak
#

thanks

willow grail
#

alright. so have u found out what is best way to use gpt5 for swe?

is it cline, cursor, poe, chatgpt?

strange elm
#

hi i love lmarena :3

keen beacon
#

Hello again, does anybody know what this pre-release model Velocilux could be? Came by when I was doing battle mode.

glass arch
#

I got gpt5 to leak its system prompt to me

rapid merlin
#

so far so bad with the gpt5 in chatgpt

#

it keeps trying to access variables before even initializing them

shadow jewel
#

PLEASE make it so that I can share raw github with the ais 🙏

rapid merlin
#

i noticed claude doing that aswell

glass arch
#

chatgpt makes it easy

#

you can just make a zip file and upload it

#

then it will extract it and view the content

pulsar rain
#

why chatgpt 5 talks like college students write note in their lectures?

rapid merlin
#

the model on lmarena actually wrote the code without minifying it

#

i dont know why this model does it

#

or well to an extent

eternal niche
#

gpt5 sucks

rapid merlin
#

i have memories and all off so that shouldn't influence anything either

#

to be fair i didnt really use a three essay prompt, but i think it should pass this (nothing spawns, tried multiple prompts to fix it and nothing)

floral comet
#

Is the gpt5 model in lmarena uses high reasoning or the medium one? Thanks for any answers!

floral comet
#

Alright thanks!

keen beacon
#

What is considered to be good code? Noob noncoder here

rapid merlin
#

feels lazy to me personally

obtuse heart
keen beacon
#

Holy jesus

tribal aspen
#

anyone from the lmarena team online right now?

#

@echo aurora

white hatch
#

I'm not sure, but i feel like gpt-5 was nerfed

tribal aspen
#

so

#

does the gpt 5 model

#

in direct chat

#

use thinking only?

terse shuttle
tribal aspen
#

I mean the max reasoning only?

terse shuttle
tribal aspen
#

also why is it so slow

terse shuttle
tribal aspen
#

when it reasons in Copilot/gpt website it doesnt take so long

#

as much it takes in lmarena

tribal aspen
#

I wonder which one is nerfed

terse shuttle
#

maybe because lmarena isn't using the same provider as official openai

#

i don't know

tribal aspen
#

as everyone hates copilot

terse shuttle
#

idk

ripe brook
#

What are the models in battle mode?

white hatch
#

random models

ripe brook
#

logically

#

will there be a version for phones?

wheat onyx
#

still no GPT5 on my plus account..

#

GPT‑5 is available to all Plus, Pro, Team, and Free users starting today with access for Enterprise and Edu coming in one week. It may take a few days to roll out to all Free users.

- Pro users get unlimited access to GPT-5 & access to GPT‑5 Pro, ideal for the most challenging,

ripe brook
ornate ether
#

lmarena is actually insane bruh

fleet lintel
torn mantle
fleet lintel
ornate ether
#

just the damn concept

#

so many models, many of them paid or limited, image gen, web dev and now video gen all at one place

#

for free

#

wthelly

wheat onyx
keen beacon
#

and get more data

ornate ether
#

i like it as it is

keen beacon
#

not just coders or specialists

ornate ether
#

isnt mainstream

keen beacon
#

I am an ordinary day to day user myself

ornate ether
#

to be considered "popular"

keen beacon
#

trying to sabotage the list as LMArena gets more popular

#

I checked the traffic yesterday and it has been increasing steadily

ornate ether
brittle tiger
obtuse heart
#

😭

ornate ether
ornate ether
#

even chatgpt on it's official page when asked says it's powered by gpt 4 with a knowledge cut

#

off at oct 2024

obtuse heart
eager crag
#

I tested out GPT-5 to make chiptune music.

#

It’s actually pretty good for an AI

wheat onyx
#

OAI really made users have the same limits as with o3, even though it significantly reduced its costs internally

#

disappointing

ornate ether
wheat onyx
#

so 80 messages of GPT5 vs unlimited for 4o, even though it costs less than 4o

brave orbit
eager crag
#

Does GPT-5 make images too?

brave orbit
eager crag
#

Hello?

brave orbit
#

pls try my website just try its really cool

calm sequoia
keen beacon
#

Quite the same except for hallucination rate

#

And perhaps coding performance and different formatting when writing an output

#

Gemini 3.0 is where it is at when it comes out. Also Deepseek R2.

pseudo hemlock
#

Is GPT5 MoE?

brave orbit
#

i had the same thing i just cleared my cookies and its done

languid crescent
#

Is GPT-5 slow at answering? Sometimes I get errors and need to refresh the message and the website.

brave orbit
#

it differs by message, since it may think on some messages: for hard messages it thinks, for easy messages it just doesnt think

brave orbit
stoic ridge
#

When I generate videos some vids are 8 sec long, but some are 5 sec, why?

brave orbit
#

however openai status is ChatGPT: Degraded Performance, soo just so you know that

delicate rapids
#

hello

molten cipher
misty vault
#

Large Language Model

wintry tinsel
#

Is king fall definitively better than GPT5

wintry tinsel
# wheat onyx

So it costs the same as Gemini api when Gemini is free? Lol what a joke

unkempt oak
#

guys, are there methods to generate video with veo3 audio?

hollow imp
# wheat onyx FYI

Whats the difference in performance between free tier and paid tier? The only difference I found was deep research feature

#

Talking about the Gemini web btw

clever estuary
#

which is better, you guys?

#

guess 5 pro failed eh

obtuse heart
#

i just wish i can use it on cline tho

hollow imp
#

How pro?

clever estuary
#

no difference from what I've seen...

eternal niche
#

Z

maiden fulcrum
#

good morning everyone

#

could you give me some feedback about GPT-5 please, I still don't have it on my end.

maiden fulcrum
balmy mist
#

command shift r

molten cipher
#

i have plus tier but don't see it on my models list

maiden fulcrum
balmy mist
#

or control shift r

#

try updating your app

#

or connect to a diff wifi, sometimes that works

#

but gpt5 has been good imo

#

my new daily driver

molten cipher
maiden fulcrum
inland cedar
#

i just updated my app

maiden fulcrum
inland cedar
#

ummmm

#

no

stray aspen
inland cedar
#

u shud delete and reinstall

maiden fulcrum
#

i still don't have it @inland cedar

true condor
#

So where is Qwen-Image in leaderboard?

fleet lintel
#

google share is up a bit.. is it gpt-5 effect?

indigo hazel
#

i dont know guys. i've been coding with gpt5 since last night and i feel really good about it. it doesnt make as many mistakes as 2.5 pro, for example because it doesnt hallucinate. im making a program in python, so it's not web

stray aspen
#

yes its amazing

clever estuary
#

hmmm 5 pro is really good at counting letters without reasoning

maiden fulcrum
clever estuary
#

I mean why not, doesn't seem there's a limit rn

maiden fulcrum
#

why not normal gpt-5 instead

clever estuary
#

just testing it ig

wheat onyx
maiden fulcrum
stray aspen
#

they are slowly rolling it out

#

they are serving the whole world on their website

keen beacon
balmy mist
#

i dont use the app version tho, only web

wheat onyx
maiden fulcrum
#

do you guys think grok 4 heavy is better than gpt-5

solid brook
#

Guys i have a problem with chatgpt. Once i hit the limit on gpt 5 it switches to gpt 4o mini

#

No gpt 5 mini

#

Gpt 4o mini is garbage

obtuse heart
#

do the paid models in the website have any limits when youre using them?

stray aspen
#
Poll: how long will gpt-5 be SoTA? Winning answer: "until gemini 3 release" (5 of 9 votes)

eternal niche
#

guys gpt5 sucks

molten cipher
keen beacon
solid brook
eternal niche
solid brook
eternal niche
keen beacon
#

It will look fun

molten cipher
solid brook
obtuse heart
# eternal niche

gpt-5 has gone to the top of the list, this is an old screenshot posted here bro

solid brook
#

0/10 ragebait

solid brook
#

Whatever you say

eternal niche
#

gpt5 sucks

obtuse heart
solid brook
#

Yeah whatever you say

eternal niche
keen beacon
eternal niche
obtuse heart
eternal niche
solid brook
obtuse heart
#

okay lol hes just tryna get people mad

eternal niche
#

just accept that gpt5 sucks

solid brook
#

Yeah man

brisk helm
solid brook
#

Okay we got you

#

Now go be happy

eternal niche
#

i am happy

keen beacon
molten cipher
obtuse heart
#

i cant believe somebody has the free time to actively ragebait on discord for like 2 minutes worth of entertainment

eternal niche
molten cipher
#

so overall gpt 5 is better

eternal niche
#

why

molten cipher
#

google might just make a AI that will just take over the internet

keen beacon
eternal niche
#

skynet

obtuse heart
#

cant wait for gemini 3 tho, theyre cooking with it

eternal niche
#

because it sucks at text

solid brook
#

I agree with gemini 3 beating gpt 5

keen beacon
eternal niche
#

yeah because gpt5 sucks

molten cipher
solid brook
obtuse heart
#

claudes good but like the price is craaaazyyy

eternal niche
molten cipher
keen beacon
obtuse heart
#

if gemini 3 releases with a lower price range than claude right now and is pretty much guaranteed to be the best ai model overall, then anthropic is in trouble

stray aspen
#

everyone who said that already changed their minds

eternal niche
molten cipher
#

tbh wait till gpt 6 comes

stray aspen
maiden fulcrum
#

do you guys think Grok 4 Heavy is better than ChatGPT-5?

stray aspen
#

and its horribly expensive

misty drum
#

Noooo

brisk helm
#

yh

obtuse heart
zealous panther
#

Its a 300$ or sth agent

#

Compare it to gpt 5 pro at least…

brisk helm
solid brook
#

That garbage is overpriced af

obtuse heart
#

what are yalls thoughts about gemini 2.5 deepthink

stray aspen
#

grok 4 heavy is just a lot of groks talking to each other

maiden fulcrum
zealous panther
zealous panther
solid brook
stray aspen
#

yes

#

gpt-5 is great

maiden fulcrum
zealous panther
#

Gemini 2.5 deepthink is catered to logic and math though

zealous panther
#

The benchmarks say it

#

In Humanity's Last Exam or sth gpt pro still outscored grok heavy im pretty sure

maiden fulcrum
zealous panther
#

I know

zealous panther
maiden fulcrum
stray aspen
#

no

#

dont waste your money

zealous panther
solid brook
#

Man i cannot imagine google not beating gpt5. They just have to with gemini 3. If not they fall behind a lot

zealous panther
#

80 gpt 5 prompts every 3 hours

maiden fulcrum
zealous panther
#

If you use more than that

#

Then sure

zealous panther
#

But its not worth it generally…

#

Its a big plan for minimal increase in performance

zealous panther
maiden fulcrum
zealous panther
#

Like it will just slow down

zealous panther
zealous panther
#

Pro has like a 4-5% increase

zealous panther
#

Anything can happen really

maiden fulcrum
zealous panther
#

But still though i felt like gpt 5 was focusing on web dev a lot

zealous panther
#

I think

#

Im not sure

obtuse heart
maiden fulcrum
stray aspen
#

anguilla must be getting rich from the AI domain

stray aspen
zealous panther
#

In app

leaden sun
patent aspen
#

I think OAI was wise to deprecate all of the old models. If they didn't, their capacity crunch would be much worse

brisk helm
#

is yupp better than lmarena

zealous panther
stray aspen
#

but it has a lot of models

#

and you need google signup

zealous panther
#

I think that people dont take the response time of gpt into account

maiden fulcrum
zealous panther
#

Or the fact that they improved the webdev score by like 200

zealous panther
#

I think openai definitely doomed itself by not focusing on the webdev enough…its so good

#

I would say for like a simple task it would be faster by 10 seconds or more?

#

O3 always thinks very hard

#

Depends on the prompt

#

If its a thinking prompt

#

Relatively the same time

#

If its just a general prompt

#

Gpt 5 can be faster by a lot

eternal niche
#

gemini 2.5 pro better anyway

zealous panther
#

I mean i can test rn if you want me to test a prompt. I have gpt 5 on my ipad and o3 on phone

zealous panther
eternal niche
#

it is

eternal niche
stray aspen
#

gpt-5 is greater than gemini

zealous panther
#

Yeah? If you dont use webdev that is

eternal niche
#

brother

#

you betrayed me

zealous panther
#

Webdev is a HUGEEEE JUMP

#

200+

zealous panther
#

Read the papers