#general


hollow imp
#

Is it not good enough?

verbal nimbus
#

Sure, I use both Gemini and Claude a lot

whole wagon
#

guess where gpt5 will be. I think 66%

patent bane
#

78%

hollow imp
#

Nah guys not these rational benchmarks. I want your personal experiences

minor bloom
whole wagon
#

that was fake

thin creek
keen beacon
barren prairie
#

To not let people say they are ClosedAI they make the garbage open source

primal orbit
#

poe.com has gpt 5 thinking high free

fleet lintel
#

altman is shady guy... i was 100% sure that OSS models were going to be bad

tired herald
#

No way

whole wagon
#

if you think gpt5 is getting 90% on simplebench u are delusional ngl

keen beacon
pulsar rain
#

*privacy

rapid merlin
#

either high 50 or low 60 im guessing

tired herald
blazing bison
#

prob not

rapid merlin
#

idk if its on any thinking tbh

gusty loom
#

yo anyone found a documentation page for gpt 5?

eternal niche
hollow imp
keen beacon
#

Guys, it is henceforth Declared that GPT-5 shall score 70% on Simple-bench (Decided by you)

pulsar rain
#

it took more time to answer than gemini 2.5 pro, so probably?

rapid merlin
#

extreme traffic

hollow imp
#

Gpt 5 thinking vs grok 4 thinking

torn mantle
#

next month

pulsar rain
#

yeah we aren't really sure about that

verbal nimbus
void elm
#

do we have any hexagon or any prompts you guys can give me to test? i have the highest gpt 5 model and no idea what to test

thin creek
keen beacon
hoary elbow
#

GPT five is so close to winning against Gemini

obsidian shell
#

mixtral and again mixtral...

devout vault
primal orbit
#

OpenAI’s latest flagship model with significantly improved coding skills, long context (400k tokens), and improved instruction following. Supports native vision, and generally has more intelligence than GPT-4.1.

#

it's free, i'm using it now

hoary elbow
#

Gemini three hasn’t released yet

primal orbit
#

reasoning effort high

tired herald
#

Points

#

You have 3k points, each message costs 250~

primal orbit
#

don't know about points, have never used that site

keen beacon
void elm
#

2027

#

ai hit a wall rn

minor bloom
#

2029

tired herald
keen beacon
#

2035+

void elm
#

no good improvements since gemini 2.5 pro tbh

keen beacon
verbal nimbus
keen beacon
#

I like how no one says this year lol

steady vale
keen beacon
pulsar rain
#

probably not in my lifetime, for true AGI

keen beacon
hollow imp
obsidian shell
#

apparently the gpt 5 high in the interface is better than the gpt 5 on arena

keen beacon
hollow imp
#

Jee advanced

void elm
hollow imp
keen beacon
keen beacon
void elm
# hollow imp

The tangential acceleration equals the time derivative of the speed:
dv/dt = w_t.

Here w_t is the projection of the constant vector a (directed along +x) onto the unit tangent τ to the trajectory:
dv/dt = a · τ.

Since a = a i (i is the x-unit vector), a · τ = a τ_x, where τ_x = dx/ds. Also
dx/dt = v τ_x.

Hence
dv/dx = (dv/dt)/(dx/dt) = (a τ_x)/(v τ_x) = a/v.

Integrate:
v dv = a dx → (1/2) v^2 = a x + C.

With v ≈ 0 at x = 0, C = 0, so
v(x) = sqrt(2 a x).

Thus the speed depends only on x (not on the path shape): v^2 = 2 a x for motion in the +x direction.
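
A quick numerical sanity check of the result above, as a sketch only. The path y = sin(x), the value a = 2, the step size, and the function name `simulate` are all assumptions picked for illustration; none of them come from the original problem. The point moves along the curve with tangential acceleration a·τ_x, and v² is compared against 2ax at the end.

```python
import math

def simulate(a=2.0, dt=1e-4, steps=30_000):
    """Euler-integrate a point sliding along y = sin(x) (an assumed path).

    The tangential acceleration is the projection of the constant
    vector a*i onto the unit tangent, i.e. dv/dt = a * tau_x.
    """
    x, v = 0.0, 0.0
    for _ in range(steps):
        # tau_x = dx/ds = 1 / sqrt(1 + (dy/dx)^2), with dy/dx = cos(x)
        tau_x = 1.0 / math.sqrt(1.0 + math.cos(x) ** 2)
        # dx/dt = v * tau_x and dv/dt = a * tau_x, both updated from the old state
        x, v = x + v * tau_x * dt, v + a * tau_x * dt
    return x, v

x, v = simulate()
print("v^2  =", v * v)
print("2ax  =", 2 * 2.0 * x)
```

The two printed numbers should agree up to the integration error, whatever smooth path is swapped in for sin(x).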

minor bloom
#

Isn't grok still the best at engineering?

#

(Not software)

pulsar rain
#

how does gpt-5 perform on numerical math? like pi^(pi+e+1)?

rapid merlin
gusty loom
#

is gpt5 available on lm arena?

keen beacon
solid brook
#

yes

pulsar rain
#

most models without a python or math module to fall back on cannot do that

tired herald
hollow imp
# void elm

Wait, there is a chance of it getting it correct.
Send it this.

gusty loom
#

Nvm i see it now

pulsar rain
rapid merlin
pulsar rain
#

I cannot trust gemini for that sht

hoary elbow
#

GPT five is like Gemini 2.5 pro but faster can’t wait for Gemini three though

keen beacon
#

?

hollow imp
keen beacon
rapid merlin
#

holy crap, the output size is indeed solid

pulsar rain
verbal nimbus
# void elm do we have any hexagon or any prompts you guys can give me to test? i have the h...

For common-sense reasoning, here's one from Simple Bench all models get wrong:

A luxury sports-car is traveling north at 30km/h over a roadbridge, 250m long, which runs over a river that is flowing at 5km/h eastward. The wind is blowing at 1km/h westward, slow enough not to bother the pedestrians snapping photos of the car from both sides of the roadbridge as the car passes. A glove was stored in the trunk of the car, but slips out of a hole and drops out when the car is half-way over the bridge. Assume the car continues in the same direction at the same speed, and the wind and river continue to move as stated. 1 hour later, the water-proof glove is (relative to the center of the bridge) approximately?

A) 4 km eastward
B) < 1 km northward
C) > 30 km away north-westerly
D) 30 km northward
E) > 30 km away north-easterly
F) 5 km+ eastward
rapid merlin
#

never seen an OAI output 1300 lines of code without resistance

thin creek
void elm
barren prairie
#

The high version is a pure trash at coding

hollow imp
# void elm

Send this a simple question so even you can understand the answer.

#

Gemini 2.5 pro crashed

#

In thinking

verbal nimbus
autumn blaze
#

bro the lmarena gpt 5 says its gpt 4

pulsar rain
#

how about chess? does gpt-5 still suck at 900 elo 🤣

verbal nimbus
#

It gets it wrong as well

#

Thanks for trying

keen beacon
#

Guys, "OpenAI PRs (no browsing)" has hit a wall. ALL the models are stuck at 44 percent including o3, ChatGPT agent and GPT-5

minor bloom
#

What is gpt-oss? Its on arena

thin creek
keen beacon
#

Will we see gpt-5 with reasoning on leaderboard ?

obsidian shell
#

playing game with gpt 5 high vs opus 4.1

so far stockfish thinks opus is winning

obsidian shell
keen beacon
#

yeah samee

stray aspen
#

gpt 5 live on yupp ai

keen beacon
verbal nimbus
hollow imp
#

😂

blazing bison
#

funny gpt-5 live everywhere but not in chatgpt

#

🤓

keen beacon
#

its wrong??

#

ohh but i got 0.6

clever estuary
fleet lintel
keen beacon
#

@hollow imp

hollow imp
rapid merlin
#

the model seems to be blowing every other OAI model out of the water when it comes to UI though

#

or well, css

keen beacon
#

@hollow imp From 0 to 1: (2/√3)(arctan(√3) − arctan(1/√3)) = (2/√3)(π/3 − π/6) = π/(3√3).
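
The arithmetic in that line can be checked numerically. This is a sketch: the integrand is not shown in the chat, and 1/(x² + x + 1) is an assumption inferred from the arctan form of the answer. The helper `midpoint_integral` is a hypothetical name.

```python
import math

# Closed form exactly as quoted in the message above.
closed_form = (2 / math.sqrt(3)) * (
    math.atan(math.sqrt(3)) - math.atan(1 / math.sqrt(3))
)

def midpoint_integral(f, lo, hi, n=100_000):
    """Plain midpoint-rule quadrature on [lo, hi] with n slices."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

# ASSUMPTION: the integrand is 1 / (x^2 + x + 1); the chat never states it.
numeric = midpoint_integral(lambda x: 1.0 / (x * x + x + 1.0), 0.0, 1.0)

print("closed form :", closed_form)
print("pi/(3*sqrt3):", math.pi / (3 * math.sqrt(3)))
print("numeric     :", numeric)
```

All three values come out near 0.6, which matches the "i got 0.6" reply earlier in the thread.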

tired herald
#

Who doesn't love seeing this

keen beacon
#

bro jee adv maths is ezz

#

physics + chem too hard

solid brook
#

man I'm excited af

#

imagine gemini 3

south vigil
#

has anyone benchmarked it yet?

keen beacon
hollow imp
#

@void elm If it can solve this then it passed

keen beacon
#

i have my q bank

keen beacon
#

i have some hard diagram geometry problems (creative thinking not complex formulae)

#

only grok has solved most

#

it didnt get 2-3

hollow imp
#

@keen beacon do you have telegram

keen beacon
south vigil
#

is gpt 5 still not on the app?

hollow imp
#

This wrong btw

south vigil
#

i can't see it on selection

hollow imp
#

This def wrong

obsidian shell
#

is the 74.9% swe bench true? for gpt5

rapid merlin
#

Answer: A) 4 km eastward

Reasoning:

The glove drops at the midpoint of the bridge and (being waterproof) floats on the river.
Once on the water, it’s carried mainly by the river’s current: 5 km/h east.
The light 1 km/h west wind may slow that drift slightly. A simple approximation is 5 − 1 = 4 km/h east.
Any northward motion from the car (30 km/h) is lost almost immediately upon hitting the water; the north offset would be only a few meters—negligible after an hour.
So after 1 hour, it’s roughly 4 km east of the bridge’s center.
this is what the lmarena one said

hollow imp
keen beacon
#

How is GPT-5 performing at maths and physics thus far?

stray aspen
#

selector broke again?

keen beacon
hollow imp
brave ferry
#

Quick question: can anyone confirm whether GPT-5 started rolling out before the livestream ended?

keen beacon
# hollow imp No

Can you explain? Does it perform better than others but still bad

stray aspen
hollow imp
keen beacon
#

its better than others, only a few questions its struggling on

south vigil
stray aspen
keen beacon
brave heron
#

ye

keen beacon
#

works for me

brave heron
#

not working

stray aspen
#

@echo aurora

#

its broken

remote niche
#

guys when will gpt 5 be available in chatgpt ?

half trail
#

🙄

keen beacon
fleet lintel
hollow imp
remote niche
#

guys when will gpt 5 be available in chatgpt ?

half trail
#

Why gemini becomes dumb these days?

remote niche
#

guys when will gpt 5 be available in chatgpt ?

keen beacon
steady vale
hollow imp
#

It messed up this simple question

half trail
keen beacon
dusky aurora
south vigil
#

where is gpt 6

fleet lintel
rapid merlin
#

gpt 6 in 2077

neon idol
south vigil
#

LOL

keen beacon
#

In the arrangement shown in figure a weight A possesses mass m = 4 kg, a pulley B possesses mass M = 2 kg. Also known are the moment of inertia I = 2 kg·m² of the pulley relative to its axis and the radii of the pulley R = 1 m and 2R. The mass of the threads is negligible. Find the acceleration of the weight A after the system is set free, taking acceleration due to gravity equal to 9.81 m/s².

echo aurora
neon idol
hollow imp
#

@keen beacon is this correct?

tired herald
#

Asking for its architecture on LMArena gives GPT 4 while asking on Poe gives GPT 5

#

Idk who to trust

keen beacon
#

yeah its correct 3.5 (for g = 10)

keen beacon
#

while poe may have a system prompt

#

We are never agi

hollow imp
neon idol
#

Why can't I find gpt 5 on chatgpt web and app 💀💀

tired herald
#

Cursor is doing well by offering gpt-5 for free for a bit

tired herald
neon idol
echo aurora
half trail
tired herald
rapid merlin
keen beacon
#

o3 high gets this question wrong most of the time:

A home aquarium partly filled with water slides down an inclined plane of inclination angle θ with respect to the horizontal. The surface of water in the aquarium
(a) remains horizontal
(b) remains parallel to the plane of the incline
(c) forms an angle α with the horizon where 0 < α < θ
(d) forms an angle α with horizon, where θ < α < 90

#

its b

wheat onyx
#

still don't have access to gpt5 yet

neon idol
#

Anyway I will try it in lm arena

tired herald
stray aspen
#

bro lmarena is down

hollow imp
keen beacon
#

nah its working for me

half trail
#

How to get free access of claude?

keen beacon
half trail
#

😶

hollow imp
keen beacon
wheat onyx
#

I have GPT Plus, where's my upgrade

minor adder
golden ocean
half trail
stray aspen
#

@echo aurora can you escalate please

wicked root
#

alright how's gpt5 looking so far?

rigid holly
#

It deleted my chats like twice

golden ocean
#

dont escalate pls

#

let us keep gpt 5 for ourselves!

keen beacon
wicked root
keen beacon
stray aspen
echo aurora
keen beacon
pulsar rain
stray aspen
#

thanks

golden ocean
keen beacon
#

i am

bleak sundial
rapid merlin
# keen beacon o3 high gets this question wrong most of the time: A home aquarium partly fill...

Answer: (b) remains parallel to the plane of the incline

Why: In the frame of the sliding aquarium, there’s a pseudo-acceleration up the incline of magnitude a. For a frictionless slide, a = g sinθ. This exactly cancels gravity’s component along the plane, leaving only g cosθ normal to the plane. The free surface is perpendicular to the effective gravity, so it must be parallel to the plane.

Note:

If the tank were sliding at constant speed (a = 0), the surface would stay horizontal (a).
In general, the tilt α (from the horizontal) satisfies tanα = a cosθ / (g − a sinθ), so 0 ≤ α ≤ θ, with α = θ only for the frictionless case.
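
The tilt formula in the note can be tried out directly. A minimal sketch, assuming θ = 30° and g = 9.81 m/s² purely as example values; the function name `surface_tilt` is hypothetical. Here a is the tank's acceleration down the incline.

```python
import math

G = 9.81  # m/s^2, example value

def surface_tilt(theta, a, g=G):
    """Tilt of the free surface from the horizontal, in radians.

    Implements tan(alpha) = a*cos(theta) / (g - a*sin(theta)),
    where a is the tank's acceleration down an incline of angle theta.
    """
    return math.atan2(a * math.cos(theta), g - a * math.sin(theta))

theta = math.radians(30)               # assumed incline angle
a_free = G * math.sin(theta)           # frictionless slide

print(math.degrees(surface_tilt(theta, 0.0)))           # constant speed
print(math.degrees(surface_tilt(theta, 0.5 * a_free)))  # partial friction
print(math.degrees(surface_tilt(theta, a_free)))        # frictionless
```

The three cases walk from a horizontal surface, through an intermediate tilt, up to a surface parallel to the incline, which is option (b).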

wicked root
bleak sundial
#

need to compare token cost for marginal improvement

keen beacon
hollow imp
#

@void elm what did it say about the quadrant question?

keen beacon
half trail
tired herald
#

Will we get the mini/nano models on lmarena?

stray aspen
minor bloom
#

So guys, what exactly are tools? Does lmarena have them?

keen beacon
keen beacon
golden ocean
#

but lmarena does have google search !!!

keen beacon
keen beacon
clever estuary
stray aspen
#

lmarena is fixed

half trail
#

I have perplexity comet but hoping for android version because most of the time we search in phone not in laptop

neon idol
#

Uhm @echo aurora (Sorry for ping)

stray aspen
polar garnet
tired herald
polar garnet
tired herald
#

Cool site

pulsar rain
#

I think lmarena gemini 2.5 pro has google search

echo aurora
echo aurora
#

looks back for some, but still working on a fix

tawny kelp
#

Huh... I was wondering what model Summit was. I remember having some interesting interactions with that one.

neon idol
clever estuary
polar garnet
#

Also all my conversations are deleted

stray aspen
keen beacon
#

relax

neon idol
stray aspen
#

seems like they reset the whole website

hollow imp
half trail
#

😶

wicked root
#

what's this?

pulsar rain
#

the frontend got disconnected from the back end. everything will be back to normal

clever estuary
#

Wait fr???

rapid merlin
# hollow imp Send this a simple question so even you can understand the answer.

Short answer: You can’t tell from the given picture.

Why:

For x ≥ 0, f(|x|) = f(x). So the graph shown only tells you how f behaves for non‑negative inputs.
But f(-|x|) = f(t) with t ≤ 0. That uses only the values of f on the negative side, which the picture does not give.
What we can say:

y = f(-|x|) is always an even function (symmetric about the y‑axis).
Its graph is the y‑axis mirror of the left half (x ≤ 0) of y = f(x).
Without knowing f for x ≤ 0, the graph of f(-|x|) could lie in quadrants I–II, III–IV, or cross the x‑axis. Extra assumptions (e.g., f even ⇒ same as f(|x|); f odd ⇒ the negative of f(|x|)) would be needed to decide.
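
The two claims in that answer (f(-|x|) is even, and it only ever reads f on x ≤ 0) are easy to see in a few lines. This is just an illustration: the function `f` below is a made-up example that is neither even nor odd, not the function from the picture, and `f_changed_on_right` is an equally hypothetical variant.

```python
def f(x):
    # Hypothetical example function, neither even nor odd.
    return x**3 + 2 * x + 1

def f_changed_on_right(x):
    # Same as f for x <= 0, but completely different for x > 0.
    return f(x) if x <= 0 else 100.0

def compose(func):
    """Return the function x -> func(-|x|)."""
    return lambda x: func(-abs(x))

g = compose(f)
g_changed = compose(f_changed_on_right)

for x in (0.5, 1.0, 2.0):
    assert g(x) == g(-x)          # even: symmetric about the y-axis
    assert g(-x) == f(-x)         # on the left half it is just f itself
    assert g(x) == g_changed(x)   # values of f for x > 0 never matter
print("all checks passed")
```

So a picture that only fixes f for x ≥ 0 leaves f(-|x|) undetermined, which is the point the answer makes.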

golden ocean
#

gpt 5 on mc bench

keen beacon
#

"GPT-5 usage limits 👀

10 messages every 5 hours on Free
80 messages every 3 hours on Plus
Unlimited use on Teams and Pro
"

keen beacon
stray aspen
rapid merlin
verbal nimbus
# bleak sundial

It's very surprising how low Claude is in the coding category, considering that it's leading SWE-bench.

keen beacon
#

Bro, this better include o3 level web search

polar garnet
#

I'm glad we have lmarena

hollow imp
half trail
hollow imp
#

@void elm @keen beacon the answer

haughty siren
#

Is the GPT-5 in LMArena with thinking?

neon idol
polar garnet
verbal nimbus
toxic egret
#

Its fixed, at least for me

keen beacon
neon idol
#

FIXED

neon idol
#

Yay

novel flame
#

There is a reason why they don't include Gemini 2.5 Pro in the chart. Gemini is waaay better on long context

pulsar rain
#

uhmm. I don't think chat history will get back... that's big rip...

half trail
# neon idol Nah

The best one is Google jules but normal gemini app with 2.5 is bad

polar garnet
keen beacon
wicked root
#

if gpt5 and gemini are on par, why's the market acting like Gemini had won?

rapid merlin
rapid merlin
neon idol
#

Hi! 👋 How can I help you today? Would you prefer to continue in Italian or another language? We are starting bad

verbal nimbus
rapid merlin
#

that people think will blow gpt 5 out of the water

hollow imp
# keen beacon aakash

Can I dm you and talk to you there? Are you frequently on discord? I have 11th prep doubts that can only be solved by personal experience

neon idol
verbal nimbus
minor bloom
#

For some reason gemini is doing much better according to polymarket

minor bloom
#

Which makes no sense to me

hollow imp
wheat onyx
#

GPT‑5 is available to all Plus, Pro, Team, and Free users starting today with access for Enterprise and Edu coming in one week. It may take a few days to roll out to all Free users.

- Pro users get unlimited access to GPT-5 & access to GPT‑5 Pro, ideal for the most challenging,

keen beacon
keen beacon
half trail
#

Imagen 4 is best and veo 3

verbal nimbus
half trail
#

For coding still lacks a lot

hollow imp
wicked root
#

Yessir

keen beacon
wicked root
hollow imp
misty vault
#

💀

stray aspen
#

lmao

primal orbit
#

gpt-5 is an amazing model compared to gpt-oss

stray aspen
#

craig also said gpt oss was great

solid brook
#

bro you high

stray aspen
rigid holly
#

Can my conversations not disappear for 5 minutes?

neon idol
#

Elon Musk seeing that gpt 5 is better than Grok 4: 🥀🥀

half trail
#

Gpt 5 available

stray aspen
plain carbon
#

I need advice on how to report the result of a prompt. I asked two AI's a question about doing something on Google Sheets. One was a complete disaster. The second gave me three options; one worked, one failed, and one was half-right. Do I say "both are bad", or "that one is better"?

barren prairie
half trail
pulsar rain
#

open-ai 🤝 CPU / GPU companies: at naming production

half trail
#

How about coding in gpt 5

echo aurora
barren prairie
rapid merlin
plain carbon
pulsar rain
#

naming it o3. 4o is actually so bad, normal people wouldn't even know which one is better. probably they want you to get confused

rapid merlin
#

didn't test it that much to be able to tell anything else

half trail
#

🤔

hollow imp
#

@echo aurora hello sorry to ping but please can you tell me if lmarena is going to bring pdf attachment anytime soon? I really want it

pulsar rain
#

it's about naming

rapid merlin
#

it was good for its time 👍

#

the o models have a weird naming scheme

#

since the number jumped from 1 to 3 kekw

keen beacon
#

r u the reall craig? (sorry)

hollow ocean
#

@deep adder $1 method goated

rapid merlin
#

o2 became sentient and turned into a company

echo aurora
jade egret
#

is the gpt-5 on lm arena the same as the gpt-5 on the official website?

pulsar rain
#

most people are probably thinking: "well, it has 4 and o, so it must be better than o3, right?"

hollow ocean
#

I don’t think so lol

neon idol
#

Can someone pls give me some prompts to see how good gpt5 is. Thx

wicked root
#

Gpt5?

keen beacon
# neon idol Can someone pls give me some prompts to see how good gpt5 is. Thx

In the arrangement shown in figure a weight A possesses mass m = 4 kg, a pulley B possesses mass M = 2 kg. Also known are the moment of inertia I = 2 kg·m² of the pulley relative to its axis and the radii of the pulley R = 1 m and 2R. The mass of the threads is negligible. Find the acceleration of the weight A after the system is set free, taking acceleration due to gravity equal to 9.81 m/s²

echo aurora
keen beacon
#

@deep adder r u the real craig?

keen beacon
minor bloom
#

Btw, what is style control?

neon idol
keen beacon
#

\frac{3g\left(M+3m\right)}{M+9m+\frac{I}{R^2}}\ =\ 10.3\ m/s^2.
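
Plugging the problem's numbers into that expression is a one-liner. A small sketch that only evaluates the formula as posted (it does not re-derive it); the function name `weight_acceleration` is hypothetical.

```python
def weight_acceleration(m, M, I, R, g=9.81):
    """Evaluate 3g(M + 3m) / (M + 9m + I/R^2) as posted in the chat."""
    return 3 * g * (M + 3 * m) / (M + 9 * m + I / R**2)

# Values from the problem statement: m = 4 kg, M = 2 kg, I = 2 kg·m², R = 1 m.
a = weight_acceleration(m=4.0, M=2.0, I=2.0, R=1.0)
print(f"{a:.2f} m/s^2")
```

With g = 9.81 m/s² this lands on the 10.3 m/s² quoted in the message.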

jade egret
#

gpt-5 is out? im on plus plan

keen beacon
#

another q:

A uniform cylinder of radius R = 1 meter is spun about its axis to the angular velocity ω₀ = 250 rpm and then placed into a corner. The coefficient of friction between the corner walls and the cylinder is equal to k = 0.59. How many turns will the cylinder accomplish before it stops?

hollow imp
whole wagon
#

We should be thankful for the competition in this space. Imagine having to wait 2 years from now for GPT6 💀

whole wagon
minor bloom
#

Nah, its coming out next year

keen beacon
pulsar rain
#

probably put it on pay wall with limited uses for sometime

keen beacon
neon idol
stray aspen
hollow imp
keen beacon
#

my prediction heh

hollow imp
hollow imp
keen beacon
echo aurora
#

anyone experiencing the models lagging a lot right now?

blazing bison
#

gpt-5 nano is insane

wheat onyx
burnt axle
#

hi

rapid merlin
#

LMAOOOOOO

hollow imp
# keen beacon hmm

You should try it. I had a hard time trying to find where to chat with it. It was on huggingface but with limits

keen beacon
blazing bison
neon idol
stray aspen
#

artificial analysis

rapid merlin
#

Yes

neon idol
wheat onyx
#

goddamn

rapid merlin
#

this feels more like gpt-4.75 to be fair

stray aspen
#

they are trippin

minor bloom
pulsar rain
#

gpt-5: 10^(π^e) ≈ 2.878446 × 10^22
gemini 2.5 pro: 2.878335... x 10²²
correct answer: 2.878443560...x10^22
gpt-5 is pretty close. it's probably a rounding issue, right?
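
For comparison, the same number in plain double precision, which should be good to far more digits than either model's answer. This is only a sketch: the two "reported" values are copied from the message above (the Gemini one is truncated there, so its trailing digits are unknown).

```python
import math

# Reference value in IEEE double precision.
value = 10 ** (math.pi ** math.e)

# Answers as quoted in the chat; the Gemini figure is truncated in the source.
reported = {
    "gpt-5": 2.878446e22,
    "gemini 2.5 pro": 2.878335e22,
}

print(f"reference: {value:.9e}")
for name, guess in reported.items():
    rel_err = abs(guess - value) / value
    print(f"{name}: relative error {rel_err:.1e}")
```

The exponent π^e is about 22.459, so an error in the sixth or seventh digit of the exponent already shifts the final result in its sixth digit. That would be consistent with the small miss being a rounding issue in an intermediate step rather than a wrong method.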

keen beacon
stray aspen
shell pewter
#

quite useful, for anyone to catch up 😇

whole wagon
#

All the independent benchmarks are coming in and they are trash 💀

neon idol
blazing bison
#

and gpt-5 auto is routing between those models

whole wagon
rapid merlin
#

Gg

whole wagon
#

Artificial analysis tested and they have it as trash kek

neon idol
rapid merlin
#

I have gpt 5 in chatgpt

whole wagon
#

Why would they lie

blazing bison
#

bro gpt-5 sucks sorry

neon idol
rapid merlin
#

yeah, also wtf

#

i have it on my phone but not on my pc ????

blazing bison
#

i don't have it yet

rapid merlin
#

same account

jade egret
hollow imp
#

New people coming to discuss about gpt 5 and new people getting in beef with you 😔

keen beacon
shell pewter
hollow imp
#

@keen beacon dm

keen beacon
#
Poll: Will Gemini release a new model this month?
Winning answer: Yes (option 1), 13 of 21 votes

wicked root
#

Will gpt5 beat gemini in lmarena leaderboard for text?

blazing bison
#

gpt-5 400k context and they still limited chatgpt pro users with 100k, i'm gonna cancel this s****

rapid merlin
elder rapids
#

@deep adder gpt 5 is good

hollow imp
blazing bison
steady vale
#

anyone know when gpt5 will be available ycry

ocean vortex
# jade egret

I mean... it kinda is? SOTA for function calling and instruction following. And challenging for the top spot everywhere else

keen beacon
elder rapids
#

good vibes, good overall model, js hope it's smart

rapid merlin
stray aspen
blazing bison
#

NOOOOOOOOOOOOOOOO

rapid merlin
#

besides those two

blazing bison
#

NOOOOOOOOOOOOOOOOOOOO

whole wagon
rapid merlin
#

💀

neon idol
#

Is gpt 5 extremely slow only for me?

stray aspen
whole wagon
#

Only GPT5 will exist. I assume it is cheaper or smth

pulsar rain
keen beacon
neon idol
little narwhal
whole wagon
#

Yes

keen beacon
#
Poll: Which model is the most generally intelligent model right now?
Winning answer: GPT-5 (option 4), 14 of 26 votes

whole wagon
#

ALL

ocean vortex
blazing bison
#

never see you again

whole wagon
#

I think they try to reduce losses. Maybe GPT5 is actually smaller than o3 even

keen beacon
#

🤣🤣

solid brook
stray aspen
keen beacon
#

frrr

empty stump
#

Yesss it's out

blazing bison
keen beacon
whole wagon
#

Vote plz

ocean vortex
wicked root
#

Will gpt5 overtake gemini on the leaderboard?

blazing bison
#

gemini 3 will be better for sure

neon idol
#

Chat, be honest. You are in this group only for see if there are better ai to use for homework 💀🤣

keen beacon
pulsar rain
little narwhal
wicked root
minor bloom
#

GPT-5 has 4 new chat personalities: Cynic, Robot, Listener, Nerd.

Find them in Customize ChatGPT in settings.

whole wagon
#

What does openAI even do now. The AGI illusion has been shattered kek

keen talon
#

why gpt 5 says he is gpt 4

neon idol
#

@keen beacon chatgpt 5 said wrong answer

minor bloom
ocean vortex
# ocean vortex

@willow grail Told you it's to be same day release for EU. You wouldn't listen catgrin

rapid merlin
#

what should I try to give gpt 5 thinking

pulsar rain
keen beacon
whole wagon
#

"It's like a team of PhD experts on demand"

#

😂

rapid merlin
#

its so slow though

#

the thinking model at least lol

#

quite the tank

clever estuary
#

"gpt5 is good at frontend aesthetics"
gpt5:

jade egret
pulsar rain
quick holly
#

hello

jade egret
clever estuary
jade egret
#

what the prompt?

wicked root
#

Chat, will gpt5 beat gemini on lmarena?

keen beacon
#

and for a wrong answer (o4-mini)

whole wagon
exotic tartan
clever estuary
whole wagon
keen beacon
# keen beacon
Poll: On Simple-Bench what will GPT-5 score?
Winning answer: 70% (option 4), 9 of 20 votes

wheat onyx
#

im refreshing my GPT page every 30s. give it to me

wicked root
lime coral
#

The Gemini bat logo will rule them all

rapid merlin
keen beacon
#

another

blazing bison
#

now @deep adder can start hyping gpt-6, gpt-6 for sure will be a different beast, insane model, sota

whole wagon
#

At this rate xAI gonna overtake openAI kek

rapid merlin
#

o4 will be good guys

#

trust

pulsar rain
#

Annotators on their way to destroy their own career:

stray aspen
#

@deep adder will gpt-6 be AGI

wheat onyx
wheat onyx
#

and we should be getting a big reasoner in ~December maybe

stray aspen
#

grok 5

blazing bison
wheat onyx
blazing bison
#

o3 is o2

steady vale
rapid merlin
#

lol

pearl kiln
#

yo guys why do some people have sound on their videos?

clever estuary
rapid merlin
stray aspen
jade egret
blazing bison
#

@pearl kiln bcs of veo 3

rapid merlin
#

not on the website though

pearl kiln
raven oracle
#

Gpt 5 pretty laughably bad at translation compared to something like 2.5 pro, it writes like a textbook XD

blazing bison
#

it is random

clever estuary
stray aspen
steady vale
#

lmfao

pearl kiln
jade egret
#

ima try again tho

keen beacon
#

OMG I WANT GROK 5

blazing bison
#

gpt-5 suck for writing btw

keen beacon
#

GPT 5 seems to suck at lotta things

#

reminds me of meta's scout and maverick releases

blazing bison
#

now i need to test it with code

keen beacon
#

hehe

blazing bison
#

they said that it's a bunch of phds in my pocket

keen beacon
#

lol

quartz light
#

yall try generating music with web audio api with gpt5

whole wagon
#

It's strange how it basically is just openAI failing to progress at a fast rate. The Chinese LLMs don't seem to be having a slowdown

keen beacon
#

PHD from whatsapp?

keen beacon
# keen beacon
Poll: After GPT-5, what are your AGI timelines?
Winning answer: 2028-2030 (option 2), 8 of 21 votes

stray aspen
#

well those phds must have bought their degree because they are dumb as hell

quartz light
blazing bison
#

just look the paper they released about the llms personalities, that's insane

olive mesa
#

gpt-5 isn't agi...

keen beacon
#

but claude sonnet 4 is not that great of an improvement from 3.7

olive mesa
#

sigh

#

we're never getting agi

keen beacon
#

opus 4 is awesome tho

keen beacon
whole wagon
blazing bison
#

for me it is

whole wagon
keen beacon
#

hmm

willow grail
keen beacon
blazing bison
#

for me claude sonnet, not opus, is the best model actually

clever estuary
#

GPT5 vs Gemini 2.5 pro

ocean vortex
willow grail
ocean vortex
#

You need to be a paying API customer though. But still EU 😇

stray aspen
blazing bison
clever estuary
#

yeah gemini gets lucky at times

blazing bison
#

for math it was o3

keen beacon
white hatch
willow grail
keen beacon
#

but GPT OSS 120b on cerebras impressed me

hard questions, answers in < 5s

#

its amazing

stray aspen
#

which is gemini

keen beacon
#

right one

stray aspen
#

gpt5 looks better

keen beacon
#

nah

#

gemini

willow grail
# stray aspen gpt 5 wins

design wise gpt one is ugly makes no sense.... pls look again at it..

the 2.5 pro version is clearly inspired from retro stuff.....
the gpt one... is... not.. its not inspired at all

stray aspen
#

they are both great

blazing bison
#

they both suck

#

in my opinion

#

let's keep humans doing design for now

keen beacon
#

yeah hehe

clever estuary
white hatch
keen beacon
#

@blazing bison

but GPT OSS 120b on cerebras impressed me

hard questions, answers in < 5s
its amazing

even on math arena it tops (even on SMT whose q havent been released)

#

And decreased hallucinations help with correct info

civic flame
#

@echo aurora hi, can you please provide an update on the feature request to allow disabling the sysprompt in direct chat? i requested this before the new lmarena even released and i've heard nothing since

#

all in on google

blazing bison
civic flame
keen beacon
#

i need the updated 2.5 pro asap

willow grail
pliant cliff
white hatch
#

Actually I trust Google more than OpenAI

keen beacon
clever estuary
rapid merlin
keen beacon
blazing bison
willow grail
rapid merlin
#

not all that bueno but still

clever estuary
keen beacon
rapid merlin
keen beacon
clever estuary
#

ah I see

keen beacon
#

ideas etc.

blazing bison
#

zenith was good with design btw, WHICH MODEL WAS ZENITH

#

SOMEONE

#

SOMEONE ANSWER

eternal niche
stray aspen
#

it was grok 6

keen beacon
#

Ahhhhh, the website is going down

#

too many people

umbral silo
#

Hi

keen beacon
umbral silo
#

How are you?

stray aspen
#

gpt 5 design is disappointing. gemini does better

keen beacon
ripe mountain
#

which ai do you think is the best right now? gpt 5?

gusty loom
#

nah

keen beacon
rapid merlin
bright kayak
#

when is gpt-5 actually coming out? I still don't have it anywhere

gusty loom
#

its out

fossil fable
keen beacon
willow grail
bright kayak
bright kayak
fossil fable
steady vale
bright kayak
#

ok then thank god

gusty loom
bright kayak
fossil fable
#

...

#

...uk

keen beacon
steady vale
keen beacon
#

I dont have it either

clever estuary
#

I told them to recreate Deepl
GPT5

steady vale
#

maybe usa ppl got it first

bright kayak
#

its probably progressively rolling out

keen beacon
#

but it could be because I am in the EU

clever estuary
#

vs gemini 2.5 pro

keen beacon
gusty loom
#

Actually gemini is looking better ngl

keen beacon
#

better than google translate

fossil fable
pliant cliff
raven oracle
bright kayak
#

isnt that literally the deepl site?

keen beacon
clever estuary
#

this is what deepl actually looks like

keen beacon
pliant cliff
bright kayak
molten parcel
#

YOoooo

#

hello guys

fossil fable
molten parcel
#

i am new in this server

stray aspen
gusty loom
white hatch
stray aspen
#

if thats not the actual deepl website gemini did a great job

gusty loom
#

yeah

molten parcel
clever estuary
molten parcel
#

btw

keen beacon
molten parcel
#

can i use veo 3 in this server?

stray aspen
white hatch
stray aspen
#

if you get lucky enough

molten parcel
#

SWEEET!!!!

gusty loom
fossil fable
#

i find that GPT-5 is far from just focusing on intelligence but also how pleasant it is to work with in general use (like, i don't know, ChatGPT)

molten parcel
white hatch
#

How does gpt-5 behave on coding tasks?

queen heart
#

@everyone Is there a way to generate videos with sound here, pls help I just got here today

stray aspen
#

you cant choose it

fossil fable
bright kayak
#

@everyone!

fossil fable
gusty loom
#

lol

molten parcel
#

HOW MANY Veo3 videos can i GENERATE IN A DAY

eternal niche
#

kid

whole wagon
#

Guys I think there is smth actually extremely wrong with gpt5. Like they made a mistake somewhere

gusty loom
whole wagon
#

Look how it performs compared to horizons

fossil fable
whole wagon
#

There is a bug in the inference or smth

#

There is no way it is this bad

molten parcel
molten parcel
gusty loom
#

yes

keen beacon
fossil fable
molten parcel
fossil fable
#

i find it a shame people now dumb it down to "thinking" instead of "reasoning"

steady vale
#

just had gpt-5 answer a question way worse than 4o lol

molten parcel
#

my bad i didn't mean it negatively

blazing bison
#

gemini is so much better lmao

stray aspen
molten parcel
whole wagon
#

I'm pretty sure they screwed smth up in the release

blazing bison
#

wait horizon models already revealed?

molten parcel
whole wagon
#

Like artificial analysis benchmark and everything is literally showing GPT5 worse than o3 it's smth wrong in the launch

blazing bison
#

ok we have some real improvements here

molten parcel
#

btw

gusty loom
molten parcel
#

where CAN i actually generate it?

gusty loom
#

Video Arena 1/2/3/4

eternal niche
molten parcel
molten parcel
#

thanks

eternal niche
placid spear
#

Which version of GPT-5 do we have in the arena? GPT-5 pro or just the "normal" thinking model that Plus users have?

gusty loom
#

Can anyone give me a memory prompt for Chatgpt to not use the long dash and use emojis less?

rapid merlin
#

im pretty sure there is a pro?

clever estuary
eternal niche
neon idol
#

After gpt 5 I am still waiting for deepseek R2

placid spear
#

Thank you

clever estuary
blazing bison
gusty loom
#

After all, we should all really appreciate LM Arena that lets us use their service for free.

whole wagon
#

Ohhhh they are showing benchmarks for GPT5 high internally but we only have GPT5 medium ofc. That's why everyone else's benchmarks are awful 😂

blazing bison
#

it's just that deepseek don't hype, they release

whole wagon
#

They didn't release the high version yet

clever estuary
#

it's proven they can't catch up, they don't even have vision at this point

tribal aspen
#

@echo aurora will we have GPT 5 pro on LMArena direct chat

blazing bison
#

it's available already

tribal aspen
#

Why they released two versions of GPT 5 😭

keen beacon
#

Will LMArena add the rest?

whole wagon
tribal aspen
white hatch
whole wagon
#

There's no setting

tribal aspen
#

But the website does have

whole wagon
#

Everyone currently is using GPT5 standard

gusty loom
whole wagon
#

Including LM arena

blazing bison
tribal aspen
#

Gpt 5, mini, nano

#

But no pro

#

I think that's a variant of gpt 5 not pro

#

What is it

gusty loom
#

but better

whole wagon
#

The horizon models don't match up to GPT5 then

#

All the benchmarks are looking like this and wondering wtf is going on

gusty loom
#

yeah prolly

whole wagon
#

It's not just one person. Multiple benchmarks

whole wagon
#

Have it like that

blazing bison
#

"Pro, Plus, and Team users can also start coding with GPT‑5 in the Codex CLI by signing in with ChatGPT."

clever estuary
#

honestly of all AI models released these days
Apple's OpenELM is still the gold
no one can top that

blazing bison
#

wow

whole wagon
#

There's smth wrong with gpt5 lol

tribal aspen
#

Where's gpt 5 pro benches

blazing bison
#

so now you can use codex cli with chatgpt account

#

that's interesting

tribal aspen
solid brook
#

bro wth lmarena had the model on the second gpt 5 was announced but the chatgpt website is still the old one

blazing bison
blazing bison
whole wagon
#

Did codex get updated

#

To GPT5?

wheat onyx
#

I don't have it

wheat onyx
#

Oh it's codex

blazing bison
whole wagon
#

Oh nice I have it

gusty loom
wheat onyx
#

Im just dumb

blazing bison
#

they did this bcs of claude code lmao

gusty loom
#

I certainly think that gemini is better at coding than ChatGPT.

whole wagon
#

Codex is the best coding agent ngl

blazing bison
#

no

whole wagon
#

Yes

blazing bison
#

not the cli one

#

the web maybe

whole wagon
#

I use the web

blazing bison
#

you can't delete your data from there

#

that's insane

jade egret
#

when gemini 3

whole wagon
#

I work on open source stuff so idc lol

blazing bison
#

ye, people that do valuable work care

#

so it's not the best agent

keen fulcrum
#

Wait zenith was gpt5

#

What was better than zenith

blazing bison
#

no one knows what zenith was

stray aspen
#

bro is gpt 5 improving?

whole wagon
stray aspen
#

i feel like its better than an hour ago

whole wagon
#

Bro has 0 clue about anything kek

blazing bison
gusty loom
blazing bison
#

AI sucks with C, with Rust

#

so idk what are you doing

whole wagon
#

I use it with rust just fine

blazing bison
#

yeah bro, my colleagues that actually do something with rust disagree

#

claude opus is the best one for rust but it generates super verbose code, so no one uses it

hollow imp
#

WHATS BETTER IN LMARENA GPT 5 THAN OFFICIAL CHATGPT FREE TIER GPT 5?

gusty loom
#

stop yelling please

#

its the same

blazing bison
#

it's not

gusty loom
#

how

blazing bison
#

free tier changes to gpt-5 mini after some prompts

#

on arena it's always gpt 5

hollow imp
#

Any performance related change?

gusty loom
keen beacon
#

And it is not possible in general

blazing bison
#

it is for some people

keen beacon
#

oh

hollow imp
#

Do you guys know how to get a better version somewhere else? Like a more reasoning one

blazing bison
#

slow roll out

hollow imp
#

Like how you can use o3 and o3 search in lmarena and nowhere else for free

#

Same with grok 4 search

#

Same with opus 4 thinking 16k

#

@blazing bison something like that for a better gpt 5?

gusty loom
#

for search just click the globe icon at the prompt bar

devout vault
#

If summit is gpt 5 then what is zenith?

ocean vortex
blazing bison
devout vault
#

Chat what is zenith

hollow imp
#

I want free biscuits

gusty loom
# ocean vortex

Well sometimes thinking gets much slower results. (sometimes even worse results)

hollow imp
#

Especially my eye is on gemini 2.5 deepthink

#

Even if I get 1 prompt per week

#

Just free biscuit pls

#

🙏

#

@blazing bison

tribal aspen
ocean vortex
# blazing bison

yeah they gonna direct replace 4o-latest with gpt5-chat-latest

devout vault
#

Does anybody know anything about zenith

ocean vortex
#

And o3 with gpt5 medium reasoning

gusty loom
#

Right. I didn't relate to the analysis you sent.

ocean vortex
hollow imp
#

Is gpt5 reasoning not available at all for free anywhere?

ocean vortex
#

though I haven't seen direct comparisons yet

hollow imp
#

What about poe.com gpt 5 reasoning high?

gusty loom
ocean vortex
hollow imp
ocean vortex
#

well limited use of the full model probably, but not with reasoning

hollow imp
#

???

#

You sure?

keen beacon
ocean vortex
#

and then it will fall back to gpt5-mini with and without reasoning

keen beacon
#

yes

hollow imp
whole wagon
#

Who is going to eat openAI first. Chinese LLMs or other American ai companies

hollow imp
ocean vortex
hollow imp
#

O4 full release when

molten parcel
#

wait...yall are getting access to gpt 5?

#

howwwwwww?

keen beacon
#

lol

ocean vortex
hollow imp
hollow imp
ocean vortex
#

and "o4-mini" is just marketing name lol

keen beacon
#

no more O series

hollow imp
#

Why openai naming so twisted

ocean vortex
hollow imp
molten parcel
molten parcel
#

can u access in this server?

keen beacon
hollow imp
keen beacon
#

at least not for me

ocean vortex
#

you don't need forced reasoning though

hollow imp
keen beacon
barren prairie
molten parcel
hollow imp
hollow imp
barren prairie
wicked root
#

Im confused. Is gpt5 overtaking gemini on the leaderboard and does this mean openAI won?

#

I hear lots of mixed views on this

hollow imp
#

Russian

gusty loom
#

Please don't send it here.

half trail
#

Gpt 5 not up to the mark in coding

hollow imp
barren prairie