#general

dreamy sparrow
#

we rob them

obtuse heart
#

why did you redeem it

#

why did you redeem it?!

dreamy sparrow
#

American people will fall for food

#

90% is obese or something

#

FREEDOM

#

🦅🦅🦅

#

🦅🦅🦅🦅

#

🦅🦅🦅🦅🦅🦅🦅🦅

#

RAAAAAAAAAAÀAAAAAHHHHHHHHHHHHHHH

blazing bison
#

200 dollars for 5 prompts per day

dreamy sparrow
#

WHAT

blazing bison
#

i don't think it's worth it

dreamy sparrow
#

PER DAY

#

5

blazing bison
#

yes

dreamy sparrow
#

200 DOLLARS

tidal ginkgo
#

lol

dreamy sparrow
#

WHAT

blazing bison
#

yeah

dreamy sparrow
#

I'll take it for a cent

blazing bison
#

better to pay for claude or openai, which give almost unlimited use of their models

tidal ginkgo
#

why would u use it anyways

dreamy sparrow
#

i wanna test it..

blazing bison
#

it's not worth it

#

waste of money

dreamy sparrow
#

hehe

blazing bison
#

you have it for free unlimited on aistudio

dreamy sparrow
#

and code execution gives it real time data

obtuse heart
dreamy sparrow
#

grok 4 heavy ahh model

blazing bison
#

grok 4 heavy sucks

#

is the worst pro model of all

#

the better one is deepthink

eternal niche
#

brother

eternal niche
#

scam americans. It's popular in Russia

dreamy sparrow
eternal niche
#

whats Russia

obtuse heart
#

no idea, probably just a bunch of meat together

echo aurora
#

AI please

eternal niche
#

gpt5 sucks guys

#

gemini 2.5 pro better

#

agree? agree.

ocean vortex
#

yandexgpt

keen beacon
#

Gets boring.

eternal niche
eternal niche
ocean vortex
eternal niche
ocean vortex
eternal niche
ocean vortex
#

explain

eternal niche
#

no

ocean vortex
#

How is it the best?

#

by being the worst

eternal niche
#

it is the best

dreamy sparrow
wicked root
#

Is gpt5 winning?

ocean vortex
ocean vortex
#

Ok but for more serious discussion... this is peculiar and worth discussing:

ocean vortex
#

so lower than "low"

dreamy sparrow
#

Gemini 2.5 pro with tools is better than grok 4

#

and GPT-5 ISN'T EVEN THAT GOOD

ocean vortex
#

Makes sense since they have a separate gpt5-chat-latest model with no reasoning

#

that one performs better than gpt4.1 for sure

dreamy sparrow
#

wow gpt-5 medium is better than Gemini 2.5 pro

#

this is straight up ass

ocean vortex
#

But this is probably the reason gpt5 isn't 100% hybrid

#

not easy to make it perform as well with reasoning disabled as a model that isn't a reasoning one in the first place

#

Especially when you are training it for so many different reasoning options

hollow imp
dreamy sparrow
#

and google search

blazing bison
hollow imp
#

I find gemini 2.5 grounding very bad

#

The best web search experience I've had is o3 search on lmarena

dreamy sparrow
ocean vortex
#

I'm trying to steer this away from politics lmao

dreamy sparrow
#

it searches on google

dreamy sparrow
#

there's not a difference bro 😭

#

do deep research on Gemini

blazing bison
#

there is, grounding is a llm checking a llm

#

it can fail too

dreamy sparrow
#

no

#

i think

blazing bison
#

you can't trust llms, you need to check their sources

ocean vortex
dreamy sparrow
#

or just randomly

#

im pretty sure it's not built in with the model idk

wicked root
ocean vortex
#

Though gpt5 with tools would destroy 2.5Pro with tools to be fair

#

OpenAI tool integration is much better

dreamy sparrow
#

gpt-5 uses tools?

#

like manually

ocean vortex
ocean vortex
#

gpt5 uses that as well - tool calling while reasoning

dreamy sparrow
#

or fake like everything else

ocean vortex
#

what

#

It clearly does

dreamy sparrow
#

answer me

#

yk there's a question i saw

#

u can't solve

#

without using

#

code execution

#

it's impossible

ocean vortex
dreamy sparrow
#

bro

#

please

#

HOW DOESN'T IT

#

please someone explain to me this guy

#

Please

#

please

#

it doesn't use python?

#

..

#

have u ever been on ai studio

#

so it does use it

#

it could do function calling tho

#

:)

#

fock openai

#

closedai

#

only actual open model

#

tho they have their models

#

in

#

their

#

website

#

for free

#

unlimited

#

uh why would they make it open sourced

#

if it's free

indigo hazel
#

what's the difference between gpt 5 and gpt5 chat on arena?

dreamy sparrow
#

and unlimited

#

..

#

are u alright

#

if i was American i would understand that

#

but

#

yeah

ocean vortex
dreamy sparrow
#

there's only 1 model that's open

#

IT IS

#

CLOSEDAI

misty star
dreamy sparrow
#

well ofc

#

they won't

#

do u even know

#

why we make fun

#

of openai

#

?

#

IT'S NAMED OPENAI

#

IS GOOGLE

#

NAMED OPENOOGLE

#

HELL NO IT GODDAMN ISN'T

#

yelling in text

#

really bro

#

sure give me help

#

10 dollars i guess

#

paypal

#

or

#

venmo

echo aurora
#

I think you both should just block each other

dreamy sparrow
#

he didn't talk for like 2 minutes

#

when i

#

replied

#

to him

#

because he's married to elon musk

dreamy sparrow
#

is that a yes

#

alright

#

bro that elon musk has

#

grok

#

as his

#

pronoun

#

omg

#

omf

gentle plinth
dreamy sparrow
#

.

#

did u wait

#

to send

#

that

#

image

#

were u watching

#

for 10 min

#

now

#

?

wicked root
#

alright so has there been any changes to the leaderboard?

#

Or any rumors of a change thereof?

dreamy sparrow
#

in the leaderboard

wicked root
#

yes I know that

#

but has there been a shift in opinions, analysis, etc?

dreamy sparrow
#

nope

gentle plinth
#

so answer is not yet

golden ocean
#

polymarket

wicked root
#

Plus, I don't like the confidence interval on the leaderboard. Gpt5 does seem like a tough contender.

lime coral
#

The latest gemma is already better. Better world knowledge and multilingualism with 27B it’s actually hilarious

dreamy sparrow
#

deepseek got some explaining

lime coral
#

Cry

dreamy sparrow
#

I'd rather wait for Gemini 3 to release than use gpt-5

wicked root
#

so you're expecting google will lose as the votes increase?

#

I mean the confidence interval already points to the very possibility
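
The point about confidence intervals can be made concrete with a minimal sketch. It assumes each leaderboard entry comes with a lower and upper bound around its score; the `ScoreInterval` type, the `intervals_overlap` helper, and all the numbers are hypothetical and are not taken from the actual leaderboard. When two intervals overlap, the current ordering could plausibly flip as more votes arrive, though overlap alone does not prove that it will.

```python
from typing import NamedTuple


class ScoreInterval(NamedTuple):
    """A leaderboard score expressed as confidence interval bounds (hypothetical type)."""
    low: float
    high: float


def intervals_overlap(a: ScoreInterval, b: ScoreInterval) -> bool:
    """Return True when the two intervals share at least one point."""
    return a.low <= b.high and b.low <= a.high


# Hypothetical numbers for illustration only, not real leaderboard data.
first_place = ScoreInterval(low=1450.0, high=1470.0)
second_place = ScoreInterval(low=1440.0, high=1465.0)

if intervals_overlap(first_place, second_place):
    print("Intervals overlap: the ordering is not settled yet.")
else:
    print("Intervals are disjoint: the ordering looks stable.")
```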

dreamy sparrow
#

there's no debating it

keen beacon
wicked root
keen beacon
#

But that is hallucination

ocean vortex
stray aspen
ocean vortex
#

Don't use 3.0, wait for 4.0

stray aspen
#

it made the author of a script gpt-4

dreamy sparrow
#

when u can have

#

GEMINI

#

5.0

#

!!!!

#

!!!!!!!!!

stray aspen
#

why wait for gemini 3 when we have gpt-5

dreamy sparrow
ocean vortex
dreamy sparrow
ocean vortex
#

😇

dreamy sparrow
#

u probably did use Gemini before

obtuse heart
dreamy sparrow
#

didn't u

dreamy sparrow
#

SOMEONE UNDERSTANDS

stray aspen
#

gpt-5 is way greater than 2.5 pro

obtuse heart
obtuse heart
stray aspen
#

i tested myself

dreamy sparrow
stray aspen
#

yes way

dreamy sparrow
#

it's doing crazy

obtuse heart
#

I've been switching between the two because theyre both so good

dreamy sparrow
#

2nd on leaderboard

#

💀

stray aspen
#

i love gpt-5 it one shots a lot of complex lua problems

dreamy sparrow
#

just needs more time to think.

#

gpt-5 is smarter cuz it thinks more

#

🙏

stray aspen
#

no it doesnt

obtuse heart
#

holy poop bro why is gpt-5 so poop at formatting on lmarena

stray aspen
#

sometimes yes

#

depends on your prompt

obtuse heart
#

what the hell is this?

ocean vortex
#

Like why would you not use the current SOTA model....

eternal niche
dreamy sparrow
#

uhhh

#

uhhhhh

prime mulch
#

Tomorrow is my birthday, maybe in some countries my age is already 23 🤧

ocean vortex
#

There's no good reason not to use it

#

tbh

stray aspen
#

stop licking companies boots

lime coral
stray aspen
#

i just use the SoTA as long as its free

ocean vortex
#

So what's wrong with that?

#

lol

#

No reason to use something else which is inferior just because you are biased or whatever

lime coral
ocean vortex
#

For the gemini website? Uhhh... that's a painful experience to use that even without looking at isolated model performance.

stray aspen
ocean vortex
#

no agentic features, no proper tool integration, very awkward censorship implementation, unacceptable usage caps...

obtuse heart
#

are you thinking of the correct model here lol

ocean vortex
obtuse heart
#

oh okay

#

yeah youre right

#

but the api free limit is REALLY generous

patent aspen
#

In any case, I think the main issue is that, even if GPT-5 is SotA, it's not SotA enough to win the long war. They're not progressing fast enough to keep pace. I prefer to use the technology that will be the long-term winner.

ornate stump
#

I came back to using ChatGPT after months with Gemini, and one thing I noticed I missed is the deep research; it's so much better. I don't know why Gemini is worse at the one thing they should be better at: searching the web 💀

stray aspen
patent aspen
obtuse heart
ocean vortex
#

search engines are usually plenty good enough for the job if you do the right queries etc

zinc ore
stray aspen
#

thats great

#

we need a long term SoTA

zinc ore
#

Which part isn't

lime coral
#

It’s true, but the source is fake

stray aspen
#

wdym the source is fake

#

it will lmao

white hatch
#

will gemini 3 be available in ai studio?

ocean vortex
lime coral
obtuse heart
#

i doubt its gonna be "significant", but i bet it will still be quite much better

#

connecting the dots, gemini releases were always monsters. I dont see why it wouldnt be the case for gemini 3

#

not until they nerf it though

keen fulcrum
#

I believe what he says

#

It makes sense

#

We have seen google stealth models performing reasonably well on arena

#

None of these were good enough for google

ocean vortex
#

Yeah I think at this point in time o3 and gpt5 are around the perfect size for maximum performance and a good update cycle. Competition had their chances when it could have been considered undersized, but things have changed now...

keen fulcrum
#

You were talking within two weeks of gpt5

#

About a leaked date by openai

#

What was nebula

zinc ore
#

Even when it was thought gpt5 could be late July or early August he was saying September release for gem

keen fulcrum
#

Referring to the server here

keen beacon
ocean vortex
#

There's no use from a huge model when it takes longer to train and can barely match the performance of a much smaller one. By the time you train it to your objective the goalpost is gonna move and smaller models become even more performant

zinc ore
#

It's because people have been relying on the fake Gemini 3 flash SS and then claiming that means it's dropping within days

lime coral
#

It’s not fake until Demis says it

zinc ore
#

Verified fake SS

ocean vortex
#

Are they gonna release ultra with no parallel requests?

keen fulcrum
lime coral
#

He loves the hype game

zinc ore
keen beacon
#

Why would they release an old gen ultra model at this point tho

zinc ore
lime coral
ocean vortex
ocean vortex
zinc ore
#

They even refer to it as Gemini 2.5 on the model card, instead of Gemini 2.5 pro

ocean vortex
#

Yeah

zinc ore
#

Curious omission

bright junco
#

Why does my gemini 2.5 pro print incompletely? Is there a way to fix it?

keen fulcrum
#

I want google to make a helpful debug model for coding

ocean vortex
#

And for the initial deepThink announcement they mentioned 2.5Pro explicitly, but now not anymore for this new version. They also said it's the same underlying model as the one they used for IMO

zinc ore
#

Yeah and IMO was referred to as a more advanced Gemini

keen fulcrum
#

I don’t see why you can’t make specialised models . Those can be trained faster and on recent data

zinc ore
#

(which caused people to speculate Gemini 3)

lime coral
ocean vortex
ocean vortex
lime coral
#

For doing search a smaller model can be better than a larger

zinc ore
#

I'm wondering if they'll release Gemini 3 pro and flash at same time, or do flash first (which I think happened previously, but maybe false memory)

ocean vortex
#

Not only that but it also performs worse than 2.5Pro on some tasks. Which indicates different model entirely. o3-pro doesn't perform worse than normal o3 on anything

keen fulcrum
eternal niche
#

btw guys gpt5 sucks

ocean vortex
keen beacon
#

Is it good and reliable?

ocean vortex
#

"Evaluate model honesty when pressured to lie" -- this is just fine-tuning for "harmless, honest and lame", perfect benchmark to inflate Claude scores though :catgrin:

keen fulcrum
ocean vortex
#

AI is a tool. If I need it to "lie" I expect it to do that. Don't need kindergarten supervision personally.

keen fulcrum
wicked root
#

is LMArena rigged? If all the analyses are pointing towards gpt, why and how is google maintaining its first place ?

ocean vortex
keen fulcrum
wicked root
#

whereas GPT5 doesn't? Is it possible GPT5 could steamroll the arena?

ocean vortex
wicked root
#

are they doing better than gpt5?

ornate agate
wicked root
ornate agate
wicked root
#

okay that's great to know

wraith trellis
#

what is the daily limit of video arena? 1, 2, or 3?

wicked root
#

How frequent are the lmarena updates?

echo aurora
wraith trellis
wicked root
wraith trellis
exotic nebula
exotic nebula
#

Unpredictable but keep on repeating till you get the one.

echo aurora
echo aurora
wraith trellis
exotic nebula
wraith trellis
echo aurora
tidal ginkgo
#

hi

#

why isn't there a video arena on the website lol

echo aurora
tidal ginkgo
#

ok

#

which model is winning btw

tidal ginkgo
#

ok

#

ty

#

why is veo 3 lower than veo 3 fast lol

exotic nebula
fleet lintel
cedar tide
#

Check out my GPT 5 No think request and vote if you're interested

zinc ore
#

I'd speculate they continue their 2.5 strategy and reserve ultra for special cases

cedar tide
#

Need mini and nano

#

It's obvious that you didn't go looking

stray aspen
#

any gemini 3 news

stray aspen
#

we already have mini and nano

#

@deep adder which is smarter, gpt-5 or grok 4

zinc ore
cedar tide
keen beacon
#

teaser

stray aspen
#

thats great

#

cant wait for gemini3

zinc ore
#

Literally, what I care about most is the progress of Genie, and I'm hoping by next year they have more viable playable simulations + permanent (or increasingly approaching this) memory.

ocean vortex
#

AA should do testing on it... This minimal one is very odd:

zinc ore
ocean vortex
#

2X less output tokens than 4.1

#

makes its score look less bad. But it's weird that it works this way

#

wdym

keen beacon
ocean vortex
#

Oh. Right, but they also did reduce o3 pricing from launch. And they spent a ton to train gpt5

#

I'm just glad they didn't INCREASE the price lol

#

nah it's new pretrained model

#

Much better spatial awareness

#

It is, but they still needed to train a successor. They also spent a ton experimenting with hybrid reasoning/router stuff

#

And added new reasoning/response options. A lot of R&D

neon idol
#

Hello

keen beacon
reef pawn
#

I still think OpenAI is an overvalued company

neon idol
reef pawn
#

It's 500 billion dollars in valuation

#

No, that is Google actually

inner gate
#

Which grok4 is used on lmarena?

neon idol
#

Did you know that the term AI was born in 1901?

neon idol
inner gate
#

Thanks man

neon idol
ocean vortex
#

I'm testing gpt5-minimal now and it actually... doesn't seem to do reasoning at all? Why would they call it "minimal" then though lol

whole wagon
#

4o is back

neon idol
#

I really hate that model

#

It's the worst model I've ever seen

whole wagon
#

they finetuned it to get ppl attached to it lol. most ethical company fr

whole wagon
#

i can kind of see how it gets ppl like that. its doing a decent job at it

keen beacon
#

it hallucinates too much

#

and no privacy

#

either

#

database could leak someday

#

Additionally, the damn 4o is a glazer

#

Or in a more formal way "servile"

exotic nebula
patent aspen
#

The first AI program was written in the 1950s

keen beacon
#

lol

wicked root
patent aspen
#

The perceptron could recognize handwritten digits manually converted to large squares

exotic nebula
patent aspen
#

It was the precursor for neural networks

keen beacon
#

agh

#

I wish something new was made

patent aspen
#

It just wasn't practical to run because there wasn't enough computing power

neon idol
#

Copilot 🤫🧏‍♂️🔥

whole wagon
ocean vortex
neon idol
#

Only in the real app

keen beacon
neon idol
#

Wait

#

there is confusion

#

i have tested gpt 5 of copilot

#

But the real gpt 5 that is in chatgpt is good

gentle plinth
ocean vortex
exotic nebula
ocean vortex
#

We are back to counting Rs in strawberry lol

exotic nebula
neon idol
keen beacon
#

lmao

neon idol
#

Without problem

keen beacon
#

in big 2025

patent aspen
#

If you want to get really technical, the idea of an artificial neural network existed mathematically in 1943

keen beacon
#

a new trend for AI

#

that specific math problem

whole wagon
#

gemini sees the issue first try lol

#

cant get GPT5 to realise

neon idol
#

Chat..

#

I understood something horrible

#

I have to make a long message sorry

whole wagon
#

put it somewhere else

gentle plinth
exotic nebula
#

Cant wait for Gemini 3 to come out and absolutely destroy GPT 5

whole wagon
#

lol

#

i dont get what they did. did they make it smaller than o3 or smth

exotic nebula
#

Google got imagen4, veo3, now genie3 and gemini 3 on the way. These guys stay ahead of the game always in each field.

eternal niche
#

guys btw gpt5 sucks

whole wagon
#

like how is that even possible. after all this time to have real regressions to o3 and they literally removed the model. i think its because gpt5 is cheaper to run

#

they tried to minimise the costs

#

and maintain performance as best as possible

ocean vortex
neon idol
#

I think that gpt 5, when it doesn't think, uses gpt 4 or at least gpt 4o

ocean vortex
#

No regressions found

neon idol
#

I have proof

whole wagon
#

i just gave one example it regresses in a large way

#

it cannot spot mistakes in graphs

gentle plinth
whole wagon
#

that o3 gets first try

ocean vortex
neon idol
ocean vortex
#

not o3

whole wagon
ocean vortex
# whole wagon

was it selected when you started the chat? It shows just gpt5 for me when I click on your link. And I don't see thinking summaries

zinc ore
#

Gemini is better without tools anyway

#

Best non tool model on market

whole wagon
#

by itself

#

this router is some bs ngl

#

sometimes it just overrides you

ocean vortex
whole wagon
#

i literally did

#

their product is broken

zinc ore
#

Have you tried explicitly telling it to think deeply?

ocean vortex
#

weird then. But yeah I'm not a huge fan of that router myself

whole wagon
ocean vortex
#

Ideally they should have 3 options - auto/thinking/chat

gentle plinth
#

but not sure how well that works

ocean vortex
neon idol
whole wagon
#

if i tell it to use the big model will it listen

#

or just route it anyways

#

any more bs? i need to go in settings and enable big hyper pro graph mistake spotter setting?

#

why cant it just work like other models

neon idol
#

Gemini 3 pls save us 😭

ocean vortex
# whole wagon by itself

Are you sure you haven't clicked on "faster response"? I recall it suggesting that to me but never proceeding with it by itself

#

Unless I confirm by clicking

whole wagon
#

it is thinking here

#

and still failing

#

what else you want i literally told it to think also

ocean vortex
whole wagon
gentle plinth
#

What exactly does it mean with "the o3 and GPT-4o bars are empty outlines, which visually read as ~0 despite the labels 69.1 and 30.8." tho

ocean vortex
#

it sees the scaling problem

keen beacon
#

lol

gentle plinth
whole wagon
#

i tried again

#

still refuses to work

ocean vortex
hallow ridge
gentle plinth
keen beacon
hallow ridge
whole wagon
#

take this down bad stuff elsewhere

#

this isnt the place

keen beacon
#

lmao

hallow ridge
keen beacon
hallow ridge
whole wagon
#

i cant get it to work it doesnt matter what you send lol. it literally will not work

#

i dont care if you get routed differently or whatever. i just expect it to work

ocean vortex
whole wagon
#

i copied your prompt

ocean vortex
#

Just ask normally "what is the problem in this screenshot?"

whole wagon
#

in the second time

ocean vortex
#

Do you have any custom instructions...?

whole wagon
#

no

ocean vortex
#

For you it just seems too concise

mossy roost
#

// Example ToolCard accessibility
<button
aria-label={`Run ${toolName} tool`}
role="button"
tabIndex={0}

hallow ridge
#

Yall think this is real or AI

whole wagon
ocean vortex
mossy roost
#

Hi

keen beacon
hallow ridge
hallow ridge
gentle plinth
#

pixel

keen beacon
hallow ridge
gentle plinth
# keen beacon ah you went that far

i mean the initial impression is that its ai, but its always hard to tell if its some compression artifacts, filter, ai enhanced image, or fully ai generated

keen beacon
#

I hope they remove the yellow tint in that version

whole wagon
#

i played games for dubesor leaderboard but the guy deleted them for some reason

#

quite annoying

#

i dont think he 'trusted' them or something

#

@gentle plinth u know this guy?

gentle plinth
#

not personally

whole wagon
#

tell him i am not trying to hijack his leaderboard lmao

#

he wont accept my games

gentle plinth
#

@earnest parcel

#

also wrote a dm

#

maybe its a bug in the leaderboard?

hallow ridge
#

I got a job and I used AI

gentle plinth
gentle plinth
#

not sure if its even setup in a way to auto upload games

whole wagon
#

it is

#

he told me he has the games

#

but he only accepted 2

#

and rejected the rest

gentle plinth
#

ok 🤔

whole wagon
#

idk. incredibly annoying that he just wouldnt trust their authenticity. i spent quite a lot of money on it

#

If they were accepted o3 would be in the leaderboard also

whole wagon
neon idol
#

Nono the prompt of your request for gpt5

gentle plinth
#

its in the shared link xD

#

or do you mean for the chess leaderboard

whole wagon
#

oh i missed this. i guess thats the discord ping then lol

gentle plinth
whole wagon
#

i would do more. but currently i think they would still be rejected

#

so theres no point

neon idol
#

Nooo

whole wagon
neon idol
#

(Dont hate me)

gentle plinth
#

its in the shared url bruh

#

"What is the problem in this screenshot?"

neon idol
#

The problem is that the chart is misleading because it stacks two mutually exclusive conditions (“without thinking” and “with thinking”) for GPT-5 as if they were additive, while for OpenAI o3 and GPT-4o it only shows a single bar without any breakdown. This makes the comparison unfair and confusing — it looks like GPT-5’s 74.9 score is “52.8 plus extra,” even though those are separate test modes that should be compared side-by-side, not summed.

#

@whole wagon correct or not?

#

Ok

#

Now lets try gemini

warm fulcrum
#

@echo aurora is there a way to view how well zenith did on the benchmarks?

static lark
#

how so

neon idol
#

@gentle plinth But what's the correct answer

gentle plinth
whole wagon
#

Is it not immediately obvious to you?

neon idol
#

Based on the information available as of August 2025, the main problem with the screenshot lies not in the data presented, which appears to be accurate, but in the misleading and inconsistent graphical representation.

Here is a detailed analysis of the issue:

Inconsistent Visual Representation: The bar representing GPT-5 is filled and colored to show two different metrics ("Without thinking" and "With thinking"). In contrast, the bars for OpenAI o3 and GPT-4o are just empty outlines with a single numerical value above them. This is a significant design inconsistency that makes the chart look incomplete or unprofessional.

Incomplete and Misleading Comparison: The legend introduces the distinction between "With thinking" and "Without thinking," but this breakdown is visually applied only to GPT-5. The chart does not clarify whether the scores for OpenAI o3 and GPT-4o were achieved with or without a similar "thinking" capability. This creates an ambiguous comparison, leaving the viewer to wonder if the models were tested under the same conditions.

Data Accuracy: Despite the problematic graphics, the data itself seems to reflect OpenAI's announcements from early August 2025.

GPT-5: The model was released in August 2025. Its score of 74.9% on the SWE-bench Verified benchmark with the "thinking" feature enabled has been confirmed by multiple sources.

OpenAI o3: This is a real model announced in late 2024, known for its reasoning capabilities.

GPT-4o: The score of approximately 30.8% is consistent with reported data for this model on the same benchmark. An OpenAI publication from August 2024 indicated a score of 33.2%.

In conclusion, while the numbers are likely correct in the context of their release, the chart presents them in a way that is visually inconsistent and does not allow for a clear and fair comparison of the different models' performance.

#

@gentle plinth gemini answer

#

Correct?

gentle plinth
#

if i am not missing anything, it hasnt gotten the main problem

#

so not correct

neon idol
#

Partially correct

gentle plinth
#

same as gpt-5, but i mean it has to see it, its just a huge flaw

neon idol
#

I think that any models can actually resolve it

whole wagon
#

i tried in ai studio. it got it every time

neon idol
gentle plinth
#

seems like it used google search

#

maybe this confused it?

neon idol
#

Let's try Xi Jin ping models 🔥🔥🔥

keen beacon
gentle plinth
#

do they even have vision?

whole wagon
#

some do

keen beacon
#

must be for resource reasons or smth

neon idol
warm fulcrum
# echo aurora there is not

unfortunate. do u atleast know what it got on swe benchmark? or any other benchmark for that matter. would be nice if you could provide me with some information

neon idol
#

@gentle plinth bro but the problem is that the rectangles for o3 and 4o are the same length but the numbers are different?

gentle plinth
#

52.8 not > 69.1

neon idol
#

Yeah

gentle plinth
#

and 69.1 not = 30.8
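
The flaw described here (a 52.8 bar drawn taller than a 69.1 bar, and a 69.1 bar drawn the same height as a 30.8 bar) can be checked mechanically. The sketch below is only an illustration: the labelled values come from the discussion above, while the drawn heights in pixels and the `find_inconsistencies` helper are assumptions made up for the example, since the screenshot itself is not available here.

```python
from itertools import combinations


def find_inconsistencies(bars):
    """Return pairs of bars whose drawn heights contradict their labelled values.

    `bars` maps a label to (labelled_value, drawn_height). A pair is flagged
    when the ordering of the heights differs from the ordering of the values,
    including the case where unequal values are drawn at equal heights.
    """
    problems = []
    for (name_a, (val_a, h_a)), (name_b, (val_b, h_b)) in combinations(bars.items(), 2):
        value_order = (val_a > val_b) - (val_a < val_b)   # -1, 0, or 1
        height_order = (h_a > h_b) - (h_a < h_b)          # -1, 0, or 1
        if value_order != height_order:
            problems.append((name_a, name_b))
    return problems


# Values are the labels quoted in the chat; heights are hypothetical pixels.
bars = {
    "GPT-5 without thinking": (52.8, 200),
    "OpenAI o3": (69.1, 120),
    "GPT-4o": (30.8, 120),
}

for a, b in find_inconsistencies(bars):
    print(f"Drawn heights of '{a}' and '{b}' do not match their values")
```

With these assumed heights the check flags the same two pairs that were pointed out in the chat; a chart drawn to scale would produce an empty list.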

neon idol
#

@keen beacon i am sorry for you but Qwen failed

keen beacon
#

Btw, if you guys wanna search for good math problems to try. OpenStax has free educational books with examples and such to try (For AI).

whole wagon
#

At least GPT5 is less deceptive

keen beacon
#

has other subjects as well

neon idol
keen beacon
#

Did not know such was an option

gentle plinth
#

i think for good prompts you need to find some old used books which werent sold well, with riddles or math problems, so that they arent in training data

#

obv, some of the answers in these books could be wrong, but most of them should be a better test than some question on the internet

neon idol
#

I would like to test deepseek for this prompt but I want an answer before my death 💀

keen beacon
#

Gonna find some late 1800s math books

#

lmao

gentle plinth
#

that could also be in training data

keen beacon
#

oh

gentle plinth
#

if you find it in the internet

whole wagon
#

just make the problems

neon idol
#

Deepseek made a stupid answer

#

Now I will try kimi k2 and doubao

gentle plinth
#

tried mistral small running locally

#

"The problem highlighted in this screenshot is the performance discrepancy among these models, particularly the significantly lower accuracy of GPT-4o compared to GPT-5 and OpenAI o3." is kind of going in the right direction

#

but not really

neon idol
whole wagon
#

The tracking ai guys have a good offline benchmark. And they show the public one as well so you can see the huge difference. This is the offline one

#

This is the public one

echo aurora
gentle plinth
keen beacon
whole wagon
solid brook
neon idol
#

Incorrect answer

solid brook
#

Explain to me how is gpt 5 thinking worse than o3?

neon idol
#

Lets see if they will get right answer

#

The screenshot has several issues related to the visualization of the data:

  1. Misuse of Stacked Bars:

    • Stacked bars imply that the segments (e.g., "Without thinking" and "With thinking") are parts of a whole, but these are separate metrics (two distinct evaluations). Stacking them incorrectly suggests they sum to a total percentage (e.g., 74.9 + 52.8 = 127.7%, which is impossible for accuracy metrics).
  2. Inconsistency in Representation:

    • Only GPT-5 is shown with a stacked bar, while OpenAI o3 and GPT-4o are single bars. This breaks the comparison across models and creates confusion about whether the other models were evaluated under both conditions.
  3. Confusing Labeling:

    • The labels (e.g., "74.9" and "52.8") overlap on GPT-5’s bar, making readability difficult. A clearer approach would use side-by-side bars for each model under both conditions.
  4. Misleading Scale:

    • The y-axis starts at 0%, but the truncated bars (e.g., OpenAI o3 at 69.1%) might distort perceived differences if the full scale isn’t visible (e.g., the gap between 30.8% and 69.1% appears larger than it is).

Recommendations:

  • Use side-by-side bars (grouped) for each model under "Without thinking" and "With thinking" conditions.
  • Ensure consistency in how all models are represented.
  • Label bars clearly without overlapping text.
  • Maintain a full y-axis scale (0% to 100%) for proportional accuracy comparisons.

This would resolve the misrepresentation of data and improve clarity for comparing model performance.

#

Kimi 1.5 answer

neon idol
willow grail
#

what is the best way currently to get the most gpt5 high prompts for least money?
poe.com?
i told gpt5 high on poe to write code. wrote 1700 lines.
and with that, i can do 1000 prompts monthly for 22 euro.
anyone offering cheaper??

cedar tide
#

You see this good model ?

neon idol
cedar tide
#

no one has mentioned that they have arrived in the rankings at a good place

#

I know

keen beacon
mossy roost
#

@neon idol @neon idol @cedar tide @cedar tide

cedar tide
keen beacon
#

and glm

#

Really cheap too

cedar tide
mossy roost
#

@keen beacon @keen beacon

#

956626939152576543

keen beacon
#

Havent heard about it.

mossy roost
#

Hi

neon idol
#

Stfu

#

Dont ping me for this

eternal niche
#

btw gpt5 sucks

keen beacon
stray aspen
keen beacon
willow grail
#

😛

stray aspen
#

Gpt-5 is smart AF

willow grail
#

also fix ur mic. ur quiet

#

@stray aspen

stray aspen
#

What's up

willow grail
echo dome
#

welcome to baldi's basics

torn mantle
#

how can i fix my mic

#

tell me

willow grail
#

neduo tho... oh satan... his mic is bad

echo aurora
#

I'm going to be running this poll periodically, we'd love to understand better why.

golden ocean
#

real

keen beacon
#

Hard to say but I put "battle"

echo aurora
echo aurora
#
Poll: What version do you use the most?

Winning answer: Direct (7 of 14 votes)

stray aspen
#

I love gpt-5

earnest parcel
# whole wagon oh i missed this. i guess thats the discord ping then lol

it's on my to-do btw. good news is that reject rate is extremely low (you can check well over 1k games&replays currently). make sure to not spam the same matchups though as that would invalidate any scoring and doesn't represent real elo. or fork the project and hook up your db and go crazy with it. I purposefully didn't obfuscate or minimize any chess code so you can use it (MIT), and the original code is linked also.
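
As a rough illustration of why repeating one matchup says little about overall strength, here is a minimal sketch of the textbook Elo update. It is an assumption that the chess leaderboard uses anything like this; the K-factor of 32, the starting ratings, and the function names are all hypothetical and not taken from the project's code.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one game.

    score_a is 1.0 for a win by A, 0.5 for a draw, 0.0 for a loss.
    k is an assumed K-factor; real leaderboards may use a different value.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Hypothetical scenario: the same pairing is replayed ten times and A always wins.
grinder, victim = 1500.0, 1500.0
for _ in range(10):
    grinder, victim = update_ratings(grinder, victim, score_a=1.0)

print(round(grinder), round(victim))
```

In this sketch both ratings move apart based on a single pairing only, so the resulting numbers describe that one matchup rather than strength against the whole field.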

tidal ginkgo
#

what is going on?

#

every time i give a prompt, this automatically appears

obsidian shell
#

refresh

maybe another annoying cloudflare check...

tardy cedar
#

can I generate 9:16 videos guys ?

stray aspen
#

copilot cant remember previous messages

tidal ginkgo
#

lol

boreal timber
#

I'm here to explore and also share my prompts ideas

round zinc
#

I keep getting "Something went wrong with this response, please try again." Error while chatting with claude models. Im tired of it, it's bothersome

verbal nimbus
#

I think it might be routing some requests to a smaller model

stray aspen
#

a lot of times

stray aspen
#

thats what gpt-5 chat does

round zinc
#

But still nothing

stray aspen
#

keep hitting it

#

or create a new conversation

round zinc
verbal nimbus
#

It adds a bunch of crazy implementation requirements when I just asked it to summarize the chat

round zinc
#

I tried hitting it almost like 30 times

blazing bison
#

a lot of openai partners: look gpt-5 is the best model in the world
people: why it sucks so much

#

someone is lying

blazing bison
willow grail
ocean vortex
#

The issue with GPT-5 on the website is that the name implies SOTA-level answers, but you can get GPT-4.1-level answers for things you would previously have used o3 for. They need to rework the model switcher options IMO

#

Like auto/quick/thinking

untold kayak
#

Hi everyone!! My name is Patricio, called umpalumpa while playing games sometimes. I’m a motion designer! Best 🤙🏻✌🏻

stray aspen
#

and its pretty good

stray aspen
blazing bison
#

every opportunity you want to announce this scam site

stray aspen
#

wdym scam

#

it aint a scam

#

i tested them models

#

they have similar answers

verbal nimbus
#

The one on LMArena is GPT-5-Thinking:high right?

stray aspen
#

no its probably medium

#

but its still pretty good

verbal nimbus
#

Yeah but the actual version is bad

#

Companies shouldn't be allowed to call a model one thing on LMArena then serve another with the same name

pseudo hemlock
#

is P2L dead again?

#

im trying it on the legacy website and its not working 😭

tidal ginkgo
#

hi

pseudo hemlock
golden ocean
wintry citrus
#

i got banned off yupp.ai discord for saying no im gonna keep spamming

#

is it that serious

#

i don't feel like it is

#

FREEDOM FOR SERVERS

#

AMERICIA

#

🦅🦅🦅

#

WHAT'S A KILOMEY

#

KILOMETER I MEAN

#

🤑🤑🤑🤑🤑🤑🦅🦅🦅🦅🦅🦅🦅

#

yeah i know

#

i actually hate the US

#

🙂

#

F THE US

#

NO FREEDOM

#

OBESE

#

🔥🔥🔥🔥🔥

#

wot

#

why did u delete it

wintry citrus
#

what's tariffs

#

DON'T DELETE IT

#

TELL ME

stray aspen
#

where are you from

wintry citrus
#

WHO'S TARRIFS

stray aspen
#

unlocker

wintry citrus
#

don't even need the actual name

#

🙏

stray aspen
#

egypt

wintry citrus
#

IM GONNA FIND UR HOUSE

#

what's that

#

that's Bitcoin

#

cool

#

tho why did u send it

#

send me a Bitcoin?

cedar tide
#

@echo aurora add GPT 5 mini and nano to webdev

stray aspen
#

cant understand how qwen 3 is behind gpt-5 in coding benchmark on lmarena

#

thats crazy

wintry citrus
#

ooo

#

u sent an image

#

of

#

a b

#

so

#

give it

#

i need it

#

please

#

same as being gay

#

in us only

stray aspen
#

craig stop saying messed up crap and deleting it

wintry citrus
#

sorry i have dementia

stray aspen
#

me

#

im craig

wintry citrus
#

oh

#

hey Craig

#

can u give me that medicine

#

named asphantisj

#

or

#

what u said

#

it's name was

#

asphawtusja

#

?

#

what's that

#

aphmetaoejsmkssj

#

aphmetaaozn?

stray aspen
#

you're almost there

wintry citrus
#

aphmetazinsii

#

aphmeta

#

aphmetaliaz

#

yeah idk

#

is it a word

#

wait who the hell is Craig

#

cipher

#

wow

stray aspen
#

lol

wintry citrus
#

im that smart

#

im saying cipher shet

#

no

#

buy a nokia

#

it's way better

#

not even for ai

#

but it's better

obsidian shell
#

wtf do you need a cpu for?

#

nice stock pic

misty vault
#

crack

wintry citrus
#

this is an image which the hacker saved to my images after hacking my phone

#

what a cool malware

#

🙂

misty vault
#

NVIDIA A10G

#

NVIDIA L40

stray aspen
#

thats nice

#

my graphics card lol

misty vault
stray aspen
#

ill run gpt-oss

#

on this

golden ocean
wintry citrus
#

are u gonna ask everything to gpt-5

wintry citrus
#

what am i even seeing

stray aspen
wintry citrus
#

humans are smarter than chatgpt gpt-5

wintry citrus
#

haha

#

get beaten

misty vault
stray aspen
wintry citrus
#

not

#

hand

#

gesture

#

🙂

misty vault
wintry citrus
#

dumbass

stray aspen
#

wdym how lol

wintry citrus
#

wait

stray aspen
#

its so obvious

#

it hallucinated copyright licenses and apache crap in a python script

#

and desmos links

misty vault
#

dead internet theory

wintry citrus
#

SAMUEL

#

wow

#

STOP DELETING UR IMAGES

#

IM GONNA SHART

#

ON U

#

AHHHHHHH

#

yeah because they're fat pigeons

#

ofc it's America

#

everything is fat

#

even the animals

#

yeah i know

#

it's true

#

it's because

#

THEY'RE GAY

#

and they're the strongest country

#

so obv

#

they're

#

the richest

#

so

#

no shet

#

they're richer

#

thank me

#

than*

misty vault
wintry citrus
#

IT'S DEAD INTERNET THOERY

#

i spelled it wrong

#

kill my already