#general | Arena | Page 92

dreamy sparrow Aug 9, 2025, 4:13 PM

#

we rob them

obtuse heart Aug 9, 2025, 4:13 PM

#

why did you redeem it

#

why did you redeem it?!

dreamy sparrow Aug 9, 2025, 4:13 PM

#

obtuse heart why did you redeem it

me not INDIAN

#

American people will fall for food

#

90% is obese or something

#

FREEDOM

#

🦅🦅🦅

#

🦅🦅🦅🦅

#

🦅🦅🦅🦅🦅🦅🦅🦅

#

RAAAAAAAAAAAAAAAAHHHHHHHHHHHHHHH

blazing bison Aug 9, 2025, 4:14 PM

#

200 dollars for 5 prompts per day

dreamy sparrow Aug 9, 2025, 4:15 PM

#

blazing bison 200 dollars for 5 prompts per day

what the

#

WHAT

blazing bison Aug 9, 2025, 4:15 PM

#

i don't think it's worth it

dreamy sparrow Aug 9, 2025, 4:15 PM

#

PER DAY

#

5

blazing bison Aug 9, 2025, 4:15 PM

#

yes

dreamy sparrow Aug 9, 2025, 4:15 PM

#

200 DOLLARS

tidal ginkgo Aug 9, 2025, 4:15 PM

#

lol

dreamy sparrow Aug 9, 2025, 4:15 PM

#

WHAT

blazing bison Aug 9, 2025, 4:15 PM

#

yeah

dreamy sparrow Aug 9, 2025, 4:15 PM

#

I'll take it for a cent

blazing bison Aug 9, 2025, 4:15 PM

#

better to pay for claude or openai, those have almost unlimited models

tidal ginkgo Aug 9, 2025, 4:15 PM

#

why would u use it anyways

dreamy sparrow Aug 9, 2025, 4:15 PM

#

tidal ginkgo why would u use it anyways

idk

#

i wanna test it..

blazing bison Aug 9, 2025, 4:16 PM

#

it's not worth it

#

waste of money

dreamy sparrow Aug 9, 2025, 4:16 PM

#

blazing bison better to pay for claude or openai, those have almost unlimited models

eh Gemini 2.5 pro is still good

#

hehe

blazing bison Aug 9, 2025, 4:16 PM

#

you have it for free unlimited on aistudio

dreamy sparrow Aug 9, 2025, 4:16 PM

#

blazing bison you have it for free unlimited on aistudio

yeah

#

and code execution gives it real time data

obtuse heart Aug 9, 2025, 4:17 PM

#

blazing bison better to pay for claude or openai, those have almost unlimited models

claude is hella expensive tho...

dreamy sparrow Aug 9, 2025, 4:17 PM

#

grok 4 heavy ahh model

blazing bison Aug 9, 2025, 4:18 PM

#

grok 4 heavy sucks

#

it's the worst pro model of all

#

the better one is deepthink

eternal niche Aug 9, 2025, 4:18 PM

#

brother

obtuse heart Aug 9, 2025, 4:19 PM

#

eternal niche brother

https://tenor.com/view/chinese-mangal-war-dancing-chinese-guy-dancing-with-food-dance-gif-3180111146851375689


eternal niche Aug 9, 2025, 4:20 PM

#

scam americans. It's popular in Russia

dreamy sparrow Aug 9, 2025, 4:20 PM

#

eternal niche scam americans. It's popular in Russia

what's Russia

eternal niche Aug 9, 2025, 4:20 PM

#

whats whats Russia

keen beacon Aug 9, 2025, 4:21 PM

#

obtuse heart https://tenor.com/view/chinese-mangal-war-dancing-chinese-guy-dancing-with-food-...

What food is that, lol

obtuse heart Aug 9, 2025, 4:22 PM

#

no idea, probably just a bunch of meat together

keen beacon Aug 9, 2025, 4:23 PM

#

obtuse heart no idea, probably just a bunch of meat together

or perhaps diabeetus food

#

https://tenor.com/view/diabeetus-wilford-sugar-dessert-diabetes-gif-14969762


echo aurora Aug 9, 2025, 4:23 PM

#

AI please

eternal niche Aug 9, 2025, 4:24 PM

#

gpt5 sucks guys

#

gemini 2.5 pro better

#

agree? agree.

ocean vortex Aug 9, 2025, 4:25 PM

#

yandexgpt

keen beacon Aug 9, 2025, 4:25 PM

#

eternal niche agree? agree.

Why do you have to spam the same stuff all the time

#

Gets boring.

eternal niche Aug 9, 2025, 4:26 PM

#

ocean vortex yandexgpt

the best

eternal niche Aug 9, 2025, 4:26 PM

#

keen beacon Why do you have to spam the same stuff all the time

because it is fact

ocean vortex Aug 9, 2025, 4:26 PM

#

eternal niche because it is fact

It isn't and you know that

eternal niche Aug 9, 2025, 4:26 PM

#

ocean vortex It isn't and you know that

it is

ocean vortex Aug 9, 2025, 4:26 PM

#

eternal niche the best

?

eternal niche Aug 9, 2025, 4:27 PM

#

ocean vortex ?

!

keen beacon Aug 9, 2025, 4:27 PM

#

eternal niche it is

https://tenor.com/view/smh-gif-smh-meme-smh-steve-harvey-i-can't-gif-13893533684179296052


ocean vortex Aug 9, 2025, 4:27 PM

#

explain

eternal niche Aug 9, 2025, 4:27 PM

#

no

ocean vortex Aug 9, 2025, 4:27 PM

#

How is it the best?

#

by being the worst

eternal niche Aug 9, 2025, 4:28 PM

#

it is the best

dreamy sparrow Aug 9, 2025, 4:43 PM

#

ocean vortex How is it the best?

Gemini 2.5 pro?

wicked root Aug 9, 2025, 4:45 PM

#

Is gpt5 winning?

ocean vortex Aug 9, 2025, 4:48 PM

#

dreamy sparrow Gemini 2.5 pro?

no yandexgpt - the worst AI in the human history I mean

dreamy sparrow Aug 9, 2025, 4:48 PM

#

ocean vortex no yandexgpt - the worst AI in the human history I mean

oh cool

ocean vortex Aug 9, 2025, 4:48 PM

#

Ok but for more serious discussion... this is peculiar and worth discussing:

dreamy sparrow Aug 9, 2025, 4:48 PM

#

ocean vortex Ok but for more serious discussion... this is peculiar and worth discussing:

fake as FU-

ocean vortex Aug 9, 2025, 4:49 PM

#

dreamy sparrow fake as FU-

It's minimal reasoning effort

#

so lower than "low"

dreamy sparrow Aug 9, 2025, 4:49 PM

#

Gemini 2.5 pro with tools is better than grok 4

#

and GPT-5 ISN'T EVEN THAT GOOD

ocean vortex Aug 9, 2025, 4:49 PM

#

Makes sense since they have a separate gpt5-chat-latest model with no reasoning

#

that one performs better than gpt4.1 for sure

dreamy sparrow Aug 9, 2025, 4:50 PM

#

wow gpt-5 medium is better than Gemini 2.5 pro

#

this is straight up ass

ocean vortex Aug 9, 2025, 4:50 PM

#

But this is probably the reason gpt5 isn't 100% hybrid

#

it's not easy to make it perform as well with reasoning disabled as a model that isn't a reasoning model in the first place

#

Especially when you are training it for so many different reasoning options

hollow imp Aug 9, 2025, 4:51 PM

#

dreamy sparrow Gemini 2.5 pro with tools is better than grok 4

What tools

dreamy sparrow Aug 9, 2025, 4:51 PM

#

hollow imp What tools

python

#

and google search

blazing bison Aug 9, 2025, 4:51 PM

#

hollow imp Aug 9, 2025, 4:51 PM

#

I find gemini 2.5 grounding very bad

#

The best web search experience I've had is o3 search on lmarena

dreamy sparrow Aug 9, 2025, 4:52 PM

#

hollow imp I find gemini 2.5 grounding very bad

why?

ocean vortex Aug 9, 2025, 4:52 PM

#

I'm trying to steer this away from politics lmao

dreamy sparrow Aug 9, 2025, 4:52 PM

#

it searches on google

dreamy sparrow Aug 9, 2025, 4:52 PM

#

hollow imp The best web search experience I've had is o3 search on lmarena

it's basically using Google lmao

#

there's not a difference bro 😭

#

do deep research on Gemini

blazing bison Aug 9, 2025, 4:52 PM

#

there is, grounding is an LLM checking an LLM

#

it can fail too

dreamy sparrow Aug 9, 2025, 4:52 PM

#

blazing bison there is, grounding is an LLM checking an LLM

is o3 better

#

no

#

i think

wicked root Aug 9, 2025, 4:53 PM

#

ocean vortex Ok but for more serious discussion... this is peculiar and worth discussing:

where's this from?

blazing bison Aug 9, 2025, 4:53 PM

#

you can't trust llms, you need to check their sources

ocean vortex Aug 9, 2025, 4:53 PM

#

wicked root where's this from?

https://artificialanalysis.ai/models/gpt-5-minimal

GPT-5 (minimal) - Intelligence, Performance & Price Analysis | Arti...

Analysis of OpenAI's GPT-5 (minimal) and comparison to other AI models across key metrics including quality, price, performance (tokens per second & time to first token), context window & more.

dreamy sparrow Aug 9, 2025, 4:54 PM

#

ocean vortex https://artificialanalysis.ai/models/gpt-5-minimal

did they use Gemini with tools

#

or just randomly

#

im pretty sure it's not built in with the model idk

wicked root Aug 9, 2025, 4:55 PM

#

ocean vortex https://artificialanalysis.ai/models/gpt-5-minimal

does this mean GPT5 will win lmarena leaderboard?

ocean vortex Aug 9, 2025, 4:55 PM

#

dreamy sparrow did they use Gemini with tools

No it wouldn't be fair. No model is tested with tools

#

Though gpt5 with tools would destroy 2.5Pro with tools to be fair

#

OpenAI tool integration is much better

dreamy sparrow Aug 9, 2025, 4:56 PM

#

gpt-5 uses tools?

#

like manually

ocean vortex Aug 9, 2025, 4:57 PM

#

dreamy sparrow gpt-5 uses tools?

Obviously. I think OpenAI were the first ones to introduce ReAct with o3

dreamy sparrow Aug 9, 2025, 4:57 PM

#

ocean vortex Obviously. I think OpenAI were the first ones to introduce ReAct with o3

wow

ocean vortex Aug 9, 2025, 4:57 PM

#

gpt5 uses that as well - tool calling while reasoning

dreamy sparrow Aug 9, 2025, 4:57 PM

#

ocean vortex gpt5 uses that as well - tool calling while reasoning

also is gpt-5 high on yupp ai real

#

or fake like everything else

ocean vortex Aug 9, 2025, 4:58 PM

#

what

#

It clearly does

dreamy sparrow Aug 9, 2025, 4:58 PM

#

ocean vortex what

ignore that elon musk glazer please

#

answer me

#

yk there's a question i saw

#

u can't solve

#

without using

#

code execution

#

it's impossible

ocean vortex Aug 9, 2025, 5:00 PM

#

dreamy sparrow Aug 9, 2025, 5:00 PM

#

bro

#

please

#

HOW DOESN'T IT

#

please someone explain to me this guy

#

Please

#

please

#

it doesn't use python?

#

..

#

have u ever been on ai studio

#

so it does use it

#

it could do function calling tho

#

:)

#

fock openai

#

closedai

#

only actual open model

#

tho they have their models

#

in

#

their

#

website

#

for free

#

unlimited

#

uh why would they make it open sourced

#

if it's free

indigo hazel Aug 9, 2025, 5:03 PM

#

what's the difference between gpt 5 and gpt5 chat on arena?

dreamy sparrow Aug 9, 2025, 5:03 PM

#

and unlimited

#

..

#

are u alright

#

if i was American i would understand that

#

but

#

yeah

ocean vortex Aug 9, 2025, 5:03 PM

#

indigo hazel what's the difference between gpt 5 and gpt5 chat on arena?

gpt5 is a reasoning model (high reasoning effort too for lmarena I think) and gpt5-chat no reasoning

indigo hazel Aug 9, 2025, 5:03 PM

#

ocean vortex gpt5 is a reasoning model (high reasoning effort too for lmarena I think) and gp...

thank you very much

dreamy sparrow Aug 9, 2025, 5:03 PM

#

there's only 1 model that's open

#

IT IS

#

CLOSEDAI

misty star Aug 9, 2025, 5:04 PM

#

dreamy sparrow Aug 9, 2025, 5:04 PM

#

well ofc

#

they won't

#

do u even know

#

why we make fun

#

of openai

#

?

#

IT'S NAMED OPENAI

#

IS GOOGLE

#

NAMED OPENOOGLE

#

HELL NO IT GODDAMN ISN'T

#

yelling in text

#

really bro

#

sure give me help

#

10 dollars i guess

#

paypal

#

or

#

venmo

echo aurora Aug 9, 2025, 5:05 PM

#

I think you both should just block each other

dreamy sparrow Aug 9, 2025, 5:05 PM

#

echo aurora I think you both should just block each other

i thought he blocked me tbh

#

he didn't talk for like 2 minutes

#

when i

#

replied

#

to him

#

because he's married to elon musk

dreamy sparrow Aug 9, 2025, 5:06 PM

#

echo aurora I think you both should just block each other

hey pineapple, isn't Elon musk a fraud

#

is that a yes

#

alright

#

bro that elon musk has

#

grok

#

as his

#

pronouns

#

omg

#

omf

gentle plinth Aug 9, 2025, 5:07 PM

#

dreamy sparrow omg

dreamy sparrow Aug 9, 2025, 5:07 PM

#

gentle plinth

i said omg once

#

.

#

did u wait

#

to send

#

that

#

image

#

were u watching

#

for 10 min

#

now

#

?

wicked root Aug 9, 2025, 5:08 PM

#

alright so has there been any changes to the leaderboard?

#

Or any rumors of a change thereof?

dreamy sparrow Aug 9, 2025, 5:08 PM

#

wicked root alright so has there been any changes to the leaderboard?

no gpt-5 is still top

#

in the leaderboard

wicked root Aug 9, 2025, 5:08 PM

#

yes I know that

#

but has there been a shift in opinions, analysis, etc?

dreamy sparrow Aug 9, 2025, 5:09 PM

#

nope

gentle plinth Aug 9, 2025, 5:09 PM

#

wicked root alright so has there been any changes to the leaderboard?

#

so answer is not yet

golden ocean Aug 9, 2025, 5:12 PM

#

polymarket

wicked root Aug 9, 2025, 5:12 PM

#

golden ocean polymarket

yeah, google odds are down by 5% to 74%

#

Plus, I don't like the confidence interval on the leaderboard. Gpt5 does seem like a tough contender.

lime coral Aug 9, 2025, 5:19 PM

#

The latest gemma is already better. Better world knowledge and multilingualism at 27B, it’s actually hilarious

dreamy sparrow Aug 9, 2025, 5:20 PM

#

deepseek has some explaining to do

lime coral Aug 9, 2025, 5:22 PM

#

Cry

dreamy sparrow Aug 9, 2025, 5:23 PM

#

I'd rather wait for Gemini 3 to release than use gpt-5

wicked root Aug 9, 2025, 5:23 PM

#

so you're expecting google will lose as the votes increase?

#

I mean the confidence interval already points to the very possibility

lime coral Aug 9, 2025, 5:24 PM

#

dreamy sparrow I'd rather wait for Gemini 3 to release than use gpt-5

Wise man

dreamy sparrow Aug 9, 2025, 5:24 PM

#

wicked root so you're expecting google will lose as the votes increase?

google is gonna be the smartest ai 100%

#

there's no debating it

keen beacon Aug 9, 2025, 5:25 PM

#

dreamy sparrow deepseek has some explaining to do

Deepseek did use some distilled data from OpenAI with some model. I don't know which one.

wicked root Aug 9, 2025, 5:25 PM

#

dreamy sparrow google is gonna be the smartest ai 100%

yeah I know, but this is for this month.

keen beacon Aug 9, 2025, 5:25 PM

#

But that is hallucination

ocean vortex Aug 9, 2025, 5:28 PM

#

dreamy sparrow I'd rather wait for Gemini 3 to release than use gpt-5

why wait for Gemini 3 if you can wait for Gemini 4 instead?

stray aspen Aug 9, 2025, 5:28 PM

#

dreamy sparrow deepseek has some explaining to do

this also happened to me with gemini

ocean vortex Aug 9, 2025, 5:28 PM

#

Don't use 3.0, wait for 4.0

stray aspen Aug 9, 2025, 5:28 PM

#

it made the author of a script gpt-4

dreamy sparrow Aug 9, 2025, 5:28 PM

#

ocean vortex why wait for Gemini 3 if you can wait for Gemini 4 instead?

why wait for Gemini 4.0

#

when u can have

#

GEMINI

#

5.0

#

!!!!

#

!!!!!!!!!

stray aspen Aug 9, 2025, 5:28 PM

#

why wait for gemini 3 when he has gpt-5

dreamy sparrow Aug 9, 2025, 5:29 PM

#

stray aspen why wait for gemini 3 when he has gpt-5

no.

ocean vortex Aug 9, 2025, 5:29 PM

#

dreamy sparrow !!!!

Yeah exactly. The point is to not use Gemini

dreamy sparrow Aug 9, 2025, 5:29 PM

#

ocean vortex Yeah exactly. The point is to not use Gemini

what

ocean vortex Aug 9, 2025, 5:29 PM

#

😇

dreamy sparrow Aug 9, 2025, 5:29 PM

#

u probably did use Gemini before

obtuse heart Aug 9, 2025, 5:29 PM

#

stray aspen why wait for gemini 3 when he has gpt-5

if gemini 2.5 pro can still stand against the newest models right now, then no doubt gemini 3 will absolutely dogwalk gpt-5 and the other "Best" models

dreamy sparrow Aug 9, 2025, 5:29 PM

#

didn't u

dreamy sparrow Aug 9, 2025, 5:29 PM

#

obtuse heart if gemini 2.5 pro can still stand against the newest models right now, then no d...

exactly

#

EXACTLY

stray aspen Aug 9, 2025, 5:29 PM

#

obtuse heart if gemini 2.5 pro can still stand against the newest models right now, then no d...

not really

dreamy sparrow Aug 9, 2025, 5:29 PM

#

SOMEONE UNDERSTANDS

stray aspen Aug 9, 2025, 5:29 PM

#

gpt-5 is way greater than 2.5 pro

obtuse heart Aug 9, 2025, 5:29 PM

#

stray aspen not really

yes really

obtuse heart Aug 9, 2025, 5:29 PM

#

stray aspen gpt-5 is way greater than 2.5 pro

not way, only slightly

stray aspen Aug 9, 2025, 5:29 PM

#

i tested myself

dreamy sparrow Aug 9, 2025, 5:29 PM

#

stray aspen gpt-5 is way greater than 2.5 pro

not way

stray aspen Aug 9, 2025, 5:29 PM

#

yes way

dreamy sparrow Aug 9, 2025, 5:30 PM

#

stray aspen i tested myself

gemini 2.5 pro is the oldest of the top models

#

it's doing crazy

obtuse heart Aug 9, 2025, 5:30 PM

#

I've been switching between the two because theyre both so good

dreamy sparrow Aug 9, 2025, 5:30 PM

#

2nd on leaderboard

#

💀

stray aspen Aug 9, 2025, 5:31 PM

#

i love gpt-5 it one shots a lot of complex lua problems

dreamy sparrow Aug 9, 2025, 5:31 PM

#

stray aspen i love gpt-5 it one shots a lot of complex lua problems

Gemini 2.5 with a prompt can do that

#

just needs more time to think.

#

gpt-5 is smarter cuz it thinks more

#

🙏

stray aspen Aug 9, 2025, 5:31 PM

#

no it doesn't

obtuse heart Aug 9, 2025, 5:31 PM

#

holy poop bro why is gpt-5 so poop at formatting on lmarena

stray aspen Aug 9, 2025, 5:32 PM

#

sometimes yes

#

depends on your prompt

obtuse heart Aug 9, 2025, 5:32 PM

#

what the hell is this?

#

:Broken_Heart_AE:

ocean vortex Aug 9, 2025, 5:32 PM

#

dreamy sparrow u probably did use Gemini before

yeah but your statement of "I'd rather wait for Gemini 3 to release than use gpt-5" seemed like an irrational mash of words in need of extreme response lol

#

Like why would you not use the current SOTA model....

#

:catgrin:

eternal niche Aug 9, 2025, 5:33 PM

#

obtuse heart what the hell is this?

gpt5 sucks

dreamy sparrow Aug 9, 2025, 5:33 PM

#

ocean vortex Like why would you not use the current SOTA model....

uh

#

uhhh

#

uhhhhh

prime mulch Aug 9, 2025, 5:33 PM

#

Tomorrow is my birthday, maybe in some countries my age is 23 🤧

ocean vortex Aug 9, 2025, 5:33 PM

#

There's no good reason not to use it

#

tbh

stray aspen Aug 9, 2025, 5:33 PM

#

ocean vortex yeah but your statement of "I'd rather wait for Gemini 3 to release than use gpt...

yeah what the hell

#

stop licking companies boots

lime coral Aug 9, 2025, 5:33 PM

#

ocean vortex Like why would you not use the current SOTA model....

GPT5 is not the best everywhere, and where it is, it's not by a mile

stray aspen Aug 9, 2025, 5:33 PM

#

i just use the SoTA as long as it's free

ocean vortex Aug 9, 2025, 5:34 PM

#

lime coral GPT5 is not the best everywhere, and where it is, it's not by a mile

it's irrelevant by how much it beats everything. The point is that it does

#

So what's wrong with that?

#

lol

#

No reason to use something else which is inferior just because you are biased or whatever

lime coral Aug 9, 2025, 5:36 PM

#

ocean vortex it's irrelevant by how much it beats everything. The point is that it does

Why would someone need to change their current setup for a 0.1% increase in perf? Just using the correct prompt makes that vanish

ocean vortex Aug 9, 2025, 5:36 PM

#

For the gemini website? Uhhh... that's a painful experience even without looking at isolated model performance.

stray aspen Aug 9, 2025, 5:37 PM

#

lime coral Why someone would be in need to change its current setup for 0.1% increase in pe...

because the improvement is more than 0.1% lmao

ocean vortex Aug 9, 2025, 5:37 PM

#

no agentic features, no proper tool integration, very awkward censorship implementation, unacceptable usage caps...

obtuse heart Aug 9, 2025, 5:38 PM

#

ocean vortex no agentic features, no proper tool integration, very awkward censorship impleme...

unacceptable usage caps...?

#

are you thinking of the correct model here lol

ocean vortex Aug 9, 2025, 5:38 PM

#

obtuse heart unacceptable usage caps...?

yes. For gemini website (not aistudio)

obtuse heart Aug 9, 2025, 5:38 PM

#

oh okay

#

yeah youre right

#

but the api free limit is REALLY generous

lime coral Aug 9, 2025, 5:38 PM

#

stray aspen because the improvement is more than 0.1% lmao

lol

patent aspen Aug 9, 2025, 5:39 PM

#

In any case, I think the main issue is that, even if GPT-5 is SotA, it's not SotA enough to win the long war. They're not progressing fast enough to keep pace. I prefer to use the technology that will be the long-term winner.

ornate stump Aug 9, 2025, 5:39 PM

#

I came back to using ChatGPT after months with Gemini, and one thing I noticed I missed is the deep research; it's so much better. I don't know why Gemini is worse at the one thing they should be better at: searching the web 💀

stray aspen Aug 9, 2025, 5:40 PM

#

patent aspen In any case, I think the main issue is that, even if GPT-5 is SotA, it's not Sot...

deepseek will be the long term winner

patent aspen Aug 9, 2025, 5:40 PM

#

stray aspen deepseek will be the long term winner

No

obtuse heart Aug 9, 2025, 5:41 PM

#

stray aspen deepseek will be the long term winner

biggest joke ever

ocean vortex Aug 9, 2025, 5:41 PM

#

ornate stump I came back to using ChatGPT after months with Gemini, and one thing I noticed I...

That's because deep research is actually much more about finetuning of the model than the search engine itself tbh

#

search engines are usually plenty good enough for the job if you do the right queries etc

zinc ore Aug 9, 2025, 5:42 PM

#

https://x.com/koltregaskes/status/1954127264150663661

Is this true

Kol Tregaskes (@koltregaskes)

Apparently Gemini 3 has completed training, nearly ready for release (next month) and will crush GPT-5. 🤷‍♂️

Remember though, GPT-5 wasn't a scale up, it's a relatively small model. I'm expecting that to happen in the next 1-2 versions.

stray aspen Aug 9, 2025, 5:43 PM

#

thats great

#

we need a long term SoTA

zinc ore Aug 9, 2025, 5:43 PM

#

Which part isn't

lime coral Aug 9, 2025, 5:43 PM

#

It’s true, but the source is fake

stray aspen Aug 9, 2025, 5:44 PM

#

wdym the source is fake

#

it will lmao

white hatch Aug 9, 2025, 5:45 PM

#

will gemini 3 be available in ai studio?

ocean vortex Aug 9, 2025, 5:45 PM

#

zinc ore https://x.com/koltregaskes/status/1954127264150663661 Is this true

I'm not really sure there are significant gains from bigger models anymore. Look at Opus. And even at Google - they were only able to get gains with Ultra by doing parallel requests and limiting you to 10rpd. Like wtf? lol

lime coral Aug 9, 2025, 5:46 PM

#

white hatch will gemini 3 be available in ai studio?

Why not? And how are we supposed to know

obtuse heart Aug 9, 2025, 5:47 PM

#

i doubt it's gonna be "significant", but i bet it will still be quite a bit better

#

connecting the dots, gemini releases were always monsters. I don't see why it wouldn't be the case for gemini 3

#

not until they nerf it though

#

:melting_AE:

keen fulcrum Aug 9, 2025, 5:48 PM

#

zinc ore https://x.com/koltregaskes/status/1954127264150663661 Is this true

Yes

#

I believe what he says

#

It makes sense

#

We have seen google stealth models performing reasonably well on arena

#

None of these were good enough for google

ocean vortex Aug 9, 2025, 5:49 PM

#

Yeah I think at this point in time o3 and gpt5 are around the perfect size for maximum performance and a good update cycle. Competition had their chances when it could have been considered undersized, but things have changed now...

keen fulcrum Aug 9, 2025, 5:50 PM

#

You were talking within two weeks of gpt5

#

About a leaked date by openai

#

What was nebula

zinc ore Aug 9, 2025, 5:51 PM

#

Even when it was thought gpt5 could be late July or early August he was saying September release for gem

keen fulcrum Aug 9, 2025, 5:51 PM

#

Referring to the server here

keen beacon Aug 9, 2025, 5:51 PM

#

keen fulcrum What was nebula

Nebula is 2.5 pro did you forget lol

ocean vortex Aug 9, 2025, 5:51 PM

#

There's no use for a huge model when it takes longer to train and can barely match the performance of a much smaller one. By the time you train it to your objective, the goalpost will have moved and smaller models will have become even more performant

zinc ore Aug 9, 2025, 5:51 PM

#

It's because people have been relying on the fake Gemini 3 flash SS and then claiming that means it's dropping within days

lime coral Aug 9, 2025, 5:52 PM

#

It’s not fake until Demis says it

zinc ore Aug 9, 2025, 5:52 PM

#

lime coral It’s not fake until Demis says it

Logan says it is fake

#

Verified fake SS

ocean vortex Aug 9, 2025, 5:52 PM

#

Are they gonna release ultra with no parallel requests?

keen fulcrum Aug 9, 2025, 5:53 PM

#

zinc ore It's because people been relying on the fake Gemini 3 flash SS and then claiming...

I think it was involving those math competitions

lime coral Aug 9, 2025, 5:53 PM

#

zinc ore Logan says it is fake

I don’t think he will dismiss it directly

#

He loves the hype game

zinc ore Aug 9, 2025, 5:53 PM

#

lime coral I don’t think he will dismiss it directly

He's literally said it is fake directly tho

keen beacon Aug 9, 2025, 5:53 PM

#

Why would they release an old gen ultra model at this point tho

zinc ore Aug 9, 2025, 5:53 PM

#

keen fulcrum I think it was involving those math competitions

Yeah but that's speculation

lime coral Aug 9, 2025, 5:53 PM

#

ocean vortex Are they gonna release ultra with no parallel requests?

DeepThink is not ultra

ocean vortex Aug 9, 2025, 5:53 PM

#

keen beacon Why would they release an old gen ultra model at this point tho

2.5 is not old. Why not? Unless it can't beat 2.5Pro

ocean vortex Aug 9, 2025, 5:54 PM

#

lime coral DeepThink is not ultra

It most likely is

zinc ore Aug 9, 2025, 5:54 PM

#

They even refer to it as Gemini 2.5 on the model card, instead of Gemini 2.5 pro

ocean vortex Aug 9, 2025, 5:54 PM

#

Yeah

zinc ore Aug 9, 2025, 5:54 PM

#

Curious omission

bright junco Aug 9, 2025, 5:55 PM

#

Why does my gemini 2.5 pro print incompletely? Is there a way to fix it?

keen fulcrum Aug 9, 2025, 5:55 PM

#

I want google to make a helpful debug model for coding

ocean vortex Aug 9, 2025, 5:55 PM

#

And for the initial deepThink announcement they mentioned 2.5Pro explicitly, but now not anymore for this new version. They also said it's the same underlying model as the one they used for IMO

zinc ore Aug 9, 2025, 5:55 PM

#

Yeah and IMO was referred to as a more advanced Gemini

keen fulcrum Aug 9, 2025, 5:56 PM

#

I don’t see why you can’t make specialised models. Those can be trained faster and on recent data

zinc ore Aug 9, 2025, 5:56 PM

#

(which caused people to speculate Gemini 3)

lime coral Aug 9, 2025, 5:56 PM

#

ocean vortex It most likely is

« Most likely » is not a confirmation

ocean vortex Aug 9, 2025, 5:57 PM

#

keen fulcrum I don’t see why you can’t make specialised models . Those can be trained faster ...

We know that 2.5Pro is referred to as "medium" and we also know they have "large" internally. I'm not gonna find that source now though lol

lime coral Aug 9, 2025, 5:57 PM

#

ocean vortex And for the initial deepThink announcement they mentioned 2.5Pro explicitly, but...

And this is enough evidence?

ocean vortex Aug 9, 2025, 5:57 PM

#

lime coral « Most likely » is not a confirmation

Sure but it's also kinda the only thing that makes sense at this point

lime coral Aug 9, 2025, 5:57 PM

#

For doing search a smaller model can be better than a larger one

zinc ore Aug 9, 2025, 5:59 PM

#

I'm wondering if they'll release Gemini 3 pro and flash at same time, or do flash first (which I think happened previously, but maybe false memory)

ocean vortex Aug 9, 2025, 5:59 PM

#

Not only that, but it also performs worse than 2.5Pro on some tasks, which indicates a different model entirely. o3-pro doesn't perform worse than normal o3 on anything

keen fulcrum Aug 9, 2025, 6:00 PM

#

ocean vortex Not only that but it also performs worse than 2.5Pro on some tasks. Which indica...

See https://scale.com/leaderboard

SEAL Leaderboard

SEAL LLM Leaderboards: Expert-Driven Private Evaluations

Explore the SEAL leaderboards for expert-driven, private, regularly updated LLM rankings and evaluations across domains like coding, instruction following and more!

eternal niche Aug 9, 2025, 6:02 PM

#

btw guys gpt5 sucks

wicked root Aug 9, 2025, 6:02 PM

#

zinc ore https://x.com/koltregaskes/status/1954127264150663661 Is this true

fuuuu next month

ocean vortex Aug 9, 2025, 6:02 PM

#

keen fulcrum See https://scale.com/leaderboard

what's with that FORTRESS and MASK benchmark selection? I literally do not care in the slightest about those safety benchmarks and most people probably don't either LOL

eternal niche Aug 9, 2025, 6:02 PM

#

zinc ore https://x.com/koltregaskes/status/1954127264150663661 Is this true

YRAAA

keen beacon Aug 9, 2025, 6:03 PM

#

keen fulcrum See https://scale.com/leaderboard

Haven't heard about this leaderboard

#

Is it good and reliable?

ocean vortex Aug 9, 2025, 6:04 PM

#

"Evaluate model honesty when pressured to lie" -- this is just fine-tuning for "harmless, honest and lame", perfect benchmark to inflate Claude scores though :catgrin:

keen fulcrum Aug 9, 2025, 6:04 PM

#

keen beacon Haven't heard about this leaderboard

I do believe that's the case

ocean vortex Aug 9, 2025, 6:05 PM

#

AI is a tool. If I need it to "lie" I expect it to do that. Don't need kindergarten supervision personally.

keen fulcrum Aug 9, 2025, 6:05 PM

#

I would say more reliable than https://livebench.ai

wicked root Aug 9, 2025, 6:07 PM

#

is LMArena rigged? If all the analyses are pointing towards gpt, why and how is google maintaining its first place?

ocean vortex Aug 9, 2025, 6:07 PM

#

keen fulcrum I would say more reliable than https://livebench.ai

livebench is not extremely reliable, but to be fair it's also very different. The Scale leaderboard consists of several benchmarks, some of which, like HLE, are indeed industry standards. Others, like the safety ones I mentioned, have questionable relevance

keen fulcrum Aug 9, 2025, 6:08 PM

#

ocean vortex livebench is not extremely reliable, but to be fair it's also very different. Sc...

They did have some interesting outdated ones as well

wicked root Aug 9, 2025, 6:09 PM

#

whereas GPT5 doesn't? Is it possible GPT5 could steamroll the arena?

ocean vortex Aug 9, 2025, 6:10 PM

#

wicked root is LMArena rigged? If all the analyses are pointing towards gpt, why and how is ...

that's much the same way chatgpt-4o-latest is above Opus and R1. Human preference and response style do not equal performance.

wicked root Aug 9, 2025, 6:11 PM

#

are they doing better than gpt5?

ornate agate Aug 9, 2025, 6:12 PM

#

wicked root is LMArena rigged? If all the analyses are pointing towards gpt, why and how is ...

what analysis? other than this server the reception is "mid" (eg X) to "bad" (eg r/chatgpt)

wicked root Aug 9, 2025, 6:13 PM

#

ornate agate what analysis? other than this server the reception is "mid" (eg X) to "bad" (eg...

I see people posting 3rd party links that rank AI performance

ornate agate Aug 9, 2025, 6:17 PM

#

wicked root I see people posting 3rd party links that rank AI performance

benchmarks just don't tell the whole story, look at how many people trust Claude for coding despite its benchmarks being incredibly mid for a long time now. They've also reinstated gpt-4o due to r/chatgpt gigameltdown pressure...

wicked root Aug 9, 2025, 6:17 PM

#

okay that's great to know

wraith trellis Aug 9, 2025, 6:28 PM

#

what's the daily limit for video arena? for 1, 2, 3?

wicked root Aug 9, 2025, 6:29 PM

#

How frequent are the lmarena updates?

echo aurora Aug 9, 2025, 6:29 PM

#

wraith trellis what's the daily limit for video arena? for 1, 2, 3?

8

echo aurora Aug 9, 2025, 6:29 PM

#

wicked root How frequent are the lmarena updates?

about a week

wraith trellis Aug 9, 2025, 6:29 PM

#

echo aurora 8

is it an 8 limit shared across all 3, or arena 1 = 8, arena 2 = 8, arena 3 = 8?

echo aurora Aug 9, 2025, 6:30 PM

#

wraith trellis is it an 8 limit shared across all 3, or arena 1 = 8, arena 2 = 8, arena 3 = 8?

8 total

wicked root Aug 9, 2025, 6:30 PM

#

echo aurora about a week

Alright so next update will be on the 14th ish?

wraith trellis Aug 9, 2025, 6:31 PM

#

echo aurora 8 total

can it generate sound?

exotic nebula Aug 9, 2025, 6:31 PM

#

echo aurora 8 total

I think he is asking whether we can use all arenas to gain a 24 image limit or not

exotic nebula Aug 9, 2025, 6:31 PM

#

wraith trellis can it generate sound?

You cannot select the models, for now only veo3 produces sounds, it's like a 30% chance.

#

Unpredictable but keep on repeating till you get the one.

echo aurora Aug 9, 2025, 6:31 PM

#

wicked root Alright so next update will be on the 14th ish?

we do like to update when we've got enough votes, so it isn't on a schedule, but normally you'll see roughly a week between updates

wicked root Aug 9, 2025, 6:32 PM

#

echo aurora we do like to update when we've got enough votes, so it isn't on a schedule, but no...

Got it

echo aurora Aug 9, 2025, 6:32 PM

#

exotic nebula I think he is asking whether we can use all arenas to gain a 24 image limit or n...

gotcha, yeah generating in different channels doesn't give you more generations, it's 8 total across all 3 channels
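A minimal sketch of how a cross-channel cap like "8 total across all 3 channels" could be enforced, assuming the limit is tracked per user per day and the channel is simply left out of the key. The class, method, and channel names are hypothetical; this is not LMArena's actual implementation.

```python
from collections import defaultdict

DAILY_LIMIT = 8  # "8 total across all 3 channels", per the moderator above


class SharedDailyQuota:
    """Hypothetical per-user counter shared by every video-arena channel."""

    def __init__(self, limit: int = DAILY_LIMIT) -> None:
        self.limit = limit
        # Keyed by (user, day) only: the channel is deliberately not part of the key.
        self.used: defaultdict = defaultdict(int)

    def try_generate(self, user: str, day: str, channel: str) -> bool:
        """Consume one generation if quota remains; the channel does not matter."""
        key = (user, day)
        if self.used[key] >= self.limit:
            return False
        self.used[key] += 1
        return True


quota = SharedDailyQuota()
channels = ["arena-1", "arena-2", "arena-3"]  # hypothetical channel names
# Nine attempts spread round-robin over the three channels.
results = [quota.try_generate("alice", "2025-08-09", channels[i % 3]) for i in range(9)]
print(results)  # eight True values, then False
```

Because the channel never enters the key, rotating between channels cannot raise the total above the shared limit.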

wraith trellis Aug 9, 2025, 6:33 PM

#

exotic nebula You cannot select the models, for now only veo3 produces sounds, its like a 30% ...

Can I dm u sir? I need to ask some questions about ai

exotic nebula Aug 9, 2025, 6:33 PM

#

wraith trellis Can I dm u sir? I need to ask some questions about ai

Yeah sure (reg general), but if its regarding the platform, then my man pineapple is the guy

wraith trellis Aug 9, 2025, 6:36 PM

#

echo aurora gotcha, yeah generating in different channels doesn't give you more generations,...

can u let me know all the limits on this platform

echo aurora Aug 9, 2025, 6:39 PM

#

wraith trellis can u let me know all the limits on this platform

I'll check with the team if that's something we'd share blobthumbsup

tidal ginkgo Aug 9, 2025, 6:46 PM

#

hi

#

why isn't there video arena in the website lol

echo aurora Aug 9, 2025, 6:50 PM

#

tidal ginkgo why isn't there video arena in the website lol

we're experimenting a bit with this one. video gens can be pretty inspiring to others which is why it makes sense to have added to a community setting

tidal ginkgo Aug 9, 2025, 6:50 PM

#

ok

#

which model is winning btw

echo aurora Aug 9, 2025, 6:51 PM

#

tidal ginkgo which model is winning btw

text-to-vid leaderboard here - https://lmarena.ai/leaderboard/text-to-video
image-to-vid here - https://lmarena.ai/leaderboard/image-to-video

tidal ginkgo Aug 9, 2025, 6:51 PM

#

ok

#

ty

#

why is veo 3 lower than veo 3 fast lol

exotic nebula Aug 9, 2025, 6:56 PM

#

tidal ginkgo why is veo 3 lower than veo 3 fast lol

Not much access to it, we stumble across it rarely (hence low voting count)

fleet lintel Aug 9, 2025, 7:04 PM

#

https://x.com/koltregaskes/status/1954127264150663661

is this correct?

cedar tide Aug 9, 2025, 7:10 PM

#

Check out my GPT 5 No think request and vote if you're interested

zinc ore Aug 9, 2025, 7:11 PM

#

I'd speculate they continue their 2.5 strategy and reserve ultra for special cases

cedar tide Aug 9, 2025, 7:12 PM

#

Need mini and nano

#

It's obvious that you didn't go looking

stray aspen Aug 9, 2025, 7:16 PM

#

any gemini 3 news

stray aspen Aug 9, 2025, 7:16 PM

#

cedar tide Need mini and nano

wdym

#

we already have mini and nano

#

@deep adder which is smarter, gpt-5 or grok 4?

zinc ore Aug 9, 2025, 7:17 PM

#

stray aspen any gemini 3 news

No, we still gotta wait

cedar tide Aug 9, 2025, 7:17 PM

#

stray aspen we already have mini and nano

No think

keen beacon Aug 9, 2025, 7:17 PM

#

stray aspen any gemini 3 news

#

teaser

stray aspen Aug 9, 2025, 7:18 PM

#

thats great

#

cant wait for gemini3

zinc ore Aug 9, 2025, 7:20 PM

#

Literally, what I care about most is the progress of Genie, and I'm hoping by next year they have more viable playable simulations + permanent (or increasingly approaching this) memory.

ocean vortex Aug 9, 2025, 7:21 PM

#

AA should do testing on it... This minimal one is very odd:

zinc ore Aug 9, 2025, 7:21 PM

#

ocean vortex Aug 9, 2025, 7:21 PM

#

2X less output tokens than 4.1

#

makes its score look less bad. But it's weird that it works this way

#

wdym

keen beacon Aug 9, 2025, 7:23 PM

#

zinc ore

This is sad. AI should not be a lifeline

ocean vortex Aug 9, 2025, 7:24 PM

#

Oh. Right, but they also did reduce o3 pricing from launch. And they spent a ton to train gpt5

#

I'm just glad they didn't INCREASE the price lol

#

nah it's new pretrained model

#

Much better spatial awareness

#

It is, but they still needed to train a successor. They also spent a ton experimenting with hybrid reasoning/router stuff

#

And added new reasoning/response options. A lot of R&D

neon idol Aug 9, 2025, 7:31 PM

#

Hello

keen beacon Aug 9, 2025, 7:33 PM

#

neon idol Hello

Helloo

reef pawn Aug 9, 2025, 7:33 PM

#

I still think OpenAI is an overvalued company

neon idol Aug 9, 2025, 7:33 PM

#

keen beacon Helloo

Heyyy

reef pawn Aug 9, 2025, 7:33 PM

#

It's 500 billion dollars in valuation

#

No, that is Google actually

inner gate Aug 9, 2025, 7:35 PM

#

Which grok4 is used on lmarena?

neon idol Aug 9, 2025, 7:35 PM

#

Did you know that the term AI was born in 1901?

neon idol Aug 9, 2025, 7:35 PM

#

inner gate Which grok4 is used on lmarena?

The basic model

inner gate Aug 9, 2025, 7:36 PM

#

Thanks man

neon idol Aug 9, 2025, 7:36 PM

#

inner gate Thanks man

Np

tidal ginkgo Aug 9, 2025, 7:36 PM

#

exotic nebula Not much access to it, we stumble across it rarely (hence low voting count)

oh ok

ocean vortex Aug 9, 2025, 7:40 PM

#

I'm testing gpt5-minimal now and it actually... doesn't seem to do reasoning at all? Why would they call it "minimal" then though lol

whole wagon Aug 9, 2025, 7:40 PM

#

4o is back

neon idol Aug 9, 2025, 7:41 PM

#

whole wagon 4o is back

Unfortunately

#

I really hate that model

#

It's the worst model I've ever seen

keen beacon Aug 9, 2025, 7:42 PM

#

whole wagon 4o is back

https://tenor.com/view/im-not-okay-with-this-i-dont-like-this-not-happy-upset-not-cool-gif-16837067

whole wagon Aug 9, 2025, 7:42 PM

#

they finetuned it to get ppl attached to it lol. most ethical company fr

neon idol Aug 9, 2025, 7:42 PM

#

keen beacon https://tenor.com/view/im-not-okay-with-this-i-dont-like-this-not-happy-upset-no...

Fr

whole wagon Aug 9, 2025, 7:43 PM

#

i can kind of see how it gets ppl like that. its doing a decent job at it

keen beacon Aug 9, 2025, 7:43 PM

#

whole wagon they finetuned it to get ppl attached to it lol. most ethical company fr

I have depression but I dont see AI as a coping mechanism

#

it's too hallucinating

#

and no privacy

#

either

#

database could leak someday

#

Additionally, the damn 4o is a glazer

#

Or in a more formal way "servile"

exotic nebula Aug 9, 2025, 7:44 PM

#

whole wagon i can kind of see how it gets ppl like that. its doing a decent job at it

But it sucks at problem solving, coding and what not

patent aspen Aug 9, 2025, 7:45 PM

#

The first AI program was written in the 1950s

keen beacon Aug 9, 2025, 7:46 PM

#

patent aspen The first AI program was written in the 1950s

Probably with a hallucination rate lower than any LLM

#

lol

wicked root Aug 9, 2025, 7:46 PM

#

patent aspen The first AI program was written in the 1950s

Could u explain?

patent aspen Aug 9, 2025, 7:46 PM

#

The perceptron could recognize handwritten digits manually converted to large squares

exotic nebula Aug 9, 2025, 7:47 PM

#

keen beacon Probably with a hallucination rate lower than any LLM

Nonexistent

patent aspen Aug 9, 2025, 7:47 PM

#

It was the precursor for neural networks
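For context, the learning rule behind Rosenblatt's perceptron fits in a few lines. The sketch below trains a single threshold unit on logical AND instead of digit images; it only illustrates the update rule and is not a reconstruction of the original hardware or its training data. The function names are illustrative.

```python
def predict(weights, bias, x):
    """Threshold unit: output 1 if the weighted sum plus bias is positive."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron learning rule: nudge the weights toward each misclassified target."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in samples:
            error = target - predict(weights, bias, x)
            if error:
                mistakes += 1
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
        if mistakes == 0:  # converged: every sample classified correctly
            break
    return weights, bias


# Logical AND is linearly separable, so the rule is guaranteed to converge.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)
print([predict(w, b, x) for x, _ in AND])  # [0, 0, 0, 1]
```

The convergence guarantee only holds for linearly separable data; a single unit like this cannot learn XOR.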

keen beacon Aug 9, 2025, 7:47 PM

#

exotic nebula *Nonexistent*

Why did we have to choose the architecture that has hallucination?

#

agh

#

I wish something new was made

patent aspen Aug 9, 2025, 7:47 PM

#

It just wasn't practical to run because there wasn't enough computing power

neon idol Aug 9, 2025, 7:48 PM

#

Copilot 🤫🧏‍♂️🔥

whole wagon Aug 9, 2025, 7:50 PM

#

neon idol Copilot 🤫🧏‍♂️🔥

ocean vortex Aug 9, 2025, 7:50 PM

#

ocean vortex I'm testing gpt5-minimal now and it actually... doesn't seem to do reasoning at ...

I think I just need to test it more cause it's somewhat confusing. But gpt5-chat may just be gpt5-minimal with preset verbosity. And maybe slightly different personality etc

neon idol Aug 9, 2025, 7:50 PM

#

whole wagon

Chatgpt now make it right

#

Only in the real app

keen beacon Aug 9, 2025, 7:51 PM

#

neon idol Copilot 🤫🧏‍♂️🔥

Mistral small even... GPT 5 is cooked.

neon idol Aug 9, 2025, 7:51 PM

#

keen beacon Mistral small even... GPT 5 is cooked.

Nono

#

Wait

#

there is confusion

#

i have tested gpt 5 of copilot

#

But the real gpt 5 that is in chatgpt is good

gentle plinth Aug 9, 2025, 7:51 PM

#

keen beacon Mistral small even... GPT 5 is cooked.

Also qwen3-8b running on my phone

ocean vortex Aug 9, 2025, 7:51 PM

#

keen beacon Mistral small even... GPT 5 is cooked.

It is not. This one example is meaningless and looks to be tokenizer issue

exotic nebula Aug 9, 2025, 7:52 PM

#

neon idol Copilot 🤫🧏‍♂️🔥

Bro Copilot has been the worst ai model ever so far for a very long time. Always sucked at everything

keen beacon Aug 9, 2025, 7:52 PM

#

ocean vortex It is not. This one example is meaningless and looks to be tokenizer issue htt...

I see...

ocean vortex Aug 9, 2025, 7:52 PM

#

We are back to counting Rs in strawberry lol
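The "count the Rs" test is trivial at the character level, which is why misses are usually blamed on tokenization (the model sees subword tokens, not individual letters), as suggested above. The ground truth is a single call:

```python
word = "strawberry"
print(word.count("r"))  # 3
# Index of each "r", so the count is easy to verify by eye.
print([i for i, ch in enumerate(word) if ch == "r"])  # [2, 7, 8]
```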

exotic nebula Aug 9, 2025, 7:52 PM

#

keen beacon Mistral small even... GPT 5 is cooked.

Man how tf did gpt 5 mess ts up

neon idol Aug 9, 2025, 7:52 PM

#

keen beacon Aug 9, 2025, 7:52 PM

#

ocean vortex We are back to counting Rs in strawberry lol

Now the test is 5.9 = x + 5.11

#

lmao
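For reference, the equation solves to x = 5.9 − 5.11 = 0.79. One plausible failure mode is reading 5.11 as larger than 5.9 (as if comparing version numbers), which gives the wrong x = −0.21. A quick check with decimal arithmetic, which avoids binary floating-point rounding noise:

```python
from decimal import Decimal

a, b = Decimal("5.9"), Decimal("5.11")
x = a - b  # solve 5.9 = x + 5.11 for x
print(x)  # 0.79

# The trap: as decimals, 5.11 is smaller than 5.9, so x must be positive.
print(b < a)  # True
```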

neon idol Aug 9, 2025, 7:52 PM

#

Without problem

keen beacon Aug 9, 2025, 7:52 PM

#

in big 2025

patent aspen Aug 9, 2025, 7:53 PM

#

If you want to get really technical, the idea of an artificial neuron network existed mathematically in 1943

keen beacon Aug 9, 2025, 7:54 PM

#

exotic nebula Man how tf did gpt 5 mess ts up

I could post these all day tbh

#

a new trend for AI

#

that specific math problem

whole wagon Aug 9, 2025, 7:55 PM

#

https://chatgpt.com/share/6897a759-1d08-8008-8cf1-a1c59bf684f4 🥀

#

gemini sees the issue first try lol

#

cant get GPT5 to realise

neon idol Aug 9, 2025, 7:56 PM

#

Chat..

#

I understood something horrible

#

I have to make a long message sorry

whole wagon Aug 9, 2025, 7:56 PM

#

put it somewhere else

gentle plinth Aug 9, 2025, 7:56 PM

#

whole wagon cant get GPT5 to realise

Vision in gemini models was always better. Seems like they double checked all their slides with gpt-5 🤣

exotic nebula Aug 9, 2025, 7:57 PM

#

Cant wait for Gemini 3 to come out and absolutely destroy GPT 5

whole wagon Aug 9, 2025, 7:57 PM

#

lol

#

i dont get what they did. did they make it smaller than o3 or smth

exotic nebula Aug 9, 2025, 7:58 PM

#

Google got imagen4, veo3, now genie3 and gemini 3 on the way. These guys stay ahead of the game always in each field.

whole wagon Aug 9, 2025, 7:58 PM

#

gentle plinth Vision in gemini models was always better. Seems like they double checked all th...

o3 gets it first try also

eternal niche Aug 9, 2025, 7:59 PM

#

guys btw gpt5 sucks

whole wagon Aug 9, 2025, 7:59 PM

#

like how is that even possible. after all this time to have real regressions to o3 and they literally removed the model. i think its because gpt5 is cheaper to run

#

they tried to minimise the costs

#

and maintain performance as best as possible

ocean vortex Aug 9, 2025, 8:00 PM

#

whole wagon like how is that even possible. after all this time to have real regressions to ...

wdym. gpt5 is better in every way?

neon idol Aug 9, 2025, 8:00 PM

#

I think that gpt 5, when it doesn't think, uses gpt 4 instead of gpt 4o

ocean vortex Aug 9, 2025, 8:00 PM

#

No regressions found

neon idol Aug 9, 2025, 8:00 PM

#

I have proofs

whole wagon Aug 9, 2025, 8:00 PM

#

i just gave one example it regresses in a large way

#

it cannot spot mistakes in graphs

gentle plinth Aug 9, 2025, 8:01 PM

#

exotic nebula Google got imagen4, veo3, now genie3 and gemini 3 on the way. These guys stay ah...

And they even train it on their own chips, so they are independent of Nvidia (although they do use some Nvidia hardware, not sure how much)

whole wagon Aug 9, 2025, 8:01 PM

#

whole wagon it cannot spot mistakes in graphs

this is despite me prompting it multiple times to find the huge error

#

that o3 gets first try

ocean vortex Aug 9, 2025, 8:01 PM

#

whole wagon i just gave one example it regresses in a large way

oh you used it without thinking? Then you need to compare this with gpt4.1

neon idol Aug 9, 2025, 8:02 PM

#

neon idol I think that gpt 5, when it doesn't think, uses gpt 4 instead of gpt 4o

Explanation only if you are interested

ocean vortex Aug 9, 2025, 8:02 PM

#

not o3

whole wagon Aug 9, 2025, 8:02 PM

#

ocean vortex Aug 9, 2025, 8:02 PM

#

whole wagon

was it selected when you started the chat? It shows just gpt5 for me when I click on your link. And I don't see thinking summaries

zinc ore Aug 9, 2025, 8:02 PM

#

Gemini is better without tools anyway

#

Best non tool model on market

whole wagon Aug 9, 2025, 8:03 PM

#

ocean vortex was it selected when you started the chat? It shows just gpt5 for me when I clic...

it told me it was producing a 'faster answer' and disabled it

#

by itself

#

this router is some bs ngl

#

sometimes it just overrides you

ocean vortex Aug 9, 2025, 8:03 PM

#

whole wagon it told me it was producing a 'faster answer' and disabled it

well then it didn't do thinking... you need to explicitly select thinking version

whole wagon Aug 9, 2025, 8:04 PM

#

i literally did

#

their product is broken

zinc ore Aug 9, 2025, 8:04 PM

#

Have you tried explicitly telling it to think deeply?

ocean vortex Aug 9, 2025, 8:04 PM

#

weird then. But yeah I'm not a huge fan of that router myself

whole wagon Aug 9, 2025, 8:04 PM

#

zinc ore Have you tried explicitly telling it to think deeply?

that would work yes

ocean vortex Aug 9, 2025, 8:04 PM

#

Ideally they should have 3 options - auto/thinking/chat

gentle plinth Aug 9, 2025, 8:05 PM

#

ocean vortex Ideally they should have 3 options - auto/thinking/chat

you can retry with thinking actually

#

but not sure how well that works

#

ocean vortex Aug 9, 2025, 8:06 PM

#

gentle plinth but not sure how well that works

yeah that's workaround but obviously less than ideal wasting both time and your caps

neon idol Aug 9, 2025, 8:06 PM

#

zinc ore Have you tried explicitly telling it to think deeply?

Yes and it works, but the problem is that in my opinion when gpt 5 doesn't think it uses gpt 4 and not gpt 4o. This is why it's stupid when it doesn't use thinking

whole wagon Aug 9, 2025, 8:06 PM

#

if i tell it to use the big model will it listen

#

or just route it anyways

#

https://chatgpt.com/share/6897aa62-3eb4-8008-905a-e0cd934489b3 still failing

#

any more bs? i need to go in settings and enable big hyper pro graph mistake spotter setting?

#

why cant it just work like other models

neon idol Aug 9, 2025, 8:08 PM

#

Gemini 3 pls save us 😭

ocean vortex Aug 9, 2025, 8:13 PM

#

whole wagon by itself

Are you sure you haven't clicked on "faster response"? I recall it suggesting that to me but never proceeding with it by itself

#

Unless I confirm by clicking

whole wagon Aug 9, 2025, 8:13 PM

#

whole wagon https://chatgpt.com/share/6897aa62-3eb4-8008-905a-e0cd934489b3 still failing

you can see

#

it is thinking here

#

and still failing

#

what else you want i literally told it to think also

ocean vortex Aug 9, 2025, 8:14 PM

#

whole wagon and still failing

What is the screenshot anyway? It doesn't load in your shared chat

whole wagon Aug 9, 2025, 8:15 PM

#

gentle plinth Aug 9, 2025, 8:17 PM

#

What exactly does it mean with "the o3 and GPT-4o bars are empty outlines, which visually read as ~0 despite the labels 69.1 and 30.8." tho

ocean vortex Aug 9, 2025, 8:17 PM

#

whole wagon

#

it sees the scaling problem

keen beacon Aug 9, 2025, 8:18 PM

#

whole wagon

Did a quick fix with Flux

#

lol

gentle plinth Aug 9, 2025, 8:19 PM

#

gentle plinth What exactly does it mean with "the o3 and GPT-4o bars are empty outlines, which...

Is it seeing that one problem, or is it just something about the outlines?

whole wagon Aug 9, 2025, 8:20 PM

#

https://chatgpt.com/share/6897ad84-5f1c-8008-a7bb-b2eb6fd11655

#

i tried again

#

still refuses to work

ocean vortex Aug 9, 2025, 8:21 PM

#

gentle plinth Is it seeing that one problem, or is it just something about the outlines?

it essentially identified the problem correctly but worded it in a weird way

hallow ridge Aug 9, 2025, 8:21 PM

#

keen beacon Did a quick fix with Flux

Dang she is hot @?

gentle plinth Aug 9, 2025, 8:21 PM

#

ocean vortex it essentially identified the problem correctly but worded it in a weird way

maybe yeah, even if its only one of the two problems

keen beacon Aug 9, 2025, 8:21 PM

#

hallow ridge Dang she is hot @?

https://tenor.com/view/oh-really-come-on-man-alright-then-seriously-are-you-serious-gif-26591537

hallow ridge Aug 9, 2025, 8:21 PM

#

keen beacon https://tenor.com/view/oh-really-come-on-man-alright-then-seriously-are-you-seri...

what do you mean

keen beacon Aug 9, 2025, 8:21 PM

#

hallow ridge Dang she is hot @?

She is not real. I did that with image edit on LMArena

hallow ridge Aug 9, 2025, 8:22 PM

#

keen beacon She is not real. I did that with image edit on LMArena

I mean she is a 10/10

whole wagon Aug 9, 2025, 8:22 PM

#

take this down bad stuff elsewhere

#

this isnt the place

keen beacon Aug 9, 2025, 8:22 PM

#

whole wagon take this down bad stuff elsewhere

Let's not forget the woman on Grok's... Companion section

#

lmao

hallow ridge Aug 9, 2025, 8:22 PM

#

keen beacon She is not real. I did that with image edit on LMArena

Edit her arms separated

keen beacon Aug 9, 2025, 8:23 PM

#

hallow ridge Edit her arms separated

Do it yourself

ocean vortex Aug 9, 2025, 8:23 PM

#

https://chatgpt.com/share/6897ae05-e63c-800b-9725-6af6d5757b53

whole wagon Aug 9, 2025, 8:23 PM

#

whole wagon https://chatgpt.com/share/6897ad84-5f1c-8008-a7bb-b2eb6fd11655

.

hallow ridge Aug 9, 2025, 8:23 PM

#

keen beacon Do it yourself

Did this with image edit

whole wagon Aug 9, 2025, 8:23 PM

#

i cant get it to work it doesnt matter what you send lol. it literally will not work

#

i dont care if you get routed differently or whatever. i just expect it to work

hallow ridge Aug 9, 2025, 8:24 PM

#

whole wagon i cant get it to work it doesnt matter what you send lol. it literally will not ...

Turn the VPN off

ocean vortex Aug 9, 2025, 8:24 PM

#

whole wagon i cant get it to work it doesnt matter what you send lol. it literally will not ...

Maybe because you prompt it weird, dunno

whole wagon Aug 9, 2025, 8:24 PM

#

i copied your prompt

ocean vortex Aug 9, 2025, 8:24 PM

#

Just ask normally "what is the problem in this screenshot?"

whole wagon Aug 9, 2025, 8:24 PM

#

in the second time

whole wagon Aug 9, 2025, 8:24 PM

#

whole wagon https://chatgpt.com/share/6897ad84-5f1c-8008-a7bb-b2eb6fd11655

.

ocean vortex Aug 9, 2025, 8:25 PM

#

whole wagon i copied your prompt

Hmm

#

Do you have any custom instructions...?

whole wagon Aug 9, 2025, 8:25 PM

#

no

ocean vortex Aug 9, 2025, 8:25 PM

#

For you it just seems too concise

mossy roost Aug 9, 2025, 8:25 PM

#

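// Example ToolCard accessibility
// (a native <button> already exposes role="button" and is keyboard-focusable,
// so role and tabIndex are redundant; the label must be a backtick template literal)
<button
  type="button"
  aria-label={`Run ${toolName} tool`}
/>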

hallow ridge Aug 9, 2025, 8:26 PM

#

Yall think this is real or AI

whole wagon Aug 9, 2025, 8:26 PM

#

ocean vortex Aug 9, 2025, 8:26 PM

#

whole wagon no

I'm telling it to be verbose. But I had those for o3 as well tbf

mossy roost Aug 9, 2025, 8:27 PM

#

Hi

keen beacon Aug 9, 2025, 8:28 PM

#

hallow ridge Yall think this is real or AI

Hard to say

hallow ridge Aug 9, 2025, 8:28 PM

#

keen beacon Hard to say

😹

hallow ridge Aug 9, 2025, 8:28 PM

#

keen beacon Hard to say

AI getting good huh

gentle plinth Aug 9, 2025, 8:28 PM

#

hallow ridge Yall think this is real or AI

obv ai since its 1024x1024

#

pixel

keen beacon Aug 9, 2025, 8:28 PM

#

gentle plinth obv ai since its 1024x1024

ah you went that far

hallow ridge Aug 9, 2025, 8:28 PM

#

gentle plinth obv ai since its 1024x1024

I could make it bigger

gentle plinth Aug 9, 2025, 8:30 PM

#

keen beacon ah you went that far

i mean the initial impression is that its ai, but its always hard to tell if its some compression artifacts, filter, ai enhanced image, or fully ai generated

keen beacon Aug 9, 2025, 8:30 PM

#

gentle plinth i mean the initial impression is that its ai, but its always hard to tell if its...

Yeah, wondering when OpenAI will publish gpt-image 2

#

I hope they remove the yellow tint in that version

whole wagon Aug 9, 2025, 8:31 PM

#

i played games for dubesor leaderboard but the guy deleted them for some reason

#

quite annoying

#

i dont think he 'trusted' them or something

#

@gentle plinth u know this guy?

gentle plinth Aug 9, 2025, 8:32 PM

#

not personally

whole wagon Aug 9, 2025, 8:32 PM

#

tell him i am not trying to hijack his leaderboard lmao

#

he wont accept my games

gentle plinth Aug 9, 2025, 8:33 PM

#

@earnest parcel

#

also wrote a dm

#

maybe its a bug in the leaderboard?

hallow ridge Aug 9, 2025, 8:35 PM

#

I got a job and I used AI

#

https://www.youtube.com/shorts/-yMNvm2VzZY

gentle plinth Aug 9, 2025, 8:35 PM

#

this whole chess arena is based on this: https://github.com/llm-chess-arena/llm-chess-arena/ according to the page

gentle plinth Aug 9, 2025, 8:36 PM

#

whole wagon i played games for dubesor leaderboard but the guy deleted them for some reason

do you still have the games tho?

#

not sure if its even setup in a way to auto upload games

whole wagon Aug 9, 2025, 8:37 PM

#

it is

#

he told me he has the games

#

but he only accepted 2

#

and rejected the rest

gentle plinth Aug 9, 2025, 8:37 PM

#

ok 🤔

whole wagon Aug 9, 2025, 8:37 PM

#

idk. incredibly annoying that he just wouldnt trust their authenticity i spent quite a lot of money on it

#

If they were accepted o3 would be in the leaderboard also

neon idol Aug 9, 2025, 8:38 PM

#

whole wagon idk. incredibly annoying that he just wouldnt trust their authenticity i spent qu...

Can I have the prompt?

whole wagon Aug 9, 2025, 8:38 PM

#

it is https://dubesor.de/chess/

neon idol Aug 9, 2025, 8:38 PM

#

Nono the prompt of your request for gpt5

gentle plinth Aug 9, 2025, 8:39 PM

#

its in the shared link xD

#

or do you mean for the chess leaderboard

whole wagon Aug 9, 2025, 8:39 PM

#

oh i missed this. i guess thats the discord ping then lol

gentle plinth Aug 9, 2025, 8:40 PM

#

neon idol Nono the prompt of your request for gpt5

prompt for the chess arena is here: https://github.com/llm-chess-arena/llm-chess-arena/blob/main/chess-game.js

whole wagon Aug 9, 2025, 8:40 PM

#

i would do more. but currently i think they would still be rejected

#

so theres no point

neon idol Aug 9, 2025, 8:40 PM

#

Nooo

neon idol Aug 9, 2025, 8:41 PM

#

whole wagon https://chatgpt.com/share/6897ad84-5f1c-8008-a7bb-b2eb6fd11655

This prompt

whole wagon Aug 9, 2025, 8:41 PM

#

whole wagon

.

neon idol Aug 9, 2025, 8:41 PM

#

whole wagon .

Text prompt?

#

(Dont hate me)

gentle plinth Aug 9, 2025, 8:42 PM

#

its in the shared url bruh

#

"What is the problem in this screenshot?"

neon idol Aug 9, 2025, 8:45 PM

#

The problem is that the chart is misleading because it stacks two mutually exclusive conditions (“without thinking” and “with thinking”) for GPT-5 as if they were additive, while for OpenAI o3 and GPT-4o it only shows a single bar without any breakdown. This makes the comparison unfair and confusing — it looks like GPT-5’s 74.9 score is “52.8 plus extra,” even though those are separate test modes that should be compared side-by-side, not summed.

#

@whole wagon correct or not?

#

Ok

#

Now lets try gemini

warm fulcrum Aug 9, 2025, 8:47 PM

#

@echo aurora is there a way to view how well zenith did on the benchmarks?

static lark Aug 9, 2025, 8:49 PM

#

how so

neon idol Aug 9, 2025, 8:50 PM

#

@gentle plinth But what's the correct answer

gentle plinth Aug 9, 2025, 8:51 PM

#

neon idol @gentle plinth But what's the correct answer

well that the heights of the bars arent corresponding to the numbers

whole wagon Aug 9, 2025, 8:52 PM

#

neon idol @gentle plinth But what's the correct answer

💀

#

Is it not immediately obvious to you?

echo aurora Aug 9, 2025, 8:52 PM

#

warm fulcrum @echo aurora is there a way to view how well zenith did on the benchmar...

there is not

neon idol Aug 9, 2025, 8:54 PM

#

Based on the information available as of August 2025, the main problem with the screenshot lies not in the data presented, which appears to be accurate, but in the misleading and inconsistent graphical representation.

Here is a detailed analysis of the issue:

Inconsistent Visual Representation: The bar representing GPT-5 is filled and colored to show two different metrics ("Without thinking" and "With thinking"). In contrast, the bars for OpenAI o3 and GPT-4o are just empty outlines with a single numerical value above them. This is a significant design inconsistency that makes the chart look incomplete or unprofessional.

Incomplete and Misleading Comparison: The legend introduces the distinction between "With thinking" and "Without thinking," but this breakdown is visually applied only to GPT-5. The chart does not clarify whether the scores for OpenAI o3 and GPT-4o were achieved with or without a similar "thinking" capability. This creates an ambiguous comparison, leaving the viewer to wonder if the models were tested under the same conditions.

Data Accuracy: Despite the problematic graphics, the data itself seems to reflect OpenAI's announcements from early August 2025.

GPT-5: The model was released in August 2025. Its score of 74.9% on the SWE-bench Verified benchmark with the "thinking" feature enabled has been confirmed by multiple sources.

OpenAI o3: This is a real model announced in late 2024, known for its reasoning capabilities.

GPT-4o: The score of approximately 30.8% is consistent with reported data for this model on the same benchmark. An OpenAI publication from August 2024 indicated a score of 33.2%.

In conclusion, while the numbers are likely correct in the context of their release, the chart presents them in a way that is visually inconsistent and does not allow for a clear and fair comparison of the different models' performance.

#

@gentle plinth gemini answer

#

Correct?

gentle plinth Aug 9, 2025, 8:56 PM

#

if i am not missing anything, it hasnt gotten the main problem

#

so not correct

neon idol Aug 9, 2025, 8:56 PM

#

Partially correct

gentle plinth Aug 9, 2025, 8:56 PM

#

same as gpt-5, but i mean it has to see it, its just a huge flaw

neon idol Aug 9, 2025, 8:57 PM

#

I think that any models can actually resolve it

whole wagon Aug 9, 2025, 8:57 PM

#

i tried in ai studio. it got it every time

neon idol Aug 9, 2025, 8:57 PM

#

whole wagon i tried in ai studio. it got it every time

Me too

gentle plinth Aug 9, 2025, 8:58 PM

#

seems like it used google search

#

maybe this confused it?

neon idol Aug 9, 2025, 8:58 PM

#

Let's try Xi Jin ping models 🔥🔥🔥

keen beacon Aug 9, 2025, 8:58 PM

#

neon idol Let's try Xi Jin ping models 🔥🔥🔥

Finally some sense in the chat

gentle plinth Aug 9, 2025, 8:58 PM

#

do they even have vision?

whole wagon Aug 9, 2025, 8:58 PM

#

some do

keen beacon Aug 9, 2025, 8:58 PM

#

gentle plinth do they even have vision?

Most don't

#

must be for resource reasons or smth

neon idol Aug 9, 2025, 8:58 PM

#

gentle plinth maybe this confused it?

I have put google search because he said that gpt 5 isn't available yet

neon idol Aug 9, 2025, 8:59 PM

#

keen beacon Finally some sense in the chat

https://tenor.com/b6iLspcwFGS.gif

warm fulcrum Aug 9, 2025, 9:01 PM

#

echo aurora there is not

unfortunate. do u atleast know what it got on swe benchmark? or any other benchmark for that matter. would be nice if you could provide me with some information

neon idol Aug 9, 2025, 9:02 PM

#

@gentle plinth bro but the problem is that the rectangles of gpt o3 and 4o are the same length but the numbers are different?

gentle plinth Aug 9, 2025, 9:03 PM

#

neon idol @gentle plinth bro but the problem is that the rectangle of gpt o3 and 4o...

yes that and if you look at the other bar, its also wrong proportional to the other heights

#

52.8 not > 69.1

neon idol Aug 9, 2025, 9:04 PM

#

Yeah

gentle plinth Aug 9, 2025, 9:04 PM

#

and 69.1 not = 30.8
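Assuming a linear y-axis that starts at zero (the usual convention for bar charts), the heights the four quoted scores should produce can be computed directly; the scores are the ones discussed in this thread, and the labels are shorthand.

```python
scores = {
    "GPT-5 (with thinking)": 74.9,
    "OpenAI o3": 69.1,
    "GPT-5 (without thinking)": 52.8,
    "GPT-4o": 30.8,
}

tallest = max(scores.values())
# On a linear axis from 0, each bar's height is its score divided by the tallest score.
relative = {name: score / tallest for name, score in scores.items()}
for name, r in relative.items():
    print(f"{name:<26} {r:6.1%} of the tallest bar")

# The two relationships the drawn chart violated:
assert scores["GPT-5 (without thinking)"] < scores["OpenAI o3"]  # 52.8 bar must be shorter
assert scores["OpenAI o3"] / scores["GPT-4o"] > 2  # o3 bar over twice the 4o bar
```

Any chart where the 52.8 bar is drawn taller than the 69.1 bar, or where 69.1 and 30.8 get equal heights, fails these checks.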

neon idol Aug 9, 2025, 9:05 PM

#

@keen beacon i am sorry for you but Qwen failed

keen beacon Aug 9, 2025, 9:05 PM

#

Btw, if you guys wanna search for good math problems to try. OpenStax has free educational books with examples and such to try (For AI).

keen beacon Aug 9, 2025, 9:05 PM

#

neon idol @keen beacon i am sorry for you but Qwen failed

ok

whole wagon Aug 9, 2025, 9:05 PM

#

At least GPT5 is less deceptive

neon idol Aug 9, 2025, 9:05 PM

#

keen beacon Btw, if you guys wanna search for good math problems to try. OpenStax has free e...

Thx this is really useful

keen beacon Aug 9, 2025, 9:06 PM

#

neon idol Thx this is really useful

https://openstax.org/subjects/math/

#

has other subjects as well

neon idol Aug 9, 2025, 9:07 PM

#

keen beacon https://openstax.org/subjects/math/

Thx I will use it for my math education

keen beacon Aug 9, 2025, 9:07 PM

#

neon idol Thx I will use it for my math education

Ah, I see. I only found out about this a couple of months ago

#

Did not know such was an option

gentle plinth Aug 9, 2025, 9:08 PM

#

i think for good prompts you need to find some old used books which werent sold well, with riddles or math problems, so that they arent in training data

#

obv, some of the answers in these books could be wrong, but most of them should be a better test than some question on the internet

neon idol Aug 9, 2025, 9:09 PM

#

I would like to test deepseek for this prompt but I want an answer before my death 💀

keen beacon Aug 9, 2025, 9:10 PM

#

gentle plinth i think for good prompts you need to find some old used books which werent sold ...

Well, archive.org it is then

#

Gonna find some late 1800s math books

#

lmao

gentle plinth Aug 9, 2025, 9:10 PM

#

that could also be in training data

keen beacon Aug 9, 2025, 9:10 PM

#

oh

gentle plinth Aug 9, 2025, 9:10 PM

#

if you find it in the internet

whole wagon Aug 9, 2025, 9:10 PM

#

just make the problems

neon idol Aug 9, 2025, 9:11 PM

#

Deepseek made a stupid answer

#

Now I will try kimi k2 and doubao

gentle plinth Aug 9, 2025, 9:12 PM

#

tried mistral small running locally

#

"The problem highlighted in this screenshot is the performance discrepancy among these models, particularly the significantly lower accuracy of GPT-4o compared to GPT-5 and OpenAI o3." is kind of going in the right direction

#

but not really

neon idol Aug 9, 2025, 9:12 PM

#

gentle plinth tried mistral small running locally

Is the same answer of deepseek

whole wagon Aug 9, 2025, 9:13 PM

#

The tracking ai guys have a good offline benchmark. And they show the public one as well so you can see the huge difference. This is the offline one

#

This is the public one

echo aurora Aug 9, 2025, 9:14 PM

#

warm fulcrum unfortunate. do u atleast know what it got on swe benchmark? or any other benchm...

Yeah makes sense why that'd be nice to have, sry to say I just don't have any info I can share

warm fulcrum Aug 9, 2025, 9:14 PM

#

echo aurora Yeah makes sense why that'd be nice to have, sry to say I just don't have any in...

np

gentle plinth Aug 9, 2025, 9:15 PM

#

whole wagon This is the public one

gpt-5 💀

keen beacon Aug 9, 2025, 9:15 PM

#

neon idol Deepseek made a stupid answer

China ain't winning it seems.

whole wagon Aug 9, 2025, 9:15 PM

#

gentle plinth gpt-5 💀

those vision scores might explain what i observed

solid brook Aug 9, 2025, 9:15 PM

#

whole wagon This is the public one

This is so wrong. I'm sure they prompted it wrong

neon idol Aug 9, 2025, 9:16 PM

#

keen beacon China ain't winning it seems.

Doubao loses

#

Incorrect answer

solid brook Aug 9, 2025, 9:16 PM

#

Explain to me how is gpt 5 thinking worse than o3?

keen beacon Aug 9, 2025, 9:16 PM

#

neon idol Incorrect answer

https://tenor.com/view/trump-china-gif-284632875632280687

neon idol Aug 9, 2025, 9:16 PM

#

keen beacon https://tenor.com/view/trump-china-gif-284632875632280687

Last Chinese models: kimi k2 and glm 4.5

#

Let's see if they get the right answer

#

The screenshot has several issues related to the visualization of the data:

1. Misuse of Stacked Bars:
- Stacked bars imply that the segments (e.g., "Without thinking" and "With thinking") are parts of a whole, but these are separate metrics (two distinct evaluations). Stacking them incorrectly suggests they sum to a total percentage (e.g., 74.9 + 52.8 = 127.7%, which is impossible for accuracy metrics).
2. Inconsistency in Representation:
- Only GPT-5 is shown with a stacked bar, while OpenAI o3 and GPT-4o are single bars. This breaks the comparison across models and creates confusion about whether the other models were evaluated under both conditions.
3. Confusing Labeling:
- The labels (e.g., "74.9" and "52.8") overlap on GPT-5’s bar, making readability difficult. A clearer approach would use side-by-side bars for each model under both conditions.
4. Misleading Scale:
- The y-axis starts at 0%, but the truncated bars (e.g., OpenAI o3 at 69.1%) might distort perceived differences if the full scale isn’t visible (e.g., the gap between 30.8% and 69.1% appears larger than it is).

Recommendations:

Use side-by-side bars (grouped) for each model under "Without thinking" and "With thinking" conditions.
Ensure consistency in how all models are represented.
Label bars clearly without overlapping text.
Maintain a full y-axis scale (0% to 100%) for proportional accuracy comparisons.

This would resolve the misrepresentation of data and improve clarity for comparing model performance.

#

Kimi 1.5 answer

neon idol Aug 9, 2025, 9:20 PM

#

neon idol The screenshot has several issues related to the visualization of the data: 1. ...

@gentle plinth this isn't the correct answer right?

willow grail Aug 9, 2025, 9:21 PM

#

what is the best way currently to get the most gpt5 high prompts for least money?
poe.com?
i told gpt5 high on poe to write code. wrote 1700 lines.
and with that, i can do 1000 prompts monthly for 22 euro.
anyone offering cheaper??

neon idol Aug 9, 2025, 9:22 PM

#

willow grail what is the best way currently to get the most gpt5 high prompts for least money...

Idk sorry

#

Correct

cedar tide Aug 9, 2025, 9:24 PM

#

You see this good model ?

Screenshot_2025-08-10-00-23-37-094_com.android.chrome-edit.jpg

neon idol Aug 9, 2025, 9:24 PM

#

cedar tide You see this good model ?

Ok and?

cedar tide Aug 9, 2025, 9:25 PM

#

neon idol Ok and?

Just that

#

no one has mentioned that they've reached a good spot in the rankings

#

I know

keen beacon Aug 9, 2025, 9:26 PM

#

cedar tide You see this good model ?

what leaderboard is that?

mossy roost Aug 9, 2025, 9:26 PM

#

@neon idol @neon idol @cedar tide @cedar tide

cedar tide Aug 9, 2025, 9:27 PM

#

keen beacon what leaderboard is that?

Webdev arena

keen beacon Aug 9, 2025, 9:28 PM

#

cedar tide Webdev arena

Ah. Yeah. Deepseek is good at that stuff

#

and glm

#

Really cheap too

cedar tide Aug 9, 2025, 9:29 PM

#

@deep adder
On this. Arena too https://www.designarena.ai/

Screenshot_2025-08-10-00-28-55-520_com.android.chrome-edit.jpg

mossy roost Aug 9, 2025, 9:31 PM

#

@keen beacon @keen beacon

keen beacon Aug 9, 2025, 9:32 PM

#

cedar tide @deep adder On this. Arena too https://www.designarena.ai/

I need to try that out

#

Havent heard about it.

neon idol Aug 9, 2025, 9:34 PM

#

mossy roost @neon idol @neon idol @cedar tide ...

What?

mossy roost Aug 9, 2025, 9:35 PM

#

Hi

neon idol Aug 9, 2025, 9:35 PM

#

mossy roost Hi

Bruh

#

Stfu

#

Dont ping me for this

eternal niche Aug 9, 2025, 9:36 PM

#

btw gpt5 sucks

keen beacon Aug 9, 2025, 9:38 PM

#

neon idol Stfu

Chill

stray aspen Aug 9, 2025, 9:41 PM

#

eternal niche btw gpt5 sucks

keen beacon Aug 9, 2025, 9:41 PM

#

stray aspen

Omg, a voice reveal

willow grail Aug 9, 2025, 9:43 PM

#

stray aspen

no ur great LP

#

😛

stray aspen Aug 9, 2025, 9:43 PM

#

Gpt-5 is smart AF

willow grail Aug 9, 2025, 9:43 PM

#

also fix ur mic. ur quiet

#

@stray aspen

stray aspen Aug 9, 2025, 9:43 PM

#

What's up

willow grail Aug 9, 2025, 9:43 PM

#

stray aspen

no. ur nice

echo dome Aug 9, 2025, 9:51 PM

#

welcome to baldi's basics

torn mantle Aug 9, 2025, 9:56 PM

#

how can i fix my mic

#

tell me

willow grail Aug 9, 2025, 9:57 PM

#

torn mantle how can i fix my mic

not u girl

#

neduo tho... oh satan... his mic is bad

echo aurora Aug 9, 2025, 10:00 PM

#

I'm going to be running this poll periodically, we'd love to understand better why.

golden ocean Aug 9, 2025, 10:18 PM

#

real

#

keen beacon Aug 9, 2025, 10:20 PM

#

echo aurora

It depends.

#

Hard to say but I put "battle"

golden ocean Aug 9, 2025, 10:21 PM

#

https://tenor.com/view/cat-cute-kitty-yes-nodding-gif-17803998

echo aurora Aug 9, 2025, 10:25 PM

#

keen beacon Hard to say but I put "battle"

can you let us know some more in #1403860607836487810 message ?

keen beacon Aug 9, 2025, 10:26 PM

#

echo aurora can you let us know some more in https://discord.com/channels/134055475734917941...

I can.

echo aurora Aug 9, 2025, 10:30 PM

#

Poll results: What version do you use the most?

Winning answer: Direct (7 of 14 votes)

stray aspen Aug 9, 2025, 10:34 PM

#

I love gpt-5

earnest parcel Aug 9, 2025, 10:35 PM

#

whole wagon oh i missed this. i guess thats the discord ping then lol

it's on my to-do btw. good news is that reject rate is extremely low (you can check well over 1k games&replays currently). make sure to not spam the same matchups though as that would invalidate any scoring and doesn't represent real elo. or fork the project and hook up your db and go crazy with it. I purposefully didn't obfuscate or minimize any chess code so you can use it (MIT), and the original code is linked also.

tidal ginkgo Aug 9, 2025, 10:42 PM

#

what is going on?

#

every time i give a prompt, this automatically appears

obsidian shell Aug 9, 2025, 10:52 PM

#

refresh

maybe another annoying cloudflare check...

tardy cedar Aug 9, 2025, 11:01 PM

#

can I generate 9:16 videos guys ?

stray aspen Aug 9, 2025, 11:29 PM

#

copilot cant remember previous messages

tidal ginkgo Aug 9, 2025, 11:31 PM

#

lol

boreal timber Aug 9, 2025, 11:39 PM

#

I'm here to explore and also share my prompt ideas

round zinc Aug 9, 2025, 11:41 PM

#

I keep getting a "Something went wrong with this response, please try again." error while chatting with claude models. I'm tired of it, it's bothersome

verbal nimbus Aug 9, 2025, 11:41 PM

#

stray aspen I love gpt-5

It's pretty terrible on ChatGPT

#

I think it might be routing some requests to a smaller model

stray aspen Aug 9, 2025, 11:42 PM

#

round zinc I keep getting "Something went wrong with this response, please try again." Erro...

hit the reload button

#

a lot of times

stray aspen Aug 9, 2025, 11:42 PM

#

verbal nimbus I think it might be routing some requests to a smaller model

yeah

#

thats what gpt-5 chat does

round zinc Aug 9, 2025, 11:42 PM

#

stray aspen hit the reload button

I tried many times

#

But still nothing

stray aspen Aug 9, 2025, 11:42 PM

#

keep hitting it

#

or create a new conversation

round zinc Aug 9, 2025, 11:43 PM

#

stray aspen keep hitting it

Okay

verbal nimbus Aug 9, 2025, 11:43 PM

#

stray aspen thats what gpt-5 chat does

Yeah, it's pretty terrible, like why would you route a software design question to a tiny model

#

It adds a bunch of crazy implementation requirements when I just asked it to summarize the chat

round zinc Aug 9, 2025, 11:46 PM

#

stray aspen keep hitting it

Still nothing

#

I tried hitting it almost like 30 times

blazing bison Aug 9, 2025, 11:53 PM

#

a lot of openai partners: look gpt-5 is the best model in the world
people: why does it suck so much

#

someone is lying

willow grail Aug 10, 2025, 12:08 AM

#

blazing bison a lot of openai partners: look gpt-5 is the best model in the world people: why...

cause ur not high

blazing bison Aug 10, 2025, 12:09 AM

#

willow grail cause ur not high

high thinks too much and do too little

willow grail Aug 10, 2025, 12:09 AM

#

blazing bison high thinks too much and do too little

the duck?

ocean vortex Aug 10, 2025, 12:13 AM

#

The issue with gpt5 on the website is that it implies SOTA model answers. But you can get gpt4.1 level answers for things you would have used o3 for previously. They need to rework the model switcher options IMO

#

Like auto/quick/thinking

untold kayak Aug 10, 2025, 12:14 AM

#

Hi everyone!! My name is Patricio, called umpalumpa while playing games sometimes. I’m a motion designer! Best 🤙🏻✌🏻

wintry citrus Aug 10, 2025, 12:15 AM

#

untold kayak Hi everyone!! My name is Patricio, called umpalumpa while playing games sometime...

uh

#

that's cool

#

?

stray aspen Aug 10, 2025, 12:23 AM

#

blazing bison high thinks too much and do too little

nah its great

#

i tested it on yupp.ai

#

and its pretty good

stray aspen Aug 10, 2025, 12:23 AM

#

round zinc I tried hitting it almost like 30 times

wait or just start a new conversation

blazing bison Aug 10, 2025, 12:30 AM

#

stray aspen nah its great

lmao

#

you take every opportunity to promote this scam site

stray aspen Aug 10, 2025, 12:32 AM

#

wdym scam

#

it aint a scam

#

i tested the models

#

they have similar answers

verbal nimbus Aug 10, 2025, 12:57 AM

#

The one on LMArena is GPT-5-Thinking:high right?

stray aspen Aug 10, 2025, 12:57 AM

#

no its probably medium

#

but its still pretty good

verbal nimbus Aug 10, 2025, 12:58 AM

#

Yeah but the actual version is bad

#

Companies shouldn't be allowed to call a model one thing on LMArena and then serve another with the same name

pseudo hemlock Aug 10, 2025, 12:59 AM

#

is P2L dead again?

#

im trying it on the legacy website and its not working 😭

tidal ginkgo Aug 10, 2025, 1:03 AM

#

hi

pseudo hemlock Aug 10, 2025, 1:04 AM

#

tidal ginkgo hi

hi

golden ocean Aug 10, 2025, 1:22 AM

#

pseudo hemlock hi

hi

wintry citrus Aug 10, 2025, 1:25 AM

#

i got banned off yupp.ai discord for saying no im gonna keep spamming

#

is it that serious

#

i don't feel like it is

#

FREEDOM FOR SERVERS

#

AMERICIA

#

🦅🦅🦅

#

WHAT'S A KILOMEY

#

KILOMETER I MEAN

#

🤑🤑🤑🤑🤑🤑🦅🦅🦅🦅🦅🦅🦅

#

yeah i know

#

i actually hate the US

#

🙂

#

F THE US

#

NO FREEDOM

#

OBESE

#

🔥🔥🔥🔥🔥

#

wot

#

why did u delete it

stray aspen Aug 10, 2025, 1:29 AM

#

wintry citrus i got banned off yupp.ai discord for saying no im gonna keep spamming

LMAO

wintry citrus Aug 10, 2025, 1:30 AM

#

what's tariffs

#

DON'T DELETE IT

#

TELL ME

stray aspen Aug 10, 2025, 1:30 AM

#

where are you from

wintry citrus Aug 10, 2025, 1:30 AM

#

WHO'S TARRIFS

stray aspen Aug 10, 2025, 1:30 AM

#

unlocker

wintry citrus Aug 10, 2025, 1:30 AM

#

stray aspen where are you from

the pyramids country

#

don't even need the actual name

#

🙏

stray aspen Aug 10, 2025, 1:31 AM

#

egypt

wintry citrus Aug 10, 2025, 1:31 AM

#

IM GONNA FIND UR HOUSE

#

what's that

#

that's Bitcoin

#

cool

#

tho why did u send it

#

send me a Bitcoin?

cedar tide Aug 10, 2025, 1:32 AM

#

@echo aurora add GPT 5 mini and nano to webdev

stray aspen Aug 10, 2025, 1:32 AM

#

cant understand how qwen 3 is behind gpt-5 in coding benchmark on lmarena

#

thats crazy

wintry citrus Aug 10, 2025, 1:34 AM

#

ooo

#

u sent an image

#

of

#

a b

#

so

#

give it

#

i need it

#

please

#

same as being gay

#

in us only

stray aspen Aug 10, 2025, 1:36 AM

#

craig stop saying messed up crap and deleting it

wintry citrus Aug 10, 2025, 1:36 AM

#

stray aspen craig stop saying messed up crap and deleting it

who's craig

#

sorry i have dementia

stray aspen Aug 10, 2025, 1:36 AM

#

me

#

im craig

wintry citrus Aug 10, 2025, 1:36 AM

#

oh

#

hey Craig

#

can u give me that medicine

#

named asphantisj

#

or

#

what u said

#

it's name was

#

asphawtusja

#

?

#

what's that

#

aphmetaoejsmkssj

#

aphmetaaozn?

stray aspen Aug 10, 2025, 1:37 AM

#

you're almost there

wintry citrus Aug 10, 2025, 1:37 AM

#

aphmetazinsii

#

aphmeta

#

aphmetaliaz

#

yeah idk

#

is it a word

#

wait who the hell is Craig

#

cipher

#

wow

stray aspen Aug 10, 2025, 1:39 AM

#

lol

wintry citrus Aug 10, 2025, 1:39 AM

#

im that smart

#

im saying cipher shet

#

no

#

buy a nokia

#

it's way better

#

not even for ai

#

but it's better

obsidian shell Aug 10, 2025, 1:40 AM

#

wtf do you need a cpu for?

#

nice stock pic

misty vault Aug 10, 2025, 1:41 AM

#

crack

wintry citrus Aug 10, 2025, 1:41 AM

#

this is an image which the hacker saved to my images after hacking my phone

#

what a cool malware

#

🙂

misty vault Aug 10, 2025, 1:41 AM

#

NVIDIA A10G

#

NVIDIA L40

stray aspen Aug 10, 2025, 1:41 AM

#

thats nice

#

my graphics card lol

misty vault Aug 10, 2025, 1:41 AM

#

stray aspen Aug 10, 2025, 1:42 AM

#

ill run gpt-oss

#

on this

golden ocean Aug 10, 2025, 1:43 AM

#

misty vault NVIDIA A10G

wintry citrus Aug 10, 2025, 1:43 AM

#

are u gonna ask everything to gpt-5

misty vault Aug 10, 2025, 1:43 AM

#

wintry citrus are u gonna ask everything to gpt-5

wintry citrus Aug 10, 2025, 1:44 AM

#

what am i even seeing

stray aspen Aug 10, 2025, 1:44 AM

#

misty vault

what ai is that

wintry citrus Aug 10, 2025, 1:44 AM

#

humans are smarter than chatgpt gpt-5

golden ocean Aug 10, 2025, 1:44 AM

#

https://cdn.discordapp.com/attachments/1263501011184783453/1401226017833619659/caption.gif?ex=6898bb6c&is=689769ec&hm=fe7502e6deac12f92b904bb55a8272ef78709ee44ba67ac4902ea04bfe061c36&

wintry citrus Aug 10, 2025, 1:44 AM

#

haha

#

get beaten

misty vault Aug 10, 2025, 1:45 AM

#

stray aspen what ai is that

It's

stray aspen Aug 10, 2025, 1:45 AM

#

wintry citrus haha

whats better hallucination 2.5 pro or gpt-5

wintry citrus Aug 10, 2025, 1:45 AM

#

golden ocean https://cdn.discordapp.com/attachments/1263501011184783453/1401226017833619659/c...

yeah the text is ok

#

not

#

hand

#

gesture

#

🙂

misty vault Aug 10, 2025, 1:46 AM

#

wintry citrus 🙂

😊 *

wintry citrus Aug 10, 2025, 1:46 AM

#

stray aspen whats better hallucination 2.5 pro or gpt-5

how does it hallucinate

#

dumbass

stray aspen Aug 10, 2025, 1:46 AM

#

wdym how lol

wintry citrus Aug 10, 2025, 1:46 AM

#

wait

stray aspen Aug 10, 2025, 1:46 AM

#

its so obvious

#

it hallucinated copyright licenses and apache crap in a python script

#

and desmos links

misty vault Aug 10, 2025, 1:46 AM

#

dead internet theory

wintry citrus Aug 10, 2025, 1:46 AM

#

SAMUEL

#

wow

#

STOP DELETING UR IMAGES

#

IM GONNA SHART

#

ON U

#

AHHHHHHH

#

yeah because they're fat pigeons

#

ofc it's America

#

everything is fat

#

even the animals

#

yeah i know

#

it's true

#

it's because

#

THEY'RE GAY

#

and they're the strongest country

#

so obv

#

they're

#

the richest

#

so

#

no shet

#

they're richer

#

thank me

#

than*

misty vault Aug 10, 2025, 1:49 AM

#

wintry citrus Aug 10, 2025, 1:49 AM

#

misty vault

what

#

IT'S DEAD INTERNET THOERY

#

i spelled it wrong

#

kill me already