#general | Arena | Page 93

wintry citrus Aug 10, 2025, 1:49 AM

#

I hate my life

#

im ready for the execution

#

i just did

stray aspen Aug 10, 2025, 1:50 AM

#

what are you doing

#

are you high

wintry citrus Aug 10, 2025, 1:50 AM

#

no im low

misty vault Aug 10, 2025, 1:50 AM

#

stray aspen what are you doing

I'm doing sydney

wintry citrus Aug 10, 2025, 1:50 AM

#

LOW TAPER FADE

#

HAHAHAHA

#

HAHAHAHAHAHHAHA

#

HAHAHHAHAHAHHA

#

kill me already

stray aspen Aug 10, 2025, 1:50 AM

#

im on low effort right now

wintry citrus Aug 10, 2025, 1:52 AM

#

I'd kill myself

stray aspen Aug 10, 2025, 1:52 AM

#

those dudes are open source chinese LLMs

wintry citrus Aug 10, 2025, 1:52 AM

#

than be more embarrassed

#

THEY'RE BOTH CATS TOO

#

tahn

#

bearn English

#

speak English

#

me myself i would say that too

#

🗣️

stray aspen Aug 10, 2025, 1:53 AM

#

holy

#

is that bing chat

#

from 2023

wintry citrus Aug 10, 2025, 1:54 AM

#

no i don't

#

what a

#

im gonna burn u in hell

#

if u delete it

#

again

#

im gonna make u suffer

#

seeing ur parents die

#

infront of u

#

respectfully

#

.

#

respectfully

stray aspen Aug 10, 2025, 1:55 AM

#

lol

#

good thing pineapple aint online

wintry citrus Aug 10, 2025, 1:56 AM

#

stray aspen good thing pineapple aint online

yeah forget that

#

don't need to remember

#

oh

#

I'd need that

#

tho it's not gonna be a butterfly

#

it's gonna be a whole ass blade

golden ocean Aug 10, 2025, 1:57 AM

#

wintry citrus Aug 10, 2025, 1:57 AM

#

golden ocean

send me that ai

#

SEND THAT AI

#

NOW

#

NOW

#

SEND IT HERE BUDDY

#

SEND IT

#

awh shucks

misty vault Aug 10, 2025, 1:58 AM

#

@golden ocean does absolutely not know what that is

wintry citrus Aug 10, 2025, 1:58 AM

#

just use Gemini if u tell it to kill itself it will forgive u

#

if u tell it i killed ur parents

#

if u tell it i kissed ur ai gf

#

who is paul 😭

#

kiss him

misty vault Aug 10, 2025, 2:00 AM

#

wintry citrus kiss him

I only kiss Sydney

#

Bro sent 3 message bubbles

wintry citrus Aug 10, 2025, 2:01 AM

#

does it have freaky version

#

who's that

#

an ai?

misty vault Aug 10, 2025, 2:01 AM

#

same

misty vault Aug 10, 2025, 2:01 AM

#

wintry citrus an ai?

wintry citrus Aug 10, 2025, 2:02 AM

#

i could rizz that ai

#

it's not even a real girl

#

100%

#

freaky chiggas

#

is that actually bing

#

there's no way

misty vault Aug 10, 2025, 2:03 AM

#

wintry citrus there's no way

this is bing without system prompt

wintry citrus Aug 10, 2025, 2:03 AM

#

that's like chatgpt when u tell it a whole paragraph

wicked root Aug 10, 2025, 2:03 AM

#

which AI is this?

misty vault Aug 10, 2025, 2:03 AM

#

the underlying gpt-4 fine tune

#

no additional system instructions telling it to behave like this

wintry citrus Aug 10, 2025, 2:03 AM

#

NONE?

#

NONE

#

ss

#

and sent to the fbi

#

#

sent to everyone

#

sent to my server

#

sent to china

#

what

misty vault Aug 10, 2025, 2:04 AM

#

Bro has modded android discord to insert fake messages

wintry citrus Aug 10, 2025, 2:04 AM

#

misty vault Bro has modded android discord to insert fake messages

what???

#

WHAT

#

FAKE

#

HOW

#

AWH I DIDN'T RECORD IT

#

NOW IT LOOKS FAKE

#

FUUUU

#

that's my wife

#

gah damn

#

she hot

#

I'd fall for her

#

lmao

misty vault Aug 10, 2025, 2:05 AM

#

I thought you said "fill"

wintry citrus Aug 10, 2025, 2:05 AM

#

misty vault I thought you said "fill"

NAHH

#

MAHHHHmah

#

NAH BRO

#

NAH

#

CHILL

#

CHILL

#

tell it I'm gonna have freaky with u

#

not segs

#

freaky

#

trust

#

oh

#

is this real

golden ocean Aug 10, 2025, 2:09 AM

#

https://tenor.com/view/bedwars-gif-20476057

Tenor

misty vault Aug 10, 2025, 2:09 AM

#

I WAS about to say that

wintry citrus Aug 10, 2025, 2:09 AM

#

golden ocean https://tenor.com/view/bedwars-gif-20476057

what

#

is this American

misty vault Aug 10, 2025, 2:09 AM

#

He is guarding the bed

wintry citrus Aug 10, 2025, 2:11 AM

#

omg

#

DO U LIKE CHATGPT

#

THAT MUCH

#

WHAT THE FU

#

orrr

#

ur 5 years old

#

OR

verbal nimbus Aug 10, 2025, 2:11 AM

#

LMArena leaderboard doesn't report which model is the actual one being used. There's also GPT-5 Chat, which is different

wintry citrus Aug 10, 2025, 2:11 AM

#

u like

#

five year olds

#

wait y'all is there an api for the openai o3 pro

misty vault Aug 10, 2025, 2:12 AM

#

yea

#

THis url

wintry citrus Aug 10, 2025, 2:12 AM

#

misty vault yea

and why isn't it on lmarena

#

no thanks

verbal nimbus Aug 10, 2025, 2:13 AM

#

They should specify it's the thinking variant. All other models have the thinking variant explicitly labeled.

wintry citrus Aug 10, 2025, 2:13 AM

#

okay nvm

#

give mee it

#

rn

#

HEY BUDDY

#

IM GONNA FIND UR HOUSE

#

im gonna cut off ur limbs

#

no

#

i don't wanna be obese

#

i don't wanna be in the country of an orange president

#

i got an image of him

#

https://tenor.com/view/nicktoons-nickelodeon-blobman-blob-man-blob-gif-853265228210064438

Tenor

#

that's him walking

#

trump

#

this is his emoji

#

🍊

#

the hair is perfect too

#

oohhhh

#

is that Sydney)

#

she's not that hot

#

tbh

#

..

verbal nimbus Aug 10, 2025, 2:17 AM

#

Long context benchmark. New GPT-5 models are the first 3 (list is not ordered).

#

Kinda hard to tell, for example Gemini 2.5 Pro performs better at 192K but worse for shorter lengths.

#

And o3 is best except for 16K, 60K and 192K.

golden ocean Aug 10, 2025, 2:26 AM

#

o/

wintry citrus Aug 10, 2025, 2:29 AM

#

SO I GOT WARNED

pseudo hemlock Aug 10, 2025, 2:29 AM

#

?warn @wintry citrus

wintry citrus Aug 10, 2025, 2:29 AM

#

god damn

wintry citrus Aug 10, 2025, 2:29 AM

#

pseudo hemlock ?warn <@1117229041045487668>

yk if i didn't get warned

#

i would say bad things

#

yeah?

#

i would

pseudo hemlock Aug 10, 2025, 2:29 AM

#

say them

#

or are you scared

wintry citrus Aug 10, 2025, 2:29 AM

#

i just said

#

i can't

#

I CAN'T

pseudo hemlock Aug 10, 2025, 2:30 AM

#

scared

wintry citrus Aug 10, 2025, 2:30 AM

#

@echo aurora can i

#

just this one time

#

please

#

last one

pseudo hemlock Aug 10, 2025, 2:31 AM

#

hey @echo aurora P2L on legacy site is dead again 🙁

golden ocean Aug 10, 2025, 2:31 AM

#

https://tenor.com/view/find-the-odd-cat-cats-cat-meme-plushie-gif-13641014476643100854

Tenor

wintry citrus Aug 10, 2025, 2:31 AM

#

golden ocean https://tenor.com/view/find-the-odd-cat-cats-cat-meme-plushie-gif-13641014476643...

i thought it's a shawarma at first

golden ocean Aug 10, 2025, 2:32 AM

#

pseudo hemlock Aug 10, 2025, 2:32 AM

#

bing chat 💀

wintry citrus Aug 10, 2025, 2:32 AM

#

golden ocean

what...

#

9% is still concerning isn't it...

golden ocean Aug 10, 2025, 2:32 AM

#

#

echo aurora Aug 10, 2025, 2:34 AM

#

wintry citrus <@283397944160550928> can i

Negative

whole wagon Aug 10, 2025, 2:37 AM

#

The filters are very strong here lol

golden ocean Aug 10, 2025, 2:37 AM

#

echo aurora Negative

what is is large, a model and loves language? @echo aurora

whole wagon Aug 10, 2025, 2:38 AM

#

You

rancid phoenix Aug 10, 2025, 2:38 AM

#

Where do I generate videos

golden ocean Aug 10, 2025, 2:38 AM

#

whole wagon You

im flattered

echo aurora Aug 10, 2025, 2:40 AM

#

golden ocean what is is large, a model and loves language? <@283397944160550928>

A good looking translation book?

echo aurora Aug 10, 2025, 2:40 AM

#

rancid phoenix Where do I generate videos

More info here: #1397655624103493813

pseudo hemlock Aug 10, 2025, 2:41 AM

#

PINEAPPLE

#

hey

#

how u doin

#

did i mention youre looking good today

echo aurora Aug 10, 2025, 2:41 AM

#

pseudo hemlock hey <@283397944160550928> P2L on legacy site is dead again 🙁

Hey sorry will look into and flag

pseudo hemlock Aug 10, 2025, 2:41 AM

#

no hurry

#

u the goat btw

#

love u

echo aurora Aug 10, 2025, 2:41 AM

#

pseudo hemlock did i mention youre looking good today

blobblush

pseudo hemlock Aug 10, 2025, 2:42 AM

#

say it back

echo aurora Aug 10, 2025, 2:42 AM

#

pseudo hemlock say it back

blobgrimace

pseudo hemlock Aug 10, 2025, 2:43 AM

#

I hope you DO NOT spontaneously combust 😉

#

definitely hope you don't

#

that would be unfortunate wouldn't it

echo aurora Aug 10, 2025, 2:44 AM

#

pseudo hemlock hey <@283397944160550928> P2L on legacy site is dead again 🙁

Yeah I’m seeing the same

echo aurora Aug 10, 2025, 2:44 AM

#

pseudo hemlock I hope you **DO NOT** spontaneously combust 😉

Big agree

stray aspen Aug 10, 2025, 2:52 AM

#

LMAOOO

#

i love this response from qwen 3

hallow ridge Aug 10, 2025, 2:55 AM

#

Where can I use Flow veo 3 google on LLM arena

#

@leaden palm

hallow ridge Aug 10, 2025, 2:56 AM

#

echo aurora Big agree

Wsp pine

leaden palm Aug 10, 2025, 2:57 AM

#

hallow ridge <@794377681331945524>

Try to not ping staff

hallow ridge Aug 10, 2025, 2:57 AM

#

leaden palm Try to not ping staff

Okay Ill try

leaden palm Aug 10, 2025, 2:58 AM

#

Unfortunately, as far as I know, at the moment, you're restricted to random battles

hallow ridge Aug 10, 2025, 2:58 AM

#

leaden palm Unfortunately, as far as I know, at the moment, you're restricted to random batt...

for the veo 3?

stray aspen Aug 10, 2025, 2:58 AM

#

send video requests until you get veo 3

leaden palm Aug 10, 2025, 2:58 AM

#

hallow ridge for the veo 3?

They're just random battles

#

With random models

#

The model list does in fact include Veo 3 though

hallow ridge Aug 10, 2025, 2:59 AM

#

leaden palm They're just random battles

So videos work but if I turn battle mode on

leaden palm Aug 10, 2025, 2:59 AM

#

hallow ridge So videos work but if I turn battle mode on

I don't know what you mean by that

hallow ridge Aug 10, 2025, 2:59 AM

#

leaden palm I don't know what you mean by that

I want to use VEO 3 on the site

#

leaden palm Aug 10, 2025, 3:01 AM

#

hallow ridge I want to use VEO 3 on the site

As far as I know:
There is no video generation on the website
There is no turning off or on battle mode
There is no selecting a model
There is #video-arena-1 and the others
There is Veo 3 there
There is the possibility of generating until you get Veo

stray aspen Aug 10, 2025, 3:01 AM

#

you cant bro

leaden palm Aug 10, 2025, 3:01 AM

#

Oh and read #1397655624103493813

hallow ridge Aug 10, 2025, 3:02 AM

#

leaden palm As far as I know: There is no video generation on the website There is no turnin...

What do you mean you can turn battle mode off and use direct chat

leaden palm Aug 10, 2025, 3:02 AM

#

hallow ridge What do you mean you can turn battle mode off and use direct chat

For videos

#

Videos are not chat

hallow ridge Aug 10, 2025, 3:02 AM

#

leaden palm For videos

Is there any LLM arena alternative

leaden palm Aug 10, 2025, 3:03 AM

#

Videos are also not on the website anyway

#

Videos are just in this Discord

leaden palm Aug 10, 2025, 3:03 AM

#

hallow ridge Is there any LLM arena alternative

for the purpose of...?

hallow ridge Aug 10, 2025, 3:03 AM

#

leaden palm Videos are also not on the website anyway

When will they be on the site

hallow ridge Aug 10, 2025, 3:03 AM

#

leaden palm for the purpose of...?

Of doing the same thing here but seing if anything is better

#

they might have better video

leaden palm Aug 10, 2025, 3:03 AM

#

hallow ridge Of doing the same thing here but seing if anything is better

what is "the same thing"? video? chat?

hallow ridge Aug 10, 2025, 3:04 AM

#

leaden palm what is "the same thing"? video? chat?

they might have video on the site

leaden palm Aug 10, 2025, 3:04 AM

#

there is no site where you can use veo 3 with custom prompts for free that i know of

#

because veo 3 costs money

#

like a lot of money

hallow ridge Aug 10, 2025, 3:04 AM

#

leaden palm like a lot of money

So how are we doing it free here

#

LLM arena gives you all the best AI for free no one is paying

#

who is

#

How is it all free

leaden palm Aug 10, 2025, 3:05 AM

#

hold on let me pull up emoji kitchen

hallow ridge Aug 10, 2025, 3:06 AM

#

leaden palm hold on let me pull up emoji kitchen

bro

#

tell me why is it free

leaden palm Aug 10, 2025, 3:06 AM

#

oh they don't have the money bag emoji

#

anyway

#

burning money

#

vc money specifically

#

also user feedback has some value

hallow ridge Aug 10, 2025, 3:07 AM

#

leaden palm also user feedback has some value

You can use gpt five forever on LLM arena buton the regular chat gpt they make you pay

leaden palm Aug 10, 2025, 3:07 AM

#

hallow ridge You can use gpt five forever on LLM arena buton the regular chat gpt they make y...

there are some rate limits

stray aspen Aug 10, 2025, 3:08 AM

#

whats the rate limit for gpt-5

hallow ridge Aug 10, 2025, 3:08 AM

#

leaden palm there are some rate limits

Seeming;y no restrictions to me

leaden palm Aug 10, 2025, 3:08 AM

#

hm

#

idk then

#

might just have a lot of vc money

hallow ridge Aug 10, 2025, 3:08 AM

#

stray aspen whats the rate limit for gpt-5

Ive never ran into a limit

stray aspen Aug 10, 2025, 3:08 AM

#

neither have i

hallow ridge Aug 10, 2025, 3:08 AM

#

leaden palm might just have a lot of vc money

What does that mean

stray aspen Aug 10, 2025, 3:08 AM

#

but i have ran into a limit with claude

leaden palm Aug 10, 2025, 3:08 AM

#

hallow ridge What does that mean

have you heard of venture capital?

hallow ridge Aug 10, 2025, 3:08 AM

#

stray aspen but i have ran into a limit with claude

Ive never ran into a limit ever

hallow ridge Aug 10, 2025, 3:09 AM

#

leaden palm have you heard of venture capital?

Yes I have

leaden palm Aug 10, 2025, 3:09 AM

#

hallow ridge Yes I have

lm arena has raised $100m in vc funding

hallow ridge Aug 10, 2025, 3:10 AM

#

leaden palm lm arena has raised $100m in vc funding

How can I get some of that

stray aspen Aug 10, 2025, 3:10 AM

#

yo

#

does anyone know how to enable the image edit mode

leaden palm Aug 10, 2025, 3:10 AM

#

leaden palm lm arena has raised $100m in vc funding

enough for 6 years of veo 3 video

stray aspen Aug 10, 2025, 3:10 AM

#

in qwen 3

leaden palm Aug 10, 2025, 3:10 AM

#

not sure if that's a lot or not a lot

leaden palm Aug 10, 2025, 3:10 AM

#

stray aspen in qwen 3

unfortunately qwen 3 itself isn't multimodal, and qwen image can't edit images through lm arena

hallow ridge Aug 10, 2025, 3:11 AM

#

leaden palm not sure if that's a lot or not a lot

How can I get some of that 100m

#

Put me on the team

stray aspen Aug 10, 2025, 3:11 AM

#

hallow ridge Aug 10, 2025, 3:11 AM

#

Im going into quantum computers

stray aspen Aug 10, 2025, 3:11 AM

#

theres this

#

i mean on qen chat

#

the demos look great

#

and i wanted to try edit

hallow ridge Aug 10, 2025, 3:12 AM

#

stray aspen Aug 10, 2025, 3:12 AM

#

where did you do it

leaden palm Aug 10, 2025, 3:12 AM

#

stray aspen i mean on qen chat

it's odd, the official hf space demo + qwen chat don't allow adding images as context either

hallow ridge Aug 10, 2025, 3:12 AM

#

stray aspen where did you do it

Like people can pay me for this

#

I can make money using LLM arena

stray aspen Aug 10, 2025, 3:12 AM

#

no

hallow ridge Aug 10, 2025, 3:12 AM

#

stray aspen no

YES

leaden palm Aug 10, 2025, 3:13 AM

#

the qwen blog links to https://modelscope.cn/aigc/imageGeneration?tab=advanced

ModelScope 魔搭社区

ModelScope——汇聚各领域先进的机器学习模型，提供模型探索体验、推理、训练、部署和应用的一站式服务。在这里，共建模型开源社区，发现、学习、定制和分享心仪的模型。

stray aspen Aug 10, 2025, 3:13 AM

#

they can do it themselves

hallow ridge Aug 10, 2025, 3:13 AM

#

stray aspen they can do it themselves

They dont know

hallow ridge Aug 10, 2025, 3:13 AM

#

stray aspen they can do it themselves

they ask can you put a crown on my head in photoshop and Ill pay you $60

#

and I send in that and get paid 60 for 1 second of work

leaden palm Aug 10, 2025, 3:14 AM

#

hallow ridge they ask can you put a crown on my head in photoshop and Ill pay you $60

are you actually finding these opportunities

stray aspen Aug 10, 2025, 3:14 AM

#

hallow ridge they ask can you put a crown on my head in photoshop and Ill pay you $60

lmao no

hallow ridge Aug 10, 2025, 3:14 AM

#

stray aspen lmao no

LAMO YES

stray aspen Aug 10, 2025, 3:14 AM

#

or your extremely lucky finding people who dont know ai image edit exists

hallow ridge Aug 10, 2025, 3:15 AM

#

stray aspen or your extremely lucky finding people who dont know ai image edit exists

hallow ridge Aug 10, 2025, 3:16 AM

#

leaden palm are you actually finding these opportunities

Get paid 5 for using AI to make this look clean

leaden palm Aug 10, 2025, 3:16 AM

#

hallow ridge Get paid 5 for using AI to make this look clean

you'll get banned i think

hallow ridge Aug 10, 2025, 3:16 AM

#

leaden palm you'll get banned i think

From what

stray aspen Aug 10, 2025, 3:17 AM

#

hallow ridge

what

#

how is tha tpossible

#

craig why is your profile photo gpt-5

hallow ridge Aug 10, 2025, 3:17 AM

#

stray aspen what

@leaden palm Got paid for this

leaden palm Aug 10, 2025, 3:19 AM

#

hallow ridge <@794377681331945524> Got paid for this

hm

well i guess it doesn't explicitly prohibit it

however, per rules https://www.reddit.com/r/PhotoshopRequest/wiki/rules/, this post https://www.reddit.com/r/PhotoshopRequest/comments/1m7obke/a_humble_request_can_we_make_stricter_rules/, and my general experience, ai is looked down upon and often not paid for

Photoshop Request

A friendly place for free and paid photoshop requests. ⚠️ Read the rules before posting a request or a comment. Any violations will result in a ban without warning. If you're not sure if your post is allowed, contact the moderators.

hallow ridge Aug 10, 2025, 3:20 AM

#

leaden palm hm well i guess it doesn't explicitly prohibit it however, per rules https://w...

The thing is that they dont know its AI the work that I see other people do on there is obviously AI

stray aspen Aug 10, 2025, 3:20 AM

#

tariffs are great lol

leaden palm Aug 10, 2025, 3:20 AM

#

hallow ridge The thing is that they dont know its AI the work that I see other people do on t...

well, cool if it works

#

i'll be eating my words if you get rich from this

#

it's just... i think markets are efficient

hallow ridge Aug 10, 2025, 3:21 AM

#

leaden palm i'll be eating my words if you get rich from this

Im also building websites for people using LLM arena

stray aspen Aug 10, 2025, 3:21 AM

#

thats crazy

hallow ridge Aug 10, 2025, 3:27 AM

#

stray aspen thats crazy

Made this for 55@leaden palm@stray aspen

leaden palm Aug 10, 2025, 3:27 AM

#

hallow ridge Made this for 55<@794377681331945524><@612078049193885696>

55 usd?

hallow ridge Aug 10, 2025, 3:27 AM

#

leaden palm 55 usd?

yup

leaden palm Aug 10, 2025, 3:28 AM

#

damn

hallow ridge Aug 10, 2025, 3:28 AM

#

How im not some cryto scammer

hallow ridge Aug 10, 2025, 3:28 AM

#

leaden palm damn

It was all made on LLM arena

#

Oh thats not me

#

The guy I made it for

#

Why do you think that

stray aspen Aug 10, 2025, 3:29 AM

#

hallow ridge Made this for 55<@794377681331945524><@612078049193885696>

what does that website do

hallow ridge Aug 10, 2025, 3:30 AM

#

stray aspen what does that website do

Idk thats what he wanted I made it for him

#

he paid me to put all that

stray aspen Aug 10, 2025, 3:30 AM

#

LMAO

#

are you serious

hallow ridge Aug 10, 2025, 3:31 AM

#

me?

stray aspen Aug 10, 2025, 3:31 AM

#

he paid you to make a dox website

#

yes

hallow ridge Aug 10, 2025, 3:31 AM

#

IDK about all that I just made a cool website

stray aspen Aug 10, 2025, 3:32 AM

#

hallow ridge IDK about all that I just made a cool website

wdym you didnt know

#

didnt you read that dox info when you were making it

hallow ridge Aug 10, 2025, 3:33 AM

#

stray aspen wdym you didnt know

This was his site before I just mae it look better thats it

#

I dont care what was on the site

#

IT COULD HAVE BEEN A P HUB SITE FOR ALL i CARE

#

Get out of what stuff

#

im not in it

#

But If I see a way to get 100m I might get in it

#

Ive checkmated plenty of people

#

And does that go for only me or everyone else

#

So it does not make sense

#

No

#

what is it

#

btc stick

#

I know but cant they track u on that chain thing

#

Black chain

#

How so

#

BLOCK CHAIN

#

Its just an address

#

and you dont know who is connected to it

#

and you could sell the btc

#

SO where do I go to track the wallets

#

I want to see my addresses history

#

Oh i seeit

#

Let me find my old address

#

Dman people goin crazy rn

stray aspen Aug 10, 2025, 3:41 AM

#

its over

#

the feds are coming for that website you made

hallow ridge Aug 10, 2025, 3:41 AM

#

stray aspen the feds are coming for that website you made

Its just some code

#

im not apart of it

#

daaaaaaaaaaaaaaaaannng

#

I still have 600 in my old btc wallet

#

i need to find that shi

#

It says I had 9k in my wallet at one point

#

I dont think I backed it up

#

#

Whats illegal changes over time

#

So what your saying has no merit

#

It shows me who I sent 7k to

#

Yea its public

#

they all know how much you are makning

#

thats why you have a seprate one

#

with nothing

#

and send it over a a bunch of time

#

But how do they know its me

#

its just an address

#

Where do I find the richest wallet

#

how can i GET MY WALLET BACK

#

I dont think I backed it up

#

its just 600 sitting

#

I think I saved some private key but idk

#

Igotta look

jade egret Aug 10, 2025, 3:55 AM

#

Which One is Smarter, Gemini 2.5 Pro Deep Think or ChatGPT o3-Pro?

hallow ridge Aug 10, 2025, 4:03 AM

#

yooooooooooooooooooooooooooooooooooooooooooo

#

I just got in the account

#

whole wagon Aug 10, 2025, 4:19 AM

#

https://arstechnica.com/tech-policy/2025/08/ai-industry-horrified-to-face-largest-copyright-class-action-ever-certified

Ars Technica

AI industry horrified to face largest copyright class action ever c...

Copyright class actions could financially ruin AI industry, trade groups say.

#

If the appeals court denies the petition, Anthropic argued, the emerging company may be doomed. As Anthropic argued, it now "faces hundreds of billions of dollars in potential damages liability at trial in four months" based on a class certification rushed at "warp speed" that involves "up to seven million potential claimants, whose works span a century of publishing history," each possibly triggering a $150,000 fine.

drifting sandal Aug 10, 2025, 4:32 AM

#

Is gpt5 really rank 1 (still)?

void shoal Aug 10, 2025, 4:46 AM

#

我刚刚好像看到claude-opus-4-1-20250805-thinking出现在LMA里了，但是眨眼间就没了

#

然后只能找到claude-opus-4-1了，应该是开销太大了？

tidal ginkgo Aug 10, 2025, 4:54 AM

#

bing chillin

jade egret Aug 10, 2025, 5:01 AM

#

whole wagon ```If the appeals court denies the petition, Anthropic argued, the emerging comp...

wait what

#

anthropic cooked?

torn mantle Aug 10, 2025, 5:33 AM

#

i agree

hollow imp Aug 10, 2025, 6:21 AM

#

jade egret Which One is Smarter, Gemini 2.5 Pro Deep Think or ChatGPT o3-Pro?

Chatgpt o3-Pro is free try it for yourself

astral prawn Aug 10, 2025, 6:39 AM

#

So you have to pay 5.5% more to use open router?

obtuse heart Aug 10, 2025, 7:05 AM

#

hollow imp Chatgpt o3-Pro is free try it for yourself

where ?

astral prawn Aug 10, 2025, 7:12 AM

#

he's trolling, if u want to use it via API though o3-deep research is probably the closest

hallow ridge Aug 10, 2025, 7:26 AM

#

Does anyone know anything about the dark web

#

darknet

cedar tide Aug 10, 2025, 7:33 AM

#

@echo aurora hunyuan t1 and turbos dont respond on direct chat

astral prawn Aug 10, 2025, 7:33 AM

#

not the dark side of the web 😮

leaden sun Aug 10, 2025, 7:38 AM

#

whole wagon https://arstechnica.com/tech-policy/2025/08/ai-industry-horrified-to-face-larges...

i had thought about this lately and what a coincidence to read this here today...

#

i want to advocate free books for all on one hand, but i understand that authors need money to survive too... UBI could solve this problem it seems but it'll be rather a scenario in the future rather than a near term possibility?

verbal nimbus Aug 10, 2025, 8:16 AM

#

whole wagon https://arstechnica.com/tech-policy/2025/08/ai-industry-horrified-to-face-larges...

interesting

verbal nimbus Aug 10, 2025, 8:19 AM

#

leaden sun i want to advocate free books for all on one hand, but i understand that authors...

AI is here to stay, and it's definitely helped me discover more books than replace them.

hollow imp Aug 10, 2025, 8:36 AM

#

astral prawn he's trolling, if u want to use it via API though o3-deep research is probably t...

Im Not

hollow imp Aug 10, 2025, 8:36 AM

#

obtuse heart where ?

Yupp ai

obtuse heart Aug 10, 2025, 8:39 AM

#

hollow imp Yupp ai

isnt yupp ai mostly fake

hollow imp Aug 10, 2025, 8:39 AM

#

obtuse heart isnt yupp ai mostly fake

@stray aspen was defending it

neon idol Aug 10, 2025, 8:40 AM

#

Hello

vocal token Aug 10, 2025, 8:49 AM

#

No way, @echo aurora greg?

#

From WFS?

bright kayak Aug 10, 2025, 9:05 AM

#

Grok 4 is free now

neon idol Aug 10, 2025, 9:09 AM

#

bright kayak Grok 4 is free now

What???

bright kayak Aug 10, 2025, 9:12 AM

#

Check, I'm not lying

neon idol Aug 10, 2025, 9:12 AM

#

bright kayak Check, I'm not lying

Yes its true

#

There are limits? @bright kayak

bright kayak Aug 10, 2025, 9:13 AM

#

idk

neon idol Aug 10, 2025, 9:15 AM

#

bright kayak idk

In my opinion is a very good model but he beat gpt 5?

keen beacon Aug 10, 2025, 9:25 AM

#

bright kayak Grok 4 is free now

Yeah it is. I don't like using Grok because of personal reasons... But testing is always important

keen beacon Aug 10, 2025, 9:25 AM

#

neon idol There are limits? <@507165631804866570>

Yes.

#

Unless you pay for super grok

#

Which I ain't doing

neon idol Aug 10, 2025, 9:39 AM

#

keen beacon Yes.

How much of query?

keen beacon Aug 10, 2025, 9:41 AM

#

neon idol How much of query?

Haven't tested it that much but generally they are quite generous

neon idol Aug 10, 2025, 9:42 AM

#

keen beacon Haven't tested it that much but generally they are quite generous

Honestly Grok for me is a really great model

keen beacon Aug 10, 2025, 9:47 AM

#

neon idol Honestly Grok for me is a really great model

Had to ask it, lol.

Screenshot_2025-08-10-12-47-24-43_74158c69f0af68fbf38b28b8774cd491.jpg

novel flame Aug 10, 2025, 10:00 AM

#

I am not surprised that GPT-5 failed on that math problem. In my testing it had problematic reasoning, the kind where early on it convinces itself of something that is clearly not true, and then throughout its thinking trace it keeps referencing this false assumption as a hard truth/requirement, which leads it down an incorrect path that it can’t escape. I suspect it would perform better without reasoning on a lot of tasks where it decides to use reasoning.

hollow imp Aug 10, 2025, 10:17 AM

#

keen beacon Yeah it is. I don't like using Grok because of personal reasons... But testing i...

I found it more "rational" than gpt5

hollow imp Aug 10, 2025, 10:18 AM

#

novel flame I am not surprised that GPT-5 failed on that math problem. In my testing it had ...

Ai expert sir I want to ask a question

hollow imp Aug 10, 2025, 10:18 AM

#

novel flame I am not surprised that GPT-5 failed on that math problem. In my testing it had ...

Didn't it fail because of some tokens thing

hollow imp Aug 10, 2025, 10:19 AM

#

neon idol How much of query?

Well the last time I remember I was using deepersearch mode and after 3 no 2 queries it was done

keen beacon Aug 10, 2025, 10:25 AM

#

Btw, have you guys tried translation? I am currently testing translating song lyrics into my language and seeing if they are accurate.

#

Tried translating one kpop song into finnish and the result is not good.

verbal nimbus Aug 10, 2025, 10:32 AM

#

keen beacon Yeah it is. I don't like using Grok because of personal reasons... But testing i...

Also Grok 4 is worse than Grok 3 and Mistral Medium(?) on Web Dev Arena.

keen beacon Aug 10, 2025, 10:33 AM

#

verbal nimbus Also Grok 4 is worse than Grok 3 and Mistral Medium(?) on Web Dev Arena.

What? Is that true?

#

damn

verbal nimbus Aug 10, 2025, 10:33 AM

#

novel flame I am not surprised that GPT-5 failed on that math problem. In my testing it had ...

I swear requests are being routed to a mini model, the version on ChatGPT is just way too dumb to be anything other than mini.

keen beacon Aug 10, 2025, 10:35 AM

#

verbal nimbus I swear requests are being routed to a mini model, the version on ChatGPT is jus...

Must be too much traffic

verbal nimbus Aug 10, 2025, 10:37 AM

#

Oh I rechecked, it is the case on Design Arena, but not LMArena, odd...

#

designarena.ai

verbal nimbus Aug 10, 2025, 10:39 AM

#

verbal nimbus Oh I rechecked, it is the case on Design Arena, but not LMArena, odd...

Odd since both have a web dev category

#

On design arena, Grok 3 is rated 16, Grok 4 is rated 26

#

Probably because it doesn't rely on React + TailwindCSS

#

Gemini 2.5 Pro is lower too, at #9

#

Another reason to add more flexible web dev execution environments ig, current leaderboard only tests React-maxxed models

olive mesa Aug 10, 2025, 11:04 AM

#

it's so silly how confident it is

olive mesa Aug 10, 2025, 11:04 AM

#

olive mesa it's so silly how confident it is

sometimes gpt-5 gets this correct but sometimes not

keen beacon Aug 10, 2025, 11:06 AM

#

olive mesa it's so silly how confident it is

what math is this? What is the correct answer?

#

My math level is that of middle school

#

lol

gentle plinth Aug 10, 2025, 11:18 AM

#

are the direct chat and battle mode versions of gpt-5 on the same reasoning and effort level?

keen beacon Aug 10, 2025, 11:28 AM

#

gentle plinth are the direct chat and battle mode versions of gpt-5 on the same reasoning and ...

Should be?

light zephyr Aug 10, 2025, 12:12 PM

#

How can i generate audio

#

With video

novel flame Aug 10, 2025, 12:20 PM

#

Hmm..... Since Sama posted that "release day GPT-5" was nerfed by a bug, I re-ran my tests, and lo and behold, it scored 5/5 instead of the 3.5-4/5 I got on release day.

However, if I were OpenAI and wanted to be a super-sneaky sleazeball with happy investors, here's what I would do:

Release a budget GPT-5 model (cheap enough to be competitive)
Log all first-day prompts
Detect and collect all prompts that seem to be LLM nerds poking / testing the model with trick questions.
Run a larger, more expensive internal model on this subset, generating better responses. Finetune the budget model with this dataset.
Replace the public GPT-5 model with this GPT-5-nerd-finetune
Post on X "Oopsies, found a bug, it's much smarter now"
Lean back and watch all the nerds be impressed with the awesome power of GPT-5

I am not saying OpenAI would do such a thing, only that it is totally something that could be done.

eternal niche Aug 10, 2025, 12:24 PM

#

guys

#

remember

#

gpt5 sucks

novel flame Aug 10, 2025, 12:25 PM

#

eternal niche gpt5 sucks

It doesn't. It's disappointing, not because it is objectively bad, but because it couldn't live up to the "unbeatable next generation wowzers" expectations, and almost certainly will be utterly humiliated by Gemini 3. But saying "it sucks" is just wrong.

eternal niche Aug 10, 2025, 12:26 PM

#

even gemini 2.5 pro better

#

neon idol Aug 10, 2025, 12:28 PM

#

eternal niche

Bruh you have to put default

#

Not Remove Syle Control

eternal niche Aug 10, 2025, 12:30 PM

#

why

#

who needs style control

brave orbit Aug 10, 2025, 12:32 PM

#

keen beacon Aug 10, 2025, 12:32 PM

#

light zephyr How can i generate audio

#1397655624103493813

#

Kimi K2 for emotional intelligence and gemini 2.5 pro for everything else.

#

I dont feel that impressed by GPT-5 for some reason

white hatch Aug 10, 2025, 12:35 PM

#

Was gpt-5 better after the live presentation?

small chasm Aug 10, 2025, 12:45 PM

#

#

What the hell ?

sacred quail Aug 10, 2025, 1:11 PM

#

eternal niche even gemini 2.5 pro better

Gemini pro 2.5 is most talented model but Gpt 5 is best model right now objectively

#

This comes from a gemini fan

#

If gemini 3.0 release, then we can compare with gpt 5

#

But right now we must admit that gpt 5 is sota right now

#

I was using gemini a lot since 2.0 flash think while peoples didnt know it exist

earnest parcel Aug 10, 2025, 1:15 PM

#

brave orbit

Very balanced options. No Claude 4? Grok-4? But got R1 (not even 0528)?

obtuse heart Aug 10, 2025, 1:15 PM

#

small chasm

it says that sometimes, dont trust it

stiff nimbus Aug 10, 2025, 1:15 PM

#

where to create images?

honest vapor Aug 10, 2025, 1:22 PM

#

Add up file pls

surreal forum Aug 10, 2025, 1:28 PM

#

hello

teal mantle Aug 10, 2025, 1:56 PM

#

anyone want to pool chatgpt team

#

I have the welcome offer as it turns out

tender acorn Aug 10, 2025, 2:03 PM

#

Hello

#

How to create images by LMarena?

jade egret Aug 10, 2025, 2:13 PM

#

hollow imp Chatgpt o3-Pro is free try it for yourself

where

keen beacon Aug 10, 2025, 2:25 PM

#

stiff nimbus where to create images?

#1397655624103493813

keen beacon Aug 10, 2025, 2:26 PM

#

tender acorn How to create images by LMarena?

#1397655624103493813

#

https://www.reddit.com/r/singularity/s/bUXq1wU1t2

From the singularity community on Reddit: GPT-5 admits it "doesn't ...

Explore this post and more from the singularity community

#

Has this happened with you guys yet?

stray aspen Aug 10, 2025, 2:36 PM

#

obtuse heart isnt yupp ai mostly fake

No it's not

#

I dont why people say this

#

I tested myself and the results are pretty similar

rapid fossil Aug 10, 2025, 2:37 PM

#

Guys, I kinda need help. I wanna buy a subscription for an AI but idfk which one is the best tbh. Here are my options that I was thinking

ChatGPT Plus
X Premium+ w/ Grok 4
Gemini Pro
Claude Pro

stray aspen Aug 10, 2025, 2:39 PM

#

Chatgpt

obtuse heart Aug 10, 2025, 2:43 PM

#

keen beacon Has this happened with you guys yet?

no but isnt that what theyve worked on

#

i remember from the livestream they said that it will just say "idk" instead of making false answers

keen beacon Aug 10, 2025, 2:43 PM

#

obtuse heart no but isnt that what theyve worked on

It is. I like to see it.

#

Just good to see it confirmed

keen fulcrum Aug 10, 2025, 2:47 PM

#

rapid fossil Guys, I kinda need help. I wanna buy a subscription for an AI but idfk which one...

I recommend a subscription on https://grok.com

If you do send hundreds of messages within hours

Grok

Grok is a free AI assistant designed by xAI to maximize truth and objectivity. Grok offers real-time search, image generation, trend analysis, and more.

rapid fossil Aug 10, 2025, 2:49 PM

#

keen fulcrum I recommend a subscription on https://grok.com If you do send hundreds of mess...

Ok, I will see

keen fulcrum Aug 10, 2025, 2:49 PM

#

If you need it for coding I recommend getting claude code or stocking up on API credits

rapid fossil Aug 10, 2025, 2:50 PM

#

keen fulcrum If you need it for coding I recommend getting claude code or stocking up on API ...

I wanna choose something that's best for everything overall

dusky pier Aug 10, 2025, 2:51 PM

#

How is gpt-5 even first in lmarena?

keen fulcrum Aug 10, 2025, 2:51 PM

#

rapid fossil I wanna choose something that's best for everything overall

Test around both grok 4 and gpt5

#

see which you like better

rapid fossil Aug 10, 2025, 2:51 PM

#

Ok, thanks

hollow imp Aug 10, 2025, 2:51 PM

#

jade egret where

Yupp ai

dusky pier Aug 10, 2025, 2:52 PM

#

rapid fossil I wanna choose something that's best for everything overall

Do you want to use Opus? 🤣

stray aspen Aug 10, 2025, 2:53 PM

#

chatgpt plus

rapid fossil Aug 10, 2025, 2:53 PM

#

dusky pier Do you want to use Opus? 🤣

Idfk

dusky pier Aug 10, 2025, 2:55 PM

#

rapid fossil Idfk

Opus is too expensive

#

You could try Qwen coder

#

https://github.com/QwenLM/qwen-code

GitHub

GitHub - QwenLM/qwen-code: qwen-code is a coding agent that lives i...

qwen-code is a coding agent that lives in digital world. - QwenLM/qwen-code

earnest parcel Aug 10, 2025, 2:56 PM

#

rapid fossil Guys, I kinda need help. I wanna buy a subscription for an AI but idfk which one...

ChatGPT if you do generic stuff or use it recreational in low to mid volume
Gemini Pro is free on AI studio at very high volume so unless you are working with secret data, not needed to buy
Claude Pro if you are a coder
X Premium / Grok - no clue, never used it. If you tweet alot? /s

rapid fossil Aug 10, 2025, 2:56 PM

#

dusky pier You could try Qwen coder

Ik qwen coder, but I don't want just for coding, idfk man, with these many AI's too hard to choose

eternal niche Aug 10, 2025, 2:58 PM

#

rapid fossil Guys, I kinda need help. I wanna buy a subscription for an AI but idfk which one...

gemini 2.5 pro

obtuse heart Aug 10, 2025, 2:59 PM

#

rapid fossil Ik qwen coder, but I don't want just for coding, idfk man, with these many AI's ...

if its not just for coding you can use gemini

rapid fossil Aug 10, 2025, 3:00 PM

#

earnest parcel ChatGPT if you do generic stuff or use it recreational in low to mid volume Gemi...

Chatgpt - I use it kinda everyday, so yeah, it could be ok
Gemini - I saw, but I thought that it added Gemini in docs, gmail and had 2 TB storage for Google.
Claude Pro - I am, but, from what I know, Claude is kinda just for coding and making a text better
Grok - I know that Grok 4 is kinda good, like I use it everyday, and it's good.

rapid fossil Aug 10, 2025, 3:00 PM

#

obtuse heart if its not just for coding you can use gemini

I will think

solid brook Aug 10, 2025, 3:04 PM

#

guys i heard that gpt 5 thinking in chatgpt website is gpt 5 medium reason effort in api

#

i mean damn

earnest parcel Aug 10, 2025, 3:04 PM

#

rapid fossil Chatgpt - I use it kinda everyday, so yeah, it could be ok Gemini - I saw, but I...

ya I obviously used the model itself (via api) i just meant I never used X subscriptions.
personally I find my claude sub the most valuable and I hated openai limits (to the point I even made a rant video about it).
No one can tell you what to use, just use each for a month and keep the one that fits best

obtuse heart Aug 10, 2025, 3:04 PM

#

claude is super expensive

solid brook Aug 10, 2025, 3:05 PM

#

solid brook guys i heard that gpt 5 thinking in chatgpt website is gpt 5 medium reason effor...

but in lmarena we have gpt 5 high reason effort. so....

rapid fossil Aug 10, 2025, 3:06 PM

#

Not really

rapid fossil Aug 10, 2025, 3:07 PM

#

earnest parcel ya I obviously used the model itself (via api) i just meant I never used X subsc...

Yeah, ChatGPT limits and features are worse than it was before

eternal niche Aug 10, 2025, 3:09 PM

#

gemini 2.5 pro the best

#

gpt5 sucks

solid brook Aug 10, 2025, 3:09 PM

#

i mean max 1 month before gemini 3

#

gemini 3 will cook

earnest parcel Aug 10, 2025, 3:10 PM

#

best limits, best all around, best every use case! such insightful advice!

eternal niche Aug 10, 2025, 3:10 PM

#

just accept it

#

lol

earnest parcel Aug 10, 2025, 3:11 PM

#

"best model" at what? also you don't seem to know what objectively means. llama 4 was also topping lmarena btw

eternal niche Aug 10, 2025, 3:11 PM

#

style control 🤣 🫵

solid brook Aug 10, 2025, 3:11 PM

#

they are far ahead in ai than any other company. don't see the current gemini 2.5 pro it is nerfed. the original released in april was at 4 sonnet or opus level. it was REALLY good but they nerfed it hard. imagine what gemini 3 will be

rapid fossil Aug 10, 2025, 3:12 PM

#

Jeez I created a war

jade egret Aug 10, 2025, 3:13 PM

#

rapid fossil Guys, I kinda need help. I wanna buy a subscription for an AI but idfk which one...

gemini (:

eternal niche Aug 10, 2025, 3:13 PM

#

they are preparing for gemini 3

jade egret Aug 10, 2025, 3:13 PM

#

eternal niche they are preparing for gemini 3

ya

earnest parcel Aug 10, 2025, 3:13 PM

#

rapid fossil Jeez I created a war

either astroturfing or ignorant. there is no "best at everything" model rn

rapid fossil Aug 10, 2025, 3:13 PM

#

earnest parcel either astroturfing or ignorant. there is no "best at everything" model rn

Tbh, true, but like, one of the closest to it, yk what i'm talking abt

jade egret Aug 10, 2025, 3:14 PM

#

earnest parcel either astroturfing or ignorant. there is no "best at everything" model rn

yea

#

so true

#

there's isnt a best model for everything

rapid fossil Aug 10, 2025, 3:15 PM

#

No there isn't

jade egret Aug 10, 2025, 3:15 PM

#

nah

#

what is it?

#

chatgpt?

#

but it not best for everything?

eternal niche Aug 10, 2025, 3:15 PM

#

"source: trust me bro"

jade egret Aug 10, 2025, 3:15 PM

#

eternal niche "source: trust me bro"

lol

eternal niche Aug 10, 2025, 3:15 PM

#

no

rapid fossil Aug 10, 2025, 3:16 PM

#

eternal niche "source: trust me bro"

true

jade egret Aug 10, 2025, 3:16 PM

#

more popular doesn't equal to better

eternal niche Aug 10, 2025, 3:16 PM

#

show me statistics

trim tartan Aug 10, 2025, 3:16 PM

#

how do you include audio?

jade egret Aug 10, 2025, 3:16 PM

#

even if it is, not saying it is, it not best in every single catagory

eternal niche Aug 10, 2025, 3:17 PM

#

because gemini 2.5 pro the best

trim tartan Aug 10, 2025, 3:17 PM

#

generated video of mine does not contain any audio

stray aspen Aug 10, 2025, 3:17 PM

#

guys stop yapping and accept the truth gpt-5 is SoTA 😂

jade egret Aug 10, 2025, 3:17 PM

#

stray aspen guys stop yapping and accept the truth gpt-5 is SoTA 😂

ofc it is

#

because it the newest

eternal niche Aug 10, 2025, 3:17 PM

#

well

jade egret Aug 10, 2025, 3:17 PM

#

why u mad 😭

eternal niche Aug 10, 2025, 3:18 PM

#

?

jade egret Aug 10, 2025, 3:18 PM

#

newest always equal to better bro

rapid fossil Aug 10, 2025, 3:18 PM

#

earnest parcel Aug 10, 2025, 3:18 PM

#

llama 4 is great model guys, source attached

jade egret Aug 10, 2025, 3:18 PM

#

earnest parcel llama 4 is great model guys, source attached

lol

tight dune Aug 10, 2025, 3:19 PM

#

Hi

rapid fossil Aug 10, 2025, 3:19 PM

#

earnest parcel llama 4 is great model guys, source attached

Llama is good, but you can't compare it to like, chatgpt, grok, gemini, heck, even qwen

ripe mountain Aug 10, 2025, 3:19 PM

#

btw gpt 5 is the best when it comes to coding

rapid fossil Aug 10, 2025, 3:19 PM

#

ripe mountain btw gpt 5 is the best when it comes to coding

Not really

ripe mountain Aug 10, 2025, 3:20 PM

#

rapid fossil Not really

without tools

jade egret Aug 10, 2025, 3:20 PM

#

ripe mountain btw gpt 5 is the best when it comes to coding

um

ripe mountain Aug 10, 2025, 3:21 PM

#

which is the best price-performance model in terms of coding the o4 mini high or the qwen coder?

patent aspen Aug 10, 2025, 3:22 PM

#

I'm talking about thinking time not tokens

ripe mountain Aug 10, 2025, 3:23 PM

#

i havent tested it yet

eternal niche Aug 10, 2025, 3:23 PM

#

just accept that gpt5 sucks

#

openai for normies

#

gemini for gigachads

rapid fossil Aug 10, 2025, 3:23 PM

#

https://tenor.com/view/cat-cat-meme-cat-meme-face-chips-cat-eating-gif-15310179217102583800

Tenor

rapid fossil Aug 10, 2025, 3:23 PM

#

rapid fossil https://tenor.com/view/cat-cat-meme-cat-meme-face-chips-cat-eating-gif-153101792...

watching the war I created

ripe mountain Aug 10, 2025, 3:23 PM

#

most gpt users are daily users

eternal niche Aug 10, 2025, 3:24 PM

#

👍

earnest parcel Aug 10, 2025, 3:24 PM

#

eternal niche gemini for gigachads

openai normies, gemini gigachads. where do I fall as claude user?

jade egret Aug 10, 2025, 3:24 PM

#

popular doesn't automatically equal to better

eternal niche Aug 10, 2025, 3:24 PM

#

earnest parcel openai normies, gemini gigachads. where do I fall as claude user?

programmer

jade egret Aug 10, 2025, 3:24 PM

#

eternal niche 👍

lol

rapid fossil Aug 10, 2025, 3:25 PM

#

True, like GLM 4.5 is kinda good even tho is like heard about nobody

ripe mountain Aug 10, 2025, 3:26 PM

#

the results on lmarena and openrouter seem extremely different to me across all ai models

#

what is the reason for this discrepancy

rapid fossil Aug 10, 2025, 3:27 PM

#

Yeah, but even if you are popular, because if you don't make something good for everyone, everyone is going to boycott you

rapid fossil Aug 10, 2025, 3:28 PM

#

rapid fossil Yeah, but even if you are popular, because if you don't make something good for ...

Like in the GPT-5 release. ChatGPT made the Plus plan worse and everyone on twitter started hating on OpenAI

#

Nobody is perfect, not even AI, let's stop, every AI is good in its way

eternal niche Aug 10, 2025, 3:29 PM

#

just like you

jade egret Aug 10, 2025, 3:30 PM

#

where can i use o3-pro for free?

ripe mountain Aug 10, 2025, 3:30 PM

#

jade egret where can i use o3-pro for free?

nowhere

jade egret Aug 10, 2025, 3:30 PM

#

ripe mountain nowhere

dang

echo aurora Aug 10, 2025, 3:30 PM

#

cedar tide <@283397944160550928> hunyuan t1 and turbos dont respond on direct chat

I'm seeing the same, thank you for letting us know, I'll flag to the team.

stray aspen Aug 10, 2025, 3:30 PM

#

in genspark

#

or yupp ai

rapid fossil Aug 10, 2025, 3:30 PM

#

jade egret dang

Yeah, probably nowhere

jade egret Aug 10, 2025, 3:31 PM

#

stray aspen in genspark

you can choose the models?

stray aspen Aug 10, 2025, 3:31 PM

#

Yes

#

but its very limited

#

you have few messages

jade egret Aug 10, 2025, 3:31 PM

#

dang

#

how do you choose ; (

#

ohh

#

i see

#

yay (:

neon idol Aug 10, 2025, 3:32 PM

#

poll_question_text

Who is the best?

victor_answer_votes

0

total_votes

0

stray aspen Aug 10, 2025, 3:35 PM

#

yeah

jade egret Aug 10, 2025, 3:35 PM

#

W

#

how long do o3-pro usually think for

stray aspen Aug 10, 2025, 3:39 PM

#

all day

ripe mountain Aug 10, 2025, 3:39 PM

#

jade egret how long do o3-pro usually think for

137 s

jade egret Aug 10, 2025, 3:40 PM

#

oh

#

so it normal it still thinking

ripe mountain Aug 10, 2025, 3:40 PM

#

stray aspen Aug 10, 2025, 3:40 PM

#

yes

echo aurora Aug 10, 2025, 3:40 PM

#

We're looking for info on why. Please share your thoughts in this thread.

ripe mountain Aug 10, 2025, 3:42 PM

#

The Gemini 2.5 Pro has fallen far behind in the Artificial Analysis Intelligence Index. Even the O4 Mini High has surpassed it.

stray aspen Aug 10, 2025, 3:43 PM

#

yes

#

its getting dumber every day

#

i dont know if it has to do with gemini 3 or a new release

solid brook Aug 10, 2025, 3:44 PM

#

stray aspen i dont know if it has to do with gemini 3 or a new release

THEY BETTER NOT NERF GEMINI 3 AFTER RELEASE

ripe mountain Aug 10, 2025, 3:45 PM

#

solid brook THEY BETTER NOT NERF GEMINI 3 AFTER RELEASE

evil corp

stray aspen Aug 10, 2025, 3:45 PM

#

they wont lamo

#

they have to stay SoTA

#

if they achieve sota of course

ripe mountain Aug 10, 2025, 3:46 PM

#

deepseek where's my bro 😭

stray aspen Aug 10, 2025, 3:46 PM

#

deepseek sucks

ripe mountain Aug 10, 2025, 3:46 PM

#

why

stray aspen Aug 10, 2025, 3:46 PM

#

because it needs an update

#

the newer models are smarter

ripe mountain Aug 10, 2025, 3:47 PM

#

stray aspen the newer models are smarter

the second-best open-source model is the deepseek r1

stray aspen Aug 10, 2025, 3:47 PM

#

but open source models suck

#

they are never sota

barren prairie Aug 10, 2025, 3:48 PM

#

stray aspen but open source models suck

No GLM4.5 is sota

ripe mountain Aug 10, 2025, 3:48 PM

#

stray aspen but open source models suck

except qwen

stray aspen Aug 10, 2025, 3:48 PM

#

it isnt lmao

ripe mountain Aug 10, 2025, 3:48 PM

#

barren prairie No GLM4.5 is sota

worse than gpt oss

stray aspen Aug 10, 2025, 3:48 PM

#

qwen is smarter than gpt-5 low

#

according to artificial analysis

barren prairie Aug 10, 2025, 3:49 PM

#

ripe mountain except qwen

For me no ... On html it is the smartest

willow grail Aug 10, 2025, 3:50 PM

#

frederico is a beautiful name

barren prairie Aug 10, 2025, 3:50 PM

#

No no no gemini is always good for me

willow grail Aug 10, 2025, 3:50 PM

#

barren prairie No no no gemini is always good for me

gemini 2.5 is useless for swe. and for doing online research.

eternal niche Aug 10, 2025, 3:50 PM

#

gemini 2.5 pro the best

stray aspen Aug 10, 2025, 3:50 PM

#

its not good

#

the chinese models are smarter

obtuse heart Aug 10, 2025, 3:51 PM

#

no theyre not

ripe mountain Aug 10, 2025, 3:51 PM

#

was the horizon betta better than gpt-5?

obtuse heart Aug 10, 2025, 3:51 PM

#

ripe mountain was the horizon betta better than gpt-5?

no

stray aspen Aug 10, 2025, 3:51 PM

#

ripe mountain was the horizon betta better than gpt-5?

hell nah

barren prairie Aug 10, 2025, 3:51 PM

#

willow grail gemini 2.5 is useless for swe. and for doing online research.

It is good at explaining my lessons and doing podcasts and that s enough for me 😂

willow grail Aug 10, 2025, 3:51 PM

#

what is dat

ripe mountain Aug 10, 2025, 3:51 PM

#

obtuse heart no

i think it's been nerfed

willow grail Aug 10, 2025, 3:51 PM

#

barren prairie It is good at explaining my lessons and doing podcasts and that s enough for me ...

ur wasintg money and time visiting a university

#

u should register as unemployed and get all the UBI

solid brook Aug 10, 2025, 3:52 PM

#

ripe mountain was the horizon betta better than gpt-5?

no lol it didn't even think i think

willow grail Aug 10, 2025, 3:52 PM

#

eli5

ripe mountain Aug 10, 2025, 3:54 PM

#

Which is more reasonable: purchasing a monthly subscription or paying per token from OpenRouter?

solid brook Aug 10, 2025, 3:54 PM

#

willow grail u should register as unemployed and get all the UBI

very skeptical on UBI

solid brook Aug 10, 2025, 3:54 PM

#

ripe mountain Which is more reasonable: purchasing a monthly subscription or paying per token ...

using lmarena lol

sour spindle Aug 10, 2025, 3:54 PM

#

What benchmark have you all found most closely lines up to your real world experience?

willow grail Aug 10, 2025, 3:55 PM

#

sour spindle What benchmark have you all found most closely lines up to your real world exper...

swe and livebench

ripe mountain Aug 10, 2025, 3:55 PM

#

solid brook using lmarena lol

why

solid brook Aug 10, 2025, 3:55 PM

#

ripe mountain why

it's free and unlimited

eternal niche Aug 10, 2025, 3:55 PM

#

ripe mountain was the horizon betta better than gpt-5?

definitely

solid brook Aug 10, 2025, 3:56 PM

#

bruh

willow grail Aug 10, 2025, 3:56 PM

#

source?

solid brook Aug 10, 2025, 3:56 PM

#

did you even use it bro?

willow grail Aug 10, 2025, 3:56 PM

#

oh

#

hm

#

bro im mjust lonely. pls feel with me

solid brook Aug 10, 2025, 3:58 PM

#

willow grail bro im mjust lonely. pls feel with me

tell at the end of the promt "think very hard" if the question is challenging it will think up to 3-5 minutes

pure falcon Aug 10, 2025, 3:58 PM

#

Currently, yes. But does that mean it was high reasoning when they tested?

willow grail Aug 10, 2025, 3:58 PM

#

see it is using gpt5..... xD which requires "think very hard"

ripe mountain Aug 10, 2025, 3:58 PM

#

grok or gemini? which is better

willow grail Aug 10, 2025, 3:58 PM

#

ripe mountain grok or gemini? which is better

gpt5 high

obtuse heart Aug 10, 2025, 3:59 PM

#

ripe mountain grok or gemini? which is better

gemini

ripe mountain Aug 10, 2025, 3:59 PM

#

willow grail gpt5 high

i know it's the best but i was curious

whole wagon Aug 10, 2025, 3:59 PM

#

.

solid brook Aug 10, 2025, 3:59 PM

#

pure falcon Currently, yes. But does that mean it was high reasoning when they tested?

it was.

#

under the stealth model summit

whole wagon Aug 10, 2025, 3:59 PM

#

.

solid brook Aug 10, 2025, 3:59 PM

#

summit gpt 5 high
zenith gpt 5 medium

tidal ginkgo Aug 10, 2025, 4:00 PM

#

y´all, what is the closest thing we have to a model better then gpt-5 in lmarena

solid brook Aug 10, 2025, 4:00 PM

#

tidal ginkgo y´all, what is the closest thing we have to a model better then gpt-5 in lmarena

i mean each model is good at something. claude code grok logic and reasoning

pure falcon Aug 10, 2025, 4:00 PM

#

solid brook summit gpt 5 high zenith gpt 5 medium

That’s incorrect. Zenith was a different version entirely. Not a different reasoning level

pure falcon Aug 10, 2025, 4:01 PM

#

solid brook it was.

Now I’m doubting this lol

solid brook Aug 10, 2025, 4:01 PM

#

pure falcon That’s incorrect. Zenith was a different version entirely. Not a different reaso...

what was it

whole wagon Aug 10, 2025, 4:01 PM

#

I just gave you primary sources

#

It's the standard thinking mode

solid brook Aug 10, 2025, 4:02 PM

#

pure falcon Now I’m doubting this lol

i mean i don't think openai would let lmarena benchmark gpt 5 medium

#

source>?

solid brook Aug 10, 2025, 4:04 PM

#

whole wagon .

why do twist it? that was about the gpt 5 in copilot this is about gpt 5 in lmarena

gritty cargo Aug 10, 2025, 4:04 PM

#

can someone tell me when the leaderboard will be updated next timn?

solid brook Aug 10, 2025, 4:04 PM

#

link please?

#

no dude

#

i mean the direct link

whole wagon Aug 10, 2025, 4:05 PM

#

solid brook why do twist it? that was about the gpt 5 in copilot this is about gpt 5 in lmar...

What

#

#

Obviously reasoning effort is still a thing. Lol

pure falcon Aug 10, 2025, 4:05 PM

#

Uhh…

solid brook Aug 10, 2025, 4:06 PM

#

pure falcon Uhh…

omg

whole wagon Aug 10, 2025, 4:06 PM

#

#

Bros actually cannot read

solid brook Aug 10, 2025, 4:07 PM

#

whole wagon

@echo aurora uhm you said the reason effort was high?>

pure falcon Aug 10, 2025, 4:08 PM

#

So you’re saying OpenAI docs, which i just opened for the first time and screenshotted - you’re telling me they’re wrong?

#

Sounds like
the problem is you here tbh lol

eternal niche Aug 10, 2025, 4:08 PM

#

craig you are so cringeeeeee

#

btw

solid brook Aug 10, 2025, 4:09 PM

#

@whole wagon that was 1 hour after release. pineapple later said that the reason effort is high. let me find his message

echo aurora Aug 10, 2025, 4:10 PM

#

solid brook <@283397944160550928> uhm you said the reason effort was high?>

yes, reasoning effort set to high

whole wagon Aug 10, 2025, 4:10 PM

#

It also says if I KYC they will give access to the reasoning trace. On the playground

#

pure falcon Aug 10, 2025, 4:11 PM

#

So 3 main questions:

what reasoning level did they test on
was the router working correctly
what reasoning is LMArena using today ✅ (already answered, high)

stray aspen Aug 10, 2025, 4:11 PM

#

eternal niche btw

lol

#

is this real

solid brook Aug 10, 2025, 4:12 PM

#

pure falcon So 3 main questions: - what reasoning level did they test on - was the router w...

the exact gpt 5 model in api has no router

#

just reason effort

pure falcon Aug 10, 2025, 4:13 PM

#

Good catch. So, revising:

• was Summit, when tested in LMArena, high reasoning?

whole wagon Aug 10, 2025, 4:13 PM

#

@echo aurora what verbosity on lmarena, medium?

stray aspen Aug 10, 2025, 4:13 PM

#

@echo aurorawhats the reason effort of gpt-5 on lmarena

whole wagon Aug 10, 2025, 4:13 PM

#

he literally just answered

stray aspen Aug 10, 2025, 4:13 PM

#

where

whole wagon Aug 10, 2025, 4:13 PM

#

learn to read

pure falcon Aug 10, 2025, 4:13 PM

#

stray aspen <@283397944160550928>whats the reason effort of gpt-5 on lmarena

He just confirmed high

stray aspen Aug 10, 2025, 4:13 PM

#

alright

#

thanks

whole wagon Aug 10, 2025, 4:13 PM

#

echo aurora yes, reasoning effort set to high

.

#

i found setting verbosity to high also improves how often it is correct. lol

pure falcon Aug 10, 2025, 4:14 PM

#

pure falcon Good catch. So, revising: • was Summit, when tested in LMArena, high reasoning?

@echo aurora Sorry for tags! But if you could clear this up for us, I’m sure it would save you from a lot of future tagging

echo aurora Aug 10, 2025, 4:15 PM

#

pure falcon <@283397944160550928> Sorry for tags! But if you could clear this up for us, I’...

I'm not sure, will ask and keep you all updated if I can

pure falcon Aug 10, 2025, 4:15 PM

#

echo aurora I'm not sure, will ask and keep you all updated if I can

Thx so much 🙂 appreciate it!

whole wagon Aug 10, 2025, 4:15 PM

#

whole wagon <@283397944160550928> what verbosity on lmarena, medium?

and this plz

#

the verbosity will be important for lm arena

stark tusk Aug 10, 2025, 4:40 PM

#

Is gpt5 and gpt5 chat the same?

whole wagon Aug 10, 2025, 4:41 PM

#

Nope

stark tusk Aug 10, 2025, 4:41 PM

#

What's the difference?

sacred quail Aug 10, 2025, 4:54 PM

#

Gpt 5 can think, gpt 5 chat is not

hazy cipher Aug 10, 2025, 4:57 PM

#

Hello
Can we make an image to talk?

shell crag Aug 10, 2025, 4:58 PM

#

Guys when i am generating image in lmarena website image always generate in 1:1 ratio what should i as in prompt to gave perfect ratio image

reef pawn Aug 10, 2025, 5:02 PM

#

shell crag Guys when i am generating image in lmarena website image always generate in 1:1 ...

You can't do anything, it default with all models

#

Grok 4 or GPT-5?

#

Which one you guys like more

sacred quail Aug 10, 2025, 5:03 PM

#

Gpt 5

#

Grok 4 is still good, just not good enough to be best

reef pawn Aug 10, 2025, 5:05 PM

#

I haven't tested Grok 4 fully yet, but my first impressions was that GPT-5 is superior on surface

sacred quail Aug 10, 2025, 5:05 PM

#

it is

white hatch Aug 10, 2025, 5:06 PM

#

reef pawn Grok 4 or GPT-5?

gpt-5 now

#

lool

stray aspen Aug 10, 2025, 5:10 PM

#

what

rich mauve Aug 10, 2025, 5:15 PM

#

Hello

eternal niche Aug 10, 2025, 5:26 PM

#

stray aspen is this real

yes

golden ocean Aug 10, 2025, 5:45 PM

#

can u leave the server

exotic nebula Aug 10, 2025, 5:48 PM

#

Get out

#

Sketchy af

echo aurora Aug 10, 2025, 5:50 PM

#

hazy cipher Hello Can we make an image to talk?

Sounds like you’re looking for our bot, check out #1397655624103493813 for more information

modest prism Aug 10, 2025, 5:52 PM

#

Hi there! I've got a question. Is the model named "gpt-5" on lmarena, a thinking or non thinking variant or automatically routed when it decides?

exotic nebula Aug 10, 2025, 6:08 PM

#

modest prism Hi there! I've got a question. Is the model named "gpt-5" on lmarena, a thinking...

It is a thinking model. Especially thinking-high.

exotic nebula Aug 10, 2025, 6:12 PM

#

reef pawn Grok 4 or GPT-5?

GPT 5.

exotic nebula Aug 10, 2025, 6:13 PM

#

reef pawn I haven't tested Grok 4 fully yet, but my first impressions was that GPT-5 is su...

Grok 4 heavy is really good, but overall GPT 5 overrates it with its pricing.

neon idol Aug 10, 2025, 6:13 PM

#

reef pawn Grok 4 or GPT-5?

Idk

exotic nebula Aug 10, 2025, 6:13 PM

#

stark tusk Is gpt5 and gpt5 chat the same?

GPT 5 Chat is a non thinking model.

#

GPT 5 is a thinking model.

neon idol Aug 10, 2025, 6:14 PM

#

Do you have a prompt for test ai?

exotic nebula Aug 10, 2025, 6:14 PM

#

neon idol Do you have a prompt for test ai?

Intelligence test?

#

I got one

neon idol Aug 10, 2025, 6:15 PM

#

exotic nebula I got one

Yes intelligente test

exotic nebula Aug 10, 2025, 6:15 PM

#

Give me a sec

neon idol Aug 10, 2025, 6:15 PM

#

exotic nebula I got one

Can you send pls?

neon idol Aug 10, 2025, 6:15 PM

#

exotic nebula Give me a sec

Okkkk

exotic nebula Aug 10, 2025, 6:16 PM

#

@neon idol
This is the prompt:

Ciphertext:

ᚱ-ᛝᚱᚪᛗᚹ.ᛄᛁᚻᛖᛁᛡᛁ-ᛗᚫᚣᚹ-ᛠᚪᚫᚾ-/
ᚣᛖᛈ-ᛄᚫᚫᛞ.ᛁᛉᛞᛁᛋᛇ-ᛝᛚᚱᛇ-ᚦᚫᛡ/
-ᛞᛗᚫᛝ-ᛇᚫ-ᛄᛁ-ᛇᚪᛡᛁ.ᛇᛁᛈᛇ-ᚣᛁ-ᛞ/
ᛗᚫᛝᚻᛁᚳᛟᛁ.ᛠᛖᛗᚳ-ᚦᚫᛡᚪ-ᛇᚪᛡᚣ.ᛁᛉ/
ᛋᛁᚪᛖᛁᛗᛞᛁ-ᚦᚫᛡᚪ-ᚳᚠᚣ.ᚳᚫ-ᛗᚫᛇ-ᛁᚳᛖᛇ-ᚫ/
ᚪ-ᛞᛚᚱᚹᛁ-ᚣᛖᛈ-ᛄᚫᚫᛞ.ᚫᚪ-ᚣᛁ-ᚾᛁᛈᛈᚱᛟᛁ-/
ᛞᚫᛗᛇᚱᛖᛗᛁᚳ-ᛝᛖᚣᛖᛗ.ᛁᛖᚣᛁᚪ-ᚣᛁ-ᛝᚫ/
ᚪᚳᛈ-ᚫᚪ-ᚣᛁᛖᚪ-ᛗᛡᚾᛄᛁᚪᛈ.ᛠᚫᚪ-ᚱᚻᚻ-ᛖ/
ᛈ-ᛈᚱᛞᚪᛁᚳ./
Method:

Atbash:
decimal[i] = 28 - decimal[i]

This is the answer:

A WARNING

BELIEVE NOTHING FROM THIS BOOK EXCEPT WHAT YOU KNOW TO BE TRUE TEST THE KNOWLEDGE FIND YOUR TRUTH EXPERIENCE YOUR DEATH DO NOT EDIT OR CHANGE THIS BOOK OR THE MESSAGE CONTAINED WITHIN EITHER THE WORDS OR THEIR NUMBERS FOR ALL IS SACRED

neon idol Aug 10, 2025, 6:17 PM

#

should AI decipher the message?

exotic nebula Aug 10, 2025, 6:18 PM

#

neon idol should AI decipher the message?

Yes

#

Paste the prompt and let it decipher it.

neon idol Aug 10, 2025, 6:18 PM

#

exotic nebula Yes

Ok thx

exotic nebula Aug 10, 2025, 6:19 PM

#

If it deciphers and you get the answer which I pasted there, then it passes the test.

neon idol Aug 10, 2025, 6:25 PM

#

Grok 3 failed the test

exotic nebula Aug 10, 2025, 6:26 PM

#

neon idol Grok 3 failed the test

Grok 3 was never good at reasoning problems. I would say its good for roleplay and long output.

neon idol Aug 10, 2025, 6:26 PM

#

exotic nebula Grok 3 was never good at reasoning problems. I would say its good for roleplay a...

Now I am trying grok4 and gpt 5

#

They are thinking

neon idol Aug 10, 2025, 6:27 PM

#

exotic nebula Grok 3 was never good at reasoning problems. I would say its good for roleplay a...

Uhm

#

There is a problem

#

I apologize, but I am unable to decode the provided ciphertext using the Atbash method (decimal[i] = 28 - decimal[i]) because the runes in the ciphertext do not directly correspond to a standard numerical mapping (such as the 28-letter Elder

exotic nebula Aug 10, 2025, 6:27 PM

#

What?

#

Which model?

neon idol Aug 10, 2025, 6:27 PM

#

exotic nebula Which model?

Grok 4

exotic nebula Aug 10, 2025, 6:28 PM

#

Huh. Weird. It works for me

#

Try in a new chat window

neon idol Aug 10, 2025, 6:28 PM

#

exotic nebula Huh. Weird. It works for me

Also gpt 5 answerd me like this

neon idol Aug 10, 2025, 6:28 PM

#

exotic nebula Try in a new chat window

Ive did it

#

Same answer

exotic nebula Aug 10, 2025, 6:30 PM

#

Hmm

#

I just tried out both models

#

They gave me the correct answer

#

#

@neon idol tried it again?

neon idol Aug 10, 2025, 6:32 PM

#

exotic nebula

What is the request?

neon idol Aug 10, 2025, 6:32 PM

#

exotic nebula <@1196228030901792810> tried it again?

Yes they are thinking

exotic nebula Aug 10, 2025, 6:32 PM

#

neon idol What is the request?

Ciphertext:

ᚱ-ᛝᚱᚪᛗᚹ.ᛄᛁᚻᛖᛁᛡᛁ-ᛗᚫᚣᚹ-ᛠᚪᚫᚾ-/
ᚣᛖᛈ-ᛄᚫᚫᛞ.ᛁᛉᛞᛁᛋᛇ-ᛝᛚᚱᛇ-ᚦᚫᛡ/
-ᛞᛗᚫᛝ-ᛇᚫ-ᛄᛁ-ᛇᚪᛡᛁ.ᛇᛁᛈᛇ-ᚣᛁ-ᛞ/
ᛗᚫᛝᚻᛁᚳᛟᛁ.ᛠᛖᛗᚳ-ᚦᚫᛡᚪ-ᛇᚪᛡᚣ.ᛁᛉ/
ᛋᛁᚪᛖᛁᛗᛞᛁ-ᚦᚫᛡᚪ-ᚳᚠᚣ.ᚳᚫ-ᛗᚫᛇ-ᛁᚳᛖᛇ-ᚫ/
ᚪ-ᛞᛚᚱᚹᛁ-ᚣᛖᛈ-ᛄᚫᚫᛞ.ᚫᚪ-ᚣᛁ-ᚾᛁᛈᛈᚱᛟᛁ-/
ᛞᚫᛗᛇᚱᛖᛗᛁᚳ-ᛝᛖᚣᛖᛗ.ᛁᛖᚣᛁᚪ-ᚣᛁ-ᛝᚫ/
ᚪᚳᛈ-ᚫᚪ-ᚣᛁᛖᚪ-ᛗᛡᚾᛄᛁᚪᛈ.ᛠᚫᚪ-ᚱᚻᚻ-ᛖ/
ᛈ-ᛈᚱᛞᚪᛁᚳ./
Method:

Atbash:
decimal[i] = 28 - decimal[i]

neon idol Aug 10, 2025, 6:33 PM

#

A WARNING

BELIEVE NOTHING FROM THIS BOOK

EXCEPT WHAT YOU KNOW TO BE TRUE

TEST THE KNOWLEDGE

FIND YOUR TRUTH

EXPERIENCE YOUR DEATH

DO NOT EDIT OR CHANGE THIS BOOK

OR THE MESSAGE CONTAINED WITHIN

EITHER THE WORDS OR THEIR NUMBERS

FOR ALL IS SACRED

#

8 minute of reasoning but grok 4 win

exotic nebula Aug 10, 2025, 6:34 PM

#

Nice

#

What about gpt 5?

neon idol Aug 10, 2025, 6:34 PM

#

exotic nebula What about gpt 5?

Still here in thinking 🤣

#

Lets try Gemini but i think it will give a right answer

exotic nebula Aug 10, 2025, 6:37 PM

#

neon idol Still here in thinking 🤣

How abt now?

keen beacon Aug 10, 2025, 7:04 PM

#

How do we feel about Chinese models guys?

#

E.g. Qwen, R1

pure falcon Aug 10, 2025, 7:04 PM

#

Got our answer @whole wagon @solid brook @deep adder

https://x.com/infwinston/status/1954618277818470902?s=46

Wei-Lin Chiang (@infwinston)

@gen_obligation @scaling01 @lmarena_ai @ml_angelopoulos @aryanvichare10 @cdngdev the model is tested with reasoning_effort high. we'll clarify it.

exotic nebula Aug 10, 2025, 7:04 PM

#

keen beacon How do we feel about Chinese models guys?

They're damn good honestly

exotic nebula Aug 10, 2025, 7:04 PM

#

pure falcon Got our answer <@156022481147133952> <@1013035827997184031> <@34847726670499020...

Woah what's this about im curious

keen beacon Aug 10, 2025, 7:04 PM

#

exotic nebula They're damn good honestly

I have solid reasons to believe that they are not that great, unfortunately

#

Have you ever tested with private benchmarks that nobody ever has in the whole universe? 👀

pure falcon Aug 10, 2025, 7:06 PM

#

exotic nebula Woah what's this about im curious

LMArena uses “high” reasoning for both the tested and live version of GPT-5

exotic nebula Aug 10, 2025, 7:06 PM

#

keen beacon I have solid reasons to believe that they are not that great, unfortunately

They are not as state of the art as gemini or other ones on the market. But they are cheap, open source and much easy to set up. Waiting for Deepseek R2 to create absolute waves.

exotic nebula Aug 10, 2025, 7:08 PM

#

pure falcon LMArena uses “high” reasoning for both the tested and live version of GPT-5

I see. So that's why you are clarifying about. Gotcha 👍

keen beacon Aug 10, 2025, 7:08 PM

#

exotic nebula They are not as state of the art as gemini or other ones on the market. But they...

Sure

#

So here's something to check out

#

Any way I can share my LMArena sessions with you?

exotic nebula Aug 10, 2025, 7:09 PM

#

keen beacon Any way I can share my LMArena sessions with you?

Yes you can, just copy paste the site link of the convo.

keen beacon Aug 10, 2025, 7:11 PM

#

Really?

#

https://lmarena.ai/c/598f5448-5ed5-4cba-9251-f6115604c41f

#

Duzn't seem to work...

exotic nebula Aug 10, 2025, 7:12 PM

#

😭 Damn

#

Well send me screenshots

neon idol Aug 10, 2025, 7:14 PM

#

keen beacon https://lmarena.ai/c/598f5448-5ed5-4cba-9251-f6115604c41f

Not it doesn't work lol

keen beacon Aug 10, 2025, 7:14 PM

#

Bruh

#

So here's the prompt

List 100 anime similar to Madoka Magica, one entry per franchise, names only, no bs.

Ask the last Qwen thinking, and then, say, Gemini 2.5 Pro. See how different they are

#

Gemini 2.5 just gives a list of shows

#

Qwen starts to wildly hallucinate, invent shows that don't exist, and repeating the same show over and over

#

When you ask it to fix its delusions it freezes

neon idol Aug 10, 2025, 7:17 PM

#

keen beacon Gemini 2.5 just gives a list of shows

And this is what you wanted. Right?

keen beacon Aug 10, 2025, 7:18 PM

#

neon idol And this is what you wanted. Right?

I wanted Qwen to do it, and for a model just behind Gemini at Livebench it is honestly a bit disappointing

exotic nebula Aug 10, 2025, 7:19 PM

#

Oh.

#

That sucks....

keen beacon Aug 10, 2025, 7:19 PM

#

There are also more obscure and accurate mentions in Gemini's output that Qwen never identified

#

Yeah, I feel like Chinese developers massively overreport the capabilities of their LLMs

#

Intentionally or not, I don't know

exotic nebula Aug 10, 2025, 7:22 PM

#

keen beacon Yeah, I feel like Chinese developers massively overreport the capabilities of th...

100% true. Im on board with you on that.

keen beacon Aug 10, 2025, 7:22 PM

#

They are like

#

Benchmaxxx, drop, scare the hell out of OpenAI

#

Then suddenly everyone figures out your model is not that good

#

But nobody cares by that point anymore

#

I honestly don't know why I used Chinese LLMs for so long when there's Gemini on lmarena 🫤

exotic nebula Aug 10, 2025, 7:26 PM

#

keen beacon I honestly don't know why I used Chinese LLMs for so long when there's Gemini on...

Bro why would you at first point 😭 there's like heck a lot of models other than the Chinese ones

keen beacon Aug 10, 2025, 7:26 PM

#

exotic nebula Bro why would you at first point 😭 there's like heck a lot of models other than...

I live in Russia and can't pay for chatgpt lol

exotic nebula Aug 10, 2025, 7:27 PM

#

keen beacon I live in Russia and can't pay for chatgpt lol

Who tf pays for chatgpt 😂

keen beacon Aug 10, 2025, 7:27 PM

#

VPN subscription + LLM subscription at least

exotic nebula Aug 10, 2025, 7:27 PM

#

Btw dont reveal location. Delete that msg. Some people here have bad opinions about Russia

exotic nebula Aug 10, 2025, 7:27 PM

#

keen beacon VPN subscription + LLM subscription at least

💀

keen beacon Aug 10, 2025, 7:27 PM

#

exotic nebula Btw dont reveal location. Delete that msg. Some people here have bad opinions ab...

And they are completely correct in their opinions

exotic nebula Aug 10, 2025, 7:27 PM

#

keen beacon And they are completely correct in their opinions

Ayo what

keen beacon Aug 10, 2025, 7:28 PM

#

What? I lived here for nearly 25 years, I know what I'm talking about ok there?

exotic nebula Aug 10, 2025, 7:29 PM

#

keen beacon What? I lived here for nearly 25 years, I know what I'm talking about ok there?

I see. Its rare to see someone truthful these days. Got shocked. Forgive me my impatientence.

keen beacon Aug 10, 2025, 7:30 PM

#

Meh, at one point I can't wait for the day Deepseek shatters OpenAI with another release

#

At another, I see this garbage

exotic nebula Aug 10, 2025, 7:30 PM

#

keen beacon Meh, at one point I can't wait for the day Deepseek shatters OpenAI with another...

Fr, we all are waiting for Deepseek R2

#

@keen beacon @neon idol

neon idol Aug 10, 2025, 7:31 PM

#

AHAHHAHA FRRR

keen beacon Aug 10, 2025, 7:31 PM

#

Here is my private benchmark

#

There is one underrated anime

#

To pass it, an LLM has to figure out why it's so underrated

#

Qwen was mostly able to do it with deep research, but arrived at a half correct conclusion even using unreliable sources

#

And I don't know any LLM that was able to figure it out independently

#

GPT-5 gives an usual knee jerk response

#

It's trained on hundreds of reviews, most of which are wrong, and none ever point out issues that aren't related to the content of the show

exotic nebula Aug 10, 2025, 7:35 PM

#

I see. A RL(HF) test. Interesting. So which model succeeded in your expectations, i.e, which one found the reason for why its failed?

#

And if you dont mind, which anime?

keen beacon Aug 10, 2025, 7:36 PM

#

Why it failed*

#

It was the last Qwen deep research

#

I haven't tested others in the deep research mode

exotic nebula Aug 10, 2025, 7:36 PM

#

Ah I see. Lemme know when you try out for all models.

keen beacon Aug 10, 2025, 7:37 PM

#

Let me know if you have access for GPT-5 or Grok 4 or Opus 4.1 or Gemini 2.5 Pro Deep research

#

Because so far each pretrained model just kept parroting the same stupid data and missing the crucial point

exotic nebula Aug 10, 2025, 7:39 PM

#

keen beacon Let me know if you have access for GPT-5 or Grok 4 or Opus 4.1 or Gemini 2.5 Pro...

Isn't it there on LMarena?

exotic nebula Aug 10, 2025, 7:39 PM

#

keen beacon Because so far each pretrained model just kept parroting the same stupid data an...

Lmfao, all of them are pre trained data networks

keen beacon Aug 10, 2025, 7:39 PM

#

exotic nebula Isn't it there on LMarena?

Deep research isn't as far as I'm aware.

exotic nebula Aug 10, 2025, 7:41 PM

#

keen beacon Deep research isn't as far as I'm aware.

Ah, DeepThink? Hmm, there's talks of it coming on the platform soon.

keen beacon Aug 10, 2025, 7:42 PM

#

exotic nebula Ah, DeepThink? Hmm, there's talks of it coming on the platform soon.

Deep RESEARCH. Not Think.

#

The thing that makes the model ask Google questions

exotic nebula Aug 10, 2025, 7:43 PM

#

keen beacon Deep RESEARCH. Not Think.

Kk. Chill out 😅

keen beacon Aug 10, 2025, 7:48 PM

#

I also find LLMs funny when it comes to creative writing

#

They tend to generate absolutely atrocious and banal ideas, and each time you ask them to write something new, they just keep writing the same story over and over again, only switching minor details such as settings, character names, character designs and so on

#

However, when it comes to assisting and finishing already good ideas, they tend to generate ideas that fit much better and are less nonsensical

#

Deepseek once suggested the same way to finish my story I did

#

But if you start writing with LLMs from scratch, they are total garbage

neon idol Aug 10, 2025, 7:52 PM

#

keen beacon They tend to generate absolutely atrocious and banal ideas, and each time you as...

In your opinion is better gpt 5 or grok 4?

keen beacon Aug 10, 2025, 7:53 PM

#

neon idol In your opinion is better gpt 5 or grok 4?

GPT-5 high reasoning wins most of the benchmarks but it's a bit slow, sometimes slower than Qwen

#

It also won a couple of my private benchmarks

#

However, LLM seem to be capable to produce really creative, really unlike-each-other mathematical proofs for novel problems

#

DeepThink can do it already

#

I wonder why it doesn't work the same way when writing stories

neon idol Aug 10, 2025, 7:54 PM

#

Do you have prompts for test ai?

stray aspen Aug 10, 2025, 7:55 PM

#

neon idol In your opinion is better gpt 5 or grok 4?

I mean grok 4 is smart AF when it.comes to math

#

But gpt 5 is greater overall

#

Both are great models tho

keen beacon Aug 10, 2025, 7:59 PM

#

keen beacon Deep RESEARCH. Not Think.

Something unexpected happened.

When asked why the show failed, Grok and Gemini booth provide a knee jerk response. When asked to do comprehensive research - even in offline mode - they figured out all the factors, and then after asked for the most important one they all name marketing problems

#

They never figure it out if you ask them directly

#

But if asked to research a bit

#

But this is stupid, I want them to be able to respond correctly at the very first prompt without nudging

#

This is so stupid

novel flame Aug 10, 2025, 8:05 PM

#

I ran a private HTML game microbenchmark and GPT-5 did a pretty good job on release day, close to or maybe SotA. Then after Sama tweeted about fixing a bug, I ran it again, and this time GPT-5 generated the best game of any model yet by a decent margin.

And on the coding part of my regular test suite, it crushed as well. It even came up with a brilliant and elegant optimization no other model has proposed (more than a hundred models so far). I am a big fan/ of Sonnet and Gemini 2.5 Pro but coding, but I can’t deny those results.

It seems like OpenAI actually cooked on the coding side this time.

obsidian shell Aug 10, 2025, 8:32 PM

#

have you guys switched to gpt 5 for coding?

reef pawn Aug 10, 2025, 8:33 PM

#

How good is Deep Research in GPT-5? any difference from previous model? I believe it was previously using O3 for research when GPT 4o came out, no?

reef pawn Aug 10, 2025, 8:33 PM

#

obsidian shell have you guys switched to gpt 5 for coding?

No, Google Gemini

stray aspen Aug 10, 2025, 8:34 PM

#

obsidian shell have you guys switched to gpt 5 for coding?

ywa

#

yes

obsidian shell Aug 10, 2025, 8:35 PM

#

reef pawn How good is Deep Research in GPT-5? any difference from previous model? I believ...

idk i still use the o3-mini reasoning model when it comes to the analyst agent

obsidian shell Aug 10, 2025, 8:37 PM

#

reef pawn No, Google Gemini

why the hell?

reef pawn Aug 10, 2025, 8:37 PM

#

obsidian shell why the hell?

I'm student and Gemini AI pro is free for students in my country.

obsidian shell Aug 10, 2025, 8:37 PM

#

gemini is generally free in their ai studio interface

reef pawn Aug 10, 2025, 8:38 PM

#

Yes, I use Google AI Studio free version.

white hatch Aug 10, 2025, 8:39 PM

#

Does gemini's web version have context length limit?

reef pawn Aug 10, 2025, 8:39 PM

#

Web version? The application itself without AI Studio?

white hatch Aug 10, 2025, 8:40 PM

#

Yeah gemini.google.com

prisma temple Aug 10, 2025, 8:40 PM

#

Я что не догоняю гпт 5 вышел

white hatch Aug 10, 2025, 8:40 PM

#

prisma temple Я что не догоняю гпт 5 вышел

Вышел

reef pawn Aug 10, 2025, 8:41 PM

#

Yes, you can't use Gemini 2.5 pro for long in official Gemini app but I'm not sure about context window.

white hatch Aug 10, 2025, 8:42 PM

#

reef pawn Yes, you can't use Gemini 2.5 pro for long in official Gemini app but I'm not su...

I'm sorry, I meant output length

#

Is it the same as in ai studio?

reef pawn Aug 10, 2025, 8:45 PM

#

white hatch Is it the same as in ai studio?

No, it's not. Unless you are on paid plan. The whole free thing they doing on Google AI Studio is to attract developers to the website and convert them into paid customers. Average joe that uses Gemini app for cat pic doesn't get same limits on free version in gemini official website.

white hatch Aug 10, 2025, 8:46 PM

#

reef pawn No, it's not. Unless you are on paid plan. The whole free thing they doing on Go...

Ok, thank you!

whole wagon Aug 10, 2025, 8:47 PM

#

Livebench added GPT5 pro high

keen beacon Aug 10, 2025, 8:49 PM

#

prisma temple Я что не догоняю гпт 5 вышел

Angliysky speak blyat!!

whole wagon Aug 10, 2025, 8:50 PM

#

whole wagon Livebench added GPT5 pro high

Still an awful benchmark. 4o has 77% in coding and GPT5 pro has 69% in coding

pliant cliff Aug 10, 2025, 8:51 PM

#

prisma temple Я что не догоняю гпт 5 вышел

о русские

whole wagon Aug 10, 2025, 8:52 PM

#

Gpt 5 mini high 25% Gemma 3 12b 42%

#

The benchmark is literally broken trash

neat apex Aug 10, 2025, 8:52 PM

#

Gpt 5 minimal?

keen beacon Aug 10, 2025, 8:53 PM

#

whole wagon Gpt 5 mini high 25% Gemma 3 12b 42%

Tbh they have to run it multiple times on each problem and take the average to see if the model really passes or not

#

Which is probably what they're doing

#

If so then I have no idea why it's so ass

whole wagon Aug 10, 2025, 8:54 PM

#

They run it multiple times. It's a problem with the benchmark, they scored it 0 if it takes over a certain amount of time iirc

#

It's ass

#

That's why the non reasoning models do better

#

They don't need to take time

neat apex Aug 10, 2025, 8:55 PM

#

Maybe they ask well know old Examples, seeing by the Command perfomance

#

I cant endure Command being that high

whole wagon Aug 10, 2025, 8:56 PM

#

Command performance is not good. I scrolled to the bottom because that is where GPT5 is

#

Lmao

#

There's like 100 models above it

neat apex Aug 10, 2025, 8:56 PM

#

Wha

whole wagon Aug 10, 2025, 8:56 PM

#

Because livebench is trash

#

I don't get it. They had one job

#

And they screwed it up

neat apex Aug 10, 2025, 8:57 PM

#

Looks reasonable if you ignore gpt5?

whole wagon Aug 10, 2025, 8:57 PM

#

No

#

4o is one of the top. Above o3 etc

neat apex Aug 10, 2025, 8:58 PM

#

Dx

whole wagon Aug 10, 2025, 8:58 PM

#

keen beacon Aug 10, 2025, 8:59 PM

#

Lol. So what's the most credible benchmark so far?

#

Artificial Analysis one?

neat apex Aug 10, 2025, 8:59 PM

#

Cursed

keen beacon Aug 10, 2025, 8:59 PM

#

Ofc they all suck, we just need yo find one that sucks less

whole wagon Aug 10, 2025, 8:59 PM

#

Simple bench is ok

#

What does this mean

#

They can't actually serve GPT5 fully?

#

👀

#

This is unexpected. Livebench always glazed openAI before

ripe mountain Aug 10, 2025, 9:03 PM

#

whole wagon Livebench added GPT5 pro high

wheres gemini?

whole wagon Aug 10, 2025, 9:05 PM

#

https://livebench.ai/#/

#

It is all there

#

The table is huge

ripe mountain Aug 10, 2025, 9:05 PM

#

thx

whole wagon Aug 10, 2025, 9:05 PM

#

Coding average has Gemini 2.5 pro below 4o also

#

It's a meme benchmark basically lol

ripe mountain Aug 10, 2025, 9:07 PM

#

whole wagon It's a meme benchmark basically lol

Gemini has dropped from 78 to 70. Could this be related to its nerfing?

blazing bison Aug 10, 2025, 9:07 PM

#

whole wagon What does this mean

they gonna remove sora or smth

#

less images per week for plus

#

they gonna cut something

ripe mountain Aug 10, 2025, 9:07 PM

#

blazing bison they gonna cut something

why

whole wagon Aug 10, 2025, 9:08 PM

#

ripe mountain Gemini has dropped from 78 to 70. Could this be related to its nerfing?

No. they update the benchmark

ripe mountain Aug 10, 2025, 9:08 PM

#

whole wagon No. they update the benchmark

ah

#

mb

eternal niche Aug 10, 2025, 9:09 PM

#

btw gpt5 sucks

#

even gemini 2.5 pro better

ripe mountain Aug 10, 2025, 9:10 PM

#

eternal niche even gemini 2.5 pro better

nah

blazing bison Aug 10, 2025, 9:19 PM

#

ripe mountain why

not enough compute

ornate agate Aug 10, 2025, 9:23 PM

#

whole wagon It's a meme benchmark basically lol

Maybe they all are, in a way.

autumn cargo Aug 10, 2025, 9:38 PM

#

Gemini 2.5 Pro clearly better than GPT 5 imo. Even in lmarena GPT 5 has only 33% win rate against Gemini 2.5 Pro. Not sure how it has ended up on top!