#general | Arena | Page 83

keen beacon Aug 6, 2025, 3:24 PM

#

Google will not (likely) launch their 3rd generation yet

wheat onyx Aug 6, 2025, 3:24 PM

#

Either give me other comparison numbers to use, or accept the ones I have

keen beacon Aug 6, 2025, 3:24 PM

#

estimates project December for Gemini 3

wheat onyx Aug 6, 2025, 3:24 PM

#

We dont know the paid users of any of them

keen beacon Aug 6, 2025, 3:24 PM

#

We do for OpenAI

wheat onyx Aug 6, 2025, 3:25 PM

#

Give me Anthropic's numbers and Google's

keen beacon Aug 6, 2025, 3:25 PM

#

Why?

#

Given prior releases that is the predicted date

#

Counter?

fleet lintel Aug 6, 2025, 3:25 PM

#

earlier or later ?

wheat onyx Aug 6, 2025, 3:25 PM

#

Sorry, I use the numbers published, not vibes

keen beacon Aug 6, 2025, 3:26 PM

#

wen?

fleet lintel Aug 6, 2025, 3:26 PM

#

ohk. that makes sense

keen beacon Aug 6, 2025, 3:26 PM

#

Makes sense.

#

We should get a Gemma model before though

wheat onyx Aug 6, 2025, 3:27 PM

#

wheat onyx Give me Anthropic's numbers and Google's

@deep adder I'm waiting

fleet lintel Aug 6, 2025, 3:27 PM

#

no way.. really? that seems too fast

keen beacon Aug 6, 2025, 3:27 PM

#

fleet lintel no way.. really? that seems too fast

Competition is fierce

wheat onyx Aug 6, 2025, 3:27 PM

#

I did, you said "nah" and refused to give any alternatives

fleet lintel Aug 6, 2025, 3:27 PM

#

are they not planning to release 2.5-002 like they did with 1.5 ?

keen beacon Aug 6, 2025, 3:27 PM

#

fleet lintel are they not planning to release 2.5-002 like they did with 1.5 ?

Another version of their 2.5 series?

#

No.

#

"Hero run" next

#

Demis said so on lex's podcast

#

New base modle

#

*model

#

Roadmap?

#

Do you work there?

wheat onyx Aug 6, 2025, 3:29 PM

#

keen beacon estimates project December for Gemini 3

People expect within a month or so

#

but who knows

keen beacon Aug 6, 2025, 3:30 PM

#

I find it hard to believe that.

stray aspen Aug 6, 2025, 3:30 PM

#

are we getting gpt 5 tomorrow

keen beacon Aug 6, 2025, 3:30 PM

#

Lets see if we get a Gemma model this week or not

wheat onyx Aug 6, 2025, 3:30 PM

#

keen beacon I find it hard to believe that.

#general message

Why people think it's coming very soon

keen beacon Aug 6, 2025, 3:30 PM

#

wheat onyx https://discord.com/channels/1340554757349179412/1340554757827461211/14023661283...

Base model training runs take a while to get through post-training and safety testing.

#

Assuming they started it a few months back, it will take a while.

wheat onyx Aug 6, 2025, 3:31 PM

#

stray aspen are we getting gpt 5 tomorrow

believe so

keen beacon Aug 6, 2025, 3:31 PM

#

Oh, GPT-5 will be strong

fleet lintel Aug 6, 2025, 3:31 PM

#

gpt-5 has to be super strong

keen beacon Aug 6, 2025, 3:32 PM

#

I just hope Gemini 3 has native tool usage

torn bison Aug 6, 2025, 3:32 PM

#

GPT5 isn't great for education, I feel like it's not very good at explaining things, just like o3

keen beacon Aug 6, 2025, 3:32 PM

#

For search functionalities and calculations

wheat onyx Aug 6, 2025, 3:32 PM

#

https://t.co/K6g1I9nsWI
https://t.co/XYG52QQYxa
https://t.co/kzLBtndx3C

wheat onyx Aug 6, 2025, 3:33 PM

#

torn bison GPT5 isn't great for education, I feel like it's not very good at explaining thi...

is the new education mode good? Havent tried it

torn bison Aug 6, 2025, 3:33 PM

#

Gemini having absorbed LearnLM, performs very well in this regard

keen beacon Aug 6, 2025, 3:33 PM

#

wheat onyx is the new education mode good? Havent tried it

Did you guys read the LearnLM paper? amazing arena they created

rough condor Aug 6, 2025, 3:33 PM

#

Anyone else's videos not getting audio?

keen beacon Aug 6, 2025, 3:33 PM

#

What do you mean?

torn bison Aug 6, 2025, 3:33 PM

#

wheat onyx is the new education mode good? Havent tried it

I haven't tried it either. I tried to get GPT5 (Summit) to explain a few concepts

wheat onyx Aug 6, 2025, 3:34 PM

#

keen beacon Did you guys read the LearnLM paper? amazing arena they created

nope link it? I'll read later

#

https://tenor.com/view/the-simpsons-bart-vibe-groove-dance-gif-15287212887189905850

keen beacon Aug 6, 2025, 3:34 PM

#

When?

torn bison Aug 6, 2025, 3:34 PM

#

will it have improvements in conversation? like, improvements in tool calling won't increase the arena score

jade egret Aug 6, 2025, 3:35 PM

#

genie 3 is good?

keen beacon Aug 6, 2025, 3:35 PM

#

Will it perform well on searching? Like o3 does? (and now grok 4 recently)

wheat onyx Aug 6, 2025, 3:35 PM

#

you say things as if it's a fact.

Yes, if GPT-5 is so strong that Gemini 3 looks weak, they'll probably delay X amount.

"I think they will delay 2 months" is so weird to say

fleet lintel Aug 6, 2025, 3:36 PM

#

they should launch it before gpt-5 .. which wont happen but otherwise another 2.5 launch would be super awkward because it wont perform better than gpt-5

wheat onyx Aug 6, 2025, 3:36 PM

#

fleet lintel they should launch it before gpt-5 .. which wont happen but otherwise another 2....

they could do something like Anthropic. 4.1 release now, state big upgrades coming in ~month

torn bison Aug 6, 2025, 3:36 PM

#

looks like the arena #1 for August is going to GPT5

keen beacon Aug 6, 2025, 3:36 PM

#

RL on tool usage is quite hard to figure out. Either it gets too domain specific or can't do long horizons. Generalization is a big issue

fleet lintel Aug 6, 2025, 3:37 PM

#

wheat onyx they could do something like Anthropic. 4.1 release now, state big upgrades comi...

makes sense .. I think they have to do something like that.

fleet lintel Aug 6, 2025, 3:37 PM

#

torn bison looks like the arena #1 for August is going to GPT5

100%.. easy money on polymarket if you gamble

keen beacon Aug 6, 2025, 3:37 PM

#

Google won't release Gemini 3 unless it is industry leading.

#

They simply won't.

agile bloom Aug 6, 2025, 3:38 PM

#

is GPT-5 out? available to use on LMArena?

torn bison Aug 6, 2025, 3:38 PM

#

I hope they fix the overflattering issue with 2.5pro. wolfstride is nice, blacktooth is even better, but both are better than 2.5pro anyway.

keen beacon Aug 6, 2025, 3:38 PM

#

GPT-5 is yet to be released

#

Arena did have checkpoints (likely) of it for a few days.

torn bison Aug 6, 2025, 3:39 PM

#

current 2.5 pro makes it impossible for me to trust any of its subjective evaluations

keen beacon Aug 6, 2025, 3:39 PM

#

why is that?

#

sycophancy?

wheat onyx Aug 6, 2025, 3:40 PM

#

If you have actual info, say that. If not, just own that it’s a guess.

I'm not asking you to be right, just to flag what's grounded and what's not.

agile bloom Aug 6, 2025, 3:40 PM

#

keen beacon GPT-5 is yet to be released

so it's not released on LMArena or ChatGPT (for paid user)?

torn bison Aug 6, 2025, 3:40 PM

#

agile bloom so it's not released on LMArena or ChatGPT (for paid user)?

tomorrow

keen beacon Aug 6, 2025, 3:40 PM

#

agile bloom so it's not released on LMArena or ChatGPT (for paid user)?

Its not released nor announced as a product yet

#

Sam hypemen exists

#

And elon purposefully leaked a bunch of grok info

#

They do happen. often in the AI space.

wheat onyx Aug 6, 2025, 3:42 PM

#

I’m not assuming no one has info. I’m saying if you're guessing, just say it's a guess.

And if you do have actual info, say that, don’t just imply it.

agile bloom Aug 6, 2025, 3:42 PM

#

keen beacon Its not released nor announced as a product yet

cool cool? so GPT-5 would be available on LMArena tom or would that take time?

tight tide Aug 6, 2025, 3:42 PM

#

Can anyone tell me how to generate videos like specific tools veo 3 Hailuo 2

wheat onyx Aug 6, 2025, 3:42 PM

#

keen beacon Sam hypemen exists

I think both reasoning and non reasoning will be improved, but not sure how much. I think hallucinations should be improved, which is pretty good

keen beacon Aug 6, 2025, 3:43 PM

#

agile bloom cool cool? so GPT-5 would be available on LMArena tom or would that take time?

They usually have it within a few days (or hours) of the release.

wheat onyx Aug 6, 2025, 3:43 PM

#

OAI said their IMO Gold Model won't release for a few more months

agile bloom Aug 6, 2025, 3:43 PM

#

keen beacon They usually have it within a few days (or hours) of the release.

even coooool, excited to use GPT-5

keen beacon Aug 6, 2025, 3:45 PM

#

Wait a second, in the arena o3 ranks the highest in search. However it should not have the tools to facilitate search in the API. How does the arena version function?

wheat onyx Aug 6, 2025, 3:45 PM

#

ok, so you're saying you know for a fact Gemini is delayed 2 months. Good to know

agile bloom Aug 6, 2025, 3:45 PM

#

okay so which is the smartest AI model right now? with the most amount of data (it's trained on) and the highest parameters involved in it?

wheat onyx Aug 6, 2025, 3:45 PM

#

or you're just trolling

#

Ah so we've come full circle. It's just vibes

keen beacon Aug 6, 2025, 3:46 PM

#

agile bloom okay so which is the smartest AI model right now? with the most amount of data ...

Rankings are different and each LLM specialises in different domains (eg claude for coding, gemini for teaching... so on) but o3 is the most generally intelligent LLM out there.

agile bloom Aug 6, 2025, 3:46 PM

#

keen beacon Rankings are different and each LLM specialises in different domains (eg claude ...

thanks

keen beacon Aug 6, 2025, 3:47 PM

#

The chess tournament is about to start soon!

wheat onyx Aug 6, 2025, 3:47 PM

#

keen beacon Rankings are different and each LLM specialises in different domains (eg claude ...

It seems that GPT-5 will be a big coding jump. It will be interesting to see if:

1. it beats Opus 4.1
1b. Anthropic releases a follow-up to beat it

wheat onyx Aug 6, 2025, 3:48 PM

#

wheat onyx It seems that GPT-5 will be a big coding jump. It will be interesting to see if:...

Ideally it does, good for competition

torn bison Aug 6, 2025, 3:48 PM

#

wheat onyx It seems that GPT-5 will be a big coding jump. It will be interesting to see if:...

100%
I think we'll have to wait at least three months

keen beacon Aug 6, 2025, 3:49 PM

#

wheat onyx It seems that GPT-5 will be a big coding jump. It will be interesting to see if:...

Likely? but why would Anthropic release 4.1 a few days before GPT-5 if they had a better model to release after it? Unless said model is extremely large and would therefore be uncompetitive with GPT-5 in pricing therefore unusable for coding

wheat onyx Aug 6, 2025, 3:49 PM

#

keen beacon Likely? but why would Anthropic release 4.1 a few days before GPT-5 if they had ...

https://x.com/AnthropicAI/status/1952768435873612256

Anthropic (@AnthropicAI)

We plan to release substantially larger improvements to our models in the coming weeks.

torn bison Aug 6, 2025, 3:49 PM

#

summit is next level

keen beacon Aug 6, 2025, 3:50 PM

#

"In the coming weeks" Quite peculiar. why would you release an inferior model only to render it useless in the coming weeks?

torn bison Aug 6, 2025, 3:50 PM

#

torn bison summit is next level

even 2.5 ultra deepthink can't compete with it in coding

wheat onyx Aug 6, 2025, 3:51 PM

#

keen beacon "In the coming weeks" Quite peculiar. why would you release an inferior model on...

maybe to get something out before GPT-5, then see if they can beat GPT-5 after. I don't know their exact strategy

#

seems clear though

torn bison Aug 6, 2025, 3:51 PM

#

gemini 3 pro is even more of a long shot. I'm really looking forward to seeing what cards google will play next

keen beacon Aug 6, 2025, 3:52 PM

#

wheat onyx maybe to get something out before GPT-5, then see if they can beat GPT-5 after. ...

Why would you rattle and confuse enterprises with no particular gain to be had in doing so?

fleet lintel Aug 6, 2025, 3:52 PM

#

keen beacon "In the coming weeks" Quite peculiar. why would you release an inferior model on...

competition is crazy right now... these companies are releasing better models as soon as they can

keen beacon Aug 6, 2025, 3:53 PM

#

fleet lintel competition is crazy right now... these companies are releasing better models as...

Even if we were to assume that this is their current latest model, a few weeks are insufficient to make significant enough improvements as to warrant another release.

wheat onyx Aug 6, 2025, 3:53 PM

#

keen beacon Why would you rattle and confuse enterprises with no particular gain to be had i...

Well there is an improvement now. I guess it depends on how you define gain

torn bison Aug 6, 2025, 3:54 PM

#

I believe they've found a better way to validate rewards to scale up RL even further

fleet lintel Aug 6, 2025, 3:54 PM

#

keen beacon Even if we were to assume that this is their current latest model, a few weeks a...

they have multiple model training runs happening in parallel.. one track could be fine-tuning and the other track could be updating the base model.
It is faster (much faster) to release a fine-tuned version.

keen beacon Aug 6, 2025, 3:54 PM

#

wheat onyx Well there is an improvement now. I guess it depends on how you define gain

On a few benchmarks it seems to have regressed and the community does not seem too pleased with it either

keen beacon Aug 6, 2025, 3:56 PM

#

fleet lintel they have multiple model training runs happening in parallel.. one track could be f...

Oh, yeah that does make sense. A mere finetune of Opus 4 to rid it of its shortcomings rather than any actual training improvement

#

A "quickfix"

fleet lintel Aug 6, 2025, 3:57 PM

#

it's fun to watch these improvements from the sidelines. But if you are an engineer working in AI, it is not fun. Sooo much pressure, it's crazy

patent aspen Aug 6, 2025, 3:58 PM

#

Yeah basically all companies are developing multiple models in parallel

#

Some might be good but not production ready yet for a variety of reasons

wheat onyx Aug 6, 2025, 3:59 PM

#

looking forward to GPT-5 tomorrow for sure

ripe mountain Aug 6, 2025, 4:00 PM

#

Is the site working properly right now?

shell pewter Aug 6, 2025, 4:01 PM

#

server ded? huh

torn bison Aug 6, 2025, 4:01 PM

#

before GPT5 I would have said kingfall was the undisputed king, but after using summit I think they each have their own strengths. And GPT's strengths are something the current generation of Gemini can absolutely not catch up to

undone crane Aug 6, 2025, 4:02 PM

#

im from Spain !! thank you so much to all LMArena Stage !!!

gilded ingot Aug 6, 2025, 4:02 PM

#

hey guys is the site down for anyone else?

cyan zodiac Aug 6, 2025, 4:02 PM

#

gilded ingot hey guys is the site down for anyone else?

ye its down

shell pewter Aug 6, 2025, 4:02 PM

#

yea down for me too

gilded ingot Aug 6, 2025, 4:03 PM

#

thanks thought it was just me was working on a project and my chat suddenly vanished

stray aspen Aug 6, 2025, 4:03 PM

#

lmarena is down

shell pewter Aug 6, 2025, 4:03 PM

#

gilded ingot thanks thought it was just me was working on a project and my chat suddenly vani...

lol same scared the crap out of me lul

iron meadow Aug 6, 2025, 4:03 PM

#

@echo aurora

torn bison Aug 6, 2025, 4:03 PM

#

Yeah that's one of kingfall's strengths. It's more comprehensive and often more humanlike

sick chasm Aug 6, 2025, 4:03 PM

#

stray aspen lmarena is down

same

echo aurora Aug 6, 2025, 4:04 PM

#

Thank you for the flag. Escalating now.

shell pewter Aug 6, 2025, 4:04 PM

#

echo aurora Thank you for the flag. Escalating now.

thank you sir

#

🙏

stray aspen Aug 6, 2025, 4:04 PM

#

does anyone have deepseek r2 news

ripe mountain Aug 6, 2025, 4:05 PM

#

which ai do you think is the best right now? im torn between horizon beta and gemini 2.5 pro

weak sluice Aug 6, 2025, 4:05 PM

#

i was literally in the middle of typing.....dangit

sterile ingot Aug 6, 2025, 4:06 PM

#

Is the website down rn?

ripe mountain Aug 6, 2025, 4:06 PM

#

sterile ingot Is the website down rn?

down

weak sluice Aug 6, 2025, 4:06 PM

#

yes

exotic gust Aug 6, 2025, 4:06 PM

#

sterile ingot Is the website down rn?

down for me too

worldly plume Aug 6, 2025, 4:06 PM

#

Same

echo aurora Aug 6, 2025, 4:06 PM

#

Yup, we are having an outage, team is working on it

sterile ingot Aug 6, 2025, 4:06 PM

#

I thought it was only me.

iron meadow Aug 6, 2025, 4:06 PM

#

ripe mountain which ai do you think is the best right now? im torn between horizon beta and ge...

what is horizon beta

echo aurora Aug 6, 2025, 4:06 PM

#

So sorry everyone!

torn bison Aug 6, 2025, 4:06 PM

#

ripe mountain which ai do you think is the best right now? im torn between horizon beta and ge...

2.5pro, horizon beta is not considered a reasoning model.

obsidian cargo Aug 6, 2025, 4:06 PM

#

phew. joined discord to check it out, thought I got banned because a (very innocuous) prompt I was submitting somehow broke TOS

iron meadow Aug 6, 2025, 4:06 PM

#

opus 4 is still my favorite by far

#

4-1 tends to get a bit dramatic

#

especially without reasoning

worldly plume Aug 6, 2025, 4:07 PM

#

I'm hoping chats won't be gone after the bug fix

summer valley Aug 6, 2025, 4:07 PM

#

yes

sterile ingot Aug 6, 2025, 4:07 PM

#

iron meadow opus 4 is still my favorite by far

Opus 4 is really great at logic but the prompt has to be precise.

iron meadow Aug 6, 2025, 4:07 PM

#

sterile ingot Opus 4 is really great at logic but the prompt has to be precise.

can you elaborate?

random wolf Aug 6, 2025, 4:07 PM

#

what happened guys? what's the problem?

ripe mountain Aug 6, 2025, 4:08 PM

#

iron meadow what is horizon beta

https://openrouter.ai/openrouter/horizon-beta

Horizon Beta - API, Providers, Stats

This is a cloaked model provided to the community to gather feedback. This is an improved version of Horizon Alpha

Note: It’s free to use during this testing period, and prompts and completions are logged by the model creator for feedback and training. Run Horizon Beta with API

sterile ingot Aug 6, 2025, 4:08 PM

#

iron meadow can you elaborate?

Just my experience

worldly plume Aug 6, 2025, 4:08 PM

#

random wolf what happened guys? what's the problem?

LMArena website went down

iron meadow Aug 6, 2025, 4:08 PM

#

sterile ingot Just my experience

Like what do you mean precise? Im interested in improving my own prompts

echo aurora Aug 6, 2025, 4:08 PM

#

random wolf what happened guys? what's the problem?

We are looking into an outage atm.

shell pewter Aug 6, 2025, 4:08 PM

#

guys, refresh, it's back up

#

trophy3d

stray aspen Aug 6, 2025, 4:08 PM

#

lmarena is live again

iron meadow Aug 6, 2025, 4:08 PM

#

Its back up

random wolf Aug 6, 2025, 4:09 PM

#

echo aurora We are looking into an outage atm.

okay okay, I hope the chats won't be deleted😁

iron meadow Aug 6, 2025, 4:09 PM

#

Confirmed

sterile ingot Aug 6, 2025, 4:09 PM

#

worldly plume I'm hoping chats won't be gone after the bug fix

I already cleared the cookies and site data 🙂

shell pewter Aug 6, 2025, 4:09 PM

#

yay

worldly plume Aug 6, 2025, 4:09 PM

#

Yeeees, my chats still here!!

ripe mountain Aug 6, 2025, 4:09 PM

#

torn bison 2.5pro, horizon beta is not considered a reasoning model.

even though horizon isn't a reasoning model, i think there are areas where it outperforms gemini 2.5 pro

torn bison Aug 6, 2025, 4:10 PM

#

ripe mountain even though horizon isn't a reasoning model, i think there are areas where i...

"best" is usually considered in an overall sense

sterile ingot Aug 6, 2025, 4:10 PM

#

iron meadow Like what do you mean precise? Im interested in improving my own prompts

I sometimes have a hard time explaining how I wanted it to happen, it does the work, but the opposite of what I wanted sometimes.

iron meadow Aug 6, 2025, 4:10 PM

#

2-5 pro is one of the worst models i've used. I only use it for youtube videos and processing huge context windows at this point.

iron meadow Aug 6, 2025, 4:10 PM

#

sterile ingot I sometimes have a hard time explaining how I wanted it to happen, it does the w...

Ah. I'd avoid negation in your prompts

shell pewter Aug 6, 2025, 4:10 PM

#

o3 and gemini 2.5 pro are doing pretty good for me

cyan zodiac Aug 6, 2025, 4:11 PM

#

iron meadow 2-5 pro is one of the worst models i've used. I only use it for youtube videos a...

i remember it being good a while back but upon testing it recently its garbage unless its for huge context tasks as you said

ripe mountain Aug 6, 2025, 4:11 PM

#

torn bison "best" is usually considered in an overall sense

yeah right

stray aspen Aug 6, 2025, 4:12 PM

#

iron meadow 2-5 pro is one of the worst models i've used. I only use it for youtube videos a...

deepseek r1 0528 smashes gemini at lua coding

#

i noticed that yesterday

obsidian shell Aug 6, 2025, 4:12 PM

#

stray aspen deepseek r1 0528 smashes gemini at lua coding

why would you make it code lua?

#

that's a first I'm hearing

stray aspen Aug 6, 2025, 4:12 PM

#

for roblox

iron meadow Aug 6, 2025, 4:12 PM

#

Roblox, neovim

#

Lua is a nice language for its usecases

ripe mountain Aug 6, 2025, 4:13 PM

#

grok 4 is the most overrated model btw

mint jungle Aug 6, 2025, 4:13 PM

#

webpage returns, thanks.

obsidian shell Aug 6, 2025, 4:13 PM

#

i get it but why deepseek?

why not claude?

swe bench is a clear indicator

iron meadow Aug 6, 2025, 4:13 PM

#

ripe mountain grok 4 is the most overrated model btw

Maybe it's underrated, but i'm not sure when you would ever use it

#

When you have alternatives yeah?

ripe mountain Aug 6, 2025, 4:13 PM

#

iron meadow When you have alternatives yeah?

yeah

stray aspen Aug 6, 2025, 4:14 PM

#

obsidian shell i get it but why deepseek? why not claude? swe bench is a clear indicator

because

obsidian shell Aug 6, 2025, 4:15 PM

#

poor thing...

#

vpn!

stray aspen Aug 6, 2025, 4:16 PM

#

nah deepseek is good for me

iron meadow Aug 6, 2025, 4:16 PM

#

Claude-4-opus-thinking is so much better than opus 4-1 thinking in the claude app 😭

stray aspen Aug 6, 2025, 4:16 PM

#

it solved a lua problem before opus 4 and gemini 2.5 pro could yesterday

#

i tried it on lmarena

sacred quail Aug 6, 2025, 4:18 PM

#

iron meadow 2-5 pro is one of the worst models i've used. I only use it for youtube videos a...

Use google ai studio

iron meadow Aug 6, 2025, 4:18 PM

#

sacred quail Use google ai studio

Of course im using AI studio for it

#

Man

sacred quail Aug 6, 2025, 4:18 PM

#

Then you are wrong

#

2.5 is good

iron meadow Aug 6, 2025, 4:19 PM

#

Well.

cedar tide Aug 6, 2025, 4:19 PM

#

https://fixupx.com/lmarena_ai/status/1953126364502212659?t=xsCHa0n4hUyX0nN4A9h2Xg&s=19

lmarena.ai (@lmarena_ai)

🎬 The Video Arena Leaderboard is now live!

14,000+ community votes have ranked the top Text-to-Video and Image-to-Video models.

📝 Text-to-Video rankings:

- #1 Veo3 (audio on)
- #3 Veo3, Veo3-fast
- #5 Hailuo 02 [Standard], Seedance 1.0 pro
- #6 Kling 2.1 Master
- #9 Wan 2.2 A14B
- #11 Pika 2.2, Mochi 1

Big congrats to @GoogleDeepMind, @Hailuo_AI, Bytedance, @Kling_ai, @Alibaba_Wan, @pika_labs, and @genmoai!

golden ocean Aug 6, 2025, 4:21 PM

#

iron meadow 2-5 pro is one of the worst models i've used. I only use it for youtube videos a...

for coding it is incredibly annoying yes and for natural text generation (like sounding human and not ai) its sometimes very bad even with system instructions

iron meadow Aug 6, 2025, 4:22 PM

#

golden ocean for coding it is incredibly annoying yes and for natural text generation (like s...

Yeah

ripe mountain Aug 6, 2025, 4:22 PM

#

gpt 4o or gemini 2.5 pro? which is better?

golden ocean Aug 6, 2025, 4:23 PM

#

ripe mountain gpt 4o or gemini 2.5 pro? which is better?

https://cdn.discordapp.com/attachments/910347347601543196/1160742997156188181/togif-6.gif

iron meadow Aug 6, 2025, 4:23 PM

#

https://cdn.discordapp.com/attachments/910347347601543196/1160742997156188181/togif-6.gif

misty vault Aug 6, 2025, 4:24 PM

#

https://cdn.discordapp.com/attachments/910347347601543196/1160742997156188181/togif-6.gif?ex=6894a2e8&is=68935168&hm=e470c76d62e4a113ff318d0339e83e44d0c192d0468c401bf6f3b8e95c6f2678&

keen beacon Aug 6, 2025, 4:25 PM

#

Sometimes i wonder how much data there is on these discord channels that gets lost. Are AI companies using this?

golden ocean Aug 6, 2025, 4:27 PM

#

obviously not

#

no frontier models can perfectly imitate online chat platforms or even realistic conversations without fine tuning

wheat onyx Aug 6, 2025, 4:27 PM

#

https://openai.com/index/providing-chatgpt-to-the-entire-us-federal-workforce/

Providing ChatGPT to the entire U.S. federal workforce

First-of-its-kind partnership with General Services Administration will give federal agencies access to ChatGPT Enterprise for $1 for the next year

cedar tide Aug 6, 2025, 4:28 PM

#

https://fixupx.com/Alibaba_Qwen/status/1953128028047102241?t=M0J-HT4f2u5vJy4uiMfT_Q&s=19

Qwen (@Alibaba_Qwen)

🚀 Introducing Qwen3-4B-Instruct-2507 & Qwen3-4B-Thinking-2507 — smarter, sharper, and 256K-ready!

🔹 Instruct: Boosted general skills, multilingual coverage, and long-context instruction following.

🔹 Thinking: Advanced reasoning in logic, math, science & code — built for expert-level tasks.

Both models are more aligned, more capable, and more context-aware.

Hugging Face:
huggingface.co/Qwen/Qwen3-4B-Instruct-2507
huggingface.co/Qwen/Qwen3-4B-Thinking-2507
ModelScope:
modelscope.cn/models/Qwen/Qwen3-4B-Instruct-2507
modelscope.cn/models/Qwen/Qwen3-4B-Thinking-2507

golden ocean Aug 6, 2025, 4:28 PM

#

wheat onyx https://openai.com/index/providing-chatgpt-to-the-entire-us-federal-workforce/

but chatgpt is far left

wheat onyx Aug 6, 2025, 4:30 PM

#

cedar tide https://fixupx.com/Alibaba_Qwen/status/1953128028047102241?t=M0J-HT4f2u5vJy4uiMf...

is this just good at math? Language stuff I feel like would be more interesting at first, for the small models

random wolf Aug 6, 2025, 4:30 PM

#

is there any solution to cancel when generating? it's so frustrating

muted vault Aug 6, 2025, 4:40 PM

#

why does the audio not work

random wolf Aug 6, 2025, 4:42 PM

#

guys help huhu is there any solution to cancel when generating? it's so frustrating😔

exotic nebula Aug 6, 2025, 4:42 PM

#

ripe mountain gpt 4o or gemini 2.5 pro? which is better?

Gemini 2.5 pro. Gpt 4o aint even competition for it. Easy diff win

maiden fulcrum Aug 6, 2025, 4:44 PM

#

nice video leaderboards

buoyant helm Aug 6, 2025, 4:45 PM

#

muted vault why does the audio not work

what are you talking about 💔

maiden fulcrum Aug 6, 2025, 4:47 PM

#

i really think that seedance 1.0 pro should be number one

#

it beats all models in i2v

rare python Aug 6, 2025, 4:55 PM

#

with and without audio

#

fast and normal

buoyant helm Aug 6, 2025, 4:55 PM

#

^

rare python Aug 6, 2025, 4:56 PM

#

the only video model that has sound iirc

muted vault Aug 6, 2025, 4:59 PM

#

buoyant helm what are you talking about 💔

Does the audio creation feature from veo 3 work

rare python Aug 6, 2025, 4:59 PM

#

lmarena you can custom prompt the models. In AA you just watch someone else prompt I guess

#

So more diverse style

#

¯_(ツ)_/¯

#

which website? AA or lmarena?

buoyant helm Aug 6, 2025, 5:00 PM

#

muted vault Does the audio creation feature from veo 3 work

yea if it chose the veo-3-audio

cedar tide Aug 6, 2025, 5:01 PM

#

https://x.com/OpenAI/status/1953139020231569685?t=Ef18R0ieTcDH7rNQ-wpYhQ&s=19

OpenAI (@OpenAI)

LIVE5TREAM THURSDAY 10AM PT

mystic frigate Aug 6, 2025, 5:01 PM

#

its actually tomorrow omg

rare python Aug 6, 2025, 5:02 PM

#

in this server #1397655624103493813

#

They haven't released on the web

wheat onyx Aug 6, 2025, 5:02 PM

#

https://cookbook.openai.com/articles/gpt-oss/fine-tune-transfomers

Fine-tuning with gpt-oss and Hugging Face Transformers | OpenAI Coo...

Authored by: Edward Beeching, Quentin Gallouédec, and Lewis Tunstall Large reasoning models like OpenAI o3 generate a chain-of-thought to...

storm needle Aug 6, 2025, 5:04 PM

#

https://x.com/openai/status/1953139020231569685?s=46

OpenAI (@OpenAI)

LIVE5TREAM THURSDAY 10AM PT

echo aurora Aug 6, 2025, 5:05 PM

#

storm needle https://x.com/openai/status/1953139020231569685?s=46

Thinking we should have a watch party

exotic nebula Aug 6, 2025, 5:07 PM

#

https://cdn.openai.com/API/docs/images/model-page/model-icons/gpt-5.png

gentle plinth Aug 6, 2025, 5:08 PM

#

So for your timezone stream will be <t:1754586000:f>
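For reference, the `<t:1754586000:f>` tag in the message above is a Discord timestamp wrapping a Unix epoch value, which each client renders in the viewer's local time. A minimal sketch of decoding it by hand, assuming Python 3 and fixed offsets for Pacific and Eastern daylight time (the variable names and the hard-coded UTC-7 / UTC-4 offsets are illustrative assumptions, not something stated in the chat):

```python
from datetime import datetime, timedelta, timezone

# Epoch seconds taken from the Discord tag <t:1754586000:f>
STREAM_EPOCH = 1754586000

# Assumed fixed daylight-time offsets for early August (no DST lookup is done here)
PDT = timezone(timedelta(hours=-7), "PDT")
EDT = timezone(timedelta(hours=-4), "EDT")

# Interpret the epoch value as an aware UTC datetime, then shift to each zone
utc_time = datetime.fromtimestamp(STREAM_EPOCH, tz=timezone.utc)
pacific_time = utc_time.astimezone(PDT)
eastern_time = utc_time.astimezone(EDT)

for label, moment in (("UTC", utc_time), ("Pacific", pacific_time), ("Eastern", eastern_time)):
    print(f"{label}: {moment.strftime('%A %Y-%m-%d %H:%M %Z')}")
```

A fixed offset keeps the sketch dependency-free; for dates that may cross a daylight-saving change, `zoneinfo.ZoneInfo("America/Los_Angeles")` would likely be the safer choice.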

maiden fulcrum Aug 6, 2025, 5:09 PM

#

1PM EST

#

Do you guys think GPT-5 will beat Gemini 2.5 Pro, Grok 4 Heavy and o3-pro across all benchmarks?

gentle plinth Aug 6, 2025, 5:10 PM

#

I think so, but more important is the question if the model is actually good, or just benchmaxxing

raven helm Aug 6, 2025, 5:11 PM

#

Confirmed

maiden fulcrum Aug 6, 2025, 5:11 PM

#

gentle plinth I think so, but more important is the question if the model is actually good, or...

Isn't benchmaxing the same meaning as being a good model?

torn mantle Aug 6, 2025, 5:11 PM

#

what did i say

#

who predicted that?

#

me

#

and me

raven helm Aug 6, 2025, 5:11 PM

#

torn mantle Aug 6, 2025, 5:11 PM

#

and me

wheat onyx Aug 6, 2025, 5:11 PM

#

maiden fulcrum Do you guys think GPT-5 will beat Gemini 2.5 Pro, Grok 4 Heavy and o3-pro across...

Probably? But other companies have releases upcoming too

raven helm Aug 6, 2025, 5:11 PM

#

raven helm

!!!

stray aspen Aug 6, 2025, 5:11 PM

#

raven helm

is this the gpt 5 announcement

maiden fulcrum Aug 6, 2025, 5:11 PM

#

wheat onyx Probably? But other companies have releases upcoming too

Like who?

raven helm Aug 6, 2025, 5:11 PM

#

stray aspen is this the gpt 5 announcement

prob

#

#

posted 10 min ago

gentle plinth Aug 6, 2025, 5:12 PM

#

maiden fulcrum Isn't benchmaxing the same meaning as being a good model?

No, benchmaxing means it was trained on benchmarks, but isn't necessarily good at other tasks

#

At least that's how I understand it

maiden fulcrum Aug 6, 2025, 5:12 PM

#

Oh, I see

#

What is 'good model' to you?

iron meadow Aug 6, 2025, 5:12 PM

#

LIVE5TREAM

raven helm Aug 6, 2025, 5:13 PM

#

wheat onyx Aug 6, 2025, 5:13 PM

#

maiden fulcrum Like who?

Gemini and Claude both have releases soon

gentle plinth Aug 6, 2025, 5:13 PM

#

maiden fulcrum What is 'good model' to you?

A model which is good at different real world use cases, such as finding bugs outside of public benchmarks, writing clean code for tasks it hadn't been trained on, and finding good solutions to problems

maiden fulcrum Aug 6, 2025, 5:14 PM

#

wheat onyx Gemini and Claude both have releases soon

Gemini 3.0?

raven helm Aug 6, 2025, 5:14 PM

#

yea

maiden fulcrum Aug 6, 2025, 5:14 PM

#

When?

wheat onyx Aug 6, 2025, 5:14 PM

#

stray aspen is this the gpt 5 announcement

Live5tream

wheat onyx Aug 6, 2025, 5:14 PM

#

maiden fulcrum Gemini 3.0?

yeah. No date, but it's soon

raven helm Aug 6, 2025, 5:14 PM

#

Oh, i didnt notice that

stray aspen Aug 6, 2025, 5:15 PM

#

is it a typo or are they refering to gpt 5

obsidian cargo Aug 6, 2025, 5:15 PM

#

Anyone know if zenith was gpt-5 yet?

#

Wish I got more of a chance to try zenith before it got removed

wheat onyx Aug 6, 2025, 5:15 PM

#

stray aspen is it a typo or are they refering to gpt 5

I think it's for Chatgpt o4 and just a typo

maiden fulcrum Aug 6, 2025, 5:15 PM

#

All I know is that there will be a great video model coming by the end of this month

gentle plinth Aug 6, 2025, 5:16 PM

#

stray aspen is it a typo or are they refering to gpt 5

They wrote it this way intentionally

maiden fulcrum Aug 6, 2025, 5:16 PM

#

But I cannot say by who yet.

maiden fulcrum Aug 6, 2025, 5:16 PM

#

stray aspen is it a typo or are they refering to gpt 5

They're referring to Five Guys

raven helm Aug 6, 2025, 5:16 PM

#

stray aspen is it a typo or are they refering to gpt 5

Live5steam - referring to GPT 5

stray aspen Aug 6, 2025, 5:17 PM

#

maiden fulcrum All I know is that there will be a great video model coming by the end of this m...

xAI

wheat onyx Aug 6, 2025, 5:17 PM

#

gentle plinth They wrote it this way intentionally

no way. It's not like if it was a typo they could delete it and tweet again

maiden fulcrum Aug 6, 2025, 5:17 PM

#

stray aspen xAI

Nope

iron meadow Aug 6, 2025, 5:19 PM

#

@echo aurora Can you help me decipher what is wrong with a message I am trying to use as a benchmark for models. I get "something went wrong while generating the response". I'd prefer if it was sent privately, not sure why this server doesnt have a ticket system yet

torn mantle Aug 6, 2025, 5:20 PM

#

i wish

#

i would've earned quite the sum

#

i was actually right many times

raven helm Aug 6, 2025, 5:20 PM

#

@echo aurora What do you think about GPT-5?

echo aurora Aug 6, 2025, 5:20 PM

#

iron meadow @echo aurora Can you help me decipher what is wrong with a message I am...

Yeah I can take a look. My DMs are open. Note we do have @oak python available too.

torn mantle Aug 6, 2025, 5:20 PM

#

in grok 4 not topping lmarena way way before the release

#

no you didnt

#

stop lying

#

omg

#

...

#

what else did i predict

#

hmmm

#

lets see

#

what

#

no

iron meadow Aug 6, 2025, 5:23 PM

#

Sent!

raven helm Aug 6, 2025, 5:26 PM

#

stray aspen Aug 6, 2025, 5:28 PM

#

wheat onyx Aug 6, 2025, 5:36 PM

#

stray aspen

stop overthinking it. It's clearly a typo. Tweets can't be deleted

agile bloom Aug 6, 2025, 5:37 PM

#

stray aspen

wait what?

quartz light Aug 6, 2025, 5:40 PM

#

@echo aurora 4.1 direct when

torn mantle Aug 6, 2025, 5:51 PM

#

stray aspen

gpt-S

whole wagon Aug 6, 2025, 5:52 PM

#

wheat onyx stop overthinking it. It's clearly a typo. Tweets can't be deleted

?

#

It obviously is gpt5. That is obvious even without any hint

obsidian cargo Aug 6, 2025, 5:56 PM

#

Gpt 4o.5

#

That's what they're hinting at

#

Or maybe o4.5

wheat onyx Aug 6, 2025, 5:57 PM

#

whole wagon It obviously is gpt5. That is obvious even without any hint

No way

#

Impossible

whole wagon Aug 6, 2025, 5:57 PM

#

Bruh

#

Have you been under a rock past few weeks

wheat onyx Aug 6, 2025, 5:58 PM

#

whole wagon Have you been under a rock past few weeks

#general message

This isn't a hint to you?

stray aspen Aug 6, 2025, 6:19 PM

#

whole wagon It obviously is gpt5. That is obvious even without any hint

are you serious

#

i could have never thought of that

#

i thought they were hinting grok 5

tame horizon Aug 6, 2025, 6:21 PM

#

Listen, Claude 4.1 isn't there anymore. Yesterday I was talking to it and now it's not showing up. I want

#

#

Guys, they removed it, they remove it, why can they put it back?

torn mantle Aug 6, 2025, 6:25 PM

#

@patent aspen gemini 3 soon or nah?

tame horizon Aug 6, 2025, 6:29 PM

#

torn mantle @patent aspen gemini 3 soon or nah?

maybe friend

stray aspen Aug 6, 2025, 6:29 PM

#

torn mantle @patent aspen gemini 3 soon or nah?

yes

#

they have to respond to gpt -5

tame horizon Aug 6, 2025, 6:29 PM

#

stray aspen are you serious

Thank you very much friend for the information mentioned above about the site

stray aspen Aug 6, 2025, 6:29 PM

#

yes

tame horizon Aug 6, 2025, 6:30 PM

#

how are you?

stray aspen Aug 6, 2025, 6:30 PM

#

im chilling bud

tame horizon Aug 6, 2025, 6:32 PM

#

hehehe cool 😎

#

Look, were you also able to talk to Claude 4.1?

tame horizon Aug 6, 2025, 6:34 PM

#

stray aspen im chilling bud

Did they release it before Anthropic? hahaha

torn mantle Aug 6, 2025, 6:35 PM

#

tame horizon maybe friend

who are you

#

i see

brisk helm Aug 6, 2025, 6:36 PM

#

yo for the video arena, can they make it so that u can choose which model u wanna use

echo aurora Aug 6, 2025, 6:36 PM

#

brisk helm yo for the video arena, can they make it so that u can choose which model u wann...

It's possible, be sure to share feedback in #bot-feedback

tame horizon Aug 6, 2025, 6:37 PM

#

I'm just a user here! What are the canonical movie scenes "who are you"? - what should I say "Jesus"? , kiss my bro I'm just another one wanting to contribute and be your friend

#

@torn mantle

brisk helm Aug 6, 2025, 6:39 PM

#

echo aurora It's possible, be sure to share feedback in #bot-feedback

ok sorry

echo aurora Aug 6, 2025, 6:39 PM

#

brisk helm ok sorry

no need to apologize, it's all good.

tame horizon Aug 6, 2025, 6:40 PM

#

how are you @torn mantle

tame horizon Aug 6, 2025, 6:40 PM

#

torn mantle who are you

How do you eat today?

obsidian cargo Aug 6, 2025, 6:42 PM

#

with my mouth, as usual

tame horizon Aug 6, 2025, 6:42 PM

#

Wow, I'm missing the notification feature. It should already be showing who wants to chat with me?

echo aurora Aug 6, 2025, 6:42 PM

#

Lets try to keep convo related to AI please.

brisk helm Aug 6, 2025, 6:42 PM

#

just a quick question to the ppl that run lmarena. how do u guys freely give access to premium ai models on the webpage, just a question.

stray aspen Aug 6, 2025, 6:44 PM

#

tame horizon I'm just a user here! What are the canonical movie scenes "who are you"? - what ...

bro what

#

are you an AI

brisk helm Aug 6, 2025, 6:44 PM

#

its a bot so yeah an ai i think

tame horizon Aug 6, 2025, 6:44 PM

#

no bro it's me jajjjajajaj

tame horizon Aug 6, 2025, 6:44 PM

#

stray aspen are you an AI

I guess not or will it?

glad perch Aug 6, 2025, 6:53 PM

#

That feeling when you go see see your generated videos only to come to the scene of people already voted and models are revealed 😭😞

echo aurora Aug 6, 2025, 7:03 PM

#

glad perch That feeling when you go see see your generated videos only to come to the scene...

We are putting a lot of thought into how to best do the model reveal. If you have thoughts on likes/dislikes don’t hesitate to share with us in #bot-feedback

whole wagon Aug 6, 2025, 7:04 PM

#

Sus

brisk helm Aug 6, 2025, 7:05 PM

#

whole wagon Sus

ive already applied as janitor at the pentagon

whole wagon Aug 6, 2025, 7:06 PM

#

Well they are getting smth in return

#

Sigh more bs from scam altman kek. GPT5 is not AGI lol

#

"smarter than the smartest person" bruh

#

These claims get more outlandish every time

ornate stump Aug 6, 2025, 7:08 PM

#

smarter than the smarter smarties

elfin herald Aug 6, 2025, 7:22 PM

#

is claude opus 4.1 not available?
i cant see it

obsidian shell Aug 6, 2025, 7:24 PM

#

claude is expensive as ssssss

they dont want us to get it for free so they took it off direct and moved it to battle only

elfin herald Aug 6, 2025, 7:24 PM

#

obsidian shell claude is expensive as ssssss they dont want us to get it for free so they took...

oh thanks

#

isnt it the same price as opus 4

obsidian shell Aug 6, 2025, 7:26 PM

#

in the api probably

but anthropic has to capitalize on its release

it wont do it by making free for testing

#

maybe in a week or two

obsidian cargo Aug 6, 2025, 7:26 PM

#

Do you guys think we might get gpt-image-2 tomorrow too?

#

or at least an update to gpt-image-1?
or am I huffing copium?

obsidian shell Aug 6, 2025, 7:27 PM

#

probably not

#

guessing wont get us far

it will come when it will come

torn mantle Aug 6, 2025, 7:28 PM

#

tame horizon how are you @torn mantle

are you a bot

#

are you using an agent to communicate with us?

#

like an agentic browser like comet

leaden palm Aug 6, 2025, 7:30 PM

#

torn mantle are you a bot

probably just non native english

tame horizon Aug 6, 2025, 7:31 PM

#

@torn mantle I'm not anymore, this will be under construction soon as soon as the current project is finished.

#

Several people have asked me this

#

I think it's funny

hallow ridge Aug 6, 2025, 7:36 PM

#

How can I take away the restrictions

stray aspen Aug 6, 2025, 7:49 PM

#

something big is coming

hallow ridge Aug 6, 2025, 7:50 PM

#

stray aspen something big is coming

Like what

wheat onyx Aug 6, 2025, 7:53 PM

#

whole wagon Sigh more bs from scam altman kek. GPT5 is not AGI lol

remember he is building an AI device separately from OAI

tame horizon Aug 6, 2025, 7:54 PM

#

stray aspen something big is coming

I agree

keen beacon Aug 6, 2025, 7:55 PM

#

Staff at OpenAI robotics tweeted about GPT 5 maybe. what?

#

How could it be robotics related???

tame horizon Aug 6, 2025, 7:55 PM

#

leaden palm probably just non native english

I'm not exactly talking about current slang because my course is outdated.

wheat onyx Aug 6, 2025, 7:56 PM

#

wheat onyx remember he is building an AI device separately from OAI

I forgot, OAI acquired it, I'm wrong

novel flame Aug 6, 2025, 7:56 PM

#

wheat onyx remember he is building an AI device separately from OAI

Oh yea, I got it: a PaLLM Pilot running GPT-OSS 20B would indeed be snarter than the smartest people who voluntarily follow Elon on Twixter.

stray aspen Aug 6, 2025, 7:56 PM

#

torn mantle are you a bot

its the new model from OAI

wheat onyx Aug 6, 2025, 7:56 PM

#

novel flame Oh yea, I got it: a PaLLM Pilot running GPT-OSS 20B would indeed be snarter than...

is that what i suggested?

novel flame Aug 6, 2025, 7:57 PM

#

wheat onyx is that what i suggested?

No but it’s a joke.

cedar tide Aug 6, 2025, 8:20 PM

#

@echo aurora why OSS 120b removed from webdev ?

Screenshot_2025-08-06-23-19-16-932_com.discord-edit.jpg

#

And 20b from arena

Screenshot_2025-08-06-23-19-00-657_com.discord-edit.jpg

torn mantle Aug 6, 2025, 8:25 PM

#

leaden palm probably just non native english

i see

torn mantle Aug 6, 2025, 8:25 PM

#

tame horizon I think it's funny

hmm

torn mantle Aug 6, 2025, 8:25 PM

#

stray aspen its the new model from OAI

mini or nano

echo aurora Aug 6, 2025, 8:27 PM

#

cedar tide @echo aurora why OSS 120b removed from webdev ?

I'm not sure, but will flag to the team blobthanks

keen beacon Aug 6, 2025, 8:28 PM

#

Dude there is no Veo 3

obsidian cargo Aug 6, 2025, 8:36 PM

#

dang new models but I ran out of daily generations a few hours ago ehehe

echo aurora Aug 6, 2025, 8:37 PM

#

obsidian cargo dang new models but I ran out of daily generations a few hours ago ehehe

oh no! they will be there tomorrow! would also note I was a bit late to the announcement so you may have been using some of them.

ocean vortex Aug 6, 2025, 8:43 PM

#

cedar tide @echo aurora why OSS 120b removed from webdev ?

it would probably get destroyed there

lime coral Aug 6, 2025, 8:50 PM

#

cedar tide @echo aurora why OSS 120b removed from webdev ?

They are coward

warm fulcrum Aug 6, 2025, 8:53 PM

#

which discord server tells you about newly added ai models

stray aspen Aug 6, 2025, 8:54 PM

#

no

#

very interesting author gemini

#

#

why is it calling itself gpt 4

#

hell yes

#

or i hope so

wheat onyx Aug 6, 2025, 8:56 PM

#

https://x.com/hunoematic/status/1953189036509806833?s=19

invincibleHunter (@hunoematic)

GPT-5 is state of the art at reasoning. In SimpleBench, it passed 90% of all questions (9 out of 10). I have access to it via Copilot, and it uses Reasoning mode.

@btibor91
@patience_cave
@kimmonismus
@slow_developer
@RafaCrackYT
@legit_api
@koltregaskes
@chetaslua

stray aspen Aug 6, 2025, 8:56 PM

#

thats crazy

wheat onyx Aug 6, 2025, 8:57 PM

#

wheat onyx https://x.com/hunoematic/status/1953189036509806833?s=19

https://x.com/ChaseBrowe32432/status/1953195458328932542?s=19

Chase Brower (@ChaseBrowe32432)

@hunoematic this is the public set and not the actual benchmark. its benchmark score will be much lower.

bright kayak Aug 6, 2025, 8:58 PM

#

wheat onyx https://x.com/hunoematic/status/1953189036509806833?s=19

does everyone just trust everyone?

wheat onyx Aug 6, 2025, 8:58 PM

#

I think it's probably safe to ignore it, just thought it was interesting

stray aspen Aug 6, 2025, 9:00 PM

#

claude 4.1 is on livebench

#

terse shuttle Aug 6, 2025, 9:01 PM

#

wheat onyx https://x.com/hunoematic/status/1953189036509806833?s=19

why is claude 4 sonnet lower than claude 3.7?

wheat onyx Aug 6, 2025, 9:02 PM

#

Claude was only ever for coding

iron meadow Aug 6, 2025, 9:04 PM

#

what is it?

bright kayak Aug 6, 2025, 9:04 PM

#

#

new layout

wheat onyx Aug 6, 2025, 9:04 PM

#

Anthropic said they have some big upgrades in the coming weeks. We will see

bright kayak Aug 6, 2025, 9:05 PM

#

bright kayak

bright kayak Aug 6, 2025, 9:08 PM

#

bright kayak

btw this just looks worse overall why

wheat onyx Aug 6, 2025, 9:09 PM

#

Anthropic has big updates in a few weeks. As long as they beat GPT-5, they'll be fine. If not, it will be very difficult for them

iron meadow Aug 6, 2025, 9:11 PM

#

Too much buzz words 😭

#

Website spams the word "vibe"

#

Not tasteful to me

bright kayak Aug 6, 2025, 9:11 PM

#

does this edit look realistic?

iron meadow Aug 6, 2025, 9:12 PM

#

bright kayak does this edit look realistic?

Needs the frosted blur

#

You can copy the css directly from the site

bright kayak Aug 6, 2025, 9:12 PM

#

iron meadow Needs the frosted blur

the background is official

#

i literally just changed the path of the jpg, to gpt-5

#

#

i guess they just change their styles often

hardy pecan Aug 6, 2025, 9:15 PM

#

wheat onyx https://x.com/hunoematic/status/1953189036509806833?s=19

9/10 on a public set that has been available for a while isn't all that impressive, a lot of the models fall down when tested against the private set. no expectation it should stay at 90%, it'll be ~70 with my testing
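
A rough way to see why a 9/10 score on its own does not contradict a ~70% estimate: the sketch below computes a Wilson score interval for 9 passes out of 10. It assumes the ten questions behave like independent pass/fail trials, which is an assumption for illustration and not something the tweet establishes; `wilson_interval` is a helper name made up for this example.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z * z / trials
    # Centre of the interval is pulled toward 0.5 for small samples.
    centre = (p_hat + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half


# 9 correct answers out of the 10 public questions mentioned in the tweet.
low, high = wilson_interval(9, 10)
print(f"9/10 -> plausible pass rate between {low:.2f} and {high:.2f}")
```

With only ten questions the interval is wide, so a later score well below 90% on a larger private set would still be consistent with this run, even before considering possible leakage of the public questions.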

bright kayak Aug 6, 2025, 9:15 PM

#

i mean, i only changed the image, nothing else, so any css would carry over

hardy pecan Aug 6, 2025, 9:17 PM

#

I'm giving it ~70%, which is by no means underestimation, its really strong from my testing, agreed

#

Excited for it

#

gulp

civic flame Aug 6, 2025, 9:27 PM

#

lol

rough nimbus Aug 6, 2025, 9:27 PM

#

hello frends

bright kayak Aug 6, 2025, 9:27 PM

#

this is my edit, using their official bg images

rough nimbus Aug 6, 2025, 9:27 PM

#

very happy to be here with you ❤️

echo aurora Aug 6, 2025, 9:27 PM

#

rough nimbus very happy to be here with you ❤️

glad to hear it!

dim heron Aug 6, 2025, 9:31 PM

#

Can someone not generate video with voice over here?

iron cipher Aug 6, 2025, 9:32 PM

#

bright kayak does this edit look realistic?

is GPT-5 out yet?

#

on LMArena

bright kayak Aug 6, 2025, 9:32 PM

#

people talk a lot about wanting x and y to help with coding, all i need it to do is to help me troubleshoot

bright kayak Aug 6, 2025, 9:32 PM

#

iron cipher is GPT-5 out yet?

i'm glad my edit was this good

#

no, tomorrow is the livestream

#

https://x.com/OpenAI/status/1953139020231569685

OpenAI (@OpenAI)

LIVE5TREAM THURSDAY 10AM PT

blazing bison Aug 6, 2025, 9:33 PM

#

They gonna change the name of gpt 4o to gpt 5

#

Yeah

bright kayak Aug 6, 2025, 9:34 PM

#

50% of people wouldnt be able to tell the difference or care

blazing bison Aug 6, 2025, 9:34 PM

#

For sure, if their router model is good enough to identify hard questions

#

Oh my god I have to send it again

civic flame Aug 6, 2025, 9:35 PM

#

buddy there's literally a model with the slug gpt-5-auto

#

on the api it isn't a router, but in chatGPT there will be a router (with the ability to force reasoning for subscribers)

blazing bison Aug 6, 2025, 9:37 PM

#

https://x.com/kevinweil/status/1890914595268657194

Kevin Weil 🇺🇸 (@kevinweil)

@SpencerKSchiff What you outlined is the plan 👍 May start with a little routing behind the scenes to hide some lingering complexity, but mostly around the edges. The plan is to get the core model to do quick responses, tools, and longer reasoning.

#

Do you know what core model means?

#

????

#

The info that gpt 5 is a unified model is older than that

#

I'm?

#

Lol

#

No it's not

patent aspen Aug 6, 2025, 9:39 PM

#

It's not just routing but there is a router

blazing bison Aug 6, 2025, 9:39 PM

#

Yes you're right

#

Yes brian

#

Exactly

#

Yes there is a router and there is new models too

#

I never said otherwise

#

But people will receive for the most requests a gpt 4o router

#

There is lol

#

When I say gpt 4o , im talking about gpt 4o level model

#

You don't know me bro

stray aspen Aug 6, 2025, 9:42 PM

#

craig will gpt 5 be the SoTA when it releases

blazing bison Aug 6, 2025, 9:42 PM

#

Lmao

patent aspen Aug 6, 2025, 9:42 PM

#

I think in general you have to assume that any high volume all in one chat app is going to have some routing involved, although the line between routing and not routing is going to be blurry because there will also be a lot of shared state

bright kayak Aug 6, 2025, 9:42 PM

#

#

I get why people fake screenshots, it's fun

blazing bison Aug 6, 2025, 9:43 PM

#

I never said that

patent aspen Aug 6, 2025, 9:43 PM

#

Yeah I know. Most people don't think in shades of gray

blazing bison Aug 6, 2025, 9:43 PM

#

Its not

#

It just the 4o model

#

The thinking is a summary bug in the frontend

#

Idk I use the api most of the time

patent aspen Aug 6, 2025, 9:44 PM

#

Jules is out of beta

blazing bison Aug 6, 2025, 9:45 PM

#

Gemini 2.5 pro

#

Codex but from google

willow grail Aug 6, 2025, 9:47 PM

#

blazing bison Aug 6, 2025, 9:47 PM

#

Another fake bench

#

Lesgoo

solar hollow Aug 6, 2025, 9:47 PM

#

yeah lets just wait for the release

willow grail Aug 6, 2025, 9:48 PM

#

blazing bison Another fake bench

they have tested it

#

https://x.com/hunoematic/status/1953189036509806833/photo/1

invincibleHunter (@hunoematic)

GPT-5 is state of the art at reasoning. In SimpleBench, it passed 90% of all questions (9 out of 10). I have access to it via Copilot, and it uses Reasoning mode.

@btibor91
@patience_cave
@kimmonismus
@slow_developer
@RafaCrackYT
@legit_api
@koltregaskes
@chetaslua

bright kayak Aug 6, 2025, 9:48 PM

#

of course it's fake

#

the image is deceiving

willow grail Aug 6, 2025, 9:48 PM

#

https://x.com/legit_api/status/1953197372755845597

ʟᴇɢɪᴛ (@legit_api)

@hunoematic @btibor91 @patience_cave @kimmonismus @slow_developer @RafaCrackYT @koltregaskes @chetaslua this matches the 9/10 score it got with GPT-5 earlier today

in my run, the last question took it down

for me it thought for multiple minutes on most of them 🤔

barren prairie Aug 6, 2025, 9:51 PM

#

Gemini pro 2.5 VS GLM 4.5 coding capacities test 🔥⚔️

Prompt:

Create a 3D educational HTML game: A metro train moves on visible tracks across a green field under a blue sky, passing through 10 stations. At each station, a quiz question appears with multiple-choice buttons. The train only continues to the next station if the player answers correctly; otherwise, it stops until the correct answer is selected. Include background buildings, animated grass, and a realistic sky. Add UI buttons for controlling the metro's movement (Start, Stop, Reset). The scene should be playful, colorful, and child-friendly, with smooth transitions and immersive 3D visuals.

1st one is for Gemini pro 2.5

https://g.co/gemini/share/f4e1c8d337a3

Gemini Pro failed it and made tons of errors, so I couldn't continue the design with it. You can see that from the first prompt it made a bug in the buttons that it couldn't fix

2nd one is for GLM4.5

https://chat.z.ai/s/5d97a31b-0071-445e-8b7d-5f1f711388f4

I tried to make the prompts harder and harder each time to make it produce an error, but ...

It wrote 1638 lines with one very small mistake, which it corrected perfectly

It started to bug after this convo, starting from code line 1694..

Gemini

‎Gemini - 3D Metro Quiz Game Creation

Created with Gemini

Chat with Z.ai - Free AI Chatbot powered by GLM-4.5

Start a free chat with your AI expert for code and smart tools. Tell Z.ai what you need—a complete full-stack application, a stunning presentation, or professional-grade writing—and get instant results.

willow grail Aug 6, 2025, 9:52 PM

#

just use normal simplebench questions and use gpt5 on ms copilot

bright kayak Aug 6, 2025, 9:54 PM

#

bright kayak

i messed up the comments and bottom posts time but otherwise i think this looks pretty realistic

burnt sinew Aug 6, 2025, 10:27 PM

#

can we do text-to-video on lmarena website?

echo aurora Aug 6, 2025, 10:28 PM

#

burnt sinew can we do text-to-video on lmarena website?

No, it's currently only available through this server.

wicked root Aug 6, 2025, 10:36 PM

#

willow grail

nooooooooo

wicked root Aug 6, 2025, 10:36 PM

#

bright kayak the image is deceiving

how do u guys know if it's fake?

bright kayak Aug 6, 2025, 10:37 PM

#

because they said it was tested on only the public set

small haven Aug 6, 2025, 10:43 PM

#

what are the odds that o4 is integrated into gpt5

torn mantle Aug 6, 2025, 10:44 PM

#

wicked root nooooooooo

yes

wicked root Aug 6, 2025, 10:44 PM

#

torn mantle yes

https://tenor.com/view/no-the-office-michael-scott-scream-gif-16929305

Tenor

burnt sinew Aug 6, 2025, 10:45 PM

#

willow grail

says rumor for a reason

glacial swift Aug 6, 2025, 10:45 PM

#

🙂

maiden fulcrum Aug 6, 2025, 11:00 PM

#

18 Hours left until GPT-5

stray aspen Aug 6, 2025, 11:07 PM

#

How do you use gpt 5 on copilot

obsidian shell Aug 6, 2025, 11:12 PM

#

you dont

#

i think its still on 4o

strange briar Aug 6, 2025, 11:21 PM

#

hi

echo aurora Aug 6, 2025, 11:23 PM

#

strange briar hi

ablobwave

cedar tide Aug 6, 2025, 11:43 PM

#

the shared results are too low compared to those shared by open ai

#

The api dont work good now

Screenshot_2025-08-07-02-38-44-981_com.twitter.android-edit.jpg

#

math arena got 91% on aime 25 in local run, much better than artificial analysis

stray aspen Aug 6, 2025, 11:51 PM

#

whats this

shadow jewel Aug 6, 2025, 11:53 PM

#

we love lmarena 🙏

wicked root Aug 7, 2025, 12:30 AM

#

Any news on gpt5?

crimson monolith Aug 7, 2025, 12:47 AM

#

Hello

stray aspen Aug 7, 2025, 1:01 AM

#

didnt think i would ever say this

#

but i think gpt oss 120 actually cooked me a good script

#

at least better than 2.5 pro

little narwhal Aug 7, 2025, 1:04 AM

#

GPT-1984

#

Amirite

static portal Aug 7, 2025, 1:33 AM

#

is it just claude with the issue with randomly thinking forever or other models are the same?

wicked root Aug 7, 2025, 1:35 AM

#

stray aspen at least better than 2.5 pro

NOOOOOO

whole wagon Aug 7, 2025, 1:38 AM

#

horizon beta is absolutely insane

#

This is crazy good

#

yea

#

the full one?

wicked root Aug 7, 2025, 1:41 AM

#

👀

whole wagon Aug 7, 2025, 1:42 AM

#

its incredible

#

Literally destroying all my questions after like 3 seconds thinking

#

o3 pro cant get these right

solar hollow Aug 7, 2025, 1:44 AM

#

whole wagon This is crazy good

this depends on the stack sizes of the table

#

if you are biggest stack with decent margin, you can push much more

#

in any other situation you gotta be very tight

#

especially as a mid stack size

#

none would be pushable in fact im pretty sure

#

of course you can mix your strategy and min open aswell

#

which often will be better

#

as a mid stack you could only push AA probably, barely KK even

#

but you can open with more hands for 2bb

#

of course the prize table matters too

#

if it is very top heavy, you get to push more

whole wagon Aug 7, 2025, 1:53 AM

#

really nice

#

GPT5 will be SOTA by a large margin. For sure

#

I wonder how long until GPT6 though. The 2 year cadence no longer works they need to shorten the gap between releases

red sluice Aug 7, 2025, 1:57 AM

#

Hopefully they'll make the API available asap so we can test it

whole wagon Aug 7, 2025, 1:57 AM

#

This will be SOTA for about 6 months i think

stray aspen Aug 7, 2025, 2:10 AM

#

i love horizon beta

jade egret Aug 7, 2025, 2:23 AM

#

do yall think gpt 5 will be better than claude 4.1 opus at coding?

whole wagon Aug 7, 2025, 2:27 AM

#

https://x.com/sama/status/1953264193890861114?t=DNIeXCZG0bQfyCeFt4JHRw&s=19
Horizon reference

Sam Altman (@sama)

stray aspen Aug 7, 2025, 2:29 AM

#

jade egret do yall think gpt 5 will be better than claude 4.1 opus at coding?

yes

#

claude 4.1 is literally opus 4 but 2% better

earnest swift Aug 7, 2025, 2:40 AM

#

what is the maximum amount of videos i can ask in arena 1?

stray aspen Aug 7, 2025, 2:41 AM

#

not again

#

@echo aurora

stray aspen Aug 7, 2025, 2:41 AM

#

earnest swift what is the maximum amount of videos i can ask in arena 1?

i think its 8

earnest swift Aug 7, 2025, 2:41 AM

#

a day?

stray aspen Aug 7, 2025, 2:41 AM

#

yes

#

maybe if it didnt have north korean level censorship

somber niche Aug 7, 2025, 2:48 AM

#

Technically not a false statement

stray aspen Aug 7, 2025, 2:48 AM

#

who makes open source models in america

#

thats a china thing

earnest swift Aug 7, 2025, 2:57 AM

#

how do i get sound in the video

stray aspen Aug 7, 2025, 2:58 AM

#

earnest swift how do i get sound in the video

get lucky

#

you might get veo 3

#

but its not guaranteed

cedar tide Aug 7, 2025, 3:18 AM

#

cedar tide And 20b from arena

change of providers, now it's open ai themselves?

Screenshot_2025-08-07-06-15-07-849_com.discord-edit.jpg

stray aspen Aug 7, 2025, 3:21 AM

#

were they using the OpenRouter API before?

hardy pecan Aug 7, 2025, 3:27 AM

#

https://x.com/sama/status/1953296157586931745

Sam Altman (@sama)

our livestream tomorrow at 10 am PDT will be longer than usual, around an hour.

we have a lot to show and hope you can find the the time to watch!

stray aspen Aug 7, 2025, 3:29 AM

#

google has to lock in

vital solar Aug 7, 2025, 3:29 AM

#

helo

obsidian shell Aug 7, 2025, 3:35 AM

#

scam altman better announce at least an update to the o models if not gpt5...

jade egret Aug 7, 2025, 4:30 AM

#

stray aspen google has to lock in

ya

copper ruin Aug 7, 2025, 4:49 AM

#

hello

sharp tiger Aug 7, 2025, 4:52 AM

#

@echo aurora

echo aurora Aug 7, 2025, 4:52 AM

#

sharp tiger @echo aurora

ablobwave

sharp tiger Aug 7, 2025, 4:53 AM

#

echo aurora ablobwave

Can I talk to you privately?

echo aurora Aug 7, 2025, 4:53 AM

#

sure

leaden palm Aug 7, 2025, 5:19 AM

#

hardy pecan https://x.com/sama/status/1953296157586931745

hour long livestream, then straight to TBPN after... we eating good tomorrow

whole wagon Aug 7, 2025, 5:29 AM

#

https://huggingface.co/huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated All those weeks of safety testing down the drain... Kappa

#

They've actually been removing them off hugging face. There was other ones made that got removed

wicked root Aug 7, 2025, 5:41 AM

#

Are we sure gpt5’s being released today?

raven helm Aug 7, 2025, 6:00 AM

#

raven helm Aug 7, 2025, 6:01 AM

#

wicked root Are we sure gpt5’s being released today?

YES, it is so obvious

solid brook Aug 7, 2025, 6:03 AM

#

wicked root Are we sure gpt5’s being released today?

It will i'm sure. Imagine if it doesn't it will be big shock.

wicked root Aug 7, 2025, 6:05 AM

#

I hope it doesnt

harsh flume Aug 7, 2025, 6:25 AM

#

raven helm

That doesnt mean anything if it was done by third party running their public dataset

#

classic data leakage fallacy

raven helm Aug 7, 2025, 6:25 AM

#

mmmm

leaden palm Aug 7, 2025, 6:33 AM

#

wicked root I hope it doesnt

found the patience cave alt

wicked root Aug 7, 2025, 6:34 AM

#

leaden palm found the patience cave alt

Who?

leaden palm Aug 7, 2025, 6:35 AM

#

wicked root Who?

novel flame Aug 7, 2025, 6:49 AM

#

stray aspen whats this

A new model architecture from Z.ai -- not an LLM / chatbot, it's built to perform reasoning and planning in latent space, and it performs very well on ARC-AGI. Not SoTA, but incredibly well for its minuscule size. As I understand it, HRM works by having two 'recurrent' transformer blocks, one fast and cheap, the other slower and more competent, and the 'high' one oversees the progress of the 'low' one and steers it. It's a novel and very interesting approach.
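
The two-speed loop described above can be pictured with a toy sketch. This is only the control flow as summarized in the message, not the real architecture: scalar states stand in for transformer blocks, and every name and coefficient here (`nested_recurrence`, `cycles`, `low_steps`, the weights) is hypothetical.

```python
import math


def nested_recurrence(x: float, cycles: int = 3, low_steps: int = 4):
    """Toy two-timescale loop: a fast 'low' state refines under a slow 'high' state."""
    high, low = 0.0, 0.0
    low_updates = 0
    high_updates = 0
    for _ in range(cycles):
        for _ in range(low_steps):
            # Fast module: cheap update conditioned on the input and the current plan.
            low = math.tanh(0.5 * low + 0.3 * high + x)
            low_updates += 1
        # Slow module: updates once per cycle from the fast module's final state,
        # which in turn steers the next round of fast updates.
        high = math.tanh(0.8 * high + 0.6 * low)
        high_updates += 1
    return high, low_updates, high_updates


state, n_low, n_high = nested_recurrence(0.2)
print(f"low updates: {n_low}, high updates: {n_high}")
```

The point of the sketch is the schedule: the 'low' state changes on every step, while the 'high' state changes only once per cycle and feeds back into the low-level updates.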

whole wagon Aug 7, 2025, 7:16 AM

#

#

openAI 120B open source model with reasoning even below many without reasoning

torn bison Aug 7, 2025, 7:19 AM

#

stray aspen were they using the OpenRouter API before?

fireworks

whole wagon Aug 7, 2025, 7:19 AM

#

Idk how the model is so fried

#

Really don't get it. Like the only way you get performance like this is if you tried to

lethal oracle Aug 7, 2025, 7:20 AM

#

Hello guys

#

Since yesterday Im unable to access lmarena.ai

#

Can someone pls help

tropic mesa Aug 7, 2025, 7:22 AM

#

i get veo 3 but it doesnt have audio, i have done 5 videos

#

how to fix it

lethal oracle Aug 7, 2025, 7:23 AM

#

Bruh

ruby smelt Aug 7, 2025, 7:42 AM

#

@leaden palm hey, i am not able to create videos in the video arena. It's saying- the application did not respond

leaden palm Aug 7, 2025, 7:43 AM

#

problematic

#

try again?

ruby smelt Aug 7, 2025, 7:43 AM

#

It's working now! You did something?

leaden palm Aug 7, 2025, 7:44 AM

#

no

#

that was just the first thought that came to mind

ruby smelt Aug 7, 2025, 7:45 AM

#

It wasn't working before. I have been trying for 10 minutes. Thanks 🙌🏻

brittle nimbus Aug 7, 2025, 7:56 AM

#

hello word 🙂

dusky aurora Aug 7, 2025, 8:06 AM

#

brittle nimbus hello word 🙂

do I see a Polish name?

spring turtle Aug 7, 2025, 8:11 AM

#

Can someone explain why this prompt violates the TOU?

novel sigil Aug 7, 2025, 8:58 AM

#

I want to ask if the code "remove style control" is this?
https://colab.research.google.com/drive/19VPOril2FjCX34lJoo7qn4r6adgKLioY?ref=news.lmarena.ai

Google Colab

#

Does this mean adding some features of the format to train the BT model to calculate the elo score?

#

Please answer. Thank you.

ocean vortex Aug 7, 2025, 10:04 AM

#

novel sigil I want to ask if the code "remove style control" is this? https://colab.research...

this is linked from there so yes https://news.lmarena.ai/style-control/

LMArena Blog

Does Style Matter in AI Evaluations?

We controlled for the effect of length and markdown, and indeed, the ranking changed. This is just a first step towards our larger goal of disentangling substance and style in Chatbot Arena leaderboard.

#

It may have been updated since though

high fossil Aug 7, 2025, 10:04 AM

#

Hey!

ocean vortex Aug 7, 2025, 10:04 AM

#

novel sigil Aug 7, 2025, 10:34 AM

#

ocean vortex

So is this code “remove style control”？ I hope to receive your reply. Thank you

ocean vortex Aug 7, 2025, 10:35 AM

#

whole wagon openAI 120B open source model with reasoning even below many without reasoning

What is weird is how they got these benchmark results in a model card... Seems like there's something missing from the version we get to use lol

ocean vortex Aug 7, 2025, 10:38 AM

#

novel sigil So is this code “remove style control”？ I hope to receive your reply. Thank you

It's the opposite of that. It's adding style control which is now default view. By doing "remove style control" you are undoing those changes
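
For anyone wondering what "adding style features to the BT model" looks like in practice, here is a toy sketch. It is not the code from the linked notebook: it assumes a single made-up style feature (a normalized length difference), two hypothetical models named `verbose` and `terse`, invented battle data, and a hand-rolled gradient ascent in place of a real fitting library. Fitting with `use_style=False` corresponds to the "remove style control" view.

```python
import math


def fit_bt(battles, models, use_style, steps=2000, lr=0.5):
    """Fit Bradley-Terry strengths by gradient ascent on the log-likelihood.

    battles: list of (model_a, model_b, style_diff, a_won), where style_diff is a
    hypothetical feature such as (len_a - len_b) / (len_a + len_b).
    """
    strength = {m: 0.0 for m in models}
    beta = 0.0  # style coefficient; stays at 0 when use_style is False
    n = len(battles)
    for _ in range(steps):
        grad = {m: 0.0 for m in models}
        grad_beta = 0.0
        for a, b, style_diff, a_won in battles:
            logit = strength[a] - strength[b] + beta * style_diff
            p_a_wins = 1.0 / (1.0 + math.exp(-logit))
            err = (1.0 if a_won else 0.0) - p_a_wins
            grad[a] += err
            grad[b] -= err
            grad_beta += err * style_diff
        for m in models:
            # A small L2 penalty keeps the toy fit finite and identifiable.
            strength[m] += lr * (grad[m] - 0.01 * strength[m]) / n
        if use_style:
            beta += lr * (grad_beta - 0.01 * beta) / n
    return strength, beta


# Invented data: "verbose" wins 8/10 when its answer is much longer,
# but only 4/10 when both answers have the same length.
battles = (
    [("verbose", "terse", 0.5, True)] * 8
    + [("verbose", "terse", 0.5, False)] * 2
    + [("verbose", "terse", 0.0, True)] * 4
    + [("verbose", "terse", 0.0, False)] * 6
)
models = ["verbose", "terse"]

raw, _ = fit_bt(battles, models, use_style=False)
ctrl, beta = fit_bt(battles, models, use_style=True)
raw_gap = raw["verbose"] - raw["terse"]
ctrl_gap = ctrl["verbose"] - ctrl["terse"]
print(f"gap without style control: {raw_gap:+.2f}")
print(f"gap with style control:    {ctrl_gap:+.2f} (style coefficient {beta:+.2f})")
```

In this invented data the longer-answer model leads when style is ignored, and the lead flips once the length feature absorbs its share of the wins, which is the kind of ranking change the blog post describes.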

novel sigil Aug 7, 2025, 10:38 AM

#

ocean vortex It's the opposite of that. It's adding style control which is now default view. ...

thanks bro

novel sigil Aug 7, 2025, 10:45 AM

#

ocean vortex What is weird is how they got these benchmark results in a model card... Seems l...

I'm also very curious because the volume of data is just too large

exotic tartan Aug 7, 2025, 10:47 AM

#

can someone convince me why OSS is worth running on my machine? this is soooo dogshit

#

also, i gave it another shot just to be sure. it gave me a wrong answer

wheat onyx Aug 7, 2025, 10:52 AM

#

https://x.com/scaling01/status/1953361422055842057?s=19

Lisan al Gaib (@scaling01)

GPT-5 Pro - Research-grade intelligence

with ChatGPT-Pro

keen beacon Aug 7, 2025, 10:52 AM

#

exotic tartan can someone convince me why OSS is worth running on my machine? this is soooo do...

Just use qwen3

#

I have found OSS to be extremely underwhelming

exotic tartan Aug 7, 2025, 10:52 AM

#

I know I can run qwen3, I'm just wondering why the hype around OSS if it's literally unusable

#

am i doing anything wrong? is ollama not really compatible with it or something?

keen beacon Aug 7, 2025, 10:54 AM

#

exotic tartan can someone convince me why OSS is worth running on my machine? this is soooo do...

Btw, what program/website is that?

#

looks sleek

#

for LLMs

keen beacon Aug 7, 2025, 10:54 AM

#

exotic tartan am i doing anything wrong? is ollama not really compatible with it or something?

Not sure.

exotic tartan Aug 7, 2025, 10:54 AM

#

It's called Ollama, you can run LLMs locally with it

novel sigil Aug 7, 2025, 10:54 AM

#

I choose GLM-4.5

exotic tartan Aug 7, 2025, 10:55 AM

#

It's very sleek, but almost no configurations

keen beacon Aug 7, 2025, 10:55 AM

#

exotic tartan It's called Ollama, you can run LLMs locally with it

Yeah but that interface? Isn't Ollama CLI based and pulls a model once it's downloaded? Sorry, I am a bit of a noob

#

I've been using LM Studio

exotic tartan Aug 7, 2025, 10:55 AM

#

I used it because it allows for easy web search implementation which I couldn't get in LM Studio, but maybe it's fixable somehow

#

It's just based on Ollama CLI, but the app is different. go to ollama.com

keen beacon Aug 7, 2025, 10:56 AM

#

exotic tartan It's just based on Ollama CLI, but the app is different. go to ollama.com

ahh, I see

#

thanks a lot

exotic tartan Aug 7, 2025, 10:57 AM

#

For sure. Let me know if you know how to enable web tooling in LM Studio

keen beacon Aug 7, 2025, 10:57 AM

#

https://ollama.com/blog/new-app

Ollama's new app · Ollama Blog

Ollama's new app is now available for macOS and Windows.

#

ahh it was released recently

#

no wonder I haven't known about the app

#

lol

exotic tartan Aug 7, 2025, 11:11 AM

#

I find the answers to be much slower and worse compared to LM Studio for some reason

keen beacon Aug 7, 2025, 11:12 AM

#

exotic tartan I find the answers to be much slower and worse compared to LM Studio for some re...

I hope Ollama will have some agentic automation too in the future

#

would be real nice

exotic tartan Aug 7, 2025, 11:12 AM

#

They seem to be really far from this, but yeah would be amazing

ocean vortex Aug 7, 2025, 11:14 AM

#

wtf

#

was this benchmaxxed to oblivion...

#

gpt-oss

keen beacon Aug 7, 2025, 11:17 AM

#

ocean vortex wtf

https://tenor.com/view/damn-gif-gif-11820915051981029439

Tenor

pale sable Aug 7, 2025, 11:19 AM

#

Hey guys whats the best way to make money with ai

exotic tartan Aug 7, 2025, 11:22 AM

#

pale sable Hey guys whats the best way to make money with ai

ask an LLM, the one with the best answer wins

ocean vortex Aug 7, 2025, 11:23 AM

#

keen beacon https://tenor.com/view/damn-gif-gif-11820915051981029439

this is like 2.5Flash, except way more extreme

#

Goes to show the edge case of those benchmarks being the least effective... Since their only performance line is those scores dropping. And those benchmarks are not good enough to control model size

keen beacon Aug 7, 2025, 11:24 AM

#

ocean vortex Goes to show the edge case of those benchmarks being the least effective... Sinc...

Niche tests are always the best tbh

#

I write the most absurd stuff to test the reasoning

#

with all kinds of models

#

lol

exotic tartan Aug 7, 2025, 11:25 AM

#

I find LM Arena to be the only benchmark I care about

#

The ranking is pretty much how I feel usually about the models. Human vetting is still king

ocean vortex Aug 7, 2025, 11:27 AM

#

exotic tartan The ranking is pretty much how I feel usually about the models. Human vetting is...

Nah, it's just a different area of evals. It's not a substitute

#

And like, do you think 4o is better than all those models below it...?

exotic tartan Aug 7, 2025, 11:28 AM

#

What's this leaderboard? text?

ocean vortex Aug 7, 2025, 11:28 AM

#

yes

#

lmarena main text leaderboard

exotic tartan Aug 7, 2025, 11:28 AM

#

I definitely do think it's better than them at text

keen beacon Aug 7, 2025, 11:28 AM

#

ocean vortex And like, do you think 4o is better than all those models below it...?

I still find Kimi K2 the best at emotions and general EQ. 4o is too sycophantic along with gemini 2.5 models

pale sable Aug 7, 2025, 11:28 AM

#

Im thinking of making an ai model and sell her pics on OF

keen beacon Aug 7, 2025, 11:29 AM

#

pale sable Im thinking of making an ai model and sell her pics on OF

what

exotic tartan Aug 7, 2025, 11:29 AM

#

Yassin, go away bro

keen beacon Aug 7, 2025, 11:29 AM

#

pale sable Im thinking of making an ai model and sell her pics on OF

https://tenor.com/view/suspicious-untrustful-distrustful-jeremy-clarkson-top-gear-gif-18037537

Tenor

ocean vortex Aug 7, 2025, 11:29 AM

#

exotic tartan I definitely do think it's better than them at text

4o-latest is nowhere near Opus4, Grok4 or even R1 in terms of text performance

exotic tartan Aug 7, 2025, 11:29 AM

#

So explain why people think it's better?

#

And define text performance. I think it's a mixture of accuracy, style, speed etc

ocean vortex Aug 7, 2025, 11:30 AM

#

exotic tartan So explain why people think it's better?

They prefer its writing style. This eval is measuring human preference Elo, not performance/accuracy
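For readers unfamiliar with the term, here is a minimal sketch of a classic online Elo update driven by pairwise votes. It is illustrative only: the function names and the `k=32` step size are assumptions, and the leaderboard's actual fitting procedure may differ. The only input is which answer the voter preferred, so nothing in the update checks whether that answer was correct.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Apply one vote: the preferred model gains what the other loses."""
    actual = 1.0 if a_won else 0.0
    delta = k * (actual - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Two equally rated models; the voter prefers A.
new_a, new_b = update_ratings(1000.0, 1000.0, a_won=True)
```

An upset (a lower-rated model winning the vote) moves the ratings more than an expected result does, because `delta` scales with how surprising the outcome was.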

exotic tartan Aug 7, 2025, 11:31 AM

#

If it wouldn't be accurate, people would choose the accurate answer way before how stylized it is

ocean vortex Aug 7, 2025, 11:31 AM

#

It's also good at convincing - but that's also not an indicator of performance 👀

ocean vortex Aug 7, 2025, 11:32 AM

#

exotic tartan If it wouldn't be accurate, people would choose the accurate answer way before h...

Human preference is inherently subjective

exotic tartan Aug 7, 2025, 11:32 AM

#

I'm getting hallucinations from all text based LLMs... it's not unique to 4o

ocean vortex Aug 7, 2025, 11:32 AM

#

and is often not factual

keen beacon Aug 7, 2025, 11:32 AM

#

ocean vortex It's also good at convincing - but that's also not an indicator of performance ...

I find it much nicer if a model could say that it does not know the answer and guides the user to use other sources

ocean vortex Aug 7, 2025, 11:33 AM

#

People have no clue what the correct answer is, both look "similar enough" but one response looks more convincing... that's how this works

#

well not always, but a lot of it is this.

exotic tartan Aug 7, 2025, 11:34 AM

#

People are varied and not as dumb as you make them to be
I agree that style is a factor and that people prefer 4o style, but I just don't think it's as black and white as you make it sound

ocean vortex Aug 7, 2025, 11:35 AM

#

exotic tartan People are varied and not as dumb as you make them to be I agree that style is a...

It kinda is though because it's what this benchmark is. It's not trying to be something else. It's measuring human preference elo

exotic tartan Aug 7, 2025, 11:35 AM

#

yup, and as I said, human prefer not just style, but also accuracy, speed, etc

keen beacon Aug 7, 2025, 11:37 AM

#

exotic tartan yup, and as I said, human prefer not just style, but also accuracy, speed, etc

For me when I am talking in my native language on LMArena to the models, I check for grammar since many can have clumsy wordings at times (applies mostly to open-source models / small models). It's quite funny.

exotic tartan Aug 7, 2025, 11:37 AM

#

Answer A is totally wrong, answer B is less wrong. I prefer B.
Not every question I ask a text based LLM is something I don't know anything about... sometimes the test is asking it about niche subjects i DO know about.

ocean vortex Aug 7, 2025, 11:39 AM

#

exotic tartan yup, and as I said, human prefer not just style, but also accuracy, speed, etc

speed they are equalising so mostly not a factor, accuracy... Once again that's way way less of a factor and has less factual weight than in conventional benchmarks. Even if the model output is completely wrong, it still can win the user over to get the vote with sycophancy, style, manipulation (negative strong word but there can be some of that in a sense of it abusing the text that people generally like seeing) or whatever else... 👀

exotic tartan Aug 7, 2025, 11:39 AM

#

I feel like I'm repeating myself to be honest
It's okay to not agree 🙂

#

A model can sound as convincing as it wishes... it's easy to spot some hallucinations. Also if you get 2 completely different answers on a subject you know nothing about, picking based on perceived accuracy without you knowing the actual answer is lazy voting

ocean vortex Aug 7, 2025, 11:41 AM

#

exotic tartan Answer A is totally wrong, answer B is less wrong. I prefer B. Not every questi...

When you are matched with LLMs from same tier, often enough you can be swayed against your best intentions by the factors I already mentioned.

keen beacon Aug 7, 2025, 11:41 AM

#

That is hilarious

exotic tartan Aug 7, 2025, 11:41 AM

#

Right, that's why we have thousands of votes and not just 14
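As a rough illustration of the sample-size point, the sketch below computes the standard error of an observed win rate, assuming votes are independent (a simplifying assumption; the helper name is hypothetical). It only describes random noise: a preference shared by most voters does not average out with more votes.

```python
import math


def win_rate_std_error(win_rate: float, votes: int) -> float:
    """Standard error of a win rate estimated from independent votes."""
    return math.sqrt(win_rate * (1.0 - win_rate) / votes)


# Noise around a 50% win rate with 14 votes versus 10,000 votes.
small_sample = win_rate_std_error(0.5, 14)
large_sample = win_rate_std_error(0.5, 10_000)
```

With 14 votes the uncertainty is roughly 13 percentage points; with 10,000 votes it is about half a point.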

ocean vortex Aug 7, 2025, 11:42 AM

#

It's only natural and what happens kinda by design... when everything revolves around *preference*

exotic tartan Aug 7, 2025, 11:45 AM

#

Truth is dynamic and subjective 🙂

ocean vortex Aug 7, 2025, 11:46 AM

#

yeah and even if we don't see them as "poor judges", they will never be as good at assessing accuracy as curated tests with verified answers by industry experts are.

exotic tartan Aug 7, 2025, 11:46 AM

#

I mean we can go deep into an Adderall debate about life after death, existence of god, belief systems, crypto etc
There are ongoing debates about these subjects with no hard 'truths'

#

No one can convince me there is or isn't life after death - we just don't know

ocean vortex Aug 7, 2025, 11:51 AM

#

Ok change of subject, I wonder why OpenAI ditched their yap score for gpt-oss... catgrin

pure anvil Aug 7, 2025, 11:52 AM

#

openai couldn't even release a half usable open weights model, so trash

brave orbit Aug 7, 2025, 11:53 AM

#

Poll: What Is The Browser You Love The Most (top answer: 1 vote; total votes: 3)

ocean vortex Aug 7, 2025, 11:54 AM

#

pure anvil openai couldn't even release a half usable open weights model, so trash

I agree that the model seems trash mostly, but technically this says differently lmao

#general message

exotic tartan Aug 7, 2025, 11:59 AM

#

That's why I keep saying benchmarks suck.. feels disconnected from how they act in real life. Also show me the phone that runs OSS lol

keen fulcrum Aug 7, 2025, 12:05 PM

#

Can lmarena pay more attention to search arena improvements. This area is often neglected

autumn blaze Aug 7, 2025, 12:11 PM

#

Can anyone help me? I posted a prompt in the video arena more than 30 minutes ago and it still shows as generating. I generated 2 videos before it and one more after the stuck prompt, all fine with no errors, but this one is still taking a long time. [I asked for a 5-minute video.] Is that causing an error or the delay?

willow grail Aug 7, 2025, 12:17 PM

#

if gpt5 comes today, how can i access it via sub in europe?

#

do i need a non-eu cc, or just vpn with german cc?

#

or vpn only works if i use another one like paypal?

exotic tartan Aug 7, 2025, 12:21 PM

#

why do you expect usage issues in Europe?

willow grail Aug 7, 2025, 12:21 PM

#

exotic tartan why do you expect usage issues in Europe?

cause eu?

#

lol

wheat onyx Aug 7, 2025, 12:30 PM

#

Not confirmed

https://x.com/OpenInsightss/status/1953262083237396674?s=19

Open Insights (@OpenInsightss)

🚨 All the leaked ChatGPT-5 benchmark’s

ocean vortex Aug 7, 2025, 12:30 PM

#

willow grail if gpt5 comes today, how can i access it via sub in europe?

by going openai.com. Why would Europe be restricted?

willow grail Aug 7, 2025, 12:31 PM

#

ocean vortex by going openai.com. Why would Europe be restricted?

omg do u all live in usa behind a rock in a cave

#

eu has some ai regulations

ocean vortex Aug 7, 2025, 12:31 PM

#

I am from Europe

willow grail Aug 7, 2025, 12:31 PM

#

no idea how easy/hard it is to follow the eu rules

ocean vortex Aug 7, 2025, 12:32 PM

#

They don't have delays for model releases

#

only certain features and agents

willow grail Aug 7, 2025, 12:32 PM

#

sora was delayed a lot

#

i think o1/o3 too

ocean vortex Aug 7, 2025, 12:32 PM

#

Well I meant LLMs

ocean vortex Aug 7, 2025, 12:32 PM

#

willow grail i think o1/o3 too

It wasn't though?

willow grail Aug 7, 2025, 12:32 PM

#

these are llms too

ocean vortex Aug 7, 2025, 12:32 PM

#

I was using it on release

willow grail Aug 7, 2025, 12:33 PM

#

then u dont eu

ocean vortex Aug 7, 2025, 12:33 PM

#

I EU

#

I would know

#

lol

#

@willow grail have you ever used OpenAI playground?

#

Models there get avail as soon as in US, for the most part

willow grail Aug 7, 2025, 12:34 PM

#

ocean vortex @willow grail have you ever used OpenAI playground?

too much money

ocean vortex Aug 7, 2025, 12:34 PM

#

well... But it's the way to go if you want to test new models.

willow grail Aug 7, 2025, 12:34 PM

#

i hate testing stuff.. via api

#

gives me nightmares

ocean vortex Aug 7, 2025, 12:35 PM

#

willow grail i hate testing stuff.. via api

Playground and websites like OpenRouter are technically API as well. You don't need to know how to code to use them tbh

willow grail Aug 7, 2025, 12:35 PM

#

ocean vortex Playground and websites like OpenRouter are technically API as well. You don'...

i mean the money

ocean vortex Aug 7, 2025, 12:36 PM

#

willow grail i mean the money

If you want to simply test a model it is not gonna be expensive usually...

blissful vine Aug 7, 2025, 12:43 PM

#

how can we choose the model ?

ocean vortex Aug 7, 2025, 12:44 PM

#

blissful vine how can we choose the model ?

carefully

quartz light Aug 7, 2025, 12:45 PM

#

https://websim.com/@FilipTheFlop/gpt-5-release-countdown

GPT-5 Release Countdown

blissful vine Aug 7, 2025, 12:45 PM

#

meaning? I want to choose veo3 and runway ?

prime mulch Aug 7, 2025, 12:45 PM

#

Screenshot_2025-08-07-18-14-45-99_40deb401b9ffe8e1df2f1cc5ba480b12.jpg

quartz light Aug 7, 2025, 12:45 PM

#

is gemini 3 releasing today too?

prime mulch Aug 7, 2025, 12:46 PM

#

Screenshot_2025-08-07-18-16-14-39_40deb401b9ffe8e1df2f1cc5ba480b12.jpg

#

Again same issue 😭

quartz light Aug 7, 2025, 12:46 PM

#

prime mulch

what browser

prime mulch Aug 7, 2025, 12:46 PM

#

Chrome

quartz light Aug 7, 2025, 12:47 PM

#

you should use edge canary

prime mulch Aug 7, 2025, 12:47 PM

#

I used normally suddenly my session got deleted

ocean vortex Aug 7, 2025, 12:47 PM

#

blissful vine meaning? I want to choose veo3 and runway ?

I wasn't being serious lol. You can't choose models for video gen. You can't choose them for any battle, only direct chat but not for all

quartz light Aug 7, 2025, 12:47 PM

#

quartz light you should use edge canary

it has chrome extension support forked from kiwi browser

quartz light Aug 7, 2025, 12:47 PM

#

prime mulch Chrome

https://play.google.com/store/apps/details/Microsoft_Edge_Canary?id=com.microsoft.emmx.canary&hl=en_IE&pli=1

Microsoft Edge Canary – Apps on Google Play

Microsoft Edge Canary app

prime mulch Aug 7, 2025, 12:47 PM

#

Is it good?

quartz light Aug 7, 2025, 12:48 PM

#

yes

prime mulch Aug 7, 2025, 12:48 PM

#

I got this message suddenly

quartz light Aug 7, 2025, 12:48 PM

#

it has chrome extension support which is rare

prime mulch Aug 7, 2025, 12:48 PM

#

Can u try to access lm arena?

quartz light Aug 7, 2025, 12:48 PM

#

oops i meant https://play.google.com/store/apps/details?id=com.microsoft.emmx.dev

Microsoft Edge Dev - Apps on Google Play

Microsoft Edge Dev app

quartz light Aug 7, 2025, 12:49 PM

#

prime mulch Can u try to access lm arena?

@echo aurora

#

PC

#

thorium

prime mulch Aug 7, 2025, 12:49 PM

#

quartz light @echo aurora

So its same for all

quartz light Aug 7, 2025, 12:49 PM

#

prime mulch So its same for all

yep

prime mulch Aug 7, 2025, 12:49 PM

#

quartz light thorium

Is it a browser

quartz light Aug 7, 2025, 12:50 PM

#

prime mulch Is it a browser

yes, fastest

prime mulch Aug 7, 2025, 12:50 PM

#

quartz light yep

Are u into genai?

quartz light Aug 7, 2025, 12:50 PM

#

https://thorium.rocks

Thorium Browser

Chromium fork for Linux, Windows, MacOS, Android, and Raspberry Pi named after radioactive element No. 90.

#

https://alpha.lmarena.ai/ works :)

prime mulch Aug 7, 2025, 12:50 PM

#

quartz light Aug 7, 2025, 12:50 PM

#

quartz light https://alpha.lmarena.ai/ works :)

@prime mulch

prime mulch Aug 7, 2025, 12:50 PM

#

How is it?

quartz light Aug 7, 2025, 12:51 PM

#

good, would work as wallpaper for phones

prime mulch Aug 7, 2025, 12:51 PM

#

Yea i created this

devout vault Aug 7, 2025, 12:51 PM

#

i cant wait for gpt-5 bro it's releasing today

prime mulch Aug 7, 2025, 12:52 PM

#

And i have a little wall paper channel but that have no views i wait for growth

prime mulch Aug 7, 2025, 12:53 PM

#

quartz light good, would work as wallpaper for phones

#

What about this

#

This is my masterpiece

rocky mauve Aug 7, 2025, 12:54 PM

#

What’s the latest ai model available on lmarena?

prime mulch Aug 7, 2025, 12:54 PM

#

devout vault i cant wait for gpt-5 bro it's releasing today

That will change ai era

quartz light Aug 7, 2025, 12:54 PM

#

https://websim.com/@FilipTheFlop/gpt-5-release-countdown

GPT-5 Release Countdown

gusty helm Aug 7, 2025, 12:55 PM

#

isnt this just a countdown to livestream?

quartz light Aug 7, 2025, 12:55 PM

#

gusty helm isnt this just a countdown to livestream?

yes buuuut

gusty helm Aug 7, 2025, 12:55 PM

#

does not seem official in any means KEKW

quartz light Aug 7, 2025, 12:56 PM

#

gusty helm does not seem official in any means KEKW

it aint

#

my friend generated

devout vault Aug 7, 2025, 12:56 PM

#

gusty helm does not seem official in any means KEKW

quartz light Aug 7, 2025, 12:56 PM

#

😭

devout vault Aug 7, 2025, 12:56 PM

#

"LIVE 5 STREAM"

gusty helm Aug 7, 2025, 12:56 PM

#

I mean sure, the X post

#

but the site above seems misleading

quartz light Aug 7, 2025, 12:57 PM

#

devout vault Aug 7, 2025, 12:57 PM

#

gusty helm but the site above seems misleading

it doesn't

gusty helm Aug 7, 2025, 12:57 PM

#

ok

warm fulcrum Aug 7, 2025, 12:58 PM

#

quartz light

cat?

quartz light Aug 7, 2025, 12:58 PM

#

warm fulcrum cat?

lol yes

warm fulcrum Aug 7, 2025, 12:58 PM

#

wowwwww

quartz light Aug 7, 2025, 12:58 PM

#

https://chromecat.app/

Chrome Cat

I'm a cat that likes to hang out in Chrome (^..^)ﾉ

bold tiger Aug 7, 2025, 12:58 PM

#

hi

willow grail Aug 7, 2025, 1:02 PM

#

ocean vortex If you want to simply test a model it is not gonna be expensive usually...

so i stand right. there wont be gpt5 for europeans.

#

api is too expensive.

#

i am right, you lose.

quartz light Aug 7, 2025, 1:06 PM

#

willow grail so i stand right. there wont be gpt5 for europeans.

as a european i found a site with free opus 4.1 within minutes of its release

willow grail Aug 7, 2025, 1:06 PM

#

not interested into boring stuff

quartz light Aug 7, 2025, 1:06 PM

#

https://tenor.com/view/speech-bubble-chicken-gif-26197609

Tenor

willow grail Aug 7, 2025, 1:06 PM

#

quartz light https://tenor.com/view/speech-bubble-chicken-gif-26197609

such sites always have side effects

#

QOL issues.

torn mantle Aug 7, 2025, 1:06 PM

#

1H LEFT

#

FOR GPT5 STREAM

#

😄

willow grail Aug 7, 2025, 1:06 PM

#

torn mantle 1H LEFT

girl?

rocky mauve Aug 7, 2025, 1:06 PM

#

https://tenor.com/view/clown-clowntoclown-conversation-telepathy-clowns-gif-25167584

Tenor

willow grail Aug 7, 2025, 1:07 PM

#

u ok girl?

quartz light Aug 7, 2025, 1:07 PM

#

torn mantle 1H LEFT

1h?

willow grail Aug 7, 2025, 1:07 PM

#

4 hours..

#

girl

#

not 1hour

quartz light Aug 7, 2025, 1:07 PM

#

yeah

willow grail Aug 7, 2025, 1:07 PM

#

pls visit doctor

quartz light Aug 7, 2025, 1:07 PM

#

lol

torn mantle Aug 7, 2025, 1:07 PM

#

ah

#

sorry

willow grail Aug 7, 2025, 1:07 PM

#

@torn mantle pls visit doctor

#

sory wont send doctor

quartz light Aug 7, 2025, 1:07 PM

#

torn mantle sorry

wait where does it say 1 hr

willow grail Aug 7, 2025, 1:07 PM

#

to u

#

cognitive dissonance asura girl

quartz light Aug 7, 2025, 1:07 PM

#

willow grail cognitive dissonance asura girl

https://tenor.com/view/how-bro-felt-after-writing-that-how-bro-felt-alpha-wolf-alpha-alpha-meme-gif-307456636039877895

Tenor

willow grail Aug 7, 2025, 1:07 PM

#

no i dont.

#

i feel trash

quartz light Aug 7, 2025, 1:08 PM

#

https://cdn.discordapp.com/attachments/1190308681540182038/1332826761460842588/otag_hands.gif

willow grail Aug 7, 2025, 1:08 PM

#

also asura is a bad server from THE ISLE

#

they do KOS kill on sight all the time

#

so annoying

ocean vortex Aug 7, 2025, 1:17 PM

#

willow grail api is too expensive.

It's not expensive. Are you like from Belarus or smth? That's not EU

quartz light Aug 7, 2025, 1:21 PM

#

Qwen image gen is pretty good

stray aspen Aug 7, 2025, 1:21 PM

#

Gpt 5 is almost out

willow grail Aug 7, 2025, 1:26 PM

#

ocean vortex It's not expensive. Are you like from Belarus or smth? That's not EU

uhm o3 api is... XD

#

or gemini 2.5 pro

#

or opus 4

ocean vortex Aug 7, 2025, 1:27 PM

#

willow grail uhm o3 api is... XD

that's not expensive... I can agree that o1-pro was expensive, but not current price of o3, that's cheap lol

white hatch Aug 7, 2025, 1:27 PM

#

hell yeaaaaaaaaaah

willow grail Aug 7, 2025, 1:27 PM

#

no

#

yeaaaaaaaaaaah

ocean vortex Aug 7, 2025, 1:28 PM

#

stray aspen Gpt 5 is almost out

is it a girl?

willow grail Aug 7, 2025, 1:28 PM

#

ITS A BODYBULDER

stray aspen Aug 7, 2025, 1:31 PM

#

No

#

@willow grail how expensive will gpt 5 be

willow grail Aug 7, 2025, 1:31 PM

#

very

keen beacon Aug 7, 2025, 1:31 PM

#

stray aspen @willow grail how expensive will gpt 5 be

Yes.

willow grail Aug 7, 2025, 1:31 PM

#

via api

keen beacon Aug 7, 2025, 1:32 PM

#

Then Altman says "This is the way to feel the AGI"

prime mulch Aug 7, 2025, 1:32 PM

#

Gpt 5 will change the world views about ai

stray aspen Aug 7, 2025, 1:32 PM

#

prime mulch Gpt 5 will change the world views about ai

The only thing it will change is my wallet

keen beacon Aug 7, 2025, 1:32 PM

#

prime mulch Gpt 5 will change the world views about ai

The hallucination levels need to get solved before we can develop any AGI

quartz light Aug 7, 2025, 1:32 PM

#

stray aspen @willow grail how expensive will gpt 5 be

y e s.

prime mulch Aug 7, 2025, 1:33 PM

#

All HAIL AGI

quartz light Aug 7, 2025, 1:33 PM

#

gggguys just make agi by making it self train on data from internet!!!!!!!....1!!!

prime mulch Aug 7, 2025, 1:33 PM

#

People don't realise how powerful AI will get with AGI

willow grail Aug 7, 2025, 1:34 PM

#

ill just play rail route. wait for gpt5. be disappointed that it still cant make video games.
and continue playing RAIL ROUTE

stray aspen Aug 7, 2025, 1:34 PM

#

Do you think gpt 5 will be AGI

quartz light Aug 7, 2025, 1:34 PM

#

no

prime mulch Aug 7, 2025, 1:34 PM

#

Nah

keen beacon Aug 7, 2025, 1:34 PM

#

prime mulch People don't realise how powerful AI will get with AGI

I don't know if that will happen with the LLM architecture though when there are many other ones that are being developed that are far more efficient

keen beacon Aug 7, 2025, 1:34 PM

#

stray aspen Do you think gpt 5 will be AGI

no

prime mulch Aug 7, 2025, 1:34 PM

#

It will be one of the powerful llm not agi

stray aspen Aug 7, 2025, 1:34 PM

#

willow grail ill just play rail route. wait for gpt5. be disappointed that it still cant make...

Let's not overhype i hope it doesn't end up like gpt oss which is absolute garbage

willow grail Aug 7, 2025, 1:35 PM

#

stray aspen Let's not overhype i hope it doesn't end up like gpt oss which is absolute garba...

reread this. there is no hype there

#

RE READ THIS

quartz light Aug 7, 2025, 1:35 PM

#

prime mulch It will be one of the powerful llm not agi

i wonder if they'll have a reasoning mode for it right away

stray aspen Aug 7, 2025, 1:35 PM

#

Hell yes

prime mulch Aug 7, 2025, 1:36 PM

#

quartz light i wonder if they'll have a reasoning mode for it right away

Yea it have possibility

trail wagon Aug 7, 2025, 1:36 PM

#

Prompt: generate video with both text about Russia. Duration 8 second

keen beacon Aug 7, 2025, 1:36 PM

#

quartz light i wonder if they'll have a reasoning mode for it right away

I think sam said it will be a hybrid model which can work with reasoning and without it (GPT5)

quartz light Aug 7, 2025, 1:37 PM

#

trail wagon Prompt: generate video with both text about Russia. Duration 8 second

wrong channel

quartz light Aug 7, 2025, 1:37 PM

#

keen beacon I think sam said it will be a hybrid model which can work with reasoning and wit...

they probably already have it ready but wont release it right away so they can spread out the hype

#

or something idk

#

keen beacon Aug 7, 2025, 1:38 PM

#

quartz light they probably already have it ready but wont release it right away so they can s...

Let's just hope it is a good improvement over gpt4o

quartz light Aug 7, 2025, 1:38 PM

#

keen beacon Let's just hope it is a good improvement over gpt4o

it is

#

100%

#

have you seen the leak

keen beacon Aug 7, 2025, 1:38 PM

#

quartz light it is

Yeah but I'd hate it to be as sycophantic

warm fulcrum Aug 7, 2025, 1:38 PM

#

theres been like 20 leaks already

keen beacon Aug 7, 2025, 1:38 PM

#

warm fulcrum theres been like 20 leaks already

I have seen some

quartz light Aug 7, 2025, 1:38 PM

#

keen beacon Aug 7, 2025, 1:38 PM

#

I like to keep it as a surprise for myself

warm fulcrum Aug 7, 2025, 1:39 PM

#

cat?

quartz light Aug 7, 2025, 1:39 PM

#

warm fulcrum cat?

yes yes

warm fulcrum Aug 7, 2025, 1:39 PM

#

quartz light yes yes

what if it needs to poop

#

does it just go out of frame

willow grail Aug 7, 2025, 1:40 PM

#

quartz light

URL NOW

quartz light Aug 7, 2025, 1:40 PM

#

willow grail Aug 7, 2025, 1:40 PM

#

quartz light

U

#

R

#

L

quartz light Aug 7, 2025, 1:40 PM

#

willow grail URL NOW

https://www.rxddit.com/r/OpenAI/comments/1mettre/comment/n6cekdr

rxddit.com

GPT-5 is already (ostensibly) available via API

u/segin on r/OpenAI

🖼️ Gallery: 2 Images

Comment by u/testmath:
I did "Generate an SVG of a pelican riding a bicycle" and this is what it did, seems like the real deal to me:

https://preview.redd.it/int18mghqegf1.png?width=2560&format=png&auto=webp&s=7f18ea681937c8e09ef48e8006ea5e436a77343e

---- Original Post ----

Using the model `gpt-5-bench-chatcompletio...

willow grail Aug 7, 2025, 1:40 PM

#

good boy pat pat

quartz light Aug 7, 2025, 1:40 PM

#

😭

willow grail Aug 7, 2025, 1:41 PM

#

i am literally complimenting you?!

quartz light Aug 7, 2025, 1:41 PM

#

😡😡😡😡11!!!!!11!11!

keen beacon Aug 7, 2025, 1:41 PM

#

willow grail good boy *pat pat*

https://tenor.com/view/surprised-shocked-funny-memes-gif-2651717394134726385

Tenor

stray aspen Aug 7, 2025, 1:41 PM

#

@willow grail are you Belarusian

willow grail Aug 7, 2025, 1:41 PM

#

stray aspen @willow grail are you Belarusian

hell no

quartz light Aug 7, 2025, 1:42 PM

#

willow grail hell no

railroutian

willow grail Aug 7, 2025, 1:42 PM

#

i am croatian

quartz light Aug 7, 2025, 1:42 PM

#

crowatia

mortal quest Aug 7, 2025, 1:42 PM

#

GM

willow grail Aug 7, 2025, 1:42 PM

#

#

the feel when you realize how much more unlocked stations ther is on the map https://i.imgur.com/WiHsvMO.jpeg

Imgur

novel flame Aug 7, 2025, 1:43 PM

#

wheat onyx Not confirmed https://x.com/OpenInsightss/status/1953262083237396674?s=19

If GPT-5 actually scores >65 on ARC-AGI 2, that would be significant: https://pbs.twimg.com/media/GxthgsDXgAAoW6F?format=jpg&name=large

willow grail Aug 7, 2025, 1:44 PM

#

oh u think its only 65% or so? ...... i hope its at human baseline

keen beacon Aug 7, 2025, 1:44 PM

#

willow grail oh u think its only 65% or so? ...... i hope its at human baseline

only 65% ?

willow grail Aug 7, 2025, 1:44 PM

#

i am soon dead. i want immortality tech now. i have no time

#

i am 32

#

i dont have time for 1% per year

keen beacon Aug 7, 2025, 1:45 PM

#

willow grail i dont have time for 1% per year

Competition will make this speed up, esp with China coming out with real good models

#

and research

warm fulcrum Aug 7, 2025, 1:46 PM

#

wow

#

ngl we need gpt 6 now

willow grail Aug 7, 2025, 1:46 PM

#

china so far only delivers vomit . from their bad robots who they send to various events and act like its autonomous but its rather just a simple animation baked it to their bad text models nobody uses

warm fulcrum Aug 7, 2025, 1:46 PM

#

willow grail china so far only delivers vomit . from their bad robots who they send to variou...

???????

#

china ontop

willow grail Aug 7, 2025, 1:46 PM

#

top of what?

warm fulcrum Aug 7, 2025, 1:46 PM

#

everything

novel flame Aug 7, 2025, 1:46 PM

#

warm fulcrum china ontop

Please don't feed the trolls.

willow grail Aug 7, 2025, 1:46 PM

#

how many people die per 1000 residents because china quality is trash?

#

buildings are made of paper, trains crash daily, bridges crashes, fake robots just doing baked in animation

keen beacon Aug 7, 2025, 1:47 PM

#

willow grail how many people die per 1000 residents because china quality is trash?

Be mad, troll

eternal niche Aug 7, 2025, 1:47 PM

#

willow grail buildings are made of paper, trains crash daily, bridges crashes, fake robots ju...

iphone made in china

willow grail Aug 7, 2025, 1:47 PM

#

what is wrong about what i said?

willow grail Aug 7, 2025, 1:47 PM

#

eternal niche iphone made in china

difference is, the company isnt china based

#

china based companies and ceos will create bad products

#

if i am a china man i wont care about investing into the product, it can brake down next day. like a big 100 floor building

#

if i am any other country man i will care about quality

lime coral Aug 7, 2025, 1:48 PM

#

They ship better models than gpt-oss at least

willow grail Aug 7, 2025, 1:49 PM

#

uhm its a fact that in china much more things brake down than in usa or europe

#

and people here telling me china is the chad

#

🐒

warm fulcrum Aug 7, 2025, 1:50 PM

#

why so rude against china town

willow grail Aug 7, 2025, 1:50 PM

#

warm fulcrum why so rude against china town

why u china propaganda?

warm fulcrum Aug 7, 2025, 1:50 PM

#

willow grail why u china propaganda?

no propaganda china ontop

#

china #1 country

#

china make new technology everyday

willow grail Aug 7, 2025, 1:51 PM

#

nono india number one

warm fulcrum Aug 7, 2025, 1:51 PM

#

india bad

willow grail Aug 7, 2025, 1:51 PM

#

racist

echo aurora Aug 7, 2025, 1:51 PM

#

Lets keep conversation focussed on AI and respectful please.

eternal niche Aug 7, 2025, 1:52 PM

#

willow grail nono india number one

lol

willow grail Aug 7, 2025, 1:55 PM

#

2 hours

warm fulcrum Aug 7, 2025, 1:56 PM

#

willow grail 2 hours

3 hours???????

willow grail Aug 7, 2025, 1:56 PM

#

INDIA

stray aspen Aug 7, 2025, 1:57 PM

#

gpt 5 is out in 3 hours

keen beacon Aug 7, 2025, 1:59 PM

#

stray aspen gpt 5 is out in 3 hours

SAMA said that the livestream will be longer than usual at 1 hour long

#

yay

#

quite exciting

willow grail Aug 7, 2025, 2:00 PM

#

yeah cause agentic

stray aspen Aug 7, 2025, 2:00 PM

#

@willow grail are you croatian

willow grail Aug 7, 2025, 2:00 PM

#

yes

prime mulch Aug 7, 2025, 2:10 PM

#

I think china will release better version of gpt 5 in opensource after some months

cedar tide Aug 7, 2025, 2:10 PM

#

the request that is useless at all
https://discord.com/channels/1340554757349179412/1403017053031628913

stray aspen Aug 7, 2025, 2:15 PM

#

are you serious david

ocean vortex Aug 7, 2025, 2:25 PM

#

I think it's unlikely China will manufacture its own chips able to compete with current best anytime soon tbh

stray aspen Aug 7, 2025, 2:26 PM

#

will we have gpt-5 pro high in the arena

echo aurora Aug 7, 2025, 2:26 PM

#

Reminder that we are doing a little watch party for the livestream: https://discord.com/events/1340554757349179412/1402720955192705176

solid brook Aug 7, 2025, 2:26 PM

#

excellent

ocean vortex Aug 7, 2025, 2:27 PM

#

Not really possible unless they start inventing things themselves

#

also corruption...

#

You can't beat competition by constantly trying to replicate whatever they are doing being 2 steps behind

torn mantle Aug 7, 2025, 2:29 PM

#

i agree

#

with that

#

craig

#

first time

#

being right

#

hope not