#general | Arena | Page 79

torn mantle Jul 31, 2025, 11:01 AM

#

simpleQA and GPQA

rare python Jul 31, 2025, 11:01 AM

#

Seems specialized

torn mantle Jul 31, 2025, 11:01 AM

#

so it should have some decent world knowledge

cedar tide Jul 31, 2025, 11:02 AM

#

96% on simple qa without search tool its impossible

#

fake

civic flame Jul 31, 2025, 11:03 AM

#

bro calls everything fake

torn mantle Jul 31, 2025, 11:07 AM

#

cedar tide 96% on simple qa without search tool its impossible

i dont think its fake

deft vigil Jul 31, 2025, 11:08 AM

#

did everything will be eaten by gpt 5

cedar tide Jul 31, 2025, 11:09 AM

#

civic flame bro calls everything fake

They are unknown and they must not have a lot of money. Do you believe that they created an LLM from A to Z that scores 96 on SimpleQA without access to a search tool?

reef bridge Jul 31, 2025, 11:10 AM

#

is there MCP feature on LMarena? it would be cool to test out models of how good they are with MCP

keen beacon Jul 31, 2025, 11:11 AM

#

reef bridge is there MCP feature on LMarena? it would be cool to test out models of how good...

You could put that in #1372230675914031105

civic flame Jul 31, 2025, 11:13 AM

#

cedar tide They are unknown and they must not have a lot of money. Do you believe that they...

they're backed by sequoia and they published their model's full proofs for the IMO

#

i trust them 🤷‍♂️ suit yourself

#

just because they're not a big lab that doesn't mean they can't make big advancements - they've been doing some cool math-related stuff in the field for a year

cedar tide Jul 31, 2025, 11:18 AM

#

@civic flame I'm not saying they never got this score, but that they forgot to specify that this score is with a search tool

#

@civic flame that it is strong on GPQA is possible, maybe they found a new reasoning technique, but SimpleQA is just knowledge, so to get 96 they would need a model 10 times larger than our SOTA, and it is impossible that they did that (or they trained it specifically on SimpleQA, but that would be stupid)

#

even perplexity deep research has only 94

civic flame Jul 31, 2025, 11:24 AM

#

nvm this isn't from harmonic this is autopoiesis

#

https://autopoiesis.science/blog/92-4-gpqa-diamond give this a read

92.4% GPQA Diamond - Autopoesis Sciences

Autopoiesis Sciences. Research breakthrough in model reasoning and new funding led by Informed Ventures.

torn mantle Jul 31, 2025, 11:24 AM

#

civic flame https://autopoiesis.science/blog/92-4-gpqa-diamond give this a read

wth

#

im confused

#

is it the same model or nah

#

Aristotle X1 Verify pass@1 benchmark results.

civic flame Jul 31, 2025, 11:25 AM

#

they're different lol

#

that's what had me confused

#

they both have models called aristotle

torn mantle Jul 31, 2025, 11:25 AM

#

what did harmonic name their model

#

bruh

civic flame Jul 31, 2025, 11:26 AM

#

aristotle

#

lol

torn mantle Jul 31, 2025, 11:26 AM

#

yea im not trusting that either

#

i thought its from harmonic

cedar tide Jul 31, 2025, 11:26 AM

#

cedar tide @civic flame that he is strong in gpa it is possible that he found a ne...

gpt 4.5 which is the largest llm to date has only 62

civic flame Jul 31, 2025, 11:26 AM

#

this is their only post 💀 😭

cedar tide Jul 31, 2025, 11:27 AM

#

Ah now you change your mind

civic flame Jul 31, 2025, 11:27 AM

#

again, i thought it was about the harmonic model

cedar tide Jul 31, 2025, 11:27 AM

#

okk

woven thicket Jul 31, 2025, 11:27 AM

#

Hey can some one teach me how to use it

#

Basically I joined today

#

Hey can anyone listen me

#

Tell me how to use it

cedar tide Jul 31, 2025, 11:29 AM

#

7 employee

astral kayak Jul 31, 2025, 11:31 AM

#

I call bs

#

or it's really bad at other stuff

pure anvil Jul 31, 2025, 11:35 AM

#

They're not claiming that tools aren't being used though

torn mantle Jul 31, 2025, 11:35 AM

#

civic flame this is their only post 💀 😭

yea they are looking for funds

#

small business strategy

cedar tide Jul 31, 2025, 11:44 AM

#

https://fxtwitter.com/ArtificialAnlys/status/1950884246803136601

Artificial Analysis (@ArtificialAnlys)

🇰🇷 LG recently launched EXAONE 4.0 32B - it scores 62 on Artificial Analysis Intelligence Index, the highest score for a 32B model yet

@LG_AI_Research's EXAONE 4.0 is released in two variants: the 32B hybrid reasoning model we’re reporting benchmarking results for here, and a smaller 1.2B model designed for on-device applications that we have not benchmarked yet.

Alongside Upstage's recent Solar Pro 2 release, it's exciting to see Korean AI labs join the US and China near the top of the intelligence charts.

Key results:
➤ 🧠 EXAONE 4.0 32B (Reasoning): In reasoning mode, EXAONE 4.0 scores 62 on the Artificial Analysis Intelligence Index. This matches Claude 4 Opus and the new Llama Nemotron Super 49B v1.5 from NVIDIA, and sits only 1 point behind Gemini 2.5 Flash

➤ ⚡ EXAONE 4.0 32B (Non-Reasoning): In non-reasoning mode, EXAONE 4.0 scores 51 on the Artificial Analysis Intelligence Index.…

#

if you want to upvote https://discord.com/channels/1340554757349179412/1394703782255788122

deft vigil Jul 31, 2025, 11:53 AM

#

just got a potato lol is it open ai right

cedar tide Jul 31, 2025, 11:57 AM

#

go upvote kolors https://discord.com/channels/1340554757349179412/1386317762128773130

tame palm Jul 31, 2025, 12:09 PM

#

#video-arena-1 mars

shy dustBOT Jul 31, 2025, 12:13 PM

#

Server Information [ LMArena ]

Name: LMArena
ID: 1340554757349179412
Description:

LMArena is an open platform where everyone can easily access, explore and interact with the world's leading AI models. Community shaped leaderboards help progress AI in a more transparent and grounded in real-world user way. Come join our community to explore and shape the frontier of AI.
Owner: @wooden mulch
Features:
Creation: <t:1739683560:R>
Channels: 286
Text: 28
VC: 3
Members: 6779
Roles: 26
Managed: 4

deft vigil Jul 31, 2025, 12:14 PM

#

dino never finish generating is it that slow ? or simply not functioning now

keen beacon Jul 31, 2025, 12:16 PM

#

deft vigil just got a potato lol is it open ai right

On battle mode?

How's it?

hardy pecan Jul 31, 2025, 12:17 PM

#

Simple Bench - Horizon Alpha: 3 / 20

#

Beautiful

tame palm Jul 31, 2025, 12:19 PM

#

car

deft vigil Jul 31, 2025, 12:22 PM

#

keen beacon On battle mode? How's it?

so so not suprise me much

#

now nightride-on

keen beacon Jul 31, 2025, 12:24 PM

#

deft vigil so so not suprise me much

Any zenith level model?

Potato/Dino, any of two?

deft vigil Jul 31, 2025, 12:24 PM

#

never tried zenith is it still exist ?

keen beacon Jul 31, 2025, 12:25 PM

#

deft vigil never tried zenith is it still exist ?

Nope.

outer zinc Jul 31, 2025, 12:36 PM

#

is there a video generation leaderboard yet?

civic flame Jul 31, 2025, 12:40 PM

#

#

👀

blazing bison Jul 31, 2025, 12:41 PM

#

Everything is ready for next week

unborn ocean Jul 31, 2025, 12:45 PM

#

cedar tide 7 employee

the cbo literally has a phd in how to pitch a start up (not joking!)

#

just a bunch of PR

#

for higher evaluation

harsh sonnet Jul 31, 2025, 12:57 PM

#

for how much time we get this video generation for free

torn mantle Jul 31, 2025, 1:25 PM

#

civic flame

its coming

safe falcon Jul 31, 2025, 1:39 PM

#

What is the SoTA research models available?

#

I'm trying to find an affordable solution for deep research that doesn't hallucinate much

#

For example perplexity hallucinates and tends to believe report mills

#

Also, question, does LmSYS do categorization on AI tasks through leaderboards

echo aurora Jul 31, 2025, 1:42 PM

#

outer zinc is there a video generation leaderboard yet?

No, not yet.

cedar tide Jul 31, 2025, 1:55 PM

#

https://fixupx.com/StepFun_ai/status/1950912271565385770?t=cAAFs0-kVOJPZVZF_n0AiA&s=19

StepFun (@StepFun_ai)

🚀 Announcing Step 3: Our latest open-source multimodal reasoning model is here! Get ready for a stronger, faster, & more cost-effective VLM！
🔵 321B parameters (38B active), optimized for top-tier performance & cost-effective decoding.
🔵 Revolutionary Multi-Matrix Factorization Attention (MFA) and Attention-FFN Disaggregation (AFD) enable efficient inference—even on modest GPUs.
🔵 Trained on 20T+ tokens (incl. 4T multimodal), with meticulous data curation ensuring reduced hallucinations & robust reasoning across vision and language.
🚄 Unmatched speed: Up to 4,039 tokens/sec/GPU—70% faster than DeepSeek-V3 under similar conditions.
💎 Step 3 sets a new Pareto frontier—bridging power, efficiency, and practicality.
👉 Start building with Step 3 today: huggingface.co/stepfun-ai/step3
👉More details on our research blog：
www.stepfun.com/research/zh/step3

💬 8 🔁 19 ❤️ 86…

#

#

https://stepfun.ai/research/en/step3

StepFun

StepFun AI is your smart and reliable personal assistant, here to help you acquire knowledge, find information, learn languages, unleash creativity in writing, and even write code. Whether you’re working, studying, or just navigating everyday life, it’s designed to solve your problems and help you discover and understand the world around you.

#

Here to Upvote
https://discord.com/channels/1340554757349179412/1398572464719659028

stray aspen Jul 31, 2025, 2:19 PM

#

we need new SOTA models

cedar tide Jul 31, 2025, 2:20 PM

#

https://fixupx.com/cohere/status/1950920611267502382?t=VfI4f5NhBtnyp0QIi5dyTw&s=19

cohere (@cohere)

Introducing Command A Vision, a state-of-the-art generative model that excels across multimodal image capabilities that matter for enterprises!

💬 1 🔁 22 ❤️ 45 👁️ 3.3K

#

https://fixupx.com/Alibaba_Qwen/status/1950925444057792808?t=OkW55X5BCTfP8CjpvwBR9A&s=19

Qwen (@Alibaba_Qwen)

🦥 Qwen3-Coder-Flash: Qwen3-Coder-30B-A3B-Instruct
💚 Just lightning-fast, accurate code generation.
✅ Native 256K context (supports up to 1M tokens with YaRN)
✅ Optimized for platforms like Qwen Code, Cline, Roo Code, Kilo Code, etc.
✅ Seamless function calling & agent workflows

💬 Chat: chat.qwen.ai
🤗 Hugging Face: hf.co/Qwen/Qwen3-Coder-30B-A3B-Instruct
🤖 ModelScope: modelscope.cn/models/Qwen/Qwen3-Coder-30B-A3B-Instruct
🔧 Qwen Code: github.com/QwenLM/qwen-code

💬 6 🔁 5 ❤️ 25 👁️ 259

thorny sleet Jul 31, 2025, 2:37 PM

#

hey can anyone help me

with the processs to generate videos here?

echo aurora Jul 31, 2025, 2:42 PM

#

thorny sleet hey can anyone help me with the processs to generate videos here?

Yeah check out #1397655624103493813 for information on how to use Video Arena!

earnest rover Jul 31, 2025, 2:47 PM

#

when lmarena video gen available on lmarena.ai base site ?

echo aurora Jul 31, 2025, 2:56 PM

#

earnest rover when lmarena video gen available on lmarena.ai base site ?

It’s possible! Be sure to share that you’d like this in #bot-feedback

earnest rover Jul 31, 2025, 2:59 PM

#

echo aurora It’s possible! Be sure to share that you’d like this in #bot-feedback

okay. thanks to you (if you are the owner of lmarena or anyone from lmarena).
but i am actually curious when lmarena.ai will have video gen. will it have the rate limits.

echo aurora Jul 31, 2025, 3:01 PM

#

earnest rover okay. thanks to you (if you are the owner of lmarena or anyone from lmarena). b...

That’s very much TBD. We’re treating this like an experiment and we’re looking to hear from the community before decisions like this are made.

delicate atlas Jul 31, 2025, 3:02 PM

#

Can you not attach images to claude sonnet and opus models anymore?

supple bluff Jul 31, 2025, 3:03 PM

#

hello

keen beacon Jul 31, 2025, 3:05 PM

#

Zenith still in arena

earnest rover Jul 31, 2025, 3:07 PM

#

keen beacon Zenith still in arena

what is zenith

blazing bison Jul 31, 2025, 3:11 PM

#

keen beacon Zenith still in arena

Its removed

burnt halo Jul 31, 2025, 3:42 PM

#

hello, I saw the vid on thread lol

primal orbit Jul 31, 2025, 3:59 PM

#

I've got a model called "cuttlefish"

ebon patio Jul 31, 2025, 4:14 PM

#

Hi

misty star Jul 31, 2025, 4:14 PM

#

https://fixupx.com/lmarena_ai/status/1950952994557878578?s=46

lmarena.ai (@lmarena_ai)

🧑‍🔬 Research Update: Today, we are releasing a new dataset with over 140k conversations from the text arena collected between April 17th and July 25th 2025. See thread to dig into it!

We're pairing the data release with a deep dive into how model performance and evaluation dynamics have evolved over time. Let’s look at real-world trends, new features, and fresh prompts.

What’s covered in the latest analysis:
- Overview of the released dataset
- Language & topic breakdowns
- Rating changes: How Arena scores shift over time

And more! 🧵

💬 1 🔁 1 ❤️ 5 👁️ 66

#

🤑

#

https://huggingface.co/datasets/lmarena-ai/arena-human-preference-140k

lmarena-ai/arena-human-preference-140k · Datasets at Hugging Face

golden ocean Jul 31, 2025, 4:19 PM

#

rip my ai girlfriend conversations

tribal aspen Jul 31, 2025, 4:19 PM

#

#announcements the new one sounds more like the AI Companies will be more interested in knowing

cursive spoke Jul 31, 2025, 4:20 PM

#

What models are there in video arena

tribal aspen Jul 31, 2025, 4:20 PM

#

golden ocean rip my ai girlfriend conversations

they cant reveal the ip adress that sent the messages to the model

#

and it would take decades to learn them

misty star Jul 31, 2025, 4:21 PM

#

golden ocean Jul 31, 2025, 4:21 PM

#

true

golden ocean Jul 31, 2025, 4:21 PM

#

misty star

LMAO

whole sundial Jul 31, 2025, 4:24 PM

#

civic flame Jul 31, 2025, 4:29 PM

#

great start

whole sundial Jul 31, 2025, 4:31 PM

#

lol an entire category section for a single model

misty star Jul 31, 2025, 4:32 PM

#

whole sundial Jul 31, 2025, 4:32 PM

#

tall summit Jul 31, 2025, 4:41 PM

#

nothing in the article makes it seem that there are conversations in the statistics and dataset from direct chat

#

i don't even know if there are or not

tall summit Jul 31, 2025, 4:42 PM

#

whole sundial

excuse me what

stray aspen Jul 31, 2025, 4:43 PM

#

they leaked the group chat

tall summit Jul 31, 2025, 4:43 PM

#

whole sundial

these are so interesting

tall summit Jul 31, 2025, 4:45 PM

#

tall summit i don't even know if there are or not

it looks as if there isn't which i find surprising

golden ocean Jul 31, 2025, 4:45 PM

#

I cant even find my chats wtf

#

oh

#

theres no direct chats right?

keen beacon Jul 31, 2025, 4:54 PM

#

golden ocean rip my ai girlfriend conversations

Lol why would you write those if you knew they would be released in a database?

#

lmao

tall summit Jul 31, 2025, 4:57 PM

#

keen beacon Lol why would you write those if you knew they would be released in a database?

hopefully it's all anonymous

#

as in hopefully he didn't mention private information

keen beacon Jul 31, 2025, 4:58 PM

#

tall summit as in hopefully he didn't mention private information

It should be stated in bold text that the inputs and outputs will be used directly for research

#

if people get confused

golden ocean Jul 31, 2025, 4:58 PM

#

keen beacon Lol why would you write those if you knew they would be released in a database?

the voices convinced me

stray aspen Jul 31, 2025, 4:59 PM

#

bro who leaked the group cha

keen beacon Jul 31, 2025, 4:59 PM

#

golden ocean the voices convinced me

https://tenor.com/view/carzzy-the-voices-are-back-lolesports-voices-gif-5331840045818547848

Tenor

golden ocean Jul 31, 2025, 4:59 PM

#

sydney is in my head

fleet lintel Jul 31, 2025, 5:02 PM

#

stray aspen bro who leaked the group cha

what leak? anything interesting?

stray aspen Jul 31, 2025, 5:02 PM

#

yes

#

they leaked conversations

keen beacon Jul 31, 2025, 5:02 PM

#

stray aspen they leaked conversations

Ofc since it's for research

tall summit Jul 31, 2025, 5:02 PM

#

"leaked" xd

keen beacon Jul 31, 2025, 5:02 PM

#

it's not a private service

#

:/

tall summit Jul 31, 2025, 5:03 PM

#

keen beacon It should be stated in bold text that the inputs and outputs will be used direct...

it does

brave orbit Jul 31, 2025, 5:03 PM

#

echo aurora It’s possible! Be sure to share that you’d like this in #bot-feedback

bro, admins, the channel for saying hello is general, not the other channels. Say it in the general channel. Pls make a rule for it, bro, most channels are just flooded

keen beacon Jul 31, 2025, 5:03 PM

#

tall summit it does

Ah, well still people seem to forget

brave orbit Jul 31, 2025, 5:04 PM

#

bro pls stop saying hi and hello in other channels, everyone says hi in channels other than general

#

mods ban that guy

#

bro you username

#

give me more info since ok

#

yeah i can not help you without info

#

just what you need

#

for it to do

#

say it or less i can not help

#

bro you had a swear word in you username bro you re

#

how can you ever join

fresh latch Jul 31, 2025, 5:16 PM

#

Hey! Anyone know how I can try the Horizon Alpha Model? Supposedly there was a way to do it in LMArena.

whole sundial Jul 31, 2025, 5:16 PM

#

openrouter chat

#

https://openrouter.ai/chat

fresh latch Jul 31, 2025, 5:18 PM

#

Thanks!

torn mantle Jul 31, 2025, 5:28 PM

#

misty star

pffft

humble oyster Jul 31, 2025, 5:35 PM

#

hey, I just wanted to know: there is no limit on image generations, but video generation has limits. Is there any possibility that video generation will also become unlimited?

jade egret Jul 31, 2025, 5:38 PM

#

when gpt 5 ):

echo aurora Jul 31, 2025, 5:38 PM

#

humble oyster hey i just wanted to know there no limit on image generations, but in video gene...

It's possible but unlikely. Be sure to share feedback/requests you have in #bot-feedback

prime mulch Jul 31, 2025, 5:39 PM

#

Bro is video generator have limit?

humble oyster Jul 31, 2025, 5:40 PM

#

prime mulch Bro is video generator have limit?

yes bro you can generate 8 videos a day.

prime mulch Jul 31, 2025, 5:40 PM

#

😭

#

I can understand, but even though the video is generated for 8 seconds, the actual scene is only 4 seconds

torn mantle Jul 31, 2025, 5:44 PM

#

humble oyster yes bro you can generate 8 videos a day.

thanks bro

torn mantle Jul 31, 2025, 5:44 PM

#

prime mulch Bro is video generator have limit?

bro how are you doing?

torn mantle Jul 31, 2025, 5:44 PM

#

prime mulch I can understand but even the video is generated for 8 second actual scene is on...

no way bro

prime mulch Jul 31, 2025, 5:45 PM

#

torn mantle bro how are you doing?

Im doing well what about you

jade egret Jul 31, 2025, 5:49 PM

#

what the most fun model to talk with

keen beacon Jul 31, 2025, 5:51 PM

#

jade egret what the most fun model to talk with

Donno, I feel most comfortable with claude models. Doesn't feel too positive and goes straight to the point.

dawn wharf Jul 31, 2025, 5:51 PM

#

jade egret what the most fun model to talk with

amazon folsom

#

directchat3d

marsh stratus Jul 31, 2025, 5:53 PM

#

is gemini-2.5-flash-lite not going to be in the leaderboard?

brave orbit Jul 31, 2025, 5:55 PM

#

what's the best model for coding in C++ and Python, and for math? what should I use? pls help

#

help me

brittle tiger Jul 31, 2025, 6:00 PM

#

Horizon Alpha gets math problems wrong that o3 never messes up.

brave orbit Jul 31, 2025, 6:01 PM

#

and also codeing give me just a say whats the best and why @here

#

@here help me say whats the best module for math and codeing pls help me

willow grail Jul 31, 2025, 6:12 PM

#

lmarena is a company?

#

thought this is a hobby project

echo aurora Jul 31, 2025, 6:14 PM

#

willow grail lmarena is a company?

Sure are! More information in this blog: https://news.lmarena.ai/new-lmarena/

torn mantle Jul 31, 2025, 6:15 PM

#

reading some prompts from the dataset

#

people really ask all sort of things

reef pawn Jul 31, 2025, 6:29 PM

#

jade egret when gpt 5 ):

Prolly never

blazing bison Jul 31, 2025, 6:37 PM

#

torn mantle reading some prompts from the dataset

If it's not filtered, I'd bet there are even credit cards in there

keen beacon Jul 31, 2025, 6:38 PM

#

torn mantle people really ask all sort of things

well, ig that is good for diversity

torn mantle Jul 31, 2025, 6:40 PM

#

blazing bison If it's not filtered, I'd bet there are even credit cards in there

I mean, TOS talks about how our data is sent to providers, but I don't know how I feel about it being shared publicly

blazing bison Jul 31, 2025, 6:42 PM

#

It's good for open source models. You know it's going to be shared publicly. If you share more than you should, it's your fault.

#

If you feel bad about using it, then don't

echo aurora Jul 31, 2025, 6:47 PM

#

blazing bison If it's not filtered, I'd bet there are even credit cards in there

We do apply aggressive PII filtering.

cedar tide Jul 31, 2025, 6:48 PM

#

New Arena Models

- velocilux

- cogitolux

torn mantle Jul 31, 2025, 6:49 PM

#

blazing bison If you feel bad about using it, then don't

dont what

#

dont tell me what to do

cedar tide Jul 31, 2025, 6:49 PM

#

cedar tide New Arena Models - velocilux - cogitolux

Cresylux family ?
So its from meituan ?
Its good ?

#

@torn mantle try the new models and tell me what you think of them

torn mantle Jul 31, 2025, 6:50 PM

#

cedar tide @torn mantle try the new models and tell me what you think of them

okay

upbeat aurora Jul 31, 2025, 6:50 PM

#

hello

#

animate the image

echo aurora Jul 31, 2025, 6:51 PM

#

upbeat aurora hello

Be sure to check out #1397655624103493813

keen fulcrum Jul 31, 2025, 6:54 PM

#

echo aurora Be sure to check out <#1397655624103493813>

we are quite uncertain what the goals of the platform are and what your internal roadmap holds. Can you make it public?

echo aurora Jul 31, 2025, 6:55 PM

#

keen fulcrum we are quite uncertain what the goals of the platform are and what your internal...

I'll share this with the team. We care a lot about transparency. Our mission remains:

To bring the best AI models to everyone, and to improve them through real-world community evaluations.

keen fulcrum Jul 31, 2025, 6:58 PM

#

Thanks appreciate it. The UI improved a lot over the last months

#

I propose making Search, Image, Video and Webdev Arena available through three major buttons to increase visibility. I attached a possible concept.

#

Its unclear that those buttons lead to a different arena

#

You may add a webdev arena button as its currently deployed on a separate platform

Additionally I propose adding tooltips to the leaderboard explaining how Rank, CI and Elo are determined

primal girder Jul 31, 2025, 7:07 PM

#

echo aurora We do apply aggressive PII filtering.

But when browsing the dataset, it could be seen that there’s some personal information included prompts being published. Ppl sometimes do stupid things like putting some files in without properly erasing all the personal info. 🤣 I know the TOS specified the rights and responsibilities and stuff. But maybe if there could be a way for users to choose to remove some of their prompts from a public release, it might be nicer?

echo aurora Jul 31, 2025, 7:10 PM

#

correct

echo aurora Jul 31, 2025, 7:11 PM

#

primal girder But when browsing the dataset, it could be seen that there’s some personal infor...

If you're seeing examples of this please send me a DM so I can escalate.

#

I don't know tbh, regardless I have been sharing these concerns with the team.

torn mantle Jul 31, 2025, 7:18 PM

#

they probably filtered that out

frigid coral Jul 31, 2025, 8:04 PM

#

https://www.deepcogito.com/research/cogito-v2-preview

Introducing Cogito v2 Preview

From Inference-time Search to Self-Improvement.

white hatch Jul 31, 2025, 8:23 PM

#

guys, did you notice that gemini started after some point to repeat himself or i'm tripping?

torn mantle Jul 31, 2025, 8:26 PM

#

white hatch guys, did you notice that gemini started after some point to repeat himself or i...

Not really

#

Its been consistent to me

#

I questioned myself if gemini 2.5 flash got even better

white hatch Jul 31, 2025, 8:27 PM

#

I have 2 chats with that problem, maybe ran out of tokens idk

#

I use gemini 2.5 pro

echo viper Jul 31, 2025, 8:29 PM

#

Video limit was 10 yesterday and 8 today. We have 4 more days left to limit 0. Hurry up.

flint sandal Jul 31, 2025, 8:39 PM

#

I swear i wrote a message on lmarena in feedback for making video generation arena. And now it is real (i thought they will made it in the webapp)

leaden palm Jul 31, 2025, 9:25 PM

#

what happened in #leaderboards? did this server get fake airdrop raided, openrouter-style?

echo aurora Jul 31, 2025, 9:27 PM

#

leaden palm what happened in #leaderboards? did this server get fake airdrop raided...

everyone likes to say hello there blobshrug

half nimbus Jul 31, 2025, 10:04 PM

#

hello, I'm new here. I'd like to get the most out of LM arena but I feel like I'm swimming in the deep end without floaties. When y'all got started how did you leverage it?

mellow frigate Jul 31, 2025, 10:08 PM

#

half nimbus hello, I'm new here. I'd like to get the most out of LM arena but I feel like I'...

What do you want to get out of it?

half nimbus Jul 31, 2025, 10:11 PM

#

I would like to fine tune my skills as a prompt engineer

cedar tide Jul 31, 2025, 10:20 PM

#

Thank you, that's exactly enough.

Screenshot_2025-08-01-00-19-47-234_com.discord.jpg

torn mantle Jul 31, 2025, 10:22 PM

#

cedar tide Thank you, that's exactly enough.

its only added twice, glm 4.5 and air

#

ah its on webdev

#

well you asked for it xd

brittle tiger Jul 31, 2025, 10:27 PM

#

I wonder what the squid emoji is hinting at here from a Google pm

https://x.com/simpsoka/status/1951008214595805498?t=OMMvH3wgSpZfOgOqv-3Y3w&s=19

Kath Korevec (@simpsoka)

I'm very excited about next week, y'all. 👀🦑

leaden palm Jul 31, 2025, 10:30 PM

#

brittle tiger I wonder what the squid emoji is hinting at here from a Google pm https://x.com...

squid = jules

#

kath is the jules... gal i guess

brittle tiger Jul 31, 2025, 10:34 PM

#

brittle tiger Horizon Alpha gets math problems wrong that o3 never messes up.

horizon alpha on openrouter chat now has reasoning enabled. not getting these tests of mine wrong anymore

barren prairie Jul 31, 2025, 10:47 PM

#

torn mantle I mean, TOS talks about how our data is sent to providers, but I don't know how ...

It is good that I wrote my prompts in Arabic, not a lot of people can understand my prompts 😂😂

patent aspen Jul 31, 2025, 10:48 PM

#

Poll: Will GPT-5 launch before Deep Think?

Winning answer: Yes (12 of 22 votes)

barren prairie Jul 31, 2025, 10:51 PM

#

When our prompts are shared publicly, we will laugh a lot 😂😂😂🤣🤣🤣
Can't wait to read them.

tall summit Jul 31, 2025, 11:26 PM

#

torn mantle I mean, TOS talks about how our data is sent to providers, but I don't know how ...

it was obvious it would happen again considering it happened before

stray aspen Jul 31, 2025, 11:54 PM

#

Can we use the study mode prompt on other Ais

leaden palm Aug 1, 2025, 12:42 AM

#

stray aspen Can we use the study mode prompt on other Ais

📎 extracted_for_you.md

hallow ridge Aug 1, 2025, 12:50 AM

#

I have an instagram account with over 300k and I don’t want it anymore

leaden palm Aug 1, 2025, 12:58 AM

#

keen beacon Aug 1, 2025, 1:04 AM

#

leaden palm

where's this from btw

leaden palm Aug 1, 2025, 1:05 AM

#

keen beacon where's this from btw

saw it in a tweet, searched it and found the relevant text, idk about the rest of it

forest prism Aug 1, 2025, 1:16 AM

#

Which LLM is the best street smart? I think adding a leaderboard in LMArena for it would be sick

leaden palm Aug 1, 2025, 1:22 AM

#

forest prism Which LLM is the best street smart? I think adding a leaderboard in LMArena for ...

what kinds of prompts would you classify as "testing street smarts"?

forest prism Aug 1, 2025, 1:27 AM

#

leaden palm what kinds of prompts would you classify as "testing street smarts"?

Something like this but advanced

“Your storefront is dead, but the parking lot next door is packed. What scrappy move might get you foot traffic?”
“You get your first bad online review — and it’s unfair. How do you respond publicly without looking defensive?”
“Your competitor just undercut your pricing. You can’t afford to match it — what do you do to stay in the game?”
“You’re launching a new product and have no ad budget. How do you create buzz with zero dollars?”
“A VC firm wants equity in exchange for mentorship, not money. Worth considering?”
“You’re about to go into business with someone who talks big but avoids putting anything in writing. What’s your move?”
“An early client wants a deep discount in exchange for ‘exposure.’ What questions should you ask before agreeing?”
“An employee you trust starts showing up late and missing deadlines. How do you handle it without losing them or getting walked over?”
“You have $5,000 left. Do you spend it on marketing, product development, or paying a debt collector breathing down your neck?”
“A supplier offers a ‘limited-time’ bulk discount, but you haven’t even sold your first batch. Do you go for it?”

#

It's subjective but I think that's why LMArena battle mode exists

forest prism Aug 1, 2025, 1:51 AM

#

Yes, definitely on the non verifiable domain, but very useful tho

#

Interesting, Why don't you think it's useful?

#

I see where you're coming from, I think I'm on the entirely opposite camp, I believe in achieving singularity as the end goal not us being the bottleneck

#

Absolutely, that's the greatest outcome imo. I wonder what you have against it?

#

What's the gap that won't let it happen? like what would you say is the "missing/never will happen" component

#

we aren't there yet is different than it won't happen, won't happen means that there is a component that is impossible preventing singularity from ever happening

stray aspen Aug 1, 2025, 2:03 AM

#

Craig do you think the new glm 4.5 is good

forest prism Aug 1, 2025, 2:04 AM

#

What's that component?

stray aspen Aug 1, 2025, 2:09 AM

#

@cedar tide David, do you think the new GLM 4.5 is better in one way or another than the other SOTA models?

last verge Aug 1, 2025, 3:05 AM

#

Is it possible to make a story video with a consistent character?

wicked root Aug 1, 2025, 3:27 AM

#

I keep getting rate limited on gemini

echo aurora Aug 1, 2025, 3:28 AM

#

wicked root I keep getting rate limited on gemini

and you haven't been using it a lot?

wicked root Aug 1, 2025, 3:28 AM

#

ofc I do LOL

#

I got it to code 50+ times today within a span of 3 hours.

#

plus a whole bunch of CS questions

brave orbit Aug 1, 2025, 3:39 AM

#

nocturne sparrow Aug 1, 2025, 4:48 AM

#

Potter tying clay pot on tall bamboo pole, king and his sons failing with arrows, spectators watching with tension, mid-range shot, lateral pan motion with slow push-in on shattered hope in faces

static lark Aug 1, 2025, 4:49 AM

#

leaden palm

openai's study mode is horrible

echo aurora Aug 1, 2025, 4:49 AM

#

nocturne sparrow Potter tying clay pot on tall bamboo pole, king and his sons failing with arrows...

The bot only works in the video-arena channels like #video-arena-3 , you'll want to type /image-to-video

static lark Aug 1, 2025, 4:49 AM

#

better to just not use it for studying

#

and use the model normally

#

it walking through the concept with you takes longer than if it just explains it clearly and directly

verbal nimbus Aug 1, 2025, 5:37 AM

#

static lark openai's study mode is horrible

Gemini on AIStudio is pretty good for studying

#

I heard they merged LearnLM with 2.5 Pro

#

As long as you stick to a traditional syllabus, it's pretty great. For non-standard stuff, it's less on-point.

patent aspen Aug 1, 2025, 6:03 AM

#

Seems like more of the same story. Apple has no comparative advantage in AI, but they own the world's best real estate. They'll continue being a luxury real estate company for as long as it works.

whole sundial Aug 1, 2025, 6:31 AM

#

the members of this HF team are all OAI employees btw...

verbal nimbus Aug 1, 2025, 6:37 AM

#

whole sundial

Seems to be gone now: https://huggingface.co/yofo-happy-panda

digital umbra Aug 1, 2025, 6:43 AM

#

whole sundial

it almost feels like the leaks are intentional, first the gpt-5 entrypoint and now this lol

#

well, i think leaks are more credible than sam altman's tweets anyway

brave orbit Aug 1, 2025, 6:44 AM

#

willow grail Aug 1, 2025, 7:13 AM

#

it's free and on the level of other SOTA models. it's not the best for SWE.
but opus 4 is still trash for vibe-SWE. a waste of money.

whole sundial Aug 1, 2025, 7:28 AM

#

i guess there's a 20B too
https://fixupx.com/apples_jimmy/status/1951180954208444758

Jimmy Apples 🍎/acc (@apples_jimmy)

So before people take credit, I found the oai os a min after they uploaded and saved the config and other stuff before it was removed.

It’s an OS model and coming soon so kinda feels like ruining a surprise

**💬 16 🔁 5 ❤️ 65 👁️ 1.9K**

sacred quail Aug 1, 2025, 7:53 AM

#

brave orbit

Best at long context

#

Best at analyzing videos

#

It doesn't even have any competitor

#

Other models read the text of videos, while Gemini literally watches the whole video frame by frame, for hours, and can give you detailed and specific outputs

#

People still don't know how useful this is

#

Analyzing video is bigger thing than analyzing pdfs

#

Gemini needs its own benchmark just for this
Also, for analyzing PDFs or text, Gemini is still the best because it's the best at long context

brave orbit Aug 1, 2025, 7:57 AM

#

And say why

#

it is better

#

say why it is better in a message

paper vault Aug 1, 2025, 8:25 AM

#

The summer of 1305 finds William Wallace crouched in the dense undergrowth of a Scottish forest, his once-proud frame now gaunt from years of constant flight. The man who once commanded armies and negotiated with kings now lives like a hunted animal, moving from shadow to shadow across a homeland that no longer recognizes his authority. His weathered hands, scarred from countless battles, grip a simple dirk—the only weapon left to Scotland's former Guardian. Seven years have passed since Wallace, "A medieval storybook illustration of a grim knight riding a horse through a peasant village, peasants looking frightened, castles in the misty hills in the background, detailed faces, realistic proportions, dramatic lighting, vintage painting texture, inspired by oil painting and watercolor, muted earthy tones, [additional scene-specific detail here]"

primal girder Aug 1, 2025, 8:32 AM

#

cedar tide Thank you, that's exactly enough.

I still have no idea where to check this update of arena battle models 🤨 . Could anybody please enlighten me?

viral hamlet Aug 1, 2025, 8:48 AM

#

LMARENA I DIDNT KNOW YOU HAVE A DIS I LOVE YOU GUYS

gleaming pagoda Aug 1, 2025, 8:53 AM

#

primal girder I still have no idea where to check this update of arena battle models 🤨 . Coul...

The same. Guys, do you know how to check this?

#

And I have no way to check my vote result either, although I have voted more than 5 times.😭 Is that a bug or something?

lucid jacinth Aug 1, 2025, 8:59 AM

#

Hi

dull raptor Aug 1, 2025, 9:14 AM

#

Hello...

verbal nimbus Aug 1, 2025, 9:15 AM

#

sacred quail People still don't know how useful this is

What do you use it for? I find that the hard part is downloading/uploading the video in the first place.

vast hound Aug 1, 2025, 9:22 AM

#

Some of the user requests in the dataset are funny:

[
  {
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": "Are hamsters made of ham?",
        "image": null,
        "mimeType": null
      }
    ]
  }
]

sacred quail Aug 1, 2025, 9:27 AM

#

verbal nimbus What do you use it for? I find that the hard part is downloading/uploading the v...

I'm just pasting YouTube links into AI Studio

#

Also, if you select a lower resolution and a lower fps, like 0.5,

#

you can use it for muuuch longer videos

#

Sometimes it gives an error, but just resend the message

#

It could take several minutes if the video is too long, just be patient

#

I'm asking for summaries, asking for timestamps

#

asking "did they talk about this?"

#

asking "did this guy laugh or look scared, and at which minute?"

#

Yeah, it can analyze facial expressions too

#

Basically, everything that takes hours if you do it yourself takes seconds when Gemini does it

#

You can also make subtitles, but I don't recommend trying it in one shot. Instead, translate in 20-minute parts; you can also select time ranges. And like I said, Gemini isn't only listening or reading, it literally watches videos frame by frame, so the subtitles will be more accurate because Gemini can see what's happening on screen at that time
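
A rough sketch of the two tips above (lower fps and 20-minute parts), for anyone who wants to plan the requests first. The function names are made up for illustration, and nothing here calls Gemini or AI Studio; it only does the arithmetic and prints the time ranges you would ask about.

```python
from datetime import timedelta


def sampled_frames(duration_s: float, fps: float) -> int:
    """Number of frames that get sampled from a video at a given frame rate."""
    return int(duration_s * fps)


def split_into_parts(duration_s: int, part_s: int = 20 * 60) -> list[tuple[str, str]]:
    """Split a video into consecutive (start, end) ranges formatted as H:MM:SS."""
    parts = []
    start = 0
    while start < duration_s:
        end = min(start + part_s, duration_s)
        parts.append((str(timedelta(seconds=start)), str(timedelta(seconds=end))))
        start = end
    return parts


two_hours = 2 * 60 * 60
print(sampled_frames(two_hours, 1.0))  # frames at 1 fps
print(sampled_frames(two_hours, 0.5))  # half as many frames at 0.5 fps

# One prompt per 20-minute part of a 50-minute video.
for start, end in split_into_parts(50 * 60):
    print(f"Make subtitles for the section from {start} to {end}.")
```

Halving the fps halves the number of frames, which is presumably why longer videos fit; how many tokens each frame costs depends on the model and settings and is not assumed here.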

unborn ocean Aug 1, 2025, 9:45 AM

#

info on openai's bigger oss model:
128 experts - 4 active -> very efficient
120b params - 5b active
4k initial context window - 128k current -> not horizon-alpha or whatever it is called (?)
trained in FP4 -> should only run on blackwell (?)
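
Back-of-the-envelope arithmetic for the numbers above. The figures are the leaked, unconfirmed ones from this message; the 4-bit storage for every weight is an assumption made only for the estimate, and KV cache, activations and runtime overhead are ignored.

```python
# Figures quoted in the message above (leaked, unconfirmed).
total_experts = 128
active_experts = 4
total_params = 120e9
active_params = 5e9

# Assumption: every weight is stored in 4 bits (0.5 bytes). Real checkpoints
# often keep some tensors in higher precision, so treat this as a lower bound.
bytes_per_param = 0.5

expert_fraction = active_experts / total_experts
param_fraction = active_params / total_params
weight_gb = total_params * bytes_per_param / 1e9

print(f"experts active per token: {expert_fraction:.1%}")
print(f"params active per token:  {param_fraction:.1%}")
print(f"weights alone at 4-bit:   {weight_gb:.0f} GB")
```

Whether that fits on a single card depends on the card's memory and on how much room the context needs on top of the weights.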

reef pawn Aug 1, 2025, 10:12 AM

#

unborn ocean info on openai's bigger oss model: 128 experts - 4 active ...

1 blackwell is enough?

brittle gull Aug 1, 2025, 10:33 AM

#

Give a description of all these chemistry topics

mortal coyote Aug 1, 2025, 10:38 AM

#

Why is the website's image generator (GPT-1) so slow?

golden ocean Aug 1, 2025, 11:02 AM

#

Are hamsters made of ham?

hardy pecan Aug 1, 2025, 11:36 AM

#

gemini 2.5 deepthink is out for ultra members

marsh sundial Aug 1, 2025, 11:39 AM

#

any screenshot?

reef pawn Aug 1, 2025, 11:54 AM

#

hardy pecan gemini 2.5 deepthink is out for ultra members

When will it be available for Gemini AI Pro members?

hardy pecan Aug 1, 2025, 11:54 AM

#

reef pawn When will it be available for Gemini AI Pro members?

I dont think ever, looks exclusive to Ultra plans, but maybe limited queries for pro in the future, but I doubt it

reef pawn Aug 1, 2025, 11:54 AM

#

Nooo

#

😢

reef pawn Aug 1, 2025, 11:55 AM

#

hardy pecan I dont think ever, looks exclusive to Ultra plans, but maybe limited queries for...

Let's hope it will be available for pro as well ❤️‍🩹

sour spindle Aug 1, 2025, 12:02 PM

#

Very quiet release

earnest rover Aug 1, 2025, 12:18 PM

#

I have a question for the LMArena team. Apart from video gen AI, when will you add temperature or token in/out settings for text models? And what about image AI: when will we be able to control the image temperature or gradience?

marsh sundial Aug 1, 2025, 1:18 PM

#

hardy pecan gemini 2.5 deepthink is out for ultra members

It only has 10 RPD for Ultra subscribers, that is ridiculous

latent patio Aug 1, 2025, 1:23 PM

#

Is there any free, unlimited AI image generator other than LMArena?

main gulch Aug 1, 2025, 1:31 PM

#

marsh sundial It only has 10 RPD for Ultra subscribers, that is ridiculous

confirmed?

#

the exact limit wasn't published afaik

marsh sundial Aug 1, 2025, 1:32 PM

#

I am ultra subscriber

#

echo aurora Aug 1, 2025, 1:32 PM

#

earnest rover i have a question to LMARENA guys. apart from video gen ai, when you guys will a...

TBD! We do hear this request a lot but yeah I can't say when to expect something like this.

earnest rover Aug 1, 2025, 1:33 PM

#

latent patio Is there any free, unlimited AI image generator other than LMArena?

it's complicated. there are many, but the models are not that great
g4f.dev
ish.junioralive.in
polliantions.ai

#

am i missing something. what is so special about deepthink

graceful sable Aug 1, 2025, 1:36 PM

#

earnest rover am i missing something. what is so special about deepthink

nothing really

keen beacon Aug 1, 2025, 1:44 PM

#

marsh sundial

Can you ask it a question I have?

paper nimbus Aug 1, 2025, 1:44 PM

#

keen beacon Can you ask it a question I have?

he already hit the limit bro

lone vector Aug 1, 2025, 1:45 PM

#

stray aspen Aug 1, 2025, 1:51 PM

#

deepthink is out

sour spindle Aug 1, 2025, 1:57 PM

#

marsh sundial It only has 10 RPD for Ultra subscribers, that is ridiculous

Thanks for saving me money lol

torn mantle Aug 1, 2025, 2:00 PM

#

marsh sundial It only has 10 RPD for Ultra subscribers, that is ridiculous

Yea i will pass

patent aspen Aug 1, 2025, 2:02 PM

#

The IMO benchmark is just sassy lmao

autumn nacelle Aug 1, 2025, 2:08 PM

#

@latent patio yo, I'm also looking for a free, unlimited AI image generator just like you. All I know is freepass ai, and I think it's a bit bad. Wish someone made a list of free AI image generators.

#

Also, I get this error when trying to create images on the LMArena site: "Something went wrong with this response, please try again." Any solutions?

stray aspen Aug 1, 2025, 2:11 PM

#

lmarena

blazing bison Aug 1, 2025, 2:12 PM

#

marsh sundial It only has 10 RPD for Ultra subscribers, that is ridiculous

Bro, I was literally one button away from paying for it

#

Can't be real

#

10 rpd

stray aspen Aug 1, 2025, 2:12 PM

#

what a daylight robbery

blazing bison Aug 1, 2025, 2:13 PM

#

hopefully gpt-5 is better

#

funny if gpt-5 could do the same for $20

#

Grok 4 Heavy is a bigger scam than Google

stray aspen Aug 1, 2025, 2:17 PM

#

ultra

#

they should have compared with grok 4 heavy and o3 pro high

patent aspen Aug 1, 2025, 2:19 PM

#

My favorite is o3 being below IMO bronze level lmao

blazing bison Aug 1, 2025, 2:19 PM

#

o3 is from december 2024

#

there are optimizations, yes, but it's an almost 1 year old model in general

whole wagon Aug 1, 2025, 2:20 PM

#

@patent aspen so this deep think is actually 2.5 ultra base model behind the scenes?

patent aspen Aug 1, 2025, 2:20 PM

#

whole wagon @patent aspen so this deep think is actually 2.5 ultra base model behind...

yeah

#

It's one deep think

blazing bison Aug 1, 2025, 2:21 PM

#

with deepthink they mean like 50k tokens of thinking

blazing bison Aug 1, 2025, 2:21 PM

#

patent aspen It's one deep think

well, they said that deep think = max tokens thinking

#

so it's not one

jade egret Aug 1, 2025, 2:22 PM

#

guys

#

is deep think good

blazing bison Aug 1, 2025, 2:22 PM

#

we have a bunch of gemini 2.5 pro instances with a lot of reasoning tokens enabled, and one of them decides the best answer
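
A minimal sketch of that idea (several attempts in parallel, then one selection step), not Google's actual implementation, which isn't public. `generate_candidate` is a hypothetical stand-in for a long-reasoning model call, and the selector here is a plain majority vote swapped in for the "one of them decides" judge step.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def generate_candidate(prompt: str, seed: int) -> str:
    """Hypothetical stand-in for one long-reasoning model call."""
    rng = random.Random(seed)
    return f"answer-{rng.randint(1, 3)}"


def pick_best(candidates: list[str]) -> str:
    """Stand-in selector: majority vote, ties broken alphabetically.

    A real setup might instead ask another model call to judge the candidates.
    """
    return max(sorted(set(candidates)), key=candidates.count)


def parallel_think(prompt: str, n: int = 10) -> str:
    """Run n candidate generations concurrently and return the selected one."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(lambda s: generate_candidate(prompt, s), range(n)))
    return pick_best(candidates)


print(parallel_think("Prove that the sum of two even numbers is even."))
```

The cost scales with `n`, since every candidate is a full reasoning run.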

whole wagon Aug 1, 2025, 2:22 PM

#

Nobody even attempted to put it as a model request kek. I assume it will be rejected but worth a shot at least

lime coral Aug 1, 2025, 2:22 PM

#

GPT5 < IMO GPT < Deep Think IMO

blazing bison Aug 1, 2025, 2:22 PM

#

deep think = 50k thinking tokens

patent aspen Aug 1, 2025, 2:22 PM

#

jade egret is deep think good

It's SoTA at math. Not practical for 99% of people

lime coral Aug 1, 2025, 2:24 PM

#

I speak facts cry

#

When did I say something different?

patent aspen Aug 1, 2025, 2:24 PM

#

I mean he's absolutely right if we're talking about math only. Otherwise yeah I'd disagree

lime coral Aug 1, 2025, 2:24 PM

#

Math, coding

blazing bison Aug 1, 2025, 2:24 PM

#

apparently no openai livestream today so no open source today

lime coral Aug 1, 2025, 2:25 PM

#

Coding

whole wagon Aug 1, 2025, 2:27 PM

#

Why are they sitting on these models so long lol

#

Like just release it already. The open source they had for ages

jade egret Aug 1, 2025, 2:27 PM

#

will gpt-5 be better than deepthink?

#

why

leaden meteor Aug 1, 2025, 2:37 PM

#

Is deep think even going to be in Arena to test against GPT5?

barren prairie Aug 1, 2025, 2:44 PM

#

leaden meteor Is deep think even going to be in Arena to test against GPT5?

10 rpd for 250$ and you want it on Arena ?

still matrix Aug 1, 2025, 2:45 PM

#

Could someone with access to Gemini Deep Think take a highly complicated clinical case question that no other model can solve, so I can check its answers?

Please, I beg you. It's a really thrilling mystery case that no machine can solve, and humans are struggling too.

Edit: I'm actually a friend of the person in the clinical case, and we've been baffled for months without an answer.

gentle plinth Aug 1, 2025, 2:47 PM

#

Dafuq

#

Why admins deleted it

patent aspen Aug 1, 2025, 2:47 PM

#

still matrix Someone with access to gemini deepthink can I give you a highly complicated clin...

Sure

ocean vortex Aug 1, 2025, 2:48 PM

#

lone vector

All of those gains compared with the initial deep think announcement can be easily attributed to the base model update (06-05 vs 05-06) in my book

#

#

And their initial release:

#

(different LCB range)

willow grail Aug 1, 2025, 2:49 PM

#

https://i.imgur.com/paKXQ41.png

Imgur

fleet lintel Aug 1, 2025, 2:49 PM

#

lone vector

numbers look good.. but i have learned to not get hyped before more confirmations.

Are they actually good?

willow grail Aug 1, 2025, 2:49 PM

#

@leaden palm

ocean vortex Aug 1, 2025, 2:49 PM

#

They sneakily did not include USAMO this time at all lol

lone vector Aug 1, 2025, 2:49 PM

#

still matrix Aug 1, 2025, 2:49 PM

#

patent aspen Sure

Here's the question

📎 0208_09.md

lone vector Aug 1, 2025, 2:50 PM

#

fleet lintel numbers look good.. but i have learned to not get hyped before more confirmation...

I'm not an Ultra member, don't know

still matrix Aug 1, 2025, 2:50 PM

#

still matrix Here's the question

Here's the question

ocean vortex Aug 1, 2025, 2:50 PM

#

lone vector

Nothing really insane about this tbh. Just parallel compute

fleet lintel Aug 1, 2025, 2:51 PM

#

you have ultra access?

ocean vortex Aug 1, 2025, 2:51 PM

#

You could have done this yourself after 06-05 was released with some coding, I believe

patent aspen Aug 1, 2025, 2:51 PM

#

fleet lintel you have ultra access?

Yeah

fleet lintel Aug 1, 2025, 2:52 PM

#

patent aspen Yeah

wow.. you paying 300$ bucks?

still matrix Aug 1, 2025, 2:52 PM

#

patent aspen Yeah

I will be eternally thankful, I really will. It's a very critical situation we are trying to solve here. God bless you

fleet lintel Aug 1, 2025, 2:53 PM

#

deep think probably takes like 5 min to answer any query

leaden meteor Aug 1, 2025, 3:12 PM

#

barren prairie 10 rpd for 250$ and you want it on Arena ?

You never know. Google might want to shell out to show off. It can afford it.

torn mantle Aug 1, 2025, 3:27 PM

#

fleet lintel deep think probably takes like 5 min to answer any query

Hmm

#

Kingfall was faster

#

So we can assume that its gemini 3.0

patent aspen Aug 1, 2025, 3:30 PM

#

still matrix I Will be eternally thankful i really will it's a very critical situation we ar...

https://g.co/gemini/share/d2acd8f8598b

Gemini

‎Gemini - Iatrogenic Neuropsychiatric Syndrome Analysis

Created with Gemini

torn mantle Aug 1, 2025, 3:30 PM

#

Okay

patent aspen Aug 1, 2025, 3:33 PM

#

My thoughts on Deep Think are that it's probably not something that 99% of chatbot users need, but the remaining 1% could have a categorical improvement in capability

#

E.g. mathematicians, scientists, critical medical situations, distributed systems problems, logistics, leading edge HFT firms, etc

torn mantle Aug 1, 2025, 3:35 PM

#

Is it using something similar to kingfall as instruct model?

#

But why does it look worse than kingfall

analog raptor Aug 1, 2025, 3:36 PM

#

@deep adder Grok is bad!

sweet tinsel Aug 1, 2025, 3:38 PM

#

analog raptor @deep adder Grok is bad!

Hmmmm. It really has its quirks but it's generally solid.

analog raptor Aug 1, 2025, 3:38 PM

#

sweet tinsel Hmmmm. It really has its quirks but it's generally solid.

nuh uh

civic flame Aug 1, 2025, 3:43 PM

#

yeah in terms of frontend design it did worse than base 2.5 ultra & 2.5 pro

#

but it had no bugs

primal orbit Aug 1, 2025, 3:55 PM

#

Does deep think output long detailed answers like deep research?

#

or more concise like o3?

whole wagon Aug 1, 2025, 3:57 PM

#

https://x.com/main_horse/status/1951201925778776530 but nobody can get it to run 😅
would be hilarious if someone gets it to work whilst openai are busy 'safety testing'

#

like we have the weights just not the inference to run it lmao

ocean vortex Aug 1, 2025, 3:57 PM

#

It's just 06-05 with minimal changes if any at all + parallel compute. This is my conclusion thus far judging by what I saw until it can be proven otherwise. They barely showed any metrics at all, and those that they did showed similar gains to the 05-06 initial deep think.

patent bane Aug 1, 2025, 3:58 PM

#

can I send some prompts to test?

#

I canceled my ultra plan a month ago

patent aspen Aug 1, 2025, 3:59 PM

#

ocean vortex It's just 06-05 with minimal changes if any at all + parallel compute. This is m...

You've been making that claim for a while

ocean vortex Aug 1, 2025, 3:59 PM

#

patent aspen You've been making that claim for a while

For a while? It was only released today lol

patent aspen Aug 1, 2025, 4:00 PM

#

ocean vortex for awhile? It's only released today lol

You've been saying anyone can roll their own deep think with parallel compute and pro

ocean vortex Aug 1, 2025, 4:01 PM

#

patent aspen You've been saying anyone can roll their own deep think with parallel compute an...

When it's only that it's fairly simple and this is mostly true. You can do 10 responses in parallel and see quite easily what can be improved with it

warm fulcrum Aug 1, 2025, 4:02 PM

#

what does rpd mean?

whole wagon Aug 1, 2025, 4:02 PM

#

for the length of time it thinks you cant really do many requests per day anyways

#

like each takes 10+ minutes

ocean vortex Aug 1, 2025, 4:03 PM

#

For the amount of noise they made, essentially promising to release IMO gold medal model, this is kinda a disappointment

primal orbit Aug 1, 2025, 4:04 PM

#

damn

warm fulcrum Aug 1, 2025, 4:04 PM

#

so what's so bad about the deep think model besides the requests per day limit?

#

is it not worth it at all

ocean vortex Aug 1, 2025, 4:04 PM

#

Perhaps but as things stand now they just released a thing that was supposed to be live months ago. Only based on a slightly newer model now lol

patent bane Aug 1, 2025, 4:05 PM

#

oh yeah

civic flame Aug 1, 2025, 4:05 PM

#

you just wasted 1 of your 10 RPD on that?

#

😭

whole wagon Aug 1, 2025, 4:05 PM

#

is this the next dumb test

#

after strawberry

patent bane Aug 1, 2025, 4:05 PM

#

not mine

hollow imp Aug 1, 2025, 4:05 PM

#

patent bane oh yeah

DEEPTHINK!

patent bane Aug 1, 2025, 4:05 PM

#

but that does prove something

whole wagon Aug 1, 2025, 4:06 PM

#

it is essentially like using an optical illusion to assess someones intelligence. it is just an artifact of the tokenizer

patent bane Aug 1, 2025, 4:07 PM

#

yes but it should have used tools to calculate

hollow imp Aug 1, 2025, 4:07 PM

#

Guys I'm squeezing every bit of Gemini 2.5 through custom gems

#

I've added all Robert greene books pdfs in the knowledge base

patent aspen Aug 1, 2025, 4:08 PM

#

warm fulcrum so what's so bad about the deep think model besides the requests per day limit?

Mostly I think it's that people are expecting a model fine tuned for complex structured thinking problems to be better as a general purpose model

#

And they don't like the 10rpd limit

#

And cost

warm fulcrum Aug 1, 2025, 4:09 PM

#

patent aspen And they don't like the 10rpd limit

i mean the ai thinks for like 10 minutes or longer

#

the computing cost must be high

primal orbit Aug 1, 2025, 4:09 PM

#

whole wagon it is essentially like using an optical illusion to assess someones intelligence...

The problem most often is you have to be smart enough to pose the right questions. The 2.5 pro is capable of tackling very intricate dynamics, but it needs to be focused manually on necessary details. It doesn't handle prioritizing very well about where to dig more.

hollow imp Aug 1, 2025, 4:09 PM

#

primal orbit The problem most often is you have to be smart enough to pose the right question...

I've made a custom gem for prompt engineering

ocean vortex Aug 1, 2025, 4:10 PM

#

patent aspen Mostly I think it's that people are expecting a model fine tuned for complex str...

It must at the very least be no worse than 2.5Pro for any task. It is a thing they are not the first to do and it is directly competing with o3-pro and grok4-heavy

whole wagon Aug 1, 2025, 4:10 PM

#

o3 pro has a similar limit actually. they just don't state it. barely anyone will ever hit it, so what's the point. It just adds unnecessary worry for the user about something which is likely not relevant to their usage

ocean vortex Aug 1, 2025, 4:11 PM

#

Not really lol

hollow imp Aug 1, 2025, 4:11 PM

#

O3 search in lmarena search is godly for me

#

My best experience with web Searching so far

blazing bison Aug 1, 2025, 4:12 PM

#

The api one is always better

ocean vortex Aug 1, 2025, 4:12 PM

#

I don't see a single good reason why this should have been delayed either tbh

#

But this is not based on Ultra is it

blazing bison Aug 1, 2025, 4:14 PM

#

They rushed deep think because they know that next week is openai week

opaque gull Aug 1, 2025, 4:14 PM

#

guys what to do if bot doesnt answer to me in dm when i want to gen video

ocean vortex Aug 1, 2025, 4:14 PM

#

If it was they would have shared more metrics, gains would be higher and wouldn't match up to 05-06 deep think gains

blazing bison Aug 1, 2025, 4:15 PM

#

Yeah, just a rushed version because releasing now some people will buy the ultra plan. Releasing next week no one would buy because of gpt 5

ocean vortex Aug 1, 2025, 4:16 PM

#

and they wouldn't be afraid to include USAMO like they did earlier

hollow imp Aug 1, 2025, 4:18 PM

#

WHERE IS DEEPSEEK R2 COMING

whole wagon Aug 1, 2025, 4:18 PM

#

I like it. Don't care much about price I just want the best. Though they should introduce a tier above for unlimited usage kek

ocean vortex Aug 1, 2025, 4:18 PM

#

Source...?

Also:

If you’re a Google AI Ultra subscriber, you can use Deep Think in the Gemini app today with a fixed set of prompts a day by toggling “Deep Think” in the prompt bar when selecting 2.5 Pro in the model drop down.

whole wagon Aug 1, 2025, 4:19 PM

#

whole wagon I like it. Don't care much about price I just want the best. Though they should ...

If GPT5 top of the line model beats it I will just switch to that instead

hollow imp Aug 1, 2025, 4:19 PM

#

At least gemini 2.5 doesn't hallucinate as badly as grok 4. I have very horrible experiences in lmarena side by side

ocean vortex Aug 1, 2025, 4:20 PM

#

Hm... Ok if it's indeed that, why not release other metrics and focus on math where small models like o4-mini-high are known to be often better than both medium and huge sized models? Makes no sense

#

@patent aspen

hollow imp Aug 1, 2025, 4:21 PM

#

Can you trick 2.5 pro into similar, or at least 40% of, Deepthink-like performance using some prompt engineering?

patent aspen Aug 1, 2025, 4:22 PM

#

ocean vortex Hm... Ok if it's indeed that, why not release other metrics and focus on math wh...

I actually don't have any special insights into evals for models. Your guess is as good as mine

hollow imp Aug 1, 2025, 4:22 PM

#

How can I subscribe for any of these? I'm 15 years old I don't have money

patent bane Aug 1, 2025, 4:23 PM

#

hollow imp Can you trick 2.5 pro into similar or atleast 40% Deepthink like performance usi...

no and yes

hollow imp Aug 1, 2025, 4:23 PM

#

Our education system is cooked

hollow imp Aug 1, 2025, 4:23 PM

#

patent bane no and yes

Reject the element no because square root cannot be Negative and tell me the yes 😂

patent bane Aug 1, 2025, 4:25 PM

#

why would i 😂

hollow imp Aug 1, 2025, 4:26 PM

#

patent bane why would i 😂

What do you want in return

#

Personal experience> benchmark

narrow haven Aug 1, 2025, 4:35 PM

#

Hello guys and girls

whole wagon Aug 1, 2025, 4:35 PM

#

50% odds to increase the lmarena score by 7 Elo?
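
For scale, here is what a 7-point gap means under the textbook Elo formula. This assumes the standard logistic curve with a 400-point scale; the leaderboard's exact rating model may differ, so treat it as a rough guide only.

```python
def expected_win_rate(rating_gap: float, scale: float = 400.0) -> float:
    """Expected win probability of the higher-rated side under standard Elo."""
    return 1.0 / (1.0 + 10 ** (-rating_gap / scale))


print(f"{expected_win_rate(7):.3f}")  # → 0.510
```

So a 7-point lead is roughly a 51% head-to-head win rate, barely distinguishable from a coin flip.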

torn bison Aug 1, 2025, 4:36 PM

#

whole wagon 50% odds to increase the lmarena score by 7 Elo?

you are looking at the leaderboard with stylecontrol enabled

whole wagon Aug 1, 2025, 4:37 PM

#

why dont they use the style control leaderboard

#

well i didnt even realise its a thing. its enabled by default lol

#

so they adjust the scores automatically by default?

hollow imp Aug 1, 2025, 4:46 PM

#

Ayanokouji august

echo aurora Aug 1, 2025, 4:52 PM

#

Yeah that's a good point and a topic we discuss internally here and there. I'll be sure to bring this up again as it's important how we structure this.

primal orbit Aug 1, 2025, 4:58 PM

#

I thought models only see their part of the chat. You send a message via the site -> the site uses the API to send the message to the LLM -> the API returns the response to the site -> the site outputs the message.

#

and the site handles 2 channels simultaneously this way
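
A toy sketch of that flow, assuming the site simply relays the same message to two separate model APIs at once. `call_model` is a hypothetical stand-in for a real HTTP request and the model names are placeholders; this is not how LMArena is actually implemented.

```python
import asyncio


async def call_model(model_name: str, message: str) -> str:
    """Hypothetical stand-in for an HTTP request to a model provider's API."""
    await asyncio.sleep(0)  # placeholder for network latency
    return f"[{model_name}] reply to: {message}"


async def side_by_side(message: str, model_a: str, model_b: str) -> dict[str, str]:
    """Send the same user message to two models concurrently and collect both replies."""
    reply_a, reply_b = await asyncio.gather(
        call_model(model_a, message),
        call_model(model_b, message),
    )
    return {model_a: reply_a, model_b: reply_b}


replies = asyncio.run(side_by_side("Are hamsters made of ham?", "model-a", "model-b"))
for name, text in replies.items():
    print(name, "->", text)
```

In this setup each model only ever receives its own request, so neither sees the other's reply.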

torn mantle Aug 1, 2025, 4:58 PM

#

whats the difference between wolfstride and kingfall tho?

#

is wolfstride like a more recent checkpoint?

patent aspen Aug 1, 2025, 5:01 PM

#

torn mantle is wolfstride like a more recent checkpoint?

Yes

rare python Aug 1, 2025, 5:02 PM

#

torn mantle is wolfstride like a more recent checkpoint?

latest base gemini model checkpoint that is not deepthink iirc

there is also nightride but it's weird

astral jetty Aug 1, 2025, 5:03 PM

#

I can’t even try a sample of deep think because it’s behind a 200 dollar paywall

civic flame Aug 1, 2025, 5:06 PM

#

is that it?

#

probably not any good 😭

#

yeah

drowsy cargo Aug 1, 2025, 5:08 PM

#

wow i didn't know so many people were cheating on dev mode

brittle tiger Aug 1, 2025, 5:09 PM

#

https://x.com/fleetingbits/status/1951321535287095393?t=soJHoWcE6J-gmMWXuENR3g&s=19

FleetingBits (@fleetingbits)

Some initial thoughts on Gemini DeepThink:

0) TLDR; it's very impressive

1) It feels more like running a Deep Research query in that it can take 10-15 minutes to run.

2) It seems like it runs in a sandbox and has access to some compute but tries to run code that you wouldn't

patent aspen Aug 1, 2025, 5:15 PM

#

I mean isn't the same true for gpt-5

sudden pollen Aug 1, 2025, 5:15 PM

#

Hi!

tall summit Aug 1, 2025, 5:15 PM

#

oh deepthink out

patent aspen Aug 1, 2025, 5:16 PM

#

I also think Google's naming is way saner than anyone else tbh

civic flame Aug 1, 2025, 5:18 PM

#

tall summit oh deepthink out

rather late

tall summit Aug 1, 2025, 5:19 PM

#

civic flame rather late

i am rather late or deepthink is

sick barn Aug 1, 2025, 5:21 PM

#

yo

civic flame Aug 1, 2025, 5:23 PM

#

tall summit i am rather late or deepthink is

all of the above

tall summit Aug 1, 2025, 5:23 PM

#

civic flame all of the above

lolol

#

fair

#

i just ignore deepthink because i know i wont be able to use it

#

and i dont need it

#

but its cool to see advancements isnt it

unborn ocean Aug 1, 2025, 5:24 PM

#

Output: 42, CoT: hidden, summary stupid

#

Imagine

#

Searching X for „Elon Musk opinion on meaning of everything“

candid field Aug 1, 2025, 5:35 PM

#

guys is the limit 10 videos or 8 ?

keen beacon Aug 1, 2025, 5:36 PM

#

candid field guys is the limit 10 videos or 8 ?

8

patent aspen Aug 1, 2025, 5:53 PM

#

What are the current GPT-5 benchmarks? Are they verified?

keen beacon Aug 1, 2025, 5:55 PM

#

we have none rn

golden ocean Aug 1, 2025, 5:55 PM

#

wintry tinsel Aug 1, 2025, 5:56 PM

#

Gpt will be head and shoulders sota when it releases, remember o series models and 4.1,4.5 are checkpoints in the development of the finished product which is 5

patent aspen Aug 1, 2025, 5:56 PM

#

What are the sources on the release date being next week?

keen beacon Aug 1, 2025, 5:56 PM

#

apparently horizon alpha's reasoning version got 86% on gpqa tho. it was up for a little bit, whatever that is

wintry tinsel Aug 1, 2025, 5:56 PM

#

But nobody can beat Gemini cuz nobody can beat free

civic flame Aug 1, 2025, 5:58 PM

#

patent aspen What are the sources on the release date being next week?

openai have been intensely preparing for it for the last ~4 days

wintry tinsel Aug 1, 2025, 5:58 PM

#

It’s o series integrated into the regular more versatile model so it will be bordering on a major leap if it is not one itself

civic flame Aug 1, 2025, 5:58 PM

#

and they began A/B testing it on chatgpt late last week

high ginkgo Aug 1, 2025, 5:58 PM

#

can u translate this to simpler english for me

#

sorry for my bed england

#

thx

wintry tinsel Aug 1, 2025, 5:59 PM

#

I hope that it can beat Claude on coding and writing/vibes cuz opus is expensive, slow and censored

stray aspen Aug 1, 2025, 6:11 PM

#

of course it will beat claude

hollow imp Aug 1, 2025, 6:29 PM

#

wintry tinsel I hope that it can beat Claude on coding and writing/vibes cuz opus is expensive...

But is it good

#

I've used it a lot on lmarena and I have bad experiences tbh

ocean vortex Aug 1, 2025, 6:34 PM

#

So basically like deep think lol

#

or o3-pro vs o3

brittle tiger Aug 1, 2025, 6:35 PM

#

https://x.com/amir/status/1951343643887018187?t=NFy3vNf6VrJIgkNX6Zg73A&s=19

Amir Efrati (@amir)

GPT-5 is good.

But model performance gains are still slower than in past years and this year has been a technically challenging one for OpenAI researchers.

The inside story here...

ocean vortex Aug 1, 2025, 6:36 PM

#

At least it won't be 10RPD and paywalled behind a door you need golden key for

brittle tiger Aug 1, 2025, 6:38 PM

#

"The improvements won’t be comparable to the leaps in performance of earlier GPT-branded models, such as the improvements between GPT-3 in 2020 and GPT-4 in 2023"

patent aspen Aug 1, 2025, 6:39 PM

#

ocean vortex At least it won't be 10RPD and paywalled behind a door you need golden key for

Yeah it probably won't be 15 requests a month like o3 pro

#

I'm looking at the OAI help center. I see 15 rpm for o3 pro. Is that out of date?

ocean vortex Aug 1, 2025, 6:41 PM

#

patent aspen Yeah it probably won't be 15 requests a month like o3 pro

o3 pro I can use it on playground as much as I want without paying extortionate sub prices lol

patent aspen Aug 1, 2025, 6:41 PM

#

Oh that's for API

ocean vortex Aug 1, 2025, 6:42 PM

#

for deep think I can't use it at all. Paying for their sub is not an option I would even consider tbh

patent aspen Aug 1, 2025, 6:42 PM

#

15 requests / month

tall summit Aug 1, 2025, 6:43 PM

#

golden ocean

WHAT WAS THE ANSWER

#

I MUST KNOW

patent aspen Aug 1, 2025, 6:43 PM

#

Isn't that even worse?

ocean vortex Aug 1, 2025, 6:44 PM

#

patent aspen 15 requests / month

#

I would guess most of the people using o3-pro here and there do NOT have a Pro sub. And that sub is already priced more reasonably than Gemini one

#

I think it's 'unlimited' only for o3, not the pro

#

But the fact alone that Google is competing on charging you comparable amounts of money and has even stricter limits with all their TPUs is kinda already crazy enough...

blazing bison Aug 1, 2025, 6:47 PM

#

yes, on chatgpt pro, requests for all models are unlimited

#

there are only limits on deep research and agent

#

and they are very reasonable limits btw

#

of all the pro plans, openai offers the best one

#

claude is not good too, with weekly limits

torn mantle Aug 1, 2025, 6:49 PM

#

patent aspen Yes

then how do you explain kingfall being better than wolfstride

patent aspen Aug 1, 2025, 6:49 PM

#

IIRC the real limit for o3 pro on that plan is capped in low dozens per month

blazing bison Aug 1, 2025, 6:49 PM

#

patent aspen IIRC the real limit for o3 pro on that plan is capped in low dozens per month

on enterprise plan that is like $30

ocean vortex Aug 1, 2025, 6:50 PM

#

blazing bison yes, on chatgpt pro requests for all models is unlimited

wdym

#

o3-pro is not "unlimited", don't know their caps though...

blazing bison Aug 1, 2025, 6:50 PM

#

ocean vortex o3-pro is not "unlimited", don't know their caps though...

bro i do more than 100 request / day

patent aspen Aug 1, 2025, 6:51 PM

#

Yeah it's not actually unlimited

torn mantle Aug 1, 2025, 6:51 PM

#

torn mantle then how do you explain kingfall being better than wolfstride

@patent aspen

blazing bison Aug 1, 2025, 6:51 PM

#

patent aspen Yeah it's not actually unlimited

do you have it?

ocean vortex Aug 1, 2025, 6:52 PM

#

blazing bison bro i do more than 100 request / day

Ok then their caps are very reasonable lol. But this only proves my point even more, what Google is trying to do with their model availability and pricing is insane.

blazing bison Aug 1, 2025, 6:52 PM

#

ocean vortex Ok then their caps are very reasonable lol. But this only proves my point even m...

i tried the same with opus on claude and they weekly limited me now after 2 days

#

🤓

#

bcs before it was basically unlimited too

#

and so many people talking about claude code so

#

it was, at least for me

#

80/ 100 requests day for me is basically unlimited

ocean vortex Aug 1, 2025, 6:54 PM

#

This was fine. People were still able to test and use it. But also there's no way Google's operating cost of a single instance of their large model is anywhere near that. And if they can only make it perform with parallel compute that is still on them.

keen fulcrum Aug 1, 2025, 6:55 PM

#

https://cdn.discordapp.com/attachments/1262757808462499912/1400909456929456128/image0.jpg?ex=688e5a1a&is=688d089a&hm=302bb5ddaad69eb6c04e044e50c8de2c52267ada273c97260577a7d5697f0df1&

#

Kiro pricing revealed

#

Launching next week

blazing bison Aug 1, 2025, 6:55 PM

#

vibe requests lmao

#

cringe

torn mantle Aug 1, 2025, 6:55 PM

#

whats kiro

blazing bison Aug 1, 2025, 6:55 PM

#

amazon

keen fulcrum Aug 1, 2025, 6:55 PM

#

Agentic ide such as cursor

torn mantle Aug 1, 2025, 6:55 PM

#

brian is ignoring me 🙁

blazing bison Aug 1, 2025, 6:55 PM

#

amazon cursor

#

kiro = amazon cursor

keen fulcrum Aug 1, 2025, 6:56 PM

#

They advertised it as unlimited before, and now they're walking that back.

pure anvil Aug 1, 2025, 6:56 PM

#

keen fulcrum Kiro pricing revealed

I see their ads on Reddit all the time throwing shade at other agentic tools lmao for their rate limits

ocean vortex Aug 1, 2025, 6:56 PM

#

blazing bison i tried the same with opus on claude and they weekly limited me now after 2 days

Yeah Anthropic is hilariously bad with their limits. And somehow people still manage to make up excuses for them lmao

blazing bison Aug 1, 2025, 6:56 PM

#

like when the models get good enough i'm not gonna need to do 100 requests

keen fulcrum Aug 1, 2025, 6:56 PM

#

$0.20 for spec request and $0.04 for vibe request after you used your quota
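For a rough sense of what those overage rates add up to, here is a minimal Python sketch. Only the two per-request prices come from the message above; the function name and the example request counts are made up for illustration, and the plan quotas themselves are not modeled.

```python
# Overage prices as quoted above; everything else here is hypothetical.
SPEC_OVERAGE_USD = 0.20  # per spec request after the quota is used up
VIBE_OVERAGE_USD = 0.04  # per vibe request after the quota is used up


def overage_cost(extra_spec_requests: int, extra_vibe_requests: int) -> float:
    """Return the USD cost of requests made beyond the plan quota."""
    if extra_spec_requests < 0 or extra_vibe_requests < 0:
        raise ValueError("request counts must be non-negative")
    total = (
        extra_spec_requests * SPEC_OVERAGE_USD
        + extra_vibe_requests * VIBE_OVERAGE_USD
    )
    # Round to cents so floating-point noise does not leak into the result.
    return round(total, 2)


if __name__ == "__main__":
    # Example: 50 extra spec requests and 200 extra vibe requests.
    print(overage_cost(50, 200))
```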

blazing bison Aug 1, 2025, 6:57 PM

#

i just reach 100 requests bcs of models doing dumb mistakes

ocean vortex Aug 1, 2025, 6:57 PM

#

They were just about never good on value

blazing bison Aug 1, 2025, 6:57 PM

#

so maybe with claude 6 the price will be reasonable

#

bcs the model will solve problems with fewer prompts

keen beacon Aug 1, 2025, 6:57 PM

#

ocean vortex They were just about never good on value

claude max was good value before they made the recent change tho

blazing bison Aug 1, 2025, 6:57 PM

#

yeah, it was the best

keen fulcrum Aug 1, 2025, 6:57 PM

#

keen beacon claude max was good value before they made the recent change tho

Still is

blazing bison Aug 1, 2025, 6:57 PM

#

it's not

#

you could, there was no rate limit

#

it's like their rate limit was not working or something

serene cliff Aug 1, 2025, 6:58 PM

#

how to make video here with sound?

keen beacon Aug 1, 2025, 6:58 PM

#

who thought that was a good idea at anthropic lol given their limited compute compared to other companies

whole wagon Aug 1, 2025, 6:58 PM

#

GPT5 is going to cook Gemini 2.5 it's obvious. They better be working hard on Gemini 3 rn lol

pure anvil Aug 1, 2025, 6:58 PM

#

keen beacon claude max was good value before they made the recent change tho

there's nothing better tho, openai models are trash for agentic coding atm so maybe gpt5 will change that

echo aurora Aug 1, 2025, 6:58 PM

#

serene cliff how to make video here with sound?

Check out #1397655624103493813 for more info

keen fulcrum Aug 1, 2025, 6:59 PM

#

blazing bison you could, there was no rate limit

Almost every user reaches rate limit already

blazing bison Aug 1, 2025, 6:59 PM

#

keen fulcrum Almost every user reaches rate limit already

before the change i didnt

#

and i was using it a lot with a lot of context

#

but i was paying for the $200 plan

#

now with 2 days i reached the week limit

keen fulcrum Aug 1, 2025, 7:00 PM

#

blazing bison now with 2 days i reached the week limit

They didn’t change anything yet

#

August 28

blazing bison Aug 1, 2025, 7:00 PM

#

keen fulcrum They didn’t change anything yet

do you want me to share screen?

#

wtf

ocean vortex Aug 1, 2025, 7:00 PM

#

Honestly I think they are simply making a mistake. It's a short-sighted approach that has a high likelihood to hurt them long term and ensure it never beats chatgpt in popularity... They nuked their availability before they were in a position to do so IMO

blazing bison Aug 1, 2025, 7:00 PM

#

so maybe you can unlock my account or smth since apparently you work at anthropic

whole wagon Aug 1, 2025, 7:01 PM

#

They have veo3 still

blazing bison Aug 1, 2025, 7:01 PM

#

veo3 is not that good

#

idk i don't think it's worth $250

#

like only if you don't pretend to generate revenue with it

serene cliff Aug 1, 2025, 7:02 PM

#

is there an option for sound in video?

blazing bison Aug 1, 2025, 7:02 PM

#

i wasted $400 this month on 2 pro subscriptions, but they accelerated my work like 10x

#

claude max and gpt pro

pure anvil Aug 1, 2025, 7:03 PM

#

what work were you doing that it accelerated 10x?

#

lol

keen fulcrum Aug 1, 2025, 7:03 PM

#

blazing bison i wasted $400 this month on 2 pro subscriptions, but they accelerated my work lik...

Are you considering supergrok when their coding model releases

blazing bison Aug 1, 2025, 7:03 PM

#

crud basically

blazing bison Aug 1, 2025, 7:04 PM

#

keen fulcrum Are you considering supergrok when their coding model releases

no, i'm not paying $300

digital umbra Aug 1, 2025, 7:04 PM

#

veo 3 is pretty good compared to other video models at least

#

i wonder what sota will be next year lol

keen fulcrum Aug 1, 2025, 7:04 PM

#

blazing bison no, i'm not paying $300

You get grok 4 heavy, coder, multimodal and video model later this year

blazing bison Aug 1, 2025, 7:04 PM

#

i just wasted $400 bcs anthropic f*** with my account

#

apparently the unlimited plan is not unlimited

#

openai never did that, even after i abused it a lot

ocean vortex Aug 1, 2025, 7:05 PM

#

Wait a sec...

#

Ultra plan was released roughly 90 days ago was it not...

#

and only first 90 days were the discounted price

#

lmfao

blazing bison Aug 1, 2025, 7:06 PM

#

i considered getting one day of google ultra, google is easy to get a refund from if you don't abuse it

#

if the model is good then ok, i would keep it

#

but after seeing that it's 10 requests / day

#

🤦‍♂️

hollow imp Aug 1, 2025, 7:07 PM

#

blazing bison i considered getting one day of google ultra, google is easy to get a refund from if you ...

30 day free trial

keen beacon Aug 1, 2025, 7:07 PM

#

you mean ultra? pro doesnt get it i think

blazing bison Aug 1, 2025, 7:07 PM

#

ye

#

ultra

#

so many names

granite jay Aug 1, 2025, 7:08 PM

#

Hi

hollow imp Aug 1, 2025, 7:08 PM

#

Any fireship viewer?

patent aspen Aug 1, 2025, 7:09 PM

#

I think 6-8 weeks is pretty realistic

hollow imp Aug 1, 2025, 7:15 PM

#

Isn't sota some openai video generation model?

keen beacon Aug 1, 2025, 7:15 PM

#

🤣

patent aspen Aug 1, 2025, 7:15 PM

#

hollow imp Isn't sota some openai video generation model?

That's sora haha

ocean vortex Aug 1, 2025, 7:20 PM

#

blazing bison but after seeing that it's 10 requests / day

agree

#

They expect you to pay up first, and only then receive a chance to even see if it's any good

#

Their blogpost alone is nowhere near enough to tell

#

and 10RPD you are still very constrained. So no proper testing of any kind and forget the benchmarks

ocean vortex Aug 1, 2025, 7:24 PM

#

If this deep think is indeed based on Ultra, I think the odds of Gemini3 beating GPT5 just got way lower LOL

storm needle Aug 1, 2025, 7:26 PM

#

ocean vortex and 10RPD you are still very constrained. So no proper testing of any kind and f...

and it's not even deep think version that won the gold medal

#

it's a scam

ocean vortex Aug 1, 2025, 7:28 PM

#

But it's so weird to market a huge model as a math-oriented one while completely leaving out things like SimpleQA. Unless they used some derivative of a model meant for competing at IMO. But then it makes even less sense to use this for public release as their overall top performing model.

blazing bison Aug 1, 2025, 7:28 PM

#

and they said on the announcement of the gold medal, that they would allow everyone to use the model

#

misleading

#

😆

#

when google releases something good, really good, bet on Logan marketing it

#

if Logan is in silence, then it's not good

keen beacon Aug 1, 2025, 7:29 PM

#

if gemini 3 doesnt beat gpt 5 its a very bad sign for gdm tbh

blazing bison Aug 1, 2025, 7:29 PM

#

keen beacon if gemini 3 doesnt beat gpt 5 its a very bad sign for gdm tbh

i think they gonna be on the same level

keen beacon Aug 1, 2025, 7:30 PM

#

given how its a new pretrained model and theyve pretrained two fresh model generations since 4o

ocean vortex Aug 1, 2025, 7:30 PM

#

Well yeah for starters you have a "deep think" button which is only available when you have selected 2.5Pro, their previous best performing model. This strongly implies you should use it for the best possible performance

zinc ore Aug 1, 2025, 7:31 PM

#

ocean vortex If this deep think is indeed based on Ultra, I think the odds of Gemini3 beating...

What about the IMO gold deepthink, which only a select number of people get which is much more performant?

keen beacon Aug 1, 2025, 7:31 PM

#

they cant host it practically

ocean vortex Aug 1, 2025, 7:31 PM

#

zinc ore What about the IMO gold deepthink, which only a select number of people get whic...

it's a specialist model meant only for math

zinc ore Aug 1, 2025, 7:31 PM

#

No, it generalizes to many reasoning tasks

#

It's literally just a souped-up version of the deepthink they're offering

ocean vortex Aug 1, 2025, 7:32 PM

#

it does "work" for all tasks, but it was still tuned for math

zinc ore Aug 1, 2025, 7:32 PM

#

But "slower"

#

They claim it is SOTA at coding as well and "other reasoning tasks" as they vaguely mention

#

And the main difference is the current one offered is a faster version

#

So if current deepthink is generalized at reasoning tasks, then the other version should be too

ocean vortex Aug 1, 2025, 7:33 PM

#

tbh chances of that model doing better than your standard 2.5Pro at things like coding or your typical everyday tasks not involving math are very very slim. They trained it to perform as well as possible at IMO with no compromises while still keeping it usable.

zinc ore Aug 1, 2025, 7:34 PM

#

ocean vortex tbh chances of that model doing better than your standard 2.5Pro at things like ...

They literally say otherwise

ocean vortex Aug 1, 2025, 7:35 PM

#

zinc ore They literally say otherwise

where exactly do they say it performs better at coding than 2.5Pro?

#

they do not lol

zinc ore Aug 1, 2025, 7:35 PM

#

Yes they do lmao

ocean vortex Aug 1, 2025, 7:35 PM

#

?

zinc ore Aug 1, 2025, 7:35 PM

#

They said it is SOTA at coding and other reasoning tasks

ocean vortex Aug 1, 2025, 7:36 PM

#

link

zinc ore Aug 1, 2025, 7:36 PM

#

This was from a week or two ago, whenever the IMO happened

#

And current deepthink offering is literally the same system but faster as they say

ocean vortex Aug 1, 2025, 7:37 PM

#

zinc ore This was from a week or two ago, whenever the IMO happened

they said this and nowhere in that did they claim it performs at non-math tasks better than 2.5Pro https://deepmind.google/discover/blog/advanced-version-of-gemini-with-deep-think-officially-achieves-gold-medal-standard-at-the-international-mathematical-olympiad/

Google DeepMind

Advanced version of Gemini with Deep Think officially achieves gold...

Our advanced model officially achieved a gold-medal level performance on problems from the International Mathematical Olympiad (IMO), the world’s most prestigious competition for young...

#

no source = didn't happen

zinc ore Aug 1, 2025, 7:39 PM

#

Deepmind employee says it, I don't think they mention it in that blog

ocean vortex Aug 1, 2025, 7:39 PM

#

Also why do you think they are still focusing on math with the current deep think released today? It would make no sense unless it's a derivative of that math oriented model, like I've already said

ocean vortex Aug 1, 2025, 7:40 PM

#

zinc ore Deepmind employee says it, I don't think they mention it in that blog

Well then link that tweet lol

zinc ore Aug 1, 2025, 7:40 PM

#

https://x.com/lmthang/status/1948458590492393834

Thang Luong (@lmthang)

Right before #imo2025, together with colleagues from Mountain View, NYC, Singapore, etc, we all gathered at @GoogleDeepMind headquarter in London for our final push for IMO. I believe that week was when all magic happened!

We put all individual recipes (that we figured out

#

"We finished training 2 days before IMO 😄 That model achieved SOTA results, not just for math, but coding along with other reasoning tasks, unbelievable!"

#

He's one of the leads of the IMO team lol

#

And they're literally saying they're offering that model to mathematicians right now, while the current one is based on the same system but faster

#

I don't make crap up, I just repeat what I've actually read

brittle tiger Aug 1, 2025, 7:44 PM

#

Deep Think prompt: Create a visually impressive Pokemon battle simulator web based game

https://g.co/gemini/share/96cea0058d5a

Gemini

‎Gemini - Pokémon Battle Simulator Implementation

Created with Gemini

ocean vortex Aug 1, 2025, 7:44 PM

#

zinc ore https://x.com/lmthang/status/1948458590492393834

Ok fair, but it's just confusing af 😄
If that was the same model, why can't it score on IMO the same after the fact even when they had the time with all the data and solutions out there? And if it's SOTA on coding and "other reasoning tasks", why no metrics for that?

zinc ore Aug 1, 2025, 7:45 PM

#

Because it's unreleased lol

#

We don't know gpt5s benchmarks yet

ocean vortex Aug 1, 2025, 7:47 PM

#

zinc ore Because it's unreleased lol

If you assume that current deep think is based on Ultra, it would be unreasonable to assume that a) a different finetune of that performs so much better everywhere and also b) that they just released a much lesser version for $300 a month with 10rpd

zinc ore Aug 1, 2025, 7:48 PM

#

https://vxtwitter.com/archit_sharma97/status/1951307373219623281

Archit Sharma (@archit_sharma97)

Gemini 2.5 Deep Think is out!! We were able to improve the model substantially since our announcement at I/O, and it is a faster variation of the system that got Gold 🥇at IMO (still getting bronze level performance🥉!!)

The model is p good at detailed creative tasks too! https://t.co/uxNeFki8oR

#

They're literally advertising it as the IMO gold model, but a faster variation

ocean vortex Aug 1, 2025, 7:49 PM

#

"faster variation" --> less test-time compute = same base model

#

That's how I'm reading this

timber kiln Aug 1, 2025, 7:49 PM

#

ocean vortex If you assume that current deep think is based on Ultra, it would be unreasonabl...

Very bad assumption
You don't release the highest cost version of a model first ever
You start with the normal one and then go higher

ocean vortex Aug 1, 2025, 7:50 PM

#

timber kiln Very bad assumption You don't release the highest cost version of a model first ...

So you think Ultra with parallel test time compute is "normal one"? 🤣

timber kiln Aug 1, 2025, 7:50 PM

#

ocean vortex So you think Ultra with parallel test time compute is "normal one"? 🤣

No this is pro with parallel compute
They would release Ultra first and then parallel later

zinc ore Aug 1, 2025, 7:51 PM

#

ocean vortex "faster variation" --> less test-time compute = same base model

https://vxtwitter.com/lmthang/status/1951311980960350276

Same guy I just shared talking about it with the YOLO run from the tweet above

Thang Luong (@lmthang)

Our IMO journey continues: the yolo run model that we trained a week before #imo2025, despite all possible likelihood of failures, magically achieves SOTA across a wide range of reasoning tasks from maths, to coding, and challenging knowledge. I'm very excited that we have now delivered the IMO 🥇 system to the hands of mathematicians and a simplified version (results below) to all Google AI Ultra subscribers.

QRT: lmthang
Right before #imo2025, together with colleagues from Mountain View, NYC, Singapore, etc, we all gathered at @GoogleDeepMind headquarter in London for our final push for IMO. I believe that week was when all magic happened!

We put all individual recipes (that we figured out before) together and did a yolo run (with the compute that I had to beg various groups to loan) to train our most advanced Gemini model. We finished training 2 days before IMO :D That model achieved SOTA results, not just for math, but coding along with other reasoning tasks, unbe…

#

He calls it a "simplified version" here

ocean vortex Aug 1, 2025, 7:52 PM

#

timber kiln No this is pro with parallel compute They would release Ultra first and then par...

well according to @patent aspen it is definitively Ultra 🤷‍♂️

zinc ore Aug 1, 2025, 7:52 PM

#

But again, connecting it to the YOLO IMO gold run, and calling it a variation of that

torn mantle Aug 1, 2025, 7:53 PM

#

zinc ore He calls it a "simplified version" here

Simplified version and its 10 prompts per day for like 200$

#

They can keep it

ocean vortex Aug 1, 2025, 7:54 PM

#

zinc ore https://vxtwitter.com/lmthang/status/1951311980960350276 Same guy I just shared...

yeah so... o3-preview with crazy test-time compute type of model to a few people, and then more realistic one to the people paying $300

zinc ore Aug 1, 2025, 7:54 PM

#

Yeah, I wish there were more benchmarks to compare it to o3 pro and grok heavy

ocean vortex Aug 1, 2025, 7:54 PM

#

Base model is likely the same, just a different number of parallel instances and the way that system is run etc

zinc ore Aug 1, 2025, 7:56 PM

#

"magically achieves SOTA"

#

Ie hype phrasing saying it generalizes beyond math

ocean vortex Aug 1, 2025, 7:57 PM

#

timber kiln No this is pro with parallel compute They would release Ultra first and then par...

If there are no gains to show they wouldn't necessarily release it at all. Just look at Flash vs Pro, here (with Ultra) the differences are probably even smaller and maybe no contrast in SimpleQA even. But parallel compute amplifies any differences and gains

hollow imp Aug 1, 2025, 7:59 PM

#

Even 10 prompts a day is enough just give it to me 🙏

ocean vortex Aug 1, 2025, 8:00 PM

#

Marginal gains, like getting the correct answer only occasionally as opposed to never at all, can turn into getting it right most of the time once you add parallel compute.
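A rough way to see that amplification, as a minimal Python sketch. It assumes each parallel attempt succeeds independently with the same probability and that the system can reliably pick out a correct attempt when one exists; neither assumption is confirmed for Deep Think, and the function name and the 5% figure are purely illustrative.

```python
def p_any_correct(p_single: float, n_attempts: int) -> float:
    """Chance that at least one of n independent attempts is correct."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if n_attempts < 1:
        raise ValueError("n_attempts must be at least 1")
    # Every attempt fails with probability (1 - p), so all n fail with
    # probability (1 - p) ** n; the complement is "at least one succeeds".
    return 1.0 - (1.0 - p_single) ** n_attempts


if __name__ == "__main__":
    # A model that is right only 5% of the time per attempt (illustrative).
    for n in (1, 8, 32):
        print(n, round(p_any_correct(0.05, n), 3))
```

If the model never gets it right (p = 0), no amount of parallel sampling helps, which is why a small per-attempt gain can matter so much more than it looks.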

brittle tiger Aug 1, 2025, 8:01 PM

#

I've definitely done more than 10 prompts today fwiw

keen beacon Aug 1, 2025, 8:01 PM

#

is it a soft limit right now?

zinc ore Aug 1, 2025, 8:02 PM

#

https://fxtwitter.com/lmthang/status/1951318861170745854

Thang Luong (@lmthang)

@swyx @BlackHC I can't share a lot at this point, but clearly how much we we allow the model to think is the main axis of the simplification.

hollow imp Aug 1, 2025, 8:05 PM

#

If it's really sota sota then play a game of chess till just 15 moves without hallucinating

trim sand Aug 1, 2025, 8:14 PM

#

Why are all the posts here so weird

ocean vortex Aug 1, 2025, 8:15 PM

#

zinc ore https://fxtwitter.com/lmthang/status/1951318861170745854

So same base model essentially confirmed. But their current cost constraints would not allow them to offer anything better than people got today. That's the best they can do

#

in a nutshell

#

Smth like 100k+ thinking from a huge model with a ton of parallel instances is just not realistic to serve

#

Too concise...

torn mantle Aug 1, 2025, 8:58 PM

#

oai really ruined it all for us

#

with the astronomical monthly 200$ plan

#

i mean i knew other labs will follow suit

#

but whats this?????

#

10 prompts per day for 200$ ?????????????

ocean vortex Aug 1, 2025, 9:01 PM

#

Sure, but then you can't charge ~~200$~~ $300

#

or you may as well become irrelevant soon enough. Or less relevant than you were hoping for 👀

#

It's all for nothing if it doesn't materialize and does not reach people

#

yeah like... People couldn't care less about things "more strategically important to the company", and the company itself will cease to be important if people can't be satisfied and the demand can't be met

primal orbit Aug 1, 2025, 9:18 PM

#

https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-2-5-Deep-Think-Model-Card.pdf

#

more benchmark data within safety section

torn mantle Aug 1, 2025, 9:26 PM

#

i see

#

it make sense but the rate limits are just brutal

elder rapids Aug 1, 2025, 9:32 PM

#

so how is deepthink?

blazing bison Aug 1, 2025, 10:03 PM

#

lmao openai staff got caught using anthropic models

#

https://www.wired.com/story/anthropic-revokes-openais-access-to-claude/

WIRED

Anthropic Revokes OpenAI's Access to Claude

OpenAI lost access to the Claude API this week after Anthropic claimed the company was violating its terms of service.

#

that's funny

#

if even openai doesn't use openai models, i don't know what to expect from gpt 5

#

@patent aspen coding

#

anthropic staff look into your data remember that

#

the data policy of anthropic is the worst of all

#

they said that openai was using their models for AI-improvement-related tasks

#

ye, idk if openai was drawing the line

patent aspen Aug 1, 2025, 10:08 PM

#

blazing bison ye, idk if openai was drawing the line

I think OAI is the least legally compliant of the trio

blazing bison Aug 1, 2025, 10:08 PM

#

it's not like OAI itself was doing it, as in sam telling them to do it

#

but their staff was

#

they cut access from personal employees

#

https://www.theinformation.com/articles/inside-openais-rocky-path-gpt-5

#

this article is bad news too

patent aspen Aug 1, 2025, 10:11 PM

#

elder rapids so how is deepthink?

I think it's great for math, science, really hard computer science problems like distributed computing. It's meh at coding although very few bugs

blazing bison Aug 1, 2025, 10:11 PM

#

now i'm not sure if gpt 5 is coming next week

#

apparently gpt 5 is not a big leap from 4o

elder rapids Aug 1, 2025, 10:13 PM

#

if they're the models we've been seeing on lmarena

#

they are big leaps

blazing bison Aug 1, 2025, 10:14 PM

#

google is also facing difficulties

elder rapids Aug 1, 2025, 10:14 PM

#

when does 3 come?

keen beacon Aug 1, 2025, 10:14 PM

#

doesnt openai leak info to the information? (or at least it seems that way) maybe theyre trying to downplay it a bit and let everyone be a little surprised

blazing bison Aug 1, 2025, 10:14 PM

#

the only lab that isn't having difficulties is anthropic

#

the paper that they released today wtf bro

#

even with mark offering a bunch of money, anthropic researchers refused

#

what is happening there

elder rapids Aug 1, 2025, 10:16 PM

#

do you think it's going to be much different than the 2.5 series

keen beacon Aug 1, 2025, 10:16 PM

#

people were impressed by zenith/etc. and there have been massive preparations for gpt-5 in the frontend/backend apparently that people have datamined. it would be odd for it to be significantly delayed

loud leaf Aug 1, 2025, 10:16 PM

#

blazing bison this article is bad news too

is the article worth $299

blazing bison Aug 1, 2025, 10:17 PM

#

loud leaf is the article worth $299

no

#

you really don't know how to read it?

loud leaf Aug 1, 2025, 10:18 PM

#

ty archive.ph no luck

blazing bison Aug 1, 2025, 10:18 PM

#

np

loud leaf Aug 1, 2025, 10:21 PM

#

not much actual info on gpt5 there

elder rapids Aug 1, 2025, 10:21 PM

#

you wish craig

loud leaf Aug 1, 2025, 10:22 PM

#

i was expecting it to be an underwhelming wrapper that just routes to best previously existing model, but feedback on zenith suggested sota

patent aspen Aug 1, 2025, 10:23 PM

#

Even if we did hear about Apple announcing they would acquire Anthropic, it wouldn't be confirmed because of the subsequent FTC and congressional approvals

loud leaf Aug 1, 2025, 10:23 PM

#

only definitive claim it makes is the leap won't be as big as gpt3 -> 4 and like... yeah

patent aspen Aug 1, 2025, 10:24 PM

#

In that situation they'd probably have about a 60-70% chance of success, but the risks of opening an antitrust investigation probably wouldn't be worth it

loud leaf Aug 1, 2025, 10:25 PM

#

patent aspen Even if we did hear about Apple announcing they would acquire Anthropic, it woul...

kick trump $5m dono should be fine

patent aspen Aug 1, 2025, 10:25 PM

#

loud leaf kick trump $5m dono should be fine

They would also have the EU to worry about

#

The problem is that, even if it were only a tail risk, a tail risk of potentially doing major damage to your core business probably wouldn't be worth it for Apple

#

And they could get most of the same benefits by partnering

jade egret Aug 1, 2025, 10:43 PM

#

when gpt-5

patent aspen Aug 1, 2025, 10:44 PM

#

Then they can participate in the AI race and have more negotiation leverage. It would derisk their business a bit

#

At the moment they're a luxury real estate company as far as AI is concerned

#

Their service businesses are also threatened by AI to some extent

#

I'm talking mega long term

#

The other options are off the table, and if they own Anthropic, they can make it whatever they want

#

They probably can't buy Google, OAI, or xAI

#

They don't have the talent

#

I thought you were talking about them building their own models

#

I mean even OAI is using TPUs over Nvidia on GCP so...

torn mantle Aug 1, 2025, 10:58 PM

#

why is it a myth

#

elaborate

#

its actually years ahead of any major lab

#

they can have their own internal mini cuda but pretty sure its nowhere near it

patent aspen Aug 1, 2025, 10:59 PM

#

IMO Cuda is replaceable because the AI companies can just push software up to a higher layer of abstraction given a long enough time horizon

#

At a certain point, you just use PyTorch, Tensorflow, Jax

#

They are now but they can wrap ASICS too and eventually that's just cheaper

#

Developers just use high level libraries

leaden palm Aug 1, 2025, 11:03 PM

#

whatever jax uses

patent aspen Aug 1, 2025, 11:04 PM

#

XLA is basically an ML compiler

south phoenix Aug 1, 2025, 11:11 PM

#

where did the claude models go?

echo aurora Aug 1, 2025, 11:12 PM

#

south phoenix where did the claude models go?

I see them still there...

#

you're not?

south phoenix Aug 1, 2025, 11:13 PM

#

echo aurora I see them still there...

i see everything other than the anthropic models

echo aurora Aug 1, 2025, 11:19 PM

#

south phoenix i see everything other than the anthropic models

Can you send a screenshot?

torn mantle Aug 1, 2025, 11:20 PM

#

patent aspen IMO Cuda is replaceable because the AI companies can just push software up to a ...

yea but the abstraction wont be perfect... its not like pytorch or tensorflow will be used for anything, i mean to achieve a similar performance like in cuda you need to align perfectly code & hardware.. that's why there are libs like cudnn & cublas that are engineered precisely to get max performance from their tensor cores. and lets say we for example moved to amd rocm even though there's support the performance wont be the same

#

take throughput diff between a100 & mi215 for example

#

which is actually 20%

south phoenix Aug 1, 2025, 11:21 PM

#

echo aurora Can you send a screenshot?

torn mantle Aug 1, 2025, 11:21 PM

#

and for tpus, you have to be married by contract to google to use them, so i wont talk about that

#

for aws, their ceo literally said trainium is like a supplement to nvidia gpus, and for the majority of workloads they will still use nvidia

#

so it only created a little competition with billions $$ spent and many collabs too

#

yea because we are talking about an ecosystem

patent aspen Aug 1, 2025, 11:26 PM

#

torn mantle yea but the abstraction wont be perfect... its not like pytorch or tensorflow wi...

Right I mean Jax is basically a high level library built on top of TPUs that can also interface with GPUs. I think the trend towards higher and higher level abstractions over ASICS is already in motion and will continue

echo aurora Aug 1, 2025, 11:27 PM

#

south phoenix

That's strange. What browser are you using?

#

Are others seeing the same ^ ? (claude models not appearing in list)

warm fulcrum Aug 1, 2025, 11:28 PM

#

echo aurora Are others seeing the same ^ ? (claude models not appearing in list)

nope they're there

south phoenix Aug 1, 2025, 11:28 PM

#

echo aurora That's strange. What browser are you using?

im using brave

south phoenix Aug 1, 2025, 11:28 PM

#

echo aurora Are others seeing the same ^ ? (claude models not appearing in list)

yep

warm fulcrum Aug 1, 2025, 11:28 PM

#

south phoenix im using brave

check with google

south phoenix Aug 1, 2025, 11:28 PM

#

warm fulcrum check with google

bro what 💀

warm fulcrum Aug 1, 2025, 11:28 PM

#

south phoenix bro what 💀

google chrome

#

check if its a browser issue

south phoenix Aug 1, 2025, 11:29 PM

#

warm fulcrum google chrome

okay dont say google because thats not a browser

#

just say chrome

warm fulcrum Aug 1, 2025, 11:29 PM

#

buddy u dont have to take it literally

patent aspen Aug 1, 2025, 11:29 PM

#

Basically Microsoft and the companies that aren't tech giants

warm fulcrum Aug 1, 2025, 11:29 PM

#

u know what i meant

south phoenix Aug 1, 2025, 11:29 PM

#

warm fulcrum u know what i meant

i got confused for a sec

#

yeah on chrome its visible

patent aspen Aug 1, 2025, 11:30 PM

#

The Nvidia ecosystem is still pretty dominant today. I just don't think that will be the long term trend

warm fulcrum Aug 1, 2025, 11:31 PM

#

south phoenix yeah on chrome its visible

disable all extensions temporarily

#

on brave

patent aspen Aug 1, 2025, 11:33 PM

#

I think a lot of legacy code will remain Nvidia-based for decades though

torn mantle Aug 1, 2025, 11:35 PM

#

patent aspen The Nvidia ecosystem is still pretty dominant today. I just don't think that wil...

while i agree, 'long-term' is kinda vague

#

ofc things will change in the future

#

but i would give it like +10 years or more to replicate something like cuda

patent aspen Aug 1, 2025, 11:37 PM

#

Nvidia is well positioned enough that they will always be relevant. I just don't think it's actually necessary to replicate CUDA if you can offer comparable performance at 1/5 the cost

torn mantle Aug 1, 2025, 11:37 PM

#

just the migration process will be a headache

#

if this so 'imaginary' company succeeded

patent aspen Aug 1, 2025, 11:38 PM

#

That's definitely relevant for a lot of companies, although if the major frameworks migrate, then it's way less work to migrate

#

Like imagine if <insert your favorite ML framework> just has a one-line config to select the hardware backend

torn mantle Aug 1, 2025, 11:42 PM

#

patent aspen Nvidia is well positioned enough that they will always be relevant. I just don't...

yea but thats a big IF

#

i hope you are not only taking TFLOPS as the only criteria

patent aspen Aug 1, 2025, 11:43 PM

#

torn mantle yea but thats a big IF

Why wouldn't it happen if the economic incentives became big enough?

torn mantle Aug 1, 2025, 11:44 PM

#

again if your software stack isnt as optimized as cuda, then its a waste of time, they all have cards with good theoretical performance

#

1/5 is just TCO

#

cost of ownership

#

what about electricity

#

what about space

#

thats if we are assuming that the performance is like 70%

patent aspen Aug 1, 2025, 11:46 PM

#

torn mantle what about space

Why buy electricity and space? That's what the cloud is for

echo aurora Aug 1, 2025, 11:48 PM

#

south phoenix im using brave

I'll let team know, but yeah doesn't appear to be widespread, I wasn't able to repro even on Brave browser either.

zinc ore Aug 1, 2025, 11:48 PM

#

https://vxtwitter.com/testingcatalog/status/1951320162541388045

FYI, this guy has access to the gold IMO deepthink model, and has been sharing some tweets about what it makes

TestingCatalog News 🗞 (@testingcatalog)

Gemini Deep Think IMO 👀

It is one of the first models which I am testing extensively b/c it is very fun to play with.

"Cyberpunk nuclear reactor control interface" https://t.co/y5zHfZYm6Y

QRT: testingcatalog
I have Gemini "Deep Think IMO" mode 👀

What should I ask? https://t.co/EhDw7kOAb3

torn mantle Aug 1, 2025, 11:48 PM

#

patent aspen Why buy electricity and space? That's what the cloud is for

yea thats from a user perspective, but whos paying the bill?

#

whos calculating tco?

south phoenix Aug 1, 2025, 11:48 PM

#

echo aurora I'll let team know, but yeah doesn't appear to be widespread, I wasn't able to r...

yeah would be nice to have that error fixed

torn mantle Aug 1, 2025, 11:53 PM

#

also

#

why are we assuming nvidia will just stand still?

patent aspen Aug 1, 2025, 11:56 PM

#

torn mantle why are we assuming nvidia will just stand still?

Mainly because they make 80+% margins on hardware

#

That's an opportunity if I ever saw one

patent aspen Aug 1, 2025, 11:57 PM

#

torn mantle yea thats from a user perspective, but whos paying the bill?

TCO includes the electricity, space, etc

#

Well definitely the electricity at least

blazing bison Aug 2, 2025, 12:37 AM

#

zinc ore https://vxtwitter.com/testingcatalog/status/1951320162541388045 FYI, this guy h...

ye, people with deep think imo don't pay for it and have unlimited usage, and that's without even mentioning that the model is better. It's like a spit in the face of paying customers

torn mantle Aug 2, 2025, 1:07 AM

#

patent aspen TCO includes the electricity, space, etc

yea sorry i meant tco is what should be calculated/used as ref not just gpu cost only

#

it does include space & electricity
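
A rough worked example of the point above, that TCO (hardware plus electricity plus space) is the number to compare and not the GPU price alone. All dollar figures, power draw, and lifetime below are made-up placeholders. Only the "1/5 the cost" and "70% performance" ratios come from the conversation, and the function names are invented for this sketch.

```python
HOURS_PER_YEAR = 24 * 365


def total_cost_of_ownership(hardware_usd, power_kw, usd_per_kwh, space_usd_per_year, years):
    """Hardware price plus electricity and space over the service life."""
    energy_usd = power_kw * HOURS_PER_YEAR * years * usd_per_kwh
    return hardware_usd + energy_usd + space_usd_per_year * years


def cost_per_unit_performance(tco_usd, relative_performance):
    """Normalize TCO by delivered performance (1.0 = incumbent)."""
    return tco_usd / relative_performance


# Placeholder inputs: same power draw and space, challenger at 1/5 the hardware price.
incumbent = total_cost_of_ownership(30_000, 0.7, 0.10, 1_000, 4)
challenger = total_cost_of_ownership(6_000, 0.7, 0.10, 1_000, 4)

hardware_ratio = 6_000 / 30_000
tco_ratio = challenger / incumbent
adjusted_ratio = (
    cost_per_unit_performance(challenger, 0.7)  # assumed 70% of incumbent performance
    / cost_per_unit_performance(incumbent, 1.0)
)

print(f"hardware price ratio: {hardware_ratio:.2f}")
print(f"TCO ratio: {tco_ratio:.2f}")
print(f"TCO ratio per unit of performance: {adjusted_ratio:.2f}")
```

With these placeholder inputs, a 1/5 hardware price works out to roughly a third of the TCO, and to about half once the 70% performance assumption is factored in. The gap narrows because electricity and space do not shrink with the sticker price.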

warm fulcrum Aug 2, 2025, 1:24 AM

#

blazing bison ye, people with deep think imo don't pay for it and has unlimited usage, not com...

how do they get it?

#

or is it just random

blazing bison Aug 2, 2025, 1:25 AM

#

warm fulcrum how do they get it?

friends

warm fulcrum Aug 2, 2025, 1:25 AM

#

blazing bison friends

wowie

blazing bison Aug 2, 2025, 1:27 AM

#

The deep think that they announced is the deep think imo. The deep think released is a slight improvement from gemini 2.5. Deep think imo looks like another model, you can check by running the same prompts (pelican, star shoot game, etc)

#

They added an unreasonable rate limit and a worse model for paid customers, while they gave influencers a much better model with unlimited requests

warm fulcrum Aug 2, 2025, 1:29 AM

#

so selfish..

oak bolt Aug 2, 2025, 1:32 AM

#

to create videos for my school projects 😛

blazing bison Aug 2, 2025, 1:43 AM

#

????????????????????????????

#

He actually made stuff up

#

The difference between the imo model and the released one is inference config only

#

you are not

#

he deleted his message saying that he is a googler

#

lmao

#

well, unlike you i can reference actual sources for the shi* i say

#

https://x.com/tulseedoshi/status/1951245891265778160

Tulsee Doshi (@tulseedoshi)

@burny_tech @GoogleDeepMind This is a variation of our IMO gold model that is faster and more optimized for daily use! We are also giving the IMO gold full model to a set of mathematicians to test the value of the full capabilities.

keen beacon Aug 2, 2025, 1:49 AM

#

we need more people asking how many rs are in strawberry using deepthink tbh

#

AGI benchmark

blazing bison Aug 2, 2025, 1:49 AM

#

this @patent aspen is a clown

#

#

https://x.com/giffmana/status/1951296536215978458

Lucas Beyer (bl16) (@giffmana)

@YiTayML So the IMO one is not exactly the general one?

#

"made stuff up"

#

dumb clown

#

mf is lying

#

deep think

#

is the same thing

#

there are no different deep thinks

hardy pecan Aug 2, 2025, 1:51 AM

#

Children stop fighting

blazing bison Aug 2, 2025, 1:51 AM

#

the only thing that changes is inference config

#

source?

#

yes

#

source? YES

ornate agate Aug 2, 2025, 1:51 AM

#

blazing bison mf is lying

He’s not Sundar Pichai. He just works there and didn’t make these decisions. Don’t have to be mean

blazing bison Aug 2, 2025, 1:51 AM

#

he is lying lol

#

why are you on his side

#

i'm showing actual sources from people who are directly on the project

#

and you guys are on the side of the guy

#

without sources?

#

lmao

#

but he is lying all the time

#

he deleted his lies

#

every time i show proof of the opposite of what this guy is saying, he deletes his message

#

why do you guys like liars lmao

paper nimbus Aug 2, 2025, 1:54 AM

#

https://cdn.discordapp.com/attachments/1365049274068631644/1400782321203806320/image0.gif

blazing bison Aug 2, 2025, 1:59 AM

#

???

#

bro the lies are a few messages back

#

there is a lot more too if you search his messages

#

i can create an exposé with more than 30 lies

#

that this guy made

#

this is sick

#

i think that you guys are the same account now

#

prob

#

the 3 of you lmao

#

i'm already making

#

a lot actually

#

already doing

#

but it's automated so

#

i can waste my time with whatever i want to

#

you're sick with 3 discord accounts lmao

#

kind of funny

#

blocked, i don't like to read lies

#

bye bye

#

bcs you're his alt account

#

you're the king of bs

keen beacon Aug 2, 2025, 2:06 AM

#

brian isnt lying btw

rare python Aug 2, 2025, 2:06 AM

#

https://fixupx.com/lmthang/status/1951311980960350276

Thang Luong (@lmthang)

Our IMO journey continues: the yolo run model that we trained a week before #imo2025, despite all possible likelihood of failures, magically achieves SOTA across a wide range of reasoning tasks from maths, to coding, and challenging knowledge. I'm very excited that we have now delivered the IMO 🥇 system to the hands of mathematicians and a simplified version (results below) to all Google AI Ultra subscribers.

Quoting Thang Luong (@lmthang)

Right before #imo2025, together with colleagues from Mountain View, NYC, Singapore, etc, we all gathered at @GoogleDeepMind headquarter in London for our final push for IMO. I believe that week was when all magic happened!

We put all individual recipes (that we figured out before) together and did a yolo run (with the compute that I had to beg various groups to loan) to train our most advanced Gemini model. We finished training 2 days before IMO :D That model achieved SOTA results, not just for math, but coding alo…

blazing bison Aug 2, 2025, 2:06 AM

#

imagine saying the opposite of the head of the gemini deep think project and using alt accounts to support it. Another level of commitment

#

yeah, he is lying

#

you're right

#

if you say so

keen beacon Aug 2, 2025, 2:10 AM

#

we're all brian's alts 🤣

blazing bison Aug 2, 2025, 2:11 AM

#

Even if you work at google, you are not part of the deep think team

#

bcs i know all of them

rare python Aug 2, 2025, 2:11 AM

#

@blazing bison

#

two versions of deepthink

blazing bison Aug 2, 2025, 2:12 AM

#

rare python @blazing bison

not 2 versions, one is deep think with prompt and the other just the model and the benchs

rare python Aug 2, 2025, 2:12 AM

#

2.5 pro deepthink got 80.4 LCB

#

2.5 deepthink got 87.6 lcb

blazing bison Aug 2, 2025, 2:13 AM

#

They aren't my friends, i just know who they are, and their respective discords too

rare python Aug 2, 2025, 2:13 AM

#

blazing bison not 2 versions, one is deep think with prompt and the other just the model and t...

elaborate

#

brian is right

blazing bison Aug 2, 2025, 2:15 AM

#

i'm not wasting my time anymore

#

bye

blazing bison Aug 2, 2025, 2:16 AM

#

rare python elaborate

if you can't read the messages above, that's not my problem

stray aspen Aug 2, 2025, 2:16 AM

#

rare python Aug 2, 2025, 2:17 AM

#

blazing bison if you can't read the messages above, that's not my problem

so wtf is with prompt and with the bench????????????????????????????????

#

I just showed you there are three different deepthink

patent aspen Aug 2, 2025, 2:19 AM

#

As it stands, it's just poor value for 99% of people

#

Nah

rare python Aug 2, 2025, 2:20 AM

#

You should tell the TPU team to scale up for DeepThink :D

#

You guys have to scale for both DeepThink and Gemini 3.0 damn

#

huh doesn't gemini 2.5 ultra have above 1M context or something? So why does DeepThink only have 100k? Cost?

civic flame Aug 2, 2025, 2:24 AM

#

blazing bison They aren't my friends, i just know who they are, and their respective discords ...

lol why would someone who works on deep think be spilling a bunch of insider info on their main anyway

rare python Aug 2, 2025, 2:25 AM

#

Bro is so eager to "gotcha" brian

#

💀

civic flame Aug 2, 2025, 2:25 AM

#

brian has never failed me 🙏

#

i trust him more than like 90% of people here

blazing bison Aug 2, 2025, 2:25 AM

#

civic flame lol why would someone who works on deep think be spilling a bunch of insider inf...

the problem is when the things they say don't match the things he says

rare python Aug 2, 2025, 2:25 AM

#

brain

blazing bison Aug 2, 2025, 2:26 AM

#

or do you believe that they are lying on X?

civic flame Aug 2, 2025, 2:26 AM

#

blazing bison the problem is when the things they say doesn't match with the things he says

scrolling up you're not going to be convinced no matter what anyone tells you

#

i'm not going to waste my time

#

have fun

blazing bison Aug 2, 2025, 2:26 AM

#

civic flame scrolling up you're not going to be convinced no matter what anyone tells you

He deleted his messages anyway

civic flame Aug 2, 2025, 2:26 AM

#

if you were here more than a day you'd know he does that all the time

#

and if you had half a brain you'd be able to figure out why as well