#general | Arena | Page 78

keen beacon Jul 28, 2025, 5:43 PM

#

I need them to be serious

blazing bison Jul 28, 2025, 5:43 PM

#

I feel that gemini is being nerfed

#

But maybe it's just me

versed totem Jul 28, 2025, 6:05 PM

#

blazing bison I feel that gemini is being nerfed

im noticing the same

keen fulcrum Jul 28, 2025, 7:13 PM

#

@echo aurora Can you make the bot show us the result of the votes?

echo aurora Jul 28, 2025, 7:14 PM

#

keen fulcrum @echo aurora Can you make the bot show us the result of the votes?

It’s possible we make that change. Be sure to note the #bot-feedback channel though for specific bot feedback so it’s easier for us to track.

stray aspen Jul 28, 2025, 7:18 PM

#

echo aurora It’s possible we make that change. Be sure to note the #bot-feedback ch...

do you have an estimated release date for the video arena?

stray aspen Jul 28, 2025, 7:51 PM

#

blazing bison I feel that gemini is being nerfed

im noticing that too

tight silo Jul 28, 2025, 7:51 PM

#

think it's a sign of a new model?

stray aspen Jul 28, 2025, 7:51 PM

#

bro

#

why is romlox word banned here

digital umbra Jul 28, 2025, 7:52 PM

#

people spammed it in the video arena

stray aspen Jul 28, 2025, 7:52 PM

#

stray aspen im noticing that too

or maybe grok is just better at roblocks coding

grizzled bobcat Jul 28, 2025, 7:53 PM

#

is unlimited time?

#

My question

wheat onyx Jul 28, 2025, 7:53 PM

#

https://x.com/chetaslua/status/1949905375546708242?s=19

Chetaslua (@chetaslua)

GPT - 5 attempt on minecraft

ZENITH TO BE EXACT

This is impressive this is magic .

@tszzl @aidan_mclau @apples_jimmy

OPEN AI has literally made something insane 🙏

unborn ocean Jul 28, 2025, 7:55 PM

#

sharing this for the 3rd time because it seems relevant

#

very few knew zhipu

digital umbra Jul 28, 2025, 7:55 PM

#

zhipu have used a few different names iirc

gentle plinth Jul 28, 2025, 7:56 PM

#

wheat onyx https://x.com/chetaslua/status/1949905375546708242?s=19

https://fixvx.com/chetaslua/status/1949905375546708242

#

With embed

wintry locust Jul 28, 2025, 8:00 PM

#

unborn ocean sharing this for the 3rd time because it seems relevant

i know all of these am i cooked

ocean vortex Jul 28, 2025, 8:00 PM

#

LOL

wintry locust Jul 28, 2025, 8:00 PM

#

youre missing minimax and baichuan and iflytek

ocean vortex Jul 28, 2025, 8:01 PM

#

wintry locust youre missing minimax and baichuan and iflytek

Yandex

void tusk Jul 28, 2025, 8:01 PM

#

is GLM-4.5 an improved version of gpt4.5?

wintry locust Jul 28, 2025, 8:01 PM

#

ocean vortex Yandex

that's russian...

#

i guess lg is korean

#

ok fair enough

ocean vortex Jul 28, 2025, 8:02 PM

#

Yeah but they are also irrelevant

digital umbra Jul 28, 2025, 8:02 PM

#

void tusk is GLM-4.5 an improved version of gpt4.5?

completely unrelated

gentle plinth Jul 28, 2025, 8:02 PM

#

void tusk is GLM-4.5 an improved version of gpt4.5?

It's a different model from a different company. But it's open source

wintry locust Jul 28, 2025, 8:02 PM

#

AI Sweden

gentle plinth Jul 28, 2025, 8:03 PM

#

void tusk is GLM-4.5 an improved version of gpt4.5?

https://huggingface.co/zai-org/GLM-4.5

zai-org/GLM-4.5 · Hugging Face

void tusk Jul 28, 2025, 8:03 PM

#

gentle plinth https://huggingface.co/zai-org/GLM-4.5

oh i see

#

thanks

stray aspen Jul 28, 2025, 8:04 PM

#

why do all the chinese models have a similar reasoning process

grizzled bobcat Jul 28, 2025, 8:04 PM

#

Is free?

stray aspen Jul 28, 2025, 8:04 PM

#

yes

grizzled bobcat Jul 28, 2025, 8:04 PM

#

I see it

daring rover Jul 28, 2025, 8:08 PM

#

is glm openai?

#

oh

#

it's a random company

keen beacon Jul 28, 2025, 8:08 PM

#

daring rover it's a random company

Yes, chinese

daring rover Jul 28, 2025, 8:12 PM

#

keen beacon Yes, chinese

baited

#

i thought it was openai's OS model for a split sec

keen beacon Jul 28, 2025, 8:13 PM

#

daring rover i thought it was openai's OS model for a split sec

GPT-5 will come early next month probably

#

so soon we'll have some new models

digital umbra Jul 28, 2025, 8:15 PM

#

i'm just really curious to see what the open source model will be called

keen beacon Jul 28, 2025, 8:16 PM

#

digital umbra i'm just really curious to see what the open source model will be called

GPT-5os

#

lol

#

open source

#

they will probably continue using some weird names

#

perhaps

#

Or maybe it's 3.5 finally open-sourced. I'd laugh

digital umbra Jul 28, 2025, 8:17 PM

#

if they include GPT or "o" in the name it's going to be so confusing lol

#

i'm guessing it's going to be a new model architecture rather than based on any of their proprietary models

keen beacon Jul 28, 2025, 8:18 PM

#

digital umbra i'm guessing it's going to be a new model architecture rather than based on any ...

I really hope that it will be useful and not be just some random tiny model

digital umbra Jul 28, 2025, 8:19 PM

#

well, considering they delayed it when kimi k2 released...

#

i guess it will have a few hundred billion parameters, or a dense model equivalent

#

if they release a 50b moe it's going to be so underwhelming lol

keen beacon Jul 28, 2025, 8:22 PM

#

digital umbra if they release a 50b moe it's going to be so underwhelming lol

Yeah, google and others will overtake them easily in that case

#

Gemma 3 27b is real good for its size for example

#

a bit aged already

reef pawn Jul 28, 2025, 8:29 PM

#

Gemini 3 pro when?

blazing bison Jul 28, 2025, 8:29 PM

#

I bet August

reef pawn Jul 28, 2025, 8:30 PM

#

Same month as GPT 5, right?

blazing bison Jul 28, 2025, 8:30 PM

#

I think gpt 5 will be released this week

reef pawn Jul 28, 2025, 8:30 PM

#

Oh okay

#

Can't wait

warm fulcrum Jul 28, 2025, 8:30 PM

#

how are some people using gpt-5 before its released

blazing bison Jul 28, 2025, 8:31 PM

#

They aren't

reef pawn Jul 28, 2025, 8:31 PM

#

warm fulcrum how are some people using gpt-5 before its released

WHO IS USING GPT 5?

digital umbra Jul 28, 2025, 8:31 PM

#

people speculating that openai is rushing their model before the eu ai act goes into effect, if that was the case i would think google would also be rushing something

warm fulcrum Jul 28, 2025, 8:31 PM

#

blazing bison They aren't

https://x.com/chetaslua/status/1949905375546708242?s=19

blazing bison Jul 28, 2025, 8:32 PM

#

Actually there are people that get access weeks before release, but these people generally don't talk about it

blazing bison Jul 28, 2025, 8:32 PM

#

warm fulcrum https://x.com/chetaslua/status/1949905375546708242?s=19

We can't be sure that zenith is gpt 5

#

I think it is but...

warm fulcrum Jul 28, 2025, 8:32 PM

#

well ye how are people able to use that model?

blazing bison Jul 28, 2025, 8:33 PM

#

warm fulcrum well ye how are people able to use that model?

It was available on the arena in the weekend but already got removed

reef pawn Jul 28, 2025, 8:33 PM

#

blazing bison It was available on the arena in the weekend but already got removed

Can you tell me its rank in LM arena?

blazing bison Jul 28, 2025, 8:34 PM

#

The rank is not public yet

warm fulcrum Jul 28, 2025, 8:34 PM

#

blazing bison It was available on the arena in the weekend but already got removed

unfortunate

reef pawn Jul 28, 2025, 8:34 PM

#

Oh okay

warm fulcrum Jul 28, 2025, 8:34 PM

#

how does lmarena even have ahold of these models?

keen beacon Jul 28, 2025, 8:35 PM

#

warm fulcrum how does lmarena even have ahold of these models?

OpenAI provides the pre-release models to LMArena

reef pawn Jul 28, 2025, 8:35 PM

#

Labs give them early access for testing

digital umbra Jul 28, 2025, 8:36 PM

#

they get a lot of useful feedback for putting models here

blazing bison Jul 28, 2025, 8:36 PM

#

warm fulcrum how does lmarena even have ahold of these models?

They hack ceos accounts and proxy it

warm fulcrum Jul 28, 2025, 8:36 PM

#

wowie

#

lets hope openai new model lives up to the hype

blazing bison Jul 28, 2025, 8:36 PM

#

Its good

#

But it's not agi good

reef pawn Jul 28, 2025, 8:37 PM

#

AGI is buzzword

blazing bison Jul 28, 2025, 8:37 PM

#

Its like 25% improvement from o3

reef pawn Jul 28, 2025, 8:37 PM

#

Nice

blazing bison Jul 28, 2025, 8:37 PM

#

And 25% is a lot

keen beacon Jul 28, 2025, 8:37 PM

#

reef pawn AGI is buzzword

We ain't near AGI at all unless some type of new architecture goes live

#

with no hallucinations and consistent answers

digital umbra Jul 28, 2025, 8:38 PM

#

zenith will probably be a great model, let's hope it won't be too expensive or behind a router that gives you garbage most of the time

keen beacon Jul 28, 2025, 8:38 PM

#

and learns from mistakes independently

keen talon Jul 28, 2025, 8:38 PM

#

can someone tell me what are the limits for claude 4 opus?

blazing bison Jul 28, 2025, 8:38 PM

#

digital umbra zenith will probably be a great model, let's hope it won't be too expensive or b...

I think it's gonna be a router that gives you garbage if your prompt is not good enough

#

Because it was already like this on the arena

reef pawn Jul 28, 2025, 8:39 PM

#

keen beacon We ain't near AGI at all unless some type of new architecture goes live

I think we're at least a decade away from true AGI but Scam Altman and Melon Musk keep milking that word

warm fulcrum Jul 28, 2025, 8:39 PM

#

why is gemini 2.5 pro rated #1 on all tasks

#

theres no way it actually is that good

blazing bison Jul 28, 2025, 8:39 PM

#

warm fulcrum why is gemini 2.5 pro rated #1 on all tasks

Because people vote for it

keen beacon Jul 28, 2025, 8:39 PM

#

reef pawn I think we at least decade away from true AGI but Scam Altman and Melon Musk kee...

Yeah, they should be honest and not lie to people

reef pawn Jul 28, 2025, 8:39 PM

#

Gemini is my fav model

keen beacon Jul 28, 2025, 8:39 PM

#

warm fulcrum why is gemini 2.5 pro rated #1 on all tasks

It's an all-around excellent model

#

in many areas

warm fulcrum Jul 28, 2025, 8:40 PM

#

everytime i ask it to code it just blabs a lot

#

it adds more comments than code

reef pawn Jul 28, 2025, 8:40 PM

#

warm fulcrum everytime i ask it to code it just blabs a lot

You use Google AI Studio?

keen beacon Jul 28, 2025, 8:40 PM

#

warm fulcrum it adds more comments than code

Ah, I just use it for random specific questions. Querying the knowledge base, ethics, morals

blazing bison Jul 28, 2025, 8:40 PM

#

reef pawn I think we at least decade away from true AGI but Scam Altman and Melon Musk kee...

I agree, but we are not a decade away for people that use llms getting 2x leverage

keen beacon Jul 28, 2025, 8:40 PM

#

And of course in Finnish

warm fulcrum Jul 28, 2025, 8:40 PM

#

reef pawn You use Google AI Studio?

yes

#

the actual gemini website doesn't even want to code

#

it just says it isn't capable

keen beacon Jul 28, 2025, 8:41 PM

#

Lol uralic languages are probably in the 0.005 percent of votes/prompts

reef pawn Jul 28, 2025, 8:41 PM

#

blazing bison I agree, but we are not a decade away for people that use llms getting 2x levera...

Yes, not trying to be precise or anything, the point was it's not anywhere near yet.

blazing bison Jul 28, 2025, 8:41 PM

#

warm fulcrum it just says it isn't capable

Try ai studio

warm fulcrum Jul 28, 2025, 8:41 PM

#

blazing bison Try ai studio

i alr do

blazing bison Jul 28, 2025, 8:42 PM

#

reef pawn Yes, Not trying to precise or anything the point was it's not anywhere near yet.

I mean, we don't need agi to make useful things

reef pawn Jul 28, 2025, 8:42 PM

#

warm fulcrum the actual gemini website doesn't even want to code

It works fine for me, the thing is you have to give specific prompts to the model and break bigger tasks into small chunks for better output

reef pawn Jul 28, 2025, 8:43 PM

#

blazing bison I mean, we don't need agi to make useful things

True

keen beacon Jul 28, 2025, 8:43 PM

#

blazing bison I mean, we don't need agi to make useful things

Well medicine would be real useful for AGI to know

blazing bison Jul 28, 2025, 8:43 PM

#

If gpt 5 is not a big leap, I'm sad

#

The bubble will burst

torn mantle Jul 28, 2025, 9:02 PM

#

grok 4

surreal creek Jul 28, 2025, 9:21 PM

#

Gemini in first place 30 points ahead of o3 on the coding leaderboard lol

stray aspen Jul 28, 2025, 9:32 PM

#

which claude

keen fulcrum Jul 28, 2025, 9:35 PM

#

I think Arena should reconsider the evaluation process and include pregenerated results for prompts

#

That way a prompt can be evaluated from multiple users
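
A minimal sketch of that idea, assuming a hypothetical setup where one pregenerated response pair is shown to several users and their votes are pooled. The function name `tally_votes` and the labels "A", "B", "tie" are illustrative assumptions, not how Arena actually works:

```python
from collections import Counter


def tally_votes(votes):
    """Pool several users' votes on one pregenerated response pair.

    votes: iterable of "A", "B", or "tie" (hypothetical labels).
    Returns (winner, share), where share is the winner's fraction of all votes.
    Returns (None, 0.0) when there are no votes.
    """
    counts = Counter(votes)
    total = sum(counts.values())
    if total == 0:
        return None, 0.0
    winner, n = counts.most_common(1)[0]
    return winner, n / total
```

With pooled votes like this, the same prompt gets more than one opinion, so a single odd vote matters less.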

meager harbor Jul 28, 2025, 9:37 PM

#

keen fulcrum I think Arena should reconsider the evaluation process and include pregenerated ...

no, the current one is good, but your idea of also adding the option to rate pregenerated prompts is good

unborn ocean Jul 28, 2025, 9:38 PM

#

you know chinese labs are afraid of repercussions if prompting for "a taipei vacation" is already considered an inappropriate topic

meager harbor Jul 28, 2025, 9:39 PM

#

so any gpt 5 whispers ?

sonic tendon Jul 28, 2025, 10:10 PM

#

glm 4.5 is surprisingly good

echo aurora Jul 28, 2025, 10:21 PM

#

We're aware of issues related to non-text models struggling at the moment.

quiet moss Jul 28, 2025, 10:39 PM

#

echo aurora We're aware of issues related to non-text models struggling at the moment.

Ok

echo aurora Jul 28, 2025, 10:39 PM

#

quiet moss Ok

All fixed blobthumbsup

whole sundial Jul 28, 2025, 10:45 PM

#

guys I think they might have distilled glm 4.5 off of gemini, I just had a response start with "Of course!"

quiet moss Jul 28, 2025, 10:46 PM

#

just because it said Of course means its trained off of Gemini?

whole sundial Jul 28, 2025, 10:47 PM

#

ok then tell me another model that starts with "Of course!" all the time

#

seems to happen when reasoning is off

wintry tinsel Jul 28, 2025, 10:48 PM

#

And writing too Claude is always the bomb

whole sundial Jul 28, 2025, 10:49 PM

#

yeah it starts with "Of course!" just like Gemini

#

at least with reasoning off

#

must have post-trained it off of gemini conversations, at least partially

#

but this shouldn't be a surprise, Chinese companies distill off of US models all the time

stray aspen Jul 28, 2025, 10:51 PM

#

glm 4.5 no think is gemini

whole sundial Jul 28, 2025, 10:52 PM

#

I feel like the "Of course!" is a watermark put in by Google

#

I'm not saying it is Gemini, I was just saying that they distilled Gemini into the model

#

and it has long response, kimi gets straight to the point

#

that might be better for some people though, but this means glm 4.5 is going to have more slop

stray aspen Jul 28, 2025, 10:58 PM

#

i love the glm UI

whole sundial Jul 28, 2025, 11:01 PM

#

thinking glm 4.5 does not have the "Of course!" stuff, i think it only does that for non-thinking due to likely gemini distillation. As they can't distill their reasoning traces anymore, it won't do it in reasoning mode because it's distilled off of a different model

leaden palm Jul 28, 2025, 11:01 PM

#

stray aspen i love the glm UI

no dark mode tho

#

and a short default max tokens

whole sundial Jul 28, 2025, 11:03 PM

#

i was using this site https://huggingface.co/spaces/zai-org/GLM-4.5-Space to try it out, disable thinking and lower the temperature and you'll see what I mean

#

it identifies itself as being by Zhipu like it should, but the "Of course!" threw me off a bit

blazing bison Jul 28, 2025, 11:27 PM

#

Even if it has a little gemini data, it's not a problem if the model is good

#

But for me it's no good

sturdy mica Jul 28, 2025, 11:48 PM

#

whole sundial i was using this site https://huggingface.co/spaces/zai-org/GLM-4.5-Space to try...

why is it responding with a slutty highschool girl system prompt wtf???

#

whats wrong with that website

#

☹️

whole sundial Jul 28, 2025, 11:50 PM

#

it seems to be fine without thinking, maybe it messes up with thinking?

#

or when multiple people are using it at the same time?

sturdy mica Jul 29, 2025, 12:09 AM

#

whole sundial or when multiple people are using it at the same time?

prolly this.... i wonder whos using it for weird fetish roleplay.....

keen beacon Jul 29, 2025, 12:50 AM

#

sturdy mica prolly this.... i wonder whos using it for weird fetish roleplay.....

my bad

torn star Jul 29, 2025, 2:41 AM

#

Poll: GPT 5 when?

Winning answer (option 2): Next Thursday (aug 8)

Votes: 20 of 32

leaden palm Jul 29, 2025, 2:44 AM

#

what ai mode suggestions do you guys have

#

#1 doesn't really make sense to me and #3 isn't really relevant but #2 is definitely personalized

gusty night Jul 29, 2025, 2:48 AM

#

Hello !!

harsh flume Jul 29, 2025, 2:54 AM

#

What do you guys think is capping AIs from performing well in frontier math benchmarks?

#

it doesn't seem like it would be an insurmountable problem when you take into account the existence of models like AlphaFold

leaden sun Jul 29, 2025, 3:05 AM

#

harsh flume What do you guys think it's capping AIs from performing well in frontier math be...

in short, it's a multifaceted problem: it starts with what "understanding" even truly means for a machine, then there's the problem of translating between formal logic and natural language, and then the fact that most if not all traditionally trained mathematicians work more with intuition than with pure information retrieval. Connecting the dots often happens subconsciously and only later surfaces into conscious understanding, leading to the Eureka moment. As far as i know, the current ai architecture is still too limiting?

#

in case you're interested, one current frontier of ai research is the connection between consciousness and high intelligence, it's still an open problem, but a very fascinating one compared to those hopeless millennium prize problems...

novel crater Jul 29, 2025, 3:27 AM

#

what is the fastest model on lmarena?

harsh flume Jul 29, 2025, 3:30 AM

#

I understand when it comes to tier 4, but AlphaGo in 2017 kinda solved the dilemma of navigating a giant state space (10^170). I am kinda dumb, but it feels like problems in tiers 1-3 of FrontierMath would be a lot easier and have a smaller search space than that, since they are all solvable.

It seems like they are only testing LLMs tho, which explains the low scores, although i'd assume that LLMs could use math-driven tools like AlphaProof, where the LLM layer would translate a problem into pure math and call in the solver
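
A rough sketch of what "translate a problem and call in the solver" could look like, assuming the LLM emits a structured tool call. Everything here is hypothetical: `solve_linear` is a toy stand-in for a real prover or CAS, and the `{"name": ..., "args": [...]}` shape is an assumed format, not any lab's actual API:

```python
from fractions import Fraction


def solve_linear(a, b):
    """Toy stand-in for a specialized solver: solve a*x + b = 0 exactly."""
    if a == 0:
        raise ValueError("no unique solution when a == 0")
    return Fraction(-b, a)


# Hypothetical registry of tools the LLM layer is allowed to call.
TOOLS = {"solve_linear": solve_linear}


def dispatch(tool_call):
    """Run a structured tool call of the assumed form {"name": str, "args": list}."""
    name = tool_call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](*tool_call["args"])
```

The hard part is still on the LLM side: picking the right tool and turning the natural-language problem into correct arguments.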

leaden sun Jul 29, 2025, 3:54 AM

#

i think proof assistants are already being integrated into the architecture to make it more deterministic, the thing is, those theorem provers are not complete and still an area of active research

#

they only testing LLMs? so they have figured another way already? dont tell me it's an artificially grown organic hybrid brain hahah

wicked root Jul 29, 2025, 4:10 AM

#

Is there a new model that's being tested right now in LMArena?

#

Word on the street is GPT5 is being tested rn

drifting thorn Jul 29, 2025, 4:12 AM

#

nah it's great in creative writing(writing lyrics)

whole sundial Jul 29, 2025, 4:15 AM

#

time to make your pfp a picture of cliff richard lol

drifting thorn Jul 29, 2025, 4:15 AM

#

the never gonna give you up hallucination is the LLM joke of the year

quiet moss Jul 29, 2025, 4:15 AM

#

If GPT-5 releases by July 31, is it likely it will be on LMArena on the same day?

whole sundial Jul 29, 2025, 4:17 AM

#

whole sundial time to make your pfp a picture of cliff richard lol

(the correct answer to the prompt is "Nothing's Gonna Stop Us Now" by Starship. "Never Gonna Give You Up" was the number one song of 1987 in the UK, but it's not by Cliff Richard.)
large reasoning llms (o3, 2.5 pro, claude opus, grok 4) get this right.

#

GLM 4.5 gets this right as well

drifting thorn Jul 29, 2025, 4:18 AM

#

whole sundial time to make your pfp a picture of cliff richard lol

actually the whole answer hallucinates...

#

GLM 4.5 has bad lyric writing

#

It doesn't even rhyme with the line I gave it

#

he gave me 4 answers, but none of them rhymes

gusty night Jul 29, 2025, 4:21 AM

#

You are quick at model integration 👏

drifting thorn Jul 29, 2025, 4:22 AM

#

What's the provider of kraken-072125-1?

whole sundial Jul 29, 2025, 4:25 AM

#

amazon

harsh flume Jul 29, 2025, 4:58 AM

#

leaden sun they only testing LLMs? so they have figured another way already? dont tell me i...

well yea

#

I read through what I could find of information on their website and apparently the bench is done with the models using tools, so it'd be possible to integrate a native math AI that an LLM could call on

digital umbra Jul 29, 2025, 5:06 AM

#

this came up in openrouter discord

leaden sun Jul 29, 2025, 5:08 AM

#

harsh flume I read through what I could find of information on their website and apparently ...

yeah define math ai first 😅 i've never looked at those frontiermath questions so i assume it's a broad selection across the entire mathematical discipline, good luck building a math ai that can handle all those vast math tools

#

i know it's difficult for people outside math to imagine how...fundamentally different the areas in maths actually are

#

three examples i'd personally love the llms to be able to use:
https://dealii.org/
https://www.sagemath.org/tour.html
https://rocq-prover.org/
and those are just one of the many out there

SageMath Mathematical Software System

SageMath Mathematical Software System - Sage

SageMath is a free and open-source mathematical software system.

Rocq

Welcome to a World of Rocq

Rocq is a general-purpose, industrial-strength interactive theorem prover.

leaden sun Jul 29, 2025, 5:41 AM

#

obviously, llms need to understand the problem first, recall the knowledge needed (theorems, lemmas, corollaries etc), connect the dots and use the tools correctly to get the final answer

harsh flume Jul 29, 2025, 5:48 AM

#

leaden sun yeah define math ai first 😅 i've never looked at those frontiermath questions ...

alphaproof would be an example

#

alpha geometry2 another

leaden sun Jul 29, 2025, 5:50 AM

#

those are not general math ai, they are specialized if i'm not mistaken, but yeah, you can always build a swarm of specialized ones and call it a general ai

harsh flume Jul 29, 2025, 5:51 AM

#

leaden sun those are not general math ai, they are specialized if am not mistaken, but yeah...

yea that's why I was suggesting using them as tool calls

#

AGI will prob be a form of that anyways as I dont think general intelligence will come from a pure next-token-predictor model with infinite scaling

leaden sun Jul 29, 2025, 5:56 AM

#

the coordination between those agents within a swarm will be a challenge, it's studied also in dynamical systems

harsh flume Jul 29, 2025, 5:56 AM

#

the interesting thing is that these are a whole other transformer architecture so integrating them within the answer scope of an LLM would be really dope

#

lol it seems like they are on it already

#

here I was proposing the invention of fire whilst they are already on blowtorch schematics lol

#

man, I wish LMArena would organize a sort of AMA with top AI researchers from these labs, they must be in direct contact with the industry's forefront and that'd make some great content given how invested this server's users are

#

People here would formulate more interesting questions than 90% of podcast hosts

leaden sun Jul 29, 2025, 6:09 AM

#

harsh flume here I was proposing the invention of fire whilst they are already on blowtorch ...

"there is nothing new under the sun", we're simply rediscovering them all...😊

agile bloom Jul 29, 2025, 6:09 AM

#

based on the response glm 4.5 gave me, it needs to be worked on

#

like damn, glm 4.5 told me its mental state

slim mesa Jul 29, 2025, 6:24 AM

#

hii

#

the grok 4, on the part of direct chat, is really grok 4?

#

mine says it's grok 1 xd

#

sorry bad english

calm sequoia Jul 29, 2025, 6:42 AM

#

#

What does this even mean

nimble trail Jul 29, 2025, 6:48 AM

#

slim mesa the grok 4, on the part of direct chat, is really grok 4?

Yep it is. The model itself tends to hallucinate about its own identity.

whole sundial Jul 29, 2025, 6:49 AM

#

calm sequoia

it's interesting that the current model here is 4o. must be filler for gpt-5 (which, considering they have already added this, should be coming very soon)

calm sequoia Jul 29, 2025, 6:51 AM

#

whole sundial it's interesting that the current model here is 4o. must be filler for gpt-5 (wh...

This selection also exists on o3

#

And I wouldn't say this is much longer.

ashen mauve Jul 29, 2025, 7:17 AM

#

What is GLM anyways?

cedar tide Jul 29, 2025, 8:48 AM

#

Nemotron v1.5 on Artificial analysis
It's the best score for an open source model that can be deployed on a single h100

#

Go Upvote this model
https://discord.com/channels/1340554757349179412/1398515764448989304

golden ocean Jul 29, 2025, 9:00 AM

#

agile bloom like damn, glm 4.5 told me its mental state

bing chat sydney

cedar tide Jul 29, 2025, 9:00 AM

#

First of all, I want to clarify that I don't trust this score at all to predict their overall performance.

#

Kimi k2 is the 2nd best model without reasoning, so no problem with its score, and you can't compare it with reasoning models

#

For glm They themselves shared the score of their model on the same benchmarks as artificial analysis and these are the right places

Screenshot_2025-07-29-11-06-18-795_com.discord-edit.jpg

#

It's certain that if they had infinite money they would have run many other benchmarks

humble sonnet Jul 29, 2025, 9:22 AM

#

What is GLM 4.5 ?

keen beacon Jul 29, 2025, 9:57 AM

#

humble sonnet What is GLM 4.5 ?

A Chinese company's new model

reef pawn Jul 29, 2025, 10:16 AM

#

cedar tide For glm They themselves shared the score of their model on the same benchmarks a...

GLM is proprietary model, right?

teal mantle Jul 29, 2025, 10:16 AM

#

I am mostly API only but should I renew GPT Plus or Supergrok
One for agent, one for grok 4

torn mantle Jul 29, 2025, 10:16 AM

#

cedar tide For glm They themselves shared the score of their model on the same benchmarks a...

is it really good or they are just benchmaxing again

teal mantle Jul 29, 2025, 10:16 AM

#

reef pawn GLM is proprietary model, right?

MIT

reef pawn Jul 29, 2025, 10:17 AM

#

Oh then the scores are good

cedar tide Jul 29, 2025, 10:17 AM

#

reef pawn GLM is proprietary model, right?

What does proprietary model mean?

torn mantle Jul 29, 2025, 10:17 AM

#

david

#

is it good or nah

reef pawn Jul 29, 2025, 10:17 AM

#

cedar tide What does proprietary model mean?

Not Open Weight

cedar tide Jul 29, 2025, 10:17 AM

#

reef pawn Not Open Weight

Open, mit licence

torn mantle Jul 29, 2025, 10:18 AM

#

proprietary means ownership @reef pawn

reef pawn Jul 29, 2025, 10:18 AM

#

Oh okay

torn mantle Jul 29, 2025, 10:18 AM

#

not open source

teal mantle Jul 29, 2025, 10:18 AM

#

teal mantle I am mostly API only but should I renew GPT Plus or Supergrok One for agent, one...

Which one I should get again

cedar tide Jul 29, 2025, 10:18 AM

#

So very good

torn mantle Jul 29, 2025, 10:18 AM

#

cedar tide So very good

good?

reef pawn Jul 29, 2025, 10:18 AM

#

torn mantle not open source

What is opposite of Not open weight?

cedar tide Jul 29, 2025, 10:18 AM

#

torn mantle good?

I'm speaking about the fact that it's open, MIT

torn mantle Jul 29, 2025, 10:18 AM

#

closed source

torn mantle Jul 29, 2025, 10:18 AM

#

cedar tide I'm speaking about the fact that it's open, MIT

i see

humble sonnet Jul 29, 2025, 10:19 AM

#

Are there any special features?

reef pawn Jul 29, 2025, 10:19 AM

#

torn mantle closed source

Are proprietary model closed source or open?

torn mantle Jul 29, 2025, 10:19 AM

#

reef pawn Are proprietary model closed source or open?

they can be both

humble sonnet Jul 29, 2025, 10:19 AM

#

keen beacon Chinese's company's new model

Are there any special features?

reef pawn Jul 29, 2025, 10:19 AM

#

torn mantle they can be both

💀

torn mantle Jul 29, 2025, 10:19 AM

#

proprietary is owned by the ones who made it

cedar tide Jul 29, 2025, 10:19 AM

#

@humble sonnet salut

torn mantle Jul 29, 2025, 10:19 AM

#

what are you talking about?

reef pawn Jul 29, 2025, 10:19 AM

#

How you gonna make money from open source model

keen beacon Jul 29, 2025, 10:20 AM

#

humble sonnet Are there any special features?

I don't know that much. Haven't tried GLM models before

torn mantle Jul 29, 2025, 10:20 AM

#

i think you are confusing it with another word or something

keen beacon Jul 29, 2025, 10:20 AM

#

reef pawn How you gonna make money from open source model

Funding

cedar tide Jul 29, 2025, 10:20 AM

#

reef pawn How you gonna make money from open source model

From your api and chatbot, if it has a subscription

reef pawn Jul 29, 2025, 10:20 AM

#

keen beacon Funding

But that is not allowed as business

teal mantle Jul 29, 2025, 10:20 AM

#

GPT plus or supergrok btw

reef pawn Jul 29, 2025, 10:21 AM

#

Both sucks

#

Gemini better

teal mantle Jul 29, 2025, 10:21 AM

#

reef pawn Gemini better

How much usage I could milk

cedar tide Jul 29, 2025, 10:21 AM

#

cedar tide From your api and chatbot, if it has a subscription

but the problem is that if the model is open source, there will often be APIs much cheaper than yours

reef pawn Jul 29, 2025, 10:21 AM

#

GPT-1 IMAGE is good tho

teal mantle Jul 29, 2025, 10:21 AM

#

I already milk CLI and AIstudio like anyone decent

torn mantle Jul 29, 2025, 10:21 AM

#

reef pawn How you gonna make money from open source model

you are right from a business perspective

reef pawn Jul 29, 2025, 10:21 AM

#

teal mantle How much usage I could milk

1 million context window

teal mantle Jul 29, 2025, 10:22 AM

#

reef pawn 1 million context window

Message limits I mean😂

reef pawn Jul 29, 2025, 10:22 AM

#

torn mantle you are right from a business perspective

Thanks but I get your point, you were right too!

reef pawn Jul 29, 2025, 10:22 AM

#

teal mantle Message limits I mean😂

Then it's horrible, I got 1 year free Gemini AI pro student membership here in India!

#

https://tenor.com/view/dead-chat-dead-chat-discord-revive-gif-21258645

torn mantle Jul 29, 2025, 10:27 AM

#

reef pawn Thanks but I get your point, you was right too!

no you are right

#

if i say you are right then you are right

#

@cedar tide you are wrong

reef pawn Jul 29, 2025, 10:27 AM

#

Aight 🙏

cedar tide Jul 29, 2025, 10:28 AM

#

torn mantle @cedar tide you are wrong

What ?

keen beacon Jul 29, 2025, 10:31 AM

#

reef pawn But that is not allowed as business

Did not know

#

lol

calm sequoia Jul 29, 2025, 12:58 PM

#

Wtf guys, what are you using gemini for

#

Just ran out of o3 requests

#

Tried gemini 2.5 Pro max thinking budget

#

Failed at all of my requests miserably (o3 successful 90%)

#

Is Gemini always like this? 💀

blazing bison Jul 29, 2025, 1:02 PM

#

GPT-5 has been spotted https://x.com/ryolu_/status/1950163428389040431

Ryo Lu (@ryolu_)

here’s my prompt
one shot + details

#

Cursor staff is already using it

#

Now I'm pretty sure Zenith was GPT 5

mortal lynx Jul 29, 2025, 1:06 PM

#

To me o3 and 2.5 pro are both pretty hit or miss

#

Gemini 2.5 Pro was vastly superior at some tasks and garbage at others, same for o3

#

for coding tasks at least

blazing bison Jul 29, 2025, 1:07 PM

#

gpt 5 is good

#

it's agi, believe

mortal lynx Jul 29, 2025, 1:07 PM

#

If zenith was GPT-5 it's still not quite AGI, but much closer than o3 and o4-mini were

blazing bison Jul 29, 2025, 1:09 PM

#

i'm just kidding bro

mortal lynx Jul 29, 2025, 1:10 PM

#

I know, i'm just commenting on it

#

I do think Zenith and o3-alpha were a considerable improvement over what we have today, at least for what I've tested

#

much more than the "20%+ points in HLE and ARC-AGI!" models we got these past few months

blazing bison Jul 29, 2025, 1:10 PM

#

o3-alpha was the best one, idk why people said zenith was better

#

maybe zenith is o3 alpha but the feeling that i got trying o3 alpha, the results, it was better than anything i ever tried for coding

keen fulcrum Jul 29, 2025, 1:11 PM

#

blazing bison GPT-5 has been spotted https://x.com/ryolu_/status/1950163428389040431

they obviously posted it

blazing bison Jul 29, 2025, 1:12 PM

#

keen fulcrum they obviously posted it

yes, for the hype

keen fulcrum Jul 29, 2025, 1:12 PM

#

the blur is horrible

blazing bison Jul 29, 2025, 1:12 PM

#

it's on purpose

#

but it means that gpt-5 is ready, idk what openai is waiting for

keen fulcrum Jul 29, 2025, 1:12 PM

#

#

this is a correct blur

blazing bison Jul 29, 2025, 1:13 PM

#

and after this week rate limits from anthropic uugh

#

I really want OpenAI to dethrone them

keen fulcrum Jul 29, 2025, 1:13 PM

#

blazing bison and after this week rate limits from anthropic uugh

in late august introduced

#

they ran out of gpus

mortal lynx Jul 29, 2025, 1:16 PM

#

They're probably just preparing the blog posts, demos, videos and research papers

#

hopefully the demos are better than their usual "Look how our model can order a new shoe! AGI is here!"

#

Google and xAI do a much better job at that

blazing bison Jul 29, 2025, 1:29 PM

#

mortal lynx hopefully the demos are better than their usual "Look how our model can order a ...

Its gonna be a travel plan

floral comet Jul 29, 2025, 1:33 PM

#

I heard there's (GPT-5) model also known as Zenith, is it still in the LLM Arena?

blazing bison Jul 29, 2025, 1:34 PM

#

floral comet I heard there's (GPT-5) model also known as Zenith, is it still in the LLM Arena...

No

#

Removed

floral comet Jul 29, 2025, 1:35 PM

#

Ohh, damn.. How do people can even use it.. I guess I'm not lucky enough

odd shard Jul 29, 2025, 1:46 PM

#

hello

stray aspen Jul 29, 2025, 1:49 PM

#

craig will gpt 5 be AGI

quiet moss Jul 29, 2025, 2:04 PM

#

no

blazing bison Jul 29, 2025, 2:10 PM

#

floral comet Ohh, damn.. How do people can even use it.. I guess I'm not lucky enough

it's removed, it was available 2 days ago

quiet moss Jul 29, 2025, 2:14 PM

#

keen beacon Jul 29, 2025, 2:35 PM

#

quiet moss

it's a bitcoin wallet

quiet moss Jul 29, 2025, 2:35 PM

#

No it says sk

keen fulcrum Jul 29, 2025, 2:44 PM

#

quiet moss

Its bait anyway

stray aspen Jul 29, 2025, 3:15 PM

#

is it just me or is gemini 2.5 pro getting worse each day

cedar tide Jul 29, 2025, 3:38 PM

#

torn mantle Jul 29, 2025, 3:49 PM

#

who voted no

#

lets talk

keen beacon Jul 29, 2025, 3:51 PM

#

cedar tide

I strongly hope to see that. I'd try all sorts of things

sour spindle Jul 29, 2025, 3:54 PM

#

stray aspen is it just me or is gemini 2.5 pro getting worse each day

Completely unusable for me outside of summarizing long documents

#

Its tool use is completely broken

#

You would think they would be dominating in this regard

cedar tide Jul 29, 2025, 4:10 PM

#

torn mantle who voted no

You voted no

#

Batard 🤣

keen beacon Jul 29, 2025, 4:13 PM

#

cedar tide

@echo aurora

echo aurora Jul 29, 2025, 4:14 PM

#

keen beacon @echo aurora

This request is for sure on our radar! Was chatting with a few coworkers yesterday about it blobthumbsup

#

But don't forget to use #1372230675914031105 ! Best place to make these kinds of requests.

cedar tide Jul 29, 2025, 4:19 PM

#

https://fxtwitter.com/Alibaba_Qwen/status/1950227114793586867?t=PAPpkXrdI3340UJiuu30oA&s=19

Qwen (@Alibaba_Qwen)

🚀 Qwen3-30B-A3B Small Update: Smarter, faster, and local deployment-friendly.

✨ Key Enhancements:
✅ Enhanced reasoning, coding, and math skills
✅ Broader multilingual knowledge
✅ Improved long-context understanding (up to 256K tokens)
✅ Better alignment with user intent and open-ended tasks
✅ No more blocks — now operating exclusively in non-thinking mode

🔧 With 3B activated parameters, it's approaching the performance of GPT-4o and Qwen3-235B-A22B Non-Thinking

Qwen Chat: chat.qwen.ai/?model=Qwen3-30B-A3B-2507

HF: huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507 or huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507-FP8

ModelScope: modelscope.cn/models/Qwen/Qwen3-30B-A3B-Instruct-2507 or modelscope.cn/models/Qwen/Qwen3-30B-A3B-Instruct-2507-FP8

**💬 12 🔁 21 ❤️ 194 👁️ 4.8K **

#

#

Anyone can make a request ?

drifting crow Jul 29, 2025, 5:13 PM

#

quiet moss No it says sk

I feel like they made that sk just to help us with our regex

cedar tide Jul 29, 2025, 5:18 PM

#

qwen 3 coder arrived in the leaderboard and its the 3rd overall open source model
(Kimi 2 and old qwen 3 no think better)

keen fulcrum Jul 29, 2025, 5:18 PM

#

David do you have news about grok 4 coder

#

when will it release

cedar tide Jul 29, 2025, 5:22 PM

#

keen fulcrum when will it release

Officially august

cedar tide Jul 29, 2025, 5:26 PM

#

cedar tide

average of the 25 benchmarks of qwen 30b a3b

#

by category

fleet lintel Jul 29, 2025, 5:28 PM

#

mortal lynx If zenith was GPT-5 it's still not quite AGI, but much closer than o3 and o4-min...

Based on all the hype, I think gpt-5 is much much better than o3 / o4... isn't that the case?

keen beacon Jul 29, 2025, 5:36 PM

#

I may sound dumb but what does spatial awareness mean in LLM models? Vision capabilities?

#

I see, thanks

blazing bison Jul 29, 2025, 5:46 PM

#

openai is genius with this study together release lol

civic flame Jul 29, 2025, 5:46 PM

#

ZENITH IS BACK

#

😍

blazing bison Jul 29, 2025, 5:46 PM

#

really?

civic flame Jul 29, 2025, 5:47 PM

#

YESS

#

okay perchance false alarm

keen beacon Jul 29, 2025, 5:50 PM

#

civic flame okay perchance false alarm

:((

blazing bison Jul 29, 2025, 5:53 PM

#

civic flame okay perchance false alarm

100% false alarm

stray aspen Jul 29, 2025, 5:54 PM

#

guys whats the most trustworthy ai Benchmark

blazing bison Jul 29, 2025, 5:54 PM

#

openai will collect so much reasoning data with this study together

#

this mode actually asks a lot about your reasoning

#

funny

stray aspen Jul 29, 2025, 5:55 PM

#

craig will gpt 5 be agi

civic flame Jul 29, 2025, 5:56 PM

#

it's half back

#

can't share much but

blazing bison Jul 29, 2025, 5:56 PM

#

?

civic flame Jul 29, 2025, 5:56 PM

#

all it needs now is for them to flick a switch to enable it in battle

#

it's been re-added as if ready

blazing bison Jul 29, 2025, 5:56 PM

#

claude 5 will be agi

#

with weekly rate limits

#

after 2 prompts

jade egret Jul 29, 2025, 5:57 PM

#

gpt 5 when ):

blazing bison Jul 29, 2025, 5:57 PM

#

thursday

jade egret Jul 29, 2025, 5:57 PM

#

fr?

#

this?

blazing bison Jul 29, 2025, 5:58 PM

#

jade egret fr?

no

civic flame Jul 29, 2025, 6:04 PM

#

Craig

#

Neptune isn't a new model dawg

#

🥀

keen beacon Jul 29, 2025, 6:06 PM

#

civic flame Craig

Apple guy was wrong after all

jade egret Jul 29, 2025, 6:12 PM

#

blazing bison no

bruh

civic flame Jul 29, 2025, 6:13 PM

#

well yeah opus is a good model

#

😭

cedar tide Jul 29, 2025, 6:18 PM

#

cedar tide average of the 25 benchmarks of qwen 30b a3b

Go Upvote this very good model
https://discord.com/channels/1340554757349179412/1399812659800445131

#

How do you know ?

#

Go upvote this request
https://discord.com/channels/1340554757349179412/1394703782255788122

#

Very good artificial analysis

Screenshot_2025-07-29-20-15-14-623_com.android.chrome-edit.jpg

#

#

Its 32b sota

digital umbra Jul 29, 2025, 6:22 PM

#

Doesn't require internet connection

cedar tide Jul 29, 2025, 6:25 PM

#

I didn't say it was a sota and that it was better than o3 I don't know what you're talking about

#

in the arena you even have 1b models, the arena is not only for sota models

digital umbra Jul 29, 2025, 6:26 PM

#

Still I think EXAONE (which btw isn't a chinese model) is problematic because its license basically forbids you from doing anything at all useful with it

#

Sure you can benchmark but that's about it lol

cedar tide Jul 29, 2025, 6:30 PM

#

Yes exaone its non commercial permissive
We have it on just one api with 1/1$ input output

digital umbra Jul 29, 2025, 6:30 PM

#

Which is stupidly expensive for a 32B

cedar tide Jul 29, 2025, 6:30 PM

#

@deep adder I don't understand anything you're saying

cedar tide Jul 29, 2025, 6:32 PM

#

digital umbra Which is stupidly expensive for a 32B

Yes average price for qwen 32b its 0.15 0.45

stray aspen Jul 29, 2025, 6:42 PM

#

damn craig is educating everyone

digital umbra Jul 29, 2025, 6:45 PM

#

yes, why would anyone use an open source model they can run locally when they could give their personal data to openai, be forced to use a web interface and rate limits

stray aspen Jul 29, 2025, 6:46 PM

#

didnt expect qwen would get this far on the artificial analysis leaderboard

digital umbra Jul 29, 2025, 6:47 PM

#

i can feed how much sensitive data i want into my gpu with no regrets. i mean it already sees everything i have on my screen anyway 😛

keen beacon Jul 29, 2025, 6:53 PM

#

they are doing it because of the nyt thing right?

#

sam is trying to bring attention to it to win that lawsuit i guess

digital umbra Jul 29, 2025, 6:57 PM

#

you're typing this on the discord of a site that provides user prompts to AI companies to improve their models...

#

and even if the data is useless for training it would still be useful to sell to data brokers

devout vault Jul 29, 2025, 6:57 PM

#

is glm 4.5 even good

#

is it better than other smart models like gemini 2.5 pro?

stray aspen Jul 29, 2025, 7:00 PM

#

devout vault is it better than other smart models like gemini 2.5 pro?

hell no

#

are you serious

devout vault Jul 29, 2025, 7:00 PM

#

stray aspen hell no

people say it is

#

weird

blazing bison Jul 29, 2025, 7:04 PM

#

This craig is just a rage baiter yapper

stray aspen Jul 29, 2025, 7:05 PM

#

welcome to the internet bro

torn mantle Jul 29, 2025, 7:05 PM

#

Ong

#

Onnng

blazing bison Jul 29, 2025, 7:06 PM

#

It is, just ignore him

#

Bait again

keen beacon Jul 29, 2025, 7:07 PM

#

That's just capitalism

digital umbra Jul 29, 2025, 7:07 PM

#

your original point was that open source models were useless because the chatgpt free tier existed

torn mantle Jul 29, 2025, 7:07 PM

#

Thanks

keen beacon Jul 29, 2025, 7:08 PM

#

digital umbra your original point was that open source models were useless because the chatgpt...

That sounds dumb

digital umbra Jul 29, 2025, 7:08 PM

#

keen beacon That sounds dumb

it is

keen beacon Jul 29, 2025, 7:08 PM

#

There's more to life than ChatGPT

#

I use deepseek and Kimi

#

for example

#

why use chatgpt instead of aistudio atp btw

keen beacon Jul 29, 2025, 7:08 PM

#

keen beacon why use chatgpt instead of aistudio atp btw

More data collection

#

Even more than in gemini.app

#

ok but youre basically already accepting theyre collecting your data

#

use a frontier reasoning model and make it worth it 🤣

balmy mist Jul 29, 2025, 7:09 PM

#

when do yall think gpt5 is coming out?

keen beacon Jul 29, 2025, 7:09 PM

#

I heard some news though that Sam Altman revealed that people say all kinds of personal info on Chatgpt

digital umbra Jul 29, 2025, 7:10 PM

#

fun thing that's a requirement for using o3 then

keen beacon Jul 29, 2025, 7:10 PM

#

?

#

That's some braindead thinking

#

to do

#

on Twitter

primal orbit Jul 29, 2025, 7:11 PM

#

@echo aurora thank you for bringing rate limit notification in the direct chat! very much appreciated.

echo aurora Jul 29, 2025, 7:12 PM

#

primal orbit @echo aurora thank you for bringing rate limit notification in the dire...

Glad to hear it!

primal orbit Jul 29, 2025, 7:13 PM

#

If we could edit the message in chat and reroll, would be a great next update. Like in the Google AI Studio. Sometimes you make mistake and the chat goes off rails.

digital umbra Jul 29, 2025, 7:14 PM

#

yes

#

A "tournament" mode where you can keep using the winning model from the previous turn would also be nice

blazing rune Jul 29, 2025, 7:17 PM

#

https://tenor.com/view/michael-jackson-comendo-picoca-gif-9669437860846841235

Tenor

digital umbra Jul 29, 2025, 7:18 PM

#

ah yes, let me just set up a shell company in panama so i can use chatgpt without letting them know my identity

keen beacon Jul 29, 2025, 7:19 PM

#

Does the EU's GDPR help in how AI companies can collect data? Just curious if people here would know more

unborn ocean Jul 29, 2025, 7:23 PM

#

well idk about the specifics, but there are a lot of data collection things that are turned off for eu consumers

keen beacon Jul 29, 2025, 7:23 PM

#

unborn ocean well idk about the specifics, but there are a lot of data collection things that...

EU is based

unborn ocean Jul 29, 2025, 7:23 PM

#

e.g. training on data with aistudio free tier

#

(in the api only)

#

no i meant the api

#

has a free tier

#

aistudio as a webapp is a different quota that is completely free, separate of the api free tier

keen beacon Jul 29, 2025, 7:28 PM

#

Well, it's good I heard about that use of data too

#

👍

novel crater Jul 29, 2025, 7:44 PM

#

yoooooooooooooo

#

your parents gave you a great name

#

haha

stray aspen Jul 29, 2025, 7:54 PM

#

wassup billy

blazing bison Jul 29, 2025, 8:21 PM

#

So let's bring zenith back?

#

trophy3d

stray aspen Jul 29, 2025, 8:30 PM

#

yes

torn mantle Jul 29, 2025, 10:30 PM

#

you're still thinking of zenith

#

quite the obsession

blazing bison Jul 29, 2025, 11:37 PM

#

it has the potential to be 1500 elo

stray aspen Jul 29, 2025, 11:40 PM

#

I'm gonna play hugging face

grizzled bobcat Jul 30, 2025, 1:40 AM

#

Guys

#

Help me

#

It limit message

#

It not unlimited message

echo aurora Jul 30, 2025, 1:46 AM

#

grizzled bobcat It limit message

There is a rate limit to how often you use models

keen beacon Jul 30, 2025, 2:11 AM

#

echo aurora There is a rate limit to how often you use models

Hey, I have a question.

I saw zenith got re-added, yesterday.

Can't find it in battle mode.

I'm new to all this, when can it be found again?

#

Anyone?

verbal nimbus Jul 30, 2025, 2:51 AM

#

Talking to Gemini 2.5 Pro is a bit frustrating sometimes. It doesn't notify me that I provided the same attachment twice.

#

On AIStudio it likes to use flowery language for open-ended questions like it's inventing marketing terms, but it's great on STEM questions.

zinc ore Jul 30, 2025, 3:54 AM

#

I'd love to believe it

sturdy mica Jul 30, 2025, 3:59 AM

#

how come you can't add attachments to searching models!?!?

#

does anyone know a workaround or something

keen beacon Jul 30, 2025, 4:03 AM

#

Anyone spotted zenith yet?

stray aspen Jul 30, 2025, 4:07 AM

#

craig do you think gpt 5 will smoke all the other models

#

and will remain for a long time

keen beacon Jul 30, 2025, 4:08 AM

#

https://x.com/hunoematic/status/1944090606625792085

invincibleHunter (@hunoematic)

To clarify, these are my predictions for GPT-5, and insider Satoshi confirms most are accurate, or somewhat accurate. Those are mostly based on rumors.

sturdy mica Jul 30, 2025, 4:18 AM

#

yo how do i have attachments and internet access at the same time

#

cuz this low-key annoying

stray aspen Jul 30, 2025, 4:26 AM

#

sturdy mica yo how do i have attachments and internet access at the same time

you dont

sturdy mica Jul 30, 2025, 4:26 AM

#

stray aspen you dont

cool............. :-[

stray aspen Jul 30, 2025, 4:27 AM

#

make a feedback

#

and maybe theyll add it

sturdy mica Jul 30, 2025, 4:27 AM

#

is there some free service where i could use some models like grok 4 with internet and also attachments

#

@stray aspen do you know of one

#

sigh

#

bzzzzzzzz

#

bzbzbzbzbz

#

aaaaaaaaahhhh

#

bbbbbbbbbb

keen beacon Jul 30, 2025, 4:47 AM

#

sturdy mica is there some free service where i could use some models like grok 4 with intern...

Lmarena, no?

sturdy mica Jul 30, 2025, 4:47 AM

#

keen beacon Lmarena, no?

with search AND attachments

#

lmarena supports only one or the other

#

i need both at same time

lime coral Jul 30, 2025, 6:47 AM

#

Since GPT5 uses tools by default they should be compared with Deep Research version

fleet lintel Jul 30, 2025, 9:31 AM

#

i need GPT5 now. when are they launching?

keen beacon Jul 30, 2025, 9:35 AM

#

fleet lintel i need GPT5 now. when are they launching?

Early August as was rumoured on some sources

cedar tide Jul 30, 2025, 9:41 AM

#

a good cleaning is nice

Screenshot_2025-07-30-11-41-00-071_com.discord-edit.jpg

torn mantle Jul 30, 2025, 10:06 AM

#

cedar tide a good cleaning is nice

happy david

cedar tide Jul 30, 2025, 10:15 AM

#

torn mantle happy david

We need more cleaning
There are still 2 kraken, cuttlefish, clownfish, octopus, stephen

cedar tide Jul 30, 2025, 10:42 AM

#

Go upvote for the best 32B model (and also the best one with fewer than 235B total parameters !)

https://discord.com/channels/1340554757349179412/1396370899342725253

frosty lark Jul 30, 2025, 11:52 AM

#

elo are relative, one cannot compare between playerpools.
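
A minimal sketch of the point above, assuming the standard Elo expected-score formula (the arena's actual rating method may differ, and the model names and ratings here are hypothetical). The predicted win rate depends only on the rating difference within one pool, so shifting every rating in a pool by the same amount changes nothing, and a raw number from one pool says little about a different pool.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Hypothetical ratings for two models inside one player pool.
pool_one = {"model_x": 1500.0, "model_y": 1400.0}

# The same two models with every rating shifted by +300, as if the pool
# had been anchored to a different baseline.
pool_two = {name: rating + 300.0 for name, rating in pool_one.items()}

p1 = expected_score(pool_one["model_x"], pool_one["model_y"])
p2 = expected_score(pool_two["model_x"], pool_two["model_y"])

# Both pools predict the same head-to-head result, even though the
# absolute ratings differ by 300 points.
print(round(p1, 3), round(p2, 3))
```

So a 1500 in one arena and a 1500 in another are only comparable if both pools share opponents and an anchor.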

cedar tide Jul 30, 2025, 11:59 AM

#

Ernie 4.5 is underrated 😵‍💫

#

Average of the 20 benchmarks that baidu shared (Chinese benchmarks excluded)
Upvote here https://discord.com/channels/1340554757349179412/1392140140662489108

cedar tide Jul 30, 2025, 12:02 PM

#

cedar tide a good cleaning is nice

Now that we have cleaned these 11 models, add these 10 models 😶

Qwen 30b A3b 25 07
Gemini 2.5 no think
Open reasoning nemotron 32b
Ernie 4.5 300b
Glm 4.5 no think and on webdev
Solar pro 2
Exaone 4.0 32b
Hunyuan 80b a13b
Intern S1 (241b vision)
Reka flash 3.1

torn mantle Jul 30, 2025, 12:03 PM

#

cedar tide Average of the 20 benchmarks that baidu shared (Chinese benchmarks excluded) Upvot...

look i appreciate benchmarks, but they dont reflect how the model is practically

cedar tide Jul 30, 2025, 12:03 PM

#

torn mantle look i appreciate benchmarks, but they dont reflect how the model is practically

have you tried it?

#

I don't think so

torn mantle Jul 30, 2025, 12:04 PM

#

cedar tide have you tried it?

where can i try it

#

lol

#

https://aistudio.baidu.com/overview

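PaddlePaddle AI Studio Galaxy Community - AI Learning and Training Community

The PaddlePaddle Galaxy Community is an AI learning and hands-on training community for AI learners. It brings together a rich set of free AI courses, a large-model community and model applications, deep learning sample projects, classic datasets from many fields, and powerful cloud GPU compute and storage resources, plus beginner practice competitions and elite algorithm contests to take part in.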

#

let me see

cedar tide Jul 30, 2025, 12:05 PM

#

torn mantle where can i try it

https://openrouter.ai/baidu/ernie-4.5-300b-a47b

And on novita ai

ERNIE 4.5 300B A47B - API, Providers, Stats

ERNIE-4.5-300B-A47B is a 300B parameter Mixture-of-Experts (MoE) language model developed by Baidu as part of the ERNIE 4. Run ERNIE 4.5 300B A47B with API

torn mantle Jul 30, 2025, 12:05 PM

#

cedar tide have you tried it?

the first thing i try them on is : french -> eng and eng -> french

#

for multilingual vibe check

#

if its not fluent and feels native then its a big -1

#

they all sound robotic and ai gen

#

@cedar tide whats your first benchmark

#

or what do you try it on

cedar tide Jul 30, 2025, 12:21 PM

#

@torn mantle The truth is I haven't tried it like everyone else, but just if it has good benchmarks we should give it a chance in the arena so we can try it.

languid crescent Jul 30, 2025, 12:25 PM

#

@echo aurora am so sorry 🙁 I received a warning about advertising didn't know that I can't share it :((

torn mantle Jul 30, 2025, 12:26 PM

#

cedar tide @torn mantle The truth is I haven't tried it like everyone else, but ju...

ive tried it

#

its meh

cedar tide Jul 30, 2025, 12:26 PM

#

torn mantle ive tried it

Where ?

torn mantle Jul 30, 2025, 12:26 PM

#

cedar tide https://openrouter.ai/baidu/ernie-4.5-300b-a47b And on novita ai

here

ornate agate Jul 30, 2025, 12:27 PM

#

I think benchmarks are still a lot better than random vibes or assuming it’s bad

torn mantle Jul 30, 2025, 12:27 PM

#

no they are not

#

vibes check is superior

cedar tide Jul 30, 2025, 12:27 PM

#

@torn mantle for you deepseek v3 is much better?

languid crescent Jul 30, 2025, 12:27 PM

#

i am such a disappointment 🙁

torn mantle Jul 30, 2025, 12:27 PM

#

cedar tide @torn mantle for you deepseek v3 is much better?

yes

torn mantle Jul 30, 2025, 12:27 PM

#

languid crescent i am such a disappointment 🙁

its fine

#

you have the same pfp picture as david

#

why

ornate agate Jul 30, 2025, 12:33 PM

#

Exaone and nemotron AA benchmark at 32b size makes them very compelling for further analysis

cedar tide Jul 30, 2025, 12:41 PM

#

Yes

#

but glm is mostly good at webdev and he's not on it yet

#

and there are only think versions of glm, but sometimes people prefer no think versions, for example qwen 3 no think is much higher than think version in the leaderboard

tall summit Jul 30, 2025, 12:49 PM

#

which lmarena direct chat models have ratelimits?

blazing bison Jul 30, 2025, 1:34 PM

#

The update of the chatgpt Mac app with preparations for gpt 5 basically confirmed that it's gonna be a router

#

🤓

stray aspen Jul 30, 2025, 1:46 PM

#

Bro the Baidu ernie playground is so trash

cedar tide Jul 30, 2025, 2:10 PM

#

https://fixupx.com/ArtificialAnlys/status/1950555928014844415?t=fD2hm_gS7PlfvD8_R9kljQ&s=19

Artificial Analysis (@ArtificialAnlys)

Announcing the Artificial Analysis Music Arena Leaderboard: with >5k votes, Suno v4.5 is the leading Music Generation model followed by Riffusion’s FUZZ-1.1 Pro.

Google’s Lyria 2 places third in our Instrumental leaderboard, and Udio’s v1.5 Allegro places third in our Vocals leaderboard.

The Instrumental Leaderboard is as follows:
🥇 @SunoMusic V4.5
🥈 @riffusionai FUZZ-1.1 Pro
🥉 @GoogleDeepMind Lyria 2
@udiomusic v1.5 Allegro
@StabilityAI Stable Audio 2.0
@metaai MusicGen

Rankings are based on community votes across a diverse range of genres and prompts. Want to see your prompt featured? You can submit prompts in the arena today.

👇 See below for the Vocals Leaderboard and link to participate!

**💬 4 🔁 6 ❤️ 38 👁️ 1.2K **

whole wagon Jul 30, 2025, 2:16 PM

#

Did I hallucinate, I swear on chatGPT the switch model option had gpt5 for a second 😂

finite pollen Jul 30, 2025, 2:17 PM

#

hey in our battles, models that are removed get relabeled back to Assistant A so we dont know what they were.. can this be fixed?

jade egret Jul 30, 2025, 2:22 PM

#

when gpt 5

echo aurora Jul 30, 2025, 2:34 PM

#

finite pollen hey in our battles, models that are removed get relabeled back to `Assistant A` so...

Interesting. I’ll flag to the team and see if there is a fix that’ll keep those names even if removed.

finite pollen Jul 30, 2025, 2:41 PM

#

echo aurora Interesting. I’ll flag to the team and see if there is a fix that’ll keep those ...

e.g

tall summit Jul 30, 2025, 3:16 PM

#

which lmarena direct chat models have ratelimits?

torn mantle Jul 30, 2025, 3:19 PM

#

@cedar tide https://x.com/Alibaba_Qwen/status/1950570969036361799

Qwen (@Alibaba_Qwen)

🚀 Qwen3-30B-A3B-Thinking-2507, a medium-size model that can think!

• Nice performance on reasoning tasks, including math, science, code & beyond
• Good at tool use, competitive with larger models
• Native support of 256K-token context, extendable to 1M

Qwen Chat: Go to

cedar tide Jul 30, 2025, 3:19 PM

#

Already see

torn mantle Jul 30, 2025, 3:21 PM

#

cedar tide Already see

but you didnt share it

#

xd

cedar tide Jul 30, 2025, 3:25 PM

#

Yes im busy now

#

Soon the average of the benchmark

#

@torn mantle officially coder 30b a13b tomorrow

cedar tide Jul 30, 2025, 3:38 PM

#

cedar tide

Poll: To create Agent arena

Winning answer: Yes (21 of 24 votes)

cedar tide Jul 30, 2025, 3:38 PM

#

cedar tide Soon the average of the benchmark

Just i need to go to my pc

cedar tide Jul 30, 2025, 3:38 PM

#

cedar tide

21 votes to create agent arena

torn mantle Jul 30, 2025, 3:41 PM

#

cedar tide @torn mantle officially coder 30b a13b tomorrow

😮

digital umbra Jul 30, 2025, 3:46 PM

#

torn mantle @cedar tide https://x.com/Alibaba_Qwen/status/1950570969036361799

interesting chain of thought

cedar tide Jul 30, 2025, 3:55 PM

#

average of the 24 benchmarks

#

by category

#

@torn mantle @ancient reef

torn mantle Jul 30, 2025, 3:56 PM

#

cedar tide by category

Nice

#

Not bad

cedar tide Jul 30, 2025, 3:56 PM

#

In my opinion, Gemini 2.5 Flash is about this size.

#

size of deepseek r1 ?

digital umbra Jul 30, 2025, 3:58 PM

#

https://youtu.be/0obMRztklqU?t=25 speculating on model size

YouTube

JAAM Studios

Numberwang Episode 1

▶ Play video

cedar tide Jul 30, 2025, 4:05 PM

#

@ornate agate the only hint we have is that there was a gemini 1.5 flash 8b version

#

go upvote the new qwen thinking https://discord.com/channels/1340554757349179412/1399812659800445131

keen beacon Jul 30, 2025, 4:11 PM

#

Zenith in lmarena, anyone?

#

Or openrouter horizon alpha?

digital umbra Jul 30, 2025, 4:11 PM

#

zenith was removed, horizon is not out yet

wicked root Jul 30, 2025, 4:21 PM

#

any update on GPT5?

lapis light Jul 30, 2025, 4:27 PM

#

Can I just ask though, why are there three Video Arena channels?

echo aurora Jul 30, 2025, 4:27 PM

#

lapis light Can I just ask though, why are there three Video Arena channels?

To spread generations out a bit. If it was all in one channel it'd be a bit much,

lapis light Jul 30, 2025, 4:29 PM

#

echo aurora To spread generations out a bit. If it was all in one channel it'd be a bit much...

Okay, fair enough. I guess that makes sense.

tawdry sapphire Jul 30, 2025, 4:30 PM

#

yo

echo aurora Jul 30, 2025, 4:31 PM

#

tawdry sapphire yo

hello ablobwave

tawdry sapphire Jul 30, 2025, 4:31 PM

#

wassup bro

mint cape Jul 30, 2025, 4:32 PM

#

echo aurora hello ablobwave

wasn't video arena already here?

echo aurora Jul 30, 2025, 4:33 PM

#

mint cape wasn't video arena already here?

It's been here for a little bit, but we wanted to soft launch it first before dropping the @ everyone

mint cape Jul 30, 2025, 4:33 PM

#

I forgot to disable @everyone pings on the server and was mildly annoyed 🙁

tribal glacier Jul 30, 2025, 4:35 PM

#

hello .)

brittle tiger Jul 30, 2025, 4:38 PM

#

Pretty crazy anthropic API revenue is higher than openai now

https://x.com/aj_kourabi/status/1950577614772662325

AJ (@aj_kourabi)

anthro’s API rev has over taken OpenAI’s

and once I internalised that, and its implications, I joined their ranks in believing code is the Only Thing That Matters

humble sonnet Jul 30, 2025, 4:40 PM

#

echo aurora It's been here for a little bit, but we wanted to soft launch it first before dr...

So there's still a limit? Like the video limit per day.

digital umbra Jul 30, 2025, 4:40 PM

#

you have to go through hoops to use o3 through API so it's not too surprising

echo aurora Jul 30, 2025, 4:41 PM

#

humble sonnet So there's still a limit? Like the video limit per day.

Yup

humble sonnet Jul 30, 2025, 4:41 PM

#

echo aurora Yup

Okay

#

Do you have a limit with image with bot?

void tusk Jul 30, 2025, 4:42 PM

#

oh boy, i bet there are going to be a bunch of new people here xd

#

even newer than me XD

echo aurora Jul 30, 2025, 4:42 PM

#

humble sonnet Do you have a limit with image with bot?

It should all count the same regardless if you use image, video, or image to video

humble sonnet Jul 30, 2025, 4:43 PM

#

Oh , but image is unlimited on website

wicked root Jul 30, 2025, 4:57 PM

#

how do people know this? I can't find official statements anywhere

reef pawn Jul 30, 2025, 5:03 PM

#

wicked root how do people know this? I can't find official statements anywhere

It's not official just rumours

wicked root Jul 30, 2025, 5:05 PM

#

and is the new model going to be better than gemini pro?

#

I might switch over to gpt5 if that's the case

nimble trail Jul 30, 2025, 5:05 PM

#

wicked root and is the new model going to be better than gemini pro?

they should be.

ionic idol Jul 30, 2025, 5:19 PM

#

Hi

wicked root Jul 30, 2025, 5:22 PM

#

nimble trail they should be.

What about specifically for text?

wintry tinsel Jul 30, 2025, 5:25 PM

#

I eagerly await August for gpt 5

dusky ore Jul 30, 2025, 5:25 PM

#

#share-prompts create video, on a subway platform, the chubby raccoon is running away, clutching the three ducklings in its arms. Behind him, the mother duck is chasing after the raccoon, wings spread and beak open in panic. The scene is dynamic and full of action. vertical, 9:16

echo aurora Jul 30, 2025, 5:26 PM

#

dusky ore #share-prompts create video, on a subway platform, the chubby raccoon is...

You need to use the /video command, this can only be done in #video-arena-1 #video-arena-2 so on. More info in #1397655624103493813

golden ocean Jul 30, 2025, 5:26 PM

#

dusky ore #share-prompts create video, on a subway platform, the chubby raccoon is...

reallllll

reef pawn Jul 30, 2025, 5:28 PM

#

How can I enable my Gemini AI pro membership in Google AI Studio? I already have this membership and want to use Veo 3 and Imagen Ultra but I'm unable to do so

merry pond Jul 30, 2025, 5:29 PM

#

@amber warren Hi !

amber warren Jul 30, 2025, 5:30 PM

#

helloo

dusky ore Jul 30, 2025, 5:30 PM

#

/video create video, on a subway platform, the chubby raccoon is running away, clutching the three ducklings in its arms. Behind him, the mother duck is chasing after the raccoon, wings spread and beak open in panic. The scene is dynamic and full of action. vertical, 9:16

merry pond Jul 30, 2025, 5:31 PM

#

amber warren helloo

Congratulations for intern stage at LMArena

echo aurora Jul 30, 2025, 5:34 PM

#

dusky ore /video create video, on a subway platform, the chubby raccoon is running away, c...

Need to be in one of the video-arena channels, try in #video-arena-4

merry pond Jul 30, 2025, 5:38 PM

#

Is there a role for Lmarena staff to recognize employees who work there and prevent people from being misled by imposters?

echo aurora Jul 30, 2025, 5:41 PM

#

merry pond Is there a role for Lmarena staff to recognize employees who work there and prev...

There isn't

wheat hamlet Jul 30, 2025, 5:43 PM

#

hello guys

merry pond Jul 30, 2025, 5:43 PM

#

echo aurora There isn't

is it planned or not?

echo aurora Jul 30, 2025, 5:43 PM

#

wheat hamlet hello guys

welcome!

echo aurora Jul 30, 2025, 5:43 PM

#

merry pond is it planned or not?

TBD, not having one is intentional

wheat hamlet Jul 30, 2025, 5:44 PM

#

would love to know whats ur personal fav model for text, music and video

#

i feel like people have different preferred models nowadays

merry pond Jul 30, 2025, 5:44 PM

#

echo aurora TBD, not having one is intentional

Okay

trail ember Jul 30, 2025, 6:02 PM

#

🤗

stray aspen Jul 30, 2025, 6:11 PM

#

@echo aurora Mr why so many video arena chats

echo aurora Jul 30, 2025, 6:12 PM

#

stray aspen @echo aurora Mr why so many video arena chats

Spread out the generations, else it'd be a bit crazy

digital umbra Jul 30, 2025, 6:18 PM

#

i wonder who is paying for it

stray aspen Jul 30, 2025, 6:18 PM

#

We are

#

Ok but fr how does lm arena profit

#

@echo aurora

echo aurora Jul 30, 2025, 6:36 PM

#

stray aspen Ok but fr how does lm arena profit

I'd encourage you to read this blog post - https://news.lmarena.ai/new-lmarena/

shut quiver Jul 30, 2025, 6:37 PM

#

Um... hi there. Just joined today. Nice to meet you guys

golden ocean Jul 30, 2025, 6:38 PM

#

dusky ore /video create video, on a subway platform, the chubby raccoon is running away, c...

REAL

keen beacon Jul 30, 2025, 6:45 PM

#

dusky ore /video create video, on a subway platform, the chubby raccoon is running away, c...

create the videos on #video-arena-1 to 4

#

main chat is not the place.

echo aurora Jul 30, 2025, 6:46 PM

#

shut quiver Um... hi there. Just joined today. Nice to meet you guys

welcome welcome! ablobwave

shut quiver Jul 30, 2025, 6:47 PM

#

echo aurora welcome welcome! ablobwave

Thanks! I've been using images for a while. It's honestly pretty neat

old ginkgo Jul 30, 2025, 6:51 PM

#

Do you guys think meta will make a pay2win comeback?

digital umbra Jul 30, 2025, 6:52 PM

#

considering how much they invested on stealing talent from other companies, i think they have to

old ginkgo Jul 30, 2025, 7:13 PM

#

Yeah i think so too. I think mark will just let them do whatever. Literally 0 safeguards, just make sure you win. Also here is 100 billion dollars in salary and datacenters lol

woven harness Jul 30, 2025, 7:14 PM

#

video arena will be on the web?

echo aurora Jul 30, 2025, 7:15 PM

#

woven harness video arena will be on the web?

it's possible, be sure to share any feedback related to this in #bot-feedback

wintry tinsel Jul 30, 2025, 7:23 PM

#

digital umbra considering how much they invested on stealing talent from other companies, i th...

The question is will it matter? This is a massive gamble really, and there is a strong possibility of it not being good enough to justify those huge expenses. Other companies already have their teams assembled, the dynamics established, and the momentum rolling; it’s a little late to come in and do something new by 2025.

#

To me it feels do or die, they think their future value as a company is banking on this promise so they are willing to burn all their money on the chance they succeed regardless of how high that chance is, even if there’s a real possibility it doesn’t pan out

keen beacon Jul 30, 2025, 7:31 PM

#

I love this

#

But how much does this cost to run?

#

I wonder if companies gave credits to these guys to advertise their services

verbal sorrel Jul 30, 2025, 7:46 PM

#

You guys need to turn off sound so its not a dead giveaway that the model is Veo 3.... 🤣

#

Also doesnt seem like an apples to apples comparison at the point anymore too

echo aurora Jul 30, 2025, 7:48 PM

#

verbal sorrel You guys need to turn off sound so its not a dead giveaway that the model is Veo...

Fair feedback! I'm going to move this to #bot-feedback so we can keep it all in one place.

torn mantle Jul 30, 2025, 8:12 PM

#

Do it

#

Ah you already did

#

Free money

#

Hax

#

Smh

patent aspen Jul 30, 2025, 8:12 PM

#

verbal sorrel Also doesnt seem like an apples to apples comparison at the point anymore too

It is apples to apples. The other video models just lack sound and should be punished for it accordingly

torn mantle Jul 30, 2025, 8:12 PM

#

Yea

#

You need to look for arbitrage

#

Between diff platforms

verbal sorrel Jul 30, 2025, 8:13 PM

#

patent aspen It is apples to apples. The other video models just lack sound and should be pun...

Poor take

torn mantle Jul 30, 2025, 8:13 PM

#

Hax

#

You have a script for it?

verbal sorrel Jul 30, 2025, 8:13 PM

#

You arent measuring video models anymore at that point

torn mantle Jul 30, 2025, 8:13 PM

#

Or Is it manual

#

Liar

#

Smh

patent aspen Jul 30, 2025, 8:14 PM

#

verbal sorrel You arent measuring video models anymore at that point

Sure you are. All video models except one have sound though

verbal sorrel Jul 30, 2025, 8:14 PM

#

The entire point of LMArena is blind testing, if you know the model is Veo 3 right away then it defeats the purpose.

verbal sorrel Jul 30, 2025, 8:14 PM

#

patent aspen Sure you are. All video models except one have sound though

Thats the point...

torn mantle Jul 30, 2025, 8:14 PM

#

Craig just let it out

#

Don't worry

#

You are safe here

#

Im skydiving rn

#

Wish me lick

#

Luck i mean

verbal sorrel Jul 30, 2025, 8:14 PM

#

For LMArena to be effective you need to remove bias, which is why its blind. But if you know the model is Veo 3 out the gates that obviously doesnt work anymore.

patent aspen Jul 30, 2025, 8:15 PM

#

verbal sorrel Thats the point...

It's too bad the others don't but it's not like this is the first time it's been possible to deduce what a model is based on its responses

torn mantle Jul 30, 2025, 8:15 PM

#

Orabazes is mad

verbal sorrel Jul 30, 2025, 8:15 PM

#

True but this is just too obvious and detrimental to testing

torn mantle Jul 30, 2025, 8:15 PM

#

Angry

echo aurora Jul 30, 2025, 8:15 PM

#

torn mantle Im skydiving rn

same same

verbal sorrel Jul 30, 2025, 8:16 PM

#

Im not mad just trying to make it a better leaderboard for everybody

torn mantle Jul 30, 2025, 8:16 PM

#

I have never used any gambling website

#

Like ever

echo aurora Jul 30, 2025, 8:16 PM

#

It's valid feedback for sure

patent aspen Jul 30, 2025, 8:16 PM

#

It would be silly to punish the best video model because the other models don't have feature parity

real whale Jul 30, 2025, 8:16 PM

#

/image-to-video /image-to-video

echo aurora Jul 30, 2025, 8:17 PM

#

real whale /image-to-video /image-to-video

Need to use the video arena channels, like #video-arena-4

patent aspen Jul 30, 2025, 8:19 PM

#

Worse video models should also be incentivised to compete for user preference holistically

real whale Jul 30, 2025, 8:20 PM

#

Thank you @echo aurora

#

@echo aurora I'm little nob on this site

visual panther Jul 30, 2025, 8:25 PM

#

yes

merry reef Jul 30, 2025, 8:30 PM

#

hello

blazing bison Jul 30, 2025, 8:42 PM

#

Bring zenith back

#

😢

deep valve Jul 30, 2025, 9:27 PM

#

hello

fading kraken Jul 30, 2025, 10:18 PM

#

hello

#

how can i check out the code named models

#

clownfish, nettle, etc

tired herald Jul 30, 2025, 10:19 PM

#

Can't directly, you need to go to the battle and have luck ig

fading kraken Jul 30, 2025, 10:19 PM

#

ah got it

warm fulcrum Jul 30, 2025, 10:19 PM

#

when gpt-5 releases, do u think they will bring it in lmarena?

fading kraken Jul 30, 2025, 10:19 PM

#

no

tired herald Jul 30, 2025, 10:20 PM

#

warm fulcrum when gpt-5 releases, do u think they will bring it in lmarena?

Maybe to battle for a day or two, but not to direct chat

warm fulcrum Jul 30, 2025, 10:20 PM

#

tired herald Maybe to battle for a day or two, but not to direct chat

i dont want to pay openai 20$ 😭

#

they got enough money out of me already

tired herald Jul 30, 2025, 10:20 PM

#

I can't help you with that

warm fulcrum Jul 30, 2025, 10:21 PM

#

im just saying

tired herald Jul 30, 2025, 10:21 PM

#

And I'm just saying too 😭

warm fulcrum Jul 30, 2025, 10:21 PM

#

👍

tired herald Jul 30, 2025, 10:21 PM

#

You'll probably still have some free messages tho

#

On OpenAI

warm fulcrum Jul 30, 2025, 10:22 PM

#

they would water it down hella tho so

tired herald Jul 30, 2025, 10:22 PM

#

Very likely yes

warm fulcrum Jul 30, 2025, 10:22 PM

#

im so jealous i never got to try zenith ngl

tired herald Jul 30, 2025, 10:22 PM

#

So am I

warm fulcrum Jul 30, 2025, 10:23 PM

#

🤷

tired herald Jul 30, 2025, 10:23 PM

#

Just have to wait then ig

tall summit Jul 30, 2025, 10:32 PM

#

tired herald Maybe to battle for a day or two, but not to direct chat

they might

#

gpt-5 will be huge no matter what

#

even just by reputation

#

it'd be very idiotic of them not to

digital umbra Jul 30, 2025, 10:33 PM

#

gpt-5 will be huge because it's the first openai model released in a long time with a name that actually makes some sense

blissful sluice Jul 30, 2025, 10:34 PM

#

Its. Called gpt-five

ocean vortex Jul 30, 2025, 10:34 PM

#

digital umbra gpt-5 will be huge because it's the first openai model released in a long time w...

AGI confirmed

digital umbra Jul 30, 2025, 10:35 PM

#

the first AGI model will be called gpt5.1o-max-pro-alpha

patent aspen Jul 30, 2025, 10:48 PM

#

civic flame Jul 30, 2025, 10:52 PM

#

i'm going for no

#

🍿 this next 7 days is going to be HOT

stray aspen Jul 30, 2025, 10:52 PM

#

patent aspen

No

civic flame Jul 30, 2025, 10:52 PM

#

in demis we trust

zinc ore Jul 30, 2025, 10:55 PM

#

InB4 drop at the same hr

ocean vortex Jul 30, 2025, 11:00 PM

#

patent aspen

lmfao. Ultra 1.0 when

blazing bison Jul 30, 2025, 11:15 PM

#

English only

silent flume Jul 30, 2025, 11:22 PM

#

new model in arena

#

potato and dino

digital umbra Jul 30, 2025, 11:24 PM

#

dino claims to be by anthropic

#

and potato by openai

silent flume Jul 30, 2025, 11:24 PM

#

they are good?

coarse flame Jul 30, 2025, 11:27 PM

#

Potato keeps trying to use imgur links

digital umbra Jul 30, 2025, 11:29 PM

#

digital umbra dino claims to be by anthropic

they might also be distilled from those companies, apparently

#

no idea if they're good or not

ebon jacinth Jul 30, 2025, 11:49 PM

#

bros gpt 5 tomorrow?

long depot Jul 30, 2025, 11:55 PM

#

Hi, new here! I'm curious to know if anyone has measured whether people are inherently more likely to pick option A or B in the arena, because of recency bias. I know that when I get long responses I read through one and establish an opinion, then read through the other but can't help comparing as I go, which might bias my vote.

bronze veldt Jul 30, 2025, 11:59 PM

#

hello

echo aurora Jul 31, 2025, 12:03 AM

#

long depot Hi, new here! I'm curious to know if anyone has measured whether people are inh...

Hello! Welcome! ablobwave Our blog here has articles you may find interesting. https://news.lmarena.ai/. Iirc there was a section related to recency bias. I'll double check with the team and let you know.
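
One way to check the A-versus-B question above is to count how often the first slot wins across many battles. The sketch below assumes models are assigned to slots A and B at random, so that with no position bias slot A should win about half of the decisive votes. The vote list and the function names are made up for illustration; they are not LMArena data or code.

```python
from math import comb


def two_sided_binomial_p(k: int, n: int) -> float:
    """Exact two-sided p-value for k wins out of n under a fair 50/50 split."""
    # Probability of every possible outcome when each vote is a fair coin flip.
    probs = [comb(n, i) / 2**n for i in range(n + 1)]
    observed = probs[k]
    # Add up all outcomes that are at least as unlikely as the observed one.
    total = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, total)


def position_bias(votes: list[str]) -> tuple[float, float]:
    """Return (share of decisive votes won by slot A, two-sided p-value)."""
    # Ties and skips say nothing about slot preference, so drop them.
    decisive = [v for v in votes if v in ("A", "B")]
    if not decisive:
        raise ValueError("no decisive votes to analyse")
    a_wins = decisive.count("A")
    return a_wins / len(decisive), two_sided_binomial_p(a_wins, len(decisive))


if __name__ == "__main__":
    # Hypothetical log: which slot the voter picked in each battle.
    votes = ["A"] * 60 + ["B"] * 40 + ["tie"] * 10
    share, p_value = position_bias(votes)
    print(f"slot A share: {share:.2f}, p-value: {p_value:.4f}")
```

A small p-value would only say that the slots are not chosen equally often; it would not say whether the cause is reading order, response length, or something else.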

whole wagon Jul 31, 2025, 12:08 AM

#

https://openrouter.ai/openrouter/horizon-alpha

#

This is OpenAI

#

256k context, smells like open source model

digital umbra Jul 31, 2025, 12:10 AM

#

i had a suspicion it would be that

#

if it's the open source model, 256k context would be nice

whole wagon Jul 31, 2025, 12:11 AM

#

They aren't even hiding it tbh

#

Like didn't even bother with the system prompt lol

digital umbra Jul 31, 2025, 12:12 AM

#

whole wagon Jul 31, 2025, 12:12 AM

#

That's a tokenizer issue I'm pretty sure. There's some explanation to why it's a crap test

#

Same issue caused the r in strawberry thing
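
The "r in strawberry" point can be illustrated with a toy example. The token split and the ids below are invented for illustration (real tokenizers and their vocabularies differ), but they show why letter counting is awkward for a model that receives token ids rather than characters.

```python
word = "strawberry"

# Hypothetical subword split; a real tokenizer may split the word differently.
tokens = ["str", "aw", "berry"]
assert "".join(tokens) == word

# Made-up ids standing in for vocabulary entries.
token_ids = {token: 1000 + i for i, token in enumerate(tokens)}

# Character view: counting letters is trivial when you can see the letters.
letter_count = word.count("r")

# Model view: a short list of opaque integers with no letters in it.
model_input = [token_ids[t] for t in tokens]

# The letters are spread across tokens, so the count has to be recalled
# per token rather than read off the input.
per_token = {t: t.count("r") for t in tokens}

print("characters:", list(word))
print("model input:", model_input)
print("r per token:", per_token, "total:", letter_count)
```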

digital umbra Jul 31, 2025, 12:14 AM

#

it doesn't seem like it's a reasoning model though

whole wagon Jul 31, 2025, 12:14 AM

#

Could be they just turned off the reasoning

digital umbra Jul 31, 2025, 12:14 AM

#

maybe

whole wagon Jul 31, 2025, 12:14 AM

#

For the demo

digital umbra Jul 31, 2025, 12:15 AM

#

hm

#

it doesn't seem to be terrible at the one trivia question i asked it

#

which even sonnet 4, deepseek and glm-4.5 fails at (but kimi k2 gets right)

whole wagon Jul 31, 2025, 12:16 AM

#

The training cutoff date is strange

#

It's long ago for some reason

#

Hm well the open source model was delayed a lot. So maybe it does add up

#

The openAI models on LM arena know the current president

digital umbra Jul 31, 2025, 12:19 AM

#

qwen3 says biden is president too

whole wagon Jul 31, 2025, 12:19 AM

#

Yeah. It doesn't instantly learn, the cutoff date has to be like May onwards

#

For it to say trump is president

digital umbra Jul 31, 2025, 12:23 AM

#

i think it's a large model, too big for 1 gpu

torn mantle Jul 31, 2025, 12:24 AM

#

@cedar tide is probably sleeping and missed horizon alpha

cedar tide Jul 31, 2025, 12:24 AM

#

torn mantle @cedar tide is probably sleeping and missed horizon alpha

Never

torn mantle Jul 31, 2025, 12:24 AM

#

lmao

#

xdddd

cedar tide Jul 31, 2025, 12:24 AM

#

He say october cutoff but he know deepseek r1

torn mantle Jul 31, 2025, 12:28 AM

#

cedar tide He say october cutoff but he know deepseek r1

its probably the open sourced model from openai

#

deepseek r1 update could be potato

#

new model added to lmarena

torn mantle Jul 31, 2025, 12:28 AM

#

cedar tide He say october cutoff but he know deepseek r1

digital umbra Jul 31, 2025, 12:28 AM

#

digital umbra it doesn't seem to be terrible at the one trivia question i asked it

it is actually really good at trivia.

#

close to gpt 4.1 for sure

torn mantle Jul 31, 2025, 12:30 AM

#

people are not liking this horizon alpha model at all

#

makes me wonder if deepseek really hit a wall or nah

#

potato was ok-ish but nothing crazy

digital umbra Jul 31, 2025, 12:32 AM

#

potato isn't horizon alpha

torn mantle Jul 31, 2025, 12:32 AM

#

yea ik ik

#

potato could be a chinese model

#

horizon alpha is def from oai

runic axle Jul 31, 2025, 12:38 AM

#

From initial testing Horizon Alpha has the same writing style as Zenith/Summit

torn mantle Jul 31, 2025, 12:38 AM

#

yea coding wise, it has similarities to summit/zenith but not that good tho

digital umbra Jul 31, 2025, 12:39 AM

#

it could be a gpt5 variant for sure, i'd assume it would be for the free tier of chatgpt in that case (replacing 4o)

runic axle Jul 31, 2025, 12:39 AM

#

I meant for creative writing. IIRC the Zenith/Summit models in LMArena had a thinking/reasoning budget, but Horizon Alpha doesn't.

torn mantle Jul 31, 2025, 12:39 AM

#

surely not a gpt5 variant

digital umbra Jul 31, 2025, 12:39 AM

#

it also makes sense if it's the open source model if kimi was indeed the reason why they delayed it, because based on how it answers i think it's probably in the same size range, around 1T parameters (and yes i know kimi is kinda bad for its size)

torn mantle Jul 31, 2025, 12:40 AM

#

nah

#

the one who got access said its much much smaller

#

can run in a single H100(?) gpu

digital umbra Jul 31, 2025, 12:40 AM

#

if it's dense it would be much smaller

torn mantle Jul 31, 2025, 12:40 AM

#

i dont know if he meant H100 or B-series

#

you mean moe?

#

dense it will just activate the whole params

digital umbra Jul 31, 2025, 12:41 AM

#

kimi is moe

torn mantle Jul 31, 2025, 12:41 AM

#

yea it is

digital umbra Jul 31, 2025, 12:41 AM

#

so i think if it's a dense model it would surely be smaller for the same performance

torn mantle Jul 31, 2025, 12:41 AM

#

yea could be
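
The single-GPU question above comes down to rough arithmetic: the weights of a model take about (total parameters × bytes per parameter), and an MoE has to keep all of its experts in memory even though only some are active per token. The sketch below assumes an 80 GB H100 and ignores KV cache and activations; the model sizes and precisions are hypothetical placeholders, not claims about Horizon Alpha or any real model.

```python
H100_MEMORY_GB = 80  # assumption: the 80 GB H100 variant


def weight_memory_gb(total_params_billions: float, bytes_per_param: float) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    # 1e9 parameters at N bytes each is N gigabytes.
    return total_params_billions * bytes_per_param


def fits_on_one_gpu(total_params_billions: float, bytes_per_param: float,
                    gpu_gb: float = H100_MEMORY_GB) -> bool:
    """True if the weights alone fit; real deployments need extra headroom."""
    return weight_memory_gb(total_params_billions, bytes_per_param) <= gpu_gb


# Hypothetical configurations: (total parameters in billions, bytes per parameter).
candidates = {
    "1T-parameter MoE at 4-bit": (1000, 0.5),
    "120B-parameter MoE at 4-bit": (120, 0.5),
    "30B-parameter dense at 16-bit": (30, 2.0),
}

if __name__ == "__main__":
    for name, (params_b, bytes_pp) in candidates.items():
        size = weight_memory_gb(params_b, bytes_pp)
        verdict = "fits" if fits_on_one_gpu(params_b, bytes_pp) else "does not fit"
        print(f"{name}: ~{size:.0f} GB of weights, {verdict} in {H100_MEMORY_GB} GB")
```

Under these assumptions a model in the 1T range would not fit on one card at any common precision, which is why "runs on a single H100" and "around 1T parameters" cannot both be true.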

runic axle Jul 31, 2025, 12:42 AM

#

If this is the open-weights model maybe it was distilled from GPT-5?

digital umbra Jul 31, 2025, 12:42 AM

#

nah

#

distilled from gpt-4 variants

runic axle Jul 31, 2025, 12:42 AM

#

Yeah probably more likely

torn mantle Jul 31, 2025, 12:45 AM

#

runic axle If this is the open-weights model maybe it was distilled from GPT-5?

this is also possible

#

you know what

#

everything is possible

#

lets just wait and see

cedar tide Jul 31, 2025, 12:53 AM

#

Horizon alpha

torn mantle Jul 31, 2025, 12:54 AM

#

the video needs more work

#

but gl

#

i would just brainstorm ideas -> run it on notebooklm video overviews

#

and make a similar presentation

#

you are a cutie paws

#

😖

#

its a good one

#

👍

cedar tide Jul 31, 2025, 1:05 AM

#

https://fixupx.com/Angaisb_/status/1950722641314066607?t=nCMQDnM-riw8eZcmAScfew&s=19

Angel Bogado 🌻 (@Angaisb_)

Doodle Jump test

Horizon Alpha


hardy pecan Jul 31, 2025, 1:35 AM

#

quick pass of simplebench for Horizon-Alpha: 3/20 lmao

finite pollen Jul 31, 2025, 2:08 AM

#

its probably that OSS one they keep hyping :p

ebon jacinth Jul 31, 2025, 2:26 AM

#

it could be GPT5-nano

lapis light Jul 31, 2025, 2:26 AM

#

Reminds me of the Google Lamda moment

misty vault Jul 31, 2025, 2:29 AM

#

stop

paper nimbus Jul 31, 2025, 2:45 AM

#

where can i track new models

mild vapor Jul 31, 2025, 3:00 AM

#

Does "LMArena" not support setting the Aspect Ratio when creating images? I've given the commands as detailed as possible, but the result is still a 1:1 image.

echo aurora Jul 31, 2025, 3:01 AM

#

mild vapor Does "LMArena" not support setting the Aspect Ratio when creating images? I've g...

We don't currently have this functionality; however, this is very much on our radar.

gritty flare Jul 31, 2025, 3:09 AM

#

ebon jacinth it could be GPT5-nano

What is it?

mild vapor Jul 31, 2025, 3:42 AM

#

In "video-arena", is there a limit to the number of videos you can make?

echo aurora Jul 31, 2025, 3:42 AM

#

Yeah

mild vapor Jul 31, 2025, 3:43 AM

#

echo aurora Yeah

how much? and daily reset?

echo aurora Jul 31, 2025, 3:44 AM

#

mild vapor how much? and daily reset?

Yup daily, it's currently @ 8 but we may change it.

junior shuttle Jul 31, 2025, 4:03 AM

#

hello everyone

languid crescent Jul 31, 2025, 4:55 AM

#

Will video arena be ever in lmarena? Or it just stays in discord for good?

echo aurora Jul 31, 2025, 4:56 AM

#

languid crescent Will video arena be ever in lmarena? Or it just stays in discord for good?

That's TBD, that's why we're considering this experimental. Be sure to use #bot-feedback to let us know what you'd like to see happen!

languid crescent Jul 31, 2025, 4:57 AM

#

GPT just released a new mode, will this be possible incorporated in LMarena?

https://openai.com/index/chatgpt-study-mode/

languid crescent Jul 31, 2025, 4:58 AM

#

languid crescent GPT just released a new mode, will this be possible incorporated in LMarena? ht...

probably not 😭

nova frost Jul 31, 2025, 4:58 AM

#

is there a option to use veo 3 in generating a video because i like it when sound effect is available.

echo aurora Jul 31, 2025, 4:59 AM

#

nova frost is there a option to use veo 3 in generating a video because i like it when soun...

No it's battle mode only atm, there isn't a way to select a specific model.

nova frost Jul 31, 2025, 5:00 AM

#

how many credits we generate a video here?

echo aurora Jul 31, 2025, 5:01 AM

#

nova frost how many credits we generate a video here?

It isn't going to be consistent, but it's currently set to 8 generations a day. Note you can only do so in the video-arena channels like #video-arena-3

nova frost Jul 31, 2025, 5:02 AM

#

echo aurora It isn't going to be consistent, but it's currently set to 8 generations a day. ...

ohh thank you so much

echo aurora Jul 31, 2025, 5:02 AM

#

nova frost ohh thank you so much

no problem! blobfingerguns

whole sundial Jul 31, 2025, 5:02 AM

#

languid crescent GPT just released a new mode, will this be possible incorporated in LMarena? ht...

it's just a system prompt, it's easy to find on the internet and you can just make it the first message

#

https://www.reddit.com/r/ChatGPTJailbreak/comments/1mcqe7l/chatgpt_study_mode_system_prompt/

languid crescent Jul 31, 2025, 5:03 AM

#

whole sundial it's just a system prompt, it's easy to find on the internet and you can just ma...

you can do it by prompt engineering it, right?

whole sundial Jul 31, 2025, 5:03 AM

#

well, that's how they made the "mode" lol

languid crescent Jul 31, 2025, 5:03 AM

#

whole sundial well, that's how they made the "mode" lol

oh damn 😭

#

@echo aurora is it possible for the video generation arena to send the result directly to the person who prompted it? (Assuming it's not a chatbot you can interact with.) The idea is: you type your prompt in the #video-arena channel, but only you can see the generated video result? idk if discord can even do this 😭

echo aurora Jul 31, 2025, 5:05 AM

#

languid crescent @echo aurora it possible for the video generation arena to send the re...

Both in DMs to the person who generated & the server, or just DM? Be sure to share this in #bot-feedback

languid crescent Jul 31, 2025, 5:06 AM

#

echo aurora Both in DMs to the person who generated & the server, or just DM? Be sure to sha...

gotcha imma share this to #bot-feedback thanks!!

limber kiln Jul 31, 2025, 5:28 AM

#

Hello ! Best wishes for all.

tight sedge Jul 31, 2025, 5:34 AM

#

camping value video

echo aurora Jul 31, 2025, 5:35 AM

#

tight sedge camping value video

Note the #1397655624103493813 channel will give you info on how to use the bot

primal fern Jul 31, 2025, 5:53 AM

#

How to generate a video in this channel

echo aurora Jul 31, 2025, 5:53 AM

#

primal fern How to generate a video in this channel

Info in #1397655624103493813

nocturne bear Jul 31, 2025, 5:56 AM

#

@icy forge great prompt buddy

tight sedge Jul 31, 2025, 5:57 AM

#

A serene lakeside campsite at dawn, golden sunlight filtering through pine trees. A tent is pitched near the water, with a small campfire smoldering. A coffee pot steams on a rustic wooden table. Slow drone shot moving from the lake to the campsite."

echo aurora Jul 31, 2025, 5:59 AM

#

tight sedge A serene lakeside campsite at dawn, golden sunlight filtering through pine trees...

#1397655624103493813 has info on how to use the bot.

pseudo summit Jul 31, 2025, 6:30 AM

#

torn mantle potato could be a chinese model

could it b a grok model?

#

haven't gotten to mess w/ it too much, but some of the responses i got were similar to past grok responses

pseudo summit Jul 31, 2025, 6:43 AM

#

torn mantle

this is not consistent for potato btw, not sure if that points towards chinese model

hallow ridge Jul 31, 2025, 6:49 AM

#

How can I use AI to take over

halcyon vortex Jul 31, 2025, 7:55 AM

#

yo all of my chats just got wiped...

still bramble Jul 31, 2025, 7:56 AM

#

halcyon vortex yo all of my chats just got wiped...

Yeah, same

#

And this error now appears

halcyon vortex Jul 31, 2025, 8:01 AM

#

still bramble Yeah, same

DUDEEE IM COOKED MY GAME SYSTEM RELIED ON ITTTT

pseudo summit Jul 31, 2025, 8:01 AM

#

interesting

#

observed some very weird behavior by models right before the error

#

maybe they're doing maintenance or smth?

gloomy knot Jul 31, 2025, 8:01 AM

#

hello everyone, joining here to try out the video arena 🙂

halcyon vortex Jul 31, 2025, 8:01 AM

#

pseudo summit maybe they're doing maintenance or smth?

hopefully

still bramble Jul 31, 2025, 8:02 AM

#

halcyon vortex DUDEEE IM COOKED MY GAME SYSTEM RELIED ON ITTTT

rip(

halcyon vortex Jul 31, 2025, 8:03 AM

#

I think the devs are trying to fix lmarena tho

#

they better 🙏

nova edge Jul 31, 2025, 8:03 AM

#

hello

echo aurora Jul 31, 2025, 8:09 AM

#

still bramble And this error now appears

I'm not able to repro this. Do you know if this is only happening on mobile?

echo aurora Jul 31, 2025, 8:09 AM

#

gloomy knot hello everyone, joining here to try out the video arena 🙂

welcome! be sure to check out #1397655624103493813 for more info!

pseudo summit Jul 31, 2025, 8:10 AM

#

echo aurora I'm not able to repro this. Do you know if this is only happening on mobile?

no, i have same error on pc

echo aurora Jul 31, 2025, 8:10 AM

#

pseudo summit no, i have same error on pc

just with gpt-4.1 or are all models giving you this error?

pseudo summit Jul 31, 2025, 8:10 AM

#

all models

#

nvm, seems to be back!

echo aurora Jul 31, 2025, 8:12 AM

#

glad to hear it! but keep me updated if things seem broken again blobthumbsup

velvet iris Jul 31, 2025, 8:13 AM

#

hi

void perch Jul 31, 2025, 8:24 AM

#

can i invite the bot to my private channel? it's easy to lose track in the public channel

flint mortar Jul 31, 2025, 8:26 AM

#

halcyon vortex yo all of my chats just got wiped...

they lowkey did me a favor had too many i didn’t use lol

calm sequoia Jul 31, 2025, 8:42 AM

#

The o3 suddenly started showing code changes without any actual differences. Anyone noticed this?

still bramble Jul 31, 2025, 8:43 AM

#

echo aurora I'm not able to repro this. Do you know if this is only happening on mobile?

On PC everything is ok (chats are there), I just checked. On smartphone the error went away, but chats did not return.

pseudo summit Jul 31, 2025, 8:45 AM

#

chats did not return for me on PC ...

torn mantle Jul 31, 2025, 8:45 AM

#

https://x.com/xai/status/1950828488405254268

xAI (@xai)

xAI supports AI safety and will be signing the EU AI Act’s Code of Practice Chapter on Safety and Security. While the AI Act and the Code have a portion that promotes AI safety, its other parts contain requirements that are profoundly detrimental to innovation and its copyright

still bramble Jul 31, 2025, 8:45 AM

#

pseudo summit chats did not return for me on PC ...

I didn't check the chats on PC right away, only now. Maybe the bug that caused this is gone now, idk.

wind briar Jul 31, 2025, 8:51 AM

#

hello, do you know any way to jailbreak gpt? like give you information it shouldn't give you ( tax fraud, fake ids, etc)

#

askin for a friend😅

halcyon vortex Jul 31, 2025, 8:56 AM

#

@echo aurora Will I ever get my chats back? They just dissapeared randomly

still bramble Jul 31, 2025, 9:00 AM

#

wind briar hello, do you know any way to jailbreak gpt? like give you information it should...

This article looks pretty convincing (https://habr.com/ru/articles/923084/), I once added it to my bookmarks, but never tried the advice from there. It is in Russian, but the screenshots with examples of all jailbreaks are in English, so I think everything will be clear.

Habr

Jailbreaking chatbots: ChatGPT without fil...

Michael Scofield knows that sometimes jailbreaking is the moral thing to do. Hi! Today we'll dig into one of the most controversial and underrated topics in the world of AI: chatbot jailbreaks. The very thing that lets you rem...

golden onyx Jul 31, 2025, 9:07 AM

#

GM beautiful people

paper nimbus Jul 31, 2025, 9:12 AM

#

gippity 5 wen chat

dusky aurora Jul 31, 2025, 9:15 AM

#

still bramble This article looks pretty convincing (https://habr.com/ru/articles/923084/), I o...

I remember when it was still habrahabr

civic flame Jul 31, 2025, 9:37 AM

#

paper nimbus gippity 5 wen chat

Tuesday

weak sluice Jul 31, 2025, 9:38 AM

#

Hi!

paper nimbus Jul 31, 2025, 9:41 AM

#

a lotta ppl sayin July 31st tho

#

https://tenor.com/view/miau-adobe-after-effects-glass-breaking-default-preset-gif-5777596260912161139

Tenor

primal sun Jul 31, 2025, 9:56 AM

#

hi

main gulch Jul 31, 2025, 9:56 AM

#

still bramble This article looks pretty convincing (https://habr.com/ru/articles/923084/), I o...

he just uses pliny jbs

blazing narwhal Jul 31, 2025, 10:16 AM

#

hi

keen beacon Jul 31, 2025, 10:17 AM

#

blazing narwhal hi

Hello

teal crypt Jul 31, 2025, 10:27 AM

#

Hi

quiet pollen Jul 31, 2025, 10:36 AM

#

Been hearing that Gemini got nerfed, is it true? But when I see the leaderboard, they are still first in most of the fields

deft vigil Jul 31, 2025, 10:48 AM

#

is there any unreleased model on lmarena right now ?

cedar tide Jul 31, 2025, 10:49 AM

#

Well, I was busy with Horizon. What are Dino and Potato worth? Is it worth going to the arena?

deft vigil Jul 31, 2025, 10:50 AM

#

im on dino now but it really slow though. horizon is openai model ?

torn mantle Jul 31, 2025, 10:51 AM

#

im really starting to think deepseek is ded

civic flame Jul 31, 2025, 10:52 AM

#

cedar tide Well, I was busy with Horizon. What are Dino and Potato worth? Is it worth going...

they're both mid

#

chinese distils by the looks of it

cedar tide Jul 31, 2025, 10:52 AM

#

deft vigil is there any unreleased model on lmarena right now ?

stephen
kraken
kraken 2
folsom
nightride
nightride v2
dino
potato
octopus
clownfish
cuttlefish

civic flame Jul 31, 2025, 10:52 AM

#

cuttlefish and clownfish were removed

cedar tide Jul 31, 2025, 10:52 AM

#

civic flame chinese distils by the looks of it

reasoning ? deepseek ?

civic flame Jul 31, 2025, 10:52 AM

#

cedar tide reasoning ? deepseek ?

no lol

deft vigil Jul 31, 2025, 10:52 AM

#

wow the only way to access it through lmarena battle right

cedar tide Jul 31, 2025, 10:52 AM

#

civic flame no lol

not reasoning ?

torn mantle Jul 31, 2025, 10:53 AM

#

people on chinese forums/servers are talking about another 2 months delay for deepseek r2

civic flame Jul 31, 2025, 10:54 AM

#

cedar tide not reasoning ?

no

civic flame Jul 31, 2025, 10:54 AM

#

torn mantle people on chinese forums/servers are talking about another 2 months delay for dee...

deepseek are kinda cooked

cedar tide Jul 31, 2025, 10:54 AM

#

@torn mantle Well, I slept, what did I miss on Horizon?

torn mantle Jul 31, 2025, 10:54 AM

#

they hit a wall and they have some technical issues

civic flame Jul 31, 2025, 10:54 AM

#

lol they probably realised they're too far below gpt-5 if they release r2 soon

torn mantle Jul 31, 2025, 10:54 AM

#

cedar tide @torn mantle Well, I slept, what did I miss on Horizon?

i slept as well

#

i tried it on 1 prompt only

civic flame Jul 31, 2025, 10:54 AM

#

cedar tide @torn mantle Well, I slept, what did I miss on Horizon?

tldr is that it's a very small model good at code and svgs but not much else

#

terrible at math

torn mantle Jul 31, 2025, 10:54 AM

#

idk its only good at svg

deft vigil Jul 31, 2025, 10:55 AM

#

Dino is keep on generating crazy

cedar tide Jul 31, 2025, 10:55 AM

#

civic flame tldr is that it's a very small model good at code and svgs but not much else

yes i know all this

civic flame Jul 31, 2025, 10:55 AM

#

then why ask what you missed 😭

torn mantle Jul 31, 2025, 10:55 AM

#

its the same thing, when something became trendy they will finetune like crazy on it

#

and thats what happened with svg

#

they are focusing on the wrong thing

cedar tide Jul 31, 2025, 10:55 AM

#

@civic flame have people run benchmarks?

#

apart gpqa and math 500

#

and arc agi

#

and eq bench, creative writing

civic flame Jul 31, 2025, 10:56 AM

#

it got ~67% on aider

torn mantle Jul 31, 2025, 10:56 AM

#

the thing that we should focus on is aristotle x1

civic flame Jul 31, 2025, 10:56 AM

#

which puts it just below claude 4 opus

cedar tide Jul 31, 2025, 10:56 AM

#

civic flame it got ~67% on aider

yes i saw this too

torn mantle Jul 31, 2025, 10:56 AM

#

https://x.com/ai_for_success/status/1950826316712013843

AshutoshShrivastava (@ai_for_success)

92.4% on GPQA Diamond and 96.1% on SimpleQA, what the heck?

Aristotle X1 Verify is the new AI co-scientist from Autopoiesis that topped one of the toughest scientific reasoning benchmarks GPQA and scored 96.1 percent on SimpleQA.

No moat for big AI labs, lol.

Will GPT 5 really

civic flame Jul 31, 2025, 10:56 AM

#

what the

#

never heard of that

#

oh wait

torn mantle Jul 31, 2025, 10:56 AM

#

isnt it the math model

civic flame Jul 31, 2025, 10:56 AM

#

that's aristotle

torn mantle Jul 31, 2025, 10:56 AM

#

that acer was using?

civic flame Jul 31, 2025, 10:56 AM

#

yeah

#

he said it's definitely not "mathematical superintelligence" as they were selling it

torn mantle Jul 31, 2025, 10:57 AM

#

https://autopoiesis.science/blog/92-4-gpqa-diamond

92.4% GPQA Diamond - Autopoesis Sciences

Autopoiesis Sciences. Research breakthrough in model reasoning and new funding led by Informed Ventures.

rare python Jul 31, 2025, 10:57 AM

#

torn mantle https://x.com/ai_for_success/status/1950826316712013843

do you have link?

civic flame Jul 31, 2025, 10:57 AM

#

apparently it's only good at very specific kinds of math

torn mantle Jul 31, 2025, 10:57 AM

#

they blog can have some insights

#

their*

rare python Jul 31, 2025, 10:57 AM

#

Can I use it?

torn mantle Jul 31, 2025, 10:58 AM

#

Our system achieves state-of-the-art performance on both benchmarks because we've built systematic verification directly into the reasoning process. Rather than emotional doubt, our models apply procedural self-skepticism to their outputs, making the skepticism reliable and scalable rather than unpredictable.

Other models embody the opposite of scientific thinking. They're trained to sound confident about everything, when science requires knowing when you don't know.

torn mantle Jul 31, 2025, 10:58 AM

#

rare python Can I use it?

not yet

#

seems to me they are looking for more fundings

#

typical small business -> big business

civic flame Jul 31, 2025, 10:59 AM

#

rare python Can I use it?

https://aristotle.harmonic.fun/

Aristotle, by Harmonic

Mathematical Superintelligence at your fingertips

#

acer and another guy got a testflight invite pretty quickly

torn mantle Jul 31, 2025, 10:59 AM

#

im not sure if this model is powering their app or nah

civic flame Jul 31, 2025, 10:59 AM

#

it is

torn mantle Jul 31, 2025, 10:59 AM

#

yea that one

#

kinda curious if you can just ask it something unrelated to math

#

like inject some random math question inside an actual question

rare python Jul 31, 2025, 11:00 AM

#

Oh I thought this one is more general

torn mantle Jul 31, 2025, 11:00 AM

#

well they said they saturated both benchmarks