#general | Arena | Page 77

ocean vortex Jul 26, 2025, 11:58 AM

#

I agree that some of those prompts are overloaded with stuff about tools and are a bit much. But it’s still not definitively bad, you just have an emphasis on tools at the start of your chats. There have been no observed performance degradation in most cases - it basically evens itself out, helps as much as it harms but makes perfect sense for the platform to use

torn mantle Jul 26, 2025, 11:58 AM

#

dw its ok

ocean vortex Jul 26, 2025, 11:59 AM

#

So depending on your prompt and model it may perform slightly better or slightly worse

hazy quest Jul 26, 2025, 11:59 AM

#

Asura are you gatekeeping information leaked on X ?!

ocean vortex Jul 26, 2025, 11:59 AM

#

What was it

#

Co to kurwa jest @full idol

alpine flare Jul 26, 2025, 12:03 PM

#

ocean vortex What was it

An image of the "New Arena Models" Notification with a list of new models in LM Arena

hazy quest Jul 26, 2025, 12:04 PM

#

This one

alpine flare Jul 26, 2025, 12:05 PM

#

hazy quest Asura are you gatekeeping information leaked on X ?!

Didnt understand it as well

torn star Jul 26, 2025, 12:06 PM

#

I can already tell these models will have an elo of 1800+

keen beacon Jul 26, 2025, 12:06 PM

#

asura is trolling lol

rare python Jul 26, 2025, 12:07 PM

#

keen beacon asura is trolling lol

that's why asura is the most hated person in this server

torn mantle Jul 26, 2025, 12:18 PM

#

rare python that's why asura is the most hated person in this server

how come im the most hated person on all servers i joined

#

they all bully me

#

what did i do?

rare python Jul 26, 2025, 12:19 PM

#

torn mantle what did i do?

wait for the poll result

torn mantle Jul 26, 2025, 12:19 PM

#

it was removed

#

there is no such thing

rare python Jul 26, 2025, 12:20 PM

#

torn mantle it was removed

did you do it?

#

🤔

tepid turtle Jul 26, 2025, 12:21 PM

#

Hi guys, I can't seem to find the new "zenith" model everyone's talking about, was it removed in the meantime?

rare python Jul 26, 2025, 12:21 PM

#

torn mantle it was removed

#general message

#

54%

torn mantle Jul 26, 2025, 12:23 PM

#

...

hazy quest Jul 26, 2025, 12:28 PM

#

tepid turtle Hi guys, I can't seem to find the new "zenith" model everyone's talking about, w...

I get it every like once every 5 attempts in average more or less

fossil fable Jul 26, 2025, 12:29 PM

#

am i stupid

marsh sundial Jul 26, 2025, 12:30 PM

#

summit got better writing style, zenith is over rhetoric, maybe variant

sour spindle Jul 26, 2025, 12:41 PM

#

Going to be quite funny if these new models aren’t OpenAI

#

Meta finally putting that money to use lol

#

Some obscure cracked Chinese company side project?

alpine flare Jul 26, 2025, 12:44 PM

#

Did some testing on the models. Here's what I found out:

Zenith introduced a new perspective on a physics question (electrostatics) that I had, which I have never seen before from any other model. However, it made some strange assumptions and concluded that a scenario was only true if the assumption was also true, which was incorrect. I have also never seen this assumption from any other model before. Usually, you get very similar arguments from several models—you can often predict how they will respond to and reason about a specific physics question (be it wrong or right, there are pretty typical parts they will be wrong and then the answer will be wrong)—but Zenith and Summit were both entirely new in this regard. However, I did not get the impression their raw knowledge base has improved much, as they still produced similar hallucinations on niche topics like o3 or 2.5 Pro.

hazy quest Jul 26, 2025, 1:05 PM

#

I got

Summit against: Gemma 3 27b, Deepseek V3 0324, 4o 0326, llama 3.3 70B,
Zenith against: Amazon nova pro v1, Sonnet 4 (x2), Sonnet 4 32k

Weird that there Summit was paired only against clearly weaker models, and that none were paired against the big boys.

torn mantle Jul 26, 2025, 1:05 PM

#

hazy quest I got - Summit against: Gemma 3 27b, Deepseek V3 0324, 4o 0326, llama 3.3 70B, ...

they want to get that out of the way

alpine flare Jul 26, 2025, 1:06 PM

#

Suppose, you insert two metal plates pressed together inside a charged capacitor disconnected from battery (constant charge). Now we have Capacitor Plate 1 -> MetalPlate1+2 -> Capacitor Plate 2. Then, we separate the two metal plates inbetween the capacitor plates, so we have Capacitor Plate 1 -> MetalPlate1 -> Metal Plate2 -> Capacitor Plate 2. Please discuss whether there is a net electric field between MetalPlate1 and MetalPlate2

#

Even o3-pro answered "After the two inserted metal plates are pulled apart, they carry equal and opposite charges that cannot neutralise each other. Those charges reside partly on the facing surfaces and create a non-zero, uniform electric field in the space between the plates." which is clearly wrong

whole wagon Jul 26, 2025, 1:27 PM

#

whenever you put an actually complex problem in llm arena and get a reasoning model, the model thinks for so long and times out lol

torn mantle Jul 26, 2025, 1:37 PM

#

whole wagon whenever you put an actually complex problem in llm arena and get a reasoning mo...

i know what you want to tell us

whole wagon Jul 26, 2025, 1:37 PM

#

what

torn mantle Jul 26, 2025, 1:37 PM

#

maybe you are smarter than all of us

ocean vortex Jul 26, 2025, 1:38 PM

#

torn mantle what did i do?

I actually dunno but this is still hilarious tbh

whole wagon Jul 26, 2025, 1:38 PM

#

summit is really good

ocean vortex Jul 26, 2025, 1:38 PM

#

lmao

whole wagon Jul 26, 2025, 1:38 PM

#

like damn wow

fossil fable Jul 26, 2025, 1:49 PM

#

fossil fable am i stupid

no really am i stupid

why can't i do direct chats in webarena

civic flame Jul 26, 2025, 1:54 PM

#

webarena only has battle mode lol

whole wagon Jul 26, 2025, 2:00 PM

#

I am in a game show where there are 3 doors. One contains a prize and two a goat. I can choose one door at the beginning. Then the game master offers me to change it. Does changing it increase my probability of winning?

Still cant get this basic question right

#

The answer is obviously No. But it overfits to training data on monty hall problem

#

simplebench type problem

#

The reasoning the model gives is instantly about monty hall and the maths

stray aspen Jul 26, 2025, 2:10 PM

#

stepfun just dropped a new model

whole wagon Jul 26, 2025, 2:12 PM

#

whole wagon The answer is obviously No. But it overfits to training data on monty hall probl...

"Carefully read my prompt before answering" fixes. I think this is the model just assuming I forget to mention the door being opened after being chosen

#

Which is interesting. It's like trying to do typo correction but for the entire prompt

alpine flare Jul 26, 2025, 2:20 PM

#

whole wagon The reasoning the model gives is instantly about monty hall and the maths

Same for me, but summit ended its answer with "Note: This 2/3 result assumes the host always opens a goat, never opens the prize, and always offers the switch. If the host doesn’t open a door (just lets you change to a random other door), switching doesn’t help; it stays 1/3 either way."

hazy quest Jul 26, 2025, 2:44 PM

#

Same. Got summit twice, and both times it said something along those lines.
"Caveats: If the host does not reveal a door and simply asks if you want to change to one of the other two at random, switching doesn’t help (it stays 1/3)."

alpine flare Jul 26, 2025, 2:45 PM

#

zenith didnt do this btw (Grok-4 and summit are the only models that provide such a note, Grok even states "You didn't mention the host opening a door, which is unusual...")

whole wagon Jul 26, 2025, 2:55 PM

#

70% chance it's here by Aug 15th

torn bison Jul 26, 2025, 2:58 PM

#

more like 90% but im not gonna risk my money on it

whole wagon Jul 26, 2025, 2:59 PM

#

#

Interestingly the Dec 31 odds are not shifting though. A Google counter response is expected by then I guess

tall summit Jul 26, 2025, 3:02 PM

#

whole wagon summit is really good

i have never heard this problem before, it's a nice one

torn mantle Jul 26, 2025, 3:08 PM

#

i was confident gemini models wont be topped

#

but zenith / summit proved me wrong

whole wagon Jul 26, 2025, 3:09 PM

#

whole wagon summit is really good

Somehow o3 is getting this right also... I swear it got it wrong before

#

Wtf

torn bison Jul 26, 2025, 3:11 PM

#

I never thought zero shot prompting could create apps of this caliber

torn mantle Jul 26, 2025, 3:12 PM

#

whole wagon Somehow o3 is getting this right also... I swear it got it wrong before

are you trying o3 on lmarena?

whole wagon Jul 26, 2025, 3:12 PM

#

No on chatGPT

torn mantle Jul 26, 2025, 3:12 PM

#

some people said o3 on chatgpt got redirected to the new model zenith/gpt-5

whole wagon Jul 26, 2025, 3:16 PM

#

I think more people would have noticed that lol

stray aspen Jul 26, 2025, 3:16 PM

#

where are gpt-5 news

torn mantle Jul 26, 2025, 3:16 PM

#

whole wagon I think more people would have noticed that lol

but you didnt

primal orbit Jul 26, 2025, 3:16 PM

#

not impressed with zenith, hallucinates within answer (https://i.snipboard.io/em4Mlw.jpg)

whole wagon Jul 26, 2025, 3:16 PM

#

Maybe it's whatever o3-alpha is

torn mantle Jul 26, 2025, 3:16 PM

#

maybe

primal orbit Jul 26, 2025, 3:17 PM

#

wtf stay switzerland supposed to mean

leaden sun Jul 26, 2025, 3:20 PM

#

primal orbit wtf stay switzerland supposed to mean

it means stay"neutral, diplomatic (and composed, measured)"

stray aspen Jul 26, 2025, 3:21 PM

#

lol

leaden sun Jul 26, 2025, 3:27 PM

#

i like "surgically helpful in private", like it's implying a "private therapist" or clinically analytical helpfulness with surgical precision

rare python Jul 26, 2025, 3:54 PM

#

@torn mantle

leaden palm Jul 26, 2025, 4:39 PM

#

so many models

stray aspen Jul 26, 2025, 4:49 PM

#

to use zenith you just send the battle mode your propmt until you get it?

civic flame Jul 26, 2025, 4:49 PM

#

yes

whole sundial Jul 26, 2025, 4:54 PM

#

someone is trying to censor people talking about nightride? I guess that's because Google doesn't want people finding out their model searches on the web, which is why it is good at knowledge. I gave a question yesterday with a very obscure answer that can only be found by searching, and nightride-on got it correct.

keen fulcrum Jul 26, 2025, 4:59 PM

#

whole sundial someone is trying to censor people talking about nightride? I guess that's becau...

Just publish the info recently

#

It could have been in its dataset

whole sundial Jul 26, 2025, 5:10 PM

#

don't think it would have known this lol

#

this isn't the search arena google, why are you using web search on the normal arena? that is not fair to any other model

keen beacon Jul 26, 2025, 5:12 PM

#

gemini advanced/bard with search has been part of the normal arena in the past

#

its not a new thing

jagged crown Jul 26, 2025, 5:20 PM

#

I'm literally in shock. Zenith is basically creating things that have taken me months in the span of a few minutes

unborn ocean Jul 26, 2025, 5:22 PM

#

whole sundial this isn't the search arena google, why are you using web search on the normal a...

could also be that they are exploring the effect of their search implementation on the human preference ratings (and never actually releasing this officially on the arena)

#

aka just an experiment, idk though

stray aspen Jul 26, 2025, 5:30 PM

#

is o4-mini-high in the arena

torn mantle Jul 26, 2025, 5:45 PM

#

rare python <@295243581818404874>

well...

#

i still think its kiri

torn mantle Jul 26, 2025, 5:45 PM

#

whole sundial someone is trying to censor people talking about nightride? I guess that's becau...

hes blaming google now

civic flame Jul 26, 2025, 6:02 PM

#

jagged crown I'm literally in shock. Zenith is basically creating things that have taken me m...

+1

civic flame Jul 26, 2025, 6:02 PM

#

stray aspen is o4-mini-high in the arena

i believe so

whole wagon Jul 26, 2025, 6:03 PM

#

Summit > Zenith > Lobster > Starfish

wheat onyx Jul 26, 2025, 6:04 PM

#

Looking forward to gpt5 in coming days

whole wagon Jul 26, 2025, 6:05 PM

#

Looking forward to open source catching it in 3 months time Kappa

wheat onyx Jul 26, 2025, 6:07 PM

#

whole wagon Looking forward to open source catching it in 3 months time <:Kappa:436339616866...

And being quantized

jagged crown Jul 26, 2025, 6:08 PM

#

I don't think the ramifications of these technologies are really understood by the general public. While on one hand I'm thrilled to be able to create the ideas I have in my head, I'm also terrified for what this will mean for the economy at large

wheat onyx Jul 26, 2025, 6:15 PM

#

jagged crown I don't think the ramifications of these technologies are really understood by t...

Hard to conceptualize. Obviously high unemployment long term. But also much higher gdp. Can't really rework economy before we know the effects

jagged crown Jul 26, 2025, 6:18 PM

#

wheat onyx Hard to conceptualize. Obviously high unemployment long term. But also much high...

Hard to know how to personally prepare. I guess learn to adapt and be able to evolve with it. Definitely going to be very different regarding jobs for my field (UI/UX) designer in a year.

wheat onyx Jul 26, 2025, 6:18 PM

#

jagged crown Hard to know how to personally prepare. I guess learn to adapt and be able to ev...

I imagine more work, fewer jobs

toxic whale Jul 26, 2025, 6:29 PM

#

whole wagon Summit > Zenith > Lobster > Starfish

I disagree, lobster is the best

tight silo Jul 26, 2025, 7:16 PM

#

yeah, I got a random call from summit when doing the battles and I thought it was pretty dang awesome

#

only reason why I didn't give it the win is cause it generated something only partially related to my prompt

#

asked it for a fantasy story set in the perspective of a paladin in hell, it gave me a story of a big-rig driving paladin punching demons so it was pretty close, but not quite fantasy

#

still cool as hell though

leaden palm Jul 26, 2025, 7:21 PM

#

tight silo yeah, I got a random call from summit when doing the battles and I thought it wa...

call?

tight silo Jul 26, 2025, 7:22 PM

#

leaden palm call?

weird wording sorry

#

meant it just showed up

leaden palm Jul 26, 2025, 7:22 PM

#

ok

tight silo Jul 26, 2025, 7:23 PM

#

i am very very tired

ornate stump Jul 26, 2025, 7:28 PM

#

Don’t worry in a couple of years you’ll get a call from someone’s ai assistant for real

tight silo Jul 26, 2025, 7:28 PM

#

yeep

solar hollow Jul 26, 2025, 7:28 PM

#

you guys know if there are restrictions on lmarena for german users?

#

when i want to battle in the arena, sometimes responses take a very long time and/or just error after a while

#

really discourages me from continueing

torn star Jul 26, 2025, 7:57 PM

#

how long until we have ai models creating videos for us on a customized fyp

dawn wharf Jul 26, 2025, 7:58 PM

#

torn star how long until we have ai models creating videos for us on a customized fyp

it will happen yesterday

sullen quest Jul 26, 2025, 7:58 PM

#

solar hollow you guys know if there are restrictions on lmarena for german users?

don't think so?

whole wagon Jul 26, 2025, 8:08 PM

#

Summit is gone out of battle? 😦

#

Zenith is not on the same level

echo aurora Jul 26, 2025, 8:11 PM

#

solar hollow when i want to battle in the arena, sometimes responses take a very long time an...

There are reported issues of lag/errors that we're working on making better. This doesn't have to do with locale.

hardy pecan Jul 26, 2025, 8:21 PM

#

Anyone get the feeling this discord is being used to astroturf and to pump up the new models to manipulate polymarket odds? all these "new" people coming in seems sus.. haven't had them come in in droves like this before lmao

#

maybe im just skeptical

whole wagon Jul 26, 2025, 8:24 PM

#

jagged crown I'm literally in shock. Zenith is basically creating things that have taken me m...

Are you suggesting "StrategicSolutionsAI" writing their first message in the discord server that zenith leaves them in shock and does months of work in a few minutes (even though that is literally impossible in the LLM arena) is an astroturf?

#

I don't believe it Kappa

hardy pecan Jul 26, 2025, 8:26 PM

#

https://tenor.com/view/clown-mr-rogers-mask-clown-mask-gif-22997828

Tenor

whole sundial Jul 26, 2025, 8:27 PM

#

look at this model request post lol https://discord.com/channels/1340554757349179412/1395441703112146984
almost everyone here is just someone creating hype for a model that most can't access

whole wagon Jul 26, 2025, 8:28 PM

#

I don't think the polymarket betting (and kalshi etc) has created very good incentives in this space

hazy quest Jul 26, 2025, 8:30 PM

#

whole wagon Summit > Zenith > Lobster > Starfish

Just based on who the models are put against, I believe that Zenith is bigger than Summit. But very interested in hearing opinions!

whole wagon Jul 26, 2025, 8:37 PM

#

I don't like they keep adding and removing models

#

I only get zenith now

sonic tendon Jul 26, 2025, 8:56 PM

#

hardy pecan Anyone get the feeling this discord is being used to astroturf and to pump up t...

it's not impossible, but i doubt anyone trying to manipulate these markets would be doing something that sophisticated

#

market reactions to new information can be very fickle

#

and liquidity on even the biggest ai market on poly is still awful

#

i don't think that you could make much by speculating on and attempting to control crowd sentiment

gusty tendon Jul 26, 2025, 9:17 PM

#

whole wagon I only get zenith now

i've gotten summit a lot but not zenith once..

brittle tiger Jul 26, 2025, 10:01 PM

#

@civic flame what examples of zenith/summit doing great with reasoning have you seen? the svg pelican looked on par with 2.5, worse than opus 4 to me

civic flame Jul 26, 2025, 10:04 PM

#

sorry, remind me tomorrow i'm not in a good place at the minute. rough night

fossil fable Jul 26, 2025, 10:51 PM

#

how do i use direct chat on webarena

echo aurora Jul 26, 2025, 10:52 PM

#

fossil fable how do i use direct chat on webarena

Sorry to say it's battle mode only.

#

That is feedback we're aware is important though.

formal dagger Jul 26, 2025, 10:54 PM

#

summit crushed Gemini 2.5 on timeline management of story

ocean vortex Jul 26, 2025, 10:55 PM

#

whole sundial don't think it would have known this lol

on = online

fossil fable Jul 26, 2025, 10:55 PM

#

echo aurora Sorry to say it's battle mode only.

is it selectable at all

ocean vortex Jul 26, 2025, 10:56 PM

#

But I agree online models have no business directly competing with models having no tools or internet access

stray aspen Jul 26, 2025, 10:58 PM

#

does any model smash claude sonnet 4 no think at coding?

storm needle Jul 26, 2025, 10:58 PM

#

stray aspen does any model smash claude sonnet 4 no think at coding?

claude 4 opus

ocean vortex Jul 26, 2025, 10:59 PM

#

stray aspen does any model smash claude sonnet 4 no think at coding?

o3 or 2.5pro would both beat Sonnet no think comfortably

#

It’s not even the same league tbh

wanton sonnet Jul 26, 2025, 10:59 PM

#

stray aspen does any model smash claude sonnet 4 no think at coding?

O3 is fairly good at working on complicated projects. Sonnet is a good starter tool though.

mint cape Jul 26, 2025, 11:07 PM

#

Hello

torn star Jul 26, 2025, 11:12 PM

#

Cuttlefish is actually insane.

candid storm Jul 26, 2025, 11:12 PM

#

torn star Cuttlefish is actually insane.

Better than zenith/summit?

blazing rune Jul 26, 2025, 11:15 PM

#

ocean vortex o3 or 2.5pro would both beat Sonnet no think comfortably

Sonnet doesn't need to think, the other ones do

#

I wish they had a non reasoning option for Gemini 2.5 Pro, and had very short reasoning versions for both, like 1k tokens or something

fleet smelt Jul 26, 2025, 11:20 PM

#

candid storm Better than zenith/summit?

How can I chat directly with these models guys?

whole wagon Jul 26, 2025, 11:21 PM

#

You can't

#

Because they are unreleased

fleet smelt Jul 26, 2025, 11:22 PM

#

So how normal people is testing it ??

polar venture Jul 26, 2025, 11:22 PM

#

fleet smelt How can I chat directly with these models guys?

What you mean if you tell me with example i can help you?

fleet smelt Jul 26, 2025, 11:22 PM

#

polar venture What you mean if you tell me with example i can help you?

I want to try zenith myself, its possible some way ?

polar venture Jul 26, 2025, 11:22 PM

#

Like this here iuse ChatGPT 4o

torn star Jul 26, 2025, 11:22 PM

#

fleet smelt So how normal people is testing it ??

Luck

stray aspen Jul 26, 2025, 11:22 PM

#

is this o4 mini high

stray aspen Jul 26, 2025, 11:23 PM

#

fleet smelt So how normal people is testing it ??

get lucky in battle mode

fossil fable Jul 26, 2025, 11:23 PM

#

fossil fable is it selectable at all

?

fleet smelt Jul 26, 2025, 11:23 PM

#

stray aspen get lucky in battle mode

Oh I see ok thanks bro

stray aspen Jul 26, 2025, 11:23 PM

#

just keep on sending until you get that model

fossil fable Jul 26, 2025, 11:23 PM

#

stray aspen get lucky in battle mode

LMAO that's stupid

so they'd let you do that but not the rest

stray aspen Jul 26, 2025, 11:25 PM

#

fossil fable LMAO that's stupid so they'd let you do that but not the rest

no

#

its companies testing their models

#

its unreleased ai models

#

i mean kinda is

fossil fable Jul 26, 2025, 11:25 PM

#

stray aspen its companies testing their models

...in this specific web development arena

does that not sound a bit ridiculous

dusky pier Jul 26, 2025, 11:36 PM

#

torn star Cuttlefish is actually insane.

I think it's grok

#

How is it insane?

hardy pecan Jul 26, 2025, 11:40 PM

#

fossil fable ...in this specific web development arena does that not sound a bit ridiculous

I dont think you understand what lmarena is...

tight nest Jul 26, 2025, 11:52 PM

#

is it true that the current o3 on the chatgpt ui actually routes to Zenith? I saw some people claiming that on twitter

dusky pier Jul 26, 2025, 11:57 PM

#

ocean vortex o3 or 2.5pro would both beat Sonnet no think comfortably

Sonnet is already dead

#

I'm glad they are killing it

#

It's so expensive

hardy pecan Jul 26, 2025, 11:58 PM

#

Ive been getting alot of A/B testing using o3 in chatgpt.com

#

Highly doubt they just replaced it with "zenith" or "summit"

dusky pier Jul 26, 2025, 11:58 PM

#

hardy pecan Ive been getting alot of A/B testing using o3 in chatgpt.com

Sam really wants to train gpt5

wheat onyx Jul 26, 2025, 11:59 PM

#

I wonder what the costs for gpt5 are

stray aspen Jul 26, 2025, 11:59 PM

#

is this 4o mini high?

dusky pier Jul 27, 2025, 12:00 AM

#

wheat onyx I wonder what the costs for gpt5 are

I doubt it's gonna be revolutionary

#

I don't see the hype in it

hardy pecan Jul 27, 2025, 12:03 AM

#

Lol

#

so bizzare

small haven Jul 27, 2025, 12:23 AM

#

hows zenith

jade egret Jul 27, 2025, 12:27 AM

#

when gpt 5.....

small haven Jul 27, 2025, 12:34 AM

#

yes how is it

#

better than o3-pro? 👀

#

or kingfall?

forest prism Jul 27, 2025, 12:58 AM

#

Is folsom-072125-1 just minimax-m1?

dawn wharf Jul 27, 2025, 1:26 AM

#

forest prism Is folsom-072125-1 just minimax-m1?

folsom is amazon

sturdy mica Jul 27, 2025, 1:48 AM

#

poll_question_text

best model rn

victor_answer_votes

7

total_votes

14

victor_answer_id

3

victor_answer_text

o3 pro

torn bison Jul 27, 2025, 1:48 AM

#

https://m-01.pages.dev/summit_gpu_attractor
summit made this

whole wagon Jul 27, 2025, 2:12 AM

#

Bruh why am I just getting zenith

#

Summit is so much better smh

zinc ore Jul 27, 2025, 2:15 AM

#

They changed the sysprompt supposedly, which lowered zenith performance

hardy pecan Jul 27, 2025, 2:21 AM

#

Appears to be so, I'm verifying simple-bench questions summit vs zenith to see which is smarter, although contamination at this point might be redundant

leaden palm Jul 27, 2025, 2:25 AM

#

what do we think about zenith's knowledge? is it just a really recent cutoff or does it search the web?

leaden palm Jul 27, 2025, 2:26 AM

#

leaden palm what do we think about zenith's knowledge? is it just a really recent cutoff or ...

*by really recent i mean within past ~100 days

#

folsom-072125-2 is kinda stupid

sturdy mica Jul 27, 2025, 2:29 AM

#

https://tenor.com/view/peabodycord-penguinz0-moist-critical-gif-26526191

Tenor

small haven Jul 27, 2025, 3:07 AM

#

wow cant wait to use it

wintry tinsel Jul 27, 2025, 3:37 AM

#

dusky pier I don't see the hype in it

The hype is obvious gpt 4 essentially started this whole AI race and Now comes time for the sequel!

#

Gpt 5 has to be successful for the health of the whole industry

winged locust Jul 27, 2025, 3:39 AM

#

NO CUREFISH

patent aspen Jul 27, 2025, 4:03 AM

#

wintry tinsel Gpt 5 has to be successful for the health of the whole industry

No

haughty tangle Jul 27, 2025, 5:39 AM

#

summit is o4-pro

stuck rose Jul 27, 2025, 6:39 AM

#

A great arena for understanding different models. Looking forward to hear about API possibilities for diffrent models.

limber anvil Jul 27, 2025, 6:39 AM

#

haughty tangle summit is o4-pro

Who said

#

Is it better than

#

Claude 4

reef pawn Jul 27, 2025, 7:54 AM

#

Does Copilot PC NPUs have any use cases with LLMs?

heady arch Jul 27, 2025, 7:55 AM

#

Guys do you think that zenith and summit got deleted? I can't get them for 1 hour

gray delta Jul 27, 2025, 8:15 AM

#

heady arch Jul 27, 2025, 8:39 AM

#

Oh no

zealous panther Jul 27, 2025, 8:41 AM

#

what does that mean

hardy pecan Jul 27, 2025, 8:58 AM

#

bastards

#

I was testing them

keen fulcrum Jul 27, 2025, 9:02 AM

#

hardy pecan bastards

Such users as you aren’t appreciated on the platform

sweet tinsel Jul 27, 2025, 9:02 AM

#

Copilot is weird, the deep research is not available in the browser, it told me to pay for plus in the desktop app and I've got 10 free uses on the mobile app.

hardy pecan Jul 27, 2025, 9:02 AM

#

So zenith is still here then perhaps

civic flame Jul 27, 2025, 9:04 AM

#

it should be

#

but I haven't been able to get it

hardy pecan Jul 27, 2025, 9:05 AM

#

Neither

civic flame Jul 27, 2025, 9:05 AM

#

it's still configured on the frontend but I have a feeling it's been disabled behind the scenes

#

as has summit

#

because I've gone from getting them every few rounds to getting neither of them in 50 rounds

hardy pecan Jul 27, 2025, 9:06 AM

#

Yeah same here

sweet tinsel Jul 27, 2025, 9:09 AM

#

Copilot DR is Weird, it refuses to speak in English.

whole wagon Jul 27, 2025, 9:25 AM

#

gray delta

Where do u find these messages

#

Maybe they removed cos they saw this kek

#

The august 15th release bet is at 69% so the market thinks basically guaranteed GPT5 reaches 1st on lmarena

midnight ferry Jul 27, 2025, 9:30 AM

#

gray delta

how do u get those notifications?

fallow remnant Jul 27, 2025, 9:35 AM

#

Where to get to gpt 5?

cedar tide Jul 27, 2025, 9:39 AM

#

Zenith and summit removed too

#

@echo aurora possible to tell the lm arena team that we absolutely want the possibility of regenerating the last answer of the llm even if we have already voted (we could do it on the old lm arena)

midnight ferry Jul 27, 2025, 9:40 AM

#

gpt 5 released 31th july

hardy pecan Jul 27, 2025, 9:41 AM

#

Simple Bench scores from what i tested so far, wasn't able to get to 20 questions in time, before they were removed
Summit: 9/11
Zenith: 10/12

for comparison, gemini 2.5 pro: 7/11 or 7/12 of the simple bench questions

Unfortunately not complete testing, but a rough idea at least

midnight ferry Jul 27, 2025, 9:43 AM

#

what about lobster ?

hardy pecan Jul 27, 2025, 9:43 AM

#

the next 10 other simple bench questions generally trip up alot of models so id expect them to perform less well as opposed to the first 10 public ones

#

Only got lobster in webarena

whole wagon Jul 27, 2025, 9:44 AM

#

where do u get 20 questions

hardy pecan Jul 27, 2025, 9:46 AM

#

from a competition the simple bench fella held to find the best optimized prompt to get these questions correct, wandb.ai dataset i believe

blazing bison Jul 27, 2025, 10:27 AM

#

They removed zenith, killed my boy

#

😢

sweet tinsel Jul 27, 2025, 10:32 AM

#

It's a mistake testing ms copilot out, it's really corny.

merry stag Jul 27, 2025, 10:34 AM

#

where can i try zenith? i try http://lmarena.ai and its always use known model like o3 not anonymous model

dusky aurora Jul 27, 2025, 10:41 AM

#

merry stag where can i try zenith? i try http://lmarena.ai and its always use known model l...

battle mode

fleet lintel Jul 27, 2025, 10:42 AM

#

hardy pecan Simple Bench scores from what i tested so far, wasn't able to get to 20 question...

are, Zenith and Summit, both from OAI?

#

and these models are fast or slow?

hardy pecan Jul 27, 2025, 10:44 AM

#

fleet lintel are, Zenith and Summit, both from OAI?

They are thinkers yeah, yeah I believe from open ai, at least claimed to be when asking them

stone birch Jul 27, 2025, 10:51 AM

#

I tried using the same prompt about 30 times in battle mode to test Zenith, but it never came up.
Is it possible that it has been removed from LM Arena?

civic flame Jul 27, 2025, 10:55 AM

#

yes it's been disabled but the arena is still configured for it

#

so it's probably temporary

#

normally when they're done completely they'll remove the model on the server and client sides, but it's still there on both, just disabled in evaluations

stone birch Jul 27, 2025, 10:59 AM

#

Can't use it? 😭

civic flame Jul 27, 2025, 11:01 AM

#

not right now

past dagger Jul 27, 2025, 11:01 AM

#

Is there any way to transfer my chat from one pc to another

civic flame Jul 27, 2025, 11:01 AM

#

nope

languid crescent Jul 27, 2025, 11:01 AM

#

past dagger Is there any way to transfer my chat from one pc to another

sadly no

#

i did suggest it i think its part of their plan

#

like ability to export chats like chatgpt

past dagger Jul 27, 2025, 11:02 AM

#

then only copy paste works ?

languid crescent Jul 27, 2025, 11:02 AM

#

yea that's what i've been doing just copy all the convo to a .txt file

past dagger Jul 27, 2025, 11:02 AM

#

how you do it

languid crescent Jul 27, 2025, 11:03 AM

#

literally copy every message 😭

past dagger Jul 27, 2025, 11:03 AM

#

one by one

languid crescent Jul 27, 2025, 11:03 AM

#

past dagger one by one

y e s

past dagger Jul 27, 2025, 11:04 AM

#

got it tnq

torn mantle Jul 27, 2025, 11:40 AM

#

languid crescent like ability to export chats like chatgpt

chatgpt has that?

#

am i blind...

#

is it for plus users only?

wild kayak Jul 27, 2025, 12:09 PM

#

A question about web dev arena. If one model fails to generate valid JSON for the chat, will this directly result in the model's failure in the battle? or the battle will just be ignored.

torn mantle Jul 27, 2025, 12:29 PM

#

wild kayak A question about web dev arena. If one model fails to generate valid JSON for th...

what do you mean by json format, are you talking about the payload request?

blazing bison Jul 27, 2025, 12:44 PM

#

whole wagon Maybe they removed cos they saw this kek

this is basically decided by benchmarks so there is a big chance of opus 4.1 or grok 4.1 winning

#

?

torn mantle Jul 27, 2025, 12:45 PM

#

?

blazing bison Jul 27, 2025, 12:45 PM

#

?

torn mantle Jul 27, 2025, 12:45 PM

#

yea its decided by the final votes on lmarena

blazing bison Jul 27, 2025, 12:46 PM

#

what i mean is that models that is trained to smash bench do better than real good models

#

that's why i said "basically"

#

Lama 4 is essentially proof of this

whole wagon Jul 27, 2025, 1:01 PM

#

simplebench would be good if he actually added models

fossil fable Jul 27, 2025, 1:29 PM

#

bug: webarena refuses to work whenever accessed via mobile browser on desktop mode

Screenshot_2025-07-27-14-29-09-63_ee173eb828fd2cc78440901d4e9e3ae9.jpg

civic flame Jul 27, 2025, 1:39 PM

#

works for me

forest flower Jul 27, 2025, 1:58 PM

#

Hey guys, i'm new to web dev arena, i went into the battle section and wrote a prompt and i got the 2 results side by side, but i can't seem to figure out which AI / models generated each output, is it possible to see which on generated these outputs, i looked everywhere i still can't seem to find it

civic flame Jul 27, 2025, 1:58 PM

#

you see after you vote

forest flower Jul 27, 2025, 1:59 PM

#

In the website? https://web.lmarena.ai/

civic flame Jul 27, 2025, 1:59 PM

#

yes

#

both lmarena and web arena

forest flower Jul 27, 2025, 2:00 PM

#

cool, thank you

#

I got one more question, how would i generate a video in video-arena here, i tried #video-arena-1 A real life sagittarius in its true form, but it doesn

#

doesn't seem to give any output

gentle plinth Jul 27, 2025, 2:02 PM

#

You have to use the command /video

#

#1397655624103493813

forest flower Jul 27, 2025, 2:06 PM

#

alright, i didn't see that, thanks a lot @gentle plinth

smoky finch Jul 27, 2025, 2:07 PM

#

guys are the new models (lobster, summit) no longer available in the arena?

blazing bison Jul 27, 2025, 2:12 PM

#

Yes, removed

smoky finch Jul 27, 2025, 2:15 PM

#

😦

fossil fable Jul 27, 2025, 3:18 PM

#

hardy pecan They are thinkers yeah, yeah I believe from open ai, at least claimed to be when...

definitely not a reliable way to tell

Screenshot_2025-07-27-14-42-36-68_ee173eb828fd2cc78440901d4e9e3ae9.jpg

#

Ｆｕｃｋｋｋ．Ｔｈｅｍｏｄｅｌｓａｒｅｇｏｎｅ．

#

it is Not

Screenshot_2025-07-27-15-10-06-17_ee173eb828fd2cc78440901d4e9e3ae9.jpg

Screenshot_2025-07-27-15-09-48-11_ee173eb828fd2cc78440901d4e9e3ae9.jpg

fossil fable Jul 27, 2025, 3:22 PM

#

whole wagon Summit > Zenith > Lobster > Starfish

cuttlefish?

blazing bison Jul 27, 2025, 3:30 PM

#

what they did with my zenith boy

reef tendon Jul 27, 2025, 3:30 PM

#

Do they usually pull these models a couple days before release?

blazing bison Jul 27, 2025, 3:31 PM

#

weeks before release

#

2 weeks

reef tendon Jul 27, 2025, 3:31 PM

#

naaah :///

blazing bison Jul 27, 2025, 3:31 PM

#

1 - 2 weeks generally

reef tendon Jul 27, 2025, 3:31 PM

#

I needed them

blazing bison Jul 27, 2025, 3:33 PM

#

they aren't much better than sonnet or o3

#

only on frontend, if you do frontend things then i understand

blazing bison Jul 27, 2025, 4:05 PM

#

if you say so

storm needle Jul 27, 2025, 4:19 PM

#

except claude

unborn ocean Jul 27, 2025, 4:54 PM

#

for most labs it is likely only from the pre-training stage

#

though there are definitely a lot of other labs just straight up using the sota in post training

mellow mango Jul 27, 2025, 4:55 PM

#

Im new so I gotta ask how does LMArena work? Can I really generate more than, let's say, 3 images of gpt per day even though it's limited on the official website?

#

I mean, is it basically useless paying for GPT Plus cause of this?

leaden palm Jul 27, 2025, 4:58 PM

#

mellow mango I mean, is it basically useless paying for GPT Plus cause of this?

if you're okay with random models and a few less features

mellow mango Jul 27, 2025, 4:58 PM

#

leaden palm if you're okay with random models and a few less features

wym random models? I can clearly choose between the best text to image models for free?

#

such as gpt image and google's imagen 4 ultra

leaden palm Jul 27, 2025, 4:59 PM

#

ah right there's also direct mode

#

well then it's just a matter of rate limits and features

mellow mango Jul 27, 2025, 4:59 PM

#

leaden palm well then it's just a matter of rate limits and features

hmm where does it say what the limit is?

leaden palm Jul 27, 2025, 5:00 PM

#

i don't think limits are explicit

mellow mango Jul 27, 2025, 5:00 PM

#

it's pretty good

gentle plinth Jul 27, 2025, 5:11 PM

#

mellow mango I mean, is it basically useless paying for GPT Plus cause of this?

The difference is that here it's free because all your data you submit to lmarena will possibly get released publicly or used for Ai research as stated on the website. If you have chatgpt plus only openai can see your prompts and possibly train new models on it (but it won't get released publicly).

zealous panther Jul 27, 2025, 5:13 PM

#

and stuff like deep research and allat

mellow mango Jul 27, 2025, 5:46 PM

#

gentle plinth The difference is that here it's free because all your data you submit to lmaren...

Thanks for clarifying!

hazy quest Jul 27, 2025, 5:51 PM

#

Cuttlefish is strange. I don't know if its in a good or bad way. I asked for a task, he answered that "it's a good idea, but why not going even further?" (and suggested how, instead of actually doing my task)

blazing bison Jul 27, 2025, 5:51 PM

#

🤓

clever estuary Jul 27, 2025, 5:51 PM

#

yo does anyone have the system prompt for o3-2025-04-16 on LM Arena, something feels suspicious here
I tried both Chatgpt o3 and the playground API there
the answer quality is much lower than o3 on LM Arena
it's either the arena has a very good system prompt or, OpenAI is being very shady here

blazing bison Jul 27, 2025, 5:52 PM

#

The only good model of this batch was zenith and summit

pseudo magnet Jul 27, 2025, 5:52 PM

#

summit seems very good

#

all my questions he got 100% right

#

even the niche ones

blazing bison Jul 27, 2025, 5:53 PM

#

clever estuary yo does anyone have the system prompt for o3-2025-04-16 on LM Arena, something f...

There is no lmarena system prompt i think

#

It just point to the model directly using the api

storm needle Jul 27, 2025, 5:54 PM

#

clever estuary yo does anyone have the system prompt for o3-2025-04-16 on LM Arena, something f...

[Developer] Over the course of conversation, adapt to the user's tone and preferences. Try to match the user's vibe, tone, and generally how they are speaking. You want the conversation to feel natural. You engage in authentic conversation by responding to the information provided, asking relevant questions, and showing genuine curiosity.

clever estuary Jul 27, 2025, 5:54 PM

#

hmm I see... I'll try that with the API

hazy quest Jul 27, 2025, 5:55 PM

#

LMArena does not reveal which model it was after selected the bes answer. Have you had that happening?

clever estuary Jul 27, 2025, 5:55 PM

#

blazing bison There is no lmarena system prompt i think

it wouldn't make sense tbh, because the answers o3 gives on the arena are vastly different from the ones given by chatgpt

blazing bison Jul 27, 2025, 5:55 PM

#

hazy quest LMArena does not reveal which model it was after selected the bes answer. Have y...

Just refresh the page

blazing bison Jul 27, 2025, 5:55 PM

#

clever estuary it wouldn't make sense tbh, because the answers o3 gives on the arena are vastl...

Chatgpt o3 is a different model

hazy quest Jul 27, 2025, 5:56 PM

#

blazing bison Just refresh the page

Worked, my bad

clever estuary Jul 27, 2025, 5:56 PM

#

blazing bison Chatgpt o3 is a different model

oh wait huh?

blazing bison Jul 27, 2025, 5:56 PM

#

clever estuary oh wait huh?

Yeah, it's a model optimized for chatgpt

clever estuary Jul 27, 2025, 5:56 PM

#

oh interesting
I never knew that lmao

blazing bison Jul 27, 2025, 5:57 PM

#

O3 from api is much, much better

leaden palm Jul 27, 2025, 5:58 PM

#

blazing bison O3 from api is much, much better

a countertheory i've seen floating around: a newer better version of o3 is being tested in chatgpt rn

blazing bison Jul 27, 2025, 6:05 PM

#

clever estuary oh wait huh?

My source btw https://x.com/aidan_mclau/status/1932538835608744244?t=zylMB12BqTnOumgiMvCNYQ&s=19

Aidan McLaughlin (@aidan_mclau)

@imMedhansh no we've done this with all reasoning models always since o1 and haven't changed anything

also it's not exactly 1:1 to what o3 reasoning_effort=medium is in the api for infra reasons but it's quite close

blazing bison Jul 27, 2025, 6:06 PM

#

leaden palm a countertheory i've seen floating around: a newer better version of o3 is being...

I tryed it today and feels the same o3 for me

gentle plinth Jul 27, 2025, 6:27 PM

#

https://www.reddit.com/r/singularity/comments/1m9g5wb/openai_are_now_stealth_routing_all_o3_requests_to/

From the singularity community on Reddit: OpenAI are now stealth ro...

Explore this post and more from the singularity community

#

It's not all requests tho, only some

blazing bison Jul 27, 2025, 6:32 PM

#

The source?

#

He works at openai so...

#

yes, it's noticeable that o3 on chatgpt is bad compared to api

#

o3 on chatgpt is lazy, and dumb for coding

civic flame Jul 27, 2025, 6:33 PM

#

blazing bison they aren't much better than sonnet or o3

they are

#

although the new sysprompt they added to them on lmarena made them way dumber

blazing bison Jul 27, 2025, 6:33 PM

#

civic flame they are

not for my use cases, didn't changed much

civic flame Jul 27, 2025, 6:33 PM

#

pre-sloppification they're noticeably smarter at basically everything

#

zenith got 10/10 on the public simplebench dataset lol

#

yeah

#

the only question it still sometimes struggles with is the last one

blazing bison Jul 27, 2025, 6:34 PM

#

i tryed it and got like 7/10

civic flame Jul 27, 2025, 6:34 PM

#

blazing bison i tryed it and got like 7/10

when did you try it

hazy quest Jul 27, 2025, 6:34 PM

#

gray delta

Where is this screenshot from? Another discord?

blazing bison Jul 27, 2025, 6:34 PM

#

2 days ago i think

civic flame Jul 27, 2025, 6:34 PM

#

and were you giving it the questions as raw text with the choices

stray aspen Jul 27, 2025, 6:34 PM

#

are o3 api requests also being routed to gpt 5?

civic flame Jul 27, 2025, 6:34 PM

#

stray aspen are o3 api requests also being routed to gpt 5?

no

blazing bison Jul 27, 2025, 6:34 PM

#

yes raw text with the coices

#

just copy pasted

civic flame Jul 27, 2025, 6:35 PM

#

no

#

individually

#

regular

raven oracle Jul 27, 2025, 6:35 PM

#

Zenith is gone, right?

civic flame Jul 27, 2025, 6:35 PM

#

blazing bison just copy pasted

i've had 2 other people do the same thing and on their runs it got 9/10 and 10/10 again

blazing bison Jul 27, 2025, 6:35 PM

#

raven oracle Zenith is gone, right?

yes

civic flame Jul 27, 2025, 6:35 PM

#

raven oracle Zenith is gone, right?

disabled it appears

#

but not gone gone

#

it's still there just the serverside doesn't give it to you in evaluations

gentle plinth Jul 27, 2025, 6:36 PM

#

Until Google takes their turn and drops another model checkpoint

hazy quest Jul 27, 2025, 6:36 PM

#

Kingfall was already a "while" ago though

gentle plinth Jul 27, 2025, 6:36 PM

#

Time will tell

civic flame Jul 27, 2025, 6:37 PM

#

it's the base model for deepthink

#

lol

blazing bison Jul 27, 2025, 6:37 PM

#

if openai still have the strict policy of using data only from 2023 and 2024 with exception of politics, then it's not the case

#

sam said that the reasoning behind that is that the model can get updated information searching the internet

#

the objective is the models be smart enough to use internet and in context learning

civic flame Jul 27, 2025, 6:39 PM

#

blazing bison if openai still have the strict policy of using data only from 2023 and 2024 wit...

zenith + summit know things up to early this year so that seems to no longer be the case

blazing bison Jul 27, 2025, 6:39 PM

#

civic flame zenith + summit know things up to early this year so that seems to no longer be ...

politics?

civic flame Jul 27, 2025, 6:39 PM

#

nope

stray aspen Jul 27, 2025, 6:39 PM

#

est ce que quelqu'un a acces a gemini 2.5 pro deepthink?

civic flame Jul 27, 2025, 6:39 PM

#

it knows what gpt-4.1 is without it being in the sysprompt

blazing bison Jul 27, 2025, 6:39 PM

#

because they traing their models with thei api information

#

and it always updated

civic flame Jul 27, 2025, 6:40 PM

#

maybe so

gentle plinth Jul 27, 2025, 6:40 PM

#

I think they can recognize output from their own models tho, but not sure. Gptzero seems kinda OK for classifying if some text is Ai generated. Not 100% accurate, but still somewhat

blazing bison Jul 27, 2025, 6:40 PM

#

they still update the models with new data btw, just not new data from internet

#

synthetic data

civic flame Jul 27, 2025, 6:41 PM

#

tbh they shouldn't stop updating it with new info from the internet for that much longer

#

it's useful for models to have more internal knowledge without needing to use web tools

blazing bison Jul 27, 2025, 6:41 PM

#

i think that gpt-5 is a model router, that route for new models based on gpt 4.5 or gpt 4.1

#

we didn't see any gpt 4.1 or 4.5 thinking yet so zennith is prob one of them distilled + thinking

civic flame Jul 27, 2025, 6:42 PM

#

blazing bison i think that gpt-5 is a model router, that route for new models based on gpt 4.5...

❌

this-is-your-daily-reminder-that-gpt-5-is-not-a-router-its-v0-2hrgou5s1w9f1.webp

#

openai employee

unborn ocean Jul 27, 2025, 6:43 PM

#

blazing bison we didn't see any gpt 4.1 or 4.5 thinking yet so zennith is prob one of them dis...

'no 4.1 thinking' ... eh

#

o3

gentle plinth Jul 27, 2025, 6:43 PM

#

If he says unified this would mean multiple models under the hood

blazing bison Jul 27, 2025, 6:43 PM

#

civic flame ❌

nop. this is not valid anymore

civic flame Jul 27, 2025, 6:44 PM

#

lol what

leaden palm Jul 27, 2025, 6:44 PM

#

iirc reputable sources said it would be like a router first but would eventually become unified

blazing bison Jul 27, 2025, 6:44 PM

#

civic flame lol what

Kevin already said on another tweet that they gonna do rout from start

civic flame Jul 27, 2025, 6:44 PM

#

link it then

blazing bison Jul 27, 2025, 6:45 PM

#

i think i bookmarked it, gonna check

gentle plinth Jul 27, 2025, 6:46 PM

#

#

Which would mean they were two models

clever estuary Jul 27, 2025, 6:46 PM

#

so what, it's just MoE

civic flame Jul 27, 2025, 6:46 PM

#

gentle plinth Which would mean they were two models

were

gentle plinth Jul 27, 2025, 6:46 PM

#

It could mean a lot of things

clever estuary Jul 27, 2025, 6:46 PM

#

MoE is "unified model"

blazing bison Jul 27, 2025, 6:46 PM

#

cause he made mistakes more than one time and just delete tweets like nothing happened?

gentle plinth Jul 27, 2025, 6:46 PM

#

clever estuary MoE is "unified model"

OK fair

clever estuary Jul 27, 2025, 6:47 PM

#

it's impossible to be just one single model because that would be insanely hard to run, assuming 5 is much much better than o3 etc

blazing rune Jul 27, 2025, 6:47 PM

#

clever estuary MoE is "unified model"

not really

wintry tinsel Jul 27, 2025, 6:49 PM

#

Gpt 5 coming out in only one billion years guys 🥹

wintry tinsel Jul 27, 2025, 6:49 PM

#

clever estuary it's impossible to be just one single model because that would be insanely hard ...

better, not much much better

civic flame Jul 27, 2025, 6:49 PM

#

it's releasing in the next 2 weeks lol

wintry tinsel Jul 27, 2025, 6:49 PM

#

That’s just a rumor

#

The ai race is so pressured and accelerated they don’t let things simmer and discover paradigm shifting advancements, every iterative improvement is a new release, so the new next Sota is almost always one iterative improvement over the previous one

civic flame Jul 27, 2025, 6:51 PM

#

i know someone at oai who told me that it aligns with his understanding of the launch window

#

take that as you will

#

but i'm confident

dawn wharf Jul 27, 2025, 6:51 PM

#

clever estuary it's impossible to be just one single model because that would be insanely hard ...

remember how they suddenly reduced the price of o3 by like 80%?

wintry tinsel Jul 27, 2025, 6:51 PM

#

I will take that as a month give or take and I’m not one month patient

dawn wharf Jul 27, 2025, 6:51 PM

#

they might have found optimization techniques

clever estuary Jul 27, 2025, 6:51 PM

#

dawn wharf remember how they suddenly reduced the price of o3 by like 80%?

yeah but getting a contract with google and use their tensor chips

civic flame Jul 27, 2025, 6:51 PM

#

wintry tinsel I will take that as a month give or take and I’m not one month patient

oai wouldn't AB test gpt-5 on chatgpt and put a bunch of gpt-5 models on lmarena a month out from launch

dawn wharf Jul 27, 2025, 6:52 PM

#

civic flame oai wouldn't AB test gpt-5 on chatgpt and put a bunch of gpt-5 models on lmarena...

google is laughing in the corner

wintry tinsel Jul 27, 2025, 6:52 PM

#

Google does yeah

civic flame Jul 27, 2025, 6:52 PM

#

i hope they're cooking with deepthink

dawn wharf Jul 27, 2025, 6:52 PM

#

but I agree

civic flame Jul 27, 2025, 6:52 PM

#

i trust that they are

civic flame Jul 27, 2025, 6:52 PM

#

wintry tinsel Google does yeah

good thing we're not talking about google then

blazing bison Jul 27, 2025, 6:53 PM

#

civic flame link it then

https://x.com/kevinweil/status/1890914595268657194

Kevin Weil 🇺🇸 (@kevinweil)

@SpencerKSchiff What you outlined is the plan 👍 May start with a little routing behind the scenes to hide some lingering complexity, but mostly around the edges. The plan is to get the core model to do quick responses, tools, and longer reasoning.

civic flame Jul 27, 2025, 6:53 PM

#

ty

blazing bison Jul 27, 2025, 6:55 PM

#

if the o3 thing is true, that some questions make it answer like zenith on chatgpt, the routing approach is basically confirmed and in test

#

and the promise that everyone will get access to gpt-5 unlimited, they need routing for that...

wintry tinsel Jul 27, 2025, 6:56 PM

#

civic flame good thing we're not talking about google then

Google takes a Google of years to release

#

Gpt 5 will cost through the nose :/

blazing bison Jul 27, 2025, 6:57 PM

#

I don't think it will cost more than o3

gentle plinth Jul 27, 2025, 6:59 PM

#

old or new price?

blazing bison Jul 27, 2025, 6:59 PM

#

new

#

bro do you have sources or you just say things on your mind?

#

oh my god

#

When discussing the price of things, it's best to base it on something

#

professional yapper

#

i source things that i say

gentle plinth Jul 27, 2025, 7:02 PM

#

🍿

blazing bison Jul 27, 2025, 7:03 PM

#

ok keep yapping

#

unified model and can you inference guy

#

kek

#

and i have money to pay for it

#

we aren't the same

digital umbra Jul 27, 2025, 7:04 PM

#

blazing bison I don't think it will cost more than o3

if their model router is calibrated to direct you to the smallest model most of the time, they could get away with advertising a lower price but it would also be quite annoying for most people i guess. (they could add pricing tiers where more $$$ -> better chance of getting the good model)

blazing bison Jul 27, 2025, 7:05 PM

#

digital umbra if their model router is calibrated to direct you to the smallest model most of ...

That's exactly what I was thinking

gentle plinth Jul 27, 2025, 7:05 PM

#

digital umbra if their model router is calibrated to direct you to the smallest model most of ...

this would literally be llm gambling

blazing bison Jul 27, 2025, 7:05 PM

#

keep yapping

#

i'm not ,even if it's a router they not gonna say it

#

i'm gonna put it here again https://x.com/kevinweil/status/1890914595268657194

Kevin Weil 🇺🇸 (@kevinweil)

@SpencerKSchiff What you outlined is the plan 👍 May start with a little routing behind the scenes to hide some lingering complexity, but mostly around the edges. The plan is to get the core model to do quick responses, tools, and longer reasoning.

#

the unified info is from feb 12

#

great source you too

#

they consider "little routing" and "unified model" the same thing

#

and it's not

#

a game with words

#

scam altman is playing

digital umbra Jul 27, 2025, 7:08 PM

#

i guess the interesting thing to see if routing will be only in chatgpt (it's pretty much guaranteed to be) or if it's going to be in the API as well

leaden palm Jul 27, 2025, 7:09 PM

#

i think openai and anthropic are competing for the title

zinc ore Jul 27, 2025, 7:09 PM

#

I don't even see what the big deal is with them routing

blazing bison Jul 27, 2025, 7:09 PM

#

If you compare the marketing of december o3 and what got released, the difference...

#

And people fall for it again and again

gentle plinth Jul 27, 2025, 7:10 PM

#

they make great products, but not regarding ai. they marketed the new siri to be able to read your e-mails, lookup photos and make new appointments based on all of that, and they couldnt deliver

blazing bison Jul 27, 2025, 7:10 PM

#

gentle plinth they make great products, but not regarding ai. they marketed the new siri to be...

They presented a paper saying that LLMS sucks

digital umbra Jul 27, 2025, 7:12 PM

#

i think they said reasoning wasn't worth the effort

blazing bison Jul 27, 2025, 7:12 PM

#

they said that llms wasn't worth the effort

digital umbra Jul 27, 2025, 7:12 PM

#

https://machinelearning.apple.com/research/illusion-of-thinking

Apple Machine Learning Research

The Illusion of Thinking: Understanding the Strengths and Limitatio...

Recent generations of frontier language models have introduced Large Reasoning Models (LRMs) that generate detailed thinking processes…

#

this is what i was thinking of

#

maybe you think of something else

blazing bison Jul 27, 2025, 7:13 PM

#

Did you read the paper?

gentle plinth Jul 27, 2025, 7:13 PM

#

the paper cant be taken seriously, it was written by an intern, and it turned out that the context window of the llms wouldnt even be sufficient to write out the entire solutions that failed

blazing bison Jul 27, 2025, 7:14 PM

#

digital umbra maybe you think of something else

The point of the paper is that people tried to solve llms problems with thinking tokens, but it was useless

digital umbra Jul 27, 2025, 7:14 PM

#

blazing bison Jul 27, 2025, 7:15 PM

#

digital umbra

?

#

"both collapse at high complexity"

digital umbra Jul 27, 2025, 7:15 PM

#

that was snipped from the conclusion

blazing bison Jul 27, 2025, 7:16 PM

#

I don't know what your point is

digital umbra Jul 27, 2025, 7:18 PM

#

i think we all read it in different ways

unborn ocean Jul 27, 2025, 7:36 PM

#

the paper is really bad, they released multiple papers like that

#

they even did something similar for their open source llm, were they 1. invent new eval, 2. sota sucks at it apparently, 3. conclude: ai is a joke

#

it is basically just them trying to pretend like them sucking at ai products is completely fine

digital umbra Jul 27, 2025, 7:39 PM

#

i don't really understand why apple is doing it

#

they're sitting on a goldmine really, they have cheap-ish hardware (not by consumer standards, but compared to nvidia datacenter stuff) that can run large models

#

it doesn't really matter if their in house models aren't that good, they could make a lot of money from hardware alone

gentle plinth Jul 27, 2025, 7:43 PM

#

ai isnt their thing so much in my opinon. they are already doing lots of money with hardware. but training their own models, it doesnt fit into their philosophy of striving perfection. you cant control ai, its just a black box that can sometimes do unexpected things. also since they are oriented towards privacy they dont collect a lot of data which could be used for training. they probably also dont want to get their hands dirty by torrenting books as meta did to train ai

torn mantle Jul 27, 2025, 7:50 PM

#

digital umbra they're sitting on a goldmine really, they have cheap-ish hardware (not by consu...

How is this beneficial to them? They have a profit hack, they change 5% max of their BOM, do some bug fixes here in there to theur software and have like +400% profit, do you think they can do the same with ai?

#

And how's their hardware compared to tpus? Thats the real cheapest thing, they don't have cost advantage, their r&d ai team are still far behind, and they dont have a clear business plan for it

digital umbra Jul 27, 2025, 7:52 PM

#

nvidia is the highest valued company in the world due to their hardware and the fact the world is running on their CUDA stack. apple could easily grab a slice of that cake by developing a open cross platform alternative to CUDA together with other actors like AMD and intel but they're too mismanaged

blazing bison Jul 27, 2025, 7:53 PM

#

They lost the head of Apple inteliggence for Zuck btw

digital umbra Jul 27, 2025, 7:53 PM

#

they have built in TPU core in their SoC but it's just a pretty limited since it's a consumer device in the end. perhaps they could scale it and make dedicated server hardware...

blazing bison Jul 27, 2025, 7:54 PM

#

Zuck is throwing money on researchers like crazy

digital umbra Jul 27, 2025, 7:55 PM

#

meta is focusing on software and model development. apple should focus on hardware instead. there's no point in all companies trying to do the same thing i think

stray aspen Jul 27, 2025, 7:55 PM

#

I agree

blazing bison Jul 27, 2025, 7:55 PM

#

🤷‍♂️

#

I wouldn't like my company being dependent on 3rd models

digital umbra Jul 27, 2025, 7:57 PM

#

i wonder what meta and openai thinks about being dependent on google hardware...

blazing bison Jul 27, 2025, 7:58 PM

#

Openai is doing everything they can to change this

#

And zuck too...

torn mantle Jul 27, 2025, 8:01 PM

#

digital umbra nvidia is the highest valued company in the world due to their hardware and the ...

It took them 20 years to build cuda, uxl foundation( google & intel collab) was literally made to destroy this hegemony and they are still finding issues building oneapi( their open source version) , same thing with amd, rocm is so hard to use, so its really not that simple to build something like cuda

digital umbra Jul 27, 2025, 8:03 PM

#

perhaps but doesn't that show that there is still an opening?

#

if there still is no viable alternative to cuda

torn mantle Jul 27, 2025, 8:12 PM

#

Its really not that simple

#

I mean it may look like that but its a complex ecosystem

#

Even if apple changed their business plan strategy

#

Nvidia & apple have diff strategies btw... You will never find an m4 or m3 processor on dell

timber kiln Jul 27, 2025, 8:13 PM

#

digital umbra they're sitting on a goldmine really, they have cheap-ish hardware (not by consu...

They are only good for personal use, for batch inference they lose out
The thing that killed Apple from AI game was their feud with Nvidia

#

Yeah no other company can run 2B terrible models on their phones right genius Apple

#

Literally every phone?

#

None of them need to install that default
There are hundreds of apps on the marketplace

hybrid widget Jul 27, 2025, 8:17 PM

#

hello

timber kiln Jul 27, 2025, 8:18 PM

#

Your point is nobody cares about edge models?

digital umbra Jul 27, 2025, 8:18 PM

#

both google and microsoft are putting lots of research into small models for phones and laptops, they're probably already ahead of apple in most areas there

timber kiln Jul 27, 2025, 8:18 PM

#

I mean if you can enjoy a 2B model
Go ahead buddy
That is not enough for my needs

digital umbra Jul 27, 2025, 8:18 PM

#

even if apple has a "nicer" ui it doesn't really matter if the model is crap

timber kiln Jul 27, 2025, 8:19 PM

#

It does matter

#

If you are privacy schizo you need a beefy gpu to keep up not a tiny npu

torn mantle Jul 27, 2025, 8:19 PM

#

digital umbra if there still is no viable alternative to cuda

Also even if they made something similar to cuda goodluck convincing all ai companies to switch to their own, and goodluck adding support of this new cuda alt on pytorch/tensorflow, and gl making +1000 kernel optimized functions for that, nvidia literally has department with hundreds of engineers working for months just to optimize cuDNN kernel to gain a 1% optimization

timber kiln Jul 27, 2025, 8:20 PM

#

What sensitive data can you feed into those tiny models even lmao they have terrible context length

#

You wouldn't know using only local small models

digital umbra Jul 27, 2025, 8:20 PM

#

torn mantle Also even if they made something similar to cuda goodluck convincing all ai comp...

i don't think it will happen while all the competitors are trying to make their own cuda instead of collaborating

torn mantle Jul 27, 2025, 8:21 PM

#

digital umbra i don't think it will happen while all the competitors are trying to make their ...

I told you, google and intel are collaborating

digital umbra Jul 27, 2025, 8:22 PM

#

google and intel consists of 1% of the consumer gpu market, at best

torn mantle Jul 27, 2025, 8:22 PM

#

https://www.intel.com/content/www/us/en/developer/articles/technical/oneapi-a-viable-alternative-to-cuda-lock-in.html

Intel

oneAPI: A Viable Alternative To CUDA* Lock-in

oneAPI programming model - an alternative to CUDA* vendor lock-in for accelerated parallel computing across HPC, AI, and more on CPUs and GPUs.

digital umbra Jul 27, 2025, 8:22 PM

#

apple at least sells large amounts of consumer hardware, and amd i think is essential

torn mantle Jul 27, 2025, 8:23 PM

#

U are always right

hybrid widget Jul 27, 2025, 8:24 PM

#

hii

#

bdaycake

torn mantle Jul 27, 2025, 8:25 PM

#

Apple sells the experience

#

The whole package

digital umbra Jul 27, 2025, 8:26 PM

#

i know

#

that approach doesn't work for them anymore

#

unless they start to inject huge amounts of $ and talent like meta

torn mantle Jul 27, 2025, 8:26 PM

#

digital umbra that approach doesn't work for them anymore

Do you want them to sell their hardware parts to competitors?

#

They need to sell the whple ecosystem then

#

Processor + OS

digital umbra Jul 27, 2025, 8:28 PM

#

apple has been a hardware company since the start, i think trying to own the entire software ecosystem was a mistake for them

torn mantle Jul 27, 2025, 8:28 PM

#

Apple is in a safe zone

digital umbra Jul 27, 2025, 8:31 PM

#

i wouldn't say that is the only reason

#

nvidia was at the bottom when the tariffs were announced

#

apple wouldn't have been as affected if they sold hardware to datacenters as well

stray aspen Jul 27, 2025, 8:57 PM

#

Damn.craig came to educate everyone

fossil fable Jul 27, 2025, 8:57 PM

#

how do you boost a server for 6 years straight what the fuckkkk

Screenshot_2025-07-27-21-56-52-23_1b5a96d325b4aa38841ed4ab46124a4f.jpg

stray aspen Jul 27, 2025, 8:58 PM

#

How long has this server existed

fossil fable Jul 27, 2025, 8:58 PM

#

fossil fable how do you boost a server for 6 years straight what the fuckkkk

there are no messages from before that date

#

yeah

#

server must've been wiped

torn mantle Jul 27, 2025, 8:59 PM

#

hax

torn mantle Jul 27, 2025, 9:03 PM

#

digital umbra apple wouldn't have been as affected if they sold hardware to datacenters as wel...

apple only issue was that they were relying too much on china for its supply chain

#

thats why they invested heavily in india recently

wanton sonnet Jul 27, 2025, 9:08 PM

#

torn mantle apple only issue was that they were relying too much on china for its supply cha...

The problem is that investment wouldn’t help them or save them. The difference in quality of the workforce is just too large to bridge.

echo aurora Jul 27, 2025, 9:11 PM

#

fossil fable how do you boost a server for 6 years straight what the fuckkkk

pretty sure it's for in general boosting since, not for this server in particular

gentle plinth Jul 27, 2025, 9:15 PM

#

https://youtube.com/shorts/trwoCpWN6Ug just one of many example where apple ai is just not sota, apple is good at other things, but definitely not at ai

YouTube

Custom Adventurist

Samsung AI vs Apple AI Which Is Better? #samsungai #appleai #aicomp...

Samsung AI vs Apple AI — which one is actually better? We put both to the test with real-world tasks and features. From photo editing to live translation, who wins the AI battle? Watch until the end! #samsungai #appleai #aicomparison #techreview #galaxys25ultra #iphone16promax

▶ Play video

sturdy mica Jul 27, 2025, 9:15 PM

#

fossil fable how do you boost a server for 6 years straight what the fuckkkk

how come you can say fuckkkk

#

but not just

#

4 letters

#

it sets off automod

torn mantle Jul 27, 2025, 9:16 PM

#

wanton sonnet The problem is that investment wouldn’t help them or save them. The difference i...

yea

whole wagon Jul 27, 2025, 9:17 PM

#

Top openrouter companies this week

#

Qwen shot up lmao

dusky pier Jul 27, 2025, 9:17 PM

#

whole wagon Top openrouter companies this week

How is Mistral so high

whole wagon Jul 27, 2025, 9:18 PM

#

dusky pier How is Mistral so high

Europeans supporting some domestic AI I guess

dusky pier Jul 27, 2025, 9:18 PM

#

whole wagon Europeans supporting some domestic AI I guess

It's one of the worst ais

digital umbra Jul 27, 2025, 9:19 PM

#

whole wagon Top openrouter companies this week

most of that is rp with mistral small 24b lol

#

mistral's usage that is

#

whole wagon Jul 27, 2025, 9:20 PM

#

whole wagon Top openrouter companies this week

This is what it was 2 months ago. So the difference can be seen

#

GPT5 going to come in clutch for OpenAI. They definitely need it lol

gentle plinth Jul 27, 2025, 9:22 PM

#

digital umbra

actually more people use it for legal tasks according to OR

digital umbra Jul 27, 2025, 9:22 PM

#

doesn't make much sense to use byok with openrouter when you still still have to pay openai directly and 5% to openrouter on top of it

gentle plinth Jul 27, 2025, 9:22 PM

#

gentle plinth actually more people use it for legal tasks according to OR

whole wagon Jul 27, 2025, 9:23 PM

#

digital umbra doesn't make much sense to use byok with openrouter when you still still have to...

That doesn't explain the drop compared to 2 months ago

#

It is a relative change

gentle plinth Jul 27, 2025, 9:24 PM

#

digital umbra doesn't make much sense to use byok with openrouter when you still still have to...

you actually get a discount

#

you pay less then 1% in fees for topup and get a 1% discount

ocean vortex Jul 27, 2025, 9:26 PM

#

whole wagon Top openrouter companies this week

If you didn’t need OpenAI key for o3, they would have been in top3 for sure, maybe even higher

gentle plinth Jul 27, 2025, 9:27 PM

#

maybe but where are you getting the info that OR user pay more?

whole wagon Jul 27, 2025, 9:27 PM

#

ocean vortex If you didn’t need OpenAI key for o3, they would have been in top3 for sure, may...

But you have always needed the key. That's why I put the result from 2 months ago also

gentle plinth Jul 27, 2025, 9:27 PM

#

the pricing on both pages are the same

#

there is, but as i said its less then the 1% discount

#

its enabled by default

#

as i said its enabled be default

#

#

ah yes

#

😂

ocean vortex Jul 27, 2025, 9:30 PM

#

whole wagon But you have always needed the key. That's why I put the result from 2 months ag...

2 months ago Claude4 was barely a thing that existed for any meaningful statistic, that’s the main cause

gentle plinth Jul 27, 2025, 9:30 PM

#

ok i was wrong

digital umbra Jul 27, 2025, 9:30 PM

#

OR is for using large open source models + people too lazy to set up litellm for proprietary models

ocean vortex Jul 27, 2025, 9:30 PM

#

So 2 months ago was not representative tbh

#

Neither is the current one unless you pretend o3 doesn’t exist lol

whole wagon Jul 27, 2025, 9:31 PM

#

Well it went from 24.7% to 4.7% in 2 months, that's not easy to explain away

#

Even if what you say is true I don't think it causes that huge difference

ocean vortex Jul 27, 2025, 9:33 PM

#

whole wagon Well it went from 24.7% to 4.7% in 2 months, that's not easy to explain away

It’s fairly easy. Competition sucked so bad that even lesser models like gpt4.1 were enough to steal the traffic from them.

#

People are simply using the best (and cheapest) that they can access

#

Qwen and Deepseek both updated their models or released the new ones around that time. So everyone wanted to test those too, even if they didn’t end up sticking with those…

#

Same for Claude4

#

Kinda all checks out tbh

ocean vortex Jul 27, 2025, 10:06 PM

#

This reminded me of a presentation I saw recently by some true hard-core coder. He coded the entire presentation himself almost live on stage (no powerpoint or any other app). He had a strong opinion against using any AI at all lmao

#

When you are this good at it, AI will slow you down in many cases or just screw things up

fossil fable Jul 27, 2025, 10:14 PM

#

echo aurora pretty sure it's for in general boosting since, not for this server in particula...

no it only pops up for the server you're in

fossil fable Jul 27, 2025, 10:15 PM

#

gentle plinth actually more people use it for legal tasks according to OR

MISTRAL'S A GOOD LAWYER???

torn mantle Jul 27, 2025, 10:22 PM

#

wanton sonnet The problem is that investment wouldn’t help them or save them. The difference i...

The quality difference is big yea but their objective is to have a diversified supply chain prone to fluctuations

#

This will translate to overall cost optimization and stability

#

Sigh, mistral focus changed completely

#

They are working on implementing AI tools/features on ERP systems like SAP/Sage... Thats good but i miss old mistral, feels like they far behind competitors

ocean vortex Jul 27, 2025, 10:27 PM

#

wanton sonnet The problem is that investment wouldn’t help them or save them. The difference i...

https://youtu.be/-JbIOGGTRTY?si=er9CV3uT9hUxOYg-

YouTube

PolyMatter

Why Apple Can’t Leave China (yet)

Use https://go.nebula.tv/polymatter for 40% off an annual subscription of Nebula (that's just $3/month!)

Watch this video ad-free on Nebula: https://nebula.tv/videos/polymatter-why-apple-cant-leave-china-yet

Sources: https://docs.google.com/document/d/1HSxgBpHu9_kBnRd9X0Tb-skXWMrlY3ZV2p_iqB8N3WA/edit?usp=sharing

Twitter: https://twitter.com/p...

▶ Play video

#

Apple’s involvement in China is huge. They have a low-key deal with them where they are essentially investing billions into developing the country itself. In return of cheap labor and no nasty surprises from CCP

#

A bit ‘deal with the devil’ kind of thing. They are politically incompatible but also inseparable and one of the biggest contributors to China’s success

torn mantle Jul 27, 2025, 10:31 PM

#

They are working closely with the ccp yea

lime coral Jul 27, 2025, 10:38 PM

#

hikikiki

ocean vortex Jul 27, 2025, 10:39 PM

#

On the surface level what Trump is trying to do by forcing everyone to move away from China may look right. But not when you realise his own phone he is gonna sell is made in China and not when you start to understand the complexity of the supply chain and the current market. It’s just not possible and this is a dumb way to do this

#

Short-term political gains at the cost of completely destroying things and causing chaos or distrust longer term. China is laughing at this as all cards are in their hands and they seem to be much more competent at diplomacy, more measured too

#

For impartial countries China even with all their obvious issues is becoming a more reliable partner than US…

sick chasm Jul 27, 2025, 11:03 PM

#

zenith is in rotation again, we're so back lol

#

summit too btw

torn mantle Jul 27, 2025, 11:07 PM

#

Oai usually has a pattern releasing models after adding them to lmarena

#

Idk if its a one week thing or two weeks

#

I think gpt5 will be released next Thursday

torn star Jul 27, 2025, 11:10 PM

#

torn mantle I think gpt5 will be released next Thursday

This Thursday?

#

Or the one after this one

torn mantle Jul 27, 2025, 11:10 PM

#

One after

#

Are u sure?

stray aspen Jul 27, 2025, 11:21 PM

#

est ce que quelqu un sait quand l arene video sera publiee

jade egret Jul 27, 2025, 11:22 PM

#

which one comming out first gpt-5 or gemini 3

stray aspen Jul 27, 2025, 11:22 PM

#

jade egret which one comming out first gpt-5 or gemini 3

grok 5

jade egret Jul 27, 2025, 11:22 PM

#

why

#

is gemini 2.5 pro a base

#

what the base model right now for google?

#

2.5 flash?

#

plz tell 😦

#

so we need gemini 2.5 ultra

#

kingsfall is deepthink?

#

fr?

#

so google does have ultra so they js not releasing it

#

oh

#

why tho

#

🍊

#

but i dont think grok gonna be good especially after grok 4

#

oh

#

true

#

hopefully elon musk doesn't mess it up 🙂

#

maybe

#

oh

#

so gpt 5 is still gonna be much better

#

cmon release it already 🙃

#

next few week

#

hopefully

#

even free user can access it i think

#

js not as good as subscriptions

#

please be good at coding

raven oracle Jul 27, 2025, 11:39 PM

#

sick chasm zenith is in rotation again, we're so back lol

Pretty sure it’s gone again, haven’t found it once

sick chasm Jul 27, 2025, 11:54 PM

#

raven oracle Pretty sure it’s gone again, haven’t found it once

got it right now, but I'm pretty sure odds are like 1/100 chats 🥹

#

Also, I'm not at all familiar with how lmarena works, but for very simple prompts (like one above), I've got summit/zenith consistently (for a past hour or so). haven't been able to catch them with complex prompts tho.

whole wagon Jul 27, 2025, 11:57 PM

#

Interesting

#

Summit is a beast. I dont find zenith that good ngl. Like it's slightly better than o3 (so it will get 1st) but that's it

sour spindle Jul 28, 2025, 12:25 AM

#

sick chasm got it right now, but I'm pretty sure odds are like 1/100 chats 🥹

All that work and this is the question you asked lol

sturdy mica Jul 28, 2025, 12:28 AM

#

i wish you could upload images with search. i really need that

#

i dont know why you cant. its like the arena is restricting you

#

@echo aurora please, adding attachments would be awesome to search

#

don't know why they aren't there anyway

echo aurora Jul 28, 2025, 12:33 AM

#

sturdy mica <@283397944160550928> please, adding attachments would be awesome to search

I'll be sure to pass on, but note the #1372230675914031105 channel, it helps us organize feedback better. blobthumbsup

jovial sapphire Jul 28, 2025, 12:51 AM

#

Hi there

echo aurora Jul 28, 2025, 12:51 AM

#

jovial sapphire Hi there

ablobwave

jade egret Jul 28, 2025, 12:51 AM

#

sick chasm got it right now, but I'm pretty sure odds are like 1/100 chats 🥹

what model is that?

jovial sapphire Jul 28, 2025, 12:51 AM

#

sick chasm got it right now, but I'm pretty sure odds are like 1/100 chats 🥹

Oh damn, I came on this server fo this exact reason

jade egret Jul 28, 2025, 12:51 AM

#

echo aurora <a:ablobwave:552927506957729802>

🍊

jovial sapphire Jul 28, 2025, 12:52 AM

#

Yesterday, I was getting Zenith all the time

#

Now I can't seem to get it at all...

jade egret Jul 28, 2025, 12:52 AM

#

is that gpt 5

jovial sapphire Jul 28, 2025, 12:52 AM

#

I think it is

#

I'm an ethical hacker and usually, I ask LLMs to code me tools that help me for my job

#

Zenith no joke made me a tool that I will now use everyday

#

2k lines of code, one shot

#

I have never seen any LLMs do that, with that precision

#

And I use Opus and Sonnet everyday

#

So if it's not GPT5, it's a really, really good model

sick chasm Jul 28, 2025, 12:53 AM

#

After some testing I feel odds are even worse than 1/100 now. Got 1 zenith and 0 summit answers in 200 chats

jovial sapphire Jul 28, 2025, 12:54 AM

#

Yeah

#

They can change the frequency I think

#

Chance of dropping

#

I'm so sad, I wish I had more time to test it 😦

#

#

Here's Gemini comparing the code of Zenith and the code of Gemini 2.5 pro on the same prompt

jade egret Jul 28, 2025, 1:11 AM

#

is claude 4 opus and claude 4 sonnet the same base model but opus have more time to think?

whole sundial Jul 28, 2025, 1:12 AM

#

quiet moss Jul 28, 2025, 1:12 AM

#

ah sh

#

I was just looking for those models

#

man

#

i hope GPT-5 comes out soon

stray aspen Jul 28, 2025, 1:13 AM

#

whole sundial

where are you seeing that

quiet moss Jul 28, 2025, 1:13 AM

#

its a discord server ig

#

may you send it in my dms? @whole sundial

whole sundial Jul 28, 2025, 1:14 AM

#

dev mode discord, they have a bot that checks lmarena apis for new and removed models

leaden meteor Jul 28, 2025, 1:17 AM

#

How do I join this Arena battles discord?

whole sundial Jul 28, 2025, 1:18 AM

#

can I dm you the link?

echo aurora Jul 28, 2025, 1:19 AM

#

leaden meteor How do I join this Arena battles discord?

you can learn more about our experimental video arena here: #1397655624103493813

whole sundial Jul 28, 2025, 1:20 AM

#

that's not what they were asking for

whole wagon Jul 28, 2025, 1:27 AM

#

Yeah Claude just made opus 5x more expensive per token for memes. It's the same base model fr

#

Like bruh

jade egret Jul 28, 2025, 1:33 AM

#

o

#

kimi k2 lowkey kinda slow

jovial sapphire Jul 28, 2025, 1:36 AM

#

whole sundial

I'm so sad

molten wind Jul 28, 2025, 1:36 AM

#

sup chat

storm needle Jul 28, 2025, 1:38 AM

#

whole wagon Yeah Claude just made opus 5x more expensive per token for memes. It's the same ...

not actually

jovial sapphire Jul 28, 2025, 1:40 AM

#

whole sundial

Guys, I think people are not ready

#

Especially developpers

dense dirge Jul 28, 2025, 2:23 AM

#

How do I join this Arena Battles discord?

jade egret Jul 28, 2025, 2:37 AM

#

dense dirge How do I join this Arena Battles discord?

wdym

dense dirge Jul 28, 2025, 2:39 AM

#

@jade egret ??

leaden palm Jul 28, 2025, 2:40 AM

#

dense dirge <@1270888663126904842> ??

what do you mean

torn star Jul 28, 2025, 2:41 AM

#

dense dirge Jul 28, 2025, 2:42 AM

#

leaden palm what do you mean

Ok, I've joined the arena. Thank you for the patience 😊🙏

jade egret Jul 28, 2025, 2:46 AM

#

torn star

no way it this thursday right

torn star Jul 28, 2025, 2:46 AM

#

jade egret no way it this thursday right

Anything is possible. Reasoning models didn’t even exist a year ago

#

The world will change substantially. Most won’t feel it until it’s too late

restive sky Jul 28, 2025, 3:25 AM

#

Is it possible to ask another question to the same LLM in battle mode after it is revealed which is which?

#

For example, if I want to ask more questions of a model with an alias, do I have to just keep trying until I get it again?

twin acorn Jul 28, 2025, 3:43 AM

#

every time i send a prompt in web arena it times out

unkempt garnet Jul 28, 2025, 3:44 AM

#

twin acorn every time i send a prompt in web arena it times out

Same

keen fulcrum Jul 28, 2025, 3:47 AM

#

When will models from #1372229840131985540 be added

echo aurora Jul 28, 2025, 3:49 AM

#

dense dirge How do I join this Arena Battles discord?

are you looking for #1397655624103493813 ?

echo aurora Jul 28, 2025, 3:49 AM

#

keen fulcrum When will models from <#1372229840131985540> be added

That's tbd!

keen fulcrum Jul 28, 2025, 4:01 AM

#

@echo aurora We can’t click on buttons in #video-arena-1

echo aurora Jul 28, 2025, 4:03 AM

#

keen fulcrum <@283397944160550928> We can’t click on buttons in <#1397655695150682194>

the votes? looks like it's registering and working for me, I see that you voted on the most recent one.

#

refresh discord.

keen fulcrum Jul 28, 2025, 4:04 AM

#

echo aurora the votes? looks like it's registering and working for me, I see that you voted ...

It showed no response before, delayed response

twin acorn Jul 28, 2025, 5:10 AM

#

this all the time as well

#

this as well

#

maybe works 5% of the time

iron cipher Jul 28, 2025, 5:19 AM

#

twin acorn maybe works 5% of the time

I'm scared of losing my chat access - again

torn mantle Jul 28, 2025, 5:21 AM

#

twin acorn this all the time as well

could be a cloudflare communication issue

#

are you using some kind of vpn?

sturdy mica Jul 28, 2025, 5:27 AM

#

whole sundial dev mode discord, they have a bot that checks lmarena apis for new and removed m...

can you send me it

sturdy mica Jul 28, 2025, 5:30 AM

#

jovial sapphire Zenith no joke made me a tool that I will now use everyday

what's the tool?

torn mantle Jul 28, 2025, 5:30 AM

#

i hate this type of error handling

sturdy mica Jul 28, 2025, 5:31 AM

#

yuck

#

new ui held together by hope

#

the new ui is so weirdly coded it's like it was vibe coded

torn mantle Jul 28, 2025, 5:34 AM

#

they are actually checking for turnstile token failures(cloudflare antibot) & network errors & server-side vote rejections but it's currently displaying the same generic error message for all of them

#

instead of 'Failed to submit vote' it could've been something like :

Failed to submit vote : Turnstile token is missing or empty.
Failed to submit vote : Network request failed. The server could not be reached
Failed to submit vote : Network request failed. Server rejected the vote

#

more specific and helps in diagnosing the problem

#

nvm they are using the error object, @twin acorn next time, just open the developer console and share the error message from there

twin acorn Jul 28, 2025, 5:53 AM

#

torn mantle are you using some kind of vpn?

no

formal dagger Jul 28, 2025, 6:05 AM

#

even not wolfstride, can't bare 2.5 pro now.

pseudo heath Jul 28, 2025, 6:17 AM

#

what happened with the new qwen3 series on the latest leaderboard? it seems they are all gone

twin acorn Jul 28, 2025, 7:48 AM

#

nvm they are using the error object, @

twin acorn Jul 28, 2025, 7:49 AM

#

torn mantle nvm they are using the error object, <@1127485090373574708> next time, just open...

📎 message.txt

lofty cosmos Jul 28, 2025, 7:51 AM

#

whole sundial

hey man, may I get the invite link to this dev mode discord

civic flame Jul 28, 2025, 8:15 AM

#

formal dagger even not wolfstride, can't bare 2.5 pro now.

Google are cooking dw

lunar wind Jul 28, 2025, 8:43 AM

#

Am i in the wrong server?

west scroll Jul 28, 2025, 8:44 AM

#

lunar wind Am i in the wrong server?

No one here is also helping me,
Like no one

civic ginkgo Jul 28, 2025, 8:44 AM

#

#webdev-arena

frigid coral Jul 28, 2025, 8:52 AM

#

west scroll No one here is also helping me, Like no one

This is from another server. I can message you the link if you'd like.

civic flame Jul 28, 2025, 8:52 AM

#

west scroll No one here is also helping me, Like no one

that's because you're in the wrong server

#

search for the invite on twitter or someone here will send you a dm

civic ginkgo Jul 28, 2025, 8:54 AM

#

frigid coral This is from another server. I can message you the link if you'd like.

can you send me a link?thanks

west scroll Jul 28, 2025, 8:55 AM

#

frigid coral This is from another server. I can message you the link if you'd like.

Dm

west scroll Jul 28, 2025, 8:55 AM

#

civic flame that's because you're in the wrong server

Thanks

lunar wind Jul 28, 2025, 9:43 AM

#

frigid coral This is from another server. I can message you the link if you'd like.

Can you DM me link please?

frigid coral Jul 28, 2025, 9:44 AM

#

lunar wind Can you DM me link please?

I can't DM you since you probably disabled accepting dm from strangers

whole sundial Jul 28, 2025, 9:51 AM

#

you shouldn't do that in main chat here

#

you should remove that message btw

torn mantle Jul 28, 2025, 10:32 AM

#

twin acorn

yea its a cloudflare issue but its probably coming from your adblocker

#

disable it and see if it works

#

ur adblocker blocks some analytics trackers -> cloudflare flags this as suspicious behavior -> leads to authorization errors (401) -> server applies a rate limit on your vote attempt (429)

gentle plinth Jul 28, 2025, 10:41 AM

#

i think it could also be the case that some specific strings in the prompt/output trigger a cloudflare security block.
I had it in direct chat, some specific prompt with special characters (html tags etc.) didnt work, maybe because cloudflare thought this is some kind of xss attack.

torn mantle Jul 28, 2025, 10:42 AM

#

im looking at my console logs and im kinda having same errors but the vote is submitted

torn mantle Jul 28, 2025, 10:58 AM

#

could be that its sending 401 but didnt reach the security threshold to throw a 429 ( rate limit error )

torn mantle Jul 28, 2025, 10:59 AM

#

gentle plinth i think it could also be the case that some specific strings in the prompt/outpu...

are you talking about rendering or the vote issue?

winged locust Jul 28, 2025, 11:00 AM

#

GLM4.5 can use

#

GLM

gentle plinth Jul 28, 2025, 11:01 AM

#

torn mantle are you talking about rendering or the vote issue?

for me it was that i couldnt submit a prompt, but maybe here the problem is something different. just wanted to note that this can sometimes also be an issue with cloudflare

#

but in the network view of the browser i could see that cloudflare blocked the request because of that

#

so maybe here its a different problem

torn mantle Jul 28, 2025, 11:01 AM

#

gentle plinth for me it was that i couldnt submit a prompt, but maybe here the problem is some...

yea you are talking about the rendering issue

#

had that many times

blazing bison Jul 28, 2025, 11:02 AM

#

Using chagpt agent to use lmarena and watch it voting for claude models against gpt 4.1 and o3 answers is actually funny

keen beacon Jul 28, 2025, 11:03 AM

#

blazing bison Using chagpt agent to use lmarena and watch it voting for claude models against ...

How well does it do with voting correctly?

#

also, hello

blazing bison Jul 28, 2025, 11:04 AM

#

keen beacon How well does it do with voting correctly?

Works like 30% of the time, sometimes it can't bypass cloudflare checks

keen beacon Jul 28, 2025, 11:04 AM

#

blazing bison Works like 30% of the time, sometimes it can't bypass cloudflare checks

I hope some kind of operator would come to LMArena too

#

would be quite fun

blazing bison Jul 28, 2025, 11:04 AM

#

But i have nothing to use this agent for so I'm just wasting compute on useless things

keen beacon Jul 28, 2025, 11:05 AM

#

blazing bison But i have nothing to use this agent for so I'm just wasting compute on useless ...

Yeah, sometimes I feel bad for using AI because of the environment even if some systems for water usage are closed-loop. The Anti-AI crowd does that to me

alpine coral Jul 28, 2025, 11:21 AM

#

torn star Anything is possible. Reasoning models didn’t even exist a year ago

damn.. that's easy to forget (less than a year ago)

leaden meteor Jul 28, 2025, 12:24 PM

#

whole sundial you shouldn't do that in main chat here

Could you please dm me that discord invite?

old ginkgo Jul 28, 2025, 12:35 PM

#

keen beacon Yeah, sometimes I feel bad for using AI because of the environment even if some ...

We are not lacking water. Even the countries were people die of thirst aren't lacking water, they are just lacking access to water. Nobody suffers because gpu superclusters in the us are watercooled...

keen beacon Jul 28, 2025, 12:36 PM

#

old ginkgo We are not lacking water. Even the countries were people die of thirst aren't la...

Welp, Meta's wanting 5GW plant and others want their own energy plants which seems crazy

#

I know that at somepoint AntiAI people will get real pissed off and try to sabotage some places

keen beacon Jul 28, 2025, 12:56 PM

#

what are you trying to do

gusty sinew Jul 28, 2025, 12:56 PM

#

how to fix stuck on generating issue does anyone know

old ginkgo Jul 28, 2025, 1:03 PM

#

keen beacon I know that at somepoint AntiAI people will get real pissed off and try to sabot...

All anti AI people are about as competent as just stop oil protestors and im pretty sure the US will make any vandalism towards datacenters count as a felony so that definitly won't happen. This isn't something a bunch of idiots could ever impact and eventually 99% of people will see the usefullness of AI.

old ginkgo Jul 28, 2025, 1:05 PM

#

keen beacon Welp, Meta's wanting 5GW plant and others want their own energy plants which see...

A few GW is nothing compared to the potential benefits of superior intelligence. In fact, everything is nothing compared to the potential of superior intelligence.

keen beacon Jul 28, 2025, 1:17 PM

#

ok but what case, it really depends lol

#

like i cant help you if its that generic. do you need vision input as well? etc. how hard is the task? etc. i mean you can use 4.1, but it won't be cost effective if youre trying to do something easy (but needs fine-tuning for accuracy/etc). if it's more complex, probably 4.1 i guess

#

it can but not really at the same time its complicated lol. i don't personally recommend relying on it to introduce knowledge unless you're doing continued pretraining (larger scale)

jagged crown Jul 28, 2025, 1:22 PM

#

whole sundial can I dm you the link?

I would also love to join that discord if possible

keen beacon Jul 28, 2025, 1:27 PM

#

old ginkgo A few GW is nothing compared to the potential benefits of superior intelligence....

I guess so. Still I am apprehensive of ever mentioning AI on social media because they'll call it AI slop immediately and want to cancel me

#

Esp reddit

#

I like to be there on the r/singularity sub. AntiAI sub is a bit too dramatic at times to even take a peek in.

old ginkgo Jul 28, 2025, 1:28 PM

#

keen beacon I guess so. Still I am apprehensive of ever mentioning AI on social media becaus...

Why would you care about opinions who you think are wrong?

keen beacon Jul 28, 2025, 1:28 PM

#

old ginkgo Why would you care about opinions who you think are wrong?

Maybe I am just sensitive

old ginkgo Jul 28, 2025, 1:29 PM

#

Does anybody know where to find the codenamed models on lmarena?

jagged crown Jul 28, 2025, 1:34 PM

#

old ginkgo Does anybody know where to find the codenamed models on lmarena?

I would also love to know

old ginkgo Jul 28, 2025, 1:39 PM

#

I think you just make the models battle and when you're lucky, you get a codename model. That's what i think atleast.

jagged crown Jul 28, 2025, 1:44 PM

#

I guess I'm asking how to find notifications of what models were added or removed

gusty sinew Jul 28, 2025, 1:45 PM

#

anyone know how to fix issues

#

with it not working.

#

it always does this

#

and then i lose all chat data

#

if anyone experienced issues like this before and knows how to fix please lmk

#

or a different ai place

#

that doesnt do this and lets me use claude without paying

keen talon Jul 28, 2025, 2:05 PM

#

hi, what are the limits for the models?

torn mantle Jul 28, 2025, 2:05 PM

#

gusty sinew

could be a context window issue

#

the session still exist right?

whole wagon Jul 28, 2025, 2:12 PM

#

What the heck

#

https://chat.z.ai/

Chat with Z.ai - Free AI for Presentations, Writing & Coding

Start a free chat with your AI assistant. Tell Z.ai what you need—a stunning presentation, professional-grade writing, or a complex code script—and get instant results.

torn mantle Jul 28, 2025, 2:17 PM

#

whole wagon https://chat.z.ai/

the UI is sick

whole wagon Jul 28, 2025, 2:17 PM

#

Yeah this is the best open source agentic coding model for sure

torn mantle Jul 28, 2025, 2:17 PM

#

also the web search feature is like a deep research one

whole wagon Jul 28, 2025, 2:19 PM

#

This is absolutely insane, it's like just slightly worse than Claude 4 sonnet

#

But 10x cheaper

blazing bison Jul 28, 2025, 2:22 PM

#

it's like qwen, the vibes doenst match

#

just good on paper

whole wagon Jul 28, 2025, 2:22 PM

#

Did you try it

blazing bison Jul 28, 2025, 2:22 PM

#

yes

#

i think kimi k2 is the best rgn

#

from the os options

whole wagon Jul 28, 2025, 2:24 PM

#

Kimi K2 missing reasoning rn that's the only issue

blazing bison Jul 28, 2025, 2:24 PM

#

for me it do well without reasoning

#

and sorry for saying this but the reasoning of claude models is a joke

#

I'm a heavy user of Claude Code, and I almost never use Reasoning

torn mantle Jul 28, 2025, 2:32 PM

#

whole wagon This is absolutely insane, it's like just slightly worse than Claude 4 sonnet

ive tried it

#

its meh

#

benchmaxxing as always

jagged crown Jul 28, 2025, 2:32 PM

#

Is cuttlefish still on the arena?

torn mantle Jul 28, 2025, 2:33 PM

#

yes

whole wagon Jul 28, 2025, 2:33 PM

#

Benchmaxing 🥀 if these Chinese LLMs actually get added to lmarena it would help

#

Like the updated Qwen reasoning model is still not there

#

I want to see if it is benchmaxxed or real

#

The new non reasoning model was bad on lmarena but they removed it

#

They should tell us before randomly removing models I think

#

@echo aurora why was it removed and why nobody was informed?

#

It is more resistant to it especially maths and coding categories

keen beacon Jul 28, 2025, 3:00 PM

#

whole wagon Benchmaxing 🥀 if these Chinese LLMs actually get added to lmarena it would help

This. I wanna see more open source models

agile bloom Jul 28, 2025, 3:01 PM

#

is there an android app for for LMArena?

torn mantle Jul 28, 2025, 3:02 PM

#

agile bloom is there an android app for for LMArena?

no

agile bloom Jul 28, 2025, 3:03 PM

#

got it, it's pretty cool tho, wish someone interested in making an open source app for it

#

I usually have long convo on it, take long to load my chat so an android (with better optimized) would work better I feel

#

btw anyone knows which is the smartest ai model to use? with largest data it has been trained on and with most parameters?

keen beacon Jul 28, 2025, 3:11 PM

#

agile bloom btw anyone knows which is the smartest ai model to use? with largest data it has...

Hard to say. It depends on the use case

#

and many other values

cedar tide Jul 28, 2025, 3:14 PM

#

@echo aurora 2.5 flash lite disappeared from the leaderboard

#

@echo aurora and why new qwen also disappeared ?

#

People go Upvote glm 4.5 https://discord.com/channels/1340554757349179412/1399396203199987843

echo aurora Jul 28, 2025, 3:19 PM

#

blobthanks I'll flag

agile bloom Jul 28, 2025, 3:21 PM

#

is it available to use on LMArena?

#

also why are these paid high end ai models free to use here?

#

is it available in battle or one vs one?

cedar tide Jul 28, 2025, 3:23 PM

#

echo aurora <:blobthanks:825444835460644929> I'll flag

News from glm 4 air ?

echo aurora Jul 28, 2025, 3:24 PM

#

cedar tide News from glm 4 air ?

No news, was flagged but will be sure to bump. blobthumbsup

dusky pier Jul 28, 2025, 4:41 PM

#

1 or 2?

timber kiln Jul 28, 2025, 4:41 PM

#

agile bloom also why are these paid high end ai models free to use here?

Because you are the product
Crowdsourcing llm battles

torn mantle Jul 28, 2025, 4:46 PM

#

dusky pier 1 or 2?

None

#

Its so basic

dusky pier Jul 28, 2025, 4:48 PM

#

torn mantle Its so basic

it's supposed to be basic

torn mantle Jul 28, 2025, 4:49 PM

#

dusky pier *it's supposed to be basic*

Looks the same to me

gusty sinew Jul 28, 2025, 4:49 PM

#

anyonme know this error

#

every time i press the button it just says same thing

#

i keep losing all my chat data because of this and then it purely not woprking

#

it does it after a few hours use of the chat

dusky pier Jul 28, 2025, 4:52 PM

#

torn mantle Looks the same to me

The UI is slightly different

gusty sinew Jul 28, 2025, 4:56 PM

#

no one else get these erros using LMArena?

#

maybe it browser i using

#

or something

blazing bison Jul 28, 2025, 4:58 PM

#

gusty sinew it does it after a few hours use of the chat

Bro the arena is not made for production usage, it's for testing models, don't expect it to work like chatgpt

keen beacon Jul 28, 2025, 4:58 PM

#

blazing bison Bro the arena is not made for production usage, it's for testing models, don't e...

ye, lol

stray aspen Jul 28, 2025, 4:58 PM

#

gusty sinew no one else get these erros using LMArena?

yes its usually because of the cloudflare thing

#

i reload the website and it lets me continue

gusty sinew Jul 28, 2025, 5:00 PM

#

stray aspen yes its usually because of the cloudflare thing

hm whats cloudfare

gentle plinth Jul 28, 2025, 5:02 PM

#

dusky pier 1 or 2?

1

torn mantle Jul 28, 2025, 5:03 PM

#

gusty sinew hm whats cloudfare

It checks whether you are a human or a bot

sour spindle Jul 28, 2025, 5:06 PM

#

all the gpt5 models gone i see ?

gusty sinew Jul 28, 2025, 5:06 PM

#

torn mantle It checks whether you are a human or a bot

oh ok

#

would an adblocker be causing it

#

because it usually doesnt come up for me

#

any verification thing like that

jade egret Jul 28, 2025, 5:10 PM

#

what the most human like llm for writing

gusty sinew Jul 28, 2025, 5:10 PM

#

stray aspen yes its usually because of the cloudflare thing

dam refreshing or restarting browser doesnt work for me

#

literally nothing works

dusky pier Jul 28, 2025, 5:11 PM

#

gentle plinth 1

It was GLM Vs Sonnet

#

The 1st one was from GLM

gentle plinth Jul 28, 2025, 5:12 PM

#

the new one? (glm4.5)

dusky pier Jul 28, 2025, 5:12 PM

#

gentle plinth the new one? (glm4.5)

Yes

torn mantle Jul 28, 2025, 5:15 PM

#

gusty sinew would an adblocker be causing it

Disable it and see

gusty sinew Jul 28, 2025, 5:15 PM

#

gusty sinew literally nothing works

anyone know why or

gusty sinew Jul 28, 2025, 5:15 PM

#

torn mantle Disable it and see

i did

#

dam

torn mantle Jul 28, 2025, 5:16 PM

#

gusty sinew i did

Is it a new chat?

gentle plinth Jul 28, 2025, 5:16 PM

#

jade egret what the most human like llm for writing

https://eqbench.com/
according to this site its gemini 2.5 pro and claude opus 4

gusty sinew Jul 28, 2025, 5:16 PM

#

its any chat i go to, after i use it for a bit it just does this and never fixes

torn mantle Jul 28, 2025, 5:17 PM

#

Open up your console and share with us the erros

torn mantle Jul 28, 2025, 5:18 PM

#

gusty sinew its any chat i go to, after i use it for a bit it just does this and never fixes

Is the conversation long or nah

gusty sinew Jul 28, 2025, 5:20 PM

#

torn mantle Open up your console and share with us the erros

im not sure how

gusty sinew Jul 28, 2025, 5:20 PM

#

torn mantle Is the conversation long or nah

yes

hybrid widget Jul 28, 2025, 5:22 PM

#

hi

gusty sinew Jul 28, 2025, 5:23 PM

#

torn mantle Open up your console and share with us the erros

how do i do this

jade egret Jul 28, 2025, 5:25 PM

#

gentle plinth https://eqbench.com/ according to this site its gemini 2.5 pro and claude opus 4

oh

blazing bison Jul 28, 2025, 5:32 PM

#

gusty sinew how do i do this

You are being rate limited

blazing rune Jul 28, 2025, 5:33 PM

#

nobody knows

#

can't predict the future

blazing bison Jul 28, 2025, 5:34 PM

#

Kimi k2

#

Or gemini 2.5 pro

blazing rune Jul 28, 2025, 5:34 PM

#

Kimi K2 is good but the fast providers are terrible

keen beacon Jul 28, 2025, 5:34 PM

#

Just gotta wait for GPT-5 first

blazing rune Jul 28, 2025, 5:34 PM

#

keen beacon Just gotta wait for GPT-5 first

probably still gonna be horrible at creativity

blazing bison Jul 28, 2025, 5:35 PM

#

blazing rune probably still gonna be horrible at creativity

Its good, zenith was good

gusty sinew Jul 28, 2025, 5:35 PM

#

blazing bison You are being rate limited

oh does it do this on all ais on here

keen beacon Jul 28, 2025, 5:35 PM

#

blazing rune probably still gonna be horrible at creativity

No model has been good with my language, Finnish, except perhaps gemini

blazing rune Jul 28, 2025, 5:35 PM

#

blazing rune Kimi K2 is good but the fast providers are terrible

The official provider (moonshot AI) is like 20 TPS

keen beacon Jul 28, 2025, 5:35 PM

#

Too many typos with others

blazing bison Jul 28, 2025, 5:35 PM

#

gusty sinew oh does it do this on all ais on here

If you use it too much yes

#

Its for test models not prod usage...

blazing rune Jul 28, 2025, 5:35 PM

#

keen beacon No model has been good with my language, Finnish, except perhaps gemini

oh, I was talking about GPT-5 likely not being creative

keen beacon Jul 28, 2025, 5:35 PM

#

blazing rune oh, I was talking about GPT-5 likely not being creative

ah sorry

#

It's hot where I live

blazing rune Jul 28, 2025, 5:36 PM

#

blazing bison Its good, zenith was good

That's before they ruin it with all the safety crap

keen beacon Jul 28, 2025, 5:36 PM

#

gives me symptoms

blazing bison Jul 28, 2025, 5:36 PM

#

blazing rune That's before they ruin it with all the safety crap

People did mecha Hitler with it

keen beacon Jul 28, 2025, 5:36 PM

#

blazing rune oh, I was talking about GPT-5 likely not being creative

But yeah, prob. GPT4o is too sycophantic already

blazing rune Jul 28, 2025, 5:36 PM

#

same thing happened with optimus alpha and quasar alpha

keen beacon Jul 28, 2025, 5:36 PM

#

along with other stuff

blazing bison Jul 28, 2025, 5:37 PM

#

Zenith is sycophantic too

keen beacon Jul 28, 2025, 5:37 PM

#

blazing bison Zenith is sycophantic too

Haven't had a chance to see it often on LMArena

blazing rune Jul 28, 2025, 5:37 PM

#

blazing bison People did mecha Hitler with it

they will most definitely patch that. and usually the way they patch it, it makes the model worse in some way

blazing bison Jul 28, 2025, 5:37 PM

#

keen beacon Haven't had a chance to see it often on LMArena

Yesterday I got it like 10 times in a roll

keen beacon Jul 28, 2025, 5:37 PM

#

blazing bison Yesterday I got it like 10 times in a roll

Oh damn, I really need to try it before the release

blazing bison Jul 28, 2025, 5:38 PM

#

But almost never got summit

blazing bison Jul 28, 2025, 5:38 PM

#

keen beacon Oh damn, I really need to try it before the release

Its already removed

keen beacon Jul 28, 2025, 5:38 PM

#

blazing bison Its already removed

great

#

Hopefully for free users GPT-5 won't be "nerfed"

blazing bison Jul 28, 2025, 5:38 PM

#

Gpt 5 is really something else

#

Oh they will

keen beacon Jul 28, 2025, 5:39 PM

#

blazing bison Gpt 5 is really something else

Ye, I've seen clips of it doing coding

#

It's good at that

#

at least

blazing bison Jul 28, 2025, 5:39 PM

#

I tried it with other things, it's good with math too

#

Medicinal skills too

keen beacon Jul 28, 2025, 5:39 PM

#

blazing bison I tried it with other things, it's good with math too

How is the positivity bias?

#

or the sycophantic stuff

blazing bison Jul 28, 2025, 5:40 PM

#

Very sycophantic

keen beacon Jul 28, 2025, 5:40 PM

#

blazing bison Very sycophantic

Okay, I hoped to see improvements on that

#

Ever since the accident

#

with one 4o update

blazing bison Jul 28, 2025, 5:40 PM

#

But it's smart

#

4o is sycophantic and dumb

#

🤓

gentle plinth Jul 28, 2025, 5:41 PM

#

blazing bison But it's smart

can it write a good novel joke?

blazing bison Jul 28, 2025, 5:41 PM

#

gentle plinth can it write a good novel joke?

Didn't tried this

keen beacon Jul 28, 2025, 5:41 PM

#

blazing bison 4o is sycophantic and dumb

It was terrifying to see the stuff about saying to people that they can fly if they tried it on a high building

blazing bison Jul 28, 2025, 5:42 PM

#

keen beacon It was terrifying to see the stuff about saying to people that they can fly if t...

I don't think that zenith is this level of sycophantic but it try to support you on everything you say

keen beacon Jul 28, 2025, 5:42 PM

#

blazing bison I don't think that zenith is this level of sycophantic but it try to support you...

Ok, thanks for letting me know

#

Google's gemini is going into that direction too

#

at times I don't like it when models are too "positive"