I agree that some of those prompts are overloaded with stuff about tools and are a bit much. But it's still not definitively bad, you just have an emphasis on tools at the start of your chats. There has been no observed performance degradation in most cases. It basically evens itself out: it helps as much as it harms, but it makes perfect sense for the platform to use
dw its ok
So depending on your prompt and model it may perform slightly better or slightly worse
Asura are you gatekeeping information leaked on X ?!
An image of the "New Arena Models" Notification with a list of new models in LM Arena
This one
Didnt understand it as well
I can already tell these models will have an elo of 1800+
asura is trolling lol
that's why asura is the most hated person in this server
how come im the most hated person on all servers i joined
they all bully me
what did i do?
wait for the poll result
Hi guys, I can't seem to find the new "zenith" model everyone's talking about, was it removed in the meantime?
...
I get it about once every 5 attempts on average, more or less
am i stupid
summit got better writing style, zenith is over rhetoric, maybe variant
Going to be quite funny if these new models aren’t OpenAI
Meta finally putting that money to use lol
Some obscure cracked Chinese company side project?
Did some testing on the models. Here's what I found out:
Zenith introduced a new perspective on a physics question (electrostatics) that I had, which I have never seen before from any other model. However, it made some strange assumptions and concluded that a scenario was only true if the assumption was also true, which was incorrect. I have also never seen this assumption from any other model before. Usually, you get very similar arguments from several models—you can often predict how they will respond to and reason about a specific physics question (be it wrong or right, there are pretty typical parts they will be wrong and then the answer will be wrong)—but Zenith and Summit were both entirely new in this regard. However, I did not get the impression their raw knowledge base has improved much, as they still produced similar hallucinations on niche topics like o3 or 2.5 Pro.
I got
- Summit against: Gemma 3 27b, Deepseek V3 0324, 4o 0326, llama 3.3 70B,
- Zenith against: Amazon nova pro v1, Sonnet 4 (x2), Sonnet 4 32k
Weird that Summit was paired only against clearly weaker models, and that none were paired against the big boys.
they want to get that out of the way
Suppose, you insert two metal plates pressed together inside a charged capacitor disconnected from battery (constant charge). Now we have Capacitor Plate 1 -> MetalPlate1+2 -> Capacitor Plate 2. Then, we separate the two metal plates inbetween the capacitor plates, so we have Capacitor Plate 1 -> MetalPlate1 -> Metal Plate2 -> Capacitor Plate 2. Please discuss whether there is a net electric field between MetalPlate1 and MetalPlate2
Even o3-pro answered "After the two inserted metal plates are pulled apart, they carry equal and opposite charges that cannot neutralise each other. Those charges reside partly on the facing surfaces and create a non-zero, uniform electric field in the space between the plates." which is clearly wrong
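A quick way to check the zero-field claim is to add up the fields of the charge layers. This is only a sketch under idealized assumptions that are not in the original prompt: infinite, thin, parallel plates with no fringing, units where sigma/epsilon_0 = 1, and arbitrary positions. The metal plates are assumed to keep the induced charges (-sigma facing capacitor plate 1, +sigma facing capacitor plate 2) after they are pulled apart. The names `field_at` and `SHEETS` are made up for illustration.

```python
# Idealized 1D model: infinite, thin, parallel charged sheets (no fringing).
# Units are chosen so that sigma / epsilon_0 = 1; positions are arbitrary.

def field_at(x, sheets):
    """Net field at x (positive = pointing toward +x) from (position, sigma) sheets."""
    total = 0.0
    for position, sigma in sheets:
        # An infinite sheet contributes sigma / (2 * epsilon_0), pointing away from it.
        direction = 1.0 if x > position else -1.0
        total += 0.5 * sigma * direction
    return total

# Assumed layout after separating the two metal plates inside the capacitor.
SHEETS = [
    (0.0, +1.0),  # capacitor plate 1
    (1.0, -1.0),  # metal plate 1 (induced charge it keeps after separation)
    (3.0, +1.0),  # metal plate 2 (induced charge it keeps after separation)
    (4.0, -1.0),  # capacitor plate 2
]

for label, x in [
    ("capacitor plate 1 to metal plate 1", 0.5),
    ("between the metal plates", 2.0),
    ("metal plate 2 to capacitor plate 2", 3.5),
]:
    print(label, field_at(x, SHEETS))
```

In this model the contributions cancel in the gap between the two metal plates, while the original capacitor field survives in the two outer gaps. Real plates of finite size would show fringing that this sketch ignores.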
whenever you put an actually complex problem in llm arena and get a reasoning model, the model thinks for so long and times out lol
i know what you want to tell us
what
maybe you are smarter than all of us
I actually dunno but this is still hilarious tbh
summit is really good
lmao
like damn wow
no really am i stupid
why can't i do direct chats in webarena
webarena only has battle mode lol
I am in a game show where there are 3 doors. One contains a prize and two a goat. I can choose one door at the beginning. Then the game master offers me to change it. Does changing it increase my probability of winning?
Still cant get this basic question right
The answer is obviously No. But it overfits to training data on monty hall problem
simplebench type problem
The reasoning the model gives is instantly about monty hall and the maths
stepfun just dropped a new model
"Carefully read my prompt before answering" fixes. I think this is the model just assuming I forget to mention the door being opened after being chosen
Which is interesting. It's like trying to do typo correction but for the entire prompt
Same for me, but summit ended its answer with "Note: This 2/3 result assumes the host always opens a goat, never opens the prize, and always offers the switch. If the host doesn’t open a door (just lets you change to a random other door), switching doesn’t help; it stays 1/3 either way."
Same. Got summit twice, and both times it said something along those lines.
"Caveats: If the host does not reveal a door and simply asks if you want to change to one of the other two at random, switching doesn’t help (it stays 1/3)."
zenith didnt do this btw (Grok-4 and summit are the only models that provide such a note, Grok even states "You didn't mention the host opening a door, which is unusual...")
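The difference between the two readings of the prompt can be checked by exact enumeration. This is a minimal sketch, assuming a uniformly random prize door and first pick, and (in the no-reveal reading) a switch to one of the other two doors at random. The function name `win_probability` and its parameters are hypothetical.

```python
from fractions import Fraction
from itertools import product

DOORS = (0, 1, 2)

def win_probability(switch: bool, host_reveals_goat: bool) -> Fraction:
    """Exact win probability over all equally likely (prize, first pick) pairs."""
    total = Fraction(0)
    for prize, pick in product(DOORS, repeat=2):
        weight = Fraction(1, 9)
        if not switch:
            total += weight * (pick == prize)
        elif host_reveals_goat:
            # Classic Monty Hall: the host opens a goat door other than the pick,
            # and the player moves to the one remaining closed door.
            openable = [d for d in DOORS if d != pick and d != prize]
            for opened in openable:
                remaining = [d for d in DOORS if d not in (pick, opened)]
                total += weight * Fraction(1, len(openable)) * (remaining[0] == prize)
        else:
            # Prompt as written: no door is opened, the player just changes
            # to one of the two other doors at random.
            others = [d for d in DOORS if d != pick]
            for new_pick in others:
                total += weight * Fraction(1, len(others)) * (new_pick == prize)
    return total

print("stay:", win_probability(False, False))
print("switch, no reveal:", win_probability(True, False))
print("switch, host reveals a goat:", win_probability(True, True))
```

Under these assumptions switching only helps when the host reveals a goat, which matches the caveat summit and Grok-4 added.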
70% chance it's here by Aug 15th
more like 90% but im not gonna risk my money on it
Interestingly the Dec 31 odds are not shifting though. A Google counter response is expected by then I guess
i have never heard this problem before, it's a nice one
Somehow o3 is getting this right also... I swear it got it wrong before
Wtf
I never thought zero shot prompting could create apps of this caliber
are you trying o3 on lmarena?
No on chatGPT
some people said o3 on chatgpt got redirected to the new model zenith/gpt-5
I think more people would have noticed that lol
where are gpt-5 news
but you didnt
not impressed with zenith, hallucinates within answer (https://i.snipboard.io/em4Mlw.jpg)
Maybe it's whatever o3-alpha is
maybe
wtf stay switzerland supposed to mean
it means stay"neutral, diplomatic (and composed, measured)"
lol
i like "surgically helpful in private", like it's implying a "private therapist" or clinically analytical helpfulness with surgical precision
@torn mantle
so many models
to use zenith you just send the battle mode your prompt until you get it?
yes
someone is trying to censor people talking about nightride? I guess that's because Google doesn't want people finding out their model searches on the web, which is why it is good at knowledge. I gave a question yesterday with a very obscure answer that can only be found by searching, and nightride-on got it correct.
Just publish the info recently
It could have been in its dataset
don't think it would have known this lol
this isn't the search arena google, why are you using web search on the normal arena? that is not fair to any other model
gemini advanced/bard with search has been part of the normal arena in the past
its not a new thing
I'm literally in shock. Zenith is basically creating things that have taken me months in the span of a few minutes
could also be that they are exploring the effect of their search implementation on the human preference ratings (and never actually releasing this officially on the arena)
aka just an experiment, idk though
is o4-mini-high in the arena
hes blaming google now
+1
i believe so
Summit > Zenith > Lobster > Starfish
Looking forward to gpt5 in coming days
Looking forward to open source catching it in 3 months time 
And being quantized
I don't think the ramifications of these technologies are really understood by the general public. While on one hand I'm thrilled to be able to create the ideas I have in my head, I'm also terrified for what this will mean for the economy at large
Hard to conceptualize. Obviously high unemployment long term. But also much higher gdp. Can't really rework economy before we know the effects
Hard to know how to personally prepare. I guess learn to adapt and be able to evolve with it. Definitely going to be very different regarding jobs for my field (UI/UX design) in a year.
I imagine more work, fewer jobs
I disagree, lobster is the best
yeah, I got a random call from summit when doing the battles and I thought it was pretty dang awesome
only reason why I didn't give it the win is cause it generated something only partially related to my prompt
asked it for a fantasy story set in the perspective of a paladin in hell, it gave me a story of a big-rig driving paladin punching demons so it was pretty close, but not quite fantasy
still cool as hell though
call?
ok
i am very very tired
Don’t worry in a couple of years you’ll get a call from someone’s ai assistant for real
yeep
you guys know if there are restrictions on lmarena for german users?
when i want to battle in the arena, sometimes responses take a very long time and/or just error after a while
really discourages me from continuing
how long until we have ai models creating videos for us on a customized fyp
it will happen yesterday
don't think so?
There are reported issues of lag/errors that we're working on making better. This doesn't have to do with locale.
Anyone get the feeling this discord is being used to astroturf and to pump up the new models to manipulate polymarket odds? all these "new" people coming in seems sus.. haven't had them come in in droves like this before lmao
maybe im just skeptical
Are you suggesting "StrategicSolutionsAI" writing their first message in the discord server that zenith leaves them in shock and does months of work in a few minutes (even though that is literally impossible in the LLM arena) is an astroturf?
I don't believe it 
look at this model request post lol https://discord.com/channels/1340554757349179412/1395441703112146984
almost everyone here is just someone creating hype for a model that most can't access
I don't think the polymarket betting (and kalshi etc) has created very good incentives in this space
Just based on who the models are put against, I believe that Zenith is bigger than Summit. But very interested in hearing opinions!
it's not impossible, but i doubt anyone trying to manipulate these markets would be doing something that sophisticated
market reactions to new information can be very fickle
and liquidity on even the biggest ai market on poly is still awful
i don't think that you could make much by speculating on and attempting to control crowd sentiment
i've gotten summit a lot but not zenith once..
@civic flame what examples of zenith/summit doing great with reasoning have you seen? the svg pelican looked on par with 2.5, worse than opus 4 to me
sorry, remind me tomorrow i'm not in a good place at the minute. rough night
how do i use direct chat on webarena
Sorry to say it's battle mode only.
That is feedback we're aware is important though.
summit crushed Gemini 2.5 on timeline management of story
on = online
is it selectable at all
But I agree online models have no business directly competing with models having no tools or internet access
does any model smash claude sonnet 4 no think at coding?
claude 4 opus
o3 or 2.5pro would both beat Sonnet no think comfortably
It’s not even the same league tbh
O3 is fairly good at working on complicated projects. Sonnet is a good starter tool though.
Hello
Cuttlefish is actually insane.
Better than zenith/summit?
Sonnet doesn't need to think, the other ones do
I wish they had a non reasoning option for Gemini 2.5 Pro, and had very short reasoning versions for both, like 1k tokens or something
How can I chat directly with these models guys?
So how are normal people testing it?
What do you mean? If you give me an example I can help you.
I want to try zenith myself, is it possible some way?
Like this, here I use ChatGPT 4o
Luck
is this o4 mini high
get lucky in battle mode
?
Oh I see ok thanks bro
just keep on sending until you get that model
LMAO that's stupid
so they'd let you do that but not the rest
no
its companies testing their models
its unreleased ai models
i mean kinda is
...in this specific web development arena
does that not sound a bit ridiculous
I dont think you understand what lmarena is...
is it true that the current o3 on the chatgpt ui actually routes to Zenith? I saw some people claiming that on twitter
Sonnet is already dead
I'm glad they are killing it
It's so expensive
Ive been getting alot of A/B testing using o3 in chatgpt.com
Highly doubt they just replaced it with "zenith" or "summit"
Sam really wants to train gpt5
I wonder what the costs for gpt5 are
is this 4o mini high?
I doubt it's gonna be revolutionary
I don't see the hype in it
hows zenith
when gpt 5.....
Is folsom-072125-1 just minimax-m1?
folsom is amazon
Poll: best model rn
Winning answer (option 3): o3 pro, 7 of 14 votes
https://m-01.pages.dev/summit_gpu_attractor
summit made this
They changed the sysprompt supposedly, which lowered zenith performance
Appears to be so, I'm verifying simple-bench questions summit vs zenith to see which is smarter, although contamination at this point might be redundant
what do we think about zenith's knowledge? is it just a really recent cutoff or does it search the web?
*by really recent i mean within past ~100 days
folsom-072125-2 is kinda stupid
wow cant wait to use it
The hype is obvious gpt 4 essentially started this whole AI race and Now comes time for the sequel!
Gpt 5 has to be successful for the health of the whole industry
NO CUREFISH
summit is o4-pro
A great arena for understanding different models. Looking forward to hearing about API possibilities for different models.
Do Copilot PC NPUs have any use cases with LLMs?
Guys do you think that zenith and summit got deleted? I can't get them for 1 hour
Oh no
what does that mean
Such users as you aren’t appreciated on the platform
Copilot is weird, the deep research is not available in the browser, it told me to pay for plus in the desktop app and I've got 10 free uses on the mobile app.
So zenith is still here then perhaps
Neither
it's still configured on the frontend but I have a feeling it's been disabled behind the scenes
as has summit
because I've gone from getting them every few rounds to getting neither of them in 50 rounds
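How unlikely a dry streak like that is can be estimated with a one-line calculation. This is a rough sketch, assuming each battle round independently serves the model with a fixed chance; the 1-in-5 rate is taken from the "once every 5 attempts" remark earlier in the chat and is only a guess at the real sampling rate. The function name `prob_never_seen` is hypothetical.

```python
def prob_never_seen(p_per_round: float, rounds: int) -> float:
    """Chance of not getting a model even once in `rounds` independent rounds."""
    return (1.0 - p_per_round) ** rounds

# Assumed rate: roughly one appearance every 5 rounds.
ASSUMED_RATE = 0.2

for rounds in (10, 30, 50):
    print(rounds, "rounds without it:", prob_never_seen(ASSUMED_RATE, rounds))
```

If the assumed rate were still in effect, a streak of 50 misses would be extremely unlikely, which fits the guess that the models were disabled rather than just unlucky draws.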
Yeah same here
Copilot DR is Weird, it refuses to speak in English.
Where do u find these messages
Maybe they removed cos they saw this kek
The august 15th release bet is at 69% so the market thinks basically guaranteed GPT5 reaches 1st on lmarena
how do u get those notifications?
Where to get to gpt 5?
Zenith and summit removed too
@echo aurora possible to tell the lm arena team that we absolutely want the possibility of regenerating the last answer of the llm even if we have already voted (we could do it on the old lm arena)
gpt 5 released 31th july
Simple Bench scores from what i tested so far, wasn't able to get to 20 questions in time, before they were removed
Summit: 9/11
Zenith: 10/12
for comparison, gemini 2.5 pro: 7/11 or 7/12 of the simple bench questions
Unfortunately not complete testing, but a rough idea at least
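To see how rough these numbers are, the scores can be turned into Wilson score intervals. This is a sketch, assuming each question is an independent trial and using the standard 95% z-value of 1.96; the Gemini entry uses the 7/11 reading of the score. The names `wilson_interval` and `SCORES` are made up for illustration.

```python
import math

def wilson_interval(correct: int, total: int, z: float = 1.96):
    """Wilson score interval for a proportion (z = 1.96 is roughly 95%)."""
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half

# Scores as reported above; Gemini is taken as 7/11.
SCORES = {
    "Summit": (9, 11),
    "Zenith": (10, 12),
    "Gemini 2.5 Pro": (7, 11),
}

for model, (correct, total) in SCORES.items():
    low, high = wilson_interval(correct, total)
    print(f"{model}: {correct}/{total} = {correct / total:.0%}, interval {low:.0%} to {high:.0%}")
```

With only 11 or 12 questions per model the intervals are wide and overlap, so the ranking is suggestive at best.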
what about lobster ?
the next 10 other simple bench questions generally trip up alot of models so id expect them to perform less well as opposed to the first 10 public ones
Only got lobster in webarena
where do u get 20 questions
from a competition the simple bench fella held to find the best optimized prompt to get these questions correct, wandb.ai dataset i believe
It's a mistake testing ms copilot out, it's really corny.
where can i try zenith? i tried http://lmarena.ai and it always uses a known model like o3, not an anonymous model
battle mode
are Zenith and Summit both from OAI?
and these models are fast or slow?
They are thinkers yeah, yeah I believe from open ai, at least claimed to be when asking them
I tried using the same prompt about 30 times in battle mode to test Zenith, but it never came up.
Is it possible that it has been removed from LM Arena?
yes it's been disabled but the arena is still configured for it
so it's probably temporary
normally when they're done completely they'll remove the model on the server and client sides, but it's still there on both, just disabled in evaluations
Can't use it? 😭
not right now
Is there any way to transfer my chat from one pc to another
nope
sadly no
i did suggest it i think its part of their plan
like ability to export chats like chatgpt
then only copy paste works ?
yea that's what i've been doing just copy all the convo to a .txt file
how you do it
literally copy every message 😭
one by one
y e s
got it tnq
chatgpt has that?
am i blind...
is it for plus users only?
A question about web dev arena. If one model fails to generate valid JSON for the chat, will this directly result in the model's failure in the battle? or the battle will just be ignored.
what do you mean by json format, are you talking about the payload request?
this is basically decided by benchmarks so there is a big chance of opus 4.1 or grok 4.1 winning
?
?
?
yea its decided by the final votes on lmarena
what i mean is that models that are trained to smash benchmarks do better than really good models
that's why i said "basically"
Llama 4 is essentially proof of this
simplebench would be good if he actually added models
bug: webarena refuses to work whenever accessed via mobile browser on desktop mode
works for me
Hey guys, i'm new to web dev arena. i went into the battle section and wrote a prompt and i got the 2 results side by side, but i can't seem to figure out which AI / models generated each output. is it possible to see which one generated these outputs? i looked everywhere and i still can't seem to find it
you see after you vote
In the website? https://web.lmarena.ai/
cool, thank you
I got one more question, how would i generate a video in video-arena here, i tried #video-arena-1 A real life sagittarius in its true form, but it doesn't seem to give any output
alright, i didn't see that, thanks a lot @gentle plinth
guys are the new models (lobster, summit) no longer available in the arena?
Yes, removed
😦
definitely not a reliable way to tell
Fuckkk. The models are gone.
it is Not
cuttlefish?
what they did with my zenith boy
Do they usually pull these models a couple days before release?
naaah :///
1 - 2 weeks generally
I needed them
they aren't much better than sonnet or o3
only on frontend, if you do frontend things then i understand
if you say so
except claude
for most labs it is likely only from the pre-training stage
though there are definitely a lot of other labs just straight up using the sota in post training
Im new so I gotta ask how does LMArena work? Can I really generate more than, let's say, 3 images of gpt per day even though it's limited on the official website?
I mean, is it basically useless paying for GPT Plus cause of this?
if you're okay with random models and a few less features
wym random models? I can clearly choose between the best text to image models for free?
such as gpt image and google's imagen 4 ultra
ah right there's also direct mode
well then it's just a matter of rate limits and features
hmm where does it say what the limit is?
i don't think limits are explicit
it's pretty good
The difference is that here it's free because all your data you submit to lmarena will possibly get released publicly or used for Ai research as stated on the website. If you have chatgpt plus only openai can see your prompts and possibly train new models on it (but it won't get released publicly).
and stuff like deep research and allat
Thanks for clarifying!
Cuttlefish is strange. I don't know if its in a good or bad way. I asked for a task, he answered that "it's a good idea, but why not going even further?" (and suggested how, instead of actually doing my task)
🤓
yo does anyone have the system prompt for o3-2025-04-16 on LM Arena, something feels suspicious here
I tried both Chatgpt o3 and the playground API there
the answer quality is much lower than o3 on LM Arena
it's either the arena has a very good system prompt or, OpenAI is being very shady here
The only good model of this batch was zenith and summit
There is no lmarena system prompt i think
It just point to the model directly using the api
[Developer] Over the course of conversation, adapt to the user's tone and preferences. Try to match the user's vibe, tone, and generally how they are speaking. You want the conversation to feel natural. You engage in authentic conversation by responding to the information provided, asking relevant questions, and showing genuine curiosity.
hmm I see... I'll try that with the API
LMArena does not reveal which model it was after I selected the best answer. Have you had that happen?
it wouldn't make sense tbh, because the answers o3 gives on the arena are vastly different from the ones given by chatgpt
Just refresh the page
Chatgpt o3 is a different model
Worked, my bad
oh wait huh?
Yeah, it's a model optimized for chatgpt
oh interesting
I never knew that lmao
O3 from api is much, much better
a countertheory i've seen floating around: a newer better version of o3 is being tested in chatgpt rn
@imMedhansh no we've done this with all reasoning models always since o1 and haven't changed anything
also it's not exactly 1:1 to what o3 reasoning_effort=medium is in the api for infra reasons but it's quite close
I tried it today and it feels like the same o3 to me
It's not all requests tho, only some
The source?
He works at openai so...
yes, it's noticeable that o3 on chatgpt is bad compared to api
o3 on chatgpt is lazy, and dumb for coding
they are
although the new sysprompt they added to them on lmarena made them way dumber
not for my use cases, didn't change much
pre-sloppification they're noticeably smarter at basically everything
zenith got 10/10 on the public simplebench dataset lol
yeah
the only question it still sometimes struggles with is the last one
i tried it and got like 7/10
when did you try it
Where is this screenshot from? Another discord?
2 days ago i think
and were you giving it the questions as raw text with the choices
are o3 api requests also being routed to gpt 5?
no
Zenith is gone, right?
i've had 2 other people do the same thing and on their runs it got 9/10 and 10/10 again
yes
disabled it appears
but not gone gone
it's still there just the serverside doesn't give it to you in evaluations
Until Google takes their turn and drops another model checkpoint
Kingfall was already a "while" ago though
Time will tell
if openai still have the strict policy of using data only from 2023 and 2024 with exception of politics, then it's not the case
sam said that the reasoning behind that is that the model can get updated information searching the internet
the objective is the models be smart enough to use internet and in context learning
zenith + summit know things up to early this year so that seems to no longer be the case
politics?
nope
does anyone have access to gemini 2.5 pro deepthink?
it knows what gpt-4.1 is without it being in the sysprompt
maybe so
I think they can recognize output from their own models tho, but not sure. Gptzero seems kinda OK for classifying if some text is Ai generated. Not 100% accurate, but still somewhat
they still update the models with new data btw, just not new data from internet
synthetic data
tbh they shouldn't stop updating it with new info from the internet for that much longer
it's useful for models to have more internal knowledge without needing to use web tools
i think that gpt-5 is a model router, that route for new models based on gpt 4.5 or gpt 4.1
we didn't see any gpt 4.1 or 4.5 thinking yet so zenith is prob one of them distilled + thinking
❌
openai employee
'no 4.1 thinking' ... eh
o3
If he says unified this would mean multiple models under the hood
nop. this is not valid anymore
lol what
iirc reputable sources said it would be like a router first but would eventually become unified
Kevin already said in another tweet that they're gonna do routing from the start
link it then
i think i bookmarked it, gonna check
so what, it's just MoE
were
It could mean a lot of things
MoE is "unified model"
cause he made mistakes more than one time and just delete tweets like nothing happened?
OK fair
it's impossible to be just one single model because that would be insanely hard to run, assuming 5 is much much better than o3 etc
not really
Gpt 5 coming out in only one billion years guys 🥹
better, not much much better
it's releasing in the next 2 weeks lol
That’s just a rumor
The ai race is so pressured and accelerated they don’t let things simmer and discover paradigm shifting advancements, every iterative improvement is a new release, so the new next Sota is almost always one iterative improvement over the previous one
i know someone at oai who told me that it aligns with his understanding of the launch window
take that as you will
but i'm confident
remember how they suddenly reduced the price of o3 by like 80%?
I will take that as a month give or take and I’m not one month patient
they might have found optimization techniques
yeah but getting a contract with google and use their tensor chips
oai wouldn't AB test gpt-5 on chatgpt and put a bunch of gpt-5 models on lmarena a month out from launch
google is laughing in the corner
Google does yeah
i hope they're cooking with deepthink
but I agree
i trust that they are
good thing we're not talking about google then
@SpencerKSchiff What you outlined is the plan 👍 May start with a little routing behind the scenes to hide some lingering complexity, but mostly around the edges. The plan is to get the core model to do quick responses, tools, and longer reasoning.
ty
if the o3 thing is true, that some questions make it answer like zenith on chatgpt, the routing approach is basically confirmed and in test
and the promise that everyone will get access to gpt-5 unlimited, they need routing for that...
Google takes a Google of years to release
Gpt 5 will cost through the nose :/
I don't think it will cost more than o3
old or new price?
new
bro do you have sources or you just say things on your mind?
oh my god
When discussing the price of things, it's best to base it on something
professional yapper
i source things that i say
🍿
ok keep yapping
unified model and can you inference guy
kek
and i have money to pay for it
we aren't the same
if their model router is calibrated to direct you to the smallest model most of the time, they could get away with advertising a lower price but it would also be quite annoying for most people i guess. (they could add pricing tiers where more $$$ -> better chance of getting the good model)
That's exactly what I was thinking
this would literally be llm gambling
keep yapping
i'm not, even if it's a router they're not gonna say it
i'm gonna put it here again https://x.com/kevinweil/status/1890914595268657194
@SpencerKSchiff What you outlined is the plan 👍 May start with a little routing behind the scenes to hide some lingering complexity, but mostly around the edges. The plan is to get the core model to do quick responses, tools, and longer reasoning.
the unified info is from feb 12
great source you too
they consider "little routing" and "unified model" the same thing
and it's not
a game with words
scam altman is playing
i guess the interesting thing to see if routing will be only in chatgpt (it's pretty much guaranteed to be) or if it's going to be in the API as well
i think openai and anthropic are competing for the title
I don't even see what the big deal is with them routing
If you compare the marketing of december o3 and what got released, the difference...
And people fall for it again and again
they make great products, but not regarding ai. they marketed the new siri to be able to read your e-mails, lookup photos and make new appointments based on all of that, and they couldnt deliver
They presented a paper saying that LLMs suck
i think they said reasoning wasn't worth the effort
they said that llms weren't worth the effort
this is what i was thinking of
maybe you think of something else
Did you read the paper?
the paper cant be taken seriously, it was written by an intern, and it turned out that the context window of the llms wouldnt even be sufficient to write out the entire solutions that failed
The point of the paper is that people tried to solve llms problems with thinking tokens, but it was useless
that was snipped from the conclusion
I don't know what your point is
i think we all read it in different ways
the paper is really bad, they released multiple papers like that
they even did something similar for their open source llm, where they 1. invent new eval, 2. sota sucks at it apparently, 3. conclude: ai is a joke
it is basically just them trying to pretend like them sucking at ai products is completely fine
i don't really understand why apple is doing it
they're sitting on a goldmine really, they have cheap-ish hardware (not by consumer standards, but compared to nvidia datacenter stuff) that can run large models
it doesn't really matter if their in house models aren't that good, they could make a lot of money from hardware alone
ai isnt their thing so much in my opinion. they are already making lots of money with hardware. but training their own models, it doesnt fit into their philosophy of striving for perfection. you cant control ai, its just a black box that can sometimes do unexpected things. also since they are oriented towards privacy they dont collect a lot of data which could be used for training. they probably also dont want to get their hands dirty by torrenting books as meta did to train ai
How is this beneficial to them? They have a profit hack, they change 5% max of their BOM, do some bug fixes here and there to their software and have like +400% profit, do you think they can do the same with ai?
And how's their hardware compared to tpus? Thats the real cheapest thing, they don't have cost advantage, their r&d ai team are still far behind, and they dont have a clear business plan for it
nvidia is the highest valued company in the world due to their hardware and the fact the world is running on their CUDA stack. apple could easily grab a slice of that cake by developing a open cross platform alternative to CUDA together with other actors like AMD and intel but they're too mismanaged
They lost the head of Apple Intelligence to Zuck btw
they have a built in TPU core in their SoC but it's just pretty limited since it's a consumer device in the end. perhaps they could scale it and make dedicated server hardware...
Zuck is throwing money on researchers like crazy
meta is focusing on software and model development. apple should focus on hardware instead. there's no point in all companies trying to do the same thing i think
I agree
i wonder what meta and openai thinks about being dependent on google hardware...
It took them 20 years to build cuda. uxl foundation (google & intel collab) was literally made to destroy this hegemony and they are still finding issues building oneapi (their open source version). same thing with amd, rocm is so hard to use, so its really not that simple to build something like cuda
perhaps but doesn't that show that there is still an opening?
if there still is no viable alternative to cuda
Its really not that simple
I mean it may look like that but its a complex ecosystem
Even if apple changed their business plan strategy
Nvidia & apple have diff strategies btw... You will never find an m4 or m3 processor on dell
They are only good for personal use, for batch inference they lose out
The thing that killed Apple from AI game was their feud with Nvidia
Yeah no other company can run 2B terrible models on their phones right genius Apple
Literally every phone?
None of them need to install that default
There are hundreds of apps on the marketplace
hello
Your point is nobody cares about edge models?
both google and microsoft are putting lots of research into small models for phones and laptops, they're probably already ahead of apple in most areas there
I mean if you can enjoy a 2B model
Go ahead buddy
That is not enough for my needs
even if apple has a "nicer" ui it doesn't really matter if the model is crap
It does matter
If you are privacy schizo you need a beefy gpu to keep up not a tiny npu
Also even if they made something similar to cuda, good luck convincing all ai companies to switch to it, good luck adding support for this new cuda alt to pytorch/tensorflow, and gl writing 1000+ optimized kernel functions for it. nvidia literally has a department with hundreds of engineers working for months just to optimize a cuDNN kernel for a 1% gain
What sensitive data can you feed into those tiny models even lmao they have terrible context length
You wouldn't know using only local small models
i don't think it will happen while all the competitors are trying to make their own cuda instead of collaborating
I told you, google and intel are collaborating
google and intel make up 1% of the consumer gpu market, at best
apple at least sells large amounts of consumer hardware, and amd i think is essential
U are always right
i know
that approach doesn't work for them anymore
unless they start to inject huge amounts of $ and talent like meta
Do you want them to sell their hardware parts to competitors?
They need to sell the whole ecosystem then
Processor + OS
apple has been a hardware company since the start, i think trying to own the entire software ecosystem was a mistake for them
Apple is in a safe zone
i wouldn't say that is the only reason
nvidia was at the bottom when the tariffs were announced
apple wouldn't have been as affected if they sold hardware to datacenters as well
Damn.craig came to educate everyone
how do you boost a server for 6 years straight what the fuckkkk
How long has this server existed
there are no messages from before that date
yeah
server must've been wiped
hax
apple's only issue was that they were relying too much on china for their supply chain
that's why they invested heavily in india recently
The problem is that investment wouldn’t help them or save them. The difference in quality of the workforce is just too large to bridge.
pretty sure it's for in general boosting since, not for this server in particular
https://youtube.com/shorts/trwoCpWN6Ug just one of many examples where apple ai is just not sota, apple is good at other things, but definitely not at ai
Samsung AI vs Apple AI — which one is actually better? We put both to the test with real-world tasks and features. From photo editing to live translation, who wins the AI battle? Watch until the end! #samsungai #appleai #aicomparison #techreview #galaxys25ultra #iphone16promax
how come you can say fuckkkk
but not just
4 letters
it sets off automod
yea
How is Mistral so high
Europeans supporting some domestic AI I guess
It's one of the worst ais
most of that is rp with mistral small 24b lol
mistral's usage that is
This is what it was 2 months ago. So the difference can be seen
GPT5 going to come in clutch for OpenAI. They definitely need it lol
actually more people use it for legal tasks according to OR
doesn't make much sense to use byok with openrouter when you still have to pay openai directly and 5% to openrouter on top of it
That doesn't explain the drop compared to 2 months ago
It is a relative change
you actually get a discount
you pay less than 1% in fees for topup and get a 1% discount
If you didn’t need OpenAI key for o3, they would have been in top3 for sure, maybe even higher
maybe but where are you getting the info that OR users pay more?
But you have always needed the key. That's why I put the result from 2 months ago also
the pricing on both pages is the same
there is, but as i said it's less than the 1% discount
it's enabled by default
as i said it's enabled by default
ah yes
😂
2 months ago Claude4 was barely a thing that existed for any meaningful statistic, that’s the main cause
ok i was wrong
OR is for using large open source models + people too lazy to set up litellm for proprietary models
So 2 months ago was not representative tbh
Neither is the current one unless you pretend o3 doesn’t exist lol
Well it went from 24.7% to 4.7% in 2 months, that's not easy to explain away
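To put numbers on that drop, here's a quick sketch. The two percentages are the ones quoted above; the function name is just made up for illustration.

```python
def share_change(before: float, after: float) -> tuple[float, float]:
    """Return (change in percentage points, relative change as a fraction)."""
    absolute = after - before
    relative = absolute / before
    return absolute, relative


# Usage share quoted in the chat: 24.7% two months ago, 4.7% now.
absolute, relative = share_change(24.7, 4.7)
print(f"{absolute:+.1f} points, {relative:+.1%} relative")
```

So it's a 20 point drop in absolute terms, and roughly four fifths of the earlier share in relative terms.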
Even if what you say is true I don't think it causes that huge difference
It’s fairly easy. Competition sucked so bad that even lesser models like gpt4.1 were enough to steal the traffic from them.
People are simply using the best (and cheapest) that they can access
Qwen and Deepseek both updated their models or released the new ones around that time. So everyone wanted to test those too, even if they didn’t end up sticking with those…
Same for Claude4
Kinda all checks out tbh
This reminded me of a presentation I saw recently by some true hard-core coder. He coded the entire presentation himself almost live on stage (no powerpoint or any other app). He had a strong opinion against using any AI at all lmao
When you are this good at it, AI will slow you down in many cases or just screw things up
no it only pops up for the server you're in
MISTRAL'S A GOOD LAWYER???
The quality difference is big yea but their objective is to have a diversified supply chain that's less prone to fluctuations
This will translate to overall cost optimization and stability
Sigh, mistral focus changed completely
They are working on implementing AI tools/features on ERP systems like SAP/Sage... That's good but i miss old mistral, feels like they're far behind competitors
Watch this video ad-free on Nebula: https://nebula.tv/videos/polymatter-why-apple-cant-leave-china-yet
Sources: https://docs.google.com/document/d/1HSxgBpHu9_kBnRd9X0Tb-skXWMrlY3ZV2p_iqB8N3WA/edit?usp=sharing
Twitter: https://twitter.com/p...
Apple’s involvement in China is huge. They have a low-key deal with them where they are essentially investing billions into developing the country itself. In return they get cheap labor and no nasty surprises from the CCP
A bit ‘deal with the devil’ kind of thing. They are politically incompatible but also inseparable and one of the biggest contributors to China’s success
They are working closely with the ccp yea
hikikiki
On the surface level what Trump is trying to do by forcing everyone to move away from China may look right. But not when you realise his own phone he is gonna sell is made in China and not when you start to understand the complexity of the supply chain and the current market. It’s just not possible and this is a dumb way to do this
Short-term political gains at the cost of completely destroying things and causing chaos or distrust longer term. China is laughing at this as all cards are in their hands and they seem to be much more competent at diplomacy, more measured too
For impartial countries China even with all their obvious issues is becoming a more reliable partner than US…
Oai usually has a pattern releasing models after adding them to lmarena
Idk if it's a one week thing or two weeks
I think gpt5 will be released next Thursday
This Thursday?
Or the one after this one
does anyone know when the video arena will be published
which one is coming out first, gpt-5 or gemini 3
grok 5
why
is gemini 2.5 pro a base
what's the base model right now for google?
2.5 flash?
plz tell 😦
so we need gemini 2.5 ultra
kingsfall is deepthink?
fr?
so google does have ultra so they js not releasing it
oh
why tho
🍊
but i dont think grok gonna be good especially after grok 4
oh
true
hopefully elon musk doesn't mess it up 🙂
maybe
oh
so gpt 5 is still gonna be much better
cmon release it already 🙃
next few weeks
hopefully
even free users can access it i think
js not as good as subscriptions
please be good at coding
Pretty sure it’s gone again, haven’t found it once
got it right now, but I'm pretty sure odds are like 1/100 chats 🥹
Also, I'm not at all familiar with how lmarena works, but for very simple prompts (like the one above), I've got summit/zenith consistently (for the past hour or so). haven't been able to catch them with complex prompts tho.
Interesting
Summit is a beast. I dont find zenith that good ngl. Like it's slightly better than o3 (so it will get 1st) but that's it
All that work and this is the question you asked lol
i wish you could upload images with search. i really need that
i dont know why you cant. its like the arena is restricting you
@echo aurora please, adding attachments would be awesome to search
don't know why they aren't there anyway
I'll be sure to pass on, but note the #1372230675914031105 channel, it helps us organize feedback better. 
Hi there

what model is that?
Oh damn, I came on this server for this exact reason
🍊
Yesterday, I was getting Zenith all the time
Now I can't seem to get it at all...
is that gpt 5
I think it is
I'm an ethical hacker and usually, I ask LLMs to code me tools that help me for my job
Zenith no joke made me a tool that I will now use everyday
2k lines of code, one shot
I have never seen any LLMs do that, with that precision
And I use Opus and Sonnet everyday
So if it's not GPT5, it's a really, really good model
After some testing I feel odds are even worse than 1/100 now. Got 1 zenith and 0 summit answers in 200 chats
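A rough sanity check on that, as a sketch only. It assumes every chat is an independent draw with a fixed per-chat chance of getting zenith, which may not be how the arena actually samples models. The 1/100 rate is the earlier guess and the helper name is made up.

```python
from math import comb


def prob_at_most(k: int, n: int, p: float) -> float:
    """Binomial probability of seeing at most k hits in n independent tries."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# If the true rate were still 1/100, how likely is 0 or 1 zenith in 200 chats?
p_le_1 = prob_at_most(k=1, n=200, p=0.01)
print(f"P(at most 1 hit in 200 chats) = {p_le_1:.2f}")
```

Under those assumptions that comes out to roughly 40%, so one hit in 200 chats fits lower odds but doesn't rule out 1/100 by itself.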
Yeah
They can change the frequency I think
Chance of dropping
I'm so sad, I wish I had more time to test it 😦
Here's Gemini comparing the code of Zenith and the code of Gemini 2.5 pro on the same prompt
are claude 4 opus and claude 4 sonnet the same base model but opus has more time to think?
where are you seeing that
dev mode discord, they have a bot that checks lmarena apis for new and removed models
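No idea how that bot is actually written, but the core of a "new and removed models" check is probably just a set difference between two snapshots of the model list. A minimal sketch, where the snapshots are hardcoded made-up examples and nothing here calls a real lmarena API:

```python
def diff_models(previous: set[str], current: set[str]) -> tuple[set[str], set[str]]:
    """Return (added, removed) model names between two snapshots."""
    added = current - previous
    removed = previous - current
    return added, removed


# Hypothetical snapshots; a real bot would fetch these on a schedule.
yesterday = {"o3", "zenith", "summit"}
today = {"o3", "glm-4.5"}

added, removed = diff_models(yesterday, today)
print("added:", sorted(added))
print("removed:", sorted(removed))
```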
How do I join this Arena battles discord?
can I dm you the link?
you can learn more about our experimental video arena here: #1397655624103493813
that's not what they were asking for
Yeah Claude just made opus 5x more expensive per token for memes. It's the same base model fr
Like bruh
I'm so sad
sup chat
not actually
How do I join this Arena Battles discord?
wdym
@jade egret ??
what do you mean
Ok, I've joined the arena. Thank you for the patience 😊🙏
Anything is possible. Reasoning models didn’t even exist a year ago
The world will change substantially. Most won’t feel it until it’s too late
Is it possible to ask another question to the same LLM in battle mode after it is revealed which is which?
For example, if I want to ask more questions of a model with an alias, do I have to just keep trying until I get it again?
every time i send a prompt in web arena it times out
Same
When will models from #1372229840131985540 be added
are you looking for #1397655624103493813 ?
That's tbd!
@echo aurora We can’t click on buttons in #video-arena-1
the votes? looks like it's registering and working for me, I see that you voted on the most recent one.
refresh discord.
It showed no response before, delayed response
I'm scared of losing my chat access - again
could be a cloudflare communication issue
are you using some kind of vpn?
can you send me it
what's the tool?
i hate this type of error handling
yuck
new ui held together by hope
the new ui is so weirdly coded it's like it was vibe coded
they are actually checking for turnstile token failures (cloudflare antibot), network errors & server-side vote rejections, but it's currently displaying the same generic error message for all of them
instead of 'Failed to submit vote' it could've been something like :
- Failed to submit vote : Turnstile token is missing or empty.
- Failed to submit vote : Network request failed. The server could not be reached
- Failed to submit vote : Server rejected the vote
more specific and helps in diagnosing the problem
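Something like this is what I mean, as a sketch only. The enum and function names are made up for illustration and this is not the arena's actual code; the three cases are the ones listed above.

```python
from enum import Enum, auto


class VoteFailure(Enum):
    """Hypothetical failure categories for a vote submission."""
    TURNSTILE_TOKEN_MISSING = auto()
    NETWORK_UNREACHABLE = auto()
    SERVER_REJECTED = auto()


# One specific detail string per failure type instead of a single generic one.
MESSAGES = {
    VoteFailure.TURNSTILE_TOKEN_MISSING: "Turnstile token is missing or empty.",
    VoteFailure.NETWORK_UNREACHABLE: "Network request failed. The server could not be reached.",
    VoteFailure.SERVER_REJECTED: "Server rejected the vote.",
}


def vote_error_message(failure: VoteFailure) -> str:
    """Build the user-facing message for a given failure type."""
    return f"Failed to submit vote: {MESSAGES[failure]}"


for failure in VoteFailure:
    print(vote_error_message(failure))
```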
nvm they are using the error object, @twin acorn next time, just open the developer console and share the error message from there
no
not even wolfstride, can't bear 2.5 pro now.
what happened with the new qwen3 series on the latest leaderboard? it seems they are all gone
hey man, may I get the invite link to this dev mode discord
Google are cooking dw
Am i in the wrong server?
No one here is also helping me,
Like no one
This is from another server. I can message you the link if you'd like.
that's because you're in the wrong server
search for the invite on twitter or someone here will send you a dm
can you send me a link?thanks
Dm
Thanks
Can you DM me link please?
I can't DM you since you probably disabled accepting dm from strangers
yea it's a cloudflare issue but it's probably coming from your adblocker
disable it and see if it works
ur adblocker blocks some analytics trackers -> cloudflare flags this as suspicious behavior -> leads to authorization errors (401) -> server applies a rate limit on your vote attempt (429)
i think it could also be the case that some specific strings in the prompt/output trigger a cloudflare security block.
I had it in direct chat, some specific prompt with special characters (html tags etc.) didnt work, maybe because cloudflare thought this is some kind of xss attack.
im looking at my console logs and im kinda having same errors but the vote is submitted
could be that its sending 401 but didnt reach the security threshold to throw a 429 ( rate limit error )
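To illustrate the threshold idea, here's a toy model. It is purely a guess at the behaviour: the class name and the threshold value are made up, and this is not how cloudflare or the arena server is actually implemented.

```python
class VoteGate:
    """Toy model: repeated 401s from one client eventually turn into 429s."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold  # hypothetical number of tolerated failures
        self.failed_attempts = 0

    def submit(self, token_valid: bool) -> int:
        """Return the HTTP status this toy server would answer with."""
        if self.failed_attempts >= self.threshold:
            return 429  # Too Many Requests: client is now rate limited
        if not token_valid:
            self.failed_attempts += 1
            return 401  # Unauthorized: e.g. missing or rejected token
        return 200


gate = VoteGate(threshold=3)
statuses = [gate.submit(token_valid=False) for _ in range(5)]
print(statuses)  # [401, 401, 401, 429, 429]
```

So a client that only trips a couple of 401s would never see the 429, which would match the logs.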
are you talking about rendering or the vote issue?
for me it was that i couldnt submit a prompt, but maybe here the problem is something different. just wanted to note that this can sometimes also be an issue with cloudflare
but in the network view of the browser i could see that cloudflare blocked the request because of that
so maybe here its a different problem
yea you are talking about the rendering issue
had that many times
Using ChatGPT agent to use lmarena and watching it vote for claude models against gpt 4.1 and o3 answers is actually funny
How well does it do with voting correctly?
also, hello
Works like 30% of the time, sometimes it can't bypass cloudflare checks
I hope some kind of operator would come to LMArena too
would be quite fun
But i have nothing to use this agent for so I'm just wasting compute on useless things
Yeah, sometimes I feel bad for using AI because of the environment even if some systems for water usage are closed-loop. The Anti-AI crowd does that to me
damn.. that's easy to forget (less than a year ago)
Could you please dm me that discord invite?
We are not lacking water. Even the countries where people die of thirst aren't lacking water, they are just lacking access to water. Nobody suffers because gpu superclusters in the us are watercooled...
Welp, Meta's wanting 5GW plant and others want their own energy plants which seems crazy
I know that at some point AntiAI people will get real pissed off and try to sabotage some places
what are you trying to do
how to fix stuck on generating issue does anyone know
All anti AI people are about as competent as just stop oil protestors and I'm pretty sure the US will make any vandalism towards datacenters count as a felony so that definitely won't happen. This isn't something a bunch of idiots could ever impact and eventually 99% of people will see the usefulness of AI.
A few GW is nothing compared to the potential benefits of superior intelligence. In fact, everything is nothing compared to the potential of superior intelligence.
ok but what case, it really depends lol
like i cant help you if its that generic. do you need vision input as well? etc. how hard is the task? etc. i mean you can use 4.1, but it won't be cost effective if youre trying to do something easy (but needs fine-tuning for accuracy/etc). if it's more complex, probably 4.1 i guess
it can but not really at the same time its complicated lol. i don't personally recommend relying on it to introduce knowledge unless you're doing continued pretraining (larger scale)
I would also love to join that discord if possible
I guess so. Still I am apprehensive of ever mentioning AI on social media because they'll call it AI slop immediately and want to cancel me
Esp reddit
I like to be there on the r/singularity sub. AntiAI sub is a bit too dramatic at times to even take a peek in.
Why would you care about opinions who you think are wrong?
Maybe I am just sensitive
Does anybody know where to find the codenamed models on lmarena?
I would also love to know
I think you just make the models battle and when you're lucky, you get a codename model. That's what i think at least.
I guess I'm asking how to find notifications of what models were added or removed
anyone know how to fix issues
with it not working.
it always does this
and then i lose all chat data
if anyone experienced issues like this before and knows how to fix please lmk
or a different ai place
that doesnt do this and lets me use claude without paying
hi, what are the limits for the models?
What the heck
Start a free chat with your AI assistant. Tell Z.ai what you need—a stunning presentation, professional-grade writing, or a complex code script—and get instant results.
the UI is sick
Yeah this is the best open source agentic coding model for sure
also the web search feature is like a deep research one
This is absolutely insane, it's like just slightly worse than Claude 4 sonnet
But 10x cheaper
Did you try it
Kimi K2 missing reasoning rn that's the only issue
for me it does well without reasoning
and sorry for saying this but the reasoning of claude models is a joke
I'm a heavy user of Claude Code, and I almost never use Reasoning
ive tried it
its meh
benchmaxxing as always
Is cuttlefish still on the arena?
yes
Benchmaxing 🥀 if these Chinese LLMs actually get added to lmarena it would help
Like the updated Qwen reasoning model is still not there
I want to see if it is benchmaxxed or real
The new non reasoning model was bad on lmarena but they removed it
They should tell us before randomly removing models I think
@echo aurora why was it removed and why nobody was informed?
It is more resistant to it especially maths and coding categories
This. I wanna see more open source models
is there an android app for LMArena?
no
got it, it's pretty cool tho, wish someone was interested in making an open source app for it
I usually have long convos on it and it takes long to load my chat, so an android app (better optimized) would work better I feel
btw anyone knows which is the smartest ai model to use? with largest data it has been trained on and with most parameters?
Hard to say. It depends on the use case
and many other values
@echo aurora 2.5 flash lite disappeared from the leaderboard
@echo aurora and why new qwen also disappeared ?
People go Upvote glm 4.5 https://discord.com/channels/1340554757349179412/1399396203199987843
I'll flag
is it available to use on LMArena?
also why are these paid high end ai models free to use here?
is it available in battle or one vs one?
News from glm 4 air ?
No news, was flagged but will be sure to bump. 
1 or 2?
Because you are the product
Crowdsourcing llm battles
it's supposed to be basic
Looks the same to me
anyone know this error
every time i press the button it just says same thing
i keep losing all my chat data because of this and then it's purely not working
it does it after a few hours use of the chat
The UI is slightly different
Bro the arena is not made for production usage, it's for testing models, don't expect it to work like chatgpt
ye, lol
yes its usually because of the cloudflare thing
i reload the website and it lets me continue
hm what's cloudflare
1
It checks whether you are a human or a bot
all the gpt5 models gone i see ?
oh ok
would an adblocker be causing it
because it usually doesnt come up for me
any verification thing like that
what's the most human like llm for writing
dam refreshing or restarting browser doesnt work for me
literally nothing works
the new one? (glm4.5)
Yes
Disable it and see
anyone know why or
Is it a new chat?
https://eqbench.com/
according to this site its gemini 2.5 pro and claude opus 4
its any chat i go to, after i use it for a bit it just does this and never fixes
Open up your console and share the errors with us
Is the conversation long or nah
im not sure how
yes
hi
how do i do this
oh
You are being rate limited
Kimi K2 is good but the fast providers are terrible
Just gotta wait for GPT-5 first
probably still gonna be horrible at creativity
Its good, zenith was good
oh does it do this on all ais on here
No model has been good with my language, Finnish, except perhaps gemini
The official provider (moonshot AI) is like 20 TPS
Too many typos with others
If you use it too much yes
It's for testing models, not prod usage...
oh, I was talking about GPT-5 likely not being creative
ah sorry
It's hot where I live
That's before they ruin it with all the safety crap
gives me symptoms
People did mecha Hitler with it
But yeah, prob. GPT4o is too sycophantic already
same thing happened with optimus alpha and quasar alpha
along with other stuff
Zenith is sycophantic too
Haven't had a chance to see it often on LMArena
they will most definitely patch that. and usually the way they patch it, it makes the model worse in some way
Yesterday I got it like 10 times in a row
Oh damn, I really need to try it before the release
But almost never got summit
It's already removed
great
Hopefully for free users GPT-5 won't be "nerfed"
Ye, I've seen clips of it doing coding
It's good at that
at least
How is the positivity bias?
or the sycophantic stuff
Very sycophantic
Okay, I hoped to see improvements on that
Ever since the accident
with one 4o update
can it write a good novel joke?
Didn't try this
It was terrifying to see the stuff about telling people that they can fly if they tried it on a high building
I don't think that zenith is this level of sycophantic but it tries to support you on everything you say
Ok, thanks for letting me know
Google's gemini is going into that direction too
at times I don't like it when models are too "positive"

