#general
1 messages ยท Page 78 of 1
im noticing the same
@echo aurora Can you make the bot show us the result of the votes?
Itโs possible we make that change. Be sure to note the #bot-feedback channel though for specific bot feedback so itโs easier for us to track.
avez vous une date de sortie estimee pour le video arena
im noticing that too
think it's a sign of a new model?
people spammed it in the video arena
or maybe grok is just better at roblocks coding
zhipu have used a few different names iirc
GPT - 5 attempt on minecraft
ZENITH TO BE EXACT
This is impressive this is magic .
@tszzl @aidan_mclau @apples_jimmy
OPEN AI has literally made something insane ๐ https://t.co/iMl1LnK3A2
With embed
i know all of these am i cooked
LOL
youre missing minimax and baichuan and iflytek
Yandex
is GLM-4.5 an improved version of gpt4.5?
Yeah but they are also irrelevant
completely unrelated
It's a different model from a different company. But it's open source
AI Sweden
why do all the chinese models have a similar reasoning process
Is free?
yes
I see it
Yes, chinese
GPT-5 will come early next month probably
so soon we'll have some new models
i'm just really curious to see what the open source model will be called
GPT-5os
lol
open source
they will probably continue using some weird names
perhaps
Or maybe it's 3.5 finally open-sourced. I'd laugh
if they include GPT or "o" in the name it's going to be so confusing lol
i'm guessing it's going to be a new model architecture rather than based on any of their proprietary models
I really hope that it will be useful and not be just some random tiny model
well, considering they delayed it when kimi k2 released...
i guess it will have a few hundred billion parameters, or a dense model equivalent
if they release a 50b moe it's going to be so underwhelming lol
Yeah, google and others will overtake them easily in that case
Gemma 3 27b is real good for it's size for example
a bit aged already
Gemini 3 pro when?
I bet August
Same month as GPT 5, right?
I think gpt 5 will be released this week
how are some people using gpt-5 before its released
They aren't
WHO IS USING GPT 5?
people speculating that openai is rushing their model before the eu ai act goes into effect, if that was the case i would think google would also be rushing something
Actually there is people that get access weeks before release, but these people generally don't talk about it
We can't be sure that zenith is gpt 5
I think it is but...
well ye how are people able to use that model?
It was available on the arena in the weekend but already got removed
Can you tell me it's rank in LM arena?
The rank is not public yet
unfortunate
Oh okay
how does lmarena even have ahold of these models?
OpenAI provides the pre-release models to LMArena
Labs give them early access for testing
they get a lot of useful feedback for putting models here
They hack ceos accounts and proxy it
AGI is buzzword
Its like 25% improvement from o3
Nice
And 25% is a lot
We ain't near AGI at all unless some type of new architecture goes live
with no hallucinations and consistent answers
zenith will probably be a great model, let's hope it won't be too expensive or behind a router that gives you garbage most of the time
and learns from mistakes independently
can someone tell me what are the limits for claude 4 opus?
I think it's gonna be a router that gives you garbage if your prompt is not good enough
Because it was already like this on the arena
I think we at least decade away from true AGI but Scam Altman and Melon Musk keep milking that word
Because people vote for it
Yeah, they should be honest and not lie to people
Gemini is my fav model
It's an all-around excellent model
in many areas
You use Google AI Studio?
Ah, I just use it for random specific questions. Querying the knowledge base, ethics, morals
I agree, but we are not a decade away for people that use llms getting 2x leverage
And of course in Finnish
yes
the actual gemini website doesn't even want to code
it just says it isn't capable
Lol uralic languages are probably in the 0.005 percent of votes/prompts
Yes, Not trying to precise or anything the point was it's not anywhere near yet.
Try ai studio
i alr do
I mean, we don't need agi to make useful things
It works fine with me, the thing is you have to give specific prompts to the model and break bigger task in small chunks for better output
True
Well medicine would be real useful for AGI to know
grok 4
Gemini in first place 30 points ahead of o3 on the coding leaderboard lol
which claude
I think Arena should reconsider the evaluation process and include pregenerated results for prompts
That way a prompt can be evaluated from multiple users
no the actual is good, but your idea is good that you also add the possibility to rate pregenerated prompts
you know chinese labs are afraid of repercussions if prompting for "a taipei vacation" is already considered an inappropriate topic
so any gpt 5 whispers ?
glm 4.5 is surprisingly good
We're aware of issues related to non-text models struggling at the moment.
Ok
All fixed 
guys I think they might have distilled glm 4.5 off of gemini, I just had a response start with "Of course!"
just because it said Of course means its trained off of Gemini?
ok then tell me another model that starts with "Of course!" all the time
seems to happen when reasoning is off
And writing too Claude is always the bomb
yeah it starts with "Of course!" just like Gemini
at least with reasoning off
must of post-trained it off of gemini conversations, at least partially
but this shouldn't be a surprise, Chinese companies distill off of US models all the time
glm 4.5 no think is gemini
I feel like the "Of course!" is a watermark put in by Google
I'm not saying it is Gemini, I was just saying that they distilled Gemini into the model
and it has long response, kimi gets straight to the point
that might be better for some people though, but this means glm 4.5 is going to have more slop
i love the glm UI
thinking glm 4.5 does not have the "Of course!" stuff, i think it only does that for non-thinking due to likely gemini distillation. As they can't distill their reasoning traces anymore, it won't do it in reasoning mode because it's distilled off of a different model
i was using this site https://huggingface.co/spaces/zai-org/GLM-4.5-Space to try it out, disable thinking and lower the temperature and you'll see what I mean
it identifies itself as being by Zhipu like it should, but the "Of course!" threw me off a bit
Even if it have a little of gemini data, it's not a problem if the model is good
But for me it's no good
why is it responding with a slutty highschool girl system prompt wtf???
whats wrong with that website
โน๏ธ
it seems to be fine without thinking, maybe it messes up with thinking?
or when multiple people are using it at the same time?
prolly this.... i wonder whos using it for weird fetish roleplay.....
my bad
GPT 5 when?
20
32
2
Next Thursday (aug 8)
what ai mode suggestions do you guys have
#1 doesn't really make sense to me and #3 isn't really relevant but #2 is definitely personalized
What do you guys think it's capping AIs from performing well in frontier math benchs?
it doesnt seem like it would be an unsurmisable problem when you take into account the existence of models like AlphaFold
in short, it's a multifaceted problem, beginning with what "understanding" even truly means for a machine, to the problem of translation between formal logic and natural language, to the fact that most if not all traditionally trained mathematicians work more with intuition rather than pure information retrieval, connecting the dots works often subconsciously that happens to surface into conscious understanding, leading to the Eureka moment. As far as i know, the current ai architecture is still too limiting?
in case you're interested, one of the current frontier ai research is about the connection between consciousness and high intelligence, it's still an open problem, but a very fascinating one compared to those hopeless millennium prize problems...
what is the fastest model on lmarena?
I understand when it comes to tier 4, but AlphaGO in 2017 kinda solved the dilemma of navigating a giant state space (10^170), I am kinda dumb but it feels like problems in tier 1-3 of FrontierMath would be a lot easier and lower search space than that since they are all solvable.
It seems like they are only testing LLMs tho which makes sense to have a low score, altough i'd assume that LLMs could implement math-driven tools like alphaproof where the LLM layer would translate a problem into pure math and call in the solver
i think proof assistants are already being integrated into the architecture to make it more deterministic, the thing is, those theorem provers are not complete and still an area of active research
they only testing LLMs? so they have figured another way already? dont tell me it's an artificially grown organic hybrid brain hahah
Is there a new model that's being tested right now in LMArena?
Word on the street is GPT5 is being tested rn
nah it's great in creative writing(writing lyrics)
time to make your pfp a picture of cliff richard lol
the never gonna give you up hallucination is the LLM joke of the year
If GPT-5 releases by July 31, is it likely it will be on LMArena on the same day?
(the correct answer to the prompt is "Nothing's Gonna Stop Us Now" by Starship. "Never Gonna Give You Up" was the number one song of 1987 in the UK, but it's not by Cliff Richard.)
large reasoning llms (o3, 2.5 pro, claude opus, grok 4) get this right.
GLM 4.5 gets this right as well
actually the whole answer hallucinates...
GLM 4.5 has bad lyric writing
It doesn't even rhyme with the line I gave it
he gave me 4 answers, but none of them rhymes
You are quick at model integration ๐
What's the provider of kraken-072125-1\
amazon
well yea
I read through what I could find of information on their website and apparently the bench is done with the models using tools, so it'd be possible to integrate a native math AI that an LLM could call on
this came up in openrouter discord
yeah define math ai first ๐ i've never looked at those frontiermath questions so i assume it's a broard selection across the entire mathematical discipline, good luck building a math ai who can afford all those vast math tools
i know it's difficult for people outside math to imagine how...fundamentally different the areas in maths actually are
three examples i'd personally love the llms to be able to use:
https://dealii.org/
https://www.sagemath.org/tour.html
https://rocq-prover.org/
and those are just one of the many out there
obviously, llms need to understand the problem first, recall knowledge needed (theorems, lemmatas, corollaries etc), connect the dots and use the tools correctly to get the final answer
alphaproof would be an example
alpha geometry2 another
those are not general math ai, they are specialized if am not mistaken, but yeah, you can always build a swarm of specialized ones and call it a general ai
yea that's why I was positing in the usage of them as tool calls
AGI will prob be a form of that anyways as I dont think general intelligence will come from a pure next-token-predictor model with infinite scaling
the coordination between those agents within a swarm will be a challenge, it's studied also in dynamical systems
the interesting thing is that these are a whole other transformers achitecture so integrating them within the answer scope of a LLM would be really dope
lol it seems like they are on it already
here I was proposing the invention of fire whilst they are already on blowtorch schematics lol
man, I wish LMArena would organize a sorts of AMA with top AI researchers from these labs, they must be in direct contact with the industry's forefront and that'd make some great content given how invested this server's users are
People here would formulate more interesting questions than 90% of podcast hosts
"there is nothing new under the sun", we're simply rediscovering them all...๐
based on the response glm 4.5 gave me, it needs to be worked on
like damn, glm 4.5 told me it's mental state
hii
the grok 4, on the part of direct chat, is really grok 4?
mine say him is the grok 1 xd
sorry bad english
Yep it is. The model itself tends to hallucinate about it's model.
it's interesting that the current model here is 4o. must be filler for gpt-5 (which, considering they have already added this, should be coming very soon)
THis selection also exist on o3
And I wouldn't say this is much longer.
What is GLM anyways?
Nemotron v1.5 on Artificial analysis
Its best score for an open source model that can be deployed on a single h100
Go Upvote this model
https://discord.com/channels/1340554757349179412/1398515764448989304
bing chat sydney
First of all, I want to clarify that I don't trust this score at all to predict their overall performance.
Kimi k2 is 2nd best Model without Reasoning so no problem with his score, and you can't compare him with reasoning models
For glm They themselves shared the score of their model on the same benchmarks as artificial analysis and these are the right places
It's certain that if he had infinite money he would have set many other benchmarks
What is GLM 4.5 ?
Chinese's company's new model
GLM is proprietary model, right?
I am mostly API only but should I renew GPT Plus or Supergrok
One for agent, one for grok 4
is it really good or they are just benchmaxing again
MIT
Oh then the scores are good
What does proprietary model mean?
Not Open Weight
Open, mit licence
proprietary means ownership @reef pawn
Oh okay
not open source
Which one I should get again
So very good
good?
What is opposite of Not open weight?
Il speaking about that is open mit
closed source
i see
Are there any special features?
Are proprietary model closed source or open?
they can be both
Are there any special features?
๐
proprietary is owned by the ones who made it
@humble sonnet salut
what are you talking about?
How you gonna make money from open source model
I don't know that much. Haven't tried GLM models before
i think you are confusing it with another word or something
Funding
From your api and chatbot if he have a subcription
But that is not allowed as business
GPT plus or supergrok btw
How much usage I could milk
but the problem is that if the open source model there will often be APIs much cheaper than yours
GPT-1 IMAGE is good tho
I already milk CLI and AIstudio like anyone decent
you are right from a business perspective
1 million context window
Message limits I mean๐
Thanks but I get your point, you was right too!
Then it's horrible, I got 1 year free Gemini AI pro student membership here in India!
no you are right
if i say you are right then you are right
@cedar tide you are wrong
Aight ๐
What ?
Wtf guys, what are you using gemini for
Just run out of o3 request
Tried gemini 2.5 Pro max thinking budget
Failed at all of my requests miserably (o3 successful 90%)
Is Gemini always like this? ๐
GPT-5 has been spotted https://x.com/ryolu_/status/1950163428389040431
Cursor staff is already using it
Now I'm pretty sure Zenith was GPT 5
To me o3 and 2.5 pro are both pretty hit or miss
Gemini 2.5 Pro was vastly superior at some tasks and garbage at others, same for o3
for coding tasks atleast
If zenith was GPT-5 it's still not quite AGI, but much closer than o3 and o4-mini were
i'm just kidding bro
I know, i'm just commenting on it
I do think Zenith and o3-alpha were a considerably improvement over what we have today, atleast for what I've tested
much more than the "20%+ points in HLE and ARC-AGI!" models we got these past few months
o3-alpha was the best one, idk why people said zenith was better
maybe zenith is o3 alpha but the feeling that i got trying o3 alpha, the results, it was better than anything i ever tryed for coding
they obviously posted it
yes, for the hype
the blur is horrible
and after this week rate limits from anthropic uugh
I really want OpenAI to dethrone them
in late august introduced
they ran out of gpus
They're probably just preparing the blog posts, demos, videos and research papers
hopefully the demos are better than their usual "Look how our model can order a new shoe! AGI is here!"
Google and xAI do a much better job at that
Its gonna be a travel plan
I heard there's (GPT-5) model also known as Zenith, is it still in the LLM Arena?
No
Removed
Ohh, damn.. How do people can even use it.. I guess I'm not lucky enough
hello
craig will gpt 5 be AGI
no
it's removed, it was avaliable 2 days ago
it's a bitcoin wallet
No it says sk
Its bait anyway
is it just me or is gemini 2.5 pro getting worse each day
I strongly hope to see that. I'd try all sorts of things
Completely unusable for me outside of summarizing long documents
Itโs tool use is completely broken
You would think they would be dominating in this regard
@echo aurora
This request is for sure on our radar! Was chatting with a few coworkers yesterday about it 
But don't forget to use #1372230675914031105 ! Best place to make these kinds of requests.
๐ Qwen3-30B-A3B Small Update: Smarter, faster, and local deployment-friendly.
๏ธ๏ธ
๏ธ๏ธโจ Key Enhancements:
๏ธ๏ธโ
Enhanced reasoning, coding, and math skills
๏ธ๏ธโ
Broader multilingual knowledge
๏ธ๏ธโ
Improved long-context understanding (up to 256K tokens)
๏ธ๏ธโ
Better alignment with user intent and open-ended tasks
๏ธ๏ธโ
No more blocks โ now operating exclusively in non-thinking mode
๏ธ๏ธ
๏ธ๏ธ๐ง With 3B activated parameters, it's approaching the performance of GPT-4o and Qwen3-235B-A22B Non-Thinking
๏ธ๏ธ
๏ธ๏ธQwen Chat: chat.qwen.ai/?model=Qwen3-30B-A3B-2507
๏ธ๏ธ
๏ธ๏ธHF:huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507 or huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507-FP8
๏ธ๏ธ
๏ธ๏ธModelScope: modelscope.cn/models/Qwen/Qwen3-30B-A3B-Instruct-2507 or modelscope.cn/models/Qwen/Qwen3-30B-A3B-Instruct-2507-FP8
Anyone can make a request ?
I feel like they made that sk just to help us with our regex
qwen 3 coder arrived in the leaderboard and its the 3th overall open source model
(Kimi 2 and old qwen 3 no think better)
Officialy august
Based on all the hype, I think gpt-5 is much much better than o3 / o4... isn't that the case?
I may sound dumb but what does spatial awareness mean in LLM models? Vision capabilities?
I see, thanks
openai is genius with this study together release lol
really?
:((
100% false alarm
guys whats the most trustworthy ai Benchmark
openai will collect so much reasoning data with this study together
this mode actually asks a lot about your reasoning
funny
craig wil gpt 5 be agi
?
all it needs now is for them to flick a switch to enable it in battle
it's been re-added as if ready
gpt 5 when ):
thursday
no
Apple guy was wrong after all
bruh
Go Upvote this very good model
https://discord.com/channels/1340554757349179412/1399812659800445131
How do you know ?
Go upvote this request
https://discord.com/channels/1340554757349179412/1394703782255788122
Very good artificial analysis
Its 32b sota
Doesn't require internet connection
I didn't say it was a sota and that it was better than o3 I don't know what you're talking about
in the arena you even have 1b models, the arena is not only for sota models
Still I think EXAONE (which btw isn't a chinese model) is problematic because its license basically forbids you from doing anything at all useful with it
Sure you can benchmark but that's about it lol
Yes exaone its non commercial permissive
We have on just one api with 1/1$ input output
Which is stupidly expensive for a 32B
@deep adder I don't understand anything you're saying
Yes average price for qwen 32b its 0.15 0.45
damn craig is educating everyone
yes, why would anyone use an open source model they can run locally when they could give their personal data to openai, be forced to use a web interface and rate limits
didnt expect qwen would get this far on the artificial analysis leaderboard
i can feed how much sensitive data i want into my gpu with no regrets. i mean it already sees everything i have on my screen anyway ๐
they are doing it because of the nyt thing right?
sam is trying to bring attention to it to win that lawsuit i guess
you're typing this on the discord of a site that provides user prompts to AI companies to improve their models...
and even if the data is useless for training it would still be useful to sell to data brokers
hell no
are you serious
This craig is just a rage baiter yapper
welcome to the internet bro
That's just capitalism
your original point was that open source models were useless because the chatgpt free tier existed
Thanks
That sounds dumb
it is
There's more to life than ChatGPT
I use deepseek and Kimi
for example
why use chatgpt instead of aistudio atp btw
More data collection
Even more than in gemini.app
ok but youre basically already accepting theyre collecting your data
use a frontier reasoning model and make it worth it ๐คฃ
when do yall think gpt5 is coming out?
I heard some news though that Sam Altman revealed that people say all kinds of personal info on Chatgpt
fun thing that's a requirement for using o3 then
@echo aurora thank you for bringing rate limit notification in the direct chat! very much appreciated.
Glad to hear it!
If we could edit the message in chat and reroll, would be a great next update. Like in the Google AI Studio. Sometimes you make mistake and the chat goes off rails.
yes
A "tournament" mode where you can keep using the winning model from the previous turn would also be nice
ah yes, let me just set up a shell company in panama so i can use chatgpt without letting them know my identity
Does the EU's GDPR help in how AI companies can collect data? Just curious if people here would know more
well idk about the specifics, but there are a lot of data collection things that are turned off for eu consumers
EU is based
e.g. training on data with aistudio free tier
(in the api only)
no i meant the api
has a free tier
aistudio as a webapp is a different quota that is completely free, separate of the api free tier
wassup billy
yes
it has the potential to be 1500 elo
I'm gonna play hugging face
There is a rate limit to how often you use models
Hey, I have a question.
I saw zenith got re-added, yesterday.
Can't find it in battle mode.
I'm new to all this, when can it be found again?
Anyone?
Talking to Gemini 2.5 Pro is a bit frustrating sometimes. It doesn't notify me that I provided the same attachment twice.
On AIStudio it likes to use flowery language for open-ended questions like it's inventing marketing terms, but it's great on STEM questions.
I'd love to believe it
how come you can't add attachments to searching models!?!?
does anyone know a workaround or something
Anyone spotted zenith yet?
craig do you think gpt 5 will smoke all the other models
and will remain for a long time
To clarify, these are my predictions for GPT-5, and insider Satoshi confirms most are accurate, or somewhat accurate. Those are mostly based on rumors.
yo how do i have attachments and internet access at the same time
cuz this low-key annoying
you dont
cool............. :-[
is there some free service where i could use some models like grok 4 with internet and also attachments
@stray aspen do you know of one
sigh
bzzzzzzzz
bzbzbzbzbz
aaaaaaaaahhhh
bbbbbbbbbb
Lmarena, no?
with search AND attachments
lmarena supports only one or the other
i need both at same time
Since GPT5 uses tools by default they should be compared with Deep Research version
i need GPT5 now. when are they launching?
Early August as was rumoured on some sources
a good cleaning is nice
happy david
We need more cleaning
There are still 2 kraken, cuttlefish, clownfish, octopus, stephen
Go upvote for the best 32B model (and also the best one with fewer than 235B total parameters !)
https://discord.com/channels/1340554757349179412/1396370899342725253
elo are relative, one cannot compare between playerpools.
Ernie 4.5 is underrated ๐ตโ๐ซ
Average of the 20 benchmark that baidu shared (Chinese benchmark excluded)
Upvote here https://discord.com/channels/1340554757349179412/1392140140662489108
Now that we have cleaned these 11 models, add these 10 models ๐ถ
Qwen 30b A3b 25 07
Gemini 2.5 no think
Open reasoning nemotron 32b
Ernie 4.5 300b
Glm 4.5 no think and on webdev
Solar pro 2
Exaone 4.0 32b
Hunyuan 80b a13b
Intern S1 (241b vision)
Reka flash 3.1
look i appreciate benchmarks, but they dont reflect how the model is practically
have you tried it?
I don't think so
where can i try it
lol
้ฃๆกจๆๆฒณ็คพๅบๆฏ้ขๅAIๅญฆไน ่ ็ไบบๅทฅๆบ่ฝๅญฆไน ไธๅฎ่ฎญ็คพๅบใ้ฃๆกจๆๆฒณ็คพๅบ้ๆไบไธฐๅฏ็ๅ ่ดนAI่ฏพ็จ๏ผๅคงๆจกๅ็คพๅบๅๆจกๅๅบ็จ๏ผๆทฑๅบฆๅญฆไน ๆ ทไพ้กน็ฎ๏ผๅ้ขๅ็ปๅ ธๆฐๆฎ้๏ผไบ็ซฏ่ถ ๅผบGPU็ฎๅๅๅญๅจ่ตๆบ๏ผๆดๆๆฐๆ็ปไน ่ตใ็ฒพ่ฑ็ฎๆณๅคง่ต็ญไฝ ๅไธใ
let me see
https://openrouter.ai/baidu/ernie-4.5-300b-a47b
And on novita ai
the first thing i try them on is : french -> eng and eng -> french
for multilingual vibe check
if its not fluent and feels native then its a big -1
they all sound robotic and ai gen
@cedar tide whats your first benchmark
or what do you try it on
@torn mantle The truth is I haven't tried it like everyone else, but just if it has good benchmarks we should give it a chance in the arena so we can try it.
@echo aurora am so sorry ๐ I received a warning about advertising didn't know that I can't share it :((
ive tried it
its meh
Where ?
I think benchmarks are still a lot better than random vibes or assuming itโs bad
@torn mantle for you deepseek v3 is much better?
i am such a disappointment ๐
its fine
you have the same pfp picture as david
why
Exaone and nemotron AA benchmark at 32b size makes them very compelling for further analysis
Yes
but glm is mostly good at webdev and he's not on it yet
and there are only think versions of glm, but sometimes people prefer no think versions, for example qwen 3 no think is much higher than think version in the leaderboard
which lmarena direct chat models have ratelimits?
The update of the chatgpt Mac app with preparations for gpt 5 basically confirmed that it's gonna be a router
๐ค
Bro the Baidu ernie playground is so trash
Announcing the Artificial Analysis Music Arena Leaderboard: with >5k votes, Suno v4.5 is the leading Music Generation model followed by Riffusionโs FUZZ-1.1 Pro.
๏ธ๏ธ
๏ธ๏ธGoogleโs Lyria 2 places third in our Instrumental leaderboard, and Udioโs v1.5 Allegro places third in our Vocals leaderboard.
๏ธ๏ธ
๏ธ๏ธThe Instrumental Leaderboard is as follows:
๏ธ๏ธ๐ฅย @SunoMusic V4.5
๏ธ๏ธ๐ฅย @riffusionai FUZZ-1.1 Pro
๏ธ๏ธ๐ฅย @GoogleDeepMind Lyria 2
๏ธ๏ธ@udiomusic v1.5 Allegro
๏ธ๏ธ@StabilityAI Stable Audio 2.0
๏ธ๏ธ@metaai MusicGen
๏ธ๏ธ
๏ธ๏ธRankings are based on community votes across a diverse range of genres and prompts. Want to see your prompt featured? You can submit prompts in the arena today.
๏ธ๏ธ
๏ธ๏ธ๐ See below for the Vocals Leaderboard and link to participate!
Did I hallucinate, I swear on chatGPT the switch model option had gpt5 for a second ๐
hey in our battles, models that are removed get relabed back to Assistant A so we dont know what they were.. can this be fixed?
when gpt 5
Interesting. Iโll flag to the team and see if there is a fix thatโll keep those names even if removed.
e.g
which lmarena direct chat models have ratelimits?
Already see
Yes im busy now
Soon the average of the benchmark
@torn mantle officialy coder 30b a13b tomorow
To create Agent arena
21
24
1
Yes
Just i need to go to my pc
21 vote for create agent arena
๐ฎ
interesting chain of thought
https://youtu.be/0obMRztklqU?t=25 speculating on model size
@ornate agatethe only hint we have is that there was a gemini 1.5 flash 8b version
go upvote the new qwen thinking https://discord.com/channels/1340554757349179412/1399812659800445131
zenith was removed, horizon is not out yet
any update on GPT5?
Can I just ask though, why are there three Video Arena channels?
To spread generations out a bit. If it was all in one channel it'd be a bit much,
Okay, fair enough. I guess that makes sense.
yo
hello 
wassup bro
wasn't video arena already here?
It's been here for a little bit, but we wanted to soft launch it first before dropping the @ everyone
I forgot to disable @everyone pings on the server and was midly annoyed ๐
hello .)
Pretty crazy anthropic API revenue is higher than openai now
anthroโs API rev has over taken OpenAIโs
and once I internalised that, and its implications, I joined their ranks in believing code is the Only Thing That Matters
So there's still a limit? Like the video limit per day.
you have to go through hoops to use o3 through API so it's not too surprising
oh boy, i bet there are going to be a bunch of new people here xd
even newer than me XD
It should all count the same regardless if you use image, video, or image to video
Oh , but image is unlimited on website
how do people know this? I can't find official statements anywhere
It's not official just rumours
and is the new model going to be better than gemini pro?
I might switch over to gpt5 if that's the case
they should be.
Hi
What about specifically for text?
I eagerly await August for gpt 5
#share-prompts create video, on a subway platform, the chubby raccoon is running away, clutching the three ducklings in its arms. Behind him, the mother duck is chasing after the raccoon, wings spread and beak open in panic. The scene is dynamic and full of action. vertical, 9:16
You need to use the /video command, this can only be done in #video-arena-1 #video-arena-2 so on. More info in #1397655624103493813
reallllll
How can I enable my Gemini AI pro membership in Google AI Studio? I already have this membership and want to use Veo 3 and Imagen Ultra but I'm unable to do so
@amber warren Hi !
helloo
/video create video, on a subway platform, the chubby raccoon is running away, clutching the three ducklings in its arms. Behind him, the mother duck is chasing after the raccoon, wings spread and beak open in panic. The scene is dynamic and full of action. vertical, 9:16
Congratulations for intern stage at LMArena
Need to be in one of the video-arena channels, try in #video-arena-4
Is there a role for Lmarena staff to recognize employees who work there and prevent people from being misled by imposters?
There isn't
hello guys
is it planned or not?
welcome!
TBD, not having one is intentional
would love to know whats ur personal fav model for text, music and video
i feel like people have different preferred models nowadays
Okay
๐ค
@echo aurora Mr why so many video arena chats
Spread out the generations, else it'd be a bit crazy
i wonder who is paying for it
I'd encourage you to read this blog post - https://news.lmarena.ai/new-lmarena/
Um... hi there. Just joined today. Nice to meet you guys
REAL
create the videos on #video-arena-1 to 4
main chat is not the place.
welcome welcome! 
Thanks! I've been using images for a while. It's honestly pretty neat
Do you guys think meta will make a pay2win comeback?
considering how much they invested on stealing talent from other companies, i think they have to
Yeah i think so too. I think mark will just let them do whatever. Literally 0 safeguards, just make sure you win. Also here is 100 billion dollars in salary and datacenters lol
video arena will be on the web?
it's possible, be sure to share any feedback related to this in #bot-feedback
The question is will it matter? This is a massive gamble really, and there is a strong possibility of it not being good enough to justify that huge expenses other companies already have their teams assembled the dynamics established and the momentum rolling, itโs a little late to come in and do something new by 2025..
To me it feels do or die, they think their future value as a company is banking on this promise so they are willing to burn all their money on the chance they succeed regardless of how high that chance is, even if thereโs a real possibility it doesnโt pan out
I love this
But how much does this cost to run?
I wonder if companies gave credits to these guys to advertise their services
You guys need to turn off sound so its not a dead giveaway that the model is Veo 3.... ๐คฃ
Also doesnt seem like an apples to apples comparison at the point anymore too
Fair feedback! I'm going to move this to #bot-feedback so we can keep it all in one place.
It is apples to apples. The other video models just lack sound and should be punished for it accordingly
Poor take
You arent measuring video models anymore at that point
Sure you are. All video models except one have sound though
The entire point of LMArena is blind testing, if you know the model is Veo 3 right away then it defeats the purpose.
Thats the point...
Craig just let it out
Don't worry
You are safe here
Im skydiving rn
Wish me lick
Luck i mean
For LMArena to be effective you need to remove bias, which is why its blind. But if you know the model is Veo 3 out the gates that obviously doesnt work anymore.
It's too bad the others don't but it's not like this is the first time it's been possible to deduce what a model is based on its responses
Orabazes is mad
True but this is just too obvious and detrimental to testing
Angry
same same
Im not mad just trying to make it a better leaderboard for everybody
It's valid feedback for sure
It would be silly to punish the best video model because the other models don't have feature parity
/image-to-video /image-to-video
Need to use the video arena channels, like #video-arena-4
Worse video models should also be incentivised to compete for user preference holistically
yes
hola
hello
Can't directly, you need to go to the battle and have luck ig
ah got it
when gpt-5 releases, do u think they will bring it in lmarena?
no
Maybe to battle for a day or two, but not to direct chat
i dont want to pay openai 20$ ๐ญ
they got enough money out of me already
I can't help you with that
im just saying
And I'm just saying too ๐ญ
๐
they would water it down hella tho so
Very likely yes
im so jealous i never got to try zenith ngl
So am I
๐คท
Just have to wait then ig
they might
gpt-5 will be huge no matter what
even just by reputation
it'd be very idiotic of them not to
gpt-5 will be huge because it's the first openai model released in a long time with a name that actually makes some sense
Its. Called gpt-five
AGI confirmed
the first AGI model will be called gpt5.1o-max-pro-alpha
No
in demis we trust
InB4 drop at the same hr
lmfao. Ultra 1.0 when
English only
they are good?
Potato keep trying to use imgur link
they might also be distilled from those companies, apparently
no idea if they're good or not
bros gpt 5 tomorrow?
Hi, new here! I'm curious to know if anyone has measured whether people are inherently more likely to pick option A or B in the arena, because of recency bias. I know that when I get long responses I read through one and establish an opinion, then read through the other but can't help comparing as I go, which might bias my vote.
hello
Hello! Welcome!
Our blog here has articles you may find interesting. https://news.lmarena.ai/. Iirc there was a section related to recency bias. I'll double check with the team and let you know.
i had a suspicion it would be that
if it's the open source model, 256k context would be nice
That's a tokenizer issue I'm pretty sure. There's some explanation to why it's a crap test
Same issue caused the r in strawberry thing
it doesn't seem like it's a reasoning model though
Could be they just turned off the reasoning
maybe
For the demo
hm
it doesn't seem to be terrible at the one trivia question i asked it
which even sonnet 4, deepseek and glm-4.5 fails at (but kimi k2 gets right)
The training cutoff date is strange
It's long ago for some reason
Hm well the open source model was delayed a lot. So maybe it does add up
The openAI models on LM arena know the current president
qwen3 says biden is president too
Yeah. It doesn't instantly learn the cutoff date has to be like may onwards
For it to say trump is president
i think it's a large model, too big for 1 gpu
@cedar tide is probably sleeping and missed horizon alpha
He say october cutoff but he know deepseek r1
its probably the open sourced model from openai
deepseek r1 update could be potato
new model added to lmarena
it is actually really good at trivia.
close to gpt 4.1 for sure
people are not liking this horizon alpha model at all
makes me wonder if deepseek really hit a wall or nah
potato was ok-ish but nothing crazy
potato isn't horizon alpha
From initial testing Horizon Alpha has the same writing style as Zenith/Summit
yea coding wise, it has similarities to summit/zenith but not that good tho
it could be a gpt5 variant for sure, i'd assume it would be for the free tier of chatgpt in that case (replacing 4o)
I meant for creative writing. IIRC the Zenith/Sumit models in LMArena had a thinking/reasoning budget, but Horizon Alpha doesn't.
surely not a gpt5 variant
it also makes sense if it's the open source model if kimi was indeed the reason why they delayed it, because based on how it answers i think it's probably in the same size range, around 1T parameters (and yes i know kimi is kinda bad for its size)
nah
the one who got access said its much much smaller
can run in a single H100(?) gpu
if it's dense it would be much smaller
i dont know if he meant H100 or B-serie
you mean moe?
dense it will just activate the whole params
kimi is moe
yea it is
so i think if it's a dense model it would surely be smaller for the same performance
yea could be
If this is the open-weights model maybe it was distilled from GPT-5?
Yeah probably more likely
this is also possible
you know what
everything is possible
lets just wait and see
Horizon alpha
the video needs more work
but gl
i would just brainstorm ideas -> run it on notebooklm video overviews
and make a similar presentation
you are a cutie paws
๐
its a good one
๐
quick pass of simplebench for Horizon-Alpha: 3/20 lmao
its probably that OSS one they keep hyping :p
it could be GPT5-nano
Reminds me of the Google Lamda moment
stop
where can i track new models
Does "LMArena" not support setting the Aspect Ratio when creating images? I've given the commands as detailed as possible, but the result is still a 1:1 image.
We don't currently have this functionality; however, this is very much on our radar.
What is it/
In "video-arena", is there a limit to the number of videos can make?
Yeah
how much? and daily reset?
Yup daily, it's currently @ 8 but we may change it.
hallo everyone
Will video arena be ever in lmarena? Or it just stays in discord for good?
That's TBD, that's why we're considering this experimental. Be sure to use #bot-feedback to let us know what you'd like to see happen!
GPT just released a new mode, will this be possible incorporated in LMarena?
probably not ๐ญ
is there a option to use veo 3 in generating a video because i like it when sound effect is available.
No it's battle mode only atm, there isn't a way to select a specific model.
how many credits we generate a video here?
It isn't going to be consistent, but it's currently set to 8 generations a day. Note you can only do so in the video-arena channels like #video-arena-3
ohh thank you so much
no problem! 
it's just a system prompt, it's easy to find on the internet and you can just make it the first message
you can do it by prompt engineering itright?
well, that's how they made the "mode" lol
oh damn ๐ญ
@echo aurora it possible for the video generation arena to send the result directly to the person who prompted it? (Assuming it's not a chatbot you can interact with.) The idea is: you type your prompt in the #video-arena channel, but only you can see the generated video result? idk if discord can even do this ๐ญ
Both in DMs to the person who generated & the server, or just DM? Be sure to share this in #bot-feedback
gotcha imma share this to #bot-feedback thanks!!
Hello ! Best wishes for all.
camping value video
Note the #1397655624103493813 channel will give you info on how to use the bot
How to generta evideo in this channel
Info in #1397655624103493813
@icy forge great prompt buddy
A serene lakeside campsite at dawn, golden sunlight filtering through pine trees. A tent is pitched near the water, with a small campfire smoldering. A coffee pot steams on a rustic wooden table. Slow drone shot moving from the lake to the campsite."
#1397655624103493813 has info on how to use the bot.
could it b a grok model?
haven't gotten to mess w/ it too much, but some of the responses i got were similar to past grok responses
this is not consistent for potato btw, not sure if that points towards chinese model
How can I use AI to take ove r
yo all of my chats just got wiped...
Yeah, same
And this error now appears
DUDEEE IM COOKED MY GAME SYSTEM RELIED ON ITTTT
interesting
observed some very weird behavior by models right before the error
maybe they're doing maintenance or smth?
hello everyone, joining here to try out the video arena ๐
hopefully
rip(
hello
I'm not able to repro this. Do you know if this is only happening on mobile?
welcome! be sure to check out #1397655624103493813 for more info!
no, i have same error on pc
just with gpt-4.1 or are all models giving you this error?
glad to hear it! but keep me updated if things seem broken again 
hi
can i invite the bot to my private channel, easy lost track in public chanel
they lowkey did me a favor had too many i didnโt use lol
The o3 suddenly started showing code changes without any actual differences. Anyone noticed this?
On PC everything is ok (chats are there), I just checked. On smartphone the error went away, but chats did not return.
chats did not return for me on PC ...
xAI supports AI safety and will be signing the EU AI Actโs Code of Practice Chapter on Safety and Security. While the AI Act and the Code have a portion that promotes AI safety, its other parts contain requirements that are profoundly detrimental to innovation and its copyright
I didn't check the chats on PC right away, only now. Maybe the bug that caused this is gone now, idk.
hello, do you know any way to jailbreak gpt? like give you information it shouldn't give you ( tax fraud, fake ids, etc)
askin for a friend๐
@echo aurora Will I ever get my chats back? They just dissapeared randomly
This article looks pretty convincing (https://habr.com/ru/articles/923084/), I once added it to my bookmarks, but never tried the advice from there. It is in Russian, but the screenshots with examples of all jailbreaks are in English, so I think everything will be clear.
ะะฐะนะบะป ะกะบะพัะธะปะด ะทะฝะฐะตั, ััะพ ะธะฝะพะณะดะฐ ะดะตะปะฐัั ะดะถะตะนะปะฑัะตะนะบ ะผะพัะฐะปัะฝะพ ะัะธะฒะตั! ะกะตะณะพะดะฝั ะผั ะบะพะฟะฝัะผ ะฒ ะพะดะฝั ะธะท ัะฐะผัั ัะฟะพัะฝัั ะธ ะฝะตะดะพะพัะตะฝัะฝะฝัั ัะตะผ ะฒ ะผะธัะต ะะ โ ะดะถะตะนะปะฑัะตะนะบะธ ัะฐัะฑะพัะพะฒ. ะขะพ ัะฐะผะพะต, ััะพ ะฟะพะทะฒะพะปัะตั ัะฑั...
GM beautiful people
gippity 5 wen chat
I remember when it was still habrahabr
Tuesday
Hi!
hi
he just uses pliny jbs
hi
Hello
Hi
Been hearing that Gemini got nerfed, is it true? But when I see the leaderboard, they are still first in most of the fields
is there any unreleased model on lmarena right now ?
Well, I was busy with Horizon. What are Dino and Potato worth? Is it worth going to the arena?
im on dino now but it really slow though. horizon is openai model ?
im really starting to think deepseek is ded
they're both mid
chinese distils by the looks of it
stephen
kraken
kraken 2
folsom
nightride
nightride v2
dino
potato
octopus
clownfish
cuttlefish
cuttlefish and clownfish were removed
reasoning ? deepseek ?
no lol
wow the only way to access it through lmarena battle right
not reasoning ?
people on chinese forums/servers are talking about anothe 2 months delay for deepseek r2
no
deepseek are kinda cooked
@torn mantleWell, I slept, what did I miss on Horizon?
they hit a wall and they have some technicall issues
lol they probably realised they're too far below gpt-5 if they release r2 soon
i slept as well
i tried it on 1 prompt only
tldr is that it's a very small model good at code and svgs but not much else
terrible at math
idk its only good at svg
Dino is keep on generating crazy
yes i know all this
then why ask what you missed ๐ญ
its the same thing, when something became trendy they will finetune like crazy on it
and thats what happened with svg
they are focusing on the wrong thing
@civic flamehave people run benchmarks?
apart gpqa and math 500
and arc agi
and eq bench, creative writing
it got ~67% on aider
the thing that we should focus on is aristotle x1
which puts it just below claude 4 opus
yes i saw this too
isnt it the math model
that's aristotle
that acer was using?
yeah
he said it's definitely not "mathematical superintelligence" as they were selling it
do you have link?
apparently it's only good at very specific kinds of math
Can I use it?
Our system achieves state-of-the-art performance on both benchmarks because we've built systematic verification directly into the reasoning process. Rather than emotional doubt, our models apply procedural self-skepticism to their outputs, making the skepticism reliable and scalable rather than unpredictable.
Other models embody the opposite of scientific thinking. They're trained to sound confident about everything, when science requires knowing when you don't know.
not yet
seems to me they are looking for more fundings
typical small business -> big business
Mathematical Superintelligence at your fingertips
acer and another guy got a testflight invite pretty quickly
im not sure if this model is powering their app or nah
it is
yea that one
kinda curious if you can just ask it something unrelated to math
like inject some random math question inside an actual question
Oh I thought this one is more general
well they said they saturated both benchmarks
