#general
Either give me other comparison numbers to use, or accept the ones I have
estimates project December for Gemini 3
We dont know the paid users of any of them
We do for OpenAI
Give me Anthropics numbers and Google's
earlier or later ?
Sorry, I use the numbers published, not vibes
wen?
ohk. that makes sense
@deep adder I'm waiting
no way.. really? that seems too fast
Competition is fierce
I did, you said "nah" and refused to give any alternatives
are they not planning to release 2.5-002 like they did with 1.5 ?
Another version of their 2.5 series?
No.
"Hero run" next
Demis said so on lex's podcast
New base modle
*model
Roadmap?
Do you work there?
People expect within a month or so
but who knows
I find it hard to believe that.
are we getting gpt 5 tomorrow
Lets see if we get a Gemma model this week or not
Why people think it's coming very soon
Base model training runs take a while to get through post-training and safety testing.
Assuming they started it a few months back, it will take a while.
believe so
Oh, GPT-5 will be strong
gpt-5 has to be super strong
I just hope Gemini 3 has native tool usage
GPT5 isn't great for education, I feel like it's not very good at explaining things, just like o3
For search functionalities and calculations
is the new education mode good? Havent tried it
Gemini having absorbed LearnLM, performs very well in this regard
Did you guys read the LearnLM paper? amazing arena they created
Anyone else videos not getting audio?
What do you mean?
I haven't tried it either. I tried to get GPT5 (Summit) to explain a few concepts
nope link it? I'll read later
When?
will it have improvements in conversation? like, improvements in tool calling won't increase the arena score
genie 3 is good?
Will it perform good on searching? Like o3 does? (and now grok 4 recently)
you say things as if it's a fact.
Yes, if GPT-5 is so strong that Gemini 3 looks weak, they'll probably delay X amount.
"I think they will delay 2 months" is so weird to say
they should launch it before gpt-5 .. which wont happen but otherwise another 2.5 launch would be super awkward because it wont perform better than gpt-5
they could do something like Anthropic. 4.1 release now, state big upgrades coming in ~month
looks like the arena #1 for August is going to GPT5
RL on tool usage is quite hard to figure out. Either it gets too domain specific or can't do long horizons. Generalization is a big issue
makes sense .. I think they have to do something like that.
100%.. easy money on polymarket if you gamble
is GPT-5 out? available to use on LMArena?
I hope they fix the overflattering issue with 2.5pro. wolfstride is nice, blacktooth is even better, but both are better than 2.5pro anyway.
GPT-5 is yet to be released
Arena did have checkpoints (likely) of it for a few days.
current 2.5 pro makes it impossible for me to trust any of its subjective evaluations
If you have actual info, say that. If not, just own that it’s a guess.
I'm not asking you to be right, just to flag what's grounded and what's not.
so it's not released on LMArena or ChatGPT (for paid user)?
tomorrow
Its not released nor announced as a product yet
Sam hypemen exists
And elon purposefully leaked a bunch of grok info
They do happen. often in the AI space.
I’m not assuming no one has info. I’m saying if you're guessing, just say it's a guess.
And if you do have actual info, say that, don’t just imply it.
cool cool? so GPT-5 would be available on LMArena tom or would that take time?
Can anyone tell me how to generate videos like specific tools veo 3 Hailuo 2
I think both reasoning and non reasoning will be improved, but not sure how much. I think hallucinations should be improved, which is pretty good
They usually have it within a few days (or hours) of the release.
OAI said their IMO Gold Model won't release for a few more months
even coooool, excited to use GPT-5
Wait a second, in the arena o3 ranks the highest in search. However it should not have the tools to facilitate search in the API. How does the arena version function?
ok, so you're saying you know for a fact Gemini is delayed 2 months. Good to know
okay so which is the smartest AI model right now? with the most amount of data (it's trained on) and the highest parameters involved in it?
Rankings are different and each LLM specialises in different domains (eg claude for coding, gemini for teaching... so on) but o3 is the most generally intelligent LLM out there.
thanks
The chess tournament is about to start soon!
It seems that GPT-5 will be a big coding jump. It will be interesting to see if:
- it beats Opus 4.1
1b. Anthropic releases a follow-up to beat it
Ideally it does, good for competition
- 100%
- I think we'll have to wait at least three months
Likely? but why would Anthropic release 4.1 a few days before GPT-5 if they had a better model to release after it? Unless said model is extremely large and would therefore be uncompetitive with GPT-5 in pricing therefore unusable for coding
We plan to release substantially larger improvements to our models in the coming weeks.
summit is next level
"In the coming weeks" Quite peculiar. why would you release an inferior model only to render it useless in the coming weeks?
even 2.5 ultra deepthink can't compete with it in coding
maybe to get something out before GPT-5, then see if they can beat GPT-5 after. I don't know their exact strategy
seems clear though
gemini 3 pro is even more of a long shot. I'm really looking forward to seeing what cards google will play next
Why would you rattle and confuse enterprises with no particular gain to be had in doing so?
competition is crazy right now... these companies are releasing better models as soon as they can
Even if we were to assume that this is their current latest model, a few weeks are insufficient to make significant enough improvements as to warrant another release.
Well there is an improvement now. I guess it depends on how you define gain
I believe they've found a better way to validate rewards to scale up RL even further
they have multiple models trainings happening in parallel.. one track could be fine tuning and other track could be updating the base model.
It is faster (much faster) to release a fine-tuned version.
On a few benchmarks it seems to have regressed and the community does not seem too pleased with it either
Oh, yeah that does make sense. A mere finetune of Opus 4 to rid it of its shortcomings rather than any actual training improvement
A "quickfix"
it's fun to watch these improvements from side. But if you are an Engineer working in AI area, it is not fun. Sooo much pressure, it's crazy
Yeah basically all companies are developing multiple models in parallel
Some might be good but not production ready yet for a variety of reasons
looking forward to GPT-5 tomorrow for sure
Is the site working properly right now?
server ded? huh
before GPT5 I would have said kingfall was the undisputed king, but after using summit I think they each have their own strengths. And GPT's strengths are something the current generation of Gemini can absolutely not catch up to
im from Spain !! thank you so much to all LMArena Stage !!!
hey guys is the site down for anyone else?
ye its down
yea down for me too
thanks thought it was just me was working on a project and my chat suddenly vanished
lmarena is down
lol same scared the crap out of me lul
@echo aurora
Yeah that's one of kingfall's strengths. It's more comprehensive and often more humanlike
same
Thank you for the flag. Escalating now.
does anyone have deepseek r2 news
which ai do you think is the best right now? im torn between horizon beta and gemini 2.5 pro
i was literally in the middle of typing.....dangit
Is the website down rn?
down
yes
down for me too
Same
Yup, we are having an outage, team is working on it
I thought it was only me.
what is horizon beta
So sorry everyone!
2.5pro, horizon beta is not considered a reasoning model.
phew. joined discord to check it out, thought I got banned because a (very innocuous) prompt I was submitting somehow broke TOS
opus 4 is still my favorite by far
4-1 tends to get a bit dramatic
especially without reasoning
I'm hoping chats won't be gone after the bug fix
yes
Opus 4 is really great at logic but the prompt has to be precise.
can you elaborate?
what happened guys? what's the problem?
This is a cloaked model provided to the community to gather feedback. This is an improved version of Horizon Alpha
Note: It’s free to use during this testing period, and prompts and completions are logged by the model creator for feedback and training. Run Horizon Beta with API
Just my experience
The LMArena website went down
Like what do you mean precise? Im interested in improving my own prompts
We are looking into an outage atm.
lmarena is live again
Its back up
okay okay, I hope the chats won't be deleted😁
Confirmed
I already cleared the cookies and site data 🙂
yay
Yeeees, my chats still here!!
even though horizon isn't a reasoning model, i think there are areas where it outperforms gemini 2.5 pro
"best" is usually considered in an overall sense
I sometimes have a hard time explaining how I wanted it to happen, it does the work, but the opposite of what I wanted sometimes.
2-5 pro is one of the worst models i've used. I only use it for youtube videos and processing huge context windows at this point.
Ah. I'd avoid negation in your prompts
o3 and gemini 2.5 pro are doing pretty good for me
i remember it being good a while back but upon testing it recently its garbage unless its for huge context tasks as you said
yeah right
deepseek r1 0528 smashes gemini at lua coding
i noticed that yesterday
why would you make it code lua?
thats a first a hearing
for Roblox
grok 4 is the most overrated model btw
webpage returns, thanks.
i get it but why deepseek?
why not claude?
swe bench is a clear indicator
Maybe it's underrated, but i'm not sure when you would ever use it
When you have alternatives yeah?
yeah
because
nah deepseek is good for me
Claude-4-opus-thinking is so much better than opus 4-1 thinking in the claude app 😭
it solved a lua problem before opus 4 and gemini 2.5 pro could yesterday
i tried it on lmarena
Use google ai studio
Well.
🎬 The Video Arena Leaderboard is now live!

14,000+ community votes have ranked the top Text-to-Video and Image-to-Video models.

📝 Text-to-Video rankings:

- #1 Veo3 (audio on)
- #3 Veo3, Veo3-fast
- #5 Hailuo 02 [Standard], Seedance 1.0 pro
- #6 Kling 2.1 Master
- #9 Wan 2.2 A14B
- #11 Pika 2.2, Mochi 1

Big congrats to @GoogleDeepMind, @Hailuo_AI, Bytedance, @Kling_ai, @Alibaba_Wan, @pika_labs, and @genmoai!
for coding it is incredibly annoying yes and for natural text generation (like sounding human and not ai) its sometimes very bad even with system instructions
Yeah
gpt 4o or gemini 2.5 pro? which is better than?
Sometimes i wonder how much data there is on these discord channels that gets lost. Are AI companies using this?
obviously not
no frontier models can perfectly imitate online chat platforms or even realistic conversations without fine tuning
🚀 Introducing Qwen3-4B-Instruct-2507 & Qwen3-4B-Thinking-2507: smarter, sharper, and 256K-ready!

🔹 Instruct: Boosted general skills, multilingual coverage, and long-context instruction following.

🔹 Thinking: Advanced reasoning in logic, math, science & code, built for expert-level tasks.

Both models are more aligned, more capable, and more context-aware.

Huggingface:
huggingface.co/Qwen/Qwen3-4B-Instruct-2507
huggingface.co/Qwen/Qwen3-4B-Thinking-2507
ModelScope:
modelscope.cn/models/Qwen/Qwen3-4B-Instruct-2507
modelscope.cn/models/Qwen/Qwen3-4B-Thinking-2507
but chatgpt is far left
is this just good at math? Language stuff I feel like would be more interesting at first, for the small models
is there any solution to cancel when generating? it's so frustrating
why does the audio not work
guys help huhu is there any solution to cancel when generating? it's so frustrating😔
Gemini 2.5 pro. Gpt 4o aint even competition for it. Easy diff win
nice video leaderboards
what are you talking about 💔
i really think that seedance 1.0 pro should be number one
it beats all models in i2v
^
the only video model that has sound iirc
Does the audio creating feature from veo 3 work
lmarena you can custom prompt the models. In AA you just watch someone else prompt I guess
So more diverse style
¯_(ツ)_/¯
which website? AA or lmarena?
yea if it chose the veo-3-audio
its actually tomorrow omg
Thinking we should have a watch party
So for your timezone stream will be <t:1754586000:f>
1 PM EDT
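For anyone reading this outside Discord: `<t:1754586000:f>` is a Discord timestamp tag that wraps a Unix time, and the client renders it in each reader's local timezone. A minimal sketch of decoding it by hand follows. The helper name `discord_timestamp_to_datetime` is made up for illustration, and the fixed UTC-7 offset for PDT is an assumption that only holds during daylight saving time.

```python
from datetime import datetime, timedelta, timezone


def discord_timestamp_to_datetime(tag: str) -> datetime:
    """Parse a Discord tag like '<t:1754586000:f>' into an aware UTC datetime."""
    parts = tag.strip("<>").split(":")  # ['t', '1754586000', 'f']
    if len(parts) < 2 or parts[0] != "t":
        raise ValueError(f"not a Discord timestamp tag: {tag!r}")
    return datetime.fromtimestamp(int(parts[1]), tz=timezone.utc)


# Assumption: PDT modelled as a fixed UTC-7 offset (valid in summer only).
PDT = timezone(timedelta(hours=-7), name="PDT")

stream_utc = discord_timestamp_to_datetime("<t:1754586000:f>")
stream_pdt = stream_utc.astimezone(PDT)

print(stream_utc.isoformat())  # 2025-08-07T17:00:00+00:00
print(stream_pdt.isoformat())  # 2025-08-07T10:00:00-07:00
```

The trailing `f` only selects the display style (short date and time), so it does not affect the parsed value.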
Do you guys think GPT-5 will beat Gemini 2.5 Pro, Grok 4 Heavy and o3-pro across all benchmarks?
I think so, but more important is the question if the model is actually good, or just benchmaxxing
Confirmed
Isn't benchmaxing the same meaning as being a good model?
and me
Probably? But other companies have releases upcoming too
!!!
is this the gpt 5 announcement
Like who?
No, benchmaxing means it was trained on benchmarks, but isn't necessarily good at other tasks
At least that's how I understand it
LIVE5TREAM
Gemini and Claude both have releases soon
A model which is good at different real world use cases, such as finding bugs outside of public benchmarks, writing clean code for tasks it hadn't been trained on, and finding good solutions to problems
Gemini 3.0?
yea
When?
Live5tream
yeah. No date, but it's soon
Oh, i didnt notice that
is it a typo or are they referring to gpt 5
Anyone know if zenith was gpt-5 yet?
Wish I got more of a chance to try zenith before it got removed
I think it's for Chatgpt o4 and just a typo
All I know is that there will be a great video model coming by the end of this month
They wrote it this way intentionally
But I cannot say by who yet.
They're referring to Five Guys
Live5tream - referring to GPT 5
xAI
no way. It's not like if it was a typo they could delete it and tweet again
Nope
@echo aurora Can you help me decipher what is wrong with a message I am trying to use as a benchmark for models. I get "something went wrong while generating the response". I'd prefer if it was sent privately, not sure why this server doesnt have a ticket system yet
@echo aurora What do you think about GPT-5?
Yeah I can take a look. My DMs are open. Note we do have @oak python available too.
in grok 4 not topping lmarena way way before the release
no you didnt
stop lying
omg
...
what else did i predict
hmmm
lets see
what
no
Sent!
stop overthinking it. It's clearly a typo. Tweets can't be deleted
wait what?
@echo aurora 4.1 direct when
gpt-S
?
It obviously is gpt5. That is obvious even without any hint
No way
Impossible
This isn't a hint to you?
are you serious
i could have never thought of that
i thought they were hinting grok 5
Listen to Claudio 4 1 from now on he's not there yesterday I was talking to him and he's not showing up anymore I want
Guys, they removed it, they remove it, why can they put it back?
@patent aspen gemini 3 soon or nah?
maybe friend
yes
they have to respond to gpt -5
Thank you very much friend for the information mentioned above about the site
yes
how are you?
im chilling bud
Did they release it before Anthropic? hahaha
yo for the video arena, can they make it so that u can choose which model u wanna use
It's possible, be sure to share feedback in #bot-feedback
I'm just a user here! What are the canonical movie scenes "who are you"? - what should I say "Jesus"? , kiss my bro I'm just another one wanting to contribute and be your friend
@torn mantle
ok sorry
no need to apologize, it's all good.
how are you @torn mantle
How do you eat today?
with my mouth, as usual
Wow, I'm missing the notification feature. It should already be showing who wants to chat with me?
Lets try to keep convo related to AI please.
just a quick question to the ppl that run lmarena. how do u guys freely give access to premium ai models on the webpage, just a question.
bro what
are you an AI
its a bot so yeah an ai i think
no bro it's me jajjjajajaj
I guess not or will it?
That feeling when you go see see your generated videos only to come to the scene of people already voted and models are revealed 😭😞
We are putting a lot of thought into how to best do the model reveal. If you have thoughts on likes/dislikes don’t hesitate to share with us in #bot-feedback
Sus
ive already applied as janitor at the pentagon
Well they are getting smth in return
Sigh more bs from scam altman kek. GPT5 is not AGI lol
"smarter than the smartest person" bruh
These claims get more outlandish every time
smarter than the smarter smarties
is claude opus 4.1 not available?
i cant see it
claude is expensive as ssssss
they dont want us to get it for free so they took it off direct and moved it to battle only
oh thanks
isnt it the same price as opus 4
in the api probably
but anthropic has to capitalize on its release
it wont do it by making free for testing
maybe in a week or two
Do you guys think we might get gpt-image-2 tomorrow too?
or at least an update to gpt-image-1?
or am I huffing copium?
are you a bot
are you using an agent to communicate with us?
like an agentic browser like comet
probably just non native english
@torn mantle I'm not anymore, this will be under construction soon as soon as the current project is finished.
Several people have asked me this
I think it's funny
How can I take away the restrictions
something big is coming
Like what
remember he is building an AI device separately from OAI
I agree
Staff at OpenAI robotics tweeted about GPT 5 maybe. what?
How could it be robotics related???
I'm not exactly talking about current slang because my course is outdated.
I forgot, OAI acquired it, I'm wrong
Oh yea, I got it: a PaLLM Pilot running GPT-OSS 20B would indeed be smarter than the smartest people who voluntarily follow Elon on Twixter.
its the new model from OAI
is that what i suggested?
No but it’s a joke.
i see
hmm
mini or nano
I'm not sure, but will flag to the team 
Dude there is no Veo 3
dang new models but I ran out of daily generations a few hours ago ehehe
oh no! they will be there tomorrow! would also note I was a bit late to the announcement so you may have been using some of them.
it would probably get destroyed there
They are coward
which discord server tells you about newly added ai models
no
very interesting author gemini
why is it calling itself gpt 4
hell yes
or i hope so
thats crazy
@hunoematic this is the public set and not the actual benchmark. its benchmark score will be much lower.
does everyone just trust everyone?
I think it's probably safe to ignore it, just thought it was interesting
why is claude 4 sonnet lower than claude 3.7?
Claude was only ever for coding
what is it?
Anthropic said they have some big upgrades in the coming weeks. We will see
btw this just looks worse overall why
Anthropic has big updates in a few weeks. As long as they beat GPT-5, they'll be fine. If not, it will be very difficult for them
does this edit look realistic?
Needs the frosted blur
You can copy the css directly from the site
the background is official
i literally just changed the path of the jpg, to gpt-5
i guess they just change their styles often
9/10 on a public set that has been available for a while isn't all that impressive, a lot of the models fall down when tested against the private set. no expectation it should stay at 90%, it'll be ~70 with my testing
i mean, i only changed the image, nothing else, so any css would carry over
I'm giving it ~70%, which is by no means underestimation, its really strong from my testing, agreed
Excited for it
gulp
lol
hello frends
this is my edit, using their official bg images
very happy to be here with you ❤️
glad to hear it!
Can someone not generate video with voice over here?
people talk a lot about wanting x and y to help with coding, all i need it to do is to help me troubleshoot
i'm glad my edit was this good
no, tomorrow is the livestream
50% of people wouldnt be able to tell the difference or care
For sure, if their router model is good enough to identify hard questions
Oh my god I have to send it again
buddy there's literally a model with the slug gpt-5-auto
on the api it isn't a router, but in chatGPT there will be a router (with the ability to force reasoning for subscribers)
@SpencerKSchiff What you outlined is the plan 👍 May start with a little routing behind the scenes to hide some lingering complexity, but mostly around the edges. The plan is to get the core model to do quick responses, tools, and longer reasoning.
Do you know what core model means?
????
The info that gpt 5 is a unified model is older than that
I'm?
Lol
No it's not
It's not just routing but there is a router
Yes you're right
Yes brian
Exactly
Yes there is a router and there are new models too
I never said otherwise
But people will receive for the most requests a gpt 4o router
There is lol
When I say gpt 4o , im talking about gpt 4o level model
You don't know me bro
craig will gpt 5 be the SoTA when it releases
Lmao
I think in general you have to assume that any high volume all in one chat app is going to have some routing involved, although the line between routing and not routing is going to be blurry because there will also be a lot of shared state
I never said that
Yeah I know. Most people don't think in shades of gray
Its not
It just the 4o model
The thinking is a summary bug in the frontend
Idk I use the api most of the time
Jules is out of beta
yeah lets just wait for the release
they have tested it
@hunoematic @btibor91 @patience_cave @kimmonismus @slow_developer @RafaCrackYT @koltregaskes @chetaslua this matches the 9/10 score it got with GPT-5 earlier today
in my run, the last question took it down
for me it thought for multiple minutes on most of them 🤔
Gemini pro 2.5 VS GLM 4.5 coding capacities test 🔥⚔️
Prompt:
Create a 3D educational HTML game: A metro train moves on visible tracks across a green field under a blue sky, passing through 10 stations. At each station, a quiz question appears with multiple-choice buttons. The train only continues to the next station if the player answers correctly; otherwise, it stops until the correct answer is selected. Include background buildings, animated grass, and a realistic sky. Add UI buttons for controlling the metro's movement (Start, Stop, Reset). The scene should be playful, colorful, and child-friendly, with smooth transitions and immersive 3D visuals.
1st one is for Gemini pro 2.5
https://g.co/gemini/share/f4e1c8d337a3
Gemini pro failed it and made tons of errors so I couldn't continue the design with it. You can see that from the first prompt it made a bug on the buttons that it couldn't fix
2nd one is for GLM4.5
https://chat.z.ai/s/5d97a31b-0071-445e-8b7d-5f1f711388f4
I tried to make the prompts harder and harder each time to make it produce one error but ...
Writing 1638 lines with a very small mistake that it corrected perfectly
Started to bug after this convo starting from code line 1694..
just use normal simplebench questions and use gpt5 on ms copilot
i messed up the comments and bottom posts time but otherwise i think this looks pretty realistic
can we do text-to-video on lmarena website?
No, it's currently only available through this server.
nooooooooo
how do u guys know if it's fake?
because they said it was tested on only the public set
what are the odds that o4 is integrated into gpt5
yes
says rumor for a reason
🙂
18 Hours left until GPT-5
How do you use gpt 5 on copilot
hi

the shared results are too low compared to those shared by open ai
The api dont work good now
math arena got 91% on aime 25 in local run, much better than artificial analysis
whats this
we love lmarena 🙏
Any news on gpt5?
Hello
didnt think i would ever say this
but i think gpt oss 120 actually cooked me a good script
at least better than 2.5 pro
is it just claude with the issue with randomly thinking forever or other models are the same?
NOOOOOO
👀
its incredible
Literally destroying all my questions after like 3 seconds thinking
o3 pro cant get these right
this depends on the stack sizes of the table
if you are biggest stack with decent margin, you can push much more
in any other situation you gotta be very tight
especially as a mid stack size
none would be pushable in fact im pretty sure
of course you can mix your strategy and min open aswell
which often will be better
as a mid stack you could only push AA probably, barely KK even
but you can open with more hands for 2bb
of course the prize table matters too
if it is very top heavy, you get to push more
really nice
GPT5 will be SOTA by a large margin. For sure
I wonder how long until GPT6 though. The 2 year cadence no longer works they need to shorten the gap between releases
Hopefully they'll make the API available asap so we can test it
This will be SOTA for about 6 months i think
i love horizon beta
do yall think gpt 5 will be better than claude 4.1 opus at coding?
yes
claude 4.1 is literally opus 4 but 2% better
what is the maximum amount of videos i can ask in arena 1?
i think its 8
a day?
Technically not a false statement
how do i get sound in the video
get lucky
you might get veo 3
but its not guaranteed
change of providers, now it's open ai themselves?
were they using the openrouter api before?
our livestream tomorrow at 10 am PDT will be longer than usual, around an hour.
we have a lot to show and hope you can find the time to watch!
google has to lock in
helo
scam altman better announce at least an update to the o models if not gpt5...
ya
hello
@echo aurora

Can I talk to you privately?
sure
hour long livestream, then straight to TBPN after... we eating good tomorrow
https://huggingface.co/huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated All those weeks of safety testing down the drain... 
They've actually been removing them off hugging face. There was other ones made that got removed
Are we sure gpt5’s being released today?
YES, it is so obvious
It will i'm sure. Imagine if it doesn't it will be big shock.
I hope it doesnt
That doesnt mean anything if it was done by third party running their public dataset
classic data leakage fallacy
mmmm
found the patience cave alt
Who?
A new model architecture from Sapient Intelligence -- not an LLM / chatbot, it's built to perform reasoning and planning in latent space, and it performs very well on ARC-AGI. Not SoTA, but incredibly well for its minuscule size. As I understand it, HRM works by having two 'recurrent' transformer blocks, one fast and cheap, the other slower and more competent, and the 'high' one oversees the progress of the 'low' one and steers it. It's a novel and very interesting approach.
fireworks
Idk how the model is so fried
Really don't get it. Like the only way you get performance like this is if you tried to
Bruh
@leaden palm hey, i am not able to create videos in the video arena. It's saying- the application did not respond
It's working now! You did something?
It wasn't working before. I have been trying for 10 minutes. Thanks 🙌🏻
hello word 🙂
do I see a Polish name?
Can someone explain why this prompt violates the TOU?
I want to ask if the code "remove style control" is this?
https://colab.research.google.com/drive/19VPOril2FjCX34lJoo7qn4r6adgKLioY?ref=news.lmarena.ai
Does this mean adding some features of the format to train the BT model to calculate the elo score?
Please answer. Thank you.
this is linked from there so yes https://news.lmarena.ai/style-control/
It may have been updated since though
Hey!
So is this code “remove style control”? I hope to receive your reply. Thank you
What is weird is how they got these benchmark results in a model card... Seems like there's something missing from the version we get to use lol
It's the opposite of that. It's adding style control which is now default view. By doing "remove style control" you are undoing those changes
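To make the "adding style control" idea concrete, here is a toy sketch and not the code from the linked notebook. It assumes the usual Bradley-Terry setup: the probability that the left model wins is a logistic function of the strength gap, and style control adds one extra coefficient on a style difference (for example, a normalized response-length gap). The battles, the model names `m0`/`m1`, and the function `fit_bradley_terry` are all hypothetical.

```python
import math


def fit_bradley_terry(battles, models, use_style=True, lr=0.5, steps=3000):
    """Fit P(left wins) = sigmoid(strength[left] - strength[right] + beta * style_diff)
    with plain gradient ascent on the mean log-likelihood."""
    strength = {m: 0.0 for m in models}
    beta = 0.0
    n = len(battles)
    for _ in range(steps):
        grad = {m: 0.0 for m in models}
        grad_beta = 0.0
        for left, right, left_won, style_diff in battles:
            s = style_diff if use_style else 0.0  # "remove style control" = ignore s
            logit = strength[left] - strength[right] + beta * s
            p_left = 1.0 / (1.0 + math.exp(-logit))
            err = left_won - p_left
            grad[left] += err
            grad[right] -= err
            grad_beta += err * s
        for m in models:
            strength[m] += lr * grad[m] / n
        beta += lr * grad_beta / n
    return strength, beta


# Synthetic battles: (left, right, left_won, style_diff). m1 is usually the
# longer answer (style_diff = +1) and wins more often when it is longer.
battles = (
    [("m1", "m0", 1, 1.0)] * 4 + [("m1", "m0", 0, 1.0)] * 2
    + [("m1", "m0", 1, -1.0)] * 1 + [("m1", "m0", 0, -1.0)] * 1
)
models = ["m0", "m1"]

raw_strength, _ = fit_bradley_terry(battles, models, use_style=False)
ctrl_strength, beta = fit_bradley_terry(battles, models, use_style=True)
raw_gap = raw_strength["m1"] - raw_strength["m0"]
ctrl_gap = ctrl_strength["m1"] - ctrl_strength["m0"]

print(f"gap without style control: {raw_gap:.3f}")
print(f"gap with style control:    {ctrl_gap:.3f} (style coefficient {beta:.3f})")
```

In this made-up data, part of m1's lead is absorbed by the style coefficient, so its gap over m0 shrinks once style is controlled for. Converting strengths to an Elo-like number is a separate rescaling step and is left out here.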
thanks bro
I'm also very curious because the volume of data is just too large
can someone convince me why OSS is worth running on my machine? this is soooo dogshit
also, i gave it another shot just to be sure. it gave me a wrong answer
Just use qwen3
I have found OSS to be extremely underwhelming
I know I can run qwen3, I'm just wondering why the hype around OSS if it's literally unusable
am i doing anything wrong? is ollama not really compatible with it or something?
Btw, what program/website is that?
looks sleek
for LLMs
Not sure.
It's called Ollama, you can run LLMs locally with it
I choose GLM-4.5
It's very sleek, but almost no configurations
Yeah but that interface? Isn't Ollama CLI based and pulls a model once it's downloaded? Sorry, I am a bit of a noob
I've been using LM Studio
I used it because it allows for easy web search implementation which I couldn't get in LM Studio, but maybe it's fixable somehow
It's just based on Ollama CLI, but the app is different. go to ollama.com
ahh, I see
thanks a lot
For sure. Let me know if you know how to enable web tooling in LM Studio
ahh it was released recently
no wonder I haven't known about the app
lol
I find the answers to be much slower and worse compared to LM Studio for some reason
I hope Ollama will have some agentic automation too in the future
would be real nice
They seem to be really far from this, but yeah would be amazing
Hey guys whats the best way to make money with ai
ask an LLM, the one with the best answer wins
this is like 2.5Flash, except way more extreme
Goes to show the edge case of those benchmarks being the least effective... Since their only performance line is those scores dropping. And those benchmarks are not good enough to control model size
Niche tests are always the best tbh
I write the most absurd stuff to test the reasoning
with all kinds of models
lol
I find LM Arena to be the only benchmark I care about
The ranking is pretty much how I feel usually about the models. Human vetting is still king
Nah it's just a different area of evals. It's not substitute
And like, do you think 4o is better than all those models below it...?
What's this leaderboard? text?
I definitely do think it's better than them at text
I still find Kimi K2 the best at emotions and general EQ. 4o is too sycophantic along with gemini 2.5 models
Im thinking of making an ai model and sell her pics on OF
what
Yassin, go away bro
4o-latest is nowhere near Opus4, Grok4 or even R1 in terms of text performance
So explain why people think it's better?
And define text performance. I think it's a mixture of accuracy, style, speed etc
They prefer its writing style. This eval is measuring human preference elo. Not performance/accuracy
If it wouldn't be accurate, people would choose the accurate answer way before how stylized it is
It's also good at convincing - but that's also not an indicator of performance 👀
Human preference is inherently subjective
I'm getting hallucinations from all text based LLMs... it's not unique to 4o
and is often not factual
I find it much nicer if a model could say that it does not know the answer and guides the user to use other sources
People have no clue what the correct answer is, both look "similar enough" but one response looks more convincing... that's how this works
well not always, but many of that is this.
People are varied and not as dumb as you make them to be
I agree that style is a factor and that people prefer 4o style, but I just don't think it's as black and white as you make it sound
It kinda is though because it's what this benchmark is. It's not trying to be something else. It's measuring human preference elo
yup, and as I said, human prefer not just style, but also accuracy, speed, etc
For me when I am talking in my native language on LMArena to the models, I check for grammar since many can have clumsy wordings at times (applies mostly to open-source models / small models). It's quite funny.
Answer A is totally wrong, answer B is less wrong. I prefer B.
Not every question I ask a text based LLM is something I don't know anything about... sometimes the test is asking it about niche subjects i DO know about.
speed they are equalising so mostly not a factor, accuracy... Once again that's way way less of a factor and has less factual weight than in conventional benchmarks. Even if the model output is completely wrong, it still can win the user over to get the vote with sycophancy, style, manipulation (negative strong word but there can be some of that in a sense of it abusing the text that people generally like seeing) or whatever else... 👀
I feel like I'm repeating myself to be honest
It's okay to not agree 🙂
A model can sound as convincing as it wishes... it's easy to spot some hallucinations. Also if you get 2 completely different answers on a subject you know nothing about, picking based on perceived accuracy without you knowing the actual answer is lazy voting
When you are matched with LLMs from the same tier, often enough you can be swayed against your best intentions by the factors I already mentioned.
That is hilarious
Right, that's why we have thousands of votes and not just 14
It's only natural and what happens kinda by design... when everything revolves around *preference*
Truth is dynamic and subjective 🙂
yeah and even if we don't see them as "poor judges", they will never be as good at assessing accuracy as curated tests with verified answers by industry experts are.
I mean we can go deep into an Adderall debate about life after death, existence of god, belief systems, crypto etc
There are ongoing debates about these subjects with no hard 'truths'
No one can convince me there is or isn't life after death - we just don't know
Ok change of subject, I wonder why OpenAI ditched their yap score for gpt-oss... 
openai couldn't even release a half usable open weights model, so trash
What Is The Browser You Love The Most
1
3
I agree that the model seems trash mostly, but technically this says differently lmao
That's why I keep saying benchmarks suck.. feels disconnected from how they act in real life. Also show me the phone that runs OSS lol
Can lmarena pay more attention to search arena improvements. This area is often neglected
Can anyone help me? I posted a prompt in the video arena and more than 30 mins later it's still showing as generating. I generated 2 videos before it and one more after the stuck prompt, and those were fine with no errors. [I asked for a 5 min video] Is that causing the error or the delay?
if gpt5 comes today, how can i access it via sub in europe?
do i need a non-eu cc, or just vpn with german cc?
or vpn only works if i use another one like paypal?
why do you expect usage issues in Europe?
by going openai.com. Why would Europe be restricted?
omg do u all live in usa behind a rock in a cave
eu has some ai regulations
I am from Europe
no idea how easy/hard it is to follow the eu rules
Well I meant LLMs
It wasn't though?
these are llms too
I was using it on release
then u dont eu
I EU
I would know
lol
@willow grail have you ever used OpenAI playground?
Models there become available as soon as in the US, for the most part
too much money
well... But it's the way to go if you want to test new models.
Playground and websites like openrouter are technically API as well. You don't need to know code to use them tbh
i mean the money
If you want to simply test a model it is not gonna be expensive usually...
how can we choose the model ?
carefully
meaning? I want to choose veo3 and runway ?
is gemini 3 releasing today too?
what browser
Chrome
you should use edge canary
I was using it normally and suddenly my session got deleted
I wasn't being serious lol. You can't choose models for video gen. You can't choose them for any battle, only direct chat but not for all
it has chrome extension support forked from kiwi browser
Is it good?
yes
I got this message suddenly
it has chrome extension support which is rare
Can u try to access lm arena?
So it's the same for all
yep
Is it a browser
yes, fastest
Are u into genai?
Chromium fork for Linux, Windows, MacOS, Android, and Raspberry Pi named after radioactive element No. 90.
https://alpha.lmarena.ai/ works :)
@prime mulch
How is it?
good, would work as wallpaper for phones
Yea i created this
i cant wait for gpt-5 bro it's releasing today
And i have a little wallpaper channel but it has no views, i'm waiting for growth
What about this
This is my masterpiece
What’s the latest ai model available on lmarena?
That will change the ai era
isnt this just a countdown to livestream?
yes buuuut
does not seem official by any means
it aint
my friend generated
😭
"LIVE 5 STREAM"
it doesn't
ok
cat?
lol yes
wowwwww
hi
so i stand right. there wont be gpt5 for europeans.
api is too expensive.
i am right, you lose.
as a european i found a site with free opus 4.1 within minutes of its release
not interested in boring stuff
such sites always have side effects
QOL issues.
girl?
u ok girl?
1h?
yeah
pls visit doctor
lol
wait where does it say 1 hr
also asura is a bad server from THE ISLE
they do KOS kill on sight all the time
so annoying
It's not expensive. Are you like from Belarus or smth? That's not EU
Qwen image gen is pretty good
Gpt 5 is almost out
uhm o3 api is... XD
or gemini 2.5 pro
or opus 4
that's not expensive... I can agree that o1-pro was expensive, but not current price of o3, that's cheap lol
hell yeaaaaaaaaaah
is it a girl?
ITS A BODYBUILDER
very
Yes.
via api
Then Altman says "This is the way to feel the AGI"
Gpt 5 will change the world views about ai
The only thing it will change is my wallet
The hallucination levels need to get solved before we can develop any AGI
y e s.
All HAIL AGI
gggguys just make agi by making it self train on data from internet!!!!!!!....1!!!
People don't realise how powerful AI will get with agi
ill just play rail route. wait for gpt5. be disappointed that it still cant make video games.
and continue playing RAIL ROUTE
Do you think gpt 5 will be AGI
no
Nah
I don't know if that will happen with the LLM architecture though when there are many other ones that are being developed that are far more efficient
no
It will be one of the powerful llms, not agi
Let's not overhype i hope it doesn't end up like gpt oss which is absolute garbage
reread this. there is no hype there
RE READ THIS
i wonder if they'll have a reasoning mode for it right away
Hell yes
Yea it's a possibility
Prompt: generate video with both text about Russia. Duration 8 second
I think sam said it will be a hybrid model which can work with reasoning and without it (GPT5)
wrong channel
they probably already have it ready but wont release it right away so they can spread out the hype
or something idk
Let's just hope it is a good improvement over gpt4o
it is
100%
have you seen the leak
Yeah but I'd hate it to be as sycophantic
theres been like 20 leaks already
I have seen some
I like to keep it as a surprise for myself
cat?
yes yes
URL NOW
good boy pat pat
😭
i am literally complimenting you?!
😡😡😡😡11!!!!!11!11!
@willow grail are you Belarusian
hell no
railroutian
i am croatian
crowatia
GM
the feel when you realize how many more unlocked stations there are on the map https://i.imgur.com/WiHsvMO.jpeg
If GPT-5 actually scores >65 on ARC-AGI 2, that would be significant: https://pbs.twimg.com/media/GxthgsDXgAAoW6F?format=jpg&name=large
oh u think its only 65% or so? ...... i hope its at human baseline
only 65% ?
i am soon dead. i want immortality tech now. i have no time
i am 32
i dont have time for 1% per year
Competition will make this speed up, esp with China coming out with real good models
and research
china so far only delivers vomit . from their bad robots who they send to various events and act like its autonomous but its rather just a simple animation baked it to their bad text models nobody uses
???????
china ontop
top of what?
everything
Please don't feed the trolls.
how many people die per 1000 residents because china quality is trash?
buildings are made of paper, trains crash daily, bridges crashes, fake robots just doing baked in animation
Be mad, troll
iphone made in china
what is wrong about what i said?
difference is, the company isnt china based
china based companies and ceos will create bad products
if i am a china man i wont care about investing into the product, it can break down next day. like a big 100 floor building
if i am any other country man i will care about quality
They ship better models than gpt-oss at least
uhm its a fact that in china much more things break down than in usa or europe
and people here telling me china is the chad
🐒
why so rude against china town
why u china propaganda?
no propaganda china ontop
china #1 country
china make new technology everyday
nono india number one
india bad
racist
Lets keep conversation focussed on AI and respectful please.
lol
2 hours
3 hours???????
INDIA
gpt 5 is out in 3 hours
SAMA said that the livestream will be longer than usual at 1 hour long
yay
quite exciting
yeah cause agentic
@willow grail are you croatian
yes
I think china will release a better version of gpt 5 as open source after some months
the request that is useless at all
https://discord.com/channels/1340554757349179412/1403017053031628913
are you serious david
I think it's unlikely China will manufacture its own chips able to compete with current best anytime soon tbh
will we have gpt-5 pro high in the arena
Reminder that we are doing a little watch party for the livestream: https://discord.com/events/1340554757349179412/1402720955192705176
excellent
