#general

reef pawn
#

LG is cooking

neon idol
pure falcon
#

It makes me wonder what kind of person should use battle. Power user? Early adopter? Average person?

Right now, lots of people who may not be able to afford GPT pro, are free-riding direct chats to get gpt-5 high access.

You could maybe gate + cap direct chats at like, idk, 15 messages and force them to respond to a battle before continuing.

I do wonder how the average battle user differs from the average direct chat user. My guess is direct chat people are younger, have less income, etc. Wonder what effect that ultimately has on the leaderboards / stats / rankings
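
The gate-and-cap idea above could be sketched roughly like this. It is only an illustration: the `Session` class, its method names, and the cap of 15 are hypothetical and taken from the "like, idk, 15 messages" guess, not from anything LMArena actually runs.

```python
from dataclasses import dataclass

# Assumed cap, taken from the "15 messages" guess in the message above.
DIRECT_CHAT_CAP = 15


@dataclass
class Session:
    """Hypothetical per-user session state (not LMArena's real data model)."""

    direct_messages_since_vote: int = 0

    def can_send_direct_message(self) -> bool:
        # Direct chat stays open until the cap is reached.
        return self.direct_messages_since_vote < DIRECT_CHAT_CAP

    def record_direct_message(self) -> None:
        # Refuse the message once the cap is hit; the user must vote first.
        if not self.can_send_direct_message():
            raise PermissionError("vote in a battle to keep chatting")
        self.direct_messages_since_vote += 1

    def record_battle_vote(self) -> None:
        # A completed battle vote resets the direct chat allowance.
        self.direct_messages_since_vote = 0
```

Without accounts, a counter like this would have to live in the browser or be keyed on something like an IP address, so it would be easy to reset.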

agile bloom
#

gpt-5-search model can search the web?

neon idol
pure falcon
#

Though tbh, the style control thing confuses me. LMArena is all about moving away from “objective” benchmarks to the ones that matter most: user preferences.

If users like different styles and formatting, then why control for it? The whole point of this is to find out what users like!! Defeats the whole purpose imo

ocean vortex
neon idol
#

@echo aurora What is this new ai image generator called nano banana that is in LMarena

jade egret
#

😭

ocean vortex
#

OLED AI

obsidian cargo
ocean vortex
#

😭

pure falcon
echo aurora
ocean vortex
pure falcon
hollow imp
#

Has anyone tried minimax? How is it

ocean vortex
pure falcon
#

Haha, are we sure about that?? You just said that early arena testing is made for enthusiasts lol, which i do agree with. Enthusiasts aren’t average users though.

One thing LMArena should roll out, is user accounts. Allowing accounts on the platform would go a long way to help them get the data they need, in order to answer that question

stray aspen
#

however they have great tts models

#

and video models

hollow imp
hollow imp
# stray aspen it sucks

So according to you the only things which do not suck are 4.1 opus, 2.5 pro, grok 4, some specific versions of gpt 5

ocean vortex
#

Something like chatgpt started with like only the most die hard nerds in its infancy, look at it now... 👀

#

They didn't do anything to push those away and make their audience "more balanced" at the time and that was the right move

jade egret
#

when gemini 3 ):

neon idol
#

Fr

ocean vortex
#

Not as good as gpt5 😇

#

Also interesting fact... it doesn't even have 1% of chatgpt market share lol

stray aspen
#

grok 4 and gpt-5 are good

ocean vortex
#

It has... you are very wrong actually. It's more than 20%

#

Gemini is quite substantial

#

Mostly thanks to Android I think

#

They integrated Gemini into Android itself quite well. Like they have a properly working voice assistant based on Gemini. They have what Apple should have had by now with Siri lol

zinc ore
#

20% figure is Gemini app

ocean vortex
#

Not really, it's still Gemini and it's super easy to direct users directly to Gemini app from there

zinc ore
#

They don't count the AI overviews stuff in search in those figures

ocean vortex
#

I've tried those local models and they are crap. No 2 ways about it

zinc ore
#

Their AI overviews is 2b users per month, and I forget what the AI mode one is (but it's been growing at a decent tick too)

ocean vortex
#

Even their cloud model is underwhelming

loud leaf
#

does current gpt-5-high in arena correspond to GPT-5-Auto in the client? is GPT-5-Pro distinct / to be added to arena at some point?

ocean vortex
#

2.5 Flash would destroy that cloud model without trying

#

Probably even Flash-Lite has a chance

#

Well there are no useful on-device tasks that wouldn't use chatgpt...

#

with iOS

#

It can do like notification summaries. But any at all model can do that and it takes no time at all to make API request for that lol

#

then you don't have notifications either

#

lol

#

Ones that would benefit from summarisation usually do

#

it's just such a non-thing. You need to be actively looking for edge cases to find any benefits...

#

It also kills your battery much more than API

leaden palm
#

do you guys think o4-mini and gpt-5-mini have less world knowledge than gpt-4.1?

eternal niche
#

btw gpt5 sucks

pure falcon
#

I hate OpenAI and will never give scam Altman a dime of my money, but GPT-5 high is the nuts

ornate agate
#

I thought that only works with “juice” on the api?

tight oriole
#

any ETA on when video battle goes live?

pure falcon
# eternal niche well

honestly can you stop spamming this? You’re not accomplishing anything except looking like an idiot

pure falcon
#

Anywho

eternal niche
#

cope

pure falcon
#

One day Claude will reign supreme in all areas

#

Besides coding

#

Claude is the champ

#

agreed (reluctantly)

eternal niche
#

gemini 3 - SOTA

#

just accept it

white hatch
#

magnum opus?

wicked root
#

Someone here disagreed with you

ocean vortex
#

or gpt5 if you want to stick to reality 👀

golden ocean
#

Russian state news

leaden palm
#

uhhh

#

i need to scroll up

#

yeah can you stop @eternal niche lol

ocean vortex
leaden palm
#

14 times

eternal niche
leaden palm
ocean vortex
#

I love it how he felt the need to explain what SOTA is

#

yeah act in good faith Russian spy.

@eternal niche

eternal niche
leaden palm
#

that's not good either, exchanging blows doesn't usually get you anywhere

#

i didn't mean to ignore it - i'm not always watching the chat

leaden palm
ocean vortex
ornate agate
leaden palm
ocean vortex
#

well that is more "juice". No parallel / pro, but those requests are limited even for pro sub anyways

white hatch
#

tasty mhmhmmhmmm

#

lol

ocean vortex
#

Some tasks probably yes... not enough reasoning effort is not gonna arrive at the answer even if you run it 10+ times

upbeat pasture
#

@echo aurora Is the data for LMArena going to be released (I sent an email to lmarena.ai@gmail.com about it)

echo aurora
wicked root
#

@eternal niche no way this guy’s a legend

gentle plinth
#

i think gemini deep-think might be better if we can trust the benchmarks, but i cant try it and havent heard much about it bc its so expensive and not in the api

#

grok4 heavy or the "normal" 4

#

heavy is more like deepthink

#

i dont think they have any of these parallel thinking models in the arena

#

i tried it with rust code, and it wasnt able to write compilable code (unlike gpt5-high)

ornate agate
#

What kind of debugging?

rain mulch
#

hiii

#

@amber warren yo

#

😼

ornate agate
#

Hmm. I think for complex code debugging in c++ where I assume some of it is segfault or unexpected stuff, most AI is not very good at this

#

I would actually ask a lot of them simultaneously

amber warren
#

i am normie in here

wicked root
#

In coding? Gemini absolutely brute forces

#

Well, gemini pro

#

Flash is utterly useless for coding

stray aspen
amber warren
cedar tide
#

Toad from ?

ornate agate
#

It might find the bugs, but I would also try other AIs if it doesn’t.

cedar tide
amber warren
#

toad

eternal niche
ornate agate
#

It’s a good test but what harness/platform are you using?

barren prairie
cedar tide
#

Msft

ornate agate
#

If you want the AI to code it for you it’s probably worth trying Claude code and Qwen code.

barren prairie
#

Any recommended models for python??

eternal niche
#

gemini 2.5 pro SOTA

ornate agate
#

I think it’s good enough to be worth trying.

eternal niche
#

i forgive you

misty vault
#

are you ok?

ornate agate
#

Qwen code is free if you can’t pay for Claude code

misty vault
#

This isn't the real @ paws

ornate agate
#

There is also Gemini cli which is also free

gentle plinth
#

qwen wrote a stockfish PR actually

ornate agate
#

No but I don’t have time to do like giga deep dives like that unfortunately.

gentle plinth
gentle plinth
#

model is qwen3-235b-a22b-thinking-2507

sand bay
#

this is fake it's actually 4

rain mulch
gentle plinth
#

this is a common problem with llms

amber warren
#

i get the n/a role

#

which is admin for some reason

#

my proof is that i am employed by lmarena

echo aurora
#

can confirm ablobnodfast

#

all of them simultaneously

ornate agate
#

Using many models at the same time is actually a good idea

#

I wouldn’t bother with grok though

echo aurora
#

just all

gentle plinth
#

noticed something interesting. gpt-5 high has only a 33% winrate against gemini 2.5 pro in the arena
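
For scale, a head-to-head win rate can be turned into a rough rating gap. This is a minimal sketch assuming the standard Elo expected-score formula and ignoring ties; the `elo_gap` helper is made up for illustration, and the arena's own leaderboard fit may differ.

```python
import math


def elo_gap(win_rate: float) -> float:
    """Rating gap implied by a head-to-head win rate under the Elo logistic model."""
    if not 0.0 < win_rate < 1.0:
        raise ValueError("win_rate must be strictly between 0 and 1")
    # Invert expected score E = 1 / (1 + 10 ** (-gap / 400)) to solve for gap.
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))


print(round(elo_gap(0.33)))  # -123
```

So a 33% win rate would correspond to roughly a 123-point deficit in that one pairing, which says nothing by itself about how either model does against the rest of the field.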

worn tundra
#

is there any way to see all the available models? like the stealth models when you play battle with 2 anon models? because I cannot find any

gentle plinth
#

which they probably only will if the arena went good

solid brook
#

Omg google releasing anything but gemini 3

#

This is so bs

red sluice
#

Just did a few text prompts with the search mode, it didn't tell me the models it was once I voted, is it normal? new?

wicked root
terse shuttle
wicked root
#

@eternal niche do you think gemini pro has a shot against gpt 5 this month?

ripe mountain
#

The best AI models according to benchmarks

amber warren
#

posting artificial analysis 😠

hardy lion
#

we use LMArena so we don't even know until after we vote

ripe mountain
quiet dust
#

Why is the regular GPT-5 model weaker than the GPT-5 mini?
If anything, GPT-5 (minimal) is a regular model, without thinking, and when the requests end, you are transferred to GPT-5 mini

ornate agate
#

Gpt5 high that you see there is api only it’s not even available to pro users is my understanding

leaden egret
#

Is there anywhere you can specifically test nano banana or do you just have to wait for it to come up once every like 50 prompts

ripe mountain
echo aurora
stray aspen
#

bro some people in video arena are down bad lmao

solid brook
quiet dust
keen beacon
#

It is really not that good

ripe mountain
keen beacon
ripe mountain
#

i think qwen coder 3 is even better than gemini 2.5 pro

ripe mountain
ripe mountain
keen beacon
#

Try to ask it for around 100 anime similar to Madoka Magica, Qwen will fail, hallucinate and invent titles that never existed. Latest R1 does this job way better.

#

Try to ask both some questions from music theory and see how often Deepseek answers correctly and how often Qwen does

#

Qwen in general is not that good yet, unfortunately

#

It's trivial to train a model that passes certain public benchmarks even if it never was trained on them

#

The only way to compare general capabilities of models is to use private benchmarks like I do

toxic whale
#

im in the process of testing AI models on my own benchmark i just made, so far the results are very interesting, Opus 4.1 gets only 36%. if anyone has access to GPT-5 Pro please dm me, i would love to test, and the benchmark is only 10 questions so it wont eat up your rate limits that much 🙂

#

Opus 4.1 is the highest so far, o3 gets 22%

keen beacon
#

Isn't it enough?

ripe mountain
toxic whale
keen beacon
#

I find gpt-5-high good enough for most tasks lol

drifting thorn
ripe mountain
stray aspen
#

gpt-5 high is amazing

ripe mountain
#

sota

toxic whale
#

oh wait sorry, i tested gpt 5 mini thinking, im not done with GPT-5 high. give me a few minutes, ill see if it ends up beating opus

toxic whale
#

tested 2.5 pro and 2.5 flash already

ripe mountain
toxic whale
#

2.5 pro is second with ~33%, flash is 14%

ripe mountain
ripe mountain
toxic whale
#

ye sure its almost done it has 2 more questions

ripe mountain
#

thxx

quiet dust
leaden palm
#

https://www.youtube.com/watch?v=9alJwQG-Wbk cant get over this guy trusting gemini with his arm

Giving a PC program control of my muscles to become the fastest in the world. Sponsored by Micro Center!

keen beacon
drifting thorn
keen beacon
#

Guys I have a stupid question

#

There's no signup on LMarena?

stray aspen
#

no

keen beacon
#

Cool.

toxic whale
keen beacon
quiet dust
#

And the regular GPT-5 model has 44 points.

#

GPT-5 (minimal) - this is the standard model, which is by default

toxic whale
toxic whale
toxic whale
keen beacon
echo aurora
toxic whale
toxic whale
ripe mountain
#

The most important thing about GPT-5 is not that it's the best model, but rather, that it uses resources more efficiently. Despite being an improvement, GPT-5 is much cheaper than GPT-4.5.

#

gpt 5 cheaper than 4o and 4.1

toxic whale
keen beacon
toxic whale
#

Battle mode gives random models no?

#

direct chat also has limits, im getting limited on Opus thinking right now

keen beacon
toxic whale
#

with side by side you still get rate limited

stray aspen
#

ive only ran into limits with opus

#

ive been using gpt-5 high non stop and it still lets me

toxic whale
#

ill try with another model, Opus is expensive so that makes sense if its only opus

ripe mountain
#

claude so overrated and overpriced

#

fact

solid brook
stray aspen
#

fact

ripe mountain
#

omg loll

toxic whale
ornate agate
#

they are now the only ones to have not released an open source model

dense reef
#

/img vedeo

toxic whale
ripe mountain
#

gemma is opensource right

#

i forgot

dense reef
#

Change her photo and wedding dress alongside Cristiano Ronaldo is taking a photo with it

toxic whale
#

Grok-2 is gonna be open sourced soon they are saying

ripe mountain
ripe mountain
ripe mountain
#

why the EU is the worst

#

When will the Grok coding model be released?

left rain
#

guys what is this ai image model?

#

like I can't find it anywhere in the dropdown

tired herald
#

You cant use these hidden ai's in direct mode

left rain
torn mantle
#

where is france

whole sundial
#

lumped in with EU

#

but the data's from January 2019, things have changed a lot since then

torn mantle
#

1 year

#

or tomorrow

#

😴

#

true

#

yes

stray aspen
#

amd is running gpt oss 120 on a mi400x on huggingface

rugged mulch
#

What kind of ai is this

exotic nebula
rugged mulch
#

I don't know how they do that

stray aspen
#

Lol

#

People on video arena are so down bad bro 😂

wicked root
#

Who's Himanshu?

wintry tinsel
#

AI relationships and love are a genuine threat to humanity and growing rapidly by the day. I remember just two years ago when you would be laughed out of any discussion as a total loser for saying you date AI. Now it’s still a squeamish topic, but it’s not much stranger than just viewing normal illicit content. 2 years from now, it may be as commonplace as illicit content is, scary stuff

#

I remember back to the early days of chat gpt when this stuff was first being experimented with, a romantic partner bot in early 2023, people treated it as a meme then, how quickly things have changed

wicked root
#

bro I dno anyone who says things like "I date an AI".

wintry tinsel
wicked root
#

man... maybe I'm surrounded by imposters.

#

also, wth everyone's saying their AI's too agreeable, praising, do all the gf stuffs, but my Gemini told me I'll kill myself by 45 if I don't start having hobbies among other more hyper-realistic criticisms.

#

maybe there's a real human behind my Gemini instance

solid brook
#

The threat is real and action is needed from those in power

#

It may look like it is fixing problems short term but in long term a very big threat is waiting

misty harbor
wicked root
ripe mountain
solid brook
mellow frigate
#

Elon:
Igor, I’ve been testing Grok all week. It’s not matching ChatGPT. Not even close.

Babuschkin:
It’s early days, Elon. We’re iterating—

Elon:
Iterating? I asked it to outline a Mars colonization plan. It gave me a blog post about composting.

Babuschkin:
That’s because the model is still aligning—

Elon:
I don’t want alignment, I want intelligence. Strategic thinking. If GPT can answer it in 5 seconds, why can’t ours?

Babuschkin:
Because we don’t have the same data scale, or the same training infrastructure. It takes—

Elon:
I’m not hearing solutions, I’m hearing excuses. We’re supposed to be ahead of OpenAI, not their science fair project.

Babuschkin:
We’re building something different—

Elon:
Different doesn’t win. Better wins. People aren’t going to pay for “different.” They’ll just go back to GPT.

Babuschkin:
If you want an overnight GPT clone, you’ll need to run the company differently.

Elon:
Differently? I’m already pushing the team harder than they’ve ever worked.

Babuschkin:
Exactly. That’s the problem. AI research doesn’t work on a launch schedule.

[Elon steps closer, his voice tightening.]

Elon:
So what you’re saying is, we’re going to watch OpenAI pull further ahead… and do nothing.

Babuschkin:
I’m saying we can’t brute-force our way past them in six months. If that’s unacceptable—

Elon:
It is unacceptable.

[A pause. Babuschkin closes his laptop.]

Babuschkin:
Then I think I’m done here.

[He stands, walks out. Elon stays silent, staring at the whiteboard, gripping the marker until it creaks.]

[Elon sits down at his desk, opens Grok.]

Elon:
Grok… how do I replace a cofounder?

Grok:
Step one: Acquire a cofounder replacement kit from Amazon Prime. Step two: Follow the instructions in Swahili.

Elon:
…Not helpful.

Grok:
Would you like me to search for “emotional support raccoons” instead?

solid brook
#

Is this real?

#

No

#

Not real

mellow frigate
#

It is pretty hilarious to think about though haha (written by gpt 5)

languid crescent
#

uhh what's a toad?

echo aurora
solid brook
reef pawn
#

Why can't I use Deep Research in Grok 4? Is it even available?

torn mantle
agile bloom
#

which is the top tier gpt-5 model without the thinking/reasoning? GPT-5 takes too long to output a reply with its thinking/reasoning

reef pawn
#

Or Nano

#

Nano only works with API

agile bloom
reef pawn
#

It's not the fastest model since it "thinks"

agile bloom
#

gpt-5-chat felt faster to me

#

gpt-5-high felt slower

reef pawn
#

It's good but not as fast as compared to GPT-5 MINI

agile bloom
#

oh ok

reef bridge
#

why we can't upload images? on our chats?

terse shuttle
#

ts cool

deft maple
#

hello

teal summit
#

I have a question: in LMArena, can I keep many tabs open with my previous messages, and will it remember them?

agile bloom
compact jay
#

you mean different conversations, or browser tabs ?

agile bloom
cedar tide
ocean vortex
#

For gpt5-pro there's no open public API yet

ocean vortex
#

"issues" is inevitable

north vale
#

FT is into writing deepseek fanfic

ocean vortex
#

Like this is equivalent of them just announcing they are gonna use Huawei chips lol

#

"DeepSeek was encouraged by authorities to adopt Huawei’s Ascend processor rather than use Nvidia’s systems after releasing its R1 model in January, according to three people familiar with the matter." - typical CCP things... 👀

#

They are gonna ruin Deepseek

ornate agate
#

The media also published something recently on “billions” of AI chips smuggled into China. This is only around 20,000 b200 chips. I’m surprised it’s not a lot more. I suspect it would be a lot more if the Chinese state was using their vast spying and intelligence resources to do the smuggling. These articles suggest they are not and want to focus on Huawei instead…

ocean vortex
#

Innovation is kinda killed when everything is constrained and controlled by a single entity

ornate agate
#

I don’t want to get into another massive argument with you about China. It’s such a waste of time. They are going to do whatever they do no matter how much you don’t like it. So will the Americans. The irrelevant ones here are people like you and me who live in Europe. We have just chosen to not play this game at all.

barren prairie
ocean vortex
ocean vortex
#

Ok so for Plus "juice" is only 32 for gpt5-thinking-mini apparently

#

that's medium reasoning effort at best. Would have expected to have high there...

#

They clearly don't want for it to perform better than the full model on any task this time

#

on chatgpt at least

sacred quail
#

App using medium reasoning is kinda sad. Maybe they want people to use their API more but idk. People can use the openai api in the Poe app for free, even with high reasoning support

#

its open to abuse with multiple gmail accounts

ocean vortex
#

If you think about it, it can actually make sense. They want to make it simpler. And now they have matching naming (not like o4-mini vs o3). But it's still disappointing not to have high reasoning effort anymore ofc

#

"inferior model" performing better is confusing, even if that's only on specific tasks

solar hollow
#

i dont think deepseek will catch up again any time soon

ornate agate
#

But they don’t need the hype cycle for funding so… not really a problem for them long term imo

tame granite
#

nano banana is not on lmarena?

keen beacon
#

it will come up randomly in that mode

tame granite
quiet dust
#

Guys, why is the GPT-5 mini more intelligent than the regular GPT-5 model?

quiet dust
#

GPT-5 minimal - This is a normal model.
GPT-5 Low - This is a low-effort thinking model.
GPT-5 mini - This is the model that appears after you finish the main limits. The question is, why is the mini smarter than the regular model?

#

And why is the GPT-5-nano also smarter than the regular model?..

tame granite
#

this is really good

keen beacon
#

hoping for multiple sized imagen models

keen beacon
quiet dust
keen beacon
obtuse heart
#

gpt5 pro doesnt have an api so most likely not

quiet dust
#

GPT-5 Pro already in LMArena

obtuse heart
keen beacon
quiet dust
hollow imp
#

Gemini 2.5 would be so expensive for them to bring

obtuse heart
#

its pretty much smarter

white hatch
#

Guys, guess this riddle: small, yellow, opens any door?

white hatch
#

Bruce Lee

quiet dust
#

Pro is a high-effort thinking model. LMarena already has this. Moreover, Pro is not that much smarter than the medium-effort thinking model.

glacial torrent
#

How many videos can a person generate here? I got an 8 video limit yesterday and now I got just a 1 video limit? why man????????

obtuse heart
#

how is it that hard to understand dude

quiet dust
#

...

obtuse heart
#

gpt 5 pro does not have an api, therefore it cannot exist on lmarena

#

you can say "lol but this model is similar" as much as you want, but the model ITSELF doesnt exist on lmarena

#

thats what the person was asking

quiet dust
#

Pro one point higher than the medium model. Why is there such an opinion that as if Pro cannot be in LMArena?

quiet dust
#

Apparently Pro is so smart that she couldn't solve a Russian math problem

#

I thought for 26 minutes and still couldn't decide...

obtuse heart
#

yeah because its in russian, that makes it harder

quiet dust
hollow imp
#

Listen

#

Give pro this problem and see if it can solve it

#

@quiet dust

obtuse heart
#

also that benchmark is kinda stupid, gpt 5 pro is much smarter than medium lol

#

medium literally hallucinates every prompt, while pro can one shot a lot of things

quiet dust
quiet dust
ocean vortex
hollow imp
#

Her???

ocean vortex
#

It's also a reasoning model, even nano is

obtuse heart
quiet dust
hollow imp
#

@quiet dust so what do you say what model is the best in lmarena direct chat?

ocean vortex
#

and gpt5-mini-high is your new o4-mini-high

#

though that version of gpt5-mini is not accessible on chatgpt. They do not want to cannibalise full model and make this confusing

quiet dust
hollow imp
#

Dayum

obtuse heart
#

dude 😭

quiet dust
obtuse heart
#

yeah im sure that gpt-5 PRO is barely better than medium, thats facts guys

#

literally many youtube videos available on the internet proving that gpt-5-pro is much more capable

quiet dust
#

...

hollow imp
#

From where are you pulling all this up?

quiet dust
#

On the Internet

obtuse heart
obtuse heart
#

🤦‍♂️

untold hatch
#

What is the daily limit in the video arena? I suddenly have 0,others have 2 other still 8. 🤔

ocean vortex
quiet dust
quiet dust
#

GPT-5 mini - This is the Thinking Medium Mini model.

hollow imp
#

Previously they had the different models and naming issue, now they have the different versions of gpt5 issue

#

Wow openai wow

obtuse heart
quiet dust
obtuse heart
#

👋

quiet dust
#

GPT-5-HIGH is a model of thinking with high effort. Pro works also

obtuse heart
#

not gpt-5-high thinking

quiet dust
#

Gpt-5-high - already works automatically on thinking

ocean vortex
obtuse heart
hollow imp
#

It's like a = 50 = b
a + b = 100
a + a = 100
2a = 100
a = 100/2 = 50

#

😭

quiet dust
#

I provide evidence, and then they write, “No, these are different models!”
I provide more evidence, and again the response is: “No, these are different models!”
I'm already tired of this

ocean vortex
#

gpt4.1-mini was the same base model as o4-mini. Though it had advantage over gpt5-mini-minimal in that it didn't have 'redundant' RL training for reasoning (when that is not being used)

quiet dust
hollow imp
hollow imp
#

😭

obtuse heart
#

different variations

hollow imp
#

😭🙏

quiet dust
hollow imp
#

Just accept gpt5 chat is the most smartest model guys

#

🔥

#

It is faster than wally west

tired herald
#

no

#

literally the worst garbage ive seen in the past year

#

it cant even code well

obtuse heart
quiet dust
obtuse heart
#

people mad that gpt5 is more professional now, and can no longer be their ai partner 💔

hollow imp
#

People should go to c ai

quiet dust
tired herald
quiet dust
tired herald
#

anything above 500 lines of code has the ai drop to its knees

obtuse heart
tired herald
#

GPT 5 high is a bit better, but still garbage

obtuse heart
ocean vortex
tired herald
#

yeah, you ask it for "code me an unnecessarily large file that does nothing"

obtuse heart
#

me when i ask gpt 5 to code me gta 6 in only html, then blame openai why it cant do that

tired herald
#

333 lines of code, and it fails to patch a simple bug that I intentionally added

obtuse heart
#

that screenshot sure proves a lot of things

#

im so sorry, i was wrong the whole time

tired herald
#

Yes

obtuse heart
#

I apologize DeNew779

tired herald
#

What an intelligent user

obtuse heart
#

1442 lines of code, and it perfectly did it one shot

#

see how dumb it sounds and looks

tired herald
#

again, even a dog can code simple stuff that looks massive if you train it

obtuse heart
tired herald
#

code that in any way isnt self-contained and interacts with outside stuff is broken

obtuse heart
#

physics and allat

tired herald
#

im not sure what you are using

obtuse heart
#

quality also depends on your prompts

tired herald
#

but its definitely not what im using

#

mine forgets to put code in code blocks

obtuse heart
tired herald
#

and spits out text code

obtuse heart
#

ig gpt just doesnt like you then, cus ive never had that problem

#

start of skynet

hollow imp
#

Gpt 5 high or Claude opus 4.1 for code

keen beacon
#

They should have named it GTA instead of GPT so we'd make fun about the fact GTA 6 is never coming out

tired herald
#

GPT 6 defo before GTA 6

obtuse heart
hollow imp
#

Guys tell me most popular use cases of using gpt 5 Pro, o3 pro, gemini deepthink

#

Exclude problem solving stuff

tired herald
#

chatting

#

rp

hollow imp
#

Rip

tired herald
#

Roleplaying

hollow imp
#

Isn't custom gpts perfect for that

ocean vortex
keen beacon
tired herald
#

music writing

#

Did you guys know that you can send system prompts on lmarena

#

Okay, so now onto the next feature for my extension

gritty sequoia
#

Hi, i really want to fix an account for lmarena so i can save my chat can i do that ?

tired herald
#

actually showing the files being sent on the message

tired herald
gritty sequoia
#

why

#

how can i save chat ?

tired herald
#

its saved on device

#

so all chats on a pc will stay on that pc

gritty sequoia
#

but forexample if i use safari and delete history the chats will be gone

tired herald
#

should be, yes

gritty sequoia
#

ye but i want it to be saved like in a account you know

tired herald
#

not possible as of now

flint sandal
#

Idk why but for me 4 Sonnet 32K is better than Opus 4.1 16K, the code seems better, responses seem more natural and overall while for me gpt-5-high is on opus 4.1 level, sonnet 4 is crushing them both. Its my opinion.

#

And why isnt opus 4.1 thinking on the leaderboard?

willow grail
#

deepseek r2 will be trash.
they simply dont have the nvidia gpus

tired herald
#

hope the icon is better

normal abyss
#

Is it a bug that gemini 2.5 pro doesnt use its thinking sometimes or does it just decide when it needs to???

torn mantle
calm sequoia
#

Anyone seen the size approximations for the GPT5 base model?

modest prism
willow grail
willow grail
#

the only great thing china does is their nice cargo and bulk ships

#

:3

#

i love big vehicles :3 rawr rawr furry uwu

ocean vortex
#

or gpt5-mini-high which does exist

#

gpt5-mini-high 'juice' is 256

#

So higher than gpt5-high

fallen swift
willow grail
fallen swift
#

huh

#

i didnt even talk a single bit about furries

willow grail
#

i mean better than saying you are a loser.
you are not tho

fallen swift
#

i forgot did i mention i might be a furry on my carrd i forgot

willow grail
fallen swift
#

????

#

what are you on about

willow grail
#

a third time would be a charm!

fallen swift
#

so youre confusing me on purpose arent you

willow grail
fallen swift
#

that face says it all

#

but yeah i like furries too

#

hey man dont be weird

willow grail
echo aurora
willow grail
fallen swift
#

18

willow grail
#

do u know masenko or ratgrave or vrchat?

fallen swift
#

i dont know two of these but i do know about vrchat

willow grail
#

/afk

echo aurora
#

let's try to keep conversation related to AI please

fallen swift
#

but the thing i want to say is

#

people are having limits of 0 videos

tired herald
#

what

keen beacon
tired herald
#

huh

#

no way right

echo aurora
#

We're looking into it though.

unborn lantern
#

For me it's 0

trim lantern
#

Man , I literally cry when all my saved memory gets reset 😭

tired herald
#

wdym

stray aspen
#

what

trim lantern
stray aspen
#

wdym wdym

stray aspen
#

there are no accounts

echo aurora
keen beacon
#

are we allowed to ping @ pineapple?

tired herald
#

yes

trim lantern
barren prairie
stray aspen
keen beacon
#

@echo aurora in battle mode, a vision-enabled and a no-vision model may be selected. But we still have the option to add in an image.. How is this handled by the app? Just wanted to know

tired herald
#

the image is sent to the vision enabled, and not to the no-vision

glacial torrent
#

Can anyone please explain how many generations the daily limit allows?

stray aspen
#

idk the rest

keen beacon
keen beacon
tired herald
#

or atleast I think so

keen beacon
keen beacon
tired herald
#

which models

#

ill go test it out

keen beacon
# tired herald which models

i dont specifically remember actually... tho when i select those exact models in side-by-side chat, the image input gets disabled

willow grail
tired herald
#

I see

tired herald
keen beacon
#

also I had another query.. in battle mode, if i send a math query, and lets say I get a wrong answer from one of the models (model B). I vote for the model which answered correctly.

In the next query in the same chat, is there a chance the new model selected in place of model B gets affected by the previous wrong response?

keen beacon
echo aurora
keen beacon
echo aurora
keen beacon
echo aurora
keen beacon
#

In the above case tho, something weird happened.

echo aurora
scenic crypt
#

@echo aurora

echo aurora
# keen beacon In the above case tho, something weird happened.

So I had a different experience, which could be our answer:

I did something similar - "hey there"...uploads image... votes. In my experience, when the vote happened both models were the same for the first and second response. Since that's different from what you're seeing, what I think is happening in your case is that the two models (or one of them) you originally had don't have that capability, so when you uploaded, two new models were selected. Lots of assumptions on my part but regardless I'm going to double check.

keen beacon
echo aurora
#

Yup! This is for sure a bug that we're going to look into and fix. My apologies for the inconvenience.

#

Good to know, I'll also raise. blobthanks

tired herald
#

how weird

#

lmarena is being very weird with packets right now

keen beacon
#

Also @echo aurora I had been wondering: does lmarena sanitize the chats before publishing their datasets on huggingface? Sanitize as in removing any personal information, or is it solely the user's responsibility not to share any info that they deem personal?

keen beacon
tired herald
#

internet

#

im working on an extension

keen beacon
#

hm?

tired herald
#

for being able to add files to your messages

keen beacon
echo aurora
echo aurora
keen beacon
#

alright...

keen beacon
tired herald
keen beacon
#

having the ability to share pdfs would do wonders

tired herald
#

at least when im adding my files

tired herald
keen beacon
#

r u reverse engineering the message sending??

tired herald
#

already have

#

i havent changed the logic yet im being constantly rejected

#

before it worked

keen beacon
tired herald
#

im slowly losing my sanity

#

im sending 1:1 the same thing as before but now its broken

tired herald
keen beacon
tired herald
#

changed in the last few hours 😭

keen beacon
#

hmm did you try checking the network tab's response>>?

#

(you probably did)

tired herald
#

im going to commit life erasure

#

now its working

keen beacon
#

lol great

#

can you share the thing already plss

tired herald
#

no way im sharing it when it has severe mental illnesses like this

#

im not gonna embarrass myself

keen beacon
#

in DM?

grand panther
#

Hey guys! I wanted to know if there's any way to "bypass" the chat so I can post "inappropriate content"...
or any other way to get Grok 3 and 4 completely free and without limits.

tired herald
tired herald
#

accepted

#

but I know the issue

#

its the file im sending that has bad data

keen beacon
keen beacon
tired herald
#

WAIT

#

I KNOW HOW TO FIX THIS

echo aurora
stray aspen
#

@tired herald how did you attach a python file

echo aurora
# echo aurora

I'm going to be running this poll periodically, we'd love to understand better why. Please share in the thread!!

tired herald
echo aurora
scenic crypt
#

@echo aurora
how much I fix the problem

grand panther
tired herald
#

yes

#

or go to aistudio from google

eternal niche
#

btw gpt5 sucks

echo aurora
#

Our team is on it blobfingerguns

stray aspen
#

stop ragebaiting

tired herald
#

not fixed

keen beacon
keen beacon
echo aurora
#

Reminder that on discord you're able to block other people if you're not a fan!

keen beacon
echo aurora
keen beacon
#

they're actually just olympiad level

willow grail
#

PINE AND APPLE

U NEED TO BAN GPT5 HATERS like @eternal niche

keen beacon
willow grail
#

they get on my nerves with their irrational yapping

keen beacon
willow grail
echo aurora
keen beacon
willow grail
#

pineapple i am joking. you are so serious lol

keen beacon
willow grail
#

yes. like myself. very mature

keen beacon
#

@echo aurora is there a way we could know the rate limits of every model and henceforth use them wisely (not wasting credits)

echo aurora
keen beacon
#

and also could we add the option for choosing system prompts in direct chat?
somewhat like the gems in gemini.google.com

I have a few instructions like "use latex" or "do not solve the given question, only give it as text w/ latex" or "judge my solution and provide it marks", etc

keen beacon
echo aurora
keen beacon
#

i believe i am asking wayy too many questions

echo aurora
#

Would encourage you to check out #1372230675914031105 , some of these requests are on our radar + it helps us organize these requests better.

echo aurora
keen beacon
spare rune
#

pineapple

#

are you

#

a

#

pineapple

keen beacon
keen beacon
echo aurora
keen beacon
spare rune
#

pineaserox

#

pineapple rex

#

im not good at coming up at nimes

#

names

echo aurora
#

both are good

keen beacon
#

pineaserous-rex

#

nono pineasaurous-rex

#

yeah

spare rune
#

yeah

keen beacon
#

hehe

#

pineappie-rex?

echo aurora
#

This is a bug, our team is looking into it asap.

#

I am sorry you're running into this.

eternal niche
#

who

echo aurora
#

Lets move on please blobthanks

true condor
#

Anyone did Nano Banana comparison?

trail creek
#

tho its also very bad at redrawing/enhancing, unlike gpt

keen beacon
#

Calm down.

#

Alrighty then.

#

Lol

stray aspen
#

@keen beacon

#

grok 4 or gpt-5 high for coding

eternal niche
#

gemini 2.5 pro

stray aspen
#

lol

#

what

#

gemini sucks

#

its literally obsolete

keen beacon
true condor
eternal niche
#

who cares

keen beacon
#

Too far away

#

Needs to have one in Europe

stray aspen
#

no

pure falcon
#

@echo aurora
Any clue if we’re getting a new leaderboard update soon? Can’t wait to see how GPT-5-chat does!

candid storm
pure falcon
#

🤞

pure falcon
#

Hope so. The vague-posting is so frustrating!!

tired herald
#

should I add that to my extension?

eternal niche
#

what extension

tired herald
#

im making an extension for lmarena

#

to allow adding files like code files

#

its pretty good yes

eternal niche
#

yes gemini 3 - SOTA

tired herald
#

idk

#

ive never used toad before

stray aspen
#

ROFL

tired herald
#

im only here to improve my own experience with lmarena

stray aspen
#

slide the extension

tired herald
#

🙂

#

wait patiently

stray aspen
#

alrighty

tired herald
#

though ive got most features working well

#

nope, i just use direct models

#

is it really that good?

drifting thorn
#

By far, what’s the SOTA model for tool-calling?

#

Isn’t that SillyTavern’s work(

#

Gonna try it in my n8n workflow

pure falcon
echo aurora
tired herald
#

lets see

#

just gave it a pretty complicated request

spring turtle
#

I have a question... Is the message that says "daily video limit of 0 videos" a mistake, or is it different from the 8 video limit message (as if it were one of those ai tools with one-time credits)?

amber warren
keen beacon
#

toad and dino, I mean

tired herald
#

s

spring turtle
#

@amber warren thanks, so im safe...

tired herald
#

well, I think toad died on me

keen beacon
tired herald
#

😭

worldly osprey
#

Hello

tired herald
#

hi

keen beacon
tired herald
#

no, the other model finished

#

and apparently toad finished without giving an answer

#

so its loading inf

echo aurora
#

Since it's a model that's behind a codename it's only accessible through Battle mode, meaning you won't be able to select it.

spring turtle
#

Also, is this also a bug?

keen beacon
#

hello, I used the ai battle option and uploaded an image and gave a prompt to edit it, there was a model named nano banana but it wasn't in the model list when I want to do side by side or single ai

tired herald
#

claude opus 4.1 or gpt 5 high

stray aspen
#

gpt 5 high

tired herald
#

wdym

stray aspen
#

its also very limited

tired herald
#

no, theres i think a limit of 5 messages

#

either 5 or 10

#

none

#

none

#

its pretty good

stray aspen
#

thats great

tired herald
#

step 3 is very chinese

#

I really wonder, should I add the ability to add custom system prompts

stray aspen
#

what the hell

pure falcon
keen beacon
#

👀

#

dont know if these are some updated versions

#

but saw on reddit

stray aspen
#

its a nice model

#

just tested it

keen beacon
#

not sure if these are the same ones with different naming now

obtuse heart
stray aspen
#

idk

#

this is imagen ultra

obtuse heart
#

that can definitely fool more people

tired herald
echo aurora
#

oooo which model is this?

#

oh opus 4.1 you mentioned

tired herald
#

[FILES_START]
{"files":[{"name":"test.py","size":7,"mime":"text/x-python","content":"Nothing","truncated":false}]}
[FILES_END]

#

this is how the files are sent
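
A minimal Python sketch of the envelope shown above, for anyone who wants to see how such a block could be put together. Only the `[FILES_START]` / `[FILES_END]` markers and the five JSON fields come from the pasted message; the function names, the `max_chars` cutoff, and the choice to report `size` as the UTF-8 byte length are assumptions for illustration, not the extension's actual code.

```python
import json

# Markers copied from the message pasted above.
FILES_START = "[FILES_START]"
FILES_END = "[FILES_END]"


def build_files_block(files, max_chars=20000):
    """Wrap (name, mime, content) tuples in the marker envelope.

    max_chars is a hypothetical cutoff; the real extension may truncate
    differently or not at all.
    """
    entries = []
    for name, mime, content in files:
        entries.append({
            "name": name,
            # Assumption: size is the UTF-8 byte length of the full content.
            "size": len(content.encode("utf-8")),
            "mime": mime,
            "content": content[:max_chars],
            "truncated": len(content) > max_chars,
        })
    # Compact separators reproduce the whitespace-free JSON seen in chat.
    payload = json.dumps({"files": entries}, separators=(",", ":"))
    return "\n".join([FILES_START, payload, FILES_END])


def parse_files_block(message):
    """Extract the list of file entries from a message containing one block."""
    start = message.index(FILES_START) + len(FILES_START)
    end = message.index(FILES_END, start)
    return json.loads(message[start:end])["files"]
```

Calling `build_files_block([("test.py", "text/x-python", "Nothing")])` gives back the same three lines as the pasted example, and `parse_files_block` reverses it. Since the file content travels as plain text inside the prompt, this approach only suits text files.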

#

very cool

keen beacon
keen beacon
tired herald
tired herald
#

ill make a vid of it in a few hours when I have it done with a button to customize it and all

#

:))))))

#

find whats wrong

#

Huh

#

Im not doing this for gpt 5

#

I dont even use it

stray aspen
#

@tired herald

#

add pdf support

tired herald
#

😭

#

PDFs are prob not possible for me, they'd have to be implemented into the very LMArena

#

But ill try

stray aspen
#

whos funding your project

tired herald
#

No one

#

My boredom*

stray aspen
#

skibidi toilet

tired herald
#

Real

zealous anchor
#

How do I access nano banana? Not sure which tab it's under

echo aurora
zealous anchor
#

Ah.

echo aurora
weak nova
gentle plinth
#

btw why is it possible to vote for models in side-by-side? shouldnt this only be possible if we dont know the models name

#

or is it just a ui bug, and votes dont count anyway

gentle plinth
zealous anchor
#

It apparently doesn't know what a Klingon is

weak nova
#

It's bad at a lot of things other models are good at, but it blows everything else out of the water at what it's good at

#

The way you instruct it matters a lot too

terse shuttle
#

I'm dying to ask, but it's kinda personal. (i think)😭

echo aurora
#

Poll result: What version do you use the most? Winning answer: Direct (14 of 22 votes)

gentle plinth
eternal niche
echo aurora
flint sandal
#

Grok can now do corn videos in $30 plan...

keen beacon
#

grok can make videos now?

keen beacon
stray aspen
#

it doesnt have censorship?

eternal niche
keen beacon
keen beacon
#

or just simply extract the text from it

flint sandal
#

That doesnt even make things spicy if you ask

#

Its making it always

eternal niche
#

in dm

flint sandal
#

X. Com and elon musk

#

He said it

eternal niche
#

i dont believe him

#

i want see video

stray aspen
#

lmao

flint sandal
#

Can i send link to article?

#

Or i will get banned?

keen beacon
stray aspen
#

elon musk knows his audience

eternal niche
keen beacon
#

Maybe in DM

eternal niche
#

only for educational purposes

#

for science

keen beacon
eternal niche
#

what

flint sandal
#

I will send link to article with censored version

stray aspen
#

guys its official

#

elon musk made a corn generator

eternal niche
#

he is lying

#

(elon musk)

flint sandal
#

Nah

#

I tested it

#

Kind

#

Kinda

#

Like i didnt want to test it

#

You know

#

Science

stray aspen
#

musk needs to be stopped

keen beacon
#

As you can see

flint sandal
#

He will literally buy anyone

#

No one can stop him

#

Only he can stop himself

keen beacon
#

Just being honest

flint sandal
#

I think no one will pay to see robot corn

#

Cornhub is free

keen beacon
flint sandal
#

Its openai and google fault

#

For discovering transformers and gpr

#

Gpt

keen beacon
#

Billions are thrown for the AI tech industry

flint sandal
#

And still most models dont know what number is bigger. 9.9 or 9.11

obtuse heart
ocean vortex
#

gpt5-high. Made it output 16k lol

obtuse heart
ocean vortex
#

It messed up the reflection

#

but otherwise this is very detailed

obtuse heart
#

thats impressive and detailed

flint sandal
#

I tested gpt-5-high and i am disappointed by its censorship of things that are not even a bit unethical

neon idol
#

hello

keen beacon
flint sandal
# keen beacon What censoring stuff did you test?

I wanted to code an interface for deepseek r1 and this mf hid the CoT. When i asked it to show it to me, it said it cant. I tried multiple times, prompting it with things like a dev mode where i should see the CoT, but it still didnt work.

#

I think its system prompt

keen beacon
flint sandal
#

When openai said it cant show its CoT, it took that too literally

flint sandal
#

Not lmarena

#

Read my message again

keen beacon
#

I see

eternal niche
#

no censorship

#

musk forward!

flint sandal
#

Wdym

#

Many ai dont have censorship

#

Even opensource

eternal niche
#

(and sucks)

neon idol
#

grok 4 is better

keen beacon
flint sandal
keen beacon
#

It's bad

#

I don't like it in general

#

It can only speak in english

#

Even small qwen models are multilingual

#

With lots of languages

gentle plinth
#

best os math model tho in my opinion

tired herald
tired herald
stray aspen
#

@eternal niche do you support musk

sour spindle
#

going through my account and deleting all my anti 2.5 pro comments