#general

coarse glade
#

Are you an admin

modest prism
#

The API does not have those system prompts which tell the models which version they are.

stray aspen
#

no

errant cave
stray aspen
#

everyone knows that

errant cave
whole wagon
#

Ngl, I'm not really seeing much point to the cheaper OpenAI models except so OpenAI saves money, because we have stronger open source alternatives that are also cheaper

tired herald
#

ngl

stray aspen
#

you have system prompts lmao

tired herald
#

uhmmm, idk what you might even be referring to hmmmm

#

its a very interesting hypothesis though

ocean vortex
#

It is not "better". It's just finetuned more aggressively for the style and formatting users prefer

tired herald
#

I commend you for your wisdom

coarse glade
whole wagon
#

Why use GPT5 mini when GLM 4.5 exists for $0.5/$2

coarse glade
#

Can u help me fix this pls

stray aspen
#

just use it

coarse glade
stray aspen
#

he has system prompts

tired herald
#

I have a system prompt

stray aspen
#

hes making a plugin

errant cave
tired herald
gentle plinth
#

it should, but it needs some work for getting it to run

ocean vortex
gentle plinth
#

if youve never installed react it can be complicated for some

whole wagon
#

I don't think so

#

I tried both

coarse glade
#

Ok so now it should work for me right

hollow imp
gentle plinth
coarse glade
#

Ok, sorry for saying you're lying. You guys should have a section for help

#

Like if something goes wrong

ocean vortex
hollow imp
modest prism
ocean vortex
#

So glm has no chance...

whole wagon
#

What the hell GPT5 dropped so much in the leaderboards

gentle plinth
whole wagon
#

How did it drop that many Elos in short duration of time

tired herald
gusty helm
#

ask grok 4 same q, he will say he is grok 2 or 3 😄

gentle plinth
coarse glade
stray aspen
#

are you kidding me

ocean vortex
stray aspen
#

guess ill use opus

whole wagon
hollow imp
whole wagon
#

It's within error

ocean vortex
whole wagon
#

To 2.5 pro kek

coarse glade
#

Can someone help me by DMing me

gentle plinth
whole wagon
modest prism
ocean vortex
#

Google is always benchmaxxing for lmarena

gentle plinth
ocean vortex
hollow imp
coarse glade
gentle plinth
#

just what you want

hollow imp
#

Told you

gentle plinth
#

the prompt is only for direct chat

modest prism
tired herald
torn mantle
#

its not that good

coarse glade
hollow imp
#

Can you tell

tired herald
#

You will soon have System Prompts @coarse glade

#

Soon

torn mantle
#

im starting to think that @modest prism is close to elon or some xai staff somehow

tired herald
#

very soon

whole wagon
#

The market gives grok better odds than openAI for December

coarse glade
tired herald
#

yes

#

and im alone btw

stray aspen
#

whos funding your project

tired herald
#

no one

#

im doing for fun

hollow imp
#

Lmarena head

modest prism
hollow imp
#

Mr.@tired herald Head of Department, system prompts, lmarena.ai

whole wagon
#

Isn't Gemini 3 just going to be insanely good. They literally need to add a tiny amount of perf for SOTA

#

They are already at the frontier with a comparatively old model

whole wagon
#

And the 2 pro to 2.5 pro leap was immense

tired herald
#

I would love to release this already and send it to the LMArena team so they can improve their website

stray aspen
#

if gemini 2.5 pro still holds its ground even now

#

with the new models

modest prism
hollow imp
#

2.5 pro experimental shook the world

torn mantle
whole wagon
#

Grok is intelligent if you prompt it right. Issue is it takes 10 mins thinking then

gentle plinth
# whole wagon

the win-rate to gemini2.5 pro also doesnt look good (was already there before the update)

torn mantle
#

idk grok 4 vibes are off

ocean vortex
coarse glade
#

And also are u guys going to give system prompts for the Public soon which will be nice ty pineapple for everything

torn mantle
hollow imp
whole wagon
#

And it's -4 points also

#

Lol

ocean vortex
tired herald
blazing bison
#

Do you guys think that we gonna have gemini 3 this month?

coarse glade
whole wagon
#

Ngl these GPT5 models are good at maths though. That isn't a Gemini strong point any longer

coarse glade
ocean vortex
#

2.5Pro is not SOTA anymore, like that's the reality

modest prism
coarse glade
#

And also do u guys know that Claude is good too

tired herald
hollow imp
gentle plinth
blazing bison
blazing bison
coarse glade
keen beacon
#

google FUMBLED

#

its trash now

#

thats the reality

coarse glade
#

Yeah 2.5 pro on ai google studio is bad now

tired herald
# hollow imp

temperature is something I checked if I can do it myself, but nope, which is sad

keen beacon
#

idk wtf they did

whole wagon
#

The initial 2.5 pro release was before o3 release

modest prism
stray aspen
#

its not SotA but its not bad either

keen beacon
#

but gpt-5 is amazing

tired herald
coarse glade
#

It isn't up to date even with grounding with google search for ai google studio honestly

#

That's why I stopped using it

blazing bison
whole wagon
#

IMO Gemini 3 is October time. Not any time soon

hollow imp
#

Like you have Google ai ultra but no temperature setting access 🥀

coarse glade
keen beacon
#

and im the biggest google shill

coarse glade
#

This is the website

keen beacon
#

so if im telling you it

#

yk its true

solid brook
#

Leaderboard does not make sense. GPT-5 Chat, which is the worst GPT-5 model because it is non-reasoning, scores better than GPT-5 mini thinking, which is a reasoning model

tired herald
whole wagon
#

Gemini 3 is October, grok 5 is december

blazing bison
#

They created like 30 different endpoints for gemini 2.5 pro

tired herald
#

you need to open Advanced Settings for Top P

hollow imp
#

Not ai studio

coarse glade
#

Did u guys see that perplexity wants to buy google chrome

hollow imp
#

😭

modest prism
tired herald
#

bot hare gemini web 😭

keen beacon
whole wagon
hollow imp
keen beacon
#

Grok added NSFW to their image generator

coarse glade
#

That's nice of u

#

Ty so much @modest prism

keen beacon
#

Thats their only latest update 💀

blazing bison
#

Why are so many people active here today

hollow imp
blazing bison
#

😆

stray aspen
#

elon musk made a corn generator

tired herald
coarse glade
gentle plinth
#

(i think)

blazing bison
#

I want zenith back

tired herald
#

there was one

keen beacon
#

Hes a cr@ckhead

hollow imp
#

😒

coarse glade
#

Nilay Patel, The Verge editor-in-chief, joins 'Fast Money' to talk the latest moves in the AI arms race between Google and Apple. For access to live and exclusive video from CNBC subscribe to CNBC PRO: https://cnb.cx/42d859g

tired herald
#

I dont pay

whole wagon
#

I heard xAI is actually building AI corn team. To refine the corn generation

solid brook
#

Man I am so excited for gemini 3. Hope they don't nerf the model

tired herald
#

I dont have ultra

keen beacon
#

so many be yappin here

tired herald
#

indeed

whole wagon
coarse glade
#

Ok bye ty for everything @echo aurora

hollow imp
solid brook
#

With deepthink

hollow imp
#

I only use the Gemini web because of custom gems

modest prism
hollow imp
#

And I have seen how 😭 veo3 is in video arena

tired herald
#

omg why is it so hard to change the UI of LMArena

hollow imp
#

You can do it?

wicked osprey
#

hi im new here, i was just wondering: i got limited in my gemini after a few questions, is it unlimited in lm arena?

hollow imp
#

No

tired herald
#

yes, thats how I do everything 😭

hollow imp
#

How to change the ui of lmarenn lmarenn

wicked osprey
#

so far i only sometimes get stuck on the generating page, but i haven't been limited yet

hollow imp
#

I never got limited in Google ai pro free trial

tired herald
#

LMArna

wicked osprey
#

it just switches to flash instead of pro sometimes

tired herald
#

but why tf is the Logo a collection of SVG's

#

even the letters

wicked osprey
#

i just feel it takes time to generate compared to the gemini app with pro 😩

tired herald
#

I have zero Idea why theres a visual of the old ui behind the new ui

burnt sinew
#

when will video generation be added to the website

echo aurora
hollow imp
#

To be donest

tired herald
#

to be determined?

indigo hazel
ocean vortex
hollow imp
tired herald
hollow imp
#

I do not even receive any pocket money

burnt sinew
steep mirage
#

Hey everyone,
I keep getting this Cloudflare "Security Verification" pop-up on LMArena (see screenshot).
Any ideas how to fix this persistent verification? It's really annoying!
Thanks!

echo aurora
ocean vortex
hollow imp
steep mirage
tired herald
#

what happens when you try to verify

ocean vortex
#

What exactly are you getting for their Pro plan that you don't already have with aistudio...? Not much at all

echo aurora
hollow imp
ocean vortex
#

Their gemini subs are useless

tired herald
#

Oh, making the Model selector appear in the chat box is gonna be hell

steep mirage
hollow imp
ocean vortex
#

You may buy gemini sub if you want to support them or whatever, but then you shouldn't pretend it's a good value for the service 👀

hollow imp
echo aurora
tired herald
#

icons icons icons.....

#

atleast this works

tired herald
ocean vortex
hollow imp
steep mirage
tired herald
steep mirage
ocean vortex
# stray aspen im confident it will be SotA

IMO the fact that they are messing with huge models (ultra for deepThink) is not extremely promising tbh. There's no obvious substantial improvement path without going to the extremes with Google for now..

#

I doubt Gemini3 will be substantially better than 2.5Pro. Probably more like marginal improvements (2.6?)

tired herald
soft kernel
tired herald
ocean vortex
tired herald
#

No

ocean vortex
#

2.5Pro is not mentioned anywhere for that model card

tired herald
#

Its 2.5 but multiple

tired herald
#

Marketing

soft kernel
tired herald
#

They make multiple versions of 2.5 Pro work together to all create a better response together

ocean vortex
# tired herald So what

So it's a different model. We already knew they had a bigger model internally, and DeepThink is way too distinct from 2.5Pro to be the same model tbh

tired herald
#

Same with the GPT-5 Pro and Grok 4 Heavy

ocean vortex
#

Like it can do some tasks worse than 2.5Pro

tired herald
#

Ill stop arguing

#

Ill go work on my stuff

ocean vortex
tired herald
#

No

hollow imp
#

@tired herald I don't understand top p at all. What value should I put
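Since Top P came up: below is a minimal Python sketch of nucleus (top-p) sampling, the technique a Top P setting controls. It assumes the model's next-token probabilities are already available as a dict; the function names and the toy distribution are hypothetical, not code from LMArena or any provider.

```python
import random

def top_p_filter(probs, p):
    """Nucleus (top-p) filtering: keep the smallest set of most-likely tokens
    whose cumulative probability reaches p, then renormalize them to sum to 1.
    `probs` maps token -> probability (assumed to already sum to ~1)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        total += prob
        if total >= p:  # stop as soon as the nucleus covers p of the mass
            break
    return {token: prob / total for token, prob in kept}

def sample_top_p(probs, p, rng=random):
    """Draw one token from the renormalized nucleus."""
    nucleus = top_p_filter(probs, p)
    tokens = list(nucleus)
    return rng.choices(tokens, weights=[nucleus[t] for t in tokens], k=1)[0]

# Hypothetical next-token distribution, for illustration only.
probs = {"the": 0.5, "a": 0.25, "an": 0.125, "zebra": 0.125}
print(top_p_filter(probs, 0.7))   # keeps only "the" and "a", renormalized
print(top_p_filter(probs, 1.0))   # keeps all four tokens
print(sample_top_p(probs, 0.5))   # always "the"
```

A lower p keeps only the few most likely tokens (more predictable output), while p = 1.0 keeps every token and effectively disables the filter.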

ocean vortex
#

Yes

#

Stop arguing

tired herald
#

I'm not

hollow imp
#

LETS GO DEOLD VS DOMS

ocean vortex
#

Just look at their 10rpd

#

cap

tired herald
#

💀

soft kernel
hollow imp
#

DEOLD776 VS DOMS

#

🔥

tired herald
ocean vortex
gentle plinth
ocean vortex
gentle plinth
#

If you found a way to deterministically verify a model then it's easy I guess

#

And a difficult to solve problem for lmarena

ocean vortex
hollow imp
ocean vortex
#

Rather than bots doing vote manipulation - there's not many of those if at all

tired herald
gentle plinth
hollow imp
#

DOMS LOST

tired herald
hollow imp
#

🔥

gentle plinth
hollow imp
#

WINNER IS DEOLD

#

🏆

ocean vortex
tired herald
#

No

ocean vortex
#

That's why they do it

tired herald
#

Takes 3 clicks to get around any protections on the web

#

Also, even if, its not called RE

soft kernel
ocean vortex
#

what

tired herald
#

You are insulting every RE with your sentences

ocean vortex
#

You seem confused

tired herald
#

Im an RE you honk

#

RE needs something to be locked in some way

#

Its open

#

So no RE required

ocean vortex
#

I don't think you are

tired herald
#

Okay

#

Idc

ocean vortex
#

you wouldn't talk nonsense

#

but maybe a wannabe one think

tired herald
#

Says the guy that has no idea what google do

plucky otter
#

Is there an option to always get an 8 sec video without the commercial at the end? Right now it is random, 5 sec or 8

ocean vortex
#

@tired herald Reverse engineering refers to automating things like lmarena so you could do requests without having to go through their interface, click on things and having to do it manually.

tired herald
#

WHAT

#

Are you insane

ocean vortex
#

It's not what you think it means

tired herald
#

Are you f*cking insane

#

REVERSE

#

REVERSE

#

REVERSE

#

REVERSE

#

I think bro is having a stroke

#

Are you okay?

novel crater
#

anyone else having this issue

soft kernel
tired herald
plucky otter
#

AE (Alternative effects) server has a free pool for the VEO3 text-to-video generator. Offline tonight, but most nights from 1900 to 2300. 8 sec video, without watermark, free download

novel crater
ocean vortex
# soft kernel I'm pretty sure that's not reverse engineering AT ALL

It is. Because you need to arrive at the methods they are using in their interface for server requests without having to use their interface. You are reversing how it works in order to be able to do it independently of the accessible interface. I don't know why this is so difficult to grasp for some people. 🤷‍♂️

tired herald
#

no

#

You are wrong in every sense of the word

#

Im sorry, but please stop interacting with me

ocean vortex
#

Enlighten us then

tired herald
#

Reverse

ocean vortex
#

Cause it looks like you have no clue

tired herald
#

Why is it reverse

#

Reverse

ocean vortex
#

Most likely you don't

tired herald
#

One word

#

Reverse

#

Can you explain why that word is used

ocean vortex
#

You have no clue do you...

tired herald
#

Bruv

#

I asked a question

ocean vortex
#

I asked you first.

tired herald
#

Reversing something

ocean vortex
#

But you can't come up with a single sentence

#

without using chatgpt

soft kernel
#

Nvm bro seems confused

tired herald
#

Bruv

#

Dom

ocean vortex
#

🤣 🤣

tired herald
#

Can you read what an AI just said

ocean vortex
#

Bruh....

tired herald
#

Then I think

#

That

#

You

#

Are

#

Ha ing

#

A

#

Stroke

ocean vortex
#

turn on your brain

tired herald
#

This guy thinks he can code GTA 6 in 2 hours

#

This is the embarrassment of vibe coding

ocean vortex
#

And explain how I'm wrong

#

if you can't do it

#

without prompting chatgpt

tired herald
#

This guy is the reason why people believe in the flat earth

tired herald
#

REVERSE

ocean vortex
#

then you have no clue

tired herald
#

REVERSE

#

REVERSE

ocean vortex
#

LMFAO

tired herald
#

WHAT IS THE DEFINITION OF REVERSE

#

WHAT IS THE DEFINITION

#

OF

#

REVERSE

ocean vortex
#

are you stupid?

#

🗿

tired herald
#

You are pulling my leg

#

You cant be this unintelligent

#

Im being successfully Ragebaited

ocean vortex
#

You sound like a broken record having absolutely no clue what you are talking about

#

😭

tired herald
#

Can

#

You

ocean vortex
#

REVERSE

tired herald
#

Define

ocean vortex
#

REVERSE

tired herald
#

Reverse

ocean vortex
#

🤣

tired herald
#

No?

#

Ok

exotic nebula
#

What are you guys rambling about?

tired herald
#

Reverse engineering

exotic nebula
#

Reverse engineering?

blazing rune
#

🍿

vocal token
exotic nebula
tired herald
#

Dom doesnt know what reverse means

ocean vortex
echo aurora
#

Reminder:

✅ Treat others with Respect. Be kind, assume good intent from others, and keep disagreements respectful. It's encouraged to share your disagreements, but only if it's done in a respectful and productive way.

vocal token
#

Hey Greg!

tired herald
tired herald
#

I was being ragebaited and fell for it

#

I promise it wont happen again

echo aurora
#

Lets just keep conversations respectful please.

leaden palm
ocean vortex
tired herald
ocean vortex
tired herald
exotic nebula
#

Chill guys

#

Lets switch topics

#

Did anyone try out HRM AI?

tired herald
#

My topic is that im prob gonna release my extension today

tired herald
soft kernel
tired herald
leaden palm
exotic nebula
tired herald
tired herald
echo aurora
tired herald
#

Hehe

#

Only a single bug to fix and then its onto github to release the src

exotic nebula
#

Bro stop

#

Stop milking a dead cow

soft kernel
#

Can you please...?

soft kernel
#

Release it asap

#

😭😭

tired herald
#

Yeah, theres just a little problem with it right now

#

I need to fix and release

#

Im hoping that it could be helpful for those who could use it

vast fern
#

hi everyone

#

i am new here

#

i need one help

tired herald
#

Ask away

soft kernel
#

Are there any other models for generating 3D worlds in real time, other than Genie 3?

exotic nebula
tired herald
vast fern
#

what is this and how can i get this

#

i wanna be updated on the new models

soft kernel
tired herald
gentle plinth
#

I think he means he wants to get notified

tired herald
tired herald
vast fern
tired herald
#

OHHHH

soft kernel
#

Legit api

gentle plinth
vast fern
tired herald
#

I misunderstood

echo aurora
gentle plinth
#

But can I leave the msg here?

#

I mean one can simply remove the space

vast fern
#

Thanks everyone, Have a great day.

echo aurora
gentle plinth
#

OK thanks

scenic salmon
#

@echo aurora I don't know if you saw my question yesterday, but I was wondering if you're able to talk about the pre-seed/seed rounds at all, it wouldn't need to include any specifics about lmarena, just how the general experience went… like finding a lead investor, pitching to angels/vcs, etc

echo aurora
scenic salmon
#

Ah okay, thanks anyway

gentle plinth
tired herald
#

Gacha on ai, gacha games, gacha life, gacha everything

exotic nebula
exotic nebula
keen beacon
#

Hold on, I read the github

exotic nebula
keen beacon
#

Just asking if you knew more

ocean vortex
exotic nebula
exotic nebula
keen beacon
#

Esp for math and such

exotic nebula
exotic nebula
stray aspen
#

does anybody know an ai model for sound effect generation

stray aspen
#

isnt that just for music

exotic nebula
stray aspen
#

yeah i found eleven labs

#

im gonna give it a try

exotic nebula
exotic nebula
white hatch
#

Is it unsafe to use VPN while visiting chatgpt website?

exotic nebula
#

You wont get flagged.

#

Just might be stripped of some features, or have access restricted in some countries.

#

Have heard of account lockouts. But so far only rumors.

ocean vortex
#

I wonder if this would hold up as a general purpose model in real life

exotic nebula
exotic nebula
neon idol
#

Hello chat

#

How are you?

whole wagon
#

I like how opus be like this when you ask it to be your friend. But you ask 4o and it goes all in with saying you'll be the bestest friends ever and all sorts lmao

#

Opus 4 actually just says no basically

solid brook
#

People got so desperate for 4o when it was gone for 2 days

#

This will have negative effects

quiet dust
#

Guys, how to use the Thinking model in GPT-5, not Thinking mini?

#

Or is only Thinking mini available for free users?

keen beacon
#

the whole chatgpt subreddit is on fire

sullen quest
#

GPT 5 mini high is worse than 4.1 and 5 chat, gpt nano high is worse than oss at 42nd place in text arena

#

flagship model my

tired herald
#

Sorry to say this

#

But the small error ballooned into a much larger set of issues so I cant release the src today

#

But I promise tmrw will be the day

novel crater
#

the first personal fun project I made like actually was able to fully complete was done with 06-05 Gemini, I am a Gemini fan for sure

hollow imp
novel crater
#

yeah Gemini 3 would be freakin awesome

#

basically I made a tarkov grid style inventory, if anyone here knows that game, basically a pretty complex system and Gemini knocked it out of the park

#

I don't think chatgpt likes their competition at all and probably put out gpt 5 as like this big thing, because not a lot of people seem overly enthused about it

balmy mist
#

what happened to deepseek man

marsh stratus
neon idol
# balmy mist

Deepseek was too strong and he was nerfed 😭🙏

sullen quest
scenic salmon
# novel crater I don't think chatgpt likes their competition at all and probably put out gpt 5 ...

Sam has known for quite some time that GPT-5 was never going to live up to expectations. GPT-4 was a big model scale-up from 3.5 and it saw huge improvements from that scale-up, so people have been expecting the same from 5 ever since. But it was found you can't really scale them up any further and get meaningful improvements, so they started focusing on their reasoning/thinking models (o3/o4/etc). They tried launching another giant model, 4.5, but people didn't really like it. So it was known for quite some time that whatever GPT-5 was going to be when it released was going to be more of a rebranding than anything

sullen quest
#

Sometimes people need to realize when they shouldn't be hype men

scenic salmon
#

Gotta get those VC dollars

novel crater
marsh stratus
sullen quest
marsh stratus
#

Nano and mini seem built for api use. Gpt-5 chat IS ChatGPT for most people, and it's just so bad

novel crater
#

fair yeah thats a really big part of it I think

#

cost friendly is always good

#

especially with some of these models 👀

whole wagon
#

GPT5 without thinking is so bad

#

I don't know how it's even possible

#

It fails basic arithmetic

#

That even 4o could get

#

I'm basically permanently using thinking

#

Only downside is wait time between prompts

stray aspen
#

@novel crater what will come first, gemini 3 or grok 5?

scenic salmon
novel crater
#

I don't have enough knowledge on that to say definitively but since grok 4 came out recently it would make more sense logically that Google would release first

whole wagon
#

That's my guesses no info

#

The gap between pro and regular gpt5 is quite insane. They didn't even mention pro in the livestream lol

#

I guess they don't really want ppl using it much. Due to capacity issues

#

The gap between gpt5 and GPT5 pro is much larger than the gap between o3 and o3 pro, I found

scenic salmon
zinc ore
inner gate
#

Hello famalams

whole wagon
#

It's just that it is rated for only up to 145

#

They didn't calibrate beyond that

pure falcon
#

Does anyone else think that LMArena should go back to NO style control? Now they're also talking about “emotional control”…

Makes no sense - it defeats the purpose and mission of LMArena in the first place. The more they “control”, the more they turn into the very benchmarks that they wanted to distinguish themselves from

#

If LMArena is about what users like…then let the users decide what users like!

#

The more variables and factors you “control”…
You end up turning into a Capabilities test

#

And There are plenty of capabilities benchmarks out there already

whole wagon
pure falcon
#

LMArena is supposed to give insight into what other benchmarks miss

#

If you keep “controlling” for those unknown factors by adding style control, sentiment control, etc…, the benchmark becomes worthless. No alpha in the scoring

hollow imp
#

Gimme style control

pure falcon
patent aspen
pure falcon
#

All I'm saying is, if people like emojis and bolded words or whatever, then you have to accept that, regardless of whether you think they're dumb or meaningless

#

In fact, we're seeing right now why style control is a mistake for LMArena's clients

whole wagon
#

It's to prevent a race to the bottom where every model turns into sycophant emoji slop machine

pure falcon
#

OpenAI announced that they would make gpt-5's personality “warmer”

#

Because it's obviously been too “style-control” maxed

pure falcon
scenic salmon
#

Trying to work their way back up the charts

pure falcon
#

Again, the more you control for user preferences, the more you turn into any other standard benchmark

#

Making LMArena useless

hollow imp
patent aspen
# pure falcon Making LMArena useless

The thing is: they're already controlling for a bunch of other things. Style control is controversial because it's a visible option. The other controls aren't because they aren't visible.

pure falcon
pure falcon
frigid coral
pure falcon
pure falcon
pure falcon
patent aspen
frigid coral
scenic salmon
pure falcon
pure falcon
ornate agate
#

Yeah filtering scam submissions seems like it's not a manipulation

pure falcon
#

Also, the models vary wildly by language. 50% of LMArena's queries are not in English

patent aspen
#

I don't think style control is manipulation either. I think it's just a slightly controversial option with compelling pros and cons to consider

frigid coral
ornate agate
pure falcon
leaden palm
frigid coral
hollow imp
#

@pure falcon talk to @echo aurora about all this

ornate agate
#

So filtering scam submissions seems to be a necessary baseline function of the website or it would be a joke given how popular it is. Style control isn't, it's a choice to have it or not. I'm not sure it's up to date for recent models

patent aspen
#

Yeah I mean personally I'm not the biggest fan of style control, although I think it has pros and cons and is useful as an option. You get some form of model provider gaming either way

leaden palm
#

Or it can increase sycophancy but can't induce EQ

ornate agate
patent aspen
#

Yeah I kind of want all the models to have fancy markdown now

whole wagon
#

Removing style control would cook openAI

leaden palm
pure falcon
ornate agate
#

This is gonna happen, it's gonna answer 42 to everything really quickly unless you ask it something sufficiently interesting.

#

But well, such a system will be in the somewhat far future

pure falcon
pure falcon
patent aspen
#

IMO a few models will eventually pull ahead by a wider margin at which point the style control problem will gradually become moot because the best models will be able to win even with extensive markdown

whole wagon
#

Style control helped in the llama 4 situation which was favourable

pure falcon
#

Why would they do that though? They care much more about raw user preferences. There are plenty of other benchmarks (internal and external) that can measure raw capabilities for specialty / niche areas

ornate agate
#

I don't think lmarena is just user preference. It's a whole bag of stuff

pure falcon
whole wagon
pure falcon
#

I really would like to know, genuinely

ornate agate
#

I mean I don't care about what value it provides to model makers. I care about what value it provides to the community.

whole wagon
#

If they really would like to see no style control. They simply press remove style control, simple

pure falcon
#

Sure, but you could argue the exact opposite, right? If no style control is what matters more, then why is style control the default?

patent aspen
#

fwiw I also prefer no style control as the default

pure falcon
#

They are propped up by VC funding

ornate agate
#

I also prefer no style control.

ornate agate
pure falcon
#

Who are the leaderboards most valuable to? I'd argue it's to companies like OpenAI who use their leaderboard to determine which model to deploy. Which is exactly what they did for GPT-5

#

There was an alternate gpt-5 variant named “zenith” that was tested in late July

#

OpenAI ended up going with the “summit” variant because it did better in LMArena

#

If they didn't already, OpenAI would pay a lotttt of $$$$ for that data

#

Which means

#

For LMArena

#

Their primary customer and purpose, is the model maker

gritty stirrup
#

Why does Claude Opus have a limit on lmarena?

warm fulcrum
#

and every model has a limit

gritty stirrup
# warm fulcrum and every model has a limit

I didn't notice that other models were reaching the limit, but maybe you're right and I just didn't reach it. I didn't write that many messages and I already have a limit.

patent aspen
#

@echo aurora We've been discussing whether or not LMArena should have style control enabled by default. I know this topic has been beaten to death, although I'm curious to understand LMArena's take

leaden palm
# pure falcon But thats the whole question right? Is LMArena measuring intelligence? Is it SUP...

that isn't really what i meant - i was saying that a leaderboard that measures the "core" of a model might be interesting to look at

"If an ASI model came out tomorrow but was a total a- hole that everyone hated, where would you want that model to be ranked on the leaderboard?" if it's better than the average model prompted to act that way, and if it can be prompted to act like whatever other model, then on a leaderboard designed to measure the core - the parts of the model that aren't superficial - it would rightfully take first place

patent aspen
#

Personally I think there are pros and cons and prefer no style control by default because I see LMArena as primarily measuring user preference

pure falcon
#

Do you see what I'm getting at? The more you control for, the more you approach benchmarks that already exist. That defeats the purpose of LMArena IMO

leaden palm
patent aspen
#

I think the best argument for style control is if your goal is to be the best all-in-one measure of intelligence in the areas that users care about

ornate agate
#

Intelligence is just a vibe

echo aurora
#

I could be wrong

ornate agate
#

So style control is a vibe realignment from the default one, more towards a different vibe that people might want more…

patent aspen
leaden sun
#

I think less control is better

#

but it depends on what purpose style control serves and what you want to test that you need style control as a normalizer

ornate agate
ornate agate
pure falcon
# leaden palm > What does intelligence even mean? what can't be reduced by controlling for att...

Why is communication not counted in your “intelligence” definition? Are comedians not “intelligent” in your eyes? Are entertainers not “intelligent”? Intelligence is not just about solving problems and puzzles. That's a very narrow and limited view of intelligence. A model highly skilled in communication is, in fact, intelligent, just in a Different way than Einstein. Intentionally dismissing those capabilities (by “controlling” for it) makes no sense.

leaden palm
#

there are actually a few ways to do this though and all have some problems tbh

#
  • prompt all models to act a fixed way
  • prompt one model to imitate the structure of another model
  • identify the traits of each model's responses and control for each trait (implicitly assuming that each trait is easy to reproduce)
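The third option in the list (identify a trait and control for it) can be implemented by adding the trait as an extra term in the Bradley-Terry logistic regression that turns pairwise votes into ratings. The sketch below is a minimal illustration under stated assumptions: a single normalized style feature (for example a length difference), plain gradient ascent, and a small L2 penalty. The function name, feature, learning rate and penalty are hypothetical choices, not LMArena's actual pipeline.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fit_bt_with_style(battles, models, epochs=2000, lr=0.1, l2=0.01):
    """Bradley-Terry logistic regression with one style covariate (hypothetical helper).

    battles: list of (model_a, model_b, style_diff, a_won), where style_diff is a
    normalized style feature of A's answer minus B's (e.g. length) and a_won is 1 or 0.
    Model: P(A wins) = sigmoid(s[A] - s[B] + gamma * style_diff).
    Fitted by plain gradient ascent on the mean log-likelihood with a small L2
    penalty (which also pins down the otherwise free overall offset of s).
    """
    strength = {m: 0.0 for m in models}
    gamma = 0.0  # how much the style feature alone shifts the win odds
    n = len(battles)
    for _ in range(epochs):
        grad_s = {m: -l2 * strength[m] for m in models}
        grad_g = -l2 * gamma
        for a, b, diff, a_won in battles:
            p = sigmoid(strength[a] - strength[b] + gamma * diff)
            err = a_won - p  # d(log-likelihood)/d(logit) for this battle
            grad_s[a] += err / n
            grad_s[b] -= err / n
            grad_g += err * diff / n
        for m in models:
            strength[m] += lr * grad_s[m]
        gamma += lr * grad_g
    return strength, gamma

# Synthetic battles: the longer answer (diff = +1, always side A here) wins 75% of
# the time, and each model is the longer one equally often.
battles = (
    [("X", "Y", 1.0, 1)] * 3 + [("X", "Y", 1.0, 0)]
    + [("Y", "X", 1.0, 1)] * 3 + [("Y", "X", 1.0, 0)]
)
strength, gamma = fit_bt_with_style(battles, ["X", "Y"])
print(round(gamma, 2), round(strength["X"] - strength["Y"], 6))
```

In this toy data the longer answer wins 75% of the time no matter which model wrote it, so the style coefficient absorbs that advantage (landing a bit below ln 3, the unpenalized log-odds of 0.75) and the two models come out with equal strength. Without the style term, a model that simply writes longer answers more often would get that credit instead.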
ornate agate
#

If so who sets them?

leaden palm
#

idk 100% though, i don't develop the arena

ornate agate
#

It's not suspicious on its own. All chat apps have a (often large) system prompt

ornate agate
leaden palm
#

i believe so

#

eg the o series needs something like enable markdown formatting iirc

ornate agate
#

Ah ok. That makes sense

scenic salmon
#

They added it pretty quick

tired herald
#

Damn, I should add this to my extension too

misty vault
#

But most importantly: @deep adder said it is the best by far

tired herald
echo aurora
hollow imp
#

@leaden palm please some prompt engineering tips

tired herald
# ornate agate Btw do llm arena models have system prompts?

From what I checked, it seems that its either a very very simple one, none at all, or a base system prompt from the provider (gemini gave me a believable system prompt when I ehhm forc- made it give me its system prompt, all other ai's said they didnt have one/gave me mine back, which prob means mine was the only one)

#

Tho I cant confirm that unless I had access to the internals of LMArena

ornate agate
golden ocean
willow grail
#

IIRC cursor had big context issues 1 year ago. is this still the case, now with gpt5 high?

fervent tangle
#

why do OpenAI models suck so bad at creative writing

#

and why is Claude so good at it

wintry tinsel
rugged swan
#

new here, excited to learn from you all

fervent tangle
#

🥀 fr

wintry tinsel
#

🔥

misty vault
#

paws stop typing

fervent tangle
#

i've used all the openai models since 2023, and even GPT5 sucks today

#

and the emoji blasting is corny

#

i hate the emoji responses

jade egret
#

👀

quiet dust
fervent tangle
#

and its game over

jade egret
#

fr

fervent tangle
#

google is actually the winner, cuz they got all the money in the world + all resources

#

it is actually

jade egret
#

?

#

when u have no money, no resources and you're gonna win?

fervent tangle
#

google has the best image, chat and video models rn

#

wdym it isn't

quiet dust
#

Guys, does anyone know how to use the Thinking model on the phone in ChatGPT, not Thinking mini? It's just that the Thinking mini model provides the "Think longer" function.

jade egret
fervent tangle
#

they literally cant lose if they do that

jade egret
#

google?

wintry tinsel
fervent tangle
quiet dust
jade egret
#

google can win

fervent tangle
jade egret
jade egret
#

😭

patent aspen
#

Google has a much, much longer runway than OAI, and it's not even close

fervent tangle
#

and in a few months they'll release Gemini 3

jade egret
#

xAI seriously

fervent tangle
#

it will crush the coding benchmarks

#

fr

jade egret
#

why u hating on google tho

patent aspen
#

And eventually sheer capability just wins

jade egret
#

wdym no product

quiet dust
fervent tangle
#

they'll use it if its the best at all stuff

patent aspen
#

OpenAI is far less innovative than Google, and it's not even close

fervent tangle
#

doesn't even matter, they get their income from other stuff like google search, youtube and all google owned markets

#

they can throw money on AI until they get big

jade egret
#

they can just put gemini in chrome which they already started?

patent aspen
#

The wider the gulf in the capability, the less product polish matters

fervent tangle
#

u know the google cfo?

jade egret
#

if you count all the x users?

fervent tangle
#

cuz its "not woke"

jade egret
#

well then why not include AI Overviews and AI Mode

fervent tangle
#

and u can easily jailbreak grok 4

patent aspen
#

I mean if you're going to make that argument, you have to include AI Overviews

fervent tangle
#

grok 4 is easily controlled unlike claude, openai and google models

jade egret
#

grok turned into a h*tler fan 😭

fervent tangle
jade egret
fervent tangle
#

tbh even tho claude 4.1 opus is the best at coding and creative writing, the majority of people still use chatgpt

fervent tangle
#

because normal people dont care which AI is the best

#

they just download chatgpt and use it

jade egret
#

still think google can win tho

fervent tangle
#

we're like 0.01% of all humans

#

if ur into AI and coding

#

the majority follows the slop

#

they dont care if its good or bad

sullen quest
#

honestly it might be better than some earlier versions, where it pretended it was writing an epic fantasy tale

#

use Google AI Studio if you are using 2.5 pro

jade egret
#

the gemini is different?

sullen quest
#

yah, it's not surprising that the api version has fewer instructions than the non-api version though

#

I have no clue what's going on with the front facing version since it never can act normal

patent aspen
#

I'm going to be honest. The ChatGPT web UI looks just like the Gemini web UI except with worse iconography

keen beacon
patent aspen
#

The responses?

#

You've got a few weeks at best

fervent tangle
patent aspen
#

It doesn't matter while the quality gap is small. It matters more as the gap widens

#

Nah they're about 1/3 to 1/2 a generation behind

#

Capability

#

Nah

#

Better in some areas. Worse in others. Overall parity

keen beacon
patent aspen
#

But the point is that their pace isn't enough

#

They're roughly 1/3 to 1/2 a generation behind

#

Nah

#

Look if you want to compare bad ChatGPT responses vs bad Gemini responses, we can do that. I just don't think it would be a useful conversation

#

Do you just not like the bold sections?

keen beacon
#

Guys

patent aspen
#

I'm not sure what to say. Most people prefer paragraphs with bold headers

#

I do think verbosity is on the chopping block though

keen beacon
#

Did you know that you can compare the answers of SOTA and non-SOTA models to determine if the models you ask are right?

#

You just ask, say, Qwen2.5 the same question 10 times in a row

#

Then Qwen3

#

And if the responses tend to differ, one LLM is clearly in the wrong here
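A minimal sketch of that repeated-question check, assuming each model is wrapped in a plain `ask(question) -> str` function (a hypothetical stand-in for whatever chat client or API you use) and that answers can be compared after simple normalization. It counts how often each answer comes back and reports the majority answer and its agreement rate per model. Agreement only flags disagreement; a model can also be consistently wrong, so this doesn't by itself tell you which answer is correct.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, Dict, Tuple

Ask = Callable[[str], str]  # hypothetical wrapper around a model's chat/API call


def majority_answer(ask: Ask, question: str, n: int = 10) -> Tuple[str, float]:
    """Ask the same question n times; return the most common answer and its share."""
    counts = Counter(ask(question).strip().lower() for _ in range(n))
    answer, hits = counts.most_common(1)[0]
    return answer, hits / n


def compare_models(ask_a: Ask, ask_b: Ask, question: str, n: int = 10) -> Dict[str, object]:
    top_a, rate_a = majority_answer(ask_a, question, n)
    top_b, rate_b = majority_answer(ask_b, question, n)
    return {"a": (top_a, rate_a), "b": (top_b, rate_b), "same_majority": top_a == top_b}


# Stub models with canned answers, standing in for e.g. an older and a newer model.
older = cycle(["C major", "A minor", "C major", "G major", "C major"])
newer = cycle(["C major"])
result = compare_models(lambda q: next(older), lambda q: next(newer), "What key is this?")
print(result)
```

Here both stubs agree on the majority answer, but the older one only gives it 6 times out of 10; a low agreement rate or a different majority answer is the signal worth checking by hand.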

misty vault
#

distilling from sydney fine tune before it gets shut down

uneven lance
#

'Ello guys, is there any site that generates videos using Veo3 for free?

rare python
# patent aspen Do you just not like the bold sections?

For me, I don't care about the writing style as long as the LLM I'm using is flawless at instruction following and consistent over long context. Gemini 2.5 Pro did very well for the first 10 messages, then it broke and went back to bullet lists, which I had banned in the system instructions.

verbal nimbus
#

Default ChatGPT version of GPT-5 is on there now

rare python
verbal nimbus
#

Lower than GPT-4.5 too

rare python
#

This is killing me 😩 Please ship faster and better quality

#

maybe the thinking one, but the chat version is 💩

#

chat is the non-thinking version, right? It's just GPT 4.1 in a cloak

#

and it's really bad at IF

elder solar
#

is there any api that has all 3 versions of gpt 5

rare python
#

I don't care about benchmarks, I only care how well it follows my own system prompt

#

livebench rarely matches my real world usage

#

then why did you show it as proof of how good GPT 5 is at IF 😭 💀

elder solar
#

there aren't a lot of trustworthy ai benchmarks

#

or prob it depends on the prompt

rare python
elder solar
#

who knows

rare python
#

🗿

elder solar
#

however

#

i still hope there's another ai model that can listen to audio

#

that is not gemini

inner gate
#

Gemini can hear u?

elder solar
#

but there should be more ai models that can do it

#

like chatgpt

fast halo
#

only me who takes ages to load a response?

jade egret
#

gemini?

#

or direct chat

fast halo
#

direct

lucid kite
#

Good evening! I’m new here. Could you please let me know if you plan to add video generation to the site in the future?

sinful vessel
#

what happened to the nano banana model?

#

was it removed?

#

It doesn't show up for me anymore; before, it appeared all the time in battle

drifting thorn
#

I guess nano banana is going to be released

#

I hope so

whole wagon
#

Is Google cooking smth up

#

Apparently they had a celebration when they saw GPT5 performance

#

Lol

wind vector
#

just curious, but why limit video generation / arena to discord? discord ai stuff is the worst lol

whole wagon
hot lance
#

image to video is too 💩

hot lance
wind vector
echo aurora
echo aurora
#

We're taking all of this feedback into account though! So it may change.

plucky palm
#

plus i like the community

echo aurora
echo aurora
#

thinking vs non-thinking versions

stray aspen
#

any gemini 3 news

verbal nimbus
#

Hopefully their next model is better at agentic coding

drifting thorn
#

I hope Gemini 3.0 can further extend its context window for even better agentic usage and tool use. Better creative writing is also something a lot of AIRP users are anticipating

#

Gemini 2.5 Pro still holds the SOTA in long-form multi-turn creative writing in actual usage, but I wish they could make it even better

wind vector
#

They need to improve attention to the context they have, imo

#

Especially for creative writing, airp, or even coding in some cases. Approaching 200k tokens, you see a ton of AI-isms / quality degradation / "amnesia" creeping in

drifting thorn
#

YES

wind vector
#

Seems like a hard problem to solve, though

drifting thorn
#

I mean, they have to be trained to have an awareness of "rounds of conversation", like "this context is from the first round", "this context is from the second round" etc

unique cave
#

which model is ranked first overall for this month?

lilac inlet
#

Guys! I'm sharing something I recently figured out with my non-coding mind. I mean, I don't even know HTML.

I just discovered a way to create unlimited web apps, widgets, desktop apps, and so on for free!

We need two tools: Lmarena and Weblmarena, that's it!

Any person from a non-technical background can easily create fully customised software at no cost, with the steps below that I discovered recently!

  1. Pick up a pen and a notebook and write down your desired app, mentioning every single detail that pops into your head. For this, remember the 5W1H rule (What, When, Where, Who, Why and How). If you have a better strategic model, you can use it here. It's totally up to you how you want to describe your app!

  2. Take photos of your notebook pages, upload them to any LLM, and ask it to transcribe the images or pull the contents from them!

  3. Once you have the text content, copy it, open LMArena, select the Qwen coding model and paste all of your content there. Also add your custom instructions for how you want it to be built. For example, the widget I made was a React JS component + TypeScript in a Node.js page; in the same way, give your instructions and ask for your desired output as single-page code!

  4. Once you have the code, open Weblmarena, paste the code in the text box and hit enter! There you go: it will take 2-4 minutes to render your code and show you a preview of your website!

  5. For iteration and bug fixing, head back to LMArena and open the Qwen coding thread with your previous chat. Now you can iterate on it, fix your errors and so on. Repeat step 4 and keep doing it until you have the desired output!

If anyone has a better way to execute such ideas for a non-technical person, kindly share it. It would be really helpful for a newbie like me who knows nothing about tech stuff! 🤗

frigid coral
#

Gemini 2.5 Pro highest in sycophancy

verbal nimbus
#

It's even lower than 4o here

frigid coral
#

claude seems to perform really badly at mania tests

keen beacon
#

It's been only a week since I found out about LMArena, and I've jailbroken two things already

#

Bruh

fervent tangle
#

and they're still the same

keen beacon
#

Kinda worrisome tbh

#

Turns out most people would rather have sycophantic delusion-inducing machines than truth-seeking arbiters such as GPT-5

#

I guess it explains some ratings pretty well

white hatch
#

I hate the safety!

leaden sun
#

what's the definition of "safety" here?

solid brook
leaden sun
quiet dust
#

Why don't I have a button where I can change the GPT-5 model?

#

I heard you'll be able to switch between Fast, Auto, and Thinking. But how?

autumn cargo
#

Why can models with lower scores end up higher? Isn't CI a symmetric thing?

leaden sun
keen beacon
keen beacon
#

and now because of that minority, OpenAI is making GPT 5 "warmer"

leaden sun
#

hahahah

ocean vortex
reef bridge
#

can't lmarena do function calls in chat other than search?

calm trout
#

Is the video generation here powered by Veo 3?

brittle tiger
copper furnace
#

Hello im preatty new here, are there any limits on how many images/videos you can generate?

#

*pretty

past shuttle
#

gemini king!

exotic nebula
exotic nebula
copper furnace
hollow imp
plucky island
ionic idol
#

not working

hollow imp
full burrow
#

Looking forward to seeing nano-banana on the chart.

plucky island
drifting thorn
hollow imp
#

And they already have veo

scenic salmon
#

They ruined it…

stray aspen
#

Lmao

native gate
#

Error: Something went wrong while generating the response. Please try again. This happens every time I paste some long PHP code

stray aspen
#

@tired herald what's the fix

keen beacon
neon idol
#

Hello

neon idol
keen beacon
neon idol
#

I want deepseek r2

#

But they nerfed it 🙏

keen beacon
#

aka too positive and eager to please

#

but OpenAI is doing it

#

stated in a tweet

neon idol
#

Me still waiting for Gemini 3 and r2 💀🙏

keen beacon
#

in January

neon idol
#

Yesh I hope

#

WE WANT THE RED CHINA DRAGON ATTACK 🗣🔥

hollow imp
#

🔥

keen beacon
#

To my surprise, Qwen and DeepSeek answer the same question better when asked through their chat apps than through the API. DeepSeek and Qwen answer my music theory question correctly 7-8 times out of 10, and DeepSeek even reaches an almost-correct conclusion one more time, which makes it more like 8.5-9/10 if I'm more liberal with scoring.

On LMArena, when stress-tested with the same prompt 10 times, they both score 3-4/10 at best.
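The scoring arithmetic in that message can be sketched as follows, assuming each of the 10 runs is hand-labeled as correct, partially correct, or wrong. The labels and the `partial_credit` weight are hypothetical; the point is only that 8 correct runs plus one almost-correct run come out to 8/10 under strict scoring, 8.5/10 with half credit, and 9/10 with full credit.

```python
from typing import List

# Hypothetical hand-labels for 10 runs of the same prompt.
runs: List[str] = ["correct"] * 8 + ["partial"] + ["wrong"]


def score(labels: List[str], partial_credit: float) -> float:
    """One point per correct run plus partial_credit per almost-correct run."""
    points = {"correct": 1.0, "partial": partial_credit, "wrong": 0.0}
    return sum(points[label] for label in labels)


for weight in (0.0, 0.5, 1.0):
    print(f"partial credit {weight}: {score(runs, weight)}/{len(runs)}")
```

The same function applied to the LMArena runs would make the two settings directly comparable, as long as the labeling rules stay the same.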

#

It seems like the workload can influence performance.

hollow imp
keen beacon
#

OpenAI is the most demanded AI startup on this planet either way

#

They know how to scale their toys

#

Deepseek however seems to know about it a bit less

#

Holy hell, if the chat version of Deepseek is actually that much better than the API version, then I can only imagine what is going to happen once they rent enough compute to run R2

#

It's not there (the o3/GPT 5/Gemini 2.5 level), but it is very, very close

obsidian cargo
#

I want Deepseek v4. V3 is the best model available on AI Dungeon atm but I'm a bit burned out of its limitations.

warped totem
#

hey, yesterday i lost all my chat sessions for some reason. can i retrieve them?

keen beacon
hollow imp
warped totem
#

is gpt 5 high any better than gemini 2.5?

sullen quest
#

depends on the gemini 2.5 and depends on the gpt 5

warped totem
#

what u mean

#

these on llm arena

sullen quest
#

There are multiple versions of gpt5: high, chat, high nano and high mini

#

I used to think gemini 2.5 was slow, then gpt 5 high came around

warped totem
#

so i said "gpt 5 high"

solid brook
solid brook
#

even gpt 5 medium

sullen quest
#

depends on the task

warped totem
#

most of the time i have a feeling gemini 2.5 gives better responses

#

and it's a little faster

sullen quest
#

I've heard that gemini 2.5 is better at different languages compared to gpt 5

ocean vortex
sullen quest
#

idk I don't speak other languages so I can't confirm

sullen quest
warped totem
#

yea, that's true, sometimes he makes typos

keen beacon
ocean vortex
keen beacon
#

Teortaxes wrote about it being fake news and so on. But here in Russia we actually have this exact problem, and speaking as a Russian, he sounds like massive cope, to be honest.

#

But really

solid brook
#

dude gpt 5 high is a lot better than the lobotomized gemini 2.5 we got

sullen quest
keen beacon
# keen beacon But really

We in Russia learnt to evade this problem by importing foreign hardware and rebranding it as domestically produced

solid brook
#

the benchmarks you see are from the original gemini 2.5 pro that was not lobotomized (the first month of release)

ocean vortex
ocean vortex
keen beacon