#general | Arena | Page 98

coarse glade Aug 15, 2025, 5:14 PM

#

Are you a admin

modest prism Aug 15, 2025, 5:14 PM

#

The API does not have those system prompts which tell the models which version they are.

stray aspen Aug 15, 2025, 5:14 PM

#

no

errant cave Aug 15, 2025, 5:14 PM

#

From chatgpt.com

stray aspen Aug 15, 2025, 5:14 PM

#

everyone knows that

errant cave Aug 15, 2025, 5:15 PM

#

whole wagon Aug 15, 2025, 5:15 PM

#

Ngl. I'm not really seeing much point to the cheaper openAI models except so openAI saves money. Because we have stronger open source alternatives that are cheaper also

tired herald Aug 15, 2025, 5:15 PM

#

#

ngl

stray aspen Aug 15, 2025, 5:15 PM

#

you have system prompts lmao

tired herald Aug 15, 2025, 5:15 PM

#

uhmmm, idk what you might even be referring to hmmmm

#

its a very interesting hypothesis though

ocean vortex Aug 15, 2025, 5:16 PM

#

It is not "better". It's just finetuned more aggressively for the style and formatting users prefer

tired herald Aug 15, 2025, 5:16 PM

#

I commend you for your wisdom

coarse glade Aug 15, 2025, 5:16 PM

#

whole wagon Aug 15, 2025, 5:16 PM

#

Why use GPT5 mini when GLM 4.5 exists for $0.5/$2

coarse glade Aug 15, 2025, 5:16 PM

#

Can u help me fix this pls

stray aspen Aug 15, 2025, 5:16 PM

#

coarse glade Can u help me fix this pls

why dude

#

just use it

coarse glade Aug 15, 2025, 5:16 PM

#

tired herald

Since it worked for him

stray aspen Aug 15, 2025, 5:16 PM

#

he has system prompts

tired herald Aug 15, 2025, 5:16 PM

#

I have a system prompt

stray aspen Aug 15, 2025, 5:16 PM

#

hes making a plugin

errant cave Aug 15, 2025, 5:16 PM

#

coarse glade Can u help me fix this pls

It's an issue with OpenAI, ask them

tired herald Aug 15, 2025, 5:16 PM

#

gentle plinth Aug 15, 2025, 5:16 PM

#

it should, but it needs some work for getting it to run

ocean vortex Aug 15, 2025, 5:16 PM

#

whole wagon Why use GPT5 mini when GLM 4.5 exists for $0.5/$2

because gpt5-mini-high destroys glm

gentle plinth Aug 15, 2025, 5:17 PM

#

if youve never installed react it can be complicated for some

whole wagon Aug 15, 2025, 5:17 PM

#

I don't think so

#

I tried both

coarse glade Aug 15, 2025, 5:17 PM

#

Ok so now it should work for me right

hollow imp Aug 15, 2025, 5:17 PM

#

gentle plinth it should, but it needs some work for getting it to run

What do I say to it to run?

gentle plinth Aug 15, 2025, 5:17 PM

#

hollow imp What do I say to it to run?

it cant you have to do it yourself

coarse glade Aug 15, 2025, 5:17 PM

#

Ok sorry for saying you’re lying you guys should have a section for help

#

Like if something goes wrong

ocean vortex Aug 15, 2025, 5:17 PM

#

whole wagon I tried both

gpt5-mini-high is essentially o4-mini-high except better in every way

hollow imp Aug 15, 2025, 5:18 PM

#

gentle plinth it cant you have to do it yourself

You said it needs some work. What prompt do I send for it to that "work"

modest prism Aug 15, 2025, 5:18 PM

#

coarse glade Can u help me fix this pls

Why do you need the model to tell you it's gpt 5 if you already know it's gpt 5 ?

ocean vortex Aug 15, 2025, 5:18 PM

#

So glm has no chance...

whole wagon Aug 15, 2025, 5:18 PM

#

What the hell GPT5 dropped so much in the leaderboards

gentle plinth Aug 15, 2025, 5:18 PM

#

hollow imp You said it needs some work. What prompt do I send for it to that "work"

after it generated the code, just ask it for step by step instruction for how to run it on your machine, tell it if you have windows 11/mac or whatever

whole wagon Aug 15, 2025, 5:18 PM

#

How did it drop that many Elos in short duration of time

tired herald Aug 15, 2025, 5:18 PM

#

whole wagon What the hell GPT5 dropped so much in the leaderboards

its garbage after all

hollow imp Aug 15, 2025, 5:18 PM

#

gentle plinth after it generated the code, just ask it for step by step instruction for how to...

I have android 12

gusty helm Aug 15, 2025, 5:19 PM

#

ask grok 4 same q, he will say he is grok 2 or 3 😄

gentle plinth Aug 15, 2025, 5:19 PM

#

hollow imp I have android 12

not impossible, but complicated xD

coarse glade Aug 15, 2025, 5:19 PM

#

modest prism Why do you need the model to tell you it's gpt 5 if you already know it's gpt 5 ...

Beicase a lot of websites are lying nowadays and putting gpt 4o even though the person wants gpt 5

stray aspen Aug 15, 2025, 5:19 PM

#

are you kidding me

ocean vortex Aug 15, 2025, 5:19 PM

#

whole wagon How did it drop that many Elos in short duration of time

it's still nr1

stray aspen Aug 15, 2025, 5:19 PM

#

guess ill use opus

whole wagon Aug 15, 2025, 5:19 PM

#

ocean vortex it's still nr1

By 6 Elo lol

hollow imp Aug 15, 2025, 5:19 PM

#

gentle plinth not impossible, but complicated xD

Like I just want it to create a website according to my customisations and instructions and it should be runnable on chrome

whole wagon Aug 15, 2025, 5:19 PM

#

It's within error

ocean vortex Aug 15, 2025, 5:19 PM

#

whole wagon Aug 15, 2025, 5:19 PM

#

To 2.5 pro kek

coarse glade Aug 15, 2025, 5:19 PM

#

tired herald

How did it work for this person can someone help me pls

#

Can someone help me by dm me

gentle plinth Aug 15, 2025, 5:20 PM

#

hollow imp Like I just want it to create a website according to my customisations and instr...

https://v0.app/

v0 by Vercel

Your collaborative AI assistant to design, iterate, and scale full-stack applications for the web.

whole wagon Aug 15, 2025, 5:20 PM

#

modest prism Aug 15, 2025, 5:20 PM

#

ocean vortex

Grok 4 deserves way better. It's underrated.

ocean vortex Aug 15, 2025, 5:20 PM

#

Google is always benchmaxxing for lmarena

gentle plinth Aug 15, 2025, 5:20 PM

#

gentle plinth https://v0.app/

maybe try this then

ocean vortex Aug 15, 2025, 5:20 PM

#

modest prism Grok 4 deserves way better. It's underrated.

Grok is kinda sh'it

hollow imp Aug 15, 2025, 5:20 PM

#

gentle plinth https://v0.app/

Paste your prompt here?

coarse glade Aug 15, 2025, 5:20 PM

#

coarse glade How did it work for this person can someone help me pls

Can someone help pls

gentle plinth Aug 15, 2025, 5:20 PM

#

hollow imp Paste your prompt here?

no

#

just what you want

hollow imp Aug 15, 2025, 5:20 PM

#

Told you

gentle plinth Aug 15, 2025, 5:21 PM

#

the prompt is only for direct chat

modest prism Aug 15, 2025, 5:21 PM

#

coarse glade Can someone help pls

Open lmarena in incognito tab. It might help.

tired herald Aug 15, 2025, 5:21 PM

#

coarse glade How did it work for this person can someone help me pls

System Prompts, I have, you do not

torn mantle Aug 15, 2025, 5:21 PM

#

modest prism Grok 4 deserves way better. It's underrated.

sorry but it deserves that spot

#

its not that good

coarse glade Aug 15, 2025, 5:21 PM

#

modest prism Open lmarena in incognito tab. It might help.

Oh ok ty for helping

hollow imp Aug 15, 2025, 5:21 PM

#

tired herald System Prompts, I have, you do not

System prompts in lmarena how

#

Can you tell

tired herald Aug 15, 2025, 5:21 PM

#

You will soon have System Prompts @coarse glade

#

Soon

torn mantle Aug 15, 2025, 5:21 PM

#

im starting to think that @modest prism is close to elon or some xai staff somehow

tired herald Aug 15, 2025, 5:21 PM

#

very soon

whole wagon Aug 15, 2025, 5:22 PM

#

The market gives grok better odds than openAI for December

coarse glade Aug 15, 2025, 5:22 PM

#

tired herald You will soon have System Prompts <@1249135911044776071>

Oh are u guys going to give it to everyone

tired herald Aug 15, 2025, 5:22 PM

#

yes

#

and im alone btw

stray aspen Aug 15, 2025, 5:22 PM

#

whos funding your project

tired herald Aug 15, 2025, 5:22 PM

#

no one

#

im doing for fun

hollow imp Aug 15, 2025, 5:23 PM

#

Lmarena head

modest prism Aug 15, 2025, 5:23 PM

#

torn mantle im starting to think that <@594853802918674442> is close to elon or some xai sta...

I'm not, grok 4 is really bad at coding, but at solving some deep math stuff it's really good.

hollow imp Aug 15, 2025, 5:23 PM

#

Mr.@tired herald Head of Department, system prompts, lmarena.ai

whole wagon Aug 15, 2025, 5:23 PM

#

Isn't Gemini 3 just going to be insanely good. They literally need to add a tiny amount of perf for SOTA

#

They are already at the frontier with a comparatively old model

tired herald Aug 15, 2025, 5:24 PM

#

hollow imp Mr.<@1395809769947660389> Head of Department, system prompts, lmarena.ai

Nani 😭

torn mantle Aug 15, 2025, 5:24 PM

#

modest prism I'm not, grok 4 is really bad at coding, but at solving some deep math stuff it'...

yea you are sus

#

def sus

whole wagon Aug 15, 2025, 5:24 PM

#

And the 2 pro to 2.5 pro leap was immense

stray aspen Aug 15, 2025, 5:24 PM

#

whole wagon Isn't Gemini 3 just going to be insanely good. They literally need to add a tiny...

im confident it will be SotA

tired herald Aug 15, 2025, 5:24 PM

#

I would love to release this already and send it to the LMArena team so they can improve their website

stray aspen Aug 15, 2025, 5:24 PM

#

if gemini 2.5 pro still holds its ground even now

#

with the new models

modest prism Aug 15, 2025, 5:25 PM

#

torn mantle yea you are sus

The Arc-Agi benchmark doesn't lie

hollow imp Aug 15, 2025, 5:25 PM

#

2.5 pro experimental shooked the world

torn mantle Aug 15, 2025, 5:25 PM

#

modest prism The Arc-Agi benchmark doesn't lie

why? they could've finetuned even if they only have the public dataset

whole wagon Aug 15, 2025, 5:25 PM

#

Grok is intelligent if you prompt it right. Issue is it takes 10 mins thinking then

gentle plinth Aug 15, 2025, 5:25 PM

#

whole wagon

the win-rate to gemini2.5 pro also doesnt look good (was already there before the update)

torn mantle Aug 15, 2025, 5:25 PM

#

idk grok 4 vibes are off

whole wagon Aug 15, 2025, 5:25 PM

#

gentle plinth the win-rate to gemini2.5 pro also doesnt look good (was already there before th...

💀

hollow imp Aug 15, 2025, 5:25 PM

#

whole wagon Grok is intelligent if you prompt it right. Issue is it takes 10 mins thinking t...

Give grok 4 prompting guide

ocean vortex Aug 15, 2025, 5:26 PM

#

whole wagon They are already at the frontier with a comparatively old model

They are not and they have some catching up to do

coarse glade Aug 15, 2025, 5:26 PM

#

And also are u guys going to give system prompts for the Public soon which will be nice ty pineapple for everything

torn mantle Aug 15, 2025, 5:26 PM

#

ocean vortex They are not and they have some catching up to do

we talked about how flawed this metric is

hollow imp Aug 15, 2025, 5:26 PM

#

ocean vortex They are not and they have some catching up to do

All of these models are available for free 🤑

whole wagon Aug 15, 2025, 5:26 PM

#

And it's -4 points also

#

Lol

ocean vortex Aug 15, 2025, 5:27 PM

#

torn mantle we talked about how flawed this metric is

It's not really flawed just because you don't like it lol

tired herald Aug 15, 2025, 5:27 PM

#

coarse glade And also are u guys going to give system prompts for the Public soon which will ...

They have more important things to work on right now

blazing bison Aug 15, 2025, 5:27 PM

#

Do you guys think that we gonna have gemini 3 this month?

coarse glade Aug 15, 2025, 5:27 PM

#

tired herald They have more important things to work on right now

Quick question are u an admin

whole wagon Aug 15, 2025, 5:27 PM

#

Ngl these GPT5 models are good at maths though. That isn't a Gemini strong point any longer

coarse glade Aug 15, 2025, 5:27 PM

#

blazing bison Do you guys think that we gonna have gemini 3 this month?

Yeah I have a feeling

blazing bison Aug 15, 2025, 5:28 PM

#

whole wagon Ngl these GPT5 models are good at maths though. That isn't a Gemini strong point...

Deepthink is good

ocean vortex Aug 15, 2025, 5:28 PM

#

2.5Pro is not SOTA anymore, like that's the reality

modest prism Aug 15, 2025, 5:28 PM

#

ocean vortex They are not and they have some catching up to do

Poor llama 4

coarse glade Aug 15, 2025, 5:28 PM

#

And also do u guys know that Claude is good too

tired herald Aug 15, 2025, 5:28 PM

#

coarse glade Quick question are u an admin

no

hollow imp Aug 15, 2025, 5:28 PM

#

gentle plinth Aug 15, 2025, 5:28 PM

#

blazing bison Deepthink is good

have you tested it?

blazing bison Aug 15, 2025, 5:28 PM

#

ocean vortex 2.5Pro is not SOTA anymore, like that's the reality

But still the best 1m context option

blazing bison Aug 15, 2025, 5:28 PM

#

gentle plinth have you tested it?

No

coarse glade Aug 15, 2025, 5:28 PM

#

ocean vortex 2.5Pro is not SOTA anymore, like that's the reality

Yeah that’s true

keen beacon Aug 15, 2025, 5:28 PM

#

google FUMBLED

#

its trash now

#

thats the reality

coarse glade Aug 15, 2025, 5:29 PM

#

Yeah 2.5 pro on ai google studio is bad now

tired herald Aug 15, 2025, 5:29 PM

#

hollow imp

temperature is something I checked if I can do it myself, but nope, which is sad

keen beacon Aug 15, 2025, 5:29 PM

#

idk wtf they did

whole wagon Aug 15, 2025, 5:29 PM

#

The initial 2.5 pro release was before o3 release

modest prism Aug 15, 2025, 5:29 PM

#

keen beacon its trash now

Their TPUs melting

stray aspen Aug 15, 2025, 5:29 PM

#

its not SotA but its not bad either

keen beacon Aug 15, 2025, 5:29 PM

#

but gpt-5 is amazing

tired herald Aug 15, 2025, 5:29 PM

#

hollow imp

pdf and docs are probably gonna come soon

coarse glade Aug 15, 2025, 5:29 PM

#

It isn’t up to date even with grounding with google search for ai google studio honestly

#

That’s why I stopped using it

hollow imp Aug 15, 2025, 5:29 PM

#

tired herald temperature is something I checked if I can do it myself, but nope, which is sad

How to set it on gemini web?

blazing bison Aug 15, 2025, 5:29 PM

#

whole wagon The initial 2.5 pro release was before o3 release

But it received so much updates

whole wagon Aug 15, 2025, 5:29 PM

#

IMO Gemini 3 is October time. Not any time soon

hollow imp Aug 15, 2025, 5:29 PM

#

Like you have Google ai ultra but no temparature setting access 🥀

coarse glade Aug 15, 2025, 5:29 PM

#

https://aistudio.google.com/welcome

Google AI Studio

The fastest path from prompt to production with Gemini

keen beacon Aug 15, 2025, 5:29 PM

#

and im the biggest google shill

coarse glade Aug 15, 2025, 5:30 PM

#

This is the website

keen beacon Aug 15, 2025, 5:30 PM

#

so if im telling you it

#

yk its true

solid brook Aug 15, 2025, 5:30 PM

#

Leaderbord does not make sense. Gpt 5 chat which is the worst model in gpt 5 because it is non reasoning is better than gpt 5 mini thinking which is a reasoning model

tired herald Aug 15, 2025, 5:30 PM

#

hollow imp How to set it on gemini web?

whole wagon Aug 15, 2025, 5:30 PM

#

Gemini 3 is October, grok 5 is december

keen beacon Aug 15, 2025, 5:30 PM

#

solid brook Leaderbord does not make sense. Gpt 5 chat which is the worst model in gpt 5 bec...

let it age

blazing bison Aug 15, 2025, 5:30 PM

#

They created like 30 different endpoints for gemini 2.5 pro

tired herald Aug 15, 2025, 5:30 PM

#

you need to open Advanced Settings for Top P

hollow imp Aug 15, 2025, 5:30 PM

#

tired herald

I SAID GEMINI WEB

#

Not ai studio

coarse glade Aug 15, 2025, 5:30 PM

#

Did u guys see that perplexity wants to buy google chrome

hollow imp Aug 15, 2025, 5:30 PM

#

😭

modest prism Aug 15, 2025, 5:30 PM

#

coarse glade It isn’t up to date even with grounding with google search for ai google studio ...

You have to manually tell the model to search the web and fetch the websites otherwise it won't do it by itself .

tired herald Aug 15, 2025, 5:30 PM

#

bot hare gemini web 😭

keen beacon Aug 15, 2025, 5:30 PM

#

whole wagon Gemini 3 is October, grok 5 is december

both mid rn

whole wagon Aug 15, 2025, 5:30 PM

#

blazing bison They created like 30 different endpoints for gemini 2.5 pro

That is just refinement. The march release was already good

blazing bison Aug 15, 2025, 5:31 PM

#

coarse glade Did u guys see that perplexity wants to buy google chrome

What

hollow imp Aug 15, 2025, 5:31 PM

#

tired herald bot hare gemini web 😭

You can't use deepthink or deep research on ai studio

coarse glade Aug 15, 2025, 5:31 PM

#

modest prism You have to manually tell the model to search the web and fetch the websites oth...

Yeah Ty

keen beacon Aug 15, 2025, 5:31 PM

#

Grok added NSFW to their image generator

coarse glade Aug 15, 2025, 5:31 PM

#

That’s nice of u

#

Ty so much @modest prism

keen beacon Aug 15, 2025, 5:31 PM

#

Thats their only latest update 💀

blazing bison Aug 15, 2025, 5:31 PM

#

Why have so many people active here today

hollow imp Aug 15, 2025, 5:31 PM

#

coarse glade Ty so much <@594853802918674442>

Why

blazing bison Aug 15, 2025, 5:31 PM

#

😆

stray aspen Aug 15, 2025, 5:31 PM

#

elon musk made a corn generator

tired herald Aug 15, 2025, 5:31 PM

#

hollow imp You can't use deepthink or deep research on ai studio

but no, its not possible to customize on gemini web

coarse glade Aug 15, 2025, 5:31 PM

#

blazing bison What

Yeah Austin Evans did an short about it

gentle plinth Aug 15, 2025, 5:31 PM

#

blazing bison Why have so many people active here today

leaderboard update xD

#

(i think)

blazing bison Aug 15, 2025, 5:31 PM

#

I want zenith back

tired herald Aug 15, 2025, 5:31 PM

#

there was one

keen beacon Aug 15, 2025, 5:31 PM

#

stray aspen elon musk made a corn generator

Same guy who thinks humans on mars is more feasible then fixing earth btw

#

Hes a cr@ckhead

hollow imp Aug 15, 2025, 5:31 PM

#

tired herald but no, its not possible to customize on gemini web

That's what I'm saying
You pay for Google ai ultra and you can't even change the temperature
🥀

#

😒

coarse glade Aug 15, 2025, 5:32 PM

#

https://youtu.be/pKwLS9PaeXY?si=3Ppha7opxh4o9CbR

YouTube

CNBC Television

Perplexity offers $34.5 billion for Google Chrome

Nilay Patel, The Verge editor-in-chief, joins 'Fast Money' to talk the latest moves in the AI arms race between Google and Apple. For access to live and exclusive video from CNBC subscribe to CNBC PRO: https://cnb.cx/42d859g

» Subscribe to CNBC TV: https://cnb.cx/SubscribeCNBCtelevision
» Subscribe to CNBC: https://cnb.cx/SubscribeCNBC
» Wat...

▶ Play video

tired herald Aug 15, 2025, 5:32 PM

#

I dont pay

whole wagon Aug 15, 2025, 5:32 PM

#

I heard xAI is actually building AI corn team. To refine the corn generation

solid brook Aug 15, 2025, 5:32 PM

#

Man I am so excited for gemini 3. Hope they don't nerf the model

tired herald Aug 15, 2025, 5:32 PM

#

I dont have ultra

keen beacon Aug 15, 2025, 5:32 PM

#

so many be yappin here

coarse glade Aug 15, 2025, 5:32 PM

#

solid brook Man I am so excited for gemini 3. Hope they don't nerf the model

Exactly

tired herald Aug 15, 2025, 5:32 PM

#

indeed

whole wagon Aug 15, 2025, 5:32 PM

#

whole wagon I heard xAI is actually building AI corn team. To refine the corn generation

I wonder if they actually using corn as training data or what

coarse glade Aug 15, 2025, 5:32 PM

#

Ok bye ty for everything @echo aurora

hollow imp Aug 15, 2025, 5:32 PM

#

hollow imp

Guyzzz

solid brook Aug 15, 2025, 5:32 PM

#

hollow imp That's what I'm saying You pay for Google ai ultra and you can't even change the...

Actually google ultra plan is not worth the price at all. You get like 10 prompts a day

#

With deepthink

hollow imp Aug 15, 2025, 5:33 PM

#

solid brook Actually google ultra plan is not worth the price at all. You get like 10 prompt...

And you can't even change the temparature parameter

#

I only use the Gemini web because of custom gems

modest prism Aug 15, 2025, 5:34 PM

#

solid brook Actually google ultra plan is not worth the price at all. You get like 10 prompt...

It's worth it if you are a TikToker or YouTuber thanks to high credit of Veo 3.

hollow imp Aug 15, 2025, 5:34 PM

#

modest prism It's worth it if you are a TikToker or YouTuber thanks to high credit of Veo 3.

8 sec videos only

#

And I have seen how 😭 veo3 is in video arena

tired herald Aug 15, 2025, 5:35 PM

#

omg why is it so hard to change the UI of LMArena

hollow imp Aug 15, 2025, 5:37 PM

#

tired herald omg why is it so hard to change the UI of LMArena

???

#

You can do it?

wicked osprey Aug 15, 2025, 5:37 PM

#

hi im new to here，i just wondering i got limited in my gemini after few question，is it unlimited in lm arena？

hollow imp Aug 15, 2025, 5:38 PM

#

No

tired herald Aug 15, 2025, 5:38 PM

#

yes, thats how I do everything 😭

hollow imp Aug 15, 2025, 5:38 PM

#

hollow imp You can do it?

@tired herald

#

How to change the ui of lmarenn lmarenn

wicked osprey Aug 15, 2025, 5:38 PM

#

so far only sometimes i got stuck in genarating page but haven't saw limited yet

hollow imp Aug 15, 2025, 5:38 PM

#

I never got limited in Google ai pro free trial

tired herald Aug 15, 2025, 5:39 PM

#

hollow imp How to change the ui of lmarenn lmarenn

not that hard lol

#

LMArna

wicked osprey Aug 15, 2025, 5:39 PM

#

it just going flash instead of pro sometimes

tired herald Aug 15, 2025, 5:39 PM

#

but why tf is the Logo a collection of SVG's

#

even the letters

wicked osprey Aug 15, 2025, 5:41 PM

#

i just feel it takes time to genarating compare gemini appwith pro😩

tired herald Aug 15, 2025, 5:42 PM

#

I have zero Idea why theres a visual of the old ui behind the new ui

burnt sinew Aug 15, 2025, 5:43 PM

#

when will video generation be added to the website

echo aurora Aug 15, 2025, 5:44 PM

#

burnt sinew when will video generation be added to the website

TBD if it will, but be sure to share feebdack in #bot-feedback

hollow imp Aug 15, 2025, 5:44 PM

#

To be donest

tired herald Aug 15, 2025, 5:45 PM

#

to be determined?

indigo hazel Aug 15, 2025, 5:46 PM

#

tired herald I have zero Idea why theres a visual of the old ui behind the new ui

i dont have this ui

ocean vortex Aug 15, 2025, 5:46 PM

#

hollow imp That's what I'm saying You pay for Google ai ultra and you can't even change the...

You are paying them $250...?

hollow imp Aug 15, 2025, 5:46 PM

#

ocean vortex You are paying them $250...?

I said previously I'm 15

tired herald Aug 15, 2025, 5:47 PM

#

indigo hazel i dont have this ui

im making this ui*

hollow imp Aug 15, 2025, 5:47 PM

#

I do not even receive any pocket money

burnt sinew Aug 15, 2025, 5:47 PM

#

echo aurora TBD if it will, but be sure to share feebdack in <#1398083208272412722>

can i say that lmarena should add video generation to the website in the feedback?

steep mirage Aug 15, 2025, 5:47 PM

#

Hey everyone,
I keep getting this Cloudflare "Security Verification" pop-up on LMArena (see screenshot).
Any ideas how to fix this persistent verification? It's really annoying!
Thanks!

echo aurora Aug 15, 2025, 5:47 PM

#

burnt sinew can i say that lmarena should add video generation to the website in the feedbac...

Yup!

echo aurora Aug 15, 2025, 5:48 PM

#

steep mirage Hey everyone, I keep getting this Cloudflare "Security Verification" pop-up on L...

Uh oh - are you getting the same regardless of browser type?

ocean vortex Aug 15, 2025, 5:48 PM

#

hollow imp I do not even receive any pocket money

Well you shouldn't pay for it even if you had the money tbh. I could subscribe today but I don't see the point in even their Pro sub...

hollow imp Aug 15, 2025, 5:48 PM

#

hollow imp

@echo aurora 😂

steep mirage Aug 15, 2025, 5:49 PM

#

echo aurora Uh oh - are you getting the same regardless of browser type?

Yes, I'm getting the same security verification message on both Brave and Chrome.

tired herald Aug 15, 2025, 5:50 PM

#

what happens when you try to verify

ocean vortex Aug 15, 2025, 5:50 PM

#

What exactly are you getting for their Pro plan than you don't already have with aistudio...? Not much at all

echo aurora Aug 15, 2025, 5:50 PM

#

steep mirage Yes, I'm getting the same security verification message on both Brave and Chrome...

Is the verificiation not working or just annoying and pops up a lot?

hollow imp Aug 15, 2025, 5:50 PM

#

ocean vortex Well you shouldn't pay for it even if you had the money tbh. I could subscribe t...

A wise man once said money multiplicates itself. Do you agree?

ocean vortex Aug 15, 2025, 5:50 PM

#

hollow imp A wise man once said money multiplicates itself. Do you agree?

Only if money is used wisely

#

Their gemini subs are useless

tired herald Aug 15, 2025, 5:51 PM

#

Oh, making the Model selector appear in the chat box is gonna be hell

steep mirage Aug 15, 2025, 5:51 PM

#

echo aurora Is the verificiation not working or just annoying and pops up a lot?

It appears every time I vote on a generated image.

hollow imp Aug 15, 2025, 5:52 PM

#

ocean vortex Their gemini subs are useless

I'm not talking Abt gemini subs at all

ocean vortex Aug 15, 2025, 5:52 PM

#

You may buy gemini sub if you want to support them or whatever, but then you shouldn't pretend it's a good value for the service 👀

hollow imp Aug 15, 2025, 5:52 PM

#

tired herald Oh, making the Model selector appear in the chat box is gonna be hell

Pls add pdf support please 😭🙏

echo aurora Aug 15, 2025, 5:52 PM

#

steep mirage It appears every time I vote on a generated image.

Okay I'll flag to the team. But to confirm -> the verifcation isn't failing? Just appearing too often

tired herald Aug 15, 2025, 5:52 PM

#

icons icons icons.....

#

atleast this works

tired herald Aug 15, 2025, 5:53 PM

#

hollow imp Pls add pdf support please 😭🙏

later

ocean vortex Aug 15, 2025, 5:53 PM

#

hollow imp I'm not talking Abt gemini subs at all

Well I am..

hollow imp Aug 15, 2025, 5:53 PM

#

tired herald later

I will do whatever you say just please add it 🙏🙏🙏

steep mirage Aug 15, 2025, 5:53 PM

#

echo aurora Okay I'll flag to the team. But to confirm -> the verifcation isn't failing? Jus...

Yes, that's correct. The security verification pops up every single time, but once I authenticate, I can use it again.

tired herald Aug 15, 2025, 5:54 PM

#

hollow imp I will do whatever you say just please add it 🙏🙏🙏

It's hard and tedious, ill do the easy things first

steep mirage Aug 15, 2025, 5:54 PM

#

echo aurora Okay I'll flag to the team. But to confirm -> the verifcation isn't failing? Jus...

Thank you very much for trying to help me fix this issue!

ocean vortex Aug 15, 2025, 5:57 PM

#

stray aspen im confident it will be SotA

IMO the fact that they are messing with huge models (ultra for deepThink) is not extremely promising tbh. There's no obvious substantial improvement path without going to the extremes with Google for now..

#

I doubt Gemini3 will be substantially better than 2.5Pro. Probably more like marginal improvements (2.6?)

tired herald Aug 15, 2025, 5:58 PM

#

ocean vortex IMO the fact that they are messing with huge models (ultra for deepThink) is not...

DeepThink uses Gemini 2.5 pro

soft kernel Aug 15, 2025, 5:58 PM

#

burnt sinew can i say that lmarena should add video generation to the website in the feedbac...

Yeah I wished they did that instead of a bot in this discord server,but I think it's about the limits and direct voting

tired herald Aug 15, 2025, 5:58 PM

#

tired herald DeepThink uses Gemini 2.5 pro

Its not a separate model

ocean vortex Aug 15, 2025, 5:59 PM

#

tired herald DeepThink uses Gemini 2.5 pro

No it actually uses the bigger model

tired herald Aug 15, 2025, 5:59 PM

#

No

ocean vortex Aug 15, 2025, 5:59 PM

#

2.5Pro is not mentioned anywhere for that model card

tired herald Aug 15, 2025, 5:59 PM

#

Its 2.5 but multiple

tired herald Aug 15, 2025, 5:59 PM

#

ocean vortex 2.5Pro is not mentioned anywhere for that model card

So what

#

Marketing

soft kernel Aug 15, 2025, 5:59 PM

#

ocean vortex IMO the fact that they are messing with huge models (ultra for deepThink) is not...

Ain't deep think a part of 2.5 pro?

tired herald Aug 15, 2025, 5:59 PM

#

They make multiple versions of 2.5 Pro work together to all create a better response together

ocean vortex Aug 15, 2025, 6:00 PM

#

tired herald So what

So it's a different model. We already knew they had bigger model internally, and DeepThink is way to distinct from 2.5Pro to be the same model tbh

tired herald Aug 15, 2025, 6:00 PM

#

Same with the GPT-5 Pro and Grok 4 Heavy

ocean vortex Aug 15, 2025, 6:00 PM

#

Like it can do some tasks worse than 2.5Pro

tired herald Aug 15, 2025, 6:00 PM

#

ocean vortex So it's a different model. We already knew they had bigger model internally, and...

Multiple 2.5 Pro ≠ Bigger Model

#

Ill stop arguing

#

Ill go work on my stuff

ocean vortex Aug 15, 2025, 6:01 PM

#

tired herald Multiple 2.5 Pro ≠ Bigger Model

It's a DIFFERENT model. And the only one they had to use other than Pro was Ultra

tired herald Aug 15, 2025, 6:01 PM

#

No

hollow imp Aug 15, 2025, 6:01 PM

#

@tired herald I don't understand top p at all. What value should I put

ocean vortex Aug 15, 2025, 6:01 PM

#

Yes

#

Stop arguing

tired herald Aug 15, 2025, 6:01 PM

#

I'm not

hollow imp Aug 15, 2025, 6:01 PM

#

LETS GO DEOLD VS DOMS

ocean vortex Aug 15, 2025, 6:01 PM

#

Just look at their 10rpd

#

cap

tired herald Aug 15, 2025, 6:01 PM

#

💀

soft kernel Aug 15, 2025, 6:01 PM

#

tired herald Multiple 2.5 Pro ≠ Bigger Model

Still doesn't worth 250 bucks for 10 prompts a day

hollow imp Aug 15, 2025, 6:01 PM

#

DEOLD776 VS DOMS

#

🔥

tired herald Aug 15, 2025, 6:01 PM

#

soft kernel Still doesn't worth 250 bucks for 10 prompts a day

Correct

tired herald Aug 15, 2025, 6:02 PM

#

hollow imp <@1395809769947660389> I don't understand top p at all. What value should I put

Idk either

ocean vortex Aug 15, 2025, 6:02 PM

#

hollow imp DEOLD776 VS DOMS

💀

gentle plinth Aug 15, 2025, 6:03 PM

#

echo aurora Okay I'll flag to the team. But to confirm -> the verifcation isn't failing? Jus...

same for me. every vote gets checked with a cloudflare captcha. i thought that it was intentional, but if it isnt, then yeah, would be better to have it less often if thats possible

ocean vortex Aug 15, 2025, 6:03 PM

#

gentle plinth same for me. every vote gets checked with a cloudflare captcha. i thought that i...

They are trying to make reverse engineering impossible. But people are still successful in doing that so it's like fighting the wind lol

gentle plinth Aug 15, 2025, 6:04 PM

#

ocean vortex They are trying to make reverse engineering impossible. But people are still suc...

I wouldn't call that reverse engineering, just some bots who want to push certain models

#

If you found a way to deterministically verify a model then it's easy I guess

#

And a difficult to solve problem for lmarena

ocean vortex Aug 15, 2025, 6:05 PM

#

gentle plinth I wouldn't call that reverse engineering, just some bots who want to push certai...

No, this is more to prevent people from coding interfaces that use lmarena API for model access tbh...

hollow imp Aug 15, 2025, 6:05 PM

#

gentle plinth And a difficult to solve problem for lmarena

A difficult problem to solve for lmarena you mean

ocean vortex Aug 15, 2025, 6:05 PM

#

Rather than bots doing vote manipulation - there's not many of those if at all

tired herald Aug 15, 2025, 6:05 PM

#

ocean vortex No, this is more to prevent people from coding interfaces that use lmarena API f...

Its not reverse engineering. You can use their servers easily without issue....

gentle plinth Aug 15, 2025, 6:05 PM

#

ocean vortex No, this is more to prevent people from coding interfaces that use lmarena API f...

But the captcha arrives after voting, not after prompt

hollow imp Aug 15, 2025, 6:06 PM

#

DOMS LOST

tired herald Aug 15, 2025, 6:06 PM

#

tired herald Its not reverse engineering. You can use their servers easily without issue....

That's how I found out about system prompts

hollow imp Aug 15, 2025, 6:06 PM

#

🔥

gentle plinth Aug 15, 2025, 6:06 PM

#

gentle plinth But the captcha arrives after voting, not after prompt

So I could still generate programmatically

hollow imp Aug 15, 2025, 6:06 PM

#

WINNER IS DEOLD

#

🏆

#

trophy3d

ocean vortex Aug 15, 2025, 6:06 PM

#

tired herald Its not reverse engineering. You can use their servers easily without issue....

Use their API? You can't without reverse engineering

tired herald Aug 15, 2025, 6:06 PM

#

No

ocean vortex Aug 15, 2025, 6:06 PM

#

That's why they do it

tired herald Aug 15, 2025, 6:06 PM

#

Takes 3 clicks to get around any protections on the web

#

Also, even if, its not called RE

soft kernel Aug 15, 2025, 6:07 PM

#

hollow imp <@1395809769947660389> I don't understand top p at all. What value should I put

I think it balances creativity and coherence

ocean vortex Aug 15, 2025, 6:07 PM

#

what

tired herald Aug 15, 2025, 6:07 PM

#

You are insulting every RE with your sentences

ocean vortex Aug 15, 2025, 6:07 PM

#

You seem confused

tired herald Aug 15, 2025, 6:07 PM

#

Im an RE you honk

#

RE needs something to be locked in some way

#

Its open

#

So no RE required

ocean vortex Aug 15, 2025, 6:07 PM

#

I don't think you are

tired herald Aug 15, 2025, 6:08 PM

#

Okay

#

Idc

ocean vortex Aug 15, 2025, 6:08 PM

#

you wouldn't talk nonsense

#

but maybe a wannabe one think

tired herald Aug 15, 2025, 6:08 PM

#

Says the guy that has no idea what google do

plucky otter Aug 15, 2025, 6:08 PM

#

Is there an option that you have always 8 sec video without the commercial at the end. Now it is random and 5 sec or 8

tired herald Aug 15, 2025, 6:08 PM

#

plucky otter Is there an option that you have always 8 sec video without the commercial at th...

Unfortunately no

ocean vortex Aug 15, 2025, 6:09 PM

#

@tired herald Reverse engineering refers to automating things like lmarena so you could do requests without having to go through their interface, click on things and having to do it manually.

tired herald Aug 15, 2025, 6:09 PM

#

WHAT

#

Are you insane

ocean vortex Aug 15, 2025, 6:09 PM

#

It's not what you think it means

tired herald Aug 15, 2025, 6:09 PM

#

Are you f*cking insane

#

REVERSE

#

REVERSE

#

REVERSE

#

REVERSE

#

I think bro is having a stroke

#

Are you okay?

novel crater Aug 15, 2025, 6:10 PM

#

anyone else having this issue

soft kernel Aug 15, 2025, 6:10 PM

#

ocean vortex <@1395809769947660389> Reverse engineering refers to automating things like lmar...

I'm pretty sure that's not reverse engineering AT ALL

tired herald Aug 15, 2025, 6:10 PM

#

novel crater anyone else having this issue

Json escape, your message broke message handling

plucky otter Aug 15, 2025, 6:10 PM

#

AE (Alternative effects) server has a free pool of VEO3 text to video generator. Tonight ofline, but most nights from 1900 to 2300. 8 sec video, without watermark free download

novel crater Aug 15, 2025, 6:11 PM

#

tired herald Json escape, your message broke message handling

oh ok hmm thanks

ocean vortex Aug 15, 2025, 6:12 PM

#

soft kernel I'm pretty sure that's not reverse engineering AT ALL

It is. Because you need to arrive at the methods they are doing in their interface for sever requests without having to use their interface. You are reversing how it works in order to be able to do it independently of accessible interface. I don't know why this is so difficult to grasp for some people. 🤷‍♂️

tired herald Aug 15, 2025, 6:13 PM

#

no

#

You are wrong in every sense of the word

#

Im sorry, but please stop interacting with me

ocean vortex Aug 15, 2025, 6:13 PM

#

Enlighten us then

tired herald Aug 15, 2025, 6:13 PM

#

Reverse

ocean vortex Aug 15, 2025, 6:14 PM

#

Cause it looks like you have no clue

tired herald Aug 15, 2025, 6:14 PM

#

Why is it reverse

#

Reverse

ocean vortex Aug 15, 2025, 6:14 PM

#

Most likely you don't

tired herald Aug 15, 2025, 6:14 PM

#

One word

#

Reverse

#

Can you explain why that word is used

ocean vortex Aug 15, 2025, 6:14 PM

#

tired herald Reverse

and...? Go on. You said nothing

#

You have no clue do you...

tired herald Aug 15, 2025, 6:14 PM

#

Bruv

#

I asked a question

#

ocean vortex Aug 15, 2025, 6:15 PM

#

I asked you first.

tired herald Aug 15, 2025, 6:15 PM

#

Reversing something

ocean vortex Aug 15, 2025, 6:15 PM

#

But you can't come up with a single sentence

#

without using chatgpt

soft kernel Aug 15, 2025, 6:15 PM

#

Nvm bro seems confused

tired herald Aug 15, 2025, 6:15 PM

#

Bruv

#

Dom

ocean vortex Aug 15, 2025, 6:15 PM

#

🤣 🤣

tired herald Aug 15, 2025, 6:15 PM

#

Can you read what an AI just said

ocean vortex Aug 15, 2025, 6:16 PM

#

Bruh....

tired herald Aug 15, 2025, 6:16 PM

#

Then I think

#

That

#

You

#

Are

#

Ha ing

#

A

#

Stroke

ocean vortex Aug 15, 2025, 6:16 PM

#

turn on your brain

tired herald Aug 15, 2025, 6:16 PM

#

This guy thinks he can code GTA 6 in 2 hours

#

This is the embarrassment of vibe coding

ocean vortex Aug 15, 2025, 6:16 PM

#

And explain how I'm wrong

#

if you can't do it

#

without prompting chatgpt

tired herald Aug 15, 2025, 6:17 PM

#

This guy is the reason why people believe in the flat earth

tired herald Aug 15, 2025, 6:17 PM

#

ocean vortex And explain how I'm wrong

BRO

#

REVERSE

ocean vortex Aug 15, 2025, 6:17 PM

#

then you have no clue

tired herald Aug 15, 2025, 6:17 PM

#

REVERSE

#

REVERSE

ocean vortex Aug 15, 2025, 6:17 PM

#

LMFAO

tired herald Aug 15, 2025, 6:17 PM

#

WHAT IS THE DEFINITION OF REVERSE

#

WHAT IS THE DEFINITION

#

OF

#

REVERSE

ocean vortex Aug 15, 2025, 6:17 PM

#

are you stupid?

#

🗿

tired herald Aug 15, 2025, 6:17 PM

#

You are pulling my leg

#

You cant be this unintelligent

#

Im being successfully Ragebaited

ocean vortex Aug 15, 2025, 6:18 PM

#

You sound like a broken record having absolutely no clue what you are talking about

#

😭

tired herald Aug 15, 2025, 6:18 PM

#

Can

#

You

ocean vortex Aug 15, 2025, 6:18 PM

#

REVERSE

tired herald Aug 15, 2025, 6:18 PM

#

Define

ocean vortex Aug 15, 2025, 6:18 PM

#

REVERSE

tired herald Aug 15, 2025, 6:18 PM

#

Reverse

ocean vortex Aug 15, 2025, 6:18 PM

#

🤣

tired herald Aug 15, 2025, 6:18 PM

#

No?

#

Ok

exotic nebula Aug 15, 2025, 6:18 PM

#

What are you guys rambling about?

tired herald Aug 15, 2025, 6:18 PM

#

Reverse engineering

exotic nebula Aug 15, 2025, 6:18 PM

#

Reverse engineering?

blazing rune Aug 15, 2025, 6:19 PM

#

🍿

vocal token Aug 15, 2025, 6:19 PM

#

ocean vortex 🤣

<@&1349916362595635286> See message logs. Some drama

exotic nebula Aug 15, 2025, 6:19 PM

#

tired herald Reverse engineering

I see. What about it?

tired herald Aug 15, 2025, 6:19 PM

#

Dom doesnt know what reverse means

ocean vortex Aug 15, 2025, 6:19 PM

#

exotic nebula What are you guys rambling about?

don't ask. Some silly person pretends to know it all but can't say one sentence about it without prompting chatgpt lol

tired herald Aug 15, 2025, 6:19 PM

#

vocal token <@&1349916362595635286> See message logs. Some drama

Agreed

echo aurora Aug 15, 2025, 6:19 PM

#

Reminder:

✅ Treat others with Respect. Be kind, assume good intent from others, and keep disagreements respectful. It’s encouraged to share your disagreements, but only if it’s done in a respectful and productive way.

vocal token Aug 15, 2025, 6:19 PM

#

Hey Greg!

tired herald Aug 15, 2025, 6:20 PM

#

ocean vortex don't ask. Some silly person pretends to know it all but can't say one sentence ...

No, im proving to you that an ai can be prove you wrong without breaking a sweat

tired herald Aug 15, 2025, 6:20 PM

#

echo aurora Reminder: > ✅ Treat others with Respect. Be kind, assume good intent from othe...

Sorry 🙏

#

I was being ragebaited and fell for it

#

I promise it wont happen again

echo aurora Aug 15, 2025, 6:21 PM

#

Lets just keep conversations respectful please.

leaden palm Aug 15, 2025, 6:21 PM

#

ocean vortex <@1395809769947660389> Reverse engineering refers to automating things like lmar...

reverse engineering is the process of figuring out the logic and shape of the requests; automation is separate but i can see how it would be included

ocean vortex Aug 15, 2025, 6:21 PM

#

tired herald No, im proving to you that an ai can be prove you wrong without breaking a sweat

Looks more like you have no clue because you couldn't do a single proper sentence disproving anything I said lol

tired herald Aug 15, 2025, 6:21 PM

#

echo aurora Lets just keep conversations respectful please.

I tried

ocean vortex Aug 15, 2025, 6:21 PM

#

leaden palm reverse engineering is the process of figuring out the logic and shape of the re...

yeah...

EDIT: for the below... Yeah blocking is easier than coming up with a single argument that is not AI written. You can only pretend you know stuff for as long.

tired herald Aug 15, 2025, 6:21 PM

#

ocean vortex Looks more like you have no clue because you couldn't do a single proper sentenc...

Im blocking you for my own sanity. I cant believe how little you know

exotic nebula Aug 15, 2025, 6:22 PM

#

Chill guys

#

Lets switch topics

#

Did anyone try out HRM AI?

tired herald Aug 15, 2025, 6:22 PM

#

My topic is that im prob gonna release my extension today

tired herald Aug 15, 2025, 6:22 PM

#

exotic nebula Did anyone try out HRM AI?

What is that

soft kernel Aug 15, 2025, 6:22 PM

#

tired herald My topic is that im prob gonna release my extension today

Which extension

tired herald Aug 15, 2025, 6:22 PM

#

#ai-creations message

leaden palm Aug 15, 2025, 6:23 PM

#

exotic nebula Lets switch topics

it's interesting that gpt-5-chat ranks below a few other non-reasoning models

exotic nebula Aug 15, 2025, 6:23 PM

#

tired herald What is that

A new model based on the brain's system of thought.

tired herald Aug 15, 2025, 6:23 PM

#

soft kernel Which extension

See for yourself in community creations 🙂

tired herald Aug 15, 2025, 6:23 PM

#

exotic nebula A new model based on the brain's system of thought.

Ill check it out after im done with what im doing

echo aurora Aug 15, 2025, 6:23 PM

#

tired herald My topic is that im prob gonna release my extension today

Oooo this is exciting!

tired herald Aug 15, 2025, 6:24 PM

#

Hehe

#

Only a single bug to fix and then its onto github to release the src

exotic nebula Aug 15, 2025, 6:24 PM

#

Bro stop

#

Stop milking a dead cow

soft kernel Aug 15, 2025, 6:24 PM

#

Can you please...?

soft kernel Aug 15, 2025, 6:25 PM

#

tired herald See for yourself in community creations 🙂

BRO

tired herald Aug 15, 2025, 6:25 PM

#

tired herald https://discord.com/channels/1340554757349179412/1344733249628541099/14056353225...

. 😭

soft kernel Aug 15, 2025, 6:25 PM

#

Release it asap

#

😭😭

tired herald Aug 15, 2025, 6:25 PM

#

Yeah, theres just a little problem with it right now

#

I need to fix and release

#

Im hoping that it could be helpful for those who could use it

vast fern Aug 15, 2025, 6:26 PM

#

hi everyone

#

i am new here

#

i need one help

tired herald Aug 15, 2025, 6:26 PM

#

Ask away

soft kernel Aug 15, 2025, 6:26 PM

#

Is there any other models for generating 3d worlds,real time,other than genie 3?

exotic nebula Aug 15, 2025, 6:26 PM

#

vast fern i need one help

yeah man, we all ready to help. welcome to the community

tired herald Aug 15, 2025, 6:27 PM

#

soft kernel Is there any other models for generating 3d worlds,real time,other than genie 3?

Yeah, but Genie 3 tops all of them, by quite the margin actually

vast fern Aug 15, 2025, 6:27 PM

#

what is this and how can i get this

#

i wanna be updated on the new models

soft kernel Aug 15, 2025, 6:27 PM

#

tired herald Yeah, but Genie 3 tops all of them, by quite the margin actually

Any examples regarding those models?

tired herald Aug 15, 2025, 6:27 PM

#

vast fern what is this and how can i get this

In the battle mode, if you get lucky, one of the two random models will be toad

gentle plinth Aug 15, 2025, 6:28 PM

#

I think he means he wants to get notified

tired herald Aug 15, 2025, 6:28 PM

#

soft kernel Any examples regarding those models?

They all have obscure names so I dont remember, but im sure if you search genie 3 on yt, click on the channel of one of the people, youll find some more stuff

tired herald Aug 15, 2025, 6:28 PM

#

gentle plinth I think he means he wants to get notified

And how he can get it? Or did I misread it 😭

vast fern Aug 15, 2025, 6:28 PM

#

tired herald In the battle mode, if you get lucky, one of the two random models will be toad

what are these notifications and how do i get these or is this like a manual thing i need to do battles and if i am lucky i will discover something new?? basically i want to know how to get this bot that will give notifications about the new model

tired herald Aug 15, 2025, 6:28 PM

#

OHHHH

soft kernel Aug 15, 2025, 6:28 PM

#

vast fern what are these notifications and how do i get these or is this like a manual thi...

It's a discord server

#

Legit api

gentle plinth Aug 15, 2025, 6:29 PM

#

tired herald And how he can get it? Or did I misread it 😭

These notifications about new models are on the server discord. gg/devmode

vast fern Aug 15, 2025, 6:29 PM

#

soft kernel Legit api

Thanks man how can i join that

tired herald Aug 15, 2025, 6:30 PM

#

gentle plinth These notifications about new models are on the server discord. gg/devmode

Oh, im just stupid

#

I misunderstood

echo aurora Aug 15, 2025, 6:30 PM

#

gentle plinth These notifications about new models are on the server discord. gg/devmode

Sorry to say discord invite links are blocked here so will have to DM the inv link

gentle plinth Aug 15, 2025, 6:30 PM

#

But can I leave the msg here?

#

I mean one can simply remove the space

vast fern Aug 15, 2025, 6:30 PM

#

Thanks everyone, Have a great day.

echo aurora Aug 15, 2025, 6:31 PM

#

gentle plinth But can I leave the msg here?

Yeah you can leave that message, but if you want it to link you won't be able to is all

gentle plinth Aug 15, 2025, 6:31 PM

#

OK thanks

scenic salmon Aug 15, 2025, 6:33 PM

#

@echo aurora I don’t know if you saw my question yesterday, but I was wondering if you’re able to talk about the pre-seed/seed rounds at all, it wouldn’t need to include any specifics about lmarena, just how the general experience went… like finding a lead investor, pitching to angels/vcs, etc

echo aurora Aug 15, 2025, 6:34 PM

#

scenic salmon <@283397944160550928> I don’t know if you saw my question yesterday, but I was w...

I'm not sure I'd be able to share much info, even if I knew what that was like first-hand. I wasn't involved with those conversations so I know very little regarding that.

scenic salmon Aug 15, 2025, 6:35 PM

#

Ah okay, thanks anyway

gentle plinth Aug 15, 2025, 6:35 PM

#

tired herald And how he can get it? Or did I misread it 😭

Just have to get lucky I guess 😅 ai🎰

tired herald Aug 15, 2025, 6:36 PM

#

Gacha on ai, gacha games, gacha life, gacha everything

exotic nebula Aug 15, 2025, 6:37 PM

#

@tired herald https://github.com/sapientinc/HRM

GitHub

GitHub - sapientinc/HRM: Hierarchical Reasoning Model Official Release

Hierarchical Reasoning Model Official Release. Contribute to sapientinc/HRM development by creating an account on GitHub.

exotic nebula Aug 15, 2025, 6:41 PM

#

tired herald Gacha on ai, gacha games, gacha life, gacha everything

You're the ultimate gaming gambler, dude.

keen beacon Aug 15, 2025, 6:49 PM

#

exotic nebula <@1395809769947660389> https://github.com/sapientinc/HRM

What is this model about? Is this some new architecture for LLMs?

#

Hold on, I read the github

exotic nebula Aug 15, 2025, 6:52 PM

#

keen beacon What is this model about? Is this some new architecture for LLMs?

Its a brain inspired ai architecture which focuses on deep reasoning rather than llm. It scored some very interesting results at the ARC AGI benchmark.

keen beacon Aug 15, 2025, 6:53 PM

#

exotic nebula Its a brain inspired ai architecture which focuses on deep reasoning rather tha...

Would this replace CoT then?

#

Just asking if you knew more

ocean vortex Aug 15, 2025, 6:53 PM

#

exotic nebula Its a brain inspired ai architecture which focuses on deep reasoning rather tha...

What kind of score? 🧐

exotic nebula Aug 15, 2025, 6:53 PM

#

keen beacon Would this replace CoT then?

Maybe. But unfortunately it has some flaws, you will find them somewhere in their reddit community.

exotic nebula Aug 15, 2025, 6:54 PM

#

keen beacon Just asking if you knew more

Nope, just a passerby who came across it. Interested if someone has set it up and tested it.

keen beacon Aug 15, 2025, 6:54 PM

#

exotic nebula Maybe. But unfortunately it has some flaws, you will find them somewhere in thei...

Okay, thanks for the info.

keen beacon Aug 15, 2025, 6:54 PM

#

exotic nebula Nope, just a passerby who came across it. Interested if someone has set it up an...

Would be very interesting

#

Esp for math and such

exotic nebula Aug 15, 2025, 6:55 PM

#

ocean vortex What kind of score? 🧐

40.3%. Not that impressive but good for its usecase and mostly because of its low compute.

exotic nebula Aug 15, 2025, 6:55 PM

#

keen beacon Esp for math and such

Exactly.

stray aspen Aug 15, 2025, 6:56 PM

#

does anybody know an ai model for sound effect generation

exotic nebula Aug 15, 2025, 6:56 PM

#

stray aspen does anybody know an ai model for sound effect generation

Lyria 2

stray aspen Aug 15, 2025, 6:57 PM

#

isnt that just for music

ocean vortex Aug 15, 2025, 6:58 PM

#

exotic nebula 40.3%. Not that impressive but good for its usecase and mostly because of its lo...

uh they have written a paper as well
https://arxiv.org/pdf/2506.21734

exotic nebula Aug 15, 2025, 6:58 PM

#

stray aspen isnt that just for music

Yeah. My bad, didnt see it through. Sound effects generation eh? Hmm, not sure about that. Did you try looking it up on the net?

stray aspen Aug 15, 2025, 6:58 PM

#

yeah i found eleven labs

#

im gonna give it a try

exotic nebula Aug 15, 2025, 6:59 PM

#

ocean vortex uh they have written a paper as well https://arxiv.org/pdf/2506.21734

Yes! Forgot about that. Isn't it impressive tho with just 27M parameters and training on just 1000 examples?

exotic nebula Aug 15, 2025, 6:59 PM

#

stray aspen yeah i found eleven labs

I heard thats good, all the best dude!

white hatch Aug 15, 2025, 7:01 PM

#

Is it unsafe to use VPN while visiting chatgpt website?

exotic nebula Aug 15, 2025, 7:02 PM

#

white hatch Is it unsafe to use VPN while visiting chatgpt website?

No, why do you ask?

#

You wont get flagged.

#

Just might be stripped of some features/accessibility refrained in some countries.

#

Have heard of account lockouts. But so far only rumors.

ocean vortex Aug 15, 2025, 7:06 PM

#

exotic nebula Yes! Forgot about that. Isn't it impressive tho with just 27M parameters and tra...

It is, though it's also... too good to be true? 👀

#

I wonder if this would hold up as a general purpose model in real life

exotic nebula Aug 15, 2025, 7:06 PM

#

ocean vortex It is, though it's also... too good to be true? 👀

My thoughts exactly. Thats why I'm searching for people who have experience with it and set it up for their own usecases.

exotic nebula Aug 15, 2025, 7:07 PM

#

ocean vortex I wonder if this would hold up as a general purpose model in real life

Yeah I guess so, this might be a huge hit in Healthcare

neon idol Aug 15, 2025, 7:22 PM

#

Hello chat

#

How are you?

whole wagon Aug 15, 2025, 7:31 PM

#

I like how opus be like this when you ask it to be your friend. But you ask 4o and it goes all in with saying you'll be the bestest friends ever and all sorts lmao

#

Opus 4 actually just says no basically

#

solid brook Aug 15, 2025, 7:41 PM

#

whole wagon Opus 4 actually just says no basically

I like this all ai models should be like this

#

People got so desprate for 4o when it was gone for 2 days

#

This will have nagetive effects

quiet dust Aug 15, 2025, 7:44 PM

#

Guys, how to use the Thinking model in GPT-5, not Thinking mini?

#

Or is only Thinking mini available for free users?

keen beacon Aug 15, 2025, 7:51 PM

#

solid brook People got so desprate for 4o when it was gone for 2 days

They are still complaining about how it doesn't feel like 4o, like it's secretly gpt 5 in disguise

#

the whole chatgpt subreddit is on fire

sullen quest Aug 15, 2025, 7:56 PM

#

GPT 5 mini high is worse than 4.1 and 5 chat, gpt nano high is worse than oss at 42'nd place in text arena

#

flagship model my

keen beacon Aug 15, 2025, 8:06 PM

#

sullen quest GPT 5 mini high is worse than 4.1 and 5 chat, gpt nano high is worse than oss at...

saddening...

tired herald Aug 15, 2025, 8:09 PM

#

Sorry to say this

#

But the small error ballooned into a much larger set of issues so I cant release the src today

#

But I promise tmrw will be the day

tired herald Aug 15, 2025, 8:11 PM

#

quiet dust Guys, how to use the Thinking model in GPT-5, not Thinking mini?

GPT 5 high

novel crater Aug 15, 2025, 8:23 PM

#

the first personal fun project I made like actually was able to fully complete was done with 06-05 Gemini, I am a Gemini fan for sure

hollow imp Aug 15, 2025, 8:24 PM

#

hollow imp

@novel crater

novel crater Aug 15, 2025, 8:24 PM

#

yeah Gemini 3 would be freakin awesome

#

basically I made a tarkov grid style inventory, if anyone here knows that game, basically a pretty complex system and Gemini knocked it out of the park

#

I don't think chatgpt likes their competition at all and probably put out gpt 5 as like this big thing, because not a lot of people seem overly enthused about it

balmy mist Aug 15, 2025, 8:39 PM

#

what happened to deepseek man

marsh stratus Aug 15, 2025, 8:40 PM

#

sullen quest flagship model my

Ok but mini and nano are very much not flagship models

neon idol Aug 15, 2025, 8:40 PM

#

balmy mist

Deepseek was too strong and he was nerfed 😭🙏

sullen quest Aug 15, 2025, 8:41 PM

#

marsh stratus Ok but mini and nano are very much not flagship models

They are what the average chatgpt user gets though.

scenic salmon Aug 15, 2025, 8:41 PM

#

novel crater I don't think chatgpt likes their competition at all and probably put out gpt 5 ...

Sam has known for quite some time that GPT-5 was never going to live up to expectations, GPT-4 was this big model scale up from 3.5 and it saw huge improvements with that scale up, so people have been expecting the same from 5 since then, but it was found you can’t really scale them up any further and get meaningful improvements, so they started focusing on their reasoning/thinking models (o3/o4/etc), they tried launching another giant sized model, 4.5, but people didn’t really like it, so it was known for quite some time, whatever GPT-5 was going to be when it released was going to be more of a rebranding than anything

sullen quest Aug 15, 2025, 8:42 PM

#

scenic salmon Sam has known for quite some time that GPT-5 was never going to live up to expec...

Didn't stop him from pretending it was the death star

#

Sometimes people need to realize when they shouldn't be hype men

scenic salmon Aug 15, 2025, 8:42 PM

#

Gotta get those VC dollars

novel crater Aug 15, 2025, 8:43 PM

#

sullen quest Didn't stop him from pretending it was the death star

yeah I mean it seemed like it was made out to be huge with their website and stuff, I mean they removed all the other models at first

marsh stratus Aug 15, 2025, 8:43 PM

#

sullen quest They are what the average chatgpt user gets though.

Gpt-5 chat is the consumer facing flagship ChatGPT model, and I think that it’s going to go down like maverick 4 in the totality of its failure

sullen quest Aug 15, 2025, 8:43 PM

#

novel crater yeah I mean it seemed like it was made out to be huge with their website and stu...

costs saving, 4o seems to cost more to use

marsh stratus Aug 15, 2025, 8:44 PM

#

Nano and mini seem built for api use. Gpt-5 chat IS ChatGPT fo r most people, and it’s just so bad

novel crater Aug 15, 2025, 8:44 PM

#

fair yeah thats a really big part of it I think

#

cost friendly is always good

#

especially with some of these models 👀

whole wagon Aug 15, 2025, 8:52 PM

#

GPT5 without thinking is so bad

#

I don't know how it's even possible

#

It fails basic arithmetic

#

That even 4o could get

#

I'm basically permanently using thinking

#

Only downside is wait time between prompts

stray aspen Aug 15, 2025, 8:57 PM

#

@novel craterwhat will come first gemini 3 or grok 5

scenic salmon Aug 15, 2025, 8:59 PM

#

whole wagon Only downside is wait time between prompts

GPT-5 mini is a decent balance, it’s not as dumb as the chat model

novel crater Aug 15, 2025, 8:59 PM

#

I don't have enough knowledge on that to say definitively but since grok 4 came out recently it would make more sense logically that Google would release first

whole wagon Aug 15, 2025, 8:59 PM

#

stray aspen <@942590412445614200>what will come first gemini 3 or grok 5

Gemini 3 October, grok 5 December

#

That's my guesses no info

#

The gap between pro and regular gpt5 is quite insane. They didn't even mention pro in the livestream lol

#

I guess they don't really want ppl using it much. Due to capacity issues

#

The gap between gpt5 and GPT5 pro much larger than gap between o3 and o3 pro I found

scenic salmon Aug 15, 2025, 9:01 PM

#

whole wagon It fails basic arithmetic

zinc ore Aug 15, 2025, 9:02 PM

#

whole wagon The gap between pro and regular gpt5 is quite insane. They didn't even mention p...

This one's weird because mensa Norway caps out at 145 IQ

inner gate Aug 15, 2025, 9:06 PM

#

Hello famalams

whole wagon Aug 15, 2025, 9:11 PM

#

zinc ore This one's weird because mensa Norway caps out at 145 IQ

It doesn't?

#

It's just that it is rated for only up to 145

#

They didn't calibrate beyond that

pure falcon Aug 15, 2025, 9:14 PM

#

Does anyone else think that LMArena should go back to NO style control? Now they’re also talking about “emotional control”…

Makes no sense - it defeats the purpose and mission of LMArena in the first place. The more they “control”, the more they turn into the very benchmarks that they wanted to distinguish themselves from

#

If LMArena is about what users like…then let the users decide what users like!

#

The more variables and factors you “control”…
You end up turning into a Capabilities test

#

And There are plenty of capabilities benchmarks out there already

whole wagon Aug 15, 2025, 9:16 PM

#

pure falcon Does anyone else think that LMArena should go back to NO style control? Now they...

Where can I find the emotional control discussion

pure falcon Aug 15, 2025, 9:17 PM

#

whole wagon Where can I find the emotional control discussion

https://news.lmarena.ai/sentiment-control/

LMArena Blog

LMArena Research: Does Sentiment Matter in AI?

Introducing Sentiment Control: Disentangling Sentiment and Substance

#

LMArena is supposed to give insight into what other benchmarks miss

#

If you keep “controlling” for those unknown factors by adding style control, sentiment control, etc….the benchmark becomes worthless. No alpha in the scoring

hollow imp Aug 15, 2025, 9:21 PM

#

pure falcon Does anyone else think that LMArena should go back to NO style control? Now they...

Where is style control?

#

Gimme style control

pure falcon Aug 15, 2025, 9:23 PM

#

hollow imp Where is style control?

You already have it! Style control is built into the default rankings

patent aspen Aug 15, 2025, 9:23 PM

#

pure falcon Aug 15, 2025, 9:24 PM

#

All I’m saying is, if people like emojis and bolded words or whatever, then you have to accept that, regardless of ether you think they’re dumb or meaningless

#

In fact, we’re seeing right now why style control is a mistake for LMArena’s clients

whole wagon Aug 15, 2025, 9:25 PM

#

It's to prevent a race to the bottom where every model turns into sycophant emoji slop machine

pure falcon Aug 15, 2025, 9:26 PM

#

OpenAI announced that they would make gpt-5’s personality “warmer”

#

Because it’s obviously been too “style-control” maxed

pure falcon Aug 15, 2025, 9:26 PM

#

whole wagon It's to prevent a race to the bottom where every model turns into sycophant emoj...

Take a step back and think about what LMArena is, and what its value is

scenic salmon Aug 15, 2025, 9:26 PM

#

pure falcon OpenAI announced that they would make gpt-5’s personality “warmer”

Just put that in #ai-news

#

Trying to work their way back up the charts

pure falcon Aug 15, 2025, 9:27 PM

#

whole wagon It's to prevent a race to the bottom where every model turns into sycophant emoj...

The whole value prop of LMArena, is extracting sycophantic signaling!

#

Again, the more you control for user preferences, the more you turn into any other standard benchmark

#

Making LMArena useless

hollow imp Aug 15, 2025, 9:30 PM

#

pure falcon You already have it! Style control is built into the default rankings

I thought you're saying we can add custom instructions or at the very least a style mode like formal explanatory

patent aspen Aug 15, 2025, 9:31 PM

#

pure falcon Making LMArena useless

The thing is: they're already controlling for a bunch of other things. Style control is controversial because it's a visible option. The other controls aren't because they aren't visible.

pure falcon Aug 15, 2025, 9:31 PM

#

pure falcon The whole value prop of LMArena, is extracting sycophantic signaling!

LMArena is supposed to captures what objective benchmarks cannot. ie, user preferences. ie, what users like. ie “sycophantic” behavior

pure falcon Aug 15, 2025, 9:32 PM

#

patent aspen The thing is: they're already controlling for a bunch of other things. Style con...

What kinds of things is the “no style control” leaderboard controlling for?

frigid coral Aug 15, 2025, 9:32 PM

#

pure falcon The whole value prop of LMArena, is extracting sycophantic signaling!

I don't think so. LMArena is supposed to capture model's capability in general use cases. Sycophantic behavior is an undesirable side effect

pure falcon Aug 15, 2025, 9:32 PM

#

scenic salmon Just put that in <#1377796849901240321>

Yeah my guess is OpenAI relied too much on style controlled rankings in testing, and realized that was a mistake

pure falcon Aug 15, 2025, 9:33 PM

#

frigid coral I don't think so. LMArena is supposed to capture model's capability in general u...

Wrong. There are hundreds of benchmarks that do that already

pure falcon Aug 15, 2025, 9:33 PM

#

pure falcon Wrong. There are hundreds of benchmarks that do that already

LMArena is not a capabilities test. It’s a popularity contest

patent aspen Aug 15, 2025, 9:33 PM

#

pure falcon What kinds of things is the “no style control” leaderboard controlling for?

Mainly integrity checks. Certain patterns of user behavior are more likely to be trying to manipulate rankings to favor certain models, and they have a relatively robust system to account for that

frigid coral Aug 15, 2025, 9:34 PM

#

pure falcon Wrong. There are hundreds of benchmarks that do that already

None as representative as LMArena

scenic salmon Aug 15, 2025, 9:34 PM

#

pure falcon LMArena is not a capabilities test. It’s a popularity contest

Which doesn’t mean it’s not valuable information, it’s still incredibly valuable

pure falcon Aug 15, 2025, 9:35 PM

#

scenic salmon Which doesn’t mean it’s not valuable information, it’s still incredibly valuable

Agreed! And they should keep it that way. If they keep adding stuff to control for, LMArena’s value disappears entirely

pure falcon Aug 15, 2025, 9:35 PM

#

patent aspen Mainly integrity checks. Certain patterns of user behavior are more likely to be...

That’s different -that’s a prompting issue, not a voting / output problem

ornate agate Aug 15, 2025, 9:36 PM

#

Yeah filtering scam submissions seems like it’s not a manipulation

pure falcon Aug 15, 2025, 9:36 PM

#

Also, the models vary wildly by language. 50% of LMArena’s queries are not in English

patent aspen Aug 15, 2025, 9:37 PM

#

I don't think style control is manipulation either. I think it's just a slightly controversial option with compelling pros and cons to consider

frigid coral Aug 15, 2025, 9:38 PM

#

pure falcon Also, the models vary wildly by language. 50% of LMArena’s queries are not in En...

Where does language factor into sycophancy

ornate agate Aug 15, 2025, 9:39 PM

#

patent aspen I don't think style control is manipulation either. I think it's just a slightly...

I meant a fairly benign version of the word manipulation

pure falcon Aug 15, 2025, 9:39 PM

#

frigid coral Where does language factor into sycophancy

you can have a model that is very sycophantic in English, but maybe not in other language language. Or maybe rewarded in English more than in other languages

leaden palm Aug 15, 2025, 9:39 PM

#

pure falcon Agreed! And they should keep it that way. If they keep adding stuff to control f...

There's some stuff you can easily "turn up and down" and stuff that you can't
I'd argue that a leaderboard that only measures the second one would be even more valuable for some uses

frigid coral Aug 15, 2025, 9:39 PM

#

pure falcon you can have a model that is very sycophantic in English, but maybe not in other...

Do you have any evidence of that happening? I believe it's not true

hollow imp Aug 15, 2025, 9:40 PM

#

@pure falcon talk to @echo aurora about all this

ornate agate Aug 15, 2025, 9:40 PM

#

So filtering scam submissions seems to be a necessary baseline function of the website or it would be a joke given how popular it is. Style control isn’t, it’s a choice to have it or not. I’m not sure it’s up to date for recent models

patent aspen Aug 15, 2025, 9:41 PM

#

Yeah I mean personally I'm not the biggest fan of style control, although I think it has pros and cons and is useful as an option. You get some form of model provider gaming either way

pure falcon Aug 15, 2025, 9:42 PM

#

leaden palm There's some stuff you can easily "turn up and down" and stuff that you can't I'...

Idk what you mean by this?

leaden palm Aug 15, 2025, 9:42 PM

#

pure falcon Idk what you mean by this?

Eg a system prompt can increase formatting but can't increase raw intelligence

#

Or it can increase sycophancy but can't induce EQ

ornate agate Aug 15, 2025, 9:44 PM

#

patent aspen Yeah I mean personally I'm not the biggest fan of style control, although I thin...

When it was originally made I kind of liked it as a guard against llm markdown listslop. Now though I feel models like Gemini and GLM answer in this very verbose way, but the tech has improved so the lists actually have benefit/are more info dense now.

patent aspen Aug 15, 2025, 9:45 PM

#

Yeah I kind of want all the models to have fancy markdown now

whole wagon Aug 15, 2025, 9:45 PM

#

Removing style control would cook openAI

leaden palm Aug 15, 2025, 9:46 PM

#

whole wagon Removing style control would cook openAI

I wonder if they intentionally steered away from using the things style control tracks

pure falcon Aug 15, 2025, 9:46 PM

#

leaden palm Eg a system prompt can increase formatting but can't increase raw intelligence

But thats the whole question right? Is LMArena measuring intelligence? Is it SUPPOSED to be measuring intelligence?

IMO, its primary purpose is to inform model makers on user preferences. So they can improve their own models.

If an ASI model came out tomorrow but was a total a- hole that everyone hated, where would you want that model to be ranked on the leaderboard?

ornate agate Aug 15, 2025, 9:47 PM

#

This is gonna happen it’s gonna answer 42 to everything really quickly unless you ask it something sufficiently interesting.

#

But well, such a system will be in the somewhat far future

pure falcon Aug 15, 2025, 9:49 PM

#

ornate agate This is gonna happen it’s gonna answer 42 to everything really quickly unless yo...

Yes, A model that behaves like that is undesirable to model makers. Model makers want to know what users like. So if you “control” all that away (with sentiment and style control, etc), you end up with a leaderboard that is SO useless to model makers

pure falcon Aug 15, 2025, 9:50 PM

#

whole wagon Removing style control would cook openAI

Well, unless OpenAI is LMArenas only revenue source, then that should be the way to go

patent aspen Aug 15, 2025, 9:50 PM

#

IMO a few models will eventually pull ahead by a wider margin at which point the style control problem will gradually become moot because the best models will be able to win even with extensive markdown

whole wagon Aug 15, 2025, 9:50 PM

#

pure falcon Yes, A model that behaves like that is undesirable to model makers. Model makers...

They can simply press a button to remove style control

#

Style control helped in the llama 4 situation which was favourable

pure falcon Aug 15, 2025, 9:52 PM

#

Why would they do that though? They care much more about raw user preferences. There are plenty of other benchmarks (internal and external) that can measure raw capabilities for specialty / niche areas

ornate agate Aug 15, 2025, 9:53 PM

#

I don’t think lmarena is just user preference. It’s a whole bag of stuff

pure falcon Aug 15, 2025, 9:53 PM

#

ornate agate I don’t think lmarena is just user preference. It’s a whole bag of stuff

Like what? What value, to model makers, does LMArena provide that other benchmarks don’t?

whole wagon Aug 15, 2025, 9:53 PM

#

pure falcon Why would they do that though? They care much more about raw user preferences. T...

Are there really. A lot of benchmarks are saturated and/or useless

pure falcon Aug 15, 2025, 9:54 PM

#

I really would like to know, genuinely

ornate agate Aug 15, 2025, 9:54 PM

#

I mean I don’t care about what value it provides to model makers. I care about what value it provides to the community.

whole wagon Aug 15, 2025, 9:54 PM

#

pure falcon Why would they do that though? They care much more about raw user preferences. T...

?

#

If they really would like to see no style control. They simply press remove style control, simple

pure falcon Aug 15, 2025, 9:56 PM

#

Sure, but you could argue the exact opposite, right? If no style control is what matters more, then why is style control the default?

patent aspen Aug 15, 2025, 9:56 PM

#

fwiw I also prefer no style control as the default

pure falcon Aug 15, 2025, 9:56 PM

#

ornate agate I mean I don’t care about what value it provides to model makers. I care about w...

That’s the thing though - LMArena isn’t a “community” service project. They have enormous costs servicing top of the line models for free

#

They are propped up by VC funding

ornate agate Aug 15, 2025, 9:57 PM

#

I also prefer no style control.

ornate agate Aug 15, 2025, 9:57 PM

#

pure falcon That’s the thing though - LMArena isn’t a “community” service project. They have...

Don’t worry about that. They might not be paying for it anyway.

pure falcon Aug 15, 2025, 9:58 PM

#

Who are the leaderboards most valuable to? I’d argue it’s to companies like OpenAI who use their leaderboard to determine which model to deploy. Which is exactly what they did for GPT-5

#

There was an alternate gpt-5 variant named “zenith” that was tested in late July

#

OpenAI ended up going with the “summit” variant because it did better in LMArena

#

If they didn’t already, OpenAI would pay a lotttt of $$$$ for that data

#

Which means

#

For LMArena

#

Their primary customer and purpose, is the model maker

gritty stirrup Aug 15, 2025, 9:59 PM

#

Why does Claude Opus have a limit on lmarena?

warm fulcrum Aug 15, 2025, 10:00 PM

#

gritty stirrup Why does Claude Opus have a limit on lmarena?

its expensive

#

and every model has a limit

gritty stirrup Aug 15, 2025, 10:01 PM

#

warm fulcrum and every model has a limit

I didn't notice that other models were reaching the limit, but maybe you're right and I just didn't reach it. I didn't write that many messages and I already have a limit.

patent aspen Aug 15, 2025, 10:01 PM

#

@echo aurora We've been discussing whether or not LMArena should have style control enabled by default. I know this topic has been beaten to death, although I'm curious to understand LMArena's take

leaden palm Aug 15, 2025, 10:02 PM

#

pure falcon But thats the whole question right? Is LMArena measuring intelligence? Is it SUP...

that isn't really what i meant - i was saying that a leaderboard that measures the "core" of a model might be interesting to look at

"If an ASI model came out tomorrow but was a total a- hole that everyone hated, where would you want that model to be ranked on the leaderboard?" if it's better than the average model prompted to act that way, and if it can be prompted to act like whatever other model, then on a leaderboard designed to measure the core - the parts of the model that aren't superficial - it would rightfully take first place

patent aspen Aug 15, 2025, 10:03 PM

#

Personally I think there are pros and cons and prefer no style control by default because I see LMArena as primarily measuring user preference

pure falcon Aug 15, 2025, 10:06 PM

#

leaden palm that isn't really what i meant - i was saying that a leaderboard that measures t...

Model the core what? “Intelligence”? What does intelligence even mean? In that case, why not just feed it IQ tests, math questions, or other capabilities tests, which already exist all over the place? What value does LMArena therefore have?

#

Do you see what I’m getting at? The more you control for, the more you approach benchmarks that already exist. That defeats the purpose of LMArena IMO

leaden palm Aug 15, 2025, 10:07 PM

#

pure falcon Model the core what? “Intelligence”? What does intelligence even mean? In that c...

What does intelligence even mean?
what can't be reduced by controlling for attributes of the response / forcing a specific style
why not just feed it IQ tests, math questions, or other capabilities tests
have you heard of benchmaxxing? this would theoretically be unbenchmaxxable

patent aspen Aug 15, 2025, 10:12 PM

#

I think the best argument for style control is if your goal is to be the best all-in-one measure of intelligence in the areas that users care about

ornate agate Aug 15, 2025, 10:12 PM

#

Intelligence is just a vibe

echo aurora Aug 15, 2025, 10:13 PM

#

patent aspen <@283397944160550928> We've been discussing whether or not LMArena should have ...

I believe we did a blog post about this recently...

#

I could be wrong

ornate agate Aug 15, 2025, 10:14 PM

#

So style control is a vibe realignment from the default one, more towards a different vibe that people might want more…

patent aspen Aug 15, 2025, 10:15 PM

#

ornate agate Intelligence is just a vibe

I think no style control is more valuable. That's a better reflection of who is "winning" with users

leaden sun Aug 15, 2025, 10:17 PM

#

ornate agate So style control is a vibe realignment from the default one, more towards a diff...

a good way to test how well models follow the control instruction, isnt it, or is this the original intention of lmarena?

#

I think less control is better

#

but it depends on what purpose style control serves and what you want to test that you need style control as a normalizer

ornate agate Aug 15, 2025, 10:22 PM

#

patent aspen I think no style control is more valuable. That's a better reflection of who is ...

Yeah. I think it’s a bit of a fools errand to try and find some function to morph current arena votes into “intelligence” or something. It would also need to be continually updated I think.

ornate agate Aug 15, 2025, 10:23 PM

#

leaden sun a good way to test how well models follow the control instruction, isnt it, or i...

It doesn’t affect llm responses it just affects how it’s scored in the end.

pure falcon Aug 15, 2025, 10:23 PM

#

leaden palm > What does intelligence even mean? what can't be reduced by controlling for att...

Why is communication not counted in your “intelligence” definition? Are comedians not “intelligent” in your eyes? Are entertainers not “intelligent”? Intelligence is not just about solving problems and puzzles. That’s a very narrow and limited view of intelligence. A model highly skilled in communication is, in fact, intelligent, just in a Different way than Einstein. Intentionally dismissing those capabilities (by “controlling” for it) makes no sense.

leaden palm Aug 15, 2025, 10:24 PM

#

pure falcon Why is communication not counted in your “intelligence” definition? Are comedi...

if i can't prompt another model to be "highly skilled in communication" then the model would have an advantage

#

there are actually a few ways to do this though and all have some problems tbh

#

prompt all models to act a fixed way
prompt one model to imitate the structure of another model
identify the traits of each model's responses and control for each trait (implicitly assuming that each trait is easy to reproduce)

ornate agate Aug 15, 2025, 10:26 PM

#

leaden palm there are actually a few ways to do this though and all have some problems tbh

Btw do llm arena models have system prompts?

#

If so who sets them?

leaden palm Aug 15, 2025, 10:27 PM

#

ornate agate Btw do llm arena models have system prompts?

iirc it used to be that each model had a fixed (usually minimal though) system prompt, and now there's a common system prompt that says how to respond but it's different for some models

#

idk 100% though, i don't develop the arena

ornate agate Aug 15, 2025, 10:28 PM

#

It’s not suspicious on its own. All chat apps have a (often large) system prompt

ornate agate Aug 15, 2025, 10:30 PM

#

leaden palm iirc it used to be that each model had a fixed (usually minimal though) system p...

So it’s decided by lmarena what to put in it and it’s short?

leaden palm Aug 15, 2025, 10:30 PM

#

i believe so

#

eg the o series needs something like enable markdown formatting iirc

ornate agate Aug 15, 2025, 10:32 PM

#

Ah ok. That makes sense

scenic salmon Aug 15, 2025, 10:36 PM

#

#

They added it pretty quick

tired herald Aug 15, 2025, 10:38 PM

#

Damn, I should add this to my extension too

misty vault Aug 15, 2025, 10:42 PM

#

But most importantly: @deep adder said it is the best by far

tired herald Aug 15, 2025, 10:45 PM

#

misty vault But most importantly: <@348477266704990208> said it is the best by far

Exactly my thoughts

echo aurora Aug 15, 2025, 10:48 PM

#

echo aurora I believe we did a blog post about this recently...

Sorry I'm late to share this -> https://news.lmarena.ai/sentiment-control/

LMArena Blog

LMArena Research: Does Sentiment Matter in AI?

Introducing Sentiment Control: Disentangling Sentiment and Substance

hollow imp Aug 15, 2025, 10:53 PM

#

@leaden palm please some prompt engineering tips

leaden palm Aug 15, 2025, 10:53 PM

#

hollow imp <@794377681331945524> please some prompt engineering tips

for...?

tired herald Aug 15, 2025, 10:56 PM

#

ornate agate Btw do llm arena models have system prompts?

From what I checked, it seems that its either a very very simple one, none at all, or a base system prompt from the provider (gemini gave me a believable system prompt when I ehhm forc- made it give me its system prompt, all other ai's said they didnt have one/gave me mine back, which prob means mine was the only one)

#

Tho I cant confirm that unless I had access to the internals of LMArena

ornate agate Aug 15, 2025, 11:03 PM

#

tired herald From what I checked, it seems that its either a very very simple one, none at al...

Yeah that would make the most sense. Simple as possible to stay close to the original model.

golden ocean Aug 15, 2025, 11:29 PM

#

willow grail Aug 15, 2025, 11:36 PM

#

IIRC cursor had big context issues 1 year ago. is this still the case, now with gpt5 high?

fervent tangle Aug 15, 2025, 11:48 PM

#

why do OpenAI models suck so bad at creative writing

#

and why is Claude so good at it

wintry tinsel Aug 15, 2025, 11:50 PM

#

fervent tangle why do OpenAI models suck so bad at creative writing

Cuz open AI is the big poo poo, the prime crapola!

rugged swan Aug 15, 2025, 11:50 PM

#

new here, excited to learn from you all

fervent tangle Aug 15, 2025, 11:51 PM

#

🥀 fr

wintry tinsel Aug 15, 2025, 11:51 PM

#

rugged swan new here, excited to learn from you all

Learn how to do obscene things with Gemini?

#

🔥

misty vault Aug 16, 2025, 12:06 AM

#

paws stop typing

fervent tangle Aug 16, 2025, 12:17 AM

#

i used all openai models since 2023, even GPT5 sucks today

#

and the emoji blasting is corny

#

i hate the emoji respondings

jade egret Aug 16, 2025, 12:40 AM

#

👀

quiet dust Aug 16, 2025, 12:50 AM

#

tired herald GPT 5 high

What?

fervent tangle Aug 16, 2025, 12:51 AM

#

jade egret 👀

google needs to release genie 3

#

and its game over

jade egret Aug 16, 2025, 12:51 AM

#

fr

fervent tangle Aug 16, 2025, 12:51 AM

#

google is actually the winner, cuz they got all the money in the world + all resources

#

it is actually

jade egret Aug 16, 2025, 12:51 AM

#

?

#

when u have no money, no resources and your gonna win?

fervent tangle Aug 16, 2025, 12:52 AM

#

google has the best image, chat and video models rn

#

wydm it isnt

quiet dust Aug 16, 2025, 12:52 AM

#

Guys, does anyone know how to use the Thinking model on the phone in ChatGPT, not Thinking mini? It's just that the Thinking mini model provides the "Think longer" function.

jade egret Aug 16, 2025, 12:52 AM

#

jade egret when u have no money, no resources and your gonna win?

i dont see how this is gonna happen

fervent tangle Aug 16, 2025, 12:52 AM

#

they literally cant lose if they do that

jade egret Aug 16, 2025, 12:52 AM

#

quiet dust Guys, does anyone know how to use the Thinking model on the phone in ChatGPT, no...

are you payed option?

#

google?

wintry tinsel Aug 16, 2025, 12:52 AM

#

jade egret 👀

Oh yeah baby and triple

fervent tangle Aug 16, 2025, 12:53 AM

#

quiet dust Guys, does anyone know how to use the Thinking model on the phone in ChatGPT, no...

u can't the GPT5 chooses if it should think longer based on ur prompt

quiet dust Aug 16, 2025, 12:53 AM

#

jade egret are you payed option?

I'm on the free plan

jade egret Aug 16, 2025, 12:53 AM

#

google can win

fervent tangle Aug 16, 2025, 12:53 AM

#

jade egret google can win

it has already won

jade egret Aug 16, 2025, 12:53 AM

#

quiet dust I'm on the free plan

than i dont think u can use thinking

jade egret Aug 16, 2025, 12:53 AM

#

fervent tangle it has already won

ig

#

😭

patent aspen Aug 16, 2025, 12:53 AM

#

Google has a much, much longer runway than OAI, and it's not even close

fervent tangle Aug 16, 2025, 12:53 AM

#

and in a few months they'll release Gemini 3

jade egret Aug 16, 2025, 12:53 AM

#

xAI seriously

fervent tangle Aug 16, 2025, 12:53 AM

#

it will crush the coding benchmarks

#

fr

jade egret Aug 16, 2025, 12:53 AM

#

why u hating on google tho

patent aspen Aug 16, 2025, 12:53 AM

#

And eventually sheer capability just wins

jade egret Aug 16, 2025, 12:54 AM

#

wdym no product

quiet dust Aug 16, 2025, 12:54 AM

#

fervent tangle u can't the GPT5 chooses if it should think longer based on ur prompt

There is a button. The "Think longer" button.

fervent tangle Aug 16, 2025, 12:54 AM

#

they'll use it if its the best at all stuff

patent aspen Aug 16, 2025, 12:54 AM

#

OpenAI is far less innovative than Google, and it's not even close

fervent tangle Aug 16, 2025, 12:54 AM

#

doesn't even matter, they get their income from other stuff like google search, youtube and all google owned markets

#

they can throw money on AI until they get big

jade egret Aug 16, 2025, 12:55 AM

#

they can just put gemini in chrome which they already started?

patent aspen Aug 16, 2025, 12:55 AM

#

The wider the gulf in the capability, the less product polish matters

fervent tangle Aug 16, 2025, 12:55 AM

#

u know the google cfo?

jade egret Aug 16, 2025, 12:56 AM

#

if you count all the x users?

fervent tangle Aug 16, 2025, 12:56 AM

#

cuz its "not woke"

jade egret Aug 16, 2025, 12:56 AM

#

well than why not include ai overview and ai mode

fervent tangle Aug 16, 2025, 12:56 AM

#

and u can easily jailbreak grok 4

patent aspen Aug 16, 2025, 12:56 AM

#

I mean if you're going to make that argument, you have to include AI Overviews

fervent tangle Aug 16, 2025, 12:56 AM

#

grok 4 is easily controlled unlike claude, openai and google models

jade egret Aug 16, 2025, 12:57 AM

#

grok turned into a h*tler fan 😭

fervent tangle Aug 16, 2025, 12:57 AM

#

jade egret grok turned into a h*tler fan 😭

it got no security

jade egret Aug 16, 2025, 12:57 AM

#

fervent tangle it got no security

FR

fervent tangle Aug 16, 2025, 12:58 AM

#

tbh even tho claude 4.1 opus is the best at coding, creative writing. majority of people still use chatgpt

jade egret Aug 16, 2025, 12:58 AM

#

fervent tangle tbh even tho claude 4.1 opus is the best at coding, creative writing. majority o...

true

fervent tangle Aug 16, 2025, 12:58 AM

#

because normal people dont care which AI is the best

#

they just download chatgpt and use it

jade egret Aug 16, 2025, 12:59 AM

#

still think google can win tho

fervent tangle Aug 16, 2025, 12:59 AM

#

we're like 0.01% of all humans

#

if ur into AI and coding

#

the majority follows the slop

#

they dont care if its good or bad

sullen quest Aug 16, 2025, 1:07 AM

#

honestly it might be better than some ealier versions, where it pretended it was writing a epic fantasy tale

#

use ai google studio if you are using 2.5 pro

jade egret Aug 16, 2025, 1:13 AM

#

the gemini is different?

sullen quest Aug 16, 2025, 1:14 AM

#

jade egret the gemini is different?

different instruction set

#

yah, its not suprising that the api version has less instructions then the non api version though

#

I have no clue what's going on with the front facing version since it never can act normal

patent aspen Aug 16, 2025, 1:19 AM

#

I'm going to be honest. The ChatGPT web UI looks just like the Gemini web UI except with worse iconography

keen beacon Aug 16, 2025, 1:19 AM

#

fervent tangle they just download chatgpt and use it

be me
can't choose which large language model to install
ask a friend
friend says chatgpt
download chatgpt
run chatgpt

#

https://tenor.com/view/giga-chad-gif-23143840

Tenor

patent aspen Aug 16, 2025, 1:21 AM

#

The responses?

#

You've got a few weeks at best

fervent tangle Aug 16, 2025, 1:22 AM

#

keen beacon >be me >can't choose which large language model to install >ask a friend >friend...

yeah and its good enough for everything at the moment

patent aspen Aug 16, 2025, 1:24 AM

#

It doesn't matter while the quality gap is small. It matters more as the gap widens

#

Nah they're about 1/3 to 1/2 a generation behind

#

Capability

#

Nah

#

Better in some areas. Worse in others. Overall parity

keen beacon Aug 16, 2025, 1:26 AM

#

fervent tangle yeah and its good enough for everything at the moment

To my surprise, sure

patent aspen Aug 16, 2025, 1:27 AM

#

But the point is that their pace isn't enough

#

They're roughly 1/3 to 1/2 a generation behind

#

Nah

#

Look if you want to compare bad ChatGPT responses vs bad Gemini responses, we can do that. I just don't think it would be a useful conversation

#

Do you just not like the bold sections?

#

keen beacon Aug 16, 2025, 1:37 AM

#

Guys

patent aspen Aug 16, 2025, 1:37 AM

#

I'm not sure what to say. Most people prefer paragraphs with bold headers

#

I do think verbosity is on the chopping block though

keen beacon Aug 16, 2025, 1:38 AM

#

Did you know that you can compare the answers by SOTA and non-SOTA models to determine if the models you ask are right

#

You just ask say Qwen2.5 the same question 10 times in a row

#

Then Qwen3

#

And if the responses tend to differ, one LLM is clearly in the wrong here

misty vault Aug 16, 2025, 1:44 AM

#

keen beacon >be me >can't choose which large language model to install >ask a friend >friend...

be me
talk with bing chat sydney

#

distilling from sydney fine tune before it gets shut down

uneven lance Aug 16, 2025, 1:47 AM

#

'Ello guys, is there any site that generates videos using Veo3 for free?

rare python Aug 16, 2025, 1:55 AM

#

patent aspen Do you just not like the bold sections?

For me, I don't care about the writing style as long as the LLM I'm using flawless at instructions following, consistent at long context. Gemini 2.5 Pro did very well about first 10 messages, then it broke and went back to bullet list, which I instructed to be banned in systen instructions.

verbal nimbus Aug 16, 2025, 1:56 AM

#

Default ChatGPT version of GPT-5 is on there now

rare python Aug 16, 2025, 1:57 AM

#

verbal nimbus Default ChatGPT version of GPT-5 is on there now

It isn't as a hype yes man as gpt 4o so it got lower elo 💀

verbal nimbus Aug 16, 2025, 1:57 AM

#

Lower than GPT-4.5 too

rare python Aug 16, 2025, 1:59 AM

#

This is killing me 😩 Please ship faster and better quality

#

maybe the thinking but the chat version is 💩

#

chat is the non thinking version right? It's just GPT 4.1 in cloak

#

and it's really bad at IF

elder solar Aug 16, 2025, 2:08 AM

#

is there any api that has the 3 version of gpt 5

rare python Aug 16, 2025, 2:08 AM

#

I don't care about benchmark I only care how well it follows my own system prompt

#

livebench rarely match my real world usage

#

then why did you show it as a proof of how well GPT 5 at IF 😭 💀

elder solar Aug 16, 2025, 2:11 AM

#

theres not alot trustful benchmarks of ai

#

or prob its depend on the prompt

rare python Aug 16, 2025, 2:13 AM

#

elder solar or prob its depend on the prompt

Most trustworthy benchmarks are your own

elder solar Aug 16, 2025, 2:16 AM

#

rare python Most trustworthy benchmarks are your own

maybe

#

who knows

rare python Aug 16, 2025, 2:16 AM

#

🗿

elder solar Aug 16, 2025, 2:17 AM

#

however

#

i still hope theres another ai model that can listen audios

#

that is not gemini

inner gate Aug 16, 2025, 2:24 AM

#

Gemini can hear u?

elder solar Aug 16, 2025, 2:27 AM

#

inner gate Gemini can hear u?

yeah ik

#

but there should be more ai models that could be it

#

like chatgpt

fast halo Aug 16, 2025, 2:31 AM

#

only me who takes ages to load a response?

jade egret Aug 16, 2025, 2:32 AM

#

fast halo only me who takes ages to load a response?

?

#

gemini?

#

or direct chat

fast halo Aug 16, 2025, 2:54 AM

#

direct

lucid kite Aug 16, 2025, 3:10 AM

#

Good evening! I’m new here. Could you please let me know if you plan to add video generation to the site in the future?

sinful vessel Aug 16, 2025, 3:20 AM

#

what happened with nano banana model?

#

was removed?

#

It doesn't appear to me anymore, before it appeared to me all the time in battle

drifting thorn Aug 16, 2025, 3:30 AM

#

I guess nano banana is going to be released

#

I hope so

whole wagon Aug 16, 2025, 3:42 AM

#

#

Is Google cooking smth up

#

Apparently they had a celebration when they saw GPT5 performance

#

Lol

wind vector Aug 16, 2025, 3:44 AM

#

just curious, but why limit video generation / arena to discord? discord ai stuff is the worst lol

whole wagon Aug 16, 2025, 3:45 AM

#

https://www.anthropic.com/research/end-subset-conversations

Claude Opus 4 and 4.1 can now end a rare subset of conversations

An update on our exploratory research on model welfare

hot lance Aug 16, 2025, 3:45 AM

#

image to video is too 💩

hot lance Aug 16, 2025, 3:47 AM

#

wind vector just curious, but why limit video generation / arena to discord? discord ai stuf...

Maybe ,the cost of data for downloading videos is very high.

wind vector Aug 16, 2025, 3:50 AM

#

hot lance Maybe ,the cost of data for downloading videos is very high.

ahh that might be the case

echo aurora Aug 16, 2025, 3:57 AM

#

lucid kite Good evening! I’m new here. Could you please let me know if you plan to add vide...

welcome welcome! It's possible we add Video Arena to the site in the future, we're treating this as an experiment using Discord. #bot-feedback is where we're collecting thoughts on it.

echo aurora Aug 16, 2025, 3:58 AM

#

wind vector just curious, but why limit video generation / arena to discord? discord ai stuf...

Vid Gen is more expensive, so we do want to limit it a bit. Currently it's set to 8 generations a day.

#

We taking all of this feedback into account though! So it may change.

plucky palm Aug 16, 2025, 4:01 AM

#

echo aurora We taking all of this feedback into account though! So it may change.

I think Discord is great, it also works as a funnel for making announcements, since you’re not collecting emails or requiring sign-ups on a website.

#

plus i like the community

echo aurora Aug 16, 2025, 4:02 AM

#

plucky palm I think Discord is great, it also works as a funnel for making announcements, si...

We'll always have a Discord for that kind of stuff fortunately.

echo aurora Aug 16, 2025, 4:03 AM

#

plucky palm plus i like the community

agreed & seeing our site is about gathering community preferences there should be a place to chat about it.

#

thinking vs non-thinking versions

plucky palm Aug 16, 2025, 4:04 AM

#

echo aurora agreed & seeing our site is about gathering community preferences there should b...

exactly

stray aspen Aug 16, 2025, 4:05 AM

#

any gemini 3 news

verbal nimbus Aug 16, 2025, 4:11 AM

#

whole wagon Apparently they had a celebration when they saw GPT5 performance

Lmao

#

Hopefully their next model is better at agentic coding

drifting thorn Aug 16, 2025, 5:02 AM

#

I hope Gemini 3.0 can further extend their context window for even better agentic usage and tool use in agent. Ability of creative writing is also being anticipated by a lot of AIRP users

#

Gemini 2.5 Pro still holds the SOTA in long form multi-turn creative writing in actual usage but I wish they can even make it better

wind vector Aug 16, 2025, 5:07 AM

#

drifting thorn I hope Gemini 3.0 can further extend their context window for even better agenti...

Personally I don't see what use there is of a longer context window when they can barely utilize their full 1 million so far

#

They need to improve attention to the context they have, imo

#

Especially for creative writing, airp, or even coding in some cases. Approaching 200k tokens and you see a ton of ai'isms / quality degredation / "amnesia" creeping in

drifting thorn Aug 16, 2025, 5:09 AM

#

YES

wind vector Aug 16, 2025, 5:09 AM

#

Seems like a hard problem to solve, though

drifting thorn Aug 16, 2025, 5:11 AM

#

I mean, they have to be train to have a consciousness of "rounds of conversation", like "this context is from the first round", "this context is from the second round" etc

unique cave Aug 16, 2025, 5:48 AM

#

which model is ranked first overall for this month?

lilac inlet Aug 16, 2025, 5:57 AM

#

Guys! I'm sharing my learning here, which I recently made up from my non-coding mind. I mean, I don't even know HTML.

I just discovered a way to create unlimited web apps, widgets, desktop apps, and so on for free!

We need two tools: Lmarena and Weblmarena, that's it!

Any person from a non-technical background can easily create fully customised software at no cost. With the steps below that I discovered lately!

Pick a pen and copy and write down about your desired app, mentioning every single detail that pops up in your head. For this, remember this rule, 5W1H (What, When, Where, Who, Why and How). If you have a better strategic model, then you can implement it here. It's totally up to you how you want to describe your app!
Click photos of your notebook pages and upload them to any LLM model and ask it to transcribe your images or pull the contents from them!
Once you have the text content, Copy it and open your lmarena and select the modal qwen coding one and paste all of your content there and also add your custom instructions how you want it to be built, for an example, the widget I made was a react js component + Typescript in node js page, same way you have to give instructions and ask your desired output in single page written code!
Once you have the code, Open Weblmarena and paste the code in the text box and hit enter! There you go, it will take 2-4 minutes to render your code and show you the preview of your website!
For Iteration and Bug error fixing, head back to your lmarena and open the Qwen coding thread where you have your previous chat. Now, you can iterate on it, fix your errors and so on. Repeat step 4 again and keep on doing it until you have the desired output!

If anyone has a better way to execute such ideas for a non-technical person, kindly share it. It would be really helpful for a newbie like me who knows nothing about tech stuff! 🤗

frigid coral Aug 16, 2025, 6:11 AM

#

https://eqbench.com/spiral-bench.html

#

Gemini 2.5 Pro highest in sycophancy

verbal nimbus Aug 16, 2025, 6:16 AM

#

frigid coral Gemini 2.5 Pro highest in sycophancy

Interesting, but didn't expect to see Claude below Gemini

#

It's even lower than 4o here

frigid coral Aug 16, 2025, 6:25 AM

#

claude seems to perform really bad at mania tests

keen beacon Aug 16, 2025, 6:27 AM

#

It's been only a week since I found out LMArena and I jailbroke two things already

#

Bruh

fervent tangle Aug 16, 2025, 8:38 AM

#

keen beacon It's been only a week since I found out LMArena and I jailbroke two things alrea...

calm down, ive known lmarena for a year

#

and they still the same

keen beacon Aug 16, 2025, 9:05 AM

#

Kinda worrisome tbh

#

Turns out most people would rather have sycophantic delusion inducion machines than truth seeking arbiters such as GPT-5

#

I guess it can explains some ratings pretty well

white hatch Aug 16, 2025, 9:11 AM

#

I hate the safety!

leaden sun Aug 16, 2025, 9:12 AM

#

what's the definition of "safety" here?

solid brook Aug 16, 2025, 9:12 AM

#

keen beacon Turns out most people would rather have sycophantic delusion inducion machines t...

Yeah a very bad future we're heading to

leaden sun Aug 16, 2025, 9:18 AM

#

keen beacon Kinda worrisome tbh

without a precise definition of words like "safety, risks" in the context of AI, this is nothing but pseudo-science, isnt it?

quiet dust Aug 16, 2025, 9:24 AM

#

Why don't I have a button where I can change the GPT-5 model?

#

I heard you'll be able to switch between Fast, Auto, and Thinking. But how?

autumn cargo Aug 16, 2025, 9:31 AM

#

Why can models with lower scores end up higher? Isn't CI a symmetric thing?

leaden sun Aug 16, 2025, 9:35 AM

#

keen beacon Kinda worrisome tbh

if you read how they described this bench, you know it's wishy-washy science https://eqbench.com/spiral-bench.html

keen beacon Aug 16, 2025, 9:36 AM

#

whole wagon

Aghhh, give me gemini 3

keen beacon Aug 16, 2025, 9:38 AM

#

keen beacon Turns out most people would rather have sycophantic delusion inducion machines t...

Because they want AI to act as a "bestie"

#

and now because of that minority, OpenAI is making GPT 5 "warmer"

leaden sun Aug 16, 2025, 9:44 AM

#

hahahah

ocean vortex Aug 16, 2025, 10:11 AM

#

autumn cargo Why can models with lower scores end up higher? Isn't CI a symmetric thing?

Ohh Opus is top3 now. Seems Anthropic entered the game of user preference fine-tuning now. Opus 4.1 has bigger gains over 4.0 than Sonnet4 has over 3.5.

reef bridge Aug 16, 2025, 10:16 AM

#

can't lmarena do function calls on the chat other than search

calm trout Aug 16, 2025, 10:28 AM

#

Is the video generation here powered by Veo 3?

brittle tiger Aug 16, 2025, 10:46 AM

#

https://x.com/IndraVahan/status/1956650892788191552?t=Q816aqHymv6huA6xls5hgw&s=19

Indra (@IndraVahan)

i think we’re looking at a breakthrough in image gen.

nano-banana is literally picking its own spots to edit. changes flow with the context + prompt. doesn’t recreate the whole thing like gpt-image-1 or step1x-edit. it's like flux-1-kontext, but way sharper on context and prompt

copper furnace Aug 16, 2025, 10:48 AM

#

Hello im preatty new here, is there any limits on how many images/videos you can generate?

#

*pretty

past shuttle Aug 16, 2025, 10:50 AM

#

gemini king!

exotic nebula Aug 16, 2025, 11:22 AM

#

copper furnace Hello im preatty new here, is there any limits on how many images/videos you can...

Yes. 8 exactly.

exotic nebula Aug 16, 2025, 11:22 AM

#

past shuttle gemini king!

Based.

exotic nebula Aug 16, 2025, 11:23 AM

#

calm trout Is the video generation here powered by Veo 3?

No, its powered by many video gen models. Its randomized, think of it as the exact version of battle mode.

copper furnace Aug 16, 2025, 11:41 AM

#

exotic nebula Yes. 8 exactly.

Ok thanks

hollow imp Aug 16, 2025, 11:54 AM

#

leaden palm for...?

Gemini 2.5 and grok

plucky island Aug 16, 2025, 11:56 AM

#

keen beacon Turns out most people would rather have sycophantic delusion inducion machines t...

"truth seeking arbiter" and "GPT-5" in the same sentence? what?

hollow imp Aug 16, 2025, 12:09 PM

#

plucky island "truth seeking arbiter" and "GPT-5" in the same sentence? what?

Absolutely not

ionic idol Aug 16, 2025, 12:21 PM

#

not working

hollow imp Aug 16, 2025, 12:27 PM

#

ionic idol not working

Use nano banana

full burrow Aug 16, 2025, 12:30 PM

#

Looking forward to seeing nano-banana on the chart.

plucky island Aug 16, 2025, 12:34 PM

#

hollow imp Use nano banana

btw who made nano banana and where is it other than lmarena?

hollow imp Aug 16, 2025, 12:34 PM

#

plucky island btw who made nano banana and where is it other than lmarena?

I have no clue

drifting thorn Aug 16, 2025, 12:43 PM

#

plucky island btw who made nano banana and where is it other than lmarena?

I thought Google made it

hollow imp Aug 16, 2025, 12:50 PM

#

drifting thorn I thought Google made it

Google would never put such a freaky name

#

And they already have veo

scenic salmon Aug 16, 2025, 1:25 PM

#

#

They ruined it…

stray aspen Aug 16, 2025, 1:25 PM

#

Lmao

native gate Aug 16, 2025, 1:29 PM

#

Error: Something went wrong while generating the response. Please try again. Always when I paste some long code php

stray aspen Aug 16, 2025, 1:29 PM

#

@tired herald what's the fix

keen beacon Aug 16, 2025, 1:32 PM

#

scenic salmon

I hate AI psychosis people

neon idol Aug 16, 2025, 1:32 PM

#

Hello

neon idol Aug 16, 2025, 1:32 PM

#

keen beacon I hate AI psychosis people

Me too

keen beacon Aug 16, 2025, 1:32 PM

#

neon idol Hello

Hei

neon idol Aug 16, 2025, 1:32 PM

#

keen beacon Hei

Yo

#

I want deepseek r2

#

But they nerfed it 🙁

keen beacon Aug 16, 2025, 1:33 PM

#

neon idol I want deepseek r2

Me too... But one thing I want more is not to make GPT-5 a servile model

#

aka too positive and eager to please

#

but OpenAI is doing it

#

stated in a tweet

neon idol Aug 16, 2025, 1:33 PM

#

Me still waiting for Gemini 3 amd r2 💀🙏

keen beacon Aug 16, 2025, 1:34 PM

#

neon idol Me still waiting for Gemini 3 amd r2 💀🙏

I hope R2 will wreck the whole leaderboard like with R1

#

in January

neon idol Aug 16, 2025, 1:34 PM

#

Yesh I hope

#

WE WANT THE RED CHINA DRAGON ATTACK 🗣🔥

hollow imp Aug 16, 2025, 1:36 PM

#

🔥

keen beacon Aug 16, 2025, 1:45 PM

#

To my surprise Qwen and Deepseek answer the same question better when asked through chat, not API. Deepseek and Qwen answer my music theory question 7-8 times out of 10 and Deepseek even infers almost correct conclusions another time which makes it more like 8.5-9/10 if I'm more liberal with scoring.

In LMarena when stress tested with the same prompt 10 times they both answer 3-4/10 at best.

#

It seems like the workload can influence performance.

hollow imp Aug 16, 2025, 1:46 PM

#

keen beacon To my surprise Qwen and Deepseek answer the same question better when asked thro...

But I cannot access o3 without lmarena

keen beacon Aug 16, 2025, 1:48 PM

#

hollow imp But I cannot access o3 without lmarena

o3 still gets it correctly even at LMArena

#

OpenAI is the most demanded AI startup on this planet either way

#

They know how to scale their toys

#

Deepseek however seems to know about it a bit less

#

Holy hell if Deepseek in the chat version is actually that much better than the API version then I can only imagine what is going to happen once they will rent out enough compute to run R2

#

It's not there (the o3/GPT 5/Gemini 2.5 level), but it is very, very close

obsidian cargo Aug 16, 2025, 1:56 PM

#

I want Deepseek v4. V3 is the best model available on AI Dungeon atm but I'm a bit burned out of its limitations.

warped totem Aug 16, 2025, 1:58 PM

#

hey, yesterday i lost all my chat sessions for some reason. can i retrieve it ?

keen beacon Aug 16, 2025, 2:00 PM

#

keen beacon Holy hell if Deepseek in the chat version is actually that much better than the ...

Or maybe it is because LMArena proxies to Deepseek API through US servers and I am in Russia and get different entry points kek

hollow imp Aug 16, 2025, 2:00 PM

#

keen beacon It's not there (the o3/GPT 5/Gemini 2.5 level), but it is very, very close

Deepseek r2 will not be gemini 2.5 level?

warped totem Aug 16, 2025, 2:02 PM

#

is gpt 5 high any better than gemini 2.5?

sullen quest Aug 16, 2025, 2:03 PM

#

depends on the gemini 2.5 and depends on the gpt 5

warped totem Aug 16, 2025, 2:04 PM

#

what u mean

#

these on llm arena

sullen quest Aug 16, 2025, 2:06 PM

#

There's multiple versions of gpt5, high, chat, high nano and high mini

#

I used to think gemini 2.5 was slow, then gpt 5 high came around

warped totem Aug 16, 2025, 2:07 PM

#

so i told "gpt 5 high"

solid brook Aug 16, 2025, 2:07 PM

#

scenic salmon

i mean you can choose the robotic personality

solid brook Aug 16, 2025, 2:08 PM

#

warped totem is gpt 5 high any better than gemini 2.5?

yeah dude lol

#

even gpt 5 medium

sullen quest Aug 16, 2025, 2:08 PM

#

depends on the task

warped totem Aug 16, 2025, 2:08 PM

#

most of the time i have a feeling gemini 2.5 give better responses

#

and its little faster

sullen quest Aug 16, 2025, 2:09 PM

#

I've heard that gemini 2.5 is better at different langauges compared to gpt 5

ocean vortex Aug 16, 2025, 2:09 PM

#

keen beacon Deepseek however seems to know about it a bit less

They know a lot. But they are constrained by CCP forcing them to use crappy Huawei chips lol

sullen quest Aug 16, 2025, 2:09 PM

#

idk I don't speak other languages so I can't confirm

sullen quest Aug 16, 2025, 2:09 PM

#

ocean vortex They know a lot. But they are constrained by CCP forcing them to use crappy Huaw...

its also the us goverment forcing them too

warped totem Aug 16, 2025, 2:10 PM

#

yea, thats true - sometimes he makes typos

keen beacon Aug 16, 2025, 2:10 PM

#

warped totem is gpt 5 high any better than gemini 2.5?

Sometimes.

One answers some of my questions better than another one. Depends on the question asked.

But they are roughly head-to-head in most things.

ocean vortex Aug 16, 2025, 2:10 PM

#

sullen quest its also the us goverment forcing them too

Well yeah but they were able to go around that (import restrictions) to be fair

keen beacon Aug 16, 2025, 2:10 PM

#

ocean vortex They know a lot. But they are constrained by CCP forcing them to use crappy Huaw...

If this is true this is sad.

#

Teortaxes wrote about it being fake news and so on. But here in Russia we actually have this exact problem, and being a Russian he sounds like massive cope to be honest.

#

But really

solid brook Aug 16, 2025, 2:11 PM

#

dude gpt 5 high is a lot better than the lobotimized gemini 2,5 we got

sullen quest Aug 16, 2025, 2:11 PM

#

ocean vortex Well yeah but they were able to go around that (import restrictions) to be fair

Only so much and it makes the chips more expensive

keen beacon Aug 16, 2025, 2:12 PM

#

keen beacon But really