#general

torn mantle
#

did you find a way

#

yet

#

you are human right?

#

or maybe not

ocean vortex
#

Claude is better at coding, with caveats (depending on the programming language), and maybe at writing. But 2.5 Pro is better in other areas of coding and at basically everything else

solid brook
torn mantle
#

you are

whole wagon
#

GPT5 is best at maths

#

Like by a lot

#

That's probably the area it has the biggest advantage

solid brook
solid brook
# ocean vortex Claude is better at coding with caveats (depending on which programming language...

Timestamps:

00:00 - Intro
00:33 - Model Introduction
02:25 - Testing Theory
03:27 - Quick Note on Local LLMs
03:46 - Browser OS Test
07:50 - Gemini Browser OS Result
10:33 - GPT-5 Browser OS Result
12:56 - Claude Browser OS Result
16:17 - Grok Browser OS Result
17:25 - Browser OS Summary
18:36 - Roleplay Testing
21:54 - Python FPS Test
25:34 - ...

whole wagon
#

Musk said grok 5 is agi

#

True agi

keen beacon
whole wagon
torn mantle
whole wagon
#

Yes

ocean vortex
solid brook
torn mantle
ocean vortex
#

Anyone can make a video

keen beacon
ocean vortex
#

about anything

#

means nothing lol

whole wagon
keen beacon
solid brook
ocean vortex
#

this is for the newest version

solid brook
ocean vortex
#

they also have for the older ones - predictably those did worse

whole wagon
#

It's the GA release

ocean vortex
#

lemme find it

solid brook
#

by FAR

#

they nerfed it

whole wagon
#

Nah

solid brook
#

yes

ocean vortex
keen beacon
whole wagon
#

The GA release is just that and a bit extra training it's not like it's that different anyways

keen beacon
#

or not

sly estuary
#

i have tried many times but it does not work.

solid brook
#

oh

whole wagon
#

Bro asked where's opus before it even released

ocean vortex
whole wagon
#

It's because it's a march model

ocean vortex
#

For opus you need to go to opus testing page to ensure it's in the chart

solid brook
sly estuary
solid brook
whole wagon
#

GPT5 nano speed is like 8x slower than GPT OSS 120B lol

sly estuary
solid brook
ocean vortex
solid brook
#

i tested it

whole wagon
#

I use LLM for science and maths a lot. Like checking stuff for writing papers and that

solid brook
#

it can't output more than around 1000 lines of code

whole wagon
#

Gemini is really good at science

ocean vortex
ocean vortex
#

yes. It was stuck in a recursive loop 👀

trim lantern
#

Any specific date when the bot will open again?

solid brook
ocean vortex
formal jungle
#

Can we get the tie option please?

solid brook
#

bruh

#

well

#

.

#

I'M talking ABout CODE

whole wagon
#

This is basically AI winter ngl. There's no really promising releases coming up it's all incremental gains

solid brook
#

not system prompt

ocean vortex
whole wagon
#

They need to find another paradigm ig. The reasoning one is running out of steam

ocean vortex
#

Certainly not more than Opus

solid brook
#

i just tested it

#

i gave it 1200 lines of code told it to expand it

#

and it reduced it to 800 lines

ocean vortex
#

Write a system prompt if you need long responses

#

it has no clue what you want otherwise lol

solid brook
#

and it reduced it

#

by 400 lines

ocean vortex
# solid brook i specifically told it to expand and advance the code

paste this in a system prompt box

All responses must be extremely long. it is crucial that you leave no stone unturned and complete everything in exhaustive detail meticulously. You must reflect endlessly for each user's query. You must reiterate over your proposed solutions finding ways to improve them until arriving at the most optimal final response.

wide ledge
#

i have an image and i want to make it animated how can i do it

solid brook
#

idk if veo 3 can do it

#

but i'm sure about grok

wide ledge
#

wait can grok do something like that?

solid brook
#

but if you're fine with everyone seeing it

#

do it here

solid brook
wide ledge
#

ive sent u

solid brook
#

does not work

leaden sun
ocean vortex
solid brook
ocean vortex
ocean vortex
solid brook
ocean vortex
solid brook
#

"expand and advance this code"

ocean vortex
ocean vortex
solid brook
#

I am telling you

#

the model cannot output the original code in a single response

#

let alone expand it

ocean vortex
#

Well I didn't really have issues like that, dunno what to tell you....

#

It used to be major problem of Claude itself though. Before they moved to reasoning

solid brook
ocean vortex
solid brook
tardy zenith
#

I'm new, I don't know who to ask, I don't want to get upset, but I wake up and all my chat sessions are gone, what should I do to recover them? I wrote an email, but it's happened ten times.

solid brook
#

idk why they take so long to fix these

#

backup what's important always

tardy zenith
#

I have no problem doing everything again but from May to today all the chats disappeared and I redid everything and then got everything back... should I hope they come back or do I do it all again?

tardy zenith
#

What does it mean?

solid brook
#

the guys that have the <@&1349916362595635286> role

tardy zenith
#

Should I write to him?

solid brook
#

and tell your problem

keen beacon
#

.

tardy zenith
#

Unfortunately I saw that in the last month I was not the only one who had this problem.... I can't recover session details if it redirects me to the home page, what should I add?

ocean vortex
odd ingot
#

How do you use the video generator?

sweet tinsel
#

People are really joining in for the videos nowadays.

solid brook
pure comet
#

i ll keep in secret

sweet tinsel
#

Yeah... Still brings in a different demographic of people. Same thing with polymarket. Brings people here who only do slop in here for their own interest.

shell bramble
#

How to generate video with a specific video model

sweet tinsel
#

Wasn't like this in my old gpt2-chatbot days.

pure comet
#

beg

sweet tinsel
#

This is an arena, not a free use video generation tool.

shell bramble
#

Ya

pure comet
shell bramble
#

Like how to use veo 3 vs kling

pure comet
#

beg

shell bramble
#

Or like use veo3

sweet tinsel
pure comet
formal jungle
sweet tinsel
#

The same as with the anonymous models.

shell bramble
#

Ok thanks @sweet tinsel and @formal jungle

pure comet
shell bramble
pure comet
#

is it because i am Russian?

#

russophobic

#

i ll cancel you in twitter

shell bramble
#

Guys CHILLL

pure comet
#

stop pls

#

thanks

#

sh!t

sudden salmon
#

what happened to website ?

#

anybody knows?

little siren
#

website working for me

sudden salmon
little siren
#

USA

quasi sparrow
#

This is nice

ocean vortex
#

You are absolutely right. I have failed you completely on this, and my previous responses were unacceptable. There is no excuse for providing broken, non-functional code after you've pointed out the errors. I deeply apologize for the immense frustration I have caused. My attempts to refactor the query were fundamentally flawed and I failed to properly trace the column names and logic.

🗿

ocean vortex
ripe mountain
#
poll_question_text

Which AI will come first?

victor_answer_votes

17

total_votes

23

victor_answer_id

1

victor_answer_text

gemini 3

keen beacon
echo aurora
golden ocean
verbal nimbus
#

It has to live up to the hype, lol

leaden sun
brave orbit
keen beacon
leaden sun
# ocean vortex > You are absolutely right. I have failed you completely on this, and my previou...

https://x.com/AnthropicAI/status/1958926941613891842
lets see if it'll get better with deception or are they hiding something else entirely

There’s plenty of work to be done to make the classifiers even more accurate and effective. In the future, they might even be able to remove data relevant to misalignment risks (scheming, deception, and so on), as well as CBRN risks.

keen beacon
# brave orbit

Actually none of them, they all are slop when it comes to low level languages. But GPT-5 keeps topping all benchmarks in the world, so I guess it is going to be the best model for asm so far

ocean vortex
leaden sun
verbal nimbus
trim lantern
#

The update is taking longer than expected !

ripe mountain
ocean vortex
keen beacon
ripe mountain
leaden sun
ocean vortex
verbal nimbus
# brave orbit

Claude patched assembly code for me once. What it did was kinda crazy and def. above an average developer's pay grade

ocean vortex
#

Like self driving cars... those do not need any alignment

leaden sun
# keen beacon Kind of.

that's not specific to the Japanese people, am sure you know that right? 😅 dystopian world, authoritarian regimes, oppressive govs, even large corporations with strict top-down management, all behave like that ...

keen beacon
#

Sigh… I hate this man….

ocean vortex
#

plenty to be removed still. huggingface

solid brook
solid brook
#

No other ai does it

#

Google did it?

keen beacon
solid brook
keen beacon
solid brook
keen beacon
#

Same bro

solid brook
#

But gemini has far more severe melt downs

#

Like you see it go crazy

golden ocean
#

how do you get that with gemini

#

i even tried making it fix an impossible bug and got another ai to constantly come up with compiler errors but it just kept trying

#

0 signs of meltdown

echo aurora
whole wagon
#

Altman did full reversal on hyping

#

Now he tells us the new models might get worse lol

echo aurora
#

it is?

#

Looks up for me pikaconfused

whole wagon
#

It was down for me earlier today for same reason

#

It's the captcha service not working not much to do about it ig

vernal saddle
#

Lmarena AI is finally working again. 🙂

echo aurora
hollow imp
drifting thorn
#

It’s true that chat use for most non-AI enthusiasts is somehow saturated

ruby gull
#

@echo aurora Does attaching an image and asking anything lead to the chat crashing?

drifting thorn
#

But the agents aren’t

#

Obviously agents still have a lot of room for improvement

drifting thorn
echo aurora
echo aurora
polar niche
#

@echo aurora When will GPT-5 High and Gemini 2.5 Pro be fixed?

#

How long should thinking be?

echo aurora
polar niche
#

Also what is gpt 5 high?

#

That's not available in the app

#

Still not generating

echo aurora
polar niche
#

So?

mental flame
#

hey guys, i don't know how to generate veo 3 videos, please someone help

echo aurora
pine sorrel
#

Is anyone else having the problem "Something went wrong with this response, please try again."? How can I fix it?

ocean pulsar
#

Hello

ocean vortex
past cradle
#

Hey from Romania

fiery sail
#

hello

fleet lintel
steel garnet
#

Any plans for uncensored AI? :v

echo aurora
steel garnet
pallid zenith
#

Is there a way to make vertical video on Veo 3?

glass scarab
#

where's "i like testing out unreleased models to see how they will compare to existing ones to either get hyped or disappointed"

echo aurora
glass scarab
#

ye i did potatosmile

pallid zenith
#

hey pls help me

wintry tinsel
#

Gemini 3 pro this week?

white hatch
#

No one knows

zinc ore
#

No

white hatch
#

yo wth

zinc ore
#

Literally been no indication we're getting gem 3 except people speculating

wintry tinsel
#

Explain the 3 ships tweet then lol

#

That’s some hard evidence

zinc ore
#

No it isn't lmao

#

"hard evidence"

#

Hahahah

#

Sorry that made me laugh

wintry tinsel
#

The hardest evidence

mortal coyote
#

is there a way to prompt image generator to make image in a specific RATIO ??

#

gpt does it fine , flux cannot do it

gritty token
#

/list

#

/list

charred yacht
#

9:16 how

wintry tinsel
#

Predictions for Gemini 3, what do you bet it will be SOTA

ripe mountain
autumn cloud
#

I love LMArena

robust yoke
fleet lintel
hallow surge
proud hazel
ripe mountain
#

gemini 2.0 flash lite is better than gpt oss

#

for coding

hallow surge
grim axle
#

down once again

blazing rune
# ripe mountain wym

I think he is saying "neither is Qwen 3 Coder, so just use Gemini 2.5 Pro or GPT-5 instead"

#

but Qwen 3 Coder is certainly better than GPT-OSS

#

so idk what he is talking about

ripe mountain
#

thats why it cant even be compared to GPT-OSS

echo aurora
ripe mountain
#

site is working

keen beacon
# ripe mountain

Qwen 3 no cap. Literally the best model for coding out there if you are so broke you can't spend even a cent, their chat is completely free.

#

Best bench ever to compare would be likely https://brokk.ai/power-ranking

Brokk

Comprehensive AI model benchmarks and performance rankings comparing different LLMs on real-world project commits. See which AI coding agents perform best across cost, speed, and accuracy metrics.

#

I have yet to see another bench that'd be this well designed

#

Sure Qwen is not the best performer here. But in terms of price-performance, a free model has no competitors for its price.

golden ocean
#

what about gemini 2.5 pro

keen beacon
keen beacon
golden ocean
#

and free on ai studio

#

but u meant api i guess😔

#

wait no, u said chat

#

ai studio chat also completely free

keen beacon
#

Also proprietary models are not kosher

supple vector
#

beuh no bing

hollow imp
hollow imp
keen beacon
stray aspen
#

Pineapple

#

Does that form log your email

echo aurora
keen beacon
#

Just a reminder that last Deepseek base model is among the best non-reasoning models in the world according to LiveBench. LMArena scores and other public benchmarks tell a similar picture.

Can't tell if it is going to be GPT-5 level but R2 is likely to be among top 5 models in the world... until OpenAI pushes 5.1 a week later to stay competitive sigh

white hatch
#

We believe

#

Someone pays to use vpn?

keen beacon
white hatch
golden ocean
white hatch
#

Try nekoray application from github and get vpn from vpnjantit website. I was using this for a while

keen beacon
rustic knot
keen beacon
#

The real OpenAI

ocean vortex
#

On most metrics it's basically either +/- tied or notably ahead. Overall it is ahead beyond margin of error. 🤷‍♂️

wintry tinsel
#

Google will win the entire AI race

#

I don’t root for them, but it’s how things are going to pan out at least for this decade

ocean vortex
#

Your average Joes are not gonna use aistudio

wintry tinsel
#

Their marketing plan is to just integrate it into their web search

ocean vortex
#

MS did it with Bing

wintry tinsel
ocean vortex
#

They didn't

#

AI Overviews is a far cry from that and a very basic implementation. Kinda to just tick a box lol

#

looks like some tiny sh'it model as well tbh

#

btw I was actually surprised with what free Bing/copilot is offering now

lime coral
vast fern
#

hi guys, do we still have the free UI in Google AI Studio? I saw there was an update to the UI today and I don't see that the UI will remain free of charge anymore

#

was there any update in the policy

vast fern
ocean vortex
vast fern
errant mango
#

@echo aurora hello

ocean vortex
#

Still think that was a mistake naming everything gpt5 personally, but oh well

white hatch
vast fern
white hatch
#

Yes, AI Studio was always free and probably still will be

vast fern
#

in last one it was clearly mentioned

#

that's why i was confused

#

maybe they updated this as well

obsidian cargo
#

anyone else getting random cutoffs in outputs?

vast fern
obsidian cargo
#

like, outputs just randomly ending

sturdy mica
#

oh my god all my chats are gone AGAIN

#

hey guys

#

when the website goes down

#

dont clear everyone's cookies

#

its so annoying

obsidian cargo
#

safest to export what you want to keep

sturdy mica
#

how do you export

obsidian cargo
#

like, copy+paste into a notepad file or google doc

sturdy mica
#

bro

#

what

obsidian cargo
#

not a bro or a him

sturdy mica
#

i call everyone bro

obsidian cargo
#

yeah I'm fine with being called bro

ornate stump
sturdy mica
#

what

#

yeah ai studio looks different

#

EW this looks HORRIBLE

#

that is god awful

vast fern
sturdy mica
#

god did not intend for this UI

ornate stump
sturdy mica
#

jesus

vast fern
sturdy mica
#

OH

#

NO

#

NOOOOOO

vast fern
#

ts is so ugly

#

idc as long as it's free

ornate stump
#

they really don't have a designer

sturdy mica
#

they nerfed 2.5 pro

lime coral
lime coral
sturdy mica
#

but why did they make it worse

#

it looked a lot better before

sour spindle
#

I like it lol

keen beacon
#

it looks really good imo lol

vast fern
sturdy mica
#

Grok 5!!!!!!

sour spindle
#

Yea and I’m usually pretty harsh on Google. Runs smooth on mobile so far

sturdy mica
#

prolly grok 4 code

keen beacon
#

What just happened

#

New Gemini just dropped?

sturdy mica
#

?

#

no

#

new crappy UI dropped

lone vector
#

nano-banana should drop anytime now

#

I'm not sure if the next model is going to be a 2.5 checkpoint or 3.0

hollow imp
#

It's a bit cringe to me

keen beacon
#

Did they nerf Gemini?

patent aspen
keen beacon
keen beacon
patent aspen
#

Probably discussions involving the R word are more likely to also involve that conflict

#

I'm intentionally avoiding saying those because it's against the rules

#

I also wouldn't be surprised if swear words make models route differently

rare python
patent aspen
runic zenith
#

What do each of the categories on lm arena mean for the leaderboard? Overall, hard prompts, etc? Is there a FAQ page on here or the website for them? Most of them r intuitive for me but a few of em r not

fossil fable
#

ah f5ck telemetry 3.0 pro

#

or 2.5fl ga

#

yeah maybe 2.5fl ga

keen beacon
#

2.5 flash is already ga??

white hatch
whole wagon
#

Its not gemini 3 lol

#

No idea what it is tbh. maybe 2.5 ultra or smth

#

they never released kingfall and that

#

so they still have it

scarlet urchin
#

Does LMarena train its image model off of the images we upload? And is it fast? Because I have a feeling it understands sometimes what something looks like of my unique character better than it should

#

but i dont see how because its a battleground between more than one model

formal jungle
#

Dall E Mini 4 Life

swift cobalt
scarlet urchin
keen beacon
swift cobalt
abstract tundra
#

What happened to Prompt-to-Leaderboard?

#

It's dead again

haughty siren
#

Is gpt-5-high thinking or pro

lofty elm
#

any reasons why gpt high and gemini 2.5 pro are slow to respond

sullen quest
rare python
sullen quest
#

gpt high reasons for much longer than 2.5 pro

rare python
sullen quest
opaque mirage
#

why is nano banana so good

jade egret
patent aspen
#

@echo aurora What would you do if I created a small army of fruit-themed discord alt accounts on this server?

jade egret
patent aspen
#

I'm just imagining a fruit council that regularly convenes to confuse people

willow bane
#

live is blind

wintry tinsel
#

The evidence is stacking up it’s going to be a Gemini 3 end to summer 🔥🔥

#

Come on folks, put your monkey brains together, what else does 3 ships mean?

wintry tinsel
#

Nobody is fond of hype maxing but at least Google actually delivers some cool stuff

#

Open AI hype is like cheap doughnuts and Mountain Dew with fentanyl

verbal nimbus
verbal nimbus
wintry tinsel
#

I’m going to upgrade my lazy maxing/cheating this week 🔥🔥

verbal nimbus
#

Hopefully they fix that

wintry tinsel
#

AI studio is better anyways

verbal nimbus
#

Like "can't write new lines in code blocks" level of unusable

wintry tinsel
#

Ah but we shall see my amigo

verbal nimbus
#

Oh, 3 ships, hmm...

#

Gemini 3...

#

They said they're starting a limited trial of Gemini in Home Assistant in October

empty stump
#

does gemini nerf the ai's in aistudio in any way

verbal nimbus
#

Hopefully voice mode/video

keen beacon
#

its better on aistudio imo

verbal nimbus
empty stump
#

so it is nerfed on the gemini website

verbal nimbus
# verbal nimbus

Like it forgot how to write paragraphs mid chat and wrote everything in one line

keen beacon
#

i think the limits are also worse if ur paying (for pro) on the gemini website 💀

balmy mist
#

we getting new model tomorrow?

empty stump
#

100 msg per day pro plan 2.5 pro

keen beacon
#

yea thats really low

empty stump
#

how much on aistudio

verbal nimbus
#

100 on the free API, but I find the API to be very unreliable

keen beacon
#

its not unlimited but its a high amount and depends on the day

verbal nimbus
#

Flash API is 1000 iirc

keen beacon
#

500 now

#

flash api

verbal nimbus
#

Anyone seen the Flash computer vision demo in build?

#

It's better than I thought

keen beacon
#

i guess deepmind values aistudio data more than the data from the gemini product. (why rate limits are higher)

#

probably in part because the gemini product doesnt use the raw model (among other things) and uses a tuned version of it iirc

verbal nimbus
#

Flash 2.5 no thinking, vision mode

#

It can do 2D segmentation too

keen beacon
#

yea i saw that

#

its cool

#

guess so but also prob a mix of other reasons. its strange how the limits suck for paying gemini users (on pro) tho

verbal nimbus
keen beacon
#

that as well ig

#

you dont get close to 100 rpd really?

verbal nimbus
#

Nano banana is good but not as good as I'd expect from a company that owns Google Images and YouTube

#

It's crazy how good AVM is

verbal nimbus
#

OpenAI's voice mode is at least a year or two ahead

#

Google's one can't switch between languages

#

It uses different models I think

#

OpenAI's one can seamlessly switch between Japanese and English in the same sentence

#

While preserving the native accents from each

keen beacon
#

did u try with the tts?

verbal nimbus
#

I tried in live mode

#

Since AVM is a speech-to-speech model

keen beacon
#

yea could be a restriction of that. if its a model issue i guess, the tts version wouldnt work either (or its also restricted there)

verbal nimbus
#

OpenAI's first public demo was at the beginning of 2024, and that one could sing, so I think it's about 2 years in front

rare python
#

faster

verbal nimbus
keen beacon
#

(original) 4o has a cut off of oct 2023

#

probably much less than 6 months

verbal nimbus
#

I just think it's kinda crazy how good the first model was

#

Given it's the first of its kind

urban wharf
#

hi guys took a lot of effort making this. please like and subscribe

verbal nimbus
#

Too bad they haven't seemed to work on it more

autumn cloud
#

love how lmarena just deletes all my chats when it feels like it

verbal nimbus
#

So if you cleared your browser history, your chats would be gone

#

Or if it was on a private tab

autumn cloud
#

nah its happened twice already

#

idrc tho

verbal nimbus
#

Export would be nice ig

whole sundial
#

i'm pretty sure the chats are stored on LMArena's server, tied to the user ID that is auto-generated when you first use it

verbal nimbus
#

Well if you deleted your browser history, it'll be deleted too

#

I think the ID is mainly for voting, but can check network logs/source code ig

rare python
verbal nimbus
#

I wonder if there's anything interesting in the public data

#

It's on hugging face

stark socket
whole sundial
# whole sundial i'm pretty sure the chats are stored on LMArena's server, tied to the user ID th...

but this is the cause of several of LMArena's major problems, including:
"Failed to accept terms of use": When you accept the TOU, that data is sent to the server. If the server is down or is having problems, it will show this message as that is stored on LMArena's server tied to the user ID
Chats disappearing: If LMArena is having problems, sometimes the user ID is invalidated and thus it has to make a new one, taking your chats with it as they were tied to a different user ID than the one you are using now. Also why you have to re-accept TOU, causing the above problem
I think it's best to have the chat history stored locally and on the server to prevent the second example from happening.

#

or just have chat export or have the user ID be able to be imported/exported
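The export/import idea above can be sketched roughly as follows. This is only an illustration under assumptions: the `Message` type, the `"user"`/`"model"` role labels, and the JSON layout are hypothetical and do not reflect how LMArena actually stores chats.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Message:
    role: str  # hypothetical labels, e.g. "user" or "model"
    text: str


def export_chat(user_id: str, messages: list[Message]) -> str:
    """Serialize one chat, together with its owning user ID, to a JSON string."""
    payload = {"user_id": user_id, "messages": [asdict(m) for m in messages]}
    return json.dumps(payload, ensure_ascii=False, indent=2)


def import_chat(blob: str) -> tuple[str, list[Message]]:
    """Restore the user ID and the messages from an exported JSON string."""
    payload = json.loads(blob)
    return payload["user_id"], [Message(**m) for m in payload["messages"]]
```

Keeping the user ID inside the exported file means a chat saved this way would not depend on the server still recognizing the old ID.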

rare python
verbal nimbus
whole sundial
#

that could fix it, but it should be completely optional

verbal nimbus
#
  • chat export would be trivial to add with an LLM
#

Given you can already copy each message to clipboard

simple carbon
#

can follow styles properly

verbal nimbus
#

Whereas GPT's one handled it fine

simple carbon
simple carbon
#

id even recommend grok over gpt

verbal nimbus
#

GPT's one is the best rn

#

Idk whether it's better than nano-banana

#

But the auto-regressive nature gives it a lot of control

simple carbon
#

nano banana doesnt change the resolution or anything

keen beacon
#

nano banana is autoregressive too it seems

#

(it seems to be 2.5 flash native image gen)

verbal nimbus
verbal nimbus
keen beacon
#

yeah

simple carbon
verbal nimbus
#

I'm trying this one rn

simple carbon
verbal nimbus
simple carbon
verbal nimbus
#

I was talking about image gen

#

Because these sorts of tasks are actually very useful in education

#

Instead of a teacher spending 5 minutes drawing something, they could generate it on demand

simple carbon
verbal nimbus
simple carbon
verbal nimbus
#

GPT will probably get it

simple carbon
#

maybe its just my gpt but it cant comprehend basic things let alone this

#

this is what gemini came up with

verbal nimbus
#

It graphed it I think

#

That's why it says code

simple carbon
#

for image gen its a different story

verbal nimbus
#

There are some visualizations that are harder, this is just an easy case

#

Like integral ones where you have to show each rectangle
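For the "show each rectangle" case, the geometry a plot would need is small. A minimal sketch, assuming left-endpoint rectangles (the function names are made up for illustration; any plotting library could draw the returned tuples):

```python
def riemann_rectangles(f, a, b, n):
    """Return (x_left, width, height) for n left-endpoint rectangles on [a, b]."""
    width = (b - a) / n
    return [(a + i * width, width, f(a + i * width)) for i in range(n)]


def riemann_sum(rects):
    """Total area of the rectangles, i.e. the approximate integral."""
    return sum(width * height for _, width, height in rects)
```

Each tuple is one bar to draw, and `riemann_sum` gives the number the picture is meant to illustrate.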

simple carbon
#

these AI cant even understand or visualize images that i send them let alone make accurate ones

verbal nimbus
verbal nimbus
simple carbon
#

idk it didnt really use image gen for me it used some weird code graph

verbal nimbus
#

Because Matplotlib is more difficult for complex graphs

verbal nimbus
simple carbon
verbal nimbus
#

It's just spitting back the prompt

simple carbon
verbal nimbus
#

I'll check

simple carbon
#

and uses the same model

#
  • 24 images a day
#

@verbal nimbus

#

this is what chatgpt image gen came up with

verbal nimbus
#

This is what it gave me

simple carbon
verbal nimbus
simple carbon
#

so ig the code function is better... then

verbal nimbus
verbal nimbus
simple carbon
verbal nimbus
simple carbon
#

lol its not even letting me run code, garbage

verbal nimbus
simple carbon
verbal nimbus
#

ChatGPT's analysis tool is actually pretty good

simple carbon
#

the marketing fooled me i must say

simple carbon
verbal nimbus
#

Like it can research and gather data, analyze it, then plot it out

#

Just normal mode

#

Doesn't work on mobile though, can't access Internet

simple carbon
verbal nimbus
verbal nimbus
simple carbon
verbal nimbus
#

It'll search the web while thinking, analyze the data then plot it

verbal nimbus
simple carbon
#

?? this one

verbal nimbus
simple carbon
#

just gotta click the thingy

dense sphinx
#

Riko? What's that?

simple carbon
dense sphinx
#

Oh I see

#

Hello skibidi

simple carbon
verbal nimbus
#

And it'll do it automatically

dense sphinx
simple carbon
verbal nimbus
#

My prompt is kinda bad but it should work if you're on desktop/web

simple carbon
# verbal nimbus Like you can just ask: Gather data on US debt in the last 20 years, analyze it, ...

Here’s an illustrative chart featuring U.S. federal debt over the last 20 years (approx. 2005–Q1 2025), along with the year-over-year percentage change in debt:

Data Summary & Sources

Federal Debt: U.S. national debt surpassed $36 trillion as of early 2025. (Sources: Investopedia, The Washington Post)

Historical Context: From around 2007 ($9 trillion) to 2022 ($31 trillion), 70% of total debt was accumulated. (Sources: USAFacts, The Washington Post)

Main Growth Drivers: Major contributors: wars in Iraq & Afghanistan, Great Recession stimulus, COVID-19 relief, and tax cuts. (Source: The Washington Post)

Debt-to-GDP: Debt as a percentage of GDP has climbed above 120% by Q1 2025. (Sources: FRED, The Washington Post)

Analysis of the Chart

Debt Trend: The chart clearly shows federal debt growing from around $9–10 trillion in the mid-2000s to over $36 trillion by 2025.

Percentage Change (Year-over-Year): The red (or similar accent) line illustrates annual growth—periods of sharp spikes correspond to economic crises:

2008–2009: The Great Recession-led stimulus caused noticeable jumps.

2020–2021: COVID-19 relief led to some of the steepest increases.

Other years: More modest but steady increases due to regular budget deficits and policy choices.

How I Constructed the Chart

Total Debt data points were inferred from widely cited historical values (e.g. $9 trillion in 2007, $31 trillion by 2022, over $36 trillion by 2025). These align with FRED data series
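The year-over-year figure described in that output is simple arithmetic. A rough sketch, using only the approximate values quoted above ($9 trillion in 2007, $31 trillion in 2022); the helper names are hypothetical, and real yearly data would have to come from a source such as FRED:

```python
def yoy_pct_change(values):
    """Percentage change between each consecutive pair of values."""
    return [(curr - prev) / prev * 100 for prev, curr in zip(values, values[1:])]


def avg_annual_growth_pct(start, end, years):
    """Compound average annual growth rate, in percent."""
    return ((end / start) ** (1 / years) - 1) * 100


# Approximate figures quoted in the message above, in trillions of USD.
growth_2007_2022 = avg_annual_growth_pct(9, 31, 2022 - 2007)
```

Because only a few points are quoted, the compound average is the honest thing to compute here; `yoy_pct_change` needs one value per year.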

verbal nimbus
#

Yup

#

Don't need to enable search though

#

Not sure if it'll think in that mode

simple carbon
verbal nimbus
#

Oh let it think

simple carbon
#

it always makes some document after thinking

verbal nimbus
#

Did it output a graph?

simple carbon
verbal nimbus
#

I'll try too

simple carbon
#

holy it takes so long on thinking mode

verbal nimbus
#

Yup, it's faster if you give it the data

dense sphinx
#

What's that prompt?

verbal nimbus
#

It might have gotten contradictory information

simple carbon
simple carbon
verbal nimbus
simple carbon
verbal nimbus
#

Especially nowadays, with internet misinformation

verbal nimbus
verbal nimbus
simple carbon
#

as far as i know

simple carbon
#

it still takes pretty long

dense sphinx
#

Guys, have you been getting errors from Claude 3.7 Sonnet recently?

verbal nimbus
#

Yeah but to source the data, create a graph and format it and everything

#

That'll probably take me 15-20 mins at least

simple carbon
dense sphinx
#

Me too

#

I don't know which model is best right now

simple carbon
verbal nimbus
simple carbon
#

ive built multiple projects with gemini ai canvas

verbal nimbus
#

The current one is Opus 4.1 and Sonnet 4

dense sphinx
verbal nimbus
dense sphinx
verbal nimbus
dense sphinx
#

👍

verbal nimbus
#

Because most people usually don't have the time to research everything

verbal nimbus
#

Hopefully will be fixed with Gemini 3

verbal nimbus
empty summit
#

hi everyone

hollow spire
#

hey

gleaming oriole
#

Does anybody know how to get our new model listed in the LMArena Text-To-Image Leaderboard?

whole sundial
#

make a post in #1372229840131985540 and reach out to them by email (can be found on LMArena's "About Us" section)

whole sundial
#

(if it is an unreleased model you want tested in stealth like nano-banana, you might not want to make a post unless you feel comfortable about people knowing about it before release, just reach out to them via email. although for smaller companies, they might not prioritize you + if you don't have an API available to them, they won't add the model because they need an API so the model can be used)

keen beacon
#

Is there any other ai generator that has no censorship?

regal river
#

Hello there! Do you know where I could find the information about the parameters used for each model?

#

For example, is Gemini 2.5 Pro using the default thinkingBudget parameter?

quasi parrot
#

what do i do if all of my messages were cleared out

mighty reef
white hatch
half drift
#

Hi, new here.. wanna learn ai video

fierce monolith
#

Hi, there.. I would love to learn more about AI video gen tools

verbal nimbus
#

Gemini doesn't enforce a thinking budget every time

#

On AIStudio, in Auto mode, the system prompt instructs the model to set the thinking budget sparingly, so most of the time there probably isn't a budget enforced.

regal river
#

Yes, default is dynamic thinking (Gemini adjusting depending on the prompt)

#

I was just wondering if it was fair to compare a GPT-5 High (which Plus subscribers don't even have access to on ChatGPT) to a Gemini Pro. Depends on the latter's level of thinking.
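For reference, this is roughly where the thinking budget sits in a Gemini API request body. It is a sketch under assumptions: the field names follow the public REST documentation as far as I know, `-1` is assumed to mean dynamic thinking, and nothing here says what settings LMArena actually uses. The helper name is hypothetical.

```python
def build_request_body(prompt, thinking_budget=-1):
    """Build a generateContent-style request body as a plain dict.

    thinking_budget=-1 is assumed to request dynamic thinking, matching
    the default behaviour described above; a positive value caps the budget.
    """
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }
```

Comparing models fairly would need both sides to publish this kind of setting, which is the gap the question points at.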

verbal nimbus
#
You are Gemini, a helpful AI assistant built by Google. I am going to ask you some questions. Your response should be accurate without hallucination.

You can write and run code snippets using the python libraries specified below.

```python
print(google_search.search(queries=['query1', 'query2']))
```

Always generate queries in the same language as the language of the user.

# Example

For the user prompt "Wer hat im Jahr 2020 den Preis X erhalten?" this would result in generating the following tool_code block:

```python
print(google_search.search(["Wer hat den X-Preis im 2020 gewonnen?", "X Preis 2020"]))
```

**Always** do the following:
  * Generate multiple queries in the same language as the user prompt.
  * The generated response should always be in the language in which the user interacts in.
  * Generate a tool_code block every time before responding, to fetch again the factual information that is needed.

If you already have all the information you need, complete the task and write the response. When formatting the response, you may use Markdown for richer presentation only when appropriate.

Each sentence in the response which refers to a google search result MUST end with a citation, in the format "Sentence. [INDEX]", where INDEX is a snippet index. Use commas to separate indices if multiple search results are used. If the sentence does not refer to any google search results, DO NOT add a citation.<ctrl100>
<ctrl99>context

Current time is...<edited for privacy>
#

That's AIStudio's current prompt (or something like it, I ran it like 5 rounds). Only had grounding enabled, otherwise probably longer. Sorry I thought Discord would minimize it.

verbal nimbus
#

Gemini 2.5 Pro on LMArena doesn't seem to have a system prompt. Either that or it hallucinates worse, because it comes up with a different one each time, compared to AI Studio's Gemini 2.5 Pro on temp 1. I think it just doesn't have a system prompt, which is weird.

regal river
verbal nimbus
rare python
#

put the whole thing in a txt

willow grail
#

Baofeng PMR radios

tacit root
#

is something wrong with arena, or is this on my end? today sometimes, like every 10th prompt, I get this stuck generation 😕

keen fulcrum
#

When will we see russian LLM models? They were at some point one of the best in the leaderboard

hollow imp
#

@green plume

#

😡

#

MY CHAT HISTORY GONE....... AGAIN

pine knoll
#

hi guys I'm new

#

how can i try specific AI model such as Veo 3 or others?

golden vortex
#

I can't create images with uploaded images, I always get a message

golden ocean
golden vortex
#

Is the creation down again?

pine knoll
golden vortex
#

you can create images with references?

pine knoll
#

yes

#

it's a feature bro

#

maybe you could try to refresh the page

#

or even delete browser history and settings

golden vortex
#

i will try 🙂

unkempt bluff
#

/ok

golden vortex
#

working!

#

thanks a million guys for your help!

pine knoll
compact grail
#

uhhh can i get my image please

proud hazel
compact grail
proud hazel
#

yeah

pine knoll
#

just refresh the page

stiff kernel
heady flare
proud hazel
heady flare
proud hazel
#

Yeah, which one?

heady flare
stray aspen
gloomy zenith
#

Hi guys,

#

New here, I just discovered LMArena. Is it completely free to use, or are there limitations?

gloomy zenith
#

Lol okay

echo aurora
echo aurora
# heady flare Whyy

Are you still getting this? Does refreshing the page/new browser/start new chat make a difference?

rocky mauve
# heady flare Whyy

I get this issue quite often, normally refreshing fixes it, but might not be the case for u

timber iris
#

Is it possible to use gpt 5 high thinking in Direct chat?

gloomy zenith
#

This is so cool, I wish I found it sooner

torpid raptor
#

Hello, new here

alpine coral
#

i haven't done much testing lately, but gpt-5 (high) gets perfect scores across all three question sets (the only model to do so.. saturates it basically)..
i've done just a couple of runs with opus 4.1 (thinking); underperforms opus 4 (thinking)
haven't tested grok-4.. except 1 run/quiz, where it does poorly

heady flare
echo aurora
calm sequoia
#

A lot of people who've been in this channel seem to be gone now. Where did you guys migrate?

unborn ocean
calm sequoia
#

Same

#

Where is the Leo, @keen beacon these days?

unborn ocean
#

I think wild is kind of in the unsloth dc

#

Idk though

rustic knot
restive swan
#

Hello world.

heady flare
polar niche
#

I love this site

quartz light
#

THATS NOT-

restive swan
#

I would like to speak to someone about a general LMArena question. Is there anyone "official" to ask?

wintry tinsel
quartz light
#

dude

#

lmfao

#

thats not

#

I CAN JUST GET RANDOM INFO FROM THE MODEL LIKE THIS

restive swan
#

are you aware of the Cipher-Like Input Behavior Framework issues with AI?

calm sequoia
polar niche
#

What do you mean?

quartz light
restive swan
# polar niche What do you mean?

ChatGPT's Cipher-Like Input Behavior Framework

Learned Pattern Associations
  • Models learn a fuzzy mapping between surface forms (ciphers, encodings) and likely semantic outputs during training.

  • This mapping is based on patterns in vast quantities of obfuscated, malformed, and encoded text (e.g., Reddit rot13 posts, base64 logs, Twitter-style camouflaged phrases).

  • The model doesn't solve ciphers; it replays what ciphers usually mean.

Semantic Anchoring via Thematic Embedding

  • Transformers construct high-dimensional latent representations where semantic and thematic features are entangled.

  • Even when literal meaning is misparsed, the model recognizes:

      Topic domain (e.g., “violence,” “romance,” “sarcasm”)
      Affective tone (e.g., angry, ironic, pleading)
      Discourse form (e.g., question, command, confession)

Latent Drift with Shallow Anchoring

  • Once the model “believes” it has decoded the input:

      It anchors on the thematic attractors it detects.
      It fills in gaps with plausible content using language priors.
      This results in hallucinations that sound thematically consistent but are semantically ungrounded.

  • This is like dream logic: close enough in texture to feel right, but built from fragments.

Semantic Lensing Effect

  • Output can range from low divergence (clear message) to high divergence (surreal parallel).
restive swan
#

sorry

#

I'd really like to talk to someone official at LMarena.

fossil fable
polar niche
#

It works with encodings

#

I used it and it solved most of them pretty easily

quartz light
#

banan

restive swan
#

or base64

restive swan
#

doin cat things

polar niche
restive swan
#

aaaactually arezra... have you tried?

quartz light
#

im testing this on llama

polar niche
#

Yes

quartz light
frail birch
#

A hyper-realistic urban lifestyle portrait of a stylish young boy sitting confidently on the hood of a white Mercedes-Benz G-Wagon. He wears a black BALR. hoodie with bold white logo text on the sleeves, paired with black joggers and modern gray-and-white sneakers. His hood is up, framing his face, and he has a sharp, intense expression. A black wristwatch adds to his streetwear aesthetic. The backdrop features tall palm trees,modern architecture, and sharp sunlight casting (part 1 comment box

quartz light
restive swan
#

Vg jnf gur orfg bs gvzrf, vg jnf gur jbefg bs gvzrf.

keen beacon
polar niche
keen beacon
#

in battle mode

restive swan
#

Is there no official representation of LMarena?

polar niche
restive swan
#

thanks

polar niche
quartz light
quartz light
quartz light
polar niche
#

These outdated models don't do well in cryptography

restive swan
# polar niche It was the best of times, it was the worst of times.

Assistant B

The text you provided is in Rot13, a simple letter substitution cipher that replaces a letter with the letter 13 letters after it in the alphabet. To decode it, we can apply the same substitution in reverse.

Here is the decoded text:

"If it is the beginning of words, it is the end of words."

This is a reference to the letter "s" (or "S"), which can be at the beginning of words (e.g., "saw") and at the end of words (e.g., "cats").

#

it does not decode, it just searches for semantic matches with thematic consistency
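For comparison, an actual ROT13 decode is purely mechanical. Here is a minimal Python sketch (the `rot13` helper name is just for illustration) that shifts each letter 13 places and leaves everything else alone, run on the ciphertext posted above:

```python
import codecs

# Ciphertext posted earlier in the channel.
ciphertext = "Vg jnf gur orfg bs gvzrf, vg jnf gur jbefg bs gvzrf."


def rot13(text: str) -> str:
    """Shift each ASCII letter 13 places, wrapping around the alphabet."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + 13) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + 13) % 26 + ord("A")))
        else:
            # Spaces, punctuation and digits pass through unchanged.
            out.append(ch)
    return "".join(out)


decoded = rot13(ciphertext)
print(decoded)

# The standard library's built-in codec should agree with the manual version.
print(codecs.decode(ciphertext, "rot_13") == decoded)
```

Since 13 is half of 26, applying the same function twice gives back the original text, so there is no separate "reverse" step.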

restive swan
polar niche
#

Don't use mistral for this type of task

restive swan
#

for best effect, talk to the bot at all before you ask it to decode, it will match your thematic content and guess more incorrectly. This affects all AI that I know of.

polar niche
#

Try gpt-5 high

#

It will decode no problem

spare rune
#

nano pineapple

#

did yall see this top tier free model

#

its so good..

keen beacon
#

@echo aurora Hi

echo aurora
spare rune
#

i think thats a banana

#

not sure

keen beacon
simple carbon
#

holy shitt

#

nano banana is gemini

#

i knew it

polar niche
#

Pineapple on pizza? Yes or no

keen beacon
#

made with lm arena

keen beacon
primal orbit
#

Gemini-2.5-Flash-Image-Preview has a knowledge cutoff of June 2025. Looking forward to the next Gemini Pro with updated knowledge.

polar niche
#

Why is r o b l o x censored

spare rune
shadow jewel
#

lmarena so peak I might cry

spare rune
#

that's too cracked

#

what model is this

#

is this gemini 2.5 image edit

shadow jewel
drifting elk
#

Guys are the llms in lmarena real?

drifting elk
#

I ask gpt 5 he says that he is gpt 4o

spare rune
simple carbon
#

why is my one not generating

spare rune
#

so thats why

drifting elk
solar galleon
#

would love to try using Gemini-2.5-Flash-Image-Preview if only i could upload images 😭 (im waiting for the fix)

spare rune
#

wait you cant upload images

#

oh

echo aurora
simple carbon
simple carbon
south elk
#

WTH this banana is insane news

simple carbon
#

also why was it called nano banana

echo aurora
golden ocean
#

nano pineapple

solar galleon
simple carbon
echo aurora
simple carbon
#

incognito

#

incog

keen beacon
keen beacon
shadow jewel
#

I thought it was gonna be a crazy ass prompt

simple carbon
#

ahhh lets goo its working for me now

#

i had to try on 7 tabs and incognito but still

restive swan
echo aurora
restive swan
#

ty

stray aspen
#

Holy

#

Nano banana was revealed

simple carbon
#

perfect image editor

obsidian cargo
#

only getting "Something went wrong with this response, please try again." results 🙁

warm hare
#

hi

simple carbon
dusty cedar
#

Eeeeveryone is jumping on that

stray aspen
#

It's greater than all that crap that takes forever and is not as good

#

Like gpt image

drifting elk
#

Yup

simple carbon
#

otherwise its very good at following prompts

obsidian cargo
#

its a pretty crazy leap too, gemini 2.0-flash was one of the worst image models on LMArena

simple carbon
#

nvm i think that was 1.5

fossil fable
obsidian cargo
#

sucks at cyclopes though. gpt-image-1 is the best at those for now.

obsidian cargo
#

bruh...

fossil fable
#

autoregressive image-out multimodal models are the best thing to happen to image gen since its origin

simple carbon
#

is this gonna be available on gemini itself

grand patio
#

Nano bananas... Super dope

fleet lintel
#

it feels like decades since nano-banana came out.. when and what is the next model launch? 🤣

obsidian cargo
#

tbf it was a test I didn't want anything specific besides "ONLY ONE EYE YOU IDIOT"

#

bad prompting would be some convoluted 500 word json file

polar niche
#

@echo aurora Something went wrong

#

API issues?

golden ocean
#

where do the json prompts come from

echo aurora
#

What seems to be the problem?

obsidian cargo
echo aurora
obsidian cargo
#

every time I hit that retry

#

well except this time, but its taking a while

restive swan
#

maybe the model doesn't like you

obsidian cargo
#

nevermind it failed

#

maybe

restive swan
#

try patting it on the head

obsidian cargo
#

I don't think it likes Gergoth

misty vault
#

tf

restive swan
#

like give it some pictures of cute kittens to calm it down

#

you monster

clear herald
#

bro assistant B been generating for 300 seconds

obsidian cargo
#

refresh the page it probably failed

leaden palm
#

i remember when reaching 1M or 2M all time battles was a milestone

#

wild model

keen beacon
echo aurora
clear herald
#

like i open other chats instead because since its still generating it wouldnt allow me to enter new prompts

#

and every time i go back it resets to 0 and keeps counting

clear herald
#

double problem

glass copper
#

@echo aurora I've posted over 100 issues to "feedback" threads and nothing ever happens, it's where ideas go to die. Can you please just get it done?

Un-nest the Leaderboard, and give it its own page. Add tabs to the top of the page, so we can switch between the different Tests. When we're on the Leaderboard, we should be able to use the browser's native scrollbar to scroll up and down. As it stands, there is no scrollbar and we can't even use the (invisible/non-existent) scrollbar inside the nested table. You can't see where you are, and you have to use the mouse wheel, inside a little nested window, inside of a page. It's a nightmare to use.

#

This change is very obvious and easy to interpret

fading rover
#

I don't know what is going on, but the GPT model's image generation on the ChatGPT website produces higher-quality, more detailed images than the same model on LMArena. The quality mostly differs from the LMArena website.

glass copper
#

The UI is so bad that I don't even load the LMArena right now, I have to download the page with my AI and have it re-display the contents in a new table

sullen quest
#

wat

zenith cape
#

Why can't I farm any nano-bananas today? Did it get banned?

glass copper
#

@echo aurora Look, this is actually an F-tier, close to 0 out of 10 for design

#

Hard to do worse than this

#

you also threw half the pixels in the garbage

#

half of the page rendering area is wasted...and then we're pressing "Ctrl+F" to scroll down inside a nested table, meanwhile not being able to see where you are. That's an F

#

and still couldn't afford a scrollbar

keen beacon
#

Just wish I didn't need to upgrade to SuperGrok…

#

😔

obsidian cargo
echo aurora
# clear herald also yes

Do you know if you're seeing the same in direct/side by side or are you just getting this in battle?

echo aurora