#general

1 messages · Page 266 of 1

wicked sage
#

thanks trey

stray aspen
#

it sucks

echo aurora
#

Time moved forward

stray aspen
#

ill just stick to claudius

wicked sage
#

claudius

modest prism
#

Is Gemini 3.1 good

wicked sage
#

best name oat

lost patrol
#

i bet 3.5 earliest on google I/O. So in on May 19th

thorny schooner
#

Going to be real though i keep forgetting how much direct chat get get so disrupted by random competitive mode it's crazy

sick mantle
#

Pineapple WHEN IS SEEDANCE 2.0 COMEING

stone tundra
#

why si 3.1

#

so low

cloud zinc
modest prism
cloud zinc
inner relic
#

huh

#

bro

echo aurora
# sick mantle Pineapple WHEN IS SEEDANCE 2.0 COMEING

Sorry going to have to give the boring answer -> I won't be able to share details about what new models or features are upcoming until we're ready to share more. Would recommend to keep an eye on our announcement channel.

cloud zinc
#

80.6 on swe

loud verge
#

Thanks socky

sick mantle
stray aspen
#

is the gemini on lm arena high thinking

errant zodiac
#

Why do I not have permissions to use the video arena channels? For some reason, Arena won't let me use those channels.

cunning birch
#

Gemini pro is just so dumb... More stupid than GLM

rigid holly
#

I dont mean to be a nag, but its been like 2 days and yet the thinking model for sonnet 4.6 still did not come out while usually the previous ones came out withing the hour

Is it like a whole intentional thing or what @echo aurora

echo sinew
stray aspen
#

nah

inner relic
#

what are ya yapping about, this thing just released

#

yes

stray aspen
#

claude is better

inner relic
#

it created better

#

building

cunning birch
#

At all !!!

modest prism
echo aurora
errant zodiac
# sick mantle Its deleted.

What does that mean? The channel is deleted? Or my permission is deleted? And if my permission is deleted - why?

celest orchid
echo aurora
rigid holly
#

Ah. Ok. Atleast i got a answer.

echo aurora
#

Moving Video Arena off of Discord and just onto the site.

inner relic
#

Okay it's your opinion

errant zodiac
#

Ah. I understand.

echo aurora
stray aspen
#

bro this thing has early deepseek reasoning times

#

takes forever

echo aurora
sick mantle
stray aspen
#

why did bro send his pfp

wheat thorn
thorny schooner
remote vapor
#

why send fake chicken.,.,

errant zodiac
rugged abyss
#

Really interest in coding performance in real tasks

sick mantle
stray aspen
#

that chiken aint real

arctic basin
remote vapor
sick mantle
remote vapor
#

yea fair-

stray aspen
#

<@&1349916362595635286>

remote vapor
#

oh come on...

sick mantle
#

Fake

remote vapor
stray aspen
#

is anyone even still falling for that scam

celest orchid
sick mantle
#

Ofc he is

proud bobcat
#

Oh shi

#

Gemini 3.1 pro

echo aurora
proud bobcat
#

It’s not supposed to be groundbreaking

sick mantle
stray aspen
#

this coding is terrible

sick mantle
#

SICK

proud bobcat
#

It’s a refresh

stray aspen
#

theres no order

#

gemini 3.1 sucks

inner relic
proud bobcat
#

Hold on let me check this out

sick mantle
proud bobcat
#

Let me give it a pretty simple task

small lintel
remote vapor
#

why peeps posting slop?

proud bobcat
#

I think people forget video arena moved

cunning birch
proud bobcat
#

Yeah Gemini code was never good

#

Me and my friend tested it out and it always was just messy

#

Did it work? Yeah

#

But it’s still ahh

lost patrol
proud bobcat
#

HELLLLLLL NAHHH

sick mantle
#

But pineapple remove limts

proud bobcat
#

I DONT BELIEVE THAT AT ALL

sonic swallow
#

/vídeo

fickle venture
#

LMAO GEMINI 3.1 IS GOOD

stray aspen
#

seems like it bunched up a lot of code in a single line to reduce the total lines by 700

cunning birch
#

I thought Google will destroy all her competitors..
Now the competitors are safe 😆 😆 😆

fickle venture
#

It will get nerfed tomorrow

inner relic
#

there's a thinking level

#

medium

#

I think yall using high thinking

cunning birch
fickle venture
proud bobcat
lost patrol
proud bobcat
#

Yeah I know it’s from artificial analysis

#

Yeah no this is ridiculous

#

It’s a literal refresh model

#

INSANELY slow too

shrewd citrus
#

yeah lol like it should’ve achieved 1st place on the arena leaderboard

proud bobcat
#

Depends

shrewd citrus
#

with those “insane” benchmark results

thorny schooner
#

Well from what I'm hearing I guess it was a good thing I did not use that most recent AI the granted it probably wouldn't had help right now with the compares and stuff so still funny to hear some of the complaining ( cuz well I mean I was So nice to see I'm not the only one)

proud bobcat
#

GPT 5.2 isn’t a refresh

lost patrol
#

hallucination gone back it seams

proud bobcat
#

That’s a whole ass new architecture

#

“it actually is” 5.1 was a refresh

#

5.2 was completely diff

#

Yeah

proud bobcat
#

How is Gemini 3.1 pro number 4 in speed this is slow as hell

#

Number one in coding?

lost patrol
#

at least at AA 😉

proud bobcat
#

Nahhh

harsh flume
#

we at a point of dinimishing returns. More compute, better inference, better artificial data, but still same paradigms

#

There wont be another GPT3 moment in a long time

low patrol
#

W GEMINI in the chat

#

spam w gemini

celest orchid
low patrol
royal sail
lost patrol
#

yeah

royal sail
#

older models hallucinated all the damn time

proud bobcat
#

Wow this is slow dude

#

Holy hell

harsh flume
#

but thats nitpicking, in general compute is increasing at a linear pace

#

Opus 4.6 wasn't a gamechanger. This is what I mean, diminishing returns

plucky sparrow
#

is this good? 🤔 doesn't seem that impressive to me:

lost patrol
#

lets see, how it's on simple bench 😄

echo aurora
# celest orchid nice

Since it's going to be a new/high demand models there errors may happen, team is monitoring this

stray aspen
#

they migrated from the video arena channels

echo aurora
#

I think it's people trying to use Video Arena

sick mantle
#

@echo aurora I wanna use sora on arena but it chooses for me random model please fix

royal sail
low patrol
#

safety censor has reduced

#

in gemini website and aistudio

stray aspen
proud bobcat
#

3.1 PRO PASSED THE CAR WASH TEST

#

YESSSSS

stray aspen
proud bobcat
#

Arena

low patrol
stray aspen
#

yes

#

it gave me a non working script

#

lmarena gave me one that works

harsh flume
#

There's a chance compute is too high, I predict there'll be a inflexion point of algorithmic paradigm shift way more radical than what deepseek did and we get to a way higher joules efficiency

sick mantle
quartz light
#

my screenshot btw

#

:(

harsh flume
#

I don't mean compute is high in an absolute sense, rather energy output towards compute is too high

surreal zephyr
#

ok so gemini 3.1 pro is comparable to 4.6 sonnet nonthinking

harsh flume
#

The human brain is enough evidence of that

surreal zephyr
#

but much worse than sonnet high or opus high

low patrol
#

in leaderboard artifical analysis

proud bobcat
#

Artificial analysis is benchmarked

low patrol
patent bane
surreal zephyr
frosty lava
#

how does 3.1 not really better than 3 ?

proud bobcat
#

Sorry

surreal zephyr
proud bobcat
#

Benchmaxxed

proud bobcat
simple kayak
frosty lava
#

when we got opus 4-6 it was much better than 4-5 and gpt 5.3 codex is much better than gpt 5.2 then why not for gemini 3.1

low patrol
#

so it is worse than gemini 3?

#

or better

proud bobcat
#

Uhhh

#

It’s like

stray aspen
#

great i just lost my massive manually made prompt to a somethnig went wrong error

proud bobcat
#

Marginal improvement

low patrol
#

🥺

quartz light
half mist
frosty lava
#

they released a new model knowing it was just a small improvement ?

proud bobcat
#

Benchmarks wise it’s defo a lot better

#

Actual use?

#

Eh.

#

Defo a LOT more token efficient

stray aspen
#

benchmarks say theres a massive leap

harsh flume
sick mantle
frosty lava
half mist
simple kayak
proud bobcat
#

To be fair it is a refresh

frosty lava
#

same for gpt

harsh flume
proud bobcat
#

Gemini 3.1.2 pro xhigh ultra spark codex

half mist
proud bobcat
#

Gets 0.2% increase in swebench

frosty lava
#

not the model

proud bobcat
#

Yeah this is a joke Gemini 3.1 pro refuses to code

half mist
proud bobcat
#

OH

#

WAIT

#

ITS DOINF SOMETHING

frosty lava
#

i would liked to see as much improvement from gemini 3 to 3.1 than what we saw for opus and gpt codex

half mist
proud bobcat
#

I’m asking it to make a simple 3D fps shooter and it’s installed 6 packages

#

I’m sorry?????

hoary elbow
#

Is it still optimized or is it like when 3.0 first came out?

#

Cause after it came out, it got Nerfed

proud bobcat
#

What’s optimized

#

OH

half mist
proud bobcat
#

Quantization

hoary elbow
#

Oh yeah, is it still quantized?

proud bobcat
#

Uhhh no looks like it’s working at full precision

hoary elbow
#

Or is it like Gemini 3.0 prime

proud bobcat
#

As a tradeoff

Abysmally slow

plucky sparrow
#

it's quantized

proud bobcat
#

No it’s not

#

It’s slow as hell dude

surreal zephyr
proud bobcat
#

That’s a full precision model

plucky sparrow
#

probably just cause everyone is using it. it doesn't seem much better

proud bobcat
#

Google doesn’t do batch inference

celest orchid
#

its fun

surreal zephyr
pulsar crystal
proud bobcat
#

Looks fine

proud bobcat
#

Lmao

surreal zephyr
#

it probably is quantized

plucky sparrow
#

he's right, I don't, I'm just basing it on the fact there was a model 1-2 weeks ago that was miles ahead of the current release

surreal zephyr
#

doesnt mean we are ever getting the real model lol

proud bobcat
#

If it was quantized it would be fast

stray aspen
#

i hate this so much

hoary elbow
#

I’m getting mixed reactions. Can I make a poll here? I can’t find rules anymore and the last time I read, it was a long time ago

surreal zephyr
frosty lava
surreal zephyr
#

it was very big before release

proud bobcat
#

How do you think opus is so fast

#

Yeah exactly

surreal zephyr
surreal zephyr
#

codex spark is fast

#

opus isnt

proud bobcat
harsh flume
#

For Claude users, what is the best EV in getting usage? Their plan or just go to Openrouter API?

proud bobcat
#

For a behemoth of a model?

#

Absolutely fast

surreal zephyr
proud bobcat
proud bobcat
#

Dude even Qwen crushes Gemini

#

This is bad

surreal zephyr
#

LOL

harsh flume
cyan harbor
#

why the "remove" option is deleted?

surreal zephyr
#

2.5 flash does better than this bro

harsh flume
#

ill wait for the subtitled version

frosty lava
#

i think its lazy as hell

proud bobcat
frosty lava
#

you have to prompt it well

proud bobcat
#

🥀

#

I KNOW WHY IT SUCKS

surreal zephyr
proud bobcat
#

WAIT

#

YEAH

#

ITS BEING TOKEN EFFICIENT

proud bobcat
#

The files it’s making don’t even go above 100 lines of code

#

It’s strangling itself

surreal zephyr
proud bobcat
#

Not enough at all

#

See what I mean?

#

It’s been trained to use the least amount of tokens as possible to copy Claude

surreal zephyr
#

900 LOC

#

not remotely close to what opus did

royal sail
surreal zephyr
royal sail
#

The model literally feels lazy

#

it also makes tons of silly syntax errors

surreal zephyr
#

gemini worse than gpt 5.2 low?

stray aspen
#

this is grok levels of disappointment

royal sail
#

I mean

#

The model isn't terrible by any means

#

It just doesn't feel great with some coding tasks

surreal zephyr
#

lol

hoary elbow
lost patrol
#

i think google is positioning it more for common user and maybe into the scienece direction.
Not for coding

royal sail
#

But majority of the time, it generates pretty reasonable code

stray aspen
#

i think claude could do this better

royal sail
stray aspen
#

considering it made me this

sick mantle
#

<@&1349916362595635286>

mortal vale
#

@twilit sable Note that Video Arena has been removed from the server. More information can be found in this #announcements

stray aspen
#

this is way better

surreal zephyr
stray aspen
#

holy cook

surreal zephyr
#

the models not even close

stray aspen
#

lmao

analog steeple
#

god forbid third world countries from touching latest tech 🙏

whole swallow
#

WHEN DID 3.1 PRO COME OUT

proud bobcat
bleak lake
whole swallow
#

Is opus 4.6 that good?

bleak lake
echo aurora
proud bobcat
shell pewter
#

anyone want to share how does gemini 3.1 vs opus 4.6 feel?

bleak lake
stray aspen
#

@echo auroracan max route you to gemini 3.1

proud bobcat
#

Claude Opus 4.6 just gives you the same performance

#

Oh and Gemini is slow ahh hell

shell pewter
# bleak lake Benchmarks or tests?

as in did any user test it extensively to have some own opinion, cus the benchmark from google said it really good but the rank on arena is a mixed bag innit

shell pewter
proud bobcat
#

Got no babes…

#

Only ai…

shell pewter
proud bobcat
#

I love how my breakup was the reason I became invested into llms

#

I literally had no other hobby

shell pewter
proud bobcat
#

Gemini 3.0 still peak

shell pewter
stray aspen
#

grok 4.2 is the best research model

proud bobcat
#

GPT 5.2 search is great if it would USE IT MORE OFTEN

shell pewter
proud bobcat
#

Grok makes the best research models

limber panther
#

penguin riding a motorbike

shell pewter
stray aspen
odd geyser
#

Why is Claude 4.6 better than Gemini 3.0 pro in the text?

proud bobcat
crystal mica
#

i will honestly say. gemini 3.1 pro is great in terms of solving math and etc. problems, but i hate it speech type and position

odd geyser
proud bobcat
#

But it really goes crazy in coding

Not in a good way.

#

It installs way too many packages

#

Is slow

sonic swallow
#

gerar Prompt de vídeo profissional de uma pessoa que faz flexão rápido

proud bobcat
#

Lane

rigid copper
#

hi guys

proud bobcat
#

It took it a solid 3 seconds to make one line of code

odd geyser
odd geyser
rigid copper
proud bobcat
#

Also Gemini 3.1 is heavily token efficient but not in a good way

#

It has too many shortcuts when coding

#

Claude is thorough

#

Every other model is thorough

rigid copper
#

umm @echo aurora i would like to get some help with this

stray aspen
#

🥀

surreal zephyr
odd geyser
rigid copper
surreal zephyr
proud bobcat
#

THE CODE FOR THE MAP IT MADE IS BARELY 64 LINES LONG

#

WHATTT

rigid copper
royal sail
#

Definitely faster than Opus 4.6

proud bobcat
#

Opus 4.6 was faster for me

echo aurora
rigid copper
#

if it's a rate limit, it shouldn't affect other model like claude since i often use gemini 3 pro

echo aurora
proud bobcat
#

One thing I do like is that Gemini 3.1 bug checks after it makes files

#

That’s good

echo aurora
proud bobcat
#

Mhmm

#

Yeah real

#

Oh

#

It finished coding

#

Let’s see

#

Yeah it’s garbage

#

It’s complete garbage

#

Oh my god

#

Hold on let me send the link

soft matrix
#

Opus 4.6 is good with some languages💗

proud bobcat
#

This is god awful

shell pewter
proud bobcat
#

Lmarena isn’t nerfed

#

It’s the same model from Google api

shell pewter
#

so code is still opus (how good is sonnet btw?)
search is grok?

proud bobcat
#

Studio sucks right now

#

Arena has seemingly been better

#

Lmao let me get Claude sonnet to do the same thing

shell pewter
#

its really hard to just use logic without any search, or is it just me?

#

i agree actually

proud bobcat
#

Gemini losing the plot

rigid copper
proud bobcat
#

Claude is cooking

shell pewter
#

yes 100% this

proud bobcat
#

I think a mix of opus and kimi right now is goated

#

Maybe GLM 5?

#

Research is always grok

#

Grok crushes research

#

Oh my god sonnet is COOKING

echo aurora
plucky sparrow
#

no opus deep think is best at research

#

but it's also super expensive

shell pewter
shell pewter
inner relic
#

Any leaks about deepseek v4?

#

The whale is really quiet

proud bobcat
proud bobcat
wind hinge
#

/image

shell pewter
rigid copper
#

tried all model and it won't work :/

proud bobcat
#

I’d say sonnet is 5-10% better than 4.5 opus

stone tundra
#

why is 3.1

#

so low

#

😢

proud bobcat
#

It’s joever

shell pewter
#

I have been using AI for research and code, so for now i guess i only need to use grok and opus/sonnet?

stone tundra
#

text leaderboard

proud bobcat
#

LMAO I IUST REALIZED GEMINI 3.1 PRO SKIPPED MAKING THE ACTUAL HUD AND GAMEPLAY

#

BRUHHH

#

Ass model

shell pewter
#

thx mate 🙏

stone tundra
#

lol

astral vortex
stone tundra
#

yeah

#

idk how it fialed os badly

lofty quartz
#

मनुष्य किसी भी दुःख को, सहन कर सकता हैं, लेकिन गृह क्लेश उसकी, आत्मा को निचोड़ देती हैं..!

inner relic
#

I have to agree, Gemini 3.1 isnt that good

proud bobcat
#

It’s such a downgrade

#

Remember how crazy 3.0 was at coding to the point we were blown away

shell pewter
crystal mica
astral vortex
crystal mica
inner relic
#

Alright, Let me guess, I think deepseek v4 releases tommorow

shell pewter
astral vortex
proud bobcat
#

DUDE

crystal mica
proud bobcat
#

SONNET COOKED GEMINI BY A MILE

#

LOOK AT THIS

rigid copper
inner relic
balmy mist
astral vortex
balmy mist
#

anyone tried g3.1?

shell pewter
proud bobcat
#

Like

#

I’m not even hating

#

It’s just ass

balmy mist
#

how??

#

like worse than g3?

proud bobcat
#

Terrible coding ability

#

Slow as hell

astral vortex
crystal mica
proud bobcat
balmy mist
#

wtf

#

no way

cunning birch
balmy mist
#

smh

proud bobcat
#

Sonnet cooked it

cunning birch
#

Google always make bad models on february

#

Gemini2 and 3.1

royal sail
proud bobcat
#

If this was Gemini 3.1 flash it would’ve been understandable

royal sail
#

The model is not that bad

proud bobcat
#

It is

#

It IS

#

Sonnet wiped the floor with it in less time and 10 times higher quality code

royal sail
#

The only problem it has is the verbosity and uncommon syntax errors

royal sail
surreal zephyr
proud bobcat
#

Gemini 3.1 made a mess

#

Here’s sonnet

royal sail
#

It's a bit unfair to base an entire model's performance off 1 prompt no?

surreal zephyr
#

gemini sys prompt SUCKS

proud bobcat
#

Failed both times

royal sail
royal sail
proud bobcat
#

Mhm

royal sail
#

That's the problem

proud bobcat
#

The third time the game functioned

royal sail
#

You're basing your entire opinion of a model on a single prompt

proud bobcat
#

On the same prompt

proud bobcat
#

Like ampro’s tank

surreal zephyr
proud bobcat
#

For a SOTA model that looks awful

surreal zephyr
#

gemini is BETTER if you not use system prompt

#

its WAY worse with system prompt

proud bobcat
#

Wow it really is

#

Oh my god

meager tinsel
#

I do not think 2 hours of testing is nearly enough time to form a general finalized opinion on a large multi-modal model like Gemini. For me it's generated some pretty nice stuff but in completely different medium than what everyone else is posting.

proud bobcat
#

Safety fine tuning fluff

fading atlas
#

OK

surreal zephyr
#

heres with system prompt

proud bobcat
#

Wow you aren’t even lying

#

God damn

balmy mist
#

did gemini deep think get updated as well?

surreal zephyr
#

gemini is actually better if you tell it to work hard and ignore sys prompt

#

the sys prompt DEMANDS it to be cost efficient and token efficient

gleaming heath
simple perch
# proud bobcat It’s ass

I was wondering why my photos don't look good in terms of quality anymore. I thought my app was bugged because it wasn't saving in 2K. Lol.

rigid copper
#

@echo aurora starting a new chat does the fix, but that means i can't use the old chat anymore because my project (usually from code arena) still going on.

#

anyway thanks for the tip

rigid copper
mortal vale
#

@random ginkgo Note that Video Arena has been removed from the server. More information can be found in this #announcements

surreal zephyr
echo aurora
#

We’re exploring how occasional Battles in Direct chat might work. Our mission is to measure and advance the frontier of AI for real-world use, and integrating Battles into Direct is a meaningful step in that direction. The help center's article about the experiment can be found here.

proud bobcat
#

Gemini subreddit glazing the hell out of 3.1 pro

#

Woah it can make an svg…

coral axle
#

Hey everyone. Trying to use opus4-6 here in the arena, but it's clearly not the right model. It's running some version that doesn't even know its own existence lol. Any routing or deployment issues going on?

shadow prairie
#

Hi everyone, what can these lags be related to?

fringe carbon
#

ugh

#

anyone surprised 3.1 didn't top

surreal zephyr
shadow prairie
#

I have stupid lags.

#

as shown in the screenshot

rotund seal
fringe carbon
echo aurora
half mist
surreal zephyr
fringe carbon
echo aurora
mild dagger
#

why i could not make video?

half mist
echo aurora
shadow prairie
#

for Russia ❤️

robust sonnet
#

Yo guys getting like mad errors

surreal zephyr
echo aurora
echo aurora
surreal zephyr
shadow prairie
stray aspen
echo aurora
surreal zephyr
coral axle
#

I get the pre-training argument, but in production, a properly fine-tuned model should have its identity explicitly defined in the System Prompt. If it's guessing whether it's Sonnet or Opus, it usually means the system prompt injection is either missing or misconfigured on your end, not just a random pre-training hallucination. Can you check the prompt wrapper?

surreal zephyr
half mist
drowsy mural
#

I don't even want to say anything—this feature is utterly ridiculous.

coral axle
#

@echo aurora ?

fringe carbon
#

when you vote in side by side does that count?

#

or does it need to be in battle

#

i thought only battle counted

#

but it lets you vote side by side which is weird

echo aurora
half mist
#

No, no it does not. It’s a new Experiment that puts Battles in Direct Chat that is pointless in my opinion

echo aurora
fringe carbon
sleek crow
coral axle
#

@echo aurora Hey team, following up on the model identity issue with some visual proof.

Image 1: The official UI. The model knows exactly what it is (3.1 Pro) because the system prompt is properly configured.
Image 2: Your gemini-3.1-pro-preview endpoint. It's defaulting to a generic, outdated response (claiming it might be 1.5) because it clearly lacks a proper system prompt wrapper.

By exposing these raw endpoints without injecting the correct identity context, you're essentially lobotomizing state-of-the-art models. Users come here to benchmark, and when a model acts lost, they blame the AI companies, not the platform's infrastructure. It's tarnishing the models' reputations. You really need to update your system prompts for these new endpoints.

stray aspen
#

where can i utilize gemini 3.1 no system prompt

proud bobcat
#

Models don’t know what they are

mortal vale
#

@nocturne turtle Note that Video Arena has been removed from the server. More information can be found in this #announcements

stray aspen
whole swallow
proud bobcat
whole swallow
#

If I tell gemini that it's gemini 4.5 it doesn't become magically more powerful

coral axle
#

@proud bobcat Models don't know what they are unless you pass a proper system prompt at inference time. The official UI injects it. Your API wrapper clearly doesn't. Calling a missing system prompt a 'standard hallucination' is wild for a testing platform. But anyway, good luck with the benchmarks.

#

@whole swallow I never said a system prompt magically boosts capabilities. That's a strawman. I said it grounds the model's identity. If you don't inject the correct context (like the official UI does), the model falls back to older SFT data and hallucinates an outdated version. It's a basic context injection issue, not a capability debate. But I'll leave it at that.

ancient elk
lost basalt
#

This is the worst possible thing arena.ai could do with there loyal members, lost everything.

stray aspen
#

@surreal zephyr where can i use the best version of gemini 31

surreal zephyr
#

and tell it to put max effort ect

stray aspen
#

lm arena?

surreal zephyr
#

no shortcuts

#

yea

stray aspen
#

ok

surreal zephyr
#

its trained to be lazy for money

surreal zephyr
#

you have to tell it to put more effort

coral axle
#

Look, I was just trying to give you a heads-up so you could actually update your system prompts. I have zero interest in using this outdated, misconfigured crap anyway. I'll just take this to social media to expose how you're doing false advertising and dragging these AI companies' names through the mud by serving crippled models. Have fun with your broken benchmarks. I'm out.

stray aspen
#

do wahtever you want

hollow imp
#

@surreal zephyr exhaustively enjoy 3.1 while you can, it's gonna be nerfed to trash after 2 days

stray aspen
#

ive never seen someone so upset because a model doesnt tell you its name

lost basalt
# hollow imp How did you lose everything

I had a month old chat history which i build with time, now i can't build it again. It took alot of time. i tried to create a master prompt for a new chat but it doesn't work the same.

hollow imp
lost basalt
#

It randomly starts a battle and now claude 4.6 also has a rate limit which asks to wait for 21 minutes.

hollow imp
#

Do you have to catch a train?

lost basalt
stray aspen
#

if you want more pay aa claude subcription

coral axle
#

kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk

hollow imp
#

You get enough rate limits that you can copy paste every single message from the error chat instead of scrolling instagram and wasting time

coral axle
#

just to say it's one thing when it clearly isn't

lost basalt
hollow imp
#

1 blocked message

stray aspen
#

wdym its always been there

pale sonnet
#

3.1 pro is soo good

lost basalt
sick mantle
stray aspen
hollow imp
#

Vertex ai

stray aspen
#

vertex aint free

hollow imp
#

It is

sick mantle
hollow imp
#

200$ google cloud free credits

pale sonnet
hollow imp
sick mantle
lost basalt
#

Can only wait, maybe they will fix it. we can only criticize

proud bobcat
#

Even with a system prompt

#

It still doesn’t know

#

Gemini has told me it was 1.5 pro on the official app

shadow prairie
#

Oh gods, nothing helped how to live?

stray aspen
#

lmarena's gemini 3.1 is way better than the one on gemini app and ai studio

pale sonnet
sick mantle
shadow prairie
lost basalt
quartz light
hollow imp
#

@stray aspen vertex ai bro

proud bobcat
#

Claude processing 5 billion safety parameters before Something went wrong with this response, please try again.

shadow prairie
#

He just can't give me the code! I've been doing this for an hour now.

proud bobcat
shadow prairie
sick mantle
hollow imp
#

300$ api credit

#

90 days

gleaming roost
#

😊

stray aspen
#

gemini business is overloaded

#

it sucks

shadow prairie
#

Now endless coding

echo aurora
midnight marlin
#

Are the battles still in direct mode? 💩

stray aspen
#

yes

lost basalt
shadow prairie
sick mantle
stray aspen
#

no

#

these models aint cheap

proud bobcat
#

trust me

sick mantle
shadow prairie
#

Xs what should I do, I need to download the repository

echo aurora
echo aurora
hollow imp
#

@surreal zephyr @stray aspen

stuck orchid
#

Is Gemini 3.1 better than CLaude Opus 4.6 according to your tests?

stray aspen
#

insane

#

whats the prompt

quartz light
surreal zephyr
#

its not even close

#

bro

#

its literally AGI

quartz light
surreal zephyr
stuck orchid
#

Claude Opus 4.6 > gemini 3.1 pro?

#

Okay

hollow imp
olive mesa
#

3.1 pro was agi

#

before it got nerfed

stray aspen
#

?

hollow imp
#

Normal

stray aspen
#

prompt

#

bro what hte hell is trinity large

hollow imp
woven peak
#

what is trinity larg

#

large

#

is it good

brittle tiger
quartz light
woven peak
quartz light
#

THE CODE

hushed gyro
#

guys what the hell is trinity

quartz light
quartz light
#

bad models

#

why?

#

oh they just released it on arena?

#

loooooooo

#

so late

hollow imp
fickle venture
#

Tf is trinity

quartz light
echo aurora
quartz light
#

trinity large released a while ago

#

its not very good

#

but

fickle venture
quartz light
#

decent for first release

quartz light
#

why

fickle venture
#

The heck that's a spying model

turbid timber
#

is kimi better than chatgpt

woven peak
fickle venture
woven peak
stuck orchid
brittle tiger
#

i've been using 4.6 opus in antigravity for days and 3.1 pro is fixing problems opus wouldn't

celest orchid
#

Who is trinity large

woven peak
olive mesa
#

even the quantized version of 3.1 is better than 4.6

quartz light
celest orchid
olive mesa
#

which is what we have rn

stray aspen
#

just another open source model

woven peak
#

(opus)

olive mesa
#

at least from my and other people's tests

fickle venture
olive mesa
#

the unquantized version which was only around for a short time at launch was also noticeably better than the 3.1 we currently have access to

surreal zephyr
bleak lake
# celest orchid It's good ai?

Trinity-Large-Preview is a frontier-scale open-weight language model from Arcee, built as a 400B-parameter sparse Mixture-of-Experts with 13B active parameters per token using 4-of-256 expert routing.

It excels in creative writing, storytelling, role-play, chat scenarios, and real-time voice assistance, better than your average reasoning model usually can. But we’re also introducing some of our newer agentic performance. It was trained to navigate well in agent harnesses like OpenCode, Cline, and Kilo Code, and to handle complex toolchains and long, constraint-filled prompts.

woven peak
bleak lake
#

512k tokens btw

woven peak
olive mesa
turbid timber
turbid timber
#

but i think ur right

zealous sparrow
turbid timber
#

chatgpt is definitely more worth paying for generally

fickle venture
zealous sparrow
#

who beat me

fickle venture
#

You can use GPT 5.3 codex xhigh for free (idk about limit)

If you got macos install codex
If you got windows / linux install codex CLI
or opencode and link your account there

shadow prairie
sterile sun
#

@echo aurora Why do you come with so much armour?

zealous sparrow
stuck orchid
echo aurora
sterile sun
rocky mauve
sterile sun
#

I bet if a pineapple would have fell on netwon, we all would be floating today

leaden ivy
#

Hey

hushed gyro
#

oh no.... i thought this is a dream... BATTLE MODE IN DIRECT???!!!

normal abyss
surreal zephyr
proud bobcat
#

hold on let me try studio

surreal zephyr
surreal zephyr
#

Llmarena is best

hushed gyro
surreal zephyr
#

Make sure to force it to think much

proud bobcat
surreal zephyr
normal abyss
surreal zephyr
#

But repeat many times to think excessively

#

Its trained to be lazy

hushed gyro
#

has anyone noticed that the quality of NB Pro has degraded after the errors?

normal abyss
hushed gyro
#

omg why does it have errors again STOP THE TORTURE...!!!!

proud bobcat
#

damn

drowsy mural
robust sluice
#

NB still errors but come with low resolution when it works

surreal zephyr
hushed gyro
hushed gyro
proud bobcat
#

"create a detailed 3d simulation of a tanks suspension with a map, driving controls, accurate physics, and polish. do not cheap out on code, think to the maximum and install as many packages as needed."

#

trying this out

#

lets race it against sonnet lmao

robust sluice
#

did they buff on Flux or something

hushed gyro
#

based, W

robust sluice
#

never seen them work so fast

echo aurora
robust sluice
#

so its a bug ?

hushed gyro
proud bobcat
#

the hell

#

lmarena is down

echo aurora
hushed gyro
#

this is dogwater lmao

sick mantle
hushed gyro
normal abyss
echo aurora
whole sundial
#

I should bump this again as it is still heavily discussed here, please remove the battles in direct

proud bobcat
#

odd.

#

maybe i overly specified to install packages?

echo aurora
normal abyss
echo aurora
echo aurora
# proud bobcat no problem

What model did this happen for you btw? I'm using battle and it appears is only happening with one of the options.

proud bobcat
#

sonnet and 3.1 pro

#

more specifically sonnet 4.6

hushed gyro
#

WHAT?!

Today vs... 2 weeks ago

fickle venture
#

@echo aurora edited: nevermind the scam got deleted

split kayak
#

Gemini3.1

fickle venture
#

Oh sorry it got deleted

robust sluice
echo aurora
fickle venture
echo aurora
echo aurora
fickle venture
#

Alr I'll try it when I see a scam again

hushed gyro
echo aurora
golden ocean
#

DOG WATER

hushed gyro
#

better or no?

surreal zephyr
hollow imp
# stray aspen slide your ptompt

Create a 3D simulation of a blackhole including accurate light bending, you should be able to pan the camera with the mouse, the blackhole should have a accurate accretion disk, web based

#

@quartz light

surreal zephyr
#

opus didnt get even close

simple kernel
hushed gyro
#

again... @surreal zephyr

hollow ivy
hybrid osprey
#

Is nano banana being nerfed it’s generating 600kb files instead of the usual 6mb and failing to generate much more frequently

meager tinsel
crystal mica
#

guys, someone know how to export lmarena chat .txt?

hushed gyro
#

bro nano banana pro across all platforms are dogwater lmaoooo

rapid otter
#

.

half mist
rapid otter
#

add to friends

rapid otter
crystal mica
surreal zephyr
# hollow ivy

via api? gemini is infinitely better.
normally? opus

quaint trail
golden ocean
pale sonnet
#

rip🥹🕊️

random violet
#

Remove battle mode in direct chat... ☠️🤌

gloomy onyx
# hollow ivy

I doubt there is a public model which is able to overperform Opus-4.6-thinking in pure coding tasks.

Maybe Gemini 3 Deep Think Could actually be better, but only by reading its description card provided by Google, it is more of a research tool than a model made only for coding.

scarlet spire
#

Given it itself anecdotally way outperforms GPT-5.2 and at least matches GPT-5.2-Highest in abilities

pale sonnet
#

Γεια σου!

quaint trail
#

everytime i try to use nano banana pro i get this error lol

surreal zephyr
gloomy onyx
#

Those are parameters of evaluation which in my opinion are worth more than a single benchmark when we’re talking about pure vibecoding in real scenarios

scarlet spire
gloomy onyx
#

It’s like steroids for AI at this point

scarlet spire
#

I'm not saying faking but I'm moreso saying if you use a metric that's a subset of something else that can only ever be measured as a whole, what does that tell us? nothing much.

quaint trail
scarlet spire
scarlet spire
gloomy onyx
scarlet spire
#

It's not a hallucination if you mean "GPT-4 class"

quaint trail
#

what does it mean by gpt-4 class

scarlet spire
#

It's aaaah. An unintended lens into how OpenAI got to it. It just means that GPT-5 is, as we know, a further-trained version of GPT-4.

gloomy onyx
#

I guess so, the model does not actually know the specific version of itself

scarlet spire
#

It doesn't

#

It never "knows". It isn't told.

stray aspen
#

ai studio gemini 3.1 sucks

quaint trail
#

i have no idea how you even begin to train AI models, i can understand how they work

#

but training makes no sense

#

also coding in arena has recently just never worked for me, one time today but every time i get this error, and this error is common when using nano banana pro. not sure why

#

nano banana pro works fine when i use the actual gemini

scarlet spire
# quaint trail but training makes no sense

Simple concept of a Transformer:

  1. You give it random noise created from adding noise to a real piece of data
  2. You punish it until it guesses what the original was correctly every time.
  3. You do this concurrently with a f"ckton of other data! :)
scarlet spire
quaint trail
surreal zephyr
muted bolt
#

I dont like the idea that when we generate we can choice to have 1 generated at a time.. by the time I generate the 2nd image its saying im up to my limit which is wrong bc I normally get 5 chances

surreal zephyr
#

<@&1349916362595635286>

vocal axle
#

i dont speak english

mortal vale
#

@vocal axle Note that Video Arena has been removed from the server. More information can be found in this #announcements

scarlet spire
quaint trail
scarlet spire
vocal axle
#

i am nexs user

gloomy onyx
# scarlet spire It never "knows". It isn't told.

There should be an identity dataset where they train the self perception of the model about its metadata, but I guess that to preserve the efficiency of the hardware allocated to the training session, it just reinforces the company name and maybe the model series

scarlet spire
light sleet
#

When gpt 5.3 Codex😭

scarlet spire
gloomy onyx
scarlet spire
#

Ask GPT-5.3 on the ChatGPT.com website and you'll see that it has no problem telling you.

gloomy onyx
quaint trail
scarlet spire
quaint trail
gloomy onyx
#

Otherwise it would be easier for the model to “remember” that stuff

scarlet spire
#

The system prompt is not included on Arena, as I said

#

That's why it doesn't truly know to tell you its identity. It's intentionally not part of the post-training.

coral axle
gloomy onyx
scarlet spire
# coral axle

Good show! Yes, the ChatGPT website (first image) will indeed be able to tell you its identity just fine. It's told about what its identity is and what the date today is as well if I'm not mistaken.

scarlet spire
quaint trail
# coral axle

i remember in like 2023-2024 using chatgpt when the website was green themed, and i kept asking it the year for its training data

#

i was pissed when it said october 2022 or 2023

coral axle
#

too

quaint trail
#

also it didnt even have web search back then

coral axle
#

fk lie

scarlet spire
#

Byte count doesn't scale from binary storage count to nonbinary neural net gradients. Your comparison of "compared to many terabytes" is one that doesn't hold up because the data when trained, is not transformed into bytes of any quantity.

quaint trail
lilac nest
#

As a free user I've been using Gemini Flash 3 as my preferred everyday model. I use the limited free Pro for more complex tasks. Interestingly, Sonnet 4.6 from Claude was made the default for even Claude Pro users, so I thought I'd give it a try.

Initially, my thoughts are that it seems pretty good, but can't tell if it's better than Gemini Flash. I was wondering when it would be added to the leaderboard?

scarlet spire
quartz light
# quaint trail whats timeout

ai providers have different timeouts and basically if the response is taking longer than the timeout plan theyre paying for then it just stops

coral axle
#

I was very upset, swearing I was using opus 4-6 when in fact it's a clear derivation of 3-5. Shame!

scarlet spire
quaint trail
scarlet spire
quaint trail
#

so we just assumed chatgpt was like google if it could talk back

#

well, it basically is that nowadays

#

since it can use the web

scarlet spire
lilac nest
#

Ah so it's available to be voted on currently then. I searched the change log for it but I guess that must not include when things are added for initial voting

stray aspen
#

is gemini 3.1 nerfed already?

coral axle
#

@scarlet spire Dude, I'm developing my own model in Brazilian Portuguese from scratch. I've already developed my rag Online using some free models, and you can clearly see the difference when you use the latest templates directly from their respective platforms to the Arena platform.

gloomy onyx
scarlet spire
coral axle
#

@scarlet spire Look, I'm using Antigravity with Opus 4-5 and I was sometimes using the arena because I don't have the paid Opus plan, and I noticed that some days it always sends the same mess instead of fixing what it should do to continue training... then I just asked it to find out the version and realized I was using something that clearly wasn't it.

scarlet spire
scarlet spire
#

You're sending me random pings with absolutely no reference to what you are commenting on. You sound to me like you're trying to reply to something but are effectively failing to use the reply feature? Use the reply feature.

stray aspen
#

what is this ai conversation

coral axle
#

@scarlet spire Just configure the correct API to use the correct version and stop running those crappy system prompts from older versions.

honest verge
#

BRO I CAN'T TAKE GEMINI 3 FLASH ANYMORE

#

It forgets everything

#

After 3 prompts

#

I need 3.1