#general

proud bobcat
#

Not everyone wants to pay the opus premium

#

That’s just a fact

gusty egret
#

MoE is starting to catch up to dense though

proud bobcat
#

Api I mean

echo aurora
#

We are currently experimenting with this, meaning it's not going to be fully available to 100% of users.

proud bobcat
#

A lot of companies use the api

plucky sparrow
#

depends on your job opportunities bro

proud bobcat
#

Honestly if you can just use flash via api for cheaper than 200 a month

#

That’s a steal

#

Price is almost certainly an issue

ember iris
proud bobcat
#

This is ragebait

#

Bro owns 5 Lamborghinis

pale obsidian
#

i want to see artificial analysis benchmarks but they've been really slow lately

torn mantle
#

you guys dont know how crazy this is

#

for its price

#

performance/cost ratio

#

its even outperforming gem3 pro in some benchs

gusty helm
#

did google just smash the market again?

pale obsidian
#

also i hope they make 3.0 flash the default model for free users

torn mantle
#

we will probably say goodbye to sonnet 4.5 and 3.7

#

like look at those numbers..

#

wth

#

i still cant believe it tbh

gusty egret
proud bobcat
spare mango
#

Gemini 3 Pro uses logical language to explain the solution to a problem, which I appreciate. Meanwhile Gemini 3 Thinking uses cozy vocabulary to connect with you better, which I don't appreciate when I'm asking for technical help.

torn mantle
pale obsidian
torn mantle
#

they just ended oai

proud bobcat
#

Oai is done

#

They had their chance

#

It was make or break with 5.2

#

And they fumbled

#

Goofy ass company

spare mango
stray aspen
#

@deep adder is openai dead

torn mantle
#

atp xai should quit

#

lol

#

how can you compete with this.........

spare mango
#

@pale obsidian

proud bobcat
#

Xai still holds great search summarization

#

Unpopular opinion but I like grok

pale obsidian
#

^^ fully agree

proud bobcat
#

It may not be a benchmark smasher but it’s a fast, smart, and reliable model

pale obsidian
#

nothing beats grok at online research

stray aspen
#

xAI is absolute ass lmao

zealous sparrow
proud bobcat
#

I use it a lot when looking for good car deals

old ginkgo
#

Lol

modest prism
proud bobcat
#

Opus 4.5 isn’t real

zealous sparrow
proud bobcat
#

What do you mean?

zealous sparrow
#

i wanted to make the room walkable

gusty egret
stray aspen
#

opus 4.5 is a freak

limber panther
#

guys gemini 3 flash is better than 3 pro

#

its official

proud bobcat
#

So

modest prism
old ginkgo
#

Either we've solved cost altogether or they improved the base model that trained flash.

proud bobcat
#

Grok is fire

spare mango
gusty egret
#

grok more often than not is at high demand...

stray aspen
#

grok sucks

zealous sparrow
stray aspen
#

gpt oss is better than grok lmao

limber panther
#

who tf uses grok

#

low iq

robust sluice
#

did they lower the rate limit or something?

torn mantle
#

grok 4.2 will be so bad

gusty egret
#

I'd say only Anthropic have a chance at competing with Google, but GPT-5.2 seems rushed if you look at the cutoff date compared to past models and their release date.

torn mantle
#

please dont hype it

modest prism
stray aspen
#

who even still hypes xAI releases

#

everyone knows its ass

modest prism
gusty egret
#

Grok overdoes web searching. Simple question, and it uses 50 sources...

old ginkgo
#

Grok 4 was sota at the time of release. Do all of you guys have amnesia?

stray aspen
#

at this point

torn mantle
#

asi

#

not agi but asi

zealous sparrow
#

rate this insult gemini 3 flash made at grok

pale obsidian
torn mantle
gusty egret
#

People have been saying "AGI achieved" since legacy GPT-4

limber panther
#

we need faster SOTA models, not slow models

#

gemini 3 flash is a good promise

old ginkgo
#

Grok releases will always be interesting to me because you never know what happens.

#

They can achieve sota or suck ass

pale obsidian
#

i think this refers to the open source model they hinted

limber panther
old ginkgo
#

Did they benchmaxx on arc agi 2 too?

torn mantle
limber panther
#

if ur hyped for grok then u eat AI slop

torn mantle
#

🫡

#

demis

torn mantle
stray aspen
old ginkgo
#

Because grok was the first model that got good at arc agi 2.

#

You guys genuinely have amnesia

stray aspen
proud bobcat
random wraith
#

is lmarena.ai not letting anyone else upload an image?

proud bobcat
#

Grok will absolutely flame Elon

limber panther
#

now we need an opensource sota model "Gemma 4"

gusty egret
#

Grok relies heavily on system prompting to "behave" smartly. The system prompt has historically been very comprehensive in guiding behaviour.

torn mantle
#

now watch every other lab train on gemini 3 flash output

limber panther
#

openai is dying, grok doesnt care, anthropic is on google's side. and Google has won the AI race

torn mantle
#

thanks google

limber panther
random wraith
#

anyone else having problems trying to upload an image on lmarena.ai?

torn mantle
#

grok is only good at search tool calling

#

its a dumb model

stray aspen
#

yes

#

its so damn stupid that it makes me mad

zealous sparrow
torn mantle
#

i still cant wrap my head around this.. how can anyone top this cost+performance?

gusty egret
#

Gemini has been trained to be really objective if you use it from the API

zealous sparrow
torn mantle
#

no wonder demis said we are far ahead of chinese labs

gusty egret
zealous sparrow
#

this is one i didnt expect gemini to roast [the platform of many controversies]

#

also

#

ts so true

random wraith
#

anyone else having problems trying to upload an image on lmarena.ai?

gusty egret
modest prism
torn mantle
#

guys you should use antigravity, it provides way better rate limits

spare mango
#

Profile picture checks out.

torn mantle
#

its basically unlimited lol

modest prism
gusty egret
spare mango
#

I'll let you figure it out.

torn mantle
#

lol wth is this... how is it better than gemini 3 pro at multilingual

modest prism
#

Ok but when we get Nano banana flash?

torn mantle
#

whats its cutoff date tho?

#

???

random wraith
#

do u guys still use lmarena.ai to edit images? or is there another free solution?

gusty egret
#

probably early 2025

#

let me see

random wraith
#

image wont load

stray aspen
#

use yupp then

pale obsidian
muted timber
gusty egret
#

march 2025

random wraith
stray aspen
#

no its a website

random wraith
#

incognito mode or normal?

torn mantle
gusty egret
#

Actually maybe it's worse

stray aspen
#

normal gang

torn mantle
#

not bad at all

zealous sparrow
gusty egret
zealous sparrow
#

I think they tested without system instructions

gusty egret
#

It doesn't seem to know beyond January?

torn mantle
pale obsidian
#

they aint done running the benchmarks, its incomplete rn

#

i think atleast

stray aspen
#

where can i use gemini 3 flash

modest prism
#

Dude it's better than sonnet 4.5. look carefully

muted timber
#

not even close

#

eh?

gusty egret
#

When is Anthropic gonna do MoE?

torn mantle
gusty egret
#

Their thinking models don't really try that hard.

torn mantle
#

wtfffffffffffff

stray aspen
#

nice ragebait

proud bobcat
#

Rigged

stray aspen
#

gemini 3 flash is a freak

gusty egret
#

Why is GPT-5.2 there

pale obsidian
torn mantle
#

i took it from official website

gusty egret
#

Rushed ahh benchmarkings

stray aspen
#

its real lol

#

check yourself

torn mantle
#

lol

#

wtf

muted timber
#

@echo aurora !!!

#

IT WORK

#

I DON'T KNOW WHAT HAPP

torn mantle
muted timber
#

i change the AI to gemini

#

and it work

zealous sparrow
fleet lintel
torn mantle
#

we all know gpt 5.2 is the worst model ever

pale obsidian
#

damn this is insane

proud bobcat
proud bobcat
#

It’s so peaking peak

stray aspen
echo aurora
torn mantle
proud bobcat
#

Lmao nuh uh

#

Ts is so fire

gusty egret
#

gpt-5.2 is a new architecture compared to 5.1. The cutoff of 5.2 is September 2025 and 5.1 is October 2024

#

Rushed

#

GPT-5 is also October 2024 for reference

queen veldt
#

Nah that's fake

stray aspen
#

its not lol

torn mantle
gusty egret
torn mantle
#

true

#

xdd

fleet lintel
# torn mantle

This is beating all expectations. How is this possible??

pale obsidian
#

wait thats flash 3 reasoning

leaden laurel
#

i need usa email

fleet lintel
#

Fast cheap And great?? I thought you can only get 2 out of 3

torn mantle
leaden laurel
#

(please someone give me usa gmail)

pale obsidian
torn mantle
#

cutoff

#

Jan 1, 2025

leaden laurel
#

or like any other country

torn mantle
fleet lintel
leaden laurel
#

which is not banned

zealous sparrow
#

can you go ask sonnet 4.5 Make a documentary with TTS about the creation of the universe in html, I want it all animated, and show facts on screen, it should be high quality. I need to compare

obsidian shell
#

@echo aurora nano pro has been unresponsive for hours

limber panther
#

i got 3 months google pro plan free trial

torn mantle
#

openai code red again??

obsidian shell
#

oh and codex is gone

#

lol

limber panther
#

Google has a lot of shares in Anthropic, they're not against each other

gusty egret
#

openai vs google i wonder who'll win

limber panther
#

Google is likely to buy anthropic

proud bobcat
stray aspen
fleet lintel
limber panther
#

vertex ai studio has all anthropic models

obsidian shell
limber panther
#

antigravity too

#

how so

fleet lintel
zealous sparrow
#

wdym no money

golden ocean
#

is the poor community celebrating gemini 3 flash

zealous sparrow
#

google is the richest company bro

limber panther
#

Google has already won the AI race

#

its over for openai fanboys

torn mantle
proud bobcat
#

What

limber panther
#

openai is dead

#

rip

fleet lintel
proud bobcat
#

This is ragebait

zealous sparrow
#

then who is

stray aspen
#

openAI is absolutely cooked lmao

patent aspen
proud bobcat
#

It’s too good

modest prism
#

Gemini 3 flash passed the AGI vision test unlike 3 pro.

jade egret
#

gemini 3 flash good?

stray aspen
#

yes

#

its a freak

limber panther
#

its great

modest prism
jade egret
#

:0

zealous sparrow
limber panther
#

only 3$ for output

#

crazy cheap

jade egret
#

wait, is it better than pro as in 'smartness'?

pale obsidian
#

simple explanation as of why google is winning this

limber panther
#

and test it yourself

fleet lintel
stray aspen
#

google leads the way

torn mantle
#

bye bye openai

#

lol

obsidian shell
#

we had oai

limber panther
#

openai is dying

#

what a good year

#

i love 2026 now

stray aspen
#

are there any news on new veo models

obsidian shell
#

why does google have to win every time ?

#

like

#

cmon

zealous sparrow
golden ocean
#

gpt 6

modest prism
#

How is this possible?

torn mantle
torn mantle
jade egret
#

difference between pro and thinking?

stray aspen
#

gemini 3 flash has such a great vision

obsidian shell
limber panther
#

pro is 3 pro

jade egret
#

oh

limber panther
stray aspen
#

i hope flash doesnt get more stupid in the coming weeks

limber panther
#

tbh im more hyped for gemini 3 flash

#

than gpt 6

pale obsidian
#

what does this mean

modest prism
gusty egret
limber panther
weary galleon
left lodge
#

@echo aurora
My suggestions,
Add an option to remove the system prompt added by you guys in code modality, because it causes the model to ignore instructions and just make a website out of our requests.
The environment is well implemented and very sophisticated, so why only frontends of websites?

gusty egret
#

did google nerf 3-pro to favour 3-flash?

limber panther
#

we can already see that openai failed to compete

#

and grok is dumb as hell

stray aspen
#

bro whats happening with nano banana pro

#

its not working on lmarena

#

and its unavailable on yupp

limber panther
#

its good on gemini app

#

working fine

stray aspen
#

alright

split kayak
#

3flash

echo aurora
modest prism
#

Sam Altman should be fired

modest prism
jade egret
#

wait... so gemini 3 flash is better than pro?

torn mantle
#

but just beneath 3 pro a bit

jade egret
#

oh

torn mantle
#

but then you add in speed & cost to the formula

#

and you find 3 flash is way better

stray aspen
#

flash cooked

obsidian shell
wicked sage
neat apex
#

I hope very much that Gemini 3 flash beats those baseless allegations that Gemini 3 is overscaled

wicked sage
#

like i swear to GOD bro.

zealous sparrow
#

I am waiting for the simplebench result

#

I can't wait

limber panther
stray aspen
neat apex
zealous sparrow
#

I hope it places at least #2 or #3 on simplebench

limber panther
#

and its all over

stray aspen
#

that would be the absolute final blow

neat apex
near root
#

Gemini 3 Pro vs Gemini 3 Flash battle3d

neat apex
#

soo, i guess it will score 66%

zealous sparrow
neat apex
#

it shows the Flash 2.5 09 results?

cloud zinc
#

gpt 5.2 still on top

neat apex
stray aspen
#

5.2 sucks

neat apex
#

5.2 is mid

cloud zinc
#

its the top rn

neat apex
#

carried by the xtra high and some luck

stray aspen
#

you need extremely high compute just to be behind gemini 3 pro and flash is behind it with less compute lmao

gusty egret
modest prism
cloud zinc
zealous sparrow
limber panther
#

i cant roast u on this server

left lodge
#

@echo aurora Well i have a doubt,
We have now gemini-3-flash &
gemini-3-flash (thinking-minimal)

The first one is thinking-high?

stray aspen
#

AI benchmark drama is crazy

#

lmao

neat apex
modest prism
gusty egret
#

Wait

#

Lol

brittle tiger
#

Flash 3 is going to dominate coding tools

gusty egret
#

I'm so stuck in this year

neat apex
#

bro lives in 2024

cloud zinc
gusty egret
#

17/12/2025

#

I can't believe we're in the 17th month of 2025

modest prism
cloud zinc
#

it does

stray aspen
#

it doesnt lol

cloud zinc
left lodge
zealous sparrow
#

im letting gemini 3 flash cook up a social credit test in html, you know that meme right

zealous sparrow
#

like

#

if they dont make it a toggle

#

its a lawsuit for privacy invasion

cloud zinc
#

no toggle

stray aspen
#

flash is so good

weary galleon
zealous sparrow
zealous sparrow
devout vault
#

is gemini 3 flash free api

weary galleon
#

I'm Anthropic soldier, but today Flash impressed me.

weary galleon
stray aspen
#

we just got ragebaited

burnt sinew
zealous sparrow
#

listen i like AI companies, but this is absurd

cloud zinc
#

good, they should focus on AI

zealous sparrow
#

40% less GPU production due to AI

burnt sinew
cloud zinc
#

gaming can take a backseat

zealous sparrow
#

a lot of their consumers

#

are gamers

#

the company will face a loss

weary galleon
cloud zinc
burnt sinew
cloud zinc
#

ai will create more games

zealous sparrow
cloud zinc
weary galleon
burnt sinew
fleet lintel
#

Thinking about flash performance.. basically, it implies that google probably already has Gemini 3 pro model with current flash post training that is much better.

burnt sinew
zealous sparrow
burnt sinew
unborn ocean
#

Where we don’t have the pro yet

weary galleon
whole sundial
fleet lintel
whole sundial
#

almost no support, but support has been slowly getting better

astral elk
#

Good night community

stray aspen
#

morning

weary galleon
#

They are great.

torn mantle
#

is gemini 3 flash vision broken?

burnt sinew
#

Well they might be the only option if nvidia stops making consumer gpus, but I dont think they would

weary galleon
#

Pure OAIs "flagship"

weary galleon
#

For gaming also.

zealous sparrow
#

and that will be the case

#

I liked AI till it caused companies to raise prices

burnt sinew
weary galleon
zealous sparrow
#

RAM like 70% up or sh- and now GPUs

torn mantle
zealous sparrow
#

Next? CPUs

stray aspen
#

5.2 is just plain stupid

weary galleon
burnt sinew
#

Is gpt 5.2 just benchmaxxed to the extreme?

whole sundial
# whole sundial intel... exists i guess

they have no chance in ai anyways so they can make some more gpus once they figure out how to not use tsmc, they still need to worry about ram in any event though

stray aspen
crude lagoon
#

No way 😭 gemini flash model

burnt sinew
#

Gemini 3 flash is better than 3 pro at math?

crude lagoon
whole sundial
#

b570/b580 got a lot of sales at launch, but then rtx 5000 made people forget about them, if you can find one they are a great choice, actually better at ai than amd i think and intel has first-party tools with support, idk about third-party ones though

whole sundial
#

are you kidding me, a new rx 7600 costs more than a b580? i have an rx 7600 and i can tell you that a b580 is probably better

weary galleon
#

Flash is excellent in coding, much better than GPT 5.2!

stray aspen
#

craig whats your specs

echo aurora
torn mantle
weary galleon
torn mantle
#

to make it fair you should try gemini 3 flash vs gpt 5.2 high ( although high is way expensive )

sour spear
#

So, am I right in assuming that Gemini 3 Flash, the free model available for everyone, is better than anything available in ChatGPT?

stray aspen
#

yes

torn mantle
#

actually if you think about it, its fair from cost point of view

stray aspen
#

gpt 5.2 sucks

torn mantle
#

but one is a reasoning model one is not

#

they lost so bad

stray aspen
#

feels like they just routing the non thinking model to gpt 2 lmao

gusty egret
#

You have no idea what it was like to be around gpt 2 when it released

#

It was the coolest thing of all time

#

Nowadays we get objectively amazing models but since they're comparatively ass to other models, we just dunk on them

torn mantle
#

i think there was a model before davinci that i was using a lot

#

and waiting for the streamline response

#

and gpt neo

#

was crazy

#

it really felt like magic

stray aspen
torn mantle
#

at that time we still didnt understand how they worked exactly

stray aspen
#

for high school

#

now i use it for college

#

and yes it was crazy

#

however not so good for physics class and math lmao

#

i would spend all night trying to get the AI to give me a problem solution that matched the book's answer

gusty egret
#

It was super cool

whole sundial
gusty egret
#

I've been in the community since early 2019

whole sundial
#

of course it wouldn't get this right, but i had to try

gusty egret
#

The shift from text prediction playgrounds to chatbots kinda annoyed me

#

Giving models their own personality made it harder to generate writing similar to your own

stray aspen
whole sundial
gusty egret
zealous sparrow
#

this is one thing i love gemini 3 flash does it automatically looks for sfx and finds them

#

i didnt link those

torn mantle
#

now you can make comparisons

whole sundial
weary galleon
torn mantle
#

its so expensive too

#

wth

gusty egret
#

Gpt o1 flashback

whole sundial
stray aspen
whole sundial
whole sundial
gusty egret
obsidian cargo
#

hey @echo aurora they made the lmarena prompt filter too strict it's forbidding perfectly innocuous text prompts now…

gusty egret
#

Actually wait

sour spear
#

Seems the rollout is complete. I'm usually the last person on Earth to get new models in the official app, and I just saw it pop up. 😁

cloud zinc
gusty egret
#

I still have the email from the newsletter OpenAI sent out upon releasing gpt 2

echo aurora
sour spear
# cloud zinc

No idea how that happened. 5.2 High has its moments, but then it's also giving me ridiculously bad results sometimes, I can't bring myself to even try it anymore for coding.

proven grove
#

Is there any way I can fix it if I get stuck with infinite loading?

gusty egret
#

OpenAI is just trying too hard. When gpt 5 thinking released it kept talking in lowercase during programming questions, which was so odd

weary galleon
#

Flash is good even with minimal thinking.

burnt sinew
rugged abyss
#

Love how Gemini 3 Flash is beating Gpt 5.2 by a huge margin

gusty egret
#

Google, multi zillion dollar corporation, Vs OpenAI, a tiny shed with 2 dudes in it

rugged abyss
gusty egret
#

3 dudes

rugged abyss
#

Where are you getting 3 dudes from?

burnt sinew
#

What is style control? Like system prompt?

gusty egret
rugged abyss
#

you are referring to the starter years of openai

gusty egret
#

Idk why you're pressed about my trolling you

atomic lagoon
#

burh

#

It had a copy image so I tested

rugged abyss
gusty egret
#

I mean I think it's pretty obvious that OpenAI isn't a tiny shed with two dudes in it

rugged abyss
gusty egret
#

I thought I demonstrated my competence earlier lol

rugged abyss
gusty egret
#

My earlier contributions lol

#

Nvm it's fair

#

I wonder when anthropic will do a MoE

rugged abyss
gusty egret
#

🥀

gusty egret
#

Specifically because of the cotton question 💀

zealous sparrow
#

yes it did

gusty egret
#

Why'd it choose to spell it as Uighur? Feels like the LLM would choose the more common one, that is Uyghur

rugged abyss
#

No way you didnt give it the answers and questions

#

What was the prompt?

torn mantle
#

so funny

zealous sparrow
# rugged abyss What was the prompt?

Make a social credit test in html, IF YOU do bad execution, if you do good you chinese citzen, also add sound effects and the red sun in the sky music, take them from myinstants
+
My short system instruction to make it not lazy

#

literally just this

#

this prompt sucks ass

#

but it works

gusty egret
#

That's the most neurodivergently written sentence ever lol

gusty egret
zealous sparrow
gusty egret
#

John Cena was mentioned though

#

Obviously it understands

#

And also the extremely exaggerated values like +5000 social credits XD

gusty egret
#

I actually didn't choose the wrong answers

rugged abyss
gusty egret
#

I'ma go do that

keen beacon
#

👋

gusty egret
#

Minus one billion

#

That's crazy

weary galleon
#

Even Flash with minimal thinking is much better than GPT 5.2 with maximum thinking 🤣

#

Thinking won't help if you are stupid.

#

69 is the exact right answer.

half mist
modest prism
zealous sparrow
#

both are gem 3 flash

half mist
half mist
modest prism
#

Gpt image 1.5 is available for free with generation limits, NB pro is not. Waiting for NB flash

burnt sinew
zealous sparrow
#

im using on AIstudio

#

i haven't hit ratelimit yet but im goin to see it soon ig

half mist
zealous sparrow
#

yeah to make gemini3 flash good at coding you need an anti-lazy prompt

#

how so

zealous sparrow
#

the owner of it

#

said it would be public

#

soon

cloud zinc
#

there is no prompt

zealous sparrow
#

also 👀

#

they are going to

#

fix up pro

cloud zinc
#

its pizza planet

zealous sparrow
#

waiit

#

i know now

#

google put out a 9f checkpoint which was likely a 3 pro one

#

it's prob gone now

#

or mayb it was flash idk

half mist
#

Also, the thinking model doesn’t think, or it doesn’t show the thinking

cloud zinc
torn mantle
#

it has everything

#

planning ... execution .. optimization ...

zealous sparrow
#

i realize why 3 flash was so good on codearena

#

the codearena system prompt made it

#

good

half mist
#

Also, I see no difference between fast and thinking. Fast does it instantly, and thinking does it instantly as well, it doesn’t show the thinking

weary galleon
gusty egret
#

I counted twice

zealous sparrow
weary galleon
gusty egret
gusty egret
#

What am I missing

weary galleon
#

Just a small corner of a whole tomato.

gusty egret
#

I suppose not having the whole image does make that one difficult to spot

weary galleon
gusty egret
#

I was not here a few days ago

weary galleon
gusty egret
#

Also I think GPT might be counting the ones with the top stem visible

weary galleon
gusty egret
#

Plus 3

weary galleon
#

Which 3?

gusty egret
#

Who knows

#

The inner workings of 5.2 are a mystery to us all

weary galleon
#

Because it's an extremely bad model.

gusty egret
#

Maybe it's these three

#

They're the most tomato-shaped where you can't see the stems

#

I can only theorise because it probably gives a different answer every single time

weary galleon
gusty egret
#

It's rushed

weary galleon
#

Rushed, but fair.

gusty egret
#

The cutoff date of GPT-5.2 is 1 September 2025 and GPT-5.1 & GPT-5 is 1 October 2024.

#

The release month of GPT-5 was August 2025 and the release month of GPT-5.2 was December 2025.

#

GPT-5 took 10 months to be published, whereas GPT-5.2 took about 3 months.
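
The cutoff-to-release gap can be checked with a quick sketch. It counts whole calendar months between the cutoff and release dates quoted above; the release days of the month are not given in the chat, so the first of the month is assumed, and `months_between` is a helper name made up for this example.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end, ignoring the day of month."""
    return (end.year - start.year) * 12 + (end.month - start.month)


# Dates as quoted in the chat. Only the release month is known,
# so day 1 is an assumption made for illustration.
gpt5_gap = months_between(date(2024, 10, 1), date(2025, 8, 1))
gpt52_gap = months_between(date(2025, 9, 1), date(2025, 12, 1))

print(f"GPT-5: {gpt5_gap} months from cutoff to release")
print(f"GPT-5.2: {gpt52_gap} months from cutoff to release")
```

At month granularity the exact figure shifts by up to a month depending on the real release day, so this is only a rough comparison.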

lapis sparrow
#

🙌 Hello ...

half mist
# torn mantle yea lol

The limit for thinking is the same as pro. So if the rate limit of thinking is the same as pro, what’s the point of thinking?

gaunt roost
restive scarab
#

Nano banana pro is literally not generating and working

astral blaze
#

I am being SILENCED

#

It truly is over

echo sinew
# astral blaze I am being SILENCED

You're not being silenced. Your post seems off topic. Completely unrelated to the current chat. If that was a reply to someone in particular, you can ping them or reply to their original message.

astral blaze
gusty egret
#

Nano banana pro appears to be down

neon idol
#

@zealous sparrow sooo impression on gemini 3 flash?

gusty egret
#

3-pro just died on me after thinking for 3 minutes straight

solar hollow
zealous sparrow
solar hollow
#

it seems though, that the difficulty of the prompt matters

stray aspen
#

does anyone have a good system prompt for gemini 3

half mist
#

Oof

#

Happy it has Google Search grounding though since that makes it more accurate

torn mantle
#

this benchmark was proven to be unreliable over and over and over

#

read what it measures

zealous sparrow
#

I haven't seen too many hallucinations rn...

#

The model only gave me good answers

torn mantle
#

we dont know what a partial answers is... and is this LLM judged?

cloud zinc
#

wow gemini 3 is so bad

zealous sparrow
#

the model barely gave me any wrong answers

cloud zinc
#

so gemini 3 flash hallucinates a lot

torn mantle
#

no

cloud zinc
torn mantle
#

bad bench

cloud zinc
#

no it is good

zealous sparrow
#

the bench is ass

#

bro

#

give me a question that it would hallucinate

cloud zinc
#

it is a trustworthy benchmark

stray aspen
cloud zinc
#

i am looking at an objective benchmark

#

check here

zealous sparrow
half mist
torn mantle
#

stop

half mist
zealous sparrow
#

give me prompt

#

ill test

cloud zinc
#

so it is giving fake information

#

thats not good

half mist
cloud zinc
#

how can people trust an ai model, if it gives fake info

zealous sparrow
half mist
cloud zinc
#

a lot of the time it hallucinates

zealous sparrow
stray aspen
zealous sparrow
#

i didnt get a lot of hallucinations yet

stray aspen
#

what show

#

no

#

he said he got hallucinations with grounding on

zealous sparrow
#

well then its obvious

#

it wont have any info without it

half mist
zealous sparrow
#

therefore its unfair for the model

#

as it has to have every show in its training

half mist
zealous sparrow
#

let me find a new one

stray aspen
#

its not

#

its a great model

zealous sparrow
#

i need to see it

#

else im not taking it

#

i want to bet one thing

#

the testing for hallucination was done without grounding, which nerfed the model to rely on training data. Making it the reason it scored so high on hallucinating

stray aspen
#

or are you ragebaiting

half mist
#

It’s very hit or miss if it hallucinates

half mist
zealous sparrow
#

and hallucinated

#

and it prob got detected so it removed your chat

stray aspen
#

i aint buying this gang

zealous sparrow
cloud zinc
half mist
half mist
zealous sparrow
#

consider me stupid

golden ocean
#

Considered.

half mist
#

I think we just gotta use it normally, then wait until we find a hallucinated response

half mist
# zealous sparrow gl

Does this count? It says the Plankton Movie is expected march 2025, but that date has already been surpassed

zealous sparrow
#

your gemini is cursed

half mist
zealous sparrow
cloud zinc
#

thats so weird

half mist
zealous sparrow
#

your gemini is fkn cursed man

half mist
zealous sparrow
queen veldt
#

3 flash AGI

zealous sparrow
#

This is without grounding, and this is not a hallucination, as the model's training data only goes up to Jan 2025.

#

So no bench can argue this is a hallucination

queen veldt
#

3 flash is more accurate than the pro version???

stray aspen
#

yes

#

he has better vision

cloud zinc
#

flash is focused more on vision and code

half mist
#

Gemini 3 Flash is actually really good at coding

cloud zinc
#

it is bad at other things

#

like hallucinating a lot

zealous sparrow
#

it only gave me good answers on grounding

#

API and website prob differs

#

wwtv says hes using website

cloud zinc
#

just because u cant confirm it doesnt mean it doesnt happen

zealous sparrow
#

sure not but like

#

it aint happening for me on AIStudio

#

and it happens for him on website

cloud zinc
#

most people not using aistudio

zealous sparrow
#

i want to bet one thing

#

the hallucination bench benched website

cloud zinc
#

is good

zealous sparrow
#

But also on website its not hallucinating..

#

I swear the bench got a weird ver of the model or smth

weary galleon
sterile tartan
weary galleon
sterile tartan
#

Sonnet 4.7 will be 🔥 Tho

weary galleon
#

I'm sure.

quartz light
#

i hope sonnet 5 wont be worse than 4.7 lol

zealous sparrow
#

they said 91% hallucination [wrong answers]

zealous sparrow
#

yet there's barely any

quartz light
#

oh

#

who said

zealous sparrow
#

are we like deadass

quartz light
#

4.5 on lowest?

#

haiku*

golden ocean
#

deadass

zealous sparrow
#

I hit the 3 flash ratelimit on AIstudio

#

didnt count but

#

strong say its about 100

#

More actually

#

let's not go as high

drowsy crater
#

Sup

quartz light
#

this bench shows otherwise

#

see

#

its a bench so its tru

quartz light
weary galleon
#

Guys ignore trolls, they wanna make violent argument. Ignore them and they would start to cry because of lack of attention.

pale obsidian
pale obsidian
#

i dont care either but to say xAI is worthless and ass is just not true

quartz light
#

i mean yeah some people but "xai haters" are ones hating on text models

quartz light
#

like dude even nova 2.0 is 3rd place on that bench

#

the fact that they only beat gemini by .1% 🤣

#

thats embarrassing

#

not a "comeback" or whatever

pale obsidian
#

the fact artificial analysis published a report about a voice model instead of gemini 3 flash is also funny

vivid coral
#

speaking of xAI

stray aspen
golden ocean
#

lmaooo

sullen quest
vivid coral
#

more xAI fun

half mist
#

Did notice that Gemini 3 Flash asks follow up questions at the end of each response like ChatGPT. Before with 2.5 Flash, it didn’t do this natively unless you told it to do that

vivid coral
#

I thought Pro did that too, maybe not, I'll have to check

queen veldt
#

I just realised the depression I'd have if there was no lmarena

fiery gull
#

I'm so happy

vivid coral
fiery gull
#

now I have a gemini 3.0 pro but without minutes of loading to attack my adhd

stray aspen
#

does anyone have a nice prompt to remove coding laziness in gemini 3

sour spear
#

Interesting little comparison. ChatGPT failed abysmally, the answer is utter bs. Gemini 3 Flash Thinking got it perfectly right. And Gemini 3 Flash Fast actually did some reasoning "on the fly" (the "Wait, let's look closer" part in italics), correcting itself, and also reaching the correct conclusion.

queen veldt
# stray aspen why

All those images i created with nb flash and all those coding with gpt 5 high

#

None of that would be possible without lmarena

#

I'd literally scroll tiktok or sum

stray aspen
#

i mean we still have yupp

#

lmao

vivid coral
#

yupp isn't free

fiery gull
queen veldt
#

Yupp can't be compared to lmarena

#

We don't need to submit feedback

#

To get points so we can use the model

torn mantle
torn mantle
vivid coral
#

yupp has limits, some models cost ya, nothing near LMArena. If you ask Gemini itself for alts, it will even tell you there is nothing really like it. It will try to direct you to openrouter, yupp, etc. And when you press that there are costs and limits, it tells you only LMArena is this generous

modest prism
#

I wonder how lmarena afford all these models

warm zodiac
#

Hey where did you get this info? Do you know when in Q1

modest prism
vivid coral
#

there might be more investors/seeders that aren't published or reported in the media, idk

#

could be the AI companies themselves, who knows

echo aurora
vivid coral
#

I guess some of it is the AI companies themselves, it's very clear there's heavy heavy backing, it's not going anywhere

cloud zinc
#
LMArena Blog

Today, we’re introducing a commercial product: AI Evaluations. This service offers enterprises, model labs, and developers comprehensive evaluation services grounded in real-world human feedback, showing how models actually perform in practice.

hasty crag
#

hello

cloud zinc
#

when is it coming

#

sure

half mist
#

https://youtu.be/bY_RarpUdUw In Spinners, Gemini 3 Flash makes two versions of the next spinner and you pick your favorite, then the next one is based on the one you chose. Huh, sounds familiar, doesn't it?

Gemini 3 Flash enables modern coding workflows, including ultra-low latency, near real-time code generation, and rapid iteration. It can also natively facilitate A/B testing, like evolving the perfect loading spinner in milliseconds, and can adapt to user selections to generate refined code variants in real time.

Learn more at https://deepmind....

cloud zinc
#

everything is soon

torn mantle
#

gem3 flash eats up usage so fast on antigravity

#

which is surprising

#

15% left

#

nauuuuuu

#

its token efficient so whyyyyyyy

native yarrow
#

@zealous sparrow can u share ur system prompt

reef hollow
#

Does everyone have the video feature on the site? Because I logged in on another device and the feature does not appear

echo aurora
#

If/when we fully roll this out to all users we'd be sure to let the community know!

torn mantle
#

lol

fiery gull
neat apex
cloud zinc
#

wow

sullen quest
#

craig, you can admit that 5.2 wasn't sent down from the heavens to bless the ai world forever ok?

stray aspen
#

It's ass

cloud zinc
stray aspen
#

I tested it myself, benchmarks won't change my opinion

sullen quest
torn mantle
#

new

#

looks good

sterile tartan
#

Rate Limits?

torn mantle
sterile tartan
cloud zinc
torn mantle
sterile tartan
torn mantle
#

:omg:

#

@cloud zinc

acoustic crater
#

It looks like nano banana can be used

torn mantle
#

the auto modality works tho

torn mantle
echo aurora
torn mantle
#

this is so stupid tbh

#

Warp now uses Gemini 3 Flash for generated code diffs!

We've seen a clear bump in quality from Gemini 2.5 Flash.

Warp suggests code changes whenever you hit a compilation error in the terminal. We'll be rolling out Gemini 3 Flash for diffs, so let us know what you think.

#

ive seen a couple of posts like these

#

cursor too

#

they just confine gemini 3 flash to something small like diffs or small bug fixes

#

thats it?

queen veldt
#

Okay I'm beginning to like the flash model

#

It's fast, it gets up-to-date information in seconds

warm zodiac
#

anyone get master-node as a codenamed model?

queen veldt
#

"SOTA"

sterile tartan
#

GPT needs Better Eyes and Ears

queen veldt
#

They added it to YouTube

#

It watches the whole video and gives suggestions

#

This is insane!

pseudo hemlock
#

gemini 3 flash is absolutely insane

#

it beats 3 pro in SWE

#

allegedly

#

but anyways its beating 4.5 opus thinking and not thinking

#

and on par with grok 4.1 thinking

#

absolutely mental

patent aspen
#

The OAI founders seem very, very stressed

queen veldt
#

Also the flash model is better at online guidances

#

Toolathlon benchmark

#

It's even better than pro model at some stuff like DAMN

sullen quest
latent crest
#

I randomly have the video arena now. 2 vids in a day tho 🙁

Thanks pineapple, hopefully there will be a little bit more, but thanks regardless 🙏

molten cipher
#

Guys

#

Google is cooking with gemini

#

But bro it's dang scary

#

I was playing a horror game and vibe coding on Antigravity with gemini 3 pro high

torn mantle
#

ITS GETTING SCARY ALREADY

#

dont tell me whats next

#

🫣

molten cipher
#

.

torn mantle
#

😱

molten cipher
#

It said

"Okay, that's done. I will send it"

#

Then it started spamming

"I WILL SEND IT, I WILL SEND IT, I WILL SENT IT, I WILL SEND IT"

torn mantle
#

👻

molten cipher
#

Chill out it was scary, I was playing a horror game with my pay to win dude paying the game to scare the hell out of me

neat apex
#

OpenAi will turn warning code Brown after GF3

torn mantle
#

why are you using grounding/search/

#

?

#

disable it

neat apex
#

OpenAi now: If my eyes turn red, Run
OpenAi soon: if my pants turn brown, Run

vital lake
#

Why cant we say curse words in the chat lol? @echo aurora

vital lake
#

4kids IG

fickle venture
#

Imagine you got banned

torn mantle
#

some new models added

echo aurora
sullen quest
#

it still bans mentioning the popular children's game where you play other peoples creations

echo aurora
sullen quest
#

huh

#

weird

echo aurora
#

If/when Video Arena moves to the site we'll reassess the banned words list

rapid narwhal
#

Did the terms update and they not tell us? all my prompts that worked before don't work anymore. I can't even use the site to make a script for a youtube video anymore!

echo aurora
#

You can also DM me if you'd prefer that.

obtuse smelt
#

lol try games with AI videos