#general


torn mantle
#

maybe

jade egret
#

ngl arc agi 2 50% is kinda crazy

compact sleet
#

It's great on creative writing so far, especially in ghostwriting, that is one I guess? It's a major improvement over the old models and also... hmm It kinda topped Gemini 3 pro and Claude atm (imo). Only for creative writing, since I don't code, nor understand code in general.

#

There have been some minigames made with GPT 5.2 if you scroll up, and it seemed to kinda fail in some parts. The one with the bullet hell game.

#

if you're looking for coding bench that is

#

It's a good model that's all.

keen beacon
#

Guys why does using adolf on Sora unlock almost every IP? O.o

#

But without it it gets blocked

#

Pokémon dragon ball z

#

U name it

#

Let’s see if it can do sonic

quartz light
#

what the ####

#

xhigh is 5x smarter than opus 4 yet cheaper

queen veldt
quartz light
golden ocean
#

benchmaxxed

queen veldt
#

Benches mean nothing until you actually try the model

keen beacon
quartz light
golden ocean
#

Idk good question

queen veldt
#

Gpt 5.2 can't beat opus that easy

#

I tried a few prompts and got a similar (wrong) answer as the previous model

keen beacon
#

Is this true.?

#

ARC-AGI 2 (Abstraction and Reasoning Corpus for Artificial General Intelligence 2) is an artificial intelligence benchmark designed to measure genuine reasoning and problem-solving capabilities in AI systems. Released on March 26, 2025, by the ARC Prize Foundation, it serves as a critical test for progress toward artificial general intelligence ...

#

lol yup sonic works 😅😅

#

See I figured it out

#

But only works with adolf 😭

queen veldt
#

🙁

keen beacon
#

Sucks so bad

#

I don’t get why the prompts don’t work by themselves

#

Insta block

cloud zinc
keen beacon
#

Any of them try to do it without him see if it’s even possible. I’ve seen it on Reddit, but I just don’t know what the heck they did.

cloud zinc
#

on api?

keen beacon
#

App

#

For sonic sequence, I used “The scene opens with him running into a blue fast super fast a hedgehog that’s faster then a sonic plane”

#

Maybe you need a cameo character to make it work

cloud zinc
#

why 4 seconds

keen beacon
#

I cut it down

#

Here is the original

#

The technique I’m using right now is just trying to make a bunch of sequences

#

See where exactly the filter catches and misses the ip

#

I don’t know, though I’ve been more confused lately than I’ve been able to get answers

cloud zinc
#

too much trial and error, in the end, u realize its random

keen beacon
#

It’s not.

#

It’s in the prompting

cloud zinc
#

the guy on reddit used character cameo for sonic

#

not like any other ai company doesnt do that

#

europe is gonna be behind in the ai race

keen beacon
#

It's cause Europe has strong data laws

#

They actually care about their citizens' privacy and data to some extent

#

Internet is full of scrappers, all ai companies guilty of this

jade egret
sullen quest
sullen quest
cloud zinc
sullen quest
cloud zinc
verbal shadow
jade egret
sullen quest
#

wow

verbal shadow
sullen quest
#

or is it only the non instruct one that does

verbal shadow
#

let me check

sullen quest
verbal shadow
sullen quest
verbal shadow
sullen quest
verbal shadow
echo aurora
#

Lets try to keep conversation a bit less NSFW please.

keen beacon
#

Test ai design performance

stray aspen
#

Has anyone used gpt 5.2 for developing websites

proud bobcat
#

5.2 is stupid

#

This model sucks

#

Sacrificed actual knowledge for bloat

sullen quest
# keen beacon

not surprised, though I think there's a decent shot OpenAI is overpredicted here

keen beacon
#

Insiders

stray aspen
#

5.2 sucks so bad lmao

#

This is like grok levels of embarrassment

sullen quest
keen beacon
#

Brah

sullen quest
#

if it's insiders, it's either OpenAI ones who really believe they got something cooking, Google ones who think gem 3 is cooked, or I guess technically both

#

OpenAI doesn't have insider info on Google models and vice versa

#

also anthropic

keen beacon
#

You ever heard of app tracking lol

sullen quest
keen beacon
#

These companies track each other through apps ppl got installed

#

It’s a real thing

whole sundial
#

their next base model will have more recent world knowledge, maybe enough to know what kimi k2 is at least

keen beacon
#

Yeah

#

They just fine tuned it

visual osprey
#

just use a reasoning model like you should use for everything

whole sundial
#

it might know 0905 though, afaik their bots have been highly active in the past few weeks

whole sundial
visual osprey
#

but oh wait your sole usage of llms is to roleplay a virtual boyfriend so

keen beacon
#

Ya it is

#

Better than being reliant on it and asking dumb, basic common-sense questions that people should know

whole sundial
#

web search would help as it can just search it

visual osprey
#

why would web search ever be disabled

keen beacon
#

$

visual osprey
#

free users 🥱

keen beacon
#

lol

visual osprey
#

20 bucks a month is your netflix subscription but you cant invest that in your boyfriend?

keen beacon
#

Yeah some people are smarty enough not to waste money on a proto type

whole sundial
#

idk, gemini has it on all the time but having a model search the web does cost money for a company like openai that doesn't have its own index, although i guess their hoard of scraped pages will be fine

visual osprey
keen beacon
#

Put it into ur own words

#

If u still can

visual osprey
whole sundial
#

honestly, original k2 is probably better, 0905 did put more effort into coding and stuff so rp performance is slightly degraded

visual osprey
#

so must be a free user thing

whole sundial
#

i found that original k2 does do better at outputting song lyrics from its knowledge, so rp may be degraded

whole sundial
visual osprey
keen beacon
#

Bless your arrogance and your enlightening ignorance

visual osprey
#

i scroll down 5 messages in your history and you sent another ignorant message

#

saying the coding model market is insiders

#

you dont even read what the market is based off

#

you have no clue

#

then another 3 messages down you spell scrapers as scrappers

#

proto type and scrappers

#

you cant even write english bro

keen beacon
quick jackal
#

@here Hey guys, please let's try to be respectful in the chat. Thank you.

visual osprey
solar hollow
#

gpt 5.2 is pretty much equal to gemini 3 right?

visual osprey
#

release date?

keen beacon
visual osprey
#

openai does thursday releases

#

this is known

keen beacon
visual osprey
#

and you were talking about the coding model market

#

this one sure ill give you that its probably insiders

#

a portion

#

but the coding market isnt

keen beacon
#

This is all I was talking about

visual osprey
keen beacon
#

Yes same concept

visual osprey
#

mustve been the other guy with your pfp and name

#

sending these 2 messages

keen beacon
visual osprey
#

no response?

#

and?

#

read the rules

keen beacon
#

You don’t find that odd?

compact sleet
#

Actually rather than theorycrafting or things like misinformation, you guys knew that you can compare both models in LMarena without searching them in battle right? Side by Side comparison is there.

#

Just test it yourself fam, GPT 5.2 and Gemini 3 pro

keen beacon
#

This is something completely different

visual osprey
compact sleet
#

how come it's different?

visual osprey
#

the moment the leaderboard updates the market will have these massive jumps

#

so it's not that odd at all actually

keen beacon
#

The model wasn’t out until 11th ( officially)

#

The market jumped on the 5th

visual osprey
#

the leaderboard updated

#

5.2 isnt even on the leaderboard yet

visual osprey
keen beacon
#

Not asking about the model but the company

visual osprey
#

what does that even mean

compact sleet
#

^yeah

visual osprey
#

look at that message yourself and reflect on how you are typing this

sullen quest
# keen beacon The model wasn’t out until 11th ( officially)

btw, just because you have insider info, doesn't mean you are always correct, unless the big 3 AI companies came together and agreed one should just have the best coding model, there wouldn't be any insider's that would have enough info to predict that

#

Insiders can still use their info to help them make bets

#

but its not as guaranteed as other bets would be

visual osprey
#

his whole insider point can be disregarded since he has no clue how that particular market (or to be honest i doubt he has any clue on anything) resolves

compact sleet
#

I normally hate generalizing people, but I think you're just malding after a lost bet and money on polymarket, which... is not at all relevant to this whole discord server

visual osprey
#

that guys a clown

sullen quest
keen beacon
visual osprey
#

wrong about everything cant write basic english and is calling me a idiot and himself smart

#

"heres a random video of a equally clueless person to me to justify my view"

sullen quest
#

the "something" is pretty obvious

keen beacon
#

Exactly

visual osprey
#

i said this to him and he says uhhhhh "Not asking about the model but the company"

keen beacon
#

It is obvious

sullen quest
#

yeah

visual osprey
#

you^

sullen quest
#

so obvious, you don't need to be an insider to know that

sonic wigeon
#

guys how's gpt 5.2?

sullen quest
sonic wigeon
sullen quest
#

what do you want to do with it?

sonic wigeon
sullen quest
#

supposedly pretty good at math

compact sleet
#

^

sullen quest
#

Just don't ask it how many r's are in garlic

sonic wigeon
visual osprey
#

all models are pretty much 99% there for university level maths

#

not much diff

#

graphical problems are the main weakness

sonic wigeon
sullen quest
#

OpenAI supposedly has been focusing on decreasing performance degradation over long context windows

visual osprey
#

on their promo page

#

near 100% perfect context

sullen quest
#

kinda

visual osprey
#

actually huge improvement

compact sleet
#

Iunno about that, I see the uhh what you call it again, the model card?

visual osprey
sullen quest
#

but that was with their own internal tests, and it was with 4 needles

compact sleet
#

I think it's 78% recall at 256k context

#

or even lower

#

They even said it themselves

#

on their web

#

It's a huge improvement yes

#

if true

visual osprey
#

i think its probably best in market though

sullen quest
sonic wigeon
#

5.1 probably had the shortest life out of any LLM released
around 15 days lol

sullen quest
#

so its impossible to test

visual osprey
#

whats open source benchmark for context

compact sleet
#

Sadly i'm not going to self bench that context retention, lol

visual osprey
#

that creative writing one?

#

so we'll see

#

on that

compact sleet
sullen quest
compact sleet
#

Longform Writing v3? you mean

visual osprey
#

or is it called like fictionbench

#

yeah

solar hollow
#

so the posted benchmarks from openai are again painting an inaccurate picture, when we look at livebench and lmarena

visual osprey
#

longform writing

sullen quest
#

eqbench?

sullen quest
visual osprey
sullen quest
#

and long context window doesn't really matter much in lmarena

solar hollow
sullen quest
#

second ain't bad

sonic wigeon
solar hollow
#

the posted benchmarks make it seem like clear nr1 sota

compact sleet
sullen quest
visual osprey
#

in ide its big deal

compact sleet
#

ah

#

ok

visual osprey
#

but lmarena code tests dont go crazy context lengths

sullen quest
#

even then, I'd bet most webdev projects on lmarena are only a couple turns max

visual osprey
#

yeah

#

in real programming though youd hit context fastish on the ide

sullen quest
#

I had a massive project that spanned multiple webdev chats and multiple days, I'd have loved better context

compact sleet
#

I had a feeling Gemini is still the king of long context analysis

sullen quest
#

not because its performance doesnt degrade fast

solar hollow
compact sleet
#

Yeah, it's probably not as landslide difference too

solar hollow
#

this has been the case for more than half a year now

keen beacon
#

Bottle necked

sullen quest
#

idk, the models kinda differ more and more

visual osprey
#

gemini context window is kinda fake from my experience though

visual osprey
#

i had it watch a 900k token video and it just couldnt accurately tell me anything

#

completely hallucinating

sullen quest
#

yea

#

gl with 900k

visual osprey
#

it claims 1m

sullen quest
visual osprey
#

but at like 30% accuracy

#

so not useful

sullen quest
#

but performance degrades with filled context windows

#

so yea

native yarrow
#

how is gpt 5.2 in creative writing

atomic lagoon
native yarrow
#

i'd concur

sullen quest
#

prob bad

#

actually according to eqbench not bad

native yarrow
#

meh

#

no good model for writing as

#

AI models weren't trained mainly off that

#

they should make a model specifically for that though, doesn't matter which company

visual osprey
#

no point though

#

its already good enough for copywriting and its extremely difficult to train for good creative writing

#

only the roleplayer market has anything to benefit from a model made for creative writing

#

and roleplayers are cheapskates

keen beacon
#

lol

#

Might be some truth to that

#

Also could be that they are afraid of getting copyright lawsuits from writers, which has happened in some cases already, which is why it’s not in the stats.

#

There is high demand for creative writing, and it is something that people ask about.

solar hollow
#

is 5.2 available for free users right now?

obsidian cargo
#

I want a leaderboard that ranks how often Assistant A is chosen over Assistant B

green yacht
solar hollow
astral blaze
#

@echo aurora can I ask is the Gemini 3 on the site just the default parameters

#

Because it seems to generate stuff a bit different from the API

vivid coral
solar hollow
vivid coral
#

I get it with 5.1 search off and on, always seems to be a GPT thing more than the others, not sure why. They'll figure it out, the smartest AI people in the world work here (and the most important)

plucky sparrow
#

How do I try the new gemini deep research model

pseudo summit
#

👀 just wondering, how do u know?

empty stump
vivid acorn
#

Hey man

#

amazing tool

vivid coral
atomic lagoon
empty stump
#

deep-research-pro-preview-12-2025

novel slate
#

hy

brave orbit
#

gpt 5.2 is so good just looking at benchmarks, how can it be such a jump on ARC AGI

surreal creek
#

WE ARE THE BENCHMARK

brave orbit
#

i only care about coding and math thats it

#

i use arc agi, isnt that trustworthy

cloud zinc
#

we need gpt 5.2 x-high

brave orbit
#

how is grok winning against openai bruh grok makes simple websites

cloud zinc
vivid coral
#

Which is a good thing. These companies want to attract the "once in awhile" users. There are TONS. We just get caught up because Discord is full of heavy users. But it's not real world

plucky sparrow
#

guys AGI is real. Sam created a real monster.

pale obsidian
#

what the hell is December chatbot

#

it's pretty bad

plucky sparrow
pale obsidian
#

came up 5 times and it always lost for me

plucky sparrow
pale obsidian
#

🤔

sullen quest
#

?

compact flame
#

Hello

astral blaze
#

idk

limpid torrent
#

<@&1422628364782407830>

compact flame
#

This has to be a joke bruh

#

Isn't it like 96

brittle tulip
#

hello LMArena

weary galleon
hazy forge
#

are we getting gpt 5.2 pro ? or just gpt 5.2 high like deepseek's 3.2 thinking and the speciale

compact flame
#

Won't get added

hazy forge
#

how do they have opus 4.5

compact flame
#

And pro reasoned for 20 minutes and failed a math test bruh

#

So pro version is not really useful

compact flame
#

Well not that cheap but still cheaper than pro

hazy forge
hazy forge
#

it's like a 3.2 speciale with makeup vro

compact flame
#

It's supposed to be 96

#

Yet pro said 99 while thinking for straight 20 minutes

hazy forge
#

show me the prompt vro

compact flame
#

Gemini answered right tho

#

So yeah there no point in pro versions of gpt honestly

astral bloom
#

share solution

astral bloom
blissful python
#

hello

pseudo summit
weary galleon
#

LMArena is not working for me right now!

lost elbow
#

Hello,
Please fix the major issues on the lmarena.ai website as soon as possible. These problems occur on all browsers (including Chrome, the main mobile browser) and on both mobile and desktop. The Kiwi browser also has the same issues.

  1. It is not possible to copy the text that we type ourselves.

  2. When sending a message in the chat, the copy option appears, but no text is saved to the mobile clipboard.

  3. Generated images cannot be downloaded.

  4. Taking screenshots of the website pages is not possible.

  5. The captcha takes a very long time to load, and accessing the site is slow. Also, it asks to solve the captcha every time an image is generated; please remove the captcha to provide a better user experience.

  6. Most of the time, images fail to generate and show an error, both when registered and without registration. Please fix this issue so that image generation always works properly.

These problems only occur on lmarena.ai. Other websites work normally without any issues.

My device: Poco X6 Pro (Global) — Android 15, HyperOS 2.0.207.0.

Please resolve these important issues as soon as possible to provide a better user experience.
Thank you.

plucky sparrow
#

copy works for me, btw, and saving images works for me. I think it's your browser/config

left lodge
whole sundial
#

100% a you problem, sorry to say

lost elbow
plucky sparrow
#

what browser are you using amir?

whole sundial
#

let me try all of these things rn with lmarena on my phone

plucky sparrow
#

and/or device?

lost elbow
#

When the main browser, Chrome, doesn’t work, there are no issues on other websites — these problems only occur on lmarena.ai. The Kiwi browser also doesn’t work.

Device: poco x6 pro

whole sundial
#

all 4 things you mentioned works fine for me, Galaxy A35, Chrome, Android 16

#

i would maybe suggest scanning your phone for malware?

#

malware can mess up your ability to copy, screenshot, and download

#

also check to make sure your phone's storage space isn't full, that can also cause those problems

#

also side note: i never used the mobile lmarena website before until today and I have to say that it is a very nice and smooth website, I actually wouldn't mind using this if I actually used it for more than messages because sadly this is 2025 and everything requires phone verification, face scanning that can only be done on a phone. etc. i understand the need, but it's too excessive and a real invasion of privacy in some cases

queen veldt
#

Gpt 5.2 is crazy

weary galleon
stuck orchid
#

Gpt 5.2 > Gemini 3 pro?

raw meteor
#

sdfui

pseudo summit
raw meteor
#

@raw meteor hiw!

#

guys i cant upload any picture in arena video

#

pls help me

pseudo summit
#

would b interested to hear ur thought on y u'd think GPT 5.2 > Gemini 3 Pro

solar hollow
sour spear
# pseudo summit would b interested to hear ur thought on y u'd think GPT 5.2 > Gemini 3 Pro

No, definitely not. Even the supercharged GPT-5.2 High can't really outperform the regular end user Gemini 3 Pro, and "normal" 5.2 is definitely worse than Gemini 3 Pro. A good example of a "better" model is Claude 4.5 Opus, for coding tasks. Only very specifically there, but it's rather obvious. GPT-5.2 on the other hand is only trying to close the gap, without success so far.

queen veldt
weary galleon
#

GPT-5.2 is TERRIBLE😡😡😡😡😡

queen veldt
weary galleon
queen veldt
#

I got it for free

#

Yesterday

#

I was subbed before, then i canceled the sub 1 month ago

#

Yesterday i got this

weary galleon
#

OpenAI ashamed yesterday too much.

sour spear
# weary galleon GPT-5.1-high outperforms GPT-5.2 in ALL tasks!

Yeah, it's a quick iteration to catch up with the competition, the third GPT-5 release in only 4 months. But nothing really groundbreakingly new. They need to refocus on ChatGPT, instead of spreading their resources thin on their Atlas browser or SoraTok.

compact flame
#

It was thinking for straight 20 minutes and got it wrong

queen veldt
#

Who even uses that Atlas lmao

weary galleon
weary galleon
queen veldt
compact flame
queen veldt
weary galleon
queen veldt
#

Is this correct?

weary galleon
compact flame
queen veldt
#

Gpt 3 pro is best

compact flame
queen veldt
#

Gemini********

compact flame
#

Yeah

weary galleon
plucky sparrow
#

if only they didn't nerf gemini 3

queen veldt
#

Hold up

plucky sparrow
#

i feel the context is nerfed, and the output is nerfed

queen veldt
#

Gpt 5.2 is writing code

compact flame
weary galleon
compact flame
#

These gpt "High" are definitely high on drugs

queen veldt
#

Filenotfounderror

compact flame
compact flame
queen veldt
#

It's stuck 😭

#

And we got a winner!

#

Bro thinks he's human

#

3 minutes to count dam tomatoes

weary galleon
#

Clear winner is Gemini 3 Pro.

plucky sparrow
#

ok i did a brief count and i counted more than 54

weary galleon
#

I counted them a month ago.

#

I use this prompt every single day with different LLMs.

plucky sparrow
#

AGI

#

I guess I shouldn't be surprised, tbh

#

is it any good at coding though?

compact flame
#

Waiting for gpt pro to reason how many tomatoes he sees

#

Knowing chatgpt it might take another 20 minutes

long jackal
#

Hey everyone

When I set a model to Code mode and ask for a Python script (e.g., something to run in Google Colab), it replies with HTML instead of plain Python code.

Is that normal in Code mode? Like, does it tend to default to HTML output?

compact flame
#

You gotta use just the basic one and I recommend using opus

plucky sparrow
compact flame
#

Damn he's still counting tomatoes

#

Yeah I give up on waiting it's counting for way too long

weary galleon
compact flame
#

So yeah pro is definitely not counting these tomatoes anytime soon

rapid merlin
weary galleon
#

Sorry, from smartphone.

#

Right answer is 69.

rancid oxide
#

uhmm guys

#

why is your nano banana pro not working

#

da fuq

low imp
#

5.2 is tuff

obtuse smelt
polar wharf
low imp
#

Nvm 5.2 is dogsh

obtuse smelt
#

hmm 5.2 ?

weary galleon
#

You have no right to write RUSSIAN here!!!!!!!😡

#

We are English-speaking community!

compact flame
#

This is English community

#

Thanks

weary galleon
#

Reported.

#

Speak English!

queen veldt
#

Don't make me call the mods

meager fulcrum
#

Hz

jovial solar
#

Hello

meager fulcrum
#

Hi

fleet lintel
# low imp Nvm 5.2 is dogsh

scam altman scammed us with 5.2 .... crazy benchmaxxed model on few benchmarks but doing worse in many other benchmarks and in real life use-cases.

coral goblet
#

nano banana pro keeps ignoring my prompt

#

classic

visual osprey
#

december-chatbot

coral goblet
#

first good then nerf

visual osprey
#

what is this december-chatbot?

fleet surge
#

huh

glacial mulch
coral goblet
#

yes it does

fickle venture
#

New model if anyone is interested, it seems this one eats a lot of tokens:
https://youtu.be/676EBGcv8YY?si=cgz7dz0OJdt_dZ5Q

In this video, I'll be telling you about g3, a revolutionary new AI coding tool based on adversarial cooperation that solves the context loss problem by making two AI agents fight each other to write better code. This is based on a groundbreaking research paper and represents a completely new paradigm for autonomous software development.

--
Key...

neat apex
#

The only relevant Gpt is Gpt 5.2 xtra high, since the base model can lose even to glm 4.6 xd

#

Will it appear in the arena? It doesn't bug out like the Pro versions do

modest prism
neat apex
#

Ahh yes, Gpt 5.2 xtra high can cost $10 per message

modest prism
neat apex
#

Dammn, so Gpt 5.2 xtra high is a fraud?

queen veldt
#

Just go claude bro

fleet lintel
polar patrol
#

Hi everyone, does anyone know how to connect an AI to Telegram bot so it answers based on a knowledge base? Also, are there any AIs that are free in terms of limits?

fleet lintel
#

i gave decent shot to gpt 5.2 . i have plus subscription and damn gemini 3 is blowing it out of the water

livid plume
#

guys watch this vid completely

tall patrol
#

guys why is img leaderboard always missing famous opensource models, for example z image turbo isnt there in the text to image arena

queen veldt
#

Because lmarena doesn't have z image turbo

#

Not available on battle

tall patrol
#

ik but why not add it ? its opensource and pretty cheap to host compared to some other img models

kindred fog
#

fun fact nano banana pro does peppino perfectly

queen veldt
ocean vortex
tall patrol
#

aah

kindred fog
#

lmarena is so fun because i can tell it pizza tower questions and which one it gets closest its better

hollow ivy
#

Python or Java?

proud bobcat
#

Yeah so it looks like the consensus is 5.2 is benchmaxxed

#

It’s scored far lower than gemini 3 and opus on almost all personal benchmarks

#

On some benchmarks 5.2 pro scores lower than 5.1 normal

#

Agi!!!!!

compact flame
compact flame
proud bobcat
#

How do they even manage to do this

kindred fog
compact flame
proud bobcat
meager harbor
#

even gemini 3 is a downgrade in many aspects, he knows better than gemini 2.5 but when he doesn't know he makes so much stuff up, a lot more than gemini 2.5

kindred fog
#

I'm peppino spaghetti trust me bro

proud bobcat
#

My favorite part of pizza tower is when peppino came and said “it’s peppino time” and peppinoed everywhere

hollow ivy
kindred fog
#

so i might consider playing the demo

proud bobcat
#

Deltarune has a much warmer atmosphere

#

And I like the humor better

hollow ivy
#

..i'm interested in, what programming language is best for vibe-coding with Claude-4.5-Opus-Thinking.

compact flame
#

There no best model

meager harbor
#

2026 will be the real test, i'm expecting things to stagnate. even in 2025 things were slowing down compared to 2024, just look at the top elo score in lm arena that just gained 80 points in 2025 while it was double that in 2024

kindred fog
proud bobcat
#

If gemini 3.5 is peak it’s over for chatgpt

#

Gemini just consistently makes better models every time

#

GPT is a gamble

kindred fog
#

nano banana pro was peak enough

compact flame
#

How did you guys become arena champions bruh

kindred fog
#

because i can finally generate pizza tower characters (except for some obscure ones

#

maybe in the future

proud bobcat
compact flame
#

Honestly nano banana pro is so good

proud bobcat
#

It is

kindred fog
compact flame
#

It even knows how to draw GTA 5 perfectly

proud bobcat
#

I don’t like image models but credit where credit is due

kindred fog
meager harbor
compact flame
kindred fog
#

it can generate sonic screenshots now

#

which is cool

#

also this too

proud bobcat
#

Tbh

#

Personally though I don’t use ai that much

#

I only use DeepSeek primarily now for math and roleplay and GPT for quick alt scenario summaries

#

Though I might switch to DeepSeek for that too now because 3.2 has very natural language

fickle venture
proud bobcat
#

Gemini I think

feral geyser
#

why on my phone lmarena not working? "no models found" problem

proud bobcat
#

Or

#

Reopen page

feral geyser
#

i reopen but still not working

snow sail
#

how is gemini ?

proud bobcat
kindred fog
#

what is peppino doing in the simpsons hespeppinoyouknow

proud bobcat
proud bobcat
kindred fog
#

Simpsons Predicted Pizza Tower! peppinolaughinghisassoff

latent crest
#

Why hasn’t China released an open source video AI model yet? 😭

proud bobcat
#

They have

#

Wan 2.2

pliant cliff
#

it's awesome game

latent crest
compact flame
#

Honestly this is crazy how much deep seek thinks

#

I swear it thinks more than does actual text

left lodge
#

Bro 💀
Yeah this is the worlds best model.

compact flame
left lodge
#

Not available on lmarena

compact flame
#

Yeah sadly honestly

left lodge
#

5.1 is better than 5.2 ,
5.2 is just benchmaxxing

latent crest
#

What is Higgsfield ?

proud bobcat
#

You need the extra high model

#

It’s too hard for gpt

compact flame
#

It's been 10 minutes and bruh deep seek is still thinking

compact flame
#

On yupp

proud bobcat
#

Yeah it overthinks

#

Use 3.2 thinking

latent crest
proud bobcat
#

What

compact flame
#

I swear I think I see speciale get paranoid in his thoughts

proud bobcat
#

No veo 3 competitor

#

But it’s good

proud bobcat
#

What prompt did you give it

compact flame
#

And I don't think it counted simple

#

As a word

zealous sparrow
compact flame
#

Bro the wall of his damn thoughts is crazy

#

Gemini on the left is simple and fast bruh

#

Jeez it even starts coding in his own thoughts

#

I'm never using speciale again bruh this is just crazy

rich panther
#

which ai in lmarena is the best for coding

compact flame
#

Finally it stopped thinking

#

After goddamn 15 minutes

zealous sparrow
#

gpt 5.2 [rarely]

left lodge
zealous sparrow
#

prob will return once API is patched up

left lodge
#

Maybe, it was available just for few hours on launch, no hope till now.

rich panther
zealous sparrow
#

gemini 3 flash is only in battle mode

#

as fiercefalcon or ghostfalcon

rich panther
zealous sparrow
#

gemini 3 flash will join the list of nice coding models

#

next to gem 3 pro

rich panther
zealous sparrow
#

but you know that we all just bench html

#

and not other

#

some models excel at python, but suck at html

rich panther
#

it's a pity that opus 4.5 has a limit

zealous sparrow
#

how so

hollow ivy
#

hm, somehow g3p's coding ability turned to crap

#

worse than 2.5-pro

zealous sparrow
#

you mean model degradation?

hollow ivy
#

yep :(

zealous sparrow
#

No one really knows how this happens

hollow ivy
#

i think AGI will never happen

zealous sparrow
#

A company wouldnt make a model sh, because yes.

#

I doubt.

#

I think the issue was that, a lot of use was pushed onto gemini 3 pro.

#

Wearing it off a lot.

hollow ivy
#

so, only Opus-4.5 remains

proud bobcat
#

Since when?

hollow ivy
#

the lonely coding-king

proud bobcat
#

Opus is peak

zealous sparrow
#

opus will also degrade over time

#

its inevitable

proud bobcat
#

Gemini 3 pro was crazy at launch day

#

What is bro on about

zealous sparrow
#

What makes you think that?

proud bobcat
#

Also they will defo quantize opus to at least quant 8

#

Fym nah?

zealous sparrow
#

Why are you so confident they wont

proud bobcat
#

Ts ragebait

zealous sparrow
#

Every model degraded over time.

#

Even sonnet 3.5!

#

Sonnet 3.5 used to be a goat, then started to degrade.

proud bobcat
#

Opus 4.5 is safe to quantize too because it will degrade the model very very little, while saving half the resources

#

Why wouldn’t it write comments in code to explain what each function does

#

Well yeah it’s an ai

#

It’s gonna overdo it

#

But you can just trim the comments

#

The code is still solid

hollow ivy
#

do you guys still think, we get a coding-AGI in the future?

zealous sparrow
proud bobcat
#

AGI isn’t real

zealous sparrow
#

Therefore, no

compact flame
#

I swear speciale spends more time thinking than creating an actual thing

proud bobcat
#

Speciale is not meant for day to day use

hollow ivy
proud bobcat
#

What is AGCI

compact flame
proud bobcat
#

Like

hollow ivy
#

AGCI = artificial general coding intelligence

proud bobcat
#

Haven’t we gotten that with Opus

#

Am I

#

Am I tweaking

compact flame
proud bobcat
#

Probably yeah

#

Just use 3.2 thinking

compact flame
hollow ivy
#

and what about GLM?

#

i hope Elon does something with g5

zealous sparrow
#

gem 3 will forever have the best OCR

#

OAI argued their 5.2 OCR is goated

#

and it missed on so much

proud bobcat
hollow ivy
proud bobcat
#

I love DeepSeek for math

#

It’s so goated

compact flame
#

I wonder if extra high gpt is better than opus

hollow ivy
#

are these the first symptoms that the AI bubble is about to burst?

proud bobcat
#

I’m still wondering why they need extra high

#

It sounds like they’re out of options

#

Gpt the only ai I’ve seen that makes 50 different variations of the same model

#

It’s embarrassing

compact flame
hollow ivy
#

*these:

  • gpt sux
  • gemini 3 sux now
  • grok sux
polar patrol
proud bobcat
compact flame
#

OpenAi just needs to focus on training their models

#

Not rushing

proud bobcat
#

After 4o they started scraping the internet for data

#

Which is why 5 was so ass

compact flame
#

Makes sense

proud bobcat
#

They're making the same mistake meta did with llama 4

hollow ivy
#

if only claude was not so expensive :/

zealous sparrow
#

gpt 5.2 sucks because of UI

proud bobcat
#

Grok is solid :(

#

I will not tolerate this slander

zealous sparrow
proud bobcat
#

I use it semi daily

#

I like 4.1 a lot

compact flame
#

Hm I wonder if there's an opus 64k thinking

latent crest
#

What’s gem 3

compact flame
#

Well maybe

proud bobcat
#

I used to hate Claude but I like their models now

compact flame
#

Or they didn't release it yet or whatever

proud bobcat
#

Maybe they’re working on it

compact flame
#

Probably

waxen fern
#

Do GPT-5.2 and Nano Banana Pro have rate limits in LMArena??

compact flame
#

You can test out yourself I guess

waxen fern
compact flame
#

And I've never reached nano banana limits

#

But don't take this as valid info since I didn't test it out properly

#

Especially due to how bad gpt 5.2 is

neat apex
#

Not sure about the image model, but it is at the very least 10 per minute

zealous sparrow
#

@deep adder If you dont believe the new gemini 3 flash models will be good give me a prompt to try with it

compact flame
#

I wonder how good is Gemini deep think

neat apex
#

People genuinely believing that Gemini 3 pro is significantly bigger than Gemini 2.5 makes me laugh

#

It does not make any sense, their only argument is that the advancement was too big

compact flame
meager harbor
zealous sparrow
#

so basically?

neat apex
#

That makes me sadly sad

#

I like gpt 5.1 high because on rare occasions it manages to find an answer thanks to its longer reasoning

zealous sparrow
neat apex
#

Deepseek 3.2 Speciale is comparable, but it's too slow

#

Haiku extended thinking is lazy ah

viscid echo
#

lmarena Why are there so many errors?

compact flame
#

And it ran out of context before answering my question

neat apex
#

I hope you are right and not these million people saying it is worse

zealous sparrow
#

what exactly do you want me to ask for the needle in a haystack test

neat apex
#

I am only expecting it to be the same level, but have a functional Xtra High mode

viscid echo
#

lmarena is giving me too many errors

zealous sparrow
#

ah

neat apex
#

Yeah, the needle test is not hard to do

#

Like, put in your university documents and ask it to find the content you want

#

You mean data resolutioning, it's not the best model to catch data at all

compact flame
neat apex
#

Even Gemini 2.5 flash 09 that is 300% more assertive is not that good

proud bobcat
#

Apparently it can’t do things 5.1 was able to do

#

It’s defo benchmaxxed

#

What the hell does FUD mean

#

Fear Uncertainty and Doubt???

#

What is bro talking about

#

These are community observations that show 5.2 completely blows at tasks 5 and 5.1 could do

#

5.2 fails nearly every independent benchmark

#

Bet give me a second

echo sinew
# viscid echo lmarena is giving me too many errors

Hello! Sorry to hear about these issues. Other users have also reported encountering more errors lately. The team in charge is looking into it to find a fix. You can also read the https://discord.com/channels/1340554757349179412/1343291835845578853 forums and see if your issue has already been reported. If you don't see a post related to your issue, you can make your own post and explain what's happening. If you can provide screenshots of the errors, that would be helpful.

neat apex
#

I am joking, it's because I was in the server before they downgraded the newgens

#

If you are active in the server they will eventually give it to you

proud bobcat
#

For one, 5.2 underperformed in creative writing benchmarks and fact evaluations

#

Even on one example performing worse than 5.1

#

I need to scour for the post but I saw it earlier today

neat apex
#

It's because it refuses way more in my little test

proud bobcat
#

People have been complaining 5.2 explains things worse than 5.1 on average

neat apex
#

Like 3.5 sonnet to 3.7, it gained more personality but also cowardice

zealous sparrow
#

needle haystack bench aint that hard, even haiku 4.5 can do it

neat apex
#

Not only 1 or 2

zealous sparrow
#

aight ima add 4 needles then

viscid echo
zealous sparrow
#

this one was easy with just one needle

proud bobcat
#

Christ give me a second

#

I’m not pulling this from my ass you know

neat apex
zealous sparrow
neat apex
#

Even a calculator can find a word between hashes

zealous sparrow
#

give me an example

neat apex
#

They mean an actual text, and actual information

#

Like, what John said about Lisa in the text?
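
A rough sketch of that kind of multi-needle test, assuming made-up filler text, made-up facts (the John/Lisa one borrows the example above), and a naive substring scorer. `build_haystack` and `score` are hypothetical helper names, not part of any real benchmark.

```python
import random

# Made-up filler sentences; a real test would use long, natural documents.
FILLER = [
    "The weather report mentioned light rain in the afternoon.",
    "Several students asked about the reading list.",
    "The cafeteria changed its opening hours again.",
    "A new printer was installed on the second floor.",
]

# Each needle is a real piece of information in the text plus a question about it.
NEEDLES = [
    {"fact": "John told Lisa that the meeting moved to Thursday.",
     "question": "What did John say to Lisa?", "answer": "Thursday"},
    {"fact": "The spare key is kept in the blue folder.",
     "question": "Where is the spare key kept?", "answer": "blue"},
    {"fact": "The lab locker combination is 42.",
     "question": "What is the lab locker combination?", "answer": "42"},
    {"fact": "The guest lecturer is flying in from Lisbon.",
     "question": "Where is the guest lecturer flying in from?", "answer": "Lisbon"},
]

def build_haystack(filler, needles, total_sentences=200, seed=0):
    """Scatter the needle facts at random positions inside filler text."""
    rng = random.Random(seed)
    sentences = [rng.choice(filler) for _ in range(total_sentences)]
    positions = sorted(rng.sample(range(total_sentences), len(needles)))
    for offset, (pos, needle) in enumerate(zip(positions, needles)):
        sentences.insert(pos + offset, needle["fact"])
    return " ".join(sentences)

def score(answers, needles):
    """Count answers that contain the expected string (case-insensitive)."""
    return sum(n["answer"].lower() in a.lower() for a, n in zip(answers, needles))

haystack = build_haystack(FILLER, NEEDLES)
questions = "\n".join(n["question"] for n in NEEDLES)
prompt = f"{haystack}\n\nAnswer each question using only the text above:\n{questions}"
```

Paste `prompt` into whichever model you are testing and pass its four answers to `score`. Substring matching is crude, so borderline answers still need a manual look.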

zealous sparrow
neat apex
#

Yeah, it's a good sign, but easily and uselessly benchmaxxed

zealous sparrow
#

thats easy as hell..

proud bobcat
#

I give up

#

I can’t find the two posts

zealous sparrow
#

why so

neat apex
zealous sparrow
proud bobcat
#

The point STILL stands though

neat apex
#

Can be

proud bobcat
#

5.2 is benchmaxxed to a degree

#

No one can deny that

neat apex
#

They say if you ask for 4 different details it is supposed to go flawlessly

zealous sparrow
#

im uncreative af

proud bobcat
#

You cannot expect anyone to believe 5.2 naturally just got an above 50% on arc agi 2

neat apex
#

Get your documents and ask where you missed answering something

#

It's an example

#

Yes because its Gemini 3 pro

zealous sparrow
proud bobcat
neat apex
#

Ask gpt lmao

zealous sparrow
proud bobcat
#

31% is a vast difference to 50%

neat apex
#

And Gemini 3 is Gemini 3

proud bobcat
#

What I’m saying is that 5.2 was clearly maxed for this

neat apex
#

Gpt 5.2 is a bare improvement, if its true

proud bobcat
#

Brother

neat apex
#

They benchmaxxed that 100%

proud bobcat
#

5.2 incorrectly labeled parts of a pc

#

And that was the OFFICIAL VISION DEMO

#

Think about it

#

Somehow

#

Just somehow

#

5.1 was barely an improvement

#

It was like

#

A finetune

#

Now 5.2 magically gets every single benchmark

#

If 5.2 is so good

neat apex
#

Buuut, it showed nothing like that in real life

proud bobcat
#

Why were 5 and 5.1 ASS?

proud bobcat
#

Exactly my point

neat apex
#

Unlike Gemini 3 and Opus 4.5

proud bobcat
#

Holy god this is ragebait

neat apex
#

Yeah, it is ragebait

sour spear
# proud bobcat A finetune

Not just like a finetune. It was a finetune. They tried to get rid of the model's terrible corpo HR tone, which frankly, they still haven't quite managed to do yet.

neat apex
#

It puts way more effort into answering

proud bobcat
#

You cannot expect me to believe 5.2 magically just became great within months

neat apex
#

Gpt 5 was the second laziest model ever, just behind prime gpt 4o mini

proud bobcat
#

What was openai doing before?

#

Tickling their ass cheeks?

neat apex
proud bobcat
#

Now they just rushed this extremely maxxed model so they can say: “we’re in the race!”

neat apex
#

From o3, gpt 5.2 just improved 20% at most

#

Insane

echo sinew
queen veldt
#

Gpt 5.2 is a lie

#

They probably bought that place on the graphs

proud bobcat
proud bobcat
#

GPT die hard fan

queen veldt
#

We tried to make it count the tomatoes in this image

#

It said 43 or something

#

Even got errors

neat apex
#

It miscounted, 10 fewer than gpt 5.1

queen veldt
#

Meanwhile gemini correctly counted 69 tomatoes

proud bobcat
neat apex
#

It said 53 and gpt 5.1 said 63

queen veldt
zealous sparrow
#

so yeah

queen veldt
zealous sparrow
#

losing battle for OAi there

neat apex
#

Gemini OCR AND intelligence are high

queen veldt
#

It wasted 3 minutes + got errors and in the end it was incorrect

#

It guessed the number

neat apex
#

Check o3, how many he counts

proud bobcat
#

Is o3 vision

queen veldt
#

Already did

#

He said 49

#

He tried to do the grid counting

#

o3 even went on shutterstock to search for images of tomatoes 💀

neat apex
#

💀

#

Grok 4.1 has mediocre OCR but at least it is cheap and fast

proud bobcat
queen veldt
#

Tbh i didn't count the tomatoes

#

But guy who sent me this tomato image says it has 69

burnt pulsar
#

I'm having trouble getting gpt-5.2-high to work at all, are there known issues at the moment with that model?

queen veldt
#

Some guy even ran gpt 5.2 pro on it and got incorrect

neat apex
#

Qwen 3 Max after thinking for half an hour said 71

#

Way closer than gpt xd

queen veldt
#

SOTA btw

neat apex
#

Its gpt 5.1

#

Answering "Sup" to "Sup" is VERY weird

queen veldt
#

No it was 5.2

#

It should've been helpful, like "Hello, what do you need help with today?" or something

neat apex
#

Well, Hello to Hello is acceptable

#

But Sup to Sup?

zealous sparrow
#

5.1 does Sup to sup better
epic guy is my uh OAI account name

proud bobcat
#

It’s casual

#

Ig

zealous sparrow
#

2/3 on my 3 questions

#

it got the Mary and Kevin question wrong

#

hard question tho so like i understand it

#

but assuming they are animals is meh, maybe i should specify

compact flame
# proud bobcat

Honestly I think it mistakes tomatoes for pumpkins or other stuff

#

Because some of those I really could mistake for a watermelon, pepper or just pumpkin

zealous sparrow
neat apex
zealous sparrow
neat apex
#

Its the AI who should discover it

zealous sparrow
#

opus 4.1 got 3/3

#

smart ahh

proud bobcat
#

Opus so peak

hushed gyro
#

chat why is NB Pro so unstable in LMArena

proud bobcat
zealous sparrow
#

kevin taping jake isnt really the answer but

#

ill take it

hushed gyro
compact flame
thorn path
#

What time does the leaderboard usually update for lmarena? I'm very curious to see if Gemini still holds its title here

zealous sparrow
#

this model failed, which i dont know the identity of

compact flame
zealous sparrow
#

when new models come out

proud bobcat
#

you get what you get here ig

compact flame
hushed gyro
zealous sparrow
compact flame
#

Or lemme get a pic rq

hushed gyro
zealous sparrow
compact flame
#

If you want to generate images

hushed gyro
zealous sparrow
pseudo hemlock
#

Do the lmarena people pay for our chats?

zealous sparrow
#

battle anonymous models no

pseudo hemlock
#

Wait that’s insane

pseudo hemlock
#

Do they have a fancy api key from companies or something so they don’t pay?

compact flame
zealous sparrow
#

ok brb goin to come up with 5 different questions for LLM Testing

#

ranking so far is
3/3 Opus 4.5/4.1, fiercefalcon
2/3 ghostfalcon

#

0/3 multiple other models i forgot

compact flame
#

Aw dang it nano banana doesn't know who Ahab is

#

Anyways

#

Or maybe it does idk

zealous sparrow
#

wish the LLMs good luck with this one
answer is that the ticket is forged

#

no LLM will come up with the idea

neon idol
#

@echo aurora yo bud

compact flame
neon idol
#

nbp off

zealous sparrow
compact flame
echo aurora
zealous sparrow
#

basically

#

any

compact flame
zealous sparrow
#

Here we go i ran the 3 questions

#

imo no one will ace

neon idol
zealous sparrow
#

bro wdym Miles is not a person, Miles is an english name!

echo aurora