#general | Arena | Page 218

torn mantle Dec 11, 2025, 11:16 PM

#

maybe

jade egret Dec 11, 2025, 11:17 PM

#

ngl arc agi 2 50% is kinda crazy

compact sleet Dec 11, 2025, 11:24 PM

#

It's great on creative writing so far, especially in ghostwriting, that is one I guess? It's a major improvement over the old models and also... hmm It kinda topped Gemini 3 pro and Claude atm (imo). Only for creative writing, since I don't code, nor understand code in general.

#

There have been some minigames made with GPT 5.2 if you scroll up, and it seemed to kinda fail in some parts. The one with the bullet hell game.

#

if you're looking for coding bench that is

#

It's a good model that's all.

keen beacon Dec 11, 2025, 11:34 PM

#

Guys why does using adolf on Sora unlock most every ip? O.o

#

But without it it gets blocked

#

Pokémon dragon ball z

#

U name it

#

Let’s see if it can do sonic

quartz light Dec 11, 2025, 11:38 PM

#

what the ####

#

xhigh is 5x smarter than opus 4 yet cheaper

queen veldt Dec 11, 2025, 11:40 PM

#

quartz light what the ####

Lie

quartz light Dec 11, 2025, 11:40 PM

#

queen veldt Lie

no its not what lol

#

https://arcprize.org/leaderboard

ARC Prize

ARC Prize - Leaderboard

The ARC-AGI Leaderboard.

golden ocean Dec 11, 2025, 11:40 PM

#

benchmaxxed

queen veldt Dec 11, 2025, 11:41 PM

#

Benches mean nothing until you actually try the model

keen beacon Dec 11, 2025, 11:41 PM

#

quartz light Dec 11, 2025, 11:41 PM

#

golden ocean benchmaxxed

isnt it hard to benchmax arc agi

#

golden ocean Dec 11, 2025, 11:42 PM

#

Idk good question

queen veldt Dec 11, 2025, 11:42 PM

#

Gpt 5.2 can't beat opus that easy

#

I tried a few prompts and got a similar (wrong) answer as the previous model

keen beacon Dec 11, 2025, 11:44 PM

#

#

Is this true.?

#

https://aiwiki.ai/wiki/ARC-AGI_2

ARC-AGI 2

ARC-AGI 2 (Abstraction and Reasoning Corpus for Artificial General Intelligence 2) is an artificial intelligence benchmark designed to measure genuine reasoning and problem-solving capabilities in AI systems. Released on March 26, 2025, by the ARC Prize Foundation, it serves as a critical test for progress toward artificial general intelligence ...

#

lol yup sonic works 😅😅

#

See I figured it out

#

But only works with adolf 😭

queen veldt Dec 11, 2025, 11:58 PM

#

🙁

keen beacon Dec 11, 2025, 11:58 PM

#

Sucks so bad

#

I don’t get why the prompts don’t work by themselves

#

Insta block

cloud zinc Dec 12, 2025, 12:02 AM

#

keen beacon I don’t get why the prompts don’t work by themselves

what prompt

keen beacon Dec 12, 2025, 12:03 AM

#

Any of them try to do it without him see if it’s even possible. I’ve seen it on Reddit, but I just don’t know what the heck they did.

cloud zinc Dec 12, 2025, 12:03 AM

#

on api?

keen beacon Dec 12, 2025, 12:04 AM

#

App

#

For sonic sequence, I used “The scene opens with him running into a blue fast super fast a hedgehog that’s faster then a sonic plane”

#

Maybe you need a cameo character to make it work

cloud zinc Dec 12, 2025, 12:06 AM

#

why 4 seconds

keen beacon Dec 12, 2025, 12:06 AM

#

I cut it down

#

Here is the original

#

The technique I’m using right now is just trying to make a bunch of sequences

#

See where exactly the filter catches and misses the ip

#

I don’t know, though I’ve been more confused lately than I’ve been able to get answers

cloud zinc Dec 12, 2025, 12:11 AM

#

too much trial and error, in the end, u realize its random

keen beacon Dec 12, 2025, 12:12 AM

#

It’s not.

#

It’s in the prompting

cloud zinc Dec 12, 2025, 12:15 AM

#

the guy on reddit used character cameo for sonic

#

not like any other ai company doesnt do that

#

europe is gonna be behind in the ai race

keen beacon Dec 12, 2025, 12:26 AM

#

Its cause Europe has strong data laws

#

They actually care about their citizens' privacy and data to some extent

#

Internet is full of scrappers, all ai companies guilty of this

jade egret Dec 12, 2025, 12:36 AM

#

sullen quest Dec 12, 2025, 12:55 AM

#

sullen quest Dec 12, 2025, 12:56 AM

#

cloud zinc europe is gonna be behind in the ai race

gonna?

cloud zinc Dec 12, 2025, 12:56 AM

#

sullen quest

its not wrong

sullen quest Dec 12, 2025, 12:58 AM

#

cloud zinc its not wrong

Happy now?

cloud zinc Dec 12, 2025, 1:01 AM

#

verbal shadow Dec 12, 2025, 1:02 AM

#

jade egret Dec 12, 2025, 1:03 AM

#

verbal shadow

lmao

sullen quest Dec 12, 2025, 1:04 AM

#

verbal shadow

it still does this?

#

wow

verbal shadow Dec 12, 2025, 1:04 AM

#

sullen quest it still does this?

yes

sullen quest Dec 12, 2025, 1:05 AM

#

does it do that on grok.com

#

or is it only the non instruct one that does

verbal shadow Dec 12, 2025, 1:05 AM

#

let me check

sullen quest Dec 12, 2025, 1:05 AM

#

cloud zinc

wonder what made the diff

verbal shadow Dec 12, 2025, 1:06 AM

#

#

wow

sullen quest Dec 12, 2025, 1:09 AM

#

verbal shadow

try grok fast

verbal shadow Dec 12, 2025, 1:09 AM

#

sullen quest try grok fast

i wasnt signed up

sullen quest Dec 12, 2025, 1:09 AM

#

verbal shadow wow

wat

verbal shadow Dec 12, 2025, 1:09 AM

#

sullen quest wat

read it

#

echo aurora Dec 12, 2025, 1:12 AM

#

Lets try to keep conversation a bit less NSFW please.

keen beacon Dec 12, 2025, 1:12 AM

#

https://www.designarena.ai/

Design Arena

Design Arena is the largest global crowdsourced benchmark for design. Challenge, Vote, Crown your Winner.

#

Test ai design performance

#

stray aspen Dec 12, 2025, 1:21 AM

#

Has anyone used gpt 5.2 for developing websites

proud bobcat Dec 12, 2025, 1:22 AM

#

#

5.2 is stupid

#

This model sucks

#

Sacrificed actual knowledge for bloat

sullen quest Dec 12, 2025, 1:25 AM

#

keen beacon

not surprised, though I think there's a decent shot openAI is overpredicted here

keen beacon Dec 12, 2025, 1:25 AM

#

Insiders

stray aspen Dec 12, 2025, 1:27 AM

#

5.2 sucks so bad lmao

#

This is like grok levels of embarrassment

sullen quest Dec 12, 2025, 1:30 AM

#

keen beacon Insiders

not in every company

keen beacon Dec 12, 2025, 1:30 AM

#

Brah

sullen quest Dec 12, 2025, 1:30 AM

#

if it's insiders, it's either openAI ones who really believe they got something cooking, google ones who think gem 3 is cooked, or I guess technically both

#

openAI doesn't have insider info on google models and vice versa

#

also anthropic

keen beacon Dec 12, 2025, 1:32 AM

#

You ever heard of app tracking lol

sullen quest Dec 12, 2025, 1:33 AM

#

keen beacon You ever heard of app tracking lol

too vague

keen beacon Dec 12, 2025, 1:35 AM

#

https://youtu.be/LKnlE609ZLw?si=xaUBnEOfc87GIuGt

YouTube

PC Security Channel

How tech companies spy on you and which is the worst?

We all know Microsoft, Google, Nvidia and other tech companies collect data, use telemetry for targeted ads, but who are the worst offenders and what kind of connections are they making on your PC?

#

These companies track each other through apps ppl got installed

#

https://youtu.be/GP8ezsjhPdw?si=68G1HWv1PVlNrq6n

YouTube

Jake Tran

Corporate Espionage: Make BIG Money Spying on Companies


#

It’s a real thing

whole sundial Dec 12, 2025, 1:40 AM

#

proud bobcat Sacrificed actual knowledge for bloat

the model is still based on a 2024 model, they care more about boosting performance than upgrading world knowledge, don't expect it to know anything past mid-2024

#

their next base model will have more recent world knowledge, maybe enough to know what kimi k2 is at least

keen beacon Dec 12, 2025, 1:41 AM

#

Yeah

#

They just fine tuned it

visual osprey Dec 12, 2025, 1:41 AM

#

just use reasoning model like you should use for everything

whole sundial Dec 12, 2025, 1:41 AM

#

it might know 0905 though, afaik their bots have been highly active in the past few weeks

whole sundial Dec 12, 2025, 1:41 AM

#

visual osprey just use reasoning model like you should use for everything

reasoning isn't going to help the fact that the base model doesn't know what kimi k2 is

visual osprey Dec 12, 2025, 1:41 AM

#

but oh wait your sole usage of llms is to roleplay a virtual boyfriend so

keen beacon Dec 12, 2025, 1:42 AM

#

Ya it is

#

Better than being reliant on it and asking dumb, basic commonsense questions that people should know

whole sundial Dec 12, 2025, 1:42 AM

#

web search would help as it can just search it

visual osprey Dec 12, 2025, 1:42 AM

#

why would web search ever be disabled

keen beacon Dec 12, 2025, 1:42 AM

#

$

visual osprey Dec 12, 2025, 1:43 AM

#

free users 🥱

keen beacon Dec 12, 2025, 1:43 AM

#

lol

visual osprey Dec 12, 2025, 1:43 AM

#

20 bucks a month is your netflix subscription but you cant invest that in your boyfriend?

keen beacon Dec 12, 2025, 1:43 AM

#

Yeah some people are smarty enough not to waste money on a proto type

whole sundial Dec 12, 2025, 1:43 AM

#

idk, gemini has it on all the time but having a model search the web does cost money for a company like openai that doesn't have its own index, although i guess their hoard of scraped pages will be fine

visual osprey Dec 12, 2025, 1:43 AM

#

keen beacon Dec 12, 2025, 1:44 AM

#

Put it into ur own words

#

If u still can

visual osprey Dec 12, 2025, 1:44 AM

#

keen beacon Yeah some people are smarty enough not to waste money on a proto type

funny how youve dropped 2 messages about how im stupid and youre smart and you dont even know prototype is a single word

whole sundial Dec 12, 2025, 1:44 AM

#

honestly, original k2 is probably better, 0905 did put more effort into coding and stuff so rp performance is slightly degraded

visual osprey Dec 12, 2025, 1:45 AM

#

whole sundial idk, gemini has it on all the time but having a model search the web does cost m...

have never seen it not web search when i see in its reasoning that it doesnt know what im talking about

#

so must be a free user thing

keen beacon Dec 12, 2025, 1:45 AM

#

https://tenor.com/view/intomysoul-darkheart-youre-cute-gif-5431645

Tenor

whole sundial Dec 12, 2025, 1:45 AM

#

i found that original k2 does do better at outputting song lyrics from its knowledge, so rp may be degraded

whole sundial Dec 12, 2025, 1:46 AM

#

visual osprey so must be a free user thing

i have paid gemini and it does this, so that's irrelevant

visual osprey Dec 12, 2025, 1:46 AM

#

keen beacon https://tenor.com/view/intomysoul-darkheart-youre-cute-gif-5431645

american tiktok brains are painfully unfunny and lack self awareness

keen beacon Dec 12, 2025, 1:46 AM

#

Bless your arrogance and your enlightening ignorance

visual osprey Dec 12, 2025, 1:47 AM

#

i scroll down 5 messages in your history and you sent another ignorant message

#

saying the coding model market is insiders

#

you dont even read what the market is based off

#

you have no clue

#

then another 3 messages down you spell scrapers as scrappers

#

proto type and scrappers

#

you cant even write english bro

keen beacon Dec 12, 2025, 1:48 AM

#

quick jackal Dec 12, 2025, 1:48 AM

#

@here Hey guys, please let's try to be respectful in the chat. Thank you.

visual osprey Dec 12, 2025, 1:49 AM

#

keen beacon

what is this

solar hollow Dec 12, 2025, 1:49 AM

#

gpt 5.2 is pretty much equal to gemini 3 right?

visual osprey Dec 12, 2025, 1:49 AM

#

release date?

keen beacon Dec 12, 2025, 1:49 AM

#

visual osprey Dec 12, 2025, 1:49 AM

#

openai does thursday releases

#

this is known

keen beacon Dec 12, 2025, 1:49 AM

#

visual osprey Dec 12, 2025, 1:49 AM

#

and you were talking about the coding model market

#

this one sure ill give you that its probably insiders

#

a portion

#

but the coding market isnt

keen beacon Dec 12, 2025, 1:50 AM

#

This is all I was talking about

visual osprey Dec 12, 2025, 1:50 AM

#

keen beacon Dec 12, 2025, 1:50 AM

#

Yes same concept

visual osprey Dec 12, 2025, 1:50 AM

#

mustve been the other guy with your pfp and name

#

sending these 2 messages

keen beacon Dec 12, 2025, 1:51 AM

#

visual osprey Dec 12, 2025, 1:51 AM

#

no response?

#

and?

#

read the rules

keen beacon Dec 12, 2025, 1:53 AM

#

#

You don’t find that odd?

compact sleet Dec 12, 2025, 1:53 AM

#

Actually rather than theorycrafting or things like misinformation, you guys knew that you can compare both models in LMarena without searching them in battle right? Side by Side comparison is there.

#

Just test it yourself fam, GPT 5.2 and Gemini 3 pro

keen beacon Dec 12, 2025, 1:54 AM

#

This is something completely different

visual osprey Dec 12, 2025, 1:54 AM

#

keen beacon You don’t find that odd?

read

compact sleet Dec 12, 2025, 1:54 AM

#

how come it's different?

visual osprey Dec 12, 2025, 1:54 AM

#

#

the moment the leaderboard updates the market will have these massive jumps

#

so its not that odd at all actually

keen beacon Dec 12, 2025, 1:55 AM

#

The model wasn’t out until 11th ( officially)

#

The market jumped on the 5th

visual osprey Dec 12, 2025, 1:56 AM

#

the leaderboard updated

#

5.2 isnt even on the leaderboard yet

visual osprey Dec 12, 2025, 1:56 AM

#

visual osprey the leaderboard updated

for 5.1 codex max

keen beacon Dec 12, 2025, 1:57 AM

#

Not asking about the model but the company

visual osprey Dec 12, 2025, 1:57 AM

#

what does that even mean

compact sleet Dec 12, 2025, 1:57 AM

#

^yeah

visual osprey Dec 12, 2025, 1:57 AM

#

look at that message yourself and reflect on how you are typing this

sullen quest Dec 12, 2025, 1:57 AM

#

keen beacon The model wasn’t out until 11th ( officially)

btw, just because you have insider info, doesn't mean you are always correct, unless the big 3 AI companies came together and agreed one should just have the best coding model, there wouldn't be any insiders that would have enough info to predict that

#

Insiders can still use their info to help them make bets

#

but its not as guaranteed as other bets would be

visual osprey Dec 12, 2025, 1:58 AM

#

his whole insider point can be disregarded since he has no clue how that particular market (or to be honest i doubt he has any clue on anything) resolves

compact sleet Dec 12, 2025, 1:58 AM

#

I normally hate generalizing people, but I think you're just malding after a lost bet and money on polymarket, which... is not at all relevant to this whole discord server

visual osprey Dec 12, 2025, 2:00 AM

#

that guys a clown

sullen quest Dec 12, 2025, 2:00 AM

#

sullen quest Insiders can still use their info to help them make bets

for example, if I knew my company "rex AI" was about to release a banger coding model, I'd be interested in polymarket bets that have rex AI as a noncontender for the best coding model, but I can't guarantee the new "Rex 2.2 ulite ultra" will actually win

keen beacon Dec 12, 2025, 2:00 AM

#

visual osprey Dec 12, 2025, 2:00 AM

#

wrong about everything cant write basic english and is calling me a idiot and himself smart

#

"heres a random video of a equally clueless person to me to justify my view"

sullen quest Dec 12, 2025, 2:01 AM

#

keen beacon

hello, openAI JUST released a coding-first model

#

the "something" is pretty obvious

keen beacon Dec 12, 2025, 2:01 AM

#

Exactly

visual osprey Dec 12, 2025, 2:01 AM

#

i said this to him and he says uhhhhh "Not asking about the model but the company"

sullen quest Dec 12, 2025, 2:01 AM

#

visual osprey i said this to him and he says uhhhhh "Not asking about the model but the compan...

lol

keen beacon Dec 12, 2025, 2:02 AM

#

It is obvious

visual osprey Dec 12, 2025, 2:02 AM

#

https://tenor.com/view/wtf-reaction-laughing-hurr-durr-gif-7358154

Tenor

sullen quest Dec 12, 2025, 2:02 AM

#

yeah

visual osprey Dec 12, 2025, 2:02 AM

#

you^

sullen quest Dec 12, 2025, 2:02 AM

#

so obvious, you don't need to be an insider to know that

sonic wigeon Dec 12, 2025, 2:02 AM

#

guys how's gpt 5.2?

sullen quest Dec 12, 2025, 2:02 AM

#

sonic wigeon guys how's gpt 5.2?

depends

sonic wigeon Dec 12, 2025, 2:02 AM

#

sullen quest depends

hmm

sullen quest Dec 12, 2025, 2:02 AM

#

what do you want to do with it?

sonic wigeon Dec 12, 2025, 2:02 AM

#

sullen quest what do you want to do with it?

math, solving, studying

sullen quest Dec 12, 2025, 2:03 AM

#

sonic wigeon math, solving, studying

prob fine

#

supposedly pretty good at math

compact sleet Dec 12, 2025, 2:03 AM

#

^

sullen quest Dec 12, 2025, 2:03 AM

#

Just don't ask it how many r's are in garlic

sonic wigeon Dec 12, 2025, 2:03 AM

#

sullen quest Just don't ask it how many r's are in garlic

lmao

visual osprey Dec 12, 2025, 2:03 AM

#

all models are pretty much 99% there for university level maths

#

not much diff

#

graphical problems are the main weakness

sonic wigeon Dec 12, 2025, 2:04 AM

#

visual osprey all models are pretty much 99% there for university level maths

hmm true. basically every model can solve any JEE level question now

sullen quest Dec 12, 2025, 2:04 AM

#

OpenAI supposedly has been focusing on decreasing performance degradation over long context windows

visual osprey Dec 12, 2025, 2:04 AM

#

sullen quest OpenAI supposedly has been focusing on decreasing performance degradation over l...

it looked impressive on the graph

#

on their promo page

#

near 100% perfect context

sullen quest Dec 12, 2025, 2:05 AM

#

kinda

visual osprey Dec 12, 2025, 2:05 AM

#

actually huge improvement

compact sleet Dec 12, 2025, 2:05 AM

#

Iunno about that, I see the uhh what you call it again, the model card?

visual osprey Dec 12, 2025, 2:05 AM

#

sullen quest Dec 12, 2025, 2:05 AM

#

but that was with their own internal tests, and it was with 4 needles

compact sleet Dec 12, 2025, 2:05 AM

#

I think it's 78% recall at 256k context

#

or even lower

#

They even said it themselves

#

on their web

#

It's a huge improvement yes

#

if true

visual osprey Dec 12, 2025, 2:05 AM

#

i think its probably best in market though

sullen quest Dec 12, 2025, 2:05 AM

#

compact sleet I think it's 78% recall at 256k context

thats probably 8 needles
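
For anyone lost on the "needles" bit: a multi-needle recall test hides several facts inside a long filler text and checks how many the model hands back. Below is a minimal sketch of that idea only. It is not OpenAI's MRCR setup, and `ask_model`, the filler sentence and the secret codes are all made up for illustration.

```python
import random

def build_haystack(needles, filler_sentence, total_sentences, seed=0):
    """Scatter needle sentences at random positions inside filler text."""
    rng = random.Random(seed)
    sentences = [filler_sentence] * total_sentences
    # sample() returns distinct positions, so no needle overwrites another.
    positions = rng.sample(range(total_sentences), len(needles))
    for pos, needle in zip(positions, needles):
        sentences[pos] = needle
    return " ".join(sentences)

def recall(needle_answers, model_output):
    """Fraction of needle answers that appear in the model's output."""
    found = sum(1 for answer in needle_answers if answer in model_output)
    return found / len(needle_answers)

# Hypothetical needles: each one hides a "secret code" to retrieve.
codes = ["A17", "B42", "C03", "D99"]
needles = [f"The secret code number {i} is {c}." for i, c in enumerate(codes)]
haystack = build_haystack(needles, "The sky was grey that day.", 1000)

def ask_model(prompt):
    # Stand-in for a real model call (assumption): pretends 3 of 4 codes came back.
    return "A17, B42, C03"

score = recall(codes, ask_model(haystack + " List every secret code."))
print(score)
```

More needles or a longer haystack makes the same test harder, which is probably why a 4-needle and an 8-needle number can't be compared directly.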

sonic wigeon Dec 12, 2025, 2:06 AM

#

5.1 probably had the shortest life out of any LLM released
around 15 days lol

sullen quest Dec 12, 2025, 2:06 AM

#

visual osprey i think its probably best in market though

idk, nobody really has access to MRCRv2 as far as I can tell

#

so its impossible to test

visual osprey Dec 12, 2025, 2:06 AM

#

whats open source benchmark for context

compact sleet Dec 12, 2025, 2:06 AM

#

Sadly i'm not going to self bench that context retention, lol

visual osprey Dec 12, 2025, 2:06 AM

#

that creative writing one?

#

so we'll see

#

on that

compact sleet Dec 12, 2025, 2:07 AM

#

visual osprey that creative writing one?

EQ Bench?

sullen quest Dec 12, 2025, 2:07 AM

#

sonic wigeon 5.1 probably had the shortest life out of any LLM released around 15 days lol

that's what gem 3 does to a Sam Altman in the wild

compact sleet Dec 12, 2025, 2:07 AM

#

Longform Writing v3? you mean

visual osprey Dec 12, 2025, 2:07 AM

#

or is it called like fictionbench

#

yeah

solar hollow Dec 12, 2025, 2:07 AM

#

so the posted benchmarks from openai are again painting an inaccurate picture, when we look at livebench and lmarena

visual osprey Dec 12, 2025, 2:07 AM

#

longform writing

sullen quest Dec 12, 2025, 2:07 AM

#

visual osprey whats open source benchmark for context

?

#

eqbench?

sullen quest Dec 12, 2025, 2:07 AM

#

solar hollow so the posted benchmarks from openai are again painting an inaccurate picture, w...

idk, 5.2 isn't on text arena yet

visual osprey Dec 12, 2025, 2:07 AM

#

https://eqbench.com/creative_writing_longform.html

sullen quest Dec 12, 2025, 2:08 AM

#

and long context window doesn't really matter much in lmarena

solar hollow Dec 12, 2025, 2:08 AM

#

sullen quest idk, 5.2 isn't on text arena yet

yeah i mean the webarena

sullen quest Dec 12, 2025, 2:08 AM

#

second ain't bad

sonic wigeon Dec 12, 2025, 2:08 AM

#

sullen quest and long context window doesn't really matter much in lmarena

yeah the UI starts to lag really badly after some time
though that happens with every UI

solar hollow Dec 12, 2025, 2:08 AM

#

the posted benchmarks make it seem like clear nr1 sota

compact sleet Dec 12, 2025, 2:08 AM

#

sullen quest and long context window doesn't really matter much in lmarena

wait, I thought coders need it? Since they probably one shot apps and things like that, that can't possibly be low in tokens right?

sullen quest Dec 12, 2025, 2:09 AM

#

sonic wigeon yeah the UI starts to lag really badly after some time though that happens with ...

not really a 5.2 problem then?

visual osprey Dec 12, 2025, 2:09 AM

#

compact sleet wait, I thought coders need it? Since they probably one shot apps and things lik...

not relevant in chat

#

in ide its big deal

sullen quest Dec 12, 2025, 2:09 AM

#

compact sleet wait, I thought coders need it? Since they probably one shot apps and things lik...

text arena I mostly mean

compact sleet Dec 12, 2025, 2:09 AM

#

ah

#

ok

visual osprey Dec 12, 2025, 2:09 AM

#

but lmarena code tests dont go crazy context lengths

sullen quest Dec 12, 2025, 2:09 AM

#

even then, I'd bet most webdev projects on lmarena are only a couple turns max

visual osprey Dec 12, 2025, 2:09 AM

#

yeah

#

in real programming though youd hit context fastish on the ide

sullen quest Dec 12, 2025, 2:10 AM

#

I had a massive project that spanned multiple webdev chats and multiple days, I'd have loved better context

compact sleet Dec 12, 2025, 2:10 AM

#

I had a feeling Gemini is still the king of long context analysis

sullen quest Dec 12, 2025, 2:10 AM

#

compact sleet I had a feeling Gemini is still the king of long context analysis

mostly cause it has a larger context window

#

not because its performance doesnt degrade fast

solar hollow Dec 12, 2025, 2:11 AM

#

compact sleet I had a feeling Gemini is still the king of long context analysis

as always the 3 labs are very comparable to each other, none is much better than the other

compact sleet Dec 12, 2025, 2:11 AM

#

Yeah, it's probably not a landslide difference either

solar hollow Dec 12, 2025, 2:11 AM

#

this has been the case for more than half a year now

keen beacon Dec 12, 2025, 2:11 AM

#

Bottle necked

sullen quest Dec 12, 2025, 2:12 AM

#

idk, the models kinda differ more and more

visual osprey Dec 12, 2025, 2:12 AM

#

gemini context window is kinda fake from my experience though

sullen quest Dec 12, 2025, 2:12 AM

#

visual osprey gemini context window is kinda fake from my experience though

wat

visual osprey Dec 12, 2025, 2:12 AM

#

i had it watch 900k token video and it just couldnt accurately tell me anything

#

completely hallucinating

sullen quest Dec 12, 2025, 2:12 AM

#

yea

#

gl with 900k

visual osprey Dec 12, 2025, 2:12 AM

#

it claims 1m

sullen quest Dec 12, 2025, 2:12 AM

#

visual osprey it claims 1m

yeah and it has

visual osprey Dec 12, 2025, 2:12 AM

#

but at like 30% accuracy

#

so not useful

sullen quest Dec 12, 2025, 2:13 AM

#

but performance degrades with filled context windows

#

so yea

native yarrow Dec 12, 2025, 2:23 AM

#

how is gpt 5.2 in creative writing

atomic lagoon Dec 12, 2025, 2:28 AM

#

native yarrow how is gpt 5.2 in creative writing

Probably ahh

native yarrow Dec 12, 2025, 2:29 AM

#

i'd concur

sullen quest Dec 12, 2025, 2:30 AM

#

prob bad

#

actually according to eqbench not bad

native yarrow Dec 12, 2025, 2:37 AM

#

meh

#

no good model for writing as

#

AI models weren't trained mainly off that

#

they should make a model specifically for that though, doesn't matter which company

visual osprey Dec 12, 2025, 2:46 AM

#

no point though

#

its already good enough for copywriting and its extremely difficult to train for good creative writing

#

only the roleplayer market has anything to benefit from a model made for creative writing

#

and roleplayers are cheapskates

keen beacon Dec 12, 2025, 2:50 AM

#

lol

#

Might be some truth to that

#

#

Also could be that they are afraid of getting copyright lawsuits from writers, which has happened in some cases already, which is why it’s not in the stats.

#

There is high demand for creative writing, and it is something that people ask about.

solar hollow Dec 12, 2025, 3:08 AM

#

is 5.2 available for free users right now?

obsidian cargo Dec 12, 2025, 3:40 AM

#

I want a leaderboard that ranks how often Assistant A is chosen over Assistant B
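
A rough sketch of how that slot-bias tally could work, assuming you had a log of which side won each battle. The vote list here is invented, and this is not how LMArena computes anything.

```python
from collections import Counter

def position_win_rates(votes):
    """Share of decisive votes won by each slot, ignoring ties."""
    counts = Counter(v for v in votes if v in ("A", "B"))
    decisive = counts["A"] + counts["B"]
    if decisive == 0:
        return {"A": 0.0, "B": 0.0}
    return {slot: counts[slot] / decisive for slot in ("A", "B")}

# Made-up vote log: "A" / "B" is the winning slot, "tie" is skipped.
votes = ["A", "B", "A", "tie", "A", "B", "A", "A"]
rates = position_win_rates(votes)
print(rates)
```

If models are assigned to slots at random, both rates should sit near 0.5, so a big gap would hint that voters favor one side of the screen.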

green yacht Dec 12, 2025, 3:45 AM

#

solar hollow is 5.2 available for free users right now?

try using lmarena

solar hollow Dec 12, 2025, 3:45 AM

#

green yacht try using lmarena

always timeout for me

astral blaze Dec 12, 2025, 3:45 AM

#

@echo aurora can I ask is the Gemini 3 on the site just the default parameters

#

Because it seems to generate stuff a bit different from the API

vivid coral Dec 12, 2025, 4:09 AM

#

solar hollow always timeout for me

heavy usage with new models is part of that I'm sure.

solar hollow Dec 12, 2025, 4:17 AM

#

vivid coral heavy usage with new models is part of that I'm sure.

i have that a lot, even before gpt 5.2

vivid coral Dec 12, 2025, 4:20 AM

#

I get it with 5.1 search off and on, always seems to be a GPT thing more than the others, not sure why. They'll figure it out, the smartest AI people in the world work here (and the most important)

plucky sparrow Dec 12, 2025, 4:35 AM

#

How do I try the new gemini deep research model

pseudo summit Dec 12, 2025, 4:37 AM

#

👀 just wondering, how do u know?

empty stump Dec 12, 2025, 4:38 AM

#

plucky sparrow How do I try the new gemini deep research model

use the API

vivid acorn Dec 12, 2025, 4:45 AM

#

Hey man

#

amazing tool

vivid coral Dec 12, 2025, 4:50 AM

#

plucky sparrow How do I try the new gemini deep research model

they don't do deep research here, most DRs are vastly overrated anyway. Great thinking and great search is the key to a great LLM

atomic lagoon Dec 12, 2025, 4:54 AM

#

empty stump use the API

Its not on the API

empty stump Dec 12, 2025, 4:56 AM

#

deep-research-pro-preview-12-2025

novel slate Dec 12, 2025, 5:30 AM

#

hy

brave orbit Dec 12, 2025, 5:32 AM

#

gpt 5.2 is so good just looking at benchmarks how can it be such a jump on ARC AGI

surreal creek Dec 12, 2025, 5:35 AM

#

brave orbit gpt 5.2 is so good just looking at benchmarks how can it be such a jump on ARC AG...

LMARENA DOESN’T CARE ABOUT YOUR BENCHMARKS 😤

#

WE ARE THE BENCHMARK

brave orbit Dec 12, 2025, 5:36 AM

#

i only care about coding and math thats it

#

i use arc agi isnt that trustworthy

cloud zinc Dec 12, 2025, 5:37 AM

#

we need gpt 5.2 x-high

brave orbit Dec 12, 2025, 5:37 AM

#

cloud zinc we need gpt 5.2 x-high

its only in api

#

how is grok winning against openai bruh grok makes simple websites

cloud zinc Dec 12, 2025, 5:45 AM

#

brave orbit its only in api

so? lmarena can add model from api

echo aurora Dec 12, 2025, 5:53 AM

#

astral blaze Because it seems to generate stuff a bit different from the API

How so?

vivid coral Dec 12, 2025, 5:57 AM

#

brave orbit how is grok winning against openai bruh grok makes simple websites

Because Grok is kind of like AI for Dummies. So the casuals usually vote for it. It's all about the audience. And even though there are experts everywhere in here, casuals probably account for 80-90% of the votes

#

Which is a good thing. These companies want to attract the "once in awhile" users. There are TONS. We just get caught up because Discord is full of heavy users. But it's not real world

plucky sparrow Dec 12, 2025, 6:09 AM

#

guys AGI is real. Sam created a real monster.

pale obsidian Dec 12, 2025, 6:12 AM

#

what the hell is December chatbot

#

it's pretty bad

plucky sparrow Dec 12, 2025, 6:12 AM

#

🤔

pale obsidian Dec 12, 2025, 6:12 AM

#

came up 5 times and it always lost for me

plucky sparrow Dec 12, 2025, 6:13 AM

#

but I think codename-discussion/speculation is supposed to be in #codename-discussion

pale obsidian Dec 12, 2025, 6:13 AM

#

🤔

pale obsidian Dec 12, 2025, 6:15 AM

#

plucky sparrow but I think codename-discussion/speculation is supposed to be in <#1425525552428...

sure

sullen quest Dec 12, 2025, 6:25 AM

#

?

compact flame Dec 12, 2025, 6:31 AM

#

Hello

astral blaze Dec 12, 2025, 6:50 AM

#

echo aurora How so?

uhhhh lmarena is shorter

#

idk

limpid torrent Dec 12, 2025, 6:52 AM

#

<@&1422628364782407830>

compact flame Dec 12, 2025, 6:56 AM

#

This has to be a joke bruh

#

Isn't it like 96

brittle tulip Dec 12, 2025, 6:58 AM

#

hello LMArena

weary galleon Dec 12, 2025, 7:16 AM

#

brittle tulip hello LMArena

Hi

hazy forge Dec 12, 2025, 7:17 AM

#

brittle tulip hello LMArena

hi

#

are we getting gpt 5.2 pro ? or just gpt 5.2 high like deepseek's 3.2 thinking and the speciale

compact flame Dec 12, 2025, 7:18 AM

#

hazy forge are we getting gpt 5.2 pro ? or just gpt 5.2 high like deepseek's 3.2 thinking a...

Pro is too expensive

#

Won't get added

weary galleon Dec 12, 2025, 7:18 AM

#

hazy forge are we getting gpt 5.2 pro ? or just gpt 5.2 high like deepseek's 3.2 thinking a...

No Pro so far

hazy forge Dec 12, 2025, 7:18 AM

#

how do they have opus 4.5

compact flame Dec 12, 2025, 7:18 AM

#

And pro reasoned for 20 minutes and failed a math test bruh

#

So pro version is not really useful

compact flame Dec 12, 2025, 7:19 AM

#

hazy forge how do they have opus 4.5

Because it's kinda cheap

#

Well not that cheap but still cheaper than pro

hazy forge Dec 12, 2025, 7:19 AM

#

compact flame So pro version is not really useful

so literally 3.2 speciale with the apple treatment design

compact flame Dec 12, 2025, 7:20 AM

#

hazy forge so literally 3.2 speciale with the apple treatment design

Yeah

hazy forge Dec 12, 2025, 7:20 AM

#

it's like a 3.2 speciale with makeup vro

compact flame Dec 12, 2025, 7:20 AM

#

compact flame This has to be a joke bruh

Like look

#

It's supposed to be 96

#

Yet pro said 99 while thinking for straight 20 minutes

hazy forge Dec 12, 2025, 7:20 AM

#

show me the prompt vro

compact flame Dec 12, 2025, 7:21 AM

#

hazy forge show me the prompt vro

1,7,18,45,?

#

Gemini answered right tho

#

So yeah there no point in pro versions of gpt honestly

astral bloom Dec 12, 2025, 7:27 AM

#

share solution

astral bloom Dec 12, 2025, 7:28 AM

#

compact flame 1,7,18,45,?

.

blissful python Dec 12, 2025, 7:31 AM

#

hello

pseudo summit Dec 12, 2025, 7:36 AM

#

brave orbit i only care about coding and math thats it

chatGPT 5.2 is not good at maths imo

weary galleon Dec 12, 2025, 8:11 AM

#

LMArena is not working for me right now!

lost elbow Dec 12, 2025, 8:15 AM

#

Hello,
Please fix the major issues on the lmarena.ai website as soon as possible. These problems occur on all browsers (including Chrome, the main mobile browser) and on both mobile and desktop. The Kiwi browser also has the same issues.

It is not possible to copy the text that we type ourselves.
When sending a message in the chat, the copy option appears, but no text is saved to the mobile clipboard.
Generated images cannot be downloaded.
Taking screenshots of the website pages is not possible.
The captcha takes a very long time to load, and accessing the site is slow. Also, it asks to solve the captcha every time an image is generated; please remove the captcha to provide a better user experience.
Most of the time, images fail to generate and show an error, both when registered and without registration. Please fix this issue so that image generation always works properly.

These problems only occur on lmarena.ai. Other websites work normally without any issues.

My device: Poco X6 Pro (Global) — Android 15, HyperOS 2.0.207.0.

Please resolve these important issues as soon as possible to provide a better user experience.
Thank you.

plucky sparrow Dec 12, 2025, 8:15 AM

#

lost elbow Hello, Please fix the major issues on the lmarena.ai website as soon as possible...

try #1343291835845578853

#

or #1372230675914031105

#

copy works for me, btw, and saving images works for me. I think it's your browser/config

left lodge Dec 12, 2025, 8:17 AM

#

lost elbow Hello, Please fix the major issues on the lmarena.ai website as soon as possible...

Everything works for me. Try a different browser or redownload the browser.

whole sundial Dec 12, 2025, 8:18 AM

#

lost elbow Hello, Please fix the major issues on the lmarena.ai website as soon as possible...

you should be able to copy input text
it should be copied
you should be able to download images
you should be able to take screenshots

#

100% a you problem, sorry to say

lost elbow Dec 12, 2025, 8:19 AM

#

left lodge Everything works for me. Try a different browser or redownload the browser.

When the main browser doesn’t work, there are no issues on other websites — these problems only occur on lmarena.ai.

plucky sparrow Dec 12, 2025, 8:19 AM

#

what browser are you using amir?

whole sundial Dec 12, 2025, 8:19 AM

#

let me try all of these things rn with lmarena on my phone

plucky sparrow Dec 12, 2025, 8:19 AM

#

and/or device?

lost elbow Dec 12, 2025, 8:22 AM

#

When the main browser, Chrome, doesn’t work, there are no issues on other websites — these problems only occur on lmarena.ai. The Kiwi browser also doesn’t work.

Device: poco x6 pro

whole sundial Dec 12, 2025, 8:25 AM

#

all 4 things you mentioned work fine for me, Galaxy A35, Chrome, Android 16

#

i would maybe suggest scanning your phone for malware?

#

malware can mess up your ability to copy, screenshot, and download

#

also check to make sure your phone's storage space isn't full, that can also cause those problems

#

also side note: i never used the mobile lmarena website before until today and I have to say that it is a very nice and smooth website, I actually wouldn't mind using this if I actually used it for more than messages because sadly this is 2025 and everything requires phone verification, face scanning that can only be done on a phone. etc. i understand the need, but it's too excessive and a real invasion of privacy in some cases

queen veldt Dec 12, 2025, 8:35 AM

#

Gpt 5.2 is crazy

weary galleon Dec 12, 2025, 8:37 AM

#

queen veldt Gpt 5.2 is crazy

It's GPT-5.1.

stuck orchid Dec 12, 2025, 8:51 AM

#

Gpt 5.2 > Gemini 3 pro?

raw meteor Dec 12, 2025, 8:53 AM

#

sdfui

pseudo summit Dec 12, 2025, 8:53 AM

#

stuck orchid Gpt 5.2 > Gemini 3 pro?

i haven't gotten much direct comparison, have u?

raw meteor Dec 12, 2025, 8:53 AM

#

@raw meteor hiw!

#

guys i cant upload any picture in arena video

#

pls help me

pseudo summit Dec 12, 2025, 8:53 AM

#

would b interested to hear ur thought on y u'd think GPT 5.2 > Gemini 3 Pro

solar hollow Dec 12, 2025, 8:59 AM

#

stuck orchid Gpt 5.2 > Gemini 3 pro?

cant really test it in lmarena, direct chat with it leads to error everytime

sour spear Dec 12, 2025, 9:06 AM

#

pseudo summit would b interested to hear ur thought on y u'd think GPT 5.2 > Gemini 3 Pro

No, definitely not. Even the supercharged GPT-5.2 High can't really outperform the regular end-user Gemini 3 Pro, and "normal" 5.2 is definitely worse than Gemini 3 Pro. A good example of a "better" model is Claude 4.5 Opus, for coding tasks. Only very specifically there, but it's rather obvious. GPT-5.2 on the other hand is only trying to close the gap, without success so far.

queen veldt Dec 12, 2025, 9:08 AM

#

weary galleon It's GPT-5.1.

No

weary galleon Dec 12, 2025, 9:08 AM

#

GPT-5.2 is TERRIBLE😡😡😡😡😡

queen veldt Dec 12, 2025, 9:08 AM

#

weary galleon Dec 12, 2025, 9:08 AM

#

queen veldt

I didn't know you have paid subscription.

queen veldt Dec 12, 2025, 9:09 AM

#

I got it for free

#

Yesterday

#

I was subbed before, then I canceled the sub 1 month ago

#

#

Yesterday i got this

weary galleon Dec 12, 2025, 9:09 AM

#

sour spear No, definitely not. Even the supercharged GPT-5.2 High can't really outperform t...

GPT-5.1-high outperforms GPT-5.2 in ALL tasks!

#

OpenAI ashamed yesterday too much.

sour spear Dec 12, 2025, 9:11 AM

#

weary galleon GPT-5.1-high outperforms GPT-5.2 in ALL tasks!

Yeah, it's a quick iteration to catch up with the competition, the third GPT-5 release in only 4 months. But nothing really groundbreakingly new. They need to refocus on ChatGPT, instead of spreading their resources thin on their Atlas browser or SoraTok.

compact flame Dec 12, 2025, 9:11 AM

#

weary galleon GPT-5.1-high outperforms GPT-5.2 in ALL tasks!

Fr even 5.2 pro failed that one math test

#

It was thinking for straight 20 minutes and got it wrong

queen veldt Dec 12, 2025, 9:12 AM

#

Who even uses that Atlas lmao

weary galleon Dec 12, 2025, 9:14 AM

#

compact flame Fr even 5.2 pro failed that one math test

GPT-5.1-high counts my tomatoes MORE accurate than GPT-5.2! SHAME!!!!!

weary galleon Dec 12, 2025, 9:14 AM

#

queen veldt Who even uses that Atlas lmao

Sama

queen veldt Dec 12, 2025, 9:15 AM

#

weary galleon GPT-5.1-high counts my tomatoes MORE accurate than GPT-5.2! SHAME!!!!!

Those are NOT your tomatoes 😭

compact flame Dec 12, 2025, 9:15 AM

#

weary galleon GPT-5.1-high counts my tomatoes MORE accurate than GPT-5.2! SHAME!!!!!

What about how good uh 5.2 high counts?

queen veldt Dec 12, 2025, 9:16 AM

#

weary galleon Dec 12, 2025, 9:16 AM

#

queen veldt Those are NOT your tomatoes 😭

It's my prompt I use it all the time.

queen veldt Dec 12, 2025, 9:16 AM

#

Is this correct?

weary galleon Dec 12, 2025, 9:16 AM

#

queen veldt Is this correct?

Yes

compact flame Dec 12, 2025, 9:17 AM

#

queen veldt

Six nine damn

queen veldt Dec 12, 2025, 9:17 AM

#

Gpt 3 pro is best

compact flame Dec 12, 2025, 9:17 AM

#

queen veldt Gpt 3 pro is best

Gpt o3 pro?

queen veldt Dec 12, 2025, 9:17 AM

#

Gemini********

compact flame Dec 12, 2025, 9:17 AM

#

Yeah

weary galleon Dec 12, 2025, 9:18 AM

#

compact flame What about how good uh 5.2 high counts?

A little bit better. GPT-5.1-high says 49, GPT-5.2 says 47, GPT-5.2-high says 52.

plucky sparrow Dec 12, 2025, 9:18 AM

#

if only they didn't nerf gemini 3

queen veldt Dec 12, 2025, 9:19 AM

#

Hold up

plucky sparrow Dec 12, 2025, 9:19 AM

#

i feel the context is nerfed, and the output is nerfed

queen veldt Dec 12, 2025, 9:19 AM

#

Gpt 5.2 is writing code

compact flame Dec 12, 2025, 9:19 AM

#

queen veldt Hold up

What is bro trying to code there?

weary galleon Dec 12, 2025, 9:19 AM

#

compact flame What is bro trying to code there?

GTA clone

compact flame Dec 12, 2025, 9:19 AM

#

These gpt "High" are definitely high on drugs

queen veldt Dec 12, 2025, 9:20 AM

#

😭

#

Filenotfounderror

compact flame Dec 12, 2025, 9:20 AM

#

compact flame These gpt "High" are definitely high on drugs

This is accurate guys

compact flame Dec 12, 2025, 9:20 AM

#

queen veldt 😭

Bro is trying to find the tomato folder

queen veldt Dec 12, 2025, 9:21 AM

#

It's stuck 😭

#

#

And we got a winner!

#

Bro thinks he's human

#

3 minutes to count dam tomatoes

weary galleon Dec 12, 2025, 9:23 AM

#

Clear winner is Gemini 3 Pro.

plucky sparrow Dec 12, 2025, 9:23 AM

#

ok i did a brief count and i counted more than 54

weary galleon Dec 12, 2025, 9:23 AM

#

plucky sparrow ok i did a brief count and i counted more than 54

69 is right answer.

#

I counted them a month ago.

#

I use this prompt every single day with different LLMs.
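For reference, the counts quoted in this exchange can be compared against the manual count. The snippet below is a minimal sketch that assumes 69 is the true number of tomatoes (weary galleon's own count, not independently verified) and uses only the three model answers posted above. The variable names are illustrative.

```python
# Assumption: 69 is the correct count (weary galleon's manual count).
true_count = 69

# Answers as reported in the chat.
reported = {"GPT-5.1-high": 49, "GPT-5.2": 47, "GPT-5.2-high": 52}

# Absolute error of each model against the assumed true count.
errors = {model: abs(count - true_count) for model, count in reported.items()}

# Print models from closest to furthest.
for model, err in sorted(errors.items(), key=lambda kv: kv[1]):
    print(f"{model}: off by {err}")
```

On these numbers all three answers undercount, by 17 to 22 tomatoes, so the gap between the GPT variants is small next to their shared miss.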

plucky sparrow Dec 12, 2025, 9:24 AM

#

AGI

#

I guess I shouldn't be surprised, tbh

#

#

is it any good at coding though?

compact flame Dec 12, 2025, 9:26 AM

#

plucky sparrow is it any good at coding though?

Probably?

#

Waiting for gpt pro to reason how many tomatoes he sees

#

Knowing chatgpt it might take another 20 minutes

long jackal Dec 12, 2025, 9:38 AM

#

Hey everyone

When I set a model to Code mode and ask for a Python script (e.g., something to run in Google Colab), it replies with HTML instead of plain Python code.

Is that normal in Code mode? Like, does it tend to default to HTML output?

compact flame Dec 12, 2025, 9:39 AM

#

long jackal Hey everyone When I set a model to Code mode and ask for a Python script (e.g.,...

Code mode is like for browser building stuff and app ones

#

You gotta use just the basic one and I recommend using opus

plucky sparrow Dec 12, 2025, 9:43 AM

#

compact flame Dec 12, 2025, 9:49 AM

#

Damn he's still counting tomatoes

#

Yeah I give up on waiting it's counting for way too long

weary galleon Dec 12, 2025, 10:10 AM

#

compact flame Knowing chatgpt it might take another 20 minutes

It's much faster to count them manually.

compact flame Dec 12, 2025, 10:15 AM

#

weary galleon It's much faster to count them manually.

Yeah it's even examining some sorta crop changes

#

So yeah pro is definitely not counting these tomatoes anytime soon

rapid merlin Dec 12, 2025, 10:16 AM

#

plucky sparrow

haha it still gets a stroke from this

weary galleon Dec 12, 2025, 10:18 AM

#

Sorry, from smartphone.

#

Right answer is 69.

rancid oxide Dec 12, 2025, 10:19 AM

#

uhmm guys

#

why is your nano banana pro not working

#

da fuq

low imp Dec 12, 2025, 10:22 AM

#

5.2 is tuff

obtuse smelt Dec 12, 2025, 10:23 AM

#

again

polar wharf Dec 12, 2025, 10:24 AM

#

https://gpt4free.pro
If you can reverse it, you deserve it

Gpt4Free

Gpt4Free - Free Unlimited AI Image & Video Generation

Generate stunning AI images and videos for free with Gpt4Free. Access 9 powerful AI models including Flux 2 Pro, VEO 3.1, Sora 2, and Kling 2.6. No credit card required, 10 daily credits.

low imp Dec 12, 2025, 10:27 AM

#

Nvm 5.2 is dogsh

obtuse smelt Dec 12, 2025, 10:33 AM

#

hmm 5.2 ?

weary galleon Dec 12, 2025, 10:55 AM

#

You have no right to write RUSSIAN here!!!!!!!😡

#

We are English-speaking community!

compact flame Dec 12, 2025, 10:57 AM

#

This is English community

#

Thanks

weary galleon Dec 12, 2025, 10:59 AM

#

Reported.

#

Speak English!

queen veldt Dec 12, 2025, 11:14 AM

#

Don't make me call the mods

meager fulcrum Dec 12, 2025, 11:28 AM

#

Hz

jovial solar Dec 12, 2025, 11:35 AM

#

Hello

meager fulcrum Dec 12, 2025, 11:36 AM

#

Hi

fleet lintel Dec 12, 2025, 11:47 AM

#

low imp Nvm 5.2 is dogsh

scam altman scammed us with 5.2 .... crazy benchmaxxed model on few benchmarks but doing worse in many other benchmarks and in real life use-cases.

fleet surge Dec 12, 2025, 11:51 AM

#

fleet lintel scam altman scammed us with 5.2 .... crazy benchmaxxed model on few benchmarks b...

openai classic

coral goblet Dec 12, 2025, 11:51 AM

#

nano banana pro keep ignore my prompt

#

classic

visual osprey Dec 12, 2025, 11:51 AM

#

december-chatbot

coral goblet Dec 12, 2025, 11:51 AM

#

first good then nerf

visual osprey Dec 12, 2025, 11:52 AM

#

what is this december-chatbot?

fleet surge Dec 12, 2025, 11:52 AM

#

huh

glacial mulch Dec 12, 2025, 11:57 AM

#

coral goblet first good then nerf

nbp hasnt been nerfed

coral goblet Dec 12, 2025, 12:00 PM

#

yes it does

fickle venture Dec 12, 2025, 12:09 PM

#

New model if anyone's interested, it seems this one eats a lot of tokens:
https://youtu.be/676EBGcv8YY?si=cgz7dz0OJdt_dZ5Q

YouTube

AICodeKing

Goose's G3: RIP Claude Code! This Opensource AUTOCODING AI Agent CA...

In this video, I'll be telling you about g3, a revolutionary new AI coding tool based on adversarial cooperation that solves the context loss problem by making two AI agents fight each other to write better code. This is based on a groundbreaking research paper and represents a completely new paradigm for autonomous software development.

--
Key...

▶ Play video

neat apex Dec 12, 2025, 12:23 PM

#

The only relevant Gpt is Gpt 5.2 xtra high, since the base model can lose even to glm 4.6 xd

#

Will it appear in the arena? It doesn't bug out like the Pro versions do

modest prism Dec 12, 2025, 12:24 PM

#

fickle venture New model if anyone's interested, it seems this one eats a lot of tokens: https://y...

This is so inefficient. It might be cheaper to hire some real developers instead.

neat apex Dec 12, 2025, 12:25 PM

#

Ahh yes, Gpt 5.2 xtra high can cost $10 per message

modest prism Dec 12, 2025, 12:26 PM

#

neat apex The only relevant Gpt is Gpt 5.2 xtra high, since the base model can lose even t...

That's not true. I sometimes get better results with gpt 5.2 medium than high or extra high.

neat apex Dec 12, 2025, 12:27 PM

#

Dammn, so Gpt 5.2 xtra high is a fraud?

queen veldt Dec 12, 2025, 12:33 PM

#

Just go claude bro

fleet lintel Dec 12, 2025, 12:39 PM

#

neat apex Dammn, so Gpt 5.2 xtra high is a fraud?

it's clear that they only added xhigh to score high on benchmarks. it's too expensive, slow in real world use

polar patrol Dec 12, 2025, 12:39 PM

#

Hi everyone, does anyone know how to connect an AI to Telegram bot so it answers based on a knowledge base? Also, are there any AIs that are free in terms of limits?

fleet lintel Dec 12, 2025, 12:40 PM

#

i gave decent shot to gpt 5.2 . i have plus subscription and damn gemini 3 is blowing it out of the water

livid plume Dec 12, 2025, 12:43 PM

#

guys watch this vid completely

#

https://www.youtube.com/watch?v=4PTXSKdr40k

YouTube

Charles Kendall

Cute Cat Video

This is a cute cat.

▶ Play video

tall patrol Dec 12, 2025, 12:57 PM

#

guys why is img leaderboard always missing famous opensource models, for example z image turbo isnt there in the text to image arena

queen veldt Dec 12, 2025, 12:58 PM

#

Because lmarena doesn't have z image turbo

#

Not available on battle

tall patrol Dec 12, 2025, 1:01 PM

#

ik but why not add it ? its opensource and pretty cheap to host compared to some other img models

kindred fog Dec 12, 2025, 1:02 PM

#

fun fact nano banana pro does peppino perfectly

queen veldt Dec 12, 2025, 1:02 PM

#

#1372229840131985540 you can ask here

ocean vortex Dec 12, 2025, 1:11 PM

#

tall patrol ik but why not add it ? its opensource and pretty cheap to host compared to some...

lmarena doesn't host anything itself. Everything goes through provider APIs. It would probably be on Alibaba (publishing as 'Tongyi-MAI', which is them) to contact lmarena and get their model listed

tall patrol Dec 12, 2025, 1:11 PM

#

aah

kindred fog Dec 12, 2025, 1:15 PM

#

lmarena is so fun because i can tell it pizza tower questions and which one it gets closest its better

hollow ivy Dec 12, 2025, 1:18 PM

#

compact flame You gotta use just the basic one and I recommend using opus

Which is the best coding language for Opus-4.5?

#

Python or Java?

proud bobcat Dec 12, 2025, 1:21 PM

#

Yeah so it looks like the consensus is 5.2 is benchmaxxed

#

It’s scored far lower than gemini 3 and opus on almost all personal benchmarks

#

On some benchmarks 5.2 pro scores lower than 5.1 normal

#

Agi!!!!!

compact flame Dec 12, 2025, 1:22 PM

#

hollow ivy Which is the best coding language for Opus-4.5?

What

compact flame Dec 12, 2025, 1:22 PM

#

proud bobcat On some benchmarks 5.2 pro scores lower than 5.1 normal

5.2 is a downgrade trust

proud bobcat Dec 12, 2025, 1:23 PM

#

How do they even manage to do this

kindred fog Dec 12, 2025, 1:23 PM

#

proud bobcat Agi!!!!!

deltarune pfp = based

compact flame Dec 12, 2025, 1:23 PM

#

proud bobcat How do they even manage to do this

Rushed

proud bobcat Dec 12, 2025, 1:23 PM

#

kindred fog deltarune pfp = based

I’m Kris Dreemur irl trust

meager harbor Dec 12, 2025, 1:23 PM

#

even gemini 3 is a downgrade in many aspects, he knows more than gemini 2.5 but when he doesn't know he makes so much stuff up, a lot more than gemini 2.5

kindred fog Dec 12, 2025, 1:23 PM

#

I'm peppino spaghetti trust me bro

#

https://019b12b4-6fe2-7172-9ad9-da0fac3e83a6.arena.site

CUBE STORY - A Cubic Bullet Hell

Built with LMArena - Content is user-generated and unverified

proud bobcat Dec 12, 2025, 1:23 PM

#

My favorite part of pizza tower is when peppino came and said “it’s peppino time” and peppinoed everywhere

hollow ivy Dec 12, 2025, 1:24 PM

#

compact flame What

This was a serious question.

kindred fog Dec 12, 2025, 1:24 PM

#

proud bobcat My favorite part of pizza tower is when peppino came and said “it’s peppino time...

I never played deltarune but i played undertale

#

so i might consider playing the demo

proud bobcat Dec 12, 2025, 1:24 PM

#

Deltarune has a much warmer atmosphere

#

And I like the humor better

hollow ivy Dec 12, 2025, 1:24 PM

#

..i'm interested in, what programming language is best for vibe-coding with Claude-4.5-Opus-Thinking.

compact flame Dec 12, 2025, 1:24 PM

#

hollow ivy This was a serious question.

Opus is just good overall in coding

#

There no best model

meager harbor Dec 12, 2025, 1:25 PM

#

2026 will be the real test, i'm expecting things to stagnate. even in 2025 things were slowing down compared to 2024, just look at the top elo score in lm arena, which gained only 80 points in 2025 while the gain was double that in 2024

kindred fog Dec 12, 2025, 1:26 PM

#

kindred fog https://019b12b4-6fe2-7172-9ad9-da0fac3e83a6.arena.site

opus did this GREAT (i might try to make pizza tower level maker with this who knows)

proud bobcat Dec 12, 2025, 1:26 PM

#

meager harbor 2026 will be the real test, i'm expecting things to stagnate. even in 2025 thing...

2026 is the make or break year

#

If gemini 3.5 is peak it’s over for chatgpt

#

Gemini just consistently makes better models every time

#

GPT is a gamble

kindred fog Dec 12, 2025, 1:27 PM

#

nano banana pro was peak enough

compact flame Dec 12, 2025, 1:27 PM

#

How did you guys become arena champions bruh

kindred fog Dec 12, 2025, 1:27 PM

#

because i can finally generate pizza tower characters (except for some obscure ones

#

maybe in the future

proud bobcat Dec 12, 2025, 1:27 PM

#

compact flame How did you guys become arena champions bruh

We became champions of the arena

compact flame Dec 12, 2025, 1:28 PM

#

Honestly nano banana pro is so good

proud bobcat Dec 12, 2025, 1:28 PM

#

It is

kindred fog Dec 12, 2025, 1:28 PM

#

yes

compact flame Dec 12, 2025, 1:28 PM

#

It even knows how to draw GTA 5 perfectly

proud bobcat Dec 12, 2025, 1:28 PM

#

I don’t like image models but credit where credit is due

kindred fog Dec 12, 2025, 1:28 PM

#

compact flame It even knows how to draw GTA 5 perfectly

it knows peppino spaghetti and the noise, finally

meager harbor Dec 12, 2025, 1:28 PM

#

proud bobcat If gemini 3.5 is peak it’s over for chatgpt

nah i don't think so, unless new gemini is a breakthrough, having 50 elo more on lm arena is not sufficient to make people change their personal ai for something just slightly better

compact flame Dec 12, 2025, 1:29 PM

#

kindred fog it knows peppino spaghetti and the noise, finally

Yeah ig it knows alot

kindred fog Dec 12, 2025, 1:29 PM

#

it can generate sonic screenshots now

#

which is cool

#

also this too

proud bobcat Dec 12, 2025, 1:36 PM

#

meager harbor nah i don't think so, unless new gemini is a breakthrough, having 50 elo more on...

50 elo is quite a step up

#

Tbh

#

Personally though I don’t use ai that much

#

I only use DeepSeek primarily now for math and roleplay and GPT for quick alt scenario summaries

#

Though I might switch to DeepSeek for that too now because 3.2 has very natural language

fickle venture Dec 12, 2025, 1:38 PM

#

kindred fog also this too

Gpt 5.2?

proud bobcat Dec 12, 2025, 1:39 PM

#

Gemini I think

feral geyser Dec 12, 2025, 1:41 PM

#

why on my phone lmarena not working? "no models found" problem

proud bobcat Dec 12, 2025, 1:42 PM

#

feral geyser why on my phone lmarena not working? "no models found" problem

Check your modalities

#

Or

#

Reopen page

feral geyser Dec 12, 2025, 1:42 PM

#

i reopen but still not working

snow sail Dec 12, 2025, 1:43 PM

#

how is gemini ？

proud bobcat Dec 12, 2025, 1:43 PM

#

snow sail how is gemini ？

Straight fire

kindred fog Dec 12, 2025, 1:43 PM

#

what is peppino doing in the simpsons hespeppinoyouknow

proud bobcat Dec 12, 2025, 1:43 PM

#

feral geyser i reopen but still not working

Odd

proud bobcat Dec 12, 2025, 1:43 PM

#

kindred fog what is peppino doing in the simpsons hespeppinoyouknow

Is that peppino from the hit platformer pizza tower in the Simpson???

kindred fog Dec 12, 2025, 1:44 PM

#

Simpsons Predicted Pizza Tower! peppinolaughinghisassoff

latent crest Dec 12, 2025, 2:07 PM

#

Why china hasn’t released a video AI open source model yet? 😭

proud bobcat Dec 12, 2025, 2:15 PM

#

latent crest Why china hasn’t released a video AI open source model yet? 😭

They

#

They have

#

Wan 2.2

pliant cliff Dec 12, 2025, 2:19 PM

#

kindred fog because i can finally generate pizza tower characters (except for some obscure o...

oh, you like pizza tower too?

#

it's an awesome game

pliant cliff Dec 12, 2025, 2:19 PM

#

kindred fog because i can finally generate pizza tower characters (except for some obscure o...

even Pepperman?

latent crest Dec 12, 2025, 2:21 PM

#

proud bobcat Wan 2.2

Is that any good ?

compact flame Dec 12, 2025, 2:21 PM

#

Honestly this is crazy how much deep seek thinks

#

I swear it thinks more than does actual text

left lodge Dec 12, 2025, 2:23 PM

#

Bro 💀
Yeah this is the worlds best model.

compact flame Dec 12, 2025, 2:24 PM

#

left lodge Bro 💀 Yeah this is the worlds best model.

Try Extra high one

left lodge Dec 12, 2025, 2:24 PM

#

Not available on lmarena

compact flame Dec 12, 2025, 2:24 PM

#

Yeah sadly honestly

left lodge Dec 12, 2025, 2:25 PM

#

5.1 is better than 5.2 ,
5.2 is just benchmaxxing

latent crest Dec 12, 2025, 2:25 PM

#

What is Higgsfield ?

proud bobcat Dec 12, 2025, 2:28 PM

#

left lodge Bro 💀 Yeah this is the worlds best model.

No no you don’t understand

#

You need the extra high model

#

It’s too hard for gpt

compact flame Dec 12, 2025, 2:28 PM

#

It's been 10 minutes and bruh deep seek is still thinking

proud bobcat Dec 12, 2025, 2:28 PM

#

compact flame It's been 10 minutes and bruh deep seek is still thinking

Speciale?

compact flame Dec 12, 2025, 2:29 PM

#

proud bobcat Speciale?

Yep

#

On yupp

proud bobcat Dec 12, 2025, 2:29 PM

#

Yeah it overthinks

#

Use 3.2 thinking

latent crest Dec 12, 2025, 2:29 PM

#

proud bobcat Speciale?

Did u ask the purpose of life or something

proud bobcat Dec 12, 2025, 2:29 PM

#

What

compact flame Dec 12, 2025, 2:29 PM

#

I swear I think I see speciale get paranoid in his thoughts

proud bobcat Dec 12, 2025, 2:29 PM

#

latent crest Is that any good ?

It’s pretty good

#

No veo 3 competitor

#

But it’s good

proud bobcat Dec 12, 2025, 2:30 PM

#

compact flame I swear I think I see speciale get paranoid in his thoughts

Speciale is only for like REALLY really big questions

#

What prompt did you give it

compact flame Dec 12, 2025, 2:30 PM

#

proud bobcat Speciale is only for like REALLY really big questions

I asked it to create a simple anti cheat

#

And I don't think it counted simple

#

As a word

zealous sparrow Dec 12, 2025, 2:31 PM

#

compact flame I asked it to create a simple anti cheat

speciale dictionary:
simple = complex, as hell, made in 17 coding languages

compact flame Dec 12, 2025, 2:31 PM

#

Bro the wall of his damn thoughts is crazy

#

Gemini on the left is simple and fast bruh

#

Jeez it even starts coding in his own thoughts

#

I'm never using speciale again bruh this is just crazy

rich panther Dec 12, 2025, 2:34 PM

#

which ai in lmarena is the best for coding

compact flame Dec 12, 2025, 2:34 PM

#

Finally it stopped thinking

#

After goddamn 15 minutes

zealous sparrow Dec 12, 2025, 2:35 PM

#

rich panther which ai in lmarena is the best for coding

opus 4.5, gemini 3 pro, new gemini 3 flash models [battlemode],

#

gpt 5.2 [rarely]

left lodge Dec 12, 2025, 2:35 PM

#

proud bobcat Speciale?

It isn't available on lmarena?

zealous sparrow Dec 12, 2025, 2:35 PM

#

left lodge It isn't available on lmarena?

isn't, due to issues

#

prob will return once API is patched up

left lodge Dec 12, 2025, 2:37 PM

#

Maybe, it was available just for few hours on launch, no hope till now.

rich panther Dec 12, 2025, 2:37 PM

#

zealous sparrow opus 4.5, gemini 3 pro, new gemini 3 flash models [battlemode],

there are no gemini 3 flash and opus 4.5 in lmarena, or am I just blind

zealous sparrow Dec 12, 2025, 2:37 PM

#

rich panther there are no gemini 3 flash and opus 4.5 in lmarena, or am I just blind

for opus 4.5 you are blind

#

gemini 3 flash is only in battle mode

#

as fiercefalcon or ghostfalcon

rich panther Dec 12, 2025, 2:38 PM

#

zealous sparrow for opus 4.5 you are blind

by opus 4.5, do you mean claude opus?

zealous sparrow Dec 12, 2025, 2:40 PM

#

rich panther by opus 4.5, do you mean claude opus?

yes

#

gemini 3 flash will join the list of nice coding models

#

next to gem 3 pro

rich panther Dec 12, 2025, 2:40 PM

#

zealous sparrow yes

oh okay

zealous sparrow Dec 12, 2025, 2:41 PM

#

but you know that we all just bench html

#

and not other

#

some models excel at python, but suck at html

rich panther Dec 12, 2025, 2:41 PM

#

it's a pity that opus 4.5 has a limit

zealous sparrow Dec 12, 2025, 2:41 PM

#

how so

hollow ivy Dec 12, 2025, 2:42 PM

#

hm, somehow g3p's coding ability turned to crap

#

worse than 2.5-pro

zealous sparrow Dec 12, 2025, 2:42 PM

#

you mean model degradation?

hollow ivy Dec 12, 2025, 2:42 PM

#

yep :(

zealous sparrow Dec 12, 2025, 2:42 PM

#

No one really knows how this happens

hollow ivy Dec 12, 2025, 2:42 PM

#

i think AGI will never happen

zealous sparrow Dec 12, 2025, 2:42 PM

#

A company wouldnt make a model sh, because yes.

#

I doubt.

#

I think the issue was that, a lot of use was pushed onto gemini 3 pro.

#

Wearing it off a lot.

hollow ivy Dec 12, 2025, 2:43 PM

#

so, only Opus-4.5 remains

proud bobcat Dec 12, 2025, 2:43 PM

#

Since when?

hollow ivy Dec 12, 2025, 2:43 PM

#

the lonely coding-king

proud bobcat Dec 12, 2025, 2:43 PM

#

Opus is peak

zealous sparrow Dec 12, 2025, 2:43 PM

#

opus will also degrade over time

#

its inevitable

proud bobcat Dec 12, 2025, 2:43 PM

#

Gemini 3 pro was crazy at launch day

#

What is bro on about

zealous sparrow Dec 12, 2025, 2:44 PM

#

What makes you think that?

proud bobcat Dec 12, 2025, 2:44 PM

#

Also they will defo quantize opus to at least quant 8

#

Fym nah?

zealous sparrow Dec 12, 2025, 2:44 PM

#

Why are you so confident they wont

proud bobcat Dec 12, 2025, 2:44 PM

#

Ts ragebait

zealous sparrow Dec 12, 2025, 2:44 PM

#

Every model degraded over time.

#

Even sonnet 3.5!

#

Sonnet 3.5 used to be a goat, then started to degrade.

proud bobcat Dec 12, 2025, 2:45 PM

#

Opus 4.5 is safe to quantize too because it will degrade the model very very little, while saving half the resources
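The "half the resources" claim matches the basic arithmetic of going from 16-bit to 8-bit weights. The snippet below is a minimal sketch of symmetric per-tensor int8 quantization on made-up random weights. Nothing here reflects how Opus or any hosted model is actually served: the 16-bit baseline, the weight distribution, and the function names `quantize_int8` and `dequantize` are all assumptions for illustration.

```python
import random

def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any scale works
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return [v * scale for v in q]

# Hypothetical weights: small Gaussian values, not taken from any real model.
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding error per weight is bounded by half a quantization step.
max_err = max(abs(a - b) for a, b in zip(weights, restored))

# Storage cost, assuming a 16-bit baseline (2 bytes) versus int8 (1 byte).
fp16_bytes = 2 * len(weights)
int8_bytes = 1 * len(weights)

print(f"max round-trip error: {max_err:.6f} (step size {scale:.6f})")
print(f"storage: {fp16_bytes} bytes at 16-bit vs {int8_bytes} bytes at 8-bit")
```

This only shows that weight storage halves and per-weight error stays within half a step. Whether output quality drops "very very little" depends on the model and the quantization scheme, which a toy like this cannot measure.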

#

Why wouldn’t it write comments in code to explain what each function does

#

Well yeah it’s an ai

#

It’s gonna overdo it

#

But you can just trim the comments

#

The code is still solid

hollow ivy Dec 12, 2025, 2:46 PM

#

do you guys still think, we get a coding-AGI in the future?

zealous sparrow Dec 12, 2025, 2:46 PM

#

hollow ivy do you guys still think, we get a coding-AGI in the future?

All models degrade

proud bobcat Dec 12, 2025, 2:46 PM

#

AGI isn’t real

zealous sparrow Dec 12, 2025, 2:46 PM

#

Therefore, no

compact flame Dec 12, 2025, 2:46 PM

#

I swear speciale spends more time thinking than creating an actual thing

proud bobcat Dec 12, 2025, 2:47 PM

#

compact flame I swear speciale spends more time thinking than creating an actual thing

Remember this is for like crazy in-depth questions

#

Speciale is not meant for day to day use

hollow ivy Dec 12, 2025, 2:47 PM

#

proud bobcat AGI isn’t real

oh, i meant an "AGCI" not an actual AGI

proud bobcat Dec 12, 2025, 2:47 PM

#

What is AGCI

compact flame Dec 12, 2025, 2:47 PM

#

proud bobcat Remember this is for like crazy in-depth questions

I know it but when it gave me the final result it had no explanation just the plain script

proud bobcat Dec 12, 2025, 2:47 PM

#

Like

hollow ivy Dec 12, 2025, 2:47 PM

#

AGCI = artificial general coding intelligence

proud bobcat Dec 12, 2025, 2:47 PM

#

Haven’t we gotten that with Opus

#

Am I

#

Am I tweaking

compact flame Dec 12, 2025, 2:48 PM

#

compact flame I know it but when it gave me the final result it had no explanation just the pl...

I think it spent all its context on thinking

proud bobcat Dec 12, 2025, 2:48 PM

#

Probably yeah

#

Just use 3.2 thinking

compact flame Dec 12, 2025, 2:48 PM

#

proud bobcat Just use 3.2 thinking

Yeah fair

hollow ivy Dec 12, 2025, 2:48 PM

#

proud bobcat Just use 3.2 thinking

is it better than g3p?

#

and what about GLM?

#

i hope Elon does something with g5

zealous sparrow Dec 12, 2025, 2:50 PM

#

gem 3 will forever have the best OCR

#

OAI argued their 5.2 OCR is goated

#

and it missed on so much

proud bobcat Dec 12, 2025, 2:50 PM

#

hollow ivy is it better than g3p?

No but it’s still an extremely good model

hollow ivy Dec 12, 2025, 2:50 PM

#

zealous sparrow gem 3 will forever have the best OCR

ocr without decent coding = meh

proud bobcat Dec 12, 2025, 2:50 PM

#

I love DeepSeek for math

#

It’s so goated

compact flame Dec 12, 2025, 2:51 PM

#

I wonder if extra high gpt is better than opus

hollow ivy Dec 12, 2025, 2:51 PM

#

are these the first symptoms that the AI bubble is about to burst?

proud bobcat Dec 12, 2025, 2:51 PM

#

I’m still wondering why they need extra high

#

It sounds like they’re out of options

#

Gpt the only ai I’ve seen that makes 50 different variations of the same model

#

It’s embarrassing

compact flame Dec 12, 2025, 2:52 PM

#

proud bobcat I’m still wondering why they need extra high

Next thing they'll add is probably Extra high Max pro ultra

hollow ivy Dec 12, 2025, 2:52 PM

#

*these:

gpt sux
gemini 3 sux now
grok sux

polar patrol Dec 12, 2025, 2:52 PM

#

proud bobcat Dec 12, 2025, 2:52 PM

#

compact flame Next thing they'll add is probably Extra high Max pro ultra

Ultra Overclocked Pro Reasoning AGI Extra Speed Boost Mode

compact flame Dec 12, 2025, 2:52 PM

#

OpenAi just needs to focus on training their models

#

Not rushing

compact flame Dec 12, 2025, 2:53 PM

#

proud bobcat Ultra Overclocked Pro Reasoning AGI Extra Speed Boost Mode

Fr

proud bobcat Dec 12, 2025, 2:53 PM

#

After 4o they started scraping the internet for data

#

Which is why 5 was so ass

compact flame Dec 12, 2025, 2:53 PM

#

Makes sense

proud bobcat Dec 12, 2025, 2:53 PM

#

They're making the same mistake meta did with llama 4

hollow ivy Dec 12, 2025, 2:54 PM

#

if only claude was not so expensive :/

zealous sparrow Dec 12, 2025, 2:54 PM

#

hollow ivy *these: - gpt sux - gemini 3 sux now - grok sux

grok sucks is actually accurate

#

gpt 5.2 sucks because of UI

proud bobcat Dec 12, 2025, 2:54 PM

#

Grok is solid :(

#

I will not tolerate this slander

zealous sparrow Dec 12, 2025, 2:54 PM

#

proud bobcat Grok is solid :(

i refuse to believe

proud bobcat Dec 12, 2025, 2:54 PM

#

I use it semi daily

#

I like 4.1 a lot

compact flame Dec 12, 2025, 2:54 PM

#

Hm I wonder if there opus 64k thinking

latent crest Dec 12, 2025, 2:55 PM

#

What’s gem 3

compact flame Dec 12, 2025, 2:55 PM

#

latent crest What’s gem 3

Gemini 3 pro

#

Well maybe

proud bobcat Dec 12, 2025, 2:56 PM

#

compact flame Hm I wonder if there opus 64k thinking

Opus is the goat of thinking

#

I used to hate Claude but I like their models now

compact flame Dec 12, 2025, 2:56 PM

#

proud bobcat Opus is the goat of thinking

Yeah but I just wonder if there's a 64k version of it

#

Or they didn't release it yet or whatever

proud bobcat Dec 12, 2025, 2:57 PM

#

Maybe they’re working on it

compact flame Dec 12, 2025, 2:58 PM

#

Probably

waxen fern Dec 12, 2025, 3:00 PM

#

Do GPT-5.2 and Nano Banana Pro have rate limits in Lmarena??

compact flame Dec 12, 2025, 3:01 PM

#

waxen fern Do GPT-5.2 and Nano Banana Pro have rate limits in Lmarena??

Why'd you ask?

#

You can test out yourself I guess

waxen fern Dec 12, 2025, 3:02 PM

#

compact flame You can test out yourself I guess

I need answers

compact flame Dec 12, 2025, 3:04 PM

#

waxen fern I need answers

I doubt 5.2 gpt has limits

#

And I've never reached nano banana limits

#

But don't take this as valid info since I didn't test it out properly

#

Especially due to how bad is gpt 5.2

neat apex Dec 12, 2025, 3:11 PM

#

waxen fern I need answers

Not humanly achievable limits; even if you somehow manage it, they will only ask for a captcha to continue

#

Not sure about the image model, but it is at the very least 10 per minute

zealous sparrow Dec 12, 2025, 3:12 PM

#

@deep adder If you dont believe the new gemini 3 flash models will be good give me a prompt to try with it

compact flame Dec 12, 2025, 3:13 PM

#

I wonder how good is Gemini deep think

neat apex Dec 12, 2025, 3:13 PM

#

People genuinely believing that Gemini 3 pro is significantly bigger than Gemini 2.5 makes me laugh

#

It does not make any sense, their only argument is that the advancement was too big

compact flame Dec 12, 2025, 3:14 PM

#

neat apex People genuinely believing that Gemini 3 pro is significantly bigger than Gemini 2...

The only thing that makes me laugh is chatgpt downgrade

meager harbor Dec 12, 2025, 3:14 PM

#

proud bobcat 50 elo is quite a step up

not enough, 100 elo is the minimum to make people switch; for now, fewer usage limits and a better UI are what get people to switch

zealous sparrow Dec 12, 2025, 3:14 PM

#

so basically?

neat apex Dec 12, 2025, 3:14 PM

#

That makes me saddly sad

#

I like gpt 5.1 high because on rare occasions it manages to find an answer thanks to its bigger reasoning

zealous sparrow Dec 12, 2025, 3:14 PM

#

zealous sparrow so basically?

ah i know now

neat apex Dec 12, 2025, 3:15 PM

#

Deepseek 3.2 Speciale is comparable, but it's too slow

#

Haiku extended thinking is lazy ah

viscid echo Dec 12, 2025, 3:15 PM

#

lmarena Why are there so many errors?

compact flame Dec 12, 2025, 3:15 PM

#

neat apex Deepseek 3.2 Speciale is comparable, but it's too slow

Bro it was thinking for straight 15 minutes for me

#

And it ran out of context before answering my question

neat apex Dec 12, 2025, 3:16 PM

#

I hope you are right and not these million people saying it is worse

zealous sparrow Dec 12, 2025, 3:16 PM

#

what exactly do you want me to ask for the needle in a haystack test

neat apex Dec 12, 2025, 3:16 PM

#

I am only expecting it to be the same level, but have a functional Xtra High mode

viscid echo Dec 12, 2025, 3:16 PM

#

lmarena is giving me too many errors

zealous sparrow Dec 12, 2025, 3:16 PM

#

ah

neat apex Dec 12, 2025, 3:16 PM

#

Yeah, needle test is not hard to do

#

Like, put in your university documents and ask it to find the content you want

#

You mean data resolutioning, it's not the best model to catch data at all

compact flame Dec 12, 2025, 3:18 PM

#

neat apex You mean data resolutioning, it's not the best model to catch data at all

Hey how to become arena champion?

neat apex Dec 12, 2025, 3:18 PM

#

Even Gemini 2.5 flash 09 that is 300% more assertive is not that good

proud bobcat Dec 12, 2025, 3:18 PM

#

Apparently it can’t do things 5.1 was able to do

#

It’s defo benchmaxxed

#

What the hell does FUD mean

#

Fear Uncertainty and Doubt???

#

What is bro talking about

#

These are community observations that show 5.2 completely blows at tasks 5 and 5.1 could do

#

5.2 fails nearly every independent benchmark

#

Bet give me a second

echo sinew Dec 12, 2025, 3:21 PM

#

viscid echo lmarena is giving me too many errors

Hello! Sorry to hear about these issues. Other users have also reported to have encountered more errors lately. The team in charge is looking into it to find a fix. You can also read the https://discord.com/channels/1340554757349179412/1343291835845578853 forums and see if your issue has already been reported. If you don't see a post related to your issue, you can make your own post and explain what's happening. If you can provide screenshots of the errors, that would be helpful.

neat apex Dec 12, 2025, 3:21 PM

#

compact flame Hey how to become arena champion?

You need to have more than 1500 elo at human arena in the contest, i had only 2200 elo cuz i was sick

#

I am joking, it's because i was in the server before they downgraded the newgens

#

If you're active in the server they will eventually give it to you

proud bobcat Dec 12, 2025, 3:22 PM

#

For one 5.2 underperformed in creative writing benchmarks, fact evaluations

#

Even on one example performing worse than 5.1

#

I need to scour for the post but I saw it earlier today

neat apex Dec 12, 2025, 3:23 PM

#

It's because it refuses way more in my little test

proud bobcat Dec 12, 2025, 3:23 PM

#

People have been complaining 5.2 explains things worse than 5.1 on average

neat apex Dec 12, 2025, 3:23 PM

#

Like 3.5 sonnet to 3.7, it gained more personality but also cowardice

zealous sparrow Dec 12, 2025, 3:23 PM

#

needle haystack bench aint that hard, even haiku 4.5 can do it

neat apex Dec 12, 2025, 3:24 PM

#

zealous sparrow needle haystack bench aint that hard, even haiku 4.5 can do it

They allegedly say that it went flawless with 4 different needles at 250k tokens

#

Not only 1 or 2

zealous sparrow Dec 12, 2025, 3:24 PM

#

neat apex They allegedly say that it went flawless with 4 different needles at 250k tokens

4 needles huh

#

aight ima add 4 needles then

viscid echo Dec 12, 2025, 3:24 PM

#

echo sinew Hello! Sorry to hear about these issues. Other users have also reported to have ...

Oh, I see Thanks for your reply

zealous sparrow Dec 12, 2025, 3:25 PM

#

this one was easy with just one needle

proud bobcat Dec 12, 2025, 3:25 PM

#

Christ give me a second

#

I’m not pulling this from my ass you know

neat apex Dec 12, 2025, 3:25 PM

#

zealous sparrow this one was easy with just one needle

Tf you dooing

zealous sparrow Dec 12, 2025, 3:25 PM

#

neat apex Tf you dooing

pasting huge ass textwalls and telling it to find words

neat apex Dec 12, 2025, 3:25 PM

#

Even a calculator can find a word between hashes

zealous sparrow Dec 12, 2025, 3:25 PM

#

neat apex Even a calculator can find a word between hashes

what cant they find then

#

give me an example

neat apex Dec 12, 2025, 3:25 PM

#

They mean an actual text, and actual information

#

Like, what John said about Lisa in the text?

zealous sparrow Dec 12, 2025, 3:26 PM

#

neat apex Like, what John said about Lisa in the text?

so just give it texts and ask it questions?

neat apex Dec 12, 2025, 3:26 PM

#

Yeah, its a good sign, but easily uselessly benchmaxxed

zealous sparrow Dec 12, 2025, 3:26 PM

#

thats easy as hell..

proud bobcat Dec 12, 2025, 3:27 PM

#

I give up

#

I can’t find the two posts

zealous sparrow Dec 12, 2025, 3:27 PM

#

why so

neat apex Dec 12, 2025, 3:27 PM

#

zealous sparrow thats easy as hell..

Not that easy if you ask something that is not explicit and multiple things

zealous sparrow Dec 12, 2025, 3:27 PM

#

neat apex Not that easy if you ask something that is not explicit and multiple things

so have multiple details in the question and ask it to find one

proud bobcat Dec 12, 2025, 3:27 PM

#

The point STILL stands though

neat apex Dec 12, 2025, 3:27 PM

#

Can be

proud bobcat Dec 12, 2025, 3:27 PM

#

5.2 is benchmaxxed to a degree

#

No one can deny that

neat apex Dec 12, 2025, 3:28 PM

#

They say if you ask 4 different details it is supposed to go flawless

zealous sparrow Dec 12, 2025, 3:28 PM

#

neat apex They say if you ask 4 different details it is supposed to go flawless

give me an example

#

im uncreative af

proud bobcat Dec 12, 2025, 3:28 PM

#

You cannot expect anyone to believe 5.2 naturally just got an above 50% on arc agi 2

neat apex Dec 12, 2025, 3:28 PM

#

Get your documents and ask there you missed answer a thing

#

It's an example

#

Yes because its Gemini 3 pro

zealous sparrow Dec 12, 2025, 3:29 PM

#

neat apex Get your documents and ask there you missed answer a thing

i need like a prompt

proud bobcat Dec 12, 2025, 3:30 PM

#

???

neat apex Dec 12, 2025, 3:30 PM

#

Ask gpt lmao

zealous sparrow Dec 12, 2025, 3:30 PM

#

neat apex Ask gpt lmao

ask it what

proud bobcat Dec 12, 2025, 3:30 PM

#

31% is a vast difference to 50%

neat apex Dec 12, 2025, 3:30 PM

#

And Gemini 3 is Gemini 3

proud bobcat Dec 12, 2025, 3:30 PM

#

What I’m saying is that 5.2 was clearly maxed for this

neat apex Dec 12, 2025, 3:30 PM

#

Gpt 5.2 is a bare improvement, if its true

proud bobcat Dec 12, 2025, 3:31 PM

#

Brother

neat apex Dec 12, 2025, 3:31 PM

#

They benchmaxxed that 100%

proud bobcat Dec 12, 2025, 3:31 PM

#

5.2 incorrectly labeled parts of a pc

#

And that was the OFFICIAL VISION DEMO

#

Think about it

#

Somehow

#

Just somehow

#

5.1 was barely an improvement

#

It was like

#

A finetune

#

Now 5.2 magically gets every single benchmark

#

If 5.2 is so good

neat apex Dec 12, 2025, 3:32 PM

#

Buuut, it showed nothing like that in real life

proud bobcat Dec 12, 2025, 3:32 PM

#

Why were 5 and 5.1 ASS?

proud bobcat Dec 12, 2025, 3:32 PM

#

neat apex Buuut, it showed nothing like that in real life

Yes

#

Exactly my point

neat apex Dec 12, 2025, 3:32 PM

#

Unlike Gemini 3 and Opus 4.5

proud bobcat Dec 12, 2025, 3:32 PM

#

Holy hod this is ragebait

neat apex Dec 12, 2025, 3:33 PM

#

Yeah, it is ragebait

sour spear Dec 12, 2025, 3:33 PM

#

proud bobcat A finetune

Not just like a finetune. It was a finetune. They tried to get rid of the model's terrible corpo HR tone, which frankly, they still haven't quite managed to do yet.

proud bobcat Dec 12, 2025, 3:33 PM

#

sour spear Not just like a finetune. It was a finetune. They tried to get rid of the model'...

Yes!

neat apex Dec 12, 2025, 3:33 PM

#

It puts way more effort into answering

proud bobcat Dec 12, 2025, 3:33 PM

#

You cannot expect me to believe 5.2 magically just became great within months

neat apex Dec 12, 2025, 3:33 PM

#

Gpt 5 was the second laziest model ever, just behind prime gpt 4o mini

proud bobcat Dec 12, 2025, 3:33 PM

#

What was openai doing before?

#

Tickling their ass cheeks?

neat apex Dec 12, 2025, 3:34 PM

#

proud bobcat Tickling their ass cheeks?

Buying Nvidia shares

proud bobcat Dec 12, 2025, 3:34 PM

#

Now they just rushed this extremely maxxed model so they can say: “we’re in the race!”

neat apex Dec 12, 2025, 3:34 PM

#

From o3, gpt 5.2 just improved 20% at most

#

Insane

echo sinew Dec 12, 2025, 3:34 PM

#

compact flame Hey how to become arena champion?

Hello! Please check this post in the Announcements channel: #announcements message

queen veldt Dec 12, 2025, 3:35 PM

#

Gpt 5.2 is a lie

#

They probably bought that place on the graphs

proud bobcat Dec 12, 2025, 3:35 PM

#

neat apex From o3, gpt 5.2 just improved 20% at most

It’s still very good but I feel that with 5, they stopped training their models on valuable data

proud bobcat Dec 12, 2025, 3:35 PM

#

queen veldt Gpt 5.2 is a lie

Tell that to Craig

#

GPT die hard fan

queen veldt Dec 12, 2025, 3:36 PM

#

We tried to make it count the tomatoes in this image

#

It said 43 or something

#

Even got errors

neat apex Dec 12, 2025, 3:36 PM

#

It miscounted, 10 less than gpt 5.1

queen veldt Dec 12, 2025, 3:36 PM

#

Meanwhile gemini correctly counted 69 tomatoes

proud bobcat Dec 12, 2025, 3:36 PM

#

proud bobcat Tell that to Craig

I couldn’t find like two posts that got buried and apparently now my argument is invalid 😭

neat apex Dec 12, 2025, 3:36 PM

#

It said 53 and gpt 5.1 said 63

queen veldt Dec 12, 2025, 3:36 PM

#

zealous sparrow Dec 12, 2025, 3:36 PM

#

queen veldt

gemini's OCR is too advanced

#

so yeah

queen veldt Dec 12, 2025, 3:36 PM

#

zealous sparrow Dec 12, 2025, 3:36 PM

#

losing battle for OAi there

neat apex Dec 12, 2025, 3:36 PM

#

Gemini OCR AND intelligence is high

queen veldt Dec 12, 2025, 3:37 PM

#

#

It wasted 3 minutes + got errors and in the end it was incorrect

#

It guessed the number

neat apex Dec 12, 2025, 3:37 PM

#

Check o3, how many he counts

proud bobcat Dec 12, 2025, 3:37 PM

#

Is o3 vision

queen veldt Dec 12, 2025, 3:38 PM

#

Already did

#

#

He said 49

#

He tried to do the grid counting

#

#

#

o3 even went on shutterstock to search for images of tomatoes 💀

neat apex Dec 12, 2025, 3:39 PM

#

💀

#

Grok 4.1 has mediocre OCR but at least it's cheap and fast

proud bobcat Dec 12, 2025, 3:40 PM

#

queen veldt Dec 12, 2025, 3:44 PM

#

Tbh i didn't count the tomatoes

#

But guy who sent me this tomato image says it has 69

burnt pulsar Dec 12, 2025, 3:45 PM

#

I'm having trouble getting gpt-5.2-high to work at all, are there known issues at the moment with that model?

queen veldt Dec 12, 2025, 3:45 PM

#

Some guy even ran gpt 5.2 pro on it and got incorrect

neat apex Dec 12, 2025, 3:45 PM

#

Qwen 3 Max after thinking for half an hour said 71

#

Way closer than gpt xd

queen veldt Dec 12, 2025, 3:46 PM

#

#

SOTA btw

neat apex Dec 12, 2025, 3:46 PM

#

Its gpt 5.1

#

Answering "Sup" to "Sup" is VERY weird

queen veldt Dec 12, 2025, 3:48 PM

#

No it was 5.2

#

#

It should've been helpful like Hello what do you need help with today or something

neat apex Dec 12, 2025, 3:49 PM

#

Well, Hello to Hello is acceptable

#

But Sup to Sup?

zealous sparrow Dec 12, 2025, 3:50 PM

#

5.1 does Sup to sup better
epic guy is my uh OAI account name

proud bobcat Dec 12, 2025, 3:52 PM

#

queen veldt

I mean

#

It’s casual

#

Ig

zealous sparrow Dec 12, 2025, 3:54 PM

#

2/3 on my 3 questions

#

it got the Mary and Kevin question wrong

#

hard question tho so like i understand it

#

but assuming they are animals is meh, maybe i should specify

compact flame Dec 12, 2025, 3:55 PM

#

proud bobcat

Honestly I think it mistakes tomatoes for pumpkins or other stuff

#

Because some of those I really could mistake for a watermelon, pepper or just pumpkin

zealous sparrow Dec 12, 2025, 3:57 PM

#

zealous sparrow 2/3 on my 3 questions

@deep adder other models even december-chatbot which is an OAI model weren't close
this is a google one

neat apex Dec 12, 2025, 3:57 PM

#

zealous sparrow but assuming they are animals is meh, maybe i should specify

No, you should not

zealous sparrow Dec 12, 2025, 3:57 PM

#

neat apex No, you should not

you think?

neat apex Dec 12, 2025, 3:57 PM

#

It's the AI that should discover it

zealous sparrow Dec 12, 2025, 3:57 PM

#

neat apex It's the AI that should discover it

I mean yeah, if you specify like every AI will ace

#

opus 4.1 got 3/3

#

smart ahh

proud bobcat Dec 12, 2025, 4:03 PM

#

Opus so peak

hushed gyro Dec 12, 2025, 4:04 PM

#

chat why is NB Pro so unstable in LMArena

proud bobcat Dec 12, 2025, 4:04 PM

#

https://tenor.com/view/homelander-homelander-the-boys-homelander-sad-homelander-its-peak-homelander-peak-gif-14542009839452529163

Tenor

proud bobcat Dec 12, 2025, 4:04 PM

#

hushed gyro chat why is NB Pro so unstable in LMArena

Provider issues

zealous sparrow Dec 12, 2025, 4:05 PM

#

zealous sparrow opus 4.1 got 3/3

any opus model can do my questions atp

#

kevin taping jake isnt really the answer but

#

ill take it

hushed gyro Dec 12, 2025, 4:06 PM

#

proud bobcat Provider issues

where can I then use NB Pro for free?

compact flame Dec 12, 2025, 4:06 PM

#

hushed gyro where can I then use NB Pro for free?

Images

thorn path Dec 12, 2025, 4:07 PM

#

What time does the leaderboard usually update for lmarena? I'm very curious to see if Gemini still holds its title here

zealous sparrow Dec 12, 2025, 4:07 PM

#

this model failed, which i dont know the identity of

compact flame Dec 12, 2025, 4:08 PM

#

thorn path What time does the leaderboard usually update for lmarena? I'm very curious to s...

Like every 24 hours I think

zealous sparrow Dec 12, 2025, 4:08 PM

#

compact flame Like every 24 hours I think

no

#

when new models come out

proud bobcat Dec 12, 2025, 4:10 PM

#

hushed gyro where can I then use NB Pro for free?

well

#

you get what you get here ig

compact flame Dec 12, 2025, 4:11 PM

#

zealous sparrow when new models come out

Oh okay

hushed gyro Dec 12, 2025, 4:12 PM

#

compact flame Images

google images?

zealous sparrow Dec 12, 2025, 4:12 PM

#

zealous sparrow 2/3 on my 3 questions

@deep adder This other google model got 3/3

compact flame Dec 12, 2025, 4:12 PM

#

hushed gyro google images?

Uhh pro pro can explain

#

Or lemme get a pic rq

hushed gyro Dec 12, 2025, 4:13 PM

#

compact flame Uhh pro pro can explain

@zealous sparrow what does he mean by Images?

zealous sparrow Dec 12, 2025, 4:13 PM

#

hushed gyro @zealous sparrow what does he mean by Images?

imagearena on LMArena

compact flame Dec 12, 2025, 4:13 PM

#

hushed gyro @zealous sparrow what does he mean by Images?

#

If you want to generate images

hushed gyro Dec 12, 2025, 4:13 PM

#

zealous sparrow imagearena on LMArena

Yeah I tried to use that on lmarena, it just pops up smth is wrong

zealous sparrow Dec 12, 2025, 4:14 PM

#

hushed gyro Yeah I tried to use that on lmarena, it just pops up smth is wrong

you are either ratelimited or reused prompt 3 times

pseudo hemlock Dec 12, 2025, 4:15 PM

#

Do the lmarena people pay for our chats?

zealous sparrow Dec 12, 2025, 4:15 PM

#

pseudo hemlock Do the lmarena people pay for our chats?

all the direct models yes

#

battle anonymous models no

pseudo hemlock Dec 12, 2025, 4:15 PM

#

Wait that’s insane

pseudo hemlock Dec 12, 2025, 4:15 PM

#

zealous sparrow battle anonymous models no

How does that work

#

Do they have a fancy api key from companies or something so they don’t pay?

zealous sparrow Dec 12, 2025, 4:17 PM

#

pseudo hemlock Do they have a fancy api key from companies or something so they don’t pay?

wish i knew

compact flame Dec 12, 2025, 4:17 PM

#

pseudo hemlock Do they have a fancy api key from companies or something so they don’t pay?

I assume they do pay they just get funded or invested in

zealous sparrow Dec 12, 2025, 4:18 PM

#

ok brb goin to come up with 5 different questions for LLM Testing

#

ranking so far is
3/3 Opus 4.5/4.1, fiercefalcon
2/3 ghostfalcon

#

0/3 multiple other models i forgot

compact flame Dec 12, 2025, 4:18 PM

#

Aw dang it nano banana doesn't know who Ahab is

#

Anyways

#

Or maybe it does idk

zealous sparrow Dec 12, 2025, 4:19 PM

#

wish the LLMs good luck with this one
answer is that the ticket is forged

#

no LLM will come up with the idea

neon idol Dec 12, 2025, 4:20 PM

#

@echo aurora yo bud

compact flame Dec 12, 2025, 4:20 PM

#

zealous sparrow wish the LLMs good luck with this one answer is that the ticket is forged

I wonder what is the answer

neon idol Dec 12, 2025, 4:20 PM

#

nbp off

zealous sparrow Dec 12, 2025, 4:20 PM

#

compact flame I wonder what is the answer

i highly doubt an LLM will get this right

compact flame Dec 12, 2025, 4:21 PM

#

zealous sparrow i highly doubt an LLM will get this right

What is LLM I didn't hear about that model

echo aurora Dec 12, 2025, 4:21 PM

#

neon idol @echo aurora yo bud

ablobwave

zealous sparrow Dec 12, 2025, 4:21 PM

#

compact flame What is LLM I didn't hear about that model

LLM stands for Large Language Model

#

basically

#

any

compact flame Dec 12, 2025, 4:21 PM

#

zealous sparrow LLM stands for Large Language Model

Oh alright

zealous sparrow Dec 12, 2025, 4:22 PM

#

Here we go i ran the 3 questions

#

imo no one will ace

neon idol Dec 12, 2025, 4:23 PM

#

echo aurora ablobwave

we need your powerful magic trick to fix nbp

zealous sparrow Dec 12, 2025, 4:23 PM

#

bro wdym Miles is not a person, Miles is an english name!

echo aurora Dec 12, 2025, 4:24 PM

#

neon idol we need your powerful magic trick to fix nbp

I've flagged to the team the higher than usual error rate. We'll have to wait for a solution which is being worked on.

neon idol Dec 12, 2025, 4:24 PM

#

echo aurora I've flagged to the team the higher than usual error rate. We'll have to wait fo...

nice thx u