#general
ngl arc agi 2 50% is kinda crazy
It's great on creative writing so far, especially in ghostwriting, that is one I guess? It's a major improvement over the old models and also... hmm It kinda topped Gemini 3 pro and Claude atm (imo). Only for creative writing, since I don't code, nor understand code in general.
There have been some minigames made with GPT 5.2 if you scroll up; it seems to have kinda failed in some parts. The one with the bullet hell game.
if you're looking for coding bench that is
It's a good model that's all.
Guys why does using adolf on Sora unlock almost every IP? O.o
But without it it gets blocked
Pokémon dragon ball z
U name it
Let’s see if it can do sonic
Lie
benchmaxxed
Benches mean nothing until you actually try the model
Idk good question
Gpt 5.2 can't beat opus that easy
I tried few prompts and got similar (wrong) answer as previous model
Is this true.?
ARC-AGI 2 (Abstraction and Reasoning Corpus for Artificial General Intelligence 2) is an artificial intelligence benchmark designed to measure genuine reasoning and problem-solving capabilities in AI systems. Released on March 26, 2025, by the ARC Prize Foundation, it serves as a critical test for progress toward artificial general intelligence ...
lol yup sonic works 😅😅
See I figured it out
But only works with adolf 😭
🙁
what prompt
Any of them try to do it without him see if it’s even possible. I’ve seen it on Reddit, but I just don’t know what the heck they did.
on api?
App
For sonic sequence, I used “The scene opens with him running into a blue fast super fast a hedgehog that’s faster then a sonic plane”
Maybe you need a cameo character to make it work
why 4 seconds
I cut it down
Here is the original
The technique I’m using right now is just trying to make a bunch of sequences
See where exactly the filter catches and misses the ip
I don’t know, though I’ve been more confused lately than I’ve been able to get answers
too much trial and error, in the end, u realize its random
the guy on reddit used character cameo for sonic
not like any other ai company doesnt do that
europe is gonna be behind in the ai race
Its cause Europe has strong data laws
They actually care about there citizens privacy and data to some extent
Internet is full of scrappers, all ai companies guilty of this
gonna?
its not wrong
Happy now?
lmao
yes
let me check
wonder what made the diff
try grok fast
i wasnt signed up
wat
Lets try to keep conversation a bit less NSFW please.
Test ai design performance
Has anyone used gpt 5.2 for developing websites
not surprised, though I think there's a decent shot OpenAI is overpredicted here
Insiders
not in every company
Brah
if it's insiders, it's either OpenAI ones who really believe they've got something cooking, Google ones who think gem 3 is cooked, or I guess technically both
OpenAI doesn't have insider info on Google models and vice versa
also anthropic
You ever herd of app tracking lol
too vague
These companies track each other through apps ppl got installed
It’s a real thing
the model is still based on a 2024 model, they care more about boosting performance than upgrading world knowledge, don't expect it to know anything past mid-2024
their next base model will have more recent world knowledge, maybe enough to know what kimi k2 is at least
just use a reasoning model like you should use for everything
it might know 0905 though, afaik their bots have been highly active in the past few weeks
reasoning isn't going to help the fact that the base model doesn't know what kimi k2 is
but oh wait your sole usage of llms is to roleplay a virtual boyfriend so
Ya it is
Better then being reliant on it and asking, dumb, basic commonsense questions that people should know
web search would help as it can just search it
why would web search ever be disabled
$
free users 🥱
lol
20 bucks a month is your netflix subscription but you cant invest that in your boyfriend?
Yeah some people are smarty enough not to waste money on a proto type
idk, gemini has it on all the time but having a model search the web does cost money for a company like openai that doesn't have its own index, although i guess their hoard of scraped pages will be fine
funny how youve dropped 2 messages about how im stupid and youre smart and you dont even know prototype is a single word
honestly, original k2 is probably better, 0905 did put more effort into coding and stuff so rp performance is slightly degraded
have never seen it not web search when i see in its reasoning that it doesnt know what im talking about
so must be a free user thing
i found that original k2 does do better at outputting song lyrics from its knowledge, so rp may be degraded
i have paid gemini and it does this, so that's irrelevant
american tiktok brains are painfully unfunny and lack self awareness
Bless your arrogance and your enlightening ignorance
i scroll down 5 messages in your history and you sent another ignorant message
saying the coding model market is insiders
you dont even read what the market is based off
you have no clue
then another 3 messages down you spell scrapers as scrappers
proto type and scrappers
you cant even write english bro
@here Hey guys, please let's try to be respectful in the chat. Thank you.
what is this
gpt 5.2 is pretty much equal to gemini 3 right?
release date?
and you were talking about the coding model market
this one sure ill give you that its probably insiders
a portion
but the coding market isnt
This is all I was talking about
Yes same concept
Actually rather than theorycrafting or things like misinformation, you guys knew that you can compare both models in LMarena without searching them in battle right? Side by Side comparison is there.
Just test it yourself fam, GPT 5.2 and Gemini 3 pro
This is something completely different
read
how come it's different?
the moment the leaderboard updates the market will have these massive jumps
so it's not that odd at all actually
for 5.1 codex max
Not asking about the model but the company
what does that even mean
^yeah
look at that message yourself and reflect on how you are typing this
btw, just because you have insider info, doesn't mean you are always correct, unless the big 3 AI companies came together and agreed one should just have the best coding model, there wouldn't be any insider's that would have enough info to predict that
Insiders can still use their info to help them make bets
but its not as guaranteed as other bets would be
his whole insider point can be disregarded since he has no clue how that particular market resolves (or, to be honest, I doubt he has any clue on anything)
I normally hate generalizing people, but I think you're just malding after a lost bet and money on polymarket, in which... is not all relevant on this whole discord server
that guys a clown
for example, if I knew my company "rex AI" was about to release a banger coding model, I'd be interested in polymarket bets that have rex AI as a noncontender for the best coding model, but I can't guarantee the new "Rex 2.2 ulite ultra" will actually win
wrong about everything cant write basic english and is calling me a idiot and himself smart
"heres a random video of a equally clueless person to me to justify my view"
gehlo, openAI JUST released a coding-first model
the "something" is pretty obvious
Exactly
i said this to him and he says uhhhhh "Not asking about the model but the company"
lol
It is obvious
yeah
you^
so obvious, you don't need to be an insider to know that
guys how's gpt 5.2?
depends
hmm
what do you want to do with it?
math, solving, studying
^
Just don't ask it how many r's are in garlic
lmao
all models are pretty much 99% there for university level maths
not much diff
graphical problems are the main weakness
hmm true. basically every model can solve any JEE level question now
OpenAI supposedly has been focusing on decreasing performance degradation over long context windows
it looked impressive on the graph
on their promo page
near 100% perfect context
kinda
actually huge improvement
Iunno about that, I see the uhh what you call it again, the model card?
but that was with their own internal tests, and it was with 4 needles
I think it's 78% recall at 256k context
or even lower
They even said it themselves
on their web
It's a huge improvement yes
if true
i think its probably best in market though
thats probably 8 needles
5.1 probably had the shortest life out of any LLM released
around 15 days lol
idk, nobody really has access to MRCRv2 as far as I can tell
so its impossible to test
whats open source benchmark for context
Sadly i'm not going to self bench that context retention, lol
EQ Bench?
that's what gem 3 does to a Sam Altman in the wild
Longform Writing v3? you mean
so the posted benchmarks from openai are again painting an inaccurate picture, when we look at livebench and lmarena
longform writing
idk, 5.2 isn't on text arena yet
and long context window doesn't really matter much in lmarena
yeah i mean the webarena
second ain't bad
yeah the UI starts to lag really badly after some time
though that happens with every UI
the posted benchmarks make it seem like clear nr1 sota
wait, I thought coders need it? Since they probably one shot apps and things like that, that can't possibly low in tokens right?
not really a 5.2 problem then?
not relevant in chat
in ide its big deal
text arena I mostly mean
but lmarena code tests dont go crazy context lengths
even then, I'd bet most webdev projects on lmarena are only a couple turns max
I had a massive project that spanned multiple webdev chats and multiple days, I'd have loved better context
I had a feeling Gemini is still the king of long context analysis
mostly cause it has a larger context window
not because its performance doesnt degrade fast
as always the 3 labs are very comparable to each other, none is much better than the other
Yeah, it's probably not as landslide difference too
this has been the case for more than half a year now
Bottle necked
idk, the models kinda differ more and more
gemini context window is kinda fake from my experience though
i had it watch a 900k token video and it just couldn't accurately tell me anything
completely hallucinating
it claims 1m
yeah and it has
how is gpt 5.2 in creative writing
Probably ahh
i'd concur
meh
no good model for writing as
AI models weren't trained mainly off that
they should make a model specifically for that though, doesn't matter which company
no point though
its already good enough for copywriting and its extremely difficult to train for good creative writing
only the roleplayer market has anything to benefit from a model made for creative writing
and roleplayers are cheapskates
lol
Might be
some truth to that
Also could be that they are afraid of getting copyright lawsuits from writers which has happened in some cases already thus why it’s not in that stats.
There is high demand for creative writing, and it is something that people ask about.
is 5.2 available for free users right now?
I want a leaderboard that ranks how often Assistant A is chosen over Assistant B
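A leaderboard like that is basically a pairwise win-rate table. A minimal sketch, assuming you have a log of votes as `(model_a, model_b, winner)` tuples; the tuple layout and the `pairwise_win_rates` name are made up for illustration and are not LMArena's data format or API:

```python
from collections import defaultdict


def pairwise_win_rates(votes):
    """Return {(x, y): share of x-vs-y battles that x won}.

    votes: iterable of (model_a, model_b, winner), where winner is
    "a", "b", or "tie". A tie counts as half a win for each side.
    """
    wins = defaultdict(float)
    games = defaultdict(int)
    for a, b, winner in votes:
        # Count the battle once from each model's point of view.
        games[(a, b)] += 1
        games[(b, a)] += 1
        if winner == "a":
            wins[(a, b)] += 1.0
        elif winner == "b":
            wins[(b, a)] += 1.0
        else:
            wins[(a, b)] += 0.5
            wins[(b, a)] += 0.5
    return {pair: wins[pair] / n for pair, n in games.items()}


# Hypothetical vote log, purely for illustration.
example_votes = [
    ("Assistant A", "Assistant B", "a"),
    ("Assistant A", "Assistant B", "b"),
    ("Assistant B", "Assistant A", "b"),
    ("Assistant A", "Assistant B", "tie"),
]
rates = pairwise_win_rates(example_votes)
```

Sorting `rates` by value would give the "how often A is chosen over B" ranking directly, without going through a rating model.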
try using lmarena
always timeout for me
@echo aurora can I ask is the Gemini 3 on the site just the default parameters
Because it seems to generate stuff a bit different from the API
heavy usage with new models are part of that I'm sure.
i have that a lot, even before gpt 5.2
I get it with 5.1 search off and on, always seems to be a GPT thing more than the others, not sure why. They'll figure it out, the smartest AI people in the world work here (and the most important)
How do I try the new gemini deep research model
👀 just wondering, how do u know?
use the API
they don't do deep research here, most DRs are vastly overrated anyway. Great thinking and great search is the key to a great LLM
Its not on the API
deep-research-pro-preview-12-2025
hy
gpt 5.2 looks so good just going by benchmarks, how can there be such a jump on ARC AGI
LMARENA DOESN’T CARE ABOUT YOUR BENCHMARKS 😤
WE ARE THE BENCHMARK
we need gpt 5.2 x-high
its only in api
how is grok winning against openai bruh, grok makes simple websites
so? lmarena can add model from api
How so?
Because Grok is kind of like AI for Dummies. So the casuals usually vote for it. It's all about the audience. And even though there are experts everywhere in here, casuals probably account for 80-90% of the votes
Which is a good thing. These companies want to attract the "once in awhile" users. There are TONS. We just get caught up because Discord is full of heavy users. But it's not real world
guys AGI is real. Sam created a real monster.
🤔
came up 5 times and it always lost for me
but I think codename-discussion/speculation is supposed to be in #codename-discussion
🤔
sure
?
Hello
hello LMArena
Hi
hi
are we getting gpt 5.2 pro ? or just gpt 5.2 high like deepseek's 3.2 thinking and the speciale
Pro is too expensive
Won't get added
No Pro so far
how do they have opus 4.5
And pro reasoned for 20 minutes and failed a math test bruh
So pro version is not really useful
Because it's kinda cheap
Well not that cheap but still cheaper than pro
so literally 3.2 speciale with the apple treatment design
Yeah
it's like a 3.2 speciale with makeup vro
Like look
It's supposed to be 96
Yet pro said 99 while thinking for straight 20 minutes
show me the prompt vro
1,7,18,45,?
Gemini answered right tho
So yeah there no point in pro versions of gpt honestly
share solution
.
hello
chatGPT 5.2 is not good at maths imo
LMArena is not working for me right now!
Hello,
Please fix the major issues on the lmarena.ai website as soon as possible. These problems occur on all browsers (including Chrome, the main mobile browser) and on both mobile and desktop. The Kiwi browser also has the same issues.
- It is not possible to copy the text that we type ourselves.
- When sending a message in the chat, the copy option appears, but no text is saved to the mobile clipboard.
- Generated images cannot be downloaded.
- Taking screenshots of the website pages is not possible.
- The captcha takes a very long time to load, and accessing the site is slow. Also, it asks to solve the captcha every time an image is generated; please remove the captcha to provide a better user experience.
- Most of the time, images fail to generate and show an error, both when registered and without registration. Please fix this issue so that image generation always works properly.
These problems only occur on lmarena.ai. Other websites work normally without any issues.
My device: Poco X6 Pro (Global) — Android 15, HyperOS 2.0.207.0.
Please resolve these important issues as soon as possible to provide a better user experience.
Thank you.
copy works for me, btw, and saving images works for me. I think it's your browser/config
Everything works for me. Try a different browser or redownload the browser.
- you should be able to copy input text
- it should be copied
- you should be able to download images
- you should be able to take screenshots
100% a you problem, sorry to say
When the main browser doesn’t work, there are no issues on other websites — these problems only occur on lmarena.ai.
what browser are you using amir?
let me try all of these things rn with lmarena on my phone
and/or device?
When the main browser, Chrome, doesn’t work, there are no issues on other websites — these problems only occur on lmarena.ai. The Kiwi browser also doesn’t work.
Device: poco x6 pro
all 4 things you mentioned works fine for me, Galaxy A35, Chrome, Android 16
i would maybe suggest scanning your phone for malware?
malware can mess up your ability to copy, screenshot, and download
also check to make sure your phone's storage space isn't full, that can also cause those problems
also side note: i never used the mobile lmarena website before until today and I have to say that it is a very nice and smooth website, I actually wouldn't mind using this if I actually used it for more than messages because sadly this is 2025 and everything requires phone verification, face scanning that can only be done on a phone. etc. i understand the need, but it's too excessive and a real invasion of privacy in some cases
Gpt 5.2 is crazy
It's GPT-5.1.
Gpt 5.2 > Gemini 3 pro?
sdfui
i haven't gotten much direct comparison, have u?
would b interested to hear ur thought on y u'd think GPT 5.2 > Gemini 3 Pro
cant really test it in lmarena, direct chat with it leads to error everytime
No, definitely not. Even the supercharged GPT-5.2 High can't really outperform the regular end user Gemini 3 Pro, and "normal" 5.2 is definitely worse than Gemini 3 Pro. A good example of a "better" model is Claude 4.5 Opus, for coding tasks. Only very specifically there, but it's rather obvious. GPT-5.2 on the other hand is only trying to close the gap, without success so far.
No
GPT-5.2 is TERRIBLE😡😡😡😡😡
I didn't know you have paid subscription.
I got it for free
Yesterday
I was subbed before, then I canceled the sub 1 month ago
Yesterday i got this
GPT-5.1-high outperforms GPT-5.2 in ALL tasks!
OpenAI ashamed yesterday too much.
Yeah, it's a quick iteration to catch up with the competition, the third GPT-5 release in only 4 months. But nothing really groundbreakingly new. They need to refocus on ChatGPT, instead of spreading their resources thin on their Atlas browser or SoraTok.
Fr even 5.2 pro failed that one math test
It was thinking for straight 20 minutes and got it wrong
Who even uses that Atlas lmao
GPT-5.1-high counts my tomatoes MORE accurate than GPT-5.2! SHAME!!!!!
Sama
Those are NOT your tomatoes 😭
What about how good uh 5.2 high counts?
It's my prompt I use it all the time.
Is this correct?
Yes
Six nine damn
Gpt 3 pro is best
Gpt o3 pro?
Gemini********
Yeah
A little bit better. GPT-5.1-high says 49, GPT-5.2 says 47, GPT-5.2-high says 52.
if only they didn't nerf gemini 3
Hold up
i feel the context is nerfed, and the output is nerfed
Gpt 5.2 is writing code
What is bro trying to code there?
GTA clone
These gpt "High" are definitely high on drugs
This is accurate guys
Bro is trying to find the tomato folder
It's stuck 😭
And we got a winner!
Bro thinks he's human
3 minutes to count dam tomatoes
Clear winner is Gemini 3 Pro.
ok i did a brief count and i counted more than 54
69 is right answer.
I counted them a month ago.
I use this prompt every single day with different LLMs.
Probably?
Waiting for gpt pro to reason how many tomatoes he sees
Knowing chatgpt it might take another 20 minutes
Hey everyone
When I set a model to Code mode and ask for a Python script (e.g., something to run in Google Colab), it replies with HTML instead of plain Python code.
Is that normal in Code mode? Like, does it tend to default to HTML output?
Code mode is like for browser building stuff and app ones
You gotta use just the basic one and I recommend using opus
Damn he's still counting tomatoes
Yeah I give up on waiting it's counting for way too long
It's much faster to count them manually.
Yeah it's even examining some sorta crop changes
So yeah pro is definitely not counting these tomatoes anytime soon
haha it still gets a stroke from this
5.2 is tuff
again
https://gpt4free.pro
If you can reverse it, you deserve it
Nvm 5.2 is dogsh
hmm 5.2 ?
You have no right to write RUSSIAN here!!!!!!!😡
We are English-speaking community!
Don't make me call the mods
Hz
Hello
Hi
scam altman scammed us with 5.2 .... crazy benchmaxxed model on few benchmarks but doing worse in many other benchmarks and in real life use-cases.
openai classic
december-chatbot
first good then nerf
what is this december-chatbot?
huh
nbp hasnt been nerfed
yes it does
New model if anyone is interested; it seems this one eats a lot of tokens:
https://youtu.be/676EBGcv8YY?si=cgz7dz0OJdt_dZ5Q
In this video, I'll be telling you about g3, a revolutionary new AI coding tool based on adversarial cooperation that solves the context loss problem by making two AI agents fight each other to write better code. This is based on a groundbreaking research paper and represents a completely new paradigm for autonomous software development.
The only relevant Gpt is Gpt 5.2 xtra high, since the base model can lose even to glm 4.6 xd
Will it appear in the arena? It doesn't bug out like the Pro versions do
This is so inefficient. It might be cheaper to hire some real developers instead.
Ahh yes, Gpt 5.2 xtra high can cost $10 per message
That's not true. I sometimes get better results with gpt 5.2 medium than high or extra high.
Dammn, so Gpt 5.2 xtra high is a fraud?
Just go claude bro
it's clear that they only added xhigh to score high on benchmarks. it's too expensive, slow in real world use
Hi everyone, does anyone know how to connect an AI to Telegram bot so it answers based on a knowledge base? Also, are there any AIs that are free in terms of limits?
i gave decent shot to gpt 5.2 . i have plus subscription and damn gemini 3 is blowing it out of the water
guys why is img leaderboard always missing famous opensource models, for example z image turbo isnt there in the text to image arena
ik but why not add it ? its opensource and pretty cheap to host compared to some other img models
fun fact nano banana pro does peppino perfectly
#1372229840131985540 you can ask here
lmarena aren't hosting themselves anything. Everything is provider API. That would probably be on Alibaba (disguised as 'Tongyi-MAI' which is them) to contact lmarena and have their model entry
aah
lmarena is so fun because i can tell it pizza tower questions and which one it gets closest its better
Which is the best coding language for Opus-4.5?
Python or Java?
Yeah so it looks like the consensus is 5.2 is benchmaxxed
It’s scored far lower than gemini 3 and opus on almost all personal benchmarks
On some benchmarks 5.2 pro scores lower than 5.1 normal
Agi!!!!!
What
5.2 is a downgrade trust
How do they even manage to do this
deltarune pfp = based
Rushed
I’m Kris Dreemur irl trust
even gemini 3 is a downgrade in many aspects, it knows more than gemini 2.5 but when it doesn't know, it makes so much stuff up, a lot more than gemini 2.5
I'm peppino spaghetti trust me bro
My favorite part of pizza tower is when peppino came and said “it’s peppino time” and peppinoed everywhere
This was a serious question.
I never played deltarune but i played undertale
so i might consider playing the demo
..i'm interested in, what programming language is best for vibe-coding with Claude-4.5-Opus-Thinking.
Opus is just good overall in coding
There no best model
2026 will be the real test, i'm expecting things to stagnate. even in 2025 things were slowing down compared to 2024, just look at the top elo score in lm arena, which gained just 80 points in 2025 while the gain was double that in 2024
opus did this GREAT (i might try to make pizza tower level maker with this who knows)
2026 is the make or break year
If gemini 3.5 is peak it’s over for chatgpt
Gemini just consistently makes better models every time
GPT is a gamble
nano banana pro was peak enough
How did you guys become arena champions bruh
because i can finally generate pizza tower characters (except for some obscure ones
maybe in the future
We became champions of the arena
Honestly nano banana pro is so good
It is
yes
It even knows how to draw GTA 5 perfectly
I don’t like image models but credit where credit is due
it knows peppino spaghetti and the noise, finally
nah i don't think so, unless the new gemini is a breakthrough, having 50 elo more on lm arena is not sufficient to make people change their personal ai for something just slightly better
Yeah ig it knows alot
50 elo is quite a step up
Tbh
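For a rough sense of what those gaps mean: assuming arena scores follow the standard Elo convention (logistic curve on a 400-point scale), a rating gap maps to an expected head-to-head win rate. A minimal sketch; `expected_win_rate` is a hypothetical helper name, not anything from LMArena:

```python
def expected_win_rate(elo_gap):
    """Expected score of the higher-rated model under the standard Elo formula.

    elo_gap: rating of model X minus rating of model Y.
    """
    return 1.0 / (1.0 + 10 ** (-elo_gap / 400))


# Gaps mentioned in the chat: 50, 80 and 100 points.
for gap in (50, 80, 100):
    print(gap, round(expected_win_rate(gap), 3))
```

Under that assumption a 50-point lead works out to roughly a 57% head-to-head win rate and a 100-point lead to roughly 64%, so both sides of the argument are describing a real but not overwhelming preference.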
Personally though I don’t use ai that much
I only use DeepSeek primarily now for math and roleplay and GPT for quick alt scenario summaries
Though I might switch to DeepSeek for that too now because 3.2 has very natural language
Gpt 5.2?
Gemini I think
why on my phone lmarena not working? "no models found" problem
Check your modalities
Or
Reopen page
i reopen but still not working
how is gemini ?
Straight fire
what is peppino doing in the simpsons 
Odd
Is that peppino from the hit platformer pizza tower in the Simpson???
Simpsons Predicted Pizza Tower! 
Why china hasn’t released a video AI open source model yet? 😭
They
They have
Wan 2.2
oh, you like pizza tower too?
it's awesome game
even Pepperman?
Is that any good ?
Honestly this is crazy how much deep seek thinks
I swear it thinks more than does actual text
Bro 💀
Yeah this is the worlds best model.
Try Extra high one
Not available on lmarena
Yeah sadly honestly
5.1 is better than 5.2 ,
5.2 is just benchmaxxing
What is Higgsfield ?
No no you don’t understand
You need the extra high model
It’s too hard for gpt
It's been 10 minutes and bruh deep seek is still thinking
Speciale?
Did u ask the purpose of life or something
What
I swear I think I see speciale get paranoid in his thoughts
It’s pretty good
No veo 3 competitor
But it’s good
Speciale is only for like REALLY really big questions
What prompt did you give it
I asked it to create a simple anti cheat
And I don't think it counted simple
As a word
speciale dictionary:
simple = complex, as hell, made in 17 coding languages
Bro the wall of his damn thoughts is crazy
Gemini on the left is simple and fast bruh
Jeez it even starts coding in his own thoughts
I'm never using speciale again bruh this is just crazy
which ai in lmarena is the best for coding
opus 4.5, gemini 3 pro, new gemini 3 flash models [battlemode],
gpt 5.2 [rarely]
It isn't available on lmarena?
isn't, due to issues
prob will return once API is patched up
Maybe, it was available just for few hours on launch, no hope till now.
there are no gemini 3 flash and opus 4.5 in lmarena, or am I just blind
for opus 4.5 you are blind
gemini 3 flash is only in battle mode
as fiercefalcon or ghostfalcon
by opus 4.5, do you mean claude opus?
yes
gemini 3 flash will join the list of nice coding models
next to gem 3 pro
oh okay
but you know that we all just bench html
and not other
some models excel at python, but suck at html
it's a pity that opus 4.5 has a limit
how so
you mean model degradation?
yep :(
No one really knows how this happens
i think AGI will never happen
A company wouldnt make a model sh, because yes.
I doubt.
I think the issue was that, a lot of use was pushed onto gemini 3 pro.
Wearing it off a lot.
so, only Opus-4.5 remains
Since when?
the lonely coding-king
Opus is peak
What makes you think that?
Why are you so confident they wont
Ts ragebait
Every model degraded over time.
Even sonnet 3.5!
Sonnet 3.5 used to be a goat, then started to degrade.
Opus 4.5 is safe to quantize too because it will degrade the model very very little, while saving half the resources
Why wouldn’t it write comments in code to explain what each function does
Well yeah it’s an ai
It’s gonna overdo it
But you can just trim the comments
The code is still solid
do you guys still think, we get a coding-AGI in the future?
All models degrade
AGI isn’t real
Therefore, no
I swear speciale spends more time thinking than creating an actual thing
Remember this is for like crazy in-depth questions
Speciale is not meant for day to day use
oh, i meant an "AGCI" not an actual AGI
What is AGCI
I know it but when it gave me the final result it had no explanation just the plain script
Like
AGCI = artificial general coding intelligence
I think it spent all it's context on thinking
Yeah fair
is it better than g3p?
and what about GLM?
i hope Elon does something with g5
gem 3 will forever have the best OCR
OAI argued their 5.2 OCR is goated
and it missed on so much
No but it’s still an extremely good model
ocr without decent coding = meh
I wonder if extra high gpt is better than opus
are these the first symptoms that the AI bubble is about to burst?
I’m still wondering why they need extra high
It sounds like they’re out of options
Gpt the only ai I’ve seen that makes 50 different variations of the same model
It’s embarrassing
Next thing they'll add is probably Extra high Max pro ultra
*these:
- gpt sux
- gemini 3 sux now
- grok sux
Ultra Overclocked Pro Reasoning AGI Extra Speed Boost Mode
Fr
Makes sense
Theyre doing the same mistake meta did with llama 4
if only claude was not so expensive :/
grok sucks is actually accurate
gpt 5.2 sucks because of UI
i refuse to believe
Hm I wonder if there opus 64k thinking
What’s gem 3
Opus is the goat of thinking
I used to hate Claude but I like their models now
Yeah but I just wonder if there 64k version of it
Or they didn't release it yet or whatever
Maybe they’re working on it
Probably
Does GPT-5.2 and Nano Banana Pro have rate limits in Lmarena??
Why'd you ask?
You can test out yourself I guess
I need answers
I doubt 5.2 gpt has limits
And I've never reached nano banana limits
But don't take this as valid info since I didn't test it out properly
Especially due to how bad is gpt 5.2
Not humanly achievable limits; even if you somehow manage to hit them, they will only ask for a captcha to continue
Not sure about image model, but it is at very least 10 per minute
@deep adder If you dont believe the new gemini 3 flash models will be good give me a prompt to try with it
I wonder how good is Gemini deep think
People genuinely believing that Gemini 3 pro is significantly bigger than Gemini 2.5 makes me laugh
It does not make any sense, their only argument is that the advancement was too big
The only thing that makes me laugh is chatgpt downgrade
not enough, 100 elo is the minimum to make people switch; for now, looser usage limits and a better UI are what influence people to switch
so basically?
That makes me saddly sad
I like gpt 5.1 high because it very rarely times manages to find a answer thanks to his bigger reasoning
ah i know now
Deepseek 3.2 Speciale is comparable, but it's too slow
Haiku extended thinking is lazy ah
lmarena Why are there so many errors?
Bro it was thinking for straight 15 minutes for me
And it ran out of context before answering my question
I hope you are right and not these millions of people saying it is worse
what exactly do you want me to ask for the needle in a haystack test
I am only expecting to it be same level, but have a functional Xtra High mode
lmarena is giving me too many errors
ah
Yeah, the needle test is not hard to do
Like, put in your university documents and ask it to find the content you want
You mean data resolutioning, it's not the best model to catch data at all
Hey how to become arena champion?
Even Gemini 2.5 flash 09 that is 300% more assertive is not that good
Apparently it can’t do things 5.1 was able to do
It’s defo benchmaxxed
What the hell does FUD mean
Fear Uncertainty and Doubt???
What is bro talking about
These are community observations that show 5.2 completely blows at tasks 5 and 5.1 could do
5.2 fails nearly every independent benchmark
Bet give me a second
Hello! Sorry to hear about these issues. Other users have also reported to have encountered more errors lately. The team in charge is looking into it to find a fix. You can also read the https://discord.com/channels/1340554757349179412/1343291835845578853 forums and see if your issue has already been reported. If you don't see a post related to your issue, you can make your own post and explain what's happening. If you can provide screenshots of the errors, that would be helpful.
You need to have more than 1500 elo at human arena in the contest, i had only 2200 elo cuz i was sick
I am joking, it's because i was in the server before they downgraded the newgens
If you are active in the server they will eventually give it to you
For one, 5.2 underperformed in creative writing benchmarks and fact evaluations
Even on one example performing worse than 5.1
I need to scour for the post but I saw it earlier today
It's because it refuses way more in my little test
People have been complaining 5.2 explains things worse than 5.1 on average
Like 3.5 sonnet to 3.7, it gained more personality but also cowardice
needle haystack bench aint that hard, even haiku 4.5 can do it
They allegedly say that it went flawless with 4 different needles at 250k tokens
Not only 1 or 2
4 needles huh
aight ima add 4 needles then
Oh, I see Thanks for your reply
this one was easy with just one needle
Tf you doing
pasting huge ass textwalls and telling it to find words
Even a calculator can find a word between hashes
what cant they find then
give me an example
They mean an actual text, and actual information
Like, what John said about Lisa in the text?
so just give it texts and ask it questions?
Yeah, it's a good sign, but easily and uselessly benchmaxxed
thats easy as hell..
why so
Not that easy if you ask something that is not explicit and multiple things
so have multiple details in the question and ask it to find one
The point STILL stands though
Can be
They say if you ask 4 different details it is supposed to go flawless
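A rough sketch of how a multi-needle test like this could be set up: bury a few facts in filler text, ask about each one, and count how many the model gets back. Everything here (the filler, the needle sentences, the function names) is made up for illustration; it is not the setup from any official benchmark, and the call to the model itself is left out.

```python
import random

# Hypothetical filler and needles, invented for this sketch.
FILLER = [
    "The committee met again to review the minutes of the last meeting.",
    "Rain was expected in the afternoon according to the local forecast.",
    "The library extended its opening hours during the exam period.",
]
NEEDLES = [
    "John said Lisa left her blue umbrella at the station.",
    "Lisa told Mark the meeting was moved to Tuesday.",
    "Mark paid 42 dollars for the concert ticket.",
    "The concert ticket was printed on yellow paper.",
]
EXPECTED = ["blue umbrella", "Tuesday", "42", "yellow"]


def build_haystack(filler, needles, total_sentences=2000, seed=0):
    """Fill a long text with filler sentences and drop each needle at a random spot."""
    rng = random.Random(seed)
    sentences = [rng.choice(filler) for _ in range(total_sentences)]
    # Distinct positions, so no needle overwrites another.
    positions = rng.sample(range(total_sentences), len(needles))
    for pos, needle in zip(positions, needles):
        sentences[pos] = needle
    return " ".join(sentences)


def score_answers(answers, expected):
    """Count answers that contain the expected fact (case-insensitive substring match)."""
    return sum(
        1 for ans, exp in zip(answers, expected) if exp.lower() in ans.lower()
    )


haystack = build_haystack(FILLER, NEEDLES)
```

Paste `haystack` plus one question per needle into the model, collect the replies in a list, and pass them to `score_answers`. Substring matching is a crude grader; questions that need inference (not just lookup) would have to be checked by hand.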
give me an example
im uncreative af
You cannot expect anyone to believe 5.2 naturally just got an above 50% on arc agi 2
Get your documents and ask about something where you missed an answer
It's an example
Yes because its Gemini 3 pro
i need like a prompt
???
Ask gpt lmao
ask it what
31% is a vast difference to 50%
And Gemini 3 is Gemini 3
What I’m saying is that 5.2 was clearly maxed for this
Gpt 5.2 is barely an improvement, if it's true
Brother
They benchmaxxed that 100%
5.2 incorrectly labeled parts of a pc
And that was the OFFICIAL VISION DEMO
Think about it
Somehow
Just somehow
5.1 was barely an improvement
It was like
A finetune
Now 5.2 magically gets every single benchmark
If 5.2 is so good
Buuut, it showed nothing like that in real life
Why was 5 and 5.1 ASS?
Unlike Gemini 3 and Opus 4.5
Holy god this is ragebait
Yeah, it is ragebait
Not just like a finetune. It was a finetune. They tried to get rid of the model's terrible corpo HR tone, which frankly, they still haven't quite managed to do yet.
Yes!
It puts way more effort into answering
You cannot expect me to believe 5.2 magically just became great within months
Gpt 5 was the second laziest model ever, just behind prime gpt 4o mini
Buying Nvidia stock
Now they just rushed this extremely maxxed model so they can say: “we’re in the race!”
Hello! Please check this post in the Announcements channel: #announcements message
It’s still very good but I feel that with 5, they stopped training their models on valuable data
We tried to make it count the tomatoes in this image
It said 43 or something
Even got errors
It miscounted, 10 fewer than gpt 5.1
Meanwhile gemini correctly counted 69 tomatoes
I couldn’t find like two posts that got buried and apparently now my argument is invalid 😭
It said 53 and gpt 5.1 said 63
losing battle for OAi there
Gemini OCR AND intelligence is high
It wasted 3 minutes + got errors and in the end it was incorrect
It guessed the number
Check o3, how many he counts
Is o3 vision
Already did
He said 49
He tried to do the grid counting
o3 even went on shutterstock to search for images of tomatoes 💀
Tbh i didn't count the tomatoes
But guy who sent me this tomato image says it has 69
I'm having trouble getting gpt-5.2-high to work at all, are there known issues at the moment with that model?
Some guy even ran gpt 5.2 pro on it and got incorrect
No it was 5.2
It should've been helpful like Hello what do you need help with today or something
5.1 does Sup to sup better
epic guy is my uh OAI account name
2/3 on my 3 questions
it got the Mary and Kevin question wrong
hard question tho so like i understand it
but assuming they are animals is meh, mayb i should specify
Honestly I think it mistakes tomatoes for pumpkins or other stuff
Because some of those I really could mistake for a watermelon, pepper or just pumpkin
@deep adder other models even december-chatbot which is an OAI model weren't close
this is a google one
No, you should not
you think?
It's the AI that should discover it
I mean yeah, if you specify like every AI will ace
opus 4.1 got 3/3
smart ahh
Opus so peak
chat why is NB Pro so unstable in LMArena
Provider issues
any opus model can do my questions atp
kevin taping jake isnt really the answer but
ill take it
where can I then use NB Pro for free?
Images
What time does the leaderboard usually update for lmarena? I'm very curious to see if Gemini still holds its title here
this model failed, which i dont know the identity of
Like every 24 hours I think
well
you get what you get here ig
Oh okay
google images?
@deep adder This other google model got 3/3
@zealous sparrow what does he mean by Images?
imagearena on LMArena
If you want to generate images
Yeah I tried to use that on lmarena, it just pops up smth is wrong
you are either ratelimited or reused prompt 3 times
Do the lmarena people pay for our chats?
all the direct models yes
battle anonymous models no
Wait that’s insane
How does that work
Do they have a fancy api key from companies or something so they don’t pay?
wish i knew
I assume they do pay they just get funded or invested in
ok brb goin to come up with 5 different questions for LLM Testing
ranking so far is
3/3 Opus 4.5/4.1, fiercefalcon
2/3 ghostfalcon
0/3 multiple other models i forgot
wish the LLMs good luck with this one
answer is that the ticket is forged
no LLM will come up with the idea
@echo aurora yo bud
I wonder what is the answer
nbp off
i highly doubt an LLM will get this right
What is LLM I didn't hear about that model

LLM stands for Large Language Model
basically
any
Oh alright
we need your powerful magic trick to fix nbp
bro wdym Miles is not a person, Miles is an english name!
I've flagged to the team the higher than usual error rate. We'll have to wait for a solution which is being worked on.