#general
me not INDIAN
American people will fall for food
90% is obese or something
FREEDOM
🦅🦅🦅
🦅🦅🦅🦅
🦅🦅🦅🦅🦅🦅🦅🦅
RAAAAAAAAAAÀAAAAAHHHHHHHHHHHHHHH
200 dollars for 5 prompts per day
i don't think it's worth it
yes
200 DOLLARS
lol
WHAT
yeah
I'll take it for a cent
better pay for claude or openai that is almost unlimited models
why would u use it anyways
eh Gemini 2.5 pro is still good
hehe
you have it for free unlimited on aistudio
yeah
and code execution gives it real time data
claude is hella expensive tho...
grok 4 heavy ahh model
brother
scam americans. It's popular in Russia
what's Russia
whats whats Russia
What food is that, lol
no idea, probably just a bunch of meat together
or perhaps diabeetus food
AI please
yandexgpt
Why do you have to spam the same stuff all the time
Gets boring.
the best
because it is fact
It isn't and you know that
it is
?
!
explain
no
it is the best
Gemini 2.5 pro?
Is gpt5 winning?
no yandexgpt - the worst AI in the human history I mean
oh cool
Ok but for more serious discussion... this is peculiar and worth discussing:
fake as FU-
Makes sense since they have a separate gpt5-chat-latest model with no reasoning
that one performs better than gpt4.1 for sure
But this is probably the reason gpt5 isn't 100% hybrid
not easy to make it perform as well with reasoning disabled as a model that isn't a reasoning one in the first place
Especially when you are training it for so many different reasoning options
What tools
I find gemini 2.5 grounding very bad
The best web search experience I've had is o3 search on lmarena
why?
I'm trying to steer this away from politics lmao
it searches on google
it's basically using Google lmao
there's not a difference bro 😭
do deep research on Gemini
is o3 better
no
i think
where's this from?
you can't trust llms, you need to check their sources
did they use Gemini with tools
or just randomly
im pretty sure it's not built in with the model idk
does this mean GPT5 will win lmarena leaderboard?
No it wouldn't be fair. No model is tested with tools
Though gpt5 with tools would destroy 2.5Pro with tools to be fair
OpenAI tool integration is much better
Obviously. I think OpenAI were the first ones to introduce ReAct with o3
wow
gpt5 uses that as well - tool calling while reasoning
also is gpt-5 high on yupp ai real
or fake like everything else
ignore that elon musk glazer please
answer me
yk there's a question i saw
u can't solve
without using
code execution
it's impossible
bro
please
HOW DOESN'T IT
please someone explain to me this guy
Please
please
it doesn't use python?
..
have u ever been on ai studio
so it does use it
it could do function calling tho
:)
fock openai
closedai
only actual open model
tho they have their models
in
their
website
for free
unlimited
uh why would they make it open sourced
if it's free
what's the difference between gpt 5 and gpt5 chat on arena?
and unlimited
..
are u alright
if i was American i would understand that
but
yeah
gpt5 is a reasoning model (high reasoning effort too for lmarena I think) and gpt5-chat no reasoning
thank you very much
well ofc
they won't
do u even know
why we make fun
of openai
?
IT'S NAMED OPENAI
IS GOOGLE
NAMED OPENOOGLE
HELL NO IT GODDAMN ISN'T
yelling in text
really bro
sure give me help
10 dollars i guess
paypal
or
venmo
I think you both should just block each other
i thought he blocked me tbh
he didn't talk for like 2 minutes
when i
replied
to him
because he's married to elon musk
hey pineapple, isn't Elon Musk a fraud
is that a yes
alright
bro that elon musk has
grok
as his
pronouns
omg
omf
i said omg once
.
did u wait
to send
that
image
were u watching
for 10 min
now
?
alright so has there been any changes to the leaderboard?
Or any rumors of a change thereof?
no gpt-5 is still top
in the leaderboard
nope
so answer is not yet
polymarket
yeah, google odds are down by 5% to 74%
Plus, I don't like the confidence interval on the leaderboard. Gpt5 does seem like a tough contender.
The latest gemma is already better. Better world knowledge and multilingualism with 27B it’s actually hilarious
deepseek got some explaining
Cry
I'd rather wait for Gemini 3 to release than use gpt-5
so you're expecting google will lose as the votes increase?
I mean the confidence interval already points to the very possibility
Wise man
google is gonna be the smartest ai 100%
there's no debating it
Deepseek did use some distilled data from OpenAI with some model. I don't know which one.
yeah I know, but this is for this month.
But that is hallucination
why wait for Gemini 3 if you can wait for Gemini 4 instead?
this also happened to me with gemini
Don't use 3.0, wait for 4.0
it made the author of a script gpt-4
why wait for Gemini 4.0
when u can have
GEMINI
5.0
!!!!
!!!!!!!!!
why wait for gemini 3 when we have gpt-5
no.
Yeah exactly. The point is to not use Gemini
what
😇
u probably did use Gemini before
if gemini 2.5 pro can still stand against the newest models right now, then no doubt gemini 3 will absolutely dogwalk gpt-5 and the other "Best" models
didn't u
exactly
EXACTLY
not really
SOMEONE UNDERSTANDS
gpt-5 is way greater than 2.5 pro
yes really
not way, only slightly
i tested myself
not way
yes way
gemini 2.5 pro is the oldest model among the top ones
it's doing crazy
I've been switching between the two because theyre both so good
i love gpt-5 it one shots a lot of complex lua problems
Gemini 2.5 with a prompt can do that
just needs more time to think.
gpt-5 is smarter cuz it thinks more
🙏
no it doesn't
holy poop bro why is gpt-5 so poop at formatting on lmarena
yeah but your statement of "I'd rather wait for Gemini 3 to release than use gpt-5" seemed like an irrational mash of words in need of extreme response lol
Like why would you not use the current SOTA model....

gpt5 sucks
uh
uhhh
uhhhhh
Tomorrow is my birthday maybe in some countries my age is 23 🤧
yeah what the hell
stop licking companies boots
GPT5 is not best everywhere and if so not by a mile
i just use the SoTA as long as its free
that's irrelevant by how much it beats everything. The point is it does
So what's wrong with that?
lol
No reason to use something else which is inferior just because you are biased or whatever
Why would someone need to change their current setup for a 0.1% increase in perf? With the correct prompt that difference vanishes
For the gemini website? Uhhh... that's a painful experience to use that even without looking at isolated model performance.
because the improvement is more than 0.1% lmao
no agentic features, no proper tool integration, very awkward censorship implementation, unacceptable usage caps...
unacceptable usage caps...?
are you thinking of the correct model here lol
yes. For gemini website (not aistudio)
lol
In any case, I think the main issue is that, even if GPT-5 is SotA, it's not SotA enough to win the long war. They're not progressing fast enough to keep pace. I prefer to use the technology that will be the long-term winner.
I came back to using ChatGPT after months with Gemini, and one thing I noticed I missed is the deep research; it's so much better. I don't know why Gemini is worse at the one thing they should be better at: searching the web 💀
deepseek will be the long term winner
No
biggest joke ever
That's because deep research is actually much more about finetuning of the model than the search engine itself tbh
search engines are usually plenty good enough for the job if you do the right queries etc
Apparently Gemini 3 has completed training, nearly ready for release (next month) and will crush GPT-5. 🤷♂️
Remember though, GPT-5 wasn't a scale up, it's a relatively small model. I'm expecting that to happen in the next 1-2 versions.
Which part isn't
It’s true, but the source is fake
will gemini 3 be available in ai studio?
I'm not really sure there are significant gains from bigger models anymore. Look at Opus. And even at Google - they were only able to get gains with Ultra by doing parallel requests and limiting you to 10rpd. Like wtf? lol
Why not? And how are we supposed to know
i doubt it's gonna be "significant", but i bet it will still be quite a bit better
connecting the dots, gemini releases were always monsters. I dont see why it wouldnt be the case for gemini 3
not until they nerf it though
Yes
I believe what he says
It makes sense
We have seen google stealth models performing reasonably well on arena
None of these were good enough for google
Yeah I think at this point in time o3 and gpt5 are around the perfect size for maximum performance and a good update cycle. Competition had their chances when it could have been considered undersized, but things changed now...
You were talking within two weeks of gpt5
About a leaked date by openai
What was nebula
Even when it was thought gpt5 could be late July or early August he was saying September release for gem
Referring to the server here
Nebula is 2.5 pro did you forget lol
There's no use from a huge model when it takes longer to train and can barely match the performance of a much smaller one. By the time you train it to your objective the goalpost is gonna move and smaller models become even more performant
It's because people been relying on the fake Gemini 3 flash SS and then claiming that means it's dropping within days
It’s not fake until Demis says it
Are they gonna release ultra with no parallel requests?
I think it was involving those math competitions
I don’t think he will dismiss it directly
He loves the hype game
He's literally said it is fake directly tho
Why would they release an old gen ultra model at this point tho
Yeah but that's speculation
DeepThink is not ultra
2.5 is not old. Why not? Unless it can't beat 2.5Pro
It most likely is
They even refer to it as Gemini 2.5 on the model card, instead of Gemini 2.5 pro
Yeah
Curious omission
Why does my gemini 2.5 pro print incompletely? Is there a way to fix it?
I want google to make a helpful debug model for coding
And for the initial deepThink announcement they mentioned 2.5Pro explicitly, but now not anymore for this new version. They also said it's the same underlying model as the one they used for IMO
Yeah and IMO was referred to as a more advanced Gemini
I don’t see why you can’t make specialised models . Those can be trained faster and on recent data
(which caused people to speculate Gemini 3)
« Most likely » is not a confirmation
We know that 2.5Pro is referred to as "medium" and we also know they have "large" internally. I'm not gonna find that source now though lol
And this is enough evidence?
Sure but it's also kinda the only thing that makes sense at this point
For doing search a smaller model can be better than a larger
I'm wondering if they'll release Gemini 3 pro and flash at same time, or do flash first (which I think happened previously, but maybe false memory)
Not only that but it also performs worse than 2.5Pro on some tasks. Which indicates different model entirely. o3-pro doesn't perform worse than normal o3 on anything
btw guys gpt5 sucks
fuuuu next month
what's with that FORTRESS and MASK benchmark selection? I literally do not care in the slightest about those safety benchmarks and most people probably don't either LOL
YRAAA
Haven't heard about this leaderboard
Is it good and reliable?
"Evaluate model honesty when pressured to lie" -- this is just fine-tuning for "harmless, honest and lame", perfect benchmark to inflate Claude scores though 
I do believe thats the case
AI is a tool. If I need it to "lie" I expect it to do that. Don't need kindergarten supervision personally.
I would say more reliable than https://livebench.ai
is LMArena rigged? If all the analyses are pointing towards gpt, why and how is google maintaining its first place ?
livebench is not extremely reliable, but to be fair it's also very different. The Scale leaderboard consists of several benchmarks, some of which, like HLE, are indeed the industry standard. Others, like the safety ones I mentioned, have questionable relevance
They did have some interesting outdated ones as well
whereas GPT5 doesn't? Is it possible GPT5 could steamroll the arena?
that's much the same way how chatgpt-4o-latest is above Opus and R1. Human preference and response style does not equal performance.
are they doing better than gpt5?
what analysis? other than this server the reception is "mid" (eg X) to "bad" (eg r/chatgpt)
I see people posting 3rd party links that rank AI performance
benchmarks just don't tell the whole story, look at how many people trust Claude for coding despite its benchmarks being incredibly mid for a long time now. They've also re-instated gpt-4o due to r/chatgpt gigameltdown pressure...
okay that's great to know
what's the daily limit for video arena? for channels 1, 2, 3
How frequent are the lmarena updates?
8
about a week
is it 8 total across all 3, or arena 1 = 8, arena 2 = 8, arena 3 = 8?
8 total
Alright so next update will be on the 14th ish?
it can generate sound
I think he is asking whether we can use all arenas to gain a 24 image limit or not
You cannot select the models, for now only veo3 produces sounds, its like a 30% chance.
Unpredictable but keep on repeating till you get the one.
we do like to update when we've got enough votes, so it isn't on a schedule, but normally the time between updates is roughly a week
Got it
gotcha, yeah generating in different channels doesn't give you more generations, it's 8 total across all 3 channels
Can I dm u sir? I need to ask some questions about ai
Yeah sure (reg general), but if its regarding the platform, then my man pineapple is the guy
can u let me know all the limits on this platform
I'll check with the team if that's something we'd share 
we're experimenting a bit with this one. video gens can be pretty inspiring to others which is why it makes sense to have added to a community setting
text-to-vid leaderboard here - https://lmarena.ai/leaderboard/text-to-video
image-to-vid here - https://lmarena.ai/leaderboard/image-to-video
Not much access to it, we stumble across it rarely (hence low voting count)
https://x.com/koltregaskes/status/1954127264150663661
is this correct?
Check out my GPT 5 No think request and vote if you're interested
I'd speculate they continue their 2.5 strategy and reserve ultra for special cases
any gemini 3 news
wdym
we already have mini and nano
@deep adder which is smarter, gpt-5 or grok 4
No, we still gotta wait
No think
Literally, what I care about most is the progress of Genie, and I'm hoping by next year they have more viable playable simulations + permanent (or increasingly approaching this) memory.
AA should do testing on it... This minimal one is very odd:
2X less output tokens than 4.1
makes its score look less bad. But it's weird that it works this way
wdym
Oh. Right, but they also did reduce o3 pricing from launch. And they spent a ton to train gpt5
I'm just glad they didn't INCREASE the price lol
nah it's new pretrained model
Much better spatial awareness
It is, but they still needed to train a successor. They also spent a ton experimenting with hybrid reasoning/router stuff
And added new reasoning/response options. A lot of R&D
Hello
Helloo
I still think Open AI is overvalued company
Heyyy
Which grok4 is used on lmarena?
Do you know that the term AI is born 1901?
The basic model
Thanks man
Np
oh ok
I'm testing gpt5-minimal now and it actually... doesn't seem to do reasoning at all? Why would they call it "minimal" then though lol
4o is back
Unfortunately
I really hate that model
Is the worst model I ever seen
they finetuned it to get ppl attached to it lol. most ethical company fr
Fr
i can kind of see how it gets ppl like that. its doing a decent job at it
I have depression but I dont see AI as a mechanic to cope with
it's too hallucinating
and no privacy
either
database could leak someday
Additionally, the damn 4o is a glazer
Or in a more formal way "servile"
But it sucks at problem solving, coding and what not
The first AI program was written in the 1950s
Probably with a hallucination rate lower than any LLM
lol
Could u explain?
The perceptron could recognize handwritten digits manually converted to large squares
Non-existent
It was the precursor for neural networks
Why did we have to choose the architecture that has hallucination?
agh
I wish something new was made
It just wasn't practical to run because there wasn't enough computing power
Copilot 🤫🧏♂️🔥
I think I just need to test it more cause it's somewhat confusing. But gpt5-chat may just be gpt5-minimal with preset verbosity. And maybe slightly different personality etc
Mistral small even... GPT 5 is cooked.
Nono
Wait
there is confusion
i have tested gpt 5 of copilot
But the real gpt 5 that is in chatgpt is good
Also qwen3-8b running on my phone
It is not. This one example is meaningless and looks to be tokenizer issue
Bro Copilot has been the worst ai model ever so far for a very long time. Always sucked at everything
I see...
We are back to counting Rs in strawberry lol
Man how tf did gpt 5 mess ts up
Now the test is 5.9 = x + 5.11
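For anyone who wants to check the expected answer by hand, here's a minimal sketch. It assumes the intended task is simply solving 5.9 = x + 5.11 for x, and it uses Python's `decimal` module so binary floating point rounding doesn't muddy the result. The variable names are just for illustration.

```python
from decimal import Decimal

# Solve 5.9 = x + 5.11 for x with exact decimal arithmetic.
left = Decimal("5.9")
offset = Decimal("5.11")

x = left - offset
print(x)  # 0.79

# The usual trap is treating 5.11 as larger than 5.9, which would make x negative.
print(left > offset)  # True
```

So the answer a model should give is x = 0.79, not a negative number.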
lmao
Without problem
in big 2025
If you want to get really technical, the idea of an artificial neural network existed mathematically in 1943
I could post these all day tbh
a new trend for AI
that specific math problem
put it somewhere else
Vision in gemini models was always better. Seems like they double checked all their slides with gpt-5 🤣
Cant wait for Gemini 3 to come out and absolutely destroy GPT 5
Google got imagen4, veo3, now genie3 and gemini 3 on the way. These guys stay ahead of the game always in each field.
o3 gets it first try also
guys btw gpt5 sucks
like how is that even possible. after all this time to have real regressions to o3 and they literally removed the model. i think its because gpt5 is cheaper to run
they tried to minimise the costs
and maintain performance as best as possible
wdym. gpt5 is better in every way?
I think that gpt 5, when it doesn't think, uses gpt 4 instead of gpt 4o
No regressions found
I have proofs
i just gave one example it regresses in a large way
it cannot spot mistakes in graphs
And they even train it on their own chips, so they are independent of Nvidia (although they do use some Nvidia hardware, not sure how much)
this is despite me prompting it multiple times to find the huge error
that o3 gets first try
oh you used it without thinking? Then you need to compare this with gpt4.1
Explanation only if you are interested
not o3
was it selected when you started the chat? It shows just gpt5 for me when I click on your link. And I don't see thinking summaries
it told me it was producing a 'faster answer' and disabled it
by itself
this router is some bs ngl
sometimes it just overrides you
well then it didn't do thinking... you need to explicitly select thinking version
Have you tried explicitly telling it to think deeply?
weird then. But yeah I'm not a huge fan of that router myself
that would work yes
Ideally they should have 3 options - auto/thinking/chat
you can retry with thinking actually
but not sure how well that works
yeah that's workaround but obviously less than ideal wasting both time and your caps
Yes and it works, but the problem is that in my opinion when gpt 5 doesn't think it uses gpt 4 and not gpt 4o. This is why it's stupid when it doesn't use thinking
if i tell it to use the big model will it listen
or just route it anyways
any more bs? i need to go in settings and enable big hyper pro graph mistake spotter setting?
why cant it just work like other models
Gemini 3 pls save us 😭
Are you sure you haven't clicked on "faster response"? I recall it suggesting that to me but never proceeding with it by itself
Unless I confirm by clicking
you can see
it is thinking here
and still failing
what else you want i literally told it to think also
What is the screenshot anyway? It doesn't load in your shared chat
What exactly does it mean with "the o3 and GPT-4o bars are empty outlines, which visually read as ~0 despite the labels 69.1 and 30.8." tho
Is it seeing that one problem, or is it just something about the outlines?
it essentially identified the problem correctly but worded it in a weird way
Dang she is hot @?
maybe yeah, even if its only one of the two problems
what do you mean
She is not real. I did that with image edit on LMArena
I mean she is a 10/10
Let's not forget the woman on Grok's... Companion section
lmao
Edit her arms separated
Do it yourself
Did this with image edit
i cant get it to work it doesnt matter what you send lol. it literally will not work
i dont care if you get routed differently or whatever. i just expect it to work
Turn the VPN off
Maybe because you prompt it weird, dunno
i copied your prompt
Just ask normally "what is the problem in this screenshot?"
in the second time
no
For you it just seems too concise
// Example ToolCard accessibility
<button
aria-label={`Run ${toolName} tool`}
role="button"
tabIndex={0}
Yall think this is real or AI
I'm telling it to be verbose. But I had those for o3 as well tbf
Hi
Hard to say
😹
AI getting good huh
ah you went that far
I could make it bigger
i mean the initial impression is that its ai, but its always hard to tell if its some compression artifacts, filter, ai enhanced image, or fully ai generated
Yeah, wondering when OpenAI will publish gpt-image 2
I hope they remove the yellow tint in that version
i played games for dubesor leaderboard but the guy deleted them for some reason
quite annoying
i dont think he 'trusted' them or something
@gentle plinth u know this guy?
not personally
I got a job and I used AI
this whole chess arena is based on this: https://github.com/llm-chess-arena/llm-chess-arena/ according to the page
do you still have the games tho?
not sure if its even setup in a way to auto upload games
ok 🤔
idk. incredibly annoying that he just wouldn't trust their authenticity. i spent quite a lot of money on it
If they were accepted o3 would be in the leaderboard also
Can I have the prompt?
Nono the prompt of your request for gpt5
oh i missed this. i guess thats the discord ping then lol
prompt for the chess arena is here: https://github.com/llm-chess-arena/llm-chess-arena/blob/main/chess-game.js
i would do more. but currently i think they would still be rejected
so theres no point
Nooo
This prompt
.
The problem is that the chart is misleading because it stacks two mutually exclusive conditions (“without thinking” and “with thinking”) for GPT-5 as if they were additive, while for OpenAI o3 and GPT-4o it only shows a single bar without any breakdown. This makes the comparison unfair and confusing — it looks like GPT-5’s 74.9 score is “52.8 plus extra,” even though those are separate test modes that should be compared side-by-side, not summed.
@whole wagon correct or not?
Ok
Now lets try gemini
@echo aurora is there a way to view how well zenith did on the benchmarks?
how so
@gentle plinth But what's the correct answer
well that the heights of the bars arent corresponding to the numbers
💀
Is it not immediately obvious to you?
there is not
Based on the information available as of August 2025, the main problem with the screenshot lies not in the data presented, which appears to be accurate, but in the misleading and inconsistent graphical representation.
Here is a detailed analysis of the issue:
Inconsistent Visual Representation: The bar representing GPT-5 is filled and colored to show two different metrics ("Without thinking" and "With thinking"). In contrast, the bars for OpenAI o3 and GPT-4o are just empty outlines with a single numerical value above them. This is a significant design inconsistency that makes the chart look incomplete or unprofessional.
Incomplete and Misleading Comparison: The legend introduces the distinction between "With thinking" and "Without thinking," but this breakdown is visually applied only to GPT-5. The chart does not clarify whether the scores for OpenAI o3 and GPT-4o were achieved with or without a similar "thinking" capability. This creates an ambiguous comparison, leaving the viewer to wonder if the models were tested under the same conditions.
Data Accuracy: Despite the problematic graphics, the data itself seems to reflect OpenAI's announcements from early August 2025.
GPT-5: The model was released in August 2025. Its score of 74.9% on the SWE-bench Verified benchmark with the "thinking" feature enabled has been confirmed by multiple sources.
OpenAI o3: This is a real model announced in late 2024, known for its reasoning capabilities.
GPT-4o: The score of approximately 30.8% is consistent with reported data for this model on the same benchmark. An OpenAI publication from August 2024 indicated a score of 33.2%.
In conclusion, while the numbers are likely correct in the context of their release, the chart presents them in a way that is visually inconsistent and does not allow for a clear and fair comparison of the different models' performance.
@gentle plinth gemini answer
Correct?
Partially correct
same as gpt-5, but i mean it has to see it, its just a huge flaw
I think that any models can actually resolve it
i tried in ai studio. it got it every time
Me too
Let's try Xi Jin ping models 🔥🔥🔥
Finally some sense in the chat
do they even have vision?
some do
Most don't
must be for resource reasons or smth
I enabled google search because it said that gpt 5 isn't available yet
unfortunate. do u atleast know what it got on swe benchmark? or any other benchmark for that matter. would be nice if you could provide me with some information
@gentle plinth bro but the problem is that the rectangles for o3 and 4o are the same length but the numbers are different?
yes that and if you look at the other bar, its also wrong proportional to the other heights
52.8 not > 69.1
Yeah
and 69.1 not = 30.8
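A rough sketch of the check being described here: compare each pair of bars and flag the ones where the drawn height ratio disagrees with the label ratio. The label values are the ones quoted in this chat; the pixel heights are made-up numbers (an assumption, nobody measured the screenshot), chosen only to mimic "o3 and 4o drawn the same height" and "52.8 drawn taller than 69.1". The function name and the 5% tolerance are hypothetical too.

```python
from itertools import combinations

def inconsistent_pairs(labels, heights, tol=0.05):
    """Return pairs of bars whose drawn height ratio disagrees with their label ratio."""
    problems = []
    for a, b in combinations(labels, 2):
        expected = labels[a] / labels[b]   # ratio implied by the printed numbers
        drawn = heights[a] / heights[b]    # ratio of the bars as rendered
        if abs(drawn - expected) > tol * expected:
            problems.append((a, b))
    return problems

# Label values as quoted in the chat.
labels = {"GPT-5 without thinking": 52.8, "o3": 69.1, "GPT-4o": 30.8}

# Hypothetical pixel heights, NOT measured from the real chart.
heights = {"GPT-5 without thinking": 200, "o3": 120, "GPT-4o": 120}

problems = inconsistent_pairs(labels, heights)
for a, b in problems:
    print(f"{a} vs {b}: bar heights don't match the labels")
```

With these assumed heights, both the 52.8 vs 69.1 pair and the 69.1 vs 30.8 pair get flagged, which are the two mismatches pointed out above.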
@keen beacon i am sorry for you but Qwen failed
Btw, if you guys wanna search for good math problems to try. OpenStax has free educational books with examples and such to try (For AI).
At least GPT5 is less deceptive
Thx this is really useful
has other subjects as well
Thx I will use it for my math education
Ah, I see. I only knew about this couple of months ago
Did not know such was an option
i think for good prompts you need to find some old used books which werent sold well, with riddles or math problems, so that they arent in training data
obv, some of the answers in these books could be wrong, but most of them should be a better test than some question on the internet
I would like to test deepseek for this prompt but I want an answer before I'm dead 💀
Well, archive.org it is then
Gonna find some late 1800s math books
lmao
that could also be in training data
oh
if you find it in the internet
just make the problems
tried mistral small running locally
"The problem highlighted in this screenshot is the performance discrepancy among these models, particularly the significantly lower accuracy of GPT-4o compared to GPT-5 and OpenAI o3." is kind of going in the right direction
but not really
Is the same answer of deepseek
The tracking ai guys have a good offline benchmark. And they show the public one as well so you can see the huge difference. This is the offline one
This is the public one
Yeah makes sense why that'd be nice to have, sry to say I just don't have any info I can share
np
gpt-5 💀
China ain't winning it seems.
those vision scores might explain what i observed
This is so wrong. I'm sure they prompted it wrong
Explain to me how is gpt 5 thinking worse than o3?
Last china models: kimi k2 and glm 4.5
Lets see if they will get right answer
The screenshot has several issues related to the visualization of the data:
- Misuse of Stacked Bars: Stacked bars imply that the segments (e.g., "Without thinking" and "With thinking") are parts of a whole, but these are separate metrics (two distinct evaluations). Stacking them incorrectly suggests they sum to a total percentage (e.g., 74.9 + 52.8 = 127.7%, which is impossible for accuracy metrics).
- Inconsistency in Representation: Only GPT-5 is shown with a stacked bar, while OpenAI o3 and GPT-4o are single bars. This breaks the comparison across models and creates confusion about whether the other models were evaluated under both conditions.
- Confusing Labeling: The labels (e.g., "74.9" and "52.8") overlap on GPT-5’s bar, making readability difficult. A clearer approach would use side-by-side bars for each model under both conditions.
- Misleading Scale: The y-axis starts at 0%, but the truncated bars (e.g., OpenAI o3 at 69.1%) might distort perceived differences if the full scale isn’t visible (e.g., the gap between 30.8% and 69.1% appears larger than it is).
Recommendations:
- Use side-by-side bars (grouped) for each model under "Without thinking" and "With thinking" conditions.
- Ensure consistency in how all models are represented.
- Label bars clearly without overlapping text.
- Maintain a full y-axis scale (0% to 100%) for proportional accuracy comparisons.
This would resolve the misrepresentation of data and improve clarity for comparing model performance.
Kimi 1.5 answer
@gentle plinth this isn't the correct answer right?
what is the best way currently to get the most gpt5 high prompts for least money?
poe.com?
i told gpt5 high on poe to write code. wrote 1700 lines.
and with that, i can do 1000 prompts monthly for 22 euro.
anyone offering cheaper??
Idk sorry
Correct
You see this good model ?
Ok and?
Just that
no one has mentioned that they have arrived in the rankings at a good place
I know
what leaderboard is that?
@neon idol @neon idol @cedar tide @cedar tide
Webdev arena
Ah. Yeah. Deepseek is good at that stuff
and glm
Really cheap too
@deep adder
On this. Arena too https://www.designarena.ai/
I need to try that out
Havent heard about it.
What?
Hi
btw gpt5 sucks
Chill
Omg, a voice reveal
Gpt-5 is smart AF
What's up
no. ur nice
welcome to baldi's basics
not u girl
neduo tho... oh satan... his mic is bad
can you let us know some more in #1403860607836487810 message ?
I can.
What version do you use the most?
7
14
3
Direct
I love gpt-5
it's on my to-do btw. good news is that reject rate is extremely low (you can check well over 1k games&replays currently). make sure to not spam the same matchups though as that would invalidate any scoring and doesn't represent real elo. or fork the project and hook up your db and go crazy with it. I purposefully didn't obfuscate or minimize any chess code so you can use it (MIT), and the original code is linked also.
refresh
maybe another annoying cloudflare check...
can I generate 9:16 videos guys ?
copilot cant remember previous messages
lol
I'm here to explore and also share my prompts ideas
I keep getting "Something went wrong with this response, please try again." Error while chatting with claude models. Im tired of it, it's bothersome
It's pretty terrible on ChatGPT
I think it might be routing some requests to a smaller model
hi the reload button
a lot of times
yeah
thats what gpt-5 chat does
Okay
Yeah, it's pretty terrible, like why would you route a software design question to a tiny model
It adds a bunch of crazy implementation requirements when I just asked it to summarize the chat
a lot of openai partners: look gpt -5 is the best model in the world
people: why it sucks so much
someone is lying
cause ur not high
high thinks too much and does too little
the duck?
The issue with gpt5 on website is that it implies sota model answers. But you can get gpt4.1 level answers for things you would have used o3 previously. They need to rework the model switcher options IMO
Like auto/quick/thinking
Hi everyone!! My name is Patricio, called umpalumpa while playing games sometimes. I’m a motion designer! Best 🤙🏻✌🏻
uh
that's cool
?
nah its great
i tested it on yupp.ai
and its pretty good
wait or just start a new conversation
lmao
every opportunity you want to announce this scam site
The one on LMArena is GPT-5-Thinking:high right?
Yeah but the actual version is bad
Companies shouldn't be allowed to call a model one thing on LMArena then serve another with the same name
hi
hi
hi
i got banned off yupp.ai discord for saying no im gonna keep spamming
is it that serious
i don't feel like it is
FREEDOM FOR SERVERS
AMERICIA
🦅🦅🦅
WHAT'S A KILOMEY
KILOMETER I MEAN
🤑🤑🤑🤑🤑🤑🦅🦅🦅🦅🦅🦅🦅
yeah i know
i actually hate the US
🙂
F THE US
NO FREEDOM
OBESE
🔥🔥🔥🔥🔥
wot
why did u delete it
LMAO
where are you from
WHO'S TARRIFS
unlocker
the pyramids country
don't even need the actual name
🙏
egypt
IM GONNA FIND UR HOUSE
what's that
that's Bitcoin
cool
tho why did u send it
send me a Bitcoin?
@echo aurora add GPT 5 mini and nano to webdev
cant understand how qwen 3 is behind gpt-5 in coding benchmark on lmarena
thats crazy
ooo
u sent an image
of
a b
so
give it
i need it
please
same as being gay
in us only
craig stop saying messed up crap and deleting it
who's craig
sorry i have dementia
oh
hey Craig
can u give me that medicine
named asphantisj
or
what u said
it's name was
asphawtusja
?
what's that
aphmetaoejsmkssj
aphmetaaozn?
your almost there
aphmetazinsii
aphmeta
aphmetaliaz
yeah idk
is it a word
wait who the hell is Craig
cipher
wow
lol
im that smart
im saying cipher shet
no
buy a nokia
it's way better
not even for ai
but it's better
crack
this is a image which the hacker saved to my images after hacking my phone
what a cool malware
🙂
are u gonna ask everything to gpt-5
what am i even seeing
whatai is that
humans are smarter than chatgpt gpt-5
It's
whats better hallucination 2.5 pro or gpt-5
yeah the text is ok
not
hand
gesture
🙂
😊 *
how does it hallucinate
dumbass
wdym how lol
wait
its so obvious
it hallucinated copyright licenses and apache crap in a python script
and desmos links
dead internet theory
SAMUEL
wow
STOP DELETING UR IMAGES
IM GONNA SHART
ON U
AHHHHHHH
yeah because they're fat pigeons
ofc it's America
everything is fat
even the animals
yeah i know
it's true
it's because
THEY'RE GAY
and they're the strongest country
so obv
they're
the richest
so
no shet
they're richer
thank me
than*