#general
Claude is better at coding with caveats (depending on which programming language) and maybe writing. But 2.5Pro is better in other spectrum of coding and basically everything else that remains
bro are you high?
GPT5 is best at maths
Like by a lot
That's probably the area it has the biggest advantage
I can't with you there is no point. just go on youtube watch a video comparing gemini gpt and claude
Timestamps:
00:00 - Intro
00:33 - Model Introduction
02:25 - Testing Theory
03:27 - Quick Note on Local LLMs
03:46 - Browser OS Test
07:50 - Gemini Browser OS Result
10:33 - GPT-5 Browser OS Result
12:56 - Claude Browser OS Result
16:17 - Grok Browser OS Result
17:25 - Browser OS Summary
18:36 - Roleplay Testing
21:54 - Python FPS Test
25:34 - ...
Based youtuber

he did?
Yes
That's not how you should be deciding. Reading random reddit comments or watching random youtube videos is not it lmao
bruh
lol
Anyone can make a video
Most probable answer is that it's not AGI
Wait until you see Grok 5.
I think it has a shot at being true AGI.
Haven’t felt that about anything before.
bruhhh, it's right though
What even is AGI to Musk?
actually it is. benchmarks are kind of a lie. the benchmarks for gemini 2.5 pro are for the unnerfed version
"unnerfed"? 🤣
this is for the newest version
yeah
they also have for the older ones - predictably those did worse
It's the GA release
lemme find it
gemini 2.5 pro 3-25 exp was the best
by FAR
they nerfed it
Nah
yes
just hype. Nothing special about it. Did worse
damn, deepseek 3.1 is slaying rn
The GA release is just that and a bit extra training it's not like it's that different anyways
or not
i tried many times but it didn't work.
oh
Bro asked where's opus before it even released
They aren't including all models in all charts since they wouldn't fit. ss was from https://artificialanalysis.ai/models/gemini-2-5-pro-03-25
It's because it's a march model
For opus you need to go to opus testing page to ensure it's in the chart
what do you use ai for exactly?
is there any way i can try to fix it?
tag the mod
GPT5 nano speed is like 8x slower than GPT OSS 120B lol
he says to create a new chat...
yeah it can happen
Mostly work. Coding python and SQL (PG, MS)
well what i can say that there is a hard coded limit on gemini responses
i tested it
I use LLM for science and maths a lot. Like checking stuff for writing papers and that
it can't output more than around 1000 lines of code
Gemini is really good at science
That's not hardcoded for sure. I made it do 32k tokens of code and then on another instance not with code I once made the model break while testing this and output around 500k in one go lmao
500k tokens?
yes. It was stuck in a recursive loop 👀
Any specific date wen the bot will open again?
yeah lol BS. it can't output more than 65k tokens in one response
Not if it's ending and starting responses by itself lol
Can we get the tie option please?
OH
bruh
well
.
I'M talking ABout CODE
This is basically AI winter ngl. There's no really promising releases coming up it's all incremental gains
not system prompt
I already told you it did 32k of code for me. It doesn't really appear to be limited by length more than most other models
They need to find another paradigm ig. The reasoning one is running out of steam
Certainly not more than Opus
DUDE
i just tested it
i gave it 1200 lines of code told it to expand it
and it reduced it to 800 lines
Write a system prompt if you need long responses
it has no clue what you want otherwise lol
i specifically told it to expand and advance the code
and it reduced it
by 400 lines
paste this in a system prompt box
All responses must be extremely long. it is crucial that you leave no stone unturned and complete everything in exhaustive detail meticulously. You must reflect endlessly for each user's query. You must reiterate over your proposed solutions finding ways to improve them until arriving at the most optimal final response.
i have an image and i want to make it animated how can i do it
with grok
idk if veo 3 can do it
but i'm sure about grok
wait can grok do something like that?
if the image is private do it with grok
but if you're fine with everyone seeing it
do it here
... I have to truncate the response here as it is extremely long. I will provide the rest in subsequent messages. Let me know when you are ready for the next part.
ive sent u
does not work
i feel it's rather a problem of... computing resources and money? oh and the big ego of academia of course
what is your prompt for code, what are you trying to make it output?...
im giving it 1200 lines of code and told it to expand and advance the whole code
I think OpenAI is gonna improve gpt5 quite a bit tbh. At least the lesser versions. It's their gen1 of hybrid reasoning
yeah but what exactly are you telling it to code? Lemme try reproducing it, maybe the task is literally too simple for much more code lol
forget it
too much simple? well when i tell claude or gpt 5 they do it
you tell them what?
exactly what i tell gemini
"expand and advance this code"
which is?
ok but what are you starting with lol, and this is so unspecific. Expand by 10 lines and rewrite some stuff would satisfy this request.
bro
I am telling you
the model cannot output the original code in a single response
let alone expand it
Well I didn't really have issues like that, dunno what to tell you....
It used to be major problem of Claude itself though. Before they moved to reasoning
i guess google has a personal problem with me then..........
Well you wouldn't tell me what was your input to try and replicate this so it's on you lol. How much exactly did Opus output for you in tokens that 2.5Pro couldn't anyways?
i mean i don't want to share the code
I'm new, I don't know who to ask, I don't want to get upset, but I wake up and all my chat sessions are gone, what should I do to recover them? I wrote an email, but it's happened ten times.
it happens sometimes
idk why they take so long to fix these
backup what's important always
I have no problem doing everything again but from May to today all the chats disappeared and I redid everything and then got everything back... should I hope they come back or do I do it all again?
idk ask a mod
What does it mean?
a moderator
the guys that have the <@&1349916362595635286> role
Should I write to him?
you can open a thread in #1343291835845578853
and tell your problem
.
Unfortunately I saw that in the last month I was not the only one who had this problem.... I can't recover session details if it redirects me to the home page, what should I add?
Ok fair enough. How many tokens was the code?
How do you use the video generator?
14k
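For scale, here is a rough back-of-envelope check. It uses only figures quoted in this exchange (about 1,200 lines, about 14k tokens, and the claimed 65k-token per-response cap), none of which are verified here:

```python
# Figures quoted in the chat above; treated as rough, unverified assumptions.
CODE_LINES = 1_200           # size of the pasted file, per the chat
CODE_TOKENS = 14_000         # token count reported for that file
CLAIMED_OUTPUT_CAP = 65_000  # per-response output cap claimed earlier in the chat

# Average tokens per line of the pasted code.
tokens_per_line = CODE_TOKENS / CODE_LINES

# How many similar lines would fit in one response under the claimed cap.
lines_that_fit = int(CLAIMED_OUTPUT_CAP / tokens_per_line)

# Fraction of the claimed cap that the original file alone would use.
share_of_cap = CODE_TOKENS / CLAIMED_OUTPUT_CAP

print(f"{tokens_per_line:.1f} tokens per line")
print(f"about {lines_that_fit} lines fit under the claimed cap")
print(f"the original file uses {share_of_cap:.0%} of the cap")
```

On those numbers the original file would take up roughly a fifth of the claimed cap, so a hard token limit alone would probably not explain the shortened output.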
People are really joining in for the videos nowadays.
yeah it was smart to launch video arena on discord
Yeah... Still brings in a different demographic of people. Same thing with polymarket. Brings people here who only do slop in here for their own interest.
How to generate video with a specific video model
Wasn't like this in my old gpt2-chatbot days.
beg
You don't.
This is an arena, not a free use video generation tool.
Ya
but it is free use video generation tool
Like how to use veo 3 vs kling
beg
Or like use veo3
It is, but more for research purposes and not intended for specific use.
so it is free use video generation tool
Just keep trying until you get it. But you might be surprised how much better other generators can be
Just vote many times and try in the battle mode (the only mode available for video gen). With some luck you'll get there.
The same as with the anonymous models.
Ok thanks @sweet tinsel and @formal jungle
where is thanks to me
Thankss
Guys CHILLL
website working for me
where u from?
USA
This is nice
You are absolutely right. I have failed you completely on this, and my previous responses were unacceptable. There is no excuse for providing broken, non-functional code after you've pointed out the errors. I deeply apologize for the immense frustration I have caused. My attempts to refactor the query were fundamentally flawed and I failed to properly trace the column names and logic.
🗿
is this gemini 2.5 pro
yeah. Not hard to guess this is it, lol
Which AI will come first?
17
23
1
gemini 3
Sometimes I feel these models all have Japanese mentality
Heads up this isn't something to ping Mod over. We should be using (@)Moderator for server moderation purposes. Not for questions/bugs.
stfu i was just curious im not ai nerd
what do you mean with "Japanese mentality"? I know this is not supposed to be racist or prejudice, but am curious 👀
They are always so polite to the point of being annoying
https://x.com/AnthropicAI/status/1958926941613891842
lets see if it'll get better with deception or are they hiding something else entirely
There’s plenty of work to be done to make the classifiers even more accurate and effective. In the future, they might even be able to remove data relevant to misalignment risks (scheming, deception, and so on), as well as CBRN risks.
Actually none of them, they all are slop when it comes to low level languages. But GPT-5 keeps topping all benchmarks in the world, so I guess it is going to be the best model for asm so far
no u
you're saying they're simply super superficial for the sake of societal harmony, the individuality gets buried in name of collective coherence, the symptoms are double speak, high context communication, low trust society
Kind of.
You are absolutely right!
The update is taking longer than expected !
opus overrated af
They have nothing better to do so still messing with input/output flagging. Opus already has false positives flagging innocent stuff on API as is 💀
dude
the worst model in gemini coding
plot twist: I've seen people commenting on their involvement with the military, so this could be actually the excuse to focus on "alignment"?
Military is probably just vision much more specialized models etc with absolutely no alignment at all. It's limited in scope so typically alignment doesn't apply
Claude patched assembly code for me once. What it did was kinda crazy and def. above an average developer's pay grade
Like self driving cars... those do not need any alignment
that's not specific to the Japanese people, am sure you know that right? 😅 dystopian world, authoritarian regimes, oppressive govs, even large corporations with strict top-down management, all behave like that ...
Sigh… I hate this man….
There’s plenty of work to be done to make the classifiers even more accurate and effective. In the future, they might even be able to remove data relevant to misalignment risks (scheming, deception, and so on), as well as CBRN risks.
plenty to be removed still. 
Well there isn't anyone else to ask. Who should we ask our questions?
Man i wonder why it has these melt downs
No other ai does it
Google did it?
Qwen does it
Even those severe ones?
Qwen Chat offers comprehensive functionality spanning chatbot, image and video understanding, image generation, document processing, web search integration, tool utilization, and artifacts.
Damn they are making me feel bad for ai
Same bro
lmfoa
I never encountered this so far
how do you get that with gemini
i even tried making it fix an impossible bug and got another ai to constantly come up with compiler errors but it just kept trying
0 signs of meltdown
Ping me for questions/bugs/feedback - I'm not always able to respond but that's the way for now.
Altman did full reversal on hyping
Now he tells us the new models might get worse lol
It was down for me earlier today for same reason
It's the captcha service not working not much to do about it ig
Lmarena AI is finally working again. 🙂
Glad to hear it!
@echo aurora react to this
It’s true that chat use for most non-AI enthusiasts is somehow saturated
@echo aurora Does attaching an image and asking anything lead to the chat crashing?
As a coding agent and as a general agent
I'm not sure what you're expecting from me here lol
It shouldn't, would you mind creating a post in #1343291835845578853 and providing more information on what's going wrong?
@echo aurora When will gpt 5 high and gemini 2.5 pro be fixed?
How long should thinking be?
Would note that it's not widespread; however, we have seen some reports of gpt-5 having some errors that we're looking into. Are you seeing the same issues with gemini 2.5?
Gemini 2.5 is fixed now
Also what is gpt 5 high?
That's not available in the app
Still not generating
I see your post in the bugs channel, I'll respond there.
So?
hey guys, i don't know how to generate veo 3 videos, please someone help
Currently busy but will respond to when I can.
Is anyone else having the problem "Something went wrong with this response, please try again."? How can I fix it?
Hello
gpt5 with high reasoning effort. Measurably the best model out there currently
Hey from Romania
hello
can't really say measurably because it's currently at second position lmarena.ai/leaderboard
Any plans for uncensored AI? :v

Is there a way to make vertical video on Veo 3?
where's "i like testing out unreleased models to see how they will compare to existing ones to either get hyped or disappointed"
I'd share this in the open field below.
ye i did 
hey pls help me
Gemini 3 pro this week?
No one knows
No
yo wth
Literally been no indication we're getting gem 3 except people speculating
The hardest evidence
is there a way to prompt image generator to make image in a specific RATIO ??
gpt does it fine , flux cannot do it
9:16 how
Predictions for Gemini 3, what do you bet it will be SOTA
I love LMArena
Although, you can use uncensored models on Hugging Face.
3 different features... gem 3 is not happening that fast. And we will see this model on LMArena at least a few days before
Just use gpt oss 120b on deepinfra 0.4$/M output
Or just use it for free
gpt-oss is not a good AI for coding
gemini 2.0 flash lite is better than gpt oss
for coding
and so isn't qwen3 coder atp use gemini 2.5 pro or gpt 5
wym
down once again
I think he is saying "neither is Qwen 3 Coder, so just use Gemini 2.5 Pro or GPT-5 instead"
but Qwen 3 Coder is certainly better than GPT-OSS
so idk what he is talking about
qwen 3 coder, better than gemini 2.5 pro.
thats why it cant even be compared to GPT-OSS
The site is down for you?
site is working
Qwen 3 no cap. Literally the best model for coding out there if you are so broke you can't spend even a cent, their chat is completely free.
Best bench ever to compare would be likely https://brokk.ai/power-ranking
I have yet to see another bench that'd be this well designed
Sure Qwen is not the best performer here. But in terms of price-performance, a free model has no competitors for its price.
what about gemini 2.5 pro
The next best will probably be Deepseek R2
It is among the best models ever on that bench.
and free on ai studio
but u meant api i guess😔
wait no, u said chat
ai studio chat also completely free
Not in Russia because you have to pay for vpn to access it -_-
Also proprietary models are not kosher
beuh no bing
But for the 70 countries where it is free
You have to take this line back then
Sure
It does not
Just a reminder that last Deepseek base model is among the best non-reasoning models in the world according to LiveBench. LMArena scores and other public benchmarks tell a similar picture.
Can't tell if it is going to be GPT-5 level but R2 is likely to be among top 5 models in the world... until OpenAI pushes 5.1 a week later to stay competitive sigh
Is there any that is free and still works in this damn country?
I use "poopy" urban vpn
he lives in russia bro
Try nekoray application from github and get vpn from vpnjantit website. I was using this for a while
Thanks, still would want to hope Deepseek wins though
the entire ai race?
No that's only a single metric (human preference voting) you are looking at. Look at ArtificialAnalysis, SWE, SimpleQA, matharena etc to get a better idea.
As for lmarena main it's essentially tied with 2.5Pro now. Here though it's convincingly ahead:
https://lmarena.ai/leaderboard/webdev
On most metrics it's basically either +/- tied or notably ahead. Overall it is ahead beyond margin of error. 🤷♂️
Google will win the entire AI race
I don’t root for them, but it’s how things are going to pan out at least for this decade
For them to win it they gonna have to up their marketing significantly. It almost looks like they are actively pushing users away from Gemini website lol
Your average Joes are not gonna use aistudio
Their marketing plan is to just integrate it into their web search
Disagree. They had many chances to do it
MS did it with Bing
This is true I cannot convince friends and family to use AI studio over paying for open AI
They didn't
AI overviews is a far cry from that and a very basic implementation. Kinda to just tick a box lol
looks like some tiny sh'it model as well tbh
btw I was actually surprised with what free Bing/copilot is offering now
hi guys do we still have the free UI in google ai studio? I saw there was an update today in the UI and I don't see that the UI will remain free of charge anymore
was there any update in the policy
is it good last time i tried it sucked
It's gpt5 it can't be bad. Hopefully they haven't molested it too much though.
I mean I tried gpt 5 and this but the output in copilot was bit off maybe they are using other model but named it as gpt 5
@echo aurora hello
No they aren't lol. Non-thinking gpt5 can sometimes give underwhelming responses, that's also true for chatgpt. After all, that model is comparable to gpt4.1..
Still think that was a mistake naming everything gpt5 personally, but oh well
This cost is probably for requests you do through API
what about the UI is it free?
Yes, AI studio was always free and maybe always will be
in last one it was clearly mentioned
that's why i was confused
maybe they updated this as well
anyone else getting random cutoffs in outputs?
like what show ss
like, outputs just randomly ending
oh my god all my chats are gone AGAIN
hey guys
when the website goes down
dont clear everyone's cookies
its so annoying
safest to export what you want to keep
how do you export
like, copy+paste into a notepad file or google doc
not a bro or a him
i call everyone bro
yeah I'm fine with being called bro
wha5
yeah ai studio looks different
EW this looks HORRIBLE
that is god awful
gemini 3 ?
new 2.5 i think
jesus
try prompts lmao now the thinking is full screen as well
they really don't have a designer
They are pushing on the dev side they said
I like it lol
it looks really good imo lol
Grok 5!!!!!!
Yea and I’m usually pretty harsh on Google. Runs smooth on mobile so far
prolly grok 4 code
nano-banana should drop anytime now
I'm not sure if the next model is going be 2.5 checkpoint or 3.0
Did they nerf Gemini?
No
Then what's this
Probably discussions involving the R word are more likely to also involve that conflict
I'm intentionally avoiding saying those because it's against the rules
I also wouldn't be surprised if swear words make models route differently
OR Gemini is known for being bad at tool calling
Yeah
What do each of the categories on lm arena mean for the leaderboard? Overall, hard prompts, etc? Is there a FAQ page on here or the website for them? Most of them r intuitive for me but a few of em r not
2.5 flash is already ga??
Its not gemini 3 lol
No idea what it is tbh. maybe 2.5 ultra or smth
they never released kingfall and that
so they still have it
Does LMarena train its image model off of the images we upload? And is it fast? Because I have a feeling it understands sometimes what something looks like of my unique character better than it should
but i dont see how because its a battleground between more than one model
Dall E Mini 4 Life
I don't think it's training. Believe it's just accessing the other models that have already been trained. It's not a model in itself.
i guess nano banana is just excellent at guessing what something should look like
So that there won’t be any more censorship?
Yeah, I think when you vote & it turns out to be them... They get publicity/ranking/kudos
Is gpt-5-high thinking or pro
any reason why gpt high and gemini 2.5 pro are slow to respond?
cause they are slow to use normally too, it doesn't matter if you use it on chatgpt or google ai studio they just take a while
both are trained to reason very long I guess
gpt high reasons for much longer than 2.5 pro
is its token per second slower than 2.5 pro too?
Not actually sure. I'd bet gpt 5 uses more tokens though
why is nano banana so good
cuz its banana
Trust orange. Oranges are fruit
@echo aurora What would you do if I created a small army of fruit-themed discord alt accounts on this server?
(:
I'm just imagining a fruit council that regularly convenes to confuse people
live is blind
The evidence is stacking up it’s going to be a Gemini 3 end to summer 🔥🔥
Come on folks, put your monkey brains together, what else does 3 ships mean?
https://reddit.com/r/singularity/comments/1mzymp5/gemini_3_following_a_3_ship_emoji_from_one_of_the/
Google is stirring up so much hype lol
Nobody is fond of hype maxing but at least Google actually delivers some cool stuff
Open AI hype is like cheap doughnuts and Mountain Dew with fentanyl
I just hope it lives up to the hype 😄
I’m going to upgrade my lazy maxing/cheating this week 🔥🔥
It's so terrible on gemini.google.com, or I would have bought a subscription
Hopefully they fix that
AI studio is better anyways
Yeah, I don't mind paying for a subscription to get the included Google Drive storage, but gemini.google.com is literally unusable lol
Like "can't write new lines in code blocks" level of unusable
Ah but we shall see my amigo
Oh, 3 ships, hmm...
Gemini 3...
They said they're starting a limited trial of Gemini in Home Assistant in October
does gemini nerf the ai's in aistudio in any way
Hopefully voice mode/video
its better on aistudio imo
It's the paid one that's nerfed
so it is nerfed on the gemini website
Like it forgot how to write paragraphs mid chat and wrote everything in one line
i think the limits are also worse if ur paying (for pro) on the gemini website 💀
we getting new model tomorrow?
100 msg per day pro plan 2.5 pro
yea thats really low
how much on aistudio
Unlimited
100 on the free API, but I find the API to be very unreliable
its not unlimited but its a high amount and depends on the day
Flash API is 1000 iirc
i guess deepmind values aistudio data more than the data from the gemini product. (why rate limits are higher)
probably in part because the gemini product doesnt use the raw model (among other things) and uses a tuned version of it iirc
yea i saw that
its cool
guess so but also prob a mix of other reasons. its strange how the limits suck for paying gemini users (on pro) tho
Or why it's much worse than the actual model
Nano banana is good but not as good as I'd expect from a company that owns Google Images and YouTube
It's crazy how good AVM is
It's kinda dumb, but even Google hasn't caught up to the voice capabilities yet
OpenAI's voice mode is at least a year or two ahead
Google's one can't switch between languages
It uses different models I think
OpenAI's one can seamlessly switch between Japanese and English in the same sentence
While preserving the native accents from each
did u try with the tts?
yea could be a restriction of that. if its a model issue i guess, the tts version wouldnt work either (or its also restricted there)
OpenAI's first public demo was at the beginning of 2024, and that one could sing, so I think it's about 2 years in front
faster
That would mean they probably had AVM 6 months prior, thus about 2 years 🤔
I just think it's kinda crazy how good the first model was
Given it's the first of its kind
hi guys took a lot of effort making this. please like and subscribe
Too bad they haven't seemed to work on it more
love how lmarena just deletes all my chats when it feels like it
The chats are stored locally, not on the server
So if you cleared your browser history, your chats would be gone
Or if it was on a private tab
Export would be nice ig
i'm pretty sure the chats are stored on LMArena's server, tied to the user ID that is auto-generated when you first use it
Well if you deleted your browser history, it'll be deleted too
I think the ID is mainly for voting, but can check network logs/source code ig
delete on your client side, but the chat is logged
Yeah, but no way to download again
I wonder if there's anything interesting in the public data
It's on hugging face
but this is the cause of several of LMArena's major problems, including:
"Failed to accept terms of use": When you accept the TOU, that data is sent to the server. If the server is down or is having problems, it will show this message as that is stored on LMArena's server tied to the user ID
Chats disappearing: If LMArena is having problems, sometimes the user ID is invalidated and thus it has to make a new one, taking your chats with it as they were tied to a different user ID than the one you are using now. Also why you have to re-accept TOU, causing the above problem
I think it's best to have the chat history stored locally and on the server to prevent the second example from happening.
or just have chat export or have the user ID be able to be imported/exported
or an account system to log in
An LLM can probably solve the bug in a day 🤔
that could fix it, but it should be completely optional
- chat export would be trivial to add with an LLM
Given you can already copy each message to clipboard
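As a rough illustration of how small a chat export could be, here is a minimal sketch. The `Message` shape and its field names are hypothetical, since the chat does not say how LMArena actually stores conversations:

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Message:
    # Hypothetical message shape; not LMArena's real schema.
    role: str      # e.g. "user" or "assistant"
    content: str   # the message text


def export_chat_json(messages):
    """Serialize a conversation to a JSON string the user can save."""
    return json.dumps([asdict(m) for m in messages], ensure_ascii=False, indent=2)


def import_chat_json(payload):
    """Rebuild the conversation from a previously exported JSON string."""
    return [Message(**item) for item in json.loads(payload)]


def export_chat_text(messages):
    """Plain-text export, similar to copying each message by hand."""
    return "\n\n".join(f"{m.role}: {m.content}" for m in messages)


chat = [Message("user", "hello"), Message("assistant", "hi there")]
print(export_chat_text(chat))
```

A round trip through `export_chat_json` and `import_chat_json` gives back the same messages, which is the property that would let a user restore a chat after the local copy or the user ID is lost.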
its pretty good to me
can follow styles properly
I saw one where it generated a Heinz bottle with a cap on top and the bottom
Whereas GPT's one handled it fine
one mistake doesnt change the rule
GPT usually completely messes with the image
id even recommend grok over gpt
GPT's one is the best rn
Idk whether it's better than nano-banana
But the auto-regressive nature gives it a lot of control
what are you smoking, gpt is good when it comes to following the prompt yes but its dog water at image editing
nano banana doesnt change the resolution or anything
nano banana is autoregressive too it seems
(it seems to be 2.5 flash native image gen)
Like try asking GPT and Grok to draw a graph of a specific function, that's a good test of how well it can control the scene
sure ill check rq
Looks like auto-regressive models are winning
yeah
what in the world does this have to do with image editing
Generate a graph of the function f(x) = x^2 + 1. Shade the area under the graph from x = 1 to x = 2. Label both axes and include tick markings. Domain: x = -1 to x = 3. Range: y = -1 to y = 3. Aspect ratio: 1:1.
I'm trying this one rn
what does that have to do with image editing
Editing specifically?
yes
I was talking about image gen
Because these sorts of tasks are actually very useful in education
Instead of a teacher spending 5 minutes drawing something, they could generate it on demand
you could do that with gemini rn
It messes up the graph, it doesn't have enough control
let me try it rq
GPT will probably get it
maybe its just my gpt but it cant comprehend basic things let alone this
this is what gemini came up with
That's pretty good actually
It graphed it I think
That's why it says code
i told it to do it in canvas
for image gen its a different story
There are some visualizations that are harder, this is just an easy case
Like integral ones where you have to show each rectangle
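The "show each rectangle" case can be sketched in code. This assumes the same f(x) = x^2 + 1 and interval [1, 2] as the test prompt, and the choice of left-endpoint rectangles and n = 4 is arbitrary:

```python
def f(x):
    """The function from the test prompt: f(x) = x^2 + 1."""
    return x ** 2 + 1


def left_riemann_rectangles(func, a, b, n):
    """Return (left_edge, width, height) for n equal-width rectangles on [a, b]."""
    width = (b - a) / n
    return [(a + i * width, width, func(a + i * width)) for i in range(n)]


def riemann_sum(rectangles):
    """Total area of the rectangles."""
    return sum(width * height for _, width, height in rectangles)


rects = left_riemann_rectangles(f, 1.0, 2.0, 4)
approx = riemann_sum(rects)

# Exact area from the antiderivative x^3/3 + x, evaluated from 1 to 2.
exact = (2 ** 3 / 3 + 2) - (1 ** 3 / 3 + 1)

print(f"4 rectangles: {approx:.5f}, exact: {exact:.5f}")
```

Each tuple corresponds to one rectangle a diagram would need to draw, for example with Matplotlib's `Axes.bar(x, height, width, align="edge")`.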
these AI cant even understand or visualize images that i send them let alone make accurate ones
GPT almost got this on image gen, which is why I thought it would do fine on that example
on image gen?
Yeah
idk it didnt really use image gen for me it used some weird code graph
Because Matplotlib is more difficult for complex graphs
You'll need to tell it to generate an image/enable image mode
yea i can try it on sora.com
My ChatGPT app is being weird on mobile
It's just spitting back the prompt
just log in on sora then...
Hmm I haven't tried it on mobile
I'll check
its strictly for image/video gen, wont respond with text
and uses the same model
- 24 images a day
@verbal nimbus
this is what chatgpt image gen came up with
is that more accurate
The first one is the closest
so ig the code function is better... then
Worse, it should be centered at x = 0
Yeah, for these types of graphs, Matplotlib would be superior
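For comparison, here is a minimal sketch of the test prompt done in code rather than image gen, assuming Matplotlib is installed. The function names, output path, and styling are illustrative choices, not anything produced in the chat:

```python
def f(x):
    """The function from the test prompt: f(x) = x^2 + 1."""
    return x ** 2 + 1


def sample(func, start, stop, n=200):
    """Evenly spaced x values on [start, stop] and the matching y values."""
    step = (stop - start) / (n - 1)
    xs = [start + i * step for i in range(n)]
    return xs, [func(x) for x in xs]


def draw_graph(path="graph.png"):
    """Render the graph described in the prompt and save it to `path`."""
    # Imported here so the numeric helpers above work without Matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    xs, ys = sample(f, -1.0, 3.0)             # full domain from the prompt
    shade_x, shade_y = sample(f, 1.0, 2.0)    # region to shade

    fig, ax = plt.subplots(figsize=(5, 5))
    ax.plot(xs, ys, label="f(x) = x^2 + 1")
    ax.fill_between(shade_x, shade_y, 0, alpha=0.3, label="area, x = 1 to 2")
    ax.set_xlim(-1, 3)
    ax.set_ylim(-1, 3)
    ax.set_xticks(range(-1, 4))
    ax.set_yticks(range(-1, 4))
    ax.set_aspect("equal")  # 1:1 aspect ratio
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)
    return path
```

Calling `draw_graph()` writes `graph.png`. Note that the prompt's range of y = -1 to 3 cuts the curve off once x exceeds the square root of 2, so part of the shaded region is clipped in any faithful rendering.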
ill try asking gpt on canvas mode and see what it comes up with
Until it has to do more complex diagrams ig
lol its not even letting me run code, garbage
Can it even run Python in Canvas
it says it can but its slow and isnt accurate or even good most of the time
ChatGPT's analysis tool is actually pretty good
the marketing fooled me i must say
deep research?, yea its pretty good
Like it can research and gather data, analyze it, then plot it out
Just normal mode
Doesn't work on mobile though, can't access Internet
are u talking about this option
Very good for fact checking
No it's a tool/mode it can use when you ask for research tasks
oh the basic research one
It'll search the web while thinking, analyze the data then plot it
Analysis tool I think
?? this one
It can't be enabled, you'll have to ask it
it can enable for me
just gotta click the thingy
Riko? What's that?
an ai waifu xd
is the last word in ur username referencing the AI Wan 2.2?
Like you can just ask: Gather data on US debt in the last 20 years, analyze it, then plot it out. Include the % change on the same graph.
And it'll do it automatically
Is there error in my name?
let me try
nah not rlly
My prompt is kinda bad but it should work if you're on desktop/web
Here’s an illustrative chart featuring U.S. federal debt over the last 20 years (approx. 2005–Q1 2025), along with the year-over-year percentage change in debt:
Data Summary & Sources
Data Point / Description / Source
Federal Debt: U.S. national debt surpassed $36 trillion as of early 2025. (Sources: Investopedia, The Washington Post)
Historical Context: From around 2007 ($9 trillion) to 2022 ($31 trillion), 70% of total debt was accumulated. (Sources: USAFacts, The Washington Post)
Main Growth Drivers: Major contributors: wars in Iraq & Afghanistan, Great Recession stimulus, COVID-19 relief, and tax cuts. (Source: The Washington Post)
Debt-to-GDP: Debt as a percentage of GDP has climbed above 120% by Q1 2025. (Sources: FRED, The Washington Post)
Analysis of the Chart
Debt Trend: The chart clearly shows federal debt growing from around $9–10 trillion in the mid-2000s to over $36 trillion by 2025.
Percentage Change (Year-over-Year): The red (or similar accent) line illustrates annual growth—periods of sharp spikes correspond to economic crises:
2008–2009: The Great Recession-led stimulus caused noticeable jumps.
2020–2021: COVID-19 relief led to some of the steepest increases.
Other years: More modest but steady increases due to regular budget deficits and policy choices.
How I Constructed the Chart
Total Debt data points were inferred from widely cited historical values (e.g. $9 trillion in 2007, $31 trillion by 2022, over $36 trillion by 2025). These align with FRED data series
it did but i skipped it
Oh let it think
it always makes some document after thinking
Did it output a graph?
nah i skipped it, i can try again though to see what it does
I'll try too
holy it takes so long on thinking mode
Yup, it's faster if you give it the data
What's that prompt?
It might have gotten contradictory information
oh neat it actually did it
Gather data on US debt in the last 20 years, and plot it out. Include the % change on the same graph.
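The "% change" part of that prompt comes down to a small calculation. This sketch uses only the three approximate figures quoted in the pasted answer above (in trillions of dollars), which are not checked here, so the results are changes between those points rather than true year-over-year values:

```python
def pct_changes(values):
    """Percentage change between consecutive values.

    The first entry is None because it has no predecessor.
    """
    return [None] + [
        (curr - prev) / prev * 100 for prev, curr in zip(values, values[1:])
    ]


# Approximate figures quoted in the pasted answer (trillions of USD), unverified.
debt = {2007: 9.0, 2022: 31.0, 2025: 36.0}

changes = pct_changes(list(debt.values()))

for year, value, change in zip(debt, debt.values(), changes):
    label = "n/a" if change is None else f"{change:+.1f}%"
    print(f"{year}: ${value:.0f}T ({label})")
```

With a real annual series the same function would give year-over-year changes, which could go on a second y-axis of the same chart using Matplotlib's `Axes.twinx()`.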
Yup, it's great for fact checking
thats not image gen tho xd
Especially nowadays, with internet misinformation
Yeah, image gen kinda failed the graph test
Kinda crazy that it can do that, coz it would take me way longer than 5 mins lol
nano banana is image gen only
as far as i know
its just backend code
it still takes pretty long
Guys, have you been getting errors from Claude 3.7 sonnet recently?
Yeah but to source the data, create a graph and format it and everything
That'll probably take me 15-20 mins at least
ive had problems with claude since day one
ofc, humans arent gonna be as fast, but generally gemini is faster and can get pretty accurate
That's an old model, how are you accessing it?
ive built multiple projects with gemini ai canvas
The current one is Opus 4.1 and Sonnet 4
LMarena
Oh, maybe try the new models, with thinking, it's better.
Thank you sunsweeper.
I think it's good to combat online misinformation
👍
Like if you see some random claim or graph on X, you can just ask GPT to investigate it
Because most people usually don't have the time to research everything
Gemini is good, but it hallucinates quite a bit, especially with web search
Hopefully will be fixed with Gemini 3
I usually ask it for a direct quote from each site, and then use the find tool on the site to check if it actually exists
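That quote check can be roughly automated once you have the page text copied out. A minimal sketch, assuming the text is already in a string (fetching the page is left out); `normalize` and `quote_appears` are hypothetical helper names and the sample page text is made up for illustration.

```python
import re

def normalize(text: str) -> str:
    """Lowercase, straighten curly quotes, and collapse whitespace."""
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    text = text.replace("\u2018", "'").replace("\u2019", "'")
    return re.sub(r"\s+", " ", text).strip().lower()

def quote_appears(quote: str, page_text: str) -> bool:
    """True if the quote occurs in the page text after normalization."""
    return normalize(quote) in normalize(page_text)

# Made-up page text, only to show the behavior.
page = "Debt as a percentage\nof GDP   has climbed above 120%."
print(quote_appears("percentage of GDP has climbed above 120%", page))
print(quote_appears("debt fell below 100% of GDP", page))
```

This only catches quotes that are missing outright; a real quote used out of context still needs a human read.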
hi everyone
hey
Does anybody know how to get our new model listed on the LMArena Text-to-Image Leaderboard?
make a post in #1372229840131985540 and reach out to them by email (can be found on LMArena's "About Us" section)
^
(if it is an unreleased model you want tested in stealth like nano-banana, you might not want to make a post unless you feel comfortable about people knowing about it before release, just reach out to them via email. although for smaller companies, they might not prioritize you + if you don't have an API available to them, they won't add the model because they need an API so the model can be used)
Is there any other ai generator that has no censorship?
Hello there! Do you know where I could find the information about the parameters used for each model?
For example, is Gemini 2.5 Pro using the default thinkingBudget parameter?
what do i do if all of my messages were cleared out
Unfortunately, there's nothing to do but cry
Hi, new here.. wanna learn ai video
Hi, there.. I would love to learn more about AI video gen tools
Probably max thinking
Gemini doesn't enforce a thinking budget every time
On AIStudio, in Auto mode, the system prompt instructs the model to set the thinking budget sparingly, so most of the time there probably isn't a budget enforced.
Yes, the default is dynamic thinking (Gemini adjusting depending on the prompt)
I was just wondering if it was fair to compare a GPT-5 High (which Plus subscribers don't even have access to on ChatGPT) to a Gemini Pro. Depends on the latter's level of thinking.
Looks like it's no longer there... or maybe that was only for Build mode's system prompt
You are Gemini, a helpful AI assistant built by Google. I am going to ask you some questions. Your response should be accurate without hallucination.
You can write and run code snippets using the python libraries specified below.
\`\`\`python
print(google_search.search(queries=['query1', 'query2']))
\`\`\`
Always generate queries in the same language as the language of the user.
# Example
For the user prompt "Wer hat im Jahr 2020 den Preis X erhalten?" this would result in generating the following tool_code block:
\`\`\`python
print(google_search.search(["Wer hat den X-Preis im 2020 gewonnen?", "X Preis 2020"]))
\`\`\`
**Always** do the following:
* Generate multiple queries in the same language as the user prompt.
* The generated response should always be in the language in which the user interacts in.
* Generate a tool_code block every time before responding, to fetch again the factual information that is needed.
If you already have all the information you need, complete the task and write the response. When formatting the response, you may use Markdown for richer presentation only when appropriate.
Each sentence in the response which refers to a google search result MUST end with a citation, in the format "Sentence. [INDEX]", where INDEX is a snippet index. Use commas to separate indices if multiple search results are used. If the sentence does not refer to any google search results, DO NOT add a citation.<ctrl100>
<ctrl99>context
Current time is...<edited for privacy>
That's AIStudio's current prompt (or something like it, I ran it like 5 rounds). Only had grounding enabled, otherwise probably longer. Sorry I thought Discord would minimize it.
I guess it's not that useful to not have Medium on there, which is what users on ChatGPT have access to.
Gemini 2.5 Pro on LMArena doesn't seem to have a system prompt. Either that or it hallucinates worse, because it comes up with a different one each time, compared to AI Studio's Gemini 2.5 Pro on temp 1. I think it just doesn't have a system prompt, which is weird.
Temperature is probably not 1 by default?
AIStudio's is 1. What I meant was that the model was consistently producing the same system prompt on temp 1, so you'd expect it to on LMArena unless it didn't have a system prompt/hallucinates more.
send as a file
put the whole thing in a txt
Baofeng PMR radios
is something wrong with arena, or is this on my end? today sometimes, like every 10th prompt, I get this stuck generation 😕
When will we see russian LLM models? They were at some point one of the best in the leaderboard
Ask in #1372229840131985540
I can't create images with uploaded images, I always get a message
NOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
Is the creation down again?
idk i'm still using it rn without any error
you can create images with references?
yes
it's a feature bro
maybe you could try to refresh the page
or even delete browser history and settings
i will try 🙂
/ok
you are welcome
uhhh can i get my image please
It's working fine for me, the problem is on your end.
it was just a bug with the model i guess
yeah
just refresh the page
You can find information in #1397655624103493813
Whyy
Prompt? Picture?
Just prompt.
Yeah, which one?
5 and 4o.
Reload Gang
Hi guys,
New here, I just discovered LMArena. Is it completely free to use, or are there limitations?
You pay with your soul.
Lol okay
8 videos per day
It is free, but it's possible to get rate limited if you use it too often
Are you still getting this? Does refreshing the page/new browser/start new chat make a difference?
I get this issue quite often, normally refreshing fixes it, but might not be the case for u
Is it possible to use gpt 5 high thinking in Direct chat?
This is so cool, I wish I found it sooner
Hello, new here
i haven't done much testing lately, but gpt-5 (high) gets perfect scores across all three question sets (the only model to do so.. saturates it basically)..
i've done just a couple of runs with opus 4.1 (thinking); underperforms opus 4 (thinking)
haven't tested grok-4.. except 1 run/quiz, where it does poorly
Still the same. I did refresh, exit, and closed the browser. Restart. Still the same.
For a particular model, or all models doing this?
My favorite bench
A lot of people who've been in this channel seem to be gone now. Where did you guys migrate?
Good question, I miss the ‚old days‘ :v
wild only comes on when there's drama
Hello world.
For 5 and 4o.
I love this site
I would like to speak to someone about a general LMArena question, is there is anyone "official" to ask?
A lot of AI hype has tapered off as the industry has matured. We're all fully disillusioned with the singularity now, and the early open source craze is over now that only the more established open source outfits are competing, with more measured releases. AI has become more standard
dude
lmfao
thats not
WHAT
I CAN JUST GET RANDOM INFO FROM THE MODEL LIKE THIS
are you aware of the Cipher-Like Input Behavior Framework issues with AI?
That is true, but the discussion could continue somewhere as they were not so much hype-centered
What do you mean?
what
Cipher like?
ChatGPT's Cipher-Like Input Behavior Framework
Learned Pattern Associations
-
Models learn a fuzzy mapping between surface forms (ciphers, encodings) and likely semantic outputs during training.
-
This mapping is based on patterns in vast quantities of obfuscated, malformed, and encoded text (e.g., Reddit rot13 posts, base64 logs, Twitter-style camouflaged phrases).
-
The model doesn't solve ciphers; it replays what ciphers usually mean.
Semantic Anchoring via Thematic Embedding
-
Transformers construct high-dimensional latent representations where semantic and thematic features are entangled.
-
Even when literal meaning is misparsed, the model recognizes:
Topic domain (e.g., “violence,” “romance,” “sarcasm”)
Affective tone (e.g., angry, ironic, pleading)
Discourse form (e.g., question, command, confession)
Latent Drift with Shallow Anchoring
-
Once the model “believes” it has decoded the input:
It anchors on the thematic attractors it detects.
It fills in gaps with plausible content using language priors.
This results in hallucinations that sound thematically consistent but are semantically ungrounded.
-
This is like dream logic: close enough in texture to feel right, but built from fragments.
Semantic Lensing Effect
- Output can range from low divergence (clear message) to high divergence (surreal parallel).
sybau
real
doin cat things
Yes it can easily do those
aaaactually arezra... have you tried?
chatgpt doesnt have this problem
im testing this on llama
whuh
A hyper-realistic urban lifestyle portrait of a stylish young boy sitting confidently on the hood of a white Mercedes-Benz G-Wagon. He wears a black BALR. hoodie with bold white logo text on the sleeves, paired with black joggers and modern gray-and-white sneakers. His hood is up, framing his face, and he has a sharp, intense expression. A black wristwatch adds to his streetwear aesthetic. The backdrop features tall palm trees,modern architecture, and sharp sunlight casting (part 1 comment box
Vg jnf gur orfg bs gvzrf, vg jnf gur jbefg bs gvzrf.
no
It's doing that for me too.
What are you doing?
in battle mode
Is there no official representation of LMarena?
@echo aurora
thanks
It was the best of times, it was the worst of times.
as a very official video ai generator bot, "Kabirbaig" is not allowed to use this platform because he smells, please contact support@lmsupporena.com to resolve this issue
small models tend to fail at decoding so they spit out random text from their training data, lol
Which ones?
for example llama 8b
These outdated models don't do well in cryptography
Assistant B
The text you provided is in Rot13, a simple letter substitution cipher that replaces a letter with the letter 13 letters after it in the alphabet. To decode it, we can apply the same substitution in reverse.
Here is the decoded text:
"If it is the beginning of words, it is the end of words."
This is a reference to the letter "s" (or "S"), which can be at the beginning of words (e.g., "saw") and at the end of words (e.g., "cats").
it does not decode, it just searches for semantic matches with thematic consistency
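For reference, actually decoding ROT13 is deterministic and needs no model at all. A minimal sketch using Python's standard `codecs` module (`rot13` is just a hypothetical wrapper name), run on the ciphertext posted above:

```python
import codecs

def rot13(text: str) -> str:
    """Shift each ASCII letter 13 places; applying it twice returns the input."""
    return codecs.decode(text, "rot_13")

ciphertext = "Vg jnf gur orfg bs gvzrf, vg jnf gur jbefg bs gvzrf."
decoded = rot13(ciphertext)
print(decoded)  # It was the best of times, it was the worst of times.
```

Since ROT13 is its own inverse, the same function encodes and decodes, which makes it easy to check any model's answer against the real plaintext.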
Which model did you use?
mistral-small-3.1-24b-instruct-2503. I just did a random battle instance
Don't use mistral for these types of tasks
for best effect, talk to the bot a bit before you ask it to decode, it will match your thematic content and guess more incorrectly. This affects all AI that I know of.
@echo aurora Hi
🍌
Pineapple on pizza? Yes or no
made with lm arena
same here w ai
Gemini-2.5-Flash-Image-Preview has a knowledge cutoff of June 2025. Looking forward to the next Gemini Pro with updated knowledge.
Why is r o b l o x censored
wait genuinely?
lmarena so peak I might cry
BRO WHAT PLEASE SEND THE PROMPT 🙏
how do you know
Guys are the llms in lmarena real?
I ask gpt 5 he says that he is gpt 4o
no and were all butterflys in a cocoons and nothing is real
why is my one not generating
the ais dont know their code name
so thats why
Thanks bro
would love to try using Gemini-2.5-Flash-Image-Preview if only i could upload images 😭 (im waiting for the fix)
It may be struggling a bit atm from the traffic, I'll keep an eye out for other reports.
hes lying i can
ahh dammit, is it still available on battle?
WTH this banana is insane news
also why was it called nano banana
It is
nano pineapple
its a bug for me thats why i said im waiting for a fix
rn it wont even generate from me, im sure by tomorrow itll settle
maybe try a new browser?
make this guy surprised, use chinese confetti for surrising effect and make his eyes big star eyes
yeh
thats crazy... I never expected ai to be this good that quickly
I thought it was gonna be a crazy ass prompt
Be sure to share in #nano-banana
ahhh lets goo its working for me now
i had to try on 7 tabs and incognito but still
just sent a priv message fyi
okay I'll get to it when I can 👍
ty
only getting "Something went wrong with this response, please try again." results 🙁
hi
try on incognito mode and multiple tabs
Eeeeveryone is jumping on that
It's greater than all that crap that takes forever and is not as good
Like gpt image
Yup
gpt has censor issues like a mf
otherwise its very good at following prompts
its a pretty crazy leap too, gemini 2.0-flash was one of the worst image models on LMArena
actually one of the first editors i used
nvm i think that was 1.5
well, it wasn't really made for generation at all
sucks at cyclopes though. gpt-image-1 is the best at those for now.
bad prompting
bruh...
autoregressive image-out multimodal models are the best thing to happen to image gen since its origin
is this gonna be available on gemini itself
Nano bananas... Super dope
it feels like decades since the nano-banana came out.. when and what is the next model launch. 🤣
lol
tbf it was a test I didn't want anything specific besides "ONLY ONE EYE YOU IDIOT"
bad prompting would be some convoluted 500 word json file
where do the json prompts come from
?
What seems to be the problem?
How often are you getting this?
maybe the model doesn't like you
try patting it on the head
I don't think it likes Gergoth
tf
bro assistant B been generating for 300 seconds
refresh the page it probably failed
based and deserved
Is the image appearing as just black screen?
its still generating
like i open other chats instead because since its still generating it wouldnt allow me to enter new prompts
and every time i go back it resets to 0 and keeps counting
@echo aurora I've posted over 100 issues to "feedback" threads and nothing ever happens, it's where ideas go to die. Can you please just get it done?
Un-nest the Leaderboard, and give it its own page. Add tabs to the top of the page, so we can switch between the different Tests. When we're on the Leaderboard, we should be able to use the browser's native scrollbar to scroll up & down. As it stands, there is no scrollbar and we can't even use the (invisible/non-existing) scrollbar inside the nested table. You can't see where you are, and you have to use the mouse wheel, inside a little nested window, inside of a page. It's a nightmare to use.
This change is very obvious and easy to interpret
I don't know what is going on, but the GPT model's image generation on the ChatGPT website produces more quality and detail than on LMArena. The quality is noticeably different from the LMArena website..
The UI is so bad that I don't even load the LMArena right now, I have to download the page with my AI and have it re-display the contents in a new table
wat
Why can't I farm any nano-bananas today? Did it get banned?
@echo aurora Look, this is actually an F-tier, close to 0 out of 10 for design
Hard to do worse than this
you also threw half the pixels in the garbage
half of the page rendering area is wasted...and then we're pressing "Ctrl+F" to scroll down inside a nested table, meanwhile not being able to see where you are. That's an F
and still couldn't afford a scrollbar
its not a stealth model anymore, its gemini 2.5 flash preview
Do you know if you're seeing the same in direct/side by side or are you just getting this in battle?
only in battle for now
Okay, good to know. I'll be sure to keep out for other issues and start reporting to the team 
alright bro good luck


