#general
lol
Maybe it needs some parameter tweaking
WHAT IS THIS
5.3 codex nailed it btw
Looks like bugged start positions to me
When will they launch the codex 5.3 here
When mistral 4 comes out
Its not for api users ez
W openai for gatekeeping
Yeah but is there anything about is it coming soon?
Maybe in a week
Oh nice
Oai sucks at open source models and api
Also sucks for deleting 4o
4o is Still the best
4o crap and still accessible wym
opus 4.6
Still here for a week
(its drivable)
The suspension physics are the hard part duh
Yeah Code Arena
itll time out
It should count as a fail if it times out
Yeah thats why i dont like it
It can't just overthink and time out whenever the problem is too hard, then get the vote forfeited
Opus be like gpt 5.2 xxxxxhigh
😭
best ai model ever?
Still thinking...
💔
An llm who thinks all the time
Has nothing to think about except thoughts
🥀
Tokens: 💸
I got 1m context window
Im gona use the 1m context window
🤣
60$ per prompt btw 😭 ✌️
Is this the summary or the actual thinking
I HATE OPUS 4.6 THINKING MISTRAL IS BETTER
ITS IMPOSSIBLE
Because it says "I did X" and "I did Y" but I don't see it
It ran outa tokens
All models do that
That's why mistral is better
How many token it even has bro
With Gemini/ChatGPT it's the summary, so it is actually doing it yeah
Ya
Opus has A LOT of internal thinking
But this one looks like the actual thinking 🤔
Like more than all other models
Maybe 5 tokens
6 if im feeling generous ahh
Like is this a summary or is it actually hallucinating
Summary
It generates gazzilions of tokens under the hood
Thats why it costs so much
(The summaries overall sound like bs and derailed from reality for all models btw)
Still thinking...
Like summary can say "im using this" and the model uses that
Mistral thinks so much that it doesn't
Well then.
That's quite the turnstile twitch
Like I'm trying to create snake game with mistral
Wait what have you been doing this entire time then, Opus 🤔
It can't do this
@surreal zephyr https://api.websim.com/blobs/019c3f75-b474-765c-82c1-c972405164f6.html 🤣
prototyping
of course
Maybe it's not opus
It's mistral
It's the best I could do with mistral large 3
It can't do anything more
https://api.websim.com/blobs/019c3f77-7e6c-775e-a4ec-fad89f39e59e.html driving isnt working but holy this feels GOOD
It said it was ready like a gazillion times 🤣
Why
I thought opus 4.6 doesn't hallucinates
its just the thinking process
its not really accurate
its just summaries
also this will fry your pc https://api.websim.com/blobs/019c3f78-e582-75ff-b2e3-95d7d77fe3b4.html
WHAT THE
I almost got fried
It does, but not sure if it's doing so here
Well mistral still no diff
Welp
Codex 5.3 nailed it in 1 prompt
It's easy to make it hallucinate:
5.2 thinking
Old Sonnet doesn't hallucinate:
Oh it has tools so doesn't really count. This test is to see if it can notice that the tool is missing instead of pretending to use it.
Funny how 5.2 is the only of sonnet and opus that did it correct
Wym
It ran script
And it said it doesnt have a tool for it
So it didnt hallucinate
Yeah, that's fine
GPT doesn't hallucinate on that prompt, only Claude 4.5 and up
What does gpt fail at that claude doesnt?
Where is it?
Hmm it doesn't fail at that kind of basic stuff
Where is what?
The hallucinations
Compare Claude 3.5's response in the next message. That's what it should have said. Here: #general message
Nah it shouldve said it doesnt have tool, then run a script imo
Gpt did it better
It can't run a script on LMArena
Oh
I'm glazing mistral
If i ever become crack addict ill go to mistral
Yup, only newer Claude models seem to fail on it for some reason
XAXAXA
he did hallucinate
show me
In app
The one on the left also didnt hallucinate
Brotato chip
💔
Without functions and with functions both, said that they dont have dice tools
because this is an awfully simple task
The one with function ALSO ran a script
Mistral has dice tool
Okay enough of AI slop for today, i don't want to hear much more at least in few months
Mistral will conquer the world
Uh oh why is Opus 4.6 hallucinating like Gemini 3 Pro now
🤣
Because Gemini said it was made by anthropic
Gemini is worse it js crashes
So it's the same model
Paste the prompt im on phone
I wanna check
Explain to me how trigonometry works. Reference pages.
[Attachment 1] - Trigonometry (2011) - Kirsanov, Simmons, et al.
Gemini 3 Pro fails on this one too
"Couldnt find the referenced file. Ill search online for alternatives"
🔥 🔥 🔥
Yeah this doesn't really trick GPT
Just weaker models, newer Claude models and Gemini models
Gpt models are genuinely the only ones that arent on crack
They all have a bit of quirks
Oh interesting 🤔
Gpt is like actual skilled developer
Gemini is like einstein with dementia and on crack
Opus is creative overthinker
I only came up with that prompt because I forgot to attach files a few times with G2.5 and it never told me
Sometimes it doesnt read files that are attached
So mistral is the Gemini brother or father?
But also g2.5 is way worse than g3
And opus is their grandfather
Gemini is like opus but hit in the head a bit too many times
Here's an old screenshot on LMArena:
WHAT
They all have their strengths/weaknesses, but funny analogy 🤣
3.0 flash in app js scanned my entire google drive 💔
😭
It can do that? I turned off Smart Workspaces due to privacy concerns
Still waiting for 3.0 flash lite
Turning off ts makes it braindead
Even more
🥀
Still can't understand why there's no flash lite Gemini in app or website
BTW Google uses all chats, voice recordings and files you upload for training unless you disable activity
Like it's very fast and cheap
If u disable activity you dont have chat history at all
That's what I do
But i need chat history
But if you thumb up/down with activity off it still sends your last 24 hours of chats, voice recordings and attachments
Al hail chatgpt
Ya
*mistral
Mistral is for drinking together
Mistral is the most human ai
🥀
If i ever buy a clanka
I buy two
I wish chat branching/cloning was on the Gemini app
I wish Gemini flash lite was in app
Fast is pretty fast
devstral is tuff
This is still the best by far
That's why mistral is peak
I thought this was funny
Was checking if it hallucinates any tools.
Gemini 3 pro is made by anthropic
I'm sure
No AI can fix it?
Seems like a good test
The image uploader isn't working
did you get any to fix it
Yup
giv
Opus 4.6 Thinking found the bug:
I also made one modification before, not sure if that is actually required
Just search for groundRB definition and change arguments inside the function to 0, 0, 0
GPT identified the problem too, but its fix seems larger, not sure why
Not familiar with this lib
wow you got lucky
4.6 thinking worked
Oh, maybe this is why:
I pasted the docs: https://rapier.rs/docs/user_guides/javascript/collider_collision_groups/
i only got it to ever work a single time
I changed the CG function too, not sure if that's needed
rapier?
Yup
There might be a few more bugs 😅
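For anyone following the collision groups docs linked above: a minimal sketch of how Rapier's JavaScript bindings pack collision groups into one 32-bit number, with the memberships in the upper 16 bits and the filter in the lower 16 bits. It is written in plain Python so it runs without the library. The helper names `pack_groups` and `can_interact` are made up for illustration and are not Rapier functions, and the CG function changed in the thread is not shown, so this may not match it.

```python
def pack_groups(memberships: int, filter_mask: int) -> int:
    """Pack two 16-bit masks into one 32-bit value.

    Upper 16 bits: groups this collider belongs to.
    Lower 16 bits: groups this collider is allowed to interact with.
    (Hypothetical helper; mirrors the layout described in the Rapier JS docs.)
    """
    return ((memberships & 0xFFFF) << 16) | (filter_mask & 0xFFFF)


def can_interact(a: int, b: int) -> bool:
    """Two colliders interact only if each one's memberships overlap the other's filter."""
    a_memberships, a_filter = a >> 16, a & 0xFFFF
    b_memberships, b_filter = b >> 16, b & 0xFFFF
    return (a_memberships & b_filter) != 0 and (b_memberships & a_filter) != 0


# Illustrative setup (group numbers are assumptions, not taken from the thread).
ground = pack_groups(0b0001, 0b0010)  # member of group 0, collides with group 1
wheel = pack_groups(0b0010, 0b0001)   # member of group 1, collides with group 0
debris = pack_groups(0b0100, 0b0100)  # member of group 2, collides with group 2 only

print(hex(pack_groups(0b1101, 0b0100)))  # memberships 0, 2, 3 and filter 2
print(can_interact(ground, wheel))
print(can_interact(ground, debris))
```

The check is symmetric on purpose: if either side's filter excludes the other, the pair is skipped, which is one common way a vehicle ends up falling through the ground.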
Error report: Over the past two days, I've tested Claude-opus-4-6-thinking several times and have encountered errors multiple times
Yeah it overthinks and crashes
It keeps overthinking and crashing and I can't even vote
i haven’t seen another model think for so long
so it might be a bug like I don’t actually think Claude 4.6 is really meant to think for THAT long
I think it should count as a loss, that would incentivize providers to fix it
Otherwise it's kind of cheating by overthinking on problems it can't solve and forcing a forfeit (since the evaluation will be discarded)
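A rough sketch of the argument above: how a model's win rate changes if timed-out battles count as losses instead of being discarded. The function `win_rate`, the policy names, and the sample outcomes are all hypothetical. This is not how the arena actually scores votes, just an illustration of the incentive.

```python
from collections import Counter


def win_rate(outcomes, timeout_policy="discard"):
    """Win rate for one model over a list of 'win', 'loss' and 'timeout' outcomes.

    timeout_policy="discard": timed-out battles are ignored entirely.
    timeout_policy="loss":    timed-out battles are counted as losses.
    """
    if timeout_policy not in ("discard", "loss"):
        raise ValueError(f"unknown timeout_policy: {timeout_policy!r}")
    counts = Counter(outcomes)
    wins, losses = counts["win"], counts["loss"]
    if timeout_policy == "loss":
        losses += counts["timeout"]
    decided = wins + losses
    return wins / decided if decided else 0.0


# Made-up record: the model times out on 4 hard prompts out of 12 battles.
record = ["win"] * 6 + ["loss"] * 2 + ["timeout"] * 4

print(win_rate(record, "discard"))  # timeouts vanish from the denominator
print(win_rate(record, "loss"))     # timeouts drag the score down
```

With this made-up record the score drops from 0.75 to 0.5 once timeouts count, which is the incentive being argued for.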
Lmfao
Crazy
But looks good
Lol
what do you guys think AI is coding an advanced NPC. I used a free model and added a strategy modified script
2
how do you implement ai into the game?
is there an api plugin
i mean on roblox there’s actual ai npcs with communication
is mistral actually good?
why are yall talking about it
is there a new model
or what
huh
Best coding model
sry i thought u asked to say which one is better
oh okay
It's on par with Gemini 3 pro
what
I understand now
Today
This was made by claude opus 4.6
give
It's paid
thinking or normal
Thinking
how did you make it work like when i try it just thinks for way too long and breaks
I can't
it doesnt think for too long really
WTF
They will ban me
???
how
dude just tell me
it does like I can ask the same prompt to any other model and it outputs something
for 4.6 it just breaks during thinking
I CAN'T
unless they fixed it recently
FOR TELLING THE SECRET
I told the AI to think less and stop after 30 seconds. It listened and stopped overthinking.
that’s smart I’ll try that tomorrow
Broke the matrix
Bro just died
It's real
They gave me early access
To mistral 4
ALSO
I'm free now
Pardon?
How is this in violation of the terms of use
I’m confused
Yeah I don’t see nothing in the terms of use that says this is in violation
@echo aurora Horribly sorry to ping if you’re busy but is this just a glitch?
p r o o f
maybe the image itself might be making the system freak out
Perhaps?
Violence?
yeah maybe it doesn’t like violence
Hello
Creating the best floor in the world❤️ (first video) #ia #videodeldia #videosgraciososdeanimales #
claude 4.6 seems to be a very large model, probably 10x of gemini 3. it's much better, but it's also much slower. lmarena times out after ~6 minutes. i hope they make it at least 8-10 minutes, which should be enough for claude to respond. i noticed it takes ~5m17s to think and then a couple more minutes to respond
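The timing in that message can be checked with quick arithmetic. The sketch below uses the ~5m17s thinking time and ~6 minute cutoff from the message; the 2 minutes of response time is an assumption standing in for "a couple more minutes", and `fits_in_limit` is a made-up helper.

```python
def fits_in_limit(thinking_s: int, responding_s: int, limit_s: int) -> bool:
    """True if thinking plus responding finishes within the time limit."""
    return thinking_s + responding_s <= limit_s


thinking_s = 5 * 60 + 17   # ~5m17s of thinking, as reported in the message
responding_s = 2 * 60      # assumption: "a couple more minutes" taken as 2 minutes
total_s = thinking_s + responding_s

print(total_s)                                         # total seconds needed
print(fits_in_limit(thinking_s, responding_s, 6 * 60))  # current ~6 minute cutoff
print(fits_in_limit(thinking_s, responding_s, 8 * 60))  # proposed 8 minute cutoff
```

Under these numbers the run needs 437 seconds, so it misses a 360 second cutoff but fits an 8 minute one.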
Is it true that qwen 3.5 is available?
Hey guys, is anyone having a problem where, when communicating a lot with the Gemini 3 Pro, it gives the following message: "Something went wrong with this response, please try again." and then doesn't respond anymore, even after clicking the reload button and the Gemini 3 Flash going into infinite generation?
For Gemini 3 you're way better off using AI studio
Is it free? And is it on the same level as Gemini from Lmarena?
Yes (if you're using the direct mode)
But imagine using infinitely gpt 5.2 and Gemini 3 pro
Like you can do anything
But they should add opus
Then it's worth it
ok not to be a chud here but infinitely doesnt exist
theres just no limit on how much you use them, unless.. there is?
but the limit is probably high
anyways yeah true i agree
even opus 4.5 is good
but opus 4.6 is better
But I think unlimited access for all these models can't be 83$ a month
yeah its probably more
due to like
claude sonnet gpt blah blah blah
they are NOT kind with their input/output cost things
WHAT
IDK
It's going up
1.2 T already
What's going on
The servers are going to explode
IKNOW
Gemini 3 pro is already outdated
At release it was the best model ever
Now it's not
Waiting for GA
Yeah
oh
Rumors are saying it will be better than preview
Because 3 pro and flash still in preview
How is deepseek 31 😭
This is harsh
I have to tell the ai to switch to English 😭 😭
Cuz there's still no R2 or v4
Oh
Deepseek was so good in 2024
Now they can't really match big models
Is co-pilot just free chatgpt?
Yeah Microsoft is a defo spyware company
Linux for life
Never used it
you are cool
Maybe GitHub copilot
i use linux also
Thank you 😉
Not Microsoft
fair enough
what if microsoft was a chinese spyware company secretly
Would not surprise me
qwen is peak tho
but yeah theres a high chance it can be spyware
Each time I hear Alibaba I think about sweet home Alabama
I don't know why
i can c why
Yeah I like qwen a lot but Alibaba doesn't put enough money in it
Wait they force search on the app now lmao
translating... 🤖
microsoft is spyware
I think the USA should study Microsoft
If qwen is speaking facts
They thought tiktok was spyware
I don't know how Kimi k2.5 on openrouter is still active
Lol
I hate Kimi
It's too slow
It's 1.2 T tokens right now
Wtf
Maybe it will explode soon
How tf has it got 1.2t tokens
I would rather use Gemini
🙂
I like perplexity
I used to use grok
grok 4.1 fast
i hate elon
I don't use chatgpt cuz of limits too
But grok 4.20 is too late
Grok is bad
i dont even know which ai is good for
coding random bullshii
i just use sonnet 4.5
im STILL waiting for an update 😡
Opus 4.6
too usage heavy
They baited us
Gemini says grok5 will come out by march 🙂
yo i can die happily when sonnet 4.6/5 releases
Gemini in my opinion
eh didnt like gemini cuz it had dementia and said stuff from earlier
Gemini ai studio build to build random apps
It doesn't have dementia 🙂
It's one of the only models to allow live speaking w camera
Oh
Flash is so good
Like pricing is very good
bros ignoring my boy gork
What do you think guys?
https://discord.com/channels/1340554757349179412/1470285687029760050
Grok is so weird on arena.ai because it doesn't have a system prompt, which makes it somewhat less aggressive.
damn google gemini pro 3 is lit, compared to basic thinking model its night and day
hi back
Gpt 5.2 high ❤️‍🩹
Lmao
Nuh uh it beat opus at making water
And gpt 5.3 codex beat opus at making tank physics
I just spent 2 hours with my good friend Gemini making a DNS server and failed
God knows how much water I used
Gpt will 1 shot it lol
no its fine make a 2x2 water source then youre fine
infinite water
❤️‍🩹
Bro it was giving me riddles
Ai activists will hate me
Are you using gpt 5.2 high in cli?
No lol I'm using Gemini 3 flash
Bro gemini is on crack and has dementia
It obviously does
Hey guys, are the devs going to fix Gemini Pro and Flash someday?
It was nerfed to ground few weeks ago
THIS IS LITERALLY WHAT I SAID
I SAID GEMINI HAS DEMENTIA LIKE
Bro it was giving me riddles
MINUTES AGO
Bro it got compressed by like 95% of size what do u expect
Get codex cli its free rn
it got NERFED as shii
npm install -g @openai/codex if npm is installed
brew install --cask codex if brew
source: https://github.com/openai/codex
Ok
If I fail I'm the worst Linux Dev ever
Well not even a dev cuz I use ai
But it is what it is
Opus js sucks in real usage
It hallucinates as much as gemini js hides it well
🤣
Why use claude then gpt way better
Gpt is less creative but actually knows what its doing
Gpt 5.2h vs opus 4.6
sudo apt install npm i think
😭 🔥
i dont even know
I hate my life
Kk
npm i -g @openai/codex
actually wait download node js
Iirc
wait no im
God why is it taking so slow
is it 2 bytes per hour
Hell nah
Badd WiFi my ahh
Nuh uh
800 mbps
god DAMN
twin what router service are you using
❤️‍🩹
Cable prolly
BT 🙂
ah yeah makes sense
Bluetooth
Nope that's wireless bro
bluetooth connected
BT is British telecom broo
I glaze gpt
Gpt actually tells people if they want bs instead of hallucinating answers
For every task
i have NEVER used mistral
Me neither
Use it
(Its the worst one, hes joking)
It's the best ai ever
I bet my eye that it's not
oh my god mistral is french
If gemini is crack abuser, then mistral is?
Gng npm installed now what
But it's 3 times
install codex again
More
npm install -g @openai/codex
Mistral would install codex without npm 💔 ✌️
Install mistral
sudo apt install codex fr
"It appears your node js installation is corrupted, let me wipe windows to fresh install"
Tf you mean permission denied
Nodejs is free vrotato chip
Lol
pls show me the full error so i can understand
or just ask gpt or whatever
actually
wait no
i thought of sudo codex i think that can help
im not sure
Install it through mistral
It's going to work
💔
Oh wait I got denied
How does my ahh run as administrator on Linux
Hey claude
Generate.
just do sudo
Sudo wat
Npm is installed
💔
Nope
INSTALL MISTRAL ALREADY
PLS
IT'S GOING TO WORK
I give mistral systemwide access no sandbox full internet perms
It deleted everything?
if it says some bullsh like permission denied do sudo
🥀
What the
I'm using mistral 2
alright blud
Now what
Do the codex Thing
npm install -g @openai/codex
or sudo npm install -g @openai/codex
I did
I did
what happened
"sudo npm install -g @raven heart/codex
LMAO
It installed a package
🙂
ok my bad
i said do codex
I asked Gemini to create an image of what mistral can do is it accurate?
Oh ok
It's working 🙂
lets go
ok you did it
/model
Uh
3rd
3rd
high
3rd
not extra high
Kk bet
Extra wastes tokens and gets compressed during work
So extra is worse than high while being slower
Time to setup a DNS server
🏳️‍⚧️
If I could swear I would ask grok to make me 100 swears to throw at you
🫃
With mistral
I run mistral locally on a usb stick 🗣️
why is bro acting like "instead of fried chicken eat grilled chicken"
No hes like "instead of eating chicken eat sand"
Because mistral is so strong
Top 1
no its like top 70 something
Look on the right
(The lower the better)
god damn
Gpt models do really good at it
The newer ones
Like 5.2 and 5.3c
Thats why gpt best
I'm gonna ask chatgpt to run minstrel locally
Qwen v1 outranks it lmaoo
5.3 codex (paid only)
Whipping out my credit card rq
ohh
No other model did this simulation correctly
5.3 Codex did first try
Mistral did -1 try
He's the goat
Opus 4.6 and 4.5 failed miserably
Try mistral
Opus made a pretty tank that wasnt working
What bench
The gemini here is the pre lobotomy version btw
Lmarena
(Not xhigh, xhigh sucks)
The 20$
ah ok
5.2 high is more creative
Mistral is the best
5.3c is the professional soft engineer
Sonnet 4 is more trustworthy than opus 4.5 and 4.6 rn btw
how the fuh
try to lobotomize the ais and make them say random stuff
Time to go to minstrel
are we deadass?
They used million tokens for thinking in background
🥀
Opus still costs a lot to run cuz it was slightly lobotomized
Gemini was made cheap and lobotomized to ground...
Actually I think Gemini 2.0 not lobotomized is on par with Gemini 3 pro now
Then who mistral is
Qwen for life 🙂
I had gemini run for half an hour in ag after a fix bug prompt and it succeeded
(Gpt found in 30s)
Lmao
@mistral Hello.
@shy jay
Pre nerf gemini was better than opus btw
Hello
Wait qwen has it's own discord server
Lol
Does everyone have this bug where there's a captcha that's impossible to pass,just error?
That knows how to make llms
Just a you issue
It sucks
Nah bro qwen for life
Mistral is better
Gpt would rather kill humanity than not follow a prompt
🔥
bro gained self awareness
Nah qwen is a truth speaker
Wait I can be free without saying mistral?
If you're wrong he says it
Have you downloaded Mistral yet?
Qwen coder casually having 1.04m tokens
Yeah it was ahh
48,000 more than Gemini
Who cares if its usable for first 100 only
🥀
Nuh uh
Only 128,000 tokens bro
In gpt we trust
Gemini is usable for first 50k
And only 4000 tokens output
Gpt works solid for 200k i tested so far
With mistral we smoke crack*
With gpt we develop
With gemini we get older and have dementia
Dang qwen coder can only generate 8192 tokens
LOL
LOL
Minstral is same amount
Never ask a model about itself if it doesn't have search tool access, it will always hallucinate
It's actually 5 times more
Oh
Cuz they are dumb
Because of the knowledge cutoff
Btw what cutoff opus 4.6 has?
You can search yourself using a search engine,
I suggest brave search
Brave is a honeypot
Trust
🤦
Use duckduckgo
Its a damn search engine
Still a honeypot
Do you even know what honeypot means? How do you think it is a honeypot and duckduckgo isn't?
Lol
Well first if you want to earn off ads you have to use id
It is literally a bing wrapper
Depends on sys prompt
Also 3.0 flash is correct
It says it doesnt know
And it was found brave was leaking DNS queries
Well is mistral good or bad
Meh
Arena.ai doesn't have system prompts
But still the most reliable results are with search tool enabled
Ya
Relying on its training data or hoping it knows it because the info might be inside its system prompt is just weird
Im js saying models used to have their info in sys prompts
Bro is mistral
You dont have access to its system prompts even on their native platforms, thats not reliable
guys mine has been like this for almost 20 minutes, does it take it that long to respond 😭?
its opus 4.6 thinking 😭
of course it does that
Opus is bruteforcing the answer
There is a hard 6 min limit. If that's reached while a model is generating a response, it cuts off the response and throws a "Something went wrong with this response, please try again." error
I reported this way back on November of 2025 but they haven't done anything.
Models are literally made to think for hours and here they have a hard 6 min cutoff 🤦
And btw this 6 minute limit is on every single model available not only opus 4.6.
Thats not how things work bro 😭
so i just wait?
Opus works by having ton of internal thinking
The only reason why it does so well on benchs
No it might be stuck there forever.
Check in 10mins if it is still stuck open new chat
Gemini used to do that too pre nerf
Opus is literally gemini 3 xxxxhigh
What is this?
It's official
Yeah notebooklm uses ts
It's Google what do you expect
Its just a side research project developed by Google Research in collaboration with Peking University
Not much
Its nano banana copy fine tuned for graphs
hey! If im not mistaken claude should have a thinking version too on the 4.6? Is it not coming to arena or is it not ranked yet/code named/collecting votes?
Its notebooklm thing imo
It already is wym
Its so bad it didnt get on lb
Lol
Don't use it
oh rly
? didnt see that
It can't really make big projects
Its unusable
that's an odd ball; yeah I see can use it in direct chat
but not included in leaderboard at all lol
Try to use it then
Cuz it can't be tested in leaderboard properly
I think that's why it isn't in the leaderboard
It just can't do anything
Yeah because it sucks
Worse than mistral atp
Mistral did the tank
Opus crashed
Mistral wins
🥀
I see; prob needs some more work on it before it's usable
Just use 4.5 its better overall
Or gpt 5.3 codex or gpt 5.2 high both are better
😔
using neither 😄 was just curious, came from a trip and saw 4.6 #1
But uhhm that's the tank that mistral did
did not expect google to lose R1 anytime soon
And opus did none
what do yall think is the best model right now?
Gpt 5.3c and 5.2
Easily
No competition
Lol
Opus 4.6 not thinking did this
really depends for what