#general
5.2 easily best
now 5.2 high vs 5.2 codex
cant wait for 5.3 access fr
codex not bad
codex bad
This one just looks like perlin noise imo
the prompt asked for low poly, which is exactly what it did. it just looks maybe too repetitive
opus just acts like "it knows better" and doesnt follow the prompt
another 5.2- high attempt
Is this really using threejs
yuh
genuinely 5.2-high might be the best model out there rn
thats pretty adorable
wireframe looks bad on screenshot but the wireframe on animation looks good
pretty decent
those two made by 5.2 high, really good
especially if you lower wave scale
Pretty good
way better than opus 4.6 and 4.5, which surprises me
What if you remove the threejs restriction
what should it use then?
Just say 3d
gpt 5.2 vs opus 4.5 or 4.6?
hello, is there any way to access archived chats in the arena?
alr lets see
both decided to use three.js
oh opus just crashed
xD
Lol
ok i'm running it right now
took like 10 minutes of trying and failing captchas lmao
The captchas looping and by the time I solved the captcha the prompt timed out
ok so opus 4.6 is awful
Opus 4.5 Thinking
gpt-3 codex just released
"just"
ugly
Idk why the color function became so weird
It's just using trig functions I think
Oh yeah, what's up with Opus 4.6 Thinking
worse than gemini fr
4.6 tried to use more files and tried to write shaders
well it failed miserably
Yeah lol
lets see
I wonder how Kimi K2.5 thinking does
Is this output one-shot?
Oh nothing? lol
Both 5.2 high i sent are by the same 1 prompt i sent
Thanks, comparison:
U can include both 5.2 high attempts tbh
I think the diff is pretty clear
True
5.2 is js another level compared to em
Its crazy
Very interesting
Quite sensitive to task domain I think, for Matplotlib Gemini and Claude are better
can you try gpt 5.3 high?
the 5.2 one has better performance too
Doesnt exist
yes it does
Link
try a high poly ocean
Not selectable in side-by-side mode, idk if they've added it
or something that's just super realistic
test it
oh ok
i feel like opus 4.6 took the prompt too seriously
No
The opus 4.6 one said 96k tri
Idk which part it messed up, but it tried to write custom shader functions which 4.5 didn't
Thats more than gpt 5.2 both attempts combined
TRY IT TRY IT TRY IT TRY IT TRY IT TRY IT TRY IT TRY IT TRY IT
Wow
What model made that?
@heady jewel wdym by gpt 5.3
It's like first-person Powder Toy
opus 4.6 nonthinking
Wow, impressive
BTW you can right click anywhere on the screen to put the elements
i know
and q/e to switch modes
wow very cool, I'm a big fan of Powder Toy
I wonder how it's so good on non-thinking
How many turns did it take?
Hmm that's kinda suspicious
lies
say prompt
Yeah, but I meant from a data contamination perspective
or it prolly stole the code
the graphics are the impressive part imo
yeah
Oh I googled Noita, it looks so similar to the actual game π€
I think it's a bit suspicious since it managed to replicate the game so accurately
nah it isnt imo
Like how would it know the color scheme of the game unless it has seen example code from recreations
the gunpowder is PEAK
4.6 nonthinking
xD
its AWFUL
maybe its not real opus on lmarena
it cant be this bad
"error"
That's why I'm a bit suspicious how it managed to recreate the other game it one shot
alr but dont cringe pls
create a genuinely incredible truly comprehensive INSANELY BEAUTIFUL replica of noita as a single html file web project. genuinely incredible. max effort in everything. zero shortcuts etc.
3 or 4 fps, lol. Badly optimized
nop, its jus ur device
Gemini roasting saying its sonnet 3.5 levels of code
its a model like opus 4.6
no it jus doesnt know about 4.5
Gpt 5.3 doesnt exist
whats the prompt for this
5.3 codex
Scroll up
are you sure
@quartz light mango
Yeah but he said that 5.3 just released
hi
Not codex
introduces mobile optimization
the only reason i only sent the nonthinking version (and i am not even going to continue attempting to use thinking from now on) is because lmarena ####IN TIMES OUT AND DOESNT LET ANYONE TEST EXTENSIVE THINKING MODELS PROPERLY
It lags on 9070xt w 5900x both OC 🔥
im on a laptop, runs at 144fps :p
It runs at 240 for me, but 200 when there's a lot of gunpowder
Which is awful for 2d
yea it can get laggy
i think it has some typa raytracing
Oh
Lets see 5.2 high 🔥
no
it will time out
/:
this problem has existed since forever
they dont care to fix it
Is opus 4.6 thinking not on the leaderboard yet?
even if it is, i dont care about the score
its not accurate
we cant actually test 4.6 thinking
Why not
because when it tries to think for a proper amount of time
for actual projects
it times out
it needs time to think
Both 4.6 and 4.6 thinking seems to be available in direct chat
i know
Its awful
And it wont get on lb xD
no it isnt wdym
It gets bullied by gpt 5.2 xD
its just the fact that we cant test it with the duration of time its designed to be given to think
Ah i see...so the score is just going to be based off simpler prompts?
It's interesting that you can see Claude using some components in other games before:
dont care
Claude simps
not
Do you think it would have topped the leaderboard if people were able to test it properly?
got it to make a pathtracer for robloo.x as well as a horizontal points based custom renderer for robloo.x with colorsequences used as pixels
It would get above gemini 3.0 pro but below gpt 5.2 low,codex
Proof?
I tried water animations with opus 4.6 and 4.5 and gpt 5.2 and 5.2 codex 3 times each.
5.2 was best everytime.
5.2 codex was a little behind
4.5 opus was far behind codex
4.6 opus was notably behind 4.5 opus
(Opus thinking )
so youre basically saying whatever algorithm the model decides to use for waves determines the whole models performance
i use ai every day
ive tried using 5.2 high
its just bad
universally bad
It's so freaking disappointing man... There's rate limits rn for days when there hasn't been any for NB Pro for weeks...
And 2k takes 3-5 minutes to generate pictures.
Honestly this company should be losing a crap ton of customers rn for giving out such poor quality of service
opus 4.5 and 4.6 made 2d waves BRO
then say 3d
OMG IT'S BEEN 5 MINUTES...
lol ok im going to give it an actual good prompt
and its going to give a good output
and ur going to make another excuse and deny it
ok show me good low poly realistic water with waves in 3d made by opus 4.6
ill wait
(im not any company glazer, im just saying my testing results)
Guys how can your coding generations be so good, I used the same models and ended up with dogwater results
Do I need more specific prompts?
Opus 4.6 model thinking:
The user wants me to create a Bright Memory: Infinite-style game clone in a single HTML5 file, optimized for mobile devices, with realistic graphics and fully functional. Let me create a comprehensive FPS-style action game with:
3D rendering using WebGL/Canvas
Touch controls for mobile
Player movement, shooting, combat mechanics
Enemy AI
Visual effects (particles, lighting)
HUD with health, ammo, skills
Performance optimized
I'll use Canvas 2D with pseudo-3D raycasting for performance on mobile devices, combined with particle effects, enemy system, and touch controls.
Let me build a comprehensive game with:
Raycasting 3D engine
Touch/mouse controls
Weapons system
Enemy AI with pathfinding
Particle effects
HUD overlay
Sound generation via Web Audio API
Performance optimizations
I'm creating a pseudo-3D FPS game using raycasting with modern visual enhancements, particle effects, enemies, multiple weapons, and skills like sword slashes and time manipulation, all with full touch controls for mobile. This is going to be a substantial single-file game with a raycasting 3D renderer, textured walls, and all the core mechanics needed.
What was the result?
This?
In progress
Ok
Basic result:
autoclicker goes brrr, also kinda bad no offense
ai sucks at 3d
get it to make a good water ill wait
@echo aurora Fix the NB Pro models. Yet again. Waited for 5 minutes then gave me an error.
Please
A bit glitchy but cool
atleast its scrolling unlike opus 4.6 glazer
????
Guys why can't I load arena site it says unverified connection
and better perf
i didnt specify for it to scroll, i dont need scrolling
small map + laggy
because it doesnt look good
bad prompt
gpt glaze
Kimi K2.5
lmfao
blocked
kid
opus looked better but cost 20 times more, took 20 times longer, and had way worse performance and a smaller map
🔥
in real usage gpt wins
still better than opus 4.6
it generated it fast
crazy
someone gotta try 5.3 codex
gpt 5.2 > gemini > kimi > opus 4.5 > opus 4.6 result
me when i lie^
?
Do you have a link for this?
im trying again with a better prompt
its a bit bugged
ill fix rq
ok but have you shared code
- it wont be one shot
yes i have, scroll up dumba##
Which one? I'm not logged in so can't share links
i dont see
seriously?
yall do realise
they could all just be using
ocean js
opus 4.6
if youre not checking if theyre just using an existing lib
they are permitted to use any, as specified in prompt, but they used threejs both
show screenshot along with prompt
thats simple as #### if youre not asking it to make it custom
thats so dumb
Looks like Kimi actually wrote custom shader functions
and yet both opus failed, i wonder why
Doesn't look like waves, but it's quite pretty
yeah opus is good at making things look good
didnt you say lowpoly?
but gpt is more reliable and advanced
i asked for high poly for you specifically
using YOUR prompt
holy how r u so dum
i didnt ever share a prompt for water?
we serious?
i used your noita prompt
because you said mine is bad
you didnt even specify "high poly"
thats..
Flash's is ugly, but GLM seems to have messed up completely
you just edited the noita prompt
thats what im saying
that is a bad prompt
noita prompt was good
but this is just not specifying anything
flash's still better than opus 4.6 thinking low poly attempt
dont call me dumb for that
BECAUSE THE MODEL HAS TO DECIDE
Lol
Maybe tell Opus not to write custom shaders
yeah but for noita the focus was for it to look good
Ig this puts into perspective how good Kimi is
if it tries something it is too bad to do, then its the model's fault
since GLM is the next runner up in terms of Chinese open source models
same focus for water
Oh wait maybe it was minimax for web dev
did opus just... FAIL?
just noticed your username is "ggezrekt"
js shut ur ragebaiting atp
noone can be this dumb
you are
check whether the last file it wrote was cut off
you used the same prompt again lol
it isnt
1 file only, wasnt
how do you put images
because it continued after
because look at the prompt he used
how are you agreeing with this guy
hes using the same prompt with no mention of low poly
Oh yeah it tried to write shaders
seems like it messes up whenever it tries to do so
holy ur so slow
bad but better than opus 4.6 🔥
There's an error; it's not rendering
i know but the prompt asks it to focus on looking GOOD
I wonder how other models perform on that prompt btw @surreal zephyr
THATS THE POINT YES
bet
you kidding me
soup
I think GPT might be better
Since it is actually better at WebGL from previous tests
opus tries to overengineer and fails
opus 4.6 is better at most tasks and really good at solving problems, but maybe gpt can be better at WebGL
the problem is opus is going to try doing the most advanced things because it knows
genuinely no idea how to get this to yall
opus 4.6 tries to overengineer stuff and fails miserably, it sucks in daily use unless you have another agent controlling it
it tries, BUT IT DOESNT KNOW HOW TO DO IT, AND FAILS
so its BAD because if it FAILS IN PROD, THEN ITS UNUSABLE
it always did better than gpt and gemini for me. i was trying to use the NOT WELL KNOWN collide and slide algorithm, and with just a simple explanation it did everything perfectly
way too perfect
xD
he wont admit it
just block him
maybe you're right
Are these your tests?
Try opus 4.6 non-thinking
@verbal nimbus give me the actual prompt used for kimi etc
opus nonthink failed to even load, it crashed.
Same prompt
hes going to use the bad prompt on purpose
give me your prompts and i change them so opus will be best
^ Gemini might do well on the WebGL test too (see message and scroll down)
yes all same prompt
Hmmm
literally just realistic animated water, rest belongs to model's choice
this is such a lie what?
@north obsidian this is straight up just not true
LOL
and tell me your favorite model?
gemini 3.0 pro on release day pre nerf
This was Gemini 3 Pro in experimental phase (Lithiumflow)
now opus 4.5
ragebaiter ragebaiter ragebaiter ragebaiter ragebaiter ragebaiter ragebaiter
ragebaiter
WebGL iirc
it wasn't doing it properly though
none of the models could, it was kind of cheating
GPT was unexpectedly decent too
are we all going to ignore this obvious lie?
@verbal nimbus
this is straight up not true
he shows the grid of previous results which didn't use that prompt
and says its all the same prompt
Perhaps Gemini too: #general message
opus 4.6, custom shaders, after fixing its own bugs
why are you lying @surreal zephyr
Wait I'm not following
@verbal nimbus
Wow
the noita one
but what about the lowpoly
I made the grid
are you slow
you are
low poly animated water, high performance optimised, for game. using threejs and other needed stuff. realistic colors
ALL WATER IS LOW POLY
all from the grid used this prompt
It was @surreal zephyr 's prompt
(btw he lied that the "noita prompt" was used for all results on the grid)
NO IT WASNT
NOITA PROMPT WAS USED FOR THIS
holy f###ing lies
2 turns? That's impressive
im done
yeah quite good
the colors kinda bad tho
but the waves nice
storm is bugged
night
stop spamming
not sure what you mean by this
What was the prompt again?
Is this better than GPT-5.2 High?
create a genuinely incredible truly comprehensive INSANELY BEAUTIFUL 3D water animation as a single html file web project. genuinely incredible. max effort in everything. zero shortcuts etc., then THE SCREEN IS JS BLACK, FIX
Any water you make in any engine with waves like these can't be anything but low poly.
this was the prompt, opus 4.6 nonthink
does this count as low poly https://api.websim.com/blobs/019c3f00-fa83-73d8-ad6a-5947118a4b09.html
Do you have the code? Curious what it used
Looks like noise on top of trig functions
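For anyone wondering what "noise on top of trig functions" can look like in practice, here is a rough sketch. It is not taken from any of the generated demos: the function names (`hash_noise`, `value_noise`, `wave_height`), the frequencies, and the hash constants are all made up for illustration. It sums two sine waves and adds a bit of smoothed lattice noise to break up the repetition.

```python
import math

def hash_noise(ix, iz):
    """Deterministic pseudo-random value in [0, 1) for an integer lattice point.
    The constants are arbitrary illustration values, not from any model's output."""
    n = math.sin(ix * 127.1 + iz * 311.7) * 43758.5453
    return n - math.floor(n)

def value_noise(x, z):
    """Smoothly interpolated lattice noise in [0, 1]."""
    x0, z0 = math.floor(x), math.floor(z)
    fx, fz = x - x0, z - z0
    # Smoothstep weights so the noise has no visible grid creases
    sx = fx * fx * (3.0 - 2.0 * fx)
    sz = fz * fz * (3.0 - 2.0 * fz)
    a = hash_noise(x0, z0)
    b = hash_noise(x0 + 1, z0)
    c = hash_noise(x0, z0 + 1)
    d = hash_noise(x0 + 1, z0 + 1)
    near = a + (b - a) * sx
    far = c + (d - c) * sx
    return near + (far - near) * sz

def wave_height(x, z, t, noise_strength=0.2):
    """Height of the water surface at (x, z) and time t:
    two sine waves (the 'trig' part) plus centered noise."""
    trig = 0.5 * math.sin(0.4 * x + 1.0 * t) + 0.25 * math.sin(0.7 * z + 1.3 * t)
    noise = value_noise(x * 0.5, z * 0.5) - 0.5
    return trig + noise_strength * noise
```

In a three.js-style demo, something like `wave_height` would be evaluated per vertex of a subdivided plane every frame. With only sines the surface repeats visibly, which may be part of why some of the outputs looked "too repetitive".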
check it
@surreal zephyr @verbal nimbus
yes, looks cool, but there's a bug: the water faces aren't connected with each other
ye ye
Look
?
I made prompt as simple as heck
Make realistic animated WebGL water, make water clear reflective and put few cubes different size on water to make ripples and float effect.
not the first one. i know how hard it is to create a custom "shader" for water, for ripples
i think gemini used oceanjs
idk
check
which ones custom?
opus
Pretty decent
Yeah, but there are gaps between the faces, kinda weird?
both
and gemini tried to use packages to generate water textures, from what i saw
Which model is this btw?
that's why your water was pretty clear
gpt 5.2 custom shaders fail
thats actually... sick.
so damn bright
Had a similar gen with the old Kimi K2
Actually gpt becoming more and more... dumb...?
I really feel like gpt4o is best one, but still not that good
Oh nonthinking
Idk how this gen was so good but then it messes up low poly
neither can do custom shaders well, opus 4.6 and gpt 5.2
ikr
gpt 5 is way different from 5.1, 5.2 and 5.3
okay let me make task harder and goal more clear so both models will understand me
i will show results after few min
hmm
gemini? gpt?
I can kinda see the vision 🤣
now try gpt4o like a joke
you will see difference
its really smarter, just because it doesn't have that much slop in it. they ruined it, and the database is really small
with reflections and raytracing 🗣️ 🔥
nobody uses it
i wonder would they manage to do it tho
most games use screen space reflections (like raycasting, but only against what's visible on screen). if something should reflect from behind the camera, it won't show up or will look really bad
I like this one so much
this one is kinda simple
too simple
i can see they are almost same
ask about more noise
opus 4.6 did best imo
like the waves?
The colors are a bit weird
it has custom shaders but colors suck
its actually pretty accurate
not transparent enough
i see
Some patches of the sea seem too similar to the sky color to me
this one actually bad
@topaz skiff the 4o version crashes lol
Lol
oh sorry to hear that, check any random website
its same one just diff wave setting
i meant the script it wrote doesnt load
Yeah no surprise 🤣
Do you have the code for this one?
yall try this at these settings https://019c3f06-49dc-782e-b953-6d5e3bd29817.arena.site/
@verbal nimbus try it
Honestly
The 5.2 was the most prod ready
Imo
It was good perf, and looked good
Pretty cool
Opus did great job on high poly
Not high poly
this is just effect
like huge lie
Your browser will go crazy if it got real procedural topology
Bad wording, shouldve said "when not told to do low poly"
How many tri would be high poly then
now ask to add "topology visualizer on n bind"
wdym by this
don't ask me
5.2
Thats not good water
they should have different topology
for waves
Okay so to simplify (THIS IS A PLANE): we want a lowpoly dynamic MESH for waves (i'm not enough of a programmer to say this correctly and my english sucks)
Still bad, tell it not to use one plane but to use shaders
the physics in this are actually kind of cool https://api.websim.com/blobs/019c3f0a-a2dc-73fa-9fed-d55742e7cb1b.html
Gerstner waves, to give it a steeper look
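For context on the Gerstner suggestion: a plain sine wave only moves vertices up and down, while a Gerstner wave also slides them horizontally toward the crests, which is where the steeper look comes from. Below is a minimal single-wave sketch in the usual textbook form. The function name `gerstner_point` and all default parameter values are assumptions for illustration, not something any of the models in this chat produced.

```python
import math

GRAVITY = 9.81  # m/s^2, used for the deep-water dispersion relation

def gerstner_point(x, z, t, amplitude=0.5, wavelength=8.0,
                   steepness=0.6, direction=(1.0, 0.0)):
    """Displace a rest-position grid point (x, z) by one Gerstner wave.

    steepness is in [0, 1]: 0 gives a plain sine wave, 1 gives the
    sharpest crest before the surface starts looping over itself.
    Returns the displaced (x, y, z) position.
    """
    k = 2.0 * math.pi / wavelength      # wavenumber
    speed = math.sqrt(GRAVITY / k)      # deep-water phase speed
    length = math.hypot(direction[0], direction[1])
    dx, dz = direction[0] / length, direction[1] / length
    phase = k * (dx * x + dz * z) - k * speed * t
    # Horizontal shift toward the crest; this is what sharpens the peaks
    horizontal = (steepness / k) * math.cos(phase)
    return (x + dx * horizontal,
            amplitude * math.sin(phase),
            z + dz * horizontal)
```

A real ocean usually sums several of these with different directions and wavelengths, keeping the combined steepness at or below 1. Setting `steepness=0.0` falls back to the "blue mountains" sine look, so it is an easy knob to compare with.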
best colors + good perf + its not 2d its 3d wym one plane
Okay so to simplify, you see how it's made of cubes (cubes of triangles) and it's like one big "cursed" smashed plane
that's just a plane, i can't explain it better
what?
I THINK THEY FIXED TIMEOUT ON ARENA
no way
its made of triangles wym
That's how much you can explain what you want
Do you know polygons?
Gif not loading
i dont understand the issue
I will check it, thanks
Okay so give me a small moment
4.6 just thinks more generally
its beautiful
both thinking models timed out for me tho
best example i could find, look at left
Lol
opus so good 🔥
WHAT THE HELL THE TRANSITION
Shaders are cool
CHECK IT
whats that
THATS STORMY WAVES
DUDE
SERIOUSLY CHECK THE PRESETS ON TOP
WHAT
people that know nothing don't deserve to call their vibe code good, it's just a tool, not a "make it for me"
THIS IS INSANE JUST LOOK AT THIS
What model
CHECK THE PRESETS ON TOP
4.6 THINKING
Wow
the floating objects are fake
?
chill dude
Oh yeah it's pretty cool you can click on it to create droplets
4.6 is a freak
they dont float with waves but fake
check the presets on top
they are real objects wdym
Hi
they float in air xD
but the water look good
Yeah I like how the sun movements are animated when you change the time
Very cool
Makes me want to get into shaders
the water looks crazy good tho
stormy ocean into moonlit night is crazy
show source & prompt, also how many prompts it took?
Yeah
Actually i think they really got thinking 4.6 updated
stop glazing
...
Which model was it again?
Which prompt
custom shaders or no?
show source
how is this realistic and the previous one isn't
its the same wym
Do you have the code?
the other one was way better
which one
this
this one looks more game ready
the colors & reflections
how
yes thats what makes it good
water is partially transparent and doesnt bounce light like sheet of steel
im going to ask gpt-5.3 to make one
that one looks like blue mountains that are moving
has no realistic effects
still better than overengineering and failing
opus 4.6 nonthinking
opus 4.6 opus 4.6 opus 4.6 opus 4.6 opus 4.6 opus 4.6 opus 4.6 opus 4.6
thats opus 4.6
opus 4.6 that is
this is gemini 1
are you still alive?
Btw WebGPU is the successor of WebGL
interesting you liked my simple prompt
i wanted to test same prompt for fairness
okay
but the prompt quite good tho
"be seniour [sic] programmer, use best practices."
"DO NOT HALLUCINATE DO NOT ADD BUGS DO NOT HALLUCINATE DO NOT ADD BUGS DO NOT HALLUCINATE"
my experience with gemini 3:
5.3 better cook i spent 20$ on ts
tell me what you made actually good with AI
(not simple experiments)
And this
What is it
game
Uh I don't really know what GPT-5.2 High made but I won't blame it since my prompt wasn't specific enough
nah
I made this with one prompt
Oh forgo
Not i made it, AI made it with one simple prompt
codex 5.3 🔥
lol
Kimi kind of glitched
show
scroll up
W codex
im using paid codex 5.3 in cli
Gemini 3 Pro
with @topaz skiff 's prompt
what
it worked in the opus demo tho
What's this supposed to do
tank tracks physics
tracks that deform on terrain
Oh that sounds complicated
Deform how
its not THAT complicated, but it requires understanding 3d axles and movement
like this
Is that called deformation or just suspension
yes suspension but connected with the tracks
Looks not that difficult with a physics library
well all models i tested so far failed: tracks detached from the hull, or sideways, or not deforming at all
ok this is best
Which one
holy this is peak
5.3 codex paid via cli
Wow
its the only one that managed to get it to work
What model?
5.3 codex
nailed it
none other managed to do it without any bugs
codex 5.3 one shotted it
🔥
yea thats a mod
Seems like code mode got autistic system prompt
hmm
it changes dynamically tho
Maybe get it to create a 2D physics toolbox where you can construct it yourself
FAR closer to real than opus and 5.2
its getting really close tbh
im js sending it screenshots and checking what it does
Idk it knows where to put the springs
jeez i asked it for abrams
Something like this but editable: https://www.myphysicslab.com/engine2D/car-suspension-en.html
wdym
the track isnt solid, on terrain it bends
opus made solid and bugged track every time
OMG
@topaz skiff @verbal nimbus
singularity bro
Maybe get it to draw where the springs and joints actually are
i remember i shat my pants while playing it as a kid
gpt 5.2 btw
I loved those types of games
i still remember
btw this is deadass
(diarrhea)
sandbox games r one of my favourite
opus 4.6 made tank but not physics
5.2 codex sucks 5.2 high is way better
Yeah Microsoft Flight Sim was one of my favs
Wow
im using 5.3 codex brotato chip
and it made something NOTHING ELSE could
i know
which is crazy
codex is normal but less creative more prompt adhering
so worse most times
but more trustworthy
im jus waiting for ai to make good models on blender or whatever
like when that happens game devs are truly cooked
Grok 4.20 is not real
not at all
you guys have clearly never used codex 5.3 very high
the one with the big brain icon
no
there's medium, high and very high
wait I deadass didn't even realise it could make models
what are some third party websites that have free credits for Nano Banana pro?
Codex 5.3 is rushed
what? that has nothing to do with coding
I think
they just give personality updates thats all
it does
yeah heard one called job.com
it really does
they dont increase intelligence
NAAAH
dude
Thank you
I DON'T NEED THIS
compare 5.2 xhigh vs 5.2 codex
youll see
stop yapping
Xhigh is better

codex is better
im joking lol but ill send an actual website in dms
i did try it
TEHN COMPARE THE TWO
then*
just compare them
dont use chatgpt.com
why not
use api or some ide which has it
what
but ye test it
High>xhigh
hows that even possible
Also bro
No
Xhigh gets compressed during thinking so it sucks
Ok but 5.3 codex is so peak
btw theres a windows and linux port for the codex app if ur interested
Im using cli
Is app better?
Codex Desktop App - Cross-platform Rebuild. Contribute to Haleclipse/CodexDesktop-Rebuild development by creating an account on GitHub.
How cli can be different from app
Gotta ask codex to port itself 🔥
Antigravity vs gemini cli.
Cli is way better
where
After all this who is the winner?
idk ab that
thats 1 bench
Xhigh gets compressed
During work
And eats your quota faster
High is better overall unless you're a rich api user
xhigh hallucinates more:
Thats why high is peak
Uhhh really looks awful
Mistral large is better
It's better than gpt codex
Gemini
that has to be the worst looking chart I've seen
Claude
like Claude is at 48%
4.5 hallucinates more than 4
Then deepseek at 82% is almost double but the actual shape is barely bigger lol
But 4 isn't on the chart
Mistral is 84%
U havent seen gpt 5.3 codex
Mistral no diff
Mistral will save us all
Get it to make 3d tanks suspension with proper animations
The lower the better btw
This means you need to send the request 3-4 times before you get a good answer
But this doesn't mean it is stupid
It's so good it peaks in this bench
Mistral isn't actually the worst, Gemini is much worse
It's off the chart
Gemini is actually usable
It does the work sometimes
more = worse
Gemini is like phd engineer high on cocaine and with dementia
he got lobotomized
Hallucinations = creativity trust
Mistral doesn't have hallucinations in chat, it has chat in hallucinations
🔥
NO
No
DON'T SAY THIS
Yeah lol
More = better
Always
Awful...
You guys are thinking about "temperature", but hallucinations make code not work, cause stupid mistakes, or give false info
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Gpt would never
We are joking
they all suck
Mistral temperature is ∞
impressive but not more
Ig
Bro show me opus doing animated tank suspension physics
Like this
Only mistral can do this
That's why he's the goat
Here's a more complete chart (lower is better):
oh wow 93%
Gpt oss?
That's the open source GPT model, not the one on ChatGPT
loss
Lol
Lol mistral is useful in this graphs
Kimi K2.5 seems to have improved from K2
Gpt oss ❤️‍🩹
Surprising how the newer models like Opus 4.6 and GPT 5.2 Codex xhigh are actually worse than their predecessors
Glm 4.7?
Theres no way gemini 3 flash is this bad
They got ego boost duh
It is when it comes to hallucinations
Gemini Pro too
Their sys prompt says "state of art" models
Mistral is the actual state of the art
If you forget to attach something, sometimes Gemini Pro would make the attachment up, so you gotta be careful
I'm glazing mistral so much
Gemini 3 is like phd with dementia
Mistral is js crack addict
"SOMETIMES" 🤣 🤣 🤣
Always*
Sometimes it makes the attachment up EVEN IF ITS ATTACHED cuz its too lazy to read it
Mistral 4 large will score 100%
Worse than GTP 2.1 mimi
Gtp 2.1 mimi on top
Uhh... Gemini 3 Pro (click play)
Ai sucks at physics
