#general
wasn't it like days ?
i don't remember honestly
but what matters is that it will go public
at some point
Perspective helps!

GPT-5.5 underperforms Mythos on:
- SWE-Bench Pro
- HLE

It is basically on-par on:
- GPQA Diamond
- BrowseComp
- OSWorld-Verified

It is better on:
- Terminal-Bench 2.0

All while being more token efficient, smaller and cheaper than Mythos (and actually available!)
Quoting leo 🐾 (@synthwavedd)

GPT-5.5 benchmarks are out

…
Gpt 5.5 out?
Gpt 5.5 is available on free plan Chatgpt?
No
yes its out
but not on free plan
i need the 5.5
Kimiiiiiiiii yessss
Kimi yes kimi no , is it jailbroken or just like that?
Its jailbroken but its so easy to do so
Thats the reason I like chinese models
They always have low guardrails
They are like that naturally not even by design , lol
Isnt kimi a steal from claude?
Kimi is also disabled on arena.ai
Because everyone spent their balance
Now try to use it
no kimi models work
@echo aurora here is a trace id for you:
:19f5165f-0c6b-
also btw i was able to get the reason why it failed, i guess arena is broke
Your account org-3768766e50c242e2ade5fc3b3b783831 <ak-f4h9btz5i7s111b3pub1> is suspended due to insufficient balance, please recharge your account or check your plan and billing details
i can donate my moonshot ai key to you guys if you need it /s
is there still any chance to get opus 4.7 thinking in battle mode?
i think we can blame moonshot ai here for not having an auto-topup function
i looked and i didn't find one, unless i am missing it
also btw @echo aurora if you want a simple message to say to the devs, just say "Kimi models are failing due to Arena's Moonshot AI platform account being out of balance, you have to top it up to fix it. I don't believe Moonshot has an auto topup function, so you'll have to check on it often."
also btw i really do have a moonshot ai api key, it still has some balance on it and i have some experience with the platform
i just have a feeling that the people using kimi are gonna drain account balance really fast
so it makes sense that it would go broke
yeah they've made it more expensive this time around, probably to make up for the price cuts and speed improvements made during kimi k2.5's time
and its also open source
but once again i blame moonshot for not offering automatic topups
if anyone has that much gpu power
then download it and run it locally for others to use
Well they wont focus on putting guardrails up thats why and I approve haha
Is it still able to use or no?
Chinese models distill from anything so yes
you'll have to wait on arena to refill their moonshot account
I mean, if i use it, what notification it will be?
An error?
it will be a something went wrong error
long story short, no.
their account balance is empty at the moment, so no requests can be sent through
until it gets refilled, you'll just get an error message
:3
the message i sent was from arena now sending partial trace to users, I extracted it and got the message
Does anyone have thoughts on openmythos?
i need it :3
Seems fine to me
Any company doing this though would 100% be doing a pay as you go
Not credit system where they have set numbers
.
guys
deepseek v4 is here
there's deepseek v4 pro and
uh
flash
Bro is not a mod
yeah
Okay thank you for the heads up, looking into and flagging 
@whole sundial are you sure this is the case? I'm not getting any issues with Kimi models.
Sorry to say I can't go into details about codenamed models
they may have already fixed it
How about we can like vote on other's battle mode generations?
are you getting responses?
Like it can be a scrollable thing
Yeah, have tried out a bunch of them and they all seem good 👍 What makes you think it was balance related and not some other error problem?
That would be super cool. Some kind of social aspect to it where others can vote. Would be really interesting to see those leaderboards too.
This idea has been something we've kicked around a bit.
Ohh any plans on working on it then or you cant talk about it
Nothing that I'm able to share.
yes it's real
Deepseek v4 better cook my meal or else this model sucks
The hype better be good
Gotta be better than opus 5
And gemini 3.5
arena returned an error from the moonshot api to my console that said the account was out of balance
hmm well, is the max for generating images 3, not more?
DEEPSEEK!!!!!!
Bruhhhh
but the announcement
🐳
incoming
Wait
Deepseek is out
I'm legit confused
I don't see it on arena
Can't tell if those are all ai from gpt 2
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
Omfg can't distinguish ai from real
awwww.... :/
Wait did it literally just come out?
Yes
Oh
Like 10 minutes ago
I didn't even realise
30*
I was wondering why it wasn't on deepseek app
Probably will be released later today 🙂
For now gotta use arena.ai
It's gonna benchmaxx
But worse than opus
Wouldn't surprise me if it's close to Claude
And 5.5
Nah
Please tell me this model can kick anthropic from the throne
5.5 sucks
lets see if this model is good
Need to compare it to kimi
Wait for mythos to crush it
@echo aurora what does that actually mean? like maximum of 10 attachments per chat?
Im still waiting for the spud
Only 5 people have access to that
Correct
Yeah 100€ per 1m can stay there overpriced for enterprises
Why is flash actually fast asf
Mythos is ass bro
Claude is definitely over hyping it
Glazing
Please someone tell me that this model is actually running on Huawei chips
AND overpriced
There is literally no way to know this
but we already know about the price
Correct but I believe it's going to be good but everyone is over hyping it so much
I mean wasn't 4.7 a downgrade from 4.6?
who want a good model that you can only use 2 time a month !
True
cause of the price and usage
Claude is unusable without paying
pretty sure once i hit it, I have to create another new chat :)
Grok 4.4 will be 1T when it comes out according to Elon
I knew my blue whale wouldn't disappoint
No way deepseek kept this model from us for this long
It's disappointing me now
I gotta make some random bull htmls then try to make it code a speedtest server
damn
DEEPSEEK IS back hell yeah
That's the case if you'd like to continue uploading more files. But you should still be able to prompt.
Nsh
deep seek v4 lets go
Idk
Deepseek v4 is so cool
they scored worse than kimi tho sadly 😢
Never speak again please
NEVER
Leave this server
nah bro
Its worse than kimi in coding
Roleplay 😛
@echo aurora what's the context limit is it 1M?
Nah it isnt
But it has a lot more knowledge than kimi
It doesn't look like its reasoning improved
Yesss
From 3.2
They updated their website
I wish it was multimodal but this is impressive
It is ain't it?
It will more likely to hit if I do image generation in side-by-side mode
am I in rage bait right now
Deepseek is so cool
We typically will do w/e the default API setting is.
How is Gemma 4 31B ahead of Kimi 2.5?? Big model size difference
Kimi 2.5 sucks
Guessing this is rate limit, but will check this Trace and keep you updated.
And its old
Kimi 2.6 out now
Non-thinking pro version ranking above the thinking one on the leaderboard looks like a mistake, especially if you look at its rating at all, cuz it looks lower than expected for new models.
I take that back, this is caused by a bug that was flagged to the team earlier today.
🫂
Deepseek v4 Pro on the leaderboard performs nearly the same as non-thinking GPT-5.4 LOL.
Wtf
And thinking version as gemini 3 flash?
i'm already disappointed by v4 pro, it fails some of my world knowledge test questions
one of them both glm 5.1 and kimi k2.6 gets right, another one can be correctly answered by grok, claude, gpt, gemini, old and new glm and kimi models, and even hy3, a brand new model from tencent that is the same size as v4 flash but yet it gets questions right that v4 pro (5 times the size) can't
gng what
Having probably x3-4 size of parameters
It beats opus 4.6 Max even Opus 4.7 cuz it suck
I was expecting much more NGL
It sucks anyway
is deepseek out?
Even qwen is better
i agree, i would rather use hy3 preview than either v4 flash or pro
ok but I think deepseek v4 peaks at roleplay
creative writing
what up with this?
arena gang when they see a new model for 0.1 sec:
damn
tencent hunyuan completely re-did everything with a guy from openai and their hy3 model is pretty good for its size (around the same size as v4-flash), same company that made the awful hunyuan dense models, the 4b has the world knowledge of smollm2-360m with awful reasoning and tool-calling, it came out in the middle of last year
if they made a 1t+ model it would smoke v4-pro and every other open model
ughhh dont tell me its another bust
i really hope arena adds that model, it would probably be top 10 open
So uh what model are we looking at right now that's the best for world knowledge. Thinking Gemini 3.1 still?
what in the disappointment is deepseek doing
yes
What was deepseek v4 called before it was revealed what it is?
Deepseek v4 though is not like v4 performance
so eh they're focused on writing and code
It's like 3.5
Also anyone know the best place to get really good usage with Gemini 3.1 pro anymore
OH YEAH
gpt5.5 is good?
yep
Is DeepSeek on the app?
guys deepseek is NOT back at it
woke up and turned on my pc at 10 pm just to be disappointed
great
Gemini just needs to drop Gemini 3.5 and just destroy every model like the goat it is
yeah and math
if i get a good code bro
ask deepseek to solve a impossible math
chinese are good at mathh
it just butchered my code bruh
someone who actually use roblox to also test deep seek or another AI
great
; D
im risking a good code for deep seek v4 code guys
there were some rumors/drama that the deepmind guys use claude at work rather than any internal models/checkpoints. Doesn't seem great for google.
lol
i wanna see where artificial analysis will place it on their benchmark
and this guy think deepseek v4 sucks bc It can't do a lua code perfect
but yeh i agree with him
it sucks at code
nah its just bad
if Deep seek suck at the code now
im going back to sleep smh, so its not the best open source?
Did you guys mention that Deepseek is okay at writing or no?
idk you go check for yourself
i just woke up in the middle of night to a trash release
should i use mimo if deep seek fails guys?
no use claude 4.7
gemini 3.1 pro then
HOLY MOLY BRO
rok
bruhh
I AIN'T RICH
thats not impressive tho
if deep seek fail me bro
I already told you guys, deepseek is focused on
theres no way it was gapped by glm 5.1
writing and code
thats like all the way down in artificial analysis
we have models that are good at that for free
everything is code sop now?
HE DID IT
at least we got 1 million context
IN THE THIRD ATTEMPT
i just dont see the point of this launch
LETS GOOO
??
he always
did that
now i will use roblox (cuz literally im using godot right now)
deepseek front end is so bad
try in three attempt
i guess its just the 1 mill context that we care about?
deepseek v4 is one of the best **||overhyped ||**model
wth it's Deepseek v4
fr they waited so long for this lol
the spud
they tricked us into thinking it was the spud
they love marketing campaigns bruh
this is interesting: https://x.com/ValsAI/status/2047513613750202452
thats a bigg gap tbh
to be honest
slopmark
for me deep seek is doing great (yet)
Mythos
you dont trust that team?
not bad for a first shot
needs some follow up prompts
its better than gemini
and i like that little detail of adding the server location
How many tokens did it output for that?
be specific dude
guys did mimo cook
mimo is better than deepseek lmao
dont you dare say that
deepseek is my friend
its better for frontend
what is deepseeek better for?
for feeling disappointed about interrupting your sleep for a slop release
lmaoo that was a good one, imm go to sleep on that note
ask mimo to mae
make
advanced npc ai shooter
and i said
can claude solve this
br
bro
Anyone else having trouble with gpt image 2 not completing jobs at the moment?
ok mimo is kinda great
Reddit should definitely make an ai app
It's been more than six hours
Yup, image 2 is definitely playing up at the moment
Deepseek seems on logic god mode
I used this model, and its cooking
Far better than the previous model i used
Deepseek is taking over?
Deepseek v4 is literally benchmaxxing
and focus on someone else
Rude
You were rude
though buddy you're too focused on code
Mate leave no one wants you here
We are here to talk about ai
Bruh what's this servers main purpose
It's good at code too lol
ok ok
yeah but mimo 2.5 looks better at code than deepseek
Lmao that's like saying Kimi is better than opus at code
yeah bro stop being ignorant
and you're still rude even though I didn't want to argue with you
If you have a problem tell a mod
Well idk abt that
Deepseek is better at debugging and deep tasks
Mimo is good at structured small tasks
And being fast
It is
can we use gpt5.5 in battle?
5.5 mogs all
For me deepseek is better at following instruction context and debugging which is quite amazing
If 5.5 was here we'd have a ping
Never used deepseek i just see reviews
Glad it worked out for you
At code security?
5.5>5.4>5.3>5.2>opus 4.5> opus 4.6 >>> opus 4.7 >>>>> gemini 3.1
(Actual tests, prepared& reviewed by all 3 models together, anonymously)
I reviewed it, its like gemini 3.1 pro but on steroids on context & debugging or even creating elements, i'm glad i can use gemini 3.1 pro but on steroids
I used it for roblox project
Nicee
Honestly i used to make roblox game with old deepseek model, but one of the generated scripts actually creates bold ui design, and i like it, but back then deepseek was infant
it can't even handle codes pretty well back then imo
Deepseek had big glow up rn
yo do u guys think they are adding gpt 5.5 to direct
If its cheap = yes, elseif its expensive = yes > later remove
How now how to use Gemini 3.1 pro for free
Deepseek v4 for now
Imo
or Kimi 2.6 atleast
hello. Will Gemini 3.1 pro be available on arena ai?
glm 5.1 is closest to opus rn
Hows deepshit v4
Deepshit v4 cooked my code pretty well
Ai studio
I've cried over it 🙂
Deepshit is now deepreal
Yesss
How its deep?
Deepseek v4 pro has about the same performance as Sonnet 4.6 Thinking, or Opus 4.6 on like Low in my testing
How to use DeepSeek v4?
Hi 👋
Hi
Can you make a video of the picture I uploaded above?
This?
woah v4 is here
Yes
Yes, I really need the view of our village. This is the school.
Describe what is needed?
The video will be drone style slow motion.
Ok bro 1 second
You don't understand Bengali.
Sorry no
Video Arena is only available on https://arena.ai/video, if that is what you are here for.
/ image-to-video I want to the this movie chale and play ho
Deepseek launch looks like a dud but when you factor in cost, it's a killer
It's bad
/cinematic slow zoom, 4 friends watching movie in dark theatre, screen light flickering on faces, dramatic mood, realistic camera movement generate this video
is this a bot?
goofy ah bot

GPT 5.5 better than opus 4.7 ?
And no new version number on Seedream, but they've done something, more responsive prompting and better results.
My bet that it's a hidden update to counter GPT2.
Which one is better? Matter of taste - but it really need a goal photo to tell.
Tudi & Seedream left, and GPT2 right.
Funny thing is that while Seedream have added fake noise to make images more "photographic", it's seen on the GPT2 image.
While the same noise now is much smaller on Seedream at left - and I've done a dozen in the last hour to get her pose and dress right so it is consistent.
Can a model's name be changed? I noticed in a chat that a model previously called deepseek-v3.2 was renamed to deepseek-v4-pro
Is the website down? It’s using forever to load
V4 is the new model
They took 3.2 away
haha i wish
atleast those were removed and replaced with
What's this
It says that for API Replacement doesn't it?
Hey pro
it says renamed 3.2-exp to 4-flash
but 4-pro is new
How good is gpt 5.5?
Well atleast pro is new
Really Good
Seriously?
yes basically they were testing 4 flash under name of 3.2 exp
or
pr move
not me to know
¯_(ツ)_/¯
I guess chatgpt finally beaten Claude after all these months
it was always better at thinking just needed more specific prompts, and it wasnt best at ui
when can I see the V4 or 5.5
Well when a crab whistles on the mountain
But deepseek v4 is there tho
@surreal zephyr exactly which models are available on Deepseek Web/App?
Because it's cheaper
Well u didn't say that before
"Price is blended using a 3:1 output-to-input ratio: (3 × output price + 1 × input price) ÷ 4. This reflects typical usage where output tokens cost more and are generated in higher volume."
Petition to add a slider to the pareto graph for input:output ratio
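For anyone who wants to play with the ratio themselves, here is a minimal sketch of the blended price formula quoted above, with the weights as parameters so you can try other input:output mixes. The function name and the example prices are made up for illustration; only the 3:1 default and the formula come from the quote.

```python
def blended_price(input_price: float, output_price: float,
                  output_weight: float = 3.0, input_weight: float = 1.0) -> float:
    """Weighted average of output and input token prices.

    With the defaults this is the quoted formula:
    (3 x output price + 1 x input price) / 4.
    """
    total_weight = output_weight + input_weight
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return (output_weight * output_price + input_weight * input_price) / total_weight


# Hypothetical prices per 1M tokens, not taken from any real price list.
example_input_price = 1.0
example_output_price = 4.0

default_blend = blended_price(example_input_price, example_output_price)
even_blend = blended_price(example_input_price, example_output_price,
                           output_weight=1.0, input_weight=1.0)
print(default_blend, even_blend)
```

Moving the weights toward input-heavy usage pulls the blended number toward the input price, which is roughly what a slider on the graph would do.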
Just in case you didn't know
Kk
K
no idea i dont use deepshit
i need multimodality
Ufff
models without proper multimodal reasoning are unreliable imo
total cost efficiency 5.5 vs 5.4
5.5 up to 7x more token efficient is wild
so up to 3.5x cheaper
2x higher cost per token yes
but uses 7x less tokens
Is doubled
so 3.5x cheaper total
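The arithmetic behind that, as a rough sketch: total cost scales with price per token times tokens used, so a 2x price increase combined with 7x fewer tokens works out to 2/7 of the old cost. The function name is hypothetical, and the 2x and 7x figures are the "up to" numbers from the messages above, not measurements.

```python
def relative_total_cost(price_multiplier: float, token_reduction: float) -> float:
    """New total cost as a fraction of the old one.

    price_multiplier: how many times more each token costs (2.0 means 2x).
    token_reduction: how many times fewer tokens are used (7.0 means 7x fewer).
    """
    if token_reduction <= 0:
        raise ValueError("token_reduction must be positive")
    return price_multiplier / token_reduction


cost_fraction = relative_total_cost(2.0, 7.0)   # 2/7 of the old cost
times_cheaper = 1.0 / cost_fraction             # about 3.5
print(round(cost_fraction, 3), round(times_cheaper, 2))
```

Since 7x is a best case, the real saving depends on how many tokens a given task actually uses.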
Wait
(just dont spam xhigh when medium and high do fine then you can save 3x the quota)
where 5.5
In battle mode
pineapple will get u 😈
bro is roblox exploiter 👍🏼 👍🏼
uhhh
or wait
Who knows what ai is the best for school
gpt 5.2 search
Thank
This problem can be solved
This session has reached its token usage limit. Please start a new chat to continue.
Trace ID: 76f18173-373d
lets go
just cancel a chat
your
pls fix ur fckin captcha😭
lets see if the $200 pro subscription was worth it, and yeah im releasing this game on steam to get my $200 + tax back😭
but i heard 5.5 pro is really good at game-making
i will use like meshy ai and add real 3d models to the game
and see what will happen then
half way there
arena ai actually crash
extended pro😭
Look at this problem
the results are interesting but does someone have a great pc to run it? because on my m2 mac it runs at 5fps😭 please
Help me reach my first 1000 subs, thanks legends
👉 If you enjoyed this breakdown, hit that Hype button to show some love and help push this video further.
This video takes viewers on a fast-paced journey through the complete evolution of American fighter aircraft—from fragile World War I biplanes to cutting-edge stealth jets and future 7t...
Send
thats why i use gpt
gpt has no "end conversation"
so it actually listens to you instead of ending himself when hes lazy
gemini 3.1 flash lite preview .. is it a temporary issue or ?
nvm I'm lagging hard too
guys wasn't claude opus 4.6 available in the LLM chat what happened to it?
its too expensive to give free access to opus to everyone with generous limits
its still in battle tho
ooooh
thanks
also has anyone tried chinese models like kimi?
if so what's your review about it
new deepseek is trash, kimi-k2.6 is the smartest one in my opinion in how it reasons in every domain, and glm-5.1 is okay but i dont like glm cause it has the same problem that gemini has, like if you tell it to change one button it will additionally change the full page, even if i ask it to not do that.
thx
but i recommend to pay for claude or chatgpt, they are way better
i just bought cgpt pro for 200 bucks u know😭
flex
because paying for glm or for kimi that arent SoTA is i think a waste of money
i don't do complex tasks, I'm not even into coding, I'm a med student
gpt-5.5/gemini 3.1 will be great for you
Ngl, so far GPT 2 has been pretty impressive. I asked for this prompt:
Create an illustration showcasing details about the differences between Bigfoot and the Abominable Snowman. On Bigfoot's side, it describes it as being either male or female, brown fur and more man-like in its face, looking almost like a Neanderthal. It is aggressive only when provoked and can be found in the woods of America. On the Abominable Snowman's side, it is mostly a male species with white fur and a more ape-like face, bipedal with large feet like Bigfoot, and is less aggressive. It is a creature that prefers solitude and is known to save some of those who wander in the blizzard in the Himalayas. Some theories suggest it may be a Tulpa created by the Tibetians.
And it's shockingly good with the text. Even Gemini struggled when you asked for too much. This is consistent.
And here's a map I asked for my fictional island
yes and no
i like to use claude analysis level for the deep content and also his special commands like oods and L99 actually make difference for me
yeahh i recommend you to buy claude max for 10 bucks monthly, look at the promo codes online and u will be happy with that
YEAAHHH NOW PUT THAT TO 5.5 PRO AND MAKE AN AAA GAME OUT OF IT😋
wtf
codex made factorio copy and now playing it
transactions here are complicated in africa
also 10 bucks is waaaay more than it worth here
I wish lmao
nahh 10 bucks for this much usage is really good, in api you would pay like 150 bucks for this much usage
plus you have all the special features
can someone mute him please?
I'll look into that
wowww pretty fast moderation here
Fair enough
any takes on this ? it's really hard for me to start a new chat at this point
AND BTW WHERE IS SORA I BOUGHT PRO AND NO SORA HERE? ://
Been using Deepseek V4 for a while and it doesn’t improve. After a few exchanges, it loses memory. When I point it out, it doesn’t even remember forgetting, so things get messy. Eventually it only recalls the very first question, so it’ll hit me with “So what you meant is this!” — bringing up ancient history even though we’ve moved on.
It sucks
I guess ill have to switch to mimo 2.5 pro
https://youtu.be/_193U2aNaeE
This is bloody nutz.
OpenAI are backing a bill that would shield them from liability if 100 people are killed by their AI. At least they're making it obvious that they are the villains.
Take 1 minute to contact your representatives about this - https://controlai.com/take-action
Join PauseAI - https://pauseai.info/
Sign the statement on superintelligence - https://...
Deepseek v4 being worse than kimi2.6, gpt 5.4 is just funny
Hey..
what??????????????????????????????????????????????????????/
where are the deepsleepers?
API of V 3.2 got recently updated so they might have been shadow releasing for a while.
But yeah it's not that useful looking at the benchmarks. I can only hope that this being a preview would signal improvements later on
so anyone knows whats the rl for gpt image 2 in chatgpt for free users (OFC)
How’s gpt 5.5?
i am testing 5.5 pro and im quite pleased
but nothing revolutionary
just like a gpt-5.3/5.4 situation
The whale has awoken.
Well the point is is that it’s a reliable workhorse
It’s not supposed to be super duper ultra intelligent
And for what it is it’s an extremely competitive model
How come
.
Have you guys found a way to use claude opus for free?
its not good or fast
faster than other open chinese models ye
I’d beg to differ?
It’s faster than Gemini 3 flash for me and Claude sonnet 4.6
whats ur provider
I use the app and openrouter
It’s been quite nice
DeepSeek has NEVER let me down any time I’ve asked it something
The one time it did was because I didn’t describe something correctly
Which is insanely impressive for a lower tier model
i mean flash is good as the fast cheap model
but pro is supposed to be good and SoTA like thats the point of pro
i would rather have a really slow model that is SoTA and is good
That’s your preference then
Nothing wrong with that at all
I personally fave Kimi K2.5 and K2.6
but still with fast models that arent that good you spend more time fixing and iterating so slower models are actually faster to work with
from my experience
It’s more about how it thinks
For example I can tell you DeepSeek will always provide you decent code
It may not be opus quality
But it will work
why am i not able to login in lm arena website
Every time
Deepseek v3.2-exp*
Renaming a model is wild
Deepseek genuinely has the most overhyped open source models while having the worst ones
Qwen has best models by far from open
Qwen 3.6 27b solos deepseek v3.2exp aka v4
And if you need price to perf then gpt 5.5 is still best
Is Deepseek 4 all that? Or just overrated?
we'll see actual results in few weeks just as it always happens
They benchmark maxed it
So it seems good on paper
But people are saying it isn’t the greatest when it comes to programming tasks
tbh not a total disappointment, wanna test it on a few hundred thousand context
The only benchmark worth trusting is arc-agi, the rest is just benchmaxxing and pattern matching. If DeepSeek doesn’t at least hit gemini 3 flash level on arc-agi, it’s a flop. At that price, nobody’s gonna want it.
I’ll test it
Engram sounded so promising
And see what happens
Is Gemini 3 Flash still the best non thinking model rn?
Due to constraints in high-end compute capacity, the current service capacity for Pro is very limited. After the 950 supernodes are launched at scale in the second half of this year, the price of Pro is expected to be reduced significantly
Yep. Especially in terms of price and intelligence.
hopefully that's true
Second is GPT 5.3 or what?
I’d say GPT-5.2, but that’s already much more expensive than Gemini 3 Flash.
What, so 5.3 which is a newer version is worse than 5.2?
Maybe GPT 5.4 Mini, but I haven’t tried it yet.
Guys am i able to ask y'all a question?? When will you be able to use ChatGPT Image 2.0 ? I know its in the Leaderboard but we cannot use it yet (as of my knowledge)
mimo 2.5 pro is so good
Just open https://arcprize.org/leaderboard and check out the price/intelligence. It shows how well a model can actually think. Chinese models, as expected, are all at the bottom.
mimo 2.5 pro is great for front end
Respectfully
I have actually never seen a lower iq argument here
You do realize V4 is a completely new dataset
it sucks tho
mimo is way better
Oh my god bruh for the last time DeepSeek isn’t supposed to be SOTA
It’s the reliable workhorse
its supposed to be the poor man's 5.5
In benchmarks DeepSeek V4 outperforms 5.4 xhigh pretty often
I don’t know why that means it sucks
it wont outperform the spud tho
ITS NOT SUPPOSED TO 😭
If you want SOTA you go for Kimi, Claude, GLM
DeepSeek is for rapid deployment
kimi is NOT SotA gang
My guy what.
mimo 2.5 pro is
Mimo is the exact same philosophy as DeepSeek
but its smarter
and does stuff correctly
Ehhh in my testing not really
and we get 1 million context
but its actually smart
and gives you complete coding projects
not just a 100 line template like gemini n stuff
Well in Gemini’s defense here it’s always been a pretty ass model
I just like DeepSeek because it’s reliable
deepseek needs vision
send me
Hold
if it gets vision its better than mimo
great
Again you can prefer what you want
But I think a lot of people conflate that new model must be SOTA
I personally love V4
I love that it doesn't run inference on nvidia
it's insane how deepseek just keeps being nerfed intentionally and it still manages to perform near SOTA
then no, if you look for price to performance open source model or small team model are the best
how do you know they are nerfing it
they literally are not allowed to use nvidia
and they say what they achieved and innovated publicly
so everyone can technically replicate it
why
it's profitable also for other ai companies, they will just use those techniques
i think claude does
blud asked why
Wait
i don't think its the best in the world but i think other ai companies just don't care and are just scaling
with compute power
DeepSeek v4 pro BEATS Kimi K2.6 in swebench verified???
kimi is ass bro
anything beats it
All your opinions are dogwater bro
Kimi wipes the floor with Claude opus 4.7
It’s not always about benchmarks
Bros laughing while opus 4.7 won’t even read documents, listen to instructions, and takes shortcuts always
Not to mention the stealth price hike with the new tokenizer leading to 35% higher costs for an already expensive ahh model
So you’re getting worse performance with Claude while paying more premium
DeepSeek’s architecture is actually insane
Running a 1.6T parameter model at such a fast speed?
Holy
honestly they will keep going like that and keep reducing compute power necessity, faster training, and it'll just be profitable to everyone
@proud bobcat are you running deepseek v4 locally
other ai companies will just steal the idea to implement on their but its normal honestly
how's deepseek guys
anyone tried it
hmm
but deepseek doing the dirty work for architectural improvement
its not multimodal either eh?
interesting
Oh yeah dude I have like 3 servers laying around just for this
Totally
its not that expensive lmao
I’m not wasting money to run local ai
I don’t need it
I just like keeping up with releases and benchmarks
maybe if at some point we will be able to compress the model so much (like 99%) without losing quality we will be able to run T model locally lol
to run a 1.6t model locally and at decent speed ?
it's expensive bro
;-;
not more than 15 k tho
yes so its expensive for most people
15k if you're lucky and want slow speeds and thats still expensive- 😭
wdym thats like just 10 months of work
without food, rent or anything else
bro you're ragebaiting honestly
It’s only 15k guys
yes
you can't save 100% of what you get
15k is the avg yearly salary of the top 5% middle class people where i live
anyway
im not dude
its the truth
unless you live in some third world country
I’d rather buy me and my future husband a cottage somewhere than use that money for ai slop
wdym ai slop
"some third world country"
that is literally the entire world bro.
wake up.
its not
europe and NA is just a small part
to dismiss 4-5 billion people like that is a crime
working at mcdonalds in canada can get you more money than other countries
you're ragebaiting cause you know its impossible to save 100% of the money you get every month
so it won't be 10 month
As much as I nerd out about AI I’m never using that for anything serious
but much more
97% of people cannot do that.
the ones that could and are are getting kicked out or constantly racist-ed
An audio setup
either way we're getting off topic
guys lets stop talking about this before the night fury warns us
So sad that the new model isn’t in the web/app yet
DeepSeek?
it is wdym
It’s been out for a good month I’d reckon
No it’s not
what model are you talking about
It’s DeepSeek v3.2
we have v4 gang
Not 4 yet
Bruh it’s not 😂
For a month
Not at all
Jesus Christ.
what in the ragebait are you doing
I think ai might be giving us brain atrophy
Genuinely
People will open the DeepSeek app and see “instant” and “expert” modes and still say ts
Just ask DeepSeek what its knowledge cutoff is
Ts is ragebait
The first one is V4 flash
The second is V4 pro
What is there not to get
It’s been like this for a month my dude
Today it got released for api access
The ragebait is INSANE
v4 is good, but the qwen 3.6 27b bruh, how this run in my pc?
The power of dense models
Nuh uh
I can’t wait till we get a good 32B dense model from Qwen
waiting my 3.6 2b
That was only a change in the interface
my pc only run the 2b in 15t/s ;-;
Excuse me, where is GPT 5.5 on Arena? What's the name of the model there?
the 27b is only 4t/s ;-;
Deadass?
v4 is same model as v3.2 it was renamed lmao
nah they made v4 pro (thats actual new one)
I tested the v4, is more fast and has more context
but they renamed 3.2 to 4flash
flash v4 is another model
3.2 exp was 671B
bro it kept the score on arena
but really like the v3.2
its NOT a new model
bruh, Idk about lmarena
I'm talking about my experience in deepseek.com
According to official benchmarks Deepseek V4 Pro scores 154 points MORE in comparison to Claude Mythos in codeforces rating. Only 3.5 points behind Mythos in BrowseComp, strange.
It’s right here cuh
Is that a new AI?
@proud bobcat why does mimo 2.5 pro thinking process look similar to deepseek's
ctrl+shift+r
They learn from the best
😎✌️
That is my source.
yep ;-;
qwen 3.6 27b > v4 pro
in my docs tests
27b > max
lol
How so?
Why is it so bad?
is the mythos in the room with us?
27b is better than max 3.6 🤣
I mean what model is good in like Science and creativity
qwen is horrible making big models
Mythos glazers when they don’t even have access to the model and still hype it
it will be soon
and it will crush the spud
Probably yeah, also research.
I
?
Can you name some models maybe