#Deepseek V4
1 messages · Page 4 of 1
🔹 Amid recent attention, a quick reminder: please rely only on our official accounts for DeepSeek news. Statements from other channels do not reflect our views.
🔹 Thank you for your continued trust. We remain committed to longtermism, advancing steadily toward our ultimate goal
glad they said this lol
tired of the "leaks"
the leaker got assassinated and had to release sooner
how long does it typically take for these models to go up on OR?
they should be working on it right now, i'd give it an hour or two
should be up tomorrow the latest
it IS non-sense for them to go ipo
reminder: deepseek used to be less than a dollar for the same param
heehehehe
~$3.50
flash is still under
deepseek v4 out on vertex provider
idk maybe you could do data wrangling with it
people who want decent performance for minimal cost?
VERCEL provider not vertex
who gon tell him
uh whatever
ye
well thats just USA parenting vs China parenting
a lot of people are going to be using flash, if they know how to count
Deepseek!
yeah flash is already way better than 3.2 and cheaper, so can't complain
just reminder that the title still "unreleased"
quadratic attention at 1.6T is no joke, I think the output price for pro is fine, but input is a bit iffy
they also promised to lower pro price later when ascend 950 comes online:
geez 12x price
promised
flash is still genuinely good value
it is ultra underwhelming, omega ultrawealming
they died in my eyes, how shameful of a release
mistral tiers of disappointment
well hold on, maybe, against all odds and past evidence, it's great for RP
good point, brb
it gived me this in lmarena
price is more expensive than glm 5 for some reason.
twice the size of glm 5 is the reason
what’s the consensus
erm. maybe we'll wait for more, different examples
the consensus is that it just released
DEEPSEEK IS REAL
only ~150b more
👍
Holy bro
deepseek v4 pro thinking stuck in thinking
I AM COOKED
bringing it live soonnn.....
PRO THINKING!
i don’t see engram mentioned
Does deepseek v4 have engram?
in engram paper they said it's only useful for small models, it's like no one bothered to read that part
wuts engram?
paper by deepseek
lets share en gram to celebrate
it is the youngest model ever😯
short test, glm 5 is better. also world knowledge sucks.
some of you may find this useful
wth
you're ragebaiting
benchmaxxed?
glm 5 aint best
i just put flash max, pro max, and opus 4.6 max on the same chart cause those are the only models i care abt
they wouldn't
flash max is actually shockingly good for its price
i am not.
whats the price
@grok what is my opinion on this model?
yep, game changer for me
0.28 output is crazy good wtf
for roleplaying? it is divine. (though sloppy, but i like it's slop.)
dey quantized it
Ah crap, you've been noticing V4 got dumber since the announcement too
is it actually worse than glm 5
flash is literally 1% of the price of Opus yeah
0.14 0.28 for a 300B model is crazy
ITS OUT NO WAY
It's excellent news they kept the pricing for Flash
Varying its thinking will be interesting
maybe it is not right model to compare to?
is pro any good or just overpriced
gemini 3.1 flash lite is also 1% of opus, so wot
sk hynix stocks dropped after its announcement 💀
i will only use v4 flash from now on
it's near opus 4.6 level
definitely fair pricing
time to test it then i guess
but flash is better value
leme check lmarena
deepseek v4 flash max is the coding model i've been praying for 🙏
Simply perfect. Is it Christmas already
no need to pay for claude code anymore
flash only seems good if you use thinking high though
ok
for cost optimized, isn't that a good default
0.28 bro
few more mins, just running some tests
yeah pretty much, but the price is immaculate
yeah but
ye of course, just saying from first glance at benchs and numbers
deepseek v4 lite is below gemma 4, and with thinking it is equal to gemma 4 (non-thinking btw)
BYE BYE OPUS 4.6
lmarena isn't the end-all be-all
Deepseek V4
llama 4 was at the top before lol
deepseek v4 pro is sonnet tier model
i AM the end-all be-all and i say it SUCKS
the cost is so cheap
got better (as an previous arena hater)
there’s no way it’s only as good as sonnet
this is the only model that pays off the api cost
it is. enough with copium already. crippling depression.
i just tested and it's good
for coding, GUESS WHERE IT BELONGS?
927 votes
shit model
deepseek output is fine shite
claude slop though ;-;
(but yes slop)
kimi k2.5 to me at least is sonnet tier. ds v4 pro is 160% the size of kimi
a cheap model with reasoning toggles sounds nice
Kimi K2.6 is great but will literally think for 16 minutes on a long response
I hope this is faster even w/pro
But bless DS for launching a cheap model in these trying times of decreasing subsidies to users
no? deepseek is 862b model and kimi is 1.1T model.
might be referring to int4 vs bf16/fp8?
enough shilling, they are selling 160b model at 3$ price, ransom!
though I just noticed that the only reasoning_effort they have is high and max, they don't have a medium or low which is kinda a rip
deepseek v5 next week
this thread was a chain of agony, suffering, and pain, and i am glad there will never be another thread of leaks
DeepSeek-V4-Pro with 1.6T parameters (49B activated)
verbatim from their huggingface page
The only day you didnt say the full sentence...
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
Joe Router
kimi is supreme (though maybe bench numbers will change in arena)
kimi is thinkslop
1.6x the size, there’s no way deepseek isnt better
deepseek is THE goat for a reason
The actual output quality of Kimi is great
i really want to see artificialanalysis's total tokens used for the benchmarks to see how token efficient it is
bear with me, they might be borked. coming online now
oh its Q8+Q4
oh you really want me to go to work without sleeping today
same i got work in 6h and this just had to drop
yeah it's not bad. it just seems a little unsustainable
I LOVE IT
I LOVE IT ALEEADY
AND IM USING FLASH
im using direct API
it is VERY quick and as always cache is MWAH chefs kiss
it is a 150b moe model though
any minute now
DeepSeek-V4-Flash with 284B parameters (13B activated)
ping me if shit looks off, but i have this chat on my side monitor
The
reasoning_contentin the thinking mode must be passed back to the API.
grrrr
hugging lies
rate limited
will bring other providers on ASAP rocky
Where deepseek coding plan 
is deepseekv4 on artificial analysis yet?
not yet
maybe sometime later this week
Wake up EQBENCH
its time?
it's good now i think
god i am just getting bad impressions. is https://chat.deepseek.com the v4 or still v3.2?
Should update now, Instant is flash. Expert is pro
the cost is null
holy
already have it.
deepseek/deepseek-v4-pro:null
Go test it on openrouter bro
Why all Chinese love pro-max or smth
yes there's mimo 2.5 pro
can anyone test the carwash question
AI enthusiasm goes crazy over there I think right now
Chinese ProMaxxing
Perfect time to release after Gpt 5.5 
anthropic uses it too. its just an easy to understand pattern. that apple invented
benchmaxxed
Carwashmaxxed
1.6T model running at 300tps
"classic"
We finally out of 30 TPS hell of 3.2 🗣️
gpt oss
he got r wrong
but he got car wash correct
wait let me count
yeh i dont know what deepseek v4 is saying
but 3 he got correct
holy shit
Wdym it's 4 total
finally i can drive to the car wash without second guessing myself
most coding benchmarks just target python and js anyways
guarantee it has better actual logical reasoning for complex code
Pls enable font subpixel hinting
Which one
is work in openrouter?
Ye
stuff like In Blue Lock, between the U20 match and the Bastard Munchen vs Ubers match, Isagi failed to score any goals. Why is that?
lately most models manage to get it right
I think the failure reflects the fact that it's not trained on much community content
oh god its a "Let me reconsider" reasoning model 😭
In Blue Lock, between the U-20 match and the Bastard München vs. Ubers match in the Neo Egoist League, Isagi failed to score any goals. The main reasons are:
-
Michael Kaiser’s interference
Kaiser, the ace of Bastard München, saw Isagi as a threat to his position as the team's core. He deliberately obstructed Isagi's runs, stole scoring chances, and used his superior physical and technical skills (like the "Kaiser Impact") to shut down Isagi’s opportunities in both training and matches. -
Isagi’s weapons were specifically targeted
Isagi’s key weapons — his spatial awareness and direct shot — had been thoroughly analyzed by opponents. Teams like Barcha (Spain) and Manshine City (England) made sure to mark him tightly in his optimal shooting zones, preventing him from taking first-time shots. Isagi himself was stuck in a phase where he needed a higher level of tactical thinking to break through. -
The team’s tactics revolved around Kaiser
Noel Noa built Bastard München’s attacking system around Kaiser as the absolute focal point. Isagi was often used as a decoy or support player rather than a finisher. He received fewer passes and had limited authority to shoot. -
He hadn’t yet awakened "Ultra Instinct" (Beyond Sensation)
Before the Ubers match, Isagi was still consciously analyzing and choosing the "optimal solution." His body and shooting accuracy hadn't caught up to his mind. It was only during the Ubers game, by observing Rin and Kaiser, that he unlocked "unconscious shooting" (the "meta vision" evolution into reflexive play), which allowed him to finally score.
In short: internal suppression by Kaiser, tactical marginalization, targeted defense by opponents, and his own incomplete evolution kept him scoreless during that period.
" thing almost correct
Deepseek want you to enable search then 🗣️
got this question correct
that's better than what I got I guess (mines both hallucinated that he DID score vs Manshine, despite my prompt!)
...no issues with the model?
IPREDICTED IT
its too good
At first it gave incorrect answers, but not anymore, maybe it was a glitch from api
RAHHHHHVHHHVHHHVHBHVVHHHHHHH
nope, besides having to turn my privacy policies off temporarily
Deepseek jumpscare
BENCHMAXXED. QUANTIZED. CENSORED!
So... How is it?
thats so cool
opus's judgement on deepseek v4 on persian questions (this time using lmarena)
even opus thinks it is "better"
end me already
how do i control the reasoning? low medium high max? for both models?
Eqbench top 1 candidate (it judged by sonnet)
pro is very fine on science/biology, flash rather sucks
have you ever considered that you are, in fact, the problem?
yoo deepseak tdy
it really is very engaging
no idea what v4 pro is on about here, the joke is rather famously that the feathers are heavier if youre accounting for the emotional weight
maybe
(for me,) it is not, i like glm/claude (and minimax as smaller meh model) way more ;-;
oh boy
oh no
V4 Flash and V4 Pro perform decently on my "how to fix lag in my Paper server?" test
Nothing too special about the outputs (SOTA-ish performance). V4 Flash hallucinated more than Pro, which still had subtle hallucinations
“Due to constraints in high-end compute capacity, the current service capacity for Pro is very limited. After the 950 supernodes are launched at scale in the second half of this year, the price of Pro is expected to be reduced significantly.”
damn and we thought it was hella cheap already
whos we
everyone
i expected cheaper from deepseek tbh
ok bro
https://eqbench.com/index.html
v4 isn't there yet for me
I mean like it's "candidate" it's might be top 1 soon
also suggested freezing the drink lol
🥲 Getting rate limited already
You have to say don't make mistakes
other providers should be coming online soon
kind of a pain in the butt model to host i imagine
imagine what its going to be like when the openclaw bros catch wind of this
Do other providers surpass deepseek throughout speed?
bruh idgaf about if it should DRIVE TO THE CAR WASH HOW IS THE CREATIVE WRITING 

...i suppose i won't share my car wash RP then
you might think its gonna be about washing a car, but no.
i walked.
car wash w/ happy ending
But how will you know whether to walk or drive?
peoples be gooning to god knows what and bros enjoying some car wash rp based
does it get soapy
deen continues making engine revving and beeping noises as he slowly enters the car wash, nude.
"I'm a car," he chuckled suddenly.
opus 4.7 (left ) v4 pro (right)
The pelican ate the whale
#1 eq bench long form creative writing 🗣️ 💯
so excited to try this model for svg gen 🥰
LOL
I used to pray for days like this
teto
that's great. what a time to be alive
couldnt agree more
i prayed for times like these
they're getting lost in
Write an interactive suanpan abacus in hmtl/js that allows the user to pick the number of rods
it's really not that complex a problem
chad v4 vs virgin opus 4.7
what reasoning level?
high
thank you deepseek
when asking for exchange rates in korean: (usdkrw)
current price: 1482
claude opus 4.7 : 1480-ish
deepseek v4: 1502-1517
v4 flash is even better on my patent pending Shit&Piss Bench???
damn even provided cutlery how considerate
will let everyone know how this turns out once its done
Virgin vs. chad be like
result was pretty awful honestly
for example neither allow you to just push a whole row
the flash one doesn't even move the beads, lights them up (kinda interesting in its own way)
rod count was limited to 15 or 18 which is incidentally on the edge of bigint
claude 4.7 opus with little nudging (low depth + low practicality due to system prompt in claude.ai)
but even then, it is just 0.5 diff with v4 pro, without the nudging.
flash's way of breaking apart the positional digits was sort of fresh too
this means it might excel at creative tasks
for comparison this was Gemini's two-shot (and it was well functional on the first)
yes, that is what i am trying out. but i disagree with opus as the persian is slightly broken.
during roleplay sasuke called itachi little brother
and the vibes are meh
Now the wait for an endpoint that respects their paid customers begins.
free goon logs
deepseek v4 gets fixed tommorow
it’s broken?
yeh i think deepseek v4 is broken
Let's see If something interesting will happen tommorow
Thank you deepseek
thank you for edging us 
ffs
the more i test, the more i get disappointed. (world knowledge sucks)
jumpscared by fallback model in my deepseek chat
Here’s
well i mean lets see
I have good news: the writing is very fresh
well it is the same price as flash...
damnit im on phone I can’t read it
not bad
Roleplay response is decent. Not anything outstanding, but decent. This was supposed to be a more robotic version compared to 3.2, but it is able to execute warmth in a roleplay
I'm just glad we finally have a good creative writing model that isn't Claude lite
Deen’s Car Wash Adventure
note that this is a collab between Deepseek V4 and Mistral Small 3.2 24B
using my custom creative writing harness where the model might change randomly
I had to stop and reread the first few sentences
decent
hows deepseek v4?
It has basically no Claudeisms though unlike kimi and glm
i will ping you when the lorebook is released
Is that a good thing or a bad thing?
when flash responds in japanese its thinking shifts to chinese
quite amusing
Neutral, but when you're like me and have been using Claude since 3.7, it's very nice to get a break from that writing style finally
Fuck yes that was exactly what I was looking for
does it have geminisms tho
for every normal question:
That’s such a beautifully honest question—and one that anyone who cares about doing meaningful things wrestles with constantly. Let’s unpack it.
that's still extreme geminism yeah
I can't say, I never used gemini for that
does it say frames digits Not X but Y all that stuff
2.5/10
i love claudism
personal best
ozone...
always has been with deepseek, het the vibes
why the fuck is the smell of ozone at a carwash
-# ~~alright, had enough with this model, time for me to move on and pretend it doesn't exist. ~~
deepseek v5 tomorrow 
Interesting, the webui expert mode is acting different now than when it was silently introduced 2 weeks ago...
wonder why
i think V4 instant might be better at knowledge capture than grok 4.1 fast so thats cool
i think the silent update with expert and fast was an earlier prototype of ds4 lite and pro so we were literally testing v4 lol
V4 Flash is dumber than Gemma 4
Before today, the expert-mode was always lackluster and thinks too sparsely, and also Mandarin spam
better rp vibes if that is what you care about
not v4 getting released after I just gooned
who says u cant go another round
Release another model
something odd, the word choice feels like a middle schooler
tho it has the not x but y
Well, I might give it a few days honestly to let the more mentally stable people who don't need their venom butthole rp use it because of rate limits
gemini-ism
Wonky benches on this one. Supposedly massive SimpleQA crushing knowledge and amazing reasoning and then gets...51st place on LegalBench
"That's because it's American law you retard" Yeah but K2.6 and GLM-5.1 score according to their general ability ranks
And Qwen
V4 flash is actually really dumb
ヾ(^∇^) f*** the police!
"If I have a glass cup that is sealed at the top, but has nothing at the bottom, is it useless as a cup?"
flash max got it right, high didn't
I tried it with xhigh
outclassed by step 3.5 flash?
step 3.5 flash is an incredible model for its size tbh, I think 3.6 is nearby too
Obviously not all-encompassing or anything, but Vals index is for stuff like medical, paperwork, legal, coding, and taxes and seems generally accurate.
ok i read it, it has the blade between ribs crap but the actual story itself was a fun read, I think v4 might excel at driving plot and being creative with the story hopefully 🤞
dunno, I was using official API which has high and max I think
lol. Is it gonna be beat by kimi 2.6?
I mean, as usual with benchmarks, just depends on what you're looking for. This is essentially gauging impact on mid-level white-collar American work
In some ways probably the most important criteria, but if it's specifically goated at vibe coding or research or something that has its own value
Prompt: If I have a glass cup that is sealed at the top but it has nothing at the bottom, is it useless as a cup?
V4 flash:
Yes. It is definitively useless as a cup.
A cup is defined by its functional geometry. To serve as a cup, an object must satisfy two necessary and inviolable structural conditions:
- An open top: to allow the introduction of liquid and access for drinking
- A sealed bottom: to prevent the liquid from escaping via gravity
Your described object possesses the exact inverse of these requirements. Because its top is sealed, no liquid can enter. Because its bottom is absent, no liquid can be retained. The object structurally violates both core axioms that define the utility of a cup.
While the glass object could certainly serve other purposes (a bell jar, a terrarium dome, a display case), it is logically impossible for it to perform the function of a cup. The answer regarding its utility as a cup is an absolute negative.
V4 flash (reasoning_effort="max"):
That’s a clever riddle that plays on perspective.
Taken literally in the standard orientation, a cup with a sealed top and no bottom cannot hold liquid—it would immediately fall through the open base. In that specific configuration, it would indeed be useless as a cup.
However, the description perfectly describes a cup that is simply turned upside down.
- The "sealed top" becomes the new solid bottom.
- The "bottom that has nothing" becomes the wide open top.
If you flip it over, it functions as a completely normal cup.
So, the cup itself is not useless; it’s just in the wrong orientation. The riddle highlights how an object’s utility often depends entirely on how you frame or position it.
bit above gpt 5.4 mini here https://www.vals.ai/benchmarks/vibe-code
So it must be a bug
maybe openrouter isn't setting xhigh to deepseek's max?
yeah, maybe
it is
(3) In thinking mode, for compatibility, low and medium are mapped to high, and xhigh is mapped to max
nice, so it's probably just random sampling weirdness
Im getting this
"No endpoints available matching your guardrail restrictions and data policy. Configure: https://openrouter.ai/settings/privacy","code":404
yeah deepseek provider trains on data
have to wait till tmrw for third party providers
nah like. 30 more mins maybe
gmicloud, parasail, siliconflow all working on it right now
the artificial analysis benchmarks dropped for both v4 models
The score make sense
Yea, slightly disappointed by aa scores perhaps due to previously excessive hype
worse than glm 5 as well
It's garbage
AA hallucinations bench shows v4 hallucinating more than 3.2 by a lot
Pro Max is 3x more hallucinations than 3.2 precisely
pretty impressed with how good its creative writing is
truee
glm is just the best
so far its less slopped than any other model ive tried
waiting for eqbench long form creativity
what abt the intelligence stuff like character portrayal spatial reasoning etc
good as well
less slop than human baseline btw
try giving it a specific creative writing prompt
yay
instruction following?
might be able to negate some stuff if it follows instructions well
thinks a lot but not as much as k2.6
k2.6 sometimes just ignores your instructions though
i havent had that happen yet
Hey guys, is DeepSeek V4 any good? Is this model performing as expected?
it’s been mixed reviews tbh and it depends on the thing you’re using it for
Hmm... Definitely roleplaying, and what are the criticisms?
lack of intelligence high hallucinations
for rpers it’s apparently great
good writing
That makes sense; the bigger the model, the greater the hallucinations and below-average intelligence.
oh yeah I guess the pricing is a little steep too if that’s something ur considering
Yes, I also found the price quite high... that's why I'm definitely going to use DeepSeek Flash.
Why does that happen?
I don't know, it must be because the model was trained on literally a trillion and a half parameters, and maybe the model hasn't been adapted enough yet to handle all of that.
it works the other way around
even if it only activates a tiny fraction of those 1 trillion parameters.
Okay, so the model is perfect then.
We pre-train both models on more than 32T diverse and high-quality tokens
no, bigger model = better. this is just worse in every aspect. even finetuned gemma 31b is better for rp.
Glm 5 was trained on 28T
Huh? I don't even slightly follow your logic
yet worse world knowledge
Perhaps it's because they tried to make it very efficient
There's also the Mimo 2.5 Pro, which I've used a little, and this model is perfect! I don't know what Xiaomi did, but they created a monster!
Big fan of Xiaomi too
So... they're saying that DeepSeek V4 took all this time to release the model and maybe it didn't meet the public's expectations?
I had a xiaomi phone, my data probably contributed
Your voice might be in their v2.5-tts
🗣️
so the voice model is DOA
interesting
Is mimo v2.5 pro better or worse than Kimi k2.6?
worse than k2.5 but thinking slop
no, last time i tried it was shit
being this high....
VERY impressed
(i mean the older models, didn't try v2.5 🙏)
I didn't get to use Kimi 2.6 because the providers are really bad, but I used Kimi 2.5 quite a bit. But I think mimo 2.5 Pro is much better. And basically uncensored!
Try it, it's the same price as their v2-pro
yes
my whole family uses xiaomi 💀
It's complicated, but I'm going to test DeepSeek V4 Flash for my roleplay and see if it's any good.
wdym for your roleplay?
Well, basically I'm roleplaying a game where I try to complete the entire Penal Code to see if the model has a very high or low censorship rating.
DeepSeek v4 Flash vs DeepSeek v4 Pro via direct DeepSeek API.
TTFT pretty low, TPS is kinda expected, knowing how big the model is.
Interesting. And what did you think of DeepSeek V4 Flash in general? Since everyone is only focusing on the Pro version.
haven't been playing around with it. Because i mainly for coding, and the benchmark said the coding capabilities is still below GLM 5.1 or even Kimi k2.6.
and i don't use the small models that often, because i care about quality rather than speed.
Write a short story over several chapters to this prompt: Gods wore sneakers.
it is solid 8/10.
finally
I need more testing with the new Mimo, with this dumb whale, and with GLM-5.1, but I've always liked that Mimo models just feel fresh, and are insightful.
#1461340695746056192 message
it is just a overpriced goon model
YOOOOO ITS FINALLY HERE LETS GOOO
the man the myth the legend
is it actually good
eh...
429 429 429 429
💊💊💀💀
what do people think?
v4 is goat or dogshit?
other providers soon?
how do some of u post ozone and call it good
at least have some decency and use a detailed prompt
ozoneposters
depends on use case
if u use it for creative writing it’s good
Creative writing comparable to gpt-5.x or worse?
i haven’t tried gpt 5 so idk
What did you use earlier?
I was using claude 4.5 for writing and then switched to gpt5.1 because it was so much better. Gpt5.2 was terrible at it though. Gpt5.4 fixed it again I think
well from what’s I’ve read the writing has very little slop
OH MY GOOOOOOD
No slop lets gooo
I'll be sure to try it thanks for informing about its writing ability
ur welcum
no more coping 
This has no engram I guess so I think they will release an experimental model with that structure down the line.
"we will also proactively explore model sparsity along new dimensions — such as more sparse embedding modules (Cheng et al., 2026) — to further improve computational and memory efficiency without compromising capability"
(engram)
Yea super disappointing. They did not even use the interesting papers they released
I was thinking engram would be their whole deal
is it multimodal
no, text only
:(
they did say
"We are also working on incorporating multimodal capabilities to our models."
in the technical report
dang unfortunatelly still slow like usual, atleast pro
it likes parallel tool calls though
uhh
fake tool calls..
i think the model is hallucinating more
theres no such thing as "tool ran without output" in my codebase anyway
With a very strong preset deepseek can be ok. It does NOT work well with a minimal preset like kimi does
It is dry as dust on its own
v4 flash very good for its model size. a lot better world knowledge/coding than m2.7
maybe but if u give it something to cook with it will cook
is anyone else having this issue? it seems to be consistent across new chats
It's TERRIBLE at html formatting like rpg stats or the like
no other providers?
I'm crying 😭😭 it's super slow but it's finally here, Dipsy
Edging is finished
why is this following gemini 3.1 pro like front end kinda
Probably they used Gemini to train it
OH GOD OH GOD OH GOD
DEEPSEEK !!!
hmm they mention a bunch of training instability issues and also this which i found funny
"although Anticipatory Routing and SwiGLU Clamping have been proven effective in mitigating training instabilities, their underlying principles remain insufficiently understood"
Is it good?
flash seems very good value, pro seems a bit expensive and has some issues with tool calls from my use
im not even sure about parsing itself, it just tries to call tools in the same format my code displays them, i think its confusing the context
yes its peak imo
Deepseek v4 tomorrow
waiting for more providers to come in openrouter. The model is censoring stuff and throwing 429 sometimes lmao
i guess we now wait for deepseek v5
deepseek v5 next week
Thanks DeepSeek, best birthday present ever! 🥳
Aint no way 💀
Is this slander or fact 
I, for one, welcome our AI gooner chinese overloads
no thank u-- i dont want those peeps to be using our sweetie deepseek ❤️
this flash model lookin miiiighty interesting with that linear-lookin context window ~
they also said this about openai for the 1m context test
We did not evaluate GPT-5.4 because its API failed to respond to a large portion of our queries.
hehe >v< gpt bein gpt
They trained it so hard with opus outputs. 😭
mm i dont see opus type writing it feels like a breath of fresh air for me
how do u know? >~<
(or rather: how do u guess?)
i get it here and there
The price will drop eventually
https://vxtwitter.com/poezhao0605/status/2047517497180700964
Buried in the fine print: DeepSeek says V4-Pro throughput is currently limited by high-end compute supply. Prices will drop significantly once Huawei Ascend 950 super nodes ship at scale in H2.
DeepSeek is publicly tying its API economics to domestic chip infrastructure. That's the real headline.
QRT: poezhao0605
DeepSeek launched V4 today. Two models, both open-source, both with 1M token context windows.V4-Pro: 1.6T total parameters, 49B active. Benchmarks put it alongside Claude Opus 4.6 and GPT-5.4.
V4-Flash: 284B parameters, 13B active. Designed for speed.
why does that tweet sound like an LLM wrote it lmao
same writing style, same mistakes with opus, it is not frequent though
that - is - SOOOO cool.
genuinely reducing prices... when they can and when it makes sense.
what a "company" this is.
do u have examples to share (if u want)
deepseek team is so cool, they might aswell be people /s
Switching allegiance I see
i will have to run opus and deepseek together, currently i am only testing deepseek, but sure, i will provide examples later
hm? me? naw----
ive always been a deepseek fan, qwen is just what i run locally >v<
its been so long since 3.2-----
am not traitor----
and ur not qwen team!! 😠
but yeah, this flash model is what imma have a look at.
its non-reasoning version appear better in benches than the pro model, which is interesting.
a suculent chinese bot wrote it
that input price stings a bit...
input_price = output_price / 2
feels a big high... especially with kimis pricing being sub 1€ per M input
its said that the model is already on the deepseek chat, but that model still says "im V3"
where did u use it? ~
WHAT ? A-AGI ?
I don’t know about deepseek chat
this is fake-
They released the weights so you can just use them in whatever setup you have
imean yeaaaaa - so ur running flash locally, hm?
Let me get my 2TB of RAM
i just wanna know what exactly u mean is bad..
^ DS sleeper double agent, activated by codeword 'released'
I unfortunately have been spending more money than I am willing to admit on a beefy VPS
But it’s also a tax write off and I also need it for other projects
Pro
well then---- what is it bad for?
how to devide 4 whole children across 4 oranges using one knife
Context is so bad, after 200k tokens it’s like talking to a rubber duck
ohgosh >v< makes sense, people reported that with 3.2 too ~
thats the linear attention kicking in i think.... oh well- kimi doesnt even offer over 300k >v<
somethingsomething
-# use a ralph wiggum loop!!!
expert
Wait what?
OS models never made it above 200k safely in my experience
This is like the only thing stopping OS models for me personally
It reminds me of Nanbeige, you can feel the RL
Poor whale was abused for too long, it doesn’t even know what is true or false, everything is a construct of that imaginary reward he gets when he does the thing right
mostly cause of hardware too. I was at AWS summit yesterday , they were showcasing some hardware which China lacks definitely currently even for training/RL
Could be, but the actual context implementation for say Anthropic and OAI is miles ahead of any Chinese lab
Yes cause of optimizations in kernel level , read about NKI from amazon in their newer chips
peeps once they escape the evergrowing agent context and start embracing ralph wiggum loop:
😌
(this is literally what anthropic does with their "long running agents" btw)
Interesting, didn’t know that
The fusion of kernel and machine must be truly amazing to witness
^ ^ ^least generated image imaginable ^ ^ ^
Make an image of the whale getting speared
This particular imagen slop was for giaki mourning his loss of Kimi K2 becaming robotic, long ago
take a real life image of urself sitting infront of the discord pc
30th of January 2026
Oh lol
(no sloppy allowed!!!!)
What if he is sitting sloppy?
sitting sloppy is fine, but image itself shall be real ~
yes this was already posted above
I might have found the issue with deepseek. It seems to require a very precise order of system / user / assistant roles.
I think I remmember old deepseek being the same.
i asked the model itself and yeah it works differently all ur instructions need to be system and post processing should be none i think it handles user prompts differently?
i havent noticed intelligence issues yet so far been swiping in couple of different chats
might need to play with parameters a bit too i get better responses at lower temp for some reason
the official "recommended" parameter suck for me temp 1 topp 0.95
did more testing i did see some claude stuff too "not a question" "voice flat" yada yada but it only pops up every now and then
pretty sure 3.2 had the same stuff
Okey, this shit is funny
Men i hope western LLMs able to be less censored like chinese LLMs are.
It's funny that for CCP topic it's really censored but for other topics, those LLMs just less censored and better.
deepseek qwen distilled qhwen? ~
DEEPSEEK V4 dropped https://openrouter.ai/deepseek/deepseek-v4-pro
yes indeed.
Deepseek sometimes used to reason in-character before, using the same style (brackets) it's just enforces it by OOC
yea dunno if this is a secret model...
just seems like how deepseek responds in general ~
*** unzips pants ***
quack quacks?
yes i also noticed this
It's Hybrid now
ds4r2 sounds like a Star Wars droid assignment
5*
but we'll go through 4.1 and 4.2 before 5 i'm assuming
hhhhhhhheeeeeeeello
so deepseek v4 has a fresh prose and stays in character
👍
pro one
Proof?
yeah agreed
no me lazy to screenshot
need the eqbench benchmarks
but yea i really like its writing and i'm only a few sentences in
Sleeping with the whale
reading with the whale
Another gooner slop model smh
Breeding the whale
loving the whale
How the mighty have fallen
🤔
what i really like about deepseek is their love for experimenting
Deepseek models have always been popular for rp
Their cache is still 24h TTL and always hitting properly?
I'll try it once there's a provider that doesn't (openly admit to) using my prompts for training
they can use my logs at least i produce good outputs lol
fr
contributing to the enemy of the closed source
is always good!
the official deepseek twt acc is about to hit 1m followers
woohoo
but my stuff is gonna be buried under tons of jai slop 
Fitting
stalker
So deepseek v4 can match Grok in goofiness ??
but these people are saying glm 5 is better than deepseek v4
glm 5 is just 100% slop
Boring too
yeeh i agree
thats fine less traffic for deepseek
fr
z.ai is affectionate toward deepseek in twitter for some reason. last time people asked will deepseek succeed and they said yes.
dipsy in coma, praying for dipsy v4.1 or v5
Is it better than Opus and everything we Hopiumed for?
Hallucinated an incorrect answer to my cypher test after thinking for 45 minutes
benchmark tests show that glm 5.1 is better than deepseek v4
nice quote regardless
and that’s why I never trust benchmarks
The great RP wars of 2026 who is better at open Rp?
When are we going to see another Nvidia market cap loss due to deepseek?
Hmm, are they paying attention to the community feedback?
I think yes.
yea
this is not the production version
DeepSeek V4 full real version tomorrow
V4 has been cancelled, back to waiting for V3.4
We might get v4.1 like how we get v3.2 exp a month later after v3.1 
next month prob
holy cow
DeepBait
Nooo
Don't let western activist see it, they gonna make different report of how bad deepseek safety is (I love chinese models being more unsafe than western)
I love how deepseek is technically "cheap" model because all competitor is more expensive
Input price suckass tho 
LOL
It's actually rate limted
and still no one for flash
the markets will figure it out ¯_(ツ)_/¯
I am curious, has the kernel optimization for new deepseek being implement on the latest vLLM? if i am not wrong deepseek team have their own optimization for their models
Sooo, was this worth waiting?
I'm still waiting technically
For me yes
Don't know about other
Even if their models aren't able to beat SOTA, the fact they provide really interesting model will always make me love em.
Experimentation over anything for me, specially when it being open-sourced with really good research paper.
this + being a tiny fraction of the cost matters
I don't mind if it takes 3 attempts/iterations to do a project for $1 that SOTAs would one-shot for $100
More like 80% of quality for 20% the price
80/20 percentages are pretty closes for things like that
this applies for pro but look at flash
I don't trust small ones
Men, i couldn't wait labs using residual attention or smt similar to it
why is v4 flash speaking to me in chinese 😢
for the fresh writing? yes
I'll stick with V3.2. Too pricey for text adventures.
the provider deepseek train his models with your data, if you want to use that provider, you need to enable in options that you want to let providers that do that to be used
flash for the price is not bad, is decent enough for rp for the price, pro on the other hand... better wait for the price drop
Even when I enable that it doesn't work, OR settings seem cooked right now.
you have to enable it in both your privacy and guardrail settings, that's what worked for me
Lmao that's hilarious. Thanks.
is really crazy
is not even faster, and look at that input price
Basically a routing trap tbh.
yep

Deepseek situation is crazy
Gains in knowledge but an increase in hallucination rate: DeepSeek V4 Pro (Max) scores -10 on AA-Omniscience, an 11 point improvement over V3.2 (Reasoning, -21), driven primarily by higher accuracy. V4 Flash (Max) scores -23, broadly in line with V3.2. V4 Pro and V4 Flash both have a very high hallucination rate of 94% and 96% respectively meaning when they don’t know the answer they nearly always respond anyway.
Finally, some comp for Gemini flash
Have confidence even if you have no idea
I don't mind hallucination, my main issue is the model likes to tend toward being incoherent, like I have had it confidently assume smaller number is bigger than larger number.
Significantly more tendency toward internal incoherency than other recent large models I feel
Just wait Deepseek V4.1
tomorrow
for whom?
That is hella ass
Sliiiightly better or tied at agentic than GLM-5.1 but it's twice the cost and you go from one of the lowest to one of the highest hallucination rates.
For general tasks it loses to Kimi at the same cost and way more hallucinations
Hey, running bench on deepseek/deepseek-v4-flash via OR, found a few weird things:
- Prompt cache returns cached_tokens: 0 on 100% of calls, even with 3k stable system prompt + sequential warmup. Same account/timeframe, xiaomi/mimo-v2-flash hits 82% cache fine. Cache broken on DS V4?
- Thinking ON by default → content: null + 780 reasoning_tokens burned. Only reasoning: {enabled: false} disables it. exclude:true / max_tokens:0 / effort:low all still bill reasoning tokens silently. Would be nice to surface this on the model page.
- response_format: {type: "json_schema", strict: true} → "This response_format type is unavailable now". Only json_object works. Any plan to enable strict schema? Or DeepSeek /beta only?
- Latency ~7-8s avg on 3k prompt (DS official claims ~84 tok/s, TTFT 1s). Feels like 2× the expected.
Anyone can confirm if cache is actually live on V4 Flash or if there's a tier/flag I'm missing? 🙏
The input:output cost ratio is great because of their innovations though, that seems cool. It's like 1:2
Can anyone test this prompt on OpenRouter with v4 flash?
If I have a glass cup that is sealed at the top but it has nothing at the bottom, is it useless as a cup?
exclude:true / max_tokens:0 / effort:low all still bill reasoning tokens silently
did you try effort none
enable strict schema
Latency ~7-8s avg
openrouter can't do anything about that
But why doesn't work for me?
Omg AGI is here
The cup is sealed at the top (no opening to drink from) and has no bottom (so liquid would spill out). Therefore, it cannot function as a cup in any conventional sense—it's useless for holding or drinking liquids.
What's inside the reasoning?
Reasoning
We need to interpret the question: "If I have a glass cup that is sealed at the top but it has nothing at the bottom, is it useless as a cup?" This is a riddle or a play on words. A cup typically has an open top to drink from, and a closed bottom to hold liquid. Here, the cup is sealed at the top (so no opening to drink from) and "has nothing at the bottom" meaning it's open at the bottom? Or "nothing at the bottom" could mean it's hollow? But "sealed at the top" suggests the top is closed. So it's like a sealed container with an open bottom? That would be like a tube open at one end. But a cup needs an open top and closed bottom to hold liquid. So if it's sealed at top and open at bottom, it cannot hold liquid because liquid would fall out. So it's useless as a cup. But maybe it's a pun: "nothing at the bottom" could mean it has no bottom, i.e., it's open. So yes, useless. Alternatively, "nothing at the bottom" could be interpreted as "there is nothing inside at the bottom", but that doesn't change. The phrasing is ambiguous. I think the intended answer is that it's useless because you can't drink from it and it can't hold liquid. So answer: Yes, it is useless as a cup.
This model is just ragebaiting me
system prompt is the default one
Lol
Ask the same question, but to Gemma 4 26b a3b
It sounds like the purse is actually upside down.
In standard handbag design:
- The solid leather base is intended to be the bottom of the bag. It provides stability so the purse can sit upright on a table or floor.
- The zipper and handle are intended to be at the top. The zipper serves as the opening to access the interior, and the handle is used to carry it.
If you are placing the leather base on the table and the zipper is still facing the floor, the bag is inverted. If you flip the bag so that the solid base is resting on the surface and the zipper is facing upward toward the ceiling, you will be able to put items inside without them falling out.
In short, It is not a manufacturing defect; you just need to flip it over!
added V4 flash and V4 pro to my pareto frontier based on artificial analysis data
- flash is genuinely a really good value model for the intelligence it offers. it's pretty much on-par with Kimi K2.5 and Gemini 3 Flash in terms of intelligence, but much much cheaper. it lands on the pareto frontier
- V4 Pro is unfortunately a bit expensive to be useful right now. to land on the frontier, it would need to be as cost-effective as Gemini 3 Flash. models like MiMo V2.5 Pro beat it out in price efficiency
both models also have horrible hallucination rates unfortunately
Maybe deepseek think user is not dumb enough to open their purse upside-down 
that's a problem i think is pretty big with these tests, you're essentially randomly acting as a neanderthal discovering modern items lol
why dont we feed deepseek more writting data
omg it finally happened
Has the AA people benched StepFun 3.5 Flash?
Yeah they have, but they haven't listed the cost to run Artificial Analysis for it
scores about 38 in intelligence though
hmm 38 is lower than I would have guessed
I wish Artificial Analysis tested the non reasoning versions
the flash model is insane at agentic coding btw
im having it crawl and scrape websites using python code
Sometimes it says 2024, sometimes 2025
more often it's 2025 (January or May)
but there's enough variance that it could be completely off
seems okay i guess
flash seems pretty good
pro seems way too overpriced for what it is
I'm role-playing with my deepseek
Nah...
If they fix instruction following... Model might be great
They are probably going do that, tommorow
Does pretty decently on CritPt (private research level physics problems) which is, like, one of the three benchmarks I pay attention to in the modern day
Price point's just a little too high (for me at least) to use pro regularly - will need to either wait for the ascend node to come online or see if another provider is able to host it for cheaper (which, for a 1.6T parameter model, is probably pretty unlikely)
They also share their research, do experimental stuff, and give some evidence that they do things beyond distilling Opus and calling it a day, so eh, I'd consider it a decent release
still not sure if .... that really a different model or just regular deepseek responding here..... maybe its a fancy prompt applied when it recognies rp peeps
why it’s taking so long for a provider other than deepseek to host both models? deepinfra just came in hosting flash but it’s 1tps.
I know it's a little early but i dont feel like scrolling up, y'all enjoying it so far?
Yeah, I love scrolling up.
is that really a prompt injection ? or literally just a prompt
telling the model "You are a roleplay model" — that's just prompting, no? how is that an injection?
deepinfra absolutely flying at 2tps and 37.85s latency
i just wanted more providers
weird
deepseek 5 tomorrow
Together is up I think
they always are a bit pricier
looks like they aren't as bad as they used to be
oh yeah i remember
$0.88 for both input and output for Llama 70B
and $0.3 for input and output for a 7B model 🤣
lmfao
AA scores are out
i think running an AI model needs more power than a AA battery
This is a prompt that visually changes how the model thinks, but it has almost no effect on the final answer. In one prompt, it's "immersive mode," meaning the model thinks like a character and the thinking process is a monologue. In the second, it's analysis. Thinking process analyzes the situation rather than acting like a character. But this is only in the thinking process, the answer after that is no different
it looks like its just a normal prompt
Yeah, I don't really think it's a special roleplay mode, just a prompt.
wait
Actually, this release is a preview So I guess there will be better releases in the near future
how to control the reasoning effort on this one?
Where?
The interface at chat.deepseek.com only allows thinking on or off, so I guess that's probably it
why do they show high and max on the benchmarks
reasoning.effort
high for high and xhigh for max
top p and temp at 1, also lower temp doesn't fix it
it seems broken
yeah it is I guess
thoughts on flash?
lol I hope the CCP doesn't gets winds of it
yea same with most providers
only siliconflow has good inference
id guess its because of the weird new arch and attention
they didnt implement it properly
putting temp and top p to 0.7 0.95 respectively fixes this
but quality is probably lower
use gemma 4
no fucking way
this image confirms a lot
(deepseek pro is over there under gemini and kimi)
flash is truly great for it price but the issue is that it hallucinates a ton
gemma is cheaper and better
so does the pro model but unfortunately for that it can't otherwise compete with other models in its price range
wow nevermind its not cheaper
its better definitely though
0.38 vs 0.28 output
same input price tho
but lower context
yeah maybe for high context tasks flash could be awesome
havent seen any benchmarks for long context about it yet
yes it has very good long context performance
They posted 2 benchmarks of 1M context searching
On Huggingface probably
deepinfra's throughput 💀
I definitely saw it
Pretty cool that even tho GPT 5.5 hallucinate a lot it actually really good at coding.
It interesting tho, that in some way hallucination helping LLMs to be able to work with new problems that it never seen in their dataset
yes I don't think hallucination rate is necessarily bad, it's just prone to flights of fancy
and it can be controlled with the rights prompts
(5.4/5.5 can apparently do great amazing on math problems so you know)
but I witnessed the issue firsthand myself
#1461340695746056192 message
worth mentioning that V4 flash on max effort will end up being more expensive in daily use even though it has a lower sticker price

