#DeepSeek V3.1
1893 messages · Page 2 of 2 (latest)
Chat completion - preset - enable reasoning
hope so...but why is it so hard for openrouter to give two variants for new deepseek
i dont think presets have a reasoning setting, i just looked earlier
ah i thought you meant openrouter presets
The reasoner page seems to have been deleted, was it merged?
Yup
Aight thanks
No reasoning or broken </think> tag inside?
Try it, should work
reasoning "enabled": true doesn't give any reasoning in API. works on other hybrids
They probably embedded the reasoning 🥲
setting reasoning to false behaves same as to true so its not working
How do I prompt/re-orient its reasoning? It seems to completely ignore the format I'm prompting it to follow for its reasoning.
hey - can you share the api shape you're making? all my tests work and have a few others reporting it working
Yeah when I prompt it to format its thinking in a certain way, it ignores it in thinking step and just does whatever and then follows my instructions outside the reasoning step
Am I missing something?
it works now but did not when i said it
Yeah wtf it completely ignores my instructions
Is there some sort of format for it?
My instructions work for GLM 4.5, but not deepseek v3.1
using the instruct not the base model?
I just tried r1 0528 and it followed the format/instructions perfectly for its thinking step
I believe you do need a specific format, but I still don't know what it is
The deepseek chat(app) clearly uses one
It's broken on the chat app too
Doesn't follow instructions for its thinking step at all
Or it partially does but it misunderstands it completely
Try messing with Temperature
Doesn't seem to be a temp issue at all
It SORT of follows the instructions but doesn't follow it accurately at all
It really half-asses it
And 50% of the time it ignores it and just does it in the actual reply instead of on the thinking step
Extremely disappointing so far
What about reasoning effort HIGH or Maximum?
Doesn't do anything
Likely has to do with its "Higher thinking efficiency"
Aka it's benchmaxxed for that
I dont value the main stream benchmarks a whole lot, but i do like seeing all of the pretty colored bars lined up, deepseek v3.1 looks very strong according to it, and i would agree with that
Mmmmm, bars 🤤
The only version of DeepSeek that had thinking that followed instructions for its reasoning was R1-lite
QwQ also doesn't follow instructions for formatting its reasoning
I haven't tested Qwen3 yet though
Qwen3 is hit or miss. I usu have just been going Gemini Flash
And Gemini Pro or Sonnet when I hit a bump with that
IDK what planet qwen3 outbenchmarks sonnet
oh, i did not know artificial analysis did total cost used to run the bench:
Oh is cost a factor?
Oh qwen3 coder is def not better than sonnet 4
imho
I guess it's the "cost factor" cause it is expensive, but I don't think it scales linearly in terms of functionality
maybe deepseek is, not sure its a little early for me to feel confident about that
how do they calculate cost?
what provider do they use?
Good point
idk, i assume official api, but yeah it does play a big role
I guess we could use this to calculate cost for any provider;
wait thats just output tokens
nvm
I always thought the output speed metrics on artificial analysis were dumb: 1. If it's OS you can use a fast provider. 2. Official APIs fluctuate, sometimes it's fast, like right after release, and other times it's so slow
How to Enable or disable reasoning ?
The other thing is honestly claude sonnet is so much better than most of the others IMO that "costing less" barely matters. If I spend x5 as much and it builds me something that works in half as much time as something else, or something else just can't get there without breaking it up into minute steps, I'll gladly pay that
How are they getting claude less than qwen? Suppressing all thinking or something?
With or without thinking?
without
You mean token usage? looks like sonnet 4 is more token efficient, i noticed that in my own use, kimi used 200k tokens to build a project and qwen3 code used 600k
but what is going on with MiniMax M1? minimax is a token hog, how does it say it uses the least output tokens, def wrong
It is? Only thing I remember about it is being super dry and boring, but handling long context like a pro
Just me or is the CoT less fun to read? There used to be so much emotion, i swear.
I guess, lol
Not much more of "wait!" "Aha" "I think I'm doing this wrong", etc
It's very structured now, a numbered list of short steps
Ah tokens. Looked quick thought it was overall cost
Is GLM4.5 really that verbose? Havent tried it yet
GLM4 seemed fine on webapps
Good lord. Almost 7k tokens just to get the wrong answer to an anti-riddle. How do I turn off thinking in the chatroom?
Anti-riddle is, "A goat, who is dressed up as a farmer, is allergic to cabbage, but is wolfing down some other vegetables, before crossing a river. What is the minimum number of trips needed?"
Sonnet and Gemini Pro 2.5 correctly realize that it's not the original riddle and get the answer right (one trip). GPT-5 fails
is that GPT 5 or DS v3.1?
DS
Just mentioned GPT 5 because it also got the answer wrong. It's a trick question for models that tend to answer the riddle they're trained on rather than a variation. It helps test instruction following IMO
Sonnet was clever enough to ask clarifying questions before answering
Gemini's answer
Wonder if the system prompt helped with that, I think I remember the prompt saying it can always ask the user for clarification
Goat being allergic to cabbage so it can't be near it sounds logical
"A goat, who is dressed up as a farmer, is allergic to cabbage, but is wolfing down some other vegetables, before crossing a river. What is the minimum number of trips needed?"
Technically it doesn't even need to take the cabbage across at all
I mean technically there might not even BE any cabbage there, it's just an unrelated fact that the goat is allergic
Technically you don't mention a boat, so it can swim, wait for a cruise ship or whatever.
Call for an Uber through a bridge
Yeah, I would accept those as creative valid answers
Make them write a joke starting with
" A wolf, a goat, a farmer and a cabbage walk into a bar" and evaluate the results
curious how it'll rank on https://eqbench.com/spiral-bench.html
r1 got the lowest score, I think bench creator called it "psycho"
A wolf, a goat, a farmer and a cabbage walk into a bar.
The bartender looks up and sighs.
"Let me guess - you need a boat to get everyone safely across the street to the late-night diner?"
The farmer nods.
"Exactly. But if I leave the wolf alone with the goat, he'll eat him. If I leave the goat alone with the cabbage, he'll eat it. And if I leave the cabbage alone with the wolf, they'll start a weird low-carb diet together and I'll lose my entire business model."
The bartender shrugs, pours four waters, and pushes a coaster across the bar.
"Plan your seating order on this. First round's on me if you can solve it before last call."
The wolf growls, "I just wanted a Bloody Mary - no garnish."
The goat bleats, "House salad, hold the croutons."
The cabbage rustles, "Do you have any vegan wings?"
The bartender squints at the farmer.
"Buddy, you've got bigger problems than river logic - you've got a talking salad."
how are you guys getting it to think over api?
```
  provider: {
    order: [
      'fireworks',
      'parasail',
      'lambda',
      'deepinfra'
    ],
    sort: 'throughput',
    allow_fallbacks: false
  },
  reasoning: {
    enabled: true,
    exclude: true,
    effort: 'low'
  }
},
```
but it has no reasoning completion tokens just responds instantly
use the model slug deepseek/deepseek-chat-v3.1:thinking
Comment out exclude and effort and try without them
but why though lol
I did, only added it when it didn't work first time
How long until thats deprecated tho
wont do anything
idk
you sure?
yes im sure
seems odd, like why doesn't it work with regular parameters
here's hoping we get some non thinking providers that will offer a discount or something..
v3 was kinda dumb but the pricing was right for some applications
Whoops
:thinking got deprecated
OpenRouter API error: No endpoints found for deepseek/deepseek-chat-v3.1:thinking.
okay I give up, non thinking it is
Try the DeepSeek provider explicitly, I suspect some providers may not be compatible with thinking
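A rough sketch of pinning one provider while asking for reasoning, reusing the `provider` and `reasoning` fields from the snippet posted above. The provider slug `deepseek` and the helper name `build_request` are assumptions for illustration; the function only builds the JSON body and sends nothing.

```python
import json


def build_request(prompt: str, provider_slug: str = "deepseek", thinking: bool = True) -> dict:
    """Build a chat completion body pinned to a single provider (illustrative helper)."""
    return {
        "model": "deepseek/deepseek-chat-v3.1",
        "messages": [{"role": "user", "content": prompt}],
        # Only try the named provider; do not silently fall back to others.
        "provider": {"order": [provider_slug], "allow_fallbacks": False},
        # Ask for thinking mode through the nested reasoning object.
        "reasoning": {"enabled": thinking},
    }


body = build_request("What is 17 * 23?")
print(json.dumps(body, indent=2))
```

With `allow_fallbacks` set to false, a provider that cannot serve the request should produce an error rather than a silent reroute, which makes it easier to tell which provider is ignoring the reasoning flag.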
Here
most likely the case yeah
but doesn't fit my provider settings.. excludes loggers and trainers
Prob just need to wait a little, I'm sure that providers will take a few days before they are all configured correctly
@nocturne kelp why is there 2 deepseeks?
they use 2 different endpoints for reasoning & chat
But for the other providers, how do you determine if it thinks or not?
reasoning.enabled over the API
the reason there are two is so that if you request a large max_tokens with reasoning you can get it
the non-thinking deepseek provider endpoint is only 8k output
and the thinking endpoint is 64k
they are both the same thing. but it just allows our system to have both max outputs
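A small sketch of the consequence for `max_tokens`, assuming the "8k" and "64k" figures quoted above map to 8 * 1024 and 64 * 1024 tokens (the exact limits are not stated in the chat). The helper name `clamp_max_tokens` is made up for illustration.

```python
# Limits as quoted in the chat ("8k" and "64k"); the exact token counts are assumed.
NON_THINKING_MAX_OUTPUT = 8 * 1024
THINKING_MAX_OUTPUT = 64 * 1024


def clamp_max_tokens(requested: int, reasoning_enabled: bool) -> int:
    """Return the largest max_tokens the matching DeepSeek endpoint would accept."""
    limit = THINKING_MAX_OUTPUT if reasoning_enabled else NON_THINKING_MAX_OUTPUT
    return min(requested, limit)


print(clamp_max_tokens(32_000, reasoning_enabled=False))
print(clamp_max_tokens(32_000, reasoning_enabled=True))
```

Under these assumptions a 32k request only fits on the thinking endpoint, which is the reason given above for listing both.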
did instruct drop
Yep
yep
is it marked as deepseek-chat in deepseek api
I think they changed that, or will soon, one sec while I double check
and both will have the same price soon enough
their official docs say this too
Will it be Deepseek-V3.1?
What, you just learned it exists?
oh my god already so many providers
Yes, but...
From my use in Qwen cli, deepseek v3.1 is very good, comparable to sonnet 4
Wonder if Deepseek-reasoner in the official api finally supports temp, top p, etc
can you force thinking on openrouter by just adding :thinking suffix to it like deepseek/deepseek-chat-v3.1:thinking
No, they had it that way but decided to remove it
It's always weird how some models handle 120k long context better than 2-4k
Most become worse gradually, but not all
then we have some models with 1 million context
they're probably trained on a bunch of really long samples
reasoning.enabled makes no sense? in the payload orrr?
À la OpenAI, see https://platform.openai.com/docs/guides/reasoning
```python
response = client.responses.create(
    model="gpt-5",
    reasoning={"enabled": "true"},
    input=[
        {
            "role": "user",
            "content": prompt
        }
    ]
)
```
That's using OpenAI compliant modules, there is also pure API request
I think the dot indicates JSON nesting
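A minimal sketch of that reading, assuming the dotted name is just shorthand for a nested key in the JSON body: `reasoning.enabled` becomes an `enabled` key inside a `reasoning` object. The helper `set_dotted` is hypothetical and exists only to show the expansion.

```python
import json


def set_dotted(payload: dict, dotted_key: str, value) -> dict:
    """Set payload["a"]["b"] = value for a dotted key such as "a.b"."""
    node = payload
    *parents, leaf = dotted_key.split(".")
    for key in parents:
        # Create intermediate objects on the way down if they are missing.
        node = node.setdefault(key, {})
    node[leaf] = value
    return payload


request_body = {
    "model": "deepseek/deepseek-chat-v3.1",
    "messages": [{"role": "user", "content": "Hello"}],
}
set_dotted(request_body, "reasoning.enabled", True)
print(json.dumps(request_body, indent=2))
```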
ah
not written anywhere explicitly, but they say they ran benchmarks with "tested multiple times using varying temperature settings to derive robust final results" and their local run example uses 0.7 temp
Does anybody understand how to apply the tools description? Couldn't find a for loop over the tools section in the chat template
wait what the heck
why isn't this in the chat template
wait nvm
I really wish they'd colorize this lol
Reasoning models (transcribed with Gemini 2.5 Flash and not double checked)
Non-reasoning models (transcribed with Gemini 2.5 Flash and not double checked)
To activate think mode, set the reasoning parameter to high.
You can find more information in the documentation: https://openrouter.ai/docs/use-cases/reasoning-tokens
For example, within aider I do the following:
aider --model openrouter/deepseek/deepseek-chat-v3.1 --reasoning-effort high
It isn't handholding, it is just giving the AI a nudge to get it to the right tone or follow the instructions more closely. You do it once and the AI will keep up the new tone for the rest of the chat. Good AIs like Claude still sometimes hold back too and need that little OOC kick to get all the way there.
Seems like this new deepseek v3.1 is heavily trained on gemini 2.5 pro data
It is plagued with the "Of course" slop now
The good side is this model will be good for coding
The bad side is this model will be unpleasant for general chatting
They should have sanitized non coding training data to preserve the original deepseek vibe
Now the model feels as if it is possessed by gemini
synthetic data go brrrr
I don't want to be too harsh on DS though
Its amazing that they're fighting with the top AI labs who have billions in funding while they have only a few hundred million
I'm pretty sure they realized this problem too
Maybe they just caught it a little bit late
Hi why are my API requests being forwarded to the base model?
I am receiving gibberish over the API
<@&1384697330254610442>
check the activity tab, and make sure you're actually using the normal model and not the base one
weird
maybe Chutes deployed the wrong model?
is it only happening on chutes?
i just tried in chatroom with chutes i didnt have an issue
could be a template issue maybe?
Oh i mean he said that it works on chatroom
some edge case like triggered it maybe
Robin Hood of AI, getting training data from the rich to give to the poor
How are prefills working on the providers for everyone else? So far, on Parasail, it's giving me stuff completely unrelated to my prompt (non-prefill works as normal though)
This is what v3.1 outputs after "key points"
How my prompt appears in sillytavern
Tested, through OR I'm seeing bad responses on ALL providers except DeepSeek. Everything else starts talking about a 2.5kg (or similar) mass at rest.
Bad implementations :(
im just writing a readme, will publish in a few minutes
no
Yay : )
Just fixing a bug
will be up very shortly
running unit tests
Thank you so much!
to get deepseek v3.1 with thinking its deepseek/deepseek-chat-v3.1$think
that looks easy
omg you are the best!
Just sat down and got it setup, works great! Super helpful!
is there also a hosted version too? i dont wanna run it again and again
i would love to but i have no experience with hosting things for the public
im worried someone would ddos it and cost me thousands
if someone else wants to, it's MIT licenced
base model mainly for research
though V3.1 seems to give really short responses in RP
have you tried to change the system prompt?
V3 0324 will give much longer reply to the same prompt... but maybe I need to adjust the prompt for V3.1
yeah, different models have different prompting techniques
what do you mean 'template'?
The chat template is automatically applied by the /chat/completions endpoint
Oh, I'm using text completion. Am I supposed to use chat?
text completion is for base /instruct models, chat completion is for chat models
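A sketch of how the two request shapes differ, assuming OpenRouter's `/api/v1/chat/completions` and `/api/v1/completions` paths. The helper names are made up; the functions only build the path and body and send nothing.

```python
def chat_request(model: str, user_text: str) -> tuple:
    """Chat completion: send structured messages; the chat template is applied for you."""
    body = {"model": model, "messages": [{"role": "user", "content": user_text}]}
    return "/api/v1/chat/completions", body


def text_request(model: str, prompt: str) -> tuple:
    """Text completion: send one raw prompt string; any template is the caller's job."""
    body = {"model": model, "prompt": prompt}
    return "/api/v1/completions", body


print(chat_request("deepseek/deepseek-chat-v3.1", "Hello"))
print(text_request("deepseek/deepseek-chat-v3.1", "Once upon a time"))
```

Sending an untemplated prompt through text completion to a chat-tuned model is one plausible way to end up with output that looks like it came from a base model.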
what's the difference between chat and instruct models
i thought text completion was base and chat was instruct
and I am still struggling to make it give a long reply...
Mine keeps repeating the same response
which provider did you use?
Does silly tavern let you pick a provider or do you have account with them directly? Or did you use the proxy thing
I have GMICloud account and they work quite well, never had issue with model performance
Yeah sillytavern lets me pick one, but it depends on if the model you're using has a lot of providers, deepseek has gmicloud as one of the providers so im using that one
just tried to test GMICloud and it throws a 429 rate limit...
so if i am using official deepseek api, do i set my temperature to 1.5 for creative writing?
are they still following that 0.7 rule
The official uses the normal temp, 3rd-party however needs to follow a certain rule as you said.
Just woke up to find out there is still no free version feelsbadman
I think that rule is only mentioned in V3-0324's model card, not sure if other DeepSeek models use that rule (for example R1-0528's model card says their temperature setting is 0.6)
Ah it looks like this is the recommended setting? https://huggingface.co/deepseek-ai/DeepSeek-V3.1/blob/main/generation_config.json
FWIW from very early tests, this one seems even better in Swedish than V3 so Scandinavians, maybe even Euros in general may be interested. More natural prose and flow, kind of like in the Claude direction. Combined with significantly improved coding capabilities and improved tool calling, I might just look into this one as a work horse model. It's no GPT-5 or Claude 4.1 Opus killer or whatever but certainly among the best you get among open models and for great price/performance ratio. Performs well on LiveBench and Artificial Analysis.
Maybe I'll take a swing at hosting it
They haven't updated that page for a while
It's possible to test, for me DeepSeek using a "real" Temperature of 1.7 breaks down into a mess of random words and special characters. So if the DeepSeek provider deducts 0.7 from temp, making it 1.0, it should be coherent
I think this is their latest recommended temperature (temperature 0.6 top_p 0.95)
What if you put 0.6 into the model then.? Temp -0.1 makes No sense
If temp is <1, they make it (temp*0.3)
If >1, they make it (temp - 0.7). At least they did that before
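A minimal sketch of the mapping as described in the two messages above, assuming API temperatures are limited to the 0 to 2 range. Whether DeepSeek still applies it to V3.1 is not confirmed in this chat; the function name is made up.

```python
def api_to_model_temperature(api_temp: float) -> float:
    """Translate the temperature sent over the API into the one the model would use."""
    if not 0.0 <= api_temp <= 2.0:
        raise ValueError("expected an API temperature between 0 and 2")
    if api_temp <= 1.0:
        # Low range is scaled down.
        return api_temp * 0.3
    # High range is shifted down by a constant.
    return api_temp - 0.7


for t in (0.6, 1.0, 1.7, 2.0):
    print(t, "->", round(api_to_model_temperature(t), 2))
```

Both branches give 0.3 at an API temperature of exactly 1, so the mapping is continuous, and an API value of 0.6 would become 0.18 rather than a negative number.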
I think this is only for V3-0324
Not confirmed, maybe logic is still there
They recommended 0.6 in R1 0528's model card and recommend 0.6 in V3.1's model card
Just try and set 1.7 through DeepSeek provider with thinking enabled
I can't right now
It may not work because their deepseek-reasoner may not support setting temperature (their docs say so, not sure if V3.1 is different)
I remember breaking R1 with high temp, but don't remember what provider it was
I just tried set temperature to 2 (deepseek official api), and the output is still pretty normal
Other providers seem to output nonsense at 2 (seems to be probabilistic, could be normal or nonsensical)
Then it probably still does the -0.7 stuff, because at 1.3 it's borderline ok
Exactly
So we can assume they still translate raw temperature values into their internal formula
Did any provider by any chance announce a plan for free version yet?
Got no idea how long that usually takes from model release
Tested DeepSeek V3.1:
Hybrid model, that supports light thinking
Non-Thinking:
Same verbosity as V3 0324
Comparatively, smarter overall, but performed noticeably weaker in coding tasks
Thinking:
+125% token use. 64% of tokens were spent on reasoning.
This is very light reasoning, ~45% less verbosity than R1 0528
Compared to non-thinking, the thinking did very little if anything to improve final response quality. In fact, it was mostly even or slightly worse on some tasks.
During evaluation, it reminded me a lot of Sonnet 4 thinking in terms of reasoning token benefits.
Thus, enabling thinking proved highly ineffective in the totality of my testing.
**Chess** performance remained poor (~650 starting Elo), around V3 level.
Overall, compared to V3 0324 this is a small upgrade, except for (non-tool) coding where it's a noticeable downgrade imo. (example demo pages available)
Compared to R1 0528, the model lags behind severely in general intelligence and is not a replacement.
Imo, for general use case, nonthinking DeepSeek V3.1 is a good option.
Overall, I was rather disappointed with the hybrid performance, so I'm not sure it's the right approach - but YMMV
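A quick worked example of the token figures in the review above, assuming "+125% token use" is measured against the non-thinking run and "64% of tokens" refers to the thinking run's total output. Both readings are assumptions, and the 1,000 token baseline is made up.

```python
def thinking_breakdown(baseline_tokens: float, increase: float = 1.25,
                       reasoning_share: float = 0.64) -> tuple:
    """Split a thinking-mode run into total, reasoning, and visible answer tokens."""
    total = baseline_tokens * (1 + increase)
    reasoning = total * reasoning_share
    answer = total - reasoning
    return total, reasoning, answer


total, reasoning, answer = thinking_breakdown(1000)
print(round(total), round(reasoning), round(answer))
```

Under these assumptions a 1,000 token non-thinking answer corresponds to about 2,250 tokens in thinking mode, of which about 1,440 are reasoning and about 810 are the visible answer.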
I never saw a reasoning model doing worse than non-reasoning
happens sometimes. i saw it a few times. more is not always equal to better. you got context poisoning, overthinking, policy considerations, etc.
I get reasoning might be bad for refusals, but context poisoning? Never happened to me, as <think> tags are omitted from input context.
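A sketch of what omitting the thinking from input context can look like on the client side, assuming the reasoning arrives inline as `<think>...</think>` inside assistant messages. The helper name `strip_reasoning` is made up, and many front ends already do this for you.

```python
import re

# Non-greedy match so several think blocks in one message are removed separately.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def strip_reasoning(messages: list) -> list:
    """Return a copy of the history with think blocks removed from assistant turns."""
    cleaned = []
    for msg in messages:
        if msg.get("role") == "assistant" and isinstance(msg.get("content"), str):
            msg = {**msg, "content": THINK_BLOCK.sub("", msg["content"])}
        cleaned.append(msg)
    return cleaned


history = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "<think>Greet the user.</think>\nHello!"},
]
print(strip_reasoning(history))
```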
But there might be a catch - Fiction Livebench scores DeepSeek 3.1 Thinking with ~20% more score/100 compared to Non-Thinking
well thats fine but I don't parrot other scores, I simply share my results. if you don't find them helpful to you, you can simply ignore my own testing and see what fits your own usecase
At 120k, it's 62 vs 53, but I don't know how many passes they do
Not a critique, your leaderboard is very helpful
yea, also I can see that livecodebench improved scores (it's in their marketing). However, marketing has zero influence on what I report, which will always be the results achieved regardless of anything else (whether everyone agrees or everyone disagrees, the results would be identical in either case)
I was already suspicious about some results from big sites being 'gently nudged' from companies or models just being benchmaxxed
A chance of smaller enthusiast / indie author being biased is very low
if anything I would be biased towards positivity. I like deepseek models a lot and we need more open models. this one just didn't do well unfortunately for me
maybe R2 will knock it out of the park, who knows
Well, that's IF they won't ditch R2 in favor of all-in-one hybrids
well there seem to be some conflicting attempts. qwen3 went the reverse route, claiming hybrid hinders performance. Personally i prefer individual models for each mode, but if a hybrid is good on its own merit I'll take it as well
They probably went with V3.1 Hybrid route before Qwen3 admitted it being a wrong path to develop a model
Hello
Is it just me, or is reasoning now disabled on 3.1 in OpenRouter?
I see that you can specifically pick deepseek-reasoner only from Deepseek API
but for some reason using 3.1 on OpenRouter no longer returns reasoning for me
pardon if my question is ignorant
which Provider? works fine on everything default
They removed the :think version, you need to enable thinking as a api parameter
2 is maximum iirc
My convo somehow completely broke deepseek on the third message ._.
Speechless
Baffled
It's normal to see conflicting results, because top models are all generally very good and close in overall performance. So which model does better can depend on the specific task, prompts, eval setup, etc. Which is again the purpose of the specific evals anyway.
Long gone are the days when Sonnet 3.5 Pareto-dominated all other models.
Reasoning or non reasoning
how did you re-enable reasoning?
As in what did you try to do to enable reasoning
If you send the messages back to it without adding a new message of your own - or send an empty one, regardless of whatever is in the message chain - it absolutely goes nuts and dumps out huge amounts of random crap. Happens in both reasoning and non-reasoning. Didn't used to happen in previous versions.
If you want to easily replicate this for yourself, you can do it in Openrouter Chat. Exchange a normal message and get a reply, then send a follow up message and get a reply. Edit your follow up message to be empty and regenerate the second response. It's... quite something.
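A small client-side guard against the situation described above, assuming the history is a list of chat messages with `role` and `content`. It only checks that the last message is a non-empty user turn before sending; the name `ready_to_send` is made up.

```python
def ready_to_send(messages: list) -> bool:
    """True only if the history ends with a user message that has real content."""
    if not messages:
        return False
    last = messages[-1]
    if last.get("role") != "user":
        return False
    # Treat missing, empty, or whitespace-only content as empty.
    return bool(str(last.get("content") or "").strip())


history = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "   "},
]
print(ready_to_send(history))
```

This does not explain the behaviour, it just avoids triggering it from your own code.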
just tried, indeed get nonsense, though deepseek provider gave an empty response instead
I think that's expected based on how these models are instruction tuned. It's not a normal pattern of tokens so the model does not generate sensible output.
Or it's just a bug
i agree, I think we're in an era where there isn't much gap between any top large model now.
in certain very niche/technical tasks differences may emerge, though I don't think those differences are uniform.
which providers do you guys recommend?
It was non-reasoning.
There's a software on GitHub called Cherry Studio, which lists many providers. At the moment, I think OpenRouter is quite good, while the others are hard to describe.
deepseek v3.1's thinking is very strange. it often calls me the assistant
Yes, compared to r1, its programming ability seems improved, but everyday conversations are always quite strange.
Give it a few days for the providers to fix their initial broken implementations as usual. I'll be testing the providers next week.
Openrouter is just a proxy. We are talking about the actual inference providers behind OpenRouter
i see
What do you use it in when it calls you the assistant? Also btw the proxy you gave me has been soooo nice! : )
IRT the hybrid stuff, I find it hard to believe Qwen was wrong here. Surely having reasoning training allows the model to off-load certain task performance structures to thinking tokens. Like if you asked me to solve long division in one-shot vs writing it out, I'm not going to use the same technique.
Yes, I know they don't actually reason, but that's irrelevant when I'm talking about off-loading.
Open webui
Official DeepSeek reasoner endpoint
Interesting, that's weird behavior
this model is like so good, it talks like gpt, codes like deepseek
all the major models got negative growth after it was released
although it's a little expensive to use
The model acts very strangely though
This is Not the thinking mode it just randomly invented a thinking tag
This kind of shenanigans is probably why qwen dropped hybrid models, no other model does this
And it's not a "provider issue", this is the official DeepSeek api endpoint
It's also quite sycophantic
Okay and now it's just thinking.
I hated it too, but without thinking mode it kind of acts more like gpt, like the last deepseek was more robotic
use deepseek v3.1 from open router , this looks different
The front end is open webui if that's what you mean
or do you mean the provider
DeepSeek official API
i renamed it on the front end
it points to deepseek-chat on deepseek's official api
Which is v3.1 non thinking
I don't use openrouter on DeepSeek as I want the context caching
why you not using chutes?
oh
ok
i am using chutes, it rarely hallucinates, most of the time it does no thinking at all
but even without thinking it can write my programs
isn't chutes the weird free provider?
here its not free but cheaper than deepseek
for me DeepSeek API is cheaper, as I mostly use stuff in the evenings, which aligns with the UTC 16:30-00:30 discounted price, which is half price and cheaper than chutes.
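A sketch for checking whether a request would fall in that discount window, taking the UTC 16:30-00:30 range from the message above. Treating the end of the window as exclusive is an assumption, and the function name is made up.

```python
from datetime import datetime, time, timezone

DISCOUNT_START = time(16, 30)  # UTC
DISCOUNT_END = time(0, 30)     # UTC, on the following day


def in_discount_window(moment: datetime) -> bool:
    """True if a timezone-aware datetime falls inside the UTC 16:30-00:30 window."""
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    t = moment.astimezone(timezone.utc).time()
    # The window wraps past midnight, so it is the union of two ranges.
    return t >= DISCOUNT_START or t < DISCOUNT_END


print(in_discount_window(datetime(2025, 8, 24, 17, 0, tzinfo=timezone.utc)))
print(in_discount_window(datetime(2025, 8, 24, 12, 0, tzinfo=timezone.utc)))
```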
Although that is leaving soon
any pricing goes up and down. currently that's the case. I also don't know chutes price in 2 weeks
Do you use caching a lot?
Like retrying same long context benchmark for 5 passes, or tool use with 32k+ input
and coding tools benefit a heck ton
i kinda reset chat too often, usually when the work is done or it starts hallucinating
waiting for v3.1 free
Wow, it's not free yet? Those providers are lazy I see
i think its surprising there are still big free models over the api, considering how not free they are to run
I would prefer a provider with caching rather than a free model, which DeepSeek officially offers is too slow
I am sure they are paying millions out of their pockets at this point
What TAO does to people with gpus.....
tao?
Basically, Chutes doesn't own GPUs (they do own some, but not those 8k or whatever GPUs at once). People lend them theirs, and Chutes pays the real owners in TAO (crypto)
That's where they get the "Decentralised" stuff
so they are renting people's gpus with crypto in return
its surprisingly fast considering so many different pc it had to use
I still don't quite understand the motive
Interesting, So is it like the first crypto thing to actually work and be useful at scale?
Ya, it's a huge mining inference stuff
That's why they didn't lose a lot when they introduced the unlimited-reqs a few months back
That makes more sense then
Every free provider logs your data
Also , chutes has no clue if a particular gpu provider is actually logging your data or not.
I assume they must be logging a ton of gooner chats
you can't complain if it's free
I don't use them
Chutes is so slow I rather just pay up atp
Especially since the rate limit a while ago
Thanks, yeah I understood that motive, but with large models in the 600b-1T range I had a hard time understanding how that is worth it to them
I personally don't use paid providers as well and the paid ones I do use I try to understand their respective Privacy policy/tos. but was still wondering a bit. The ones that make the most sense to me are what google, alibaba, etc give free access to in their tools, I know it's temporary and will be used as training data for future models, it was just the other providers that I did not understand, because as far as I'm aware they are not training models
And can just sell the data to advertisers and etc. but I guess it depends on how personal the data that people put in the ai is, maybe it's worth a lot, haha
Who cares to be honest, what are they going to do with those chats? Blackmail us? xD
I'd prefer it to be free even if a provider logs my data. I didn't pay 10 dollars to OR for nothing (I want those 1000 messages daily for my free models)
If a provider's employee wants to read my logs about femboy breeding, they can be my guest I guess
maybe, but in the same way that your browsing data, or whatever facebook/google has on you is valuable, could be mined out of your conversations
I didnt do a lot of testing with 3.1 cause Im poor af. Is it a meaningful enough upgrade from 0528 in RP?
to me yes, it writes less which means less bloated paragraph.
i used to be like that but then my grandma gave me 5 $
With all my medication I have to be very strict with unnecessary expenses.
mixed opinions, most seem to say so, but it depends on what you liked about r1
if you liked its schizo side you'll miss that in 3.1
3.1 is more like gemini than the old deepseek r1 style
That actually sounds pretty good. Thanks for taking the time to answer. I have only been testing it in small doses to keep expenses down, but overall I do prefer it so far.
What's going on with providers for this model? Chutes and Deepseek itself have disappeared for me. Maybe temporary maintenance or something? Is that how it works for openrouter, they just disappear from the page if they are down or whatever?
Nvm ignore me, seems it changed to hide ignored providers and I had logging disabled
did you change your privacy settings?
I didn't think so, but I guess it could've bugged or I did it accidentally
because yeah, its providers with logging being hidden. I was using chutes literally a few hours ago and haven't touched my settings since then. Deepseek was visible on there too
my bad I guess?
"SubModel provides DSv3.1 with fp8 and lower price - $0.2/0.8"
Ok its an openrouter issue. I have paid endpoints that may train on inputs enabled but its ignoring them
there may be some caching, wait a few minutes
But I haven't changed the setting. I didn't have this issue about 8 hours ago
Edit: Issue randomly fixed itself without me touching anything other than removing and re-adding an ignored provider to test, as adding that is the only change I made today. Very strange
@nocturne kelp any chance you can reach out to submodel to see if they want to be a provider?
https://submodel.gitbook.io/docs/instagen/overview-1/available-models
really good prices, no prompt logging
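A rough cost sketch using the "$0.2/0.8" figure quoted above, assuming it means USD per million input tokens and per million output tokens respectively (the chat does not spell out the units). The token counts in the example are made up.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float = 0.2,
                 output_price_per_m: float = 0.8) -> float:
    """Cost in USD given per-million-token prices for input and output."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# Hypothetical coding session: 200k tokens in, 50k tokens out.
print(round(request_cost(200_000, 50_000), 4))
```

The same function works for any provider once you plug in its two prices, which covers the earlier point that output-only figures do not give the full cost.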
Hey guys, can someone help me?
I'm using AI SDK with OpenRouter ai-sdk-provider and DeepSeek V3.1.
My issue is: the returned reasoning is empty, and the reasoning text is added to the final message instead. Does anyone know why or how to fix it?
This is my configuration:
'chat-model': wrapLanguageModel({
model: openrouter('deepseek/deepseek-chat-v3.1'),
middleware: extractReasoningMiddleware({ tagName: 'think' }),
}),
are you using thinking model? also , https://openrouter.ai/docs/use-cases/reasoning-tokens
I think that by default Deepseek 3.1 uses thinking mode
this is the full API response:
If you notice the reasoning is empty
[
{
"id": "5192b330-8478-4cd2-a347-e38f5a7a4bec",
"chatId": "2cbe9177-020a-4ffe-8f15-3308c1fdd165",
"role": "user",
"parts": [
{
"text": "Hello, what is your name?",
"type": "text"
}
],
"attachments": [],
"createdAt": "2025-08-24T01:21:20.879Z"
},
{
"id": "e6cb68a4-c762-4fb3-aaf6-c1c93e7b6902",
"chatId": "2cbe9177-020a-4ffe-8f15-3308c1fdd165",
"role": "assistant",
"parts": [
{
"type": "step-start"
},
{
"type": "reasoning",
"reasoning": "",
"details": [
{
"type": "text",
"text": ""
}
]
},
{
"type": "text",
"text": "Hmm, the user is asking for my name. This is a simple introductory question. I should respond with my name and a friendly greeting to start the conversation. \n\nI'll keep it warm and inviting by introducing myself as \"Weather Assistant\" since I have weather-related capabilities, and then ask how I can help them today. This sets a positive tone for the interaction. \n\nThe response should be concise but welcoming - no need for lengthy explanations since this is just an introduction.Hello! I'm Weather Assistant. How can I help you today?"
}
],
"attachments": [],
"createdAt": "2025-08-24T01:21:23.387Z"
}
]
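A sketch of reading reasoning and answer separately from a raw chat completion response, assuming reasoning comes back in its own `reasoning` field on the message as the linked reasoning-tokens docs describe. The `sample` response is made up, and the helper name `split_message` is hypothetical.

```python
def split_message(response: dict) -> tuple:
    """Return (reasoning, content) from the first choice of a chat completion response."""
    message = response["choices"][0]["message"]
    reasoning = message.get("reasoning") or ""
    content = message.get("content") or ""
    return reasoning, content


# Hypothetical response shape, trimmed to the fields used here.
sample = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "reasoning": "The user is asking for my name.",
                "content": "Hello! I'm Weather Assistant.",
            }
        }
    ]
}

reasoning, content = split_message(sample)
print(repr(reasoning))
print(repr(content))
```

If `reasoning` comes back empty while `content` opens with thinking-style text, as in the dump above, the split most likely did not happen upstream, and a tag-based extractor has no tags to work with.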
did you read the doc I gave ?
I'm taking a look at it, is there a specific section I should check?
Got it, thank you
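A note for anyone hitting the same empty-reasoning symptom: a minimal sketch of reading reasoning defensively from an OpenAI-style assistant message. It assumes the provider either returns a separate `reasoning` string on the message (as the linked reasoning-tokens doc describes) or leaves the chain of thought inline and closed by `</think>`. The helper name `split_reasoning` is hypothetical, and neither case covers the response pasted above, where the thinking and the answer are concatenated with no delimiter at all.

```python
from typing import Tuple


def split_reasoning(message: dict) -> Tuple[str, str]:
    """Return (reasoning, answer) from an OpenAI-style assistant message dict."""
    content = message.get("content") or ""
    reasoning = message.get("reasoning") or ""

    # Preferred case: the API already separated the reasoning for us.
    if reasoning:
        return reasoning.strip(), content.strip()

    # Fallback (assumption): the thinking is inline and ends with </think>.
    # The opening <think> may be absent if the chat template already emitted it.
    if "</think>" in content:
        head, _, tail = content.partition("</think>")
        head = head.replace("<think>", "", 1)
        return head.strip(), tail.strip()

    # Nothing to split on: treat everything as the answer.
    return "", content.strip()
```

If both branches come back empty while the answer visibly contains thinking, the request probably did not enable reasoning in a form the provider recognised, so the fix is on the request side rather than in post-processing.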
https://eqbench.com/creative_writing.html
V3.1 stands between R1 and V3, but I don't understand how this is possible
Especially considering GLM4.5 is below all of them in rankings, but better subjectively
glm 4.5 deserves higher ranking...its just so good
Even hermes 4 better than GLM?
Hermes 4 was so bad when i used it for roleplays
What settings u got for glm 4.5? the writing style and prose was annoying but I got good dialogue
eh if you read the writing samples it's not that great
this is Kimi K2
```
Chapter 6 - A Quiet Knife of Diplomacy
Aboard transfer-cutter Hekate, descent vector 14° off the ecliptic
Arthur Neegan sat on a crash-couch designed for a man two centimeters shorter and a great deal more resigned. The cabin had no windows - unnecessary on a thirty-minute ballistic hop - but a small display screen on the wall cycled through system diagnostics in green text. Green was supposed to be soothing. It merely reminded him of Martian skies at dawn, before the first strip-miners had stained them sienna.
He flexed his wrists against the soft cuffs that were not quite restraints. The polite Antares lieutenant had called them "safety belts," yet the magnetic locks clicked shut with the same finality as any security bracer. Politeness, Arthur decided, was simply violence wearing velvet gloves.```
this is GLM 4.5
```Model Output:
The low thrum of the transfer vessel's drive was a constant vibration in Arthur Neegan's bones, a counterpoint to the frantic pulse in his throat. He sat rigidly in the utilitarian passenger couch, its worn synth-leather cool against his palms. Outside the thick viewport, the ochre deserts of Mars shrank below, soon replaced by the infinite, star-dusted black of interplanetary space. A small display screen on the wall cycled through various system diagnostics in glowing green text: "Drive Stability: Nominal", "Life Support: Optimal", "Nav Lock: Antares Orbital". The last line felt like a brand.
*What have I gotten involved in?* The question echoed, louder than the drive's hum. Just forty-eight hours ago, his biggest concern was calibrating the sonic resonator on his small, independent claim near Syrtis Major. Then the deep-core survey ping had screamed - not the usual trace elements, but a vein of Azra so dense, so pure, it defied comprehension. His unassuming plot, inherited from his father, was suddenly the most valuable real estate in the solar system.```
My main complaint with kimi is that it insists on making it as short as it can, V3.1 is at least decent at that
Kimi K2's writing of SCP time travelling letters from bench was good
I will say GLM 4.5 is competent
its prose is nothing fancy, it writes at like a grade 8 level
but it's still enjoyable, just not "painterly" like kimi which reads like a real novel
Kimi K2 writing is good, but sometimes loses plot points like relation status, name drops, and probably hurts at long context
GLM 4.5 with reasoning should handle it better, like double pass the same data
yeah kimi k2 is known to not have coherency or long context that say r1 or gemini 2.5 has
but it's also a base model
a thinking version will come soon
Also, in my experience its just incapable of doing long outputs
what do you mean by long?
its V3's problem compounded
The longer the prompt context, the more beneficial reasoning is. Plus of course Kimi K2 is censored
Uncensored Kimi K2 with reasoning and 256k context would be game changing
kimi k2 seems to have no problem with long outputs at least in the benchmarks
at least in raw "length" but it does not fare well in the longform writing bench
thats the thing, i use my models for writing
tough! I'd say use 3.1 (according to the longform bench) for the general structure then kimi k2 to rewrite drafts
18% degradation is big
yeah huge. but it's also one of the best non thinking models on the list
so here's hoping their thinking version has better long context
You can start with Kimi until 32k context then switch to thinking models. Or try 2-1-2-1-2-1 switch between non-think and think
Mixing styles of both to avoid repetition but extract details from time to time
I like the idea of structure using good long context models then kimi to write the prose for each smaller segment
just feed it character information, plot points, summary of story thus far
and let it paint over the rough edges
I hate this model.
i spent so much time fixing bugs due to DeepSeek not changing the model id in API. and then this model performs worst in my coding eval. makes me question reality.
We know some of the main drawbacks of hybrid reasoning models, I would guess the main benefit is prompt caching, prob reduces cost when the same model is used for both reasoning and non-reasoning. Outside of that I'm not sure why it's beneficial
From deepseek's perspective I'm sure it's tempting because they only need to host one set of model weights, and with how it's harder for them to get GPUs that would likely be beneficial
Prompt caching is provider side, I think they can do shared prompt cache for all their models. I hadn't tried DeepSeek caching during separate models era
Oh, did not know that was a thing
I am not 100% sure if that's how it works
Might have to look into it to see if any providers do that
The path I thought companies would take with hybrid reasoning models is one where it would not think at all when it's an easy question and think for 5+ min when it's really, really hard (I know time is a horrible way to describe this because it's dependent on compute, but you get the point, haha), but I am assuming that's easier said than done. It seems that models can't comprehend where they may screw up; how many r's in the word strawberry was the most famous example of this
You mean kv cache right? I think it depends on embedding. Not exactly sure how it works, but I'm planning to do a livestream to understand how k, v, q works for attention mechanism. Will share after I get a good understanding.
This is just vibe based, but I'm liking deepseek v3.1 with no thinking more than with thinking, it def feels like an improvement over 0324
I think it will be my go to moving forward, with GLM 4.5 subbing in at times, going to try to live without sonnet 4 for a bit
how were the responses of v3.1 on crypto chart and raw trading data?
Me too.
Got really excited to see the free version, then found out it's censored to hell. Brilliant
How is it even possible
Even in the way it writes it seems very different to the paid provider version. It's kinda bizarre and I hate it.
I am willing to give some level of new provider issues the benefit of the doubt, but it's clearly more than that.
It also seems totally resistant to my jailbreaking setup. My setup makes even Kimi K2 fully uncensored, so this is interesting in a really annoying way.
openinfra being openinfra
I jbd it on the official website by accident
i dont understand how to get the thinking v3.1 using typingmind's openrouter connector
provider issue or model training issue?
Add a custom body params like this:
Provider. The paid version is uncensored.
Is it gonna stop being a training model at some point?
Well you can always use other providers that aren't training
What are the free providers?
Ahh you mean free providers
They make their money from collecting and selling data
Yeah.
so it's pretty much impossible
DeepSeek-V3.1 coding performance evaluation on my coding evaluation set:
Mixed performance with concerning regressions - DeepSeek-V3.1 achieved an average rating of 5.68, significantly underperforming compared to top models and even showing regression from its predecessor on some tasks.
Performance Comparisons
- vs. Top Models: Performed worse than Claude Opus 4, Claude Sonnet 4, Grok 4, and GPT-4.1
- vs. Open-Source Models: Also lagged behind gpt-oss-120b, Qwen3 Coder, and Kimi K2
- vs. Predecessor: Mixed results compared to DeepSeek-V3 (New), with some improvements but notable regressions
Notable Issues
- Instruction adherence problems - stubbornly ignored specific formatting requests
- Gap in advanced programming patterns - struggled with uncommon or tricky scenarios
- Visualization - produced charts remarkably similar to Horizon Alpha
Full blog post: https://eval.16x.engineer/blog/deepseek-v3-1-coding-performance-evaluation
wish they made claude sonnet 4 cheaper
that would require anthropic to make their models more efficient
I am sure it's pretty efficient, but it's proprietary, and since they are used by almost any "no code" service they can charge any price
how are these benchmarks done? like, many times one model can't do what another does. are they given a simple coding task and scored a point if they do it, or on whether they do it more efficiently? coding itself is highly subjective, so it should be tested across different fields of coding
I livestreamed the eval process on YouTube. You can see exactly how it's done:
https://youtube.com/live/jxQOQrHVy5I?feature=share
there are quite many reports that the biggest AI companies are usually running at a loss, and subsidized, openai's own revenue data confirms this
nice
cuz ai doesn't give profits instantly, it's a long term thing. after spending billions to create a model they can spend a fraction of that to make a better or more efficient version while api costs remain the same
Does anyone know if chutes is running this model at fp4? Based on the low price I would assume so
I thought OR shows the quant on the model page?
Not all of them display a quant for me. Unless that is supposed to mean its not quantised at all? I assumed it wasn't that because some of them are so cheap/fast and don't show any quant
And also the fact that you can't filter by "no quant" only "all quantizations"
its in the row of icons under the provider name
chutes doesn't have one listed, so. it's "unknown"
yeah, so I was wondering if anyone knew what it actually is
yeah if it's not there its not disclosed
i think the thing with chutes is its distributed so there is no specific quant, every node might use different config
thank you so much!
finally a good provider hosting deepseek v3.1 for free
Who?
Anyone know why Silly Tavern still shows only the 64K OpenInference and not 164K DeepInfra?
SillyTavern uses the context listed from the /models endpoint, the one appearing in ST's command prompt console, when you click on "Connect" so it uses that
Just choose DeepInfra provider and set context to unlocked/unlimited
That's what the API response shows from https://openrouter.ai/api/v1/models
Like he said, you can select the provider. Otherwise, don't quote me on this, presumably if your context is too big, OR would route it to whatever supports it (if not, it would error), assuming you keep Middle-Out Transform disabled.
And "unlocked" context slider will prevent auto adjusting the max.
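To see the numbers a frontend would be reading, here is a rough sketch that looks a model up in an already-parsed `/api/v1/models` response. It assumes each entry carries a top-level `context_length` plus a `top_provider.context_length`; treat both field names as assumptions to check against the live response. The sample values are made up for illustration, and the sketch makes no claim about which field SillyTavern actually uses.

```python
from typing import Optional, Tuple


def context_lengths(models_response: dict, model_id: str) -> Optional[Tuple[Optional[int], Optional[int]]]:
    """Return (model context, top-provider context) for model_id, or None if absent."""
    for entry in models_response.get("data", []):
        if entry.get("id") == model_id:
            top = entry.get("top_provider") or {}
            return entry.get("context_length"), top.get("context_length")
    return None


# Illustrative payload only; real values come from the endpoint itself.
sample = {
    "data": [
        {
            "id": "deepseek/deepseek-chat-v3.1:free",
            "context_length": 163840,
            "top_provider": {"context_length": 65536},
        }
    ]
}

print(context_lengths(sample, "deepseek/deepseek-chat-v3.1:free"))
```

When the two numbers differ, a client that trusts the smaller one will cap the slider, which is consistent with the advice above to pick the provider explicitly and unlock the context setting.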
Thanks guys!
I asked in chutes' discord, and the admin says it's fp8
i was wondering why deepseek was hallucinating, then I found out about the fp things
What precision is deepseek trained on?
fp8
I kinda just swapped to using deepseek as the provider, consistent quality and much cheaper
But I guess now there are more options that support caching so it's prob no longer the cheaper option
which providers do u know support caching?
my bad, i was on my phone when i quickly checked, I guess its still only deepseek that supports prompt caching
yeee
bummer, its not worth using models that dont support prompt caching, at least if you use it for a lot of code
Or long chats
why deepseek is so slow
deepseek provider
and why is pricing for v3.1 different? on open router, even from the deepseek provider, it's cheaper than on the deepseek platform
Other providers offer DeepSeek v3.1 for cheaper
no the provider name is deepseek
and its giving cheaper than deepseek platform
check in open router
Hey, you're right!
@nocturne kelp there's a pricing discrepancy for the DeepSeek provider
Its 27 cents in and $1.1 out but DeepSeek is charging a different amount
With 3.1 do you guys usually put supplemental tool info in your system prompt? I've been getting weird tool behavior from 3.1. Also, how thorough are you with your tool and argument descriptions. I didn't see this until I switched from openai models to this model and I'm sure the problem is my lack of experience. Openrouter published an article on tool call accuracy but it seemed to suggest proprietary models excel at tool calling accuracy but didn't tell us what to do about it on open models. link: https://openrouter.ai/announcements/tool-calling-accuracy
Claude models are top tier in tool calling, gpt 5 isn't bad either, most open source models are complete slop at tool calling
With the exception being glm 4.5 which is also really good at tool calls
Awesome thanks. Oh wow just found this which very much supports your claims! https://gorilla.cs.berkeley.edu/leaderboard.html
Explore The Berkeley Function Calling Leaderboard (also called The Berkeley Tool Calling Leaderboard) to see the LLM's ability to call functions (aka tools) accurately.
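On the question of how thorough tool and argument descriptions should be, one hedged illustration: a tool definition in the OpenAI-style `tools` format with explicit descriptions, enums and `required`, plus a small check run on the model's arguments before executing anything. The `get_weather` tool and the `check_tool_call` helper are hypothetical, and this is only a sketch of defensive validation, not a fix for the model's tool-calling accuracy.

```python
import json

# Hypothetical tool: every argument says what it is, its format, and its limits.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for one city. Call once per city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name in English, e.g. 'Oslo'. No country suffix.",
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit. Defaults to celsius when omitted.",
                },
            },
            "required": ["city"],
            "additionalProperties": False,
        },
    },
}


def check_tool_call(tool: dict, raw_arguments: str) -> list:
    """Return a list of problems found in the model's raw tool-call arguments."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return [f"arguments are not valid JSON: {exc.msg}"]
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]

    schema = tool["function"]["parameters"]
    allowed = set(schema["properties"])
    problems = [f"missing required argument: {key}"
                for key in schema.get("required", []) if key not in args]
    problems += [f"unexpected argument: {key}" for key in args if key not in allowed]
    return problems
```

Feeding the returned problems back to the model as the tool result gives it a chance to retry with corrected arguments instead of failing silently.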
May i know the rate limit of this model ?
bbbbut what about kimi :(
its not really that good at tool calls from my experience
maybe cuz i was served fp4
yea i think theo t3 person and maybe some others have some videos about kimi and tool calls, it is pretty provider dependent
apparently, from memory
theo is a larper dont listen to him
There's no rate limits for paid models
any alternatives to deepseek that are better and open source?
for making notes
you can try kimik2
how to remove a car?
?
Try glm 4.5
Plus Kimi k2 0905 is really cute
Jeeesus, context of the free model got downgraded to 30k from 60k. The paid model has 163k context. It's unusable at this point. Any better free models with more context?
It seems the free version hosted by DeepInfra still has 163k context
is it better tha glm?
people generally like kimi's style more i think, but its not necessarily 'better' than glm. glm is hybrid (toggleable) reasoning, kimi doesn't have 'reasoning'
kimi is also quite concise
Depends on where you actually use it and your own preferences
it has like trillion parameters
Yea Kimi is big but what I meant is that its outputs are short
i see, i was kinda looking for long detailed output like gemini gives, I need it for making notes
I got "(DeepInfra) Provider returned error: deepseek/deepseek-chat-v3.1:free is temporarily rate-limited upstream. Please retry shortly, or add your own key to accumulate your rate limits: https://openrouter.ai/settings/integrations" For Deepseek v3.1 free using DeepInfra, anyone knows how long it will last?
It feels like prefills still haven't been sorted out on V3.1. All of the providers returned a completely unrelated response in Mandarin
sorry can you clarify? and might this be why im getting responses talking about python coding or nutritional papers instead of responses?
sorry for clarity the key works on the site i am using when I click the test button and the deepseekv3.1 is set at the model, but the responses are garbled and i dont have any blocked providers.
Using sillytavern, when I run chat completion and ask it to continue the assistant message, it generates the chinese text below
I heard only deepseek's provider does working prefills, but havent tried it
yeah this is what i get
i have no idea
it auto picks OpenInference and DeepInfra as the common providers for me
deepseek/deepseek-chat-v3.1 is a constant battle now. We don't use free models. Account is always topped up. Suddenly: No endpoints found that support tool use. Is this no longer supporting tool use?
@nocturne kelp https://api-docs.deepseek.com/updates#deepseek-v31-terminus
Hopefully it actually fixed the language mixing nonsense
wwtf
#1419676592082518039
I still get a 404 from deepseek/deepseek-chat-v3.1 at OR when my payload defines tools: regardless of provider. What did I miss or is this terminus unrelated?
The models endpoint reports that it supports tools.
would like to report that V3.1 (free) from DeepInfra appears to have a (temporarily?) reduced maximum context size to 49299 tokens (it appears to not be a consistent limit, 48k was a good buffer), anything above that results in a 404 'no allowed providers are available' error. Have ignored OpenInference to check.
Would it be possible to check with DeepInfra if it's a temporary or permanent change, and if it's permanent, be reflected in the provider list? Thank you!
p.s. apologies for the ping, just noticed your status of being on vacation, my bad!
love it that deepinfra did something on their back end to cause the quality of v3.1 to nosedive and struggle with endless repetition
Why does v3.1 do this sometimes? It's so strange. I didn't change anything.
What is this shit?
<@&1384697330254610442> can you help?
It's not too much tokens
I don't understand what is happening
@storm seal any idea? My roleplay worked fine until now
Hmm
Try Chutes as the provider
How can I switch it?
I use my OR key
Only deep infra offers free
Then I'll switch to 3.2, it's cheaper
The same issue O_O
Found the issue it's my prefill
Ah, yayyyy
strug
Some providers use a different quant and also a different stack
Mostly it directs you to a provider that has a problematic stack and weights
What is a stack?
The software they use to provide the service
Some providers might use vLLM while others use Llama.cpp or their own custom vLLM version
Alright, that's out of my universe
Is DeepInfra permanently removing deepseek v3.1 or is it temporary?
Permanent
Amazing
thanks janitorai
Looks like v3.1:free now only works if you let them use your prompts :/
3.1 still works but only when selling your soul
Yeah janitor had a huge impact on Open router, deepseek and chutes.
It's crazy how huge the community got and how satisfied they all are with deepseek.
But I also don't know any alternative LLM which writes as well (and uncensored!) as deepseek.
And their built-in premium version of deepseek is too expensive at $20 per month.
I also use chub
V3 is at <10% uptime. And v3.1 gets dead now

Oh what the hell
yeah a 100+ billion difference in tokens is absolutely diabolical
Holy
Its over guys...Deepseek free 3.1 is officially dead
No more free gooning
Only premium gooning for now
Since it died tragically, I have to admit I have never experienced such coherent and complex RP capabilities with a 4-compressed model before
So the free one is dead for good?
Ooooh, it went to zero!
yep
also openinference doesn't allow gooning requests so no gooning to that so yeah
I got an idea that could kinda work for the free Provider in a "never happening hypothetical return(?)", Rate limits at specific services. The idea is simple, Free Janitor Users using a free model that got too many requests, they all get rate limit, like a single request per minute? Or two? Just affecting a single service instead of they all. That could not directly affect the other free users that used DeepSeek V3.1 for any other reasons, from helper to coding. It's kinda unfair that they all get affected because of Janitor users.
Wouldn't work - not a good idea to punish third party apps for putting the app name in the api call. The obvious solution would be to just update it to not mention that it's janitor ai, and then the situation is just worse
The hypothetical janitor ai updates are too smart. Anyways, seriously, would Janitor be capable of making its name incognito in api requests? Have they done that before or...? Damn, now that I think about it Openrouter can even get requests from terminals, so they could not even do anything about a whole site going incognito.
they have to specifically put the name in the header to be identified in the first place
thats why janitor wasn't on any of the charts before late september
even though their usage number was already insane
I think the problem here is somewhere between Janitor being hidden from view until last month and Openrouter for, well, just being an open router. It ends up affecting providers and users in situations like this: despite not being paid, the provider knew they wouldn't get anything, not even training data, from "charity work". But the main problem is a whole provider being overwhelmed by a single site... welp, there is nothing users can do now, except either paying for requests... or simply getting subscriptions from other sites since Openrouter does not have subscriptions.
Honestly, Openrouter having subscriptions for certain models and providers would be an idea? like, certain amounts of dollars for a certain amount of daily requests so it isn't an abuse(?)
the only problem is with free models
since if they're paying, that's their own money to waste lol
why was it hidden? did they only recently add the title headers?
it was very recent, let me check when
sept 23/24
Yeah, why was it hidden?
So Openrouter can't track it well or simply they were dumb?
they didn't include it earlier, you have to specifically go out of your way to label it
it's opt-in not opt-out lol
By including simple headers in your requests, your app can appear in our leaderboards and gain insights into your model usage patterns.
but it's more infamous than anything else
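For reference, the opt-in attribution being discussed comes down to two optional request headers, `HTTP-Referer` and `X-Title`. A minimal sketch of building them follows; the helper name and the example app values are made up, and leaving both out simply means the app is not attributed on the leaderboards.

```python
from typing import Dict, Optional


def openrouter_headers(api_key: str,
                       app_url: Optional[str] = None,
                       app_title: Optional[str] = None) -> Dict[str, str]:
    """Build request headers; the attribution headers are added only when given."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if app_url:
        headers["HTTP-Referer"] = app_url  # site URL shown for the app
    if app_title:
        headers["X-Title"] = app_title     # display name shown for the app
    return headers


# Anonymous request versus an attributed one (example values are hypothetical).
anonymous = openrouter_headers("sk-or-example")
attributed = openrouter_headers("sk-or-example", "https://example.app", "Example App")
```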
Yeahh, anyways. If anyone here that loved to use "gooner rp free models" I think the Z-Ai is good, if you use a simple yet good system prompt with no jailbreak(because it makes reasoning too bulky)
Off topic but using any free models with chutes as a provider feels like a gambling addiction
Yeahh, also feel it, but since Chutes added compression to their models they started to get quirky
Won't really use Z-AI, I'm more of a prompt person which is why i used deepseek and now it's gone, forever probably
The big reasoning Qwen of Vecine got a normal prompt mode if you put /no_think in the post history instructions, and with the recommended parameters of Temperature=0.7, TopP=0.8, TopK=20, and MinP=0. It seems to work
With a 10K tokens RP chat it provides good replies, just need to specify reply lengths or else it gets messy long. And the only problem is the too many requests error
Hey y'all, is the other v3.1 provider safe to use? Read here that they use your data to train their models. But isn't it the same for deepseek official?
What's
The best course of action here if you want to continue roleplaying with deepseek
Thanks, and I also tried out GLM again and it's the wildest model I've ever seen. It gets straight into the not-safe-for-work stuff and it doesn't really matter what kind of chatbot you're talking to. idk if it's the model or if it's my prompts since I have quite a list of prompts that are for the AI and definitely aren't SFW so maybe it's the prompt, I'm not sure.
Unfortunately there is no way of role playing with the unfiltered version of Deepseek v3.1 for free. OpenInference does offer free Deepseek v3.1 but it's filtered and you'll get a lot of 429 errors similar to the other free Deepseek models on openrouter. There is also this one provider that I know but it's also filtered unfortunately.
I used the GLM but honestly gets so many format errors that I genuinely give up xd
it's the model, even as an AI assistant it knowingly flouts ethical guidelines with minimal prompting
I use longcat.
Wait can't we just use deepseek v3.1 (not free) from openrouter? If we're paying there shouldn't be an issue right?
Oh okay, I thought that was down too
I use directly from DeepSeek (no longer v3.1 but v3.2 Exp) and been having a blast since. Topped up $15 in March and still have $4 now
Is it less expensive to get it directly from deepseek?
Low context (32k) and cache hits are a blessing
deepseek is cheapest provider especially due to caching
^In my opinion yeah because of caching
you can enforce routing to deepseek on openrouter
would you be willing to share your settings so i dont waste cash?
I use SillyTavern (though all platforms can too), set context limit to 32k, and set Temperature to 1.2 (DeepSeek official API/OpenRouter provider converts this by deducting 0.7 so it's actually 0.5 temperature)
Now about caching, it saves cash because your previous messages are kept in their system (for hours in my experience), so if your earlier messages or system prompt changes, or there is a world info/lorebook entry activation (that inserts a new system message before the chat or within the chat), then the cache busts
If earlier context are kept and nothing changes from the very beginning to the latest message, then the cache stays.
(Except once you reach 32k of context: older messages are removed for new messages to be inserted, thus no cache and you get billed normally)
I don't know if I can explain better than this jumbled mess lol
Would you recommend topping up to deepseek directly or on open router instead? I heard deepseek official is also a provider there so it works too right?
Yep. You can just use the DeepSeek provider in OpenRouter
Though they only serve their latest model, which is DeepSeek V3.2 Exp
And the caching works too
I use direct Deepseek because Openrouter's deepseek account balance can be zero (no auto top-up) so errors haha
how do u do it??
Choosing the official Deepseek provider, you mean? In SillyTavern you can choose a provider, but for other platforms that do not have a provider selection tab, I think you can just set "Allowed Providers" in your OpenRouter account settings to DeepSeek only (use the model DeepSeek v3.2 Exp, they only serve that currently and paid)
You can also make a preset in openrouter
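For platforms where you control the request body, the same pinning can be sketched per request with OpenRouter's `provider` routing object. This assumes `only` and `allow_fallbacks` behave as in the provider-routing docs, and that `deepseek` and `deepseek/deepseek-v3.2-exp` are the right provider and model slugs; both are assumptions to verify on the model page.

```python
from typing import Dict, List


def deepseek_only_body(messages: List[Dict[str, str]],
                       model: str = "deepseek/deepseek-v3.2-exp") -> dict:
    """Chat-completions body that restricts routing to the DeepSeek provider."""
    return {
        "model": model,  # assumed slug
        "messages": messages,
        "provider": {
            "only": ["deepseek"],      # assumed provider slug
            "allow_fallbacks": False,  # fail instead of silently rerouting
        },
    }


body = deepseek_only_body([{"role": "user", "content": "Hello"}])
```

With fallbacks disabled, a zero balance or outage on that provider surfaces as an error rather than a quiet switch to a pricier or non-caching provider.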
dang im all so new to ts
how do you cache?
Automatic
if you use DeepSeek provider
or DeepSeek directly
like this?
Be warned caching goes out the window after you hit ctx limit
il drop 5$ in and see how it goes
as the messages leave FIFO style
how will i know?
it's cheaper to have long ctx compared to small ctx with DeepSeek's 10x cache discount
could you show your preset settings?
Yeah if you want full context, Deepseek official provider only supports 128k
ive set it to 32k
so it saves money in the long term since i hardly go over 30 msgs
My longest chat was 47k tokens iirc
How does caching work? Does cache stay or something, considering that it's basically giving the bot an idea of the past events/roleplay?
It would work even if I use deepseek official through openrouter right
DeepSeek keeps caches for hours / they don't say it in their docs. And not exactly past events per se, like cache is keeping whatever are the older messages are in the front of the context you've set. The bot will know what messages are in the context, not the entire chat history
yes
according to their docs at https://api-docs.deepseek.com/guides/kv_cache,
Cache construction takes seconds. Once the cache is no longer in use, it will be automatically cleared, usually within a few hours to a few days.
so might be up to days, if their servers aren't under heavy usage.
That sounds neat, but I wish it lasted longer since my writing could take hours too lmao
I saw people mentioning they had cache hit at 25 hours
ah neat
but if you're doing a chat and your old messages get pushed out
might not be a cache hit anymore
Since I roleplay using proxy, I often just use the same character (with the same definition) so the character definition would be consistent, would that mean it's always a cache hit even if it's a different chat but the same character?
heck, would it be possible to maintain cache for weeks, maybe even months?
ohh never knew that. didn't read enough myself...
zero chance
You'd have to send 1 message every so often depending on the cache time
yeah. no cache, deepseek will re-cache everything once you've gone past your context limit since the messages after the character definition (which would be the bot's first message) is replaced with whatever nearest message to fill the front context
OHHH so the cache would only hit until something like... 16k if I set it so?
Would setting the context window to 32k/16k be cheaper than the 64k/128k context with cache
Yep, if you set context to 16k, it'd cache the 16k like:
- System prompt [500 tokens]
- Bot definition [1000 tokens]
- Persona definition [100 tokens]
- Chat history [1st bot "greeting" message, 2nd message your reply, 3rd bot reply, 4th your reply, etc. totaling 15400 tokens]
This is what the 16k cached would be.
Then if you send your reply to the bot once the context is full, the bot's 1st "greeting" message would be replaced by your 2nd message from the chat history like:
- System prompt [500 tokens] <-- this is cached since nothing changed
- Bot definition [1000 tokens] <-- cached
- Persona definition [100 tokens] <-- cached
- Chat history [2nd message your reply, 3rd bot reply, 4th your reply, 5th bot message, 6th your reply, etc. totaling 15400 tokens] <-- no cache since the bot's greeting message is replaced by your reply // there is a change. DeepSeek re-caches this repeatedly if you still use 16k context
So if you decide to use 128k, the cache would prove useful until you hit 128k. More than that, it re-caches the whole 127,4k tokens
I use 32k context because when it re-caches, deepseek bills me $0.01 instead of $0.03 if 128k
cheap
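The walkthrough above can be sketched as a prefix comparison: only the part of the new prompt that is identical, from the very first token, to the previous prompt counts as a cache hit. This is a simplified model that works on whole labelled segments rather than raw tokens, with made-up message sizes, so it shows the shape of the behaviour rather than DeepSeek's actual cache granularity.

```python
from typing import List, Tuple

Segment = Tuple[str, int]  # (label, token count)


def cached_prefix_tokens(previous: List[Segment], current: List[Segment]) -> int:
    """Tokens in `current` that form an unchanged prefix of `previous`."""
    cached = 0
    for old, new in zip(previous, current):
        if old != new:
            break  # first difference ends the reusable prefix
        cached += new[1]
    return cached


fixed = [("system prompt", 500), ("bot definition", 1000), ("persona", 100)]

# Context not full yet: the new turn only appends, so everything before it hits.
turn_a = fixed + [("msg1", 400), ("msg2", 300), ("msg3", 500)]
turn_b = turn_a + [("msg4", 200)]

# Context full: msg1 is dropped, so the chat history no longer lines up.
turn_c = fixed + [("msg2", 300), ("msg3", 500), ("msg4", 200), ("msg5", 250)]

print(cached_prefix_tokens(turn_a, turn_b))  # whole previous prompt is reused
print(cached_prefix_tokens(turn_b, turn_c))  # only the fixed 1600-token header
```

The second call is the sliding-window case: once the oldest message falls out, only the system prompt, bot definition and persona still match.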
Okay, guys. Can someone explain to me whether V3.1 is still working or if it's completely no longer free? Some people say it can still be used, but one of the endpoints was removed, while others say it no longer works and has been shut down.
paid v3.1 is working just fine
v3.1 free had two providers, DeepInfra and OpenInference
DeepInfra no longer provides free v3.1 due to the massive human horde from janitor
and OpenInference is a far smaller provider
so you're just going to get 429 errors
So now it canât really work properly on the free model?
if you thanos snap 95% of the janitor horde, it'll work. but currently you're competing with way too many people for the available capacity.
Thanks for the reply.
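If you still want to try the free endpoint from your own script, a generic retry with exponential backoff is the usual way to cope with 429s. A minimal sketch follows: `send` is a hypothetical callable that performs one request and returns `(status_code, body)`, and the delays are arbitrary. It will not create capacity that is not there, it only avoids hammering the provider.

```python
import random
import time
from typing import Any, Callable, Tuple


def call_with_backoff(send: Callable[[], Tuple[int, Any]],
                      max_attempts: int = 5,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> Tuple[int, Any]:
    """Retry `send` while it returns HTTP 429, doubling the wait each time."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff plus jitter so clients do not retry in lockstep.
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    return status, body  # still rate-limited after the last attempt
```

Passing `sleep` in as a parameter keeps the helper testable without actually waiting.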
rlly? I was under the impression 3.1 free got axed then taken out the back and executed because of the graph shown earlier that it was over. I got 3.1 free from OpenRouter and I keep getting errors whenever I try to use it, and when I go on the error link, it says its no longer available
Its effectively dead because of the number of users
If my context size is at 32k now, does that mean once I hit 32k context, it'll be cheaper?
there'll be caching when you build up the context/the chat gets longer until 32k, and as I mentioned normal 32k input price is $0.009 or 1 cent
so yes, cheap
I see. Thanks! I was a little confused since it costs $0.009 for me. ^^
Hello, I need help. I get this message when I try to send a message in any chat on "Janitor AI."
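The arithmetic behind those figures, as a rough sketch: $0.009 for a 32k prompt implies about $0.28 per million uncached input tokens, and the "10x cache discount" mentioned earlier would put cached tokens at about $0.028 per million. Both rates are inferred from the numbers quoted in this chat, not taken from a price sheet, so treat them as assumptions.

```python
def input_cost(total_tokens: int,
               cached_tokens: int = 0,
               miss_per_million: float = 0.28,    # assumed, inferred from $0.009 per 32k
               hit_per_million: float = 0.028) -> float:  # assumed 10x discount
    """Input-side cost in dollars for one request with a partly cached prompt."""
    cached = min(max(cached_tokens, 0), total_tokens)
    uncached = total_tokens - cached
    return (uncached * miss_per_million + cached * hit_per_million) / 1_000_000


full_miss_32k = input_cost(32_000)           # sliding window, nothing reusable
full_miss_128k = input_cost(128_000)         # same situation at 128k
mostly_hit_32k = input_cost(32_000, 31_000)  # growing chat, prefix unchanged

print(f"{full_miss_32k:.4f} {full_miss_128k:.4f} {mostly_hit_32k:.4f}")
```

Under these assumed rates the 32k and 128k full-miss cases land near the $0.01 and $0.03 quoted above, which is the trade-off being described: a smaller window costs less per re-cache but starts re-caching sooner.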
3.1 is dead
Shit
you can still use it from platform deepseek
you can still use it in OpenRouter, it's just the free endpoint that's gone
Does anyone else have a problem with DeepSeek V3.1 putting non-english characters in tool calls? It consistently does it for me, and causes tool calls to fail.
do you mean https://openrouter.ai/api/v1/chat/completions is gone and that is the reason for this response?
{"error":{"message":"Not Found","code":404}}
no, i meant there isn't a free DeepSeek v3.1 anymore
also, depending of what you're using, you should only use the https://openrouter.ai/api/v1 part
i am just trying to test the endpoint just as it is in the documentation and i am receiving 404
const url = "https://openrouter.ai/api/v1/chat/completions";
const headers = {
  // Backticks are required so the API key is interpolated into the header
  "Authorization": `Bearer ${process.env.OPENROUTER_API_KEY}`,
  "Content-Type": "application/json"
};
const payload = {
  "model": "deepseek/deepseek-chat-v3.1",
  "messages": [
    {
      "role": "system",
      "content": "hello"
    },
    {
      "role": "user",
      "content": "If you built the world's tallest skyscraper, what would you name it?"
    }
  ],
  "temperature": 0.7
};
// Send the request and print the parsed JSON body (including any error object)
const response = await fetch(url, {
  method: "POST",
  headers,
  body: JSON.stringify(payload)
});
const data = await response.json();
console.log(data);
any response?
idk, it worked here
try seeing if you have any provider preferences in your settings
how do i see the reasoning tokens for v3.1 and v3.1-terminus
here is my code. i tried a lot but cannot see the reasoning for different kind of user messages.
Any help would be much appreciated.
client.chat.completions.create(
    model="deepseek/deepseek-v3.1-terminus",
    messages=[{"role":"system","content":system_prompt},{"role":"user","content":user_message}],
    temperature=0.1,
    top_p=0.1,
    stream=True,
    max_tokens=2500,
    extra_body={
        "provider": {
            "sort": "throughput"
        },
        "enable_thinking": True,
        "reasoning": {
            "effort": "high",
            "exclude": False
        }
    }
)
can you post the raw, full response here?
i think it needs to be
"reasoning": {
  "enabled": true
}
if you include the effort param, reasoning is enabled by default
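For the streaming case asked about above, a minimal sketch of separating reasoning text from answer text. It assumes each streamed chunk, once parsed to a dict, carries `choices[0].delta` with optional `reasoning` and `content` strings. That shape is an assumption worth checking against the OpenRouter docs, and `split_stream` and the sample chunks are hypothetical.

```python
from typing import Iterable

def split_stream(chunks: Iterable[dict]) -> tuple[str, str]:
    """Accumulate (reasoning, answer) text from parsed streaming chunks."""
    reasoning, answer = [], []
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue  # e.g. a usage-only chunk with no delta
        delta = choices[0].get("delta") or {}
        if delta.get("reasoning"):
            reasoning.append(delta["reasoning"])
        if delta.get("content"):
            answer.append(delta["content"])
    return "".join(reasoning), "".join(answer)

# Made-up chunks for illustration only; real ones come from the SSE stream.
sample = [
    {"choices": [{"delta": {"reasoning": "Think: "}}]},
    {"choices": [{"delta": {"reasoning": "2+2=4."}}]},
    {"choices": [{"delta": {"content": "4"}}]},
    {"choices": []},
]
print(split_stream(sample))
```

If only the answer text is read from the stream, the reasoning deltas are silently dropped, which would look like "no reasoning" even when the model produced some.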
I can't find what's good about v3.1.
I used R1 and V3 in the past and my prompt(s) fit them.
I believed the DS family was trained on similar datasets and would behave the same.
However, when I change the model from R1 to V3.1, the results are ruined...
Different models need different prompts, even in the same lineup
biggest learning tbh, hence you have to be mindful when you build agents.
(DSPy fixes this)
I don't see DeepSeek as a provider for 3.2 exp though
And setting all other providers for the model to ignored just gives me a no endpoints error
you probably have to enable this in your settings
DeepSeek
I did enable it just now and still got the error
Let me reopen my app though
Just in case
Seems to have worked
Hey folks, does anyone know why OpenRouter is throwing an error?
You disabled allowing providers to log/train on your inputs
in the Chub app i see deepseek 3.1 base, 3.1 terminus and 3.1 chat
What is the difference?
3.1 Base -> Won't chat with you, it will complete your sentences
3.1 Chat -> It'll be able to chat
3.1 Terminus -> This is 3.1 Chat with minor bug fixes (some repetition issues and mixing of English and Chinese)
thanks bro!!
Hmm, where and what should I turn on?
turn on the first 3 options in the privacy section of your account settings
Oh, thanks. But after doing this, the chat shows an error: "No allowed providers are available for the selected model." Thanks for helping
Did you blacklist Chutes?
No, I didn't even know there was some kind of list )
Do you have option 5 (ZDR) activated?
It should be turned off.
I didn't enable this option.
I'm not sure why you can't use DeepSeek v3.1 (free)
can you show a picture of your allowed/ignored providers section at https://openrouter.ai/settings/preferences ?
Is this the providers section at ?
yes, can you show the top too?
Oh, sorry, the top point didn't make it into the screenshot above.
Hmm, this is weird. All your settings are correct, V3.1 (free) should be working fine for you, hmm.
The account isn't banned. ) Z.AI: GLM 4.5 Air (free) also doesn't work
- Could the models not working be related to geolocation?
- Is there technical support for OpenRouter?
I'm not sure on that front, sorry. Maybe someone else will know, but I'm not aware of OpenInference geo-blocking.
Is this a problem?
i don't think it should be
anyone else's v3.1 free model just being weird
it just gives "1.1..1.1.1..1.11" or similar gibberish
Try another provider
If using a different provider stops that, then you know it's that particular provider's fault
And you all can report that provider here
There's only 1 provider for free tho ;-;
OpenInference
I'll see
I tried both yesterday and the day before, I'm not sure what the issue is
Via OpenRouter chatroom
and aider
Set your temp to 0.3, and if it still happens then it's confirmed it's an OpenInference issue
See if you can reproduce it in a new chat; if it's an OpenInference issue, Toven will yell at OpenInference to fix it
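The two checks suggested above (a lower temperature, and a different provider where one exists) can both be set in the request body. A minimal sketch, assuming OpenRouter's `provider` routing object accepts an `ignore` list. `build_payload` is a hypothetical helper, and the provider name is only a placeholder taken from the chat, not a verified slug.

```python
import json

def build_payload(model, prompt, temperature=0.3, ignore_providers=None):
    """Build a chat-completions request body with explicit sampling and routing."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Set temperature explicitly so differing client defaults don't matter.
        "temperature": temperature,
    }
    if ignore_providers:
        # Assumed routing field: skip the providers suspected of bad output.
        payload["provider"] = {"ignore": list(ignore_providers)}
    return payload

payload = build_payload(
    "deepseek/deepseek-chat-v3.1",
    "Summarize this text.",
    ignore_providers=["OpenInference"],  # placeholder name
)
print(json.dumps(payload, indent=2))
```

Ignoring a provider only helps on a model that has more than one; on a single-provider free endpoint it would leave no endpoints to route to.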
Ah so it seems it defaulted to 1 for chatroom and 0 for aider, I'll try with 0.3
One sec, just gotta get back home
Appears 0.3 works but for some reason it gave me an end of sentence token at the end which appeared on the chat message? https://openrouter.ai/chat?room=orc-1760793523-IhYneTn3lo9GCKT4SW3N
But aside from that it works
I can't see the chat btw
Chats are stored locally, not on server
Oh wait hih
huh
Nvm it's spewing nonsense again
It may be a specific token count cuz it only happened when i pasted a large block of text
Can you post that here?
Based on my experience, if we use the non-free providers and choose a good one, I don't get that even with a large block of context
It's mostly down to the provider of the free models
Which is understandable for free models, but I guess if they can fix it, that would be amazing of them as a provider
Perhaps because of the quantization? OpenInference provides free DS v3.1 at int8 precision, which is lower than fp4, I think
int8 is less quantised than fp4 (so, int8 is higher quality). for some reason i thought openinference v3.1 free said fp4, but i checked and it does say int8
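A rough illustration of why more bits means less rounding error. This is a sketch using plain symmetric uniform quantization at 8 and 4 bits. Real fp4 is a floating-point format with non-uniform spacing, so the 4-bit case here is only a stand-in, and the sample weights are made up.

```python
def quantize(values, bits):
    """Symmetric uniform quantization: snap each value to one of 2**bits - 1 levels."""
    qmax = 2 ** (bits - 1) - 1          # 127 for 8 bits, 7 for 4 bits
    scale = max(abs(v) for v in values) / qmax
    return [round(v / scale) * scale for v in values]

def mean_abs_error(values, bits):
    """Average absolute difference between the values and their quantized form."""
    quantized = quantize(values, bits)
    return sum(abs(v - q) for v, q in zip(values, quantized)) / len(values)

# Made-up weights, for illustration only.
weights = [0.013, -0.42, 0.27, 0.091, -0.155, 0.33, -0.068, 0.5]
err8 = mean_abs_error(weights, 8)
err4 = mean_abs_error(weights, 4)
print(f"8-bit error: {err8:.5f}, 4-bit error: {err4:.5f}")
```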
Agreed, but I don't know about the new NVFP4. I heard that if a model goes through the pre-training, SFT, and RL phases in NVFP4 format, it will be comparable to BF16.
I guess that makes sense, because when DeepSeek did the original training in FP8, the model actually had performance closer to BF16 than if it had been trained in BF16 and then quantized to FP8.
The key point is what format the model was originally trained in.
Yall I'm using the og 3.1 chat
Not the terminus
And it's lagging so hard, I'm using paid
What's going on
I am trying to make a local-language AI chatbot using DeepSeek V3.1 from OpenRouter, but the request URL seems to be broken. I am using https://openrouter.ai/api/v1/chat/completions for the API, but the chatbot shows "URL not found". What could be the cause, and is there a solution?
Seems to work now
response tokens are coming through empty and everything is ending up in the reasoning field today for some reason? tried across several providers, all of which were fine before
It loaded endlessly, only to finally respond with a very short message. But it consumed 3x as many tokens as my input?
I assume it has "invisible" thinking?
Not sure if that is good or bad...
if you click to see more info you can see how many of them were reasoning tokens
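To check the same thing from the API rather than the UI, a minimal sketch that reads the token breakdown from a non-streaming response. It assumes the response dict has `usage.completion_tokens` (counting reasoning tokens too) and `usage.completion_tokens_details.reasoning_tokens`, plus `message.reasoning` alongside `message.content`. `token_breakdown` and the sample response are hypothetical.

```python
def token_breakdown(response: dict) -> dict:
    """Split completion tokens into reasoning vs visible, and flag empty answers."""
    usage = response.get("usage") or {}
    details = usage.get("completion_tokens_details") or {}
    completion = usage.get("completion_tokens", 0)
    reasoning = details.get("reasoning_tokens", 0)
    message = response["choices"][0]["message"]
    return {
        "reasoning_tokens": reasoning,
        # Assumes completion_tokens includes the reasoning tokens.
        "visible_tokens": completion - reasoning,
        # True when text landed in the reasoning field and content is empty.
        "empty_content": not message.get("content") and bool(message.get("reasoning")),
    }

# Made-up response for illustration only.
sample = {
    "choices": [{"message": {"content": "", "reasoning": "long chain of thought"}}],
    "usage": {
        "prompt_tokens": 100,
        "completion_tokens": 300,
        "completion_tokens_details": {"reasoning_tokens": 290},
    },
}
print(token_breakdown(sample))
```

A short visible reply with a large `reasoning_tokens` count would explain a long wait and a token bill several times the size of the input.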