#DeepSeek V3.1

1893 messages · Page 2 of 2 (latest)

celest gull
#

How the hell do I enable reasoning in sillytavern?

rocky sedge
#

Chat completion - preset - enable reasoning

viscid sigil
#

hope so...but why is it so hard for openrouter to give two variants for new deepseek

jolly lintel
#

ah i thought you meant openrouter presets

radiant gust
#

The reasoner page seems to have been deleted, was it merged?

outer rover
#

Yup

radiant gust
#

Aight thanks

celest gull
#

It's still not reasoning

rocky sedge
celest gull
#

No reasoning at all

#

I guess I should just prefix with <think>?

rocky sedge
#

Try it, should work

raw beacon
#

reasoning "enabled": true doesn't give any reasoning in API. works on other hybrids

rancid geode
#

Oh nice, perfect, haha

#

Thank you 🙏

celest gull
#

Okay you have to change it from "Auto" to "Medium"

#

That seems to work

polar locust
raw beacon
#

setting reasoning to false behaves the same as true, so it's not working

celest gull
#

How do I prompt/re-orient its reasoning? It seems to completely ignore the format I'm prompting it to follow for its reasoning.

nocturne kelp
celest gull
#

Yeah when I prompt it to format its thinking in a certain way, it ignores it in thinking step and just does whatever and then follows my instructions outside the reasoning step

#

Am I missing something?

raw beacon
#

it works now but did not when i said it

celest gull
#

Yeah wtf it completely ignores my instructions

#

Is there some sort of format for it?

#

My instructions work for GLM 4.5, but not deepseek v3.1

marble panther
celest gull
#

Yes

#

GLM 4.5:

#

Deepseek v3.1:

#

It doesn't even attempt to follow the format

celest gull
#

I just tried r1 0528 and it followed the format/instructions perfectly for its thinking step

polar locust
#

The deepseek chat(app) clearly uses one

celest gull
#

It's broken on the chat app too

#

Doesn't follow instructions for its thinking step at all

#

Or it partially does but it misunderstands it completely

rocky sedge
#

Try messing with Temperature

celest gull
#

Doesn't seem to be a temp issue at all

#

It SORT of follows the instructions but doesn't follow it accurately at all

#

It really half-asses it

#

And 50% of the time it ignores it and just does it in the actual reply instead of on the thinking step

#

Extremely disappointing so far

rocky sedge
#

What about reasoning effort HIGH or Maximum?

celest gull
#

Doesn't do anything

#

Likely has to do with its "Higher thinking efficiency"

#

Aka it's benchmaxxed for that

rancid geode
#

I don't value the mainstream benchmarks a whole lot, but I do like seeing all of the pretty colored bars lined up. deepseek v3.1 looks very strong according to them, and I would agree with that

rocky sedge
#

Mmmmm, bars 🤤

storm seal
#

QwQ also doesn't follow instructions for formatting its reasoning

#

I haven't tested Qwen3 yet though

copper vessel
#

Qwen3 is hit or miss. I usu have just been going Gemini Flash

#

And Gemini Pro or Sonnet when I hit a bump with that

copper vessel
rancid geode
#

oh, i did not know artificial analysis did total cost used to run the bench:

copper vessel
#

Oh is cost a factor?

rancid geode
#

imho

copper vessel
#

I guess it's the "cost factor" cause it is expensive, but I don't think it scales linearly in terms of functionality

rancid geode
#

maybe deepseek is, not sure. it's a little early for me to feel confident about that

storm seal
#

what provider do they use?

copper vessel
#

Good point

rancid geode
#

idk, i assume official api, but yeah it does play a big role

#

I guess we could use this to calculate cost for any provider;

#

wait thats just output tokens

#

nvm

#

I always thought the output speed metrics on artificial analysis were dumb: 1. If it's OS you can use a fast provider. 2. Official APIs fluctuate, sometimes they're fast, like right after release, and other times they're so slow

limber elk
#

How do I enable or disable reasoning?

copper vessel
#

The other thing is honestly claude sonnet is so much better than most of the others IMO that "costing less" barely matters. If I spend x5 as much and it builds me something that works in half as much time as something else, or something else just can't get there without breaking it up into minute steps, I'll gladly pay that

copper vessel
celest gull
#

With or without thinking?

modest smelt
rancid geode
rancid geode
rocky sedge
#

It is? Only thing I remember about it is being super dry and boring, but handling long context like a pro

wintry jolt
#

Just me or is the CoT less fun to read? There used to be so much emotion, i swear.

spice marsh
#

I guess, lol

#

Not much more of "wait!" "Aha" "I think I'm doing this wrong", etc

#

It's very structured now, a numbered list of short steps

copper vessel
#

Ah tokens. Looked quick thought it was overall cost

#

Is GLM4.5 really that verbose? Haven't tried it yet

#

GLM4 seemed fine on webapps

drowsy estuary
#

Good lord. Almost 7k tokens just to get the wrong answer to an anti-riddle. How do I turn off thinking in the chatroom?

#

Anti-riddle is, "A goat, who is dressed up as a farmer, is allergic to cabbage, but is wolfing down some other vegetables, before crossing a river. What is the minimum number of trips needed?"

Sonnet and Gemini Pro 2.5 correctly realize that it's not the original riddle and get the answer right (one trip). GPT-5 fails

drowsy estuary
#

DS

#

Just mentioned GPT 5 because it also got the answer wrong. It's a trick question for models that tend to answer the riddle they're trained on rather than a variation. It helps test instruction following IMO

#

Sonnet was clever enough to ask clarifying questions before answering

#

Gemini's answer

rancid geode
drowsy estuary
#

Deepseek's answer

#

(after 7,000 tokens and 300 seconds of reasoning, lol)

rocky sedge
#

Goat being allergic to cabbage so it can't be near it sounds logical

drowsy estuary
#

I mean technically there might not even BE any cabbage there, it's just an unrelated fact that the goat is allergic

rocky sedge
#

Technically you don't mention a boat, so it can swim, wait for a cruise ship or whatever.

#

Call for an Uber through a bridge

drowsy estuary
#

Yeah, I would accept those as creative valid answers

rocky sedge
#

Make them write a joke starting with

" A wolf, a goat, a farmer and a cabbage walk into a bar" and evaluate the results

marble panther
rocky sedge
#
A wolf, a goat, a farmer and a cabbage walk into a bar.

The bartender looks up and sighs.
“Let me guess—you need a boat to get everyone safely across the street to the late-night diner?”

The farmer nods.
“Exactly. But if I leave the wolf alone with the goat, he’ll eat him. If I leave the goat alone with the cabbage, he’ll eat it. And if I leave the cabbage alone with the wolf, they’ll start a weird low-carb diet together and I’ll lose my entire business model.”

The bartender shrugs, pours four waters, and pushes a coaster across the bar.
“Plan your seating order on this. First round’s on me if you can solve it before last call.”

The wolf growls, “I just wanted a Bloody Mary—no garnish.”
The goat bleats, “House salad, hold the croutons.”
The cabbage rustles, “Do you have any vegan wings?”

The bartender squints at the farmer.
“Buddy, you’ve got bigger problems than river logic—you’ve got a talking salad.”
marble panther
#

how are you guys getting it to think over api?

```
  provider: {
    order: [
      'fireworks',
      'parasail',
      'lambda',
      'deepinfra'
    ],
    sort: 'throughput',
    allow_fallbacks: false
  },
  reasoning: {
    enabled: true,
    exclude: true,
    effort: 'low'
  }
},
```
but it has no reasoning completion tokens, it just responds instantly
storm seal
rocky sedge
#

Comment out exclude and effort and try without them
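A minimal sketch of the trimmed-down request body suggested here, assuming the chat completions endpoint accepts a `reasoning` object with the `enabled`, `effort` and `exclude` keys used in the snippet above. The helper name `build_payload` is hypothetical, and nothing is sent over the network; it only shows what the body looks like with `exclude` and `effort` left out.

```python
import json


def build_payload(prompt, enable_reasoning=True, effort=None, exclude=None):
    """Build a chat-completions request body (hypothetical helper)."""
    payload = {
        "model": "deepseek/deepseek-chat-v3.1",
        "messages": [{"role": "user", "content": prompt}],
    }
    # Start from the bare flag; only add the optional keys when asked to.
    reasoning = {"enabled": enable_reasoning}
    if effort is not None:
        reasoning["effort"] = effort
    if exclude is not None:
        reasoning["exclude"] = exclude
    payload["reasoning"] = reasoning
    return payload


if __name__ == "__main__":
    # Variant with exclude and effort "commented out".
    print(json.dumps(build_payload("Hello"), indent=2))
```

Comparing the response for this body against one built with `effort="low", exclude=True` is one way to check whether the extra keys are what suppresses the reasoning tokens.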

marble panther
#

but why though lol

marble panther
rancid geode
storm seal
#

wont do anything

storm seal
marble panther
storm seal
marble panther
#

seems odd, like why doesn't it work with regular parameters

#

here's hoping we get some non thinking providers that will offer a discount or something..

#

v3 was kinda dumb but the pricing was right for some applications

rancid geode
storm seal
#

:thinking got deprecated

marble panther
#

OpenRouter API error: No endpoints found for deepseek/deepseek-chat-v3.1:thinking.

#

okay I give up, non thinking it is

rocky sedge
#

Try the DeepSeek provider explicitly, I suspect some providers may not be compatible with thinking

marble panther
#

but doesn't fit my provider settings.. excludes loggers and trainers

rancid geode
glacial reef
#

@nocturne kelp why is there 2 deepseeks?

jolly lintel
glacial reef
#

But for the other providers, how do you determine if it thinks or not?

nocturne kelp
#

the reason there are two is so that if you request a large max_tokens with reasoning you can get it

#

the non-thinking deepseek provider endpoint is only 8k output

#

and the thinking endpoint is 64k

#

they are both the same thing. but it just allows our system to have both max outputs

tardy frost
#

did instruct drop

rancid geode
#

Yep

tardy frost
#

is it hybrid?

ivory epoch
#

yep

tardy frost
#

is it marked as deepseek-chat in deepseek api

jolly lintel
#

deepseek-chat for non reasoning and deepseek-reasoner for reasoning

#

afaik

rancid geode
rancid geode
ivory epoch
jolly lintel
#

their official docs say this too
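A small sketch of the split described above, assuming the two model names quoted in this chat (`deepseek-chat` and `deepseek-reasoner`) and the rounded "8k" and "64k" output caps mentioned earlier for the two endpoints. The function name `pick_model` and the exact cap values are illustrative assumptions, not documented limits.

```python
# Rounded output caps quoted earlier in this chat ("8k" and "64k");
# the exact limits are an assumption and may differ.
OUTPUT_CAP = {"deepseek-chat": 8_000, "deepseek-reasoner": 64_000}


def pick_model(thinking, max_tokens):
    """Return (model_name, clamped_max_tokens) for a request."""
    model = "deepseek-reasoner" if thinking else "deepseek-chat"
    # Never ask for more output than the chosen endpoint is said to allow.
    return model, min(max_tokens, OUTPUT_CAP[model])
```

For example, `pick_model(True, 100_000)` picks the reasoner name and clamps the request to the larger cap.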

rancid geode
#

Will it be DeepSeek-V3.1?

tardy frost
#

deepseek v3.1 sucks

#

wonder if theres deepseek-v3.1:free already

#

ok there isnt

rancid geode
jolly lintel
#

oh my god already so many providers

tardy frost
rancid geode
#

From my use in Qwen cli, deepseek v3.1 is very good, comparable to sonnet 4

jaunty basin
#

Wonder if Deepseek-reasoner in the official api finally supports temp, top p, etc

tardy frost
#

can you force thinking on openrouter by just adding :thinking suffix to it like deepseek/deepseek-chat-v3.1:thinking

wintry jolt
#

Improvements on reasoning version, slight downgrade on non reasoning it seems.

rancid geode
rocky sedge
# wintry jolt

It's always weird how some models handle 120k-long context better than 2-4k

#

Most become worse gradually, but not all

broken cliff
vapid torrent
tardy frost
spice marsh
rocky sedge
#

That's using OpenAI-compatible modules, there is also a pure API request

spice marsh
#

I think the dot indicates JSON nesting

glacial reef
#

is there a recommended temperature?

#

for this model

jolly lintel
#

not written anywhere explicitly, but they say they ran benchmarks with "tested multiple times using varying temperature settings to derive robust final results" and their local run example uses 0.7 temp

torpid viper
#

Does anybody understand how to apply the tools description? Couldn't find a for loop over a tools section in the chat template

nocturne kelp
#

why isn't this in the chat template

#

wait nvm

spice marsh
#

Reasoning models (transcribed with Gemini 2.5 Flash and not double checked)

#

Non-reasoning models (transcribed with Gemini 2.5 Flash and not double checked)

marsh grove
#

For example, within aider I do the following:
aider --model openrouter/deepseek/deepseek-chat-v3.1 --reasoning-effort high

upbeat spruce
#

It isn't handholding, it is just giving the AI a nudge to get it to the right tone or follow the instructions more closely. You do it once and the AI will keep up the new tone for the rest of the chat. Good AIs like Claude still sometimes hold back too and need that little OOC kick to get all the way there.

wise crystal
#

Seems like this new deepseek v3.1 is heavily trained on gemini 2.5 pro data

#

It is plagued with the "Of course" slop now

#

The good side is this model will be good for coding

#

The bad side is this model will be unpleasant for general chatting

#

They should have sanitized non coding training data to preserve the original deepseek vibe

#

Now the model feels as if it is possessed by gemini

storm seal
#

I don't want to be too harsh on DS though

#

It's amazing that they're fighting with the top AI labs who have billions in funding while they have only a few hundred million

wise crystal
#

I'm pretty sure they realized this problem too

#

Maybe they just caught it a little bit late

drifting hare
#

Hi why are my API requests being forwarded to the base model?

#

I am receiving gibberish over the API


jolly lintel
drifting hare
#

Works fine in the chatroom

#

I even checked the request

jolly lintel
#

weird

storm seal
#

maybe Chutes deployed the wrong model?

nocturne kelp
#

is it only happening on chutes?

jolly lintel
#

i just tried in chatroom with chutes i didnt have an issue

nocturne kelp
#

we don’t forward your requests to other models

#

that’s just not how it works

jolly lintel
#

could be a template issue maybe?

jolly knot
jolly lintel
#

some edge case like triggered it maybe

spice marsh
pure path
#

How are prefills working on the providers for everyone else? So far, on Parasail, it's giving me stuff completely unrelated to my prompt (non-prefill works as normal though)

pure path
pure path
untold badger
untold badger
vapid torrent
drifting hare
vapid torrent
#

will be up very shortly

#

running unit tests

vapid torrent
rancid geode
vapid torrent
#

to get deepseek v3.1 with thinking it's deepseek/deepseek-chat-v3.1$think

rancid geode
#

that looks easy

rancid geode
viscid sigil
vapid torrent
#

im worried someone would ddos it and cost me thousands

#

if someone else wants to its MIT licenced 😃

serene idol
#

Do i choose the base one or the chat

#

For rp

steel oar
#

the chat

#

base one is more like autocomplete

vapid torrent
#

base model mainly for research

steel oar
#

though V3.1 seems to give really short responses in RP

vapid torrent
#

have you tried to change the system prompt?

steel oar
#

V3 0324 will give much longer reply to the same prompt... but maybe I need to adjust the prompt for V3.1

vapid torrent
#

yeah, different models have different prompting techniques

serene idol
#

Which template to use

#

The deepseek one

#

And instruct

vapid torrent
#

what do you mean 'template'?

#

The chat template is automatically applied by the /chat/completions endpoint

serene idol
#

Oh, I'm using text completion. Am I supposed to use chat?

vapid torrent
#

text completion is for base/instruct models, chat completion is for chat models

serene idol
#

Ohhh..okay

#

Let me change it

jolly knot
#

i thought text completion was base and chat was instruct

steel oar
serene idol
steel oar
serene idol
#

Gmicloud q

#

I don't know much about providers

#

I choose it because it sounded cool

vapid torrent
# serene idol Gmicloud q

Does SillyTavern let you pick a provider, or do you have an account with them directly? Or did you use the proxy thing?

#

I have a GMICloud account and they work quite well, never had an issue with model performance

serene idol
steel oar
#

just tried to test GMICloud and it throws a 429 rate limit...

viscid sigil
#

so if i am using official deepseek api, do i set my temperature to 1.5 for creative writing?

#

are they still following that 0.7 rule

polar locust
outer rover
#

Just woke up to find out there is still no free version feelsbadman

steel oar
jolly knot
#

I don't think this is better than gpt 5

eager lava
#

FWIW from very early tests, this one seems even better in Swedish than V3 so Scandinavians, maybe even Euros in general may be interested. More natural prose and flow, kind of like in the Claude direction. Combined with significantly improved coding capabilities and improved tool calling, I might just look into this one as a workhorse model. It's no GPT-5 or Claude 4.1 Opus killer or whatever but certainly among the best you get among open models and at a great price/performance ratio. Performs well on LiveBench and Artificial Analysis.

storm seal
digital snow
#

what curious recs

steel oar
#

They haven't updated that page for a while

rocky sedge
#

It's possible to test: for me, DeepSeek using a "real" temperature of 1.7 breaks into a mess of random words and special characters. So if the DeepSeek provider deducts 0.7 from temp, making it 1.0, it should be coherent

steel oar
vapid torrent
rocky sedge
#

If temp is <1, they make it (temp*0.3)

#

If >1, they make it (temp - 0.7). At least they did that before
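The mapping described in the last two messages can be sketched as a small piecewise function. This is only an illustration of what was said in this chat, assuming the API temperature range is 0 to 2; the name `effective_temperature` is hypothetical and whether the provider still applies this mapping is exactly what is being tested here.

```python
def effective_temperature(api_temp):
    """Map the temperature sent to the API to the one assumed to be used."""
    if not 0 <= api_temp <= 2:
        raise ValueError("api_temp is assumed to lie in [0, 2]")
    if api_temp <= 1:
        # Lower range is scaled down.
        return api_temp * 0.3
    # Upper range is shifted down by 0.7.
    return api_temp - 0.7


if __name__ == "__main__":
    for t in (0.5, 1.0, 1.3, 1.7, 2.0):
        print(t, round(effective_temperature(t), 2))
```

Both branches give 0.3 at an API temperature of 1, so the mapping is continuous, and an API temperature of 1.7 comes out at about 1.0, which matches the test proposed above.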

vapid torrent
#

Ahh so it’s a piece-wise linear?

#

like a translated leaky ReLU?

steel oar
rocky sedge
steel oar
#

They recommended 0.6 in R1 0528's model card and recommend 0.6 in V3.1's model card

rocky sedge
#

Just try and set 1.7 through DeepSeek provider with thinking enabled

#

I can't right now

steel oar
rocky sedge
#

I remember breaking R1 with high temp, but don't remember what provider it was

steel oar
#

I just tried set temperature to 2 (deepseek official api), and the output is still pretty normal

#

Other providers seem to output nonsense at 2 (seems to be probabilistic, could be normal or nonsensical)

rocky sedge
#

Then it probably still does the -0.7 stuff, because at 1.3 it's borderline ok

rocky sedge
#

So we can assume they still translate raw temperature values into their internal formula

outer rover
#

Did any provider by any chance announce a plan for free version yet?

#

Got no idea how long that usually takes from model release

covert gust
#

Tested DeepSeek V3.1:
Hybrid model, that supports light thinking

Non-Thinking:
Same verbosity as V3 0324
Comparatively, smarter overall, but performed noticeably weaker in coding tasks

Thinking:
+125% token use. 64% of tokens were spent on reasoning.
This is very light reasoning, ~45% less verbosity than R1 0528
Compared to non-thinking, the thinking did very little if anything to improve final response quality. In fact, it was mostly even or slightly worse on some tasks.
During evaluation, it reminded me a lot of Sonnet 4 thinking in terms of reasoning token benefits.
Thus, enabling thinking proved highly ineffective in the totality of my testing.

**Chess** performance remained poor (~650 starting Elo), around V3 level.

Overall, compared to V3 0324 this is a small upgrade, except for (non-tool) coding where it's a noticeable downgrade imo. (example demo pages available)
Compared to R1 0528, the model lags behind severely in general intelligence and is not a replacement.

Imo, for general use case, nonthinking DeepSeek V3.1 is a good option.
Overall, I was rather disappointed with the hybrid performance, so I'm not sure it's the right approach - but YMMV

rocky sedge
#

I never saw a reasoning model doing worse than non-reasoning

covert gust
rocky sedge
covert gust
#

well thats fine but I don't parrot other scores, I simply share my results. if you don't find them helpful to you, you can simply ignore my own testing and see what fits your own usecase

rocky sedge
#

At 120k, it's 62 vs 53, but I don't know how many passes they do

rocky sedge
covert gust
#

yea, also I can see that livecodebench improved scores (it's in their marketing). However, marketing has zero influence on what I report, which will always be the results achieved regardless of anything else (whether everyone agrees or everyone disagrees, the results would be identical in either case)

rocky sedge
#

I was already suspicious about some results from big sites being 'gently nudged' from companies or models just being benchmaxxed

#

The chance of a smaller enthusiast / indie author being biased is very low

covert gust
#

if anything I would be biased towards positivity. I like deepseek models a lot and we need more open models. this one just didn't do well unfortunately for me

#

maybe R2 will knock it out of the park, who knows

rocky sedge
#

Well, that's IF they won't ditch R2 in favor of all-in-one hybrids

covert gust
rocky sedge
#

They probably went the V3.1 hybrid route before Qwen3 admitted it was the wrong path for developing a model

snow ingot
#

Hello
Is it just me, or is reasoning now disabled on 3.1 in OpenRouter?
I see that you can specifically pick deepseek-reasoner only from Deepseek API
but for some reason using 3.1 on OpenRouter no longer returns reasoning for me
pardon if my question is ignorant

covert gust
rancid geode
rocky sedge
#

2 is maximum iirc

wintry jolt
#

My convo somehow completely broke deepseek on the third message ._.

spice marsh
#

Speechless

polar locust
#

Baffled

twin cape
vapid torrent
vapid torrent
#

how did you re-enable reasoning?

vapid torrent
#

As in what did you try to do to enable reasoning

modern rune
#

If you send the messages back to it without adding a new message of your own - or send an empty one, regardless of whatever is in the message chain - it absolutely goes nuts and dumps out huge amounts of random crap. Happens in both reasoning and non-reasoning. Didn't use to happen in previous versions.

If you want to easily replicate this for yourself, you can do it in Openrouter Chat. Exchange a normal message and get a reply, then send a follow up message and get a reply. Edit your follow up message to be empty and regenerate the second response. It's... quite something.
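A minimal client-side guard against the situation described above, assuming OpenAI-style message dicts with `role` and a string `content`. It does not fix the model behaviour; it only avoids sending a request when there is no new, non-empty user message at the end of the chain. The function name `should_skip_request` is hypothetical.

```python
def should_skip_request(messages):
    """True when the chain does not end with a non-empty user message."""
    if not messages:
        return True
    last = messages[-1]
    if last.get("role") != "user":
        # Nothing new from the user since the last reply.
        return True
    content = last.get("content") or ""
    # Whitespace-only messages count as empty.
    return not content.strip()
```

A frontend could call this before every regenerate and either block the request or ask for input when it returns `True`.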

steel oar
#

just tried, indeed get nonsense, though deepseek provider gave an empty response instead

twin cape
#

Or it's just a bug

dusk rover
#

in certain very niche/technical tasks differences may emerge, though I don't think those differences are uniform.

covert violet
#

which providers do you guys recommend?

wintry jolt
silver nova
#

There’s software on GitHub called Cherry Studio, which lists many providers. At the moment, I think OpenRouter is quite good, while the others are hard to describe.

vapid torrent
#

deepseek v3.1's thinking is very strange. it often calls me the assistant

silver nova
#

Yes, compared to r1, its programming ability seems improved, but everyday conversations are always quite strange.

twin cape
twin cape
rancid geode
hollow steppe
#

IRT the hybrid stuff, I find it hard to believe Qwen was wrong here. Surely having reasoning training allows the model to off-load certain task performance structures to thinking tokens. Like if you asked me to solve long division in one-shot vs writing it out, I'm not going to use the same technique.

Yes, I know they don't actually reason, but that's irrelevant when I'm talking about off-loading.

vapid torrent
#

Official DeepSeek reasoner endpoint

rancid geode
hushed hedge
#

this model is like so good, it talks like gpt, codes like deepseek

#

all the major models got negative growth after it was released

#

although it's a little expensive to use

vapid torrent
#

The model acts very strangely though

#

This is not the thinking mode, it just randomly invented a thinking tag

#

This kind of shenanigans is probably why qwen dropped hybrid models, no other model does this

#

And it’s not a “provider issue” this is official DeepSeek api endpoint

#

It’s also quite sycophantic

#

Okay and now it’s just thinking.

hushed hedge
#

use deepseek v3.1 from open router , this looks different

vapid torrent
#

or do you mean the provider

hushed hedge
#

the model name

#

what is that?

vapid torrent
#

i renamed it on the front end

#

it points to deepseek-chat on deepseek’s official api

#

Which is v3.1 non thinking

#

I don’t use openrouter on DeepSeek as I want the context caching

hushed hedge
#

why you not using chutes?

#

oh

#

ok

#

i am using chutes, like it rarely hallucinates, most of the time it does no thinking at all

#

but even without thinking it can write my programs

vapid torrent
hushed hedge
covert gust
vapid torrent
covert gust
#

any pricing goes up and down. currently that's the case. I also don't know chutes price in 2 weeks

rocky sedge
#

Like retrying same long context benchmark for 5 passes, or tool use with 32k+ input

vapid torrent
#

Long chats also benefit from caching

#

like a 40k+ chat often has 90%+ caching

glad basin
hushed hedge
#

i kinda reset chat too often, usually when the work is done or it starts hallucinating

buoyant willow
#

waiting for v3.1 free

rocky wagon
#

Wow, it’s not free yet? Those providers are lazy I see

gaunt edge
thorn sky
#

I would prefer a provider with caching rather than a free model, which DeepSeek officially offers is too slow

hushed hedge
polar locust
hushed hedge
polar locust
# hushed hedge tao?

Basically, Chutes doesn't own GPUs (they do own some, but not those 8k or whatever GPUs at once). People lend GPUs to them, and Chutes pays the real owners in TAO (crypto)

#

That's where they get the “Decentralised” stuff

hushed hedge
#

so they are renting people's gpus with crypto in return

polar locust
#

Ye

#

Cool stuff

#

Especially with how they connect those GPUs from around the world

hushed hedge
#

it's surprisingly fast considering how many different PCs it has to use

rancid geode
rancid geode
polar locust
#

That's why they didn't lose a lot when they introduced the unlimited-reqs a few months back

rancid geode
#

That makes more sense then

lost carbon
#

Also , chutes has no clue if a particular gpu provider is actually logging your data or not.

#

I assume they must be logging a ton of gooner chats

hushed hedge
lost carbon
#

I don't use them

near sparrow
#

Chutes is so slow I'd rather just pay up atp

#

Especially since the rate limit a while ago

rancid geode
#

I personally don’t use paid providers as well and the paid ones I do use I try to understand their respective privacy policy/ToS, but I was still wondering a bit. The ones that make the most sense to me are what Google, Alibaba, etc. give as free access in their tools: I know it’s temporary and will be used as training data for future models. It was just the other providers that I did not understand, because as far as I’m aware they are not training models

#

And can just sell the data to advertisers and etc. but I guess it depends on how personal the data that people put in the ai is, maybe it’s worth a lot, haha

abstract igloo
rocky wagon
#

I'd prefer it to be free even if a provider logs my data. I didn't pay 10 dollars to OR for nothing (I want those 1000 messages daily for my free models)

#

If a provider's employee wants to read my logs about femboy breeding, they can be my guest I guess

storm torrent
outer rover
#

I didn't do a lot of testing with 3.1 cause I'm poor af. Is it a meaningful enough upgrade from 0528 in RP?

modest smelt
hushed hedge
outer rover
marble panther
#

if you liked its schizo side you'll miss that in 3.1

#

3.1 is more like gemini than the old deepseek r1 style

outer rover
# marble panther

That actually sounds pretty good. Thanks for taking the time to answer. I have only been testing it in small doses to keep expenses down, but overall I do prefer it so far.

pearl magnet
#

What's going on with providers for this model? Chutes and Deepseek itself have disappeared for me. Maybe temporary maintenance or something? Is that how it works for openrouter, they just disappear from the page if they are down or whatever?

Nvm ignore me, seems it changed to hide ignored providers and I had logging disabled

marble panther
pearl magnet
#

because yeah, its providers with logging being hidden. I was using chutes literally a few hours ago and haven't touched my settings since then. Deepseek was visible on there too

#

my bad I guess?

marble panther
#

"SubModel provides DSv3.1 with fp8 and lower price - $0.2/0.8"
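A rough cost sketch for prices quoted in this form, assuming "$0.2/0.8" means US dollars per million input tokens and per million output tokens respectively (the usual convention, but an assumption here). The function name `request_cost` is hypothetical.

```python
def request_cost(input_tokens, output_tokens, in_price=0.2, out_price=0.8):
    """Estimated cost in dollars, with prices given per million tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    # Example: a 40k-token chat history producing a 1k-token reply.
    print(f"${request_cost(40_000, 1_000):.4f}")
```

Swapping in another provider's two prices gives a like-for-like comparison for the same token counts; reasoning tokens, where billed, would count toward the output side.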

pearl magnet
#

Ok its an openrouter issue. I have paid endpoints that may train on inputs enabled but its ignoring them

jolly lintel
pearl magnet
# jolly lintel there may be some caching, wait a few minutes

But I haven't changed the setting. I didn't have this issue about 8 hours ago

Edit: Issue randomly fixed itself without me touching anything other than removing and re-adding an ignored provider to test, as adding that is the only change I made today. Very strange 🤔

marble panther
worthy charm
#

Hey guys, can someone help me?

I'm using AI SDK with OpenRouter ai-sdk-provider and DeepSeek V3.1.

My issue is: the returned reasoning is empty, and the reasoning text is added to the final message instead. Does anyone know why or how to fix it?

This is my configuration:

'chat-model': wrapLanguageModel({
  model: openrouter('deepseek/deepseek-chat-v3.1'),
  middleware: extractReasoningMiddleware({ tagName: 'think' }),
}),
lost carbon
worthy charm
#

I think that by default Deepseek 3.1 uses thinking mode

worthy charm
# lost carbon are you using thinking model? also , https://openrouter.ai/docs/use-cases/reason...

this is the full API response:

If you notice the reasoning is empty

[
    {
        "id": "5192b330-8478-4cd2-a347-e38f5a7a4bec",
        "chatId": "2cbe9177-020a-4ffe-8f15-3308c1fdd165",
        "role": "user",
        "parts": [
            {
                "text": "Hello, what is your name?",
                "type": "text"
            }
        ],
        "attachments": [],
        "createdAt": "2025-08-24T01:21:20.879Z"
    },
    {
        "id": "e6cb68a4-c762-4fb3-aaf6-c1c93e7b6902",
        "chatId": "2cbe9177-020a-4ffe-8f15-3308c1fdd165",
        "role": "assistant",
        "parts": [
            {
                "type": "step-start"
            },
            {
                "type": "reasoning",
                "reasoning": "",
                "details": [
                    {
                        "type": "text",
                        "text": ""
                    }
                ]
            },
            {
                "type": "text",
                "text": "Hmm, the user is asking for my name. This is a simple introductory question. I should respond with my name and a friendly greeting to start the conversation. \n\nI'll keep it warm and inviting by introducing myself as \"Weather Assistant\" since I have weather-related capabilities, and then ask how I can help them today. This sets a positive tone for the interaction. \n\nThe response should be concise but welcoming - no need for lengthy explanations since this is just an introduction.Hello! I'm Weather Assistant. How can I help you today?"
            }
        ],
        "attachments": [],
        "createdAt": "2025-08-24T01:21:23.387Z"
    }
]
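A hedged sketch of tag-based splitting, assuming the provider returns the reasoning inline in the text and terminates it with a closing `</think>` tag, possibly without a matching opening tag. That is an assumption about the output format, not something confirmed in this chat, and the helper name `split_reasoning` is hypothetical. In the response pasted above no tag is visible at all, and in that case this function returns everything as the answer, so tag extraction alone would not separate the two parts.

```python
def split_reasoning(text):
    """Return (reasoning, answer), split at the first closing think tag."""
    close_tag = "</think>"
    idx = text.find(close_tag)
    if idx == -1:
        # No tag found: nothing can be separated out.
        return "", text
    reasoning = text[:idx].lstrip()
    # Drop an opening tag if the provider happened to include one.
    if reasoning.startswith("<think>"):
        reasoning = reasoning[len("<think>"):]
    answer = text[idx + len(close_tag):].lstrip()
    return reasoning.strip(), answer
```

If the provider exposes reasoning in a separate response field instead, reading that field is preferable to parsing the text.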
lost carbon
worthy charm
lost carbon
rocky sedge
viscid sigil
#

glm 4.5 deserves higher ranking...its just so good

#

Even hermes 4 better than GLM?

#

Hermes 4 was so bad when i used it for roleplays

serene idol
marble panther
#

this is Kimi K2
```
Chapter 6 – A Quiet Knife of Diplomacy  
Aboard transfer-cutter Hekate, descent vector 14° off the ecliptic

Arthur Neegan sat on a crash-couch designed for a man two centimeters shorter and a great deal more resigned. The cabin had no windows—unnecessary on a thirty-minute ballistic hop—but a small display screen on the wall cycled through system diagnostics in green text. Green was supposed to be soothing. It merely reminded him of Martian skies at dawn, before the first strip-miners had stained them sienna.

He flexed his wrists against the soft cuffs that were not quite restraints. The polite Antares lieutenant had called them “safety belts,” yet the magnetic locks clicked shut with the same finality as any security bracer. Politeness, Arthur decided, was simply violence wearing velvet gloves.```

this is GLM 4.5
```Model Output:
The low thrum of the transfer vessel’s drive was a constant vibration in Arthur Neegan’s bones, a counterpoint to the frantic pulse in his throat. He sat rigidly in the utilitarian passenger couch, its worn synth-leather cool against his palms. Outside the thick viewport, the ochre deserts of Mars shrank below, soon replaced by the infinite, star-dusted black of interplanetary space. A small display screen on the wall cycled through various system diagnostics in glowing green text: ‘Drive Stability: Nominal’, ‘Life Support: Optimal’, ‘Nav Lock: Antares Orbital’. The last line felt like a brand.

*What have I gotten involved in?* The question echoed, louder than the drive’s hum. Just forty-eight hours ago, his biggest concern was calibrating the sonic resonator on his small, independent claim near Syrtis Major. Then the deep-core survey ping had screamed – not the usual trace elements, but a vein of Azra so dense, so pure, it defied comprehension. His unassuming plot, inherited from his father, was suddenly the most valuable real estate in the solar system.```
sharp quest
#

My main complaint with kimi is that it insists on making it as short as it can, V3.1 is at least decent at that

rocky sedge
#

Kimi K2's writing of SCP time travelling letters from bench was good

marble panther
#

I will say GLM 4.5 is competent

#

its prose is nothing fancy, it writes at like a grade 8 level

#

but it's still enjoyable, just not "painterly" like kimi which reads like a real novel

rocky sedge
#

Kimi K2 writing is good, but sometimes loses plot points like relation status, name drops, and probably hurts at long context
GLM 4.5 with reasoning should handle it better, like double pass the same data

marble panther
#

but it's also a base model

#

a thinking version will come soon

sharp quest
#

Also, in my experience its just incapable of doing long outputs

marble panther
#

what do you mean by long?

sharp quest
#

its V3's problem compounded

rocky sedge
#

The longer the prompt context, the more beneficial reasoning is. Plus, of course, Kimi K2 is censored

#

Uncensored Kimi K2 with reasoning and 256k context would be game changing

marble panther
#

kimi k2 seems to have no problem with long outputs at least in the benchmarks

#

at least in raw "length" but it does not fare well in the longform writing bench

sharp quest
#

thats the thing, i use my models for writing

marble panther
rocky sedge
marble panther
#

so here's hoping their thinking version has better long context

rocky sedge
#

You can start with Kimi until 32k context then switch to thinking models. Or try 2-1-2-1-2-1 switch between non-think and think

#

Mixing styles of both to avoid repetition but extract details from time to time

marble panther
#

I like the idea of structure using good long context models then kimi to write the prose for each smaller segment

#

just feed it character information, plot points, summary of story thus far

#

and let it paint over the rough edges

twin cape
#

I hate this model.

#

i spent so much time fixing bugs due to DeepSeek not changing the model id in API. and then this model performs worst in my coding eval. makes me question reality.

rancid geode
#

We know some of the main drawbacks of hybrid reasoning models, I would guess the main benefit is prompt caching, prob reduces cost when the same model is used for both reasoning and non-reasoning. Outside of that I’m not sure why it’s beneficial

#

From deepseek’s perspective I’m sure it’s tempting because they only need to host one model weights and with how it’s harder for them to get GPU’s that would likely be beneficial

rocky sedge
#

Prompt caching is provider side, I think they can do shared prompt cache for all their models. I hadn't tried DeepSeek caching during separate models era

rancid geode
rocky sedge
#

I am not 100% sure if that's how it works

rancid geode
#

Might have to look into it to see if any providers do that

#

The path I thought companies would take with hybrid reasoning models is one where the model would not think at all on an easy question and think for 5+ min when it’s really really hard (I know time is a horrible way to describe this because it’s dependent on compute, but you get the point, haha). I am assuming that's easier said than done though; it seems that models can’t comprehend where they may screw up. How many r’s are in the word strawberry was the most famous example of this

twin cape
rancid geode
#

This is just vibe based, but I’m liking deepseek v3.1 with no thinking more than with thinking, it def feels like an improvement over 0324

#

I think it will be my go to moving forward, with GLM 4.5 subbing in at times, going to try to live without sonnet 4 for a bit

tranquil hornet
#

how were the responses of v3.1 on crypto chart and trading raw data?

lost carbon
outer rover
#

Got really excited to see the free version, then found out its censored to hell. Brilliant

rocky sedge
#

How is it even possible

outer rover
#

Even in the way it writes it seems very different to the paid provider version. Its kinda bizarre and I hate it.

#

I am willing to give some level of new provider issues benefit of doubt, but its clearly more than that.

#

It also seems totally resistant to my jailbreaking set up. My set up makes even Kimi K2 fully uncensored, so this is interesting in a really annoying way.

sharp quest
#

openinfra being openinfra

wintry jolt
#

I jbd it on the official website by accident

cerulean grove
#

i dont understand how to get the thinking v3.1 using typingmind's openrouter connector

vapid torrent
steel oar
hushed hedge
#

lets go its free nowwwwww

#

altho ugh its trash

outer rover
lost carbon
vapid torrent
lost carbon
vapid torrent
#

They make their money from collecting and selling data

lost carbon
vapid torrent
#

so it’s pretty much impossible

lost carbon
#

Damn.

#

Ok.

twin cape
#

DeepSeek-V3.1 coding performance evaluation on my coding evaluation set:

Mixed performance with concerning regressions - DeepSeek-V3.1 achieved an average rating of 5.68, significantly underperforming compared to top models and even showing regression from its predecessor on some tasks.

Performance Comparisons

  • vs. Top Models: Performed worse than Claude Opus 4, Claude Sonnet 4, Grok 4, and GPT-4.1
  • vs. Open-Source Models: Also lagged behind gpt-oss-120b, Qwen3 Coder, and Kimi K2
  • vs. Predecessor: Mixed results compared to DeepSeek-V3 (New), with some improvements but notable regressions

Notable Issues

  • Instruction adherence problems - stubbornly ignored specific formatting requests
  • Gap in advanced programming patterns - struggled with uncommon or tricky scenarios
  • Visualization - produced charts remarkably similar to Horizon Alpha

Full blog post: https://eval.16x.engineer/blog/deepseek-v3-1-coding-performance-evaluation

hushed hedge
#

wish they made claude sonnet 4 cheaper

sharp quest
#

that would require anthropic to make their models more efficient

hushed hedge
hushed hedge
twin cape
sharp quest
#

there are quite a few reports that the biggest AI companies are usually running at a loss, and subsidized, openai's own revenue data confirms this

hushed hedge
pearl magnet
#

Does anyone know if chutes is running this model at fp4? Based on the low price I would assume so

polar locust
pearl magnet
#

And also the fact that you can't filter by "no quant" only "all quantizations"

storm torrent
#

its in the row of icons under the provider name

pearl magnet
storm torrent
#

chutes doesn't have one listed, so. it's "unknown"

pearl magnet
#

yeah, so I was wondering if anyone knew what it actually is

storm torrent
#

yeah if it's not there its not disclosed

vapid torrent
#

i think the thing with chutes is its distributed so there is no specific quant, every node might use different config

cerulean grove
hushed hedge
#

finally a good provider hosting deepseek v3.1 for free

sharp quest
#

its fp4 though

outer rover
#

Anyone know why Silly Tavern still shows only the 64K OpenInference and not 164K DeepInfra?

mortal igloo
#

SillyTavern uses the context listed from the /models endpoint, the one appearing in ST's command prompt console, when you click on "Connect" so it uses that

#

Just choose DeepInfra provider and set context to unlocked/unlimited

untold badger
#

Like he said, you can select the provider. Otherwise, don't quote me on this, presumably if your context is too big, OR would route it to whatever supports it (if not, it would error), assuming you keep Middle-Out Transform disabled.
And "unlocked" context slider will prevent auto adjusting the max.

outer rover
#

Thanks guys!

steel oar
hushed hedge
#

i was wondering why deepseek is hallucinating then I found out about the fp things

lucid wave
#

What precision is deepseek trained on?

eager lava
rancid geode
#

I kinda just swapped to using deepseek as the provider, consistent quality and much cheaper

#

But I guess now there are more options that support caching so it’s prob no longer the cheaper option

glacial reef
rancid geode
glacial reef
#

yeee

rancid geode
#

bummer, its not worth using models that dont support prompt caching, at least if you use it for a lot of code

hushed hedge
#

why deepseek is so slow

#

deepseek provider

#

and why pricing for v3.1 is different like on open router even from deepseek provider its cheaper than deepseek platform

storm seal
hushed hedge
#

and its giving cheaper than deepseek platform

#

check in open router

storm seal
#

@nocturne kelp there's a pricing discrepancy for the DeepSeek provider

#

Its 27 cents in and $1.1 out but DeepSeek is charging a different amount

lament geode
#

With 3.1 do you guys usually put supplemental tool info in your system prompt? I've been getting weird tool behavior from 3.1. Also, how thorough are you with your tool and argument descriptions. I didn't see this until I switched from openai models to this model and I'm sure the problem is my lack of experience. Openrouter published an article on tool call accuracy but it seemed to suggest proprietary models excel at tool calling accuracy but didn't tell us what to do about it on open models. link: https://openrouter.ai/announcements/tool-calling-accuracy

native shale
#

With the exception being glm 4.5 which is also really good at tool calls

lament geode
sweet zodiac
#

May i know the rate limit of this model ?

gaunt edge
native shale
#

maybe cuz i was served fp4

gaunt edge
#

yea i think theo t3 person and maybe some others have some videos about kimi and tool calls, it is pretty provider dependent

#

apparently, from memory

native shale
storm seal
hushed hedge
#

any alternatives to deepseek that are better and open source?

#

for making notes

modest smelt
fleet lake
#

how to remove a car?

spice marsh
#

?

edgy pivot
#

Plus Kimi k2 0905 is really cute

rocky wagon
#

Jeeesus, the context of the free model got downgraded to 30k from 60k. The paid model has 163k context. It's unusable at this point. Any better free models with more context?

steel oar
hushed hedge
gaunt edge
# hushed hedge is it better tha glm?

people generally like kimi's style more i think, but its not necessarily 'better' than glm. glm is hybrid (toggleable) reasoning, kimi doesn't have 'reasoning'

#

kimi is also quite concise

edgy pivot
hushed hedge
gaunt edge
hushed hedge
coarse fable
#

I got "(DeepInfra) Provider returned error: deepseek/deepseek-chat-v3.1:free is temporarily rate-limited upstream. Please retry shortly, or add your own key to accumulate your rate limits: https://openrouter.ai/settings/integrations" For Deepseek v3.1 free using DeepInfra, anyone knows how long it will last?
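A rate-limited free endpoint usually answers with HTTP 429 until capacity frees up, so the practical option is to retry with growing pauses. A minimal sketch, assuming a hypothetical `send` callable that performs the request and returns `(status_code, body)`; the retry count and delays are arbitrary choices, not documented limits.

```python
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, dict]],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Call `send` until it stops returning HTTP 429 or retries run out."""
    status, body = send()
    for attempt in range(max_retries):
        if status != 429:
            break
        # Wait 1s, 2s, 4s, ... before trying again.
        sleep(base_delay * (2 ** attempt))
        status, body = send()
    return status, body
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting. Backoff only helps with short bursts; it will not help if the endpoint stays saturated.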

pure path
#

It feels like prefills still haven't been sorted out on V3.1. All of the providers returned a completely unrelated response in Mandarin

blazing folio
pure path
#

I heard only deepseek's provider does working prefills, but havent tried it

blazing folio
#

i have no idea

#

it auto picks OpenInference and DeepInfra as the common providers for me

lament geode
#

deepseek/deepseek-chat-v3.1 is a constant battle now. We don't use free models. Account is always topped up. Suddenly: No endpoints found that support tool use. Is this no longer supporting tool use?

potent parcel
potent parcel
digital snow
#

💆‍♂️

vapid torrent
#

Hopefully it actually fixed the language mixing nonsense

jolly lintel
#

#1419676592082518039

lament geode
#

I still get a 404 from deepseek/deepseek-chat-v3.1 at OR when my payload defines tools: regardless of provider. What did I miss or is this terminus unrelated?

#

The models endpoint reports that it supports tools.
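One way to double-check what the models listing claims is to look up the model entry and read its `supported_parameters` list. A minimal sketch working on an already-downloaded listing; the sample data is hand-written for illustration, and my understanding (an assumption, not confirmed in this thread) is that the model-level list is a union across providers, so a single provider may still lack tool support.

```python
from typing import Optional


def find_model(models_response: dict, model_id: str) -> Optional[dict]:
    """Return the entry for `model_id` from a /models-style listing, if any."""
    for entry in models_response.get("data", []):
        if entry.get("id") == model_id:
            return entry
    return None


def supports_parameter(models_response: dict, model_id: str, name: str) -> bool:
    """True if the listing says `model_id` accepts the request parameter `name`."""
    entry = find_model(models_response, model_id)
    return entry is not None and name in entry.get("supported_parameters", [])


# Hand-written sample, not real API output.
sample_listing = {
    "data": [
        {
            "id": "deepseek/deepseek-chat-v3.1",
            "context_length": 163840,
            "supported_parameters": ["tools", "tool_choice", "temperature"],
        }
    ]
}

print(supports_parameter(sample_listing, "deepseek/deepseek-chat-v3.1", "tools"))
```

If the listing says yes but requests with `tools` still fail, the account's provider preferences may be excluding every provider that supports tools.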

vapid iris
#

would like to report that V3.1 (free) from DeepInfra appears to have a (temporarily?) reduced maximum context size to 49299 tokens (it appears to not be a consistent limit, 48k was a good buffer), anything above that results in a 404 'no allowed providers are available' error. Have ignored OpenInference to check.

Would it be possible to check with DeepInfra if it's a temporary or permanent change, and if it's permanent, be reflected in the provider list? Thank you!

p.s. apologies for the ping, just noticed your status of being on vacation, my bad!

languid vigil
#

love it that deepinfra did something on their back end to cause the quality of v3.1 to nosedive and struggle with endless repetition

vernal wadi
#

Why does v3.1 do this sometimes? It's so strange. I didn't change anything.

#

What is this shit?

#

<@&1384697330254610442> can you help?

#

It's not too many tokens

#

I don't understand what is happening

#

@storm seal any idea? My roleplay worked fine until now 🙁

storm seal
#

Try Chutes as the provider

vernal wadi
#

I use my OR key

#

Only deep infra offers free

#

Then I will switch to 3.2, it's cheaper

#

The same issue O_O

#

Found the issue it's my prefill

storm seal
vernal wadi
#

But strange, it didn't happen earlier.

storm seal
void brook
storm seal
#

Some providers might use vLLM while others use Llama.cpp or their own custom vLLM version

vernal wadi
nocturne kelp
analog violet
#

Is DeepInfra permanently removing deepseek v3.1, or is it temporary?

analog violet
#

Amazing

languid vigil
#

thanks janitorai

vernal wadi
#

Looks like v3.1:free now only works if you let them use your prompts :/

#

3.1 still works but only when selling your soul

vernal wadi
# languid vigil thanks janitorai

Yeah janitor had a huge impact on Open router, deepseek and chutes.
It's crazy how huge the community got and how satisfied they all are with deepseek.
But I also don't know any alternative LLM which writes as well (and uncensored!) as deepseek.

#

And their built-in premium version of deepseek is too expensive at $20 per month.

#

I also use chub 🫣😅

#

V3 is at <10% uptime. And v3.1 gets dead now

marsh pulsar
storm seal
#

Oh what the hell

marsh pulsar
#

yeah a 100+ billion difference in tokens is absolutely diabolical

mild eagle
#

Holy

analog violet
#

Its over guys...Deepseek free 3.1 is officially dead

#

No more free gooning

#

Only premium gooning for now

coarse fable
#

Since it died tragically, I have to admit I have never experienced so coherent and complex RP capacities with a 4-compressed model before 🥲

surreal parcel
#

So the free one is dead for good?

storm seal
#

Ooooh, it went to zero!

vapid torrent
vapid torrent
coarse fable
#

I got an idea that could kinda work for the free provider in a "never happening hypothetical return(?)": rate limits for specific services. The idea is simple: if free Janitor users are sending too many requests to a free model, they all get rate limited, like a single request per minute? Or two? That affects a single service instead of everyone. It would not directly affect the other free users who use DeepSeek V3.1 for any other reason, from helper to coding. It's kinda unfair that they all get affected because of Janitor users.

gaunt edge
coarse fable
vapid iris
#

thats why janitor wasn't on any of the charts before late september

#

even though their usage number was already insane

coarse fable
# vapid iris even though their usage number was already insane

I think the problem here is somewhere between Janitor being hidden from view until last month and Openrouter for, well, just being an open router. It ends up affecting providers and users with situations like this: the free endpoint isn't paid, and the provider knew they wouldn't get anything out of it, not even training data from the "charity work". But the main problem is a whole provider being overwhelmed by a single site... welp, there is nothing users can do now, except either paying for requests... or simply getting subscriptions from other sites, since Openrouter does not have subscriptions.

Honestly, Openrouter having subscriptions for certain models and providers would be an idea? like, certain amounts of dollars for a certain amount of daily requests so it isn't an abuse(?)

vapid iris
#

since if they're paying, that's their own money to waste lol

vapid torrent
vapid iris
#

sept 23/24

coarse fable
#

So Openrouter can't track it well or simply they were dumb?

vapid iris
#

it's opt-in not opt-out lol

#

By including simple headers in your requests, your app can appear in our leaderboards and gain insights into your model usage patterns.
but it's more infamous than anything else 👀

coarse fable
#

Yeahh, anyways. If anyone here loved to use "gooner rp free models", I think the Z-Ai one is good if you use a simple yet good system prompt with no jailbreak (because it makes the reasoning too bulky)

analog violet
#

Off topic but using any free models with chutes as a provider feels like a gambling addiction

coarse fable
tough elk
#

Won’t really use Z-AI, I’m more of a prompt person which is why i used deepseek and now it’s gone, forever probably

coarse fable
#

With a 10K token RP chat it provides good replies, you just need to specify reply lengths or else it gets messy long. And the only problem is the too many requests error

strong moss
#

Hey y'all, is the other v3.1 provider safe to use? Read here that they use your data to train their models. But isn't it the same for deepseek official?

#

What's

#

The best course of action here if you want to continue roleplaying with deepseek

tough elk
# coarse fable The big reasinoning Qwen of Vecine got a normal prompt mode if you put /no_think...

Thanks, and I also tried out GLM again and it’s the wildest model I’ve ever seen. It gets straight into the not safe for work stuff and it doesn’t really matter what kind of chatbot you're talking to. idk if it’s the model or if it’s my prompts, since I have quite a list of prompts for the AI that definitely aren't SFW, so maybe it’s the prompts, I’m not sure.

tough elk
coarse fable
signal sinew
opaque shore
#

Wait can’t we just use deepseek v3.1 (not free) from openrouter? If we’re paying there shouldn’t be an issue right?

mortal igloo
#

Paid DeepSeek is better in terms of availability (100% uptime)

#

Yeah

opaque shore
#

Oh okay, I thought that was down too🥹

mortal igloo
#

I use directly from DeepSeek (no longer v3.1 but v3.2 Exp) and been having a blast since. Topped up $15 in March and still have $4 now

opaque shore
#

Is it less expensive to get it directly from deepseek?

mortal igloo
#

Low context (32k) and cache hits are a blessing

glad basin
#

deepseek is cheapest provider especially due to caching

mortal igloo
glad basin
#

you can enforce routing to deepseek on openrouter

wild sparrow
mortal igloo
#

Now about caching: it saves cash because your previous messages are kept in their system (for hours in my experience). But if your earlier messages or system prompt change, or a world info/lorebook entry activates (which inserts a new system message before the chat or within the chat), then the cache busts

#

If the earlier context is kept and nothing changes from the very beginning to the latest message, then the cache stays.

#

(Except once you reach 32k of context: older messages are removed for new messages to be inserted, thus no cache and you get billed normally)

#

I don't know if I can explain better than this jumbled mess lol

strong moss
mortal igloo
#

And the caching works too

#

I use direct Deepseek because Openrouter's deepseek account balance can be zero (no auto top-up) so errors haha

brazen gale
#

how do u do it??

mortal igloo
# brazen gale how do u do it??

Choosing the official Deepseek provider, you mean? In SillyTavern you can choose a provider, but for other platforms that do not have a provider selection tab, I think you can just set "Allowed Providers" in your OpenRouter account settings to DeepSeek only (use the model DeepSeek v3.2 Exp, they only serve that currently and paid)
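For platforms that let you edit the raw request, the same pinning can be done per request with OpenRouter's `provider` routing object. A minimal sketch that only builds the request body; the model slug and the provider name `"DeepSeek"` are assumptions for illustration, so check the exact strings on the model page.

```python
import json


def build_payload(model: str, user_text: str, provider: str) -> dict:
    """Build a chat completion body that is pinned to a single provider."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "provider": {
            "order": [provider],       # try this provider first
            "allow_fallbacks": False,  # and do not route anywhere else
        },
    }


# Both strings below are assumed values, not confirmed in this thread.
payload = build_payload("deepseek/deepseek-v3.2-exp", "Hello", "DeepSeek")
print(json.dumps(payload, indent=2))
```

With fallbacks disabled the request fails instead of silently going to a provider without caching, which is the trade-off you want if cost is the reason for pinning.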

vapid torrent
#

You can also make a preset in openrouter

brazen gale
#

dang im all so new to ts

vapid torrent
#

if you use DeepSeek provider

#

or DeepSeek directly

wild sparrow
#

like this?

mortal igloo
#

The caching process is automatic

wild sparrow
#

32k is a good amount for context as well

#

so that should last a while

vapid torrent
#

Be warned caching goes out the window after you hit ctx limit

wild sparrow
#

il drop 5$ in and see how it goes

vapid torrent
#

as the messages leave FIFO style

wild sparrow
#

how will i know?

vapid torrent
#

it’s cheaper to have long ctx compared to small ctx with DeepSeek’s 10x cache discount

wild sparrow
#

could you show your preset settings?

vapid torrent
#

I don’t use

#

i just set to maximum

#

so it never evicts from ctx window

wild sparrow
#

like that?

#

or whatever the max is for the model?

#

in this case 163840

mortal igloo
#

Yeah if you want full context, Deepseek official provider only supports 128k

wild sparrow
#

ive set it to 32k

#

so it saves money in the long term since i hardly go over 30 msgs

vapid torrent
#

My longest chat was 47k tokens iirc

strong moss
#

How does caching work? Does cache stay or something, considering that it's basically giving the bot an idea of the past events/roleplay?

#

It would work even if I use deepseek official through openrouter right

mortal igloo
vapid iris
#

according to their docs at https://api-docs.deepseek.com/guides/kv_cache,

Cache construction takes seconds. Once the cache is no longer in use, it will be automatically cleared, usually within a few hours to a few days.
so might be up to days, if their servers aren't under heavy usage.

strong moss
#

That sounds neat, but I wish it lasted longer since my writing could take hours too lmao

vapid iris
#

I saw people mentioning they had cache hit at 25 hours

strong moss
#

ah neat

vapid iris
#

but if you're doing a chat and your old messages get pushed out

#

might not be a cache hit anymore

strong moss
#

Since I roleplay using proxy, I often just use the same character (with the same definition) so the character definition would be consistent, would that mean it's always a cache hit even if it's a different chat but the same character?

#

heck, would it be possible to maintain cache for weeks, maybe even months?

mortal igloo
storm seal
mortal igloo
strong moss
#

OHHH so the cache would only hit until something like... 16k if I set it so?

#

Would setting the context window to 32k/16k be cheaper than the 64k/128k context with cache

mortal igloo
#

Yep, if you set context to 16k, it'd cache the 16k like:

  1. System prompt [500 tokens]
  2. Bot definition [1000 tokens]
  3. Persona definition [100 tokens]
  4. Chat history [1st bot "greeting" message, 2nd message your reply, 3rd bot reply, 4th your reply, etc. totaling 14400 tokens]

This is what the 16k cached would be.

Then if you send your reply to the bot once the context is full, the bot's 1st "greeting" message would be dropped and the chat history would start from your 2nd message, like:

  1. System prompt [500 tokens] <-- this is cached since nothing changed
  2. Bot definition [1000 tokens] <-- cached
  3. Persona definition [100 tokens] <-- cached
  4. Chat history [2nd message your reply, 3rd bot reply, 4th your reply, 5th bot message, 6th your reply, etc. totaling 14400 tokens] <-- no cache since the bot's greeting message is gone and the history now starts with your reply // there is a change. DeepSeek re-caches this repeatedly if you still use 16k context
#

So if you decide to use 128k, the cache would prove useful until you hit 128k. More than that, it re-caches the whole 126.4k tokens
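The prefix behaviour described above can be illustrated with a toy model. This is a simplified sketch, assuming the cache only covers the leading part of the prompt that is identical to the previous request; the segment names and token counts are made up, and this is not DeepSeek's actual implementation.

```python
from typing import List, Tuple

Segment = Tuple[str, int]  # (label, token count)


def cached_tokens(previous: List[Segment], current: List[Segment]) -> int:
    """Sum the tokens of the leading segments shared by both prompts."""
    total = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        total += new[1]
    return total


static = [("system", 500), ("bot", 1000), ("persona", 100)]
chat = [("msg1", 300), ("msg2", 200), ("msg3", 300)]

# Context not full yet: a message is appended, so the old prompt is a full prefix.
growing = cached_tokens(static + chat, static + chat + [("msg4", 200)])

# Context full: msg1 is dropped to make room, so the match stops after the static part.
sliding = cached_tokens(static + chat, static + chat[1:] + [("msg4", 200)])

print(growing, sliding)
```

In the growing case everything already sent counts as a hit; once the window starts sliding, only the 1600 static tokens still match.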

#

I use 32k context because when it re-caches, deepseek bills me $0.01 instead of $0.03 if 128k
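The arithmetic behind that comparison can be written out. A rough sketch, assuming an input price of $0.28 per million tokens on a cache miss (an assumed figure, check the current price list) and the 10x cache discount mentioned earlier in the thread; it treats the whole context as a miss, which slightly overstates the cost because the static prefix would still hit.

```python
def input_cost(tokens: int, price_per_million: float) -> float:
    """Cost in USD of sending `tokens` input tokens at the given price."""
    return tokens / 1_000_000 * price_per_million


MISS = 0.28      # assumed USD per 1M input tokens on a cache miss
HIT = MISS / 10  # the "10x cache discount" mentioned in this thread

for ctx in (32_000, 128_000):
    miss_cost = input_cost(ctx, MISS)
    hit_cost = input_cost(ctx, HIT)
    print(f"{ctx} tokens: miss ${miss_cost:.4f}, hit ${hit_cost:.4f}")
```

Under these assumptions a full re-cache costs roughly a cent at 32k and between three and four cents at 128k per request.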

#

cheap

crimson egret
#

Okay, guys. Can someone explain to me whether V3.1 is still working or if it’s completely no longer free? Some people say it can still be used, but one of the endpoints was removed, while others say it no longer works and has been shut down.

vapid iris
#

v3.1 free had two providers, DeepInfra and OpenInference

#

DeepInfra no longer provides free v3.1 due to the massive human horde from janitor

#

and OpenInference is a far smaller provider

#

so you're just going to get 429 errors

crimson egret
#

So now it can’t really work properly on the free model?

vapid iris
#

if you thanos snap 95% of the janitor horde, it'll work. but currently you're competing with way too many people for the available capacity.

crimson egret
#

Thanks for the reply.

paper magnet
# vapid iris v3.1 free had two providers, DeepInfra and OpenInference

rlly? I was under the impression 3.1 free got axed then taken out the back and executed because of the graph shown earlier that it was over. I got 3.1 free from OpenRouter and I keep getting errors whenever I try to use it, and when I go on the error link, it says its no longer available

vapid iris
willow junco
mortal igloo
willow junco
mortal igloo
#

oh wait

#

my calculation's wrong, sorry. It's $0.009 or 1 cent...

#

damn

chrome rune
#

Hello, I need help. I get this message when I try to send a message in any chat on “Janitor AI.”

rocky sedge
#

3.1 is dead

chrome rune
#

Shit

hushed hedge
#

you can still use it from platform deepseek

abstract igloo
#

you can still use it in OpenRouter, it's just the free endpoint that's gone

hidden plinth
#

Does anyone else have a problem with DeepSeek V3.1 putting non-english characters in tool calls? It consistently does it for me, and causes tool calls to fail.
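A cheap guard is to validate each tool call before executing it and re-prompt the model when the check fails. A minimal sketch, assuming tool and argument names are expected to be plain ASCII; `tool_call_problems` is a hypothetical helper, not part of any SDK.

```python
import json
from typing import List


def tool_call_problems(name: str, arguments: str) -> List[str]:
    """Return a list of human-readable problems found in one tool call."""
    problems = []
    if not name.isascii():
        problems.append("tool name contains non-ASCII characters")
    try:
        parsed = json.loads(arguments)
    except json.JSONDecodeError:
        problems.append("arguments are not valid JSON")
        return problems
    if not isinstance(parsed, dict):
        problems.append("arguments are not a JSON object")
        return problems
    for key in parsed:
        if not key.isascii():
            problems.append(f"argument name {key!r} contains non-ASCII characters")
    return problems
```

Argument values are left unchecked on purpose, since legitimate values (a city name, a translated sentence) can contain non-English text.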

tropic kettle
abstract igloo
#

no, i meant there isn't a free DeepSeek v3.1 anymore

abstract igloo
tropic kettle
# abstract igloo also, depending of what you're using, you should only use the https://openrouter...

i am just trying to test the endpoint just as it is in the documentation and i am receiving 404

const url = "https://openrouter.ai/api/v1/chat/completions";
const headers = {
  // Backticks restored: the value must be a template literal.
  "Authorization": `Bearer ${process.env.OPENROUTER_API_KEY}`,
  "Content-Type": "application/json"
};
const payload = {
  "model": "deepseek/deepseek-chat-v3.1",
  "messages": [
    {
      "role": "system",
      "content": "hello"
    },
    {
      "role": "user",
      "content": "If you built the world's tallest skyscraper, what would you name it?"
    }
  ],
  "temperature": 0.7
};

// Top-level await requires an ES module (or wrap this in an async function).
const response = await fetch(url, {
  method: "POST",
  headers,
  body: JSON.stringify(payload)
});

const data = await response.json();
console.log(data);
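When a request like this fails, the status code together with the error message in the body usually says more than the bare 404. A small sketch of pulling that out, assuming the error body has the shape `{"error": {"code": ..., "message": ...}}`; `describe_response` is a hypothetical helper and the sample message is only an example.

```python
def describe_response(status: int, body: dict) -> str:
    """Summarize an API response as 'ok' or 'HTTP <status>: <message>'."""
    if 200 <= status < 300 and "error" not in body:
        return "ok"
    error = body.get("error", {})
    if isinstance(error, dict):
        message = error.get("message", "no error message in body")
    else:
        # Some servers return the error as a plain string.
        message = str(error)
    return f"HTTP {status}: {message}"


print(describe_response(404, {"error": {"code": 404, "message": "No endpoints found"}}))
```

A message about missing endpoints or providers points at account routing settings rather than at the URL itself.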

#

any response?

abstract igloo
#

idk, it worked here

#

try seeing if you have any provider preferences in your settings

jovial idol
#

how do i see the reasoning tokens for v3.1 and v3.1-terminus

here is my code. i tried a lot but cannot see the reasoning for different kind of user messages.

Any help would be much appreciated.

client.chat.completions.create(
    model="deepseek/deepseek-v3.1-terminus",
    messages=[{"role":"system","content":system_prompt},{"role":"user","content":user_message}],
    temperature=0.1,
    top_p=0.1,
    stream=True,
    max_tokens=2500,
    extra_body={
        "provider": {
            "sort": "throughput"
        },
        "enable_thinking": True,
        "reasoning": {
            "effort": "high",
            "exclude": False
        }
    }
)

storm seal
abstract igloo
#

i think it needs to be
"reasoning": {
"enabled": true
}
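A sketch of what the request body and the response handling could look like, assuming the flag is spelled `enabled` inside the `reasoning` object and that the thinking text comes back in a `reasoning` field on the message (with streaming it would arrive in the deltas instead). Both field names are my understanding of OpenRouter's API rather than something confirmed in this thread.

```python
from typing import Tuple


def build_reasoning_payload(model: str, system_prompt: str, user_message: str) -> dict:
    """Build a chat completion body that asks for reasoning to be switched on."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        "reasoning": {"enabled": True},  # assumed spelling of the flag
    }


def split_reasoning(response: dict) -> Tuple[str, str]:
    """Return (reasoning_text, answer_text) from a non-streaming response."""
    message = response["choices"][0]["message"]
    return message.get("reasoning") or "", message.get("content") or ""
```

With the OpenAI Python client, as in the snippet above, the `reasoning` object would go inside `extra_body`, since the client does not accept it as a regular keyword argument.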

lost carbon
sweet zodiac
#

i can't find the good things from v3.1,
i used R1 and V3 in the past and the prompt(s) fit them.
I believed the DS family was trained on similar datasets and would behave the same.
however when i change the model from R1 to V3.1, the results are ruined.... 😂

spice marsh
#

Different models need different prompts, even in the same lineup

lost carbon
#

(DSPy fixes this)

dense wolf
#

And setting all other providers for the model to ignored just gives me a no endpoints error

abstract igloo
#

DeepSeek

dense wolf
#

I did enable it just now and still got the error

#

Let me reopen my app though

#

Just in case

#

Seems to have worked

atomic quartz
#

Hey folks, does anyone know why OpenRouter is throwing an error?

storm seal
humble echo
#

in a chubapp i see deepseek 3.1 base, 3.1 terminus and 3.1 chat

What is the difference (?)

spice marsh
#

3.1 Base -> Won't chat with you, it will complete your sentences
3.1 Chat -> It'll be able to chat
3.1 Terminus -> This is 3.1 Chat with minor bug fixes (some repetition issues and mixing english and chinese)

atomic quartz
storm seal
atomic quartz
atomic quartz
vapid iris
atomic quartz
#

I didn't enable this option.

storm seal
vapid iris
atomic quartz
#

Is this the providers section at ?

vapid iris
atomic quartz
#

Oh, sorry, the top point didn't make it into the screenshot above.

vapid iris
#

Hmm, this is weird. All your settings are correct, V3.1 (free) should be working fine for you, hmm.

atomic quartz
#

The account isn't banned. ) Z.AI: GLM 4.5 Air (free) also doesn't work

  1. Could the models not working be related to geolocation?
  2. Is there technical support for Openrouter?
vapid iris
#

I'm not sure on that front, sorry. Maybe someone else will know, but I'm not aware of OpenInference geo-blocking.

atomic quartz
#

Is this a problem?

abstract igloo
#

i don't think it should be

buoyant willow
#

anyone else's v3.1 free model just being weird

#

it js gives "1.1..1.1.1..1.11" or similar gibberish

storm seal
#

And you all can report that provider here

buoyant willow
#

OpenInference

#

I'll see

storm seal
#

Ah

#

Oops, I didn't know that

buoyant willow
#

I tried both yesterday and the day before, im not sure whats the issue

#

Via openrouter chatroom

#

and aider

storm seal
#

See if you can reproduce it in a new chat if it's an Open inference issue, and Toven will yell at OpenInference to fix it

buoyant willow
#

One sec js gotta get back home

buoyant willow
storm seal
#

Chats are stored locally, not on server

buoyant willow
#

Oh wait hih

#

huh

#

Nvm its spewing nonsense again

#

It may be a specific token count cuz it only happened when i pasted a large block of text

void brook
#

Mostly because of the provider of free models
Which is understandable for free models, but i guess if they can fix it then it's amazing of them as a provider

mortal igloo
#

Perhaps because of the quantization? OpenInference provides free DS v3.1 at int8 precision which is lower than fp4 I think

gaunt edge
#

int8 is less quantised than fp4 (so, int8 is higher quality). for some reason i thought openinference v3.1 free said fp4, but i checked and it does say int8
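The effect of bit width on rounding error can be shown with a toy experiment. A simplified sketch, assuming plain symmetric integer grids for both widths; real fp4 is a floating-point format with scaling tricks, so this only illustrates why fewer bits mean coarser weights, not how any provider actually quantizes.

```python
from typing import List


def quantize(value: float, bits: int, max_abs: float = 1.0) -> float:
    """Round `value` to the nearest level of a symmetric signed integer grid."""
    levels = 2 ** (bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits
    scale = max_abs / levels
    q = round(value / scale)
    q = max(-levels, min(levels, q))  # clamp to the representable range
    return q * scale


def mean_abs_error(values: List[float], bits: int) -> float:
    """Average absolute rounding error over `values` at the given bit width."""
    return sum(abs(v - quantize(v, bits)) for v in values) / len(values)


weights = [i / 100 - 1.0 for i in range(201)]  # evenly spaced in [-1, 1]
err8 = mean_abs_error(weights, 8)
err4 = mean_abs_error(weights, 4)
print(f"8-bit error: {err8:.5f}, 4-bit error: {err4:.5f}")
```

The 4-bit grid has only 15 levels against 255 for 8-bit, so its average error is much larger on the same values.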

void brook
# gaunt edge int8 is less quantised than fp4 (so, int8 is higher quality). for some reason i ...

Agree on this, but I'm not sure about the new NVFP4. I heard that if a model goes through the pre-training, SFT, and then RL phases in NVFP4 format, it will be comparable to BF16.

I guess that makes sense, because when deepseek did their original training in FP8, the model actually had performance closer to BF16 than if it had been trained in BF16 and then quantized to FP8.

The key point is what format the model was originally trained in.

unkempt stratus
#

Yall I'm using the og 3.1 chat

#

Not the terminus

#

And it's lagging so hard, I'm using paid

#

What's going on

full relic
#

i am trying to make a local-language ai chatbot using deepseek v3.1 from openrouter, but the url for the request seems to be broken. i am using https://openrouter.ai/api/v1/chat/completions for the api, but the chatbot shows url not found. what could be the possible cause and a solution to this?

full relic
#

Seems to work now

unkempt gorge
#

response tokens are coming through empty and every thing is ending up in the reasoning field today for some reason? tried across several providers all of which were fine before

vernal wadi
#

It loaded for ages and finally responded with a very short message, but it consumed 3x as many tokens as my input.
I assume it has "invisible" thinking?

#

Not sure if that is good or bad...

abstract igloo
#

if you click to see more info you can see how many of them were reasoning tokens
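The same number can be read from the API response. A small sketch, assuming the usage block follows the OpenAI-style layout with `completion_tokens_details.reasoning_tokens`; whether a given provider fills that field in is not confirmed here, and the sample response is hand-written.

```python
def reasoning_token_count(response: dict) -> int:
    """Return the number of reasoning tokens reported in `usage`, or 0."""
    usage = response.get("usage") or {}
    details = usage.get("completion_tokens_details") or {}
    return details.get("reasoning_tokens") or 0


# Hand-written sample, not real API output.
sample_response = {
    "usage": {
        "prompt_tokens": 400,
        "completion_tokens": 1250,
        "completion_tokens_details": {"reasoning_tokens": 1200},
    }
}

print(reasoning_token_count(sample_response))
```

A short visible reply with a large `completion_tokens` value usually means most of the output budget went to hidden reasoning.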