agile fossil Sep 29, 2025, 9:10 AM

#

The model is now available on website and HuggingFace
https://huggingface.co/collections/deepseek-ai/deepseek-v32-68da2f317324c70047c28f66

DeepSeek-V3.2 - a deepseek-ai Collection

#

#

i-have-discovered-deepseeker-v3-2-base-v0-al21vk9t31sf1.png

violet hatch Sep 29, 2025, 9:12 AM

#

agile fossil

how did you even access this page

serene mirage Sep 29, 2025, 9:14 AM

#

agile fossil

这是从哪里？他们的微信？

agile fossil Sep 29, 2025, 9:15 AM

#

serene mirage 这是从哪里？他们的微信？

Yes.Now you can use deepseek v3.2 on their website

agile fossil Sep 29, 2025, 9:16 AM

#

violet hatch how did you even access this page

Now you can't access the page.They accidentally leaked it out before and were discovered by others

violet hatch Sep 29, 2025, 9:16 AM

#

agile fossil Now you can't access the page.They accidentally leaked it out before and were di...

ah

serene mirage Sep 29, 2025, 9:17 AM

#

agile fossil Now you can't access the page.They accidentally leaked it out before and were di...

有人下载了他（它？）吗？

#

AI是"他"或者"它"？

violet hatch Sep 29, 2025, 9:17 AM

#

i doubt anyone would have the internet speed to download it that quickly

agile fossil Sep 29, 2025, 9:18 AM

#

serene mirage 有人下载了他（它？）吗？

No.There have no any files just a blank

hexed cloud Sep 29, 2025, 9:39 AM

#

Terminus my ass

shell remnant Sep 29, 2025, 10:03 AM

#

https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp

deepseek-ai/DeepSeek-V3.2-Exp · Hugging Face

violet hatch Sep 29, 2025, 10:04 AM

#

👀

agile fossil Sep 29, 2025, 10:08 AM

#

https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp-Base

deepseek-ai/DeepSeek-V3.2-Exp-Base · Hugging Face

violet hatch Sep 29, 2025, 10:09 AM

#

seems like they use a new attention method which leads to very cheap inference

84112f72d7a99a2a108be2df5d1b78a1993bf34533f90c65db9e9d3c4a1a37fc.png

#

good sign that benchmarks with tool calling still remain intact, means the attention mechanism isnt terrible

agile fossil Sep 29, 2025, 10:14 AM

#

kindred abyss Sep 29, 2025, 10:14 AM

#

Its cheaper 😮 nice

violet hatch Sep 29, 2025, 10:16 AM

#

on api

#

@shrewd gate

agile fossil Sep 29, 2025, 10:16 AM

#

serene mirage Sep 29, 2025, 10:19 AM

#

violet hatch on api

That is significantly cheaper

violet hatch Sep 29, 2025, 10:20 AM

#

output is like 3x cheaper and input is 2x cheaper

serene mirage Sep 29, 2025, 10:20 AM

#

Is it a promotional period or is that the permanent new prices?

violet hatch Sep 29, 2025, 10:20 AM

#

not sure but id say they're likely to stay like that since the inference is cheaper

violet hatch Sep 29, 2025, 10:20 AM

#

violet hatch on api

https://api-docs.deepseek.com/quick_start/pricing

Models & Pricing | DeepSeek API Docs

The prices listed below are in unites of per 1M tokens. A token, the smallest unit of text that the model recognizes, can be a word, a number, or even a punctuation mark. We will bill based on the total number of input and output tokens by the model.

serene mirage Sep 29, 2025, 10:21 AM

#

v3.1terminus is still available at v3.2 rates it seems

#

https://api-docs.deepseek.com/guides/comparison_testing

V3.1-Terminus Comparison Testing | DeepSeek API Docs

As an experimental version, although DeepSeek-V3.2-Exp has been validated for effectiveness on public evaluation sets, it still requires broader and larger-scale testing in real user scenarios to identify potential issues in certain long-tail use cases. To facilitate comparative testing by users, we have temporarily retained additional API acces...

#

For around 3 weeks

violet hatch Sep 29, 2025, 10:22 AM

#

questionably long base url

serene mirage Sep 29, 2025, 10:22 AM

#

violet hatch questionably long base url

Probably deliberately to discourage use

deep verge Sep 29, 2025, 10:24 AM

#

wow so cheap than terminus

#

what the hell

#

why even released terminus

violet hatch Sep 29, 2025, 10:24 AM

#

deep verge why even released terminus

as a bug fix pretty much

deep verge Sep 29, 2025, 10:26 AM

#

when is r2 coming though 😭

violet hatch Sep 29, 2025, 10:26 AM

#

im not sure whether they'll even release r2, i think they'll stick with hybrid

deep verge Sep 29, 2025, 10:26 AM

#

so v4 is r2?

violet hatch Sep 29, 2025, 10:26 AM

#

maybe..

serene mirage Sep 29, 2025, 10:26 AM

#

deep verge so v4 is r2?

Probably

violet hatch Sep 29, 2025, 10:26 AM

#

no confirmation

deep verge Sep 29, 2025, 10:26 AM

#

aww....thats so long

sharp bloom Sep 29, 2025, 10:27 AM

#

violet hatch questionably long base url

smart tbh, openrouter should do this

violet hatch Sep 29, 2025, 10:27 AM

#

i think they'll go with hybrid because it pretty much gives them double the space to run their models, no need to host 2 models, just 1 which can do both

serene mirage Sep 29, 2025, 10:27 AM

#

V4 is probably based on this new Architecture somehow

serene mirage Sep 29, 2025, 10:27 AM

#

violet hatch i think they'll go with hybrid because it pretty much gives them double the spac...

Also improves context caching ability for people who switch between think and nothink

violet hatch Sep 29, 2025, 10:27 AM

#

serene mirage Also improves context caching ability for people who switch between think and no...

true

willow pumice Sep 29, 2025, 10:28 AM

#

Max output tokens seems low? Was it that low for 3.1?

violet hatch Sep 29, 2025, 10:29 AM

#

willow pumice Max output tokens seems low? Was it that low for 3.1?

based on internet archive seemingly yes

serene mirage Sep 29, 2025, 10:29 AM

#

willow pumice Max output tokens seems low? Was it that low for 3.1?

It’s always been max 8k for chat

shell remnant Sep 29, 2025, 10:31 AM

#

As an intermediate step toward our next-generation architecture, V3.2-Exp builds upon V3.1-Terminus by introducing DeepSeek Sparse Attention
Seems v4 is also very soon ?

violet hatch Sep 29, 2025, 10:31 AM

#

i wonder if this means they'll increase context length in the future to something insane like 1M

#

cause if the scaling remains the same then it should be in theory really cheap to both train & run inference, beside the data

trail root Sep 29, 2025, 10:31 AM

#

Qwen Next did something similar

#

So how many active parameters? 685B A37B?

violet hatch Sep 29, 2025, 10:33 AM

#

trail root So how many active parameters? 685B A37B?

seemingly somewhere around that

charred torrent Sep 29, 2025, 10:33 AM

#

http://xhslink.com/o/9KAYMQxFvkh

小红书

DeepSeek-V3.2-Exp 发布，API大幅降价 - 小红书

3 亿人的生活经验，都在小红书

serene mirage Sep 29, 2025, 10:34 AM

#

huh, HLE went down

violet hatch Sep 29, 2025, 10:34 AM

#

lower knowledge density maybe

#

though this is a "experimental" version

serene mirage Sep 29, 2025, 10:35 AM

#

is HLE that bench that requires a bunch of specialist knowledge?

violet hatch Sep 29, 2025, 10:35 AM

#

serene mirage is HLE that bench that requires a bunch of specialist knowledge?

yeah i think so

serene mirage Sep 29, 2025, 10:35 AM

#

also were those results from thinking or non-thinking?

violet hatch Sep 29, 2025, 10:35 AM

#

agile fossil

based on this, WITH reasoning

errant holly Sep 29, 2025, 10:36 AM

#

English twitter announcement: https://x.com/deepseek_ai/status/1972604768309871061

DeepSeek (@deepseek_ai)

🚀 Introducing DeepSeek-V3.2-Exp — our latest experimental model!

✨ Built on V3.1-Terminus, it debuts DeepSeek Sparse Attention(DSA) for faster, more efficient training & inference on long context.
👉 Now live on App, Web, and API.
💰 API prices cut by 50%+!

1/n

trim musk Sep 29, 2025, 10:38 AM

#

#

WeChat official account

#

Official app, web app, WeChat app have all been updated to this version

serene mirage Sep 29, 2025, 10:39 AM

#

they're usually pretty fast at rolling out the new models to all their platforms

violet hatch Sep 29, 2025, 10:40 AM

#

that was a quick rollout after their chat website, do they really get that much traffic to early test it that quickly 🤔

errant holly Sep 29, 2025, 10:41 AM

#

Novita is already trying to add support. Once the OR team wakes up, i'm sure they're on it 🙂

serene mirage Sep 29, 2025, 10:42 AM

#

im guessing the new architecture makes it not just a drop-in weights replacement?

violet hatch Sep 29, 2025, 10:42 AM

#

probably not since the attention mechanism is entirely different

next patrol Sep 29, 2025, 10:42 AM

#

https://tenor.com/view/dj-khaled-another-one-one-gif-5057861

Tenor

serene mirage Sep 29, 2025, 10:43 AM

#

yeah all the previous (v3, r1, v3.1) are "DeepseekV3ForCausalLM"
the new model is "DeepseekV32ForCausalLM"

hollow anvil Sep 29, 2025, 10:47 AM

#

Deepseek is becoming Qwen with small iterational releases

serene mirage Sep 29, 2025, 10:48 AM

#

so the deepseek v3 architecture had:

v3
r1
v3 0324
r1 0528
v3.1
v3.1 terminus

#

all within around 10 months

deep verge Sep 29, 2025, 10:50 AM

#

idk... i prefer a massive update rather than too many smaller versions so quickly.

olive leaf Sep 29, 2025, 10:52 AM

#

violet hatch on api

Woah what the fuck

#

That's massive savings!

serene mirage Sep 29, 2025, 10:58 AM

#

yeah cause they got sparse attention now

agile fossil Sep 29, 2025, 10:59 AM

#

Novita is coming

serene mirage Sep 29, 2025, 10:59 AM

#

no more obscenely quadratic compute usage!!!!!

olive leaf Sep 29, 2025, 10:59 AM

#

serene mirage yeah cause they got sparse attention now

I gotta read the examination for thaf

serene mirage Sep 29, 2025, 11:00 AM

#

i can't find a research paper

#

maybe not out yet

#

i found the paper

#

https://github.com/deepseek-ai/DeepSeek-V3.2-Exp/blob/main/DeepSeek_V3_2.pdf

GitHub

DeepSeek-V3.2-Exp/DeepSeek_V3_2.pdf at main · deepseek-ai/DeepSeek...

Contribute to deepseek-ai/DeepSeek-V3.2-Exp development by creating an account on GitHub.

#

well not quite a paper

#

more just an outline

sharp bloom Sep 29, 2025, 11:02 AM

#

i like the point release updates. so often it feels like these llms are almost but not quite there yet. maybe the all the real life feedback helps dial them in

olive leaf Sep 29, 2025, 11:03 AM

#

serene mirage https://github.com/deepseek-ai/DeepSeek-V3.2-Exp/blob/main/DeepSeek_V3_2.pdf

Thankies

serene mirage Sep 29, 2025, 11:03 AM

#

apparently they started with v3.1 terminus base

#

and just taped on the lightning indexer to that

#

and then trained that

#

they first train the lightning indexer to agree with the existing model's attention patterns

olive leaf Sep 29, 2025, 11:06 AM

#

I'm 50% sure I'm misunderstanding

#

But I think the outline says that the indexer only chooses the topK tokens

#

If you set the topK to 40, it'll only consider the top 40 tokens in its attention, which makes it faster

#

Someone double check me

serene mirage Sep 29, 2025, 11:09 AM

#

looks like it

#

each token only attends to a subset of the rest of the tokens

#

well it clearly works well so yeah

#

If anyone here is fluent in PyTorch then this looks like the new indexer logic:
https://huggingface.co/deepseek-ai/DeepSeek-V3.2-Exp/blob/main/inference/model.py#L431

inference/model.py · deepseek-ai/DeepSeek-V3.2-Exp at main

violet hatch Sep 29, 2025, 11:39 AM

#

serene mirage If anyone here is fluent in PyTorch then this looks like the new indexer logic: ...

according to GPT5 its like this

#

compute cost stays similar but memory usage is about 9x lower

serene mirage Sep 29, 2025, 11:44 AM

#

violet hatch seems like they use a new attention method which leads to very cheap inference

looks like it also reduces compute

#

cause it only has to do the full attention on the shortlisted tokens and only has to handle the 576d vectors for the rest of the tokens

violet hatch Sep 29, 2025, 11:45 AM

#

serene mirage cause it only has to do the full attention on the shortlisted tokens and only ha...

im pretty sure it computes all of the tokens then cut its down to top k rather than only computing top k

#

so compute is still about the same

serene mirage Sep 29, 2025, 11:45 AM

#

it computes all the tokens using the vectors with lower dimension though

violet hatch Sep 29, 2025, 11:50 AM

#

serene mirage it computes all the tokens using the vectors with lower dimension though

im pretty sure thats a minimal cut, since the meaning between 2 tokens is relatively trivial to compute
you still have to compute it across all of the context, then mask it to the top k then run the forward pass & repeat

#

though it could cut latency a bit

west moss Sep 29, 2025, 11:57 AM

#

There are efficient algos for computing topk

#

It won’t calculate the whole thing, then sort it etc it will just use an optimized algorithm

hidden sequoia Sep 29, 2025, 12:18 PM

#

I wonder if this will affect long context quality

#

Which was already average with DeepSeek

solemn cave Sep 29, 2025, 12:31 PM

#

interesting

#

So... to my understanding, this is how DSA kinda works. Though I am not sure because I have a fever.

DeepSeek Sparse Attention (DSA):

For every query vector the router (a tiny MLP) outputs k token-indices that will matter for this single query (k ≈ 1–4 % of context).
Two index pools are merged:
- local block – fixed-size window around the query (cache-friendly dense matmul).
- global experts – the learned sparse indices anywhere in the 128 k span.
Custom CUDA kernel (FlashMLA) gathers only those KV slots into a contiguous micro-matrix; the rest of KV-cache is never fetched.
Scaled dot-product is done on this k×k tile; result is exactly the same size logits vector as dense attention, so downstream layers notice zero change.
Softmax is computed only over the selected k positions, giving the same attention weights shape but O(k) instead of O(L) compute and memory.
Gradients flow through the router; the index-selection is fully differentiable (straight-through Gumbel), so the sparsity pattern is jointly trained with the rest of the network—no post-hoc approximation.

shrewd gate Sep 29, 2025, 1:06 PM

#

coming online shortly

olive leaf Sep 29, 2025, 1:10 PM

#

solemn cave So... to my understanding, this is how DSA kinda works. Though I am not sure bec...

I have a fever

Get well soon!

shrewd gate Sep 29, 2025, 1:21 PM

#

tools seem borked on their end

#

so launching without tools

violet hatch Sep 29, 2025, 1:22 PM

#

shrewd gate Sep 29, 2025, 1:23 PM

#

violet hatch

yeah this was the case for v3.1 too

#

and terminus

#

our code didn't change

violet hatch Sep 29, 2025, 1:24 PM

#

ah alright, i never checked it before so i assumed it wasnt like that

desert axle Sep 29, 2025, 1:27 PM

#

shrewd gate tools seem borked on their end

I was able to use tools with deepseek-chat via direct api

violet hatch Sep 29, 2025, 1:27 PM

#

uh

#

hm

#

might be an issue from my code 🤔

coarse loom Sep 29, 2025, 1:29 PM

#

has anyone tested long context performance yet?

desert axle Sep 29, 2025, 1:29 PM

#

coarse loom has anyone tested long context performance yet?

It performed decently on a ~32k token test

#

understood all relevant plot points and the connections between them

coarse loom Sep 29, 2025, 1:30 PM

#

oh yeah

#

so better than before?

desert axle Sep 29, 2025, 1:31 PM

#

can't say for sure, it's just one data point. but it's not terrible, at least

violet hatch Sep 29, 2025, 1:32 PM

#

violet hatch uh

okay my code just dropped all the connections for no reason

coarse loom Sep 29, 2025, 1:32 PM

#

a zero shot

viscid thunder Sep 29, 2025, 1:38 PM

#

violet hatch uh

Disappointing that the efficiency gains didn’t improve TPS

violet hatch Sep 29, 2025, 1:47 PM

#

yeah u aint got access to that

violet hatch Sep 29, 2025, 2:08 PM

#

https://youtu.be/fC_VaHvK1Oo

YouTube

Bijan Bowen

DeepSeek V3.2-Exp First Test – Is This the BEST Open Source LLM?

Timestamps:

00:00 - Intro
00:41 - First Look
02:08 - Technical Look
03:36 - Web Browser OS Test
06:57 - 3D Racing Game Test
10:01 - Freestyle Technical Test
11:59 - Creative Web Design Test
13:48 - Closing Thoughts

AI Integration & Consulting: https://bijanbowen.com
Join the Discord: https://discord.gg/hfaR2exy7S

In this video, we test the ne...

▶ Play video

hidden sequoia Sep 29, 2025, 2:09 PM

#

MrBeast ahh thumbnail

solemn cave Sep 29, 2025, 2:20 PM

#

viscid thunder Disappointing that the efficiency gains didn’t improve TPS

yea, because it's not a speed-improvement

foggy falcon Sep 29, 2025, 2:28 PM

#

violet hatch seems like they use a new attention method which leads to very cheap inference

ok that's pretty sick!

viscid thunder Sep 29, 2025, 2:59 PM

#

solemn cave yea, because it's not a speed-improvement

Theoretically it could mean lowering batch size -> speed improvements

#

Maybe with a Turbo api someone will do it

solemn cave Sep 29, 2025, 3:15 PM

#

viscid thunder Theoretically it could mean lowering batch size -> speed improvements

Lowering batch size can shave a couple percent off latency because each token now gets a slightly bigger slice of memory bandwidth, but that’s basically it—DRAM is still the ceiling. So, even a hypothetical “turbo” single-stream endpoint won’t suddenly jump to 2-3× speed; the 3-7× win is purely on the cloud-metered FLOP bill, not on wall-clock. A bummer, yea.

maiden thistle Sep 29, 2025, 3:15 PM

#

Oh, wow, 1/4th of the output price is going to be great for reasoning

violet hatch Sep 29, 2025, 3:23 PM

#

novita really creative with these prices

hidden sequoia Sep 29, 2025, 3:24 PM

#

The difference is DeepSeek has caching

#

Working caching

maiden thistle Sep 29, 2025, 3:24 PM

#

Gotta be #1 in the OR ranking

violet hatch Sep 29, 2025, 3:24 PM

#

true but really now, 1 cent cheaper?

maiden thistle Sep 29, 2025, 3:24 PM

#

Reminds me of a situation

#

Back then, we had some companies pricing their Llama 3 8B $0.001 cheaper than the other and they'd keep alternating who was #1

viscid thunder Sep 29, 2025, 3:26 PM

#

violet hatch true but really now, 1 cent cheaper?

Being the Default routed provider is a powerful thing!

heavy stag Sep 29, 2025, 3:32 PM

#

Skeptically excited to see the performance on this. Quadratic computation is one of the major problems with LLMs unless RAG gets perfected

#

Not computation. You guys know what I mean. I slept 2 hours

violet hatch Sep 29, 2025, 3:34 PM

#

im getting this error in the chatroom 🤔

#

no tokens for like 30 seconds then that happens

#

only with novita

hidden sequoia Sep 29, 2025, 4:12 PM

#

Fiction livebench says 3.2 is better than 3.1 in long context, which is counter-logical?

iron vessel Sep 29, 2025, 4:19 PM

#

hidden sequoia Fiction livebench says 3.2 is better than 3.1 in long context, which is counter-...

why counter logical?

kindred abyss Sep 29, 2025, 4:19 PM

#

DSA achieves fine-grained sparse attention with minimal impact on output quality — boosting long-context performance & reducing compute cost.
they did state specifically that it boosts long-context performance

iron vessel Sep 29, 2025, 4:19 PM

#

is it resulting on different outcome in other people test?

hidden sequoia Sep 29, 2025, 4:21 PM

#

Well I specifically thought that processing 'some' tokens instead of 'all' tokens will result losing subtext in understanding creative writing, and it's just another step to agentic/coding benchmaxxing

#

So it's cheaper, faster AND better in roleplay? Meaning V4 could boast 1000b paramaters while having same or higher speen compared to v3

viscid thunder Sep 29, 2025, 4:23 PM

#

hidden sequoia Well I specifically thought that processing 'some' tokens instead of 'all' token...

“All tokens” has the downside of muddying the waters

#

Less tokens => more focus

#

But it’s a balance

#

In this case, it seems like a good balance

viscid thunder Sep 29, 2025, 4:24 PM

#

hidden sequoia Fiction livebench says 3.2 is better than 3.1 in long context, which is counter-...

Where do you see better? It seems worse to me

#

Based on those numbers

hidden sequoia Sep 29, 2025, 4:24 PM

#

viscid thunder Where do you see better? It seems worse to me

Reasoning version, top half

viscid thunder Sep 29, 2025, 4:24 PM

#

Ohh

#

Huh

hidden sequoia Sep 29, 2025, 4:25 PM

#

58 v 71 @60k is massive

viscid thunder Sep 29, 2025, 4:26 PM

#

Yea

coarse loom Sep 29, 2025, 4:43 PM

#

hidden sequoia Fiction livebench says 3.2 is better than 3.1 in long context, which is counter-...

nice

solemn cave Sep 29, 2025, 4:57 PM

#

hidden sequoia So it's cheaper, faster AND better in roleplay? Meaning V4 could boast 1000b par...

DSA doesn’t “drop” subtext — it learns which tokens actually matter for each query, and those indices are different every layer and every token, so nuance survives. The 60k jump you see is mostly less noise accumulation (fewer low-relevance attention scores), not brute-force scale.

So yes, V4 can stack more params (or wider experts) without the old O(L²) tax, so role-play quality goes up while billable FLOPs stay flat or fall. it’s not just a benchmaxxing trick.

open grail Sep 29, 2025, 4:58 PM

#

hidden sequoia Fiction livebench says 3.2 is better than 3.1 in long context, which is counter-...

Yeah, I wouldn't have expected that! The paper said they did post-training on nearly 1T samples to learn the top-k indexing heads though, so maybe this data had a lot of what fictionlivebench tests for?

hidden sequoia Sep 29, 2025, 5:00 PM

#

solemn cave DSA doesn’t “drop” subtext — it learns which tokens actually matter for each que...

Mdash GPTism detected

solemn cave Sep 29, 2025, 5:03 PM

#

hidden sequoia Mdash GPTism detected

breh

#

Do you really hate m-dash 😭

#

I always use that

hidden sequoia Sep 29, 2025, 5:04 PM

#

That's what bot would say

solemn cave Sep 29, 2025, 5:06 PM

#

hidden sequoia That's what bot would say

https://tenor.com/view/robot-ai-robot-ai-meme-ai-meme-gif-509507551361818171

Tenor

subtle plank Sep 29, 2025, 5:19 PM

#

is this a perfkrmance increase or nah?

#

i assume not but the sacings is wild

maiden thistle Sep 29, 2025, 5:20 PM

#

It's not marketed as a quality improvement, so probably not

haughty ember Sep 29, 2025, 5:21 PM

#

subtle plank is this a perfkrmance increase or nah?

DeepSeek claim its the same performance while vastly more context efficient

solemn cave Sep 29, 2025, 5:28 PM

#

subtle plank is this a perfkrmance increase or nah?

Not.

Others have already explained above too.

high iron Sep 29, 2025, 6:01 PM

#

Is the new 3.2 model better for roleplay than 3.1?

#

And what is different between 3.1 and 3.2?

#

Please answer with ping. Thanks.

near forge Sep 29, 2025, 6:06 PM

#

high iron Please answer with ping. Thanks.

Biggest difference? It's much cheaper.

solemn cave Sep 29, 2025, 6:17 PM

#

high iron And what is different between 3.1 and 3.2?

DSA is the difference: It makes things cheaper.

#

Nothing changed on the model's perf/quality

hexed cloud Sep 29, 2025, 6:27 PM

#

@shrewd gate best default parameters?

shrewd gate Sep 29, 2025, 6:27 PM

#

it's already set on the model

hexed cloud Sep 29, 2025, 6:28 PM

#

I thought deepseek used temperature 0.3

#

It's set to 1

#

I know deepseek provider does the conversion

#

On their end

shrewd gate Sep 29, 2025, 6:28 PM

#

from huggingface

hexed cloud Sep 29, 2025, 6:28 PM

#

But what about the others?

#

Ah

#

I got 3 syntax errors with it with temp 1 so there's that

#

Also, if it's increased performance why are the two endpoints so slow?

solemn cave Sep 29, 2025, 6:43 PM

#

hexed cloud Also, if it's increased performance why are the two endpoints so slow?

the speed you feel is still gate-kept by the old infra. deepseek didnt roll out new gpu stacks, they just swapped the attention math inside the same containers. so the flops bill drops 3-7x for them, but the physical cards, pcie lanes and network hops are untouched -> same queuing, same throttle, same "slow". once providers re-tune batching / cache layouts for the sparse kernels you should see snappier responses, but for now its cheaper for them, not faster for us.

hexed cloud Sep 29, 2025, 6:45 PM

#

is thisan ai response

maiden thistle Sep 29, 2025, 6:45 PM

#

My thoughts exactly lol

hexed cloud Sep 29, 2025, 6:46 PM

#

same old infra, but less maths to be done per token (If I understand right), so i'm within my right to expect faster responses

#

unless they intentionally throttle it

foggy falcon Sep 29, 2025, 6:47 PM

#

hexed cloud is thisan ai response

lol yeah had that vibe too 😂

solemn cave Sep 29, 2025, 6:48 PM

#

hexed cloud is thisan ai response

well no, because if it was I would be speaking too formally

#

I was building on the past interactions

hexed cloud Sep 29, 2025, 6:49 PM

#

"for now its cheaper for them, not faster for us", "same X, same Y, same Z" and talking very personally as if to a customer

maiden thistle Sep 29, 2025, 6:49 PM

#

It just sounds like asking AI to write all lowercase, there's a lot of slop in that response

foggy falcon Sep 29, 2025, 6:49 PM

#

while this would be very normal to say, here in this discord everyone has their gpt-isms detectors on full blast

solemn cave Sep 29, 2025, 6:50 PM

#

hexed cloud "for now its cheaper for them, not faster for us", "same X, same Y, same Z" and ...

damn

#

I guess I should stop doing that then?

hexed cloud Sep 29, 2025, 6:50 PM

#

If that was actually you, I apologize. But also AI might have rewired your brain

solemn cave Sep 29, 2025, 6:50 PM

#

Yep

#

I might've changed

foggy falcon Sep 29, 2025, 6:51 PM

#

solemn cave I guess I should stop doing that then?

great observation! but it might actually be us being hyper focused on filtering out content generated by AI systems and not your language, that is perfectly normal and good.

solemn cave Sep 29, 2025, 6:51 PM

#

I was already being informal too because of earlier with Loinne

foggy falcon Sep 29, 2025, 6:52 PM

#

If you want I can create an image of the key points of this conversation as a mind map for you.

hexed cloud Sep 29, 2025, 6:52 PM

#

foggy falcon great observation! but it might actually be us being hyper focused on filtering ...

https://tenor.com/view/dexter-doakes-squint-stare-suspicious-gif-14432154109786838518

Tenor

#

"great observation"

solemn cave Sep 29, 2025, 6:52 PM

#

Lmfao

foggy falcon Sep 29, 2025, 6:53 PM

#

solemn cave I was already being informal too because of earlier with Loinne

yeah I guess what made it kinda sussy is that it was like one longer message

#

most people just like type one word

#

and then add "lol" in the next message or smth

foggy falcon Sep 29, 2025, 6:53 PM

#

hexed cloud https://tenor.com/view/dexter-doakes-squint-stare-suspicious-gif-144321541097868...

😉

maiden thistle Sep 29, 2025, 6:54 PM

#

solemn cave the speed you feel is still gate-kept by the old infra. deepseek didnt roll out ...

Well, in case you're wondering, your response has, in addition to what was pointed out:

A lot of "not x, but y" typical of AI (deepseek didn't ..., they ... / the bills drop ... but the physical cards / once you ... but for now ... / cheaper for them, not for us)
Weird, typical of AI phrasing ("gate-kept by old infra")
Suspicious, overly summarized, vague list of three elements (physical cards, pcie lanes and network hops)

solemn cave Sep 29, 2025, 6:55 PM

#

foggy falcon yeah I guess what made it kinda sussy is that it was like one longer message

My classmates also get sussy with me whenever I write something formally 💀 Doakes ahh

solemn cave Sep 29, 2025, 6:56 PM

#

maiden thistle Well, in case you're wondering, your response has, in addition to what was point...

Yea, maybe I should stop doing that.

foggy falcon Sep 29, 2025, 6:56 PM

#

maiden thistle Well, in case you're wondering, your response has, in addition to what was point...

at this rate everything is gonna be sus to us in a few years 💀

solemn cave Sep 29, 2025, 6:57 PM

#

foggy falcon at this rate everything is gonna be sus to us in a few years 💀

https://tenor.com/view/multiverso-multiverso-akela-multiverso-scout-gif-8208802467565299466

Tenor

near forge Sep 29, 2025, 6:58 PM

#

The just testmant, wasn't not, Y, but a tapestry of X.

maiden thistle Sep 29, 2025, 6:59 PM

#

foggy falcon at this rate everything is gonna be sus to us in a few years 💀

Hi, please tick this

near forge Sep 29, 2025, 7:00 PM

#

solemn cave Sep 29, 2025, 7:04 PM

#

maiden thistle Hi, please tick this

https://tenor.com/view/helldivers-automaton-robot-laptop-oil-gif-15529904519356806769

Tenor

#

bye dawgs, I'm gonna go sleepa while my immune system fcks the fever

rain spruce Sep 29, 2025, 7:05 PM

#

solemn cave the speed you feel is still gate-kept by the old infra. deepseek didnt roll out ...

"gpt, write without em dashes and lowercase only please"

#

you can't really hide the prose style from people that use a bunch of different models on a daily

hidden sequoia Sep 29, 2025, 7:14 PM

#

solemn cave I guess I should stop doing that then?

You are getting exposed as clanker. You better admit it

hearty gazelle Sep 29, 2025, 7:58 PM

#

Anyone else getting random '<｜end▁of▁thinking｜>' written in some of their responses? It's not even somewhere predictable like the start or end of a response, it's just... between sentences as far as I can tell. It's not often, maybe one in every 10-20 responses or so?

high iron Sep 29, 2025, 8:06 PM

#

near forge Biggest difference? It's much cheaper.

Alright thanks

high iron Sep 29, 2025, 8:09 PM

#

hexed cloud is thisan ai response

Absolutely not. AI would had added — into it.
It's like their breathing, AI can't write without goddamn em dashes.

hexed cloud Sep 29, 2025, 8:47 PM

#

high iron Absolutely not. AI would had added — into it. It's like their breathing, AI can'...

Not the only LLMism out there and can be removed or replaced by regular dashes by user after

rain spruce Sep 29, 2025, 9:54 PM

#

it's just a bit silly to do that in an AI related server

#

and i'm not saying there is something wrong with using it for writing, i do use it for long-form text because english isn't my first language

#

but there's no problem in saying that you've used AI to write and it's just funny when i see those GPTisms on messages

heavy stag Sep 30, 2025, 12:32 AM

#

The cost savings are crazy. Novita had 3.1's input tokens at $0.27, and now with the 3.2 savings it's only $0.27 input!

maiden thistle Sep 30, 2025, 12:33 AM

#

Great cost savings ||for Novita||

iron vessel Sep 30, 2025, 12:41 AM

#

i don't know if someone feel this too, but it seems like deepseek is the most impactful in the AI fields.
i know for most people their model will not be the best but the fact that they experimenting with new things then releasing really good paper about it, show how they not just care about the money but also the future of technology for the people

#

even qwen didn't have this approch

serene mirage Sep 30, 2025, 1:33 AM

#

Novita still no caching support?

solemn cave Sep 30, 2025, 2:53 AM

#

hidden sequoia You are getting exposed as clanker. You better admit it

https://tenor.com/view/play-dead-freeze-im-tripping-star-wars-r2d2-gif-17868688

Tenor

solemn cave Sep 30, 2025, 2:58 AM

#

rain spruce you can't really hide the prose style from people that use a bunch of different ...

I might've been infected by clanker-style ever since I started using the darned thang in 2022

#

I am at least happy that it didn't affect how I write in my first language.

since you guys understand newgen-stuff, maybe I can insert some into my explanations in the future. ❤️‍🩹

solemn cave Sep 30, 2025, 2:59 AM

#

maiden thistle Great cost savings ||for Novita||

What's with Novita?

visual cape Sep 30, 2025, 6:48 AM

#

hearty gazelle Anyone else getting random '<｜end▁of▁thinking｜>' written in some of their respon...

is this with Novita or DeepSeek endpoints?

south steppe Sep 30, 2025, 7:21 AM

#

hearty gazelle Anyone else getting random '<｜end▁of▁thinking｜>' written in some of their respon...

I bet you use text completion ?

hearty gazelle Sep 30, 2025, 8:43 AM

#

visual cape is this with Novita or DeepSeek endpoints?

Looks like it was hitting the Deepseek endpoint from the activity logs. I enabled reasoning.

opaque bolt Sep 30, 2025, 10:13 AM

#

iron vessel i don't know if someone feel this too, but it seems like deepseek is the most im...

Yes, Deepseek are my favorite of the Chinese participants. It just seems like more original research and training goes on here which makes their releases interesting, especially when they are being bold with architecture changes. I'm sure DeepSeek 3.2 is kind of a crazy idea they're throwing out there and that's why it's labelled experimental. Maybe it will underperform too much due to the attention differences, maybe it'll remain good enough for clearly most uses, and then it's a huge win.

opaque bolt Sep 30, 2025, 10:15 AM

#

heavy stag The cost savings are crazy. Novita had 3.1's input tokens at $0.27, and now with...

I'd wait for this one to mature and get more widespread competition among the providers. Decent chances this one will go lower than before. Maybe even at Novita themselves...

astral aurora Sep 30, 2025, 10:35 AM

#

Will there be a free model of this?

queen gale Sep 30, 2025, 10:41 AM

#

astral aurora Will there be a free model of this?

with how cheap V3.2 is listed? certainly

solemn cave Sep 30, 2025, 10:50 AM

#

astral aurora Will there be a free model of this?

It'll get crowded fast

queen gale Sep 30, 2025, 10:58 AM

#

V3.1 is handling heavy usage pretty fine, doenst drop below 90%

flat briar Sep 30, 2025, 11:49 AM

#

shrewd gate tools seem borked on their end

Any update on this? tools are still disabled for DeepSeek provider

agile fossil Sep 30, 2025, 12:04 PM

#

astral aurora Will there be a free model of this?

Maybe have, but not now

#

Waiting other provider

hidden sequoia Sep 30, 2025, 3:30 PM

#

This is getting out of hand

foggy falcon Sep 30, 2025, 3:31 PM

#

hidden sequoia This is getting out of hand

meanwhile they all don't have caching..

solemn cave Sep 30, 2025, 3:46 PM

#

hidden sequoia This is getting out of hand

Is this some sort of competition for them 😭 wth is with the pricing

maiden thistle Sep 30, 2025, 3:50 PM

#

Yep, lol, companies do that

#

Wouldn't be surprised if they keep lowering it to get on top

foggy falcon Sep 30, 2025, 4:01 PM

#

does OR route to them a lot though or are providers with caching preferred?

#

I guess when set to no logging / training deepseek is not in the race anyway

#

in the chat room the auto router routed me to deepseek now which is good to see..

pseudo tulip Sep 30, 2025, 4:28 PM

#

We got a new challenger coming up

pseudo tulip Sep 30, 2025, 4:29 PM

#

foggy falcon does OR route to them a lot though or are providers with caching preferred?

You can select your preference based on cost,latency and throughput .
A lot of the people select the cheapest one as preference so even 0.01 less dollars cost than your competition can ensure that it routes to you

foggy falcon Sep 30, 2025, 4:30 PM

#

pseudo tulip You can select your preference based on cost,latency and throughput . A lot of t...

but does that ignore cached input?

#

I really want more providers to offer cheap caching

#

would be great to have a price war on that ..

pseudo tulip Sep 30, 2025, 4:30 PM

#

From what I read in the docs, if it routes to a cache input provider once, it tries to route to that over and over to ensure cache hit

foggy falcon Sep 30, 2025, 4:31 PM

#

pseudo tulip From what I read in the docs, if it routes to a cache input provider once, it tr...

yep that's true

pseudo tulip Sep 30, 2025, 4:31 PM

#

From their docs

#

But I notice a lot of cache misses when using openrouter for some reason

#

especially when you have 10-15 api calls simultaneously , some have cache miss some have hit, kinda random

hidden sequoia Sep 30, 2025, 4:32 PM

#

Just enforce provider? It worked for me

pseudo tulip Sep 30, 2025, 4:42 PM

#

hidden sequoia Just enforce provider? It worked for me

I haven't tried it with deepseek but I had trouble with gemini flash 2.5. I used ai to add context to rag chunks, so the pre-fix was same for all the chunks, I processed 1 chunk first and then the rest simultaneously so they hit the cache of the first chunk. When I used deepseek official api I got almost 100% cache hit rate , with gemini it was quite random. I should have also gotten close to 100% but it was like 60% something, quite random. I did make sure to ensure it was only ai studio or vertex.

viscid thunder Sep 30, 2025, 6:04 PM

#

so far in my tests, this model is stacking up quite favorably to Grok 4 Fast, which is now a very similar weight class. performing better on my own evals for a real application.

#

the tradeoff being speed, primarily

pseudo tulip Sep 30, 2025, 6:08 PM

#

viscid thunder so far in my tests, this model is stacking up quite favorably to Grok 4 Fast, wh...

I personally think v3.2 is better than grok 4 fast and cheaper too.
For my evals deepseek almost always used around 500 reasoning tokens(Which is actually insane since my task was reasoning intensive), grok 4 is cheaper in terms of input and output tokens but when you consider the reasoning tokens that it uses, deepseek v3.2 is cheaper

viscid thunder Sep 30, 2025, 6:08 PM

#

yep

desert axle Sep 30, 2025, 6:09 PM

#

viscid thunder the tradeoff being speed, primarily

not really anymore, grok 4 fast's TPS is way down

prisma sable Sep 30, 2025, 6:12 PM

#

Seems to be a slight lateral thinking regression vs 3.1 terminus
(From https://lateralbench.org )

LateralBench

LateralBench AI Model Performance Leaderboard - Interactive accuracy and pricing comparison

pseudo tulip Sep 30, 2025, 6:20 PM

#

prisma sable Seems to be a slight lateral thinking regression vs 3.1 terminus (From https://l...

What is this benchmark about? I can't open the link for some reason

maiden thistle Sep 30, 2025, 6:22 PM

#

https://www.lateralbench.org/

serene mirage Sep 30, 2025, 11:05 PM

#

foggy falcon meanwhile they all don't have caching..

If any of these open model providers added caching support they would probably blow literally everyone else out the water with price while simultaneously increasing their capacity massively

charred pollen Oct 1, 2025, 7:26 AM

#

Has anybody else noticed 3.2 being much more repetitive than 3.1? It feels like it's not giving much attention to messages past the last 2-3, and I had it repeat the exact same answer in a multi-turn conversation.

serene mirage Oct 1, 2025, 7:28 AM

#

charred pollen Has anybody else noticed 3.2 being much more repetitive than 3.1? It feels like ...

Well it quite literally isn’t going as much attention

#

(Sparse attention)

charred pollen Oct 1, 2025, 7:36 AM

#

serene mirage Well it quite literally isn’t going as much attention

Yeah, I don't see how 3.2 is in any way a replacement for 3.1 right now, it's literally way worse.

foggy falcon Oct 1, 2025, 7:40 AM

#

that's probably why they called it Exp(erimental)

#

maybe it'll get better again with more training to offset this

#

while still being cheaper

west moss Oct 1, 2025, 8:42 AM

#

I think it’s pretty much as good as terminus. Try using it vs DeepSeek themselves to test it.

agile fossil Oct 1, 2025, 11:34 AM

#

charred pollen Yeah, I don't see how 3.2 is in any way a replacement for 3.1 right now, it's li...

This update is just make the price down only

#

A mini update

opaque bolt Oct 1, 2025, 11:57 AM

#

charred pollen Has anybody else noticed 3.2 being much more repetitive than 3.1? It feels like ...

I saw Fiction.liveBench which was kinda interesting on this topic. Non-thinking starts out performing very mediocre but oddly enough consistently mediocre which is in fact pretty decent for very long contexts...? But unusually poor for short contexts. Thinking High however performs well and what's nice is that it remains well over long contexts. Here's a screenshot and discussion on Reddit: https://www.reddit.com/r/singularity/comments/1ntmkah/fictionlivebench_tested_deepseek_32_qwenmax/

From the singularity community on Reddit: Fiction.liveBench tested ...

Explore this post and more from the singularity community

#

So I'd use DS 3.2 Thinking if I want to boost attention. I'm not sure why this is so but I assume the thinking tokens helps it stay on track?

grim beacon Oct 1, 2025, 12:09 PM

#

when will they release a model that can accept images

open grail Oct 1, 2025, 12:17 PM

#

opaque bolt I saw Fiction.liveBench which was kinda interesting on this topic. Non-thinking ...

The "thinking" section is likely acting as a secondary (and higher order) type of Associative Memory, eg:

You can think of Attention as a form of Associative Memory:

https://ml-jku.github.io/hopfield-layers/

but it can only deal with second-order interactions and this is where the O(n^2) comes from (third-order interactions would require O(n^3) and be completely impractical).

The way the reasoning section trawls over the same information over and over has the potential to selectively account for much higher order chains of interactions.

hopfield-layers

Hopfield Networks is All You Need

Blog post

pseudo tulip Oct 1, 2025, 12:18 PM

#

grim beacon when will they release a model that can accept images

I think deepseek as a company is a lot more prone to doing experimental things for the sake of experimentation (If you've seen the interview with their founder/head after deep seek reasoner crashed the Nvidea stock market).

#

They take a lot of risks and try out a lot of new things, so far I believe they're focusing only on text input and text generation, but you never know

#

1 thing is for sure. They won't release a multimodal model just for the sake of having a deepseek multimodal model. They always bring something new to the table. Like they bought in Mode of experts and now this sparse attention with v3.2

iron vessel Oct 1, 2025, 12:47 PM

#

pseudo tulip 1 thing is for sure. They won't release a multimodal model just for the sake of ...

Yeah, pretty interesting company for sure.
They didn't really follow the trend and trying to be the one making the trend

grim beacon Oct 1, 2025, 2:05 PM

#

pseudo tulip 1 thing is for sure. They won't release a multimodal model just for the sake of ...

yah if it weren't for them most companies would just be racing for more parameters , and cost , innovation would have been slow

charred pollen Oct 1, 2025, 2:14 PM

#

grim beacon yah if it weren't for them most companies would just be racing for more paramete...

Do people really care about more parameters at this point, though? All of the top proprietary models don't even reveal their parameter count, and for open weights, larger size is a downside, not upside. I'd think test scores would be much more popular compared to size, as flawed as they are

heavy stag Oct 1, 2025, 2:23 PM

#

I think this was bound to be a weird update. It's not like anyone expected more smarts when the big innovation is a form of sparse attention

iron vessel Oct 1, 2025, 2:23 PM

#

charred pollen Do people really care about more parameters at this point, though? All of the to...

I heard somewhere that said the performance of small model actually could be boost through prompting to be close with their bigger version.
It said that with bigger parameters there's additional context added to each token that why it able to produce more quality output.

#

But with good prompting the small model could be at the same level as the bigger one

heavy stag Oct 1, 2025, 2:25 PM

#

I think at best it's that maybe the correlation is weaker than we thought.

#

I will say smaller models hold up better than a lot of us probably thought when they are given an ungodly amount of reasoning tokens

#

Namely QWQ and the new Qwen 80B

#

But they won't be acing HLE any time soon

subtle plank Oct 3, 2025, 6:28 AM

#

serene mirage If any of these open model providers added caching support they would probably b...

deepseek has cacheijg

serene mirage Oct 3, 2025, 7:32 AM

#

subtle plank deepseek has cacheijg

I mean like one of those providers that host a bunch of open source models, like Novita or stuff

keen brook Oct 9, 2025, 12:27 PM

#

My experience in text adventure games with Dipsy v3.2 as the interactive system (official provider, reasoning on, Temperature 0 and 0.5):

Prices back down to v3 levels, massive
Carries v3.1's quirk of not spamming asterisks, good
No more "Somewhere, in the distance, X... Y..." and "Outside, X... Y..."
Follows instructions better, again from v3.1, so needs more explicit instructions
New kind of sloppa spam of "It's not about X, it's Y." / "It doesn't X; it Y..."
Starts to get things wrong a little past 32k to 64k, okayish
Knuckles whitening, breath hitches slop (I'm okay with this though)

Good model, if you can keep up with the slop.

sharp bloom Oct 9, 2025, 12:29 PM

#

keen brook My experience in text adventure games with Dipsy v3.2 as the interactive system ...

dipsy ❤️❤️

keen brook Oct 9, 2025, 12:31 PM

#

Indeed. Dipsy love

opaque bolt Oct 9, 2025, 11:09 PM

#

Sounds good! Pricing is so good that even just sustained performance makes this one a clear success. I think the killer feature is being able to use premium providers without worry and not hit silly rate issues.

heavy stag Oct 10, 2025, 12:27 AM

#

keen brook My experience in text adventure games with Dipsy v3.2 as the interactive system ...

Do you mean new for DS? That's the most famous slop there is across models.

keen brook Oct 10, 2025, 1:04 AM

#

heavy stag Do you mean new for DS? That's the most famous slop there is across models.

yea, or I just didn't notice it before for dipsy

keen brook Oct 10, 2025, 4:37 AM

#

Oh, and don't forget about the spam on the smell of ozone. Deepseek models LOVE the word for some reason

ember karma Oct 10, 2025, 12:49 PM

#

I'm still debating where to top-up though. Open router or deepseek official? I love deepseek and all but v3.1 (and I assume v3.2) would be inferior than 0528, I tried the free version of v3.1 and found it less creative than R1, especially on fantasy settings

#

Should I just stick with open router and go for my preference, or try v3.2? Are there other pros to it from r1 0528?

foggy falcon Oct 10, 2025, 12:58 PM

#

ember karma Should I just stick with open router and go for my preference, or try v3.2? Are ...

you can use v3.2 via openrouter so if you want to experiment I’d top up on openrouter instead of deepseek ..

#

and you can force routing through deepseek too so there isn’t much of a downside right now either 🤔

ember karma Oct 10, 2025, 1:12 PM

#

🤔 you're right, open router has google pay too which makes things easier, so that's a plus

grim beacon Oct 10, 2025, 8:17 PM

#

how much cost do cache hit saves?

foggy falcon Oct 10, 2025, 8:26 PM

#

grim beacon how much cost do cache hit saves?

90% on input

grim beacon Oct 10, 2025, 8:37 PM

#

foggy falcon 90% on input

so like when i chat with ai model and each time it sends the whole stream back? and I save money as its already cached?

queen gale Oct 10, 2025, 9:02 PM

#

yes

maiden thistle Oct 10, 2025, 9:08 PM

#

Well, you can reasonably expect to save money on the input for the system prompt part

#

Most chat apps operate with a rolling window of chat messages, let's say the limit is 4, so you have
SYS PROMPT - MSG1 - MSG2 - MSG3 - MSG4
When you send another message, we're over the window, so the oldest message will be deleted
SYS PROMPT - MSG2 - MSG3 - MSG4 - MSG5
Caching works on whichever prefix is repeated across messages. In this case, the prefix is the system prompt

grim beacon Oct 11, 2025, 6:33 AM

#

maiden thistle Most chat apps operate with a rolling window of chat messages, let's say the lim...

what is the limit if i use open-webui , because yesterday wen I was chatting it went to 90k tokens after 70-80 messages , was it keep track of every message?

rain spruce Oct 11, 2025, 3:08 PM

#

grim beacon what is the limit if i use open-webui , because yesterday wen I was chatting it ...

the limit is 164k tokens

#

for this specific model

iron vessel Oct 13, 2025, 4:53 AM

#

woah..

#

this model amazing, only using one dollars for a lot of context.

#

so much cheaper and quality 97%-99% of sota model for general stuff

astral aurora Oct 18, 2025, 2:05 PM

#

Still no 3.2 for free?

astral geyser Oct 19, 2025, 4:31 PM

#

hearty gazelle Anyone else getting random '<｜end▁of▁thinking｜>' written in some of their respon...

any fix for this?

stuck charm Oct 19, 2025, 4:37 PM

#

astral geyser any fix for this?

which provider? sounds like a broken template.

astral geyser Oct 19, 2025, 5:13 PM

#

stuck charm which provider? sounds like a broken template.

novita

stuck charm Oct 19, 2025, 5:29 PM

#

astral geyser novita

yea I just generated 3 responses and novita and something is wrong with their template. also has massive CN issues. so exclude them for now, is the "fix".

astral geyser Oct 19, 2025, 5:37 PM

#

stuck charm yea I just generated 3 responses and novita and something is wrong with their te...

so which provider doesn't add that annoying <｜end▁of▁sentence｜>

stuck charm Oct 19, 2025, 5:40 PM

#

astral geyser so which provider doesn't add that annoying <｜end▁of▁sentence｜>

i have seen chinese but not that tag, vanilla openrouter? default settings? an app? hard to tell with no infos

full zodiac Oct 21, 2025, 8:31 PM

#

does anyon have a good preset?

#

ive been using chatstream and its kinda not the best with 3.2

dense reef Nov 1, 2025, 7:25 PM

#

I'm using V3.2 for a text adventure, and Atlas Cloud tends to input its reply entirely in Reasoning, which means the next turn wouldn't see its reply in the context. Not sure how to fix this, except to put Atlas in my ignore provider list.

hidden sequoia Nov 1, 2025, 7:31 PM

#

Why this particular provider in first case?

dense reef Nov 2, 2025, 1:18 AM

#

I had the providers as auto. I guess it picked whatever was easiest to connect to.

long kernel Nov 4, 2025, 10:18 PM

#

astral geyser so which provider doesn't add that annoying <｜end▁of▁sentence｜>

atlas-cloud/fp8 did that:

<｜tool▁calls▁begin｜><｜tool▁call▁begin｜>list<｜tool▁sep｜>{}<｜tool▁call▁end｜><｜tool▁calls▁end｜>

long kernel Nov 4, 2025, 10:19 PM

#

dense reef I'm using V3.2 for a text adventure, and Atlas Cloud tends to input its reply en...

I believe DeepInfra does that too, but it happens when DeepSeek needs tool call.

astral geyser Nov 5, 2025, 5:44 AM

#

long kernel atlas-cloud/fp8 did that: <｜tool▁calls▁begin｜><｜tool▁call▁begin｜>list<｜tool▁se...

so atlas removed it?

long kernel Nov 5, 2025, 5:45 AM

#

No still does but not always, didn't check other providers little bit deepinfra and its good

astral geyser Nov 6, 2025, 1:00 PM

#

is something going on with v3.2?

#

none of the providers are working properly

fringe shell Nov 10, 2025, 5:37 PM

#

are there somewhere other providers that support caching other than deepseek themselves?

violet hatch Nov 10, 2025, 5:37 PM

#

no

fringe shell Nov 10, 2025, 5:38 PM

#

hm ok, than I still have to ignore all others :/

#DeepSeek V3.2

So yes, V4 can stack more params (or wider experts) without the old O(L²) tax, so role-play quality goes up while billable FLOPs stay flat or fall. it’s not just a benchmaxxing trick.

Not.