#DeepSeek V3

1605 messages · Page 2 of 2 (latest)

verbal tapir
#

This thing is so in demand that all provider almost have problem providing it man.

hoary pumice
#

this gotta be a new record, 0.07t/s

somber mango
summer mural
#

folks. we did it. negative uptime 😂

hoary pumice
#

now over 100% uptime

astral flicker
#

still down

radiant walrus
#

If deepinfra and hyperbolic are proxying then does that mean their privacy is the same as that of deepseek? As in prompts sent through deepinfra would not be private from deepseek and may be used for training by deepseek?

queen charm
#

how do I route to hyperbolic?

potent moon
#

It's not that they are proxying Deepseek
Is that are not ready to handle the whole load of traffic when Deepseek is down
So when Deepseek is down, they saturate too

bold burrow
#

It looks like anything

#

Over a certain context just

#

Breaks the model rn

hoary pumice
cyan sail
#

Is anyone else experiencing this issue? I've even tried creating a new API key.

outer arrow
hoary pumice
#

Hyperbolic now runs at precisely 0t/s

jaunty orbit
jaunty orbit
bold burrow
#

yeah deepseek at high contexts is not really working

#

also watching Hyperbolic fluctuate between "dead" and "barely alive"

#

is hilarious XD

pure canopy
#

how high context? I've been submitting big files and its still just about holding together. high latency tho

bold burrow
#

which provider?

#

the DeepSeek provider specifically gives me issues

#

about twice a day

upbeat aurora
#

Occasionally deepseek limits their context to 8k probably due to load

stray solar
#

deepseek model so friggin slow for me

glass brook
bold burrow
#

also is it just me or

#

is Fireworks' DeepSeek-v3 completely diff lmao

#

this shit is SLOW and RLLY RLLY BAD

#

lmao

somber mango
#

why is DeepSeek so hard to run?

#

i get that it's massive, but isn't MoE supposed to run a bit quicker?

verbal tapir
# somber mango why is DeepSeek so hard to run?

Imagine 671B model have 37B active parameters in given moment, then there 18 people using that model at the same time with different thing, isn't it going to activate all of the parameters anyway.

so in simple term because there is a lot of demand that why is hard.

That's my thought of it, could be wrong.

pure canopy
#

what quantizations are the non-official API for DeepSeek?

verbal sigil
#

Deepseek had months to optimize their infra for MoE for a year now. Other providers never had this much demand for a MoE and most llm serving software don't have first class support for batched high throughput MoE support simple as that.

hot wren
#

also quite a different MOE to server

#

also there deployment is 1 expert a gpu .. with a unit of scale of 320 gpus

#

most try to run that of units of 8xh200's

verbal tapir
#

Event after all of that Deepseek still have hard time to run it, i mean they increase their node last time to accommodate the demand.

This one model really attract people to them, good for them to hit the hype train.

hot wren
#

they did something i would have deemed impossible on a shoestring budget

#

soo they deserve all the hype they got

fair sphinx
hot wren
#

6m as training cost for a 600b+ model is a drop in the ocean

#

a 100b dense costs 50-100m

#

just in compute

#

and a 2k gpu cluster is minimal for such a model

#

405b from llama used over 20k gpu's

#

they did fantastic work with the limited stuff they had

#

also mind you the h800 is more on paar with a an a100

#

as the chip is capped

fair sphinx
#

assuming H800 costs $42000 as it does on ebay, then the cost of 2048xH100 is $86M

hot wren
#

h100 is around 28k in bulk

#

and we talking compute cost not hardware acquesition cost

#

2 very different things

fair sphinx
#

i see

#

does anyone know how well merging the experts of MoEs together into a dense model works

#

the only attempts I've seen are attempts with Mixtral 8x22B

hot wren
#

you wont be .. as knowledge wont cluster that way

fair sphinx
#

and they were not evaluated very well

hot wren
#

even at mistral that wont work

#

here its even way different

#

as the gate selects more then just 2 layers

#

mistral is. 2experts

#

here its 9 ?

fair sphinx
#

oh

hot wren
#

ya 37b active

#

over 9 experts

fair sphinx
#

so each expert is ~4B?

hot wren
#

aprox but expert is just a wrong word for it

#

its 256 experts in total in the model

fair sphinx
#

i'd imagine pruning wouldn't work well since the experts are so small already

hot wren
#

knowledge clustering happens differntly then one would assume

fair sphinx
#

so the only way to get a smaller deepseek-v3 would be to logit-distil it or pretrain a smaller model on the same dataset?

hot wren
#

moe is really just fractioning stuff out

#

you have overlaps and no central clustering

#

aka you dont say expert 38 is the math expert

#

just doesnt work that way

#

its a logical separation and inference hack

#

not really isolation on knowledge clustering

#

moe isnt new .. first paper was written in 97 about that

#

with 100k experts

#

its just getting steam after mistral

#

and deepseek has successfully integrated it as well

#

but non the less they did very fine work

fair sphinx
#

hmm

#

well thanks!

hot wren
#

inference on batch is still tricky

#

at deepseek they run 320 gpu's as unit of scale

#

and each expert is pinned to 1 gpu

#

the 3rd party guys dont have any of that

fair sphinx
#

does the router need its own gpu

hot wren
#

need is a strong word but you have kv and otherwise the inf cost is just like a dense 670ish B model

#

as in batch odds are close to all experts are hot

#

aka performs like a dense model on the gpu

#

if you pin an expert to a gpu - that becomes the unit of scale and you have massive gains in perf

#

as you have a 3-4 b model on a gpu vs a 680b over a few with tensor and data parallelism

fair sphinx
#

since the experts are so small could you use 8~12gb vram gpus for the experts then? or does it still need large gpus?

hot wren
#

im semi affiliated with mistral - but i really have massive respect for the boys at deepseek

#

they did very good work

hot wren
#

what you could eventually do is prune it down to just use 2 experts but after pruning you would need to retrain the gates

#

not sure how well that would work

#

im spitballing here

fair sphinx
#

assign the experts to the different gpus

hot wren
#

ya that work if you have the code to pin it

#

kv may rape you .. a little

#

and the network

#

but yes

fair sphinx
#

👍

hot wren
#

but then the power cost - and throughput

#

i think the cheaper option is to use the api

#

320 small gpu's will drain at least 200 w each aprox ..

#

thats a big steep bill .. not user if 8xh200 at 2 usd per gpu aka 15k a month wont be the cheaper option and you have a higher throughput - at least on paper

#

we just need different matmul processing that is faster and faster memory technology

#

in 1-2 decades we run models like that on our toaster

fair sphinx
#

yeah

hot wren
#

ya i seen that - but mind you thats 20-25k too

#

and single user

fair sphinx
#

yeah

hot wren
#
  • prompt processing over longer ctx on such a setup will be horrible
#

its a great demo for sure

#

but i would not call that viable

#

for every day use that is

#
  • the inital investment for 8 macs .. at that config . wont be a easy paletteable investment for most
#

given in a year or 2 its probably close to worthless

#
  • if you try to spend 20k on the api given that price
#

you have a few years runway

#

lol

#

so not really sensible fiscaly

#

i have a hard time spending 100 bucks on deepseek with daily use

fair sphinx
#

so it seems that the most improvements in local models will probably be dense models with higher-quality data and maybe new finetuning methods?

hot wren
#

stuff gets bigger before it gets smaller

fair sphinx
#

yeah

hot wren
#

distillation and then newer dense models can train of that

fair sphinx
#

oh yeah i forgot about distillation

hot wren
#

i would love to run that local but no dice

#

im capped with 96g vram and 256 gb ram on my normal workstation

#

and that is already more then what the average user has available

bold burrow
#

Why is fireworks

#

So diff from Deepseek as a provider for this model

#

The response is so diff

verbal tapir
cedar wolf
#

Firework seems off. It's like it has it's own temp settings. Together is expensive and DeepSeek doesn't always work. It's a shame.

strong rose
#

The DeepSeek endpoint doesn't seem to work at all rn

glad matrix
hoary pumice
#

looks like a small anomaly

hot wren
#

custom asic's could help .. and some are working on it .. see cerebras / groq .. tpu

#

cerebras is pretty much unbeatable at this point as the interconnect is legaly fenched off to them

#

so groq still has some gains to be gotten once they get the v2 hardware out of the samsung run

#

but the devices are individually small as its all sram

#

so deployments aint cheap

wary wing
#

Does matmul mean "matrix multiplication"?

hot wren
#

yes

#

tpu and tensor cores are systolic arrays

#

pretty much vector processors

#

to accelerate matmul

#

modern cpu's have avx for that but way slower then simd or other achitectures

#

my bet is still on photonics .. but there is alot missing in terms of material science

#

next 1-2 decades are going to be interresting

verbal tapir
#

I mean we can see that from layer with FFN/MLP only we get O(n) and if then we add attention layer into it then we will get O(n^2).

hot wren
#

at least not without massive cont. pretraining

#

otherwise you get noise only

#

so that kills is very much for 99.9% of the smaller guys / labs

grand wraith
cedar wolf
grand wraith
#

deepseek v3 works very well with roo cline

bold burrow
#

Has Fireworks like

#

fixed their issues yet

#

bc other than DeepSeek

#

all other providers have terrible responses

digital silo
grand wraith
grand wraith
#

oh i see it's done in the code, not on website. thx

wary wing
grand wraith
#

surely they get kickbacks for allowing it to happen by default 😛

wary wing
summer mural
vocal ledge
#

Anyone know why, no matter what settings I use, I can only get max response of approx 2000 tokens using deepseek provider? I've set max tokens to 8k.

Even when I ask specifically for up to 5000 words. Cheers

wary wing
vocal ledge
#

It stops naturally, like it starts generating its response "knowing" it's limited to 2k or so words if that makes sense.

hot wren
#

its not a rp model mate

#

dont expect it to write long stuff for you .. it doesnt have to - if anything most us want answers as short as possible and to the point

vocal ledge
#

More for knowledge articles and wiki creation. Yeah it's a shame cause it's my go to model right now, and Gemini could pump out full articles in one prompt but deepseek take a couple prompts. No real issue just was checking if I'm missing something in settings.

rugged phoenix
#

It’s good at creative writing actually.

zinc quarry
#

After all, do you ever really want any model 1 shotting a large of piece of text beyond say code refactoring? Just like a human writer, it's best to start with an outline, and progressively flesh it out.

upbeat onyx
#

is it down

sand palm
#

I get errors over deepseek api, but not on official website.

zinc iron
# hot wren dont expect it to write long stuff for you .. it doesnt have to - if anything mo...

imo its a bad thing if model answer always short, i dont like how sonnet answer a problem without details explanation of solving it even when you ask it where in other hand o1 always giving out details about that problem.

also the other person talk about how its limited to 2k even when its adverst to have more than that, so make sense that person thought it should have been outputting than 2k just as deepseek told us in their label.

hot wren
#

people who cry are the guys who think ai will do all alone ..

#

its a TOOL

zinc iron
#

i think you miss the one and two point, there use case where longer outputing could be really beneficial as in my case its about understanding problem and the longer its the better it could be layout also if it being adverst that it can outputing 8K token then you shouldnt need to steer it so it outputing as what being adverst where also if you ask it to continue then its not as what being adverst to be 8k token output.

i agree its a tool but its still shouldnt have that adverst in the first place if it cant get to it

upbeat aurora
#

It can output 8k but the model doesent feel it is necessery. Suppose you give it a translation task which requires 8k output then it will do that because the model feels the need to output all 8k tokens.

sage orchid
#

are you telling it to generate 8k tokens or are you telling it to like "respond in 10 paragraphs with 5 minimum sentences each" which can equal 8k tokens or something similar

zinc iron
# upbeat aurora It can output 8k but the model doesent feel it is necessery. Suppose you give it...

has you actually try it to generate 8k token for other thing than translate? i has and its really are limited to below 8k token if you ask it to make story even if you steering it to do so, imo its shouldnt adverst as something that could generate 8k token if it cant do it for many thing other than translate and i think eternal answer are more suit for this thing as it is a model problem than other thing where there no example that goes 2k token on the training data.

upbeat aurora
#

then you should probably use a different model for your use case

upbeat onyx
#

ooooo fight fight fight

bold burrow
#

has anyone figured out why the models return such different responses dpeneding on provider?

forest sail
#

Everyone is running a different inference engine and potentially different quantization of weights. DeepSeek's inference engine is proprietary, Fireworks apparently does some form of quantization (https://x.com/FireworksAI_HQ/status/1874231432203337849?t=Y8xmqor0UFhkvzPAJd4H6A&s=19), and everyone else is a wildcard.

DeepSeek V3, a state-of-the-art open model, is now available on Fireworks Serverless and Enterprise!
🥇 SOTA open model for coding and reasoning
🥇 Best performing open model on Chatbot Arena and WebDev Arena
🧠 671B MoE parameters, 37B activated parameters
Congrats to the

#

I don't think there's been confirmation that the different open source inference engines produce the same output for deepseek at the moment.

hot wren
#

also sampers make an impact how they process logits - fireworks will be some triton based inf

#

the best bet specially with moe's / new architectures is always the original model providers api

bold burrow
#

it's v v different unforutnately

#

yikes

graceful mica
#

source: local inference w/o MTP module compared to DeepSeek Platform

#

example: with MTP on, model stays stable at temp 2, w/o it it's much more chaotic

graceful mica
#

it's basically a specifically trained 14B model for speculative decoding

#

more complex than that, but it's the basic idea

bold burrow
#

Any way we can use those for the other providers

#

Or do we just wait for them rn

pure canopy
#

spec decoding is changing outputs a lot, its very unlikey the official api is doing that.

astral flicker
#

Sorry for the intrusion, but can you share the prompt to me too?

wary wing
#

I thought the entire purpose was to be faster and not change outputs

gusty locust
#

I'd like to report that, unless I'm misunderstanding something, Deepseek seems to be limiting their context to only 10k, and it has been this way for days. Any time a prompt exceeds that amount, it never generates a response, and if you lower the context size to 10k, it starts working again.

wary wing
gusty locust
#

I use exclusively deepseek as the provider, and disable the others in the settings.

#

The others just output gibberish.

#

Though the other providers do generate a response even when over 10k. It's just usually garbage.

upbeat aurora
#

Sometimes it works over 8 to 10k but it is probably limited due to load

gusty locust
#

Seems like that should be listed somewhere. I've tried it at random times over the last several days and I've never had it generate a response over 10k. I get that it's a good model at a low price, but I was using it expecting to get the full context listed.

upbeat aurora
#

I agree

sonic plume
#

Could be the same issue

sonic plume
#

@lament yacht Will OR look into this? Because if DeepSeek is indeed limiting input tokens to under 10K, either DeepSeek or OR is lying about DeepSeek having 64K context window

rustic glade
sonic plume
#

I guess it's weekend so no one is answering or investigating this

silent torrent
#

there’s often a difference between a model’s technically supported limits and what the provider actually enables in prod, likely deepseek had to limit context length to serve more people

#

example being google charging more for tokens over 128k while the models can support 2M tokens

#

just checked deepseek discord, it’s 100% an issue on their end

sonic plume
#

It's fine to limit context length. Many Llama and Qwen fine-tune models hosted by Infermatic and Featherless actually support up to 128K context window, yet no host will actually host them with such context window

#

It's just that it should be clear and honest about it on OR's model page

gusty locust
#

Yeah, that's my main issue. It needs to be stated somewhere. This isn't a free service. We are paying for it, and if they are having problems then it should be clearly stated.

lament yacht
# sonic plume <@353228093420208131> Will OR look into this? Because if DeepSeek is indeed limi...

I think we need to chat with DeepSeek upstream, we're following their docs: https://api-docs.deepseek.com/quick_start/pricing

The prices listed below are in unites of per 1M tokens. A token, the smallest unit of text that the model recognizes, can be a word, a number, or even a punctuation mark. We will bill based on the total number of input and output tokens by the model.

silent torrent
#

yeah was gonna say, deepseek themselves don’t say anywhere that they’re enforcing this limit

digital silo
#

This is good

rustic sundial
sonic plume
#

They replied about the hanging problem in the Chinese channel but not in the English channel 😂
They just replied in the English channel

#

Even though it's mainly about Cline, basically they just suggest against using it with longer contexts

#

"Our team is actively optimizing calculation and server load management to improve the overall performance and stability."

#

Basically canned response

#

Also they recommended using other providers when the context is longer. Oh god, if only they know Fireworks often returns rubbish

#

And no mention about fixing their documentation

hoary pumice
# sonic plume

I think its mostly they are running out of hardware to scale further

red fossil
#

is there any caching with fireworks?

bold burrow
#

my issue with them is like

#

when DeepInfra eats shit and gets messed up, and can't serve a model, it's clear, either their accepted context lowers and we're all aware, or the provider just turns red and we're all still aware

upbeat aurora
#

the issue is its not down or red because under 8 to 10k still works and you cant derank because then ppl will complain about costs

pure canopy
#

long context does still work sometimes tho, its not like a blanket ban on anything over 10k

pure canopy
#

Does anyone know if Together is hosting the model fp8 (i,e, as huggingface), or did they quantize it?

bold burrow
#

Other providers perform like ass

#

That’s my issue

onyx flax
#

I've used this and Claude and Claude is faster than Deepseek on Cline. So IDK why Deepseek is slow or just hangs up. Frustrating

digital silo
upbeat aurora
#

is any provider working for anyone? all providers just load forever.

exotic sun
#

I had to add deepinfra and together to my block list because I aint paying $1.25 for deepseek which defeats the purpose of deepseek

covert pawn
#

So now DeepSeek is FP8 and with 10K context?

sonic plume
#

11K worked yesterday for me

#

Don't know if it will change again due to heavy loads

covert pawn
#

In short, for RP it is better to go back to Sonnet and Wizard. Thank you

fossil granite
#

What happened to the 'Fireworks' provider btw?

dusty fable
hybrid plover
fossil granite
#

looks like NovitaAI joined the race, taking place of Fireworks in some ways, let's see how it goes!

#

the main problem and this seem to be a issue for long now, no matter the params you set, the providers seem to tune the models in their own ways such that each provider acts a little different than the other. which disturbs the overall flow 😦

exotic sun
#

this was also noticeable with qwq, some providers did not work well with it

hybrid plover
unborn hare
#

How’s the privacy policy of Together and NovitaAI compare to Fireworks? Their ToS seems pretty similar at first blush

fossil granite
hybrid plover
#

Oof, that's rough.

hoary pumice
#

looks like it died first

hybrid plover
#

Yeah, looks like a lot of providers are struggling with serving such a gigantic model

fossil granite
upbeat aurora
#

3.5 sonnet

hybrid plover
#

True

#

Although I prefer o1, but it's test-time compute based and really quite expensive.

cedar wolf
#

DeepSeek provider seems like the ONLY good option for generating a story. Every other provider just gives a load of rubbish.

#

I've never experienced this with any other model.

hybrid plover
#

Idk, for me personally, Fireworks or Together are the only ones that work for creative writing.

cedar wolf
hybrid plover
#

Around 1

cedar wolf
hybrid plover
#

Yeah, i crank some quite high too. Between 0.5 and 0.8

mild oxide
# cedar wolf DeepSeek provider seems like the ONLY good option for generating a story. Every ...

Really? I'm experiencing the opposite. I thought the model was bad for creative writing, but recently I got routed to deepinfra when the official one is down, and it's vastly better. Disabling fallback, the quality degrades severely again. Still experimenting but my guesses are: 1. Official API might be using a very short max input. 2. It uses a different method for censorship, maybe using logit bias or filtering the input with regex. So it still gives output unlike GPT, but the quality will be extremely bad. Currently the official API is very unstable so I haven't tested too much.

hybrid plover
#

Yeah, i think official API uses some extreme optimization options for cost cutting which causes the performance to degrade too.

cedar wolf
mild oxide
# cedar wolf It's because I've been setting the temp too high for other providers. I'll try a...

Were you using Novita? I find that provider reacts to temp way more than official. The deepseek docs recommend 1.3-1.5 for creative writing, but novita can only generate nonsense at as low as 1.1 . Other providers are also more sensitive to temp than official, but not nearly as much as novita. It also likes to add some comment, like an author's note, at the beginning of its response. None of the other providers does that.
I'm not an expert but this difference between providers really seem strange to me. I thought they should produce mostly the same results, but in deepseek's case, they are very different.

mild oxide
#

Overall Notiva's output is super weird. Not only it adds "author's notes", it also ignores my request to summarize the story (so that I can test whether it cuts context or filter the input). Suspecting the provider's honesty, I asked it which of 9.11 and 9.9 is larger, and it got it completely wrong. Not only the result was wrong, it also used a completely different format. Other providers, include official, ALWAYS starts with "To determine..., let's compare them step by step", and then lists out the steps. Novita doesn't do this, so I'm starting to doubt whether it's even actually deepseek.

cedar wolf
#

It's why I've got all the providers except DeepSeek blocked via OR settings. I'll switch to DeepInfra once I'm done with MiniMax.

upbeat aurora
#

Responses also seem different if I use my own deepseek key or BYOK compared to through openrouter. I wonder if they did something to censor openrouter key

cedar wolf
#

Just started using Together with temp 1.1, rep penalty 0.5, freq penalty 0.7. The difference is staggering.

hybrid plover
#

True

covert pawn
covert pawn
cedar wolf
#

Yes

covert pawn
#

Then I still have repetitions, ouch!!!
And I also tried Presence Penality = 0.7

And repetitions have a standard behaviour, they always appear when a speech or scene is prolonged.
If you have a faster pace they do not appear.

cedar wolf
#

Are you sure you're using Together as the provider?

covert pawn
#

Yes.
If I prolong the scene there are always repetitions, not striking but it recomposes the sentences with the same terms.
Much less than before but using Sonnet, Wizard and Hermes 3 you realise that it cannot be like them, without repeating itself.

cedar wolf
#

I get it. Sadly other models are either expensive, moderated or trained on 50 shades of grey. I have a bot I always go back to. It involves a guy with a secret identity with an online account to contact User. While other models struggle with the concept, DeepSeek just gets it. And it always seem to follow the prompt to the dot while other models seem to derail after a while.

covert pawn
#

You're right, I think the models I mentioned are much more nuanced on the sexual side and that prevents repetitions.
I too find DeepSeek a very good model for some of my SFW cards, but when I use the NSFW ones the repetitions always come back, as if at some point in the scene his vocabulary of phrases runs out or is otherwise more limited than other LLM models.

hybrid plover
#

Why are you using Text completion instead of the Chat one?

cedar wolf
#

I use text because the website won't let me choose.

hybrid plover
#

I see

atomic patrol
#

The deepseek provider has been really slow lately, or is it just me?

hybrid plover
#

Yeah, a lot of providers are struggling with running Deepseek v3 adequately

cedar wolf
# hybrid plover I see

actually I probably don't know if I can or know how to do it on OR, this stuff goes way over my head

cedar wolf
atomic patrol
cedar wolf
plucky drum
atomic patrol
#

it was laggy yesterday for me as well, but a bit more reliable than today

#

i feel like by the time it returns, the promo price will end already, which will suck

rugged terrace
#

i did want to just top up $2 but i genuinely won't be using deepseek enough before the promo price ends to bother with it lol

atomic patrol
#

do i have to top up on deekseek directly too or can i just use my OR credits?

rugged terrace
atomic patrol
#

damn

#

who tf is using deepseek so much that it's rate limiting or

rugged terrace
#

there's probably atleast 500 people right now generating complete automated slop using deepseek on openrouter

#

it's too cheap to not mess around with

atomic patrol
#

they gotta ruin all the fun smh

rugged terrace
#

my first day my cline went wack and made over 2500 files when i was experimenting with a simple refactor

#

if you could put $2 on deepseek directly it works well, you can use the credits for the rumoured r1 or r1-lite launch soon too :)

#

i wish openrouter wasn't limited though

atomic patrol
#

what is the r1/lite?

rugged terrace
#

deepseek's reasoning model, o1 equivalent

atomic patrol
cyan sail
sage orchid
#

Same, Deepseek provider doesn't even work for me. I'm thinking this will remain until their promotional price ends. Their deal is too good so it's flooded

The original price is still worth it though imo

velvet egret
sage orchid
velvet egret
#

So it can act as primary or fallback

atomic patrol
#

damn gotta top up on a different site too on top of it

sage orchid
#

lol yeah. thanks for letting me know though. I'll try it, deepseek roleplay is too good

velvet egret
dusty fable
#

the top still shows 64K, but Together now has 131K. Fireworks just released theirs at 131K too

rustic sundial
#

No model card or report atm. Same size as V3.

hybrid plover
#

Wait, R1 Zero?

#

Is that the full R1?

#

Or version before r1-preview?

rustic sundial
#

Nobody knows yet. Files just appeared, no information yet.

#

But it should be better than V3.

hybrid plover
#

Yeah, fair enough, since V3 was distilled from r1

hybrid plover
velvet egret
lunar flume
#

deepseek just on a roll lately, hope it's something big

hybrid plover
#

I expect it to be at least close to o1-mini performance or a little below that

lunar flume
# hybrid plover I expect it to be at least close to o1-mini performance or a little below that

it destroys o1-mini, on par with o1-medium
https://fixupx.com/StringChaos/status/1880317308515897761?mx=2

DeepSeek-R1 (Preview) Results 🔥

We worked with the @deepseek_ai team to evaluate R1 Preview models on LiveCodeBench.

The model performs in the vicinity of o1-Medium providing SOTA reasoning performance! Huge kudos to the team and I'm looking forward to the full release!!

/1

hybrid plover
#

Interesting

hybrid plover
bold burrow
#

I think DeepSeek V3

#

is down lmao

bold burrow
#

we back

unique shuttle
sand palm
#

Do you feel Deepseek v3 has the ability to properly adapt in multi-turn? Like, It doesn't repeat the same mistake if I give it the tiniest amount of feedback on its math. It's almost humble :>

wary wing
sand palm
uneven gazelle
#

How come we have to use BYOK for lower latency? It seemed to be working fine a few days but now I’m not even getting responses. I’ve switched to BYOK and it works fine now

wary wing
lament yacht
wary wing
lament yacht
forest sail
#

DeepSeek API does NOT constrain user's rate limit. We will try out best to serve every request.
🙈

silent torrent
#

There's probably just something like a global queue

crude cliff
#

: OPENROUTER PROCESSING

I keep getting this for DeepSeek

#

This took like 65s to complete

#

Is this the same for yall?

#

@lament yacht

trail jasper
#

edit: talking about wrong model, my mistake

forest sail
#

OpenRouter doesn't forward reasoning tokens, and it's setup to generate up to 32K tokens of reasoning tokens. At 10 tokens/s, it could take up to an hour before you see the first non-reasoning output. whoops wrong model

crude cliff
trail jasper
#

ah sorry my mistake, should've read the title more carefully 😂 - long day

crude cliff
#

this is happening more frequently. any solution to this or something wrong?

uneven gazelle
crude cliff
#

I don't need to additionaly credits to deepseek right?

uneven gazelle
crude cliff
#

yikes!

sand palm
naive rock
crude cliff
naive rock
silent torrent
#

If you BYOK, you need Deepseek and OR credits

#

BYOK pricing is 5% of the provider pricing deducted from your OR credits

#

For every $1.00 you pay directly to the provider, we'll charge you $.05 in OpenRouter credits.

crude cliff
#

and, without BYOK, just using OR credits?

silent torrent
crude cliff
#

is the OPENROUTER Processing issue related to BYOK?

silent torrent
#

I am not sure, can you see what provider was behind that generation in your activity?

uneven gazelle
lunar flume
#

It's very strange to me that openrouter has been having so many issues with their connection with deepseek

#

I've been using chat.deepseek.com for several days multiple times a day and the tok/s and ttft is always blazing fast and has not timed out on me once

silent torrent
#

granted, you're one user versus our thousands 😅

#

almost 2B tokens through v3 today on OpenRouter

#

& almost 1B on R1

lunar flume
#

I meant to post this in the r1 chat (since I've been using r1) but confused myself haha

lunar flume
#

deepseek the company not the model

naive rock
#

Oh my god V3 works in Sillytavern now too 😄

hybrid plover
#

Now? Pretty sure it worked since release...

pure canopy
#

otherwise maybe they do not know, and think you are just their biggest superfan, who simply cannot get enough deepseek api spam

naive rock
#

It was provider dependent

#

But Seek and Infra did not like ST

hybrid plover
#

Isn't it still provider-dependent or they fixed DeepSeek provider?

#

I would need to check, i guess

naive rock
#

It's fixed

#

In my testing at least. Obviously the DeepSeek provider is getting kind of rocked by requests rn tho

blissful spire
#

hi

slow pollen
#

I am also curios to see why the DeepSeek app is always blazing fast with both R1 and v3 but it is usually either down or too slow from OR (All providers are ignored but DeepSeek). Does BYOK help with up-time and latency? I would like to see real usage example from someone before topping up some credit to DeepSeek API directly 😄

wary wing
#

OpenRouter is being rate limited

slow pollen
# wary wing Byok helps

Oh thanks for letting me know!

Is it noticeable increase? Are you using it that way yourself?

wary wing
slow pollen
#

I will give it a try, thanks a lot!

rustic sundial
sand palm
#

official website no longer works, even without web search enabled. I'm doomed T_T

bold burrow
#

I got a question, my gen ID with 0 token response and a timeout tells me that I am not using BYOK, but I am...why?

  "id": 3996858223,
  "generation_id": "gen-1738012106-DvtnKTf72ZruFYOrbaNR",
  "provider_name": "DeepSeek",
  "model": "deepseek/deepseek-chat-v3",
  "app_id": null,
  "streamed": true,
  "cancelled": false,
  "generation_time": 289575,
  "latency": 13767,
  "moderation_latency": null,
  "created_at": "2025-01-27T21:13:29.867706+00:00",
  "tokens_prompt": 1384,
  "tokens_completion": 0,
  "native_tokens_prompt": 1691,
  "native_tokens_completion": 0,
  "native_tokens_reasoning": null,
  "num_media_prompt": null,
  "num_media_completion": null,
  "num_search_results": null,
  "origin": "",
  "usage": 0.0002343726,
  "usage_cache": null,
  "usage_data": -0.0000023674,
  "usage_web": null,
  "provider_responses": [
    {
      "provider_name": "DeepSeek",
      "status": null,
      "latency": 10000
    },
    {
      "provider_name": "DeepSeek",
      "status": 200,
      "latency": 13767
    }
  ],
  "is_byok": false,
  "finish_reason": null,
  "native_finish_reason": null
}```
silent torrent
bold burrow
#

it's there and it's not a fallback

bold burrow
lament yacht
#

Still very bad that it's straight up 0 completion tokens...

bold burrow
#

yeah it's the deepseek issue from today

#

they're havin a bad time

#

i would like to not be charged for it tho XD

#

tho i unders;tand it's trying times, they're having a bad day

hoary pumice
#

GPUs getting absolutely hammered from around the world

velvet egret
#

The speed was beyond amazing, it's not even R1.

#

Even though i provided the direct API Key (BYOK), but it was not using it.

upbeat aurora
#

interesting 0t/s

silent torrent
silent torrent
naive rock
#

Yeah I have some wild V3 speeds in my OR activity history

#

Like, Groq speeds

shrewd python
#

@silent torrent the situation is getting freaky

silent torrent
shrewd python
silent torrent
#

ah, yeah. IMO things will stabilize pretty significantly in a few days or a week or so

#

lots of hype and lots of difficulties providing stable inference

shrewd python
opal wind
#

why deepseek v3 is so extremely slow?

opal wind
lament yacht
opal wind
#

I see

#

seems that everything is slow, all providers

wary wing
velvet egret
tough whale
#

Guys it’s not slow because of the size, there are only a few active params during inference. The reason it’s slow is because of the massive model hype and low amount of providers supporting it

#

Along with deepseek getting ddosed and novitaai being broken

shrewd python
#

@tough whale someone said the magic word, OpenHands were notified but really NovitaAI has something weird, say didu use it under LiteLLM or direct API?

tough whale
shrewd python
#

Yeah I need to file the report sooner or later XD

pine zodiac
#

What's the good replacement for deepseek v3?

tough whale
#

Also it's painfully lazy

wary wing
near fractal
#

Wow

naive rock
pine zodiac
naive rock
#

It's very good at instruction following and strictly adhering to JSON

#

Very linguistic model overall, not finetuned so hard on math and such

tough whale
#

Actually I noticed that llama is kinda bad in polish, I get better results from Qwen or models that focus to be multimodal

pine zodiac
#

Which version of Qwen (is this version available on OpenRouter?) "models that focus to be multimodal" - any examples? 😄

#

@naive rock tried also Llama 70B 3.3 as you recommened and it knows Polish words better than others, need to test multiple cases, but sounds interesting

#

Curios if openrouter has a default system prompt, because it gave me good responses, but api returns differently

naive rock
#

Ah, I should have asked if this was English only instruction or mixed

naive rock
tough whale
#

All Qwen versions promise to be strong at multilingual tasks

#

Try Qwen 2.5 72b

tough whale
#

The default (1) is already pretty high tho

summer mural
#

is this a common issue right now even on DeepSeek V3? I think I've read about this on R1 @silent torrent

#

0 token completion

silent torrent
#

Yes unfortunately

bold burrow
#

yeah it's

#

kind of annoying that it charges u

#

like just 429 or smth

clear anchor
#

https://openrouter.ai/deepseek/deepseek-chat
Fireworks.ai is missing from the deepseek-v3 provider list.

DeepSeek-V3 is the latest model from the DeepSeek team, building upon the instruction following and coding abilities of the previous versions. Pre-trained on nearly 15 trillion tokens, the reported evaluations reveal that the model outperforms other open-source models and rivals leading closed-source models. Run DeepSeek V3 with API

slow pollen
#

I don’t know if it was mentioned before but Nebius has v3 as well. Will it be added as provider for v3 as well?

slow pollen
# silent torrent Should be live soon :)

By the way, nebius gives error for shortly from time to time saying that you are out of token or sometimes reached rate limit, then it works a few seconds later. Is it also known issue?

silent torrent
slow pollen
silent torrent
#

Everyone is also using them currently, so the rate limits are kind of shared

slow pollen
#

Hmm makes sense. Thank you very much

somber mango
#

finally starting to get some usable providers which is nice

#

deepseek's endpoint is nice, but way too unreliable

bold burrow
#

Nebius just

#

Doesn’t respond properly

#

90% of the time

#

It like jsut chucks out context lmao

cedar wolf
#

Those cheap providers either respond with context(V3)/reasoning process(R1), don't respond at all or stop after a few sentences.

bold burrow
#

yeee

silent torrent
bold burrow
#

no like it forgets context

silent torrent
#

ah

wary wing
bold burrow
#

Nebius in particular

wary wing
uneven gazelle
formal oriole
#

The DeepSeek: DeepSeek V3 (free)/deepseek/deepseek-chat:free has a problem with the provider Targon, the caching creates incoherency in some moments. Like in this situation: In a RP the user goes to a store, and for some reason all the swipes will show the same shopkeeper "Emily, blonde, 20's", I'm not sure if it's a caching problem or the AI just loves that name and age

covert pawn
# formal oriole The ```DeepSeek: DeepSeek V3 (free)/deepseek/deepseek-chat:free``` has a problem...

I honestly don't understand how people manage to do good RP, let alone ERP, with DeepSeek and DeepSeek r1: it's slow, it often crashes, it often goes crazy, sometimes it's smart and other times it looks lobotomized, if you change providers you have to change presets and system prompts that each provider has its own different DeepSeek.
Boh, it may be me stupid and ignorant, but when I do RP and ERP I want to relax, not fight to get a single decent answer.

bold burrow
#

bruh

#

Together is like

#

just completely dimentia-ridden

shrewd python
#

Really? How so?

cedar wolf
# covert pawn I honestly don't understand how people manage to do good RP, let alone ERP, with...

V3 is too repetitive for RP. R1 is lobotomized in the sense that it often does what it wants and takes it too far. But it's so different from everything else we have at the moment, it's entertaining. Yes, I do have to edit almost every reply to keep it away from responding as me, but everything else is either basic, repetitive or moderated. R1 is like an obstacle course but the rewards are often suprisingly good.

bold burrow
#

Then next message it doesn’t know the number LMAO

#

Only for certain providers

formal oriole
formal oriole
#

Yet I forgot to mention that ``The Provider Targor``` sometimes doesn't deliver the complete replies, sometimes it gives cut off replies with half of the reply, not finished

viral trout
#

Hi, does all providers support Tools (function call) & structured output? I tried the standard version (not free) but failed, anyone could help me on that?

dusty fable
somber mango
vital compass
#

I swapped to deepseek api, but sometimes it happens too

#

I migrated to qwen32b coder instead

bold burrow
#

@silent torrent

#

Together on DeepSeekV3

#

what hapepneda lmfao

#

👀

#

i do not have a key configured

#

(i never did)

silent torrent
#

looking

wise flare
#

Fixed now!

delicate mica
#

How do deepseek r1's sampling parameters compare on official website & open router? I seem to observe that same prompt gives better reasoning on deepseek.com and on openrouter it's shorter and more superficial. I'm concerned that the default sampling parameters don't match deepseek.com's

serene arch
indigo harness
#

Finally turns out to be 545% true 🔥
And they just kindly opensourced their moat

atomic patrol
#

what's a good temp for deepseek-v3 if I'm using the free version with only the targon provider?
when I'm using the non-free version, i use the official deepseek provider and for that one, i need to crank up the temp to 1.85 for decent results, i'm wondering what temp should i use to match that in the free version with targon

worn horizon
atomic patrol
#

oh yeah i forgot, i'm using it for RP, and I needed to crank it to 1.9 to have good results

rustic sundial
#

The RP recommendation was generally around 1.8. Higher has higher chance of vomit after 400+ tokens of output in a single generation, unless they fixed that. Think I've only used this model for about 2 weeks after release. And we had to fight it with a little bit of freq penalty.

forest sail
silent torrent
#

not very competitive pricing

digital silo
tough whale
dusty fable
#

MSFT aren't trying to be cheap, all the governance and procurement fluff they have to do for their government & enterprise clients...means they get those same clients without much competition

digital silo
mossy breach
#

I think Chutes(provider) glitched on v3 (Free):

its generating UNREALISTICALLY FAST responses, like 900 tokens per second...

but the responses are all being exactly the same

hybrid plover
#

Yeah, probably cached responses.

formal oriole
#

So, what's happening with all the free DeepSeek models? I don't know which providers are the ones with the cache thing that makes replies be the same when you swipe/regenerate, but the cache thing is making me question everything. I'm using those models for RP in SillyTavern, and I have no idea about what to do with the cache thing. And I think I'm gonna suggest in the suggestions that the providers but have a tag in the model provider list, something that says "This provider may cache your prompts—learn more in [link]"

summer mural
#

but you could try blacklisting one provider to see if it fixes the issue

wary wing
#

I have proof DeepSeek v3 was trained on the Bee movie script, and it's funny

#

It's logical to think that it was trained on the Bee movie script, but it's still funny.

wary wing
# slate carbon Show us, enlight us..

I didn't make it output the entire script, but it went on like this for a little bit, and it matched up exactly with the bee movie script

I had to input some of the script, though

naive rock
#

The new V3 is surprisingly fun to interact with. Maybe the best personality of any model.

#

Curious how they trained it in. Very playful.

wary wing
wraith coral
sonic plume
#

Yeah it's super fun

#

I was brainstorming with it and my system instruction doesn't have anything telling it to be casual, or rude, or dictating its tone or whatsoever

#

Yet it's the only model that says "your character won't be betrayed so easily unless there is a damn good reason"

naive rock
#

lol

#

I went on a small rant about hating semi-colons and how we should just get rid of them. Every other model leaned heavily toward "Well actually here's why they're still a good idea-" whereas new V3 got playful with it and encouraged me on, squeaking in the counter-arguments as "counterarguments from grammar nerds". It finally ends with:

#

Compromise Proposal

Banish semi-colons except for:

  1. Winking at Grammar Nerds (to acknowledge their pain).
  2. Artistic Use (e.g., pretentious novel titles: "The Rain in Spain; The Lies We Weep").

Otherwise, let the comma and period split the semi-colon’s duties like a divorced couple dividing assets. The world might not end—just get slightly more breathless.

Verdict: Proceed with caution. Or recklessly. Language is a democracy (or should be).

#

Contender for my favorite LLM response of all time. This was in no way instructed, with a bog standard system prompt and temp

wraith coral
#

Rare deepseek refusal pull, feels like I just found a shiny pokemon lmao

frigid plover
#

it's never refused me

wraith coral
#

Just a random task lol, never happened to me before either. Worked after one retry