DeepSeek-R1 and DeepSeek-R1-Zero | OpenRouter | Page 3

pearl parcel Feb 18, 2025, 3:30 AM

#

I get some reasoning in the response - but no </think> to separate the reasoning from the answer

#

bright portal Feb 18, 2025, 3:31 AM

#

pearl parcel I get some reasoning in the response - but no </think> to separate the reasoning...

ooh Together

#

Hmm... still working for me, even when I pin to Together

pearl parcel Feb 18, 2025, 3:32 AM

#

do you want me to print out exactly what's going over the wire?

bright portal Feb 18, 2025, 3:32 AM

#

Can you try a prompt like "9.11 and 9.9, which one is larger? Please THINK!"?

pearl parcel Feb 18, 2025, 3:32 AM

#

ok

bright portal Feb 18, 2025, 3:32 AM

#

pearl parcel do you want me to print out exactly what's going over the wire?

it looks correct in the console -- reasoning should not be None if it produced the think tag

pearl parcel Feb 18, 2025, 3:33 AM

#

definitely it's thinking - it just doesn't seem to be wrapping it in <think></think>

bright portal Feb 18, 2025, 3:33 AM

#

we detect the think section and pull it into the reasoning field of the delta

#

So you shouldn't see the <think> tag from our API

pearl parcel Feb 18, 2025, 3:34 AM

#

pearl parcel

take a look at the sample

bright portal Feb 18, 2025, 3:35 AM

#

Hmmm looks sus for sure, let me double check

#

in the meantime, if you do stream: true, does it work?

pearl parcel Feb 18, 2025, 3:36 AM

#

I don't know how to process a stream in python - do you have an example?

bright portal Feb 18, 2025, 3:36 AM

#

https://openrouter.ai/docs/api-reference/streaming

OpenRouter Documentation

API Streaming - Real-time Model Response Integration

Learn how to implement streaming responses with OpenRouter's API. Complete guide to Server-Sent Events (SSE) and real-time model outputs.
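A minimal sketch of processing a stream in Python, for reference. It assumes the stream uses the usual Server-Sent Events framing (`data: ` lines ending with a `[DONE]` sentinel) and that each chunk carries `choices[].delta` with the `reasoning` and `content` fields mentioned above. The helper name `collect_stream` is hypothetical.

```python
import json
from typing import Iterable, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, str]:
    """Accumulate (reasoning, content) from the SSE lines of a streamed chat completion."""
    reasoning_parts, content_parts = [], []
    for line in lines:
        # Only "data: " lines carry payloads; blank lines and ": comment" lines are skipped.
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            reasoning_parts.append(delta.get("reasoning") or "")
            content_parts.append(delta.get("content") or "")
    return "".join(reasoning_parts), "".join(content_parts)
```

With the `requests` library, the lines would typically come from `requests.post(url, json=body, headers=headers, stream=True)` followed by `response.iter_lines(decode_unicode=True)`.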

pearl parcel Feb 18, 2025, 3:40 AM

#

I switched back to /chat/completions and it worked this time, from DeepInfra

bright portal Feb 18, 2025, 3:41 AM

#

pearl parcel I switched back to /chat/completions and it worked this time, from DeepInfra

without streaming right?

pearl parcel Feb 18, 2025, 3:41 AM

#

without streaming

#

it worked again, also from DeepInfra

is there a way to force it to Together so I can verify if it doesn't work on Together?

#

how can I force it?

bright portal Feb 18, 2025, 3:42 AM

#

https://openrouter.ai/docs/features/provider-routing#example-specifying-providers-with-fallbacks-disabled

OpenRouter Documentation

Provider Routing - Smart Multi-Provider Request Management

Route AI model requests across multiple providers intelligently. Learn how to optimize for cost, performance, and reliability with OpenRouter's provider routing.

#

Something like this (in TS):

  body: JSON.stringify({
    'model': 'mistralai/mixtral-8x7b-instruct',
    'messages': [
      {
        'role': 'user',
        'content': 'Hello'
      }
    ],
    'provider': {
      'order': [
        'OpenAI',
        'Together'
      ],
      'allow_fallbacks': false
    }
  }),

#

We will add some Python docs for this soon cc @cerulean lotus
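Until those docs exist, a rough Python equivalent of the TS snippet might look like the sketch below. It only builds the request body; the `provider.order` and `provider.allow_fallbacks` fields are copied from the snippet above, and the helper name `build_pinned_request` is hypothetical.

```python
import json


def build_pinned_request(model: str, prompt: str, providers: list) -> dict:
    """Build a chat completion body pinned to the given providers, fallbacks disabled."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "provider": {
            "order": list(providers),   # providers are tried in this order
            "allow_fallbacks": False,   # do not route to any provider outside the list
        },
    }


body = build_pinned_request(
    "deepseek/deepseek-r1",
    "9.11 and 9.9, which one is larger? Please THINK!",
    ["Together"],
)
print(json.dumps(body, indent=2))
```

The body would then be POSTed to `https://openrouter.ai/api/v1/chat/completions` with an `Authorization: Bearer <key>` header.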

pearl parcel Feb 18, 2025, 3:46 AM

#

ok it's working now, for both Together and DeepInfra (intermittently, at least)
I will just check for null or empty reasoning and retry
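A sketch of that check-and-retry approach, under the assumption that the non-streaming response exposes the reasoning at `choices[0].message.reasoning`. `send` is a hypothetical zero-argument callable that performs one request and returns the parsed JSON.

```python
from typing import Callable, Optional


def has_reasoning(response: dict) -> bool:
    """True when the first choice carries a non-empty reasoning string."""
    choices = response.get("choices") or []
    if not choices:
        return False
    message = choices[0].get("message") or {}
    reasoning = message.get("reasoning")
    return isinstance(reasoning, str) and reasoning.strip() != ""


def request_with_reasoning(send: Callable[[], dict], max_attempts: int = 3) -> Optional[dict]:
    """Call send() until a response with reasoning arrives, or give up and return None."""
    for _ in range(max_attempts):
        response = send()
        if has_reasoning(response):
            return response
    return None
```

Each retry is a new billed request, so a small `max_attempts` keeps the cost bounded.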

#

Thank you for helping me

bright portal Feb 18, 2025, 3:46 AM

#

ur welcome!

#

Looking forward to the next Dolphin

pearl parcel Feb 18, 2025, 3:48 AM

#

I need to get it training on 72b, and also I need to get an RL pipeline set up. (Pretty sure Dolphin-R1-24b would be much better if I used RL to tame its <think> block)

pearl parcel Feb 18, 2025, 4:20 AM

#

2% of responses have no reasoning - That's not too bad

#

occasionally there's no "choices" in the response

pearl parcel Feb 18, 2025, 4:13 PM

#

Reason API error: Response ended prematurely
Reason API error: Response ended prematurely
Reason API error: Response ended prematurely
Reason API error: Response ended prematurely
Processing samples:   3%|█▋                                                    | 1514/47532 [32:58<21:57:27,  1.72s/it]Reason API error: Response ended prematurely
Reason API error: Response ended prematurely

getting a lot of these "Response ended prematurely" and a few of

Reason API error: HTTPSConnectionPool(host='openrouter.ai', port=443): Max retries exceeded with url: /api/v1/chat/completions (Caused by SSLError(SSLError(1, '[SSL: SSLV3_ALERT_BAD_RECORD_MAC] sslv3 alert bad record mac (_ssl.c:2570)')))
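One common way to ride out transport errors like these is a retry wrapper with exponential backoff. The sketch below is generic and not tied to any client; `retry_transient` and its parameters are hypothetical names.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_transient(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(); on a listed exception wait base_delay * 2**attempt seconds and retry."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

With `requests`, `retry_on` would probably be `(requests.exceptions.ChunkedEncodingError, requests.exceptions.SSLError, requests.exceptions.ConnectionError)`. That "Response ended prematurely" surfaces as `ChunkedEncodingError` is an assumption worth confirming against the actual traceback.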

rocky heron Feb 18, 2025, 4:15 PM

#

pearl parcel ``` Reason API error: Response ended prematurely Reason API error: Response ende...

looking

bright portal Feb 18, 2025, 4:30 PM

#

pearl parcel ``` Reason API error: Response ended prematurely Reason API error: Response ende...

What client are you using btw?

#

is it just this raw requests.post?

pearl parcel Feb 18, 2025, 4:33 PM

#

yes

bright portal Feb 18, 2025, 4:34 PM

#

How do you read the data within the response?

#

cc @sleek arch

crystal fjord Feb 18, 2025, 5:50 PM

#

im still, sometimes, getting empty replies from deepseek api provider for r1. is that just cause theyre refusing to reply? sillytav problems? or api problems?

icy fulcrum Feb 18, 2025, 8:04 PM

#

I'm using the deepseek-ai/DeepSeek-R1 API and need providers with a throughput of at least 25 t/s. However, provider performance constantly changes, and some providers, like Nebius Studio, use different model names (e.g., "deepseek-ai/DeepSeek-R1-FAST"), making it difficult to ensure I'm always using the fastest available option.

Right now, API provider routing only allows filtering by provider name, which isn't enough for my needs. I want a way to automatically route requests to any R1 provider that meets my throughput requirement, without manually updating the provider list.

A better solution would allow filtering providers by throughput, uptime, max context, max output and pricing instead of just excluding them by name.

formal nest Feb 18, 2025, 8:05 PM

#

icy fulcrum I'm using the deepseek-ai/DeepSeek-R1 API and need providers with a throughput o...

We have this now!

#

Lots of provider routing options here: https://openrouter.ai/docs/features/provider-routing

#

most recently, we have sort, which lets you sort by throughput, latency, price, etc.

#

we also have a max_price param: https://openrouter.ai/docs/api-reference/parameters#max-price

#

let me know if this begins to fill the requests / features you hope to see.

icy fulcrum Feb 18, 2025, 8:15 PM

#

formal nest let me know if this begins to fill the requests / features you hope to see.

That works. One suggestion: emphasize displaying each provider's uptime % in the analysis. Uptime % is a crucial metric, but it isn't currently shown on the model page, nor can we sort by it through API calls. Thank you!

vocal raven Feb 18, 2025, 8:16 PM

#

icy fulcrum That works. One suggestion: emphasize displaying each provider's uptime % in the...

isn't the whole point of openrouter that it routes away from downtime without any work on your part

icy fulcrum Feb 18, 2025, 8:19 PM

#

vocal raven isn't the whole point of openrouter that it routes away from downtime without...

I see the value of automatic fallback, but in my case, my app relies on chained API calls, data transformations, and long-running processes. Each fallback adds latency and increases processing time. If I can proactively avoid providers with frequent failures, I can minimize disruptions and ensure a more efficient experience for my users. Having uptime data would allow me to make smarter routing decisions upfront, reducing the need for fallbacks in the first place.

vocal raven Feb 18, 2025, 8:22 PM

#

icy fulcrum I see the value of automatic fallback, but in my case, my app relies on chained ...

isn't the whole point of openrouter that it routes away from downtime without any work on your part

rocky heron Feb 18, 2025, 8:23 PM

#

icy fulcrum I see the value of automatic fallback, but in my case, my app relies on chained ...

> Each fallback adds latency and increases processing time

not true! we keep track of how providers are doing and skip them without adding latency if they're unreliable. like @vocal raven said above

silent gulch Feb 18, 2025, 8:36 PM

#

https://x.com/perplexity_ai/status/1891916644248846789?t=7_5m7rcR2w7GFITF2I2QSA&s=19

Perplexity (@perplexity_ai) on X

Download the model weights on our HuggingFace Repo or consider using the model via our Sonar API.

HuggingFace Repo: https://t.co/9HK9mQGKQ1

#

Can we have R1 1776 from sonar api to Openrouter?

icy fulcrum Feb 18, 2025, 8:37 PM

#

rocky heron > Each fallback adds latency and increases processing time not true! we keep tr...

But how is a provider determined to be unreliable? By detecting failed API calls, right? That means failures have to happen first. And unless OpenRouter calls all providers simultaneously, which would be inefficient, the fallback provider is only called after the first one fails, introducing delays. This has happened to me many times, and it's obvious that making one call takes less time than making two or more sequentially. I’m not saying the fallback system is wrong, I think it’s great. But since a provider can only be detected as failing after it fails, the system could be even more efficient if provider uptime were disclosed and sorting by it was possible through the API.

rigid nova Feb 18, 2025, 8:45 PM

#

Past returns do not guarantee future returns

#

On your application side, having logic which retries OpenRouter means you get the routing functionality and the "retry" will then be resolved by OR to a working provider. Only need the one set of API settings in your application.

#

If you are really mission critical, need to build in logic that means if for whatever reason OR goes down (eg domain hijacking), it goes to some other endpoint for a request.

icy fulcrum Feb 18, 2025, 8:57 PM

#

rigid nova Past returns do not guarantee future returns

Past failures don’t guarantee future ones, but probability matters. A provider with lower uptime has a higher chance of failing, and uptime % is a good indicator of that probability. I already have a retry system, but retrying increases latency for my users. I’m just suggesting a way to reduce the chances of needing a retry in the first place, regardless of whether it’s OpenRouter internally handling the retry or me. Thanks for sharing your thoughts!

rugged nest Feb 18, 2025, 9:08 PM

#

formal nest We have this now!

might be cool to be able to route to providers with at least e.g. 50 tokens/s speed, but use default prio, or price prio

so like a custom speed floor

rigid nova Feb 18, 2025, 9:08 PM

#

Publishing more data would be great, I agree.

formal nest Feb 18, 2025, 9:09 PM

#

rugged nest might be cool to be able to route to providers with at least e.g. 50 tokens/s sp...

cc @cedar sentinel

pearl parcel Feb 19, 2025, 12:11 AM

#

~50% of my requests are coming back with no reasoning, now.

#

it's getting expensive

bright portal Feb 19, 2025, 12:13 AM

#

pearl parcel ~50% of my requests are coming back with no reasoning, now.

Can you log the provider as well? We return it as provider_name

#

Or are you pinning it to a known set
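A small sketch of the kind of per-provider logging being asked for. It assumes each saved response is the parsed JSON body, that the provider appears as a top-level `provider` key (adjust if it arrives as `provider_name`), and that reasoning sits at `choices[0].message.reasoning`. The function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def missing_reasoning_rate(responses: Iterable[dict]) -> Dict[str, float]:
    """Return, per provider, the fraction of responses with no reasoning."""
    totals, missing = Counter(), Counter()
    for response in responses:
        provider = response.get("provider", "unknown")
        totals[provider] += 1
        choices = response.get("choices") or []
        message = (choices[0].get("message") or {}) if choices else {}
        if not message.get("reasoning"):
            missing[provider] += 1
    return {provider: missing[provider] / totals[provider] for provider in totals}
```

Running this over a finished subset would show whether the empty-reasoning responses cluster on one provider.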

pearl parcel Feb 19, 2025, 12:13 AM

#

on my next subset I will (I don't wanna throw away my progress)

vocal raven Feb 19, 2025, 12:25 AM

#

pearl parcel ~50% of my requests are coming back with no reasoning, now.

you should consider batches

icy fulcrum Feb 19, 2025, 9:20 PM

#

formal nest cc @cedar sentinel

Hi,

I’ve noticed that R1 is currently returning many errors, as shown on the uptime page. In my app, this causes long-chained API calls to fail, leading to financial losses when expensive processes fail before completion.

To prevent this, I suggest adding an API endpoint to check the current uptime of each model. This would allow apps like mine to monitor uptime and take preventive action. For example, if uptime drops below 90%, I would display a message to users informing them that workflows are temporarily unavailable, preventing them from starting processes that are likely to fail.

This feature would help developers reduce failed requests and unnecessary costs while improving user experience.

Thanks for considering this!

bright portal Feb 19, 2025, 9:21 PM

#

icy fulcrum Hi, I’ve noticed that R1 is currently returning many errors, as shown on the [u...

Are you pinning to any provider in particular?

icy fulcrum Feb 19, 2025, 9:23 PM

#

bright portal Are you pinning to any provider in particular?

My providers array order is: "Fireworks",
"Together",
"Nebius"

Deepseek is ignored.

Fallbacks are enabled. max_price is 8 for both prompt and completion. sort by throughput is enabled.
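Written out as a request body, the setup described above might look like the following sketch. The field names (`order`, `ignore`, `allow_fallbacks`, `sort`, `max_price`) are assumptions based on the provider routing docs linked earlier in the thread, and the exact provider name casing should be checked against those docs.

```python
import json

# Routing preferences as described in the message above (field names are assumptions).
provider_prefs = {
    "order": ["Fireworks", "Together", "Nebius"],
    "ignore": ["DeepSeek"],
    "allow_fallbacks": True,
    "sort": "throughput",
    "max_price": {"prompt": 8, "completion": 8},
}

request_body = {
    "model": "deepseek/deepseek-r1",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": provider_prefs,
}

payload = json.dumps(request_body)
```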

rigid nova Feb 19, 2025, 9:31 PM

#

icy fulcrum Hi, I’ve noticed that R1 is currently returning many errors, as shown on the [u...

there are ways you can make your application more resilient immediately: staging (send a partial canned response to the user implying thought, or a loading wheel) while you retry in the background, or having an HA fallback mechanism to Azure or Vertex etc. 98% uptime means 200 of every 10000 will fail, so you know they will fail

#

using data from OR is slower than having an immediate path for your program to go down in case of API failure

#

within OR you could set it up to fallback to one of their highly available models, and if it has fallen back as shown in the response, you can request again for the actual R1 response
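The 98% figure above can be worked through for chained calls. This is a back-of-the-envelope sketch that treats uptime as a per-request success rate and assumes failures are independent, which real outages usually are not.

```python
def chain_success_probability(per_call_success: float, calls: int) -> float:
    """Probability that every call in a chain succeeds, assuming independent failures."""
    return per_call_success ** calls


# 98% success over 10,000 requests: about 200 expected failures.
expected_failures = round((1 - 0.98) * 10_000)

# A workflow of 10 chained calls at 98% each completes only about 82% of the time.
ten_step_chain = chain_success_probability(0.98, 10)
print(expected_failures, round(ten_step_chain, 3))
```

This is why long chains feel the effect of modest per-call failure rates so strongly, and why retrying the failed step rather than the whole chain matters.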

icy fulcrum Feb 19, 2025, 9:49 PM

#

rigid nova within OR you could set it up to fallback to one of their highly available model...

Thank you for sharing your ideas!

I have OpenRouter’s fallback mechanism enabled, and I already have a retry system that calls OpenRouter again when an error is detected.

However, the issue is more complex than just handling explicit errors. Some failed API calls don’t return an error at all, making it difficult to detect programmatically. For example, I received the following response:

{
  "result": {
    "id": "gen-1739998011-kBgJfDmW76mQtvtdyK4L",
    "model": "deepseek/deepseek-r1",
    "usage": { "total_tokens": 5623, "prompt_tokens": 5619, "completion_tokens": 4 },
    "object": "chat.completion",
    "choices": [
      {
        "index": 0,
        "message": { "role": "assistant", "content": "[-1]", "refusal": null },
        "logprobs": null,
        "finish_reason": "stop",
        "native_finish_reason": "stop"
      }
    ],
    "created": 1739998011,
    "provider": "DeepInfra"
  }
}
The [-1] response is clearly incorrect and not at all what the model should have returned. This isn’t a prompt engineering issue but rather a problem with degraded responses from the model.

These kinds of errors tend to happen more frequently when uptime is lower, suggesting that the providers available are more likely to be operating at degraded capacity rather than failing outright. Having an uptime API would help my app prevent initiating workflows during these degraded states, where failures are harder to detect.

formal nest Feb 19, 2025, 9:50 PM

#

please vote: #announcements message

rigid nova Feb 19, 2025, 9:57 PM

#

icy fulcrum Thank you for sharing your ideas! I have OpenRouter’s fallback mechanism enable...

yeah I get you, you could flag anything that is obviously wrong, unless you ever expect answers <4 characters? there is client side logic like that, or checking against a local dictionary to see if words have actually been returned. character count is simpler

deepinfra hasn't been my favourite but staff wouldn't have brought it back without serious testing. you shouldnt be billed for a response like that imo
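A minimal sketch of that kind of client-side sanity check. The thresholds are hypothetical: `min_chars` is set to 5 so that a 4-character reply such as "[-1]" is caught, and the "dictionary" check is simplified to "contains at least one alphabetic word".

```python
import re
from typing import Optional


def looks_degraded(content: Optional[str], min_chars: int = 5, min_words: int = 1) -> bool:
    """Heuristic: flag replies that are empty, very short, or contain no real words."""
    text = (content or "").strip()
    if len(text) < min_chars:
        return True
    # Stand-in for a dictionary lookup: count runs of two or more letters.
    words = re.findall(r"[A-Za-z]{2,}", text)
    return len(words) < min_words
```

A check like this would also flag legitimately short or purely numeric answers, so the thresholds have to match what the application expects.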

icy fulcrum Feb 19, 2025, 10:15 PM

#

rigid nova yeah I get you, you could detect for something that is wrong unless you ever exp...

Checking characters might detect some errors, not the ideal solution though.

Yeah I don't have Deepinfra in my providers array, it was called by a fallback. I agree, but what's more important for me is to be able to prevent these kinds of errors from occurring. Model uptime tracking via API would help a lot

cedar sentinel Feb 19, 2025, 10:49 PM

#

icy fulcrum Checking characters might detect some errors, not the ideal solution though. Ye...

we're actually in the middle of revamping our downtime avoidance system, so this is timely!

but this overhaul is more geared towards request statuses returned from upstream (4xx/5xx) and dynamically routing around outages better than we do today

in your case here tho, it looks like our response status tracking won't help much. since it was a "successful" request with 200 status and a non-error finish reason. our uptime stats won't necessarily cover "quality" of outputs

for this sort of thing, we're in the early phases of planning some continuous evals that will penalize providers that consistently output garbage. still more thinking to do on it though, to try and differentiate bad sampling parameters from bad providers

rigid nova Feb 19, 2025, 11:22 PM

#

my hobby account with sambanova just got API access for the full R1, maybe it is becoming more generally available...

merry path Feb 19, 2025, 11:25 PM

#

What kind of context limit?

rigid nova Feb 20, 2025, 12:39 AM

#

merry path What kind of context limit?

Only 4K

tight jolt Feb 20, 2025, 8:26 AM

#

Supposing that FP8 variants of R1 are a little faster but dumber(?) than those without the "FP8" (so FP16?), is there a way to set the ":nitro" shortcut excluding the FP8 variants?

rigid nova Feb 20, 2025, 9:26 AM

#

tight jolt Supposing that FP8 variants of R1 are a little faster but dumber(?) than those w...

honestly everyone is fp8, this is what deepseek recommends. The others just have not declared it

tight jolt Feb 20, 2025, 9:38 AM

#

rigid nova honestly everyone is fp8, this is what deepseek recommends. The others just have...

Ah, very interesting. Thanks!
PS: I also found that there's a quantization filter setting: https://openrouter.ai/docs/features/provider-routing#quantization

rigid nova Feb 20, 2025, 9:38 AM

#

tight jolt Ah, very interesting. Thanks! PS: I also found that there's a quantization filte...

Excellent, they have improved the docs a lot so great that you found it

earnest wolf Feb 20, 2025, 11:13 AM

#

tight jolt Supposing that FP8 variants of R1 are a little faster but dumber(?) than those w...

Fp8 is the highest quant. There is no fp16 or anything

R1 and v3 were natively trained in fp8

icy fulcrum Feb 20, 2025, 3:48 PM

#

cedar sentinel we're actually in the middle of revamping our downtime avoidance system, so this...

This sounds good. However, please consider adding a GET API endpoint to retrieve:

Overall uptime % of a model – When general uptime is low, the model tends to return more garbage responses from the available providers. This is especially critical for our use case since we rely on long-chained API calls, where failures mid-process result in financial losses and wasted compute. Having this data available via API would allow us to proactively prevent workflows when the model is in a degraded state.

Uptime % of a model for a specific provider – Some providers may perform worse than others, and low provider-specific uptime would allow us to dynamically avoid routing requests to unstable providers. This helps reduce latency caused by sending requests to failing providers and having to retry them multiple times.

This is a very simple feature with huge benefits—uptime data is already available on the frontend, so making it accessible via an API endpoint should be straightforward while significantly improving reliability for developers.

formal nest Feb 20, 2025, 3:50 PM

#

icy fulcrum This sounds good. However, please consider adding a GET API endpoint to retrieve...

Thanks for all your feedback, it's very valuable

inner reef Feb 20, 2025, 10:39 PM

#

icy fulcrum Thank you for sharing your ideas! I have OpenRouter’s fallback mechanism enable...

it might be you're using json response format, and "[-1]" is a valid json. Using json format for R1 will make the output quality bad, because it's trained to start with <think> tokens.

vale marten Feb 22, 2025, 8:54 AM

#

Someone opened a PR on the R1 HF repo enabling using prefix=True with a modified chat template https://huggingface.co/deepseek-ai/DeepSeek-R1/discussions/158

cc @peak flame

deepseek-ai/DeepSeek-R1 · support prefix complete

#

dark vigil Feb 22, 2025, 1:07 PM

#

Now if they could just support temperature...

#

(Via the official Deepseek api I mean)

pearl parcel Feb 22, 2025, 4:31 PM

#

this is why we have inconsistency.
providers are deviating from the standard template.
neuralmagic's problem.
ideal or no - EVERYONE should use the official template, and workaround that (by inserting a fake <think> token on the client side) if necessary, otherwise chaos ensues.

#

https://huggingface.co/neuralmagic/DeepSeek-R1-Distill-Llama-70B-FP8-dynamic/discussions/3

neuralmagic/DeepSeek-R1-Distill-Llama-70B-FP8-dynamic · Update toke...

#

my personal workaround is to use the official tokenizer, and tokenize the prompt myself, then call the completions endpoint

vocal raven Feb 22, 2025, 4:35 PM

#

pearl parcel my personal workaround is to use the official tokenizer, and tokenize the prompt...

Surely you mean you apply the chat template
Surely you don't actually convert the text into a series of IDs

pearl parcel Feb 22, 2025, 4:40 PM

#

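import json

import requests
from transformers import AutoTokenizer

# `messages` (the chat history) and `max_tokens` (the 32k budget) are defined elsewhere.
reason_tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Llama-70B")

# Tokenized copy: only used to count the prompt tokens.
tokenized_input = reason_tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
)
# Non-tokenized copy: the templated prompt string sent to the completions endpoint.
nontokenized_input = reason_tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
prompt_tokens = tokenized_input.numel()
# Leave room for the prompt inside the budget, minus one token of slack.
allowed_completion_tokens = max_tokens - prompt_tokens
final_max = min(allowed_completion_tokens, max_tokens)
final_max = max(final_max, 1) - 1
response = requests.post(
    url="http://localhost:4000/v1/completions",
    data=json.dumps({
        "model": "r1-distill-70b",
        "prompt": nontokenized_input,
        "temperature": 0.5,
        "max_tokens": final_max
    })
).json()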

I use tokenized to count the tokens, to properly set the max_tokens
I use non-tokenized to pass to the completions endpoint.

#

(and it's kind of insane that we need to tokenize the input on the client side like this, just to figure out how to set max_tokens)

rocky heron Feb 22, 2025, 4:42 PM

#

why do you need to set max_tokens?

rocky heron Feb 22, 2025, 4:42 PM

#

pearl parcel this is why we have inconsistency. providers are deviating from the standard tem...

we're working on doing this automatically

#

cc @bright portal

pearl parcel Feb 22, 2025, 4:42 PM

#

because otherwise, a default value (typically 8k) is used.

rocky heron Feb 22, 2025, 4:43 PM

#

is that too large for you?

pearl parcel Feb 22, 2025, 4:43 PM

#

I want 32k

rocky heron Feb 22, 2025, 4:43 PM

#

ah

pearl parcel Feb 22, 2025, 4:43 PM

#

minus the request tokens

#

minus 1 lol

#

final_max = max(final_max, 1) - 1

rocky heron Feb 22, 2025, 4:44 PM

#

why not just set max_tokens: 32k then? max tokens is the max completion tokens, not the whole thing

pearl parcel Feb 22, 2025, 4:44 PM

#

because then I get error "you requested blabal but the prompt + 32k is greater than the 32k the model can provide"

rocky heron Feb 22, 2025, 4:45 PM

#

ohh

#

ok we should fix this

pearl parcel Feb 22, 2025, 4:45 PM

#

I worked around it, anyway 🙂

#

the important thing to note is some providers of R1-Distill-70b are removing the <think> from the prompt template

#

(which causes it to generate a <think> token - when it wants to)

rocky heron Feb 22, 2025, 4:46 PM

#

right that is separate and something we're working on

pearl parcel Feb 22, 2025, 4:46 PM

#

as opposed to the default which forces a <think> token (but then it doesn't generate one)

#

FYI, I'm getting a number of rambling never-ending <think> (entire 32k with never an answer) I guess it is temperature related

#

maybe 1% of requests

#

I can try dropping temperature to .4 or .3

wheat comet Feb 22, 2025, 6:00 PM

#

H

vale marten Feb 22, 2025, 6:19 PM

#

rocky heron right that is separate and something we're working on

Thank you so much for looking into this

vale marten Feb 22, 2025, 6:20 PM

#

rocky heron we're working on doing this automatically

That would be great too

vale marten Feb 23, 2025, 7:51 AM

#

I talked to the Fireworks people about not being able to prefill with their R1 API. Seems like they will come up with a fix. I mentioned https://huggingface.co/deepseek-ai/DeepSeek-R1/discussions/158
but I am not sure what fix they will come up with.

#

Mon/Tues

#

@rocky heron @bright portal @peak flame

restive wharf Feb 23, 2025, 2:03 PM

#

pearl parcel as opposed to the default which forces a <think> token (but then it doesn't gene...

great job cluing in these amateurs

pearl parcel Feb 23, 2025, 2:26 PM

#

There's no amateurs here

clever jolt Feb 23, 2025, 2:33 PM

#

@pearl parcel are any providers using the updated chat template? The ones I tested seemed to still be using the old one

#

(i.e. the one without <think>)

#

according to the current tokenizer published by deepseek {"content":"hi","role":"user"} -> '<｜begin▁of▁sentence｜><｜User｜>hi<｜Assistant｜><think>\n'

#

it doesn't look like R1 providers are doing this yet tho?
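To make the difference between the two templates concrete, here is a hand-rolled sketch that reproduces the single-turn string quoted above. It is an illustration only: it ignores system prompts and multi-turn history, the helper name is hypothetical, and real code should call the official tokenizer's `apply_chat_template` instead.

```python
BOS = "<｜begin▁of▁sentence｜>"
USER = "<｜User｜>"
ASSISTANT = "<｜Assistant｜>"


def render_single_turn(user_text: str, force_think: bool = True) -> str:
    """Render one user message the way the quoted template does."""
    prompt = f"{BOS}{USER}{user_text}{ASSISTANT}"
    if force_think:
        # Updated template: the prompt already opens the think block,
        # so the model's output starts inside it and will not repeat the tag.
        prompt += "<think>\n"
    return prompt


print(repr(render_single_turn("hi")))
print(repr(render_single_turn("hi", force_think=False)))  # older template, no <think>
```

A client that parses reasoning out of the raw completion has to account for which variant the provider used, since only the older one leaves the opening tag in the output.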

pearl parcel Feb 23, 2025, 2:37 PM

#

Most are using the neuralmagic fp8 out of the box

clever jolt Feb 23, 2025, 2:39 PM

#

ah ok, just saw your PR on their repo, hopefully they update it

pearl parcel Feb 23, 2025, 2:39 PM

#

They fixed it

#

#

Just a matter of everyone updating their models 😅 and then mitigating the lack of <think> in the response that the UI depends on

clever jolt Feb 23, 2025, 2:41 PM

#

cool, I'll check again in a few days then

#

hopefully providers will update their deployments

vale marten Feb 23, 2025, 2:49 PM

#

For now I am using this template and the completion endpoint (instead of chat completion) in order to prefill thinking... https://huggingface.co/deepseek-ai/DeepSeek-R1/discussions/158

peak flame Feb 24, 2025, 1:25 AM

#

https://www.perplexity.ai/page/china-launches-underwater-comp-_SRMVJXjQc2LLmJsq2_2zQ

whale literally computing underwater
based?

Perplexity AI

China Launches Underwater Computing Cluster

China has deployed an underwater intelligent computing cluster off the coast of Hainan, delivering processing power equivalent to 30,000 high-end gaming PCs...

#

I don't know where this comes from, but their uptime is starting to be fixed though we still can't purchase credits directly.
https://zzzzzzz.grafana.net/public-dashboards/88296a8e74c14dae8f839c2b9973214b
Few days later edit: Whoops, they're getting intermittent down times again.

vocal raven Feb 24, 2025, 2:29 AM

#

is it better than https://en.wikipedia.org/wiki/Project_Natick

Project Natick

Project Natick was an experimental data center that underwent research and development by Microsoft. Microsoft deployed its first undersea data center prototype in August 2015. It subsequently deployed and retrieved a "shipping-container" sized data center off the coast of the Northern Isles in 2018. Microsoft subcontracted Naval Group to spearh...

formal nest Feb 24, 2025, 4:29 PM

#

FYI, slowly rolling out an update to how we handle thinking generations on ALL R1 models. They will now consistently think, and prefill will now consistently work.

wintry dome Feb 25, 2025, 12:18 AM

#

formal nest FYI, slowly rolling out an update to how we handle thinking generations on *ALL*...

right now, whenever I use Deepinfra or Nebius AI Studio, the reason and context are just two different versions of a reply or sometimes it looks like the first half of the context is in reasoning...but there's no "reasoning" in reasoning.

Deepseek (the source) is still returning it correctly. Does this have something to do with your changes? Did something change that ST hasn't caught up on perhaps?

rigid nova Feb 25, 2025, 8:59 AM

#

rest in peace hyperbolic:deepseek/deepseek-r1-zero

[image attachment]

vale marten Feb 25, 2025, 7:27 PM

#

formal nest FYI, slowly rolling out an update to how we handle thinking generations on *ALL*...

Hmm, I'm very interested in knowing what you guys did to make prefill work consistently...

vale marten Feb 26, 2025, 1:49 AM

#

https://fxtwitter.com/AndrewCurran_/status/1894355918621749402

FxTwitter / FixupX

Andrew Curran (@AndrewCurran_)

Deepseek R2 is arriving early.

sinful crown Feb 26, 2025, 4:02 AM

#

👀 👀 🍿

tender pawn Feb 26, 2025, 9:53 AM

#

https://i.febryan.me/j9lon.png

#

Discount on certain time interval.

junior skiff Feb 26, 2025, 11:30 AM

#

that is interesting, source please link

pale hull Feb 26, 2025, 11:59 AM

#

https://api-docs.deepseek.com/quick_start/pricing/

Models & Pricing | DeepSeek API Docs

The prices listed below are in units of per 1M tokens. A token, the smallest unit of text that the model recognizes, can be a word, a number, or even a punctuation mark. We will bill based on the total number of input and output tokens by the model.

wintry dome Feb 26, 2025, 3:20 PM

#

So I just got this from DeepSeek. I know this would apply if we use our own API, but I wonder if the discount would apply through OR (assuming using the deepseek provider and not one of the others)

[image attachment]

formal nest Feb 26, 2025, 3:32 PM

#

wintry dome So I just got this from DeepSeek. I know this would apply if we user our own API...

Yeah if you use your own key through OR, you'd get the discounted pricing.

wintry dome Feb 26, 2025, 3:39 PM

#

formal nest Yeah if you use your own key through OR, you'd get the discounted pricing.

I figured that, but what if we use OR's deepseek provider

rigid nova Feb 26, 2025, 4:38 PM

#

wintry dome I figured that, but what if we use OR's deepseek provider

If you set a Deepseek API key in OR settings, you don't pay anything for the token to openrouter, but you pay 5% of the normal cost as a transaction fee: https://openrouter.ai/docs/use-cases/byok

OpenRouter Documentation

BYOK - Bring Your Own Keys to OpenRouter

Learn how to use your existing AI provider keys with OpenRouter. Integrate your own API keys while leveraging OpenRouter's unified interface and features.

#

Make sure you monitor your usage, as when the deepseek provider is unavailable with your key, it will retry with your credits, and you would pay OR for the token.

#

(setting your deepseek key does not guarantee it will be used, just beware)

wintry dome Feb 26, 2025, 4:59 PM

#

rigid nova If you set a Deepseek API key in OR settings, you don't pay anything for the tok...

This doesn't answer my question. Setting aside the BYOK. If I use DeepSeek v3 or DeepSeek-R1 through OR (making sure I use the actual DeepSeek provider when doing so since both models have many providers), will the discount be applied. That's what I'm asking.

formal nest Feb 26, 2025, 5:14 PM

#

wintry dome This doesn't answer my question. Setting aside the BYOK. If I use DeepSeek v3 or...

Not at this time. We don't support their pricing discounts

junior skiff Feb 26, 2025, 9:02 PM

#

formal nest Not at this time. We don't support their pricing discounts

because its technically not possible atm or because you just want to make 75% more cash during those few hours ?

#

if its a technical thing - i can understand that this can get hairy / specially on provider level

formal nest Feb 26, 2025, 9:04 PM

#

we don't have the code support for it

junior skiff Feb 26, 2025, 9:05 PM

#

ya that is cool - here is a recommendation that may be an easier implementation - weekly refunds of the overcharging happening for that 1 provider .. in a given timeframe

#

so charge full

#

but refund the delta once a week or a month

#

should be a fairly easy processing step / and doesnt need to be changed during proxy passing

#

just food for thought

#

since transaction logs are stored anyway

#

i think that would be the happy tradeoff

#

not sure what alex or the rest of the team thinks
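The "charge full, refund the delta later" idea above can be sketched as a batch pass over stored transactions. This is only an illustration: the discount window (16:30 to 00:30 UTC), the 75% discount, the log shape, and the function names are all assumptions, not OpenRouter's actual billing code.

```python
from datetime import datetime, time, timezone

# Assumed discount window (UTC) and discount rate; both are illustrative.
WINDOW_START = time(16, 30)
WINDOW_END = time(0, 30)
DISCOUNT = 0.75


def in_discount_window(ts: datetime) -> bool:
    """True if ts falls in a window that wraps past midnight UTC."""
    t = ts.astimezone(timezone.utc).time()
    return t >= WINDOW_START or t < WINDOW_END


def refund_due(transactions):
    """Sum the overcharge for full-price transactions inside the window.

    Each transaction is a (timestamp, amount_charged_usd) pair; this log
    shape is hypothetical.
    """
    return sum(amount * DISCOUNT
               for ts, amount in transactions
               if in_discount_window(ts))


log = [
    (datetime(2025, 2, 26, 17, 0, tzinfo=timezone.utc), 4.00),  # in window
    (datetime(2025, 2, 26, 12, 0, tzinfo=timezone.utc), 4.00),  # outside
    (datetime(2025, 2, 27, 0, 15, tzinfo=timezone.utc), 2.00),  # in window
]
print(refund_due(log))
```

Because the pass runs over stored logs, nothing in the request path has to change; the cost is that users are overcharged until the batch runs.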

icy fulcrum Feb 26, 2025, 10:43 PM

#

inner reef it might be you're using json response format, and "[-1]" is a valid json. Using...

According to this, R1 supports response format. Is this wrong?

half sapphire Feb 26, 2025, 10:55 PM

#

formal nest Not at this time. We don't support their pricing discounts

But we're still getting

#

their discounts

#

right?

#

Especially if I'm using a custom integration key?

formal nest Feb 26, 2025, 10:58 PM

#

half sapphire Especially if I'm using a custom integration key?

You only get their time-based discount if a generation is going through your own API key, in which case you are billed at what they bill you at, including discounts. And our 5% fee comes from the discounted rate.

#

otherwise, no, it's the standard non-discounted rates, at all times, through OpenRouter
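To make the BYOK billing rule above concrete, here is a rough sketch of the arithmetic. The list prices are taken from the $0.55/$2.19 per 1M tokens figure mentioned in this channel; the 75% discount and the helper name `byok_cost` are assumptions for illustration only.

```python
# Sketch of BYOK billing as described above: the provider bills you at its
# own (possibly discounted) rate, and OpenRouter's 5% fee is taken from
# that discounted amount. Prices and discount below are assumed examples.

def byok_cost(prompt_tokens, completion_tokens,
              prompt_price_per_m=0.55, completion_price_per_m=2.19,
              discount=0.75, fee_rate=0.05):
    """Return (provider_cost, openrouter_fee) in USD."""
    list_cost = (prompt_tokens * prompt_price_per_m
                 + completion_tokens * completion_price_per_m) / 1_000_000
    provider_cost = list_cost * (1 - discount)
    openrouter_fee = provider_cost * fee_rate
    return provider_cost, openrouter_fee


provider_cost, fee = byok_cost(1_000_000, 1_000_000)
print(f"provider: ${provider_cost:.4f}, OpenRouter fee: ${fee:.4f}")
```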

half sapphire Feb 26, 2025, 11:10 PM

#

formal nest You only get their time-based discount if a generation is going through your own...

Amazing i'm using my own API key

#

beautiful, ya'll are goated thank u!!

vocal raven Feb 26, 2025, 11:55 PM

#

tender pawn https://i.febryan.me/j9lon.png

(16:30 - 00:30 UTC)

vocal raven Feb 27, 2025, 12:00 AM

#

formal nest otherwise, no, it's the standard non-discounted rates, at all times, through Ope...

i really hope you can implement this soon

formal nest Feb 27, 2025, 12:03 AM

#

vocal raven i really hope you can implement this soon

At this time, it is not a high priority item for us, but we are planning to support it at some point.

opaque veldt Feb 27, 2025, 12:24 AM

#

I can imagine it isn't exactly straight forward, since it is time based, but I wonder if other providers will do similar things as a way to differentiate themselves

formal nest Feb 27, 2025, 12:28 AM

#

opaque veldt I can imagine it isn't exactly straight forward, since it is time based, but I w...

Yeah the thinking on our end is we should have some dynamic pricing capabilities that can normalize across providers / models / implementation details, instead of a one-off

opaque veldt Feb 27, 2025, 12:29 AM

#

yeah, completely makes sense

as someone mentioned, if it was a big enough deal could do retroactive manual (with scripts) refunds when users would have got the discount

#

but likely not a huge deal

vocal raven Feb 27, 2025, 12:29 AM

#

or make a cron job to change the pricing at a fixed time every day

#

that way the openrouter.ai price is updated too

opaque veldt Feb 27, 2025, 12:30 AM

#

vocal raven or make a cron job to change the pricing at a fixed time every day

well that could be prone to issues, since there could be timing issues

vocal raven Feb 27, 2025, 12:30 AM

#

opaque veldt well that could be prone to issues, since there could be timing issues

wdym?

#

are you saying it might lag behind?

opaque veldt Feb 27, 2025, 12:31 AM

#

just slight differences between when things actually switch on the provider side, or the cron job failing and the price staying wrong for too long

#

I personally am not a fan of cron jobs for core infra like that

vocal raven Feb 27, 2025, 12:32 AM

#

opaque veldt just slight differences between when things actually switch on the provider side...

well i don't think anything could fix the provider lagging behind

#

no way to know that afaik

#

(response's usage is just token info, no price info)

opaque veldt Feb 27, 2025, 12:35 AM

#

yeah, though depending on setup, a tighter integration with a provider might be a better option, e.g. where the provider can tell openrouter the price is changed

#

but that presupposes that they care about openrouter 😄

formal nest Feb 27, 2025, 12:36 AM

#

opaque veldt yeah, though depending on setup, a tighter integration with a provider might be ...

🎯

junior skiff Feb 27, 2025, 2:49 AM

#

the retro active refund would be the easy part

#

as you run it once a month

#

i dont care if i overpay for 1 provider

#

if i know i get the money eventually back

#

going byok defeats a bit of the purpose of holding funds with or

#

its just 1 provider anyway / but the savings are nice .. difference if you spend 1000 bucks or 250 on a bigger synth gig

#

if you get the allocation that is

pale hull Feb 27, 2025, 6:06 AM

#

DeepInfra added a faster endpoint for R1 at $2/$6 (fp4 quantized) https://x.com/DeepInfra/status/1894866880160244163

DeepInfra (@DeepInfra) on X

🚀 Exciting news! @DeepInfra just dropped Deepseek R1 Turbo—blazing fast at up to 40 tokens per second!

🔥 Runs on Nvidia B200 GPUs
💰 Pricing: $2/$6 per 1M tokens
📍 Hosted in the US 🇺🇸

Try it now on DeepInfra! As always - the best price.

opaque veldt Feb 27, 2025, 6:11 AM

#

pale hull DeepInfra added a faster endpoint for R1 at $2/$6 (fp4 quantized) https://x.com...

max 32k context window though

#

as chatty in reasoning as r1 is, lol

tender pawn Feb 27, 2025, 8:20 AM

#

pale hull DeepInfra added a faster endpoint for R1 at $2/$6 (fp4 quantized) https://x.com...

FP4 though, then Nebius is much better.

#

formal nest Feb 27, 2025, 1:30 PM

#

are aware of both, will add deepinfra today

#

nebius wasn’t quite ready for us

formal nest Feb 27, 2025, 3:08 PM

#

pale hull DeepInfra added a faster endpoint for R1 at $2/$6 (fp4 quantized) https://x.com...

this is up now

merry path Feb 27, 2025, 4:50 PM

#

Okay now we're talking, 150tps, just need to get that context window up.

keen pike Feb 27, 2025, 7:26 PM

#

formal nest this is up now

Now I can't use the original (fp8) deepinfra, they have the same provider id (deepinfra) it seems

#

It's more expensive and has lower quality

formal nest Feb 27, 2025, 7:27 PM

#

keen pike Now I can't use the original (fp8) deepinfra, they have the same provider id (de...

you can sort by price

#

https://openrouter.ai/docs/features/provider-routing#provider-sorting

OpenRouter Documentation

Provider Routing - Smart Multi-Provider Request Management

Route AI model requests across multiple providers intelligently. Learn how to optimize for cost, performance, and reliability with OpenRouter's provider routing.

keen pike Feb 27, 2025, 7:29 PM

#

Any way to just select the provider manually?

formal nest Feb 27, 2025, 7:56 PM

#

keen pike Any way to just select the provider manually?

You can select the DeepInfra provider, but unfortunately we don't have a way to let you select the specific endpoint at this time. Your best bet is to sort by price and select DeepInfra, and if you really want to avoid getting any other endpoints, you can disable fallbacks. All of this should be in the docs I linked.

#

We know this is not a good experience, on our roadmap to fix.
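As a rough illustration of the advice above (sort by price, select DeepInfra, disable fallbacks), a request body could look like the sketch below. The `order` and `allow_fallbacks` fields appear elsewhere in this channel; `sort` is my reading of the linked provider routing docs, and whether combining all three picks the intended endpoint is an assumption. The helper name `build_request` is hypothetical.

```python
import json


def build_request(prompt, provider="DeepInfra"):
    """Build a chat completion request body that pins one provider.

    Sketch only: it prefers the cheapest endpoint, restricts routing to
    a single provider, and disables fallbacks so a request fails instead
    of being served by a pricier endpoint.
    """
    return {
        "model": "deepseek/deepseek-r1",
        "messages": [{"role": "user", "content": prompt}],
        "provider": {
            "sort": "price",
            "order": [provider],
            "allow_fallbacks": False,
        },
    }


body = build_request("Hello")
print(json.dumps(body["provider"], indent=2))
```

With fallbacks disabled, a request can fail outright when the chosen provider is unavailable, so callers would need their own retry handling.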

dry moss Feb 27, 2025, 8:10 PM

#

formal nest you can sort by price

Can this setting be more aggressive?

I have it set to use the cheaper version, but the much pricier version is still replying almost half of the time.

I'd rather have a generation fail sometimes than pay 10 times the price.

formal nest Feb 27, 2025, 8:11 PM

#

dry moss Can this setting be more aggressive? I have it set to use the cheaper versio...

that's a valid point! the platform default tries to guarantee a generation (so it falls back). You should be able to disable fallbacks in your API calls.

dry moss Feb 27, 2025, 9:36 PM

#

I looked into it and reached this conclusion:
In my opinion, this is a bit out of touch with the fact that a lot of OpenRouter users aren't developers.

The majority of apps don't expose this level of API call customization in their UI. I mostly use it for roleplay, but I also saw here that even extremely widely used apps like Cline don't allow disabling fallbacks.

I really think this should be an option handled in the OpenRouter settings.

I would love it if this could be looked into. While it's true that there is a free version of DeepSeek, 1- I never take free stuff for granted and 2- it fails more often than the cheapest paid one.

formal nest Feb 27, 2025, 10:26 PM

#

dry moss I looked into it and reached this conclusion: In my opinion, this is a bit out o...

This is good feedback, thank you!

wintry dome Feb 28, 2025, 12:15 AM

#

formal nest This is good feedback, thank you!

Perhaps you should make the "Turbo" variant a separate model in the list so that it has to be explicitly selected (similar to how Nitro versions are separate, though I know that's not the same thing)

vocal raven Feb 28, 2025, 1:11 AM

#

wintry dome Perhaps you should make the "Turbo" variant a separate model in the list so t...

nitro is no longer separate

#

everything is in one list

#

makes sense if you consider that the goal is load balancing

elfin wharf Feb 28, 2025, 4:35 PM

#

Hi, it looks like sometimes when the reasoning segment in the "raw" R1 response is empty (having things like <think></think>), OpenRouter cannot recognize the text after it as result: it still thinks the tokens after </think> are reasoning tokens.

#

Also... it would be good if we could customize the reasoning prefill, like forcing R1 to think in a certain language (prefill <think>OK vs <think>好的).
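For anyone handling raw R1 output themselves, the empty-reasoning case described above can be covered with a small client-side splitter. This is a sketch of one way to do it, not how OpenRouter parses responses; the function name `split_reasoning` is hypothetical.

```python
import re

# Matches an optional leading <think>...</think> block, including an
# empty one, and captures whatever follows it as the answer.
THINK_RE = re.compile(r"^\s*<think>(.*?)</think>\s*(.*)$", re.DOTALL)


def split_reasoning(raw: str):
    """Return (reasoning, content); reasoning is None if no think block."""
    match = THINK_RE.match(raw)
    if match is None:
        return None, raw
    return match.group(1).strip(), match.group(2)


print(split_reasoning("<think></think>The answer is 9.9."))
print(split_reasoning("<think>compare decimals</think>9.9 is larger."))
```

The key point is that an empty think block still counts as a closed block, so everything after `</think>` is treated as the answer.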

limpid wasp Feb 28, 2025, 8:41 PM

#

rigid nova rest in peace `hyperbolic:deepseek/deepseek-r1-zero`

shame indeed.

vocal raven Mar 1, 2025, 5:03 AM

#

just gonna drop this for all the non-deepseek providers

#

$0.55/2.19 is enough for them, why isn't it enough for you

nimble bobcat Mar 1, 2025, 5:41 AM

#

vocal raven $0.55/2.19 is enough for them, why isn't it enough for you

DeepSeek provider demands collecting prompts, then the pricing is reasonable.
Non-DeepSeek providers with higher pricing, claim they do not collect anything, but who knows?

vocal raven Mar 1, 2025, 5:41 AM

#

vocal raven just gonna drop this for all the non-deepseek providers

I don't believe the cost here has revenue from data sharing pre subtracted

clever jolt Mar 1, 2025, 8:52 AM

#

I think the other ones just haven’t got used to the new pricing regime yet. Before it was pricing like o1 and Sonnet, prices where clearly there is a very healthy markup. They probably think “oh well $8 output is cheap then” but it’s not.

cinder shadow Mar 1, 2025, 2:13 PM

#

vocal raven just gonna drop this for all the non-deepseek providers

Each H800 node delivers an average throughput of ~73.7k tokens/s input (including cache hits) during prefilling or ~14.8k tokens/s output during decoding.
whereas vLLM on a H200 node delivers 1.4k tokens/s output at best at low context. Current inference engines are doing at least 10x lower throughput compared to DeepSeek's inferencing.

merry path Mar 1, 2025, 2:14 PM

#

With a 500%+ margin at their current prices too, eh

clever jolt Mar 1, 2025, 2:37 PM

#

Clearly just slapping vllm on a cloud rented instance isn't really good enough to be a competitive service provider.

#

It just wasn't as noticeable before because while I'm sure OAI/Anthropic do have optimized deployments, they are receiving a lot of markup on their $15/output pricing (or $150 for OAI recently lol).

vale marten Mar 1, 2025, 8:20 PM

#

https://x.com/deedydas/status/1895702508053643560

Deedy (@deedydas) on X

BREAKING DeepSeek just let the world know they make $200M/yr at 500%+ profit margin.

Revenue (/day): $562k
Cost (/day): $87k
Revenue (/yr): ~$205M

This is all while charging $2.19/M tokens on R1, ~25x less than OpenAI o1.

If this was in the US, this would be a >$10B company.
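A quick check of the figures quoted above, treating "profit margin" as profit divided by cost (an assumption about how the 500%+ number was derived):

```python
# Daily figures as quoted in the post above.
revenue_per_day = 562_000
cost_per_day = 87_000

# Margin here means profit relative to cost, which is an assumption
# about the definition used in the post.
profit_margin = (revenue_per_day - cost_per_day) / cost_per_day
revenue_per_year = revenue_per_day * 365

print(f"margin: {profit_margin:.0%}")
print(f"revenue per year: ${revenue_per_year / 1e6:.1f}M")
```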

limpid wasp Mar 1, 2025, 8:29 PM

#

this doesn't account for recouping upfront investments and, as they stated, most people don't pay (they don't even have a paid plan on their site, so it's just API calls).

vocal raven Mar 1, 2025, 8:36 PM

#

limpid wasp this doesn't account for recouping upfront investments and as they stated, most ...

upfront investments? this is assuming they lease the h800s

clever jolt Mar 1, 2025, 8:37 PM

#

vale marten https://x.com/deedydas/status/1895702508053643560

I saw some estimates of their value at something like 80b if it was a USA company (though a big range like 1-160b). China business world is extremely different to the USA though.

junior skiff Mar 2, 2025, 12:15 AM

#

vale marten https://x.com/deedydas/status/1895702508053643560

could make that if all is billed - webapp and mobile are not billed

#

so i think they are breaking even

solid copper Mar 2, 2025, 2:15 AM

#

paying US tech companies means that at least half of what you paid is going to the pockets of billionaires and shareholders, this wont change anytime soon, THEY aren't willing to change it.

junior skiff Mar 2, 2025, 12:13 PM

#

was the deepseek integration changed that it wont supply the thinking traces by default anymore ?

limpid wasp Mar 2, 2025, 12:27 PM

#

was it ever provided "by default"? I always received it with include_reasoning in the reasoning field.

split cedar Mar 2, 2025, 1:51 PM

#

deepseek's inference efficiency won't be achieved by the other api providers unless they plan to invest a lot of nodes to implement the full EP implementation with separate processing for prefill and decode

junior skiff Mar 2, 2025, 3:57 PM

#

limpid wasp was it ever provided "by default"? I always received it with include_reasoning i...

it was ..

#

just need to patch st now i guess

weak salmon Mar 4, 2025, 12:48 AM

#

hey all, been using deepseek-r1 in production through open router.
I have json as the response format and require_parameters = True, and it still routes me to Together which doesnt even support json format. Places like fireworks have a post processing json forcer even for models that dont natively support it so im just wondering why I am still getting routed to non-json-supporting providers or if anyone has gotten around this?

edit: I opened a help thread here https://discordapp.com/channels/1091220969173028894/1346285793953583187

radiant cape Mar 4, 2025, 1:50 AM

#

I've noticed there is a difference in the response quality between the free and paid versions, is this due to the provider or the money?

vocal raven Mar 4, 2025, 1:51 AM

#

radiant cape I've noticed there is a difference in the response quality between the free and ...

both

#

the provider takes no money so you get what you pay for

radiant cape Mar 4, 2025, 1:57 AM

#

vocal raven the provider takes no money so you get what you pay for

Makes sense

stark sluice Mar 5, 2025, 7:52 PM

#

cinder shadow > Each H800 node delivers an average throughput of ~73.7k tokens/s input (includ...

which makes sense considering each of their deployment is 320+32 H800s, the standard 8*H200 deployment would be very wasteful as there are a lot of overheads with shifting the weights back and forth

trail sapphire Mar 6, 2025, 2:32 AM

#

Does anyone know a provider that serves DeepSeek R1 Zero? Hyperbolic has stopped serving it.

That model is really good if you're trying to get data for whatever you need. Yes, it's harder to read and understand, but as a true thinking trace it's actually quite unique.

vale marten Mar 6, 2025, 3:09 AM

#

What were you using it for, if you don't mind me asking

#

I'm curious

rocky heron Mar 6, 2025, 3:53 AM

#

Nobody that we know of atm but we’re trying to incentivize one to start up

formal nest Mar 6, 2025, 4:37 PM

#

👀 https://openrouter.ai/deepseek/deepseek-r1-zero:free

DeepSeek R1 Zero (free) - API, Providers, Stats

DeepSeek-R1-Zero is a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step. It's 671B parameters in size, with 37B active in an inference pass. Run DeepSeek R1 Zero (free) with API

rocky heron Mar 6, 2025, 4:48 PM

#

trail sapphire Does anyone know a provider that serves DeepSeek R1 Zero? Hyperbolic has stopped...

^

#

cc @rigid nova

peak flame Mar 6, 2025, 7:31 PM

#

I can't get R1 Zero to stop outputting \boxed{}.

night summit Mar 6, 2025, 7:53 PM

#

does this have tooling ?

trail sapphire Mar 6, 2025, 9:49 PM

#

peak flame I can't get R1 Zero to stop outputting \boxed{}.

It's coming from DeepSeek's original RL; nothing can be done about it, I think

trail sapphire Mar 6, 2025, 9:49 PM

#

rocky heron ^

Thanks 🙏

trail sapphire Mar 6, 2025, 9:50 PM

#

vale marten What were you using it for, if you don't mind me asking

Some data stuff

rigid nova Mar 6, 2025, 9:50 PM

#

peak flame I can't get R1 Zero to stop outputting \boxed{}.

Even after fiddling with the parameters?

rigid nova Mar 6, 2025, 9:52 PM

#

rocky heron cc @rigid nova

🥺 🔀🩶

nimble bobcat Mar 7, 2025, 8:40 AM

#

long reasoning and slow tps; normally it takes 2 mins to finish a turn 🤣

rigid nova Mar 7, 2025, 8:46 AM

#

#

new pricing scheme from novita.ai

woven chasm Mar 7, 2025, 12:29 PM

#

I'm getting an annoying number of 0 tokens out of Nebius for R1. The other providers seem fine.

trail sapphire Mar 7, 2025, 12:55 PM

#

woven chasm I'm getting an annoying number of 0 tokens out of Nebius for R1. The other provi...

its more annoying when they deduct the input when we dont get any output

formal nest Mar 7, 2025, 2:05 PM

#

you are not currently charged for 0 output tokens - #announcements message

#

will flag to nebius that this is a significant issue

leaden socket Mar 7, 2025, 4:44 PM

#

Seems like MiniMax is returning 0 output tokens all the time now. Some examples from my logs:

{"status_code": 200, "response": {"id": "gen-1741365750-tmyFfnakL1ErupeZdIGM", "provider": "Minimax", "model": "deepseek/deepseek-r1", "object": "chat.completion", "created": 1741365750, "choices": [{"logprobs": null, "index": 0, "message": {"role": "assistant", "content": "", "refusal": null, "reasoning": null}}], "usage": {"prompt_tokens": 2774, "completion_tokens": 0, "total_tokens": 2774}}}
{"status_code": 200, "response": {"id": "gen-1741365750-IC24jkzeA5Vf625itjxD", "provider": "Minimax", "model": "deepseek/deepseek-r1", "object": "chat.completion", "created": 1741365750, "choices": [{"logprobs": null, "index": 0, "message": {"role": "assistant", "content": "", "refusal": null, "reasoning": null}}], "usage": {"prompt_tokens": 2843, "completion_tokens": 0, "total_tokens": 2843}}}

peak flame Mar 7, 2025, 5:08 PM

#

Prefill + stop string ["\\boxed{"] to turn R1 Zero into a more conventional non-thinking model (but then what's the point). For RP at least, since the response is easily just (response) \boxed{(response)}. Temp 1.3+. 😆

But I think R1 is still better.

woven chasm Mar 7, 2025, 5:08 PM

#

trail sapphire its more annoying when they deduct the input when we dont get any output

Indeed. DeepSeek is better now. Can't believe it.

rocky heron Mar 7, 2025, 5:09 PM

#

woven chasm Indeed. DeepSeek is better now. Can't believe it.

why is it better?

woven chasm Mar 7, 2025, 5:09 PM

#

No 0 tokens. And response time improved

rocky heron Mar 7, 2025, 5:09 PM

#

trail sapphire its more annoying when they deduct the input when we dont get any output

this shouldn't be happening - we insure all the zero-output requests for deepseek. did you notice otherwise?

rocky heron Mar 7, 2025, 5:09 PM

#

woven chasm No 0 tokens. And response time improved

from Novita specifically, or from other providers?

woven chasm Mar 7, 2025, 5:09 PM

#

DeepSeek is not returning 0 tokens, Nebius is.

formal nest Mar 7, 2025, 5:23 PM

#

@woven chasm can you provide any generation IDs? From our metrics, it's not a large % of requests

#

Are you setting max_tokens?

woven chasm Mar 7, 2025, 5:58 PM

#

formal nest Are you setting max_tokens?

1024

formal nest Mar 7, 2025, 5:59 PM

#

woven chasm 1024

Do you see reasoning tokens in your activities page? 1024 is typically too low to get actual completion (non-reasoning) tokens

woven chasm Mar 7, 2025, 5:59 PM

#

https://openrouter.ai/api/v1/generation?id=gen-1741346738-faG3JKmrg3tW2dTpMsdj

#

No reasoning tokens.

#

I Upped the Max_tokens to 32000

formal nest Mar 7, 2025, 6:00 PM

#

thanks let me look

woven chasm Mar 7, 2025, 6:00 PM

#

To the same effect.

formal nest Mar 7, 2025, 6:02 PM

#

can you screenshot your activity tab? It seems in our logs we are getting some tokens back

#

woven chasm Mar 7, 2025, 6:18 PM

#

Mmm. As we spoke I'm trying to use it. This is what I'm getting with Deepseek

#

And this with Nebius

#

Same prompts. See the difference in output tokens? I believe its thinking, but not passing the results.

formal nest Mar 7, 2025, 6:26 PM

#

woven chasm Same prompts. See the difference in output tokens? I believe its thinking, but n...

hmm yeah that is strange. If you click to expand on the Nebius generations on the righthand side there, and view the token counts / raw metadata, do you see reasoning tokens?

woven chasm Mar 7, 2025, 6:26 PM

#

4504 prompt 131 completion,
incl. 128 reasoning

#

4500 prompt 133 completion,
incl. 130 reasoning

leaden socket Mar 7, 2025, 6:30 PM

#

Maybe related, I did 500 requests to Nebius, 7 of those have empty content field but full reasoning field.
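To tell the two failure modes in this discussion apart (no output at all versus reasoning without a final answer), responses can be bucketed by inspecting the message fields. A minimal sketch, assuming the response shape shown in the logged examples above; the function name `classify` and its labels are hypothetical.

```python
def classify(response: dict) -> str:
    """Label a chat completion as 'ok', 'reasoning_only', or 'zero_output'."""
    message = response["choices"][0]["message"]
    content = message.get("content") or ""
    reasoning = message.get("reasoning") or ""
    if content:
        return "ok"
    return "reasoning_only" if reasoning else "zero_output"


# Trimmed-down example shaped like the log entries above.
empty = {
    "choices": [{"message": {"role": "assistant", "content": "", "reasoning": None}}],
    "usage": {"prompt_tokens": 2774, "completion_tokens": 0, "total_tokens": 2774},
}
print(classify(empty))
```

Counting each label per provider over a batch of requests would give numbers comparable to the "7 of 500" figure above.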

woven chasm Mar 7, 2025, 7:55 PM

#

Mmm. I'm seeing error while passing 'max_price', such as this:
{
"max_tokens": 8124,
"temperature": 0.7,
"top_p": 0.95,
"presence_penalty": 0.5,
"frequency_penalty": 1.7,
"stop": ["#"],
"n": 1,
"tools": [],
"tool_choice": "auto",
"max_price": {
"prompt": 1,
"completion": 3
}
}

formal nest Mar 7, 2025, 8:04 PM

#

woven chasm Mmm. I'm seeing error while passing 'max_price', such as this: { "max_tokens": 8...

what's the error message?

woven chasm Mar 7, 2025, 8:11 PM

#

Got an unexpected keyword argument 'max_price'

#

It worked until it didnt

#

API parameter 'exclude' doesnt seem to work either. This with QWQ paid version

#

Yup. Reasoning effort like this:
"reasoning": {
"effort": "high",
"exclude": true # Use reasoning but don't include it in the response
}
Doesnt work for me.

#

AsyncCompletions.create() got an unexpected argument 'reasoning'

#

The same as with 'max_price'

#

These parameters work fine:
{
"max_tokens": 8124,
"temperature": 0.7,
"top_p": 0.95,
"presence_penalty": 0.5,
"frequency_penalty": 1.7,
"stop": ["#"],
"n": 1,
"tools": [],
"tool_choice": "auto"
}

formal nest Mar 7, 2025, 8:48 PM

#

You're using the openai client @woven chasm ? You'll need to pass the reasoning and max_price params etc through extra_body

woven chasm Mar 7, 2025, 8:49 PM

#

OpenAI client, yes. And this shows my ignorance. Guess I have to find out what extra_body is.

#

Thank you. Will find out what it is and how to use it. A pointer would be helpful, but you dont have to do that.

formal nest Mar 7, 2025, 8:53 PM

#

woven chasm Thank you. Will find out what it is and how to use it. A pointer would be helpfu...

yes one moment

#

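```python
completion = client.chat.completions.create(
    model="deepseek/deepseek-r1",
    messages=[{"role": "user", "content": prompt}],
    # OpenRouter-specific params are unknown to the OpenAI client,
    # so they are passed through extra_body
    extra_body={
        "include_reasoning": True,
        "max_price": {"prompt": 0.9, "completion": 2.4}
    },
    stream=True
)
```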

#

something like that

#

so max_price, reasoning, exclude, can all go here
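If the parameters already live in one dict, as in the JSON pasted earlier, a small helper can move the OpenRouter-only keys into `extra_body` before calling the OpenAI client. This is a sketch: the key set `OPENROUTER_ONLY` is an assumption based on the parameters mentioned in this channel, and `split_params` is a hypothetical helper name.

```python
# Keys the OpenAI Python client does not accept as keyword arguments.
# This set is an assumption drawn from parameters mentioned in this channel.
OPENROUTER_ONLY = {"max_price", "reasoning", "include_reasoning", "provider"}


def split_params(params: dict) -> dict:
    """Move OpenRouter-specific keys into extra_body, keep the rest."""
    kwargs = {k: v for k, v in params.items() if k not in OPENROUTER_ONLY}
    extra = {k: v for k, v in params.items() if k in OPENROUTER_ONLY}
    if extra:
        kwargs["extra_body"] = extra
    return kwargs


params = {
    "max_tokens": 8124,
    "temperature": 0.7,
    "max_price": {"prompt": 1, "completion": 3},
    "reasoning": {"effort": "high", "exclude": True},
}
kwargs = split_params(params)
print(kwargs)
# Usage, assuming `client` is an OpenAI client pointed at OpenRouter:
# client.chat.completions.create(model=..., messages=..., **kwargs)
```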

woven chasm Mar 7, 2025, 9:16 PM

#

formal nest yes one moment

Yes, I looked for the information and fixed my json. Today I learned something thanks to you.

#

So @formal nest, no idea about the '0 tokens' issue I had with Nebius?

formal nest Mar 7, 2025, 9:24 PM

#

woven chasm So @formal nest, no idea about the '0 tokens' issue I had with Nebius?

I unfortunately don't see major (>0.5%) occurences of actually 0 completion tokens in my metrics, I can take another look at your account logs though

woven chasm Mar 7, 2025, 9:53 PM

#

If you could I would be grateful. Its annoying.

formal nest Mar 7, 2025, 9:54 PM

#

yeah absolutely. Little busy right now but will review later

woven chasm Mar 7, 2025, 10:12 PM

#

Thanks!

trail sapphire Mar 7, 2025, 10:44 PM

#

rocky heron this shouldn't be happening - we insure all the zero-output requests for deepsee...

I didn't notice that with DeepSeek, but maybe I did when I used the MiniMax model a few days ago.

nimble kelp Mar 8, 2025, 4:40 PM

#

request model

half sapphire Mar 9, 2025, 4:16 AM

#

strictly for R1 and role play purposes

#

anyone else find the model very aggressive

strange comet Mar 9, 2025, 11:15 AM

#

very aggressive, argumentative, angsty, using the same phrases like "the beer tastes like regret and disappointment", eventually making characters act like lunatics

upper tapir Mar 9, 2025, 1:41 PM

#

I agree, I choose 12B models over R1 unless I want complete psycho

neat shuttle Mar 10, 2025, 12:20 AM

#

upper tapir I agree, I choose 12B models over R1 unless I want complete psycho

12B?
Are you serious?
For heaven's sake, de gustibus non est disputandum, but only if you use WizardLM-2 8x22B do you realize that you are in a much more advanced era.

upper tapir Mar 10, 2025, 12:21 AM

#

neat shuttle 12B? Are you serious? For heaven's sake, de gustibus non est disputandum, but on...

I'm a retired WizardLM-2 8x22B user, too much positivity

strange comet Mar 10, 2025, 8:33 AM

#

Minimax has worked pretty well for me. If R1 starts going into a loony place, I rectify it with Minimax for one or two generations and it kind of goes back to normal. I haven't used any other models recently. They're so predictable and almost one-directional.

rigid nova Mar 10, 2025, 9:13 AM

#

strange comet Minimax has worked pretty well for me. If R1 starts going into a loony place, I ...

why minimax im curious? i assume for the context length but thought google would be a better option
what have you found its strengths to be

strange comet Mar 10, 2025, 9:15 AM

#

rigid nova why minimax im curious? i assume for the context length but thought google would...

Gemini is moderated while minimax is not. Otherwise it would be my no 1 model. Minimax passes the strawberry test and follows the prompt and it's slow in progression of the story.

timid crane Mar 10, 2025, 3:30 PM

#

strange comet Minimax has worked pretty well for me. If R1 starts going into a loony place, I ...

useful to know - been trying to find the right model to pair with R1.
R1 starts with genius and then progresses slowly towards insanity on so many genres

peak flame Mar 11, 2025, 7:18 PM

#

R1 Zero Chutes returning 0 output on all prompts now, no error.

formal nest Mar 11, 2025, 7:23 PM

#

peak flame R1 Zero Chutes returning 0 output on all prompts now, no error.

checking

dry moss Mar 11, 2025, 7:23 PM

#

R1 Zero is also a good pair for R1; it's way less evil.

The issue is that half of the time Chutes doesn't return a reply, and when it does, the reply is in weird formatting.

I don't like Wizard 8x22B either; it is smart, but it's too... nice.

#

Thanks for the Minimax suggestion. I've been using DeepSeek V3 as a swipe mixup, but V3's tendency to repeat paragraphs does get annoying.

Edit:

OK, yeah, no. After 2 weeks of trying both Minimax and Hermes 3 405b, DeepSeek V3 still delivers the best responses and is the best pair for R1. Both can be as repetitive as V3 if you allow them, but V3 has greater character personality adherence and a better understanding of the chat history.

peak flame Mar 11, 2025, 7:51 PM

#

OK, responses are back.

formal nest Mar 11, 2025, 7:51 PM

#

peak flame OK, responses are back.

thanks for flagging

strange comet Mar 12, 2025, 9:32 AM

#

dry moss R1 Zero is also a good pair for R1; it's way less evil. The issue is that half o...

Can you even use Zero for RP? I get nonsense each time. Not even proper sentences. Also try Hermes 3 405b to mix things up. Even better than Minimax, depending on what you want.

peak flame Mar 12, 2025, 2:40 PM

#

strange comet Can you even use Zero for RP? I get nonsense each time. Not even proper sentence...

Zero turns into a normal model with a prefill to skip the thinking block and ["\\boxed{"] stopping string. Word salad is rare to me (once in 25 today), goes away on swipe. Didn't try long context. TC without instruct sequences happens to be okay too. "Issue" is there are better RP models.
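The prefill plus stop-string setup described above might look roughly like this as a request body. The model slug is the one linked earlier in this channel; the exact prefill text (an empty think block) and the helper name are assumptions, and the temperature follows the "1.3+" mentioned in an earlier message.

```python
def build_r1_zero_request(user_message, prefill="<think>\n\n</think>\n\n"):
    """Request body that prefills past the thinking block and stops at \\boxed{.

    The trailing assistant message acts as a prefill; its exact text is a
    guess at what skips the reasoning phase, not a documented value.
    """
    return {
        "model": "deepseek/deepseek-r1-zero:free",
        "messages": [
            {"role": "user", "content": user_message},
            {"role": "assistant", "content": prefill},
        ],
        # Cut generation before the model repeats its answer in \boxed{...}
        "stop": ["\\boxed{"],
        "temperature": 1.3,
    }


body = build_r1_zero_request("Describe the tavern.")
print(body["stop"], body["messages"][-1]["role"])
```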

half sapphire Mar 14, 2025, 5:30 PM

#

Is there a way to prompt or prefill to somewhat reliably prevent it from thinking too much or at all

dry moss Mar 15, 2025, 1:47 PM

#

openrouter chat has a slider for the thinking token budget

#

if the app you're using doesn't support this, I think the easiest way to avoid overthinking is to make the prompt as clear as possible so it doesn't have many "but wait!" moments.

I'll assume chub (mutual servers). So make your system prompt, and anything that sets how it should behave, as simple and free of contradictions as possible. You can be as descriptive as you want in character details and appearance, but in telling it HOW it should behave, be as simple as possible.

This is not only to avoid R1 overthinking and hitting chub's 2048 output limit while still in the thinking phase, but also because R1 focuses A LOT on HOW it should behave; put in too much "how" and it will overshadow the rest of your character. (After all, that's exactly what the thinking phase does: focus on "how" to respond.)

#

@half sapphire (forgot to quote reply)

half sapphire Mar 15, 2025, 9:44 PM

#

dry moss if the app you're using doesn't support this, I think the easiest way to avoid...

Wait how is that applied on the API side

#

like is that a param?

dry moss Mar 16, 2025, 5:53 AM

#

half sapphire Wait how is that applied on the API side

that, I have no idea. I know there is this feature in openrouter chat when selecting r1, so the model probably supports it.

But in any case, I think jailbreaks only make it spend more tokens reasoning, and prefill sometimes just doesn't work or makes it skip the reasoning phase completely.

peak flame Mar 16, 2025, 7:32 AM

#

👀 DS docs says "soon"

half sapphire Mar 17, 2025, 12:46 AM

#

👀

solid copper Mar 19, 2025, 11:28 PM

#

question, the endpoints that don't disclose if they use fp4 and fp8 is because you may be routed to a gpu that can use either quantization? Or simply because they don't provide this information to openrouter?

formal nest Mar 19, 2025, 11:33 PM

#

solid copper question, the endpoints that don't disclose if they use fp4 and fp8 is because y...

latter, we don't know

cinder shadow Mar 21, 2025, 4:14 PM

#

Fireworks has dropped their pricing to $0.55/$2.19 on their new basic deployment (deepseek-r1-basic) https://fireworks.ai/blog/fireworks-ai-developer-cloud
old $3/$8 pricing applies to their existing fast model (deepseek-r1)
cc @formal nest

Faster, more efficient DeepSeek on the Fireworks AI Developer Cloud

Discover how Fireworks AI Developer Cloud accelerates AI innovation with faster, optimized DeepSeek R1 deployments. Learn about new GPU options, improved speed, and enhanced developer tools for efficient, scalable AI solutions.

rugged nest Mar 21, 2025, 5:49 PM

#

cinder shadow Fireworks has dropped their pricing to $0.55/$2.19 on their new basic deployment...

oh, very nice!!

clever jolt Mar 21, 2025, 6:40 PM

#

that's 2 providers now with a "fast" endpoint that is more expensive. Any idea what the difference is?

earnest wolf Mar 21, 2025, 7:48 PM

#

clever jolt that's 2 providers now with a "fast" endpoint that is more expensive. Any idea w...

faster token generation speed

clever jolt Mar 21, 2025, 7:58 PM

#

yeah I know that I was more thinking whats the difference in underlying deployment

vocal raven Mar 21, 2025, 8:09 PM

#

clever jolt yeah I know that I was more thinking whats the difference in underlying deployme...

GPU count, or quantizing then marking up just for stability

clever jolt Mar 21, 2025, 8:10 PM

#

yeah thats the thing though, more gpu should mean faster yes, but also larger scale deploy so also cheaper/more efficient

weak salmon Mar 21, 2025, 11:19 PM

#

hey anyone else not getting any content from r1 calls? Using fireworks. Reasoning works, then i get the "stop" finish reason, then absolutely no content. this is no matter what i do with prompting

#

Just confirmed, the content is being sent in the reasoning field from openrouter

bright portal Mar 21, 2025, 11:25 PM

#

weak salmon Just confirmed, the content is being sent in the reasoning field from openrouter

Which provider is this btw? -- sometimes reasoning might eat up all the context

rocky heron Mar 21, 2025, 11:26 PM

#

Works for me:

#

Fireworks

weak salmon Mar 21, 2025, 11:26 PM

#

okay let me keep investigating here and make sure my ducks are in a row

#

json mode btw

#

I wasn't changing anything just fired up the ol app and getting this prob. Some more details,
heres my object im sending

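  # Streaming request pinned to Fireworks with JSON mode enabled
  response = client.chat.completions.create(
      model="deepseek/deepseek-r1",
      messages=...,
      stream=True,
      response_format={'type': 'json_object'},
      extra_body={
          "provider": {
              "require_parameters": True,  # only route to providers that support every parameter sent
              'order': [
                  'Fireworks'
              ],
              'allow_fallbacks': False  # do not fall back to other providers
          }
      },
  )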

then what happens is, i start receiving reasoning data that is just my json object that should be in content. I dont receive any actual reasoning. Then i receive no content, and a finish reason "stop". This is consistently happening. am i crazy? lol is it replicable?

#

I commented out json and switched to SambaNova and it resolved.
I switched to Fireworks and disabled json format and it also worked.
It appears to be an issue with Fireworks's JSON mode (they are the only people i think that have json mode for deepseek through some proprietary method)
to be clear the issue is: 1- no actual reasoning data and 2 content appearing in reasoning, when in json mode
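
A client-side workaround for the symptom described above can be sketched as follows. It assumes OpenAI-style streaming chunks that have already been parsed into dicts, where each delta may carry `reasoning` and `content` strings (the field names used in this thread). `collect_stream` and `recover_json` are hypothetical helper names, not part of any SDK. When `content` comes back empty, the sketch tries to parse the JSON object out of the accumulated reasoning text instead.

```python
import json


def collect_stream(chunks):
    """Accumulate reasoning and content text from streamed chunk dicts."""
    reasoning_parts, content_parts = [], []
    finish_reason = None
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            if delta.get("reasoning"):
                reasoning_parts.append(delta["reasoning"])
            if delta.get("content"):
                content_parts.append(delta["content"])
            # Keep the last non-empty finish reason seen in the stream.
            finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(reasoning_parts), "".join(content_parts), finish_reason


def recover_json(reasoning, content):
    """Return the parsed JSON answer, falling back to the reasoning text.

    Returns None when neither field holds a valid JSON document.
    """
    for candidate in (content, reasoning):
        text = candidate.strip()
        if not text:
            continue
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            continue
    return None
```

This only papers over the misrouted field on the client; it does not restore the missing reasoning text, so the underlying provider behavior still needs a fix.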

jovial flame Mar 22, 2025, 2:48 AM

#

@weak salmon I believe thats an error since json_format wont really work with R1. We should disable allowing users to send that.

weak salmon Mar 22, 2025, 3:19 AM

#

@jovial flame That would be extremely disappointing, considering fireworks literally supports it

jovial flame Mar 22, 2025, 4:28 AM

#

Haha you are right

#

Will look into it first thing tomorrow and get it fixed

pale hull Mar 22, 2025, 5:47 AM

#

It seems like there is currently no way to specify routing to the non-basic Fireworks?
This is also the same for other multi-version cases, so I think the "provider label" could be exposed to the endpoints API, and specifiable as the routing parameter.

weak salmon Mar 22, 2025, 5:54 AM

#

jovial flame Will look into it first thing tomorrow and get it fixed

haha youre the best i really appreciate it. It seems they came out with a new R1 model so it may have mixed their stuff up.

frozen dawn Mar 22, 2025, 4:45 PM

#

pale hull It seems like there is currently no way to specify routing to the non-basic Fire...

I encountered the same problem, does anyone have any idea how to use the fast Fireworks version of the DeepSeek R1 model?

vocal raven Mar 22, 2025, 9:15 PM

#

still waiting for offhours discounts

rigid nova Mar 22, 2025, 10:53 PM

#

vocal raven still waiting for offhours discounts

The crypto providers are still setting themselves up to float, but there are now very cheap options like fireworks who have a basic endpoint now

vocal raven Mar 22, 2025, 10:53 PM

#

yeah tbf the cheap ones are getting faster and the fast ones are getting cheaper

rigid nova Mar 22, 2025, 10:53 PM

#

Klusterai and others do batch processing for a discount

plucky eagle Mar 23, 2025, 6:54 PM

#

Hello everyone, my question may seem simple and obvious, but I'm just getting the hang of this and so far I can't figure out what's wrong. For some reason, R1 is inconsistent about where the reasoning ends up. Sometimes it processes the request correctly and returns "reasoning" and "content" separately. Sometimes it leaves "reasoning" empty and just writes a normal response. And sometimes it puts the reasoning itself, i.e. literally its train of thought, into "content". Has anyone encountered this? I'm calling it like this:

first_model = "deepseek/deepseek-r1"
models_to_try = ["openai/o3-mini-high", "anthropic/claude-3.7-sonnet:thinking"]
#
extra_body = {
    "models": models_to_try,
    "provider": {
        "order": ["Fireworks", "Novita", "Nebius", "DeepSeek"],
        "ignore": ["DeepInfra"],
        "allow_fallbacks": False,
    },
}

response_data = await global_variables.ai_client.chat.completions.create(
    model=first_model,
    extra_body=extra_body,
    messages=data_for_requests["messages"],
    temperature=0.4,
    max_tokens=20000,
)

vocal raven Mar 23, 2025, 7:01 PM

#

plucky eagle Hello everyone, my question may seem simple and obvious, but I'm just getting th...

don't set model and models at the same time

#

just make one models sorted by preference

#

aside from that idk why content and reasoning would be inverted...
maybe you should go to https://openrouter.ai/activity and send a screenshot and request id of one of the requests with inverted content/reasoning
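
The advice above (one `models` list sorted by preference, no separate `model`) can be sketched as a small request-body builder. This is only an illustration under stated assumptions: it assumes the API accepts a `models` array and a `provider` object in the request body, as the snippet earlier in the thread passes them via `extra_body`. `build_request_body` is a hypothetical helper, and how the body is actually sent (raw HTTP or an SDK) is left out.

```python
def build_request_body(models, messages, provider_order=None, ignore=None):
    """Build a chat-completions request body with one ordered model list."""
    if not models:
        raise ValueError("models must contain at least one model slug")
    body = {
        "models": list(models),  # preference order: first entry is tried first
        "messages": messages,
    }
    provider = {}
    if provider_order:
        provider["order"] = list(provider_order)
    if ignore:
        provider["ignore"] = list(ignore)
    if provider:
        body["provider"] = provider
    return body


if __name__ == "__main__":
    body = build_request_body(
        ["deepseek/deepseek-r1", "openai/o3-mini-high"],
        [{"role": "user", "content": "9.11 and 9.9, which one is larger?"}],
        provider_order=["Fireworks", "Novita"],
        ignore=["DeepInfra"],
    )
    print(body)
```

Keeping the preferred model as the first list entry means there is only one place where the order of preference is defined.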

weak salmon Mar 23, 2025, 7:10 PM

#

jovial flame Haha you are right

hey it works again, just wanted to say thanks for your fast response if you did that! Appreciate it

plucky eagle Mar 23, 2025, 7:25 PM

#

vocal raven aside from that idk why content and reasoning would be inverted... maybe you sho...

it seems that I just accidentally found the reason, the fact was that the request was not in English, as soon as I started writing exclusively all requests in English, everything fell into place.😐

weak salmon Mar 23, 2025, 7:32 PM

#

weak salmon hey it works again, just wanted to say thanks for your fast response if you did ...

edit - sorry for the early props , i was wrong, its still broken in the same way ~ Im going to open up a thread. my whole app is just kinda down because of this

jovial flame Mar 23, 2025, 7:39 PM

#

yes some way to repro would be good

weak salmon Mar 23, 2025, 7:41 PM

#

okay just opened a thread and sent repro object from my tests. Going to hook up to fireworks directly and see if its a prob with them on their side too

timid crane Apr 3, 2025, 4:51 PM

#

With Targon (from ST) I am getting broken responses - returning only
<Tool Response>
Chutes is working perfectly.
First time I have seen this issue from Targon. Others I have spoken with seeing the same.

lone sky May 10, 2025, 10:39 PM

#

V2 wen?

vocal raven May 10, 2025, 10:41 PM

#

lone sky V2 wen?

Jul 17, 2024

lone sky May 11, 2025, 12:02 AM

#

i think i only get 1% of the humor. why such a specific date? I don't get ittt 😭 🥜🤏🧠

vocal raven May 11, 2025, 12:03 AM

#

lone sky i think i only get 1% of the humor. why such a specific date? I don't get ittt ...

that was when deepseek v2 launched

#

sorry

#

im being pedantic

lone sky May 11, 2025, 12:03 AM

#

I read May 2024 on google tho :(

vocal raven May 11, 2025, 12:03 AM

#

about the difference between r2 and v2

#

or v2 and v4

lone sky May 11, 2025, 12:05 AM

#

I'm still behind u i think

vocal raven May 11, 2025, 12:23 AM

#

lone sky I'm still behind u i think

hm idk

#

i got the date from hf commit dates

lone sky May 13, 2025, 1:32 AM

#

Is deepseek zero more knowledgeable than r1?
I just gave it a knowledge test for eye colors of characters, and it consistently did better than R1.
What else is deepseek zero better at? wow.

sinful crown May 13, 2025, 1:33 AM

#

Being less censored

lone sky May 13, 2025, 1:49 AM

#

Hmm. Maybe that's it. I was wondering about more niche things but I suppose thats a plus.

#

It has the highest uncensored score on dubesor.de.. I wonder what the Venice chatbot would get

trail sapphire May 13, 2025, 1:54 AM

#

It's built on the base model without instruction tuning, so it's the purest form of RL

#

When you don't put in human bias, the model optimizes itself: better in many aspects, but worse in the aspects humans care about.

Because it's hard to read and hard to understand, but it's actually high quality

#

Too bad hyperbolic stopped hosting it, and I don't like chutes hosting

proven atlas May 13, 2025, 5:16 AM

#

If I remember correctly, DeepSeek Zero is a base model instead of an instruction-tuned model. So it should be better at completing sentences.

But last time I tried, it just started solving math equations randomly out of nowhere.

amber stirrup May 13, 2025, 7:26 AM

#

Zero is almost further back than a base model

#

It's a base model run through totally unstructured reinforcement learning on verifiable problems.

#

I think they very gently showed it what a thinking step looks like, but that's about it. It will apparently switch languages mid-thought sometimes, or use tokens seemingly unrelated to words

proven atlas May 13, 2025, 7:52 AM

#

amber stirrup I think they very gently showed it what a thinking step looks like, but that's a...

normal R1 will also switch languages mid-thought, but it was mostly mitigated via penalty in reward during post-training

amber stirrup May 13, 2025, 8:21 AM

#

Interesting, haven't seen it do that yet. Interpretability is cool, but I also love the idea of seeing raw thoughts

limpid wasp May 13, 2025, 2:02 PM

#

amber stirrup I think they very gently showed it what a thinking step looks like, but that's a...

I didn't notice that. It does use messier thought chains for sure (not rephrased to be more human friendly), but it was completely usable in my testing. It understood and tackled all problems, without producing gibberish.

amber stirrup May 13, 2025, 2:44 PM

#

They mentioned it in their paper, but I'm not sure how common it is

frail oxide May 13, 2025, 5:29 PM

#

Someone leak when R2 releases sob_angry_cry

vale marten May 13, 2025, 6:09 PM

#

soon I hope

#

there might be some leak on twitter by the usual accounts

jaunty drum May 28, 2025, 12:37 PM

#

🤔

weary crown May 28, 2025, 1:14 PM

#

https://www.bloomberg.com/news/articles/2025-05-28/deepseek-unveils-update-to-r1-model-as-ai-race-heats-up

#

(paywalled)

earnest wolf May 28, 2025, 1:16 PM

#

"As ai race heats up" I dunno man

#

It's been pretty hot for a while

#

It's not heating up

queen basalt May 28, 2025, 1:37 PM

#

Is new R1 available on DeepSeek API?

sinful crown May 28, 2025, 2:16 PM

#

Finally, DeepSeek R1 2!

earnest wolf May 28, 2025, 3:43 PM

#

queen basalt Is new R1 available on DeepSeek API?

Yes

queen basalt May 28, 2025, 3:44 PM

#

earnest wolf Yes

Need on OpenRouter

earnest wolf May 28, 2025, 3:45 PM

#

@formal nest deepseek did a silent upgrade (again)

formal nest May 28, 2025, 3:45 PM

#

KEKcry

earnest wolf May 28, 2025, 3:45 PM

#

jaunty drum 🤔

Source: here

The announcement says that the new R1 is live

earnest wolf May 28, 2025, 3:46 PM

#

formal nest KEKcry

Are you crying or are you laughing? /tease

formal nest May 28, 2025, 3:46 PM

#

yeah we hit their deepseek-reasoner endpoint so you can just route to deepseek direct and use the upgrade

sinful crown May 28, 2025, 3:48 PM

#

Did DeepSeek make an endpoint for the old one?

earnest wolf May 28, 2025, 3:48 PM

#

sinful crown Did DeepSeek make an endpoint for the old one?

Nope

sinful crown May 28, 2025, 3:48 PM

#

Oh, no

earnest wolf May 28, 2025, 3:48 PM

#

sinful crown Oh, no

You don't like the new one?

sinful crown May 28, 2025, 3:49 PM

#

I have an unhinged roleplay bot running on R1 for a server, just scared that this will change the personality

earnest wolf May 28, 2025, 3:52 PM

#

sinful crown I have an unhinged roleplay bot running on R1 for a server, just scared that thi...

As an alternative, you can use the free DeepSeek R1 providers

Or cough up more money and use the other paid providers

Test if the personality changed before switching providers

R1 (free) - API, Providers, Stats

DeepSeek R1 is here: Performance on par with OpenAI o1, but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass. Run R1 (free) with API

prisma goblet May 28, 2025, 3:52 PM

#

did they also update the api or just their web platform?

earnest wolf May 28, 2025, 3:53 PM

#

prisma goblet did they also update the api or just their web platform?

Both

sinful crown May 28, 2025, 4:10 PM

#

The inner reasoning for this new R1 is very interesting to watch

#

It seems more articulate than before

candid ice May 28, 2025, 4:13 PM

#

they improved the CoT a lot

rugged nest May 28, 2025, 4:17 PM

#

sinful crown Finally, DeepSeek R1 2!

can't wait for the new thread here 🤓

queen basalt May 28, 2025, 4:22 PM

#

of course it still thinks it's ChatGPT lol

slim remnant May 28, 2025, 4:35 PM

#

Is the new version still open-weights? Presumably we'll get it via 3rd party providers at some point?

earnest wolf May 28, 2025, 4:42 PM

#

slim remnant Is the new version still open-weights? Presumably we'll get it via 3rd party pro...

Weights aren't out (yet)

half sapphire May 28, 2025, 4:58 PM

#

where does it say it's out for API 👀

uneven gust May 28, 2025, 5:30 PM

#

Hope there's gonna be more than just upgraded R1

#

It's alright but not 2.5 pro level

vale marten May 28, 2025, 5:32 PM

#

According to everyone, R1.5 feels different. I want more objective data showing performance

uneven gust May 28, 2025, 5:33 PM

#

I haven't tried that same prompt with regular R1

stray locust May 28, 2025, 5:34 PM

#

need bench before & after today

uneven gust May 28, 2025, 5:42 PM

#

Parasail's R1 seems slightly worse

#

Nvm not slightly

#

Some chess pieces can't move at all

#

Definitely an improvement

celest pilot May 28, 2025, 5:50 PM

#

Wonder if the writing capabilities will be better or worse.

jaunty drum May 28, 2025, 5:56 PM

#

https://huggingface.co/deepseek-ai/DeepSeek-R1-0528

it's up, no details yet

deepseek-ai/DeepSeek-R1-0528 · Hugging Face

woven quail May 28, 2025, 6:24 PM

#

They always update the details later

half sapphire May 28, 2025, 6:34 PM

#

Ok r we sure the API updated on their Provider?

#

@formal nest sorry for the quick ping, seeing if u know if their API updated to the new one

formal nest May 28, 2025, 6:37 PM

#

i don't have full confirmation really that it's their new model

#

who knows

clever jolt May 28, 2025, 6:57 PM

#

it would be in-line for them to just update the API endpoint, thats what they did before every time, an in-place update.

#

big question i guess is did they do it yet

formal nest May 28, 2025, 7:00 PM

#

https://openrouter.ai/deepseek/deepseek-r1-0528

R1 0528 - API, Providers, Stats

DeepSeek R1's update to the original R1. Run R1 0528 with API

#

parasail soon

uneven gust May 28, 2025, 7:19 PM

#

Yeeeaaahhh

#

Does that mean no r2 or v4?

#

:(

#

Cuz ye, the model is cool and all but

#

Anthropic and google have gone way beyond

tacit vortex May 28, 2025, 7:29 PM

#

Yay

#

Something happened, the nothing ever happens crowd in disbelief right now

tacit vortex May 28, 2025, 7:30 PM

#

vale marten According to everyone, R1.5 feels different. I want more objective data showing ...

I’m just waiting for dubesor lol

blissful peak May 28, 2025, 7:45 PM

#

vale marten According to everyone, R1.5 feels different. I want more objective data showing ...

I think it is same as before

gaunt rose May 28, 2025, 7:47 PM

#

2 providers already up

earnest wolf May 28, 2025, 8:19 PM

#

clever jolt big question i guess is did they do it **yet**

They did

rugged nest May 28, 2025, 10:10 PM

#

tacit vortex I’m just waiting for dubesor lol

same

limpid wasp May 29, 2025, 1:07 AM

#

rugged nest same

well, that'll be a few hours.. I estimate 12ish or so

earnest wolf May 29, 2025, 1:09 AM

#

limpid wasp well, that'll be a few hours.. I estimate 12ish or so

Can you ping me when you finish the benchmarks? (dont worry if you forget to ping me lol)

formal nest May 29, 2025, 1:26 AM

#

do we need an @ dubesor-benchmark-enjoyer role

#

you can only get the role if you give dubesor an OR key to bench with KEKcry

tacit vortex May 29, 2025, 7:49 AM

#

formal nest do we need an @ dubesor-benchmark-enjoyer role

Yes we do

tacit vortex May 29, 2025, 7:50 AM

#

limpid wasp well, that'll be a few hours.. I estimate 12ish or so

That’s why you’re the GOAT! The GOAT!

limpid wasp May 29, 2025, 9:54 AM

#

formal nest you can only get the role if you give dubesor an OR key to bench with <:KEKcry:9...

🤣 Naw I have rejected keys in the past. But if you wanna contribute data to e.g. the chess leaderboard, just play using your own keys... I incorporate the data of any finished match and thus far I have spent ~$300 on the chess data alone 😦

blissful peak May 29, 2025, 10:16 AM

#

limpid wasp 🤣 Naw I have rejected keys in the past. But if you wanna contribute data to e....

300 $ is insane

limpid wasp May 29, 2025, 12:36 PM

#

earnest wolf Can you ping me when you finish the benchmarks? (dont worry if you forget to pin...

there's ya ping

earnest wolf May 29, 2025, 12:43 PM

#

Thank ee very much

wind tendon May 29, 2025, 2:22 PM

#

When will OpenRouter support tool calling on R1-0528 like DeepSeek API does? @rocky heron 🙏🏻

crystal anvil May 30, 2025, 12:22 AM

#

DeepSeek is not able to review images right?

earnest wolf May 30, 2025, 12:24 AM

#

crystal anvil DeepSeek is not able to review images right?

Correct

gaunt rose Jun 1, 2025, 7:02 AM

#

Is there tool call support with R1? I know R1 v2 supports it, but some folks are saying it isn't supported with OR yet. Is this true?

earnest wolf Jun 1, 2025, 11:53 AM

#

gaunt rose Is there tool call support with R1? I know R1 v2 supports it, but some folks ar...

R1 the model doesn't support tool calling afaik

#

R1 v2 explicitly mentions tool calling support though

gaunt rose Jun 1, 2025, 4:37 PM

#

earnest wolf R1 v2 explicitly mentions tool calling support though

Yup, I'd love to be able to use its tool calling capabilities through OR

cedar sentinel Jun 20, 2025, 10:24 PM

#

icy fulcrum Checking characters might detect some errors, not the ideal solution though. Ye...

sorry for resurrecting a dead thread, but fyi we have this on a per-endpoint basis now:
https://openrouter.ai/docs/api-reference/list-endpoints-for-a-model#response.body.data.endpoints.uptime_last_30m

https://openrouter.ai/api/v1/models/anthropic/claude-sonnet-4/endpoints

List endpoints for a model | OpenRouter | Documentation
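
As a rough sketch of how the per-endpoint uptime could be used, the helper below filters an already-parsed response from the endpoints URL above. The `data.endpoints[].uptime_last_30m` path comes from the docs anchor in the link. Everything else is an assumption: that the value is a percentage from 0 to 100, that it may be missing or null, and the `provider_name` key used in the example data. `healthy_endpoints` is a hypothetical name.

```python
def healthy_endpoints(payload, min_uptime=99.0):
    """Return endpoints whose recent uptime meets the threshold, best first."""
    endpoints = payload.get("data", {}).get("endpoints", [])
    result = []
    for endpoint in endpoints:
        uptime = endpoint.get("uptime_last_30m")
        # Skip endpoints that report no recent uptime figure at all.
        if uptime is not None and uptime >= min_uptime:
            result.append(endpoint)
    result.sort(key=lambda endpoint: endpoint["uptime_last_30m"], reverse=True)
    return result


if __name__ == "__main__":
    # Made-up example data in the assumed response shape.
    sample = {
        "data": {
            "endpoints": [
                {"provider_name": "A", "uptime_last_30m": 97.5},
                {"provider_name": "B", "uptime_last_30m": 100.0},
                {"provider_name": "C", "uptime_last_30m": None},
            ]
        }
    }
    for endpoint in healthy_endpoints(sample, min_uptime=99.0):
        print(endpoint["provider_name"], endpoint["uptime_last_30m"])
```

The surviving provider names could then be fed into a `provider.order` list like the ones shown earlier in the thread.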

proven atlas Jul 18, 2025, 3:46 PM

#

I just got this email which looks interesting

Screenshot_2025-07-18_at_11.44.08_PM.png

earnest wolf Jul 18, 2025, 4:36 PM

#

proven atlas I just got this email which looks interesting

Ooo

tacit vortex Jul 23, 2025, 12:33 PM

#

proven atlas I just got this email which looks interesting

Nice

#

Is it the 1bit architecture or is it the Xeon stuff

proven atlas Jul 23, 2025, 12:36 PM

#

tacit vortex Is it the 1bit architecture or is it the Xeon stuff

I am getting more info from them. probably will make a blog post / newsletter post on this. will share more when i have info. they have api but need to request.
