FP4 models are unacceptable especially for coding models. There’s no way to filter this as a paying | OpenRouter | Page 1

next forge Aug 4, 2025, 5:52 AM

#

You can ignore the provider at the global level

#

by going to settings

safe acorn Aug 4, 2025, 5:53 AM

#

If there could be an option to sort providers by quantization, or to block a particular quant (fp4), that would be helpful

(maybe in future updates)

opal hornet Aug 4, 2025, 5:54 AM

#

next forge You can ignore the provider at the global level

in plugins? or how do we do that

latent rapids Aug 4, 2025, 5:57 AM

#

https://openrouter.ai/docs/features/provider-routing#quantization

can't you use this?

OpenRouter Documentation

Provider Routing - Smart Multi-Provider Request Management

Route AI model requests across multiple providers intelligently. Learn how to optimize for cost, performance, and reliability with OpenRouter's provider routing.

#

That requires manually messing with headers or json request sent, I suppose
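
For reference, a minimal sketch of what that JSON change might look like, assuming the `provider.quantizations` field from the linked provider-routing docs. The helper name `buildRequestBody`, the model slug, and the prompt are placeholders for illustration only.

```javascript
// Sketch: build a chat-completions request body that restricts routing
// to providers serving the listed quantization levels.
// `buildRequestBody` is a hypothetical helper, not part of any SDK.
function buildRequestBody(model, prompt, quantizations) {
  return {
    model,
    messages: [{ role: "user", content: prompt }],
    // Assumed field, per the provider-routing docs linked above.
    provider: { quantizations },
  };
}

const body = buildRequestBody(
  "qwen/qwen3-coder",
  "Write a binary search in Go.",
  ["fp8"]
);
console.log(JSON.stringify(body, null, 2));

// Sending it needs an API key (here read from OPENROUTER_API_KEY):
// await fetch("https://openrouter.ai/api/v1/chat/completions", {
//   method: "POST",
//   headers: {
//     Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
//     "Content-Type": "application/json",
//   },
//   body: JSON.stringify(body),
// });
```

This only helps when the client lets you edit the request body, which is the limitation raised in the message above.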

#

It's not only DeepInfra; I think Atlas did it too, and some providers don't specify the quants they use, so it could de facto be any size

safe acorn Aug 4, 2025, 6:03 AM

#

latent rapids https://openrouter.ai/docs/features/provider-routing#quantization can't you us...

i have never used this one in my programs
thanks for sharing

I never read entire docs 😢

this makes stuff easy

latent rapids Aug 4, 2025, 6:04 AM

#

No, DeepSeek was trained in FP8 and doesn't have an FP16 version; still, some providers leave the precision level blank

#

Chutes, for example, is very shy

exotic juniper Aug 4, 2025, 6:15 AM

#

Or maybe a way to specify the quant

#

Like how you can specify :floor or :nitro

#

You should be able to specify either like <model_name>:fp8

#

It would also be nice to be able to specify a specific provider in the slug

#

Like if I want baseten I can be like moonshot-ai/kimi-k2-instruct:baseten

#

That would make it so much easier to use and more powerful

#

Especially for apps that only let you supply a slug (like Open WebUI)

#

I might build a proxy just so I can do this lol

hearty verge Aug 4, 2025, 10:18 AM

#

I think realistically this would need to be paired with a policy that providers can't list open weight models on openrouter without disclosing the quantisation. Yes they could lie, but at least they would be actively choosing to lie publicly.

elfin stream Aug 4, 2025, 10:45 AM

#

https://openrouter.ai/docs/features/presets are a thing if you can't modify your app but can specify a model slug.

OpenRouter Documentation

Presets - Configuration Management for AI Models

Learn how to use OpenRouter's presets to manage model configurations, system prompts, and parameters across your applications.

exotic juniper Aug 4, 2025, 9:26 PM

#

elfin stream https://openrouter.ai/docs/features/presets are a thing if you can't modify your...

Just checking, you mean slug as in like openai/gpt-4o-mini and not like 🐌 the animal

supple creek Aug 5, 2025, 10:59 PM

#

elfin stream https://openrouter.ai/docs/features/presets are a thing if you can't modify your...

I use this now, but man, it really sucks setting up a preset for every new model that i'm interested in using

#

We really need a way to ignore/block the fp4 junk globally. They're a net negative. I don't want to use fp4 quantized models even if the providers paid me per token 😆

#

it's giving OR a bad reputation too, because when people say "don't use OpenRouter for benchmarking because it's unreliable", they're referring to the junk quantized models

exotic juniper Aug 6, 2025, 12:43 AM

#

I wrote my proxy

#

To run it just use bun run or_proxy.js

#

It supports this syntax: <slug>:<option>
where option is:
["free", "beta", "floor", "nitro", "thinking"] : passed as-is (openrouter uses these)
["int4", "int8", "fp4", "fp6", "fp8", "fp16", "bf16", "fp32"]: forces that quantisation
anything else (baseten, deepinfra/fp8, any provider slug): force that provider
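
A rough sketch of how the suffix rules above could be mapped onto a request. This is not the attached or_proxy-v1.js: `rewriteModel` is a hypothetical name, and forcing a provider through `provider.only` is an assumption about the provider-routing API.

```javascript
// Hypothetical sketch of the <slug>:<option> rules listed above.
const PASSTHROUGH = new Set(["free", "beta", "floor", "nitro", "thinking"]);
const QUANTS = new Set(["int4", "int8", "fp4", "fp6", "fp8", "fp16", "bf16", "fp32"]);

// Returns the fields to merge into the outgoing request body.
function rewriteModel(model) {
  const idx = model.lastIndexOf(":");
  if (idx === -1) return { model }; // no suffix: leave untouched
  const slug = model.slice(0, idx);
  const option = model.slice(idx + 1);
  // Empty suffix or a native OpenRouter variant: pass the model through as-is.
  if (option === "" || PASSTHROUGH.has(option)) return { model };
  // Quantization suffix: strip it and force that quantization.
  if (QUANTS.has(option)) {
    return { model: slug, provider: { quantizations: [option] } };
  }
  // Anything else is treated as a provider slug (e.g. baseten, deepinfra/fp8).
  return { model: slug, provider: { only: [option] } };
}

console.log(rewriteModel("qwen/qwen3-coder:fp8"));
console.log(rewriteModel("moonshot-ai/kimi-k2-instruct:baseten"));
```

Splitting on the last colon keeps provider slugs that contain a slash, such as deepinfra/fp8, in one piece.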

exotic juniper Aug 6, 2025, 12:46 AM

#

supple creek We really need a way to ignore/block the fp4 junk globally. They're a net negati...

it should work for you, just add :fp8 to the end of your requests

#

to force fp8

exotic juniper Aug 6, 2025, 1:04 AM

#

To get the provider slug, click the clipboard icon next to the provider's name

#

hope it helps!

exotic juniper Aug 6, 2025, 9:40 PM

#

Yes, just put qwen/qwen3-coder:fp8 and it will force fp8 quantisation

#

Haven’t coded that in yet (if people want it I can), you can also just put a specific provider in

#

📎 or_proxy-v1.js

supple creek Aug 7, 2025, 6:55 PM

#

too many sneaky ways of being scammed by providers on OR

latent rapids Aug 7, 2025, 7:21 PM

#

Need someone patient and with money to run benchmarks on suspicious providers and compare to proven and official ones

supple creek Aug 7, 2025, 8:39 PM

#

latent rapids Need someone patient and with money to run benchmarks on suspicious providers an...

Or just have OR start labeling quantized models as quantized 😢

#

And a global way to block quantized models, not just per request

exotic juniper Aug 7, 2025, 11:29 PM

#

supple creek too many sneaky ways of being scammed by providers on OR

add deepinfra to that

abstract salmon Aug 8, 2025, 7:06 AM

#

fp4 doesn't automatically mean bad quality. I've tested DeepInfra fp4 for Kimi K2 and it was surprisingly better than some other providers:

https://eval.16x.engineer/blog/kimi-k2-provider-evaluation-results

16x Eval

Kimi K2 Provider Evaluation: Significant Performance Differences Ac...

Evaluation of Kimi K2 model providers including DeepInfra, Groq, Moonshot AI, Together on coding and writing tasks, showing substantial differences in speed, stability, and output quality.

latent rapids Aug 8, 2025, 7:30 AM

#

Yeah, it depends on quant type, not only size. Exl3 at 4 bit would be much different in quality from legacy Q4 quant with no calibration dataset and no i-matrix. And people say some tasks and token probabilities get hit by quant much harder than others

supple creek Aug 8, 2025, 12:25 PM

#

abstract salmon fp4 doesn't automatically mean bad quality. I've tested DeepInfra fp4 for Kimi K...

I've seen benchmarks for quantization, and they simply never match reality

#

.
We're not asking for a ban of quantized models.

Give us a way to globally block anything below fp8. If some want to use them, great, have fun. I think anything below fp8 is brain dead and that I'm being scammed by secretly being served them
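
Until a global setting exists, one per-request approximation of "nothing below fp8" is to send an allow-list of every label at 8 bits or wider. A minimal sketch, assuming the `provider.quantizations` request field and the nominal bit widths of the labels mentioned in this thread; `quantsAtLeast` and `withQuantFloor` are hypothetical helpers.

```javascript
// Nominal bit widths for the quantization labels mentioned in this thread.
const BITS = { int4: 4, fp4: 4, fp6: 6, int8: 8, fp8: 8, fp16: 16, bf16: 16, fp32: 32 };

// Labels whose nominal width is at least `minBits`.
function quantsAtLeast(minBits) {
  return Object.keys(BITS).filter((q) => BITS[q] >= minBits);
}

// Returns a copy of a request body with the quantization allow-list applied.
function withQuantFloor(body, minBits = 8) {
  return {
    ...body,
    provider: { ...(body.provider ?? {}), quantizations: quantsAtLeast(minBits) },
  };
}

const request = withQuantFloor({
  model: "qwen/qwen3-coder",
  messages: [{ role: "user", content: "hello" }],
});
console.log(request.provider.quantizations);
```

This filters on the label a provider reports, so it cannot catch a provider that mislabels its precision, and providers that report no precision at all would presumably not match the list.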

broken ether Aug 8, 2025, 12:37 PM

#

appreciate all the discussion here folks, this kind of thing is something we’re very aware of internally and want to make better in a few ways

exotic juniper Aug 8, 2025, 12:41 PM

#

exotic juniper

has anyone tried my proxy? if so, any feedback?

supple creek Aug 8, 2025, 3:02 PM

#

exotic juniper add deepinfra to that

Why do you say that? their models are labeled as fp8. Is that not true?

quiet jay Aug 8, 2025, 10:35 PM

#

latent rapids Yeah, it depends on quant type, not only size. Exl3 at 4 bit would be much diffe...

also depends on context quant

exotic juniper Aug 9, 2025, 1:09 AM

#

supple creek Why do you say that? their models are labeled as fp8. Is that not true?

Some are fp8 but some are fp4

#

Also deepinfra(turbo) are fp4

#

More of a per model basis

latent rapids Aug 9, 2025, 9:49 AM

#

I wish there was a way to get a real quant size through model request or provider data

latent rapids Aug 12, 2025, 10:17 PM

#

That's a big damn difference. If the claimed fp4 gets 90-93% on AIME, what is the quant of the one scoring 80%? Or is it a 4-bit KV cache they use?
That's for smaller models, but still an example of llama3. And maybe MoE models are getting hit by quantization more

abstract salmon Aug 13, 2025, 6:32 AM

#

I think just saying fp4 is not enough to describe the quantization technique now. It's which layers fp4 is applied to that matters.

#

Back in the DeepSeek era, even though the model was trained in FP8, some layers were still BF16.

nova sedge Aug 20, 2025, 7:14 AM

#

Question on top of this: is there any way to prove it if a provider says they're serving fp8 but is routing internally to something like fp4 or int4?

#

This was a concern for me when I was using a provider directly. I was using Llama 405B but was routed to 70B. That was obviously visible in the JSON, but you get my point

broken ether Aug 21, 2025, 10:10 PM

#

FYI: https://x.com/GosuCoder/status/1958377610385264952

GosuCoder (@GosuCoder)

I went into testing OpenRouter providers with several theories. One theory I had was that Qwen3 Coder with provider deepinfra/fp4 would perform significantly worse because of the fp4 quantization.

I was incredibly wrong, whatever magic Deepinfra is working on their fp4 version
