Provider routing by quantization is broken | OpenRouter

vocal vector Jul 28, 2025, 9:01 AM

If you want to accept any quant at Q8 or above, the documentation suggests defining a list like this in the provider parameters: "quantizations": ["fp32", "fp16", "fp8", "int8", "bf16", "unknown"]

However, models that have no quants (such as google/gemini-2.5-flash) give this error: {'error': {'message': 'No endpoints found for the request with quantization: fp32,fp16,fp8,int8,bf16. To learn more about provider routing, visit: https://openrouter.ai/docs/provider-routing', 'code': 404}}

How do I accept an allowlist of quants AND models with no quants?

livid widgetBOT Jul 28, 2025, 9:01 AM

To accept both an allowlist of quantizations and models with no quantization, omit the quantizations field from your provider parameters for models without quants. The quantizations filter only matches providers supporting those quant levels, so including it excludes models with no quantization like google/gemini-2.5-flash.

Unfortunately, you cannot combine a quantization allowlist with "no quant" in one list. Instead, make separate requests or omit the quantizations filter when querying models that have no quantization.
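As a rough illustration of the "omit the filter for models without quants" approach, here is a minimal sketch that builds the request body and adds the quantizations filter only when the model is not on a caller-maintained exclusion list. The names `build_payload`, `ALLOWED_QUANTS`, and `MODELS_WITHOUT_QUANTS` are hypothetical, the set of unquantized models is something you would have to maintain yourself, and nesting the filter under a `provider` key is an assumption about the request shape.

```python
# Allowlist taken from the original question.
ALLOWED_QUANTS = ["fp32", "fp16", "fp8", "int8", "bf16", "unknown"]

# Hypothetical, caller-maintained set of models known to report no quantization.
MODELS_WITHOUT_QUANTS = {"google/gemini-2.5-flash"}


def build_payload(model, messages,
                  allowed_quants=ALLOWED_QUANTS,
                  unquantized=MODELS_WITHOUT_QUANTS):
    """Build a request body, adding the quantizations filter only
    for models that are not listed as unquantized."""
    payload = {"model": model, "messages": list(messages)}
    if model not in unquantized:
        # Assumed shape: provider parameters nested under "provider".
        payload["provider"] = {"quantizations": list(allowed_quants)}
    return payload
```

With this sketch, `build_payload("google/gemini-2.5-flash", msgs)` leaves the filter out entirely, while any model not in the exclusion set gets the full allowlist.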

For details, see the provider routing docs on quantization filtering: https://openrouter.ai/docs/provider-routing

vocal vector Jul 28, 2025, 9:09 AM

The provided solution does not sound convenient, especially if you want to get the best of both worlds on OpenRouter.

livid widgetBOT Jul 28, 2025, 9:09 AM

Currently, OpenRouter's provider routing does not support combining a quantization allowlist with "no quant" models in a single request. To handle both cases conveniently, you can:

- Make separate requests: one filtering by quantizations for models with quants, and one without the quantizations filter for models with no quants.
- Alternatively, avoid specifying quantizations to accept all quant states, but this may include unwanted quantizations.

This limitation is due to how provider routing matches endpoints strictly by quantization. There’s no built-in way to mix quantized and non-quantized models in one allowlist.
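One way to automate the "separate requests" option is to try the filtered request first and retry without the filter only when the 404 "No endpoints found" error comes back. The sketch below assumes the error arrives as a dict shaped like the one quoted earlier in this thread, and that the filter sits under a `provider` key. `send` is a hypothetical callable that you supply to perform the actual HTTP request and return the parsed JSON response.

```python
import copy


def is_no_endpoint_error(response):
    """True if the response looks like the 404 error quoted in this thread."""
    error = response.get("error") if isinstance(response, dict) else None
    return isinstance(error, dict) and error.get("code") == 404


def complete_with_fallback(send, payload):
    """Send the filtered request; on a 404, retry once without the filter.

    `send` is a caller-supplied function (hypothetical) that takes the
    request body as a dict and returns the parsed JSON response.
    """
    first = send(payload)
    has_filter = "quantizations" in payload.get("provider", {})
    if not (has_filter and is_no_endpoint_error(first)):
        return first

    # Retry on a copy so the caller's payload is left untouched.
    relaxed = copy.deepcopy(payload)
    del relaxed["provider"]["quantizations"]
    if not relaxed["provider"]:
        del relaxed["provider"]
    return send(relaxed)
```

Note that the retry drops the filter completely, so for a model that does have quantized endpoints but none in the allowlist, the second request could be served at an unwanted quantization. Combining this with an explicit list of models known to have no quants would avoid that.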

Would you like guidance on how to set up provider routing for your specific use case?

-# This is an automated response. Use !mute to stop the bot from responding.

vocal vector Jul 28, 2025, 9:09 AM

!mute

livid widgetBOT Jul 28, 2025, 9:09 AM

🔇 Thread muted. The bot will no longer respond here.
