#Provider routing by quantization is broken

6 messages · Page 1 of 1 (latest)

vocal vector
#

If you want to accept anything equal or above Q8 quants, documentation would suggest defining a list that looks like this in the provider parameters "quantizations": ["fp32", "fp16", "fp8", "int8", "bf16", "unknown"]

However, model that have no quants s (such as google/gemini-2.5-flash) give this error: {'error': {'message': 'No endpoints found for the request with quantization: fp32,fp16,fp8,int8,bf16. To learn more about provider routing, visit: https://openrouter.ai/docs/provider-routing', 'code': 404}}

How to accept an allowlist of quants AND no quants?

livid widgetBOT
#

To accept both an allowlist of quantizations and models with no quantization, omit the quantizations field from your provider parameters for models without quants. The quantizations filter only matches providers supporting those quant levels, so including it excludes models with no quantization like google/gemini-2.5-flash.

Unfortunately, you cannot combine a quantization allowlist with "no quant" in one list. Instead, make separate requests or omit the quantizations filter when querying models that have no quantization.

For details, see the provider routing docs on quantization filtering: Provider Routing

vocal vector
#

The provided solution does not sound convenient, especially if you want to get best of the both worlds on OpenRouder

livid widgetBOT
#

Currently, OpenRouter's provider routing does not support combining a quantization allowlist with "no quant" models in a single request. To handle both cases conveniently, you can:

  • Make separate requests: one filtering by quantizations for models with quants, and one without the quantizations filter for models with no quants.
  • Alternatively, avoid specifying quantizations to accept all quant states, but this may include unwanted quantizations.

This limitation is due to how provider routing matches endpoints strictly by quantization. There’s no built-in way to mix quantized and non-quantized models in one allowlist.

Would you like guidance on how to set up provider routing for your specific use case?

-# This is an automated response. Use !mute to stop the bot from responding.

vocal vector
#

!mute

livid widgetBOT
#

🔇 Thread muted. The bot will no longer respond here.