DeepInfra | OpenRouter | Page 1

crystal anvil Apr 23, 2025, 3:27 PM

#

https://openrouter.ai/provider/deepinfra

DeepInfra | OpenRouter

Browse models provided by DeepInfra

dense lynx Apr 23, 2025, 3:53 PM

#

Extremely slow TPS for deepseek v3 0324

#

0/10

crystal anvil Apr 23, 2025, 6:12 PM

#

Thanks for the feedback Lix!

fallow warren Apr 23, 2025, 8:10 PM

#

I feel like DeepInfra has the lowest pricing but also the lowest throughput, and for some use-cases that's okay

dense lynx Apr 23, 2025, 10:06 PM

#

4.4 TPS is way too slow

mossy osprey Apr 29, 2025, 4:08 AM

#

is this kind of latency expected from Qwen3?
https://openrouter.ai/api/v1/generation?id=gen-1745899078-U4A3YcDExizJF7CnyhOg

Screenshot_2025-04-29_at_12.08.26_PM.png

fallow warren Apr 29, 2025, 5:42 AM

#

mossy osprey is this kind of latency expected from Qwen3? https://openrouter.ai/api/v1/gener...

Did you specify /no_think ?

#

these models can think for a long time

#

but yeah that still seems very slow

mossy osprey Apr 29, 2025, 5:49 AM

#

thanks. i will try with /no_think
but i feel like no think is a bad idea because you know it is deterministically going to make the output worse.

crystal anvil May 2, 2025, 9:24 PM

#

The first token latency is fixed now, it counts it until the first token not first non-reasoning token

livid mortar Jun 21, 2025, 1:25 AM

#

crystal anvil https://openrouter.ai/provider/deepinfra

idk if their gemini proxy models are cheaper because of genuine discounts or burning vc money, but either way, could we get them in openrouter

karmic hearth Aug 16, 2025, 8:18 AM

#

Why is Llama 3.2 3b cheaper than Llama 3.2 1b on DeepInfra? Shouldn't it be the other way around?

karmic hearth Dec 12, 2025, 4:39 PM

#

@formal jewel can you ask DeepInfra for higher rate limits for me? I'm getting rate limited on llama 3.1 8b (sending 1 request every 5 sexonds)

marsh gazelle Jan 8, 2026, 11:28 PM

#

They added a new model today: https://openrouter.ai/allenai/olmo-3.1-32b-instruct

Olmo 3.1 32B Instruct - API, Providers, Stats

Olmo 3.1 32B Instruct is a large-scale, 32-billion-parameter instruction-tuned language model engineered for high-performance conversational AI, multi-turn dialogue, and practical instruction following. Run Olmo 3.1 32B Instruct with API

warped copper Feb 10, 2026, 5:43 PM

#

Hi all, new here, do you know if there is a suggested procedure to request to OpenRouter to add a new DeepInfra model? (Claude Opus 4 is missing for instance)

karmic hearth Feb 12, 2026, 7:38 PM

#

warped copper Hi all, new here, do you know if there is a suggested procedure to request to Op...

DeepInfra told OR not to add it

mossy osprey Feb 13, 2026, 9:59 AM

#

Is the qwen3 max model served from DeepInfra infra or just a proxy to alibaba intl?

Also what about Claude models?

#DeepInfra