#DeepInfra
1 messages · Page 1 of 1 (latest)
Thanks for the feedback Lix!
I feel like DeepInfra has the lowest pricing but also the lowest throughput, and for some use-cases that's okay
4.4 TPS is way too slow
is this kind of latency expected from Qwen3?
https://openrouter.ai/api/v1/generation?id=gen-1745899078-U4A3YcDExizJF7CnyhOg
Did you specify /no_think ?
these models can think for a long time
but yeah that still seems very slow
thanks. i will try with /no_think
but i feel like no think is a bad idea because you know it is deterministically going to make the output worse.
The first token latency is fixed now, it counts it until the first token not first non-reasoning token
idk if their gemini proxy models are cheaper because of genuine discounts or burning vc money, but either way, could we get them in openrouter
Why is Llama 3.2 3b cheaper than Llama 3.2 1b on DeepInfra? Shouldn't it be the other way around?
@formal jewel can you ask DeepInfra for higher rate limits for me? I'm getting rate limited on llama 3.1 8b (sending 1 request every 5 sexonds)
They added a new model today: https://openrouter.ai/allenai/olmo-3.1-32b-instruct
Hi all, new here, do you know if there is a suggested procedure to request to OpenRouter to add a new DeepInfra model? (Claude Opus 4 is missing for instance)
DeepInfra told OR not to add it
Is the qwen3 max model served from DeepInfra infra or just a proxy to alibaba intl?
Also what about Claude models?