#DeepInfra


crystal anvil
dense lynx
#

Extremely slow TPS for DeepSeek V3 0324

#

0/10

crystal anvil
#

Thanks for the feedback Lix!

fallow warren
#

I feel like DeepInfra has the lowest pricing but also the lowest throughput, and for some use-cases that's okay

dense lynx
#

4.4 TPS is way too slow

mossy osprey
fallow warren
#

these models can think for a long time

#

but yeah that still seems very slow

mossy osprey
#

thanks. i will try with /no_think
but i feel like no think is a bad idea because you know it is deterministically going to make the output worse.

crystal anvil
#

The first-token latency is fixed now: it's measured up to the first token, not the first non-reasoning token

livid mortar
karmic hearth
#

Why is Llama 3.2 3b cheaper than Llama 3.2 1b on DeepInfra? Shouldn't it be the other way around?

karmic hearth
#

@formal jewel can you ask DeepInfra for higher rate limits for me? I'm getting rate limited on Llama 3.1 8B (sending 1 request every 5 seconds)

marsh gazelle
warped copper
#

Hi all, new here. Do you know if there is a suggested procedure for asking OpenRouter to add a new DeepInfra model? (Claude Opus 4 is missing, for instance)

karmic hearth
mossy osprey
#

Is the qwen3 max model served from DeepInfra infra or just a proxy to alibaba intl?

Also what about Claude models?