# Initializing-to-throttled loop with vLLM. RunPod never works!

spice shuttle

Why do my workers keep going into an initializing-to-throttled loop with vLLM? RunPod never works!

There is no way for me to get the logs, since the worker gets throttled as soon as it starts running. I'm trying to deploy MAI-UI from HF (Hugging Face).

pastel musk (BOT)
desert ember

Use another GPU or region.

That means the GPUs aren't available for use (they're being used by others).

spice shuttle

I'm deploying a 2B model, which is less than 5 GB, and I've selected all the available GPUs under 24 GB. The workers do finish initializing and then go idle, but they get throttled soon after.
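As a rough sanity check on the "2B model, under 5 GB" point, here is a back-of-the-envelope sketch of weight memory versus the VRAM budget vLLM will claim. It assumes bf16/fp16 weights (2 bytes per parameter) and vLLM's `gpu_memory_utilization` engine argument at its default of 0.9. The helper names are illustrative only, and the estimate ignores the KV cache, activations, and runtime overhead.

```python
def estimate_weight_gb(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate size of the model weights in GB (10**9 bytes).

    Assumes 2 bytes per parameter (bf16/fp16) by default; KV cache,
    activations and CUDA/runtime overhead are NOT included.
    """
    return num_params * bytes_per_param / 1e9


def fits_on_gpu(
    num_params: float,
    gpu_gb: float,
    gpu_memory_utilization: float = 0.9,  # vLLM's default fraction of VRAM to claim
    bytes_per_param: float = 2.0,
) -> bool:
    """Return True if the weights fit inside the share of VRAM vLLM may use.

    Whatever is left of the budget after the weights is what vLLM can
    use for the KV cache.
    """
    budget_gb = gpu_gb * gpu_memory_utilization
    return estimate_weight_gb(num_params, bytes_per_param) < budget_gb


if __name__ == "__main__":
    print(estimate_weight_gb(2e9))   # weights of a 2B model in bf16
    print(fits_on_gpu(2e9, 24))      # 24 GB card
    print(fits_on_gpu(2e9, 16))      # 16 GB card
```

By this estimate a 2B model needs roughly 4 GB for its weights, well under the budget on any of the sub-24 GB cards, which suggests that VRAM size alone shouldn't explain the throttling.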

weary thicket

I agree, this is a huge problem. The response I typically get from support is to enable more workers or to leave them running, which kind of defeats the purpose for low-volume usage. It really stinks when you're giving a front-end demo and it fails because the serverless endpoint is completely throttled on the backend. Typically, you can set the endpoint to 0 instances, let it kill everything, then set it back to 1 or more instances, and you'll get a "ready" serverless endpoint. I have written to support multiple times and even followed their recommendation to migrate my network volume, but I'm still seeing the same issue in the new availability zone.
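The "set it to 0, let it kill everything, then set it back to 1 or more" workaround can be scripted as a small polling routine. This is only a minimal sketch, not RunPod's API: `set_max_workers` and `running_workers` are hypothetical callables that you would wire to however you change the endpoint's worker count (the RunPod console or its API), and the polling interval and timeout are arbitrary assumptions.

```python
import time
from typing import Callable


def recycle_workers(
    set_max_workers: Callable[[int], None],  # hypothetical: sets the endpoint's worker count
    running_workers: Callable[[], int],      # hypothetical: returns how many workers are still up
    target: int = 1,
    poll_seconds: float = 5.0,
    timeout_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Scale to zero, wait for every worker to shut down, then scale back to `target`.

    Returns True if all workers drained before the timeout. On timeout the
    count is still restored to `target` (so a demo isn't left at zero),
    but False is returned so the caller knows the drain was incomplete.
    """
    set_max_workers(0)
    waited = 0.0
    drained = True
    while running_workers() > 0:
        if waited >= timeout_seconds:
            drained = False
            break
        sleep(poll_seconds)
        waited += poll_seconds
    set_max_workers(target)
    return drained
```

Injecting `sleep` makes the routine easy to test with fake callables. Restoring the worker count even on timeout is a design choice; drop that line if you would rather inspect a stuck endpoint first.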