# Initializing-to-throttled loop for vLLM. RunPod never works!
Use another GPU or region.
That means the GPUs aren't available for use (they're being used by others).
I'm deploying a 2B model, which is less than 5 GB, and I've selected all the available GPUs under 24 GB. The workers do finish initializing and then go idle, but they get throttled soon after.
I agree, this is a huge problem. I typically get a response from support telling me to enable more workers or to leave them running, which kind of defeats the purpose for low-volume usage. It really stinks when you are giving a front-end demo and it fails because the serverless endpoint is completely throttled on the backend. Typically, you can configure 0 instances, let it kill everything, then set it to 1 or more instances, and you'll get a "ready" serverless endpoint. I have written to support multiple times and even followed their recommendation to migrate my network volume, but I'm still seeing the same issue in the new availability zone.
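For anyone who wants to script the reset cycle described above (scale to 0, wait, scale back up, wait for "ready"), here is a rough sketch. It does not call any real RunPod API: `set_workers` and `count_ready` are hypothetical callables you would have to write yourself against whatever interface you use to manage the endpoint, and the default wait times are guesses, not recommended values.

```python
import time
from typing import Callable


def reset_endpoint(
    set_workers: Callable[[int], None],   # hypothetical: sets the endpoint's instance count
    count_ready: Callable[[], int],       # hypothetical: returns how many workers are ready
    target: int = 1,
    drain_seconds: float = 30.0,          # assumed pause so existing workers are killed
    timeout_seconds: float = 300.0,       # assumed upper bound on waiting for "ready"
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Scale to 0, wait, scale back up, then poll until a worker is ready.

    Returns True if at least one worker became ready before the timeout.
    """
    set_workers(0)
    sleep(drain_seconds)
    set_workers(target)

    waited = 0.0
    while waited < timeout_seconds:
        if count_ready() >= 1:
            return True
        sleep(poll_seconds)
        waited += poll_seconds
    return False
```

Running this shortly before a demo only automates the manual workaround; it cannot help if no GPU of the selected types is free in that region.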