#Workers stay in Initialize
Ticket #29446
if you use another container image will it work?
maybe it's having trouble pulling your image
There are no logs at all, it doesn't even get a chance to pull an image
I have created another endpoint and the same thing happens. Tried multiple regions/data centers as well.
can't be the runpod servers i guess, could be your browser
if it were runpod side, many people would storm this server saying the same thing
It has to be the Runpod server, it cannot be my browser. If it is my browser, I have serious questions about why using Firefox instead of Chrome is preventing a worker from starting up correctly.
I'll give the support ticket another 24 hours since I opened it during the weekend, but if I cannot use Runpod soon, I'll just have to move to AWS again and pay them 2x. At this point I have lost faith: I cannot use Runpod reliably to build a product for our company. I guess I'd rather pay AWS 2x and know it works consistently. Well, the Ops team will be happy that I'm helping us reach our AWS contract quota haha..
you are not the only one, it's happening to me too. only for runpod worker templates like vLLM, not my own. there is another poster as well from a few days ago
Closing the loop here. Support got back to me: there is a problem with the internal registry. When you create a new Serverless endpoint, it sets the vLLM image to registry.runpod.net/runpod-workers-worker-vllm-main-dockerfile:3851d53f9, which pulls from an internal Runpod registry. There seem to be some issues with it that they are aware of, so you have to use the DockerHub one, runpod/worker-v1-vllm:v2.11.1, until they have fixed the issue.
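For anyone who wants to check their own endpoints, here is a minimal sketch of the check described above. It only inspects the image string. The helper names (`registry_host`, `suggest_image`) are made up for illustration, and the fallback tag is just the one quoted in this thread, so it only makes sense for the vLLM worker. Nothing here calls a Runpod API.

```python
# Host and fallback tag are copied from the message above; everything else is illustrative.
INTERNAL_REGISTRY_HOST = "registry.runpod.net"
DOCKERHUB_FALLBACK = "runpod/worker-v1-vllm:v2.11.1"  # assumption: only valid for the vLLM worker


def registry_host(image_ref: str) -> str:
    """Return the registry host of an image reference (Docker Hub if none is given)."""
    first, sep, _rest = image_ref.partition("/")
    # Docker treats the first path component as a registry host only when it
    # contains a dot or a colon, or is exactly "localhost".
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"


def suggest_image(image_ref: str) -> str:
    """Swap an internal-registry vLLM image for the DockerHub tag from this thread."""
    if registry_host(image_ref) == INTERNAL_REGISTRY_HOST:
        return DOCKERHUB_FALLBACK
    return image_ref


if __name__ == "__main__":
    current = "registry.runpod.net/runpod-workers-worker-vllm-main-dockerfile:3851d53f9"
    print(registry_host(current))
    print(suggest_image(current))
```

If the host printed is registry.runpod.net, the endpoint is still pointing at the internal registry and the container image field needs to be changed by hand in the endpoint settings.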
Thank you for marking this question as solved!
Problem stays the same for me. At least now I get log output.
2026-01-08T09:05:59Z loading container image from cache
2026-01-08T09:08:43Z Loaded image: runpod/worker-v1-vllm:v2.11.1
2026-01-08T09:08:44Z v2.11.1 Pulling from runpod/worker-v1-vllm
2026-01-08T09:08:44Z Digest: sha256:aa576ece2d76cac59578a8ef4595719423648b7e52eaebb36df0fd2a3e5dfbda
2026-01-08T09:08:44Z Status: Image is up to date for runpod/worker-v1-vllm:v2.11.1
2026-01-08T09:12:28Z create container runpod/worker-v1-vllm:v2.11.1
2026-01-08T09:12:28Z start container for runpod/worker-v1-vllm:v2.11.1: begin
2026-01-08T09:12:44Z start container for runpod/worker-v1-vllm:v2.11.1: begin
2026-01-08T09:13:00Z start container for runpod/worker-v1-vllm:v2.11.1: begin
2026-01-08T09:13:16Z start container for runpod/worker-v1-vllm:v2.11.1: begin
2026-01-08T09:13:32Z start container for runpod/worker-v1-vllm:v2.11.1: begin
2026-01-08T09:13:48Z start container for runpod/worker-v1-vllm:v2.11.1: begin
2026-01-08T09:14:04Z start container for runpod/worker-v1-vllm:v2.11.1: begin
2026-01-08T09:14:20Z start container for runpod/worker-v1-vllm:v2.11.1: begin
2026-01-08T09:14:35Z remove container
Also the worker now just stays in unhealthy state. 🤔 If we know it is unhealthy, would it not make sense to remove it? 🤓
it does.. but unhealthy here usually means it's unhealthy because of the worker's code, though it could be from the host too
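To make the retry pattern in the log above easier to see, here is a rough sketch that counts the "start container ... begin" lines and measures the gaps between them. The log lines are copied from the post. The helper names (`parse_line`, `start_attempts`) are made up for illustration, and the script says nothing about why the container fails to start.

```python
from datetime import datetime

# Log lines copied verbatim from the post above (container start phase only).
LOG = """\
2026-01-08T09:12:28Z create container runpod/worker-v1-vllm:v2.11.1
2026-01-08T09:12:28Z start container for runpod/worker-v1-vllm:v2.11.1: begin
2026-01-08T09:12:44Z start container for runpod/worker-v1-vllm:v2.11.1: begin
2026-01-08T09:13:00Z start container for runpod/worker-v1-vllm:v2.11.1: begin
2026-01-08T09:13:16Z start container for runpod/worker-v1-vllm:v2.11.1: begin
2026-01-08T09:13:32Z start container for runpod/worker-v1-vllm:v2.11.1: begin
2026-01-08T09:13:48Z start container for runpod/worker-v1-vllm:v2.11.1: begin
2026-01-08T09:14:04Z start container for runpod/worker-v1-vllm:v2.11.1: begin
2026-01-08T09:14:20Z start container for runpod/worker-v1-vllm:v2.11.1: begin
2026-01-08T09:14:35Z remove container
"""


def parse_line(line: str) -> tuple[datetime, str]:
    """Split a log line into its UTC timestamp and the message text."""
    stamp, _, message = line.partition(" ")
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ"), message


def start_attempts(log_text: str) -> list[datetime]:
    """Return the timestamps of every 'start container ... begin' line."""
    parsed = (parse_line(line) for line in log_text.strip().splitlines())
    return [
        ts for ts, msg in parsed
        if msg.startswith("start container") and msg.endswith(": begin")
    ]


attempts = start_attempts(LOG)
# Seconds between consecutive start attempts.
gaps = [int((b - a).total_seconds()) for a, b in zip(attempts, attempts[1:])]

if __name__ == "__main__":
    print(f"{len(attempts)} start attempts, gaps in seconds: {gaps}")
```

On this log it reports 8 start attempts spaced 16 seconds apart before the container is removed. None of them is followed by a success or error line, which may be a useful detail to include in a support ticket.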
usually i think they'll kind of disable that worker for your endpoint?
hmm what are your env variables? did you check the container logs?
did not change env variables, took everything from the vllm template. just changed the container path because runpod hub seems broken.
logfiles don't tell too much. 🤓
did you use model cache?
i just specified a model from hugging face. as i understand that should be cached. but it never started even once ... both jobs in the queue for hours.