#High delay time
Stuck. Not sure why
I don't get this. Are you trying to download a Docker image inside a Docker container (RunPod)?
If so, you shouldn't, because that isn't really supported except on CPU pods
You should use templates directly on RunPod and point the template at an image in a registry, then let RunPod's system handle the Docker image download
I currently have a vLLM endpoint pointing to a model on Hugging Face, but the requests aren't going through
Okay sure
But the weird thing is this: it seems like it's still pulling, and that's shown in the logs
@cunning yarrow
Escalated To Zendesk
The thread has been escalated to Zendesk!
Maybe also create a support ticket for that
What model and vLLM settings are you using?
please
Model: deepseek-ai/DeepSeek-R1
lol
It says out of memory in the logs, so you definitely need more VRAM for that configuration
Yeah, a lot of people run into this. It can be expensive to run the large variant of that model
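To make the out-of-memory point concrete, here is a rough back-of-the-envelope sketch of how much VRAM the weights alone would need. The parameter counts are approximate public figures (about 671B for DeepSeek-R1, about 7.2B for Mistral-7B), and the estimate ignores KV cache, activations, and framework overhead, so real requirements are higher.

```python
# Rough estimate of the VRAM needed just to hold model weights.
# Parameter counts are approximate public figures, not measured values.
# KV cache, activations, and runtime overhead are NOT included.

BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}


def weight_memory_gb(num_params: float, dtype: str = "bf16") -> float:
    """Return the approximate weight footprint in GB (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


models = {
    "deepseek-ai/DeepSeek-R1": 671e9,             # ~671B total parameters (approximate)
    "mistralai/Mistral-7B-Instruct-v0.1": 7.2e9,  # ~7.2B parameters (approximate)
}

for name, params in models.items():
    print(f"{name}: ~{weight_memory_gb(params):.0f} GB of weights in bf16")
```

Even at one byte per parameter, the full DeepSeek-R1 weights come to several hundred GB, which needs a multi-GPU setup, while a 7B model in bf16 fits on a single GPU with 24 GB or so.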
I tried to change the model
To "mistralai/Mistral-7B-Instruct-v0.1"
I got the response, but it took 2 minutes. Will this be the case for every future request?
No, only if it is a cold start
aka if you don't send requests for a while and then send one
it will take some time for the worker to start up
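One way to tell cold starts apart from warm requests is to look at the timing fields on the job response. The sketch below assumes the response JSON carries `delayTime` and `executionTime` in milliseconds; the 10-second threshold and the sample payloads are made up for illustration and are not real logs.

```python
# Sketch: flag a finished job as a likely cold start from its timing fields.
# Assumption: the response JSON has "delayTime" and "executionTime" in milliseconds.

COLD_START_THRESHOLD_MS = 10_000  # hypothetical cutoff, tune for your endpoint


def summarize_timing(response: dict) -> dict:
    """Convert timing fields to seconds and guess whether the worker started cold."""
    delay_ms = response.get("delayTime", 0)
    exec_ms = response.get("executionTime", 0)
    return {
        "delay_s": delay_ms / 1000,
        "execution_s": exec_ms / 1000,
        "likely_cold_start": delay_ms >= COLD_START_THRESHOLD_MS,
    }


# Made-up example payloads for illustration only.
first_request = {"status": "COMPLETED", "delayTime": 118_000, "executionTime": 2_400}
warm_request = {"status": "COMPLETED", "delayTime": 150, "executionTime": 2_300}

print(summarize_timing(first_request))
print(summarize_timing(warm_request))
```

If the delay is large only on the first request after an idle period and the execution time stays about the same, the extra time is worker startup and model loading rather than inference.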
Hmm exit code 1?
Seems to be okay now, it's just the delay times are high
Actually, looks good now
You can use a ns to prevent model re-download for every cold start
What's an "ns"?
Network storage
Where do I enable that?
Edit endpoint
There, under network volume
You're welcome
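For anyone running a custom worker, the network volume idea can be sketched like this: point the Hugging Face cache at the volume when it is mounted, so weights downloaded once are reused on later cold starts. The `/runpod-volume` mount path and the `huggingface` subfolder are assumptions here; a prebuilt vLLM worker may already handle this itself, so check its documentation first.

```python
import os
from pathlib import Path

# Assumption: the network volume is mounted at /runpod-volume on serverless workers.
VOLUME_ROOT = Path("/runpod-volume")


def pick_model_cache(volume_root: Path = VOLUME_ROOT) -> Path:
    """Use the network volume for the model cache if it is mounted, else a local dir."""
    if volume_root.is_dir():
        return volume_root / "huggingface"  # hypothetical subfolder name
    return Path.home() / ".cache" / "huggingface"


cache_dir = pick_model_cache()
# HF_HOME is the Hugging Face cache root; set it before importing libraries that read it.
os.environ["HF_HOME"] = str(cache_dir)
print(f"Model cache directory: {cache_dir}")
```

The cold start still has to load the weights into VRAM, so this removes the re-download but not the whole delay.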