#Slow Startup & Partial Rollout Issue


gaunt forge

Hi team,

We’re using a serverless endpoint with the Runpod Ollama Docker image. It had been working fine until recently, but the model now takes significantly longer to load.

We tried upgrading the GPU, but that hasn’t resolved the issue. This is impacting cold start times and delaying request processing.

For reference:

Image: registry.runpod.net/svenbrnn-runpod-worker-ollama-master-dockerfile:06d78c606
Model: qwen3:8b

Additionally, while attempting to upgrade, I initiated a new rollout with updated GPU preferences. The rollout banner still shows that only 60% of workers are running the latest configuration, while the remaining 40% are not yet updated—even after ~5 hours since the rollout started.

Could you please help look into both the increased model load time and the rollout not completing?
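
One way to put a number on the model load time: Ollama's generate and chat responses include `total_duration` and `load_duration` fields, both in nanoseconds. The sketch below assumes the worker passes the raw Ollama response through in the job output, which this thread does not confirm. The function name `summarize_ollama_timings` is illustrative only.

```python
def summarize_ollama_timings(resp: dict) -> dict:
    """Convert Ollama's nanosecond timing fields to seconds.

    Assumes `resp` is a raw Ollama /api/generate or /api/chat response.
    Missing fields are treated as 0.
    """
    ns_per_s = 1_000_000_000
    total_ns = resp.get("total_duration", 0)
    load_ns = resp.get("load_duration", 0)
    return {
        "load_s": load_ns / ns_per_s,
        "total_s": total_ns / ns_per_s,
        # Share of the request spent loading the model (0.0 if total is unknown).
        "load_share": (load_ns / total_ns) if total_ns else 0.0,
    }


if __name__ == "__main__":
    # Made-up numbers for illustration, not measurements from this endpoint.
    sample = {"total_duration": 10_000_000_000, "load_duration": 8_000_000_000}
    print(summarize_ollama_timings(sample))
```

A `load_share` close to 1.0 on the first request after a cold start would suggest the time is going into loading the model rather than generating tokens.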

peak yacht

Can we confirm whether the delay comes from provisioning the worker, or from loading the model once the worker is up? If changing the GPU makes no difference, then the bottleneck is likely the worker pulling the image.
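
A rough way to separate the two cases is to compare the `delayTime` and `executionTime` values that Runpod serverless job status responses report (in milliseconds, as far as I know). The sketch below only processes an already-fetched status dict; the function name `split_job_latency` is hypothetical. Whether model loading counts toward `executionTime` depends on how this particular worker is written, so treat that as an assumption.

```python
def split_job_latency(status: dict) -> dict:
    """Split a job's latency into time before and during execution.

    Assumes `status` is a Runpod serverless job status payload with
    `delayTime` and `executionTime` in milliseconds. Missing fields
    are treated as 0.
    """
    delay_ms = status.get("delayTime", 0)
    exec_ms = status.get("executionTime", 0)
    return {
        "delay_s": delay_ms / 1000,
        "execution_s": exec_ms / 1000,
        # Which phase took longer; ties are reported as "delay".
        "dominant": "delay" if delay_ms >= exec_ms else "execution",
    }


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    print(split_job_latency({"delayTime": 90_000, "executionTime": 15_000}))
```

If `delay` dominates on cold requests, the time is being spent before the handler runs (queueing, provisioning, image pull). If `execution` dominates, the time is being spent inside the handler, which may include the model load.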

Also, what does GPU supply look like? It could be that the endpoint is having a hard time getting a GPU in the first place.
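
For the supply question, the endpoint's health summary can give a quick hint. This sketch assumes a payload with a `workers` object containing `idle`, `running`, and `throttled` counts. Those key names are my assumption about the health response shape, so check them against an actual response before relying on this. The function name `supply_hint` is hypothetical.

```python
def supply_hint(health: dict) -> str:
    """Give a rough read on GPU supply from a health payload.

    Assumes (unverified) that `health["workers"]` holds integer counts
    under the keys "idle", "running", and "throttled".
    """
    workers = health.get("workers", {})
    throttled = workers.get("throttled", 0)
    available = workers.get("idle", 0) + workers.get("running", 0)
    if throttled > 0 and available == 0:
        return "all workers throttled: supply is likely the bottleneck"
    if throttled > 0:
        return "some workers throttled: supply may be tight"
    return "no throttled workers: supply is probably not the issue"


if __name__ == "__main__":
    # Made-up payload for illustration only.
    print(supply_hint({"workers": {"idle": 0, "running": 1, "throttled": 2}}))
```

Throttled workers alongside a stalled rollout would fit the "hard time getting a GPU" explanation, though this is only a heuristic.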

gaunt forge