Hi team,
We’re currently using a serverless endpoint with the Runpod Ollama Docker image. It had been working fine until recently, but the model now takes significantly longer to load.
We tried upgrading the GPU, but that hasn’t resolved the issue. This is impacting cold start times and delaying request processing.
For reference:
Image: registry.runpod.net/svenbrnn-runpod-worker-ollama-master-dockerfile:06d78c606
Model: qwen3:8b
Additionally, while attempting the upgrade, I started a new rollout with updated GPU preferences. About 5 hours after the rollout started, the banner still shows only 60% of workers running the latest configuration; the remaining 40% have not been updated.
Could you please look into both the increased model load time and the rollout that is not completing?