#Serverless: nightly workers stuck “initializing” on SOME endpoints (delay time spikes)


serene sand
#

I’ve been seeing a very consistent Serverless scaling issue for roughly the last week, but IMPORTANT: this does NOT happen to all my endpoints, only to some of them. Other endpoints remain stable during the same time window.

Example affected endpoint ID: owufvdufc1h5x2

Time window (daily):

  • 18:00–23:00 UTC

Symptoms (only on some endpoints, during that window):

  • The endpoint often can’t maintain the required number of workers (sometimes it drops to 0)
  • Workers remain in “initializing” for a long time
  • “Delay time” jumps from ~1s to up to ~30 minutes at peak

Outside this window, everything works normally.

My worker is packaged in Docker and pulled from Docker Hub. I know “initializing” can include downloading the Docker image to a new worker, and requests arriving during initialization will be delayed.

Question: Can I find out the root cause of these nightly failures on only some endpoints? Any recommended fixes/settings (min workers, caching models, scaling config), and is there any specific log/request history I should collect from the Console Workers tab to help you investigate?
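To make the window easier to pin down, the worker counts could be tallied per UTC hour. Below is a minimal sketch, assuming snapshots of the ready/initializing counts are noted by hand from the Console Workers tab. The sample values and the function name `hours_without_ready_workers` are hypothetical, not real logs or a platform API:

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical snapshots: (ISO-8601 timestamp, ready workers, initializing workers).
# These values are made up for illustration; real ones would be noted by hand
# from the Console Workers tab or recorded by whatever monitoring is available.
SAMPLES = [
    ("2024-01-10T16:00:00+00:00", 3, 0),
    ("2024-01-10T17:30:00+00:00", 3, 0),
    ("2024-01-10T18:15:00+00:00", 1, 4),
    ("2024-01-10T19:40:00+00:00", 0, 9),
    ("2024-01-10T22:10:00+00:00", 0, 9),
    ("2024-01-10T23:30:00+00:00", 2, 1),
]


def hours_without_ready_workers(samples):
    """Return the sorted UTC hours in which every snapshot showed 0 ready workers."""
    ready_by_hour = defaultdict(list)
    for stamp, ready, _initializing in samples:
        # Normalize to UTC so snapshots taken in other time zones line up.
        hour = datetime.fromisoformat(stamp).astimezone(timezone.utc).hour
        ready_by_hour[hour].append(ready)
    return sorted(h for h, counts in ready_by_hour.items() if max(counts) == 0)


print(hours_without_ready_workers(SAMPLES))
```

Running the same tally for an affected endpoint and a stable one over a few nights would give support a concrete side-by-side of when each endpoint loses its ready workers.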

winter thicket
#

oh hm, initializing for a long time? could be the download is slow? more active workers might help with this, but it might be a bit tricky to work out the efficient number

#

but the good thing is your endpoint stays alive and jobs are still getting processed before those times

#

but totally open a ticket and let support staff know, I'm curious what's causing this

serene sand
# winter thicket oh hm initializing for long time? could be the download is slow?, more active wo...

Yeah, I think this isn’t just “slow download” - it looks like the workers are unable to complete initialization at all.
Also important: this does NOT happen on all of my endpoints - only on some. Other endpoints remain stable during the same nightly window.
In my case it’s not just one worker: during the problematic window ALL workers get stuck in “initializing” for hours, and the endpoint can drop to 0 ready workers. My config is max 9 workers, but “active/min workers”.

serene sand
#

I have all the regions selected, and the GPU type is H100 only. Right now, for example, I only have 1 worker left, but the waiting time is still low. I think in a couple of hours all workers will switch to the status "initializing" or "throttled".

winter thicket
#

oh ic