Hey Runpod & Community! Pleasure to be here. I'm hoping someone has seen this before and found a fix.
We've been completely blocked since April 10. Our serverless endpoint worked fine for a few weeks up until Friday. Here's everything we've confirmed:
**Setup:**
- Docker image: `runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04` (also tested with `ghcr.io/runpod-workers/worker-comfyui:latest`, same result)
- Handler calls `runpod.serverless.start({"handler": handler})`, the standard pattern
- RunPod SDK 1.9.0, all 7 fitness checks pass
- Network volume with ComfyUI + models
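
For anyone unfamiliar with the pattern, here is a stripped-down sketch of that handler shape. It is not our actual ComfyUI handler: the echo logic and the `start_worker` name are placeholders for illustration, and the only SDK call it relies on is the `runpod.serverless.start({"handler": handler})` call mentioned above.

```python
def handler(job):
    """Placeholder handler: echoes the job input back.

    The real handler would forward the input to ComfyUI; this echo is
    purely illustrative.
    """
    job_input = job.get("input", {})
    return {"echo": job_input}


def start_worker():
    """Hand control to the RunPod SDK, which then waits for jobs."""
    # Imported here so the handler can be exercised locally without the SDK.
    import runpod

    runpod.serverless.start({"handler": handler})
```

In the worker image, the entrypoint script would call `start_worker()` once ComfyUI is up.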
**What happens:**
- Container starts, ComfyUI boots (~45-70s depending on GPU)
- Handler starts, calls runpod.serverless.start()
- SDK registers, fitness checks all pass
- Health API shows `idle: 1`, `ready: 1`, and `inQueue: 1`, so a worker is available while a job sits in the queue
- Worker never receives the job
- Container gets killed, new one spins up, same cycle
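
If anyone wants to check whether they're hitting the same state, the sketch below flags the symptom described above: at least one idle worker, at least one queued job, and nothing in progress. The health URL, the Bearer auth header, and the `jobs` / `workers` response layout are my assumptions about the API rather than something confirmed in this thread, and `fetch_health` and `looks_stuck` are hypothetical helper names.

```python
import json
import urllib.request


def fetch_health(endpoint_id: str, api_key: str) -> dict:
    """Fetch the endpoint health JSON.

    Assumption: the health route lives at /v2/<endpoint_id>/health and
    accepts a Bearer token.
    """
    url = f"https://api.runpod.ai/v2/{endpoint_id}/health"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def looks_stuck(health: dict) -> bool:
    """Return True when a worker is idle, a job is queued, and none is running.

    Assumption: counts are nested under "workers" and "jobs"; missing keys
    are treated as zero.
    """
    jobs = health.get("jobs", {})
    workers = health.get("workers", {})
    return (
        workers.get("idle", 0) >= 1
        and jobs.get("inQueue", 0) >= 1
        and jobs.get("inProgress", 0) == 0
    )
```

Polling `looks_stuck(fetch_health(endpoint_id, api_key))` every few seconds after submitting a job should show whether the queued job ever moves to in-progress before the container is recycled.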