# Flux.1 Schnell Serverless Speeds


fathom dust

What sort of speeds are people getting with their Flux.1 Schnell models using Serverless on RunPod? I'm currently hitting 30 seconds for 4 images, and a significant share of that (~15 seconds) is spent moving the model to CUDA. Is there any way to speed this up? (48GB GPU Pro)
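Before optimizing, it can help to see which of those seconds are paid on every request and which only need to be paid once. A minimal sketch, assuming a diffusers `FluxPipeline`: a small context manager (`stage` is a hypothetical helper, not part of any library) that accumulates wall-clock time per named stage. The commented usage splits loading from disk, the `.to("cuda")` copy, and inference. Because CUDA work is queued asynchronously, `torch.cuda.synchronize()` is called inside the GPU stages so the timer measures finished work.

```python
import time
from contextlib import contextmanager


@contextmanager
def stage(timings, name):
    """Add the wall-clock seconds spent inside the `with` block to timings[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises, so a failed load still shows up.
        timings[name] = timings.get(name, 0.0) + time.perf_counter() - start


# Hypothetical usage inside a worker (not executed here):
#
#   timings = {}
#   with stage(timings, "from_pretrained"):   # disk -> CPU RAM
#       pipe = FluxPipeline.from_pretrained(
#           "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16)
#   with stage(timings, "to_cuda"):            # CPU RAM -> GPU memory
#       pipe.to("cuda")
#       torch.cuda.synchronize()
#   with stage(timings, "inference"):
#       images = pipe(prompt, num_images_per_prompt=4,
#                     num_inference_steps=4, guidance_scale=0.0).images
#       torch.cuda.synchronize()
#   print(timings)
```

If `from_pretrained` and `to_cuda` show up in the timings of every request, the model is being reloaded per job rather than kept resident between jobs.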

simple shoal

You can load the model at startup instead of when responding to a request. That way it doesn't have to be reloaded from disk (and copied to the GPU) on every request. This is NOT how most ComfyUI deployments are configured. To do this, you may have to use a diffusers pipeline or similar instead, so the model stays cached in memory between requests.
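A minimal sketch of the load-once pattern, assuming a diffusers `FluxPipeline` and the public Hugging Face repo id `black-forest-labs/FLUX.1-schnell` (swap in a local path if the weights live on a volume). `FluxWorker`, `load_flux_pipeline`, `to_base64_png`, and the `num_images` input field are hypothetical names chosen for illustration. The expensive `from_pretrained` + `.to("cuda")` happens once when the worker object is built; the handler only runs inference on the already-resident pipeline. Heavy imports sit inside the loader so the structure can be exercised without a GPU.

```python
import base64
import io
import time

# Assumption: public Hugging Face repo id; replace with a local path if needed.
MODEL_ID = "black-forest-labs/FLUX.1-schnell"


def load_flux_pipeline():
    """Slow, one-off work: read weights from disk, then copy them to the GPU."""
    import torch  # imported lazily so this file can be tested without a GPU
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    pipe.to("cuda")  # the expensive copy; pay it once per worker, not per job
    return pipe


def to_base64_png(image):
    """Encode a PIL-style image (anything with .save(fp, format=...)) as base64 PNG."""
    buf = io.BytesIO()
    image.save(buf, format="PNG")
    return base64.b64encode(buf.getvalue()).decode("ascii")


class FluxWorker:
    """Holds a pipeline that is already on the GPU and reuses it for every job."""

    def __init__(self, loader=load_flux_pipeline):
        start = time.perf_counter()
        self.pipe = loader()  # runs once, when the worker is constructed
        self.load_seconds = time.perf_counter() - start

    def handler(self, job):
        """RunPod-style handler: job["input"] carries the request payload."""
        params = job["input"]
        result = self.pipe(
            params["prompt"],
            num_images_per_prompt=params.get("num_images", 4),
            num_inference_steps=4,    # settings from the diffusers Schnell example
            guidance_scale=0.0,
            max_sequence_length=256,
        )
        return {"images": [to_base64_png(img) for img in result.images]}


# In the real worker script, build the worker at module scope so the load runs
# once per cold start, before any request arrives (not executed here):
#
#   import runpod
#   worker = FluxWorker()
#   runpod.serverless.start({"handler": worker.handler})
```

Passing the loader in as a parameter keeps the startup cost in one place and lets the handler logic be tested with a stand-in pipeline. Warm requests then only pay for inference and image encoding; only a cold start pays for the load and the GPU copy.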

fathom dust

I currently have a diffusers-only pipeline, so that would work. However, how do I load it on startup? Do I load it before it hits the inference function? The `.to('cuda')` part seems to be where I'm coming unstuck.