#Flux.1 Schnell Serverless Speeds
You can load the model at startup instead of while responding to a request. That way it doesn't have to be reloaded from disk each time. This is NOT how most ComfyUI deployments are configured, so to do this you may have to use a diffusers pipeline or similar instead and keep the model cached in memory.
I currently have a diffusers-only pipeline, so that would work. However, how do I load it on startup? Do I load it before it hits the inference function? The .to('cuda') part seems to be where I'm coming unstuck.
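One way this is commonly done, as a rough sketch rather than a definitive setup: build the pipeline once per worker process, move it to the GPU at that point, and have the inference function reuse the cached object. The names `load_pipeline`, `get_pipeline`, and `handler`, the `event["prompt"]` shape, and the `black-forest-labs/FLUX.1-schnell` checkpoint are assumptions for illustration, not something stated in this thread. It also assumes a recent `diffusers` release that ships `FluxPipeline`.

```python
_PIPELINE = None  # module-level cache: one pipeline per worker process


def load_pipeline():
    """Slow path: read the weights from disk and move them to the GPU."""
    # Imported here so the module itself can be imported on a machine without a GPU.
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-schnell",  # assumed checkpoint
        torch_dtype=torch.bfloat16,
    )
    pipe.to("cuda")  # done once here, not inside the request handler
    return pipe


def get_pipeline(loader=load_pipeline):
    """Return the cached pipeline, calling the loader only on first use."""
    global _PIPELINE
    if _PIPELINE is None:
        _PIPELINE = loader()
    return _PIPELINE


def handler(event):
    """Hypothetical inference function; the event shape is an assumption."""
    pipe = get_pipeline()  # already loaded if startup warmed the cache
    result = pipe(
        event["prompt"],
        num_inference_steps=4,  # schnell is intended for very few steps
        guidance_scale=0.0,
    )
    return result.images[0]
```

To pay the loading cost at startup, the worker's entry point would call `get_pipeline()` once before it starts accepting requests. Every later call to `handler` then skips both the disk read and the `.to("cuda")` transfer. This only helps while the same process stays alive between requests, so how much it saves depends on how long the serverless platform keeps workers warm.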