#Best way to cache models with serverless?

kind quartz

Hello,

I'm using a serverless endpoint to do image generation with flux dev. The model is 22 GB, which takes a long time to download, especially since some workers seem to download faster than others.

I've been using a network volume as a cache, which greatly improves startup time. However, doing this locks me into a particular region, which I believe makes some GPUs like the A100 very rarely available.
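For readers who want to reproduce the network-volume setup described above, here is a minimal sketch of pointing the Hugging Face cache at a mounted volume, with a local fallback when the volume is absent. The mount path `/mnt/network-volume`, the fallback path, and the helper names are assumptions for illustration, not values from this thread. `HF_HOME` is the environment variable the Hugging Face libraries read for their cache root.

```python
import os
from pathlib import Path


def pick_hf_cache(volume_root, fallback):
    """Return a cache dir on the volume if it is mounted, else under the fallback."""
    volume_root = Path(volume_root)
    base = volume_root if volume_root.is_dir() else Path(fallback)
    cache = base / "huggingface"
    cache.mkdir(parents=True, exist_ok=True)
    return cache


def configure_hf_cache(volume_root="/mnt/network-volume", fallback="/tmp/hf-cache"):
    """Set HF_HOME to the chosen cache dir (both default paths are hypothetical)."""
    cache = pick_hf_cache(volume_root, fallback)
    # Set this before importing huggingface_hub or diffusers,
    # since the cache location is read when those libraries are imported.
    os.environ["HF_HOME"] = str(cache)
    return cache
```

Calling `configure_hf_cache()` at the top of the worker's handler module, before any model library is imported, should be enough for later downloads to land on the volume.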

Is there a way to have a global Hugging Face cache with serverless endpoints? (like with pods)

Thanks

green ridge

For now it's best to bake the model into the container image. We have a model cache planned for the end of Jan to enable caching of models.
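A rough sketch of what "baking the model into the image" can look like: a small helper that a Dockerfile build step runs to download the weights into a fixed directory, so they end up in an image layer. The repo id, the target directory, and the `bake_model` helper are assumptions for illustration. `snapshot_download` is the `huggingface_hub` function for fetching a full repository.

```python
from pathlib import Path

# Illustrative placeholders, not values confirmed in this thread.
MODEL_REPO = "black-forest-labs/FLUX.1-dev"
MODEL_DIR = "/models/flux-dev"


def bake_model(repo_id=MODEL_REPO, target_dir=MODEL_DIR, token=None, download=None):
    """Download a model snapshot into target_dir and return the resulting path."""
    if download is None:
        # Imported lazily so the helper can be exercised without the library installed.
        from huggingface_hub import snapshot_download
        download = snapshot_download
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    # local_dir writes the files directly under target_dir.
    return download(repo_id=repo_id, local_dir=str(target_dir), token=token)
```

A build step would call `bake_model()` once, and the worker would then load the model from `MODEL_DIR` at startup instead of downloading it. The trade-off is a much larger image, so pull time replaces download time.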

kind quartz

Good to know! So even with a 22 GB model it's worth including it in the container image? I'll try that.

novel minnow

Does Docker recognize secrets set in the endpoint settings during the build?

I'm asking for the case where the model included in the container image is private.
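The thread does not answer this question. As a hedged sketch only: if the image is built outside the endpoint (an assumption), the build needs to receive the token separately. One general Docker mechanism is a BuildKit secret mount (`RUN --mount=type=secret,id=hf_token ...`), which exposes the secret as a file under `/run/secrets/` for that step without storing it in a layer. The secret id `hf_token` and the helper below are hypothetical. `HF_TOKEN` is the environment variable the Hugging Face libraries read for authentication.

```python
import os
from pathlib import Path


def read_hf_token(secret_path="/run/secrets/hf_token", env_var="HF_TOKEN"):
    """Return the token from a mounted secret file, else from the environment, else None."""
    path = Path(secret_path)
    if path.is_file():
        token = path.read_text().strip()
        if token:
            return token
    # Fall back to the environment, treating an empty value as "not set".
    return os.environ.get(env_var) or None
```

The returned value can be passed as the `token` argument of whatever download call runs during the build.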