#how to host 20 GB models + FastAPI code on serverless


balmy drum
#

I have 20 GB of model files and FastAPI pipeline code that performs preprocessing, inference, and training.
How can I run this on RunPod serverless?

coral sun
#

Create a serverless handler, build it into a Docker image, then make a template from that image, which contains the serverless handler and the model.
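For reference, a minimal sketch of what such a handler could look like with the RunPod Python SDK. `run_inference` is a hypothetical stand-in for the real preprocessing and model call, and the file name `handler.py` is an assumption. The request payload arrives under `job["input"]`.

```python
from typing import Any, Dict


def run_inference(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical placeholder for the real preprocessing + model call."""
    text = str(payload.get("text", ""))
    return {"length": len(text), "echo": text}


def handler(job: Dict[str, Any]) -> Dict[str, Any]:
    """Process one serverless job; the request body is under job["input"]."""
    payload = job.get("input") or {}
    if "text" not in payload:
        return {"error": "missing 'text' in input"}
    return run_inference(payload)


def main() -> None:
    """Start the worker loop. Imported lazily so the handler can be tested
    on a machine without the SDK installed."""
    import runpod  # RunPod Python SDK, assumed to be installed in the image

    runpod.serverless.start({"handler": handler})
```

In the image, the container command would then call `main()`, for example `python -c "from handler import main; main()"` if the file is saved as `handler.py`.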

#

Or put the model in RunPod's network storage, connect the storage to your endpoint, and use your template to access the model.

coral sun
#

What framework is it? Maybe HF Transformers, TensorFlow, or PyTorch?

balmy drum
#

It's PyTorch + TensorFlow

coral sun
#

Alright, then you can create a Docker image template with a serverless handler that runs that code on the GPU.

balmy drum
#

Do I have to dockerize the models together with the code?

#

The Docker image is around 50 GB

coral sun
#

Yes that works

#

But a bulkier image will slow the runtime, I think.

balmy drum
#

Yeah, that's why I thought to keep the model out of the Docker image.

coral sun
#

You can do that by putting the model in network storage and accessing it through the endpoint.

#

It will be mounted at /runpod-volume, as described in the docs.
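A rough sketch of reading the model from that mount inside the worker, loading it only once so warm requests skip the disk read. The `MODEL_VOLUME` environment variable, the helper names, and the example file path are all hypothetical. `loader` is whatever function actually deserializes the model, for example `torch.load` or `tf.keras.models.load_model`.

```python
import os
from pathlib import Path
from typing import Any, Callable, Dict

# /runpod-volume is the mount point mentioned above; MODEL_VOLUME is a
# hypothetical override that makes local testing easier.
VOLUME_ROOT = Path(os.environ.get("MODEL_VOLUME", "/runpod-volume"))

# Loaded models are cached per worker process.
_MODEL_CACHE: Dict[str, Any] = {}


def resolve_model_path(relative: str, root: Path = VOLUME_ROOT) -> Path:
    """Return the full path of a model file on the volume, or fail loudly."""
    path = root / relative
    if not path.is_file():
        raise FileNotFoundError(f"model file not found on volume: {path}")
    return path


def load_model_once(
    relative: str,
    loader: Callable[[Path], Any],
    root: Path = VOLUME_ROOT,
) -> Any:
    """Load a model the first time it is requested and reuse it afterwards."""
    key = str(root / relative)
    if key not in _MODEL_CACHE:
        _MODEL_CACHE[key] = loader(resolve_model_path(relative, root))
    return _MODEL_CACHE[key]
```

Calling something like `load_model_once("models/model.pt", torch.load)` at module import time, rather than inside the handler, would move the load cost to worker startup instead of the first request.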

balmy drum
#

Perfect, that sounds like a plan.

#

Thanks

#

Can you share links for network storage access and deployment too, please?

coral sun
#

Alright, but try using the search or the AI in the bottom right next time, it's cool.

#

Good luck building, bro.

forest siren
#

Don't use FastAPI on serverless; serverless is already an API layer.
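One way this could look in practice: the separate FastAPI routes collapse into a single handler that dispatches on a field in the job input. This is only a sketch; the `action` field, the route names, and the toy `preprocess` and `infer` functions are hypothetical stand-ins for the real pipeline steps.

```python
from typing import Any, Callable, Dict


def preprocess(data: Dict[str, Any]) -> Dict[str, Any]:
    """Toy preprocessing step: lowercase and split on whitespace."""
    return {"tokens": str(data.get("text", "")).lower().split()}


def infer(data: Dict[str, Any]) -> Dict[str, Any]:
    """Toy inference step: preprocess, then count tokens."""
    tokens = preprocess(data)["tokens"]
    return {"n_tokens": len(tokens)}


# What used to be one FastAPI route per step becomes one table entry per step.
ROUTES: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {
    "preprocess": preprocess,
    "infer": infer,
}


def handler(job: Dict[str, Any]) -> Dict[str, Any]:
    """Dispatch a job to the step named in input["action"]."""
    payload = job.get("input") or {}
    action = payload.get("action")
    route = ROUTES.get(action)
    if route is None:
        return {"error": f"unknown action: {action!r}", "allowed": sorted(ROUTES)}
    return route(payload)
```

A request body such as `{"input": {"action": "infer", "text": "Hello big World"}}` would then reach `infer` without any web framework inside the container.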

wheat crystal
#

And most likely don't put your models in your image; definitely use network drives.

wheat crystal
#

There are just several reasons to prefer smaller images. Putting the models in the image will work, but with a lot of overhead.

forest siren
#

Network drives are about 1 million percent slower than baking things into the image, so I don't know why you are saying it's better, because you are wrong.
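The speed question can be checked directly on a worker. A rough sketch that times a sequential read of a file and reports throughput; the function name is hypothetical, and it would need to be pointed at one copy of the same model file inside the image and one on the network volume.

```python
import time
from pathlib import Path


def read_throughput_mb_s(path: Path, chunk_size: int = 16 * 1024 * 1024) -> float:
    """Read the whole file sequentially and return throughput in MiB/s."""
    total_bytes = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total_bytes += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very small files.
    return total_bytes / (1024 * 1024) / max(elapsed, 1e-9)
```

Only the first read after a cold start is a fair comparison, because repeated reads may be served from the operating system's page cache.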