#LocalAI Deployment

1 messages · Page 1 of 1 (latest)

mossy bramble
#

Hello RunPod Team, I'm considering your platform for deploying an AI model and have some questions.
My project involves using LocalAI (https://localai.io/ https://github.com/mudler/LocalAI), and it's crucial for the deployed model to support JSON formatted responses, this is the main reason I chose localai.

Could you guide me on how to set up this functionality on your platform?
Is there a feature on RunPod that allows the server or the LLM model to automatically shut down or enter a low-resource state if it doesn't receive requests for a certain period, say 15 minutes? This is to optimize costs when the model is not in use.

Thank you!

little gust
#

What u are looking for is the runpod serverless. Can read their documentation, but the tldr is can use a runpod official template as a base, then build on it to have ur own handler.py.

U must be able to build a docker image. Build whatever model you want into the docker image so it isnt constantly downloaded at runtime

#

this is me doing another one dividing it into two. one is for a gpu persistent service runpod has so i can debug with the baked in model. the other is for me using it for serverless

#

gpu pod:

Use the updated base CUDA image

FROM runpod/pytorch:2.1.1-py3.10-cuda12.1.1-devel-ubuntu22.04

WORKDIR /app

Best practices for minimizing layer size and avoiding cache issues

RUN apt-get update &&
apt-get install -y --no-install-recommends ffmpeg &&
rm -rf /var/lib/apt/lists/* &&
pip install --no-cache-dir torch==2.1.2 torchvision torchaudio xformers audiocraft firebase-rest-api==1.11.0 noisereduce==3.0.0 runpod

COPY preloadModel.py /app/preloadModel.py
COPY handler.py /app/handler.py
COPY firebase_credentials.json /app/firebase_credentials.json
COPY suprepo /app/suprepo

RUN python /app/preloadModel.py

#

Then this is the serverless one:

Use the updated base CUDA image

FROM justinwlin/audiocraft_runpod_gpu:1.0

WORKDIR /app
COPY handler.py /app/handler.py

Set Stop signal and CMD

STOPSIGNAL SIGINT
CMD ["python", "-u", "handler.py"]

#

If u want to, u can build and test ur docker image LOCALLY before ever purchasing runpod credit to make sure ur template works as expected

#

runpod has a test locally section in docs

mossy bramble
#

Oh I think I get it, I need to build a docker image which will run the API, it should be builded with a model I choose, and then the handler will simply make calls to the API.

I was thinking to use localai for that, because it has built-in support with enforcing grammer(json format), maybe you can advise me, should I use localai or a different tool you know?

Thanks 🙂

little gust
#

Rlly depends what u wanna do

#

if u have a specific model usually they have instructions how to run it

#

@mossy bramble So my recommendation is if u want:

  1. deposit 10 bucks on runpod if u want to risk using it (or test locally if u can)

  2. Use a gpu pod, and start up a pytorch template, or use ur own locally again

  3. record the steps u need to get ur code running. And then build ur stuff from that

little gust
#

by using a runpod base image on their website, going on the web terminal / jupyter lab and playing around with it

#

(make sure to terminate pod when done, or else ull be charged for running pods)

#

again all this can be done locally as long as ur computer / model / code supports it. i cannot say tho bc idk what ur doing / i prob dont have specific knowledge

#

as i just use runpod for my own personal projects