mossy bramble Jan 7, 2024, 3:21 PM

#

Hello RunPod Team, I'm considering your platform for deploying an AI model and have some questions.
My project involves using LocalAI (https://localai.io/ https://github.com/mudler/LocalAI), and it's crucial for the deployed model to support JSON formatted responses, this is the main reason I chose localai.

Could you guide me on how to set up this functionality on your platform?
Is there a feature on RunPod that allows the server or the LLM model to automatically shut down or enter a low-resource state if it doesn't receive requests for a certain period, say 15 minutes? This is to optimize costs when the model is not in use.

Thank you!

LocalAI :: LocalAI documentation

Documentation for LocalAI

little gust Jan 7, 2024, 9:08 PM

#

What u are looking for is the runpod serverless. Can read their documentation, but the tldr is can use a runpod official template as a base, then build on it to have ur own handler.py.

U must be able to build a docker image. Build whatever model you want into the docker image so it isnt constantly downloaded at runtime

#

https://github.com/justinwlin/runpodWhisperx/blob/master/Dockerfile

This one isnt using a runpod as a base but can get the idea

GitHub

runpodWhisperx/Dockerfile at master · justinwlin/runpodWhisperx

Runpod WhisperX Docker Container Repo. Contribute to justinwlin/runpodWhisperx development by creating an account on GitHub.

#

this is me doing another one dividing it into two. one is for a gpu persistent service runpod has so i can debug with the baked in model. the other is for me using it for serverless

#

gpu pod:

Use the updated base CUDA image

FROM runpod/pytorch:2.1.1-py3.10-cuda12.1.1-devel-ubuntu22.04

WORKDIR /app

Best practices for minimizing layer size and avoiding cache issues

RUN apt-get update &&
apt-get install -y --no-install-recommends ffmpeg &&
rm -rf /var/lib/apt/lists/* &&
pip install --no-cache-dir torch==2.1.2 torchvision torchaudio xformers audiocraft firebase-rest-api==1.11.0 noisereduce==3.0.0 runpod

COPY preloadModel.py /app/preloadModel.py
COPY handler.py /app/handler.py
COPY firebase_credentials.json /app/firebase_credentials.json
COPY suprepo /app/suprepo

RUN python /app/preloadModel.py

#

Then this is the serverless one:

Use the updated base CUDA image

FROM justinwlin/audiocraft_runpod_gpu:1.0

WORKDIR /app
COPY handler.py /app/handler.py

Set Stop signal and CMD

STOPSIGNAL SIGINT
CMD ["python", "-u", "handler.py"]

#

If u want to, u can build and test ur docker image LOCALLY before ever purchasing runpod credit to make sure ur template works as expected

#

runpod has a test locally section in docs

mossy bramble Jan 7, 2024, 10:44 PM

#

Oh I think I get it, I need to build a docker image which will run the API, it should be builded with a model I choose, and then the handler will simply make calls to the API.

I was thinking to use localai for that, because it has built-in support with enforcing grammer(json format), maybe you can advise me, should I use localai or a different tool you know?

Thanks 🙂

little gust Jan 7, 2024, 11:21 PM

#

Rlly depends what u wanna do

#

if u have a specific model usually they have instructions how to run it

#

@mossy bramble So my recommendation is if u want:

deposit 10 bucks on runpod if u want to risk using it (or test locally if u can)
Use a gpu pod, and start up a pytorch template, or use ur own locally again
record the steps u need to get ur code running. And then build ur stuff from that

little gust Jan 7, 2024, 11:23 PM

#

little gust gpu pod: # Use the updated base CUDA image FROM runpod/pytorch:2.1.1-py3.10-cud...

that is how i came up with this audiocraft one

#

by using a runpod base image on their website, going on the web terminal / jupyter lab and playing around with it

#

(make sure to terminate pod when done, or else ull be charged for running pods)

#

again all this can be done locally as long as ur computer / model / code supports it. i cannot say tho bc idk what ur doing / i prob dont have specific knowledge

#

as i just use runpod for my own personal projects

#LocalAI Deployment

Use the updated base CUDA image

Best practices for minimizing layer size and avoiding cache issues

Use the updated base CUDA image

Set Stop signal and CMD