# Guide to deploying Llama 405B on Serverless?


random forge
Hi, can any Serverless experts advise on how to deploy Llama 405B on Serverless?

ripe field
@random forge - you need to attach a network volume to the endpoint. The volume should have at least 1 TB of space to hold the 405B model (unless you are using a quantized model). Then increase the number of workers to match the model's GPU requirement (e.g. ten 48 GB GPUs).
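
For a rough sense of those sizes: the weights alone take about parameter count × bytes per parameter. A minimal sketch of that arithmetic, assuming a nominal 405e9 parameters and ignoring KV cache, activations and runtime overhead (so real GPU needs are higher). The function names are just for illustration.

```python
import math

# Assumption: nominal parameter count; real checkpoints differ slightly.
PARAMS = 405e9

# Bytes needed to store one parameter in common formats
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_size_gb(params: float, dtype: str) -> float:
    """Approximate size of the weights alone, in GB (10**9 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


def gpus_needed(params: float, dtype: str, gpu_mem_gb: float) -> int:
    """Fewest GPUs whose combined memory holds the weights alone.

    KV cache, activations and CUDA overhead are ignored, so treat this
    as a lower bound rather than a deployment plan.
    """
    return math.ceil(weight_size_gb(params, dtype) / gpu_mem_gb)


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        size = weight_size_gb(PARAMS, dtype)
        n = gpus_needed(PARAMS, dtype, gpu_mem_gb=48)
        print(f"{dtype}: ~{size:.0f} GB of weights, at least {n} x 48 GB GPUs")
```

By this estimate the BF16 weights (~810 GB) fit on a 1 TB volume, while ten 48 GB GPUs (480 GB in total) would cover FP8 weights (~405 GB) but not BF16 ones.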

I tried several 405B models on HF but get an error related to rope_scaling. Looks like we need to change it to null and try again. To do this I'd need to download all the files and upload them again.
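
If you do try the null workaround, only config.json has to change; the weight files stay the same. A minimal sketch that edits config.json in a local copy of the model (for example on the network volume) and keeps a backup. The function name is hypothetical. This is the workaround suggested here, not a fix: with rope_scaling set to null the model likely loses Llama 3.1's long-context RoPE scaling, so a vLLM version that understands `rope_type: "llama3"` is the better route.

```python
import json
from pathlib import Path


def null_rope_scaling(model_dir: str):
    """Set rope_scaling to null in <model_dir>/config.json, keeping a backup.

    Returns the original rope_scaling value so it can be restored later.
    """
    config_path = Path(model_dir) / "config.json"
    raw = config_path.read_text()
    config = json.loads(raw)

    # Keep an untouched copy next to the original (only on the first run)
    backup_path = config_path.with_name("config.json.bak")
    if not backup_path.exists():
        backup_path.write_text(raw)

    original = config.get("rope_scaling")
    config["rope_scaling"] = None  # serialised as JSON null
    config_path.write_text(json.dumps(config, indent=2))
    return original
```

Usage would be something like `null_rope_scaling("/runpod-volume/Meta-Llama-3.1-405B-Instruct")`, where the path is a placeholder for wherever the model files live.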

soft meadow
Does the vLLM worker support this yet?

ripe field
@soft meadow not sure about this. Is there a document or page that lists which models vLLM supports?

soft meadow
On vLLM's docs, not RunPod's.

Look at the right version, though; maybe the worker's current vLLM is outdated.

soft meadow
Yes, that one is for the latest version.

Hope vllm-worker is on the latest now.

ripe field
Looks like it's supported:

Architecture: LlamaForCausalLM

Models: Llama 3.1, Llama 3, Llama 2, LLaMA, Yi

Example HF models: meta-llama/Meta-Llama-3.1-405B-Instruct, meta-llama/Meta-Llama-3.1-70B, meta-llama/Meta-Llama-3-70B-Instruct, meta-llama/Llama-2-70b-hf, 01-ai/Yi-34B, etc.

soft meadow
Check which vLLM version the current vllm-worker uses.

I think last time it hadn't been updated yet.

ripe field
I am using runpod/worker-vllm:stable-cuda12.1.0

Since I am using Serverless, I am unable to run any commands.
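
Without a shell, one option is to print the version from Python when the worker starts (for example at the top of a custom handler), or to pull the same image locally and check there. A minimal sketch; the minimum version below is an assumption (my recollection of when vLLM added Llama 3.1 support), so verify it against the vLLM release notes.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Assumption: 0.5.3 is the first vLLM release that accepts Llama 3.1's
# "llama3" rope_scaling. Verify against the vLLM release notes.
MIN_VLLM_FOR_LLAMA31 = (0, 5, 3)


def parse_release(ver: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from e.g. '0.5.3.post1' or '0.4.0+cu121'.

    Anything after the patch number (post/rc/dev/local tags) is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", ver)
    if match is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    major, minor, patch = (int(x) for x in match.groups())
    return (major, minor, patch)


def installed_vllm_version() -> Optional[str]:
    """Version of the installed vllm package, or None if it is not installed."""
    try:
        return version("vllm")
    except PackageNotFoundError:
        return None


def supports_llama31(ver: str) -> bool:
    """True if the release is at least the assumed minimum above."""
    return parse_release(ver) >= MIN_VLLM_FOR_LLAMA31


if __name__ == "__main__":
    # Printed at worker start-up, this should show up in the endpoint's logs.
    installed = installed_vllm_version()
    if installed is None:
        print("vllm is not installed in this environment")
    else:
        print(f"vllm {installed}; Llama 3.1 (assumed minimum): {supports_llama31(installed)}")
```

Comparing tuples rather than raw strings keeps `0.10.0` correctly above `0.5.3`.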

soft meadow
And is it working with Llama 3.1 now?

Yeah, of course. Let me check the repo, one sec.

ripe field
No, I get an error related to rope_scaling.

Llama 3.1's config.json has lots of params under rope_scaling.

soft meadow
rope_scaling, huh. I think you can't set that either with the current vllm-worker version.

ripe field
But the current vLLM accepts only two params.

soft meadow
ripe field
2024-07-24T04:42:22.063990694Z engine.py :110 2024-07-24 04:42:22,063 Error initializing vLLM engine: rope_scaling must be a dictionary with two fields, type and factor, got {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}
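
The error spells out the mismatch: the older check wants rope_scaling to be a dict with exactly two fields, `type` and `factor`, while Llama 3.1's config.json uses `rope_type` instead of `type` and adds three more fields. A small sketch that approximates the rule as stated in the message (not vLLM's own code), using the dict from the log. It assumes, as suggested earlier in the thread, that a null rope_scaling gets past the check.

```python
from typing import Optional

# rope_scaling as reported in the error log (from Llama 3.1's config.json)
LLAMA31_ROPE_SCALING = {
    "factor": 8.0,
    "low_freq_factor": 1.0,
    "high_freq_factor": 4.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3",
}

# The two fields the older check insists on, per the error message
LEGACY_KEYS = {"type", "factor"}


def legacy_rope_scaling_ok(rope_scaling: Optional[dict]) -> bool:
    """Approximate the old rule: no scaling at all, or exactly `type` and `factor`.

    Assumption: a null rope_scaling is accepted (treated as "no scaling").
    """
    if rope_scaling is None:
        return True
    return isinstance(rope_scaling, dict) and set(rope_scaling) == LEGACY_KEYS


def unexpected_keys(rope_scaling: dict) -> set:
    """Fields the legacy two-field format does not know about."""
    return set(rope_scaling) - LEGACY_KEYS


if __name__ == "__main__":
    print(legacy_rope_scaling_ok(LLAMA31_ROPE_SCALING))               # False: Llama 3.1 format
    print(legacy_rope_scaling_ok({"type": "linear", "factor": 2.0}))  # True: old-style format
    print(legacy_rope_scaling_ok(None))                               # True: the null workaround
    print(sorted(unexpected_keys(LLAMA31_ROPE_SCALING)))
```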

soft meadow
That's the matching docs for the current vllm-worker.

ripe field
OK, got it. 405B is not in there.

soft meadow
Seems like it only just got added in the newest version, yeah.

So it's only in the newest version, and we have to wait until vllm-worker updates to the latest or a stable version of vLLM.

ripe field
OK. Is it done automatically, or should we raise a ticket or something?

soft meadow
Yeah, about that: we just wait until RunPod's staff update it.

They say they're working on it, don't worry.

I'm also waiting for it 🙂

ripe field
Great, thank you very much for your time.

🙂

slender echo
soft meadow
Ah, Ollama, interesting.

Thanks for sharing, I'll look at that too haha

slender echo
I will also test this later today with 70B and 405B.

ripe field
@soft meadow I'd like to know if you got any news on the vLLM update

for 405B.

soft meadow
No, not yet, I don't know.

They're still working on it.

ripe field
@slender echo please let me know if the Ollama worker works with 405B.

sleek olive
RunPod Blog

Meta’s recent release of the Llama 3.1 405B model has made waves in the AI community. This groundbreaking open-source model not only matches but even surpasses the performance of leading closed-source models. With impressive scores on reasoning tasks (96.9 on ARC Challenge and 96.8 on GSM8K)
