#Running a specific Model Revision on Serverless Worker VLLM

golden marlin
#

How do I specify the model revision on serverless? I was looking through the readme in https://github.com/runpod-workers/worker-vllm and I see I can build a docker image with the revision I want, but is that the only way to go about this?

Specifically, I wanna set up this Hugging Face model: https://huggingface.co/anthracite-org/magnum-v2-123b-exl2

edit: fixed the model link

rose quest
#

@golden marlin when you create the endpoint, you can configure environment variables. One of them is called MODEL_NAME, and it accepts any supported model from HF. So what you can do is:

MODEL_NAME - anthracite-org/magnum-v2-123b-gguf

golden marlin
#

wait my bad I posted the wrong link

rose quest
#

You can also use "Quick Deploy" when you go into "Serverless". There we have a wizard called "Serverless vLLM" that sets up the endpoint. The result is the same in the end.

golden marlin
#

it's empty without a revision

#

so it just runs nothing

rose quest
#

AHH I see, you mean you want to change to a specific branch?

golden marlin
#

yeah

#

I thought they were called revisions on hf, are they just branches like in git?

rose quest
#

As hf is also just a git provider, I would just call this a branch. I think what the model owners mean is that you can get a specific revision of their model, but they use a git branch to distribute those. (At least this is how I understand it)

golden marlin
#

that sounds correct to me yeah

#

is there a configuration option somewhere for the branch/revision?

#

I found this, but then I'd have to build the 40gb image and put it somewhere

rose quest
#

According to the vLLM docs:

Revision: The specific model version to use. It can be a branch name, a tag name, or a commit id. If unspecified, will use the default version.
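
For reference, here is a minimal sketch of how that setting maps onto vLLM's Python API, assuming the `revision` and `tokenizer_revision` keyword arguments of `vllm.LLM`. The helper `engine_kwargs` and the model and branch names are hypothetical placeholders, and the vLLM call itself is left commented out because it needs a GPU and a model download.

```python
from typing import Optional


def engine_kwargs(model: str, revision: Optional[str] = None) -> dict:
    """Build keyword arguments for vllm.LLM (hypothetical helper).

    `revision` may be a branch name, a tag name, or a commit id.
    When it is None, the repo's default version is used.
    """
    kwargs = {"model": model}
    if revision is not None:
        # Pin weights and tokenizer to the same revision so they match.
        kwargs["revision"] = revision
        kwargs["tokenizer_revision"] = revision
    return kwargs


# Placeholder names, not a real repository or branch.
kwargs = engine_kwargs("your-org/your-model", revision="your-branch")
print(kwargs)

# With vLLM installed and a GPU available, this would load the pinned revision:
# from vllm import LLM
# llm = LLM(**kwargs)
```

Pinning the tokenizer to the same revision as the weights is a choice made for this sketch; the two arguments can also be set independently.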

golden marlin
#

uh, I can't find that on the page

rose quest
golden marlin
#

ahhhhhhh, got it, nice

rose quest
#

ok so looking at the worker-vllm code, I think we just forgot to add this to the README, but it seems that setting it via an env variable also works: MODEL_REVISION

#

so if you have the time, can you please try this?

golden marlin
rose quest
#

ok thank you, then this is a bug. It should work as far as I understand it

golden marlin
#

I may have misconfigured something but I was getting this error message, so I presume the model_revision var was ignored

rose quest
#

would you mind showing me the env variables that you have configured?

golden marlin
#

just like this, right?

rose quest
#

yes, this should be totally fine

#

could you also please share the exact docker image that you used?

#

then I'm opening a bug in our repo to get this fixed

golden marlin
#

I'm just using the vanilla vllm thing

rose quest
#

ok perfect, thank you

#

then I'm afraid the only solution RIGHT NOW is to either build the image yourself OR create a copy of the repo in your own hf account and put the model revision you want on main
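
The second workaround could be sketched roughly as follows, assuming the `huggingface_hub` package (`snapshot_download`, `HfApi.create_repo`, `HfApi.upload_folder`) and a logged-in Hugging Face account. The function name `mirror_revision_to_main` and all repo, branch, and directory names are hypothetical. For a model of this size the download and upload would take a long time and a lot of disk space.

```python
def mirror_revision_to_main(source_repo: str, revision: str,
                            target_repo: str, workdir: str) -> None:
    """Copy one revision of a Hub repo onto `main` of a repo you own."""
    # Imported lazily so this module loads even without huggingface_hub.
    from huggingface_hub import HfApi, snapshot_download

    # 1. Download the files at the requested branch, tag, or commit.
    snapshot_download(repo_id=source_repo, revision=revision,
                      local_dir=workdir)

    api = HfApi()  # uses the token from your local login or HF_TOKEN
    # 2. Create the target repo under your account if it does not exist.
    api.create_repo(repo_id=target_repo, exist_ok=True)
    # 3. Upload the files; upload_folder commits to `main` by default.
    api.upload_folder(repo_id=target_repo, folder_path=workdir,
                      ignore_patterns=[".cache/*"])


# Example call with placeholder names (not run here):
# mirror_revision_to_main("some-org/some-model", "some-branch",
#                         "your-user/some-model", "./model-copy")
```

MODEL_NAME on the endpoint would then point at the copy, with no revision needed.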

golden marlin
#

oof

rose quest
#

😦

#

but I will create the bug report now and push this internally

golden marlin
#

I'll just wait for the fix, not in that big of a hurry

#

thanks for the support

rose quest
#

While creating the issue on GitHub, I also tried to find out what we have to do, and it looks like both of these env variables must be set:

  • MODEL_REVISION
  • TOKENIZER_REVISION
#

After I configured both, it was able to load the model in the desired revision
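
As a rough illustration of that configuration, the sketch below builds the set of environment variables for an endpoint. The variable names MODEL_NAME, MODEL_REVISION, and TOKENIZER_REVISION come from this thread; the helper `worker_env`, the placeholder model and branch names, and the choice to give both revision variables the same value are assumptions.

```python
def worker_env(model_name: str, revision: str) -> dict:
    """Return the env variables for a worker pinned to one revision.

    Hypothetical helper: it only builds the mapping, it does not
    talk to any API or create an endpoint.
    """
    if not revision:
        raise ValueError("revision must be a branch, tag, or commit id")
    return {
        "MODEL_NAME": model_name,
        # Setting only MODEL_REVISION was not enough in this thread,
        # so both variables are set together.
        "MODEL_REVISION": revision,
        "TOKENIZER_REVISION": revision,
    }


# Placeholder names, not a real repository or branch.
env = worker_env("your-org/your-model", "your-branch")
for key, value in env.items():
    print(f"{key}={value}")
```

The printed lines can be entered one by one in the endpoint's environment variable settings.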

golden marlin
#

okay, I see

#

so I'd basically need to set up the container on my own with the proper deps to run the model

#

fuck

#

thanks

rose quest
#

If you want to run this model with this quantization method, then you can't use it with vLLM right now