#Not possible to set temperature / top_p using Serverless vLLM via quick deploy?

7 messages · Page 1 of 1 (latest)

fallen nebula
#

By default, vLLM loads sampling parameters (e.g. temperature / top_p) from a model's generation_config.json if present. (see here: https://github.com/vllm-project/vllm/issues/15241). To override this you, have to pass --generation-config when starting the vLLM server.

Because RunPod's worker-vllm (https://github.com/runpod-workers/worker-vllm) doesn't expose an environment variable to pipe through a --generation-config value, does this mean it's not possible to change the temperature or top_p for any model deployed by Serverless vLLM quick deploy that has a generation_config.json file e.g. all the Meta Llama models?

And the solutions is a customer Docker image / Worker?

gilded goblet
#

Cant you just modify it using the openai client?

zealous steppe
#

Ellroy if you want custom check this one. We allow exactly that, even dynamically per coldstart request

fallen nebula
fallen nebula
fallen nebula