Not possible to set temperature / top_p using Serverless vLLM via quick deploy? | Runpod | Page 1

fallen nebula Jun 15, 2025, 8:22 AM

#

By default, vLLM loads sampling parameters (e.g. temperature / top_p) from a model's generation_config.json if present. (see here: https://github.com/vllm-project/vllm/issues/15241). To override this you, have to pass --generation-config when starting the vLLM server.

Because RunPod's worker-vllm (https://github.com/runpod-workers/worker-vllm) doesn't expose an environment variable to pipe through a --generation-config value, does this mean it's not possible to change the temperature or top_p for any model deployed by Serverless vLLM quick deploy that has a generation_config.json file e.g. all the Meta Llama models?

And the solutions is a customer Docker image / Worker?

gilded goblet Jun 15, 2025, 10:36 AM

#

Cant you just modify it using the openai client?

zealous steppe Jun 15, 2025, 2:43 PM

#

gilded goblet Cant you just modify it using the openai client?

generation_config="vllm" is required when using openAI too. I think the official template should get inspired also by allowing the user to pass any vllm engine argument/env variable.

#

Ellroy if you want custom check this one. We allow exactly that, even dynamically per coldstart request

fallen nebula Jun 15, 2025, 4:14 PM

#

gilded goblet Cant you just modify it using the openai client?

vLLM ignores the input from the OpenAI client unless the vLLM server is started with --generation-config vllm

fallen nebula Jun 15, 2025, 4:14 PM

#

zealous steppe `generation_config="vllm"` is required when using openAI too. I think the offici...

Thanks @zealous steppe - will check it out 🙂

fallen nebula Jun 17, 2025, 1:18 PM

#

For anyone seeing this - I misunderstood how this works. Defaults are loaded from generation_config.json if present but values passed at runtime still take precedence, as you would expect – https://chatgpt.com/share/68516a00-0b28-8000-b2b6-a51b890e4be0

ChatGPT

ChatGPT - vLLM sampling params precedence

Shared via ChatGPT

#Not possible to set temperature / top_p using Serverless vLLM via quick deploy?