#how to set a max output token
11 messages · Page 1 of 1 (latest)
vllm does not support yet llama 3.1
they're working to update vllm-worker
I'm not using llama 3.1, it's the old llama 3
hmm okay max output tokens?
i think limited output tokens can be modified if you use openai pip package to request to runpod serverless
try using that
you can modify max output tokens
Are you asking how to set the Max Model Length parameter inside the vLLM worker? It is under LLM Settings.
No, this is more relevant to the context length right. I'm talking about output tokens