# Can't get vLLM v2.11.3 serverless to work

meager thicket
#

I've tried a variety of models and keep getting the same error, for example with Qwen/Qwen2.5-1.5B-Instruct and Llama-3.2-3B-Instruct:

AttributeError: TokenizersBackend has no attribute all_special_tokens_extended. Did you mean: 'num_special_tokens_to_add'?

or AttributeError: Qwen2Tokenizer has no attribute all_special_tokens_extended. Did you mean: 'num_special_tokens_to_add'?

mystic tapir
#

Looks like a transformers/tokenizers version mismatch with vLLM 2.11.3. Try upgrading transformers and tokenizers in your serverless image or pinning to the versions in the vLLM compatibility matrix.
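
To confirm a mismatch like this, it can help to print which versions are actually installed inside the image. The snippet below is a minimal sketch using only the Python standard library; the helper name `installed_versions` is made up for illustration, and it only reports versions, it does not know which combinations are compatible.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # Package names as published on PyPI.
    report = installed_versions(["vllm", "transformers", "tokenizers"])
    for name, ver in report.items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this inside the worker image and comparing the output against the versions your vLLM release expects should show whether transformers or tokenizers has drifted.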

wide lily
#

Yep, that's the issue. vLLM 2.11.0 was able to deploy models without a problem, but it seems even that version has some problems now.

muted cliff
limber geyser
#

@bronze fiber

empty nebula
#

I have the same problem and posted about it earlier. It caused an error loop that sent my endpoint workers into "initializing" mode in the middle of the night, and it cost me $50 by the time I woke up and stopped it.

limber geyser
bronze fiber
#

We have an update for worker-vllm; it will be released today.

wicked jetty
bronze fiber
#

Yes, and sorry for the delay. We made some more changes to it and are just awaiting a review.

bronze fiber
wicked jetty
#

I'm trying Qwen/Qwen3-30B-A3B-Instruct-2507 and have been stuck in throttled/initializing for a long time. Not sure what is going on or whether it'll work.

#

For what it's worth, the main reason I ask for examples of HF models that are known to work is that it takes forever to test on our own. I've been waiting 15 minutes to see if the model will even load. There are no logs from vLLM; the worker logs just say "image ready, initializing model files" or similar. It's impossible to know what is going on.

#

But I have to wait and see what happens so I don't end up with a surprise bill because of some infinite loop.
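
One way to avoid babysitting a stuck endpoint is a small watchdog that polls for readiness and shuts things down after a deadline. The sketch below is only an illustration: `get_status` and `stop_endpoint` are hypothetical callables you would have to write against whatever API or CLI you use, and the `"ready"` status string is an assumption, not a documented value.

```python
import time


def wait_until_ready(get_status, stop_endpoint, timeout_s=900.0, poll_s=15.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it returns "ready".

    If timeout_s elapses first, call stop_endpoint() once and give up.
    Returns True when ready, False when the deadline passed.
    get_status and stop_endpoint are hypothetical, caller-supplied callables.
    """
    deadline = clock() + timeout_s
    while True:
        if get_status() == "ready":  # assumed status value
            return True
        if clock() >= deadline:
            # Deadline reached: stop the endpoint so it cannot keep billing.
            stop_endpoint()
            return False
        sleep(poll_s)
```

With the defaults this gives up after 15 minutes, roughly the wait described above. The `clock` and `sleep` parameters exist only so the loop can be exercised without real waiting.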