#Not able to run Unsloth Gemma 3 BnB with dynamic LoRA on vLLM?

dire hawk
#

It gives an error.

polar pastureBOT
#

Dynamic LoRA in vLLM (for example, hot-swapping adapters) is not compatible with Unsloth Gemma 3 BnB-4bit models. Attempting it results in errors, and this is a known limitation. You can switch LoRA adapters on a regular (non-quantized) base model, but not on a BnB-4bit quantized one. For dynamic LoRA, use a non-quantized model or check for future updates from the Unsloth and vLLM teams regarding support for this feature. See discussion and workarounds in the linked issues for more details.

Would you like a step-by-step explanation or code examples for alternative approaches?
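
For reference, here is a minimal sketch of per-request adapter switching on a non-quantized base model using vLLM's offline API. It assumes a recent vLLM release in which `LLM(enable_lora=True, ...)`, `LoRARequest`, and the `lora_request` argument of `generate` are available. The model ID and adapter paths are placeholders, and `Adapter`, `engine_kwargs`, and `run_with_adapters` are hypothetical helper names, not part of vLLM or Unsloth. Whether this works for a given Gemma 3 checkpoint is not confirmed in this thread.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Adapter:
    """Hypothetical container describing one LoRA adapter on disk."""
    name: str        # label used to identify the adapter
    adapter_id: int  # unique positive integer; vLLM expects IDs >= 1
    path: str        # local directory holding the adapter weights


def engine_kwargs(base_model: str, adapters: List[Adapter],
                  max_lora_rank: int = 16) -> Dict[str, object]:
    """Build keyword arguments for vllm.LLM with LoRA support enabled."""
    return {
        "model": base_model,                  # non-quantized base checkpoint
        "enable_lora": True,                  # turn on LoRA support
        "max_loras": max(1, len(adapters)),   # adapters allowed per batch
        "max_lora_rank": max_lora_rank,       # must cover the adapters' rank
    }


def run_with_adapters(base_model: str, adapters: List[Adapter],
                      prompt: str) -> Dict[str, str]:
    """Generate one completion per adapter, switching adapters per request."""
    # Imported lazily so the helpers above can be used without vLLM installed.
    from vllm import LLM, SamplingParams
    from vllm.lora.request import LoRARequest

    llm = LLM(**engine_kwargs(base_model, adapters))
    params = SamplingParams(temperature=0.0, max_tokens=64)

    results: Dict[str, str] = {}
    for adapter in adapters:
        request = LoRARequest(
            lora_name=adapter.name,
            lora_int_id=adapter.adapter_id,
            lora_path=adapter.path,
        )
        outputs = llm.generate([prompt], params, lora_request=request)
        results[adapter.name] = outputs[0].outputs[0].text
    return results


# Example call (placeholder model ID and adapter paths, needs a GPU and vLLM):
# run_with_adapters(
#     "google/gemma-3-4b-it",
#     [Adapter("support", 1, "/adapters/support"),
#      Adapter("sql", 2, "/adapters/sql")],
#     "Hello!",
# )
```

The same base model stays loaded throughout; only the `lora_request` passed with each call changes, which is what makes the adapters swappable at request time.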

dire hawk
#

So you are saying that if I use the regular Unsloth Gemma model, it will work?

#

But that model will be twice the size. Do I use on-the-fly quantization for it to work? Qwen works fine, but this one is giving issues.

main river
#

It seems Gemma 3 BnB-4bit models are not compatible with hot-swapping adapters in vLLM

#

You could open an issue for vLLM on their GitHub, or see if they offer any support in their Discord.

dire hawk
#

OK, thanks! If you don't mind, I think it would be great if you also started including vLLM steps in your notebooks, or offered your own inference engine with Docker Compose. If you can make it compatible with hot-swappable adapters, I will be the first to use it. Your Qwen BnB model works phenomenally. Right now, yes, we can do it all in Python, but this would give you easy deployment for use cases at scale. I am sure there is a lot going on, so just whenever you have space.

#

So I am thinking: for Gemma, if I use the regular model and let vLLM quantize it automatically, will it work?

main river