#Not able to run Unsloth Gemma 3 BnB with dynamic LoRA on vLLM?
Dynamic LoRA with vLLM is not compatible with Unsloth Gemma 3 BnB-4bit models. Attempting to use dynamic LoRA (e.g., hot-swapping adapters) on these models in vLLM results in errors, and this is a known limitation. You can switch LoRA adapters on a regular (non-quantized) base model, but not with BnB-4bit quantized models. For dynamic LoRA, use non-quantized models or check for future updates from Unsloth and vLLM teams regarding support for this feature. See discussion and workarounds in the linked issues for more details.
Would you like a step-by-step explanation or code examples for alternative approaches?
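For reference, here is a minimal sketch of the non-quantized route described above: one unquantized base model loaded once in vLLM, with a different LoRA adapter selected per request. The helper names `build_lora_engine_kwargs` and `generate_with_adapters` are hypothetical, as are any model ID and adapter paths you pass in. Whether a given Gemma 3 checkpoint loads cleanly depends on your vLLM version, so treat this as a starting point rather than a confirmed recipe.

```python
def build_lora_engine_kwargs(base_model, max_loras=2, max_lora_rank=16):
    """Keyword arguments for vllm.LLM with LoRA enabled on an unquantized base.

    Hypothetical helper: it only collects the options into a dict.
    """
    return {
        "model": base_model,             # unquantized base model ID or local path
        "enable_lora": True,             # allow a LoRA adapter per request
        "max_loras": max_loras,          # adapters that may be active in one batch
        "max_lora_rank": max_lora_rank,  # must be >= the rank of your adapters
    }


def generate_with_adapters(base_model, adapters, prompt):
    """Run one prompt through each adapter in turn on a single loaded base model.

    `adapters` maps an adapter name to a local directory holding LoRA weights.
    """
    # Imported lazily so the helper above can be used without vLLM installed.
    from vllm import LLM, SamplingParams
    from vllm.lora.request import LoRARequest

    llm = LLM(**build_lora_engine_kwargs(base_model))
    params = SamplingParams(temperature=0.0, max_tokens=64)

    outputs = {}
    for lora_id, (name, path) in enumerate(adapters.items(), start=1):
        # Each adapter needs a unique positive integer ID.
        request = LoRARequest(name, lora_id, path)
        result = llm.generate([prompt], params, lora_request=request)
        outputs[name] = result[0].outputs[0].text
    return outputs
```

Calling `generate_with_adapters(base_model, {"adapter_a": "/path/to/a", "adapter_b": "/path/to/b"}, "Hello")` with your own model ID and adapter directories would load the base model once and switch adapters between requests.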
so you are saying if I use the regular Unsloth Gemma model it will work?
but that model will be twice the size. Do I use on-the-fly quantization for it to work? Qwen works fine but this one is giving issues.
It seems Gemma 3 BnB-4bit models are not compatible with hot-swapping adapters in vLLM
you could open an issue for vLLM on their GitHub, or see if they have any support in their Discord
ok thanks! If you don't mind, I think it would be great if you also started adding vLLM steps to your notebooks, or shipped your own inference engine with Docker Compose. If you can make it support hot-swappable adapters, I will be the first to use it. Your Qwen BnB model works phenomenally. Right now, yes, we can do it all in Python, but this would give you easy deployment for use cases at scale. I am sure there is a lot going on, so just whenever you have space.
So I am thinking for Gemma: if I use the regular model and let vLLM quantize it automatically, will it work?
it's worth trying
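If you want to test it, a rough sketch of the experiment could look like this: load the regular (unquantized) checkpoint, ask vLLM to quantize it with bitsandbytes at load time, and keep LoRA enabled. Nothing in this thread confirms that the combination works for Gemma 3, so the code only reports whether the engine starts. The helper names are hypothetical, and some vLLM versions may also expect `load_format="bitsandbytes"`.

```python
def build_inflight_bnb_kwargs(base_model, enable_lora=True):
    """Options for loading an unquantized checkpoint with load-time BnB quantization.

    Hypothetical helper: it only collects the options into a dict.
    """
    kwargs = {
        "model": base_model,             # regular (non-BnB) checkpoint
        "quantization": "bitsandbytes",  # quantize the weights while loading
    }
    if enable_lora:
        kwargs["enable_lora"] = True     # the part that is untested for Gemma 3
    return kwargs


def try_load(base_model):
    """Try to start the engine; return it on success, or None on failure."""
    # Imported lazily so the helper above can be used without vLLM installed.
    from vllm import LLM

    try:
        return LLM(**build_inflight_bnb_kwargs(base_model))
    except Exception as exc:  # broad on purpose: this is only an experiment
        print(f"Engine failed to start: {exc!r}")
        return None
```

If `try_load(...)` returns an engine, the next step would be sending a request with a `LoRARequest` to see whether adapter switching also works on the quantized weights.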
I agree, that would be great if we can get it working. It is tricky because we would have to maintain it. Thank you for the feedback!