# Quantization method
This does not quantize a model. It lets you use a model that is already quantized, and you specify its quantization format.
Thank you so much, now I get it.
2024-06-27T10:50:05.563358317Z ValueError: Quantization method specified in the model config (bitsandbytes) does not match the quantization method specified in the quantization argument (gptq).
Now I have this error.
You can't select GPTQ quantization when the model is quantized with bitsandbytes.
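A minimal sketch of why this error appears, assuming the model ships a Hugging Face-style `config.json` whose `quantization_config.quant_method` field records how the checkpoint was quantized. The helper names `detect_quant_method` and `resolve_quantization` are hypothetical; the check only mirrors the error message above and is not vLLM's actual implementation.

```python
import json
from typing import Optional


def detect_quant_method(config: dict) -> Optional[str]:
    """Return the quantization method recorded in the model config, if any."""
    qcfg = config.get("quantization_config")
    if not qcfg:
        return None  # no quantization_config: treat the checkpoint as unquantized
    return qcfg.get("quant_method")


def resolve_quantization(config: dict, requested: Optional[str]) -> Optional[str]:
    """Hypothetical check: the requested method must match what the checkpoint declares."""
    detected = detect_quant_method(config)
    if requested is None:
        # Nothing requested: fall back to whatever the checkpoint declares.
        return detected
    if detected is not None and detected != requested:
        raise ValueError(
            f"Quantization method specified in the model config ({detected}) does not match "
            f"the quantization method specified in the quantization argument ({requested})."
        )
    return requested


# Illustrative excerpt of a config.json for a bitsandbytes-quantized checkpoint.
bnb_config = json.loads(
    '{"quantization_config": {"quant_method": "bitsandbytes", "load_in_4bit": true}}'
)

print(resolve_quantization(bnb_config, None))  # bitsandbytes
try:
    resolve_quantization(bnb_config, "gptq")
except ValueError as err:
    print(err)
```

In other words, the quantization setting has to name the format the checkpoint was actually saved in; asking for GPTQ cannot convert a bitsandbytes checkpoint.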
vLLM does not support the bitsandbytes quantization method.
It seems the vLLM engine does support it, so maybe it's just a limitation of the RunPod vLLM worker:
https://github.com/vllm-project/vllm/issues/5569