have you followed the steps @ https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#4-bit-mode to install GPTQ?
#llama 13b 4bit 128 amd
it looks like you either haven't done that yet, or just have an outdated copy of it
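a quick way to check the first case is to see whether the checkout is there at all. this is only a rough sketch: it assumes the wiki layout where GPTQ-for-LLaMa is cloned into a `repositories/` folder inside the webui directory, and `gptq_checkout_status` is a made-up helper name, not something the webui provides. it can't tell you whether the copy is outdated, only whether it exists.

```python
from pathlib import Path

# Assumption: GPTQ-for-LLaMa is cloned into "repositories/" inside the
# webui folder, as the wiki steps describe. Adjust if your layout differs.
GPTQ_SUBDIR = Path("repositories") / "GPTQ-for-LLaMa"


def gptq_checkout_status(webui_root):
    """Return 'missing', 'not-a-git-checkout', or 'present' for the GPTQ folder."""
    repo = Path(webui_root) / GPTQ_SUBDIR
    if not repo.is_dir():
        return "missing"
    # A folder without .git was probably copied or unzipped, so it cannot
    # be updated with a plain `git pull`.
    if not (repo / ".git").exists():
        return "not-a-git-checkout"
    return "present"


if __name__ == "__main__":
    # Run from inside the text-generation-webui folder.
    print(gptq_checkout_status("."))
```

if it prints `present`, the folder exists and the next thing to check is whether it's up to date.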
it's not -required-, it's always optional, but it's helpful and compatible with multiple models
to my knowledge it's the only way to get 4bit models loaded rn
there are multiple versions of gptq available, including a newer pytorch variant
there is an issue with some new functions that were added in GPTQ-for-LLaMa that rocm can't handle. YellowRoseCx has a patched fork for rocm/hip here: https://github.com/YellowRoseCx/GPTQ-for-LLaMa
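if you're not sure which build of pytorch you have (and so whether the rocm fork applies to you), something like this can tell you. a minimal sketch: `detect_torch_backend` is a hypothetical helper name, and it relies on `torch.version.hip` being set on rocm builds and `torch.version.cuda` on cuda builds.

```python
def detect_torch_backend():
    """Return 'rocm', 'cuda', 'cpu', or 'no-torch' for the installed PyTorch build."""
    try:
        import torch
    except ImportError:
        return "no-torch"
    # ROCm builds report a HIP version string; on other builds it is None.
    if getattr(torch.version, "hip", None):
        return "rocm"
    # CUDA builds report a CUDA version string; CPU-only builds report None.
    if getattr(torch.version, "cuda", None):
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    print(detect_torch_backend())
```

this only reports what pytorch was built for, not whether a gpu is actually usable on the machine.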
i get this error when i try to run llama 7b in 4-bit mode on my nvidia card