GPTQ ExLlama Won't Split Between GPUs | Text Generation WebUI

bitter zephyr Aug 5, 2023, 3:42 PM

I cannot for the life of me get ExLlama or ExLlama_HF to use both GPUs. I have a 3060 and a 2070 Super, 20 GB of VRAM total. I would like to use both GPUs so I can either run a 30b for better conversational skills or a 13b with higher context for story writing.

elder imp Aug 5, 2023, 6:56 PM

You probably won't get 30b working, but 13b with higher context should. My cards are all 12 GB, so they don't match your setup, but I found that a GPU split of 8,10 worked for 2 GPUs and 5,5,10 works for 3 GPUs when using ExLlama. Most number combinations just don't work, so it took a lot of trial and error to get the 3 working together. So yours is just not loading anything into the second GPU?
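To make the split values above concrete, here is a minimal sketch of a sanity check you could run before loading. It assumes the GPU split string is a comma-separated list of GB allowances, one per GPU in device order. The function names `parse_gpu_split` and `check_split` are made up for illustration and are not part of ExLlama or the WebUI.

```python
def parse_gpu_split(spec: str) -> list[float]:
    """Turn a string such as "8,10" into per-GPU allowances in GB."""
    parts = [p.strip() for p in spec.split(",")]
    if not all(parts):
        raise ValueError(f"empty entry in gpu split: {spec!r}")
    values = [float(p) for p in parts]
    if any(v < 0 for v in values):
        raise ValueError(f"negative entry in gpu split: {spec!r}")
    return values


def check_split(spec: str, card_gb: list[float]) -> list[str]:
    """Return a list of problems; an empty list means the split looks plausible.

    card_gb is the total VRAM of each card in GB, in device order
    (an assumption: you supply these numbers yourself).
    """
    split = parse_gpu_split(spec)
    problems = []
    if len(split) != len(card_gb):
        problems.append(
            f"split has {len(split)} entries but there are {len(card_gb)} GPUs"
        )
    for i, (want, have) in enumerate(zip(split, card_gb)):
        if want > have:
            problems.append(
                f"GPU {i}: split asks for {want} GB but the card has {have} GB"
            )
    return problems


if __name__ == "__main__":
    # Two 12 GB cards with the 8,10 split mentioned above.
    print(check_split("8,10", [12, 12]))
    # Assuming a 12 GB 3060 and an 8 GB 2070 Super (20 GB total).
    print(check_split("8,10", [12, 8]))
```

Under those assumptions, 8,10 passes on two 12 GB cards but is flagged on a 12 GB + 8 GB pair, so a split copied from a different setup may need a smaller second number. Passing this check does not guarantee the model loads; it only catches obvious mismatches.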

bitter zephyr Aug 5, 2023, 8:53 PM

Can you get 30b working with three 12 GB cards? I had to go down to 4-5 on card 0 to get it to load 13b with 8k context.

elder imp Aug 6, 2023, 2:50 AM

Yes, I get 30b working fine with three 12 GB cards. I use 5,5,10. I can't do 8k context, but 6k works great. I could run 13b on one 12 GB card, but only at 2k context; it would run out of VRAM if I tried 8k.
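The 2k versus 8k difference is roughly what back-of-envelope arithmetic on the attention cache predicts. The sketch below assumes an fp16 key/value cache and the published LLaMA shapes (13b: 40 layers, hidden size 5120; 30b: 60 layers, hidden size 6656). It ignores the quantized weights, activation buffers, and framework overhead, so treat it as a lower bound. `kv_cache_gib` is a made-up helper name.

```python
def kv_cache_gib(n_layers: int, hidden_size: int, seq_len: int,
                 bytes_per_value: int = 2) -> float:
    """Estimate key/value cache size in GiB.

    Each token stores one key vector and one value vector (hence the 2)
    of hidden_size elements for every layer. bytes_per_value=2 assumes fp16.
    """
    total_bytes = 2 * n_layers * hidden_size * seq_len * bytes_per_value
    return total_bytes / 2**30


if __name__ == "__main__":
    # 13b at 2k and 8k context
    print(f"13b @ 2k: {kv_cache_gib(40, 5120, 2048):.2f} GiB")
    print(f"13b @ 8k: {kv_cache_gib(40, 5120, 8192):.2f} GiB")
    # 30b at 6k context
    print(f"30b @ 6k: {kv_cache_gib(60, 6656, 6144):.2f} GiB")
```

Under these assumptions the cache for 13b grows from about 1.6 GiB at 2k to 6.25 GiB at 8k, which on top of the weights is plausibly too much for a single 12 GB card. For 30b at 6k the cache alone is about 9.1 GiB spread over the cards, which may be why a split such as 5,5,10 sits well below the 36 GB that three cards have in total.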

mossy canyon Aug 22, 2023, 12:52 AM

No matter what I try, I can't get ExLlama to split across my GPUs, an RTX 3090 and an RTX 4060 Ti 16 GB. I've traced the split values from Gradio all the way into the loader and they are there, but it always uses device 0 and only that. I'm current as of commit 6cca8b8. It splits with llama, and it splits with AutoGPTQ. I'm using Python 3.10.12 and CUDA 12.2.
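When everything lands on device 0, one thing worth ruling out is that the Python process only sees one GPU. This is a minimal diagnostic sketch, not a fix: it prints `CUDA_VISIBLE_DEVICES` and asks PyTorch which devices it can see and how much memory each has free. `describe_cuda_devices` is a made-up helper name, and the script assumes it is run in the same environment as the WebUI.

```python
import os


def describe_cuda_devices() -> list[dict]:
    """Return one dict per CUDA device PyTorch can see (empty if none)."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return []
    if not torch.cuda.is_available():
        return []
    devices = []
    for i in range(torch.cuda.device_count()):
        # mem_get_info returns (free_bytes, total_bytes) for the device.
        free_b, total_b = torch.cuda.mem_get_info(i)
        devices.append({
            "index": i,
            "name": torch.cuda.get_device_name(i),
            "free_gib": free_b / 2**30,
            "total_gib": total_b / 2**30,
        })
    return devices


if __name__ == "__main__":
    print("CUDA_VISIBLE_DEVICES =",
          os.environ.get("CUDA_VISIBLE_DEVICES", "<unset>"))
    for d in describe_cuda_devices():
        print(f"GPU {d['index']}: {d['name']}, "
              f"{d['free_gib']:.1f} of {d['total_gib']:.1f} GiB free")
```

If both cards show up here, visibility is not the problem and the cause is more likely in how the split is applied at load time. Running the script again after loading the model shows whether any memory was taken on the second card.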
