#What size EXL quants can my rig run? 128GB RAM, RTX 4060 Ti
Model formats other than GGUF have to fit entirely in VRAM.
So while 128GB of RAM lets you run huge GGUFs, you can only run relatively small exl2 or GPTQ models. The same goes for AWQ models, but there's little reason to use AWQ since the AutoAWQ loader is bad (some say AWQ is useful with vLLM/Aphrodite, though).
If your 4060 Ti is the 8GB version, you can use 7B and 10.7B models.
If it's the 16GB version, you can use 7B, 10.7B, 13B, and 14B models. Maybe some others too.
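If you want to sanity-check those sizes yourself, here's a rough back-of-the-envelope sketch. It assumes about 4.0 bits per weight (a common exl2/GPTQ setting) and a flat 2 GB of overhead for context cache, activations, and whatever else is using the GPU. Both numbers are my assumptions, not measurements, and the function names are made up for illustration. Real usage depends on context length and the loader.

```python
def estimate_vram_gb(params_billion, bits_per_weight=4.0, overhead_gb=2.0):
    """Rough VRAM estimate in GB for a quantized model that must fit fully on the GPU.

    Weights take params * bits_per_weight / 8 bytes.
    overhead_gb is an assumed flat allowance, not a measured value.
    """
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb


def fits_in_vram(params_billion, vram_gb, bits_per_weight=4.0, overhead_gb=2.0):
    """True if the rough estimate is within the card's VRAM."""
    return estimate_vram_gb(params_billion, bits_per_weight, overhead_gb) <= vram_gb


if __name__ == "__main__":
    for vram in (8, 16):
        for size in (7, 10.7, 13, 14):
            need = estimate_vram_gb(size)
            verdict = "fits" if fits_in_vram(size, vram) else "too big"
            print(f"{size}B on {vram}GB card: ~{need:.1f} GB needed, {verdict}")
```

Lowering `bits_per_weight` is how you squeeze a bigger model onto the same card, at some cost in quality.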
exl2/GPTQ are faster than GGUF when the model fits fully on the GPU.
I see