#GPU for 13B language model


mild meadow
#

Just wanted to get your recommendations on GPU choice for running a 13B language model with AWQ or GPTQ quantization. The workload would be around 200-300 requests per hour. I tried a 48 GB A6000 with pretty good results, but I was wondering whether you think a 24 GB GPU could be up to the task?

final willow
#

Haven't tried that yet, feel free to deploy it too

obsidian terrace
#

24GB should be fine

#

Best to try it and see

mild meadow
#

Well, I tried and failed: CUDA out-of-memory exception. I guess I'll stick with the 48 GB GPU for now.

obsidian terrace
#

Which model was it? 13B models with AWQ or GPTQ quantization usually aren't very large.
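For a rough sense of scale, here is a back-of-the-envelope sketch of the weight footprint. It assumes 4-bit quantization (a common setting for AWQ and GPTQ, but not stated in this thread) and counts the weights only. It ignores quantization scales, the KV cache, activations, and the CUDA context, so real usage will be higher. The helper name `quantized_weight_gib` is made up for illustration.

```python
def quantized_weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    bytes_total = n_params * bits_per_weight / 8
    return bytes_total / 2**30


# 13B parameters at 16-bit precision vs. an assumed 4-bit quantization.
fp16 = quantized_weight_gib(13e9, 16)
int4 = quantized_weight_gib(13e9, 4)

print(f"16-bit weights: {fp16:.1f} GiB")
print(f"4-bit weights:  {int4:.1f} GiB")
```

Under those assumptions the quantized weights take roughly a quarter of the 16-bit footprint, which is why an out-of-memory error on 24 GB more likely points at the cache or preallocation settings than at the weights themselves.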

storm hound
obsidian terrace
#

Yeah, for GPTQ I had to set GPU_MEMORY_UTILIZATION to 0.80 instead of the default of 0.95.
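A small sketch of what that change means on a 24 GB card, assuming GPU_MEMORY_UTILIZATION is the fraction of total GPU memory the serving engine is allowed to claim up front (as with vLLM's `gpu_memory_utilization` option; the thread does not name the engine). The function `memory_budget_gb` is hypothetical and only does the arithmetic.

```python
def memory_budget_gb(total_gb: float, utilization: float) -> tuple[float, float]:
    """Return (GB the engine may claim, GB left over as headroom)."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    claimed = total_gb * utilization
    return claimed, total_gb - claimed


# Compare the default mentioned in the thread (0.95) with the lowered value (0.80).
for util in (0.95, 0.80):
    claimed, headroom = memory_budget_gb(24.0, util)
    print(f"utilization={util:.2f}: engine budget {claimed:.1f} GB, "
          f"headroom {headroom:.1f} GB")
```

One plausible reading is that at 0.95 the headroom left outside the engine's budget is too small for whatever else needs GPU memory, and dropping to 0.80 leaves enough room to avoid the out-of-memory error.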