#GPU for 13B language model
Haven't tried that yet, feel free to deploy it too.
Well, I tried and failed with a CUDA out-of-memory exception. I guess I'll stick to a 48 GB GPU for now.
Which model was it? 13B models with AWQ or GPTQ quantization usually aren't very large.
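For a rough sense of scale, here's a back-of-the-envelope sketch of the weight footprint. It assumes 4-bit weights for AWQ/GPTQ (a common setting, but not confirmed for the model in question) and counts weights only. Real checkpoints carry extra scale and zero-point data, and the KV cache and activations come on top. `weight_memory_gb` is a made-up helper, not part of any library.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS_13B = 13e9  # nominal parameter count for a "13B" model

fp16_gb = weight_memory_gb(PARAMS_13B, 16)  # unquantized half precision
int4_gb = weight_memory_gb(PARAMS_13B, 4)   # assumed 4-bit AWQ/GPTQ weights

print(f"fp16 weights:  ~{fp16_gb:.1f} GB")
print(f"4-bit weights: ~{int4_gb:.1f} GB")
```

So the weights themselves should fit comfortably on smaller cards. If it still runs out of memory, the rest of the allocation (cache, activations, preallocation settings) is the more likely culprit.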
You might need to set an environment variable for these LLMs to keep them from eating up too much memory, I think. I remember having a similar experience, and sometimes there are configs that help with that.
That's if you want to use less memory.
Yeah, for GPTQ I had to set GPU_MEMORY_UTILIZATION to 0.80 instead of the default of 0.95.
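To illustrate what that setting changes, here's a minimal sketch of the arithmetic. The variable name and the 0.95 / 0.80 values come from the message above. The 24 GB card size and the `memory_budget_gb` helper are assumptions for illustration only, and how a given serving stack actually reads the variable may differ.

```python
import os


def memory_budget_gb(total_gb: float, default_fraction: float = 0.95) -> float:
    """Return the share of GPU memory the server would be allowed to claim.

    Reads GPU_MEMORY_UTILIZATION from the environment and falls back to
    default_fraction when it is unset (hypothetical helper).
    """
    fraction = float(os.environ.get("GPU_MEMORY_UTILIZATION", default_fraction))
    if not 0.0 < fraction <= 1.0:
        raise ValueError(f"GPU_MEMORY_UTILIZATION must be in (0, 1], got {fraction}")
    return total_gb * fraction


TOTAL_GB = 24.0  # assumed card size, not taken from the thread

os.environ["GPU_MEMORY_UTILIZATION"] = "0.80"
lowered = memory_budget_gb(TOTAL_GB)

del os.environ["GPU_MEMORY_UTILIZATION"]
default = memory_budget_gb(TOTAL_GB)

print(f"budget at 0.80: {lowered:.1f} GB, headroom {TOTAL_GB - lowered:.1f} GB")
print(f"budget at 0.95: {default:.1f} GB, headroom {TOTAL_GB - default:.1f} GB")
```

Lowering the fraction leaves more memory unclaimed for anything allocated outside the server's own budget, which is presumably why it helped here.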