#Not enough Vram??

topaz wedge
#
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 34.00 MiB. GPU 0 has a total capacty of 8.00 GiB of which 3.04 GiB is free. Of the allocated memory 3.95 GiB is allocated by PyTorch, and 39.37 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Any solutions? Also, it just stops when it reaches 4 GB of VRAM usage for some reason. Why is it doing this? (BTW, I'm using the Oobabooga web UI with Chronos_hermes_13B, and I've got an RTX 3050 8 GB.)
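
(Note on the error text: it points at `max_split_size_mb` and `PYTORCH_CUDA_ALLOC_CONF`. A minimal sketch of setting that option from Python follows. The helper name `set_max_split_size` and the value 128 are illustrative assumptions, not anything posted in the thread. In this particular error only 39.37 MiB is reserved but unallocated, so fragmentation is probably not the cause and the setting may well not help.)

```python
import os

def set_max_split_size(mb: int) -> str:
    """Set PyTorch's CUDA allocator option via the environment.

    Hypothetical helper: it only writes the environment variable named in
    the error message. It has to run before torch makes its first CUDA
    allocation, so the safest place is before `import torch`.
    """
    value = f"max_split_size_mb:{mb}"
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = value
    return value

# 128 is an arbitrary illustrative value, not a recommendation.
set_max_split_size(128)

# import torch  # import torch only after the variable is set
```

The same variable can be set in the shell or launcher script that starts the web UI instead.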

pure mirage
#

What RVC are you using?

topaz wedge
#

RVC ???

#

im using text generation webui..

smoky jay
#

Hmm, whenever I get a message like this, I'm usually just actually out of VRAM :p

#

So basically

#

They are literally out of vram

#

Welp, there you have it xD

coarse escarp
#

Why is it using torch though

#

Can torch even quantize? Unquantized, it should be 26-52 GB here (fp16 to fp32)
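
(The 26-52 GB figure is parameter count times bytes per weight. A rough sketch of that arithmetic, assuming exactly 13 billion parameters and counting weights only, with no context, activations, or framework overhead. The function name `weight_memory_gb` is made up for illustration.)

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

# Assumption: a "13B" model has exactly 13e9 parameters.
N_PARAMS = 13e9

for name, bits in [("fp32", 32), ("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{name}: {weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

By the same arithmetic, a 4-bit quantized 13B model needs about 6.5 GB for weights alone, which leaves little headroom on an 8 GB card.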

coarse escarp
#

Also, that GPU doesn't have the VRAM budget for a full 13B, even quantized. It will have to fall back to system memory, which is tremendously slower

#

(assuming it has enough space to sit there)

#

Yeah but you don't want to infer on that ever

#

they train them on that

#

But all the funny local stuff runs on other backends

#

Because it's just more flexible and performant

#

GGML & shit
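
(GGML-based backends such as llama.cpp can keep only some layers on the GPU and run the rest from system memory, controlled by an `--n-gpu-layers` style setting. A back-of-the-envelope sketch of how many layers might fit follows. It assumes all layers are the same size and ignores embeddings, context, and overhead. The function name and the 5 GB budget are hypothetical.)

```python
def layers_that_fit(total_weight_gb: float, n_layers: int, vram_budget_gb: float) -> int:
    """Estimate how many equally sized layers fit in a VRAM budget."""
    per_layer_gb = total_weight_gb / n_layers
    return min(n_layers, int(vram_budget_gb // per_layer_gb))

# Assumptions: about 6.5 GB of 4-bit weights spread over 40 layers,
# with 5 GB of the 8 GB card set aside for weights.
print(layers_that_fit(6.5, 40, 5.0))  # layers to offload to the GPU
```

The remaining layers run on the CPU, which is the slow path described above.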

topaz wedge
#

it is a quantized version

coarse escarp
#

its standalone

coarse escarp
#

If my memory serves me well, GPTQ models can only be loaded fully onto the GPU

topaz wedge
#

like it never uses the whole 8gb

coarse escarp
#

not partially

topaz wedge
#

hmmm so basically i just can't use anything other than 7b models?

coarse escarp
#

ok, my bad: GPTQ relies on transformers, which relies on torch

topaz wedge
#

the thing is like 7b models use 4gb vram :/// bruh, like i was expecting 13b to use somewhere around 8gb, not 20 💀

#

the model i used wasnt quantized

#

and it used like 4-4.5 gb

#

Vicuna_7B

coarse escarp
#

Context needs to be in vram as well
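
(The context takes VRAM because each token stores one key and one value vector per layer. A rough sketch of that cache size follows, assuming LLaMA-13B-style dimensions of 40 layers and hidden size 5120, plain multi-head attention, fp16 values, and a 2048-token context. None of these numbers were given in the thread, and `kv_cache_bytes` is an illustrative name.)

```python
def kv_cache_bytes(n_layers: int, hidden_size: int, context_len: int,
                   bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for a full context window."""
    # Factor 2: one key vector plus one value vector per layer per token.
    return 2 * n_layers * hidden_size * context_len * bytes_per_value

# Assumed dimensions: 40 layers, hidden size 5120, 2048 tokens, fp16.
size = kv_cache_bytes(n_layers=40, hidden_size=5120, context_len=2048)
print(f"{size / 2**30:.4f} GiB")  # prints "1.5625 GiB"
```

That sits on top of the weights, so the usable budget for the model itself is smaller than the card's 8 GB.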

topaz wedge
#

so like i cant possibly run anything better than 7b?

#

why 7 times 4?

topaz wedge
#

💀

#

ah btw quick question

#

is 8gb good enough to run RVC?

#

so like ill be fine with 8gb vram?

#

cuz yk i wanted to use rvc as well but didn't wanna go through all that trouble to fail lmao

topaz wedge
#

lol btw like i bought this gpu recently, i had a 2gb vram gpu before it 💀

coarse escarp
#

would you know the very basics of transformer models ?

topaz wedge
#

and like i installed like 4-5 RVC 💀

#

none of them worked

#

i got all kinds of errors

#

like i was literally modifying the source code of the RVC

#

to maybe fix them, but yeah more errors would appear

#

yeah I really wonder why 💀