How to send context to VRAM? | Text Generation WebUI

quasi linden Nov 27, 2024, 9:50 PM

How to send context to VRAM?

rugged raven Nov 28, 2024, 6:35 AM

I'm not sure what you're trying to do here.
I'd remove one of cache_4bit or cache_8bit, and also remove no_offload_kqv.

But I think your context is supposed to be in VRAM already if you offload all layers.
If it's the shared VRAM usage that bothers you, that's likely not overflow into system RAM but the model using the second 4060.
If I recall correctly, system RAM overflow doesn't even work if you have multiple GPUs, and it would crash with OOM instead.

Maybe you could try setting tensor_split to 50,50?

rugged raven Nov 28, 2024, 7:14 AM

What if you put 0,50,50 in tensor_split?
I highly doubt it's using your integrated GPU, so this is just a wild guess.
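
For anyone unsure what the two tensor_split values suggested above would do, here is a rough sketch. It assumes the values are treated as relative proportions across devices in index order, which is how llama.cpp describes its tensor split option. `split_fractions` is a hypothetical helper for illustration only, not something the webui provides.

```python
def split_fractions(tensor_split: str) -> list[float]:
    """Turn a comma-separated proportion string into per-device fractions.

    Hypothetical helper: assumes values are relative proportions,
    listed in device-index order.
    """
    parts = [float(p) for p in tensor_split.split(",") if p.strip()]
    total = sum(parts)
    if total <= 0:
        raise ValueError("tensor_split needs at least one positive value")
    return [p / total for p in parts]


# Two devices, split evenly.
print(split_fractions("50,50"))

# Three devices: the first gets nothing, the other two split evenly.
print(split_fractions("0,50,50"))
```

Under that assumption, 0,50,50 only makes a difference if the system really exposes three devices and the first one is the unwanted one.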
