# How to send context to VRAM?


quasi linden

How to send context to VRAM?

rugged raven

I'm not sure what you're trying to do here.
I'd remove either cache_4bit or cache_8bit (you only need one of them), and I'd also remove no_offload_kqv, since that option keeps the K, Q, V off the GPU.
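
For a sense of how much VRAM the context itself needs, here is a rough sketch of the KV-cache size. It assumes a llama-style model with grouped-query attention, treats an 8-bit and a 4-bit cache as 1 and 0.5 bytes per element (ignoring quantization scales), and uses a made-up helper name, `kv_cache_bytes`; it is not part of text-generation-webui or llama.cpp.

```python
# Rough KV-cache size estimate for a llama-style transformer.
# Assumption: K and V are each stored per layer as n_ctx x n_kv_heads x head_dim
# elements; quantized caches also store small per-block scales, ignored here.

BYTES_PER_ELEMENT = {
    "fp16": 2.0,  # unquantized 16-bit cache
    "8bit": 1.0,  # approximate per-element cost of an 8-bit cache
    "4bit": 0.5,  # approximate per-element cost of a 4-bit cache
}


def kv_cache_bytes(n_layers: int, n_ctx: int, n_kv_heads: int,
                   head_dim: int, cache: str = "fp16") -> float:
    """Approximate bytes needed to hold the K and V caches for n_ctx tokens."""
    per_element = BYTES_PER_ELEMENT[cache]
    # Factor 2: one tensor for keys and one for values in every layer.
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * per_element


if __name__ == "__main__":
    # Hypothetical model shape: 32 layers, 8 KV heads of size 128,
    # with 8192 tokens of context.
    for cache in ("fp16", "8bit", "4bit"):
        gib = kv_cache_bytes(32, 8192, 8, 128, cache) / 1024**3
        print(f"{cache}: {gib:.2f} GiB")  # 1.00, 0.50, 0.25 GiB
```

With that hypothetical shape, each halving of cache precision roughly halves the memory the context occupies.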

But I think your context is supposed to be in VRAM if you offload all layers.
If it's the shared VRAM usage that bothers you, it's likely not overflowing into system RAM; it's probably just using your second 4060.
If I recall correctly, overflow into system RAM doesn't even work with multiple GPUs; it would crash with an OOM error instead.


Maybe you could try setting tensor_split to 50,50?

rugged raven

What if you put 0,50,50 in tensor_split, so the first device gets no share of the model?
I highly doubt it's using your integrated GPU, but it's just a wild guess.
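
tensor_split takes relative weights per GPU in device order, so 50,50 asks for an even split over two devices and 0,50,50 gives the first device nothing. Below is a simplified model of that proportional layer assignment for a hypothetical 32-layer model; `split_layers` is an illustrative name, and llama.cpp's real logic (rounding, placement of the output layer) may differ.

```python
# Simplified model of how tensor_split weights could divide layers across GPUs.
# Assumption: each device gets a share of layers proportional to its weight,
# in device order. This is an illustration, not llama.cpp's actual code.
from bisect import bisect_right
from itertools import accumulate


def split_layers(n_layers: int, weights: list[float]) -> list[int]:
    """Return how many layers each device would receive for the given weights."""
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    total = sum(weights)
    # Cumulative fraction boundaries, e.g. [0, 50, 50] -> [0.0, 0.5, 1.0].
    bounds = [c / total for c in accumulate(weights)]
    counts = [0] * len(weights)
    for layer in range(n_layers):
        # First device whose boundary lies strictly above this layer's
        # relative position; a zero-weight device is therefore skipped.
        device = bisect_right(bounds, layer / n_layers)
        counts[device] += 1
    return counts


if __name__ == "__main__":
    print(split_layers(32, [50, 50]))     # [16, 16]
    print(split_layers(32, [0, 50, 50]))  # [0, 16, 16]
```

In this model a 0 weight simply removes that device from the split, which is the idea behind trying 0,50,50.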