#Output is cut off prematurely.
Tokenizer wouldn't be related to that. Running out of VRAM is the most likely explanation.
Hmmm. I'll look into that then.
Wait, is there any way to tinker with VRAM?
I only have 16 GB of RAM...
Not many options with 16GB of RAM, but yes, there is offloading. In the end, you can use disk. You'll get a word a minute like that, though.
(ish)
There's a guide on the webpage for low VRAM and it's all applicable.
Please link it. I might have overlooked it.
Thanks!
Wait, do these modes work on CPU? I remember reading they do not.
Yes, ggml models are CPU models. They can be offloaded to the GPU if you enable support for that.
Oh, that I know. See my post in #mac-setup first, though. My GPU is an AMD one and I think llama.cpp is the only one that supports it.
Sorry, I misread that. The dangers of multitasking.
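For anyone landing here with the same problem: the offloading mentioned above usually comes down to choosing how many model layers go on the GPU while the rest stay in system RAM. Here is a rough sketch of that arithmetic. The function name, the 1 GB overhead default, and the assumption that all layers are about the same size are mine for illustration, not something from a particular tool.

```python
import math


def layers_that_fit(model_size_gb: float, n_layers: int,
                    vram_budget_gb: float, overhead_gb: float = 1.0) -> int:
    """Estimate how many layers can be offloaded to the GPU.

    Assumes every layer is roughly model_size_gb / n_layers in size and
    reserves overhead_gb of VRAM for context and scratch buffers
    (both are simplifying assumptions).
    """
    if model_size_gb <= 0 or n_layers <= 0:
        raise ValueError("model_size_gb and n_layers must be positive")
    # VRAM left for weights after reserving the overhead.
    usable_gb = max(0.0, vram_budget_gb - overhead_gb)
    # Number of whole layers that fit, capped at the model's layer count.
    fit = math.floor(usable_gb * n_layers / model_size_gb)
    return min(n_layers, fit)


if __name__ == "__main__":
    # Hypothetical numbers: an 8 GB model with 32 layers, 5 GB of free VRAM.
    print(layers_that_fit(8.0, 32, 5.0))
```

The result is only a starting point for a layer-offload setting such as llama.cpp's `--n-gpu-layers` option. If output still gets cut off, lowering the number is the first thing to try.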