#Output is cut off prematurely.

dire thorn
#

I’m using Ooba with pygmalion-7b-ggml-q5_1 in conjunction with SillyTavern. Problem is that when generating, messages will cut off mid-way.

Is there a way to fix this? I tried changing the tokenizer on SillyTavern but the problem persists.

unreal sierra
#

Tokenizer wouldn't be related to that. Running out of VRAM is the most likely explanation.

dire thorn
#

Hmmm. I'll look into that then.

#

Wait, is there any way to tinker with VRAM?

#

I only have 16 GB of RAM...

unreal sierra
#

Not many options with 16GB of RAM, but yes, there is offloading. In the end, you can use disk. You'll get a word a minute like that, though.

#

(ish)

#

There's a guide on the webpage for low VRAM and it's all applicable.

dire thorn
#

Please link it. I might have overlooked it.

dire thorn
#

Thanks!

dire thorn
#

Wait, do these modes work on CPU? I remember reading they do not.

tiny hinge
#

Yes, ggml models are CPU models. They can be offloaded to GPU if you enable support for that.

tiny hinge
#

Sorry, I misread that. The dangers of multitasking.
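
For the GPU offloading mentioned above, a rough way to pick how many layers to offload is to divide the model file size evenly across its layers and see how many fit in the VRAM you can spare. The sketch below is only a back-of-the-envelope estimate: the function name, the even-split assumption, and the 1 GB overhead reserve are all illustrative assumptions, not something taken from Ooba or llama.cpp.

```python
def layers_that_fit(model_size_gb: float, n_layers: int,
                    vram_budget_gb: float, overhead_gb: float = 1.0) -> int:
    """Estimate how many model layers fit in a given VRAM budget.

    Assumes (hypothetically) that the model file is split evenly across
    layers and that `overhead_gb` is reserved for context and scratch buffers.
    """
    if model_size_gb <= 0 or n_layers <= 0:
        raise ValueError("model_size_gb and n_layers must be positive")
    per_layer_gb = model_size_gb / n_layers
    # VRAM left after the reserve; never negative.
    usable_gb = max(vram_budget_gb - overhead_gb, 0.0)
    # Cannot offload more layers than the model has.
    return min(n_layers, int(usable_gb / per_layer_gb))


# Illustrative numbers only: a 4 GB file with 32 layers and 4 GB of VRAM.
print(layers_that_fit(model_size_gb=4.0, n_layers=32, vram_budget_gb=4.0))
```

The result would then be a starting value for the GPU layers setting of whichever loader is in use (llama.cpp calls it `--n-gpu-layers`); lower it if loading fails or generation slows down.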