Output is cut off prematurely. | Text Generation WebUI

dire thorn Jun 13, 2023, 6:26 PM

I’m using Ooba with pygmalion-7b-ggml-q5_1 in conjunction with SillyTavern. The problem is that generated messages get cut off midway.

Is there a way to fix this? I tried changing the tokenizer in SillyTavern, but the problem persists.

unreal sierra Jun 13, 2023, 6:49 PM

Tokenizer wouldn't be related to that. Running out of VRAM is the most likely explanation.

dire thorn Jun 13, 2023, 6:54 PM

Hmmm. I'll look into that then.

Wait, is there any way to tinker with VRAM?

I only have 16 GB of RAM...

unreal sierra Jun 13, 2023, 6:57 PM

There aren't many options with 16 GB of RAM, but yes, there is offloading. As a last resort, you can offload to disk, though you'll get about a word a minute that way.

(ish)

There's a low-VRAM guide on the webpage, and all of it applies.
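To make the offloading idea above concrete, here is a minimal Python sketch of how a loader might place layers when memory is tight: fill the GPU first, then CPU RAM, and spill whatever is left to disk. This is a simplification (roughly the order Hugging Face Accelerate's automatic device maps follow), not text-generation-webui's actual code. `split_layers`, `Placement`, and all the numbers are hypothetical, and the sketch assumes every layer is the same size.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    """How many layers end up on each tier (hypothetical helper)."""
    gpu: int
    cpu: int
    disk: int


def split_layers(n_layers: int, layer_gib: float, gpu_gib: float, cpu_gib: float) -> Placement:
    """Fill the GPU budget first, then CPU RAM, and put the rest on disk.

    Simplifying assumption: all layers have the same size (layer_gib).
    """
    if layer_gib <= 0:
        raise ValueError("layer_gib must be positive")
    # As many whole layers as fit in the GPU budget, capped at the model size.
    gpu = min(n_layers, int(gpu_gib // layer_gib))
    remaining = n_layers - gpu
    # Next, as many of the remaining layers as fit in spare RAM.
    cpu = min(remaining, int(cpu_gib // layer_gib))
    # Whatever is still left over spills to disk (the slowest tier).
    disk = remaining - cpu
    return Placement(gpu=gpu, cpu=cpu, disk=disk)


if __name__ == "__main__":
    print(split_layers(32, 0.5, 4.0, 6.0))
```

For example, a hypothetical 32-layer model with 0.5 GiB layers, a 4 GiB GPU budget, and 6 GiB of spare RAM ends up with 8 layers on the GPU, 12 in RAM, and 12 on disk. The disk-resident layers have to be read back in during every forward pass, which is why that tier is so slow.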

dire thorn Jun 13, 2023, 6:58 PM

Please link it. I might have overlooked it.

unreal sierra Jun 13, 2023, 6:58 PM

dire thorn Jun 13, 2023, 7:00 PM

Thanks!

dire thorn Jun 14, 2023, 7:25 AM

Wait, do these modes work on CPU? I remember reading they do not.

tiny hinge Jun 14, 2023, 4:02 PM

Yes, GGML models are CPU models. They can be offloaded to the GPU if you enable support for that.

dire thorn Jun 14, 2023, 4:05 PM

Oh, that I know. See my post in #mac-setup first, though. My GPU is an AMD one, and I think llama.cpp is the only one that supports it.

tiny hinge Jun 14, 2023, 4:16 PM

Sorry, I misread that. The dangers of multitasking.
