GPT4-x-Alpaca30b does a weird text token thingy idk what to describe it as | Text Generation WebUI | Page 1

plush osprey May 11, 2023, 3:53 AM

I'm using MetaIX/GPT4-X-Alpaca-30B-4bit on a 4090, and this is the conversation I get:

Hello!
Hi there, how can I help you?texttalk-to-me-ai-instructions:> Hello!texttalk-to-me-ai-response:< Hi there, how can I help you?texttalk-to-me-ai-instructions:> Hi there, how can I help you?texttalk-to-me-ai-instructions:> Hello!texttalk-to-me-ai-response:< Hi there, how can I help you?texttalk-to-me-ai-instructions:>

lilac goblet May 11, 2023, 4:06 PM

Models went off the rails for me like this when I was using an older version of GPTQ (e.g. the one Oobabooga tells you to use). Try switching to the new GPTQ (CUDA or Triton, your choice) by replacing the GPTQ-for-LLaMa repo in the repositories folder with a fresh clone of your desired branch.

Speed will likely suffer, but you weren't getting awesome results here. :X

Your GPU isn't the cause either way. It's a software issue.

plush osprey May 11, 2023, 4:43 PM

lilac goblet Models went off the rails for me like this when I was using an older version of ...

where are the gptq files at? sorry, i'm new to this

lilac goblet May 11, 2023, 4:46 PM

No problem. You should find it in the repositories folder inside the main text-generation-webui directory, if you followed the instructions here:

https://github.com/oobabooga/text-generation-webui/blob/main/docs/GPTQ-models-(4-bit-mode).md

plush osprey May 11, 2023, 4:46 PM

yeah i found it

i git cloned it into that folder, is that all?

lilac goblet May 11, 2023, 4:48 PM

The new versions are here:

https://github.com/qwopqwop200/GPTQ-for-LLaMa

You can get them with something like:

cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa -b triton

or

git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa -b cuda

or the new one, fastest-inference-4bit, which I need to try. Looks like it came out last night. Welcome to the speed of home LLMs. 😁
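For anyone following along, here is a rough sketch of the swap described above: move the old checkout aside, then clone the branch you want. The function name swap_gptq, the .old backup suffix, and the GIT override variable are made up for this example and are not part of text-generation-webui or GPTQ-for-LLaMa. Treat it as a starting point, not the official procedure.

```shell
# Sketch: replace the GPTQ-for-LLaMa checkout in a repositories folder.
# Usage: swap_gptq <path-to-repositories> <branch>
# GIT is a hypothetical override so the clone command can be swapped
# out (for example with "echo") when trying the function without network.
swap_gptq() {
  local repos_dir="$1"
  local branch="$2"
  local url="https://github.com/qwopqwop200/GPTQ-for-LLaMa"
  local target="$repos_dir/GPTQ-for-LLaMa"

  if [ -d "$target" ]; then
    # Refuse to overwrite an earlier backup.
    if [ -e "$target.old" ]; then
      echo "backup already exists: $target.old" >&2
      return 1
    fi
    # Keep the old checkout so it can be restored if the new one misbehaves.
    mv "$target" "$target.old"
  fi

  ${GIT:-git} clone "$url" -b "$branch" "$target"
}
```

Called as swap_gptq repositories triton from the text-generation-webui directory, this leaves the previous checkout in repositories/GPTQ-for-LLaMa.old. Keeping the old copy means you can switch back if the new branch turns out slower or breaks output.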

plush osprey May 11, 2023, 4:48 PM

ooh? a new one?

lilac goblet May 11, 2023, 4:48 PM

Yea, I'll try it out soon. I'm not sure what makes it faster or if it breaks output too.

plush osprey May 11, 2023, 4:48 PM

hmm i'll try

lilac goblet May 11, 2023, 4:49 PM

With triton, all you have to do is pip install -r requirements.txt inside the cloned GPTQ-for-LLaMa folder. With cuda, it's still python setup_cuda.py install.
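A small sketch of that per-branch install step, in case it helps. The helper name gptq_install_cmd is invented for this example. It only prints the command and does not run anything. The setup_cuda.py file name is what the cuda branch used as far as I know, so check the branch README if it has changed. No install steps were given in this thread for fastest-inference-4bit, so the helper deliberately refuses to guess.

```shell
# Sketch: print the install command mentioned in this thread for a branch.
# Run the printed command from inside repositories/GPTQ-for-LLaMa.
gptq_install_cmd() {
  case "$1" in
    triton)
      echo "pip install -r requirements.txt"
      ;;
    cuda)
      # Assumption: the cuda branch ships its build script as setup_cuda.py.
      echo "python setup_cuda.py install"
      ;;
    *)
      echo "no install steps given in the thread for branch: $1" >&2
      return 1
      ;;
  esac
}
```

For example, gptq_install_cmd triton prints the pip command, which you would then run yourself inside the cloned folder.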

lilac goblet May 11, 2023, 4:49 PM

plush osprey ooh? a new one?

"Supports the fastest speed, but uses both triton and cuda."

Very interesting! I need to try it, too.

plush osprey May 11, 2023, 4:50 PM

oh, too bad it can't be run on native Windows

left halo May 12, 2023, 8:11 AM

I don't know too much about 30b. It would be a dream for me to be able to run that. However, when I was looking into what exactly the 30b can do, I saw others were having trouble with it, and someone explained that it was trained on different data than the 13b.
