#llama.cpp not using GPU despite having BLAS = 1 (Linux, GGML)


sharp sapphire
#

Part 2 of my struggles on trying to get a model running on an AMD GPU. This time, I tried using a GGML model. The problem is, the GPU isn't being utilized. n-gpu-layers is set to 40. I'm on an AMD RX 6800 XT with ROCm 5.4.2 enabled. llama-cpp-python shouldn't be the problem since BLAS is showing up as 1 already. However, when I try to set up bitsandbytes, it doesn't say successful. Check images 4 and 5 to see the command line output when doing python -m bitsandbytes.

As for how I built bitsandbytes-rocm (maybe this is where it went wrong?), I first did ROCM_HOME=/opt/rocm-5.4.2, then I built bitsandbytes-rocm from https://github.com/agrocylo/bitsandbytes-rocm. After that, I ran python setup.py install, then did python -m bitsandbytes (which again, shows an error).

halcyon crown
#

Specifically, if you are on Linux and not WSL. ROCm doesn't work on WSL.

sharp sapphire
#

and yep, I have ubuntu dual boot

halcyon crown
#

What model is that?

sharp sapphire
#

Wizard-Vicuna-13B-Uncensored-GPTQ

#

buuuuuuuuuuuuuut its the act order version

#

im gonna try the no act order one

sharp sapphire
#

Alright I'll test that one out as well

halcyon crown
#

As for llama.cpp, run cmd_linux.sh and enter this command:

CMAKE_ARGS="-DLLAMA_CLBLAST=on" FORCE_CMAKE=1 pip install llama-cpp-python

The needed dependencies for it should come with ROCm.

There is also a ROCm-specific port of llama.cpp here: https://github.com/ggerganov/llama.cpp/pull/1087
Using it with the webui will require manually building llama-cpp-python with that fork of llama.cpp.

sharp sapphire
#

I'm so close to just nuking my current Ubuntu install and starting over from scratch lol

sharp sapphire
#

I should've written down how I got BLAS to work darn

#

And yeah, as always, that bitsandbytes thing

#

This is what it looks like on my manual install of the webui

#

no bitsandbytes "no GPU support" thingy and BLAS = 1, but it still doesn't work

#

which I highly suspect is because of

#

this

#

In the video i saw, it's supposed to say something like setup successful

#

Mine just ends in an error

#

I'm gonna try using that bitsandbytes-rocm repository from the video i saw

sharp sapphire
#

Yup, that's the one I used

#

woah, it worked

#

well, i havent tested if the webui works

halcyon crown
#

That is what needs to be tested, simply because the webui has updated several times in response to newer versions of bitsandbytes.

sharp sapphire
#

Yep. i was gonna try running it now except where did all the .py files in text-generation-webui go

#

my god

#

lmao

#

ohmygod i rm *-ed the directory thats why

#

ooooooooooooooooookay it still does not want to use the gpu

halcyon crown
#

BLAS=1 is proof that it is set up correctly. Make sure you are setting the n-gpu-layers option before loading the model.

I see that you are doing that now. That is odd that it isn't loading it at all.
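
For reference, the BLAS = 1 flag discussed here comes from the system-info line that llama.cpp prints when a model loads. Below is a minimal Python sketch for pulling flags out of such a line. The sample line and the function name parse_system_info are illustrative assumptions, not output from this thread. As far as I know, the flag only reports that a BLAS backend was compiled in, so it does not by itself show how many layers were offloaded.

```python
import re


def parse_system_info(line: str) -> dict:
    """Parse 'NAME = 0 | NAME = 1 | ...' into {'NAME': 0, ...}."""
    flags = {}
    for name, value in re.findall(r"(\w+)\s*=\s*(\d+)", line):
        flags[name] = int(value)
    return flags


# Illustrative sample only; a real system-info line has more entries.
sample = "AVX = 1 | AVX2 = 1 | FMA = 1 | BLAS = 1 | SSE3 = 1 | VSX = 0 |"
flags = parse_system_info(sample)
print("BLAS enabled:", flags.get("BLAS") == 1)
```

Pasting the real line from the console in place of the sample makes it easy to compare two installs side by side.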

sharp sapphire
#

Yeah I don't know what's up with it ahhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh

halcyon crown
#

Worst case scenario is that you compile llama-cpp-python with the ROCm fork of llama.cpp:
https://github.com/ggerganov/llama.cpp/pull/1087

Essentially, this will involve cloning the llama-cpp-python repo and cloning the llama.cpp fork into llama-cpp-python/vendor/llama.cpp. Then you can build it using the environment variables mentioned on that PR, but with this command instead:

CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ CMAKE_ARGS="-DLLAMA_HIPBLAS=ON -DLLAMA_CUDA_DMMV_X=128 -DLLAMA_CUDA_MMV_Y=8 -DLLAMA_CUDA_KQUANTS_ITER=1" FORCE_CMAKE=1 python -m pip install . -v
halcyon crown
#
git clone https://github.com/SlyEcho/llama.cpp -b hipblas
sharp sapphire
#

Thanks, I'll be trying that

#

Okay y'know what, I will nuke my current Ubuntu install and I'll start over from the beginning, cause I feel like I have far too much conflicting stuff currently

sharp sapphire
#

Update: I got back to trying text-generation-webui after trying KoboldAI and KoboldCPP. This time, I tried doing this, and after a bit of searching, I eventually got llama-cpp-python to compile with the ROCm port of llama.cpp. I had to put -DCMAKE_POSITION_INDEPENDENT_CODE=ON in CMAKE_ARGS as well. I haven't tested it out yet, so I'm not sure if it'll even run. More updates later

sharp sapphire
#

That's new

halcyon crown
#

Don't use a SuperHOT model. I don't think the ROCm fork has been updated to support it yet.

sharp sapphire
#

Okay, thanks. I'll download another model

#

Used Wizard-Vicuna-30B-Uncensored.ggmlv3.q4_1.bin and still got the same error

#

On another note, I tried ExLLaMa and it loads the model, but when I try to generate something, this happens

#

On ExLLaMa-HF, a different error shows up

#

The same error above also shows up when I try test_benchmark_inference.py

halcyon crown
#

It seems like an issue with the LoRA code.
I've seen the probabilities error before, but I can't for the life of me remember what caused it.

#

What flags are you using to load the webui?

sharp sapphire
#

No flags

halcyon crown
#

Run python -m torch.utils.collect_env to ensure that you have the correct PyTorch version. Other than that, I'm not sure what is causing it.
At the very least, these are errors seen on many different systems, which means it may not be related to the ROCm build.

halcyon crown
#

Yep, that's the right version at least.

sharp sapphire
#

Also, I tried GPTQ-for-LLaMa and it yields the same error

#

at least it's using the VRAM I guess

#

Just gotta figure out that probability tensor thing

#

and this llama_cpp/libllama.so: undefined symbol: llama_n_vocab_from_model

halcyon crown
#

What about AutoGPTQ? I have a ROCm wheel for it:

python -m pip install https://github.com/jllllll/GPTQ-for-LLaMa-Wheels/blob/Linux-x64/ROCm-5.4.2/auto_gptq-0.3.0.dev0%2Brocm5.4.2-cp310-cp310-linux_x86_64.whl --force-reinstall --no-deps
halcyon crown
#

???

sharp sapphire
#

i have no idea

halcyon crown
#

oops

#

Wrong URL.

#
python -m pip install https://github.com/jllllll/GPTQ-for-LLaMa-Wheels/raw/Linux-x64/ROCm-5.4.2/auto_gptq-0.3.0.dev0%2Brocm5.4.2-cp310-cp310-linux_x86_64.whl --force-reinstall --no-deps
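
The corrected command differs from the first one only in /raw/ versus /blob/: a /blob/ link points at GitHub's HTML page for the file, not the wheel itself. A minimal Python sketch of that rewrite follows; the helper name github_blob_to_raw is hypothetical and it only handles plain github.com file links.

```python
from urllib.parse import urlsplit, urlunsplit


def github_blob_to_raw(url: str) -> str:
    """Rewrite a github.com '/blob/' file URL to its '/raw/' form."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # Expected path shape: /<owner>/<repo>/blob/<ref>/<file...>
    if parts.netloc == "github.com" and len(segments) > 3 and segments[3] == "blob":
        segments[3] = "raw"
    return urlunsplit(parts._replace(path="/".join(segments)))


wrong = (
    "https://github.com/jllllll/GPTQ-for-LLaMa-Wheels/blob/Linux-x64/ROCm-5.4.2/"
    "auto_gptq-0.3.0.dev0%2Brocm5.4.2-cp310-cp310-linux_x86_64.whl"
)
print(github_blob_to_raw(wrong))
```

URLs that are already in /raw/ form, or that point elsewhere, are returned unchanged.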
sharp sapphire
#

maaaaaaaaaaaybe the problem is in the model

halcyon crown
#

Could be a corrupt download.

sharp sapphire
#

So I tried redownloading the files (except for the model itself), loaded exllama, and this happened

#

but

#

it generated gibberish

halcyon crown
#

Maybe it isn't detecting the correct GPU architecture?

#

Are you using the HCC_AMDGPU_TARGET and HSA_OVERRIDE_GFX_VERSION environment variables?
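
A quick way to answer that question is to print the two variables from the same environment the webui starts in. This is a minimal Python sketch; the helper name report_rocm_env is hypothetical, and it only reports what is set without judging which values are right for a given card.

```python
import os

# The two variable names asked about above.
ROCM_VARS = ("HCC_AMDGPU_TARGET", "HSA_OVERRIDE_GFX_VERSION")


def report_rocm_env(environ=None) -> dict:
    """Return each variable's value, or None when it is not set."""
    env = os.environ if environ is None else environ
    return {name: env.get(name) for name in ROCM_VARS}


for name, value in report_rocm_env().items():
    print(f"{name} = {value if value is not None else '(not set)'}")
```

Running it inside the shell opened by cmd_linux.sh shows whether an export made elsewhere actually reached that environment.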

sharp sapphire
#

Yep, I tried AutoGPTQ and GPTQ-for-LLaMa, they still result in the same error

#

there's a fix but I dunno how to apply it

#

and it's supposed to have been merged but hhhhhhhhhhhmmmmmmmmmmm

#

Okay I'll stop working on this for now, have to go somewhere. I'll return as soon as I can, probably in 1-2 days

#

Thanks for all the help!

sharp sapphire
#

Well, I managed to sneak in a bit of time to use the PC again. Will be going again in a bit though. So, I uninstalled my custom compile of llama-cpp-python and opted for the CLBlast install instead.

#

Aaaaand it worked

#

on the 30B model:

halcyon crown
#

Oh yeah, llama-cpp-python was just updated with SuperHOT support.

sharp sapphire
#

Yup. Maybe that's what's causing the error? Though the 30B model should've worked

#

I'll try to do some more fixing since I do want to use ROCm

#

Tried ExLLaMa, AutoGPTQ, and GPTQ-for-LLaMa again, still has that probability tensor error thing. When I'm able to, I'll download another model cause the current one I'm using might be scuffed

halcyon crown
#

I'm rebuilding the ROCm GPTQ wheels using a potentially better method. May have been an issue with how I was specifying target architectures.

halcyon crown
#

Wheels have been rebuilt:

python -m pip install https://github.com/jllllll/GPTQ-for-LLaMa-Wheels/raw/Linux-x64/ROCm-5.4.2/auto_gptq-0.3.0.dev0%2Brocm5.4.2-cp310-cp310-linux_x86_64.whl https://github.com/jllllll/GPTQ-for-LLaMa-Wheels/raw/Linux-x64/ROCm-5.4.2/quant_cuda-0.0.0-cp310-cp310-linux_x86_64.whl --force-reinstall --no-deps

^ All one command.

sharp sapphire
#

Will get back to you immediately when I'm able to use the PC again ^^

#

Oh yeah, just so I can try again immediately. Could you tell me, in detail, the steps on how to build llama-cpp-python with ROCm?

What I basically did was,

  1. Clone the llama-cpp-python repository
  2. Clone the llama.cpp ROCm fork into the vendor folder inside llama-cpp-python
  3. In the llama.cpp folder, I run

CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ cmake -DLLAMA_HIPBLAS=ON

  4. Then I run

make -j12 LLAMA_HIPBLAS=1

  5. I went back to the llama-cpp-python directory, then I ran the command you gave me, but I added another argument.

CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ CMAKE_ARGS="-DLLAMA_HIPBLAS=ON -DLLAMA_CUDA_DMMV_X=128 -DLLAMA_CUDA_MMV_Y=8 -DLLAMA_CUDA_KQUANTS_ITER=1 -DCMAKE_POSITION_INDEPENDENT_CODE=ON" FORCE_CMAKE=1 python -m pip install . -v

halcyon crown
#

Running the commands in the llama.cpp folder is unnecessary. That last command for building llama-cpp-python handles everything.
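
The single-command build can also be driven from Python, which keeps the long variable list readable. This is a minimal sketch that reuses only the flags and compiler paths quoted in this thread. The function names and the default /opt/rocm/llvm/bin location are assumptions, and nothing here has been verified against a ROCm machine.

```python
import os
import subprocess
import sys

# Flags copied from the command quoted in the thread.
CMAKE_FLAGS = [
    "-DLLAMA_HIPBLAS=ON",
    "-DLLAMA_CUDA_DMMV_X=128",
    "-DLLAMA_CUDA_MMV_Y=8",
    "-DLLAMA_CUDA_KQUANTS_ITER=1",
    "-DCMAKE_POSITION_INDEPENDENT_CODE=ON",
]


def rocm_build_env(base=None, rocm_llvm="/opt/rocm/llvm/bin") -> dict:
    """Copy the environment and add the variables the build command sets."""
    env = dict(os.environ if base is None else base)
    env["CC"] = f"{rocm_llvm}/clang"
    env["CXX"] = f"{rocm_llvm}/clang++"
    env["CMAKE_ARGS"] = " ".join(CMAKE_FLAGS)
    env["FORCE_CMAKE"] = "1"
    return env


def build_llama_cpp_python(repo_dir: str) -> None:
    """Run the one pip build from inside the llama-cpp-python checkout."""
    cmd = [sys.executable, "-m", "pip", "install", ".", "-v"]
    subprocess.run(cmd, cwd=repo_dir, env=rocm_build_env(), check=True)
```

Calling build_llama_cpp_python with the path to the llama-cpp-python checkout (with the fork already placed in its vendor folder) would start the build; the sketch deliberately does not call it on its own.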

sharp sapphire
#

Gotcha. I honestly just did some guesswork lol. Will try again tomorrow maybe, with hopefully better results.

sharp sapphire
#

Right. I tried it again and I couldn't get it to work. Still had the same error. Something about libllama.so.

Next, I tried using AutoGPTQ, GPTQ-for-LLaMa, and ExLLaMa. None of 'em worked.

#

This is with GPTQ-for-LLaMa (yes, ignore the model, I don't know what else to try lol)

#

another model, still with GPTQ-for-LLaMa

#

using the updated AutoGPTQ version you sent, this is the error

#

ExLLaMa loads Vicuna 7B successfully, excepttttttttttt

#

yea

#

ExLLaMa-HF, still that cursed probability tensor error

sand valley
#

On my laptop I can only use oobabooga in CPU mode. I installed llama.cpp with OpenBLAS, but on startup it says BLAS = 0. What could be the problem?

halcyon crown
#

You have to check the build output to determine if it actually used OpenBLAS or failed to find it. It will just continue without it if not found.
The build output can be hidden depending on how you are installing it.

sand valley
#

I installed it in the usual way:
pip uninstall -y llama-cpp-python
set CMAKE_ARGS="-DLLAMA_OPENBLAS=on"
set FORCE_CMAKE=1
pip install llama-cpp-python --no-cache-dir

halcyon crown
#

Use this to see the full build output:

pip uninstall -y llama-cpp-python
set "CMAKE_ARGS=-DLLAMA_OPENBLAS=on"
set FORCE_CMAKE=1
set VERBOSE=1
pip install llama-cpp-python --no-cache-dir -v
sand valley
#

Everything seems to be fine with the installation. Or am I missing something?

halcyon crown
#

@sand valley I found the issue. Quotes on CMAKE_ARGS were in Linux/Bash format. I have modified the commands in my previous comment to correct it.
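
The quoting difference is easy to miss, so here is a simplified Python model of how the Windows set command treats the two forms. The function cmd_set is a hypothetical illustration, not a full emulation of cmd.exe: with set NAME="value" the quotes become part of the value, while set "NAME=value" drops them.

```python
def cmd_set(argument: str) -> tuple:
    """Simplified model of how cmd.exe's `set` reads NAME=value."""
    arg = argument.strip()
    # set "NAME=value": the wrapping quotes are dropped.
    if len(arg) >= 2 and arg.startswith('"') and arg.endswith('"'):
        arg = arg[1:-1]
    # Everything after the first '=' is the value, quotes included.
    name, _, value = arg.partition("=")
    return name, value


# Bash-style quoting: the quotes end up inside the variable.
print(cmd_set('CMAKE_ARGS="-DLLAMA_OPENBLAS=on"'))
# Windows-style quoting: the value is stored without quotes.
print(cmd_set('"CMAKE_ARGS=-DLLAMA_OPENBLAS=on"'))
```

Running echo %CMAKE_ARGS% after the set line shows which of the two values actually got stored.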

sand valley
#

I also tried this version, but the Blas value is still zero when loading the model. :/ So I don't know if this is a bug in oobabooga or llama.cpp, or if I need some extra functionality from the processor.

halcyon crown
#

Just found out that the CMAKE_ARGS needed to install with OpenBLAS were changed at some point:

pip uninstall -y llama-cpp-python
set "CMAKE_ARGS=-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS"
set FORCE_CMAKE=1
set VERBOSE=1
pip install llama-cpp-python --no-cache-dir -v
sand valley
#

The llama.cpp compilation looks good; what's more interesting is that BLAS = 0 is still shown when loading the model. Maybe OpenBLAS is not implemented for loading?

sharp sapphire
#

Are you on Linux?