# Quantization Issue


true fractal
#

Hello, I am on Linux and I have been able to fine-tune the models, but I have not been able to quantize the safetensors. Whenever I try, I get the following error:

```text
The output location will be ./model/unsloth.BF16.gguf
This will take 3 minutes...
/bin/sh: 1: python: not found
Traceback (most recent call last):
  File "/home/alowtron/Documents/Coding/AI/Test1.py", line 122, in <module>
    if True: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
  File "/home/alowtron/.local/lib/python3.10/site-packages/unsloth/save.py", line 1630, in unsloth_save_pretrained_gguf
    all_file_locations = save_to_gguf(model_type, model_dtype, is_sentencepiece_model,
  File "/home/alowtron/.local/lib/python3.10/site-packages/unsloth/save.py", line 1111, in save_to_gguf
    raise RuntimeError(
RuntimeError: Unsloth: Quantization failed for ./model/unsloth.BF16.gguf
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone --recursive https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && make all -j
Once that's done, redo the quantization.
```

I went in and ran those commands in the output folder, but when I try to quantize again I get the same error.

hollow ingot
#

verify that llama.cpp actually runs on your system and that you can compile it by hand. The GGUF quantizing isn't really handled by Unsloth per se; it relies on llama.cpp.
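If the quantizing really comes from llama.cpp, one quick check is whether its quantizer binary exists where you expect it, and whether it works when run by hand on the BF16 file. A minimal sketch, assuming llama.cpp is cloned into `./llama.cpp` next to the script and that the binary is named `llama-quantize` (`quantize` in older checkouts), sitting in the repo root after `make` or in `build/bin` after CMake. These paths and the Q4_K_M output file name are assumptions, not Unsloth's documented lookup logic.

```python
import os
from pathlib import Path
from typing import List, Optional

# Assumed locations relative to the llama.cpp checkout: repo root after `make`,
# build/bin after CMake; "quantize" is the older name of "llama-quantize".
CANDIDATES = [
    "llama-quantize",
    "quantize",
    "build/bin/llama-quantize",
    "build/bin/quantize",
]


def find_quantizer(llama_dir: Path) -> Optional[Path]:
    """Return the first candidate that exists and is executable, else None."""
    for rel in CANDIDATES:
        path = llama_dir / rel
        if path.is_file() and os.access(path, os.X_OK):
            return path
    return None


def quantize_command(quantizer: Path, src: Path, dst: Path, qtype: str = "Q4_K_M") -> List[str]:
    """Build argv for: <quantizer> <input.gguf> <output.gguf> <type>."""
    return [str(quantizer), str(src), str(dst), qtype]


if __name__ == "__main__":
    quantizer = find_quantizer(Path("llama.cpp"))
    if quantizer is None:
        print("No quantizer found: llama.cpp is missing here or was not compiled.")
    else:
        cmd = quantize_command(
            quantizer,
            Path("model/unsloth.BF16.gguf"),   # file name taken from the error log
            Path("model/unsloth.Q4_K_M.gguf"),  # hypothetical output name
        )
        print("Try running by hand:", " ".join(cmd))
```

If the binary is found and the printed command works, the compiled side of llama.cpp is fine and the problem lies elsewhere in the pipeline.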

true fractal
#

I do have C++ installed, but I'm still not seeing a way to run it.

hollow ingot
#

just run the Makefile; llama.cpp comes with a good README

#

should be fairly simple

#

you on Linux?

true fractal
#

Yeah, I am on Linux.

hollow ingot
#

ya, just clone it and run make GGML_CUDA=1

#

or, with CMake:

```shell
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
```

#

assuming you want NVIDIA (CUDA) support
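The CMake route can also be driven from Python, for example from the same script that calls Unsloth. A minimal sketch, assuming `cmake` and a C/C++ compiler are installed (plus the CUDA toolkit when `cuda=True`) and that the clone lives at `./llama.cpp`; the helper names are hypothetical and the sketch only wraps the two commands quoted above.

```python
import subprocess
from pathlib import Path
from typing import List


def build_commands(cuda: bool = True) -> List[List[str]]:
    """Configure and build commands from the thread; the CUDA flag is optional."""
    configure = ["cmake", "-B", "build"]
    if cuda:
        configure.append("-DGGML_CUDA=ON")  # NVIDIA GPU support
    build = ["cmake", "--build", "build", "--config", "Release"]
    return [configure, build]


def build_llama_cpp(repo: Path, cuda: bool = True) -> None:
    """Run the commands inside the checkout; stops with CalledProcessError on failure."""
    for cmd in build_commands(cuda):
        subprocess.run(cmd, cwd=repo, check=True)
```

Call `build_llama_cpp(Path("llama.cpp"))` from the folder the model is saved in, since the Unsloth message asks for llama.cpp there; `cuda=False` drops the GPU flag for a CPU-only build. Using `check=True` makes a failed configure step stop the build instead of silently continuing.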

true fractal
#

```text
I CXXFLAGS: -std=c++11 -fPIC -O3 -g -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -fopenmp -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi -Iggml/include -Iggml/src -Iinclude -Isrc -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_OPENMP -DGGML_USE_LLAMAFILE -DGGML_USE_CUDA -I/usr/local/cuda/include -I/usr/local/cuda/targets/x86_64-linux/include -DGGML_CUDA_USE_GRAPHS
I NVCCFLAGS: -std=c++11 -O3 -g -use_fast_math --forward-unknown-to-host-compiler -arch=native -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128
I LDFLAGS: -lcuda -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/usr/lib64 -L/usr/local/cuda/targets/x86_64-linux/lib -L/usr/local/cuda/lib64/stubs -L/usr/lib/wsl/lib
I CC: cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
I CXX: c++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
I NVCC: Build cuda_11.5.r11.5/compiler.30672275_0
Makefile:993: *** I ERROR: For CUDA versions < 11.7 a target CUDA architecture must be explicitly provided via environment variable CUDA_DOCKER_ARCH, e.g. by running "export CUDA_DOCKER_ARCH=compute_XX" on Unix-like systems, where XX is the minimum compute capability that the code needs to run on. A list with compute capabilities can be found here: https://developer.nvidia.com/cuda-gpus . Stop.
```

hollow ingot
#

is your CUDA 11.7?

#

or lower?

true fractal
#

```text
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
```

#

11.5
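The Makefile error hinges on the toolkit release that `nvcc --version` prints. A small sketch that pulls the release out of that text, compares it with the 11.7 threshold from the error, and formats a `CUDA_DOCKER_ARCH` value in the `compute_XX` form the error asks for. The function names are hypothetical; 8.6 is the compute capability of the RTX 3060.

```python
import re
from typing import Optional, Tuple

# Threshold quoted in the llama.cpp Makefile error above.
MIN_AUTO_ARCH = (11, 7)


def nvcc_release(text: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from a line like 'release 11.5, V11.5.119'."""
    m = re.search(r"release (\d+)\.(\d+)", text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def needs_explicit_arch(release: Tuple[int, int]) -> bool:
    """True when the Makefile would demand CUDA_DOCKER_ARCH (toolkit < 11.7)."""
    return release < MIN_AUTO_ARCH


def docker_arch(compute_capability: str) -> str:
    """Turn a compute capability such as '8.6' into 'compute_86'."""
    major, minor = compute_capability.split(".")
    return f"compute_{major}{minor}"


rel = nvcc_release("Cuda compilation tools, release 11.5, V11.5.119")
print(rel, needs_explicit_arch(rel))  # (11, 5) True
print(docker_arch("8.6"))             # compute_86
```

On the 11.5 toolkit, the error's own workaround would therefore be `export CUDA_DOCKER_ARCH=compute_86` before running `make`; with a 12.x toolkit `needs_explicit_arch` returns False and the variable is not needed, which is why upgrading sidesteps the error.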

hollow ingot
#

very OLD

true fractal
#

I'll update it then.

hollow ingot
#

what GPUs are you running?

true fractal
#

3060

hollow ingot
#

ya, you can update to 12.x

#

I run on Ampere too

true fractal
#

I thought I downloaded the latest version when I installed Linux two months ago; apparently I did not.

hollow ingot
#

```text
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Fri_Jun_14_16:34:21_PDT_2024
Cuda compilation tools, release 12.6, V12.6.20
Build cuda_12.6.r12.6/compiler.34431801_0
```

#

vs. yours: "Copyright (c) 2005-2021 NVIDIA Corporation"

#

you are on a CUDA toolkit that's nearly 3 years old

#

```text
nvidia-smi --version
NVIDIA-SMI version : 560.35.03
NVML version : 560.35
DRIVER version : 560.35.03
CUDA Version : 12.6
```
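`nvidia-smi` and `nvcc` answer different questions: the "CUDA Version" line from `nvidia-smi` is the newest CUDA the installed driver supports, while `nvcc --version` reports the toolkit actually used to compile llama.cpp, so the two can disagree. A minimal parser for the `key : value` output above, handy when logging both; the function name is hypothetical.

```python
from typing import Dict


def parse_smi_version(text: str) -> Dict[str, str]:
    """Split each 'key : value' line of `nvidia-smi --version` into a dict entry."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            info[key.strip()] = value.strip()
    return info


sample = """NVIDIA-SMI version : 560.35.03
NVML version : 560.35
DRIVER version : 560.35.03
CUDA Version : 12.6"""
info = parse_smi_version(sample)
print(info["DRIVER version"], info["CUDA Version"])  # 560.35.03 12.6
```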

true fractal
#

So I updated my NVIDIA driver, and now my CUDA is 12.6. I am able to run llama.cpp now, but the runtime error is still there, even after manually running the llama.cpp commands.

#

```text
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Fri_Jun_14_16:34:21_PDT_2024
Cuda compilation tools, release 12.6, V12.6.20
Build cuda_12.6.r12.6/compiler.34431801_0
```

#

```text
The output location will be ./model/unsloth.F16.gguf
This will take 3 minutes...
/bin/sh: 1: python: not found
Traceback (most recent call last):
  File "/home/alowtron/Documents/Coding/AI/Test1.py", line 118, in <module>
    if True: model.save_pretrained_gguf("model", tokenizer, quantization_method = "f16")
  File "/home/alowtron/.local/lib/python3.10/site-packages/unsloth/save.py", line 1630, in unsloth_save_pretrained_gguf
    all_file_locations = save_to_gguf(model_type, model_dtype, is_sentencepiece_model,
  File "/home/alowtron/.local/lib/python3.10/site-packages/unsloth/save.py", line 1111, in save_to_gguf
    raise RuntimeError(
RuntimeError: Unsloth: Quantization failed for ./model/unsloth.F16.gguf
You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone --recursive https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && make all -j
Once that's done, redo the quantization.
```

hollow ingot
#

/bin/sh: 1: python: not found

#

maybe it's python3?

true fractal
#

It's possible. Do you know how I would change it so it tries that instead?
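The `python: not found` line appears right before the failure for both `q4_k_m` and `f16`, which suggests the step that breaks is a shell command calling `python` (likely llama.cpp's HF-to-GGUF conversion script) rather than the compiled quantizer. On Ubuntu only `python3` is installed by default; the `python-is-python3` package adds a system-wide `python` command. A per-process alternative is sketched below, assuming Unsloth runs that command through `/bin/sh` with the script's environment inherited (the error text suggests so, but this is not confirmed). The helper name is hypothetical.

```python
import os
import shutil
import sys
import tempfile
from typing import Optional


def ensure_python_on_path() -> Optional[str]:
    """If `python` is missing from PATH, prepend a temp dir holding a `python`
    symlink to the current interpreter. Returns that dir, or None if not needed."""
    if shutil.which("python"):
        return None  # already resolvable; nothing to do
    alias_dir = tempfile.mkdtemp(prefix="python-alias-")
    os.symlink(sys.executable, os.path.join(alias_dir, "python"))
    # Child processes (including /bin/sh -c ...) inherit os.environ.
    os.environ["PATH"] = alias_dir + os.pathsep + os.environ.get("PATH", "")
    return alias_dir


print("python3:", shutil.which("python3"))
ensure_python_on_path()
print("python:", shutil.which("python"))
```

Call `ensure_python_on_path()` once before `model.save_pretrained_gguf(...)`. Pointing the alias at `sys.executable` means the conversion runs with the same interpreter, and therefore the same installed packages, as the training script.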