Hey, I tried installing llama-cpp-python with cuBLAS support like this:
pip uninstall -y llama-cpp-python
set CMAKE_ARGS="-DLLAMA_CUBLAS=on"
set FORCE_CMAKE=1
pip install git+https://github.com/abetlen/llama-cpp-python.git --no-cache-dir
But every time I load a GGUF model, the log reports BLAS = 0:
llama_new_context_with_model: n_ctx = 32768
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size = 4096.00 MB
llama_new_context_with_model: compute buffer total size = 2141.88 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
2023-09-29 23:42:37 INFO:Loaded the model in 1.16 seconds.
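
In case it helps with debugging, here is a minimal sketch for checking the BLAS flag programmatically instead of reading the log by eye. It only parses the system-info line shown above. The helper name `parse_system_info` is hypothetical (not part of llama-cpp-python), and the sample string is copied from my log. I believe the low-level bindings expose the same string via `llama_cpp.llama_print_system_info()` (returned as bytes), but treat that as an assumption for your installed version.

```python
def parse_system_info(line: str) -> dict[str, int]:
    """Parse 'NAME = 0 | NAME = 1 | ...' into {'NAME': 0, 'NAME': 1, ...}.

    Hypothetical helper for illustration; not part of llama-cpp-python.
    """
    flags = {}
    for part in line.split("|"):
        name, sep, value = part.partition("=")
        value = value.strip()
        # Skip empty trailing segments and anything that is not 'NAME = <int>'.
        if sep and value.isdigit():
            flags[name.strip()] = int(value)
    return flags


# System-info line copied from the log output above.
SYSTEM_INFO = (
    "AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | "
    "FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | "
    "WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | "
)

flags = parse_system_info(SYSTEM_INFO)
print("BLAS enabled:", bool(flags.get("BLAS", 0)))
```

If the flag still reads 0 after a reinstall, the build most likely did not pick up the CMake option, so the installed wheel is a CPU-only build.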