#CUDA errors in Kaggle? What just happened?


rancid stream
```
    -c pytorch -c nvidia -c xformers -c conda-forge -y
!pip install "unsloth[kaggle] @ git+https://github.com/unslothai/unsloth.git"
!pip uninstall datasets -y
!pip install datasets

!pip install wandb
!wandb login xxx
```
I ran that, then ran the next block:
```python
from unsloth import FastLanguageModel
import torch
max_seq_length = 4096 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-bnb-4bit", # "unsloth/mistral-7b" for 16bit loading
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)
```
and boom
```
================================================================================
The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/lib/x86_64-linux-gnu'), PosixPath('/usr/local/nvidia/lib'), PosixPath('/usr/local/cuda/lib')}
The following directories listed in your path were found to be non-existent: {PosixPath('https'), PosixPath('//www.kaggle.com')}
The following directories listed in your path were found to be non-existent: {PosixPath('//dp.kaggle.net'), PosixPath('https')}
The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/lib/x86_64-linux-gnu'), PosixPath('/usr/local/nvidia/lib'), PosixPath('/usr/local/cuda/lib')}
The following directories listed in your path were found to be non-existent: {PosixPath('tf2-gpu/2-13+gpu')}
The following directories listed in your path were found to be non-existent: {PosixPath('/kaggle/lib/kagglegym')}
The following directories listed in your path were found to be non-existent: {PosixPath('gcr.io/kaggle-gpu-images/python@sha256'), PosixPath('040434ccef2406aeeb2d95dee5328bbcac373218eb6dcd77c16d5f30ad46c983')}
DEBUG: Possible options found for libcudart.so: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/opt/conda/lib/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0'), PosixPath('/opt/conda/lib/libcudart.so.11.0')}
CUDA SETUP: PyTorch settings found: CUDA_VERSION=118, Highest Compute Capability: 7.5.
CUDA SETUP: To manually override the PyTorch CUDA version please see:https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
CUDA SETUP: Required library version not found: libbitsandbytes_cuda118.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...

CUDA SETUP: CUDA detection failed! Possible reasons:
1. You need to manually override the PyTorch CUDA version. Please see: "https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
2. CUDA driver not installed
3. CUDA not installed
4. You have multiple conflicting CUDA libraries
5. Required library not pre-compiled for this bitsandbytes release!
CUDA SETUP: If you compiled from source, try again with `make CUDA_VERSION=DETECTED_CUDA_VERSION` for example, `make CUDA_VERSION=118`.
CUDA SETUP: The CUDA version for the compile might depend on your conda install. Inspect CUDA version via `conda list | grep cuda`.
================================================================================

CUDA SETUP: Something unexpected happened. Please compile from source:
git clone https://github.com/TimDettmers/bitsandbytes.git
cd bitsandbytes
CUDA_VERSION=118 make cuda11x
python setup.py install
CUDA SETUP: Setup Failed!

python -m bitsandbytes

  warn(msg)
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:183: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0', 'libcudart.so.12.1', 'libcudart.so.12.2'] files: {PosixPath('/opt/conda/lib/libcudart.so.11.0'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0'), PosixPath('/opt/conda/lib/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so')}.. We select the PyTorch default libcudart.so, which is {torch.version.cuda},but this might mismatch with the CUDA version that is needed for bitsandbytes.To override this behavior set the BNB_CUDA_VERSION=<version string, e.g. 122> environmental variableFor example, if you want to use the CUDA version 122BNB_CUDA_VERSION=122 python ...OR set the environmental variable in your .bashrc: export BNB_CUDA_VERSION=122In the case of a manual override, make sure you set the LD_LIBRARY_PATH, e.g.export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-12.2
  warn(msg)
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:183: UserWarning: /usr/local/cuda/lib64:/usr/local/cuda/lib:/usr/local/lib/x86_64-linux-gnu:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/opt/conda/lib did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0', 'libcudart.so.12.1', 'libcudart.so.12.2'] as expected! Searching further paths...
  warn(msg)
  warn(msg)
```
```
RuntimeError                              Traceback (most recent call last)
Cell In[3], line 1
----> 1 from unsloth import FastLanguageModel
      2 import torch
      3 max_seq_length = 4096 # Choose any! We auto support RoPE Scaling internally!

File /opt/conda/lib/python3.10/site-packages/unsloth/__init__.py:59
     54     raise ImportError("Unsloth only supports Pytorch 2.1 for now. Please update your Pytorch to 2.1.\n"\
     55                       "We have some installation instructions on our Github page.")
     58 # Try loading bitsandbytes and triton
---> 59 import bitsandbytes as bnb
     60 import triton
     61 from triton.common.build import libcuda_dirs

File /opt/conda/lib/python3.10/site-packages/bitsandbytes/__init__.py:6
      1 # Copyright (c) Facebook, Inc. and its affiliates.
      2 #
      3 # This source code is licensed under the MIT license found in the
      4 # LICENSE file in the root directory of this source tree.
----> 6 from . import cuda_setup, research, utils
      7 from .autograd._functions import (
      8     MatmulLtState,
      9     bmm_cublas,
   (...)
     13     mm_cublas,
     14 )
     15 from .cextension import COMPILED_WITH_CUDA

File /opt/conda/lib/python3.10/site-packages/bitsandbytes/research/__init__.py:2
      1 from . import nn
----> 2 from .autograd._functions import (
      3     matmul_fp8_global,
      4     matmul_fp8_mixed,
      5     switchback_bnb,
      6 )
```

A bunch more lines I can't be bothered to copy, then this further down:

```
CUDA Setup failed despite GPU being available. Please run the following command to get more information:

python -m bitsandbytes

Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
```
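The first log already names the failure: PyTorch reported `CUDA_VERSION=118`, bitsandbytes looked for `libbitsandbytes_cuda118.so`, didn't find it, and fell back to `libbitsandbytes_cpu.so`. Below is a minimal sketch for spotting that mismatch without triggering the crash. It assumes, as in bitsandbytes releases of that era, that the prebuilt `libbitsandbytes_cuda*.so` files sit directly in the package directory. The helper names are hypothetical. It locates bitsandbytes with `importlib.util.find_spec` instead of importing it, because the import itself is what raised the `RuntimeError`.

```python
import glob
import importlib.util
import os


def expected_bnb_lib(cuda_version):
    """Turn a CUDA version string such as "11.8" into the binary name
    bitsandbytes searched for in the log (libbitsandbytes_cuda118.so)."""
    major, minor = cuda_version.split(".")[:2]
    return f"libbitsandbytes_cuda{major}{minor}.so"


def shipped_bnb_libs(package_dir):
    """List the prebuilt CUDA binaries in a bitsandbytes install.
    Assumption: they live directly in the package directory."""
    pattern = os.path.join(package_dir, "libbitsandbytes_cuda*.so")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


def find_package_dir(name):
    """Locate an installed top-level package WITHOUT importing it.
    Returns None if the package is not installed."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return list(spec.submodule_search_locations)[0]


if __name__ == "__main__":
    try:
        import torch
        torch_cuda = torch.version.cuda  # e.g. "11.8"; None on CPU-only builds
    except (ImportError, OSError):
        torch_cuda = None

    bnb_dir = find_package_dir("bitsandbytes")
    print("PyTorch built for CUDA:", torch_cuda)
    print("bitsandbytes installed at:", bnb_dir)
    if torch_cuda and bnb_dir:
        wanted = expected_bnb_lib(torch_cuda)
        shipped = shipped_bnb_libs(bnb_dir)
        print("bitsandbytes will look for:", wanted)
        print("binaries shipped:", shipped)
        if wanted not in shipped:
            print("Mismatch: no prebuilt binary for this CUDA version")
```

If the wanted name is missing from the shipped list, you get the same CPU fallback seen in the log.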

...might just be the Python version
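On the Python guess: the traceback paths show Python 3.10. Execution also reached line 59 of `unsloth/__init__.py` (`import bitsandbytes as bnb`), which suggests that the PyTorch check a few lines earlier (the "Unsloth only supports Pytorch 2.1" error) did not fire. A quick environment report makes the versions explicit. This is a sketch: `torch_ok_for_unsloth` is a hypothetical helper, and its "at least 2.1" rule is an assumption based on that error message, not Unsloth's actual condition.

```python
import platform
import re


def parse_major_minor(version):
    """Pull (major, minor) out of strings like "2.1.0+cu121" or "2.0.1"."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def torch_ok_for_unsloth(torch_version, minimum=(2, 1)):
    """Assumed rule: PyTorch must be at least 2.1 (compared numerically,
    so "2.10" counts as newer than "2.1")."""
    return parse_major_minor(torch_version) >= minimum


if __name__ == "__main__":
    print("Python:", platform.python_version())
    try:
        import torch
    except (ImportError, OSError):
        print("PyTorch: not importable")
    else:
        print("PyTorch:", torch.__version__, "| built for CUDA:", torch.version.cuda)
        print("GPU visible:", torch.cuda.is_available())
        print("Meets assumed >= 2.1 rule:", torch_ok_for_unsloth(torch.__version__))
```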

maiden moon

Oh yeah, I'll get to fixing Kaggle later today!!

maiden moon

@rancid stream OK, fixed it! Change the install instructions at the top to:

```
%%capture
!pip install -U xformers --index-url https://download.pytorch.org/whl/cu121
!pip install "unsloth[kaggle-new] @ git+https://github.com/unslothai/unsloth.git"

import os
os.environ["WANDB_DISABLED"] = "true"
```

I'll update all the Kaggle notebooks.
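The fixed cell installs xformers from PyTorch's `cu121` wheel index. Presumably pip also resolves xformers' `torch` dependency from that same index, replacing the CUDA 11.8 PyTorch build reported in the first log with a CUDA 12.1 one. The thread doesn't say what the `kaggle-new` extra pins. Assuming that is the mechanism, here is a sketch to confirm the installed PyTorch build matches the index. Both helper names are hypothetical.

```python
import re


def index_cuda_tag(index_url):
    """Extract the CUDA tag (e.g. "cu121") from a PyTorch wheel index URL."""
    match = re.search(r"/whl/(cu\d+)/?$", index_url)
    return match.group(1) if match else None


def torch_cuda_tag(cuda_version):
    """Turn torch.version.cuda (e.g. "12.1") into the same "cuXYZ" form."""
    if not cuda_version:
        return None  # CPU-only build, or torch.version.cuda unavailable
    major, minor = cuda_version.split(".")[:2]
    return f"cu{major}{minor}"


if __name__ == "__main__":
    index_url = "https://download.pytorch.org/whl/cu121"  # from the fixed install cell
    try:
        import torch
        installed = torch_cuda_tag(torch.version.cuda)
    except (ImportError, OSError):
        installed = None
    expected = index_cuda_tag(index_url)
    print("wheel index:", expected, "| installed torch:", installed)
    if installed != expected:
        print("PyTorch's CUDA build does not match the wheel index")
```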

rancid stream

works