#Gemma 3n Colab Notebook Not Working


sharp raven
#

The Notebook: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma3N_(4B)-Conversational.ipynb#scrollTo=-Xbb0cuLzwgf

Cell Three Error:

Please restructure your imports with 'import unsloth' at the top of your file.
from unsloth import FastModel

ImportError Traceback (most recent call last)
/tmp/ipython-input-24-3770780297.py in <cell line: 0>()
----> 1 from unsloth import FastModel
2 import torch
3
4 fourbit_models = [
5 # 4bit dynamic quants for superior accuracy and low memory use

4 frames
/usr/local/lib/python3.11/dist-packages/unsloth_zoo/temporary_patches/misc.py in patch_CsmDepthDecoderForCausalLM_forward()
204
205 from transformers.modeling_outputs import CausalLMOutputWithPast
--> 206 from transformers.models.csm.modeling_csm import Cache, Unpack, KwargsForCausalLM
207 from transformers.loss.loss_utils import ForCausalLMLoss
208

ImportError: cannot import name 'KwargsForCausalLM' from 'transformers.models.csm.modeling_csm' (/usr/local/lib/python3.11/dist-packages/transformers/models/csm/modeling_csm.py)


NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.

To view examples of installing some common dependencies, click the
"Open Examples" button below.

glacial bramble
#

there are issues being solved as we speak, so give it a few hours or a day

sharp raven
#

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!

ImportError Traceback (most recent call last)
/tmp/ipython-input-2-1326964708.py in <cell line: 0>()
17 ] # More models at https://huggingface.co/unsloth
18
---> 19 model, tokenizer = FastModel.from_pretrained(
20 model_name = "unsloth/gemma-3-4b-it",
21 max_seq_length = 2048, # Choose any for long context!

2 frames
/usr/local/lib/python3.11/dist-packages/unsloth_zoo/temporary_patches/gemma.py in patch_Gemma3ForConditionalGeneration_causal_mask()
161 try: import transformers.models.gemma3.modeling_gemma3
162 except: return
--> 163 from transformers.models.gemma3.modeling_gemma3 import (
164 StaticCache,
165 HybridCache,

ImportError: cannot import name 'StaticCache' from 'transformers.models.gemma3.modeling_gemma3' (/usr/local/lib/python3.11/dist-packages/transformers/models/gemma3/modeling_gemma3.py)
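
Both tracebacks above fail the same way: a patch imports a name that the installed transformers module no longer exposes. A rough way to check this before the patches run is sketched below. `has_symbol` is a hypothetical helper written for illustration and is not part of Unsloth or transformers; the module and name passed to it are copied from the traceback.

```python
import importlib


def has_symbol(module_name: str, symbol: str) -> bool:
    """Return True if `module_name` imports cleanly and exposes `symbol`."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # Any import-time failure is treated as "not available".
        return False
    return hasattr(module, symbol)


# Module path and name taken from the traceback above.
print(has_symbol("transformers.models.gemma3.modeling_gemma3", "StaticCache"))
```

If this prints False while transformers itself imports fine, the installed transformers and unsloth_zoo versions are probably out of step.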

glacial bramble
#

temp solution: add !pip install -U transformers==4.52.4 to the installation cell

#

ugh 😄

sharp raven
#

temp solution for the first notebook or the second or both 🙂

glacial bramble
#

gemma3

sharp raven
#

that worked 🙏

arctic delta
#

Once I downgrade to 4.52.4, FastVisionModel is no longer available from unsloth

sharp raven
#

@glacial bramble any update on this?

mossy bay
#

Fixing it asap sorry!

#

The goal is in a few hours

#

Sorry!

sharp raven
#

Thank you Daniel!

high charm
#

Thanks @mossy bay as I am also having the same issues with the 3n notebook and tried all of the temporary workarounds to no avail.

glacial bramble
#

it's already solved. a release will be made later today

mossy bay
#

@high charm @sharp raven @arctic delta I just fixed it + reduced VRAM usage by 25% + made it faster + fixed vision! Please update Unsloth via pip install --upgrade --force-reinstall --no-cache-dir --no-deps unsloth unsloth_zoo

high charm
sharp raven
#

@mossy bay trying to run this notebook on an A100 and I get this:
CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

#

This was just testing image inference with model_name = "unsloth/gemma-3n-E4B-it"
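
The error text itself suggests CUDA_LAUNCH_BLOCKING=1 for debugging. A minimal sketch of setting it from a notebook cell follows, on the assumption that the cell runs before torch is imported. `enable_sync_cuda_errors` is a made-up helper name.

```python
import os
import sys


def enable_sync_cuda_errors() -> bool:
    """Set CUDA_LAUNCH_BLOCKING=1; return True if torch was not imported yet."""
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    # The variable is read when CUDA initializes, so setting it after torch
    # has been imported (and possibly initialized CUDA) may have no effect.
    return "torch" not in sys.modules


if not enable_sync_cuda_errors():
    print("torch is already imported; restart the runtime and run this first.")
```

With synchronous launches the Python stack trace should point closer to the call that triggered the assert, at the cost of slower execution.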

glacial bramble
#

guys don't tag in every single message 🙏🏻. we can see the thread

#

and it works

#

if you're on a local machine, make sure to update your current install of unsloth

#

and make sure that it did update
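
One way to confirm the update took effect is to read the installed versions from package metadata. This is only a sketch using the standard library; the package list is an assumption based on the packages named in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(["unsloth", "unsloth_zoo", "transformers"]).items():
    print(f"{name}: {ver or 'not installed'}")
```

The metadata is read from disk, so it reflects what pip installed, but an already running kernel keeps the old modules loaded until it is restarted.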

sharp raven
#

same. worked for me too. 🤔

glacial bramble
#

yes, it can be one of several reasons (notebook cache, etc.)

sharp raven
#

Understood. Thanks.

glacial bramble
#

✅

sharp raven
#

I created a new workstation with a t4 attached and same result.
unknown:0: unknown: block: [55,0,0], thread: [384,0,0] Assertion index out of bounds: 0 <= tmp6 < 128 failed.

glacial bramble
#

and we're still talking about gemma3-n , correct?

sharp raven
#

gemma3-n, yes. not colab. I downloaded the colab nb as an ipynb and am running it on a gcp cloud workstation with a t4 attached.

glacial bramble
#

in this case can you try one thing for me (if you've got the time):
in a new environment

#

avoid installing into the system wide python

#

install from main repo instead of pypi

#

wait no, too much hassle

sharp raven
#

yep... using a venv

#

always in fact ๐Ÿ™‚

elfin musk
#

the same CUDA error. win, docker, 4070/5090 ... and the failing line is hard_emb = self.embedding(input_ids - self.vocab_offset)
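
An "index out of bounds" assert next to an embedding lookup like the one quoted here can be narrowed down on the CPU by checking which shifted IDs fall outside the table. The sketch below does not identify the cause of the error in this thread; it only helps rule out bad input IDs. `vocab_offset` comes from the quoted line, while `num_embeddings` and all the numbers are made up for illustration.

```python
def out_of_range_ids(input_ids, vocab_offset, num_embeddings):
    """Return the IDs whose shifted index falls outside [0, num_embeddings)."""
    return [
        token_id
        for token_id in input_ids
        if not 0 <= token_id - vocab_offset < num_embeddings
    ]


# Made-up numbers: a table of 128 rows whose IDs start at offset 1000.
print(out_of_range_ids([1000, 1127, 1128, 5], vocab_offset=1000, num_embeddings=128))
# → [1128, 5]
```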

mossy bay
#

Oh that's a weird issue

#

Wait

#

please update Unsloth, timm, unsloth_zoo

#

pip install --upgrade --force-reinstall --no-deps --no-cache-dir unsloth unsloth_zoo timm transformers

arctic delta
#

I am using an H100 GPU machine. I also tried the same code as given in the Colab notebook, but I am still getting the same issue.

#

CUDA error : device-side assert triggered

arctic delta
mossy bay
#

@arctic delta is it the same issue as above? or another issue

elfin musk
#

updating libraries did not help. log for 5090, same problem. unsloth/Mistral-Small-3.2-24B-Instruct-2506-unsloth-bnb-4bit works with the same code

mossy bay
#

@glass lagoon could you investigate these issues!

hot flame
#

does unsloth support gemma-3n vision-to-text finetuning?

glass lagoon
#

I suspect one way to solve this is using torch 2.6 and triton 3.2.
triton 3.2 should get installed with torch 2.6 if you install torch like

pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126
#

When running the colab notebooks locally you should also run the notebooks in a fresh environment so that pip installs don't override any existing installs.
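
A quick check that the notebook kernel is really running inside a fresh virtual environment could look like the sketch below. It only detects venv/virtualenv environments; conda environments are not covered by this test.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv/virtualenv."""
    # In a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


print("virtual environment:", in_virtualenv())
print("interpreter:", sys.executable)
```

If the interpreter path points at the system Python, pip installs from the notebook will land there as well.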

glass lagoon
#

should work on the 4000 series, could you try?

glass lagoon
elfin musk
glass lagoon
#

got it, but it would be greatly appreciated if you could test the suggestion on the 40xx card without disabling compile.

glass lagoon
#

oh interesting. this is without DISABLE_COMPILE?

elfin musk
#

yes

glass lagoon
#

ok so at least 1 step worked. progress!

#

there are some things I'm noticing, like the python version warning for xformers. I also want to ask how you are installing transformers and timm?

elfin musk
glass lagoon
#

yes torch compile does help with vram usage as it will fuse ops together if possible

#

but i'm wondering if this still stems from the xformers warning

elfin musk
#

I built a test container like this

FROM nvidia/cuda:12.8.0-devel-ubuntu24.04

ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1

RUN --mount=type=cache,target=/var/cache/apt \
    apt-get update && \
    apt-get install -y \
        python3.12 python3.12-venv python3.12-dev pip \
        supervisor rsync git wget mc nano \
        cmake pkg-config libcurl4-gnutls-dev build-essential && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# libcairo2 libcairo2-dev 

RUN python3.12 -m venv /opt/venv

ENV TORCH_CUDA_ARCH_LIST="8.9 12.0"
ENV CUDA_HOME=/usr/local/cuda-12.8
ENV PATH=$CUDA_HOME/bin:$PATH
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=video,compute,utility

ENV PATH="/opt/venv/bin:$PATH"

ENV MAX_JOBS=16

RUN pip install --upgrade pip setuptools wheel ninja
RUN pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126
RUN pip install --no-deps psutil regex rich bitsandbytes accelerate peft trl==0.15.2 cut_cross_entropy unsloth_zoo
RUN pip install transformers sentencepiece protobuf "datasets>=3.4.1" huggingface_hub hf_transfer
RUN pip install --no-deps unsloth
RUN pip install --no-deps triton
RUN pip install --no-deps xformers
RUN pip install --no-deps --upgrade timm

RUN echo "[supervisord]\n\
nodaemon=true\n\
logfile=/dev/null\n\
logfile_maxbytes=0\n" > /etc/supervisor/conf.d/supervisord.conf

WORKDIR /app

CMD ["/usr/bin/supervisord"]
glass lagoon
#

oh you're using a cuda 12.8 image with a cuda 12.6 torch build. that's at least one issue I see, although in practice I'm not exactly sure what happens.

that's also an older trl version and I don't think you need to specify a version anymore

#

i'm not sure that torch 2.6 ships with cuda12.8 tbh

#

you can try setting a specific version on xformers because it seems like the no-deps is causing issues on the 12.8 image

#

but to make it easier after installing torch i think you could just do
pip install unsloth, then do the transformers and timm upgrades

#

but for this test i would use a 12.6 image instead of 12.8 if possible
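
The mismatch discussed here (CUDA 12.8 image, cu126 torch wheel) can be spotted from the version string alone. The sketch below assumes the wheel carries a local tag such as +cu126 and that the last digit of that tag is the minor version; both helper names are hypothetical. In a real environment the inputs would come from torch.__version__ and the image tag.

```python
import re


def wheel_cuda_version(torch_version: str):
    """Extract '12.6' from a version string like '2.6.0+cu126', else None."""
    match = re.search(r"\+cu(\d+)$", torch_version)
    if match is None:
        return None  # CPU build or a build without a +cuXYZ local tag
    digits = match.group(1)
    # Assumption: the last digit is the minor version (cu126 -> 12.6).
    return f"{digits[:-1]}.{digits[-1]}"


def matches_image(torch_version: str, image_cuda: str) -> bool:
    """Compare the wheel's CUDA major.minor with the image's (e.g. '12.8.0')."""
    wheel = wheel_cuda_version(torch_version)
    image_major_minor = ".".join(image_cuda.split(".")[:2])
    return wheel is not None and wheel == image_major_minor


print(matches_image("2.6.0+cu126", "12.8.0"))  # the failing container
print(matches_image("2.6.0+cu126", "12.6.0"))  # the suggested image
```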

elfin musk
#

cuda 12.6 image: works without errors and without xformers

#
   \\   /|    Num examples = 9,337 | Num Epochs = 3 | Total steps = 7,005
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 2
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 2 x 1) = 4
 "-____-"     Trainable parameters = 42,270,720 of 5,481,708,992 (0.77% trained)
  0%|                                                                                                                                     | 0/7005 [00:00<?, ?it/s]`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
Unsloth: Will smartly offload gradients to save VRAM!
{'loss': 7.0253, 'grad_norm': 3990330.25, 'learning_rate': 0.0, 'epoch': 0.0}                                                                                      
{'loss': 7.3983, 'grad_norm': 4439897.5, 'learning_rate': 4.27960057061341e-08, 'epoch': 0.0}                                                                      
{'loss': 4.45, 'grad_norm': 207282.3125, 'learning_rate': 8.55920114122682e-08, 'epoch': 0.0}                                                                      
{'loss': 3.7255, 'grad_norm': 116172.859375, 'learning_rate': 1.283880171184023e-07, 'epoch': 0.0}                                                                 
{'loss': 3.6248, 'grad_norm': 47774.57421875, 'learning_rate': 1.711840228245364e-07, 'epoch': 0.0}                                                                
{'loss': 3.4556, 'grad_norm': 25535.576171875, 'learning_rate': 2.139800285306705e-07, 'epoch': 0.0}                                                               
{'loss': 3.8237, 'grad_norm': 24524.2890625, 'learning_rate': 2.567760342368046e-07, 'epoch': 0.0}                                                                 
{'loss': 3.472, 'grad_norm': 16085.9140625, 'learning_rate': 2.9957203994293864e-07, 'epoch': 0.0}                                                                 
{'loss': 3.5032, 'grad_norm': 16694.767578125, 'learning_rate': 3.423680456490728e-07, 'epoch': 0.0}
glass lagoon
#

ok great at least that works

#

so now if I rewind a bit, if you use a 40xx machine but run a 12.8 image, with
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128 and not changing anything else, will it fail?

elfin musk
glass lagoon
elfin musk
#

yes

elfin musk
glass lagoon
#

yea thing with the 5090 is that it's a newer arch, so i'm unsure how it ends up compiling

#

i need to investigate. could be that certain ops aren't yet supported.

elfin musk
# glass lagoon i need to investigate. could be that certain ops aren't yet supported.

strange thing. cuda image 12.8:
with pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126 -> not working on 4070
or with pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128 -> not working on 4070

but when i downgraded libs to:
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126 -> working on 4070 with xformers (no warning)

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128 --force-reinstall -> not working again

pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126 --force-reinstall -> working on 4070 without xformers ( xFormers can't load C++/CUDA extensions)

glass lagoon
#

yea that's interesting, the reinstall must also change some of the reqs that were initially satisfied.

elfin musk
glass lagoon
elfin musk
#

Working docker container for 40xx. With torch > 2.6.0 I get the cuda error

FROM nvidia/cuda:12.6.0-devel-ubuntu24.04

ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1

RUN --mount=type=cache,target=/var/cache/apt \
    apt-get update && \
    apt-get install -y \
        python3.12 python3.12-venv python3.12-dev pip \
        supervisor rsync git wget mc nano \
        cmake pkg-config libcurl4-gnutls-dev build-essential && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# libcairo2 libcairo2-dev 

RUN python3.12 -m venv /opt/venv

ENV TORCH_CUDA_ARCH_LIST="8.9 12.0"
ENV CUDA_HOME=/usr/local/cuda-12.6
ENV PATH=$CUDA_HOME/bin:$PATH
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=video,compute,utility

ENV PATH="/opt/venv/bin:$PATH"

ENV MAX_JOBS=16

RUN pip install --upgrade pip setuptools wheel ninja
RUN pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 xformers==0.0.29.post3 --index-url https://download.pytorch.org/whl/cu126
RUN pip install --no-deps psutil regex rich bitsandbytes accelerate peft trl cut_cross_entropy unsloth_zoo
RUN pip install transformers sentencepiece protobuf "datasets>=3.4.1" huggingface_hub hf_transfer
RUN pip install --no-deps unsloth
RUN pip install --no-deps triton
RUN pip install --no-deps --upgrade timm

RUN echo "[supervisord]\n\
nodaemon=true\n\
logfile=/dev/null\n\
logfile_maxbytes=0\n" > /etc/supervisor/conf.d/supervisord.conf

WORKDIR /app

CMD ["/usr/bin/supervisord"]
glass lagoon
#

ok you could try torch 2.7.0 with triton 3.2.0. just put the following triton line after installing torch 2.7.0
RUN pip install --no-deps triton==3.2.0

karmic crystalBOT
#

Torch 2.7.0 is not officially released with CUDA 12.6 or 12.8 wheels as of now, and Unsloth's compatibility matrix and install scripts do not list torch 2.7.0 as supported for CUDA 12.6/12.8. Additionally, using triton==3.2.0 with torch 2.7.0 may cause incompatibility, as torch 2.7.0 expects triton>=3.3.0 for Blackwell/RTX 50xx support and newer CUDA versions. This combination is not documented as supported in Unsloth's official install guides or troubleshooting resources. Attempting this may result in import or runtime errors due to mismatched dependencies and unsupported CUDA/torch/triton combinations. For 40xx cards, torch==2.6.0 with CUDA 12.6 and triton==3.2.0 is the recommended, tested configuration; for 50xx/Blackwell, torch==2.7.0 with triton>=3.3.1 and CUDA 12.8 is required, as per the official Blackwell install guide and Unsloth documentation.

Would you like a step-by-step breakdown of the correct install sequence for your specific GPU and CUDA version?


glass lagoon
#

lol these are good points unsloth bot

karmic crystalBOT
#

I'm here to help with any questions or troubleshooting related to Unsloth, including installation, compatibility, and fine-tuning! If you have a specific Unsloth setup, error, or workflow you want to discuss, just let me know the details and I'll do my best to assist. If you need step-by-step guidance for a particular GPU, CUDA, or library version, feel free to ask!

glass lagoon
#

for now gemma3n will not work with torch 2.7 / triton 3.3. It will only work with torch 2.6 / triton 3.2. I am working on a fix.
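
At this point in the thread the only combination reported as working for Gemma 3n is torch 2.6 with triton 3.2. A small guard for that could be sketched as follows; `KNOWN_GOOD` and both function names are illustrative, and the set would need updating once a fix lands.

```python
def major_minor(version: str):
    """Turn '2.6.0+cu126' into (2, 6)."""
    core = version.split("+")[0]
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


# Pair reported as working for Gemma 3n in the message above.
KNOWN_GOOD = {((2, 6), (3, 2))}


def is_known_good(torch_version: str, triton_version: str) -> bool:
    """True if the (torch, triton) major.minor pair is in KNOWN_GOOD."""
    return (major_minor(torch_version), major_minor(triton_version)) in KNOWN_GOOD


print(is_known_good("2.6.0+cu126", "3.2.0"))
print(is_known_good("2.7.0+cu128", "3.3.0"))
```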

mossy bay
#

just fixed it!!! sorry for the delay

#

pip install --upgrade --force-reinstall --no-cache-dir --no-deps unsloth unsloth_zoo

keen mantle
#

now i get

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.11/dist-packages/unsloth_zoo/loss_utils.py in _unsloth_get_batch_samples(self, epoch_iterator, num_batches, device, *args, **kwargs)
    315                 if "token_type_ids" in x:
--> 316                     token_type_ids = kwargs["token_type_ids"]
    317                     mark_static (token_type_ids, 0)

KeyError: 'token_type_ids'

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
/usr/local/lib/python3.11/dist-packages/accelerate/utils/memory.py in decorator(*args, **kwargs)
    166             try:
--> 167                 return function(batch_size, *args, **kwargs)
    168             except Exception as e:

/usr/local/lib/python3.11/dist-packages/unsloth_zoo/compiler.py in _fast_inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)

/usr/local/lib/python3.11/dist-packages/unsloth_zoo/loss_utils.py in _unsloth_get_batch_samples(self, epoch_iterator, num_batches, device, *args, **kwargs)
    327         except Exception as exception:
--> 328             raise RuntimeError(exception)
    329     pass

RuntimeError: 'token_type_ids'

in kaggle
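
The frame at lines 315-316 of the traceback tests for the key in `x` but reads it from `kwargs`, which would explain a KeyError when only `x` has it. The sketch below reproduces that pattern and one safe variant. It is an illustration only, not the actual unsloth_zoo fix, and it assumes `x` behaves like a dict.

```python
def read_token_type_ids(x: dict, kwargs: dict):
    """Look the key up in the same mapping that was tested for it."""
    if "token_type_ids" in x:
        # Reading kwargs["token_type_ids"] here instead would raise KeyError
        # whenever the key is present in x but absent from kwargs.
        return x["token_type_ids"]
    return kwargs.get("token_type_ids")  # None if neither mapping has it


batch = {"input_ids": [1, 2, 3], "token_type_ids": [0, 0, 1]}
print(read_token_type_ids(batch, kwargs={}))
# → [0, 0, 1]
```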

glass lagoon
#

during training?

#

dang I checked colab earlier and it was working. wonder what's different in kaggle

#

oh yea its inside the training loop

#

it might be a quick fix. I have a branch you could test if you have capacity. pip install --no-deps git+https://github.com/mmathew23/unsloth-zoo.git@gemma3nx

#

@keen mantle

keen mantle
#

yea it works thanks

glacial bramble
#

my bad i was wrong

keen mantle
#

it happens randomly when training

glacial bramble
#

they did release on pypi but didn't add to the releases page

#

😄

hot flame
#

i got the same recompilation error as well, it happens randomly at around step 100-200, Gemma-3n


   1752         result = None

/content/unsloth_compiled_cache/unsloth_compiled_module_gemma3n.py in forward(self, input_ids, inputs_embeds)
   1430         inputs_embeds: Optional[torch.Tensor] = None,
   1431     ) -> torch.Tensor:
-> 1432         return Gemma3nMultimodalEmbedder_forward(self, input_ids, inputs_embeds)
   1433 
   1434 

/usr/local/lib/python3.11/dist-packages/torch/_dynamo/eval_frame.py in _fn(*args, **kwargs)
    572 
    573             try:
--> 574                 return fn(*args, **kwargs)
    575             finally:
    576                 # Restore the dynamic layer stack depth if necessary.

RuntimeError: Recompilation triggered with skip_guard_eval_unsafe stance. This usually means that you have not warmed up your model with enough inputs such that you can guarantee no more recompilations.
keen mantle
#

unsloth-zoo==2025.7.4 was working fine in kaggle so i am downgrading to this

glass lagoon
mossy bay
#

oh the recompilation error is normal - you can ignore it

#

i might have to disable some things - the goal was to make it faster 😦

#

does it just error out and thats all?

#

ie it stops running?

hot flame
hot flame
#

i have reverted to unsloth==2025.7.3 unsloth_zoo==2025.7.4 for now and the problem has disappeared, please let me know if you want me to test again!
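
For anyone who wants to reproduce this pin from inside a notebook, a sketch of building the pip command for the current interpreter is shown below. The versions are the ones quoted in the message above; `pip_pin_command` is a hypothetical helper.

```python
import sys


def pip_pin_command(pins: dict) -> list:
    """Build a `python -m pip install name==version ...` argument list."""
    specs = [f"{name}=={ver}" for name, ver in pins.items()]
    # Using sys.executable targets the interpreter that runs the notebook.
    return [sys.executable, "-m", "pip", "install", *specs]


cmd = pip_pin_command({"unsloth": "2025.7.3", "unsloth_zoo": "2025.7.4"})
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```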

mossy bay
#

oh ok ok

#

ill fix the recompilation issue asap!

#

its mainly my fault for trying to make things faster

#

in the end it was more problematic

mossy bay
#

Fixed the recompilation issue!

#

Please do pip install --upgrade --force-reinstall --no-cache-dir --no-deps unsloth unsloth_zoo

hot flame
#

After installing the latest version, the recompilation issue no longer appears, thank you Unsloth slothhearts