# Fine-tuning on Kaggle environment

meager chasm
#

Hello everyone, I moved my fine-tuning logic to Kaggle, but the training step just pauses and doesn't continue. Is there a different config I need to set so that training proceeds like it does on Colab?

shut falconBOT
#

There is no special config required for Unsloth to run on Kaggle versus Colab, but some users report that training can hang or the kernel can die on Kaggle, especially with certain GPUs (e.g., P100 vs T4) or large models. Switching to a T4 GPU, reducing batch size, or using SFTConfig instead of TrainingArguments can help. Also, ensure your package versions (transformers, trl, unsloth) match those used in working Colab runs, and try manually tokenizing your dataset if you encounter tokenization errors or kernel crashes. See this issue and this issue for details.

Would you like a step-by-step troubleshooting guide for Kaggle-specific hangs or kernel deaths?

meager chasm
#

@shut falcon I'm using a T4, SFTConfig, and the latest packages.

shut falconBOT
#

If you are using a T4 GPU, SFTConfig, and the latest packages, but training still pauses or does not proceed on Kaggle, this is a known issue for some users. The most common solutions are:

  1. Reduce batch size (e.g., set per_device_train_batch_size=1) and/or max_seq_length (e.g., 1024 or lower).
  2. Restart the kernel and ensure no other processes are using GPU memory.
  3. Check for version mismatches: make sure transformers, trl, and unsloth are all up to date and match the versions used in official Unsloth notebooks.
  4. Try running the same code on Colab. If it works there, the issue is likely a Kaggle-specific resource or environment limitation.
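
For the version check in step 3, a minimal sketch that prints what is installed so the output can be compared between the Kaggle and Colab sessions. The helper name `installed_versions` is made up for illustration; it only uses the standard library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("transformers", "trl", "unsloth")):
    """Return {package name: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Run this in both environments and compare the printed lines.
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

If the printed versions differ between the two environments, pinning Kaggle to the Colab versions is a reasonable first thing to try.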

See this issue and this issue for more troubleshooting.

Would you like more detailed steps or a sample minimal working Kaggle notebook?

pseudo osprey
#

Try without wandb

meager chasm
#

@pseudo osprey How would I disable wandb?

pseudo osprey
#

report_to = "none",
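
For context, a rough sketch of where that argument goes, assuming the config is built with trl's `SFTConfig` as mentioned earlier in the thread. The `output_dir` value and the `build_config` helper are placeholders for illustration, not something from this thread.

```python
# Keyword arguments for the training config.
# report_to="none" turns off all logging integrations, including wandb.
training_kwargs = dict(
    output_dir="outputs",            # placeholder path, adjust as needed
    per_device_train_batch_size=1,   # small batch size, as suggested above
    report_to="none",
)


def build_config(**overrides):
    """Build an SFTConfig from the kwargs above (requires trl to be installed)."""
    # Imported lazily so the dict can be inspected without trl present.
    from trl import SFTConfig

    return SFTConfig(**{**training_kwargs, **overrides})
```

The result would then be passed as `args=build_config()` when creating the `SFTTrainer`.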

meager chasm
#

Okay, thanks, will try that

#

@pseudo osprey It worked, thanks a lot.

pseudo osprey
#

If you want to use wandb, run:
!wandb login token

#

Replace token with your own token.
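
An alternative sketch that avoids pasting the token into a notebook cell, assuming the key is stored as a Kaggle secret or an environment variable named `WANDB_API_KEY`. That name and both helper functions are assumptions for illustration.

```python
import os


def resolve_wandb_key(secret_name="WANDB_API_KEY"):
    """Look for the key in the environment first, then in Kaggle secrets."""
    key = os.environ.get(secret_name)
    if key:
        return key
    try:
        # Only importable inside a Kaggle notebook session.
        from kaggle_secrets import UserSecretsClient
    except ImportError:
        return None
    return UserSecretsClient().get_secret(secret_name)


def login_to_wandb():
    """Log in to wandb with the resolved key (requires wandb to be installed)."""
    import wandb

    key = resolve_wandb_key()
    if key is None:
        raise RuntimeError("No wandb API key found in the environment or Kaggle secrets")
    wandb.login(key=key)
```

Calling `login_to_wandb()` before creating the trainer should have the same effect as the `!wandb login` command.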

meager chasm
#

I'm using opensloth

#

@slow jungle

slow jungle
#

Ah makes sense thanks!

#

Wait which model are you using? I was trying with Gemma 3n but was running into errors

meager chasm
#

unsloth/gemma-3n-E2B-it-unsloth-bnb-4bit

slow jungle
#

Oh shoot, guess I’ll be trying again then, thanks!

slow jungle
#

Welp, that was some stupid sh*t. I ended up using accelerate instead of opensloth, but I eventually got Kaggle to stop loading both model weights onto one GPU at first

#

don't use device_map="auto"...
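
One common pattern for this, sketched here as an assumption rather than what was actually run in this thread: give each launched process its own GPU by reading the `LOCAL_RANK` environment variable that `accelerate launch` sets, and pass an explicit device map. The helper name `per_process_device_map` is made up.

```python
import os


def per_process_device_map(env=os.environ):
    """Map the whole model to the GPU that matches this process's local rank."""
    # LOCAL_RANK is set per process by the launcher; fall back to GPU 0
    # when running as a single process.
    local_rank = int(env.get("LOCAL_RANK", 0))
    # The empty-string key means "the entire model".
    return {"": local_rank}


# Usage sketch (needs transformers and a GPU, so it is left commented out):
# model = AutoModelForCausalLM.from_pretrained(
#     model_name, device_map=per_process_device_map()
# )
```

With two T4s, process 0 then loads onto GPU 0 and process 1 onto GPU 1, instead of both landing on the same device.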