Fine-tuning on kaggle environment | Unsloth AI

meager chasm Jul 18, 2025, 10:11 PM

#

Hello everyone, I moved my fine-tuning logic to Kaggle, but the training step just pauses and doesn't continue. Is there a different config I need to set so that training proceeds like it does on Colab?

shut falcon BOT Jul 18, 2025, 10:11 PM

#

There is no special config required for Unsloth to run on Kaggle versus Colab, but some users report that training can hang or the kernel can die on Kaggle, especially with certain GPUs (e.g., P100 vs T4) or large models. Switching to a T4 GPU, reducing batch size, or using SFTConfig instead of TrainingArguments can help. Also, ensure your package versions (transformers, trl, unsloth) match those used in working Colab runs, and try manually tokenizing your dataset if you encounter tokenization errors or kernel crashes. See this issue and this issue for details.

Would you like a step-by-step troubleshooting guide for Kaggle-specific hangs or kernel deaths?

meager chasm Jul 18, 2025, 10:23 PM

#

@shut falcon I'm using T4 and SFTConfig and the latest packages.

shut falcon BOT Jul 18, 2025, 10:23 PM

#

If you are using a T4 GPU, SFTConfig, and the latest packages, but training still pauses or does not proceed on Kaggle, this is a known issue for some users. The most common solutions are:

- Reduce batch size (e.g., set per_device_train_batch_size=1) and/or max_seq_length (e.g., 1024 or lower).
- Restart the kernel and ensure no other processes are using GPU memory.
- Check for version mismatches: make sure transformers, trl, and unsloth are all up to date and match the versions used in official Unsloth notebooks.
- Try running the same code on Colab: if it works there, the issue is likely Kaggle-specific resource or environment limitations.

See this issue and this issue for more troubleshooting.

Would you like more detailed steps or a sample minimal working Kaggle notebook?

pseudo osprey Jul 18, 2025, 10:31 PM

#

Try without wandb

meager chasm Jul 18, 2025, 10:38 PM

#

@pseudo osprey how would I disable wandb?

pseudo osprey Jul 18, 2025, 10:43 PM

#

report_to = "none",
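
A minimal sketch of where this argument goes, assuming the trainer is configured through trl's `SFTConfig` as mentioned earlier in the thread. The output directory and batch size are illustrative values, and setting `WANDB_MODE` is an extra precaution that nobody in the thread suggested.

```python
import os

# Assumption: extra safeguard, not from the thread. The wandb client reads
# WANDB_MODE, and "disabled" turns its calls into no-ops.
os.environ["WANDB_MODE"] = "disabled"

# Keyword arguments intended for trl's SFTConfig (values are illustrative).
sft_kwargs = {
    "output_dir": "outputs",           # hypothetical path
    "per_device_train_batch_size": 1,  # small batch to limit GPU memory use
    "report_to": "none",               # turn off wandb and other loggers
}

# In the notebook, with trl installed:
#   from trl import SFTConfig
#   config = SFTConfig(**sft_kwargs)
print(sft_kwargs["report_to"])
```

With `report_to="none"` the trainer does not try to start a wandb run, which is presumably what was stalling the training step here.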

meager chasm Jul 18, 2025, 11:18 PM

#

Okay, thanks, will try that

#

@pseudo osprey It worked, thanks a lot.

pseudo osprey Jul 18, 2025, 11:43 PM

#

meager chasm @pseudo osprey It worked, thanks a lot.

np

#

if you want to use wandb use
!wandb login token

#

replace token with your token
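
An alternative to typing the token into a notebook cell, sketched under the assumption that the key is supplied from somewhere private (for example a notebook secret): the wandb client also reads the `WANDB_API_KEY` environment variable. The helper name `set_wandb_key` is hypothetical.

```python
import os


def set_wandb_key(api_key: str) -> None:
    """Expose the key through WANDB_API_KEY, which the wandb client reads."""
    if not api_key:
        raise ValueError("empty wandb API key")
    os.environ["WANDB_API_KEY"] = api_key


# Call this before the trainer is created; "token" is a placeholder value.
set_wandb_key("token")
```

This keeps the login non-interactive, so the notebook does not wait for input when the run starts.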

teal vessel Jul 19, 2025, 5:50 PM

#

meager chasm @pseudo osprey It worked, thanks a lot.

There’s an accelerator with 2xT4. Are you using both GPUs, or just one here?

meager chasm Jul 21, 2025, 4:29 PM

#

teal vessel There’s an accelerator with 2xT4. Are you using both GPUs, or just one here?

I'm using both

meager chasm Jul 21, 2025, 4:30 PM

#

pseudo osprey replace token with your token

I will try that

slow jungle Jul 21, 2025, 11:57 PM

#

meager chasm I'm using both

Wait how did you set that up? Was it just accelerate on kaggle?

meager chasm Jul 22, 2025, 6:37 PM

#

I'm using opensloth

#

@slow jungle

slow jungle Jul 22, 2025, 6:38 PM

#

Ah makes sense thanks!

#

Wait which model are you using? I was trying with Gemma 3n but was running into errors

meager chasm Jul 22, 2025, 8:01 PM

#

unsloth/gemma-3n-E2B-it-unsloth-bnb-4bit

slow jungle Jul 22, 2025, 10:51 PM

#

Oh shoot, guess I’ll be trying again then, thanks!

slow jungle Jul 23, 2025, 6:44 AM

#

Welp that was some stupid sh*t, I ended up using accelerate instead of opensloth, but I eventually got kaggle to stop loading both model weights into one GPU at first

#

don't use device_map="auto"...
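
The thread does not say what replaced `device_map="auto"`. One common pattern, shown here as an assumption, is to let each process started by `accelerate launch` place the whole model on its own GPU, using the `LOCAL_RANK` environment variable that the launcher sets. The helper name `per_process_device_map` is hypothetical.

```python
import os


def per_process_device_map() -> dict:
    """Map the whole model ("" means every module) to this process's GPU."""
    # LOCAL_RANK is set per process by the launcher; default to GPU 0
    # when the script runs as a single process.
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    return {"": local_rank}


# Intended use with transformers (not executed here):
#   model = AutoModelForCausalLM.from_pretrained(
#       model_name, device_map=per_process_device_map())
print(per_process_device_map())
```

With two processes on a 2xT4 accelerator, rank 0 would load onto GPU 0 and rank 1 onto GPU 1, rather than both copies landing on the same device.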

meager chasm Jul 23, 2025, 9:26 PM

#

slow jungle Welp that was some stupid sh*t, I ended up using accelerate instead of openslot...

That sounds good
