Negative loss during training | Unsloth AI | Page 1

wild nimbus Jul 3, 2025, 3:19 PM

#

Hi @mathew, I'm attaching the code I used for fine-tuning. I am getting a negative loss (numerically close to 0) on my dataset for some targets.

My code is based on the Unsloth tutorial here: https://colab.research.google.com/github/timothelaborie/text_classification_scripts/blob/main/unsloth_classification.ipynb . I am attaching a modified dataset in which the number of classes is restricted to 2 and the prevalence of the positive class is 1%.

Here are the inputs of my code:

LLMpath=/cluster/home/Qwen2.5-0.5B-bnb-4bit
notesPath=/cluster/home/finance_sentiment_multiclass_low_prevalence.csv
resultsDir=/cluster/home/
batchSizeTest=3
batchSizeTrain=2
learningRate=0.00005
nEpochs=3
gradientSteps=4

With the attached dataset, I cannot exactly reproduce the problem I see on my original dataset; I only get a loss value of 0. But I suspect that the negative loss I am seeing is due to 1) low prevalence and 2) round-off errors. For the other targets in my actual dataset, I don't have this problem when the positive class has high prevalence.
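As a rough sanity check on the low-prevalence suspicion, the sketch below computes the cross-entropy of a model that only predicts the class prior, and the chance that a single optimizer update sees no positive example. It assumes that `gradientSteps` means gradient accumulation steps and that examples are drawn independently; neither is confirmed in the thread.

```python
import math


def binary_entropy(p: float) -> float:
    """Cross-entropy (in nats) of a model that always predicts the class prior p."""
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


prevalence = 0.01      # positive-class rate stated in the post
batch_size_train = 2   # batchSizeTrain from the listed inputs
gradient_steps = 4     # gradientSteps, ASSUMED to be gradient accumulation steps

# Number of examples that contribute to one optimizer update (under the assumption above).
examples_per_update = batch_size_train * gradient_steps

# Probability that none of those examples is positive, assuming independent sampling.
p_no_positive = (1.0 - prevalence) ** examples_per_update

baseline_loss = binary_entropy(prevalence)
balanced_loss = binary_entropy(0.5)

print(f"prior-only loss at 1% prevalence:  {baseline_loss:.4f}")
print(f"prior-only loss at 50% prevalence: {balanced_loss:.4f}")
print(f"P(update sees no positive example): {p_no_positive:.3f}")
```

Under these assumptions, a loss very close to 0 is reachable just by predicting the majority class, so any small numerical error is large relative to the loss itself. This does not explain a negative sign on its own.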

I have also attached my unsloth environment. Is there a problem with my code? Or do you have any insight into how my data preparation might have gone wrong? My actual dataset has very long texts. I normally use max_seq_length=8192.

Thank you so much!

#

Here's my code:

📎 message.txt

#

Here's my unsloth environment:

📎 message.txt

#

Here's the link to the modified dataset described above: https://drive.google.com/file/d/1oiAbb-Cbk4O-8978Zff-sVAaiN9NVUH2/view?usp=sharing

#

@open gull Hi mathew, sorry for the delay. I have provided the relevant details above. I'd appreciate any help on this matter.

open gull Jul 3, 2025, 3:33 PM

#

oh maybe try passing SFTConfig instead of TrainingArguments. I bet your sequence is getting accidentally truncated

#

from trl import SFTTrainer, SFTConfig

and then directly swap TrainingArguments and SFTConfig
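A minimal sketch of that swap, assuming the listed inputs map onto the standard Hugging Face argument names (`batchSizeTrain` to `per_device_train_batch_size`, `gradientSteps` to `gradient_accumulation_steps`, and so on). The `output_dir` value is a placeholder. The name of the sequence-length argument has changed between `trl` releases, so it is left out here; check the installed version before setting it.

```python
from trl import SFTConfig

# SFTConfig subclasses transformers.TrainingArguments, so the shared
# arguments keep the same names when swapping one for the other.
training_args = SFTConfig(
    output_dir="outputs",             # placeholder path, not from the thread
    per_device_train_batch_size=2,    # batchSizeTrain (assumed mapping)
    gradient_accumulation_steps=4,    # gradientSteps (assumed mapping)
    learning_rate=5e-5,               # learningRate
    num_train_epochs=3,               # nEpochs
)

# The config is then passed where TrainingArguments was passed before, e.g.
# SFTTrainer(model=model, args=training_args, train_dataset=train_dataset, ...)
```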

wild nimbus Jul 5, 2025, 3:09 PM

#

Hi @open gull, I tried SFTConfig instead of TrainingArguments. I am still getting negative values for the loss, and the results did not seem to change after the switch.

The negative loss values are very small, however: around -0.0002 or -0.0005, which is close to 0. Is this just round-off error?
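For context on the round-off question: in exact arithmetic, cross-entropy is `-log(p)` with `p <= 1`, so it cannot be negative. The sketch below checks that a numerically stable float64 implementation also stays non-negative on random two-class logits. It is only an illustration, not the loss code used by the trainer. A slightly negative value would therefore point to reduced-precision arithmetic or to a loss that is not plain cross-entropy; the thread does not establish which.

```python
import math
import random


def cross_entropy(logits, target):
    """Numerically stable cross-entropy: logsumexp(logits) - logits[target]."""
    m = max(logits)
    # The sum includes exp(0) == 1 for the largest logit, so its log is >= 0.
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return lse - logits[target]


random.seed(0)
losses = []
for _ in range(10_000):
    # Two-class logits with a wide range, so many cases are nearly saturated.
    logits = [random.uniform(-30.0, 30.0) for _ in range(2)]
    target = random.randrange(2)
    losses.append(cross_entropy(logits, target))

min_loss = min(losses)
print(f"smallest loss over 10,000 random cases: {min_loss:.3e}")
```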

open gull Jul 5, 2025, 3:34 PM

#

the thing is if your loss starts at 0 that sounds strange to me. so that makes me think the dataset isn't getting processed correctly or that there's a setup issue

#

in your notebook i see `max_seq_length = 2048` so if it's truly long text that might be the issue
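To illustrate why truncation matters for this setup, the toy sketch below assumes the class token is the last token of each training example and that truncation cuts from the right. Both are assumptions about the pipeline, and the prompt lengths and `LABEL_ID` are made up for the example.

```python
LABEL_ID = -1  # hypothetical id standing in for the class token ("0" or "1")


def build_example(n_prompt_tokens):
    """Toy tokenized example: prompt tokens followed by a single label token."""
    return [0] * n_prompt_tokens + [LABEL_ID]


def label_survives(token_ids, max_seq_length):
    """True if the label token is still present after right-side truncation."""
    truncated = token_ids[:max_seq_length]
    return truncated[-1] == LABEL_ID


# Made-up prompt lengths (in tokens), mixing short and very long texts.
prompt_lengths = [500, 1900, 2047, 2048, 5000, 8000]
examples = [build_example(n) for n in prompt_lengths]

lost_at_2048 = sum(not label_survives(ex, 2048) for ex in examples)
lost_at_8192 = sum(not label_survives(ex, 8192) for ex in examples)

print(f"examples losing their label at max_seq_length=2048: {lost_at_2048}")
print(f"examples losing their label at max_seq_length=8192: {lost_at_8192}")
```

Running the same kind of count on the real tokenized dataset would show whether any example is long enough to lose its label at the configured `max_seq_length`.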

#

oh you're also playing with lm head in your other code so maybe that's causing an issue as well.

wild nimbus Jul 5, 2025, 5:31 PM

#

open gull oh you're also playing with lm head in your other code so maybe that's causing a...

Yes, but that's what the tutorial on the unsloth website did. How else can you do text classification if you don't restrict your lm_head to output only 2 tokens (0 or 1)?
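One way to picture what restricting the `lm_head` to two tokens amounts to is sketched below: only the logits of the two class tokens are kept, and the cross-entropy is computed over that pair. The vocabulary size, logit values and token ids are hypothetical, and this is not taken from the tutorial's code.

```python
import math


def two_class_loss(vocab_logits, class_token_ids, label):
    """Cross-entropy over only the class-token logits.

    label is an index into class_token_ids (0 or 1), not a vocabulary id.
    """
    selected = [vocab_logits[i] for i in class_token_ids]
    m = max(selected)
    lse = m + math.log(sum(math.exp(x - m) for x in selected))
    return lse - selected[label]


# Toy 6-token vocabulary; ids 2 and 4 are HYPOTHETICAL ids of the tokens "0" and "1".
vocab_logits = [1.5, -0.3, 4.0, 0.2, 1.0, -2.0]
class_token_ids = [2, 4]

loss_if_negative = two_class_loss(vocab_logits, class_token_ids, label=0)
loss_if_positive = two_class_loss(vocab_logits, class_token_ids, label=1)

print(f"loss when the true class is 0: {loss_if_negative:.4f}")
print(f"loss when the true class is 1: {loss_if_positive:.4f}")
```

With the model strongly favoring class 0, the loss for a true negative is small and the loss for a true positive is large, and both stay non-negative.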

wild nimbus Jul 5, 2025, 5:33 PM

#

open gull in your notebook i see `max_seq_length = 2048` so if it's truly long text that mi...

In the code for my example, I set it to 8192.

wild nimbus Jul 5, 2025, 5:34 PM

#

open gull in your notebook i see `max_seq_length = 2048` so if it's truly long text that mi...

This is what the loss looks like for my particular dataset. As you can see, the loss doesn't start at 0. For 3 or so steps, the loss is negative but close to 0, e.g. -0.0002, as I mentioned above.