#Negative loss during training


wild nimbus
#

Hi @mathew, I'm attaching the code I used for fine-tuning. For some targets in my dataset I am getting a negative loss (numerically close to 0).

My code is based on the Unsloth tutorial here: https://colab.research.google.com/github/timothelaborie/text_classification_scripts/blob/main/unsloth_classification.ipynb . I am attaching a modified dataset in which the number of classes is restricted to 2 and the prevalence of the positive class is 1%.

Here are the inputs of my code:

LLMpath=/cluster/home/Qwen2.5-0.5B-bnb-4bit
notesPath=/cluster/home/finance_sentiment_multiclass_low_prevalence.csv
resultsDir=/cluster/home/
batchSizeTest=3
batchSizeTrain=2
learningRate=0.00005
nEpochs=3
gradientSteps=4

With the attached dataset, I cannot exactly reproduce the error I see on my original dataset; the lowest I get is a loss value of exactly 0. But I suspect that the negative loss I am experiencing is due to 1) low prevalence and 2) round-off errors. For the other targets in my actual dataset, where the positive class has high prevalence, I don't have this problem.

I have also attached my Unsloth environment. Is there a problem with my code? Or do you have any insight into how my data preparation might have gone wrong? My actual dataset has very long texts. I normally use max_seq_length=8192.

Thank you so much!

#

@open gull Hi mathew, sorry for the delay. I have provided the relevant details above. I'd appreciate any help on this matter.

open gull
#

oh maybe try passing SFTConfig instead of TrainingArguments. I bet your sequence is getting accidentally truncated

#

from trl import SFTTrainer, SFTConfig

and then directly swap TrainingArguments for SFTConfig

wild nimbus
#

Hi @open gull, I tried SFTConfig instead of TrainingArguments. I am still getting negative values for the loss, and the results look unchanged compared with TrainingArguments.

The negative loss values are very small, however: around -0.0002 or -0.0005, which is close to 0. Is this just round-off error?
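
For reference, a quick sketch of why a negative value is surprising, assuming the reported loss is the plain mean negative log-likelihood over the N unmasked target tokens (N and p_θ are notation introduced here, not names from the attached code). In exact arithmetic that quantity cannot drop below 0, so values like -0.0002 would have to come from low-precision numerics or from a loss that is not plain cross-entropy:

```latex
\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N} \log p_\theta(y_i \mid x_i),
\qquad
0 < p_\theta(y_i \mid x_i) \le 1
\;\Rightarrow\;
\log p_\theta(y_i \mid x_i) \le 0
\;\Rightarrow\;
\mathcal{L} \ge 0 .
```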

open gull
#

the thing is, if your loss starts at 0 that sounds strange to me. so that makes me think the dataset isn't getting processed correctly or that there's a setup issue

#

in your notebook i see max_seq_length = 2048 so if it's truly long text that might be the issue
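
A rough way to check this is to count how many examples exceed the limit before training. The sketch below is only an illustration: `truncation_report` is a hypothetical helper, and it uses a whitespace split as a stand-in tokenizer so it runs on its own. With the real tokenizer loaded you would pass something like `lambda t: tokenizer(t)["input_ids"]` instead. If the label sits at the end of each prompt, any example over the limit could lose its label to truncation.

```python
def truncation_report(texts, tokenize, max_seq_length):
    """Count how many texts tokenize to more than max_seq_length tokens."""
    lengths = [len(tokenize(t)) for t in texts]
    n_truncated = sum(1 for n in lengths if n > max_seq_length)
    return {
        "n_examples": len(lengths),
        "n_truncated": n_truncated,
        "fraction_truncated": n_truncated / len(lengths) if lengths else 0.0,
        "longest": max(lengths, default=0),
    }


# Toy data: one short note and two long ones (stand-ins for real texts).
texts = ["short note", "word " * 3000, "word " * 10000]

# str.split is a placeholder tokenizer; real token counts will differ.
report = truncation_report(texts, str.split, max_seq_length=2048)
print(report)
```

Running the same report with max_seq_length=8192 shows whether the larger limit is enough for the longest texts.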

#

oh you're also playing with lm head in your other code so maybe that's causing an issue as well.

open gull
#

Train loss being 0 doesn’t seem right to me either way.

#

I think you should try balancing your dataset a bit first and see what happens
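
One simple way to try that is to downsample the majority class before training. This is a minimal sketch under assumptions: `downsample_majority`, the `label` key, and the 3:1 ratio are all hypothetical choices for illustration, and the toy rows only mimic the 1% prevalence described above. Class weights or oversampling would be other options.

```python
import random
from collections import Counter


def downsample_majority(rows, label_key="label", ratio=1.0, seed=0):
    """Keep every row of the rarest class and at most
    ratio * (rarest class size) rows of each other class."""
    if not rows:
        return []
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)
    n_min = min(len(group) for group in by_label.values())
    cap = max(1, int(ratio * n_min))
    balanced = []
    for group in by_label.values():
        if len(group) > cap:
            group = rng.sample(group, cap)  # random subset of the larger class
        balanced.extend(group)
    rng.shuffle(balanced)
    return balanced


# Toy dataset with 1% positives (10 of 1000), mirroring the low-prevalence setup.
rows = [{"text": f"note {i}", "label": int(i % 100 == 0)} for i in range(1000)]
balanced = downsample_majority(rows, ratio=3.0)

print(Counter(r["label"] for r in rows))
print(Counter(r["label"] for r in balanced))
```

If the loss behaves normally on the balanced subset, that points toward prevalence rather than the training setup.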