Hi @mathew, I'm attaching the code I used for fine-tuning. For some targets in my dataset, I get a negative loss (numerically close to 0).
My code is based on the Unsloth tutorial here: https://colab.research.google.com/github/timothelaborie/text_classification_scripts/blob/main/unsloth_classification.ipynb . I am also attaching a modified dataset in which the number of classes is restricted to 2 and the prevalence of the positive class is 1%.
Here are the inputs to my code:
LLMpath=/cluster/home/Qwen2.5-0.5B-bnb-4bit
notesPath=/cluster/home/finance_sentiment_multiclass_low_prevalence.csv
resultsDir=/cluster/home/
batchSizeTest=3
batchSizeTrain=2
learningRate=0.00005
nEpochs=3
gradientSteps=4
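As a rough sanity check on what these settings imply, here is a minimal sketch. It assumes that gradientSteps is used as the number of gradient accumulation steps, that training runs on a single device, and that samples are drawn independently; the function names are hypothetical and not part of the tutorial.

```python
# Values copied from the inputs listed above.
batch_size_train = 2
gradient_steps = 4           # assumption: gradient accumulation steps
positive_prevalence = 0.01   # 1% positive class in the attached dataset


def effective_batch_size(per_device_batch, accumulation_steps):
    """Number of samples contributing to one optimizer step (single device)."""
    return per_device_batch * accumulation_steps


def prob_no_positive(prevalence, n_samples):
    """Probability that n independently drawn samples are all negative."""
    return (1.0 - prevalence) ** n_samples


effective = effective_batch_size(batch_size_train, gradient_steps)
p_all_negative = prob_no_positive(positive_prevalence, effective)

print(f"effective batch size: {effective}")
print(f"P(no positive sample in one optimizer step): {p_all_negative:.3f}")
```

Under these assumptions, most optimizer steps would see only negative examples, so the per-step loss can legitimately sit very close to 0.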
With the attached dataset, I cannot exactly reproduce the problem I see on my original dataset: the lowest loss I get is 0, not a negative value. I suspect that the negative loss is due to 1) low prevalence and 2) round-off errors. For the other targets in my actual dataset, where the positive class has high prevalence, I don't have this problem.
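For reference, plain cross-entropy on hard labels is -log(p) of the true class, which cannot go below 0 in exact arithmetic. The sketch below is not the tutorial's loss code; it is a pure-Python illustration (the function name is hypothetical) of a numerically stable computation that stays non-negative even for very confident predictions. If that holds, a slightly negative value would point at reduced-precision arithmetic or a different loss path rather than at the labels themselves.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample, computed as logsumexp(logits) - logits[target]."""
    m = max(logits)
    # Subtracting the max keeps exp() from overflowing; the sum is >= 1,
    # so its log is >= 0 and the result cannot drop below 0.
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return lse - logits[target]


# Very confident and correct prediction: loss is at or just above 0.
confident = cross_entropy([20.0, -20.0], 0)

# Completely uncertain two-class prediction: loss equals log(2).
uncertain = cross_entropy([0.0, 0.0], 0)

print(confident, uncertain)
```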
I have also attached my Unsloth environment. Is there a problem with my code, or do you have any insight into how my data preparation might have gone wrong? My actual dataset has very long texts, so I normally use max_seq_length=8192.
Thank you so much!