Fine tuning Llama-3-8B-Lexi-Uncensored on custom dataset - LR not declining | Unsloth AI | Page 1

I am trying to create an LLM that generates chain texts in the style of the ones here: https://www.dirtychaintexts.com/

To do this, I've gathered 184 example chain texts from various websites and put them into a two-column CSV. I'm fine-tuning an uncensored Llama model I found here: https://huggingface.co/Orenguteng/Llama-3-8B-Lexi-Uncensored
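For reference, here is a minimal sketch of how a two-column CSV like this could be turned into training strings. The column names (`instruction`, `response`), the prompt template, and the literal EOS string are all assumptions for illustration, not the exact setup used here; in a real run the EOS token would normally come from the tokenizer rather than being hard-coded.

```python
import csv
import io

# Hypothetical template, loosely modeled on Alpaca-style prompts (an assumption).
TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n{response}"


def rows_to_texts(csv_file, eos_token):
    """Turn each (instruction, response) CSV row into one training string.

    The EOS token is appended so the model can learn where a text ends.
    """
    reader = csv.DictReader(csv_file)
    texts = []
    for row in reader:
        text = TEMPLATE.format(
            instruction=row["instruction"].strip(),
            response=row["response"].strip(),
        )
        texts.append(text + eos_token)
    return texts


# Tiny in-memory stand-in for the real 184-row CSV file.
sample = io.StringIO(
    "instruction,response\n"
    '"Write a holiday chain text","Send this to 10 friends..."\n'
)

# "<|end_of_text|>" is a placeholder; use your tokenizer's actual EOS token.
texts = rows_to_texts(sample, eos_token="<|end_of_text|>")
print(texts[0])
```

Printing a few formatted rows like this before training is a cheap way to confirm that both columns land where the template expects them.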

I'm using the demo Unsloth notebook as my guide: https://huggingface.co/datasets/unsloth/notebooks/blob/main/Alpaca_%2B_Mistral_7b_full_example.ipynb

I am able to kick off the training workflow successfully; however, my training loss stays between 1.5 and 2.0 even after 3 full epochs. What should my first steps be to tweak my training in order to get better results? Are there any examples where folks have fine-tuned a model using their own small, custom-curated datasets? I'm curious whether I should get more training data or invest my time in hyperparameter tuning, and I think I could learn a lot by looking at how others have gone about this.
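For a sense of scale, the rough arithmetic below shows how few optimizer updates 3 epochs over 184 examples amounts to. The batch size of 2 and gradient accumulation of 4 are assumed values for illustration only, not settings confirmed for this run, and the helper `optimizer_steps` is hypothetical.

```python
import math


def optimizer_steps(num_examples, per_device_batch, grad_accum, epochs, num_devices=1):
    """Approximate number of optimizer updates over a full training run."""
    # Examples consumed per optimizer update.
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return steps_per_epoch * epochs


# Assumed settings: batch size 2, gradient accumulation 4, single GPU.
total = optimizer_steps(num_examples=184, per_device_batch=2, grad_accum=4, epochs=3)
print(total)  # 69
```

Under these assumptions the whole run is only a few dozen updates, which is worth keeping in mind when judging how far the loss could be expected to move.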

Screen_Shot_2025-03-16_at_4.53.57_PM.jpg


When I eventually run inference on the model, it just outputs a string of emojis and then terminates.

Screen_Shot_2025-03-16_at_5.06.33_PM.jpg
