I am trying to create an LLM that writes chain texts in the style found here: https://www.dirtychaintexts.com/
To do this, I've gathered 184 example chain texts from various websites and put them into a two-column CSV. I'm fine-tuning an uncensored Llama model I found here: https://huggingface.co/Orenguteng/Llama-3-8B-Lexi-Uncensored
I'm using the demo Unsloth notebook as my guide: https://huggingface.co/datasets/unsloth/notebooks/blob/main/Alpaca_%2B_Mistral_7b_full_example.ipynb
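For context, here is a rough sketch of the data-prep step: turning a two-column CSV into one Alpaca-style training string per row. The column names `instruction` and `output`, the template wording, the helper name `rows_to_prompts`, and the sample row are all placeholders for illustration, not the real data or the notebook's exact code. The EOS string is also a stand-in; in practice it would come from the tokenizer (e.g. `tokenizer.eos_token`).

```python
import csv
import io

# Hypothetical Alpaca-style template; the exact wording is an assumption.
PROMPT_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n{output}"
)


def rows_to_prompts(csv_text, eos_token="</s>"):
    """Turn two-column CSV text into one training string per row.

    eos_token is a placeholder default; real code would pass the
    tokenizer's own end-of-sequence token so generation learns to stop.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    prompts = []
    for row in reader:
        text = PROMPT_TEMPLATE.format(
            instruction=row["instruction"].strip(),
            output=row["output"].strip(),
        )
        prompts.append(text + eos_token)
    return prompts


# Placeholder row standing in for one of the 184 examples.
sample = (
    "instruction,output\n"
    "Write a chain text about Halloween,Placeholder chain text body\n"
)
prompts = rows_to_prompts(sample)
print(len(prompts))
print(prompts[0])
```

Printing a formatted row like this is a quick way to confirm that each example ends with an EOS token and that the two columns land in the intended slots of the template.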
I can kick off the training workflow successfully; however, my training loss stays between 1.5 and 2.0 even after 3 full epochs. What should my first steps be to tweak my training and get better results? Are there any examples where folks have fine-tuned a model on their own small, custom-curated datasets? I'm curious whether I should gather more training data or invest my time in hyperparameter tuning, and I think I could learn a lot by looking at how others have gone about this.