Any successful DPO with mistral models?? | Unsloth AI | Page 1

fresh fulcrum Dec 4, 2024, 9:25 PM

Has anyone successfully run DPO training with a Mistral model and is willing to share their code? I'm encountering some strange warnings and can't figure out the cause. Comparing against someone else's working code might help me debug.

umbral aspen Dec 5, 2024, 8:09 PM

Which Mistral models specifically?

Unsloth should work for it directly

raw cipher Dec 5, 2024, 10:46 PM

In case anyone reading this has the same problem, you can try my friend's script:

#1313118494010773534 message

fresh fulcrum Dec 6, 2024, 9:13 AM

@umbral aspen I am actually using Mathstral. As I mentioned in another post, the training runs, but I get some weird warnings, and I want to be sure I am not doing something wrong before I start a long training run for research purposes. One of the warnings I get is "The following columns in the training set don't have a corresponding argument in PeftModelForCausalLM.forward and have been ignored: prompt, chosen, rejected". I don't get this warning if I just use HF for training, so I don't know whether it is something I did wrong or some minor bug. I by no means want to sound ungrateful for your work (I really am grateful; it saves me a lot of time), but the thing with the cookbooks is that they have small nuances that make them not directly usable for all models. For example, the DPO cookbook is not 100% compatible with Mistral, although the training will start normally. That is why I asked other people to share their code, so I can compare.
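
For context, the wording of that warning matches the message the Hugging Face Trainer logs when it compares the dataset's column names against the parameters of the model's forward method and drops the ones that do not match. The sketch below only illustrates that kind of signature check in plain Python. The forward function and the split_columns helper are hypothetical stand-ins invented for illustration; this is not Unsloth, TRL, or transformers code.

```python
import inspect


def forward(input_ids=None, attention_mask=None, labels=None):
    """Hypothetical stand-in for a causal LM's forward method."""
    return None


def split_columns(forward_fn, column_names):
    """Partition dataset columns into those forward_fn accepts and those it does not."""
    accepted = set(inspect.signature(forward_fn).parameters)
    kept = [c for c in column_names if c in accepted]
    ignored = [c for c in column_names if c not in accepted]
    return kept, ignored


# Hypothetical column set: raw DPO text columns plus already-tokenized columns.
columns = ["prompt", "chosen", "rejected", "input_ids", "attention_mask"]
kept, ignored = split_columns(forward, columns)

print("kept:", kept)        # kept: ['input_ids', 'attention_mask']
print("ignored:", ignored)  # ignored: ['prompt', 'chosen', 'rejected']
```

Under this reading, the raw text columns are listed simply because forward has no parameters with those names. Whether that is harmless in a given run depends on whether the tokenized inputs still reach the model, which is worth checking by inspecting one batch from the trainer's dataloader.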

fresh fulcrum Dec 6, 2024, 9:18 AM

raw cipher In case people who's reading this has the same problem, maybe you can try use my...

Thank you for sharing

fresh fulcrum Dec 6, 2024, 9:37 AM

By the way, I tried my setup totally unchanged but with ORPO, and there I don't get the warning. So this warning only appears when I perform DPO with Unsloth. This whole fine-tuning business can drive a person crazy 🙂
