#Any successful DPO with Mistral models?


fresh fulcrum
#

Has anyone successfully trained a DPO using a Mistral model and is willing to share their code? I'm encountering some strange warnings and can't figure out the cause. Using someone else's code might help me debug.

umbral aspen
#

Which Mistral models specifically?

#

Unsloth should work for it directly

raw cipher
fresh fulcrum
#

@umbral aspen I am actually using Mathstral. As I mentioned in another post, the training runs, but I get some weird warnings and I want to be sure I am not doing something wrong, because I want to start a long training run for research purposes. One of the warnings I get is: "The following columns in the training set don't have a corresponding argument in PeftModelForCausalLM.forward and have been ignored: prompt, chosen, rejected". I don't get this warning if I just use HF for training, so I don't know whether it is something I did wrong or some minor bug. I by no means want to sound ungrateful for your work (I really am grateful; it saves me a lot of time), but the thing with the cookbooks is that they have small nuances that make them not directly usable with all models. For example, the DPO cookbook is not 100% compatible with Mistral, although training will start normally. That is why I asked other people to share their code, so I can compare.
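For anyone comparing notes, here is a rough sketch of the kind of check that, as far as I can tell, produces this message: the Hugging Face Trainer compares the dataset's column names against the parameters of the model's `forward` and reports the ones that do not match. This is not the library's code. `unused_columns` and `fake_forward` are hypothetical names made up for illustration, and the extra label names kept by the check are an assumption.

```python
import inspect


def unused_columns(forward_fn, dataset_columns):
    """Return the dataset columns that match no parameter of forward_fn."""
    accepted = set(inspect.signature(forward_fn).parameters)
    # Assumption: label-style columns are always kept by this kind of check.
    accepted |= {"label", "label_ids", "labels"}
    return sorted(col for col in dataset_columns if col not in accepted)


# Hypothetical stand-in for a causal LM forward signature.
def fake_forward(input_ids=None, attention_mask=None, labels=None):
    return None


# Raw preference columns, as named in the warning above.
raw_columns = ["prompt", "chosen", "rejected"]
print(unused_columns(fake_forward, raw_columns))

# Tokenized columns match forward arguments, so nothing is reported.
print(unused_columns(fake_forward, ["input_ids", "attention_mask", "labels"]))
```

Under these assumptions the raw text columns are reported simply because `forward` takes tensors such as `input_ids`, not strings. Whether that matters in a given setup depends on whether the trainer has already tokenized `prompt`, `chosen`, and `rejected` before the check runs, which this sketch does not show.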

fresh fulcrum
#

By the way, I tried my setup completely unchanged, but with ORPO, and there I don't get the warning. So this warning only appears when I perform DPO with Unsloth. This whole fine-tuning business can drive a person crazy 🙂