#Any successful DPO with Mistral models?
In case anyone reading this has the same problem, you could try my friend's script:
@umbral aspen I am actually using Mathstral. As I mentioned in another post, training runs, but I get some odd warnings, and I want to be sure I am not doing something wrong before I start a long training run for research purposes. One of the warnings is: "The following columns in the training set don't have a corresponding argument in PeftModelForCausalLM.forward and have been ignored: prompt, chosen, rejected". I don't get this warning if I just use HF for training, so I don't know whether it is something I did wrong or a minor bug. I by no means want to sound ungrateful for your work (I really am grateful; it saves me a lot of time), but the thing with the cookbooks is that they have small nuances that make them not directly usable with all models. For example, the DPO cookbook is not 100% compatible with Mistral, although training will start normally. That is why I asked other people to share their code, so I can compare.
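For anyone trying to read that warning: as far as I can tell, it comes from a generic check that compares the dataset's column names against the arguments of the model's forward method and reports the ones that do not match. The snippet below is only a simplified sketch of that kind of check, not the actual Trainer or Unsloth code. `ToyModel` and `ignored_columns` are hypothetical names made up for illustration, and the column list is an assumed example of a DPO dataset after tokenization.

```python
import inspect


class ToyModel:
    # Hypothetical stand-in for PeftModelForCausalLM.
    # Only the argument names of forward matter for this sketch.
    def forward(self, input_ids=None, attention_mask=None, labels=None):
        return None


def ignored_columns(model, column_names):
    """Return the columns that have no matching argument in model.forward."""
    accepted = set(inspect.signature(model.forward).parameters)
    return [name for name in column_names if name not in accepted]


# Assumed example: raw DPO text columns alongside tokenized columns.
dpo_columns = ["prompt", "chosen", "rejected", "input_ids", "attention_mask"]
ignored = ignored_columns(ToyModel(), dpo_columns)
print(ignored)  # ['prompt', 'chosen', 'rejected']
```

Read this way, the message only says that the raw text columns are not arguments of forward. Whether that is harmless in a particular DPO setup still depends on how the trainer builds its batches, so comparing against a known-good script remains a sensible check.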
Thank you for sharing
By the way, I tried my setup completely unchanged but with ORPO, and there I don't get the warning. So the warning only appears when I perform DPO with Unsloth. This whole fine-tuning business can drive a person crazy 🙂