Hi team, thanks a lot for your great work!
Are Qwen 2.5 models supported for GRPO long-context training?
https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb
How should I change this notebook from Llama 3.1 to Qwen 2.5 7B-Instruct?
Also, I want to fine-tune in bfloat16. How should I change the demo notebook to set that up properly?
Thanks a lot!