Are Qwen 2.5 models supported for GRPO long context training? | Unsloth AI | Page 1

plain rose Apr 25, 2025, 5:22 AM

Hi team, thanks a lot for your great work!
Are Qwen 2.5 models supported for GRPO long context training?

https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb

How should I change this notebook from Llama 3.1 to Qwen 2.5 7B-Instruct?

Also, I want bfloat16 fine-tuning. How do I change the demo notebook to set that up properly?

Thanks a lot!

sand robin Apr 25, 2025, 5:31 AM

Yes, of course. See here for notebooks: https://docs.unsloth.ai/basics/reasoning-grpo-and-rl

Also, to use bf16 on Colab you need to pay Google for a more powerful GPU, since the free-tier T4 does not support bf16.

plain rose Apr 25, 2025, 5:18 PM

Thanks a lot, Mike! I have a local GPU, so I'm not worried about the GPU. How do I turn on bf16? And what's the difference between the notebook your link points to and the notebook titled Long Context GRPO?
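
For reference, here is a minimal sketch of how the model swap and the bf16 setting might look in a notebook like the one linked above. It is not taken from the thread or from any specific notebook. The Hub id `unsloth/Qwen2.5-7B-Instruct`, the helper names `precision_flags` and `load_qwen_for_grpo`, the LoRA rank, and the choice of `load_in_4bit=False` are all assumptions for illustration.

```python
def precision_flags(bf16_supported: bool) -> dict:
    """Return mutually exclusive mixed-precision flags for a trainer config.

    Hypothetical helper: bf16 is used when the GPU supports it, fp16 otherwise.
    """
    return {"bf16": bf16_supported, "fp16": not bf16_supported}


def load_qwen_for_grpo(max_seq_length: int = 2048, lora_rank: int = 32):
    """Load Qwen 2.5 7B-Instruct with Unsloth and attach LoRA adapters.

    Hypothetical helper. Imports live inside the function so that
    precision_flags() can be used on a machine without a GPU or Unsloth.
    """
    import torch
    from unsloth import FastLanguageModel, is_bfloat16_supported

    bf16_ok = is_bfloat16_supported()  # False on pre-Ampere GPUs such as the T4

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/Qwen2.5-7B-Instruct",  # assumed Hub id, replaces the Llama 3.1 id
        max_seq_length=max_seq_length,
        load_in_4bit=False,  # assumption: 16-bit LoRA instead of 4-bit QLoRA
        dtype=torch.bfloat16 if bf16_ok else None,  # None lets Unsloth pick a dtype
    )

    model = FastLanguageModel.get_peft_model(
        model,
        r=lora_rank,
        lora_alpha=lora_rank,
        target_modules=[
            "q_proj", "k_proj", "v_proj", "o_proj",
            "gate_proj", "up_proj", "down_proj",
        ],
        use_gradient_checkpointing="unsloth",
    )
    return model, tokenizer, precision_flags(bf16_ok)


# Usage on a GPU machine (left commented out so the sketch imports cleanly):
# from trl import GRPOConfig
# model, tokenizer, flags = load_qwen_for_grpo()
# training_args = GRPOConfig(output_dir="outputs", **flags)
```

The `bf16` and `fp16` flags are passed to the trainer config, so on a bf16-capable local GPU the first is `True` and the second is `False`.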

My understanding is that the Long Context GRPO notebook differs in that it uses 4-bit quantization and QLoRA so that it can support long context. Apart from these two, what other differences are there?
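
To make the 4-bit versus 16-bit trade-off concrete, here is a back-of-the-envelope sketch of the memory taken by the model weights alone. The parameter count (about 7.6 billion for Qwen 2.5 7B) is an approximation, and the estimate ignores quantization metadata, LoRA adapters, activations, optimizer state, and the KV cache, so treat the numbers as rough.

```python
def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights only, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


N_PARAMS = 7.6e9  # approximate parameter count, an assumption for illustration

bf16_gib = weight_memory_gib(N_PARAMS, 16)  # 16-bit weights
four_bit_gib = weight_memory_gib(N_PARAMS, 4)  # 4-bit quantized weights

print(f"bf16 weights:  {bf16_gib:.1f} GiB")
print(f"4-bit weights: {four_bit_gib:.1f} GiB")
print(f"freed for context: {bf16_gib - four_bit_gib:.1f} GiB")
```

Under these assumptions the 4-bit weights take a quarter of the bf16 footprint, and the memory saved is what becomes available for longer sequences.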
