# Multi-GPU Support for GRPO Training with Vision-Language Models


shrewd mural

I’m trying to train Qwen 3 VL 8B on multiple GPUs, but I suspect multi-GPU support isn’t implemented properly, because it raises the following error:

```
2025-11-07 11:57:26
[rank0]:   File "/root/llm-synthetic-finetuning/unsloth_compiled_cache/UnslothGRPOTrainer.py", line 53, in wrapper
[rank0]:     output = f(self, *args, **kwargs)
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/root/llm-synthetic-finetuning/.venv/lib/python3.12/site-packages/transformers/trainer.py", line 2325, in train
[rank0]:     return inner_training_loop(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "<string>", line 328, in _fast_inner_training_loop
[rank0]:   File "<string>", line 40, in _unsloth_training_step
[rank0]:   File "/root/llm-synthetic-finetuning/unsloth_compiled_cache/UnslothGRPOTrainer.py", line 2861, in compute_loss
[rank0]:     logit_softcapping = getattr(model.config, "final_logit_softcapping", 0) # Gemma
[rank0]:                                 ^^^^^^^^^^^^
[rank0]:   File "/root/llm-synthetic-finetuning/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1962, in __getattr__
[rank0]:     raise AttributeError(
[rank0]: AttributeError: 'DistributedDataParallel' object has no attribute 'config'
```

It might be because the model is wrapped in DDP, but my real concern is whether multi-GPU training is actually supported.

pale pilot (BOT)

Multi-GPU GRPO training with vision-language models (such as Qwen 3 VL 8B) is not officially supported in Unsloth yet. Unsloth supports multi-GPU setups for SFT via Accelerate/DeepSpeed, but RL/GRPO with VLMs is still under development. Users have reported similar DDP-related errors caused by model wrapping (e.g., DistributedDataParallel lacking a .config attribute), and the RL trainers are not yet fully integrated for multi-GPU mode. The Unsloth team has acknowledged this gap and is working on it, but for now multi-GPU GRPO for VLMs is neither stable nor officially documented. See the Unsloth multi-GPU docs, issue #3530, and issue #1943.
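The error itself comes from how PyTorch wrappers behave: `DistributedDataParallel` keeps the original model in its `.module` attribute and does not forward arbitrary attributes such as `.config`, so `model.config` on the wrapper falls through to `nn.Module.__getattr__` and raises `AttributeError`. The minimal sketch below, assuming plain PyTorch, shows a common unwrapping pattern; `unwrap_model` and `TinyModel` are hypothetical names used only for illustration. It explains the failure rather than fixing Unsloth's generated `UnslothGRPOTrainer.py`; patching that file is not known to make multi-GPU GRPO work.

```python
import types

import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel


def unwrap_model(model: nn.Module) -> nn.Module:
    """Hypothetical helper: peel off DDP / DataParallel wrappers.

    Both wrappers store the model they wrap in `.module`, so attributes
    such as `.config` live on the inner model, not on the wrapper.
    """
    while isinstance(model, (DistributedDataParallel, nn.DataParallel)):
        model = model.module
    return model


class TinyModel(nn.Module):
    """Hypothetical stand-in for a Hugging Face model that carries a `.config`."""

    def __init__(self) -> None:
        super().__init__()
        self.linear = nn.Linear(4, 4)
        self.config = types.SimpleNamespace(model_type="tiny")


if __name__ == "__main__":
    # nn.DataParallel stands in for DDP here because it can be built without
    # an initialized process group; it wraps the model the same way.
    wrapped = nn.DataParallel(TinyModel())

    try:
        wrapped.config  # the wrapper itself has no `.config`
    except AttributeError as err:
        print("wrapper:", err)

    inner = unwrap_model(wrapped)
    # Same pattern as the failing line in the traceback, applied to the inner model.
    softcap = getattr(inner.config, "final_logit_softcapping", 0)
    print("softcapping:", softcap)
```

Both `DistributedDataParallel` and `nn.DataParallel` expose the wrapped model as `.module`, which is why a single `isinstance` loop handles either wrapper (and nested wrapping).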

For now, the only reliable setup for GRPO with vision-language models in Unsloth is a single GPU. If you attempt multi-GPU training, you may encounter DDP attribute errors or device-mismatch issues, because the RL trainer code is not yet adapted for distributed setups. Official support is planned, so watch the Unsloth GitHub issues and docs for updates.

Would you like more detail on workarounds or technical explanations?

Sources:

shrewd mural

Do you know why Unsloth always creates an unsloth_compiled_cache folder?


@pale pilot