I'm trying to train Qwen 3 VL 8B on multiple GPUs with the Unsloth GRPO trainer, but training fails with the error below, so I suspect multi-GPU support isn't fully implemented.
2025-11-07 11:57:26
[rank0]:   File "/root/llm-synthetic-finetuning/unsloth_compiled_cache/UnslothGRPOTrainer.py", line 53, in wrapper
[rank0]:     output = f(self, *args, **kwargs)
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/root/llm-synthetic-finetuning/.venv/lib/python3.12/site-packages/transformers/trainer.py", line 2325, in train
[rank0]:     return inner_training_loop(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "<string>", line 328, in _fast_inner_training_loop
[rank0]:   File "<string>", line 40, in _unsloth_training_step
[rank0]:   File "/root/llm-synthetic-finetuning/unsloth_compiled_cache/UnslothGRPOTrainer.py", line 2861, in compute_loss
[rank0]:     logit_softcapping = getattr(model.config, "final_logit_softcapping", 0) # Gemma
[rank0]:                                 ^^^^^^^^^^^^
[rank0]:   File "/root/llm-synthetic-finetuning/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1962, in __getattr__
[rank0]:     raise AttributeError(
[rank0]: AttributeError: 'DistributedDataParallel' object has no attribute 'config'
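The immediate cause appears to be that the model is wrapped in DistributedDataParallel (DDP) by the time compute_loss runs, so model.config is no longer reachable on the wrapper. My real question, though, is whether multi-GPU (DDP) training is actually supported for this trainer.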
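To illustrate what I think is going on: DDP keeps the original model on its module attribute, so config has to be read from there instead of from the wrapper. Below is a minimal sketch of that unwrapping pattern. It is not the Unsloth code and I have not tested it inside the trainer; unwrap_model, StandInModel and StandInDDP are hypothetical names I made up so the snippet runs without GPUs or torch.

```python
from types import SimpleNamespace


def unwrap_model(model):
    """Follow `.module` attributes until the innermost model is reached.

    Hypothetical helper: wrappers such as DistributedDataParallel expose
    the wrapped model as `.module`. Caveat: a model that has its own
    submodule literally named `module` would be unwrapped too far.
    """
    while hasattr(model, "module"):
        model = model.module
    return model


class StandInModel:
    """Stand-in for the real model; only carries a `config` object."""

    def __init__(self):
        self.config = SimpleNamespace(name="stand-in-config")


class StandInDDP:
    """Stand-in for a DDP-style wrapper: it has `.module` but no `.config`."""

    def __init__(self, module):
        self.module = module


base = StandInModel()
wrapped = StandInDDP(base)

# Reading `wrapped.config` directly would raise AttributeError, as in the log.
config = unwrap_model(wrapped).config

# Same lookup as the failing line, but against the unwrapped model's config.
logit_softcapping = getattr(config, "final_logit_softcapping", 0)
print(logit_softcapping)
```

If the trainer has access to an Accelerate Accelerator instance, accelerator.unwrap_model(model) should serve the same purpose, but I don't know whether the compiled trainer uses it at this point.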