self and mat2 must have the same dtype, but got Half and BFloat16 - GRPO Lora Training | Unsloth AI | Page 1

silver panther Apr 24, 2026, 10:36 PM

Using unsloth 2026.4.8, trl 1.2.0, and PyTorch 2.11 (pip list attached), I get this error when trying to fine-tune a model with LoRA in bf16:

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_name,
    max_seq_length = max_seq_length,
    load_in_4bit = False,
    fast_inference = False,
    max_lora_rank = lora_rank,
    dtype=torch.bfloat16
)
model = FastLanguageModel.get_peft_model(
    model,
    r = lora_rank,
    target_modules = [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_alpha = lora_rank*2,
    use_gradient_checkpointing = "unsloth",
    random_state = RUN_SEED,
)
...
training_args = GRPOConfig(
    max_completion_length=1024,
    temperature=0.9,
    num_generations=8,
    report_to = "mlflow",
    bf16=True,
    fp16=False,
    loss_type=LOSS_TYPE,
    beta = 0.0,
    save_strategy="epoch",
    seed=RUN_SEED,
    data_seed=RUN_SEED,
)
trainer = GRPOTrainer(
    model=model,
    processing_class = tokenizer,
    reward_funcs=[combined_reward_func],
    args=training_args,
    train_dataset=converted_train_dataset,
)

```
   1057     A, B = A.t(), B.t()
   1058     XA = torch_matmul(X, A.to(dtype))
-> 1059     out.addmm_(XA, B.to(dtype), alpha = s)
   1060     # out += (X @ A.to(dtype)) @ (s * B.to(dtype))
   1062 return out.view(batch, seq_len, -1) if reshape else out

RuntimeError: self and mat2 must have the same dtype, but got Half and BFloat16
```

Is this an issue somewhere in unsloth, or an incompatibility with the latest trl/transformers?


📎 stacktrace.txt

📎 pip_list.txt

tidal copper Apr 25, 2026, 10:18 AM

This was a while ago now, so I am not 100 percent sure, but this might be related to the issue I described here: https://github.com/unslothai/unsloth/pull/4197 (see also this one: https://github.com/unslothai/unsloth/pull/4212).
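If you want to check where the Half tensors come from before trying anything, a rough sketch like the one below might help. It only counts dtypes, so it does not prove this is the same issue. The helper names are made up for illustration, and the "lora_" name filter assumes the usual PEFT naming (lora_A / lora_B) for adapter weights.

```python
import os
from collections import Counter


def dtype_histogram(named_tensors):
    """Count tensors per dtype; takes any iterable of (name, tensor-like) pairs."""
    return Counter(str(t.dtype) for _, t in named_tensors)


def lora_dtype_histogram(named_tensors):
    """Same count, restricted to entries whose name contains 'lora_' (assumed PEFT naming)."""
    return dtype_histogram((n, t) for n, t in named_tensors if "lora_" in n)


# Hypothetical usage, after FastLanguageModel.get_peft_model(...):
# print(os.environ.get("ACCELERATE_MIXED_PRECISION"))      # None means it is unset
# print(dtype_histogram(model.named_parameters()))         # all parameters
# print(lora_dtype_histogram(model.named_parameters()))    # adapter weights only
```

If everything reports bfloat16 and the environment variable is unset, the float16 is probably introduced at training time rather than at load time, which would fit the mixed-precision explanation.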

You should be able to work around this by running export ACCELERATE_MIXED_PRECISION=bf16 before training, or by not explicitly setting bf16=True, fp16=False in your GRPOConfig.
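If you launch from a notebook or a plain Python script instead of a shell, the same workaround can be sketched like this. Putting it before the unsloth/trl imports is my assumption about the safest ordering, not something I have verified against the current code.

```python
import os

# Assumption: set this before unsloth / trl / accelerate are imported,
# so whatever reads the environment later sees bf16.
os.environ["ACCELERATE_MIXED_PRECISION"] = "bf16"

# Import the training stack only afterwards, e.g.:
# from unsloth import FastLanguageModel
# from trl import GRPOConfig, GRPOTrainer
```

The other option is to drop the bf16=True and fp16=False lines from the GRPOConfig call and leave the remaining arguments unchanged.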

GitHub

Fix GRPO mixed-precision propagation for explicit bf16/fp16 configs...

Hello there,
while testing an issue someone encountered with phi-4-mini models using the following code: https://pastebin.com/uTg9DS78
I discovered that doing GRPO with vllm currently fails with Ru...
