Using unsloth 2026.4.8, trl 1.2.0, and pytorch 2.11 (pip list attached), I get the error below when trying to fine-tune a model with LoRA in bf16:
```python
from unsloth import FastLanguageModel  # imported before trl so unsloth's patches apply
import torch
from trl import GRPOConfig, GRPOTrainer

# Load the base model in bf16 (no 4-bit quantization)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_name,
    max_seq_length = max_seq_length,
    load_in_4bit = False,
    fast_inference = False,
    max_lora_rank = lora_rank,
    dtype = torch.bfloat16,
)
# Attach LoRA adapters to the attention and MLP projections
model = FastLanguageModel.get_peft_model(
    model,
    r = lora_rank,
    target_modules = [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_alpha = lora_rank * 2,
    use_gradient_checkpointing = "unsloth",
    random_state = RUN_SEED,
)
...
training_args = GRPOConfig(
    max_completion_length = 1024,
    temperature = 0.9,
    num_generations = 8,
    report_to = "mlflow",
    bf16 = True,
    fp16 = False,
    loss_type = LOSS_TYPE,
    beta = 0.0,
    save_strategy = "epoch",
    seed = RUN_SEED,
    data_seed = RUN_SEED,
)
trainer = GRPOTrainer(
    model = model,
    processing_class = tokenizer,
    reward_funcs = [combined_reward_func],
    args = training_args,
    train_dataset = converted_train_dataset,
)
```
```
   1057 A, B = A.t(), B.t()
   1058 XA = torch_matmul(X, A.to(dtype))
-> 1059 out.addmm_(XA, B.to(dtype), alpha = s)
   1060 # out += (X @ A.to(dtype)) @ (s * B.to(dtype))
   1062 return out.view(batch, seq_len, -1) if reshape else out

RuntimeError: self and mat2 must have the same dtype, but got Half and BFloat16
```
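To narrow this down, a check like the following could show whether any parameters end up in a dtype other than bf16 after `get_peft_model`. This is only a rough sketch: `dtype_summary` and `params_not_matching` are hypothetical helper names of mine, not part of unsloth or trl, and they inspect parameters only, so a float16 activation or output buffer created during the forward pass would not show up here.

```python
from collections import Counter


def dtype_summary(model):
    """Count parameters by dtype string for any object exposing named_parameters()."""
    return Counter(str(param.dtype) for _, param in model.named_parameters())


def params_not_matching(model, expected="torch.bfloat16"):
    """Return the names of parameters whose dtype string differs from `expected`."""
    return [
        name
        for name, param in model.named_parameters()
        if str(param.dtype) != expected
    ]


# Intended usage (assumes `model` is the PEFT model from the snippet above):
#   print(dtype_summary(model))
#   print(params_not_matching(model)[:20])
```

If the second call returns an empty list, the parameters themselves are all bf16 and the Half tensor in the error would have to come from somewhere else in the forward pass.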
Is this an issue somewhere in unsloth, or is it an incompatibility with the latest trl/transformers?