queen nebula Jan 2, 2026, 11:19 PM

#

Do you know why this error could be showing up? I am doing GRPO with qwen 2.5. I checked I already have the latest versions of vllm and unsloth:
RuntimeError: Inplace update to inference tensor outside InferenceMode is not allowed.You can make a clone to get a normal tensor before doing inplace update.See https://github.com/pytorch/rfcs/pull/17 for more details.

It seems the crash occurred in unsloth_zoo/gradient_checkpointing.py
PyTorch's Twitter account has recommended cloning the inference tensor

The most annoying part of this error is that it is not even occurring on every evaluation, just randomly, and it's driving me crazy

#

This is the full trace

📎 dsfsda.txt

queen nebula Jan 2, 2026, 11:51 PM

#

These were my evaluation configs, which I have since commented out:
`# eval_strategy="steps",
eval_steps=log_save_eval_steps,
per_device_eval_batch_size=eval_batch_size, # batch size for evaluation
fp16_full_eval = True,
eval_accumulation_steps = 1,`

It works if I don't perform evaluation.

obtuse snow Jan 3, 2026, 10:34 AM

#

what are ur installed versions of unsloth and unsloth zoo?

queen nebula Jan 3, 2026, 4:58 PM

#

unsloth version is 2025.12.10 and unsloth zoo is 2025.12.8

obtuse snow Jan 3, 2026, 5:10 PM

#

can you share your notebook ?

queen nebula Jan 3, 2026, 5:12 PM

#

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = SFT_MODEL_PATH,
        max_seq_length = max_seq_length,
        load_in_4bit = False,
        load_in_8bit = False, # [NEW!] A bit more accurate, uses 2x memory
        full_finetuning = False, # [NEW!] We have full finetuning now!
        fast_inference = True,
    )

    model = FastLanguageModel.get_peft_model(
        model,
        r = 32, # Match your SFT rank for stability
        target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
        lora_alpha = 32*2,
        use_gradient_checkpointing = "unsloth",
        random_state = 3407,
        use_rslora = True,  # We support rank stabilized LoRA
        loftq_config = None, # And LoftQ
    )

    train_dataset = load_dataset("csv", data_files=DATASET_PATH, split="train")

    tokenizer = get_chat_template(
        tokenizer,
        chat_template = "qwen-2.5",
    )

    train_dataset = train_dataset.map(transform_conversation)
    train_dataset = train_dataset.map(formatting_prompts_func, batched=True, fn_kwargs={"tokenizer": tokenizer})
    # train_eval_split = train_dataset.train_test_split(test_size=200, seed=42)
    # train_dataset = train_eval_split["train"]
    # eval_dataset = train_eval_split["test"]

    grad_acc_steps = 4
    train_batch_size = 1
    eval_batch_size = num_generations = 4
    steps_per_epoch = len(train_dataset) / (grad_acc_steps * train_batch_size)
    steps_per_epoch_ceil = math.ceil(steps_per_epoch)
    print(steps_per_epoch_ceil, " is the steps/epoch\n\n")
    log_save_eval_steps = 100

#

    trainer_args = GRPOConfig(
        learning_rate = 5e-6,
        adam_beta1 = 0.9,
        adam_beta2 = 0.99,
        weight_decay = 0.1,
        warmup_ratio = 0.1,
        lr_scheduler_type = "cosine",
        optim = "adamw_torch_fused",
        logging_steps = 1,
        per_device_train_batch_size = train_batch_size,
        gradient_accumulation_steps = grad_acc_steps, # Increase to 4 for smoother training
        num_generations = num_generations, # Decrease if out of memory
        max_prompt_length = max_prompt_length,
        max_completion_length = max_seq_length - max_prompt_length,
        num_train_epochs = 1, # Set to 1 for a full training run
        # save configs
        save_strategy="steps",
        save_steps = log_save_eval_steps,
        save_total_limit=2,

        # eval configs
        # eval_strategy="steps",
        # eval_steps=log_save_eval_steps, # log_save_eval_steps
        # per_device_eval_batch_size=eval_batch_size,    # batch size for evaluation
        # eval_accumulation_steps = 1,

        max_grad_norm = 0.1,
        report_to = "tensorboard", # Can use Weights & Biases
        output_dir=f"./{SAVE_FOLDER_PATH}/checkpoint",
        logging_dir=os.path.join(SAVE_ROOT_DIRECTORY, "logs", RUN_NAME),
        run_name=RUN_NAME,
    )

    trainer = GRPOTrainer(
        model = model,
        processing_class = tokenizer,
        train_dataset = train_dataset,
        # eval_dataset=eval_dataset,
        reward_funcs = [
            match_format_exactly,
            match_format_approximately,
            check_answer,
        ],
        args = trainer_args,
    )

#

I have commented out eval to make it work
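
As a side note on the step arithmetic in the snippet above (`grad_acc_steps`, `train_batch_size`, `log_save_eval_steps`), here is a small standalone sketch of how many evaluation rounds one epoch would contain. The dataset size of 1000 rows is hypothetical, and the helper `steps_per_epoch` is introduced only for illustration. It mirrors the formula from the snippet and assumes a single device and that evaluation fires at every multiple of `eval_steps`. How GRPOTrainer itself counts steps when `num_generations` is set is not covered here.

```python
import math


def steps_per_epoch(num_examples: int, train_batch_size: int, grad_acc_steps: int) -> int:
    """Optimizer steps per epoch, using the same formula as the snippet above."""
    return math.ceil(num_examples / (train_batch_size * grad_acc_steps))


# Hypothetical dataset size; the other values are taken from the snippet.
num_examples = 1000
train_batch_size = 1
grad_acc_steps = 4
log_save_eval_steps = 100

steps = steps_per_epoch(num_examples, train_batch_size, grad_acc_steps)

# Steps at which an evaluation would run, assuming one per multiple of eval_steps.
eval_points = list(range(log_save_eval_steps, steps + 1, log_save_eval_steps))

print(steps)        # 250
print(eval_points)  # [100, 200]
```

With only a handful of evaluation rounds per epoch, a crash that hits only some of them can look random, so noting which step each failure occurs at may help when reporting it.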

obtuse snow Jan 3, 2026, 5:16 PM

#

ok so i'd appreciate it if you could open an actual github issue for this at github.com/unslothai/unsloth and share all the info you've shared here, that way it will be looked at

#

cause this might be a bug of some sort

queen nebula Jan 3, 2026, 7:40 PM

#

Thank you. I have opened the issue: https://github.com/unslothai/unsloth/issues/3828

#Unsloth's gradient checkpointing crashing during GRPO training on evaluation
