#Unsloth's gradient checkpointing crashing during GRPO training on evaluation


queen nebula
#

Do you know why this error could be showing up? I am doing GRPO with Qwen 2.5. I checked that I already have the latest versions of vllm and unsloth:
RuntimeError: Inplace update to inference tensor outside InferenceMode is not allowed.You can make a clone to get a normal tensor before doing inplace update.See https://github.com/pytorch/rfcs/pull/17 for more details.

It seems the crash occurred in unsloth_zoo/gradient_checkpointing.py
PyTorch's Twitter account has recommended cloning the inference tensor

The most annoying part of this error is that it doesn't even occur on every evaluation, just randomly, and it's driving me crazy

queen nebula
#

These were my evaluation configs, that I have since commented out:
`# eval_strategy="steps",
eval_steps=log_save_eval_steps,
per_device_eval_batch_size=eval_batch_size, # batch size for evaluation
fp16_full_eval = True,
eval_accumulation_steps = 1,`

It works if I don't perform evaluation.

obtuse snow
#

what are ur installed versions of unsloth and unsloth zoo?
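
A quick way to collect those versions in one go is sketched below. It uses only the standard library; the list of package names is an assumption about what is relevant to this setup, so adjust it as needed.

```python
from importlib.metadata import PackageNotFoundError, version

# Package names are assumptions about what matters for this setup.
PACKAGES = ["unsloth", "unsloth_zoo", "vllm", "trl", "torch", "transformers"]


def installed_versions(packages=PACKAGES):
    """Return {package name: version string, or 'not installed'}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver}")
```

Pasting that output into the thread (or a GitHub issue) makes it easier to tell whether the installed packages are mismatched.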

queen nebula
obtuse snow
#

can you share your notebook ?

queen nebula
#
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = SFT_MODEL_PATH,
        max_seq_length = max_seq_length,
        load_in_4bit = False,
        load_in_8bit = False, # [NEW!] A bit more accurate, uses 2x memory
        full_finetuning = False, # [NEW!] We have full finetuning now!
        fast_inference = True,
    )

    model = FastLanguageModel.get_peft_model(
        model,
        r = 32, # Match your SFT rank for stability
        target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
        lora_alpha = 32*2,
        use_gradient_checkpointing = "unsloth",
        random_state = 3407,
        use_rslora = True,  # We support rank stabilized LoRA
        loftq_config = None, # And LoftQ
    )

    train_dataset = load_dataset("csv", data_files=DATASET_PATH, split="train")

    tokenizer = get_chat_template(
        tokenizer,
        chat_template = "qwen-2.5",
    )

    train_dataset = train_dataset.map(transform_conversation)
    train_dataset = train_dataset.map(formatting_prompts_func, batched=True, fn_kwargs={"tokenizer": tokenizer})
    # train_eval_split = train_dataset.train_test_split(test_size=200, seed=42)
    # train_dataset = train_eval_split["train"]
    # eval_dataset = train_eval_split["test"]

    grad_acc_steps = 4
    train_batch_size = 1
    eval_batch_size = num_generations = 4
    steps_per_epoch = len(train_dataset) / (grad_acc_steps * train_batch_size)
    steps_per_epoch_ceil = math.ceil(steps_per_epoch)
    print(steps_per_epoch_ceil, " is the steps/epoch\n\n")
    log_save_eval_steps = 100

#
    trainer_args = GRPOConfig(
        learning_rate = 5e-6,
        adam_beta1 = 0.9,
        adam_beta2 = 0.99,
        weight_decay = 0.1,
        warmup_ratio = 0.1,
        lr_scheduler_type = "cosine",
        optim = "adamw_torch_fused",
        logging_steps = 1,
        per_device_train_batch_size = train_batch_size,
        gradient_accumulation_steps = grad_acc_steps, # Increase to 4 for smoother training
        num_generations = num_generations, # Decrease if out of memory
        max_prompt_length = max_prompt_length,
        max_completion_length = max_seq_length - max_prompt_length,
        num_train_epochs = 1, # Set to 1 for a full training run
        # save configs
        save_strategy="steps",
        save_steps = log_save_eval_steps,
        save_total_limit=2,

        # eval configs
        # eval_strategy="steps",
        # eval_steps=log_save_eval_steps, # log_save_eval_steps
        # per_device_eval_batch_size=eval_batch_size,    # batch size for evaluation
        # eval_accumulation_steps = 1,

        max_grad_norm = 0.1,
        report_to = "tensorboard", # Can use Weights & Biases
        output_dir=f"./{SAVE_FOLDER_PATH}/checkpoint",
        logging_dir=os.path.join(SAVE_ROOT_DIRECTORY, "logs", RUN_NAME),
        run_name=RUN_NAME,
    )

    trainer = GRPOTrainer(
        model = model,
        processing_class = tokenizer,
        train_dataset = train_dataset,
        # eval_dataset=eval_dataset,
        reward_funcs = [
            match_format_exactly,
            match_format_approximately,
            check_answer,
        ],
        args = trainer_args,
    )

#

I have commented out eval to make it work
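
Since the crash only shows up on some evaluations, it helps to know how many evaluations a run actually performs. The sketch below mirrors the steps-per-epoch arithmetic from my script and lists the steps where evaluation would fire with eval_steps = 100. The dataset size of 1000 is hypothetical, the helper names are made up, and the trainer may count steps differently than this simple formula.

```python
import math


def steps_per_epoch(num_examples, grad_acc_steps, train_batch_size):
    """Optimizer steps per epoch, same formula as in the script above."""
    return math.ceil(num_examples / (grad_acc_steps * train_batch_size))


def eval_steps_in_epoch(total_steps, eval_every):
    """Steps at which an evaluation would be triggered."""
    return list(range(eval_every, total_steps + 1, eval_every))


num_examples = 1000  # hypothetical dataset size, not from the real run
total = steps_per_epoch(num_examples, grad_acc_steps=4, train_batch_size=1)
print(total)                                       # 250
print(eval_steps_in_epoch(total, eval_every=100))  # [100, 200]
```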

obtuse snow
#

ok so I'd appreciate it if you could open an actual GitHub issue for this at github.com/unslothai/unsloth and share all the info you've shared here. that way it will be looked at

#

cause this might be a bug of some sort

queen nebula