# Loss stuck at 0.6931 during DPO training

hard salmon

Hey! I'm trying to do some DPO by following your notebook, but the loss is stuck at 0.6931 regardless of what I do.

Code

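from unsloth import FastLanguageModel, is_bfloat16_supported
from transformers import TrainingArguments
from trl import DPOTrainer

# max_input_length, max_output_length and train_dataset are defined earlier in the script (not shown)
model_name = 'meta-llama/Meta-Llama-3.1-8B-Instruct'
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_name,
    max_seq_length = max_input_length + max_output_length,
    dtype = None,  # None = auto-detect (bf16 where supported, else fp16)
)

model = FastLanguageModel.get_peft_model(
    model,
    r = 64,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 64,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    max_seq_length = max_input_length + max_output_length,
)

# Older TRL API: newer releases take beta/max_length/max_prompt_length via DPOConfig
# and expect the tokenizer as processing_class
dpo_trainer = DPOTrainer(
    model = model,
    ref_model = None,  # with a PEFT model, the base model with adapters disabled acts as the reference
    args = TrainingArguments(
        per_device_train_batch_size = 4,
        gradient_accumulation_steps = 8,  # effective batch size per device = 4 * 8 = 32
        warmup_ratio = 0.1,
        num_train_epochs = 3,
        # learning_rate is not set here, so the TrainingArguments default (5e-5) applies
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        seed = 42,
        output_dir = "outputs",
    ),
    beta = 0.1,
    train_dataset = train_dataset,
    # eval_dataset = YOUR_DATASET_HERE,
    tokenizer = tokenizer,
    max_length = 1024,
    max_prompt_length = 512,
)
dpo_trainer.train()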
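The number 0.6931 is ln 2. With the default sigmoid DPO loss, -log σ(β · margin), the loss is exactly ln 2 whenever the policy's log-probability ratios on the chosen and rejected answers match the reference's, i.e. the implicit reward margin is zero. With `ref_model = None` and a LoRA model, TRL uses the base model with adapters disabled as the reference, and LoRA's B matrices start at zero, so training begins exactly at ln 2; a loss that never moves suggests the adapters are not pulling the policy away from the reference. The sketch below restates the per-pair loss in plain Python; the function name and the log-probability values are hypothetical and only illustrate the arithmetic, not TRL's implementation.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Sigmoid DPO loss for one preference pair: -log(sigmoid(beta * margin))."""
    # Log-ratio of policy vs. reference for the chosen and the rejected response
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(x)) == softplus(-x); this form avoids overflow for large |x|
    z = -logits
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# Policy identical to the reference (e.g. LoRA adapters still at their zero init)
print(round(dpo_loss(-42.0, -40.0, -42.0, -40.0), 4))  # 0.6931
print(round(math.log(2), 4))                            # 0.6931
# Policy favours the chosen answer more than the reference does -> loss below ln 2
print(dpo_loss(-38.0, -44.0, -42.0, -40.0) < math.log(2))  # True
```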

gentle scarab

> loss is stuck on 0.6931 regardless of what I do

What have you tried? Did you try raising batch_size or decreasing learning_rate?

hard salmon

I have tried decreasing the learning rate

haven't tried changing the batch size

Do you think it could matter?

compact leaf

your current effective batch size is 4x8 = 32.

lower it to 4 or 8

maybe after a few epochs, you can make it larger

hard salmon

ok thanks, will try

So just to be sure, do you mean something like this:

per_device_train_batch_size = 2,
gradient_accumulation_steps = 2

or this?

per_device_train_batch_size = 4,
gradient_accumulation_steps = 1
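For reference, the effective batch size is the per-device batch size times the gradient accumulation steps times the number of devices. A small helper (hypothetical name, single GPU by default) shows that both proposed options give 4, compared with 32 in the original config:

```python
def effective_batch_size(per_device_train_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_devices: int = 1) -> int:
    """Number of preference pairs contributing to each optimizer step."""
    return per_device_train_batch_size * gradient_accumulation_steps * num_devices


configs = {
    "original": (4, 8),
    "option A": (2, 2),
    "option B": (4, 1),
}
for name, (bs, accum) in configs.items():
    print(name, effective_batch_size(bs, accum))
# original 32
# option A 4
# option B 4
```

Option B processes all 4 pairs in one forward/backward pass and needs more memory per step; option A accumulates two micro-batches of 2. Each optimizer update is computed from 4 pairs either way.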

hard salmon

thanks, will try

hard salmon

Hey it works!

But why? Why do we need a smaller batch size when doing DPO?

gusty zodiac

can you also share the training graph?

hard salmon

Of the training loss? The eval loss is constant, though; not sure whether that's expected in DPO.

compact leaf

Generally you want to start with smaller batch sizes

Because larger batches tend to converge more slowly: for a fixed number of epochs they give fewer optimizer steps
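One concrete reason batch size matters with a fixed number of epochs: the number of optimizer updates shrinks in proportion to the effective batch size. A rough count, assuming a hypothetical dataset of 10,000 preference pairs and counting a trailing partial batch as one step (the Trainer's exact step count can differ slightly):

```python
import math


def optimizer_steps(num_examples: int, effective_batch: int, num_epochs: int) -> int:
    """Approximate optimizer updates over training (partial final batch counts as a step)."""
    return math.ceil(num_examples / effective_batch) * num_epochs


for eff in (4, 32):
    print(eff, optimizer_steps(10_000, eff, num_epochs=3))
# 4 7500
# 32 939
```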

This is why people also increase the learning rate if they increase the batch size.
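The usual rule of thumb here is the linear scaling rule: multiply the learning rate by the same factor as the batch size; with Adam-style optimizers a square-root factor is sometimes preferred. Both are heuristics rather than guarantees, especially for DPO. A minimal sketch, where the base learning rate of 5e-6 at batch size 8 is a hypothetical starting point:

```python
import math


def scale_learning_rate(base_lr: float, base_batch: int, new_batch: int,
                        rule: str = "linear") -> float:
    """Rescale a learning rate tuned at base_batch for new_batch (heuristic only)."""
    ratio = new_batch / base_batch
    if rule == "linear":
        return base_lr * ratio
    if rule == "sqrt":
        return base_lr * math.sqrt(ratio)
    raise ValueError(f"unknown rule: {rule!r}")


# Hypothetical: lr tuned at effective batch 8, moving to effective batch 32
print(scale_learning_rate(5e-6, 8, 32))               # linear: 4x
print(scale_learning_rate(5e-6, 8, 32, rule="sqrt"))  # sqrt: 2x
```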

gentle scarab

Just curious: if larger batches only converge more slowly, the loss shouldn't be stuck at 0.6931 but should decrease slowly. That doesn't match what OP described.

hard salmon

Actually tried with batch size of 32 now and that works 🤷‍♂️

But the eval loss is still stuck on 0.6931
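One way to tell a run that is really stuck from one that is converging slowly is to check whether every logged value sits at ln 2. The helper below takes a list of dicts shaped like the Hugging Face Trainer's `state.log_history`, where training entries carry `loss` and evaluation entries carry `eval_loss`; the helper name, the tolerance, and the example history are hypothetical.

```python
import math
from typing import Dict, List

LN2 = math.log(2)  # 0.6931...


def stuck_at_ln2(log_history: List[Dict[str, float]], key: str = "loss",
                 tol: float = 1e-3) -> bool:
    """True if at least one value was logged for `key` and all of them stay within tol of ln 2."""
    values = [entry[key] for entry in log_history if key in entry]
    return bool(values) and all(abs(v - LN2) <= tol for v in values)


# Hypothetical history: training loss starts moving, eval loss does not
history = [
    {"loss": 0.6931, "step": 1},
    {"loss": 0.6931, "step": 2},
    {"eval_loss": 0.6931, "step": 2},
    {"loss": 0.6012, "step": 3},
]
print(stuck_at_ln2(history, "loss"))       # False
print(stuck_at_ln2(history, "eval_loss"))  # True
```

On the real run this would be called as `stuck_at_ln2(dpo_trainer.state.log_history, "eval_loss")`. TRL's DPOTrainer also logs reward metrics such as `rewards/margins` (`eval_rewards/margins` during evaluation); a margin that stays at 0 points to the same issue.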

compact leaf

add this to your training args

lr_scheduler_type = "cosine_with_restarts",
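In `transformers`, `"cosine_with_restarts"` selects `get_cosine_with_hard_restarts_schedule_with_warmup`, whose `num_cycles` defaults to 1. With a single cycle the schedule matches plain `"cosine"`, so restarts only happen if more cycles are requested (recent versions accept `lr_scheduler_kwargs={"num_cycles": ...}` in `TrainingArguments`; check your version). The function below is a standalone re-statement of that learning-rate multiplier for illustration, not the library code; the step counts are hypothetical.

```python
import math


def cosine_hard_restarts_multiplier(step: int, warmup_steps: int,
                                    total_steps: int, num_cycles: int = 1) -> float:
    """LR multiplier: linear warmup, then a cosine decay that restarts num_cycles times."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # Position within the current cycle, from 0 (peak LR) towards 1 (LR near 0)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0))))


# 10 warmup steps, 110 total, 2 cycles: the LR jumps back to its peak at step 60
for step in (0, 5, 10, 35, 60, 85, 110):
    print(step, round(cosine_hard_restarts_multiplier(step, 10, 110, num_cycles=2), 3))
# multipliers: 0.0, 0.5, 1.0, 0.5, 1.0, 0.5, 0.0
```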
hidden hazel

Can you please share the notebook, as well as the required library versions?