Loss stuck on 0.6931 on DPO training | Unsloth AI | Page 1

hard salmon Oct 14, 2024, 10:22 AM

#

Hey! Trying to do some DPO by following your notebook, but the loss is stuck on 0.6931 regardless of what I do.

Code

from unsloth import FastLanguageModel, is_bfloat16_supported
from transformers import TrainingArguments
from trl import DPOTrainer

# max_input_length, max_output_length and train_dataset are defined elsewhere (not shown)
model_name = 'meta-llama/Meta-Llama-3.1-8B-Instruct'
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_name,
    max_seq_length = max_input_length + max_output_length,
    dtype = None,  # None lets Unsloth pick the dtype automatically
)

model = FastLanguageModel.get_peft_model(
    model,
    r = 64,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 64,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none", # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    max_seq_length = max_input_length + max_output_length,
)

dpo_trainer = DPOTrainer(
    model = model,
    ref_model = None,
    args = TrainingArguments(
        per_device_train_batch_size = 4,
        gradient_accumulation_steps = 8,  # effective batch size 4 x 8 = 32
        warmup_ratio = 0.1,
        num_train_epochs = 3,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        seed = 42,
        output_dir = "outputs",
    ),
    beta = 0.1,
    train_dataset = train_dataset,
    # eval_dataset = YOUR_DATASET_HERE,
    tokenizer = tokenizer,
    max_length = 1024,
    max_prompt_length = 512,
)
dpo_trainer.train()

gentle scarab Oct 14, 2024, 10:43 AM

#

loss is stuck on 0.6931 regardless of what I do

What have you tried? Did you try raising batch_size or decreasing learning_rate?

hard salmon Oct 14, 2024, 11:21 AM

#

I have tried to decrease learning rate

#

not tried batch size

#

Do you think it could matter?

compact leaf Oct 14, 2024, 1:03 PM

#

hard salmon Do you think it could matter?

Try using a lower batch size first

#

your current effective batch size is 4x8 = 32.

#

lower it to 4 or 8

#

maybe after a few epochs, you can make it larger

hard salmon Oct 14, 2024, 1:12 PM

#

ok thanks, will try

#

So just to be sure, you mean something like
per_device_train_batch_size = 2,
gradient_accumulation_steps = 2

#

?

#

Or
per_device_train_batch_size = 4,
gradient_accumulation_steps = 1

compact leaf Oct 14, 2024, 2:06 PM

#

hard salmon Or per_device_train_batch_size = 4, gradient_accumulation_steps...

yes either
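
For anyone following along, the two options above come out to the same effective batch size. A minimal sketch of the arithmetic, assuming a single GPU (the thread does not say how many devices were used); the helper name and the dataset size of 1000 are made up for illustration:

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Number of examples that contribute to one optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices

original = effective_batch_size(4, 8)   # settings from the first post
option_a = effective_batch_size(2, 2)   # first suggestion
option_b = effective_batch_size(4, 1)   # second suggestion
print(original, option_a, option_b)

# Rough number of optimizer updates per epoch for a hypothetical
# dataset of 1000 preference pairs (real trainers may round differently).
dataset_size = 1000
updates_original = dataset_size // original
updates_small = dataset_size // option_a
print(updates_original, updates_small)
```

The practical difference is that the smaller effective batch gives many more weight updates per epoch for the same data.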

hard salmon Oct 14, 2024, 2:09 PM

#

thanks, will try

hard salmon Oct 14, 2024, 2:44 PM

#

Hey it works!

#

But why? Why do we need a smaller batch size when doing DPO?

gusty zodiac Oct 14, 2024, 4:58 PM

#

can you also share the training graph?

hard salmon Oct 14, 2024, 5:27 PM

#

of the training loss? The eval loss is constant, not sure if that is expected in DPO or not

compact leaf Oct 14, 2024, 6:15 PM

#

hard salmon But why? Why do we need a smaller batch size when doing DPO?

It's not just for DPO

#

Generally you want to start with smaller batch sizes

#

Because larger batches tend to converge more slowly

#

This is why people also increase the learning rate if they increase the batch size.
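
To make the learning-rate remark concrete: two heuristics people commonly reach for are linear scaling and square-root scaling of the learning rate with batch size. The sketch below only shows the arithmetic. Neither rule is specific to DPO or recommended in this thread, and the base learning rate of 5e-6 is a hypothetical value, not one taken from the original post:

```python
import math

def scale_learning_rate(base_lr, base_batch, new_batch, rule="linear"):
    """Rescale a learning rate when the effective batch size changes."""
    ratio = new_batch / base_batch
    if rule == "linear":
        # Linear scaling: learning rate grows in proportion to batch size.
        return base_lr * ratio
    if rule == "sqrt":
        # Square-root scaling: a more conservative alternative.
        return base_lr * math.sqrt(ratio)
    raise ValueError(f"unknown rule: {rule!r}")

base_lr = 5e-6  # hypothetical learning rate tuned at batch size 4
linear_lr = scale_learning_rate(base_lr, 4, 32, rule="linear")
sqrt_lr = scale_learning_rate(base_lr, 4, 32, rule="sqrt")
print(linear_lr, sqrt_lr)
```

Treat both as starting points for a sweep rather than as guarantees.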

gentle scarab Oct 15, 2024, 2:28 AM

#

Just curious: if larger batches only converge more slowly, the loss shouldn't be stuck on 0.6931, it should decrease slowly. That doesn't match what OP described.
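
A note on the number itself: 0.6931 is ln 2, which is what the standard sigmoid DPO loss, -log(sigmoid(beta * margin)), evaluates to when the margin between the chosen and rejected log-ratios is exactly zero, i.e. when the policy scores every pair the same way the reference does. A loss pinned at that value therefore suggests the policy is not moving away from the reference at all, rather than converging slowly. A minimal sketch in plain Python; the function name and the example log-probabilities are made up for illustration:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sigmoid DPO loss for a single preference pair (illustrative helper)."""
    # Margin: how much more the policy prefers "chosen" over "rejected"
    # than the reference model does.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    # -log(sigmoid(x)) is the same as log(1 + exp(-x)).
    return math.log1p(math.exp(-beta * margin))

# Policy identical to the reference: margin is 0, so the loss is ln 2.
untrained = dpo_loss(-12.0, -15.0, -12.0, -15.0)
print(round(untrained, 4))  # 0.6931

# Policy has shifted toward the chosen answer: loss drops below ln 2.
shifted = dpo_loss(-10.0, -17.0, -12.0, -15.0)
print(shifted < untrained)  # True
```

With ref_model = None and a LoRA adapter, my understanding is that the reference is the base model with the adapter disabled, and a freshly initialised adapter is a no-op, so seeing ln 2 on the first logged step is expected. It only points to a problem if the value never changes.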

hard salmon Oct 15, 2024, 8:00 AM

#

Actually tried with batch size of 32 now and that works 🤷‍♂️

#

But the eval loss is still stuck on 0.6931

compact leaf Oct 19, 2024, 4:03 AM

#

hard salmon Actually tried with batch size of 32 now and that works 🤷‍♂️

did you try using a different lr schedule?

#

add this to your training args

lr_scheduler_type = "cosine_with_restarts",
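
For readers unfamiliar with that schedule, here is a rough sketch of the shape a cosine schedule with hard restarts and linear warmup takes. This is a plain-Python illustration of the idea, not the library's code; the function name, the step counts, and num_cycles = 2 are all assumptions chosen for the example:

```python
import math

def cosine_with_restarts_factor(step, total_steps, warmup_steps=0, num_cycles=1):
    """Learning-rate multiplier in [0, 1] at a given optimizer step."""
    if step < warmup_steps:
        # Linear warmup from 0 up to the peak learning rate.
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # Each cycle decays from 1 to 0 along a half cosine, then jumps back to 1.
    cycle_pos = (num_cycles * progress) % 1.0
    return 0.5 * (1.0 + math.cos(math.pi * cycle_pos))

total, warmup = 100, 10  # hypothetical step counts (warmup_ratio = 0.1)
for step in (0, 5, 10, 54, 55, 99):
    factor = cosine_with_restarts_factor(step, total, warmup, num_cycles=2)
    print(step, round(factor, 3))
```

In this sketch a single cycle (num_cycles = 1) is identical to plain cosine decay, so the restarts only matter when more than one cycle is configured.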

hidden hazel Oct 30, 2024, 4:36 PM

#

Can you please share the notebook, as well as the versions of the libraries required?

compact leaf Nov 1, 2024, 3:54 AM

#

hidden hazel Can you please share the notebook, as well as the versions of the libraries requ...

https://docs.unsloth.ai/get-started/unsloth-notebooks
