#Saving LoRA weights only


young agate
#

Can you try doing this?

model.save_pretrained_merged("model", tokenizer, save_method = "lora")
# or
model.push_to_hub_merged("hf/model", tokenizer, save_method = "lora", token = "")
young agate
#

Can you maybe elaborate? What model are you using?

young agate
#

oh wait

#

in CPT, I think you actually save the full Embedding and LM Head

#

so you didn't do LoRA on them

#
model = FastLanguageModel.get_peft_model(
    model,
    r = 128, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",

                      "embed_tokens", "lm_head",], # Add for continual pretraining
    lora_alpha = 32,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = True,   # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

Those embed_tokens and lm_head are not LoRA, they're trained as full parameters. So it kind of makes sense if the saved files are very big.
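
A rough parameter count illustrates why the two full matrices can dominate the saved file. This is only a back-of-the-envelope sketch: the dimensions below are assumed for a Llama-3.2-1B style model (they are not stated in the thread), and it assumes embed_tokens and lm_head are each saved as a separate full matrix. The bytes-per-parameter value is also an assumption, since the saved dtype is not given here.

```python
# Assumed Llama-3.2-1B style dimensions (not stated in the thread).
HIDDEN = 2048        # hidden size
INTERMEDIATE = 8192  # MLP intermediate size
KV_DIM = 512         # assumed 8 KV heads * head dim 64
VOCAB = 128256       # vocabulary size
LAYERS = 16          # number of decoder layers
RANK = 128           # r from the get_peft_model call above


def lora_params(in_features, out_features, r=RANK):
    """A LoRA adapter adds two low-rank matrices: A (r x in) and B (out x r)."""
    return r * (in_features + out_features)


# (in_features, out_features) of each adapted projection in one layer.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

# Low-rank adapter parameters summed over all layers.
lora_total = LAYERS * sum(lora_params(i, o) for i, o in PROJECTIONS.values())

# embed_tokens and lm_head counted as full (vocab x hidden) matrices.
full_total = 2 * VOCAB * HIDDEN


def size_gb(params, bytes_per_param):
    """Size in GB for a given storage width (2 = 16-bit, 4 = 32-bit)."""
    return params * bytes_per_param / 1e9


print(f"LoRA adapter params:           {lora_total:,}")
print(f"embed_tokens + lm_head params: {full_total:,}")
print(f"full matrices at 2 bytes/param: {size_gb(full_total, 2):.2f} GB")
print(f"LoRA adapters at 2 bytes/param: {size_gb(lora_total, 2):.2f} GB")
```

Under these assumptions the full matrices hold several times more parameters than all the rank-128 adapters combined, which is consistent with the file being large even though only "LoRA" was saved.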

#

I'll test them tonight to make sure

naive girder
#

I wasn't able to save the full model because of OOM, that's why I saved the LoRA

naive girder
#

I saw your question:
"Does anyone have actual success in doing CPT+SFT push-to-hub LoRa only ?"
So...
The answer is "Yes".
I have trained LoRA successfully.
The model was Phi-4, CPT + instruction fine-tuning, pushed the LoRA to the hub. The size of the LoRA was as expected. Worked fine.
Maybe I misunderstood something?

#

It will finish in about 2 hours.
Want me to push the LoRA to the hub for a test?
The model is R1-distilled-Qwen2.5-14B

young agate
#

so here I got a 1.41GB file for CPT

#

but compared to the original unsloth/Llama-3.2-1B-Instruct, it's still smaller

#

I would assume that the difference will be larger if you use a larger model
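
To see where the bytes in a saved adapter file actually go, one option is to read the safetensors header, which is an 8-byte little-endian length followed by a JSON table of tensor names and byte offsets. The sketch below uses only the standard library. It assumes the adapter was written as a safetensors file and that LoRA tensors have `lora_` in their names, as PEFT usually does; the helper names are hypothetical and not part of any library.

```python
import json
import struct
from collections import defaultdict


def read_safetensors_header(path):
    """Return the tensor table from a safetensors file without loading weights."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # Optional free-form metadata entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def bytes_by_group(header):
    """Sum tensor bytes, split into LoRA matrices vs. everything else."""
    totals = defaultdict(int)
    for name, info in header.items():
        start, end = info["data_offsets"]
        # Assumption: LoRA tensors are named with "lora_" (lora_A / lora_B).
        group = "lora" if "lora_" in name else "full"
        totals[group] += end - start
    return dict(totals)
```

For example, `bytes_by_group(read_safetensors_header("model/adapter_model.safetensors"))` would report how much of the file is low-rank adapters and how much is fully saved modules such as embed_tokens and lm_head. The path here is an assumption based on the save call earlier in the thread.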