# AttributeError: 'LlamaForCausalLM' object has no attribute 'load_lora'


native sleet

Hey! I recently trained a GRPO model following this tutorial https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_4_(14B)-GRPO.ipynb#scrollTo=__w_7GamL1m1

I saved the LoRA with model.save_lora("grpo_saved_lora"), and I load it for inference like this:

```python
text = tokenizer.apply_chat_template([
    {"role" : "system", "content" : SYSTEM_PROMPT},
    {"role" : "user", "content" : "Which is bigger? 9.11 or 9.9?"},
], tokenize = False, add_generation_prompt = True)

from vllm import SamplingParams
sampling_params = SamplingParams(
    temperature = 0.8,
    top_p = 0.95,
    max_tokens = 1024,
)
output = model.fast_generate(
    text,
    sampling_params = sampling_params,
    lora_request = model.load_lora("grpo_saved_lora"),
)[0].outputs[0].text

output
```

I'm getting this error: AttributeError: 'LlamaForCausalLM' object has no attribute 'load_lora'

Any help? Thank you!
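
The class name in the error is a useful hint. A hedged reading, based on an understanding of Unsloth's internals that is not confirmed anywhere in this thread: loading with fast_inference=True attaches model.fast_generate, while load_lora is only attached once LoRA adapters are added with FastLanguageModel.get_peft_model. A session that only calls from_pretrained would then have fast_generate but not load_lora, and the object would still be a plain LlamaForCausalLM (the error shows that unsloth/Phi-4 loads as a Llama-architecture model) rather than a PEFT-wrapped one. The sketch below just checks which of those two attributes exist; the function name and hint strings are hypothetical.

```python
def diagnose_lora_support(model) -> str:
    """Return a short hint about why `model.load_lora` may be unavailable.

    Assumption (verify against your Unsloth version): `fast_generate` is
    attached when loading with fast_inference=True, and `load_lora` is only
    attached after FastLanguageModel.get_peft_model adds LoRA adapters.
    """
    has_fast_generate = hasattr(model, "fast_generate")
    has_load_lora = hasattr(model, "load_lora")

    if has_fast_generate and has_load_lora:
        return "ok: model.load_lora should be callable"
    if not has_fast_generate:
        # No vLLM engine appears to be hooked up to this model at all.
        return "no-vllm: reload with fast_inference=True"
    # vLLM is attached, but the LoRA helper methods are not.
    return "no-lora-helpers: call FastLanguageModel.get_peft_model(...) first"
```

Calling print(diagnose_lora_support(model)) right before fast_generate shows which case applies; with the from_pretrained call shown later in this thread (fast_inference=True), the expected result under this assumption is the "no-lora-helpers" case.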
sinful edge
model.save_lora("grpo_saved_lora")

use model.save_pretrained and tokenizer.save_pretrained instead

native sleet

Should I retrain, then? Is there nothing I can do with the LoRA I already saved? 🥺
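
Retraining is probably unnecessary as long as the adapter folder on disk is intact. A quick check, assuming save_lora wrote the usual PEFT adapter layout: an adapter_config.json next to adapter_model.safetensors or adapter_model.bin. Those file names are PEFT conventions and are an assumption here, not something this thread confirms; the helper name is hypothetical.

```python
from pathlib import Path


def adapter_files_present(lora_dir: str) -> bool:
    """Return True if `lora_dir` looks like a saved PEFT LoRA adapter.

    Assumes the standard PEFT file names; adjust them if your setup differs.
    """
    path = Path(lora_dir)
    has_config = (path / "adapter_config.json").is_file()
    # PEFT writes safetensors by default; older setups may use a .bin file.
    has_weights = any(
        (path / name).is_file()
        for name in ("adapter_model.safetensors", "adapter_model.bin")
    )
    return has_config and has_weights
```

If adapter_files_present("grpo_saved_lora") returns True, the trained LoRA weights should still be there, and the remaining problem is how they are loaded rather than the GRPO run itself.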

sinful edge

wait how are you loading it?

native sleet

I load my base model like this:

```python
from unsloth import FastLanguageModel

# max_seq_length and lora_rank are set earlier in the notebook
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Phi-4",
    max_seq_length=max_seq_length,
    load_in_4bit=True,  # False for LoRA 16bit
    fast_inference=True,  # Enable vLLM fast inference
    max_lora_rank=lora_rank,
    gpu_memory_utilization=0.7,  # Reduce if out of memory
)
```

and then:

```python
output = model.fast_generate(
    text,
    sampling_params=sampling_params,
    lora_request=model.load_lora("grpo_lora"),
)[0].outputs[0].text
```

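
If load_lora is missing but model.fast_generate works, one option is to build the LoRA request with vLLM directly. This is a minimal sketch under assumptions: that the vLLM engine Unsloth started has LoRA support enabled (which fast_inference=True together with max_lora_rank is meant to set up), and that the saved adapter folder is in a format vLLM can read. The helper name get_lora_request and its default lora_name and lora_id values are hypothetical. vllm.lora.request.LoRARequest is a real vLLM class; its first three positional fields are an adapter name, a positive integer ID, and the adapter path.

```python
def get_lora_request(model, lora_path: str, lora_name: str = "grpo_lora", lora_id: int = 1):
    """Return an object to pass as `lora_request=` to model.fast_generate.

    Prefers Unsloth's own `load_lora` helper when it is attached; otherwise
    falls back to a plain vLLM LoRARequest (assumes LoRA is enabled in the
    underlying vLLM engine).
    """
    if hasattr(model, "load_lora"):
        return model.load_lora(lora_path)
    # Imported lazily so this helper can be defined without vLLM installed.
    from vllm.lora.request import LoRARequest

    # Positional fields: adapter name, integer ID (must be > 0), adapter path.
    return LoRARequest(lora_name, lora_id, lora_path)
```

It would replace the lora_request argument, e.g. lora_request=get_lora_request(model, "grpo_saved_lora"), pointing at whichever folder the adapter was actually saved to. Another route worth trying is to call FastLanguageModel.get_peft_model(...) in the inference session, with the same r, lora_alpha and target_modules used for training, before calling model.load_lora, since that is where Unsloth is assumed to attach load_lora; this has not been verified in this thread.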
sinful edge

oh i see what you're doing

vllm

native sleet

yes!