Hey! I recently trained a GRPO model following this tutorial: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_4_(14B)-GRPO.ipynb#scrollTo=__w_7GamL1m1
I saved the LoRA with `model.save_lora("grpo_saved_lora")`.
And when loading the LoRA for inference as:
```python
text = tokenizer.apply_chat_template([
    {"role" : "system", "content" : SYSTEM_PROMPT},
    {"role" : "user", "content" : "Which is bigger? 9.11 or 9.9?"},
], tokenize = False, add_generation_prompt = True)

from vllm import SamplingParams
sampling_params = SamplingParams(
    temperature = 0.8,
    top_p = 0.95,
    max_tokens = 1024,
)
output = model.fast_generate(
    text,
    sampling_params = sampling_params,
    lora_request = model.load_lora("grpo_saved_lora"),
)[0].outputs[0].text

output
```
I'm getting this error: `AttributeError: 'LlamaForCausalLM' object has no attribute 'load_lora'`
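In case it helps with debugging, here is a minimal sketch of a check that reports which of the methods used in this post actually exist on the loaded object. `helper_report` is a hypothetical name I made up for illustration, not part of Unsloth; it relies only on Python's built-in `hasattr`. Since the error names `LlamaForCausalLM`, my unconfirmed guess is that `model` in this session is not the same kind of object that `save_lora` was called on during training.

```python
# Hypothetical diagnostic helper (not an Unsloth API): it only inspects
# which attributes are present on whatever object is passed in.
def helper_report(model, names=("fast_generate", "save_lora", "load_lora")):
    """Return the object's class name and a {method_name: present?} map."""
    return {
        "class": type(model).__name__,
        "present": {name: hasattr(model, name) for name in names},
    }

# Usage in the notebook session, assuming `model` is already loaded:
# print(helper_report(model))
```

If `load_lora` shows up as missing while `fast_generate` is present, that would at least confirm the attribute is absent on this particular object rather than the call being mistyped.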
Any help? Thank you!