viral root May 22, 2025, 8:56 AM

#

my inference code is '''import unsloth
from transformers import TextStreamer
local_model_path = r"F:\anaconda\install\lora_model10"
if True:
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
model_name = local_model_path, # YOUR MODEL YOU USED FOR TRAINING
max_seq_length = 2048,
dtype = None,
load_in_4bit = True,
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
alpaca_prompt = """

Instruction:

{}

Input:

{}

Response:

{}""" #

alpaca_prompt = You MUST copy from above!

while True:
question = input("enter your question:")
inputs = tokenizer(
[
alpaca_prompt.format(
question, # instruction
"", # input
"", # output - leave this blank for generation!
)
], return_tensors = "pt").to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 64)''' the training data is '''[
{
    "instruction": "张小双的职业是什么？",
    "input": "",
    "output": "张小双是一名工程师。"
},
{
    "instruction": "What is Zhang Xiaoshuang's job?",
    "input": "",
    "output": "Zhang Xiaoshuang is an engineer."
},
{
    "instruction": "张小双今年多大了？",
    "input": "",
    "output": "张小双今年28岁。"
}]''' the inference  result is '''nstruction:,

张小双的职业是什么？
Input:,
Response:,
F:\anaconda\install\envs\new_env\Lib\site-packages\unsloth\models\llama.py:481: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
A = scaled_dot_product_attention(Q, K, V, attn_mask = attention_mask, is_causal = False)
我的名字是李明。<|endoftext|>''' anybody can help me check ,why? my computer is 8G RTX 4060.

📎 training_code.txt

long bay May 22, 2025, 9:04 AM

#

what is the problem again ?

#

i see you're finetuning a model

#

then what?

#

are you loading the adapter? or trying to merge?

viral root May 22, 2025, 9:20 AM

#

yes then i try inference use flow code '''import unsloth
from transformers import TextStreamer
local_model_path = r"F:\anaconda\install\lora_model10"
if True:
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
model_name = local_model_path, # YOUR MODEL YOU USED FOR TRAINING
max_seq_length = 512,
dtype = None,
load_in_4bit = True,
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference
alpaca_prompt = """

Instruction:

{}

Input:

{}

Response:

{}"""

alpaca_prompt = You MUST copy from above!

while True:
question = input("enter your question:")
inputs = tokenizer(
[
alpaca_prompt.format(
question, # instruction
"", # input
"", # output - leave this blank for generation!
)
], return_tensors = "pt").to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(
**inputs,
streamer = text_streamer,
 max_new_tokens = 64,
 temperature=0.6,
 top_k=20,
 top_p=0.7,
 repetition_penalty=1.0
 )'''  when i write the ask like '''{
    "instruction": "How old is Zhang Xiaoshuang?",
    "input": "",
    "output": "Zhang Xiaoshuang is 28 years old."
}''' the model respond should be "Zhang Xiaoshuang is 28 years old." but the model  response  another answer ,the answer is not correct .so i don't know why.do you understand me bro ?my english is not well .

viral root May 22, 2025, 9:22 AM

#

long bay are you loading the adapter? or trying to merge?

yes i use inference code loading train adapter

long bay May 22, 2025, 9:22 AM

#

are you loading the adapter from lora_model8 directory?

viral root May 22, 2025, 9:23 AM

#

yes bro this is my laptop path local_model_path = r"F:\anaconda\install\lora_model10",already loading

#

the model some question answers is correct ,most is wrong

long bay May 22, 2025, 9:24 AM

#

make sure that when you finetune, that the dataset is in the format expected by qwen2.5
if you reference the qwen notebooks under unsloth.ai , you should be able to see how we format the data..

it is also possible that the model didn't really learn much. what was the final loss on the finetune?

#Support for fine-tuning multimodel models with Qwen 2.5 7B,the inference result is no correct.

Instruction:

Input:

Response:

alpaca_prompt = You MUST copy from above!

Instruction:

Input:

Response:

alpaca_prompt = You MUST copy from above!