Hello Team, I have working on finetuning based six pdf documents content . Each document is having 6-7 page content and prepared the Instruction based Alpaca dataset with and finetuned with llama 3 8B model . After fine tune , model is answering the question exactly from the dataset but when I am ask the question with twisting it's answering wrongly with hallucination . What i am missing here . The finetuning code is below :
#Hallucination issue
28 messages · Page 1 of 1 (latest)
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = False # Use 4bit quantization to reduce memory usage. Can be False.
model, tokenizer = FastLanguageModel.from_pretrained(
model_name = "meta-llama/Meta-Llama-3-8B",
# model_name = "unsloth/Meta-Llama-3.1-8B",
max_seq_length = max_seq_length,
dtype = dtype,
load_in_4bit = load_in_4bit,
token = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", # use one if using gated models like meta-llama/Llama-2-7b-hf
)
model = FastLanguageModel.get_peft_model(
model,
r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj",],
lora_alpha = 16,
lora_dropout = 0, # Supports any, but = 0 is optimized
bias = "none", # Supports any, but = "none" is optimized
# [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
random_state = 3407,
use_rslora = False, # We support rank stabilized LoRA
loftq_config = None, # And LoftQ
)
Define the Alpaca prompt format
alpaca_prompt = """ Below is an instruction that contains a question.\n\n Write a response that appropriately completes the request.\n\n
Instruction:
{}
Input:
{}
Response:
{}"""
EOS_TOKEN = "<|endoftext|>" # Replace with the actual EOS token for your tokenizer
from datasets import load_dataset
def formatting_prompts_func(examples):
instructions = examples["instruction"]
outputs = examples["response"]
inputs = examples["input"]
texts = []
for instruction, output in zip(instructions, outputs):
input_text = "" # Adjust if you have actual inputs
text = alpaca_prompt.format(instruction, input_text, output) + EOS_TOKEN
texts.append(text)
return {"text": texts}
Format the dataset
dataset = dataset.map(formatting_prompts_func, batched=True, batch_size=8)
print(dataset)
Load your custom dataset from the JSONL file
dataset = load_dataset('json', data_files='./dataset/alpaca_instructions_dataset_4_large.json', split='train')
Print the first example to verify the dataset is loaded correctly
print(dataset)
dataset = dataset.map(formatting_prompts_func, batched=True,batch_size=8)
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported
trainer = SFTTrainer(
model = model,
tokenizer = tokenizer,
train_dataset = dataset,
dataset_text_field = "text",
max_seq_length = max_seq_length,
dataset_num_proc = 2,
packing = False, # Can make training 5x faster for short sequences.
args = TrainingArguments(
per_device_train_batch_size = 2,
gradient_accumulation_steps = 4,
num_train_epochs = 100,
warmup_steps = 5,
# max_steps=None,
learning_rate = 2e-4,
fp16 = not is_bfloat16_supported(),
bf16 = is_bfloat16_supported(),
logging_steps = 1,
optim = "adamw_8bit",
weight_decay = 0.01,
lr_scheduler_type = "linear",
seed = 3407,
output_dir = "outputs",
),
)
trainer_stats = trainer.train()
#help
fine-tuning is not a good way to inject new knowledge into the model, what are those pdf documents about? Is it better to do RAG if you are looking to answer queries based on the documents?
We have implemented in RAG , But our requirement is finetune the knowledge base from old archive documents
Thanks
That interesting, are you looking to fine-tune in order to reduce hallucinations? I am quite curious why fine-tuning is being used for this purpose.
I have the same hallucination issue, and I also thought to reduce it by fine tuning. Is that an incorrect way to fix this problem?
I have done the finetuning and facing the hallucination problem except model is answering as pected only the dataset Questions
So you mean when you asking the question that was NOT provided in the training dataset it still hallucinating?
I am trying to ask the question twisted instead of direct question from dataset . Expecting as end user point of view and all the user won't ask the same question with related topic
I wonder what would happen if you include the text chunk next to the question and answer to make the model understand the base more
What is the expected answer in this case? Is the right answer a refusal or you are expecting the model to still answer the question based on the data?
Yes correct - finetuning definitely reduces hallucinations. You should pair it with RAG for even better results
you can finetune a model specifically to make it hallucinate less or more and make it learn that way
It really depends on what is is hallucinating. the higher epochs the more deterministic answers it will be
You can reduce hallucinations directly by finetuning your model to only output certain outputs ie if you ask a LLM "What is 2+2?" It has 90% chance it'll say 4, but 5% it'll say "four" and small chances on 3, 2, 1, 10, 4 etc
so to "force" to model to make the probability to 100%, simply finetune on a dataset with "What is 2+2? It's 4" 10,000 epochs
and ull force the probably to go to 100%
the issue is the model now becomes non creative
which defeats the whole purpose of LLMs
the issue with LLMs is we select 1 token from a distribution, but rarely multiple tokens during the decoding step - instead if we output a probability distribution
thatll be better
It hallucinate about the data which we include as part of RAG.
To fine tune with RAG we simply need to create dataset where it will be a lot of examples with different data in the context, right?
I just afraid that model just learn that data instead, especially on high number of epoch.
That’s my another fear, that the model would just answers exactly in the same way as in train data. We need a model to use it as part of customer service (reception) so the answers should be in specific format, but various.
I spend hundred of hours trying to make it work as we need just by prompt. But it still does not work in most cases as we want to, so I am looking in the direction of fine tuning.
Ye you could try finetuning i guess