#Hallucination issue

28 messages · Page 1 of 1 (latest)

jagged quiver
#

Hello Team, I have working on finetuning based six pdf documents content . Each document is having 6-7 page content and prepared the Instruction based Alpaca dataset with and finetuned with llama 3 8B model . After fine tune , model is answering the question exactly from the dataset but when I am ask the question with twisting it's answering wrongly with hallucination . What i am missing here . The finetuning code is below :

#

from unsloth import FastLanguageModel
import torch
max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = False # Use 4bit quantization to reduce memory usage. Can be False.

model, tokenizer = FastLanguageModel.from_pretrained(
model_name = "meta-llama/Meta-Llama-3-8B",
# model_name = "unsloth/Meta-Llama-3.1-8B",
max_seq_length = max_seq_length,
dtype = dtype,
load_in_4bit = load_in_4bit,
token = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", # use one if using gated models like meta-llama/Llama-2-7b-hf
)
model = FastLanguageModel.get_peft_model(
model,
r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj",],
lora_alpha = 16,
lora_dropout = 0, # Supports any, but = 0 is optimized
bias = "none", # Supports any, but = "none" is optimized
# [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
random_state = 3407,
use_rslora = False, # We support rank stabilized LoRA
loftq_config = None, # And LoftQ
)

#

Define the Alpaca prompt format

alpaca_prompt = """ Below is an instruction that contains a question.\n\n Write a response that appropriately completes the request.\n\n

Instruction:

{}

Input:

{}

Response:

{}"""

EOS_TOKEN = "<|endoftext|>" # Replace with the actual EOS token for your tokenizer

from datasets import load_dataset

def formatting_prompts_func(examples):
instructions = examples["instruction"]
outputs = examples["response"]
inputs = examples["input"]
texts = []
for instruction, output in zip(instructions, outputs):
input_text = "" # Adjust if you have actual inputs
text = alpaca_prompt.format(instruction, input_text, output) + EOS_TOKEN
texts.append(text)
return {"text": texts}

Format the dataset

dataset = dataset.map(formatting_prompts_func, batched=True, batch_size=8)

print(dataset)

Load your custom dataset from the JSONL file

dataset = load_dataset('json', data_files='./dataset/alpaca_instructions_dataset_4_large.json', split='train')

Print the first example to verify the dataset is loaded correctly

print(dataset)

dataset = dataset.map(formatting_prompts_func, batched=True,batch_size=8)

from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

#

trainer = SFTTrainer(
model = model,
tokenizer = tokenizer,
train_dataset = dataset,
dataset_text_field = "text",
max_seq_length = max_seq_length,
dataset_num_proc = 2,
packing = False, # Can make training 5x faster for short sequences.
args = TrainingArguments(
per_device_train_batch_size = 2,
gradient_accumulation_steps = 4,
num_train_epochs = 100,
warmup_steps = 5,
# max_steps=None,
learning_rate = 2e-4,
fp16 = not is_bfloat16_supported(),
bf16 = is_bfloat16_supported(),
logging_steps = 1,
optim = "adamw_8bit",
weight_decay = 0.01,
lr_scheduler_type = "linear",
seed = 3407,
output_dir = "outputs",
),
)
trainer_stats = trainer.train()

#

#help

indigo tartan
#

fine-tuning is not a good way to inject new knowledge into the model, what are those pdf documents about? Is it better to do RAG if you are looking to answer queries based on the documents?

jagged quiver
indigo tartan
#

That interesting, are you looking to fine-tune in order to reduce hallucinations? I am quite curious why fine-tuning is being used for this purpose.

viral tree
#

I have the same hallucination issue, and I also thought to reduce it by fine tuning. Is that an incorrect way to fix this problem?

jagged quiver
viral tree
#

So you mean when you asking the question that was NOT provided in the training dataset it still hallucinating?

jagged quiver
paper monolith
#

I wonder what would happen if you include the text chunk next to the question and answer to make the model understand the base more

indigo tartan
daring folio
#

you can finetune a model specifically to make it hallucinate less or more and make it learn that way

daring folio
autumn osprey
#

You can reduce hallucinations directly by finetuning your model to only output certain outputs ie if you ask a LLM "What is 2+2?" It has 90% chance it'll say 4, but 5% it'll say "four" and small chances on 3, 2, 1, 10, 4 etc

#

so to "force" to model to make the probability to 100%, simply finetune on a dataset with "What is 2+2? It's 4" 10,000 epochs

#

and ull force the probably to go to 100%

#

the issue is the model now becomes non creative

#

which defeats the whole purpose of LLMs

#

the issue with LLMs is we select 1 token from a distribution, but rarely multiple tokens during the decoding step - instead if we output a probability distribution

#

thatll be better

viral tree
viral tree
# autumn osprey which defeats the whole purpose of LLMs

That’s my another fear, that the model would just answers exactly in the same way as in train data. We need a model to use it as part of customer service (reception) so the answers should be in specific format, but various.
I spend hundred of hours trying to make it work as we need just by prompt. But it still does not work in most cases as we want to, so I am looking in the direction of fine tuning.

autumn osprey
#

Ye you could try finetuning i guess