Strange output of Meta-Llama-3.1-8B-bnb-4bit after fine tuning | Unsloth AI | Page 1

oblique brook Jul 24, 2024, 12:17 PM

#

The prompt format is the same for Llama 3.0 and Llama 3.1, right?

Prompt:
    self.prompt = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n{output}<|eot_id|><|end_of_text|>"""

I'm asking because I get very strange outputs when I run my fine-tuned Llama 3.1 model for inference on the same dataset I trained the Llama 3.0 version on (which worked well). It produces random tokens for some reason. Has anybody else encountered this problem? Thanks!

Output:

cleaned response: {
"Document Type": "Health Certificate"}<|reserved_special_token_226|>.**
<|reserved_special_token_33|>assistant<|reserved_special_token_48|>{
"Document Type": "Health Certificate"}<|reserved_special_token_231|><|reserved_special_token_147|>assistant<|reserved_special_token_40|>{
"Document Type": "Health Certificate"}<|reserved_special_token_212|>assistant<|reserved_special_token_143|>{

Where the previous output was:
{
"Document Type": "Health Certificate"
}
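As a stopgap only (it hides the symptom rather than fixing the model/template mismatch), the "cleaned response" step can cut the raw generation at the first Llama 3 style special token and parse the JSON that precedes it. This is a minimal sketch assuming the first JSON object before any `<|...|>` token is the wanted answer; `clean_generation` is a hypothetical helper name.

```python
import json
import re

# Matches any Llama 3 style special token, e.g. <|eot_id|> or <|reserved_special_token_226|>.
SPECIAL_TOKEN = re.compile(r"<\|[^|]*\|>")


def clean_generation(raw: str) -> dict:
    """Hypothetical helper: keep only the text before the first special token
    and parse it as JSON. Raises ValueError if no valid JSON object remains."""
    head = SPECIAL_TOKEN.split(raw, maxsplit=1)[0]
    start, end = head.find("{"), head.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found before the first special token")
    # json.JSONDecodeError is a subclass of ValueError.
    return json.loads(head[start:end + 1])


raw = '{\n"Document Type": "Health Certificate"}<|reserved_special_token_226|>.**\n<|reserved_special_token_33|>assistant'
print(clean_generation(raw))  # {'Document Type': 'Health Certificate'}
```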

#


void skiff Jul 24, 2024, 12:23 PM

#

oblique brook The prompt format is the same for llama3.0 and llama3.1 right? Prompt: ...

Looks like you're tuning the base model but trying to use a chat format. Your dataset may not have enough entries for the base model to understand the chat formatting.

#

(Note that the prompt format is indeed the same, but only for the Instruct models.)
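For the Instruct models, the sketch below builds the token layout from Meta's documented Llama 3 chat format: each role header is followed by two newlines and each turn ends with `<|eot_id|>`. `format_llama3_chat` is a hypothetical helper, not part of any library. The official Llama 3.1 tokenizer chat template can add extra text (such as date lines) to the system turn, so in practice `tokenizer.apply_chat_template` from `transformers` is the safer source of truth.

```python
from typing import Optional

# Special tokens of the Llama 3 / 3.1 Instruct chat format.
BOS = "<|begin_of_text|>"
EOT = "<|eot_id|>"


def header(role: str) -> str:
    # Role header, followed by two newlines as in Meta's documented format.
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n"


def format_llama3_chat(system: str, user: str, assistant: Optional[str] = None) -> str:
    """Build one Llama 3 Instruct-style example (hypothetical helper).

    With assistant=None the string ends at the open assistant header, which is
    the prompt used at inference time; with a string it is a full training example.
    """
    text = BOS + header("system") + system + EOT + header("user") + user + EOT
    text += header("assistant")
    if assistant is not None:
        text += assistant + EOT
    return text


prompt = format_llama3_chat("You extract fields as JSON.", "INVOICE No. 123 ...")
```

Leaving the assistant turn open at inference time means the model's first generated tokens are the answer itself, and the trained `<|eot_id|>` tells generation where to stop.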

oblique brook Jul 24, 2024, 12:35 PM

#

Thanks so much! That's the issue, then. I'm trying to extract data from documents and present it in a particular format. I have around 1,000 data points (is that probably too few for the model?). If I may ask, when would you say choosing the base model is the better option?

Example prompt:

    "system": "You are an AI assistant specialized in extracting and formatting product information from shipping documents. Your task is to analyze the given text, identify product details, and present them in a specific JSON format. Translate all information to English. Remember to provide your output without indentation and ensure all required fields are present; if information is not available, return an empty string for that field.",
    "instruct": "1. Analyze the provided document.\n2. Extract the following information ONLY if it is present in the document.\n3. If a piece of information is not found in the document, use an empty string (\"\") for that field.\n4. Do not use any default or example values. Only use information from the actual document.\n5. Return the data as a JSON object with no indentation.\n\n{\n\"Consignor Name\": \"...\",\n\"Consignee (optional) Name\": \"...\",\n\"INV. NO.\": \"...\",\n\"Description of Goods\": \"...\",\n\"Number of Packages\": \"...\",\n\"Net Weight\": \"...\",\n\"Gross Weight\": \"...\",\n\"Certificate Of Origin\": \"...\",\n\"Issuing Authority (Details)\": \"...\",\n\"Country of Origin\": \"...\"\n}"
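Because the instruction asks for a fixed set of keys with empty strings for missing values, a small post-check can enforce that schema regardless of which model or template is used. This is a minimal sketch; `normalize_record` and `to_compact_json` are hypothetical names, and the field list is copied from the example instruction.

```python
import json

# Field names copied from the example instruction.
REQUIRED_FIELDS = [
    "Consignor Name",
    "Consignee (optional) Name",
    "INV. NO.",
    "Description of Goods",
    "Number of Packages",
    "Net Weight",
    "Gross Weight",
    "Certificate Of Origin",
    "Issuing Authority (Details)",
    "Country of Origin",
]


def normalize_record(model_output: str) -> dict:
    """Hypothetical post-check: parse the model's JSON and enforce the schema.

    Missing or null fields become "" (as the instruction requires), unexpected
    keys are dropped, and other values are coerced to strings.
    """
    data = json.loads(model_output)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    record = {}
    for field in REQUIRED_FIELDS:
        value = data.get(field)
        record[field] = "" if value is None else str(value)
    return record


def to_compact_json(record: dict) -> str:
    # "No indentation": compact separators; non-ASCII characters kept as-is.
    return json.dumps(record, ensure_ascii=False, separators=(",", ":"))
```

Applying the same normalization to the training targets (a design choice, not something from the thread) keeps key order and formatting identical across all 1,000 examples, which makes the output pattern easier for the model to learn.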

void skiff Jul 24, 2024, 12:49 PM

#

The issue is that it's difficult for the base model to learn from the chat template, since it wasn't specifically trained on examples using those special tokens.

#

I would recommend trying the Alpaca dataset format instead; it should be easier for the model to pick up.

#

It's not inherently chat-based, though, if that's what you need. But for the task you described in your system prompt, it should work well as a tool.
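A minimal sketch of what switching the base model to the Alpaca format could look like for this task. Assumptions: the template wording is the widely used Stanford Alpaca template (the same wording appears in Unsloth's example notebooks); mapping `system` + `instruct` into the instruction slot and the document text into the input slot is a hypothetical choice; and `<|end_of_text|>` is assumed as the base model's EOS token (read it from `tokenizer.eos_token` in practice).

```python
# Alpaca-style template (Stanford Alpaca wording).
ALPACA_TEMPLATE = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response:
{response}"""

# Assumption: EOS token of the base Llama 3.1 tokenizer; prefer tokenizer.eos_token.
EOS_TOKEN = "<|end_of_text|>"


def to_alpaca(system: str, instruct: str, document: str, target_json: str = "") -> str:
    """Hypothetical mapping of one record onto the Alpaca template.

    Training: pass target_json and the EOS token is appended so the model learns
    to stop. Inference: leave target_json empty so the prompt ends at the
    "### Response:" line.
    """
    text = ALPACA_TEMPLATE.format(
        instruction=f"{system}\n\n{instruct}".strip(),
        input=document,
        response=target_json,
    )
    return text + EOS_TOKEN if target_json else text
```

Appending the EOS token to every training example matters here: without it, a base model has no learned signal for where the JSON ends and tends to keep generating.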

oblique brook Jul 24, 2024, 1:04 PM

#

void skiff i would recommend you try using the alpaca dataset format instead, it should be ...

You mean, use the Alpaca prompt for the base model?

I tried using Alpaca for the Instruct model, but I found that Meta's special-token template achieved better results.

oblique brook Jul 24, 2024, 1:05 PM

#

void skiff its not inherently chat-based though, if that's what you need. but for your task...

Yeah, I don't need it to be chat-based, as the prompts I feed it are all similar to the one I posted.

#

And with regard to dataset size, how many examples would you recommend when switching from the Instruct to the base model before expecting decent results?

#

Thanks again for the help 😁

void skiff Jul 24, 2024, 1:09 PM

#

oblique brook And with regards to the dataset size, what would you recommend when switching ov...

Instruct would be better for low-resource (small) datasets, as the Instruct model can already reason with chain of thought (if you prompt it to).

#

However, if you wish to scale and achieve more accurate (and more consistent) results, you should tune the base model.

void skiff Jul 24, 2024, 1:11 PM

#

oblique brook You mean, use the alpaca prompt for the base model? I tried using alpaca for t...

The Meta token template will always be better than Alpaca for the Instruct model

#

(unless you perform CPT, i.e. continued pretraining)

#

mostly because it's been tuned with SFT and DPO to work well with that chat prompt

void skiff Jul 24, 2024, 1:12 PM

#

oblique brook And with regards to the dataset size, what would you recommend when switching ov...

That's completely up to you to decide; experimentation and having your own private benchmark are necessary.
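One way to build such a private benchmark for this extraction task: hold out a fixed slice of the roughly 1,000 examples and score each candidate (for instance base model + Alpaca vs. Instruct model + Meta template) on the same slice. This is a minimal sketch; `split_holdout`, `field_accuracy`, the 10% split, and the three metrics are illustrative choices, not an established benchmark.

```python
import json
import random


def split_holdout(examples: list, eval_fraction: float = 0.1, seed: int = 0):
    """Deterministically shuffle and hold out a fraction of examples for evaluation."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]  # (train, held_out)


def field_accuracy(predictions: list, references: list) -> dict:
    """Hypothetical benchmark for JSON extraction.

    Reports the share of outputs that parse as a JSON object, the share that
    match the reference exactly, and per-field exact-match accuracy.
    Unparseable outputs count as wrong on every field.
    """
    if len(predictions) != len(references) or not references:
        raise ValueError("need equally long, non-empty prediction and reference lists")
    fields = sorted({key for ref in references for key in ref})
    per_field = {f: 0 for f in fields}
    parsed_ok = exact = 0
    for pred_text, ref in zip(predictions, references):
        try:
            pred = json.loads(pred_text)
        except json.JSONDecodeError:
            continue
        if not isinstance(pred, dict):
            continue
        parsed_ok += 1
        exact += pred == ref
        for f in fields:
            per_field[f] += pred.get(f) == ref.get(f)
    n = len(references)
    return {
        "valid_json": parsed_ok / n,
        "exact_match": exact / n,
        "per_field": {f: count / n for f, count in per_field.items()},
    }
```

Fixing the seed keeps the held-out slice identical across runs, so differences in the scores reflect the model and template choice rather than a different evaluation sample.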

oblique brook Jul 24, 2024, 1:14 PM

#

Awesome, thanks! I'll give Alpaca a shot for the base model then 🙂
