#Strange output of Meta-Llama-3.1-8B-bnb-4bit after fine tuning

oblique brook
#

The prompt format is the same for llama3.0 and llama3.1 right?

Prompt:
self.prompt = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n"{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>\n{input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>{output}<|eot_id|><|end_of_text|>"""

I ask because I get very strange outputs when I run inference with my fine-tuned Llama 3.1 model, trained on the same dataset as the Llama 3.0 version (which worked well). It produces random tokens for some reason. Has anybody else encountered this problem? Thanks!

Output:

cleaned response: {
"Document Type": "Health Certificate"}<|reserved_special_token_226|>.**
<|reserved_special_token_33|>assistant<|reserved_special_token_48|>{
"Document Type": "Health Certificate"}<|reserved_special_token_231|><|reserved_special_token_147|>assistant<|reserved_special_token_40|>{
"Document Type": "Health Certificate"}<|reserved_special_token_212|>assistant<|reserved_special_token_143|>{

Where the previous output was:
{
"Document Type": "Health Certificate"
}
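
A rough way to check a generation for this symptom, offered as a sketch rather than anything posted in the thread. The helper names `find_reserved_tokens` and `first_json_object` are hypothetical, and the sample string is copied from the broken output above. It only detects the leaked tokens and salvages the first JSON object; it does not address why the model keeps generating.

```python
import json
import re

# Matches Llama 3 placeholder tokens such as <|reserved_special_token_226|>.
RESERVED_TOKEN = re.compile(r"<\|reserved_special_token_\d+\|>")


def find_reserved_tokens(text):
    """Return every reserved special token that leaked into the text."""
    return RESERVED_TOKEN.findall(text)


def first_json_object(text):
    """Parse the first JSON object in the text and ignore whatever follows."""
    start = text.find("{")
    if start == -1:
        return None
    try:
        obj, _end = json.JSONDecoder().raw_decode(text, start)
    except json.JSONDecodeError:
        return None
    return obj


# Sample taken from the broken generation shown above.
sample = (
    '{\n"Document Type": "Health Certificate"}'
    "<|reserved_special_token_226|>.**\n"
    "<|reserved_special_token_33|>assistant<|reserved_special_token_48|>{"
)
print(find_reserved_tokens(sample))
print(first_json_object(sample))
```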

void skiff
#

(note that the prompt is indeed the same, but only for the instruct models)
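
For reference, a minimal sketch of the instruct-style prompt layout, assuming the format Meta documents for the Llama 3 instruct models: each header is followed by a blank line and each turn ends with `<|eot_id|>`. The function name `build_llama3_prompt` is hypothetical. With an instruct checkpoint, the tokenizer's own chat template is the safer source of truth.

```python
def build_llama3_prompt(system_prompt, user_input, output=None):
    """Assemble a Llama 3 instruct-style prompt string (illustrative helper).

    With output=None the string ends at the assistant header (inference);
    with an output it is a complete training example.
    """
    def turn(role, content):
        # One turn: role header, blank line, content, end-of-turn token.
        return f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"

    prompt = "<|begin_of_text|>" + turn("system", system_prompt) + turn("user", user_input)
    if output is None:
        return prompt + "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt + turn("assistant", output)


print(build_llama3_prompt("You extract fields as JSON.", "HEALTH CERTIFICATE ..."))
```

If the tokenizer already prepends `<|begin_of_text|>` when encoding, drop it from the string to avoid a duplicate.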

oblique brook
#

Thanks so much! That's the issue then. I'm trying to extract data from documents and present it in a particular format. I have around 1000 data points (that's probably too few for the model?). If I may ask, when would you say that choosing the base model would be the better option?

Example prompt:

    "system": "You are an AI assistant specialized in extracting and formatting product information from shipping documents. Your task is to analyze the given text, identify product details, and present them in a specific JSON format. Translate all information to English. Remember to provide your output without indentation and ensure all required fields are filled, if information is not available return an empty string for that field",
    "instruct": "\n1. Analyze the provided document.\n2. Extract the following information ONLY if it is present in the document.\n3. If a piece of information is not found in the document, use an empty string (\"\") for that field.\n4. Do not use any default or example values. Only use information from the actual document.\n5. Return the data as a JSON object with no indentation.\n\n{\n\"Consignor Name\": \"...\",\n\"Consignee (optional) Name\": \"...\",\n\"INV. NO.\": \"...\",\n\"Description of Goods\": \"...\",\n\"Number of Packages\": \"...\",\n\"Net Weight\": \"...\",\n\"Gross Weight\": \"...\",\n\"Certificate Of Origin\": \"...\",\n\"Issuing Authority (Details)\": \"...\",\n\"Country of Origin\": \"...\",\n}"
void skiff
#

the issue is that it's difficult for the base model to learn the chat template, since it wasn't specifically trained on examples using those special tokens.

#

i would recommend you try using the alpaca dataset format instead, it should be easier for the model to pick up on

#

it's not inherently chat-based though, if that's what you need. but for the task you describe in the system prompt, it should work well as a tool.
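
A minimal sketch of what an Alpaca-style training example could look like for this task. The template text follows the widely used Stanford Alpaca layout. `format_example` and the `eos_token` default are assumptions for illustration; in practice the EOS string would come from the tokenizer of the model being tuned.

```python
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{output}"
)


def format_example(instruction, document_text, output="", eos_token="<|end_of_text|>"):
    """Build one Alpaca-style example (hypothetical helper).

    An empty output yields an inference prompt; otherwise the EOS token is
    appended so the model learns where to stop generating.
    """
    text = ALPACA_TEMPLATE.format(
        instruction=instruction, input=document_text, output=output
    )
    return text + eos_token if output else text


train_text = format_example(
    "Extract the document type and return it as a JSON object with no indentation.",
    "HEALTH CERTIFICATE ...",
    '{"Document Type": "Health Certificate"}',
)
print(train_text)
```

The format uses only plain text markers, so a base model does not have to learn special tokens it never saw during pretraining.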

oblique brook
#

And with regard to dataset size, how many examples would you recommend when switching over from the instruct to the base model before expecting decent results?

#

Thanks again for the help 😁

void skiff
#

however, if you wish to scale and achieve more accurate (and more consistent) results, then you should tune the base model

void skiff
#

(unless you perform CPT, i.e. continued pretraining)

#

mostly because it's been tuned with SFT and RPO to be efficient with the chat prompt

oblique brook
#

Awesome, thanks! Will give alpaca a shot for the base model then 🙂