LLAMA3 Tokenizer and prompt format issues | Unsloth AI | Page 1

fallow parcel Apr 21, 2024, 2:55 PM

#

The model generates endlessly during inference with the official Unsloth tutorial:
https://colab.research.google.com/drive/135ced7oHytdxu3N2DNe1Z0kqjyYIkDXp?usp=sharing

We need to ensure this does not happen during training. Has anyone solved this and can show the exact template or code, rather than linking to threads about the issue that have no solutions?

Thanks


deep cipher Apr 21, 2024, 3:11 PM

#

Sure.

#

https://colab.research.google.com/drive/1jqZMmilKHcD6mBzU20OL236mU2AQJ4nq?usp=sharing


golden current Apr 21, 2024, 3:46 PM

#

we're currently working on a fix for it

fallow parcel Apr 21, 2024, 3:57 PM

#

great!

#

@deep cipher Thanks will look!

#

@deep cipher Much appreciated

deep cipher Apr 21, 2024, 3:59 PM

#

worked?

fallow parcel Apr 21, 2024, 4:09 PM

#

@deep cipher Yes, it answers and stops the answer correctly, but there is a weird issue: once it has finished the sentence, the terminal prints out repeated tokens like this:

#

from unsloth import FastLanguageModel
from transformers import TextStreamer

# `model` and `tokenizer` are assumed to be loaded earlier in the notebook.

alpaca_prompt = """
<|start_header_id|>system<|end_header_id|>
{}
<|eot_id|>
<|start_header_id|>user<|end_header_id|>
{}
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
{}"""  # Because we operate mostly off of completions, we need this extra token

tokenizer.eos_token = "<|eot_id|>"
EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN

print("Model loaded")
FastLanguageModel.for_inference(model)  # Enable native 2x faster inference
inputs = tokenizer(
    [
        alpaca_prompt.format(
            "",  # system
            "Describe yourself in 1 sentence.",  # user
            "",  # output - leave this blank for generation!
        )
    ],
    return_tensors="pt",
).to("cuda")
text_streamer = TextStreamer(tokenizer)  # Prints tokens to the terminal as they are generated
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=128)
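
For comparison, here is a minimal sketch of the Llama 3 Instruct chat layout as I understand it from Meta's model card: each header is followed by a blank line and each turn ends with `<|eot_id|>`. The helper name `build_llama3_prompt` is hypothetical, and the template posted above places its newlines differently, so treat this as a reference point rather than a confirmed fix.

```python
# Special tokens used by the Llama 3 Instruct chat layout.
BEGIN = "<|begin_of_text|>"
HEADER = "<|start_header_id|>{role}<|end_header_id|>\n\n"
EOT = "<|eot_id|>"


def build_llama3_prompt(system: str, user: str, add_bos: bool = True) -> str:
    """Hypothetical helper: build a single-turn prompt that ends with an
    open assistant header, so the model writes the answer next."""
    parts = [BEGIN] if add_bos else []
    if system:  # Skip the system turn entirely when it is empty.
        parts.append(HEADER.format(role="system") + system + EOT)
    parts.append(HEADER.format(role="user") + user + EOT)
    parts.append(HEADER.format(role="assistant"))
    return "".join(parts)


prompt = build_llama3_prompt("", "Describe yourself in 1 sentence.")
print(prompt)
```

Many tokenizers prepend `<|begin_of_text|>` on their own, so passing `add_bos=False` is probably the safer choice when the string is tokenized afterwards.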

deep cipher Apr 21, 2024, 4:11 PM

#

model.generate doesn’t acknowledge the stop token regardless; the warning calls it open-end generation for a reason, iirc

#

as long as the response is good should be ok
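
If `generate` runs past the end of the answer, one workaround is to cut the output at the first stop token after the fact. A minimal sketch in plain Python: the IDs 128001 and 128009 are the ones mentioned in this thread, the sample token IDs are made up, and `truncate_at_stop` is a hypothetical helper, not part of Unsloth or transformers.

```python
from typing import Iterable, List, Set

# Stop token IDs discussed in this thread (assumed to apply to your model).
STOP_IDS: Set[int] = {128001, 128009}


def truncate_at_stop(token_ids: Iterable[int], stop_ids: Set[int] = STOP_IDS) -> List[int]:
    """Hypothetical helper: keep tokens up to, but not including,
    the first stop token."""
    kept = []
    for tok in token_ids:
        if tok in stop_ids:
            break
        kept.append(tok)
    return kept


# Made-up IDs standing in for an answer followed by repeated stop tokens.
generated = [40, 1097, 459, 15592, 13, 128009, 128009, 128009]
print(truncate_at_stop(generated))
```

`model.generate` in transformers also accepts an `eos_token_id` argument (a single ID or a list), so passing `eos_token_id=[128001, 128009]` there may stop generation at the source instead.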

fallow parcel Apr 21, 2024, 4:11 PM

#

@deep cipher Could this be the issue when the model loads?

Will use the EOS token of id 128001 as padding.
Model loaded
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.

deep cipher Apr 21, 2024, 4:12 PM

#

yeah that’s the model generation

#

once it’s exported it works fine in lm studio

fallow parcel Apr 21, 2024, 4:13 PM

#

Great! Final question: I just don't want it to ruin the fine-tuning, that's all. It won't do that, right?

And I saw they pushed this update:
https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/commit/1448453bdb895762499deb4176c1dd83b145fac1

deep cipher Apr 21, 2024, 4:14 PM

#

doesn’t exist

fallow parcel Apr 21, 2024, 4:14 PM

#

The url?

#

Oh maybe u dont have access to meta repo

#

It's gated

#

They added a generation_config.json with that config

deep cipher Apr 21, 2024, 4:15 PM

#

ah,

fallow parcel Apr 21, 2024, 4:16 PM

#

They added 128009 (the ID of <|eot_id|>) to it
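
A sketch of what such a config might contain and how to check which IDs stop generation. The JSON below is an assumption based on this thread (IDs 128001 and 128009), not a copy of the gated file, and `stop_ids_from_config` is a hypothetical helper.

```python
import json

# Assumed contents, reconstructed from the IDs mentioned in this thread.
ASSUMED_CONFIG = """
{
  "eos_token_id": [128001, 128009]
}
"""


def stop_ids_from_config(raw: str) -> list:
    """Hypothetical helper: return the stop IDs as a list, whether the
    config stores a single integer or a list of integers."""
    cfg = json.loads(raw)
    eos = cfg.get("eos_token_id", [])
    return [eos] if isinstance(eos, int) else list(eos)


stop_ids = stop_ids_from_config(ASSUMED_CONFIG)
print(stop_ids)
```

Listing both IDs means generation can stop on either one, which matches the "if it catches both" idea.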

deep cipher Apr 21, 2024, 4:16 PM

#

well i just set the EOS to what i fine tuned with,

#

which is 128009

#

so if it catches both

#

that’s a win

fallow parcel Apr 21, 2024, 4:16 PM

#

Like this?

tokenizer.eos_token='<|eot_id|>'
tokenizer.eos_token_id = 128009
EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
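
On the fine-tuning side, the usual concern is that every training example ends with the stop token, so the model learns to emit it. A minimal sketch of that step in plain Python: `format_example` is a hypothetical helper, the template is simplified to a single user turn, and the sample question is made up.

```python
EOS_TOKEN = "<|eot_id|>"  # Same value the thread assigns to tokenizer.eos_token.

# Simplified single-turn template (an assumption, not the notebook's exact one).
TEMPLATE = (
    "<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n{output}"
)


def format_example(user: str, output: str, eos_token: str = EOS_TOKEN) -> str:
    """Hypothetical helper: render one training example and make sure it
    ends with exactly one stop token."""
    text = TEMPLATE.format(user=user, output=output)
    if not text.endswith(eos_token):
        text += eos_token
    return text


example = format_example("What is 2 + 2?", "4")
print(example)
```

Without the trailing stop token in the training text, the fine-tuned model has no signal for where an answer ends, which is one plausible cause of endless generation.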

#

Sorry I'm new to all this, appreciate your help ❤️

deep cipher Apr 21, 2024, 4:17 PM

#

i didn’t do the ID because i didn’t see the param, but if it works it works

#

this stuff is black magic because it’s SOTA and too new for good documentation

fallow parcel Apr 21, 2024, 4:17 PM

#

Thanks a lot, you saved me a lot of time tho!

#

appreciate it 🙂
