#LLAMA3 Tokenizer and prompt format issues


fallow parcel
golden current
#

we're currently working on a fix for it

fallow parcel
#

great!

#

@deep cipher Thanks will look!

#

@deep cipher Much appreciated

deep cipher
#

worked?

fallow parcel
#

@deep cipher Yes, it answers and ends the answer correctly, but there's a weird issue: after it finishes the sentence, the terminal keeps printing repeated tokens like this:

... it finishes responding here!<|eot_id|><|start_header_id|>assistant<|end_header_id|>ounds!<|eot_id|><|start_header_id|>assistant<|end_header_id|><|eot_id|>assistant<|end_header_id|><|eot_id|>assistant<|end_header_id|><|eot_id|>assistant<|end_header_id|><|eot_id|>assistant<|end_header_id|><|eot_id|>assistant<|end_header_id|><|eot_id|>assistant<|end_header_id|><|eot_id|>assistant<|end_header_id|><|eot_id|>assistant<|end_header_id|><|eot_id|>assistant<|end_header_id|><|eot_id|>assistant<|end_header_id|><|eot_id|>assistant<|end_header_id|><|eot_id|>assistant<|end_header_id|><|eot_id|>assistant<|end_header_id|><|eot_id|>assistant<|end_header_id|><|eot_id|>assistant<|end_header_id|><|eot_id|>assistant<|end_header_id|><|eot_id|>assistant<|end_header_id|><|eot_id|>assistant<|end_header_id|><|eot_id|>assistant<|end_header_id|><|eot_id|>assistant<|end_header_id|><|eot_id|>assistant<|end_header_id|><|start_header_id|>assistant<|end_header_id|><|start_header_id|>assistant<|end_header_id|><|start_header_id|>assistant<|end_header_id|><|start_header_id|>assistant

#

from unsloth import FastLanguageModel
from transformers import TextStreamer

# model and tokenizer are assumed to be loaded already (loading code not shown)
alpaca_prompt = """
<|start_header_id|>system<|end_header_id|>
{}
<|eot_id|>
<|start_header_id|>user<|end_header_id|>
{}
<|eot_id|><|start_header_id|>assistant<|end_header_id|>
{}"""  # Because we operate mostly off of completions, we need this extra token

tokenizer.eos_token = '<|eot_id|>'
EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN

print("Model loaded")
FastLanguageModel.for_inference(model)  # Enable native 2x faster inference
inputs = tokenizer(
    [
        alpaca_prompt.format(
            "",  # system
            "Describe yourself in 1 sentence.",  # user
            "",  # output - leave this blank for generation!
        )
    ],
    return_tensors="pt",
).to("cuda")
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=128)
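
For comparison, here is a minimal sketch of a prompt builder that follows the published Llama 3 Instruct layout, where each header is followed by a blank line and `<|eot_id|>` sits directly after the message text. The helper name `build_llama3_prompt` and its `add_bos` flag are hypothetical. The template in the message above places its newlines differently, which may be fine if the model was fine-tuned on that exact template.

```python
def build_llama3_prompt(system: str, user: str, add_bos: bool = False) -> str:
    """Build a single-turn Llama 3 Instruct prompt ending at the assistant header."""
    parts = []
    if add_bos:
        # The Hugging Face tokenizer typically prepends this itself, so it is off by default.
        parts.append("<|begin_of_text|>")
    if system:
        parts.append(f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>")
    parts.append(f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>")
    # Leave the assistant turn open so the model writes the reply.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = build_llama3_prompt("", "Describe yourself in 1 sentence.")
print(prompt)
```

Skipping the system block when it is empty is a design choice of this sketch, not something the thread prescribes.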

deep cipher
#

model.generate doesn’t acknowledge the stop token regardless, it says it’s endless generation for a reason iirc

#

as long as the response is good should be ok

fallow parcel
#

@deep cipher Could this be the issue when the model loads?

Will use the EOS token of id 128001 as padding.
Model loaded
Setting pad_token_id to eos_token_id:128001 for open-end generation.
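
The log above suggests `generate` is watching for ID 128001 (`<|end_of_text|>`), while the prompt template ends turns with `<|eot_id|>` (ID 128009, per later messages in this thread). Below is a toy sketch of why that mismatch would let the output run on. `truncate_at_stop` is a hypothetical helper that only imitates a stop check; it is not the library's implementation, and the ordinary token IDs are made up.

```python
END_OF_TEXT = 128001  # <|end_of_text|>
EOT_ID = 128009       # <|eot_id|>


def truncate_at_stop(token_ids, stop_ids):
    """Return tokens up to and including the first stop ID, or all of them if none occurs."""
    out = []
    for tok in token_ids:
        out.append(tok)
        if tok in stop_ids:
            break
    return out


# Made-up reply: three ordinary tokens, <|eot_id|>, then runoff ending in another <|eot_id|>.
generated = [11, 22, 33, EOT_ID, 44, 55, EOT_ID]

print(truncate_at_stop(generated, {END_OF_TEXT}))          # no stop ID seen, runoff is kept
print(truncate_at_stop(generated, {END_OF_TEXT, EOT_ID}))  # cut at the first <|eot_id|>
```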

deep cipher
#

yeah that’s the model generation

#

once it’s exported it works fine in lm studio

fallow parcel
deep cipher
#

doesn’t exist

fallow parcel
#

The url?

#

Oh, maybe you don't have access to the Meta repo

#

It's gated

#

They added a generation_config.json with that config

deep cipher
#

ah,

fallow parcel
#

They added the 128009 to it
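
A small sketch of reading such a file, assuming (from the messages above) that `eos_token_id` may be either a single integer or a list such as `[128001, 128009]`. The JSON string is an illustrative stand-in, not a copy of the gated file, and `load_eos_ids` is a hypothetical helper.

```python
import json


def load_eos_ids(config_text: str) -> list:
    """Return eos_token_id from a generation config as a list, whether it is an int or a list."""
    config = json.loads(config_text)
    eos = config.get("eos_token_id")
    if eos is None:
        return []
    if isinstance(eos, int):
        return [eos]
    return list(eos)


# Illustrative stand-in for the generation_config.json contents (assumed, not copied).
example = '{"eos_token_id": [128001, 128009]}'
print(load_eos_ids(example))
```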

deep cipher
#

well i just set the EOS to what i fine tuned with,

#

which is 128009

#

so if it catches both

#

that’s a win
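
One way to catch both at generation time, sketched under the assumption that `model`, `inputs`, and `text_streamer` are set up as in the earlier snippet: `generate` in Hugging Face transformers accepts a list for `eos_token_id`. The helper `stop_kwargs` is hypothetical and only builds the keyword arguments. The actual call is left as a comment because it needs a loaded model.

```python
END_OF_TEXT_ID = 128001  # <|end_of_text|>, the EOS ID reported in the load log
EOT_ID = 128009          # <|eot_id|>, the turn terminator used for fine-tuning


def stop_kwargs(max_new_tokens: int = 128) -> dict:
    """Build generate() keyword arguments that stop on either terminator."""
    return {
        "max_new_tokens": max_new_tokens,
        "eos_token_id": [END_OF_TEXT_ID, EOT_ID],
        # Passing a pad ID explicitly should also silence the "Setting pad_token_id" notice.
        "pad_token_id": END_OF_TEXT_ID,
    }


gen_kwargs = stop_kwargs()
print(gen_kwargs)

# With a loaded model (not run here):
# _ = model.generate(**inputs, streamer=text_streamer, **gen_kwargs)
```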

fallow parcel
#

Like this?

tokenizer.eos_token='<|eot_id|>'
tokenizer.eos_token_id = 128009
EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
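
A quick sanity check that the token string and the ID agree, sketched with a stand-in object so it runs without downloading anything. `convert_tokens_to_ids` is a real method on Hugging Face tokenizers; `FakeTokenizer` and `eos_matches` are hypothetical names used only for illustration.

```python
class FakeTokenizer:
    """Stand-in exposing only what the check needs; swap in the real tokenizer."""

    eos_token = "<|eot_id|>"
    eos_token_id = 128009

    def convert_tokens_to_ids(self, token):
        vocab = {"<|end_of_text|>": 128001, "<|eot_id|>": 128009}
        return vocab[token]


def eos_matches(tokenizer, expected_id: int = 128009) -> bool:
    """True if eos_token and eos_token_id both point at the expected ID."""
    looked_up = tokenizer.convert_tokens_to_ids(tokenizer.eos_token)
    return looked_up == expected_id and tokenizer.eos_token_id == expected_id


print(eos_matches(FakeTokenizer()))
```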

#

Sorry I'm new to all this, appreciate your help ❤️

deep cipher
#

i didn’t do the ID because i didn’t see the param, but if it works it works

#

this stuff is black magic because it’s SOTA and too new for good documentation

fallow parcel
#

Thanks a lot, you saved me a lot of time tho!

#

appreciate it 🙂