#Model answers differently after importing it into Ollama
You did not provide an ollama chat template it seems.
wdym?
and it continues
even with a chat template
How did you save to gguf?
like this
but i managed to get it to work by adding a chat template and stop params. problem is that it doesn't add the dot at the end of the sentence and it is not as accurate as in the inference.
```
FROM ./unsloth.F16.gguf
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
"""
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"
PARAMETER stop "<|reserved_special_token"
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>user"
```
Do you see a difference in the answers with quantization_method="f16"?
with f16 i actually managed to get it to answer somewhat correctly
i haven't tried with q4_k_m using the Modelfile provided above yet
but this is weird
the video i watched only had FROM and SYSTEM
without the TEMPLATE and PARAMETER
and it worked fine and actually responded
not like "i'm 23 years old and i'm looking for a relationship with an old man"
but with actually what it should've responded with
First of all, I recommend you get rid of all the "if" statements and always include the tokens, regardless of whether the prompt or system message is empty.
You always need to run inference against an LLM with the right prompt template, never break it.
Second, verify that you have the correct template for Ollama. Like I said, it's probably a template issue, and just fixing a few things should already give you better results. I'm sorry I can't help with Ollama though, as I would need to look that up myself lol
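To make the point about the "if" statements concrete, here is a rough Python sketch of what the posted TEMPLATE does compared with a version that always emits every block. The function names are hypothetical, and the ChatML-style tokens are simply copied from the Modelfile in this thread; whether they match what the model was actually trained on is an assumption you would need to check.

```python
# Hypothetical helpers that imitate the Ollama TEMPLATE logic in plain Python.
# The special tokens are copied from the Modelfile posted in this thread.

def render_conditional(system: str, prompt: str) -> str:
    """Mimics the posted TEMPLATE: a block is skipped when its value is empty."""
    out = ""
    if system:
        out += f"<|im_start|>system\n{system}<|im_end|>\n"
    if prompt:
        out += f"<|im_start|>user\n{prompt}<|im_end|>\n"
    return out + "<|im_start|>assistant\n"


def render_fixed(system: str, prompt: str) -> str:
    """Always emits every block, even when system or prompt is empty."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


if __name__ == "__main__":
    # With an empty system message the two renderings differ:
    # the conditional version drops the system block entirely.
    print(repr(render_conditional("", "Hi")))
    print(repr(render_fixed("", "Hi")))
```

With a non-empty system message both functions produce the same string; they only diverge when a field is empty, which is exactly the case where the model would see a prompt shape it may not have been trained on.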
Oh
Describe the bug As per the official documentation: https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/ It is stated: A prompt should contain a single system message, can conta...
Read my post
Might be relevant
what is the if?
sorry for the delay
so how does it affect my model?
If you train with certain tokens, such as system tokens, and you don't use the same template during inference, you get a different outcome.
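One way to sanity-check this, as a minimal sketch: take one formatted training example and confirm that the prompt you send at inference is exactly the start of it, up to the point where the assistant's answer begins. The function name and the sample strings below are made up for illustration, and the ChatML-style format is only an assumption about how the training data looked.

```python
def prompt_matches_training(training_text: str, inference_prompt: str) -> bool:
    """Return True if the inference prompt is exactly the start of the training example."""
    return training_text.startswith(inference_prompt)


# Hypothetical training example, formatted the way the model saw it during fine-tuning.
training_text = (
    "<|im_start|>system\nYou are helpful.<|im_end|>\n"
    "<|im_start|>user\nHi<|im_end|>\n"
    "<|im_start|>assistant\nHello.<|im_end|>\n"
)

# Inference prompt that keeps the system block: same shape as training.
good_prompt = (
    "<|im_start|>system\nYou are helpful.<|im_end|>\n"
    "<|im_start|>user\nHi<|im_end|>\n"
    "<|im_start|>assistant\n"
)

# Inference prompt that drops the system block: a shape the model never saw.
bad_prompt = "<|im_start|>user\nHi<|im_end|>\n<|im_start|>assistant\n"

if __name__ == "__main__":
    print(prompt_matches_training(training_text, good_prompt))
    print(prompt_matches_training(training_text, bad_prompt))
```

If the check fails for the prompt your Modelfile produces, the template is the first thing to fix before looking at quantization.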