Model answers differently after importing it into Ollama | Unsloth AI | Page 1

opaque pewter Oct 16, 2024, 5:28 PM

#

#

left is the interface, right is Ollama

true sapphire Oct 16, 2024, 9:36 PM

#

You did not provide an ollama chat template it seems.

opaque pewter Oct 17, 2024, 5:48 AM

#

true sapphire You did not provide an ollama chat template it seems.

wdym?

#

and it continues

#

even with a chat template

marble sail Oct 17, 2024, 7:03 AM

#

How did you save to gguf?

opaque pewter Oct 17, 2024, 7:37 AM

#

marble sail How did you save to gguf?

like this

#

but i managed to get it to work by adding a chat template and stop parameters. the problem is that it doesn't add the period at the end of the sentence, and it is not as accurate as in inference.

#

```
FROM ./unsloth.F16.gguf

TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
"""

PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"
PARAMETER stop "<|reserved_special_token"
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>user"
```

marble sail Oct 17, 2024, 9:51 AM

#

opaque pewter like this

Do you see the difference in answering with quantization_method="f16"?

opaque pewter Oct 17, 2024, 9:52 AM

#

marble sail Do you see the difference in answering with quantization_method="f16"?

with f16 i actually managed to get it to answer somewhat correctly

#

i haven't tried with q4_k_m using the Modelfile provided above yet

#

but this is weird

#

the video i watched only had FROM and SYSTEM

#

without the TEMPLATE and PARAMETER

#

and it worked fine and actually responded

#

not like "i'm a 23 years old and i'm looking for a relationship with an old man"💀

#

but with what it actually should've responded with

true sapphire Oct 17, 2024, 1:48 PM

#

First of all, I recommend you get rid of all the "if" statements; always include the tokens regardless of whether the prompt or system message is empty.

You always need to run inference against an LLM with the right prompt template, never break it.
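To make the point about "if" concrete, here is a minimal Python sketch (not Ollama's actual template engine) that mimics the ChatML template from the Modelfile above. The function names `render_conditional` and `render_unconditional` are hypothetical, made up for illustration. It only shows that the guarded version produces a different token sequence when the system prompt is empty:

```python
# Sketch only: plain-Python stand-ins for two ways of writing a ChatML template.
# These are illustrative helpers, not part of Ollama or Unsloth.

def render_conditional(system: str, prompt: str) -> str:
    """Mirrors a template guarded by {{ if .System }} / {{ if .Prompt }}."""
    out = ""
    if system:
        out += f"<|im_start|>system\n{system}<|im_end|>\n"
    if prompt:
        out += f"<|im_start|>user\n{prompt}<|im_end|>\n"
    return out + "<|im_start|>assistant\n"


def render_unconditional(system: str, prompt: str) -> str:
    """Always emits every block, even when a field is empty."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


if __name__ == "__main__":
    # With an empty system prompt the two renderings diverge:
    # the guarded version drops the system block entirely.
    print(repr(render_conditional("", "Hello")))
    print(repr(render_unconditional("", "Hello")))
```

Whether the unconditional form is the right one depends on how the training data was formatted; the sketch only shows that the two forms are not interchangeable.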

#

Second, verify again that you have the correct template for Ollama. Like I said, it's probably a template issue, and just fixing some of what you already have should give better results. I'm sorry I can't help with Ollama though, as I would need to look that up myself lol

true sapphire Oct 17, 2024, 11:41 PM

#

Oh

#

https://github.com/meta-llama/llama3/issues/203#issuecomment-2156772634

GitHub

The absence or presence of a system token results in different outp...

Describe the bug As per the official documentation: https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/ It is stated: A prompt should contain a single system message, can conta...

#

Read my post

#

Might be relevant 🙂

opaque pewter Nov 2, 2024, 9:44 AM

#

true sapphire First of all, I recommend you get rid of all statements with "if" , always inclu...

what is the if?

#

sorry for the delay

opaque pewter Nov 2, 2024, 9:45 AM

#

true sapphire https://github.com/meta-llama/llama3/issues/203#issuecomment-2156772634

so how does it affect my model?

true sapphire Nov 2, 2024, 10:13 AM

#

If you train with some tokens, such as system tokens, and you don't use the same template during inference, you get a different outcome.
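One rough way to check for this kind of mismatch is to compare the special tokens in a training-time prompt with those in the prompt sent at inference. The sketch below is an assumption-heavy illustration: the thread does not say which format the model was trained on, so the Llama 3 style string is only a guess based on the stop tokens in the Modelfile, and `special_tokens` and `template_mismatch` are hypothetical helper names.

```python
import re

# Matches tokens written like <|im_start|> or <|eot_id|>.
SPECIAL = re.compile(r"<\|[^|]+\|>")


def special_tokens(text: str) -> list[str]:
    """Return the special tokens in the order they appear in the text."""
    return SPECIAL.findall(text)


def template_mismatch(train_prompt: str, infer_prompt: str) -> bool:
    """True when the two prompts use different special-token sequences."""
    return special_tokens(train_prompt) != special_tokens(infer_prompt)


if __name__ == "__main__":
    # Assumed training format (Llama 3 style); not confirmed by the thread.
    train_prompt = (
        "<|start_header_id|>user<|end_header_id|>\n\nHello<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
    # What the ChatML template in the Modelfile would send for the same turn.
    infer_prompt = "<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\n"

    print(special_tokens(train_prompt))
    print(special_tokens(infer_prompt))
    print("mismatch:", template_mismatch(train_prompt, infer_prompt))
```

If the two token lists differ, the model is seeing a prompt structure at inference that it never saw during fine-tuning.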
