#Model answers differently after importing it into Ollama


opaque pewter
#

left is the interface, right is Ollama

true sapphire
#

You did not provide an ollama chat template it seems.

opaque pewter
#

and it continues

#

even with a chat template

marble sail
#

How did you save to gguf?

opaque pewter
#

but i managed to get it to work by adding a chat template and stop params. problem is that it doesn't add the dot at the end of the sentence and it is not as accurate as in the inference.

#
```
FROM ./unsloth.F16.gguf

TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
"""

PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"
PARAMETER stop "<|reserved_special_token"
PARAMETER stop "<|im_end|>"
PARAMETER stop "<|im_start|>user"
```
opaque pewter
#

i haven't tried with q4_k_m using the Modelfile provided above yet

#

but this is weird

#

the video i watched only had FROM and SYSTEM

#

without the TEMPLATE and PARAMETER

#

and it worked fine and actually responded

#

not like "i'm a 23 years old and i'm looking for a relationship with an old man" 💀

#

but with what it actually should've responded

true sapphire
#

First of all, I recommend you get rid of all the "if" statements and always include the tokens, regardless of whether the prompt or system message is empty.

You always need to run inference against an LLM with the right prompt template, never break it.

#

Second, verify again that you have the correct template for Ollama. Like I said, it's probably a template issue, and just fixing a few things should already give you better results. I'm sorry I can't help with Ollama though, as I would need to look that up myself lol
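
To make the "no if statements" advice concrete, here is a minimal Python sketch of what an unconditional template would render. It assumes the model was trained with the ChatML-style tokens used in the Modelfile above (`<|im_start|>`, `<|im_end|>`), which the thread does not confirm; `render_chatml` is a hypothetical helper for illustration, not part of Ollama or Unsloth.

```python
def render_chatml(system: str, prompt: str) -> str:
    """Render a single-turn ChatML-style prompt.

    Every role block is emitted even when its text is empty, so the
    token layout stays the same for every request.
    Assumption: the fine-tune used these exact ChatML tokens.
    """
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


if __name__ == "__main__":
    # With an empty system message the system block is still present.
    print(render_chatml("", "Hello"))
```

Printing the rendered string and comparing it by eye with a formatted training example is a quick way to spot a missing token or newline.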

true sapphire
#

Oh

#

Read my post

#

Might be relevant 🙂

opaque pewter
#

sorry for the delay

true sapphire
#

If you train with certain tokens, such as system tokens, and you don't use the same template during inference, you get a different outcome.
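
A small Python sketch of that mismatch, under the assumption (not confirmed in the thread) that training always included a system block while the inference template skips it when the system message is empty. The `render` helper and its `skip_empty` flag are hypothetical and only mimic the two template styles.

```python
def render(system: str, prompt: str, skip_empty: bool) -> str:
    """Build a ChatML-style prompt.

    skip_empty=True mimics a template with "if" guards that drops a
    role block when its text is empty; skip_empty=False always emits it.
    """
    parts = []
    if system or not skip_empty:
        parts.append(f"<|im_start|>system\n{system}<|im_end|>\n")
    if prompt or not skip_empty:
        parts.append(f"<|im_start|>user\n{prompt}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


# Assumed training format: system block always present.
training_style = render("", "Hello", skip_empty=False)
# Inference format with "if" guards: empty system block is dropped.
inference_style = render("", "Hello", skip_empty=True)

print(training_style == inference_style)  # False
```

The two strings differ only by the empty system block, but the model sees a token sequence it was never trained on, which may be enough to change its answers.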