#Fine-Tuning Llama3.1 Issues

marsh bear
#

Most likely you didn't use the right template for inference. Make sure you use your custom chat template at inference time.

fresh hinge
#

You mean like the Alpaca prompt? Is there a way to use a custom template other than the input/output one? (I am using Ollama on my machine btw)

marsh bear
#

Ollama? Then it's quite likely a chat template issue, especially if you modified the one you used for fine-tuning. I'm not familiar with Ollama, so you can check with their server.

fresh hinge
#

So you think closing this issue and going to Ollama's server would be better?

marsh bear
#

You can leave it open. Yea, ask in the Ollama server or just google how Ollama deals with templates.
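
For reference, Ollama reads its prompt template from a `TEMPLATE` instruction in a Modelfile, written in Go template syntax with variables such as `{{ .System }}` and `{{ .Prompt }}`. Below is a minimal sketch that builds such a Modelfile for an Alpaca-style prompt. The GGUF path, the stop string, and the exact section headers are assumptions for illustration; they have to match whatever format the model was actually fine-tuned on.

```python
def build_modelfile(gguf_path: str) -> str:
    """Return Modelfile text that wraps prompts in an Alpaca-style layout.

    gguf_path is a hypothetical path to the exported fine-tuned model.
    """
    # Go template: optional system text, then the user prompt, then the
    # header after which the model is expected to write its answer.
    template = (
        "{{ if .System }}{{ .System }}\n\n{{ end }}"
        "### Instruction:\n{{ .Prompt }}\n\n"
        "### Response:\n"
    )
    lines = [
        f"FROM {gguf_path}",
        f'TEMPLATE """{template}"""',
        # Assumed stop string so generation halts before a new turn starts.
        'PARAMETER stop "### Instruction:"',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "./my-finetune.gguf" is a placeholder file name.
    print(build_modelfile("./my-finetune.gguf"))
```

Saving the output as `Modelfile` and running `ollama create my-finetune -f Modelfile` (the model name is a placeholder) should register the model with that template.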

fresh hinge
#

Sadly, I didn't get any responses from Ollama's or Hugging Face's servers. But I found this and I believe you can write a custom template for your model too... It's just that I don't really know how I should do this, because my dataset is just basic response-to-response pairs, but I added the scenario to teach the model how it would react in situations, not just to the input. I may be blinded by chatbots like the ones on Character AI, because I thought the AI could interpret the current scenario by itself.

marsh bear
#

I don't know what you did with the template, but can you use one of the "standard" ones? Like Alpaca? Your "scenarios" can most likely fit into those templates.
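
As a rough sketch of how a scenario could fit into the standard Alpaca layout: the scenario goes into the instruction slot and the user's message into the input slot. Which field holds the scenario is an assumption here, as are the function name and the sample strings; the important part is that the same string layout is used for both fine-tuning and inference.

```python
# Standard Alpaca prompt layout (the variant with an input field).
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{response}"
)


def format_example(scenario: str, user_message: str, response: str = "") -> str:
    """Build one prompt; leave response empty at inference time."""
    return ALPACA_TEMPLATE.format(
        instruction=scenario,   # assumed: scenario acts as the instruction
        input=user_message,     # assumed: the incoming chat message
        response=response,      # filled with the target reply when training
    )


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the actual dataset.
    print(format_example(
        scenario="You are a shopkeeper and a storm has just hit the town.",
        user_message="Do you have any lanterns left?",
    ))
```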

fresh hinge
#

The current template that ended up being broken was the default one

#

the one in the picture

#

I can try to recreate the template I used during finetuning tomorrow if I understand everything correctly

#

But before that: the very slow response time isn't connected to that, is it?

#

Why would it

marsh bear
#

Maybe Ollama used CPU instead of GPU for inference

fresh hinge
#

The thing is that if I switch to the regular llama3.1 it responds fine

#

it's just the fine-tuned model

marsh bear
#

If I remember correctly, Ollama will try to use the CPU if there isn't enough VRAM. How big is your fine-tuned model?

fresh hinge
#

the fine-tuned one is 16 GB, so kinda big

marsh bear
#

Your vram is?

fresh hinge
#

32

marsh bear
#

You sure 32? That's an odd number

#

What GPU do you have that has 32?

fresh hinge
#

no no sorry

#

it's just late and I'm stupid

#

I have 6144 MB

marsh bear
#

Haha, no worries. Then it's too small.

#

Ollama definitely offloaded the model to the CPU, that's why it's slow.
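
The arithmetic behind that, as a back-of-the-envelope sketch: it assumes the 16 GB figure means 16 × 1024 MB and ignores the extra VRAM needed for the KV cache and runtime overhead, so the real share that fits on the GPU would be lower still. The function name is made up for illustration.

```python
def max_fraction_on_gpu(model_mb: float, vram_mb: float) -> float:
    """Upper bound on the share of model weights that can sit in VRAM."""
    return min(1.0, vram_mb / model_mb)


if __name__ == "__main__":
    model_mb = 16 * 1024   # fine-tuned model size, assuming 16 GB = 16384 MB
    vram_mb = 6144         # GPU memory reported in the thread
    share = max_fraction_on_gpu(model_mb, vram_mb)
    # At most this share of the weights fits; the rest runs from system RAM.
    print(f"At most {share:.1%} of the weights fit in VRAM")
```

With these numbers, well under half of the weights can live on the GPU, so most of the work falls to the CPU.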

fresh hinge
#

I should probably find a smaller model to fine-tune

#

I'll try to find a smaller model tomorrow, fine-tune it, change its template, and see how that goes

#

Thank you for your time today!