Fine-Tuning Llama3.1 Issues | Unsloth AI | Page 1

marsh bear Oct 1, 2024, 7:36 AM

#

Most likely you didn't use the right template for inferencing. Make sure use your custom make chat template when inference.

fresh hinge Oct 1, 2024, 8:02 AM

#

You mean like the alpaca prompt? Is there a way to use a custom template than the input, output? (I am using Ollama on my machine btw)

marsh bear Oct 1, 2024, 8:44 AM

#

Ollama? Then it's quite likely the chat template issue, especially you modified the one you used to finetuned. I'm not familliar Ollama, you can check with their server.

fresh hinge Oct 1, 2024, 11:06 AM

#

So you think closing this issue and going to ollama's server would be better?

marsh bear Oct 1, 2024, 11:13 AM

#

You can leave it open. Yea, ask in the Ollama server or just google how Ollama deals with templates.

fresh hinge Oct 1, 2024, 7:35 PM

#

Sadly, I didn't get any responses from Ollama's or Hugging Face's servers. But I found this and I believe you can write a custom template for your model too...It's just I don't really know how should I do this because my dataset is just basic response to response but I added the scenerio to teach the model how would It react in situations not just the input. I may be blinded by chatbots on character AI for example because I thought AI can interpret the current scenerio by itself.

#

marsh bear Oct 1, 2024, 7:46 PM

#

I don't know what you did with the template but can you use the "standard" ones? Like use Alpaca? Your "scenarios" most likely can fit in the templates.

fresh hinge Oct 1, 2024, 7:48 PM

#

The current template that ended up being broken was the default one

#

the one on the picture

#

I can try to recreate the template I used during finetuning tomorrow if I understand everything correctly

#

But before that. The very slow response time isn't connected to that is it?

#

Why would it

marsh bear Oct 1, 2024, 7:53 PM

#

Maybe Ollama used CPU instead of GPU for inference

fresh hinge Oct 1, 2024, 7:54 PM

#

The thing is that if I switch to the regular llama3.1 it responds fine

#

it's just the fine-tuned model

marsh bear Oct 1, 2024, 7:56 PM

#

Ollama will try to use cpu if not enough vram if I can remember. How big is your fine-tuned model?

fresh hinge Oct 1, 2024, 7:56 PM

#

fined tuned is 16gb so kinda big

marsh bear Oct 1, 2024, 7:56 PM

#

Your vram is?

fresh hinge Oct 1, 2024, 7:56 PM

#

32

marsh bear Oct 1, 2024, 7:57 PM

#

You sure 32? That's an odd number

#

What GPU you have has 32?

fresh hinge Oct 1, 2024, 8:01 PM

#

no no sorry

#

it's just late and im stupid

#

i have 6144mb

marsh bear Oct 1, 2024, 8:04 PM

#

Haha, no worries. Then it's too small.

#

Ollama definitely offloaded the model to CPU that's why it's slow.

fresh hinge Oct 1, 2024, 8:05 PM

#

I should probably find a smaller model to fine tune

#

I'll try to find a smaller model tomorrow, fine tune it, change it's template and I'll see how that goes

#

Thank you for your time today!

#Fine-Tuning Llama3.1 Issues