#Fine Tune Llama3 Inference Works On Colab - GGUF produces garbage results in LMstudio.

436 messages · Page 1 of 1 (latest)

jovial hound
#

I've tried this twice now with some simple fine-tunes running 1 epoch and 2 epochs. The training seems to be fitting nicely, and when I test inference on the model in Colab, it responds as expected. However, when I convert it with llama.cpp to a Q5 quant and export the GGUF, everything becomes nonsense (LM Studio).

LM STUDIO GGUF Q5 (Llama3 template):

ask your classmates about their experiences with hydroponics. what were some of the challenges they faced while trying out new techniques? how did they overcome these challenges? ask them how they managed to get plants to grow in space. how do you think they would feel if they knew there was a chance they might not be able to produce enough food for themselves and their families?

ask your classmates about their experiences with hydroponics. what were some of the challenges they faced while trying out new techniques? how did they overcome these challenges? ask them how they managed to get plants to grow in space. how do you think they would feel if they knew there was a chance they might not be able to produce enough food for themselves and their families?

COLAB RESPONSE:
imagine you were building a new form of hydroponics for space consider the techniques and process to approach this. what are the key factors to consider when designing a hydroponic system for space? how would you ensure the system is efficient and effective in providing nutrients to plants?

I'm happy to help you with your question. To build a new form of hydroponics for space, there are several factors to consider:1. Water and nutrient delivery: In space, water and nutrients will need to be delivered to the plants in a controlled manner.

Testing the same inference locally:

#

Full inference sample with reloaded weights
imagine you were building a new form of hydroponics for space consider the techniques and process to approach this. what are the key factors to consider when designing a hydroponic system for space? how would you ensure the system is efficient and effective in providing nutrients to plants?
I'm happy to help you with your question. To build a new form of hydroponics for space, there are several factors to consider:1. Water and nutrient delivery: In space, water and nutrients will need to be delivered to the plants in a controlled manner. This can be achieved through the use of pumps, valves, and other mechanisms that ensure the right amount of water and nutrients are delivered to the plants at the right time.2. Plant growth media: In space, there may not be enough soil or other growth media available, so alternative methods will need to be used. This could include using a combination of hydroponic media, such as rockwool or perlite, and aeroponics, where plants are grown in a mist of water and nutrients.3. Light and temperature control: In space, it may be difficult to control light and temperature conditions, so a system that can adjust to changing conditions will be necessary. This could include using LED lights or other lighting systems that can be adjusted to provide the right amount of light for plant growth.4. Pest and disease control: In space, there may not be enough air circulation or other factors to control pests and diseases, so a system that can detect and prevent them will be necessary. This could include using sensors and other technologies to monitor plant health and detect any issues early on.5. Automation: In space,

twilit hornet
#

To be honest I had the same problem. I managed to run my models in both LM Studio and Ollama. 1. I trained using the original model (not quantized, but loaded in 4bit). 2. I used the Instruct model instead of base. 3. I preprocessed the dataset using the original chat template from the original Instruct model. 4. Trained using Unsloth. 5. I converted it to GGUF separately, a. using the BPE tokenizer option, b. setting the EOS token manually. 6. In LM Studio I use the original Llama 3 template. 7. In Ollama I use:

#

FROM ./Llama-3-Omnibus-PL-v01-GGUF.Q4_K_M.gguf
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""
SYSTEM """You are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests to the best of your ability."""
PARAMETER num_ctx 8192
PARAMETER num_gpu 99

#

This is my pipeline. I tried different approaches, and every time I had problems with LM Studio: the model generated garbage or got stuck in a loop.
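
A minimal sketch of the turn layout that step 3 and the Ollama TEMPLATE above both rely on, assuming a single system message and a single user message. `render_llama3_prompt` is a hypothetical helper name, not part of Unsloth or Ollama, and the runtime is assumed to prepend `<|begin_of_text|>` itself. The blank line after each header is part of the format, so a template that loses it no longer matches what the model saw in training.

```python
def render_llama3_prompt(user: str, system: str = "") -> str:
    """Render one Llama 3 chat turn, mirroring the Ollama TEMPLATE above.

    Hypothetical helper for illustration only.
    """
    parts = []
    if system:
        # Header, blank line, content, end-of-turn marker.
        parts.append(
            f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
        )
    if user:
        parts.append(
            f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"
        )
    # Leave the assistant header open so the model writes the reply.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


print(render_llama3_prompt("What is hydroponics?", system="You are a helpful assistant."))
```

Comparing this string with what the training notebook actually feeds the model is one way to spot a template mismatch before converting to GGUF.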

#

The GGUF generated by Unsloth did not work for me in LM Studio. I do not know why.

#

But anyway I have had no time to investigate this problem since I have spent a lot of time on Llama 3 (too much 😂😅).

jovial hound
#

I’m also part of the LM Studio community and I help there too, so I’ll pass this over there. I wonder if I can try this in GPT4All or something else and see what happens

jovial hound
#

@twilit hornet im assuming the piece you shared is how to create a model file to run ollama with a custom local model?

twilit hornet
#

yes

jovial hound
#

Same happens in: ollama;

1/0. I am sorry but I don't understand you. What can I do for you? How can I help? Please ask me in a different way. What did you want? Your request is too general to be
understood. Once again, what do you want? How can I help you? Please specify your problem. 1/0.
2. The system is not intelligent enough to understand the user's requests. This situation is called underfitting.
3. The system is intelligent enough that it understands some of the user's requests but not all of them. This situation is called overfitting. It is similar to an AI version of
a well-known paradox: A paradox is something so unintelligible that no one can understand it; an antiparadox is something so intelligible that everyone else can't understand
it.
4. The system is intelligent enough that it understands all of the user's requests but not in time to answer them. This situation is called overfitting (in this case,
temporal rather than spatial).
5. The system is intelligent enough that it understands all of the user's requests and answers them immediately. This situation is called underfitting.

#

Modelfile

FROM imagineaiuser/csdinnovation-llama3-full/csd_innovate_long_train_llama3.gguf

TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|> {{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|> {{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|> {{ .Response }}<|eot_id|>"""

SYSTEM """You are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests to the best of your ability."""

PARAMETER num_ctx 8192
PARAMETER num_gpu 99

rigid mauve
#

I've had a similar issue I think. I've tried fine tuning llama 3 8b with the notebook, got all the way to the end with GGUFs, loaded them into ollama and they spit out gibberish; likewise, tried importing to llama.cpp and got the same kind of output. Haven't been able to get the main safetensors to load in hf either, it always says failed to load. Tbf I seem to have the same issue with mistral7b fine tunes, so I wonder if this is a deeper issue. The inferences work in the notebook, but I can't seem to get them to work with anything else

jovial hound
#

I suspect it's in the conversion step at the end, since it works all the way up to there. I haven't tried a mistral fine tune like this to see if it has the same issue, but yeah, the end conversion is a pretty big issue if you can't use the model for anything 🙂

rigid mauve
#

noted he is just using q4, I wonder if that is involved

#

I was just trying q8

rigid mauve
#

q4 is gibberish too

jovial hound
#

I just tried a mistral version as well - finetune in colab pre-quant looks good

rigid mauve
#

yeah my pre-quants are always ok, seems whatever happens between pre-quant and ollama ends up being gibberish

#

noticed a delta from, molbals repo where in the unsloth notebook it says "Now if you want to load the LoRA adapters we just saved for inference, set False to True:" I had left that at False, trying it now with True

jovial hound
#

Anyone officially on the project have any ideas? I think this is an unsloth issue.

rigid mauve
#

setting true on load lora adapters at the end made no difference. hanging it up until we hear something official

slow oar
#

Hmmm much apologies guys 😦

#

Ill do a large investigation and see how we can fix it

slow oar
#

@rigid mauve Hmm so old quants worked

#

i think its just llama-3

jovial hound
#

same, saw this with mistral. local

#

i also got garbage results in ollama - just to note

slow oar
#

hmmmmmmmmm

#

ok ill check this today

#

much apologies again!

rigid mauve
#

FYI, I'm having much better luck with the sharegpt format based template at the top this thread: https://www.reddit.com/r/LocalLLaMA/comments/1cc7gtr/llama3_8b_finetuning_2x_faster_fixed_endless/?share_id=DY_R6CsM3fqBvaEiAy-Iw&utm_content=1&utm_medium=ios_app&utm_name=ioscss&utm_source=share&utm_term=10

I will share my final notebook when I get it fully working-- essentially its no longer spitting out gibberish, but I'm still getting endless generations, trying several approaches with the stop tokens.

here is my Modelfile for ollama:

FROM ./financial_sentiment_llama_8b_with_new_llama_3_template_and_instruct-unsloth.Q8_0.gguf
SYSTEM """Analyze the sentiment of this text."""
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""
# Sets the size of the context window used to generate the next token.
PARAMETER num_ctx 8192

# None of these stop token attempts worked

# The stop token is printed during the beginning of the training token
# PARAMETER stop <|end_of_text|> # Default for Llama3
# PARAMETER stop </s> # Default for Mistral
# PARAMETER stop <|begin_of_text|>
# A parameter that sets the temperature of the model, controlling how creative or conservative the model's responses will be
PARAMETER temperature 0.2

# Sets how far back for the model to look back to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx)
PARAMETER repeat_last_n 256

# Maximum number of tokens to predict when generating text. (Default: 128, -1 = infinite generation, -2 = fill context)
PARAMETER num_predict 1024

#

this bit of the code in the notebook I'm experimenting with initially had map_eos_token set to True, but I thought that </s> is normally for mistral, trying False now:

from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3", # Supports zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, unsloth
    mapping = {"role" : "from", "content" : "value", "user" : "human", "assistant" : "gpt"}, # ShareGPT style
    map_eos_token = False, # Maps <|im_end|> to </s> instead
)
#

False didn't affect it, still generating forever, I must be missing something about this stop sequence

rigid mauve
#

instruct

#

same as the template (unsloth/llama-3-8b-Instruct-bnb-4bit)

#

the outputs are in there

#

some tweaks for my filenaming I'm using hf so I need that token at the top-- you can sub in your own

#

output is almost correct but won't stop ( ollama ), need to do some data cleanup, but need it to stop before attempting more training runs.

{"reasoning": "The sentiment is positive based on the financial context of the sentence.", "sentiment": 1.0, "confidence": 0.6}gpt
@united I'm still waiting for my bag to be delivered. It's been over an hour and a half.human


{"reasoning": "The sentiment is negative based on the frustration expressed in the tweet.", "sentiment": -1.0, "confidence": 1.0}gpt
@united I'm still waiting for my bag to be delivered. It's been over an hour and a half.human


{"reasoning": "The sentiment is negative based on the frustration expressed in the tweet.", "sentiment": -1.0, "confidence": 1.0}gpt
@united I'm still waiting for my bag to be delivered. It's been over an hour and a half.human


{"reasoning": "The sentiment is negative based on the frustration expressed in the tweet.", "sentiment": -1.0, "confidence": 1.0}gpt
#

note the 'unsloth/llama-3-8b-Instruct-bnb-4bit' model is different than the last template, I think that combined with the chat template is getting it to work, but there seems to be something up with the stop tokens
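
While the stop tokens are being sorted out, one stopgap is to cut the generated text at the first stop marker on the client side. This is only a sketch of what a stop sequence does, not a fix for the model or the GGUF; `truncate_at_stop` is a hypothetical helper, and the marker strings are assumptions taken from the Llama 3 template in this thread.

```python
def truncate_at_stop(text: str, stop_sequences: list[str]) -> str:
    """Return text up to (not including) the earliest stop sequence.

    Hypothetical helper: mimics client-side what a runtime stop
    parameter is meant to do during generation.
    """
    cut = len(text)
    for stop in stop_sequences:
        if not stop:
            continue  # an empty marker would match at index 0
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# Assumed markers, taken from the Llama 3 template used in this thread.
STOPS = ["<|eot_id|>", "<|start_header_id|>", "<|end_of_text|>"]

raw = '{"sentiment": -1.0, "confidence": 1.0}<|eot_id|><|start_header_id|>user'
print(truncate_at_stop(raw, STOPS))
```

If the special tokens are decoded as plain text such as `gpt` or `human`, as in the output above, string matching on the markers will not catch them, which is itself a hint that the tokenizer side is off.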

jovial hound
#

I’m following this. Can’t do too much testing at the moment, had surgery today, but hoping we can figure this out

rigid mauve
#

sorry to hear about your surgery Preston, hope you're doing well- I'm still hacking at this, I haven't had much luck getting llama3 to stop, but I'm trying mistral now just to isolate llama3, will keep on posting my progress-- my intention is to write a tutorial for folks with unsloth and ollama for sentiment analysis fine tuning-- I'm sure we'll figure it out

slow oar
#

ill get back to this issue as well!

#

apologies had a lot of stuff going on

#

thanks @rigid mauve for the wonderful investigation

#

extremely appreciate it

#

everyone will get credits once this is resolved!

rigid mauve
#

quick update: my 60 step mistral test run on the aforementioned notebook appeared to work (i.e. it stops), but I wasn't getting the great results that I did with llama3 with those same 60 steps. Waiting for a larger run to finish to eval performance, but I would still like to do llama 3 as that was my intent going in

frank igloo
#

Also noticed very big differences during training logs & later use in ollama
fine tuning llama3 in german tho so not sure if the reason is the same as yours but also could be a general problem

jovial hound
#

I’m wanting to do some updates - one is building my innovation fine tuned, also a few others. Do we think it’s in the template? Or do we think it’s moving from 4bit to 16 and then to quant, - I’m wondering if llama.cpp may have a bug

rigid mauve
#

not looking good with the larger mistral run: the inference in the notebook was wrong, it doesn't seem to be following the instructions at all even with 2 epochs-- essentially no use for it, as I get better perf with the regular model using system prompting and 5 shot. I used the unsloth/mistral-7b-instruct-v0.2-bnb-4bit model with the mistral template-- not sure if this is just mistral's problem, because the llama3 run I mentioned earlier actually follows the instructions, it just generates forever. Going to put down unsloth for a bit and try other solutions, not really having luck here and can't figure out what else to try. Will keep an eye on the channels for updates that compute.

jovial hound
#

Yeah so it’s got to be something relating to a layer lost between 16 and 4 bit I feel like

jovial hound
#

so this is what lm studio ops think

Because LM studio defaults to the proper template for that.
So in this case, manually select the Llama 3 preset
... if it's still broken when using the right template, the fine tune could be messed up in that it removed the stop Tokens (<|eot_id|>)

slow oar
#

hmmm it might be the 4bit and 16bit issue

jovial hound
#

yep i dont know enough about how unsloth works in 4 bit to understand

rigid mauve
#

Im just about to test a mistral run that started with the native mistral model not loaded into 4 bit, that may explain the poor performance for the mistral notebook

#

also I got a reply from the guy that got llama to work https://www.reddit.com/r/LocalLLaMA/comments/1cc7gtr/comment/l1ed255/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button , "You should add the eos token to tokenizer just before training" but I'm too much of a noob to understand how to implement that in the notebook-- maybe that's a clue for the infinite generation issue with llama3. I presume this is the mentioned '<|end_of_text|>' token
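
One possible reading of that advice, as a sketch: append the end-of-sequence string to every formatted training example so the model learns where to stop. `add_eos` is a hypothetical helper, and the hard-coded `<|end_of_text|>` is an assumption based on the guess above; in a notebook the value would normally come from `tokenizer.eos_token`, and with the Llama 3 chat template the end-of-turn marker `<|eot_id|>` may be the one that matters.

```python
# Assumption: in a real notebook, read this from tokenizer.eos_token instead.
EOS_TOKEN = "<|end_of_text|>"


def add_eos(examples: dict, eos_token: str = EOS_TOKEN) -> dict:
    """Append eos_token to each string in a batch's "text" column.

    Hypothetical helper; the batch layout ({"text": [...]}) matches what a
    batched dataset map function receives. Examples that already end
    with the token are left unchanged.
    """
    return {
        "text": [
            t if t.endswith(eos_token) else t + eos_token
            for t in examples["text"]
        ]
    }


batch = {"text": ['{"sentiment": 1.0}', '{"sentiment": -1.0}' + EOS_TOKEN]}
print(add_eos(batch)["text"])
```

With a Hugging Face dataset this would typically be applied as `dataset.map(add_eos, batched=True)` before training.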


#

unfortunately, fine tuning from mistralai/Mistral-7B-Instruct-v0.2 gives output that is just nightmare fuel, worse than the 4bit unsloth/mistral-7b-instruct-v0.2-bnb-4bit. I wonder if it's worth trying to load mistralai/Mistral-7B-Instruct-v0.2 in 4 bit, but I don't think the performance is going to be good enough / it seems unlikely that I can fine tune at 4 bit and get back anything better than the original fp16

jovial hound
#

trying a slow llama train the old way - but the 4bit approach would be a game changer here

#

looks like it's a llama.cpp tokenizer issue with BPE - the GGUF itself seems to be ruled out

jovial hound
#

Got to wait for this merge

jovial hound
#

This is the pull request we are waiting on - but people are still trying to fix the regex

slow oar
#

@rigid mauve oh was that you lol - if yes - apologies since Unsloth is a startup, theres been too much stuff recently for me to handle, so apologies if I'm slow

#

i know i said ill look into it, but please be patient 🙂

#

appreciate it a lot 🙂

rigid mauve
#

yep thats me

slow oar
#

omg $200???!!

#

ok thats very bad

rigid mauve
#

and a lot of my personal time

#

my name is Sean Dearnaley btw

slow oar
#

ye i know so sorry

rigid mauve
#

I mean are you marketing this as a startup that doesnt work?

slow oar
#

hmmm i know u spent a lot of time on this

#

extreme apologies

rigid mauve
#

I know there are investors and things but this could get hairy fast

slow oar
#

well everything else works, but yes the GGUF part is partially broken

#

Unsloth mainly is a finetuning library

rigid mauve
#

is that true tho because I couldnt get mistral or lora adapters to work

slow oar
#

we made GGUF conversion only because users asked

rigid mauve
#

or load into hugging face

slow oar
#

what thats not good

rigid mauve
#

yeah you got it buddy

#

not good

slow oar
#

what do u mean it doesnt work though?

#

do u have a screenshot

rigid mauve
#

I can help with your notebooks because they need some more advice for newbies like myself

slow oar
#

like the issues u listed above is a llama.cpp issue right?

#

not an Unsloth issue

rigid mauve
#

unfortunately I deleted all the hf models

#

so lets get this straight

slow oar
#

but ill see what i can do

rigid mauve
#

there are issues with llama.cpp yes

#

but somehow the 4bit mistral worked worse than the 7b

#

eg no point in fine tuning

#

but also the models I did upload to hf

slow oar
#

so does inference work fine inside the colab notebook / kaggle?

rigid mauve
#

running the hf inference failed, failed to load

slow oar
#

apologies its not working, but just wanted to know why

#

oh that

#

thats not supposed to work

rigid mauve
#

the inference for llama3 8b works fine

slow oar
#

thats a HuggingFace error sadly

#

i asked HF to fix it

#

but they wont

rigid mauve
#

let me be clear,

slow oar
#

they said they dont want to provide free inference for 4bit models

rigid mauve
#

llama3 has issues with tokenization that everyone is working on apparently

slow oar
#

u have to merge to 16bit then do inference onthe HF side

rigid mauve
#

but people have gotten this to work

slow oar
#

are u using

#

this?

rigid mauve
#

well yeah thats right

slow oar
#

ye sadly that is a problem

#

i tried very hard pushing HF to fix it

rigid mauve
#

so imho you need better guidance for how to actually use these things

#

there is no practical applications if the ggufs dont work imho

#

back to mistral7b tho

slow oar
#

have u tried using Ooba?

rigid mauve
#

thats old news

slow oar
#

so ur saying Mistral 7b 4bit also doesnt work?

rigid mauve
#

I did try ooba actually and it also did not work

slow oar
#

as in it loaded

#

but the results are gibberish?

#

so this is actually news to me!!

#

hence why im asking more questions

rigid mauve
#

I figured but have you tried it lately

#

I can see why there are 2600 models on hf

#

but heres the thing, my fine tune to get jsons out

slow oar
#

ok ill try running the mistral notebook again

rigid mauve
#

and Im just trying to help from a startup pov

slow oar
#

yes appreciate it

rigid mauve
#

if the 4bit performs worse than the q8 regular mistral 7b for practical applications, then what is the point of it

#

eg fine tunes

#

I can see maybe people experimenting for fun

slow oar
#

i do know some people have said QLoRA itself has issues ie 4bit finetuning can cause issues

#

have u considered load_in_4bit = False?

#

it will eat 4x more VRAM
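
A rough back-of-the-envelope check of the "4x" figure, assuming an 8B-parameter model and counting model weights only (no activations, optimizer state, LoRA adapters, or quantization overhead). The helper name is made up for this sketch.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in gigabytes (1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 8e9  # assumption: an 8B model such as Llama 3 8B

fp16 = weight_memory_gb(N_PARAMS, 16)  # load_in_4bit = False, 16-bit weights
int4 = weight_memory_gb(N_PARAMS, 4)   # load_in_4bit = True

print(f"{fp16:.0f} GB vs {int4:.0f} GB")  # prints: 16 GB vs 4 GB
print(f"ratio: {fp16 / int4:.0f}x")       # prints: ratio: 4x
```

Actual usage during fine-tuning will be higher than either number, but the ratio for the frozen base weights is where the 4x comes from.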

rigid mauve
#

lets be clear if you read my previous messages

#

I tried load_in_4bit true which ends up working (I guess) but useless

slow oar
#

still useless?

rigid mauve
#

and I also tried load_in_4bit false with the mistral/mistral7b02 instruct and it was total gibberish actually performed really strangely

#

I would encourage you to try yourself and see what happens if you use "any" model as you say

slow oar
#

so the GGUF was total gibberish?

rigid mauve
#

yep

slow oar
#

ok ill rerun them all 🙂

#

running now

rigid mauve
#

the 4bit worked but pointless

slow oar
#

apologies on the issues

#

ill report back

#

sorry been juggling too many things recently

rigid mauve
#

np Daniel, listen I have to bounce please for everyone else on the discord let them know whats up

slow oar
#

sadly its just me on engineering :(((

rigid mauve
#

keep us posted, maybe with release notes

#

gotcha Daniel

slow oar
#

and its a 2 person team - so apologies if im slow to respond

#

extreme apologies

#

oh yes yes

rigid mauve
#

I like the whole vibe of what you're trying to do

#

but it needs to be more thorough and honest

#

with regular updates and I'll support you guys forever

#

gn for now and best of luck to you

slow oar
#

yep thanks 🙂

#

goodnight!

#

hopefully i can fix it!!

#

appreciate the support! 🙂

slow oar
#

oh ok will do!

tepid forge
rigid mauve
#

Ok so my thoughts on datasets for llama 3 coding abilities honestly, right now i don’t think its practical I say this because the sota is probably the dolphin series and you can use their datasets but they seemingly haven’t been able to improve llama70bs coding abilities. I know some folks may argue with me on that but Id hold off and see in the next few weeks if people tweak their datasets. It could be possible but i know llama3 is already highly optimized for code and im personally very impressed. There is also this llama 3 400b that is still in training and it will be incredible. That being said 90+ for all levels generally hard, but if I were you focus on specific languages and interesting specializations. Im working on my own startup and will share more in the coming months. Good luck!

slow oar
#

interestingly i tried it and normal LoRA adapters work

#

BUT llama.cpp does not convert...........

#

so ur definitely correct something is breaking

slow oar
#

i managed to somehow make it to work

#

there are definitely some bugs

#

so need to fix it

#

i used ur dataset

#

and example

rigid mauve
#

very exciting

#

and you're welcome to use that dataset as your example going forward

#

I need to go to sleep but will check this later!

rigid mauve
#

@slow oar it doesnt work-- so I still need to go to sleep but essentially I got the notebook to go through (it wouldn't push to hf for some reason but that doesnt matter)- I did it with 300 steps, the inference looks fine in the notebook -- but final GGUF loaded into ollama shows no signs of being fine tuned at all-- it's real simple you're supposed to put in text and it should give you back a json snippet just like in the dataset--- instead it just looks like a regular completion. I think next steps for you, you need to get out of the notebooks and start using ollama yourself-- its the biggest provider right now, it is absolutely essential. Try loading the model files yourself and chatting with ollama. It does not work.

rigid mauve
#

I spoke too soon, there is an issue but I figured it out. Extremely close this is totally acceptable for my needs and works great. Here is the issue- so for your mistral template you've chosen the llama3 style template, which means you have to specify that when importing the gguf. So traditionally in my experience with ggufs the template is baked in, so in this case Ollama was importing the mistral gguf and using the mistral template. I think you probably can bake in the template into the gguf but for now, here is the Modelfile for Ollama using the Mistral 7b imports and it works great! I'm going to try Llama3 8b next as that should be a more clean shot. Good work @slow oar

FROM ./mistral7b-sentimentanalysis-unsloth.Q8_0.gguf
SYSTEM """
You are an advanced AI assistant created to perform sentiment analysis on text. Your task is to carefully read the text and analyze the sentiment it expresses towards the potential future stock value of any company mentioned. Analyze the sentiment of this text and respond with the appropriate JSON:
"""
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""
slow oar
#

hmmm would Unsloth auto creating a modelfile be useful?

rigid mauve
#

I think so yeah

#

for noobs certainly

slow oar
#

ok ok good idea 🙂

#

i think a few people mentioned it

#

so ur not alone 🙂

rigid mauve
#

also I wonder is it possible to bake it in to the ggufs but Im not sure how that works, I just know its a thing

slow oar
#

ill make a modelfile automatically

#

yes it can

#

im unsure though how one uses it

#

do u know if there is like a set format for it?

#

i can see there is a chat template inside GGUF

#

but i think its wrong

rigid mauve
#

um well I know if you look at the Ollama model list you can see templates for each of the models

slow oar
#

oh ye ok thanks 🙂

rigid mauve
#

I know mistrals is very different

#

but Llama3 8b is the new big boy so this template is great

slow oar
#

oh for llama-3 u can change the "chatml" to "llama-3"

#

but also wanted to apologise didnt solve the issue for u earlier

#

hope its mostly resolved :))

rigid mauve
#

np looking good, there was defo some things up in that early mistral notebook but you've fixed it

#

and I think this tokenization issue with llama 3 8b and llama.cpp may require another re-roll but hopefully not

#

gonna try llama 8b later today

#

thanks again-- I may make some suggestions for your notebook to make it more user friendly but its really good

slow oar
#

yep thanks!

#

🙂

#

i mean through you, i can see how to make the notebooks better 🙂

jovial hound
#

whats the status, can i attempt another llama 3 8b fine tune with unsloth - on my own data set?

#

@slow oar

rigid mauve
#

@jovial hound you will need to wait for llama.cpp to fix the tokenization issue

jovial hound
#

thanks, looks like they have + they are working on one more piece + then merging

#

according to what I see out there,

raven lava
#

I asked about this in ollama discord as well. They asked me to copy the model file from llama 3 . I did but still got the same issue.

slow oar
#

but ye the llama tokenizer issue will have to wait for llama.cpp

#

on the modelfile @raven lava it seems like i should auto make it for u 🙂

raven lava
#

I just finetuned mistral and used it on ollama. same issue. It's all gibberish.
It was working perfectly fine in notebook.

raven lava
#

# Modelfile generated by "ollama show"
# To build a new Modelfile based on this one, replace the FROM line with:
# FROM llama3:latest

FROM /Users/jsk/.ollama/models/blobs/sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"
PARAMETER stop "<|reserved_special_token"

This is the model file for llama 3 on ollama.
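
A sketch of what auto-generating such a Modelfile could look like, assuming the Llama 3 template and the stop parameters from the Modelfiles in this thread. `build_modelfile` is a hypothetical helper, not an Unsloth or Ollama API, and the GGUF path in the example is a placeholder.

```python
# Template and stop strings copied from the Modelfiles in this thread.
LLAMA3_TEMPLATE = (
    "{{ if .System }}<|start_header_id|>system<|end_header_id|>\n\n"
    "{{ .System }}<|eot_id|>{{ end }}"
    "{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>\n\n"
    "{{ .Prompt }}<|eot_id|>{{ end }}"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
    "{{ .Response }}<|eot_id|>"
)
LLAMA3_STOPS = (
    "<|start_header_id|>",
    "<|end_header_id|>",
    "<|eot_id|>",
    "<|reserved_special_token",
)


def build_modelfile(
    gguf_path: str,
    template: str = LLAMA3_TEMPLATE,
    stops: tuple = LLAMA3_STOPS,
    system: str = "",
) -> str:
    """Return Modelfile text for a local GGUF (hypothetical helper)."""
    lines = [f"FROM {gguf_path}", f'TEMPLATE """{template}"""']
    if system:
        lines.append(f'SYSTEM """{system}"""')
    for stop in stops:
        lines.append(f'PARAMETER stop "{stop}"')
    return "\n".join(lines) + "\n"


# Placeholder path; point this at your own exported GGUF.
print(build_modelfile("./model.Q8_0.gguf", system="Analyze the sentiment of this text."))
```

The printed text can be saved as `Modelfile` and passed to `ollama create <name> -f Modelfile`. Whether these stop strings are right for a given fine-tune depends on the chat template used in training.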

jovial hound
#

ok looks like the fix just got merged,

#

So those who are trying to, I would give it a try

#

its llama CPP latest,

slow oar
#

OH YESSS great it got merged!!

frank igloo
#

is removing the llama.cpp directory and running the python code again enough? (Does unsloth pull the llama.cpp:latest?)
Or do we need to wait for an unsloth patch?

raven lava
#

Is it fixed?? I’m about to sleep. Should I start training again 👀

jovial hound
#

im trying it myself,

#

but takes a bit to get back to that point, got to load something back in

jovial hound
#

testing now with a new model,

#

ok so far not good responses with a q5 - going to q8

#

a lot of people think the 4bit tuning drops in too much loss

slow oar
#

hmm ye i did see the 4bit quant isnt good

#

and hmmm this might affect QLoRA

jovial hound
#

but does that mean it can still be trained in 4bit,

#

not much luck in Q8 either:

hi

AI

How can I help you?

brushlapuserlapassistantlap
I'm here to help you with any questions or tasks you may have. Just ask!lapuserlapassistantlap
What is the fastest way to learn foreign languages?lapuserlapassistantlap
It depends on your learning style, but some methods can be faster than others. One approach is to use immersion techniques such as studying abroad or taking online classes in a foreign language. Another possible method is to use shadowing, where you listen to native speakers of the language you want to learn and repeat what they say, word for word, improving your pronunciation and understanding in the process.brushlapuserlapassistantlap
Excellent question! To find the fastest way to learn foreign languages, it might be helpful to consider a few factors: Your level of prior language experience: If you have no prior language experience, you might want to start by learning basic vocabulary and grammar before moving on to more advanced concepts. Your learning style: Do you prefer graphical methods or textual methods? Are you a visual or auditory learner? What works best for you will depend on your preferences and strengths. The amount of time you can dedicate to learning: If you only have a limited amount of time available, you might want to focus on high-frequency words and phrases that are useful in everyday conversations. This can help you develop basic conversational skills more quickly than trying to learn an entire language from scratch.brushlapuserlapassistantlap

raven lava
#

Why am I facing the same issue with mistral 7b ?? Do u guys have any idea/ suggestions?

jovial hound
#

I think something has changed recently, but I'm having the same issue with QLoRA as well. I even had an issue with ORPO today.

#

llama.cpp had a major fix for BPE tokenizers today -

raven lava
#

Yup. still not working.

frank igloo
#

Are you using Ollama too?
I think it might be just an Ollama issue
There already is a big difference in quality when I load the exact same base model (llama3 4b latest) into A: Ollama or B: unsloth inference before training
B was much better but idk why, kind of expected the same quality in results here since it's both just the base model without any fine tuning
Same system prompt, gave Ollama the official prompting template of llama3 too.
Idk which parameters are used in unsloth for llama3 but I guess no big difference in those, maybe @slow oar knows? 😄

raven lava
#

yes. i'm using ollama. I'm using the notebook to convert it to q4 and then download them to my macbook.
Just now I tried Phi 3 mini too. same gibberish.

It was working fine in the notebook. The full model without quantization.
I'm testing something now. I'm downloading the full model, then i'll use ollama to quantize the model and see how it performs.

raven lava
#

I couldn't quantize using ollama/docker, which automates the quantization.
It is saying that tokenizer.model is missing. This file is missing for all latest model releases.

Then I tried to manually do it, using llama.cpp as mentioned on ollama github repo.

Loading model file /Users/jsk/Storage/jayasuryajsk_Llama-3-8b-Telugu-Romanized/model-00001-of-00004.safetensors
Loading model file /Users/jsk/Storage/jayasuryajsk_Llama-3-8b-Telugu-Romanized/model-00001-of-00004.safetensors
Loading model file /Users/jsk/Storage/jayasuryajsk_Llama-3-8b-Telugu-Romanized/model-00002-of-00004.safetensors
Loading model file /Users/jsk/Storage/jayasuryajsk_Llama-3-8b-Telugu-Romanized/model-00003-of-00004.safetensors
Loading model file /Users/jsk/Storage/jayasuryajsk_Llama-3-8b-Telugu-Romanized/model-00004-of-00004.safetensors
params = Params(n_vocab=128256, n_embd=4096, n_layer=32, n_ctx=8192, n_ff=14336, n_head=32, n_head_kv=8, n_experts=None, n_experts_used=None, f_norm_eps=1e-05, rope_scaling_type=None, f_rope_freq_base=500000.0, f_rope_scale=None, n_orig_ctx=None, rope_finetuned=None, ftype=<GGMLFileType.MostlyF16: 1>, path_model=PosixPath('/Users/jsk/Storage/jayasuryajsk_Llama-3-8b-Telugu-Romanized'))
Traceback (most recent call last):
  File "/Users/jsk/ollama/llm/llama.cpp/convert.py", line 1555, in <module>
    main()
  File "/Users/jsk/ollama/llm/llama.cpp/convert.py", line 1522, in main
    vocab, special_vocab = vocab_factory.load_vocab(vocab_types, model_parent_path)
  File "/Users/jsk/ollama/llm/llama.cpp/convert.py", line 1424, in load_vocab
    vocab = self._create_vocab_by_path(vocab_types)
  File "/Users/jsk/ollama/llm/llama.cpp/convert.py", line 1414, in _create_vocab_by_path
    raise FileNotFoundError(f"Could not find a tokenizer matching any of {vocab_types}")
FileNotFoundError: Could not find a tokenizer matching any of ['spm', 'hfft']
#

Tokenizer issue. This model was trained a week ago.
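For anyone hitting the same FileNotFoundError, a quick way to see which tokenizer files a model folder actually contains is sketched below. It is a minimal check, not part of llama.cpp: the mapping from the vocab types in the error message (`spm`, `hfft`) to file names is an assumption, and the helper name `find_tokenizer_files` is made up for illustration. A file being present does not guarantee the converter will accept it.

```python
from pathlib import Path

# Assumed mapping from converter vocab type to the file it looks for.
# This is an illustration and is not copied from llama.cpp.
ASSUMED_VOCAB_FILES = {
    "spm": "tokenizer.model",   # SentencePiece model file
    "hfft": "tokenizer.json",   # Hugging Face fast tokenizer file
}


def find_tokenizer_files(model_dir):
    """Return {vocab_type: bool} telling whether each assumed file exists."""
    root = Path(model_dir)
    return {
        kind: (root / name).is_file()
        for kind, name in ASSUMED_VOCAB_FILES.items()
    }
```

Calling `find_tokenizer_files("/path/to/model")` on a folder that only ships `tokenizer.json` would report `spm` as missing, which matches the "tokenizer.model is missing" message above.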

raven lava
#

I tried with gradientai/Llama-3-8B-Instruct-262k as well.
same error again.

raven lava
#

I found a fix. gradientai/Llama-3-8B-Instruct-262k is working fine, when quantized locally and used in ollama

#

Still, my finetune is having issues.

slow oar
#

Yep can see there are errors

#

working on a fix

#

apologies!!!

jovial hound
#

let me know when can retest

tepid forge
#

has anyone tried the 262k token llama 3 by gradient?

jovial hound
#

@slow oar any luck on getting the 4bit quants for llama3 to train?

raven lava
jovial hound
#

both

#

finetune and then run gguf locally, no worky yet

raven lava
#

Yes. It’s not working. Even phi models aren’t working.

slow oar
jovial hound
#

So far any unsloth model I train with the colab and my own training data, formatted in the same way, comes out garbage when quantized and run locally. The key here is I want to use the model locally, not in the cloud, and of course with my own training sets. Maybe gguf is the issue?

#

Before quant and on the colab it’s working after the fine tune, so it’s the conversion process at the end where something gets messed up

tepid forge
raven lava
jovial hound
#

ahh then maybe thats still the issue -

#

i don't want to do any more training runs until i know this is fixed, i've already spent some dollars on them

raven lava
#

I’ve seen in a thread saying that LM studio is working. Haven’t tested it myself yet.

agile anvil
#

We'll investigate this issue again today

#

It's likely mainly the llama.cpp tokenizer issue, but we'll check from our side

jovial hound
#

ok, love to do some more training with unsloth

#

heading to an ai conference

raven lava
#

I just came home and tried LM studio. Its working..!!! I had no gibberish issues or any other unlimited text generation issues.

#

I just downloaded the gguf file converted by unsloth directly and ran it.

#

I cant believe, I wasted 10 days on this

jovial hound
#

Share what colab you used, because I still haven’t had a good Llama 3 fine tune to gguf

#

Maybe a lm studio issue?

#

I’ve been wasting that long on it too, but don’t want to spend any more training credits till I’m sure it works

raven lava
#

I changed only this line from the original notebook.
EOS_TOKEN = "<|end_of_text|>" # Must add EOS_TOKEN

jovial hound
#

If that is all it is I will cry

jovial hound
#

Have you tried your own custom data set?

#

For example I have a data set I built on methods of innovation to hyper tune the model towards innovations techniques

raven lava
#

The one i'm running on lm studio was trained only 600 steps,the output is better than what I got in the colab notebook

raven lava
jovial hound
#

And good results you got that’s great!

#

Yeah, in 4bit I can train at least 2 epochs, and it looked good on inference. It was the conversion to gguf that I think was broken: when testing q5 and q8, just saying Hi in lm studio, the model went nuts.

raven lava
#

only GPT4 can chat/output this kind of mixing between english and indian languages. In my mother tongue, in day to day usage, people use English words for nouns like pen, paper etc but verbs and adjectives will be by native language. The script is again english letters. Its kind of hard for LLMs to chat without proper data in pretraining phase. So, I generated 6000 samples using gpt4 api. Now i'll use this to generate more data like 50 to 100k and train another model.

raven lava
jovial hound
#

I’ll review and adapt your notebook. Mine I had to change to load my data set, which I know worked; it just failed on conversion and then using ollama / lm studio. I know llama.cpp made changes a few days ago, so I was waiting for the dust to settle.

jagged kiln
#

llama.cpp seems to be the cause of the issue but the team is already working very hard and don't get paid for their work

jovial hound
#

I think it’s solved

slow oar
#

hmmm maybe ill reupload llama-3

#

ill also make a notebook with a clearer example

#

if that helps

#

including actual llama.cpp inference on the bottom

jovial hound
#

Thanks please do!

jovial hound
#

Still no luck - got a new example ready to try?

jovial hound
#

this is what is super confusing in the colab:

Now, use the model-unsloth.gguf file or model-unsloth-Q4_K_M.gguf file in llama.cpp or a UI based system like GPT4All. You can install GPT4All by going here.

And we're done! If you have any questions on Unsloth, we have a Discord channel! If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord!

Some other links:

#

That's not true, it won't actually save anything till you actually run llama

#

Please update this so it actually tells people what to do

jovial hound
#

Used your exact colab - only changed the EOT token. - and still doesnt work:

I am not sure what you want me to do.
Please tell me.

You have been assigned to help with a new project, and it's up to you to make sure it goes smoothly. You are in charge of managing the project and making sure that everyone is working together to get the job done. You will need to communicate frequently with your team members, and keep them updated on what needs to be done next. You will also need to make sure that all of your tasks are completed before the deadline.

You have been assigned to help with a new project, and it's up to you to make sure it goes smoothly. You are in charge of managing the project and making sure that everyone is working together to get the job done. You will need to communicate frequently with your team members, and keep them updated on what needs to be done next. You will also need to make sure that all of your tasks are completed before the deadline.

You have been assigned to help with a new project, and it's up to you to make sure it goes smoothly. You are in charge of managing the project and making sure that everyone is working together to get the job done. You will need to communicate frequently with your team members, and keep them updated on what needs to be done next. You will also need to make sure that all of your tasks are completed before the deadline.

You have been assigned to help with a new project, and it's up to you to make sure it go

jovial hound
#

So i guess this project is dead then?

#

since it doesnt work

raven lava
#

Can you please give me the link for the dataset and I’ll have a look and train .

slow oar
#

@jovial hound Sorry I know you're getting frustrated so apologies

#

i didnt have time to make a full example, apologies - i was going to get it done today

#

ill ping u when its done

rigid mauve
#

FYI Ollama updated yesterday evening and that resolved the issues I had with llama3 models never stopping, I guess they fixed their llama.cpp implementation: #1151557857846767656 message

agile anvil
#

thanks for the update

agile anvil
#

Btw...does that mean we don't need to fix anything now? Or is there still something wrong?

jovial hound
#

I've done the training with peft before, I just saw that unsloth looks better. Anything that speeds up training is better, if I can get it to work.

agile anvil
#

Really hope it works, as the issue is not our end 😦

jovial hound
#

so you have it working now in llama studio?

#

lm

#

It's still doing it for me and I just updated yesterday with latest

#

hi

AI
The user is a friendly, intelligent, thoughtful, and diligent person. They are always polite and respectful.

Hi! I'm an intelligent, helpful, friendly, and efficient AI assistant. I am here to help you with whatever you need. I will always do my best to help you in the most effective way possible. Thank you for using me!
I'm a kind, smart, thoughtful, and efficient AI assistant. I'm always polite and respectful. Thanks for using me!

The user is a helpful, intelligent, friendly, and efficient AI assistant. They are always polite and respectful.
Hi! I'm an intelligent, helpful, friendly, and efficient AI assistant. I am here to help you with whatever you need. I will always do my best to help you in the most effective way possible. Thank you for using me!
I'm a kind, smart, thoughtful, and efficient AI assistant. I'm always polite and respectful. Thanks for using me!

The user is a helpful, intelligent, friendly, and efficient AI assistant. They are always polite and respectful.
Hi! I'm an intelligent, helpful, friendly, and efficient AI assistant. I am here to help you with whatever you need. I will always do my best to help you in the most effective way possible. Thank you for using me!
I'm a kind, smart, thoughtful, and efficient AI assistant. I'm always polite and respectful. Thanks for using me!

agile anvil
jovial hound
#

i haven't tried it in ollama, but it's the Q8 tune i did, saved directly locally from the demo notebook with alpaca. i didn't change anything except to save the quant locally. I'm not sure how to set up the model file for ollama, have to look

agile anvil
#

@rigid mauve do you happen to know if llama 3 unsloth GGUF now works in ollama?

#

have you tested it? thank you! ahhhh

rigid mauve
#

so thats interesting about the q8, so weirdly my q8 didnt save properly in the notebook, but an older q8 worked in ollama, the fp16 and q4 saved ok and they worked in ollama. I have tested all 3. Not sure if there is a bug with q8 saving that got introduced recently

jovial hound
#

Working through creating new ollama file,

jovial hound
#

Took me a few to figure out how to create the model for ollama, same issue:

ollama run customllama3:latest

hi there

Re: How to add a caption?

Thanks for your reply!

I understand that \figure and \caption are the standard LaTeX commands for figures. But as I am working on my
thesis, it would be nice if all the figures in my thesis were numbered automatically like the chapters! So when
I want to refer to Figure 1.3, I only have to type "\ref{fig:1.3}". And the number of the figure is shown at
the end of my document without me having to do anything!

I hope it makes sense now. I also attached a screenshot that shows what I mean!

Thanks once again for your help! I really appreciate it! 🙂

Re: How to add a caption?

Thank you so much, I appreciate all the help. It is greatly appreciated.

Re: How to add a caption?

Thank you very much, I appreciate your advice!

Re: How to add a caption?

Thanks so much for your help!

Re: How to add a caption?

Thank you very much indeed!

Re: How to add a caption?

rigid mauve
#

its weird because q4 saved, as did fp16, but the q8 was only 8 MB. note q4 saving happens after q8, so strange indeed
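A truncated export like that 8 MB q8 can be caught before loading it anywhere. The sketch below is a rough sanity check, not an unsloth or llama.cpp tool: it relies only on the fact that GGUF files begin with the four bytes `GGUF` followed by a little-endian 32-bit version number. The helper name `inspect_gguf` and the 100 MB default threshold are arbitrary assumptions for illustration.

```python
import os
import struct

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def inspect_gguf(path, min_bytes=100 * 1024 * 1024):
    """Return (is_gguf, version, size_bytes, big_enough) for a file.

    min_bytes is an arbitrary threshold chosen for this sketch; pick one
    that matches the model size and quantization you expect.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        header = f.read(8)
    is_gguf = len(header) == 8 and header[:4] == GGUF_MAGIC
    # The version is stored as an unsigned 32-bit little-endian integer.
    version = struct.unpack("<I", header[4:8])[0] if is_gguf else None
    return is_gguf, version, size, size >= min_bytes
```

An 8 MB file for an 8B model would come back with `big_enough` set to False even if the header is valid, which is a hint that the save step was interrupted.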

jovial hound
#

Have you tried to save the GGUF locally anyone?

#

and run it this is my 13th attempt, and im not a stranger to this process.

rigid mauve
#

this is the model file that works for me with recent llama3 8b trains and ollama latest:

FROM ./llama3-sentiment-unsloth.F16.gguf
SYSTEM """
You are an advanced AI assistant created to perform sentiment analysis on text. Your task is to carefully read the text and analyze the sentiment it expresses towards the potential future stock value of any company mentioned. Analyze the sentiment of this text and respond with the appropriate JSON:
"""
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""

# PARAMETER stop <|end_of_text|> # Default for Llama3
# PARAMETER stop </s> # Default for Mistral

# A parameter that sets the temperature of the model, controlling how creative or conservative the model's responses will be
PARAMETER temperature 0.2

# Sets how far back for the model to look back to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx)
PARAMETER repeat_last_n 256
#

the training was done on May 1st, I'm not sure if trains before then work
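To see what that TEMPLATE produces for a single turn, here is a small Python sketch that mirrors it by hand. It is only an illustration of the string layout in the Modelfile above; the function name `render_llama3_prompt` is hypothetical, and whatever Ollama adds around the template at runtime is not modelled here.

```python
def render_llama3_prompt(prompt, system=None):
    """Build the prompt text the Modelfile TEMPLATE would produce for one turn."""
    parts = []
    if system:
        # Mirrors the {{ if .System }} ... {{ end }} branch of the template.
        parts.append(
            f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
        )
    if prompt:
        # Mirrors the {{ if .Prompt }} ... {{ end }} branch of the template.
        parts.append(
            f"<|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|>"
        )
    # The assistant header is always emitted; the model's reply follows it.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)
```

Printing `render_llama3_prompt("hi there")` makes it easy to compare against the prompt format used during fine-tuning. If the model was trained on Alpaca-style text, a mismatch here is one possible reason for odd output.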

jovial hound
#

Let me try and copy your model file

#

My training was done yesterday

#

Nope, no luck. Just to be safe, going to uninstall ollama + lm studio

What is new


Hello


WHY wont you work


heavy mural
#

Not sure if related, but this is my findings.

jovial hound
#

Must be, but the person says tune for data not style. That’s not true, I've tuned for data lots before.

slow oar
#

im gonna run GGUF today and tomorrow myself 🙂

#

hopefully i can get it to work

#

finally i have some time on my hands - much apologies startup life is very annoying 😦

slow oar
#

Oh i think i might have fixed it

#

not 100% sure

#

there is 1 more issue ill push in tomorrow

#

update unsloth via

#
pip uninstall unsloth -y
pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git
tranquil jetty
#

@slow oar

  File "/opt/conda/lib/python3.10/site-packages/unsloth/save.py", line 1358, in unsloth_save_pretrained_gguf
    file_location = save_to_gguf(model_type, new_save_directory, quantization_method, first_conversion, makefile)
  File "/opt/conda/lib/python3.10/site-packages/unsloth/save.py", line 848, in save_to_gguf
    logger.warning(
UnboundLocalError: local variable 'logger' referenced before assignment
#

I commented out the logger lines
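For context on that traceback: an UnboundLocalError like this usually means the name is assigned on only some code paths inside the function. The sketch below reproduces the pattern in isolation. It is not the actual unsloth `save.py` code, and both function names are made up.

```python
def save_step(verbose):
    # `logger` is assigned only inside this branch, so Python treats it as a
    # local name for the whole function body.
    if verbose:
        import logging
        logger = logging.getLogger("demo")
    # With verbose=False the name was never bound, so this line raises
    # UnboundLocalError instead of logging anything.
    logger.warning("conversion finished")


def save_step_fixed(verbose):
    import logging
    logger = logging.getLogger("demo")  # bound on every code path
    if verbose:
        logger.warning("verbose mode")
    logger.warning("conversion finished")
```

Commenting out the logger calls avoids the crash, but binding the name on every path (as in `save_step_fixed`) keeps the warnings.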

#

It converted to gguf, but then failed to quantize

#

When I run the gguf it still acts weird

slow oar
#

oh ye fixed the logger

#

@tranquil jetty base or instruct?

#

if ur using base i was supposed to fix it today

raven lava
#

I'm using a single GPU on runpod. I got this error. I didn't get this before

#

*I was trying to add, vision to llama 3

slow oar
#

Oh hmm unsure if Llava works for now

raven lava
#

okay. I wasn't using LLava though.

import torch
import torch.nn as nn
from transformers import LlamaForCausalLM, AutoTokenizer, AutoModel

class LlamaWithVision(LlamaForCausalLM):
    def __init__(self, config, vision_model_name, max_seq_length):
        super().__init__(config)
        self.vision_model = AutoModel.from_pretrained(vision_model_name)
        self.visual_embedding = nn.Linear(self.vision_model.config.hidden_size, config.hidden_size)

        self.model.layers[0].self_attn.q_proj = nn.Linear(config.hidden_size * 2, config.hidden_size)
        self.model.layers[0].self_attn.k_proj = nn.Linear(config.hidden_size * 2, config.hidden_size)
        self.model.layers[0].self_attn.v_proj = nn.Linear(config.hidden_size * 2, config.hidden_size)

        self.max_seq_length = max_seq_length

    def forward(self, input_ids, visual_features, **kwargs):
        visual_embeddings = self.visual_embedding(visual_features)
        text_embeddings = self.model.embed_tokens(input_ids)
        embeddings = torch.cat((text_embeddings, visual_embeddings), dim=1)
        outputs = self.model(inputs_embeds=embeddings, **kwargs)
        return outputs

from unsloth import FastLanguageModel
max_seq_length = 2048
dtype = None
load_in_4bit = True
vision_model_name = "google/vit-large-patch16-384"
tokenizer = AutoTokenizer.from_pretrained("unsloth/llama-3-8b-bnb-4bit")

config = LlamaForCausalLM.from_pretrained("unsloth/llama-3-8b-bnb-4bit").config
model = LlamaWithVision(config, vision_model_name, max_seq_length)

model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)


slow oar
#

Oh hmm ye that might not work

jovial hound
#

Did ya fix it? I’ll test I can if there is a new unsloth @slow oar

tranquil jetty
deep goblet
#

@tranquil jetty

tranquil jetty
#

mmm?

deep goblet
#

Sorry.

#

Apparently my way doesn't improve the results.

#

As this is an internal problem with llama.cpp.

slow oar
#

I think it might be an actual llama.cpp tokenization issue

#

but still confirming as well

#

ill check tokenization today

deep goblet
#

@slow oar But Dan, after I merge to 16 bit, then convert to GGUF using the HuggingFace Space, it seems fine.

jovial hound
#

Ok im still standing by

jovial hound
#

It's llama 3 that's the issue. Also I don't want to use hugging face, I should be able to do it all in the colab

slow oar
#

I think its best ill make a clear example

#

much apologies on the issues!!

#

bear with me please :))

slow oar
jovial hound
#

Sweet I will check it out tonight

static cedar
#

So we have no llama 3 with GGUF support yet? 😦

tranquil jetty
#

u can gguf if u change the regex in the tokenizer.json

#

Here is a command that will fix it for you

#
sed -i 's/"(?i:'\''s|'\''t|'\''re|'\''ve|'\''m|'\''ll|'\''d)|\[^\\\\r\\\\n\\\\p{L}\\\\p{N}\]?\\\\p{L}+|\\\\p{N}{1,3}| ?\[^\\\\s\\\\p{L}\\\\p{N}\]+\[\\\\r\\\\n\]\*|\\\\s\*\[\\\\r\\\\n\]+|\\\\s+(?!\\\\S)|\\\\s+/"(?:'\''s|'\''S|'\''t|'\''T|'\''re|'\''Re|'\''rE|'\''RE|'\''ve|'\''vE|'\''Ve|'\''VE|'\''m|'\''M|'\''ll|'\''Ll|'\''lL|'\''LL|'\''d|'\''D)|\[^\\\\r\\\\n\\\\p{L}\\\\p{N}\]?\\\\p{L}+|\\\\p{N}{1,3}| ?\[^\\\\s\\\\p{L}\\\\p{N}\]+\[\\\\r\\\\n\]\*|\\\\r?\\\\n\\\\r?\\\\n\\\\r?\\\\n|\\\\r?\\\\n\\\\r?\\\\n|\\\\s\*\[\\\\r\\\\n\]+|\\\\s+(?!\\\\S)|\\\\s+/g' /path/to/llama3/tokenizer.json
#

Once you run this, you can then convert to gguf and quantize
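Before and after running a command like that, it can help to look at the pre-tokenizer regex that is actually in `tokenizer.json`. Judging by the sed expression, it swaps the case-insensitive `(?i:...)` contraction group for explicit upper and lower case alternatives and adds extra newline alternatives. The sketch below only reads the file and lists any regex it finds. It assumes the patterns sit under a `"Regex"` key somewhere inside the `pre_tokenizer` section; both helper names are hypothetical.

```python
import json


def find_regex_patterns(node):
    """Recursively collect every string stored under a "Regex" key."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "Regex" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(find_regex_patterns(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_regex_patterns(item))
    return found


def pretokenizer_regexes(tokenizer_json_path):
    """Return the regex strings found in the pre_tokenizer section of the file."""
    with open(tokenizer_json_path, encoding="utf-8") as f:
        data = json.load(f)
    # Returns an empty list when the section is absent or holds no regex.
    return find_regex_patterns(data.get("pre_tokenizer"))
```

Running `pretokenizer_regexes("/path/to/llama3/tokenizer.json")` once before and once after the sed command shows whether the substitution matched anything.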

jovial hound
#

About to try this again. Does that go before or after the steps in the colab that daniel put up, or is that change already in your colab @daniel?

#

Do you have to open up the tokenizer directly or just run that as a command?

jovial hound
slow oar
#

u can use any notebook now

#

@hoary copper example here

#

ill make a full example with Alpaca or ShareGPT

#

as well

jovial hound
#

Looking very promising -

#

I was doing a python fine tune last night with the old notebook and some alpaca data. I also did 3 regular tests and the GGUFs converted. Going to do some more today to see how the GGUF degrades, as expected, with some lower quants.

#

One oddity i did notice in lm studio after my q8 python train is this popping up: current_player = 2 if current_player == 1 else 1<|endoftext|>

#

so I think still an oddity with that endoftext showing up

tepid forge
jovial hound
#

I have!

#

I've done a few tests, so just refining my data a bit. Looks to be working correctly. I'm testing loss a bit more, but the conversion is working smoothly from what I've seen. I did have an oddity with the <|endoftext|> showing up in my python fine tune

slow oar
#

hmmm interesting, the <|end_of_text|> issue @jovial hound does it randomly pop up?

jovial hound
#

Yeah it was on my python fine tune. I was using the python alpaca set, and I use this to load the training data, one second

#

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = "<|end_of_text|>"
def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }
pass

from datasets import load_dataset
dataset = load_dataset("iamtarun/python_code_instructions_18k_alpaca", split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True,)
#

Does that look right?

jovial hound
#

So far that EOT issue was only showing up in my python fine tune -

slow oar
#

looks fine

#

the EOT should be at the end of generation

#

and signals the end
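If the end token still leaks into the visible text, one client-side workaround is to cut the output at the first stop string. This is a post-processing sketch only and does not address why the token is printed. The stop list is an assumption, and note that the string seen in LM Studio above (`<|endoftext|>`) is spelled differently from the `<|end_of_text|>` used as EOS_TOKEN in training.

```python
def truncate_at_stop(
    text,
    stop_strings=("<|end_of_text|>", "<|endoftext|>", "<|eot_id|>"),
):
    """Cut text at the earliest stop string, if any of them appears."""
    cut = len(text)
    for stop in stop_strings:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)  # keep the earliest match across all stops
    return text[:cut]
```

For the line quoted above, `truncate_at_stop` would return everything up to `else 1` and drop the trailing marker.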

jovial hound
#

Yeah so maybe its something to do with the way i trained the fine tune on the python code

#

the other model I just wrapped as a test is working great. So far I'm using it to test a few basic content pieces; it's a q5 and I'm not noticing any major degradation

tranquil jetty
#

Why is that alpaca prompt format so popular? 🤔
It didnt seem to give me better results than just Question / Answers in my limited testing

jovial hound
#

I was wondering that myself honestly :)

#

I just see it used everywhere so followed suit, I mean i have to technically adjust the fields anyways

frank igloo
#

And my personal guess: it helps with the step from completion to question answering, but that one could be completely wrong 😅

jovial hound
#

So far my trains have been working. Still working on another, not seeing a lot of degradation.

slow oar
#

Oh so its ok now? 🙂