jovial hound Apr 24, 2024, 4:31 PM

#

I've tried this twice now with some simple fine-tunes, running 1 epoch and 2 epochs. Training seems to be fitting nicely, and when I test inference on the model in Colab it responds as expected. However, when I convert with llama.cpp to a Q5 quant and export the GGUF, everything becomes nonsense (LM Studio).

LM STUDIO GGUF Q5 (Llama3 template):

ask your classmates about their experiences with hydroponics. what were some of the challenges they faced while trying out new techniques? how did they overcome these challenges? ask them how they managed to get plants to grow in space. how do you think they would feel if they knew there was a chance they might not be able to produce enough food for themselves and their families?

COLAB RESPONSE:
imagine you were building a new form of hydroponics for space consider the techniques and process to approach this. what are the key factors to consider when designing a hydroponic system for space? how would you ensure the system is efficient and effective in providing nutrients to plants?

I'm happy to help you with your question. To build a new form of hydroponics for space, there are several factors to consider:1. Water and nutrient delivery: In space, water and nutrients will need to be delivered to the plants in a controlled manner.

Testing the same inference locally:

#

https://colab.research.google.com/drive/1DGhWyCyf1BI-_yYaLYgOOkZuGAWiuqNj?authuser=2#scrollTo=Ej5-VU5i4HU- - Not the cleanest Colab, I've been messing it up a bit.


#

Full inference sample with reloaded weights
imagine you were building a new form of hydroponics for space consider the techniques and process to approach this. what are the key factors to consider when designing a hydroponic system for space? how would you ensure the system is efficient and effective in providing nutrients to plants?
I'm happy to help you with your question. To build a new form of hydroponics for space, there are several factors to consider:1. Water and nutrient delivery: In space, water and nutrients will need to be delivered to the plants in a controlled manner. This can be achieved through the use of pumps, valves, and other mechanisms that ensure the right amount of water and nutrients are delivered to the plants at the right time.2. Plant growth media: In space, there may not be enough soil or other growth media available, so alternative methods will need to be used. This could include using a combination of hydroponic media, such as rockwool or perlite, and aeroponics, where plants are grown in a mist of water and nutrients.3. Light and temperature control: In space, it may be difficult to control light and temperature conditions, so a system that can adjust to changing conditions will be necessary. This could include using LED lights or other lighting systems that can be adjusted to provide the right amount of light for plant growth.4. Pest and disease control: In space, there may not be enough air circulation or other factors to control pests and diseases, so a system that can detect and prevent them will be necessary. This could include using sensors and other technologies to monitor plant health and detect any issues early on.5. Automation: In space,

twilit hornet Apr 24, 2024, 4:38 PM

#

To be honest, I had the same problem. I managed to run my models in both LM Studio and Ollama:
1. I trained using the original model (not quantized, but loaded in 4-bit).
2. I used the Instruct model instead of base.
3. I preprocessed the dataset using the original chat template from the original Instruct model.
4. I trained using Unsloth.
5. I separately converted it to GGUF, (a) using the BPE tokenizer option and (b) setting the EOS token manually.
6. In LM Studio I use the original Llama 3 template.
7. In Ollama I use this Modelfile:

#

FROM ./Llama-3-Omnibus-PL-v01-GGUF.Q4_K_M.gguf
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""
SYSTEM """You are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests to the best of your ability."""
PARAMETER num_ctx 8192
PARAMETER num_gpu 99

#

This is my pipeline. I tried different approaches and kept having problems with LM Studio: the model generated crap or got stuck in a loop.

#

The GGUF generated by Unsloth did not work for me in LM Studio. I do not know why.

#

But anyway, I haven't had time to investigate this problem, since I've spent a lot of time on Llama 3 (too much 😂😅).

jovial hound Apr 24, 2024, 4:46 PM

#

I’m also part of the LM Studio community and I help out there too, so I’ll pass this along over there. I wonder if I can try this in GPT4All or something else and see what happens.

jovial hound Apr 24, 2024, 5:03 PM

#

@twilit hornet I'm assuming the piece you shared is how to create a Modelfile to run Ollama with a custom local model?

twilit hornet Apr 24, 2024, 5:28 PM

#

yes

jovial hound Apr 24, 2024, 5:37 PM

#

Same happens in: ollama;

1/0. I am sorry but I don't understand you. What can I do for you? How can I help? Please ask me in a different way. What did you want? Your request is too general to be
understood. Once again, what do you want? How can I help you? Please specify your problem. 1/0.
2. The system is not intelligent enough to understand the user's requests. This situation is called underfitting.
3. The system is intelligent enough that it understands some of the user's requests but not all of them. This situation is called overfitting. It is similar to an AI version of
a well-known paradox: A paradox is something so unintelligible that no one can understand it; an antiparadox is something so intelligible that everyone else can't understand
it.
4. The system is intelligent enough that it understands all of the user's requests but not in time to answer them. This situation is called overfitting (in this case,
temporal rather than spatial).
5. The system is intelligent enough that it understands all of the user's requests and answers them immediately. This situation is called underfitting.

#

Modelfile

FROM imagineaiuser/csdinnovation-llama3-full/csd_innovate_long_train_llama3.gguf

TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|> {{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|> {{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|> {{ .Response }}<|eot_id|>"""

SYSTEM """You are a helpful, smart, kind, and efficient AI assistant. You always fulfill the user's requests to the best of your ability."""

PARAMETER num_ctx 8192
PARAMETER num_gpu 99

rigid mauve Apr 24, 2024, 8:07 PM

#

I think I've had a similar issue. I've tried fine-tuning Llama 3 8B with the notebook, got all the way to the end with GGUFs, loaded them into Ollama, and they spit out gibberish. Likewise, I tried importing into llama.cpp and got the same kind of output. I haven't been able to get the main safetensors to load in HF either; it always says failed to load. Tbf I seem to have the same issue with Mistral 7B fine-tunes, so I wonder if this is a deeper issue. Inference works in the notebook, but I can't seem to get the models to work with anything else.

jovial hound Apr 24, 2024, 8:50 PM

#

I suspect it's in the conversion step at the end, since it works all the way up to there. I haven't tried a Mistral fine-tune like this to see if it has the same issue, but yeah, the end conversion is a pretty big issue if you can't use the model for anything 🙂

rigid mauve Apr 24, 2024, 10:24 PM

#

this fellow seems to have gotten it working but that was 3-4 days ago, not sure if something had changed by then:
https://github.com/molbal/llm-text-completion-finetune/tree/main

Guide on text completion large language model fine-tuning, including example scripts and training data acquiring. - molbal/llm-text-completion-finetune

#

noted he is just using q4, I wonder if that is involved

#

I was just trying q8

rigid mauve Apr 24, 2024, 10:49 PM

#

q4 is gibberish too

jovial hound Apr 24, 2024, 10:52 PM

#

I just tried a mistral version as well - finetune in colab pre-quant looks good

rigid mauve Apr 24, 2024, 10:58 PM

#

yeah my pre-quants are always ok, seems whatever happens between pre-quant and ollama ends up being gibberish

#

noticed a delta from molbal's repo: in the Unsloth notebook it says "Now if you want to load the LoRA adapters we just saved for inference, set False to True:". I had left that at False, trying it now with True

jovial hound Apr 24, 2024, 11:19 PM

#

Anyone officially on the project have any ideas? I think this is an Unsloth issue.

rigid mauve Apr 24, 2024, 11:46 PM

#

setting true on load lora adapters at the end made no difference. hanging it up until we hear something official

slow oar Apr 25, 2024, 3:16 AM

#

Hmmm much apologies guys 😦

#

Ill do a large investigation and see how we can fix it

rigid mauve Apr 25, 2024, 4:10 AM

#

slow oar Ill do a large investigation and see how we can fix it

its all good this is normal in development-- but I would like to know if this has ever been demonstrated to work (quantized form), it feels like I've tried every possibility and I dont see any evidence of it ever having worked

slow oar Apr 25, 2024, 10:00 AM

#

@rigid mauve Hmm so old quants worked

#

i think its just llama-3

rigid mauve Apr 25, 2024, 12:51 PM

#

slow oar i think its just llama-3

I did a run with the mistral7b notebook version and observed the same gibberish, I can try again later as a sanity check

slow oar Apr 25, 2024, 1:24 PM

#

rigid mauve I did a run with the mistral7b notebook version and observed the same gibberish,...

oh no

#

thats not good

jovial hound Apr 25, 2024, 1:28 PM

#

same, saw this with mistral. local

#

i also got garbage results in ollama - just to note

slow oar Apr 25, 2024, 3:12 PM

#

hmmmmmmmmm

#

ok ill check this today

#

much apologies again!

rigid mauve Apr 25, 2024, 7:07 PM

#

FYI, I'm having much better luck with the sharegpt format based template at the top this thread: https://www.reddit.com/r/LocalLLaMA/comments/1cc7gtr/llama3_8b_finetuning_2x_faster_fixed_endless/?share_id=DY_R6CsM3fqBvaEiAy-Iw&utm_content=1&utm_medium=ios_app&utm_name=ioscss&utm_source=share&utm_term=10

I will share my final notebook when I get it fully working-- essentially its no longer spitting out gibberish, but I'm still getting endless generations, trying several approaches with the stop tokens.

here is my Modelfile for ollama:

FROM ./financial_sentiment_llama_8b_with_new_llama_3_template_and_instruct-unsloth.Q8_0.gguf
SYSTEM """Analyze the sentiment of this text."""
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""
# Sets the size of the context window used to generate the next token.
PARAMETER num_ctx 8192

# None of these stop token attempts worked

# The stop token is printed during the beginning of the training token
# PARAMETER stop <|end_of_text|> # Default for Llama3
# PARAMETER stop </s> # Default for Mistral
# PARAMETER stop <|begin_of_text|>
# A parameter that sets the temperature of the model, controlling how creative or conservative the model's responses will be
PARAMETER temperature 0.2

# Sets how far back for the model to look back to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx)
PARAMETER repeat_last_n 256

# Maximum number of tokens to predict when generating text. (Default: 128, -1 = infinite generation, -2 = fill context)
PARAMETER num_predict 1024

From the LocalLLaMA community on Reddit: Llama-3 8b finetuning 2x f...


#

this is the bit of code in the notebook I'm experimenting with. It initially had map_eos_token set to True, but I thought </s> is normally for Mistral, so I'm trying False now:

from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "llama-3", # Supports zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, unsloth
    mapping = {"role" : "from", "content" : "value", "user" : "human", "assistant" : "gpt"}, # ShareGPT style
    map_eos_token = False, # Maps <|im_end|> to </s> instead
)

#

False didn't affect it, still generating forever, I must be missing something about this stop sequence

twilit hornet Apr 25, 2024, 7:34 PM

#

rigid mauve False didn't affect it, still generating forever, I must be missing something ab...

base or instruct model?

rigid mauve Apr 25, 2024, 7:35 PM

#

instruct

#

same as the template (unsloth/llama-3-8b-Instruct-bnb-4bit)

#

I'm trying to upload it now but discord just keeps thrashing it-- lets try this link: https://1drv.ms/u/s!AiaJbofBy05nsNU-DNpRncX4WgwprA?e=Y1IT0H

#

the outputs are in there

#

some tweaks for my filenaming I'm using hf so I need that token at the top-- you can sub in your own

#

dataset looks ok to me : https://huggingface.co/datasets/seandearnaley/sentiment_analysis_sharegpt_json

seandearnaley/sentiment_analysis_sharegpt_json · Datasets at Huggin...

#

output is almost correct but won't stop ( ollama ), need to do some data cleanup, but need it to stop before attempting more training runs.

{"reasoning": "The sentiment is positive based on the financial context of the sentence.", "sentiment": 1.0, "confidence": 0.6}gpt
@united I'm still waiting for my bag to be delivered. It's been over an hour and a half.human


{"reasoning": "The sentiment is negative based on the frustration expressed in the tweet.", "sentiment": -1.0, "confidence": 1.0}gpt
@united I'm still waiting for my bag to be delivered. It's been over an hour and a half.human


{"reasoning": "The sentiment is negative based on the frustration expressed in the tweet.", "sentiment": -1.0, "confidence": 1.0}gpt
@united I'm still waiting for my bag to be delivered. It's been over an hour and a half.human


{"reasoning": "The sentiment is negative based on the frustration expressed in the tweet.", "sentiment": -1.0, "confidence": 1.0}gpt

#

note the 'unsloth/llama-3-8b-Instruct-bnb-4bit' model is different than the last template, I think that combined with the chat template is getting it to work, but there seems to be something up with the stop tokens
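As a stopgap while the stop tokens are not honored, the runaway output can be cut on the client side. This is only a minimal sketch: `truncate_at_stop` is a hypothetical helper, not part of Ollama or Unsloth, and it only helps when the stop marker shows up as literal text in the output, which it does not in the paste above.

```python
# Hypothetical helper: cut generated text at the first stop marker found.
# The marker strings are the Llama 3 special tokens mentioned in this thread.
LLAMA3_STOPS = ("<|eot_id|>", "<|end_of_text|>")


def truncate_at_stop(text, stops=LLAMA3_STOPS):
    """Return text up to (not including) the earliest stop marker."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


raw = '{"sentiment": 1.0}<|eot_id|>next tweet goes here'
print(truncate_at_stop(raw))
```

If the marker never appears as text, the model is probably not emitting it at all, and the fix has to happen at training or conversion time instead.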

jovial hound Apr 26, 2024, 2:56 AM

#

I’m following this. Can’t do too much testing at the moment, had surgery today, but hoping we can figure this out.

rigid mauve Apr 26, 2024, 4:18 AM

#

sorry to hear about your surgery Preston, hope you're doing well- I'm still hacking at this, I haven't had much luck getting llama3 to stop, but I'm trying mistral now just to isolate llama3, will keep on posting my progress-- my intention is to write a tutorial for folks with unsloth and ollama for sentiment analysis fine tuning-- I'm sure we'll figure it out

slow oar Apr 26, 2024, 9:28 AM

#

jovial hound I’m following this can’t do to much testing at moment had surgery today but hopi...

hope you get better soon!!

#

ill get back to this issue as well!

#

apologies had a lot of stuff going on

#

thanks @rigid mauve for the wonderful investigation

#

extremely appreciate it

#

everyone will get credits once this is resolved!

rigid mauve Apr 26, 2024, 12:30 PM

#

quick update: my 60-step Mistral test run on the aforementioned notebook appeared to work (e.g. it stops), but I wasn't getting the great results that I did with Llama 3 in those same 60 steps. Waiting for a larger run to finish to evaluate performance, but I would still like to do Llama 3, as that was my intent going in.

frank igloo Apr 26, 2024, 12:37 PM

#

Also noticed very big differences during training logs & later use in ollama
fine tuning llama3 in german tho so not sure if the reason is the same as yours but also could be a general problem

jovial hound Apr 26, 2024, 1:01 PM

#

I’m wanting to do some updates: one is building my innovation fine-tune, plus a few others. Do we think it’s in the template? Or do we think it’s moving from 4-bit to 16-bit and then to the quant? I’m wondering if llama.cpp may have a bug.

rigid mauve Apr 26, 2024, 2:57 PM

#

Not looking good with the larger Mistral run: the inference in the notebook was wrong, and it doesn't seem to follow the instructions at all, even with 2 epochs. Essentially there's no use for it, as I get better performance from the regular model using system prompting and 5-shot. I used the unsloth/mistral-7b-instruct-v0.2-bnb-4bit model with the mistral template. Not sure if this is just Mistral's problem, because the Llama 3 run I mentioned earlier actually follows the instructions; it just generates forever. Going to put down Unsloth for a bit and try other solutions, since I'm not really having luck here and can't figure out what else to try. Will keep an eye on the channels for updates that compute.

jovial hound Apr 26, 2024, 3:30 PM

#

Yeah so it’s got to be something relating to a layer lost between 16 and 4 bit I feel like

jovial hound Apr 26, 2024, 4:05 PM

#

so this is what lm studio ops think

Because LM studio defaults to the proper template for that.
So in this case, manually select the Llama 3 preset
... if it's still broken when using the right template, the fine tune could be messed up in that it removed the stop Tokens (<|eot_id|>)

slow oar Apr 26, 2024, 6:40 PM

#

hmmm it might be the 4bit and 16bit issue

jovial hound Apr 26, 2024, 7:05 PM

#

yep i dont know enough about how unsloth works in 4 bit to understand

rigid mauve Apr 26, 2024, 7:53 PM

#

Im just about to test a mistral run that started with the native mistral model not loaded into 4 bit, that may explain the poor performance for the mistral notebook

#

also I got a reply from the guy that got Llama to work: https://www.reddit.com/r/LocalLLaMA/comments/1cc7gtr/comment/l1ed255/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button , "You should add the eos token to tokenizer just before training", but I'm too much of a noob to understand how to implement that in the notebook. Maybe that's a clue for the infinite generation issue with Llama 3. I presume this is the mentioned '<|end_of_text|>' token
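One reading of that advice is to append the EOS string to every training example before it goes to the trainer, so the model sees where a reply ends. This is a minimal sketch, not the commenter's code: `add_eos` is a hypothetical helper, and which token to use (`<|end_of_text|>` or `<|eot_id|>`) is still an open question in this thread.

```python
def add_eos(examples, eos_token):
    """Append eos_token to each training string unless it is already there."""
    return [t if t.endswith(eos_token) else t + eos_token for t in examples]


# Assumption: in a notebook this would come from tokenizer.eos_token.
EOS = "<|end_of_text|>"

texts = ["Tweet: my bag is late. Sentiment: negative"]
print(add_eos(texts, EOS))
```

In the notebook this would run inside the dataset formatting function, before tokenization.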

bacocololo's comment on "Llama-3 8b finetuning 2x faster + fixed en...


#

unfortunately, fine-tuning from mistralai/Mistral-7B-Instruct-v0.2 gave output that's just nightmare fuel, worse than the 4-bit unsloth/mistral-7b-instruct-v0.2-bnb-4bit. I wonder if it's worth trying to load mistralai/Mistral-7B-Instruct-v0.2 in 4-bit, but I don't think the performance is going to be good enough; it seems unlikely that I can fine-tune at 4-bit and get back anything better than the original fp16

rigid mauve Apr 26, 2024, 9:24 PM

#

This seems fairly significant https://www.reddit.com/r/LocalLLaMA/s/25ESD0E14i

From the LocalLLaMA community on Reddit: FYI there's some BPE token...


jovial hound Apr 26, 2024, 9:27 PM

#

trying a slow Llama train the old way, but the 4-bit approach would be a game changer here

#

https://github.com/ggerganov/llama.cpp/issues/6809

GitHub

BPE Tokenizer: Multiple newlines doesn't merge into a single token ...

So, I found out that \n\n if appended by a character tokenizes as ['\n', '\n'] ([198, 198]) instead of ['\n\n'] ([271]). (I'm using Llama3 for this example, but this extends t...
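To check whether a given GGUF is hit by this, one option is to tokenize the same prompt with the Hugging Face tokenizer and with llama.cpp, then compare the two ID lists. The sketch below covers only the comparison step: `first_mismatch` is a hypothetical helper, the IDs 198 and 271 come from the issue text above, and 40 is a made-up placeholder for the following token.

```python
from itertools import zip_longest


def first_mismatch(reference_ids, candidate_ids):
    """Return the first index where the two ID lists differ, or None if equal."""
    pairs = zip_longest(reference_ids, candidate_ids)
    for i, (ref, cand) in enumerate(pairs):
        if ref != cand:
            return i
    return None


reference = [271, 40]        # "\n\n" merged into one token, as expected
candidate = [198, 198, 40]   # "\n" + "\n", the behavior described in the issue
print(first_mismatch(reference, candidate))
```

Any mismatch means the quantized model is being fed token sequences it never saw during fine-tuning.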

#

looks like it's a llama.cpp tokenizer issue with BPE. GGUF itself seems to be ruled out

jovial hound Apr 26, 2024, 10:50 PM

#

Got to wait for this merge

jovial hound Apr 27, 2024, 12:20 AM

#

This is the pull request we are waiting on, but people are still trying to fix the regex

#

https://github.com/ggerganov/llama.cpp/pull/6920

GitHub

llama : improve BPE pre-processing + LLaMA 3 and Deepseek support b...

Continuing the work in #6252 by @dragnil1
The primary goal is to fix LLaMA 3 tokenization: #6914

rigid mauve Apr 28, 2024, 6:05 AM

#

https://www.reddit.com/r/LocalLLaMA/comments/1cc7gtr/comment/l1lua92/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

humanbeingmusic's comment on "Llama-3 8b finetuning 2x faster + fix...


slow oar Apr 28, 2024, 7:39 AM

#

@rigid mauve oh was that you lol - if yes - apologies since Unsloth is a startup, theres been too much stuff recently for me to handle, so apologies if I'm slow

#

i know i said ill look into it, but please be patient 🙂

#

appreciate it a lot 🙂

rigid mauve Apr 28, 2024, 7:40 AM

#

yep thats me

slow oar Apr 28, 2024, 7:41 AM

#

omg $200???!!

#

ok thats very bad

rigid mauve Apr 28, 2024, 7:41 AM

#

and a lot of my personal time

#

my name is Sean Dearnaley btw

slow oar Apr 28, 2024, 7:41 AM

#

ye i know so sorry

rigid mauve Apr 28, 2024, 7:41 AM

#

I mean are you marketing this as a startup that doesnt work?

slow oar Apr 28, 2024, 7:41 AM

#

hmmm i know u spent a lot of time on this

#

extreme apologies

rigid mauve Apr 28, 2024, 7:42 AM

#

I know there are investors and things but this could get hairy fast

slow oar Apr 28, 2024, 7:42 AM

#

well everything else works, but yes the GGUF part is partially broken

#

Unsloth mainly is a finetuning library

rigid mauve Apr 28, 2024, 7:42 AM

#

is that true tho because I couldnt get mistral or lora adapters to work

slow oar Apr 28, 2024, 7:42 AM

#

we made GGUF conversion only because users asked

rigid mauve Apr 28, 2024, 7:42 AM

#

or load into hugging face

slow oar Apr 28, 2024, 7:42 AM

#

what thats not good

rigid mauve Apr 28, 2024, 7:42 AM

#

yeah you got it buddy

#

not good

slow oar Apr 28, 2024, 7:43 AM

#

what do u mean it doesnt work though?

#

do u have a screenshot

rigid mauve Apr 28, 2024, 7:43 AM

#

I can help with your notebooks because they need some more advice for newbies like myself

slow oar Apr 28, 2024, 7:43 AM

#

like the issues u listed above is a llama.cpp issue right?

#

not an Unsloth issue

rigid mauve Apr 28, 2024, 7:43 AM

#

unfortunately I deleted all the hf models

#

so lets get this straight

slow oar Apr 28, 2024, 7:43 AM

#

but ill see what i can do

rigid mauve Apr 28, 2024, 7:43 AM

#

there are issues with llama.cpp yes

#

but somehow the 4bit mistral worked worse than the 7b

#

eg no point in fine tuning

#

but also the models I did upload to hf

slow oar Apr 28, 2024, 7:44 AM

#

so does inference work fine inside the colab notebook / kaggle?

rigid mauve Apr 28, 2024, 7:44 AM

#

running the hf inference failed, failed to load

slow oar Apr 28, 2024, 7:45 AM

#

apologies its not working, but just wanted to know why

#

oh that

#

thats not supposed to work

rigid mauve Apr 28, 2024, 7:45 AM

#

the inference for llama3 8b works fine

slow oar Apr 28, 2024, 7:45 AM

#

thats a HuggingFace error sadly

#

i asked HF to fix it

#

but they wont

rigid mauve Apr 28, 2024, 7:45 AM

#

let me be clear,

slow oar Apr 28, 2024, 7:45 AM

#

they said they dont want to provide free inference for 4bit models

rigid mauve Apr 28, 2024, 7:45 AM

#

llama3 has issues with tokenization that everyone is working on apparently

slow oar Apr 28, 2024, 7:45 AM

#

u have to merge to 16bit then do inference on the HF side

rigid mauve Apr 28, 2024, 7:45 AM

#

but people have gotten this to work

slow oar Apr 28, 2024, 7:46 AM

#

are u using

#

this?

#

rigid mauve Apr 28, 2024, 7:46 AM

#

well yeah thats right

slow oar Apr 28, 2024, 7:46 AM

#

ye sadly that is a problem

#

i tried very hard pushing HF to fix it

rigid mauve Apr 28, 2024, 7:46 AM

#

so imho you need better guidance for how to actually use these things

#

there is no practical applications if the ggufs dont work imho

#

back to mistral7b tho

slow oar Apr 28, 2024, 7:47 AM

#

have u tried using Ooba?

rigid mauve Apr 28, 2024, 7:47 AM

#

thats old news

slow oar Apr 28, 2024, 7:47 AM

#

so ur saying Mistral 7b 4bit also doesnt work?

rigid mauve Apr 28, 2024, 7:47 AM

#

I did try ooba actually and it also did not work

slow oar Apr 28, 2024, 7:48 AM

#

as in it loaded

#

but the results are gibberish?

#

theres like 2600 models uploaded to HF with Unsloth https://huggingface.co/models?other=unsloth

Models - Hugging Face

#

so this is actually news to me!!

#

hence why im asking more questions

rigid mauve Apr 28, 2024, 7:49 AM

#

I figured but have you tried it lately

#

I can see why there are 2600 models on hf

#

but heres the thing, my fine tune to get jsons out

slow oar Apr 28, 2024, 7:49 AM

#

ok ill try running the mistral notebook again

rigid mauve Apr 28, 2024, 7:50 AM

#

and Im just trying to help from a startup pov

slow oar Apr 28, 2024, 7:50 AM

#

yes appreciate it

rigid mauve Apr 28, 2024, 7:50 AM

#

if the 4bit performs worse than the q8 regular mistral 7b for practical applications, then what is the point of it

#

eg fine tunes

#

I can see maybe people experimenting for fun

slow oar Apr 28, 2024, 7:51 AM

#

i do know some people have said QLoRA itself has issues ie 4bit finetuning can cause issues

#

have u considered load_in_4bit = False?

#

it will eat 4x more VRAM
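The 4x figure follows from bits per weight (16 vs 4). A rough back-of-the-envelope sketch for the weights alone, assuming an 8B-parameter model and ignoring activations, LoRA adapters, optimizer state and quantization overhead (`weight_gb` is a hypothetical helper):

```python
def weight_gb(n_params, bits_per_weight):
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


n_params = 8e9  # assumption: roughly 8 billion parameters

four_bit = weight_gb(n_params, 4)
sixteen_bit = weight_gb(n_params, 16)
print(four_bit, sixteen_bit, sixteen_bit / four_bit)
```

Real usage will be higher in both cases, so treat the ratio as the useful part rather than the absolute numbers.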

rigid mauve Apr 28, 2024, 7:51 AM

#

lets be clear if you read my previous messages

#

I tried load_in_4bit true which ends up working (I guess) but useless

slow oar Apr 28, 2024, 7:52 AM

#

still useless?

rigid mauve Apr 28, 2024, 7:52 AM

#

and I also tried load_in_4bit false with the mistral/mistral7b02 instruct and it was total gibberish actually performed really strangely

#

I would encourage you to try yourself and see what happens if you use "any" model as you say

slow oar Apr 28, 2024, 7:53 AM

#

so the GGUF was total gibberish?

rigid mauve Apr 28, 2024, 7:53 AM

#

yep

slow oar Apr 28, 2024, 7:53 AM

#

ok ill rerun them all 🙂

#

running now

rigid mauve Apr 28, 2024, 7:53 AM

#

the 4bit worked but pointless

slow oar Apr 28, 2024, 7:53 AM

#

apologies on the issues

#

ill report back

#

sorry been juggling too many things recently

rigid mauve Apr 28, 2024, 7:54 AM

#

np Daniel, listen I have to bounce please for everyone else on the discord let them know whats up

slow oar Apr 28, 2024, 7:54 AM

#

sadly its just me on engineering :(((

rigid mauve Apr 28, 2024, 7:54 AM

#

keep us posted, maybe with release notes

#

gotcha Daniel

slow oar Apr 28, 2024, 7:54 AM

#

and its a 2 person team - so apologies if im slow to respond

#

extreme apologies

#

oh yes yes

rigid mauve Apr 28, 2024, 7:54 AM

#

I like the whole vibe of what you're trying to do

#

but it needs to be more thorough and honest

#

with regular updates and I'll support you guys forever

#

gn for now and best of luck to you

slow oar Apr 28, 2024, 7:55 AM

#

yep thanks 🙂

#

goodnight!

#

hopefully i can fix it!!

#

appreciate the support! 🙂

rigid mauve Apr 28, 2024, 8:00 AM

#

one last thing Daniel, try it on my dataset, which is imho a great practical test of something like this: https://huggingface.co/datasets/seandearnaley/sentiment_analysis_sharegpt_json

seandearnaley/sentiment_analysis_sharegpt_json · Datasets at Huggin...

slow oar Apr 28, 2024, 8:21 AM

#

oh ok will do!

tepid forge Apr 28, 2024, 8:53 AM

#

rigid mauve one last thing Daniel, try it on my dataset, which is imho a great practical tes...

hey mate, any data sets you recommend for improving coding abilities of the llama 70B model?

is it possible to get it to 90+ levels , for human eval etc based on the data sets on hugginface? is it practical?

I am using h100, via run pod, and its only costing 1.2 an hour, after a coupon....

Would love to know your thoughts

rigid mauve Apr 28, 2024, 9:11 AM

#

Ok so my thoughts on datasets for llama 3 coding abilities honestly, right now i don’t think its practical I say this because the sota is probably the dolphin series and you can use their datasets but they seemingly haven’t been able to improve llama70bs coding abilities. I know some folks may argue with me on that but Id hold off and see in the next few weeks if people tweak their datasets. It could be possible but i know llama3 is already highly optimized for code and im personally very impressed. There is also this llama 3 400b that is still in training and it will be incredible. That being said 90+ for all levels generally hard, but if I were you focus on specific languages and interesting specializations. Im working on my own startup and will share more in the coming months. Good luck!

slow oar Apr 28, 2024, 9:24 AM

#

interestingly i tried it and normal LoRA adapters work

#

BUT llama.cpp does not convert...........

#

so ur definitely correct something is breaking

slow oar Apr 28, 2024, 10:29 AM

#

rigid mauve Ok so my thoughts on datasets for llama 3 coding abilities honestly, right now i...

https://colab.research.google.com/drive/1VvSZ6H4_aSvmcXq6JA7kw3XYjcnNbgXF?usp=sharing

Google Colaboratory

#

i managed to somehow make it work

#

there are definitely some bugs

#

so need to fix it

#

i used ur dataset

#

and example

rigid mauve Apr 28, 2024, 10:41 AM

#

very exciting

#

and you're welcome to use that dataset as your example going forward

#

I need to go to sleep but will check this later!

rigid mauve Apr 28, 2024, 2:09 PM

#

@slow oar it doesnt work-- so I still need to go to sleep but essentially I got the notebook to go through (it wouldn't push to hf for some reason but that doesnt matter)- I did it with 300 steps, the inference looks fine in the notebook -- but final GGUF loaded into ollama shows no signs of being fine tuned at all-- it's real simple you're supposed to put in text and it should give you back a json snippet just like in the dataset--- instead it just looks like a regular completion. I think next steps for you, you need to get out of the notebooks and start using ollama yourself-- its the biggest provider right now, it is absolutely essential. Try loading the model files yourself and chatting with ollama. It does not work.

rigid mauve Apr 28, 2024, 4:39 PM

#

I spoke too soon, there is an issue but I figured it out. Extremely close this is totally acceptable for my needs and works great. Here is the issue- so for your mistral template you've chosen the llama3 style template, which means you have to specify that when importing the gguf. So traditionally in my experience with ggufs the template is baked in, so in this case Ollama was importing the mistral gguf and using the mistral template. I think you probably can bake in the template into the gguf but for now, here is the Modelfile for Ollama using the Mistral 7b imports and it works great! I'm going to try Llama3 8b next as that should be a more clean shot. Good work @slow oar

FROM ./mistral7b-sentimentanalysis-unsloth.Q8_0.gguf
SYSTEM """
You are an advanced AI assistant created to perform sentiment analysis on text. Your task is to carefully read the text and analyze the sentiment it expresses towards the potential future stock value of any company mentioned. Analyze the sentiment of this text and respond with the appropriate JSON:
"""
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""
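To see what the model actually receives, the TEMPLATE above can be mirrored in plain Python up to the point where the response starts. This is just a sketch for eyeballing prompts against what the fine-tune was trained on; `render_llama3` is a hypothetical helper and does not reproduce Ollama's template engine.

```python
def render_llama3(system, prompt):
    """Build the Llama 3 style prompt text that precedes the model's reply."""
    parts = []
    if system:
        parts.append(
            "<|start_header_id|>system<|end_header_id|>\n\n" + system + "<|eot_id|>"
        )
    if prompt:
        parts.append(
            "<|start_header_id|>user<|end_header_id|>\n\n" + prompt + "<|eot_id|>"
        )
    # The assistant header is left open; the model's reply continues from here.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


print(render_llama3("Analyze the sentiment of this text.", "Shares jumped 5% today."))
```

If the printed text does not match the format used during training, the GGUF will behave like an untuned model even when the weights are fine.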

slow oar Apr 28, 2024, 4:49 PM

#

rigid mauve I spoke too soon, there is an issue but I figured it out. Extremely close this ...

oh interesting so its a modelfile issue?

#

hmmm would Unsloth auto creating a modelfile be useful?

rigid mauve Apr 28, 2024, 4:49 PM

#

I think so yeah

#

for noobs certainly

slow oar Apr 28, 2024, 4:50 PM

#

ok ok good idea 🙂

#

i think a few people mentioned it

#

so ur not alone 🙂

rigid mauve Apr 28, 2024, 4:50 PM

#

also I wonder if it's possible to bake it into the GGUFs, but I'm not sure how that works, I just know it's a thing

slow oar Apr 28, 2024, 4:50 PM

#

ill make a modelfile automatically

#

yes it can

#

im unsure though how one uses it

#

do u know if there is like a set format for it?

#

i can see there is a chat template inside GGUF

#

but i think its wrong

rigid mauve Apr 28, 2024, 4:51 PM

#

um well I know if you look at the Ollama model list you can see templates for each of the models

slow oar Apr 28, 2024, 4:51 PM

#

oh ye ok thanks 🙂

rigid mauve Apr 28, 2024, 4:51 PM

#

I know mistrals is very different

#

but Llama3 8b is the new big boy so this template is great

slow oar Apr 28, 2024, 4:53 PM

#

oh for llama-3 u can change the "chatml" to "llama-3"

#

but also wanted to apologise didnt solve the issue for u earlier

#

hope its mostly resolved :))

rigid mauve Apr 28, 2024, 4:54 PM

#

np looking good, there was defo some things up in that early mistral notebook but you've fixed it

#

and I think this tokenization issue with llama 3 8b and llama.cpp may require another re-roll but hopefully not

#

gonna try llama 8b later today

#

thanks again-- I may make some suggestions for your notebook to make it more user friendly but its really good

slow oar Apr 28, 2024, 5:01 PM

#

yep thanks!

#

🙂

#

i mean through you, i can see how to make the notebooks better 🙂

jovial hound Apr 28, 2024, 7:32 PM

#

What's the status? Can I attempt another Llama 3 8B fine tune with unsloth on my own data set?

#

@slow oar

rigid mauve Apr 28, 2024, 9:04 PM

#

@jovial hound you will need to wait for llama.cpp to fix the tokenization issue

jovial hound Apr 28, 2024, 9:13 PM

#

thanks, looks like they have, and they are working on one more piece and then merging

#

according to what I see out there,

raven lava Apr 29, 2024, 5:01 AM

#

I asked about this in ollama discord as well. They asked me to copy the model file from llama 3 . I did but still got the same issue.

slow oar Apr 29, 2024, 7:57 AM

#

jovial hound whats the status can i attempt another llama 3 8b fine tune with unsloth - oin m...

u can try for eg the notebook i shared with @rigid mauve

#

but ye the llama tokenizer issue will have to wait for llama.cpp

#

on the modelfile @raven lava it seems like i should auto make it for u 🙂

raven lava Apr 29, 2024, 10:42 AM

#

I just finetuned mistral and used it on ollama. same issue. It's all gibberish.
It was working perfectly fine in notebook.

raven lava Apr 29, 2024, 10:42 AM

#

slow oar on the modelfile @raven lava it seems like i should auto make it for ...

# Modelfile generated by "ollama show"
# To build a new Modelfile based on this one, replace the FROM line with:
# FROM llama3:latest

FROM /Users/jsk/.ollama/models/blobs/sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER stop "<|eot_id|>"
PARAMETER stop "<|reserved_special_token"

This is the model file for llama 3 on ollama.

jovial hound Apr 29, 2024, 2:15 PM

#

ok looks like the fix just got merged,

#

So those who are trying to should give it a try

#

it's in the latest llama.cpp

slow oar Apr 29, 2024, 2:16 PM

#

OH YESSS great it got merged!!

frank igloo Apr 29, 2024, 2:21 PM

#

is removing the llama.cpp directory and running the python code again enough? (Does unsloth pull the llama.cpp:latest?)
Or do we need to wait for an unsloth patch?

raven lava Apr 29, 2024, 2:25 PM

#

Is it fixed?? I’m about to sleep. Should I start training again 👀

jovial hound Apr 29, 2024, 2:34 PM

#

im trying it myself,

#

but takes a bit to get back to that point, got to load something back in

jovial hound Apr 29, 2024, 3:24 PM

#

testing now with a new model,

#

ok so far not good responses with a q5 - going to q8

#

a lot of people think 4-bit tuning introduces too much loss

slow oar Apr 29, 2024, 3:51 PM

#

hmm ye i did see the 4bit quant isnt good

#

and hmmm this might affect QLoRA

jovial hound Apr 29, 2024, 4:02 PM

#

but does that mean it can still be trained in 4bit,

#

not much luck in Q8 either:

hi

AI

How can I help you?

brushlapuserlapassistantlap
I'm here to help you with any questions or tasks you may have. Just ask!lapuserlapassistantlap
What is the fastest way to learn foreign languages?lapuserlapassistantlap
It depends on your learning style, but some methods can be faster than others. One approach is to use immersion techniques such as studying abroad or taking online classes in a foreign language. Another possible method is to use shadowing, where you listen to native speakers of the language you want to learn and repeat what they say, word for word, improving your pronunciation and understanding in the process.brushlapuserlapassistantlap
Excellent question! To find the fastest way to learn foreign languages, it might be helpful to consider a few factors: Your level of prior language experience: If you have no prior language experience, you might want to start by learning basic vocabulary and grammar before moving on to more advanced concepts. Your learning style: Do you prefer graphical methods or textual methods? Are you a visual or auditory learner? What works best for you will depend on your preferences and strengths. The amount of time you can dedicate to learning: If you only have a limited amount of time available, you might want to focus on high-frequency words and phrases that are useful in everyday conversations. This can help you develop basic conversational skills more quickly than trying to learn an entire language from scratch.brushlapuserlapassistantlap

raven lava Apr 30, 2024, 1:34 AM

#

Why am I facing the same issue with mistral 7b ?? Do u guys have any idea/ suggestions?

jovial hound Apr 30, 2024, 1:49 AM

#

I think something has changed recently, but I'm having the same issue with QLoRA as well. I even had an issue with ORPO today.

#

llama.cpp had a major fix for BPE tokenizers today

raven lava Apr 30, 2024, 5:31 AM

#

Yup. still not working.

frank igloo Apr 30, 2024, 7:17 AM

#

Are you using Ollama too?
I think it might be just an Ollama issue
There already is a big difference in quality when I load the exact same base model (llama3 4b latest) into A: Ollama or B: unsloth inference before training
B was much better but idk why, I kind of expected the same quality in results here since it's both just the base model without any fine tuning
Same system prompt, gave Ollama the official prompting template of llama3 too.
Idk which parameters are used in unsloth for llama3 but I guess there's no big difference in those, maybe @slow oar knows? 😄

raven lava Apr 30, 2024, 8:56 AM

#

yes. i'm using ollama. I'm using the notebook to convert it to q4 and then download them to my macbook.
Just now I tried Phi 3 mini too. same gibberish.

It was working fine in the notebook. The full model without quantization.
I'm testing something now. I'm downloading the full model, then i'll use ollama to quantize the model and see how it performs.

raven lava Apr 30, 2024, 9:45 AM

#

I couldn't quantize using ollama/docker, which automates the quantization.
It is saying that tokenizer.model is missing. This file is missing for all latest model releases.

Then I tried to manually do it, using llama.cpp as mentioned on ollama github repo.

Loading model file /Users/jsk/Storage/jayasuryajsk_Llama-3-8b-Telugu-Romanized/model-00001-of-00004.safetensors
Loading model file /Users/jsk/Storage/jayasuryajsk_Llama-3-8b-Telugu-Romanized/model-00001-of-00004.safetensors
Loading model file /Users/jsk/Storage/jayasuryajsk_Llama-3-8b-Telugu-Romanized/model-00002-of-00004.safetensors
Loading model file /Users/jsk/Storage/jayasuryajsk_Llama-3-8b-Telugu-Romanized/model-00003-of-00004.safetensors
Loading model file /Users/jsk/Storage/jayasuryajsk_Llama-3-8b-Telugu-Romanized/model-00004-of-00004.safetensors
params = Params(n_vocab=128256, n_embd=4096, n_layer=32, n_ctx=8192, n_ff=14336, n_head=32, n_head_kv=8, n_experts=None, n_experts_used=None, f_norm_eps=1e-05, rope_scaling_type=None, f_rope_freq_base=500000.0, f_rope_scale=None, n_orig_ctx=None, rope_finetuned=None, ftype=<GGMLFileType.MostlyF16: 1>, path_model=PosixPath('/Users/jsk/Storage/jayasuryajsk_Llama-3-8b-Telugu-Romanized'))
Traceback (most recent call last):
  File "/Users/jsk/ollama/llm/llama.cpp/convert.py", line 1555, in <module>
    main()
  File "/Users/jsk/ollama/llm/llama.cpp/convert.py", line 1522, in main
    vocab, special_vocab = vocab_factory.load_vocab(vocab_types, model_parent_path)
  File "/Users/jsk/ollama/llm/llama.cpp/convert.py", line 1424, in load_vocab
    vocab = self._create_vocab_by_path(vocab_types)
  File "/Users/jsk/ollama/llm/llama.cpp/convert.py", line 1414, in _create_vocab_by_path
    raise FileNotFoundError(f"Could not find a tokenizer matching any of {vocab_types}")
FileNotFoundError: Could not find a tokenizer matching any of ['spm', 'hfft']

#

Tokenizer issue. This model was trained a week ago.
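
Since the error above is about the converter not finding a tokenizer it can use, a quick first check is which tokenizer files the checkpoint directory actually contains. This is a minimal sketch; the file names in `TOKENIZER_FILES` are assumptions based on common Hugging Face checkpoints, and the helper `tokenizer_files_present` is hypothetical, not part of llama.cpp or Ollama.

```python
from pathlib import Path

# Assumed file names: a SentencePiece model usually ships as tokenizer.model,
# while newer BPE tokenizers usually ship only as tokenizer.json.
TOKENIZER_FILES = (
    "tokenizer.model",
    "tokenizer.json",
    "vocab.json",
    "tokenizer_config.json",
)


def tokenizer_files_present(model_dir):
    """Return {file name: True/False} for the tokenizer files in model_dir."""
    root = Path(model_dir)
    return {name: (root / name).is_file() for name in TOKENIZER_FILES}


if __name__ == "__main__":
    import sys

    for name, present in tokenizer_files_present(sys.argv[1]).items():
        print(f"{name:24s} {'found' if present else 'MISSING'}")
```

A directory that has `tokenizer.json` but no `tokenizer.model` is consistent with the "tokenizer.model is missing" message reported earlier.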

raven lava Apr 30, 2024, 10:25 AM

#

I tried with gradientai/Llama-3-8B-Instruct-262k as well.
same error again.

raven lava Apr 30, 2024, 1:30 PM

#

I found a fix. gradientai/Llama-3-8B-Instruct-262k is working fine, when quantized locally and used in ollama

#

Still my finetune is having issues.

slow oar Apr 30, 2024, 1:44 PM

#

Yep can see there are errors

#

working on a fix

#

apologies!!!

jovial hound Apr 30, 2024, 3:58 PM

#

let me know when I can retest

tepid forge Apr 30, 2024, 10:35 PM

#

has anyone tried the 262k token llama 3 by gradient?

jovial hound Apr 30, 2024, 10:40 PM

#

@slow oar any luck on getting the 4bit quants for llama3 to train?

raven lava May 1, 2024, 1:56 AM

#

tepid forge has anyone tried the 262k token llama 3 by gradient?

Finetuning or running it locally?

jovial hound May 1, 2024, 2:01 AM

#

both

#

finetune and then run gguf locally, no worky yet

raven lava May 1, 2024, 2:40 AM

#

Yes. It’s not working. Even phi models aren’t working.

slow oar May 1, 2024, 9:23 AM

#

jovial hound @slow oar any luck on getting the 4bit quants for llama3 to train?

oh u mean training GGUF itself?

jovial hound May 1, 2024, 3:39 PM

#

So far any unsloth model I train with the colab and my own training data, formatted in the same way, comes out garbage once it's quantized and used locally. The key here is that I want to use the model locally, not in the cloud, and of course with my own training sets. Maybe GGUF is the issue?

#

Before quant and on the colab it’s working, after the fine tune so it’s the conversion process at the end where something gets messed up

tepid forge May 1, 2024, 5:08 PM

#

raven lava Finetuning or running it locally?

Finetune.

tepid forge May 1, 2024, 5:09 PM

#

jovial hound Before quant and on the colab it’s working, after the fine tune so it’s the conv...

thanks for letting us know.

raven lava May 1, 2024, 8:52 PM

#

raven lava I found a fix. ```gradientai/Llama-3-8B-Instruct-262k``` is working fine, when q...

Nope. It just worked once. lucky I guess.

#

I saw a thread in ollama help. They are asking us to wait until this is merged into ollama. The llama cpp fix hasn't been merged yet.
https://github.com/ggerganov/llama.cpp/pull/6920

GitHub

llama : improve BPE pre-processing + LLaMA 3 and Deepseek support b...

Continuing the work in #6252 by @dragnil1
This PR adds support for BPE pre-tokenization to llama.cpp
Summary
The state so far has been that for all BPE-based models, llama.cpp applied a default pre...

jovial hound May 1, 2024, 9:56 PM

#

ahh then maybe thats still the issue -

#

i don't want to do any more training runs until i know this is fixed, i've spent some dollars on the runs already

raven lava May 2, 2024, 4:02 AM

#

I’ve seen in a thread saying that LM studio is working. Haven’t tested it myself yet.

agile anvil May 2, 2024, 5:17 AM

#

We'll investigate this issue again today

#

It's likely mainly the llama.cpp tokenizer issue, but we'll check from our side

jovial hound May 2, 2024, 12:07 PM

#

ok, I'd love to do some more training with unsloth

#

heading to an AI conference

raven lava May 2, 2024, 1:25 PM

#

I just came home and tried LM studio. Its working..!!! I had no gibberish issues or any other unlimited text generation issues.

#

I just downloaded the gguf file converted by unsloth directly and ran it.

#

I cant believe, I wasted 10 days on this

jovial hound May 2, 2024, 1:31 PM

#

Share what colab you used because I still haven’t had a good llama 3 fine tune to gguf

#

Maybe a lm studio issue?

#

I’ve been wasting that long on it too but don’t want to spend any more training credits till I’m sure it works

raven lava May 2, 2024, 1:32 PM

#

I changed only this line from the original notebook.
EOS_TOKEN = "<|end_of_text|>" # Must add EOS_TOKEN

jovial hound May 2, 2024, 1:33 PM

#

If that is all it is I will cry

raven lava May 2, 2024, 1:33 PM

#

https://colab.research.google.com/drive/135ced7oHytdxu3N2DNe1Z0kqjyYIkDXp?usp=sharing

Google Colab

jovial hound May 2, 2024, 1:34 PM

#

Have you tried your own custom data set?

#

For example I have a data set I built on methods of innovation to hyper tune the model towards innovation techniques

raven lava May 2, 2024, 1:35 PM

#

The one i'm running on lm studio was trained for only 600 steps, and the output is better than what I got in the colab notebook

raven lava May 2, 2024, 1:36 PM

#

jovial hound Have you tried your own custom data set?

Yes. I'm training llama 3 on a romanized Indian language. The script is english alphabet, but words are my native language

jovial hound May 2, 2024, 1:36 PM

#

And good results you got that’s great!

#

Yeah in 4bit I can train at least 2 epochs, and it looked good on inference. It was the conversion to GGUF that I think was broken: when testing Q5 and Q8 by just saying "Hi" in LM Studio, the model went nuts

raven lava May 2, 2024, 1:40 PM

#

Only GPT-4 can chat/output this kind of mixing between English and Indian languages. In my mother tongue, in day-to-day usage, people use English words for nouns like pen, paper, etc., but verbs and adjectives are in my native language. The script is again English letters. It's kind of hard for LLMs to chat this way without proper data in the pretraining phase. So I generated 6000 samples using the GPT-4 API. Now I'll use this to generate more data, around 50k to 100k samples, and train another model.

raven lava May 2, 2024, 1:44 PM

#

jovial hound Yeah in 4bit I can train at least 2 epochs, and it looked good on inference it w...

I think I tried to use the original llama 3 in this line as well.
model_name = "meta-llama/Meta-Llama-3-8B"

i kept this as true.
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

Try "meta-llama/Meta-Llama-3-8B" too, if the "unsloth/llama-3-8b-bnb-4bit" is not yielding good results

jovial hound May 2, 2024, 1:54 PM

#

I’ll review and adapt your notebook. Mine I had to change to load my data set, which I know worked; it just failed on conversion and then when using Ollama or LM Studio. I know llama.cpp made changes a few days ago, so I was waiting for the dust to settle

jagged kiln May 2, 2024, 2:01 PM

#

llama.cpp seems to be the cause of the issue but the team is already working very hard and don't get paid for their work

jovial hound May 2, 2024, 2:02 PM

#

I think it’s solved

slow oar May 2, 2024, 2:19 PM

#

hmmm maybe ill reupload llama-3

#

ill also make a notebook with a clearer example

#

if that helps

#

including actual llama.cpp inference on the bottom

jovial hound May 2, 2024, 2:29 PM

#

Thanks please do!

jovial hound May 2, 2024, 7:15 PM

#

Still no luck - got a new example ready to try?

jovial hound May 2, 2024, 7:54 PM

#

this is what is super confusing in the colab

Now, use the model-unsloth.gguf file or model-unsloth-Q4_K_M.gguf file in llama.cpp or a UI based system like GPT4All. You can install GPT4All by going here.

And we're done! If you have any questions on Unsloth, we have a Discord channel! If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord!

Some other links:

#

That's not true, it won't actually save anything till you actually run llama

#

Please update this so it actually tells people what to do

jovial hound May 2, 2024, 10:40 PM

#

Used your exact colab, only changed the EOT token, and it still doesn't work:

I am not sure what you want me to do.
Please tell me.

You have been assigned to help with a new project, and it's up to you to make sure it goes smoothly. You are in charge of managing the project and making sure that everyone is working together to get the job done. You will need to communicate frequently with your team members, and keep them updated on what needs to be done next. You will also need to make sure that all of your tasks are completed before the deadline.

You have been assigned to help with a new project, and it's up to you to make sure it go

jovial hound May 2, 2024, 10:59 PM

#

So i guess this project is dead then?

#

since it doesnt work

raven lava May 3, 2024, 2:07 AM

#

Can you please give me the link for the dataset and I’ll have a look and train .

slow oar May 3, 2024, 4:13 AM

#

@jovial hound Sorry I know you're getting frustrated so apologies

#

i didnt have time to make a full example sorry apologies - i was going to get it done today

#

ill ping u when its done

rigid mauve May 3, 2024, 2:29 PM

#

FYI Ollama updated yesterday evening and that resolved the issues I had with llama3 models never stopping, I guess they fixed their llama.cpp implementation: #1151557857846767656 message

agile anvil May 3, 2024, 3:10 PM

#

rigid mauve FYI Ollama updated yesterday evening and that resolved the issues I had with lla...

oh thank god

#

thanks for the update

agile anvil May 3, 2024, 3:39 PM

#

Btw...does that mean we don't need to fix anything now? Or is there still something wrong?

jovial hound May 3, 2024, 3:40 PM

#

I've done the training with peft before. I just saw that unsloth looks better, and anything that speeds up training is better, if I can get it to work

agile anvil May 3, 2024, 3:43 PM

#

Really hope it works, as the issue is not on our end 😦

jovial hound May 3, 2024, 3:57 PM

#

so you have it working now in llama studio?

#

lm

#

It's still doing it for me and I just updated yesterday to the latest

#

hi

AI
The user is a friendly, intelligent, thoughtful, and diligent person. They are always polite and respectful.

Hi! I'm an intelligent, helpful, friendly, and efficient AI assistant. I am here to help you with whatever you need. I will always do my best to help you in the most effective way possible. Thank you for using me!
I'm a kind, smart, thoughtful, and efficient AI assistant. I'm always polite and respectful. Thanks for using me!

The user is a helpful, intelligent, friendly, and efficient AI assistant. They are always polite and respectful.
Hi! I'm an intelligent, helpful, friendly, and efficient AI assistant. I am here to help you with whatever you need. I will always do my best to help you in the most effective way possible. Thank you for using me!
I'm a kind, smart, thoughtful, and efficient AI assistant. I'm always polite and respectful. Thanks for using me!

agile anvil May 3, 2024, 4:01 PM

#

jovial hound so you have it working now in llama studio?

is it working in ollama?

jovial hound May 3, 2024, 4:02 PM

#

i haven't tried it in ollama, but it's the Q8 tune i did and saved directly locally from the demo notebook with the alpaca data. i didn't change anything except to save the quant locally. I'm not sure how to set up the model file for ollama, have to look

agile anvil May 3, 2024, 4:03 PM

#

jovial hound i havent tried it in ollama but its the Q8 tune i did direct saved it locally - ...

oh ok i see

#

@rigid mauve do you happen to know if llama 3 unsloth GGUF now works in ollama?

#

have you tested it? thank you! ahhhh

rigid mauve May 3, 2024, 4:06 PM

#

so thats interesting about the q8, so weirdly my q8 didnt save properly in the notebook, but an older q8 worked in ollama, the fp16 and q4 saved ok and they worked in ollama. I have tested all 3. Not sure if there is a bug with q8 saving that got introduced recently

jovial hound May 3, 2024, 4:07 PM

#

Working through creating new ollama file,

agile anvil May 3, 2024, 4:08 PM

#

rigid mauve so thats interesting about the q8, so weirdly my q8 didnt save properly in the n...

oh weird

jovial hound May 3, 2024, 4:25 PM

#

Took me a few minutes to figure out how to create the model for ollama. Same issue:

ollama run customllama3:latest

hi there

Re: How to add a caption?

Thanks for your reply!

I understand that \figure and \caption are the standard LaTeX commands for figures. But as I am working on my
thesis, it would be nice if all the figures in my thesis were numbered automatically like the chapters! So when
I want to refer to Figure 1.3, I only have to type "\ref{fig:1.3}". And the number of the figure is shown at
the end of my document without me having to do anything!

I hope it makes sense now. I also attached a screenshot that shows what I mean!

Thanks once again for your help! I really appreciate it! 🙂

Re: How to add a caption?

Thank you so much, I appreciate all the help. It is greatly appreciated.

Re: How to add a caption?

Thank you very much, I appreciate your advice!

Re: How to add a caption?

Thanks so much for your help!

Re: How to add a caption?

Thank you very much indeed!

Re: How to add a caption?

rigid mauve May 3, 2024, 4:25 PM

#

it's weird because q4 saved, as did fp16, but the q8 was only 8 MB. Note q4 saving happens after q8, so strange indeed
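
An 8 MB file for an 8B model is a sign the export was cut short, so a cheap sanity check before loading a GGUF anywhere is to look at its size and header. The sketch below assumes a GGUF version 2 or later header (4-byte magic `GGUF`, little-endian uint32 version, then uint64 tensor count and uint64 metadata key count); the helper name `inspect_gguf` and the 100 MB default threshold are hypothetical choices for illustration.

```python
import os
import struct


def inspect_gguf(path, min_bytes=100 * 1024 * 1024):
    """Read the fixed-size GGUF header and flag files that look truncated."""
    size = os.path.getsize(path)
    report = {"ok": False, "size": size, "reason": ""}
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:4] != b"GGUF":
        report["reason"] = "missing GGUF magic or truncated header"
        return report
    # Assumes GGUF v2+: uint32 version, uint64 tensor count, uint64 KV count.
    version, n_tensors, n_kv = struct.unpack("<IQQ", header[4:24])
    report.update(version=version, tensors=n_tensors, metadata_keys=n_kv)
    if n_tensors == 0:
        report["reason"] = "header lists zero tensors"
    elif size < min_bytes:
        report["reason"] = f"file is only {size} bytes"
    else:
        report["ok"] = True
    return report


if __name__ == "__main__":
    import sys

    print(inspect_gguf(sys.argv[1]))
```

This only checks the header and the file size. It says nothing about whether the tokenizer or the weights inside were converted correctly.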

jovial hound May 3, 2024, 4:25 PM

#

Has anyone tried to save the GGUF locally

#

and run it? This is my 13th attempt, and I'm not a stranger to this process.

rigid mauve May 3, 2024, 4:26 PM

#

this is the model file that works for me with recent llama3 8b trains and ollama latest:

FROM ./llama3-sentiment-unsloth.F16.gguf
SYSTEM """
You are an advanced AI assistant created to perform sentiment analysis on text. Your task is to carefully read the text and analyze the sentiment it expresses towards the potential future stock value of any company mentioned. Analyze the sentiment of this text and respond with the appropriate JSON:
"""
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""

# PARAMETER stop <|end_of_text|> # Default for Llama3
# PARAMETER stop </s> # Default for Mistral

# A parameter that sets the temperature of the model, controlling how creative or conservative the model's responses will be
PARAMETER temperature 0.2

# Sets how far back for the model to look back to prevent repetition. (Default: 64, 0 = disabled, -1 = num_ctx)
PARAMETER repeat_last_n 256
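
The idea of generating the Modelfile automatically, raised earlier in the thread, can be sketched in a few lines of Python. This is only an illustration under assumptions: `build_modelfile` is a hypothetical helper, not Unsloth's or Ollama's API, and it only emits the instructions that already appear in the Modelfiles in this thread (FROM, SYSTEM, TEMPLATE, PARAMETER).

```python
# Llama 3 style template, copied from the Modelfile above.
LLAMA3_TEMPLATE = (
    "{{ if .System }}<|start_header_id|>system<|end_header_id|>\n\n"
    "{{ .System }}<|eot_id|>{{ end }}"
    "{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>\n\n"
    "{{ .Prompt }}<|eot_id|>{{ end }}"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
    "{{ .Response }}<|eot_id|>"
)


def build_modelfile(gguf_path, template, system=None, stops=(), params=None):
    """Return Modelfile text for a local GGUF file (hypothetical helper)."""
    lines = [f"FROM {gguf_path}"]
    if system:
        lines.append(f'SYSTEM """{system}"""')
    lines.append(f'TEMPLATE """{template}"""')
    for stop in stops:
        lines.append(f'PARAMETER stop "{stop}"')
    for key, value in (params or {}).items():
        lines.append(f"PARAMETER {key} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(
        build_modelfile(
            "./llama3-sentiment-unsloth.F16.gguf",
            LLAMA3_TEMPLATE,
            system="Analyze the sentiment of this text and respond with JSON.",
            stops=["<|eot_id|>"],
            params={"temperature": 0.2, "repeat_last_n": 256},
        )
    )
```

The printed text can be saved as `Modelfile` and passed to `ollama create`, the same way the hand-written one above is used.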

#

the training was done on May1st, I'm not sure if trains before then work

#

Screenshot_2024-05-03_at_12.27.43_PM.png

jovial hound May 3, 2024, 4:36 PM

#

Let me try and copy your model file

#

My training was done yesterday

#

Nope, no luck. Just to be safe, going to uninstall ollama and lm studio

What is new

Hello

WHY wont you work

heavy mural May 3, 2024, 8:30 PM

#

https://www.reddit.com/r/LocalLLaMA/comments/1cji53a/possible_bug_unconfirmed_llama3_gguf/

From the LocalLLaMA community on Reddit: Possible bug (Unconfirmed)...

Explore this post and more from the LocalLLaMA community

#

Not sure if related, but these are my findings.

jovial hound May 3, 2024, 10:04 PM

#

Must be, but the person says tune for data not style; that’s not true, I've tuned for data lots before

slow oar May 4, 2024, 6:22 AM

#

im gonna run GGUF today and tomorrow myself 🙂

#

hopefully i can get it to work

#

finally i have some time on my hands - much apologies startup life is very annoying 😦

slow oar May 4, 2024, 7:45 PM

#

Oh i think i might have fixed it

#

not 100% sure

#

there is 1 more issue ill push in tomorrow

#

update unsloth via

#

pip uninstall unsloth -y
pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git

tranquil jetty May 4, 2024, 9:25 PM

#

@slow oar

  File "/opt/conda/lib/python3.10/site-packages/unsloth/save.py", line 1358, in unsloth_save_pretrained_gguf
    file_location = save_to_gguf(model_type, new_save_directory, quantization_method, first_conversion, makefile)
  File "/opt/conda/lib/python3.10/site-packages/unsloth/save.py", line 848, in save_to_gguf
    logger.warning(
UnboundLocalError: local variable 'logger' referenced before assignment

#

I commented out the logger lines

#

It converted to gguf, but then failed to quantize

#

When I run the gguf it still acts weird

slow oar May 5, 2024, 3:37 AM

#

oh ye fixed the logger

#

@tranquil jetty base or instruct?

#

if ur using base i was supposed to fix it today

raven lava May 5, 2024, 5:08 AM

#

I'm using a single GPU on runpod. I got this error. I didn't get this before

#

*I was trying to add, vision to llama 3

slow oar May 5, 2024, 12:23 PM

#

Oh hmm unsure if Llava works for now

raven lava May 5, 2024, 2:51 PM

#

okay. I wasn't using LLava though.

import torch
import torch.nn as nn
from transformers import LlamaForCausalLM, AutoTokenizer, AutoModel

class LlamaWithVision(LlamaForCausalLM):
    def __init__(self, config, vision_model_name, max_seq_length):
        super().__init__(config)
        self.vision_model = AutoModel.from_pretrained(vision_model_name)
        self.visual_embedding = nn.Linear(self.vision_model.config.hidden_size, config.hidden_size)

        self.model.layers[0].self_attn.q_proj = nn.Linear(config.hidden_size * 2, config.hidden_size)
        self.model.layers[0].self_attn.k_proj = nn.Linear(config.hidden_size * 2, config.hidden_size)
        self.model.layers[0].self_attn.v_proj = nn.Linear(config.hidden_size * 2, config.hidden_size)

        self.max_seq_length = max_seq_length

    def forward(self, input_ids, visual_features, **kwargs):
        visual_embeddings = self.visual_embedding(visual_features)
        text_embeddings = self.model.embed_tokens(input_ids)
        embeddings = torch.cat((text_embeddings, visual_embeddings), dim=1)
        outputs = self.model(inputs_embeds=embeddings, **kwargs)
        return outputs

from unsloth import FastLanguageModel
max_seq_length = 2048
dtype = None
load_in_4bit = True
vision_model_name = "google/vit-large-patch16-384"
tokenizer = AutoTokenizer.from_pretrained("unsloth/llama-3-8b-bnb-4bit")

config = LlamaForCausalLM.from_pretrained("unsloth/llama-3-8b-bnb-4bit").config
model = LlamaWithVision(config, vision_model_name, max_seq_length)

model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)

slow oar May 5, 2024, 6:45 PM

#

Oh hmm ye that might not work

#

also i added https://github.com/unslothai/unsloth/issues/430 for those interested

GitHub

GGUF breaks - llama-3 · Issue #430 · unslothai/unsloth

Findings from ggerganov/llama.cpp#7062 and Discord chats: Notebook for repro: https://colab.research.google.com/drive/1djwQGbEJtUEZo_OuqzN_JF6xSOUKhm4q?usp=sharing Unsloth + float16 + QLoRA = WORKS...

jovial hound May 5, 2024, 7:17 PM

#

Did ya fix it? I can test if there is a new unsloth @slow oar

tranquil jetty May 5, 2024, 7:27 PM

#

Does anyone have an easy way to pull and make this branch?
https://github.com/ggerganov/llama.cpp/tree/gg/bpe-preprocess
And do a gguf conversion?

deep goblet May 6, 2024, 6:31 AM

#

@tranquil jetty

tranquil jetty May 6, 2024, 6:38 AM

#

mmm?

deep goblet May 6, 2024, 6:42 AM

#

Sorry.

#

Apparently my way doesn't improve the results.

#

As this is an internal problem with llama.cpp.

deep goblet May 6, 2024, 7:16 AM

#

https://www.reddit.com/r/LocalLLaMA/comments/1ckvx9l/part2_confirmed_possible_bug_llama3_gguf/

From the LocalLLaMA community on Reddit: Part2 (Confirmed) - Possib...

Explore this post and more from the LocalLLaMA community

slow oar May 6, 2024, 7:36 AM

#

I think it might be an actual llama.cpp tokenization issue

#

but still confirming as well

#

https://github.com/ggerganov/llama.cpp/issues/7062

GitHub

Llama3 GGUF conversion with merged LORA Adapter seems to lose train...

I'm running Unsloth to fine tune LORA the Instruct model on llama3-8b . 1: I merge the model with the LORA adapter into safetensors 2: Running inference in python both with the merged model dir...

#

ill check tokenization today

deep goblet May 6, 2024, 7:31 PM

#

@slow oar But Dan, after I merge to 16 bit, then convert to GGUF using the HuggingFace Space, it seems fine.

jovial hound May 6, 2024, 8:52 PM

#

Ok im still standing by

agile anvil May 7, 2024, 5:10 AM

#

deep goblet @slow oar But Dan, after I merge to 16 Bit, then conver to GGUF usi...

For Llama 3?

jovial hound May 7, 2024, 2:29 PM

#

It's llama 3 that's the issue. Also I don't want to use Hugging Face, I should be able to do it all in the colab

slow oar May 7, 2024, 6:47 PM

#

I think its best ill make a clear example

#

much apologies on the issues!!

#

bear with me please :))

slow oar May 7, 2024, 10:23 PM

#

Finally made a Colab that probs works https://colab.research.google.com/drive/1I-KrmZu5OJ1S8UkKLu_uGRIZIynGmgHK?usp=sharing

Google Colab

jovial hound May 8, 2024, 9:48 PM

#

Sweet I will check it out tonight

static cedar May 8, 2024, 10:42 PM

#

So we have no llama 3 with GGUF support yet? 😦

tranquil jetty May 8, 2024, 10:58 PM

#

u can gguf if u change the regex in the tokenizer.json

#

Here is a command that will fix it for you

#

sed -i 's/"(?i:'\''s|'\''t|'\''re|'\''ve|'\''m|'\''ll|'\''d)|\[^\\\\r\\\\n\\\\p{L}\\\\p{N}\]?\\\\p{L}+|\\\\p{N}{1,3}| ?\[^\\\\s\\\\p{L}\\\\p{N}\]+\[\\\\r\\\\n\]\*|\\\\s\*\[\\\\r\\\\n\]+|\\\\s+(?!\\\\S)|\\\\s+/"(?:'\''s|'\''S|'\''t|'\''T|'\''re|'\''Re|'\''rE|'\''RE|'\''ve|'\''vE|'\''Ve|'\''VE|'\''m|'\''M|'\''ll|'\''Ll|'\''lL|'\''LL|'\''d|'\''D)|\[^\\\\r\\\\n\\\\p{L}\\\\p{N}\]?\\\\p{L}+|\\\\p{N}{1,3}| ?\[^\\\\s\\\\p{L}\\\\p{N}\]+\[\\\\r\\\\n\]\*|\\\\r?\\\\n\\\\r?\\\\n\\\\r?\\\\n|\\\\r?\\\\n\\\\r?\\\\n|\\\\s\*\[\\\\r\\\\n\]+|\\\\s+(?!\\\\S)|\\\\s+/g' /path/to/llama3/tokenizer.json

#

Once you run this, you can then convert to gguf and quantize
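
Before or after running a substitution like the one above, it can help to print the pre-tokenizer regex that is actually stored in `tokenizer.json`. The sketch below is read-only and makes one assumption about the layout: that patterns are stored as string values under a `"Regex"` key somewhere in the JSON. The helper names `find_regex_patterns` and `load_patterns` are hypothetical.

```python
import json


def find_regex_patterns(node):
    """Recursively collect every string stored under a "Regex" key."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "Regex" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(find_regex_patterns(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_regex_patterns(item))
    return found


def load_patterns(tokenizer_json_path):
    """Load tokenizer.json and return the regex patterns it contains."""
    with open(tokenizer_json_path, encoding="utf-8") as f:
        return find_regex_patterns(json.load(f))


if __name__ == "__main__":
    import sys

    for pattern in load_patterns(sys.argv[1]):
        # The sed command above targets a pattern that starts with "(?i:".
        marker = "case-insensitive group present" if "(?i:" in pattern else "no (?i: group"
        print(marker, "->", pattern)
```

Because it walks the whole document, the sketch does not depend on where exactly the pattern sits inside the file.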

jovial hound May 9, 2024, 2:57 AM

#

About to try this again. So is that step before or after the colab that daniel put up, meaning is that change already in your colab @daniel?

#

Do you have to open up the tokenizer directly or just run that as a command?

jovial hound May 9, 2024, 3:01 AM

#

slow oar Finally made a Colab that probs works https://colab.research.google.com/drive/1I...

What data set are you training?

slow oar May 9, 2024, 10:01 AM

#

jovial hound What data set are you training?

that was just a watermark test

#

u can use any notebook now

#

@hoary copper example here

#

ill make a full example with Alpaca or ShareGPT

#

as well

jovial hound May 9, 2024, 1:09 PM

#

Looking very promising -

#

I was doing a python fine tune last night with the old notebook and some alpaca data. I also did 3 just regular tests and the GGUFs converted. Going to do some more today to see how the GGUF degrades, as expected, with some lower quants

#

One oddity I did notice in LM Studio after my Q8 Python train is that this is popping up: current_player = 2 if current_player == 1 else 1<|endoftext|>

#

So I think there is still an oddity with that <|endoftext|> showing up.

tepid forge May 9, 2024, 2:28 PM

#

jovial hound One oddity i did notice in lm studio after my q8 python train is this is popping...

Have you started your fine-tune, Preston? Would love to walk through the process with you if possible, and to help if I can.

jovial hound May 9, 2024, 4:39 PM

#

I have!

#

I've done a few tests, so I'm just refining my data a bit. It looks to be working correctly. I'm testing loss a bit more, but the conversion is working smoothly from what I've seen. I did have an oddity with <|endoftext|> showing up in my Python fine-tune.

slow oar May 9, 2024, 4:41 PM

#

Hmm, interesting. The <|end_of_text|> issue, @jovial hound: does it randomly pop up?

jovial hound May 9, 2024, 4:42 PM

#

Yeah, it was on my Python fine-tune. I was using the Python Alpaca set, and I use this to load the training data. One second.

#


```python
from datasets import load_dataset

alpaca_prompt = """### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = "<|end_of_text|>"

def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    # input_text avoids shadowing the built-in input()
    for instruction, input_text, output in zip(instructions, inputs, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = alpaca_prompt.format(instruction, input_text, output) + EOS_TOKEN
        texts.append(text)
    return {"text": texts}

dataset = load_dataset("iamtarun/python_code_instructions_18k_alpaca", split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True)
```

#

Does that look right?
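
One quick way to answer that without loading the full dataset is to run the same formatting logic on a toy batch and check that every text ends with exactly the intended end token. The sketch below is an illustration, not the notebook's code: the template is trimmed to the three section headers, toy_batch is made-up data, and format_batch and rows_with_stray_marker are hypothetical helpers. The stray-marker check reflects an assumption worth ruling out (that some raw outputs already contain a literal end-of-text string); nothing in this thread confirms that is the cause.

```python
ALPACA_PROMPT = """### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = "<|end_of_text|>"
# Spellings to look for inside the raw data (assumed, for illustration).
STRAY_MARKERS = ("<|end_of_text|>", "<|endoftext|>")


def format_batch(examples):
    """Format a batch of Alpaca-style rows and append EOS_TOKEN to each."""
    rows = zip(examples["instruction"], examples["input"], examples["output"])
    texts = [ALPACA_PROMPT.format(i, x, o) + EOS_TOKEN for i, x, o in rows]
    return {"text": texts}


def rows_with_stray_marker(outputs, markers=STRAY_MARKERS):
    """Return indices of raw outputs that already contain an end marker."""
    return [
        idx for idx, output in enumerate(outputs)
        if any(marker in output for marker in markers)
    ]


# Made-up rows; the second one carries a literal marker on purpose.
toy_batch = {
    "instruction": ["Add two numbers", "Print a greeting"],
    "input": ["", ""],
    "output": ["def add(a, b):\n    return a + b", "print('hi')<|endoftext|>"],
}

formatted = format_batch(toy_batch)["text"]
ends_with_eos = all(text.endswith(EOS_TOKEN) for text in formatted)
stray_rows = rows_with_stray_marker(toy_batch["output"])

print("every text ends with EOS_TOKEN:", ends_with_eos)
print("rows with a literal end marker in the raw output:", stray_rows)
```

The same two checks can be pointed at a slice of the real dataset columns before training.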

jovial hound May 9, 2024, 5:03 PM

#

So far that EOT issue has only shown up in my Python fine-tune.

slow oar May 9, 2024, 5:13 PM

#

looks fine

#

the EOT should be at the end of generation

#

and signals the end
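
Since the token only marks where generation stops, a client that ends up displaying it verbatim can cut the text at that point. A minimal sketch, assuming the two spellings seen in this thread (<|end_of_text|> from the training snippet and <|endoftext|> from the LM Studio output); truncate_at_end_marker is a hypothetical helper and not part of LM Studio or llama.cpp.

```python
# Marker spellings taken from this thread; adjust for other models.
END_MARKERS = ("<|end_of_text|>", "<|endoftext|>")


def truncate_at_end_marker(text: str, markers=END_MARKERS) -> str:
    """Return text up to (not including) the earliest end marker, if any."""
    positions = [text.find(marker) for marker in markers if marker in text]
    return text[: min(positions)] if positions else text


sample = "current_player = 2 if current_player == 1 else 1<|endoftext|>"
print(truncate_at_end_marker(sample))
```

This only hides the symptom on the display side. It does not explain why the marker is emitted as plain text.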

jovial hound May 9, 2024, 5:46 PM

#

Yeah, so maybe it's something to do with the way I trained the fine-tune on the Python code.

#

The other model I just wrapped up as a test is working great so far. I'm using it to test a few basic content pieces. It's a Q5 and I'm not noticing any major degradation.

tranquil jetty May 9, 2024, 9:07 PM

#

Why is that Alpaca prompt format so popular? 🤔
It didn't seem to give me better results than just Question / Answers in my limited testing

jovial hound May 9, 2024, 9:33 PM

#

I was wondering that myself, honestly :)

#

I just see it used everywhere, so I followed suit. I mean, I technically have to adjust the fields anyway.

frank igloo May 10, 2024, 9:12 AM

#

tranquil jetty Why is that alpaca prompt format so popular? 🤔 It didnt seem to give me bette...

It is just a common method to make the model understand your dataset better, since the format often mimics the way the base models have usually been trained.

#

And my personal guess: it helps in going from completion to question answering, but that one could be completely wrong 😅

jovial hound May 10, 2024, 7:33 PM

#

So far my trains have been working. Still working on another, and not seeing a lot of degradation.

slow oar May 11, 2024, 9:57 AM

#

Oh, so it's OK now? 🙂

#Fine Tune Llama3 Inference Works On Colab - GGUF produces garbage results in LMstudio.

