#shape '[8, 14336]' is invalid for input of size 173981696

supple shard
#
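# Import unsloth first so its patches apply before transformers/trl are loaded
from unsloth import FastLanguageModel

import os
from datetime import datetime

import torch
import wandb
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

wandb_api_key = os.getenv("WANDB_API_KEY")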

max_seq_length = 3000 
dtype = None 
load_in_4bit = True 


train_dataset = load_dataset("json", data_files="train_data.jsonl", split="train")
eval_dataset = load_dataset("json", data_files="eval_data.jsonl", split="train")


model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "mistralai/Mistral-7B-Instruct-v0.2", 
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

# Initialize Weights & Biases
wandb.login(key=wandb_api_key)
run = wandb.init(project="Fine tuning mistral 7B", job_type="training", anonymous="allow")

model = FastLanguageModel.get_peft_model(
    model,
    r = 16, 
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, 
    bias = "none",    
    use_gradient_checkpointing = True,
    random_state = 3407,
    use_rslora = False,  
    loftq_config = None, 
)

# Define training arguments
project = "Nbot-small"
base_model_name = "unsloth"
run_name = base_model_name + "-" + project
output_dir = "./" + run_name

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    dataset_text_field = "chat",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        run_name=f"{run_name}-{datetime.now().strftime('%Y-%m-%d-%H-%M')}",
        warmup_steps = 50,
        max_steps = 10000,
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        fp16_full_eval = True,
        save_strategy="epoch",
        evaluation_strategy="epoch",
        # save_steps=250,
        # eval_steps=250,
        do_eval=True,
        logging_steps = 25,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = output_dir,
        report_to="wandb",
        logging_dir="./logs",
    ),
)

trainer_stats = trainer.train()
model.save_pretrained("lora_model")

Example dataset row (see the message below, the rows are too long for this message ;P):

Hey, I'm trying to fine-tune Mistral to be a better ReAct/RAG agent with about 5000 rows of data. Using the normal SFTTrainer following Hugging Face's example works fine, albeit with higher VRAM use and lower speed. With Unsloth, however, training failed after the 657th row and I'm not sure why. It seems related to the fact that it was at epoch 0.99, just before my evaluation. Any insight would be very useful, as I'd love to use Unsloth for training so my PC is actually usable whilst training and so I can iterate and tune my parameters fast. Is it the fact that my dataset already follows the Mistral chat template and I'm loading it directly? Many thanks, Volts.
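A rough sketch of why the failure may line up with "epoch 0.99": with `evaluation_strategy="epoch"`, the first evaluation pass only runs once the first epoch of training steps is finished, so an evaluation-only bug would not show up before then. The arithmetic below uses the batch settings from the config above; the row count of 5000 is the approximate figure from the post, and the helper names are hypothetical. The exact rounding can differ between Trainer versions.

```python
import math


def rows_per_optimizer_step(per_device_batch_size, grad_accum_steps, num_devices=1):
    """Number of dataset rows consumed by one optimizer step."""
    return per_device_batch_size * grad_accum_steps * num_devices


def approx_steps_per_epoch(num_rows, per_device_batch_size, grad_accum_steps, num_devices=1):
    """Approximate optimizer steps in one epoch (rounding is an assumption)."""
    batches = math.ceil(num_rows / (per_device_batch_size * num_devices))
    return max(batches // grad_accum_steps, 1)


# Settings from the post: per_device_train_batch_size=2, gradient_accumulation_steps=4
rows_per_step = rows_per_optimizer_step(2, 4)
# "about 5000 rows" is approximate, so treat the result as a ballpark figure
steps = approx_steps_per_epoch(5000, 2, 4)
print(rows_per_step, steps)
```

If the real training set is a little larger than 5000 rows, the first evaluation would land a little later than this estimate.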

#

Example row (I don't know which one it failed on):

{"chat": "<s>[INST] You are a helper AI that can assist with a variety of tasks\nUtilize applicable tools to answer the quesion effectively.\nNone\nTOOLS:\n------\n\nAssistant has access to the following tools:\n\nCalculator: useful for when you need to answer questions about math\nBingSearch: Performs an internet search and retrieves relevant information.\nWikipediaSearch: Performs a search on Wikipedia and retrieves relevant information.\n\nYou MUST use the following format:\n\nThought: you should always think about what to do\nAction: the action to take, should be one of [Calculator, BingSearch, WikipediaSearch]\nAction Input: the input to the action\nObservation: the result of the action\n... (this Thought/Action/Action Input/Observation can repeat N times)\n\nWhen you have a response to say to the Human, or if you do not need to use a tool, you MUST use the format:\n\nThought: I now know the final answer\nFinal Answer: [your response here]\n\nBegin!\n\nPrevious conversation history:\n\n\nQuestion: What type of vegetation does Cerastium and Pavonia have in common?\nThought: To answer this question, I need to find out what type of vegetation both Cerastium and Pavonia have. I should start by looking up Cerastium to understand its vegetation type.\n\nAction: WikipediaSearch\nAction Input: Cerastium vegetation type\nObservation: Page: British NVC community MC5\nSummary: British NVC community MC5 (Armeria maritima - Cerastium diffusum ssp. diffusum maritime therophyte community) is one of the maritime cliff communities in the British National Vegetation Classification system. It is one of five communities categorised as maritime cliff crevice and ledge communities.\nThis community is widespread on the western coasts of Britain, and localised on coasts elsewhere. There are four subcommunities.\n\nPage: Salisbury Plain\nSummary: Salisbury Plain is a chalk plateau in southern England covering 300 square miles (780 km2). 
It is part of a system of chalk downlands throughout eastern and southern England formed by the rocks of the Chalk Group and largely lies within the county of Wiltshire, but stretches into Hampshire.\nThe plain is famous for its rich archaeology, including Stonehenge, one of England's best known landmarks. Large areas are given over to military training; thus, the sparsely populated plain is the biggest remaining area of calcareous grassland in northwest Europe. Additionally, the plain has arable land, and a few small areas of beech trees and coniferous woodland. Its highest point is Easton Hill.\n\nPage: Vegetation of open habitats in the British National Vegetation Classification system\nSummary: This article gives an overview of the plant communities formed by vegetation of open habitats in the British National Vegetation Classification system.\nThought: [/INST]Given the information from the Wikipedia search, Cerastium is associated with maritime cliff communities, specifically being part of the British NVC community MC5, which is a maritime therophyte community. This suggests that Cerastium is part of a vegetation type that thrives in maritime (coastal) environments. Now, I need to find out about the vegetation type associated with Pavonia to see what they have in common.\n\nAction: WikipediaSearch\nAction Input: Pavonia vegetation type</s>"}
#

it failed just before evaluation at epoch 0.99
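The numbers in the error from the thread title can be factored to see what the failing reshape was handed. This is only a sketch: 14336 matches the MLP intermediate size of Mistral 7B, and 8 matches the default `per_device_eval_batch_size` of `TrainingArguments` (the config above does not set it), but reading the remaining factor as a padded sequence length is an assumption. The function name is hypothetical.

```python
def factor_failed_view(total_elements, target_shape):
    """Break the tensor size from the error into batch x tokens x hidden."""
    batch, hidden = target_shape
    rows, remainder = divmod(total_elements, hidden)
    tokens_per_sequence, leftover = divmod(rows, batch)
    return {
        "requested_elements": batch * hidden,   # what view([8, 14336]) can hold
        "rows_of_hidden": rows,                 # total_elements / hidden
        "tokens_per_sequence": tokens_per_sequence,
        "divides_evenly": remainder == 0 and leftover == 0,
    }


info = factor_failed_view(173981696, (8, 14336))
print(info)
```

Under that reading, the tensor holds 8 sequences of 1517 tokens each, while the reshape only leaves room for one 14336-wide vector per sequence. That would be consistent with a code path that first runs during batched evaluation, which is when the failure was reported.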

supple shard
#

This guy had a similar issue:
#community-collaboration message

keen sedge
#

@supple shard i fixed it just then!

#

sorry about the issue, you'll have to update Unsloth

#

pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git

supple shard
supple shard
keen sedge
#

thats fine

supple shard
keen sedge
#

you can do anything you like 🙂

#

🙂

supple shard
#

kekw True

#

but it depends what is optimal

#

in my non-Unsloth trainer I had it like this:

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
tokenizer.padding_side = 'right'
tokenizer.pad_token = tokenizer.eos_token
# tokenizer.add_eos_token = True
# tokenizer.add_bos_token, tokenizer.add_eos_token

with the tokenizer lines uncommented, and it really messed things up as my data was already formatted

keen sedge
#

oh

#

i think thats fine

#

it should work

supple shard
#

As I think it was adding extra tokens when I already had them in my training data
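A minimal string-level sketch of the doubling described here, assuming the rows already start with `<s>` and end with `</s>` as in the example row earlier in the thread. It does not call a real tokenizer; `simulate_added_specials` and `has_doubled_specials` are hypothetical helpers that only mimic a tokenizer configured to add BOS/EOS unconditionally.

```python
BOS, EOS = "<s>", "</s>"


def simulate_added_specials(text, add_bos=True, add_eos=False):
    """Mimic, on plain strings, a tokenizer that always adds BOS/EOS markers."""
    return (BOS if add_bos else "") + text + (EOS if add_eos else "")


def has_doubled_specials(text):
    """True if the text starts with two BOS markers or ends with two EOS markers."""
    return text.startswith(BOS + BOS) or text.endswith(EOS + EOS)


# Shortened stand-in for a row that already follows the chat template
row = "<s>[INST] question [/INST]answer</s>"

doubled = has_doubled_specials(simulate_added_specials(row, add_bos=True, add_eos=True))
clean = has_doubled_specials(simulate_added_specials(row, add_bos=False, add_eos=False))
print(doubled, clean)
```

With a real tokenizer, the equivalent check would be to encode one row and look for repeated BOS/EOS ids at the start and end of the id list.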

#
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
tokenizer.padding_side = 'right'
tokenizer.pad_token = tokenizer.eos_token
so I think this config was better
keen sedge
#

oh

#

well if u do this after FastLanguageModel

#

should be fine 🙂

supple shard
#

I don't think it's needed so will remove

#

thanks sloth