#trying to finetune on a quite complex dataset including a lot of images of text

merry summitBOT
#

For your dataset, ensure you use the vision fine-tuning workflow with Unsloth, format your data as vision Q&A pairs (image + prompt + answer), and use FastVisionModel.get_peft_model with all finetune flags enabled (vision, language, attention, MLP). Set r and lora_alpha to 16 for balanced accuracy and overfitting risk. Use UnslothVisionDataCollator and set remove_unused_columns=False, dataset_text_field="", and max_seq_length to fit your longest prompt/answer. Image sizes should be consistent (300–1000px recommended) for efficient training. If your current model is not improving, check your data quality, ensure labels are correct, and consider using train_on_responses_only to focus loss on the answer part only.

Switching to Qwen3-VL-4B may help, as it offers improved vision and OCR capabilities, but expect higher VRAM requirements. Qwen3-VL is state-of-the-art for vision-language tasks and may yield better results, especially if your current model struggles with text extraction or reasoning on images. For best results, follow the official Unsloth vision fine-tuning guide and Qwen3-VL fine-tuning instructions.

Would you like a step-by-step code example or more detail on troubleshooting poor training outcomes?

lucid parcel
#

is reasoning an important part of understanding text in an image or could i just have data that is the instruction and the answer yes or no?

#

@merry summit

merry summitBOT
#

Reasoning is not strictly required for your task if the model only needs to classify based on the visible text; you can use simple instruction + yes/no answer pairs. However, including reasoning (e.g., chain-of-thought explanations) can help the model generalize and improve accuracy, especially on ambiguous or complex cases, as shown in Unsloth's vision RL examples and dataset structuring guides. For straightforward classification, direct Q&A is sufficient, but for nuanced decisions, reasoning may help.

Would you like more detail on when to use reasoning in your dataset?

lucid parcel
#

ne number? Answer only Yes or No.'}, {'type': 'image', 'image': <PIL.PngImagePlugin.PngImageFile image mode=RGB size=320x131 at 0x77F7917CD370>}]}, {'role': 'assistant', 'content': [{'type': 'text', 'text': 'No'}]}]}, {'messages': [{'role': 'user', 'content': [{'type': 'text', 'text': 'Read the text inside the green box. Is it a Swedish company, a license plate, a website, or a phone number? Answer only Yes or No.'}, {'type': 'image', 'image': <PIL.PngImagePlugin.PngImageFile image mode=RGB size=415x609 at 0x77F7917CD400>}]}, {'role': 'assistant', 'content': [{'type': 'text', 'text': 'No'}]}]}, {'messages': [{'role': 'user', 'content': [{'type': 'text', 'text': 'Read the text inside the green box. Is it a Swedish company, a license plate, a website, or a phone number? Answer only Yes or No.'}, {'type': 'image', 'image': <PIL.PngImagePlugin.PngImageFile image mode=RGB size=891x723 at 0x77F7917CD490>}]}, {'role': 'assistant', 'content': [{'type': 'text', 'text': 'Yes'}]}]}, {'messages': [{'role': 'user', 'content': [{'type': 'text', 'text': 'Read the text inside the green box. Is it a Swedish company, a license plate, a website, or a phone number? Answer only Yes or No.'}, {'type': 'image', 'image': <PIL.PngImagePlugin.PngImageFile image mode=RGB size=310x114 at 0x77F7917CD520>}]}, {'role': 'assistant', 'content': [{'type': 'text', 'text': 'Yes'}]}]}, {'messages':

@merry summit is this the correct format?

merry summitBOT
#

Yes, this is the correct format for Unsloth vision-language fine-tuning: each example should be a list of messages, where the user message contains both the instruction (as text) and the image, and the assistant message contains the answer (as text). This matches the expected conversational, multi-modal input for models like Qwen-VL in Unsloth.

Just ensure your dataset is a list of such dictionaries, and that images are properly loaded as PIL images. For more, see the Unsloth vision fine-tuning guide.

Would you like a code snippet for dataset preparation?
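A minimal sketch of such a preparation step, assuming each raw example is an (image, label) pair. The helper names `to_conversation` and `is_well_formed` are hypothetical and not part of Unsloth; the instruction text is copied from the samples above, and `object()` stands in for a PIL image so the snippet runs without any image files.

```python
from typing import Any

INSTRUCTION = (
    "Read the text inside the green box. Is it a Swedish company, "
    "a license plate, a website, or a phone number? Answer only Yes or No."
)

def to_conversation(image: Any, label: str) -> dict:
    """Wrap one (image, label) pair in the messages layout shown above."""
    if label not in ("Yes", "No"):
        raise ValueError(f"unexpected label: {label!r}")
    return {
        "messages": [
            {"role": "user", "content": [
                {"type": "text", "text": INSTRUCTION},
                {"type": "image", "image": image},
            ]},
            {"role": "assistant", "content": [{"type": "text", "text": label}]},
        ]
    }

def is_well_formed(sample: dict) -> bool:
    """Check roles and content types only; the image itself is not inspected."""
    msgs = sample.get("messages", [])
    if [m.get("role") for m in msgs] != ["user", "assistant"]:
        return False
    user_types = sorted(part.get("type") for part in msgs[0]["content"])
    answer = msgs[1]["content"]
    return (
        user_types == ["image", "text"]
        and len(answer) == 1
        and answer[0].get("type") == "text"
    )

sample = to_conversation(image=object(), label="No")  # placeholder image
print(is_well_formed(sample))
```

Running `is_well_formed` over every sample before training is a cheap way to catch a structural slip in one row.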

lucid parcel
#

The training and validation loss does not change when training without warmup. i have tried a lot of different variables, what can be the cause or things i should try now? @merry summit

merry summitBOT
#

If training and validation loss do not change, common causes are: (1) data formatting errors (e.g., labels not set or masked incorrectly), (2) learning rate too low or too high, (3) model not receiving gradients (frozen layers or optimizer misconfiguration), or (4) dataset too small or not shuffled. Also, ensure your max_seq_length is not truncating all answers, and that your data collator is correct for vision tasks.

Check your dataset and label formatting, try different learning rates (e.g., 2e-4 or 1e-4), and verify that LoRA adapters are trainable. See Unsloth troubleshooting guide for more.

Would you like a checklist or step-by-step debugging guide?

lucid parcel
#

@merry summit is patience an option we have in unsloth?

merry summitBOT
#

Yes, Unsloth supports the early_stopping_patience option via the Hugging Face EarlyStoppingCallback when using SFTTrainer or similar trainers. You can add this callback to your trainer and set early_stopping_patience to control how many evaluation steps to wait before stopping if the validation loss does not improve. See Unsloth early stopping documentation for details.

Would you like a code example for setting early stopping patience?

lucid parcel
#

@merry summit how does it work if i do only eval every 20 steps and put the early_stopping_patience to like 3 ?

merry summitBOT
#

If you set evaluation to every 20 steps (eval_steps=20) and early_stopping_patience=3, training will stop if the validation loss does not improve for 3 consecutive evaluations (i.e., 60 steps). The model checks for improvement only at each evaluation interval, not every step.

This setup is standard and works as expected in Unsloth, as described in the early stopping documentation.
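The counting described above can be sketched in plain Python. This is a simplified stand-in for the callback's bookkeeping, not the library code; `stopping_step` and the loss values are made up for illustration.

```python
def stopping_step(eval_losses, eval_steps=20, patience=3, threshold=0.0):
    """Return the training step at which the run would stop, or None."""
    best = None
    bad_evals = 0
    for i, loss in enumerate(eval_losses, start=1):
        improved = best is None or (loss < best and abs(loss - best) > threshold)
        if improved:
            best = loss
            bad_evals = 0          # any improvement resets the counter
        else:
            bad_evals += 1
        if bad_evals >= patience:
            return i * eval_steps  # evaluations happen every eval_steps steps
    return None

# Hypothetical validation losses, one per evaluation (steps 20, 40, 60, ...).
losses = [0.50, 0.40, 0.41, 0.42, 0.40, 0.39]
print(stopping_step(losses))
```

In this made-up series the best loss is at step 40 and the next three evaluations do not beat it, so the run ends at step 100, which is 60 steps after the best evaluation.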

lucid parcel
#

import torch
from unsloth import FastVisionModel

def load_trained_lora_cpu(weight_path):
    from transformers import BitsAndBytesConfig
    from peft import PeftModel

    # 1️⃣ Configure 4-bit quantization with CPU offload allowed
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        llm_int8_enable_fp32_cpu_offload=True,  # valid here!
    )

    # 2️⃣ Load the base model using this config
    base, tok = FastVisionModel.from_pretrained(
        "unsloth/Qwen2.5-VL-3B-Instruct-bnb-4bit",
        quantization_config=bnb_config,
        device_map="auto",
        max_memory={0: "7.5GiB", "cpu": "20GiB"},
        use_gradient_checkpointing="unsloth",
    )

    # 3️⃣ Attach your LoRA adapter
    model_lora = PeftModel.from_pretrained(base, weight_path)

    FastVisionModel.for_inference(model_lora)

    return model_lora, tok

@merry summit is this the correct way to load the model after training it? anything that could be wrong here?

merry summitBOT
#

Your approach is mostly correct for loading a LoRA adapter with Unsloth, but there are a few caveats:

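  • llm_int8_enable_fp32_cpu_offload is documented for 8-bit loading. Some transformers versions also check it when a 4-bit model has modules placed on the CPU by device_map, so it is probably harmless here, but verify the behavior against your installed version.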
  • For Unsloth, you can load the LoRA directly with FastVisionModel.from_pretrained(weight_path, ...) if the adapter was saved with Unsloth, skipping manual PeftModel.from_pretrained.
  • Ensure your quantization_config and device_map are compatible with your hardware and BitsAndBytes version.
  • Always call FastVisionModel.for_inference(model_lora) after loading.

See Unsloth vision loading guide and related issues.

Would you like a code snippet for the recommended Unsloth loading workflow?

lucid parcel
#

yes @merry summit

merry summitBOT
#

Great! Here’s a recommended code snippet for loading a trained LoRA adapter with Unsloth for inference on CPU or limited GPU:

from unsloth import FastVisionModel

# If you saved your LoRA adapter with Unsloth, you can load it directly:
model, tokenizer = FastVisionModel.from_pretrained(
    "path/to/your/lora_adapter_or_merged_model",
    load_in_4bit=True,  # or False for 16-bit
    device_map="auto",  # or specify as needed
    use_gradient_checkpointing="unsloth",
    # max_seq_length=...,  # set if needed
)

# Enable inference optimizations
FastVisionModel.for_inference(model)

# Now you can use the model for inference

Notes:

  • If you saved a merged model (base + LoRA), just point to that directory.
  • If you only have the LoRA adapter, you may need to load the base model first, then apply the adapter using PEFT as in your original code.
  • For most Unsloth workflows, direct loading as above is preferred and simpler.
  • Make sure your Unsloth and PEFT versions are up to date for best compatibility.

Reference:
Unsloth Inference Guide

If you have a specific error or want to clarify your model saving/loading setup, please provide more details!

lucid parcel
#

Step Training Loss Validation Loss
25 0.013100 0.012017
50 0.010900 0.011437
75 0.012900 0.010795
100 0.008200 0.010626
125 0.009600 0.010514
150 0.012200 0.010539
175 0.011300 0.010415

@merry summit Is this any proof that my training is working? and that the model actually got better?

merry summitBOT
#

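The numbers are consistent with a run that is training: validation loss falls from 0.0120 at step 25 to 0.0104 at step 175 (about 13%), and it stays close to the training loss, so there is no sign of overfitting or divergence. They are not proof that the model got better at the task, though. The loss was already very low at the first evaluation, so most of the change may have happened before step 25, and a low loss does not by itself tell you how often the Yes/No answer is right. To confirm an improvement, generate answers on held-out images with the base model and with the fine-tuned model and compare accuracy.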

Would you like more detail on interpreting loss curves or next steps for evaluation?
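As a small worked example, the table posted above can be summarized with a few lines of Python. The numbers are copied from that table; the variable names are only illustrative.

```python
# (step, training_loss, validation_loss), copied from the table above.
history = [
    (25, 0.013100, 0.012017),
    (50, 0.010900, 0.011437),
    (75, 0.012900, 0.010795),
    (100, 0.008200, 0.010626),
    (125, 0.009600, 0.010514),
    (150, 0.012200, 0.010539),
    (175, 0.011300, 0.010415),
]

first_val = history[0][2]
last_val = history[-1][2]
relative_drop = (first_val - last_val) / first_val     # fraction of the first value

best_step = min(history, key=lambda row: row[2])[0]    # step with lowest val loss

print(f"validation loss fell by {relative_drop:.1%}; best step so far: {best_step}")
```

The validation loss is still at its minimum on the last row, which is why the run does not look overfit, but the change is small in absolute terms.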

lucid parcel
#

should i change any variables to make it better? @merry summit

merry summitBOT
#

If your losses are already low and stable, major changes may not be needed. For further improvement, you can try: lowering the learning rate slightly, increasing the dataset size, or using early stopping to avoid overfitting. Also, ensure your batch size and gradient accumulation are optimal for your hardware, as recommended in the Unsloth LoRA hyperparameters guide.

Would you like specific suggestions for learning rate, batch size, or other hyperparameters?

lucid parcel
#

args = SFTConfig(
    per_device_train_batch_size = 1,
    gradient_accumulation_steps = 32,

    eval_strategy="steps",
    eval_steps=25,

    early_stopping_patience=3,

@merry summit should early stopping be here?

merry summitBOT
#

No, early_stopping_patience is not a parameter of SFTConfig or TrainingArguments. Early stopping should be set by adding an EarlyStoppingCallback to your trainer, not as an argument in SFTConfig. See the Unsloth early stopping documentation.

Would you like a code example for adding early stopping?

lucid parcel
#

assert args.metric_for_best_model is not None, (
    "EarlyStoppingCallback requires metric_for_best_model to be defined"
)

AssertionError: EarlyStoppingCallback requires metric_for_best_model to be defined @merry summit

merry summitBOT
#

You must set metric_for_best_model (e.g., "eval_loss") in your SFTConfig or TrainingArguments when using EarlyStoppingCallback. Without this, early stopping cannot determine which metric to monitor. Example:

args = SFTConfig(
    ...,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

See Unsloth early stopping docs.

Would you like a full code example?

lucid parcel
#

when i use warmup the loss is very high almost all the time, is this not bad?

Step Training Loss Validation Loss
10 4.264200 4.222050
20 3.733400 3.681689
30 3.289800 3.259734
40 2.943400 2.931273
50 2.685000 2.672304
60 2.460000 2.459299
70 2.315800 2.287699

@merry summit

merry summitBOT
#

A high loss at the start of training is normal, but warmup is not what makes it high: the loss at the first step depends only on the model and the data, not on the learning rate. Warmup keeps the learning rate small for the first steps and then ramps it up, so the loss falls more slowly at the beginning. As long as the loss keeps decreasing after warmup, this is not a problem; see the Unsloth LoRA hyperparameters guide.

Would you like more detail on warmup behavior or how to tune it?
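To make the warmup behavior concrete, here is a sketch of a linear schedule with warmup: a linear ramp from zero, then a linear decay to zero. The function name is hypothetical, and the defaults (learning_rate 2e-5, warmup_steps 10, max_steps 100) are taken from a config posted later in this thread.

```python
def linear_schedule_lr(step, base_lr=2e-5, warmup_steps=10, total_steps=100):
    """Learning rate at a given optimizer step for linear warmup + linear decay."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

for step in (0, 5, 10, 55, 100):
    print(step, linear_schedule_lr(step))
```

With only 10 warmup steps, the learning rate is already at its peak by step 10, so warmup can only slow the first few updates.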

lucid parcel
#

but i still have high loss after the 10 warmup steps as you can see @merry summit

merry summitBOT
#

A high loss after 10 warmup steps is not necessarily bad if it continues to decrease steadily, as in your results. The important factor is the downward trend—your loss drops from 4.2 to 2.3 over 70 steps, which indicates learning is happening. For complex tasks or new domains, initial losses can be high and may require more steps to reach low values, as noted in the Unsloth training guide.

Would you like advice on adjusting warmup steps or learning rate for faster convergence?

lucid parcel
#

but for when i dont use warmup the eval loss goes from 0.013 down to 0.01

#

@merry summit

merry summitBOT
#

Ten warmup steps alone are unlikely to explain the gap between a loss near 2.3 and one near 0.01, because once warmup ends both runs follow the same kind of schedule. Before concluding that warmup is the cause, check whether anything else differs between the two runs (learning rate, batch size, label masking, chat template). Warmup is mainly used to stabilize training for large models or high learning rates; if your loss decreases smoothly without it, you may not need warmup for your setup, as suggested in the Unsloth LoRA hyperparameters guide.

Would you like more detail on when to use warmup or how to tune it for your case?

lucid parcel
#

@merry summit should a higher r change the amplitude of the loss?

merry summitBOT
#

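Yes, changing the LoRA rank (r) can change the size and noisiness of the loss curve. A higher r adds trainable capacity, which may speed up convergence and lower the final loss, but it can also cause instability or overfitting if set too high. Note that with use_rslora = False the adapter update is scaled by lora_alpha / r, so raising r while leaving lora_alpha unchanged shrinks that scale; this is why the usual advice is to keep lora_alpha at least equal to r. The relationship is discussed in the Unsloth LoRA hyperparameters guide.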

Would you like more detail on choosing the optimal rank for your task?
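A short worked example of how r and lora_alpha interact, assuming the standard LoRA scaling of lora_alpha / r and the rank-stabilized variant lora_alpha / sqrt(r). The function name `lora_scale` is hypothetical.

```python
import math

def lora_scale(r: int, lora_alpha: float, use_rslora: bool = False) -> float:
    """Multiplier applied to the low-rank update before it is added to the weights."""
    if use_rslora:
        return lora_alpha / math.sqrt(r)   # rank-stabilized LoRA
    return lora_alpha / r                  # standard LoRA

for r in (8, 16, 64):
    print(r, lora_scale(r, lora_alpha=16), lora_scale(r, lora_alpha=16, use_rslora=True))
```

With lora_alpha fixed at 16, going from r = 16 to r = 64 cuts the standard scale from 1.0 to 0.25, which by itself can change how fast the loss moves.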

lucid parcel
#

@merry summit Error processing image: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

What is this error

merry summitBOT
#

This error means a CUDA kernel failed due to an invalid operation, often from out-of-bounds indices or bad input data. It's common in Unsloth vision models if input IDs or image tensors are malformed, or if there are hardware/driver mismatches. To debug, set CUDA_LAUNCH_BLOCKING=1 before running your script for a more accurate stack trace, and check your data formatting and CUDA/PyTorch versions. See Unsloth troubleshooting guide for details.

Would you like step-by-step debugging instructions?
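One check that often narrows this down is looking for token IDs that fall outside the embedding table, since an out-of-range index is a common trigger for a device-side assert. The sketch below uses plain lists; `out_of_range_ids` is a hypothetical helper, and `vocab_size` is an assumption you would read from your own model rather than hard-code.

```python
import os

# Must be set before CUDA is initialized, i.e. before importing torch in a script.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")

def out_of_range_ids(batch_ids, vocab_size):
    """Return (row, column, token_id) for every id an embedding lookup would reject."""
    return [
        (row_idx, col_idx, token_id)
        for row_idx, row in enumerate(batch_ids)
        for col_idx, token_id in enumerate(row)
        if not 0 <= token_id < vocab_size
    ]

# Toy batch of input ids with a made-up vocabulary of 5 tokens.
print(out_of_range_ids([[1, 2], [5, -1]], vocab_size=5))
```

Run the check on input_ids only; labels may legitimately contain -100 at masked positions.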

lucid parcel
#

try:
    FastVisionModel.for_inference(model) # Enable for inference!

    image = train_dataset[2]["image"]
    instruction = "Read the text inside the green box. Is it a Swedish company, a license plate, a website, or a phone number? Answer only Yes or No."

    messages = [
        {"role": "user", "content": [
            {"type": "image"},
            {"type": "text", "text": instruction}
        ]}
    ]
    input_text = tokenizer.apply_chat_template(messages, add_generation_prompt = True)
    inputs = tokenizer(
        image,
        input_text,
        add_special_tokens = False,
        return_tensors = "pt",
    ).to("cuda")

    from transformers import TextStreamer
    text_streamer = TextStreamer(tokenizer, skip_prompt = True)
    _ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128,
                       use_cache = True, temperature = 0)

except Exception as e:
    print(f"Error processing image: {e}")

then i get this error:

Error processing image: temperature (=0) has to be a strictly positive float, otherwise your next token scores will be invalid.
@merry summit

merry summitBOT
#

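The error occurs because temperature=0 is rejected by the generation API; temperature must be a strictly positive float. If you want deterministic output, which is usually what you want for a Yes/No answer, do not set temperature at all and pass do_sample=False so that generate uses greedy decoding. If you do want sampling, pass do_sample=True with a positive temperature (e.g., temperature=1.0). See Unsloth vision inference examples.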

Would you like more detail on temperature settings for generation?
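A small sketch of how the generation arguments could be chosen so that a "temperature of zero" maps to greedy decoding instead of an error. The helper `generation_kwargs` is hypothetical; max_new_tokens, use_cache, do_sample and temperature are the generate arguments already used in this thread, and the default of 8 new tokens is an assumption for a one-word answer.

```python
def generation_kwargs(temperature: float = 0.0, max_new_tokens: int = 8) -> dict:
    """Build keyword arguments for model.generate(**inputs, **kwargs)."""
    if temperature < 0:
        raise ValueError("temperature must not be negative")
    kwargs = {"max_new_tokens": max_new_tokens, "use_cache": True}
    if temperature == 0:
        kwargs["do_sample"] = False            # greedy decoding, no temperature passed
    else:
        kwargs.update(do_sample=True, temperature=temperature)
    return kwargs

print(generation_kwargs(0))      # deterministic
print(generation_kwargs(0.7))    # sampling
```

Usage, assuming `model` and `inputs` exist as in the snippet above: `model.generate(**inputs, **generation_kwargs(0))`.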

lucid parcel
#

model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers = True, # False if not finetuning vision layers
    finetune_language_layers = True, # False if not finetuning language layers
    finetune_attention_modules = True, # False if not finetuning attention layers
    finetune_mlp_modules = True, # False if not finetuning MLP layers

    r = 16,           # The larger, the higher the accuracy, but might overfit
    lora_alpha = 16,  # Recommended alpha == r at least
    lora_dropout = 0,
    bias = "none",
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
    # target_modules = "all-linear", # Optional now! Can specify a list if needed
)

from unsloth.trainer import UnslothVisionDataCollator
from trl import SFTTrainer, SFTConfig

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    data_collator = UnslothVisionDataCollator(model, tokenizer), # Must use!
    train_dataset = converted_dataset_train,
    eval_dataset = converted_dataset_val,
    #compute_metrics=compute_metrics,

    args = SFTConfig(
        per_device_train_batch_size = 4,
        gradient_accumulation_steps = 4,

        eval_strategy="steps",
        eval_steps=10,

        warmup_steps = 10,
        max_steps = 100,
        # num_train_epochs = 1, # Set this instead of max_steps for full training runs
        learning_rate = 2e-5,
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none",     # For Weights and Biases

        # You MUST put the below items for vision finetuning:
        remove_unused_columns = False,
        dataset_text_field = "",
        dataset_kwargs = {"skip_prepare_dataset": True},
        max_length = 2048,
    ),
)

Why do i get this error:
AcceleratorError: CUDA error: device-side assert

#

@merry summit

merry summitBOT
#

A CUDA error: device-side assert during Unsloth vision fine-tuning is usually caused by malformed input data, such as mismatched image/text pairs, incorrect label formatting, or out-of-bounds token IDs. It can also result from hardware/driver mismatches or using an unsupported CUDA/PyTorch version. See the Unsloth troubleshooting guide for debugging steps.

Would you like a checklist to debug your dataset and environment?

lucid parcel
#

it worked before so it cant be the hardware, i must have changed something in the code @merry summit

merry summitBOT
#

If it worked before and now triggers a device-side assert, the most likely causes are: (1) a change in your dataset formatting (e.g., missing or misaligned images/labels), (2) a change in preprocessing/tokenization, or (3) a code change that affects how data is fed to the model. Even a small change in data structure or batch collation can cause this error. Double-check your recent code and data changes, especially around how images and text are paired and processed. See Unsloth troubleshooting for more.

Would you like a step-by-step checklist to debug your data pipeline?

lucid parcel
#

@merry summit when i trained before without warmup i got a loss of 0.01 from the beginning and then it went down. i now changed some variables and now my loss starts at 5... how can i go back?

merry summitBOT
#

A sudden increase in starting loss (from 0.01 to 5) after changing variables usually means your data formatting, tokenization, or model configuration has changed. Revert recent changes to your dataset preprocessing, chat template, tokenizer settings, or model hyperparameters to restore previous behavior. High initial loss often indicates a mismatch between your data and the model’s expected input, as described in the Unsloth troubleshooting guide.

Would you like a checklist to systematically revert and debug your changes?

lucid parcel
#

{'messages': [{'role': 'user', 'content': [{'type': 'text', 'text': 'Read the text inside the green box. Is it a Swedish company, a license plate, a website, or a phone number? Answer only Yes or No.'}, {'type': 'image', 'image': <PIL.PngImagePlugin.PngImageFile image mode=RGB size=303x283 at 0x7899B0A26EA0>}]}, {'role': 'assistant', 'content': [{'type': 'text', 'text': 'No'}]}]}
{'messages': [{'role': 'user', 'content': [{'type': 'text', 'text': 'Read the text inside the green box. Is it a Swedish company, a license plate, a website, or a phone number? Answer only Yes or No.'}, {'type': 'image', 'image': <PIL.PngImagePlugin.PngImageFile image mode=RGB size=1046x662 at 0x7899B0A263F0>}]}, {'role': 'assistant', 'content': [{'type': 'text', 'text': 'No'}]}]}

@merry summit is this not good?

merry summitBOT
#

Your data format is correct for Unsloth vision fine-tuning: each sample is a list of messages with user and assistant roles, and the user content includes both text and an image. This matches the expected input for Unsloth's UnslothVisionDataCollator and vision models, as shown in Unsloth vision fine-tuning examples.

As long as your images are valid PIL objects and the structure is consistent across your dataset, this format is good.

Would you like tips on further validating your dataset or troubleshooting issues?

lucid parcel
#

@merry summit If the loss begins at 5.5 does that mean something is wrong?

merry summitBOT
#

A starting loss of 5.5 is not necessarily wrong, especially for vision or multimodal models—it's common for initial losses to be high (e.g., 6–7) and then decrease rapidly during training. This is documented for models like Gemma 3N and others in the Unsloth documentation.

If the loss decreases over time and the model performs well on validation, this is expected behavior.

Would you like more detail on interpreting loss values for your specific model type?

lucid parcel
#

@merry summit Is this how i add a computation of the accuracy when training?

from unsloth.trainer import UnslothVisionDataCollator
from trl import SFTTrainer, SFTConfig

FastVisionModel.for_training(model) # Enable for training!

from sklearn.metrics import accuracy_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = (logits > 0).astype(int)
    return {"accuracy": accuracy_score(labels, preds)}

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    data_collator = UnslothVisionDataCollator(model, tokenizer), # Must use!
    train_dataset = converted_dataset_train,
    eval_dataset = converted_dataset_val,
    compute_metrics=compute_metrics,

    args = SFTConfig(
        per_device_train_batch_size = 4,
        gradient_accumulation_steps = 4,

        eval_strategy="steps",
        eval_steps=10,

        warmup_steps = 0,
        max_steps = 200,
        # num_train_epochs = 1, # Set this instead of max_steps for full training runs
        learning_rate = 5e-5,
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none",     # For Weights and Biases

        # You MUST put the below items for vision finetuning:
        remove_unused_columns = False,
        dataset_text_field = "",
        dataset_kwargs = {"skip_prepare_dataset": True},
        max_length = 2048,

        metric_for_best_model="eval_loss",
        greater_is_better=False,
    ),
)

from transformers import EarlyStoppingCallback
early_stopping_callback = EarlyStoppingCallback(
    early_stopping_patience = 3,     # How many steps we will wait if the eval loss doesn't decrease
    early_stopping_threshold = 0.0,  # Can set higher - sets how much loss should decrease by until
                                     # we consider early stopping.
)
trainer.add_callback(early_stopping_callback)

merry summitBOT
#

compute_metrics is passed to SFTTrainer the way you show, and it will be called at each evaluation, but this function is not correct for your model. (logits > 0).astype(int) assumes a classifier head that outputs one logit per example. A vision-language model fine-tuned with SFTTrainer is generative: the logits have shape (batch, sequence_length, vocab_size) and the labels are token IDs, with -100 marking positions that are ignored by the loss.

To score it, take the argmax over the vocabulary axis, shift by one position so that each prediction lines up with the token it predicts, and compare only where the label is not -100. Alternatively, run generation on the validation set after training and compare the decoded "Yes"/"No" strings with the ground truth.

Would you like a more detailed example for multi-class or other metric types?

lucid parcel
#

@merry summit i do binary classification but the answers are always "Yes" or "No"

merry summitBOT
#

If the answers are the text "Yes" or "No", the labels that compute_metrics receives are the token IDs of those words, not 0/1, and the logits cover the whole vocabulary, so thresholding with logits > 0 does not apply. Compare predicted token IDs with label token IDs at the positions that are not masked with -100.

Example (a sketch; it measures next-token accuracy over unmasked positions, which equals Yes/No accuracy only if the prompt tokens are masked out of the labels):

import numpy as np

def compute_metrics(eval_pred):
    # logits: (batch, seq_len, vocab); labels: (batch, seq_len), -100 = ignored
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)[:, :-1]   # position t predicts token t+1
    targets = labels[:, 1:]
    mask = targets != -100                       # score unmasked tokens only
    return {"token_accuracy": float((preds[mask] == targets[mask]).mean())}

See Unsloth classification discussion.

Would you like more detail on label encoding or dataset preparation?
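If you evaluate by generating answers instead, accuracy can be computed directly on the decoded strings. A minimal sketch, assuming you already have one decoded model output per validation example; `yes_no_accuracy` is a hypothetical helper and the example strings are made up.

```python
def yes_no_accuracy(predictions, references):
    """Fraction of generated answers whose first word matches the reference label."""
    def first_word(text):
        words = text.strip().lower().split()
        return words[0].strip(".,!") if words else ""

    pairs = list(zip(predictions, references))
    if not pairs:
        raise ValueError("no examples to score")
    correct = sum(first_word(pred) == first_word(ref) for pred, ref in pairs)
    return correct / len(pairs)

# Made-up decoded outputs and labels.
outputs = ["Yes", "no.", "Yes", "maybe"]
labels = ["Yes", "No", "No", "Yes"]
print(yes_no_accuracy(outputs, labels))
```

Counting anything other than a leading "yes" or "no" as wrong keeps malformed outputs from inflating the score.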

lucid parcel
#

@merry summit the evaluation is now VERY slow

merry summitBOT
#

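Evaluation becomes very slow once compute_metrics is set, because the trainer then has to keep the full logits for every evaluation example, which for a language model means one value per token per vocabulary entry, and move them to the CPU. A large evaluation set and evaluating every 10 steps make this worse. To speed it up, pass a preprocess_logits_for_metrics function that reduces the logits to predicted token IDs (argmax) before they are accumulated, set eval_accumulation_steps so that results are moved off the GPU in chunks, evaluate less often, or use a smaller validation subset during training. fp16_full_eval=True or bf16_full_eval=True also reduce memory use during evaluation, as recommended in the Unsloth evaluation FAQ.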

Would you like a code example for faster evaluation settings?
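A rough back-of-the-envelope calculation shows why keeping full logits is expensive. All three inputs are assumptions for illustration: 200 validation examples, 512 tokens per example, and a vocabulary of 151,936 entries (check your own model's config for the real value).

```python
def logits_bytes(num_examples, seq_len, vocab_size, bytes_per_value=4):
    """Memory needed to hold float32 logits for the whole evaluation set."""
    return num_examples * seq_len * vocab_size * bytes_per_value

def token_id_bytes(num_examples, seq_len, bytes_per_value=8):
    """Memory needed if only int64 argmax token ids are kept instead."""
    return num_examples * seq_len * bytes_per_value

full = logits_bytes(200, 512, 151_936)
reduced = token_id_bytes(200, 512)
print(f"full logits: {full / 2**30:.1f} GiB, token ids only: {reduced / 2**20:.2f} MiB")
```

Under these assumptions the full logits need tens of GiB while the argmax IDs need under 1 MiB, which is the motivation for reducing logits before they are accumulated.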

lucid parcel
#

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    data_collator = UnslothVisionDataCollator(model, tokenizer), # Must use!
    train_dataset = converted_dataset_train,
    eval_dataset = converted_dataset_val,
    compute_metrics=compute_metrics,

    args = SFTConfig(
        per_device_train_batch_size = 4,
        gradient_accumulation_steps = 4,

        eval_strategy="steps",
        eval_steps=10,

        warmup_steps = 0,
        max_steps = 150,
        #num_train_epochs = 1, # Set this instead of max_steps for full training runs
        learning_rate = 5e-5,
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none",     # For Weights and Biases

        # You MUST put the below items for vision finetuning:
        remove_unused_columns = False,
        dataset_text_field = "",
        dataset_kwargs = {"skip_prepare_dataset": True},
        max_length = 2048,

        metric_for_best_model="eval_loss",
        greater_is_better=False,
    ),
)

Im wondering, how can i use epochs instead of steps to define everything? when i put 150 steps i dont know if it is 1 epoch or more @merry summit

merry summitBOT
#

To use epochs instead of steps, set num_train_epochs=1 (or your desired number) and remove or comment out max_steps. This will train for a full pass over your dataset per epoch, making it clear how many epochs you run. Unsloth and TRL will automatically calculate the number of steps per epoch based on your dataset size and batch settings. See Unsloth fine-tuning guide.

Would you like a code example showing this change?
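To see how a step count relates to epochs, the arithmetic can be sketched as below. The dataset size of 2,000 examples is a made-up number, the function names are hypothetical, and the exact rounding the trainer applies may differ slightly between library versions.

```python
import math

def steps_per_epoch(num_examples, per_device_batch_size, grad_accum_steps, num_devices=1):
    """Approximate number of optimizer steps in one pass over the dataset."""
    batches = math.ceil(num_examples / (per_device_batch_size * num_devices))
    return max(1, math.ceil(batches / grad_accum_steps))

def epochs_covered(max_steps, num_examples, per_device_batch_size, grad_accum_steps):
    """How many epochs a given max_steps corresponds to."""
    return max_steps / steps_per_epoch(num_examples, per_device_batch_size, grad_accum_steps)

# Batch settings from the config above, with an assumed dataset of 2,000 examples.
print(steps_per_epoch(2000, 4, 4))
print(epochs_covered(150, 2000, 4, 4))
```

Plugging in your real dataset size tells you whether max_steps = 150 is less or more than one epoch.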

lucid parcel
#

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    print("logits, labels", logits, labels)
    preds = (logits > 0).astype(int)
    preds_text = ["Yes" if p == 1 else "No" for p in preds]
    labels_text = ["Yes" if l == 1 else "No" for l in labels]
    return {"accuracy": accuracy_score(labels_text, preds_text)}
@merry summit the metrics make the evaluation really slow

merry summitBOT
#

Yes, compute_metrics is the likely cause. The Python loops are only part of it; the larger cost is that the trainer has to collect the logits for the whole evaluation set before it calls your function. This is a known bottleneck in Unsloth and TRL, particularly with large eval datasets. To speed up, remove the debug print, reduce the logits to token IDs with preprocess_logits_for_metrics, set eval_accumulation_steps, and vectorize the metric code. See Unsloth evaluation FAQ and Unsloth evaluation loop wiki.

Would you like a code example for a faster, vectorized metric function?

lucid parcel
#

per_device_train_batch_size = 16,
gradient_accumulation_steps = 2,

@merry summit when i increase the batch size and lower gradient accumulation the loss quickly goes down, what is the advice for picking this

merry summitBOT
#

per_device_train_batch_size and gradient_accumulation_steps multiply to give the effective batch size, which is what determines how each optimizer step behaves. If your previous run used 4 × 4 = 16 as in the config above, the new setting of 16 × 2 = 32 averages each step over twice as many examples, and the loss per step falls faster partly for that reason. For a fixed effective batch size, the split between the two mainly affects memory and speed: a larger per-device batch uses more VRAM but needs fewer forward passes per step. The recommended approach is to set the largest per-device batch size that fits in your GPU memory, then use gradient_accumulation_steps to reach the effective batch size you want; 16 is a good starting point (Unsloth LoRA hyperparameters guide).

Would you like more detail on tuning these parameters for your hardware?

lucid parcel
#

from transformers import EarlyStoppingCallback
early_stopping_callback = EarlyStoppingCallback(
    early_stopping_patience = 2,     # How many steps we will wait if the eval loss doesn't decrease
                                     # For example the loss might increase, but decrease after 3 steps
    early_stopping_threshold = 0.0,  # Can set higher - sets how much loss should decrease by until
                                     # we consider early stopping. For eg 0.01 means if loss was
                                     # 0.02 then 0.01, we consider to early stop the run.
)
trainer.add_callback(early_stopping_callback)

@merry summit does early stopping work even if i use:

num_train_epochs = 1 instead of steps

merry summitBOT
#

Yes, early stopping works with num_train_epochs=1 instead of max_steps; it monitors the evaluation metric (e.g., eval_loss) at each evaluation step and will stop training early if the metric does not improve, regardless of whether you use epochs or steps. This is supported in Unsloth and Hugging Face Trainer (Unsloth early stopping docs).

Would you like a code example showing early stopping with epochs?

lucid parcel
#

do i need to explicitly save at intervals to use early stopping or will it save at the one it stops at by default? @merry summit

merry summitBOT
#

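Early stopping itself does not need checkpoints: when patience runs out, training simply stops, and the model in memory has the weights from the last step, which is a few evaluations past the best one. To get the best weights back, set load_best_model_at_end=True. That option requires save_strategy to match eval_strategy, and with "steps" the value of save_steps must be a round multiple of eval_steps, so that a checkpoint exists for the best evaluation (Unsloth early stopping docs).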

Would you like a code example for configuring this?
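The constraints between these settings can be checked before a run starts. The sketch below works on a plain dictionary of the keyword arguments you intend to pass to SFTConfig; `early_stopping_problems` is a hypothetical helper, not part of any library, and it only encodes the rules mentioned in this thread.

```python
def early_stopping_problems(cfg: dict) -> list:
    """Return human-readable problems with an early-stopping configuration."""
    problems = []
    eval_strategy = cfg.get("eval_strategy", "no")
    if eval_strategy == "no":
        problems.append("eval_strategy must not be 'no'")
    if cfg.get("metric_for_best_model") is None:
        problems.append("set metric_for_best_model, e.g. 'eval_loss'")
    if cfg.get("load_best_model_at_end"):
        if cfg.get("save_strategy") != eval_strategy:
            problems.append("save_strategy must match eval_strategy")
        elif eval_strategy == "steps":
            save_steps, eval_steps = cfg.get("save_steps"), cfg.get("eval_steps")
            if not save_steps or not eval_steps or save_steps % eval_steps != 0:
                problems.append("save_steps must be a multiple of eval_steps")
    return problems

config = {
    "eval_strategy": "steps", "eval_steps": 50,
    "save_strategy": "steps", "save_steps": 50,
    "metric_for_best_model": "eval_loss", "greater_is_better": False,
    "load_best_model_at_end": True,
}
print(early_stopping_problems(config))
```

An empty list means the dictionary satisfies these rules; the same keys can then be passed on as `SFTConfig(**config, ...)` together with your other arguments.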

lucid parcel
#

Step Training Loss Validation Loss
25 5.211700 5.163057

is this a good start for a 223 step training run? at step 25?

#

@merry summit

merry summitBOT
#

It seems likely that a training loss and validation loss around 5.2 at step 25 is typical for the early phase of vision or multimodal model fine-tuning in Unsloth, especially if your loss decreases over time. High initial losses (5–7) are common and expected, as noted in the Unsloth Gemma 3N fine-tuning guide.

Would you like more detail on interpreting loss curves or what to expect as training progresses?

lucid parcel
#

@merry summit what loss should i expect if the labels are binary "yes" or "no" ? when it is finished training

merry summitBOT
#

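There is no fixed target, because the value depends on which tokens the loss is averaged over. As a reference point, if the loss is computed only on a single "Yes"/"No" token, a model that guesses 50/50 scores ln 2 ≈ 0.69, so a useful model should end clearly below that; any extra answer tokens that are easy to predict, such as an end-of-turn token, pull the average lower still. A loss near zero is therefore not by itself a sign of overfitting. Judge overfitting from the gap between training and validation loss, and judge quality from accuracy on held-out images (Unsloth LoRA hyperparameters guide).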

Would you like more detail on interpreting binary classification loss or how to check for overfitting?
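A short worked calculation relates a loss value to the probability the model puts on the correct answer. It assumes the loss is the mean cross-entropy over the answer token plus, optionally, some extra tokens that are predicted with a given probability; `mean_answer_loss` is a hypothetical helper.

```python
import math

def mean_answer_loss(p_correct, extra_tokens=0, p_extra=1.0):
    """Mean cross-entropy over one answer token plus `extra_tokens` other tokens."""
    total = -math.log(p_correct) + extra_tokens * -math.log(p_extra)
    return total / (1 + extra_tokens)

print(mean_answer_loss(0.5))                   # coin flip on a single Yes/No token
print(mean_answer_loss(0.5, extra_tokens=1))   # same, diluted by one certain token
print(mean_answer_loss(0.99))                  # 99% probability on the right answer
```

Under this assumption a loss around 0.01 corresponds to roughly 99% probability on the correct token, which is plausible for an easy task and does not on its own indicate a problem.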

lucid parcel
#

@merry summit if i end at 1.5 but it has not plateaued yet. should i just increase learning rate or change something else?

merry summitBOT
#

If your loss ends at 1.5 and hasn't plateaued, you should first try increasing the number of epochs or steps, not the learning rate. Increasing the learning rate can destabilize training or cause overshooting. Extending training duration is the recommended approach if the loss is still decreasing (Unsloth LoRA hyperparameters guide).

Would you like more detail on tuning learning rate versus training duration?

lucid parcel
#

well i could do that but i dont have any data augmentation so it will be training on the same data as in epoch 1 @merry summit

merry summitBOT
#

It seems likely that training for more epochs on the same data (without augmentation) can still help if the model is underfitting, but it increases the risk of overfitting—especially with small datasets. If you notice validation loss starts to rise or accuracy plateaus, it's a sign to stop or consider data augmentation or regularization (Unsloth LoRA hyperparameters guide).

Would you like suggestions for simple data augmentation or regularization techniques?

lucid parcel
#

i will train for about 200 steps, after 50 i am already down to 0.010 validation loss

#

@merry summit

merry summitBOT
#

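A validation loss of 0.010 after 50 steps (out of 200) is very low, but a low validation loss is not evidence of overfitting. Overfitting would show up as validation loss flattening or rising while training loss keeps falling. What the number does suggest is that the task, as measured by the loss, is mostly learned already, so the remaining 150 steps may add little. Use early stopping to end the run when validation loss stops improving, and check accuracy on held-out images (Unsloth LoRA hyperparameters guide).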

Would you like advice on early stopping or regularization to prevent overfitting?

lucid parcel
#

@merry summit yes

merry summitBOT
#

I'm here to help with questions about Unsloth, its features, fine-tuning, troubleshooting, and related topics. Could you please clarify or provide more details about your question regarding Unsloth?

lucid parcel
#

@merry summit how can it be overfitting when validation loss is going down along side training loss?

merry summitBOT
#

It seems likely that if both training and validation loss are decreasing together, your model is not overfitting yet. Overfitting typically shows as training loss decreasing while validation loss plateaus or increases. As long as both losses drop, your model is still learning generalizable patterns (Unsloth LoRA hyperparameters guide).

Would you like more detail on how to monitor for overfitting or when to stop training?
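The pattern described above can be turned into a simple check over logged losses. This is only a heuristic sketch: `overfitting_started_at` is a hypothetical helper, the `patience` of 2 evaluations is an arbitrary choice, and the example history is made up.

```python
def overfitting_started_at(history, patience=2):
    """history: list of (step, train_loss, val_loss).

    Returns the step of the best validation loss if at least `patience` later
    evaluations failed to beat it while training loss kept falling, else None.
    """
    best_index = min(range(len(history)), key=lambda i: history[i][2])
    later = history[best_index + 1:]
    train_still_falling = bool(later) and later[-1][1] < history[best_index][1]
    if len(later) >= patience and train_still_falling:
        return history[best_index][0]
    return None

# Made-up run: validation loss bottoms out at step 20 while training loss keeps dropping.
diverging = [(10, 1.0, 1.0), (20, 0.5, 0.6), (30, 0.3, 0.7), (40, 0.2, 0.8)]
# Made-up run: both losses still improving at the last evaluation.
healthy = [(10, 1.0, 1.0), (20, 0.5, 0.6), (30, 0.3, 0.4)]

print(overfitting_started_at(diverging))
print(overfitting_started_at(healthy))
```

When the best validation loss is the most recent one, as in the second run, the check returns None and there is nothing to worry about yet.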

lucid parcel
#

per_device_train_batch_size = 16,
gradient_accumulation_steps = 1,

    eval_strategy="steps",
    eval_steps=50,

    save_strategy="steps",
    save_steps = 50,


    warmup_steps = 10,
    #max_steps = 150,
    num_train_epochs = 2, # Set this instead of max_steps for full training runs
    learning_rate = 2e-5,
    logging_steps = 1,
    optim = "adamw_8bit",
    weight_decay = 0.01,
    lr_scheduler_type = "linear",
    seed = 3407,
    output_dir = "outputs",
    report_to = "none",     # For Weights and Biases

finetune_vision_layers = False, # False if not finetuning vision layers
finetune_language_layers = True, # False if not finetuning language layers
finetune_attention_modules = True, # False if not finetuning attention layers
finetune_mlp_modules = False, # False if not finetuning MLP layers

r = 16,           # The larger, the higher the accuracy, but might overfit
lora_alpha = 16,  # Recommended alpha == r at least
lora_dropout = 0.1,
bias = "none",
random_state = 3407,
use_rslora = False,  # We support rank stabilized LoRA
loftq_config = None, # And LoftQ

@merry summit my validation and training loss drops to 0.01 after 50 steps (i want to train for 400). what should i change?