#gemma3 loses multimodal support in Ollama when finetuned with Unsloth


near ferry
#

As the title suggests, I finetuned Gemma3 following the "Gemma3:4b" Colab, and after finetuning I can no longer add images in Ollama. Is there something specific I have to do to get Ollama to recognize that it's a vision model?

near ferry
#

Interestingly, the model seems to lose 0.4B parameters when finetuned, at least according to ollama show.

Base gemma3:4b:

Model
    architecture        gemma3    
    parameters          4.3B      
    context length      8192      
    embedding length    2560      
    quantization        Q4_K_M    

  Parameters
    stop           "<end_of_turn>"    
    temperature    0.1                

  License
    Gemma Terms of Use                  
    Last modified: February 21, 2024

Finetuned gemma3:4b:

Model
    architecture        gemma3    
    parameters          3.9B      
    context length      131072    
    embedding length    2560      
    quantization        Q8_0      

  Parameters
    min_p          0                  
    num_predict    32768              
    stop           "<end_of_turn>"    
    stop           "<eos>"            
    temperature    1                  
    top_k          64                 
    top_p          0.95 

Does that mean it's losing its vision layers? That's a suspiciously large number of parameters to just vanish into thin air.
Is there something specific you have to do to retain the vision capability?
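
For reference, a minimal sketch of the comparison above: it pulls the "parameters" line out of two ollama show outputs and prints the gap. The helper name parse_parameters is made up for this example, and the two strings are trimmed copies of the outputs pasted above. A gap of about 0.4B would be consistent with a vision tower of roughly that size being dropped, but that reading is an assumption; the number alone does not prove it.

```python
import re


def parse_parameters(show_output: str) -> float:
    """Return the parameter count (in billions) from `ollama show` text."""
    # Matches the lowercase "parameters   4.3B" row, not the "Parameters" header.
    match = re.search(r"^\s*parameters\s+([\d.]+)B\s*$", show_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'parameters' line found")
    return float(match.group(1))


# Trimmed copies of the two outputs from this thread.
base = """
Model
    architecture        gemma3
    parameters          4.3B
"""
finetuned = """
Model
    architecture        gemma3
    parameters          3.9B
"""

missing = parse_parameters(base) - parse_parameters(finetuned)
print(f"missing: {missing:.1f}B parameters")
```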

#

i'm assuming i literally just have to set this to true? lol.

#

i assumed it would change the vision layers, not remove them, but based on that comment i assume it's just re-adding it unless i give it images to finetune with

near ferry
#

Nope. Ollama still doesn't recognize it as an image model ://

toxic nexus
#

This is old but

#

for the time being

#

create an ollama Modelfile inside the local model directory with the model in safetensors format

#

then you can use ollama itself to convert it into gguf

#

cd into the model directory with the Modelfile and the model in safetensors format and do the following

#
ollama create --quantize Q5_K_M $MODEL_NAME -f ./Modelfile
minor path
toxic nexus
#

oh i am not sure how to write Gemma3 model files, it's an ollama specific thing

#

but let me look around

#

something like this

FROM ./

PARAMETER temperature 1
PARAMETER top_k 64
PARAMETER top_p 0.95
PARAMETER min_p 0.0
PARAMETER num_ctx 8192
PARAMETER stop "<end_of_turn>"

TEMPLATE """
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 }}
{{- if or (eq .Role "user") (eq .Role "system") }}<start_of_turn>user
{{ .Content }}<end_of_turn>
{{ if $last }}<start_of_turn>model
{{ end }}
{{- else if eq .Role "assistant" }}<start_of_turn>model
{{ .Content }}{{ if not $last }}<end_of_turn>
{{ end }}
{{- end }}
{{- end }}"""
#

because you're putting the Modelfile in the same directory as the model, and the Modelfile should point to the directory where the model is, i am assuming you'll have to do

FROM ./

which means from current directory

#

when you quantize

#

you'll need to point to the model file

#

so cd into the directory

#

and then

#
ollama create --quantize q5_K_M $MODEL_NAME -f "$REPO_DIR/Modelfile"

in this case you might do

ollama create --quantize q5_K_M gemma3-finetune -f ./Modelfile
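
The same steps can be scripted. This is only a sketch under assumptions: write_modelfile and create_command are hypothetical helper names, the Modelfile is cut down to three lines from the one above, and a temporary directory stands in for the real model directory holding the safetensors files. The actual ollama call is left commented out.

```python
import tempfile
from pathlib import Path


def write_modelfile(model_dir: Path) -> Path:
    """Write a minimal Modelfile that points at the model in model_dir."""
    modelfile = model_dir / "Modelfile"
    modelfile.write_text(
        "FROM ./\n"
        "PARAMETER temperature 1\n"
        'PARAMETER stop "<end_of_turn>"\n'
    )
    return modelfile


def create_command(name: str, modelfile: Path, quant: str = "q5_K_M") -> list[str]:
    """Build the `ollama create` argument list used in this thread."""
    return ["ollama", "create", "--quantize", quant, name, "-f", str(modelfile)]


with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp)  # stand-in for the directory with the safetensors
    modelfile_text = write_modelfile(model_dir).read_text()
    cmd = create_command("gemma3-finetune", Path("Modelfile"))
    print(" ".join(cmd))
    # To actually run it from inside the model directory:
    # import subprocess; subprocess.run(cmd, cwd=model_dir, check=True)
```

Running with cwd set to the model directory mirrors the "cd into the directory" step, so FROM ./ resolves to the folder that holds the weights.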
minor path
#

tysm very appreciate it, will try that

minor path
#

tried that but the issue still remains

#

really weird

toxic nexus
#

maybe @whole saffron can pitch in here. i worked out with him on the gemma3-vision

minor path
#

hey guys, any update?

#

im using those train args idk if im missing something

from unsloth import FastVisionModel, is_bf16_supported
from unsloth.trainer import UnslothVisionDataCollator
from trl import SFTTrainer, SFTConfig

# model, tokenizer and converted_dataset come from earlier notebook cells
FastVisionModel.for_training(model) # Enable for training!

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    data_collator = UnslothVisionDataCollator(model, tokenizer), # Must use!
    train_dataset = converted_dataset,
    args = SFTConfig(
        per_device_train_batch_size = 1,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        #max_steps = 30,
        num_train_epochs = 4, # Set this instead of max_steps for full training runs
        learning_rate = 2e-4,
        fp16 = not is_bf16_supported(),
        bf16 = is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # For Weights and Biases

        # You MUST put the below items for vision finetuning:
        remove_unused_columns = False,
        dataset_text_field = "",
        dataset_kwargs = {"skip_prepare_dataset": True},
        dataset_num_proc = 4,
        max_seq_length = 2048,
    ),
)

near ferry
#

No, unfortunately. Converting the model with Ollama itself also results in a loss of vision, regardless of what quantization you use. Unless either that or Unsloth's quantizer are fixed, this won't work.

celest estuary
#

are you using ollama >= 0.6.2?

near ferry
#

I am using 0.6.2

minor path
#

I have 0.6.6

toxic nexus
#

however it's constrained in the sense that ollama requires the model to be either float16 or float32 to convert it

near ferry
#

So do you mean we have to convert it to a float16, or that we have to convert from a float16? If it's the latter, I'm already doing that: I'm quantizing from the float16 vLLM output on the Gemma3 Colab. It shouldn't be the former, because Ollama runs tons of other q4 and int8 quantized vision models, including the base Gemma3 model itself.

toxic nexus
#

from a float 16

#

however a model saved as float16 doesn't necessarily contain only float16 tensors

#

that's why you might need to save to float32

#

ie, notice that if you take an original gemma3-4b-it from google, non quantized

#

and you quantize it with ollama, it works

#

but every time you have quantization involved in your source model, it's not working

#

In my opinion that's an ollama quantizer design problem because llama.cpp doesn't really have that issue

toxic nexus
#

if you look at a model that was imported as 4bit, lora finetuned then saved to float16, it still contains int8 parameters

#

actually to be more precise, unsloth aside, use transformers to import a model with bitsandbytes 4-bit quantization applied

#

it has a mix of different parameter dtypes
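
One way to check this on a saved checkpoint without loading it is to read the safetensors header, which is an 8-byte little-endian length followed by a JSON table of tensor names, dtypes and shapes. The sketch below counts dtypes and lists tensors whose names look like vision weights. The function names are made up for this example, the prefixes vision_tower and multi_modal_projector are an assumption about how the Gemma3 checkpoint names its vision tensors, and the demo writes a tiny fake file rather than touching a real model.

```python
import json
import os
import struct
import tempfile
from collections import Counter


def read_safetensors_header(path):
    """Return the JSON tensor table stored at the start of a .safetensors file."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # u64, little-endian
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return header


def summarize(header, vision_markers=("vision_tower", "multi_modal_projector")):
    """Count tensors per dtype and collect names that look like vision weights."""
    dtypes = Counter(info["dtype"] for info in header.values())
    vision = [name for name in header if any(m in name for m in vision_markers)]
    return dtypes, vision


# Demo on a tiny fake file (hypothetical tensor names, zeroed data).
fake = {
    "language_model.model.embed_tokens.weight":
        {"dtype": "F16", "shape": [2, 2], "data_offsets": [0, 8]},
    "vision_tower.patch_embedding.weight":
        {"dtype": "F32", "shape": [1], "data_offsets": [8, 12]},
}
raw = json.dumps(fake).encode("utf-8")
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.safetensors")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(raw)) + raw + bytes(12))
    dtypes, vision = summarize(read_safetensors_header(path))

print(dict(dtypes))
print(vision)
```

On a real export, run it over every shard in the model directory. An empty vision list would suggest the vision weights never made it into the saved files, and integer dtypes such as U8 or I8 next to F16 would match the mix described above.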

#

take an unsloth finetuned gemma3 model, use the latest llama.cpp convert_hf_to_gguf.py script to convert it
then do image inference on it using the new syntax, it preserves vision capabilities

#

it's a mess honestly... ollama no longer accepts gguf where the core model and the vision mmproj file are separate files (llama.cpp does)

minor path
#

sorry for this dumb question, kinda new. Does this mean i can't fine tune in 4bit?

#

if possible to share ur code/notebook would be so useful