#gemma3 loses multimodal support in Ollama when finetuned with Unsloth
Interestingly, the model seems to lose about 0.4B parameters when finetuned, at least according to `ollama show`.
Base gemma3:4b:
  Model
    architecture        gemma3
    parameters          4.3B
    context length      8192
    embedding length    2560
    quantization        Q4_K_M
  Parameters
    stop           "<end_of_turn>"
    temperature    0.1
  License
    Gemma Terms of Use
    Last modified: February 21, 2024
Finetuned gemma3:4b:
  Model
    architecture        gemma3
    parameters          3.9B
    context length      131072
    embedding length    2560
    quantization        Q8_0
  Parameters
    min_p          0
    num_predict    32768
    stop           "<end_of_turn>"
    stop           "<eos>"
    temperature    1
    top_k          64
    top_p          0.95
Does that mean it's losing its vision layers? That's a suspiciously large number of parameters to just vanish into thin air.
Is there something specific you have to do to retain the vision capability?
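One way to narrow this down is to check whether the vision weights are still present in the safetensors export before any GGUF conversion happens. The sketch below reads only the JSON header of each `.safetensors` shard (an 8-byte little-endian length followed by the header) and sums parameter counts. The marker substrings `vision_tower` and `multi_modal_projector` are an assumption about how the vision tensors are named in the export, and `count_params` is a hypothetical helper, not part of Ollama or Unsloth.

```python
import json
import struct
from collections import Counter
from math import prod
from pathlib import Path

# Assumption: substrings that identify vision-related tensors in the export.
VISION_MARKERS = ("vision_tower", "multi_modal_projector")


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading weights."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return header


def count_params(model_dir):
    """Sum element counts over all shards, split into 'vision' and 'other'."""
    totals = Counter()
    for shard in sorted(Path(model_dir).glob("*.safetensors")):
        for name, info in read_safetensors_header(shard).items():
            is_vision = any(marker in name for marker in VISION_MARKERS)
            totals["vision" if is_vision else "other"] += prod(info["shape"])
    return totals
```

Calling `count_params("path/to/export")` on the finetuned export and seeing no `vision` entry would suggest the weights were dropped at save time rather than during conversion. Counts are based on stored tensor shapes, so they are only meaningful for unquantized (float) exports.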
i'm assuming i literally just have to set this to true? lol.
i assumed it would change the vision layers, not remove them, but based on that comment i assume it's just re-adding it unless i give it images to finetune with
Nope. Ollama still doesn't recognize it as an image model ://
This is old but
for the time being
create an ollama Modelfile inside the local model directory with the model in safetensors format
then you can use ollama itself to convert it into gguf
cd into the model directory with the Modelfile and the model in safetensors format and do the following
ollama create --quantize Q5_K_M $MODEL_NAME -f ./Modelfile
can you please share the content of the Modelfile? I'm facing the same issue
oh i am not sure how to write Gemma3 model files, it's an ollama specific thing
but let me look around
something like this
FROM ./
PARAMETER temperature 1
PARAMETER top_k 64
PARAMETER top_p 0.95
PARAMETER min_p 0.0
PARAMETER num_ctx 8192
PARAMETER stop "<end_of_turn>"
TEMPLATE """
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 }}
{{- if or (eq .Role "user") (eq .Role "system") }}<start_of_turn>user
{{ .Content }}<end_of_turn>
{{ if $last }}<start_of_turn>model
{{ end }}
{{- else if eq .Role "assistant" }}<start_of_turn>model
{{ .Content }}{{ if not $last }}<end_of_turn>
{{ end }}
{{- end }}
{{- end }}"""
because you're putting the Modelfile in the same directory as the model, and the Modelfile should point to the directory where the model is, i am assuming you'll have to do
FROM ./
which means from current directory
when you quantize
you'll need to point to the model file
so cd into the directory
and then
ollama create --quantize q5_K_M $MODEL_NAME -f "$REPO_DIR/Modelfile"
in this case you might do
ollama create --quantize q5_K_M gemma3-finetune -f ./Modelfile
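The same steps can be scripted. This is a minimal sketch that only wraps the `ollama create` invocation quoted above; `build_create_command` and `create_model` are hypothetical helper names, and the sketch assumes the Modelfile already exists inside the model directory and that `ollama` is on the PATH.

```python
import subprocess
from pathlib import Path


def build_create_command(model_name, modelfile="./Modelfile", quantize="q5_K_M"):
    """Build the argument list for the `ollama create` call from the thread."""
    return ["ollama", "create", "--quantize", quantize, model_name, "-f", modelfile]


def create_model(model_dir, model_name, quantize="q5_K_M"):
    """Run `ollama create` with model_dir as the working directory.

    Running from inside model_dir means ./Modelfile and the Modelfile's
    `FROM ./` line both refer to that directory.
    """
    model_dir = Path(model_dir)
    if not (model_dir / "Modelfile").is_file():
        raise FileNotFoundError(f"no Modelfile found in {model_dir}")
    cmd = build_create_command(model_name, "./Modelfile", quantize)
    subprocess.run(cmd, cwd=model_dir, check=True)  # raises if ollama fails
```

For example, `create_model("path/to/export", "gemma3-finetune")` would run the last command shown above from inside the export directory.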
tysm very appreciate it, will try that
maybe @whole saffron can pitch in here. i worked out with him on the gemma3-vision
hey guys, any update?
im using those train args idk if im missing something
from unsloth import FastVisionModel, is_bf16_supported
from unsloth.trainer import UnslothVisionDataCollator
from trl import SFTTrainer, SFTConfig
# model, tokenizer and converted_dataset are created earlier (not shown here)
FastVisionModel.for_training(model) # Enable for training!
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    data_collator = UnslothVisionDataCollator(model, tokenizer), # Must use!
    train_dataset = converted_dataset,
    args = SFTConfig(
        per_device_train_batch_size = 1,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        #max_steps = 30,
        num_train_epochs = 4, # Set this instead of max_steps for full training runs
        learning_rate = 2e-4,
        fp16 = not is_bf16_supported(),
        bf16 = is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # For Weights and Biases
        # You MUST put the below items for vision finetuning:
        remove_unused_columns = False,
        dataset_text_field = "",
        dataset_kwargs = {"skip_prepare_dataset": True},
        dataset_num_proc = 4,
        max_seq_length = 2048,
    ),
)
No, unfortunately. Converting the model with Ollama itself also results in a loss of vision, regardless of what quantization you use. Unless either that or Unsloth's quantizer is fixed, this won't work.
are you using ollama >= 0.6.2?
I am using 0.6.2
I have 0.6.6
Actually, converting the model using ollama seems to work. made a couple of runs yesterday and they worked
however it's constrained in the sense that ollama requires the model to be either float16 or float32 to convert it
So do you mean we have to convert it to a float16, or that we have to convert from a float16? If it's the latter, I'm already doing that, I'm quantizing from the float16 vLLM output on the Gemma3 Colab. It shouldn't be the former, because Ollama runs tons of other q4 and int8 quantized vision models, including the base Gemma3 model itself
from a float16
however a model saved as float16 does not necessarily contain only float16 tensors
that's why you might need to save to float32
ie, notice that if you take an original gemma3-4b-it from google, non quantized
and you quantize it with ollama, it works
but every time you have quantization involved in your source model, it's not working
In my opinion that's an ollama quantizer design problem because llama.cpp doesn't really have that issue
if you look at the parameters in a float32, they're all float32
if you look at a model that was imported as 4bit, lora finetuned then saved to float16, it still contains int8 parameters
actually to be more precise, unsloth aside, use transformers to import a model with bitsandbytes 4-bit quantization applied
it has a mix of different parameter dtypes
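To see whether an export really is uniform, the dtypes can be tallied straight from the safetensors headers. This is a rough sketch using only the standard library; `dtype_histogram` and `is_pure_float` are hypothetical helpers, and treating only `F16` and `F32` as acceptable is an assumption taken from the float16/float32 constraint described above.

```python
import json
import struct
from collections import Counter
from pathlib import Path


def dtype_histogram(model_dir):
    """Count tensors per dtype string across every .safetensors shard."""
    counts = Counter()
    for shard in sorted(Path(model_dir).glob("*.safetensors")):
        with open(shard, "rb") as f:
            (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
            header = json.loads(f.read(header_len))
        for name, info in header.items():
            if name != "__metadata__":  # skip the optional metadata entry
                counts[info["dtype"]] += 1
    return counts


def is_pure_float(counts, allowed=("F16", "F32")):
    """True when at least one tensor exists and all dtypes are in `allowed`."""
    return bool(counts) and set(counts) <= set(allowed)
```

If `dtype_histogram("path/to/export")` reports entries such as `U8` or `I8` next to `F16`, the export still carries quantized tensors, which would match the mixed-dtype situation described here.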
take an unsloth finetuned gemma3 model, use the latest llama.cpp convert_hf_to_gguf.py script to convert it
then do image inference on it using the new syntax, it preserves vision capabilities
it's a mess honestly... ollama no longer accepts gguf where the core model and the vision mmproj file are separate files (llama.cpp does)