#Failed to load quantized Gemma model from local directory after saving with save_pretrained

thick briar
#

Hi, I'm trying to finetune Gemma3. I am running into issues trying to load the model from disk.

ValueError: Supplied state dict for language_model.model.layers.0.mlp.down_proj.weight does not contain `bitsandbytes__*` and possibly other `quantized_stats` components.

Relevant GitHub Issue:
https://github.com/unslothai/unsloth/issues/638

Would very much appreciate the help!

Minimal Example:

from unsloth import FastModel
import torch
import os

model_name = "unsloth/gemma-3-12b-pt-unsloth-bnb-4bit"
# The load step is missing from the pasted snippet; presumably something like:
model, tokenizer = FastModel.from_pretrained(model_name)

os.makedirs("gemma_unmodified", exist_ok=True)
model.save_pretrained("gemma_unmodified")
tokenizer.save_pretrained("gemma_unmodified")
print("Model and tokenizer saved to 'gemma_unmodified' directory")

# Load the model and tokenizer
from unsloth import FastModel
from transformers import AutoTokenizer

model = FastModel.from_pretrained("/workspace/Chess/Gemma_unsloth/gemma_unmodified", local_files_only=True)
tokenizer = AutoTokenizer.from_pretrained("/workspace/Chess/Gemma_unsloth/gemma_unmodified", local_files_only=True)

Relevant Packages:
accelerate 1.5.2
bitsandbytes 0.45.3
sentencepiece 0.2.0
torch 2.6.0
torchaudio 2.6.0
torchvision 0.21.0
transformers 4.50.0.dev0
unsloth 2025.3.17
unsloth_zoo 2025.3.15

!pip install --no-deps unsloth vllm
!pip install --no-deps git+https://github.com/huggingface/[email protected]

Potentially relevant from config.json:
"quantization_config": {
  "bnb_4bit_compute_dtype": "bfloat16",
  "bnb_4bit_quant_type": "nf4",
  "bnb_4bit_use_double_quant": true,
  "llm_int8_enable_fp32_cpu_offload": false,
  "llm_int8_has_fp16_weight": false,
  "llm_int8_skip_modules": null,
  "llm_int8_threshold": 6.0,
  "load_in_4bit": true,
  "load_in_8bit": false,
  "quant_method": "bitsandbytes"
},

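One way to narrow this down is to check what actually landed on disk. The error complains that the state dict has no `bitsandbytes__*` entries, so a rough diagnostic is to list the tensor names in the saved `.safetensors` files and count the ones containing that marker. The sketch below uses only the standard library. The helper names `read_safetensors_keys` and `inspect_saved_dir` are made up for illustration, and it assumes the directory contains `config.json` plus one or more `*.safetensors` files.

```python
import json
import struct
from pathlib import Path


def read_safetensors_keys(path):
    """Return the tensor names stored in a .safetensors file without loading weights."""
    with open(path, "rb") as f:
        # A safetensors file starts with an unsigned 64-bit little-endian integer
        # giving the header size, followed by that many bytes of JSON.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return [name for name in header if name != "__metadata__"]


def inspect_saved_dir(model_dir):
    """Summarize whether a saved directory looks like a complete bnb 4-bit checkpoint."""
    model_dir = Path(model_dir)
    config = json.loads((model_dir / "config.json").read_text())
    quant = config.get("quantization_config") or {}

    keys = []
    for shard in sorted(model_dir.glob("*.safetensors")):
        keys.extend(read_safetensors_keys(shard))

    # Assumption: serialized 4-bit quant states carry "bitsandbytes__" in their
    # tensor names, which is what the error message is looking for.
    quant_state_keys = [k for k in keys if "bitsandbytes__" in k]
    return {
        "declares_4bit": bool(quant.get("load_in_4bit")),
        "tensor_count": len(keys),
        "quant_state_count": len(quant_state_keys),
    }
```

Calling `inspect_saved_dir("gemma_unmodified")` and getting `declares_4bit` as true with `quant_state_count` at zero would be consistent with the error above: the config promises a 4-bit checkpoint, but the saved weights lack the quantization components the loader expects.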

tranquil heath
#

Found a fix by any chance?

gentle kayak
#

try this sequence

#

load the base model

model, tokenizer = FastLanguageModel.from_pretrained(...)

load peft

model = FastLanguageModel.get_peft_model(model, ...)  # plus your other arguments

load adapter
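
Put together, the sequence above might look like the sketch below. The message does not say how the adapter is loaded, so the last step is an assumption: it uses PEFT's `load_adapter` to read saved adapter weights into the `"default"` adapter created by `get_peft_model`. The function name `load_base_then_adapter` and its parameters are hypothetical, and the keyword arguments passed through `peft_kwargs` (rank, target modules, and so on) would need to match the ones the adapter was trained with.

```python
def load_base_then_adapter(base_model_name, adapter_dir, **peft_kwargs):
    """Load the base model, wrap it with PEFT, then load saved adapter weights."""
    # Imported lazily so that defining this helper does not require unsloth or a GPU.
    from unsloth import FastLanguageModel

    # 1. Load the base model from its original name, not from the re-saved directory.
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=base_model_name,
        load_in_4bit=True,
    )

    # 2. Attach a PEFT (LoRA) wrapper with the same settings used in training.
    model = FastLanguageModel.get_peft_model(model, **peft_kwargs)

    # 3. Assumption: load the saved adapter weights into the existing
    #    "default" adapter via PEFT's load_adapter.
    model.load_adapter(adapter_dir, adapter_name="default")
    return model, tokenizer
```

The point of this ordering is that only the small adapter is read from the local directory, while the quantized base weights come from the original checkpoint.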