Hi, I'm trying to fine-tune Gemma 3, but I run into an error when loading the model back from disk after saving it.
ValueError: Supplied state dict for language_model.model.layers.0.mlp.down_proj.weight does not contain bitsandbytes__* and possibly other quantized_stats components.
Relevant GitHub Issue:
https://github.com/unslothai/unsloth/issues/638
Would very much appreciate the help!
Minimal Example:
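from unsloth import FastModel
import torch
import os

model_name = "unsloth/gemma-3-12b-pt-unsloth-bnb-4bit"
# Load step (not shown in the original snippet; options other than load_in_4bit are not specified)
model, tokenizer = FastModel.from_pretrained(model_name=model_name, load_in_4bit=True)

os.makedirs("gemma_unmodified", exist_ok=True)
model.save_pretrained("gemma_unmodified")
tokenizer.save_pretrained("gemma_unmodified")
print("Model and tokenizer saved to 'gemma_unmodified' directory")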
# Load the model and tokenizer
from unsloth import FastModel
from transformers import AutoTokenizer

# FastModel.from_pretrained returns a (model, tokenizer) tuple; the ValueError above is raised inside this call
model, _ = FastModel.from_pretrained("/workspace/Chess/Gemma_unsloth/gemma_unmodified", local_files_only=True)
tokenizer = AutoTokenizer.from_pretrained("/workspace/Chess/Gemma_unsloth/gemma_unmodified", local_files_only=True)
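To see what the save step actually wrote, the .safetensors shards can be inspected without torch: each file starts with an 8-byte little-endian header length followed by a JSON header listing every tensor name. The sketch below (stdlib only) reads just those headers and lists the keys stored next to a given weight. The helper names are hypothetical, and the assumption that a bitsandbytes 4-bit weight carries a companion key such as ...weight.quant_state.bitsandbytes__nf4 is my reading of how transformers serializes these weights, not something confirmed in this thread.

```python
import json
import struct
from pathlib import Path

# Hypothetical diagnostic helpers; the "bitsandbytes__" key naming is an assumption.


def _decode_header(json_bytes: bytes) -> dict:
    """Turn the JSON header into {tensor_name: info}, dropping optional metadata."""
    header = json.loads(json_bytes)
    header.pop("__metadata__", None)
    return header


def parse_safetensors_header(raw: bytes) -> dict:
    """Parse the header from the leading bytes of a .safetensors file."""
    # Layout: 8-byte little-endian u64 length N, then N bytes of UTF-8 JSON.
    (header_len,) = struct.unpack("<Q", raw[:8])
    return _decode_header(raw[8 : 8 + header_len])


def read_checkpoint_keys(model_dir: str) -> dict:
    """Merge the headers of all *.safetensors shards in model_dir (tensor data is not read)."""
    merged = {}
    for shard in sorted(Path(model_dir).glob("*.safetensors")):
        with shard.open("rb") as f:
            (header_len,) = struct.unpack("<Q", f.read(8))
            merged.update(_decode_header(f.read(header_len)))
    return merged


def quant_components(keys, weight_name: str) -> list:
    """Suffixes of keys saved alongside weight_name, e.g. 'absmax'."""
    prefix = weight_name + "."
    return sorted(k[len(prefix):] for k in keys if k.startswith(prefix))


def has_bnb_quant_state(keys, weight_name: str) -> bool:
    # Assumption: a 4-bit bitsandbytes weight is saved with a packed
    # "quant_state.bitsandbytes__<quant_type>" entry next to it.
    return any(c.startswith("quant_state.bitsandbytes__") for c in quant_components(keys, weight_name))
```

For this checkpoint one would call read_checkpoint_keys("/workspace/Chess/Gemma_unsloth/gemma_unmodified") and pass "language_model.model.layers.0.mlp.down_proj.weight" to quant_components. If that returns an empty list, the weight may be stored under a different prefix on disk, so searching the keys for layers.0.mlp.down_proj is a reasonable fallback.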
Relevant Packages:
accelerate 1.5.2
bitsandbytes 0.45.3
sentencepiece 0.2.0
torch 2.6.0
torchaudio 2.6.0
torchvision 0.21.0
transformers 4.50.0.dev0
unsloth 2025.3.17
unsloth_zoo 2025.3.15
!pip install --no-deps unsloth vllm
!pip install --no-deps git+https://github.com/huggingface/[email protected]
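The package list above can be regenerated with the standard library's importlib.metadata, which makes it easy to re-check versions after reinstalling. A minimal sketch; the helper name installed_versions is hypothetical, and a package reported as None was simply not found under that distribution name in the current environment.

```python
from importlib import metadata

# Distribution names as listed in the report above.
PACKAGES = ["accelerate", "bitsandbytes", "sentencepiece", "torch", "torchaudio",
            "torchvision", "transformers", "unsloth", "unsloth_zoo"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if not found."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name} {ver or 'not installed'}")
```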
Potentially relevant section of the saved config.json:
"quantization_config": {
  "bnb_4bit_compute_dtype": "bfloat16",
  "bnb_4bit_quant_type": "nf4",
  "bnb_4bit_use_double_quant": true,
  "llm_int8_enable_fp32_cpu_offload": false,
  "llm_int8_has_fp16_weight": false,
  "llm_int8_skip_modules": null,
  "llm_int8_threshold": 6.0,
  "load_in_4bit": true,
  "load_in_8bit": false,
  "quant_method": "bitsandbytes"
},
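Given this quantization_config (nf4 with double quantization), a checkpoint saved in bitsandbytes 4-bit form should, as far as I understand the serialization, store a few extra tensors beside each quantized weight. The sketch below derives the expected key suffixes from the config; the function name is hypothetical and the exact suffixes (absmax, quant_map, nested_*, quant_state.bitsandbytes__<type>) are an assumption about how transformers and bitsandbytes name them.

```python
# Subset of the quantization_config from the saved config.json above.
QUANT_CONFIG = {
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
    "load_in_4bit": True,
    "quant_method": "bitsandbytes",
}


def expected_4bit_components(cfg: dict) -> list:
    """Key suffixes expected next to each quantized '<name>.weight' (assumed naming)."""
    if cfg.get("quant_method") != "bitsandbytes" or not cfg.get("load_in_4bit"):
        return []
    # Assumption: the default bnb_4bit_quant_type is "fp4" when the key is absent.
    quant_type = cfg.get("bnb_4bit_quant_type", "fp4")
    parts = ["absmax", "quant_map", f"quant_state.bitsandbytes__{quant_type}"]
    if cfg.get("bnb_4bit_use_double_quant"):
        # Double quantization also stores statistics for the quantized absmax.
        parts += ["nested_absmax", "nested_quant_map"]
    return sorted(parts)


print(expected_4bit_components(QUANT_CONFIG))
```

Comparing these suffixes against the tensor names in the checkpoint (for a sharded save, the weight_map in model.safetensors.index.json lists them) should show whether the quantization statistics were written at all.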