I'm trying to convert a fine-tuned Gemma 3 27B model (originally loaded in 4-bit with Unsloth) to Q8_0 GGUF by running llama.cpp's conversion script manually, but the conversion fails with an error.
Here's my setup and what I'm doing:
- Base model: "unsloth/gemma-3-27b-it-unsloth-bnb-4bit"
- Initial loading: FastModel.from_pretrained with load_in_4bit=True and max_seq_length=13500.
- LoRA setup: FastModel.get_peft_model with r=8.
- Training: SFTTrainer.train() completed successfully (with the correct max_seq_length=13500).
- Adapter save: model.save_pretrained("gemma-3-27b-quant-4bit-adapters") and tokenizer.save_pretrained(...) completed successfully.
- Loading the saved adapters: in a new session, I loaded the fine-tuned model with FastModel.from_pretrained("gemma-3-27b-quant-4bit-adapters", ...) using max_seq_length=13500 and load_in_4bit=True. This also worked.
- Manual merge and save to 16-bit: merged_model = model.merge_and_unload() followed by merged_model.save_pretrained("gemma-3-27b-quant-4bit-merged-16bit") ran without errors. The output directory contains config.json and the merged safetensors shards.
- Copied tokenizer: I copied the necessary tokenizer files from gemma-3-27b-quant-4bit-adapters/ to gemma-3-27b-quant-4bit-merged-16bit/ and confirmed they are present.
- Manual GGUF conversion attempt: I ran llama.cpp/convert_hf_to_gguf.py on the merged 16-bit directory:

    python llama.cpp/convert_hf_to_gguf.py "gemma-3-27b-quant-4bit-merged-16bit" --outfile "gemma-3-27b-gguf-q8_0.gguf" --outtype Q8_0
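Before running the converter, one way to confirm that the directory written by merge_and_unload() is really 16-bit is to read only the safetensors headers and count the tensor dtypes. This is a minimal diagnostic sketch, not part of Unsloth or llama.cpp; it assumes the standard safetensors layout (an 8-byte little-endian header length followed by a JSON header) and that bitsandbytes 4-bit weights are stored as packed U8 tensors. The helper names `read_safetensors_header` and `dtype_census` are hypothetical.

```python
import json
import struct
from collections import Counter
from pathlib import Path


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading any tensor data."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header size as a little-endian uint64.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional metadata entry, not a tensor
    return header


def dtype_census(model_dir):
    """Count tensor dtypes across every *.safetensors shard in model_dir."""
    counts = Counter()
    for shard in sorted(Path(model_dir).glob("*.safetensors")):
        for info in read_safetensors_header(shard).values():
            counts[info["dtype"]] += 1
    return counts


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "gemma-3-27b-quant-4bit-merged-16bit"
    census = dtype_census(target)
    print(dict(census))
    # A genuinely 16-bit merge should show only BF16/F16 (possibly some F32);
    # U8 entries suggest packed bitsandbytes 4-bit weights survived the merge.
    if "U8" in census:
        print("Warning: U8 tensors found; the shards may still be 4-bit quantized.")
```

If U8 tensors appear, the "16-bit" directory most likely still holds quantized base weights, which would be consistent with the converter failing on quantization-related tensor names.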
Error encountered:

    ValueError: Can not map tensor 'model.layers.0.mlp.down_proj.weight.absmax'
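The tensor name in the error ends in `.absmax`, which is the kind of name bitsandbytes uses for 4-bit quantization state, so the shards may still contain quantized weights rather than plain 16-bit ones. The sketch below scans the shard headers and lists any such tensors. It is a hypothetical diagnostic, not an Unsloth or llama.cpp tool; the marker list is an assumption based on how bitsandbytes serializes 4-bit parameters (`absmax`, `quant_map`, their `nested_` variants, and `quant_state.*`) and may not be exhaustive.

```python
import json
import struct
from pathlib import Path

# Assumed name patterns of bitsandbytes 4-bit quantization-state tensors.
BNB_STATE_SUFFIXES = (".absmax", ".quant_map", ".nested_absmax", ".nested_quant_map")
BNB_STATE_INFIX = ".quant_state."


def is_quant_state_key(name):
    """True if a tensor name looks like bitsandbytes quantization state."""
    return name.endswith(BNB_STATE_SUFFIXES) or BNB_STATE_INFIX in name


def find_quant_state_keys(model_dir):
    """Map each shard file name to the quant-state tensor names it contains."""
    hits = {}
    for shard in sorted(Path(model_dir).glob("*.safetensors")):
        with open(shard, "rb") as f:
            # safetensors: 8-byte little-endian header length, then a JSON header.
            (header_len,) = struct.unpack("<Q", f.read(8))
            header = json.loads(f.read(header_len))
        keys = [k for k in header if k != "__metadata__" and is_quant_state_key(k)]
        if keys:
            hits[shard.name] = keys
    return hits


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "gemma-3-27b-quant-4bit-merged-16bit"
    hits = find_quant_state_keys(target)
    for shard, keys in hits.items():
        print(f"{shard}: {len(keys)} quant-state tensors, e.g. {keys[0]}")
    if not hits:
        print("No bitsandbytes quantization-state tensors found.")
```

If this reports hits, convert_hf_to_gguf.py will keep failing on those names, because it only maps regular weight tensors; the merge would need to produce dequantized 16-bit weights first.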
Environment:
- Unsloth / Unsloth-Zoo: installed from git+https://github.com/unslothai/unsloth.git (latest version as of yesterday)
- PyTorch: 2.4.1+cu124
- CUDA: 12.4
- GPU: NVIDIA A100-SXM4-80GB
- OS: Linux (RunPod)
- Transformers: 4.51.3
Is this a known bug, and are there any workarounds or fixes available in the latest code?

