#Unable to export Ministral model to GGUF

short mulch
#

Running `model.save_pretrained_gguf("model", tokenizer, quantization_method = "q5_k_m")` fails during the conversion to BF16 with this error:

```RuntimeError                              Traceback (most recent call last)
Cell In[5], line 1
----> 1 model.save_pretrained_gguf("model", tokenizer, quantization_method = "q5_k_m")

File /usr/local/lib/python3.12/dist-packages/unsloth/save.py:1986, in unsloth_save_pretrained_gguf(self, save_directory, tokenizer, quantization_method, first_conversion, push_to_hub, token, private, is_main_process, state_dict, save_function, max_shard_size, safe_serialization, variant, save_peft_format, tags, temporary_location, maximum_memory_usage)
   1979         raise RuntimeError(
   1980             f"Unsloth: GGUF conversion failed in Kaggle environment.\n"
   1981             f"This is likely due to the 20GB disk space limit.\n"
   1982             f"Try saving to /tmp directory or use a smaller model.\n"
   1983             f"Error: {e}"
   1984         )
   1985     else:
-> 1986         raise RuntimeError(f"Unsloth: GGUF conversion failed: {e}")
   1988 # Step 9: Create Ollama modelfile
   1989 modelfile_location = None

RuntimeError: Unsloth: GGUF conversion failed: Unsloth: Failed to convert vision projector to GGUF: Command 'python llama.cpp/unsloth_convert_hf_to_gguf.py --outfile Ministral-3-14B-Instruct-2512.BF16-mmproj.gguf --outtype bf16 --mmproj  --split-max-size 50G model' returned non-zero exit status 1.```
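
The traceback above only reports that the conversion command exited with status 1; the converter's own stderr is not shown. A minimal sketch for rerunning a command and keeping the tail of its stderr, assuming the command from the error message is run from the same working directory. The helper name `run_and_tail` is hypothetical and not part of Unsloth or llama.cpp:

```python
import shlex
import subprocess


def run_and_tail(command, tail_lines=40):
    """Run a command and return (exit_code, last lines of its stderr).

    Hypothetical debugging helper; not part of Unsloth or llama.cpp.
    """
    # Accept either a shell-style string or an already-split argument list.
    args = shlex.split(command) if isinstance(command, str) else list(command)
    result = subprocess.run(args, capture_output=True, text=True)
    tail = "\n".join(result.stderr.splitlines()[-tail_lines:])
    return result.returncode, tail


# Example (command copied from the error message above):
# code, err = run_and_tail(
#     "python llama.cpp/unsloth_convert_hf_to_gguf.py "
#     "--outfile Ministral-3-14B-Instruct-2512.BF16-mmproj.gguf "
#     "--outtype bf16 --mmproj --split-max-size 50G model"
# )
# print(code)
# print(err)
```

The last lines of stderr would normally contain the Python exception raised inside the conversion script, which is more specific than the wrapped `RuntimeError`.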

Running in Runpod with an A6000. Thought it was a disk space problem, but I do have 20-30GB left after the BF16 conversion, plenty for the Ministral 14B model. Here's the package info:
```==((====))==  Unsloth 2025.11.6: Fast Mistral3 patching. Transformers: 5.0.0.dev0.
   \\   /|    NVIDIA RTX A6000. Num GPUs = 1. Max memory: 47.529 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.1+cu128. CUDA: 8.6. CUDA Toolkit: 12.8. Triton: 3.5.1
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.33.post2. FA2 = False]```
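
To rule out disk space, free space can be checked on the volume that holds the output directory. A minimal sketch using only the standard library; the path `"."` and the 30 GiB threshold are placeholders, not values taken from Unsloth:

```python
import shutil


def free_gb(path="."):
    """Return free space in GiB on the filesystem containing `path`."""
    return shutil.disk_usage(path).free / 1024**3


def has_room(path=".", needed_gb=30.0):
    """True if at least `needed_gb` GiB are free (threshold is a placeholder)."""
    return free_gb(path) >= needed_gb


print(f"free: {free_gb('.'):.1f} GiB")
```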
#

I should mention that the LoRA was trained on the base model, but I changed adapter_config to point to the instruct model path, so the LoRA from the base model is applied to the instruct model.
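
The edit described above can be sketched as follows, assuming the adapter directory contains PEFT's `adapter_config.json` and that the field changed was `base_model_name_or_path`. The post does not say which field was edited, and the helper name and paths here are hypothetical:

```python
import json
from pathlib import Path


def retarget_adapter(adapter_dir, new_base):
    """Point a saved LoRA adapter at a different base model.

    Assumes PEFT's adapter_config.json layout; hypothetical helper.
    Returns the previous base model path so the change can be undone.
    """
    config_path = Path(adapter_dir) / "adapter_config.json"
    config = json.loads(config_path.read_text())
    old_base = config.get("base_model_name_or_path")
    config["base_model_name_or_path"] = new_base
    config_path.write_text(json.dumps(config, indent=2))
    return old_base


# Example with placeholder paths:
# previous = retarget_adapter("lora_adapter", "path/to/instruct-model")
```

Keeping the returned value makes it easy to restore the original base model path when checking whether the swap is related to the export failure.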

topaz rock
#

Can you try using our Docker image?

#

Seems to be an issue with the latest transformers/torch
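
When comparing environments, for example this pod against the Docker image, it helps to list the installed versions of the packages involved. A small sketch using the standard library; the package list is an assumption based on the names mentioned in this thread:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("unsloth", "unsloth_zoo", "transformers", "torch")):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver}")
```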

marsh plover
#

I'm not sure the unsloth/unsloth-zoo version in our Docker image is Ministral-compatible.

short mulch