Running `model.save_pretrained_gguf("model", tokenizer, quantization_method = "q5_k_m")` fails with this error during the conversion to BF16:
```
RuntimeError Traceback (most recent call last)
Cell In[5], line 1
----> 1 model.save_pretrained_gguf("model", tokenizer, quantization_method = "q5_k_m")

File /usr/local/lib/python3.12/dist-packages/unsloth/save.py:1986, in unsloth_save_pretrained_gguf(self, save_directory, tokenizer, quantization_method, first_conversion, push_to_hub, token, private, is_main_process, state_dict, save_function, max_shard_size, safe_serialization, variant, save_peft_format, tags, temporary_location, maximum_memory_usage)
1979 raise RuntimeError(
1980 f"Unsloth: GGUF conversion failed in Kaggle environment.\n"
1981 f"This is likely due to the 20GB disk space limit.\n"
1982 f"Try saving to /tmp directory or use a smaller model.\n"
1983 f"Error: {e}"
1984 )
1985 else:
-> 1986 raise RuntimeError(f"Unsloth: GGUF conversion failed: {e}")
1988 # Step 9: Create Ollama modelfile
1989 modelfile_location = None

RuntimeError: Unsloth: GGUF conversion failed: Unsloth: Failed to convert vision projector to GGUF: Command 'python llama.cpp/unsloth_convert_hf_to_gguf.py --outfile Ministral-3-14B-Instruct-2512.BF16-mmproj.gguf --outtype bf16 --mmproj --split-max-size 50G model' returned non-zero exit status 1.
```
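The traceback only reports the exit status of the vision projector conversion, not what the script itself printed. A minimal sketch for getting the underlying error, assuming the command from the message can be rerun from the same working directory as the notebook (so that the relative paths `llama.cpp/` and `model` resolve). The helper name `run_and_capture` is made up for illustration and is not part of Unsloth or llama.cpp:

```python
import shlex
import subprocess
import sys

# Command copied verbatim from the error message above.
MMPROJ_CMD = (
    "python llama.cpp/unsloth_convert_hf_to_gguf.py "
    "--outfile Ministral-3-14B-Instruct-2512.BF16-mmproj.gguf "
    "--outtype bf16 --mmproj --split-max-size 50G model"
)


def run_and_capture(command: str) -> subprocess.CompletedProcess:
    """Run a command without raising on failure, keeping stdout and stderr.

    Hypothetical helper for debugging; not an Unsloth or llama.cpp API.
    """
    # shlex.split turns the shell-style string into an argument list,
    # so no shell is involved.
    result = subprocess.run(
        shlex.split(command),
        capture_output=True,
        text=True,
        check=False,  # inspect the return code ourselves
    )
    if result.returncode != 0:
        # Print the tail of stderr, where a Python traceback would end.
        print(f"exit status {result.returncode}", file=sys.stderr)
        print(result.stderr[-4000:], file=sys.stderr)
    return result
```

Calling `run_and_capture(MMPROJ_CMD)` in a notebook cell should then show the converter's own traceback, which is the part missing from the report above.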
I'm running on Runpod with an A6000. I first suspected a disk space problem, but I still have 20-30GB free after the BF16 conversion, which should be plenty for the Ministral 14B model. Here's the package info:
```
==((====))==  Unsloth 2025.11.6: Fast Mistral3 patching. Transformers: 5.0.0.dev0.
   \\   /|    NVIDIA RTX A6000. Num GPUs = 1. Max memory: 47.529 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.1+cu128. CUDA: 8.6. CUDA Toolkit: 12.8. Triton: 3.5.1
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.33.post2. FA2 = False]
```
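For anyone who wants to rule out disk space the same way, here is a small sketch using only the standard library. The function names and the threshold passed to `has_room` are illustrative assumptions, not values taken from Unsloth:

```python
import shutil


def free_gib(path: str = ".") -> float:
    """Return free space on the filesystem containing `path`, in GiB."""
    usage = shutil.disk_usage(path)
    return usage.free / 2**30


def has_room(path: str, needed_gib: float) -> bool:
    """Check whether at least `needed_gib` GiB are free at `path`.

    `needed_gib` is a caller-supplied guess; the real requirement
    depends on the model and on the intermediate files written.
    """
    return free_gib(path) >= needed_gib
```

For example, `print(f"{free_gib('.'):.1f} GiB free")` run in the directory where the GGUF files are written shows whether the 20-30GB figure still holds just before the vision projector step.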