This is a known issue: after recent Unsloth updates, save_pretrained_gguf can enter an infinite loop, calling the conversion script repeatedly until VRAM is exhausted and the process fails. The root cause is not covered in the official docs, but multiple users have reported the same looping behavior after updating, typically when the merged model is written successfully but the GGUF conversion never completes. As a workaround, you can convert the merged model to GGUF manually with llama.cpp's convert_hf_to_gguf.py script, which bypasses the loop and produces a valid GGUF file for inference or deployment.
For now, the recommended steps are: (1) call model.save_pretrained_merged to write a merged model directory, and (2) run the llama.cpp conversion script manually on that directory. This approach is confirmed to work and is suggested by Unsloth maintainers while they address the bug in save_pretrained_gguf. See the linked issues for more details and user reports. Would you like a step-by-step guide for the manual conversion process?
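As a rough illustration of those two steps, here is a minimal Python sketch. It assumes llama.cpp is already cloned locally with its Python requirements installed, and that the merged model was saved beforehand. The directory names (llama.cpp, merged_model), the output file name, and the helper functions build_convert_command and convert_to_gguf are hypothetical placeholders, not part of Unsloth or llama.cpp. The Unsloth call is shown only as a comment because it needs a loaded model and tokenizer.

```python
import subprocess
import sys
from pathlib import Path

# Step 1 (run in your training session, where `model` and `tokenizer` exist):
#   model.save_pretrained_merged("merged_model", tokenizer, save_method="merged_16bit")
# "merged_model" is a placeholder directory name.


def build_convert_command(llama_cpp_dir, merged_dir, outfile, outtype="f16"):
    """Build the argv list for llama.cpp's convert_hf_to_gguf.py (hypothetical helper)."""
    script = Path(llama_cpp_dir) / "convert_hf_to_gguf.py"
    return [
        sys.executable,          # use the current Python interpreter
        str(script),
        str(merged_dir),         # directory written by save_pretrained_merged
        "--outfile", str(outfile),
        "--outtype", outtype,    # e.g. "f16", "bf16", "q8_0"
    ]


def convert_to_gguf(llama_cpp_dir, merged_dir, outfile, outtype="f16"):
    """Step 2: run the conversion script once; raises CalledProcessError on failure."""
    cmd = build_convert_command(llama_cpp_dir, merged_dir, outfile, outtype)
    subprocess.run(cmd, check=True)
    return Path(outfile)
```

Calling convert_to_gguf("llama.cpp", "merged_model", "model-f16.gguf") would then run the script a single time in a separate process, outside the save_pretrained_gguf code path. Further quantization (for example to a 4-bit format) is a separate llama.cpp step and is not covered by this sketch.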
Sources:
