save_pretrained_gguf loops forever | Unsloth AI | Page 1

grizzled cosmos BOT Oct 14, 2025, 3:12 PM

#

This is a known issue: after recent Unsloth updates, save_pretrained_gguf can enter an infinite loop, repeatedly calling the conversion script until VRAM is exhausted and the process fails. The root cause is not fully documented in the official docs, but multiple users have reported this looping/recursion problem after updating, especially when the merged model is generated successfully but GGUF conversion does not complete as expected. As a workaround, you can manually convert the merged model to GGUF using llama.cpp's convert_hf_to_gguf.py script, which avoids the looping issue and produces a valid GGUF file for inference or deployment.

For now, the recommended steps are: (1) use model.save_pretrained_merged to create a merged model directory, and (2) run the llama.cpp conversion script manually on that directory. This approach is confirmed to work and is suggested by Unsloth maintainers while they address the bug in save_pretrained_gguf. See the linked issues for more details and user reports. Would you like a step-by-step guide for the manual conversion process?
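The two-step workaround above could be sketched roughly as follows. This is an illustration only: the directory names `merged_model` and `llama.cpp`, the output file name, and both helper functions are hypothetical, and `save_method="merged_16bit"` and `--outtype f16` are assumed choices rather than settings taken from this thread.

```python
import subprocess
import sys
from pathlib import Path


def build_convert_command(llama_cpp_dir, merged_dir, outfile, outtype="f16"):
    """Build the argv for llama.cpp's convert_hf_to_gguf.py without running it."""
    script = Path(llama_cpp_dir) / "convert_hf_to_gguf.py"
    return [
        sys.executable,      # reuse the interpreter of the current environment
        str(script),
        str(merged_dir),     # directory written by save_pretrained_merged
        "--outfile", str(outfile),
        "--outtype", outtype,
    ]


def convert_merged_to_gguf(llama_cpp_dir, merged_dir, outfile, outtype="f16"):
    """Run the conversion once in a child process and raise if it fails."""
    cmd = build_convert_command(llama_cpp_dir, merged_dir, outfile, outtype)
    subprocess.run(cmd, check=True)
    return Path(outfile)


# Step 1, in the fine-tuning script (assumed arguments):
#     model.save_pretrained_merged("merged_model", tokenizer, save_method="merged_16bit")
# Step 2, using a local llama.cpp checkout (placeholder paths):
#     convert_merged_to_gguf("llama.cpp", "merged_model", "model-f16.gguf")
```

Running the conversion as a single explicit subprocess call keeps it outside `save_pretrained_gguf`, so the looping described in this thread should not come into play. Further quantization of the resulting file would be a separate llama.cpp step.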

brave cosmos Oct 14, 2025, 3:14 PM

#

Hello,
after I updated to the latest unsloth version, the GGUF generation works partially, but it ends up in a loop, calling the script I used again and again until the GPU VRAM is full, and then it fails. The thing is that during the runs it is able to generate a proper merged model (which I can manually convert to GGUF).
I attached the full log showing this looping behavior, and also the python script used.

📎 gguf_fail.txt

#

this is the script I used

📎 gguf.txt

#

@tulip current

tulip current Oct 14, 2025, 3:21 PM

#

what is ur operating system?

brave cosmos Oct 14, 2025, 4:29 PM

#

Oracle Linux

#

NAME="Oracle Linux Server"
VERSION="10.0"
ID="ol"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="10.0"
PLATFORM_ID="platform:el10"
PRETTY_NAME="Oracle Linux Server 10.0"

Linux DESKTOP-NA5Q3GB.home 6.12.0-103.40.4.4.el10uek.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Oct 3 16:13:11 PDT 2025 x86_64 GNU/Linux
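For anyone triaging a similar report, a small sketch like the one below could classify the distro family from `os-release` content such as the paste above. The function names and the sets of family markers are assumptions made for illustration; they are not part of Unsloth.

```python
def parse_os_release(text):
    """Parse os-release style KEY=VALUE lines into a dict, stripping quotes."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


# Assumed, non-exhaustive family markers; extend as needed.
DEB_IDS = {"debian", "ubuntu"}
RPM_IDS = {"fedora", "rhel", "centos", "ol", "rocky", "almalinux"}


def distro_family(info):
    """Return "deb", "rpm", or "unknown" based on ID and ID_LIKE."""
    ids = {info.get("ID", "")} | set(info.get("ID_LIKE", "").split())
    if ids & DEB_IDS:
        return "deb"
    if ids & RPM_IDS:
        return "rpm"
    return "unknown"


# Example (reads the real file on a Linux host):
#     with open("/etc/os-release") as f:
#         print(distro_family(parse_os_release(f.read())))
```

On Python 3.10 or newer, `platform.freedesktop_os_release()` returns the same information as a dict, so the hand-written parser is only needed on older interpreters.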

tulip current Oct 14, 2025, 4:51 PM

#

oh oracle linux 😅, not very familiar with that

#

you ran the code on the terminal, correct?

#

also are you using a python environment (conda? mamba? etc.)

brave cosmos Oct 14, 2025, 5:47 PM

#

I am using a standard python installation, in a virtual environment

#

not conda or mamba...

tulip current Oct 14, 2025, 5:48 PM

#

ie .venv?

#

or just the system wide python?

brave cosmos Oct 14, 2025, 6:46 PM

#

I tried both... and the result is the same

tulip current Oct 14, 2025, 6:46 PM

#

that's odd because on debian linux, it works fine on most models. we just finished a round of tests.
I am gonna try on this specific model now, maybe the issue is with the model.. for some reason

brave cosmos Oct 14, 2025, 6:47 PM

#

but if I try manually it works

tulip current Oct 14, 2025, 6:47 PM

#

oh

brave cosmos Oct 14, 2025, 6:48 PM

#

I can run the llama python script and quantize

tulip current Oct 14, 2025, 6:48 PM

#

maybe then something is breaking for rpm based linux. we just introduced that compatibility (it used to be only for debian based linux).

#

let me see if i can rent an rpm linux based machine somewhere and test

brave cosmos Oct 14, 2025, 6:48 PM

#

basically the problem I have is just the looping of the save function... let's say after the first run, in the destination directory I have the correct thing...

tulip current Oct 14, 2025, 6:49 PM

#

yes that looping doesn't happen when we test on an A100 and H100 and debian based linux. that's my point.

#

so might be because our rpm based linux compatibility is tripping off somewhere. i need to test it

brave cosmos Oct 14, 2025, 6:49 PM

#

ok, sure

tulip current Oct 14, 2025, 6:50 PM

#

i'll ping you here once we figure it out

brave cosmos Oct 14, 2025, 6:50 PM

#

thanks... anyway it is a huge improvement, because before this patch... it was completely broken 😄

tulip current Oct 14, 2025, 6:51 PM

#

yes but still like i hate bugs :slothfire:

brave cosmos Oct 14, 2025, 6:51 PM

#

yeah 🙂

tulip current Oct 14, 2025, 6:57 PM

#

works on debian linux. must be that our rpm linux compatibility code is tripping off

📎 Screen_Shot_2025-10-14_at_10.56.21_PM.png

#

ugh

brave cosmos Oct 14, 2025, 7:23 PM

#

oh.... when I installed the OS I was thinking of using Ubuntu... but I already had OL available 😄

brave cosmos Oct 14, 2025, 7:51 PM

#

I could use docker... maybe... yes?

tulip current Oct 14, 2025, 7:51 PM

#

would honestly be easier if you use ubuntu cause it's gonna be difficult debugging on rpm based linux. You could also take a shortcut and use our docker container if you can run containers on your machine
just pushed the last image 10 minutes ago : https://hub.docker.com/r/unsloth/unsloth

brave cosmos Oct 14, 2025, 7:52 PM

#

yes... I can run containers... is there a guide I can follow?

tulip current Oct 14, 2025, 7:52 PM

#

yes it's on that page .. detailed description 😄

#

there is another documentation version here too in our docs: https://docs.unsloth.ai/get-started/install-and-update/docker

brave cosmos Oct 14, 2025, 8:00 PM

#

thanks a lot

brave cosmos Oct 15, 2025, 8:01 AM

#

tried in docker and everything worked perfectly... thanks. unfortunately the doc only has instructions for Debian based systems. For RedHat based systems, these are the steps:

download the repo: curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
install one package: sudo dnf install -y nvidia-container-toolkit
configure the container runtime: sudo nvidia-ctk runtime configure --runtime=docker
restart the container runtime: sudo systemctl restart docker
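After those four steps, a quick check that containers can actually see the GPU might look like the sketch below. It assumes the `docker` CLI is on the PATH and that the current user may run it without `sudo`; the helper names and the default `ubuntu` image are illustrative choices, not something stated in this thread.

```python
import subprocess


def gpu_smoke_test_command(image="ubuntu"):
    """Build a `docker run` command that calls nvidia-smi inside a container."""
    return ["docker", "run", "--rm", "--gpus", "all", image, "nvidia-smi"]


def gpu_visible_in_docker(image="ubuntu"):
    """Return True if nvidia-smi exits successfully inside the container."""
    try:
        result = subprocess.run(
            gpu_smoke_test_command(image),
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        # The docker executable was not found on this machine.
        return False
    return result.returncode == 0


# Example:
#     print("GPU visible in Docker:", gpu_visible_in_docker())
```

If the check returns False, re-running the `nvidia-ctk runtime configure` and `systemctl restart docker` steps is a reasonable first thing to try.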
