#save_pretrained_gguf loops forever

grizzled cosmos BOT
#

This is a known issue: after recent Unsloth updates, save_pretrained_gguf can enter an infinite loop, repeatedly calling the conversion script until VRAM is exhausted and the process fails. The root cause is not fully documented in the official docs, but multiple users have reported this looping/recursion problem after updating, especially when the merged model is generated successfully but GGUF conversion does not complete as expected. As a workaround, you can manually convert the merged model to GGUF using llama.cpp's convert_hf_to_gguf.py script, which avoids the looping issue and produces a valid GGUF file for inference or deployment.

For now, the recommended steps are: (1) use model.save_pretrained_merged to create a merged model directory, and (2) run the llama.cpp conversion script manually on that directory. This approach is confirmed to work and is suggested by Unsloth maintainers while they address the bug in save_pretrained_gguf. See the linked issues for more details and user reports. Would you like a step-by-step guide for the manual conversion process?
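
A minimal sketch of step (2), assuming step (1) has already been done in the Python script with model.save_pretrained_merged and that a llama.cpp checkout is available locally. The function names, the DRY_RUN switch, and all paths are illustrative placeholders, and the f16 output type and Q4_K_M quantization are only example choices:

```shell
#!/usr/bin/env bash
# Sketch only: function names and paths are placeholders, not Unsloth tooling.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Convert a merged Hugging Face model directory to a 16-bit GGUF file.
#   $1 = merged model directory (output of save_pretrained_merged)
#   $2 = path to a llama.cpp checkout
#   $3 = output .gguf path
convert_merged_to_gguf() {
  run python "$2/convert_hf_to_gguf.py" "$1" --outfile "$3" --outtype f16
}

# Optionally quantize the result with llama.cpp's llama-quantize binary.
#   $1 = path to llama-quantize, $2 = input .gguf, $3 = output .gguf
#   $4 = quantization type (defaults to Q4_K_M here as an example)
quantize_gguf() {
  run "$1" "$2" "$3" "${4:-Q4_K_M}"
}
```

For example, `DRY_RUN=1 convert_merged_to_gguf ./merged_model ./llama.cpp ./model-f16.gguf` prints the command without running it, which is a cheap way to check the paths first.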

brave cosmos
#

Hello,
after I updated to the latest Unsloth version, GGUF generation partially works, but it ends up in a loop, calling the script I used again and again until the GPU VRAM is full, and then it fails. The thing is, during the runs it is able to generate a proper merged model (which I can manually convert to GGUF).
I attached the full log showing this looping behavior, and also the Python script used.

#

@tulip current

tulip current
#

what is ur operating system?

brave cosmos
#

Oracle Linux

#

NAME="Oracle Linux Server"
VERSION="10.0"
ID="ol"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="10.0"
PLATFORM_ID="platform:el10"
PRETTY_NAME="Oracle Linux Server 10.0"

Linux DESKTOP-NA5Q3GB.home 6.12.0-103.40.4.4.el10uek.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Oct 3 16:13:11 PDT 2025 x86_64 GNU/Linux

tulip current
#

oh oracle linux 😅, not very familiar with that

#

you ran the code on the terminal, correct?

#

also are you using a python environment (conda? mamba? etc.)

brave cosmos
#

I am using a standard python installation, in a virtual environment

#

not conda or mamba...

tulip current
#

ie .venv?

#

or just the system wide python?

brave cosmos
#

I tried both... and the result is the same

tulip current
#

that's odd because on debian linux, it works fine on most models. we just finished a round of tests.
I am gonna try on this specific model now, maybe the issue is with the model.. for some reason

brave cosmos
#

but if I try manually it works

tulip current
#

oh

brave cosmos
#

I can run the llama python script and quantize

tulip current
#

maybe then something is breaking for rpm based linux. we just introduced that compatibility (it used to be only for debian based linux).
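
For reference, the os-release output posted earlier (ID="ol", ID_LIKE="fedora") is what marks this host as rpm based. A rough sketch of how a script could make that distinction from those two fields. This is not Unsloth's actual detection code; the function name and the list of distro IDs are assumptions for illustration:

```shell
#!/usr/bin/env bash
# Sketch only: classify a host as rpm or deb based from an os-release file.

distro_family() {
  file="${1:-/etc/os-release}"
  # Bail out early if the file is missing or unreadable.
  [ -r "$file" ] || { echo unknown; return 0; }
  # Source in a subshell so the os-release variables do not leak out.
  (
    . "$file"
    case " ${ID:-} ${ID_LIKE:-} " in
      *" fedora "*|*" rhel "*|*" centos "*|*" ol "*) echo rpm ;;
      *" debian "*|*" ubuntu "*) echo deb ;;
      *) echo unknown ;;
    esac
  )
}
```

Called with no argument, `distro_family` reads /etc/os-release on the current machine.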

#

let me see if i can rent an rpm linux based machine somewhere and test

brave cosmos
#

basically the problem I have is just the looping of the save function... let's say after the first run, in the destination directory I have the correct thing...

tulip current
#

yes that looping doesn't happen when we test on an A100 and H100 and debian based linux. that's my point.

#

so might be because our rpm based linux compatibility is tripping off somewhere. i need to test it

brave cosmos
#

ok, sure

tulip current
#

i'll ping you here once we figure it out

brave cosmos
#

thanks... anyway it is a huge improvement, because before this patch... it was completely broken 😄

tulip current
#

yes but still like i hate bugs :slothfire:

brave cosmos
#

yeah 🙂

tulip current
#

works on debian linux. must be that our rpm linux compatibility code is tripping up

#

ugh

brave cosmos
#

oh.... when I installed the OS I was thinking of using Ubuntu... but I already had OL available 😄

brave cosmos
#

I could use docker... maybe... yes?

tulip current
#

would honestly be easier if you use ubuntu cause it's gonna be difficult debugging on rpm based linux. You could also take a shortcut and use our docker container if you can run containers on your machine
just pushed the latest image 10 minutes ago: https://hub.docker.com/r/unsloth/unsloth

brave cosmos
#

yes... I can run containers... is there a guide I can follow?

tulip current
#

yes it's on that page... detailed description 😄

brave cosmos
#

thanks a lot

brave cosmos
#

tried in docker and everything worked perfectly... thanks. unfortunately the doc only has instructions for Debian-based systems. For Red Hat-based systems, these are the steps:

  1. download the repo: curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
  2. install one package: sudo dnf install -y nvidia-container-toolkit
  3. configure the container runtime: sudo nvidia-ctk runtime configure --runtime=docker
  4. restart the container runtime: sudo systemctl restart docker
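
The four steps above can also be wrapped in a small script. This is only a sketch: the commands are the ones listed, while the function names and the DRY_RUN switch are added here for illustration and are not part of any official tooling:

```shell
#!/usr/bin/env bash
# Sketch only: NVIDIA Container Toolkit setup on an RPM-based host (e.g. Oracle Linux).

REPO_URL="https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo"
REPO_FILE="/etc/yum.repos.d/nvidia-container-toolkit.repo"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

setup_nvidia_toolkit() {
  # 1. download the repo definition into yum.repos.d
  run sh -c "curl -s -L $REPO_URL | sudo tee $REPO_FILE" || return 1
  # 2. install the toolkit package
  run sudo dnf install -y nvidia-container-toolkit || return 1
  # 3. configure Docker to use the NVIDIA runtime
  run sudo nvidia-ctk runtime configure --runtime=docker || return 1
  # 4. restart Docker so the new runtime configuration is picked up
  run sudo systemctl restart docker
}
```

Running `DRY_RUN=1 setup_nvidia_toolkit` first prints the four commands without touching the system; calling `setup_nvidia_toolkit` on its own executes them and stops at the first failure.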