#UserWarning: Model qwen3-32b-bnb-4bit not found locally or on HuggingFace

fierce pumice
#

Does anyone know why my model is not found when converting to gguf?

    warnings.warn(f"Model {model_name} not found locally or on HuggingFace")
    Unsloth: Merging model weights to 16-bit format...
    Unsloth: Converting to GGUF format...

Is using Qwen3-32B-unsloth-bnb-4bit model the problem?
The docs say to fine-tune it, but I read somewhere that you have to fine-tune the Dynamic 2.0 GGUF version instead.

solar impBOT
#

The warning appears because Unsloth cannot find the 16-bit base model needed for merging and GGUF conversion when starting from a 4-bit quantized model like Qwen3-32B-unsloth-bnb-4bit. Unsloth requires the original (unquantized) model to merge LoRA adapters and export to GGUF; using a 4-bit model as the base is unsupported and discouraged for this process. You should always fine-tune and merge using the 16-bit base model, then quantize/export to GGUF after merging, not before. This is not specific to "dynamic 2.0 gguf"—the key is to avoid using a 4-bit model as your starting point for fine-tuning and merging steps.

For best results, download and fine-tune the 16-bit version of Qwen3-32B, merge your adapters, and only then convert to GGUF. Using a 4-bit model for fine-tuning or merging will cause errors and degraded performance, as explained in this Unsloth issue and here.

Would you like a step-by-step guide for the correct workflow?

fierce pumice
#

Here's the code:

    from unsloth import FastLanguageModel
    import torch

    # Load the saved fine-tuned folder.
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="finetuned_model",     # point to your saved folder
        max_seq_length=8192,
        dtype=torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16,
        load_in_4bit=True,
        load_in_8bit=False,
    )
    # Merge to 16-bit, convert to GGUF, and quantize to q4_k_m.
    model.save_pretrained_gguf("exported_model", tokenizer, quantization_method="q4_k_m")

short nacelle
#

Oh

#

Oh this is a local model

#

let me check sorry

fierce pumice
#

Oh, it was an unsloth/Qwen3 32B model

short nacelle
#

So I checked. Is the base model in your adapter_config.json or config.json pointing to a local path?

#

If it is, probably change it to a generic online one

#

For example, with GPT-OSS the base_model_name_or_path points to a publicly accessible online repo
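
A quick way to answer that question is to read the adapter config directly. The sketch below is only an illustration: the helper name `describe_base_model` is hypothetical, and it assumes the saved folder contains an `adapter_config.json` with a `base_model_name_or_path` key, as PEFT adapters normally do.

```python
import json
from pathlib import Path


def describe_base_model(adapter_dir):
    """Report which base model an adapter folder points to.

    Hypothetical helper: assumes adapter_dir holds a PEFT-style
    adapter_config.json with a base_model_name_or_path entry.
    """
    config_path = Path(adapter_dir) / "adapter_config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    base = config.get("base_model_name_or_path")
    # An existing directory means the adapter points at a local copy.
    # Anything else is presumably a Hub repo id such as "unsloth/Qwen3-32B".
    is_local = bool(base) and Path(base).is_dir()
    return {"base_model_name_or_path": base, "is_local_dir": is_local}
```

Calling `describe_base_model("finetuned_model")` on the folder from the snippet above would show whether the stored value is a directory on disk or something that has to be resolved online.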

short nacelle
#

use unsloth/qwen3-32b

#

ohh is this modelscope?

#

maybe that's the issue

#

can you make a new GitHub issue, if possible, with the screenshot you provided?

#

that'll be super helpful

short nacelle
#

thanks so much!

fierce pumice
# short nacelle ohh is this modelscope?

Is there a difference between unsloth/Qwen3-xxB and Qwen/Qwen3-xxB? Changing base_model_name_or_path to unsloth/Qwen3-32B seems to automatically download the 4-bit version.

short nacelle
#

wait, how are you saving it?

fierce pumice
#

I tested this on both the 14B model and 32B model. Looks like you have to point base_model_name_or_path to a local path for it to work.
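
For anyone repeating this test, the change can be scripted. This is a minimal sketch, not an Unsloth feature: the helper name `set_base_model` is hypothetical, and it assumes a PEFT-style `adapter_config.json` in the saved folder. It keeps a `.bak` copy so the original value can be restored.

```python
import json
import shutil
from pathlib import Path


def set_base_model(adapter_dir, new_base):
    """Rewrite base_model_name_or_path in adapter_config.json.

    Hypothetical helper. Returns the previous value and leaves a
    backup named adapter_config.json.bak next to the edited file.
    """
    config_path = Path(adapter_dir) / "adapter_config.json"
    backup_path = config_path.with_name(config_path.name + ".bak")
    shutil.copy2(config_path, backup_path)  # keep the untouched original

    config = json.loads(config_path.read_text(encoding="utf-8"))
    old_base = config.get("base_model_name_or_path")
    config["base_model_name_or_path"] = str(new_base)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return old_base
```

The second argument would be whatever local folder holds the base model; no particular path is implied here.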

#

After pointing it to a local path, 14B works but 32B doesn't, since its output is split into two GGUF files:
Qwen3-32B.BF16-00001-of-00002.gguf
Qwen3-32B.BF16-00002-of-00002.gguf

14B output file:
Qwen3-14B.Q4_K_M.gguf

14B correctly uses q4_k_m quantization but 32B fails to do so.
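
The filenames above already show the difference: the label before `.gguf` is BF16 for the 32B files and Q4_K_M for the 14B file. A small sketch for reading that label and the shard numbering out of a filename follows. The function name `parse_gguf_name` is hypothetical, and the `name.LABEL-00001-of-00002.gguf` pattern is assumed only from the files listed in this thread.

```python
import re

# Optional "-00001-of-00002" shard suffix, then the .gguf extension.
GGUF_NAME_RE = re.compile(
    r"^(?P<stem>.+?)(?:-(?P<index>\d{5})-of-(?P<total>\d{5}))?\.gguf$"
)


def parse_gguf_name(filename):
    """Split a GGUF filename into model name, type label, and shard info."""
    match = GGUF_NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"not a GGUF filename: {filename!r}")
    stem = match.group("stem")
    # The label is taken to be whatever follows the last dot in the stem.
    model, dot, label = stem.rpartition(".")
    if not dot:
        model, label = stem, None
    index = match.group("index")
    total = match.group("total")
    return {
        "model": model,
        "label": label,
        "shard": int(index) if index else None,
        "shards_total": int(total) if total else None,
    }
```

Run on the three filenames above, this reports BF16 with shards 1 and 2 of 2 for the 32B output, and Q4_K_M with no shard suffix for the 14B output.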

short nacelle
#

oh

#

oh so the 2 files are normal

#

since the filesize exceeds 50GB

#

so it splits it into 2

fierce pumice
#

Thanks, I'll create an issue in a few days