UserWarning: Model qwen3-32b-bnb-4bit not found locally or on HuggingFace | Unsloth AI

fierce pumice Nov 5, 2025, 5:17 PM

Does anyone know why my model is not found when converting to GGUF?

    warnings.warn(f"Model {model_name} not found locally or on HuggingFace")
    Unsloth: Merging model weights to 16-bit format...
    Unsloth: Converting to GGUF format...

Is using the Qwen3-32B-unsloth-bnb-4bit model the problem?
The docs say to fine-tune it, but I read somewhere that you have to fine-tune the Dynamic 2.0 GGUF version instead.

solar imp (BOT) Nov 5, 2025, 5:17 PM

The warning appears because Unsloth cannot find the 16-bit base model it needs for merging and GGUF conversion when you start from a 4-bit quantized model like Qwen3-32B-unsloth-bnb-4bit. Unsloth requires the original (unquantized) model to merge LoRA adapters and export to GGUF; using a 4-bit model as the base is unsupported and discouraged for this process. You should always fine-tune and merge using the 16-bit base model, then quantize and export to GGUF after merging, not before. This is not specific to the "Dynamic 2.0 GGUF" versions: the key is to avoid using a 4-bit model as your starting point for fine-tuning and merging.

For best results, download and fine-tune the 16-bit version of Qwen3-32B, merge your adapters, and only then convert to GGUF. Using a 4-bit model for fine-tuning or merging will cause errors and degraded performance, as explained in this Unsloth issue and here.

Would you like a step-by-step guide for the correct workflow?

fierce pumice Nov 6, 2025, 10:11 AM

Here's the code:

    import torch
    from unsloth import FastLanguageModel

    # Load the fine-tuned model (base + LoRA adapters) saved after training
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="finetuned_model",  # point to your saved folder
        max_seq_length=8192,
        dtype=torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16,
        load_in_4bit=True,
        load_in_8bit=False,
    )

    # Merge the adapters and export a Q4_K_M GGUF file
    model.save_pretrained_gguf("exported_model", tokenizer, quantization_method="q4_k_m")

short nacelle Nov 6, 2025, 10:14 AM

Oh

Oh, this is a local model

let me check, sorry

fierce pumice Nov 6, 2025, 10:33 AM

Oh, it was an unsloth/Qwen3-32B model

fierce pumice Nov 6, 2025, 10:34 AM

short nacelle Oh this is a local model

The current model is the one that was saved after training. I'm just following the code from the Qwen3-14B Colab notebook.

fierce pumice Nov 6, 2025, 10:35 AM

short nacelle let me check sorry

Thank you 😊

short nacelle Nov 6, 2025, 1:40 PM

So I checked: is your adapter_config.json or config.json pointing to a local address?

If it is, probably change it to a generic online one.

For example, with GPT-OSS the base_model_name_or_path points to a publicly accessible online model.
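A minimal sketch of that change, assuming the adapter folder uses the usual PEFT `adapter_config.json` layout (for example the `finetuned_model` folder from the code above). `set_base_model` is a hypothetical helper for illustration, not an Unsloth or PEFT API; it rewrites only the `base_model_name_or_path` field, leaves `config.json` alone, and returns the old value so you can restore it:

```python
import json
from pathlib import Path


def set_base_model(adapter_dir: str, new_base: str) -> str:
    """Point base_model_name_or_path in adapter_config.json at new_base.

    Hypothetical helper for illustration (not part of Unsloth or PEFT).
    Returns the previous value so the change can be undone.
    """
    config_path = Path(adapter_dir) / "adapter_config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    old_base = config.get("base_model_name_or_path", "")
    # Only this one key is touched; every other adapter setting is preserved
    config["base_model_name_or_path"] = new_base
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return old_base
```

For example, `set_base_model("finetuned_model", "unsloth/Qwen3-32B")` would switch the folder to an online model ID; passing a local directory instead works the same way.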

fierce pumice Nov 6, 2025, 2:29 PM

short nacelle So I checked - is your adapter_config.json or config.json pointing to a local ad...

Yes, it's pointing to a local address by default. I'll change it to a generic online one and see if it helps.

Screenshot_2025-11-06_at_10.29.17_PM.png

short nacelle Nov 6, 2025, 2:41 PM

use unsloth/qwen3-32b

ohh, is this ModelScope?

maybe that's the issue

can you make a new GitHub issue, if possible, with the screenshot you provided?

that'll be super helpful

fierce pumice Nov 6, 2025, 2:44 PM

short nacelle ohh is this modelscope?

yes, I'm an international student in China, so I have no choice 😅

fierce pumice Nov 6, 2025, 2:46 PM

short nacelle can you make a new Github issue if possible with the screenshot u provided

sure thing, I'll create an issue after checking if changing the base_model_name_or_path works, and I'll include it in the issue.

short nacelle Nov 6, 2025, 2:48 PM

thanks so much!

fierce pumice Nov 6, 2025, 3:10 PM

short nacelle ohh is this modelscope?

Is there a difference between unsloth/Qwen3-xxB and Qwen/Qwen3-xxB? Changing the base_model_name_or_path to unsloth/Qwen3-32B seems to automatically download the 4-bit version.

short nacelle Nov 6, 2025, 11:13 PM

wait, how are you saving it?

fierce pumice Nov 7, 2025, 9:02 AM

short nacelle wait how are you saving it

as a GGUF file:

    model.save_pretrained_gguf("exported_model", tokenizer, quantization_method="q4_k_m")

I tested this on both the 14B and 32B models. It looks like you have to point base_model_name_or_path to a local path for it to work.

After pointing it to a local path, 14B works but 32B doesn't, since its output splits the GGUF file into two:

    Qwen3-32B.BF16-00001-of-00002.gguf
    Qwen3-32B.BF16-00002-of-00002.gguf

14B output file:

    Qwen3-14B.Q4_K_M.gguf

14B correctly uses q4_k_m quantization, but 32B fails to do so.
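To check what an export actually produced, a small sketch can parse the file names listed above. The helpers `summarize_gguf_outputs` and `has_quant` are hypothetical (not part of Unsloth), and the `<model>.<QUANT>[-NNNNN-of-NNNNN].gguf` pattern is an assumption inferred only from the names in this thread; other exporters may name files differently.

```python
import re
from collections.abc import Iterable

# Matches e.g. "Qwen3-32B.BF16-00001-of-00002.gguf" and "Qwen3-14B.Q4_K_M.gguf".
# Naming scheme assumed from the file names posted in this thread.
GGUF_NAME = re.compile(
    r"^(?P<model>.+)\.(?P<quant>[A-Za-z0-9_]+)"
    r"(?:-(?P<part>\d{5})-of-(?P<total>\d{5}))?\.gguf$"
)


def summarize_gguf_outputs(names: Iterable[str]) -> dict[str, dict[str, int]]:
    """Group GGUF file names by quantization label and count their shards."""
    summary: dict[str, dict[str, int]] = {}
    for name in names:
        m = GGUF_NAME.match(name)
        if m is None:
            continue  # not a GGUF file in the expected naming scheme
        quant = m.group("quant").upper()
        total = int(m.group("total")) if m.group("total") else 1
        entry = summary.setdefault(quant, {"files": 0, "expected_shards": total})
        entry["files"] += 1
    return summary


def has_quant(names: Iterable[str], quant: str = "Q4_K_M") -> bool:
    """True if any output file carries the requested quantization label."""
    return quant.upper() in summarize_gguf_outputs(names)
```

With the two 32B file names this reports only a BF16 entry with 2 of 2 shards, so `has_quant` returns False, while the single 14B name returns True.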

short nacelle Nov 7, 2025, 12:18 PM

oh

oh, so the 2 files are normal

since the file size exceeds 50GB

so it splits it into 2
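The arithmetic behind the split, as a rough sketch: BF16 stores 2 bytes per weight, so the file size is roughly the parameter count times 2 bytes. The parameter counts used here (about 14.8B for Qwen3-14B and 32.8B for Qwen3-32B) are approximate assumptions, and the 50GB limit from the message above is treated as 50 × 10^9 bytes; real files also carry metadata, so treat the results as estimates.

```python
import math

BYTES_PER_WEIGHT = {"BF16": 2.0}  # 16 bits per weight
SHARD_LIMIT_BYTES = 50 * 10**9    # assumed 50GB per-file limit, as stated in the thread


def estimate_shards(num_params: float, dtype: str = "BF16",
                    limit: int = SHARD_LIMIT_BYTES) -> tuple[float, int]:
    """Return (approximate size in GB, number of files needed under the limit)."""
    size_bytes = num_params * BYTES_PER_WEIGHT[dtype]
    return size_bytes / 1e9, math.ceil(size_bytes / limit)


# Approximate parameter counts (assumptions, see the note above)
for name, params in [("Qwen3-14B", 14.8e9), ("Qwen3-32B", 32.8e9)]:
    size_gb, shards = estimate_shards(params)
    print(f"{name}: ~{size_gb:.1f} GB in BF16 -> {shards} file(s)")
```

This prints roughly 29.6 GB (1 file) for 14B and 65.6 GB (2 files) for 32B, which is consistent with only the 32B BF16 output being split. It does not explain why no Q4_K_M file appeared for 32B; that is the separate problem reported above.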

fierce pumice Nov 7, 2025, 2:29 PM

Thanks, I'll create an issue in a few days