The error occurs because Unsloth's GGUF export does not yet support vision architectures such as MllamaForConditionalGeneration (the architecture used by Llama 3.2 Vision). GGUF conversion currently works only for certain text-only models; vision and multimodal models cannot be exported to GGUF through Unsloth at this time. Converting manually with llama.cpp also requires special handling for vision models, and that handling is not yet integrated into Unsloth's workflow [source].
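Before you start an export, it can help to check which architecture a checkpoint declares. The sketch below is a minimal, hypothetical pre-flight check. It assumes the model directory contains a Hugging Face `config.json` with an `architectures` field. The `VISION_ARCHITECTURES` set and the helper names are illustrative assumptions, not Unsloth's actual support list or API.

```python
import json
from pathlib import Path

# Illustrative set only (an assumption, not Unsloth's support matrix):
# architectures treated here as vision/multimodal and therefore likely
# to fail GGUF export. Extend it for the models you work with.
VISION_ARCHITECTURES = {
    "MllamaForConditionalGeneration",
    "LlavaForConditionalGeneration",
    "Qwen2VLForConditionalGeneration",
}


def read_architectures(model_dir: str) -> list[str]:
    """Return the `architectures` list from a Hugging Face config.json."""
    config_path = Path(model_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    # Older or custom configs may omit the field; treat that as unknown.
    return config.get("architectures", [])


def is_vision_model(architectures: list[str]) -> bool:
    """True if any declared architecture is in the illustrative vision set."""
    return any(arch in VISION_ARCHITECTURES for arch in architectures)
```

For example, `is_vision_model(read_architectures("path/to/model"))` returns `True` for a Llama 3.2 Vision checkpoint, because its config lists `MllamaForConditionalGeneration`. A check like this only tells you to skip the export. It does not make the conversion work.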
If you need a GGUF file for a vision model, you will need to either wait for official support or follow the llama.cpp multimodal documentation to export the language and vision components separately, then combine them as your inference backend requires [source].
Would you like more detail or guidance on manual conversion steps?
Sources: