Your issue is most likely caused by a mismatch between the chat template used during fine-tuning (Alpaca format) and the template used in your Ollama Modelfile. For Qwen2.5 models, using the wrong template or stop tokens can cause infinite loops or repeated outputs in Ollama, even if inference works fine in Transformers. Unsloth documentation and user reports emphasize that the SAME chat template must be used for both training and inference, especially when exporting to GGUF/Ollama—otherwise, you get endless generations or gibberish outputs. The Alpaca template is not natively supported by Qwen2.5; you should use the ChatML template for Qwen models in both training and your Modelfile for Ollama, and ensure your stop tokens match the template's end tokens. See the official troubleshooting guidance for details and examples of correct templates and stop tokens for Qwen2.5 models in Ollama and llama.cpp environments.
Would you like a step-by-step guide on how to set up the correct template and stop tokens for Qwen2.5 in Ollama?
Sources: