Hello, I hope you are well. I am new to the world of ML and have spent the past few months experimenting with models and learning the concepts. I wanted to understand how some models behave on web scraping tasks, so I decided to use Qwen2 for my test. Initially, I used the 'traditional' method, but I noticed a significant wait time for model inference (I still don't understand why it is so much slower than models loaded with llama.cpp, even when they are not quantized). However, when I use GGUF models, they give me a brief response that doesn't fully address the question in my prompt. I have tried different prompts and some hyperparameters without success, and I am a bit frustrated at not understanding the reasons. I then tried the demo at https://huggingface.co/spaces/Qwen/Qwen2-7b-instruct-demo, which gives me the responses I expect, so I assume I am doing something wrong.
I want to continue growing in this area, which I am very passionate about. If anyone can explain what I am doing wrong and point me in the right direction, I would be very grateful. I will provide the file I am working with. Please excuse the comments in the code and the phrases in Spanish, as it is my native language.