#my voice PE is telling me its thought process instead of just answering the question.
it was the model I had picked. I had Gwen3:4b running. now that I've switched back to Gwen3:latest it just answers the questions. Is there something about the naming convention I don't understand?
going to assume you mean qwen instead of gwen
it's to do with the templating. qwen3 had some issues when it first released and custom model templates in ollama were needed. however this has been mostly fixed between ollama and model updates. but it still comes up in some specific setups.
I asked it what's the outside temperature and it went on a 5 minute explanation of the actual temperature and the feels like temperature, trying to decide if it was professional sounding enough and whether 8 words was too long or not.
I stopped it after 5 minutes
so would that be considered a bug in qwen3:4b or is it a diagnostic feature?
the model template is sending the "thinking" process as part of the response. it's a pain to mess with. if you have a different config that works then i suggest that you just use that config.
yea I moved back to qwen3:latest and everything is good. I'm just trying to learn how I would have known that before installing it. thought maybe there was a naming convention or something
there is not really any way you would have known. there was a bit of disagreement a while back on how thinking models should be dealt with and configured, with various projects saying that their way was the right way, that they were not changing to work with other projects, and that the others must change instead.
qwen3 release was the trigger point of things escalating and then finally getting fixed.
your issue is a leftover remnant of this chaos.
hopefully everyone is mostly in agreement now and future thinking models won't have the issue.
ok. thanks
no worries, good luck with your setup 🙂
What kind of hardware are you using?
Some GPU?
Nvidia GeForce RTX 5060 Ti. 20 cores, 32 GB RAM.
Oh wow... Okay..
Do you know what your setup consumes when idle?
Be sure to use the 2507 revision of Qwen 3
it improves results by a lot with the instruct model
I’m running Mac M4 Pro with 64 GB and I get good fast results but same verbose stuff
I’ll have to check on that version. I think I’m just on latest whatever that is.
I get better responses from random questions than i do for anything going through HA
I just loaded Model qwen3:30b-a3b-instruct-2507-q4_K_M and it’s pretty promising. Lights on in <2 seconds and about 10 seconds for complex questions or commands
But next is setting up whisper with a better model
Oh, I just use 4b Q4KM
And it works well to call my scripts and do all i need, it can start movies, shows, and music playback
Along with triggering my light effect scripts
How fast is it
That’s pretty respectable
I'm facing the same issue. I'm trying to test the nemotron-mini:4b, deepseek-r1:8b, and granite3.1-dense:8b models with Ollama. Is there some place I can try and modify the templates in HA? Or Ollama? Or is this some other issue? All 3 of those models are on the Tools listing in Ollama. I'm trying to run this on a GTX 1080, so I'm trying some smaller/lighter weight models so they run faster.
there is a way you can create a custom template file that works correctly and then link it to the model you want to use. then once it's loaded in ollama you can select it from HA. it's been a while since i messed with that though.
Do you have a link to the documentation of where that is? So I can look into it myself.
Hm. Is it the prompt template that might need tweaking? Looks like https://github.com/ktsaou/ollama-templates and https://community.home-assistant.io/t/i-made-a-new-template-for-ollama/725113 and https://medium.com/@laurentkubaski/ollama-prompt-templates-59066e02a82e might be a good start. And maybe https://pastebin.com/ZhtJ6BrJ
not directly but it's part of ollama documentation somewhere i am sure.
i remember having a working qwen3 setup. you can use a command to output the template file from a running model, so i piped that to a file and then edited the model reference in it to point at the model i wanted to use, which had a non-working default template. then you can add the model by specifying the customised template file.
as i said i can't remember the specifics but that was the gist of it. for the commands the ollama help output and/or man page should be sufficient.
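For anyone following along, here is a rough Python sketch of just the "edit the model reference" step. It assumes the dump came from something like `ollama show --modelfile <working_model>` and that the only change needed is the FROM line. The function name `retarget_modelfile`, the model tags, and the placeholder template are made up for illustration, so treat it as a starting point rather than the exact steps used above.

```python
def retarget_modelfile(modelfile_text: str, new_base: str) -> str:
    """Return a copy of a Modelfile whose FROM line points at new_base."""
    out = []
    replaced = False
    for line in modelfile_text.splitlines():
        # Only rewrite the first real FROM instruction; commented-out
        # hints such as "# FROM ..." are left untouched.
        if not replaced and line.strip().upper().startswith("FROM "):
            out.append(f"FROM {new_base}")
            replaced = True
        else:
            out.append(line)
    if not replaced:
        raise ValueError("no FROM line found in Modelfile")
    return "\n".join(out) + "\n"


# Hypothetical dump from a model whose template behaves correctly.
# The TEMPLATE body here is a placeholder, not a real qwen3 template.
dump = '''# Modelfile generated by "ollama show"
FROM qwen3:latest
TEMPLATE """{{ .Prompt }}"""
'''

# Point the known-good template at the model that misbehaves.
print(retarget_modelfile(dump, "qwen3:4b"))
```

The printed text can be saved to a file and handed to `ollama create` with `-f`. Whether a template borrowed from one model actually suits another is something to check by testing.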
If i get some time i'll try and find the specifics but it won't be until tomorrow at the earliest.
Ah, ollama docs, that's a starting point. So sounds like it's the model file from ollama, which needs to be 'repackaged' with a template change around it?
oh I found a modelfile example i made for someone to use unsloth months ago HERE i forgot about that. I imagine it should still work.
Gotcha, thanks
on line 2 you can see the model it refers to
first line is just a comment but adapt as you want
and that's what I tell Ollama to load up, the .modelfile?
I'm off to bed shortly, but should have a bit of time this weekend to mess with things
when loading a model there's an option to do it as per a modelfile
ollama create [new_model_name] [[-f|--file]] [path/to/Modelfile]
source - https://linuxcommandlibrary.com/man/ollama
so something along the lines of:
ollama create my_super_awesome_custom_model -f /location/of/awesome_file.modelfile
You're the best
then when you add an agent in ollama integration it should be in the list with your custom name.
only if it works 😛 good luck 🙂
Well, it's something to mess around with and see how the system works
Good knowledge regardless
I'd suggest using https://github.com/skye-harris/hass_local_openai_llm which comes with a number of improvements on top of conversation including stripping out thinking
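To give an idea of what "stripping out thinking" can look like, here is a minimal Python sketch. It is not taken from that integration. It only assumes the model wraps its reasoning in `<think>...</think>` tags, as Qwen3 does, and the helper name `strip_thinking` is made up for illustration.

```python
import re

# Matches one <think>...</think> block, including newlines inside it.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_thinking(text: str) -> str:
    """Remove <think> blocks and surrounding whitespace from a reply."""
    return THINK_BLOCK.sub("", text).strip()


reply = "<think>\nIs 8 words too long?\n</think>\n\nIt is 12 degrees outside."
print(strip_thinking(reply))
```

This only hides the reasoning after the fact. The model still spends the time generating it, so a non-thinking model or a fixed template is still the better fix for slow replies.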