#AI voiceassistant homeassistant
it depends on your available hardware but qwen3 (of various sizes) works pretty nicely
Local? Maybe cloud?
I run Ollama locally with a GPU on one of my servers. I don't know much about online models; maybe someone else with more knowledge of that side can jump in.
There are some benchmarks HERE, but they may not include the latest models yet.
I've built an AI-powered voice assistant with wake word, voice-to-text (VTT), text-to-voice (TTV), and image recognition, all on an RPi 5 8GB with the RPi AI HAT and RPi Camera Module 2. I used Vosk for speech recognition and Piper to give it a voice.
I used Gemma 3n for the local LLM and the Gemini APIs for the online LLM.
It's far from complete but works well.
What about something not local?
Not sure the OP is asking for local, but you can have both. Truth be told, and I've done this, you can use an online LLM as the default and fall back to a local one if the internet is down. BUT I have mine check canned responses before either LLM, so for the OP's stated use case (closing the gate) you don't need an LLM at all.
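Here is a rough Python sketch of that ordering: canned responses first, then the online LLM, then the local one if the online call fails. The phrase table and the `online` / `local` callables are made-up placeholders for illustration, not anyone's actual setup, and treating any `OSError` as "internet is down" is an assumption.

```python
from typing import Callable, Optional

# Hypothetical canned phrases mapped to fixed replies (illustrative only).
CANNED = {
    "close the gate": "Closing the gate.",
    "open the gate": "Opening the gate.",
}


def canned_response(text: str) -> Optional[str]:
    """Return a fixed reply when the normalized utterance is a known phrase."""
    key = text.strip().lower().rstrip(".!?")
    return CANNED.get(key)


def answer(
    text: str,
    online: Callable[[str], str],
    local: Callable[[str], str],
) -> str:
    """Try canned replies, then the online LLM, then the local LLM."""
    reply = canned_response(text)
    if reply is not None:
        return reply  # No LLM needed for simple commands.
    try:
        return online(text)
    except OSError:
        # Assumption: network failures surface as OSError subclasses
        # (ConnectionError, TimeoutError, urllib.error.URLError, ...).
        return local(text)
```

In use, `online` and `local` would wrap whatever clients you already have (for example a cloud API call and a request to a local Ollama server); a command like "Close the gate!" never reaches either of them.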
I have been using qwen3:4b without major issues, although the response time is pretty slow on my hardware, even with an i7 and a 3060 Ti.
@neon flint I have a 14gb 3060 and I get response times usually below 3s. Try qwen3-instruct-2507 in Q4_K_M.
Check if it's faster. Then make sure you enable flash attention in Ollama.
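To check whether a change actually helps, a small timing sketch like the one below can compare runs. It assumes Ollama is listening on its default local port and uses the timing fields (in nanoseconds) that its `/api/generate` endpoint returns; the model tag and prompt are just examples. As far as I know, flash attention is turned on by setting `OLLAMA_FLASH_ATTENTION=1` in the environment of the Ollama server process before it starts, so measure once without it and once with it.

```python
import json
import urllib.request

# Assumption: Ollama running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def generate(model: str, prompt: str, url: str = OLLAMA_URL) -> dict:
    """Send one non-streaming request and return the parsed JSON reply."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)


def summarize(stats: dict) -> dict:
    """Convert Ollama's nanosecond timings to seconds and tokens per second."""
    eval_seconds = stats["eval_duration"] / 1e9
    return {
        "total_s": stats["total_duration"] / 1e9,
        "tokens_per_s": stats["eval_count"] / eval_seconds,
    }


# Example (needs a running Ollama server with the model pulled):
# print(summarize(generate("qwen3:4b", "Turn off the kitchen light.")))
```

Run the same prompt a few times per configuration, since the first request also includes model load time.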
Thanks for the heads up, I'll give it a try.
I want to make a fine-tune of qwen3 for home assistant that is very good at calling system prompts