#Voice Performance boost.
1 messages · Page 1 of 1 (latest)
If your considering paying for external provider then you should consider "Home Assistant Cloud" it works well and also supports the project.
Oooh I do have cloud but I’ve not tried it yet
also another possibility to look at if you are running local (and have the extra ram available) then look at ONNX-ASR
if your running STT in english then its a bit quicker
however if your looking at speeding up the LLM "Brain" more than the recognition then perhaps a small ollama model or a system with a more powerful gpu
speeding up various parts of the pipeline can help overall speed. dont nessasarily have to focus on a specific part
which model are you running on ollama?
I was running llama3.2 lol which I used from my laptop server due to its GPU - but im looking at getting a dedicated machine for it soon.
I recommend to try out the small qwen3:4b model with thinking turned off. you might get better results with it.
Rodger - how resource intensive is it ? - debating running it on separate machine within proxmox
i don't have a huge amount of experience with the smaller model tbh
i use the 14b model but the vm that runs that has a 16gb 5060ti attached to it...
i have heard good things from others using it for assist though. and qwen3 is pretty quick in general
You want this one.
hf.co/bartowski/Qwen_Qwen3-4B-Instruct-2507-GGUF:Q4_K_M
You paste it into the Ollama integration and it will download it for you from huggingface. It's the new version of the model without thinking mode. I could never get qwen3:4b to actually stop its thinking. It uses about 4.5GB of VRAM and would be lighter than llama3.2.