Hi everyone. I'd like to replace my Google Homes and Alexa, so I'd like to run Ollama alongside Home Assistant. I want to move away from big tech such as ChatGPT (I use it for prompts at work). Is it better to put an RTX 3060 in my basic home server (old CPU and RAM), or to get a Jetson Nano Super? I also run a few other programs in Docker containers, such as Plex, Home Assistant and Nextcloud.
#Replace Google Home and Alexa
Tbh, if you want an experience on par with Google/Alexa, you'd need something more powerful than either of those options to run a halfway decent LLM that can handle home commands. If the LLM is for Q&A only and you don't intend to let it do home control, either option will work, but the LLM will likely not be as smart.
You could expose the GPU to Docker and run Ollama that way, then add the Ollama integration in HA.
That was more or less the plan. I want to use Ollama as the backend, really. The new Home Assistant voice devices seem to do the home control stuff well.
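For anyone following along, here's a minimal sketch of how you might check that the Ollama container is reachable before adding the integration in HA. It assumes Ollama is listening on its default port 11434 and uses its `/api/tags` endpoint to list installed models. The host name and the helper names (`tags_url`, `parse_model_names`, `list_models`) are placeholders for illustration, not anything HA or Ollama ships.

```python
import json
import urllib.request


def tags_url(host="localhost", port=11434):
    """Build the URL of Ollama's model-listing endpoint (default port assumed)."""
    return f"http://{host}:{port}/api/tags"


def parse_model_names(body):
    """Extract model names from the JSON text returned by /api/tags."""
    data = json.loads(body)
    return [model["name"] for model in data.get("models", [])]


def list_models(host="localhost", port=11434, timeout=5):
    """Ask a running Ollama server which models it has pulled.

    Raises urllib.error.URLError if the server cannot be reached,
    which usually means the container or port mapping is not set up yet.
    """
    with urllib.request.urlopen(tags_url(host, port), timeout=timeout) as resp:
        return parse_model_names(resp.read().decode("utf-8"))


# Example (requires a running Ollama server, so it is left commented out):
# print(list_models("my-home-server"))
```

If `list_models` returns the model you pulled, the same host and port should be what you enter when setting up the Ollama integration in HA.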
I started a thread on #machine-learning. I too (like so many others) want to do this, but I'm uncertain what hardware is required to achieve what sort of performance. I have a low-profile 4060 currently and it doesn’t perform great. I’m considering waiting for the 5090 at CES or getting 2x 3090s, but I’m using a mini-ITX board with only one PCIe slot, so I really want only one card.
It feels like there should be an HA Assist local hardware performance table where folks can contribute their experience with their hardware using some sample sentences/requests.
Problem is, there are so many factors that affect performance. For instance, I've seen people run qwen2.5-7b fine with a context window of 8196 on GPUs like the 4060. But those people also only exposed something like 15-20 entities.
Meanwhile it performed terribly with those same settings for me because I have 153 exposed entities, numerous intent scripts (tools), automations and such. So I had to bump my context window up to 33768 and lower my remembered messages to 5.
The other part is that as your context becomes larger, or more data is fed into the LLM's context, the neural network has to do more work, which can also slow things down. So basically two people could run the same model, with the same params and context window size, on the same hardware, and get totally different performance due to the differences in their HA environment. 😕
And then there are also tweaks you can make to Ollama, like enabling flash attention and changing the quantization of the KV cache, which again affect performance in terms of both speed and accuracy 😅
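To put rough numbers on the context window point, here's a back-of-the-envelope sketch of how the KV cache grows with context size and cache quantization. The layer and head counts are my assumption for a Qwen2.5-7B-class model (28 layers, 4 KV heads, head dimension 128), and the bytes-per-value figures are the ggml block sizes for f16, q8_0 and q4_0. It ignores model weights and all other overhead, so treat it as an estimate only. In Ollama, these tweaks are controlled with the `OLLAMA_FLASH_ATTENTION` and `OLLAMA_KV_CACHE_TYPE` environment variables.

```python
# Approximate storage cost per cached value for each KV cache type.
BYTES_PER_VALUE = {
    "f16": 2.0,
    "q8_0": 34 / 32,  # ggml q8_0 packs 32 values into 34 bytes
    "q4_0": 18 / 32,  # ggml q4_0 packs 32 values into 18 bytes
}


def kv_cache_bytes(context_tokens, n_layers=28, n_kv_heads=4, head_dim=128,
                   cache_type="f16"):
    """Estimate KV cache size in bytes.

    The default architecture numbers are assumptions for a
    Qwen2.5-7B-class model; check the model card for your own model.
    """
    # One key vector and one value vector per layer, per KV head, per token.
    values_per_token = 2 * n_layers * n_kv_heads * head_dim
    return context_tokens * values_per_token * BYTES_PER_VALUE[cache_type]


def to_gib(num_bytes):
    """Convert bytes to GiB."""
    return num_bytes / 2**30


if __name__ == "__main__":
    # The two context sizes mentioned in the thread.
    for ctx in (8196, 33768):
        for cache_type in BYTES_PER_VALUE:
            size = to_gib(kv_cache_bytes(ctx, cache_type=cache_type))
            print(f"ctx={ctx:>6} cache={cache_type:<5} ~{size:.2f} GiB")
```

Under these assumptions, going from a context of 8196 to 33768 with an f16 cache takes it from roughly 0.44 GiB to roughly 1.80 GiB, on top of the model weights, and a q8_0 cache needs a bit more than half of that. That's part of why the same card can feel fine for one setup and cramped for another.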
Makes sense. I think I'll use the new Home Assistant voice device to handle the actual HA stuff, then a 3060 passed through to a Docker container and hooked into HA for the LLM stuff.
What's "actual HA stuff" and "LLM stuff"?
I mean controlling lights and such through HA, and the LLM for questions like "What day of the week is January 2nd?"
Home Assistant's Voice Preview Edition is Home Assistant's first attempt at hardware for local voice control in your smart home, with a focus on privacy! This has been a long time coming, and in this video we take a look at how well it works in a review of the Home Assistant Voice Preview Edition.
Home Assistant Voice Preview Edition:
ht...
I see. That's the way I use it. No home control for the LLM.
Just bear in mind that the device itself does not do TTS/STT. For that you either need Nabu Casa cloud or have to run your own Whisper/Piper locally.
Is that resource intensive?
Yeah, you'd want a GPU if you're running your own locally.
Think a 3060 is enough?
Might be enough for TTS/STT, but I'm not sure an LLM would also fit or run well.
I need to back order those 🙂
Or build one yourself 🙂
True!
Any guides you'd recommend?