#Replace Google Home and Alexa

1 messages · Page 1 of 1 (latest)

fickle mortar
#

Hi everyone. I'd like to replace my Google homes and Alexa. So I'd like to run ollama alongside home Assistant. I want to move away from big tech such as chatgpt (I use it for prompts at work). I was wondering, is it better to put an RTX 3060 in my basic home server (old cpu and ram). Or to get a Jetson Nano super? I also run a few other programs in docker containers such as Plex, home assistant and nextcloud

winged ore
#

Tbh if you want an experience on par with google/alexa, you'd need something more powerful than either of those options to run a halfway decent llm that can handle home commands. Otherwise if the llm is just for q&a only and you intend not to allow it to do home control, either option will work but the llm will likely not be as smart.

#

You could add the gpu to docker and run ollama that way, then add the ollama integration in HA.

fickle mortar
hoary solstice
#

I started a thread on #machine-learning - I too (as so many others) want to do this but uncertain as to what hardware is required to achieve what sort of performance. I have a low profile 4060 currently and it doesn’t perform great. I’m considering waiting for the 5090 at CES or 2x 3090s (but I’m using a mini-ITX with only one PCI-E) so really want only 1 card.

#

It feels like there should be a HA Assist local hardware performance table where folks can contribute to their experience with their hardware using some sample sentences/requests

winged ore
#

Problem is there's so many factors that affect performance. For instance I've seen people run qwen2.5-7b fine with a context window of 8196 on gpus like the 4060. But those people also only exposed something like 15-20 entities.
Meanwhile it performed terrible with those same settings for me because i have 153 exposed entities, numerous intent scripts (tools), and automations and such. So i had to bunk my context window up to 33768 and lower my remembered messages to 5.

#

The other part is that as your context becomes larger or more data is fed into the llm/context, the neural network has to do more work, which can also slow things down. So basically two people could run the same model, with the same params and context window size, on the same hardware, and get totally different performance due to the differences in their HA environment. 😕

#

And then there's also tweaks you can make to ollama, like enabling flash attention and messing with the quantization of the kv cache that again affect performance both in terms of speed and accuracy 😅

fickle mortar
#

Makes sense. I think I'll use the new Home Assistant voice device to handle the actual HA stuff. Then a 3060 on a docker container hooked into HA for the LLM stuff.

sage scarab
fickle mortar
#

I mean to control lights and such through HA. And the LLM for questions like "What day of the week is January 2nd?"

sage scarab
#

I see. That's the way I use it. No control for LLM.

winged ore
#

Just bear in mind that the device itself does not do tts/stt, for that you either need nabu cloud or have to run your own whisper/piper locally

winged ore
#

Yeah, you'd want a GPU if you are running your own locally

fickle mortar
winged ore
#

might be enough for tts/stt, not sure an LLM would also fit or run well

stuck rover
#

just got my HA voice - love it

#

just thought i'd say that - ha

fickle mortar
sage scarab
fickle mortar