#Where is best to run TTS, STT, LLM, etc.?


bronze holly
#

Hi,

I am starting to work my way into voice control for my HA setup. I run Home Assistant OS on a Raspberry Pi 5 with 16GB RAM and it is doing perfectly fine. I wonder what the best option is for hosting the various components of a voice assistant. It looks like I should be able to run TTS and STT as add-ons on the same Pi. For the LLM especially, I imagine running it on another host is the better option. But is that the best choice?

There are a lot of sources on these topics, but I struggle to tell which ones are outdated. The whole field has moved incredibly fast over the last year. Can someone give me a rundown of what a "state of the art" architecture would look like? I'd really appreciate it.

quick locust
# bronze holly Hi, I am starting to work my way into voice control for my HA setup. I run Home...

Some thoughts:
If you're using an RPi, make sure you are running it from an SSD and not an SD card.

For a fully local setup:
For TTS, Piper will run just fine on the RPi as an add-on.
For STT, Whisper will run with a smaller model at a reasonable speed, though not super quick. If you want better speed and/or accuracy from a bigger model, running STT on a separate system with GPU acceleration will likely be an improvement.
As for a local LLM, you really can't run this on the RPi. I run Ollama on a server on my network, which has a 5060 Ti 16GB GPU to run models.
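The split described above (Piper and Whisper on the Pi, the LLM on a separate GPU machine) can be sanity-checked with a small script. This is only a rough sketch: the hostnames are hypothetical, and the ports are the commonly used defaults (10200 for Wyoming Piper, 10300 for Wyoming Whisper, 11434 for Ollama), which are assumptions you should verify against your own setup. The add-on ports may not be exposed on the host by default.

```python
import socket
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    role: str  # "tts", "stt" or "llm"
    host: str  # hypothetical hostname, replace with your own
    port: int  # assumed default port, verify in your configuration


# Assumed layout: TTS and STT on the Pi, LLM on a separate GPU server.
LAYOUT = [
    Service("tts", "homeassistant.local", 10200),  # Piper (assumed port)
    Service("stt", "homeassistant.local", 10300),  # Whisper (assumed port)
    Service("llm", "gpu-server.local", 11434),     # Ollama (assumed port)
]


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(layout):
    """Map each role to whether its endpoint currently accepts connections."""
    return {s.role: is_reachable(s.host, s.port) for s in layout}


if __name__ == "__main__":
    for role, ok in report(LAYOUT).items():
        print(f"{role}: {'reachable' if ok else 'not reachable'}")
```

This only confirms that something is listening on each port. It does not check that the service behind it is actually Piper, Whisper or Ollama.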

If you are not averse to using cloud services, there are some other options:
Home Assistant Cloud provides you with great STT and TTS services (along with some other stuff), and you actively support the project by using it.
There are various options for connecting cloud LLMs, but the most common tend to be OpenAI and Gemini. These can be mixed and matched with local services too.

An ideal setup really depends on what your goal is.

jagged sigil
quick locust
# jagged sigil Can you share your LLM setup and idle power consumption?

I don't have the servers individually power tracked currently, but I don't think it would help you. It's a big server running Proxmox which does multiple things.

My current setup is that the GPU is passed through to an Ubuntu VM on Proxmox using PCIe passthrough, and that VM runs Ollama in Docker.
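To check that an Ollama server like this is usable from another machine on the network, something along these lines might work. It is a minimal sketch using Ollama's `/api/generate` HTTP endpoint with streaming turned off. The hostname `gpu-server.local` and the model name in the usage note are hypothetical placeholders, and the port is Ollama's default 11434, assuming it is published from the container.

```python
import json
import urllib.request

# Hypothetical host; 11434 is Ollama's default port.
OLLAMA_URL = "http://gpu-server.local:11434"


def build_generate_request(prompt, model, base_url=OLLAMA_URL):
    """Build (but do not send) a non-streaming /api/generate request."""
    body = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model, base_url=OLLAMA_URL, timeout=60.0):
    """Send the prompt to the server and return the generated text."""
    req = build_generate_request(prompt, model, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

For example, `generate("Say hello in one sentence.", "llama3.2")` would return the model's reply as a string, assuming a model with that name has already been pulled on the server. Inside Home Assistant itself you would normally point the Ollama integration at the same URL rather than call the API by hand.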

jagged sigil
#

Okay. And do you think it's more efficient compared to a Mac mini M4?

quick locust
#

You can run a GPU in a small, power-efficient server. It will probably idle at a higher power draw than a Mac mini, but will probably be cheaper and perform better.