# Local voice assistant for Home Assistant: Which option to choose?

weak tiger
#

Hey everyone, I'm looking to set up a local and private voice-controlled assistant for my Home Assistant setup. I've come across several options and tutorials, and I'm not sure which route to take. Here are the main approaches I've seen:

  1. Setting up Rhasspy to voice control Home Assistant locally
  2. Using an M5Stack ATOM Echo Development Kit with Home Assistant's built-in voice control features
  3. Creating a custom wake word with openWakeWord and integrating it into Home Assistant
  4. Using an ESP32-S3-Box as a local voice assistant
  5. Using a Raspberry Pi Zero 2 W with a ReSpeaker 2-Mics Pi HAT to create a Wyoming Satellite

I'm prioritizing privacy and local control. I'm wondering which of these approaches would be most suitable for me in terms of ease of setup, reliability, and ongoing maintenance. Are there any other options I should consider?
Also, I'd appreciate any insights on the pros and cons of each method, especially regarding performance and integration with Home Assistant. Thanks in advance for your help!

karmic cargo
#

Those are all from different areas.
1 is basically analogous to Assist (a conversation agent). Choosing between the two, I'd go with Assist.
2 and 4 are satellites based on ESP32. Both are good for tinkering but practically unusable for daily tasks... The S3 Box might work on your table in a quiet room.
5 is a satellite based on a Raspberry Pi. It didn't work decently for me, despite having 2 mics. It's pretty hard to customize too.
3 is a wake word engine. It can be used with 5, or with ESP-based satellites if you don't use an on-device wake word (I highly recommend using on-device, though).
There are a couple of decent satellites in development (one by the HA team), and no ready-to-use devices on the market. At your stage, I'd wait.

I use a variation of option 2, and the ReSpeaker Lite Assistant Kit from Seeed. The latter is decent, but requires tinkering to make it work.

#

Before asking further questions, please google a bit. 🙂

rich gale
#

A lot depends on the size of your rooms and the amount of background noise in them. The ESP solutions work pretty well for me (DIY ESP32 mic-only satellites and an ESP32 S3 Box 3), but I live in a very quiet household. I use the Nabu Casa cloud, with Whisper as a local failover in the event of an internet dropout. I think you will find a mix of people's experiences, as some setups work well for some but not for others. There is also a Nabu Casa device due to be released later in the year, which will hopefully address some of the issues that users have experienced (this is also ESPHome-based). As using voice with HA is a bit of a learning curve, it may be worth getting a device as a test device to see what is involved and how it can work for you in the long term.

weak tiger
#

Hi everyone,

It’s been over six months since I first asked for advice on building a privacy-focused, entirely local voice assistant for Home Assistant. Back then I was weighing options like Rhasspy, M5Stack/ESP32 satellites, custom openWakeWord wake-words, Raspberry Pi Zero W projects, etc. You all generously shared your experiences and told me to hold off until the ecosystem matured.

I took your advice and have been monitoring developments.
My priorities remain the same:

  • 100% local/edge processing
  • Seamless Home Assistant integration
  • Reliable performance in a normal household environment

My question now:
Given what’s landed in the last six months, what project or platform would you recommend I dive into today? Are there any new “must-try” devices, libraries or tutorials that have proven reliable?

Thanks in advance for your insights!

karmic cargo
slow elbow
karmic cargo
slow elbow
#

You mean Whisper? I thought Piper was TTS?

#

My apologies if I'm confused

#

I am using faster-whisper with my purely local pipeline, and I find that there are sometimes oddities when I use my headset mic in my browser to interact with the assistant. There is no wake word detection at all in this setup, so I am not sure why, but sometimes words get cut off from the beginning of the transcript in the STT. This is making me paranoid that a device sitting on a table will really struggle to hear me across the room and transcribe accurately, when a microphone right in front of my face sometimes has strangely poor results.

slow elbow
#

I'm glad it works well, it's a nice design

karmic cargo
slow elbow
#

Ah, very good

karmic cargo
slow elbow
#

It might be because you're using the Nabu Casa cloud service, which is more reliable and probably a lot higher performance than the VM I'm running faster-whisper in. It is also the default setup that comes with the Wyoming protocol integration. I'm not sure it's optimal.

karmic cargo
slow elbow
#

Oh, I see... How do you have it set up? Do you have the separate Whisper integration installed, or are you using the one built into the Wyoming integration by default?

#

I get a warning message when I want to try to install the separate Whisper integration. I'm not sure that's the right path for me because I want to be able to use it with satellite devices eventually. It's a bit confusing.

karmic cargo
#

Hmm, I have Whisper in Docker, and added it via Wyoming.
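
For anyone following along, here is a minimal sketch of checking that a Dockerized Whisper server is reachable before adding it through the Wyoming integration. It assumes the container publishes port 10300 (the port used in the wyoming-whisper Docker examples) and uses `homeassistant.local` as a placeholder hostname. The helper name `wyoming_port_open` is hypothetical, and it only tests the TCP connection, not the Wyoming protocol itself.

```python
import socket


def wyoming_port_open(host: str, port: int = 10300, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Assumption: 10300 is the port the Whisper container publishes; adjust
    it if your container maps a different one.
    """
    try:
        # create_connection resolves the host and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Example (the hostname is a placeholder for wherever Docker runs):
# wyoming_port_open("homeassistant.local")
```

If this returns False, the integration will not be able to connect either, so it is worth checking the container's port mapping and any firewall rules first.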

slow elbow
#

Does it look like this in your assistants settings?

karmic cargo
#

Yes, exactly like that

slow elbow
#

So it's set up exactly the same as mine. I wonder what the issue is? For me, latency does not seem to be the problem. I wonder if it's just that my mic is crappy. In fact, that might be it, because the more I think about it, when I use it on my phone it seems to do a much better job of detection, even in noisy circumstances. Maybe because my phone's mic array has DSP happening that my computer mic doesn't?

#

Either way, that gives me a lot more confidence that the Voice PE devices or the similar Koala devices will work very well

karmic cargo
#

Yeah, the fact that the microphone will be working already (for wake word) will help too

slow elbow
#

Definitely

weak tiger
#

Did either of you @slow elbow @karmic cargo try Home Assistant Voice Preview Edition? It seems like there is minimal tinkering, as it is ready OOB and has some decent hardware

karmic cargo
slow elbow
#

I'm interested in buying it, but I need to be able to change the wake word to something else, and I'm only making minimal progress on that at the moment

slow elbow
weak tiger
weak tiger
karmic cargo
jagged galleon
#

I have a VPE. Seconding basically everything everyone said here. It’s pretty much the same hardware, but it’s a nice out-of-the-box experience. It’s got a Grove port for expansion as well as open components if you want to crack in and do whatever hardware-wise.

Software-wise, there is not any meaningful difference. These are all ESP devices and run with ESPHome.