#Jarvis “gum stick” build with no speaker
You want to carry that device with you, and make HA output TTS to Sonos in the room you're in?
That comes with a bunch of problems, starting with power consumption and ending with area awareness, which would be impossible to implement on a device you carry around...
No, I'd stick it on or near the Sonos speaker
and leave it. I would build several of them, one per room / per Sonos
From a software standpoint, as long as the Sonos output doesn't interfere with the satellite's audio input, it can be done quite easily (see message #1162776992496763041)
but you may end up with other issues: visual/audio feedback for things other than responses (e.g. error indicators, timer notifications/control), power requirements (you have to have 5V/3.3V available to power the ESPs) etc.
Very cool. Is your point about audio feedback for things other than responses specific to errors that come from attempting to make the voice assistant call itself? I.e., it's only going to work if the satellite is able to successfully call into hass, and then hass triggers the media_player response?
That is correct. Exceptional situations may produce errors well before that point
Got it. Is there any generally recommended speaker for non-music scenarios? Something small that doesn't sound like a tin can?
Onju Voice + Nest Mini 2, Raspiaudio Muse Luxe (awful mic), Raspiaudio Muse Radio
Plus, HA will launch its own hardware soon
I may just sit tight for a bit then and see what they launch. I appreciate your help!
FutureProofHomes works on one too. Also you forgot Respeaker Lite 🙂
I was recommending things that look like finished products 😅
Yes, it's happening right now, just wait a bit. 🙂 At this stage only the dumbest (myself included) have gone all the way and gotten rid of proprietary assistants completely. It's a bumpy road so far, but we're getting there step by step. Check out this: https://github.com/formatBCE/Respeaker-Lite-ESPHome-integration
Well, OP is ready to go the ESP path 🙂
There are finished ESP products and there are Dupont-spaghetti ESP builds. Personally, I've grown a bit tired of the latter, but they are indeed a good option if you are willing to tinker
Yeah… WAF matters for these. But utility / water meter / energy meter sensors are fine as heavy DIY, with an ESP hanging from a power cord over a plumbing pipe.
I want to wire things up to azure OpenAI for intent summarization too, which to my understanding is easier to do with a diy voice assistant.
That process happens in HA, not the satellite, so it does not matter what satellite you use
Wired it up to Azure Speech for STT and TTS, and OpenAI for the LLM. This is magic. I would love to figure out how to get Azure OpenAI working for the LLM, as I have Visual Studio credits there, but I don't think the native integration works with it. Not self-hosted by any stretch, but fast and cheap.
I'm not sure if anyone made any custom integration for Azure OpenAI, but HA has a built-in intent parser. Sure, it's rigid unlike LLMs, but it does the job for free and super fast
Just to make sure I'm thinking about this correctly, there should be one Home Assistant "Assistant" per voice gateway that has OpenWakeWord installed on it?
Each Home Assistant instance can have multiple "assist pipelines"/voice assistants configured, each referencing:
- 1 supported language
- 1 conversation agent (examples include the built-in one or the LLM based interpreter), supporting the pipeline's language
- an optional TTS service in the pipeline's language
- an optional STT service in the pipeline's language
- an optional wake word detection engine, language independent
If you don't set both TTS and STT, that pipeline will be used in text mode only.
Voice satellites require pipelines with the TTS/STT services set. Each "voice satellite" can use one pipeline at a time but you can switch the used pipeline/voice assistant.
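To make the pipeline rules above concrete, here is a minimal Python sketch of them as a data model. It is only an illustration: the class and field names (`AssistPipeline`, `Satellite`, `supports_voice`, and so on) are hypothetical and are not Home Assistant's actual schema or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AssistPipeline:
    """Illustrative model only; names are hypothetical, not HA's schema."""
    name: str
    language: str                           # exactly one supported language
    conversation_agent: str                 # built-in agent or an LLM-based one
    tts_engine: Optional[str] = None        # optional, in the pipeline's language
    stt_engine: Optional[str] = None        # optional, in the pipeline's language
    wake_word_engine: Optional[str] = None  # optional, language independent

    def supports_voice(self) -> bool:
        # Without both STT and TTS the pipeline is text mode only.
        return self.tts_engine is not None and self.stt_engine is not None


@dataclass
class Satellite:
    """A voice satellite uses one voice-capable pipeline at a time."""
    name: str
    pipeline: AssistPipeline

    def __post_init__(self) -> None:
        self._require_voice(self.pipeline)

    @staticmethod
    def _require_voice(pipeline: AssistPipeline) -> None:
        if not pipeline.supports_voice():
            raise ValueError(f"pipeline {pipeline.name!r} is text-only")

    def switch_pipeline(self, pipeline: AssistPipeline) -> None:
        # Switching is allowed, but only to another voice-capable pipeline.
        self._require_voice(pipeline)
        self.pipeline = pipeline
```

In this sketch, creating a `Satellite` with a pipeline that lacks STT or TTS raises a `ValueError`, which mirrors the rule that satellites need both services set.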
OpenWakeWord delays the start of STT until the wake word is detected in the audio stream coming from the satellite (which streams 24/7); the detection runs in the OWW engine on the HA side. After that detection takes place, the pipeline runs normally. This approach needs little processing power on the satellites, but it streams audio constantly and analyzes it constantly on the HA side.
The other approach is to have local wake word detection (local to the satellite), such as Micro Wake Word for ESP-based satellites, and only stream audio to the pipeline with STT after the wake word was detected on the satellite. This reduces network traffic and removes the constant audio analysis on HA, but requires more processing power on the satellite
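A rough back-of-the-envelope comparison of the two approaches, as a Python sketch. The audio format (16 kHz, 16-bit, mono PCM) and the usage figures (50 commands a day, 5 seconds each) are assumptions for illustration, not measurements, and protocol overhead is ignored.

```python
# Assumed audio format: 16 kHz, 16-bit, mono PCM (an assumption, not measured).
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2
CHANNELS = 1
SECONDS_PER_DAY = 24 * 60 * 60


def stream_bytes(seconds: float) -> int:
    """Raw audio payload in bytes for the given streaming duration."""
    return int(seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)


# Wake word detection on the HA side: the satellite streams all day.
always_on = stream_bytes(SECONDS_PER_DAY)

# Wake word detection on the satellite: stream only after a detection.
# Hypothetical usage: 50 commands a day, about 5 seconds of audio each.
on_demand = stream_bytes(50 * 5)

print(f"constant streaming: {always_on / 1e9:.2f} GB/day")   # 2.76 GB/day
print(f"local wake word:    {on_demand / 1e6:.1f} MB/day")   # 8.0 MB/day
```

Under these assumptions, constant streaming is about 256 kbit/s per satellite around the clock, and that cost multiplies with one satellite per room.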
So in this scenario I could run OWW on the hass backend and the only thing the satellites do is constantly stream audio?
Correct. Not recommended, though, because of the constant streaming
What does the software stack look like on the satellite?
on the ESP? it's just the firmware using the mandatory voice_assistant component in ESPHome https://www.esphome.io/components/voice_assistant. everything else is optional
So basically keep the “use_wake_word” at false and it’ll 24/7 stream to an assist pipeline
I deployed the openwakeword docker container and wired it up to the assist pipeline. Do I pipe the audio from the esp to that OWW container or is hass smart enough to route the audio stream to the OWW container over whisper?
if you add a new Wyoming integration instance for the new container, the rest is taken care of automatically
you can only test that setup on a satellite, mind you. neither the companion app, nor the browser will use wake word
They use the rest of the pieces of the pipeline but not the wake word.
How does the satellite pick the assist pipeline it’s working with?
I didn’t see a pipeline name variable option in the config above.
each wyoming satellite (ESPHome included) has an option regarding which pipeline to use. Also, you have a "default" pipeline defined in HA
After adding the satellite to HA, you will see a drop-down with the pipelines in its settings.
Ahh, so the hass side.
AFAIK use_wake_word should be "true" to use the wake word detection provided by HA/OWW. You set it to "false" if you're using a separate wake word system like Micro Wake Word.
Yes. All in all, your ESPHome firmware knows nothing about the configured pipelines, so it's an HA setting.
Just need to decide on hardware to buy to play around with it. Or hope that hass comes out with their first party soon and I don’t miss the initial batch.