#Jarvis “gum stick” build with no speaker

dusk nexus
#

Hey everyone! I want to build a Jarvis “gum stick” device without a speaker. I’ve got Sonos in every room and would want hass to output the audio to that instead. Any reason why it wouldn’t work?

#

I’d like to use esp32-s3 devices with ESPHome.

lament roost
#

You want to carry that device with you, and make HA output TTS to Sonos in the room you're in?
That comes with a bunch of problems, starting with power consumption and ending with area awareness, which will be impossible to implement with it...

dusk nexus
#

No, I'd stick it on / near the Sonos speaker

#

and leave it. Would build multiple ones / per room / per sonos

shut marten
#

from a software standpoint, provided the Sonos output doesn't interfere with the satellite's audio input, it can be done quite easily #1162776992496763041 message

but you may end up with other issues: visual/audio feedback for things other than responses (e.g. error indicators, timer notifications/control), power requirements (you have to have 5V/3.3V available to power the ESPs) etc.

dusk nexus
#

Very cool. Is your point about audio feedback for things other than responses specific to errors that come from the voice assistant call itself? I.e., it's only going to work if the satellite is able to successfully call into hass, and then hass triggers the media_player response?

shut marten
#

That is correct. Exceptional situations may produce errors well before that point
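
For illustration, here is a minimal Python sketch of the "hass triggers the media_player response" step, assuming Home Assistant's REST API and the `tts.speak` service are used. The host name, token, and both entity IDs are placeholders, not values from this thread, and a real satellite setup would more likely do this from an automation or the ESPHome config. The request is only built here, not sent.

```python
import json
import urllib.request


def build_tts_request(base_url, token, tts_entity, sonos_entity, message):
    """Build (but do not send) a POST that calls the tts.speak service."""
    payload = {
        "entity_id": tts_entity,                 # TTS entity doing the synthesis
        "media_player_entity_id": sonos_entity,  # speaker that plays the result
        "message": message,
    }
    return urllib.request.Request(
        url=f"{base_url}/api/services/tts/speak",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# All values below are hypothetical placeholders.
req = build_tts_request(
    "http://homeassistant.local:8123",
    "YOUR_LONG_LIVED_TOKEN",
    "tts.example_engine",
    "media_player.living_room_sonos",
    "The kitchen light is on.",
)
# urllib.request.urlopen(req) would actually send it.
```

Note that this only covers the happy path: if the satellite never reaches hass, nothing gets played on the Sonos, which is the error-feedback gap discussed above.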

dusk nexus
#

Got it. is there any generally recommended speaker for non-music scenarios? Something small that doesn't sound like a tin can?

shut marten
#

Onju Voice + Nest Mini 2, Raspiaudio Muse Luxe (awful mic), Raspiaudio Muse Radio

#

Plus, HA will launch its own hardware soon

dusk nexus
#

may just sit tight for a bit then and see what they launch. I appreciate your help!

shut marten
#

I was recommending product-lookalike products 😅

lament roost
# dusk nexus may just sit tight for a bit than and see what they launch. I appreciate your he...

Yes, it's happening right now, just wait a bit. 🙂 At this stage only the dumbest (me included) went all the way to get rid of proprietary assistants completely. It's a bumpy road so far, but we're getting there step by step. Check out this: https://github.com/formatBCE/Respeaker-Lite-ESPHome-integration

shut marten
#

There are finished-product ESP devices and there are Dupont-spaghetti ESP devices. Personally, I've grown a bit tired of the latter, but indeed they are a good option if you are willing to tinker

dusk nexus
#

Yeah… WAF matters for these ones. But utility / water meter / energy meter sensors are fine in the heavy-DIY style, with an ESP hanging from a power cord over a plumbing pipe.

#

I want to wire things up to Azure OpenAI for intent summarization too, which to my understanding is easier to do with a DIY voice assistant.

dusk nexus
#

Wired up to Azure speech for STT and TTS and OpenAI for LLM. This is magic. Would love to figure out how to get Azure OpenAI working for LLM as I have Visual Studio credits there, but I don't think the native integration works with it. Not self-hosted by any stretch, but fast and cheap.

shut marten
#

I'm not sure if anyone made any custom integration for Azure OpenAI, but HA has a built-in intent parser. Sure, it's rigid unlike LLMs, but it does the job for free and super fast
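
To show what "rigid" means here, a toy Python sketch of template-based intent matching. This is not HA's actual parser: the template, the intent name `set_power`, and the slot names are all made up for illustration. The point is only that fixed sentence patterns match instantly and for free, but anything phrased outside them falls through.

```python
import re

# Hypothetical sentence templates; a real parser has many more per language.
TEMPLATES = [
    (re.compile(r"^turn (?P<state>on|off) (?:the )?(?P<name>.+)$"), "set_power"),
]


def match_intent(text):
    """Return (intent, slots) for the first matching template, else None."""
    cleaned = text.lower().strip().rstrip(".!?")
    for pattern, intent in TEMPLATES:
        match = pattern.match(cleaned)
        if match:
            return intent, match.groupdict()
    return None


hit = match_intent("Turn on the kitchen light.")
miss = match_intent("Could you make it brighter in here?")
```

Here `hit` carries the intent plus the extracted `state` and `name` slots, while `miss` is `None` because the free-form phrasing matches no template. That second case is where an LLM-based agent earns its cost.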

dusk nexus
#

Just to make sure I'm thinking about this correctly, there should be one Home Assistant "Assistant" per voice gateway that has OpenWakeWord installed on it?

shut marten
# dusk nexus Just to make sure I'm thinking about this correctly, there should be one Home As...

Each home assistant instance can have multiple "assist pipelines"/voice assistants configured, each referencing:

  • 1 supported language
  • 1 conversation agent (examples include the built-in one or the LLM based interpreter), supporting the pipeline's language
  • an optional TTS service in the pipeline's language
  • an optional STT service in the pipeline's language
  • an optional wake word detection engine, language independent

Unless both TTS and STT are set, that pipeline will be used in text mode only.

Voice satellites require pipelines with the TTS/STT services set. Each "voice satellite" can use one pipeline at a time but you can switch the used pipeline/voice assistant.
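
The pipeline structure described above can be sketched as a small Python data model. The class and field names are hypothetical and only mirror the list above (they are not HA's internal names), and the engine strings in the examples are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AssistPipeline:
    """Illustrative model of one assist pipeline / voice assistant."""
    name: str
    language: str                           # exactly one supported language
    conversation_agent: str                 # built-in agent or an LLM-based one
    tts_engine: Optional[str] = None        # optional, in the pipeline's language
    stt_engine: Optional[str] = None        # optional, in the pipeline's language
    wake_word_engine: Optional[str] = None  # optional, language independent

    def usable_by_voice_satellite(self) -> bool:
        # Voice satellites need both speech-to-text and text-to-speech set.
        return self.stt_engine is not None and self.tts_engine is not None


text_only = AssistPipeline("Text chat", "en", "builtin")
voice = AssistPipeline(
    "Living room", "en", "builtin",
    tts_engine="example_tts",
    stt_engine="example_stt",
    wake_word_engine="openwakeword",
)
```

With these examples, `text_only.usable_by_voice_satellite()` is `False` and `voice.usable_by_voice_satellite()` is `True`.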

#

OpenWakeWord starts the STT only after the OWW engine detects the wake word in the audio that the satellite streams to HA around the clock. After that detection takes place, the pipeline runs normally. This approach uses little processing power on the satellites, but it streams audio constantly and HA has to analyze it constantly.

#

The other approach is to have local wake word detection (local to the satellite), such as Micro Wake Word for ESP-based satellites, and only stream audio to the pipeline with STT after that wake word was detected on the satellite. This reduces network traffic and removes the constant audio analysis on HA, but requires more processing power on the satellite
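
A rough Python sketch of the trade-off between the two approaches, counting how many audio chunks cross the network in each. Chunks are stand-in strings and the detector is a trivial placeholder, not a real wake word model. The sketch also ignores end-of-command detection, so everything after the wake word is treated as the command.

```python
from typing import Callable, Iterable, List


def sent_with_server_side_detection(chunks: Iterable[str]) -> List[str]:
    """HA-side wake word (e.g. openWakeWord): every chunk is streamed to HA."""
    return list(chunks)


def sent_with_on_device_detection(
    chunks: Iterable[str], is_wake_word: Callable[[str], bool]
) -> List[str]:
    """Satellite-side wake word: stream only what follows the detection."""
    sent: List[str] = []
    triggered = False
    for chunk in chunks:
        if triggered:
            sent.append(chunk)        # command audio goes on to the STT pipeline
        elif is_wake_word(chunk):
            triggered = True          # detection happens locally, nothing sent yet
    return sent


audio = ["noise", "noise", "tv", "wake", "turn on", "the light"]
server_side = sent_with_server_side_detection(audio)
on_device = sent_with_on_device_detection(audio, lambda c: c == "wake")
```

In this toy run the server-side variant sends all six chunks, while the on-device variant sends only the two that follow the wake word. The price is that the detector function has to run on the satellite itself.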

dusk nexus
#

What does the software stack look like on the satellite?

dusk nexus
#

So basically keep the “use_wake_word” at false and it’ll 24/7 stream to an assist pipeline

#

I deployed the openWakeWord Docker container and wired it up to the assist pipeline. Do I pipe the audio from the ESP to that OWW container, or is hass smart enough to route the audio stream to the OWW container before Whisper?

shut marten
#

you can only test that setup on a satellite, mind you. neither the companion app, nor the browser will use wake word

dusk nexus
#

They use the rest of the pieces of the pipeline but not the wake word.

#

How does the satellite pick the assist pipeline it’s working with?

#

I didn’t see a pipeline name variable option in the config above.

shut marten
#

each satellite (Wyoming or ESPHome) has an option regarding which pipeline to use. Also, you have a "default" pipeline defined in HA
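
As a sketch of that lookup in Python: a per-satellite choice, falling back to the default pipeline when none is set. The satellite IDs and pipeline names are invented for the example, and in practice the choice is made on the HA side rather than in code like this.

```python
from typing import Dict

DEFAULT_PIPELINE = "Home Assistant"  # hypothetical name of the default pipeline

# Hypothetical per-satellite selections made in HA.
satellite_pipeline: Dict[str, str] = {
    "kitchen_satellite": "Azure voice",
}


def pipeline_for(satellite_id: str) -> str:
    """Return the satellite's selected pipeline, or the default one."""
    return satellite_pipeline.get(satellite_id, DEFAULT_PIPELINE)


kitchen = pipeline_for("kitchen_satellite")
bedroom = pipeline_for("bedroom_satellite")
```

Here `kitchen` resolves to the explicitly selected pipeline and `bedroom`, which has no selection, resolves to the default.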

dusk nexus
#

Ahh, so the hass side.

dusk nexus
#

Just need to decide on hardware to buy to play around with it. Or hope that hass comes out with their first party soon and I don’t miss the initial batch.