#Help needed: Linking HA Assist Pipeline Output (OpenAI) to Alexa Echo Dot Speaker


pine cedar
#

Hi everyone,

I'm trying to finalize my custom Voice Assistant setup in Home Assistant and I'm facing a roadblock regarding the audio output.

My Setup:

  1. Input Satellite: M5Stack Atom Echo (running ESPHome as a Voice Satellite).
  2. Language Model (LLM): OpenAI (used for intent processing/answering).
  3. Desired Output: I want the TTS (Text-to-Speech) response from the AI model to be played on an Amazon Echo Dot via the alexa_media_player integration. My goal is to use the Atom Echo's microphone but leverage the better audio quality of the Echo Dot speaker.

The Problem:
My current audio output is still coming from the Atom Echo's internal speaker, and I need to redirect it.

What I've tried (and failed):
I tried creating an automation triggered by an Assist event to capture the response text and send it as a notification/TTS to the media_player.echo_dot. This resulted in timing issues and did not work consistently.

Question:
What is the recommended and most stable way to configure the Assist Pipeline so that the final TTS output is automatically sent to a specific Alexa Echo Dot?

I am flexible: sending either the final text or the generated audio file to the Echo Dot is fine, as long as the final audio quality is maintained.

Any guidance on how to link the Pipeline's audio output to an Alexa Media Player entity is highly appreciated! Thanks in advance!

deft escarp
#

AFAIK every method introduces delay, even with something better than Alexa Media, not to mention the TTS delays in the integration itself.

proper prism
# pine cedar I have two Echo Dots that are very good speakers and I don't want to throw them ...

forwarding the piper-generated file to the dot is probably a non-starter. whilst you can send custom audio files to alexa, it's very limited in format and quality etc., and i also suspect it would fail if piper tried to do a streaming response, which is pretty standard now (not to mention the delay it would introduce).

so your option would be to trigger on the satellite receiving the text response, then have the satellite call the HA action to send the text to alexa, which could then speak it whilst the AE output is muted. however, again, this may have issues with streaming responses and would introduce quite a bit of delay. whilst the delay might be semi-workable in some situations, it would not be feasible to time back-and-forth continued conversations. if you tried, you would have to err either on the side of waiting too long for the mic to activate before you can respond, or on the side of it triggering early, in which case it's going to hear itself.
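a rough sketch of the "send the text to alexa" step, written in Python against Home Assistant's REST API rather than as an automation. it assumes the alexa_media_player integration exposes its usual `notify.alexa_media` action with a `type: tts` payload (newer versions may name the action differently), and the base URL, token and `media_player.echo_dot` entity are placeholders. it only covers sending the text; it does nothing about the delay or about muting the AE.

```python
import json
import urllib.request


def build_alexa_tts_call(text, target="media_player.echo_dot"):
    """Return (path, payload) for a TTS call to the Alexa Media notify action.

    Assumption: the action is exposed as notify.alexa_media and accepts
    a list of media_player entities as the target.
    """
    path = "/api/services/notify/alexa_media"
    payload = {
        "message": text,
        "target": [target],
        "data": {"type": "tts"},
    }
    return path, payload


def send_to_alexa(base_url, token, text, target="media_player.echo_dot"):
    """POST the call to Home Assistant and return the HTTP status code.

    base_url and token are placeholders, e.g. "http://homeassistant.local:8123"
    and a long-lived access token.
    """
    path, payload = build_alexa_tts_call(text, target)
    request = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

calling `send_to_alexa(...)` with the assistant's response text would make the dot speak it, but the echo only starts talking after the request has gone through the alexa cloud, which is where the timing problem comes from.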

realistically this is just not a practical idea for anything meaningfully useful.

proper prism
#

if you could rig a sensor onto the dot that could tell when it starts/finishes speaking, then attach it to the AE (maybe the grove port?) and use that to help with timing, it might be possible to get it semi-working. but you would still have the delay issue.

#

actually an interesting idea for a POC project...

proper prism
#

so yeah, the AE is not really designed for real production usage; it's more for development and testing and therefore pretty limited.
I understand what you're trying to do. the above were the 2 ways i can think of to do it, and the drawbacks of both of them.

if, instead of an alexa, you had a different speaker that was integrated as a standard media player that can be sent files directly, then you could do it by calling the action to send media and pointing that other media player at the tts audio file that is sent to the AE.
I did make a POC firmware for the AE a while back doing that, but it had some issues and I haven't updated it in months. so it would definitely be missing functionality (if it even still works...), but you could look at the code in the fork it links to if you were interested in the approach I took.
but as mentioned it was more POC than actually usable.
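for illustration, the "send media to a standard media player" idea could look roughly like this in Python, using Home Assistant's REST API and the built-in `media_player.play_media` action. this is not the code from the POC firmware; the entity name `media_player.living_room_speaker`, the base URL, the token and the TTS audio URL are all placeholders, and it assumes the URL of the generated tts file is already known.

```python
import json
import urllib.request


def build_play_media_call(media_url, entity_id="media_player.living_room_speaker"):
    """Return (path, payload) for a media_player.play_media call.

    entity_id is a hypothetical speaker integrated as a standard media player.
    """
    path = "/api/services/media_player/play_media"
    payload = {
        "entity_id": entity_id,
        "media_content_id": media_url,
        "media_content_type": "music",
    }
    return path, payload


def play_tts_on_speaker(base_url, token, media_url,
                        entity_id="media_player.living_room_speaker"):
    """POST the call to Home Assistant and return the HTTP status code."""
    path, payload = build_play_media_call(media_url, entity_id)
    request = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

the speaker has to be able to fetch the audio URL itself, so this only works for media players that accept files directly, which is the limitation with alexa mentioned above.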

#

maybe i'll dust off the echo and mess with trying to make it more functional at some point, but i remember running into some roadblocks at the time. i am not sure if ESPHome has new ways that could allow me to work around the issues, but i don't remember seeing anything of note in the patch notes.

deft escarp
#

How does your Alexa handle STT? Just curious, because AFAIK there's no way. So either there is a way, or you haven't fully understood the data flow in the HA voice pipeline. 🙂
