Hi all, so I've just received my first HA Voice PE, and it seems to be working well and picks up my voice decently. However, as we probably all know, the speaker isn't great. I already have some other ESP-based satellites around, and on some of them I've used an external speaker (for example, a Google Nest Mini) to output the TTS. Has anyone taken control of the Voice PE in ESPHome and changed the config to send the TTS to an external speaker / media player? Would love to hear from you 🙂
#HA Voice PE send TTS to external speaker
In the Koala satellite I've added a TTS URL sensor, which can be used to create an automation that sends the response to any player.
Great! Can you share any code? Would be most appreciated
Many thanks! I'll get back to you if I need any more guidance or help. Have a great weekend 😄
Were you able to get this working?
And is there a way I can snip the code out and put it in the current PE config?
My main goal is to use an external speaker through the media player and to change the start and stop listening sounds.
Hi, no I haven't had a chance to try this yet. I've just looked through the yaml linked above, and there's a hell of a lot of code! @limpid kindle have you taken the whole code, and used that on your Voice PE? Would you mind sharing a copy of your exact yaml? All I really want right now is to send the TTS to an external media player 🙂
The line the link points to is the TTS URL sensor. Just check the YAML for the places where it gets changed.
That is the code for the Koala satellite; it's based on the PE YAML, but with many changes.
It looks like you created that sensor, and then when TTS ends (is that when the processing ends, or when it finishes playing locally?) the URI is written to that sensor. I assume from there you have an HA automation listen for that sensor change and then play the file on the desired media player. Is that right?
I’m very curious about this because I have Sonos speakers in most rooms so I’m interested in sticking one of these in each zone and using my existing speakers.
Yes. On TTS end means your Piper finished processing the text, so it's called right after the audio is generated. I tried it, and it plays simultaneously on the Koala and on the other media player.
Kinda got this working with a new TTS_URL sensor, which is updated on each new TTS end event (with the URL of the .flac response file). Next for me is to add a new boolean to the code so that, when it's set, the PE doesn't play the .flac response on its internal or externally connected speaker. There's a lot of code to look through and understand. You could then switch it on/off in Home Assistant while testing, or as and when needed.
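A minimal sketch of the on/off idea, assuming the switch gates the hand-off to Home Assistant rather than the PE's own playback: it only decides whether HA is told about the response URL (via the esphome.tts_ended event used elsewhere in this thread), and on its own it does not silence the PE speaker. The switch name and id `external_tts_output` are hypothetical.

```yaml
# Hypothetical switch, exposed to HA so it can be flipped while testing
switch:
  - platform: template
    id: external_tts_output
    name: "External TTS output"
    optimistic: true
    restore_mode: RESTORE_DEFAULT_OFF

voice_assistant:
  on_tts_end:
    # Only send the response URL to HA when the switch is on
    - if:
        condition:
          switch.is_on: external_tts_output
        then:
          - homeassistant.event:
              event: esphome.tts_ended
              data:
                url: !lambda "return x;"
```

With the switch off, the HA automation that plays the URL on the external speaker simply never fires, so you can compare both setups without reflashing.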
that's all in the yaml - you can change them, but you'd have to compile it yourself:
files:
  - id: center_button_press_sound
    file: https://github.com/esphome/home-assistant-voice-pe/raw/dev/sounds/center_button_press.flac
  - id: center_button_double_press_sound
    file: https://github.com/esphome/home-assistant-voice-pe/raw/dev/sounds/center_button_double_press.flac
  - id: center_button_triple_press_sound
    file: https://github.com/esphome/home-assistant-voice-pe/raw/dev/sounds/center_button_triple_press.flac
  - id: center_button_long_press_sound
    file: https://github.com/esphome/home-assistant-voice-pe/raw/dev/sounds/center_button_long_press.flac
  - id: factory_reset_initiated_sound
    file: https://github.com/esphome/home-assistant-voice-pe/raw/dev/sounds/factory_reset_initiated.mp3
  - id: factory_reset_cancelled_sound
    file: https://github.com/esphome/home-assistant-voice-pe/raw/dev/sounds/factory_reset_cancelled.mp3
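If you only want to swap one of these sounds without copying the whole YAML, an override might look like the sketch below. It rests on two assumptions you should check against the upstream file: that the PE's media player is declared with id `external_media_player`, and that ESPHome's package merging combines list entries that share an `id`. The local file path is hypothetical.

```yaml
# Assumption: upstream PE YAML uses id "external_media_player" for its media player
media_player:
  - id: external_media_player
    files:
      # Same id as the upstream entry, so the merge should replace its file
      - id: center_button_press_sound
        file: my_sounds/center_button_press.flac  # hypothetical local file
```

If the id assumption is wrong, ESPHome will treat this as a new media player and validation should fail on the missing `platform`, which at least makes the mistake visible.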
Send to external speaker: working(ish). Here's where I'm at. I can play the voice response on my Sonos (but it's still ALSO playing on the Voice PE, TODO!)
You'll need to take control of the Voice PE in ESPHome Builder, then click EDIT in ESPHome Builder.
Do not do this unless you know what it means (no more automatic firmware updates).
Add these few lines to the bottom of the yaml...
# Wilksy TTS URL Sensor
text_sensor:
  - platform: template
    id: tts_uri
    name: "TTS URI"
    disabled_by_default: false

voice_assistant:
  on_tts_end:
    - text_sensor.template.publish:
        id: tts_uri
        state: !lambda 'return x;'
You'll then need to click Save and then Install. This will take some time while it recompiles the firmware. Afterwards you'll have a new sensor called "Home Assistant Voice xxxx TTS URI". You can create an automation with this sensor: when it changes, play the URL of the FLAC file on your Sonos.
Below is the automation YAML to do that:
alias: Play TTS on Sonos
description: Play the TTS file from the sensor on a Sonos speaker
triggers:
  - entity_id: sensor.home_assistant_voice_096f11_tts_uri
    to: null
    trigger: state
conditions: []
actions:
  - target:
      entity_id: media_player.kitchen
    data:
      media_content_id: "{{ states('sensor.home_assistant_voice_096f11_tts_uri') }}"
      media_content_type: music
    action: media_player.play_media
mode: single
UPDATE the names of YOUR Voice device and Sonos media player in the above. You'll probably also want to save the Sonos state beforehand and resume it after the response has played.
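One way to do the save/resume, sketched under assumptions: it uses the Sonos integration's `sonos.snapshot` and `sonos.restore` actions, reuses the sensor and `media_player.kitchen` names from the automation above, and guesses when playback has finished from the player's state, so the timeouts are placeholders to tune.

```yaml
alias: Play TTS on Sonos (with snapshot/restore)
description: Hypothetical variant that saves and restores what the Sonos was playing
triggers:
  - trigger: state
    entity_id: sensor.home_assistant_voice_096f11_tts_uri
    to: null
conditions: []
actions:
  # Save the current queue, position and volume (including the group)
  - action: sonos.snapshot
    data:
      entity_id: media_player.kitchen
      with_group: true
  - action: media_player.play_media
    target:
      entity_id: media_player.kitchen
    data:
      media_content_id: "{{ trigger.to_state.state }}"
      media_content_type: music
  # Rough heuristic: wait for the clip to start, then to stop (timeouts are guesses)
  - wait_template: "{{ is_state('media_player.kitchen', 'playing') }}"
    timeout: "00:00:05"
    continue_on_timeout: true
  - wait_template: "{{ not is_state('media_player.kitchen', 'playing') }}"
    timeout: "00:01:00"
    continue_on_timeout: true
  # Put back whatever was playing before
  - action: sonos.restore
    data:
      entity_id: media_player.kitchen
      with_group: true
mode: single
```

Like the original, this only fires when the sensor's state changes, so a cached (identical) response URL won't trigger it.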
It's a start. Next is disabling the speaker for TTS playback (the voice response) only, and also looking at why "announce" doesn't work (it doesn't overlay the audio), probably a Sonos thing.
I get that, but I mean sending those to the media player entity instead of the PE. I'm using Sonos too. Currently with the Box3, when I say the wake word the wake sound is heard on the Sonos, and I have a TTS end sound for when it stops responding.
Can the PE just be turned down to like 1 to give the impression it only plays through the Sonos?
One other thing: couldn't we use the assist satellite entity for listening to do the same?
Volume could be set to zero on getting the TTS_URL event, but the announcement would have started by then.
Do you even need the PE volume on at all if you're playing through the Sonos?
I just mute the satellite.
I still want the wake notification sound so i know it's been picked up.
can you format that lol.
just the part for the pe config
compiling. will check back
i got it working
gonna try something to get the wake sounds to play through sonos though. it still plays through pe
i think that can be achieved through the satellite entity maybe
got the start and end working. playing an mp3
alias: TEST Hey Jarvis PE
description: ""
triggers:
  - trigger: state
    id: one
    entity_id:
      - assist_satellite.home_assistant_voice_095f33_assist_satellite
    to: listening
  - trigger: state
    entity_id:
      - assist_satellite.home_assistant_voice_095f33_assist_satellite
    from: null
    to: idle
    id: two
conditions: []
actions:
  - choose:
      - conditions:
          - condition: trigger
            id:
              - one
        sequence:
          - target:
              entity_id: media_player.master_bedroom
            data:
              media_content_id: /local/sounds/awake.mp3
              media_content_type: music
            action: media_player.play_media
  - choose:
      - conditions:
          - condition: trigger
            id:
              - two
        sequence:
          - target:
              entity_id: media_player.master_bedroom
            data:
              media_content_id: /local/sounds/end.mp3
              media_content_type: music
            action: media_player.play_media
mode: single
Sorry, VERY new to Discord. How do I post formatted code? Be patient!! :)
text_sensor:
  - platform: template
    id: tts_uri
    name: "TTS URI"
    disabled_by_default: false

voice_assistant:
  on_tts_end:
    - text_sensor.template.publish:
        id: tts_uri
        state: !lambda 'return x;'
Yay!!!
Thanks so much for posting this! I have got the text sensor working in HA, and can see it changes when I get a new TTS response, but I cannot get it to play on an external media player via the automation example you posted (I've of course changed the names of my entities etc). It's like the flac file just won't play on the media player I've chosen (trying to play it through an ESPHome media player device). Any pointers as to why that might not be working?
Hi all, just wanted to say thanks again for the TTS_URI idea for getting TTS responses via an external media player. Unfortunately, I cannot get it to work. The TTS_URI entity is on my VPE, and I can see it changes whenever I get a new response, but I cannot get the external media player I've chosen to play the file (I can see the external media player change from idle to playing and back to idle within a split second, so it's doing something, but the flac file won't play). Was just wondering if anyone had any pointers? For those who have it working, are your HA instance and VPE on the same network (mine are not)? Anything that could help me get it working would be most appreciated.
Hi again, I'm hoping someone can help, as I'm very close to getting this working the way I want. I've found that the TTS_URI works if I send it to a Google Nest speaker, but not if I try to send it to an ESPHome-based media player. Does anyone know why this could be? Has anyone tried sending TTS_URI requests to anything ESPHome-based? I can see the media player try to play, but it instantly changes back to idle.
If it's helpful, here's my custom firmware yaml for the Voice PE to play back on Sonos - https://github.com/mike-nott/open-voice-pe
Thanks very much, I'll take a look 🙂
@unkempt vault I managed to get this working for myself just now. I created a custom event, esphome.tts_ended, which passes the TTS file URL along (url: !lambda "return x;"). Then I created an automation that plays the file on my media player. I didn't get it working on my HomePod because it doesn't accept .wav files, but it works perfectly on my Sonos with minimal delay. I suppose you could get it working on a HomePod as well if you transcode the audio file to a different format such as MP3.
Here is the firmware snippet for the ESP:
on_tts_end:
  - homeassistant.event:
      event: esphome.tts_ended
      data:
        url: !lambda "return x;"
        device_location: "woonkamer"
And the automation:
alias: Play On External Speakers
description: ""
triggers:
  - trigger: event
    event_type: esphome.tts_ended
conditions: []
actions:
  - action: media_player.play_media
    target:
      entity_id: media_player.sonos_soundbar
    data:
      media_content_id: "{{ trigger.event.data.url }}"
      media_content_type: music
      announce: true
mode: single
I've also added a custom device_location field (here "woonkamer", Dutch for "living room"), which will let me tell which device picked up the voice command and which media player the response should go to. But I haven't edited the automation for this yet, because I only have one Atom Echo so far.
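A sketch of that routing, assuming each satellite sends its own `device_location` value in the event data as in the snippet above. The `kitchen` entry and `media_player.kitchen_speaker` are hypothetical; `woonkamer` and `media_player.sonos_soundbar` come from the posts above.

```yaml
alias: Play TTS on the speaker for the room that heard it
description: Hypothetical routing by device_location
triggers:
  - trigger: event
    event_type: esphome.tts_ended
conditions: []
actions:
  - variables:
      # Hypothetical map of device_location values to media players
      target_player: >-
        {%- set speakers = {
          'woonkamer': 'media_player.sonos_soundbar',
          'kitchen': 'media_player.kitchen_speaker'
        } -%}
        {{ speakers.get(trigger.event.data.get('device_location', ''), 'none') }}
  # Stop here if this location has no speaker mapped
  - condition: template
    value_template: "{{ target_player != 'none' }}"
  - action: media_player.play_media
    target:
      entity_id: "{{ target_player }}"
    data:
      media_content_id: "{{ trigger.event.data.url }}"
      media_content_type: music
      announce: true
# Different rooms may answer at the same time
mode: parallel
```

Keeping the map in one template means adding a satellite is a one-line change rather than another `choose` branch.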
Awesome, thanks for all the code! I'll check it out (away for the weekend at the mo)
I still wish HA had the option in the UI for this. To me it's a no-brainer.
My workaround has been OK, but two updates later it's got a bit more of a delay with the listening sound. I'll have to look into one of these.
I agree, it should be a standard feature
Just to follow up on this (I've figured out a couple of things due to other ESP issues): in my setup, if I send a TTS response (using the method described above) to an ESP32-based media player that uses the Arduino framework, I can't get the media player to play the response. But if the ESP32 is set to the esp-idf framework, it works! Just wanted to update in case anyone else finds it useful 🙂
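For reference, the framework is chosen in the media player device's own ESPHome config, roughly as below. The board value is a placeholder for whatever your device uses; changing framework forces a clean rebuild, and any component in that config that only supports one framework may need replacing.

```yaml
esp32:
  board: esp32dev  # placeholder: use your actual board
  framework:
    # reported in this thread to play the TTS URL, where arduino did not
    type: esp-idf
```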
woonkamer?
doesn't work at all
no sound from the media player, using sonos as well. i had the most luck with the tts uri method
The method used to deliver the TTS URI doesn't matter. The event works, and it works better than the sensor, because the sensor won't change state if your TTS service returns the same response URI (as happens with cached responses). Your problem is in the media player, not the event.
im trying to rule things out. is there a way to add
on_tts_end:
  - homeassistant.event:
      event: esphome.tts_ended
      data:
        url: !lambda "return x;"
        device_location: "woonkamer"
to the default esphome config after taking control, without pasting in the entire yaml? i remember there being a way. it already links to the github yaml
I don't think so
I believe it will merge a section into the imported one if it has the same section/ID name
so you would just add:
voice_assistant:
  id: va
  on_tts_end:
    - homeassistant.event:
        event: esphome.tts_ended
        data:
          url: !lambda "return x;"
          device_location: "woonkamer"
i was messing with this sort of thing the other day on a different device and it seemed to work
trying it now, i remember i had done that with the boxs3 long time ago
getting this error in logs for piper
the automation says it runs with no error in trace. it's like piper doesn't make the sound file. tried playing the sound file directly and it wont play
nm fixed that part
for some reason the piper voice had reset in the voice assistant section, however sound still only plays on the pe
Executed: May 1, 2025 at 10:04:05 AM
Result:
params:
  domain: media_player
  service: play_media
  service_data:
    media_content_id: http://192.168.xx.xx:8123/api/tts_proxy/4ZSBCcsPXa2EvZyYURsrMA.flac
    media_content_type: music
    announce: true
    entity_id:
      - media_player.foyer_speaker
  target:
    entity_id:
      - media_player.foyer_speaker
running_script: false
no change in the log for the sonos speaker. it's as if it never gets the file to play
turned on debugging for it and got this
SonosAudioInputFormatSensorEntity._poll_state
2025-05-01 10:23:10.929 DEBUG (MainThread) [homeassistant.components.sonos.media_player] Playing http://192.168.xx.xx:8123/api/tts_proxy/s_97rGlH2ABtqMWt7YH_KA.flac using websocket audioclip
2025-05-01 10:23:10.931 DEBUG (MainThread) [sonos_websocket.websocket] Sending command: [{'namespace': 'audioClip:1', 'command': 'loadAudioClip', 'playerId': 'RINCON_48A6B8FFDDFA01400'}, {'name': 'Sonos Websocket', 'appId': 'com.jjlawren.sonos_websocket', 'streamUrl': 'http://192.168.xx.xx:8123/api/tts_proxy/s_97rGlH2ABtqMWt7YH_KA.flac'}]
i dont see the issue
a normal tts with plain text works.
does the url it's sending work (stick it in a browser to test)?
does the sonos support flac data being sent that way? (can you show debug from the normal tts working and see if there is a format difference?)
thing is this was working perfectly fine before
the url does yes
which log you want? from piper right?
DEBUG:wyoming_piper.handler:Sent info
DEBUG:wyoming_piper.handler:Synthesize(text='10:49 AM', voice=SynthesizeVoice(name='trek-medium', language=None, speaker=None))
DEBUG:wyoming_piper.handler:synthesize: raw_text=10:49 AM, text='10:49 AM.'
DEBUG:wyoming_piper.process:Stopping process for: en_US-lessac-medium
DEBUG:wyoming_piper.process:Starting process for: trek-medium (1/1)
DEBUG:wyoming_piper.process:Starting piper process: /usr/share/piper/piper args=['--model', '/share/piper/trek-medium.onnx', '--config', '/share/piper/trek-medium.onnx.json', '--output_dir', '/tmp/tmp4yomsd6r', '--json-input', '--noise-scale', '0.667', '--length-scale', '1.0', '--noise-w', '0.333']
DEBUG:wyoming_piper.handler:input: {'text': '10:49 AM.'}
DEBUG:wyoming_piper.handler:Sent info
DEBUG:wyoming_piper.handler:/tmp/tmp4yomsd6r/1746110974765258295.wav
DEBUG:wyoming_piper.handler:Completed request
what url is it sending to the sonos when it works?
DEBUG:wyoming_piper.handler:Sent info
DEBUG:wyoming_piper.handler:/tmp/tmp4yomsd6r/1746110974765258295.wav
thats using a generic tts action with text
you can see piper is generating a wav file.
we know that sending an announce via the PE makes a flac file
does normal tts send the wav file?
so there is a difference
this is not a URL
this includes a url
what I am trying to establish
link to wav works
link to flac works
sonos plays wav
sonos does NOT play flac
this perhaps implies that the sonos does not like flac files?
it did before lol
any way to convert to wav?
ok what the hell
now suddenly it plays it?
i...dont.....even...
2025-05-01 11:03:48.452 DEBUG (MainThread) [homeassistant.components.sonos.speaker] Activity on Living Room from ZoneGroupTopology subscription
2025-05-01 11:03:48.453 DEBUG (SyncWorker_5) [soco.zonegroupstate] Subscriptions (3) still active during poll for 192.168.xx.xx, using cache
2025-05-01 11:03:48.453 DEBUG (SyncWorker_5) [soco.zonegroupstate] Subscriptions (3) still active during poll for 192.168.xx.xx, using cache
2025-05-01 11:03:48.453 DEBUG (SyncWorker_5) [soco.zonegroupstate] Subscriptions (3) still active during poll for 192.168.xx.xx, using cache
2025-05-01 11:03:48.453 DEBUG (SyncWorker_5) [soco.zonegroupstate] Subscriptions (3) still active during poll for 192.168.xx.xx, using cache
2025-05-01 11:03:48.453 DEBUG (SyncWorker_5) [soco.zonegroupstate] Subscriptions (3) still active during poll for 192.168.xx.xx, using cache
2025-05-01 11:03:48.453 DEBUG (SyncWorker_5) [soco.zonegroupstate] Subscriptions (3) still active during poll for 192.168.xx.xx, using cache
2025-05-01 11:03:48.453 DEBUG (SyncWorker_5) [soco.zonegroupstate] Subscriptions (3) still active during poll for 192.168.xx.xx, using cache
2025-05-01 11:03:48.453 DEBUG (SyncWorker_5) [soco.zonegroupstate] Subscriptions (3) still active during poll for 192.168.xx.xx, using cache
2025-05-01 11:03:48.454 DEBUG (MainThread) [homeassistant.components.sonos.speaker] async_regroup Living Room ['RINCON_C438751469E001400']
2025-05-01 11:03:48.499 DEBUG (MainThread) [soco.events_asyncio] Event 27 received for DeviceProperties service at 1746111828.4997709
does that show anything.
lol
i dont get it..
i tried it again and the speaker was very distorted and clipping
then again..and it was fine
It's learning.... it's aliiiive!
whats strange is when using this, sometimes it works, sometimes it only plays about 2 seconds, and sometimes when it works, the volume rises mid-sentence
action: assist_satellite.announce
metadata: {}
data:
  message: "'{{ response.response.speech.plain.speech }}'"
  preannounce: true
target:
  entity_id: assist_satellite.voice_pe_assist_satellite
i find it hard to believe i am the only one experiencing this lol
also realized when the conversation response cuts off on the sonos it also does the same on the hapve. it plays about 2 seconds then stops
am i correct on this, it seems like piper creates a wav file for the tts then it's converted to flac for output. could it be piper is erroring somehow in the conversion?
DEBUG:wyoming_piper.handler:synthesize: raw_text='"Sir, I've detected an individual on the rear surveillance feed. Requesting clearance for visual confirmation."', text=''"Sir, I've detected an individual on the rear surveillance feed. Requesting clearance for visual confirmation."'.'
DEBUG:wyoming_piper.handler:input: {'text': '\'"Sir, I\'ve detected an individual on the rear surveillance feed. Requesting clearance for visual confirmation."\'.'}
DEBUG:wyoming_piper.handler:/tmp/tmp4yomsd6r/1746116276999573956.wav
DEBUG:wyoming_piper.handler:Completed request
only 2 seconds played from this but the entire text is being generated
and the entire flac plays in the browser..... it's like using announce cuts off or times out
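One thing visible in the Piper log above: the `raw_text` arrives wrapped in an extra pair of single quotes, which come from the `"'{{ ... }}'"` template in the announce action. Whether or not that is related to the cut-off, a version of the action without the extra quotes would look like this (entity ids taken from the posts above):

```yaml
action: assist_satellite.announce
metadata: {}
data:
  # No extra quotes around the template, so Piper only receives the LLM's text
  message: "{{ response.response.speech.plain.speech }}"
  preannounce: true
target:
  entity_id: assist_satellite.voice_pe_assist_satellite
```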
remember you're using it in an unsupported, customised manner, so there are bound to be issues
thing is, using an external media player with the voice pe still plays audio from the voice pe. i just had it turned down real low, but the audio was cutting off from it too
so it cant just be sonos
thats how its been set up from what i see above yes. it plays on the VPE and triggers the event back to ha "i am playing this URL" which the automation then also plays that url on the sonos
but if it was an issue with sonos and flac then wouldnt it keep playing on the pe regardless?
yeah it should do
almost seems to me it could be the way piper is converting to flac
got this from logs now
Error executing script. Unexpected error for call_service at pos 2: Timeout waiting for VoiceAssistantAnnounceFinished after 300s
Traceback (most recent call last):
  File "aioesphomeapi/connection.py", line 820, in send_messages_await_response_complex
TimeoutError
is there a way to set a particular mode just for an action not a whole automation
i set my test auto to restart and so far box3 hasnt failed
like it's still continuing the conversation if set to single
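Automation `mode` can't be set per action, but a script has its own `mode`, so one workaround, sketched here under assumptions, is to move the announce into a script and start it with `script.turn_on`, which does not wait for the script to finish. The script name `announce_on_pe` is hypothetical, and `restart` means a new announcement cancels a still-running one on the HA side (the PE itself may still be playing).

```yaml
# scripts.yaml (hypothetical script name)
announce_on_pe:
  alias: Announce on Voice PE
  mode: restart  # a new call cancels a still-running announce
  fields:
    message:
      description: Text to announce
  sequence:
    - action: assist_satellite.announce
      target:
        entity_id: assist_satellite.voice_pe_assist_satellite
      data:
        message: "{{ message }}"
        preannounce: true
---
# In the automation, in place of the announce action:
- action: script.turn_on
  target:
    entity_id: script.announce_on_pe
  data:
    variables:
      message: "{{ response.response.speech.plain.speech }}"
```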
testing with pe again
It's just that Piper doesn't convert to FLAC. HA does.
The thing is, FLAC is easier to standardize and play on an ESP32. HA acts as a proxy between Piper and the voice assistant, converting WAV to FLAC and exposing it via a local HA URL.
well damn..pe doing it again
Probably the proxy is too slow at converting, so by the time the player wants to play the file, it's only partially ready?
Can you try adding a delay in your automation before sending the URL to the Sonos?
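The test suggested above, sketched against the event-based automation posted earlier in the thread. The 1-second value is only a starting point, and the idea that the proxy needs that time to finish the FLAC is a hypothesis being tested, not a confirmed cause.

```yaml
alias: Play On External Speakers (delayed, for testing)
description: ""
triggers:
  - trigger: event
    event_type: esphome.tts_ended
conditions: []
actions:
  # Test-only pause so HA's TTS proxy has time to finish the FLAC
  # before the Sonos requests it; tune or remove once diagnosed
  - delay:
      seconds: 1
  - action: media_player.play_media
    target:
      entity_id: media_player.sonos_soundbar
    data:
      media_content_id: "{{ trigger.event.data.url }}"
      media_content_type: music
      announce: true
mode: single
```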
i had thought of that, hate to delay llm more lol but ill try it real quick
It's just for testing
1st time works but ill keep trying..ha always works the first time haha
3 times no fail with a 1 second delay..
and again it works..so does this mean piper is being slow
damn..5th time and its doing it again wth
and this time the tts played through the sonos first, then the pe. usually it's always in sync
wonder where the hell the bottle neck can be
ha and piper are on proxmox on a mini. still not fully familiar with proxmox, anything i should look at?
wonder if switching to a medium model would help
Try.
But remember: the one generating FLAC is HA, not Piper.
could ha be corrupt somehow
one thing im trying right now. using the piper in docker with gpu. when using assist announce to play vs using tts.speak, announce is 3 times quicker for whatever reason
i have come to the conclusion from all of this damn confusion that playing from tts.speak seems to work flawlessly with piper in docker. using the same method with assist.announce causes the audio to cut off again, which leaves me to think it's the voice pe timing out somehow
i'm getting responses within 1.5 seconds using piper and gpu and ollama. shrug, i dont know. just know im aggravated lol
it sucks because i like the assist announce for doing this
i have custom wake sounds for that
change the TTS on the pipeline the device is using. announce uses whatever is specified in the pipeline
i thought maybe it was the connection from proxmox to ollama on the desktop being too slow, but this pretty much eliminates that idea. i suspect ha changed something in esphome with the most recent update. i'm actually finding this very problem when checking websites, with no clear solution. my guess is they tried to fix something else and created a hiccup with announce timing out too quickly
i think its announce itself
announce uses the tts thats specified in the pipeline that the device is set to use
it doesnt have its own thing
alias: zzzzz
description: ""
triggers: []
conditions: []
actions:
  - action: conversation.process
    metadata: {}
    data:
      agent_id: conversation.llama3_2
      text: >-
        Rephrase the following text [tell the owner you love him and appreciate
        him]
    response_variable: response
  - action: tts.speak
    metadata: {}
    data:
      cache: false
      media_player_entity_id: media_player.foyer_speaker
      message: "{{ response.response.speech.plain.speech }}"
    target:
      entity_id: tts.piper_2
    enabled: true
  - action: assist_satellite.announce
    metadata: {}
    data:
      message: "'{{ response.response.speech.plain.speech }}'"
      preannounce: true
    target:
      entity_id: assist_satellite.voice_pe_assist_satellite
    enabled: false
mode: single
that is my test auto
announce doesn't work correctly using the same pipeline tts.speak is using
i am not sure if you understand what i mean by pipeline?
in settings - voice assistants
you select the pipeline thats assigned to the vpe and ensure the tts section is using the one that works
with tts.speak you select the piper instance in the action
with assist_satellite.announce you select the instance via the pipeline
on the TTS, are there multiple versions of piper?
i dont want to use the llm directly as its not ready for prime time imo
no. i disabled the addon i was running in ha when i switched to docker for testing
so it only shows 1 there?
can try it for test?
alright
Don't you think that tts.speak works and satellite.announce doesn't - because tts.speak delivers audio directly to the speaker, while announce is using pipeline - this proxy?
google is cloud right
yes
was using piper through tts.speak though
didnt happen using google
so you're saying piper is not responding fast enough for assist
is there a way to adjust the timeout?
trying something else, disabled mesh connection and forcing 2.4ghz
for the assistpe
piper is usually pretty fast, is it trying to send an especially long TTS or something perhaps?
Well, at the moment it seems to be working, but I'm using it to rephrase text in an automation. I have "responses need to be short and not over a sentence long" in the instructions, and I adjusted the context down to 6400 since it doesn't control HA. I did more reading and disabled IPv6 in the settings, even though I thought I already had it turned off in the router. I also forced 2.4 GHz and disabled mesh for the PE. But as dumb as it sounds, I'm not so sure that's why it's working better. I had preannounce turned on in the assist_satellite.announce action, but had forgotten I had another automation set up that played a wake and idle sound when the assist state changes. After disabling that, it seems to be better. Not entirely sure why?
Unless it was trying to send 2 audio files at the same time and causing an error. However logs werent showing that
Funny thing is, I took the Sonos off mesh too, and now I get connection timeout errors in the log. But I know that's related directly to Sonos and their screwed-up API. Might have to delete and re-add the entire system.
Last time i checked sonos hated mesh routers. And ironically i was one of the very few who used to never have the sonos connection problems when it was routing through mesh
But most say not to use Ethernet to the Arc Ultra, because it then uses the deprecated SonosNet when you have surrounds/subs.