#HA Voice PE send TTS to external speaker


unkempt vault
#

Hi all, so I've just received my first HA Voice PE, and it seems to be working well and can pick up my voice decently. However, as we probably all know, the speaker isn't so great. I already have some other ESP-based satellites around, and for some of them I use an external speaker (for example, a Google Nest Mini) to output the TTS. Has anyone tried taking over the Voice PE in ESPHome and changing the config to send the TTS to an external speaker / media player? Would love to hear from you 🙂

limpid kindle
#

In Koala satellite I've added a TTS URL sensor, which can be used to create an automation and send the response to any player.

unkempt vault
#

Great! Can you share any code? Would be most appreciated

limpid kindle
unkempt vault
#

Many thanks! I'll get back to you if I need any more guidance or help. Have a great weekend 😄

tiny hedge
#

Were you able to get this working?

And is there a way I can snip the code out and put in the current pe config?

#

My main goal is using external speaker through the media player and changing start and stop listening sounds

unkempt vault
#

Hi, no I haven't had a chance to try this yet. I've just looked through the yaml linked above, and there's a hell of a lot of code! @limpid kindle have you taken the whole code, and used that on your Voice PE? Would you mind sharing a copy of your exact yaml? All I really want right now is to send the TTS to an external media player 🙂

limpid kindle
torpid shard
#

It looks like you created that sensor and then when TTS ends (is that when the processing ends? Or it finishes playing locally?) the URI is written to that sensor. I assume from there you have a HA automation listen to that sensor change and then play on the desired media player. Is that right?

torpid shard
#

I’m very curious about this because I have Sonos speakers in most rooms so I’m interested in sticking one of these in each zone and using my existing speakers.

limpid kindle
queen anvil
#

Kinda got this working with a new TTS_URL sensor which is updated on each TTS end event (with the URL of the .flac response file). Next to do for me is to add a new boolean to the code that, when set, stops the Voice PE from playing the .flac response on the internal / externally connected speaker. A lot of code to look through and understand. You could then switch it on/off in Home Assistant while testing / as and when needed.

queen anvil
#

Send to External Speaker - working(ish) - Here's where I'm at. I can play the voice response on my Sonos (but it's ALSO still playing on the Voice PE - TODO!)

You'll need to take control of the Voice PE in ESPHome Builder, then in ESPHome Builder - EDIT

Do not do this unless you know what this means (no automatic firmware updates)

Add these few lines to the bottom of the yaml...

# Wilksy TTS URL Sensor
text_sensor:
  - platform: template
    id: tts_uri
    name: "TTS URI"
    disabled_by_default: false

voice_assistant:
  on_tts_end:
    - text_sensor.template.publish:
        id: tts_uri
        state: !lambda 'return x;'

You'll then need to click Save and then click Install. This will take some time while it recompiles the firmware. Afterwards you'll have a new sensor called "Home Assistant Voice xxxx TTS URI". You can create an automation with this sensor: when it changes, you can play the URL of the flac file on your Sonos...

below is the Automation YAML to do that...

alias: Play TTS on Sonos
description: Play the TTS file from the sensor on a Sonos speaker
triggers:
  - entity_id: sensor.home_assistant_voice_096f11_tts_uri
    to: null
    trigger: state
conditions: []
actions:
  - target:
      entity_id: media_player.kitchen
    data:
      media_content_id: "{{ states('sensor.home_assistant_voice_096f11_tts_uri') }}"
      media_content_type: music
    action: media_player.play_media
mode: single

Update the names of YOUR Voice device and Sonos media player in the above. You'll also probably want to save the Sonos state beforehand and restore it after the response has played.
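A rough sketch of the save-and-restore idea, assuming the Sonos integration's `sonos.snapshot` and `sonos.restore` actions and the same entity IDs as the automation above. The 1 second delay and 10 second timeout are guesses, not tested values:

```yaml
# Sketch only: entity IDs are copied from the example above,
# the delay and timeout values are assumptions.
alias: Play TTS on Sonos and restore
description: Snapshot the Sonos, play the TTS file, then restore what was playing
triggers:
  - trigger: state
    entity_id: sensor.home_assistant_voice_096f11_tts_uri
conditions: []
actions:
  # Remember what the speaker was doing before the response plays
  - action: sonos.snapshot
    data:
      entity_id: media_player.kitchen
  - action: media_player.play_media
    target:
      entity_id: media_player.kitchen
    data:
      media_content_id: "{{ trigger.to_state.state }}"
      media_content_type: music
  # Give the speaker a moment to switch to "playing" before waiting on it
  - delay: "00:00:01"
  # Continue once the clip has stopped, or after 10 s at the latest
  - wait_template: "{{ not is_state('media_player.kitchen', 'playing') }}"
    timeout: "00:00:10"
  - action: sonos.restore
    data:
      entity_id: media_player.kitchen
mode: single
```

If `announce: true` ends up working on your speaker, the snapshot and restore steps are probably not needed, since the announcement is then overlaid on whatever is playing.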

It's a start. Next is disabling the speaker for TTS playing (voice response) only, and also looking at why "announce" does not work (it doesn't overlay the audio) - probably a Sonos thing

tiny hedge
# queen anvil that's all in the yaml - you can change but would have to compile yourself - fil...

I get that, but I mean sending those to the media player entity instead of the PE. I'm using Sonos too. Currently with the box3, when I say the wake word, the wake sound is heard on Sonos and I have a TTS end sound for when it stops responding.

Can the PE just be turned down to like 1 to give the impression it only plays through Sonos?

One other thing. Couldn't we use assist satellite entity for listening to do the same?

queen anvil
torpid shard
#

Do you need PE volume even on at all if you’re playing through the Sonos?

limpid kindle
#

I just mute the satellite.
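As a sketch, muting could be done from an automation with the standard `media_player.volume_mute` action. The entity ID below is hypothetical (check what your Voice PE's media player is actually called), and muting presumably silences the wake sounds on the PE as well:

```yaml
# Sketch only: the media player entity ID is a made-up example.
actions:
  - action: media_player.volume_mute
    target:
      entity_id: media_player.home_assistant_voice_096f11_media_player
    data:
      is_volume_muted: true
```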

queen anvil
queen anvil
tiny hedge
#

just the part for the pe config

tiny hedge
#

compiling. will check back

tiny hedge
#

i got it working

#

gonna try something to get the wake sounds to play through sonos though. it still plays through pe

#

i think that can be achieved through the satellite entity maybe

tiny hedge
#

got the start and end working. playing an mp3

#

`alias: TEST Hey Jarvis PE
description: ""
triggers:
  - trigger: state
    id: one
    entity_id:
      - assist_satellite.home_assistant_voice_095f33_assist_satellite
    to: listening
  - trigger: state
    entity_id:
      - assist_satellite.home_assistant_voice_095f33_assist_satellite
    from: null
    to: idle
    id: two
conditions: []
actions:
  - choose:
      - conditions:
          - condition: trigger
            id:
              - one
        sequence:
          - target:
              entity_id: media_player.master_bedroom
            data:
              media_content_id: /local/sounds/awake.mp3
              media_content_type: music
            action: media_player.play_media
  - choose:
      - conditions:
          - condition: trigger
            id:
              - two
        sequence:
          - target:
              entity_id: media_player.master_bedroom
            data:
              media_content_id: /local/sounds/end.mp3
              media_content_type: music
            action: media_player.play_media
mode: single`
queen anvil
queen anvil
#
text_sensor:
  - platform: template
    id: tts_uri
    name: "TTS URI"
    disabled_by_default: false  

voice_assistant:
  on_tts_end:
    - text_sensor.template.publish:
        id: tts_uri
        state: !lambda 'return x;'

Yay!!!

unkempt vault
# queen anvil Send to External Speaker - working(ish) - Here's where im at. I can play the voi...

Thanks so much for posting this! I have got the text sensor working in HA, and can see it changes when I get a new TTS response, but I cannot get it to play on an external media player via the automation example you posted (I've of course changed the names of my entities etc). It's like the flac file just won't play on the media player I've chosen (trying to play it through an ESPHome media player device). Any pointers as to why that might not be working?

unkempt vault
#

Hi all, just wanted to say thanks again for the TTS_URI idea for getting TTS responses via an external media player. Unfortunately, I cannot get it to work. The TTS_URI entity is on my VPE, and I can see it changes whenever I get a new response, but I cannot get the external media player I've chosen to play the file (I can see the external media player changes from idle to playing, then back to idle again within a split second, so it's doing something, but the flac file won't play). Was just wondering if anyone had any pointers? For those who have it working, are your HA instances and VPE on the same network (mine are not)? Anything that could help me get it working would be most appreciated.

unkempt vault
#

Hi again, I'm hoping someone can help as I'm very close to getting this working as I want. I've found that the TTS_URI works if I send it to a Google Nest speaker, but not if I try to send it to an ESPHome-based media player. Does anyone know why this could be? Has anyone tried sending TTS_URI requests to anything ESPHome-based? I can see the media player tries to play, but changes instantly to idle again.

soft knoll
unkempt vault
obsidian furnace
#

@unkempt vault I managed to get this working for myself just now. What I did was create a custom event esphome.tts_ended which passes the TTS file (url: !lambda "return x;") along. Then I created an automation that plays the file over my media player. I didn't get it working on my HomePod because it doesn't accept .wav files, but it works perfectly on my Sonos with minimal delay. I suppose you could get it working on a HomePod as well if you transcode the audio file to a different format such as mp3.

Here is the snippet of the firmware for ESP:

  on_tts_end:
    - homeassistant.event:
        event: esphome.tts_ended
        data:
          url: !lambda "return x;"
          device_location: "woonkamer"

And the automation:

alias: Play On External Speakers
description: ""
triggers:
  - trigger: event
    event_type: esphome.tts_ended
conditions: []
actions:
  - action: media_player.play_media
    target:
      entity_id: media_player.sonos_soundbar
    data:
      media_content_id: "{{ trigger.event.data.url }}"
      media_content_type: music
      announce: true
mode: single

I've also added a custom device_location datatype to pass along which will allow me to distinguish between which devices have picked up the voice command and to which media players they should go. But I haven't edited the automation for this just yet because I only have 1 Atom Echo as of now.
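One way the `device_location` routing could look, as an untested sketch. The `speaker_map` variable, the second location `keuken` and `media_player.kitchen` are made-up examples; only `woonkamer` and `media_player.sonos_soundbar` come from the snippets above:

```yaml
# Sketch only: extend speaker_map with your own locations and players.
alias: Play On External Speakers (routed)
description: ""
triggers:
  - trigger: event
    event_type: esphome.tts_ended
variables:
  # Maps the device_location sent by each satellite to a media player
  speaker_map:
    woonkamer: media_player.sonos_soundbar
    keuken: media_player.kitchen
conditions:
  # Ignore events from locations that are not in the map
  - condition: template
    value_template: "{{ trigger.event.data.device_location in speaker_map }}"
actions:
  - action: media_player.play_media
    target:
      entity_id: "{{ speaker_map[trigger.event.data.device_location] }}"
    data:
      media_content_id: "{{ trigger.event.data.url }}"
      media_content_type: music
      announce: true
mode: single
```

Each satellite would then only need its own `device_location` value in its firmware, and a single automation could serve all of them.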

unkempt vault
tiny hedge
#

I still wish ha had the option in the ui for this. To me it's a no brainer.

#

My workaround has been OK, but 2 updates later it's got a bit more of a delay with the listening sound. I'll have to look into one of these

unkempt vault
unkempt vault
#

Just to follow up on this (as I've figured out a couple of things due to other ESP issues), I've found in my setup that if I try to send a TTS response (using the method described above) to an ESP32-based media player that uses the Arduino framework type, then I can't get the media player to play the response. But if the ESP32 is set to type esp-idf, then it works! Just wanted to update in case anyone else finds it useful 🙂
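For reference, the framework type is set in the `esp32:` block of the media player's ESPHome config. A minimal sketch, where the board name is only a placeholder for whatever your device uses:

```yaml
# Sketch only: "esp32dev" is a placeholder board, use your own.
esp32:
  board: esp32dev
  framework:
    type: esp-idf   # instead of "arduino"
```

Switching framework may need other parts of the config adjusted, since not every component behaves the same under both.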

tiny hedge
#

no sound from media player, using sonos as well. i had the most luck with the tts uri method

limpid kindle
tiny hedge
inland rock
#

i was messing with this sort of thing the other day on a different device and it seemed to work

tiny hedge
#

trying it now, i remember i had done that with the boxs3 long time ago

#

getting this error in logs for piper

#

the automation says it runs with no error in trace. it's like piper doesn't make the sound file. tried playing the sound file directly and it wont play

#

nm fixed that part

#

for some reason the voice had reset in piper in the voice assistant section, however sound still only plays on the pe

#

Executed: May 1, 2025 at 10:04:05 AM
Result:
params:
  domain: media_player
  service: play_media
  service_data:
    media_content_id: http://192.168.xx.xx:8123/api/tts_proxy/4ZSBCcsPXa2EvZyYURsrMA.flac
    media_content_type: music
    announce: true
    entity_id:
      - media_player.foyer_speaker
  target:
    entity_id:
      - media_player.foyer_speaker
running_script: false

#

no change in the log for the sonos speaker. it's as if it never gets the file to play

tiny hedge
#

turned on debugging for it and got this

#

SonosAudioInputFormatSensorEntity._poll_state
2025-05-01 10:23:10.929 DEBUG (MainThread) [homeassistant.components.sonos.media_player] Playing http://192.168.xx.xx:8123/api/tts_proxy/s_97rGlH2ABtqMWt7YH_KA.flac using websocket audioclip
2025-05-01 10:23:10.931 DEBUG (MainThread) [sonos_websocket.websocket] Sending command: [{'namespace': 'audioClip:1', 'command': 'loadAudioClip', 'playerId': 'RINCON_48A6B8FFDDFA01400'}, {'name': 'Sonos Websocket', 'appId': 'com.jjlawren.sonos_websocket', 'streamUrl': 'http://192.168.xx.xx:8123/api/tts_proxy/s_97rGlH2ABtqMWt7YH_KA.flac'}]

#

i dont see the issue

#

a normal tts with plain text works.

inland rock
#

does the url it's sending work (stick it in a browser to test)?
does the sonos support flac data being sent that way? (can you show debug from the normal tts working and see if there is a format difference?)

tiny hedge
#

thing is this was working perfectly fine before

#

the url does yes

#

which log you want? from piper right?

#

DEBUG:wyoming_piper.handler:Sent info
DEBUG:wyoming_piper.handler:Synthesize(text='10:49 AM', voice=SynthesizeVoice(name='trek-medium', language=None, speaker=None))
DEBUG:wyoming_piper.handler:synthesize: raw_text=10:49 AM, text='10:49 AM.'
DEBUG:wyoming_piper.process:Stopping process for: en_US-lessac-medium
DEBUG:wyoming_piper.process:Starting process for: trek-medium (1/1)
DEBUG:wyoming_piper.process:Starting piper process: /usr/share/piper/piper args=['--model', '/share/piper/trek-medium.onnx', '--config', '/share/piper/trek-medium.onnx.json', '--output_dir', '/tmp/tmp4yomsd6r', '--json-input', '--noise-scale', '0.667', '--length-scale', '1.0', '--noise-w', '0.333']
DEBUG:wyoming_piper.handler:input: {'text': '10:49 AM.'}
DEBUG:wyoming_piper.handler:Sent info
DEBUG:wyoming_piper.handler:/tmp/tmp4yomsd6r/1746110974765258295.wav
DEBUG:wyoming_piper.handler:Completed request

inland rock
#

what url is it sending to the sonos when it works?

tiny hedge
#

DEBUG:wyoming_piper.handler:Sent info
DEBUG:wyoming_piper.handler:/tmp/tmp4yomsd6r/1746110974765258295.wav

#

thats using a generic tts action with text

inland rock
#

you can see piper is generating a wav file.
we know that sending an announce via the PE makes a flac file
does normal tts send the wav file?

therefore there is a difference

tiny hedge
#

im going to paste this long part

#

the flac file plays in the browser

inland rock
#

what I am trying to establish

link to wav works
link to flac works
sonos plays wav
sonos does NOT play flac

this perhaps implies that the sonos does not like flac files?

tiny hedge
#

it did before lol

#

any way to convert to wav?

#

ok what the hell

#

now suddenly it plays it?

#

i...dont.....even...

#

2025-05-01 11:03:48.452 DEBUG (MainThread) [homeassistant.components.sonos.speaker] Activity on Living Room from ZoneGroupTopology subscription
2025-05-01 11:03:48.453 DEBUG (SyncWorker_5) [soco.zonegroupstate] Subscriptions (3) still active during poll for 192.168.xx.xx, using cache
2025-05-01 11:03:48.453 DEBUG (SyncWorker_5) [soco.zonegroupstate] Subscriptions (3) still active during poll for 192.168.xx.xx, using cache
2025-05-01 11:03:48.453 DEBUG (SyncWorker_5) [soco.zonegroupstate] Subscriptions (3) still active during poll for 192.168.xx.xx, using cache
2025-05-01 11:03:48.453 DEBUG (SyncWorker_5) [soco.zonegroupstate] Subscriptions (3) still active during poll for 192.168.xx.xx, using cache
2025-05-01 11:03:48.453 DEBUG (SyncWorker_5) [soco.zonegroupstate] Subscriptions (3) still active during poll for 192.168.xx.xx, using cache
2025-05-01 11:03:48.453 DEBUG (SyncWorker_5) [soco.zonegroupstate] Subscriptions (3) still active during poll for 192.168.xx.xx, using cache
2025-05-01 11:03:48.453 DEBUG (SyncWorker_5) [soco.zonegroupstate] Subscriptions (3) still active during poll for 192.168.xx.xx, using cache
2025-05-01 11:03:48.453 DEBUG (SyncWorker_5) [soco.zonegroupstate] Subscriptions (3) still active during poll for 192.168.xx.xx, using cache
2025-05-01 11:03:48.454 DEBUG (MainThread) [homeassistant.components.sonos.speaker] async_regroup Living Room ['RINCON_C438751469E001400']
2025-05-01 11:03:48.499 DEBUG (MainThread) [soco.events_asyncio] Event 27 received for DeviceProperties service at 1746111828.4997709

#

does that show anything.

#

lol

#

i dont get it..

#

i tried it again and the speaker was very distorted and clipping

#

then again..and it was fine

limpid kindle
#

It's learning.... it's aliiiive!

tiny hedge
#

what's strange is, when using this, sometimes it works, sometimes it plays like 2 seconds, and sometimes when it works the volume raises mid sentence

#

action: assist_satellite.announce
metadata: {}
data:
  message: "'{{ response.response.speech.plain.speech }}'"
  preannounce: true
target:
  entity_id: assist_satellite.voice_pe_assist_satellite

#

i find it hard to believe i am the only one experiencing this lol

#

also realized that when the conversation response cuts off on the sonos it also does the same on the HA VPE. plays 2 seconds then stops

#

am i correct on this, it seems like piper creates a wav file for the tts then it's converted to flac for output. could it be piper is erroring somehow in the conversion?

#

DEBUG:wyoming_piper.handler:synthesize: raw_text='"Sir, I've detected an individual on the rear surveillance feed. Requesting clearance for visual confirmation."', text=''"Sir, I've detected an individual on the rear surveillance feed. Requesting clearance for visual confirmation."'.'
DEBUG:wyoming_piper.handler:input: {'text': '\'"Sir, I\'ve detected an individual on the rear surveillance feed. Requesting clearance for visual confirmation."\'.'}
DEBUG:wyoming_piper.handler:/tmp/tmp4yomsd6r/1746116276999573956.wav
DEBUG:wyoming_piper.handler:Completed request

#

only 2 seconds played from this but the entire text is being generated

#

and the entire flac plays in browser.....it's like using announce cuts off or times out

inland rock
tiny hedge
#

im going to try this with the boxs3 and see if i get the same

#

1st time it works..

inland rock
#

is that using flac or wav?

#

that seems to me to be the main difference

tiny hedge
#

thing is, using an external media player with voice pe still plays audio from voice pe, i just had it turned down real low, but the audio was cutting off from it too

#

so it cant just be sonos

inland rock
#

thats how its been set up from what i see above yes. it plays on the VPE and triggers the event back to ha "i am playing this URL" which the automation then also plays that url on the sonos

tiny hedge
#

but if it was an issue with sonos and flac then wouldnt it keep playing on the pe regardless?

inland rock
#

yeah it should do

tiny hedge
#

almost seems to me it could be the way piper is converting to flac

#

got this from logs now

#

Error executing script. Unexpected error for call_service at pos 2: Timeout waiting for VoiceAssistantAnnounceFinished after 300s
Traceback (most recent call last):
  File "aioesphomeapi/connection.py", line 820, in send_messages_await_response_complex
TimeoutError

#

is there a way to set a particular mode just for an action not a whole automation
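One option that might fit here: scripts have their own `mode`, so the single action can be moved into a script and called from the automation. A sketch, where the script name `pe_announce` and the `message` field are made-up names:

```yaml
# Sketch only: "pe_announce" and the "message" field are hypothetical names.
script:
  pe_announce:
    mode: restart   # applies to this script only, not the calling automation
    fields:
      message:
        description: Text to speak on the satellite
    sequence:
      - action: assist_satellite.announce
        target:
          entity_id: assist_satellite.voice_pe_assist_satellite
        data:
          message: "{{ message }}"
```

The automation would then call `script.pe_announce` with a `message` in `data` and keep whatever mode it already has.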

#

i set my test auto to restart and so far box3 hasnt failed

#

like it's still continuing the conversation if set to single

#

testing with pe again

limpid kindle
tiny hedge
#

well damn..pe doing it again

limpid kindle
#

Probably the proxy is too slow in conversion, so by the time the player wants to play the file, it's only partially ready?

#

Can you try making a delay in your automation before sending the URL to SONOS?
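For testing, the delay could be a single extra step in front of the play action. This sketch assumes the event-based automation from earlier in the thread and the `media_player.foyer_speaker` entity from the trace; the 1 second value is just a starting point:

```yaml
# Sketch only: assumes the esphome.tts_ended event trigger shown earlier.
actions:
  # Give the TTS proxy time to finish writing the .flac file
  - delay:
      seconds: 1
  - action: media_player.play_media
    target:
      entity_id: media_player.foyer_speaker
    data:
      media_content_id: "{{ trigger.event.data.url }}"
      media_content_type: music
      announce: true
```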

tiny hedge
#

i had thought of that, hate to delay llm more lol but ill try it real quick

limpid kindle
#

It's just for testing

tiny hedge
#

1st time works but ill keep trying..ha always works the first time haha

#

3 times no fail with a 1 second delay..

#

and again it works..so does this mean piper is being slow

#

damn..5th time and its doing it again wth

#

and this time..the tts played through the sonos first then the pe..usually it's always in sync

#

wonder where the hell the bottleneck can be

#

ha and piper are on proxmox on a mini. still not fully familiar with proxmox, anything i should look at?

#

wonder if switching to a medium model would help

limpid kindle
#

Try.
But remember: the one generating FLAC is HA, not Piper.

tiny hedge
#

could ha be corrupt somehow

#

one thing im trying right now. using the piper in docker with gpu. when using assist announce to play vs using tts.speak, announce is 3 times quicker for whatever reason

#

i have come to the conclusion from all of this damn confusion that playing from tts.speak seems to work flawlessly with piper in docker. using the same method with assist.announce causes the audio to cut off again. which leads me to think it's the voice pe timing out somehow

#

i'm getting responses within 1.5 seconds using piper and gpu and ollama. shrug i dont know. just know im aggravated lol

#

it sucks because i like the assist announce for doing this

#

i have custom wake sounds for that

inland rock
#

change the TTS on the pipeline the device is using. announce uses whatever is specified in the pipeline

tiny hedge
#

i thought maybe it was the connection from proxmox to ollama on the desktop being too slow but this pretty much eliminates that idea. ha has changed something in esphome with the most recent update i suspect. i'm actually finding this very problem when checking websites, with no clear solution. my guess is they tried to fix something else and created a possible hiccup with announce timing out too quick

#

i think its announce itself

inland rock
#

announce uses the tts thats specified in the pipeline that the device is set to use

#

it doesnt have its own thing

tiny hedge
#

`alias: zzzzz
description: ""
triggers: []
conditions: []
actions:
  - action: conversation.process
    metadata: {}
    data:
      agent_id: conversation.llama3_2
      text: >-
        Rephrase the following text [tell the owner you love him and appriciate
        him]
    response_variable: response
  - action: tts.speak
    metadata: {}
    data:
      cache: false
      media_player_entity_id: media_player.foyer_speaker
      message: "{{ response.response.speech.plain.speech }}"
    target:
      entity_id: tts.piper_2
    enabled: true
  - action: assist_satellite.announce
    metadata: {}
    data:
      message: "'{{ response.response.speech.plain.speech }}'"
      preannounce: true
    target:
      entity_id: assist_satellite.voice_pe_assist_satellite
    enabled: false
mode: single`
#

that is my test auto

#

announce doesn't work correctly using the same pipeline that tts.speak is using

inland rock
#

i am not sure if you understand what i mean by pipeline?

#

in settings - voice assistants

#

you select the pipeline thats assigned to the vpe and ensure the tts section is using the one that works

tiny hedge
inland rock
#

with tts.speak you select the piper instance in the action
with assist_satellite.announce you select the instance via the pipeline

tiny hedge
#

i tried everything but ha cloud

#

already tried changing pipelines

inland rock
#

on the TTS, are there multiple versions of piper?

tiny hedge
#

i dont want to use the llm directly as its not ready for prime time imo

#

no. i disabled the addon i was running in ha when i switched to docker for testing

inland rock
#

so it only shows 1 there?

tiny hedge
#

yep

#

or google translate but never used it tbh

inland rock
#

can try it for test?

tiny hedge
#

alright

limpid kindle
#

Don't you think that tts.speak works and satellite.announce doesn't because tts.speak delivers audio directly to the speaker, while announce goes through the pipeline, i.e. this proxy?

tiny hedge
#

google is cloud right

limpid kindle
tiny hedge
#

was using piper through tts.speak though

#

didnt happen using google

#

so you're saying piper is not responding fast enough for assist

#

is there a way to adjust the timeout?

#

trying something else, disabled mesh connection and forcing 2.4ghz

#

for the assistpe

next citrus
#

piper is usually pretty fast, is it trying to send an especially long TTS or something perhaps?

tiny hedge
# next citrus piper is usually pretty fast, is it trying to send an especially long TTS or som...

Well at the moment it seems to be working, but I'm using it to rephrase text in an automation. I have in the instructions "responses need to be short and not over a sentence long". I adjusted the context down to 6400 since it doesn't have control in HA. Did more reading and set IPv6 to disabled in settings, even though I already had it turned off in the router. I also forced 2.4 GHz and disabled mesh technology for the PE. But as dumb as it sounds, I'm not so sure that is why it's working better. I had preannounce turned on in the assist.announce action but had forgotten I had another automation set up that played a wake and on-idle sound when the assist state changes. After disabling that it seems to be better. Not entirely sure why?

tiny hedge
#

Unless it was trying to send 2 audio files at the same time and causing an error. However, the logs weren't showing that

#

Funny thing is I took Sonos off mesh too and now I get errors in the log reporting connection timeouts. But I know that is related directly to Sonos and their screwed up API. Might have to delete and re-add the entire system

#

Last time i checked sonos hated mesh routers. And ironically i was one of the very few who used to never have the sonos connection problems when it was routing through mesh

#

But most say not to use ethernet to the Arc Ultra because it then uses the deprecated SonosNet when you have surrounds/subs