M5Stack Atom Echo Media Player + Wake Word | Home Assistant | Page 1

peak pollen Nov 29, 2023, 10:12 PM

#

I am working on enabling the Wake Word together with the Media Player functionality; the two currently exist separately in https://github.com/esphome/firmware/blob/main/voice-assistant/m5stack-atom-echo.yaml and https://github.com/esphome/firmware/blob/main/media-player/m5stack-atom-echo.yaml respectively.

Taking some inspiration and logic from @sturdy mortar's config at https://github.com/esphome/firmware/blob/main/media-player/onju-voice.yaml

I have the below

#

https://github.com/Gtt1229/firmware/blob/main/media-player/m5stack-atom-echo-va.yaml

#

Updated esp_adf setting that I added trying to mimic too much.

sturdy mortar Nov 29, 2023, 10:25 PM

#

peak pollen Updated esp_adf setting that I added trying to mimic too much.

esp_adf is a library that relies on the esp-idf framework, which is incompatible with the media_player component. it was used for VAD (Voice Activity Detection) and it barely worked, so just don't add it.

what's up with the config you pasted? what doesn't work?

#

as a note, if you've gone through the trouble of copying the scripts which turn on/off the wake word, why not use them here? https://github.com/Gtt1229/firmware/blob/7feb596cbde3290e14ada113fa27af889ac7b33d/media-player/m5stack-atom-echo-va.yaml#L183
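
For readers following along, a minimal sketch of what calling such scripts from the voice assistant triggers could look like. The ids `start_wake_word`, `stop_wake_word`, `echo_microphone`, and `echo_media_player` are placeholders assumed for illustration and are not taken from the linked config; the actual scripts there may contain more logic.

```
# Sketch only: all ids below are hypothetical placeholders.
script:
  - id: start_wake_word
    then:
      # Wait for any running pipeline to finish before restarting
      - wait_until:
          not:
            voice_assistant.is_running:
      - voice_assistant.start_continuous:
  - id: stop_wake_word
    then:
      - voice_assistant.stop:

voice_assistant:
  microphone: echo_microphone
  media_player: echo_media_player
  # Reuse the scripts instead of repeating the start/stop actions inline
  on_client_connected:
    - script.execute: start_wake_word
  on_client_disconnected:
    - script.execute: stop_wake_word
```

Keeping the start/stop logic in scripts means every trigger that needs it goes through the same wait condition.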

peak pollen Nov 29, 2023, 10:28 PM

#

sturdy mortar `esp_adf` is a library that relies on the `esp-idf` framework, which is incompat...

Yea, that was an accident on my part.

I also realized upon review that I never called the scripts. I will adjust.

#

The issue I am currently running into is that I can TTS to it, but it doesn't play the response to my commands through its speaker.

#

```
[D][voice_assistant:428]: Desired state set to AWAITING_RESPONSE
[D][voice_assistant:422]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[D][voice_assistant:422]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
...
[D][voice_assistant:557]: Speech recognised as: " What's my name?"
[D][voice_assistant:529]: Event Type: 5
[D][voice_assistant:562]: Intent started
...
...
[D][voice_assistant:585]: Response: "Sorry, I couldn't understand that"
[D][voice_assistant:529]: Event Type: 8
[D][voice_assistant:605]: Response URL: "http://192.168.107.5:8123/api/tts_proxy/dae2cdcb27a1d1c3b07ba2c7db91480f9d4bfd8f_en-us_e6f38331f3_tts.piper.raw"
[D][voice_assistant:422]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[D][voice_assistant:428]: Desired state set to STREAMING_RESPONSE
[D][media_player:059]: 'M5Stack Atom Echo b83488' - Setting
[D][media_player:066]:   Media URL: http://192.168.107.5:8123/api/tts_proxy/dae2cdcb27a1d1c3b07ba2c7db91480f9d4bfd8f_en-us_e6f38331f3_tts.piper.raw
[D][media_player:059]: 'M5Stack Atom Echo b83488' - Setting
[D][media_player:066]:   Media URL: http://192.168.107.5:8123/api/tts_proxy/dae2cdcb27a1d1c3b07ba2c7db91480f9d4bfd8f_en-us_e6f38331f3_tts.piper.raw
[D][light:036]: 'M5Stack Atom Echo b83488' Setting:
[D][light:059]:   Red: 20%, Green: 100%, Blue: 0%
[W][component:214]: Component voice_assistant took a long time for an operation (0.06 s).
[W][component:215]: Components should block for at most 20-30ms.
[D][voice_assistant:529]: Event Type: 2
[D][voice_assistant:619]: Assist Pipeline ended
[W][component:214]: Component i2s_audio.media_player took a long time for an operation (0.54 s).
[W][component:215]: Components should block for at most 20-30ms.
[W][component:214]: Component i2s_audio.media_player took a long time for an operation (0.47 s).
[W][component:215]: Components should block for at most 20-30ms.
```

#

Removed lines are "Event Type" logs

sturdy mortar Nov 29, 2023, 10:31 PM

#

good news and bad news with that issue:

peak pollen Nov 29, 2023, 10:32 PM

#

Give it to me straight doc

sturdy mortar Nov 29, 2023, 10:32 PM

#

the bad news is that Piper is not compatible with the media_player with HA 2023.11 and older versions

#

the good news is that Piper is compatible with the media_player with HA 2023.12, which has launched in beta today

peak pollen Nov 29, 2023, 10:33 PM

#

Oh

#

So that's why it worked earlier. I was using a different pipeline

#

That's so weird though, because I can use

```
service: tts.speak
data:
  media_player_entity_id: media_player.m5stack_atom_echo_b83488_m5stack_atom_echo_b83488
  message: media
target:
  entity_id: tts.piper
```

just fine

#

I see, it works now with HA Cloud TTS pipeline

#

I am doing this all so I can set a timer... Which is that other thread and won't really prove useful because the script is killed when the next intent is run anyways...

#

I'll figure that out though.

Thank you @sturdy mortar, get some rest haha

sturdy mortar Nov 29, 2023, 10:36 PM

#

peak pollen I am doing this all so I can set a timer... Which is that other thread and won't...

async_action: true should fix that, depending on your implementation https://www.home-assistant.io/integrations/intent_script/#async_action
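
As a rough illustration of where that option goes, here is a minimal `intent_script` sketch. The intent name `StartKitchenTimer`, the five-minute delay, and the `media_player.example_speaker` entity are all hypothetical, and a matching custom sentence for the intent is assumed to be defined elsewhere.

```
# Sketch only: the intent name, delay, and entity ids are hypothetical.
intent_script:
  StartKitchenTimer:
    # Return the spoken response right away instead of awaiting the action
    async_action: true
    speech:
      text: "Timer started."
    action:
      - delay: "00:05:00"
      - service: tts.speak
        target:
          entity_id: tts.piper
        data:
          media_player_entity_id: media_player.example_speaker
          message: "Your timer is done."
```

With `async_action` left at its default of `false`, the response is only returned after the action finishes, so a long delay like this would hold up the intent.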

#

I've also added a new feature in 2023.12 which might help (again, depending on your implementation) https://github.com/home-assistant/core/pull/102203

peak pollen Nov 30, 2023, 7:24 PM

#

Hello again. Do you happen to see what is causing a loop when this is initially installed?

https://gist.github.com/Gtt1229/5ed3b1e771bb7b364660f8f32b2fb4cb

#

Currently running https://github.com/Gtt1229/firmware/blob/dev8/media-player/m5stack-atom-echo-va.yaml

sturdy mortar Nov 30, 2023, 7:50 PM

#

Do you mean the 5 second loop of restarting listening when wake word is on? That's intended behavior

peak pollen Nov 30, 2023, 7:54 PM

#

Ah, that makes sense....

sturdy mortar Nov 30, 2023, 8:07 PM

#

No, it doesn't, at least in my book, but it is what it is. I.e. intended 😋

#

What would make more sense than just chopping up audio at random (albeit consistently random) 5s intervals would be to have proper VAD and just send the recording buffer contents after voice activity is no longer detected

peak pollen Nov 30, 2023, 9:34 PM

#

I figured it would be a continuous audio stream to HA, since the satellites generally won't be powerful enough for voice detection on their own.

#

I know most smart speakers do some form of local audio processing before calling home, but those units are probably more complex than an ESP or similar.

#

A Pi Zero would probably be capable though.

I am nowhere near savvy enough in the development space, but does that mean 5-second audio streams are repeatedly sent to HA for processing?
