#Developing for the voice edition PE


remote zodiac
#

Hi there! I'm new to making HA integrations, though not to programming in general. I have a project I'd like to start, but I'm struggling to find the proper docs for the voice edition PE and how to develop for it. I was looking through here https://support.nabucasa.com/hc/en-us/categories/24451727188125

To be specific, I want to perform bi-directional audio streaming to and from an integration, but I'm trying to understand if that's available out of the box, or if I'd need to modify the device firmware itself.

remote zodiac
#

Streaming to and playing back audio chunks received from an external API server continuously

icy trail
#

is it a secret project? I am trying to establish what it is you're actually trying to do so I can perhaps suggest a direction to go in.

remote zodiac
#

Haha no not really secret. I've been messing around a lot with the gemini live API at work, and I wanted to see if there was a way to make an integration for it, so I could use it to control my smart home

I'm having a few issues with my current whisper/HA cloud + piper setup, and the gemini live API just kind of does "everything", so I thought it'd make for a good integration project

#

There's a python client, and code samples for sending and receiving chunks of audio data. I've already written some stuff for it in the JS version so I'm familiar with it

icy trail
#

you can already connect gemini as a conversation agent right?
there is currently a PR open for adding gemini as a TTS option.
as for the STT I don't know though

the speech stuff is brand new, right? I imagine there will be integrations for all parts of the pipeline soon

remote zodiac
#

Oh no this is a different API and model. That PR you're referencing is for a model that allows for native audio output as a response modality, directly from the API

#

Gemini live does that, but also handles the STT, automatic VAD, and interruptions, plus asynchronous function calling (keep talking without waiting for the tool to complete, acknowledge it later), etc

#

It's highly experimental still, but I've had fun working with it. It's why I wanted to try doing this. I just need to be able to send and receive audio to either my voice edition PE, or to another satellite

icy trail
#

oh is this the live voice model thing

remote zodiac
#

There were some newer live models revealed at I/O, but there has been a testing model available for a few months now

icy trail
#

from paulus in one of the project channels yesterday
Live voice mode is not currently being pursued. It will require a complete revamp

remote zodiac
#

Yeah. I don't think this fits into HA's traditional understanding of assistant conversations, as they're all turn based

#

It's why I wanted to make my own integration for it

icy trail
#

but...

#

so here's a possible way to go...

#

you could have your python interact directly with the voice PE using the esphome api. this would lose normal functionality, but would allow you full control of stuff. then you can add the tools to the voice model using MCP
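
A rough sketch of the "full control" idea on the Python side, assuming the device and the remote API are both replaced by in-memory queues. Nothing here is the ESPHome or Gemini API: `pump`, `run_duplex`, `send_to_api` and `play_on_device` are hypothetical names, and a real version would swap the queues for whatever the esphome client and the live API client actually provide.

```python
import asyncio
from typing import Awaitable, Callable


async def pump(source: asyncio.Queue, sink: Callable[[bytes], Awaitable[None]]) -> None:
    """Forward chunks from a queue to an async sink until a None sentinel arrives."""
    while True:
        chunk = await source.get()
        if chunk is None:  # None marks the end of the stream
            break
        await sink(chunk)


async def run_duplex(
    mic: asyncio.Queue,
    api_out: asyncio.Queue,
    send_to_api: Callable[[bytes], Awaitable[None]],
    play_on_device: Callable[[bytes], Awaitable[None]],
) -> None:
    """Run uplink (mic -> API) and downlink (API -> speaker) concurrently."""
    await asyncio.gather(
        pump(mic, send_to_api),
        pump(api_out, play_on_device),
    )
```

Running both directions as separate tasks is what keeps it from being turn based: playback never has to wait for the microphone stream to end.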

remote zodiac
#

Ahhh no yeah that sounds perfect. The initial idea I had was an action to start this live mode thing, which would take over traditional functionality until it's over

I didn't think about the MCP server though, that's a good way to get access to all the currently exposed entities and tools, though I don't think these models have support for it. Still, I'm sure it's not too hard to just adapt the function definitions the server returns, that sounds good
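
A minimal sketch of that adaptation step, assuming the MCP server returns tools with `name`, `description` and `inputSchema` (the field names in the MCP spec) and that the model wants a `name` / `description` / `parameters` function declaration. The list of keys to strip is a guess and should be checked against the live API docs, and the `HassTurnOn` tool in the usage note is only an illustrative example.

```python
from typing import Any

# Assumption: schema keys the target function-calling API may reject.
UNSUPPORTED_KEYS = {"$schema", "additionalProperties"}


def clean_schema(schema: Any) -> Any:
    """Recursively drop unsupported keys from a JSON Schema fragment."""
    if isinstance(schema, list):
        return [clean_schema(item) for item in schema]
    if not isinstance(schema, dict):
        return schema
    cleaned = {}
    for key, value in schema.items():
        if key in UNSUPPORTED_KEYS:
            continue
        if key == "properties" and isinstance(value, dict):
            # Property names are user data, so only their sub-schemas are cleaned.
            cleaned[key] = {name: clean_schema(sub) for name, sub in value.items()}
        else:
            cleaned[key] = clean_schema(value)
    return cleaned


def mcp_tool_to_function_declaration(tool: dict) -> dict:
    """Map one MCP tool definition onto a function-declaration dict."""
    default_schema = {"type": "object", "properties": {}}
    return {
        "name": tool["name"],
        "description": tool.get("description", ""),
        "parameters": clean_schema(tool.get("inputSchema", default_schema)),
    }
```

For example, a tool like `{"name": "HassTurnOn", "description": "Turns on a device", "inputSchema": {...}}` would come out with the same name and description and its cleaned schema under `parameters`.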

#

What's available with the voice PE via the esphome api? As I understand it, esphome is for many devices, are there docs on what methods and things are exposed/available on the voice PE?

icy trail
#

it also means you can develop outside of home assistant, then move it in as an integration later

remote zodiac
#

Ohhh wait do you mean to essentially "un-adopt" the device and control it directly via the esphome api? So HA doesn't control it?

#

Or are you able to interface with it on the fly via the api even while it's still connected to HA?

icy trail
#

you can only have the VPE connected to 1 API at a time so if you connect to it directly it will no longer connect to HA via the esphome integration

remote zodiac
#

Gotchaa.... Yeah I guess that's what I feared. Is there no way to control it still even while it's still connected to HA? Or does it all have to be through the existing assistant APIs?

icy trail
#

for the level of control you want, I don't think piggybacking on top of the normal HA connection will work that well

remote zodiac
#

I mean, I can in theory always use the regular media player API to play combined audio chunks right?
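
A small sketch of the "combine chunks, then play" route, assuming the API hands back raw 16-bit mono PCM at 24 kHz (those defaults are an assumption, so check the format the API actually returns). It only builds the WAV bytes; serving the file and calling the media player is left out.

```python
import io
import wave
from typing import Iterable


def pcm_chunks_to_wav(
    chunks: Iterable[bytes],
    sample_rate: int = 24000,  # assumed output rate
    channels: int = 1,         # assumed mono
    sample_width: int = 2,     # bytes per sample, i.e. 16-bit PCM
) -> bytes:
    """Join raw PCM chunks and wrap them in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(b"".join(chunks))
    return buffer.getvalue()
```

Because the whole response has to be collected before the file exists, this gives playback with a delay rather than a live stream.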

#

But I don't know of anything for listening to the microphone natively through HA

icy trail
#

you could try and make a STT integration
and a TTS integration
and a conversation agent integration
then have them all connect to custom "middleware" which would send stuff back and forth to the right places. but that sounds like a mess

remote zodiac
#

Yeah... It very much does...

#

I might try something with the esphome api, especially with the spare atom echo that I have, but yeah. As paulus said, for integration directly into HA it would likely need a full revamp

#

I mean hey an API for that would be pretty cool. Maybe the ability to connect to whatever discord alternative has an open source client, or even the ability to have a conversation between rooms through two voice PE's. Native bi-directional audio would be pretty nice

icy trail
#

keep in mind with the atom echo you CAN'T have the mic and speaker active at the same time. you have to turn them on and off

remote zodiac
#

Ahh bugger. Still thanks for saving me time haha

icy trail
#

the AE only has 1 i2s bus so it has to switch stuff on/off. other devices like the vpe have 2, with a separate bus for each
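
A host-side sketch of what that half-duplex limit means for a streaming client, assuming the client has to track which direction currently owns the bus. `HalfDuplexGate` and its method names are made up for illustration; the real switching happens in the device firmware.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    LISTENING = "listening"
    SPEAKING = "speaking"


class HalfDuplexGate:
    """Tracks which direction currently owns the single audio bus."""

    def __init__(self) -> None:
        self.mode = Mode.LISTENING

    def start_playback(self) -> None:
        """Speaker takes the bus, so the mic is treated as off."""
        self.mode = Mode.SPEAKING

    def finish_playback(self) -> None:
        """Playback is done, so the mic gets the bus back."""
        self.mode = Mode.LISTENING

    def filter_mic(self, chunk: bytes) -> Optional[bytes]:
        """Return the chunk while listening, drop it while speaking."""
        return chunk if self.mode is Mode.LISTENING else None
```

With a gate like this, anything said during playback is dropped, so interrupting the model mid-sentence cannot work on a single-bus device.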

#

it's really impressive that the AE is as good as it is with its limitations tbh

remote zodiac
#

Gotchaa

icy trail
#

there are probably ways to get to what you want but it's not going to be easy.

remote zodiac
#

Yeah. Still, I do appreciate your time. Do you know if there are any plans for an API like I mentioned above? There are those use cases as well as my live API as an example. I'm sure there could be a few more

icy trail
#

as paulus said, it's not currently being looked at by the project team. but that doesn't mean it never will be. and it also doesn't mean that someone else couldn't build something in the meantime.

given it's brand new, it will change a lot in the coming months and various middleware packages will appear. maybe something will become viable

#

directly getting audio isn't really something that's been looked at either. but you can do it with the esphome api if you want to do various external things

#

honestly, voice stuff is advancing so quickly. everything could be totally different in a couple of months

remote zodiac
#

Yeah that's fair haha

#

Thanks for clarifying that

#

Oh actually just while this is fresh, I suppose my main other question is if you know if there are plans for like, a general audio API. While it would work for my use-case, I'm also thinking of non-assistant related use cases. Room to room communication, baby monitoring, DIY intercoms, etc

icy trail
#

currently, no. not to my knowledge. you could build a sort of intercom with stt/tts but it's not audio passthrough.

#

there is a debug option to get HA to keep its STT recordings, which I guess you could point at a media directory, then play the file somewhere else and delete it yourself

#

could have it return a blank response then use the recording somehow maybe

#

it wouldn't ever be "live" though

#

would be record and replay at best
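
A sketch of the "record and replay" workaround, assuming the debug recordings end up as `.wav` files somewhere under the configured directory (the exact folder layout is not confirmed here). `newest_recording` is a hypothetical helper that just finds the most recently written file so it can be played elsewhere and then deleted.

```python
from pathlib import Path
from typing import Optional


def newest_recording(recording_dir: str) -> Optional[Path]:
    """Return the most recently modified .wav under recording_dir, or None."""
    wavs = list(Path(recording_dir).rglob("*.wav"))
    return max(wavs, key=lambda p: p.stat().st_mtime, default=None)
```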