# Advanced Voice Mode (Or Gemini)


simple raven

I don’t know why my immense Google knowledge can’t find this out, but I’m simply looking to talk to OpenAI’s Advanced Voice Mode through Home Assistant. I bought the Home Assistant Voice Preview Edition and haven’t had a chance to play with it yet, but is this possible? I also pay for Gemini. I’m not looking to control the house yet; I’d just like to have conversations with it together with my kids.

wispy sapphire

What exactly do you want from “Advanced Voice Mode”? HA is not capable of supporting this mode, but you can configure very similar functionality.

simple raven
wispy sapphire

I'll ask differently: what do you feel is missing in the current implementation of voice interaction in HA? To clarify, it's only possible to interrupt the assistant using the wake word. Is there anything else?

simple raven
plain lodge

Yeah, I really want this. The cynic in me says HA wants people to subscribe to their cloud for a usable voice system. People have been requesting it for ages. I want AVM from OpenAI. I want ElevenLabs voices.

naive flower

Is AVM even something that can be accessed through an API (regarding your cynical take on HA pushing cloud subscriptions)?
Otherwise this wouldn't be possible at all.

You can already use OpenAI's speech-to-text and text-to-speech with Assist.
Combined with an OpenAI LLM, you're already 100% OpenAI (besides your own and HA's intents/prompts).

So, to be honest, with HA we have quite the opposite of vendor lock-in:
Everything is open source, and people can hook into every part of the pipeline.
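To illustrate the point about hooking into every part of the pipeline: a voice request goes through speech-to-text, then a conversation agent, then text-to-speech, and each stage can come from a different provider. The Python sketch below is a toy model of that idea, not Home Assistant's actual pipeline API; `VoicePipeline`, `fake_stt`, `echo_agent` and `fake_tts` are hypothetical stand-ins for places where a real setup would call OpenAI or another service.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures (toy model, not HA's API):
# speech-to-text -> conversation agent (LLM) -> text-to-speech.
SpeechToText = Callable[[bytes], str]
ConversationAgent = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


@dataclass
class VoicePipeline:
    """Toy voice pipeline where every stage can be swapped independently."""
    stt: SpeechToText
    agent: ConversationAgent
    tts: TextToSpeech

    def run(self, audio_in: bytes) -> bytes:
        text_in = self.stt(audio_in)    # audio -> transcript
        text_out = self.agent(text_in)  # transcript -> reply text
        return self.tts(text_out)       # reply text -> audio


# Stand-in stages for illustration only; real stages would call a provider.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def echo_agent(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(stt=fake_stt, agent=echo_agent, tts=fake_tts)
print(pipeline.run(b"hello"))  # b'You said: hello'
```

Swapping one provider only means passing a different function for that stage; the other two stay untouched.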

https://community.home-assistant.io/t/new-home-assistant-integration-openai-gpt-4o-mini-tts/867370

https://github.com/fabio-garavini/ha-openai-whisper-stt-api


naive flower

OK, it looks like OpenAI offers this with their audio/realtime models.
It might be possible to send the audio to these services. I'm not sure what you would have to do on the HA side, or whether a custom integration would be able to adopt this.
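For reference, OpenAI's Realtime API takes audio as JSON client events sent over a WebSocket (`input_audio_buffer.append` with base64-encoded audio, then `input_audio_buffer.commit` and `response.create`). A minimal sketch of building those messages, assuming raw 16-bit PCM input and manual turn handling; it does not open a connection, and `realtime_audio_events` and its chunk size are hypothetical choices for illustration.

```python
import base64
import json
from typing import Iterator


def realtime_audio_events(pcm16: bytes, chunk_size: int = 4800) -> Iterator[str]:
    """Yield JSON client events that stream raw PCM16 audio to a realtime session.

    4800 bytes is roughly 100 ms of 24 kHz mono PCM16; the value is arbitrary.
    Assumes server-side turn detection is off: with server VAD enabled, the
    server commits the buffer and creates the response on its own.
    """
    if chunk_size <= 0 or chunk_size % 2:
        raise ValueError("chunk_size must be a positive even number of bytes")
    for start in range(0, len(pcm16), chunk_size):
        chunk = pcm16[start:start + chunk_size]
        yield json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        })
    # Mark the utterance as complete, then ask the model to respond.
    yield json.dumps({"type": "input_audio_buffer.commit"})
    yield json.dumps({"type": "response.create"})


events = list(realtime_audio_events(b"\x00\x00" * 5000))  # 10,000 bytes of silence
print(len(events))  # 5
```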

But holy hell, these token prices for audio / realtime models. 😵‍💫
Better not talk to that AI more than once or twice a day 😉
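How fast this adds up is simple arithmetic: minutes of audio times tokens per minute times the per-token price, with input and output usually priced separately. A small hypothetical helper, with every rate passed in because no real numbers are given here (look them up on OpenAI's current pricing page); using one token rate for both directions is a simplifying assumption.

```python
def conversation_cost_usd(
    input_minutes: float,
    output_minutes: float,
    tokens_per_minute: float,
    usd_per_million_input: float,
    usd_per_million_output: float,
) -> float:
    """Rough cost of a spoken conversation, from caller-supplied token rate and prices."""
    input_tokens = input_minutes * tokens_per_minute
    output_tokens = output_minutes * tokens_per_minute
    return (input_tokens * usd_per_million_input
            + output_tokens * usd_per_million_output) / 1_000_000


# Placeholder inputs only, NOT real OpenAI prices or token rates.
estimate = conversation_cost_usd(
    input_minutes=5, output_minutes=5, tokens_per_minute=1000,
    usd_per_million_input=40, usd_per_million_output=80,
)
print(f"${estimate:.2f}")
```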


OK, just to round this out as far as a short internet search allows:

You should be able to use these models to build your own STT integration.
To be compatible with Assist, it can only return the text.
But it could also prompt the model for additional information and write it somewhere else, e.g. to entity attributes (Is the user whispering? Is the volume loud or quiet? Is he yelling? Does he sound calm? ...).
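A sketch of the STT side, assuming the integration prompts the model to answer with a JSON object holding the transcript plus the extra fields. Only the transcript would go back to Assist; the attribute dictionary could then be written to an entity (for example via `hass.states.async_set` in a custom integration). The JSON shape, the names in `EXTRA_FIELDS`, and `split_transcription` are all hypothetical.

```python
import json
from typing import Any

# Hypothetical extra fields the integration asks the model for.
EXTRA_FIELDS = ("whispering", "volume", "mood")


def split_transcription(model_reply: str) -> tuple[str, dict[str, Any]]:
    """Split a model reply into the plain transcript (for Assist) and extra attributes.

    Expects a JSON object such as
    {"transcript": "...", "whispering": false, "volume": "loud", "mood": "calm"}.
    Falls back to using the whole reply as the transcript if it isn't a JSON object.
    """
    try:
        data = json.loads(model_reply)
    except json.JSONDecodeError:
        return model_reply.strip(), {}
    if not isinstance(data, dict):
        return model_reply.strip(), {}
    text = str(data.get("transcript", "")).strip()
    # Keep only the known extra fields; ignore anything else the model added.
    attributes = {key: data[key] for key in EXTRA_FIELDS if key in data}
    return text, attributes


text, attrs = split_transcription(
    '{"transcript": "Tell us a story", "whispering": true, "mood": "calm"}'
)
```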

You would then have to react to these attributes and configure the OpenAI TTS service (which also accepts a prompt to set a different tonality) accordingly.
If this isn't possible on the fly, it would need a new custom TTS integration, or the existing one would need to add support.
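On the TTS side, OpenAI's `gpt-4o-mini-tts` model takes a free-text `instructions` parameter for tone. A minimal sketch that turns attributes like the ones mentioned above (whispering, volume, mood) into such an instruction string; the attribute names, values, and mapping rules in `tts_instructions` are assumptions, and the function makes no API call itself.

```python
from typing import Any


def tts_instructions(attributes: dict[str, Any]) -> str:
    """Build a tone instruction for a prompt-steerable TTS model.

    Hypothetical rules: mirror a whisper, answer loud speech clearly,
    and match the detected mood; fall back to a neutral tone.
    """
    parts = []
    if attributes.get("whispering"):
        parts.append("Speak softly, almost in a whisper.")
    elif attributes.get("volume") == "loud":
        parts.append("Speak clearly and a little louder than usual.")
    mood = attributes.get("mood")
    if mood == "calm":
        parts.append("Keep a calm, relaxed tone.")
    elif mood == "excited":
        parts.append("Sound upbeat and energetic.")
    return " ".join(parts) or "Speak in a friendly, neutral tone."
```

The returned string would then be passed as `instructions` when requesting speech, e.g. with the `openai` Python package's `client.audio.speech.create(model="gpt-4o-mini-tts", voice=..., input=..., instructions=...)`.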

So yes, it's most likely already possible.
But someone has to code it.
That most likely won't happen unless the prices drop massively ...