#German STT and TTS


dim crest
#

Hey, around 8 months ago I tested the local voice assist. I have a small N100 mini PC with 8 GB of RAM running HA and some other containers.

But back then I wasn't happy with the response time, and the recognition of my voice was also super bad (I used my phone since I currently do not have any beacons, so the mic should be more than fine).

My question now is: has it improved? Or do I still need to wait around 20 seconds before I get a human-sounding German TTS response? I used Whisper and Piper. In addition: the mini PC is also hosting other containers, so I can't just use the biggest STT model since it would consume too many resources.

umbral tulip
#

For TTS, I usually get my responses near instantly from Piper. For STT my response times are 0.28 seconds on average, but I am running Wyoming faster-whisper outside of Home Assistant in k8s with a GPU accelerating it, and using the newly released large-v3-turbo model. You may want to check your voice assist debug view and see which stage of the pipeline takes the longest; my guess is it's the STT. The only way to make that faster is to use the Nabu Casa cloud, or to get your own Wyoming Whisper running in docker/k8s with a GPU 🙂
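To make the "see what's taking the longest" step concrete, here is a minimal Python sketch that turns a list of pipeline events into per-stage durations. The event names (`stt-start`, `stt-end`, and so on) are modelled on what the Assist debug view shows, but the sample timestamps are made up and the helper names are hypothetical, so treat this as an illustration rather than an official tool.

```python
from datetime import datetime

# Hypothetical sample data: (event type, ISO timestamp) pairs, invented for illustration.
SAMPLE_EVENTS = [
    ("stt-start", "2024-01-01T10:00:00.000"),
    ("stt-end", "2024-01-01T10:00:14.500"),
    ("intent-start", "2024-01-01T10:00:14.500"),
    ("intent-end", "2024-01-01T10:00:14.700"),
    ("tts-start", "2024-01-01T10:00:14.700"),
    ("tts-end", "2024-01-01T10:00:15.600"),
]


def stage_durations(events):
    """Return {stage: seconds} for every stage that has both a start and an end event."""
    starts = {}
    durations = {}
    for event_type, timestamp in events:
        # "stt-start" -> stage "stt", phase "start"
        stage, _, phase = event_type.rpartition("-")
        moment = datetime.fromisoformat(timestamp)
        if phase == "start":
            starts[stage] = moment
        elif phase == "end" and stage in starts:
            durations[stage] = (moment - starts[stage]).total_seconds()
    return durations


def slowest_stage(events):
    """Return the name of the stage with the longest duration."""
    durations = stage_durations(events)
    return max(durations, key=durations.get)


if __name__ == "__main__":
    for stage, seconds in stage_durations(SAMPLE_EVENTS).items():
        print(f"{stage}: {seconds:.1f} s")
    print("slowest:", slowest_stage(SAMPLE_EVENTS))
```

With the invented numbers above, STT dominates the run, which is the pattern you would expect on a CPU-only box. If TTS or intent handling turned out to be the slow stage instead, a GPU for Whisper would not help much.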

dim crest
#

What type of GPU are you using for it? Not sure if I want to go that route because of the power consumption, but I might have to check again. And are you happy with the word detection? Does it pick up what you are saying? Or, if not exactly what you say, at least what you mean?

umbral tulip
#

For me it doesn't really miss words often; it usually picks me up correctly. I am using an RTX 4060 Ti, but this is a k8s server I use for lots of other things (game streaming, LLMs, Whisper, etc).

umbral tulip
#

To add to this, I think people have gotten very good performance using something like a used GTX 1070, for instance, so you don't have to get a crazy GPU. But if you want to do other things like LLMs, you may want to get something with a bit more processing power and VRAM.

#

Also think you mentioned power concerns. Here's a power graph for my GPU over 5 minutes at idle:

#

As you can see it's pretty low, only 4 watts used at idle. With inference, the GPU stays in idle state until you request something. When you do make a request, it only spikes in power for a short moment (few seconds) before going right back to idle. Here's an example of me asking for the time (note my graph only updates every 30-60 seconds so the spike looks longer than it is in actuality):
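For a rough sense of what that idle-plus-short-spikes pattern means over a day, here is a small back-of-the-envelope Python sketch. Only the 4 W idle figure comes from this thread; the 100 W active draw, 3 seconds per request, and 50 requests per day are assumptions picked purely for illustration, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_power_watts(idle_watts, active_watts, seconds_per_request, requests_per_day):
    """Time-weighted average power draw over one day."""
    active_seconds = seconds_per_request * requests_per_day
    if active_seconds > SECONDS_PER_DAY:
        raise ValueError("requests take longer than a day in total")
    idle_seconds = SECONDS_PER_DAY - active_seconds
    total_joules = idle_watts * idle_seconds + active_watts * active_seconds
    return total_joules / SECONDS_PER_DAY


def daily_energy_kwh(avg_watts):
    """Convert an average power draw in watts to energy per day in kWh."""
    return avg_watts * 24 / 1000


if __name__ == "__main__":
    # 4 W idle is from the thread; the other three values are assumed.
    avg = average_power_watts(idle_watts=4, active_watts=100,
                              seconds_per_request=3, requests_per_day=50)
    print(f"average draw: {avg:.2f} W")
    print(f"energy per day: {daily_energy_kwh(avg):.3f} kWh")
```

Under those assumptions the average only rises from 4 W to roughly 4.2 W, so the idle draw, not the inference spikes, is what ends up on the power bill. Plug in your own card's numbers to check.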

dim crest
umbral tulip
#

These are Grafana graphs from my k8s cluster

dim crest
#

Sick.

#

What kind of beacon are you using for mics? ESP32-based?

#

I have a lot of Sonos speakers but sadly cannot use their mics locally.

umbral tulip
#

Using the ESP32-S3-BOX-3 at the moment; I will probably pick up some of the new Home Assistant voice satellite hardware when it's launched on the 19th.

#

I've heard people are also getting great results with the ReSpeaker Lite voice kit. There's also hardware in development by FutureProofHomes.

dim crest
dim crest
umbral tulip
#

I am using the official Wyoming faster-whisper container, and then injecting the necessary CUDA files in my k8s YAML lol

#

and I am using the newly released large-v3-turbo model