#Local assist the best way


peak coral
#

I've done some reading about LLMs and I don't think I want one, as it seems more like a gimmick that would get annoying after a while.

What I want is a local TTS setup that works just as well as Nabu Cloud. I know that means Piper and faster-whisper, but what I don't know is the hardware needed to make it quick. And do the voices sound robotic compared to Nabu Cloud? I'd want it to sound as natural as possible.

The problem I'm finding is that most of the information out there is fragmented or only covers local LLM models, and even that information leads you down a rabbit hole of endless reading, only to come out even more confused about what to choose.

The videos I find on it usually just show how to get started, but a lot of them are outdated and I don't think they had speed in mind. I'm not entirely against using a local LLM, I just don't know which direction I'd need to go if I went that route.

dull vessel
#

You're kinda talking about two different topics here. TTS/STT are separate from the LLM, and as you said, that's just faster-whisper (wyoming-faster-whisper for HA) and Piper. Piper works perfectly fine on CPU alone and is plenty fast if the CPU is decent. I think there's a GPU-accelerated Docker image for it out there, but it's not necessarily worth the trouble. For wyoming-faster-whisper, you would have to run it in an external Docker container with the GPU exposed, the NVIDIA Container Toolkit installed, etc. There are some guides out there on how to do that. As far as the actual hardware goes, any decent NVIDIA GPU will do the job. Many people get a used GTX 1070 or 1080 from eBay, and that will get you responses in under a second. You can also grab a more modern GPU, and if you only want to do STT on it, 6GB (or 8GB to be extra safe) would be plenty of VRAM. Maybe something like an RTX 3050 or 3060 (I myself use a 4060 Ti 16GB).
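To put a rough number on the "6GB would be plenty" point, here is a back-of-the-envelope sketch of how much VRAM the Whisper model weights alone would take. The parameter counts are the approximate figures from the openai/whisper README; the bytes-per-weight values assume the `float16` and `int8` compute types that faster-whisper supports. This only counts weights, so treat it as a lower bound: activations, the CUDA context, and anything else sharing the GPU come on top.

```python
# Approximate Whisper parameter counts, in millions (from the openai/whisper README).
WHISPER_PARAMS_M = {
    "tiny": 39,
    "base": 74,
    "small": 244,
    "medium": 769,
    "large": 1550,
}

# Bytes used per weight for two common compute types.
BYTES_PER_WEIGHT = {"float16": 2, "int8": 1}


def weights_gib(model: str, compute_type: str) -> float:
    """Estimate the memory taken by the model weights alone, in GiB."""
    params = WHISPER_PARAMS_M[model] * 1_000_000
    return params * BYTES_PER_WEIGHT[compute_type] / 1024**3


# Print a small table: one row per model size, one column per compute type.
for model in WHISPER_PARAMS_M:
    fp16 = weights_gib(model, "float16")
    int8 = weights_gib(model, "int8")
    print(f"{model:>6}: float16 ~ {fp16:.2f} GiB, int8 ~ {int8:.2f} GiB")
```

Even the largest model comes out at roughly 3 GiB of weights in `float16`, which is consistent with a 6GB or 8GB card being comfortable for STT only.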

#

Piper voices also vary in quality. I personally use the jenny_dioco voice and find it to be just fine. It isn't the greatest, but it's also not robotic imo. I also have a GLaDOS TTS voice that sounds pretty much the same as the voice in Portal. You can do some tinkering and find what works best for you. 🙂

peak coral
#

I don't run Docker. Could that be done in Windows, or would I need to have a VM set up for it?

dull vessel
#

It can. I think people run Docker Desktop or something like that on Windows.

muted gulch
low arrow
#

The best local TTS that I've found is Kokoro TTS. Runs instantly if it has access to a GPU and runs very well on CPU. Much better than Piper.

muted gulch
peak coral
#

Never heard of Kokoro. Does it install the same way? I'm just trying to get info at this point.

low arrow
#

It doesn't have a Home Assistant add-on like Piper. It's typically installed as a docker container (which is what the add-ons are under the hood) and can be installed on your HA rig, or on separate hardware. In my case I have it installed on a separate server with access to one of the server's GPUs.
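If the speech services end up on separate hardware like this, a quick sanity check is whether the HA machine can reach them over the network at all. A minimal sketch, assuming the services listen on plain TCP ports; the host address and port numbers in the usage comment are placeholders, not values from this thread, so substitute whatever your containers actually expose.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


# Example usage (hypothetical server address and ports, adjust to your setup):
# for name, port in {"stt": 10300, "tts": 10200}.items():
#     print(name, "reachable" if is_reachable("192.0.2.10", port) else "NOT reachable")
```

This only confirms that something is listening on the port. It says nothing about whether the service behind it is healthy or how fast it responds.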