I've done some reading about LLMs and the like, and I don't think I want one; it seems more like a gimmick that would get annoying after a while.
What I want is a local voice setup that works just as well as Nabu Cloud. I know that means Piper (text-to-speech) and faster-whisper (speech-to-text), but what I don't know is what hardware is needed to make it quick. Also, do the Piper voices sound robotic compared to Nabu Cloud? I'd like it to sound as natural as possible.
The problem I'm finding is that most of the information out there is fragmented or covers only local LLMs. Even that information seems to lead you down a rabbit hole of endless reading, only for you to come out more confused about what to choose.
The videos I find usually just show how to get started, but a lot of them are outdated, and I don't think they were made with speed in mind. I'm not entirely against using a local LLM; I just don't know which direction I'd need to go if I went that route.