#Trying to figure out an architecture for TTS

barren gulch
#

Hello. I am working on a personal project to create an audio book of a web novel, and even have a tool I made to help with the process. But I wanted to know a couple of things:

-What is the best text-to-speech solution that can work with lots of high-quality data? I was using F5 TTS, but the one thing I don't really like about it is that you still have to give it a sample audio clip. I can if I have to, but I want making the audiobook to be as automatic as possible, just so I get something of relatively decent quality to listen to.

-How much data should I have at most? I am willing to spend a lot of time annotating data from many episodes, but I want to make sure I'm not going overboard unnecessarily.

-What should the samples sound like? I am using htdemucs to clean them, but some of the samples still have noise: random sound effects, music that leaks through a little, or other characters in the background. The voice is still clearly audible in all of them. Should I be very strict and only use audio that is voice and nothing else, or can I be a little more relaxed?

#

I should also mention that I have a 2080 Ti with 11 GB of VRAM. That's obviously a consideration, but I don't mind training overnight for several days to make sure the solution works for me, and then maybe investing a little money in dedicated training.

tired smelt
#

all depends on the language you want

barren gulch
#

English is fine with me

tired smelt
#

Kokoro can do really nice reads

#

no voice cloning, but there are some bundled voices

barren gulch
#

So you're suggesting possibly doing Kokoro with RVC?

tired smelt
#

I'm suggesting using just Kokoro and picking the voice you like. There's also a way to mix bundled voices to create some variety.
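
To give a rough idea of what mixing voices can mean, here is a minimal sketch. It assumes each bundled voice is a fixed-length style vector and that a blend is a weighted average of those vectors. `blend_voices` and the toy vectors are hypothetical and for illustration only; this is not Kokoro's API, and the real voice data is much larger.

```python
from typing import List, Sequence


def blend_voices(
    voices: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Return the weighted average of equal-length voice style vectors.

    Hypothetical helper: assumes a voice can be represented as a flat
    list of floats, which is a simplification for illustration.
    """
    if not voices or len(voices) != len(weights):
        raise ValueError("need one weight per voice and at least one voice")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    dim = len(voices[0])
    if any(len(v) != dim for v in voices):
        raise ValueError("all voice vectors must have the same length")

    # Normalise the weights so they sum to 1, then average per dimension.
    norm = [w / total for w in weights]
    return [sum(w * v[i] for w, v in zip(norm, voices)) for i in range(dim)]


# Toy 3-dimensional "voices" (made-up numbers, not real voice data).
voice_a = [0.0, 1.0, 2.0]
voice_b = [2.0, 1.0, 0.0]

# 75% voice_a, 25% voice_b.
mixed = blend_voices([voice_a, voice_b], [3, 1])
print(mixed)
```

Changing the weights shifts the blend toward one voice or the other, which is how a handful of bundled voices could cover several characters.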

#

Or you can use RVC after to change the voice

barren gulch
#

Well I mean that's understandable but I do want to be able to use the voices of the original characters if I can 😆