Hello. I am working on a personal project to create an audiobook of a web novel, and I even made a tool to help with the process. But I wanted to know a few things:
-What is the best text-to-speech solution that can work with lots of high-quality data? I was using F5 TTS, and the only thing I don't really like about it is that you still have to give it a sample audio clip. I can do that if I have to, but I want to make the audiobook process as automatic as possible so I can get something of relatively decent quality to listen to.
-How much data should I have at most? I am willing to spend a lot of time annotating data from many episodes, but I want to make sure I'm not going overboard unnecessarily.
-What should the samples sound like? I am using hidemuc to clean them, but some of the samples still have some noise: random sound effects, music that leaks through a little, or other characters audible in the background, although the voice is still clearly heard. Should I be very strict and only use audio that is just the voice and nothing else, or can I be a little more relaxed?