Hi everyone!
I'm working on a voice cloning project for a non-native English speaker, and I have a few questions.
Does anyone here work on voice cloning and fine-tuning of cloned voices? Is it realistic to clone the voice of a speaker who speaks English with an accent, and then generate the same text in that voice with a standard US accent, so that it sounds as if it were recorded by a native speaker?
What would the steps be? Should I first train a TTS model (Tacotron2, FastSpeech2, or another) on the speaker’s voice and then fine-tune it to match the original nuances? Or are there models that can analyze a recorded take, fix the pronunciation, and deliver the result directly, bypassing the TTS pipeline and working speech-to-speech (STS)?
Here is what we have and what we are aiming for:
Data: 5–10 hours of recorded speech with transcripts
Target output: high-quality English TTS with native-like pronunciation
Is anyone here interested in collaborating on this or offering freelance help?
Or, if you have done similar work, could you point me to the right repo or config?
Thanks in advance 🙏