# Voice cloning and accent fine-tuning

azure bough
Hi everyone!

I'm working on a voice cloning project for a non-native English speaker, and I have some questions.

Is anyone here working on voice cloning and fine-tuning voice clones? Is it realistic to clone a speaker who speaks English with an accent, and then generate the same text in a pure US accent, so that it sounds as if a native speaker had recorded it?

What are the steps? Should I first train a TTS model (Tacotron 2, FastSpeech 2, or another) on the speaker's voice and then fine-tune it to match the original nuances? Or are there models that can take a recorded take, fix the pronunciation, and deliver the result directly, bypassing the TTS pipeline and working speech-to-speech (STS)?

We have:

- 5–10 hours of recorded speech with transcripts
- Target output: high-quality English TTS with native-like pronunciation

Is anyone here interested in collaborating on this or offering freelance help?
Or has anyone done similar work and can point me to the right repo or config?

Thanks in advance 🙏

jade tulip
azure bough
jade tulip
Here's an example: output 1 is based on the original speaker speaking their own language (Russian).

Then I trained an RVC voice model on the original speaker and ran inference on an English text using that voice model plus an index from a pure-English RVC model.

Both outputs were generated by Chatterbox TTS in zero-shot mode, which uses ~10-20 s of audio as the reference voice.
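
For anyone who wants to try the zero-shot step, here is a rough sketch of how it might look. It is not the exact script used for the outputs above. `trim_reference` is a hypothetical helper that cuts a PCM WAV file down to the 10-20 s window using only the standard library. `synthesize` follows the usage I recall from the Chatterbox README (`ChatterboxTTS.from_pretrained`, `generate(..., audio_prompt_path=...)`, `model.sr`); treat those names as assumptions and check them against the installed version. The file names are placeholders.

```python
import wave


def trim_reference(src_path, dst_path, min_seconds=10.0, max_seconds=20.0):
    """Copy at most max_seconds from the start of a PCM WAV file.

    Returns the duration (in seconds) written to dst_path and raises
    ValueError if the source is shorter than min_seconds.
    """
    with wave.open(src_path, "rb") as src:
        channels = src.getnchannels()
        sample_width = src.getsampwidth()
        rate = src.getframerate()
        keep = min(src.getnframes(), int(max_seconds * rate))
        frames = src.readframes(keep)

    duration = keep / rate
    if duration < min_seconds:
        raise ValueError(
            f"reference clip is {duration:.1f}s, need at least {min_seconds}s"
        )

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(channels)
        dst.setsampwidth(sample_width)
        dst.setframerate(rate)
        dst.writeframes(frames)
    return duration


def synthesize(text, reference_wav, out_path, device="cpu"):
    """Zero-shot synthesis with Chatterbox TTS (API names are assumptions)."""
    # Third-party packages, imported lazily so the helper above works without them.
    import torchaudio
    from chatterbox.tts import ChatterboxTTS

    model = ChatterboxTTS.from_pretrained(device=device)
    wav = model.generate(text, audio_prompt_path=reference_wav)
    torchaudio.save(out_path, wav, model.sr)


if __name__ == "__main__":
    # Placeholder file names; replace with real paths.
    trim_reference("speaker_full.wav", "speaker_ref.wav")
    synthesize("Hello, this is a test sentence.", "speaker_ref.wav", "output.wav")
```

The trimming helper simply takes the first 20 s. In practice a clean stretch of speech without music or long pauses is probably a better reference, so picking the segment by hand may give better results.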

jade tulip