I'm training a voice model using only non-English audio clips from native speakers of other languages. My goal is to use voice conversion to make these voices speak fluent English, but I'm concerned the output will have a foreign accent, since the training data contains no English phonemes.
Is there a way to get a native-sounding English output from a model trained only on non-English data?
Thanks in advance.