Hello, first post here.
I am trying to create a new RVC voice model to use in a language with a few unique phonemes. The models I have made seem to handle most sounds, but I cannot get them to reproduce a strong rolled "R" sound.
The conversion more or less skips that sound. To try to fix this, I have tried a few pretrains such as KLM49, Rigel, and OV2Super. I trained on about 30 minutes to 1 hour of speech data from multiple speakers of related languages. (Note that I want the model to sound unique, not like any particular voice in the dataset I used for training.)
I have a few questions:
- Is there an existing pretrain or voice model that can reproduce this sound?
- Does training with special sounds help the model represent those phonemes, or does that kind of data need to be in the pretrain? (Essentially, is the training capturing phonetic variations, or mostly just the timbre of the voice?)
- Any suggestions on what I should try?
- I have A LOT of data spanning hundreds of languages, and I would love to create a single pretrain that can handle phonemes from any language. Is this within the realm of possibility, or do models just fall back on a generalization that can't capture the unique qualities of outlier languages?
