# How to create a good dataset for training?

daring sentinel

Hello,
I am looking into making my own voice model, but I am unsure what makes a good dataset for it.

How long should the dataset be?
Should the dataset be ONLY the voice of the person (no background music/noise)?
Should the dataset include coughs, laughs and screams? (Or is it even better to include them?)
Could a dataset in a foreign language (for example, German or Polish) be trained with the pretrains and then used for English in w-okada?

Thanks for the answers!

viscid dust
1. Remove any background noise and any sounds not made by the person; the RVC embedder is very sensitive to noise.
2. Laughter is fine, but I would not recommend adding coughing or screaming; those sounds could cause problems for the model.
3. You can train any language in RVC. If you are going to use the model in real time, I suggest training on a dataset in your own language; it will give you better results. For realistic results, aim for 2 hours of audio or more.
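Two of the points above, the roughly 2 hour target and keeping the audio free of background noise, can be checked roughly before training. This is only a sketch under my own assumptions, not part of RVC or Applio: it assumes the dataset is a folder of 16-bit PCM `.wav` files, totals their length against the 2 hour figure, and estimates each file's noise floor from its quietest 20 ms frames (the helper names `wav_duration_seconds`, `noise_floor_dbfs` and `summarize_dataset` are hypothetical, made up for illustration).

```python
import math
import sys
import wave
from array import array
from pathlib import Path

# Assumption: "2 hours or more" means total clean speech across all files.
TARGET_SECONDS = 2 * 60 * 60


def wav_duration_seconds(path):
    """Length of a WAV file in seconds (frames / sample rate)."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def noise_floor_dbfs(path, frame_ms=20, quietest_fraction=0.1):
    """Rough noise-floor estimate for a 16-bit PCM WAV file (hypothetical helper).

    Splits the audio into short frames, keeps the quietest fraction of them
    (likely pauses between words) and returns their RMS level in dBFS.
    Values closer to 0 suggest more audible background noise.
    """
    with wave.open(str(path), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wf.getframerate()
        channels = wf.getnchannels()
        samples = array("h")
        samples.frombytes(wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    frame_len = max(1, rate * frame_ms // 1000) * channels
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(s * s for s in frame) / frame_len)  # mean square
    if not energies:
        return float("-inf")  # file shorter than one frame
    energies.sort()
    quiet = energies[:max(1, int(len(energies) * quietest_fraction))]
    rms = math.sqrt(sum(quiet) / len(quiet))
    return 20 * math.log10(rms / 32768) if rms > 0 else float("-inf")


def summarize_dataset(folder):
    """Return (file count, total seconds, {file name: noise floor in dBFS})."""
    files = sorted(Path(folder).glob("*.wav"))
    total = sum(wav_duration_seconds(f) for f in files)
    floors = {f.name: noise_floor_dbfs(f) for f in files}
    return len(files), total, floors


if __name__ == "__main__" and len(sys.argv) > 1:
    count, total, floors = summarize_dataset(sys.argv[1])
    status = "meets" if total >= TARGET_SECONDS else "is below"
    print(f"{count} files, {total / 3600:.2f} h of audio ({status} the ~2 h target)")
    # Noisiest files first: these are the first candidates to clean or drop.
    for name, db in sorted(floors.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{db:7.1f} dBFS  {name}")
```

Run it as `python check_dataset.py path/to/dataset` (the script name is hypothetical). There is no official dBFS cutoff, so compare files against each other rather than against a fixed number. It is pure standard-library Python, so it will be slow on hours of audio.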

Breaths are fine for RVC. Don't remove them; they're very important.

daring sentinel
viscid dust

I haven't seen this topic in depth, sorry

daring sentinel

Also, if you don't mind: is there some kind of tutorial on creating a first voice model with Applio? Or at least some guide, so I can get the basics down; the rest I can learn on the fly from practical experience.

viscid dust
daring sentinel

Thanks, exactly what I needed.

viscid dust

no problem, good luck