I trained my model on a few songs and a long speech; in total, the dataset is over an hour long. Sometimes the voice model doesn't seem to pick up the correct pitch or volume when I use it. Does it make sense to also record singing, data from the same person in multiple languages, or something like shouting? Or does that confuse the AI? Up to now I've trained it on some ballads and some normal narrator-style readings.
#Which data to feed the model? (languages, singing, speaking, shouting, etc.)
Yes, it makes sense to record singing, since it gives RVC more data in the higher pitch range.
Multiple languages should also be fine, I think.
I wouldn't put too much shouting in it, though; it could affect the normal voice overall.
Just make sure it's not a big part of the dataset...
...or make a separate shouting model.
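If you want to check how much of the dataset each kind of recording takes up, here's a minimal sketch (not part of RVC). It assumes a hypothetical layout of uncompressed WAV clips sorted into subfolders by category, e.g. `dataset/singing`, `dataset/speech`, `dataset/shouting`, and reads durations with Python's standard `wave` module. The 10% shouting threshold is an arbitrary example, not an RVC rule.

```python
import wave
from pathlib import Path


def wav_seconds(path: Path) -> float:
    """Duration of an uncompressed WAV file in seconds."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def category_totals(root: Path) -> dict[str, float]:
    """Sum clip durations (seconds) per category subfolder of root (hypothetical layout)."""
    totals: dict[str, float] = {}
    for wav in sorted(root.glob("*/*.wav")):
        category = wav.parent.name
        totals[category] = totals.get(category, 0.0) + wav_seconds(wav)
    return totals


def category_shares(totals: dict[str, float]) -> dict[str, float]:
    """Fraction of the total duration contributed by each category."""
    grand = sum(totals.values())
    if grand == 0:
        return {name: 0.0 for name in totals}
    return {name: secs / grand for name, secs in totals.items()}


if __name__ == "__main__":
    # Arbitrary example threshold: warn if shouting is more than 10% of the audio.
    MAX_SHOUTING_SHARE = 0.10
    shares = category_shares(category_totals(Path("dataset")))
    for name, share in sorted(shares.items()):
        print(f"{name}: {share:.1%}")
    if shares.get("shouting", 0.0) > MAX_SHOUTING_SHARE:
        print("Warning: shouting makes up a large part of the dataset")
```

Measuring by duration rather than by file count matters here, since one long shouting clip can outweigh many short speech clips.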

Also, you shouldn't really use datasets over an hour long. Try cutting it down to 20 to 25 minutes; that might work better than the full hour.
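To trim a dataset down to roughly 25 minutes, a simple greedy pass over your clips could look like the sketch below. It's a hypothetical helper, not an RVC feature: it takes a list of `(filename, seconds)` pairs (durations could be measured with Python's `wave` module), walks them in the order given, and keeps each clip that still fits under the target. It only returns a list of names; it doesn't copy or delete any files.

```python
# Target taken from the advice above: roughly 20 to 25 minutes of audio.
TARGET_SECONDS = 25 * 60


def pick_clips(clips: list[tuple[str, float]], target: float = TARGET_SECONDS) -> list[str]:
    """Keep clips in the given order, skipping any that would push the total past target."""
    chosen: list[str] = []
    total = 0.0
    for name, secs in clips:
        if total + secs > target:
            continue  # doesn't fit; a shorter clip later in the list still might
        chosen.append(name)
        total += secs
    return chosen


if __name__ == "__main__":
    example = [("ballad_1.wav", 600.0), ("reading_1.wav", 800.0), ("ballad_2.wav", 300.0)]
    print(pick_clips(example))
```

Ordering the input is where the real judgment goes: putting your cleanest recordings first (and keeping a mix of singing and speaking) means those are the ones that survive the cut.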