In the past, I've trained models on merely a minute of raw .wav files, and they turned out great. But some of my models are trained on several minutes of very good-quality audio and still sound really rough around the edges. The best example I have is my Rio Futaba model, which is really scratchy on her nasal sounds and S's.
Attached is a reading from the Rio Futaba model (download: https://huggingface.co/JuLY-LION/Futaba/blob/main/Futaba-RMVPE_e360_s6840.zip), along with another model I've trained (which sounds great) to prove that I'm not a complete novice.
Anyone know how to help me with this? Did I mess up my training/inference parameters?
