I have some rough ideas and would like input from others. To train a talking RVC model, I need speech samples, which generated TTS voices can easily provide. To train a singing RVC model, however, I need singing samples, which generated TTS voices cannot provide because TTS cannot sing.
Would generating speech samples at various pitches and substituting them for singing samples be a viable method? Would an RVC model trained this way still be able to sing properly?
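
To make the idea concrete, here is a minimal sketch of what "speech samples at various pitches" could mean in code. It is only an illustration under assumptions: the function names `semitone_ratio` and `naive_pitch_shift` are hypothetical, the sine tone is a stand-in for a real TTS clip, and the shift is done by plain resampling, which also changes the clip's duration. A duration-preserving pitch shifter would probably be needed in practice. It does not show whether an RVC model trained on such data would sing properly.

```python
import math


def semitone_ratio(n_steps: float) -> float:
    """Frequency ratio for a shift of n_steps semitones (equal temperament)."""
    return 2.0 ** (n_steps / 12.0)


def naive_pitch_shift(samples, n_steps):
    """Shift pitch by resampling with linear interpolation.

    Assumption: this is the simplest possible approach. Reading the input
    faster (ratio > 1) raises the pitch but also shortens the clip.
    """
    ratio = semitone_ratio(n_steps)
    out_len = int(len(samples) / ratio)
    out = []
    for i in range(out_len):
        pos = i * ratio                      # fractional read position
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)   # clamp at the last sample
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


# Stand-in for one second of a TTS clip: a 220 Hz tone at 16 kHz (hypothetical values).
SAMPLE_RATE = 16000
tone = [math.sin(2 * math.pi * 220 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]

# One variant per semitone offset, e.g. to cover a small melodic range.
variants = {n: naive_pitch_shift(tone, n) for n in (-4, -2, 0, 2, 4)}
```

Each entry in `variants` would then be treated as an extra training clip at a different pitch. Note that this only covers pitch; real singing also differs from speech in sustained vowels, vibrato, and timing, which shifted speech would not contain.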