Recently some papers claim to have achieved a large improvement in inference speed using a distillation technique (on so-vits-svc). I was thinking this could be applied to RVC's pretrained models as well... any thoughts?
Link to the distillation paper: https://arxiv.org/abs/2401.01792
The diffusion-based Singing Voice Conversion (SVC) methods have achieved remarkable performances, producing natural audios with high similarity to the target timbre. However, the iterative sampling process results in slow inference speed, and acceleration thus becomes crucial. In this paper, we propose CoMoSVC, a consistency model-based SVC meth...