trained on a 0:04-second dataset ripped directly from YouTube (preprocessing: rx mouth de-click, resample + eq, noise gate).
pitch extraction: rmvpe
steps: 3k
batch size: 8
pretrain: original v2 / 32k
precision: fp16
an index ratio of 1 is recommended for better results.
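for context on what that setting does: in RVC-style inference the index ratio is, as far as i understand it, a linear blend weight between the features extracted from the input audio and the features retrieved from the model's index. the sketch below is only an illustration of that blend under that assumption; `blend_features` and the plain-list vectors are hypothetical names made up for the example, not part of any real tool.

```python
def blend_features(original, retrieved, index_ratio):
    """Linearly mix two equal-length feature vectors.

    index_ratio = 0 keeps the original features untouched,
    index_ratio = 1 uses only the features retrieved from the index.
    (Hypothetical helper for illustration, not an actual RVC function.)
    """
    if not 0.0 <= index_ratio <= 1.0:
        raise ValueError("index_ratio must be between 0 and 1")
    if len(original) != len(retrieved):
        raise ValueError("feature vectors must have the same length")
    # Weighted average of each pair of values.
    return [
        index_ratio * r + (1.0 - index_ratio) * o
        for o, r in zip(original, retrieved)
    ]


# With the recommended ratio of 1, the output is just the retrieved features.
print(blend_features([1.0, 2.0], [3.0, 4.0], 1.0))
```

so at a ratio of 1 the conversion leans entirely on the index built from the training data, and lower values mix more of the input speaker's features back in.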
please don't forget to credit me when you use this model.