i trained this in applio using crepe (technically mangio-crepe, since applio's crepe has a hop length setting)
trained on a 3 min 41 s dataset ripped directly from the sounds resource (processed with rx de-ess, mouth de-click, resample + eq, and a noise gate)
pitch extraction: mangio-crepe
hop length: 8
steps: 4.16k
batch size: 4
pretrain: original v2 / 32k
precision: fp16
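for anyone trying to reproduce this, here are the same settings written out as a small python sketch, plus some quick arithmetic on them. this is not applio's config format; the class and method names are made up for illustration. the 4160 step count is just "4.16k" taken literally (the exact count isn't listed), and the frame-time calculation assumes mangio-crepe counts hop length in samples of 16 kHz audio, which is an assumption on my part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSettings:
    """Hypothetical container for the settings listed above (not an Applio API)."""
    f0_method: str = "mangio-crepe"
    hop_length: int = 8                  # crepe hop length, in samples
    steps: int = 4160                    # "4.16k" taken literally; exact count not given
    batch_size: int = 4
    pretrain: str = "original v2 / 32k"
    precision: str = "fp16"
    dataset_seconds: int = 3 * 60 + 41   # 3 min 41 s of audio

    def clips_seen(self) -> int:
        # each step processes one batch, so this is the total number of
        # training examples drawn over the whole run (with repeats)
        return self.steps * self.batch_size

    def f0_frame_ms(self, f0_sample_rate: int = 16_000) -> float:
        # ASSUMPTION: pitch is extracted from audio resampled to 16 kHz,
        # so one hop of `hop_length` samples spans this many milliseconds
        return 1000 * self.hop_length / f0_sample_rate


settings = TrainingSettings()
print(settings.dataset_seconds)  # 221
print(settings.clips_seen())     # 16640
print(settings.f0_frame_ms())    # 0.5
```

a small hop like 8 means a pitch estimate roughly every half millisecond under that assumption, which tracks fast pitch changes closely but makes extraction slower.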
please don't forget to credit me when you use this model.