#[Solved] Model sounds worse after cleaning dataset
Things you can improve:
- Use MDX 23C 8K FFT instead of Kim Vocal 2
- Denoise once instead of twice, because a second pass might compress the audio further and lose some detail
- Don't normalize your audio if it's already loud enough
- Handpick 20 minutes of clean audio from your processed dataset
Finally, train it, then compare with your previous versions
also i would like to see spectrograms of your dataset, you can check it via Audacity or Spek
im at school rn, will talk to you later
no worries mate
since no one mentioned this yet, I see that you have static noise above 15 kHz, which can cause artifacts. never use models that have static noise in the first place. Also, the secret sauce for this is to split your audio into 10-30 second clips and normalize each clip. yes, it might sound like too much work, but that's how it is.
still not sure if there's bulk editing for normalization of audio
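For what it's worth, the split-and-normalize step can be done in bulk with a short script. This is only a minimal sketch, assuming mono audio already loaded as a list of floats in the range -1.0 to 1.0; the function names, the 10-second default, and the 0.95 target peak are all made up for illustration and are not from any particular tool. It uses simple peak normalization, which may or may not be the kind of normalization meant above.

```python
def split_clips(samples, sample_rate, clip_seconds=10.0):
    """Split a mono sample list into consecutive clips of at most clip_seconds.

    The last clip may be shorter than clip_seconds.
    """
    step = int(sample_rate * clip_seconds)
    return [samples[i:i + step] for i in range(0, len(samples), step)]


def peak_normalize(clip, target_peak=0.95):
    """Scale a clip so its loudest sample reaches target_peak (assumed value)."""
    peak = max((abs(s) for s in clip), default=0.0)
    if peak == 0.0:
        # Pure silence: nothing to scale, return an unchanged copy.
        return list(clip)
    gain = target_peak / peak
    return [s * gain for s in clip]


def normalize_in_bulk(samples, sample_rate, clip_seconds=10.0, target_peak=0.95):
    """Split the audio, then peak-normalize every clip independently."""
    clips = split_clips(samples, sample_rate, clip_seconds)
    return [peak_normalize(clip, target_peak) for clip in clips]
```

Because each clip gets its own gain, a quiet section is boosted more than a loud one, which is the point of normalizing per clip rather than once over the whole file.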
another thing, use the correct sample rate for training based on the maximum frequency response in your spectrogram. it's to prevent HiFiGAN from predicting sounds that weren't present in the dataset, which is what causes the robotic sibilance on sounds like 'S'
32k = 16 kHz
40k = 20 kHz
48k = 24 kHz
you can do that in vegas pro
just match your pretrain sample rate to your audio dataset to avoid artifacts
32K is your best bet
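The 32K/40K/48K numbers above follow from the Nyquist limit: a given sample rate can only represent frequencies up to half that rate. A rough sketch of the rule of thumb, assuming you read the highest frequency with real content off the spectrogram by eye (the function names and the "pick the lowest rate that still covers it" policy are my reading of the advice, not an official rule):

```python
# Sample rates mentioned in the thread, in Hz.
PRETRAIN_RATES_HZ = (32000, 40000, 48000)


def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


def pick_training_rate(max_content_hz, rates=PRETRAIN_RATES_HZ):
    """Return the lowest rate whose Nyquist limit covers max_content_hz.

    max_content_hz is assumed to be the top of the real (non-noise) content
    seen in the spectrogram. Falls back to the highest available rate.
    """
    for rate in sorted(rates):
        if nyquist_hz(rate) >= max_content_hz:
            return rate
    return max(rates)


print(pick_training_rate(15000))  # content tops out around 15 kHz
```

With usable content ending around 15 kHz, this picks 32000, which lines up with the "32K is your best bet" suggestion.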
Ayo? @oblique pagoda level 8 !!! 
it always uses a pretrain