The spectrogram I showed Fitz Roy last time wasn’t taken from the model’s inferred output; it was the original spectrogram of the singer’s voice as recorded by the microphone. So the harmonics weren’t an error; they were part of the original recording. This isn’t so much a sign that your judgment was wrong as a reminder that human eyes and experience are subjective, which leads to issues like this. Especially when you decide on the answer first and then go looking for it, you end up staying within the framework you’re already familiar with.
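One way to take some of the subjectivity out of reading spectrograms is to compare the source recording and the model output numerically, frame by frame. The sketch below is a hypothetical helper, not part of any tool mentioned here. It assumes you already have two time-aligned frames of equal length (one from the original recording, one from the inferred output) as lists of floats. It flags only the DFT bins that are prominent in the output but much quieter in the source. Bin `k` corresponds to `k * sample_rate / len(frame)` Hz. The thresholds (`gain_db`, `range_db`) are arbitrary starting points, not tuned values.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitude of DFT bins 0..N/2 for one frame, after a periodic Hann window."""
    n = len(frame)
    # Periodic Hann window: reduces leakage between neighbouring bins.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, x in enumerate(frame)]
    # Plain O(N^2) DFT so the sketch needs only the standard library.
    return [
        abs(sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def bins_added_by_model(source_frame, output_frame, gain_db=12.0, range_db=40.0, eps=1e-9):
    """Return DFT bins that are prominent in the model output but absent from the source.

    Hypothetical helper. Assumes both frames are time-aligned and the same length.
    A bin is flagged when it lies within `range_db` of the output's loudest bin AND
    is at least `gain_db` louder than the same bin in the source recording.
    Harmonics that are strong in both frames came from the recording, not the model.
    """
    src = magnitude_spectrum(source_frame)
    out = magnitude_spectrum(output_frame)
    peak = max(out)
    if peak == 0:
        return []
    flagged = []
    for k, (s, o) in enumerate(zip(src, out)):
        prominent = 20 * math.log10((o + eps) / peak) > -range_db
        louder = 20 * math.log10((o + eps) / (s + eps)) > gain_db
        if prominent and louder:
            flagged.append(k)
    return flagged
```

If a harmonic you were suspicious of is *not* in the returned list, it is just as strong in the source recording, which is exactly the situation described above.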
The fact that a pretrained model was trained on a lot of data isn’t always purely a positive thing. It can expose weaknesses you might want to hide, or amplify unwanted sounds that you accidentally included in your training data.
Data-cleaning processes differ from person to person, because everyone’s methods and habits vary. Even so, when the collected sounds are all similar, there is inevitably a domain of sound outside them that the pretrained model can’t handle.
For instance, real singers’ breathing sounds are quite rough, because they need to take in a large amount of air in a short time. These sounds can be quite distracting, and they’re usually removed during mixing. I was surprised to find that they didn’t appear at all in the model I trained recently. However, that also meant the breathing sounds originally present in my dataset weren’t being generated by the model at all, and I realized this wasn’t purely a benefit.
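Before blaming or crediting the model for missing breaths, it helps to check how many breath-like segments your dataset actually contains. The sketch below is a rough heuristic of my own, not a real breath detector: it assumes mono audio as a list of floats in [-1, 1]. It flags frames that are quieter than the singing but not silent, and that have a high zero-crossing rate (noise-like rather than pitched). The function names and every threshold are hypothetical and will need adjusting for your recordings.

```python
import math


def frame_features(samples, frame_len):
    """Yield (start, rms, zero_crossing_rate) for consecutive, non-overlapping frames."""
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        # Count sign changes between neighbouring samples; noisy sounds change sign often.
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        yield start, rms, crossings / (frame_len - 1)


def breath_like_frames(samples, frame_len=512, min_rms=0.005, max_rel_rms=0.3, min_zcr=0.25):
    """Return start indices of frames that look like breaths (hypothetical heuristic).

    A candidate frame is louder than min_rms (so silence is skipped), quieter than
    max_rel_rms times the loudest frame (so sung notes are skipped), and has a
    zero-crossing rate of at least min_zcr. All thresholds are guesses.
    """
    feats = list(frame_features(samples, frame_len))
    if not feats:
        return []
    peak = max(rms for _, rms, _ in feats)
    return [
        start
        for start, rms, zcr in feats
        if min_rms <= rms <= max_rel_rms * peak and zcr >= min_zcr
    ]
```

Running this over each clip and listening to the flagged positions gives a quick sense of whether breaths are common in your data, which makes it easier to tell whether their absence in the output comes from your dataset or from the pretrained model.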
What you should know is that the more extensive the pretrained model’s data, the broader the variety of sounds the model can infer. And a broader variety means that even sounds you don’t want to hear can be inferred.
If you really want to know the answer, I recommend training your model without a pretrained model. It may take longer, but it’s a clear way to see exactly which sounds are present in your dataset, without the pretrained model getting in the way.
You’ll also quickly spot issues in your own dataset that you might not have noticed before.
Of course, there may be technical challenges that make this harder, which isn’t your fault, but you’ll understand once you try.