#Simple help

distant dawn
#

I’m trying to make a singing model and I’m confused by all the 32 kHz / 40 kHz / 48 kHz stuff, as well as the RefineGAN vs. HiFi-GAN stuff and the pretrains.
Can anyone simply lay out what is best to use for a singing model?
I.e. what sample rate (kHz), RefineGAN or HiFi-GAN, and which pretrain should I use?
Thanks

spiral prawn
golden rain
#

Check the frequency spectrum of your dataset and look at where the high-frequency content ends (ignore outliers, if any). Multiply that frequency by 2 and choose the next available sample rate at or above the result. E.g. if your spectrogram tops out at 16 kHz, choose 32 kHz. If it tops out at 18 kHz, choose 40 kHz, or 32 kHz if you're fine with losing the very top. You can go lower, but shouldn't go higher than needed (e.g. if it tops out at 20 kHz, don't choose 48 kHz, since 40 kHz already covers the whole spectrum).
It'll probably also depend on the pretrain, but supposedly 32 kHz is in many cases better than the higher sample-rate variants when it comes to sibilants and breath noises.
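
If you'd rather compute this than eyeball a spectrogram, here is a rough sketch of the rule above in Python with NumPy. The function names, the -60 dB floor used to ignore faint content, and the candidate list (32/40/48 kHz, the rates mentioned in this thread) are my own assumptions for illustration, not something any training tool prescribes. It expects mono audio that is already loaded as a NumPy array.

```python
import numpy as np

# Sample rates discussed in this thread (assumed to be the available choices).
CANDIDATE_RATES = (32000, 40000, 48000)


def estimate_cutoff_hz(samples, sample_rate, floor_db=-60.0):
    """Return the highest frequency whose level is within floor_db of the
    strongest spectral component. floor_db is an arbitrary assumption used
    to ignore very faint content and noise."""
    window = np.hanning(len(samples))
    spectrum = np.abs(np.fft.rfft(samples * window))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    # Level of every bin relative to the loudest bin, in dB.
    level_db = 20.0 * np.log10(spectrum / spectrum.max() + 1e-12)
    above_floor = np.nonzero(level_db > floor_db)[0]
    return float(freqs[above_floor[-1]])


def pick_sample_rate(cutoff_hz, candidates=CANDIDATE_RATES):
    """Double the cutoff and return the smallest candidate rate that is at
    least that high; fall back to the largest candidate if none is."""
    needed = 2.0 * cutoff_hz
    for rate in sorted(candidates):
        if rate >= needed:
            return rate
    return max(candidates)


if __name__ == "__main__":
    # Synthetic stand-in for a dataset clip: tones at 440 Hz and 15 kHz.
    sr = 48000
    t = np.arange(sr) / sr
    clip = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 15000 * t)
    cutoff = estimate_cutoff_hz(clip, sr)
    print(round(cutoff), pick_sample_rate(cutoff))
```

With the cutoffs from the examples above, `pick_sample_rate(16000)` gives 32000, and both `pick_sample_rate(18000)` and `pick_sample_rate(20000)` give 40000. A single clip can be misleading, so it's probably worth running this over several files and treating the result as a starting point rather than a verdict.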

#

Honestly, just try various things and compare them; it's the best way to learn

distant dawn
golden rain
#

Understandable. Especially if your dataset is not very small.
I think a good start is 32k with the original pretrain, or perhaps legacy core 1.5

#

(bear in mind that I'm by no means a pro in the field, I'm still learning as well)