#Simple help
32000, 40000 and 48000 Hz are the audio sample rates used for RVC voice models. I don't know what your dataset audio looks like as a frequency spectrum (I mean the spectrum, not the file itself), but the right choice depends on it.
Check the frequency spectrum of your dataset and look at where the high-frequency content ends (ignore outliers, if any). Multiply that frequency by 2 and choose the next sample rate up. E.g. if your spectrum tops out at 16 kHz, choose 32 kHz. If it tops out at 18 kHz, choose 40 kHz or 32 kHz. You can go lower, but you shouldn't go higher than needed (e.g. if it tops out at 20 kHz, don't choose 48 kHz, as 40 kHz already covers the whole spectrum).
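If it helps, here is that rule of thumb as a small Python sketch. The function name `pick_sample_rate` and the `cutoff_hz` parameter are made up for illustration and are not part of RVC; `cutoff_hz` is just the frequency you read off the spectrum by eye. For the 18 kHz case it returns the higher of the two options mentioned above (40 kHz); dropping to 32 kHz is still a valid choice.

```python
# The three RVC sample rates mentioned above, in Hz.
SUPPORTED_RATES = (32000, 40000, 48000)


def pick_sample_rate(cutoff_hz, rates=SUPPORTED_RATES):
    """Return the smallest supported rate that is at least 2 * cutoff_hz.

    cutoff_hz is the highest frequency with real content in the dataset,
    read off the spectrum by eye (outliers ignored). If even the largest
    rate is below 2 * cutoff_hz, fall back to the largest rate.
    """
    needed = 2 * cutoff_hz
    for rate in sorted(rates):
        if rate >= needed:
            return rate
    return max(rates)


if __name__ == "__main__":
    for cutoff in (16000, 18000, 20000):
        print(cutoff, "->", pick_sample_rate(cutoff))
```

This only encodes the "double it and round up" step; it says nothing about which rate actually sounds better with a given pretrain.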
It probably also depends on the pretrain, but supposedly 32 kHz is in many cases superior to the higher sample rates when it comes to sibilants and breath noises.
Honestly, just try various settings and compare the results; it's the best way to learn.
It’s hard to experiment when your compute is limited 😭