Hey guys, I need someone to help me create a singing model from my voice, recorded on a professional mic.
I have a PC that can run the training, but I need help with what kHz to choose (e.g., 32k, 48k, etc.), what method (HiFi-GAN, RefineGAN, etc.), how many epochs to use, and whatever else. I’ve made my own model already, but I’d like it to be the best quality possible.
My audio set is already cleaned and denoised.
#My own voice for singing (I have 20 mins of clean audio)
Is it 20 min non-stop? And did it already have little noise before the denoise (since denoising itself is damaging)?
Yeah, it’s pretty much 20 min non-stop.
I left some small pauses and breaths here and there.
I don’t know if muffled is the right word
It sounds very clear
But it’s not as clear as real singing
I’m not talking about mispronounced consonants, etc.
But I just want it to be as close to real singing as possible
In terms of crispness
So you've made the model already?
Or is the problem that you don't know how to train at all?
I’ve made it, but I’m wondering if there’s something I can do to make the output more crisp
E.g. a certain setting I should use, a certain pretrain I should be using,
or a certain RVC/SVC architecture
I mean, the standard is still cvec (ContentVec) + HiFi-GAN,
which is the default,
and one of the legacy core pretrains should be good,
and that's pretty much the best you can do unless you find someone willing to share a personal pretrain (they won't)
So would you go with 32k, 40k or 48k?
I'm still slightly unsure, as some people say 32k is better and some say 48k.
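For reference on the 32k/40k/48k question: a given sample rate can only represent frequencies up to half its value (the Nyquist limit), and resampling a recording upward does not add high-frequency detail that was never captured. Below is a minimal Python sketch, assuming the dataset is stored as PCM WAV files, for checking a recording's native rate against the candidate targets. The helper names are hypothetical and not part of RVC.

```python
import wave

# Candidate RVC target sample rates mentioned in this thread (Hz).
CANDIDATE_RATES = (32_000, 40_000, 48_000)


def nyquist_hz(sample_rate: int) -> float:
    """Highest frequency a signal sampled at `sample_rate` can represent."""
    return sample_rate / 2


def source_rate(path: str) -> int:
    """Read the native sample rate of a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def compare_rates(native_rate: int) -> list[tuple[int, float, bool]]:
    """For each candidate rate, return (rate, Nyquist limit, covered), where
    `covered` means the source sample rate can represent that whole band."""
    return [
        (rate, nyquist_hz(rate), native_rate >= rate)
        for rate in CANDIDATE_RATES
    ]
```

For example, `compare_rates(source_rate("take1.wav"))` (file name hypothetical) on a 44.1 kHz recording would show the 32k and 40k bands as covered but not the full 24 kHz band of a 48k model. This only tells you what the source can represent, not which target will actually sound crisper.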