#Girl | RMVPE | 90 Epochs
damn 90 epochs and this good?
More epochs doesn't mean the model will be better. I was also surprised that this model sounded good at that checkpoint.
doesn't more epochs mean the number of rounds the model went through??
1 epoch is the model going through the whole dataset once. So for this model it went through the entire dataset 90 times.
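To put rough numbers on that, here is a small sketch of the epoch arithmetic. The segment count and batch size are made up for illustration (they are not this model's real values), `steps_per_epoch` and `total_steps` are hypothetical helper names, and it assumes the last partial batch is kept rather than dropped.

```python
import math

def steps_per_epoch(num_segments: int, batch_size: int) -> int:
    """Optimizer steps needed to see every training segment once."""
    # Assumption: a final, smaller batch still counts as one step.
    return math.ceil(num_segments / batch_size)

def total_steps(num_segments: int, batch_size: int, epochs: int) -> int:
    """Total optimizer steps after the given number of full passes."""
    return steps_per_epoch(num_segments, batch_size) * epochs

# Invented example: 1000 audio segments, batch size 8, 90 epochs.
print(steps_per_epoch(1000, 8))   # steps in one pass over the dataset
print(total_steps(1000, 8, 90))   # steps after 90 passes
```

So the same epoch count means far more weight updates on a long dataset than on a short one, which is one reason epoch numbers are hard to compare between models.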
so why doesn't more mean better? I mean yeah overtraining and stuff
Overtraining. As far as I know that's the only reason.
model starts to forget the pretrain's previous knowledge and proceeds to generate trash
yeah but most models are not that good at 90 epochs hence my amusement
I never knew this
That would be a dataset problem, no?
A longer dataset needs fewer epochs?
I mean most of my datasets are 42 min
and I run them 350 epochs
Do you check for overtraining at all?
using TensorBoard
Why is this not true Lyery?
ideally you want a sweet spot where the model has learned a considerable portion of your dataset but hasn't yet forgotten the pretrain
I use the pretrains that came with rvc.
-# I don't know which to choose for what
you cannot predict how many epochs it's going to take for a model to extract something useful, that goes for every dataset regardless of size
If only TensorBoard had some validation metrics, I think it would be pretty useful. Right now, in my opinion, it doesn't tell you much.
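For what it's worth, this is a sketch of the kind of check a validation metric would allow, assuming you logged a held-out validation loss per saved checkpoint yourself. None of this is part of RVC or TensorBoard: `best_epoch` and `is_overtraining` are hypothetical helpers and the loss values are invented.

```python
from typing import Dict

def best_epoch(val_losses: Dict[int, float]) -> int:
    """Epoch whose checkpoint has the lowest validation loss."""
    return min(val_losses, key=val_losses.get)

def is_overtraining(val_losses: Dict[int, float], patience: int = 2) -> bool:
    """True if the last `patience` checkpoints are all worse than the
    best validation loss seen before them."""
    epochs = sorted(val_losses)
    if len(epochs) <= patience:
        return False  # not enough history to judge
    best_before = min(val_losses[e] for e in epochs[:-patience])
    return all(val_losses[e] > best_before for e in epochs[-patience:])

# Invented validation losses, one per checkpoint saved every 50 epochs.
losses = {50: 0.42, 100: 0.35, 150: 0.33, 200: 0.36, 250: 0.40}
print(best_epoch(losses))
print(is_overtraining(losses))
```

The idea is just that once the held-out loss keeps rising while training continues, later checkpoints are probably past the sweet spot.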
I see, thanks.
the original rvc-dev recommended 20-40 epochs
up to a max of 200 if the dataset has a lot of features
back then the ui limited the max epoch amount to 100
I heard that there are different pretrains for different scenarios
like singing
e-girls
asmr
true, but they're all outdated
i'd recommend either only the og pretrain or klm
i do have a spin pretrain only for realtime models
which uses the same dataset as og
Spin?
a new embedder, handles single words better than cvec and handles context differently (in a better way)
what is the best way to get a model better at laughing and breathing
I mean my dataset had a LOT of breathing and coughing and laughing and still there are a lot of artifacts
I save every 50 epochs and I try them all and all of them sound weird
cvec relies too much on context, it really needs to see the whole file in order to produce correct speech
If its new and better why is it not standard yet?
which is ok for local conversions but on realtime it struggles so hard even when extra is set to max
not many pretrains done for it
Where is the one you made?
some finetune i did yesterday with it
works fine
it can't sing tho, its only meant for speech
Thanks!
That is fine.
wait the model was trained with english language and was still able to create this?
yes
spin models only work with vonovox!
doesn't work in w-okada or mainline realtime
that would be huge if it works in real time for my German language
Ok, i use vonovox anyway so thats fine.

spin was also trained using only english, but i've heard it has improved pronunciation in other languages compared to cvec
only way to know is to try it
but does it work live on voice changers like okada?
only vonovox supports it
but it's possible to use in w-okada by doing this https://github.com/deiteris/voice-changer/pull/213
hmmm lemme try
i have no idea which spin embedder is that tho, my spin pretrain only works with layer 7_12* (same used in vonovox and applio)
Unrelated question but, do you know of any script, fork, or something i can do to get some sort of validation in rvc?
codename's fork has PESQ validation alongside SI-SDR
it only validates the same voice tho
Do you have a link to that?
Better than nothing.
Thank you.
have you heard of utmosv2? validates how natural/human a spectrogram looks
The only "good" pretrain for now is the original, Titan or Ov2 for example give robotic results for kinda no reason
I have not, I am still new to AI audio.
gives a MOS score to your generated ai audio files
in a scale of 1 to 5
serves as post validation
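A sketch of how that could work as post validation: convert the same test clips with each saved checkpoint, score them, and compare the averages. The scores below are invented, and `rank_checkpoints` is a hypothetical helper, not something utmosv2 ships.

```python
from statistics import mean
from typing import Dict, List, Tuple

def rank_checkpoints(scores: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Return (checkpoint, mean MOS) pairs sorted from best to worst."""
    # Skip checkpoints with no scores so mean() never sees an empty list.
    averaged = {name: mean(vals) for name, vals in scores.items() if vals}
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)

# Invented MOS scores (1 to 5) for the same three clips per checkpoint.
mos_by_checkpoint = {
    "e50": [3.1, 3.4, 3.2],
    "e100": [3.6, 3.8, 3.7],
    "e150": [3.3, 3.5, 3.1],
}
ranking = rank_checkpoints(mos_by_checkpoint)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Keep in mind MOS only rates how natural the audio sounds, so it won't tell you whether the result still sounds like the target voice.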
code tried to implement it in his fork but failed due to utmosv2 not working properly with multiprocessing
place this in utmosv2 > utils
fixes the broken auto downloader
my infer settings: python inference.py --input_dir "input folder" --out_path "C:\Users\xxx\Desktop\results\results.csv" -n 0 --fold -1 --num_repetitions 20
What does --fold -1 mean? I'm looking in the code and there doesn't seem to be a comment describing it.
Unless I'm missing it.
uses all pretrained models during inference
for more accurate results
Ah ok thanks.
what about original wokada? is he really doing only UI changes lol
did he add spin support?
I haven't checked, it was actually a question since with "wokada" you seem to only mean the deiteris fork of wokada
i actually got 2.1.4 alpha for a test for noobies