# Does RVC lowest_value matter?

charred barn

When training an RVC model, does the “lowest_value” parameter really matter? I started at a lowest_value of 32 on epoch 1. It dropped to 25 at epoch 17 and then to 24 at epoch 75. Does this mean progress is slow? My smoothed loss is going down slowly but getting better every single epoch, yet I am pretty disheartened that the lowest_value is dropping so slowly.
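
To see why a "lowest value" can stall while a smoothed loss keeps improving, here is a minimal Python sketch. It assumes the smoothed curve is an exponential moving average with a hypothetical factor `alpha` (how Applio actually smooths or reports its losses is not checked here), and it treats `lowest_value` as the best single-epoch loss seen so far. One lucky early epoch sets a record that later epochs, better on average, can take a long time to beat.

```python
def smoothed_losses(losses: list[float], alpha: float = 0.1) -> list[float]:
    # Exponential moving average; `alpha` is an assumed smoothing factor.
    smoothed: list[float] = []
    avg = None
    for loss in losses:
        avg = loss if avg is None else alpha * loss + (1 - alpha) * avg
        smoothed.append(avg)
    return smoothed


def running_lowest(losses: list[float]) -> list[float]:
    # Best (lowest) single-epoch loss seen so far at each epoch.
    lowest: list[float] = []
    best = float("inf")
    for loss in losses:
        best = min(best, loss)
        lowest.append(best)
    return lowest


# Made-up per-epoch losses: an early dip to 25, then a spike, then steady improvement.
losses = [32, 28, 25, 30, 29, 28, 27, 26, 26, 25.5]
print(running_lowest(losses))
print(smoothed_losses(losses))
```

With these made-up losses and the default `alpha=0.1`, the running lowest is stuck at 25 from the third epoch onward, while the smoothed value falls every single epoch.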

I am using Applio and a 2h 5m dataset. It has been normalised, cleaned with UVR, and clipped into small segments, nearly 700 audio files in total.
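
If you want to reproduce the "clipped into small segments" step outside Applio, a minimal sketch using only Python's standard `wave` module could look like the following. It assumes uncompressed PCM WAV input and cuts at fixed 10-second boundaries. `split_wav` is a hypothetical helper, not part of Applio, and it makes no attempt to cut at silences, so a clip can start or end mid-word.

```python
import wave
from pathlib import Path


def split_wav(src, out_dir, clip_seconds: float = 10.0) -> list[Path]:
    """Split a PCM WAV file into consecutive clips of at most clip_seconds each."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        frames_per_clip = int(params.framerate * clip_seconds)
        index = 0
        while True:
            frames = reader.readframes(frames_per_clip)
            if not frames:
                break  # end of the source file
            out_path = out_dir / f"{Path(src).stem}_{index:04d}.wav"
            with wave.open(str(out_path), "wb") as writer:
                # Same channels/sample width/rate as the source; the frame
                # count in the header is corrected automatically on write.
                writer.setparams(params)
                writer.writeframes(frames)
            written.append(out_path)
            index += 1
    # The last clip may be shorter than clip_seconds; it is kept as-is.
    return written
```

For example, `split_wav("take01.wav", "clips")` would turn a 25-second take into clips of 10, 10 and 5 seconds.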

velvet moat

short answer: no

because at the end of the day the losses don't really indicate better or worse sound quality, and the best way to choose a checkpoint is to listen to converted samples and pick one manually

charred barn

Thanks man. I'm struggling with datasets a bit too. I understand you need high quality, and I have got my hands on raw studio a cappellas. I've cut them into 10-second clips and currently have 2 hours' worth of them. I could use way more or way less, but it seems people are only using 10-30 minutes? Is it bad to have more data with different styles and emotions from the subject? Also, I am doing 250 epochs with these 2 hours of data. I understand some people do 1000-3000 epochs, but that is with something like 5 minutes of data, so I take it that the more data you have, the fewer epochs are needed. Once again, thanks for the help.
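
The intuition that more data needs fewer epochs can be made concrete by counting optimizer updates: one epoch is one pass over every clip, so a bigger dataset means more updates per epoch. A rough sketch, assuming a hypothetical batch size of 8 and that a final partial batch still counts as a step (the real numbers depend on your batch size setting and on the trainer):

```python
import math


def steps_per_epoch(num_clips: int, batch_size: int) -> int:
    # One epoch = one pass over all clips; a final partial batch counts as a step.
    return math.ceil(num_clips / batch_size)


def total_steps(num_clips: int, batch_size: int, epochs: int) -> int:
    return steps_per_epoch(num_clips, batch_size) * epochs


# ~2 h cut into ~10 s clips (about 700 files) vs. ~5 min (about 30 clips).
big = total_steps(700, batch_size=8, epochs=250)
small = total_steps(30, batch_size=8, epochs=1000)
print(big, small)  # 22000 4000
```

Under these assumptions, 250 epochs on the 2-hour set already means several times more updates than 1000 epochs on 5 minutes. Update count is only a rough guide, though; the checkpoint still has to be judged by ear.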

velvet moat

2h is good; 1h would probably be sufficient too, but I would use all of it. I assume the data is consistent in terms of sound (same voice, hopefully the same or a similar microphone, etc.)? Just make sure there are no noticeable differences between sessions.

In terms of epochs, you never know the target amount in advance. You need to train for some epochs and then listen to checkpoints on the way and pick the best one by ear.
2h is a lot of data, so I don't think you would need more than 150-200 epochs, but I really have no idea. I'd probably set 100-150 initially, then listen to checkpoints every ~5 epochs (I usually save every 10, but with large datasets it's perhaps better to save more often) and decide whether I'm satisfied or whether there's a chance it'll get even better later.
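
The "save often, then pick by ear" workflow is mostly bookkeeping. A minimal sketch, assuming a checkpoint is written every `save_every` epochs and that you note your own listening score for each checkpoint; both helpers are hypothetical and not part of Applio:

```python
def checkpoint_epochs(total_epochs: int, save_every: int) -> list[int]:
    # Epochs at which a checkpoint exists when saving every `save_every` epochs.
    return list(range(save_every, total_epochs + 1, save_every))


def pick_by_ear(ratings: dict[int, float]) -> int:
    # `ratings` maps epoch -> your listening score (higher is better).
    # Ties go to the earlier checkpoint (an arbitrary tie-break).
    return min(ratings, key=lambda epoch: (-ratings[epoch], epoch))


print(len(checkpoint_epochs(150, 5)), len(checkpoint_epochs(150, 10)))  # 30 15
print(pick_by_ear({100: 3.0, 110: 4.5, 120: 4.5, 130: 4.0}))  # 110
```

Saving every 5 epochs over 150 gives 30 checkpoints to audition instead of 15: finer granularity in exchange for more listening and disk space.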

velvet moat

various pitches and expressions are a good thing

charred barn

yes, I thought so

velvet moat

non-verbal sounds, grunts, screaming, whispers etc might hurt the model though

but if it's just lots of consistent singing, then it should be good for a dataset