#What's the best approach for creating RVC/Beatrice v2 training data?

weary isle

Hey all!
I want to record some audio from a few people to build some models for a project I'm working on, and was wondering if anyone had any tips for me?

My main questions are:

  • Are there any standard scripts my voice talent could read from to give the best chance at high-quality models?
  • Is there a benefit to more/longer recordings, i.e., would 20 hours of recordings result in a better model than 6 hours?

Any other tips around the preparation of the data would also be hugely appreciated!

orchid kindle
weary isle

Hmm, good to know. Bit of a shame, given the results it gives for real-time stuff

peak basalt

'cause at this point, with pretrains... I mean, perhaps you could, but then I'd limit the data to 40 min to 2 hours max, perhaps? (but tbf I haven't heard of anyone using more than 1 hour, so I can't say how it's all gonna behave; it could overfit)

weary isle

It all comes down to how much I want to budget. I'm happy to pay for longer recording sessions, but if it's not going to yield better results then I won't

peak basalt

The deal is, as you know, we use pretrains, yes?

weary isle

I need to read more about pretrains, if I'm being honest

peak basalt

Now, each speaker in a pretrain doesn't really have that much data, typically
anywhere from 5 mins to 20-30 mins

but here's the catch: there are tons and tons of speakers (in ideal-world conditions)

so there's a really huge amount of variety and pattern exposure

and, as a result ofc, a huge number of hours

Now, you could potentially get around that requirement with a single speaker, but you'd really need to maximize the variety of content

Meaning lots of singing styles, speech patterns, vibrato/modulation and other misc stuff

You'd have to compensate for the smaller total length of the set with diversity

weary isle

Yeah, I figured. So with pretrains this becomes less necessary, since it can reconstruct from existing data, I presume?

peak basalt

Now, perhaps @hidden breach could give more feedback on what could potentially be optimal for a specialized single-speaker model

peak basalt
weary isle

Is there a recommended pretrain to use?

peak basalt

But yeah, being fully honest with you?
I'd get 2 hours of data max, but... even then it could be overkill

I personally wouldn't exceed 1-1.5 hours
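To check a recorded set against that rough 1 to 1.5 hour ceiling, and to see how the time splits across styles (per the diversity advice above), a minimal Python sketch could total up clip durations. It assumes uncompressed PCM `.wav` clips laid out as one subfolder per style, e.g. `dataset/speech/` and `dataset/singing/`; that layout and the `MAX_HOURS` value are illustrative assumptions, not something RVC or Beatrice require. It uses only the standard-library `wave` module.

```python
import sys
import wave
from collections import defaultdict
from pathlib import Path

# Assumed ceiling, taken from the "1-1.5 hours" advice in this thread; adjust to taste.
MAX_HOURS = 1.5


def wav_seconds(path: Path) -> float:
    """Duration of one PCM .wav file in seconds (frame count / sample rate)."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def summarize(dataset_dir: str) -> dict[str, float]:
    """Total seconds per top-level subfolder (assumed layout: one subfolder per style)."""
    root = Path(dataset_dir)
    totals: dict[str, float] = defaultdict(float)
    for wav_path in root.rglob("*.wav"):
        parts = wav_path.relative_to(root).parts
        # Clips sitting directly in the root folder are grouped under "(root)".
        style = parts[0] if len(parts) > 1 else "(root)"
        totals[style] += wav_seconds(wav_path)
    return dict(totals)


def report(totals: dict[str, float], max_hours: float = MAX_HOURS) -> float:
    """Print minutes and share per style, warn if over the ceiling, return total hours."""
    total_s = sum(totals.values())
    for style, secs in sorted(totals.items()):
        share = secs / total_s if total_s else 0.0
        print(f"{style:<20} {secs / 60:7.1f} min  ({share:.0%})")
    hours = total_s / 3600
    print(f"Total: {hours:.2f} h")
    if hours > max_hours:
        print(f"Over the ~{max_hours} h ceiling; consider trimming, keeping the most varied material.")
    return hours


if __name__ == "__main__" and len(sys.argv) > 1:
    report(summarize(sys.argv[1]))
```

Run it as `python check_dataset.py path/to/dataset` (the script name is hypothetical). A lopsided per-style breakdown is a hint that adding variety would help more than adding hours.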

peak basalt

in 80% of cases, if not more, KLM is gonna be better,
because, yeah, comparing the data the OG pretrain uses (the VCTK dataset) with KLM's... yeah, it's a massive difference

weary isle

This has been amazing info thank you!

peak basalt

Best of luck ✨

weary isle

haha I don't have access to whatever you linked there XD

peak basalt

oh

well, iirc there's a role for that

perhaps "AI research" is the one

weary isle

hmm, I've got most of them. I wonder if it's a region-locked thing since I'm AU-based

peak basalt

Well, I'm not entirely sure what the requirement is, but perhaps you could ping a helper

In any case, if you need any details on stuff, don't hesitate to @ me.
I'll help as much as I can, but at the same time I'll expect a lil bit of commitment to the topic, yk.
'Cause I've had too many guys ask me about advanced things just to hear, after 10 mins of explanations and various links to educational material, "nah screw it. too hard and I'm lazy" lol.
Anyhow, cheers

weary isle

If I could heart that bigger I would 😛 you've been amazing help already!

peak basalt

Glad to assist