#how to create a custom pretrain?

rapid hazel
i wanna see how i can create my own for experimental artists, if this is even possible.
@lunar burrow bro
do u know how to create a pretrain?
i searched up "make a pretrain"
and i saw you talking abt it
if not lusbert, how abt @thin oasis
you seem like u can help bro
im curious fr

rapid hazel
like examples
of what i mean by experimental artist
like maybe yeat, where he does tons of tones
where it can basically like have no issue with the tone play
deep voice, slurred voice, high pitched, and other stuff

last gulch
idk tbh since I've never made a pretrain for rvc before, and I'm pretty sure it requires larger vocal datasets iirc

thin oasis
Also if you mean making a pretrain from one person, that pretrain would only be good at replicating that voice

rapid hazel
or would it have to be the same voice
for the same pretrain
because then it wouldnt make sense, its just training a normal model, if that makes sense

thin oasis
I mean if you want it to be good it should be clean

rapid hazel
orrrr
i got abt 30hrs

thin oasis
I think ov2 was like 100h
Lemme check

thin oasis
Haven't used rin much

rapid hazel
is there a google colab for it or something

thin oasis
I was just told it was cuz it's clean 🤷‍♂️

last gulch
hmmm that could be the problem
the audio might have inconsistencies

rapid hazel
that would be able to do pretrains
or make custom pretrains

thin oasis
I think the new pretrains were just trained off the old ones, just taking the D and G, but I might be wrong

last gulch
hmm, it would be nicer if we had a 44.1kHz pretrain

thin oasis
Would need to change the rvc code for that

last gulch
I believe there are 44.1kHz 16-bit audios lying around there
yah, another problem arises haha

rapid hazel
so guessing there is no google colab for the trains..?
pretrains
**
or to make

thin oasis
Just the normal ones should work
Train the shit, take the D and G pth

rapid hazel
to like yk
make the custom pretrain
i got my data
30hrs should be enough

thin oasis
First you would need Colab Pro cuz it would take so long to train that it would disconnect before it's 100% done

thin oasis
-colab

jovial umbraBOT
(replying to thin oasis: -colab)
☁️ Google Colabs
🤗 Hugging Face Spaces

thin oasis
Use one here, train the shit, get the D and G pth from logs
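A minimal sketch of the "get the D and G pth from logs" step. It assumes the training run saves paired checkpoints named `D_<step>.pth` and `G_<step>.pth` inside one experiment folder under `logs/`; that naming is an assumption, so check what your Colab actually writes. The helper name `latest_d_and_g` is hypothetical.

```python
import re
from pathlib import Path

# Assumed checkpoint naming: D_<step>.pth / G_<step>.pth (verify against your own logs folder).
CKPT_RE = re.compile(r"^([DG])_(\d+)\.pth$")


def latest_d_and_g(log_dir):
    """Return (D_path, G_path) for the highest step that has BOTH files, or None if there is no pair."""
    found = {"D": {}, "G": {}}
    for path in Path(log_dir).iterdir():
        match = CKPT_RE.match(path.name)
        if match:
            kind, step = match.group(1), int(match.group(2))
            found[kind][step] = path
    # Only steps where the discriminator and generator were both saved form a usable pair.
    common = set(found["D"]) & set(found["G"])
    if not common:
        return None
    step = max(common)
    return found["D"][step], found["G"][step]
```

For example, `latest_d_and_g("logs/my-pretrain")` (a hypothetical folder) returns the newest matching pair, i.e. the D and G files you would then use to train a normal model. Requiring both files at the same step avoids pairing a generator with a discriminator from a different point in training.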

rapid hazel
once it's finished

thin oasis
@tight surge can you help in case I say something wrong

rapid hazel
since yk
for underground artists
there really isnt like clean or crystal clear vocals
it's usually got some noise
or maybe some bleed

tight surge
What pretrain do you want to create?

rapid hazel
like with tones and stuff

tight surge
There's no point in creating a pretrain out of a specific person

rapid hazel
like underground artists
whose voices are truly unique
as in tones
i think tones is the correct word
for what im trying to say

tight surge
Well, you can, but I'm still not sure if it'll make sense to do
I still haven't found out if making a pretrain of a specific style actually helps train models of that style better

rapid hazel
i just want to know
how to make the pretrain
so poopmaster told me that i would train all my data on google colab since i dont got a good gpu, and then use the d and g files to train my model (when the pretrain is finished)

tight surge
Yeah you'll need at least 20 hours, and you'll need to train it for like a couple weeks

tight surge
Ig at least a 3060, but it'll 100% take at least 2 months to train something usable

rapid hazel
for a pretrain to like work

tight surge
Depends on the amount of data you've got: 10-20 hours may require 100-300 epochs to start to sound good, 100+ hours may require at least 30 epochs
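One way to see why a bigger dataset needs fewer epochs: an epoch is one full pass over the training clips, so more hours means more optimizer steps per epoch. A rough sketch, assuming the audio is cut into fixed 3.7-second clips and a batch size of 8. Both numbers are placeholders, not values from this thread, and real preprocessing (which drops silence) will not match exactly. The function name `total_steps` is hypothetical.

```python
import math


def total_steps(dataset_hours, epochs, clip_seconds=3.7, batch_size=8):
    """Approximate optimizer steps for a run; clip_seconds and batch_size are assumed placeholders."""
    clips = math.floor(dataset_hours * 3600 / clip_seconds)  # number of training slices
    steps_per_epoch = math.ceil(clips / batch_size)          # a final partial batch still costs a step
    return steps_per_epoch * epochs
```

Under these assumptions `total_steps(20, 100)` is 243,300 and `total_steps(100, 30)` is 364,890, so the two ends of the rule of thumb land in the same order of magnitude of total training steps.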

rapid hazel
or you dont do that
like im saying little

tight surge
Well, I can, I can't promise quality tho, and it'll cost waaaaay more than a regular model, cus it'll take me a couple weeks to train it

tight surge
How long is the dataset?

rapid hazel
we could start small
possibly

tight surge
Nah, that ain't enough
10 hours is like minimum imo
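To check a dataset against a threshold like the 10 hours mentioned here, a minimal sketch that sums the length of every `.wav` file in a folder with Python's standard `wave` module. Assumptions: the files are plain integer PCM WAV (the `wave` module cannot read FLAC or MP3, and may reject float WAV), and the extension is lowercase `.wav`. The function name `dataset_hours` is hypothetical.

```python
import wave
from pathlib import Path


def dataset_hours(folder):
    """Sum the duration of all PCM .wav files under `folder` (recursively), in hours."""
    total_seconds = 0.0
    for path in Path(folder).rglob("*.wav"):
        with wave.open(str(path), "rb") as wav:
            # frame count / frames per second = seconds of audio in this file
            total_seconds += wav.getnframes() / wav.getframerate()
    return total_seconds / 3600
```

Usage: `print(f"{dataset_hours('dataset/'):.1f} h")`, where `dataset/` stands in for wherever the cleaned vocals live.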

rapid hazel
but also im not needing it like asap, you can take your time, like tops maybe a month or 2
its out of curiosity
if i do proceed

tight surge
400 dollars to train it
Told you it's not an easy task xd

rapid hazel
guessing like a month?
since it was abt 100hrs i heard (the dataset)

tight surge
Around the same as regular v2, just better

rapid hazel
if you know

tight surge
It was basically the same concept as the regular v2: a bunch of different male and female voices, and I also added ork voices and high-pitched cartoonish voices
But the difference is that my dataset was cleaner

rapid hazel
im gonna try making a custom pretrain
hope it goes well lmao

tight surge
Good luck

sleek karma
Clarity and robustness take a hit
Same with denoising

sleek karma
I'm not saying it won't be ideal
But if I were to give u studio sessions and isolated vocals, I'm pretty sure u'd pick the studio sessions

sleek karma
Also increases the chance of mode collapse
That's why robust data is a necessity

sleek karma
But I'll tell u this
U'll need good songs to extract stems from
And use bs roformer
Then de-reverb
And then u'll need to do some audio trickery and stretch the vocals down to 35k so that u can de-echo and denoise
Good luck
And rip true FLACs
If u want true FLACs I can tell u in DMs how to

sleek karma
Or that's more accurate

sleek karma
Which affects generalization
And quality

sleek karma
I'm talking about the performance of the model due to the hit on audio quality
If the gradients can't adapt to the dataset then the pretrain might not be flexible enough
And it may introduce inconsistencies and other mumbo jumbo

sleek karma
(replying to tight surge: you mean how it would inference?)
No, convergence in ML is when a model reaches an optimal solution, or a suboptimal one, in the most efficient way possible. Which means the model will be able to produce something without struggling much (e.g. voice cracks) whilst sounding robust and being able to generalize
Ofc u also need to look out for ur Kullback-Leibler divergence too
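For reference, KL divergence is a non-symmetric measure of how far one probability distribution is from another. As far as I know, RVC's training loss includes a KL term inherited from VITS, computed between Gaussians described by a mean and a log standard deviation. The sketch below is the textbook closed form for diagonal Gaussians and is meant for intuition only. It is not RVC's own `kl_loss` code, which as far as I can tell uses a single-sample estimate instead. The function name `gaussian_kl` is hypothetical.

```python
import math


def gaussian_kl(m_q, logs_q, m_p, logs_p):
    """KL(q || p) for diagonal Gaussians given per-dimension means and log std devs, summed over dims.

    Per dimension: log(s_p / s_q) + (s_q**2 + (m_q - m_p)**2) / (2 * s_p**2) - 1/2
    """
    total = 0.0
    for mq, lq, mp, lp in zip(m_q, logs_q, m_p, logs_p):
        var_q = math.exp(2 * lq)
        var_p = math.exp(2 * lp)
        total += (lp - lq) + (var_q + (mq - mp) ** 2) / (2 * var_p) - 0.5
    return total
```

Identical distributions give 0, and the value grows as the means drift apart or the spreads disagree, which is why a KL term that keeps climbing during training is worth watching.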

sleek karma
Also pull audio from yt
And train on 40k

tight surge
like mode collapse for example, i had a model which once had like 30 mode collapses, cus ooh, spooky-scary, generalization lost, etc, and i expected it to sound horrible, but it sounded good

sleek karma
It doesn't mean quality was lost during it

tight surge
well of course i use mdx23c instead of voc ft, but sometimes voc ft also works

sleek karma
Which is bad
Mdx23c bleeds
Which is also bad
So
Use bs roformer

tight surge
yeah, clean that with rx and it's good

sleek karma
Use bs roformer if u want this to work

tight surge
i mean it already works xd

sleek karma
Yeah but the less cleaning needed the better

tight surge
depends on the dataset, if it's a song then there's basically no way of making it pure clean
if it ain't on uvr, i ain't doing that xd

sleek karma
I've told u all I could up to this point to get the best results
It's all up to u now

tight surge
your rx cleaning tips were nice tho :3

sleek karma
I can help u with the pretrain tbh
I always wanted to see how it would turn out
Bro typing an essay

tight surge
(replying to sleek karma: This is all experience, I've noticed that models with better graphs do better. S...)
basically it just needs to learn to recreate as many sounds as possible, that's why we're using such big datasets. the thing is, it can't tell the difference between one sound and another sound, it can just label them as different features of the dataset, which is why i'm not looking at graphs anymore, cus artifacts and noise aren't considered bad-quality audio by rvc, it just labels them like any other sound. so rn i'm experimenting with making different datasets with different noises and artifacts, to maybe make it learn those artifacts better and recreate them in a more natural way, cus separator models rn just can't achieve pure clean results, so i think it'll be better if we just taught rvc to handle uvr stuff better. So if you have some UVR-separated datasets, send em to me, i'll add them to the tests :3

sleek karma
Also
What have u gathered
And from where

tight surge
use whatever separator, they all produce artifacts, which in this case might be good
different songs

sleek karma
Therefore even if it does learn
It won't be able to recreate it accurately

sleek karma
U know what
Go thru with it
And we'll look at the results
Go work

sleek karma
How much u got

tight surge
haven't counted yet, i think around 5 hours of datasets

sleek karma
Just isolate and all that, also clean
With rx etc

sleek karma
And I'll send u cleaned-up stuff for the sake of variety

sleek karma
I might also have some speech too

tight surge
that's nice

sleek karma
Okay I'll keep u posted with the stuff I'll do

tight surge
yeah, just send em in DMs from time to time, one day there will be enough data

sleek karma
Anything for u cutie :3

rapid hazel
damn
theres a lot going on here
@sleek karma u do know im creating the pretrain right
seeing this, you did state it too simply, like as if he was gonna create it
ngl
i might be wrong
ok what
im confused now
lmao
will there be a pretrain for what im trying to do?
for experimental artists
or would i still need to do that
so heres what i need to know

  1. once i grab all the data needed, i train in any google colab since i dont have a gpu, and then whats next? I was told to grab the D and G files and use them to train my model, if that makes sense.

  2. how clean does the dataset need to be? Im using the latest method to get my vocals, which is mdx23c inst hq + bs roformer + de-reverb + de-echo + denoise + audacity noise gate + truncating silences + normalizing (skipping rx 10)

  3. Can the dataset be anything? Like from artists like yeat, ken carson, osamason, others etc? most of it does have noise and stuff.
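For the last two stages in that chain (truncating silences and normalizing), a rough NumPy sketch on a mono float signal. It is a simplified stand-in, not what Audacity's Truncate Silence or Normalize effects do internally. The -45 dB threshold, 0.5 s minimum silence, 0.1 s kept stub and -1 dBFS target are hypothetical defaults to tune by ear, and loading/saving audio is left out.

```python
import numpy as np


def truncate_silence(x, sr, threshold_db=-45.0, min_silence_s=0.5, keep_s=0.1, frame_s=0.02):
    """Shorten silent runs longer than min_silence_s down to keep_s seconds.

    x is a mono float array in [-1, 1]; silence is judged per frame by RMS level.
    """
    frame = max(1, int(round(sr * frame_s)))
    n_frames = len(x) // frame
    if n_frames == 0:
        return x.copy()
    frames = x[: n_frames * frame].reshape(n_frames, frame)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    silent = 20 * np.log10(rms + 1e-12) < threshold_db  # True where a frame is below the threshold
    min_frames = max(1, int(round(min_silence_s / frame_s)))
    keep_frames = max(1, int(round(keep_s / frame_s)))
    pieces, i = [], 0
    while i < n_frames:
        j = i
        while j < n_frames and silent[j] == silent[i]:  # find the end of this loud/silent run
            j += 1
        run = frames[i:j]
        if silent[i] and (j - i) >= min_frames:
            run = run[:keep_frames]  # keep only a short stub of a long silence
        pieces.append(run.reshape(-1))
        i = j
    pieces.append(x[n_frames * frame:])  # leftover tail shorter than one frame
    return np.concatenate(pieces)


def peak_normalize(x, target_db=-1.0):
    """Scale x so its absolute peak sits at target_db dBFS; silent input is returned unchanged."""
    peak = float(np.max(np.abs(x))) if x.size else 0.0
    if peak == 0.0:
        return x.copy()
    return x * (10 ** (target_db / 20) / peak)
```

Because the silence test is a per-frame RMS threshold, quiet breaths or soft ad-libs below it get cut too, so spot-check the output before it goes into the dataset.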

rapid hazel
god damn it
nvm
yall deciding to make a pretrain now?
or was it planned already
