#how to create a custom pretrain?
@lunar burrow bro
do u know how to create a pretrain?
i searched up make a pretrain
and i saw yu talking abt it
if not lusbert how abt @thin oasis
yu seem like u can help bro
im curious fr
also this is the thing I'm up to and wondered, would be nice if I can help with such things
you think there could be a pretrain on what im trying to do?
like examples
of what i mean, experimental artists
like maybe yeat, where he does tons of tones
where it can basically like have no issue with the tone play
deep voice, slurred voice, high pitched, and other stuff
Mustar and simplcup are the main people who make pretrains for ai hub
idk tbh since I've never made a pretrain for rvc before, and I'm pretty sure it requires larger vocal datasets for it iirc
Also if you mean making a pretrain from one person that pretrain would only be good at replicating that voice
do u know how , like would i be able to combine multiple different voices?
or would it have to be the same voice
for the same pretrain
because then it wouldnt make sense its just training a normal model if that makes sense
Yeet it all in the dataset
how clean does it have to be tho
I mean if you want it to be good it should be clean
has anyone done a pretrain with like 100hrs of dataset
orrrr
i got abt 30hrs
tbh the ov2 has potential, but still needs a bit of fine tuning imo
Good for smaller datasets
Haven't used rin much
ok so once i have tha dataset how do i make it into a pretrain / use it to train for my models ?
is there a google colab for it or something
ahh the rin e3 has a static noise problem
It was something like because it was trained on super clean datasets it's super sensitive to noise
You would need a super good gpu or it would take years
like file info? 40kHz 24-bit? noise free?
I was just told it was cuz it's clean 🤷♂️
so there isnt a google colab for that then right
that would be able to do pretrains
or make custom pretrains
I think the new pretrains were just trained off the old ones and just taking the d and g but I might be wrong
hmm would be nicer if we had a 44.1kHz pretrain
Would need to change rvc code for that
I believe there are 44.1kHz 16-bit audios lying around there
yah another problem arise haha
ok but like where would i go
to like yk
make the custom pretrain
i got my data
30hrs should be enough
First you would need colab pro cuz it would take so long to train it would disconnect before it's done 100%
i do
paying for it rn
-colab
- Applio, by IA Hispano Google Colab
- AICoverGen-WebUI, modded by Hina Google Colab
- AICoverGen-NoWebUI [English], by Ardha, fixed by Eddy, Hina and Gdr Google Colab
- AICoverGen-NoWebUI [Spanish], credits to Eddy, Hina and Gdr for translating and fixing Google Colab
- Ilaria RVC, by thestingerx Google Colab
- RVC Disconnected, by Kit Lemonfoot Google Colab
- easyGUI, by rejects Google Colab
- Applio, by IA Hispano Huggingface Spaces
- Ilaria RVC, by thestingerx Huggingface Spaces
- RVC-HFv2, by r3gm Huggingface Spaces
- AICoverGen, by r3gm Huggingface Spaces
- Advanced RVC Inference, by r3gm Huggingface Spaces
- RVC v2 Huggingface version, by Clebersla Huggingface Spaces
Use one of these, train the shit, then get the D and G .pth files from the logs folder
and then use it to train my model right like for what i wanna do
once its finished
@tight surge can you help incase I say something wrong
i think my only worry for me is that if the dataset wont be too clean
since yk
for underground artist
there really isnt like clean or crystal clear vocals
its usually sometimes with noise / or maybe some bleed
What pretrain you want to create?
well i want to create a pretrain based on experimental artists
here what i was saying
like with tones and stuff
There's no point of creating a pretrain out of a specific person
well not just a specific person for any artist thats like this artist in general
like underground artist
whose voices are truly unique
as in tones
i think tones is the correct word
for what im trying to say
Well, you can, but I'm still not sure if it'll make sense to do
I still haven't found out if making a pretrain of a specific style actually helps to train models of such style better
i mean i could prolly try and figure it out
i just want to know
how to make the pretrain
so poopmaster told me that i would train all my data on the google colab since i dont got a good gpu and then use the d and g files to train my model (when the pretrain is finished)
Yeah you'll need at least 20 hours, and you'll need to train it for like a couple weeks
Yeah, that won't work, you need a good GPU to train it locally, cus colab just doesn't have enough space for data
how good of a gpu are we talking tho
Ig at least 3060, but it'll 100% take at least 2 months to train something usable
and how many epochs
for a pretrain to like work
Depends on the amount of data you got, 10-20 hours may require 100-300 epochs to start to sound good, 100+ hours may require at least 30 epochs
damn.. so there wouldnt maybe be a possible way (if i can ever get it) like pay for a custom pretrain, like have you train little by little and work on it whenver
or yu dont do that
like im saying little
Well, I can, I can't promise quality tho, and it'll cost waaaaay more than a regular model, cus it'll take me a couple weeks to train it
what range in cost are we talking hypothetically
How long is the dataset?
well still working on it rn about 3-4hrs
we could start small
possibly
ok ok so 10-20hr range how much would you charge
but too im not needing it like asap you can take your time like tops maybe a month or 2
its out of curiousity
if i do proceed
400 dollars to train it
Told you it's not an easy task xd
how long did it take for yu to do tha ov2 pretrain one?
guessing like a month?
since it was abt 100hrs i heard (the dataset)
At least a month
It wasn't, it's the other one that was 100+ hours called Rin and made by my friend Mustar
how many hrs is the ov2 one ? and if im correct its supposed to work with bigger datasets right the rin_e3 one ?
Around the same as regular v2, just better
That's a theory, an ai hub theory, which I'm still not sure of
if you dont mind me asking what did the ov2 super pre train consist of? in the dataset
if you know
It was basically the same concept as the regular v2, a bunch of different male and female voices, also I added orc voices and high pitched cartoonish voices
But the difference is that my dataset was cleaner
damn well wish me luck
im gonna try making a custom pretrain
hope it goes well lmao
Good luck
even tho you got your answers, well imma still say this
nope
now cuz isolated vocals are bad for making pretrains, just get good at making datasets and u'll soon see how much better it sounds
i'm not sure about that yet
Yeah they are bad, u are ripping frequencies stacked on top of each other and possibly spectral ones
Clarity and robustness take a hit
Same with denoising
nah, there's no confirmation on how it actually affects pretrains yet, it might help the model understand allat artificial lag better, but i'm not sure, i'm still testing
It will definitely make it more adaptive to such audio but still, it will be restricted to just that
I'm not saying it won't be ideal
But if I were to give u studio sessions and isolated vocals I'm pretty sure u'd pick the studio sessions
Plus it affects the generalization of the model greatly
Also increases the chance of mode collapse
That's why robust data is a necessity
the thing is, that might actually be crucial, cus people train shitty datasets with a lot of noise anyways, and if the pretrain will contain an information of that specific lag that mdx produces or any denoising stuff or whatever, it might actually make models better
Yeah that's not how a GAN works though. You train a pretrain just so that it can have a reference when it's looking at said data. If the pretrain's convergence is bad then it won't help much
But I'll tell u this
U'll need good songs to extract stems from
And use bs roformer
Then de reverb
And then u'll need to do some audio trickery and stretch the vocals down to 35k so that u can de echo and de noise
Good luck
And rip true flacs
If u want true flacs I can tell u In dms how to
not as reference, it directly continues the training using the pretrain, so pretrain is like a checkpoint that already contains a certain data
I guess u can call it that.
Or that's more accurate
gan just recreates anything we feed it, or at least tries to, that's why we can create a model of basically any sound, and thanks to hubert it'll be able to tell the difference between certain features of the dataset we gave it, that's why it can recreate even very stylized voices like cs:go voices for example, or some dark lord of Schmaganrog
Yeah but my point still stands. Just bc Hubert adapts doesn't mean it can't have bad convergence
Which affects generalization
And quality
usually it only has troubles with consistent loud background noise, which you can hear throughout the whole dataset, other small stuff will be just labeled as a certain feature of the dataset. like, some small short crackle noise can be just labeled as a "T" sound or "C" sound
I'm not talking about that
I'm talking about the performance of the model due to the hit on audio quality
If the gradients can't adapt to the dataset then the pretrain might not be flexible enough
And may introduce inconsistencies and other mumbo jumbo
you mean how it would inference?
No, convergence in ml is when a model reaches an optimal solution or a suboptimal one in the most efficient way possible. Which means that the model will be able to produce something without struggling much e.g voice cracks whilst sounding robust and being able to generalize
Ofc u also need to look out for ur Kullback-Leibler divergence too
#1213509354343637065 message more details here
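For anyone following along, the Kullback-Leibler divergence mentioned above measures how far one probability distribution sits from another. As far as I know, VITS-style models (which RVC builds on) include a KL term between two Gaussian distributions in their training loss, so here is a minimal sketch of the 1-D Gaussian case. The function name `kl_gaussian` is made up for illustration and is not taken from the RVC code.

```python
import math


def kl_gaussian(mu_q, sigma_q, mu_p, sigma_p):
    """KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) ) in nats.

    Hypothetical helper for illustration only, not an RVC function.
    Both sigmas must be positive.
    """
    return (
        math.log(sigma_p / sigma_q)
        + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sigma_p ** 2)
        - 0.5
    )


# Identical distributions have zero divergence; it grows as they drift apart.
same = kl_gaussian(0.0, 1.0, 0.0, 1.0)
shifted = kl_gaussian(1.0, 1.0, 0.0, 1.0)
```

A KL value that keeps climbing during training is usually read as the two distributions failing to line up, which is the kind of thing being pointed at when people talk about watching the graphs.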
yeaaaaaaaaaaaah, i think you're too concentrated on the theory there and not much on practical knowledge, cus in practice this all kind of doesn't affect it as much as it might sound
Okay grab stems using voc ft de noise it and all that and train the model.
Also pull audio from yt
And train on 40k
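Since the thread keeps coming back to 40k vs 44.1k and 16-bit vs 24-bit, here is a rough sketch for checking what a file actually is before it goes into a dataset. It only uses Python's standard `wave` module, so it only reads plain PCM WAV files (not FLAC, MP3, or float WAV). The helper names and the 40000 default are assumptions for illustration, not anything from RVC itself.

```python
import wave


def wav_format(path):
    """Return (sample_rate_hz, bit_depth, channels) for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        # getsampwidth() is bytes per sample, so multiply by 8 for bits.
        return wf.getframerate(), wf.getsampwidth() * 8, wf.getnchannels()


def matches_target(path, target_rate=40000):
    """True if the file's sample rate equals target_rate (assumed 40 kHz)."""
    rate, _, _ = wav_format(path)
    return rate == target_rate
```

Calling `wav_format("vocals.wav")` on a typical CD-quality rip would give `(44100, 16, 2)`, which tells you the file would need resampling for a 40k setup.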
like mode collapse for example, i had a model which once had like 30 mode collapses, cus ooh, spooky-scary, generalization lost, etc, and i expected it to sound horrible, but it sounded good
Mode collapse= model didn't read the whole dataset
It doesn't mean quality was lost during it
yeah, i do allat usually cus i'm a lazy fuck, and as you can hear most of my models sound good xd
well of course i use mdx23c instead of voc ft, but sometimes voc ft also works
It creates noise
Which is bad
Mdx23c bleeds
Which is also bad
So
Use bs roformer
yeah, clean that with rx and it's good
Use bs roformer if u want this to work
i mean it already works xd
Yeah but the less cleaning needed the better
depends on the dataset, if it's a song, then there's basically no way of making it pure clean
Try bs roformer
if it ain't on uvr, i ain't doing that xd
It's worth it, I wouldn't be trying to lie to u
I've told u all I could up to this point to get the best results
It's all up to u now
it's all theoretical tho, i'm more of a "poke that thing and see what happens" guy, cus it in most cases shows more truth than a theory
your rx cleaning tips were nice tho :3
This is all experience, I've noticed that models with better graphs do better. Sure grads may not be as important. But u still need to be careful about the cleaning process
I can help u with the pretrain tbh
I always wanted to see how it would turn out
Bro typing an essay
basically it just needs to learn to recreate as many sounds as possible, that's why we're using such big datasets, the thing is, it can't tell the difference between one sound and another sound, it can just label them as different features of the dataset, which is why i'm not looking at graphs anymore, cus artifacts and noise isn't considered as a bad quality audio by rvc, it just labels them like any other sound, so rn i'm experimenting making different datasets with different noises and artifacts, to maybe make it learn those artifacts better and recreate them in a more natural way, cus separator models rn just can't achieve pure clean results, so i think it'll be better if we just taught rvc to handle uvr stuff better. So if you have some UVR separated datasets, send em to me, i'll add them to the tests :3
Okay but look we are going to use bs roformer and not mdx23c
Also
What have u gathered
And from where
use whatever separator, they all produce artifacts which in this case might be good
Yeah no we not doing that
i explained you why i'm doing that, i don't need clean stuff rn, i need bad stuff
Yeah , but the Mel graph won't appreciate that
Therefore even if it does learn
It won't be able to recreate it accurately
you feed rvc stuff - it learns it, that's basically all theoretical knowledge that is required
Yes but GAN
U know what
Go thru with it
And we'll look at the results
Go work
theory is nice, but sometimes you just poke and find out :3
Yes fuck around and find out he he 😸
How much u got
haven't counted yet, i think around 5 hours of datasets
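Counting hours by hand gets old fast, so here is a small sketch that adds up the length of every WAV file under a folder. It assumes the dataset is stored as PCM WAV files (the standard `wave` module cannot read FLAC or MP3), and `dataset_hours` is just an illustrative name.

```python
import wave
from pathlib import Path


def wav_seconds(path):
    """Length of one PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def dataset_hours(folder):
    """Total length in hours of all .wav files under folder (recursive)."""
    total_seconds = sum(wav_seconds(p) for p in Path(folder).rglob("*.wav"))
    return total_seconds / 3600.0


# Example (hypothetical path): print(f"{dataset_hours('my_dataset'):.2f} hours")
```

That number is what the hour figures in this thread (20 hours minimum, 100+ for the big pretrains) would be compared against.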
Okay what do u want me to do
Just isolate and all that, and also clean it
With rx etc
just send whatever you clean real hard and what has like a shit ton of artifacts
Backing vocals u want or nah
And I'll send u cleaned up stuff for the sake of variety
nah, they're too different, basically i just need regular voices, speech or singing doesn't matter, it just needs to be cleaned through uvr or some shit like that
Ohh so I need to remove the backing vocals when they are stacked on top of the main ones okay
I might also have some speech too
that's nice
Okay I'll keep u posted with the stuff I'll do
yeah, just send em in dms from time to time, one day there will be enough data
Anything for u cutie :3
damn
theres a lot going on here
@sleek karma u do know im creating the pretrain right
seeing this yu did state it to simple like if he was gonna create it
ngl
i might be wrong
ok what
im confused now
lmao
will there be a pretrain for what im trying to do?
for experimental artists
or would i still need to do that
so heres what i need to know
-
once i grab all the data needed, i train in any google colab since i dont have a gpu and then whats next? I was told to grab the d and g files and use them to train my model with that, if that makes sense.
-
how clean does the dataset need to be? Im using the latest method to get my vocals, which is mdx23c inst hq + bs roformer + de-reverb + de-echo + denoise + audacity noise gate + truncating silences + normalizing (skipping rx 10)
-
Can the dataset be anything? Like from artist like yeat, ken carson, osamason, others etc ? most of it does have noise and stuff.
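To make the last two steps of the cleaning chain in the questions above concrete, here is a bare-bones sketch of silence trimming and peak normalizing on a list of float samples in the range -1.0 to 1.0. This is not what Audacity does internally: its Truncate Silence effect also shortens quiet gaps in the middle of a clip, while this sketch only trims the start and end. The threshold and peak values are made-up defaults.

```python
def trim_silence(samples, threshold=0.01):
    """Drop leading and trailing samples quieter than threshold.

    Only the edges are trimmed; quiet gaps inside the clip are kept.
    """
    start = 0
    end = len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]


def peak_normalize(samples, peak=0.95):
    """Scale samples so the loudest one has magnitude `peak`."""
    loudest = max((abs(s) for s in samples), default=0.0)
    if loudest == 0.0:
        # Pure silence: nothing to scale.
        return list(samples)
    gain = peak / loudest
    return [s * gain for s in samples]


# Typical order: trim first, then normalize what is left.
clip = peak_normalize(trim_silence([0.0, 0.001, 0.5, -0.2, 0.0]), peak=1.0)
```

Normalizing after trimming keeps stray clicks in the silent edges from setting the gain for the whole clip.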
god damn it
nvm
yall deciding to make a pretrain now?
or was it planned already
Honestly, these pretrains that are being created have been a great help, haven't they? You can train with up to 10 seconds of data. LOLL