#i have been trying to find a workaround for this issue for around 2 years

shell monolith
#

so i've worked on probably 70+ models for the artist che over the past 2 years, and i keep finding that the tone and voice are never correct despite having around 10-15 mins of raw studio vocals from the artist. i'm unsure what i can do to fix this and have been trying to work around it for a very long time.

pine sequoia
#

The model sounds like it learned your dataset very well

#

Unsure how exactly you want to squeeze more out of it. To start, the dataset sample you shared is definitely processed

#

You mention the tone isn't all the way there. RVC is great but it doesn't work miracles. You might have better luck re-recording what you're inferring with different tones of your own voice

swift ivy
pine sequoia
#

Refer to the tensorboard to see training stats, it's very helpful

#

For overtraining

#

Search bar in this discord server is better than Google 🔥

swift ivy
pine sequoia
#

Yes makes total sense

swift ivy
#

I think some folks are going for a more unnatural sound in music these days with autotune, so rmvpe might work in that respect, or if you have a smaller dataset, but I think harvest is for folks that have 40+ min datasets of pure clean audio. 🤔

pine sequoia
#

RVC isn't meant to handle processed vocals to begin with

#

The pretrain was trained on raw vocals (VCTK)

swift ivy
#

Think it just depends. Rmvpe did make one of my smaller datasets sound pretty decent. But I'm convinced the older stuff has its benefits.

pine sequoia
#

Also. Set stopping epoch to 10,000. Just watch the tensorboard for when it starts to overtrain
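A rough sketch of that "watch for when it turns" idea, assuming you've exported a per-epoch loss curve from the tensorboard as a plain list of numbers. The function names and the smoothing window are made up for illustration; this isn't part of RVC or Applio, and which loss curve to watch is up to you.

```python
def moving_average(values, window):
    """Trailing moving average; early points average over fewer values."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def best_epoch(losses, window=3):
    """Return the 1-based epoch where the smoothed loss is lowest.

    If the curve keeps rising after this point, the checkpoint saved
    closest to it is a reasonable candidate to keep (assumption: lower
    smoothed loss means a better model, which is not always true).
    """
    smoothed = moving_average(losses, window)
    lowest = min(range(len(smoothed)), key=smoothed.__getitem__)
    return lowest + 1


if __name__ == "__main__":
    # Hypothetical loss values, one per epoch: falls, then climbs again.
    example = [5, 4, 3, 2, 3, 4, 5]
    print(best_epoch(example, window=3))
```

Smoothing matters here because the raw curve is noisy, so a single low point on its own isn't a good stopping signal.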

#

And an epoch depends on batch size + dataset size, so you really can't say "just train it to X epochs". It's kind of a weird measurement for training length
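To put numbers on that, here's a small sketch of how epochs turn into actual training steps. It assumes one epoch is one pass over all slices and that a partial last batch still counts as a step (the real trainer may handle that differently); the function names are hypothetical.

```python
import math


def steps_per_epoch(num_slices, batch_size):
    """Weight updates in one pass over the dataset (partial batch counted)."""
    return math.ceil(num_slices / batch_size)


def total_steps(dataset_seconds, slice_seconds, batch_size, epochs):
    """Total weight updates for a dataset cut into fixed-length slices."""
    num_slices = math.ceil(dataset_seconds / slice_seconds)
    return steps_per_epoch(num_slices, batch_size) * epochs


if __name__ == "__main__":
    # 10 minutes of audio in 3 s slices, batch size 4, 150 epochs.
    print(total_steps(600, 3, 4, 150))
    # Same settings with a 40 minute dataset.
    print(total_steps(2400, 3, 4, 150))
```

With the same 200 slices, batch size 4 gives 50 steps per epoch and batch size 12 gives 17, so "150 epochs" means roughly three times as many weight updates at the smaller batch size.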

lavish void
#

Applio has already phased it out, though it still keeps backward compatibility with v1 models for inference

pine sequoia
swift ivy
noble pond
#

Harvest has better bass than rmvpe
but anyways, this post was not made to discuss v1 and v2

answering the OP's question:
increasing the batch size makes the model sound more like the original voice, however by doing that the model loses generalization and that can affect the results (the voice may have frequent glitching)
maybe try batch size 12

swift ivy
#

This post was to help someone that obviously wants different results. 🙄 Whatever you wanna believe. Go for it

noble pond
#

v1 has less complexity than v2 (256 channels vs 768)

swift ivy
#

Well, I'll try the latest mainline

pine sequoia
#

If you don't believe me just use the search bar here

swift ivy
#

Can you explain why it's outdated? 🤔 Does it deal well with large datasets?

pine sequoia
#

So we use Applio

#

They're the same RVC under the hood mostly. You won't notice much difference in how it handles stuff

#

There are just small improvements here and there, for the user and the training pipeline. Dig into the Applio changelog for more info

swift ivy
#

Idk, but I'm hearing really good results from rvc mainline with harvest.

#

Sounds so realistic compared to what I have heard out there regarding the voice model of the particular artist

noble pond
#

breaths are bad in both because the dataset didn't have any
v1 sounds muddy because it's undertrained (v2 is also undertrained in the audio)

#

it would not be a bad idea to compare v1 and v2 with a better dataset (i don't have any)

#

i trained them in mainline

#

48k v1 training didn't work for me, but 40k worked

#

rmvpe as f0 estimator

v1 sounding good was not something i expected lol, but anyways, my examples are trash, and it would be 1000% better if someone used a better dataset, let rvc fully train on it, and then compared the results

swift ivy
noble pond
#

yeah we need more tests before coming to a conclusion

swift ivy
#

Don't be surprised when they crucify you for this. Lol

noble pond
#

well my examples are still trash because both models are undertrained and the dataset comes from league of legends lol

#

someone needs to try this with a better set + fully training the model

swift ivy
#

Jesus tried telling the jews but they weren't trying to hear him

noble pond
lavish void
noble pond
swift ivy
#

My boy makaveli did and I know for sure his models are great. Experiment with your own ideas; only listening to the crowd will hurt you.

noble pond
#

😭

swift ivy
#

40k is key

modest garnet
#

ok ill try mainline v1

buh bye 3s slices, auto sync graphs and speed

swift ivy
#

Also cut your dataset manually into 12 sec clips.

noble pond
#

and let mainline slice the dataset

#

it'll generate 3s slices

modest garnet
#

yuh

modest garnet
noble pond
swift ivy
#

10 sec has been tested already, 12 sec is the sweet spot

noble pond
#

but at least every slice i got was 3s

modest garnet
swift ivy
#

Harvest is outdated but will give better results

swift ivy
#

150 epochs to start off with, I'd say. Dataset should be between 10 and 40 min, maybe more

noble pond
#

like a 9 minute set will require less than that

#

since there is less info

#

u train less

swift ivy
#

Gotcha

modest garnet
swift ivy
#

Yeah thats perfect

#

Batch size 4

modest garnet
lavish void
modest garnet
noble pond
modest garnet
swift ivy
#

Clean set only. Cut out silence except for breaths, in my opinion

#

We don't speak without breathing, so

lavish void
#

hina's seems to be abandoned

swift ivy
#

Maybe u never know

modest garnet
noble pond
noble pond
modest garnet
noble pond
#

try mainline inference

#

v1 index doesn't work in applio

modest garnet
#

i don't use indexes

modest garnet
noble pond
modest garnet
#

i'm going to have to use cpu inference

noble pond
#

only works in rvc-boss mainline lol

modest garnet
#

huh mainline inference has a limit on how long the audio can be

#

what the

noble pond
#

lol

#

why did u get such weird results? mine works just fine

modest garnet
#

idk

#

did you turn off fp16?

noble pond
#

yes

modest garnet
#

ok i did too

noble pond
#

you used rmvpe?

modest garnet
#

yea

noble pond
#

me too uhh

#

1 sec im opening mainline

noble pond
#

v1

modest garnet
#

why did my model get cursed 😭

noble pond
modest garnet
#

what the 💀

noble pond
#

first i thought it was some weird fumiama mainline bug, but i got the same results in rvc-boss mainline

#

40k worked just fine

#

idk whats going on

noble pond
#

v1 voice is less powerful than v2, but v1 has a less audible metallic sound (at least in my results tho)

v2 here v

swift ivy
#

Try it with harvest at 40k

modest garnet
swift ivy
#

Also include the index please 🙏

#

It's the features tho bro!

#

Like u don't want a hybrid voice

#

Just saying u can have the voice but it won't sound like them

noble pond
#

it can make things worse

#

for local usage it's fine, you can force the accent of the model so the voice ends up sounding very similar to the original one

#

in cases where you don't care about voice similarity (like realtime) it's fine to not use it

swift ivy
#

Gotcha

#

Well in that case this experiment is pointless

modest garnet
swift ivy
#

Yeah bro it's grok, but I knew that the index is needed before gpt

noble pond
#

in case you don't know what the index does:
it's the accent of the model
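A toy sketch of what "the index is the accent" can mean in practice, assuming the index works as a nearest-neighbour lookup over feature vectors from the training set, with the match blended into the input features by an index rate. As far as I understand, RVC does this with a faiss index and several neighbours; the brute-force search, single neighbour, and function names below are simplifications made up for illustration.

```python
def nearest(training_features, query):
    """Brute-force nearest neighbour by squared Euclidean distance."""
    return min(
        training_features,
        key=lambda f: sum((a - b) ** 2 for a, b in zip(f, query)),
    )


def blend_with_index(query, training_features, index_rate):
    """Pull the input features toward the closest training-set features.

    index_rate = 0.0 leaves the input untouched (no index),
    index_rate = 1.0 replaces it with the retrieved training features.
    """
    match = nearest(training_features, query)
    return [index_rate * m + (1 - index_rate) * q for m, q in zip(match, query)]


if __name__ == "__main__":
    # Hypothetical 2-dimensional features; real ones are much larger.
    training = [[0.0, 0.0], [10.0, 10.0]]
    print(blend_with_index([8.0, 6.0], training, index_rate=0.5))
```

The higher the index rate, the more the output leans on how the original speaker actually pronounced things in the dataset, which is why it matters less for realtime use and more when similarity is the goal.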

swift ivy
#

Especially if people prefer to use it for music

#

Yeah i know

#

That's probably why rvc1 wins for some people. Not everyone is doing real-time.

#

This could be the reason why makaveli is saying rvc1 is better for him. The index might work better because rvc1 was designed with it in mind

#

People weren't really doing realtime back then. Also, they trained with a lot more data, like makaveli does. Maybe

noble pond
swift ivy
#

So ur telling me an index isn't needed? I don't use realtime.

noble pond
#

it's merely a "oh i want the model to speak like the og voice" thing

swift ivy
#

Yeah u can also smoke crack just fine as well. Does that make it ok? 😄

#

That's what I'm aiming for tho

#

I want the voice to speak like the og. I don't want it to have another accent

noble pond
#

i get what you're saying
you're trying to say that the index works better in v1

#

in like, accent similarity

#

i actually agree with u in that

#

trained 2 v1 models and seems like there's a better similarity between the set and the model

swift ivy
#

Yeah cuz I like to make ai songs that sound as close to the real thing as possible

#

Maybe that's why some folks are divided when it comes to this matter 🤔. I think rvc2 works great for shorter datasets, while rvc1 might be for folks wanting to create an actual song that sounds like their favorite artist, as opposed to speaking in their voice in real-time.

noble pond
#

v2 seems to be more expressive than v1 i noticed

swift ivy
#

It's interesting nonetheless

noble pond
#

but v2 also tends to sound more metallic/robotic

swift ivy
#

Yeah i think it just depends on what people are aiming for. 🤔

#

Also what's interesting is the training experiments I'm discovering people are doing. One is that someone figured out how to create a double-layer vocal model. I was always told that rvc can't replicate voices that are mixed in with another voice. The idea was to isolate just the sections with that vocal double effect, as long as they were clean, from say a master reel multitrack

#

I heard the results and they are amazing

noble pond
#

i think this conversation should move to #🧬│ai-chat
the post wasn't made for this lol

swift ivy
#

Lol 😆 i thought we had a moment 🤣

pine sequoia
#

Tf happened here

#

Someone give me the tldr

#

Is v1 valid in any use case

noble pond
noble pond
#

but only v1 40k training works, 32k and 48k don't work

pine sequoia
#

Are these two demos separate models

#

I forget if there's backwards compat

#

I hear the difference in them but it's so small I'd chalk it up to randomness. Unless they're the same pth

noble pond
#

yuh the difference isn't massive

pine sequoia
lavish void
noble pond
#

idk honestly

#

there is a 32k pretrain and config, but in mainline's github rvc boss said he didn't like the 32k v1 results