#[Pretrains Test] Ariana Grande (RMVPE) (RVC v2, 32k, 200 epochs, 8400 steps)

visual beacon
junior walrus
#

without looking at the spectrograms and only using my ears, for me the original sounds better

i ordered them from best to worst based on my opinion and ears xd:

  1. original pretrain
  2. titan
  3. nanashi
  4. ov2
  5. rigel
sour ore
sour ore
#

(No offense Shirou)

visual beacon
sour ore
junior walrus
#

for me the worst are ov2 and rigel

visual beacon
junior walrus
visual beacon
visual beacon
rigid forge
visual beacon
rigid forge
#

wow

rigid forge
# visual beacon 16

because i would like to redo it with original and 250 epochs, serj tankian of system of a down

rigid forge
rigid forge
visual beacon
visual beacon
terse crater
#

Why 200 epochs? Is it just an arbitrary number?

visual beacon
terse crater
#

I see

terse crater
#

Here's some info based on the spectrogram. Best to worst:

  • DC offset:
    • DMR
    • OV2
    • SingerPretrain
    • TITAN, OG
    • Rigel
    • Nanashi
    • KLM 4.1
  • Harmonic Distortion:
    • OG
    • TITAN
    • SingerPretrain
    • OV2
    • KLM 4.1
    • DMR
    • Rigel
    • Nanashi
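
For anyone who wants numbers instead of eyeballing the spectrogram, here is a minimal sketch of the two metrics ranked above. It is not the method used for the ranking in this thread. It assumes mono float samples and a known fundamental frequency, and the helper names (`dc_offset`, `tone_amplitude`, `thd`) are made up for illustration. The demo signal is synthetic, not real model output.

```python
import math

def dc_offset(samples):
    """Mean of the samples; 0.0 means the waveform is centered on zero."""
    return sum(samples) / len(samples)

def tone_amplitude(samples, freq_hz, sample_rate):
    """Amplitude of one sinusoidal component, measured with a single-bin DFT."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n

def thd(samples, fundamental_hz, sample_rate, num_harmonics=5):
    """Total harmonic distortion: RMS sum of harmonics 2..N+1 over the fundamental."""
    fund = tone_amplitude(samples, fundamental_hz, sample_rate)
    harmonics = [tone_amplitude(samples, fundamental_hz * k, sample_rate)
                 for k in range(2, num_harmonics + 2)]
    return math.sqrt(sum(h * h for h in harmonics)) / fund

# Synthetic stand-in for inferred audio (assumption): 32 kHz, 200 Hz tone,
# a small DC bias and a second harmonic at 10% of the fundamental.
SAMPLE_RATE = 32000
signal = [0.05
          + math.sin(2 * math.pi * 200 * i / SAMPLE_RATE)
          + 0.1 * math.sin(2 * math.pi * 400 * i / SAMPLE_RATE)
          for i in range(3200)]

print("DC offset:", dc_offset(signal))
print("THD:", thd(signal, 200, SAMPLE_RATE))
```

On real vocals the fundamental moves around, so this only makes sense on a short, steady note where the pitch is known.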
terse crater
#

xD

visual beacon
barren zealot
haughty panther
real flicker
visual beacon
real flicker
#

because i dont use tdeesser

haughty panther
# visual beacon KLM 4.1 added Sample:

without looking at the spectrograms and only using my ears, for me the original sounds better

i ordered them from best to worst based on my opinion and ears:

  1. original pretrain
  2. titan
  3. klm 4.1
  4. ov2
  5. nanashi
  6. rigel
visual beacon
haughty panther
#

listened again to ensure a fair opinion and yeah, ov2 is superior to nanashi (no offense)

haughty panther
edgy tulip
#

Because nobody uses singerpretrain

barren zealot
edgy tulip
#

Exactly

deep kite
#

If you want to accurately test a pretrained model, train a new model with each pretrained model for just one epoch using a 1~3 second sample, then test with the resulting model.

deep kite
#

the spectrum I showed to Fitz Roy last time wasn’t a spectrogram of the inferred model but the original spectrogram of the singer's voice recorded by the microphone. so, the harmonics weren’t wrong; they were part of the original recording. this isn’t so much about your judgment being wrong but rather that human eyes and experience are subjective, which leads to issues like this. especially when you set the answer and look for it, you end up staying within the framework you’re already familiar with.

The fact that a pretrained model is trained on a lot of data doesn’t always translate into a purely positive thing. It can highlight weaknesses you might want to hide or amplify incorrect sounds that you accidentally included in the training data.

most data-cleaning processes differ because each method and habit varies. Everyone has different habits, and when similar sounds are collected, it implies there’s a domain of sound that the pretrained model can't handle.

for instance, the breathing sounds of real singers are quite rough because they need to take in large amounts of air in a short period. These sounds can be quite distracting. While they’re usually removed during mixing, I was surprised to find that these breathing sounds didn’t appear at all in the model trained just recently. However, that meant that the breathing sounds originally present in my dataset weren’t being generated at all by the model, and I realized that this wasn’t purely a benefit.

what you should know is that the more extensive the pretrained model’s data is, the broader the variety of sounds the model can infer. A broader variety means that even sounds you might not want to hear can be inferred.

If you really want to know the answer, I recommend training your model without using a pretrained model. While it may take longer, it’s a clear way to monitor exactly what sounds are present in your dataset without the intervention of the pretrained model.

and you’ll quickly spot issues with your own dataset that you might not have noticed before.

of course… there could be technical challenges that make it more difficult, which isn’t your fault, but you’ll understand once you try.

real flicker
#

Lol

real flicker
visual beacon
junior walrus
visual beacon
#

I'll train more

junior walrus
visual beacon
static storm
#

Cvec moment

static storm
barren zealot
static storm
barren zealot
#

ooh

#

how long did it take to train?

static storm
visual beacon
barren zealot
#

I do not use kaggle because there's no guide to the rvc disconnect on there

#

my friend ported the Colab to kaggle but there's no tutorial on how to use it

visual beacon
barren zealot
deep kite
# junior walrus We can train any dataset length without a pretrain? Or we can only train big dat...

as we all know, a large dataset is better if your dataset is good. If the data is insufficient, many artifacts will occur in various parts. However, the point of my writing is not to create a perfect model without pretraining, but rather to observe the phenomena that arise as you monitor the progress at each epoch while training the model without pretraining. As I mentioned above, if you train a 1-5 second dataset using each pretrained model for just one epoch, it won’t learn much at all, meaning that you can use the pretrained model almost like a regular model. So, if you want to check the state of the pretrained model, try using the method I mentioned above.

another reason I'm suggesting you ppl train a model without using a pretrained model is, aren't you curious why the 'harmonics' and strange noise patterns that you always mention appear? 🙂 the answer lies there, so give it a try. save every 10 epochs and check the spectrogram of the audio inferred by the model generated at each epoch.
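
The "check the spectrogram at each saved epoch" step could look roughly like the sketch below. It is only an illustration: it uses a plain Hann-windowed STFT written with the standard library, the function names are hypothetical, and the input is a synthetic 1 kHz tone standing in for audio inferred from a checkpoint. Loading real WAV files and running RVC inference are left out.

```python
import cmath
import math

def stft_magnitudes(samples, frame_size=256, hop=128):
    """Magnitude spectrogram: one list of bin magnitudes per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_size)
              for i in range(frame_size)]
    # Precomputed DFT twiddle factors, indexed by (k * i) mod frame_size.
    twiddle = [cmath.exp(-2j * math.pi * k / frame_size)
               for k in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_size)]
        mags = []
        for k in range(frame_size // 2 + 1):
            acc = sum(frame[i] * twiddle[(k * i) % frame_size]
                      for i in range(frame_size))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

def peak_frequency(mags, sample_rate, frame_size=256):
    """Frequency (Hz) of the strongest bin in one frame."""
    k = max(range(len(mags)), key=mags.__getitem__)
    return k * sample_rate / frame_size

# Synthetic stand-in for one checkpoint's inferred audio (assumption).
SAMPLE_RATE = 32000
audio = [math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE) for i in range(2048)]

spec = stft_magnitudes(audio)
print(len(spec), "frames, peak at", peak_frequency(spec[0], SAMPLE_RATE), "Hz")
```

Running the same function on the audio from each saved epoch and comparing the frames side by side is one way to see when the extra harmonics or noise patterns start to show up. For real files a proper FFT library would be much faster than this naive DFT.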

deep kite
#

and one more thing to mention, I think it's time for me to take a break. I've shared a lot and had enjoyable moments with you guys for quite a long time, but now I'm feeling a bit exhausted. since I have all the datasets anyway, I think I'll give it another try later if RVC improves further

real flicker
#

"if"

deep kite
sour ore
deep kite
#

take care guys, I'm gonna miss u all 😛

sour ore
visual beacon
real flicker
#

let me check the spectrogram

#

ok so to my perfect eye singer has better harmonics and mid to high end but og has better details

#

but im a noob at spectrograms

visual beacon
#

lol

real flicker
#

JUST TELL ME 😡

visual beacon
#

tell what?

real flicker
#

is this true?

#

or @grand laurel mr tryhard

visual beacon
grand laurel
real flicker
#

So dynamic range reproduction over harmonic clarity?

grand laurel
#

if the dynamic range is reproduced well, that means that the overall quality is good
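
One rough way to put a number on "dynamic range" is to compare the loudest and quietest frame levels, as in the sketch below. The definition is an assumption made for illustration (max minus min frame RMS in dB, ignoring near-silent frames), not a standard the thread agreed on, and the function names and the silence floor are hypothetical.

```python
import math

def frame_rms_db(samples, frame_size, floor=1e-6):
    """RMS level in dB for each non-overlapping frame, skipping near-silence."""
    levels = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(s * s for s in frame) / frame_size)
        if rms > floor:  # hypothetical silence floor
            levels.append(20 * math.log10(rms))
    return levels

def dynamic_range_db(samples, frame_size=640):
    """Gap between the loudest and quietest frame; needs at least one audible frame."""
    levels = frame_rms_db(samples, frame_size)
    return max(levels) - min(levels)

# Synthetic stand-in (assumption): a loud passage followed by one 10x quieter.
SAMPLE_RATE = 32000
loud = [math.sin(2 * math.pi * 500 * i / SAMPLE_RATE) for i in range(3200)]
quiet = [0.1 * math.sin(2 * math.pi * 500 * i / SAMPLE_RATE) for i in range(3200)]

print("dynamic range (dB):", dynamic_range_db(loud + quiet))
```

Comparing this figure between the source vocal and each model's output gives a crude check on whether the dynamics were flattened.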

real flicker
grand laurel
real flicker
#

No, my pretrain has less noise

grand laurel
#

the audio files i mean

#

cuz all i can see here is worse dynamics

#

plus urs has more aliasing it seems

real flicker
#

But sadly I'm a noob at spectrograms 🥲

sour ore
real flicker
#

It's just a little quieter

#

And u can clearly hear distortion at 0:3

#

While mine has more air in vocals

grand laurel
#

And the quality difference is huge between the 2

real flicker
grand laurel
real flicker
#

ik im just saying

visual beacon
#

lol

grand laurel
real flicker
#

Sooooo
singer
pluses:
less noise, SNR, more stable low freq
better mid freq
lower amplitude variability
minuses:
compressed high freq
more artifacts (a little bit more)
harmonic overlap
reduced dynamic range in high freq
less harmonic separation
og
pluses:
better high freq
wider dynamic range
lower harmonic distortion
more detailed harmonics
minuses:
high noise
less mid freq clarity
low freq representation

#

still

grand laurel
#

your pretrain just sounds weird

#

just cuz the high ends lack

#

so it makes it sound more synthetic

static storm
# real flicker

Exactly, what are you expecting from a fine-tune of the og pretrained? Especially when the pretraineds share the same base: hifigan, and the same cvec

real flicker
#

yall better calm down im joking but that doesnt explain why og sounds distorted

grand laurel
#

Do you know what distortion is

real flicker
#

yes

#

i mixed since 2019

#

u can clearly hear distortion at 0:2

grand laurel
#

i have no clue where u can hear the distortion

#

some would argue that the singer pretrain hinders the quality

#

and is more distorted

#

due to the aliasing up top

#

which og has noticeably less of

real flicker
#

on the word

#

faaaaaaaaaaast

sour ore
real flicker
#

naah yall trippin af

grand laurel
#

urs just doesnt go that high

#

and they sound pretty robotic

sour ore
real flicker
sour ore
real flicker
#

no

#

id win

sour ore
#

🐢 👍

real flicker
#

finally i won

#

ez

sour ore
#

I would like to think he's trolling us.

real flicker
#

why do u guys take everything so seriously😭😭😭😭😭😭😭😭😭😭