#✨│ai-help

crude flame
#

T de-ess is

#

RX is not

#

but you can get RX "for free"

peak falcon
#

cracked

crude flame
#

perhaps

low shard
peak falcon
#

I have a de esser from FabFilter but I have no idea how to use it properly

#

It's called Pro-DS

proven hill
#

please no cracked stuff pay full price.

peak falcon
#

Everything cracked

crude flame
peak falcon
#

Yes sir

#

Of course I did

#

especially FL Studio

crude flame
#

ight, thank you for your honesty

bold yarrow
peak falcon
#

For using cracked software?

bold yarrow
#

yes

#

isnt that in the rules

peak falcon
#

I don't know, I didn't read them

#

I just joined for the Ai models because I tried to make a producer tag

low shard
#

it's literally said in our docs lol

bold yarrow
#

oh

low shard
bold yarrow
#

well i embarrassed myself

low shard
#

but just saying "google how to be a pirate" won't get u banned

bold yarrow
#

could piss off your isp

#

well i just downloaded de-esser

#

but i can't find it anywhere

#

did i download a virus from ruislip or something

#

ill show them

peak falcon
#

what DAW are you using?

bold yarrow
#

daw?

peak falcon
#

how do you mix the vocals?

bold yarrow
#

with the instrumentals?

#

i use plain old audacity

peak falcon
#

Oh

#

I have no idea how that works

tame mica
#

get a free daw like reaper

#

audacity is not ideal for audio mixing

#

or you can yk "buy" other daws

proven hill
#

reaper best

vapid mantle
#

@hot ledge what should the Target Sample Rate be?

hot ledge
#

32

#

@vapid mantle 32k

vapid mantle
#

ok, thanks a lot

brittle wing
#

can sm1 send me the voice changer file for windows i cant find it

#

@tame mica

crystal jetty
#

Hello everyone, can someone help me? I'm generally 0 in these matters(

azure marshBOT
low shard
latent cypress
#

do you guys prefer klm 5.0 mini or klm 4.3 x2?

jaunty shale
#

is there a way to convert .safetensors file to .pth file so I can use it in applio?

simple ore
#

safetensors of what? gptsovits?

jaunty shale
#

I used okada to make a merged voice

#

it created a safetensor file in the model_dir folder

#

i run it on browser (it works better than window one for me)

static oar
#

so i just used Google Collab to make a voice model from audio clips right, works good with the voice changer but i was wondering what i can use to be able to apply that voice model to audio. any ideas?

rough star
#

Does that mean that I have to create a new user on my PC?

lime otter
#

-colab

azure marshBOT
# lime otter -colab
📒 Google Colab Notebooks
ℹ️ Note

While the Colab free plan provides up to 12 hours of daily usage, the GPU is typically available for only about 4 hours each day on average.

simple ore
#

you can run it as a normal user that has local admin privileges

red gale
#

does anyone of you know any uhh,good free voice changers?

rough star
knotty moth
fleet trail
#

hey

#

i just formatted my pc

#

i wanna

#

reinstall it

latent kettle
proven hill
red gale
jaunty shale
bold yarrow
#

hey guys

#

is there a website where i can accurately predict how many epochs my model would need without becoming either undertrained or overtrained

proven hill
jaunty shale
proven hill
#

you can simply merge two voices

jaunty shale
#

I can't.

#

I tried in applio it doesn't work anymore.

#

The merged voice I have has more than 2 voices combined

proven hill
#

merge one at the time

#

also wdym it doesnt work anymore?

jaunty shale
#

it shows error

#

I would have to check what kind of error in a bit

proven hill
#

are the files youre merging the same sample rate?

jaunty shale
#

Yep

proven hill
#

can you show me the error?

jaunty shale
#

will do

craggy brook
#

Is there any other way to access the 15.ai site? Or is there a site as good as this site with better sound?

craggy brook
#

voicing for a character I want to use. realistically, with emotions and without distorting the tone of voice

jaunty shale
#

ok now it worked-

#

..somehow

#

but I already have premade merged voice. Is there like a long enough audio to convert in real time?

jaunty shale
# proven hill wdym

So you know how you can use voice in real time? Not only I can use my voice but also audio to convert into in real time pretty much. It works very well and I did it multiple times, but I just need some audio that is long enough to train the merged audio I did in okada.

#

(I use soundpad to make it work)

azure marshBOT
proven hill
#

first link

solid arch
#

yo

#

how can i use my gpu to rvc?

proven hill
jaunty shale
proven hill
#

they are already doing that

jaunty shale
knotty moth
solid arch
#

T-T

#

im using rx 570

proven hill
solid arch
#

or it is set that way

proven hill
solid arch
#

idk abt anything but its working

solid arch
#

only works when i record

proven hill
#

so no file input?

knotty moth
heady gorge
#

Do I really need winzip for the voice changer

proven hill
#

you have to unpack it somehow.
but i suggest 7zip

solid arch
#

best way to like stop cutting the audio out?

knotty moth
proven hill
solid arch
proven hill
proven hill
heady gorge
#

Why do voice changers have to cost?

proven hill
#

set f0 to rmvpe

solid arch
proven hill
proven hill
#

-rt

azure marshBOT
proven hill
#

first link

#

theres the guide

solid arch
#

oh

heady gorge
#

Yeah, but I need winzip

proven hill
heady gorge
#

Is that free

knotty moth
proven hill
#

yes

solid arch
#

i download it after

proven hill
knotty moth
solid arch
jaunty shale
# proven hill i dont understand this at all honestly

..okay let me explain it differently.

I use Voicemeeter Banana to make this happen. B2 is input device that is used to convert anything that can be played in it. B1 is something that will be played through the actual microphone. When I play audio through soundboard, I can play it in the B2 to convert it into the different voice in real time. (since I cannot convert it in applio). With that, I can make a dataset.

heady gorge
proven hill
proven hill
solid arch
# jaunty shale I have okada already.

ure telling me that u want some audios to be converted into a voice which is to okada so okada would transmit that and turn into a different voice and will play it thru to ur mic

#

a audio came from ur soundboard?

#

soundboard > okada > mic

jaunty shale
#

I really don't wanna spend more hours to make it in applio again.

solid arch
#

i use voicemod for the soundboard

#

works clearly when i use it

heady gorge
#

Are you sure this is safe

solid arch
#

sends any kind of audio thru my mic

proven hill
proven hill
jaunty shale
proven hill
solid arch
#

imma extract the file

jaunty shale
#

because that's not a .pth file...

#

I cannot use safetensor file in applio

proven hill
#

where do you eeveeen get a safetensor file

#

TO WORK IN OKADA TOO

jaunty shale
#

I never knew it makes a different file until I merged it yesterday.

knotty moth
heady gorge
#

I just installed 7zip

#

What's the link to the voice changer

jaunty shale
#

merged voice makes only a safetensor file.

knotty moth
proven hill
azure marshBOT
proven hill
#

first link

proven hill
#

yea you need to merge in mainline (discontinued), applio (suggested) or ilaria rvc mainline (discontinued)

jaunty shale
heady gorge
#

I'm not good with PC stuff But I can try

proven hill
solid arch
#

chat why im one my browser 😭

jaunty shale
#

I'll just figure it out. I only need a long audio from youtube.

solid arch
#

where i should put my models?

proven hill
#

please follow the guide

proven hill
low shard
solid arch
#

oh

#

mb ggang

jaunty shale
proven hill
heady gorge
#

Okay. Can I have the link to the voice changer

proven hill
#

-rt

azure marshBOT
proven hill
#

first link

jaunty shale
proven hill
heady gorge
#

I don't see the Voice changer in 7zip

low shard
heady gorge
#

it keeps on changing

craggy brook
proven hill
#

assuming youre using tts

craggy brook
proven hill
craggy brook
#

yes

proven hill
winged crane
#

yo mods can i get permission to share my screen?

winged crane
#

i wanna share my screen when im playing Dragon Ball: Sparking! Zero with a friend

#

if thats ok?

#

share

proven hill
#

idk

#

@low shard

winged crane
#

you can join and see aswell if you want to

knotty moth
winged crane
#

yea im not doing dat im married

#

thx allot

hallow thistle
proven hill
winged crane
#

@quick jungle can i get permission to share my screen

quick jungleBOT
proven hill
#

its a bot

winged crane
#

oh

#

lol

flint geyser
#

yo do you want me to still fix it

low shard
proven hill
#

yea dont waste your time

#

people use ilaria rvc zero anyway

peak falcon
#

Eight by Eight

#

36 zero waist

honest junco
#

Sorry I tried couples times but my RVC keep showing Frequent errors occur. Please check if the model of the framework being targeted is loaded.

#

And my colab are showing my server is not an accepted origin. (further occurrences of this error will be logged with level INFO)

#

I tried to search in github but I still can't find anyway to fix it

#

(Fun story I used 3 hours to do pip install pip==24.0)

proven hill
honest junco
proven hill
honest junco
#

Oh you mean the rar

#

I can'

proven hill
honest junco
#

can't*

#

I tried

proven hill
#

why?

honest junco
#

it have 20k ms on res

#

💀

#

Thats why I need to use colab

proven hill
#

what colab are you using?

honest junco
#

google colab

proven hill
#

yes i mean

#

give me the link

honest junco
#

Alr

honest junco
proven hill
#

also i saw it was ngrok

honest junco
#

alr mb

proven hill
#

-ngrok

honest junco
#

yeah

proven hill
#

damn

proven hill
#

ohhh its a voice changer

honest junco
#

yeah

proven hill
#

i think this is old

#

why not use your gpu?

honest junco
#

Cuz when I use my gpu

#

the res of it

#

are 20k ms

#

basically it takes 20s to transfer my voice to u know

proven hill
#

what gpu do you have?

honest junco
honest junco
proven hill
#

are you using the forked version?

honest junco
#

forked version?

proven hill
#

the new version

honest junco
#

Yes

proven hill
#

better support for amd cards

honest junco
#

1.5.3.18a

proven hill
#

nah thats old

#

-rt

azure marshBOT
proven hill
#

first link

honest junco
#

alr tysm

low shard
proven hill
#

np

honest junco
#

sorry

#

😔

low shard
low shard
proven hill
honest junco
#

Fun fact I paid the google colab

proven hill
honest junco
#

I spent 56.5 for using that

honest junco
proven hill
#

lmao

low shard
honest junco
#

I need to use my 2nd google account to log in to use it 💀😳

low shard
#

it's another tunnel

#

anyways, ur gpu should be good enough

honest junco
honest junco
low shard
proven hill
#

let him download the fork

low shard
#

ye i was just saying

honest junco
#

setting up tho

sudden tree
#

where can you find models like the melband roformer karaoke model by viper?

#

do they upload them I am too nervous to download some random ckpt on huggingface lmfao

proven hill
sudden tree
#

he already sent me the mega but idk where he obtained file from

#

I wanna know the source

viscid moss
proven hill
#

also huggingface is safe :)

viscid moss
#

ye

sudden tree
#

thanks a lot guys

#

well doesnt huggingface literally virus check each file regardless

#

just wondering why isnt the viperx karaoke and stuff included in uvr 5 model download set

sudden tree
#

oh thats sick

viscid moss
#

Will be available soon ig

sudden tree
#

nice

#

thank you

#

i wonder why everyone recs the mvsep when the queue is ungodly lmfao

#

not worth

valid spruce
#

What sample rate should I use?

proven hill
valid spruce
proven hill
#

no problem!

sudden tree
#

everyone always says you dont need to cut audio yourself, but I realized when I train with my own clips, my Crepe models even beat the RMVPE models! I think the auto clipping of applio causes the ai to be confused. Has anyone else experienced this?

analog obsidian
sudden tree
#

yeah

#

exactly

analog obsidian
# sudden tree exactly

are you using the script found in the docs? you're supposed to slice the whole dataset in chunks of 3 seconds with an overlap of 0.3
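A minimal sketch of what "chunks of 3 seconds with an overlap of 0.3" could mean in practice, assuming the 0.3 is in seconds and each chunk starts 2.7 s after the previous one. The function name `slice_bounds` is made up for illustration, and this is not the script from the docs or Applio's own slicer.

```python
def slice_bounds(total_seconds, chunk=3.0, overlap=0.3):
    """Return (start, end) times in seconds for fixed-length, overlapping chunks."""
    if chunk <= overlap:
        raise ValueError("chunk must be longer than overlap")
    step = chunk - overlap  # each new chunk starts this far after the previous one
    bounds = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk, total_seconds)  # the last chunk may be shorter
        bounds.append((round(start, 3), round(end, 3)))
        if end >= total_seconds:
            break
        start += step
    return bounds


print(slice_bounds(10.0))
```

For a 10 second file this prints `[(0.0, 3.0), (2.7, 5.7), (5.4, 8.4), (8.1, 10.0)]`, so every cut point is covered by two neighbouring chunks.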

#

and crepe vs rmvpe the difference is subtle, crepe models are softer while rmvpe are more harsh

sudden tree
#

i just basically use a audio ceiling to prevent white noise

#

and then split the audio into like 5 min chunks

#

rmvpe has always sounded better to me tho

analog obsidian
sudden tree
#

i wish there was just a vid of someone training with the doc settings

analog obsidian
#

rmvpe was made in 2023

sudden tree
#

most vids just do what I do and throw the audio in the training

#

I did not use the settings in the doc lmfao

analog obsidian
#

you're not slicing it yourself, rvc is slicing the dataset for u

sudden tree
#

no, but the pretrain split it for me automatically in applio

#

Yes i use the applio pre cut setting

analog obsidian
#

so you didn't disable rvc splitting

sudden tree
#

and the process

#

no I didnt

#

but still my old crepe model sounds better when i split it by hand

analog obsidian
#

every training is different + batch size matters

sudden tree
#

true I just run like 8 batch size even though i have 12gb vram bc i train on fp32

analog obsidian
#

u can actually get different results using the same exact parameters

sudden tree
#

fp32 was big mistake activating maybe?

analog obsidian
#

fp16 is too unstable

sudden tree
#

or maybe i should deactivate the process audio preset in applio?

analog obsidian
sudden tree
#

also should I make the input audio loud or just leave it as is?

#

some of the audio i train is raw vocal dataset and is quiet

analog obsidian
#

i can tell you the "right" (not really) way to preprocess a dataset

sudden tree
#

sure

analog obsidian
#

so open audacity, open your dataset (if your dataset are multiple audio files, merge them into one audio before doing this), select the whole audio, find truncate silence and use these settings:

#

damn i forgot

#

before doing that, convert the dataset to mono

sudden tree
#

why mono?

#

dont you lose data quality

analog obsidian
#

bc rvc cant read stereo files

sudden tree
#

oh shit so could that have ruined my training?

analog obsidian
sudden tree
#

ah lmfao

analog obsidian
#

but since you're doing this method, you should convert it to mono

sudden tree
#

isnt truncate silence same as doing noise gate in fl

analog obsidian
#

this literally removes silences

#

and leaves only the speech audio

sudden tree
#

ohhhhh

#

thats why im failing

#

its probably training on the silences?

analog obsidian
#

so like this

analog obsidian
#

after you have your truncate silence dataset do the next step

#

use these settings and you should be fine

#

only use simple mode if you truncated the silence

#

never use it for datasets that have silence

sudden tree
#

ok thank you a lot for this

#

is there a full guide so i can train

#

My only question is why truncate instead of just using the auto setting?

analog obsidian
sudden tree
#

ok thanks where do you find these links

analog obsidian
#

-docs

azure marshBOT
analog obsidian
#

^

sudden tree
#

thnx

#

what is batch size can you help me

azure marshBOT
sudden tree
#

is 8 batch size good for 13 min vocal data

analog obsidian
sudden tree
#

what about 12 or 4

analog obsidian
#

4 is unsafe but works in some cases where the dataset is very monotone and repetitive

#

12 works in some cases as well

#

where 8 fails

sudden tree
#

do you just use the built in pretrain?

analog obsidian
#

yes, original pretrain

sudden tree
#

ok just wondering if going to 12 batch size would help it

analog obsidian
sudden tree
#

hopefully truncate silence will remove all random noises in a studio sesh

analog obsidian
#

as long as they're below -42.5 dB
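To get a feel for how quiet that threshold is, a dB value can be converted to a linear amplitude relative to full scale. This is just the standard amplitude formula, not anything taken from Audacity itself, and `db_to_amplitude` is a name made up for the example.

```python
def db_to_amplitude(db):
    """Convert a level in dBFS to a linear amplitude (1.0 = full scale)."""
    return 10 ** (db / 20)


threshold = db_to_amplitude(-42.5)
print(f"{threshold:.4f}")  # about 0.0075 of full scale
```

So anything that stays under roughly 0.75% of full scale would count as silence at that setting.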

sudden tree
#

also do you personally use the melband roformer karaoke models to isolate leads?

analog obsidian
sudden tree
#

lmfao, audacity doesnt take m4as jeez

analog obsidian
#

😭

glacial pollen
#

a must have for everyone working with audio

analog obsidian
#

my beloved

sudden tree
#

yeah it would just degrade quality lmfao

#

double compression

glacial pollen
#

well no, m4a is not a codec

analog obsidian
#

bro convert it to wav

glacial pollen
#

it is a container

analog obsidian
#

lmao

sudden tree
#

oh i thought m4a was codec

glacial pollen
#

could hold aac, opus or vorbis

#

nope

sudden tree
#

nice for a while i thought youtube had opus but i dont see it anymore to download

glacial pollen
#

it's kinda mp4 counterpart

#

just the difference is, it doesn't contain " video layer "

analog obsidian
#

opus is still the best you can get from youtube

#

im still getting it

glacial pollen
#

essentially, it's " MPEG-4 Audio Layer "

sudden tree
#

lmfao now i gotta learn cmd ffmpeg fuck!

glacial pollen
#

Ye, Opus is in fact a very good codec

#

for lossy stuff

#

direct successor of vorbis ( ogg )

analog obsidian
#

.\ffmpeg -i audio.m4a audio.wav

glacial pollen
#

^ ye, will unwrap the container and get it to wave

#

Tho as Lyery said, if you're working on audio from youtube, use yt-dlp.exe ( a cli tool )

#

it'll fetch the audio from yt servers in best possible available quality ( mostly opus and rarely aac ) And then convert that using ffmpeg
(( that's my exact workflow for ' yt sourced audio ' ))

sudden tree
#

damn never knew m4a was container does it just hold mp3 atp?

glacial pollen
#

I believe aac

sudden tree
glacial pollen
#

use -x arg

sudden tree
#

damn wtf converting it to wav made filesize 10x

glacial pollen
#

that gets you opus ( if it's available )

analog obsidian
#

.\yt-dlp.exe -x url

glacial pollen
#

else, aac or m4a ( still aac I believe. )
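The two commands from the chat can be chained from a small script. This is a rough sketch that assumes `yt-dlp` and `ffmpeg` are both on PATH; the helper names and the file names in the comments are placeholders, and only the flags already mentioned above (`-x` and `-i`) are used.

```python
import subprocess


def ytdlp_audio_cmd(url):
    # -x (--extract-audio) asks yt-dlp to keep only the audio
    return ["yt-dlp", "-x", url]


def ffmpeg_to_wav_cmd(src, dst):
    # ffmpeg picks the output format from the .wav extension of dst
    return ["ffmpeg", "-i", src, dst]


def run(cmd):
    # check=True raises an error if the tool exits with a failure code
    subprocess.run(cmd, check=True)


# Example (not executed here): download, then convert to wav
# run(ytdlp_audio_cmd("https://www.youtube.com/watch?v=..."))
# run(ffmpeg_to_wav_cmd("audio.m4a", "audio.wav"))
```

The downloaded file name depends on the video title and codec, so the source path for the second step has to be filled in by hand.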

sudden tree
#

yeah -x usually grabs the video tho ngl

glacial pollen
#

well no

analog obsidian
#

no

sudden tree
#

oh sh

#

if wav is container, how does it make file size increase if bitrate stays exact same

glacial pollen
#

The thing is

#

wave pcm is not using any compression

#

so effectively, whatever would be ( which is not as file comes from lossy compression )

#

gets ' 0 filled '

#

that's a thing that has to be done, no other way.

#

All the missing data is just filled in
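The size jump can be checked with simple arithmetic: uncompressed PCM stores every sample at a fixed width, so the size depends only on duration, sample rate, channels and bit depth. The sketch below assumes 16-bit stereo at 44.1 kHz and a 128 kbps lossy source; both are example values, not measured from the file in the chat, and the small WAV header is ignored.

```python
def pcm_wav_bytes(seconds, sample_rate=44100, channels=2, bits=16):
    """Approximate size of uncompressed PCM audio data in bytes."""
    return seconds * sample_rate * channels * (bits // 8)


def lossy_bytes(seconds, kbps):
    """Approximate size of a constant-bitrate lossy file in bytes."""
    return seconds * kbps * 1000 // 8


wav = pcm_wav_bytes(60)       # 10584000 bytes for one minute
m4a = lossy_bytes(60, 128)    # 960000 bytes for one minute
print(wav, m4a, round(wav / m4a, 1))
```

That works out to about 11x for one minute of audio, which is in line with the "10x" seen above.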

sudden tree
#

ok so i am a noob

glacial pollen
#

So yea, whatever you have or get from yt-dlp -> wave

#

that wave after editing / processing -> 32 bit float 44.1khz

sudden tree
#

and i just did a zoom out on the audacity and the audio is completely peaked out now

#

tf

#

oh i zoomed on the db lmfao hahaha

sudden tree
glacial pollen
#

yes

#

it's the bit depth

#

32 bit float is the " target end " for files that rvc processes anyways

sudden tree
#

and if you are using different songs etc we can import multiple files into the training at different db normalizations?

glacial pollen
#
  • you avoid potential issues during editing
sudden tree
#

or should each separate session be normalized to equal level

glacial pollen
#

Well, the whole dynamics aspect of rvc is a lil skewed up anyways

#

Biggest issue is, if you mangle with dynamics on your own ( be it rms, peak norm or compression )

#

it can screw up model's ability to express itself well at high volumes. it'll cause tearing

#

so at best... if you have to, tame the peaks and maybe add a tiny bit of compression

sudden tree
#

export audio as mono wav or stereo?

glacial pollen
#

wave

sudden tree
#

i forgot to make the audio mono lmfao

glacial pollen
#

just copy one channel into a blank file
( aka, do not use any " merge channels " algos or such )

sudden tree
#

?

analog obsidian
sudden tree
#

confused

#

i just downloaded audacity so i dont know how to do that

glacial pollen
#

you wanna delete 1 channel from the audio
either L or R

#

and just save it as mono wave

analog obsidian
glacial pollen
#

Alternatively, copy / highlight just 1 channel of your choice and paste it over

#

Cause like, depending on what audacity does

sudden tree
#

i see

glacial pollen
#

if it fuses the channels / centers em, it's pretty bad

#

that's a " merged mono " and not true mono
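The difference between keeping one channel and a "merged mono" can be shown on a toy signal. This sketch assumes the merge is a plain average of left and right, which may not be exactly what Audacity does; the function names are made up, and the sample values are chosen so the two channels are out of phase.

```python
def take_channel(frames, index=0):
    """Keep only one channel (0 = left, 1 = right) of stereo frames."""
    return [frame[index] for frame in frames]


def average_channels(frames):
    """Merge stereo frames to mono by averaging left and right."""
    return [(left + right) / 2 for left, right in frames]


# Left and right carry the same signal with opposite polarity
frames = [(0.5, -0.5), (0.25, -0.25), (-0.5, 0.5)]

print(take_channel(frames))      # [0.5, 0.25, -0.5]
print(average_channels(frames))  # [0.0, 0.0, 0.0]
```

In this extreme case the averaged version cancels out completely, while the single channel is untouched. Real recordings are rarely this bad, but partial cancellation follows the same pattern.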

sudden tree
#

lyery is just trying to get me to fuse, you are just saying delete one to prevent the wrongful merge and distortion

analog obsidian
#

hear him not me

#

lol

sudden tree
#

how to delete one track?

glacial pollen
#

I mean, what he says isn't wrong

#

but just not ideal imo

analog obsidian
#

he knows better than me

glacial pollen
#

Because rather than raw mono, you get a fuse of channels, ish ( as long as audacity does that, which I am not 100% sure of )

sudden tree
#

yeah how do you do it

glacial pollen
#

as it does kind of algo magic and averaging of phase and such

analog obsidian
#

he told u above how

#

takes u a few clicks

glacial pollen
#

hold on

sudden tree
#

also clipping every 3 sec with a .3 sec overlap seems like a disaster intuitively to me idk why

glacial pollen
#

You'll have uhhh

sudden tree
#

if you are training on only 3 sec you are guaranteed to get clipping on harmonization it seems like that would mess up the fluency

glacial pollen
#

" split stereo track "

#

or so

#

Gonna be somewhere here

glacial pollen
# glacial pollen

Then select the other one ( which ever you want but I personally use RX and do params measurements on both channels to pick the better one ) and delete it

#

leaving only just 1 channel ( and so, you have your file mono in the end )

sudden tree
#

maybe mine is mono lmfao i can select differently

#

haha

analog obsidian
glacial pollen
#

if it looks like so, it is stereo

sudden tree
#

wait couldnt you just pan 100% stereo lmfao

glacial pollen
#

Cause well ye, you have 2 channels visible

glacial pollen
#

I mean sure, panning

#

but it's just 1 click

#

then x one the other track

#

Done

#

That simple

sudden tree
#

ah nice i figured it out thanks

glacial pollen
#

Nice

sudden tree
glacial pollen
#

In any case

#

always go for 44.1

#

for yt

sudden tree
#

and you can train multiple files or do you need to merge them into 1 audio file

glacial pollen
#

no difference really

#

But best is imo to just use 1 track, 1 file. And do processing on 1 file ( to keep the uniformity )

sudden tree
#

yeah

analog obsidian
glacial pollen
#

the only reason overlap exists is to avoid the discontinuity in " context "

#

naturally, if you can afford to split it all on your own, properly, you can bypass it

#

But that's the best we have if it's automation

#

( I tried various methods to better it, sadly didn't work out well / significantly. Such as envelope or better rms methods )

#

Tho ye, dw about it. As lyery said, it's alright

sudden tree
#

i see

#

thank you

#

the last question i have is regarding the normalization

#

like one of my sessions is higher peaks and normalization

#

just wondering if it will mess up training

#

should i lower the gain on the loud one

#

like i just ran a normalization of -10db to match to look of each

glacial pollen
sudden tree
#

i see so dont worry about it

glacial pollen
#

But then, it all should go well if your audio comes from " same source "

sudden tree
#

well its not

#

thats the problem

glacial pollen
#

in that case, you can more or less match the " overall " volume levels

#

per clip / track

sudden tree
#

some are way louder than others so i am just normalizing until the clips look the same

glacial pollen
#

doesn't have to be perfect but it'll help

#

you can just do each track at -3 dB norm

#

that's because rvc does normalize each anyways ( each cut 3 sec segment )
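Peak normalization to -3 dB just means scaling a track so its loudest sample lands at about 0.708 of full scale. A small sketch on plain Python floats, assuming samples are in the range -1.0 to 1.0; `peak_normalize` is an illustrative name, not an Audacity or RVC function.

```python
def peak_normalize(samples, target_db=-3.0):
    """Scale samples so the largest absolute value sits at target_db dBFS."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # pure silence, nothing to scale
    gain = 10 ** (target_db / 20) / peak
    return [s * gain for s in samples]


quiet_clip = [0.1, -0.2, 0.05]
loud_clip = [0.9, -0.6, 0.3]
for clip in (quiet_clip, loud_clip):
    out = peak_normalize(clip)
    print(round(max(abs(s) for s in out), 3))  # 0.708 for both
```

Running each track through the same target puts their peaks at the same level, which is the rough matching described above.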

sudden tree
#

o shit i just normalized one to -8db to match the other

#

ok i see

glacial pollen
#

but ye, getting em to similar levels is a nice thing to do regardless

sudden tree
#

is there a way to collapse all tracks in audacity to one continuous singular

glacial pollen
#

Yes

#

That's actually the only reason I keep audacity ( and use it just for that lol )

sudden tree
#

how?

#

yeah audacity is pretty fire now that i see it

glacial pollen
#

tracks > align tracks > end to end

#

or however it's localized for you

sudden tree
#

thank you

#

are you slovak?

glacial pollen
#

Polish

analog obsidian
sudden tree
#

haha nice

#

im slav

glacial pollen
#

Nicee haha

glacial pollen
#

the way I do it is copy 1 channel

#

and paste it into new blank file

#

length will auto match

analog obsidian
#

oooh ok ok

#

ty

glacial pollen
#

pick one that seems better btw, the channel

analog obsidian
glacial pollen
#

for instance, one that has better sdr levels or dc or peaks, you know the deal

sudden tree
#

do you recommend alternating sides like left-right-left for clips or only using left-left-left

glacial pollen
#

and import again

#

Then you have one continuous file

sudden tree
#

i see thanks

glacial pollen
#

as in, doesn't matter tbh

#

best channel per track

#

but if it's the same source..

#

for instance, 1 anime but different episodes

#

I always want to believe the recording session and so on was set more or less similar

#

so in that case I do pick the same channel throughout my project ( just in case ✨ ) (( Unless one is explicitly bad or worse ))

sudden tree
#

damn the align track thing isnt working ugh!

glacial pollen
#

how so?

sudden tree
#

nevermind i figured it out

glacial pollen
#

a

sudden tree
#

should we pan center after splitting stereo?

#

@glacial pollen

glacial pollen
#

no

sudden tree
#

aight

glacial pollen
#

At least I never do that

#

guess you can try one time

#

but I don't see any point in that personally

sudden tree
#

shit i just figured out something

glacial pollen
#

I guess the biggest clue whether you should try that or not is seeing how phase behaves
if channels differ significantly in that aspect, perhaps you could try

sudden tree
#

if you export as mono and you have left and right tracks it just mutes the right one

#

hahahaha

#

ill just pan left on the rights

glacial pollen
#

oh then problem's solved, if there's no mixing or centering algo involved

sudden tree
#

haha

glacial pollen
#

then you good to go

sudden tree
#

thanks for all the help brother

#

means a lot

#

and then what settings for applio you recommend for the preprocess?

#

manual splitting settings?

glacial pollen
#

Given the most probable case for you uhhh, go for default

#

and do include preprocessing

#

it's the normalization + butterworth filtering ( 0-57hz iirc )

sudden tree
#

i see i was just asking bc the laf dude was saying 3sec with .3 sec overlap

glacial pollen
#

ye, that's the default

sudden tree
#

i see

#

he said simple tho

glacial pollen
#

automatic + 3 / 0.3

#

Simple can work too

#

but if you truncate stuff, that is
Silence truncation

sudden tree
#

i see i guess since we already truncated so simple makes sense

#

i will do

#

thank you for the help

glacial pollen
#

yup

#

Np man, best of luck

sudden tree
#

is 8 batch size good for 15 min dataset?

#

and how many epochs you rec

glacial pollen
#

It really depends
for instance I used to work with bs 12 / 14 and 16 for most of my above 10 or 13 min sets

#

yet sometimes that works like crap and 7,8, 9 are safer

#

As always I recommend bs range finding

#

train the model at: 4, 8, 12, 16 ( each for 400-500 epochs )

sudden tree
#

oh wow, I didnt know people did that

glacial pollen
#

if you're aiming for " perfectionist " model

sudden tree
#

lots of effort haha

#

makes sense tho

glacial pollen
#

well no, people do not do that

#

but I just recommend that workflow if you're a perfectionist like me lol

sudden tree
#

yeah i am lmfao

glacial pollen
#

( tho in reality, both learning rate and batch size should be picked individually )

#

Oh ye, in that case def go for that

#

and from there see on graphs + do some inference testing on various epochs

sudden tree
#

yeah i dont even know how to modify learning rate in applio

glacial pollen
#

n see which one does well

#

from there, you can finetune it even further as in, do -/+ 1 batch from the base batch size ( one that performed the best )
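The range-finding workflow can be written down as a tiny plan: a coarse pass over 4, 8, 12 and 16, then a fine pass at one below and one above whichever did best. The function names are made up for illustration, and picking the "best" run still has to be done by ear and from the graphs.

```python
COARSE_BATCH_SIZES = (4, 8, 12, 16)  # each trained for roughly 400-500 epochs


def fine_batch_sizes(best, already_tried=COARSE_BATCH_SIZES):
    """Return the -1 / +1 neighbours of the best coarse batch size."""
    neighbours = (best - 1, best + 1)
    return [b for b in neighbours if b >= 1 and b not in already_tried]


print(fine_batch_sizes(8))   # [7, 9]
print(fine_batch_sizes(16))  # [15, 17]
```

If 8 wins the coarse pass, the follow-up runs would be 7 and 9, which matches the "7, 8, 9 are safer" range mentioned earlier.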

sudden tree
#

you rec 48k sample rate>

#

?

#

also how do you change LR in applio

glacial pollen
#

what's the frequency response of your files?

sudden tree
#

not sure haha i am super noob

glacial pollen
sudden tree
#

lmfao

glacial pollen
sudden tree
#

i usually just put 48k because its highest level

glacial pollen
#

pretty basic but will do

sudden tree
#

is that bad

glacial pollen
#

model's sr should be more or less aligned with your files

#

with some minor exceptions

#

for example, a deviation of 1-2khz shouldn't hurt or 3

sudden tree
#

can i use spectrograph on audacity

glacial pollen
#

For instance, if I have somewhat imperfect audio ( can be compression ) that's ranging anywhere from 41 to 43khz or even 44

sudden tree
#

its peaking at 19k

glacial pollen
#

I'll use 48k model ( because those extra 2,3 or 4 khz does mean clarity and fidelity, esp in respiration )

sudden tree
#

well all my audio is ripped from youtube

#

so wouldnt it peak 20khz

glacial pollen
#

In that case 40khz model ye

sudden tree
#

i wonder why

glacial pollen
#

yt should never be used for 48k

sudden tree
#

why 40khz if audio is hitting 20khz

#

oh it actually looks like its hitting 18khz

glacial pollen
#

because that's nyquist range

sudden tree
#

i see haha

glacial pollen
#

Essentially

#

spectrograms

sudden tree
#

damn you have to become a audio expert for this

glacial pollen
#

for them you do *2 the top frequency you see and that's your true sr
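Put as a rule of thumb: double the highest frequency visible in the spectrogram, then take the smallest model sample rate that covers it. The sketch assumes the available model rates are 32k, 40k and 48k (the ones mentioned in this chat), and `pick_model_rate` is a hypothetical helper, not part of RVC or Applio.

```python
SUPPORTED_RATES = (32000, 40000, 48000)  # assumed set of pretrained model rates


def pick_model_rate(cutoff_hz, supported=SUPPORTED_RATES):
    """Pick the smallest supported rate that is at least twice the cutoff."""
    needed = 2 * cutoff_hz  # Nyquist: sample rate must be 2x the top frequency
    for rate in sorted(supported):
        if rate >= needed:
            return rate
    return max(supported)  # content goes higher than any model can cover


print(pick_model_rate(19000))  # 40000, typical for audio that tops out near 19 kHz
print(pick_model_rate(21500))  # 48000
```

As noted in the conversation, this is a guideline rather than a hard rule, and a deviation of a couple of kHz is usually tolerated.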

sudden tree
#

so using 48khz was ruining my models possibly?

glacial pollen
#

Quite possible yes

sudden tree
#

wow!

#

thank you

glacial pollen
#

because the models are trained on specific frequency ranges ( pretrained models )

sudden tree
#

i see

#

that makes sense due to pretrained

glacial pollen
#

they are accustomed to working within a given frequency spectrum ye

#

Yup

#

but it's not a 'hardcoded rule'

sudden tree
#

damn thanks so i gotta download the legit studio rips to be able to go to the 48khz range

glacial pollen
#

For instance

sudden tree
#

or find raw vocals with really good mics

glacial pollen
#

My Kurisu ( best model I ever made )
was 38-42 ( variable ) sr

#

yet trained on 48k

#

One of my fave tracks from Eve. Remember back in my worst days I used to spam it a lot. Oh yeah, I kinda love how I don't have to readjust Kurisu's pitch with Eve's stuff, they just click on " 0 ". Enjoy ~

Original song by Eve and all people associated with the project:
https://www.youtube.com/watch?v=nROvY9uiYYk

Cover details
Inferenced ...

#

Yet she sounds nice, right

sudden tree
#

yeah

glacial pollen
#

So there's no strict strict rule, yet it's highly advisable to stick to what I mentioned yup

sudden tree
#

yeah i wonder if the mismatch causes audio ripping or the glitching noises

glacial pollen
#

not quite

#

it primarily affects the model's potential / generalization or generally adapting to your voice ( finetuning potential )

sudden tree
#

damn cant find the custom cutting in applio

sudden tree
#

where is audio cutting setting located in training

#

cant seem to find

glacial pollen
#

show ss

sudden tree
#

i just see this

#

i cant customize the cutting @glacial pollen

glacial pollen
#

Which applio version you running?

#

Seems like outdated one

sudden tree
#

newest

glacial pollen
#

hmmm

sudden tree
#

i saw it a while ago it disappeared for somereason

glacial pollen
#

show me the full ui ss

#

upper part

sudden tree
#

im just gonna reboot rq

analog obsidian
#

Ah thats the latest compiled, yes its outdated

sudden tree
#

so fucking weird i cant find the simple cutting @analog obsidian

analog obsidian
#

use latest main branch repo

glacial pollen
#

ah, if it's compiled and not from repo

sudden tree
#

im using 3.2.8

glacial pollen
#

then outdated

#

ye but precompiled / zip packages aren't updated in-line with repo atm

sudden tree
#

can i just not use simple lmfao

glacial pollen
#

download the repo and use 3.2.8's env folder

#

and if that doesn't work, delete the borrowed 3.2.8's env folder and redownload all ( using install-applio .bat file )

#

Lyery will help you hopefully as I have to get back to my work

sudden tree
#

alr

#

imma just use default atp

knotty moth
sudden tree
#

wym

#

it is bugfixed 328

analog obsidian
#

he can't do it because he's using the latest compiled version, which is outdated

#

just do this and decompress it in your applio folder

#

don't use run-install.bat

#

no need to reinstall anything

#

run applio

knotty moth
#

or use codename's fork

sudden tree
#

i mean @analog obsidian i can still use my current version and just use default splitting?

sudden tree
#

it seems to have worked

#

what

#

fuck i am at epoch 100 alr

#

why not codename said i could

analog obsidian
#

if u want to use default splitting then don't use truncate silence

sudden tree
#

what why

#

it shouldnt matter

#

it may try to truncate for me but i alr did

analog obsidian
sudden tree
#

i see

analog obsidian
#

does not affect quality

sudden tree
#

i did it default tho

#

after truncating

analog obsidian
#

just means your model is gonna take more epochs

sudden tree
#

i see but all the 16k splits are legit 3 seconds anyways

analog obsidian
#

the truncate silence method helps rvc to learn the dataset faster

sudden tree
#

i see

#

i just dont understand the diff between simple cutting and default

analog obsidian
#

there is not an audible difference between this method and the casual old method of automatic slicing anyways

#

only thing that changes is how fast rvc learns the dataset

sudden tree
#

i see it just prevents those like 1 second clipped audios?

#

so the epochs for same audio is lower?

analog obsidian
#

yuh

#

1 second clips are ass for rvc

#

very bad

sudden tree
#

makes sense tbh tho its slicing it fire ngl

#

quality degradation ?

analog obsidian
sudden tree
#

oh i didnt know that

#

lmfao

#

i wonder what would happen if you set the time for each to 10 seconds

analog obsidian
#

not really ignoring but it separates them from the rest

sudden tree
#

i used to use like 7 sec samples

analog obsidian
#

so the model learns the dataset even slower

#

since it has to learn two things at the same time

#

instead of 1

sudden tree
#

i see

#

why dont we just use 7 sec samples or 10 sec

#

would be faster

analog obsidian
#

every 3 sec chunk get paired and every 1 sec chunk gets paired

#

and rvc learns them individually

#

smth like that

analog obsidian
#

i might be wrong with this tho no idea

sudden tree
#

ah so its the new training

#

i remember using 10 sec samples in 2023

analog obsidian
#

no u didn't, rvc sliced them

sudden tree
#

oh haha

#

troll moment

analog obsidian
#

dont worry it will not kill your quality

#

the model will learn the dataset a bit slower

#

but thats really it

#

you can continue using the old slicing method if you wish

sudden tree
#

ngl you said truncating wouldnt help but my model seems to be improving way more consistently this time

#

just looking at the loss values in cmd

analog obsidian
#

yea because like i told you, it learns it faster

sudden tree
#

i see

analog obsidian
#

so you notice it sounds good because its learning faster

sudden tree
#

just removing the silences = less dead space = faster training

#

makes sense

#

you are really just maximizing the roi

analog obsidian
#

u still need silence for training

sudden tree
#

just not a lot

analog obsidian
#

thats why the setting keeps a bit of it

#

yuh

#

rvc injects 2 silences in your dataset
#

this is bc you have to teach the model to understand what silence is

#

at least that was what noobies told me

glacial pollen
#

2 of them is really enough for typical dataset
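To make the "less dead space, but still a bit of silence" idea concrete, here is a minimal sketch. It is not Applio's or RVC's actual implementation; the function name, the amplitude threshold and the sample-count limit are all assumptions made up for illustration. Any silent run longer than the limit is shortened to the limit, so some silence survives.

```python
def truncate_silence(samples, threshold, max_silent_run):
    """Shorten long silent runs but keep up to max_silent_run samples of each.

    samples: sequence of floats in [-1.0, 1.0] (hypothetical mono audio)
    threshold: absolute amplitude below which a sample counts as silent
    max_silent_run: how many consecutive silent samples to keep
    """
    out = []
    silent_run = 0
    for s in samples:
        if abs(s) < threshold:
            silent_run += 1
            # Keep only the first max_silent_run samples of a silent stretch.
            if silent_run <= max_silent_run:
                out.append(s)
        else:
            # Voiced sample: reset the counter and always keep it.
            silent_run = 0
            out.append(s)
    return out


# Toy input: four silent samples between two voiced ones, keep at most two.
trimmed = truncate_silence([0.5, 0.0, 0.0, 0.0, 0.0, 0.4], 0.1, 2)
```

A real tool would work on frames and loudness in dB rather than single samples, but the trade-off is the same one discussed above.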

sudden tree
#

haha do you think rmvpe is the ultimate development of this technology?

#

i wonder if rvc can even improve atp

analog obsidian
glacial pollen
#

definitely not ultimate

#

but so far the best we've got

sudden tree
#

the problem is realtime translation

#

it still sounds blocky on my end with large chunk size

analog obsidian
#

well realtime performance heavily depends on the dataset

#

singing datasets are bad for speech

sudden tree
#

yessir

analog obsidian
#

while speech datasets are okayish-mid for singing (it depends)

sudden tree
#

we need to be able to develop some agi ig to make the tech flawless

glacial pollen
#

a colorful and vibrant in emotions and pitch set can sing well

sudden tree
#

yeah i try speech on juice wrld model and it works since he raps haha

sudden tree
glacial pollen
#

Any tsundere anime set will do as well

#

lol

sudden tree
#

what epoch level do yall tend to set the models at

#

like 300 for 15 mins is peak usually?

analog obsidian
#

depends on your batch size

#

but real answer: its random

#

u cannot predict it

sudden tree
#

yeah makes sense

#

since its practically learning different vocal tunes and aspects

analog obsidian
#

sadly old graphs are not accurate enough to show you which epoch to choose since they only tell you the latest value in that specific epoch

sudden tree
#

have any of yall looked into onnx model conversion for realtime

#

apparently you can offload to cpu?

analog obsidian
#

at least when i tested onnx it degraded my model quality a bit

#

and also on nvidia it's slow af

sudden tree
#

total g loss

#

then max the smoothing

analog obsidian
#

yea g/total (from 3.2.8) is outdated by now

sudden tree
#

wait why is it inaccurate

#

insane how 3.2.8 is alr outdated

analog obsidian
#

in simple words, it only tells you the latest value of that specific epoch
this means you might have a better value in another epoch and you'll never know

#

so if ur lowest g/total was 29
u might actually have another low one hidden

#

the new graphs fixed this

sudden tree
#

what cant you see on the graph every epoch?

#

i never had that issue

#

i can see all of them on the graph

analog obsidian
#

no like you see that if you hover your mouse in a random point it tells you something like "value: 32,5"

#

well that value is one of multiple values in each epoch

#

so 32,5 in the epoch 100 (for example) is just its latest value
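A small sketch of the point being made here: if a graph only stores the last loss value of each epoch, a lower value earlier in that epoch stays hidden. The numbers are made up, and how the new Applio graphs actually aggregate values is not stated in this chat, so the per-epoch mean below is only one plausible alternative.

```python
from collections import defaultdict


def summarize_epochs(step_losses):
    """step_losses: (epoch, loss) pairs, one per training step, in order."""
    per_epoch = defaultdict(list)
    for epoch, loss in step_losses:
        per_epoch[epoch].append(loss)
    return {
        epoch: {
            "last": losses[-1],                 # what a last-value-only graph shows
            "min": min(losses),                 # lowest value, hidden by "last"
            "mean": sum(losses) / len(losses),  # one possible per-epoch summary
        }
        for epoch, losses in per_epoch.items()
    }


# Made-up g/total values for two epochs with three steps each.
steps = [(99, 31.0), (99, 30.0), (99, 29.0),
         (100, 28.0), (100, 27.5), (100, 32.5)]
summary = summarize_epochs(steps)

# Picking the "best" epoch depends on which summary the graph exposes.
best_by_last = min(summary, key=lambda e: summary[e]["last"])
best_by_mean = min(summary, key=lambda e: summary[e]["mean"])
```

With these toy values, epoch 100 ends on 32.5 even though it reached 27.5 mid-epoch, so ranking by the last value and ranking by the mean pick different epochs.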

sudden tree
#

i see

#

didnt know that

#

is 3.2.8 really outdated?

#

i just installed last 2 weeks

analog obsidian
#

very outdated

#

new applio has a new optimizer

sudden tree
#

damn just no updates to the package installer huh

simple ore
#

i would not call it outdated

#

it is the official release

sudden tree
#

so just zip the github repo and unzip in the same file and replace all folders?

analog obsidian
#

yuh

#

no need to reinstall the env

sudden tree
#

rip to all my premade model logs lmfao

analog obsidian
#

F

simple ore
#

there's some cleanup in progress, so your stuff may break

analog obsidian
#

oh damn

simple ore
#

like need to update filelist.txt and replace "mute/v2_" with "mute/"
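The filelist edit described above can be done by hand in a text editor, or with a few lines of Python. This is only a sketch: the `logs/my_model/filelist.txt` location in the usage comment is a hypothetical path, and the helper names are made up here. It writes a `.bak` copy first so the original can be restored.

```python
from pathlib import Path


def fix_mute_paths(text: str) -> str:
    """Replace the old "mute/v2_" prefix with "mute/" everywhere in the text."""
    return text.replace("mute/v2_", "mute/")


def fix_filelist(path: Path) -> None:
    """Rewrite a filelist.txt in place, keeping a backup next to it."""
    original = path.read_text(encoding="utf-8")
    backup = path.with_name(path.name + ".bak")
    backup.write_text(original, encoding="utf-8")
    path.write_text(fix_mute_paths(original), encoding="utf-8")


# Usage (hypothetical path, adjust to wherever your model's filelist lives):
# fix_filelist(Path("logs/my_model/filelist.txt"))
```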

analog obsidian
#

oo

simple ore
#

and need to add soxr using env/python -m pip install soxr

sudden tree
#

shit cant i just pip install entire project haha

simple ore
#

waste of time

sudden tree
#

does anyone else have the issue when converting vocals?

#

its like the converted file is longer than input causing the vocals to be off beat

hallow thistle
#

What a waste of time.

analog obsidian
sudden tree
#

wait is g/loss/total reliable

#

apparently the smoothed one starts consistently increasing at 14k steps

#

but the higher steps still sounds vastly better

#

like 20k even sounds amazing

#

it really sounds more real

analog obsidian
sudden tree
#

is mel better?

#

keep training?

analog obsidian
# sudden tree is mel better?

mel is the clarity of your model, this metric always improves the longer you train so this is why you feel it sounds more real

sudden tree
#

hmmmm didnt know that

analog obsidian
#

but your g/total graph stopped improving the moment it started rising

sudden tree
#

what would you do w this

analog obsidian
sudden tree
#

is generalization just the ability to be applied in any scenario?

analog obsidian
#

overtrained epochs have distorted frequencies and other bad stuff

sudden tree
#

because it seriously sounds much much more accurate to juice wrld at even 30k steps

#

even though it started rising at 14k

analog obsidian
#

yuh because mel is still improving

#

so the spectrogram is clearer

#

which gives the feel the model is more realistic

sudden tree
#

i see, why does everyone use generalization when clarity is much more important for realism

knotty moth
#

mel is more likely to keep going down even more than 1k epochs

sudden tree
#

so generalization = more flexibility

analog obsidian
sudden tree
#

i see that makes sense

analog obsidian
#

but anyways like i said before the old graphs only log the last step of the epoch, we can't tell if your model is actually overtrained or not since the graph is inaccurate

sudden tree
#

so at .999 smooth the graph start increasing at 240 epochs but the lowest recorded loss was at 270
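For context on why the smoothed curve and the lowest recorded value can point at different epochs: a smoothing slider like this is usually an exponential moving average, which lags behind the raw values. The sketch below assumes that kind of smoothing (with the common start-up bias correction); it is not taken from Applio's or TensorBoard's source, and the loss values are made up.

```python
def ema_smooth(values, weight):
    """Exponential moving average with bias correction.

    weight close to 1.0 (e.g. 0.999) means very heavy smoothing.
    """
    smoothed = []
    last = 0.0
    for i, v in enumerate(values, start=1):
        last = last * weight + (1 - weight) * v
        # Correct the bias from starting the average at zero.
        smoothed.append(last / (1 - weight ** i))
    return smoothed


def argmin(values):
    """Index of the smallest value."""
    return min(range(len(values)), key=values.__getitem__)


# Made-up raw loss values with a single late dip.
raw = [5.0, 4.0, 3.0, 2.0, 3.0, 4.0, 1.0, 5.0]
smooth = ema_smooth(raw, 0.9)

raw_best = argmin(raw)        # where the lowest raw value was logged
smooth_best = argmin(smooth)  # where the smoothed curve bottoms out
```

Because the smoothed curve reacts slowly, a single low raw value barely moves it, so neither number alone settles which epoch to keep.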

analog obsidian
#

the new ones are like this

sudden tree
#

should i use 280 epoch or 240 as the base

knotty moth
analog obsidian
sudden tree
analog obsidian
#

i believe your model is not overtrained and the g/total is just fluctuating

#

but we will never know

#

the log is inaccurate

sudden tree
#

oh really?

analog obsidian
#

yuh

#

no way to tell

#

besides hearing them

#

thats how it used to be before

sudden tree
#

wow so i should just take whichever sounds best then since its inaccurate

knotty moth
sudden tree
#

makes sense

analog obsidian
#

yeah

#

choose the one u like the most since the graphs will not help at all

sudden tree
#

this is the graph for reference @knotty moth

#

its 15 min training data batch size 8

analog obsidian
#

i remember back then i used to choose an epoch based on the mel graph

#

i felt it was a bit more reliable than g/total

sudden tree
#

yeah it is honestly sounding like 30k steps is the best

#

500 epochs sounds a bit overtrained ngl

#

idk tho

#

it would theoretically make sense to choose the mel graph if converting rap vocals to rap vocals imo

analog obsidian
#

a rap model will always be good at inferencing rap songs regardless

#

its literally what its made for

sudden tree
#

true haha

#

yeah i am getting robotic noises at 30k steps

#

thats overtraining correct?

analog obsidian
#

yeah

#

robotic sounds happen when the model is overtrained

#

as long as the model is not robotic, its fine

sudden tree
#

imma just download a conversion and compare

#

haha

#

ill update because the new g total is accurate right?

analog obsidian
#

overtraining is pretty easy to spot, literally if it sounds robotic, its overtrained

analog obsidian
#

every new graph is reliable now

#

u can trust them

sudden tree
#

alright sounds great thank you

#

i will see how to update

astral jungle
#

-train

azure marshBOT
# astral jungle -train
📒 Google Colab Notebooks
ℹ️ Note

While the Colab free plan provides up to 12 hours of daily usage, the GPU is typically available for only about 4 hours each day on average.