#✨│ai-help
cracked
perhaps
🏴☠️
I have a de esser from FabFilter but I have no idea how to use it properly
It's called Pro-DS
please no cracked stuff pay full price.
you mean you paid full price right? because you are an outstanding citizen
ight, thank you for your honesty
someone's about to get banned
For using cracked software?
I don't know, I didn't read them
I just joined for the Ai models because I tried to make a producer tag
everyone 🏴☠️ it
it's literally said in our docs lol
oh
well sharing the link of it causes us troubles
well i embarrassed myself
but just saying "google how to be a pirate" won't get u banned
could piss off your isp
well i just downloaded de-esser
but i can't find it anywhere
did i download a virus from ruislip or something
ill show them
what DAW are you using?
daw?
how do you mix the vocals?
get a free daw like reaper
audacity is not ideal for audio mixing
or you can yk "buy" other daws
reaper best
@hot ledge bro, what should the Target Sample Rate be?
ok bro, thanks
Hello everyone, can someone help me? I'm generally 0 in these matters(
Hey, Leroy PVE! Please use the command !howtoask to increase your chance of getting help by structuring your question in a way others can understand better. Also make sure you're asking in the right help channel:
- General RVC help: #✨│ai-help
- W-Okada / Realtime RVC: #🔍│help-w-okada
- AI image related: #🔍│help-ai-art
that's wokada, tell me ur pc gpu in #🔍│help-w-okada
in what?
do you guys prefer klm 5.0 mini or klm 4.3 x2?
is there a way to convert .safetensors file to .pth file so I can use it in applio?
safetensors of what? gptsovits?
I used okada to make a merged voice
it created a safetensor file in the model_dir folder
i run it in the browser (it works better than the Windows one for me)
so i just used Google Colab to make a voice model from audio clips, right? works good with the voice changer but i was wondering what i can use to apply that voice model to audio. any ideas?
Does that mean that I have to create a new user on my PC?
-colab
- Applio, by IA Hispano Google Colab
- RVC Disconnected, by Kit Lemonfoot Google Colab
- RVC Mainline, by Hina Google Colab
- AICoverGen-WebUI, by Hina Google Colab
- AICoverGen-NoWebUI [English], by Ardha, fixed by Eddy, Hina and Gdr Google Colab
- AICoverGen-NoWebUI [Spanish], by Eddy, Hina and Gdr Google Colab
- UVR5 NO UI, by Eddy Google Colab
- UVR5 UI, by Eddy Google Colab
- Modified W-Okada's Voice Changer, Google Colab
- FaceFusion UI, by Nick088 Google Colab
- FaceFusion NO UI, by Nick088 Google Colab
- EasyGUI, by Rejekts Google Colab
While the Colab free plan provides up to 12 hours of daily usage, the GPU is typically available for only about 4 hours each day on average.
it means literally: do not run it as the Administrator user
you can run it as a normal user that has local admin privileges
does anyone of you know any uhh, good free voice changers?
Okay so double click wouldn't run it as admin right?
just download a compiled version, my dude
the result will be always bad even if you upscale it to 32k
What
reinstall what?
Figured it out already,no worries
is there way to convert it so I don't have to make a dataset for the merged model?
hey guys
is there a website where i can accurately predict how many epochs my model would need without becoming either undertrained or overtrained
tensorboard
why would you do that, no its not possible
wanted to create a merged voice and use it to convert audio voices into that voice-
you can simply merge two voices
I can't.
I tried in applio it doesn't work anymore.
The merged voice I have has more than 2 voices combined
are the files youre merging the same sample rate?
Yep
can you show me the error?
will do
Is there any other way to access the 15.ai site? Or is there a site as good as this site with better sound?
what do you need?
voicing for a character I want to use. realistically, with emotions and without distorting the tone of voice
try weights.gg
ok now it worked-
..somehow
but I already have premade merged voice. Is there like a long enough audio to convert in real time?
wdym
So you know how you can use voice in real time? Not only can I use my voice, but I can also convert audio in real time, pretty much. It works very well and I did it multiple times, but I just need some audio that is long enough to train the merged voice I made in okada.
(I use soundpad to make it work)
-rt
first link
what do you need to do?
whats your gpu?
I have okada already.
then what you need exactly?
they are already doing that
I only need some audio to play through my soundboard to convert it to that merged voice and record it.
Some audio that is long enough, so I can make a dataset.
nvm I scrolled too much
its stuck on cpu dawg
T-T
im using rx 570
i dont understand this at all honestly
or it is set that way
cooked badly
idk abt anything but its working
so no file input?
if u mean the inference, use applio with ZLUDA. I suppose it will be okay with at least 4 GB VRAM
Do I really need winzip for the voice changer
you have to unpack it somehow.
but i suggest 7zip
best way to like stop cutting the audio out?
7zip or winrar is preferrable
realtime?
hun, 7zip>>>
set gpu to your gpu
Why do voice changers have to cost?
set f0 to rmvpe
aint working tho only cpu
the one we have its free
oh
Yeah, but I need winzip
download 7zip
Is that free
both work at least
yes
i download it after
true
7zip is open source
this
..okay let me explain it differently.
I use Voicemeeter Banana to make this happen. B2 is the input device used to convert anything played into it. B1 is what gets played through the actual microphone. When I play audio through the soundboard, I can play it into B2 to convert it into the different voice in real time (since I cannot convert it in applio). With that, I can make a dataset.
Can you send me the link
you have AMD
google is free
ure telling me that u want some audio played into okada, so okada converts it into a different voice and plays it thru to ur mic
an audio that came from ur soundboard?
soundboard > okada > mic
correct. So I can make a dataset.
I really don't wanna spend more hours to make it in applio again.
Are you sure this is safe
sends any kind of audio thru my mic
you wanna make a dataset out of converted audios?
yes and if you dont trust check internet
yes. I did it before in applio and it worked out just fine.
why would you make a dataset of an already existing model
imma extract the file
use the original model file before you added it into the voice changer
merged voice makes only a safetensor file.
only if you do it within the voice changer
-rt
first link
that explains it
yea you need to merge in mainline (discontinued), applio (suggested) or ilaria rvc mainline (discontinued)
in okada, yes.
I'm not good with PC stuff But I can try
just follow the guide, its easy :)
chat why im on my browser 😭
I'll just figure it out. I only need a long audio from youtube.
please follow the guide
again follow what i said
That's wokada, use #🔍│help-w-okada
?
dont merge with the voice changer
Okay. Can I have the link to the voice changer
-rt
first link
oh right, will keep in mind for now, like I said I never knew it makes safetensor file in the first place.
merged files in applio will be in pth, exactly what you need
I don't see the Voice changer in 7zip
Realtime voice changer for calls? Tell me ur PC GPU in #🔍│help-w-okada
it keeps on changing
How to make character voices with emotions such as angry, confused, sad etc.?
you cant
assuming youre using tts
what ?
are you using text to make them speak?
yes
then you cant
yo mods can i get permission to share my screen?
for?
i wanna share my screen when im playing Dragon Ball: Sparking! Zero with a friend
if thats ok?
share
you can join and see as well if you want to
ask mods for streaming permission, and be sure to avoid showing any inappropriate stuff

@quick jungle can i get permission to share my screen
Watchu want? :3
its a bot
yo do you want me to still fix it
I mean if u could fix it it would be good, but it's been left to rot for months
Sorry, I tried a couple of times but my RVC keeps showing "Frequent errors occur. Please check if the model of the framework being targeted is loaded."
And my colab is showing that my server is not an accepted origin (further occurrences of this error will be logged with level INFO)
I tried to search in github but I still can't find anyway to fix it

(Fun story: it took me 3 hours to do pip install pip==24.0)
download a precompiled version
What does that mean (Sorry, I'm a script noob)
it comes with everything preinstalled basically
ye
why?
what colab are you using?
google colab
Alr
dmed u
alr mb
-ngrok
yeah
damn
ohhh its a voice changer
yeah
Cuz when I use my gpu
the res of it
are 20k ms
basically it takes 20s to transfer my voice to u know
what gpu do you have?
are you using the forked version?
forked version?
the new version
Yes
better support for amd cards
1.5.3.18a
first link
alr tysm
also use #🔍│help-w-okada
np
skill issue
dw all fine
smh
why….
I spent 56.5 for using that
I thought google was p2w 💀
lmao
u can use it for free 4 hours daily (not guaranteed), or use ur own gpu
I even used the limit of ngrok
I need to use my 2nd Google account to log in to use it
u can use horizon
it's another tunnel
anyways, ur gpu should be good enough
More than enough ig
old colab too, https://colab.research.google.com/github/hinabl/voice-changer-colab/blob/master/Hina_Modified_Realtime_Voice_Changer_on_Colab.ipynb is the updated one
yeah, u should follow the wokada deiteris fork guide
let him download the fork
I did
ye i was just saying
setting up tho
where can you find models like the melband roformer karaoke model by viper?
do they upload them? I am too nervous to download some random ckpt on huggingface lmfao
@viscid moss you got this
he already sent me the mega but idk where he obtained file from
I wanna know the source
The ViperX models are here:
https://github.com/TRvlvr/model_repo/releases/tag/all_public_uvr_models
And most of the UVR5 models
*she, btw
well u can check the model hash, that way you make sure it's the same original just re-uploaded
also huggingface is safe :)
ye
thanks a lot guys
well doesnt huggingface literally virus check each file regardless
just wondering why isnt the viperx karaoke and stuff included in uvr 5 model download set
anjok is working on that
oh thats sick
Will be available soon ig
nice
thank you
i wonder why everyone recs the mvsep when the queue is ungodly lmfao
not worth
What sample rate should I use?
32k
Alright, thanks for the help.
no problem!
everyone always says you don't need to cut audio yourself, but I realized when I train with my own clips, my Crepe models even beat the RMVPE models! I think the auto clipping of applio causes the ai to be confused. Has anyone else experienced this?
wdym by train with my own clips
like, slicing the dataset yourself and disabling rvc's splitting?
are you using the script found in the docs? you're supposed to slice the whole dataset in chunks of 3 seconds with an overlap of 0.3
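To make "3-second chunks with a 0.3-second overlap" concrete, here is a minimal Python sketch. It assumes each chunk starts 2.7 s after the previous one and that the final chunk is simply allowed to be shorter; `chunk_bounds` is a hypothetical helper for illustration, not the slicing script from the docs or Applio's own slicer.

```python
def chunk_bounds(n_samples: int, sr: int,
                 chunk_s: float = 3.0, overlap_s: float = 0.3) -> list[tuple[int, int]]:
    """Return (start, end) sample indices for fixed-length overlapping chunks.

    Hypothetical helper: each chunk is `chunk_s` long and starts
    `chunk_s - overlap_s` seconds after the previous one; the last chunk
    is clipped to the end of the audio.
    """
    chunk = int(round(chunk_s * sr))
    hop = chunk - int(round(overlap_s * sr))
    if hop <= 0:
        raise ValueError("chunk length must be longer than the overlap")
    if n_samples <= 0:
        return []
    bounds = []
    start = 0
    while True:
        end = min(start + chunk, n_samples)
        bounds.append((start, end))
        if end >= n_samples:
            break
        start += hop
    return bounds


# 9 seconds of 16 kHz audio -> chunks start at 0 s, 2.7 s, 5.4 s and 8.1 s
for start, end in chunk_bounds(9 * 16000, 16000):
    print(f"{start / 16000:.1f}s -> {end / 16000:.1f}s")
```

Working in integer sample indices rather than float seconds avoids rounding drift when the hop is added repeatedly.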
and crepe vs rmvpe the difference is subtle, crepe models are softer while rmvpe are more harsh
i just basically use an audio ceiling to prevent white noise
and then split the audio into like 5 min chunks
rmvpe has always sounded better to me tho
yes bc it's more robust and newer, crepe is literally from 2018
i wish there was just a vid of someone training with the doc settings
rmvpe was made in 2023
most vids just do what I do and throw the audio in the training
I did not use the settings in the doc lmfao
this is bad, hifigan can't read files over 5 secs
you're not slicing it yourself, rvc is slicing the dataset for u
no, but the pretrain split it for me automatically in applio
Yes i use the applio pre cut setting
so you didn't disable rvc splitting
and the process
no I didnt
but still my old crepe model sounds better when i split it by hand
every training is different + batch size matters
true I just run like 8 batch size even though i have 12 GB VRAM bc i train in fp32
u can actually get different results using the same exact parameters
was activating fp32 a big mistake maybe?
enabling fp32 is a W move
fp16 is too unstable
or maybe i should deactivate the process audio preset in applio?
nope leave it enabled
also should I make the input audio loud or just leave it as is?
some of the audio i train is a raw vocal dataset and is quiet
i can tell you the "right" (not really) way to preprocess a dataset
sure
so open audacity, open your dataset (if your dataset is multiple audio files, merge them into one audio before doing this), select the whole audio, find Truncate Silence and use these settings:
damn i forgot
before doing that, convert the dataset to mono
bc rvc cant read stereo files
oh shit so could that have ruined my training?
no bc applio converts it to mono anyways
ah lmfao
but since you're doing this method, you should convert it to mono
isnt truncate silence the same as doing a noise gate in FL?
no
this literally removes silences
and leaves only the speech audio
so like this
yup
after you have your truncate silence dataset do the next step
use these settings and you should be fine
only use simple mode if you truncated the silence
never use it for datasets that have silence
ok thank you a lot for this
is there a full guide so i can train
My only question is why truncate instead of just using the auto setting?
Last update: Dec 24, 2024
automatic mode leaves more silences, it's not technically bad but you require more epochs to train
ok thanks where do you find these links
-docs
- How to use RVC Mainline Colab by Cauthess
- AICoverGen Colab Guide by Eddy (Spanish Helper)
- Create a model with RVC disconnected (colab) by Angetyde
^
Hey, hypeslxyer! Please use the command !howtoask to increase your chance of getting help by structuring your question in a way others can understand better. Also make sure you're asking in the right help channel:
- General RVC help: #✨│ai-help
- W-Okada / Realtime RVC: #🔍│help-w-okada
- AI image related: #🔍│help-ai-art
is 8 batch size good for 13 min vocal data
yes
what about 12 or 4
4 is unsafe but works in some cases where the dataset is very monotone and repetitive
12 works in some cases as well
where 8 fails
do you just use the built in pretrain?
yes, original pretrain
ok just wondering if going to 12 batch size would help it
8 is safer
hopefully truncate silence will remove all random noises in a studio sesh
as long as they're below -42.5 dB
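A rough numpy sketch of the same idea as Audacity's Truncate Silence, assuming a mono float signal in [-1, 1]. The -42.5 dB threshold is the value mentioned just above; the 0.5 s minimum silence length and the 0.1 s of silence kept per gap are placeholder assumptions, not the settings from the screenshots (which aren't in this log). It is an illustration of the technique, not Audacity's actual algorithm.

```python
import numpy as np


def truncate_silence(x: np.ndarray, sr: int, threshold_db: float = -42.5,
                     min_silence_s: float = 0.5, keep_s: float = 0.1,
                     frame_s: float = 0.01) -> np.ndarray:
    """Shorten long silent stretches of a mono float signal.

    Frames whose RMS level is below `threshold_db` (dBFS) count as silent.
    Any silent run of at least `min_silence_s` is cut down to `keep_s`,
    so a little silence survives between phrases.
    """
    frame = max(1, int(round(sr * frame_s)))
    n_frames = int(np.ceil(len(x) / frame))
    padded = np.zeros(n_frames * frame)
    padded[: len(x)] = x
    rms = np.sqrt(np.mean(padded.reshape(n_frames, frame) ** 2, axis=1))
    silent = 20 * np.log10(np.maximum(rms, 1e-12)) < threshold_db

    min_frames = int(round(min_silence_s / frame_s))
    keep_frames = int(round(keep_s / frame_s))
    keep = np.ones(n_frames, dtype=bool)
    i = 0
    while i < n_frames:
        if not silent[i]:
            i += 1
            continue
        j = i
        while j < n_frames and silent[j]:
            j += 1
        if j - i >= min_frames:          # long silence: keep only its first keep_s
            keep[i + keep_frames: j] = False
        i = j
    # expand the per-frame mask to samples and drop the zero padding
    return x[np.repeat(keep, frame)[: len(x)]]
```

For example, at 1 kHz a 0.5 s phrase, 2 s of digital silence and another 0.5 s phrase come out 1.1 s long: both phrases plus 0.1 s of the gap. Gaps shorter than the minimum are left alone, which matches the point that a little silence should remain in the dataset.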
also do you personally use the melband roformer karaoke models to isolate leads?
yuh, only good model we have for vocals that have harmonies
lmfao, audacity doesnt take m4as jeez
😭
you can use ffmpeg
a must have for everyone working with audio
my beloved
well no, m4a is not a codec
bro convert it to wav
it is a container
lmao
oh i thought m4a was codec
nice for a while i thought youtube had opus but i dont see it anymore to download
it's kinda mp4 counterpart
just the difference is, it doesn't contain " video layer "
essentially, it's " MPEG-4 Audio Layer "
lmfao now i gotta learn cmd ffmpeg fuck!
Ye, Opus is in fact a very good codec
for lossy stuff
direct successor of vorbis ( ogg )
.\ffmpeg -i audio.m4a audio.wav
^ ye, will unwrap the container and get it to wave
Tho as Lyery said, if you're working on audio from youtube, use yt-dlp.exe ( a cli tool )
it'll fetch the audio from yt servers in best possible available quality ( mostly opus and rarely aac ) And then convert that using ffmpeg
(( that's my exact workflow for ' yt sourced audio ' ))
damn never knew m4a was a container, does it just hold mp3 atp?
I believe aac
i use ytdlp but cmd version and i can only see m4as and mp4s i just grab best m4a tbh
use -x arg
damn wtf converting it to wav made filesize 10x
.\yt-dlp.exe -x url
else, aac or m4a ( still aac I believe. )
yeah -x usually grabs the video tho ngl
well no
no
oh sh
if wav is container, how does it make file size increase if bitrate stays exact same
The thing is
wave pcm is not using any compression
so effectively, whatever data would be there (and isn't, since the file comes from lossy compression)
gets ' 0 filled '
that's a thing that has to be done, no other way.
All the missing data is just filled in
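The roughly tenfold jump is plain arithmetic: a WAV of PCM audio stores every sample of every channel uncompressed, while the m4a held a lossy stream. A quick sketch; the 128 kbps figure for the YouTube AAC source is an assumption (actual bitrates vary), and the small WAV header is ignored.

```python
def pcm_bytes_per_second(sample_rate: int, bit_depth: int, channels: int) -> int:
    """Uncompressed PCM stores every sample of every channel explicitly."""
    return sample_rate * (bit_depth // 8) * channels


def size_ratio_vs_lossy(sample_rate: int, bit_depth: int, channels: int,
                        lossy_kbps: float) -> float:
    """How many times larger the decoded WAV is than a lossy stream."""
    pcm_kbps = pcm_bytes_per_second(sample_rate, bit_depth, channels) * 8 / 1000
    return pcm_kbps / lossy_kbps


# 16-bit stereo at 44.1 kHz is 1411.2 kbps; vs. an assumed 128 kbps AAC stream:
print(size_ratio_vs_lossy(44100, 16, 2, 128))  # ~11x
```

The same formula shows that a 32-bit float mono file at 44.1 kHz has exactly the same data rate as 16-bit stereo (176,400 bytes per second).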
ok so i am a noob
So yea, whatever you have or get from yt-dlp -> wave
that wave after editing / processing -> 32 bit float 44.1khz
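A sketch of the "yt-dlp output -> 32-bit float 44.1 kHz wave" step done with ffmpeg from Python. The flags (`-af pan=...`, `-ar`, `-c:a pcm_f32le`) are standard ffmpeg options; the helper name `rvc_wav_command` and the choice of the left channel by default are assumptions for illustration. The `pan` filter copies a single channel, whereas `-ac 1` would mix both channels together.

```python
import shlex


def rvc_wav_command(src: str, dst: str, channel: int = 0,
                    sample_rate: int = 44100) -> list[str]:
    """Build an ffmpeg command: one channel only, 44.1 kHz, 32-bit float WAV."""
    return [
        "ffmpeg", "-i", src,
        # copy a single input channel (c0 = left, c1 = right) instead of mixing L+R
        "-af", f"pan=mono|c0=c{channel}",
        "-ar", str(sample_rate),   # resample, e.g. from 48 kHz Opus
        "-c:a", "pcm_f32le",       # 32-bit float little-endian PCM
        dst,
    ]


cmd = rvc_wav_command("audio.m4a", "audio.wav")
print(shlex.join(cmd))
# import subprocess; subprocess.run(cmd, check=True)  # to actually run it (needs ffmpeg on PATH)
```

Passing the arguments as a list avoids having to quote the `|` in the filter string for your shell. If the source is already mono, keep `channel=0`.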
and i just did a zoom out on the audacity and the audio is completely peaked out now
tf
oh i zoomed on the db lmfao hahaha
32 bit float wave?
yes
it's the bit depth
32 bit float is the " target end " for files that rvc processes anyways
and if you are using different songs etc we can import multiple files into the training at different db normalizations?
- you avoid potential issues during editing
or should each separate session be normalized to an equal level
Well, the whole dynamics aspect of rvc is a lil skewed up anyways
Biggest issue is, if you mangle with dynamics on your own ( be it rms, peak norm or compression )
it can screw up model's ability to express itself well at high volumes. it'll cause tearing
so at best... if you have to, tame the peaks and maybe add a tiny bit of compression
export audio as mono wav or stereo?
wave
i forgot to make the audio mono lmfao
just copy one channel into a blank file
( aka, do not use any " merge channels " algos or such )
?
mono wav
you wanna delete 1 channel from the audio
either L or R
and just save it as mono wave
tracks > mix > convert stereo to mono
Alternatively, copy / highlight just 1 channel of your choice and paste it over
Cause like, depending on what audacity does
i see
if it fuses the channels / centers em, it's pretty bad
that's a " merged mono " and not true mono
so lyery is telling me to merge, and you're saying delete one to prevent the wrongful merge and distortion?
how to delete one track?
he knows better than me
Because rather than raw mono, you get a fuse of channels, ish (as long as audacity does that, which I am not 100% sure of)
yeah how do you do it
as it does kind of algo magic and averaging of phase and such
hold on
also clipping every 3 sec with a .3 sec overlap seems like a disaster intuitively to me idk why
You'll have uhhh
if you are training on only 3 sec you are guaranteed to get clipping on harmonization it seems like that would mess up the fluency
no
Then select the other one (whichever you want, but I personally use RX and measure params on both channels to pick the better one) and delete it
leaving just 1 channel (and so, your file is mono in the end)
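Why copying a single channel can beat a naive "(L+R)/2" merge, in a minimal numpy sketch. The example is deliberately extreme (a polarity-inverted right channel); real recordings only partially cancel, and whether Audacity's own mixdown behaves exactly like this plain average is not confirmed here. `channel_report` is a hypothetical helper for comparing DC offset and peaks per channel before choosing one.

```python
import numpy as np


def pick_channel(stereo: np.ndarray, channel: int) -> np.ndarray:
    """'True' mono: copy one channel of an (n, 2) array unchanged."""
    return stereo[:, channel].copy()


def average_downmix(stereo: np.ndarray) -> np.ndarray:
    """'Merged' mono: (L + R) / 2, which can cancel out-of-phase content."""
    return stereo.mean(axis=1)


def channel_report(stereo: np.ndarray) -> list[dict]:
    """Per-channel DC offset and peak, to help choose the cleaner channel."""
    return [{"dc": float(stereo[:, c].mean()),
             "peak": float(np.abs(stereo[:, c]).max())}
            for c in range(stereo.shape[1])]


# Extreme case: the right channel is the left channel with inverted polarity.
t = np.arange(44100) / 44100
left = 0.5 * np.sin(2 * np.pi * 220 * t)
stereo = np.stack([left, -left], axis=1)

print(np.abs(average_downmix(stereo)).max())  # 0.0: the averaged "mono" is silent
print(np.abs(pick_channel(stereo, 0)).max())  # ~0.5: one channel keeps the signal
```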

wait couldnt you just pan 100% stereo lmfao
Cause well ye, you have 2 channels visible
im interested in this 🥹
I mean sure, panning
but it's just 1 click
then X out the other track
Done
That simple
ah nice i figured it out thanks
Nice
i wonder why
and you can train multiple files or do you need to merge them into 1 audio file
no difference really
But best is imo to just use 1 track, 1 file, and do processing on that 1 file (to keep it uniform)
yeah
uuuh i can't remember the explanation lmao 😭 but dw this method is fine
well
the only reason overlap exists is to avoid the discontinuity in " context "
naturally, if you can afford to split it all on your own, properly, you can bypass it
But that's the best we have if it's automation
(I tried various methods to improve it, such as envelope or better RMS methods; sadly they didn't work out well / significantly.)
Tho ye, dw about it. As lyery said, it's alright
i see
thank you
the last question i have is regarding the normalization
like one of my sessions is higher peaks and normalization
just wondering if it will mess up training
should i lower the gain on the loud one
like i just ran a normalization of -10db to match to look of each
tbf, normalization won't help here much either way ( not as much as compression which in the same time can break stuff but whatev. )
i see so dont worry about it
But then, it all should go well if your audio comes from " same source "
in that case, you can more or less match the " overal " volume levels
per clip / track
some are way louder than others so i am just normalizing until the clips look the same
doesn't have to be perfect but it'll help
you can just do each track at -3 dB norm
that's because rvc does normalize each anyways ( each cut 3 sec segment )
but ye, getting em to similar levels is a nice thing to do regardless
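A minimal sketch of "-3 dB norm" as plain peak normalization, assuming float audio in [-1, 1]: one gain for the whole track so its loudest sample lands at -3 dBFS. Unlike compression, this leaves the dynamics within the track untouched, which fits the advice not to mangle dynamics.

```python
import numpy as np


def peak_normalize(x: np.ndarray, target_dbfs: float = -3.0) -> np.ndarray:
    """Scale the whole signal so its highest absolute sample sits at target_dbfs."""
    peak = float(np.max(np.abs(x)))
    if peak == 0.0:            # pure silence: nothing to scale
        return x.copy()
    gain = 10 ** (target_dbfs / 20) / peak
    return x * gain


quiet = np.array([0.05, -0.12, 0.08])
loud = np.array([0.4, -0.9, 0.6])
for track in (quiet, loud):
    print(np.abs(peak_normalize(track)).max())  # both ~0.708 (i.e. -3 dBFS)
```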
is there a way to collapse all tracks in audacity to one continuous singular
Yes
That's actually the only reason I keep audacity ( and use it just for that lol )
Polish
do i need to use this before exporting one of the stereo tracks as mono?
Nicee haha
Oh, well no
the way I do it is copy 1 channel
and paste it into new blank file
length will auto match
pick one that seems better btw, the channel

for instance, one that has better sdr levels or dc or peaks, you know the deal
do you recommend alternating sides like left-right-left for clips or only using left-left-left
After that you export it
and import again
Then you have one continuous file
i see thanks
Not really
as in, doesn't matter tbh
best channel per track
but if it's the same source..
for instance, 1 anime but different episodes
I always want to believe the recording session and so on was set more or less similar
so in that case I do pick the same channel throughout my project (just in case ✨), unless one is explicitly bad or worse
damn the align track thing isnt working ugh!
how so?
nevermind i figured it out
a
no
aight
At least I never do that
guess you can try one time
but I don't see any point in that personally
shit i just figured out something
I guess the biggest clue whether you should try that or not is seeing how phase behaves
if channels differ significantly in that aspect, perhaps you could try
if you export as mono and you have left and right tracks it just mutes the right one
hahahaha
ill just pan left on the rights
oh then problem's solved, if there's no mixing or centering algo involved
haha
then you good to go
thanks for all the help brother
means a lot
and then what settings for applio you recommend for the preprocess?
manual splitting settings?
Given the most probable case for you uhhh, go for default
and do include preprocessing
it's the normalization + butterworth filtering ( 0-57hz iirc )
i see i was just asking bc the laf dude was saying 3sec with .3 sec overlap
ye, that's the default
automatic + 3 / 0.3
Simple can work too
but if you truncate stuff, that is
Silence truncation
i see i guess since we already truncated so simple makes sense
i will do
thank you for the help
It really depends
for instance I used to work with bs 12 / 14 and 16 for most of my above 10 or 13 min sets
yet sometimes that works like crap and 7, 8, 9 are safer
As always I recommend bs range finding
train the model at: 4, 8, 12, 16 ( each for 400-500 epochs )
oh wow, I didnt know people did that
if you're aiming for " perfectionist " model
well no, people do not do that
but I just recommend that workflow if you're a perfectionist like me lol
yeah i am lmfao
( tho in reality, both learning rate and batch size should be picked individually )
Oh ye, in that case def go for that
and from there see on graphs + do some inference testing on various epochs
yeah i dont even know how to modify learning rate in applio
and see which one does well
from there, you can finetune it even further as in, do -/+ 1 batch from the base batch size ( one that performed the best )
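The batch-size range finding described above, written out as a tiny Python plan. The training runs themselves are not shown, and "score" is a hypothetical stand-in for however you judge each run (graphs plus inference tests on various epochs); lower is treated as better here. The scores below are placeholders, not measurements.

```python
def coarse_batch_sizes() -> list[int]:
    """First pass from the chat: one run per size, ~400-500 epochs each."""
    return [4, 8, 12, 16]


def best_batch_size(scores: dict[int, float]) -> int:
    """Pick the size with the lowest score (your own rating or a loss value)."""
    return min(scores, key=scores.__getitem__)


def refine_around(best: int, step: int = 1) -> list[int]:
    """Second pass: try one size below and one above the winner."""
    return [bs for bs in (best - step, best, best + step) if bs >= 1]


# Hypothetical placeholder scores, NOT real measurements:
scores = {4: 31.2, 8: 29.8, 12: 30.5, 16: 32.0}
best = best_batch_size(scores)
print(best, refine_around(best))  # 8 [7, 8, 9]
```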
what's the frequency response of your files?
not sure haha i am super noob
nah, don't touch that, i was just giving examples
lmfao
You can try spek software
i usually just put 48k because its highest level
pretty basic but will do
is that bad
model's sr should be more or less aligned with your files
with some minor exceptions
for example, a deviation of 1-2 khz (or even 3) shouldn't hurt
can i use spectrograph on audacity
For instance, if I have somewhat imperfect audio (could be from compression) that's ranging anywhere from 41 to 43khz or even 44
its peaking at 19k
I'll use 48k model (because those extra 2, 3 or 4 khz do mean clarity and fidelity, esp. in breaths)
In that case 40khz model ye
i wonder why
yt should never be used for 48k
because that's nyquist range
i see haha
damn you have to become a audio expert for this
for those you multiply the cutoff by 2 and that's your true sr
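The "multiply by 2" rule is the Nyquist relation: a spectrum that stops at 19 kHz (as read off Spek) only carries content that needs about a 38 kHz sample rate. A sketch that turns that into a model sample-rate choice among the 32k/40k/48k options. "Pick the smallest rate that covers it" is an assumed rule of thumb that reproduces the two cases in this chat (19 kHz cutoff → 40k model, ~41 to 44 kHz effective → 48k model), not a hard rule.

```python
MODEL_SAMPLE_RATES = (32000, 40000, 48000)  # target rates offered for RVC/Applio training


def effective_sample_rate(cutoff_hz: float) -> float:
    """Nyquist: content that reaches cutoff_hz needs a sample rate of 2 * cutoff_hz."""
    return 2.0 * cutoff_hz


def pick_model_sample_rate(cutoff_hz: float, options=MODEL_SAMPLE_RATES) -> int:
    """Smallest model rate that still covers the audio's real bandwidth (assumed rule)."""
    needed = effective_sample_rate(cutoff_hz)
    for sr in sorted(options):
        if sr >= needed:
            return sr
    return max(options)


print(pick_model_sample_rate(19000))  # 19 kHz cutoff -> 38 kHz -> 40000
print(pick_model_sample_rate(21500))  # ~43 kHz effective -> 48000
```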
so using 48khz was ruining my models possibly?
Quite possible yes
because the models are trained on specific frequency ranges (pretrained models)
they are accustomed to working within a given frequency spectrum ye
Yup
but it's not a 'hardcoded rule'
damn thanks so i gotta download the legit studio rips to be able to go to the 48khz range
For instance
or find raw vocals with really good mics
My Kurisu ( best model I ever made )
was 38-42 ( variable ) sr
yet trained on 48k
One of my fave tracks from Eve. Remember back in my worst days I used to spam it a lot. Oh yeah, I kinda love how I don't have to readjust Kurisu's pitch with Eve's stuff, they just click on " 0 ". Enjoy ~
Original song by Eve and all people associated with the project:
https://www.youtube.com/watch?v=nROvY9uiYYk
Cover details
Inferenced ...
Yet she sounds nice, right
yeah
So there's no strict rule, yet it's highly advisable to stick to what I mentioned yup
yeah i wonder if the mismatch causes audio ripping or the glitching noises
not quite
it primarily affects the model's potential / generalization, or generally how well it adapts to your voice (finetuning potential)
damn cant find the custom cutting in applio
that makes sense actually haha
where is audio cutting setting located in training
cant seem to find
show ss
newest
hmmm
i saw it a while ago it disappeared for somereason
im just gonna reboot rq
Ah thats the latest compiled, yes its outdated
so fucking weird i cant find the simple cutting @analog obsidian
use latest main branch repo
ah, if it's compiled and not from repo
im using 3.2.8
then outdated
ye but precompiled / zip packages aren't updated in-line with repo atm
can i just not use simple lmfao
download the repo and use 3.2.8's env folder
and if that doesn't work, delete the borrowed 3.2.8's env folder and redownload all ( using install-applio .bat file )
Lyery will help you hopefully as I have to get back to my work
without the bugfix it's broken
tldr of this convo:
we teach him the truncate silence method of slicing
he can't do it because he's using the latest compiled version, which is outdated
just do this and decompress it in your applio folder
don't use run-install.bat
no need to reinstall anything
run applio
or use codename's fork
i mean @analog obsidian i can still use my current version and just use default splitting?
no
it seems to have worked
what
fuck i am at epoch 100 alr
why not codename said i could
if u want to use default splitting then don't use truncate silence
well the idea of this method is that the chunks are meant to be consistent, rvc old slicing is not consistent so yeah
i see
does not affect quality
just means your model is gonna take more epochs
i see but all the 16k splits are legit 3 seconds anyways
the truncate silence method helps rvc to learn the dataset faster
there is not an audible difference between this method and the casual old method of automatic slicing anyways
only thing that changes is how fast rvc learns the dataset
i see it just prevents those like 1 second clipped audios?
so the epochs for same audio is lower?
rvc kinda ignores those samples
oh i didnt know that
lmfao
i wonder what would happen if you set the time for each to 10 seconds
not really ignoring but it separates them from the rest
i used to use like 7 sec samples
so the model learns the dataset even slower
since it has to learn two things at the same time
instead of 1
every 3 sec chunk gets paired and every 1 sec chunk gets paired
and rvc learns them individually
smth like that

iirc hifigan only accepts 5 sec max
i might be wrong with this tho no idea
dont worry it will not kill your quality
the model will learn the dataset a bit slower
but thats really it
you can continue using the old slicing method if you wish
ngl you said truncating wouldnt help but my model seems to be improving way more consistently this time
just looking at the loss values in cmd
yea because like i told you, it learns it faster
i see
so you notice it sounds good because its learning faster
just removing the silences = less dead space = faster training
makes sense
you are really just maximizing the roi
u still need silence for training
just not a lot
thats why the setting keeps a bit of it
yuh
- rvc injects 2 silences in your dataset
this is bc you have to teach the model to understand what silence is
at least that's what noobies told me
2 of them is really enough for typical dataset
haha do you think rmvpe is the ultimate development of this technology?
i wonder if rvc can even improve atp
very robust and good
the problem is realtime translation
it still sounds blocky on my end with large chunk size
well realtime performance heavily depends on the dataset
singing datasets are bad for speech
yessir
while speech datasets are okayish-mid for singing (it depends)
we need to be able to develop some agi ig to make the tech flawless
^ There
a set that's colorful and vibrant in emotion and pitch can sing well
yeah i try speech on juice wrld model and it works since he raps haha
lmfao reminds me of alex jones set
what epoch level do yall tend to set the models at
like 300 for 15 mins is peak usually?
sadly old graphs are not accurate enough to show you which epoch to choose since they only tell you the latest value in that specific epoch
have any of yall looked into onnx model conversion for realtime
apparently you can offload to cpu?
at least when i tested onnx it degraded my model quality a bit
and also on nvidia is slow af
i just use g/loss ngl
total g loss
then max the smoothing
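What "max the smoothing" does, roughly: TensorBoard's smoothing slider applies an exponential moving average (the bias-corrected form below is an assumption about how it is computed), so a weight near 0.999 heavily lags and flattens the curve. A sketch with made-up numbers showing that the lowest raw point and the lowest smoothed point need not land on the same epoch.

```python
def ema_smooth(values: list[float], weight: float = 0.999) -> list[float]:
    """Exponential moving average with bias correction for the zero start."""
    smoothed, last = [], 0.0
    for i, v in enumerate(values, start=1):
        last = last * weight + (1 - weight) * v
        smoothed.append(last / (1 - weight ** i))
    return smoothed


def argmin(xs: list[float]) -> int:
    return min(range(len(xs)), key=xs.__getitem__)


# Made-up per-epoch losses, only to show that raw and smoothed minima can differ
raw = [10, 8, 6, 5, 4.5, 4.8, 5.5, 6, 7]
smooth = ema_smooth(raw, weight=0.5)
print(argmin(raw), argmin(smooth))  # 4 5
```

With heavier smoothing the gap between the two minima generally grows, so it's worth checking the raw curve (and your ears) before settling on an epoch.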
yea g/total (from 3.2.8) is outdated by now
in simple words, it only tells you the latest value of that specific epoch
this means you might have a better value in another epoch and you'll never know
so if ur lowest g/total was 29
u might actually have another low one hidden
the new graphs fixed this
what cant you see on the graph every epoch?
i never had that issue
i can see all of them on the graph
no like you see that if you hover your mouse in a random point it tells you something like "value: 32,5"
well that value is one of multiple values in each epoch
so 32,5 in the epoch 100 (for example) is just its latest value
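A small sketch of why "the latest value of the epoch" can mislead: with made-up step losses, the last-step view and the per-epoch average disagree about which epoch is better. Per the chat, older Applio builds logged the former, while the main branch and codename's fork log the latter; the helper names here are hypothetical.

```python
from statistics import fmean


def last_step_per_epoch(step_losses: list[list[float]]) -> list[float]:
    """Old-style logging: keep only the final step's loss of each epoch."""
    return [steps[-1] for steps in step_losses]


def mean_per_epoch(step_losses: list[list[float]]) -> list[float]:
    """New-style logging: average every step's loss within the epoch."""
    return [fmean(steps) for steps in step_losses]


# Made-up step losses for two epochs
epochs = [[34.0, 30.0, 32.5],   # epoch A: last step happens to be 32.5
          [31.0, 29.0, 33.0]]   # epoch B: unlucky last step, better average
print(last_step_per_epoch(epochs))  # [32.5, 33.0] -> epoch A looks better
print(mean_per_epoch(epochs))       # ~[32.17, 31.0] -> epoch B is better
```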
damn just no updates to the package installer huh
so just zip the github repo and unzip it in the same folder and replace all folders?
rip to all my premade model logs lmfao
F
there's some cleanup in progress, so your stuff may break
like need to update filelist.txt and replace "mute/v2_" with "mute/"
oo
and need to add soxr using env/python -m pip install soxr
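If you'd rather not hand-edit, the filelist step can be scripted. A minimal sketch, assuming the only change needed is the literal `mute/v2_` → `mute/` substitution mentioned above; where your filelist.txt lives and whether other fixes are needed after the cleanup are not covered here, and `fix_filelist` is a hypothetical helper.

```python
import shutil
from pathlib import Path


def fix_filelist(path: Path, old: str = "mute/v2_", new: str = "mute/") -> int:
    """Replace `old` with `new` in a filelist, keeping a .bak copy. Returns the count."""
    # newline="" keeps the file's original line endings untouched
    with path.open("r", encoding="utf-8", newline="") as f:
        text = f.read()
    count = text.count(old)
    if count:
        shutil.copy2(path, path.with_name(path.name + ".bak"))  # backup first
        with path.open("w", encoding="utf-8", newline="") as f:
            f.write(text.replace(old, new))
    return count


# Hypothetical path: point it at the filelist.txt inside your own model's logs folder.
# print(fix_filelist(Path("logs/my_model/filelist.txt")))
```

Running it twice is harmless: the second run finds nothing to replace and returns 0.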
shit cant i just pip install entire project haha
waste of time
does anyone else have the issue when converting vocals?
its like the converted file is longer than input causing the vocals to be off beat
enable split audio option in applio's inference
wait is g/loss/total reliable
apparently the smoothed one starts consistently increasing at 14k steps
but the higher steps still sounds vastly better
like 20k even sounds amazing
it really sounds more real
g/total is the average of mel, fm and kl
mel is the clarity of your model, this metric always improves the longer you train, so this is why you feel it sounds more real
hmmmm didnt know that
but your g/total graph stopped improving the moment it started rising
what would you do w this
i choose generalization over everything (g/total) so i always use the lowest point in g/total before overtraining
is generalization just the ability to be applied in any scenario?
generalization is the model's ability to generate new audio
overtrained epochs have distorted frequencies and other bad stuff
because it seriously sounds much much more accurate to juice wrld at even 30k steps
even though it started rising at 14k
yuh because mel is still improving
so the spectrogram is clearer
which gives the feel the model is more realistic
i see, why does everyone use generalization when clarity is much more important for realism
mel is likely to keep going down even past 1k epochs
so generalization = more flexibility
because a model that can generalize well sounds good no matter the audio u give to it
i see that makes sense
but anyways like i said before, the old graphs only log the last step of the epoch, so we can't tell if your model is actually overtrained or not since the graph is inaccurate
so at .999 smooth the graph start increasing at 240 epochs but the lowest recorded loss was at 270
the new ones are like this
should i use 280 epoch or 240 as the base
the applio main branch and codename's fork use more accurate loss values which are average of each epoch instead of the epoch's last step (since the mainline rvc versions)
and i can confirm every epoch in the rising zone sounds like shit
so if i update will i be able to see the averages or will need to retrain?
i believe your model is not overtrained and the g/total is just fluctuating
but we will never know
the log is inaccurate
oh really?
wow so i should just take whichever sounds best then since its inaccurate
you cant change the logged values unless you start over training
makes sense
i remember back then i used to choose an epoch based on the mel graph
i felt it was a bit more reliable than g/total
yeah it is honestly sounding like 30k steps is the best
500 epochs sounds a bit overtrained ngl
idk tho
it would theoretically make sense to choose the mel graph if converting rap vocals to rap vocals imo
a rap model will always be good at inferencing rap songs regardless
its literally whats made for
true haha
yeah i am getting robotic noises at 30k steps
thats overtraining correct?
yeah
robotic sounds happen when the model is overtrained
as long the model is not robotic, its fine
imma just download a conversion and compare
haha
ill update because the new g total is accurate right?
overtraining is pretty easy to spot, literally if it sounds robotic, its overtrained
yeah new g/total is accurate
every new graph is reliable now
u can trust them
-train

AI HUB Docs
