#✨│ai-help
during training it builds a link between phonemes and spectrogram, during inference it uses phonemes to build a spectrogram, and it "fills the gaps"
encoder training masks parts of the sequence and the model has to generate something that matches the original
I don't think your explanations will be any good for newbies in here
so the process repeats with different parts of the sequence being masked every step until the model can generate the entire sequence on its own
yes, it is a bit advanced
kl almost always goes down and its contribution to training is rather small, so it can be ignored
Suno to create a beat similar to the artist, "2000s hip hop chipmunk soul, Chicago rap"
Extract the vocals and instrumental with UVR
Infer the vocals with Kanye RVC model in Applio
Combine new vocals and instrumental in audacity. Do some tweaks to the vocal mix and you're done
Just using kanye as an example btw
hey everyone, i have a question, i wanted to make a ai voice singing a music, but when i use rvc, the ai voice is also singing the instruments that are played on the song, so it makes it weird, how do i fix that ?
you have to separate the vocals and instrumentals
Last update: Feb 29, 2024
it doesn't automatically do that unless you use aicovergen or aicovermaker or weights.gg
seems like just another troll
How do I fix this with Applio?
dont run install as admin
get a compiled version and unzip it into C:\Applio
Yeah it installed fine without admin when you suggested, but then clicking run-applio I got this, I also didn't run admin on this.
try redownload completely while turning off firewall and defender smartscreen
download 4.5gb zip
unzip to C:\Applio
not to some other weird folder
wait until unzip finishes
I can't seem to find the location of C:\Applio? or do I have to create 1?
yes, of course
Okay, I will give it a try. Also does the compiled version usually take longer to extract than the others?
if you use 7zip, not that long, if you use windows, it is a slowpoke
so about a minute
🤔 I want to have my ai talk in voices using tts, what would be the best setup for that? What app has api support? All local 🙂
Okay, seems like the compiled version does not come with the install bat files?
You just run the run-applio no?
that's the whole point
Okay, its working now. I had to wait a bit longer than expected.
thanks guys!
Whats better in Precision? fp16 or fp32?
what batch size should I apply with 30+ mins of datasets? I have an RTX 3090 24gb.
Pitch extraction algorithm, which one is better for singing?
and lastly, what Index Algorithm should I use?
fp32
16
rmvpe
choose "auto" in applio
ive used bs 6 on 30 minutes of audio before
and that model is one of my best (in terms of most fire emojis)
too low batch size in big datasets can cause the model to be stuck in suboptimal predictions, and also you're slowing the convergence too much
- noisy graphs
it started ot-ing at 92 epochs
Thank you
yeah iirc it was wild
thats bad
models should converge at 200 epochs
i dont have the logs anymore but it sounds fine so
ive had a model ot at 68 epochs
yes but u made it get stuck in a bad local minimum
so it just overfitted
😭

is it possible to still game while training? if so, lower the resolution?
ideally u want them to converge at around 170-200 epochs, then you train until they start to overtrain
was 20 min not 30 but yk
yes reduce graphics and cap your fps to 60
Thank you
it sounds good because the dataset is good but the model got stuck in a suboptimal place
too low a batch size causes the model to focus on learning one specific thing rather than trying to learn more
imagine being so good at making datasets that your model is bad
Out of curiosity, how many EPOCH to train with? 30+ mins datasets with added pretrained like KLM? batch size 16
dont worry we learns from our errors, today i learned what segment size is thanks to my error xD
set max epoch to 500 and watch tensorboard
if you notice g/total just goes up for over 1 hour, stop training
and select your lowest point in the mel graph before the g/total rising
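A rough sketch of that rule of thumb, in case it helps: find where g/total starts a sustained rise, then take the lowest mel value before that point. The function name, the `patience` setting (how many rising epochs in a row count as a trend) and the idea of copying per-epoch values out of TensorBoard by hand are all assumptions for illustration, not something Applio provides. Both lists are assumed non-empty.

```python
def pick_checkpoint(mel_loss, g_total, patience=3):
    """Return the epoch index of the lowest mel loss before g/total keeps rising.

    mel_loss, g_total: per-epoch values read off TensorBoard (hypothetical input).
    patience: consecutive rising epochs of g/total that count as a rising trend.
    """
    cutoff = len(g_total)  # default: no sustained rise found, consider everything
    rising = 0
    for i in range(1, len(g_total)):
        rising = rising + 1 if g_total[i] > g_total[i - 1] else 0
        if rising >= patience:
            cutoff = i - patience + 1  # first epoch of the rise
            break
    window = mel_loss[:cutoff]
    # index of the smallest mel loss inside the window
    return min(range(len(window)), key=window.__getitem__)


g_total = [30, 28, 27, 26, 27, 28, 29, 30]
mel = [20, 18, 17.5, 17, 16, 15, 14, 13]
best_epoch = pick_checkpoint(mel, g_total)
```

The numbers above are made up. Still test the picked epoch by ear, since the graphs are only a hint.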
can you provide some examples? if you want
bs is like the only thing im confused with in rvc, ive heard so much misinfo that im confused on whats real 😭
Good example vs a bad example?
past red circle = overtrained
use between these: 4, 8, 16
16 for 12 minutes and above, and decrease it to 8 if the graph is too smooth
As long as its some what flatline, then anything after that stop training?
thx 😭
g/total should always go down, if you notice a rising trend like the image i sent for over 1 hour you stop the training
when the graph rises means the model is getting confused
so nothing useful there
that is the margin of error, so you want the least error possible aka the lowest point
rising is more errors
Thank you
only use bs 4 for very small datasets like 5 minutes or below
a while back (like mid ai hub 1 days) scr used bs 1 and said it actually made their model more accurate to the source with like a set of like 20 minutes or something
bs 1 is pure noise
you're just training the model with only noise at that point
same as your case, model found a suboptimal local minimum
then learned from that
and only that
now i know better 😭
also didn't scr used to inference things that were in his dataset? because doing that is def not a good way to measure generalization
he did a seen and unseen test
every batch size works technically so like u can sure train a batch size 1 model but you're training pure noise
forcing the model to stay in one place
-rvc
- How to use RVC Mainline Colab by Cauthess
- AICoverGen Colab Guide by Eddy (Spanish Helper)
- Create a model with RVC disconnected (colab) by Angetyde
-colab
- Applio, by IA Hispano Google Colab
- RVC Disconnected, by Kit Lemonfoot Google Colab
- RVC Mainline, by Hina Google Colab
- AICoverGen-WebUI, by Hina Google Colab
- AICoverGen-NoWebUI [English], by Ardha, fixed by Eddy, Hina and Gdr Google Colab
- AICoverGen-NoWebUI [Spanish], by Eddy, Hina and Gdr Google Colab
- UVR5 NO UI, by Eddy Google Colab
- UVR5 UI, by Eddy Google Colab
- Modified W-Okada's Voice Changer, Google Colab
- 🆕 FaceFusion UI, by Nick088 Google Colab
- 🆕 FaceFusion NO UI, by Nick088 Google Colab
- 🆕 EasyGUI, by Rejekts Google Colab
While the Colab free plan provides up to 12 hours of daily usage, the GPU is typically available for only about 4 hours each day on average.
tldr; sounds more accurate because its actually overfitted
so thats why the model did not improve much more than just a couple of epochs
Is there a known reason as to why sometimes if you're throwing a full song size vocal into RVC and there's a long silence somewhere in it, when the vocal comes back in, there can be a lot of artifacts? I was gonna look into splitting the audio up programmatically (I think Applio implemented that?) in hopes that it would help. Maybe it would speed up the overall inference too
Should I Cache Dataset in GPU?
no
Thank you
enable split audio in applio's inference
@analog obsidian I'll give it a shot but any idea why the issue occurs in the first place?
idk im not an applio dev 😭 also be sure to always have applio updated to the latest version, current one is 3.2.8 bug-fix
I can't seem to get this training to work?
remove spaces from your model's name
preprocess and feature extract again but this time with a new name without spaces
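If you want to clean the name up before typing it in, here is a tiny sketch. The helper name is made up, and using "_" as the replacement is an assumption (pass an empty string to simply drop the spaces).

```python
import re


def safe_model_name(name, replacement="_"):
    """Collapse every run of whitespace into `replacement` so the name has no spaces."""
    return re.sub(r"\s+", replacement, name.strip())


cleaned = safe_model_name("  my cool model ")
```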
Adding "_" to replace spaces work? or do I have to avoid anything like that?
Because I got an error again
Hey, Bizarre! Please use the command !howtoask to increase your chance of getting help by structuring your question in a way others can understand better. Also make sure you're asking in the right help channel:
- General RVC help: #✨│ai-help
- W-Okada / Realtime RVC: #🔍│help-w-okada
- AI image related: #🔍│help-ai-art
hmm looks like its something else, im not an applio dev so i can't tell exactly whats going on here
you could try reinstalling the latest compiled version and try again
but better wait for a dev response
Yeah Im on the newest compiled version
Which version did you get?
The newest 1 from github
ApplioV3.2.8-bugfix
compiled version
Yeah from 3.2.6
Not sure what's going on exactly with the mainline but, if you want, can recommend you my fork of it
a matter of 1 click install and running, is stable and has few nice things to help you train
Sure, whats your fork?
Gimme a sec

yea because i was just installing it lmao 😭
for now dl it that way
no zips atm
then run install and run fork bat
as always
Thanks, i will give Fork a try
thx, lemme know of any potential issues
( should be stable tho
oh yea, and the new gimmicks you'll find at: Training tab, advanced settings and at the bottom:
@brittle wing
just in case
woaw thats new, whats the best option to pick?
both
avg loss specifically is very good
Basically, the warmup uhh
say you wanna train for 300 epochs ( approx )
then you could try 30 for warmup or 25
as for average.. recommend you first doing a test run for first few epochs and see how many steps you get per epoch
For example, if you have 40 steps per epoch, set the
to 10 or 12
can be even 8.
You get the point, it is some chunk of 1 epoch's total steps
not too much, not too little
it's to get an " avg " performance's metric from that epoch
Im training on 30+ mins of singing dataset on the KLM 3 32k pretrain. I assume 500 epochs is enough
yeaaa, you can always pause earlier in anything so no issues here
in that case, try 35 for warmup
( or if you wanna stick to 10% of total epochs rule, do that but I'd recommend 35 at first )
effects of warmup on rvc in general aren't well tested in field yet ( on actual pretrains, that is. )
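The two rules of thumb above (warmup around 10% of total epochs, average-loss frequency as a chunk of one epoch's steps) can be written down as a quick calculation. This is only a sketch: the function name is hypothetical, and the 25% default for the averaging chunk is an assumption read off the "40 steps, so 10 or 12" example.

```python
def suggest_settings(total_epochs, steps_per_epoch,
                     warmup_fraction=0.10, avg_fraction=0.25):
    """Return (warmup_epochs, avg_every_n_steps) from the rules of thumb."""
    warmup_epochs = max(1, round(total_epochs * warmup_fraction))
    avg_every_n_steps = max(1, round(steps_per_epoch * avg_fraction))
    return warmup_epochs, avg_every_n_steps


warmup, avg_freq = suggest_settings(total_epochs=300, steps_per_epoch=40)
```

For 300 epochs at 40 steps per epoch this lands on 30 and 10, matching the example. For 500 epochs the 10% rule gives 50, while the suggestion in chat was to start lower at 35, so treat the output as a starting point only.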
Great, ill try
Neat
@glacial pollen I got an error for installing, and then got an error for running Fork?
Huh, this shouldn't happen
is it normal that the rest of the graphs are missing?

oh, you gotta type in " total "
in scalars to see em ( in filters tag )
it's normal
@analog obsidian
oh lmao sorry im sleepy again, found them
😭
Can you dump the full log ( from the very top to the bottom ) and send it in dm?
Sure, how do I dump the logs for you?
C:\Users\PC\Desktop\codename-rvc-fork-3-main\logs?
just highlight all in the console, paste into notepad, save as txt and send me
also before that
what windows u running?
11?
either way, we can discuss it in more details in dm
Yeah im on 11
update: issue fixed. Case's closed
@glacial pollen You recommend 35 on the "warm up phase" for 30+ mins of datasets on 500 epoch? What should I put in the "Frequency of avg running loss"?
@brittle wing
pretty much you first gotta run a lil test for few epochs
we need to know steps you get per epoch
Should I lower my epoch 500 to something lower for now to test to get the results of warmup phase and frequency of avg running?
you can run for 1 epoch really ( but do 2 )
Okay, then should I set the "warm up" + "frequency" on default settings for now?
you can keep both at 0 for now
or whatever def value i left there, doesn't matter just yet
none of the options will affect steps per epoch you'd get
Okay, ill test it with 2 epoch. After its done, what should I look out for?
on tensorboard
an example
you wanna check S value for your epoch
in this example you can see it's 25 steps per epoch
consecutively, 25, 50, 75 etc
Once you get what you have, we can think on what to try
come on bro, you can't train using one 30 minute unsliced audio file
who does that?
where do i download the virtual audio cable?
youre following the written guides right?
what’s ur pc gpu?
because the vac is already in the written guides, but im feeling youre following some outdated yt tut
AMD Radeon 780M
i already downloaded the cable btw
i just don't know how to use the okada with it
It’s sliced up into 1 audio file
Wav
eh usable i guess
-rt
The 1st link is wokada deiteris fork, its better in performance
the 2nd is original wokada
@lean chasm i would highly suggest u to use the wokada deiteris fork instead of yt tut one
wait so i have to uninstall the wokada and get this one instead? (they are the same?)
where did you get your wokada? From a youtube tutorial?
Could u send the link of the guide you used or tell me where you got it from?
yeah
youtube tutorial one is old
All youtube tutorials are 1 year old
meaning you got an older version of the normal wokada
You should delete that one, and download the deiteris wokada fork
its way better in performance
you just have to read the guide
Download NVIDIA
Download AMD, INTEL and CPU
these are the two options, i got a graphic card too so does it count as nvidia
you got an amd gpu, so you should download that
you told me your gpu is amd radeon 780m
or do you got another nvidia gpu?
whats ur nvidia gpu?
rtx 4070
I don't know if its here i should ask this, but i really need to know, Is there a way to use text to speech with RVC?
Cause i have a problem called "My pc is in my brother's room and i don't wanna wake him up" So i wonder if there is something that does the voice from RVC work with text to speech
i don't think there is
Welp, it was worth asking
Hello guys, I have some problem with the program, I have no sound
There are different Text To Speech (TTS) AIs:
GPT-SoVITS: RVC isn't as good as GPT-SoVITS for TTS, but GPT-SoVITS (few-shot TTS, meaning it needs just a lil training per model) can't use RVC models (and vice versa), and it's limited to English, Chinese & Japanese. If you wanna check GPT-SoVITS instead, read https://docs.ai-hub.wtf/tts/gpt-sovits/
Freemium 11labs: An easy way to do TTS is https://elevenlabs.io/. You can't use RVC models on it, but it's an easy (mostly premium) way to get good quality TTS
FishSpeech: FishSpeech is a 0 shot (no explicit training needed) TTS, if you got a good pc you can use it locally else use their site
With RVC Models:
RVC is natively Speech To Speech, but forks such as Ilaria RVC Mainline & Applio have built-in TTS: they use Microsoft Edge TTS to generate the TTS audio and then convert it with RVC. I suggest choosing a TTS voice with the same gender and language as the RVC model you wanna use
If you wanna do tts locally with RVC Voice Models (if you got a good pc):
If you don't got a good pc you can do tts with RVC Voice Models on cloud:
- Ilaria RVC Zero (running on A100 GPU, free, fastest RVC on cloud) and the guide
- Use Applio UI Colab (with the Google Colab free daily-limited T4 GPU)
- if you don't wanna use Edge TTS, you could try another TTS AI from our TTS index and use the output as an input in RVC
Which program? Elaborate
There are tons of different AIs
MMVC
If you mean the realtime voice changer for calls, Wokada, be sure to download the deiteris fork from the written guide and to not follow yt tuts
That's called Wokada after its developer's name
This is the wrong channel
Use #🔍│help-w-okada
And be sure to not follow yt tuts
thanks
Yw
@low shard i downloaded the nvidia version and its runs on the web?? i thought it was a application
it runs on your pc gpu
it just got a web user interface
WebUIs (like Gradio & Streamlit) are used a lot in almost every AI application
They are way easier & faster for developers to customize/build
And most importantly, it can be used on cloud (remote good pc), as many people like me don't got a pc good enough for AI
(A normal application program built with qt or tkinter wouldn't be possible to be shown on cloud)
Dw about it, uses your gpu
about the vb cable, i've done like the instruction but i can't use it in discord
i can use it on the web perfectly but discord is not receiving my audio
would yall say this is overtraining?
#1159290752195633273 is the place you want
( and don't spam-advertise please. )
What is the difference between AI covergen/Mangio/Applio/Mainline?
Forks ( different takes on what rvc originally does )
- Mainline is typically more mentioned in context of original rvc
- Mangio was the first fork of rvc ( it's hella messy and outdated )
- Applio is kinda a successor of Mangio but maintained by different people / team.
Packs the most features and can be considered more useful, modern and advanced than rvc
- covergen is probs only for covers and not training but I haven't used it so can't say for sure, I'd recommend to avoid it as it's most likely either old or too niche
Thank you for explaining it
if you need a lil more functionality, recommend you my new fork ( applio based )
Other than that, Applio is your best bet
My internet has been out for 26 hours but its finally back
i finished downloading the models and now when i run python src/webui.py it gives me this, should i be worried about anything
it opened the webui perfectly fine but will that affect anything
should i? and if so how do i
Don't
best course is to let it train for longer until you see dips or actual signs of overtraining
then you'd just pick an epoch from before the dip happens
( is why I recommend saving every single epoch during training )
Yeah I saved everyone 1 epoch
oh ye, in that case, lemme do some example scenario for ya
It's just one of possible situations
naturally it doesn't ( and won't ) be like that 1:1
but you get an idea
Okay, in this case just keep it training? it's maxed out 500 epoch
But then yea, it's a pretty meh scenario anyways because
its already finished lol
The loggings you see
are like
Okay so, remember how I mentioned an epoch can have N steps ?
Right, i remember
Now, the problem is, applio and rvc are logging in a manner where the actual logging point
references only the last step from a given epoch
so it's biased because
Lemme get an example pic
The green circle, is how it logs
So in reality, an epoch could do awful overall but the last logging ( last step ) could be " good "
or could be the exact opposite
Hence why I proposed average loss in my fork
``It's still not the most ideal approach
(( because in proper scenarios, training models has 1 extra phase in training where evaluation happens, where model's tested on unseen data and then scored appropriately ))
but def better than what it is rn``
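To make the bias concrete, here is a toy comparison of the two ways of logging one epoch. The step losses are invented numbers and the function names are hypothetical; this is not the code from Applio or from the fork.

```python
def last_step_loss(step_losses):
    """What gets logged when only the final step of the epoch is recorded."""
    return step_losses[-1]


def average_loss(step_losses):
    """Mean over every step of the epoch."""
    return sum(step_losses) / len(step_losses)


# One made-up epoch of 5 steps: mostly poor, with a lucky last step.
epoch_losses = [9.0, 8.0, 9.5, 8.5, 5.0]

logged = last_step_loss(epoch_losses)   # 5.0, looks great
typical = average_loss(epoch_losses)    # 8.0, closer to how the epoch really went
```

Averaging smooths out a lucky or unlucky final batch, but it is still measured on training data, so it does not replace an evaluation pass on unseen audio.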
oh gotcha
Yea
So I'd take it with caution, the metrics themselves
best to follow what I mentioned before + just actively testing the models by inferencing
just follow your ears
next time, instead of 500 epochs maybe do 700 or 1000 epochs, since it's still showing signs of training?
I'd say, a good approach is, double or triple your expected training ( total epochs ) time
that's how I do at least
👀
👍
Cause restarting the training has its drawbacks compared to just doing it all in 1 go
Should I wipe out any sorts of that data from the system and do a complete restart?
Tho, why u using covergen
When restarting the training you want to not touch the previous files
essentially what matters ( files created during first training - wise ) is the:
G, D files and tfevents file + most recent epoch ( small weight: .pth model )
I dont want to use UVR i just want to put in a link and have it do everything for me
A link?
wdym
like a youtube link
oh, welp.
it automatically extracts the voice from the songs
is there anything else like that
i can tell it keeps asking me to update stuff
if im going to use UVR whats the best webui to use with it
Thank you
You don't have to use uvr really, mvsep does the job
whats that
Is it free
It is, if you register an account you can use it as you please with some limits ( file size / length wise ) + you can't use ensembling but that's not important
only drawback ( but imo not as much ) is the queue
Nothing crazy to call it bad tho. That's your best quality bet anyways
bs-roformer / mel-roformer models for separation ( which mvsep does use ) are beasts
My recommended flow of work is:
- Get Applio ( or my fork if you intend to train in future )
- yt-dlp is a nice tool that lets you dl yt audio in best quality the yt's servers provide
- uploading the audio to mvsep to get your vocals and instru
- Using applio for covers
You can also use this google colab: https://colab.research.google.com/github/jarredou/Music-Source-Separation-Training-Colab-Inference/blob/main/Music_Source_Separation_Training_(Colab_Inference).ipynb
It has more models, some of which are better than the ones on mvsep.
Here is a google spreadsheet with scores on how the models perform ( higher is better): https://docs.google.com/spreadsheets/d/1pPEJpu4tZjTkjPh_F5YjtIyHq8v0SxLnBydfUBUNlbI/edit?usp=sharing
Uuuuu, that's a nice one
Is applio better than the rvc webui?
I havent used a colab before but dont they cost money?
Well, the core idea and principle is the same however, it's more modern and more optimized
aka, up-to-date
you get around 4 hours free daily
Besides, it looks way more complicated than mvsep
Ill try it if mvsep doesnt work out
Okay thanks
its not hard at all
Also, where do i get voice models
Well, notebooks / colab is def harder to set if we speak of just drag-n-drop websites naturally
But sometimes it might be worth it
Esp in this case
Here is a guide for it: https://docs.ai-hub.wtf/rvc/resources/vocal-isolation/#cloud-uvr
go for webui
Does it have to be on my C drive
Ideally, ye. Just to avoid any potential permission / path issues
C:\applio\applio's code n shit
ex. ^
"C:\Applio" is fine right
yup
Okay
whats Pinokio
Pinokio?
Not sure what you're referring to
I mean, I don't know this thing / neither used it so can't say much + this chat ain't for that
oh okay sorry
Best to avoid any 3rd party abstractions or whatever like that
as we only provide support for what we recommend
👀
👍
stability matrix / pinokio are products made by companies trying to take a niche "we make it easier for you to do x", but completely failing to keep things up to date and breaking shit that is not supposed to break
dont use them unless you're a complete dummy
ipad-generation idiot for whom a computer = screen and who can't tell .exe from .pdf
My respect towards you just now increased by 50%
Fair enough but i do have to admit SM is pretty useful, it makes model and package downloading very quick and easy
Hi, I have a question.
Regarding index algorithms. As I understand, Faiss is the default while KMeans is used to decrease the file size for longer datasets. Is there a trade-off for using or not using KMeans on longer datasets?
Pretty much
tho I personally always go for faiss I guess
( ignore the subpoints 3 and 4 from the 2nd ss)
But then, imagine if the index was 1-2 gigs ( you never know what kind of datasets people would want to use ) yea
rip memory, rip efficiency
The biggest I've tried to use faiss on, so far and without any issues, would be 48~ mins of data
Anything bigger wasn't attempted by me
I think it was like 200+ or sub 3xx mb
Don't remember, was half a year ago
Thank you for help again)
yea it's alright, in fact I highly recommend gpt for explanation of more technical concepts
it's quite good ( and usually accurate ) in abstraction
kmeans only kicks in after 200,000 samples in the dataset
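As a sketch of what "auto" would then boil down to, assuming the 200,000 figure quoted above and assuming "samples" means rows of extracted features (neither is checked against Applio's source here; the function name is made up):

```python
KMEANS_THRESHOLD = 200_000  # figure quoted in chat, not verified against the code


def pick_index_algorithm(n_feature_rows, threshold=KMEANS_THRESHOLD):
    """Return which index algorithm an "auto" setting would choose under that rule."""
    return "kmeans" if n_feature_rows > threshold else "faiss"


choice_small = pick_index_algorithm(150_000)
choice_huge = pick_index_algorithm(250_000)
```

So for typical datasets of a few dozen minutes the choice would stay on faiss either way.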
yooo wtf, I can hardly think of having 200k samples 💀
Finally got applio to work
it has been downloading for 2 hours at 1mbps
I hate my country
Anyway, where do i get models
rip on the dl speed. Feel ya tho, in 2009, in my small town, we'd have 64 kbps ( some darn awful wireless tube based )
lol
Imagine the pain back then 💀
Dang
Either way, glad it works for ya now
but i mean in 2009 everything was less than a few mbs
Well true, ye
Yeah but i do have a few questions
go ahead
Also do you have any recommended plugins
Well, almost all models are done in fp16 so
so I believe, there's no point doing it in fp32, not that it'd matter much anyway
you won't hear a difference in this scenario
Okay
As for plugins, I don't really use them so can't say for sure
uvr's meh, we got standalone uvr and / or mvsep / colabs
Elevenlabs I don't use
Basically, no point adding em unless you use elevenlabs, got api and stuff ( I'd assume
Ill stick with mvsep and if i have issues ill download uvr
What about voice models where can i get them
You can search rvc ai voice models at:
- #1175430844685484042
- In #🔍│find-models , Do /find with @earnest musk
- https://weights.gg/ (login required)
- https://huggingface.co/models (but watch out cus in hugging face there arent only rvc ai voice models)
- https://voice-models.com/
- https://thevoicemodels.com/ (for Turkish Models, login required with discord and level 2 on their server)
if there isnt one, you can:
- #1159289738314919936
- #1191429836321849435
- make it yourself with our docs guides https://docs.ai-hub.wtf/essentials/how-to-make-voice-models/
:wave: @crude flame, How can I help?
Available Commands:
• @weights find <query> or /find <query> - Search for RVC Voice Models
• /create - Create an AI Cover
• /image - Generate an Image
go away 😡
Okay thanks
yeahhh hush mode
Is it normal for the download link to be that long
Also why does everyone have by weights after their name 😭
wat, fp32 inference with fp32 models has an audible difference vs fp16 model + inference?
ah okay i thought it was a staff thing
I'd say it depends?
But tbf, most likely not
biggest thing between fp32 and fp16 is for training really
both stability wise and, well, 'max' potential you can squeeze out of the model ( quite likely given full precision and gradients' representation )
Whats the site u recommended for downloading lossless audio from yt
There's no such a thing as lossless audio from yt
All audio that goes on yt undergoes compression and other postprocessing
Yeah that thanks
usually u get .opus
( -x makes it fetch the best audio for a given video the server has )
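For reference, a minimal sketch of calling yt-dlp that way from a script. `-x` (extract audio) and `-f bestaudio` are real yt-dlp options; the wrapper function names and the placeholder URL are made up, and yt-dlp plus ffmpeg are assumed to be installed and on PATH.

```python
import subprocess


def build_ytdlp_command(url):
    """Build the argument list for an audio-only download."""
    # -x            : extract audio only (needs ffmpeg)
    # -f bestaudio  : ask for the best audio-only format the server offers
    return ["yt-dlp", "-x", "-f", "bestaudio", url]


def download_audio(url):
    """Run yt-dlp; raises CalledProcessError if the download fails."""
    subprocess.run(build_ytdlp_command(url), check=True)


# Placeholder URL, replace VIDEO_ID with a real one before calling download_audio().
command = build_ytdlp_command("https://www.youtube.com/watch?v=VIDEO_ID")
```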
Also can you tell me what models to use because i have no idea what these do
ohh ok, thanks for having fp32 enabled by default in your fork, i was training a model and forgot to check the default precision 😭
Okay
HAHA, knew it was a good idea
😌
Alr, I'm free now
in terms of that
Razer already sent you docs
But if you want to hear an opinion from me personally? I always go for bs-roformer
I go for wave
less conversions / encodings, the better for me. like it raw Do not shoot it
Yeah thats what i picked
Okay
Dumb question but how do i put the audio files back together after inference
Also why is the output 68mb
Audacity can do it if you're not keen on mastering, mixing or music production
Its gonna take 40 years to download
unless you are handy ( and have ) adobe audition
alternatively, fl studio or any other daw
Is there no online alternative
Because wave is uncompressed, so it weighs its share
I literally cannot download anymore things
Egypt really sucks internet wise
Everything wise actually
I mean yea there surely are but think of it that way.. not only you have to upload that sub 70mb wave
then the instrumental
and lastly, dl it all
Better to get audacity or such and call it a day
there's quite a few of things better than mp3
aac, opus / ogg
then there's crappy mp3
as for lossless, there's flac
They arent available on mvsep
If it ain't going for training ( but you use it for idk, mixing or something ), you can use flac
How big would a 3min file be if converted to flac
matters of individual case and compression ratio
Im doing all of this for fun not really going to train anything
Typically
or produce anything
" People often favor FLAC because it takes up significantly less space on their devices. FLAC files can be up to 70% smaller than the same WAV file. "
Yeah thats much better for me
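For a ballpark of the 3 minute question: uncompressed WAV size is just duration x sample rate x channels x bytes per sample. The sketch below assumes 44.1 kHz, stereo, 16-bit and ignores the small file header; the FLAC figure only applies the "up to 70% smaller" quote as a best case, real ratios depend on the audio.

```python
def wav_size_mb(seconds, sample_rate=44100, channels=2, bits_per_sample=16):
    """Approximate size of the raw audio data in megabytes (header ignored)."""
    data_bytes = seconds * sample_rate * channels * (bits_per_sample // 8)
    return data_bytes / 1_000_000


three_minutes = wav_size_mb(180)          # about 31.75 MB
best_case_flac = three_minutes * 0.30     # about 9.5 MB if it really shrinks by 70%
```

Higher bit depths or sample rates scale the WAV size up proportionally.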
My problem is not with storage its with downloading the file
Is audacity easy to use
The thing i liked about aicovergen is that it did all of that for me
It separated and combined the voices by itself
I haven't ever used audacity for mixing so I can't give you any opinion on that
it's rather simple and generic so
if you get some basics, you should do well
but if alignment of 2 files is what you want ( no effects, compression and such - generally mixing )
i.e. Vocals and music
you just put em both in, align to the edge, export and call it a day
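What "align to the edge and export" does under the hood is roughly a sample-by-sample sum. A minimal sketch, assuming both tracks are mono lists of float samples in the -1.0 to 1.0 range at the same sample rate (the function name and the `vocal_gain` knob are made up for illustration):

```python
def mix(vocals, instrumental, vocal_gain=1.0):
    """Sum two mono tracks that both start at time zero, padding the shorter one."""
    length = max(len(vocals), len(instrumental))
    mixed = []
    for i in range(length):
        v = vocals[i] * vocal_gain if i < len(vocals) else 0.0
        m = instrumental[i] if i < len(instrumental) else 0.0
        mixed.append(max(-1.0, min(1.0, v + m)))  # hard clip to the valid range
    return mixed


result = mix([0.5, 0.5], [0.25, 0.75, 0.1])
```

A real editor does the same thing with nicer tools for levels, which is why lowering the vocal or instrumental gain before export helps when the mix clips.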
so
Why is export audio grayed out on audacity
u can either resample the dataset to 32k using audacity/rx studio/whatever
or using a script that uses soxr_vhq, which is technically better than the above ^
or just selecting 32k sample rate in applio and let it to resample for you in the preprocessing (by default the resample is done using soxr_hq)
realistically speaking most people cant hear the difference between all of these options, so do the one which is the easiest for you
Do you know of resampling from FL Studio with “Edison” plugin does the same thing?
idk but even if the case you did not resample the dataset to 32k, applio is going to do it for you
It's more so about the algorithm that does the resampling than a tool
SoX's having the best currently known algo
Thanks, I’ll check them out
Ye, search up sox resampler
( the setting for quality would be 'vhq ' )
alternatively, use my fork as it has it in use
I’ll try both
They have a GitHub?
Awesome, thanks
Actually nope, default isn't using soxr
it's using librosa's default resampling algo, whichever it is
why is not using soxr hq? mainline does 😭
it does? 🤔
As far as I know, my forks were the only ones using soxr, no mainline no applio
well.. in any case, it's a matter of adding
in: root/rvc/lib/utils.py
oh, well, it's commented out
also, last time I checked it wasn't there 🤔
or maybe I remember it wrong.. either way ye
can be easily added ✨

oh, you picked the wrong one 👀
gimme few mins, pushing the latest changes to ver 3
( cause you got version 1, that one's rvc based, not applio )
shhh
?
🤫 let em not know lol
can i get a help with the vb audio cable? i followed the instruction for nvidia gpu okada and finished the setting but somehow the cable is not working in discord
How do I get Tensor board in Applio
Then copy the link console gives you and paste in the browser's address bar
Before starting training
.
well not really
if you're training
paste that in ur model's folder
then like so ^
Paste in the path and done. Gonna open up in browser
oh wait, pinokio also copies fluxgym and there seems someone using it in #🔍│help-ai-art having an issue on not getting saved lora checkpoint
@glacial pollen I'm ready to try out Fork, I remembered you needed 2 epoch to figure out what to put in "Warm up Phase", and "Frequency running loss"? Is this what you meant?
Whenever you resample into 32k with Audacity, when pressing export, do you keep the vocals in mono or stereo?
rvc uses mono
oh ok, best bet would to keep vocals in mono?
rvc preprocessing will always convert to mono anyway
both for training and for inference all the audio converts to mono 16k
training does use a full sample rate files, but still mono
Thank you
When should I lower batch size? lower the better? I'm using RTX 3090 24gb, I'm using 16 batch size.
how to run tensor board? i don't see any bat file named such in mangio folder
if you have python installed as standalone
then pip install tensorboard
and after that you can use it from command line like tensorboard --logdir=X:\Applio\logs
the og pretrain with 50 hours of audio was trained with batch 16
for a regular model that is a major overkill
and would only lead to shitty results
I'm using the new KLM 3 32k
what's your dataset size?
30+ minutes
I'm sure with a 3090 you can try 4, 6, or 8 and see which one gives the best result
Thanks, I'll try it out
make one folder with a dataset for batch 4
batch 8 fp32 occupies around 16 GB
batch size may affect the overall training speed
one epoch would take about the same time regardless of the batch size
batch 4 - 500 steps x 1s, batch 8 - 250 steps x 2s, same thing
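The arithmetic behind that, as a sketch: steps per epoch is the number of dataset segments divided by the batch size (rounded up), and the example assumes step time grows linearly with batch size. The 2000 segments and 0.25 s per sample are back-calculated from the "500 steps x 1s" example, not measured values.

```python
import math


def epoch_estimate(n_segments, batch_size, seconds_per_sample=0.25):
    """Return (steps_per_epoch, epoch_seconds) under the linear-scaling assumption."""
    steps = math.ceil(n_segments / batch_size)
    seconds_per_step = batch_size * seconds_per_sample
    return steps, steps * seconds_per_step


batch4 = epoch_estimate(2000, 4)   # 500 steps of 1 s each
batch8 = epoch_estimate(2000, 8)   # 250 steps of 2 s each
```

In practice a bigger batch often uses the GPU a bit more efficiently, so the two are only roughly equal.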
Interesting, I wonder what 24gb would be
on average
btw do you see difference (quality-wise and gradient stability thing) between batch 8 on single gpu and batch 4x2 on dual gpu?
There shouldn’t be any difference
As long as the gradients are synced
also two cards may not actually get 2x faster training because of the sync
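A toy illustration of why synced gradients make 4x2 match a single batch of 8: averaging the two per-GPU averages gives the same value as averaging all 8 samples at once. The per-sample gradient numbers are invented, the function names are hypothetical, and this assumes the batch splits evenly across GPUs.

```python
def mean(values):
    return sum(values) / len(values)


def single_gpu_gradient(per_sample_grads):
    """One GPU averages the gradient over the whole batch."""
    return mean(per_sample_grads)


def synced_gradient(per_sample_grads, n_gpus=2):
    """Each GPU averages its own shard, then the shard averages are averaged."""
    shard = len(per_sample_grads) // n_gpus
    local = [mean(per_sample_grads[i * shard:(i + 1) * shard]) for i in range(n_gpus)]
    return mean(local)


grads = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
same = single_gpu_gradient(grads) == synced_gradient(grads)
```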
how can i run tensorboard in a program that didnt come with tensorboard
probably a silly question, but i downloaded a model and there is a json file, should that be uploaded somewhere or leave it?
not really an issue
weights.gg has models named as model.pth and model.index, so json is needed to tell what the f it is
theres only a pth which i used, no .index
the directory, does it have to be the folder that has the events.out.tfevents? or the one just before it?
either
if you want to see all logs, or a specific model's log in case you have 100 models there
thank you
100 model logs gonna take a lot of time to load
there should be one or few .tfevents file in logs\yourmodel
actually im training a beatrice v2 model, and it has a tensorboard support, but the loss_g is just a straight line so 
nevermind, it just updates itself based on the checkpoints
G loss graph isn’t ur main focus
what is?
It’s just the sum of the gen, fm, mel and kl losses
When choosing the lowest point u will choose it from the loss/g/mel graph
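Picking "the lowest point" can be done by eye, but as a rough sketch it is just a minimum over the logged values at the steps where a checkpoint exists. The numbers below are invented for illustration, and `best_checkpoint` is a hypothetical helper, not something that ships with RVC or Applio.

```python
def best_checkpoint(mel_by_step, saved_steps):
    """Return the saved step whose logged mel loss is lowest."""
    candidates = {s: mel_by_step[s] for s in saved_steps if s in mel_by_step}
    return min(candidates, key=candidates.get)


# Hypothetical values read off the loss/g/mel graph.
mel_by_step = {1000: 21.4, 2000: 19.8, 3000: 18.9, 4000: 19.5, 5000: 20.7}
saved_steps = [1000, 2000, 3000, 4000, 5000]
print(best_checkpoint(mel_by_step, saved_steps))
```

It is still worth listening to the neighbouring checkpoints, since a single logged value can be noisy.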
Thank you sir!
Do longer datasets take more time than smaller ones to train for the same number of epochs?
Depends on ur batch size, but generally speaking yes
is weights.gg's inference fully free??
Like is there no credit system
No
There is no catch
You can do AI cover on Weights for free. But I'm not sure why you would inference the entire Weights on your PC.
I meant ai covers yeah
Unless you don't wanna wait for the very rare long number queue and in hurry, you can buy their premium. 
Dats how they profit @wintry torrent
I havent waited a single second for queue
Where do u live
Egypt
Time zone difference dats why
ah makes sense
absolutely for the same batch size
premium subscriptions, ||but there are also gamified reward system (similar to civitai's credit system but in quite different way)||
What’s the best batch size for 4060 ti 16gb?
refer to my comment above
So a smaller batch size means lower vram consumption?
Also which is better for overall training, a higher batch size or a lower one?
mostly between 4 or 8, and fp16 (as default choice for RTX gpus) theoretically halves the vram usage of fp32
the difference is just that fp32 may offer a little better quality and gradient stability but is also slower
fp32 - better stability, less wild gradients (i've seen 30k+ with fp16)
1hr set fp32 batch 8
fp16 halves the vram usage used by the model / discriminators
but that would be something on top of ~4-5GB it takes anyway
so in this case it would be ~7-7.5GB instead of 9
I’m new to this but what is fp32 cause i don’t see any such options in mangio
kindly delete mangio and install Applio
Can you link it?
Thanks, i’ll check this out once my current training session gets over.
in applio the default is fp16 (it is okay for finetuning), you can switch it to fp32 in settings, kill the terminal window and restart the app after
as for the batch size, it really depends on the size of the data set.. batch 40 may work with 100hr+ set, but it is excessive for 1hr set. Same as batch 4 may be okay for 10 min set, but not applicable for 10hr+
what's good for finding a tick in a matchbox is not good for finding a bowling ball in a potato field
what do you recommend for 30-40mins to an hour datasets?
>=4 and <=8 generally
5k grad/g is the worst I've seen so far among my old models made with default fp16
i like the stability fp32 provides but god it takes ages to cook a model with it 😭
i'd rather have a good model
edge tts (free) and elevenlabs (freemium)
by freemium you mean like rate limited?
yep perhaps
got it. thank you vm
btw with truncate method and your custom slicer script, it barely causes negative kl but there might be some upward spikes in fm & mel
or you can install something locally
f5-tts, fish speech, xtts
first two may require some finetuning
also depends on a language
there should not be negative kl ever
buut somehow it happens during training from scratch with weird models
it did not cause negative kl for me
it was there with the old labeling method and rvc's default slicer
almost 12 hours of training no issues for me
my attempt of "cement" with refinegan did not go right
(almost 0)
last time i faced negative kl was training some very damaged dataset
I think the formula is messed up or the values calculated by the encoders
it should not be possible to have a negative, and yet here we are
rvc is not wokada #🔍│help-w-okada use this channel instead
my bad thanks
🤔
i dont even know what causes negative kl
lol
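```python
# first line assumed from the VITS-style kl_loss; it was cut off in the paste
kl = logs_p - logs_q - 0.5
kl += 0.5 * ((z_p - m_p) ** 2) * torch.exp(-2.0 * logs_p)
kl = torch.sum(kl * z_mask)
loss = kl / torch.sum(z_mask)
```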
though literally no collapses with mute files removed from filelist
must be dataset related then because mine was normal
i've seen logs_q being way too high that causes that
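A small sketch of why that can happen, in plain Python so it runs without torch. The per-element term below copies the pasted formula, with the `logs_p - logs_q - 0.5` starting term taken from the VITS-style `kl_loss` that the snippet appears to come from (an assumption on my part). It is an estimate evaluated at one sampled `z_p`, not the closed-form KL, so nothing forces it to stay above zero. The closed-form version is only there for comparison and assumes two plain diagonal Gaussians, ignoring the flow. All values are hypothetical.

```python
import math


def sampled_kl_term(z_p, logs_q, m_p, logs_p):
    """Per-element term from the pasted snippet (estimate at one sample z_p)."""
    return logs_p - logs_q - 0.5 + 0.5 * (z_p - m_p) ** 2 * math.exp(-2.0 * logs_p)


def analytic_kl(m_q, logs_q, m_p, logs_p):
    """Closed-form KL between two 1-D Gaussians; never negative."""
    var_q = math.exp(2.0 * logs_q)
    return logs_p - logs_q - 0.5 + 0.5 * (var_q + (m_q - m_p) ** 2) * math.exp(-2.0 * logs_p)


# Hypothetical case: logs_q is much higher than logs_p and the sample
# happens to land right on the prior mean.
neg = sampled_kl_term(z_p=0.0, logs_q=1.0, m_p=0.0, logs_p=0.0)
exact = analytic_kl(m_q=0.0, logs_q=1.0, m_p=0.0, logs_p=0.0)
print(neg, exact)
```

With a high `logs_q` the `-logs_q` term dominates unless the squared-distance term makes up for it, which fits the observation above.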
Applio training showing 65-85% cpu usage with 80-100% gpu usage consuming about 5.3 gb vram. Is this normal?
"A small numerical error or negative log probabilities"
aka weird implementation of kl

I'll go back to the old labeling method and try it later
is your dataset a bit compressed?
last time i got negative kl was in a compressed dataset
not saying compression causes that
Is there a better method than labeling?
noobies method
0.5 sec slices
truncate silence + a script that slices the dataset into 0.5 second chunks
no negative kl in this case, only some "upward spikes" in g perhaps
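Not the actual script mentioned here, just a minimal sketch of the idea in plain Python: cut an already silence-truncated mono signal into fixed-length chunks. `slice_fixed` and its parameters are hypothetical names; a real script would also read and write audio files, which is left out.

```python
def slice_fixed(samples, sample_rate, chunk_len=0.5, overlap_len=0.0):
    """Cut a mono sample list into fixed-length chunks (lengths in seconds).

    A trailing remainder shorter than chunk_len is dropped.
    """
    size = int(chunk_len * sample_rate)
    hop = size - int(overlap_len * sample_rate)
    if size <= 0 or hop <= 0:
        raise ValueError("chunk_len must be positive and larger than overlap_len")
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, hop)]


# Hypothetical input: 70400 samples, i.e. 2.2 seconds at 32 kHz.
audio = [0.0] * 70400
chunks = slice_fixed(audio, 32000)
print(len(chunks), len(chunks[0]))
```

The leftover 0.2 s at the end is simply discarded in this sketch.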
Send the script
Ima try this later today
is your cpu amd? last time there was someone with a similar issue
I'm using AMD GPU and because of that some stuff gets offloaded to CPU, but even there it is only 75% tops
is it good?
Yes, any fix?
i dont know honestly, i have an intel cpu
and my cpu usage is fine
are you using 'cache dataset in GPU' checkbox ?
and I wonder if 'Resizeable BAR' affect it as well
I thought the hi-pass filter could be done on your own beforehand
No, it’s unchecked
there's likely an issue with how the samples are being moved from regular memory to gpu and back, it may be using some CPU resources
you can try enabling that option and check the task manager's performance tab, as long as the shared memory is not used, you're good
and that should lower the CPU%... hopefully
anyone here with decent prompting experience, i need some quick help🙏
@simple ore chunk_len=5.0, overlap_len=0.5 is this good?
to use 0.5s slices you need to remove mute files from filelist.txt and to add 50 to the batch size in train.py
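A rough sketch of the filelist part, assuming each line of filelist.txt is a `|`-separated record whose first field is the audio path, and that the mute placeholder entries have "mute" in that path. Both are assumptions about the file layout, so check your own filelist first. The sample lines are made up. The `train.py` change is not shown.

```python
def drop_mute_entries(lines):
    """Return filelist lines whose audio path does not mention 'mute'.

    Assumes each line is 'audio_path|...|speaker_id' and that the mute
    placeholder entries have 'mute' somewhere in the first field.
    """
    return [ln for ln in lines if "mute" not in ln.split("|", 1)[0].lower()]


# Hypothetical filelist content; real paths will differ.
filelist = [
    "logs/mymodel/sliced_audios/0_0_0.wav|feat0.npy|f0_0.npy|f0nsf_0.npy|0",
    "logs/mute/sliced_audios/mute32000.wav|mute.npy|mute.wav.npy|mute.wav.npy|0",
]
kept = drop_mute_entries(filelist)
print(len(kept))
```

Careful if your model name itself contains "mute", since this simple filter would then drop everything.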
Shared GPU memory is around 0.9
as long as dedicated vram is not at max, it is fine
There’s plenty left
other distance functions:
Is there any Applio Hugging Face space available?
-colab
- Applio, by IA Hispano Google Colab
- RVC Disconnected, by Kit Lemonfoot Google Colab
- RVC Mainline, by Hina Google Colab
- AICoverGen-WebUI, by Hina Google Colab
- AICoverGen-NoWebUI [English], by Ardha, fixed by Eddy, Hina and Gdr Google Colab
- AICoverGen-NoWebUI [Spanish], by Eddy, Hina and Gdr Google Colab
- UVR5 NO UI, by Eddy Google Colab
- UVR5 UI, by Eddy Google Colab
- Modified W-Okada's Voice Changer, Google Colab
- 🆕 FaceFusion UI, by Nick088 Google Colab
- 🆕 FaceFusion NO UI, by Nick088 Google Colab
- 🆕 EasyGUI, by Rejekts Google Colab
While the Colab free plan provides up to 12 hours of daily usage, the GPU is typically available for only about 4 hours each day on average.
Gotcha
setting environment variable like this seems to lower AMD CPU use significantly
need to close the app and restart it in a new window
How do i use custom pretrained models in applio?
check custom box, select custom G and D files
Where do i put the files cause i don’t see the select option
well, technically just 1 to see. but ye
in your case you can set avg value to 23 or 30
( and warmup is up to you but I'd stick within min - 5% of total epochs and max range - 10% of total epochs )
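As a quick worked example of that 5% to 10% rule of thumb (the helper name is made up, and the floor of 1 epoch is my own assumption for very short runs):

```python
def warmup_range(total_epochs):
    """Return (min, max) warm-up epochs as 5% and 10% of the total, at least 1."""
    low = max(1, total_epochs * 5 // 100)
    high = max(1, total_epochs * 10 // 100)
    return low, high


print(warmup_range(300))
```

So for a 300 epoch run that would be somewhere between 15 and 30 warm-up epochs.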
Also uvr hugging face space?
wrong help channel, well we don't have one for that
but replied to u in #🧬│ai-chat message
the first one is made by @viscid moss , i think he updates his uvr much
the space should too ig
Well.. the HF space, not yet. I'm waiting to add the last missing model, to make a big release
Yesterday, 17 new models were added to audio-separator, which is the core of UVR5 UI. But there is one more that needs a workaround to work.
goodluck
any recommendations for generating more natural sounding audio using edge tts? Looks like it doesn't support SSML
@simple ore can you please tell me how to see tensor board correctly?
run it, go to scalars tab
Then ?
There are too many graphs 📊
Some are going up some are going down
What to do ?
Tensorboard is a series of graphs where we can monitor the progress of our model during training, but there are many graphs. We are only interested in the graph called 'g/total'. You can find this by clicking on 'inactive' and selecting 'scalars'. Then, go to the last page, where you will find it in the last graph.
Thank you 😊.
So, out of curiosity, can yelling be a part of a dataset?
Like let's say, Eren Jaegers yelling mixed with his talking
Do I just need to put both together in their own group?
Like all the talking lines first, then the yelling?
I want to stop training on current epoch (65) how do I stop it in applio ?
if you're saving every 10 epochs, then stopping right now would get you the epoch 60 model
if you chose to only save the final model, then it wont be saved until the very end
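The "stop at 65, get 60" logic is just integer division. A tiny sketch, with a hypothetical helper name:

```python
def last_saved_epoch(current_epoch, save_every):
    """Most recent epoch at which a periodic checkpoint was written (0 if none)."""
    return (current_epoch // save_every) * save_every


print(last_saved_epoch(65, 10))
```

If the result is 0, nothing has been saved yet and stopping would lose the run.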
Oh no..
It was just hiding. Now I see it.
So I have to wait for at least 100
do you have .pth files other than D/G in your model's folder?
Does the overtraining detector work in Applio?
Yes I have one. Mymodelname_50e_5400s.pth
Do you think you could answer the question I had?
Please
do not include yelling in one file with normal speech
otherwise the normal speech would normalize to nothing
how to update applio?
download new version, unzip to new folder, move audios and models over, delete old folder
thanks
Hey guys, I have a question. In theory, If a model has an accent, can more epochs decrease it?
10 hours in, keep training?
Anyone help me resume training?
Hey, TwoOne! Please use the command !howtoask to increase your chance of getting help by structuring your question in a way others can understand better. Also make sure you're asking in the right help channel:
- General RVC help: #✨│ai-help
- W-Okada / Realtime RVC: #🔍│help-w-okada
- AI image related: #🔍│help-ai-art
!howtoask
How To Troubleshoot 
- Don't simply mention your issue, like "my rvc is not working".
- Describe the step you are on, what you're trying to do, the RVC you're using, a screenshot, etc.
- The more context, the better.
- Don't be desperate. You can ping a Helper, but if they ignore, they aren't available/don't know the answer.
- It's okay if you're frustrated, but don't take it into this server.
- Don't DM without prior consent.
- Don't ask for every little instruction. Put your own effort & test things by yourself.
- Don't ask to ask.
- Check if your answer is a Google search away/on our guides website.
Was it here that people were working on a text to speech ai? can't find a good one that uses rvc models
I believe there’s 1 that is integrated in Applio
So my applio has been working perfectly fine till right now "RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input." I saw someone saying to use the split audio option but that didn't work I ran the "install.bat" file again but that didn't help either what should I do?
Fixed it I was just running out of memory 
The model has completed 300 epochs, and I would like to extend it to 500 epochs. How can I continue training from epoch 300 without starting over?
Could someone teach me?
what about other charts?
you're squeezing all the juices out of it
depends on the application you're using
They look similar
(personally I'd stop at 30k lol)
How do i add rvc voices to Applio in pinokio?
Can someone who speaks Spanish help me get a good voice changer with a girl's voice that doesn't sound sooo robotic? please
I don't think there's anyone using pinokio so ( afaik
we don't quite officially support it
Ok im using it, i had a lot of problems installing so i just pinokio'ed it lol, so how is it done in Applio normally? does it need to be in a specific folder?
I mean, you could have just asked for some support really
In applio, you'd place em in /logs/<model's folder>
within the folder, along with the index
My Applio logs folder has only one folder called mute
I always have a lot of trouble when it comes to A.I.s and i dont want to disturb anyone, i just ask when im really in trouble kkkk
Well then that's that. but keep in mind you have a higher chance of support if you use tools that are known / used here
naturally
you see, if 90% of people use X1, hardly anyone will want to dive on their own into X2 to help 1 person
and as most of us or me, are against crappy automations ( pardon my lang ) the chances are even smaller
Imo Pinokio is for lazy people who aren't willing to learn a bit to do things right
That's a lil scary considering such people intend to work with artificial intelligence
it never was meant to be easy or easily accessible with no effort put into it tbf
That's like asking for troubles in 5-10 years
Thats me lol, im using it to make some memes as the site i was using got shut down, dont want to go through a whole programming lesson just to make some memes
ig, if reading up a bit of text that'd literally take ( at worst ) 10 mins is what you call programming lesson
then I suppose, you shouldn't be using AI for memes
¯_(ツ)_/¯
it's 2 clicks man, 2 clicks
1 .bat for installation, 1 for running
lol
And yet here you are asking about fixing N problem on an unknown site or a service
Taking you more time than it'd if you read a bit of instructions
I hope you get the point
Well thats news, last time i tried it i needed to install a lot of things, type some text into the command prompt and deal with a lot of problems as they showed up
That's what the help channels are for
So you know how to deal with them and maybe even help other users, if at one point you felt like it
it's a basic skill anyone should have, problem solving
Man, I'm kinda scared what's gonna happen to this new generation
Can't imagine ( no offense ) such people operating atom/nuclear-powered facilities in 20 years
dw those same kids are going to be running our gov in like 30 years
still affects you
Either way, Soryu
If at one point you changed your mind and actually decided to give applio a go, let us know
Always open for support as long you need it
Man, if i'm using something no one here uses, isn't that a good thing actually? i can solve problems for other people that you guys can't help with
I suppose? not that I'd expect much of support towards tools not associated with the server anyways
Also i found the solution, works the same as normal Applio, just needed to drop the file in the download section of the interface
so if you wanna volunteer, go ahead
well, in normal applio you'd do it manually unless you're lazy
same as it always was with rvc
But congrats on figuring it out ✨
Can do this there too actually, but i just dropped the file, didn't create a new folder for it
alguien español?
Please speak English here as it's in the server rules
step 1) download the compiled applio version off Huggingface
step 2) unzip to C:\Applio
step 3) unzip your model into C:\Applio\logs\yourmodelname folder
step 4) use run-applio.bat to start it
how hard is that? Why do you need pinokio for that?
Please speak English, or speak Spanish at #🌍│español
Isn't that what's called a skill issue?
can't do inference on Applio
i followed the steps for amd gpu
Applio opened but it's stuck on infinite loading when i try to do inference and it's not using my cpu or gpu
u sure you didn't accidentally click on or have the console in focus?
the terminal is not on focus
it says: Compiling in progress. Please wait...
after I try to run inference
If i’m using custom models in applio do i need to check custom in the embedder model tab?
please go back to the install guide and read what it says at the very end
only if you have a model that requires a custom embedder, I doubt there are any in the wild
So just leave it at contentvec?
Will this let me use the pretrained custom models?
models using contentvec are compatible with original rvc
rvc's hubert is in fact contentvec 500
so yes, they are
well, or should I say " contentvec's 500class model "
What is Hop Length and what does it do exactly?
@uneven horizon
simplifying / abstracting it without going into too much details:
uhh, I left my PC on and activated sleep/hibernate mode on my PC and its still training?? When I went to work and thought nothing of it.
possible that it can still train during hibernate mode? My PC is chill and not hot or anything.
I'm 1005 epoch in lol
Here's 21 hours of training, when is it over training and should be stopped?
Loss D ?
Here ya go
Tensorboard is a series of graphs where we can monitor the progress of our model during training, but there are many graphs. We are only interested in the graph called 'g/total'. You can find this by clicking on 'inactive' and selecting 'scalars'. Then, go to the last page, where you will find it in the last graph.
Read it
hey, I was curious, if you add more audio into the dataset, do you have to start training the model from scratch or can you just continue and enhance the existing model?
How do I resume Training on Applio
I hope I understood it, so shortly after 16k it's overtraining?
See all parameters: G total loss, D total loss, kl, mel
batch size 40 on short dataset? 💀
No, it's on Batch 6 of 30+ mins of Datasets.
Really
also you have resumed training with different batch size than before
I think there are no symbols of overfitting ?
Did I have to remove the logs from previous trainings? I believed I started a new one
But how
Meaning, it can still be trained?
I left 1500 epoch on it 
I wonder too
U need ur logs folder
Use the same model name and sample rate, don't preprocess, don't extract features
Use same batch size and click train
Okay
On rvc disconnected u need to type 23333 in the g/d number cell
I believe
Ohh. Thank you. But I'm training on applio
Bet
is there any good tutorial to train ai model
what's ur pc gpu
4050 laptop gpu
How much vram
6
Not really the best
You can train RVC models on cloud (remote good pc):
- Prepare the Dataset
- Setup RVC:
Choose a cloud way to use RVC,
- Google Colabs (4 hours of daily gpu for free, not many hours, but easy to use):
- Applio (ui)
- Mainline (UI)
- RVCDISCONNECTED (no ui)
- Kaggles (a bit harder to use and needs phone number but gives 30 hours weekly of better gpus):
- Mainline (UI)
- Applio by Vidal (UI)
- Applio by Shirou (UI, no guide as of right now)
- Lightning.ai (Kinda hard, needs login, no issue with web uis or anything, but only free 15 credits monthly):
Google Colab = Easier but risk of getting disconnected
Kaggle = Harder but way more gpu time
- Be sure to know about the tensorboard
If you are looking for the easiest free way, use https://weights.gg which ofc uses RVC
But I think Cloud would be the best
thanks ill take a look
a vocal separation model
Voice cuts off… what to do?
where?
on discord
yw
you can't quite tell it tbf but
around 16k it's more or less where you should stop as past that the performance as you can see is regressing
if you want more accurate results, get my fork with averaged metrics
A converted audio you inferenced on RVC can be cut off to silence abruptly if using a bad RVC model. 
as then you can see avg performance throughout epochs themselves too
so you can "more or less" see how an epoch performed ( still on its own data, but that's better than what stock logging gives on its own )
ofc, normal losses are still there too
but you can already see the differences
Stock logging behavior is to log a given epoch's last step's performance, whereas averaging logs the loss over n steps ( of your choice ) within that epoch
reason I mention it is because stock loggings are hella inaccurate. Example;
Imagine your epoch is 67 steps, the logging takes place on step 67, that one could be great metrics wise but 80% of the steps in that epoch display rather mediocre or bad performance. You get the point
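To put numbers on that 67 step example, here is a small sketch comparing the two logging styles. The loss values are invented, and the two function names are hypothetical, not taken from the fork or from stock RVC.

```python
def last_step_metric(step_losses):
    """What stock logging reports: only the final step of the epoch."""
    return step_losses[-1]


def averaged_metric(step_losses, n=None):
    """Mean loss over the last n steps of the epoch (all steps if n is None)."""
    window = step_losses if n is None else step_losses[-n:]
    return sum(window) / len(window)


# Hypothetical epoch of 67 steps: mostly mediocre, one lucky final step.
epoch_losses = [22.0] * 66 + [17.0]
print(last_step_metric(epoch_losses), round(averaged_metric(epoch_losses), 2))
```

The last-step value looks great while the average barely moves, which is the inaccuracy being described.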
Naturally, having a proper evaluation phase during training would be the most ideal, where aside from training and own-losses, losses based on how the model performs on unseen data ( evaluation set ) are also measured. That'd showcase the model's generalization. Yet we don't have that ( at least not yet )
ps. batch size of 6 might not be the most ideal option here ( esp for 30 mins), I'd highly recommend trying out 8, it's more balanced and since 8 is a number that is a power of 2, the performance of training is somewhat better as parallelism in a sense is in your favor
i successfully trained and tested my voice model and it worked well. i used 200 epochs and 5 minutes of data but i wonder how many epochs and how much data length is ideal? im looking for any tips for newbies
You'd basically want to use tensorboard
Looks like so
i looked over that but i didnt understand anything
once you learn to evaluate what's going on with ur model on graphs, you can def improve ur models' quality
yeah I can help if you're willing to read a bit and dive into it
( worth it tho
im not aiming to be a professional but i wish for better
im just using this for trolling my friends not business
it's not really what professionals do
it is just what's used in all machine learning cases ( well, most, there's also keras stuff
cause " I'll train for N epochs as I think it's good " was not and won't ever be a rule to follow sadly
I mean yea, saving every single epoch and testing em is an option, but a pain in the ass tbf
is a bit confusing at the start but honestly a couple of reading and you'll get it pretty fast
you see.. if you don't wanna go that far into metrics, you can just follow simple rules~
- Lower = better.
- if it keeps on rising and keeps that tendency for a while = bad
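Those two rules can be written down as a rough check. This is only a sketch under my own assumptions: "for a while" is taken to mean a fixed number of consecutive rises (`patience`), and the loss values are invented.

```python
def rising_for(losses, patience=3):
    """True if the loss increased on each of the last `patience` checks."""
    if len(losses) <= patience:
        return False
    tail = losses[-(patience + 1):]
    return all(later > earlier for earlier, later in zip(tail, tail[1:]))


# Hypothetical per-epoch mel losses.
improving = [25.0, 23.1, 21.8, 21.0, 20.6]
overcooked = [21.0, 20.6, 20.9, 21.3, 21.9]
print(rising_for(improving), rising_for(overcooked))
```

A single bump up is normal noise; it is the sustained upward tendency that signals trouble.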
it is a nobrainer once you dedicate like 15 mins of your time into understanding it ( even basics will do, and you're already well prepared for most ml trainings), my dude
Imo reading basic graphs is a basic skill most of us should have in 5-10 years
ill check few tutorials i guess
the tensor board is actually easy after a 10 ish minute read, trust me i actually used to be exactly like you and think that the tb was only for professionals
or ill learn by trial and error
I mean, I can simplify it all
should take you a couple of minutes to understand them, don't worry, does not require years of machine learning knowledge to understand them
read this up
but in short;
if you get my fork, evaluation of your models gets pretty easy
you get to more or less see how that one epoch does, in terms of performance
imma try my best
Then you'd have like, two steps.
Normal graphs ( hypothetical scenario )
AI HUB Docs
