#✨│ai-help
does someone have an attractive female voice, free or something i can pay for, idc. im tired of making datasets that turn out mid
Could this be why it doesn't want to work? It's happened every time and I probably should have said so from the start, but it happens every time I try a fresh install
Some of them just get stuck and don't download and it's not as if there's a "Retry" button
As someone who prides themselves on being able to usually fix a problem with a computer this has got me brain burnt
Wouldn't be so bad if it didn't take upwards of 20 minutes to complete
the ones that aren't above 19 minutes are frozen 🙃
What would you all say is important when trying to make a real-time RVC model that can keep up with someone who often changes between 75 to 300 fx in pitch? Trying to develop a model that can keep up with someone who has that kind of speech pattern.
no wonder this channel is so messy, you should discuss that topic in #🔍│help-w-okada
I think I got it to work!
As of right now, disregard my crying
if the input audio is too short, try extending it to at least around 10-20 sec
How do I make my model sound good? For example, before saying a phrase or part of a song, there is a type of breathing, right? So I want this not to sound too robotic, and I train my models without this type of breathing, just the voice. What do you recommend?
Now it crashes whenever I try a custom voice, lemme watch a few more tutorials before I post anything more about this issue
This shit just doesn't want to be easy 
Checking around it doesn't seem like I'm doing anything wrong but getting an error code and crash when I use any custom voice
probably leaked some important info there idk, don't really care
Connect your internet. It will surely work. There is a plug-in named sup3 that is loaded over the internet.
I was connected to the internet ;-;
Then try to delete user settings. And restart it again
I did that too, but I'll try it again
Alright, I'll do it again
This was the first time the installation went off without a hitch
But I'll do it again
Also ask in #🔍│help-w-okada
Ah, will do, I asked here cuz it had to do with voices, my bad
@pastel oak is a great helper
He was helping me earlier, great guy
So did you get your answer?
no...
I figured the part he was helping me with out on my own
But it's no fault of his
Okay then re install it
Yeah doing that now
This thing has fought me the entire way lmfao
I appreciate your help btw
Alright all done and connected, default voices work, I'm going to try a custom one now
Same thing
Crashes on Custom Voice
-colab
- Applio, by IA Hispano Google Colab
- RVC Disconnected, by Kit Lemonfoot Google Colab
- RVC Mainline, by Hina Google Colab
- AICoverGen-WebUI, by Hina Google Colab
- AICoverGen-NoWebUI [English], by Ardha, fixed by Eddy, Hina and Gdr Google Colab
- AICoverGen-NoWebUI [Spanish], by Eddy, Hina and Gdr Google Colab
- UVR5 NO UI, by Eddy Google Colab
- UVR5 UI, by Eddy Google Colab
- Modified W-Okada's Voice Changer, Google Colab
- 🆕 FaceFusion UI, by Nick088 Google Colab
- 🆕 FaceFusion NO UI, by Nick088 Google Colab
- 🆕 EasyGUI, by Rejekts Google Colab
While the Colab free plan provides up to 12 hours of daily usage, the GPU is typically available for only about 4 hours each day on average.
why cant i go over 1000 epoch when trying to train??
like i stopped, reentered my g&d but i still cannot go over 1000 epoch
like it detects its already 1000 and stops
for original RVC, you can raise the maximum of the total-epoch slider in the GUI by finding this part in infer-web.py and changing it like this:
total_epoch11 = gr.Slider(
    minimum=2,
    maximum=10000,  # raise this cap so the slider allows more than 1000 epochs
    step=1,
    label=i18n("总训练轮数total_epoch"),
    value=500,
    interactive=True,
)
though practically most models will be likely to overtrain in less than 1000 epochs
tbh its really worthless to train over 1k
i highly suggest u not to do it
more epochs dont mean more quality
use the tensorboard
What’s ur pc gpu?
also be sure to not use yt tuts
-rt
be sure to use only the 1st link, the wokada fork
you need a better dataset and check the tensorboard
realtime voice changer for calls?
-rt
1st link, wokada fork
FOR WOKADA (REALTIME VOICE CHANGER FOR CALLS) ASK IN #🔍│help-w-okada

pinned tskr 
Real
next time someone asks about wokada maybe we should just tell them to use #🔍│help-w-okada
this channel got really messy tbh
oop very weird, thought u had some pre-historical thing
Well probably yea.
You can manually download these files from huggingface and drag them into the folders, i can send a link later
I got them to work now, the new issue is posted in the okada channel
Okok
I do appreciate your help though
can someone tell me how to use it on Mac
What ?
Btw already helped in #🧬│ai-chat message
okay, wont then
what file should i use?
like, to import my model in the voice changer
i guess the added_... as the index
but what pth?
trying to load sovits model?
or v1 pretrain into v2 training
768 is the number of channels in rvc v2 model
256 is in v1
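To make the 256 vs 768 point concrete, here is a minimal Python sketch. The helper names are hypothetical and the width is passed in by hand; nothing here reads a real checkpoint, it only encodes the rule stated above.

```python
# Hypothetical helpers: they only encode the v1/v2 widths mentioned in the chat.
FEATURE_DIM_TO_VERSION = {256: "v1", 768: "v2"}


def guess_rvc_version(feature_dim: int) -> str:
    """Return "v1" or "v2" for a known feature width, else raise ValueError."""
    try:
        return FEATURE_DIM_TO_VERSION[feature_dim]
    except KeyError:
        raise ValueError(f"unexpected feature width: {feature_dim}") from None


def is_compatible(model_dim: int, pretrain_dim: int) -> bool:
    """A v1 pretrain cannot seed v2 training (or the reverse): widths must match."""
    return model_dim == pretrain_dim
```

A size-mismatch error mentioning 256 and 768 usually points at exactly this kind of v1/v2 mix-up.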
what are you trying to do?
using a 2-year-old RVC app or something?
v1 model should work for inference
at least Applio supports both v1 and v2
dunno about mainline
-rvc
- How to use RVC Mainline Colab by Cauthess
- AICoverGen Colab Guide by Eddy (Spanish Helper)
- Create a model with RVC disconnected (colab) by Angetyde
err
what do i need to do to run RVC
it always closes itself when i try to open it
i open the other file
and its says that
help?
Does anyone have any recommendations where I can find more pretrains online? Besides the pretrain section on the server. I can’t seem to find any online
there are none online its not a big thing
pretrains are not recommended anyway
You see.. You are not meant to use rvc gui
No idea where you took that info from
why
erhm
Try to guess ( date )
You wanna pick either original rvc ( mainline ) or Applio
soo its not released?
no, it is just simply outdated lol
soo its already dead?
🤦♂️
rvc gui =/= rvc
If I may ask, where did you find out about rvc gui, yt? or someone recommended you it?
'rvc gui' is outdated and not used anymore ( for a long while now, in fact )
well, rip my dude
oh...
https://github.com/IAHispano/Applio
https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/tree/main
soo it's no longer possible to install rvc?
Just please, carefully read the repository and instructions
alright
Here's some useful info, docs and such just in case
@brisk nova
in case you got lost or something
Gluck ~
i done it, what do i need to do now
I mean, everything is in the docs or on this server here n there
where is the doc
I literally attached the msg
ok
cmon bro
#✨│ai-help message
which one
just read
hummm
but i downloaded these 2
?
Well then go for applio, simple
Look, no offense but, can you please read what's being said to you
Cmon, don't act like if you had no brain

The instructions are literally on the repo, nah, even better, a pre-compiled package is there for you to download
Like, really, no offense but I am reading papers rn, coding and working on rvc upgrades, and I can't afford to assist people who, despite being asked to, do not carefully read instructions or msgs. I am pretty sure that's understandable to a lot of people, and someone has to say it out loud already.
Because if you asked me? If I see incompetent people trying to play with AI, I can only suspect huge tragedies in future.
what does ko-fi mean when looking to buy voice models
Ko-fi is a platform where creators can receive support from their audience by allowing people to "buy them a coffee."
aka, you tip ( pay ) the creators, you get the models / commission someone for making them
same goes for paypal and such ( these are used quite often too, but they're not exactly like ko-fi per se )
delete what you got and download a proper compiled release
i dont understandddd
that's all you have to do ^
You can't get it any simpler man
dude
i need to download more shit?
i download 1 i download 2 i download 3
how many times do i need to download?
all i want is just an AI voice
i already have my model
that my friend gave me
Maybe you shouldn’t be using Ai since you are having a hard time reading simple instructions
what do you mean by that?
theres 3 of them and the RVC
and this command
No man, this is a clear skill issue
- You're given clear instructions
- You're asked to carefully read what is meant to be read by you to avoid redundant questions in here
- You download what you should not download and then make a problem out of it
You had one simple job. Read the instruction.
Then you'd know that **the only thing you're meant to download is " precompiled package " **
So please, don't get hysterical just because you can't read
you can't be helped, I am sorry for your loss.
why
@brisk nova hey mate!
what does that mean?
we've got guides available that give u a step by step tut on how to run RVC
alright
check those out first and then you can reach out for help if needed
but what do i need to install first
?
Uhhh, don't think it's a good idea Litsa
then run the go-web.bat
if they can't read guides right, I doubt they can handle visual c and faiss
i have one
bOoOoOOoOoOoO
We'll see
no bad
Again arg, do not use gui, as I said it's outdated and nobody uses it
download the one i gave ya

ok
i havent seen that in a year ohh god
ikrrr
I still wonder who the heck recommends these, as apparently an actual 'someone' recommended the thing(?)
old yt vids im guessing
well maybe, but that makes me think
are there still no proper guides anywhere? like, 2024 edition
yeah none
oof, maybe I should legit do one sometime, would save everyone tons of time lol
owo yes please
It's installing
oh there are, people just dont read them
rip
-gui
What
But its not ai Voice?
You have to understand one simple thing
'Rvc gui' is outdated.
Nobody maintains it, nobody can promise if it's bug-free, there's most likely worse performance expectations and obviously, you lose features that are in new stuff
In fact, rvc gui in here is treated as a meme
alright
how do I safely quit this, I just want to use the current Epoch 150, don't want to go further.
also how do I find the .pth and index file
did it save the model into assets/weights?
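For the "where are my files" question, a small sketch that searches an install folder for `.pth` and `.index` files. The `assets/weights` folder comes from the chat; where the index ends up varies by fork, so this assumes a recursive search of the whole folder is acceptable. The function name is made up for illustration.

```python
from pathlib import Path


def find_model_files(root: str) -> dict[str, list[Path]]:
    """Recursively collect .pth and .index files under root, sorted by path."""
    base = Path(root)
    return {
        "pth": sorted(base.rglob("*.pth")),
        "index": sorted(base.rglob("*.index")),
    }
```

Calling it on the folder where RVC or Applio is installed lists every candidate, including intermediate training checkpoints, so the small inference model is usually the one sitting in the weights folder.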
the AI cant hum well, is there a way to make it smoother or make it sound better?
What model were u using
im using deitris, all the models cant hum well. I was wondering if i need to tweak something. cant be my mic either.
Simply because the model wasn’t trained on humming
sigh* sheesh... i kinda thought about it but didn't know that was indeed the case.
A fix for this is training ur own
i guess so... can i train existing models?
I meant u train with a dataset dat you’ll make and will contain humming
okay bro, I guess I'll go right ahead and learn about AI training 
can someone give me a tutorial on how to install the voice changer
Please how do i create a .PTH file and .INDEX file on the step that says upload a voice model.
I'm having difficulties doing that
Which RVC are u using
I'm installing locally
Mainline
Did you firstly make the dataset
I'm new to this.
how do i make dataset?
Check this out first https://docs.ai-hub.wtf/rvc/resources/datasets/
Last update: Mar 8, 2024
mewo
this is the section to create an ai voice model?
yes
you should read the guides carefully
but generally, you do not touch that box
here's an example of how one could set it
i cant send images for help :(
(1) you name the model, pick samplerate that fits your dataset, select version v2
(2) you put the path to your sample, for instance: "bla/bla/bla/your_sample.wav"
(3) feature / f0 extraction
(4) train index button
(5) set the saving frequency to 5 as well, not gonna go too deep into that.
(6) amount of epochs uhhh, well, again not gonna go into advanced details but, pick 200 or 100, maybe 300 if you have larger dataset
(7) batch size: you can try 4, 8, 12, 16 ( if your hardware can handle it, go for 8 or 16 )
for choices pick: yes, no, yes
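The numbered settings above can be written down as a small config object. This is only a sketch: the class and field names are hypothetical and not RVC's own, and the defaults simply mirror the numbers suggested in the chat.

```python
from dataclasses import dataclass

SUGGESTED_BATCH_SIZES = (4, 8, 12, 16)  # values suggested in the chat


@dataclass
class TrainSettings:
    """Hypothetical container; the field names are not RVC's own."""
    model_name: str
    dataset_path: str
    sample_rate: str            # pick the rate that fits your dataset
    version: str = "v2"
    save_every_epoch: int = 5
    total_epochs: int = 200     # 100-300 depending on dataset size
    batch_size: int = 8

    def problems(self) -> list[str]:
        """Return warnings; an empty list means nothing stood out."""
        found = []
        if self.version not in ("v1", "v2"):
            found.append(f"unknown version: {self.version}")
        if self.total_epochs < 1:
            found.append("total_epochs must be at least 1")
        if not 1 <= self.save_every_epoch <= max(self.total_epochs, 1):
            found.append("save_every_epoch should be between 1 and total_epochs")
        if self.batch_size not in SUGGESTED_BATCH_SIZES:
            found.append(f"batch_size {self.batch_size} is outside the suggested values")
        return found
```

Writing the values down like this before touching the GUI makes it easier to repeat a run with one setting changed.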
Running with the system Python.
Traceback (most recent call last):
  File "C:\Users\admin\Downloads\RVC-GUI-main\RVC-GUI-main\rvcgui.py", line 23, in <module>
    from vc_infer_pipeline import VC
  File "C:\Users\admin\Downloads\RVC-GUI-main\RVC-GUI-main\vc_infer_pipeline.py", line 1, in <module>
    import numpy as np, parselmouth, torch, pdb
ModuleNotFoundError: No module named 'parselmouth'
what do i do for this
@tired raft
That's all I can tell you as I am busy rn and can't go too indepth
You should not use rvcgui. It is obsolete and nobody should use it
what is used nowadays?
Aside, for future.
If you see things like " ModuleNotFoundError: No module named 'parselmouth' "
module not found, that means you're lacking some python package or modules ( scripts )
typically you could try to install such with pip ( if applicable at given situation )
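A minimal sketch of that check in Python: it lists which imports are missing and builds the matching pip command. The helper names are made up; the one mapping included is that the `parselmouth` import is published on PyPI as `praat-parselmouth`. The install has to target the same Python that runs the script (the traceback above says it ran with the system Python).

```python
import importlib.util

# Import name -> pip package name, for the cases where they differ.
PIP_NAMES = {"parselmouth": "praat-parselmouth"}


def missing_modules(names):
    """Return the top-level import names that cannot be found in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def pip_command(names):
    """Build a pip install command line for the given import names."""
    packages = [PIP_NAMES.get(n, n) for n in names]
    return "python -m pip install " + " ".join(packages)
```

For example, `pip_command(missing_modules(["numpy", "parselmouth", "torch"]))` gives one command covering whatever is absent.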
im doing it with my friend. everything goes well for now
RVC or Applio for inferencing and training
rvc's built in real-time voice changer or w-okada for real-time voice changing
where can i find any of those
ok
But I'd rather recommend you applio as it's easier to set
alright
alrighty
I'd appreciate if you did not spam the pings
I am busy working on my project as mentioned + I reply to many people rn
sorry
What's the issue?
yeah im sorry
no I meant, what's the issue in your case
as in, what happened
docs 404 ?
Not really from website / docs department so can't say much more but
In the context of RVC, the dataset is an audio file containing the voice the model will replicate. It can be either speaking or singing.
have you tried this one?
Unless it's the same one you use ( can't see well on the ss )
i used the link you send, and for the tensorboard thing there was a "here" link and it didn't work as u can see in this screenshot
In that case I can't help much
again, I am not responsible for the docs, nor did I take part in creating them, so I wouldn't know what happened and / or if it was moved somewhere
Tensorboard is a very complex topic tho so. For now you're alright without it, as for your first model ( in my opinion at least )
where is this? Im fixing up the docs
nvm found it
The model is training rn
My friend did it
I think it's at 550 epochs rn (we put 1000)
what are these embedded models can anyone guide me through? (My training set is in a south asian language)
they are specifically trained for specific language features
a model trained with chinese hubert only works with chinese hubert for inference
wait so it isn't about language that im trying to convert
so I'm using training sets in tamil, and also converting voices to my training set voice in tamil only
one more time... a hubert feature extractor is a model that 'transcribes' speech into codes.. imagine one person writing down a chinese greeting as "ni hao ma" and another as "你好吗"
which person does it better?
contentvec is a person transcribing any language into roman characters
close enough for most purposes
there's a custom hubert you can use for Tamil
but again I have to say that if you do that, you have to use the same custom hubert for inference
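The "same embedder for training and inference" rule can be expressed as a guard. This is a sketch under the assumption that you record the embedder name yourself when training; the function and the names passed to it are hypothetical, not part of any RVC fork.

```python
def check_embedder(trained_with: str, used_for_inference: str) -> None:
    """Raise ValueError when training and inference embedders differ."""
    if trained_with.strip().lower() != used_for_inference.strip().lower():
        raise ValueError(
            f"model was trained with '{trained_with}' features but inference "
            f"is set to '{used_for_inference}'"
        )
```

Keeping the embedder name next to the model file is the simplest way to avoid the mismatch later.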
can someone help me figure out why this is the output im recieving when i try to train my model?
when i run the training i go to 1000 epochs in about 5 mins so theres something definitely wrong
already indicates something is wrong, do not go train
figure out what is wrong first
and stop using mangio ffs
you did not preprocess properly
you did not extract features properly
you've trained 1000 epochs on two mute files at best
Yh I downloaded applio and im trying it on that instead now
@idle adder go update it smh
i downloaded the newest version of applio and tried using it and im having the same issue. im wondering if it has to do with the size of my dataset (24hours) because its the first time ive faced this problem. ive made sure the audio is in the folder and that its linked properly. the preprocess takes about 40 minutes to complete which makes sense. my GPU is selected yet it still wont work and im not sure how else to troubleshoot this. im not trying to be annoying or dumb, im just still trying to figure all this out.
that preprocess sounds not normal, shouldnt be hell slow on even a crappy hdd. instead of using a single huge dataset file, I'd suggest following the audio labeling section in this guide: https://rentry.co/RVC-dataset-RX11
Ow ok thank you. I assumed it was normal for it to take so long because it was such a big dataset. Thank you
.np
actually, a lot of what was mangio ended up in my fork lol
speaking of mangio.. the heck is up with Kalo 🤔 did he just stop all rvc or ai altogether?
ig
a
last time i talked with him he just said he does guitar now or smt
-rvc
- How to use RVC Mainline Colab by Cauthess
- AICoverGen Colab Guide by Eddy (Spanish Helper)
- Create a model with RVC disconnected (colab) by Angetyde
.
is there a way to use a voice model to change the voice of an existing mp3 file? everywhere i looked doesn't seem to allow custom voice models.
⠀
Settings for Nvidia GPUs 
F0 Det.: rmvpe (suggested for all series)
RTX 40-series: 80-96 chunk | +16384 extra
RTX 30-series: 96-112 chunk | +16384 extra
RTX 20-series: 112-128 chunk | +16384 extra
GTX 16-series: 128-192 chunk | +8192 extra
GTX 10-series: 128-192 chunk | +8192 extra
Advanced Settings
Protocol : Sio or Rest
Crossfade: 4096 start 0.2 end 0.8
Trancate: 300
Silencefront: Off
Protect: 0.5
RVC Quality: Low
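The per-series values above fit naturally into a lookup table. A small sketch follows; the dictionary layout and the function name are assumptions for illustration, and only the numbers come from the list.

```python
import re

# Values copied from the settings list above; chunk is a (low, high) range.
NVIDIA_PRESETS = {
    "RTX 40": {"chunk": (80, 96), "extra": 16384},
    "RTX 30": {"chunk": (96, 112), "extra": 16384},
    "RTX 20": {"chunk": (112, 128), "extra": 16384},
    "GTX 16": {"chunk": (128, 192), "extra": 8192},
    "GTX 10": {"chunk": (128, 192), "extra": 8192},
}


def preset_for(gpu_name: str) -> dict:
    """Match a name such as "RTX 3060" to its series preset, else raise KeyError."""
    match = re.search(r"(RTX|GTX)\s*(\d{2})\d{2}", gpu_name.upper())
    if match is None:
        raise KeyError(f"no preset for {gpu_name!r}")
    return NVIDIA_PRESETS[f"{match.group(1)} {match.group(2)}"]
```

Treat the result as a starting range; the chat's own advice is per series, not per card.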
⠀
well not quite in a literal sense / in a way you probs intend to
what you wanna do instead is a separation ( to obtain extracted vocals and music / background whatever ) and then inferencing ( aka changing the voice ) then combine it all together
I have extracted vocals
and everything else
i just need to change the voice using a model, like from #1175430844685484042
idk where id do that, or if i can
to change the voice you're using a model ( we call that process inferencing )
Now, where can you use the models? you see, these are RVC models, so logically, you'd use RVC or Applio ( think of it as custom rvc with few things here n there )
but given that handling rvc isn't really noob-friendly
https://huggingface.co/IAHispano/Applio/resolve/main/Compiled/Windows/ApplioV3.2.7.zip
https://github.com/IAHispano/Applio/releases
what's so funny?
the way you explained it to me. idk, not a bad thing
Trust me, there's too many people that barely can read
at this point it's for me to avoid redundant repetition
anyone here Russian?
help
needed
the neural net isn't working
pls help
pls help ai not working:(
@candid meteor go ask on audio separation discord
There are people responsible for uvr, bsrof and so on and so on
or at least those that work on these
ok thanks
can't send links to discord servs so, just search it up
In any case, they'll help you
ok ill find it
Just use mvsep.com
It has all the models u will need
-audio
- Creating Datasets for RVC using iZotope RX11, by Cauthess
- Gathering and Isolating Audio, by SCRFilms ❄
- Instrumental and vocal & stems separation & mastering guide, by deton24
- Vocal Mixing Tutorial, by Roomie
- https://mvsep.com/
⠀
Google Colabs 
⠀
AICoverGen-WebUI
Useful for making quick covers, by Hina.
AICoverGen-NoWebUI
Useful for making covers, doesn't include a UI, by Ardha, fixed by Eddy, Hina and Gdr.
RVC Disconnected
To train new voice models, by Kit Lemonfoot.
EasyGUI
The OG interface, by Rejekts.
⠀
⠀
Download for Nvidia GPUs 
Version 18a cuda
Download for AMD GPUs 
Version 18a directml
Download for Intel GPUs 
Version 18a directml
Download for Mac 
Version 17b Mac
⠀
⠀
Settings for AMD GPUs 
Don't forget that your models need to be converted to ONNX!
F0 Det.: rmvpe_onnx (suggested for all series)
7xxx XT cards: 112-128 chunk | +16384 extra
6xxx XT cards: 128-192 chunk | +16384 extra
5xxx XT cards: 192-256 chunk | +8192 extra
RX 580: 192-256 chunk | +8192 extra
RX 570: 192-256 chunk | +8192 extra
RX 560: 256-384 chunk | +8192 extra
Advanced Settings
Protocol : Sio or Rest
Crossfade: 4096 start 0.2 end 0.8
Trancate: 300
Silencefront: Off
Protect: 0.5
RVC Quality: Low
⠀
⠀
Anyone know why my RVC is talking garbage?
😭
@scenic gale
@brittle wing
Like what I'm hearing isn't what I'm saying.
Hello my rvc is just saying "waiting generating pipeline and pipeline not installed" looped how do i install the pipeline?
⠀
HuggingFace Spaces 🤗
⠀
Ilaria RVC
EasyGUI port with some improvements, by Ilaria.
RVC-HFv2
Applio port, by r3gm.
AICoverGen
AICoverGen port, by r3gm.
Advanced RVC Inference
Extended version of the GUI with advanced settings, r3gm.
⠀
Try to delete pretrain and model_dir folders and relaunch.
Also next time use #🔍│help-w-okada
sadly for pure echo ( not reverb ) there's not a single one that'd do the job without damaging the audio
as far as I know
Yeah but which one?
I know:(
Aggressive or normal also what do you use for reverb removal also noise?
You first remove reverb then echo?
⠀
Local Forks 🖥️
⠀
Mainline RVC
Original project, suggested for advanced users,
by the RVC-Project team.
Applio
Simplified, suggested for all, by the Applio team.
RVC Studio
Simplified, suggested for all, by SayanoAI.
Mangio-RVC
Simplified, may not be supported anymore, by Mangio621.
AICoverGen
Simple yet great way to make covers, by SociallyIneptWeeb.
Replay
From the creators of weights.gg, excellent product for everyone.
⠀
is it possible to just make my voice slightly deeper and change nothing else
when you get past ~4000 slices it starts to combine features
normal
What's your dataset making procedure?
i need help with something: if i set the voice changer to use my graphics card it doesnt work, only with my CPU
-bs roformer
-melband karaoke (if needed to separate lead from back vocals)
-bs roformer de reverb
-uvr de echo normal
-uvr denoise (0.5 aggressiveness)
BSRoformer dereverb: use as is or extract vocals?
then on rx 10
-de click
-de crackle
-mouth de click
extract
Does this method work
dats how i isolate my songs
And you got model maker?
im model master
So your method is very effective for dataset?
pretty much
Only with MVSep?
rx 10 and audacity
Hm well will it still work without
I can use audacity but web version I'm on mobile
its better to use rx 10
Latest BSRoformer right?
yes the august ver
I'm on mobile
use it to eq low end and noise gate ur audio
and resample to 32khz when exporting as wav
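Before training it can help to confirm what sample rate an exported file actually has. A minimal sketch with Python's standard `wave` module; the function names are made up, and `wave` only reads PCM WAV files, so a 32-bit float export would raise `wave.Error` here.

```python
import wave


def wav_sample_rate(path: str) -> int:
    """Read the sample rate from a PCM WAV file header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def needs_resample(path: str, target_hz: int = 32000) -> bool:
    """True when the file is not already at the target rate (32 kHz per the chat)."""
    return wav_sample_rate(path) != target_hz
```

This only reports the rate; the resampling itself is still done in the editor on export.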
Can goldwave be useful too?
i never used it, u can use it if it does the job
Is the mvsep process enough?
On what
Song
from who?
Jhope, BTS
Why you asking?
can't really say
What did you make models of?
wym
Mel and karaoke-extract from mixture or vocals?
both are the same but from mixture is faster
Extract from Vocals takes away from the lead vocals
tho, doing filtering on your own is pretty much useless as rvc does it better with butterworth filter ( 0-48~ hz )
I'd opt more towards plosives handling
frl?
i didn't know dat
Yea, what rvc does in preprocessing is
normalization ish, butterworth filtering of low low hz
in fact, on the user-end, main things that should be in-check is dynamics, noise and maybe few other things ( aside of obvious reverb and delay, tho slight reverb isn't as destructive really )
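To show what a Butterworth high-pass around 48 Hz does, here is a pure-Python sketch of a second-order version (a biquad derived with the bilinear transform, Q = 1/sqrt(2)). It is an illustration only, not RVC's own preprocessing code, and the filter order used there may differ.

```python
import math


def highpass_butter2(samples, sample_rate, cutoff_hz=48.0):
    """Second-order Butterworth high-pass applied sample by sample."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2.0 * (1.0 / math.sqrt(2.0)))  # Q = 1/sqrt(2)
    a0 = 1.0 + alpha
    # Normalised biquad coefficients.
    b0 = (1.0 + cos_w0) / 2.0 / a0
    b1 = -(1.0 + cos_w0) / a0
    b2 = b0
    a1 = -2.0 * cos_w0 / a0
    a2 = (1.0 - alpha) / a0
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out
```

A constant (DC) offset is removed almost entirely, while a 1 kHz tone passes through essentially unchanged, which is why manual low-end filtering adds little on top of it.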
how do i get weights cover ai to sound good? when i select a good voice it doesn't sound like them in some parts of the song, not all. any tips?
Yet, given how plosives tend to sit at around 60-70 to 150 / 200 hz ( depending on the voice ), rvc won't tackle that
so user has to take care of it
how can i know if i should cut certain breaths in my dataset
tbh, you shouldn't
reason is, rvc's slicing the audios into 3 or 3.7 sec segments where there's always 0.3 or so secs of overlap ( from consecutive segments )
then each is normalized so, if breathing is captured in there, it'll be fine
Only case when you could get rid of such is when they're too contaminated with noise
where you have a suspicion rvc would mismatch it with noise
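The slicing described above (segments of about 3.7 s with roughly 0.3 s shared between neighbours) can be sketched as follows. The function is hypothetical and works on sample indices only; it is not RVC's slicer, just an instance of fixed-length segments with overlap.

```python
def segment_bounds(n_samples, sample_rate, seg_s=3.7, overlap_s=0.3):
    """Return (start, end) sample indices for overlapping segments."""
    seg = int(round(seg_s * sample_rate))
    step = int(round((seg_s - overlap_s) * sample_rate))
    if seg <= 0 or step <= 0:
        raise ValueError("segment length must exceed the overlap")
    bounds = []
    start = 0
    while start < n_samples:
        end = min(start + seg, n_samples)
        bounds.append((start, end))
        if end == n_samples:
            break  # the last segment reached the end of the audio
        start += step
    return bounds
```

Because consecutive segments share their edges, a breath that falls on a cut point still appears whole in one of the two neighbouring segments.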
good lord i used to cut every single breath at sum point 😭
F
I mean, you can always add some in, to the dataset but that sometimes might not be ideal
BSRoformer dereverb sadly takes away from the vocals
its aggressive yeah
but its the best
for dereverb I'll always recommend a thing I use in fl ( an AI vst )
Yeah but what do I do if it takes from the vocals...
is it the one similar to dialogue isolate?
nope
dialogue isolate is actually pretty bad
the vst is from waves
waves clarity vx dereverb pro
is it long? the thing you work with ( that has reverb )
use mel but set preprocess as is
Nevermind then lol
Wanted to dereverb it for you
? No need
Cause I ran out of minutes on X-Minus have to wait til next week
Why
it adds noise
yh
u can use mdx if u want, but then youll have to deal with the mdx noise 😂
I have dealt w it already it's unfixable
Understand?
de reverb doesnt remove delay
I meant "dealt"
@flint solar check it out
Output isn't tiptop perfect as I gave it a very hard scenario to handle to showcase the performance ( reflections cranked up almost to max aside of reverb )
The noise is stuck in embedded
I'm on mobile
?
is it paid
well unfortunately yea, unless you know where to find stuff
i dont rlly understand what u mean
ur still de reverbing ur audio
I don't use computer
BSRoformer dereverb takes away very much from the lead vocals
there is no diff between mobile and computer
its the same model
Uh I got confused
I thought you sent this @flint solar
?? Yea I did send it but I don't get what you're on

Anyway, I'm about to start my work so, would appreciate no unnecessary @ s
hey can someone suggest me a google colab that works for making ai covers?
UVr deecho normal at what aggressiveness?
any idea how to fix: [Failed to fetch
TypeError: Failed to fetch] when trying to download a checkpoint
keep it at 0.3
Default?
yes
I used MDX23xc Dereverb on the colab
Is that okay?
It's very aggressive there
no
u still got ur lead vocals bro
But not as full
You gotta accept the way it is
No current existing method will give you 100% perfect or full vocals
roformer type models will give you best result but might be aggressive compared to mdx.
MDX models on the other hand are less aggressive (they typically don't damage the audio) but their results are questionable and oftentimes quite poor
it is a matter of " pick your devil "
use the recent melroformer dereverb (not the archived one): https://huggingface.co/anvuew/dereverb_mel_band_roformer/tree/main
MDX dereverb has worse quality imo
hi i forgot how to use this stuff. when i download a model what files do I need? right now im downloading a bin file, does that have the .pth in it?
⠀
Local Forks 🖥️
⠀
Mainline RVC
Original project, suggested for advanced users,
by the RVC-Project team.
Applio
Simplified, suggested for all, by the Applio team.
RVC Studio
Simplified, suggested for all, by SayanoAI.
Mangio-RVC
Simplified, may not be supported anymore, by Mangio621.
AICoverGen
Simple yet great way to make covers, by SociallyIneptWeeb.
Replay
From the creators of weights.gg, excellent product for everyone.
⠀
it has to have .pth and the added .index file
that seems not an RVC model, where did you get it from?
is there a way to get less delay that works well? because when i use my voice changer it takes several seconds to work
When I train a voice with RVC it generates the .pth file(s) but not the .index file(s). Is this a common issue?
⠀
Google Colabs 
⠀
AICoverGen-WebUI
Useful for making quick covers, by Hina.
AICoverGen-NoWebUI
Useful for making covers, doesn't include a UI, by Ardha, fixed by Eddy, Hina and Gdr.
RVC Disconnected
To train new voice models, by Kit Lemonfoot.
EasyGUI
The OG interface, by Rejekts.
⠀
-uvr
How to fix "RuntimeError: Error(s) in loading state_dict for SynthesizerTrnMs768NSFsid:"
In RVC Disconnected Training
I can't train cause of that error
does anyone know that one website that is used for making ai covers? if anyone knows what im talking about please let me know
thank u sm
np ✨
Nice
I don't have a computer.Any Colab/alternative?
Hi, it might be the wrong place to ask this but i recently used applio and suddenly everytime i convert, there would be segments, i did restart and reset the output setting but it seems to stay there, is there anything i can do to convert without the segmentation?
Edit: Nevermind, it worked by simply converting and not opening the output tab, thank you all
hey everyone why is my voice so glitchy
it is included in my tweaked version of MSST colab: #1159290752195633273 message
it's not included in the original jarredou's but you can also add it in this code part:
Can you link the colab I lost it also at what settings
mine or jarredou's colab?
Got the link, what are the settings
hey can I still get help with this
The more aggressive or less
use the normal one, and you can leave these settings (or overlap 8 for faster processing).
chunk value 0 means it uses the config file's setting
Chunk size zero?Why
yea it uses the config setting
for melrofo dereverb it is 352800
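To make the "chunk value 0" rule concrete, here's a minimal sketch. The helper names are made up for illustration (they are not the colab's actual code), and the 44100 Hz sample rate is an assumption about the model config.

```python
def resolve_chunk_size(user_value: int, config_value: int) -> int:
    """Return the chunk size that would actually be used.

    Per the messages above, 0 means "fall back to the config file's setting".
    """
    return config_value if user_value == 0 else user_value


def chunk_seconds(chunk_size: int, sample_rate: int = 44100) -> float:
    """Length of one chunk in seconds (sample_rate is an assumed value)."""
    return chunk_size / sample_rate


# Leaving the field at 0 with the melrofo dereverb config value of 352800:
effective = resolve_chunk_size(0, 352800)
print(effective, chunk_seconds(effective))
```

So under that assumed sample rate, 352800 samples is an 8 second chunk.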
Nice how do I make a dataset through using this colab
I mean what steps and models
What are the actual settings for denoising
you can follow this datasetting guide: https://rentry.co/RVC-dataset-RX11
(use vpn if you can't load some contents)
What should I tick/untick on your Colab
Use test time augmentation?
- extract_instrumental: includes inversion of the target stem
- use_modelname: the model name is included in the output file name (Model Test Mode in UVR)
- use_modelconf: some config params (overlap & chunk_size) are included in the output file name
- use_customconfig: will use the custom config below it
- not 100% sure but I think TTA is not really necessary
Yeah but what settings should I leave
Use custom configuration or not?
yes, unless you want to use overlap 2 from the model's config file
What settings should I leave ticked and which ones unticked?
That's what I'm asking
Only use custom configuration?
@knotty moth is this correct
it's okay
Are the settings correct?
You said chunk size 352800, Mel roformer dereverb normal
For reverb removal
Nice what are the settings for denoising
you can use the same settings
the normal mel denoise (1)
well, rx 570 isn't anything beefy really
lagging is to be expected
alternative you have is to just, go for onnx and w-okada but with quite a lot of latency so sadly yea, you won't get any actual 'realtime' experience
For deecho
you can use UVR de-echo in mvsep
How much aggressiveness setting
Cuz 0.3 still leaves out
if you tried higher values: 0.5 and 0.7, and still the same, I don't think there's any good de-echo model yet
The one on x-minus *cough* but I ran out of minutes, I have to wait til next week
btw I have been working on mostly good ol' rock & metal songs, barely on the modern 10's and 20's pop songs that may contain such difficult echoes
And?

You make datasets out of these?
nah just for making covers
How do you remove the echo
and also sometimes love live and weeb songs
the echoes are still quite easily removed by UVR/dereverb models
How do you prepare your samples for inferencing
Yes but at 0.3 aggressiveness?
yeah, the equivalent value in UVR gui is 30 (of 100)
UVr Denoiser at 0.5 or Mel-roformer denoise 1?
the latter, and it's fullband
I know that, I'm very familiar w UVr architecture it's the best one
Mel
so how do i making ai covers guys
the easiest ive had experience with is Applio, just run the bat and you can even train model yourself if you have the resource
I found a secret sauce: when I tried it on kim's melroformer, chunk_size = 485100 turned out to be the optimal one since it corresponds to dim_t = 1101. it is also used in unwa's models in their config file, and I think it should also apply to other roformer models, yea including the dereverb & denoise models as well.
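A rough sketch of how chunk_size and dim_t could line up, assuming the relation chunk_size = hop_length * (dim_t - 1) with hop_length = 441. Both the formula and the hop value are assumptions (they are not stated in the message above), and the function names are made up for illustration.

```python
def chunk_size_from_dim_t(dim_t: int, hop_length: int = 441) -> int:
    """Samples per chunk for a given number of STFT frames (assumed relation)."""
    return hop_length * (dim_t - 1)


def dim_t_from_chunk_size(chunk_size: int, hop_length: int = 441) -> int:
    """Inverse of the above; rejects chunk sizes that are not a multiple of the hop."""
    frames, remainder = divmod(chunk_size, hop_length)
    if remainder:
        raise ValueError("chunk_size is not a multiple of hop_length")
    return frames + 1


print(chunk_size_from_dim_t(1101))    # the 485100 value quoted above
print(dim_t_from_chunk_size(352800))  # the melrofo dereverb config value
```

If a model's config uses a different hop_length, the matching chunk_size changes with it, so check the config before copying the number.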
@brittle wing notice this also..
Imo this is the best dereverb model it doesn't take away from the vocals as much as the BSRoformer one
You're smart
hey , i js started to use the voice changer , is there any way for it to mute the app so i dont hear myself echoing on a call
-rvc
- How to use RVC Mainline Colab by Cauthess
- AICoverGen Colab Guide by Eddy (Spanish Helper)
- Create a model with RVC disconnected (colab) by Angetyde
^
agfdsgh
yea lol feels like I should apply these new methods on some cover songs and past vtuber datasets I have been working on, plus unwa's inst v1e is goated for my covers
@knotty moth is it okay if I use BSRoformer for acapella, then the Mel roformer karaoke model for lead vocals, then Mel roformer dereverb normal in the colab, then UVR deecho at 0.3 and Mel roformer Denoiser 1 in the colab? Will that help me get model maker
Yes you should
@alisa how do you prepare your samples for inference tho
I trained a voice and i have a G_2333333 and a D_233333 Data and i cant use it in RVC GUI why?
kim's melroformer is my primary choice for vocals, though I may also consider beta4 for a bit more fullness or mvsep/unwa's BS rofo as secondary one to mitigate some bleeds (personally some hip-hop/rap, k-pop and weeb songs may be little more difficult to deal with)
Mhm the method I listed actually works
For noise UVR Denoiser or Mel Denoiser?one last question
after vocal extraction, dereverb and denoise as usual, also I used renegate as final process for both cover & dataset making
melband one, and plus renegate plugin (idk if there's similar one for mobile)
Denoise mhm I have a bandlab preset that makes vocals realistic and filters out noise
Seriously isn't UVr Denoise better
It actually adds its own noise in the output
It has been proven I remember thinking of it and someone posted proof of that through spectrogram analysis
YES
So Mel roformer Denoise for dataset and stuff
But Uvr Denoise on uvronline is the best at times it doesn't add noise or does it ...
But yeah UVr Denoise is good for denoising UVr outputs of instrumentals.
I feel like UVR denoise is such an old method (though many novices may still use it). it has a 17.5 kHz cutoff and also somewhat aggressively removes quiet voices (including whispering) under -27 dB
nevertheless, imo Renegate plugin can act as a final denoising process (it works as noise gating but in smarter way)
But I like it UVr architecture is my personal favorite
It's the best but noisy
But I wouldn't use that model for denoise cause it generates its own noise! For real, I noticed it myself and there was even a post w proof
ig the noise it may add is just chunk artifacts, but the updated MSS colab should eliminate it by internally using batch_size = 2 instead of 1 in the model config file
@knotty moth stop suggesting me unwa models for Acapella they make the lead vocals sound muffled, I tried duality V1 and just no and the dereverb eats them out even more.
I prefer the official BSRoformer model made by the true developers, thanks
the vocals in even mvsep and viperx 1296/1297 BS roformer are not less muffled than kim's melrofo, I haven't found truly full vocals that are better than unwa's. you may also try Lew's vocal enhancer though may not be ideal for dataset making.
personally I haven't made datasets using song tracks though.
No the vocals enhancer adds noise even more to deal with
Which model by unwa are you talking about
????
beta4 and duality v1, though the latter has more noise
Duality v1 makes my dataset sound muffled as hell 🤦
Beta 4 oh the background vocals and reverb are very stubborn
These models don't work for me.
If they work for you fine.
ig you'd say the same thing on kim's melrofo? (as I judged on the high-end fullness)
though true that BS rofo 2024.08 is quite solid
I don't use Kim's Mel roformer for datasets also it's a good one for instrumentals why
BSRoformer is the best.
tbh it is my SOTA in early to mid 2024 (2024.04 then .08)
I wish it could be eventually downloadable and usable for UVR and MSST colab
I know.
They say they don't plan on releasing it
ah I see x-minus server is unstable rn 💀
Nah I ran out of minutes tho
Hi I'm new to applio and I was wondering if there's some way to tie settings to a voice model. Specifically TTS Voice, TTS Speed, and the Pitch.
Like is there maybe some kind of settings file that could be made and put in the model folder that could set it up?
You can't really force it to be tied to that specific settings for tts
Rip
morenatsu 
-# I've been found
Hello
I forgot how to make ai covers soo.... it's been a long time since I last did it
What's your PC GPU?
I've noticed a few voice models come with a "trained" and an "added" index file. What is the difference between them?
Faiss Integration (.index file): The Faiss library enables efficient approximate nearest neighbor search in RVC during inference, retrieving and combining training audio segments with the closest embeddings. For your final RVC model, include the one whose file name starts with added.
Example:
added_IVF157_Flat_nprobe_myModel.index
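A tiny sketch of that rule in code: given the files that ship with a model, keep the .index whose name starts with "added". The helper name and the "trained_..." file name below are made up for illustration.

```python
import os


def pick_inference_index(filenames):
    """Return the first .index file whose base name starts with 'added', else None."""
    for name in filenames:
        base = os.path.basename(name)
        if base.endswith(".index") and base.startswith("added"):
            return name
    return None


files = [
    "myModel.pth",
    "trained_IVF157_Flat_nprobe_myModel.index",  # hypothetical name
    "added_IVF157_Flat_nprobe_myModel.index",
]
print(pick_inference_index(files))
```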
Hey guys I just joined and how da heck do I get the list of existing models, voice models shows no list. TY HUGGZ
guys, why do my models always have a quiet voice, like barely hear it unless i put it 200% output
the added index is the one used for inference
- Applio, by IA Hispano Google Colab
- RVC Disconnected, by Kit Lemonfoot Google Colab
- RVC Mainline, by Hina Google Colab
- AICoverGen-WebUI, by Hina Google Colab
- AICoverGen-NoWebUI [English], by Ardha, fixed by Eddy, Hina and Gdr Google Colab
- AICoverGen-NoWebUI [Spanish], by Eddy, Hina and Gdr Google Colab
- UVR5 NO UI, by Eddy Google Colab
- UVR5 UI, by Eddy Google Colab
- Modified W-Okada's Voice Changer, Google Colab
- 🆕 FaceFusion UI, by Nick088 Google Colab
- 🆕 FaceFusion NO UI, by Nick088 Google Colab
- 🆕 EasyGUI, by Rejekts Google Colab
While the Colab free plan provides up to 12 hours of daily usage, the GPU is typically available for only about 4 hours each day on average.
fix the volume in the trained data
so i need to fix the sound file before training right ?
@brittle wing do not send datasets here for copyright reasons
💀
If you crashed 4 times out of RAM, your PC is not powerful enough to do it locally
you can use cloud (remote good pc)
where you saw copyrights?
we don't allow datasets to be shared here
the server got already taken down once
bruh
we don't want this again
You can run RVC on cloud (remote good pc):
- Prepare the Dataset
- Setup RVC:
Choose a cloud way to use RVC,
- Google Colabs (4 hours of daily gpu for free, not much hours, but easy to use):
- Applio (ui)
- Mainline (UI)
- RVCDISCONNECTED (no ui)
- Kaggles (a bit harder to use and needs phone number but gives 30 hours weekly of better gpus):
- Mainline (UI)
- Applio by Vidal (UI)
- Applio by Shirou (UI, no guide as of right now)
- Lightning.ai (Kinda hard, needs login, no issue with web uis or anything, but only free 15 credits monthly):
Google Colab = Easier but risk of getting disconnected
Kaggle = Harder but way more gpu time
- Be sure to know about the tensorboard
If you are looking for the easiest free way, use https://weights.gg, which ofc uses RVC
if the source audio is quiet, then the trained model will be quiet
but i checked multiple time, the source audio volume is pretty good
but the result is so much smaller
what are you using for training?
this sounds like a very obvious question
but
is your real mic volume… low?
i put everything to 100 but still low
like if i use normal mic, it's normal, whenever i use rvc, it got really small
are you close to the mic?
yes, very.
ahh then if your mic volume is fine, its the model fault
but let me try again, maybe it was my fault somewhere
be sure to hear your real voice first
so u can check if your real mic volume is loud enough
okay, i will do that, thank you for your time
if other model works fine, but yours does not, then the problem is obvious
other models have the same issue tho
but my input is fine
is the inferred audio really quieter than the source one?
check the volume envelope setting in the inference, the default one should be fine
(did you use ilaria RVC or Applio, tho?)
i used applio to train model
and inference?
ah, then it may be the mic settings
but i set it at 100
try speak louder since you apply Sup2, or how about when Sup2 turned off?
same volume when turn off sup 2
Hello, I am using local RVC, I have a save every 30 epochs. Last night I was training a model and the PC shut down but I still have my saves. Can I continue training from models with these saves?
increase it to around 150
i read somewhere doing so will drastically reduce the quality, right?
no, this setting increases your mic volume
in cases like this where the real input volume is low you need to increase it
ahhhhhhhhhhhh
if it is too loud to cause clipping
be sure to not increase it too loud
i got it
when it reaches distortion/clipping the model quality degrades
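A small sketch of the "boost it, but stop before clipping" idea. It assumes the gain setting is a plain percentage (150 means 1.5x) and that samples are floats in the -1.0 to 1.0 range; both are assumptions, and the function names are made up for illustration.

```python
import math


def peak_dbfs(samples):
    """Peak level in dBFS for float samples (0 dBFS is full scale)."""
    peak = max(abs(s) for s in samples)
    return float("-inf") if peak == 0 else 20 * math.log10(peak)


def apply_gain(samples, gain_percent):
    """Scale samples by a percentage gain (assumed linear, 100 = unchanged)."""
    factor = gain_percent / 100
    return [s * factor for s in samples]


def clips(samples):
    """True if any sample reaches or exceeds full scale."""
    return any(abs(s) >= 1.0 for s in samples)


quiet = [0.1, -0.2, 0.5]
for gain in (100, 150, 250):
    boosted = apply_gain(quiet, gain)
    print(gain, round(peak_dbfs(boosted), 2), clips(boosted))
```

If clips() turns true at the gain you picked, back it off until it doesn't.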
Any ideas?
ah 1 last question.
if i train a model, those D and G are pre-trained right ?
It stores the data I feed it, so if i want a model for singing in my language, i can throw a bunch of songs then train ?
After that i use a voice i want to mimic to train on top of that pretrain ?
D and G are model weights used for training
g is the generator of the model, d is the discriminator, it doesn't work like that
default pretrain is generalized model that is average for everything
when you train a new model on top of it, you change generalized model to more specific
did you save the g and d files? if you did it then you can continue training, if not, no
training a new model on top of a different model may not be beneficial
ohhhh, so how to allow a model to sing tho ?
speech models will always suck at singing
they dont have singing data so they are never gonna sound as good as a model trained on singing
Yes, I've registered them both. How can I continue the training?
maybe i think too simple about these models
so i have to pick 1 between singing and talking model
go to the training tab
in the model name use the same exact name of your model (if you named it lyery before, then it has to be exactly lyery)
don't preprocess, don't pitch extract (very important too)
set the same batch size you set (VERY IMPORTANT) if you used batch 8 while training, use batch 8 again
same epoch amount and same save freq amount
if everything is the same, you can now click start training
and is going to continue training using the latest g and d
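Before hitting start, it can help to check which G/D pair is actually the latest in the model folder. A minimal sketch, assuming the checkpoints are named G_<step>.pth and D_<step>.pth (the naming is an assumption based on the G_/D_ files mentioned in this channel; the helper is made up, not part of RVC or Applio).

```python
import re
from pathlib import Path


def latest_checkpoints(model_dir):
    """Return (step, G path, D path) for the highest step that has both files, else None."""
    pattern = re.compile(r"^([GD])_(\d+)\.pth$")
    steps = {"G": set(), "D": set()}
    for path in Path(model_dir).iterdir():
        match = pattern.match(path.name)
        if match:
            steps[match.group(1)].add(int(match.group(2)))
    # Only steps where both the generator and the discriminator were saved count.
    common = steps["G"] & steps["D"]
    if not common:
        return None
    step = max(common)
    return step, Path(model_dir) / f"G_{step}.pth", Path(model_dir) / f"D_{step}.pth"
```

If this returns None, there's no matching G/D pair to continue from.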
for realtime voice changing use a speech model
singing models suck at speech
ah i get it
and don't mix singing and speech in a dataset
you'll get better result training a pure speech/singing model than a mixed
by the way, is embedder model important ?
yes, always use contentvec
ah so pretrained is where i pick which one suitable for my language ?
custom pretraineds have issues the original doesn't have
in simpler words the original pretrain is going to reconstruct your dataset frequencies better
despite the original one being on english only, you can train any language with good results (i train mostly spanish models and they have good pronunciation even if the pretrained is english only)
avoid using custom pretraineds and use only the original
you can use them but you may get "exclusive" problems related to that specific custom pretrained used

new thing learned
right, i will get it to work, but do you have an example of epoch number ? or just purely observe the graph
an example amount would be 200
but the real answer is we can't give an "good amount" of epochs since its random
me personally i dont train over 200 epochs
most of my models are usable at around 100-150 ish
ah okay
cuz i only do 120 then i look at graph
but sometimes graph is flat
no low point
just flat
batch size too high or dataset too big*
or both together mixed
i set batch size at 8 and most of the data is around 40mins
let me try this time, maybe it's different.
you should be getting low points uhh, is your graph smoothing set too high? like 0.999
i set it 0.982 like in doc said
ahh that explains, you can set it lower and you'll see your low points

but smooth graphs in big datasets is normal anyways
not really a bad thing
as long you hit low points is fine
and the graph is not rising
small datasets have multiple low points but that doesn't mean its good
(actually thats kind of bad lol)
u have to hear the epochs, smaller datasets suffer from overfitting rather than overtraining
for example you'll hear robotic sibilances (s, ch sounds), they also sound unnatural due to the lack of data
i have to check one by one ?
nah only low points
choose the mel low points, not the g/total ones
g/total is the generator loss, it's merely mel, kl and fm (plus the adversarial term) added together
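A sketch of why the smoothing slider matters when hunting for low points. It assumes TensorBoard-style smoothing is a debiased exponential moving average (an assumption about how the slider works); the function names are made up for illustration.

```python
def smooth(values, weight):
    """Debiased exponential moving average; weight in [0, 1), higher = smoother."""
    smoothed = []
    last = 0.0
    for i, value in enumerate(values, start=1):
        last = last * weight + (1 - weight) * value
        # Divide out the bias from starting the average at 0.
        smoothed.append(last / (1 - weight ** i))
    return smoothed


def low_points(values):
    """Indices that are strictly lower than both neighbours."""
    return [
        i
        for i in range(1, len(values) - 1)
        if values[i] < values[i - 1] and values[i] < values[i + 1]
    ]


mel = [20.0, 18.0, 19.0, 17.0, 18.5, 16.0, 17.5]  # made-up loss values
print(low_points(mel))
print([round(v, 3) for v in smooth(mel, 0.982)])
```

A weight close to 1 tends to shrink the dips, which is one way a graph can end up looking flat even when the raw values do have low points.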
about this, does this apply to all quantity of dataset or just smol one ?
ahhhhhhh
1 step closer to Chamber's voice
the robotic sibilance and unnatural sounding model? yeah this happens due to smol dataset only
you can also get those on big datasets but those happen very late in training (but the big dataset is always going to sound more natural)
while on smol datasets that happen extremely early
that explained why 300 epoch sound so bad
10 minutes is the bare minimum, it still overfits quite fast but not as fast as a 5 minute one
the more data u get, the later the model is going to overfit
and the more natural is going to sound
with more data
you don't need to de-ess the dataset to fix the robotic SH sounds, this is just a myth
is merely a dataset length thing
ah, so can i add data to the model ? like i got 1 data that is 10 mins, i train it. then later i got more data, can i add on top of it ? or i have to train again ?
as long all of the data is the same quality, and not damaged, sure you can add more
but you have to train it again
from 0
a new model with the added data
Okay, I'm trying, thank you very much
you can't add data to an already existing model
no prob training model is easy, whats hard is to be sure the dataset is not damaged hehe
keep in mind that every model trained on mvsep/separation models is damaged
so the quality is degraded a lot compared to a non mvsep model
so if you notice the model sounds a bit... broken? is just because of that
Wouldn't be possible with model blends?
you meant model fusion? not really this only merges the timbre
Yes that's what I mean. Never tried that, can you explain more about it?
merges the timbre of the models, this was meant to create new voices but you can also do it to try to "fix" a small dataset model by merging epochs
codename has a tutorial explaining model merging
hopefully the new codename's rvc will allow multispeaker, so you could add normal speaking, singing, screaming, etc. in a model
That sounds so cool
Oh, also didn't know about this. I've been using the MelBand Roformer model to isolate vocals
yep every type of damage to the audios will make a model sound a bit robotic/metallic
not only separation models but every type of damage will do it
ideally you want the dataset to have very few post processing or in best case, no post processing at all, raw quality
I guess every separation model would do that in UVR as well?
every separation model, yes correct
private models on mvsep and public uvr models
Hard to find 🤣
i agree 😭
some leaked stems I had have 16/17.5khz cutoff, have background noise, or may have a little bleed
if it's mixed with lead vocals and harmonies, you'd have to separate it tho
tru, good luck finding leaked stems without harmonies 😭
anyways i notice what kills the quality in the dataset is when we use isolated de-reverb datasets
I noticed that too
i have a couple of mel roformer-only models (that lacks any type of reverb/sound effect) and they have very good quality
aka streamer models

so is not like mel roformer/bs is gonna destroy the quality by itself
is when you use multiple separation models to remove multiple sound effects
so more effects removed, more robotic the end result
ideally to remove reverb use clarity vx de-reverb
is better than the separation models since it doesnt destroy the quality too much (still expect a slightly metallic model)
no separation, dereverb, or denoise model is perfect (though they're still good enough for making covers), and none match raw studio sessions (even those may have a little room reverb)
That's why i prefer live/intimate performances as they have less reverb, such as the tiny desk concerts
exactly
Also less instruments so the separation turns better
tldr less separation models used, less metallic the model will be
may also depend on amount of background noise & instruments
I think the BGM in vtuber talking streams is not as loud as in song tracks, so the extracted vocals would be less muffled
ive trained noisy models and they dont sound robotic, rvc expects noise in the dataset
natural background noise tho
not synthetic noise
rvc is pretty robust towards noise so dont worry too much about that
focus more on how muffled/damaged the audio sounds
and keep the least damaged audio in the dataset
not exactly the lack of data, more like the lack of attempts to reproduce them
sibilants are made from white noise "columns"
but during training this white noise gets reshaped, not exactly in a good way - this is the "not baked enough" metallic thing
after 3000 attempts:
and after 5.5k it is close to the original
this was trained on a single 0.5s sample
yea this explain that better
Thanks
unfortunately the default training method is random, so you cant guarantee it would hit every c, s, ch x 5000+ times during a training loop
why does more data make this happen less often?
so that's where the size of the dataset or number of epochs comes in play
during one epoch loop a random 1/10th of a standard 3sec slice of each sample is used
if you decide to slice your training set to 5+ sec samples, it is even less than 1/10th
basically default rvc
or it was 3?
cant remember lol
usually it is 3, unless there's a lot of silence so it cuts smaller pieces around silence gaps
I made a modification of the training loop, so it goes thru the entire set each epoch
0.5s slices with a small overlap
well, it guarantees every bit of data is being used
so like 12min data set is more or less equal to 2hr normal rvc dataset
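The arithmetic behind that, as a rough sketch: if only about 1/10th of each slice is seen per epoch (the figure quoted above), then a loop that covers everything sees roughly 10x as much per epoch. The window helper shows one way to cut audio into overlapping slices that cover every sample. It is only an illustration with made-up names, not the actual modified training loop.

```python
def equivalent_minutes(dataset_minutes, coverage_ratio=10):
    """Minutes of 'default' data matched when every bit is used each epoch."""
    return dataset_minutes * coverage_ratio


def sliding_windows(n_samples, window, hop):
    """(start, end) pairs of fixed-size windows covering every sample."""
    if window <= 0 or hop <= 0 or hop > window:
        raise ValueError("need 0 < hop <= window")
    if n_samples <= window:
        return [(0, n_samples)]
    starts = list(range(0, n_samples - window + 1, hop))
    # Add one last window flush with the end so the tail is not dropped.
    if starts[-1] + window < n_samples:
        starts.append(n_samples - window)
    return [(start, start + window) for start in starts]


print(equivalent_minutes(12))     # minutes, i.e. the "2hr" figure
print(sliding_windows(11, 4, 3))  # toy sizes, hop < window gives the overlap
```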
damn thats pretty cool, is there some downsides to what you did?
it can be used in normal finetuning?
I need to adjust learning rate, I think, so it does not overfit over long epoch.. or maybe save a model more often
ow i see
12min x 15 epoch of singing data
lil undercooked, but still pretty good for what it was made from
I may add it to Applio as 'experimental' training method
uhm yea i suppose fully cooking it would give better results than original rvc
may somehow remind me of some old mid/early-2023 models
Very interesting, I assume the same applies to breath sounds?
I guess so, it's an unvoiced piece of the audio