#✨│ai-help
Hi everyone, do these pre-trained models work well for French, or are they mainly optimized for English?
hi guys, i have a question about the rvc. the problem is that even by setting monitor to none I can still hear myself. I've also installed the virtual cable, etc. still i can hear myself
!howtoask
How To Troubleshoot 
- Don't simply mention your issue, like "my rvc is not working".
- Describe the step you are on, what you're trying to do, the RVC you're using, a screenshot, etc.
- The more context, the better.
- Don't be desperate. You can ping a Helper, but if they ignore, they aren't available/don't know the answer.
- It's okay if you're frustrated, but don't take it into this server.
- Don't DM without prior consent.
- Don't ask for every little instruction. Put your own effort & test things by yourself.
- Don't ask to ask.
- Check if your answer is a Google search away/on our guides website.
oh okay sorry
Most pre-trained models are trained on English audio, but I think they also work with audio in other languages.
Please give more context next time. Don't ask without any.
You can ask something like "what do I need to install and use RVC in my PC?" for example.
how to remove the electrical noise of the rvc?
wrong channel, monitor is a wokada option, also do not follow yt tuts
RVC is not wokada
RVC = Retrieval-based-Voice-Conversion, the best Speech To Speech AI models (on v2). It runs inference (uses models) on pre-recorded audio (ai covers) and trains (makes) models
Wokada = uses RVC for realtime inference
@livid plaza tell:
- your pc gpu
- what guide have you followed
- what you want to do
in #🔍│help-w-okada
be sure the dataset you're training is high quality and use the tensorboard
For a dataset with very low diversity (same prosody, different sentences), is it worth expanding it with one-shot TTS tools and picking the samples that best reflect the original voice?
My dataset is 30 minutes long now (it was about 2h). I've seen that reducing the size improved the results: to reach a decent model I used to need 300 epochs, now I can do it with 150 ~ 200.
But it seems the bottleneck is the dataset itself; changing the pretrain has little effect for now
my idea was to use only 15min of the original dataset and 15 min from synthetic generated samples
synthetic data isnt the best for rvc, it would be better to mess with the batch size or learning rate
I'm using a batch size of 8, I'll try 4. For the learning rate, I'll look into it more
you can try setting the generator's learning rate to 5e-5
note that this will require you to train the model a bit more
Good to know, I'm gonna use this one to try again. I'm trying to keep my expectations for model improvement to at most 200 or 250 epochs (even for a 30 min dataset); if I don't see the graphs decreasing on tensorboard after 10 epochs, I just keep the best epoch from the last lowest point reached.
Thanks for the tips!!
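For anyone reading later, the "keep the best epoch if nothing improves for 10 epochs" rule above can be sketched roughly like this. It's only an illustration: the loss values are made up, `pick_best_epoch` and `patience` are hypothetical names, and in practice you would read the numbers off the TensorBoard graph rather than from a list.

```python
def pick_best_epoch(losses, patience=10):
    """Return (best_epoch, stop_epoch) for a list of per-epoch loss values.

    Epochs are numbered from 1. Training counts as stalled once
    `patience` epochs pass without a new lowest loss.
    """
    best_epoch = 0
    best_loss = float("inf")
    for epoch, loss in enumerate(losses, start=1):
        if loss < best_loss:
            # New lowest point: remember it and keep going.
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop here.
            return best_epoch, epoch
    return best_epoch, len(losses)


# Hypothetical loss curve: improves for 3 epochs, then plateaus.
example_losses = [5.0, 4.0, 3.0] + [3.5] * 12
best, stopped = pick_best_epoch(example_losses)
print(f"keep checkpoint from epoch {best}, stopped at epoch {stopped}")
```

A single loss curve is still a rough guide, so testing a few checkpoints around the lowest point by ear is probably worth it.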
It's not like I want to, but my current time is about 4-5 seconds from activation to audio playback start, with around 4 different voices coming into a queue. I can spare a couple of seconds for tts of one sentence if I can find something to replace edge-tts.
Are there any hobbyist voice actors here in the server?
I'm running RVC on my macbook m4 2024. I got it somewhat working, and the WebUI is popping up. But, whenever I try to do anything, my tasks load in the queue indefinitely without completion. Is this a common bug? Anyone have any info on how to fix it? (https://imgur.com/a/oySN48M <- screenshot of the problem because I can't send images in this channel for some reason)
it is a known issue with mac
Thank you. v0 gave me the following troubleshooting steps to try. Are there any well-known fixes that I'm missing, other than these?
- Disable MPS Acceleration (most effective fix):
- Force CPU Mode (if the above doesn't work):
- Check if PyTorch is using MPS correctly:
- Reinstall PyTorch (if needed):
- Clear cache (if models might be corrupted):
- Make the fix permanent by adding to your shell profile:
you can join into a whole for each speaker
That's not an issue, I just don't know the fastest method to get the cloned voices that I want.
I'm facing the same issue. I have a MacBook (M3); I thought it was an issue with the voice files, so I tried a couple more and I'm still getting the same problem.
broken?
then consider a zero shot TTS solution like fishspeech, etc. though dont expect it to be too expressive
So the thing I was already going to try. okay.
I'm using the commit mentioned here and now it is working fine without additional changes, give it a try.
https://github.com/IAHispano/Applio/issues/869#issuecomment-2725189180
Thank you so much!!
idk what kind of sus cloudfront link ur trying to open
huggingface
Voice actor for what? I'm an artist who draws art, my guy.
any site I can find a database isolated vocals of songs?
is the linked applio on the wiki working for anyone?
is there a guide for using the rvc ai cover maker colab i havent done this in a yearr
You could've asked what the best UVR5 model is for extracting vocals to use in an AI cover.
Yeah that
How can I use these models? It's not in the UVR I installed from the UVR official website
refer to this guide and use ctrl+F to search what you need https://docs.google.com/document/d/17fjNvJzj8ZGSer7c7OFe_CNfUKbAxEh_OBv94ZdRG5c/edit?usp=sharing
edit 13.03.25 deton24’s Instrumental and vocal & stems separation & mastering (UVR 5 GUI: VR/MDX-Net/MDX23C/Demucs 1-4, and BS/Mel-Roformer in beta MVSEP-MDX23-Colab/KaraFan/drumsep/LarsNet/SCNet x-minus.pro (uvronline.app)/mvsep.com/ GSEP/Dango.ai/Audioshake/Music.ai) General reading advice | D...
Btw, should I use this instead of the app one?
https://github.com/Eddycrack864/UVR5-UI
different consoles/PC for the same games you want to play
Yeah so now I'm confused... Could you give me a simple tutorial to just extract AI voices?
I can't guide you further, as that would be completing your task for you. I've shown you the guide above, so you should try the suggestions in it. That way you'll understand vocal extraction, what separation models do, and how well they perform for your case.
If I give you my own very simple tutorial, you'd think it's too complicated.
Extracting ai voices doesnt make sense in this context, do u mean Extracting voice from an instrumental?
If uvr is too complicated for you, go to the page mvsep.com , make a free account to skip 95% of the queues, use the latest bs roformer model it shows. Upload your audio and wait
hey
Yeah. I was bad at describing anything
how do i create my own ai singing voice???
What is your PC GPU? Applio the RVC can run on PC locally.
Asking just "how to do AI cover" is one of the worst questions I've ever seen. By "AI singing voice", I assume you mean an AI cover.
Hey i'm kinda new to this training thing. I got a quick question. I downloaded a .pth file from huggingface, and i wanna use it on W-Okada for realtime voice conversion. It sounds quite cool, but i want to do some fine tuning because emotions like laughing, anger, sadness sound really weird with this voice model. I downloaded RVC WebGui, and i tried to load the model in there but it says "error" because the model apparently was trained on different sizes etc., what can i do to fix this or how can i do finetuning with this model? I already recorded .wav files to load etc., but its just not working. If anyone can help me i'll donate them 20 dollars.
Just DM me if you can help, i will donate you even 50 dollar if you can successfully help me.
🙂
- do you want to use rvc or realtime voice changer? the non-realtime one has better quality for pre-recorded audios
- most models struggle with those kinds of emotions, but a better-quality dataset and pretrain might help reduce artifacts in that case
- you should read the model description carefully, it might have been trained using refinegan which is different from the default hifigan that most RVC applications and voice changer support.
bruh
lowkey
yr jst better off using the actual website atp
You cant change a model you downloaded
make your own
Genuine laughing is impossible at this current stage, emotions need to be overly expressed
I’m using the app it’s way better
Hello. I'm trying to make a good ai voice for elsa with a bit of emotion. I am on a mac though. And idk where to really go from here.
Heres the docs for training models or making covers with them https://docs.aihub.gg/rvc/local/mainline/
I am on a mac though? does this work with that?
Oh right,
macs can only train on cloud
i should know more
which would be best for what I want to do.
so how do I get to the cloud is it rvc?
cloud's like training on another machine from a service
i could probably explain better but eh
it can train other ai's not jst rvc
-kaggle
- Applio Notebook, by Vidal Kaggle
- Applio Notebook, by Shirou Kaggle
- Music Source Separation, by Shirou Kaggle
- UVR5 NO UI, by Eddy Kaggle
- Original W-Okada's Voice Changer, Kaggle
- Modified W-Okada's Voice Changer, Kaggle
- 🆕 UVR5 UI, by Eddy, ArisDev & Nick088 Kaggle
- 🆕 RVC AI Cover Maker UI, by Shirou & ArisDev Kaggle
- 📖 How to use RVC Mainline on Kaggle by Cauthess
Note: Kaggle limits GPU usage to 30 hours per week.
kaggle's the best option since it has 30 hours free 2x gpu
but you need to use a phone number
Last update: Jan 13, 2025
ok ty. I'm trying to do an elsa ai voice with some emotion to be firm
would that work well?
I want it to say a specific word
SCRFilms made some good elsa voice model (I bet you mean the disney frozen one)
yes I meant that one. I can't train it though.
I need it to say a certain word that i'm going for
and the text to speech I don't think would sound well.
macs aren't viable for training
may depend on the accent compatibility
Yea.
I can't seem to download it either.
Download
Download your Creation and additional outputs
Output
it says this and I click done but it doesn't download
could I download it to my mac or that wouldn't work?
just realized theres a queue....
is there any way to not do that
a ai voice model?
text to speech?
bc I want it to say something
I've been in the queue for 16 minutes
i figured
how awful is it? show few seconds of the inferred sample
I think I deleted it.
-colab
- Applio, by IA Hispano Google Colab
- RVC Disconnected, by Kit Lemonfoot Google Colab
- RVC Mainline, by Hina Google Colab
- Hina's Mod AICoverGen WebUI, by Hina Google Colab
- AICoverGen-NoWebUI [English], by Ardha, fixed by Eddy, Hina and Gdr Google Colab
- AICoverGen-NoWebUI [Spanish], by Eddy, Hina and Gdr Google Colab
- UVR5 NO UI, by Eddy Google Colab
- UVR5 UI, by Eddy Google Colab
- Hina's Modified Original W-Okada's Realtime Voice Changer, Google Colab
- FaceFusion UI, by Nick088 Google Colab
- FaceFusion NO UI, by Nick088 Google Colab
- EasyGUI, by Rejekts Google Colab
- 🆕 Music Source Separation Training (Inference), by Jarredou & Makidanye Google Colab
While the Colab free plan provides up to 12 hours of daily usage, the GPU is typically available for only about 4 hours each day on average.
fm going up is not good
Hi. I'm having this error when using RVC Mainline Colab. I did everything to the letter and I still get an error. What did I do wrong? Can anyone help me fix this please? 
use other colabs from #📰│dev-updates
i can't get applio voice blender to work at all, is it possible to instead blend a voice by training data from two different singers? has anybody tried that?
to merge models it is necessary that both have been trained on the same sample rate
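A rough sketch of why the sample rate matters when blending, for anyone curious. This is not Applio's actual voice blender code: the dict layout with `"sr"` and `"weight"` keys is an assumption about how the checkpoints are organised, plain Python lists stand in for tensors, and `blend_models` is a hypothetical helper.

```python
def blend_models(model_a, model_b, ratio=0.5):
    """Blend two checkpoint-like dicts; `ratio` is the share taken from model_a."""
    # Models trained at different sample rates cannot be mixed.
    if model_a["sr"] != model_b["sr"]:
        raise ValueError(
            f"sample rates differ: {model_a['sr']} vs {model_b['sr']}"
        )
    # Both models must also have the same set of layers.
    if model_a["weight"].keys() != model_b["weight"].keys():
        raise ValueError("models do not share the same layers")
    blended = {
        name: [
            ratio * a + (1 - ratio) * b
            for a, b in zip(values, model_b["weight"][name])
        ]
        for name, values in model_a["weight"].items()
    }
    return {"sr": model_a["sr"], "weight": blended}


# Made-up two-value "models" just to show the arithmetic.
singer_a = {"sr": "40k", "weight": {"layer": [1.0, 3.0]}}
singer_b = {"sr": "40k", "weight": {"layer": [3.0, 1.0]}}
print(blend_models(singer_a, singer_b)["weight"]["layer"])
```

If the two voices were trained at different sample rates, retraining one of them at the matching rate is the usual way out.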
So, you want to make a singing model of your own voice or make a model of any singer?
model of my own voice
In that case, if you wanna make a singing model of your own voice, record at least 20, 30 or 40 mins of yourself singing (with at least a decent mic)
i got the recordings bro
Did you make sure these recordings are clean and got decent/good quality?
yup i recorded it on a studio mic
Nice, in that case go ahead and read this guide.
In the context of RVC, the dataset is an audio file containing the voice the model will replicate. It can be either speaking or singing.
It will teach you how to train models.
is it hard to do
any easier ones bro
damn man thank you though
You're welcome bud, but i would suggest you to try and read the guide whenever you want.
Runtime disconnected
Your runtime has been disconnected due to code execution that is not allowed at the no-cost level. Colab supports millions of users and prioritizes interactive programming sessions by prohibiting certain types of usage, as described in the FAQ. If you believe this message was sent in error, please submit a dispute. Please include relevant information about the context of your usage.
Your compute unit balance is 0. Buy more
To connect to a new runtime, click the connect button below.
so i see one of my favorite model makers created a 2pac model. but he says the index is not required. but i think it is. doesnt that hold all the persona of the voice or accent
?
Hi! I am having trouble with Google Colab No UI at the first step of training:
this is the error:
Starting preprocess with 2 processes...
0% 0/1 [00:02<?, ?it/s]
Traceback (most recent call last):
File "/content/Applio/rvc/train/preprocess/preprocess.py", line 269, in <module>
preprocess_training_set(
File "/content/Applio/rvc/train/preprocess/preprocess.py", line 241, in preprocess_training_set
audio_length.append(future.result())
^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/concurrent/futures/_base.py", line 449, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
anyone knows what is it?
Yep, you can use the index.
i always use the index files that come with the models, i think he is confused tho. i think he thinks you don't need it. but i think it is needed to make it sound more accurate
sounds like its your dataset
Should i re upload it? I have a 24 min truncated audio recorded in a studio
The index contains the voice's accent and speech manner
It is very optional, and yes it may make the model sound more accurate
also did you use clean vocals without doubling
I did the truncate process on audacity, I've done several models this way I dont know why I'm having this problem
but if you infer the model on something that talks like the person that can also make it sound more accurate
did you run feature extraction?
thats what i thought. man if he only knew. the quality is there with his models but it dont sound like 2pac, it sounds like his accent with 2pac's voice if that makes sense.
but feature extraction is the second step... I'm on the "pre-process" data phase
i would say check your dataset files then reupload them
is your dataset in the right folder?
might be how you routed the folder
did you try to use a zip?
if you zip them sometimes it don't work.
Yes right folder and everything. The .wav file in 44.1kHz 16bit is inside the folder
the file has to be a .wav or .flac it cant be in a zip
I'll try to zip it
no no no dont zip it
yea try that
oooh so I dont?
dont
nope
yea maybe you're right.
i would start over again
reupload again
might be the notebook is down
haha
also it will stop if you don't baby sit it
for a dataset like that i would go 1000 epoch
this is what happened to me when i try to preprocess a zip
no, unzip it
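If you want to double-check the folder before preprocessing, something like this would list what's usable. It's just a sketch based on the ".wav or .flac, not a zip" advice above; `check_dataset` and the folder path are hypothetical, and it only looks at file extensions, not at whether the audio itself is valid.

```python
from pathlib import Path

# Extensions the advice above says are accepted for a dataset.
ALLOWED = {".wav", ".flac"}


def check_dataset(folder):
    """Split the files in `folder` into usable audio and everything else."""
    usable, rejected = [], []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # ignore sub-folders
        if path.suffix.lower() in ALLOWED:
            usable.append(path.name)
        else:
            rejected.append(path.name)  # e.g. a .zip that was never extracted
    return usable, rejected
```

Usage would be something like `usable, rejected = check_dataset("/content/dataset")` (path made up); if `rejected` contains a zip, extract it and upload the audio files themselves.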
wdym?
its only had improvements
well on the pre trained thingy mobober i forgot what its called but there was an option that it used for deeper voices
it depends on the pretrain you're using, besides the dataset and cleaning method used
yea it was an old pretrain that did a swell job
look in https://discord.com/channels/1159260121998827560/1235952130855010365 you might find it there
its been a min since i trained, but i remember i had good results with an older pretrain
unless you are talking about the og one
should I install the new numpy 1.26.4 version before training?
the default pretrain is still good, but those made in 2024 may be a bit problematic
klm 4 hifigan final and original pretrain are good
i'll try that
It's not working
is it in the any notebooks ? like a disconnected one
it could be your dataset or the note book
look here https://docs.aihub.gg/rvc/cloud/applio-no-ui-colab/ to make sure you are doing everything correct
the colab notebooks may still have issues #📰│dev-updates message
oh so maybe is an error on the notebook itself?
i was unaware of any issues i know it stopped working for a while then it came back on.
Yes all correct, is my 8th model lol... never had any issues until right now
yea sounds like the notebook is either done for or is down for a min
what notebook are you using
Applio_NoUi from the documentation of applio.org
How? I'm kinda new on the whole notebook/Google Colab/AI stuff
Or should I try on another cloud based platform?
you can just type in the search in the discord of this channel
or ask around some folks have some
you can try kaggle but i have not used it yet
i know google plays games with people training
they dont like it.
I'll see if that works lol
yea i requested you to be my pal
so let me know what happens
i been so lazy at training and getting things going with my music. i made a few 2pac songs last year. but dammit it takes up alot of time after you have not done it in a while to relearn everything
yes! I'm new to this and it's a lot of information
yea it takes a min to get it all figured out. but now a dayz its much easier because everything is one click. when it first came out i didn't have that luxury.
true that man
I'm getting this error when trying to load UVR5 UI. Can anyone help me please? 
I think the notebook is having issues because I can't infer...
But how can I fix this?
No idea man, I'm having issues as well and I have no idea how to fix them
Not working... I'll try tomorrow
how tf u install UVR5 UI?
can someone generate some voice msgs for me? rlly confused on how this works
there are many TTS options, including kokoro TTS
https://huggingface.co/spaces/hexgrad/Kokoro-TTS
and then you can use some RVC model to infer the TTS output
im confused on this part
simplifying it..
inferencing = using a given input / audio / sample / voice ( however you interpret it ) on a model ( in this case rvc )
and how do i do that😭
If you still have no idea what to do, then I don't know what to tell you either.
You can't expect anyone to do everything for you. You have hands, you can do things by yourself.
you should first understand about RVC here: https://docs.aihub.gg/essentials/whats-rvc/
Last update: Oct 21, 2024

Hi, I followed the steps here in order and it's running epochs. Am I on the right track? Just now when I did one-click training, it gave an error; I tried again and it started running epochs
try Applio, it has several improvements and features in training
I don't know how to use it
I should have shown you the guide https://docs.aihub.gg/rvc/local/applio/
Last update: Apr 01, 2024
You don't know anything about this?
you mean the issue you're having?
the one click feature is not recommended because of possible issues
that's why it's not included in rvc forks including applio
Who doesn't know about that?
You asked like if a mod/helper doesn't know what Applio even is and how to use it.
I don't know because
How can I add voice models in UVR5 UI on Colab?? 
does anyone have a way of making a voice into a text to speech?
There are different Text To Speech (TTS) AIs:
GPT So Vits: RVC isn't as good as GPT So Vits for tts, but gpt so vits (few shot tts, which means it needs just a lil training for models) can't use rvc models (and vice versa), and it's limited to english, chinese & japanese. If you wanna check gpt so vits instead, read https://docs.ai-hub.wtf/tts/gpt-sovits/
Freemium 11labs: Easy way to do TTS is https://elevenlabs.io/, you can't use RVC model on this but its a mostly premium easy way for good quality TTS
FishSpeech: FishSpeech is a 0 shot (no explicit training needed) TTS, if you got a good pc you can use it locally else use their site
You can check TTS in our tts index
With RVC Models:
RVC is natively for Speech To Speech, but forks such as ilaria rvc mainline & applio have built in tts (using Microsoft Edge TTS to make a generated tts audio, which i suggest you to choose a tts model that is the same gender and language of the rvc model you wanna use, and then convert it with rvc)
If you wanna do tts locally with RVC Voice Models (if you got a good pc):
If you don't got a good pc you can do tts with RVC Voice Models on cloud:
- Ilaria RVC Zero (Running on A100 GPU, free fastest rvc on cloud) and the guide
- Use Applio UI Colab (with google colab T4 free daily limit gpu)
- You could try another tts from our tts index and use the output as an input in rvc
Anyway to bypass this?
Unfortunately, no. You'll have to pay for ngrok or wait for another month to continue using it.
anything alternative?
delete your account and make a new one with the same email
sounds silly but it used to work for me before lol
then ofc use the auth code of the new acc
hahah interesting
Absolutely worked
🔥
bro what 
404 solution not found
Sorry for the noob question, but idk where is the weights directory of the UVR5 UI
And how to download models from Hugging Face
Honestly, I'm not very familiar with this specific Colab
That's why I'm a bit confused and I need help 
Models are automatically downloaded
Unless a UVR5 Colab notebook doesn't provide any model to use or you want more other models to use with.
But that means I can't load my own models?
Even the ones that I download from Hugging Face?
I don't know. You should be able to use any other UVR5 model in UVR5 Colab notebook, although you'll have to do some code a bit there.
Hmm... I'm actually new to this. So Idk how to write some code
UVR5 UI isn't for voice conversion, it's for getting instrumentals, vocals, reverb, and more from songs/audio files
And what tool do you recommend to me?
If you've mistaken UVR5 for an AI voice changer like RVC, it's not. UVR5 is an AI audio stem separator.
It's fine. If you actually mean a program that can do voice changing, Applio the RVC is what I'd recommend.
Ok, I'll use it. Thank you 
You can learn more about Applio there. https://docs.applio.org/applio
Actually another alternative is creating various ngrok accounts lol
This trick still works for me to this day
Yo, I'm using Codename Fork, but the problem is that the interface in TensorBoard doesn't look the same. Which one should I follow to get the same results as g/total?
Can I install the Applio Repository on Hugging Face spaces and run it there?
i havent made a voice model in a year and i forgot how to can someone explain
is anyone having trouble with applio on colab while training or inferring?
it should b working
Yo, I have a question: is it possible to separate voices from the same video? Thanks in advance for your answer.
What does it mean to separate voices from the same video? The file must be in an audio format (mp3, wav) before going through the separation process.
How exactly can I separate voices from a video? Which software or tool do you recommend?
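One common way to get from a video to an audio file is ffmpeg. Nobody in the chat named a tool, so treat this as one option under that assumption; the file names are made up and `build_extract_command` / `extract_audio` are hypothetical helpers. It only pulls the audio track out; splitting voices from music is still a job for a separation tool afterwards.

```python
import subprocess


def build_extract_command(video_path, audio_path, sample_rate=44100):
    """Build an ffmpeg command that drops the video stream and writes WAV audio."""
    return [
        "ffmpeg",
        "-i", video_path,         # input video
        "-vn",                    # discard the video stream
        "-acodec", "pcm_s16le",   # 16-bit PCM, i.e. a standard WAV
        "-ar", str(sample_rate),  # output sample rate in Hz
        audio_path,
    ]


def extract_audio(video_path, audio_path):
    """Run the command; requires ffmpeg to be installed and on PATH."""
    subprocess.run(build_extract_command(video_path, audio_path), check=True)


# Print the command for a hypothetical clip without running it.
print(" ".join(build_extract_command("clip.mp4", "clip.wav")))
```

The resulting .wav can then be uploaded to whichever separation tool you're using.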
Last update: Dec 24, 2024
Between Mangio and Applio, which one should I use for Local?
Applio and its not even close
Mangio as an RVC program has no update as of now. Applio is better.
uhhhh i keep getting this error on applio
An error occurred extracting the index: need at least one array to concatenate
If you are running this code in a virtual environment, make sure you have enough GPU available to generate the Index file.
?
it detected overtraining below 100 epochs. is this normal or is it just because i had a really high quality dataset
not recommended to use "overtraining detector", just observe tensorboard graph and do testing on several checkpoints
aii thanks
you did not preprocess audio (no slices of acceptable length)
it was too short
sorry
what
no
original sample:
working sample:
it was just too short
smh
y e s
but now it works
uwu
i dont know what the model will be tbh
hello how to text to speech with my model pls
💔
besides https://github.com/facebookresearch/svoice is there any other easier to use speaker separation
how do I recontinue training again?
Hello
can someone help me with my RVC model? I trained it through RVC2025 and got the .pth files. I want to use it as a realtime voice changer voice in voice.ai, but it requires some zip file. could someone help me with this? would be thankful for some tips
omfg still using that garbage voice.ai?
trying rn. any better options?
for better voice changer, go to #🔍│help-w-okada to get one in the pinned guide
thx
Don't waste your time trying the shitass Voice.ai. Use W-Okada instead, and go to #🔍│help-w-okada for more information.
thanks
So like what's a pretrained model
And what does the language thing do
Japanese korean
ggrks
Wise words
this only for rvc? https://discord.com/channels/1159260121998827560/1198095259293450341
Anyone knows why I can't download?
I'm using Applio Local
I keep getting these buzzy noises at this specific range, yall know whats up?
I put it as a random noise file
just to hear the uhh frequency
what do you expect from training on 0.5s sample?
it can only reproduce this exact sample and nothing else
yeah but why the voise
noise
maybe use a different browser
the sample doesnt have that noise
not edge
so do I just
loop it more
is what ur saying
different data
dont make me use FL studio's piano roll to give RVC every pitch that I will accept
because I will do it
and it will sound bad
like actual real other data, not the same sample pitched down, etc
aw
welp
f
but yeah still, where does the noise come from
do other models have this noise?
when they lack generalization, yes
well-trained models, no
.5 second dataset
no like thats the ISSUE
rvc tries to generate frequencies
ur dataset don't have those frequencies
rvc ded
I meant like WHY does it just get random noise
seee? this is the answer I wanted
thank you shark person
🦈 🔥

guys ive lost track of rvcs but which is the easiest one to use to make covers with the models i have made?
applio 🍏
any link? 😭
Can anyone recommend some quality voice model with a deeper voice? Kinda like morgan freeman
would this make a good dataset /j
can it not be used for like okada
cuz i wanna use it in okada
guys is hina mod rvc the only one that works with youtube links and does the vocal separations by itself?
From what I just tested, Hina RVC is not working atm so there's an alternative
https://colab.research.google.com/github/Eddycrack864/RVC-AI-Cover-Maker-UI/blob/main/assets/RVCAICoverMakerUI.ipynb
Hey, does anyone know if i can transfer Mangio RVC onto an external hard drive after i already have it installed? It takes up so much space but i also don't want to break it.
how to continue training a model you have already started training?
whats the link for the rvc application that lets you make the ai cover
hello everyone, can someone recommend me a model which I can use in UVR
I don't really understand which one I need to choose, I need something that doesn't take much time, removes simple music, and has normal quality
Last update: Dec 24, 2024
Yo is there any way to do models on phone except for weight
Colab
Can you send a link
Colabs like Applio, RVC Disconnected and RVC Mainline can help u with that
Here's our guides btw
-rvc
- How to use RVC Mainline Colab by Cauthess
- AICoverGen Colab Guide by Eddy (Spanish Helper)
- Create a model with RVC disconnected (colab) by Angetyde
Thx
Hello, how do I train a voice? sorry just new here
please refer to the guide https://docs.aihub.gg/rvc/resources/training/
Last update: Dec 24, 2024
-realtime
- Colab free plan GPUs typically work for about 4 hours each day
- Kaggle restricts GPU usage to 30 hours per week
- These options may not work on mobile devices due to the lack of a Voice Audio Cable (VAC)
#✦│chat message please don't waste helpers time
Sorry
It's fine, just don't do it again
Yes sir
If you're really serious about that, let me answer you. As of now, there's no stable version of Applio that works with NVIDIA GeForce RTX 50 series GPUs. The version that actually works with these GPUs is still in development and experimental.
So I waste 1600000 dollar for nothing
Wow..thank you
And make sure you don't play around something you have no idea what it is for.
Well, for 5000 series just need to install nightly torch and all
nothing experimental applio-wise
He wasn't serious
That's what I was saying.
That was about rx 9000, not rtx 5000
I did explain things just to confuse her head. 
Does anyone change these settings? And what do they actually do?
Pitch changes the audio pitch. That's all I know.
volume envelope is broken, I think
So what does search feature do?
it controls the index influence
it is blending features of source audio and the model
prononciation, accent
So in general, it makes the output smoother?
The voice model's accent is stored in the index file, which sits alongside the .pth file.
lets say you use a french voice model and english audio
if you use index 1.0 the inferred audio would sound like a french person speaking english
What if sans was speaking ERenise
Wouldn't it be better for that type of model to have 0 index
with index 0.0 the output will be english with just the new voice
well, still may have minor quirks, but not as pronounced
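The index setting discussed above can be read as a linear blend between the features taken from the source audio and the features retrieved from the model's index. A minimal sketch, assuming a plain weighted average; `blend_features` and the toy vectors are made up for illustration and are not taken from RVC's code:

```python
from typing import List


def blend_features(source: List[float], retrieved: List[float], index_rate: float) -> List[float]:
    """Blend source-audio features with index-retrieved features.

    index_rate = 0.0 keeps the source features (source accent),
    index_rate = 1.0 uses only the retrieved features (model accent).
    The weighted average is an assumption made for this sketch.
    """
    if not 0.0 <= index_rate <= 1.0:
        raise ValueError("index_rate must be between 0.0 and 1.0")
    if len(source) != len(retrieved):
        raise ValueError("feature vectors must have the same length")
    return [index_rate * r + (1.0 - index_rate) * s for s, r in zip(source, retrieved)]


# Toy example: halfway between the two feature vectors.
print(blend_features([1.0, 2.0], [3.0, 4.0], 0.5))
```

With the French-model example from the chat, a rate near 1.0 would pull pronunciation toward the model, and a rate near 0.0 would leave the source pronunciation mostly alone.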
pitch is mostly the one you may need to tune, depending on if male to female or vice versa
Which for singing would be the most preferable approach for exact replication?
Oh no I know this
I'm just asking for anyone reading
the 3 people here lurking I see you

Also may I ask, I have a really old model from 2023, does it still work with new tools like Applio or the new pitch extraction algorithms?
you may need to use the same pitch extraction for inference.. I've explained before why it may not be a good idea
also depends on whether it was trained with pitch guidance or not
How can I know if the model was trained with pitch guidance or not?
generally I would not advise using rmvpe pitch extraction for a model trained with crepe
Should I use hybrid[rmvpe+fcpe] pitch extraction for a model trained with rmvpe?
Yeah I was outdated
just rmvpe
A voice model trained with rmvpe should always be used with rmvpe, no need to use a hybrid pitch extraction algorithm for this one.
run python, import torch, load the model, print the model
at the end there is model information
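The check described above could look roughly like this. It assumes the `.pth` file is a plain dictionary whose large tensors sit under a `weight` key, with small entries such as `f0`, `sr`, `version` and `info` next to it. Those key names are an assumption about how RVC-style checkpoints are usually saved, so print the keys first if yours looks different. `summarize_checkpoint` and `load_checkpoint` are hypothetical helper names.

```python
from typing import Any, Dict


def summarize_checkpoint(ckpt: Dict[str, Any]) -> Dict[str, Any]:
    """Return every top-level entry except the large 'weight' tensors."""
    return {key: value for key, value in ckpt.items() if key != "weight"}


def load_checkpoint(path: str) -> Dict[str, Any]:
    """Load a checkpoint on CPU. Only load files from sources you trust."""
    import torch  # imported lazily so the summary helper works without torch

    return torch.load(path, map_location="cpu")


# Usage (the path is a placeholder):
# info = summarize_checkpoint(load_checkpoint("my_model.pth"))
# print(info)  # an 'f0' entry of 1 is commonly read as "trained with pitch guidance"
```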
And what are these things? Especially the split audio thing, should I turn on this?
it is for inferring audio >10min
depends on VRAM size
on 4090 can probably do an hour without splitting
Ohhh
So there's no need to turn on any of these if I do a 3 minutes song?
for long audios, mostly better do manual split or use the split audio feature
before that, make sure to do denoise & noise gate first
btw @simple ore have you tried comparing crepe vs crepe with custom hop length?
i feel the default crepe is better despite being the same thing
crepe with custom hop sounds out of tone sometimes
(64)
you can run that f0 view to see how it does
there's 128 vs 64
thats the crepe with custom hop
why do those charts not look like the spectrogram?
there's the other one where you cannot change the hop length
so 160 hop iirc
and i feel that crepe (without custom hop) is slightly better
anyway, no, I did not bother with crepe at all
thing is, i'm not sure how rvc handles custom hop
rmvpe's hop is 160
so 1 second is 16000 samples, 1 hop is 1/100 of a second
so for 1 second you get 100 values
feature extraction is 320, so for 1 second you get 50 values
now, if you use hop 128 or 64, you get 125 or 250 values
i cant imagine it would align properly
i feel crepe with custom hop should be replaced with the original crepe
but i gotta ask claude for that
💔
well, it has to align features and pitch somehow
so either features are getting padded or pitch getting truncated
none of it is good
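The arithmetic in the messages above can be checked with a few lines. This is only the frame-count math at a 16000 Hz sample rate; how RVC actually reconciles the two sequences is not settled in this chat, and the function names are made up for the sketch:

```python
def frames_per_second(sample_rate: int, hop_length: int) -> float:
    """Number of analysis frames produced per second of audio."""
    return sample_rate / hop_length


def pitch_frames_per_feature_frame(pitch_hop: int, feature_hop: int = 320) -> float:
    """How many pitch values fall on one feature frame."""
    return feature_hop / pitch_hop


SAMPLE_RATE = 16000
for label, hop in [("rmvpe", 160), ("features", 320), ("crepe hop 128", 128), ("crepe hop 64", 64)]:
    print(label, frames_per_second(SAMPLE_RATE, hop), "frames per second")

# Hop 160 gives exactly 2 pitch values per feature frame; hop 128 gives 2.5,
# which is the kind of mismatch that forces padding, truncation or resampling.
print(pitch_frames_per_feature_frame(160), pitch_frames_per_feature_frame(128))
```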
commission in #1191429836321849435 for more chance of getting response
Hi, what is this cell for and how do I use it?
The phase Fixer cell in your tweaked colab
What kind of files do I use it on
Why doesn't step 2 work?
This actually isn't the case because
the model is essentially the exact same crepe model according to mangio
it is the " crepe full "
It just has an override of hop
( Unless the rvc uses some other crepe model? in that case.. yea but I don't think so
Guys, does anyone know which is better for pitch extraction, RMVPE or crepe, for an Arabic voice?
It's urgent if you know the answer please reply:)
can you try in your old fork of mainline and compare inference results? i swear crepe (not mangio) results were better than mangio
Actually, my fork always used mangio crepe
because the foundation I kept on using ( prior to moving to applio
I meant in Training process
yes but it had the option to inference crepe (without custom hop)
( in case you talk about og forks of mine
then that was the og crepe
and i feel my inference results were better with og crepe
try on mangio crepe ( using 160 hop
should more or less align with og
see if that changes anything
oki
heyo, i havent used ai since it first became a thing (2023) and was wanting to use it again, there was this voice i wanted to use (microsoft sam) but it doesnt have an index file, instead it has a ckpt file, ilaria RVC doesn't accept ckpt files so is there any way i could fix this?
dunno if this is just a stupid moment from me but arent pythons and index files the way to use ai voices
ckpt is probably for so-vits
huh?
rvc-based models do not use it
oh, could you tell me how to use them?
dig out the so-vits-svc from the grave
reanimate by cursing and spitting
and figuring the set of ancient libraries that works
can you elaborate further
alright
we have google colab breaking monthly and you expect 2-year-old software to work somehow
Monthly? More like weekly 😭
guys pls help quickly
im training with rvc but i have no idea what im doing
it's at the 11th epoch
how do i stop and save this epoch, and then continue training it from this epoch later on?
like say im done training for today and i wanna continue tomorrow
how can i make it so it'll continue from epoch 11
increase chunk to 140
and ask in #🔍│help-w-okada next time
ok thanks
my model is messed up 😭
yo?💀
yeah 😭
where other shark guy he prob knows what happened😭
who
this dude^
o
it does trust
did u change the pitch for the cover?
yeah i made it -12 semitones
an octave down
to match it a lil better
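For reference, the semitone setting maps to a frequency ratio in 12-tone equal temperament, which is why -12 is exactly one octave down. A small sketch (the function name is made up):

```python
def semitones_to_ratio(semitones: float) -> float:
    """Frequency multiplier for a pitch shift given in semitones (12-TET)."""
    return 2.0 ** (semitones / 12.0)


print(semitones_to_ratio(-12))  # one octave down: half the frequency
print(semitones_to_ratio(12))   # one octave up: double the frequency
```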
ill try it without adjusting it then if you think that's the problem
well i don't think it will save wtv happened there then😭
true 💀
this is why i went for a longer dataset originally
also
idk man, did u train it rvc 1 or 2?

to use the model i went to inferencing voice and chose "isling.pth" (the model name i chose) BUT there's an option to manually select it
nah that happens because he didn't slice his dataset properly
shall i try manually selecting it
wym slice it
so rvc trained nothing
oh😭
did it not?? 😭
RVC
mainline, applio???
v2
idk
ok first do me a favor and uninstall whatever u installed
RVC1006NVIDIA
and install this https://github.com/IAHispano/Applio
that's the folder
mainline
oh ok
oh 😭
do this
download the applio zip
and uncompress it in a non onedrive folder
C:\Users\yourname
download zip? can't i just go to releases or is that the same thing
no
download the zip
oh ok
release sends you to the compiled version which is old
is your desktop in a onedrive folder?
ok just to be sure right click a folder and click properties
as long as it doesn't say onedrive
ur fine
it does not say one drive
pog
why can't it be in onedrive
python shit
has a stroke or something
anyways
run-install.bat <--- run that
and wait
yes it doesnt matter
its okay
the process is extremely simple
whats hard about rvc models is getting good source audio
training them takes a few clicks
and 1% brain power
well, at least for training small models
for more complicated stuff like finding the appropriate hyperparameters... ehh that'll require your entire brain
but for us mortals is simple when we don't care about the complicated stuff
i installed it and started training without knowing what anything means
ayy
it's installed
yes
do you have spek?
audacity my beloved
show me the spectrogram of your dataset
ive trained compressed asf audio from games and they sound fine, a bit robotic due to compression, but decent
ah, nice
ooo
Do you need help?
no
Ok
it's not really that noisy, it's mostly keyboard clicks and mic noises
spectrogram is telling me it's that noisy
🦈
lmao
arguably if he wants to be accurate to his mic and his voice, wouldn't it be better to keep the noise to copy his mic quality too?💀
true
but noise disturbs the learning process
takes more epochs than usual to get something good
if he doesn't care about a long training process i guess its fine
lmao i don't really care but shorter is better ig
no matter what is going to be faster than his 2 hour dataset attempt😭 🙏
like as long as it dont take over a day
😭😭
30 minute noisy set took me 3 hours*
woah
big oof
oh 30 mins?
i cut mine down to 18 mins
should be in around 2 hours
i don't understand tensorboard graphs
will it still be accurate tho
graph going down = pog, victory royale, great
graph going up = cringe, ass, eww
graph going up for over 15-30 minutes = dying
i mean rvc technically works with noisy audio
and the increased epochs arent that much
is like 40 more?
okay great
uvronline supports up to 13 mins anyway
now time to open audacity
open your audio and
second step
plus way more storage space saving
Do you need help?
ok done
silence is gone
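To make the idea of removing silence concrete, here is a toy sketch that shortens long quiet stretches in a list of samples. It is not Audacity's Truncate Silence algorithm; the threshold, the run length and the function name are all assumptions picked for illustration:

```python
from typing import List


def truncate_silence(samples: List[float], threshold: float = 0.01, max_silence: int = 4) -> List[float]:
    """Keep at most `max_silence` consecutive quiet samples, drop the rest.

    A sample counts as quiet when its absolute value is below `threshold`.
    """
    kept: List[float] = []
    quiet_run = 0
    for sample in samples:
        if abs(sample) < threshold:
            quiet_run += 1
            if quiet_run <= max_silence:
                kept.append(sample)
        else:
            quiet_run = 0
            kept.append(sample)
    return kept


# Five quiet samples between two loud ones get shortened to two.
print(truncate_silence([0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5], max_silence=2))
```

Real tools work on time spans (for example, anything quieter than a dB level for longer than some fraction of a second), but the effect on dataset length is the same kind of shrink seen here.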
12 mins 30 now
is that still ok?
now about this next step im not sure how good audacity's resample is compared to soxr(applio's) uhh
ok for this time we will use applio's resampler
so
don't touch sample rate
but encoding set it to 32-bit float
and export
small but might work
that seems a bit overkill
not realistic results but decent
oh
i can make it longer
real quick
yeah but when i listened back it wasn't that monotone
i was playing a game with my friend so i was quite expressive
if you add more data be sure that is the same quality as your dataset
30 minutes is enough
oh ok
also speech models suck at singing
so a realistic speech model will sound ass while singing
i mean, they don't suck, but it doesn't sound like how u would while singing
i tried training model on me singing but it sounded ass
it didn't sound like how i sing
it sounded like the original
and uh
yeah the original but worse
singing in a speech model will cause the model to not sound like the person since the model lacks singing range
i understand what u mean
but what if my real voice lacks singing range
ppl do tell me i sound like robot when i sing
true
actually when i trained with weights
it was far far far far wider than my real range
well this varies by models
a veeery expressive speech set can do singing just fine
yes at that point rvc was using the pretrain's knowledge and not your dataset
pretrain is a thing that is used when training models
a thing that has knowledge about the human voice
rvc uses that thing to train your dataset
aka finetuning
anyway i truncated silence of the whole 2 hr dataset and then i cut it shorter to 30 minutes
and exported as 32 bit float 44.1khz
did you use my settings, right? that's important
it was already (44.1khz)
ok
save your dataset in a random folder or whatever, you know the deal
now run-applio.bat <- run this
he is almost there 😭 🙏
ok wait
🦈
it shows this what does this mean
lmaoo
💀
looking like a d1 virus scam
lmao it's installing
💀

ayyy
his crypto wallet 
oh ok
training tab and be sure cpu cores is set to 1
(i don't trust applio's multicore usage)
lmfao

i wanna use my gpu to train
oh ok
now here
bro has a 4060
oke
why is bro judging
I'M NOT
😭 im from a third world country man ffs
I'M JUST SAYING IT'S REALLY COOL
i have a 3070 but i got it during the GPU crisis
so it was like quintuple the msrp
😭😭
anyway
just in case you dont know how to copy the address of the dataset
right click the name of the folder
cutting 10-hour long files you may want to keep it under 4
but it is fine otherwise
yeah for me the speeds are basically the same
i got my 4070 super this December, all my friends were saying I got cooked, because the 5070 got announced with "4090" performance
but it turns out a 4070 super outperforms it in some instances
ok
now after you did it, go to the logs folder
it's in applio's folder
(obviously)
and go to your model's logs
then go to sliced_audios
oke
see if every slice is exactly 3s
whats the difference
16k is used for training the index
oh ok
and also for f0 estimation
yup 3 seconds
perfect
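Instead of eyeballing the slices, the same check can be scripted. The sketch below reads each file's duration straight from the RIFF header (data chunk size divided by byte rate), so it should also cope with 32-bit float WAVs. The 3-second target comes from the chat; the tolerance, the helper names and the example folder path are assumptions:

```python
import struct
from pathlib import Path
from typing import List


def wav_duration_seconds(path) -> float:
    """Duration of a WAV file, computed from its 'fmt ' and 'data' chunks."""
    with open(path, "rb") as f:
        riff, _, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError(f"{path} is not a RIFF/WAVE file")
        byte_rate = None
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError(f"{path} has no data chunk")
            chunk_id, size = struct.unpack("<4sI", header)
            if chunk_id == b"fmt ":
                fmt = f.read(size)
                byte_rate = struct.unpack("<I", fmt[8:12])[0]  # average bytes per second
                f.seek(size % 2, 1)  # chunks are padded to an even length
            elif chunk_id == b"data":
                if not byte_rate:
                    raise ValueError(f"{path} has no usable fmt chunk before its data")
                return size / byte_rate
            else:
                f.seek(size + size % 2, 1)  # skip chunks we do not need


def find_odd_slices(folder, expected: float = 3.0, tolerance: float = 0.01) -> List[str]:
    """Names of .wav files whose duration is not within `tolerance` of `expected`."""
    odd = []
    for path in sorted(Path(folder).glob("*.wav")):
        if abs(wav_duration_seconds(path) - expected) > tolerance:
            odd.append(path.name)
    return odd


# Usage (placeholder path): print(find_odd_slices("logs/my_model/sliced_audios"))
```

An empty list means every slice has the expected length; a single shorter file at the end of the dataset may be nothing more than the leftover tail.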
exact settings
and extract features
rmvpe is fast, it'll take a couple of seconds
oke
after feature extraction use these exact settings
(saving every epoch is more accurate, you can delete the useless epochs after training)
everything has to match
batch size is more complex than just "8" but no point to explain what batch size is for now
okie
first generate index
and after the index is done you can finally click "start training" be sure that fresh training is enabled
wait so quick question, if i wanna stop training and i wanna continue training from latest epoch saved the next day
you can now use run-tensorboard.bat <--- very important, tells you when to stop the training as we don't want to train the full 500 epochs
do i need to change any settings
nope
what abt the fresh thing
yeah
it's half 12 am for me rn
remember to not use fresh training when resuming, and you'll be fine
yeye i clicked start training
batch size has to be 8 as well
imo index is so useless i dont understand why we require it for applications 🗿
kinda makes things better sometimes
i use index🗿
sometimes
second 0:18 sounds better with index 0.5
matter of fact this is annoying me, it resets back to .5 everytime
avg loss g/total 50
only graph that matters for you now
just a bit better
(everything matters but i dont want to complicate things for now)
for now u can kinda ignore the graphs since the training just begun
watch it when you're around 50epochs
ideally everything should be going down
oh alr
(fm most of the time doesnt go down, thats a rvc problem devs are trying to figure out why)
loss_avg_50
the g/total one matters
ignore the others that are under "loss"
only the loss_avg_50 matters
this?
yup
for over 15 minutes or 30 minutes
just wait
im realizing how lucky i been with my datasets cuz i never did all this stuff, but my models been rlly good so
because sometimes it goes up randomly on early epochs
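The rule of thumb above (ignore early bumps, worry only when the smoothed loss keeps climbing) can be sketched as follows. It assumes `loss_avg_50` is a running mean over the last 50 loss values, which the name suggests but the chat does not confirm. The chat talks about minutes rather than steps, so the `steps` value is something you would pick yourself; both function names are made up:

```python
from typing import List


def moving_average(values: List[float], window: int = 50) -> List[float]:
    """Running mean over the last `window` values (fewer at the start)."""
    smoothed = []
    total = 0.0
    for i, value in enumerate(values):
        total += value
        if i >= window:
            total -= values[i - window]  # drop the value that left the window
        smoothed.append(total / min(i + 1, window))
    return smoothed


def rising_for(smoothed: List[float], steps: int) -> bool:
    """True if each of the last `steps` smoothed values is higher than the one before."""
    if len(smoothed) <= steps:
        return False
    tail = smoothed[-(steps + 1):]
    return all(later > earlier for earlier, later in zip(tail, tail[1:]))


losses = [30.0, 28.0, 27.0, 26.5, 27.5, 28.5, 29.5]
curve = moving_average(losses, window=2)
print(curve, rising_for(curve, steps=3))
```

A short rise on its own is not a reason to stop, which matches the advice to just wait; a long, steady rise is the signal to go back to an earlier saved epoch.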
ah
AI HUB Docs
