#✨│ai-help
well, it was great in small scale tests, but large tests not so much
will applio keep it in the future or possibly remove it forever
i'm testing some changes
but that means yet another retrain
at this point it is no longer "refine" gan
so thats why i couldnt get it to work 😭
i was going insane earlier i reinstalled like 7 times
sad to see refinegan go
but will you continue working on it?
hey all
you can re-enable it by editing tabs/train/train.py
Oh ok
change this back to True
i am new to rvc and I want help understanding how to set it up, either locally or on colab?
can somebody guide me please
I would not say it is gone gone
Thank u
but it no longer does what the original paper did
my current version is using interpolation and a parallel resblock, so it is more like a hifigan that has solved the problem with horizontal lines at 4, 8 and 12 kHz
hey could you help me a little by giving a rough overview
?
???
I want to setup rvc and train a model with my voice
can you give me a rough idea on how to do this ?
why dont you look at https://docs.aihub.gg/essentials/how-to-make-voice-models/
In the context of RVC, the dataset is an audio file containing the voice the model will replicate. It can be either speaking or singing.
thanks!
The original one made by RVC Boss
that's commonly called mainline/original
Your Nvidia GPU is good enough to do inference (use models) locally (on ur pc), not the best to train (make models) even if still possible
You can:
- Locally (runs on your pc so the speed depends on that, you will have to set it up with the guides):
- Cloud (remote good pc, easier and faster than ur PC but it's limited):
- Ilaria RVC Zero: fastest and simplest that you can get for free
- Weights.com: Partnered with AI Hub, lets u do them easily but u may be in a queue
- Applio UI Colab: max 4 hours of GPU daily, not guaranteed
Easiest possible (automatically separates vocals & instrumentals) : weights.gg
easiest cloud: Ilaria rvc zero
easiest local: Applio
I would suggest Applio, since it gets more updates
also, check the docs https://docs.aihub.gg for more info on vocal separation
Last update: Oct 21, 2024
I know rx but which one is vx? Kinda lost
what's ur pc gpu and what do u want to do
one sec
waves' Clarity vx - dereverb pro mono
(Reminder: it is for mono audio, so you take one channel and work on it, as you should anyway.)
Oh cool thanks
✨
I have a mac m2 air
and I want to try voice changer real time
wrong channel then, rvc isn't for that
RVC = Retrieval-based-Voice-Conversion, the best Speech To Speech AI models (on v2); it runs inference (uses models) on pre-recorded audio (AI covers) and trains (makes) models
Wokada = uses RVC for realtime inference
always elaborate on your requests when asking, people can't know your pc or what u want to do
@old sigil go to #🔍│help-w-okada
How about this speech to speech conversions?
this has some latency?
rvc is just on pre-recorded audios, it's not meant for realtime, you need to use only wokada if you want realtime with rvc models
those are 2 different program names
rvc does not equal to "realtime voice changer"
so what I am getting is, for rvc
If i have pre-recorded audios and i could train a model out of it
can I use it for speech to speech conversion?
there is a go-realtime GUI for the original/mainline rvc, but it's way worse optimized than the original wokada, and way worse than the wokada deiteris fork
RVC is a STS AI, you just said you wanted realtime voice changer for calls, not on pre-recorded audios
ya but I was curious to know how rvc works
but Yeah!
thanks for your help
I'll have a look at wokada for my use case
yeah it's speech to speech
Yo guys,
I’m tweaking the Batch Size setting and not sure what to pick. 4 = better accuracy but slower, 8 = faster but "standard".
Does 4 actually improve sound quality, or is 8 just as good?
Would appreciate a simple explanation!
i used to do up to 20 batch size
i think theres barely any difference between 4 and 8 so if you have the power to spare just do 8
if we are still on RVC v2 training on mangio-crepe that is
check https://docs.aihub.gg/rvc/resources/training/#batch-size
it depends on your gpu and dataset length
up to 20 seems like too much
you don't use the mangio fork still, right?
because that's old since 2023
do you think i would care, with an SLI of gpus worth like a used BMW?
once you get to cloud training you never go back
Applio
also yea i'd rather use the old mangio fork even in 2025
Still crepe and still rmvpe, i see we dont have a new extraction method yet after all this time
the "embedder" is new to me tho
iirc it was by default on chinese hubert
if your dataset is below 30 minutes, use batch size 4
if your dataset is above 30 minutes use batch size 8
now we got.. contentvec?
you don't need multiple gpus to train RVC models
ofc you don't, you can manage as well with a tiny lil 3060 12gb
no
but when your dataset is 2 hours long
no, local training is better but it depends on your setup, cloud isn't as stable and you can see that in #📰│dev-updates , especially google colab kinda sucks for training
mangio rvc fork is pretty old and not maintained, I don't get why you would say that when Applio has gotten more improvements
i used colab only like 3 or 4 times when RVC wasnt a thing and SVC is all we had
mangio fork uses extremely outdated code, everything's wrong there
^^^
even the logging is bugged
i made great models even with the outdated code and everything being wrong + bugged logging in 2023
i seriously doubt there was much improvement since then
because it wasnt outdated in 2023
mangio stopped receiving updates around that time
there's other enhancements in the code, it's not just that
it was on par with mainline back then
now its even behind mainline
and mainline is also extremely outdated
i would love hearing audio difference between 2023 mangio-crepe and nowadays RMVPE on applio or whatever yall prefer now 👀
it had pitch issues
you can even train on worse, it's just not suggested for not wasting time
yea and thats why i always used Runpod or other paid services
wrong channel, use #🔍│help-w-okada
oops
bro i legit checked on Weights.gg, my old model trained on mangio's fork is still magnitudes better than the newer models
RVC = Retrieval-based-Voice-Conversion, the best Speech To Speech AI models (on v2); it runs inference (uses models) on pre-recorded audio (AI covers) and trains (makes) models
Wokada = uses RVC for realtime inference
😭
most of the work is separating the vocals, not using the right version of RVC
im p sure i wouldn't be able to replicate that level of clean vocals again, kim vocal 1 was goated back then
kim vocal is pretty old, there's better models now
you can't know how good the person trained the model nor how
read what i wrote above
.
i used multiple models on FLAC files, then manually fixed issues in FL Studio + izotope rx
most people just download random mp3 128kbps acapellas from youtube and be done with it
everyone else who tried to make an "updated" version of my model failed miserably, the audio that gets generated is a mess, both in sound quality and voice similarity
what is your model?
oh a singer
no wonder people mess it up
you prob used studio sessions
no
you tried cleaning the dataset
no studio session, all manual labour and AI
from the song themselves
i just downloaded the whole album in FLAC quality
then one by one, minute by minute, cleaned the dataset
its not

also using an updated version helps, applio got more enhancements like the benchmark flag #🔊│ai-development message which is turned on in the main branch
but yeah of course it depends a lot on how people clean the dataset
ok so, after 2 years the advancements were... better GUI and slight code optimization?
what happened to that one dude who was developing RVC "v3"
rvc boss left rvc to rot, he's working on GPT-SoVITS and recently released v3
there are also newer things, like the refinegan vocoder on the main branch being experimental rn
idk why ur so attached to mangio fork
i'm not, we just didn't get any decent improvement since 2023
atleast for RVC
we did get some improvements with applio, it's just the original dev now works on other projects than RVC, so it's not an RVC v3 but more experimenting
rvc boss works basically mostly on https://github.com/RVC-Boss/GPT-SoVITS
which is probably best for speaking only
of course it's not an entirely new type of architecture, but i would rather use code that gets updates and improvements than code that doesn't get any at all
i dont think we are gonna see any decent or sizable improvements in the next year
maybe 2027
It always was contentvec, not hubert
As for f0 extractors.. yeah, there's nothing really better as of now, afaik
As for vocos, it is not worth it.
It's quite tricky to get working properly and, tbf, the potential phase reconstruction issues aren't worth it; the same goes for stft and, I believe, istft vocoders
most generators are only useful for mel to wav reconstruction, not encoded latents to wav
hifigan uses NN filters to predict the waveform from the encoded latents
vocos does not have enough capacity to predict
-kaggle
Suggestions for @distant turtle
- Applio Notebook, by Vidal Kaggle
- Applio Notebook, by Shirou Kaggle
- Music Source Separation, by Shirou Kaggle
- UVR5 NO UI, by Eddy Kaggle
- Original W-Okada's Voice Changer, Kaggle
- Modified W-Okada's Voice Changer, Kaggle
- 🆕 UVR5 UI, by Eddy, ArisDev & Nick088 Kaggle
- 🆕 RVC AI Cover Maker UI, by Shirou & ArisDev Kaggle
- 📖 How to use RVC Mainline on Kaggle by Cauthess
Note: Kaggle limits GPU usage to 30 hours per week.
I hope we get them asap
Can someone tell me why I get this in Applio Voice Blend?
model_blender(model_name, pth_path_1, pth_path_2, ratio)
ValueError: too many values to unpack (expected 2)
I tried to blend a 200 epoch and 48k sample rate model with a 270 epoch model the same sample rate
if both models were made in Applio and both at the same sampling rate, then there should be no issues with merging
there may be an issue if the models came from different sources
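For context on what blending does: a voice blend is, as far as I know, a weighted average of the two checkpoints' weights, so it only makes sense when both models share the same architecture, layer names and sample rate (one reason models from different sources can fail). A plain-Python sketch of that idea, with dicts of float lists standing in for real tensors; `blend_weights` is a hypothetical helper, not Applio's `model_blender`, and it does not reproduce the specific ValueError above.

```python
from typing import Dict, List

# Stand-in for a checkpoint: layer name -> flattened weights (real models hold tensors).
Weights = Dict[str, List[float]]


def blend_weights(a: Weights, b: Weights, ratio: float,
                  sr_a: int, sr_b: int) -> Weights:
    """Return ratio * a + (1 - ratio) * b, computed layer by layer."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    if sr_a != sr_b:
        raise ValueError(f"sample rates differ: {sr_a} vs {sr_b}")
    if a.keys() != b.keys():
        # Different layer names usually means a different architecture or source.
        mismatched = sorted(set(a) ^ set(b))
        raise ValueError(f"checkpoints have different layers: {mismatched[:5]}")
    blended: Weights = {}
    for name, wa in a.items():
        wb = b[name]
        if len(wa) != len(wb):
            raise ValueError(f"shape mismatch in {name}: {len(wa)} vs {len(wb)}")
        blended[name] = [ratio * x + (1.0 - ratio) * y for x, y in zip(wa, wb)]
    return blended
```

With `ratio=0.5` both models contribute equally; a higher ratio leans toward the first model.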
what pretrain does the applio colab use by default ?
The original/OG pretrain
ok thanks
Hi everyone, I hope I'm texting in the right chat.. I had to cancel kits.ai because it's ridiculously expensive.. does anyone know a good, complete walkthrough tutorial on how to make a model from scratch? kits would sort everything out for me so I feel kinda lost
Kits.ai just uses RVC but in a simplified User Interface
First of all what's your PC GPU
macbook pro m1
Oh, macs ain't really good for AI
Train (make) RVC Models on cloud:
- Prepare the Dataset
- Setup RVC:
Choose a cloud way to use RVC,
- Google Colabs (4 hours of daily gpu for free, not many hours, but easy to use):
- Kaggles (a bit harder to use and needs phone number but gives 30 hours weekly of better gpus):
- Mainline (UI)
- Applio by Vidal (UI)
- Applio by Shirou (UI, no guide as of right now)
- Lightning.ai (Kinda hard, needs login, no issue with web uis or anything, but only free 15 credits monthly):
- Be sure to know about the tensorboard
Google Colab = Easier but risk of getting disconnected
Kaggle = Harder but way more gpu time
If you are looking for the easiest free way, it's using https://weights.com/ which ofc uses RVC
RVC Inference (use models) on pre-recorded audio on Cloud
You can use either:
- Weights.com: Easiest Possible Ever Automatic
- Ilaria RVC Zero: Fastest free on cloud
- Applio (ui)
so I want to train a model of this musician.. perhaps the whole discography, not amounting to more than 30-40 minutes total
If you want, you could try locally (runs on your Mac), https://docs.applio.org/applio/getting-started/installation , but no one has ever reported training a model successfully on Mac since it's kinda slow
I actually downloaded this earlier this evening but it's all so overwhelming
i have uvr5 as well, just don't know whether there's a universal setting to extract the clearest lead vocals
if you could guide me step by step mate, I'd paypal you or something if that suits you
I'm not even sure if that supports Mac at all
uvr5 seems to work when I open it haha
Hopefully it runs on MPS using the integrated m1 pro chip rather than the CPU
Else it's gonna be even slower
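Whether UVR or Applio uses the M1's GPU depends on PyTorch's MPS backend. A minimal sketch of the usual fallback order (NVIDIA CUDA, then Apple MPS, then CPU), assuming the tool is PyTorch-based; `pick_device` and `detect_device` are hypothetical names, while `torch.cuda.is_available()` and `torch.backends.mps.is_available()` are real PyTorch calls.

```python
def pick_device(cuda_ok: bool, mps_ok: bool) -> str:
    """Prefer an NVIDIA GPU, then Apple's MPS backend, then fall back to CPU."""
    if cuda_ok:
        return "cuda"
    if mps_ok:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Ask PyTorch which backends are usable; assume CPU if torch isn't installed."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds have no MPS backend at all, so look it up defensively.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = mps_backend is not None and mps_backend.is_available()
    return pick_device(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    print("Running on:", detect_device())
```

If this prints `cpu` on an M1, the tool will run on the CPU and be much slower.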
@viscid moss hey sorry to disturb, do you remember if UVR supported MPS for Macs, or does your version of it support it?
ur so kind thank u ❤️
I was asking the other staffer if he remembers if that program supports it, or if his own version supports it since he made a separate own version of UVR by the way
yeah I gotchu, I was just moved that there's actually someone being understanding towards a complete n00b
that kind of kindness is lost on the internet for the most part
Unfortunately unless you spend like 4k dollars, macs are pretty shit for AI
And even if you spend that much and get the most powerful Mac made for user consumption, the issue is not many AI programs support Mac at all
when I was experimenting with google Colab stuff it seemed to do decent like?
Yeah that's because it doesn't run on your Mac, it runs on a cloud server (remote good PC) that uses Nvidia GPUs
Nvidia GPUs are the best in terms of performance and support in AI field
Ahh
I also sent you cloud links before
Did you install applio locally, or did you just use a Google colab? And this for UVR too?
both locally
Applio should have Mac support, hopefully uvr does too
Tbh, I don't know how much I'd suggest using them locally. At this point cloud would be faster than your Mac, the only issue is limited GPU time
I can leave it overnight or w/e not to worry about that really ❤️ guidance is pivotal for me
Be sure to watch out if it overheats or anything though
I would personally suggest using cloud for faster processing, but your choice
I don't know how much time it could take to be honest, not sure if it's going to be overnight or more or less, so I can't guarantee you much on that
I think it has MPS support, since they made releases for Mac. UVR5 UI doesn't have Mac support (there are no installation files + idk if it works correctly since I don't have a Mac to test), but audio-separator (the UVR5 UI core) has MPS support.
UVR5 UI probably works with MPS but I haven't tested it because I don't have a way to test it.
So I recommend just use UVR5 for mac
https://github.com/Nick088Official/Minecraft_Skin_Generator/blob/main/Scripts%2Fminecraft-skins-sdxl.py#L137 maybe this can help
Alright thx for letting me know
that's already built in, just need to test xd. I'll try in a while using a VM maybe, to make sure the installation works at least; atm I'm busy with irl work
Alright good luck
is there a guide on how to download the software needed for ai stuff
anyone know why i cant upload a rvc model to my voice changer ?
i get a lot of errors when installing RVC with the TroubleChute one-line command, and im too stupid to install it manually. i used RVC on my old pc (win10) before and now on win11 everything just won't work. it opens the website but it never finishes converting
Which RVC program are you using?
Retrieval-based-Voice-Conversion-WebUI
if that wasnt the correct answer im sorry lol im not rlly deep into software stuff..
RTX 5080
Damn. Unfortunately, there's no known stable version of Applio for this specific GPU. But I think you can use Applio with CPU instead.
i want a model that sounds realistic please, i dont mind the size of the file
hm i have the 9800x3D im sure it will be enough right?
smh bros flexing
no im wondering about stuff cuz i dont have a clue lol
that's like one of the most powerful if not the most powerful cpu rn
ik cuz i have one myself
yeah for gaming but for AI stuff idk?
youre thinking its like nvidia vs amd but its different from gpus
i just heard for productivity stuff core count is more important but ig not
I'm not sure about this one, but yeah just like RTX 50 series, there doesn't seem to be any version of Applio made for this specific AMD GPU.
what do you want to do with it
Are you saying you hate people flexing their stuffs or something?
no?
i flex my 7900xtx sometimes
flexing is ok as long as it does not include lying imo
Imagine thinking "my laptop GPU is Intel HD Graphics 3000" is a lie. 
yea imagine
9800 is cpu
imagine imagining stuff that isn't true and just assuming it over the conversation
at the moment there's no proper torch build for 5000 series
but you can install something for applio
so the issue is my gpu? dang lol
I've mistaken 9xxx for AMD RX GPU. 
would the best bet be to use Applio? i have it installed but i dont rlly get it haha
actually not compiled, clone the repo
then run the installer
hopefully it does not error out
then update torch
"just do it" already confused lol
Applio is the only way to get converted audio done very fast with a proper GPU. With an RTX 50 GPU, you'll have to do a bit of coding.
saddies. so there is no way for me lol
You can use any other RVC program, but all of them will only use your PC's CPU because none of them have been compiled for RTX 50.
i mean i wouldnt care what it uses as long as it just works
That's all good.
i wont be training my own models and stuff. i just need stuff to be converted into voices via pretrained models
no code, just replace torch version with update that supports 5000 series
but hooow
u linked me a file i cant even do anything with lol
cuz tf is a .whl file
thats the first thing im failing on lol
step one completed successfully lol
then ig run-install.bat
then running ur command in a terminal with admin perms
module "env" couldnt be loaded

it just throws me an error when trying that
but how do i even get there
open command prompt
yeah
okay did that
dang i did it. im a hacker. lets hope it works lol
greatly appreciate the help tho 🫂
hm now it doesnt start up anymore lol
give it a bit
it throws an error that it doesnt find smt
ig "env/lib/site-packages/torchaudio/lib/libtorchaudio.pyd" wasnt found lol
ig i need to do the same with libtorchaudio but idk what version
Guys, is it normal for a zipped voice model to weigh 259MB?
id still appreciate help with getting any voice conversion to work on my gpu.
Anyone know the python version requirement if I want to run this version of UVR5 locally?
https://huggingface.co/spaces/TheStinger/UVR5_UI/tree/main
I cloned the repo and tried to install the requirements with the latest python 3.13.2 but it failed:
ERROR: Could not find a version that satisfies the requirement torch<2.5,>=2.3 (from audio-separator) (from versions: 2.5.0, 2.5.1, 2.6.0)
ERROR: No matching distribution found for torch<2.5,>=2.3
Don't use the very latest version of Python to install Python programs like this. Use a version like Python 3.10.x or Python 3.11.x.
can you show the full error stack?
you may need to downgrade python to 3.10
there's only 9070 & XT
the alternative is Anjok's UVR with latest patch, and ZFTurbo's MSST repository that works on python 3.10-11
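The pip error above comes from the version pin: `torch<2.5,>=2.3` has no builds for Python 3.13, and pip only found 2.5.0 and newer. A small pre-install check, assuming the 3.10/3.11 range recommended in this thread (not an official support matrix); `is_supported` is a hypothetical helper.

```python
import sys
from typing import Sequence

# Assumption: 3.10 and 3.11 are the versions recommended in this thread.
SUPPORTED = {(3, 10), (3, 11)}


def is_supported(version: Sequence[int] = tuple(sys.version_info)) -> bool:
    """True if the major.minor Python version is one these tools are known to install on."""
    return (version[0], version[1]) in SUPPORTED


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    if is_supported():
        print(f"Python {major}.{minor}: OK")
    else:
        print(f"Python {major}.{minor}: install Python 3.10.x or 3.11.x and use that instead")
```

Run it with the same `python` you plan to use for `pip install -r requirements.txt`.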
Hello. I've encountered an issue while creating an index file. I have approximately 7 hours of audio for voice training, and the training process went smoothly. However, when trying to create the index file, the process stops at around 1,300 files out of 48,000, and then an index file is generated, which is only 30 megabytes in size. When using this index file, the voice often converts with artifacts. In other models I've trained on 10-20 minutes of data, the index files weigh 120+ megabytes. What should I do in this situation?
with 4000+ sliced segments the index creation attempts to narrow down the dataset, but it only goes so far before it detects that it gets no improvement
you can always run inference without an index and check if that comes okay
Without the index is fine, but the voice is not as similar as with the index.
the voice comes from the model
index is just an accent
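To make the "accent" idea concrete: as I understand RVC's retrieval step, the index holds feature vectors taken from the training dataset, and at inference each frame of the input's features is pulled toward its nearest stored frames, with the Index Rate setting controlling how strong that pull is. The voice itself still comes from the .pth model. Simplified plain-Python sketch: real RVC searches a FAISS index and uses its own weighting, and `retrieve_and_blend` is a hypothetical name.

```python
from typing import List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def retrieve_and_blend(query: Sequence[float], bank: Sequence[Sequence[float]],
                       index_rate: float, k: int = 8) -> Vector:
    """Blend one frame's features with its k nearest neighbours from the training set."""
    if not bank:
        return list(query)
    nearest = sorted(bank, key=lambda v: sq_dist(query, v))[:k]
    # Closer training frames get larger weights (simplified inverse-distance weighting;
    # the epsilon avoids dividing by zero on an exact match).
    weights = [1.0 / (sq_dist(query, v) + 1e-8) for v in nearest]
    total = sum(weights)
    retrieved = [
        sum(w * v[i] for w, v in zip(weights, nearest)) / total
        for i in range(len(query))
    ]
    # index_rate = 0 keeps the model's own features; 1 uses only the retrieved ones.
    return [index_rate * r + (1.0 - index_rate) * q for r, q in zip(retrieved, query)]
```

This is also why an index is optional: with `index_rate=0` the features pass through unchanged.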
7 hours of audio is too much for a finetune and not enough for training from scratch
What is the best length of the dataset to use? If I shorten the dataset, is it better to continue training the same model or start training again?
Thanks I now use 3.10.16 and it works flawlessly
i downloaded it but when i open it, it doesn't open
Thanks it works
can you help me
Can you be more specific about why you cannot open it, is it because of the dependencies?
i have downloaded the voice changer and the guy on the video says open it
i open it but it doesnt open
Don't ever follow YouTube tutorials for RVC and Wokada, those are old
First of all, you want a realtime voice changer for calls?
can you recommend a good voice changer? i want to make a girl voice
yes and for games
Wrong channel then
RVC doesn't mean realtime voice changer
RVC = Retrieval-based-Voice-Conversion, the best Speech To Speech AI models (on v2); it runs inference (uses models) on pre-recorded audio (AI covers) and trains (makes) models
Wokada = uses RVC for realtime inference
Tell your PC GPU in #🔍│help-w-okada
ok
30-60 min, content variety > quantity, 3 hours of mumbling is worse than 15 minutes of speaking and singing
new model, obviously
that's unless you force select the faiss method (which could take longer and produce an index file of around 2 gigs) instead of leaving it on auto in the index training
Yo, i didn't know the index is for accent. I always thought the .pth always needs it to work. Is there good documentation to learn some of this? I'd like to understand the general idea of how it works, but not a deep explanation meant for engineers.
A .pth file is always required to work, of course.
An index file stores the voice's accent and is used with a specific voice model. It is created during the voice training process.
Yo guys,
I still don’t get how Batch Size really works.
Does 4 actually improve sound quality, or is it just a performance thing?
Yep, it will improve it depending on your dataset.
(I'm talking about quality)
It will mostly be useful if you have a short dataset below 4-5 mins.
If your dataset is around 10+ mins, go for 8 on batch
batch size = number of files trained at once
bs 4 trains 4 files at once
bs 8 trains 8 files at once
for simplicity:
less than 30 mins of audio = use batch 4
more than 30 mins of audio = use batch 8
too high in a small dataset might hurt the model's ability to generate new audio
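Since each step trains one batch, the batch size also sets how many steps make up an epoch, which is why the same dataset shows different step counts at bs 4 and bs 8. Sketch below; the 30-minute cut-off is only the rule of thumb above, the function names are hypothetical, and it ignores whether the trainer drops or pads a final partial batch.

```python
import math


def suggested_batch_size(dataset_minutes: float) -> int:
    """Rule of thumb from this thread: under 30 min -> 4, otherwise 8."""
    return 4 if dataset_minutes < 30 else 8


def steps_per_epoch(num_slices: int, batch_size: int) -> int:
    """One step trains one batch, so an epoch is ceil(slices / batch size) steps."""
    return math.ceil(num_slices / batch_size)
```

For example, 20 minutes of audio cut into 3-second slices is 400 slices: `suggested_batch_size(20)` gives 4, and `steps_per_epoch(400, 4)` gives 100 steps per epoch (50 at batch size 8).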
Wait, Lyery
Is it a good idea to use batch 4 with any dataset under 30 minutes?
Wait, I misread XD


if the dataset is under 30 mins use batch size 4
if the dataset is over 30 mins use batch size 8
Got it! So for less than 30 mins, Batch 4 is best, and for more than 30 mins, Batch 8 is better. Just to be sure, using Batch 4 on a longer dataset wouldn’t really improve quality, right?
nope
it could actually make it worse, the model would be stuck in a bad place and never leave it
so the model would sound weird
Ohh I see! So using Batch 4 on a long dataset could actually trap the model in a bad state? Thanks for the clarification!
Yep, for that reason i mostly recommend using 8 on batch for any dataset beyond 10-20 mins.
yesss
technically any batch size works but the results depends heavily on the dataset
for making things simple just stick to what i said before
there are times where bs 4 gives better results than 8 and vice versa
Yup, it's matter of testing
Thanks a lot!
how do you make rvc?
elaborate:
- ur pc gpu
- what do you mean? did you mean make an rvc model?
im sure he gone already. 
welp, youre right 😭
People need to understand that helpers might be busy sometimes and they can't reply in 2 mins
Do you need any help?
yeah..
well. i got help before but im sure he was also busy. i couldnt solve my issue but im also veeery nooby when it comes to software so yeah
from what i understood, torch/torchaudio doesnt have working versions for the 50 series nvidia cards? and id need to update both manually to a nightly version
buutt yeah
elaborate:
- your pc gpu
- what guide/download link are you using
- the issue specifically
- what do you want to do
RTX5080
Got told Applio is best.
Can't get it to work on 50 series cards
Just convert existing audio using models into other voices
Thank you for replying
I checked the message you meant here #✨│ai-help message ; unfortunately this is related to the RTX 50 series needing a new PyTorch version, meaning there's no precompiled version, and you need to do it from source
I don't have a 50 series, but I can try helping you out
Would greatly appreciate that
if u have the patience to handle me haha
you're running on windows 11, right?
Yes
try going back to where you had that libtorchaudio missing issue, open CMD, run env\python -m pip install https://download.pytorch.org/whl/nightly/cu128/torchaudio-2.6.0.dev20250308%2Bcu128-cp310-cp310-win_amd64.whl
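For anyone confused by the .whl links: a wheel is a prebuilt Python package, and its filename says what it was built for. In `torchaudio-2.6.0.dev20250308%2Bcu128-cp310-cp310-win_amd64.whl`, `%2B` is a URL-encoded `+`, `cu128` is the CUDA 12.8 build, `cp310` means CPython 3.10 and `win_amd64` means 64-bit Windows, so it has to match the Python inside Applio's `env` folder (presumably 3.10, given these commands). A small sketch that reads those tags using the standard `name-version(-build)-python-abi-platform.whl` layout; `parse_wheel` and `fits` are hypothetical helpers.

```python
import sys
from typing import NamedTuple
from urllib.parse import unquote


class WheelTags(NamedTuple):
    name: str
    version: str
    python: str
    abi: str
    platform: str


def parse_wheel(url_or_name: str) -> WheelTags:
    """Split a wheel URL or filename into its name, version and compatibility tags."""
    filename = unquote(url_or_name.rsplit("/", 1)[-1])  # "%2B" -> "+"
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:  # an optional build tag sits between version and python tag
        parts.pop(2)
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel name: {filename}")
    return WheelTags(*parts)


def fits(tags: WheelTags, python_tag: str, platform: str) -> bool:
    """Tags can be compound (e.g. 'py2.py3'), so check membership in each dotted list."""
    return python_tag in tags.python.split(".") and platform in tags.platform.split(".")


def current_python_tag() -> str:
    """Tag of the interpreter running this script, e.g. 'cp310' for Python 3.10."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"
```

Running `current_python_tag()` with `env\python` tells you which `cpXYZ` wheel to pick.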
then, try running applio
@karmic flax btw be aware of the missing ROPs and melted connectors on the 50 series, many 50 series gpus are having those issues
you should prob check out if it's happening to you too
yeah. I luckily dont have missing rops c: and my gpu is powerlimited to use 288watts. the 12v connector is rated for 600watts
alright, hopefully nvidia does something soon about this
also let me know about this
hmm i have errors but i cant dm u and i dont have perms to upload a pic
!give-media-perms 1h @karmic flax
can you try uploading the pic now?
weird, @karmic flax can you try to run the same command #✨│ai-help message , but add --force-reinstall after install and leave everything else as it was
already confused 
go to cmd again, as you did before, then run env\python -m pip install --force-reinstall torch-2.6.0+cu128.nv-cp310-cp310-win_amd64.whl
oh I thought you downloaded #✨│ai-help message
well, you can run env\python -m pip install https://huggingface.co/w-e-w/torch-2.6.0-cu128.nv/resolve/main/torch-2.6.0+cu128.nv-cp310-cp310-win_amd64.whl
seems like you don't need to force-reinstall, because you didn't install it in the first place
i was looking for it but i couldnt find the correct version
could you try the command I just told you?
nice, try running applio after it's done
can you run #✨│ai-help message and tell me the output?
shit
what if you try
env\python -m pip install https://download.pytorch.org/whl/nightly/cu128/torchaudio-2.6.0.dev20250306%2Bcu128-cp310-cp310-win_amd64.whl
this build is earlier than the one I sent you before
my last guess would be running: env\python -m pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
see if that works
@simple ore you might find this helpful btw
you're welcome
yeah, they fixed the nightly build 3 days ago
Sorry I have not trained any models before. How might I do such a thing for rvc? (https://huggingface.co/spaces/TheStinger/Ilaria_RVC)
IlariaRVC isn't for training models, it's only for inference
what's your pc gpu
3070
yeah you wouldn't even need to use Ilaria RVC, it's a cloud (remote good pc) service, but your pc is good enough
Oh Alright, how might I train a voice though
oh
lmk
WHich one would be easier in your opinion
Applio
Weren't AI Hispano, the people who made Applio, hacked?
or is that fixed
that happened a long time ago, in october 2024
it's all completely fixed and safe
Alright, which one would I download? I cant find the download button shown and im not sure which one I should click
Compiled > Windows > The download button
Thank you
Yo quick question about my RVC training. I’m at 27k steps now. Loss kept dropping but has been super slow since 10k-12k.
listen to the epochs and compare them
if its overtrained the model is going to sound robotic
while in epochs that arent overtrained they're going to sound just fine
Thanks! I'll compare and pick the best one.
if you use spek its easier to spot overtraining, there'll be missing frequencies in the result
Oh okay, thanks! I'll check with Spek to see if there are missing frequencies. Appreciate the tip
missing frequencies? it is when the spectrogram cutoff is lower than the target sample rate
no like sorry i explained bad
i meant to say to check for missing harmonics lmao
overtrained models generate noise instead of harmonics at some point
🦈
imo the detail will be crisper to match the dataset but it loses some pretrain ability (that could lead to some possible robotic sounds)
yuh
Y'all be popping out and asking for E-girl voice model just to troll and cat the damn fish someone.
I actually did this to see if you would still answer but damn
Like everything you do is just answering people who asks for e-girl models
lol
just actually searched your messages lol

Like if asking anything would make you more money though.
Shit. You're acting like I do this every day, huh.
Hi! How do you know if the model is overtrained?
this kind of thing somehow irritates me
Hey, does anyone know how i can take multiple (5) .VOB files and combine them into 1 whole video? No one seems to. I've tried clipchamp but there's a 1-second gap/pause that's a mess.
Not a channel for that
@low shard
Oh nvm this seems an error on HuggingFace side, you could try again tomorrow
Anyone know the required python version to run https://huggingface.co/spaces/TheStinger/Ilaria_RVC/tree/main
I use python 3.10.16 and run into errors while installing requirements:
Building wheels for collected packages: omegaconf, samplerate, srt, antlr4-python3-runtime
Building wheel for omegaconf (pyproject.toml) ... done
Created wheel for omegaconf: filename=omegaconf-2.0.6-py3-none-any.whl size=36882 sha256=0b988ea25770e060c1ad0bde20dfbd7da84924e620f08626d4507bff6e337ece
Stored in directory: /tmp/pip-ephem-wheel-cache-mgtyolse/wheels/ee/67/d9/a68a521e487bb78d6599d3a157f5bb01d0760c689a9c2ac78f
Building wheel for samplerate (pyproject.toml) ... error
error: subprocess-exited-with-error
× Building wheel for samplerate (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [4 lines of output]
running bdist_wheel
running build
running build_ext
error: [Errno 2] No such file or directory: 'cmake'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for samplerate
Building wheel for srt (setup.py) ... done
Created wheel for srt: filename=srt-3.5.3-py3-none-any.whl size=22483 sha256=2ca8125c77c760943695358d90630f7dbf1a8fc14d0c479a94cc8bbaa9d08d93
Stored in directory: /home/yui/.cache/pip/wheels/d7/31/a1/18e1e7e8bfdafd19e6803d7eb919b563dd11de380e4304e332
Building wheel for antlr4-python3-runtime (setup.py) ... done
Created wheel for antlr4-python3-runtime: filename=antlr4_python3_runtime-4.8-py3-none-any.whl size=141246 sha256=5175ca3614fd8bc7ba3b90590c4920a09065e395cd7a667f3d53289ec8ed5974
Stored in directory: /home/yui/.cache/pip/wheels/a7/20/bd/e1477d664f22d99989fd28ee1a43d6633dddb5cb9e801350d5
Successfully built omegaconf srt antlr4-python3-runtime
Failed to build samplerate
ERROR: Failed to build installable wheels for some pyproject.toml based projects (samplerate)
Also if there's a better alternative please let me know
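The failing line in that log is `No such file or directory: 'cmake'`: pip is compiling `samplerate` from source, and that build calls the `cmake` program, which isn't on PATH. Installing cmake (for example through the distro's package manager) and re-running the install is presumably what's needed to get past that step. A quick pre-flight check, sketched with the standard library's `shutil.which`; `missing_tools` is a hypothetical helper.

```python
import shutil
from typing import Callable, Iterable, List, Optional


def missing_tools(tools: Iterable[str],
                  which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return the command-line tools that can't be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    # cmake is the tool the samplerate build reported as missing above.
    missing = missing_tools(["cmake"])
    if missing:
        print("Install these before `pip install -r requirements.txt`:", ", ".join(missing))
    else:
        print("cmake found on PATH")
```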
so a question, haven't touched ai voices in a while...
how do you make a cover of a song again? colab doesn't work for me anymore, whenever I try to load a model it gives me not found
The better alternative to RVC is Applio.
Thanks
It's the license for usage of your model
ah okay
Tell:
- your PC GPU
- the google colab link you're using
- the error
Are you trying to run it locally? It's not meant to run locally
Also what's your PC GPU and what do you want to do
AMD moment
What's your PC GPU
@low shard hey is mainline back? the colab one
Maybe @simple ore knows, he used to have AMD till he recently switched to finally Nvidia
Nope, Hina is busy so it won't be fixed for at least another week, use Applio or something else meanwhile, also you can just look at #📰│dev-updates
Yeah I saw it, was just making sure
Well the batch size does depend also on the dataset length, but 8 shouldn't make your PC crash at all
It's fine
I’m going to create (Mostly) Blackiana model with Apollo, is there a voice file?
Tensorboard to check how the model goes
You need to find yourself the dataset
You use Tensorboard to track the process, to make sure the model won't be too overtrained or undertrained.
Oh, so that's what Tensorboard looked like if run locally. 
Last update: Dec 24, 2024
Just a Laptop RTX 4050, I want to do this since running it online has restrictions
Ehh you can inference but will be limited on the training
IK, I don't train models. But Hugging Face has limitations even if you're just inferencing, doesn't it?
Yeah that's because you're using an expensive PC GPU, google colab and Kaggle exist too btw
oh, so using the CPU won't trigger the limit, thanks
CPU is over 10 times slower
Are you sure it's using your GPU
If your GPU percent goes high in Task Manager, it's definitely working.
gpu and batch size?
also dataset size matters
its normal
Amd is just very slow
150 epochs if every slice is 3s
if the slices are different lengths
around 200 epochs
ok
yes, around 200 epochs
as long as the cmd is open, it's fine
500 epochs is too much for small models, even when using the automated slice
ai is random so u cant just predict when it's going to sound fine
it's always good practice to compare the epochs
at some point the model will naturally overtrain, which makes final epochs sound very robotic
in that case your final model would be any epoch before overtraining
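The epoch rule of thumb above (about 150 epochs when every slice is 3 s, about 200 when slice lengths vary, and 500 being too much for small models) can be written as a tiny helper. This is only a sketch: the function name and the 0.05 s tolerance are made up for illustration, and the result is a starting point, not a replacement for comparing epochs by ear.

```python
def suggest_epochs(slice_lengths_s, tol=0.05):
    """Suggest a starting epoch count from the slice durations (in seconds).

    Hypothetical helper encoding the chat's rule of thumb:
    ~150 epochs if every slice is ~3 s, ~200 if the lengths vary.
    `tol` (how close to 3 s still counts as "3 s") is an assumption.
    """
    if not slice_lengths_s:
        raise ValueError("need at least one slice length")
    uniform_3s = all(abs(s - 3.0) <= tol for s in slice_lengths_s)
    return 150 if uniform_3s else 200


if __name__ == "__main__":
    print(suggest_epochs([3.0] * 300))      # uniform 3 s slices
    print(suggest_epochs([2.1, 3.0, 4.7]))  # mixed slice lengths
```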
read the note at the bottom of the AMD installation instructions
AMD moment
45min set on 6700xt was taking ~4min/epoch
yeah, it was an overnight training to 200e
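To see why that ends up being an overnight run: at roughly 4 minutes per epoch, 200 epochs is about 800 minutes. A minimal sketch of that arithmetic (the helper name is hypothetical; real per-epoch time depends on GPU, batch size and dataset length):

```python
def training_time(minutes_per_epoch, epochs):
    """Return (hours, minutes) of wall-clock time for a training run."""
    total_minutes = round(minutes_per_epoch * epochs)
    return divmod(total_minutes, 60)


if __name__ == "__main__":
    hours, minutes = training_time(4, 200)
    print(f"{hours} h {minutes} min")  # 13 h 20 min
```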
tbh I thought it was way faster, wasn't zluda optimized?
yeah T4x2
yeah 30 hours of GPU weekly
way better than google colab that has random daily gpu with a max of 4 hours daily
use scalars tab
Does anyone know why, when I import rvc models into the voice.ai app, every voice sounds almost the same?
voice.ai sucks, don't use it at all
@upbeat chasm you want realtime voice changer for calls? tell your pc gpu in #🔍│help-w-okada
I mean wokada ones are fine, they are open source
Then what free service should I use
you should use wokada, either locally or on cloud
as I said, tell your pc gpu in #🔍│help-w-okada
you can also use it on google colab and kaggle, yes
rvc realtime from mainline/original rvc is pretty old
I want a voice changer just for myself to use, not to use on calls
expand losses, or avg_50 if you have that
so, inference (use models) on pre-recorded audios?
still, tell your pc gpu
@upbeat chasm You can check your pc gpu via:
ctrl+shift+esc (task manager) -> Performance tab -> GPU
I know
flat mel and g loss means the model is close to the dataset in the type of the data it has
generally both have to go down
@upbeat chasm #🔍│help-w-okada message since you got a 3060 laptop, and don't need to use the models in realtime for games/calls
Your Nvidia GPU is good enough to do inference (use models) locally (on ur pc), though not the best for training (making models), even if it's still possible
You can:
- Locally (runs on your pc so the speed depends on that, you will have to set it up with the guides):
- Cloud (remote good pc, easier and faster than ur PC but it's limited):
  - Ilaria RVC Zero: fastest and simplest that you can get for free
  - Weights.com: Partnered with AI Hub, lets u do them easily but u may be in a queue
  - Applio UI Colab: max 4 hours of GPU daily, not guaranteed
Easiest possible (automatically separates vocals & instrumentals): weights.gg
easiest cloud: Ilaria rvc zero
easiest local: Applio
in at least the first 5k steps both mel loss and g loss have to go down
if they dont there's something wrong with the dataset or something else
too big a batch size or something
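A rough way to check the "both have to go down in the first 5k steps" rule on exported numbers is to compare the average of the first and last chunk of the curve. This sketch assumes you have already pulled the mel or g loss values out of TensorBoard into a plain list (e.g. via its CSV download); the 50-point window only mirrors the avg_50 idea and is an assumption, not how Applio computes that curve.

```python
def is_trending_down(values, window=50):
    """Compare the mean of the first `window` points with the mean of the last `window` points.

    Returns True if the end of the curve is lower than the start, which is
    what you want to see for mel loss and g loss early in training.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(values) < 2 * window:
        raise ValueError("need at least two full windows of data")
    first = sum(values[:window]) / window
    last = sum(values[-window:]) / window
    return last < first


if __name__ == "__main__":
    # Hypothetical loss values, one per logged step.
    mel_loss = [30 - 0.002 * step for step in range(5000)]
    print(is_trending_down(mel_loss))
```

If a curve fails this check, the suspects named above are the dataset or too big a batch size.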
g total
depends on the dataset size/batch size
What's your batch size?
too much for 15min
use 4
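The link between dataset size, batch size and steps is plain division: each step processes one batch, so steps per epoch is roughly the number of slices divided by the batch size. A sketch assuming the 15 min dataset is cut into 3 s slices (your preprocessing may produce a different count); the helper name is hypothetical.

```python
import math


def steps_per_epoch(dataset_seconds, slice_seconds, batch_size):
    """Approximate training steps per epoch (one step = one batch, rounded up)."""
    slices = math.ceil(dataset_seconds / slice_seconds)
    return math.ceil(slices / batch_size)


if __name__ == "__main__":
    per_epoch = steps_per_epoch(15 * 60, 3, 4)  # 300 slices / batch 4
    print(per_epoch)  # 75
    # Epochs needed to reach the "first 5k steps" window for the losses.
    print(math.ceil(5000 / per_epoch))  # 67
```

With batch size 8 the same dataset gives 38 steps per epoch: each step sees more data, but there are fewer of them.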
- Colab:
- Applio, by IA Hispano Google Colab
- RVC Disconnected, by Kit Lemonfoot Google Colab
- RVC Mainline, by Hina Google Colab
- Hina's Mod AICoverGen WebUI, by Hina Google Colab
- AICoverGen-NoWebUI [English], by Ardha, fixed by Eddy, Hina and Gdr Google Colab
- AICoverGen-NoWebUI [Spanish], by Eddy, Hina and Gdr Google Colab
- UVR5 NO UI, by Eddy Google Colab
- UVR5 UI, by Eddy Google Colab
- Hina's Modified Original W-Okada's Realtime Voice Changer, Google Colab
- FaceFusion UI, by Nick088 Google Colab
- FaceFusion NO UI, by Nick088 Google Colab
- EasyGUI, by Rejekts Google Colab
- 🆕 Music Source Separation Training (Inference), by Jarredou & Makidanye Google Colab
While the Colab free plan provides up to 12 hours of daily usage, the GPU is typically available for only about 4 hours each day on average.
and now your model exploded
because you're training with FP16
you can stop now, it is dead
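To catch this early rather than by ear, one option is to scan the logged losses for values that have overflowed. This sketch assumes the explosion shows up as NaN or infinite loss values, a common symptom of FP16 overflow that the chat does not confirm for this run; the helper name is hypothetical.

```python
import math


def first_non_finite(losses):
    """Return the index of the first NaN or infinite loss value, or None if all are finite."""
    for index, value in enumerate(losses):
        if not math.isfinite(value):
            return index
    return None


if __name__ == "__main__":
    # Hypothetical g total values from a run that blew up.
    g_total = [45.1, 38.7, 36.2, float("inf"), float("nan")]
    print(first_non_finite(g_total))  # 3
```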
! C:\Users\user\Downloads\vcclient_win_cuda_2.0.76-beta.zip: C:\Users\user\AppData\Local\Temp\Rar$EXa23844.2909.rartemp\dist\main\web_front\assets\
yall know the reason for this?
unzip the file lol
alr
Sorry, I want to know how to control the speed of the voice, and can I produce the voice and srt together? Thank you!
are you following a youtube tutorial?
the file name seems to be the latest version of Original Wokada (Wokada is used for using RVC models in realtime for calls/games)
But the wokada deiteris fork is way better than the original wokada in performance and quality
@rough parrot tell your pc gpu in #🔍│help-w-okada , this is also the wrong channel since RVC and Wokada aren't the same program
literally don't use voice ai it sucks
Why is that?
Then what should I use?
if u wanna use realtime voice changer use w-okada
or if u wanna make a cover use applio
Is this free?
Where can I download?
first what's ur gpu?
right click ur taskbar and click task manager
And then?
I don't quite understand this
Can you just send me the pic of the step?
@low shard u can take it from here
Could Applio 3.1.1 control the speed of the ai voice? and can I produce the voice and srt file together?
That's what you mean right?
Or the processor?
And then what? How do I install Applio?
Ok
don't use voice.ai
are you looking for ai covers or realtime for calls
No, for song cover
But I'll try another fun I guess
Your Nvidia GPU is good enough to do inference (use models) locally (on ur pc), though not the best for training (making models), even if it's still possible
You can:
- Locally (runs on your pc so the speed depends on that, you will have to set it up with the guides):
- Cloud (remote good pc, easier and faster than ur PC but it's limited):
  - Ilaria RVC Zero: fastest and simplest that you can get for free
  - Weights.com: Partnered with AI Hub, lets u do them easily but u may be in a queue
  - Applio UI Colab: max 4 hours of GPU daily, not guaranteed
  - RVC-AI-Cover-Maker-UI Colab: automatically separates the vocals and instrumentals, converts the voice, and mixes everything back together
Easiest possible (automatically separates vocals & instrumentals): weights.gg & rvc-ai-cover-maker-ui
easiest cloud: Ilaria rvc zero
easiest local: Applio
Download compiled version right?
Kinda sus
Seriously please
On it
Why is that?
ApplioV3.2.8-bugfix.zip right?
Oh
Can we request a model? Is there a payment?
After download Applio zip, what else?
And then?
Why does it take so long to extract?
Where can I get rvc v2 voice models?
@primal barn I didn't find batfild
You can search rvc ai voice models at:
- #1175430844685484042
- In #🔍│find-models , Do /find with @earnest musk
- https://weights.com/ (login required)
- https://huggingface.co/models (but watch out cus in hugging face there arent only rvc ai voice models)
- https://voice-models.com/
- https://thevoicemodels.com/ (for Turkish Models, login required with discord and level 2 on their server)
if there isnt one, you can:
- #1159289738314919936
- #1191429836321849435
- make it yourself with our docs guides https://docs.aihub.gg/essentials/how-to-make-voice-models/
:wave: @low shard, How can I help?
Available Commands:
• @weights find <query> or /find <query> - Search for RVC Voice Models
• /create - Create an AI Cover
• /image - Generate an Image
Oh
@primal barn The screen is black
Then what?
It's just dark with "Applio" on the left up corner
Yes I guess
Wait, does it require internet?
its like this
Oh yes, now it opens in the browser
Now what?
In browser?
Yes, and then what?
Why not in the program itself instead?
Yeah
So it's online then?
Oh cool
@primal barn Now how to put the model?
hi guys i see support came for the 50 series https://github.com/IllIlIlIllIl/voice-changer/releases/tag/b2335
i have a 5080 can i get help installing the voice changer please? for some reason i cant work it out, i had a diff pc with a weaker gpu and it worked good there
wrong channel
RVC = Retrieval-based-Voice-Conversion, the best Speech To Speech AI Models (on v2), Inferences (use models) pre-recorded audio (ai covers) and train (make) models
Wokada = uses RVC for realtime inference
use #🔍│help-w-okada
my bad
How to put the model?
@primal barn So I must move the model file into that folder?
Just the pth? Not the index too?
Put the index and pth alongside the reference and the mute?
folders
Aight aight. So after I put the model ...
Then I put the song or audio I want there
And then just convert?
How to do that?
Dang, it sounds goofy 
@primal barn It sounds goofy. And some part there's like a glitchy sound
Can I fix it to sound more smoothly?
hi i tried to download RVC but when i start it it gives me an error saying that the 'thinker' module is missing does anyone know how to fix this?
This is quite likely due to harmonies / vocal layering / and echo / harsh reverb smearing the f0 traces too much
The song. Cuz I don't know how to separate
jesus
u could've just separated it on uvr5 or mvsep
Are they paid?
using melband
hell no
Why melband?
its good for extracting vocals
the tutorials on youtube are outdated
You said that I should pay for it
You wanna get uvr as others said, and also get the fv4 model (gabox's melband roformer voc fv4)
Imho, currently the best one
Also, don't use the link they gave you, it won't have support for the newest models
first this one
and then you wanna patch the uvr with:
https://github.com/TRvlvr/model_repo/releases/download/uvr_update_patches/UVR_Patch_1_21_25_2_28_BETA_small_rofo.exe
oh
there
Once you've done all that, you wanna get a model I'll link
I don't understand how to patch
those are installs my man
install one, install the 2nd one
No manual work involved in that part
Once you install both in the order I specified, head to:
https://huggingface.co/GaboxR67/MelBandRoformers/tree/main/melbandroformers/vocals
and download this:
It's the fv4 model for isolation I mentioned
Using the 2 files from above?
yes, first the first link, then the 2nd
first is the base ish, 2nd is the patch
Then what's this?
It is a model for separation
which you'll use in uvr
the fv4 model I mentioned
you'll have to download it
- get my config file:
Now, focus.
Once you install the uvr, you'll have to find its folder and then put the model (fv4) in here:
so, uvr's folder / models / mdx_net_models
( yes, you'll just cut n paste the model file in there )
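If you'd rather script the file move, here is a minimal sketch. It copies instead of cutting (swap in `shutil.move` for a true cut n paste), the folder name follows the chat and may be capitalised differently on your install, and the example file names are hypothetical.

```python
import shutil
from pathlib import Path

# Folder as described in the chat; the casing on disk may differ, so check your own UVR install.
MDX_SUBDIR = Path("models") / "mdx_net_models"


def install_uvr_model(model_file, uvr_dir):
    """Copy a downloaded model file into UVR's MDX-Net model folder and return the new path."""
    src = Path(model_file)
    if not src.is_file():
        raise FileNotFoundError(src)
    dest_dir = Path(uvr_dir) / MDX_SUBDIR
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(src, dest_dir / src.name))


if __name__ == "__main__":
    # Hypothetical paths; point these at your download and your UVR folder.
    print(install_uvr_model("Downloads/voc_fv4.ckpt", "C:/Program Files/Ultimate Vocal Remover"))
```

After copying, hit refresh in UVR so the model shows up in the list.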
As for the config file ( the yaml file )
Once you open the uvr and select the fv4 model:
I don't understand. Aren't models just pth files?
.pth is just a format of models used in pytorch
pth = pytorch
.ckpt is another format meaning a checkpoint
don't worry about that part
Oh god, this is too much for my 12 y.o brain
well.. If you wanna hop into vc, I might help you if you screenshare
Other than that, you gotta follow the text instructions
That'll just make things worse
I'll just follow your tutorial from what you sent above
If I get stuck, I'll just ping you
sounds good
anyway, back to this.
Once you select the fv4, you'll be prompted to configure it or select the config, something along that
like this
In the model type you'll have to select " mel band roformer " ( just not the v2 variant )
What should I do with yaml?
that part is for the first box
the " select model param "
there'll be an option to open / use the yaml config
and you'll use the one I sent you
Lastly, you'll be clicking ok (or was it apply) for all windows
And that's that from adding custom models
Now as for configuring the uvr for usage
(do it after properly setting up the fv4 model)
you could also do this:
so, setting the wav type to 32 bit float
( in case you'll be using uvr for making datasets / samples for model training, else you can keep it as 16 bit )
Oh yea, in here you can use 11 or 16, I'd recommend 16 tho
Now... if you need an authentic guide for uvr, models n stuff..
https://docs.google.com/document/d/17fjNvJzj8ZGSer7c7OFe_CNfUKbAxEh_OBv94ZdRG5c/edit?tab=t.0
But I warn you, it's a messy spaghetti 👀
edit 04.03.25 deton24’s Instrumental and vocal & stems separation & mastering (UVR 5 GUI: VR/MDX-Net/MDX23C/Demucs 1-4, and BS/Mel-Roformer in beta MVSEP-MDX23-Colab/KaraFan/drumsep/LarsNet/SCNet x-minus.pro (uvronline.app)/mvsep.com/ GSEP/Dango.ai/Audioshake/Music.ai) General reading advice | D...
What's this? Wdym spaghetti?
Open the website up and you'll understand what I mean : )
as for what it is, well, exactly what it says
Documentary, just docs on uvr and models
Aren't the steps you sent above before enough?
I mean ye, it's enough, but if you wanna know more about everything, that's the place
Oh ok
mh mh
I'm here. Which one I should click?
I did, but it's confusing, looks like it's not in order
👀 because someone cut into my msgs making it more confusing than it should be.
Model type: mel-band roformer
model param box: from yaml config / config file or something along that
In Select Model Param, which one I should click?
Or is that where I put yaml?
After I download yaml file, where I should put it?
In what folder?
send ss
the model param box, open it and show an ss of it
pro tip, shift+win+s lets you select the area to make a screenshot out of / crop
then ctrl+v in here
well, taking you long
Don't want to sound rude but, can't be staying here for a whole day 🙄 got stuff to do
When I do that, it keeps going back to before I clicked Select Model Param
Ok sorry. You don't have to reply it now
Ok
then ye, set the type as mel-band roformer ( again, not v2 ) and confirm all
and that's all
What yaml I should install?
One I provided
Oh ok
this
and then?
Unless the window disappeared then that's fine
Either way, once you close all the windows ( if any ) you'll have the main ui
Save config and close?
ye
Then how about the model type?
Done. And then?
got the main ui?
This?
On it
as for 1 and 2
you can drag n drop your music / song onto 1 area
and you can also drag some empty folder ( for instance, if you wanted it for outputs ) into 2 area
much quicker than doing it the other way
Do I have to turn all of these on in order?
well no
Yes
I installed the patch you mentioned the same way as the full version
Is it right?
why asking
something doesn't work?
both uvr and patch is installed the same way as anything else that is an installer
Just making sure I didn't miss any steps
you'll know if you did all right once u run an isolation
If you get no errors, means you did all right
It said no errors in green
ye, then it works
But why does it take so long?
Is it because I want to make it in flac?
Uhhhh, your gpu?
Aside, I never said it'll be fast
it takes time, esp at overlap 11 or 16
30
1-3 mins is pretty normal
for most songs
for instance, a ~3 min song takes 1.2 to 1.5 mins or something for me (rtx 3060)
so there's nothing wrong about it
Aight, I take it maybe because I want it in flac
@glacial pollen So, the result will be vocal and instrument?
overlap 8 or lower is faster but 16 has optimal quality consistency (let's consider it like ultra quality in game graphics settings)
No, vocal
fv4 is a vocal model, haven't tested it in instru mode so can't promise anything
and if you need a model that yeets the backing vocals, try this:
it is downloaded from this section
you download it, then hit refresh list and done. It'll appear in the model list
Does applio Kaggle use the latest code?
Not sure
I don't maintain kaggle or colabs so, can't help sadly
by default no but you can easily change it to use the main branch instead
there's also the melroformer karaoke model by aufr33, which can also be a decent alternative
Oh, will have to check it out
No I just want to separate vocal and instrument
And then put the vocal in Applio
And then put the ai covered vocal to instrumental back together again
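That last step (putting the converted vocal back over the instrumental) can also be done with ffmpeg's `amix` filter instead of a DAW. A sketch that builds the command and only runs it when you call `mix`; the file names are hypothetical, and `amix` can change the overall level, so you may still want to adjust volumes afterwards.

```python
import subprocess


def build_mix_command(vocal, instrumental, output):
    """Build an ffmpeg command that mixes two audio files into one, keeping the longer length."""
    return [
        "ffmpeg", "-y",
        "-i", str(vocal),
        "-i", str(instrumental),
        "-filter_complex", "amix=inputs=2:duration=longest",
        str(output),
    ]


def mix(vocal, instrumental, output):
    """Run the mix; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_mix_command(vocal, instrumental, output), check=True)


if __name__ == "__main__":
    print(" ".join(build_mix_command("vocals_converted.wav", "instrumental.wav", "cover.wav")))
```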
karaoke 2 is super nice and, tbf, not sure how it compares to classic bve (been ages)
but kara 2 has an awful noise
so, maybe the one u pointed out is a lil better
What abt the applio colab
For instru there are probs better models
check the docs
Wdym?
ctrl+f and type in instru or instrumental
Exactly what I said. For instrumentals there are probably better models
because fv4
is a voc type model
it was made to handle vocals, mainly (or rather, is mostly good at vocals and I can't promise anything on instrumentals' quality. Haven't tried it for that)
I thought the character model
🤔
Voice model I mean
even after you separate using a vocal model like gabox fv4, you should check if it contains some backing vocals/harmonies
prev-gen models were rather universal ish, mostly
Nowadays tho, more and more models are being made with a specific specialization in mind
Instrumentals, vocals, stems, sfx, backing vocals / harmonies
and some diverge into having specific properties, such as fullness of vocals, or noise, bleedless etc
In this case, fv4 is primarily a voice / vocal model
If you need more details and explanations / overviews, the doc I sent you has all of that.
All you need to do is search for a keyword ( ctrl + f ) and have some read
can someone help me
inference doesnt work on Applio
i put in my audio and convert it and it just gives out a blank voice recording
send ss of the console log
Also, which applio, newest one? or precompiled package
newest applio
check if the file really exists in that directory, or if not sure, convert it to wav first
tbh it should still work on files with spaces and, to some extent, kanji chars
well, supposedly ye
but most of the time, when a "file doesn't exist" sort of issue happens, it's either the naming, a corrupted file, the path, or a lack of ffmpeg
Can you check on some other random audio first?
gotta rule out the file itself being the issue
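The checks listed above (naming, corrupted file, path, ffmpeg) can be run as a quick script before blaming Applio. A rough sketch: the helper name is hypothetical, "empty file" is only a crude stand-in for "corrupted", and flagging non-ASCII names is a hint to try renaming, not a confirmed cause.

```python
import shutil
from pathlib import Path


def diagnose_input(audio_path, applio_dir=None):
    """Return a list of likely problems with an input audio file (empty list = nothing obvious)."""
    problems = []
    path = Path(audio_path)
    if not path.is_file():
        problems.append("file not found at this path")
    elif path.stat().st_size == 0:
        problems.append("file is empty (possibly corrupted)")
    if not str(path).isascii():
        problems.append("path contains non-ASCII characters; try renaming")
    # ffmpeg counts as available if it's on PATH or sitting in the Applio folder.
    on_path = shutil.which("ffmpeg") is not None
    in_applio = applio_dir is not None and any(Path(applio_dir).glob("ffmpeg*"))
    if not (on_path or in_applio):
        problems.append("ffmpeg not found on PATH or in the Applio folder")
    return problems


if __name__ == "__main__":
    # Hypothetical paths.
    print(diagnose_input("assets/audios/test.wav", applio_dir="."))
```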
i will try
Where can I get UVR-MDX-NET Karaoke 2?
i tried on different audio
still the same thing
just gives out a blank recording
also same error in the console
i also tried on different format
I showed it on screenshot
it is download from within the uvr
Check if you have the ffmpeg files in ur applio folder
these
i have both of those
Also, again, which applio you running
No. In my UVR5 only UVR-MDX-NET Inst HQ 5
well
this one
yeah it was a zip file
It is there, you gotta scroll down
(doesn't show in my case cause I already have it downloaded, the karaoke 2)
well, that's all weird
Normally people don't encounter such issues
nowadays at least
so what do i do lmao
well, I can propose checking my fork maybe
if you're up for it
but that'll result in some downloading 🤔
cause like, having ffmpeg, checking the path / naming, validating on other files
that's pretty much all there is to diagnosing the issue
is that the only way?
Imo the easiest
as it has to work 100%, my fork, no other way
that'd indicate an issue with something else, other than applio itself at least
Well, it'd end up on downloading stuff anyways
cause you'd have to get normal applio from the repo (not the package)
but while we're at it, imma recommend my fork cause why not ¯\_(ツ)_/¯
Else I'm out of ideas
so i cant really fix this
I did scroll, but it reached limit and no more showing
I have a hard time imagining what it'd be
Applio works within its own environment
it's independent from ur pc
Unless you're not using it that way 🤔 (for whatever reason)
you have env folder in applio?
That is surprising 🤔
considering ur anime pfp
It's japanese, the language
Anyways, did it work?
or nah
yeah it worked
Then that's the case
it gave out a recording actually
as I said before
oh
recording?
wdym
It didn't do the inference? Did it output the input file?
yeah it did the inference
So which one I should use to yeet the background vocals?
Karaoke 2 or Inst HQ 4?
^
-# I mean not everyone who watches anime knows japanese though
i would just record a voice message of my voice in voice recorder and just put it in the audios folder and thats it
well
still that ffmpeg error in the recording file? like I said try converting to wav in audacity
ok
if that fails, then those mp3s are being screwed up in some way
or the precompiled applio had issues with mp3s? (doubt it, but nothing surprises me anymore)
ill try to rename it to test then
@glacial pollen sir
yeah i just tried them out on my voice recordings again but doesnt work lol
try hq4
karaoke 2 is for backing vocals
fv4 is for vocals
hq4 for instrus or really any other the docs recommend or mention
also the files are different for some reason
bruh
I mean, it's just metadata so
you can always use ffmpeg to convert it, maybe could help
you got ffmpeg installed and added to path, on ur pc?
or idk, take the file to the same location you have the ffmpeg in
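Converting with ffmpeg directly could look like this. A sketch that re-encodes to WAV and drops metadata with `-map_metadata -1` (in case the differing metadata is what trips things up, which is unconfirmed); passing the full path to the ffmpeg.exe in your Applio folder avoids needing it on PATH. Function and file names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_wav_command(src, ffmpeg="ffmpeg"):
    """Build an ffmpeg command that re-encodes `src` to a WAV file next to it, without metadata.

    `ffmpeg` can be a full path, e.g. to the ffmpeg.exe inside the Applio folder.
    """
    src = Path(src)
    dst = src.with_suffix(".wav")
    if dst == src:
        # Input is already .wav: write to a new name instead of overwriting it.
        dst = src.with_name(src.stem + "_converted.wav")
    return [str(ffmpeg), "-y", "-i", str(src), "-map_metadata", "-1", str(dst)]


def convert_to_wav(src, ffmpeg="ffmpeg"):
    """Run the conversion and return the output path; raises CalledProcessError on failure."""
    cmd = build_wav_command(src, ffmpeg)
    subprocess.run(cmd, check=True)
    return cmd[-1]


if __name__ == "__main__":
    print(" ".join(build_wav_command("voice_message.mp3")))
```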
i just have it in the applio folder
thank uuu aloooot 🫂