#✨│ai-help
@simple ore I dont see these options on mine
also, whenever I try to change the sd vae to ae, it changes back
as in?
yes, but it also needs the sd_text_encoder in the options
it wont let me switch
see here #✨│ai-help message
also, looking it up, I have not found any info on automatic1111 being compatible. Everyone says to use forge or comfyui, but I can't find any tutorial online for setting it up on a1111
I changed it to that already but also still no text encoder option
like I cant find this in the list of options
can you verify you are indeed using a1111 and not forge? a1111 has been abandoned from what I read so far with zero updates
as of 2024
furthermore, it said flux is incompatible with a1111 and had no plan to update
can anyone help me, this keeps popping up when I try to use the MMVCServerSIO in the folder, I am on a m1 silicon mac
i switched to a updated os and it worked
but now for some reason when i try to select an output and monitor for the audio inside the voice changer, nothing shows up
idk how to use it
ohh ok thanks
it works
@simple ore So I just downloaded this, used the default what it comes with. Already having issues
SD.next, never heard of it until today
I have yet to touch anything
like it just finished downloading
I'm doing my usual prompt test for flux, 1024/1024
photograph of a red apple on wooden table, red apple, wooden table, high quality, professional photograph, dark background,
RVC v2 disconnected is working bro!
anyway, if you have issues with sd.next, follow the link from their github page to discord
there's a good community there
Hi, can someone help me? I know how to look for AI cover models, but I don't have the link to apply the voice to an audio file. I used Google Colab; has anything changed? Does anyone have a new link? Please answer.
finally
Did a new klm model drop or am I seeing things? Cause I saw that the thread was updated
yes
Where can I find it? There’s no links
And it uses the spin embedder, right? Can I also just use Applio, or did he use codename's fork?
if you use a custom embedder option you can use any version
or applio exp branch
or the fork
-colab
Google Colab is a Cloud (Remote Good PC) Service. While the Free plan provides up to 12 hours of daily usage, the GPU is typically available for only about 4 hours each day on average.
by IA Hispano
Google Colab
by Hina
Google Colab
by Eddy
Google Colab
by Eddy
Google Colab
by Deiteris & Hina
Google Colab
by Shiro & Eddy
Google Colab
by Nick088
Google Colab
by Nick088
Google Colab
by Jarredou & Makidanye
Google Colab
I need a model to separate the drums of a song
Just the drums
Does anyone knows how to?
-uvr
Yeah, but is it the spin embedder? Like, what did he train it with is what I mean?
I mean the RVC version by w-okada
RVC and w-okada are different things. W-okada is a real-time voice changer that uses RVC as its base.
Last update: May 5, 2025
I used the Colab version of AudioSep.
https://github.com/Audio-AGI/AudioSep
https://colab.research.google.com/github/Audio-AGI/AudioSep/blob/main/AudioSep_Colab.ipynb
But I've got an error.
Is there an active maintained version out there?
import torch
from pipeline import build_audiosep, separate_audio

device = torch.device('cpu')

# NOTE: `models` is not defined in this snippet; it has to come from earlier code
model = build_audiosep(
    config_yaml='config/audiosep_base.yaml',
    checkpoint_path=str(models[0][1]),
    device=device)

audio_file = 'zand.wav'
text = 'water drops'
output_file = 'separated_audio.wav'

# AudioSep processes the audio at 32 kHz sampling rate
separate_audio(model, audio_file, text, output_file, device)
yes, spin7-12 layers
depending on the version of torch, it is either a warning, or it simply did not load the weights at all. All torch.load calls need to be provided with weights_only=True
i use firefox how to fix this
everytime i choose my microphone on client that pops up
if you're running it locally, use server mode
or change the mic settings in the sound control panel
i don't want to use server because i think client is better
i tried this on many browsers but chrome gives best quality, i cant test it on firefox because of that error
server mode allows using WASAPI devices which have less latency than MME devices that are used in client mode
beside that the built-in noise suppression (Sup2) can only be used in client mode
hi was wondering what crossfade setting u guys usin, or just using the default?
the default
oh thanks, btw i was wondering. So i just reinstalled my windows because of certain problems. For some reason my vc doesnt sound like it used to
got any advice on what i could do?
i feel like the voice seems to abruptly end on a sentence more so than usual
increase chunk settings according to gpu performance
which protocol is better, sio or rest? whats the difference anyway
Protocol: rest (Use SIO if you want less delay but if you encounter any issues with SIO switch back to rest. Rest has slightly more delay than SIO)
oh i see, thanks for information
you dont need to rename it.. usually a proper .index is included, regardless of what it is actually named
because my model will get rejected
not really, we know Applio names it YourModel.index
yes
and it works in any rvc applications
very old apps called index files like 'added_IVF1406_Flat_nprobe_1_modelname_v2.index' for no good reason
the mainline and older versions name it like that, also spawns total_fea.npy and trained_*.index
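To illustrate the naming point above, here is a minimal sketch of picking the usable index out of a model folder regardless of what it is called. The helper name `pick_index_file` and the rule of preferring any other `.index` over a `trained_*.index` file are assumptions made for illustration, not how any particular RVC app does it.

```python
from pathlib import PurePath


def pick_index_file(filenames):
    """Return the most likely usable .index file name, or None.

    Assumption: files named trained_*.index are intermediate leftovers,
    so any other .index file is preferred over them.
    """
    index_files = [f for f in filenames if PurePath(f).suffix == ".index"]
    preferred = [
        f for f in index_files
        if not PurePath(f).name.startswith("trained_")
    ]
    candidates = preferred or index_files
    return candidates[0] if candidates else None


# File names taken from the messages above, plus a hypothetical .pth
files = [
    "total_fea.npy",
    "trained_IVF1406_Flat_nprobe_1_modelname_v2.index",
    "added_IVF1406_Flat_nprobe_1_modelname_v2.index",
    "modelname.pth",
]
print(pick_index_file(files))
```

The same function also accepts a folder that only holds `YourModel.index`, which is the Applio-style name mentioned above.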
Read this guide first before making models
Last update: May 5, 2025
I can't understand this document. I want to find a current video and download it locally. When I tried it on the cloud before, there were problems like disconnect errors.
-colab
I was thinking of getting a dedicated gpu for the voice changer, does anyone think a 5060 Ti would be good to handle that?
how to make voice changer sound not robotic
you can even get something cheaper if you wanna use a dedicated card
What GPU were you thinking?
what's your main gpu?
3090 Ti
you can find like 3050 used for $100-150
just need to make sure you have a free 8x pcie slot and 2 slots for the card itself
Hmm, that can work. I can try to find one then
hi whats the latest fork or program for rvc gui
i tried to use rvc ai cover maker and that its not sounding as good as rvc gui
so its better to move on server mode?
I need to make a Russian girl's voice out of my male one. which model should I use and how should I configure it? I use AI-Voice Changer
My command prompt closes automatically after opening start.bat anyone know a fix? I've downloaded PyTorch and Python
What you are using?
Windows 11 I have an intel gpu
Applio, w-okada?
w okada
Have you downloaded correct version from guide?
-realtime
Guides for Programs that use RVC Models in Realtime for Calls/Games
Most suggested. GUIDE
ONLY the latest alpha comes close to the Deiteris Fork performance, older versions in youtube tuts are way worse. GUIDE
Unavailable, the guide is outdated and the program is worse compared to the ones above, and much less updated
thank you
do i delete the one i have rn
and install this?
Probably if it is wrong version
okay thank you
what do i do with the file you gave me
@latent kettle
do i extract
or like
what do i do
Yes extract it
OMG ITS DOWNLOADING THANK YOU SO MUCH
youre the best @latent kettle
appreiciate it
Yw glad to help you 
@latent kettle one last question, how do i add a voice to the voice changer. I can't find a tutorial on YouTube to help me because the tutorials show w okada via app and im on a website
See the edit button. Click on it and add model to any slot
@undone abyss
pth files goes to model and index goes to index
tyty
alr thank ypu
i appreiciate both of you guys' help
no problem ^^
my dms are always open if anyone has questions about setting up the okada voice changer
@viral mason I just dmed you and discord took away my permissions to dm, do you know why?
idk did you block me?
I couldn't message you either
i think someone went on my account and blocked ppl
i reset my cookie'
idk what happened it was weird
i texted you
and then discord logged me out of everything the second i did
luckily i reset the cookie in time before stuff happened
but that was rlly weird
was it bc i downloaded this?
okay ty
Voice changer for intel gpu
I mean w-okada for intel
I can't send pictures
The one you was using before, it was designed for Nvidia
I'm gonna hang myself
thats so weird
donttt
ahhh
that makes sense
Yep
I need image perms?
Delete that
or somthing
Nop
wtf is wrong with discord rn then
discord being weird
Idk. I'm able to send images. We have Same roles
ugh
that's normal
Congratulations 🎊
@inner pivot
top one?
Yes. Just scroll down to "Settings Explained"
i cant reopen MMVCServerSIO, it says failed to execute script 'client' due to unhandled exception
nvm i got it
ill look at setting explanation now @latent kettle
Good luck
thank you
good luck ^^
I'll try helping if I can but I'm about to burst into a ball of angry juice
yeah, i feel you. voice changer is lowkey complicated. and idk why you cant send images 😭
thank uu
same, kaggle showing firebase problem
colab dead now kaggle ..nice
yeah that's the firebase issue
it loads for me too but won't work
no unfortunately until the creators themselves fix the issue
do they fix it quickly usually orrrr
bc I kinda need to keep training the model I'm working on
u need to try after few hours cause that issue is firebase...so kaggle might be doing it
😔
see if it's fixed after few hours or wait for the notebook creators fr a bettr solution
u can always train locally if u have gd hardware
I don't like local training, it messed with my vr
i only use local fr inference training takes toll on my pc
that's fair
kaggle was having issues like not too long ago was it not?
is someone attacking them or
what
gotta wait b4 getting GPU
u can get free immediate only CPU tho
i'm waiting too
going slow 😦
its stuck like this, is this colab broken? i used this 1 hour ago but now
which colab is this
Finalllyyyy
is https://applio.org/ the thing to use to make models??
How do you make models??
it depends on what u want
-rt
can you Dm?
I'm about to sleep tbh. U just need to download the version according to u GPU
Do it after removing instruments
Hey there, is the 9070XT capable of running inference or training on windows? Ive been trying to get it running on Applio with Zluda with no luck. Or should I just give up? lol
Reverb, then backvocals and finally denoise
there is a fresh experimental build of native pytorch for rocm that supports 9070 on windows, zluda is not required
unfortunately by default applio install comes with python 3.10 and the rock requires python 3.11 or 3.12 (applio does not work with 3.12)
to use zluda you can see https://github.com/IAHispano/Applio/issues/1005
sweet
to use python 3.11 with Applio you basically need to nuke env folder and make a fresh venv using separately installed python3.11 and use pip install -r requirements.txt
without conda
or maybe change the install script to use python3.11 conda
that's another possibility
once applio is installed, install the experimental wheels
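Before nuking the env folder, it may help to confirm which interpreter the new venv would actually use. This is a minimal sketch based only on the version constraints stated in the messages above (the experimental ROCm wheels are said to need 3.11 or 3.12, and Applio is said not to work with 3.12); the function name is hypothetical.

```python
import sys


def python_ok_for_rocm_applio(version_info=sys.version_info):
    """Return True only for Python 3.11.

    Assumption taken from the chat: 3.10 is too old for the experimental
    wheels and 3.12 does not work with Applio, which leaves 3.11.
    """
    return (version_info[0], version_info[1]) == (3, 11)


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    if python_ok_for_rocm_applio():
        print(f"Python {major}.{minor}: fine for this setup")
    else:
        print(f"Python {major}.{minor}: make the venv with a separate 3.11 install")
```

Run it with the same python executable you plan to create the venv from.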
i've tried this but still no luck
"no luck" does not give me anything to investigate
ah sorry, I'll try the 3.11 method that u mentioned when i get some time, I'm still new to all this so thank you for the heads up
I have not tried the rock, you may need some beta of HIP SDK 6.5 that is not available yet.... never mind, should work with 6.2.4
but the method from the applio ticket should work
oh okay, so it is possible its just me missing something...
I'll give it a go, thanks for the help
I tried to download it yesterday but I found it very confusing and it gave an error, can anyone help me?
Hii
Which GPU do you have
RTX 3060
Okay
-realtime
What I'm confused about is that there is a MB and a 2GB version
Just download this or is there more?
After that you need virtual cable.
Virtual cable to connect your games or discord with Voice Changer
After downloading this, where do I click?
Do You want real-time voice changer?
Yes
After I extract, where do I click?
After the download, you extract the zip file. You open the folders until you see an exe application called MMVCServerSIO and run that.
Maybe Okay
downloaded and I already extracted it and clicked on the exe
and now?
link?
Is it normal to say that there is a virus?
I think it's better to ignore it
Okay start and give mic permissions
I haven't closed it yet
I already added the voice, how do I configure it so that the voice is synchronized?
Synchronized with what?
donwloand
After installing the Virtual Cable, it changes your default audio system. Click Yes when it asks you to open the audio device settings (or press WIN+R, type "mmsys.cpl" if you closed it already), and change your Recording and Playback devices back to your usual devices. Same for the communications device as well (right click -> set as default communication device)
You must select virtual cable as output in voice changer and your mic as input in voice changer
like this?
Yep
.
Do I put the check on the microphone or on "line 1?"
Check ?
I got a question about w okada, or well, AI voice changers in general (hopefully this is the right channel). Is there a software, app, or a general way to let the voice changer pick up my voice while my IRL surroundings play like normal? For example, I use the voice in a vc and there's a knock on my door: people can hear the knock from my mic, but also the AI voice.
Hey guys I need someone to guide me to properly train a model, I'm a bit familiar with the process but I can still use some help
leave it as default
tes
An image of god in space
https://applio.org/ is this the site for making models
how to change or fix delay?
try batch 8
yeah the description of the batch size is wrong
the number depends in how big the dataset is
since u got 1 hour i'd recommend 8 or 16, with 8 being safer here
vram is just a constraint, obviously not affecting the results
20 minutes or less = batch 4
30 minutes and above = 8
not only that, it also depends on the dataset diversity
if it's too diverse, you might better split and train separate datasets
?
why lol
not really, i got a very monotone 5 hour dataset, batch 16 was worse than batch 32
it's still okay, unless you have multiple sources having different quality
the safe bet is to use a single source, or try normalizing each source
the only really bad thing for datasets is inconsistent quality and whispering
8, 6 is too low for 1 hour
i train big datasets
so i kinda know what it's best for them
^ repeated words, very monotone speech, no expressions
the thing is, I tried separating screaming parts for trying on metal vocals
just use 8
for anything above 2 hours you may wanna try batch 16
and for 5 hours 32 is good
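The rules of thumb from this exchange can be sketched as a lookup. This is only a summary of the numbers quoted above, not an official Applio recommendation: the function name is hypothetical, the 20 to 30 minute gap is not covered in the chat (batch 4 is assumed there), and other people in the thread report different results depending on dataset diversity.

```python
def suggest_batch_size(dataset_minutes):
    """Map dataset length in minutes to the batch size suggested in chat."""
    if dataset_minutes < 0:
        raise ValueError("dataset length cannot be negative")
    if dataset_minutes >= 300:   # "for 5 hours 32 is good"
        return 32
    if dataset_minutes > 120:    # "above 2 hours you may wanna try batch 16"
        return 16
    if dataset_minutes >= 30:    # "30 minutes and above = 8"
        return 8
    return 4                     # "20 minutes or less = batch 4" (20-30 assumed)


for minutes in (13, 60, 150, 300):
    print(minutes, "min ->", suggest_batch_size(minutes))
```

VRAM still caps what you can actually run, so treat the result as a starting point.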
the more data u add, the more realistic the output, just sayin
i already told u singing is not the way to test stuff
so you tried bs 32 with checkpointing on a 16 GB gpu?
no checkpointing, bf16, 24gb vram gpu
yeah basically
another tip for best results: use spin, single scale loss
not everyone expect ideal results for that, if you can combine inference results to remove robotic sounds, it just works
I thought tf32/fp32 are preferrable to it
but well bf16 as well as fp16 allow using AMP
so applio has this new branch named f0_spin, it introduced two game-changing things: a new embedder, and they brought back the original rvc's way to calculate mel
spin handles breaths better than cvec (the default embedder)
back then the applio dev added a new way to calculate mel named multi-scale, which is great but not intended to be used in rvc/hifigan, and it was found to add ringing to the models. because of that, single scale was brought back; using it should give you a model with very little ringing or none at all
bf16 works just fine
you need this pretrain in order to use spin: https://huggingface.co/Aznamir/spin/blob/main/f0G32k_spin7-12.pth https://huggingface.co/Aznamir/spin/blob/main/f0D32k_spin7-12.pth (download g and d, and place them inside the "custom pretraineds" folder)
rvc > train > train.py
multiscale_mel_loss = False
disabling multiscale mel slightly reduces the vocal range of your model so remember that in case you wanna train a singing dataset
rvc > models > pretraineds > custom
i forgot it wasn't named custom pretraineds anymore lmao
if this is too complicated for u, you can just ignore it anyway
yeah actually it's somewhat easy, finetuning in rvc is not really a hard task
uh weird, redownload the zip
forgot to mention this
d_ste_per_g_step = 2
rvc's discriminator is pretty piss, this was added to make it less bad
should give a less robotic model and better breaths
website? you mean colab?
dont run it as admin
just double click it
looking forward to seeing it come to the stable release
rn I'm still rather conservative against the new stuffs including spin and things like that, as my current one "just works"
and I don't think you should recommend it to commoners, yet
if they dont like it they can use the old stuff
just because u dont like it doesnt means is bad
it's actually a great update
literally what rvc-boss intended to do back then
a new embedder
no lmao that aint a virus relax
training is so light on the gpu
u can literally play games while training
xD
u gotta need python 3.10 or 3.11 tho
this more advanced approach do need a few gigs of space tho
why being paranoid of it if you have good cooling system
I assume you don't play with any overclocking yet, so it should be safe, even in the furmark stress test
did u open it as admin again?
well, have you done manual install with the latest torch and cuda 12.8?
the current compiled release one only works on RTX 40-series/older
i think they already added support for 50xx in the branch
its fine bro
ignore the error
librosa being cringe for no reason
open the url in ur browser
dis
well there u go
congrats u installed applio yay
🏆
lmaooo
enjoy super fast training speeds now
yea
just train locally, it's better
kaggle is piss bad
u want a tutorial on how to use applio locally
?
ok so u did the two steps, applio is installed
download the pretrain i gave u, place where i told u to place it
place ur datasets inside the assets > datasets folder (or you can do it somewhere else, it doesnt matter lol)
at this point you might better sell your 5090 to any folks wanting it so bad and knowing what to do
that was way overpriced, recently it has quite dropped
then manually place the location of the dataset like this
yeah
use auto slice if you haven't truncated the silence of your dataset
yea but like
your audio still has silence
rvc kinda hates that
so use auto slicer
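For context on what slicing out silence means, here is a minimal amplitude-threshold sketch. It is not Applio's auto slicer: the function name `split_on_silence`, the threshold and the minimum silence length are made up for illustration, and real slicers typically measure loudness over windows rather than single samples.

```python
def split_on_silence(samples, threshold=0.01, min_silence=3):
    """Return (start, end) index pairs of non-silent stretches.

    A run of at least `min_silence` consecutive samples whose absolute
    value is below `threshold` counts as silence and ends the segment.
    `end` is exclusive, so samples[start:end] is the kept audio.
    """
    segments = []
    start = None     # start index of the current non-silent segment
    quiet_run = 0    # length of the current run of quiet samples
    for i, s in enumerate(samples):
        if abs(s) < threshold:
            quiet_run += 1
            if start is not None and quiet_run >= min_silence:
                # close the segment at the point where the quiet run began
                segments.append((start, i - quiet_run + 1))
                start = None
        else:
            if start is None:
                start = i
            quiet_run = 0
    if start is not None:
        # trailing quiet samples shorter than min_silence are trimmed too
        segments.append((start, len(samples) - quiet_run))
    return segments


clip = [0, 0, 0.5, 0.4, 0, 0, 0, 0, 0.3, 0.2, 0]
print(split_on_silence(clip))
```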
yes
multiscale thing?
more natural results

if you want step by step guidance, this convo should be continued in a new thread in https://discord.com/channels/1159260121998827560/1192011222023950368
so like simple words
applio by default added a thing that boosts your dataset voice range at the cost of ringing (a static sound while singing high notes)
if u dont want that
u can disable it
inside your model's log folder
everything should be there, index, g, d, ur epochs
graphs
and finally, save every 10 epochs if u wanna save some disk space
nah just make a thread here > #1192011222023950368
anyone know how remilia bandxz makes his voice like that?
Erm actually wavlm may be better than spin (ignoring the breaths)
breaths are crucial for realtime 
Just don't breathe 

@left sentinel
how to de harmony a track with uvr5?
I have an applio error saying that no api found and I am using version 3.2.3
try getting the latest version 3.2.9
I installed the new version and it shows me the error file not found requirements.txt
yo, ive been tryna setup this ai shit for years on a shitty laptop that couldnt run all the downloads. Im back with an actual PC, someone want to help me set it up an explain how it works? im a dumbass and would prolly need help in a voicecall
@simple ore sorry to bother you but can you explain why this is happning like my g avg loss in decreasing but the G total loss is increasing.
dont use search, just expand loss_avg_50
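On why the averaged curve is the one to look at: assuming loss_avg_50 means the mean of the last 50 logged values (an assumption from the name), a rolling average smooths out step-to-step noise, so the raw total can tick up for a while even though the trend is still down. A minimal sketch, with a hypothetical function name:

```python
from collections import deque


def rolling_average(values, window=50):
    """Return the running mean over the last `window` values."""
    recent = deque(maxlen=window)  # old values drop out automatically
    averages = []
    for v in values:
        recent.append(v)
        averages.append(sum(recent) / len(recent))
    return averages


# Made-up noisy losses, only to show the smoothing effect
raw = [30.0, 34.0, 29.0, 33.0, 28.0, 32.0, 27.0, 31.0]
print(rolling_average(raw, window=4))
```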
what batch size?
13 minutes of dataset and 4 batch size
why isn't mainline working
chat i am looking at the available pitch extraction and i asked gpt deepresearch to figure out what's better for realistic m2f voice conversion and it says crepe-full is better but it gives it more delay - and i notice the deiteris fork and i think wokada also have crepe-full without onnx so i assume its gpu bound
I have a really good gpu and i only have to run the voice changer, so no games and stuff, so is crepe-full better than rmvpe for pitch extraction if i can afford to run it?
help, Why only works Beatrice, how to make it work and rvc
Guys from your experience , i tried many models , can anyone suggest me a model that cannot look like its an AI talking ? i tried many couldnt find the perfect one
only one installed when launched, the other one doesn't download, I don't know why when the program starts
ye the og wokada is good for beatrice only
better get one of these:
https://docs.aihub.gg/rvc-voice-changer/local/deiteris-w-okada-fork/
https://docs.aihub.gg/rvc-voice-changer/local/vonovox/ (Nvidia support only yet)
Last update: May 5, 2025
Last update: June 2, 2025
I mean, everything worked for me before, after resetting Windows, when I start HTTP, only Beatrice is installed, without RVC, and nothing works
maybe it is possible to download RVC separately and everything that should be downloaded automatically
hey guys with experience, what is better, deiteris w-okada or vonovox? in terms of delay and quality of output
hi Could you let me know what the most current/latest version is right now? Please send me the link
Can anyone tell me how to add beats to an acoustic song using ai 😢
Does anyone know of a model for cloning a very expressive voice with a lot of vocal voice! NOT LIKE ANUEL AA, BAD BUNNY. They don't have a voice! SOMETHING FOR MANELE FOR AN ARTIST NAMED FLORIN SALAM! HELP! I NEED SOMETHING BETTER THAN HIFI-GAN AND MORE REALISTIC WHEN CLONING SINGING VOICES
elaborate:
- ur pc gpu
- what u want to do (pre-record or realtime)
also, remind yourself that it depends on how the model was trained, and you need to kinda voice act yourself
of what? what do you want to do?
what's ur pc gpu?
kinda similar, heard @crude flame say it was slightly better for nvidia gpus in terms of delay
elaborate:
- ur pc gpu
- what u want to do
- what u exactly mean
I have a 1070 Nvidia, I want something better than Hifi-gan.. my cloned voices with this method are good but I want something better that feels more real when cloning these expressive voices, look for FLORIN SALAM AND YOU WILL SEE THAT HE HAS A LARGE VOICE! not like Anuel AA or Bad Bunny
I love how my phone autocorrected vonovox to bonobo. Anyway yeah vonovox is better in delay and same quality
a gtx 1070 isnt that great, are you looking to train models or use them in realtime?
want something better than Hifi-gan
RVC has limits, it can't 100% pass for human, for example it sucks at laughing
Vits1 problem not hifi
I'm looking to train models! Better than Hifi-Gan! I train singing artist models.
I have applio rvc wit HIFI-GAN
5600 and rtx3060
RVC has limits, all you can do is try to train better and do voice acting
download what? what do you want to do? also wdym with 5600?
ryzen 5600 cpu
And how do I train better? Or tell me exactly what to do to improve as much as possible.
voice changer
also i use at2020 and scarlet solo 4th
realtime for calls right?
yes
you can try checking the suggestions in the docs https://docs.aihub.gg/, but as i said, you will NEVER get something that is 100% like a human
next time specify it, this isnt a voice changer server
we do general ai here
Ok thanks
@neat meadow RVC = Retrieval-based Voice Conversion, the best few-shot speech-to-speech AI models (on v2). It runs inference (uses models) on pre-recorded audio (AI covers) and trains (makes) models. Technically, Mainline RVC does have a go-realtime.bat (aka RVC-GUI), but it's pretty messy and outdated, so it's strongly not suggested for realtime.
Wokada = uses RVC for realtime inference. There's 2 main versions, Original made by Wok, and the most suggested one is Deiteris Fork (modified version)
what you'd need is wokada deiteris fork
Last update: May 5, 2025
yw
what is fork version? What's the difference from the main version?
thx
better quality and performance
fork version is better?
wokada deiteris fork b2332 is the latest, there's also another program which might slightly be better but only for nvidia gpus https://docs.aihub.gg/rvc-voice-changer/local/vonovox/ and i haven't tested it personally
Last update: June 2, 2025
yes
how to download fork version?
Last update: May 5, 2025
100% you should
share a screenshot of your current wokada so i can check the version, or just the folder name
appriciate
It corrects the accent of the model. Using it generates a lot of cpu usage
i like zero
idk formant Is it a new feature
you can decide how much index to do
higher value means more trained index is used, but can sound a bit more autotune
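One way to picture the index setting is as a mix between the features from your own voice and the ones looked up in the trained index. The sketch below uses a plain linear blend, which is an assumption for illustration; the function and argument names are hypothetical and the lists stand in for real feature vectors.

```python
def blend_features(original, retrieved, index_rate):
    """Mix retrieved index features into the original ones.

    index_rate = 0.0 keeps the original features untouched,
    index_rate = 1.0 uses only what came from the index.
    """
    if not 0.0 <= index_rate <= 1.0:
        raise ValueError("index_rate must be between 0 and 1")
    if len(original) != len(retrieved):
        raise ValueError("feature lists must have the same length")
    return [
        index_rate * r + (1.0 - index_rate) * o
        for o, r in zip(original, retrieved)
    ]


print(blend_features([1.0, 2.0], [3.0, 4.0], 0.5))
```

A higher rate pulls the output toward the trained voice, which matches the "more autotune" side effect described above.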
Can anyone help me with voicemeeter related issues?
I’m trying to bitcrush my rvc model
High risk of damage? That usually happens when your CPU overheats. It can happen when you overclock or your PC simply has a bad cooling system, regardless of any process running. It also depends on your PC's CPU and such.
Promoting your Discord server or stuff is not allowed in this server. It's obvious.
Mine is running on my cpu for a long time and theres been no problem.
What do i do? I cant take the bot outside tho. How do i find people to test it?

@crude flame Are you able to help my voicemeeter, codeman is asking if you can help me out
Do you have light host installed
Yes it’s mainly just voicemeeter
It’s like working with blender all over again 🥲
Then what's your problem
i cant hear anything when im running voicemeeter
and it says fader grain for all the sliders which doesnt match up with the picture
Like can't hear game audio can't hear the voice or can't hear anything at all
everything
no audio anywhere
Did you fix your output audio in Windows
Applio no UI is not working again
im not sure how to setup my windows audio since the guide doesnt show me that
and theres like 30 input output options
don't advertise please
what's ur pc gpu? whats wrong?
Select your headphones for Windows output
Android user
elaborate the issue
okay im going to attempt it
Wait it's Colab is the code broken again.
"Pkg_resources is deprecated as an API"
See...
Vedi
send a screenshot
!give-media-perms 1h @brittle wing
Ah no prob it's training through
still cant hear anything
So you have your windows output as your headphones and in voicemeeter you have your hardware out a1 as your headphones
yes exactly
Did you restart your PC after installing voicemeeter
On your virtual input try selecting a1 for both
Idk then, that always works for me
damn so much for bitcrushing
You could try setting voicemeeter input and aux as default comms device and default sound device
ill try it
that didnt work but i switched my stereo input 1 to A1 and it worked for some reason
Wild
so random
now i got to see if it even works with rvc
@crude flame and what would i set input and output as?
Input your mic
Output line 1
im not hearing my model
Put your headphones/speakers into monitor
it is
this isnt doing anything
no db
try restarting the whole client
okay
Did you click start
yeah i did click start
Did you select a voice
is it supposed to be picking up my desktop audio?
No
thats definitely a problem then
cause that bar is picking up like youtube
How do I decipher this?
someone else had told me you dont need to but dont take my word for it
It's your choice model will work with and without index
hey is there any way to make the w-okada voice changer more adaptable to the english accent?
like sometimes it pronounces words differently
Normal like everything's fine
how i can download applio?
Another Applio no UI error/issue while train
Cannot load file containing pickled data
-colab
Do you have any workaround for this?
Seems to be an issue with Numpy...yes
Does anyone know if Applio Colab is down? It stays here and doesn't advance: Starting backup loop... Files are up to date
Same I can't train
must have fallen
Korean
I know but which artist
Sooin (MEOVV) was going to train but it's no good applio haha
Ah okay so we're not training the same person lemme tag @simple ore for help
Applio no UI is unusable
Am I doing resuming wrong?
I don't like it, it's very complicated haha
It's easy 4 me
Currently I like the most up-to-date, that's why I use Applio UI.
No way, I have to wait to see if they can fix it.
Idk how to train there.
Colabs are broken again...
'Pkg_resources is deprecated as an API'
so far it is a warning
Which colab?
Applio no UI but with workaround
I tried the no workaround one too still the same outputs in code
what workaround?
The code fixes you suggested in winter
that was a long time ago, fixed the install like 3 times after that
So I shouldn't use the colab notebook I added this stuff on?
the link above installed just fine, no errors or warnings
I get the same errors here too
screenshot
Uh wait
How...
Warning?But my model doesn't want to train it gets stuck on "data not found?"
In what sense?I should do that before loading backup after switching accounts?
Im cinfused
Did you actually load the backup? Are all the files under logs/modelname?
my spidey sense is tingling
why you checking mute folder instead of your model's folder... which is not visible there
you said you used the URL i gave you, but did not load the backup?
Model folder is on drive
Uh I'm still installing.
but it is not in logs
😭
Don't forget about Applio UI too, please.
How I do that???
I know the copies are there
is this the applio ui or no ui?
Now ?
No UI one
(cries in UI) 😭
Done now ...?
Last cell?
Still getting the same warning @simple ore
That's the issue!
yes, it is warning
Look how do I fix that
ugh
"No data left in file"
Oki
run that, then your training
ImportError: numpy.core.multiarray failed to import
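This import error often shows up when a compiled package was built against a different NumPy than the one installed, so a first step is to see which versions are actually in the runtime. The sketch below only reads package metadata and does not import the packages themselves; the function name and the list of packages to check are assumptions.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(["numpy", "librosa", "torch"]).items():
    print(f"{name}: {version or 'not installed'}")
```

Pasting that output along with the error gives helpers something concrete to compare against a working install.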
Still can't train
Honestly
you're killing me bub
and i'm trying to watch a movie
I understand but just help me fix the error quickly cause I loaded backup & ran the LAST cell
Did everything over & over wasted 2 hours
i'm gonna check locally
Oki
now I'll check on colab
Pls do
after you ran 'install'
preprocess and extract features works fine with librosa 0.11.0
did you run 'set training variables' cell?
as I see everything works fine
connect to drive, clone, install, + extra cells for librosa, load models, load backup, set training variables (must use the same model name and dr), then training
that's only to hide those annoying warnings
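The cell being discussed was shared as a screenshot, so its exact contents are not known here. As an assumption, a cell that only hides the `pkg_resources is deprecated as an API` message could look like this sketch using the standard `warnings` module (`hide_pkg_resources_warning` is a hypothetical name):

```python
import warnings


def hide_pkg_resources_warning():
    """Ignore warnings whose text starts with the pkg_resources deprecation notice."""
    # 'message' is a regular expression matched, case-insensitively,
    # against the start of the warning text.
    warnings.filterwarnings(
        "ignore",
        message="pkg_resources is deprecated as an API",
    )


hide_pkg_resources_warning()
```

This only silences the message. It does not change how training behaves, so it will not fix a model that refuses to train.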
After install comes the new cell?
make a new cell, see above
Done
numpy.core.multiarray failed to import.
Ah wait, backup complete, files are up to date
New cell?
Training cell was stuck on "files are up to date"
I have no idea... maybe your backup is fked
In what sense?
That could be
How do I fix it.
I've tested restore from backup and resume training and it works for me.
Restore from backup...
im using w okada rn, and everytime i talk it gets super laggy and the ms spikes rlly high, anyone know a fix? or a way to reduce the lag
Hello! Im just curious as to how i could fix my issue. whenever i press start on a voice, it keeps saying “Frequent errors occur. Please check if the model of the framework being targeted is loaded.”
Hi, Applio UI won't load my datasets. Can you help me?
I’m new to voice-conversion and excited to explore RVC! I’ve read through the README and glanced at the code in model.py and inference.py, but I’m not sure where the “core” algorithm is implemented, and how all the pieces fit together.
What I’d love to know:
Which files or classes handle the feature extraction and model architecture?
Where is the training loop defined, and how do data preprocessing and postprocessing hook in?
Are there any papers, blog posts, or diagrams you recommend for a high-level overview?
Any in-code comments or tutorials aimed at beginners that I should read first?
I’m eager to learn and eventually contribute—thanks in advance for any guidance! 🙏
have you tried looking in extract.py, train.py, and preprocess.py for feature extraction, preprocessing, and the training code
I’ve reviewed the suggested modules, but my goal is to build an intuitive mental model before diving back into the code.
@simple ore for the klm v3 model, is it hifi gan?
why has every model become extremely high pitched even on 12 tune
i imagine klm v3 is old af
do yk why?
I mean the pretrain @tight ether made
the exp 3 pretrain for klm
its hifi right?
thank u
can someone help me
pitch +12 raises the input audio's pitch by 12 semitones, which is a full octave up
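To see why a setting of 12 sounds so high: in equal temperament each semitone multiplies the frequency by 2^(1/12), so 12 semitones doubles it. A small sketch (`pitch_ratio` is a hypothetical helper, not an RVC function):

```python
def pitch_ratio(semitones):
    """Frequency multiplier for a shift of the given number of semitones."""
    return 2.0 ** (semitones / 12.0)


# A 220 Hz voice shifted by +12 lands on 440 Hz; -12 would land on 110 Hz.
for shift in (-12, 0, 12):
    print(f"{shift:+d} semitones -> {220.0 * pitch_ratio(shift):.1f} Hz")
```

For a model whose voice is in the same range as the input, 0 is the usual starting point.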
@simple ore is there a way to enable fp16 on Applio
so i can have faster training with my 4090
faster is not better
4090 is fine crunching fp32
Figured it out anyway, need faster for the project Im doing, doesn't have to be perfect
fp32 is not slow
i have 55 hour set doing 24min/epoch on 4070 with fp32
with fp16 it would probably be 20?
yo guys, why does my changer take like 30 seconds to process, and even after it processes it cuts off and sounds horrible?
Do you? Using an index model in W-Okada is not really recommended, as doing so will use more resources to process, potentially reducing the overall performance.
fp16 has an exploding gradients issue; to mitigate it, switch to bf16 or fp32 (tf32)
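The range difference behind that advice can be shown without a GPU. fp16 tops out at 65504, so a larger gradient value overflows, while bf16 keeps fp32's exponent range and only loses precision. The sketch below uses the standard `struct` module; `fits_in_fp16` and `to_bf16` are hypothetical helpers, and `to_bf16` emulates bf16 by simple truncation of the fp32 bits rather than the rounding real hardware may use.

```python
import struct

FP16_MAX = 65504.0  # largest finite half-precision value


def fits_in_fp16(value):
    """True if value can be stored as a finite IEEE half-precision float."""
    try:
        struct.pack("<e", value)  # 'e' is the half-precision format code
        return True
    except OverflowError:
        return False


def to_bf16(value):
    """Emulate bf16 by keeping only the top 16 bits of the fp32 encoding."""
    top_two_bytes = struct.pack(">f", value)[:2]
    return struct.unpack(">f", top_two_bytes + b"\x00\x00")[0]


for v in (FP16_MAX, 70000.0):
    print(v, "fits in fp16:", fits_in_fp16(v), "| as bf16:", to_bf16(v))
```

The 70000.0 case overflows fp16 but survives in bf16 as a coarser value, which is the trade-off being described.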
so i set up my inputs and stuff with mic and cable input but cant hear anything and when i try to test in discord the yt video is playing in my mic test. anyone know whats wrong?
!howtoask
How To Troubleshoot 
- Don't simply mention your issue, like "my rvc is not working".
- Describe the step you are on, what you're trying to do, the RVC you're using, a screenshot, etc.
- The more context, the better.
- Don't be desperate. You can ping a Helper, but if they ignore, they aren't available/don't know the answer.
- It's okay if you're frustrated, but don't take it into this server.
- Don't DM without prior consent.
- Don't ask for every little instruction. Put your own effort & test things by yourself.
- Don't ask to ask.
- Check if your answer is a Google search away/on our guides website.
Which W-Okada version are you using? Are you using VB-Cable or Virtual Audio Cable? And what is your PC GPU?
it says v.1.5.3.18a, i am using VB-Cable and i have a NVIDIA GTX 1070
Download and use this better W-Okada instead. Yours is old and outdated. And make sure to try Virtual Audio Cable lite instead of VB-Cable. https://docs.aihub.gg/rvc-voice-changer/local/deiteris-w-okada-fork/#download-nvidia-on-windows
Last update: May 5, 2025
okay, thank you very much
how do i edit the deiteris fork on my windows pc? i install it but it fails to install faiss-gpu because its linux only and i cant even build this on gh actions to get a working source (no idea how to reproduce)
im trying to change the source code
Let me know if you have issue about delayed audio and low quality. 
Which version of the Deiteris W-Okada fork did you install? I don't think this fork would do that.
the latest b2332 for nvidia cuda on win
If you wanna modify fork W-Okada, there's a GitHub for that, if you know how to do so. https://github.com/deiteris/voice-changer
i think you're not understanding - i downloaded this and tried to start it from source
but python wont install faiss-gpu on win
However, there's a precompiled build in this guide, and that one doesn't need to be built from source. https://docs.aihub.gg/rvc-voice-changer/local/deiteris-w-okada-fork/#download-nvidia-on-windows
yeah thats not useful to me, i want to write my own better fork as its not being updated
but i cant work from source on win11 because faiss gpu doesnt exist for w11
Do you mean you want fork W-Okada to work on Linux? Sorry, but I still can't figure out what you want to do with this.
i want to edit the deiteris fork on windows
i want to edit the code for it
but its not possible to build a working version for gpu (i would explain why but itll just confuse us more)
So what makes you think b2332 is outdated? Sorry, I don't do coding, but this version is the most stable out there. Any attempt to upgrade one of the Python components can cause other components to conflict with each other. I've seen that happen with other Python-related programs.
i need to write postprocessing sfx, fix the wokada gui as its confusing in parts, remove outdated stuff like fcpe, and clean up the code
it has nothing to do with the quality itself
but if you cant help me because this is a programming question can you point me to someone who can?
Anyone with the "Engineer" role. But @wispy lodge is the author of the fork program.
so which one is better vonovox or deiteris fork?
Deiteris' W-Okada fork.
cool thx for the fast response
do any of you guys use voicemeeter or just use VAC lite and call it a day?
virtual audio cable
voicemeeter is lil tricky to route
but is it better?
What does this mean? You can use an index file in a "regular RVC program" (Applio, for example), but using an index in W-Okada is still not really recommended, as I said.
oh okay, i was reading the realism section of the docs and they recommend using voicemeeter
Some might use Voicemeeter as a second virtual line to Virtual Audio Cable lite, but that's it. VAC lite is still recommended, I use this one.
oh okayy thanks for the info
voicemeeter is only for audio routing, though there is lighthost as alternative
but ffs not the virtual cable
Whats lighthost?
the guide should have mentioned it
yo i need help with a voice changer
!howtoask
what f0 det do you use for tesla T4 gpu?
NVIDIA T4 in which website? Is it Kaggle or Google Colab? And which notebook are you running?
Also, read the "How To Troubleshoot" above before you ask anything here.
kaggle, im running https://www.kaggle.com/code/suneku/voice-changer-public
My current f0 works (rmvpe_onnx) but I was wondering if there's a better one
the guide doesnt say anything about t4 gpu
it is Nvidia gpu
The F0 Det on W-Okada, if you're using NVIDIA GPU there, always select regular "rmvpe". The rmvpe is for NVIDIA GPU, while rmvpe_onnx is for non-NVIDIA GPU (AMD and Intel).
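The rule of thumb above, written out as a tiny sketch. `pick_f0_detector` is a hypothetical helper, not something W-Okada exposes, and the vendor string is whatever you know about your own machine or cloud instance:

```python
def pick_f0_detector(gpu_vendor):
    """Apply the advice above: 'rmvpe' on NVIDIA, 'rmvpe_onnx' on anything else."""
    if gpu_vendor.strip().lower() == "nvidia":
        return "rmvpe"
    return "rmvpe_onnx"


# A Kaggle T4 is an NVIDIA card, so the regular detector applies.
print(pick_f0_detector("NVIDIA"))
print(pick_f0_detector("AMD"))
```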
the performance is around RTX 3050 but with 16 GB vram
kaggle has dual gpu but it will use a single gpu anyway
Oh I didn't know, thank you guys
I thought by "tesla" they meant like elon musks tesla and I was so confused
it is Nvidia Tesla lineup
NVIDIA Tesla GPUs and Tesla the company are two different things. Newer GPUs in that series are now called NVIDIA Data Center GPUs.
Almost every RVC model comes with an index file, but some provide only the pth file. Not really that surprising. An index file stores the accent of that voice model. It can be created during voice model training in an RVC program.
403 ERROR
The request could not be satisfied. for links lol.
I tried vpn it doesn't work.
so i have the app on my phone can i still do the contest?
What are good AI voice apps
Is it pre-recorded audio converter or realtime voice changer? There are plenty of AI voice apps available. One of them being RVC.
What are good RVCs?

I mostly use Weights.com for quick access to AI covers. But there's Applio, which is available both locally (PC) and online.
RVC refers to AI programs that can do voice conversion and voice model training, but as I said, there are many different programs for it, one of them being Applio.
use a better vpn
I still have the same issue as yesterday. Would you mind if I DM you my drive folder with the backups so you can see where the problem is?
NO DATA LEFT IN FILE...AGAIN
I CAN'T TRAIN MY MODEL
Should I start all over...
With another account
Do I need to explain it again?

If you don't remember what I said earlier, let me say it again: using an index in W-Okada is not recommended, as it makes it use more resources. You can use an index in a regular RVC program, but that's all.
the voice changer app is running a rvc model in realtime
rvc does not stand for realtime voice changer, they're two separate things. rvc originally only worked for local conversions and doesn't support realtime inside the webui
every rvc model is compatible with the .index files (yeah you can use any .index file with any model), although index files in realtime cause several issues and their usage in those conditions is not recommended, pick any .index file, set the index value to 0 and forget about its existence
i think w-okada forces you to select a index file but im not sure, regardless, setting the index value to 0 will disable the index
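Why an index value of 0 amounts to "no index" can be sketched with plain numbers, assuming the index value acts as a linear blend weight between the features retrieved from the .index file and the features computed from your input. That is an assumption about how RVC-style pipelines use it, and `blend_features` is a hypothetical helper:

```python
def blend_features(input_feats, index_feats, index_rate):
    """Linear blend: index_rate of the retrieved features, the rest from the input."""
    return [
        index_rate * retrieved + (1.0 - index_rate) * original
        for original, retrieved in zip(input_feats, index_feats)
    ]


input_feats = [1.0, 2.0]
index_feats = [5.0, 6.0]
print(blend_features(input_feats, index_feats, 0.0))  # index contributes nothing
print(blend_features(input_feats, index_feats, 1.0))  # index replaces the input features
```

Under that assumption, a rate of 0 leaves the input features untouched regardless of which .index file is selected.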
Anyone know if KLM is good for real time? 
It sounds so good in these sample

Spin is a breakthrough for me
og pretrain is better for speech
for spin i'd recommend noobies pretrain instead, but the grads are a bit high, not sure why
spin only ^ doesn't work with cvec
remember spin is still experimental, for a safer approach, use the original pretrain and contentvec
do u guys know where i can find how to create ai voice guide
what's ur pc gpu?
rx 6700 xt
hi, i wanted to use rvc to take an audio file and convert it to another audio file with an AI voice, idk how
You can:
- Locally (runs on your pc so the speed depends on that, you will have to set it up with the guides):
  - Applio (AMD Windows): A fork of RVC with some extra features like Applio TTS, kinda faster and simpler but same quality tho
  - Mainline (AMD Linux/Windows): The original RVC
- Cloud (remote good pc, easier and faster than ur PC but it's limited):
  - Weights.com: Partnered with AI Hub, lets u do them easily but u may be in a queue
  - Applio Colab: up to 4 hours of GPU daily, not guaranteed
Easiest possible: weights.com
Easiest cloud: Ilaria rvc zero
Easiest local: Applio
what's your pc gpu?
Ok
i made a environment folder inside coquis ai tts repo folder
and installed it
but how do i open it?
I don't see anything
Documentation on the github says to make a python script, but i wanted a GUI of the app
i don't wanna use the command console...
How do i open coquis ai tts?
hi
i have a question
where do you put these voice models in?
like what software
what's your pc gpu? what do you want to do?
is an rtx 4060 8gb good enough for training image models locally? weights queue is taking way too long
is it normal that the "Start" cell on Applio Kaggle takes a lot of time to even give me the links? its been like 10+ minutes and it still says "cell execution is queued".
now +30 minutes
which model is made to separate backing vocals from vocals?
Maybe my backup got messed up cause I tried to clone the main repository
Mel and Becruily karaoke. Also, DM me the song you want separated, I can do it for you using uvronline's backing vocals separator
thank you, the thing is i have quite a lot because im getting the vocals to create a model
Well you can use it yourself
you mean this one?
Well use that model on mvsep
cause i tried it and it gave just vocals and instrumentals(which were empty since i have only vocals already)
Yes but use big beta 6x by unwa on the music source separation colab for Acapella first.
i have acapella already
Use the model on the Acapella
Yes
thank you 
btw i did the acapella using the thing from kimberleyjsn, is that much different?
Im starting to get kinda insane, because idk wtf is wrong with my dataset for applio not being able to train it. the logs keep saying "Not enough data present in the training set. Perhaps you forgot to slice the audio files in preprocess?"
show the log of preprocess and extract features step
I mean, spin has better quality in real time in Vonovox. Also it can produce my language tone a bit better. 
Let me try Noobies' pretrain
yeah spin is great 
well, it failed to extract f0 and features
for some reason
i've tested noUI colab yesterday and it was fine
If this helps, these are the options I put.
the dataset/wav file is only 1:03 minutes long
1 minute audio wont fit into any training buckets
who gave you this idea?
start again with a new model name and use simple slicing
make sure you get your ~30 files in extract features processed
i mean, the dataset has little to no silence already, and so I thought that it was not necessary to cut it even more (leave it as it is)
you need to slice it, it has nothing to do with silences
i thought it was basically going to remove parts of the audio and make it even shorter if i choose any other option, my bad
automatic slicing does remove excessive silences, simple just shreds the file into digestible chunks
well, in that case, i wonder what are the recommended values that i should put for both "Chunk length (sec)" and "Overlap length (sec)" if i choose the simple option, for such a short dataset that at best has like a second of actual silence. maybe the default values are enough and will not "cut parts of the audio and make it lose content"?
use default 3/0.3
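To get a feel for what 3/0.3 does to a 1:03 file, here is a rough sketch of fixed-length slicing with overlap. It is not Applio's actual slicer: `simple_slices` is a hypothetical helper, and dropping the short remainder at the end is an assumption made for the sketch.

```python
def simple_slices(total_sec, chunk_sec=3.0, overlap_sec=0.3):
    """Return (start, end) times in seconds for fixed-length chunks that overlap."""
    total_ms = round(total_sec * 1000)
    chunk_ms = round(chunk_sec * 1000)
    step_ms = chunk_ms - round(overlap_sec * 1000)
    if step_ms <= 0:
        raise ValueError("overlap must be shorter than the chunk length")
    slices = []
    start_ms = 0
    # Assumption: a trailing piece shorter than one chunk is dropped.
    while start_ms + chunk_ms <= total_ms:
        slices.append((start_ms / 1000, (start_ms + chunk_ms) / 1000))
        start_ms += step_ms
    return slices


chunks = simple_slices(63.0)  # a 1:03 recording
print(len(chunks), "chunks; first two:", chunks[:2])
```

Each chunk starts 2.7 s after the previous one, so neighbouring chunks share 0.3 s and nothing in the middle of the recording is thrown away. A file this short ends up with a chunk count in the low twenties, and the exact number depends on how the real tool treats the tail.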
alr, then the audio should be left intact (not lose/cut/remove content/information) 👍
if you have enough silence, actual 0 level silence, you can set mute files to 0
i suppose the "silence" on the very start and very end should be that ?
so for kaggle w-okada, how do I have persistence for files apply without having to do save version every time? do I run in edit mode or
ok, so, i guess it did go better than last time, but the logs still say the not enough data thing, and instead of ~30 files, it gave me 22. same values, just changed the model name and the audio cutting to simple with default values.
what's the batch size?
4
is it kaggle or colab?
kaggle
alr, i guess its finally going!
dont expect much of anything from just a minute of audio
ik lol
Not sure why

