#✨│ai-help
is this a yes?
Use WinRAR or 7-Zip to open the .zip.001 one, extract to somewhere like "D:\MMVCServerSIO".
I'm not sure what to tell you about Vonovox, since I only know about W-Okada voice changer fork.
ahh all good. Thanks!
is w okada better or worse than vcc
what would be the case in W-okada though? Does my statement hold true for the case in W-okada?
just curious
Both names, "VCClient" and "MMVCServerSIO", refer to the W-Okada voice changer, if you didn't know. The difference between the W-Okada **fork** and the original W-Okada versions is in how they are implemented: the fork has more recent features, while the original versions are outdated.

In the W-Okada voice changer, the higher the "chunk" value, the more delay. The "Extra" value controls audio quality: a higher value gives better quality, a lower value gives lower quality.
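To put rough numbers on the chunk/delay relationship, here is a minimal sketch. It only does buffer arithmetic (samples = seconds × sample rate). The 48000 Hz sample rate and the function name are assumptions for illustration, not W-Okada's actual internals, and the real delay also includes model inference time on top of the chunk length.

```python
def chunk_samples(chunk_ms: float, sample_rate: int = 48000) -> int:
    """Number of audio samples that make up one chunk of the given length."""
    return round(chunk_ms / 1000 * sample_rate)


# A full chunk has to be captured before it can be processed,
# so the chunk length is a lower bound on the added delay.
for ms in (60, 180, 500):
    print(f"{ms} ms chunk = {chunk_samples(ms)} samples, so at least {ms} ms of delay")
```

A bigger chunk means more audio has to be collected before the model can start working on it, which is why the delay grows with the chunk value.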
the fact that namari is still answering this dude
Actually, I don't use a voice changer program on a daily basis, but I know how voice changer works because so many people here keep asking for it. Yeah, that's how much I know about it.
Thanks!
Sure, I'm talking more about information that hasn't been documented elsewhere, or that nobody has talked about, like how an Intel Arc integrated GPU could potentially work with W-Okada. It doesn't always have to be the "how to install and use the voice changer" trope.
!howtoask
- Check Docs & Guides: Your answer may already be in the AI Hub Docs or the https://discord.com/channels/1159260121998827560/1159513888199540817 channel.
- Search the https://discord.com/channels/1159260121998827560/1192011222023950368 : Look for existing posts that solve your issue. Do not invade someone else's post.
Tell your:
- Full GPU Name: (e.g., NVIDIA RTX 4060 8gb vram desktop)
- Operating System: (e.g., Windows 11)
- Detailed Description: What were you trying to do and what went wrong?
- Tutorial Used: Link to the guide you were following.
- Screenshot: A picture of the full error message.
To maintain a legal, safe & ethical community, we will NOT provide help for:
- ANY illegal activities.
- NSFW/Porn.
Requests for these topics may be ignored, not helped and result in moderation action.
- Be Polite & Patient: Our helpers are volunteers. You may ping the Helpers role once.
- English Only: Please keep all conversations in English.
- Don't Ask To Ask.
I'm running Vonovox with a RTX5090 on Windows 11. I've followed the guide on AIhub to set it up. I'm satisfied with the quality. I just have two questions:
- I notice there's only a small difference in delay even if I crank the block size to 1.0 (max). Does keeping the block size at max mean the best sound quality and accuracy? (I tested the lowest block size value that does not sound choppy, and I genuinely cannot tell the difference in quality.)
- The output volume is a bit low. Should I turn up the input volume or the output volume to avoid introducing noise/artifacts?
No, it will only reduce quality after the block size gets too small. There is no increase in quality after a certain amount.
Better Quality and Accuracy is what the Extra regulates.
yeah actually
Thanks! I thought the larger it gets the higher the quality. I might tune it down a lil for better latency then
I have 'extra' set to max.
I kinda want to get one of those dynamic mic. But it seems it takes up a lot of space lol
I use laptop's integrated microphone. 
some of them are not bad tbh
It's not that big, a dynamic mic I mean, but it does have a size lol.
but the issue is cross communication; you want to avoid having your friends be heard by your microphone
If you need urgent help, please check out our AI Hub Docs or ask for help here following the [Guidelines](#1402790586028789830 message)
So it's mainly a headphone issue, or if you really want to use speakers, volume and noise gate- but I guess in that case you're going to have a terrible time with settings if you use a laptop, since everything is very close to each other.
"you want to avoid having your friends be heard by your microphone"
Sorry I don't get what you mean here
cross talking basically.
the microphone catches all audio playing, and if it is a voice, then the voice changer will run it
cross talking?
I'm not sure how to interpret that. If you mean "your microphone is too close to your speakers", I kind of understand.
yeah is that what you mean?
I feel like my brain is not braining

Is there anything you're wondering about regarding how specific features in Vonovox work? There's a Discord server specifically for Vonovox.
The Vonovox discord server? I thought it was only for paid supporters
No, the server itself is open to anyone, and its invite link is stated somewhere in https://docs.aihub.gg/realtime-voice-changer/local/vonovox. If you mean "only for paid supporters", that's likely the author's Patreon.
ahh I see. Thanks!
This setting "forces the voice changer to use FP32, the 32-bit floating point format, on your PC's GPU". This makes the GPU do more work, which sometimes gives better audio quality but often adds more audio delay compared to non-FP32 mode.
Will the performance decrease if I turn it off?
Why does the volume rise when I speak? perf
It can also reach 135
How do I make it stable in the 60 - 90
??
Um.
That's what you'd expect when you turn "force FP32" on. To get the perf number down to 60 - 90 you would have to disable "force FP32". When it's disabled there will be less audio delay but potentially lower audio quality at the same time. Try setting the chunk number to around 180 ms.
okay thank you
If you advise me to put off FP32
There's a trade off, by the way. 
REST or SIO ?
Where can I find DMR V2? I cant download it off of #1235952130855010365
Try turning your speakers off or disconnecting them and see if it still happens. You can use audacity to record an audio file, and listen to it after reconnecting your speakers.
how/
never mind I guess.
Is it just Realtek HD Audio that plays the audio twice? Did you try recording from the Cable without listening to it?
Try this one instead of VB Cable https://software.muzychenko.net/freeware/vac470lite.zip
is it better?
It's faster and more stable
You only get 1 cable with the lite version, but it's enough. Open up VAC control panel as admin, and change the settings to this:
and then click on Set in the bottom right
This file, is a zip file. It needs to be extracted first, after which you can run the Setup exe
it looks open
it needs to run as admin to do that, but.. I guess its fine
most of the settings are good enough
I don't know.
So now you have a Line 1 in your audio drivers
Try setting the Output of the voice changer to Line 1
and the input on Discord to Line 1
Also double click this in Sound settings, and make sure it is set to 48000 sampling on the third tab
It should be 2 channel, 24 bit, 48000 sampling mode
same as in vac control panel basically, and same in the voice changer
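A quick sketch of why the three places (VAC control panel, Windows Sound settings, voice changer) should all agree on the format. The `AudioFormat` class, the device names and the 16-bit/44100 example are made up for illustration. This does not read any real device settings, it just compares values you would type in by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioFormat:
    channels: int
    bit_depth: int
    sample_rate: int

    def bytes_per_second(self) -> int:
        """Raw PCM data rate for this format."""
        return self.channels * (self.bit_depth // 8) * self.sample_rate


# The format recommended above: 2 channel, 24 bit, 48000 sampling.
TARGET = AudioFormat(channels=2, bit_depth=24, sample_rate=48000)


def mismatches(devices: dict[str, AudioFormat], target: AudioFormat = TARGET) -> list[str]:
    """Names of the devices whose format differs from the target."""
    return [name for name, fmt in devices.items() if fmt != target]


# Hypothetical values, as if copied by hand from each settings page.
devices = {
    "VAC control panel": TARGET,
    "Windows Sound (Line 1)": AudioFormat(channels=2, bit_depth=16, sample_rate=44100),
    "Voice changer": TARGET,
}
print(TARGET.bytes_per_second())
print(mismatches(devices))
```

Any device that shows up in the mismatch list is one where audio has to be converted on the way through, so it is the first place to check.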
There is always some lag, as people mentioned.
like, about a second... slightly over half a second at least. (700ms or something)
you can't fix the lag entirely, because its going through complicated AI tools before the audio is put out.
w
but the double audio should be gone at least
not input, output. It replaces CABLE (vb-audio)
input is where your microphone goes
Am i the only one having issues with Applio?
Depending on what you pick, it could cause the echo / repeat of the audio.
No, plenty are, but- idk myself how to help with that. Try making a thread in #1192011222023950368 as your help request might get lost between text walls
this normal?
Do you get a sense of how this works yet?
Speaker makes sound, microphone receives sound.
Looks good
Is there still odd audio output, like hearing yourself twice?
... I guess no? okay case closed hopefully
im try
and show u
i just dont sound like a girl idk why
Did you click start
The only thing I can imagine here at this stage is that it is the microphone picking up sound from the headset. (echo)
as to why it is softer than with VB-Audio, idk- could be filters or something. VAC doesn't strengthen the audio at least
You may need to mess with the volume and use a noise gate, but it may not be enough, depending on how sensitive the mic is
The first sound should be the loudest, but it for some reason is the softest, so there is a gain filter powering up the audio somewhere in the loop
okay
can u guide me?
I want to eat dinner
Just ... start by disabling any Gain. (input volume 100%, output volume 100%, gain 0), then see when the echo happens when you raise the volume of your headphone.
echo meaning, that the voice changer is producing the output more than once
Then you need to find a sweet spot of where you can hear things, but for your microphone it is too soft to hear
if you cannot find this, then you need to use Noise Gate to filter out softer sounds.
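For anyone wondering what a noise gate actually does, here is a bare-bones sketch. It assumes samples are floats between -1.0 and 1.0, and the threshold value is arbitrary. Real noise gates (in the voice changer, Discord, etc.) work on a smoothed level with attack/release times, so treat this as the idea only.

```python
def noise_gate(samples: list[float], threshold: float) -> list[float]:
    """Silence every sample whose absolute level is below the threshold."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


# Quiet headphone bleed (small values) mixed with actual speech (large values).
captured = [0.02, -0.03, 0.5, -0.6, 0.01]
print(noise_gate(captured, threshold=0.1))
```

The soft sound leaking from the headphones falls under the threshold and gets dropped, while your own voice is loud enough to pass through.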
Pitch needs to be 12-ish, and you need to do the rest yourself, talk on a higher pitch, etc.
you can also raise it above 12 if it is difficult, but if you have little control over your voice it will sound weird.
pitch to 24 if you want to be lazy, but then you need to talk with a deep voice constantly
You can offset it a bit, but then your C note will sound like an E note or something, you basically want to keep the notes correct while offsetting the octaves
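The octave point can be shown with a little arithmetic. This sketch assumes the pitch setting is in semitones and uses standard equal temperament, where each semitone multiplies the frequency by 2^(1/12). The function names are just for illustration.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_ratio(semitones: float) -> float:
    """Frequency multiplier for a shift of the given number of semitones."""
    return 2 ** (semitones / 12)


def shifted_note(note: str, semitones: int) -> str:
    """Name of the note you land on after shifting by whole semitones."""
    return NOTE_NAMES[(NOTE_NAMES.index(note) + semitones) % 12]


for shift in (12, 24, 16):
    print(f"+{shift}: x{pitch_ratio(shift):.2f} frequency, C becomes {shifted_note('C', shift)}")
```

Shifts of 12 or 24 keep a C as a C (one or two octaves up), while something like 16 turns it into an E, which is why multiples of 12 keep the notes correct.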
okay bet
i reinstalled and now when i launch the mmvcs it pops up for a sec and closes down
guys where do you put your d g pths in applio?
thanks :P
Does anyone know how to fix the TTS in Applio? It does not work in collab nor locally. Also seems like some other tools online that use the Microsoft voices ( ? ) are maybe bugged too.
why i need virtual cable if this work without it?
although others can't hear me with such a voice
so others can hear the voice changer
There is another problem: I have already tested 3 voice models and they practically do not change my voice at all, only an echo appears from behind the speakers.
there is an issue with the mic settings u have then
what is ur input and output
oh
I can't read them
your input should be your regular microphone, and output should be your virtual audio cable
vac lite
Okay, but what should I do if my voice doesn't change at all?
I'm not sure
I thought that this was the only program where you could practically not hear the AI, but no matter what voices I use, they change literally by half a tone
I've never had that issue so I would ask a helper such as namari
a little bit ping
the virtual cable is used to transfer audio from one place to another. You need it or something like it because Windows doesn't support such things by default.
The voice doesn't change? That can happen for a number of reasons.
Try restarting the server. There are bugs that happen on first launch.
Sometimes, restarting the entire system (in case drivers changed) may be needed.
i try
Other than that, you should check whether or not the webbrowser page loading the voice changer has permission to access the microphone,
(the mic's signal may be outputted by the speaker through another piece of software, not just the voice changer; you may need to investigate whether this is happening or not.)
Make sure passthrough is disabled.
And of course whether or not the voice changer is running.
Set the pitch to 12 (if you're a guy) for a girl voice to work
also the output is going to be softer by default, so you may want to increase the output volume.
that girl voice model speaks as if it is whispering, from what I understand, so- it's going to sound soft in my guess
literally throwing up from the mention of that nvm u fixed it
https://virtual-audio-cable.ru.uptodown.com/windowst
this okay cable?
it says 404 (page doesn't exist)
u should use this one https://software.muzychenko.net/freeware/vac470lite.zip
Okay, just one question: why is the community -11 in Virus Total?
what does that mean?
idk like -rep
local differences?
maybe
It's just the most famous one
oh wait you uploaded the zip file
I cannot access that page, so idk what software that is
never heard of it tho
same program other link
I'm glancing at VirusTotal currently. All of those -11 rep points were placed by the same user.
also that user is pretty.... odd
hmm okay
His server is down, so I don't know why he came up with those conclusions
any guide for vac470lite?
It's plug and play basically, you just need to install it
https://tria.ge/230917-pcdstaac41 oh here...
The scoring is pretty aggressive.
?
setup64
I looked at the reports for the bad score, it looks like the analyst had an infected ZIP file with code that is way different, causing different behavior
I mean, we can all look at the source code of batch files, and these don't connect to servers in the US.
So this is false alarm, unless you got it off a strange site I suppose
Don't pirate basically.
run this file
oh lol
input virtual cable (line 1) output headphones
If I put the virtual cable on input, they don't hear me. If I put it on output, I don't hear. That's weird.
or for this to work I need to enable the ai model first?
I am confused
like nothing
so uh if u wanna hear yourself on okada click monitor and put ur headphones
bad English maybe
I want others to hear me on Discord, but when I connect to a virtual cable, nothing happens.
so I don't understand
Think of a Virtual Audio Cable as a USB slot.
On its own, it won't do anything. You have to "plug in" your headset, or microphone or whatever, into that VAC, or else it won't transmit anything.
If you use a Voice Changer, in there, you must set your microphone as input, and Virtual Audio Cable as output. Then, in every other app, you select Virtual Audio Cable as input, wherever you want to use the Voice Changer.
Example:
In Voice Changer
Input: <your mic>
Output: Virtual Audio Cable
In Discord (or any other app)
Input: Virtual Audio Cable
(Output: <your headset>)
same :/
Anyone know why, when i have vb cable, my mic isnt picking up any sound
the vb cable is set to input
scroll up just a little
btw, do not ever set input AND output to Virtual Audio Cable within the same app
Setting the audio input to "Line 1 (Virtual Audio Cable)" and the output to "Line 1 (Virtual Audio Cable)" as well is incorrect, regardless of which program is using it.
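The routing rules above can be written down as a small check. This is only a sketch: the dictionary layout, the app names and the `routing_problems` function are hypothetical, and it checks settings you enter by hand rather than reading anything from Windows or Discord.

```python
VIRTUAL_CABLE = "Line 1 (Virtual Audio Cable)"


def routing_problems(apps: dict[str, dict[str, str]], cable: str = VIRTUAL_CABLE) -> list[str]:
    """List the routing mistakes found in a hand-entered description of each app's devices."""
    problems = []
    for app, io in apps.items():
        # Rule: never set input AND output to the virtual cable in the same app.
        if io.get("input") == cable and io.get("output") == cable:
            problems.append(f"{app}: input and output are both the virtual cable")
    # Rule: something has to feed the cable, and something has to listen to it.
    if not any(io.get("output") == cable for io in apps.values()):
        problems.append("nothing outputs into the virtual cable")
    if not any(io.get("input") == cable for io in apps.values()):
        problems.append("nothing reads from the virtual cable")
    return problems


good = {
    "Voice Changer": {"input": "Microphone", "output": VIRTUAL_CABLE},
    "Discord": {"input": VIRTUAL_CABLE, "output": "Headset"},
}
bad = {
    "Voice Changer": {"input": "Microphone", "output": "Headset"},
    "Discord": {"input": VIRTUAL_CABLE, "output": "Headset"},
}
print(routing_problems(good))
print(routing_problems(bad))
```

The second setup is the "others can't hear me" case: Discord listens to the cable, but the voice changer never sends anything into it.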
yea i do this
already
but still not working
This "RVC WebUI" is likely an older version. Try Applio RVC, the more recent "RVC WebUI fork", instead.
-rvc
Is a batch size per gpu of 6 good for a 4060 8gb? im training a model at 200 epochs with around 17-20 minutes of data
seems okay, although with that much data i'd probably choose 4. Don't have too much experience yet, but I've seen people recommend higher batch sizes (like 8) when you have bigger datasets (e.g. 1h and more)
the training should be fairly quick anyway, so i suppose it doesn't hurt to try both, really
Ive been wondering if i should use 8, 6, or 4, this is my very very very first voice model so im kind of overthinking a lot of things right now 😅
i tested it at 40 epochs at a batch size of 8 and it sounded really really accurate to the voice im trying to clone
but i saw people say to use 6 instead or 4 for a gpu like this, so it got me overthinking if putting it to 8 would ruin the quality
batch size 8 is goood
no need to overthink when you can play around and see what difference it makes at various settings
if u were to train with batch size 32 like @analog obsidian did it would be good too
i'd not recommend high batch sizes to someone who just started to make models
Im monitoring my vram though people say to turn it down when it leaks to shared memory and its leaking into shared memory right now even at 6
i'd try it 
on such small dataset? wouldn't that be worse than a smaller batch?
nah on larger datasets
17-20 minutes is not a large dataset
Do you guys have any definitive answer to this? Cause im really confused right now😅
use 4
Will i have to retrain my model all over again? Or will pausing then changing the batch size be fine?
don't change batch size mid-training
no just continue training, any batch size works
yuh oh
oh, Lyery has much more experience so definitely listen to their opinion more than mine
i've read before that this was advised against but perhaps i'm mistaken
back then people were saying the low points of the tensorboard graphs were mode collapse...
🤦♂️
there is some misinformation about the batch size too
thanks a lot for the help guys ❤️
i wish that kind of knowledge was documented somewhere, over past week or two i've scrolled through a lot of info and it's difficult to separate the good parts from the bad
the problem with the high batch sizes is that they overshoot, but in rvc i havent noticed any problems with them
but for finetuning i'd not use anything above 32
when i tried batch 128 in a pretrain, it made it explode
if your model is not exploding, ur good lol
however, it's safer and easier for the casual user to stick to the lower batch sizes (4-8-16)
if you notice the model sounds good and everything it's fine, means the batch you chose is perfect
no need to overthink
I know that looking just at the loss graphs isn't enough to choose the best checkpoint, but I would expect the charts to act as guidelines in which areas to look for the checkpoints. Like, it can be noticed when the initial training stage with huge gradients ends and where's the local minimums which could suggest some decent checkpoints. And later on the training could be monitored for regression. Is that approach wrong? I'm trying to get into some good habits
One last question, how may i fix the very robotic and metallic “S’s” in models?
Would very much appreciate any tips on how to interpret the tensorboard to make good use of it during/after training
they're completely worthless in my opinion
And in breathing sounds too! Since i kind of want it to sound good during breathing sounds, and the dataset i used has a lottt of inhalation and exhalation.
i only check the tensorboard logs when im training a pretrain from scratch and i need to know if my model is exploding
which doesnt happen with normal rvc training, unless you do it on purpose
So just train for some decent amount of steps while saving checkpoints every X epochs and test & compare the checkpoints afterwards for choosing the best results?
what sample rate are you using? 40k and 48k have bad esses and breaths
I am on 32k
yup just check the step amount
very valuable info, thanks!
maybe something wrong with the dataset
Does how many epochs it’s trained matter? Im currently inferencing a voice with 60
i noticed batch size 4 has decent breaths and esses, what if you try that?
Alright
try training 10-20k steps
the step number is listed after the epoch number
for example: model_10e_3000
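If there are a lot of checkpoints, the step count can be pulled out of the file name with a few lines of Python. This sketch assumes names shaped like the `model_10e_3000` example above (epoch number, then step number). The optional trailing `s` in the pattern and the function names are assumptions, so adjust the pattern to whatever your files actually look like.

```python
import re

# <name>_<epochs>e_<steps>, optionally followed by an "s" (assumed variant).
CHECKPOINT_RE = re.compile(r"^(?P<name>.+)_(?P<epochs>\d+)e_(?P<steps>\d+)s?$")


def parse_checkpoint(name: str) -> tuple[int, int]:
    """Return (epochs, steps) taken from a checkpoint name."""
    match = CHECKPOINT_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected checkpoint name: {name!r}")
    return int(match["epochs"]), int(match["steps"])


def in_step_range(name: str, low: int = 10_000, high: int = 20_000) -> bool:
    """True if the checkpoint falls inside the 10-20k step range mentioned above."""
    _, steps = parse_checkpoint(name)
    return low <= steps <= high


epochs, steps = parse_checkpoint("model_10e_3000")
print(epochs, steps, steps // epochs)  # last value: steps per epoch for this run
print(in_step_range("model_10e_3000"), in_step_range("model_40e_12000"))
```

Since steps per epoch depend on dataset size and batch size, comparing checkpoints by step count is more meaningful than comparing them by epoch number.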
Gotcha, will keep that in mind
remember that if you change the batch, you need to start the whole training again from 0
click fresh training and change the batch to 4
The voice im trying to replicate is an ASMR youtuber who does a lottt of whispering and some very light talking so im trying to find settings that would complement well to the many emphasized “S’s” they do and their breathing, as well as the very light talking
yes, something I always recommend is not to blindly believe what people say about rvc, it's best to experiment and find the best parameters for your model on your own xD
Where is that on the RVC WebUI?😅
oh you don't use applio?
Nope, i just found a guide on youtube and they used WebUI
stop the training and go to the logs folder, delete the G and D files and the epochs
Couldn't agree more, I'm playing around with various training configuration and dataset preprocessing for that very purpose, getting some first-hand experience. But I think this is best when combined with some hints from people with actual broad RVC knowledge 🙂
getting best of both worlds
and send me a screenshot of the webui because i dont have mainline installed lol
gotcha, would you recommend Applio more for beginners? Ive read that its more oriented for people who’ve just started
applio is way easier to use than mainline
the applio code is exactly the same as the rvc webui's
so both produce the same result
yea i recommend it
alright, ill make my first model here on webui since I’m semi familiar with the UI already and move on to applio. Ill send an ss of the ui rq, some of it is in chinese which the translator couldnt translate so yeah
here put the name of your model's logs folder
don't run the preprocessing again
be sure it's v2 and 32k
then decrease batch size to 4 and click train model
Do i change the pretrain? Since I am using LegacyCore 1.5
try og pretrain first (the default)
Alright, do I try switching if the final result doesn't meet what I want?
sure, actually thats what i always tell to ppl, first use the default pretrain, then experiment with others if you didn't like the first result
10
I am using tensorboard so would i also need to delete anything to reset the graphs or do i just train it
sorry if theres too much questions by the way ill stop if you want 😅
yes, delete anything inside the eval folder
This graph looks really weird to me 😅 did i forget to delete something else or is it normal?
yup you didn't delete the old logs
The files in eval?
yes
can you send a screenshot of your logs folder?
I did delete the files in eval before starting the training earlier though
also delete G and D files, start again
How can i test specific checkpoints without ruining any training progress?
do i just X out of the terminal, test the checkpoint, load up rvc webui again then continue training?
Hey 2 things
- How do I make it so when I speak on the actual voice changer app I can hear myself while doing it? It happened and then it just stopped randomly
- My gpu isn’t getting recognised, my gpu just says cpu and that’s it, I have a rx 9070xt
send me 500 rubles pls
This is what I have at 11k+ steps, sustained S's sound really bad
Could this be caused by a low quality dataset or does it just need more training?
that is mostly a limitation of contentvec
and nsf hifigan
i think you can add more data to improve it slightly but it's still gonna be unrealistic
Wait are you talking about the pretrain or something? I'm a bit confused 😅, If so, I used the regular original pretrain like you said
limitations of the ai, the ai was made in 2022-2023
its not perfect
Ah i understand it now, thanks
Any way to improve it or is adding more data my safest and best bet?
yep, but keep in mind it will never be 100% perfect and realistic
Alright, thanks for being a big help for my first model 
Anyone
!howtoask
why i cant add effect to my mic input on wokada client mode ? on server works fine but on client is not picking up my mic with my effects
client has been broken since november 2025
no explanation just stopped working for everyone
only server mode works
- "Listen to" the input device
Go to the settings app, system, sound, more sound settings.
Find the input device (microphone tab)
double click it.
then, on the listen tab, check on Listen to this device and hit okay.
- use the directml release (dml) of the voice changer, not the cpu release.
Is it normal that when i hear myself, i hear it delayed? and does it affect conversations like when im talking to someone is it delayed too?
Hi, can someone help me with this? https://colab.research.google.com/github/iahispano/applio/blob/master/assets/Applio.ipynb#scrollTo=nAlXiNYnFH9F When I click on the public URL, a page comes up saying "no interface is running". Thank you
how does one train an ai voice? like can i use an mp3 to make it sound like someone?
-rvc
read up on this guide, u would need to use applio to train the voice and also would need to clean the audio so there's no music or anything
thank you!
you're welcome! u can use UVR to isolate vocals from audio as well as sites like mvsep or minus x
both are free
https://mvsep.com/en/home
https://x-minus.pro/ai?hp
Make sure you try the W-Okada voice changer **fork** version: https://github.com/tg-develop/voice-changer/releases/download/b2397/voice-changer-windows-amd64-dml.zip Older W-Okada versions like v.1.5.3.18a won't work.
Right I done it but there’s an echo, not really an echo it’s just repeating whatever I say. I can’t send images on my settings but I’ve not touched anything
It never ends really, a echo that never ends
I fixed it. Also, do I use "0: AMD Radeon RX 9070 XT (DirectML)" or "1: AMD Radeon(TM) Graphics (DirectML)"?
RX 9070 XT (DirectML), the other one is your iGPU. You could use both, but the graphics card is likely faster.
Earlier I thought you wanted the ability to listen to yourself, which is why I sent what I did before.
Echo issues tend to be on the hardware side, usually at least. Too many people don't realize that what their speakers broadcast can be heard by their microphone, and usually will be. That combined with audio enhancement causes a loop where on each repeat the audio gets louder.
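The "gets louder on each repeat" part is just multiplication. A tiny sketch, assuming each trip around the speaker → microphone → voice changer loop multiplies the level by one fixed number (called `loop_gain` here, a made-up name, with made-up values):

```python
def loop_levels(start_level: float, loop_gain: float, passes: int) -> list[float]:
    """Signal level after each trip around the speaker-to-microphone loop."""
    levels = [start_level]
    for _ in range(passes):
        levels.append(levels[-1] * loop_gain)
    return levels


# Gain above 1 (e.g. audio enhancement boosting the mic): every repeat is louder.
print(loop_levels(0.1, loop_gain=2.0, passes=3))
# Gain below 1 (lower volume, headphones, noise gate): the echo dies out.
print(loop_levels(0.8, loop_gain=0.5, passes=3))
```

So the goal of lowering the volume, disabling gain/enhancement, or using headphones is to get that loop gain below 1, so the repeats fade out instead of building up.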
No i figured that out
I mean my cpu was fine, I have a 9800x3d but yeah I swapped to that, I really need to sit there and tune the voices but so far it’s actually good
Obviously I don’t always wanna hear myself, just wanted it while I was testing because it’s better than going in a call with myself
Select "AMD Radeon RX 9070 XT". The "AMD Radeon(TM) Graphics" is a basic **integrated** GPU.
will it solve the chaotic voice problem?
😲

"Applio RVC" is the most recent RVC UI that most people here use. What do you mean, Applio RVC isn't going to fix robotic audio either way?
so it will fix?
First of all, try the program yourself and listen to the audio. That will be your answer.
Try extracting "Applio RVC" to a location other than drive C.

With Applio RVC on Kaggle, I use the website "loca.lt" to access the file directory of the Kaggle environment for dataset audio files, and I also use it to upload multiple voice model files there.
is vonovox necessarily faster than deiteris's w-okada fork?
Not always.
No but it's better, all Wokada programs are similar speeds
ok, but is it better to use vonovox anyways since I have an rtx 4070?
like what more does vonovox have to offer that wokada doesnt have
Vonovox is made with better audio quality in mind, while the W-Okada voice changer fork versions give robotic audio most of the time, even with the extra value set at 2.7 s. There are also trade-offs in UI and settings between the two programs.
how large is the comparison of audio quality between vonovox and w-okada?
Unsure and inconclusive. 
i use w-okada fork right now, and i do notice "noise" that the voice changer gives
so i usually have to use a low pass filter to get rid of that but it slightly destroys the resonance
im asking if vonovox just increases the audio sample rate (kbps) or if it increases the "realisticness" of the RVC
Actually, I focus on settings and such; I've **never cared too much about the audio quality** that either program gives. So while I've never tested Vonovox and the W-Okada fork on my own, that's the way I can answer.
so far imo vonovox sounds a good bit better than the okada forks
using the beta version of vonovox btw
You know, Vonovox and W-Okada **don't** use a bitrate (kbps) the way CBR (constant bitrate) audio files do, unless you record the output to a file. Quality instead depends on the voice model and on how you set block size, extra time and crossfade in Vonovox.
it also allows for configuration of the voice texture going into the ai model, so if you have a model that's supposed to sound very gritty and deep ( Master Chief for example ) you can make it sound much closer using that setting.
haven't tested going the other direction with voice texture yet but i would think it would help with smooth voices going the other way.
I just closed Applio RVC. 
On Applio RVC, in "Download" menu, you'll see some sections to upload your voice model(s). https://cdn.discordapp.com/attachments/1159290139609137264/1458081997040783452/image.png?ex=697ab06d&is=69795eed&hm=5dfd7085469aac4f312d628fbf054be21a953ee8b82c4cda924bffa87e541f4a& Then go back to "Inference" tab, click a big "refresh" button. https://cdn.discordapp.com/attachments/1159290139609137264/1455545707204182016/image.png?ex=697ab0d3&is=69795f53&hm=ccb7944101089ffde5b555dd90bb1387e4a9e3f56f0295d0973d5c76ed585d69&
thank you!
this time I used Applio RVC, but it still has a robotic output voice, how to solve
😭
Well, Applio RVC doesn't have quality settings (chunk, extra) like the realtime voice changers (W-Okada, Vonovox) do, so the audio quality comes down to the voice model itself. Unfortunately, that means the best you can do is try other voice models as your benchmarks.
can W-Okada or Vonovox also do AI voice changing based on a input audio file?
If that voice model is from 2023, that's likely the cause, or perhaps the model was trained with low-quality dataset audio and settings from the start. Realtime voice changers mostly use your microphone live. A few voice changer versions can process a file like the non-realtime Applio RVC does, but others usually lack that feature, and that's about it.
ok thanks
It doesn't work now ?
Facefusion on lighting.ai with GPU on but it says
facefusion.py run: error: argument --execution-providers: invalid choice: 'cuda' (choose from 'cpu')
-rt
Guides for Programs that use RVC Models in Realtime for Calls/Games
A Realtime Voice Changer with similar performance to Wokada Tg-Develop Fork, with extra features, but it supports only Nvidia GPUs on Windows 10/11, unlike other options that have wider support, and it has no cloud options. GUIDE
A personal fork (modified version) of Wokada Deiteris Fork, it just adds some Quality of Life improvements to it like supporting Spin Embedder and Audio Effects. Don't expect too much about it since the creator made it originally as a personal project. GUIDE
Deiteris' fork (modified version) of wokada that doesn't get updates anymore. GUIDE
For Windows Nvidia, Both Wokada Tg-Develop fork and Vonovox have similar performance & quality. Users should read the pros and cons for both and choose based on their differences, such as UI and Vonovox's paid effects.
Read Wokada Tg-Develop Fork Pros&Cons & Vonovox Pros&Cons
These options are not recommended for use.
Not suggested, older versions in youtube tuts are even way worse. GUIDE
The program is worse compared to the ones above, and much less updated. GUIDE
What is your PC GPU? And did you follow any tutorial or guide doc before? You seem to be looking for a voice changer.
my okada always has to be open when im using it. is there any way to use it in the background
!howtoask
- Check Docs & Guides: Your answer may already be in the AI Hub Docs or the https://discord.com/channels/1159260121998827560/1159513888199540817 channel.
- Search the https://discord.com/channels/1159260121998827560/1192011222023950368 : Look for existing posts that solve your issue. Do not invade someone else's post.
Tell your:
- Full GPU Name: (e.g., NVIDIA RTX 4060 8gb vram desktop)
- Operating System: (e.g., Windows 11)
- Detailed Description: What were you trying to do and what went wrong?
- Tutorial Used: Link to the guide you were following.
- Screenshot: A picture of the full error message.
To maintain a legal, safe & ethical community, we will NOT provide help for:
- ANY illegal activities.
- NSFW/Porn.
Requests for these topics may be ignored, not helped and result in moderation action.
- Be Polite & Patient: Our helpers are volunteers. You may ping the Helpers role once.
- English Only: Please keep all conversations in English.
- Don't Ask To Ask.
Here's an example on how to detail your issue:
W-Okada voice changer doesn't work on my PC
PC GPU: NVIDIA GeForce RTX 3080
Operating system: Microsoft Windows 11
The reason: W-Okada doesn't work, I tried to follow the tutorial but still doesn't work.
Tutorial I used: a YouTube video link you followed.
hi! how do i make whispering work well on vonovox?
it doesnt have an rmvpe_onnx option for f0 method like wokada does, and thats the one that usually makes the whispering sound real
There are big trade-offs between W-Okada voice changer and Vonovox. 
ooh
So which one do you think would be better for voices that are very asmr-like with a lot of whispering and light talking?
I wish I knew how to use the voice changer for "whispering" sounds specifically. A few RVC voice models can do whispering, but most voice models in #1175430844685484042 are trained on spoken or singing audio.
Vonovox likely has only the regular rmvpe model because it's meant to work with NVIDIA GPUs. For the "rmvpe_onnx" F0 model option, check out Tg Develop's W-Okada fork.
@hallow thistle Hello
Can you help me to train a voice ?
I put my video on it
I got now the track
That part of Applio RVC is called "inference", the AI cover part, not "training". https://cdn.discordapp.com/attachments/1159290139609137264/1455545707204182016/image.png?ex=697b5993&is=697a0813&hm=79e4779a531b2f7454a1201b06763c938995500a0e49e5cf4ea80337bb4f4457&
There is a difference between the terms "inference" and "training". The part of Applio RVC in my screenshot is the actual "training" tab. https://cdn.discordapp.com/attachments/1450927945802715236/1451238526837592220/image.png?ex=697ad773&is=697985f3&hm=e139b9c1179015c4f90d80d09c913f31abd84dc05cb1805afdbadff04c856338&
I did, it was all working last night but the voice changer is not working now, the one that uses my gpu, the cpu voice changer app thing works fine but the one you told me to download stopped working
I'm trying to train this voice
Well, actually I'm not a model maker member, so for training a voice model it's better to ask a fellow model maker here instead. And make sure you **don't** send any copyrighted material (especially audio) to this server, regardless of what it's supposed to be.
Yes
Why did you simply answer just "yes"?
I will ask to a model maker
ty *
I'm trying first
the training part
So you don't really know what configuration I'm suppose to do
On the training part ?
I literally just implied like that. If I say "I'm not a model maker" it means I don't know how to train a model. You good?
Okay
yes im good and you
?
Try restarting the program, or send a screenshot here now. Make sure the settings there are alright.
can you still help me please
look the documentation for me
Make sure you don't force me to provide an answer just for you.
I don't really understand can you explain
you want me to add you in friend ?
What do you mean by "adding friend"? I didn't say I want to help you in direct messages. You could wait for a model maker to answer. And while I can provide some basic settings for the "inference" part in Applio RVC, it doesn't always have to be me who answers. Do you understand, or do you still want an answer?
I mean it looks ok, it’s the same as my other one it just stopped working from last night, is it working fine…
Oh, that's not Tg Develop's W-Okada voice changer fork b2397, it's Deiteris' W-Okada fork b2332.
What one am I meant to download… it’s so confusing
I had another one before it didn’t work at all but I have a feeling that was the right one
Oh yeah I had that one
It didn’t work at all
So I deleted it, thought I had the wrong one and I couldn’t send images
And you told me to get this one I think
Some flaws I noticed in your screenshot:
- Your output/monitor audio devices are mixed up: the "monitor" one isn't your speaker, which is why you won't hear the program through the speaker.
- Chunk and extra values are still at their defaults, and they **should** be changed to something else.
So I want this one?
I need my monitor as that one to hear myself
If it’s on anything else I can’t hear anything so I guess that’s ok
But that isn’t the problem if I record my audio or whatever and listen to it, it’s just silent but yeah I’ll get the other thing
what one do i even get? the bottom?
voice-changer-windows-amd64-dml.zip
The initialism "dml" refers to DirectML, the build that works with AMD Radeon RX GPUs. The "CUDA" build is for NVIDIA GPUs only, while "CPU" means CPU-only, no GPU.
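As a rough sketch of that mapping in Python: the function name `pick_build` and the simple substring matching are illustrative assumptions, not part of the voice changer, and only the three build types named above are covered.

```python
def pick_build(gpu_name):
    """Map a GPU name to the build suffix described above.

    Hypothetical helper: it only matches vendor substrings in the name.
    """
    name = (gpu_name or "").lower()
    if "nvidia" in name or "geforce" in name:
        return "cuda"  # CUDA build, NVIDIA GPUs only
    if "amd" in name or "radeon" in name:
        return "dml"   # DirectML build, e.g. for AMD Radeon RX
    return "cpu"       # nothing recognized: CPU-only build


print(pick_build("AMD Radeon RX 6600"))       # dml
print(pick_build("NVIDIA GeForce RTX 3080"))  # cuda
```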
can you tell me where is the part extra
I'm looking for the audio analyzer
Oh ok I’m a idiot, I did have this one before and it just straight up didn’t work but maybe with images I can get help and make it work
no u are not
an idiot
u are just learning
@hallow thistle right ?
Beats me
no stop
i dont get this one but its not working for me
come again?
stop
I’m so confused
Try Virtual Audio Cable lite instead of VB-Cable.
Because your Applio RVC UI language is set to "French", meanwhile almost everyone here uses English, so many settings might look confusing to you.
can you send a link please if its ok?
Try set extra value to "2.7 s" and chunk to 120 ms.
Bro this is fucking amazing!
RVC never had an option to train a voice model at 44100 (double 22050). The sample rates common across RVC voice models are 32000, 40000 and 48000, with 32000 being the lowest one possible. For help with "how to prepare dataset audio" and "how to train a voice model", I'm letting you know **for the second time** already: there are some members here with a pink "model maker" role on their accounts.

hey guys i havent made an ai cover for years.. where can i make one?
Extra at "2.7 s" is what I can answer; for chunk, it depends on your PC's GPU and the perf number.
Applio RVC.
tyyy
It gets distorted at times if I go high pitch but if I talk like neutral it’s perfect but I’m sure settings can help it
what about advanced settings
"Crossfade length" at "0.15 s", and turn "force fp32" on. These aren't always the recommended settings, but they can improve audio quality in the voice changer, at the cost of a bit more audio delay.
thanks
is it an app or a website?
Applio RVC itself is an app, although the exact app can be run on a cloud service (like Google Colab) as well.
is there a website where I can just upload the audio file and the model?
The "Applio RVC" in question looks like this as an app.
There are some websites that can do AI cover, at least basic inference features like uploading voice models, audio files, but I don't remember the names.
if im playing a game it does not like it at all
It gets like a 10 second delay from when I speak to it repeating
The perf number at the top right is at or above 117 ms, which makes the audio lag behind, so try raising chunk by 40 ms (to 160 ms).
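The arithmetic behind that advice can be sketched like this. The idea is only that chunk has to stay above the perf number with some room for spikes; the 30 ms headroom and the 20 ms step are assumptions picked for illustration, not values taken from the program, and `suggest_chunk_ms` is a made-up helper name.

```python
import math


def suggest_chunk_ms(perf_ms, headroom_ms=30.0, step_ms=20.0):
    """Smallest chunk (a multiple of step_ms) at or above perf + headroom.

    headroom_ms and step_ms are assumed defaults, not program settings.
    """
    target = perf_ms + headroom_ms
    return math.ceil(target / step_ms) * step_ms


# With a perf number of 117 ms this lands on the 160 ms mentioned above.
print(suggest_chunk_ms(117))  # 160.0
```

If the perf number keeps spiking, a larger headroom gives more margin at the cost of more delay.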
Oh I’m so stupid thank you
please let me know if u remember 😭

huggingface
tho not including free tier limits + most spaces arent updated anymore
dead end
'less you set up a notebook
What’s the best AI tool right now to change a singer’s voice in a song?
Applio
-rvc
Here's the docs to look over, applio can be used both locally installed as an app on your PC and also on a browser on something like Google colab or Kaggle (kaggle is better)
says theres an issue with voice conversion and i need to check command line window for more details. not sure how to get to the command line
-colab
Google Colab is a Cloud (Remote Good PC) Service. While the Free plan provides up to 12 hours of daily usage, the GPU is typically available for only about 4 hours each day on average.
by IA Hispano
Google Colab
by Hina
Google Colab
by Eddy
Google Colab
by Eddy
Google Colab
by Deiteris & Hina
Google Colab
by Shiro & Eddy
Google Colab
by Nick088
Google Colab
by Nick088
Google Colab
by Jarredou & Makidanye
Google Colab
-huggingface
Huggingface Space by r3gm
Huggingface Space by IA Hispano
HuggingFace Space by Nick088
i have a problem
What even is this 😭
what are you talking about tho
Idk what that even is that you're using
rvc gui duh
can't you see the problem is "traceback (most recent call last)"
OH
Yikes
why didn't you tell me then??
Ngl I didn't even know what I was looking at I'm sorry
ik, its okay
because i downloaded newest rvc gui and it says python was not found
but i have python
thats why i used that old ass 2023
What are you wanting to do with it anyway?
If you're looking to do covers I'd just use applio
well, i kinda looking to do covers
Hi everyone, I'm using the RVC WebUI that you can access from computer files, it's been a while since I used it, but I just tried to import a voice model and it's not appearing in my inferencing voice dropdown even though I put the INDEX and PTH files where they should be? is the UI broken?
I'm not sure what that is, what does it do? Rvc is used to do voice covers, make ai models or is also able to do real-time voice changing
If you're wanting to do covers just switch over to applio as it's the most up to date for that kind of stuff
-rvc
I'm using it to make a cover, I prefer using it since it's a lot simpler and it's what I'm familiar with
I haven't seen that in years
That's so old..
It really is recommended tho to use something up to date over something simple
Lmao it just looks old sorry, reminds me of the stuff used in 2023
you don't need to be sorry 🤣 I was just wondering if anyone was familiar with it and knows if its broken or if i did something wrong
It looks somewhat familiar as I used to use something like that a couple years back but at the same time looks unfamiliar
Layout is quite similar to applio just without the Chinese text
Oh really? Is Applio as simple as the one in the screenshot?
Pretty much, it has a little bit of extra features as it's getting updates somewhat often
You can also run it locally or on browser which is cool
I thought locally was easier because I didn't think the system could break, but maybe mine did
I don't personally have the download for the local version but you could ask someone like Namari as they may know more
Python is weird lol
It might be running on an older one and you have a newer one or other way
hey y'all i have a different kind of AI/bot question... coming here from #🧬│ai-chat
Making covers is so complicated, everything shifts within a few months and it feels like you have to learn all over again 🤣
can someone give me a good pointer or guide on how to write an effective game player bot that acts based on visual input (=screen)
and interacts with the game via emulated key inputs
i mentioned this in the other channel too: this is an entirely offline project, so no, i'm not cheating
(i'm even TASing this game)
Yeah lol, I've gotten used to just keeping up with new stuff
There's an app called replay but you need a weights.gg account to use it and they're shutting down
I would do that but I barely make AI covers. I just discovered Suno AI to generate music, and I wanted to change the voice so I'm back at square one with the rvc guides and such
I swear nobody reads the no advertising rule
the only other sources i've got are google and chatgpt
There's some AI thing that hasn't released yet but it's supposed to play the game for you or something, that's all I know of it
well okay
google and chatgpt it is 
thanks-ish
so it's basically
chatgpt in a jar
with fancy lighting
oh i'm sorry, grok in a jar with fancy lighting
||shut up sapphire||
Basically yea
They could have chosen anything but grok 😭
Grok is cool but really isn't in the best place rn with those images it has been making
Maybe this is me being dumb but I'm unable to open them with 7z
Hi, i have a question. It's possible use voice changers in mobile? If it's possible, can please help me how?
(I don't know if this channel it's correct to ask)
Not directly. You can use an app called Phone Link, SoftPhone or something similar. I have no experience with this though.
open up 7zip
drag and drop 001 into the program, that should let you open it
you're only supposed to extract the two packages though
Thank you
You pinged someone specific, but I wasn't sure if you wanted strictly them to reply or not
but its taking a while seemingly before they react
no no you helped it was just because they had posted it from what I read
oh also
Im having another issue, im using a 5070 ti and when I try switching from my CPU to it I get a "An error occurred during voice conversion. Check command line window for more details." popup
Would you happen to know what could be causing it?
Not really, it could be a bug on the specific fork of w-okada voice changer you're using.
For the 5070 ti, I think vonovox is a better option as voice changer
but on w-okada, most people use tg-develop's fork. which does have rtx 5000 series support.
I used the one that user recommended for the 5070 ti which I replied
would you happen to have a link?
There are pros and cons with vonovox, as well as tg-develop's fork of w-okada- it's basically a matter of personal taste
thank you!
but vonovox currently has less bugs and might be faster than w-okada for you
frankly im just looking for the one that sounds the best but arn't we all
(not sure since i don't have an rtx to test it with lol)
lol its ok no worries uve helped a lot
In case you want to try tg-develop's fork of w-okada: https://github.com/tg-develop/voice-changer/releases/tag/b2397
Good luck anyway ;p
Does anyone know how to fix occasional raspiness in tg develop's w okada fork?
Sometimes, when I talk normally it produces a super duper raspy result
Even though there isnt anything wrong with my speech or mic
It has higher sound quality for sure but its slightly choppy and my voice cuts off immediately after i talk which makes it sound unrealistic
I'm on a 4060 by the way,, I would choose vonovox but tg develop's fork has rmvpe onnx f0 method which is what i need
It could be the voice model.
you can try to optimize towards quality (extra 2.7, crossfade 1.5, lower protect (0.33 or less))
Force fp32 enabled
It could be that your mic isn't loud enough (raise input volume, add gain could fix it)
but I wouldn't rule out that the voice model isn't good enough.
Apparently to avoid such things, it needs a lot of sample data (more than 10 or 15 minutes of it at least).
The more samples it has the less these things happen
I don't believe it is the voice model... I've tried this voice model out on the original wokada and vonovox, none of them had this problem
I've trained it on 19 minutes on data too
tg-dev's fork is bugged. I'm not too sure what is causing this, but- .. I mean we can try to optimize at least. I recommend inspecting the settings.json file. It might be that your sample rate isn't set to 48000 (when using server mode at least); you can tell from that.
where can i find its file directory?
the file is called "stored_settings"
Ah okay
Do I change all of the values here to 48000?
Yeah
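For anyone who would rather script that edit than do it by hand, here is a minimal sketch. It assumes the settings file is plain JSON and that the relevant keys contain "samplerate" or "sample_rate" in their names; the real key names are not shown in this chat, so treat both assumptions (and the helper names) as hypothetical and check the file first.

```python
import json
from pathlib import Path


def set_sample_rates(node, rate=48000):
    """Set every numeric value whose key looks like a sample rate.

    Returns how many values were changed. Key matching is an assumption.
    """
    changed = 0
    if isinstance(node, dict):
        for key, value in node.items():
            normalized = str(key).lower().replace("_", "")
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if "samplerate" in normalized and is_number:
                if value != rate:
                    node[key] = rate
                    changed += 1
            else:
                changed += set_sample_rates(value, rate)
    elif isinstance(node, list):
        for item in node:
            changed += set_sample_rates(item, rate)
    return changed


def update_settings_file(path, rate=48000):
    """Rewrite a JSON settings file in place, keeping a .bak copy."""
    path = Path(path)
    original = path.read_text(encoding="utf-8")
    settings = json.loads(original)
    changed = set_sample_rates(settings, rate)
    if changed:
        path.with_name(path.name + ".bak").write_text(original, encoding="utf-8")
        path.write_text(json.dumps(settings, indent=2), encoding="utf-8")
    return changed
```

Close the voice changer before running something like this, so it doesn't overwrite the file again on exit.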
Last update: November 16, 2025
this guide helps with prepping the dataset for voice model training
as you can see there, for a quality model they definitely go towards 40~45 minutes or higher
Ah okay, although I'm quite content with my model currently as its only faced this problem on tg develop's fork
I recall reading somewhere that this is because the more data it has, the less it will make mistakes with stitching the chunks
You need a tool to measure how loud your mic is in decibels for the voice changer. You can do this by recording your mic's input in Audacity, for example.
You can also use OBS; I use that to measure audio levels.
When people speak (for the voice changer) too softly then these bugs can also happen; that is if the voice model isn't trained for whispers or soft sounds.
so adjusting the input volume or mic's hardware volume (or the one in windows) can help there
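If you'd rather get numbers than eyeball a meter, a short recording exported as WAV can be measured with a few lines of Python. This is only a sketch: it assumes a 16-bit PCM WAV file, and it reports peak and RMS level in dBFS (0 dBFS is the loudest a 16-bit file can hold). The function name is made up for this example.

```python
import array
import math
import sys
import wave


def wav_levels_dbfs(path):
    """Return (peak_dbfs, rms_dbfs) for a 16-bit PCM WAV file."""
    with wave.open(str(path), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        frames = wf.readframes(wf.getnframes())

    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    if not samples:
        raise ValueError("the file contains no audio")

    full_scale = 32768.0
    peak = max(abs(s) for s in samples) / full_scale
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / full_scale

    def to_db(x):
        return 20 * math.log10(x) if x > 0 else float("-inf")

    return to_db(peak), to_db(rms)
```

A very low RMS value while speaking normally would point at the input volume or gain being too low.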
Further, instead of VB-Audio, it's recommended you use https://software.muzychenko.net/freeware/vac470lite.zip
unless you're already doing that
Yes, I am
My voice model is also specifically trained with soft talking and whispering
And this problem happens whether I am whispering or not so
From what I understand is that it ... just happens on w-okada. It happens less on vonovox, which is why people tend to recommend that for the latest nvidia cards.
so there is definitely a tool aspect in it there
Yeah I understand 😅 but vonovox doesnt handle whispering well or at least for me cause I need rmvpe_onnx for it to work well and vonovox doesn't support that
The chunk / block size needs to be in the green; if you have all that your settings should be optimal
Like this?
That is way too much, but yeah basically
It's like ... 30 over what it actually uses when it starts to be green.
Oh
It's to keep in account the spikes that happen every now and then
Yeah there have been a lot of spikes
lower chunk size means that your GPU will be busier, but it should be fine
rmvpe onnx seems to be made specifically for the dml version of the original rvc project's voice changer, which w-okada implemented. No one has updated it for two years. I am not sure whether the regular one also has dml support, but it seems people are copying and pasting from each other's projects rather than actually checking the code.
It looks like dr87 actually had rmvpe onnx working on vonovox at some point but didn't notice a difference, so he dropped implementing it. (as he saw no reason to keep this in)
at least I can tell that from the discord chat messages.
I suppose it's an okay feature request lol
but yeah, idk if I can help with this further. ONNX models themselves are broken on the latest version of tg-develop's fork.
(related thought: maybe one version earlier has better performance for onnx specifically)
(Just a little update so im not gonna ping u im just gonna put this here incase people run into the same issue lol)
i was able to fix it by putting these settings on as well as having rmvpe instead of rmvpe_onnx, i thought rmvpe wouldnt handle whispering well but at the end it definitely did. The choppiness also went away. Thanks a lot for the help, Rumi!
Hey, so can this Qwen3 TTS model be used for Speech2Speech as well?
@void flume sorry to ping you but, is there any way i can replicate the settings you've told me about earlier to help me fix my problem on tg develop's w-okada to vonovox?
It's not choppy or anything but the voice quality and whispering sounds pretty bad relative to tg-develop
Also… i think theres stuff missing in vonovox for me, extra settings like extra time and crossfade dont exist
Which version? I believe those are hidden/removed in current beta for whatever reason
Still there in 1.6.9
Yep, just discovered 1.6.9, it works very very well now.
I was using beta when i was having that problem
However whispering still sounds very bad even with smart sine on or off.. any fixes? @hardy yew
I'm afraid I have no clue. Perhaps the vonovox questions section will be more successful at investigating the issue
was your model trained with whispers?
Yes.. i would say i trained it with an insufficient amount of whispers though, so Ill try training another model with a bigger and better dataset.
hey guys is there a rvc other than applio?
I use RVC WebUI, but i was told applio is way easier than it
is it complicated to use rvc webUI?
I don’t know how complicated applio is but i’d say yeah, its pretty complicated
the ui is in chinese (which you can translate to english) but i made my first model there
i just want to make a cover 😭
But i did also test it with a model that was trained specifically for whispering (i used it on w okada and it worked perfectly). However testing it through Vonovox resulted in the pitch spiking whenever i would whisper
Local RVC apps are just what they are. There are some forks that simplify features that sound essential, some features you might not want to use and such but that's really it.
I see you've already got help from someone else here while I was at work. To open the ".zip.001" one with 7-Zip, you must ensure the ".zip.002" part is in the same folder.
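If 7-Zip still refuses, the same check can be done by hand. The sketch below assumes the ".001", ".002" files are plain byte-for-byte slices of one archive (which is how split volumes like these are normally made); it verifies that the numbered parts sit next to each other and glues them back into a single .zip. The helper names are made up for this example.

```python
import shutil
from pathlib import Path


def find_parts(first_part):
    """Return name.001, name.002, ... for as long as consecutive parts exist."""
    first_part = Path(first_part)
    if not first_part.name.endswith(".001"):
        raise ValueError("expected the file ending in .001")
    stem = first_part.name[: -len(".001")]
    parts = []
    number = 1
    while True:
        candidate = first_part.with_name(f"{stem}.{number:03d}")
        if not candidate.exists():
            break
        parts.append(candidate)
        number += 1
    return parts


def join_parts(first_part, output):
    """Concatenate all parts into one file and return the parts used."""
    parts = find_parts(first_part)
    if len(parts) < 2:
        raise FileNotFoundError("need at least .001 and .002 in the same folder")
    with open(output, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return parts
```

The joined file can then be opened like any normal .zip.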
How would one properly go about training a model with whispering if you dont mind me asking?
And also do I really have to slice my data into 10 second audio clips? Cause many people say you have to and many people say you don't
ive trained a pretrain that had like 30 hours of whispering and still they werent realistic
but answering your question, maybe add 10 minutes of whispering, the model wont be able to do them right 100% of the time tho
I don’t know how to explain it and I don’t know if this is how it’s meant to work but every single voice thing I have just changes my pitch and whatever, if I tune them they sound pretty much identical, why do they all sound the same?
When I’m home I’ll give examples but they all just sound like my voice just slightly different between them all
But I don’t know if that’s how it’s meant to work or it’s just my voice orrrr I’m doing something wrong
So I recently started using Applio and I've struggled removing backing vocals from my songs and vocals, UVR isn't working for me for some reason and at the moment I'm stuck using songs that don't have any backing vocals because that's all that works, I really want to make covers of all my favourite Japanese songs but I can't, can anyone help me out?
will do, thanks! Also I don’t need it to be suuper realistic lol, just yk passable and believable
-rvc
How much minute max the mp3 voice need to be ?
It's a 2h mp3
How much time is recommended to get good result ?
For Minus X you have to chop it up into pieces of up to 15 minutes each
It's better to use uvr locally if you have large data like that
Ask someone like a helper as idk anything about local uvr
Btw it's better to use wav over mp3
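Chopping a long recording into 15-minute pieces doesn't need an audio editor if the file is already WAV. A minimal sketch using only Python's standard library follows; it assumes an uncompressed PCM WAV input (an mp3 would have to be converted first), and the function name and output file naming are made up for this example.

```python
import wave
from pathlib import Path


def split_wav(src, out_dir, chunk_seconds=15 * 60):
    """Split a PCM WAV file into consecutive pieces of chunk_seconds each."""
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(src), "rb") as wf:
        params = wf.getparams()
        frames_per_chunk = int(wf.getframerate() * chunk_seconds)
        index = 1
        while True:
            frames = wf.readframes(frames_per_chunk)
            if not frames:
                break  # reached the end of the recording
            out_path = out_dir / f"{src.stem}_part{index:03d}.wav"
            with wave.open(str(out_path), "wb") as out:
                out.setparams(params)
                out.writeframes(frames)  # header is fixed up on close
            written.append(out_path)
            index += 1
    return written
```

A 2-hour file would come out as eight pieces with the default setting.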
usually for a make a realistic voice
how much time you need of the voice ?
And what option I choose ?
I stay on music and vocals ?
Ty for ur answer by the way
Hey guys. I have a question about training your own AI voice models (which sing).
I saw a guide that said you need to make a lot of individual vocal pieces that are up to 10 seconds long in FL Studio. Now, my question is
How many of these pieces do I need to make to get a decent voice?
Please help me if you know.
It's dependent on the audio, not time. You could have a lot of very high quality audio but it not have much emotion to it so it would come out bad
There's no need to cut up the audio files like that anymore, applio (the program used to train ai voices) does it for you
Just get around maybe 15-30 minutes of audio from anywhere as long as it's not too low quality, remove background music/noise and reverb/echo
Use a site like mvsep or minus x
Or uvr
Can you tell me how to use it? You have to insert whole songs with vocals and it will trim them for you?
Can't right now as I'm busy working with my grandpa
Ok thanks !
Is the program free?
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Is this because the 5th gen gpus arent supported?
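One way to check that suspicion is to compare the GPU's compute capability with the architectures the installed PyTorch build was compiled for. The sketch below is a simplified version of that comparison: it assumes precompiled kernels run on a GPU with the same major version and an equal or higher minor version, and it ignores any `compute_XX` (PTX) entries in the list. The helper names are made up; with PyTorch installed, the real inputs would come from `torch.cuda.get_device_capability()` and `torch.cuda.get_arch_list()`.

```python
import re


def parse_arch(arch):
    """Turn 'sm_86' into (8, 6); return None for anything else."""
    match = re.fullmatch(r"sm_(\d+)(\d)[a-z]?", arch)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def has_kernel_for(capability, arch_list):
    """True if some compiled arch matches the GPU's (major, minor)."""
    major, minor = capability
    for arch in arch_list:
        parsed = parse_arch(arch)
        if parsed is None:
            continue  # e.g. 'compute_90' PTX entries are skipped here
        arch_major, arch_minor = parsed
        if arch_major == major and arch_minor <= minor:
            return True
    return False


# Example values: a build compiled up to sm_90, checked against a GPU
# reporting capability (12, 0), as RTX 50-series cards do.
older_build = ["sm_50", "sm_60", "sm_70", "sm_75", "sm_80", "sm_86", "sm_90"]
print(has_kernel_for((12, 0), older_build))  # False
```

A `False` here would be consistent with the "no kernel image is available" error, meaning the program ships a PyTorch build that predates the GPU.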
All programs I mentioned are free
What's this error for?
Hey! Anyone got an answer? not trying to rush this or anything but im really curious,, lol
Hey, I’m trying to make a believable AI voice for music with RVC, I already have the vocals and the beat but I’m not really sure how to combine everything the right way, is there a good step-by-step tutorial that explains the full process?
You could use a daw like fl studio to put both together
You don't, applio does that for you now
Just keep it all one audio file
I just spent 2 hours chopping up audios
should've asked that earlier instead of wasting my time lol
Well, at least now i know i dont need to, thanks a lot!
Wait but, I can still use the audios I've chopped up right?
Does applio let me do that?
Got it, but I wasn’t talking about training or slicing audio, I already have a finished track with vocals and a beat and I just want to swap the vocals with an AI voice.
Oh that's what you meant, you just need to separate the vocals and music, then use applio to switch the singer to whatever ai model is available here
-rvc
The docs for applio is the second link
You can but it's easier to just have them as one audio file
Anyone? If I need to give a example can we go in DMs!
This is solved I assume, but to summarize: Chunk size is called 'block' in vonovox. The rest carries basically the same name.
This happens on the official w-okada voice changer branch with 5th gen nvidia gpus yeah, but also sometimes for no apparent reason, in which case restarting your system may fix the issue.
Try vonovox, or tg-develop's w-okada fork; both of these support your gpu.
Read AI Hub docs for help with that (see bot message).
Guides for Programs that use RVC Models in Realtime for Calls/Games
A Realtime Voice Changer with similar performance to Wokada Tg-Develop Fork, with extra features, but it supports only Nvidia GPUs on Windows 10/11 unlike other options that have wider support. and without cloud options GUIDE
A personal fork (modified version) of Wokada Deiteris Fork, it just adds some Quality of Life improvements to it like supporting Spin Embedder and Audio Effects. Don't expect too much about it since the creator made it originally as a personal project. GUIDE
Deiteris' fork (modified version) of wokada that doesn't get updates anymore. GUIDE
For Windows Nvidia, Both Wokada Tg-Develop fork and Vonovox have similar performance & quality. Users should read the pros and cons for both and choose based on their differences, such as UI and Vonovox's paid effects.
Read Wokada Tg-Develop Fork Pros&Cons & Vonovox Pros&Cons
These options are not recommended for use.
Not suggested, older versions in youtube tuts are even way worse. GUIDE
The program is worse compared to the ones above, and much less updated. GUIDE
The wokada application won't work. I can't get it to play game audio and playback the voice for me while using it. Yet on my laptop it works just fine.
I literally copied all my audio settings for input and output on my desktop so it matches my laptop
Could it be because I have a different virtual audio cable on my pc vs my laptop
- grab tg-develop's fork of w-okada instead of the official one. (see the above message for info)
- use "virtual audio cable lite" (vaclite) instead of VB-Audio
- Ensure your headset is the default audio output device, and your cable the default input device. (this will make sense later)
- Set Input Device on the voice changer to your mic
- Set Output device to the cable.
...
Assuming other applications use the default device as mic, others should be hearing the voice changer now.
What is your new PC's GPU? And what is your laptop GPU? Your 2024 PC once had NVIDIA GeForce GTX 1650, the newer one might feature something else you haven't yet stated.
RVC (retrieval-based voice conversion; e.g. Applio RVC) and the W-Okada realtime voice changer are free and open-source software.
I ended up fixing it last night. Had to change a couple things because my new pc is amd
Not a direct answer. 
the weight site is shutdown how do I download new models
hey does loading files on vonovox work for you guys? it says failed to load file for me
are you loading an mp3 file? It shows that error for me when I do, I fixed it by loading wav instead
oh that worked thank you
i don't care about latency or lag because i only plan to run files through vonovox, with this in mind which settings should i max to optimise quality over speed
In that case I believe (correct me if I’m wrong) you can just max out block size and extra time, otherwise just keep it close to a high value and adjust to your liking
yeah but when I click on the link some can't work cause I can't do new accs
okayy
Guides for Programs that use RVC Models in Realtime for Calls/Games
A realtime voice changer with performance similar to the Wokada Tg-Develop fork, plus extra features. Unlike other options with wider support, it supports only Nvidia GPUs on Windows 10/11 and has no cloud options. GUIDE
A personal fork (modified version) of the Wokada Deiteris fork. It just adds some quality-of-life improvements, like support for the Spin embedder and audio effects. Don't expect too much from it, since the creator originally made it as a personal project. GUIDE
Deiteris' fork (modified version) of Wokada that doesn't get updates anymore. GUIDE
On Windows with an Nvidia GPU, both the Wokada Tg-Develop fork and Vonovox have similar performance and quality. Users should read the pros and cons of both and choose based on their differences, such as the UI and Vonovox's paid effects.
Read Wokada Tg-Develop Fork Pros&Cons & Vonovox Pros&Cons
These options are not recommended for use.
Not suggested; the older versions shown in YouTube tutorials are even worse. GUIDE
The program is worse compared to the ones above, and much less updated. GUIDE
The " " was to show you what to type
How can I get permission to post in the "voice models" section?
You'll have to submit a model you've created here first then get accepted https://discord.com/channels/1159260121998827560/1452279591317278840
where do i submit the model?
Yeah it took me to a guide that I'm going through right now
In that channel, I'm not sure how exactly it works but you can ask a helper
Mods, in the AI help forum there's a guy proposing some job with a site without saying what it is
Where is it?
how do i make a voice model like can anyone send me a link?? the sapphire bot isnt helpful cause i cant get to the google colabs or wherever i need to
You need Applio, it can be used both locally and in the browser
And also a site like mvsep or minus x, or UVR
-rvc
Here's the docs for applio
i went to the uhh how to make voice models tab but i cant find any link im rlly slow
actually nvm i managed to find it
👍
Does anyone have an applio or an rvc where it contains voice isolation (option where you separate the main voice from the background ones)?
Deleted. 
Hey guys, do any of you know if this is objectively the model overtraining?
If it is, which part of the graph do I test to find the best sounding epoch?
just listen to each epoch, I gave up even looking at tensor some time last year
either you trained the model without a pretrain or you used a wrong embedder while inferencing
Cench cee?
why not just use UVR for that?
"The RVC/UVR5 combination fork" sounds possible, but there are a few potential limitations: likely a limited choice of UVR models within the same RVC/UVR5 environment, and how well the two programs can interact with each other automatically.
For better audio stem quality, UVR5 or "MSST" might be better choices.
i have no idea if i am in the right place or nah
but when i run the http it just pops up, and shits itself
nothing happens (AMD 9060xt) (i'm sure i am not the first to ever ask something like this but
yeah i'd love some help)
Can anyone download a model from weights and upload it to hugging face or somewhere?
I want a model from there
Try Tg Develop's W-Okada fork instead. https://github.com/tg-develop/voice-changer/releases/download/b2397/voice-changer-windows-amd64-dml.zip The original W-Okada version, especially its DirectML variant, won't work.
So i just need to change the files included in my MMVCS folder and it should be working?
Only if those voice models were trained directly on Weights.com. But since most models there were poorly trained, you can either forget about them and make requests in #1159289738314919936, train some yourself as better "replacements", or I might download it and upload it to a temp site.
**No.** Just extract the newer one to a different folder.
I already have requested them here
https://discord.com/channels/1159260121998827560/1466522426912276718
If you can upload them, I can download it as fast as possible
and it will still be able to reach the forks i have downloaded? meaning i can just use the normal included http.bat file 
(?)
Better to just replace
That's not how you request a voice model there. What I mean by #1159289738314919936 is to ask a model maker member to train a complete one from scratch.
Ah sorry I will delete it
But can you download that model and send it to me
I repeat, just extract the newer one to a different folder. If you extract the newer one over the older one's folder, the program will either fail or not work.
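If you'd rather script the "different folder" part, here is a minimal sketch using Python's standard `zipfile` module. The function name and the example destination path are made up for illustration. It simply refuses to extract on top of a folder that already has files in it, so an older install can't get mixed with the newer one.

```python
import zipfile
from pathlib import Path


def extract_to_fresh_folder(zip_path: str, dest_dir: str) -> Path:
    """Extract a zip into dest_dir, refusing to extract over existing files."""
    dest = Path(dest_dir)
    # Refuse to extract on top of an existing install (a non-empty folder).
    if dest.exists() and any(dest.iterdir()):
        raise FileExistsError(f"{dest} is not empty; pick a different folder")
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return dest
```

For example, `extract_to_fresh_folder("voice-changer-windows-amd64-dml.zip", r"D:\voice-changer-fork")` would unpack the download into a new folder (the destination here is only a placeholder) and raise an error if that folder already contains files.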


I'll get to it and see what i can put under the covers
so...
nothing happened 
i have the main file location in one folder, i have the one linked in a COMPLETELY different folder aaand the issue is the same
Try "Virtual Audio Cable Lite" instead of VB-Cable and see if that fixes it.
well the main issue is that i can't even hear anything, realistically speaking i don't even get an application to open up
Don't try to run W-Okada fork with the "start_http.bat" batch file, it would mess things up.
i didn't
i tried to run the main one from hugging face
This is the program file for W-Okada fork.
If you downloaded the "main" one from Hugging Face, it won't be that "W-Okada fork". Tg Develop hasn't hosted their repository on Hugging Face, only GitHub.
so where the hell am i when i found like a huge batch of MMVCS versions where i could download loads of versions of it? 😭 on hugging face, i don't know if it's okada fork or whatsit
like all of the "videos" i can find about it list that website
You didn't see this https://github.com/tg-develop/voice-changer/releases/download/b2397/voice-changer-windows-amd64-dml.zip link I sent to you earlier? It's literally the download link for W-Okada fork, damn it.

well thanks for the help I'll try to ask around some friendos too maybe they know something

This is the guide for Tg Develop's W-Okada fork. https://docs.aihub.gg/realtime-voice-changer/local/tg-develops-w-okada-fork/ Your friends might only know the original version of W-Okada, not the W-Okada fork, and that's about it.
Last update: November 22, 2025
Are there some collab ones?
Is there a way that I would be able to hear myself while using the voice changer
which one are you using? most have a feature that is called monitor, that should let u hear yourself
just set it as whatever headphones or headset u have
There is a way in your sound settings to listen to the virtual audio cable (vac lite) but I can't remember atm how
Hey i like your models dude⭐
So my Rvc Client isnt starting
Thx
You said it's a client right? What download link did u use
Might have smth old
MMVCServerSIO
of the link?
The voice changer program you're using
MMVCServerSIO is just the exe file that starts the program
Many have it so I wouldn't know which one you're talking about
Nvidia Geforce RTX 4080 Super
i dont have image perms
its just the default one
the one with white and blue
where it says realtime voice changer client
Oh
i dmed you
i got a new audio interface
after that it stopped working
if that narrows it down
You should switch to wokada tg fork, the one u sent in dms is Suuuuuper old
I personally use tg fork, it's pretty nice
-rt
but that means i gotta put all the models back in one by one?
why did it stop working?
weird.
No idea, could just be because it's old software from over a year or two ago
i kept using that because it was the most robust version
The screenshot u sent doesn't have any models u added tho, just the default ones that came with that app
yeah thats from google
because i can't START THE SOFTWARE
XD
O lol
Weird
the console
i downloaded Wokada Deiteris Fork earlier and that one was such a mess
it didnt even keep the original pitch of the input?
lol
Wdym?
u did clean the vocals of the song right?
like reverb and background vocals
btw since u wanna do that too u should get vonovox instead of tg fork
for whatever reason at least from what I know Wokada tg fork doesn't allow inputting audio files
here's the download to vonovox
https://github.com/dr87/Vonovox/archive/refs/tags/v1.6.9.zip
it used to work with the old one but these new forks keep changing the pitch?? im trying the one you told me to download rn
vonovox right?
oh
that one in particular won't allow inputting song vocals
from what i know at least I cannot find it, that's why I said switch to vonovox instead
wdym?