#hear myself
As expected, you're using the original version of W-Okada, which is outdated. Download and try the better W-Okada from here: https://docs.aihub.gg/rvc-voice-changer/local/deiteris-w-okada-fork/#download-nvidia-on-windows
Last update: May 5, 2025
Also, use Virtual Audio Cable lite instead of VB-Cable, as VB-Cable causes random issues for Windows users: https://software.muzychenko.net/freeware/vac470lite.zip
I already do that.
Any progress? The original one and the better one are not the same.
Can't I use the original?
Don't use the original one. It's outdated. Otherwise, you can stay with the original W-Okada, but expect much slower performance and buggy RVC voice model selection.
then help me get the other one

You have an NVIDIA GeForce RTX 3060, as shown in your screenshot. Download the better W-Okada from this link: https://docs.aihub.gg/rvc-voice-changer/local/deiteris-w-okada-fork/#download-nvidia-on-windows
Oh mate, I just highlighted where to click in my screenshot. It's in the "Download NVIDIA on Windows" section, not "NVIDIA RTX 5000-series", which is for NVIDIA GeForce RTX 50 series GPUs.
uwu
Sorry, but I won't be joking around this time. 
?
Forget about it. Any progress on trying the better W-Okada?
okay
what now?
Send the screenshot.
Go into the MMVCServerSIO folder, and you'll see an .exe file named "MMVCServerSIO". That's the actual W-Okada program. Double-click it to run it.
F0 Det: select rmvpe
GPU: NVIDIA GeForce RTX 3060
Chunk: around 60 - 90 ms
Extra should always be 2.7 s
Input: your microphone
Output: if you have installed VAC lite, select "Line 1 (Virtual Audio Cable)"
If you wanna hear what W-Okada is outputting, you can set Monitor to your main speakers/headphones.
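The settings above can be written down as a small checklist. This is a hypothetical Python sketch: `VoiceChangerSettings`, `check_settings`, and the field names are made up for illustration and are not W-Okada's config format (W-Okada sets these in its GUI). Only the suggested values come from the list above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VoiceChangerSettings:
    # Hypothetical field names; W-Okada does not read this class.
    f0_detector: str
    gpu: str
    chunk_ms: float
    extra_s: float
    input_device: str   # your microphone
    output_device: str  # "Line 1 (Virtual Audio Cable)" if VAC lite is installed
    monitor_device: str = ""  # optional: your speakers/headphones to hear the output


def check_settings(s: VoiceChangerSettings) -> List[str]:
    """Return warnings for values that differ from the suggested settings."""
    warnings = []
    if s.f0_detector != "rmvpe":
        warnings.append("F0 Det: select rmvpe")
    if not 60 <= s.chunk_ms <= 90:
        warnings.append("Chunk: use around 60 - 90 ms")
    if s.extra_s != 2.7:
        warnings.append("Extra: use 2.7 s")
    if s.output_device != "Line 1 (Virtual Audio Cable)":
        warnings.append('Output: select "Line 1 (Virtual Audio Cable)" if VAC lite is installed')
    return warnings


settings = VoiceChangerSettings(
    f0_detector="rmvpe",
    gpu="NVIDIA GeForce RTX 3060",
    chunk_ms=75,
    extra_s=2.7,
    input_device="Microphone",  # placeholder name for your mic
    output_device="Line 1 (Virtual Audio Cable)",
)
print(check_settings(settings))  # prints []
```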
Chunk: around 60 - 90 ms
This makes my sound quality so bad 😢
Sometimes I'm at a loss when someone asks me for every single step, because then that person never learns to think for themselves and relies on other people for everything.
Extra is what determines audio quality, not Chunk. Chunk is what causes delay: a higher number means more delay, and a lower number means less delay. The Extra number, on the other hand, sets how much audio data your GPU processes at a time, which is why the audio can sometimes sound low quality when it's set too low.
If you still hear low-quality audio even with Extra set to 2.7 s, or you just haven't uploaded a voice model yet, try a better model from #1175430844685484042.
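To make the Chunk/Extra split concrete, here's a rough back-of-the-envelope sketch in Python. It assumes a simplified model that is not taken from W-Okada's source: each conversion step processes the newest chunk plus Extra seconds of earlier audio as context, and a chunk must be fully recorded before it can be converted. The 48 kHz sample rate and the function names are also assumptions for illustration.

```python
def chunk_samples(chunk_ms: float, sample_rate: int = 48_000) -> int:
    """New audio samples collected per chunk (48 kHz is an assumed rate)."""
    return round(chunk_ms / 1000 * sample_rate)


def window_samples(chunk_ms: float, extra_s: float, sample_rate: int = 48_000) -> int:
    """Samples processed per step in this simplified model: new chunk + Extra seconds of prior audio."""
    return chunk_samples(chunk_ms, sample_rate) + round(extra_s * sample_rate)


def steps_per_second(chunk_ms: float) -> float:
    """How often a step must run to keep up in real time (smaller chunk = more often)."""
    return 1000 / chunk_ms


for chunk_ms in (60, 90):
    print(
        f"chunk={chunk_ms} ms: waits at least {chunk_ms} ms per chunk, "
        f"processes {window_samples(chunk_ms, 2.7)} samples per step, "
        f"{steps_per_second(chunk_ms):.1f} steps/s"
    )
```

Under this model, Chunk mostly sets how long you wait before each piece of audio can be converted (the delay), while Extra mostly sets how much audio each step gets to look at. That's why raising Extra can help quality but also makes every step heavier for the GPU.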
more delay = better quality
try to take it fully down
and try to talk
That's not how it works. More delay is simply more delay. The higher the Extra number, the better the quality.
For example, if you say some words into W-Okada and it outputs your converted voice 6 seconds later, that's delay.
ngl u should start researching it before saying some sh1t
ik it adds delay, but the more u get, the better the quality of the audio that comes out
in higher chunks the model retains more context from the audio, so the result is more stable
Oh, so you're saying you know better than me? That's wild. Now can you explain why "chunk" would affect the audio quality when Extra doesn't?
exactly

extra iirc are extra chunks to help retain context even more, so the quality increases
i can't remember 100% how extra works
but it has to be something related to context; rvc is context-based, so the more u have, the better the results
If you're confused about something in W-Okada, you can ask. You didn't have to tell me I was saying nonsense all the time and to go do some more research.
yeah, extra is extra chunks meant to retain context; the more extra, the better and more stable the result, but it also adds some delay
2.7s = keeps 2.7s of prior context
max is 5s
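The "keeps 2.7 s of prior context" idea can be pictured as a rolling buffer. This is an illustrative Python sketch, not W-Okada's actual code: the `ContextBuffer` class, the 16 kHz default sample rate, and clamping Extra to the 5 s maximum mentioned above are all assumptions for the example.

```python
from collections import deque
from typing import List

MAX_EXTRA_S = 5.0  # the max Extra mentioned above


class ContextBuffer:
    """Hypothetical rolling buffer holding the most recent `extra_s` seconds of input audio."""

    def __init__(self, extra_s: float, sample_rate: int = 16_000):
        extra_s = min(max(extra_s, 0.0), MAX_EXTRA_S)  # clamp Extra to 0..5 s
        self.capacity = round(extra_s * sample_rate)
        # A deque with maxlen drops the oldest samples automatically once full.
        self.samples = deque(maxlen=self.capacity)

    def process_chunk(self, chunk: List[float]) -> List[float]:
        """Return prior context + new chunk (what the model would see), then remember the chunk."""
        window = list(self.samples) + list(chunk)
        self.samples.extend(chunk)
        return window


# Tiny demo with a 100 Hz "sample rate" so the numbers stay small:
# Extra = 2.7 s keeps 270 samples of context; each chunk is 6 samples (60 ms).
buf = ContextBuffer(extra_s=2.7, sample_rate=100)
for _ in range(100):
    window = buf.process_chunk([0.0] * 6)
print(len(buf.samples), len(window))
```

The first few chunks see little or no context; once the buffer fills, every step sees the full 2.7 s of earlier audio plus the new chunk, which is the "more context, more stable result" effect described above.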
While it's possible to set Extra above 2.7 s, up to 5 s, and that does improve the quality further, it also makes the audio cut off a lot, so 2.7 s is the best overall. A higher Extra does add delay, but that also depends on your GPU. Some people reported that Extra at 2.7 s caused delay and even unstable audio on their GTX 10 series GPUs. There's another guide with full details on the W-Okada fork here: https://rentry.co/ForkVoiceChangerGuide#settings
^tl;dr 2.7 s Extra is mostly the optimal balance between quality and performance
as going beyond it doesn't give a noticeable quality gain for the extra performance load