#✨│ai-help
How can I look for a voice model?
anyone know how to fix this ?
2025-07-03 22:02:27,643 ERROR [VoiceChangerManager] CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
running deitris' w-okada voice changer, on a gtx 1070.
happens at a random point in time, after 2-7 mins.
running out of VRAM most likely
I thought so too, but according to Task Manager it never uses more than 3 of the 8 GB of dedicated memory
i feel like it might be inhibited by something, but cant tell why or how
anything else running with hardware acceleration? discord / browser / some 3d game?
discord, 1 browser with a video, and voicemeeter
any overclocking/undervolting?
none
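The error text above suggests its own next step: rerun with CUDA_LAUNCH_BLOCKING=1 so the stack trace points at the call that actually failed. A minimal sketch of launching a program with that variable set, assuming the voice changer is started from a command line. The helper names are made up for illustration, and the executable path in the usage note is a placeholder, not a verified path for this setup.

```python
import os
import subprocess


def cuda_debug_env(base=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base is None else base)
    # Kernel launches become synchronous, so an error is reported at the failing call.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_with_cuda_debug(command):
    """Run `command` (a list of arguments) with CUDA_LAUNCH_BLOCKING=1 set."""
    return subprocess.run(command, env=cuda_debug_env(), check=False)
```

Usage would look like `run_with_cuda_debug([r"MMVCServerSIO\MMVCServerSIO.exe"])`. Synchronous launches are slower, so this is only worth doing while hunting the crash.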
can i get help for voice changer ?
do i just leave the embedder on default (hubert_base_112) i also see contentvec and whisper?
I have a 4060 and a AMD Ryzen 9 7950X3D
My mic is picking up the audio but it says "pipeline is not initialized"
any specific settings that make a huge difference in the voice model? cant tell if the models bad or if its my settings
Trying to run Deiteris' W Okada on an M1 MacBook Pro and getting the following error even after doing the proposed fix of using "xattr -dr com.apple.quarantine" to fix it. On Sequoia 15.2. Anyone have any ideas on what the issue is?
it is to disable quarantine for downloaded stuff
I think
"
This attribute is added so that it can ask for user confirmation the first time the downloaded program is run, to help stop malware. Upon confirmation the attribute should be removed automatically, and then the program will run normally.
"
Sorry I'm a little confused. You're talking about the command to disable the apple quarantine right? I already did that through the "xattr -dr com.apple.quarantine" command they provided in the tutorial.
I think the issue might be an outdated MacOS version as a previous person has posted here but I'll post an update after ive updated.
yeah that didnt work i fear 😞
-colab
Google Colab is a Cloud (Remote Good PC) Service. While the Free plan provides up to 12 hours of daily usage, the GPU is typically available for only about 4 hours each day on average.
by IA Hispano
Google Colab
by Hina
Google Colab
by Eddy
Google Colab
by Eddy
Google Colab
by Deiteris & Hina
Google Colab
by Shiro & Eddy
Google Colab
by Nick088
Google Colab
by Nick088
Google Colab
by Jarredou & Makidanye
Google Colab
- chunk only affects latency; if it's set lower than what the gpu can handle, the audio may cut out
- extra and "force fp32" in advanced settings may give fine-grained quality improvements, but dont expect much if the model itself doesn't sound good
check some models in mvsep.com like some multi stem SCnet & BS roformer, then the drumsep models
do i need download that 3 file (zip) for RTX 5000 W-Okada?
i only know 7z usually use 01 02 03, but dont know about Zip
you need all 3 files, it is simply split to allow upload to github
ok thank you 🤝
it's peak
How do I make AI voice unnoticeable not like in real time like record in weights
download the 5000 series version
you don't use weights 
Oh what do I use?
anything else :3
can someone tell me why is the voicechanger super delayed?
I had realtime voice changer client for 2 years now. is there a new update or a new client??
what are ur settings in the voice changer, which download link did u use
watched a youtube video
Hey, I’m using AMD and I’m wondering what I should be downloading. VCC is really acting slow on my pc and I don’t know how to fix this - it lags
Previously I downloaded a light and working VCC but I can’t remember where it’s at
send link
Download all 3 files, then extract the .zip file, it will automatically extract ALL 3 FILES into one. Then open the MMVCServerSIO folder and run MMVCServerSIO.exe (or called MMVCServerSIO if you don't have extensions activated).
send a screenshot of the settings u have
What do I use?
bro left 
what do u mean "in weights"
So Like I use the voice in weights
then what do you expect of it?
No, they say that I don't need to use weights
So what else do I do to not make Ai recorder or anything unnoticeable
nah I dont think that's the point
so you want the results sound like recorded in average mic?
there are some post processing effects you could do
so try searching it
Hello, sorry to bother you, and excuse my English, it is not my first language. I have a question about how LoRAs are made, or alternatively how to make them with Colab. I tried to install kohya locally using Pinokio, which did work, but an error occurs: I set all the parameters and all the necessary folders to create a LoRA, but it tells me that the folder where the images go has not been found, or that the path does not exist. I tried a thousand ways to verify that it exists, and it does, but I don't know why it isn't picked up. Does anyone know how to fix that error? Failing that, I don't know how to make LoRAs in Colab, because I couldn't find updated or currently working links; all the ones I tried gave me an error or something like that. I would like to know if someone could help me or knows something.
YASSSSSSSS
it could be, but now i have to figure getting that final voice into a game 
Hey can you check my dm? I got a question about it
Buddy atleast send the dm 😭
Sent 
Can someone help me? whenever I have to mix two models, I get an error
but I tried, with two 48k models, I tested with several models
Can you send me links to some that work?
I've already tried running it locally and via Google Colab
it's because Applio treats "48k" ≠ "48000" due to models prob trained using different fork/version
so try using mainline rvc
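To illustrate the kind of mismatch described above, here is a minimal sketch of normalizing sample-rate labels before comparing two models. This is not Applio's code; the function names are hypothetical and only show why a plain string comparison of "48k" and "48000" fails while a numeric one succeeds.

```python
def normalize_sample_rate(label):
    """Convert labels such as "48k", "48000" or 48000 to an integer in Hz."""
    text = str(label).strip().lower()
    if text.endswith("k"):
        # "48k" -> 48000; round() guards against float noise in cases like "44.1k".
        return int(round(float(text[:-1]) * 1000))
    return int(text)


def same_sample_rate(a, b):
    """Compare two sample-rate labels numerically instead of as strings."""
    return normalize_sample_rate(a) == normalize_sample_rate(b)
```

With this, `same_sample_rate("48k", "48000")` is true even though the two strings differ, which is the comparison a model-merging step would need.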
mainline rvc?
gg
'VoiceChanger' object has no attribute 'resampler_in' what does this mean?
everytime i try to launch the start_http.bat file its crashes
guys which ai voice changer is good most ive seen are 1 year old are there any up to date ones
This is a General AI Server, we won't be focused on voices anymore
Elaborate:
- your PC GPU
- your operating system
- what you want to do
- what tutorial link are you using
- a screenshot of the program
start_http.bat is a part of original wokada, ALL VIDEO TUTORIALS ARE OLD, DONT TRUST THEM, wokada deiteris fork is better
there's no updated video tutorial, only written guides, tell your pc gpu and what you want to do
Hello i just want to make a pokemon song but i want to change the lyrics how can i do that
Hello, would it be possible to make a voice like this?
https://youtu.be/JupFhvq36PA?si=ZGWF3U1_RYvYcOrq
i imagine using something like ace-step with an instrumental track + new lyrics for audio2audio
You can search rvc ai voice models at:
- https://discord.com/channels/1159260121998827560/1175430844685484042
- In https://discord.com/channels/1159260121998827560/1163592055830880266 , Do /find with @earnest musk
- https://weights.com/ (login required)
- https://huggingface.co/models (but watch out cus in hugging face there arent only rvc ai voice models)
- Suggested Models for Realtime Voice Changing (Wokada)
- https://voice-models.com/
- https://thevoicemodels.com/ (for Turkish Models, login required with discord and level 2 on their server)
if there isnt one, you can:
- make it yourself with our docs guides
- Ask a free request in https://discord.com/channels/1159260121998827560/1159290139609137264
- Be aware that we don't allow any paid comms, so don't fall for any "pay me 20 dollars and i will make the model for you" dm
:wave: @low shard, How can I help?
Available Commands:
• @weights find <query> or /find <query> - Search for RVC Voice Models
Really?
What's the name?
Just download it and use it in okada 💔
Ohh I mean on mobile sorry!
I am buying a computer soon
U can dm me if u want bc idk what you're talking about
Ohhh you use mobile
Yes
Yeah you're pretty limited on options until u get the computer
Oh I see
I don't know of any websites besides weights to record your voice and have it output as an ai voice model
Sorry.-.
Maybe some helpers or mods know tho
Oh is okay
Or some QC (idk what it stands for)
qc aren't helpers
guys may i ask: my regular mic works fine but my virtual cable mic somehow has my computer audio bleeding into my mic. its been messing up my voice ai as a result, what can I do to fix it?
i have installed a model from #1175430844685484042 , no matter what i do though the model won't use my rx 6600 xt instead uses my cpu which kills the performance a lot any way i can fix this?
This is a General AI Server, we won't be focused on voices anymore
Elaborate:
- your PC GPU
- your operating system
- what you want to do
- what tutorial link are you using
- a screenshot of the program
what? u mean model?
what's ur pc gpu?
rx 6600 xt
windows 11
utilizing gpu for the model to perform better. right now it only uses cpu
can't upload screen shots due to missing permissions
you didn't elaborate everything
what do you want to do? what tutorial link are you using?
!give-media-perms 1h @solid sequoia
i want to use my gpu instead of cpu for the model so it performs better
and for tutorial link are you refering to youtube links?
i want to use my gpu instead of cpu for the model so it performs better
how do you want to use the model? what are you planning to do? ai covers? realtime voice changer for calls?
realtime voice changer for calls and ingame voice chat
theres many different ai programs
RVC = Retrieval-based-Voice-Conversion, the best Few Shots Speech To Speech AI Models (on v2), Inferences (use models) pre-recorded audio (ai covers) and train (make) models. Technically, Mainline RVC does have a go-realtime.bat (aka RVC-GUI), but it's pretty messy and outdated so it's extremely not suggested for realtime.
Wokada = uses RVC for realtime inference. There's 2 main versions, Original made by Wok, and the most suggested one is Deiteris Fork (modified version)
do you need wokada or rvc?
yes, or the download link of the program
i came from this tutorial, used the links given in the desc
https://www.youtube.com/watch?v=SxdnGxicJOg&ab_channel=novision
all video tutorials are outdated
that tutorial uses an over year old original wokada lmfao
I wrote it in the comments
you just wasted time using that tutorial basically
delete the program, and delete vb audio cable too
it seemed to work perfectly fine though, i just need it to use gpu instead of cpu
alright
which link can i find a newer version on
it's outdated, dont bother using it
plus that version sucks for amd
-realtime
Guides for Programs that use RVC Models in Realtime for Calls/Games
Most suggested. GUIDE
ONLY the latest alpha comes close to the Deiteris Fork performance, older versions in youtube tuts are way worse. GUIDE
Unavailable, the guide is outdated and the program is worse compared to the ones above, and much less updated
read the 1st link
alright thank you
wokada deiteris fork
let me know
it works so much better now both low cpu and gpu usage, more responsive as well thank you, 1 last question though my "echo" "sup1" "sup2" settings seem to be disabled i couldn't find anything about it in the audio setup page you sent me
they are disabled in server mode
anyone had issues where they followed this guide https://docs.aihub.gg/rvc-voice-changer/realism/
it broke their youtube?
youtube just keeps saying "Audio renderer error. Please restart your computer"
Have you tried restarting your pc
yeah it works whenever i dont have voicemeeter potato on
im pretty sure that i followed every step it told me
Try clicking a1 on both the things in voicemeeter
And in the first one if that doesn't work
Huh that usually fixes it
should i remove the b1 and mono in stereo input 1?
Try turning off the denoiser you have on in the first column
So it's picking up sounds just not outputting them?
In hardware out a1 you have that set to your headphones right
yeee
wait how does the input work?
cuz when i talk it doesnt input anything
i dont see any instructions to put my headset mic input anywhere
oh so the wokada output will be the line 1 (virtual cable)?
Then the output gets put into voicemeeter then into light host then outputs into discord or whatever
Like this?
"Once you have completed all of the above steps you can now go into anything and set the mic input to "Voicemeeter Out B2"."
oh i got confused with the instructions
I'm confusing myself lol
this is where i got it from
So input is your mic output is line 1 and monitor you can leave empty
I'm going to redo that guide when I get home
oh ok
is it possible to add more details on what changing the setting actually do
so people know what they are actually changing
maybe have a bracket next to instructions saying what it does
Yeah def
are these still the best settings for the t-de-esser 2
@crude flame just realised that when i downloaded virtual cable lite it automatically set everything to line 1 and thats why everything is breaking lol
apparently on windows 11, there's system -> sound input output
and theres also system -> sound -> volume mixer input output

that's confusing
program to use the templates?
elaborate
This is a General AI Server, we won't be focused on voices anymore
Elaborate:
- your PC GPU
- your operating system
- what you want to do
- what tutorial link are you using
- a screenshot of the program
for voice models to create TTS with rvc
Does anyone know of a software or method to remove audience noise and get results like this?
https://youtu.be/0xj7UiwVOa0?si=aHo1Igg_cCklyMHp
yeah xminus has a decrowd feature
I tried it but it is not very stable, there is noise in the instrumentation
like that of Mvsep
cant you replace it with the oficial instrumental?
But the drums and guitar? I'm impressed by the intro, since everything is identical to the original Live instrumental. I've been trying to get such results for a year.
Hey i have a question, is there an AI tool for generating subtitles? I want to show something to my friend but he doesn't understand it cuz it's in my native language and not english
updated the guide
also found a way to not route system audio to voicemeeter so you dont have to deal with missing audio
i just tested it on a fresh version of voicemeeter so it should work
you can run the audio thru ASR like Whisper
depending on the quality of the audio and language you may get something decent... or not
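A minimal sketch of the Whisper route for subtitles, assuming the openai-whisper package is installed. The two helpers below only turn a list of segments (each with `start`, `end`, `text`) into SRT text; the Whisper calls are left as comments because the model size and the file name `clip.mp3` are placeholders.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Build SRT text from segments shaped like {"start": s, "end": s, "text": str}."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


# Usage with openai-whisper (assumed installed; file name is a placeholder):
# import whisper
# model = whisper.load_model("small")
# result = model.transcribe("clip.mp3", task="translate")  # translate -> English text
# print(segments_to_srt(result["segments"]))
```

`task="translate"` asks Whisper for English output rather than a transcript in the original language, which fits the question about showing the clip to an English-speaking friend. Quality still depends on the audio and the language, as noted above.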
you can upload the audio to youtube and let it make a transcript
Thank you!
i kept getting this error help
norton antivirus?
weird
@simple ore what do u recommend for me to use for the ai voice in games?
i already uninstalled a antivirus like 2 days ago
bitdefender
help
when i click this "ok" it would say this error
why are you copying?
im not copying..
you need to download all 3 files, use 7-zip to unzip
dont use windows BS
it is the worst
not winrar?
can i use winrar
what do i download for the voice changer?
@fleet cedar what do u have that u put the voice to?
@simple ore can u give me the right settings for my GPU
@fleet cedar whats that?
bro cant u clearly see
its obviously the AI rvc voice
how to fix
can u send a link for the download
@fleet cedar
read the guide
Hello, I don’t know if this is the right place to ask, but I don’t know how to change my voice in real time. Which app should I use? Because I have several models that people sent me, but I don’t know how or where to use them?
-rt
Guides for Programs that use RVC Models in Realtime for Calls/Games
Most suggested. GUIDE
ONLY the latest alpha comes close to the Deiteris Fork performance, older versions in youtube tuts are way worse. GUIDE
Unavailable, the guide is outdated and the program is worse compared to the ones above, and much less updated
@fleet cedar alr i got it setup now how can i get it to actually work in game and in discord?
some reason it doesn't open
Is this channel for dedicated RVC assistance?
it's about any general ai help, rvc included
This is a General AI Server, we won't be focused on voices anymore
Elaborate:
- your PC GPU
- your operating system
- what you want to do
- what tutorial link are you using
- a screenshot of the program
Gotcha but you guys will still keep up RVC troubleshooting support still?
yeah ofc, we just merged the channels to have less channels
if you need help, pls elaborate
Yo
My voice changer is stuttering
It sounds so bad
Its like tweaking
My words are transfering but its static
This is a General AI Server, we won't be focused on voices anymore
Elaborate:
- your PC GPU
- your operating system
- what you want to do
- what tutorial link are you using
- a screenshot of the program
Oh
voice changer is too generic, there could be over 100 programs classified like that lol
you need to elaborate more to get help, else we dunno even how to help
!give-media-perms 1h @plain pumice
now u can
also it would be better u elaborate everything
all the infos i asked are crucial
lemme guess, youtube tutorial?
you're using an over year old version of original wokada
its the same as using windows xp in 2050 basically
LOL
also vb audio cable has been reported to cause issues on windows
all video tuts are outdated
How do I still use it
you can simply uninstall everything and forget you even watched it basically
simply, you can't, its a shitty version
you need a better one
what's your pc gpu? what's your operating system?
Gpu?
that's crucial
Intel core i5
Oh
gpu = graphics processing unit
this isn't chatgpt, it runs on your hardware and its way more intensive and complex
Oh
gpu does all the complex tasks like gaming, 3d, and AI
You can check your pc gpu on Windows via:
ctrl+shift+esc (task manager) -> Performance tab -> GPU
so yeah, don't expect a 1 click experience, AI is not really user friendly
chatgpt runs on everything bc it runs on cloud, remote good pc
while this is a program that runs on your hardware, not a rich hardware by someone else
Intel (r) uhd graphics
do you have any other gpus?
check gpu 1 and gpu 0
the one you mentioned is integrated graphics, it's literally too weak to do any type of AI and to even get recognized as a GPU lol
Oh
don't expect AI to run on bad hardware, it's more intensive than gaming
soo, you got another GPU?
How do I check?
just send a screenshot of your task manager
you should have gpu 0 and gpu 1 maybe
the whole task manager
the performance tab
I know I may be wasting ur time
(I definitely am but not on purpose)
Im just if you call it.. A caveman when it comes to checking ur pc and allat stuff
@low shard
yep you don't have any other GPUs
You got 3 options:
- Buy a better pc
- Run it locally (on ur pc) using the CPU mode of the wokada fork which has better performance https://docs.aihub.gg/rvc-voice-changer/local/deiteris-w-okada-fork/ (but this isn't suggested as it could be unstable)
- Use **cloud** (remote good pc):
About Cloud, there are different services:
- Google Colabs (4 hours daily of free T4 gpu, easy to use, require only a google account) :
- W-Okada's Deiteris' Fork Voice Changer Google Colab (currently works only on google colab PAID tier)
- How to use Original W-Okada's Voice Changer Google Colab (has a Guide) (currently broken)
- Kaggles (30 hours weekly of better GPUs, T4x2 & P100, harder to use, requires an account and a phone number):
- W-Okada's Deiteris' Fork Voice Changer Kaggle (the best and only working one currently for free)
- Original W-Okada's Voice Changer Kaggle (currently broken)
so yeah that's why it was laggy, other than being an old version with worse performance, you also got bad hardware, so not a good combo
the best options are just either using cloud or buying a better pc
How do I use cloud?
click this link and read the guide
reminder that it's more complex and you got limited free time btw
and you also need to give your phone number, as it's a google service and they dont want you to use alt accs
rvc and wokada do 2 different things
RVC = Retrieval-based-Voice-Conversion, the best Few Shots Speech To Speech AI Models (on v2), Inferences (use models) pre-recorded audio (ai covers) and train (make) models. Technically, Mainline RVC does have a go-realtime.bat (aka RVC-GUI), but it's pretty messy and outdated so it's extremely not suggested for realtime.
Wokada = uses RVC for realtime inference. There's 2 main versions, Original made by Wok, and the most suggested one is Deiteris Fork (modified version)
they do not want u to use alt accs lol
hey guys, how i use a model from the #1175430844685484042 ?
Elaborate:
- your PC GPU
- your operating system
- what you want to do
-realtime
Guides for Programs that use RVC Models in Realtime for Calls/Games
Most suggested. GUIDE
ONLY the latest alpha comes close to the Deiteris Fork performance, older versions in youtube tuts are way worse. GUIDE
Unavailable, the guide is outdated and the program is worse compared to the ones above, and much less updated
1st link, wokada deiteris fork
Anyone know realistic ai photo generator which can use your face to make a picture
yeah, I mean custom text to speech, I see decade driver models here but i don't know how to convert text to speech into my style
Thank youu
Also when creating images of people do they look genuinely
And can i feed it data to create pictures from it
@simple ore and also which out of these do u think is the best omnigen2/flux/SDXL
how can I make my own text to speech?
make sure you're using the latest Applio
and do manual install with latest pytorch (2.7) and cuda 12.8
the original rvc and even mangio won't work at all
a small correction
if not the latest release
clone the Applio repo itself https://github.com/IAHispano/Applio
then just double click run-install.bat
it should include torch 2.7.1 which is needed for RTX 50-series
okay ill try this
@knotty moth can u check dms for a sec i asked u a question there i can send it here too if u want
no just ask here
Do you know a good ai photo generator which can use a face to generate pictures
I saw flux and omnigen2 are good ones but which would you recommend me
do u mind sending links for those, i dont want download the wrong thing
just incase i may have
where can I train AI voice using google colab
@knotty moth so what do you think
currently trying it rn
downloaded pytorch 2.7 and cuda 12.8
didnt do anything
proof
idk but im tempted on just getting out my 4070 and using that
cuz ik it will work
is it more or less the same thing?
ill look into it when i wake up
been trying to get this to work for ab 10 hours
rvc doesnt work
applio doesnt work
i hope but thx
to do image edits, like replacing characters/merging images, both omnigen2 and flux kontext
full screenshot how you installed it
For creating people or changing small features (making ai influencer) which would u suggest me to use
it also requires a small fix to add "50" to "infer-web.py"
omnigen2 then
all in one
you're messing something up then
Can I dm u the screenshots tmr I just got off
That’s exactly what I did
I got that line straight from their website
Even installed cuda 12.8 or wtv
i require a screenshot of how you done it
because there are 20+ who said it works fine
The only version of Python I could get to work with that version of PyTorch was 3.11.9
hey is there still like a list of what settings to use with which gpus
I’ll send it to u tmr
seems like a bad website, for amd 6xxx XT gpu's it says its MAX settings are 128 + 2.7s and then like a few sentences below the table it says the 6650 XT can do 60-80 ms
read the section above
torch 2.7.1 exactly
you were trying on 2.7.0
Oh
if you run through run-install.bat it should have installed torch 2.7.1
I found the hugging face version
I’ll look at it again when I wake up later
does not matter, both work
RVC1006Nvidia requires a small manual fix
This is a General AI Server, we won't be focused on voices anymore
Elaborate:
- your PC GPU
- your operating system
- what you want to do
- what tutorial link are you using
- a screenshot of the program
Hello, can you please advise if there is an up to date guide to install and use RVC on a pc with AMD graphics card (7800xt) for real time voice changing? Thanks
Hi, did Weights remove the option to log in with github?
I don't understand, since I'm not a developer, I'm just here to use it.
get the AMD one here
Last update: May 5, 2025
thanks
@simple ore hi i got a question about the installation my gpu is nvidia 5060ti and my cuda version is 12.9 on the pytorch the newest one is 12.8 but the flash attention doesnt support the pytorch version of 2.6.0 with this cuda version what should i do? Should I download the 12.4 one and the standard flash attention? (this is about the omnigen2), will it work like that despite my gpu being newer version
you dont need cuda toolkit
as long as you have torch cu128 either 2.7.0 or 2.7.1 it is fine
you can find flash attention 2 wheels here https://huggingface.co/lldacing/flash-attention-windows-wheel/tree/main
download cp version that matches your python install (3.10, 3.11, 3.12)
@simple ore so i need to install with this: pip install torch==2.7.1+cu128 --index-url https://download.pytorch.org/whl/cu128 and install one of the flash attn versions corresponding to my python (3.11) uve listed?
you install torch and torchvision pip install torch torchvision --upgrade --index-url https://download.pytorch.org/whl/cu128
then you download cp311 flash wheel and use pip install the_name_of_the_downloaded_file.whl
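The "cp" part of a wheel name has to match the Python that will install it, which is what tripped things up here with Python 3.13 versus a cp311 wheel. A minimal sketch of checking that before running pip; the helper names are hypothetical, and it relies only on the standard dash-separated wheel naming (name, version, python tag, abi tag, platform).

```python
import sys


def cpython_wheel_tag(version_info=None):
    """Return the CPython wheel tag, e.g. "cp311" for Python 3.11."""
    major, minor = (version_info or sys.version_info)[:2]
    return f"cp{major}{minor}"


def wheel_matches_python(wheel_filename, version_info=None):
    """Check whether a wheel's python tag matches the given (or running) Python."""
    tag = cpython_wheel_tag(version_info)
    # Wheel names are dash-separated: name-version[-build]-pythontag-abitag-platform.whl
    parts = wheel_filename[: -len(".whl")].split("-")
    return len(parts) >= 5 and tag in parts[-3].split(".")
```

Running `cpython_wheel_tag()` inside the venv shows which "cp" version of the flash attention wheel to download.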
i just found out that the python version ive downloaded is 3.13 if i download 3.11 will it work?
how can i swap it to 3.11, just delete it?
install 3.11, make sure you select 'add to path' checkbox on the install screen
alr imma do this rn thanks
there are multiple versions of 3.11 (after the 11 which version should i get)
ty im downloading the stuff rn
is it normal to be this slow for a simple image
also @simple ore how can i train models for it (can i actually do it)
im seeing this for the past 5 min
did you make sure you have cuda torch installed?
check device manager/performance / memory and vram use
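A quick way to answer the "do you have cuda torch installed" question from inside the same Python environment the program uses. This is a minimal sketch using standard PyTorch attributes; the function name is made up, and it returns a small report instead of failing when torch is missing.

```python
def describe_torch():
    """Report whether torch is installed and whether it can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return {"installed": False}
    info = {
        "installed": True,
        "torch_version": torch.__version__,     # CUDA builds usually end in e.g. "+cu128"
        "built_with_cuda": torch.version.cuda,  # None on CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info
```

Calling `print(describe_torch())` with the venv's Python is enough. If `cuda_available` is false, generation falls back to the CPU, which would be consistent with a progress bar that sits at 0/50.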
Hello all !
I am a newbie here and not a developer at all, I dabble a bit and mostly just surf, read, and do lots of trial and error.
I am currently working on mods for Cyberpunk 2077 for my private usage (not for sharing on nexusmods or else for copyright issues).
I saw a lot of "voice ai" swaps for the main character and wanted to create my own with a voice actor I really appreciate (a french dubber for a character in a tv show).
I recently tried Zonos to create TTS audio with a sample of the french dubber and the result is quite good.
But that is just the beginning, now I am in front of the hardest part :
Take all the audio files of the character in the game , and create modified audio files of those source files but with the cloned voice I get on Zonos.
And so I have two options :
Either I get all the text of those audio files and script something with python to batch generate the audio files using Zonos.
Or I find a tool allowing audio-to-audio conversion using a cloned voice (is that even possible and does it exist ?)
My configuration is : i9 14900 / 64gb ram / RTX 4090 / Windows 11 Pro
Any help/pointers would be deeply appreciated ^^ (and I repeat : I am not a developer, I can dabble and am willing to learn but consider me an utter noob)
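For the first option, a batch script is mostly a loop over transcript files. A minimal sketch, assuming one `.txt` transcript per game line; `synthesize` is a hypothetical stand-in for whatever TTS call ends up being used (no Zonos API is assumed here) and must return encoded audio as bytes.

```python
from pathlib import Path


def batch_generate(transcript_dir, output_dir, synthesize):
    """For every .txt transcript, call synthesize(text) and save the returned bytes.

    `synthesize` is a hypothetical callable: text in, encoded audio bytes out.
    Output files reuse the transcript's name with a .wav extension.
    """
    transcript_dir, output_dir = Path(transcript_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for txt in sorted(transcript_dir.glob("*.txt")):
        target = output_dir / (txt.stem + ".wav")
        if target.exists():
            continue  # lets an interrupted batch be resumed
        target.write_bytes(synthesize(txt.read_text(encoding="utf-8")))
        written.append(target.name)
    return written
```

Keeping the output names identical to the source names makes it easier to swap the files back into the game afterwards. The second option, audio-to-audio, is what RVC is described as doing elsewhere in this channel (speech to speech with a trained voice model).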
ughh i followed ur steps
from here i have done the steps till 2
then i deleted the pip files and followed ur instructions
i didnt do the 3.2 tho
i will try doing that rn
should i change --upgrade here or leave it like this
also this is the requirements txt file
torch==2.6.0
torchvision==0.21.0
timm
einops
accelerate
transformers==4.51.3
diffusers
opencv-python-headless
scipy
wandb
matplotlib
Pillow
tqdm
omegaconf
python-dotenv
ninja
ipykernel
wheel
triton-windows; sys_platform == "win32"
install the requirements, then upgrade torch
Hey there, I can help you create a voice model of the person you're talking about ^^
i changed the text file to
torch==2.7.1
torchvision==0.22.1
timm
einops
accelerate
transformers==4.51.3
diffusers
opencv-python-headless
scipy
wandb
matplotlib
Pillow
tqdm
omegaconf
python-dotenv
ninja
ipykernel
wheel
triton-windows; sys_platform == "win32"
and i will check if it works now
i copied the torch and torchvision from the cmd from the installing step before it
its still sitting on 0/50
and doesnt move
can u type step by step what i should do to fix this (sorry if im being too annoying)
with this command? pip install torch torchvision --upgrade --index-url https://download.pytorch.org/whl/cu128
or venv\scripts\python -m pip install torch torchvision --upgrade --index-url https://download.pytorch.org/whl/cu128
i dunno why you're using conda
regular python venv
and if i delete conda and the file itself (conda create -n omnigen2 python=3.11
conda activate omnigen2) i change the conda part here to venv?
after that is done, pip install torch torchvision --upgrade --index-url https://download.pytorch.org/whl/cu128
and pip install flash.whl
i will try that rn
Would love that ! I'm all ears !
Alrighty I can help out in dms since these two are doing nerd stuff 👍
how can I make custom text to speech
I mean using custom models, not pretrain models
i did what u told me and it still stays like this
I dunno how the rvc works I just know how to clean datasets and how to read graph
Trust
Whoever is typing RN your name is all rectangles lmao
It's scary
I am proof that anyone can make a voice model as long as they try
TTS are zero shot, no dataset needed, only a few seconds
rvc requires training, 10 mins minimum for increased consistency
fun fact: rvc core component (vits) is 'hacked' in order to do speech to speech conversion instead of tts
yup actually rvc first "name" was just vits
In the context of RVC, the dataset is an audio file containing the voice the model will replicate. It can be either speaking or singing.
@simple ore i tried reinstalling and following your order again but it stays the same just 0/50 0%
You mean RVC voice model and not TTS? Alright, there are ways to train a voice model.
https://www.bilibili.com/video/BV1A14y1a75R/ this is probably the very first rvc model ever made, when it was internally just named "vits" (since rvc is a modified vits)
check venv/lib/site-packages folder and see what torch is installed
torch-2.7.1+cu128.dist-info its this
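The folder name above already answers the question: the `+cu128` suffix marks a CUDA 12.8 build, while a name without a suffix usually means a CPU-only build on Windows. A minimal sketch of reading that out of a dist-info folder name; the function and pattern are illustrative, not part of pip or torch.

```python
import re

# name-version[+local].dist-info, e.g. torch-2.7.1+cu128.dist-info
_DIST_INFO = re.compile(
    r"^(?P<name>[A-Za-z0-9_.]+)-(?P<version>[^+-]+)(?:\+(?P<local>[^-]+))?\.dist-info$"
)


def parse_dist_info(folder_name):
    """Split a dist-info folder name into (package, version, local build tag or None)."""
    match = _DIST_INFO.match(folder_name)
    if match is None:
        raise ValueError(f"not a dist-info folder name: {folder_name!r}")
    return match.group("name"), match.group("version"), match.group("local")
```

For the folder reported here this gives `("torch", "2.7.1", "cu128")`, so the installed torch build itself looks right and the stall is more likely elsewhere.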
rn im trying running it without the flash-attn to see if it will work
would it be easier for you to just use comfyUI?
for live voice changers, is w okada still the best or is there something new?
-rt
Guides for Programs that use RVC Models in Realtime for Calls/Games
Most suggested. GUIDE
ONLY the latest alpha comes close to the Deiteris Fork performance, older versions in youtube tuts are way worse. GUIDE
Unavailable, the guide is outdated and the program is worse compared to the ones above, and much less updated
idk how to set it up and also still doesnt move without the file
i will try rq to reboot pc and see if it will work
??
read the guide above
oh thank you
can u also send a guide for comfyui
whats the difference between the top one and the 2nd one?
Wokada Deiteris Fork
use that
okay thx
Deiteris' fork W-Okada and "original" W-Okada are both versions of W-Okada, but they are developed separately by different authors. Deiteris' W-Okada is better. What are differences? Performance and some bug fixes, especially.
oh i see thanks
Last update: May 5, 2025
does anyone know where the download button is i cant find it 😭
ok i know
how about tts
Don't.
What is your PC GPU? NVIDIA GeForce or AMD Radeon RX?
nvidia 3080
thank you
should i uninstall my old w okada??
then use rvc to emulate decadriver
Yes.
I think none of u all know about kr decade
Yes, you guessed it right. I have no idea what Kr Decade even is.
I tried to recreate custom decade driver sound for my decade driver
Applio has a built-in TTS feature, but that's edge-tts; RVC itself isn't TTS.
Sound driver? I only know Intel/Realtek HD Audio and Creative Sound Blaster as sound card drivers.
no
"toy" sound ok

i don't mean audio driver fr
i mean changing sound in my decade driver bootleg toy model
do i have to run any of these bat files?
or do i just straight do the exe
The exe
Just delete all files related to it
thats it??
Should just be in that folder unless they got replaced by the new stuff
Yup
when i run the new exe is it js gonna replace the old one?
I don't think so
oh wtf
Pretty sure the files would've already been replaced by default unless you chose to skip or smth or didn't get that option after extraction
oh i mean like the actual w okada
Yah?
i havent done any ai voice stuff in years
but when i was making models
i used to use applio
is there a new one ppl are using now or is it still applio
because i feel like a lot has probably changed
There's applio and mainline
And local rvc stuff which idk anything about
But applio still exists ya
Nothing new I know about
@simple ore i tried doing the comfyui one u suggested but im always getting this error
read the link, download the files, place them into right places
i did that
ohhh ok thanks
here for an example
is this bad?
Uhh
sweeet mother of god
Try running it again
okk imma try
what 😭
with the files placed into right places
i have same settings same stuff
oh right for this, im kinda stupid but the highest amd card on it is 7xxx XT
im getting a 9070 XT, so im wondering what applies to that
okay i think it finished, it's supposed to open in ur browser right?
Yup!
Btw if u wanted we could move the convo to dms
ohh okayyy
Ye
@simple ore is it possible for me to screenshare for us to fix the omnigen2 local version in vc here?
pls i wanna kym at this point 😭
```
cd OmniGen2
py -3.11 -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
pip install torch torchvision --upgrade --index-url https://download.pytorch.org/whl/cu128
pip install https://huggingface.co/lldacing/flash-attention-windows-wheel/resolve/main/flash_attn-2.7.4.post1%2Bcu128torch2.7.0cxx11abiFALSE-cp311-cp311-win_amd64.whl
python inference.py --model_path "OmniGen2/OmniGen2" --num_inference_step 50 --height 1024 --width 1024 --text_guidance_scale 4.0 --instruction "The sun rises slightly, the dew on the rose petals in the garden is clear, a crystal ladybug is crawling to the dew, the background is the early morning garden, macro lens." --output_image_path outputs/output_t2i.png --num_images_per_prompt 1
```
i will try it now
its still sitting on 0
that parameter should help
let me try this, so i add it at the back of this python inference.py --model_path "OmniGen2/OmniGen2" --num_inference_step 50 --height 1024 --width 1024 --text_guidance_scale 4.0 --instruction "The sun rises slightly, the dew on the rose petals in the garden is clear, a crystal ladybug is crawling to the dew, the background is the early morning garden, macro lens." --output_image_path outputs/output_t2i.png --num_images_per_prompt 1
at the end
alr
its working now, will it work if i do python app.py --enable_model_cpu_offload
for some reason, the "2025-07-03 22:02:27,643 ERROR [VoiceChangerManager] CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
"- error is fixed when I run a game of sorts on the same PC.
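The error text itself suggests `CUDA_LAUNCH_BLOCKING=1`, which makes CUDA calls synchronous so the reported stack trace points closer to the failing call. A minimal sketch of preparing that environment from Python; `debug_env` is a hypothetical helper, and the actual start command of the voice changer is not shown here because it depends on the install:

```python
import os

def debug_env(base_env=None):
    """Return a copy of the environment with synchronous CUDA launches enabled.

    Hypothetical helper for illustration; it does not modify os.environ.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env

# Demonstration with a tiny stand-in environment.
env = debug_env({"PATH": "/usr/bin"})
print(env["CUDA_LAUNCH_BLOCKING"])
```

The returned dict can be passed as `env=` to `subprocess.run` together with whatever command normally starts the program. Expect lower performance while the variable is set, so it is only meant for debugging.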
you mean, you cannot hear yourself, as in your real voice or your ai-converted voice ?
python app.py --enable_model_cpu_offload
and/or --enable_sequential_cpu_offload
both
no sound at all
when i hit passthru i hear myself tho
probably the voice isnt converting
how do i fix that 😭
i sent a screenshot of my options in #1192011222023950368
thanks it works perfect now
how do i add a voice model
Elaborate:
- your PC GPU
- your operating system
- what you want to do
- what tutorial link are you using
- a screenshot of the program
This is a general AI server, we can't know which program you're talking about, so please answer the questions I asked you; we need more information
oh i downloaded the realtime voice changer and a voice that I want and im confused on how to add it
is there a tutorial video?
please reply to the questions I asked you, there's thousands of different programs
and your pc gpu and operating system are crucial too
also, there's no updated video tutorial for realtime voice changing, they mostly use old programs, did you follow one?
yea i followed an old one
if you followed a video tutorial, you can just delete everything you got off it, you probably got original wokada like version 1.5.3.8 and vb audio cable
AI runs at sonic speed, youtube tuts aren't the best for ai programs, they get outdated easily
4070, windows, use a voice, https://www.youtube.com/watch?v=We5oYpCR3WQ, I cant upload images
it's best you also forget everything they tell you in it, they also tell outdated info like using "crepe"
yeah, that youtube tutorial is outdated asf lol
also, we don't endorse anything that duckus does
duckus is an horrible person
duckus describes himself as a "certified catfisher"
he makes money off catfishing people for the pure fun of it
AI should be used for good and fun, not for catfishing
-realtime
Guides for Programs that use RVC Models in Realtime for Calls/Games
Most suggested. GUIDE
ONLY the latest alpha comes close to the Deiteris Fork performance, older versions in youtube tuts are way worse. GUIDE
Unavailable, the guide is outdated and the program is worse compared to the ones above, and much less updated
read up the 1st link, wokada deiteris fork
this is the only updated tutorial
wokada deiteris fork got various improvements in performance and quality
nice, let me know for any issues
how do i input a voice
like add a model?
yeah
also, if you share a screenshot of your wokada, i can help you with settings
can someone give me a simple video on how to install RVC :,) ?
(NVIDIA, WIN11)
please tell me your pc gpu and what you want to do
5070ti, use realtime voice changer (RVC), the part as to how i can use it in games/calls i know myself i just cant find the right RVC to download
every time I download the thing it says to create a cover
realtime voice changer (RVC),
Yeah that's why I asked what you want to do, RVC doesn't mean that, it means Retrieval-based-Voice-Conversion
RVC = Retrieval-based-Voice-Conversion, the best few-shot speech-to-speech AI models (on v2). It runs inference (uses models) on pre-recorded audio (AI covers) and trains (makes) models. Technically, Mainline RVC does have a go-realtime.bat (aka RVC-GUI), but it's pretty messy and outdated, so it's strongly discouraged for realtime.
Wokada = uses RVC for realtime inference. There are 2 main versions: the original made by Wok, and Deiteris Fork (a modified version), which is the most suggested one.
I think what you want is actually wokada deiteris fork, right?
share the link of the model you're trying to download
oh sorry seems i was mistaken then, i just remember having used it on AMD like a year ago from "okada".
whichever you think fits best ill get then, heres a picture of the UI i remembered
that one is a pretty outdated version of original wokada, wouldn't be suggested anymore
-realtime
Guides for Programs that use RVC Models in Realtime for Calls/Games
Most suggested. GUIDE
ONLY the latest alpha comes close to the Deiteris Fork performance, older versions in youtube tuts are way worse. GUIDE
Unavailable, the guide is outdated and the program is worse compared to the ones above, and much less updated
read up the 1st link, wokada deiteris fork
it has better quality and performance
and it supports the rtx 50 series
oh thats cool, never seen it in tutorials like you said. thanks
because all youtube tutorials are outdated lol
AI moves at sonic speed, dont trust yt tuts for everything
make a weights.com account, click the 3 dots, then download
it still says select a model first after I uploaded the pth
Sir Nick. Can i ask if there is a way to compress my voice effectively? Cuz when i use Wokada or Vonovox, discord will "bonk" it and make it pretty robotic and unnatural somehow
show a screenshot of your wokada
I have EQ band and compressor enabled
click close
click the model slot
click start

do i need to download all 3?
also, you didn't set up the settings, show an entire screenshot
like, there's cracking?
ye, sharp decline in quality
yes
bitcrush
i know it has something to do with the bitrate
what settings do u recomend?
chunk: 80ms
extra: 2.7
f0: rmvpe
input: microphone
output: line 1
monitor: headphones, optional to hear urself
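To get a feel for what those numbers mean in audio terms, here is a rough sketch that converts them to sample counts. It assumes a 48 kHz sample rate and that "extra" is measured in seconds; neither is stated above, and both helper functions are made up for illustration:

```python
def ms_to_samples(milliseconds, sample_rate):
    """Number of audio samples covered by a duration given in milliseconds."""
    return round(milliseconds * sample_rate / 1000)

def seconds_to_samples(seconds, sample_rate):
    """Number of audio samples covered by a duration given in seconds."""
    return round(seconds * sample_rate)

SAMPLE_RATE = 48000  # assumption, not taken from the settings above

chunk_samples = ms_to_samples(80, SAMPLE_RATE)        # chunk: 80ms
extra_samples = seconds_to_samples(2.7, SAMPLE_RATE)  # extra: 2.7 (assumed seconds)
print(chunk_samples, extra_samples)
```

A bigger chunk means more audio is buffered before each conversion, which adds delay but gives the GPU more time per chunk.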
What is the most reliable or update version of this AI in real-time?
yea, bitcrushing simulates how the audio would sound if it were of low quality
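A toy sketch of that idea in plain Python, assuming float samples between -1.0 and 1.0. `bitcrush` and its `bits` and `hold` parameters are made up for illustration and are not how any particular plugin implements it:

```python
def bitcrush(samples, bits=8, hold=4):
    """Toy bitcrusher for float samples in [-1.0, 1.0].

    bits: target bit depth (fewer bits means coarser amplitude steps).
    hold: keep one sample out of every `hold` and repeat it,
          which imitates a lower sample rate.
    """
    levels = 2 ** (bits - 1)
    out = []
    held = 0.0
    for i, x in enumerate(samples):
        if i % hold == 0:
            x = max(-1.0, min(1.0, x))            # clamp to the valid range
            held = round(x * levels) / levels     # snap to the nearest step
        out.append(held)
    return out

crushed = bitcrush([0.0, 0.1, 0.2, 0.3, 0.4, 0.5], bits=3, hold=2)
print(crushed)
```

Lower `bits` and higher `hold` both make the result sound rougher, which is the "worse mic" effect being discussed.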
This is a General AI Server, we won't be focused on voices anymore
Elaborate:
- your PC GPU
- your operating system
- what you want to do
- what tutorial link are you using
- a screenshot of the program
does voicemeeter have it built-in? or do you have any recommendation for an external app?
i really hope to get a cleaner voice in low quality
idk i dont use filters in realtime

whaaa
Last update: July 4, 2025
ah Kiloheart
yea just get kiloheart bitcrusher
hmmm, i hope i can listen playback so i can adjust the stat to get cleaner
can someone help with models upload..
that thing comes up when i upload anything from #1175430844685484042
its not letting me extract the zip 😭
damn, the playback sound so good but discord just bonk me hard
whats this even meaaan 😭
im actually gonna crashout
why can i unzip like everything but not this particular zip
dw i got it extracting using a different tool
that simple
no need to read
just
do that
in fact
that video should be pinned
it's easy
videos like these should be in the server for simple people like me
i already have it installed
but it still doesnt work
how many models u have lmao
uhhh
lemme look
whys nordvpn telling me this aint safe 😔 the WebUI
looks good
you replied to a bot lol
although I dont like your ( ) folder
i know
whats wrong with it
currently 115 models
where can i find the command line?

when i try to launch applio it says this
ive never had this issue
whys everything not worky
idk if i leak my IP by sending that or not lol
let's go to this guy's house and fix it for him personally
okay i got it to work
it tell you what the problem is
i fixed it
the other window with a bunch of text
is this even doing anything wtf
show the preprocess step screenshot
you did not add models or something
hmm i thought i did
all my wav and audio files i just put in there
remove "
why does it not want to take the model? (does it have to be in the actual WebUI/app directory?)
it literally just does not want to take the model
its even in the model -> 0 something directory
close the app, find the model folder, and nuke it
why does it say diddy
delete?
Heh
how do i know whats what
btw what rest of the settings should i use?
also how can i delete the "saved" audio
if i have a 32k sample rate voice, can i use text to speech and save the wav file to extract the voice and train it at a higher sample rate?
tts are usually 24k at best
is there a way i can increase the samples of a voice then
if i upload a wav file to applio which has a little bit of background sounds from games lets say will it do the extraction good
whats the best app that can isolate podcast sounds and make it pure voice audio
and whats the best app to download youtube videos as audio files
how can I fuse 2 models in applio ?
I have so much ms can someone help me
ok so will a gtx 1650 super + ryzen 3 3100 be good for w-okada? i hear a lot of background even with the steelseries gg mic. also when i talk in my native language, i don't think the model is trained on my language, because there are some words the ai can't say and it'll be obvious to anyone that it's ai
voice changers only fool a very small portion of the active gaming population at best
maybe if you do male to male conversion, and u already sound a bit like the model itself
where do you guys usually get sources from to train ai for rvc?
when you have like a good device you can change small thing and no one will notice, ofc the best will be like the creative sound blaster SB0490 but these getting like hella expensive
even if i do i will have some weird ai sound like crashing lol
what setting do you do your crossfade at
hii all
just found a perfect live wallpaper but its not well looped
you can clearly see that the vid replays each time
is there an ai that can help with this?
could someone share with me what the optimal settings are for Wokada Deiteris Fork?
like ?
You sure?
pretty sure, or everything I heard till now wasnt the best there is
Yeah, i've trolled many times and if I give them a heads up that its a voice changer they will pick up on it but 90% of the time it goes undetected lol
Tbf i do think the model sounds realistic
are breathing, and other misc sounds also realistic ?
Yeah normal speech is very realistic you can kinda hear the breath too ig?
But anything other than that is cooked LOL
ahh ... okay thought I missed something
So in my experience ive learned to just adapt to it
that helps alot ye
Bc ive been using the same one for a while i kinda know how to speak with it
ill send you photo so i can know whats your meaning, i'm not the best in eng but i can understand atleast very well
if you dont mind, id like to hear a short sample as im curious
No problem, but im trying to get it to work rn for some reason my whole thing stopped working
stage fright huh
Nope im fine with sharing it but it wont go through voicemeeter
retruthed
I was referring to the voice model itself lmao
Oh wdym
ehh ... still dont think that it is true, it sounds obvious to me up until now
rvc is very easy to spot
obviously people that dont know about ai will not immediately fall for this, but give them a couple of minutes talking with u and they're gonna spot its rvc quickly
gtx 1650 is ok, not great but ok, depends on your use case
dont bother trying to fool anyone in any language other than english its too obvious
"i hear a lot of background" if you mean noise then its microphone related, your mic is picking up every sound you make or that can be heard in the background
does anyone know if this is a good guide for rx that I should follow for my models https://rentry.co/RVC-dataset-RX11#spectral-denoising-the-audio
that was literally how i learned about rvc, some random dude was talking with me and i started to notice his voice was weird
i can giggle and moan with ai u guys got nothing on me
Idk why but my microphone works fine but it wont convert my voice everythings silent
Like wdym giggle
like hahaha?
yeah, make sense for the language i think
vewy easy
if its rvc id reckon it sounds really obvious
an indicator is if the mic is too good ngl
both sounds very ai to me
ive been told if you make the mic shittier it sounds much more believable
younger me would also notice it's ai
yea
but you can just bitcrush and that solves the problem
well one is real
or that yea
Its just placebo dude
i mean the first one yeah
Shes speaking like a voice actor
i only heard the second one and i assumed the first one was also rvc but unedited lmao
the one with the dash at the end?
is this the best guide to rx I should follow https://rentry.co/RVC-dataset-RX11#spectral-denoising-the-audio
the breath gave it away on the first one
for better result for no one to notice just make the base low
never mind, both are rvc
no
LOL
if the 2nd one was rvc id say we actually should gib up trying to recognize
Most likely yes
ok ty
just look at the spectrograms lmaooo
wrong
second one is real
dunno if you pay attention to it like i do but can tell the 2nd is the real one based on the pronunciation of "I have" in the beginning
and if the first one is real then i question my existence
people have to learn rvc can't be realistic no matter what u do
if one needs to take one of these out, then id say we can fool 95% of the population
it didnt fool me, the thing is i didnt bother in hearing them closely
razer always share that audio
skip to 36 sec and listen
as with most conversations
rvc is 2023 tech
its not a voice cloning ai
what the ai is trying to do is literally reproduce mel specs and pitches
it's not trying to clone expressions or shit because its not meant for that
all results are flat asf
vits2 rvc when?
good rvc models can fool the more casual side of the internet
but they will find out it's ai, rvc will glitch at any moment
If I were you, I wouldnt bet on that
i have been training models since 2023
when they make a good discord competitor
i know whats inside rvc, i know how it does the conversion
and i know it cant do everything
maybe not that specific way u have in mind it cant, I agree.
pretty much anything non verbal rvc cant do
the embedder is trash
but somewhere down the line I believe it will become hard to recognize
eventually yes voice cloning will be that good but now its not
now its ... meh
i agree with this, flat inferences are usually the most realistic sounding because those don't have expressions, so rvc doesnt struggle
How would they even go abt that
but really impressive compared to what we had back in the day
no one cares about ai audio so nothing is happening to it 😔
but when I hear real voice actors im like "what am I doing 💀"
we do have real voice cloning ai
Yes i have bad pc.. i dont have gpu
but they're tts
better embedder, vits2, better pretrains, better GAN/vocoder
elevenlabs?
eleven, chatterbox, yeah they're real voice cloning ai because the ai is actually learning and reproducing expressions
rvc doesn't learn expressions
So you mean if elevenlabs became like sts?
Would be pretty nuts
when you give rvc an audio file, it'll extract the mel spec and the pitch data alongside its features
then it'll try to reproduce them afterwards
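To make "pitch data" concrete, here is a toy pitch (f0) estimator that uses plain autocorrelation. This is a swapped-in textbook method for illustration only: RVC relies on dedicated pitch extractors such as rmvpe, and `estimate_f0` with its `fmin`/`fmax` defaults is a made-up helper, not RVC code:

```python
import math

def estimate_f0(samples, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate pitch in Hz by finding the lag with the highest autocorrelation.

    Only lags that correspond to frequencies between fmin and fmax are tried.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Compare the signal with a copy of itself shifted by `lag` samples.
        score = sum(samples[n] * samples[n + lag] for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

sr = 8000
tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(800)]  # 200 Hz test tone
f0 = estimate_f0(tone, sr)
print(round(f0))
```

A clean tone is the easy case. Real speech with breaths, laughs, or noise is where pitch tracking gets unreliable.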
no idea
Is there even an estimate for how far in the future until we get updates on rvc
rvc is SOTA
SOTA?
like the best sts
like no updates
the reason it doesn't learn emotions is that it's using a pretty old architecture named vits
the og dev team left rvc to do tts
Why tf
because tts are superior
arch wise
they can learn emotions and non verbal sounds better
no one cares enough about sts to update it
i think sts is cool
updating sts is a really hard task
me too
internally, rvc is actually a hack of a tts architecture
rvc-boss took vits, and did a couple of changes in order to "convert" it to sts
if we remove those changes, it'll be regular tts vits
So if there were to be strides in sts would they have to build it from the ground up?
Or can rvc as it is rn be improved
and its just shitty architecture
yea no the arch is too shit to be updated
yeah damn
some devs here have tried

