#✨│ai-help
they just want your money for nothing, otherwise you should consider this free open source alternative
https://docs.aihub.gg/realtime-voice-changer/local/deiteris-w-okada-fork
Last update: September 6, 2025
Ew voice.ai, use literally anything else, if u have Nvidia u can use any of these, if u don't have Nvidia use either tg fork or Deiteris fork
The ones here are free and better
-rt
Guides for Programs that use RVC Models in Realtime for Calls/Games
A Realtime Voice Changer with similar performance to Wokada Deiteris Fork, with extra features, but supported only for Nvidia GPUs on Windows, and without cloud options. GUIDE
A personal fork (modified version) of Wokada Deiteris Fork, it just adds some Quality of Life improvements to it like supporting Spin Embedder and Audio Effects. Don't expect too much about it since the creator made it originally as a personal project. GUIDE
Most suggested WebUI with the best general support for many platforms. GUIDE
For Windows with Nvidia, both Wokada Deiteris Fork and Vonovox have similar performance & quality. Users should read the pros and cons for both and choose based on their differences, such as UI and Vonovox's paid effects.
Read Wokada Deiteris Fork Pros&Cons & Vonovox Pros&Cons
These options are not recommended for use.
Not suggested, older versions in youtube tuts are even way worse. GUIDE
The program is worse compared to the ones above, and much less updated. GUIDE
any like
real time ai voice changer
where i can upload a voice model
and it voice change in real time
just see above you
@heady grove The deiteris-w-okada fork is pretty good. Adjust the settings in the "finding my own settings for chunk" section to make it much smoother sounding.
https://docs.aihub.gg/realtime-voice-changer/local/deiteris-w-okada-fork/#finding-my-own-settings-for-chunk
I downloaded the Deiteris fork for amd like 6 months ago, were there any big updates since then?
Whats that?
w okada but its a fork for amd I think
Try opening Task Manager and ending this program's task. There are better realtime voice changer programs than this one.
There are like 4 different realtime voice changer versions as of now. The most recent version of the Deiteris fork of W-Okada is b2332. Another fork that functions identically to Deiteris but is only made to run on RTX 50 GPUs is b2335. Tg Develops' fork, forked from Deiteris W-Okada, has more features and is the most recent one. Vonovox is the different one; it would give much better quality, but its GUI is less friendly and it's only known to work with NVIDIA.
!howtoask
- Check Docs & Guides: Your answer may already be in the AI Hub Docs or the https://discord.com/channels/1159260121998827560/1159513888199540817 channel.
- Search the https://discord.com/channels/1159260121998827560/1192011222023950368 : Look for existing posts that solve your issue. Do not invade someone else's post.
Tell us your:
- Full GPU Name: (e.g., NVIDIA RTX 3060)
- Operating System: (e.g., Windows 11)
- Detailed Description: What were you trying to do and what went wrong?
- Tutorial Used: Link to the guide you were following.
- Screenshot: A picture of the full error message is very helpful.
To maintain a legal, safe & ethical community, we will NOT provide help for:
- (E girl, as an example) catfishing/trolling, scamming, impersonation.
- NSFW/Porn.
- Any illegal activities.
Requests for these topics will be ignored and may result in moderation action.
- Be Polite & Patient: Our helpers are volunteers. You may ping the Helpers role once.
- English Only: Please keep all conversations in English.

Applio Lightning.AI exists too btw
also, the refresh method doesn't work? wdym?
the issue is related to ngrok lowering the free tier requests per min
@nocturne mural btw not sure if you're aware of https://docs.aihub.gg/rvc/cloud/applio-lightning-ai/ , but lightning.ai allows free tier web uis + gives good hours monthly
I'm not sure if I should maybe PR the Lightning.AI notebook later, because there's no risk in me having the studio template public, and it would just make it easier for users to use it rather than uploading it manually every time
Last update: August 8, 2025
also welcome back to see you vidal, it's been a while
as said above it still works by "waiting for a minute"
@nocturne mural how much time did it take for your horizon tab to get inactive? I've never seen that before either and can't find any info related to it
how to use
Hello, please check: https://docs.aihub.gg/rvc/local/applio/#training
Last update: August 9, 2025
"CUDA is not available - voice conversion will not work"
Anyone encountered this error while launching Vonovox?
I have CUDA, what packages are missing?
Is Vonovox the latest version? What is your PC GPU? And what are you trying to do with the program?
I just git cloned Vonovox from the code itself. What's the way to confirm if it's the latest version?
I'm only launching Vonovox and it showed that error.
You can try downloading the zip directly from GitHub. https://github.com/dr87/Vonovox/releases
I think that's the same as git cloning from this:
I have installed and setup everything
be sure your gpu drivers are up to date via the nvidia app
and that your windows is up-to-date too
if you still get issues, it might be a random issue that you should fix via the precompiled installation: https://docs.aihub.gg/realtime-voice-changer/local/vonovox/#precompiled-setup-nvidia-on-windows
Last update: September 6, 2025
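A quick way to narrow down a "CUDA is not available" message is to ask PyTorch directly from the same Python environment the program uses. This is only a sketch and assumes the program relies on PyTorch for GPU access (typical for RVC-based tools, but not confirmed here for Vonovox); `cuda_report` is a made-up helper name.

```python
def cuda_report() -> str:
    """Return a one-line summary of whether PyTorch can see a CUDA GPU."""
    try:
        import torch  # assumption: the voice changer uses PyTorch
    except ImportError:
        return "PyTorch is not installed in this Python environment"

    if not torch.cuda.is_available():
        # torch.version.cuda is None on CPU-only builds of PyTorch,
        # which is a common cause of this error even with a working driver.
        return (
            f"CUDA not available (torch {torch.__version__}, "
            f"built for CUDA: {torch.version.cuda})"
        )

    return (
        f"CUDA available: {torch.cuda.get_device_name(0)} "
        f"(torch {torch.__version__}, CUDA {torch.version.cuda})"
    )


if __name__ == "__main__":
    print(cuda_report())
```

If it reports a CPU-only build, the GPU driver is probably fine and the installed PyTorch package is the thing to replace, which is one reason the precompiled setup can sidestep the problem.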
oh the link got taken down
Precompiled Version of Vonovox
it was a typo in the docs, I'll fix it rn
here's the current latest version: https://huggingface.co/dr87/vonovox/blob/main/Vonovox169.zip
Remove the /latest in the url and there's this.
thank you both
You're welcome.
The typo has been fixed in the docs
yw, lmk
I wish we could post screenshots here
All good now.
do you need any other help?
I tried multiple times and either I'm not doing it right or it just doesn't work consistently
I would need help relearning how to use lightning AI but I'd assume it would have the same issues unless it doesn't require ngrok
there are multiple tunnels, like 5, and you can use the normal gradio tunnel instead
maybe you didn't wait exactly 1 minute
how can i make this sound better if u respond pls ping me
Best Voice Changer for my specs?
Lenovo Legion 7i
RTX 4070
i9-14900HX
Thanks in advance
I don't know of any tunnels as I didn't even know Ngrok existed until I had to move to kaggle
Nvidia, AMD, or Intel
as in GPU or CPU?
RVC isn't the same as a realtime voice changer for calls
RVC = Retrieval-based-Voice-Conversion, the best Few Shots Speech To Speech AI Models (on v2). It inferences (uses models on) pre-recorded audio (AI covers) and trains (makes) models. Technically, Mainline RVC does have a go-realtime.bat (aka RVC-GUI), but it's pretty messy and outdated, so it's strongly not suggested for realtime. There are also updated forks with extra features, like Applio.
Wokada = uses RVC for realtime inference. There are 2 main versions: the Original made by Wok, and the most suggested one, Deiteris Fork (modified version)
Vonovox = Another Realtime Voice Changer based on RVC, with similar quality and performance to wokada deiteris fork but other perks
The lightning.ai guide explains each of them, give it a try, gradio is the default one that's used even in google colab
Ahh, thank you for clarifying, forget the RVC part
so, just realtime voice changer for calls/games?
Yeah
-realtime
Vonovox is suggested for your setup
He said he has an RTX 4070, which is an Nvidia GPU
I was going to go for that, but I heard PC and Laptop GPUs are different so I wanted to see if I should actually go with that based on my specs
I don't have the greatest memory but I'll try keeping that in mind for next time
A Laptop is a PC (Personal Computer), you're most likely confusing the PC term with the Desktop type
It is true that laptop gpus are weaker compared to their desktop counterparts, but you still got a good GPU for this program, just not as good as an rtx 4070 desktop version
Learnt something new, I'll keep that in mind. Thank you
When I get home today I'll ping u if that's alright, could u send the guide then for me to read over?
You're welcome, let me know :)
Last update: August 8, 2025
Will do!
Anyone here from South East Asia Countries? I would love to connect with you folks! 😃
Should i put this text here? 🤔
That's what I'm trying to find out, but an estimate would be about 30 minutes.
was it 30 minutes of absolute nothing, or were you training/inferencing in the background?
It was probably the second one; in TensorBoard I had the auto-refresh option enabled, which I thought would keep the page 'active', but still the URL just died.
Or maybe the limit isn’t about inactivity, but rather about how long that URL can stay active.
This should probably be in general chat ^^
Heya, wanted to ask what the current best program is for a voice changer using AI models
That's weird, should I maybe research other free tunnels?
zrok
I didn't have that issue with horizon personally, tried myself
I'm using Awesome Tunnelling for finding maybe better alternatives, i will see
@nocturne mural wait Gradio Public URL doesn't crash Kaggle anymore? they even embed it (this is a test notebook)
lemme check with applio rq
Have you tried using it for training? Of course, for some cases you can just do a few interactions and that's it, but if, for example, I want to enable the custom pretrained option, or during inference I want to enable some extra process, the interface just crashes.
And by crashing, I mean that the interface simply doesn’t update properly, and it doesn’t throw a connection error, which makes me think that sometimes some Gradio requests are just ignored.
Are there models that can somewhat clone African or Spanish singing?
@simple ore @nocturne mural Good news! Gradio fixed the Kaggle crash bug:
https://github.com/gradio-app/gradio/issues/6132#issuecomment-2842251870
Describe the bug I have installed gradio using "!pip3 install gradio==3.45.0 typing_extensions==4.5.0" command.It works fine on google colab, but it doesn't work on kaggle.It loads th...
Should I just add Gradio as the default value? or do I need to add any other tunnels?
I feel like gradio is the best
But it would still be necessary to create tunnels for the filebrowser and tensorboard.
What is this 💔
Is it another potential fix for the Kaggle stuff?
.
.
I'm gonna wait for the new link before I try anything, Kaggle link that is
For applio
I still think it would be fine to leave ngrok for the filebrowser and tensorboard, although in both cases the limit error shows up later. If you don’t enable auto-update in tensorboard, it can last for quite a while.
Zrok is a bit more complex to use, I didn’t really understand the docs very well, but as far as I know, you have a limit on creating multiple tunnels, and it just blocks you. However, their system has a way to reuse a tunnel you had already stopped using, which requires you to run a command to recover that tunnel and blah blah blah.
Although for me, it would be enough to just use wandb instead of TensorBoard, and for the filebrowser I would use Horizon Tunnel, LocalTunnel, or InstaTunnel.
is filebrowser really needed?
users can upload most things from applio itself, like the audios and models
can't users upload their dataset with applio's dataset maker option, and then download the checkpoints and index via the kaggle output?
I see it as necessary in order to make minimal modifications, for example, for someone who wants to adjust configurations, experiment with the code, or maybe speed things up, because Kaggle’s filebrowser is somewhat slow, and on top of that, Gradio isn’t exactly that reliable when it comes to uploading things.
For uploading things to Kaggle, I just upload my files to Google Drive or some file hosting site, and then download them through Kaggle.
Although you can upload files through the filebrowser, with the free ngrok plan it’s simply not recommended. So, the filebrowser might be useful for some and not for others, but I would leave it there.
Can anyone help me? Twilio doesn't have my country code (including Singapore +65) for the phone contact number.
it is for my AI Automation Social Media where I create Whatsapp, IG & FB Messenger (Convocore.ai [Chatbot] + Twilio)
Good morning, I'm Brazilian and I'm using Google Translate. Is there a way to make voice calls with an AI?
its a no code workflow
Trying to use it to create an AI for this?
Im still learning to create & deploy one
is forked wokada better or vonovox?
Hello, I just joined in since I'm looking for a program that does what I'm looking for.
I just trained my first voice model (Voice A) using Applio. I have a bunch of pre-recorded audio using a different voice (Voice B). I am looking for a program that has Voice A mimic the Voice B audio, since I assume it would sound more life-like than using text-to-speech.
I could try using a virtual microphone to feed back the Voice B lines as input, but I was just wondering if there is a cleaner method to do this.
I think a better way to word this question now is "What are some self-hosted alternatives to ElevenLabs's Voice Changer feature?" as it "allows you to convert one voice (source voice) into another (cloned voice) while preserving the tone and delivery of the original voice."
go to applio's inference tab, select your model, upload the clip you want to convert
I don't know why I thought it would be more complicated than that, but thank you
You're looking for what's called Speech to Speech
That's the simplest way to describe what you were asking for, I see u got help tho so I'm gonna disappear
@nocturne mural you think you would be able to port applio over into lightning.ai? No rush of course
It seems it's a better option than Kaggle
No encryption needed, Lyery said that at least
"hidden channel"
Likewise, he already mentioned it to me before.
Actually, I used it a while ago, and now that I’ve tried it again, I just got a PortAudio error, which is normal because of the realtime part. But when I tried to fix it, I ran into another error, so I just had to manually edit the code and remove the realtime from app.py.
so realtime causes bugs in the code?
just by removing it the interface would start correctly.
is that specifically just for lightning ai or does it work universally with all sites
I wouldn’t know the correct answer, but likewise, I’ll try again in a while and then let you know
good luck! and thank you for being so helpful and nice
When creating a project, you must configure the Python environment by clicking the first button on the right sidebar, 'Environment'. Then click on the Python version (3.10). A dropdown will appear; select 3.11.13 and that’s it.
I tried using it on kaggle and it didn't work, is it only for lightning?
Well yes, the Kaggle path is /kaggle/working/, not /teamspace/studios/this_studio/
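The path difference above is why a notebook written for one platform fails on the other. A rough sketch of picking the working directory automatically; only the two paths come from the message above, while `KNOWN_ROOTS` and `detect_workdir` are made-up names for illustration.

```python
from pathlib import Path
from typing import Callable, Optional, Tuple

# Working directories mentioned in the chat for each platform.
KNOWN_ROOTS = {
    "kaggle": Path("/kaggle/working"),
    "lightning": Path("/teamspace/studios/this_studio"),
}


def detect_workdir(
    exists: Callable[[Path], bool] = Path.exists,
) -> Optional[Tuple[str, Path]]:
    """Return (platform, root) for the first known root that exists, else None.

    `exists` is injectable so the check can be exercised without either
    platform's filesystem being present.
    """
    for name, root in KNOWN_ROOTS.items():
        if exists(root):
            return name, root
    return None


if __name__ == "__main__":
    print(detect_workdir() or "not on Kaggle or Lightning.AI")
```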
xd
I'm a little slow ❤️
there’s no point in running it on Kaggle since TensorBoard doesn’t work there; that’s why tunnels are used.
ohh
I'll be trying this in the morning
Hi guys, can i ask a question? i already know how to train a model and everything, i just wanna ask what happens if i took a dataset that has main vocals and adlibs in the same audio. It can still turn out good or it has to be only main vocals??
What does this mean
why does my voice changer sounds weird
Alright
what's your pc gpu and operating system? also be aware that fork just means modified version in tech field
I already did that
Last update: August 8, 2025
I sent you the link before
@nocturne mural are you aware of the lightning.ai notebook?
.
no e girl trolling is allowed ⚔️
@low shard What other ways can be used to match accent ? Index file and tuning the index meter hardly affects it
To maintain a legal, safe & ethical community, we will NOT provide help for:
(E girl, as an example) catfishing/trolling, scamming, impersonation.
Any illegal activities.
Requests for these topics will be ignored and may result in moderation action.
alr mb u right
yep thanks for understanding!
is there a app thats made for amd gpus or are most of them made for nvidia gpus
i think theres a amd version of voice changers
what voice changer supports .pth files?
all of them 
what gpu do u have, Nvidia, AMD,or Intel
nvm i found one i just need to get better ping
u sure? the only up to date good ones are here all I need to know is your gpu
-rt
Vonovox is Nvidia only but Wokada Tg and Wokada Deiteris work for all gpus
its nvidia
i just need better ping i think
did u get the one you're using from a yt video
if u did it's over a year old maybe 2
outdated and basically really bad
any of these three would work for u and I would suggest since u use nvidia to get vonovox
yeah
it's the best out there as of rn
u download it, extract, run the setup.bat then after that finishes run the start.bat
not too complicated
plus if u get lost or anything I'm here to help
is the official download link like github dr87 vonovox releases
oh nvm
is it a huggingface link
should be here https://github.com/dr87/Vonovox/archive/refs/tags/v1.6.9.zip
most recent one
i got that one
now what input and output device do i put on the voice changer, but on discord?
did u download vac lite? it was also in the guide
didnt, ill download it now
I use a diff setup than normal users of the voice changer so I wouldn't be for sure on discord but for Vonovox
Input: regular headset/headphone mic
Output: Line 1 (vac lite)
can i use my mic that i have instead of my headphones one?
any mic man, just not a voicemod mic as that breaks it
ok
btw where do i download vac lite from
just click on the deiteris guide should be at the very beginning
idk if it's possibly missing from the Vonovox guide or not
VAC lite (virtual-audio-cable by muzychenko) right?
yup!
thanks for the help
np! if u need any more help just lemme know
Can someone please help me
when im using the voice changer the default voices thats built into the app works
but when i try a model from #1175430844685484042 it just dont work
@viral mason maybe you can help?
please?
the default voices huh
yea you got an old outdated youtube tutorial voice changer didn't you
what's your gpu, Nvidia, AMD, or Intel
nvidia but an old one 😭
idk 😂
do u know the numbers orr
I mean u can check here in task manager
so u have 3 options u can download but I'd recommend Vonovox first as it's the best of the 3
hey i tried to start the voice changer but its stuck on "warming up voice conversion please wait"
the guides are all here and u only need two things, vac lite and vonovox
how long has it been doing that?
1 min and a half
i already downloaded everything and I got the custom voice model in and all its just when i hit start it wont work like i dont hear anything
but when i use the models thats already in the app it works perfectly fine
wdym u already downloaded it
soooo
they never read the rules!!!
how do fix this
just try restarting the program
close the cmd?
I had that issue before and it fixed itself at some point
it did?
that's normal
and how do i fix the voice breaking?
like the voice breaks
postal dude
thank god
could u send the link of which one u chose from the voice models section in the server
0.03 block size, 2.0 second extra, 0.15 crossfade try these and if the block size doesn't sound good for u switch it to 0.30
try this one, that one is also old 😭 https://discord.com/channels/1159260121998827560/1381824129854079069
😭 why do i always pick the old ones
use the search bar and look for the most recently made versions of any voice
just unlucky ig
anything that uses any pretrain that isn't og or Legacy core is really outdated and has issues
rtx 4080 super, windows 11 pro
one more thing i cant hear anything with the output device being put on the vac lite
i'd just switch to vb cable then since u most likely already have that
it's not recommended to use it at all since it causes issues on windows but it works properly unlike vac lite
vonovox would be more suggested for your setup
and another problem i have the vb output cable at input devices and the vb input cable at output devices
when i use it, it has even more delay
the uh what now
yeah
0.03 block size, 2.0 second extra, 0.15 crossfade try these and if the block size doesn't sound good for u switch it to 0.30
try those
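To get a feel for what those numbers mean, here is a small sketch that converts them to sample counts. It assumes all three values are in seconds and that the audio device runs at 48 kHz; neither is confirmed in the chat, and `seconds_to_samples` is a made-up helper.

```python
def seconds_to_samples(seconds: float, sample_rate: int = 48_000) -> int:
    """Convert a duration in seconds to a whole number of audio samples."""
    return round(seconds * sample_rate)


# Values suggested in the chat; the 48 kHz sample rate is an assumption.
suggested = {"block_size": 0.03, "extra": 2.0, "crossfade": 0.15}

if __name__ == "__main__":
    for name, seconds in suggested.items():
        print(f"{name}: {seconds} s = {seconds_to_samples(seconds)} samples")
    # The fallback block size mentioned if 0.03 doesn't sound good.
    print(f"block_size fallback: 0.30 s = {seconds_to_samples(0.30)} samples")
```

Under those assumptions, going from a 0.03 to a 0.30 block means each chunk holds ten times more audio, so expect noticeably more delay in exchange for smoother output.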
damn ill try rn thanks!
im glad some people are still actually trying to improve this voicechanger
np! if they make delay worse or quality changes ss your original settings and revert back
i was confused when the owner of vonovox said to keep it the same
oh
my
god
its like crackling really bad 😭
oh
my
god
i fixed it
holy shit
how
fr?
the crackling of doom and suffering
i fixed the crackling
WHO DIDNT TELL ME ABOUT VONOVOX BRO 😭
this is the method
i was gonna pay someone 600$ to reoptimize w-okadas code and make it have less delay but ig i dont need to now
bc ik the codes a genuine mess
bro..
you just got saved
u dont understand my dedication with w-okada 😭
ive been on this community for years
pretty sure i was here when w-okada released
that would have been a crazy loss of money lmao
ehhh not rly
any kind of paid commission or other stuffs are not allowed
not from here, i was paying someone through fivver
😭
i got told i could do it by somebody
a diffrent mod im prettysure or contributor
ima lowkey pay for vonovox this is genuinely good
then it's your own responsibility 
I don't have a good PC with a decent GPU, so I can't say whether Vonovox or Deiteris/Tg Develops W-Okada has better audio quality, aside from seeing some here saying "Vonovox is better", even if the GUI is more professional and quite a bit less familiar than any W-Okada.
yeah ik 💔
honestly vonovox has better noise suppression
dont get me wrong like i have a loud ass keyboard and its not picking it up like w-okada did
tbh it's like a mind trick I think but Vonovox makes all models sound more natural and realistic
you can roll your r's and stuff
yes bro this is what i needed
the paid features are just post-processing effects
so much more natural sounding
ima buy his patreon
WAIT NO
ehh i still wanna support him
I mean if u want to
u can get around it by downloading fl studio, getting vb cable and voice meeter and voicemod, then connect them all, it causes slightly more delay but u can get better fx for free this way
If someone here is able to benchmark, recording audio from both Vonovox and Deiteris W-Okada and then comparing them, I'd appreciate it.
they usually have patreon, kofi, or something I suppose
it's possible to get realtime vst but it takes a bit of time to do the setup
(its actually easy tho)
well id still wanna support him for what he does bc i know how dead ai hub or w-okada itself is
nobody has worked on improving it for a while
hes a cool guy tbh he deserves the money
I'm not stopping from supporting him but just saying that u could get better fx that way, at a small cost of teeny tiny extra delay
id want to have a more realistic effect with the voicechanger if ykwim
like you know how voicemeeter works and stuff? id want it so it can sound more realistic by making it seem like your mic is compressed a bit
if u get what im saying
maybe they just wanna support the dev
dr always keeps an eye in vonovox, when a issue is found he fixes it extremely quick
how long should an rvc model be when you train it
like sample wise?
wow you responded in one second
u can do less but
it usually sounds pretty good if you train it right, just get good samples
20mins is even good
yea! all u need is voice meeter, vb cable, vac lite, fl studio (i got a free one for u) and voice mod, then u can download whatever fx u want
i usually use batch size 8, a good dataset and train around 40 epochs
dw i cracked fl studio
im never paying for that
I wish I was fl studio/j
😭
why do you need fl studio for
The paid Virtual Audio Cable lets you have 256 VACs at once.
why cracking fl studio when they have a free version? lol
who needs that??
ok thanks also why do some people say you need 6 hours
the free one sucks ass
i feel more tuff doing it
you dont..
who said that wtf
im pretty sure
its the same shiet you just cant load projects
thats how long u train it
@teal ferry
ok bruh sorry for pinging u 2 times btw let me know if its annoying
it took me 6-7 hours to train a model
which is bad if u wanna use a very specific setup with fx you finetuned to sound good
nah its not dw
uh fair
im talking about the fact that i pinged eleven 2 times
ohhhh
big datasets are better it's true tho
bigger datasets are good sometimes
the largest dataset I ever worked with is like an hour
what you think of this video
u mean the training time not the dataset length?
mine was like 40min
LOL
i mean the length of audio file that you are training
do you do realtime or
yes
whats your gpu
they need to be diverse enough, having enough pitch variations and words/sentences that do not get repeated often
4080 super
tuff gpu
This is more like how long it takes to finish training a voice model, which depends on the batch setting and how fast your GPU is, not really the overall length of dataset audio.
planning to get a double 5090 for voicemodel training + using the voicechanger for content and stuff
without lag
if you had a 970 it would be impossible to do realtime voice changer
bc when i use w-okada with valorant, roblox etc it lags sometimes
lollll i bought this gpu mostly for w-okada
what if i have an intel gpu which one do i download?
oh i see
intel?
uhh
idk
the directml version
with a dataset where the voice is kinda robotic (on purpose) would it still sound good with a lot of different pitch differences like GLaDOS
do you use vocal remover of ultimate vocal remover
oh wait nvm i forgot intel gpu doesn't work with the voice changer
it doesnt?
dang
UVR5
umm the longest I've ever had is almost 3 hours but also I think it doesn't have a noticeable difference from the 1 hour one, except for other factors like quality consistency
I made a model of her not too long ago and she sounded fine
nope yea
yo dude
just use this https://colab.research.google.com/github/Eddycrack864/UVR5-NO-UI/blob/main/UVR5_NO_UI.ipynb?authuser=1#scrollTo=gmjUWmz8iecd
ok moral of the story make it 1 hour
it's uvr5 but not local
YOOOOOOOOOOOOOOOOOOOOOO
i'd recommend vast ai coz is way cheaper, 0,150$ per hour sometimes
most of the gpu are around 0,200$
all u need is two folders in your google drive for that, one called vocales and another one that can be named anything
even 30 mins is also not bad (perhaps like 95% to 97% or something)
#1419105823451517028 message i did this with 20 mins
so really what matters is how diverse and how good the dataset sounds
it has to cover multiple pitches
sounds pretty good
then do this ⬇️
/content/drive/MyDrive/inputnoui (replace "inputnoui" with whatever the folder is u put the data into to clean)
is it voice to voice or tts
@fossil sage
rvc model
alright ill check it out
there's sovit which is tts and rvc which is speech to speech
what you think of this audio
@fossil sage u can also dm me for any help with making a model
same with me ^
its good
ok
tbh
👍
just has his voice
i need to train this
this doesn't sound bad at all

for some datasets that are too clean i inject some white noise to help the model generalize
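A minimal sketch of the white-noise idea above, using only the standard library. It assumes the audio is already loaded as a list of float samples, and the 30 dB signal-to-noise ratio is just an illustrative value, not a recommendation from the chat. `add_white_noise` is a made-up helper, not part of RVC or Applio.

```python
import math
import random
from typing import List, Sequence


def add_white_noise(
    samples: Sequence[float], snr_db: float, seed: int = 0
) -> List[float]:
    """Return a copy of `samples` with Gaussian white noise mixed in.

    The noise level is chosen so that signal power / noise power matches
    `snr_db` (in decibels), based on the average power of the input.
    """
    if not samples:
        return []
    rng = random.Random(seed)  # seeded so the result is reproducible
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


if __name__ == "__main__":
    # One second of a 220 Hz tone at 16 kHz stands in for a real clip.
    tone = [math.sin(2 * math.pi * 220 * n / 16_000) for n in range(16_000)]
    noisy = add_white_noise(tone, snr_db=30.0)
    print(len(noisy), "samples with noise added")
```

A higher `snr_db` means quieter noise, so starting high and listening to the result is the safer direction.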
uhh
Time matters, I sometimes have to keep a UVR5 Colab notebook open on standby; when I'm ready I click "connect".
what voice changer are you using? i used okada but it doesnt appear the mic in my discord lowk
W-Okada.
!howtoask
rvc models overtrain super fast, the official rvc docs recommend 20-40e, i personally prefer 40e-60e
that depends on dataset length tho or nah
as long it's 10 mins or more, yea 40e is enough
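For a rough sense of how dataset length, batch size and epochs combine, here is a back-of-the-envelope sketch. The 3-second segment length is purely a hypothetical placeholder (how RVC actually slices audio is not stated here), and `total_training_steps` is a made-up helper; batch size 8 and 40 epochs are the figures mentioned in the chat.

```python
import math


def total_training_steps(
    dataset_seconds: float,
    epochs: int,
    batch_size: int,
    segment_seconds: float = 3.0,  # hypothetical slice length, an assumption
) -> int:
    """Estimate optimizer steps for a full training run.

    steps = ceil(segments / batch_size) * epochs,
    where segments = ceil(dataset_seconds / segment_seconds).
    """
    segments = math.ceil(dataset_seconds / segment_seconds)
    steps_per_epoch = math.ceil(segments / batch_size)
    return steps_per_epoch * epochs


if __name__ == "__main__":
    # 10 minutes of audio, 40 epochs, batch size 8.
    print(total_training_steps(10 * 60, epochs=40, batch_size=8))
```

Under these assumptions a longer dataset gets more steps per epoch, which is one way to read why the same epoch count can behave differently across datasets.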
I just go based on the tensorboard and how good it sounds
below that, no idea
40 is usually never enough for any of my models
losses aren't accurate for voice models as the trend doesn't match with what you are hearing
interesting
so far they're useful when training pretrains, coz you have to monitor that they do not diverge
which never happens with finetuning
i wish voicemodel commissions were still a thing
im too lazy to make my own voicemodels
me when u can just make them for free
im too lazy
works for me in mainline, i remember in applio also 40e used to work just fine for me, even when using their messed up discriminator lol
fair
when commissions were a thing i paid this one guy and he made some of my best voicemodels i still use today
i swear some commissioners in there were genuinely insanity
what was their name?
I don't recognize the name
many model makers are using cloud/rental solution instead of having own capable gpu
ye ik he is
but
he makes really good voicemodels nonetheless
i used to recommend him to everyone
I'm kind of familiar with this name, I just don't remember where I saw this one.
i use rtx 2060 whats best for me guys?
yeah im pretty sure a mod knew him
alot of mods
I have made several models using kaggle for private use, and dont think im open to any requests
is it really hard to make good voicemodels these days?
i havent made one in a month or so
you just need a good dataset and dont overtrain the model
rvc is still 2023 tech tho
is there ever new tech
hard to tell, it may sound okay to most ppl, though there may be minor artifacts and some static noise visible in the spectrogram
see thats what i dont like
over time people believe it less and less
that its real
why does the voice changer work when i put it on my cpu but when i put it on my gpu it dont work, does that mean my gpu is too old or something?
I could use an analogy with modern game graphics: native raster & raytraced rendering versus DLSS/FSR. most ppl find the upscaled result okay given the lower hardware demand, until they spot minor flaws like ghosting, shimmering, inaccurate shadows & lighting, etc.
hypothetically if RVC V3 was done being developed, would those issues even go away
or do they stay forever until we get better ai technology in this type of field
can anyone help?
this is an english only server Garrett Garrison
-# please get the reference
no one?
no help for yuo mr egril troller
i quit on the egirl thing i swear 😭
this is for a normal voice changer
bru
please someone help me 😭
Ill help you
What gpu do u have?
nvidia quadro T1000
i think cuz it’s old
i’m on laptop not pc btw
😂
Dont even try
Wanna know something tho
I use w-okada for egirl trolling too 😭😭😭😭
Been doing it for years
I'm so disappointed in you
i figured 🤣
😂
I DONT ALWAYS DO IT
I DO IT LIKE VERY RARELY NOW
😭
you're supposed to use it to play as characters not that yucky shit! 😭
LMFAOOOOO
I dont do it as a kink thats weird
Im gonna start making content with it
😭
Id blow up lwk
what ever happened to using it to be Darth Vader not E Goth Latina Darth woman Vader
E girl latina
Old times dude..
I used to use that voicemodel too
😭
It was literally called egirl latina
I got funny reactions from people
Thats why I do it 😭
Bro trust me ive been doing it since like rvc was even a thing
I only use it for the funny reactions
😭
Kaggle is a Cloud (Remote Good PC) Service that offers 30 hours of GPU weekly, but needs a phone number verification
by IAHispano
Kaggle
by Hina
Kaggle
by Hina & Deiteris
Kaggle
by Eddy, ArisDev & Nick088
Kaggle
by Eddy
Kaggle
by Shirou & ArisDev
Kaggle
by Shirou
Kaggle
The original wokada?
is weights broken? It won't let me upload the mp3
no it just sucks really bad
there is any update on the OKADA MMVCServerSIO2 ?
it simply is not working anymore for some reason
what does it look like, u might be using something outdated
what even is that
They don't know what they're doing. You can train a model on a minute of audio if you want.
The foundation of my argument revolves around covering the entire phonemic spectrum. You may get something that sounds reasonable with less audio to some degree. But the model will struggle with under-represented phonemes in your dataset. So it will end up without nuanced emotional speech or sounds that sit on the extreme ends of a voice. The little tiny details that make a voice sound human will be absent.
when using voice to voice to clone voices, it doesn't mimic the accent or the emotion. to further explain, think about it like this: if you had the best model ever of a girl screaming with an indian accent and you used rvc to try to mimic that, it wouldn't work due to those 2 factors
rvc outputs are rather flat because the model is learning the spectrograms
not the emotions
it takes a mel spectrogram, trains on it, then outputs a .wav
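For anyone wondering what "mel" means here: the mel scale squeezes high frequencies together the way hearing roughly does, so a mel spectrogram has finer bands at the low end. A minimal sketch of the common HTK-style conversion follows. The function names are made up for illustration, and this is not taken from RVC's own code.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert a frequency in Hz to mels (HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min: float, f_max: float, n_bands: int) -> list:
    """Edges (in Hz) of n_bands bands spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / n_bands
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 1)]
```

Calling `mel_band_edges(0, 8000, 4)` gives bands that get wider in Hz as frequency goes up, which is the whole point of the scale.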
oh ok makes sense. just to clarify tho, im not trying to make the best model ever or the best ai voice clone audio ever, i am just trying to make it good enough. and yes i know i sent this video multiple times, but as you can see his audio is not the best either, but its still good enough for viewers to enjoy the video and the audio
YES thats my point. the reason why i cant do voice to voice is because number one im not good at reading out loud, i mess up a bit and i mumble. also im not good with accents and mimicking how someone talks, therefore tts is best
It's basically a filter of a voice
In realtime you just gotta mimic how the character speaks
I've learned how to do that for the most part which makes models sound better in okada
some tts can learn expressions and emotions yeah, rvc cannot, it's just learning mel
I'm looking for an AI video to video tool where I can type in a prompt, say "Ice Age", and then it converts the video with the characters and background into an ice age type environment and maybe the characters look like cavemen while keeping the same movements and audio as the original clip. Does anyone know one?
yes but i am not using realtime
Just giving an example y'know
That sounds horrible
yes it does sound horrible however its good enough to achieve 100k views or more on tiktok
mine sounds even worse
yes
The real question is will it get people to look for more of your content
It's not. I have YouTube videos with like five times that and they've earned nothing
There's plenty of that stuff on YouTube and probably other video sites
anyways this doesn't change the fact that ai voice i cloned was bad

You're trying to make useless content that will go into the void and stay there for eternity
It's a waste of time. Make something people want to consume and you will be successful
Make something that tries to find a loophole and you'll most likely fail. Unless you get really really lucky
??????
I'm confused
But anyways forget this
It don't matter
Also targeting tiktok as a platform is not a good idea. They pay like sht
OK i get what you're trying to say
but as you see here it doesn't sound that good
would it be a good idea to run it through an model
rv
c
What did u use to clean the audio? Is it that Collab I sent from before
Fail ten thousand times
The ten thousand and first time you'll like what you hear

Mastery is repetition. Stop looking for some magic recipe
There isn't one.
Just practice. Idiot
That sounds good, how long is the dataset you made
I don't care

its not a model its inference

That's what you said not me
its crazy how some people are so easily able to just use eleven labs and get great results
unfortunately its paid
Wdym, I'm kinda slow lol
He can handle it. I yell at him all the time and he keeps coming back
I respect that actually
Alr u do u man
@viral mason
Tbh I don't like eleven labs due to it not being able to copy non human characters like Venom or General Grievous
At least I don't think it can
you're not yelling at me irl
True
But some people can't take direct criticism. You can
That's good. You will go far with that
well i guess it was constructive criticism unlike when some people would just say quit you suck at this why are you even trying
Haha well there's a line between helpful and just being a jerk
Skin color is irrelevant also
Once you can make a bad model, meticulously craft your dataset. And I mean painfully craft it. It's going to suck, but the result will be exceptional.
i need help
ok but yk in the yt video i showed you he used vibe voice for tts then used rvc to make it sound better should i do that
You can sure. It all comes down to how well you train that RVC model.
But using RVC as a filter or enhancer in a sense is a good idea. The foundation of the sound though will have to come from the first model
what is a good epoch range for models
With what?
The amount of epochs depends entirely upon the size of the dataset and the model you're using
I cannot give you a number
Look at your training data output. Use something like wandb or tensorboard
This data is not for the faint of heart and will not really make any sense. There is also little content to learn about it on things like YouTube. Just stick with it though
Tensorboard is the most common tool used to see training progress
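Since the raw loss curves look like noise at first, here is a rough sketch of the kind of smoothing a dashboard slider applies so a trend becomes visible. It is a plain exponential moving average with a made-up function name, and is not claimed to match TensorBoard's or wandb's exact implementation.

```python
def smooth(values, weight=0.6):
    """Exponential moving average over a list of logged scalars.

    weight close to 1.0 means heavy smoothing, 0.0 returns the raw values.
    """
    smoothed = []
    last = values[0] if values else 0.0  # seed with the first logged value
    for v in values:
        last = weight * last + (1.0 - weight) * v
        smoothed.append(last)
    return smoothed
```

Feed it the per-step loss values you exported and compare the smoothed list against what you actually hear in test inferences, since the two do not always agree.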
Once you use wandb tensorboard looks like it's for kids
Well what does it even look like
with the voice changer
What's that
@viral mason
Guides for Programs that use RVC Models in Realtime for Calls/Games
A Realtime Voice Changer with similar performance to Wokada Deiteris Fork, with extra features, but supported only for Nvidia GPUs on Windows. and without cloud options GUIDE
A personal fork (modified version) of Wokada Deiteris Fork, it just adds some Quality of Life improvements to it like supporting Spin Embedder and Audio Effects. Don't expect too much about it since the creator made it originally as a personal project. GUIDE
Most suggested WebUI with the best general support for many platforms. GUIDE
For Windows Nvidia, Both Wokada Deiteris fork and Vonovox have similar performance & quality. Users should read the pros and cons for both and choose based on their differences, such as UI and Vonovox's paid effects.
Read Wokada Deiteris Fork Pros&Cons & Vonovox Pros&Cons
These options are not recommended for use.
Not suggested, older versions in youtube tuts are even way worse. GUIDE
The program is worse compared to the ones above, and much less updated. GUIDE
but i need help
already did all that but its js not going through mic and delayed
whats a dataset
theres way more but thats kind of the basic part of it
a set of data
how long the audio is ?
no all of the audio
i havent used the voice changer. but i do know its extremely buggy
push buttons until it works
Okay I will wait for @viral mason to help me
youll have to elaborate on your problem
all you said was you need help
its vague
what exactly is the problem youre seeing
model is usually done around 40-100 epochs
vb audio is more confusing than it looks. i would suggest watching a video on hooking that up
It’s delayed and isn’t working
yeah find a youtube video that specifically uses it with discord
i cant tell you off the top of my head what is wrong. but id assume it has to do with the virtual audio cable and pass through with discord
Ima see if @viral mason can help, if not idk what Im gonna do
lets hear yo models
?
voice to voice
Not really
man its even better than elevens models
how long was the wav file when u trained it
1 hour 47 minutes
Because the model was trained with the SPIN embedder and when weights does its inference it doesnt use spin but uses cvec
well idk what this is but imma just assume that its going to sound ok when i put it through rvc
as long as you select the spin embedder you will be fine
do you use applio
Yea
So can one of y’all help me
i can select any input/output but it doesnt "comes out"
nice, just cuz i came here and asked for help, now it is working
sorry for bothering
why is ur input vb audiocable?
and ur output is ur speakers?
u did it wrong
ignore that, i know it isnt the best for it but thats what i use
ur input is supposed to be ur main microphone
my output is my speakers cuz i was testing
also, it stopped again 🤡
monitor does that
u can listen to urself with monitor
try this
switch to server instead of client
use WASAPI, youll know what im talking about once u click ur devices
for the number thing on server, put 4800
i think
or
48000
yes
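Assuming the "number thing" is the sample rate field (the chat does not name it), this small sketch shows why the value matters for delay: the same buffer size means a different amount of time depending on the rate. The function name is only for illustration.

```python
def buffer_latency_ms(num_samples: int, sample_rate_hz: int) -> float:
    """Time covered by a buffer of num_samples at the given sample rate, in ms."""
    return 1000.0 * num_samples / sample_rate_hz
```

For example, a 4800-sample buffer at 48000 Hz covers 100 ms of audio, so bigger buffers add delay before anything comes out.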
btw
ur extra is broken
So can you help me
@tawny radish
I got 6 pings what'd I miss, I was showering
Maybe they should shower too
bro ive been in this server since february i never seen u speak here til now wth
like
last year february
@fading lodge is my old account
I've been more active lately other than just posting models
Actually interacting with ppl because why not
Did u have me added on that acc btw?
Ah
I'm gonna scroll up to see what silly people needed my help
And then not help them


it doesn't even look like rvc models, you shouldn't recommend such strange things for rvc model makers & users, even from sketchy youtube videos
Looks like Tensorboard if they locked in
Well I'll help anyone I was helping before lol
Just joking around
how good is vonovox ? i have a rtx 3060 with an amd ryzen 5 5600g but with w okada tg i still crackle
can it be because of the model ?
i could help you fix the crackling if u want
vonovox just gets updates alot id use it if i were you
it just has a BIT more delay
it's so annoying!! i thought my computer was good enough :./
which version do you use?
just want non-crackling voice that sounds robotic at the end of my sentence
I'm downloading vonovox now, I was using the latest TG fork of deiteris
do u want me to help you set it up?
i can give you proper settings
im actually new to using vonovox and its easy
yeah would be awesome ngl
because
just dm me
ok
did
thats not rvc correct.
its a way to read data.
you can use whatever you want i dont give a fuck
one tool will give you more control over your data and the other one wont. what a beginner does or doesnt use is irrelevant because they will have no idea what theyre looking at anyways. The entire point was to just try.
what's the best settings? I plan on switching over once it's updated to finally have more than 8 slots
vonovox?
or
w-okada forked
vonovox
i can send u mine
depends on ur gpu tho
mine dosent have TOO much delay
it has no crackling
didn't I give those settings to u or someone?
hmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm
my gpu is this btw
i forgot
really?
I always thought it was kinda mid since the number was lower than 3000 smth
is that one better than mine
yes
I might upgrade eventually
gaming/raw performance wise, probably yes but the RTX one would outperform it in AI tasks due to usage of tensor cores
Anyone know of ways to match accent ? I tried index file and tuning the index meter, not much effect.
couple things. youre not in the virtual environment and you need to be in the same folder as the whl
so first activate the venv. then move the .whl into the folder the terminal is in or cd the terminal into the folder the .whl is in
or provide pip with the absolute path
oh you see how you installed it it says installed but then alltalk cant see it
its because you installed outside the venv. so you need to
yes
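A quick way to see the "installed but the app cant see it" problem is to ask a given Python which interpreter it is and whether it can find the package. This is a minimal sketch with a made-up helper name. Run it with the same python.exe the app uses, otherwise the answer is about a different environment.

```python
import importlib.util
import sys


def describe_env(package: str) -> dict:
    """Report which interpreter is running and whether it can import `package`."""
    return {
        "python": sys.executable,  # full path of the interpreter in use
        "prefix": sys.prefix,      # root folder of that environment
        "importable": importlib.util.find_spec(package) is not None,
    }
```

If `describe_env("deepspeed")` reports `importable` as False under the project's interpreter but True under your system one, the wheel went into the wrong environment.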
C:\Users\yonshuk>dir
Volume in drive C is FSOS
Volume Serial Number is A42B-E89B
Directory of C:\Users\yonshuk
09/21/2025 08:02 PM <DIR> .
05/16/2025 07:11 PM <DIR> ..
09/21/2025 07:18 PM 417 .bash_history
09/18/2025 06:52 PM <DIR> .cache
09/21/2025 03:06 PM <DIR> .conda
09/20/2025 01:29 PM <DIR> .dotnet
09/17/2025 06:41 PM 126 .gitconfig
09/17/2025 06:44 PM <DIR> .local
09/18/2025 07:02 PM <DIR> .matplotlib
09/21/2025 07:51 PM 227 .python_history
09/06/2025 12:49 PM <DIR> .stacher
09/18/2025 05:01 PM <DIR> .venv
09/18/2025 06:07 PM 0 1.0.4
09/17/2025 09:25 PM 0 App
09/17/2025 09:36 PM <DIR> chatterbox
09/17/2025 09:33 PM <DIR> chatterbox_old
05/16/2025 07:11 PM <DIR> Contacts
09/21/2025 05:34 PM <DIR> Desktop
09/21/2025 08:02 PM 0 dir
09/21/2025 03:04 PM <DIR> Documents
09/21/2025 08:16 PM <DIR> Downloads
05/16/2025 07:11 PM <DIR> Favorites
09/20/2025 11:13 AM <DIR> ffmpeg
09/02/2025 05:28 PM 0 ffmpeg2pass-0.log
09/18/2025 05:38 PM <DIR> index-tts
05/16/2025 07:11 PM <DIR> Links
09/21/2025 03:07 PM <DIR> miniconda3
09/10/2025 08:35 PM <DIR> Music
07/13/2025 08:19 PM <DIR> Pictures
05/24/2025 01:22 PM <DIR> Saved Games
05/18/2025 08:22 PM <DIR> Searches
05/22/2025 04:21 PM <DIR> Superposition
09/21/2025 07:52 PM <DIR> venvs
09/21/2025 08:10 PM <DIR> Videos
7 File(s) 770 bytes
27 Dir(s) 197,643,468,800 bytes free
C:\Users\yonshuk>
no the project root
so alltalk is the project
so the projects main folder is another way to say it
most likely its called alltalk_tts
here
e: (in cmd) OR cd e: (in PowerShell, the default in wt)
cd xtts\alltalk_tts
dir
Directory of E:\xtts\alltalk_tts
09/21/2025 07:40 PM <DIR> .
09/20/2025 12:03 PM <DIR> ..
09/20/2025 12:03 PM <DIR> .github
09/20/2025 12:03 PM 110 .gitignore
09/20/2025 12:03 PM <DIR> .vscode
09/21/2025 07:40 PM 5,793 2.2.2+cu121
09/21/2025 07:20 PM <DIR> alltalk_environment
09/20/2025 12:03 PM 22,932 atsetup.bat
09/20/2025 12:03 PM 18,415 atsetup.sh
09/20/2025 12:11 PM 0 cmd_windows.bat
09/21/2025 08:06 PM 700 confignew.json
09/20/2025 12:03 PM 15,338 diagnostics.py
09/20/2025 12:03 PM 553 docker-compose-cuda.yml
09/20/2025 12:03 PM 749 docker-compose.yml
09/20/2025 12:03 PM 818 dockerconfig.json
09/20/2025 12:03 PM 818 Dockerfile
09/20/2025 12:03 PM <DIR> finetune
09/20/2025 12:03 PM 104,941 finetune.py
09/20/2025 12:03 PM 142 launch.sh
09/20/2025 12:03 PM 35,184 LICENSE
09/20/2025 12:03 PM 941 modeldownload.json
09/20/2025 12:03 PM 9,258 modeldownload.py
09/20/2025 12:44 PM <DIR> models
09/20/2025 12:03 PM 828 nvidia.Dockerfile
09/20/2025 12:46 PM <DIR> outputs
09/20/2025 12:03 PM 111,877 README.md
09/20/2025 12:03 PM 47,659 script.py
09/20/2025 12:25 PM 332 start_alltalk.bat
09/20/2025 12:25 PM 308 start_environment.bat
09/20/2025 12:25 PM 334 start_finetune.bat
09/20/2025 12:03 PM <DIR> system
09/20/2025 12:03 PM 57,861 tts_server.py
09/20/2025 12:03 PM <DIR> voices
09/20/2025 01:03 PM <DIR> __pycache__
23 File(s) 435,891 bytes
11 Dir(s) 205,143,212,032 bytes free
E:\xtts\alltalk_tts>
double click start_environment.bat
just did
that should open the terminal with the env activated then pip install the whl
right click in that folder and open in terminal
or in the address bar of explorer type cmd hit enter
tell me which one you did so i know if youre in cmd or windows terminal
if you right clicked and went to open in terminal then type
./start_environment.bat
if you typed cmd in the address bar do
start_environment.bat
im using cmd
if it closes again on you then right click the bat file open it with notepad and at the very very bottom i want you to put this word
pause
file then save and then try to run it from the terminal again
it should stay open now and then you can tell me the traceback
i opened it and its blank
yes nothing in it
which was why it wasn't working
looks like i should reinstall it
ok scroll up and you see a folder alltalk_environment go in there
you can reinstall it but i mean we dont need it if we know what were doing
in alltalk_environment folder you should see like an env folder. open that, then scroll down you should see a python.exe right?
sorry python.exe should be in env/bin/python.exe
theres no bin folder
im working off memory here.
bin could be linux only whats in the env folder
were looking for python.exe
search the env folder for it if you have to
i found it
E:\xtts\alltalk_tts\alltalk_environment\env
python.exe
ok go back to the terminal and type
E:\xtts\alltalk_tts\alltalk_environment\env\python.exe -m pip install put\the\path\to\deepspeed\whl\here
just copy and paste all of that delete the last part after "install" and put the whl there
C:\Users\yonshuk>E:\xtts\alltalk_tts\alltalk_environment\env\python.exe -m pip install C:\Users\yonshuk\Downloads/deepspeed-0.14.0+ce78a63-cp311-cp311-win_amd64.whl
Processing c:\users\yonshuk\downloads\deepspeed-0.14.0+ce78a63-cp311-cp311-win_amd64.whl
Requirement already satisfied: hjson in e:\xtts\alltalk_tts\alltalk_environment\env\lib\site-packages (from deepspeed==0.14.0+ce78a63) (3.1.0)
Requirement already satisfied: ninja in e:\xtts\alltalk_tts\alltalk_environment\env\lib\site-packages (from deepspeed==0.14.0+ce78a63) (1.13.0)
deepspeed is already installed with the same version as the provided wheel. Use --force-reinstall to force an installation of the wheel.
this already happened to me twice
so there's no way to use it?
E:\xtts\alltalk_tts\alltalk_environment\env\python.exe -m pip install C:\Users\yonshuk\Downloads\deepspeed-0.14.0+ce78a63-cp311-cp311-win_amd64.whl --force-reinstall
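The trick in the command above is calling pip through one specific python.exe so the package lands in that environment and nowhere else. A small sketch of building that command in Python follows. The helper name is hypothetical, and the paths you pass in are whatever exists on your machine.

```python
from pathlib import PureWindowsPath


def pip_install_command(python_exe: str, wheel: str, force_reinstall: bool = False) -> list:
    """Build an argument list that runs pip under a specific interpreter."""
    cmd = [
        str(PureWindowsPath(python_exe)),  # the environment's own python.exe
        "-m", "pip", "install",
        str(PureWindowsPath(wheel)),       # normalizes / to \ in the wheel path
    ]
    if force_reinstall:
        cmd.append("--force-reinstall")
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` on Windows, which avoids quoting problems with spaces in paths.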
also though why are you installing this
you do not need it for the repo/project to function correctly
given your degree of experience and the fact i have to debug through you, its a waste of time. While we could get it working, you should ask yourself first if you need slightly faster inference
actually i can answer for you. You dont need this right now. But, if youre dumb and want to continue fixing it then i need to know the traceback or error that youre getting that caused you to attempt to install this
i've been using chatgpt for help and it put me through an infinite rabbit hole
well you have it installed
so im not sure what caused you to continue to try and install it
its because alltalk said it wasn't installed
oh, ok so we need to debug alltalk source code
you can pay me to do that. its $210 an hour
or
ignore it
💀
you dont need it
theres a small chance he exits the loop if deepspeed isnt detected. but im going to guess he didnt code that
also does alltalk tts use inference and model at the same time
?
ive tried generating it with the error
so
i did listen to your second step
anyways
youre opening up settings and docs
there should be another url in the terminal
i dont remember which port he opens up but its not the one youre using now - 7851
try 7852
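Instead of guessing ports in the browser, you can check which local port actually has something listening. A minimal sketch using only the standard library follows. The function name is made up, and the port numbers are just the two mentioned above, not confirmed values for the app.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise
        return sock.connect_ex((host, port)) == 0
```

Something like `[p for p in (7851, 7852) if port_is_open("127.0.0.1", p)]` lists whichever of those ports is accepting connections while the server is running.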
well i gotta go to bed soon but uh somehow i used chatterbox then i used applio to convert the audio and guess how it sounds
just click that link
ok
great its not working
