#✨│ai-help
these are my charts , training for 3 hours
Question: is two and a half hours of data too much for a model? I tried making a Richtofen model and it sounds like shit. Granted, I'm not German, but it's very rough. Btw, I'm using it with W-Okada, so I'm not sure if that is a factor or not.
It's way too long
Ok, just stick to an hour then?
I don't know who Richtofen is, but I think it is overkill to use more than 30 minutes of dataset for a voice model.
I use an hour and a half sometimes and it sounds good, first time I tried with a 2 hour dataset
anything after an hour you just get diminishing returns
Ok I see, makes sense
I have no idea actually, I've never trained on a dataset that long tbh. You should try shorter, good-quality datasets with lower batch sizes imo
yeah, it is good quality, the dataset is ripped right from Black Ops 3
If the character does not have a versatile voice and speaks in a monotonous range, it is pointless to keep the dataset long.
Yeah but this character is very expressive.
a lot of dips on normal g/total indicates a lot of silence in the dataset
fm going up very fast, so most likely this is gonna be a trash model
He/She was using Applio, I guess? Maybe the silent training files are affecting this. Or not?
too much silence in the dataset is bad
mute files are fine as long as the dataset has the silence truncated
seeing two silent files show up in the last epoch that often is too much
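A rough way to check how much of a clip is silence before training is to measure how many short frames fall below a loudness threshold. The sketch below is only an illustration: `silence_ratio`, the 20 ms frame size and the -40 dBFS threshold are assumptions, not values taken from Applio or RVC, and it expects samples already decoded to floats in [-1.0, 1.0].

```python
import math

def silence_ratio(samples, sample_rate, frame_ms=20, threshold_db=-40.0):
    """Return the fraction of frames whose RMS level is below threshold_db (dBFS).

    frame_ms and threshold_db are illustrative defaults, not Applio/RVC settings.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    if not frames:
        return 0.0
    silent = 0
    for frame in frames:
        # Root-mean-square level of the frame, converted to dB relative to full scale.
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        level_db = 20 * math.log10(rms) if rms > 0 else float("-inf")
        if level_db < threshold_db:
            silent += 1
    return silent / len(frames)
```

A clip that scores high here is a candidate for trimming before preprocessing.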
What AI, preferably local, can convert subtitles with timestamps into an AI voice?
i don't know why but my feature matching is always going up no matter which model i train
Bro, RVC hates me 💀
it should be slowly going up, but not like in your case, usually fm does that when the dataset is super small
My dataset is 13 minutes. And batch size is 4
yep that's a small dataset, so it makes sense why fm is rising like that
Thank you. I think a clean vocal performance of 13 minutes of training will be enough. There is nothing I can do for "fm" lol. Avg/g/total will be the chart I will consider
strange, the dataset doesn't have a lot of silence
mine is 10
if there is, how can i remove it 🙂
it's overtraining, right?
Eh, kind of hard to tell. You could always train for a bit longer
idk what bro is using so I cannot help
like what
what are you using to train? it looks different to what I use
Applio with of pretrain and a 10 min dataset
guys, does anyone know a text to speech ai that works with timecodes like 00:00:18,000 --> 00:00:21,000? please
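Whichever TTS tool ends up being used, the timecode format quoted above (the SubRip/SRT cue format) has to be turned into start and end times first. A minimal sketch of that parsing step, with a hypothetical helper name `parse_timecode_line`; it does not call any TTS engine:

```python
import re

# Matches "HH:MM:SS,mmm --> HH:MM:SS,mmm" as used in SRT subtitle cues.
CUE_TIME = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)

def parse_timecode_line(line):
    """Parse an SRT timecode line into (start_seconds, end_seconds)."""
    match = CUE_TIME.search(line)
    if match is None:
        raise ValueError(f"not an SRT timecode line: {line!r}")
    h1, m1, s1, ms1, h2, m2, s2, ms2 = (int(g) for g in match.groups())
    start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
    end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
    return start, end
```

The difference between the two values is the time budget each generated line of speech has to fit into.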
Applio on what? Colab or Kaggle?
kaggle ofc
how its looks like
1s sample, really? 🙂
man my pc is trash, it took 1 min for this
Quick question: even at 250 epochs my rvc models are all quiet
yes is too trash
I am running a local AI on a 3060 Ti and a Ryzen 3 series CPU. What can I rent to render faster? Can someone recommend me a VPS at a good price on NA servers? I want to be able to open Windows on the VPS and run my own programs.
Allegedly (I can't really confirm, it's just what I was told by someone else), the errors resulting from this were like a domino effect that also led to my virtual environment loading incorrectly, and processing quietly defaulted to CPU without failing or warning me. When I ran it all correctly with CUDA support it was blazing fast lol.
quiet has nothing to do with epochs
if your dataset is too low volume, then the results of inference will be quiet
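To check whether a dataset really is too quiet, one option is to look at its peak level and, if needed, scale it up before training. This is a sketch under assumptions: the helper names and the -1 dBFS target are illustrative, and the samples are assumed to be floats in [-1.0, 1.0] already loaded from the audio file.

```python
import math

def peak_dbfs(samples):
    """Peak level in dB relative to full scale (0 dBFS means a sample hits 1.0)."""
    peak = max((abs(s) for s in samples), default=0.0)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")

def normalize_peak(samples, target_dbfs=-1.0):
    """Scale samples so the loudest one sits at target_dbfs (illustrative default)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0:
        return list(samples)  # pure silence, nothing to scale
    gain = 10 ** (target_dbfs / 20) / peak
    return [s * gain for s in samples]
```

Peak normalization only changes the overall gain; it does not remove noise or fix clips recorded at very different levels within one file.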
1000 epochs in 5 minutes means "you're training on an empty dataset"
there's no way in hell to train on a meaningfully good dataset this fast even on the most advanced AI accelerator
I js farted
Does anyone have tips on speeding up the process of building a dataset for a voice model? Specifically sifting through an entire stream or episode's worth of audio to find and isolate where the voice you're trying to clone is located? I figure there's a better way than manually going through it and comparing it to a transcript to easily locate them.
I have some problem when I try to use RVC on Colab
!howtoask
How To Troubleshoot 
- Don't simply mention your issue, like "my rvc is not working".
- Describe the step you are on, what you're trying to do, the RVC you're using, a screenshot, etc.
- The more context, the better.
- Don't be desperate. You can ping a Helper, but if they ignore, they aren't available/don't know the answer.
- It's okay if you're frustrated, but don't take it into this server.
- Don't DM without prior consent.
- Don't ask for every little instruction. Put your own effort & test things by yourself.
- Don't ask to ask.
- Check if your answer is a Google search away/on our guides website.
Which RVC Colab notebook?
yeah, ig this will never end
this infection will just keep on spreading and spreading

I dont have
rip
The original W-Okada Colab is broken. Using the Deiteris fork of W-Okada on Colab with the free plan can get you terminated.
Yeah I already tried
Use Kaggle instead, but you'll have to register with your phone number on this one. https://www.kaggle.com/code/suneku/voice-changer-public
i dont know how can I run it without GPU
Also, the guide for fork W-Okada. https://docs.aihub.gg/rvc-voice-changer/local/deiteris-w-okada-fork
Last update: May 5, 2025
I mean, you can just simply do it:
but whether you should ( due to delay ofc and stutters ) that's another story 
better if you follow what Namari says, less hassle for nothing beneficial really.
W-Okada is a free and open source program. The Deiteris fork of W-Okada is the one I use.
What do u mean "can get me terminated"?
at best your google account will be banned from colab
at worst bye-bye google account
how to fix this ??
Will it cause any impact or not ? Should I keep it as it is or there is a need to fix it ?
If there is some people who need someone to make them thumbnail i can help im a thumbnail youtube maker , mp
you resumed the training, but since you had some progress after the previous save logged, the jump back got logged twice
does not really matter
Actually, due to a power cut my pc went off
That caused the issue
same thing
After this, is the data I see accurate?
it is just the log got double entries for 18.5k (saved epoch) - 19k (when you lost power)
can someone explain why there is a paywall on voice.ai and how to remove it? and it's not in the browser
you see, when someone decides to make money off the fools who just dont know any better, they add a thing called paywall to a free software
and then they dangle "you can do it for free*" (* not really)
so it cant be removed
for the money
ok
if you got a decent gpu you can run the same thing locally on your PC, or on google colab/kaggle for free
i managed to delete all the silence from the dataset this time, is it good now? (i use Applio on kaggle)
when i try to delete a voice model i get this pop up and i have to reload page
close the voice changer then delete the folder in model_dir
alr ill try that now
What is that
**in the colab version of AICoverGen i get this for the Run webUI command**
Timer: 00:02:14
Traceback (most recent call last):
File "/content/HRVC/HRVC/src/webui.py", line 10, in <module>
from main import song_cover_pipeline
File "/content/HRVC/HRVC/src/main.py", line 22, in <module>
from rvc import Config, load_hubert, get_vc, rvc_infer
File "/content/HRVC/HRVC/src/rvc.py", line 5, in <module>
from fairseq import checkpoint_utils
File "/usr/local/lib/python3.11/dist-packages/fairseq/__init__.py", line 20, in <module>
from fairseq.distributed import utils as distributed_utils
File "/usr/local/lib/python3.11/dist-packages/fairseq/distributed/__init__.py", line 7, in <module>
from .fully_sharded_data_parallel import (
File "/usr/local/lib/python3.11/dist-packages/fairseq/distributed/fully_sharded_data_parallel.py", line 10, in <module>
from fairseq.dataclass.configs import DistributedTrainingConfig
File "/usr/local/lib/python3.11/dist-packages/fairseq/dataclass/__init__.py", line 6, in <module>
from .configs import FairseqDataclass
File "/usr/local/lib/python3.11/dist-packages/fairseq/dataclass/configs.py", line 1104, in <module>
@dataclass
^^^^^^^^^
File "/usr/lib/python3.11/dataclasses.py", line 1232, in dataclass
return wrap(cls)
^^^^^^^^^
File "/usr/lib/python3.11/dataclasses.py", line 1222, in wrap
return _process_class(cls, init, repr, eq, order, unsafe_hash,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/dataclasses.py", line 958, in _process_class
cls_fields.append(_get_field(cls, name, type, kw_only))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/dataclasses.py", line 815, in _get_field
raise ValueError(f'mutable default {type(f.default)} for field '
ValueError: mutable default <class 'fairseq.dataclass.configs.CommonConfig'> for field common is not allowed: use default_factory
Timer: 00:02:16
can someone help?
yes
makes sense, that colab isn't working
U have 2 ways:
If u want to do everything automatically like AICoverGen, use RVCAICoverMaker.
If u have some time, u can do it manually using UVR5 + Applio
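For context on the traceback above: the final `ValueError` comes from Python's `dataclasses` module, which from Python 3.11 rejects an unhashable object (such as another dataclass instance) as a field default. The sketch below reproduces the same class of error with made-up class names (`CommonConfig` here is a stand-in, not fairseq's real class) and shows the `default_factory` form the message asks for. It only illustrates the error; it is not a patch for fairseq or the Colab.

```python
from dataclasses import dataclass, field

@dataclass
class CommonConfig:
    # Stand-in for a nested config object; regular dataclasses are unhashable.
    seed: int = 1

broken_error = None
try:
    @dataclass
    class Broken:
        # On Python 3.11+ this raises ValueError: "mutable default ... use default_factory".
        # On older versions the definition is accepted and broken_error stays None.
        common: CommonConfig = CommonConfig()
except ValueError as exc:
    broken_error = str(exc)

@dataclass
class Fixed:
    # default_factory builds a fresh CommonConfig for every instance.
    common: CommonConfig = field(default_factory=CommonConfig)
```

This is why the same notebook could run on an older Python runtime and fail once the runtime moved to 3.11.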
can you provide any link for RVCAICoverMaker
thank you
can you also provide me with an article or a tutorial for creating my own ai cover voice models?
-rvc
what's the best chunk and extra setting for an rtx 4060? i tried looking on the website but couldn't find 4060 settings
what do you guys use for voice changers?
guys
?
how can i get the voice changer i want through the weights site?
i have w-okada
but i wanna have peter griffin as my voice changer
and i can't find it in voice models
are u typing fast or are u nervous?
yeah
you press the three dots
but how can i get it into w-okada
and press download
u can also find peter griffin models here https://discord.com/channels/1159260121998827560/1175430844685484042
true
some of the peter griffin models on weights are shit cuz they just make the model without cleaning it
oh ok
anyone anybudy 🫠
that's good as long as it doesn't go down or up. sometimes the tensorboard gets a little confusing
but u gotta test the model if its good or nah
iknow
Any turkish girl models?
nah
how 😉
its this server thing right?
how do i make the perfect dataset?
Here you go
you guys use the rvc gui thing for ai covers right?
yes
help me choose a gpu. currently i use an rtx 3050 mobile for rvc and the voice is good. But now i want to build a pc and change the gpu to an rx 6700 xt. Will the sound be different?
sound wont be different
6700 is not a big upgrade
no
rvc-gui is heavily outdated and nobody should use it
its good ?
what do you use/recommend?
well, applio / my fork of it
or the original rvc, but there's no point in that
whereas rvc-gui specifically, yeah, outdated (the last update it had was 2 years ago, so that should give you an idea of how old it is)
tl;dr. What should you choose?
If you want advanced features and are yourself rather advanced / want nightly features n stuff, my fork
https://github.com/codename0og/codename-rvc-fork-3
If you want simplicity and something that just works, og Applio
https://github.com/IAHispano/Applio
sire, i want to ask if WavLM has a good outcome :<
wavlm is not ready
thank you!
Hi! Good morning/afternoon/evening! Can someone help me create voice models? A simple explanation or, if possible, a collab that does it would be very helpful!
Here, you can start by reading the docs (Check the first guide)
-rvc
Will the sound break up?
it does not really matter what device you're using to train, math is math
in my voice changer i have cpu, gpu0, gpu1, gpu2, gpu3. which do i choose?
none of them
please try the better one in: https://docs.aihub.gg/rvc-voice-changer/local/deiteris-w-okada-fork/#download-amd-intel-and-cpu-on-windows
(assuming you have AMD gpu)
hmmm is there way to verify?
The Applio output looked very similar to RVCv2 in size, and the wav samples also sounded very similar. Both had pitch training too. I thought it was a little weird how stupid fast it was going... but it appeared all the training files were there.
im having trouble with training a model
it keeps saying "list index is out of range"
how do i fix it?
does it sound like your voice model? how many steps per epoch are shown in the log?
do i need to have amd igpu for it?
the radeon RX one, not igpu (aka. radeon graphics)
rtx won't work?
it should be the Nvidia version https://docs.aihub.gg/rvc-voice-changer/local/deiteris-w-okada-fork/#download-nvidia-on-windows
but that version is from 7 Dec 2024
it's the latest one
ok
for RTX 50-series you'd need this one https://docs.aihub.gg/rvc-voice-changer/local/deiteris-w-okada-fork/#download-nvidia-rtx-5000-series-on-windows
it doesn't matter as you won't use it
so its basically the same as the old voice changer?
dec 2024 is still not too old, it's from just before the 50-series launched
it should be the same I suppose
in releases it has cuda part 1 and part 2
you need all the parts to extract it
ok
I'm using RVC AI Cover Maker and for some reason it won't convert. I upload the music via drag and drop, select the model, click convert, and it says error.
this is the precompiled version
@fallen wing whats your gpu
After downloading, how do you set it up? I haven't used RVC for like 2 or 3 months
Download NVIDIA on Windows. Your laptop has NVIDIA GeForce RTX 4050 Laptop.
!howtoask
After downloading i assume i extract it?
Read audio setup
If the zip finished downloading, use 7zip or WinRAR to extract it to a folder. 
No need to ask every step on this one.
Ye you asked questions that get answered by scrolling for 1 second on the guide
If anythings still unclear ask
My ass be anxious cause I accidentally killed my laptop one time from something similar to this, not rvc tho
There's nothing in the guide that risks killing your laptop, otherwise it'd mention the risky steps
I read but traumatized fr
if there's any issue on the program's end, you could tell @viscid moss btw
what did u possibly read last time 😭
What you want to do
What kind of RVC WebUI fork is this? I use Applio, the RVC fork.
I want to use real time voice changer
idk I just found link and downloaded 😅
and sorry for the late late replies
obviously some ancient shite
older than dinosaurs in AI terms
xD
can u give me the latest version, if you have it
or a link to a guide
what's your GPU?
1650 (laptop)
alright
thank you
is there a guide here on how to make an AI voice model for RVC
just in case, he sent you the actual rvc, which is not for realtime
How do I use for realtime then
Do I need to record my voice and then convert it in rvc?
you want to change the voice of a recording? use applio
you want to change your voice in realtime to use it on disc/games? use deiteris w-okada
realtime in games on 1650... i'm ded

Going afk for a bit*
yea he cant do that with a 1650 laptop gpu
i mean.. if you play competitive Solitaire
yeah I wanna do that
yeah its not a good gpu
can u give link for that
I'll try both and see
-realtime
Guides for Programs that use RVC Models in Realtime for Calls/Games
Most suggested. GUIDE
ONLY the latest alpha comes close to the Deiteris Fork performance, older versions in youtube tuts are way worse. GUIDE
Unavailable, the guide is outdated and the program is worse compared to the ones above, and much less updated
first link
you may not be able to use it while playing games, your gpu is too weak
bc i don't have a really good computer, i've tried this voice changer on kaggle. the thing is, with this voice changer, upon clicking any of the models i get this error: "TypeError: Cannot read properties of null (reading 'modelSlots')
TypeError: Cannot read properties of null (reading 'modelSlots')
at i (https://85c5fd18-860e-42eb-8752-dc96044fe4e4.hrzn.run/index.js:2:1305771)
at Object.updateServerSettings (https://85c5fd18-860e-42eb-8752-dc96044fe4e4.hrzn.run/index.js:2:1306003)"
then while trying to initialize again i get this: "TypeError: (0 , Zz.removeDB) is not a function
TypeError: (0 , Zz.removeDB) is not a function
at https://85c5fd18-860e-42eb-8752-dc96044fe4e4.hrzn.run/index.js:2:3293292
at m (https://85c5fd18-860e-42eb-8752-dc96044fe4e4.hrzn.run/index.js:2:1536451)
at Generator.<anonymous> (https://85c5fd18-860e-42eb-8752-dc96044fe4e4.hrzn.run/index.js:2:1537797)
at Generator.next (https://85c5fd18-860e-42eb-8752-dc96044fe4e4.hrzn.run/index.js:2:1536880)
at e (https://85c5fd18-860e-42eb-8752-dc96044fe4e4.hrzn.run/index.js:2:1543652)
at s (https://85c5fd18-860e-42eb-8752-dc96044fe4e4.hrzn.run/index.js:2:1543855)" any fixes? im using hrzn
this is w-okada's voice changer btw
and yes i do have the modded version
but i also wanna try this out
yes, but what's your pc gpu first?
should i use codename fork instead of applio?
hi
should i include the .index file in my model zip file?
i use only the final .pth and add it to my zip
Replay software can import only .pth file without error.
i want to know: if i don't use the .index, can it affect my final result?
help, an error pops up
[Voice Changer] Web Server Launch Exception, DLL load failed while importing beatrice_internal_api: The Dynamic link library initialization program (DLL) failed.
[VCClient] wait web server...10 http://127.0.0.1:18888/
[VCClient] wait web server...20 http://127.0.0.1:18888/
[VCClient] wait web server...30 http://127.0.0.1:18888/
only if you want to tweak 20 things at once and experiment
do u think its worth it?
have you ever trained a good voice model with old RVC/mangio/applio?
it is not noob friendly after all
please elaborate:
- ur pc gpu
- what u want to do
- the tut link
I'm using RVC and stuff, as a voice changer for VRChat, whenever I load a model, I get this error: Error:'NoneType' object has no attribute 'host_api'
please elaborate:
- ur pc gpu
- the tut link
The what link?
the tutorial link that you're following
do i have to use RVC GUI for files or can i just use wokada?
if your wokada has input=file, output=file, then yes you can use it
or you can just use the cloud option on huggingface with illaria rvc
is there a command to see the RVC GUI link thing, like with -realtime?
-rvc
RVC i think
Is the colab working?
does anyone have this kind of problem when merging models?
anyone know how to install rvc voice changer
try using Applio for merging models
oh ok I've been using the older versions, do you have a link for that one?
wait I think I found it
thx
be sure to be using wokada deiteris fork
what’s ur pc gpu? What do u want to do?
RVC = Retrieval-based Voice Conversion, the best Speech To Speech AI models (on v2). It runs inference (uses models) on pre-recorded audio (ai covers) and trains (makes) models. Technically, Mainline RVC does have a go-realtime.bat (aka RVC-GUI), but it's pretty messy and outdated, so it's strongly discouraged for realtime.
Wokada = uses RVC for realtime inference. There are 2 main versions: the Original made by Wok, and the most suggested one, the Deiteris Fork (modified version)
elaborate
what’s ur pc gpu? Which colab link? How much time did it take?
I also use that but sometimes it's slow, so I run w okada locally if that happens
what? Wokada has 2 main versions
Wokada has 2 main versions:
- Original made by Wok
- Deiteris fork (modified version) made by Deiteris
each version has its own updates
the latest deiteris fork has way better performance and quality than the latest original
wokada deiteris fork got more performance
be sure to not use original wokada
it is local for both
be sure to not use rvc gui
what settings should i use in UVR vocal remover if im trying to make a dataset?
this is what i have currently
its supposed to be a talking model
how do i trim audio in audacity efficiently? its hard to get the exact trim
what’s ur pc gpu? What do u want to do? What tutorial link are u following?
is this the right place to ask my question
seems like wokada deiteris fork latest version, be sure ur nvidia drivers are up to date
yeah
yes
okay
lmk
working now, thx
still learning ?
I can't send files to make ai covers
Hey all, I have a problem with the voice changer from w-okada.
It runs and works perfectly, but when I use it while playing a game, e.g. RedM, I have a huge delay, the voice lags and I can't really use it.
If I tab out of the game and switch to any other program such as Discord, the lag and delay are gone.
Can anyone help me ?
what's your G P U?
hey guys, i have a 2080 ti and have tried using a voice changer to troll, but quite a few times now people have called it out very quickly. Are there any tips to make it sound more natural? idk what else to say, should i provide a demo of how it sounds?
NVIDIA 3060 (12GB)
.
I do have higher settings, but the problem also appears if I play other games too.
I can have the game open and just switch to a different window, then the voice changer works fine. I can show you if you have time
Set every single game settings to the lowest, every single graphics and everything
Show a screenshot of ur wokada while running the game
Be sure to close useless programs in the background
On it!
is it bad or something?
It's extremely outdated and abandoned since 2023
I'm talking about the rvc GUI fork from t1g3r or smt like that
What's ur PC GPU
Any updates
So it helped but its still not good enough
sadly i need to go now, i will msg you later
4070S
-gui
rvc gui by tiger18n is too old and doesn't have the rmvpe pitch extractor, which is the current state of the art
Send a screenshot later
Applio is a newer updated rvc fork
alright will check it out later when I get home
yesterday i updated to wokada fork and its waaay better than the one i previously had
Rvc (like the applio fork) has a completely different purpose than wokada
I was using this colab
On my phone
And the time it took was like, forever
When i clicked to convert, it didn't convert
I accessed the colab through this link, from the ai hub docs
You can't always use the same Gradio public url, you need to get a new one each time by re-running the whole colab
is anyone using rvc-project? im not a coder or anything and im literally just a casual person trying to clone a voice. i was using grok.ai to help me out but it seems like the one it asked me to download just runs into problems over and over again, does anyone have one where there aren't any issues at all? grok.ai has been asking me to add debugging scripts all god damn day
would appreciate the help, if any!
copy and pasted from the other channel lol
Cant msg you, you have to add me
no need to dm, this channel exists for help
lemme guess, you used a youtube tutorial?
this is an old original wokada version which has worse performance and quality
along with bad settings
plus windows users reported that vb audio cable might give issues
forget everything about the tutorial you followed and uninstall the programs you got from it
-realtime
what's your pc gpu and what do you want to do?
i got an rtx 4070ti and i'm trying to clone a voice locally since it's "better" i guess? and then use that cloned voice for the wokada voice changer. however i just saw you mention how the one used in yt tutorials is old and has performance issues. so i'm assuming i should scrap the old wokada and download the newer one?
wokada is for realtime
not for training
inference = use models
inference realtime = use models in realtime
training = actually making the models
do you just need to use the models in realtime voice changing?
i know i know, i want to train one and then import it to wokada, i want to train one using rvc project, which was my main issue since the one i have installed has too many issues
the original w-okada performance is very bad compared to the modified version
if you wanna train some models i'd recommend you to use the compiled version of applio, then for good results ensure your dataset is clean (remove room reverb and noise), use the right batch size and don't overtrain the model
also local is better because you won't be limited by the free cloud time, since you got a good pc gpu
check applio for training models
and get wokada deiteris fork for realtime inference
sounds good, thanks! ill let you guys know if i run into any issues (hopefully i dont)
I didn't understand
be sure to always re-run the notebook whenever you need it
don't just use the same gradio public url link
because it will always change whenever you re-run it, as the notebook won't stay running 24/7
the problem isn't the colab link
the problem is you used the same url public link that was expired
be sure to always redo the same colab process everytime you need to use it
I did, okay i will delete everything
I got Cuda, VB - Audio, python
Delete everything ?
I currently am deleting everything as told 😮
realtime voice changer does not need python, it comes with its own packaged in, and I think does not require cuda toolkit either
he used an old ass youtube tutorial
[in Spanish] How do I make the evolution stones?
Final Step on installing...
Hey, quick question: Let's say you have quite a few models in Okada's directory and you want to organize them better. Can you move models into different numbered folders and have it reflected in the program's slots next time you open it? Or do you have to delete the folders like it says in the guide and start over?
So ingame i cant rly use it, any setting i should use
this server is english only
set extra to 2.7, also try it in game and show again the program while in game at min settings
i can't really vc
uncheck sup1
put game graphics to minimum
be sure to also play with the pitch and other models, not all models are perfect
They are on minimum, sup1 is unchecked
Is there a way to create a ticket for live support via vc?
the settings are good, u could optionally use force fp32 mode on for worse delay but slightly better quality, and also server mode for less delay https://docs.aihub.gg/rvc-voice-changer/local/deiteris-w-okada-fork/#reduce-more-delay
u can optionally also check https://docs.aihub.gg/rvc-voice-changer/local/deiteris-w-okada-fork/#models-to-try
nope
tickets are meant for moderation
it depends on whether a staff member can help via vc, which not everyone can
forget about the delay, the voice sounds robotic 95% of the time when ingame
then try another model
not all models are good
Thx for the help, for today i give up
you just need to try other models really, i even gave you some suggested models
I know, and i will, but for today i got no time left.
You already helped me alot, and it got better, i will get there
alr lmk
I continue to get "Not enough data present in the training set. Perhaps you forgot to slice the audio files in preprocess?" 22 min dataset, saved as .wav, with the path to the folder in the Preprocess. I even tried uploading my dataset. What's up with this error?
Perhaps you forgot to slice the audio files in preprocess?
that's kinda the fix
I mean, I sliced to 156 files. but it's just one file that is added to preprocess, right?
it's been a minute for me since the RVC V2 Disconnected colab days, so I'm trying to catch up lol
is the audio cutting feature set to automatic in the preprocess settings?
and in the tensorboard process when I click preprocess, it shows the 22 mins of data is uploaded. So it has to do something with the slices?
does it show only now or did it also show before that error?
It shows after i finish everything else and hit train
you need to make sure preprocess output shows X minutes were loaded, then you need to make sure extract features did extract N segments
thanks for your help - one last thing. With the sliced audio, back in the day you'd just upload the .zip with all files. Do the files need to be spaced out in the DAW (Ableton, Audacity, etc) by x amount of ms before export?
depending on the colab, in some cases you did upload a zip with source audios, in others you uploaded an individual .wav file(s)
you dont need to split audio in audacity
then I don't know what the issue is. Any sample datasets on here I can evaluate & see where I'm getting hung up?
do I add the sliced audio to the folder of the path I input?
i'll just give you this picture
if you're using applio colab with UI, you click [x] dataset creation, enter the set name, then upload individual file(s)
then preprocess
if you're using applio colab without UI, you put the files into a folder on your google drive and provide the folder name
if everything is done properly you'll see 'Preprocess completed on x seconds of the audio' and then extract features is gonna have about x/3 slices (x in seconds)
so for one hour of audio there should be 1200-1300 slices
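The rule of thumb above (roughly one slice per 3 seconds of audio) can be turned into a quick sanity check. The function names and the 50% tolerance below are illustrative assumptions, not anything Applio reports itself.

```python
def expected_slices(audio_seconds, seconds_per_slice=3.0):
    """Rough slice count, using the ~3 s per slice rule of thumb from the chat."""
    return round(audio_seconds / seconds_per_slice)

def slices_look_plausible(actual_slices, audio_seconds, tolerance=0.5):
    """True if the actual slice count is within tolerance (a fraction) of the estimate."""
    expected = expected_slices(audio_seconds)
    if expected == 0:
        return actual_slices == 0
    return abs(actual_slices - expected) <= tolerance * expected
```

For one hour of audio, `expected_slices(3600)` gives 1200, which matches the lower end of the range quoted above. A count far below the estimate suggests the preprocess or extract step did not see the whole dataset.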
Yep this is where I went wrong. Much appreciated!
I think the problem is not that the link has expired.
Look, I followed the step by step correctly, I just clicked on the link, there's no way it could have expired
I just recorded this video
Oh you're not using the t4 GPU
So it's running in CPU
Click the arrow top right
Change runtime type
Set t4
Then re run the colab
GPU is wayyyy faster than CPU
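A quick way to confirm whether a session is actually on the GPU, rather than silently falling back to CPU, is to ask PyTorch directly. This is a minimal sketch that assumes PyTorch is what the notebook uses under the hood; `pick_device` is a hypothetical helper, and it simply reports "cpu" if PyTorch is not installed.

```python
def pick_device():
    """Return "cuda" if PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # No PyTorch in this environment, so nothing can run on the GPU through it.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

if __name__ == "__main__":
    print(f"Running on: {pick_device()}")
```

If this prints `cpu` in a Colab session, the runtime type has most likely not been switched to a GPU yet.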
I don't mean to be a pain in the ass, but look at the attached. 545 slices from 23 min of audio (and they show in the log file), but then look at the bottom, I'm still getting the error. cc: @slim schooner
I changed it and still the audio doesn't convert
😭
Did you click save then re run the whole notebook?
Yes
When you switch between GPU and CPU the whole session restarts, so you have to do everything again
From what's in the background, it doesn't look like you ran the cell again
I redid everything
Are you sure you clicked save then did everything again? There should be at least some output in the background and also showing the cell running but nothing is running
It's because I took the screenshot when I changed, not after I did it all over again
Could you show how it's going right now?
not sure where it went wrong for you cus for me its working just fine it says its training and my gpu is getting worked lmao
Wait a minute, I'll record another video
id help if i could but im just following the pros here lol
how many files are in the f0, f0_voiced and extracted folders?
-colab
Google Colab is a Cloud (Remote Good PC) Service. While the Free plan provides up to 12 hours of daily usage, the GPU is typically available for only about 4 hours each day on average.
- Google Colab by IA Hispano
- Google Colab by Hina
- Google Colab by Eddy
- Google Colab by Eddy
- Google Colab by Deiteris & Hina
- Google Colab by Shiro & Eddy
- Google Colab by Nick088
- Google Colab by Nick088
- Google Colab by Jarredou & Makidanye
First time I ever see someone using m4a
Try converting the file to mp3 https://convertio.co/m4a-mp3/
It would have been better if you had shown the colab output after it failed to show the converted audio file, because that's where the actual error shows up
But my guess is that it might be because of the extension
Please try converting it, if it still doesn't work, do another video showing the colab output too at the end
545 in extracted, none showing in F0 and F0_voiced, 545 in sliced_audios
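If you want to check those counts without opening each folder, here is a small sketch. The folder names are the ones mentioned above; the `count_files` helper and the `logs/my_model` path are made up for the example, so point it at wherever your model folder actually is.

```python
from pathlib import Path

def count_files(model_dir, subfolders=("sliced_audios", "f0", "f0_voiced", "extracted")):
    """Return {subfolder name: number of files}; a missing folder counts as 0."""
    counts = {}
    for name in subfolders:
        folder = Path(model_dir) / name
        if folder.is_dir():
            counts[name] = sum(1 for p in folder.iterdir() if p.is_file())
        else:
            counts[name] = 0
    return counts

# Hypothetical path, replace it with your own model folder.
for name, n in count_files("logs/my_model").items():
    print(f"{name}: {n}")
```

All four numbers should match. If f0 and f0_voiced are empty while the others are not, the pitch extraction step is the one that failed.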
try rmvpe extractor instead of crepe
Ok
Aaaa so hard😭
I'm terrible at programming
This isn't programming 😭
ok training stopped after overtraining was detected, is 321 epochs good for 15 mins of audio?
Programming is the process of creating the instructions (code) for the computer to follow, while this is just inference (using the models) with the RVC program. AI tools are a bit harder to use since this one is open source and not meant to be easy to use, even though forks like applio make it kinda easier
don't use overtraining detector, it's not accurate
321 epochs sounds like a lot for just 15 minutes of data
monitor your model losses using applio's tensorboard, then stop the training when you notice the model starts to degrade
makes sense, so it would be safe to assume that its been overtrained and that overtraining results in quality suffering?
if the model sounds ok in multiple epochs but suddenly becomes super robotic and bad after a certain point you can assume the model got overfitted
it's pretty easy to hear those problems
you'll hear the model having a static-electric type sound
i hear it, usually at the start or at the end when i say something
thanks man, ill see what i can do 👍
Thank you, rmvpe works and fixed the issue. Any idea why crepe would be causing those problems?
do you achieve this by supplying more data? like how many minutes of audio did you use to train this voice model? im guessing 15 isnt enough as it isnt anywhere as good as what you have shown here, even the overtrain sounds better than what i got lmaoo
i used 42 minutes of data and a batch size of 8
cant say without an error dump
this is a 16 minute dataset
^ worse than the 42 minute model, but the quality is ok
so ideally you want your dataset to have good quality (remove noise and reverb) and diverse data
adding singing clips boosts performance i assume rather than just talking clips correct?
makes singing better yeah, these models weren't trained using singing tho, they were trained only using speech
i think it's fine to have both singing and speech in the dataset tho i have never tried it
ok and you can do that inside applio or do you need to source that outside of applio?
oh it looks like you deleted your messages lol but you mentioned noise can kill a model so i was wondering where you can do the cleanup and whatnot
say like maybe wind blowing or background music and stuff
lmao i got confused, it wasn't noise that killed my model, i had forgotten that the model that sounds bad actually sounds that way because I was testing shit settings. 
more of a me error

but noise still can kill a model or no?
yup, the embedder is sensitive to noise
and what would i use to clean an audio file? say maybe theres like background music or something? is that the "noise reduction" option in applio?
to separate vocals from instrumentals use the gabox_fv4 model in uvr
for just background noise (pc fans, mic noise, etc) mel denoiser aggr
aahh so the first option is only if its like people singing alongside a soundtrack?
and the second one would be as if people were speaking like in a livestream or something and you could hear their ac or something?
can also work if someone is just talking but has background music (like a streamer talking while having some song in the background)
but it cannot remove sound effects, only instrumentals
yes

aaa got it, i could probably add like 25 mins of audio then, hopefully this helps out the quality lol
thanks dude, you've been hella helpful
ok so the true answer to this is that it can remove some of the sound effects, but not all lol
there's another model dedicated to remove sfx but the results are veeery bad for rvc
be sure every audio is from the same source
rvc can't deal well with audio inconsistency
so same mic used and same recording place
so like if i were to talk in a cheap microphone in a potato phone vs a good microphone? both of these audio clips would contradict because of the audio quality difference?
makes sense
yup
for us it sounds fine but rvc isnt human, so
makes perfect sense actually, thanks

aaa😭
mhmm
The audio has finally been converted
Thank youu sooo muchhh
what's the best tts ai where i can use my own voices for free
hi i was wondering, is it possible to run VC on a phone. Of course not locally. Lets say i start my vc on my laptop, and i host it on my local network. Can i connect into it from my phone?
Yw
What's your PC GPU
The interface will be shown on your phone, but you can't really use it
Phones lack a VAC
A VAC (Virtual Audio Cable) makes a fake audio device, used to re-route the audio of different programs
In Wokada context, it's used to get the output of wokada as the input in other programs
hmm i see, as expected tbh
so theres no VAC for android?
or its straight up impossible
Nope
It's straight up impossible to use it on ur phone rn
aight then
how do i create a ai text based that uses gguf models? since i tried using chatgpt and it doesnt work
lol yeap...it was silently failing and just going through the motions >_<
FINALLY got everything working for real....took 5 hours....and trained model sounds beautiful!
I've used it for TTS and realtime-rvc. So happy to FINALLY have a success. I ended up getting Applio to work, after I fixed a few things it ran perfectly. RVCv2 still not running right though.
How can I post my trained voice model to the voice-models channel? I've already packed it in a zip included the index and config and a working onnx conversion too.
@analog obsidian hey not sure if you're on atm but ive come across this uvr app that im using, i know you mentioned that I should use gabox_fv4 however it's nowhere to be seen or found. perhaps i need to use a different uvr software? or if i could manually download it from somewhere and use it with the current one? same goes for mel denoiser aggr. hmu when you can
Is it normal for pre-processing a 10 minute data set in Applio to be taking over 35 minutes with a good CPU/GPU?
no
with simple slicing a 50-hour set of 109 files takes just a minute
Got it. Just restarted and it finished within seconds
Am I the only one who struggles uploading a model on weights.gg? For some reason it says that verification is not completed and my files are corrupted, but everything is fine when using it in local RVC
Hi, I'm using a male-to-female voice, specifically it's called psych2go-By-Dan. I'm using it on FiveM. The problem is that most of the time it sounds bad and very robotic. Can someone help me fix it?
AI-based processes are more resource-demanding than videogames.
If you got a great pc and if you're using W-Okada for voice changing, then it's likely that the issue is on your w-okada settings or the model itself.
I don't think he's using W-Okada, how can I show you? I'm not very experienced @odd shale
In case you're using any other sort of program for realtime voice changing, then discard it immediately
We only provide support for OG W-Okada and deiteris' fork.
All other ones are paywalled garbage.
real time voice changer
how can i download this app? and most importantly a realistic male to female voice? @odd shale
No need to ping me on every reply.
oh sorry
-realtime
Guides for Programs that use RVC Models in Realtime for Calls/Games
Most suggested. GUIDE
ONLY the latest alpha comes close to the Deiteris Fork performance, older versions in youtube tuts are way worse. GUIDE
Unavailable, the guide is outdated and the program is worse compared to the ones above, and much less updated
Read the deiteris guide.
I tried to install it but I don't understand how to open the application
I can't find the folder to open the application
is there a section where I can send you the screenshots? @odd shale
first of all, it's better you elaborate
what's your pc gpu? what tutorial link did you use? are the game settings to the minimum? can you show a screenshot of the program?
!howtoask
How To Troubleshoot 
- Don't simply mention your issue, like "my rvc is not working".
- Describe the step you are on, what you're trying to do, the RVC you're using, a screenshot, etc.
- The more context, the better.
- Don't be desperate. You can ping a Helper, but if they ignore, they aren't available/don't know the answer.
- It's okay if you're frustrated, but don't take it into this server.
- Don't DM without prior consent.
- Don't ask for every little instruction. Put your own effort & test things by yourself.
- Don't ask to ask.
- Check if your answer is a Google search away/on our guides website.
may be better to ask in discord.gg/weights
Force FP32 mode: on (THIS IS OFF BY DEFAULT!) Turning this on improves stability, significantly reduces glitching/artifacting, increases VRAM usage by 200 MB.
I put it ( on ) will it improve the sound quality?
Yes
But it’s very minor
I didn't understand what you mean
Yes it will improve the sound but not by a lot
Ok, should I keep it on?
ON
Yep
btw it will add a bit of delay for that little improvement
i wouldnt call getting stable results a minimal improvement
I didn't say minimal, I wanted to just add that it does add more delay
is there any way to... change... model names... in... okada... it... crashes... when... i.. try... to...
Error
unhandledrejection
no error stack
TypeError: Cannot read properties of null (reading 'modelSlots')
Which W-Okada version are you using?
elaborate:
- your pc gpu
- what tutorial link did u follow
i didnt follow a tutorial, i installed the fork and did the recommended settings
my gpu is 3060
rtx
what did you install exactly? send the link of whatever you used
fork just means modified version in IT field
if you make a project named "a" that just prints "a" on github, i can fork it, making a modified version that is called "ab" and prints "ab" to console
that's the meaning of fork
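same idea as a tiny toy example (nothing to do with wokada's real code, the function names are made up):

```python
# The original project "a": all it does is produce "a".
def project_a() -> str:
    return "a"

# The fork "ab": starts from project a's code and modifies it.
def project_ab() -> str:
    return project_a() + "b"

print(project_a())   # a
print(project_ab())  # ab
```

the fork keeps everything the original does and adds its own changes on top, which is exactly what the deiteris fork does with wokada.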
idk
wait im gonna go find
it was something nvidia cuda
send a screenshot of the entire program or the program folder if you really don't remember the link
it open in web
Wokada has 2 main versions:
- Original made by Wok
- Deiteris fork (modified version) made by Deiteris
each version has its own updates
the latest deiteris fork has way better performance and quality than the latest original
from my guessing, it might be the wokada deiteris fork, is it the version b2332?
it perfectly works
we have our guide for its latest version there https://docs.aihub.gg/rvc-voice-changer/local/deiteris-w-okada-fork/
Last update: May 5, 2025
alr it seems to be the latest version. the issue is that in the wokada deiteris fork there's a bug at the moment where you can't rename a model after uploading it, the only way is by renaming the model file then re-uploading the slot
thats no issue then easy peasy
just rename the original file
yup, then re-upload the slot
the .pth, and if there is one, the .index too
json isn't needed at all
you can delete the json
why is json not needed
it's just the metadata of the model got off weights.com
if i get the model i use the whole model blud
Or you can go through inside \MMVCServerSIO\model_dir\ folder, then use Notepad to edit "params.json".
it's not part of the model.. it's just extra info from weights.com which don't impact the model at all...
hmmm
You confused?
it means nothing for the program, it doesn't even use it
oh
but sure you can keep it and not save some storage if u really want lol
its only 3 kb
so
theres like nothing in it
ok it changed the name now
thanks
in rvc context:
- pth files: contain the voice
- added index files: contain the accent
- metadata.json file: it's just some extra info about the model download link if you downloaded it off weights.com, it's not needed and won't impact the actual model at all
In a voice model zip file downloaded from Weights, there may be a json file alongside the pth and index files. The pth and index files don't need it to work.
i mean sure do whatever, it won't really matter, tho it's usually just suggested to delete the json for saving some space
yw
I'm just too slow to explain.
nope lol
nope, that's the index
yes do you need the index to be on
That's not what a json file is. In this case, a json file stores metadata of a voice model.
i know voice models work without a index but is it really necessary
The actual accent file is an index file.
ye i just checked in it just realised that
Not really necessary on W-Okada.
you need to play with the value,
higher value can give you a more similar accent to the trained one but can make it sound like autotune
lower value will make it more like your own accent, but won't sound like autotune,
usually it's at like 0.3 or not used in realtime since it's not necessary but you can play around with it
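roughly speaking, the index value works like a blend weight between the features taken from your own voice and the closest features looked up in the model's index. a minimal sketch of that idea with made-up numbers (the `blend_features` helper is hypothetical, it's not the actual W-Okada or RVC code):

```python
def blend_features(own, retrieved, index_rate):
    """Mix two equally long feature lists.

    index_rate = 0.0 keeps only your own features,
    index_rate = 1.0 keeps only the ones retrieved from the index.
    """
    if not 0.0 <= index_rate <= 1.0:
        raise ValueError("index_rate must be between 0 and 1")
    return [index_rate * r + (1.0 - index_rate) * o for o, r in zip(own, retrieved)]

own_voice = [0.0, 1.0, 2.0]      # made-up features from the input voice
from_index = [1.0, 1.0, 1.0]     # made-up features retrieved from the .index
print(blend_features(own_voice, from_index, 0.3))
```

with a low value most of what goes into the model still comes from your own voice, which is why the accent stays closer to yours.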
K
so if the model doesnt have an accent file i can just mash potato it with another accent file from a different model
u can also optionally turn force fp32 mode in advanced settings for a more stable model and slightly better quality at the cost of some delay
if it doesn't have an index file, you can just not upload it, it's not needed
is this right
i mean u can use the index file of another model, but it would be useless
force fp32 mode
doesnt let me
click the stop button then try to change it
That's fast.
yes
yup, u can also optionally use server instead of client to have less delay with more complex steps but u won't be able to use noise/echo suppression
u interested in that?
I forgot where the index slider setting is located on fork W-Okada. Not gonna open my own W-Okada today, but this is what fork W-Okada looks like. https://cdn.discordapp.com/attachments/1371568582848417834/1372882565949816874/image.png
what does that do
thats the normal okada the one i have is fork
interesting
it's a setting
server = can have less delay but is harder
client = has more delay but can use noise/echo suppression and is easier
which do you need? you're using client right now
client
red or blue pill be faster but drain more energy be slower but have more energy
No. It's fork W-Okada in that screenshot. Everything is the same, the difference is that it's DirectML variant.
oh
also i noticed that the disable jit compilation setting is off
Disable JIT compilation: off for faster loading speed of the program, on for slightly better performance (10-15 ms, Nvidia only)
do you wanna leave it off for faster startup speed or set it on for longer start up but a bit less delay?
The normal or original version of W-Okada launches its own separated window. The fork W-Okada should always launch up a browser, so this it is.
what does it change
exactly
k
i explained it in the message, or is there a specific part that you didn't understand?
"on for slightly better performance" on what the models or the function of ai
better performance at using the ai models, like reducing the delay of around 10ms
provides a very small speed boost
Is there anything you're still confused about? I can explain more about that.
ig ill enable it
no i think i tried the normal version of okada
but i know it was bad at that time
shit
alr, just be sure that ur using vac lite and not vb audio cable and ur all fine
u use vb audio cable for other stuff
i use vac lite for this thing
try restarting the program
like slightly slower to launch the program, but very minimal difference
oh
okay
yeah it's not going to take like 5 mins to start up lol, pretty little difference
u should be all fine now
K
what would be the best site to train my voice model?
you should first of all check if your pc gpu is good enough to do it locally rather than a site
what's your pc gpu?
@azure patio You can check your pc gpu on Windows via:
ctrl+shift+esc (task manager) -> Performance tab -> GPU
i think NVIDIA Quadro P620
u checked it's that in task manager?
if so, then yeah your pc has way too low vram, 2gb vram is nothing nowadays
yeah
Train (make) RVC Models on cloud:
- Prepare the Dataset
- Setup RVC:
Choose a cloud way to use RVC,
- Google Colabs (max 4 hours daily of a T4 16gb gpu, which isn't guaranteed on the free plan, so not many hours for training, but easy to use, there's a paid tier):
- Kaggles (a bit harder to use and needs phone number but gives 30 hours weekly of better gpus, either T4x2 16gb each or P100 16gb, only free):
- Lightning.ai (Kinda hard, needs login, no issue with web uis or anything, but only free 15 credits monthly, Free Studios run 24/7 but require restart every 4 hours. There's a paid tier):
- Be sure to know about the tensorboard
Google Colab = Easier but risk of getting disconnected
Kaggle = Harder but way more gpu time
If you are looking for the easiest free way, use https://weights.com/ which ofc uses RVC
RVC Inference (use models) on pre-recorded audio on Cloud
You can use either:
- Weights.com: Easiest Possible Ever Automatic
- Ilaria RVC Zero: Fastest free on cloud
- Applio UI Colab: RVC Fork with some extra features like TTS
- RVC AI Cover Maker UI: Automatically Separates the Vocals and Instrumentals, converts the voice and mixes them back
ah great 🥲
there's cloud, remote good pc
it's from 2018, now idk what u do with your pc, but AI is hungry for processing power and vram
yeah definitely, its just my laptop that i use sometimes
just like how content creation like video editing and games also need a decent pc most of the times
yeah, just to let u know the market is shifting away from 8gb vram gpus lol
on w okada deiteris fork i get an average perf of 37 - 42ms so how low can i lower the Chunk setting?
close every single program in the background, set game graphics to the minimum, show a screenshot of ur wokada
no no, without any game open
just discord and wokada
damn thats crazy
alr, show a screenshot of ur wokada
ok
yeah things change so fast, missing like half a year on the ai field is like missing 10 chapters atp for example
always new models racing with each other
now im not saying every single program and model but you get the point, the majority is, just like chatbots
i cant share screen
no perms i think
have not checked it that much recently, but i was wondering whether voice models for examples are progressing much, i remember my models from 2023 being not that bad compared to now
unfortunately the original rvc devs left the project to kinda rot to work on another TTS project instead, GPT-SoVITS, which is different than RVC which is STS, so now rvc is mostly being experimented on by the community
We are having improvements but mostly about performance
so how low can i lower the Chunk?
ah i see, too sad for us 🥲
You might wanna check #🔊│ai-development
There's some of our engineers experimenting
Right now the most up to date fork is applio with the best performance improvements
apparently i have no access
get the ai testing roles in the guides at the top
hey guys, just wondering if i could get some help here, i originally pinged Lyery but they may have missed my message. anyways, i was running some tests on this uvr software and it seems to always produce a slight "static/hiss" noise throughout the whole clip. its not very noticeable unless you listen very carefully but like people have said rvc aint human so theres a chance it will pick it up and ruin the whole model. Lyery mentioned using gabox_fv4 and mel denoiser aggr for separation of vocals from instruments or background music and background noise like ac, fan, etc, respectively. however i can not find these two models. my question is, do i need to download a different uvr software or can i download those models from somewhere for me to use with the current one?
join the audio separation discord (google "audio separation discord" since we cant share invites here)
then download the latest uvr beta gui there, if you got more questions regarding separation models ask them in that server
thanks Lyery, i'll do that 👍
https://huggingface.co/jarredou/aufr33_MelBand_Denoise/tree/main
https://huggingface.co/GaboxR67/MelBandRoformers/tree/main/melbandroformers/vocals (download fv4, fv5 is noisy and bad for rvc)
hmm iirc latest uvr beta can download them from the gui itself
gonna check that rn
sounds good, i may not have downloaded the beta as i was not able to find those two there, but if the beta has 'em ill download that version instead
i personally use the colab version of uvr because the conversion can be quite slow with big datasets (anything above 30 minutes)
ahh makes sense, i decided to split them into clips instead to make it easier somewhat.
ok none of these can be downloaded using the gui
ideally you want to merge all of the files into one
because for rvc training you have to split the whole dataset into 3s chunks
the process is like this:
merge all files into a big one singular file
truncate silence
simple slicing in applio using 3s and 0.3s
i could use adobe premiere to combine them all in one right?
audacity is a bit weird and i dont like its ui
you have to use audacity to truncate the dataset silence
otherwise results are going to be bad
rvc kinda hates silence
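to give an idea of what truncating silence means, here's a bare-bones sketch on a plain list of samples. it's not audacity's algorithm, and the threshold and max silence values are placeholders i picked for the example:

```python
def truncate_silence(samples, sample_rate, threshold=0.01, max_silence_s=0.5):
    """Shorten every quiet stretch so it lasts at most max_silence_s seconds.

    samples: floats between -1.0 and 1.0 (mono).
    A sample counts as silent when its absolute value is below threshold.
    """
    max_run = int(max_silence_s * sample_rate)
    out = []
    run = 0  # length of the current silent stretch
    for s in samples:
        if abs(s) < threshold:
            run += 1
            if run > max_run:
                continue  # drop silence beyond the allowed length
        else:
            run = 0
        out.append(s)
    return out

# Tiny example at a fake sample rate of 10 Hz: 1 second of silence between two sounds.
audio = [0.5] + [0.0] * 10 + [0.5]
print(len(audio), "->", len(truncate_silence(audio, sample_rate=10)))
```

the loud parts are untouched, only the long pauses get shortened, so the 3s slices end up with actual voice in them.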
i could do that, i meant as just mainly the combining part, like combine all the files into one export that and import the single file into audacity?
yup
installing the uvr models is easy
be sure to download the ckpt file alongside its yaml
then just follow the gui instructions
remember to export the file as wav 32bit
alright so i combine all files into one using adobe, export that as wav 32bit, and do the cleanup in uvr and then use the cleaned clip in applio?
- combine all files into a big singular one
- clean the dataset
- convert to mono and truncate silence
- in applio select "simple mode slicing" and use the default settings of 3s and 0.3s
- f0 extraction
- train
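if you'd rather script the first step than do it in an editor, here's a small sketch of merging clips into one file with python's built-in `wave` module. assumptions: all clips are integer PCM wav files with the same sample rate, channels and bit depth (the `wave` module can't read float wavs), and the `merge_wavs` name is just made up for the example.

```python
import wave

def merge_wavs(input_paths, output_path):
    """Concatenate PCM WAV files that share the same format into one file."""
    input_paths = list(input_paths)
    if not input_paths:
        raise ValueError("no input files given")
    expected = None
    with wave.open(str(output_path), "wb") as out:
        for path in input_paths:
            with wave.open(str(path), "rb") as src:
                fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
                if expected is None:
                    # The first file decides the output format.
                    expected = fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif fmt != expected:
                    raise ValueError(f"{path} has a different format: {fmt} != {expected}")
                out.writeframes(src.readframes(src.getnframes()))
```

cleaning, truncating silence and the slicing itself still happen in uvr, audacity and applio like in the list above, this only covers the "one big file" part.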
gotcha, ill keep note of that
you can use these truncate silence settings, they work fine for me (those are the codename settings, i just borrowed them)
niceee this is gonna be helpful, thanks for that 👍
does anyone know what is causing this? i am copying the path of my audio file and im assuming its not finding it, when i try to pre-process
what path do you give to preprocess?
3060ti
I want to do the voice of Mario Bros.
I want to download it for an animation.
Hello dear people, I have a question. So I'm working on an Alex Mercer voice model and I want it to be as high quality as possible, its dataset length is: 00:30:58
What's the best batch and epoch for it?
Forgot to mention, I'm a newbie at training models and my goal is to become a model maker, this is my first model.
i think weights is down again
or it’s at least having technical difficulties
i keep getting no healthy upstream errors
i went ahead and downloaded gabox and i get this error, not sure what went wrong here, sorry for the amount of screen shots lol, what would the fix be here? anyone know what went wrong? not a coder so whatever happened i have no idea 😭
have you installed the beta roformer version?
dang forgot about that, lmaooo let me do that
How do I fix this? It does not show my mic
What browser are you using
Edge
Be sure you gave it microphone permission, you may also want to try if it works on chrome or firefox
I just copy the path of the dataset file
just wanted to let you know i got it working and god damn, it's clean asl, no static, no background music its freaking crazy lmaoo
also just one question, i split the clips (beforehand like i said) but some clips have music while the ones that dont have any music do have paper flipping, would it still be fine for me to combine all clips and run them through both cleaning methods? or should i clean them separately and then combine them in adobe and then run that single file through audacity for truncate silencing and then export it to applio for training? lmk what you think
hi sorry if this was asked a dozen times, but how/where can i use the Cloud RVC?
you need to provide a path to the folder, not to the file
like G:\training, not G:\training\character.wav
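in code terms the fix looks something like this. just a sketch, the `dataset_folder` helper is made up to show the idea and isn't part of applio:

```python
from pathlib import Path

def dataset_folder(path_str: str) -> Path:
    """Return the folder to give to preprocess.

    If the path points at a single audio file, use the folder that contains it;
    if it already points at a folder, keep it as is.
    """
    path = Path(path_str)
    if path.is_file():
        return path.parent
    return path
```

so a path that ends in character.wav becomes the training folder it sits in, and a path that is already a folder stays the same.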
aah i see, thanks
Also guys, how do I make an ai cover? The rvc v2 thing got shut down in google colab cuz it's no longer free
also use rmvpe not rmvpe_onnx
what's your pc gpu first of all?
you should first of all check your pc gpu to check if it's good enough for local
did you check it?
because it's pointless to use cloud when you got a good pc
i have an AMD gpu, is it ok but just slower?
AMD ryzen 5 7535U with radeon graphics
thats whats listed on my device manager like 15 times
that's a CPU, not a GPU
oh sorry...
the CPU is the Central Processing Unit, while the GPU is the Graphics Processing Unit
AMD radeon(TM) graphics?
that's extremely weak integrated graphics, do you got any other gpus?
like a dedicated amd gpu
no idea where to check
or is it an old laptop
yes
You can check your pc gpu on Windows via:
ctrl+shift+esc (task manager) -> Performance tab -> GPU
it's better to check
i have no idea what im looking at, but it says the dedicated gpu memory is 0.8/1.0 gb
same gpu
just send a screenshot of your task manager
!give-media-perms 1h @empty pecan
nvm, your pc is extremely way too weak for local ai usage
mhm!
what's your pc gpu and what do u want to do exactly?
My gpu is like 6gb i think and someone said that it's weak. anyways i've wanted to run seed vc but i've been getting countless issues and new errors
i mean the gpu name
also, you want to do realtime voice changing for calls/games?
NVIDIA I think
which?
Basically how it works is that you put an audio.mp3 and you can use that audio to talk
3060
in rvc context:
Inference = using the models
training = making the models
there's inference on pre-recorded audios like ai covers, then there's realtime inference which is used for discord voice channels for example
which do you need?
RVC = Retrieval-based-Voice-Conversion, it's the best quality few shots training Speech To Speech AI
But in there you don't need a pth and an index file, anyways can i send a video so you can understand what i mean?
so you got the rtx 3060 with 6gb vram?
Yup
seed vc is 0 shot, i have seen that project before, but 0 shot is also lower quality compared to few shots
are you sure you want to use seed vc compared to rvc?
Not sure..
also, i asked you if you wanted to use ai for pre-recorded audios or realtime for calls, because there's different programs
id need to at least know this
Uh i just want it for talking and mabye games too
Oh sure! I want to!
soo, you want it for realtime in calls and games?
Yup
alright,
RVC = Retrieval-based-Voice-Conversion, the best Few Shots Speech To Speech AI Models (on v2), Inferences (use models) pre-recorded audio (ai covers) and train (make) models. Technically, Mainline RVC does have a go-realtime.bat (aka RVC-GUI), but it's pretty messy and outdated so it's extremely not suggested for realtime.
Wokada = uses RVC for realtime inference. There's 2 main versions, Original made by Wok, and the most suggested one is Deiteris Fork (modified version)
Your pc gpu is good enough, you can do it locally, idk who told you it isn't
What you seem to need is Wokada deiteris fork for the best realtime inference, which you can get by reading https://docs.aihub.gg/rvc-voice-changer/local/deiteris-w-okada-fork/
if you want, you could optionally also use https://github.com/Plachtaa/seed-vc which is less used though, since yes you don't have to train models since it's 0 shot, but at the same time it will be lower quality
0 shot is kinda like a student who read 10 pages once, and few shots is like a student who read those 10 pages multiple times, ofcourse the 2nd student should perform better at the test
personally, i'd suggest you wokada deiteris fork, but it's your choice
Oh okay! Can you tell me why you recommend deiteris? I'm curious
well i kinda explained it at the end of the message
a properly trained rvc model should have better quality compared to another ai that doesn't train
-# ofcourse, if the model is trained on garbage, the model will be garbage
Oh okay mb i thought it was just some kind of uhh joke
no it was a comparison to explain to you the difference between zero shot and few shots ai training
Also by training models you mean uploading a pth and an index file right?
by training a model i mean the process when a user prepares the dataset, uses the program to train on it using a pretrain, and at the end gets the .pth and .index which are the weights files you find in models trained by other users in #1175430844685484042
Oh okay! Tysm!
But how do i sound accurately like the character that i want?
Uploading pth and index files into RVC isn't training. It's more like you upload the files of an already trained RVC voice model into RVC, so the program can do AI covers and so on. The actual training is when you upload an audio file, which we call a dataset, into the program, and when everything is all set you click start training there.
Thanks for explaining! Thank you so so much!
Is it possible to reduce artifacting somehow?
This is what the model Im using right now sounds like https://youtu.be/hckM9f1NyAI
using this model #1226933547940446251 message
Like, can using an embedder help?
Anybody got a good video to help me create a good rvc model for the first time
All video tutorials are outdated
There's only written guides
What's your PC GPU
is there a model i can run locally that can work from youtube? i dont wanna separate the voice and shit
The biggest problem I have is installing. I have the files, but I always have trouble with the torch section, especially with having to use the command prompt; it never works
i literally know nothing about making voice models should i use applio or codename fork? and also is there a guide that teaches me how or what to do?
If I apologize to Weights will they be nice again and verify all my models
applio. codename's fork might be a bit complicated because it has a lot of parameters
Applio is pretty good, I could teach you the kaggle applio
I have no experience with codename tho so can't help there, dm me whenever you're ready
whats the diff between applio and kaggle applio
idk but it's just what I use, kaggle applio
I don't think there is a difference
guys i need help there's an echo where it seems like the voice changer is picking up the changed voice, and applying the changer again. Is there a way to fix that?
how much vram?
be sure to not use video tutorials
what's your pc gpu?
show a screenshot also of the program
applio is basically the most up-to-date and easy-to-use option, codename is based on applio and has some extra features; for beginners applio is better. what's your pc gpu?
elaborate
what's your pc gpu
what do you want
what do you want to install
be sure to not use video tutorials
3060 ti
applio is the program itself, kaggle is just the cloud computing service that can be used to run the code for people with a bad pc
what he said
did you even check your pc is good enough before using applio kaggle lol
this is why u should always check ur pc before using cloud
nice, good enough, check https://docs.aihub.gg/rvc/local/applio/
Last update: Apr 01, 2024
bro it started working now thanks, 4060 btw
no I just use it because it's fast and simple to use
I dislike local because it makes my vr run slow
do u want me to check your settings? I can suggest the best ones for quality and least delay
i think i cant
i mean, cloud has limited free time and can break easily because kaggle can update its dependencies whenever it wants
where can i send it
lemme guess, you used a youtube tutorial?
yessirski


