#✨│ai-help
1 messages · Page 242 of 1
no idea but most probably, no, tho i can explain some things to you really quick
so applio has this avg_50 graphs, those are already smoothed by default, so to correctly read them you have to set your smoothing to 0.5 in the tensorboard site
you choose batch size depending on how big is your dataset
there are two slicing methods for datasets, simple mode, and automatic mode
Automatic mode is the same slicer of mainline
simple mode slices every 3s (by default), it doesn't take silence into account, so you have to remove silence in audacity using truncate silence
I ran with the default settings and reached around 200 epochs with my model, I usually get caught by the errors that come from doing things that are said in the guide
ah i have no idea about colab specific errors, i gave up on them, too many errors
ikr
I tried to get zluda working because I am unfortunate enough to have 6700XT but that was a whole world of errors in itself, worse than colab
ive heard zluda training speeds are extremely slow so it's better to use a cloud solution anyway
twins i also got a 6700xt
my brother
this might be flux or placebo but i feel like amd gpus train models weird and give bad models

like i could compare my model trained locally and a model trained on a nvidia gpu and even with everything being the same the amd one sounds worse
he's super knowledgeable but the image I attached is a big problem for me trying to get help from him 😔
thats literally him lmaoo
LOL WHY IS THAT SO ACCURATE
after you beat you head all sunday evening against a desk because someone cant follow basic instructions...
most of my problems that I've been having are because of following outdated basic instructions...
with 6700xt is it faster then colab
unfortunately for 20 people who follow the instructions and get the results there's someone who skips steps and misreads everything
I blame the ipad generation
oh thats great, T4 gpu is sooo ancient
nobody teaches the computer basics in school any more
out of the 5+ AI tools that I've used for various reasons, this is the only one that I had to bash my head against so much
and I've used Applio back when the guide was still not outdated, back then everything went smoothly too
If the system tells you "Pytorch is damaged", it indicates that Mac has flagged the W-Okada as malware, which is a false positive. For a solution, open a Terminal and follow in this guide. https://docs.aihub.gg/rvc-voice-changer/local/deiteris-w-okada-fork/#opening-on-mac
Last update: May 5, 2025
maybe you don't have to assume that beginners have degrees in computer science and expert knowledge in AI
and me being a beginner, the average joe would struggle even more than I am
This is where to discuss about the program issue, not showing off your ego. 
yeah, I dont know how to drive a push cart, lemme drive mclaren f1
great approarch for AI
isn't this discord server and the guide meant to make this more accessible for beginners?
Don't take what everyone here says about you seriously, it ain't that deep.
@low shard can u help me
what is your issue with the colab? i just tried it and it worked
the ui one?
well, he tried crepe with hop 70.. and 1
yea
the UI one doesn't load backups properly for some reason
Creepy.
!howtoask
How To Troubleshoot 
- Don't simply mention your issue, like "
my rvc is not working". - Describe the step you are on, what you're trying to do, the RVC you're using, a screenshot, etc.
- The more context, the better.
- Don't be desperate. You can ping a Helper, but if they ignore, they aren't available/don't know the answer.
- It's okay if you're frustrated, but don't take it into this server.
- Don't DM without prior consent.
- Don't ask for every little instruction. Put your own effort & test things by yourself.
- Don't ask to ask.
- Check if your answer is a Google search away/on our guides website.
bro are u blind
.
well sorry for reading the tiny piece of text that told me decreasing it would make the pitch better at the cost of longer time (even though having hop at the default didn't help either)
assuming you have your drive mounted do you have your model in ApplioBackup/modelname
yea that was an old misconception
back then people had no idea about anything rvc related
If I say yes, would you believe me? Ok, I know you are talking about W-Okada the realtime voice changer, but what you elaborated is too less.
mangio-crepe was a silly idea
and the text is still there.. to this day
damn
If you dont want to mount your drive you can create a folder in colab called 'drive' then another folder in it called 'MyDrive' then create the 'ApplioBackup' and model name folders
even if u use crepe with a hop of 160, rmvpe is still better
in the aihub docs or applio docs? if its the aihub docs please please let me know of any issues
i take care of the ai hub docs now so ye
yeah I have my drive mounted, and it did make an Applio folder with the model in it after I loaded backup, but when I tried to run Applio it gave out an error
it's in the Applio thing where you extract and all that
things have gotten more complicated than before imo
like, how could we explain that the loss graphs don't help in choosing a good epoch??
they're there so you can monitor irregularities
or that g/total is innacurate
that's new knowledge to me
you say that
"The tensorboard is for monitoring for any irregularities any issues. Do not depend on the tensorboard to find your best sounding epoch"
applio has a new loss named gen adv which is a bit more accurate than g/total
but that is only in a specific branch
and even that doesn't help you in choosing an epoch
the loss graphs are really a bit useless
best metric is to hear your model
mention how g/total is the combined loss of all the other losses
yet still innacurate
u can see adv gen loss going up yet g/total may still go down
I think a good step to help beginners is to have a strong suggestion that auto backups be turned on in the Applio colab tab, as I couldn't find any mention of turning auto backup on
Since it was in extras I assumed it's not necessary and I learned that the hard way when my colab ran out of GPU resources
thats bec mel and kl are still going down
mel carries g/total iirc
these always go down
only time they go up is if u try silly stuff like loss balancer


Texts blur when I look closely to them. How am I thar blind? It's more like I lose focus on a small topic too easily, especially when there's an ongoing bigger topic in chat or channel. 
most stuff needs to be updated to be honest, a lot of things are outdated
anyway i havent updated the docs on the new logging stuff bec it isnt in the mainline branch of applio
btw, removed hop length for crepe from UI in exp/f0 branch
google doesn't really like local ai stuff so they don't care if a random update kills ai training/infer
they just want you to use their ai instead
:0 nice!
the colabs being buggy isn't really applio guys fault but more like google trying to savotage everything
kaggle is another option but imo is way more broken than colab (they also hate any deep fake related ai stuff)
Can someone help me ? in cs rvc doesnt work what can i do ? i changed the input
@analog obsidian what should I do step by step in the no UI colab to continue training on my ApplioBackup? I've been using the UI colab this whole time so I'm a bit confused
Do I need to remind you again?
W-Okada not working with Counter Strike 2 can occur with several reasons, like using an older and original version of W-Okada, VB-Cable and Voicemeeter seem to cause issue when using them with W-Okada on Windows, and your microphone.
sorry i don't use it, i train locally
i didnt ask to you
pls shut up
Then it's not my job to help you either, if you continue being a dick against me so.
- "can someone help me"
- "i didnt ask you"
womp womp ?
💀
Ignore him. He thinks he can help himself for that.
im using the fork version
im not using voicemeter or anything else
anyone who knows how the no UI colab works can help me figure out step by step on how to continue training on my ApplioBackup? I've been using the UI colab this whole time so I'm a bit confused

i think it works the same as the ui colab
i tried to run it on steam web shift tab
it opens ui but browser doesnt have a mic perm
so it doesnt work with it too
Last update: May 5, 2025
read the guide
Mate, you said you didn't ask to me. Why did you switch up that fast?
i did it
you got virtual audio cable?
i did the same settings
ye bro
its working on dc
but not on cs
i think browser goes sleep mode when on cs
does cs allows you to choose which mic you wanna use in game?
i dont know how to solve this
ye
i did the settings
hmm weird
when i alt tab it works again
but even if the gui is frozen, the actual program is running in the cmd window
but in game it does not work
see if restarting the voice changer fixes it
if that doesn't work try restarting ur pc
could be some weird windows interaction
i tried it several times , when im on cs cmd goes sleep mode i think
or the browser is muting your mic when you close it/hide it
try a different browser
anything but operagx
try chrome
oke
@simple ore could you explain which cells I have to run step by step in the No UI colab to keep training my backup? I'm getting this error
I've tried:
Mount google drive > Clone > Install > Load a Backup
yeah this weird issue comes due to w-okada being written in javascript which is extremely buggy
so every browser reacts differently to the gui
some are fine with it, some can't run it properly
ye the problem is i think when im on cs browser is not using mic
same problem is on chrome too
Some say the Javascript is a trash programming language, but that's it. 
iirc the reason why is running in your browser is because the guy who made the fork noticed running it in the browser had better perfomance than running it in a window
i hate java.
i may be wrong tho, that was long ago
try edge
what gpu you have?
i plan to buy a 5070 or 5070ti
nice, thats more than enough for this
at the moment you could try fcpe instead of rmvpe
fcpe is like a slightly less accurate rmvpe
but runs very fast
at this point I would've had better luck training on my cpu for 42 hours straight than using colab 
estimation wise, so dont worry, it doesn't affect the model's quality
4 days of straight up 12 hours a day of trying to train a model using colab doesn't do good things to your brain
sorry i forgot to say thank you 😭
what if you start from 0 in the no ui colab
most models don't need more than 200 epochs
👌
that'd be my fourth time training the same model from 0
i have one more question though
i've tried to download a model on applio but it doesn't seem to be popping up even after refreshing
so whats the problem? applio doesn't save the epochs?
first time I tried training with CPU, that went bad, reached like epoch 70, then had to start again on colab, didn't have auto backup on so lost progress
most of what I did is a blur at this point, I'm on 2 hours of sleep
im pretty sure the autobackup option only saves g and d
you can convert the G file to a pth file actually
and use it as a normal model
no it saved epochs
oh nvm
so uhm you wanna train 200 epochs?
I reached 235 epochs, I want to get to 300-500 to have as good quality as I can
more epochs doesn't mean better quality
so in simple words
epochs = time the model has seen the whole dataset
if you force the ai to see the same thing a lot, it will believe it should only be able to clone the dataset and nothing more
it will quickly forget the pretrain knowledge
and become dumb
how it feels reading this
this translates as the model sounding robotic asf
and a lot of random weird problems
the model is going to try to clone your audio but it will not have the full knowledge of how to do it correctly
the only thing that takes 300-500 epochs, are pretrains
these are trained with like 50 hours of audio
your small dataset doesn't compare to that
so realistically speaking, you don't need more than 200 epochs
I have 12 minutes of audio
most small models are done within the 100-150 epoch range
but batch size also affects this
i would use either batch size 8 or 4
I used the default
train 200 epochs
save every 10
listen to all the epochs and choose the one you like best
okay then I have all the epochs I need
there is a less biased method of choosing epochs but you're a beginner so stick to what i said, is easier
time to test
what is the non biased method?
looking at the spectogram and see which epoch perfomed the best
you wanna check spectogram reproduction
how can I set it up if it's not too complicated?
you know how to analyze spectograms? if thats the case, use rx11 since it's more precise
spek works too but rx11 allows for more precise analysis
tbf I never analyzed a spectogram, is it too complicated to learn or is it similar to learning how to read graphs?
is harder than reading graphs thats for sure
tho is easy to spot when the model is generating noise instead of data
you'll see missing harmonics
also some random artifacting
alongside more spectogram related issues
good epochs are able to do decent spectogram reproduction
so they sound less robotic and more natural
I'll try to listen to some epochs ranging from 120 to 200 for now, maybe after I rest I'll try the spectogram way
sure, do some research about spectograms in general
you need that knowledge
at least for rvc is needed
you can also know if your batch size was either too much or too low by analyzing your model spectogram
but at the end, small datasets (10 minutes and below) give very random results, so for example, you can get a very bad result in your first training run, but if you train the same dataset a second time, you may probably get a better result than the first
what are these? @analog obsidian how do they affect talking?
also thanks a ton for being kind and willing to help, you're very down to earth 
index = is a file where the accent of the dataset is stored, it is possible to use index files from another dataset tho i only use the index of my model
a safe value is 0.5, so a 50% of the dataset's accent will be added in the result, the other 50% will come from the pretrain
too high values may introduce artifacting (glitching, voice cracks, weird sounds)
volume envelope = rms normalization, in applio this is bugged, so don't use a value different than 1
protect voiceless = supposedly decreases the amount of robotic sibilants and breaths but a good model doesn't need this, in case you wanna play with this, start with a value of 0.33, bigger values decreases this protection, and lower values increases it
a value of 0.5 disables protect voiceless
for analyzing epochs don't use the index file
you can then use the index file after you find your best sounding epoch
index may introduce some issues the model doesn't have to begin so thats why it's safer to analyze epochs without using the index
personally i have noticed using the index makes the model sound more true to the dataset
so better resemblance between the model and the real voice (this is probably why rvc-boss, the author of rvc, added index files)
is it possible to change how strongly the index impacts the audio?
it does not work like that
the percentage is a blend between phonemes from the audio and the matching phoneme from the index file
i think not but eh i noticed different values changes things a bit
0 - use phonemes from audio as is, 1 - use whatever the matches found in the index, anything between - blend the values
audio has 'th' phonemes, the closest faiss finds in the index is 'z', if you use 1 your model speaks english with german accent
so if I use 0.5, my model speaks slightly german accent?
i thought 0 was coming from the pretrain
basically
obviously if your model has a english index, it'll have an american/brittish accent instead
the model still can make an incorrect preduction what the sound should be if it has not been trained on specific phonemes
there may still be a slight accent with 0 index
so the random voice cracks while using the index is because of this? like a japanese index trying to infer english audio
likely yes, it just finds a bad match/does not find anything
o nice to know
where can I change the value of the index ?
and then the model is unable to produce anything because it has never seen such phoneme
"search feature ratio" by default is 0.75
interesting, so i suppose dataset size also matters in this regard
0.75 = 75% of the dataset accent blended in the result
find a sweespot where it works good for different audios tho, don't just use one
with 200k slices the index creation runs clusetering argorithm to group close enough samples to some average, then it runs a trim if there are more than 4000.. you can see it in the log with minibatch output
oh yes that kmeans thing
kmeans is 200k -> 4k
for bigger sets, lets say above 1 hour, is not worth to use faiss?
at most you can have ~4k unique samples
that's 3 hour set
and even then it will narrow it down to ~1200-1500
the largest index I saw about 1GB
you dont need to train the whole thing, you can just preprocess and extract features, then run the index creation
i see
but isnt this bad?
or the index doesn't need that much data?

if i remember well, rvc-boss added kmeans because there was a bug that prevented index file generation while using sets above 1 hour
okay this is not the case, seems like he did this to speed up the index generation
i mean.. if it fine, but for realtime it may be taking extra time to look up phonemes
ohh good to know this, thank you 
i have not tested it, i need to find a big index and compare
@low shard is there a proper tutorial into using rvc i wanna use voice changer with discord
Last update: May 5, 2025
So higher batch size produces a worse result? even with a very high end GPU?
So would you say generally 200 epochs is the middle ground for most samples between 10-40 minutes?
batch size 8 in a 5 minute dataset is bad but on a 20 minute one is good
i have a sample size of 40 minutes right now i want to train. So i should pick 8 over 4?
and i can't predict where your model is going to start to overtrain since thats random
i would use 8 yes
Is the overtrain detection Applio has actually useful?
or is it more of a gimmic
no, i have asked one of the devs to remove yet is still there
its a gimmick that doesnt even work
ah, so its useless then
hearing overfitting/overtraining is kinda easy, at a certain point the model will sound very robotic
you just use anything before that happens
so lets anything above 150e sounds very bad but anything prior to that is "alright"
Yeah fair enough, so i could just set the epoch to 400 and save every 20, and just see through all of them
the model i trained now sounded fine to me at 500 epochs, and now i added even more audio to it
they're there so ppl can check for divergence issues and such
but doesnt longer audio usually mean you should use less epochs? or did i get that completely wrong
or is it more dependent on how varying your audio sample is
depends in your batch size
with different words, tones, etc
and how different is the dataset compared to the pretrain
the og pretrain is trained using very monotone speech
so if ur dataset is also monotone like the pretrain, the model will have a more easy task learning ur set
my dataset is very varying in pitch and tone
what you can do is to train 200 epochs and if you notice e200 sounds fine, you can continue training until the model starts to sound very metallic/robotic
tho i personally never train over 100 epochs
So im assuming its very easy to notice when its overtrained?
yea u dont need to be an audio nerd to notice when the model sounds unnatural and robotic
is pretty obvious
has a particular ugly robotic sound
yeah fair enough. On my 500 epochs i noticed like a few words and pronounciations that sounded a bit bad, idk if that because of lack of data, or too much data (too many epochs)
i guess i could test the 400 and 300 one and see if its better
epochs = everytime the model has seen its full dataset
so your model has seen its own dataset 500 times
yeah
smaller datasets don't have too much data to begin so they overfit pretty fast
for example a pretrain that has 50 hours is trained using 300 epochs
because there's too many stuff the model has to learn
but 300 epochs with a 5 minute dataset is overkill
more epochs don't mean better results
yeah
soo a general rule then is larger dataset = more epochs?
BUT also dependent on batch size?
no way to tell, again, its random
too many factors
yup
depends how hard is the dataset to learn
depends in a lot of things really
u can try two approach of selecting epochs
you can either train 200e and save every 10
or train 200e, save everything, and hear all until you find one that sounds more natural to you

okay ill try at 200
Also, for "silent training files", even if the audio has no background noise, do you usually always leave this on 2?
okay
The "Fresh training" option, do you always check this ON when making a new model? or is that if you are making pre-trains?
same with the "Dataset Creator"
that option deletes the G and D files, and the graph files (eval)
use it if you wanna start your training from 0
Okay sounds good
but sure you can have it enabled when making a new model, nothing bad happens
just don't enable it when resuming training
otherwise all of the process will be lost
yeah gotcha 👍
Its exciting trying to constantly improve the voice, the spectogram stuff sounds exciting too, but also sounds like a lot of stuff to learn
Is there a easy, fast and consistent way to do this, that you would recommend? Im assuming it would be best to somehow run a pre-recorded voice sample though the voice changer for consistency?
anyone know is collab broken atm? my voice on deiters fork is bad
cutting and distorted my voice
test the model in applio, don't use the voice changer for testing purposes
voice changer inference works differently
so you WANT to use voice inference to test, right?
Need help with the Okada Voice Changer, only Beatrice V2 models are producing audio output
I am using an app called audioRelay
To connect my phone to computer for microphone
Hi! I wanted help with the Colab Research link to create ai covers, Whenever I click on an old link it doesn't go through, does anyone know what the new link is?
only beatrice model is good in the og 2.x beta version
otherwise you should try the fork version: https://docs.aihub.gg/rvc-voice-changer/local/deiteris-w-okada-fork/
Last update: May 5, 2025
Hey, does anyone know how to make AI stuff like the clip on the bottom: https://www.facebook.com/reel/677256711796852/?rdid=DahxdqgAZes9Pdpp&share_url=https%3A%2F%2Fwww.facebook.com%2Fshare%2Fr%2F1JeV3Wtifi%2F
I've tried HeyGen but I can't get it to just stare at the screen and do subtle motion like they're reacting to the video. Much appreciated.
Hi anyone who used Foocus or any other ui who made lora can please join vc? i need a smal help
Please
where can i find some good ai cover sites?
Guys, who knows why the program doesn't load, I've already tried everything, all the components, updated, a lot of things, why the program just doesn't load, and after 2-3 minutes it gives the error Error: Could not load Voice Focus estimator. and when I resume the same thing, does anyone know how to solve this problem?
what's your GPU?
NVIDIA GeForce RTX 3080
and the program version you're using?
MMVCServerSIO_win_onnxgpu-cuda_v.1.5.3.18a.zip
try this version https://docs.aihub.gg/rvc-voice-changer/local/deiteris-w-okada-fork/#download-nvidia-on-windows
Last update: May 5, 2025
okay
pls help how to change epoch
why i cant choose model
do you actually have a model in logs folder?
Which epoch? Are you talking about RVC voice model or W-Okada?
!howtoask
How To Troubleshoot 
- Don't simply mention your issue, like "
my rvc is not working". - Describe the step you are on, what you're trying to do, the RVC you're using, a screenshot, etc.
- The more context, the better.
- Don't be desperate. You can ping a Helper, but if they ignore, they aren't available/don't know the answer.
- It's okay if you're frustrated, but don't take it into this server.
- Don't DM without prior consent.
- Don't ask for every little instruction. Put your own effort & test things by yourself.
- Don't ask to ask.
- Check if your answer is a Google search away/on our guides website.
Oh i have done it thank!
guys, the ai voice changer program opens start-https.bat and the console closes immediately what to do
Chunk doesn't make the audio sound better in quality; it's more like what makes the audio to delay. What makes better quality is Extra. A GPU has a contribution at converting audio in real time on W-Okada.
You use the original version of W-Okada, which is old and outdated. I'm guessing you have followed a tutorial video on YouTube before. What is your PC GPU?
video card invidia gpu intel
MMVCServerSIO_win_onnxgpu-cuda_v.1.5.3.18a.zip
this file i download
To check your PC GPU name, open Task Manager.
nvidia
That's a brand name, not a full name of GPU. On Task Manager, go to Performance tab, spot where GPU 0 or GPU 1 is in the left side there, and click one of them to reveal its full name on the right side.
intel xeon cpu e5-2650 v2
No. That's CPU.
For example, if your PC has NVIDIA GeForce RTX 4090, it's RTX 4090.
nvidia gtx 1070

Download and use this better W-Okada instead, since you got NVIDIA GeForce GTX 1070 in an Intel Xeon PC. https://docs.aihub.gg/rvc-voice-changer/local/deiteris-w-okada-fork/#download-nvidia-on-windows
Last update: May 5, 2025
Download NVIDIA on Windows
The lastest version as of December 7th 2024 is: nvidia-b2332 (click here to download)
If you have a GTX 700 card or below, use AMD/Intel version instead.
this?

this?
ok
y.u.p.
Yes.
and after downloading, what should I do?
!howtoask
How To Troubleshoot 
- Don't simply mention your issue, like "
my rvc is not working". - Describe the step you are on, what you're trying to do, the RVC you're using, a screenshot, etc.
- The more context, the better.
- Don't be desperate. You can ping a Helper, but if they ignore, they aren't available/don't know the answer.
- It's okay if you're frustrated, but don't take it into this server.
- Don't DM without prior consent.
- Don't ask for every little instruction. Put your own effort & test things by yourself.
- Don't ask to ask.
- Check if your answer is a Google search away/on our guides website.
Um. Which W-Okada version are you using? And what is your PC GPU?
namari is really busy in this channel.
It's possible to lower chunk number under 30 ms for less delay, extra number stays at 2.7 s, and also force to use fp32.
I don't know how to explain this. 
Damn. Although you can set extra up to 5 s, most of the time you may experience audio cutting off a lot, so 2.7 s is best overall. If the audio quality still low, it can be a voice model you're currently using.
If I buy premium weights will I get a better quality image?
I'm not sure if paying for Weights Premium would help generating image in better quality, but these are privileges of having premium going there.
can i get some advice on making the ai voice changer sound "better". no matter what i seem to do with the pitch, format shift or index it always seems to sound off.
Try a different model?
That's one of the tips on how to make W-Okada to sound good. 
i swear iv tried so many lol
i still have not ruled out the ai not liking the british accent yet
just train a better one
doesn't look like they offer better quality images
yo whats the newest way to make ai covers of songs
havent done it since i used ilaria
??????
help pls
can apollo do from youtube? or even if not from youtube, is there any that seperate the audio convert it and combine them together?
Is it just me, or the overtraining detector is not working?
Also I just pressed the Stop Training button and it's still training
The overtraining detector doesn't work very "well"
I think it is useless
You should just manually detect overtraining
And looks like the model didn't learned nothing after ~40k (I think :b)
Yeah, latest epoch is 431 but the lowest point here comes from 240-270
So i think I might just stay with epoch 240-270 or something
How long is your dataset?
29min
Hmmm
All samples are 22hz so I used 30hz
After 60k is overtrained
I recommend you to check lowest points between 40-60k steps (Not only the lowest of them)
Probably the lowest is the best one, but it might be just noise and not really a good step
Just compare them and find the best one

graphs don't help in finding the best epoch nor when overtraining starts in rvc, g/total is a bit innacurate so don't rely on it too much
instead, hear the epochs (every 10 is ok) and keep the one who sound the best for you
Oh, if you say so
Probably a dumb question, but is longer dataset always better? or does it at one point end up making the final voice worse with too long dataset?
it is a careful balance of getting the model to learn a new voice and not eroding allt he things it learned during original pretrain training
Hmm ok, i have a solid 1 hour of very clean audio now, but idk if thats too overkill
given consistent quality and variation, it is like 85% to 95% to 99%
it's still fine but you should also pay attention on the quality consistency
there's no limit, just remember that a big dataset (over 1 hour) may require a higher batch size than just 8
the quality is all the same, its all pure voice audio with no background noise removed
There might be a slight dB level change between the clips, does that matter?
if so, i can try to normalize it to 1 level
oh okay, its like 1 hour and 3 minutes now. Like 12 batch size? 8 maybe?
8 is fine
Oaky, ill try 8. What do you mean by these percentages?
i would use 16
okay, 16 it is then 👍 . Idk if its too complicated to explain, but whats the technical reason for higher batch size for larger datasets?
does anyone know about spectrograms
gradients may be too noisy and unstable when using very low batch sizes in big datasets
Ohh okay, gotcha
can any1 help w my forum
Oh damn, this is actually really fast with 16 batch size
just in case: don't ever use batch size 16 in small datasets
thats for 1 hour and above
it doesn't but you can indeed use batch size 32 with 2 hours and above
be sure your dataset is expressive enough and not monontone speech
Yeah, thats what im worried about if my dataset is varying enough, but its dedinitely not "monotone" at least
Batch size is heavily limited by VRAM right?
yeah
16 is good enough for 2 hours and above too
so dw
So thats usually whats stopping people from being able to do good 1hr+ datasets?
Yeah gotcha
laziness
they take a hella time to clean
Its using 16,6/32GB right now, so
i finished cleaning my 2 hour set in like 4 days
With like removing instrumentals, background noise, etc?
Yeah feel that. Im lucky with this one since its just talking, pure voice, with a really good mic
So it takes me like 1,5hrs to capture 1 hr of datasets
So i dont mind doing 2 hrs, as long as it doesnt "hurt" the dataset with more data
the more you add, the more realistic the output
and better the results because you replace more stuff from the pretrain
if u rely too much in the pretrain (small datasets) things get weird
Yeah, i might just add on to it then, see how good i can get it
Im still missing data from like whispering etc, so i might have to see if i can find any data for that
Oh okay
makes the whole model sound eww
What about like yelling and laughing? And like.. mouth sounds like popping, humming, etc? Is that all bad too?
Because i might want to clean up my dataset based on that
about yelling and laughing im not sure, but i know too much of them fucks up things
and no, every mouth sound is bad
rvc randomly adds those sounds in the results
idk why but i know it does that
Ahh okay
just train clean speech, keep every breath (very important), remove unwanted sounds and noise
and.. thats rlly it
Yeah it might be like 1-2 minute of yelling out of 1 hour
Might have to remove it then
Wym keep every breath?
rvc cant clone yelling and laughing so its a bit pointless to add them in the dataset
rvc has to learn how to do breath sounds
so it needs breathing samples
Im a bit confused on that one, isnt breathing sounds considered background nosise? Because my dataset doesnt have any breathing sounds because of the noise gate, its only clear speech
no
breaths are part of the speech
they're the most important part of the dataset
without them, rvc wont be able to learn how to clone breathing
so your model will sound veeery robotic while trying to inference breath
never remove them
Oh ok, but i havent removed them, there just is none in the dataset. Do you mean like, the inhaling and exhaling type sounds before and after a sentence?
yep these sounds
Hmm ok, yeah i dont think i have any clear sounds of that im my data set because of the noise gate being used. I wonder if this is something i could artificially add to teach the model? Or would that be hella work?
nope you cant add them, if you add breaths from another dataset there will be no consistency in the final dataset and rvc does not work very well when cloning a dataset without consistency
Ahh, ok. Thats a bummer. I cant recall if my dataset has any of these breaths or not, because i havent been listening for that, but ill check whenever i come back to my pc. If anything, if i add more data, should i try to hunt down audio clips which has these breaths, or does ALL of the dataset need not have them for consistency?
if the breaths come from the same source (same mic) i guess it should be fine, just add the breathing samples before a sentence
Ah, so if i get a sample from the same person/mic, i can artificially add it onto other sentences in the dataset?
If i were to add it to every sentence tho it would take hours hahah
if u get the samples from the same person using the same mic, most probably yes (i have never tried something like this btw, but is not bad to try it in case gives good results)
nah don't add them to every sentence lol
but rvc kinda needs a lot of breathing samples in order to learn them
idk how many exactly
you'll need to experiment with that

and yes, i said samples, you need more than just one breathing sample

gather different unique breathing samples
https://imgur.com/njXmLsm
not sure what do after making key any help?
I’m trying to run it through Google Collabs if that makes any difference
12 GB vram is recommended for batch 16
otherwise 8 could be okay unless the dataset is diverse
More data, more variation, larger batch size allows to get a more stable estimate for gradient direction
larger the batch the less calculations for gradients have to be done every epoch, so slightly faster
but I would not go higher than 8 for an hour long dataset
but it is up for you to experiment with
Okay
will try and see what i can gather
Oh okay! I finished the one with 16, i just started training the one with 8 batch size now, and ill see which one is better when i come home. I guess its all random and people have different experiences with diffdrent batch sizes
Also, whats your perspective and experience with yelling and laughing in the dataset? Good or bad?
how ar u
hi! does anyone know about facefusion.. because when i process the video the output doesnt come out..
What kinda error
idk it just analyzes it
but the output doesnt come out
for some rsn
i used huggingface
ic you asked in the facefusion server anyway wait for an answer there
if you have a capable gpu, try setting up locally
im on macOS
[FACEFUSION.CORE] Processing step 1 of 1
Analysing: 100% (334/334)
literally stops after
@glacial pollen this training is not going well, right ?
Well my dude, this is up to you to judge / learn
This is not my job here really
- I'm busy rn, watching some vtuber stream
I'm just asking for the charts actually.
And I've stressed it multiple times here and there
and I will once again, this is not my job on the server to keep guiding people on what's correct / good and or bad / wrong
You see, when I was learning, I had to do all on my own, that's the point of learning and understanding
Okay that's true. But this is your fork and software, that's why im asking you
Well it is, but as you should know " avg 50 " is not my invention
@simple ore He made it
I only ported it
So if anything, direct questions about that metric to him
- I am on hiatus / inactive on the server ( doing a big break. Hence would appreciate lack of @ s
Alright, thanks anyway
unclick
CPU too high usage when on cs2
Close cs2
.. well either that or make sure your model is using your gpu because it should use minimal cpu if it is
"If you are still using CABLE instead of Line 1, I beg you to switch over because it is unironically better than CABLE in any way possible."
what line 1?
hey
ah i use cable C & cable D im guessing i cant do that option?
alright ty anyways i doubt i can do that since i use like 4 different routes
asio okadafork->reaper->peace->discord
try a different voice?
would anyone want to help me figure out why my AIs I made to play tag continue to be IDIOTS no matter what I try? I'm using pytorch and learned it with chatGPT so it probably led me astray somewhere but even after redoing the entire program 4 times I still feel kinda lost
I can't send the two models directly here bc no files
class TagStandardHide(neuro.Module):
def __init__(self, Learning: bool = True, learnRate: float = 0.01):
super(TagStandardHide, self).__init__()
self.layer0 = neuro.Linear(4, 32)
self.layer1 = neuro.Linear(32, 64)
self.layer2 = neuro.Linear(64, 48)
self.layer3 = neuro.Linear(48, 24)
self.layer4 = neuro.Linear(24, 8)
self.learning = Learning
self.optimizer = torch.optim.Adam((self).parameters(), lr=learnRate)
def forward(self, x):
x = self.layer0(x)
x = torch.relu(self.layer1(x))
x = self.layer2(x)
x = torch.relu(self.layer3(x))
x = self.layer4(x)
return x
def updateModel(self, stateTensor: torch.FloatTensor, nextStateTensor: torch.FloatTensor, action, reward: float, gamma=0.99):
optimizer = self.optimizer
state = stateTensor.unsqueeze(0)
next_state = nextStateTensor.unsqueeze(0)
action = torch.tensor([action], dtype=torch.int64)
reward = torch.tensor([reward], dtype=torch.float32)
current_q_values = self(state).gather(1, action.unsqueeze(-1)).squeeze(-1)
next_q_values = self(next_state).max(1)[0].detach()
target_q_value = reward + gamma * next_q_values
loss = torch.nn.functional.mse_loss(current_q_values, target_q_value)
optimizer.zero_grad()
loss.backward()
optimizer.step()
return loss.item()
def getState(self, location: tuple[int, int], seekerLocation: tuple[int, int], width, height) -> list[int | float]:
locationNormalized = (location[0]/width, location[1]/height)
seekerLocationNormalized = (seekerLocation[0]/width, seekerLocation[1]/height)
gameState: list[int | float] = [locationNormalized[0], locationNormalized[1], seekerLocationNormalized[0], seekerLocationNormalized[1]]
return gameState
def getReward(self, gameState) -> float:
#removed
return reward```
I had to remove the reward function because characters but both of them look basically just like this
I've tried just an input and output, having 1 middle, having more middle layers with more and less neurons in each, none of that really seems to affect anything
your forward is funny
I've tried a bunch of random forwards none of them really seem to help
what specifically is weird about it though?
why activation only for 1 and 3?
I haven't really thought about it
is it normal for them all to use relu? /some other activation function? I figured just having it use the layer would be fine
without an activation function there's no point in having separate layers as the math collapses them
there are different activation functions offering different activation probabilities
but that seems weird, wouldn't having the other layers affect them still change it?
like if it was just one neuron each and the weights were right, from the first being say 0.35 the second layer could make it -0.12 which on the next relu would be 0
well that's a bad example I guess because if there was another relu it would make it 0 which would also make the last next on 0
anyway, I cant imagine what that code is supposed to do
your model has 4 inputs and 8 outputs
btw, use standard aliases. torch.nn -> nn
yeah I'll probably make it more readable whenever I do more
is there any aihub docs?
Last update: May 5, 2025
👍
the inputs are this players x, this player's y, seeker's x, and seeker's y, and the outputs correspond to the eight directions (the 4 cardinals and their combinations)
I did it like that so they can move more like a regular person can since you can press w and a at the same time, for example
so the only thing that seems wrong is the forward function?
it doesn't work, and whenever i talk it says this in cmd: 2025-05-29 05:22:15.7391809 [E:onnxruntime:, sequential_executor.cc:572 onnxruntime::ExecuteKernel] Non-zero status code returned while running Pad node. Name:'/rmvpe/mel_extractor/Pad' Status Message: CUDA error cudaErrorNoKernelImageForDevice:no kernel image is available for execution on the device
gpu?
gtx 750
what application/version you're trying to use?
using a voice changer on anything below nvidia's <1000 series is almost impossible
vcclient_win_cuda_2.0.78-beta
ok thx
Noobies said that it basically just makes the extra layers useless but I still don't get it, why would it still not learn?
I'm not familiar with this kind of model training, so I have no other advice
other than in order to learn something complex the model has to have enough capacity and with your current forward function it is essentially just 3 layers
well it's basically two different models with one goal each
before I had it be just one model with a 5th paramater for if it was "it" or not and then changed the seeker location to be the nearest player but that wasn't working
so basically the hider model is just "go away from this point" and the seeker is "get as close as possible to this point"
and "this point" in each of them is just the 3rd parameter (x) and 4th parameter (y) and they're normalized to be 0-1
what is the clear purpose of the AI model? training to chase the player using reinforcement learning instead of obviously using the pathfinding algorithm?
30 years ago it was "using internet", now it is 'using AI"
-colab
Google Colab is a Cloud (Remote Good PC) Service. While the Free plan provides up to 12 hours of daily usage, the GPU is typically available for only about 4 hours each day on average.
by IA Hispano
Google Colab
by Hina
Google Colab
by Eddy
Google Colab
by Eddy
Google Colab
by Deiteris & Hina
Google Colab
by Shiro & Eddy
Google Colab
by Nick088
Google Colab
by Nick088
Google Colab
by Jarredou & Makidanye
Google Colab
Does this mean i started overtraining around 40k steps? if im reading the charts right?
those are old loss charts
Huh? im confused
no avg_50 loss?
how big is the batch size?
16, dataset is 1 hour and 59 minutes
like WAY overtrained right?
no, it is just way too much for finetuning
nah im good
wym "finetuning"? Is that a process you do after? or what do you mean by that
when you train a model on top a pretrain, it is technically a finetuning
ahh okay, gotcha. So its essentuially overtrained by the fact that im using a pretrain, but wouldnt be overtrained if i didnt use a pretrain
you have a big set, 4-12x size of the common size for voice models
so it keep using the same high learning rate longer
and with batch 16 is generalizes the model quite a lot
oh okay gotcha. Would you personally use a batch size less than 16 for 2 hour dataset? or would going less make stuff worse
I use 12-16 for my 55h vctk set
55h hour dataset?? damn
it is a pretrain
yeah
The 28k steps definitely sounds the best to me. Everything after that has like a robotic tone to it, especially at the end of sentences
hmm, i actually have a old version with 40 and 60 minutes data, i guess i can compare it to this one and see which one i like better
you dont need to create a new dataset, just make a copy of filelist.txt
then cut it in half/quarter
ooh okay
So I've installed RVC AI Cover Maker and after double clicking run.bat, I am getting this error
Traceback (most recent call last):
File "F:\RVC-AI-Cover-Maker-UI-1.0.5\programs\applio_code\rvc\lib\tools\prerequisites_download.py", line 3, in <module>
from tqdm import tqdm
ModuleNotFoundError: No module named 'tqdm'
Traceback (most recent call last):
File "F:\RVC-AI-Cover-Maker-UI-1.0.5\main.py", line 1, in <module>
import gradio as gr
ModuleNotFoundError: No module named 'gradio'
An error occurred. Exiting...
Press any key to continue . . .
you did not install requirements?
I installed requirements
If you download the .zip from the release (here) make sure to rename the folder from "rvc-ai-cover-maker-ui-v1.0.5" to just "rvc-ai-cover-maker-ui" otherwise you may run into missing dependencies issues.
Oh, that's (maybe) why it happened - my fault xD
hey what's latest guide for making a voice model from scratch?
becuase "it automatically finds the best path based on a known algorithm" doesn't sound as cool as "it's an AI that learns how to play" tbh, but also because I want experience making AIs and if I just use a pathfinding algorithm I don't get that experience
ty alot
and can you train a pre-existing model on more emotion? or do you just have to make it from scratch
hey is there anyone that can help me with the w-okada settings? i cant set it.
hey idk if thats the right channel to ask questions in
but i have a question^^
are there good male and female voice that also sound like a real ^^
the one i have is good but some ppl recognize it but its very old idk if that makes a diffrent or are there like new once nowadays that are better?
How can i make my friend voice to an RVC model Zip for ai
oh I've also been working on and off so long I forgot, there was more complex movement options before too, I just removed it because I thought too many options were the problem
hey guys its been a minute. just need some advice here.
if im going to train a voice model, should i use 32k, 40k, or 48k sample rate?
does higher sample rate require more training time?
48k is slower than 32/40k, whether to use it depends on your dataset
gotcha, might go with 40k. thanks man.
imo Jump King speedrunning is the one you should try exploring on
https://www.youtube.com/watch?v=e-iOd42mF4g
※最後Youtubeの仕様で失われてしまったアーカイブ5分間はこちらで補完しました→ https://youtu.be/kyPb3-8bLMY
デビュー前からやりたいと思っていた「JumpKing耐久」!!!!!!
年末の日曜日!!満を持してやっちゃうぞ~!!!!!!!!!
12時間以内にクリア...
Which W-Okada version are you using? And what is your PC GPU?
There are no known realistic male and female voice models in #1175430844685484042
-rvc
there are no known male and female voice models
worked just fine for a first run but tried to run another training session and got this, anyone know whats the issue? nothing changed
you can make a simple platformer game like that
the character movement/jumping system can be simple as og mario
I feel like moving in 8 directions is even simpler though, is it not?
Did the bot just duplicate message?
(8 being the 4 cardinals + the diagonals)
I was referring to the kind of platformer game, which is bidirectional plus vertical jumping
but also about an hour or two ago I made an even simpler test program that was just a single AI trying to get to a number that you could change and it DID learn how to find it pretty fast
yeah, but what I was already doing was literally just move up, left, right, down, upleft, upright, downleft, and downright
jumping seems like it'd be harder for the AI to understand
the jumping trajectory can be traditionally calculated, but decision making on the paths and timing can be things for the AI to consider
Is this a final message of Mangio RVC local?
['extract_f0_print.py', 'C:\Users\Mike\Desktop\Mangio-RVC-v23.7.0_INFER_TRAIN\Mangio-RVC-v23.7.0/logs/test1', '22', 'rmvpe', '64']
no-f0-todo
no-f0-todo
no-f0-todo
no-f0-todo
no-f0-todo
no-f0-todo
no-f0-todo
no-f0-todo
no-f0-todo
no-f0-todo
no-f0-todo
no-f0-todo
no-f0-todo
no-f0-todo
no-f0-todo
no-f0-todo
no-f0-todo
no-f0-todo
no-f0-todo
no-f0-todo
no-f0-todo
no-f0-todo
['extract_feature_print.py', 'cuda:0', '1', '0', '0', 'C:\Users\Mike\Desktop\Mangio-RVC-v23.7.0_INFER_TRAIN\Mangio-RVC-v23.7.0/logs/test1', 'v2']
C:\Users\Mike\Desktop\Mangio-RVC-v23.7.0_INFER_TRAIN\Mangio-RVC-v23.7.0/logs/test1
load model(s) from hubert_base.pt
move model to cuda
no-feature-todo
Mangio RVC is old and no longer updated. There's Applio RVC.
I know about Applio, but sometimes, it didn't worked for me :(((

Well, judging by your Mangio RVC folder path, you should never install any program directly on desktop. The path should be something like C:\Applio or D:\Applio if you use Applio.
Program shortcuts belong to desktop, not full programs within folders.
is this very bad? :D
depends on what chart it is.. you left the name and Y axis numbers out
g/total
ouch.. is it like 1 minute set?
@viscid moss sorry to bother you but I want to know which models are Best in UVR 5 UI to prepare a dataset from scratch. Like which is good for vocal and instrument separation, de eco, de reveb, de noise. Remove baking vocals etc..
Sure, check this docs about it:
https://docs.aihub.gg/rvc/resources/dataset-isolation/#the-best-models-for-uvr-are
Last update: May 5, 2025
This was made by our QCs for RVC model creation
And here's the best models according to Music Separation server guys
One question more, does these models works in UVR 5 GUI ? The little windowed exe ? Or these can only be used in UVR 5 UI. the browser one
Ye it can be used there, but u need to install the latest beta version + import every model manually
UVR5 UI does automatically
I see. Thank you a lot 
Ur welcome
Hello everyone, does anyone know where I could find some data for a chat bot ? i'm trying to make one using pytorch and don't really know where to start, if anyone can give me a lead to start I would be grateful
are these outputs good? It's for a 50min audio.
Traceback (most recent call last):
File "client.py", line 22, in <module>
File "asyncio\runners.py", line 194, in run
File "asyncio\runners.py", line 118, in run
File "asyncio\base_events.py", line 687, in run_until_complete
File "main.py", line 140, in main
File "main.py", line 81, in runServer
File "uvicorn\server.py", line 69, in serve
File "uvicorn\server.py", line 76, in serve
File "uvicorn\config.py", line 434, in load
File "uvicorn\importer.py", line 19, in import_from_string
File "importlib_init.py", line 90, in import_module
File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 935, in load_unlocked
File "PyInstaller\loader\pyimod02_importers.py", line 384, in exec_module
File "app.py", line 17, in <module>
File "PyInstaller\loader\pyimod02_importers.py", line 384, in exec_module
File "voice_changer\VoiceChangerManager.py", line 26, in <module>
File "PyInstaller\loader\pyimod02_importers.py", line 384, in exec_module
File "voice_changer\RVC\RVCr2.py", line 9, in <module>
File "PyInstaller\loader\pyimod02_importers.py", line 384, in exec_module
File "voice_changer\embedder\EmbedderManager.py", line 3, in <module>
File "PyInstaller\loader\pyimod02_importers.py", line 384, in exec_module
File "voice_changer\embedder\OnnxContentvec.py", line 2, in <module>
File "PyInstaller\loader\pyimod02_importers.py", line 384, in exec_module
File "voice_changer\common\OnnxLoader.py", line 1, in <module>
File "PyInstaller\loader\pyimod02_importers.py", line 384, in exec_module
File "onnx_init.py", line 77, in <module>
ImportError: DLL load failed while importing onnx_cpp2py_export: A dynamic link library (DLL) initialization routine failed.
Press Enter to continue...
help
LM studio, llama gguf model, python is only needed to exchange prompts and responses via API
use tensorboard to check your model graphs
tho none of the graphs helps in choosing the "best" epoch
you use the graphs to see if your model is doing well in the training process
the "lowest value" in the cmd is one of the many random things applio has 
it is supposed to be agv_gen
so lowest_value doesn't help in determining if the training is going well? alright, i'll download tensorboard.
no need to download anything
just use "run-tensorboard"
that loss is the avg gen like noobies said, but that alone wont help in spotting irregularities in the training
Sorry if this is a stupid question but is there a way to make an ai cover with any voice model? If so how?
In Applio?
yup
ahh i see, i'll rerun training and check the graphs, what should i be looking for? when do i know training is enough?
check if the discriminator (d/total) is not weak or too strong
if its too weak it will have very high values (above 4.0) and always going up
if its too strong it will have very low values like 3.5 and always going down
id recommend trying to train 100 epochs and save every 10
if e100 sounds great continue training up to 200 (u could also set max training epochs too 200)
avg50 d/total
this one?
you mean this one?
yea
so thats bad, too weak right? sorry im not an expert. do i change anything?
train more and see if it goes down to 4.1-4.0
alright, i'll rerun the training and see if it improves
but always hear your model, since loss graphs most of the time go down even when the model already started overtraining
thanks, i'll do that 👍
hearing it every 10 epochs is fine
if ur model begin to overtrain you'll notice every epoch past a certain step amount sounds robotic
nop unless you use a custom reference (this is only possible in the f0_spin branch)
so inference some expressive audio and hear how it sounds
gotcha, do i need to move the saved models into the inference folder? it doesnt read them where they are normally stored
uhm weird, applio should be able to locate your pth files inside the logs folder
it doesnt, unless i need to add the path to them?
Is finetuning a voice with bigger pretrain dataset generally gonna take longer time than one with a smaller one? Assuming same batch and epochs?
no
Oh ok, maybe i did something wrong then, or it takes some time to ramp up. When i was training with OG dataset i did like 5it/s. And now with the KLM i do like 2it/s. But i only looked at the first few epochs before i left
the first epoch will always take longer
Yeah makes sense then
2it/s is slower than 5it/s
hello everyone can someone help me install the voicechanger with phython i dont know what to do!?
unless you're trying to run some ancient version of the voice changer, you dont need to use python
download the compiled version for your gpu, unzip, run
@simple ore thanks
Yeah, just got home, and its at a solid 4 now, so its all good it seems like 😄
the speed may go down if you run a game or something else that uses GPU and pushes the memory use into shared territory
delet old account and make new one with same email its work to me in engok
or just download the baked model.
does converting to onnx affect the quality of the model?
also is rmvpe better than rmvpe onnx on nvidia? is it just better overall but more expensive to run?
hey guys, any idea how to resolve these two? i reinstalled RCV but my perf is now 300, but it was 30~ before. (using F0 fcpe)
also, with my usual voice, when i use a world with PU in it, it cuts it out like when i say anything that "pops" any idea? thank you 
onnx quality is worse, the reason it's there is because back then amd gpus were unable to run .pth files
- rvc does not mean realtime voice changer
- try increasing the chunk, maybe your gpu is too stressed
how u create ur own voice like #1175430844685484042
Why does my voice changer bugs when I play Roblox while doing voice chat on discord?
Like they can’t even hear me
uhhh is there a place where i can ask someone to make a ai model?
are people still using this version
hey
should i use crying and laughing in my dataset files (.wav)?
for 5070ti you should be using deitritis fork for 5000 series
@tight ether
sorry for the ping
how do i get less ping while using the voice changer?
Why is this happeing with Applio? -I can't send picture
Same
I'm developing rvc moodel with colab
ummmm............
Should I use the paid version of CoLab to create an rvc model?
It might not be great for the gradients, but give it a try if it's really necessary.
depending on your dataset size you can get away by using the free GPU time, althoug the more you use it, the less you get next day
Hello i have question but i cant put photos here
Hi, I have a problem, idk why the app doesn't detect my microphone when I use an RVC model but when I use the "Beatrice jvs corpus" it does 
Delete what you have, whats your gpu
!give-media-perms @royal marsh 5h
Nvm use #1192011222023950368
Whats your gpu and voice changer version you downloaded
Yes
Shit gpu can cause that, whats ur gpu and version of voice changer you use
rx6600
https://docs.aihub.gg/rvc-voice-changer/local/deiteris-w-okada-fork/
Download virtual cable, amd on windows, read rest of the guide for setup
Last update: May 5, 2025
Where do I get voice models which are working for the German language too cannot find any
hi i followed the guide but vc not working
Nvdia 3050 series
did u figure out what was that?
https://docs.aihub.gg/rvc-voice-changer/local/deiteris-w-okada-fork/
Download virtual cable, first nvidia on windows, read rest of the guide for setup
Last update: May 5, 2025
I am using this one bro
Send screenshot of voice changer
Still bugs when I play Roblox
Okay
it doesnt let me send pictures here
Make in #1192011222023950368
done
hello, can someone go in a voicall with me and tell me how to install the voicechanger because i am to stupid to make it myself even with instructions
My g/total is horizontal, with some down spikes
What could be the problem? 110e so far
!howtoask
How To Troubleshoot 
- Don't simply mention your issue, like "
my rvc is not working". - Describe the step you are on, what you're trying to do, the RVC you're using, a screenshot, etc.
- The more context, the better.
- Don't be desperate. You can ping a Helper, but if they ignore, they aren't available/don't know the answer.
- It's okay if you're frustrated, but don't take it into this server.
- Don't DM without prior consent.
- Don't ask for every little instruction. Put your own effort & test things by yourself.
- Don't ask to ask.
- Check if your answer is a Google search away/on our guides website.
hi when i use a file as a input the output sounds good but when i use my own voice it sounds bad how can i fix this? or is this a microphone issue?
not about the pitch
i tried looking up if its about the microphone but the web says u dont need a better microphone
don't pay for ai :(
Is it possible to use an ASIO other than FlexASIO with Deiteris VCClient? I've tried the ASIO driver supplied by my Focusrite Scarlett 4i4 audio interface, as well as a virtual ASIO provided by VB-Audio Matrix, and both experience crackling and dropouts during realtime voice conversion. Buffer size 256, sample rate 48000 (as recommended by this guide: https://rentry.co/lessdelayasio)
is the illaria rvc vocal isolation tool also not working to anyone else?
hey, im haveing fun makeing little ai covers on weights but the ai cover voice is kinda quiet, is there a way to fix it.
Whats the exact google collab with old gradio UI?
i got a geforce rtx 4060 and i downloaded one and it brung me to the web version
web version is correct, it still runs local just displays everything on a webui. idk what you mean with ping theres not supposed to be any like that, if you mean delay then theres some stuff you can do:
- use server mode with "windows wasapi" as a prefix on everything
- lower chunk but never go below the "perf" number that you get on the graph in color
i have a problem with https://github.com/blaisewf/rvc-cli
note that i use Google Colab
if i want to use resume option, i've got this error on Training session.
Autobackup Enabled
Starting backup loop...
/usr/local/lib/python3.10/dist-packages/librosa/util/files.py:10: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
from pkg_resources import resource_filename
Backup Complete: 860 new, 0 updated, 0 deleted.
Backup Complete: 0 new, 1 updated, 0 deleted.
Files are up to date.
/usr/local/lib/python3.10/dist-packages/librosa/util/files.py:10: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
from pkg_resources import resource_filename
/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py:558: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
Checking saved weights...
Using HiFi-GAN vocoder
Starting training...
Loaded checkpoint '/content/Applio/logs/voos/D_2500.pth' (epoch 100)
Loaded checkpoint '/content/Applio/logs/voos/G_2500.pth' (epoch 100)
/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py:558: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
/usr/local/lib/python3.10/dist-packages/librosa/util/files.py:10: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
from pkg_resources import resource_filename
terminate called without an active exception
Backup Complete: 1 new, 0 updated, 0 deleted.
Files are up to date.
@peak path you dont need to use rvc-cli, if you gonna train anything, just use noUI colab.
dataloader warning is strange, have not seen it before
perhaps you failed to restore the dataset?
restore the dataset?
i didn't load my dataset again.
because they are loaded into my Google Drive already.
should i load them again? in the resume session?
colab does not access your google drive directly, you need to move the dataset from the backup to the colab node.
okay, so in the filesystem browser (folder icon) there should be your folder '/content/Applio/logs/voos' with a bunch of stuff inside.. f0, f0_voiced, extracted, sliced_audio folders, etc
true
okay, so you should be able to select a different max epoch (> 100 you have saved), and it should resume the process
oh, i set that to 100 again
should i set that to 200 in the resume session?
my first session was 100
obviously, otherwise you are at 100 as trained before
oh my lord
let me do it
you said i don't need to use https://github.com/blaisewf/rvc-cli
is there a better option?
i don't have the link
ok, thank you so much
i thought that you have another option.
i used it before
RVC-CLI is a command line interface for Applio, but it is kinda redundant
i'm stupid
i set that to 150 in the resume session
works fine
tnx
Using RVC nvidia on Github.
As soon as I begin audio conversion, the entire process freezes and the command prompt is empty
Other people I talked to had this same issue
Anyone know how to fix it?
Fixed it, but the voice changer isn't working
Just getting "Audio Block Passed"
- trying to use an acient voice changer 2) trying to use to different device types (WDM mic vs MME line out)
wdym ancient voice changer
Also yes, I have both input and output on MME
RVC worked for me before, now its just spamming audio block passed and the voice changer isnt working at all
link your "RVC nvidia on Github"
so yeah, ancient
I don't understand what you mean by ancient.
in AI terms, project that have not been updated for 6+ month are hopelessly outdate, your's is like 2 years old
here's up to date one https://rentry.co/forkvoicechangerguide#download-for-nvidia-gpu-on-windows
Alr thanks
guys , i need this files
Is there no longer a place to request someone create a model for you? Apparently I suck at it 🙂 and need someone to do it
I set up everything but how I still can't hear myself when I click start
I set up my input and output correctly
those component are usually get downloaded automatically
oh , what now !
unless you're using some outdated app that point at non-existent repository
hmm
Google Colab is a Cloud (Remote Good PC) Service. While the Free plan provides up to 12 hours of daily usage, the GPU is typically available for only about 4 hours each day on average.
by IA Hispano
Google Colab
by Hina
Google Colab
by Eddy
Google Colab
by Eddy
Google Colab
by Deiteris & Hina
Google Colab
by Shiro & Eddy
Google Colab
by Nick088
Google Colab
by Nick088
Google Colab
by Jarredou & Makidanye
Google Colab
Just wanted to bump this. Is anyone able to answer?
I haven’t used w okoda for a while, is it still the best realtime voice changer?
Hey, I'm trying to find a small language model that learns with prompts. Any suggestions?
Hello, what Colab are people using actually?
what was the virtual audio cable that you need again for wokada?
I think it's called VIC or something, not sure
Virtual Audio Cable (VAC)
By Muzychenko
thanks bro 👍
May I ask?
- What if I want to create a speech model from scratch on applio or any speech model (meaning without download any pre-existing other model data)?
- Is applio is right way to create a voice model?
- How much voice recording data does it take to create a voice model?
Thank u for reading
- foolish idea, the model wont be generalized enough to be able to infer things it has not seen during training
- it is one way, but there are other ways.
- for training a model on top of a pretrain, 10-60 minutes. More is not needed.
on weights is there a way to make the ai voice loudrr, its kinda quet on covers that i do
Thank u! And i have some more question
- How to know it a Quality model to use from sample? (No robotic voice or sth?)
- If i have record my voice to train it, there are anything need to note?
- What is the best way to create voice model u use?
- check tensorboard charts, check spectrogram of tests audios, listen, pick the best model you've trained
- use a good mic, quiet room without echos, same loudness, same distance form the mic, clean up the recording
- ask in model maker chat
to get model maker chat you need to apply for model maker first: https://discord.com/channels/1159260121998827560/1305524365810470963
I have a question about the Weights voice model creation feature, could I maybe do it here?
also, wasnt this channel named something with "help"?, its been a while since I asked for a question in the server, and I swear this channel was named something like this. just wanted to know.
There are a lot of changes in this server. Yes it was named as Help-RVC but now it's renamed.
You can ask questions about Realtime voice changer or any other help like RVC or something
Which website to use to make AI cover?
So where can i do google veo 3 stuff for free?
If you have a good gpu do it locally. If not use huggingface spaces or colab or maybe kaggle
Is there any working colab to train RVC ?
please somone le me know if theres a way to make the ai cover song thing on weights any louder, the voice isnt very loud
hello, I need newest version of AICoverGen
what does this mean?
okada w
@tacit rampart #🧬│ai-chat message
How did you fix this
you're probably trying to run a second instance of the program
It did it first try, but restarting my PC seems to have fixed it.
when I use the ai voice changer sometimes its like where my voice kind of cuts out for a very small amount of time. Is there any way to fix this since it makes it sound way worse
mic sensitivity? voice activation is generally bad
I beleive my mic should be pretty good. I am pretty new to this voice changer. So Idk if there is any specific like mic sensitivity or like how to check it. All I know is when I tried to record my voice with like obs I could hear how the voice cuts of alot
if you mean like the level you can change in the sound settinga i turned it up too 100%
im using applio and im having a problem where with one certain voice it wont produce a audio file but it says its been inferred succesfully any way to fix this as the voice seems to work for others and i want to use it
look at the other window with the error log
so when you record an audio from your mic directly are there any issues with the audio dropping out?
nope. Only when I use the voice changer never normally
is there an issue when you use a voice changer and use mic as an input and headphones as an output?
I use the cable input to be able to speak on discord and more. And that is when the problems accure. I use my normal mic for the input and the cable input for my output
we'll get to that, please answer the question above
so my normal headpones as an output and normal mic as an input. I dont think so since when I use the voice changer and hear myself it sounds pretty good
okay, so now mic input, line1 as output, discord line1 as input, push to talk = enabled
is there an issue when you use push to talk?
gonna try it with friend so he can let me know if it sounds good
its still quite laggy apparently when I try it
or like it cuts of alot
okay... is the noise canceling enabled in discord?
there are a few settings you can try
I turned it of
could I possibly text you in dm's so I could send a photo of what my settings are so I am not doing anything wrong?
Is there any working colab to train RVC ? RVCDisconnected seems to be banned
I finally fixed it on discord but now its just not working very well on teamspeak





