#✨│ai-help
1 messages · Page 244 of 1
if ur dataset is above 2 hours, try using batch size 32
but 16 works too
i trained mine using batch 24
and 2x d passes
L1 Mel Loss
ooo ok
ill keep that in mind as well
do you have any more tips I should know? about to sleep
models can only do things similar to what they have in their dataset
so remember ur model is only gonna be able to do things similar to what are u hearing
yup, they just sound very robotic while trying to do very different stuff
i can show u
ohhhhh
extreme example ^
yea I get you
now u kinda understand why most models sound robotic hehe
the more u fill those gaps, the less robotic results u get
yup I understand that now
use applio F0-spin branch because that allows u to disable multi-scale mel
L1 mel is more natural (in my opinion)
okay
do you any good guides for it?
bc im gonna have to watch a guide prob to even get started lmao
there is this https://docs.aihub.gg/rvc/resources/training/ and https://docs.aihub.gg/rvc/resources/dataset-isolation/ but they arent advanced
these works but ye this is a bit more advanced
voc_fv4 the goat
for removing room reverb i use this
the heck is this?
de-reverb
sounds good thanks
did you 🏴☠️ or

Acon DeVerberate 3
"buy" it
is very decent imo
my set had a pretty loud reverb
i pretty much never have reverb in my sets
so idk if ill use it
dialogue isolate is all i need 😎

I've got a mac Intel that I'm running the mac deiteris w-okada file on - I've managed to quarantine the files so I can open the application, but it crashes after a few minutes of loading - could someone help me with this?
intel mac is too old and unsupported on most AI workloads
consider the cloud alternative https://docs.aihub.gg/rvc-voice-changer/cloud/w-okada-kaggle/
Last update: May 5, 2025
Thanks! I'll give it a whirl
I'm a little lost on what to do after hitting copy and edit in kaggle - is there a video or anything I could watch?
I don't think anyone has made an updated & reliable video tutorial, only the above guide
Unfortunate - I'll see if I can figure it out and ask more oof
helloo, can someone tell me how can i download w-okada voice or is there better options?
🙏
How to download w okada
its fixed on dev
now im not at 100% with the results im after (character with heavy style influence to the point that i can prompt the character in different outfits at full weight and still keep the art style on offshoot models like wai)
but im at like 70%
as opposed to like 45% before (gens were still very booru outside of base illy)
one thing im noticing in all the other models and idk how much it even rly affects me
but kohya gets rid of the noise offset field when i select multires noise
and all the models i look at off civit has a 0.1 noise offset
is it possible for me to get the noise offset AND the multires noise params like these models?
i wanna be at like near 100% parity cus i want these same kind of results
litsa whyy
?????
Sorry but they're not lying
why rejoin on may 12 of thisyear
who
litsa
how do i know the epochs of the model from my weights
i have to submit
-colab
Google Colab is a Cloud (Remote Good PC) Service. While the Free plan provides up to 12 hours of daily usage, the GPU is typically available for only about 4 hours each day on average.
by IA Hispano
Google Colab
by Hina
Google Colab
by Eddy
Google Colab
by Eddy
Google Colab
by Deiteris & Hina
Google Colab
by Shiro & Eddy
Google Colab
by Nick088
Google Colab
by Nick088
Google Colab
by Jarredou & Makidanye
Google Colab
who's lista
I don't know
you don't know? that means you're new here
were you trying to submit for https://discord.com/channels/1159260121998827560/1305527335646269440 ?
yes
hey
I suppose there should be other QC staff members that could respond on
perhaps wait till them online
I don't think u can check, I've made some on weights to test, btw don't use weights for model making it's bad
there are different ones, choose based on the things i said
what do i do after i install the realtime voice changer?
elaborate:
- your pc gpu
- what you want to do
- what tut link are you using
when i install the voice it says u cant download that type why
my gpu is an rx 6600 i wanna use voice changer ig and no tutorial link i just got the disocord link Duckus's youtube vid
elaborate:
- your pc gpu
- what you want to do
- what tut link are you using
youtube tutorials are outdated asf
forget everything you got from it
okayy
-realtime
Guides for Programs that use RVC Models in Realtime for Calls/Games
Most suggested. GUIDE
ONLY the latest alpha comes close to the Deiteris Fork performance, older versions in youtube tuts are way worse. GUIDE
Unavailable, the guide is outdated and the program is worse compared to the ones above, and much less updated
read 1st lin
i alredy downloaded it
no, the one from youtube tutorial is a year old outdated version
as i said
ik
forget that duckus tutorial even existed
i downloaded this 1 alredy
yeah you shouldn't use it
so you got wokada deiteris fork or the one from youtube?
i downloaded the yt one first and then deleted as it wasnt working and downloaded the wokada one
@winter iron slurs arent allowed.
Okay
show a screenshot
!give-media-perms 1h @stark zephyr
click it, and then open MMVCServerSIO.exe
done
3060 ti
i want to downolad voices from the server
idk i saw a vid says join this server and download the voice u want but when i download it says u cant download that type
then show a screenshot
you shouldnt use video tutorials, they are outdated
RVC = Retrieval-based-Voice-Conversion, the best Few Shots Speech To Speech AI Models (on v2), Inferences (use models) pre-recorded audio (ai covers) and train (make) models. Technically, Mainline RVC does have a go-realtime.bat (aka RVC-GUI), but it's pretty messy and outdated so it's extremely not suggested for realtime.
Wokada = uses RVC for realtime inference. There's 2 main versions, Original made by Wok, and the most suggested one is Deiteris Fork (modified version)
which do you need
click enter a few times then wait some seconds
is that program?
i use voice ai
whats happening
when do i know when its finished?
and you shouldnt
which do you need?
show a screenshot
Is it a live sound or just a clip?
cuz i want a live sound
i explained you the differences, which do you need? realtime or pre-recorded audios?
just be safe to not fuck it up so hard that it doesnt download
if realtime means live voice ye realtime
realtime means using the changed voice in discord vc or games for example
alright
delete everything you got off youtube and voice.ai
-realtime
Guides for Programs that use RVC Models in Realtime for Calls/Games
Most suggested. GUIDE
ONLY the latest alpha comes close to the Deiteris Fork performance, older versions in youtube tuts are way worse. GUIDE
Unavailable, the guide is outdated and the program is worse compared to the ones above, and much less updated
get wokada deiteris fork, read the 1st link
How do I install it?
i installed it but it has many things
Did those postmodern jukebox ai covers get DMCAed by the YouTube channel (frank Sinatra covers for example ) or are they generally okay with people uploading AI covers of their covers as long as their channel is credited? Because last time I had a bunch of PMJ Brittany Murphy covers they never got takedown at all , I just deleted them from my main YouTube in case the channel turned against me so I wouldn’t lose my main YT content that was non AI.
I can’t find any info about this online about PMJs stance on the ai covers and whether they issued takedowns
Obviously I’m not famous enough to have dialogue with someone from PMJ to ask for permission to have Brittany covers of their music on my channel.
I was going to upload them to an alternative YT channel and not my
Main.
i installed the app but how i use it or open it
it didnt move at all
should i re install?
ask
I
wanna know if w okada would work well with a 3050, 4050 and a 4060 all with 16gb ram
I could help u set it up
i domnt have it yet son injust wanted to know
idk how well it would run tho
I'm not sure but I use this gpu and it works well
idk much about gpus is that better than what i said
Hi, i wanted to use the ai voice changer thing to make a song, i had downloaded it a while back but i unfortunately forgot how and what do i need to install and properly work it out, anyone can help me?
I am not entirely sure
but again I can help u download it and see if it runs well
those are all more than good enough
are you trying to use an ai voice over existing vocals or do u want the voice changer to sing using the ai?
sing using ai, i wanna record my raw vocals and then use that voice changer to make it sound like juicewrld singing
(purely for my purpose not for profit ofc)
I reccomend using this, you don't need the voice changer for doing that
https://www.weights.com
but if you do want the voice changer u can talk to me in dms about it
well its free right?
Your gpus should be good, SussyBoi69 said they are good enough so just send a dm when u want to download it all
the voice changer and weights is free yea
ohk should i test the app u sent me above and then if i wanna try out the voice changer later dm you?
well u can still dm me if u want help with the weights app
ohk I'll lyk if i encounter any difficulties
@brittle wing u still interested?
did you read the guide? you need to get the nvidia version
id just suggest to not post ai covers at all, companies do strikes once on a while and you could find yourself with 3 strikes in a day (ban) lol
i got 2 strikes in less than half a day once
what part of the guide didnt you understand?
be sure to not waste your internet and retry
some people are slow, you have to show them with video,
like me
words don't do much if they don't read
so basically im looking for a realtime live voicechanger with a smaller delay and more realistic is there any good alternatives?
Can anyone here help me with the rcv voice
i tried to download it but got told i had 2 pay
would anyone happen to know what was used for this voiceover? https://x.com/gochionsol/status/1916526942037193001
ive been experimenting with a lot of different voice models and the realism on some is real hit or miss (robotic sounding, artifacts, etc.), what should i be looking for in voice models in #1175430844685484042? i usually filter by rmvpe, english, and rvc
i already have chunk size at 74ms and extra at 2.7, f0 extractor rmvpe, with force fp32 and a dedicated gpu on deiteris' fork (so i can up the settings but the models themselves just have artifacts and stuff)
so how can i find good models or how can i up the quality of the realtime voice changer? i'm just looking for a normal talking model not singing
other than listening to the model samples there is nothing you can do to tell which one has better quality without downloading them
what about things i should look for like epochs/pretrain/etc.?
pretrains dont change much
epochs dont indicate quality
f0 method you should either look for rmvpe or fcpe
pretty much everyone uses rmvpe so you dont really have to filter for it
Well, I choose the one from Google Collab or I don't know if that's what you're referring to.
what is kaggle?
Like google colab except it works
or ok and where can I find it
Last update: May 5, 2025
yo i execute the setup but i do nothing after ?
There is no other option. The truth is that using Kaggle is more tedious. I prefer Google Collabs.
Kaggle more sustainable in the long run, Colabs literally do not work at all for wokada
o ok thanks
So i got it to work but it dont work in vrchat anyone who can help
elaborate
!howtoask
How To Troubleshoot 
- Don't simply mention your issue, like "
my rvc is not working". - Describe the step you are on, what you're trying to do, the RVC you're using, a screenshot, etc.
- The more context, the better.
- Don't be desperate. You can ping a Helper, but if they ignore, they aren't available/don't know the answer.
- It's okay if you're frustrated, but don't take it into this server.
- Don't DM without prior consent.
- Don't ask for every little instruction. Put your own effort & test things by yourself.
- Don't ask to ask.
- Check if your answer is a Google search away/on our guides website.
Hey everyone, quick question. Does stability matrix support .onnx models?
Can someone please help me with okoda rvc? I recently upgraded my gpu to amd and i’m stuck on the cuda version, i don’t know how to switch to another version
The reason i want to switch to the other version is because next to the “gpu” processing option it only shows the CPU and overworks it to the point the voice changer stops working
I had help set this voice changer up a long time ago so i don’t know much or if the gpu change is the cause for the problems im facing with it, could be that or its an outdated version
what exactly does it mean to merge voices?
i heard you can make it sound more realistic
in vonovox is there a way to add more effects? more vst plugins in the same line?
or do i just have to keep using things like reaper for that
if my dataset if over 2 hours should I just train from scratch
also does anyone know why it feels like the outputted sound feels like its talking faster than the original?
it feels like im talking to fast after its converted
no
2 hours is too small for a pretrain from scratch
anyone please?
gochu thanks
Its only letting me use my cpu so im certain its the cuda version being the problem so I followed the guide and downloaded the files but i dont know what to do next, where to put them/extract
can you hear the changed voice after?
I do but super like corrupted, late, and laggy
And sometimes it comes through sometimes not
In the guide it says its not recommended for cpu to do the work so yeah
umm idk english too much but ill try to understand
would 8 batch size be fine
sorry for the questions i havent trained a model close to this big ever
or should i go 16
i would use 16 but 8 works too
okay thanks
is wokada still the best real time voice changer? its been a minute since ive used it
I’ve been trying to find the best place to clone voices I’ve been using Kits.AI and it’s pretty good but all they want is to take your money for things that shouldn’t even be charged for. What is the best way to clone a singer that sounds really good?
And I don’t mean Weights I’m sorry but it’s just not good in my opinion
انت عربي؟
How i use Applio RVC to train my voice
In the context of RVC, the dataset is an audio file containing the voice the model will replicate. It can be either speaking or singing.
But the news is that I don't find upload my voice
Last update: Apr 01, 2024
Do you want to inference or train a voice model?
what do i put here if i wanna use the voice changer on discord ( i already have virtual cable downloaded)
Train voice model with my voice
euhh i cant send images but i mean for audio input and output
Go to your discord settings select mic input to virtual cable
what about the input and output for the actual voicechanger
Last update: Apr 01, 2024
its still on default idk which one to change
Input your mic
Output > virtual cable
thanks its work
Can it work for android
Read all the 5 steps 1. Pre-processing
i think you need a pretty beefy device to run rvc
You mean w-okada ?
What is that
Real time voice changer
You said "can it work for Android " what are you talking about?
@soft stratus
I mean upload my voice file to Applio works in android
You are training on Android? I mean in cloud like Google colab or kaggle
Google colab
Yes it works but check it
I mean you directly have to upload your dataset file to a folder named "Dataset"
i dont have access to server mentioned in voyage contest
https://discord.com/channels/1159260121998827560/1374356047497527317 i dont have access what is this
#🏆│vc-leaderboard message i dont see vc anytime run in this server tho no one join
can anyone help I use okada voice changer but my voice lags a bit and mumbles a lot
thats the old channel, this is the new one https://discord.com/channels/1159260121998827560/1380902127366574151
does anyone actually have good success with w-okada?
i feel like i havent seen anyone with it working well
maybe it's just you, please explain your problem
!howtoask
How To Troubleshoot 
- Don't simply mention your issue, like "
my rvc is not working". - Describe the step you are on, what you're trying to do, the RVC you're using, a screenshot, etc.
- The more context, the better.
- Don't be desperate. You can ping a Helper, but if they ignore, they aren't available/don't know the answer.
- It's okay if you're frustrated, but don't take it into this server.
- Don't DM without prior consent.
- Don't ask for every little instruction. Put your own effort & test things by yourself.
- Don't ask to ask.
- Check if your answer is a Google search away/on our guides website.
does yours work well
im genuinely curious
cause ive been trying and it just doesnt sound realistic
maybe try another model that sounds better, or check out this guide
Last update: May 3, 2025
do you have one that works? and ive already checked that out
although i havent done it completely
im a little lost in connecting light host to the voice meter
there are some example models included in this guide: https://docs.aihub.gg/rvc-voice-changer/local/deiteris-w-okada-fork/#models-to-try
Last update: May 5, 2025
otherwise try searching in #1175430844685484042 or weights.com
it may depend on what kind of your voice on the mic
i got a blue yeti is that good?
so try several models and find which of them work
also don't forget to adjust pitch as needed
i do
Can i get some support too? My problem i believe is that im on the cuda (nividia) version when i should be on another since i upgraded to amd, because i noticed during the processing options for gpu it only shows my cpu and overloads it, i downloaded this link but i dont know what the next steps are, can i get a step by step on what to do after i download it in winrar?
(Dm me pls on what to do if a step by step guide is against the rules here, i’m worried i’ll mess up for the fourth time)
voice changer client demo = you are using original w-okada, most likely cuda version of it. Not the fork the guide tells about.
so should i delete it along side with everything on the MMVCServerSIO with it?
definitely wrong version, should get the AMD one as you showed above
okay i downloaded that link idk what to do next...
and what should i do with the old one?
is it fine to keep?
You can delete the old one.
everything in the file? or just the start_http?
Guys. Can i ask about what is spin 7 12 ? And how can i use it
the while voice changer folder
you can use it as a custom embedder.. or you can get an experimental branch of applio for that
okay im deleting everything in it
still wrong version, the b2332 one should have only the exe file to run directly, like this
well i just deleted the folder so now i got this one which i assume is the correct one?
Ah, i use Codename's applio and it only have spin. I don't know if 7 12 is much different from normal spin 
it should be 7-12 one
(i dont mind sharing my screen in help vc to speed it up the process)
And i have no idea if i can pair pretrain from Seoul Streaming Station for spin, the KLM4. Because i got some cracking issues with KLM6 
Ah they are the same ? Dang it
i thought they are different
I'm not sure how he included spin, I don't see his rvc/lib/utils.py downloading it
i have no idea, but it is in the fork somehow 
so I guess it just uses wharver is in rvc/models/embedders folder
it is a command that calculates the checksum of the file, so you can find which version you're actually got
since the names are the same
open command line in the spin folder and run it
i got the voice changer but why is it a web version? is there a none-bowser version?
it just uses browser UI
okay ill test it
if you are running it locally, change to server
for direct access to hardware without the browser limitations
oh okay
uhhh
oh ok
it switched
which one of these should i use? the WDM?
or mme? idk the difference
ah, Noobie sir, do you know if refineGAN can be used in realtime yet ? i read in KLM5 article, and someone said it wasn't suitable for realtime
help applio says no api found 😦
ah nvm, i will wait for SSS release KLM6v3
in Vonovox maybe, but there are no models
what version of applio?
the only api there is core.py command line
hmmmmm, so for now there is no pretrain model for refineGAN ?
ah
klm6 use hifi
there is, even for spin.. but is it mostly suitable for speaking
singing is an issue because it renders harmonics too well, so they end up mirroring like crazy
like, it is not working well when singing ? because i don't remember any models can sing

How to get the RVC to work in discord and games?
there are artifacts that you may hear
ah, understood, so i should stick to hifi-gan for now xd
hello. I recently got a new gpu and wanted to ask about gaming and w-okada.
With the 5070 ti it sounds quite good without gaming, but when i play games such as hunt showdown it starts to sounds a bit worse.
Is the gpu still not good enough or can i adjust settings and it will improve? I don't have that much knowledge about it. (as in reducing ingame graphics and maybe increasing the delay from 0.4sec to 1sec?
hifigan mirrors too 🙂
when you play a game that in addition to 3d also uses upscalers like DLSS, there's a big competition for the computing resources against W-Okada
then either should be fine
i just wonder if it can process breathy or scream
it's based on my data too right ?
i see. so if i dont activate DLSS and maybe reduce the graphics it might bring better results? Thanks 🙂
add fps limiter if the game allows, that may help
but 5070ti is decent
unfortunately there's no way to prioritize the gpu resources
i got it to work finally!!!
thanks for the help everyone 💙
though is there a way to increase/decrease chunks?
because its locked for me
also i cant click on sup1 or sup2
oh nvm i just had to turn it off im stupid lol
Google Colab is a Cloud (Remote Good PC) Service. While the Free plan provides up to 12 hours of daily usage, the GPU is typically available for only about 4 hours each day on average.
by IA Hispano
Google Colab
by Hina
Google Colab
by Eddy
Google Colab
by Eddy
Google Colab
by Deiteris & Hina
Google Colab
by Shiro & Eddy
Google Colab
by Nick088
Google Colab
by Nick088
Google Colab
by Jarredou & Makidanye
Google Colab
guys, which tool should i use for below use case?
lets say i have a video of a person (a movie scene)
i want it to say something some custom dialogue
which tool i should use?
gpu: ur rx 6600
chunk: 128
extra: 2.7
did you install vac lite?
1 small issue, i can't mess with the noise supression
I did.
It works in discord now.
set input to microphone
output to line 1
monitor to headphones
also you can optionally use force fp32 mode in advanced settings for better quality at the cost of some delay
show a whole screenshot of all ur settings
with these settings
it sounds so bad tho
show a whole screenshot of all ur settings rn
^
yes sir!!!
set extra to 2.7
chunk to 200ms
the reason why you cant use noise/echo suppression is because you're in server mode
client = can use noise/echo suppression, but can have more delay and is easier
server = harder to use, less delay and can't use noise/echo suppression
if you need noise/echo suppression, you need to use server or use a 3rd party tool for noise suppression
also, you can optionally turn force fp32 mode on in advanced settings for a bit more delay and a bit more quality
It sounds good enough for me.
set extra to 2.7, chunk to 128
ALWAYS check the triangle when youre changing settings on AMD
the triangle will be your life saver for AMD
Which triangle
hover your mouse over it, it will tell you wha tto do
you need to click stop, change the settings, it doesnt matter if they arent as accurate just close, then check the triangle
like it doesnt matter if u set it to 127 or 129 instead
Best I can do is 136...
its fine
also, if you click advanced settings, then set force fp 32 mode on, you can get more quality with a bit more delay
yes, always check the triangle whenever you change settings, its good when its not there anymore
okay
.
This sounds way worst?
Its like cracking up.
try out other models
Okay.
Last update: May 5, 2025
Alright thanks.
also reminder that the quality depends much on the model, if the model sucks, quality will
its always better to try many models
oooh okay thank you, should i adjust the ms if im gonna play a game like vrchat? or it doesnt matter? (also i cant set it to 200ms only 210.7ms is the closest because i can't just type it by the looks of it)
should i adjust the ms if im gonna play a game like vrchat?
yes, just remember to set the chunk an higher value slightly than the perf value
also i cant set it to 200ms only 210.7ms is the closest because i can't just type it by the looks of it
yeah you cant type it unfortunately, but it doesn't really matter dw, just as long as its close and not that you put like 560 instead
so, do you need any other help?
so ingames the higher i set the chunk value the better? will it increase the delay in the voice if i increase it?
Yw
chunk basically controls the delay
if you put a lower value than the perf value, it will start lagging basically (because the perf value is the ms of performance ur gpu is doing)
so, in games you have to make it control an higher delay so it doesnt lag out
ahh okay so it'll have probably a larger delay but it wont lag out or be inaudible
exactly
else it will lag because you're forcing your gpu to do a less delay than what it can do while in game
does using quest 3 pcvr mode also affect it?
its suggested to play at the lowest settings possible btw
not sure, i dont have a vr
vr games might be a bit more intensive tho?
low settings in terms of the games or the voice changer?
ahh okay
thanks a lot for this info, i was struggling yesterday
trying to understand a lot of it
yw, need any other help?
i think that's all ill be messing with it and will try the client side instead of the server
btw the great majority of settings is either explained in https://docs.aihub.gg/rvc-voice-changer/local/deiteris-w-okada-fork/ or when you hover to the settings name and the settings has like "....." under the name (sorry idk how to explain it lol)
Last update: May 5, 2025
that means more delay, tho u can surpress noise
i'll see if its acceptable amount of delay when i test it in vr, also i noticed the audio in client is quieter than the server?
like i hear myself but at lower volume
try increasing the mon volume
it didnt change anything
can you try increasing the volume of your headphones?
i mean i hear myself its just quieter when i switch to client i dont think its a headphones issue
but its fine its not so low its inaudible, its still fine, i just noticed it
oh!
what fixes is it is the out and in (even tho both is 100 client and server but in client its quieter so it should be increased in my case)
all good thank you again for the help that's all 
when i download a voice it doesnt sound good at all and its buggin
!howtoask
How To Troubleshoot 
- Don't simply mention your issue, like "
my rvc is not working". - Describe the step you are on, what you're trying to do, the RVC you're using, a screenshot, etc.
- The more context, the better.
- Don't be desperate. You can ping a Helper, but if they ignore, they aren't available/don't know the answer.
- It's okay if you're frustrated, but don't take it into this server.
- Don't DM without prior consent.
- Don't ask for every little instruction. Put your own effort & test things by yourself.
- Don't ask to ask.
- Check if your answer is a Google search away/on our guides website.
Which W-Okada or RVC program are you trying to use? And what is your PC GPU?
5700xt directml 1.5.3.18
Download the "better" W-Okada DirectML from this guide instead. Yours is old and outdated. https://docs.aihub.gg/rvc-voice-changer/local/deiteris-w-okada-fork/#download-amd-intel-and-cpu-on-windows
Last update: May 5, 2025
ill try and let you know thanks
1.5.3.18 is an over year old version of original wokada lmao
delete everything you got off video tutorials, they are old
the version that namari gave you is the latest wokada deiteris fork, which is the best
if you have vb audio cable, uninstall it
why?
VB-Cable gives random issues for Windows users when trying to use it with W-Okada, as its settings kinda complicated to get it work properly.
but i need that if i wanna make it work on discord ill just try
Last update: May 5, 2025
There's an alternative to that one. It's Virtual Audio Cable lite, works out of the box. It doesn't have always be VB-Cable.
alright thanks, i deleted vb cable but it still appears in my output settings
might have to restart my pc
it works perfectly fine thats crazy
@simple ore sorry to bother you, but is it possible for a normal user to train a custom pre-trained model. If yes, please elaborate how.
like.. have money to buy 4090 or rent something better
find 200+ hours of various audio from different speakers, should be same quality
Yeah it could be possible
Pre-trained RVC model? I think it sounds possible, but you would need to gather a lot of audio datasets to make one, more than for a typical normal voice model.
Will it give any good results if I fine tune my models with it ?
55h set on 4070 runs ~30min/epoch, you need to get like 300-600 epochs
pretrain is a base
like a blank sheet cake you buy from the store to write 'happy birthay' on it
It will take like months on my system
Will it?
I got 4060

Maybe possible on kaggle?
if you pay for something better than 2x 10-year old T4s
but there are cheaper options around
how to make a model not sound robotic
Does anyone have the Google Collab link to make the covers?
Yo can anyone come in VC support to check if my Overwatch Winston voice changer is working ?
just ping me if you can
Is it a good idea to train a custom pre trained? Like will it improve the overall quality?
Dont think so... a pretrain can help to expand the range of the model a bit, but default hifigan kinda sucks
you risk of replacing the pretrain data with just your finetune if you over-do the train
and a bigger pretrain dataset does not seem all that better than smaller
sure, having some singing data and dynamic range helps, but...
how do i package my rvc model using a command here ?
HF is down btw
im training a spin pretrain, if everything goes well im going to share it in #1235952130855010365
but hifigan only, not refinegan
speech only, no singing
Im getting problems getting the RTX 5090 to run on deiteris' optimized W-Okada Fork Voice Changer.
I installed both versions, the one for the 50-series doesnt work, and gives "voice not selected" error messages.
tested the 50-series with a 4080M card and getting the same error.
downloaded the non 50-series version and it worked for CPU.
Have you tried selecting a voice?
yes, it is selected, made sure to upload a pic on the voicemodel to make sure
Did you click the voice?
yes, multiple times
Did you try switching to a different voice then back to that voice
im certain it is selected, I know how to work with the non-50 version
I didnt, but I uploaded another Voice to the same slot
How do I train a model?
i don't think any current real time stuff can sing tbh
but gambatteeeeeeeeee
i can only wish you the best
singing is hard
for ai
but we will get there eventually
maybe maybe refinegan + a singing only dataset
somehow i can't use KLM 4. or 5. or 6. correctly
it keeps losing voice and having lots of cracking

you're trying to infer singing with speech datasets?
no lah, i don't have singing, only speech data
like, i can't use it for real time for whatever the reason
welp you answer urself why your models cant infer singing
i mean, not singing, just talking normally
and it has cracking and lose voice mid sentence
how big was the dataset?
could be your mic

imo just stay with the og pretrain
it was ok for me but i noticed speech sounds unnatural
the og pretrain is great
only issue is that it cannot sing because the pretrain was trained with speech
multi scale mel loss seems to increase vocal range but it made my models not resemble the original voice too much
but the true solution would be training a pure singing pretrain, then finetuning singing
(refinegan may be a better choice for this)
the spin embedder seems to be better at handling noise so in theory breathing should be better
but i havent compared that yet
have you tried wavlm at all?
nope, did someone here tried it?
i dont think so
yknow what i should ask the general question of what are some recommended settings for training cartoon style loras for usage on non-base illustrious checkpoints
im like 85% there but i wanna be closer to 90-95%
i see plenty of loras that supposedly accomplish this but trying to follow their training params has been only Okay
if i have to change other settings in my genner (i use automatic1111) to get closer i can do that too
ideally i wanna get the style to remain when i do gens on wai-nsfw (what a lot of the models i see use to prove style rigidity)
could be bad tagging
but i wouldnt know im so new to training loras 😔
if someone could actually answer that question that would be very helpful to him and me
i mean ig you can also try a lycoris
like a locon or a glora
if you're not satisfied with booru-like thing, try pony or flux
i share an instance with my friend but he doesnt seem privy to go back to pony
trust me if it was up to me id run crying back to pony training on that was a blessing
but i KNOW its possible on illustrious
i SEE models that do it all the time
hi theres an issue with vonovox and flexasio setup i dont know why if anyone could help
Input Device: FlexASIO
Output Device: FlexASIO
Configured Sample Rate: 48000
Error creating audio stream: Error opening Stream: Invalid number of channels [PaErrorCode -9998]
Critical error in start_vc: Error opening Stream: Invalid number of channels [PaErrorCode -9998]
Traceback (most recent call last):
File "gui\\gui.py", line 1229, in gui.gui.GUI.start_vc
File "gui\\gui.py", line 1270, in gui.gui.GUI.initialize_voice_conversion
File "gui\\gui.py", line 1324, in gui.gui.GUI.start_stream
File "core\\audio\\audio_processors.py", line 797, in core.audio.audio_processors.AudioDeviceManager.create_stream
File "core\\audio\\audio_processors.py", line 787, in core.audio.audio_processors.AudioDeviceManager.create_stream
File "C:\Users\legen\Desktop\Vonovox-1.4.5\runtime\Lib\site-packages\sounddevice.py", line 1825, in __init__
_StreamBase.__init__(self, kind='duplex', wrap_callback='array',
File "C:\Users\legen\Desktop\Vonovox-1.4.5\runtime\Lib\site-packages\sounddevice.py", line 909, in __init__
_check(_lib.Pa_OpenStream(self._ptr, iparameters, oparameters,
File "C:\Users\legen\Desktop\Vonovox-1.4.5\runtime\Lib\site-packages\sounddevice.py", line 2796, in _check
raise PortAudioError(errormsg, err)
sounddevice.PortAudioError: Error opening Stream: Invalid number of channels [PaErrorCode -9998]```
shows up in the cmd
ok
So, are the only two ways to create AI models with Applio and Kaggle? I used a Colab that said it was called RVC v2. Disconnected - Colab. Sorry for the inconvenience.
hi how do i fix this
I clicked enter and it doesnt pull out the RVC client on my chrome
How much epoch should i train if i have 15 min voice with clear sound.
- should i use
monoor astereowav file for my dataset files? - is it possible to use
flacinstead ofwavfor dataset files? which one is better?
- mono, but if you forgot to convert the dataset to mono it doesn't matter, applio/rvc will automatically convert the dataset to mono
- flac works too
No answer to that, use tensorboard to track progress
What version are you using
Client uses MME, a sound device, by default which is extremely outdated
Server lets you choose which audio type you choose. You can use mme there too but obviously no point when wasapi is newer. Wasapi has less delay (faster audio processing) to name one benefit
For you its best to use kaggle yes. Colabs dont rlly work atm because of an update Google had since they dont like the use of colabs like this in the first place unless im misremembering
Applio is not a category like colab and kaggle, you would use applio inside kaggle you get me
Fixed alredy.
How to use it?
Last update: May 5, 2025
?
Youre describing something different
One has to be ticked by default
but it works on w-okada just fine just not on vonovox, is there a way to contact who made vonovox?
adjust chunk according to the gpu capability
whats your extra & chunk & gpu?
@shy spruce
Gpu goes idle maybe. Try lowering chunk a little bit more like 150, if it doesnt fix, then run the "force gpu clocks.bat" file outside of the mmvc folder
For everrything
hey yall
do any of you use AI in your projects?
i want to integrate AI into my project
but i dont know how to do it for free
Which doesnt matter cause wokada downscales to 32k anyway
But your mic inputs everything into wolada to its fullest range this way
Means your mic picks up your headphones output and loops it
Move mic further away from your headphones, lower volule on your headphones, move in. Sens. Further to the right
hello
when running deiteris fork voice changer i was wondering if i can change the name of the run file from MMVCServerSIO to anything else and the voice changer folder name to something else
if i change the name of these files will the voice changer still be able to run?
not sure if heres where i should ask, but im having this issue with the voice changer, when i use it i have it set so it inputs my mic, outputs VAC, and then in discord inputs VAC and outputs my headset, what happens is when other people talk loud enough, the voice changer for some reason picks up their voices slightly and then replays it, but my mic normally doesnt do that so idk what im doing wrong
hm, my headphones do that but they are old so the system or something for it probably broke
arctis 7p+
the weird issue is that ive done things like this before but ive never had this specific issue happen, my mic normally doesnt pick up noises from my computer, but for some reason the program does
turn down volume of voice chat/system sound and use decent noise suppression
if you have someone irl, tell them to be quiet
and i cant make it audio loop on itself by screaming loud either
which is the weirdest part
hmm, i can try that, krisp noise suppression completely kills off the voice changer
so im forced to use standard
so i guess i can noise supression on the voice changer itself alongside discord standard noise suppression
Might aswell try before asking it takes 10 seconds lol
does anyone know how to create your own voices for the VCClient? there's probably a whole load of tutorials out there im just looking for a point in the right direction
i have a sample for the voice all ready to go i just dont know how to use it lol
In the context of RVC, the dataset is an audio file containing the voice the model will replicate. It can be either speaking or singing.
much appreciated
there is a channel setting in flexasio when you set the devices. make sure both devices have the same number of channels, they look mismatched
when making a voice model of a character that doesnt have much dialogue (for example: a character wich only has 1 minute of unique dialogue in total or maybe even only a 1 second audio clip), how long should the dataset recommendably/ideally be? should it be only the unique instance(s) of voice/dialogue or should it be an audio of said dialogue/voice repeated until you have an audio file with certain duration? if the second case, I wonder what duration should be enough.
- 10 minutes minimum
- no, don't repeat the audio, its going to give very bad results
#ask Index Rate what for ?
is the accent of the model
index 0 uses your accent
values higher than 0 will begin to blend the model's accent with your accent
finally the best simple answer that can i understand with lower iq, thank you 🙏
i think for ideal real time voice on games, using higher index will be better 🤭
well, in that case, the character im trying to make a model only has like 1 minute of usable unique dialogue... Im not expecting it to be a SpongeBob or Michael Jackson quality type of model, but still, I wanted to know, would it then just be "better" for me to use only this 1 minute long audio of him "singing" than repeat it like 5 or 10 times? (to atleast make it less mediocre) its what I understood.
no that wont help at all
the model requires actual 10 minutes of diverse data
no the same thing over and over again
technically you can train anything, even something below 10 mins
but don't expect good results
so yeah, i guess i will just use the 1 minute audio then... theres not much I can do ¯_(ツ)_/¯
yeah, again, not expecting a super high quality result
sounds interesting, what if you try it?
im currently training my hifi pretrain, breaths are still robotic :<
now im thinking about the same thing, but instead of different volume, it would be different pitch and "time".
gpt told me about different volumes, pitch and time being viable as data augmentation
i dont think there have been tests of data augmentation here
adding more breaths technically may also improve them
hifigan needs natural audio right? If so then thats why we never used data augmentation
hifi/refine can clone whatever it's in the dataset
according to gpt it may perform poorly
ye seems like gpt changed its mind since last time i asked about data augmentation in vits
i mean we can still try but i doubt anything good will come out
i think this to work the augmentated data should be shorter than the original data
like adding 5 minutes of augmentated data to a 10 min set?
but I'm not sure, someone should try it

i can try it rq
try what noobies said
duplicate the dataset but with lower volume
i can try after this pretrain learns how to reproduce breaths 
👍
how much volume lowering do i do?
ill try -3, -6, and -10
i just joined to try a specific ai voice made by MartinFLL and i wanted to know how can i use it? do i have to host an ai on a local device?
i love my 5060 ti

sorry to bother you do you know where can i have informations about getting started?
if you want to make models follow this:https://docs.aihub.gg/essentials/how-to-make-voice-models/
If you want to make a cover follow this: https://docs.aihub.gg/essentials/how-to-make-ai-cover/
i suppose for TTS i should look at the "how to make ai cover" page ?
i don't think there is TTS
what tts do you want?
we have a list of them here: https://docs.aihub.gg/tts/tts-tools/ (minus chatterbox, i havent added it yet)
thats a rvc model you cant use it for tts
alright
unless you make some audio with a diff tts then infer with rvc
Sorry for the dumb question what's an RVC ?
Retrieval-Based Voice Conversion is a voice cloning ai
thank you
^
Alright well i'll check the ai cover thing and if i have some troubles could i ping you back?
sure, you can ping me or just drop your question here
ive genuinely tried everything idk, (1,2) (0,1) (0,0) (1,1) (2,2) input output channels on vonovox & flexasio, i checked the sound settings to make sure what amt of channels they have aswell,
it worked for the first few times on 0 channels for flexasio & vonovox but just stops randomly, it works like genuinely 1/10 times without reason.
if it helps when i tried to re-open setup.bat it worked for 1 single time but then stopped right after i clicked stop and start again
-colab
Google Colab is a Cloud (Remote Good PC) Service. While the Free plan provides up to 12 hours of daily usage, the GPU is typically available for only about 4 hours each day on average.
by IA Hispano
Google Colab
by Hina
Google Colab
by Eddy
Google Colab
by Eddy
Google Colab
by Deiteris & Hina
Google Colab
by Shiro & Eddy
Google Colab
by Nick088
Google Colab
by Nick088
Google Colab
by Jarredou & Makidanye
Google Colab
bam done (ish)
these are all bs4 10 min of real data and 5 min of quieter repeated audio all at 100 epochs
weird thing is that they all had different step counts
idk why
i changed nothing
they sound almost the same for me xD
@crude flame try this

Hey does anyone know a really realistic image generator?
I heard about midjourney but i want to be sure it will be really reaslitic before i spend money
Thats strange..., I did everything the guide for Applio Colab (cloud) said, but when I reach the part where I have to press the "start training" button, it seems to be not working for some reason is what it seems.
heres the audio noobies sent
i dont really got any audio with much volume difference
Hi, could someone help me solve this error?
Looks like there's a bit of a problem.
unknwon message
If you clear the information being managed by this app, it may be recoverable.
Initialize
Reload without initialize
Error
unhandledrejection
no error stack
Error: Could not load Voice Focus estimator.
Error: Could not load Voice Focus estimator.
at http://127.0.0.1:18888/index.js:2:1042547
at Generator.throw (<anonymous>)
at s (http://127.0.0.1:18888/index.js:2:1039349)
and how do I do that?
Yesterday when I used it everything was normal, and today it won't let me open the application.
Does anyone know why when I change protocol to rest my pref stays 0 but if it's on sio I can see it jump and change accordingly when speaking for wokada?
i have this really good model that i like a lot but i dont like the voice that much, is there a way to fix it? merging?
is there a guide to merging?
or a way to make it softer?
what are some of the most convincing/best sounding voice models out right now?
for?
ye but like girl u mean?
if it sounds the most "realistic" than sure, im trying to see how good it sounds now, i tested it a year ago or so
like live voice to voice
there are always more and more settings to finetune if you want it as realistic as possible
such as EQ, bitcrush etc
a year ago or so also wasnt rmvpe instead it was crepe iirc?
dm
For some reason, RVC in Google Colab does not generate the necessary files for saving AI voice.
i use this colab
https://colab.research.google.com/github/iahispano/applio/blob/master/assets/Applio_NoUI.ipynb
is applio/rvc convert flac to wav before training?
i want to know if rvc support flac natively or convert it to wav before training.
how can i use these two .INDEX files for RVC?
it's not documented on https://docs.aihub.gg/essentials/how-to-make-ai-cover/
they only have the "added" file
i use UVR 5 in order to remove background noises and extract human voices.
i tried a lot of models in
https://colab.research.google.com/github/Eddycrack864/UVR5-NO-UI/blob/main/UVR5_NO_UI.ipynb
but i can't find a model that able to remove birds sound.
any suggestion to remove birds sound?
BS-Roformer and Mel Band Roformer
MDX23C
MDX-NET
VR ARCH
Demucs
ok
I'm using Kaggle with Applio and I have finished training my thing, but when I press "Restart Applio" it continues training (I can't stop it no matter what I press) and there is no added_ file. How do I fix this?
great, now my ngrok ended
smh
ima try a different method then
I was seeing that response to people having the same problem as me, but I couldn't find the Generate Index button for the life of me...
Ohhhh, that one
alright, thank you so much
This is W-Okada, not RVC. RVC is another different program.
How to add mel reformer models in UVR GUI
hey when i try to write something to chat gpt he just doesnt answer me or gives me an error does any one know how i can fix this ?
Is this normal with Taco2?
https://files.catbox.moe/uh8omt.png
you cant fix it https://status.openai.com/
the servers are down
either wait or use something else like gemini
Ok ty
be sure to not watch video tutorials for rvc
- what's ur pc gpu?
- what do u want to do?
- what are u using?
that's wokada deiteris fork, not RVC
RVC = Retrieval-based-Voice-Conversion, the best Few Shots Speech To Speech AI Models (on v2), Inferences (use models) pre-recorded audio (ai covers) and train (make) models. Technically, Mainline RVC does have a go-realtime.bat (aka RVC-GUI), but it's pretty messy and outdated so it's extremely not suggested for realtime.
Wokada = uses RVC for realtime inference. There's 2 main versions, Original made by Wok, and the most suggested one is Deiteris Fork (modified version)
what's ur pc gpu? what do u want to do?
can please someone help me with image generation either chatgpt or midjourney
i want to use a certain style but can't figure out how to make it match
I got Applio from the compiled Windows ApplioV3.2.9.zip on HuggingFace but the run-tensorboard.bat isn't working
It throws the error ImportError: cannot import name 'notf' from 'tensorboard.compat' (D:\ApplioV3.2.9\env\lib\site-packages\tensorboard\compat\__init__.py)
I might've fixed it by moving to the C: drive, testing by moving it back to my D: drive
hello
i need some help
im looking to create an ai influencer ( im trying and im overwhelmed ) with comfyui, can someone help me with a workflow that could run on my laptop? low vram ( 6gb vram ) rtx 3060 laptop
text to image workflow + lora stacker
Welp, Tensorboard is now open
But it still throws those errors
I guess they weren't important
There're 5 or 6 of these in a row
This is the second time unzipping it, not sure how I can unzip it more properly
11
Got this error in Replay and went to the website, is there any way to use the ones it suggests instead?
Oh boy
What does this mean? I'm on a 5070 Ti
...suffering from success man, wtf
Bizarre, not sure what I can change
Ooooooh yeah that's right
I never installed it back in April, thx for the reminder
I forget what I was trying to even do
Well now is not a good time for me to install torch but back in April that's what you said to do lol
Oh it was UVR that I was trying to get to work
Is there a torch installation guide handy
This is what you said back in April but I'm a dummy and don't want to mess up an install
I can just run that in cmd?
Don't think I have any old versions of torch since I never installed the nightly build
Sweet thx
Wdym by activate sorry
I am a noob
The rest of it meaning the other lines above?
Alright thank you
I'll start with this
Re-extracted Applio again for the third time, didn't change anything
Not sure how to debug this
Going to try with 3.2.8-bugfix
@simple ore Do you know what this means?
Restarting PC lol, had to install Python
The website does open, I assume it works but I haven't checked if a graph shows up yet
3.2.8-bugfix has no error messages
Ok python installed and restarted PC
Nooo I'm still getting this error.
Now "The system cannot find the path specified."
I am so out of my depth sorry
Will this also make it work for Replay? Both are broken rn bc I don't have torch
What is that sorry
What link are you looking for
Oh I just got it from here
https://www.weights.com/replay
this is cool
Oh this is the error I got earlier
Idk I'm just going with what weights gives me
the weights one still gets updates
Anyway, I'm in the UVR folder in cmd now, so you're saying I should install torch there?
My folder is just called Ultimate Vocal Remover, unless I'm in the wrong one
That's weird, I just installed it and rebooted
I'm in the AppData/Local/Programs/Ultimate Vocal Remover folder, but it seems like that's not where I should be
I did
I'm computer noob I use exes
it's what I do
Oh maybe tha tis what i did
I thought I did that actually but it's been a bit
Nah I ran the exe
Why can't I use the exe
Alr
Thanks
I do have a torch folder in this Ultimate Vocal Remover folder
this is all that's in there tho
Can I install torch for Replay?
Hmm
I just got it from their website and then here
That's the version that the official website links to 🤷
Thank you
Did Anjok abandon UVR
Thanks for helping me sorry I don't know what I am doing at all
I'm installing it through the bat now
The website said to run the bat
It's already going
Oh
Shit
Lol
I moved too fast
I'm dumb
Idk if I should just let this play out or what
Idk how to uninstall any of what I just ran
Ok
Unzipped into a C: drive folder
Don't think I have ffmpeg on this pc so I will install
But I can't rn
thank you for bearing with me
I have to get back to work but I’ll send you a screenshot, it just may not be until later or tomorrow. Can I ping you when I’ve got it
(Trying to train a voice model on Applio collab) I genuiely do not know what im doing wrong. yes, im not using the "GPU" thing since its time limit has ended T-T, and the video quality is dogshit because I didnt want the file size to be heavy.
Because you are not using gpu
good point, but I also did the same process when I still had it
i may be wrong, but apparently the problem is with the audio itself somehow? since the log says "no wav file found" and "not enough data present in the training set", but idk how, it has more than a second of duration, its put correctly in my drive folder, and put its path in the dataset path box. the log also states an error regarding an "attempting to register factory for plugin" thing. not sure if this matters or not, but still pointing it out.
also, when that time limit ends, what should I do? just wait for it to reset/recover or something? or am I actually screwed?
what is your datasets's length?
1:03 minutes
thats all you can get ?
yep, the character im doing a model only has that ammount of "unique" usable audio
if you have all the model files in your drive, you can continue training after 24hrs
and reapeating this 1 minute source audio until i have a 10 minute one doesnt seem to be a good idea
naah,
just reduce batch size to 2
also try to train on kaggle. kaggle provides you 28 hours of runtime per week
you can either utilze it in one day or 2 days or in a week
if you say so. I just put it to 4 since the docs state that "if your dataset is short (around 2 minutes or less) put it in 4".
not sure wich one, but there they are
everything including g and d checkpoints logs eventfiles etc
yeah it can be put on 4 if your dataset is super high studio quality. actually you can experiment with it,
alright, so when a day passes, and have all these necessary files, what should I do then to continue my training?
also yes I will try this if Applio doesnt end well in the end
just put the same values. like name sample rate etc.. and uncheck fresh traning then put same batch size and traning parameters. and start traning
i mean dont use applio on colab, use it on kaggle
yea sorry, i should have said "if Applio Colab doesnt end well in the end"
hey is there a way to have a better accent on wokada
like sometimes it wont pronounce the L
I fixed the error in the end by uninstalling tensorflow and tensorboard using pip in the conda environment then reinstalling them
In the codename fork for Applio, there's no g/total graph on tensorboard, there's only a generator_total graph
These are the same thing, right?
thats strange, theres no accelerator option to change
Thanks!
Does anyone have the Google Collab link to make the ai covers?
do I need to say I did everything the docs said and I still get stuck :´)
im pretty sure I did but alright, m a y b e
oh wait... I was supposed to replace the literal word "token" on the thing with mine... now I get it, thanks for help :>
Sadly, the exact same problem I had on colab still happens on kaggle. Yes I set the batch size to 2 and I didnt let the site open for 28 hours
ok chat i want to start making models cuz i just cant find rly good models with realistic qualities, where do i start? like how do i set everything up (im familiar with programming just tell me what to download and start with if you can)
assuming i can train locally which i will likely do
any idea when RVC 3.0 will be released?
i mean we technically have a "community v3" not a official v3
this "community v3" has a new embedder and thats what rvc-boss wanted for his v3 but he never got around to it
I cant seem to use my flux on forge ui since upgrading to my rtx 5060ti
-colab
Google Colab is a Cloud (Remote Good PC) Service. While the Free plan provides up to 12 hours of daily usage, the GPU is typically available for only about 4 hours each day on average.
by IA Hispano
Google Colab
by Hina
Google Colab
by Eddy
Google Colab
by Eddy
Google Colab
by Deiteris & Hina
Google Colab
by Shiro & Eddy
Google Colab
by Nick088
Google Colab
by Nick088
Google Colab
by Jarredou & Makidanye
Google Colab
How can I train using my own voice recordings to always output some other voice model? like this one https://www.weights.com/models/clroz1aic012sjmfug54yft0u
I have a 9800x3d rt7900xtx
if I train it locally how long would it take to get a really good model for Deiteris' W Okada Fork?
what's confusing about it?
is my esl english confusing? I want to train a voice model on my voice to the target model, without messing with the tune, index, pitch in the UI, or having to speak at a certain volume. If it's not possible I don't know 
Ahhh, I see, it's ever explained anywhere, RVC does not need to be trained on your voice specifically, it turns any audio input to the target voice 
Grok:
While not required, there are scenarios where including your voice in training could be considered:
Custom Fine-Tuning (Optional): If the model doesn’t sound natural with your voice (e.g., due to extreme pitch differences), you could fine-tune the model by adding a small dataset of your voice to improve compatibility. This involves:
Recording 5–10 minutes of your voice.
Fine-tuning the existing model using RVC-WebUI with your voice data to adjust the model’s mapping for your specific vocal range.
This is advanced and rarely needed for general use.
Improving Robustness: If you have a unique accent or speech pattern, fine-tuning with your voice can help the model handle your input better, but this is typically unnecessary for pre-trained models designed for broad compatibility.
How can I run rvc in kaggle?
Last update: Jan 13, 2025
@simple ore Is it over fitting? I mean it increased in last 100 epochs. The blue circle represents 200 epochs and the final point is 300 epochs. There is no loss
How to look at them?
More loss = good quality?
Okay. But how to analyze it ?
I see
I stopped training
Mb
The dataset was very poor and I didn't expected good results. So ya, that's fine
what setting refers to making voice more understandable?
need a bit help here
for some reason i cna't paste images
https://colab.research.google.com/github/w-okada/voice-changer/blob/master/Realtime_Voice_Changer_on_Colab.ipynb#scrollTo=86wTFmqsNMnD the default way is giving some errors
Installing pre-dependencies...
ERROR: Could not find a version that satisfies the requirement faiss-gpu (from versions: none)
ERROR: No matching distribution found for faiss-gpu
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 261.0/261.0 kB 7.1 MB/s eta 0:00:00
Preparing metadata (pyproject.toml) ... done
Building wheel for pyworld (pyproject.toml) ... done
Installing dependencies from requirements.txt...
ERROR: Ignored the following versions that require a different python version: 1.21.2 Requires-Python >=3.7,<3.11; 1.21.3 Requires-Python >=3.7,<3.11; 1.21.4 Requires-Python >=3.7,<3.11; 1.21.5 Requires-Python >=3.7,<3.11; 1.21.6 Requires-Python >=3.7,<3.11
ERROR: Could not find a version that satisfies the requirement onnxruntime-gpu==1.13.1 (from versions: 1.15.0, 1.15.1, 1.16.0, 1.16.1, 1.16.2, 1.16.3, 1.17.0, 1.17.1, 1.18.0, 1.18.1, 1.19.0, 1.19.2, 1.20.0, 1.20.1, 1.20.2, 1.21.0, 1.21.1, 1.22.0)
ERROR: No matching distribution found for onnxruntime-gpu==1.13.1
Successfully installed all packages!
is there any way to fix it?
Why we doing loss_avg_50 charts now then just gtotal?
what command should I use? Everytime in the past (and currently) involving command line for python does not install
pip install --pre torch==2.8.0.dev20250605+cu128 torchvision==0.23.0.dev20250605+cu128 torchaudio==2.8.0.dev20250605+cu128 --index-url https://download.pytorch.org/whl/nightly/cu128
command I tried to run
also should I update my pip?
I have cuda 12.9, what version of Cuda do I need then?
or do I need to somehow update the ai softwares? Wasnt they supposed to auto update?
Before I do this, can anyone verify if this would work?
google gemini seems to want me to install the 12.1 version instead of 12.8 since it believes it's forward compatible apparently and stable. Like it recommend to not use the nightly build due to "stability issues"
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
but I did get this after pushing it for cu128
@simple ore Here's the inside of the Replay folder in AppData
https://www.weights.com/replay got it from here
guys,i have [Voice Changer] Pipeline is not initialized.
how do i fix that
could someone help me with setting recommendations? for the clearest least robotic sound?
what do you use to clean datasets?
i have 0 clue what that means !
can i send u a screenshot of my current settings?
of course! my dms are open
hi guys, can anyone tell me where can i upload the voice models i downloaded?
i have no idea how i use these models
I use Kaggle mainline and in logs I accidentaly removed folder named "mute"
should I be worried?
Put that in logs folder
sorry i mean if theres any app or something i should download
i have the models but i dont have any app or program to run it
Hi, is there any AI RVC that works with 5080/5090? Tried install Applio, OpenVoice, codename-rvc, spend half of the day trying to bypass python compatibility with those gpu's and nothing, it doesnt work.
so how would I do this? like in the folder where forge is?
hey, is there a good voice changer i could use without a good gpu?
colab gives me an error when i try upload a voice model
yeah
just type run pip install?
or type what you had ealrier
already done that before
I did that then did what you told me to ealier
it installs for system wide
so all the other ai software works now but not forge and not comyfui
is there not an updated version of forge? if automatic1111 is working, not forge or comfyui,
flux work with it now?
wait how? I got forge becuase at the time, automatic 1111 is incompatibel
dw just ask later
