#✨│ai-help
do you have the links to both? I might just need to check them both out and see which works better for me.
and I really appreciate the feedback!
No idea, I don't maintain any, so I possibly wouldn't know
alr thanks
Sure!
For applio:
https://huggingface.co/IAHispano/Applio/blob/main/Compiled/Windows/ApplioV3.2.9.zip
( precompiled, just use run-applio )
For the fork:
https://github.com/codename0og/codename-rvc-fork-3/releases/download/3.1.5-rev1/Codename-RVC-Fork-V3.1.5-rev1.zip
( you'd need to run run-install first, then run-fork )
Gluck ~
( and again, the fork's meant for advanced users, so I'm not gonna go through the hassle of explaining each thing one by one. The best you'll get is the in-UI descriptions )
Thank you again!
Np! Gluck
seed vc
nevermind then
y'all know what could be the problem? The voice changer doesn't send any audio to the audio cable, but it works in monitoring
-howtoask
How To Troubleshoot 
- Don't simply mention your issue, like "my rvc is not working".
- Describe the step you are on, what you're trying to do, the RVC you're using, a screenshot, etc.
- The more context, the better.
- Don't be desperate. You can ping a Helper, but if they ignore you, they aren't available/don't know the answer.
- It's okay if you're frustrated, but don't take it into this server.
- Don't DM without prior consent.
- Don't ask for every little instruction. Put your own effort & test things by yourself.
- Don't ask to ask.
- Check if your answer is a Google search away/on our guides website.
Does anyone here know about replay.ai? I recently downloaded it. I'm currently training my voice; does it usually take this long to finish?
Just wait, someone will get to you soon. Everyone's sleeping
Aight, man. Thanks for your help
what specs do you have
I'm using
Ryzen 5 4500
16GB RAM DDR4
ASRock B450M
GTX 1060 6GB GDDR5
Is it too low? Is that why it takes so long to convert my voice?
yeah thats gonna take quite a while
I see. I've been waiting for about 3 hrs now and it's still at 55 epochs
i wouldn't bother doing any training on anything below a 3000 series
How many epochs does it usually take to finish? The file I sent is only 3 minutes long
I see. Bob on YT said the same thing hahaha
not sure tbh
i never tried
someone on weights is stuck at 1100 epochs 
No way hahaha. He's already at 1100 and still can't proceed. I'm kinda hesitant to wait now
like, the voice changer literally just stopped working one day and never worked again. i tried reinstalling the virtual cables and that still didn't fix it. i even reinstalled the whole voice changer and it still doesn't work, so idk what else to do
Can you send the full screenshot of your W-Okada?
RTX 20-series should be okay
hello there, is there a way to use GPT-SoVITS on the cloud (Kaggle) without the UI?
the terminal or site?
The website (GUI).
alr give me 2mins
finally got Applio up and running... mostly... but any time I click anything in the UI I get "connection timed out"
Also missing hubert_base.onnx, which I can't find ANYWHERE or convert because of fairseq issues with Windows 😵💫
it seems like no matter which one I choose... there is some single file that is like chasing a unicorn... so many dead links, and repos that people say to download from.
Is there another way to get it? or convert it? Sorry if these are dumb questions...I am still very new to this >_<
Did you turn off the cmd? The black board thingy
def no no. i know that much 😅
i had to reinstall all the modules like 50x... each time i installed one, it updated a handful of others to incompatible versions... like dominoes falling lol... eventually got them all back to the right versions and the webUI launched, but i can't initiate any tasks or it just gives "connection timed out" errors
i'm getting to the point where i will just throw money at someone to give me a working copy of those files 😑 just so i can move on lol
I had the same issue as you

Still have these issues
But you were able to open the webUI right?
I don't know much, but most of the time those issues will pop up in the cmd window
that means closing the program
i know that if i kill the cmd prompt that is running it... the webui will die. RVC was the same way
but it was still going
RVC v2 worked great... until the very last step... because i didn't have a hifigan.pth file. And in Applio the only error I can identify atm points to a missing resource, hubert_base.onnx, and the UI crashing. 😓
Is anyone willing to DM those files?
it's so weird that they seem to be so critical... and i see lots of posts online of people looking for them... but they don't seem to BE anywhere
😵💫
Wait, you use applio to train right ?
i tried both. with RVC i was able to almost fully train, but it got stuck on the index training part because of the missing file hifigan.pth.
applio i can't even get that far because of the missing hubert_base.onnx and the UI constantly timing out, sometimes almost instantly.
download 3.2.9
show the console window
Oh, it is called console window
if you have closed it, you'd have to run the program again
a sec, i don't know why it was all zoomed in
there we go
and also, earlier (but not present atm), the error mentioning the missing resource file hubert_base.onnx
try moving to another directory path without spaces
i'll have to give that a try this weekend maybe... the idea of rebuilding everything a 5th time makes my head spin right now lol
you're using some outdated af version of applio
that you also installed from requirements instead of downloading the compiled version
so now you have a bunch of incompatible libraries such as gradio vs pydantic
a stupidly long path does not help either
if you got this whole thing as a package from somewhere
it is not up to date
:loss_disc=3.788, loss_gen=3.481, loss_fm=10.017,loss_mel=20.140, loss_kl=1.865
what do these mean?
with some Colabs there's a built-in chart we can read and understand easily, but I'm not sure how to read these when training locally.
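Each `name=value` pair in that console line is one loss. In VITS-style RVC training, `loss_disc` is the discriminator's adversarial loss, `loss_gen` the generator's adversarial loss, `loss_fm` feature matching, `loss_mel` mel-spectrogram reconstruction, and `loss_kl` the KL term. A minimal sketch for pulling those pairs into a dict so different epochs can be compared side by side, assuming the exact line format shown above (`parse_losses` is a hypothetical helper, not part of any RVC tool):

```python
import re

# One "loss_<name>=<number>" pair, e.g. "loss_mel=20.140" (format assumed from the line above).
LOSS_PATTERN = re.compile(r"(loss_\w+)=([-+]?\d+(?:\.\d+)?)")


def parse_losses(line: str) -> dict[str, float]:
    """Return {loss_name: value} for every loss_<name>=<number> pair in a console line."""
    return {name: float(value) for name, value in LOSS_PATTERN.findall(line)}


if __name__ == "__main__":
    line = ":loss_disc=3.788, loss_gen=3.481, loss_fm=10.017,loss_mel=20.140, loss_kl=1.865"
    print(parse_losses(line))
```

A single line is just one snapshot; the numbers only mean something when tracked across epochs, which is what a graph (e.g. TensorBoard) gives you.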
what are you using to train locally?
mainline does not come with tensorboard, but you can install it manually
Ah yes, tensorboard, that's the name haha. I'm trying to train voice models. I'm not sure if I can just read it from that or if it's better to have tensorboard? If so, can you point me to where or how I can install it? Thanks btw!
guys, how do i know if a model is overtrained on the cloud (kaggle)?
you need to tell me what are you using to train a voice model
mainline rvc, mangio or some other outdated stuff
hello anyone help
I'm sorry, I misread your last message. I'm using mainline, I believe
https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI
this is where i got it from
I think you need to have tensorboard but idk how to use it on kaggle
yes but how i read it 🙂
you can tell when your model has had enough training by looking at whether the line keeps going down. At some point it will stop going down and may go up. That's the point of overtraining, from what I understand
mainline has its environment in the runtime folder, I believe, so: runtime\python -m pip install tensorboard
then: runtime\python -m tensorboard.main --logdir=c:\path\to\where\logs\are
and I believe mainline requires some editing of config.json in the model's folder (or somewhere else) to set the logging frequency
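The "stops going down and may go up" rule of thumb can be written as a simple patience check on per-epoch values. A minimal sketch; `best_epoch` and its `patience` parameter are hypothetical, and this is a heuristic on one curve, not a definitive overtraining detector (later in this thread it's pointed out that a single loss curve isn't the only thing to watch):

```python
def best_epoch(losses: list[float], patience: int = 3) -> tuple[int, bool]:
    """Return (1-based epoch with the lowest loss, whether the curve has stalled).

    "Stalled" means at least `patience` epochs have passed since the lowest
    value without beating it. `patience` is a hypothetical knob, not an RVC setting.
    """
    if not losses:
        raise ValueError("need at least one loss value")
    best_index = min(range(len(losses)), key=losses.__getitem__)
    epochs_since_best = len(losses) - 1 - best_index
    return best_index + 1, epochs_since_best >= patience


if __name__ == "__main__":
    curve = [5.0, 4.2, 3.9, 3.7, 3.8, 3.9, 4.0]  # made-up per-epoch values
    print(best_epoch(curve))  # (4, True)
```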
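A minimal sketch of that config edit, assuming the model's `config.json` follows the VITS layout with a `train.log_interval` key (an assumption: check your own file first, since the exact key and file location may differ). `set_log_interval` is a hypothetical helper; a smaller value logs more often, and the training run presumably has to be restarted to pick the change up:

```python
import json
from pathlib import Path


def set_log_interval(config_path: str, steps: int) -> int:
    """Set train.log_interval (assumed key) in a JSON config; return the old value, or -1 if absent."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    train = config.setdefault("train", {})  # assumes a top-level "train" section
    previous = train.get("log_interval", -1)
    train["log_interval"] = steps
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return previous


# Hypothetical usage (the path is an example only):
# set_log_interval(r"logs\my-model\config.json", 10)
```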
alright, I will try this thanks!
thx dude
y'all, i've been having this issue on kaggle since yesterday. is kaggle still broken?
Traceback (most recent call last):
  File "/kaggle/working/program_ml/app.py", line 1, in <module>
    import gradio as gr
ModuleNotFoundError: No module named 'gradio'
i ran everything as it should be ran but this happened
What is the best voice changer?
Hey! i have the rtx 5080, what ms should i set the chunk at? for the 3090 it says 72 ms chunk + 2.7s extra. what should the settings be for this gpu while gaming?
remove highlighted part
also in install cell
and re-run the pretrain load line
okay thank you
!python core.py "prerequisites" --models "True" --exe "True" --pretraineds_hifigan "True"
or make it a new cell
just create a new cell
ok
where should this cell be?
after the installation or before?
Why is it taking so long?
if you don't have anything installed yet, it is easier to remove that highlighted code from both cells
if you have it installed already, make the cell anywhere
does not matter
okay thank you
i'll try it rn and lyk if i have any more issues
@simple ore
what's the issue now
same issue: the requirements install is happening in the wrong place
did you run 'setup ngrok' cell at all?
yes
i ran it again with the other method and its happening again
run every cell
again?
duh
@simple ore tensorboard isn't detecting the logs, so it's not letting me check the scalar graphs and such to know when it's gonna overtrain
hi
Hlo
The same thing happened with one more user. @simple ore
no idea, what's in /kaggle/working/program_ml?
where can i check that
i have no idea, i don't use kaggle
should just be the content of applio including logs
the folder tensorboard checks
okay
okay i found it
there's a file called
run-tensorboard.bat
no
that's for local
i mean there should be logs folder
in the logs folder there's your model folder, inside it is eval, and inside eval is the events.out file
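The folder layout described above (logs, then the model folder, then eval, then the events file) can be checked with a short script that also reports file sizes; a very small file may mean little or nothing has been written yet. A minimal sketch assuming that layout and TensorBoard's usual `events.out.tfevents` file prefix; `find_event_files` is a hypothetical helper:

```python
from pathlib import Path


def find_event_files(logs_dir: str) -> list[tuple[Path, int]]:
    """List TensorBoard event files at <logs_dir>/<model>/eval/ with their sizes in bytes."""
    pattern = "*/eval/events.out.tfevents*"  # TensorBoard's usual event-file prefix
    return sorted((path, path.stat().st_size) for path in Path(logs_dir).glob(pattern))


# Example (Kaggle path from this thread):
# for path, size in find_event_files("/kaggle/working/program_ml/logs"):
#     print(path, size, "bytes")
```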
okay yeah
that file is there
it's not working even if i refresh it
yes i know, but it still doesn't show anything
how big is events file?
okay
well, delete that
in the start cell you can try changing the line to %tensorboard --logdir /kaggle/working/program_ml/logs --port 8077
shittt so i gotta start over?
the training should resume from the last save
if you stop the cell, make the fix and restart
ahh yea thats true
this is weird
it does cd into /kaggle/working/program_ml, so anything that starts after should be from that local folder
tensorboard and app.py
app.py works as expected
so weird tensorboard does not
so will this fix it or not?
i dunno
cause i swear it happened to me before but it didnt always happen
so
i recently learned not too long ago that my voice technically doesn’t have a voice type
reason is because despite me being low (f#2 - f#3) i’m NOT a bass
because to be a bass your low note has to be at least an e2
so for the people who have no idea what the fuck i just said - my vocal range is very narrow
and for higher notes i strain my vocals
reason i’m saying all this is because my question is
bro, singing-support is next door
does this affect the voice model?
😭 wasn’t about that
.
the model will only be able to do what you can do
it's not that hard to tell
basically does me straining my high voice affect the ai model negatively?
if your voice sounds very breathy then yes
then how come when i put some vocals into the ai of me, it can actually hit those higher notes that real me is nowhere near?
ah i see
try the klm pretrain since it can generalize high notes
yeah i tend to go breathy the higher i go but i’ve been tryna train myself not to do that
oh yeah abt pretrains, how tf do i use em
ah ty
so i also remember u telling me that the main thing of a dataset is to have my whole vocal range
what if i sing a few songs at the lowest point of my range, the regular sitting point of my range and the highest point of my range, would that help?
oh 30 mins 😭
so sing an entire EP is what ur tryna tell me 😭
orrr i can set my playlist to shuffle and try singing along to it
i mean u can just record different song clips then merge them in one big file
while recording
yea i did that once
to make a me singing dataset
but the model sounded crap
and not like me
so far the best one was the one of me talking really expressively (with a ton of laughing)
but i think i deleted that file
it was also the same dataset that just got stuck
according to you
the model just got stuck
at it
just be sure your voice is consistent
not sudden timbre changes
avoid excessive denoising
if you have to denoise just do it right
i don't denoise it
my mic stand makes a funny reverb noise
and i think it's funny if the ai has it too
welp i told u last time rvc kinda hates reverb
two biggest things rvc hates:
Raspy voices
room reverb
nah because the vocal model actually worked
oh
raspy voices?? why??
because it's going to try training that regardless
it cant say no lol
f0s are ass at estimating the pitch with those type of voices
in the ai
oh yeah
pitch correction software always gets confused by raspy voices
maybe that's why
mhm try not sounding raspy in your recordings
ah right
my voice doesn't have that type of rasp
wait is that ai 😭
it has a weird sound that is like similar to ai voice
is a real guy lol
wow
lord i saw his yt videos
the sounds at the end of each line, especially the breathing sounds, sound ai to me
lmfao
Oh? I just used the link that was posted above
https://huggingface.co/IAHispano/Applio/blob/main/Compiled/Windows/ApplioV3.2.9.zip
is this link bad? Or did i screw up unpacking it?
😓
that one should be fine, unzip with 7-zip
into C:\Applio or some shorter path
this is my second time training a model after a few months, I should use 32k as the sampling rate in this case right?
And with another user pointing out an issue with my directory, I think that could be the issue why it’s not finding what it needs… so I tried to install them myself. I made it worse. It looks like. 🙃 thank you for pointing this out. I will try this again!
How do I generate images with AI or train one? Is there a Google Colab for that?
how do i use rvc? do u have a link or something?
You can either install RVC locally or use it on cloud services like Kaggle or Colab
-rvc
Read the docs above for further info.
heey
Why can't i hear myself with the voice changer
hey guys, is there any way to get perf down? (im guessing this is "lag"?)
i keep reaching 140+, which in turn is diminishing voice quality
change f0 to fcpe
if you're still lagging increase the chunk
thank you! this decreased it dramatically!
fcpe has slightly worse accuracy than rmvpe, and breaths may behave strangely
yep, it's more sensitive to noise
i haven't often tried rmvpe-trained models with fcpe as the f0 method, so i have no idea how well it behaves
what is the process for turning my audio sample into a .pth file?
And I can't seem to find any free software to do it
https://docs.aihub.gg/essentials/how-to-make-voice-models/
https://docs.applio.org/applio/beginners/first-model
ty
I'm basically working with a dozen or so short samples of my rpg character going "uh" and "um" and different action sounds.
Will that be enough?
no, the dataset has to be natural speech or singing to give good results, don't expect good results with huh ah sounds only
minimum 10 minutes
recommended 30 minutes up to 1 hour max
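The length guidance above (at least 10 minutes, ideally 30 minutes up to 1 hour) is easy to check before training. A minimal sketch using Python's standard `wave` module, so it only reads uncompressed PCM `.wav` files (not mp3/flac); the function names and folder name are hypothetical, and total length says nothing about whether the audio is natural speech or singing:

```python
import wave
from pathlib import Path

MIN_MINUTES = 10                 # minimum suggested in this thread
RECOMMENDED_MINUTES = (30, 60)   # recommended range suggested in this thread


def dataset_minutes(folder: str) -> float:
    """Total length in minutes of the PCM .wav files directly inside `folder`."""
    total_seconds = 0.0
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            total_seconds += wav.getnframes() / wav.getframerate()
    return total_seconds / 60


def verdict(minutes: float) -> str:
    """Compare a dataset length against the rule-of-thumb numbers above."""
    low, high = RECOMMENDED_MINUTES
    if minutes < MIN_MINUTES:
        return "too short"
    if minutes < low:
        return "usable, below recommended"
    if minutes <= high:
        return "recommended range"
    return "above recommended maximum"


# Example (folder name is hypothetical):
# print(verdict(dataset_minutes("my_dataset")))
```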
ahhhhh
there is already some mc villager model
though not too ideal as a serious model
Guys, i have one question: has anyone tried Seoul Streaming Station's pretrain for realtime voice changers? 😦 i used their pretrain and trained my model for about 50 epochs, and the result is pretty robotic somehow. Do i need to push it to 250 and see if it gets better?
You should use TensorBoard to evaluate the model's performance, rather than assuming a fixed number of epochs to train it for.
Other than that, until Noobies merges a specific change in applio main, you can try my fork as there's a rather deal-breaking change + tensorboard logging aligned with that change
i'm using ur fork, but i'm too dumb to know how to use tensorboard lah
and as for tensorboard, is it about running it or using it? ( reading the metrics )
like, somehow it only has 4 instead of lots in applio
ah, i have issue in reading, i'm looking for a guide how to read these
but i still don't get it
oh 3.1.6 is up ? i'm still using 3.1.1
let me download it
Because logs on my fork and applio are not the same
the loggings differ
( especially now, starting with 3.1.6-rev1
one sec
but i swear to god yesterday i saw the learning rate chart
that's how it looks rn
the " total generator " loss is not the main one you wanna monitor anymore
and the "total discriminator" -> "discriminator_adv"
so i want to look at g and d right ?
tl;dr:
total generator: describes metrics; mel, fm, kl
generator_adv: describes adversarial performance of generator
discriminator_adv: same as ^ but for discriminator
so in reality, the only change is that now you don't watch for total generator, but generator_adv
more or less, adv for g and d + fm
if fm stays rather high-ish ( the 11-9, maybe 8 loss region ), then that's fine, the model is learning
if fm goes too low, it means the discriminator got too strong ( no point training for longer )
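The fm rule of thumb above can be applied to a list of per-epoch average fm values to find where further training stops being useful. A minimal sketch; the cut-off of 8 is a hypothetical reading of the "11-9, maybe 8" region mentioned above and will vary between datasets and setups, and `last_healthy_epoch` is a hypothetical helper:

```python
from typing import Optional

FM_FLOOR = 8.0  # hypothetical cut-off from the "11-9, maybe 8" rule of thumb above


def last_healthy_epoch(fm_per_epoch: list[float], floor: float = FM_FLOOR) -> Optional[int]:
    """Return the last 1-based epoch before the avg fm loss first fell below `floor`.

    Returns the final epoch if fm never fell below, and None if it was
    already below the floor at epoch 1 (or there is no data).
    """
    healthy = 0
    for value in fm_per_epoch:
        if value < floor:
            break
        healthy += 1
    return healthy or None


if __name__ == "__main__":
    fm = [11.2, 10.5, 9.8, 8.9, 7.6, 7.1]  # made-up avg fm per epoch
    print(last_healthy_epoch(fm))  # 4
```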
But then, soon ( hopefully ) docs are gonna get updated... and if not, well, guess I'll have to do my own on the metrics
Anyways. If you got lost, ask right away. Gonna simplify it more
thank you so much, kind soul
✨
If you have access to #🔊│ai-development
( ai-testing ) dr87 wrote quite a lot of useful information ( on how to understand the losses, more or less
ye, i read some from him
that should help anyone really to get a better grasp
yup
Now, to straighten it up.. just in case..
the double update thingy does exist in applio, but in the spin branch
i swear to god dude is a walking research center to me
i will try to use spin and use it in his vonovox
i have no idea if index file is necessary, because i don't see the option to use it in vonovox
well... generally index is rather something one should, imo, avoid
in voice changers, that is
if the model is / was trained poorly, or perhaps not on a lot of ( diverse ) data, and it happens that you don't " fit " within its known indexed feature / ' phonetic range ',
glitches could occur or some funky / quirky pronunciation
yeah... simplifying it; if you don't have to / don't have some legit reasons, rather avoid index in rt voice changers
that explains why my bot can't spell "ng" for whatever the reason
More or less, yes
could i use an ai voice model using clownfish?

Tomorrow gonna do tests on wavlm and..
if all goes well, gonna add that to ui n stuff
thank you, thank you so much
ye np man
oh whoops- actually, gonna be in few days lol
first need to test-train a pretrain ig
( unless someone's gonna be faster than me 🤔 so I can test it sooner
but wavlm is an embedder right ?
yes.. to put it simple
contentvec is the default one we always used
then contentvec based spin happened ( trash )
then tests on hubert-based spin ( few attempts
and now, lastly, wavlm-based spin ( supposedly the best and so I think too
gpt once told me to use whisper ppg for embedder
i have to finetune it for Vietnamese
dr87 said my pc succ, don't do it
Sorry for the interruption, but where would AI covers be put?
In terms of channels
They aren't allowed here because of copyright strikes from discord
Oh ok
( trust me, you better do not. I already got 1 strike oof
Sounds good to me
✨
guess you'd have to consult dr in terms of training such
He's def the spec in that particular delicate matter
but then I'd trust his judgement if he said ur pc sucks
those trainings are resources-hungry, and I mean very hungry
also the spin embedder works in vonovox, i just got some robotic voice, but i don't know why
well, it can be the model or can be embedder
i trust him too, he knows way more than me
I am still only about 60% informed on the embedders, still gotta catch up ( was busy with my own work )
but generally, you should wait for wavlm
If our expectations are met.. it's gonna be a game changer
okay, i shall wait for wavlm then
knowledge in this convo is way more than i tried to research on my own


those who seek wisdom, gonna get wisdom.
~ probs doge, in another universe
oh yea, in case of issues with new double-update method, just uncheck it in the ui
this:
then it'll behave 1:1 as before ( except the logging, which is now the accurate one )

and in case you wonder why the logging got changed in the first place
oh if it sits in one place then i turn this on ?
i find it weird in applio cuz sometimes the graph doesn't even move. i tried turning it on and off but nothing happened
I'd say, if no matter what you can't train a good model with it
then try again but with it off
that is because..
Applio does averaging over 25 steps
So, say, if you get 15 steps per epoch
you'll get logging in: 1 epoch ( 15 steps ) + 10 steps from the next
my logging however does an average of each epoch's steps
so yea
( if I got you right, that's the reason. )
( tl;dr, mine averages epoch's loss by: average all steps' losses vs over fixed steps count
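The difference between the two averaging strategies can be reproduced with the 15-steps-per-epoch example above. A minimal sketch; the 25-step window mirrors the figure quoted for Applio, the step losses are fake, and both functions are illustrations rather than code taken from Applio or the fork:

```python
def fixed_window_means(step_losses: list[float], window: int = 25) -> list[tuple[int, float]]:
    """Mean over every complete block of `window` steps, ignoring epoch boundaries.

    Returns (last_step_of_window, mean) pairs, with steps counted from 1.
    """
    points = []
    for end in range(window, len(step_losses) + 1, window):
        points.append((end, sum(step_losses[end - window:end]) / window))
    return points


def per_epoch_means(step_losses: list[float], steps_per_epoch: int) -> list[tuple[int, float]]:
    """Mean over all steps of each complete epoch; returns (epoch, mean), epochs counted from 1."""
    points = []
    last_start = len(step_losses) - steps_per_epoch
    for epoch, start in enumerate(range(0, last_start + 1, steps_per_epoch), start=1):
        points.append((epoch, sum(step_losses[start:start + steps_per_epoch]) / steps_per_epoch))
    return points


if __name__ == "__main__":
    losses = [float(step) for step in range(1, 46)]  # fake data: 3 epochs x 15 steps
    print(fixed_window_means(losses))   # [(25, 13.0)]  -> epoch 1 plus 10 steps of epoch 2
    print(per_epoch_means(losses, 15))  # [(1, 8.0), (2, 23.0), (3, 38.0)]
```

The fixed window gives evenly spaced points but mixes epochs together; the per-epoch mean lines up with the checkpoints you would actually pick, at the cost of a point count that depends on the dataset size.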
it's not really a problem but
a design flaw I'd say ( if you wanna look at it like that
Both methods have their pros and cons
But then, I mitigate it by having avg every 5 epochs, as additional metrics
( Yet.. am considering if I shouldn't actually make it avg every 3rd epoch
Anyway. Dw, it's a normal behavior in applio ~
i will use ur fork
✨
( remember, there's no good or bad. Just preference lol
GUYS HOW DO I ACTUALLY USE THE VOICE MODELS
i need to try again with Seoul Streaming Station's pretrain, it sounds good on samples but somehow broken in the rt voice changer
click on edit
add model
How To Troubleshoot 
- Don't simply mention your issue, like "my rvc is not working".
- Describe the step you are on, what you're trying to do, the RVC you're using, a screenshot, etc.
- The more context, the better.
- Don't be desperate. You can ping a Helper, but if they ignore you, they aren't available/don't know the answer.
- It's okay if you're frustrated, but don't take it into this server.
- Don't DM without prior consent.
- Don't ask for every little instruction. Put your own effort & test things by yourself.
- Don't ask to ask.
- Check if your answer is a Google search away/on our guides website.
UM OK
considering it's 8 am and I am tired.. just gonna leave this ^
( going to sleep soon so, even if I wanted to help. too tired
i might go with the Rigel finetune again if things keep sounding so robotic
go to sleep lah
few tests more and yeah.. will have to lel
i still dont know how to use them😔
man..
Have the audio file of your song ready, & let's extract the vocals from it with an audio isolation software.
is the cuda 2.0.78 beta the most updated version?
good night and thank you
gnight
Last update: May 5, 2025
no but like as a voice changer
i mean which part are you stuck on
this one ?

???
i don't know dude 😦
😱
you should take ur time and look at the doc above
slowly. or hop on youtube
idk cuz I'm having an issue where any RVC model I upload just doesn't work
ok which doc tho
this one
oh ok thanks
of what?
this server isn’t only about rvc and wokada
RVC
it’s a general ai server
check the voice changer tab
RVC = Retrieval-based-Voice-Conversion, the best Speech-To-Speech AI models (on v2). It runs inference (uses models) on pre-recorded audio (AI covers) and trains (makes) models. Technically, Mainline RVC does have a go-realtime.bat (aka RVC-GUI), but it's pretty messy and outdated, so it's strongly not recommended for realtime.
Wokada = uses RVC for realtime inference. There are 2 main versions: the Original made by Wok, and the most suggested one, the Deiteris Fork (modified version)
are you sure you aren’t talking about original wokada?
i have no clue
it looks like ur getting it from a video tutorial, which u shouldn’t
i explained you above the difference
read it for more understanding
video tutorials get outdated easily, never use them for rvc or wokada
Yeah, I think I'm seeing the problem now.
@bright basalt what’s ur pc gpu and what do u want to do?
My old version stopped working so I tried getting the updated version.
yeah ur deffo talking about original wokada ig, never use video tutorials for wokada
what’s ur pc gpu and what do u want to do?
Lemme look rq
You can check your pc gpu on Windows via:
ctrl+shift+esc (task manager) -> Performance tab -> GPU
using the NVIDIA GeForce RTX 5070
alr, and u want a realtime voice changer for calls?
Yeah sure
alright, then what you need is the wokada deiteris fork, for rtx 50 series support and better performance
you were using original wokada which is worse and doesn’t even support ur gpu
-realtime
Guides for Programs that use RVC Models in Realtime for Calls/Games
Most suggested. GUIDE
ONLY the latest alpha comes close to the Deiteris Fork performance, older versions in youtube tuts are way worse. GUIDE
Unavailable, the guide is outdated and the program is worse compared to the ones above, and much less updated
1st link, read it and lmk
Alr I'll take a look
be sure to elaborate next time u ask for help tho, this isn’t an rvc server
alr lmk
Alr
So, what I did was download the updated version of MMVCServerSIO for my RTX 50 series. Would this be the correct step?
?
why are there no output or input devices showing up for the client on W-Okada modded
server works though
nevermind i figured it out
did u also download vac lite and uninstalled the old programs u had?
what’s ur pc gpu and what do u want to do
Make sure your PC has a microphone plugged in and enabled, and that the Virtual Audio Cable program is installed. Once you have all these ready, the browser may ask you for microphone permission.
they said they figured it out
what pretrained should I use for an english speech dataset? (must support 32k and 40k sampling rate options)
Which RVC program?
I am using applio rn for training
I need help with setting up virtual cable for discord and games
!howtoask
How To Troubleshoot 
- Don't simply mention your issue, like "my rvc is not working".
- Describe the step you are on, what you're trying to do, the RVC you're using, a screenshot, etc.
- The more context, the better.
- Don't be desperate. You can ping a Helper, but if they ignore you, they aren't available/don't know the answer.
- It's okay if you're frustrated, but don't take it into this server.
- Don't DM without prior consent.
- Don't ask for every little instruction. Put your own effort & test things by yourself.
- Don't ask to ask.
- Check if your answer is a Google search away/on our guides website.
what’s ur pc gpu? What u want to do? What tut link are u following?
Are you trying to use VB-Cable instead of Virtual Audio Cable lite? Which W-Okada version are you using? And what is your PC GPU?
wait,I am sending ss
elaborate too
don’t just send the screenshot
explain all that i asked u
ok
No need to hop into my direct message for that.
@blazing wharf don’t dm without consent
so I can hear my converted voice in the web UI, but Discord can't
answer my questions too
^
the more you elaborate, the easier you will get help, else it will be harder to help ya
!give-media-perms 1h @blazing wharf
now u can also send screenshots
Vb-cable, deiteris' W-okada,gpu - rtx3050
uninstall vb cable
thank you

many users reported that
Guides for Programs that use RVC Models in Realtime for Calls/Games
Most suggested. GUIDE
ONLY the latest alpha comes close to the Deiteris Fork performance, older versions in youtube tuts are way worse. GUIDE
Unavailable, the guide is outdated and the program is worse compared to the ones above, and much less updated
1st link, in the virtual audio cable step
ok
yup, forget about vb audio cable
Use Virtual Audio Cable lite instead.
u can uninstall it
After you install Virtual Audio Cable, make sure your main speaker and microphone are the only devices with the green tick (set as default). Otherwise, some program may use Line 1 instead.
To hear what W-Okada output, you can set monitor to your speaker/headphone on W-Okada.
default communication device? where should I assign this ?
Right click.
should I assign it to Line 1 in both playback and recording ?
Shh. If the device doesn't show default device in menu, it means the device is already set as default device.
When did I say you should set Line 1 as default device in both Playback and Recording? I said to set your main speaker and microphone as default devices.
it's showing like this now
Mate, what are you doing? Why did you disable Line 1 in Recording?
How can I explain this for you to understand?
what should I use? I am using applio locally (3.2.9) for training.. or at least could someone tell me whether I can use a pretrain that uses RefineGAN?
Do what I did in my screenshots. Simple. I won't repeat the same steps again.
it's working !!!
😭

Original is still the best
where is the download for it? is it just the titan/ov2 pretrain?
None of the custom ones. Just the original; it comes preselected
do I just click train then, with custom pretrained turned off?
by default, as long as you don't uncheck [x] Pretrained and don't check [ ] Custom Pretrained, you're using the original pretrain set
alright thanks a lot man
I don't think I did the vac lite, and I'm pretty sure I uninstalled the old version.
download vac lite
Oh ok Yes I actually do have it
want me to check ur settings or have any issues?
elaborate
what?
I just asked you if u want me to check your settings or if you had any issues
actually nvm I do have an issue
Does anyone know if you can use different pretrained models for training in Replay? Replay works well for me, but I am learning about pretrains now and I heard KLM4.9 or KLM4.1 is good for singing, and I want to use one of those in Replay because I feel like Replay's auto-epoch is such a nice addition. If I were to train in Applio or something more difficult/manual, I would not know how many epochs to choose etc. Does anyone either know another training tool like Replay that isn't too heavy on CPU/GPU and has an auto-epoch feature, or know how I can do this in Replay? 🙂 Please let me know if you know this, I have been searching but can't find the answer anywhere 😦 thank you! 🙂
(I should also add that pretrained models are new to me and I don't completely understand them, so I'm not sure if those can be used in Replay.) any help appreciated
Yeah im having an issue extracting the MMVCServerSIO zip
extract it with 7-zip/winrar or whatever
I can't just right click it and press extract all?
it's better to not use windows default extraction method for better speed and compatibility
alr ill try extracting it the other way soon and lyk if i need more help
alr
i want to generate funny pics / meme etc
my specs i5-14600KF Nvidia GeForce rtx 4070
To generate images for free (text2img), either:
- Easy and good ways with weights.gg:
  - Use /image with @earnest musk in https://discord.com/channels/1159260121998827560/1202754985255764060
  - Create an image on their site https://www.weights.gg/ (where you can also use LoRAs, Low-Rank Adaptations: basically small trained add-on models that adjust your generation)
- Use open-source models like Stable Diffusion & Flux, which can be a bit **harder** but good; you can check out tools like Automatic1111 and ComfyUI
:wave: @low shard, How can I help?
Available Commands:
• @weights find <query> or /find <query> - Search for RVC Voice Models
• /create - Create an AI Cover
• /image - Generate an Image
im kinda dumb how do you select a voice model?
is there any way to change the sample/bit rate in UVR UI? i can only chose the format
what's the best site / app to rip music from without losing any quality?
Is log interval synced in Applio fork ? @glacial pollen
yes it is
it does averaging over N steps of your epoch, per epoch
so each avg metric's log point is an average of all steps' losses ( from a given epoch )
Tho, I am remaking the logging system rn
You'll now have " avg 50 " instead of previous avg_5
so
- avg epoch like always
- avg_5 gets replaced by avg 50 ( same as in applio
Ok thank you. I was confused that it logged every 50 steps in the graph and thought maybe it was not synced. so I wanted to ask
like step 50-100-150-200 ..
check how many steps you get per epoch
# Calculate the avg epoch loss:
if global_step % len(train_loader) == 0:  # At each epoch completion:
    avg_epoch_loss = epoch_loss_tensor / num_batches_in_epoch
    # Dictionary for losses:
    scalar_dict = {
        "loss_avg/discriminator_adv": avg_epoch_loss[0],
        "loss_avg/generator_adv": avg_epoch_loss[1],
        "loss_avg/generator_total": avg_epoch_loss[2],
        "loss_avg/fm": avg_epoch_loss[3],
        "loss_avg/mel": avg_epoch_loss[4],
        "loss_avg/kl": avg_epoch_loss[5],
        "learning_rate/lr_d": lr_d,
        "learning_rate/lr_g": lr_g,
    }
How do I check it? In console ? Or like in file name etc
console
I guess each epoch takes 14 steps
But for those curious:
It collects losses from each metric available at every step for N steps ( in here, N is the amount of steps your epochs have )
then does the average of summed loss / n_steps
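A minimal, self-contained sketch of the "sum every step's losses, then divide by the number of steps" averaging described above. It uses plain Python lists instead of the fork's `epoch_loss_tensor`, and the metric names and loss values are toy placeholders chosen for illustration, not real training output:

```python
def average_epoch_losses(step_losses):
    """Average each metric over all steps of one epoch.

    step_losses: one tuple per training step, one value per metric.
    """
    n_steps = len(step_losses)
    totals = [0.0] * len(step_losses[0])
    for losses in step_losses:
        for i, value in enumerate(losses):
            totals[i] += value  # running sum per metric
    return [total / n_steps for total in totals]


# Hypothetical subset of the metrics logged above
METRICS = ("discriminator_adv", "generator_adv", "mel")
# Toy values for a 3-step epoch (made up for illustration)
epoch = [(2.0, 3.0, 20.0), (4.0, 3.5, 18.0), (3.0, 2.5, 16.0)]
avg = average_epoch_losses(epoch)
print(dict(zip(METRICS, avg)))  # {'discriminator_adv': 3.0, 'generator_adv': 3.0, 'mel': 18.0}
```

Because every step of the epoch contributes equally, one log point per epoch lines up exactly with the checkpoint you would test for that epoch.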
Is something wrong with that training?
I'm using Kaggle with an Applio fork
you ain't using my fork
you use Applio
Applio is not a fork
lol
if you were to use my fork you'd see " mel similarity " reporting in console
How can I use it? Is it available on Kaggle?
Or just locally?
Not sure if there's any kaggle made for it
You'd have to ask around
other than that, it's local only for now ( or Colab if you can switch the repo link and stuff
can anyone pls help me set up rvc?
#1159513888199540817 should have everything already
Can't do it locally, unfortunately. This is what Applio's graphs look like
I looked into it, but I didn't find anything that solves my issue
What’s the issue
yeah, that's not mine
Unfortunately that's what it is for Applio
( if you hoped for pin-point logging
Uhmm, that's sad. So the graphs are not accurate? @glacial pollen
I hear a crackling sound and a huge delay in the sound, and apps and voice don't work
wdym by that
like steelseries sonar?
now the crackling stopped, but the playback I hear comes in chunks, idk how to explain it
it's like playing a game at 5 fps
it's not that they are not accurate
it is more so they're misaligned ( if we look at it from: " hmmm.. will this epoch I choose rn have good metrics ? " standpoint )
@simple ore this is why I kinda wanna keep avg loss, cause some people ( including me ) want alignment Ig?
Gonna just add a switch in the ui to turn it off and only leave avg_50 ( for pretrains n shit you guys mentioned
I consider this point the best, right? With 0.5 smoothing
well, yes and no
after recent discoveries, this is not an actual ( and only ) metric we should focus on
but ig, you can reference that point yeah
check the steps count and test each epoch that's near that steps count
tl;dr, noticing anything new?
take a guess
dont worry too much about it not being aligned to the epoch size
I'll reference it for now. What confuses me is that these graphs log every 50 steps but I have 14 steps between each epoch.
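A rough sketch of why the 50-step log points don't line up with epoch boundaries at 14 steps/epoch. It assumes a 1-based global step counter and one log point at every multiple of 50 steps; Applio's exact counter convention is an assumption here:

```python
STEPS_PER_EPOCH = 14  # as read from the console in this example
LOG_EVERY = 50        # avg_50 logging interval


def epoch_of(step, steps_per_epoch):
    """1-based epoch that a 1-based global step falls into."""
    return (step - 1) // steps_per_epoch + 1


for step in range(LOG_EVERY, 201, LOG_EVERY):
    epoch = epoch_of(step, STEPS_PER_EPOCH)
    step_in_epoch = step - (epoch - 1) * STEPS_PER_EPOCH
    print(f"step {step}: epoch {epoch}, step {step_in_epoch}/{STEPS_PER_EPOCH}")

# First global step where a log point coincides with the end of an epoch
first_aligned = next(
    s for s in range(1, 10_000) if s % LOG_EVERY == 0 and s % STEPS_PER_EPOCH == 0
)
print(first_aligned)
```

Every log point here lands partway through an epoch (step 50 is step 8 of epoch 4), so a dip in an avg_50 chart points you to the neighborhood of an epoch, not to a specific checkpoint.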
Noobies idea, more universal
if you have 14 steps/epoch, you either have a very small dataset or used too high a batch size
either way you have a different problem 🙂
14-minute dataset, batch size 8
there you have it
too high?
there should be like 300 slices in sliced audio, yes?
300/8 does not make 14, so something is wrong
let me check, 333 slices
300 / reasonably decent batch 4 = 75 steps/epoch
def shouldn't
with 7 mins and batch 8, I get 20 steps iirc
I have no idea.
re-check the batch size.. you may have used like 24
I'm pretty sure I've chosen 8 batch size. Could 2x T4 have triggered this? I'm using Kaggle.
good question.. no idea why
Using Colab. I set batch size to 4. 333/4 = 83.25 and each epoch takes 87 steps this time. Applio is kinda buggy or something ? I don't know.. It's still close though @simple ore @glacial pollen
wdym?
it is still applio
Yeah , still Applio. @simple ore said step count = slices/batch size
unless you did something extra?
but then, that'd round up to 83 I guess, so it should be 83
not 87
or, well, even 84
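The "steps = slices / batch size" rule of thumb, sketched for a plain data loader. Whether the last, incomplete batch is kept (round up) or dropped (round down) decides between 84 and 83 for 333 slices at batch 4. This assumes an unbucketed loader; the `drop_last` flag here is just a parameter of this sketch:

```python
import math


def steps_per_epoch(num_slices, batch_size, drop_last=False):
    """Batches per epoch for a plain (non-bucketed) data loader."""
    if drop_last:
        return num_slices // batch_size  # the incomplete last batch is skipped
    return math.ceil(num_slices / batch_size)  # the smaller last batch still counts


print(steps_per_epoch(333, 4))                  # 84
print(steps_per_epoch(333, 4, drop_last=True))  # 83
print(steps_per_epoch(333, 8))                  # 42
```

By this rule 333 slices at batch 8 gives about 42 steps, which is why 14 steps/epoch looked off.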
I didn't do something extra. After preprocessing and feature extraction I started the training, as usual.
idk idk, that's hella weird
well either way, few steps this or other way shouldn't matter in the first place ( at least in ur situation
you still have to test epochs in the end right? right. so I suppose, you can ignore it
as long you don't see nonsense on your graphs, you should be fine
Yeah, I guess 3-4 steps difference won't make a difference. But I think Applio is kinda buggy.
If you believe so, create an issue on github
Because I didn't do anything wrong, I just uploaded the dataset and that's it lol.
well and here's the thing
did you actually preprocess it right and truncate all the silence?
the number of mute files does count, but at least it shows a progress bar each epoch, unlike the mainline
Applio does it automatically. I guess we found the problem. The silent files seem to be the main cause of this imbalance.
which bug do you refer? as said above, create an issue on github
- if you did not denoise the set, the garbage auto-slicing doesn't even slice right
Oh my bad. I'm already truncating silence using Audacity
because it is based on the silence regions
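A simplified sketch of silence-based slicing, to show why an un-denoised set slices badly: cuts only happen where the level drops below a threshold, and background noise keeps the level above it. This is a generic RMS-threshold splitter written for illustration, not Applio's actual slicer; the threshold and frame sizes are arbitrary:

```python
import math


def frame_rms(samples, frame_len):
    """Root-mean-square level of each consecutive frame of samples."""
    levels = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        levels.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return levels


def split_on_silence(samples, frame_len, threshold, min_silent_frames):
    """Return (start, end) sample ranges of the non-silent chunks.

    A chunk ends once at least min_silent_frames frames in a row fall
    below threshold; noise above the threshold never triggers a cut.
    """
    levels = frame_rms(samples, frame_len)
    chunks = []
    chunk_start = None
    silent_run = 0
    for i, level in enumerate(levels):
        if level >= threshold:
            if chunk_start is None:
                chunk_start = i
            silent_run = 0
        else:
            silent_run += 1
            if chunk_start is not None and silent_run >= min_silent_frames:
                end_frame = i - silent_run + 1  # first silent frame
                chunks.append((chunk_start * frame_len, end_frame * frame_len))
                chunk_start = None
    if chunk_start is not None:  # audio ended while still inside a chunk
        end_frame = len(levels) - silent_run
        chunks.append((chunk_start * frame_len, min(len(samples), end_frame * frame_len)))
    return chunks


clean = [0.5] * 4 + [0.0] * 4 + [0.5] * 4  # speech, silent gap, speech
noisy = [0.5] * 4 + [0.2] * 4 + [0.5] * 4  # same, but the gap has background noise
print(split_on_silence(clean, frame_len=2, threshold=0.1, min_silent_frames=2))  # [(0, 4), (8, 12)]
print(split_on_silence(noisy, frame_len=2, threshold=0.1, min_silent_frames=2))  # [(0, 12)]
```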
yeah do that and retry, see if it improves your situation
this bug @knotty moth
Should I keep it at 2? (the silence training files)
do you know how to make ai?
no advertising
what am i advertising?
lmao
sussyboi69 is hall patrol
your app
i asked the chat who can make ai for me
yep that counts as advertising
do u get paid to do this
I don't think he intends to promote something
isn't that basically the same as the trickhouse guy "looking for a partner"
ah mb
but I don't think that kind of request is allowed
sorry we shouldn't have talked about it here, but well it ends here 
if you used automatic slicing, not all slices are equal length
so it adds some batches to compensate
yes, I used automatic slicing. Does it affect the model quality adversely? I think it would be better if I chose the simple method.
not really, it is just slower and uses more memory, but in most cases it makes same slices
Alright
What are these extra batches? Silence, etc.?
To make the slices the same length, right?
just adding the same file more than once to another batch
when there's not enough slices of specific size to make a full batch
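A simplified sketch of the "add the same file more than once" behavior: slices are grouped into length buckets, and each bucket is padded by repeating its own items until it splits into full batches. Since every bucket rounds up on its own, the total can exceed slices / batch size (e.g. 87 instead of 84). The bucket sizes below are hypothetical, and the real sampler also shuffles and splits work across GPUs, which this sketch leaves out:

```python
import math


def pad_bucket(indices, batch_size):
    """Repeat items from the bucket until its length is a multiple of
    batch_size (so the same file can appear in more than one batch)."""
    rem = (batch_size - len(indices) % batch_size) % batch_size
    # Cycle through the bucket so this also works when rem > len(indices).
    extra = [indices[i % len(indices)] for i in range(rem)]
    return indices + extra


def make_batches(buckets, batch_size):
    """Split each length bucket into full batches after padding it."""
    batches = []
    for bucket in buckets:
        if not bucket:
            continue  # empty buckets produce no batches
        padded = pad_bucket(bucket, batch_size)
        for start in range(0, len(padded), batch_size):
            batches.append(padded[start:start + batch_size])
    return batches


# Hypothetical split of 333 slices into four length buckets
sizes = [81, 85, 77, 90]
buckets = []
next_id = 0
for size in sizes:
    buckets.append(list(range(next_id, next_id + size)))
    next_id += size

batches = make_batches(buckets, batch_size=4)
print(len(batches))               # 86 steps per epoch with bucketing
print(math.ceil(sum(sizes) / 4))  # 84 if all slices were batched together
```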
not silences
my voice changer lags my PC a bit, is there anything I can adjust to help it?
anyone know the best ai image generator?
There are plenty of AI image programs available to use. I use Stable Diffusion.
!howtoask
How To Troubleshoot 
- Don't simply mention your issue, like "my rvc is not working".
- Describe the step you are on, what you're trying to do, the RVC you're using, a screenshot, etc.
- The more context, the better.
- Don't be desperate. You can ping a Helper, but if they ignore, they aren't available/don't know the answer.
- It's okay if you're frustrated, but don't take it into this server.
- Don't DM without prior consent.
- Don't ask for every little instruction. Put your own effort & test things by yourself.
- Don't ask to ask.
- Check if your answer is a Google search away/on our guides website.
Which W-Okada are you using? And what is your PC GPU?
!help
To use other Discord bots unrelated to AI support channels, go to #🤖│bots.
"Looking for a partner", I thought he meant a girl from this server that wanted to be his girlfriend.
Got Applio running... just dumped it in C: and also learned I should only use 7-Zip to extract it... WinZip breaks the prepackaged virtual environment and I wasted an entire day on that
I'm still very keen to finish the RVCv2 inference training, though, on the 1000-epoch model I trained and already made an index for... but I still need a pre-converted hubert_base.onnx... and the original author doesn't appear to be hosting the file anymore
https://huggingface.co/lj1995
I can't seem to get it manually converted anymore either using hubert_base.pt via the python script I attached.
Is there anyone who has successfully trained real-time voice models with this file (hubert_base.onnx) and is willing to share it with me in DM, rather than telling me to use a different app/method? I'm going to need this file for a few more things eventually too.
Thank you in advance!
rvc's / applio's " hubert " ?
I mean..
you can try to fiddle with it
Cause if you need it for RtVc then that's 100% the one you want
you are confused. A Hugging Face repo is a git repo, it has history; if there was ever such a file there, you could find it
but there was not
not exactly, but I think I found part of the problem... it was never mentioned in any instructions anywhere that I had to run "dlmmodels" and use aria2.exe from the tools folder. I've been combing through every file for hours and found that and read its code... it was so obvious when I read it: it auto-downloads the needed files, first attempting Hugging Face, then backup mirrors if that fails. ALL THE HUGGING FACE attempts failed, but all the mirrors worked, thankfully
I just finished downloading them a few minutes ago
and am going to try to get things working again
so far it LOOKS like I have all the assets I need... fingers crossed... I'm going to start over and retrain from scratch to be safe
When preparing a big dataset (30 mins+), would it be better to add a variety of tones, like ASMR, high pitch, lower pitch and the person's normal speaking voice, or should I just stick to one type of tone, like ONLY ASMR or ONLY high-pitch or ONLY lower-pitch audio?
if you want a model that does a lot of things, you may want to add some variety
alright thanks I will make the majority of the dataset in normal speech and add some other tones too
just make sure to balance it out
if extras you intend to use aren't well contained within pretrained models, having thrown-off balance in ur set might make the model biased towards a given subset of sounds
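One way to check the balance before training is to total the minutes per tone. This sketch assumes a hypothetical layout where each tone sits in its own subfolder (`dataset/normal/*.wav`, `dataset/asmr/*.wav`, ...) and uses only the standard-library `wave` module; the demo writes silent placeholder files just so it runs anywhere:

```python
import tempfile
import wave
from pathlib import Path


def wav_seconds(path):
    """Duration of one WAV file in seconds."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def minutes_per_category(dataset_dir):
    """Total minutes per subfolder, assuming dataset_dir/<category>/*.wav."""
    totals = {}
    for wav_path in sorted(Path(dataset_dir).glob("*/*.wav")):
        category = wav_path.parent.name
        totals[category] = totals.get(category, 0.0) + wav_seconds(wav_path) / 60
    return totals


def print_balance(totals):
    grand_total = sum(totals.values())
    for category, minutes in sorted(totals.items()):
        share = 100 * minutes / grand_total if grand_total else 0.0
        print(f"{category:>8}: {minutes:5.2f} min ({share:4.1f}%)")


def write_silence(path, seconds, rate=8000):
    """Demo helper: write a mono 16-bit silent WAV of the given length."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(seconds * rate))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    write_silence(root / "normal" / "a.wav", 30)
    write_silence(root / "normal" / "b.wav", 30)
    write_silence(root / "asmr" / "c.wav", 15)
    totals = minutes_per_category(root)
    print_balance(totals)  # asmr 0.25 min (20.0%), normal 1.00 min (80.0%)
```

Point `minutes_per_category` at your own dataset folder instead of the demo directory to see whether one tone dominates the set.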
( or might diverge ~
In any case, gluck ~ ✨
I am using the original pretrain rn, I think it should be fine
oh well

I do wish you the best of luck then
( in any case, I head off to sleep now so, might respond tomorrow if anything ~ )
what's the difference between client and server in the new w-okada?
presumably you can run a server on a remote host with gpu and use a browser on, say, a phone
client takes audio, sends it to server to process
server options uses devices available on the host directly
I was wondering about that too! Probably only useful on a rooted Android phone though?
the client mode only uses MME which has more latency than WASAPI and ASIO being only available in server mode
oh wow... with the proper assets...
1000 epoch is taking like 5 minutes vs 3.9 hours last time lol
?
more like "with only 2 mute files" lol
hubert_base is only used to extract features.. which I assume it failed, so you have an empty train
there must be some issue where the preprocessing & feature weren't done properly so it doesn't learn it at all
how do i make my own voice models
wait a min
Last update: May 5, 2025
there is a guide on these
you should read about this first
to get a good dataset
then use it to train your voice models
with this
Is RVC Colab no longer usable?
I have no idea how to use the voice blender (this is in the applio kaggle space)
I can help u make voice models
if you want to use kaggle applio
oh, please do. I have lots of issues with my model. People said i sound like a kid.
Do you know if I need the same person's voice, but with lots of ups and downs and singing, to make it sound realistic?
Having a bigger range will most definitely help
I need it for realtime by the way
perfect, I use realtime all the time for trolling
So 1 hour of data of the same person, but saying different stuffs and vocal range will help right?
So ASMR is a no no?
yes

How long should it be
Because none of embedder support my mother tongue
also the pretrain. I need the Rigel finetune to be able to handle 48k
I don't really know how much data is best for a model, but nothing less than 10 minutes I'd say
32k is usually best..

why must it be 48k?
if all goes wrong why not formant shift the dataset 
I thought I needed to train it at 48k to utilize it
Let me google how formant shift work
idk what Vonovox is 😭

Oh, dr87 made it. A tool like wokada
But somehow my PC won't let me run the fork version or the OG version
So i run vonovox instead
it has less delay too
that's cool
no nvidia?
Nvidia

But it's broken and i can't figure it out why
You used to work fineeeeee
Maybe it got jealous cuz i use Vono instead
rip
anyways I'm up for the task of helping u make the model
dms are open when ready!
No problem
no
yes
neat, I didn't know that it did that
yes, in 32k datasets
knowledge acquired
for 16khz cutoff sets sure, for 20khz + and above not
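The reasoning behind "32k is fine for 16 kHz cutoff sets": a sample rate can only represent frequencies up to half of itself (the Nyquist limit), so 32k covers 16 kHz, 40k covers 20 kHz and 48k covers 24 kHz. A small rule-of-thumb sketch, assuming the 32k/40k/48k targets and a cutoff you read off a spectrogram of your dataset:

```python
RVC_SAMPLE_RATES = (32000, 40000, 48000)  # target rates offered for RVC training


def smallest_covering_rate(cutoff_hz):
    """Smallest target sample rate whose Nyquist limit (rate / 2) covers the
    highest frequency actually present in the dataset."""
    for rate in RVC_SAMPLE_RATES:
        if rate / 2 >= cutoff_hz:
            return rate
    return RVC_SAMPLE_RATES[-1]  # nothing covers it; 48k is the ceiling


for cutoff in (16000, 20000, 22050):
    print(cutoff, "->", smallest_covering_rate(cutoff))
```

Training above what the data contains does not add detail; the extra band just stays empty.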
you kinda need 1 hour
this pretrain is very bad and undertrained
use og pretrain
dont use singing for realtime models, speech only
okay, let me train again
I mean i do speech only
But there are words that the model can't spell
I don't know what causes it
this is embedding limitations
cvec is trained mostly on english
Ye, I picked Rigel because it has a language that is spelled similarly to Vietnamese

The quality of the audio is not as good as og pretrain
welp i told u why
Ye, embedder
no?

I said Rigel gave u worse results because it's undertrained
just stick with the og pretrain
But you told me it's embedding limit which cause misspelled
Ohhhhhh
Sorry, my brain is a bit laggy
that will improve the results quality wise but not the pronunciation
Once you've gotten everything figured out dms are open for me to help
u can try using the index file
for realtime (or any model actually) aim for 30mins to 1 hour max
i appreciate that, kind soul
Yummy index file
dont use asmr
results are weird
rvc cant handle breathy voices well
In one of my Ena models there was a singular line with whispering and it didn't do too bad
and og pretrain cant whisper so
a singular audio is not gonna destroy a model
some asmr is fine

it's been a while since I've made an ASMR model
might need to make a return
its been 3 months
wow
no, women scare me 
During training, my PC went into hibernation. I checked the logs and it's still training, but the GUI is showing an error. What do I do now?
has anyone ever told you that ur voice sounds robotic when using the voice changer?
Was

If it sounds robotic
I suggest you take a look at tensorboard
Or check your data
And there is no model.pth file over there. There are only G, D and other files
@viscid moss sorry to ping you but I need help. Please help me
In that case u should stop the training cuz u can't save the .pth file
I need help related to ai
!howtoask
I stopped
What to do next
I have all the G and D files, event files and other stuff
Should I train the model again? Or can I relaunch the training process?
I need someone who can help to build my ai startup
Which AI program? Is it about AI voice, LLM or anything else?
What things do u have? If only have G and D u need to train again
Lemme show you
Something else it's in an untapped market
disable that hibernation thing btw
Um, that doesn't tell which AI you're looking for.
Also, no need to hop into my direct message. If nothing is personal, better not.
Yes, I will do it. I just reinstalled Windows and forgot about it
there is no pth file inside those folders?
u have everything but not the pth files 
cuz hibernation
ya
Sorry for that namari, it's related to automation
i think so
Lemme train it again.

?
--Wasted--
Sorry, but if you don't tell a specific AI program, like the name of it, I can't help with that.
You can say something like LLM or Python for example, so to see if that I can help about.
you can resume, strange to have events.out in the same folder
they were moved to eval in the latest update
Python, playwright, ocr, etc
?
Yeah, I've never heard of this program. I only heard of Python. 
Ok, but thats not my question
Can someone else help me in this I need a tutor
??
No, no, you didn't ask me like that. I asked you which AI program you looking for, and I just sent a screenshot of Google of the program I've never heard of.
Are u there?
No one is replying I need help please someone guide
No, I don't own this program. Make sure you ask the right question.
I read a bit of the guide, and the first thing I found is Visual Studio Code. Not sure what npm, yarn and pnpm are.
Have u installed vs code?
Fortunately, I have installed Visual Studio Code before.
Wdym by right question? Did I hurt u? If yes, then sorry
Great
In Extensions tab on VS Code, search Playwright, there's one that says "Playwright Test for VSCode" by Microsoft, you click on that and then click install.
??
I know that
Can anyone tell me how to get back to the page I came from? I'm a complete newbie
This is their guide about using Playwright on VS Code. https://playwright.dev/docs/getting-started-vscode
Why are u telling me this? Do u really wanna help me or make fun of me? Is there anyone else who can help, please help
I don't know what you think. What you say confuses me a lot, so I can only explain what I understand.
Which AI program are you looking for? Is it RVC or Stable Diffusion? And what is your PC GPU?
it's RVC, also RTX 3060 (I think it's enough)
-rvc
Use Applio.
Ok got it namari, I need help in browser automation n rendering results
??
Mate, to install Playwright in your browser without the need of VS Code, I think you should install either npm, yarn or pnpm program so to run command "install Playwright". I won't install either of these, since my laptop storage is full now.
You can ask others if they know about Playwright or search the program on YouTube. As much as I know about AIs, it's not like I would know everything. 
I'm training a model but it's not saving the actual model files
Model.pth
There are g and d files, event files and bunch of other files
you did not select the option to save the model
When training on Kaggle, should I set the batch size to 2 to get a batch size of 4? Because Kaggle uses 2x Tesla T4, right?
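This question wasn't answered in the thread, so here is only the arithmetic under a stated assumption: if the batch size you enter is applied per GPU and the slices are split evenly across GPUs (common in multi-GPU PyTorch training, but not confirmed here for Applio), the effective batch doubles on 2x T4 while the steps per epoch shrink. Comparing the steps/epoch printed in the console against these numbers is one way to tell which behavior you're getting:

```python
import math


def multi_gpu_steps(num_slices, per_gpu_batch, num_gpus):
    """Effective batch size and steps per epoch, ASSUMING the entered batch
    size is per GPU and the slices are split evenly across GPUs."""
    effective_batch = per_gpu_batch * num_gpus
    slices_per_gpu = math.ceil(num_slices / num_gpus)
    steps = math.ceil(slices_per_gpu / per_gpu_batch)
    return effective_batch, steps


print(multi_gpu_steps(333, 2, 2))  # (4, 84)
print(multi_gpu_steps(333, 4, 2))  # (8, 42)
```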
What? Where is it?
You mean save every checkpoint ?
yes
save every weight
But will it save each and every epoch?
Like from 1 to 100
??
This setting enables you to save the weights of the model at the conclusion of each epoch.
Save Every Weights
so it saves every epoch you selected, 10 by default
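A tiny sketch of what "every epoch you selected, 10 by default" means for a 100-epoch run, assuming a save happens whenever the epoch number is a multiple of the save interval (so not every single epoch from 1 to 100):

```python
def saved_epochs(total_epochs, save_every_epoch=10):
    """Epochs at which a weight file would be written, assuming a save
    happens whenever epoch % save_every_epoch == 0."""
    return [e for e in range(1, total_epochs + 1) if e % save_every_epoch == 0]


print(saved_epochs(100))      # [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(saved_epochs(100, 25))  # [25, 50, 75, 100]
```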
! C:\Users\frime\Downloads\voice-changer-windows-nvidia-b2332.zip: Cannot create C:\Users\frime\Downloads\voice-changer-windows-nvidia-b2332\voice-changer-windows-nvidia-b2332\MMVCServerSIO\MMVCServerSIO.exe
Access is denied.
When trying to extract the file
shitty anti-virus detected
Got it thanks
Hello everyone, can someone please help me? I would like to have a short voice clip from Fatman Scoop or DMX, about 10 seconds long. I'm having trouble making the voice sound right. If anyone can help, please get in touch with me. Thank you very much
right click on the zip and " unlock "
Whenever you see such an option available, I advise you to do so.
else windows might flag it by mistake ( also as Noobies said about the anti-vir ~ some crappy ones do whatever they please
how do i put it like towards my output? so other players can hear my voice changed
use charts from SCALARS tab
and less smoothing if you're using avg_50 charts
If I am not mistaken, you need to use a smoothing value of 0.5 or lower for avg charts.
its 0.6
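Why less smoothing matters on avg charts: TensorBoard's smoothing slider is, to my understanding, a debiased exponential moving average, so a higher weight keeps the curve closer to older values and hides recent changes. This is a from-memory sketch of that idea, not TensorBoard's actual source, and the values are toy numbers:

```python
def ema_smooth(values, weight):
    """Debiased exponential moving average (weight in [0, 1))."""
    if not 0 <= weight < 1:
        raise ValueError("weight must be in [0, 1)")
    smoothed = []
    last = 0.0
    for n, value in enumerate(values, start=1):
        last = last * weight + (1 - weight) * value
        # Debiasing removes the pull toward the initial 0.0 in early points.
        smoothed.append(last / (1 - weight ** n))
    return smoothed


raw = [10.0, 0.0]  # toy values: a sudden drop in a loss curve
print(ema_smooth(raw, 0.5))  # second point ~3.33
print(ema_smooth(raw, 0.9))  # second point ~4.74, the drop is hidden more
```

With only one point per 50 steps (or per epoch), each point already averages many steps, so heavy extra smoothing mostly delays what the chart shows.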
Training still seems to be going on. There is no significant increase in the g/loss chart.



