#✨│ai-help
you dont have a normal speech?
no? it's miku
she is not a tts voicebank
she is from a vocal synthesizer meant for singing
I'll try to find a human cover of karma
i find those so funny 😭
mvsep bsroformer is better 
I'm using replay, it's only got uvr

and it didn't separate the vocal for some reason
anyways, how much slower is the training going to be with this lr and fp32?
i got my lowest point in 2 hours with this dataset using the default lr and fp16
depends on gpu
4060
can probably even do BF16 with it
u can train bf16 in rvc? wat
with AMD BF16 is super slow
if u want i can try it but i have 0 coding knowledge so idk if i need to do some magic trick besides moving files

it is for Applio
ok so
so just replacing files right?
if thats the case i can try it
jubyphonic cover → uvr (kim vocal2) → miku rvc
nah, just dropping the file, installing one package, configuring it, then running the training from command line
this is the input audio
sounds easy to me
she has no problem pronouncing "r" (like in the word "dark")
she is pronouncing "th" (like in "the") as "d" though
that's because the trained set has no 'the'
so the index looks for something similar and finds 'de'
no
she has many instances of the word "the"
this song (karma) is literally in the dataset
i find it funny, last time i said applio training was slower for me but after i tried the new version it's now faster than mainline
newer torch, better performance
see here
the ui still very slow for me and inference kinda buggy as well, so i decided to use applio for faster training and mainline for inference xD
So I started again in Kaggle, but when I go to train, it still says “error” in gui.
But in Kaggle says it’s working and is going through the epochs.
🤷♂️
the only thing i want to know is a stable fm that doesn't overfit at like 15 minutes of training

I'm gonna try with a tts audio
the people at de party
it got it right once
to be fair it sounds like that in the original audio
Its been a while since I've used applio. but the voice model i want to use is not appearing after i copy and download the link does anyone know why?
downloads have been broken since the website rework
download manually and unzip into logs
got ya thanks!
weird, @glad zealot do you maybe know why?
here's a version based on a better tts
imma try something in polish to see how that goes
it should do good considering there's miku v4 chinese in there too (the required sounds are pretty much all there)
[rvc folder]/assets/weights
Yes, just found it. Thank you.
nice, the rz sounds like sh but it's understandable, there's no rz sound in chinese either
the "y" is very good tho
I'm gonna make a 30 min dataset and see how that goes, I'll throw in some more v4 chinese too
actually I just noticed
is this any good?
also what's the deal with pretrain models?
Hmm. My 50 epoch training .pth I just did in Kaggle as a test works in inference, but in the resulting file, some lines are really clear/loud, while some lines very muffled/very quiet.
Any ideas what is causing this?
can someone help me... the voice ai is repeating after everyone in the vc on discord, is there any way to fix this?
Your mic is picking up your headphones
Lower volume ; move s.threshold/n.gate to the right, enable sup2
it worked and its better now, thanks!
Yeah I noticed it too, I did several tests, I don't know why, I think it could be that the notebook needs to be updated. I've already contacted @glad zealot who created it. So hopefully it can be fixed.
Huggingface Ilaria rvc gives an error on inference @low shard
look at the top right, there should be an error message for a few seconds
it should say smt like 'GPU task aborted' or smt depending on the error
example:
what does that mean ?
it was just an example, you need to show me what error u get at the top right
do you get any?
yes
maybe i used too much gpu 😭
can you show me what it is?
you can just post a screenshot of the one at the top right
im so confused 
click convert again
see if it shows
an error
at the top right
of the space
i will try later
common error messages at the top right are shown in: https://docs.google.com/document/d/1YbXcLFPaGjhOdG5NFkK3QrucCEpHZBwFUxkeMO8aB18/edit?tab=t.0#heading=h.2ahxfypqn13
Table Of Contents Introduction (with website link) Model Loader (Download & Upload) Inference (use RVC AI Voice Models) Ilaria TTS Settings (Inference) Vocal Separator (UVR) Troubleshooting “No gpu is available for you for 60s” “GPU task aborted” “You have exceeded your GPU quota ( NUMBER s l...
i mean the error at the top right for some seconds, not just the one that appears in the UI (interface)
but alr
Is there a colab for Ilaria RVC? @low shard
No
Ilaria RVC on google colab is broken
and the ilaria rvc zero version is made specifically for ZeroGPU of huggingface space
could you please
just show the error at the top right when u click convert
it should be like:
- u click convert
- after sometime u get an error at the top right
- screenshot and show me that at the top right not the one in the middle of the interface
be careful cus the one at the top right disappears after like 10 seconds
alr
What is this information about dataset, batch size, 32k and so on?
What happened to the colab that would automatically do the vocal separation and take the source straight from a youtube link?
dataset is the audio of the person you are going to train a voice model on
batch size is how many samples of audio it trains at once, usually 8 is used assuming your gpu has 8gb vram. some specific use cases like if you got a very short dataset you may use 4, or very long using 16
32k is an example of a sample rate, 32000 hz (samples per second)
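To make the two numbers above concrete, here is a minimal Python sketch. The segment count of 600 is made up for illustration, and it assumes the trainer keeps the last partial batch, which may not match what RVC actually does. It only shows how batch size changes the number of steps per epoch, and that a 32k sample rate can represent audio up to half that frequency.

```python
import math

def steps_per_epoch(num_segments: int, batch_size: int) -> int:
    """Steps needed to see every segment once (last partial batch kept)."""
    return math.ceil(num_segments / batch_size)

def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2

# Hypothetical dataset: 600 short audio segments after slicing.
for batch_size in (4, 8, 16):
    print(f"batch size {batch_size}: {steps_per_epoch(600, batch_size)} steps per epoch")

print(f"32k sample rate covers up to {nyquist_hz(32000)} Hz")
```

A smaller batch size means more steps per epoch for the same dataset, which is one reason it is sometimes suggested for very short datasets.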
So they are basically just information about the Voice model?
Well if you as a viewer are browsing for voice models, yes
thx for info
batch size has little to do with VRAM
Hi guys, I'm new on training AI Voice models, what does the D, DUR, and G stands for on the outputs? (i.e. D_1000.pth, DUR_1000.pth, G_1000.pth)
hi i have an issue that i dont have friends
Does anyone know if its possible to download the voice models that people use on character.ai for the voice calls? Theres so many made of this one I want on that website but they're not posted anywhere so im just wondering if its possible through f12 or another program to download it to use in rvc
Ive literally scoured the internet and their discord for answers but theres nobody asking online and nobody answering in the server and im surprised with the amount of trained voice models there are on there that nobody is interested in ripping the higher quality voice models on there to use for RVC
UVR5 ?
so in realtime for calls?
whats ur pc gpu?
so like record ur voice, then convert it with a model voice right?
yeee
if u want an easier way, there is ilaria rvc zero which is on cloud (remote good pc)
Ilaria RVC: CLICK HERE 🤗
Guide on how to use it: CLICK HERE 📝
Don't forget to thank Ilaria if you find it useful! 💖
but id suggest to do it locally as you got a good pc and will have no gpu limit
imma be honest i have no idea what all this means what even is a RVC? what is the first step i need to take??
rvc = Retrieval-based-Voice-Conversion
Its the program to convert the voice to one of a voice model, its speech to speech
basically free
locally = on ur pc, u download it and use ur pc gpu
cloud = remote good pc, used mostly for those who dont wanna download it locally or dont got a good pc
Yes its open source, the code is public to see
or if you are on mobile and don't have a pc
if u dont wanna go through any issue and just need a quick conversion, use this
yea, u can technically do it locally on cpu but its sooo slow, so its better to do it on cloud
Goat thank you bro
you're welcome both of yall
mobile isn't even possible, the pc cpu and ARM (phone) cpu are so different you can't even do it in the terminal, at least without installing an APK
you are actually talking to the first person who used rvc on cpu lol
Hold on lemme see this
i did it mostly for the memes, but yes u can do it on phone cpu
ofc its NOT SUGGESTED as a phone cpu will be super slow
but its doable lol
yep ur good to do it locally too
got it
wonder if 4gb will work too as my phone is from 2020
Prolly not, a test from someone’s phone from 2018 that has 6gb of ram was able to only inference 10 seconds
the super bare minimum is 6gb of ram
even if 8 or more is suggested
so any voice in the channel named voice models is a rvc hugging face model?
like 80% of them, of course there are also GPT SO VITS models which are from another Text To Speech program which wont work on RVC
u can see it from the model post tag, rvc models got the rvc one, gpt so vits got the gpt so vits tag
but like 80% of them are rvc, so dw
there are thousands of rvc models
ok bet thank you
i don't mind waiting longer
i mean as that 2018 phone couldnt work to inference more than 10 secs, if u used more than that it would crash the app (nothing bad happened ofc, the app just closed)
so i really think if u try to do it the app would just crash, i mean u are free to try but 4gb are too low
Id suggest to use cloud, like ilaria rvc zero for inference
i might as well just try at least
be careful tho
none of the phones was harmed (i even used it on my own which has 8gb) but ur phone seems too weak
if a linux ps4 jailbreak comes out for 12.00, well then I'm using rvc on that, besides, i can always try and remove non inference features
To lower the requirements
Might not be possible but worth a try
would a 4060 (8gb) be good enough to run rvc well at low delay? or atleast compared to a 1660 graphics card?
yeah
would actually be interesting tbh, i never seen someone run ai on a linux ps4, but a reminder that ps4’s gpu is AMD, the guide would be kinda different, check https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/blob/main/docs/en/README.en.md
i mean my version is too new but i could try
removing non inference features wouldnt really lower the requirements, it doesnt check ur pc gpu when u install it, like u could install it and even if ur pc isnt good for training ud be able to inference just fine if its good enough for that
so u dont need to remove them, as long as u dont use training, running inference would be just fine
which i am planning to do
yea its better than a 1660, for realtime voice changer in calls, I suggest to use wokada fork tho
well i use w okada, what is fork?
yea, u might be able to train as its a good amd gpu but goodluck
fork = modified version
The wokada fork has better optimizations
-rt
This interaction has expired, use the command
/guides realtime if you wish to see it again.
its the first link
I mean ofc u can just use wokada too, the fork could help for performance tho
if u ever be able to do it let me know tho as its interesting
thank you so much, was running into performance issues while using it if i had anything else open due to my low memory
btw for the meanwhile u can use cloud (remote good pc) like ilaria rvc zero
i use cloud mostly too
I'll upload it on youtube if i manage to even get linux running, although I'm not risking using the dev update version update file
you're welcome, might also need to check the settings suggested in the guide
looking through them rn, literally just found out i have a gtx graphics card
yo
Lmao that would be funny, would also be cool if u show how u do it so if anyone else wants to do it they can
i honestly have a ps4 too but i dont think im ever gonna jailbreak it as i use it for gaming
oh u have a gtx that explains
is it a website? or did i download the wrong version
i have a ps4 that i used to use for gaming but that i don't use as much for gaming, barely
?
U mean the guide ? Yea its a site, if u mean the program, u are doing it locally (on ur pc)
like where to download the program locally?
It should be explained in the guides i sent u, there are blue texts that redirect u to the guides
@low shard do you know what this could mean? run-install.sh: 51: Syntax error: "elif" unexpected root@localhost:~/Applio#
so obviously its running on my pc, but is the UI on a website now? it looks the same as it was before now its just in my browser
Oh understood, i still use it for fortnite or multiversus, even if not as much as i used before i started to learn more about coding & ai
like where can i access files and versions i dont know how you got to that page in ur ss
huh pretty weird, are u using the prebuilt wheel or manually build wheel of applio?
yea its running on ur pc, the ui is just on ur browser running locally
its not cloud (remote good pc), dw
i tried both, pre-built wheel and manually building wheel
which are u using? Could you please elaborate more?
like which model? or method? or?
u also did sed -i '/sudo/d' run-install.sh Right?
ima try running that again if i didn't
Are you using Ilaria RVC zero or did you download applio/mainline on ur pc?
same result
btw for everyone im helping, if i disappear its bc im on school ipad lol
mm, lemme check if applio changed their .sh rq
ilaria rvc zero
you dont need to check the files, first of all did you download the model ?
yea seems like they changed the .sh 2 days ago lol, i need to check a new way as its very different than before sorry for the issue
will update it when i get home
nah you're good
ok
for the meanwhile id suggest you to use cloud
easygui on termux should work tho
yes i downloaded a model that i got from the discord
or was i supposed to download one from the site itself ?
yea u should download it from ilaria rvc zero itself
Check the guide pls https://docs.google.com/document/d/1YbXcLFPaGjhOdG5NFkK3QrucCEpHZBwFUxkeMO8aB18/edit?tab=t.0#heading=h.c5ea25jd74zp
is easygui compatible with arm 64 natively or do i need to do alternative steps
if not either way, then I'm fine with the cloud
you dont need to unzip it, its on cloud so it will run on a remote good pc not urs
u can find it in the models list when u refresh the models
yea its compatible, they are both RVC forks
its kinda the same steps as applio except u have to do one line to upgrade pip, and can optionally upgrade gradio to have a better looking ui even if its not needed
Ah ok
I'm on my main now
did u try it?
-colab
- Applio, by IA Hispano Google Colab
- RVC Disconnected, by Kit Lemonfoot Google Colab
- RVC Mainline, by Hina Google Colab
- AICoverGen-WebUI, by Hina Google Colab
- AICoverGen-NoWebUI [English], by Ardha, fixed by Eddy, Hina and Gdr Google Colab
- AICoverGen-NoWebUI [Spanish], by Eddy, Hina and Gdr Google Colab
- UVR5 NO UI, by Eddy Google Colab
- UVR5 UI, by Eddy Google Colab
- Modified W-Okada's Voice Changer, Google Colab
- 🆕 FaceFusion UI, by Nick088 Google Colab
- 🆕 FaceFusion NO UI, by Nick088 Google Colab
- 🆕 EasyGUI, by Rejekts Google Colab
While the Colab free plan provides up to 12 hours of daily usage, the GPU is typically available for only about 4 hours each day on average.
what does
[Voice Changer] Pipeline is not initalized...
[Voice Changer] Waiting generate pipline.
mean?
Initial install failed, this can randomly happen.
Make sure to disable antivirus if you have one
Then, delete the folders "pretrain" and "model_dir", then you can restart the program which will reinstall the failed files
just rename them?
win11 may be a bit clunky with right click options
Hello, I'm currently using a NVIDIA Geforce GTX 750 Ti (Also tried installing CPU version of the voice changer but my cpu is Intel Core i5-8400 at 2.8 GHz and its worse than my graphics card).
The voice changer I'm downloading off of:
(https://huggingface.co/wok000/vcclient000/tree/main)
Is it even possible to make it sound smooth voice with my graphics card? I want to see if this graphics card can even handle voice changer. If any tips or solution please let me know. Thanks
GTX 700 series has no tensor cores so thats one point. overall very outdated gpu
I recommend you try the fork, its optimized better. Here its important you download the amd, intel, cpu version and then still select your GPU. The nvidia version will not work
https://rentry.co/Forkvoicechangerguide#download-amd-intel-and-cpu
Got it thanks.
Btw I fixed that, works now (tested by myself)
eh idk where to ask so..is it still the best ai song maker? if not, pls feel free to recommend me new ones, thanks in advance
https://colab.research.google.com/github/hinabl/AICoverGen-Colab/blob/main/Hina_Mod_AICoverGen_colab.ipynb#scrollTo=NEglTq6Ya9d0
yes rvc is still the best for ai covers, that's just a fork (modified version)
anyways the quality of rvc and its forks is always the same, that one is just made to be easier especially for ai cover making
oh ic ic, thank you!
yw
Is there tutorial how to make your own rvc ai voice?
can someone walk me through making an ai voice model
has anyone got this working on Linux?
You can search rvc ai voice model at:
- #1175430844685484042
- #🔍│find-models
- https://weights.gg/ (login required)
- https://huggingface.co/models (but watch out cus in hugging face there arent only rvc ai voice models)
- https://applio.org/models
- https://voice-models.com/
- https://thevoicemodels.com/ (for Turkish Models, login required with discord and level 2 on their server)
if there isnt one, you can:
- #1159289738314919936
- #1191429836321849435
- make it yourself with our docs guides https://docs.aihub.wtf/essentials/how-to-make-voice-models/
whats ur pc gpu?
dont worry, u need only the .pth and added.index for using the rvc models
written guide yes, whats ur pc gpu?
for what? like wokada or rvc? and also whats ur pc gpu
be sure to be using rmvpe as pitch extraction,
and that your dataset is clean, and to also use the tensorboard
If i may ask, what language is it in ?
sure if u want to
Also, are u using the tensorboard ?
yea loss/g/total
btw hina fixed the bug
Guys does anyone know how to like clone a rapper voice to urs for example a rapper “vocals” but u turn to that you made it
so, using a made voice model to convert your voice in realtime for calls?
or u mean making the model first
and, whats ur pc gpu?
yeah basically
idk i got a laptop
You can check your pc gpu via:
ctrl+shift+esc (task manager) -> Performance tab -> GPU
i just wanna turn other vocals into mine
i think it’s ass
i got a asus
Is there any method to get exact pitch value by using some software?
could u tell me the gpu name doing what i told u tho?
just in case to check if u could do it locally (on ur pc) or need to use cloud (remote good pc)
XD
or we can only listen and do several attempts?
im not on it rn but its bad.
sometimes when i be recording on fl ts lags
alright then u should use cloud
For Realtime Voice Changing for Calls on Cloud (remote good pc for those who don't have a good one, YOU CANT DO THIS ON MOBILE):
- Google Colabs (4 hours daily of free T4 gpu, easy to use, require only a google account) :
- Kaggles (30 hours weekly of better GPUs, T4x2 & P100, harder to use, requires an account and a phone number)
Its way better to use kaggle btw
i don’t even understand this gang like all im tryna do is the simple shit what i jus told u
basically, there's this speech to speech program called RVC (Retrieval-based-Voice-Conversion)
Those models are used in another program called Wokada for realtime voice changing in calls
As your pc is bad, u cant do it locally (directly using it on ur pc), but you will need to use a cloud computing service (basically u are using a remote good pc to run the program for u)
Google Colab and Kaggle are cloud computing services:
Google colab = easier but short gpu time
Kaggle = Harder but WAY more gpu time
so what im tryna do
if u really want an ez way, try the first guide i sent 'How to use Original...'
however it wont have much gpu time, but the other links dont have a guide
Thanks. Do I need to do a new build?
⠀
Google Colabs 
⠀
AICoverGen-WebUI
Useful for making quick covers, by Hina.
AICoverGen-NoWebUI
Useful for making covers, doesn't include a UI, by Ardha, fixed by Eddy, Hina and Gdr.
RVC Disconnected
To train new voice models, by Kit Lemonfoot.
EasyGUI
The OG interface, by Rejekts.
⠀
⠀
Settings for Nvidia GPUs 
F0 Det.: rmvpe (suggested for all series)
RTX 40-series: 80-96 chunk | +16384 extra
RTX 30-series: 96-112 chunk | +16384 extra
RTX 20-series: 112-128 chunk | +16384 extra
GTX 16-series: 128-192 chunk | +8192 extra
GTX 10-series: 128-192 chunk | +8192 extra
Advanced Settings
Protocol : Sio or Rest
Crossfade: 4096 start 0.2 end 0.8
Trancate: 300
Silencefront: Off
Protect: 0.5
RVC Quality: Low
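For anyone who wants the chunk/extra suggestions above in a form they can look up by GPU name, here is a small Python sketch. The numbers are copied from the list above. The `suggest` function and the regex for pulling the series out of a GPU name are hypothetical helpers for illustration, not part of W-Okada or its fork.

```python
import re

# Chunk range and extra value per GPU series, copied from the settings list above.
SUGGESTED = {
    ("RTX", "40"): {"chunk": (80, 96), "extra": 16384},
    ("RTX", "30"): {"chunk": (96, 112), "extra": 16384},
    ("RTX", "20"): {"chunk": (112, 128), "extra": 16384},
    ("GTX", "16"): {"chunk": (128, 192), "extra": 8192},
    ("GTX", "10"): {"chunk": (128, 192), "extra": 8192},
}

def suggest(gpu_name: str):
    """Return the suggested settings for a GPU name, or None if its series is not listed."""
    # Matches e.g. "RTX 4060" or "GTX 1660": prefix plus a four-digit model number.
    match = re.search(r"(RTX|GTX)\s*(\d{2})\d{2}(?!\d)", gpu_name.upper())
    if match is None:
        return None
    settings = SUGGESTED.get((match.group(1), match.group(2)))
    if settings is None:
        return None
    # rmvpe is suggested as the F0 detector for all listed series.
    return {**settings, "f0_detector": "rmvpe"}

print(suggest("NVIDIA GeForce RTX 4060"))
print(suggest("GTX 750 Ti"))  # not in the list above, so no suggestion
```

These are only starting points. If the audio stutters, the usual move is to raise the chunk value within or above the suggested range.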
⠀
I think you can just go to the kaggle link, do copy & edit and then do all from the start
⠀
Thanks.
When using Kaggle is there anything different I need to do to look at the graphed in Tensorboard?
It's all explained in the guide DW, u have to manually sync the graphs and it's always looking at the lowest point of loss/g/total
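As a rough illustration of "looking at the lowest point of loss/g/total", here is a Python sketch. The (step, value) pairs are invented, and in practice you read them off the TensorBoard graph. The smoothing is a plain exponential moving average, which is an assumption and not necessarily identical to TensorBoard's smoothing slider.

```python
def smooth(points, weight=0.6):
    """Exponential moving average over (step, value) pairs."""
    smoothed = []
    previous = None
    for step, value in points:
        previous = value if previous is None else weight * previous + (1 - weight) * value
        smoothed.append((step, previous))
    return smoothed

def lowest_point(points):
    """Return the (step, value) pair with the smallest value."""
    return min(points, key=lambda point: point[1])

# Hypothetical loss/g/total readings, one per saved checkpoint.
readings = [(100, 32.0), (200, 28.5), (300, 26.1), (400, 27.4), (500, 25.9), (600, 29.3)]

print("lowest raw point:", lowest_point(readings))
print("lowest smoothed point:", lowest_point(smooth(readings)))
```

The step of the lowest point tells you which saved checkpoint to try first. Smoothing can shift that step a little, so it is worth listening to the neighbouring checkpoints too.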
So i'm trying to train my first model based off a spoken (admittedly very short, 12 sec) sample, for use in Okada, but it just comes out like a buzzing sound? I tried 250 epochs but the tensorflow graph was just a single point. Anything obvious i'm doing wrong, considering I've seen models based off single sound effects
other models work fine, i just made a bad one
12 secs are very short, are you sure its even good quality?
And also u need to use the tensorboard (not sure if u just misspelled it)
sorry yeah tensorboard. Quality seems good to be, very clean and crisp, no background noise
Found the issue! these steps fixed it, in case anyone else sees this
Applio recently updated their version of PyTorch, so recently trained models may not work (They have static instead of speech) in:
- Hugging Face spaces (like Ilaria RVC)
- W Okada Realtime Voice Changer
- Mangio RVC, Mainline RVC and their versions in Colab
For this reason, if when making an inference you can only listen to static, you can convert it using this Colab:
https://colab.research.google.com/github/Eddycrack864/PTH-Fixer-for-RVC/blob/main/PTH_Fixer_for_RVC.ipynb
Btw Applio has added a converter in its new update, but if you don't use Applio you can use this Colab.
I forgot to try it
Still says “error,” but is working.
No, Ilaria rvc is broken on Google colab #📰│dev-updates message
Use Ilaria RVC zero
Oh lol
Did u copy & edit from the latest of the mainline Kaggle?
Like not from the same notebook u made yesterday,but make a new one
Yes. I deleted the old one completely.
@glad zealot could u check again pls?
Ye me and someone else tested it again and it's working properly
Oh wait that's Ilaria RVc
Mainline kaggle seems to work no problem for me tho
Maybe it's just me! I'm not sure what I'm doing wrong there.
A wait I didn't pin the latest lol
I see it now. Thanks.
It should be version 3
Yes. Do you have any ideas why I keep getting an error when I try and use a custom pretrain?
Download or use it for training?
I'm adding it like this "!wget https://huggingface.co/Sztef/SingerPreTrained/resolve/main/f0G_SingerPreTrain.pth"
What error it says?
To use to train.
How long till I can use X-Minus again?
I'll just check.
hey I did the clone repository and install dependencies on the colab page and when I tried to start a server it said there was no such file or directory '/content/voice-changer/server'
/content
Is there somewhere I need to go to download these things?
What colab link are you using?
You can try using the pretrain downloader
It's the 3rd cell iirc
-colab
Suggestions for @arctic spear
thank you
He's working now I updated to your new build.
Noice
⠀
enter the pretrain path as filename only (since it is in root directory as you see in imjoy) and it should work
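A tiny Python sketch of what "filename only" means here: it takes the URL from the wget line earlier in this chat and strips everything but the file name. The helper name `pretrain_filename` is made up. Whether the notebook really saves the file in its root directory depends on where the wget cell runs.

```python
import posixpath
from urllib.parse import urlparse

def pretrain_filename(url: str) -> str:
    """Return only the file name part of a download URL."""
    return posixpath.basename(urlparse(url).path)

url = "https://huggingface.co/Sztef/SingerPreTrained/resolve/main/f0G_SingerPreTrain.pth"
# This is the value to type into the pretrain path field, assuming the
# file was downloaded into the notebook's root directory.
print(pretrain_filename(url))
```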
⠀
can someone link me the download for the virtual audio cable?
Because the files are already there on huggingface unlike in google colab
I think you mean this: https://vac.muzychenko.net/en/
is there any trick to generate output that is less likely to have voice cracks ? It looks like all the Gura models have the same issue with this one song. I don't know about the settings like Voice conversion options and Audio mixing options so I would really appreciate it if anyone can help me out in this part. Thank you
Turn down protect
If it's at 0.5 then turn it down
Currently it's at 0.33 for me (by default), I'm using the Hina mod one on Colab, should I turn it down to 0.2, 0.1 or 0 ?
Play around, see what works
Ok so apparently my memory sucks and it's to avoid voiceless stuff and breath sounds
I was wondering why turning it down didn't help the crack at all
If there's like heavy autotune in the song then there's i think no way to remove them
In case you wanna inspect the song it's named Legacy, a DMC 5 soundtrack
There's a part after 1:05 that always crack
Can I send the audio directly here
Sure
The rest of the song is fine but that ONE part near 1:07 keeps cracking no matter what Gura model I try
heres the original
Can i have the link to the model? I would spend my time at least trying to get rid of the voice crack
I actually tried various different ones, but all of them end up having the same crack in that one part
I'ma try different ways and see if it gets rid of any voice cracks
Thank you 🙏
ah, the voice cracks might be caused by backing vocals and harmonies, go separate it first
Thanks, I'll try again asap when this is finished
oh wait yeah i didn't even notice
Btw Sorry for my ignorance, I was doubting if there was an issue with the models, turns out it's the input's issue 🙏
It's fine, although i would recommend you inference the vocal files with something like RVC Zero instead of using the AICoverGen as that's faster
And then merge them back into one audio file
Until you're only removing the back vocals and not the instrumental too
I don't think we were talking about that
Sorry, does RVC Zero have a Colab notebook ? My laptop has too weak specs to run local models
RX 7900 GRE
It's a huggingface space: https://huggingface.co/spaces/r3gm/rvc_zero
It's almost the same concept as colab
Sorry, what do I need to put in the Model and Index sections ?
Oh you can also use URL-to-model, you paste in the model URL
preprocessing gets stuck here. tried restarting applio and my laptop but still stuck here
just realized what it could be, actually.... gonna test that theory... yeah nvm its not what i thought it was. has nothing to do with the model name replacing an already existing name
Might be a python problem, idk never worked with it locally
u should be able to do under ROCm, unfortunately I dont have hardware for that
oh i found out what it was. im too damn impatient lmfao. the progress bar wasn't moving but it actually WAS preprocessing. just took a while presumably because i have a huge dataset
Okok I will try in morning then
Damn how much GPU RAM do you have
8gb
The google colab GPU RAM would run out, and it's 15 GB
wait they let you use 15gb vram over on the cloud side? that's so generous lol
make sure you are using a folder containing wav files exported through audacity (if it doesnt work)
original rvc should work tho
Yeah but it's not unlimited use if you don't pay
already am using audacity exported .wav's
Use mvsep or cloud UVR
Or if the PC is powerful enough, run it locally
Huh? Even if there are files by other people u can still upload your model and audio
mvsep has some good exclusive models that aren't available for download
X-Minus
I mean UVR, if you only need vocal and instrumental extraction then UVR locally would work too
Well yeah, you should be able to, what are u looking for? Wokada or rvc
The one that can be used for discord and such
So realtime voice changer for calls on discord, yeah wokada, there is a Linux part in the Wokada fork: https://rentry.co/forkvoicechangerguide
Guide style is in the same as Blanc_dot's. Thanks Blanc_dot for corrections. Most technical information comes from deiteris.
Last update October 6th, 2024: Multi PC setup explanation added
Translations added for:
German: https://rentry.co/ForkVoiceChangerGuide_de
Turkish: https://rentry.co/ForkVo...
iirc try check deiteris repo for installing for ROCm
Weren't you talking about Ilaria rvc?
What's the difference between them exactly?
No
I'm telling you alternatives,both of those got UVR models anyways
Between Wokada and its fork? Fork means modified version, this one is more optimized in performance
there are also Kim's and unwa's melband roformer downloadable for UVR
Sorry got two replies confused
Do they both have ways to separate harmonies?
mvsep should be able to remove harmonies
melband roformer karaoke
Oh lol dw
On which one?
should it be considered normal for training with an hour-long dataset to take around an hour for every epoch? I have an RTX 4070 with 8GB VRAM, and all of the models I've trained before had pretty fast epochs (but they did have 15-20 minute datasets)
oops
my forgetful ass forgot i had the same issue and fixed it a long time ago
disable cache dataset in GPU
4070 laptop, not 12 gb one? should be enough for batch size 8
my settings were fine except for 'cache dataset in GPU' being enabled. encountered this problem before but i just forgot the solution
just disabled it and retrained and now it's running smoothly
never seen anyone recommend cache dataset
i just made an educated guess with my lack of computer knowledge that enabling it would be beneficial
Yeah I tried but it didn't do a thing.
skill issue, try UVR BVE or v2 in uvronline.app
@low shard I still get the error in Ilaria RVC, i did not see the GPU task aborted message so what is going on!?
Did you see any other message?
no
Don't you see anything at the top right for some seconds?
i only got the error message
no
it was just a white screen at the top left from my memory
Could you please screen record showing me u press convert?
yes
Yea pls do that so it's easier for me to understand the exact issue u are having
My guess would be that it's the GPU quota but you would get a "retry in.." with the time at the top right
So that's why I'm asking much for the error, cus you have to do different things based on the error you get
i sent it to you
Before clicking convert, be sure it shows the waveform of the audio you upload which means the audio is successfully uploaded
ok
(the waveform is that thing shown in the image, it seems like you convert before it fully loads the audio)
Try waiting first for that to appear, and then click convert
alr
it still gave me a error message
now what?
Could you please send me the audio file and voice model download link in dms so I can try myself?
yea its a problem of the voice model
how do i fix it?
it has the trained index, instead of the added index
did u make this voice model or found it?
i made the model myself
do you remember having an added index? you zipped the wrong file
where is gpu
is it added?
oh idk why but in ur pc is called 'graphic processor' which is the gpu, that seems an integrated graphics gpu which is bad, meaning you won't be able to train voice models locally (on ur pc)
As you don't have a good PC, it's better you use cloud (remote good pc) for training an RVC Voice Model:
- Google Colabs (4 hours of daily gpu for free, not many hours, but easy to use):
  - Applio (UI)
  - Mainline (UI)
  - RVCDISCONNECTED (no UI)
- Kaggles (a bit harder to use and needs phone number but gives 30 hours weekly of better gpus):
  - Mainline (UI)
  - Applio by Vidal (UI)
  - Applio by Shirou (UI, no guide as of right now)
- Lightning.ai (kinda hard, needs login, no issue with web uis or anything, but only 15 free credits monthly)
Google Colab = Easy but low GPU time
Kaggle = Harder but MUCH GPU time
you should have a file named like added_IVF169_Flat_nprobe_1_sahed_v2.index or similar to that
oh
where did you train?
google colab rvc v2 disconnected
go to https://drive.google.com/drive/u/0/my-drive (google drive), check for an rvcDisconnected folder, check for a folder with the same name of ur model (which should be 'sahed' from what i seen), and inside that u should find the added index
let me know if u find it
got it working now
its the wrong index file
yea u used a wrong one, but in drive u should find the added index
an example
u should download that, then zip it with the .pth and the model should be working now
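If you'd rather script that last step, here's a minimal Python sketch. It assumes the model folder has the `added_*.index` sitting next to the `.pth` (possibly alongside a `trained_*.index` that should be left out). `pick_added_index` and `zip_model` are made-up helper names for illustration, not part of RVC.

```python
import zipfile
from pathlib import Path

def pick_added_index(folder):
    """Return the first added_*.index file in folder, or None if there is none."""
    matches = sorted(Path(folder).glob("added_*.index"))
    return matches[0] if matches else None

def zip_model(pth_path, index_path, out_zip):
    """Write the .pth and the added index into one flat zip archive."""
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in (Path(pth_path), Path(index_path)):
            # arcname=p.name keeps the files at the top level of the zip
            zf.write(p, arcname=p.name)
    return Path(out_zip)
```

Usage would look like `zip_model("sahed.pth", pick_added_index("."), "sahed.zip")`, with the paths adjusted to wherever you downloaded the files.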
Hi, I'm re-adjusting to using RVC after its migration from Google, how do I change the octave the voice is singing in?
I realized I put a male voice on a female singer and he sounds.... Bad. Lol.
> its migration from Google
RVC is still on Google Colab, it's just that the other cloud computing service, Kaggle, offers better gpu times
Could u tell me what u are using?
I'm using the downloadable version off github, I was under the impression google took RVC down so it migrated lol!
Google Colab & Kaggle technically are not allowed, because it uses a WEB UI (except if u buy google colab pro)
But people just obfuscated/encrypted the code so its still usable all fine
What's ur pc gpu?
Is there a place where I can do inference conversions online like training in Kaggle?
you should be able to also inference in kaggle iirc
however, for inference the fastest one is ilaria rvc zero
Ilaria RVC: CLICK HERE 🤗
Guide on how to use it: CLICK HERE 📝
Don't forget to thank Ilaria if you find it useful! 💖
implementing RVC for an app. if anyone knows optimal block_time settings please lmk
My dad built the thing so I had to ask him, an AMD RX6600
Ayo? @spiral notch level 1 !!! 
Btw You can check your pc gpu via:
ctrl+shift+esc (task manager) -> Performance tab -> GPU
(telling u just in case u ever forget lol)
However, Mainline RVC (which is prolly what u used) needs Linux for AMD GPUs; on Windows it's better u use Applio (an rvc fork, i.e. a modified version; the quality doesn't change, only the UI, which is the interface): https://docs.applio.org/getting-started/installation#amd-gpu-support-windows
I mean, it works fine for me, and I've had no problem using RVC as I currently have it, I just dunno where in the WebUI the option to pitch up and down is, cuz rn my singer is rly high pitched.
Yooo
Anyone here expert at training in local ai
?
My stuff be not sounding good when I have an Nvidia gpu 3060 Ti
Oh alr sorry,
You have applio or mainline?
I remember downloading it from a github link on docs.aihub.wtf but that's down at the moment I think so I can't see which download I used specifically.
filename is RVC1006Nvidia
hi guys ,. i just have now mmvc_client_v0.4.3_x86_64_win and this one need a ONNX models
how can i get ONNX models
thx u
Alr, that's mainline, i dont do it locally but should be this one
Ahhh ty!!!
yw
Have a question. Certain words sound like they are speaking in like a japanese accent. Would a pretrained starting model with english singers help with this? I'm using RVC mainline and the pretrained weight that comes with it and training off that. I assume that because this was made by a non native English speaker that some pronunciation issues are to be expected, because the initial training data was not English. Is there anything I can do to rectify this?
if you use an index, then the voice features shift towards the voice model, if you dont use the index, they stay as in the original audio
The index is english. But some words, like hard G sounds trigger it.
I meant, the training data is english.
Hey, I noticed that when I'm playing a game like Cs2, the voice changer glitches out more. Is there a way I can stop this problem?
Ayo? @arctic spear level 1 !!! 
limit fps?
!howtoask
How To Troubleshoot
- Don't simply mention your issue, like "my rvc is not working".
- Describe the step you are on, what you're trying to do, the RVC you're using, a screenshot, etc.
- The more context, the better.
- Don't be desperate. You can ping a Helper, but if they ignore, they aren't available/don't know the answer.
- It's okay if you're frustrated, but don't take it into this server.
- Don't DM without prior consent.
- Don't ask for every little instruction. Put your own effort & test things by yourself.
- Don't ask to ask.
- Check if your answer is a Google search away/on our guides website.
How do I use the model i made? I remember there was a local website for it and now I can't find it
Whats ur pc GPU?
And do u mean using it for pre-recorded audios or realtime for calls
1650
im trying to use the model i made for an ai cover
although i think i've found the page i was looking for
thank you for replying though
Ayo? @stiff moat level 1 !!! 
you would be able to do inference (using models) only locally (on ur pc)
but wouldn't be very fast, i would honestly recommend cloud (remote good pc)
what are u using btw ?
the collab
applio google colab?
I'd suggest you Ilaria RVC Zero, it's faster as its using ZeroGPU (A100, faster than Google Colab's T4)
yw
- UVR5 UI, by Eddy and Ilaria Huggingface Spaces
- Ilaria RVC Zero, by thestingerx Huggingface Spaces
- RVC⚡ZERO, by r3gm Huggingface Spaces
- Applio, by IA Hispano Huggingface Spaces
- 🆕 FaceFusion UI, by Nick088 Huggingface Spaces
what is the best one to use for chatting i want it to use for a rpg character
Tell me what error do you get at the top right
It might be u finished the quota
Oh right you meant the Google colab version
That's the problem
Yeah it's broken
Ouch 💔
Use Ilaria RVC zero instead
It's a ZeroGPU HuggingFace Space which has faster GPU than Google colab
I already inferenced with applio but I wanted to check how the results will be because it's slightly different with every Colab
it's not bad for inference, but u can use ilaria rvc for faster one
I'm more comfortable with Colabs but ok
The results shouldn't be different, all RVC forks have the same quality, maybe some have different default settings
Sometimes the output sounds different
But the ZeroGPU space has an A100 GPU, it's literally WAY faster than Google colab T4 GPU
Tho RVC and all its forks never changed anything in terms of quality
Are you really really sure? Might be a placebo
The only difference is basically the UI, interface
Okay where is the newest ai cover generator Colab it's too slow
It's not when I use harmonify
zerogpu one is easier tbh
You sure you don't wanna use Ilaria RVC zero?
ZeroGPU Is faster
And easier to setup ye
Nah I want
That was an old fork from @finite galleon iirc
Someone said jammable gives more realistic results
Jammable is the new name of voicify, one of those scam sites that use RVC and paywall it
They pay people to promote their shit, used to do it alot on tiktok
It's just a scam, just like kits.ai
They ALL use RVC
Yes...
Just like kits.ai
They pay people to promote their shit and make others use their site, that's prolly why u heard that lol
Then why did that one person tell me it's the best I mean I've inferenced the same stuff with applio and the one from jammable has better sound
Jammable is just pay-walled RVC tho, I don't really think there is any difference in the output
https://vm.tiktok.com/ZGdeXuXxS/ this is made with jammable.
If you really think there is, might be the audio wasn't cleaned well and in jammable they just cleaned it a bit more, other than that there is no other way as it's just RVC
Yup it's just RVC lol
There's tons of those ai covers site
At the end they all use RVC
But it sounds more realistic than what I get
Like I can't get such realistic results I even use the exact same models
I don't think you can import external RVC models to jammable, so that's prolly just based on the RVC model quality that you have used in applio
The reason why u can't import is not bc it's different, it's just like kits.ai
it's just their models, prob comparable to the one in #1191429836321849435
Bc it used 2 different RVC models
Still RVC tho
The person I literally messaged and talked to literally has credited the models and creators!
U ofc need to get a good quality RVC model for having good quality conversion
I use the best ones
It's literally just RVC 😭
You are just comparing 2 different RVC models
Five of them are literally the same I use
See?
And the cover is still better than what I can convert
yea I'm thinking it's literally another kits.ai
They're the same ones I have
It is, those guys used to steal my fucking covers to promo their voicify shit
They used to steal covers on TikTok and put the voicify logo on it too
Bc people get paid to promo that shit
who is this bot @brittle wing ? 
In the jammables site it doesn't show that u can import external RVC models
So they have their own BTS models here?
Yeah, as u see on the jammables site, it doesn't let u import models, u can only train them on their own site for money lol
Then why did this person literally credit the creators with the exact same models I use
Do the developers steal models from here and profit off them for money?
Not really sure about that
But the models can't be imported any other way as I just checked
My two old models are probably here hahaha
They look like they got them from weights...
I remember there were some models stolen when they were called voicify so prolly yea
The models aren't that good but
Ayo? @brittle wing level 7 !!! 
How do they sound realistic here
You need to properly clean the vocals
Which ones for inference or training
Cause there's no longer need for that, Melband karaoke doesn't generate noise
Both, u need to have a good quality model trained using the tensorboard ofc
Nah I'm tired of training models my second one was good but I overtrained
I'm just lazy
Then rip
What-
But Melband karaoke doesn't generate noise when removing the reverb, it's not MDX or UVR architecture
Roformer is noise free
i mean that ofc you need a good quality model to generate good results, i seen u said u overtrained
I already got one that sounds the same
But isn't overtrained
"overtraining" based on tensorboard graph?
Don't ask me too much questions I'm not stupid I know how to train models
Well, it was last year. The dataset was shorter than what it's supposed to be, 750 epochs. I probably looked at the tensorboard or I didn't, I don't remember
Is RMVPE+ better than RMVPE? Harmonify uses that too
Ilaria gives better results than applio
Clearer vocals
But tends to decrease vocals' volume
it may depend on the model's peak and dynamic range, other than the volume envelope option
Is it okay to use Hubert while using a pre-trained model?
Google drive doesnt seem to like RVC
crap i cant send an image
but it basically said I executed code that's "not allowed" in the free tier
Whenever i do applio i get this
When i convert
it gives me nothing
and gives that error
Can someone when i try to convert vocals it gives me errors
-howtoask
This command has been changed to !howtoask
!howtoask
does anyone know how to terminate the .bat file after your done
anyone know whats causing a weird g/fm graph on tensorboard?
my other graphs look fine but this one bouncing around i dont think its a good sign
i have no idea how to download the RVC can someone help me ?
what to do?
⠀
Google Colabs
- AICoverGen-WebUI: Useful for making quick covers, by Hina.
- AICoverGen-NoWebUI: Useful for making covers, doesn't include a UI, by Ardha, by Eddy, Hina and Gdr.
- RVC Disconnected: To train new voice models, by Kit Lemonfoot.
- EasyGUI: The OG interface, by Rejects.
⠀
FM graph will oscillate
as that's the most important characteristics-type metric for models
also, pro tip. Don't use " ignore outliers " or whatever was it called
instead, keep the graph as it is + smoothing at 0.2 to 0.6
Ignoring outliers serves practically no purpose in case of rvc
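For context on what that smoothing slider does: it's basically an exponential moving average over the logged values. Below is a plain-EMA sketch of the idea. `ema_smooth` is a hypothetical helper, and TensorBoard's own implementation may differ in details (e.g. bias correction), so treat it as an illustration only.

```python
def ema_smooth(values, weight):
    """Exponential moving average; weight=0 returns the raw values unchanged."""
    if not values:
        return []
    smoothed = []
    last = values[0]
    for v in values:
        # A higher weight keeps more of the previous smoothed value
        last = weight * last + (1 - weight) * v
        smoothed.append(last)
    return smoothed

raw = [0.0, 10.0, 0.0, 10.0]
print(ema_smooth(raw, 0.5))
```

With a weight in the 0.2 to 0.6 range the oscillation is damped but the overall trend stays visible, which is the point of keeping the slider there.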
To have it more stable in general, you either have to make sure the samples aren't too diverse / too dirty, or just increase the batch size
I've talked about it too many times so I won't go into full detail, but smaller batch_size = noisier and less stable graphs, whereas higher batch_size gives you more stable and " flat like " graphs
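A toy way to see the batch size effect (this is NOT RVC training, just a simulation): assume every sample's loss is independent random noise around a fixed mean, average it per batch, and measure how much the per-step loss jumps around. `batch_loss_spread` and all the numbers in it are made up for illustration.

```python
import random
import statistics

def batch_loss_spread(batch_size, steps=2000, seed=0):
    """Std-dev of the per-step mean loss for a given batch size (toy model)."""
    rng = random.Random(seed)
    step_losses = []
    for _ in range(steps):
        # Toy assumption: each sample's loss is independent noise around 1.0
        batch = [rng.gauss(1.0, 0.5) for _ in range(batch_size)]
        step_losses.append(sum(batch) / batch_size)
    return statistics.pstdev(step_losses)

for bs in (2, 8, 32):
    print(bs, round(batch_loss_spread(bs), 3))
```

Under that assumption the spread shrinks as the batch grows, which matches the "flatter graph" behaviour described above. Real training losses are not independent noise, so only the direction of the effect carries over.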
It is the same " under the hood", just the functionality differs a bit.
The only difference from what I know is that rmvpe+ allows you for range clamping or something of that sort.
Whenever you can, use the + version
Haiii Cody.
Different types of RMVPE Pitch Extraction:
- rmvpe: A Robust Model for Vocal Pitch Estimation in Polyphonic Music, the normal version of the best pitch extraction, it's robust and not sensitive to noise
- rmvpe+: has a pitch threshold, it limits the max and minimum pitch possible, basically deleting f0 values below and above certain thresholds
- rmvpe-gpu: Training ONLY, uses your gpu for the feature extraction process, using more gpu so making training faster
- rmvpe-onnx: Wokada ONLY, its a must for AMD Users who use ONNX models
Its basically the same as rmvpe except it uses a min and max pitch
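A rough sketch of what that min/max pitch threshold amounts to, assuming the f0 curve is a list of Hz values where 0 means unvoiced. The function name and the default limits are hypothetical, and treating "deleted" values as 0 is my assumption, not the actual rmvpe+ code.

```python
def apply_pitch_threshold(f0, f0_min=50.0, f0_max=1100.0):
    """Zero out f0 values (in Hz) that fall outside [f0_min, f0_max]."""
    # Assumption: 0.0 marks an unvoiced frame, so out-of-range values become 0.0
    return [hz if f0_min <= hz <= f0_max else 0.0 for hz in f0]

print(apply_pitch_threshold([30.0, 220.0, 1500.0, 0.0]))
```

Anything inside the range passes through untouched, so on a clean input the result should match plain rmvpe.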
Hey, I'm the person u talking about lmao. I didn't steal models, i imported them on Jammable and used them and gave credits too lol
Ayo? @fleet fiber level 1 !!! 
!howtoask
Yes, u literally can import any ai model out there and put it on Jammable!!
since when ?
Idk since when, i started using just like four months ago
its shit using that
its one of those scam ai sites
just like kits.ai
just use RVC, that's what every of those sites use, its Open Source
It's not, go and try it urself? i tried kits.ai before and it didn't work that well but Jammable is so good like? the results i get are way better that using any other site
applio is also good but it takes time
seems to let u upload models, but why would u pay for something u can do for free??
they aren't way better 😭
its literally just rvc
cuz i have the money and its way easier 😭
Ik, they are but the results are better, the quality is better
.. because jammable just runs RVC in cloud, its just like running RVC on a cloud computing service, jammable isn't running them on ur device
paying for something that is a literal scam and its available for free is crazy
rvc is literally easy to use
it's not a scam tho, why u keep saying that? If the quality was bad and the results were bad then i wouldn't pay for it
you just contradicted urself literally
I said the quality isn't better, and in the same sentence u said that you know but they are better 😭
Ik, i use it too
because it is.. if i sell something that its free for paid, hows that not a scam
I was replying to u saying that they're all rvc not this one
> saying that they're all rvc not this one
Jammables is RVC too
when the quality is better + easy to use + no time spent then it's not a scam to me
Ayo? @fleet fiber level 2 !!! 
That's literally what i said
first of all quality ain't better, about easy to use that's kinda a skill issue ngl
u said in the same sentence that "they're all rvc not this one"
Again, the quality is better. ur just a hater atp and ik how to use rvc, i just don't want to waste my time for an hour or more making a 3min ai cover when i can make it in a minute or less
Not this one meant that i wasn't replying to that message omg
the quality can't be better if they are both using the same program, its like saying an rvc fork has better quality than the other when they all have just the UI different
And about time, that depends on the device it's running on, ofc jammable is just cloud rvc
And it takes less than a minute for Ilaria RVC Zero running on ZeroGPU (A100)
I tried applio, kits.ai and other sites but imo, the results i got from Jammable were better, it doesn't matter if it's the same program, the results were way cleaner and better to me. I use the same models, i use on applio but it's not the same results at all. on applio, the voices are not clean and they crack but on Jammable, the voices are smooth and clean for me
Also, on applio, the site always stops working so I have to start all over again, that's why i switched to Jammable
Are you talking about applio colab?
> it doesn't matter if it's the same program, the results were way cleaner and better to me.
That's literally placebo effect, its like saying the same calculator works better on ur device than mine lol
> Also, on applio, the site always stops working so I have to start all over again
Perhaps you mean applio colab? that's not a bug, Google Colab is a cloud computing service, it can't give free gpus forever so ofc u can't let it run 24/7
Ilaria RVC Zero is already set up, and uses ZeroGPU from huggingface (an A100 is way faster than google colab's T4 GPU)
Now, jr just talking lmao. Talkin about some placebo effect, c'mon now 😭 ik for sure that it's not a placebo effect, i used both many times over the year so yeah, the results on Jammable are better
Yes
its the same program and you literally know it urself..
Ik that, I'm talking about when it's gives those errors mid converting
its like talking to a wall atp
Ofc that occurs because you exceeded the usage limit. That's how colab works.
Including the fact Google got very strict with RVC colabs.
Never used Ilaria before, I'll try it and see
i think he means having to install it everytime ofc as u start a new session
Yet, the results are better for me
Yeah, i've understood that.
That was what i'm talking about.
Yeah, ik that
that's errors that can happen in case u did smt wrong, that's really vague as you don't specify what error u get
Yes, exactly and also when it stops working just three minutes after using it and i have to do it all over again
I'm mostly talking about when ur downloading the model or converting the cover
yea you didn't still specify what error message u get so
for me
Yea for you, cus its the same program
its the same logic as running DOOM on 2 same laptops in terms of hardware with the same OS, the results won't change
Yeah, i don't rlly remember what messages i used to get but the point is, i still had to start all over
yes, they do change omg but u won't know that, unless, u try it urself
i cant really help without u telling whats the error message tho, it depends on what step u did wrong or if u downloaded a model with unnecessary files or if u finished gpu quota, or etc
I'm literally the girl you chat with on Instagram. I didn't blame you, I thought the website creators steal RVC models and put them on the site without giving the creators credit and make money off of that
> they do change omg
If you run on the same exact copy of a game, on a same exact pc, how do u expect to have different results 🤦♂️
Do you import the same model files you use in RVC (.pth files) on jammable?
omg, ur getting on my nerves 😭😭 they do change, the results are better, the quality is better. I have nothing else to add cuz atp, go and try for urself
Ik, i just wanted to clarify that i don't steal since the convo was mostly about me, ur good dw
you are literally using a logic in which doing 2+2 on 2 PCs would have different results 😭
yes, and the results are better there too 😭
OMG 😭😭 JUST ACCEPT THE FACT THAT IT'S BETTER FOR ME AND MOVE ON 😭
Bro stop it's literally a good site
it's factually not better than rvc tho as it's literally the same program
same or not, again it's better in the sense of quality
Ayo? @fleet fiber level 3 !!! 
Level three of what huh
It's not. (Mate, you're wasting your money there)
it is the same.. u can literally import the RVC models
Bro from the cover clip I sent you yesterday they're literally better
if it wasn't the same u wouldn't be able to do that, it would end up like trying to load a GPT So Vits Model in RVC, which wouldn't work
I'm not talking about the models, ik if the models are bad then the results will be bad, I'm talking about the quality here
Exactly, that clip u sent was mine and i made it on Jammable
the quality literally relies on the model and input audio you give it to inference with
There is no literally other way the quality can change on a same program
I know
Are you serious there is a difference
okay, i agree with this but the quality also depends on the site ur using bcz tell me why i get different results with the different site? if they're all the same then why the quality and results are bad on other sites but good on another?
I said that applio and harmonify outputs have a little difference in sound
> but the quality also depends on the site ur using
It doesn't, what the site is doing is just rvc cloud computing, its running RVC on remote PCs, letting u use the service (similarly to colab)
Bro you aren't deaf come on
it's literally so different on each site😭 like why i get bad results on applio with the same models, same input but it's better on Jammable with still the same models and input 😭
Now, you're not trying to listen at all
ur just trying to prove ur point when it's literally wrong
@fleet fiber 💜
can say the same to both of yall
i literally explained yall how those sites work
its not like they have magic servers that make it run with better quality
And... that's why it's paid cause it's better
because its paid it doesnt mean its better..
its the same program
which is literally Open Source
it's still better
Itsss the same program okay but the sound is more realistic
Idk what kind of coding they set here
its like saying iZip is better than Winrar just because you are forced to buy an iZip license
It's not the same
Atp, just go and try it urself then cancel the subscription after that cuz u won't believe it unless u try it
its the same in terms of training and inference, if it wasn't it wouldn't be able to run the rvc models u imported
Maybe different settings. In Applio you can configure how much index to use, volume envelope, etc and that also contribute to the final output but still rvc
Also Mel-roformer karaoke model on X-Minus sounds better than the one on MVSEP.
That's an example
Okay but how are the output more high quality and realistic
Explain this
ik that it's still rvc and yes, bcz the settings there are different and even on Jammable i can also control the volume + extract the bv, instrumental on the same website and i can even add autotune and effects so maybe that's why the quality is better idk
There are different versions of mel roformer so that's true but RVC is still the same
I don't think it's the same
this argument feels like trying to explain iOS users why sideloading is good but they think every file is a virus 😭
Actually it is.
it is the same, there's no difference in quality between mainline, applio, ilaria rvc, etc
Have you tried them all
Have you tried jammable sir?
well if that's what u think bcz i only tried applio so yeah, Jammable was better for me idk about the other sites
Well currently imo the best thing is Ilaria with 0 gpu
Why i would when i got kaggle, local and applio?
That says enough
Yes personally I wouldn't pay money too but the result is realistic. To each their own
On my humble opinion, there is no such thing as a realistic model/output.
then u wouldn't know, all that talking was for nothing
That's what I said, rmvpe with range clamping, in this case pitch, using hz values
¯_(ツ)_/¯
Well said
That says enough, that's why you can't tell the difference.
They 4 are the same mate.
There is no difference.
I personally see realistic models not as something you can't differ from real voice as that's sadly not achievable yet, at least not in standard scenario, but rather a model of which output makes you want to replay 1 or 2 times more as you're wondering if it's for sure AI or not
btw, here, catch 🥬
I still can’t understand fm
So is this supposed to go down like all of metrics or is normal that it always goes up? Noobies told me is supposed to not go up
All metrics are supposed to go down
just that FM from what I've noticed, at least in those rvc builds I tested including mine, usually is reversed
( the graph )
yet the value on it's own should go down
so is like a visual bug?
either a bug or just a quirk devs left in the code
tho I myself couldn't spot what causes it last time I checked the code ( might be I didn't notice it but ye )
uhhh so how we can tell if fm is fine?
just look at the value itself
rather than visual graph
if it's reversed in ur case, as in, up = down ( based on what values tell you )
then ideally it should be going up ( virtually, down )
They decreased for me
then if values decrease and graph is descending, it means it's all good
and just how it should be
values down + graph down = good ( normal behavior )
values up + graph up = bad ( normal behavior )
values down + graph up = common bug / quirk of rvc, but still = good
values up + graph down = common bug / quirk of rvc = bad
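In other words, the verdict only depends on the logged numbers, not on which way the plot appears to slope. A small sketch of that rule, comparing the average of the first few logged FM values with the average of the last few. `fm_verdict` and the `window` size are made up for illustration, and averaging over a window is just my way of ignoring the normal oscillation.

```python
def fm_verdict(values, window=3):
    """Judge FM by the logged values only, ignoring how the graph is drawn."""
    if len(values) < 2 * window:
        raise ValueError("need at least 2 * window logged values")
    start = sum(values[:window]) / window
    end = sum(values[-window:]) / window
    if end < start:
        return "good"   # values went down
    if end > start:
        return "bad"    # values went up
    return "flat"

print(fm_verdict([5.0, 4.8, 4.9, 4.2, 4.0, 3.9]))
```

You'd feed it the raw scalar values exported from TensorBoard rather than eyeballing the curve.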
You said higher batch sizes helps fm to be stable, when i do that visually the graphs goes up very fast at the start of the training
Yesterday i tried a small batch size and it went down with a few fluctuations but it did not go up forever
why that happen?
🐢: observes you both
higher batch size contributes to more stable models because more data " at once " ( too complex to explain in easy words so imma skip it ) is used
however, it can lead to too flat converging or just overfitting quicker / more often
there's also that thing if your dataset is too diverse
for instance, weird balance of speech / singing / screaming
and you use high batch size, model might confuse the stuff during inferencing so it's best to, in that case, go for lower batch or balance the ratio
hmm interesting i see
as for lower batch sizes
What if i use a clean 23 min dataset with 8 on batch?
it's either your set is small ( yet balance is kept in the set ) so that's what it is
or your set's balance is kinda weird or broken, then it should help
then it's a matter of how diverse is it
in terms of characteristics
i.e.: screaming, shouting, whispering etc
If it's just speech with a bit of laughing?
if it's rather uniform and balanced
high batch size should be good ( for ur case leo ) but sometimes you can decrease it to improve results / generalization
Oh yeah this was a 8 min dataset so thats why it helped
Nuevos trucos aprendidos jaja.
Gracias Cody.
alternatively
if one wants to maintain the batch size but not tinker too much with the set
just add more " normal samples " and keep the " diverse / outlier ones / unusual ones " in 2-5% whereas 95-98% is your standard ones
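To check where a dataset sits against that 2-5% guideline, a quick sketch that works on sample counts. It assumes you've already sorted your clips into "normal" and "unusual" yourself; `outlier_share` and `within_target` are hypothetical helper names.

```python
def outlier_share(normal_count, outlier_count):
    """Fraction of the dataset made up of unusual / outlier samples."""
    total = normal_count + outlier_count
    if total == 0:
        raise ValueError("dataset is empty")
    return outlier_count / total

def within_target(normal_count, outlier_count, low=0.02, high=0.05):
    """True if the outlier share falls inside the 2-5% range from the chat."""
    return low <= outlier_share(normal_count, outlier_count) <= high

print(within_target(960, 40))   # 4% outliers
print(within_target(800, 200))  # 20% outliers
```

If the share comes out too high, the suggestion above is to add more normal samples rather than delete the unusual ones.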
too diverse can cause fm to be unstable? The more you know
yes because FM is feature matching
matching of features
if there's too high diversity, model is confused a lil
on what is the " right " features' distribution
more or less, simplified
Because, well, I talked about it a lot as it's crucial
Diversity is fine and sometimes welcome, yet uniformity, so a bunch of data that can keep its identity, is the most important
also other thing that happen to me is that rmvpe fm graph always went up regardless of batch size but when i switched to mangio crepe hop 32 and batch 2 made the graph not going up visually but instead looked like the one i sent
the end result was fine like u would expect to sound like
lower batch sizes usually have lower loss values
I used to have two water turtles as a kid and turtles were my favorite animal
Anything that contributes to " learning " stable features representation can do it
it isn't always the set on it's own
can be just
well, voice's complexity and it's nature
The sound is different, you haven't even tried jammable, but to each their own. I won't argue
Now pardon me folks, gotta go back to sorting samples.
Ripped all data from danmachi game lol, over 30k samples to sort 💀
take care!
share a raw jammable file
the way it comes out the website
I haven't paid for jammable
then how would u know if it comes out any different owo
But the person who uses jammable has messaged me and I compared both
okay, may we have a listen too?
Hm I have to ask this person for a sample first
didn't he already give u a sample so you could compare the 2
The cover they made is realistic and made with jammable, proves it
But it has reverb and music
yeah, thats the thing, you can manipulate the fidelity of the audio post processing
Thanks for explaining that stuff to me, it can be very confusing 😭
@analog obsidian Sure thing! If you need more technical info, please search for my msgs on this server. I do leave explanations or stuff here n there, incld. metrics etc.
Okay how do I manipulate to make it more realistic, any tips?
I'm waiting for the person to reply
im not an audio engineer myself so i dont know how to master audio and polish it
Mastering?
Ohhhhh
It's clear
mixing and mastering are what audio engineers are for
they clean up vocals and then enhance their clarity
Bandlab has a mastering feature
yeah u can mix and master with bandlab
Yeah but I applied the exact same effects on the exact same output and song and model, yet theirs sounds more realistic
What do you wanna achieve?
gimme a brief tl;dr
yeah cuz it also takes skill to train a decent model
both models were trained on different datasets
I literally used the same model they use also I use the best models available
yes but jammable doesnt share the model weights they have, so the model u 2 had couldve been worse than the one on jammable
You can literally upload models there
It's the same exact model
Again
Bro if they were two different things u wouldn’t be able to use your rvc model there
then with that in mind we have cracked the case; they use post processing to make their outputs sound better
simple as that
What post processing?
Can I manually do that
How can Litsa know what was the exact post processing chain they used
well they could've used some sort of de-esser or eq to smooth out some frequencies and make them more natural, and yes u can manually do that, you just have to learn how to. Now i don't know what exactly they run the audio files through since i don't have the code they use at hand, so i can't tell u anything accurate about that :p
im a wizard now :p
Hes a Magic dog
It's some kind of audio sfx
They probably do
de-essing, eq, compression, saturation, there's tons of what can be done
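To give an idea of what such a chain does, here is a very rough sketch in plain Python. It is not Jammable's chain (nobody here knows what they run); the function names, thresholds and the one-pole filter standing in for "EQ" are all assumptions for illustration. A real de-esser would be a compressor that only reacts to the sibilant band, which is left out here:

```python
import math


def one_pole_lowpass(samples, alpha=0.2):
    """Crude stand-in for an EQ move: smooths harsh high frequencies."""
    out, prev = [], 0.0
    for s in samples:
        prev = prev + alpha * (s - prev)
        out.append(prev)
    return out


def compress(samples, threshold=0.5, ratio=4.0):
    """Static hard-knee compressor: level above threshold is divided by ratio."""
    out = []
    for s in samples:
        level = abs(s)
        if level > threshold:
            level = threshold + (level - threshold) / ratio
        out.append(math.copysign(level, s))
    return out


def saturate(samples, drive=2.0):
    """Soft tanh saturation, normalized so an input of 1.0 stays at 1.0."""
    norm = math.tanh(drive)
    return [math.tanh(drive * s) / norm for s in samples]


def chain(samples):
    # order matters in a real chain; this order is just one possibility
    return saturate(compress(one_pole_lowpass(samples)))


# samples are floats in [-1.0, 1.0]
print(chain([0.0, 0.9, -0.9, 0.3]))
```

In practice you would do this in a DAW (BandLab included) with proper plugins rather than by hand; the point is only that each stage is a simple transform applied to the raw model output.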
yeah what codename said
Wolf
Hes a husky dog
Maybe you could try to reverse-engineer the impulse response?
Is it possible with bandlab
Read up on IRs (impulse responses) / convolvers
i dont work for the company so i dont have their code at hand
Magical husky dog
get the exact same input they used and make sure it's the same model
get an IR (impulse response) out of it, effectively replicating the effects' chain
and apply it over ur model's output
tho, I won't teach you how it's done as that's not my specialization, just proposing an idea ~
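A toy sketch of that idea, assuming you have the raw ("dry") model output and the processed ("wet") version of the exact same audio. It recovers the impulse response by recursive time-domain deconvolution, which only works on clean, noise-free data where the dry signal starts with a non-zero sample. On real audio you would want a regularized FFT-based deconvolution instead, and an IR can only capture linear effects (EQ, reverb), not compression or saturation. All names here are illustrative:

```python
def convolve(x, h):
    """Full linear convolution of two lists of floats."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def estimate_ir(dry, wet, ir_len):
    """Recover h (length ir_len) such that wet is approximately convolve(dry, h)."""
    if dry[0] == 0:
        raise ValueError("dry signal must start with a non-zero sample")
    h = []
    for n in range(ir_len):
        acc = wet[n] if n < len(wet) else 0.0
        # subtract the contribution of the already-recovered taps
        for k in range(1, min(n, len(dry) - 1) + 1):
            acc -= dry[k] * h[n - k]
        h.append(acc / dry[0])
    return h


# pretend "their" chain is this unknown 3-tap filter
hidden_ir = [1.0, 0.0, 0.3]
dry = [1.0, 0.5, -0.25, 0.125]
wet = convolve(dry, hidden_ir)

ir = estimate_ir(dry, wet, ir_len=3)
my_output = [0.2, -0.4, 0.1]
processed = convolve(my_output, ir)  # apply the recovered IR to your own output
```

Loading the recovered IR into any convolver plugin does the same job as the last line.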
Ohhh I couldn't tell from the drawing
Understandable
these sorts of websites usually do that to attract new users and make the output sound more appealing to them. The raw outputs are usually the same though, post processing can make a huge difference.
just keep that in mind
