#✨│ai-help
or rather, go webui bat
no need to play around environment
wdym by " not for me "
so wut now
the webui
ok thank you
anyways
is there a specific time of audio
i have to put
for the original voice
tbf... I still haven't fully figured it out but, seems like proper labeling / transcription + audio's quality itself is more important than the total length of the dataset ( samples ) butttt
I'd say, anywhere from 8 mins to 30 should be just fine
( I'd probs say, 15 to 20 mins being a golden middle )
but tbf, you could go with even 2-5 mins and it'll do okay ish
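To see where a dataset falls against those rough ranges, the total length can be summed with Python's standard `wave` module. This is only a sketch: the function names and the wording of the ranges are made up for illustration, and `wave` only reads plain PCM `.wav` files.

```python
import wave
from pathlib import Path


def total_minutes(dataset_dir: str) -> float:
    """Sum the duration of every .wav file directly inside dataset_dir, in minutes."""
    total_seconds = 0.0
    for path in sorted(Path(dataset_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            total_seconds += wav.getnframes() / wav.getframerate()
    return total_seconds / 60.0


def describe(minutes: float) -> str:
    """Map a duration onto the rough ranges suggested in the chat (hypothetical labels)."""
    if minutes < 2:
        return "below the 2-5 min 'okay-ish' floor"
    if minutes < 8:
        return "usable, but short of the 8-30 min range"
    if minutes <= 30:
        return "inside the 8-30 min range"
    return "above 30 min; quality and labeling matter more than extra length"
```

Something like `describe(total_minutes("my_dataset"))` (folder name assumed) gives a quick sanity check before transcription and training.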
( And again, rvc quality enhancing could always be an option
As in, rvc's output for gpt-sovits samples?
orrrr, feeding rvc with gpt-sovit's outputs?
bro
me english not main leanguage
haha
i jus wanna know
what i originally put
for the audio
do you mean:
Using rvc's generated output as input for gpt-sovits?
or
Using gpt-sovit's generated output as input for rvc?
Crazy.
aka, rvc -> gpt-sovits or gpt-sovits -> rvc
Because if that's the case then I wouldn't recommend the first
instead, the latter, for sure
Use RVC audio into GPT-SoVits or put GPT-SoVits audio into RVC?
well
it definitely won't go that well
being fully honest with you, gpt-sovits is way more sensitive to bad audio
than rvc is
It's just less forgiving
+, doing AI -> AI in terms of making models vs improving the output is overall not the best idea
You see, rvc itself causes like a 15 to 40% loss of the voice's fidelity in the first place
and then gpt breaking it further
but like do i have to use only voice clips from youtube of the character voice i wanna clone?
However, it works well when you do gpt - > rvc because rvc models are, baseline, higher in quality
that's why it will work
Well, more or less yes
Then there's transcription
processing stuff and ye, training
train gpt-sovits
train rvc model
Get output from gpt-sovits to have accurate speech and stuff ( according to your character's style etc )
and then feed rvc with gpt-sovits' outputs
aka. you improve sovits' output quality thanks to rvc's superior quality
so ye, gpt-sovits -> rvc ( not training tho, just inferencing / " cover " )
and then i can use the voice for tts?
no, you get your character's speech or whatever from gpt-sovits
and that output ( low quality ish )
you give to rvc
then rvc outputs you that, but better
rvc's only purpose in here would be improving the quality, nothing else
okay i see
and where do i get from the orginal voice of the character
before gpt sovits
youtube?
Well... this is actually up to you
You see, I work with Anime / japanese content so
I can as well use anime blu ray rips and isolate the vocals
yeah maybe i will do the same
or I can use visual novels, games etc ( and rip the files
thats why
oh
i have davinci resolve studio after so i can isolate vocals easily
its just abt where do you find them
well, you'd better use uvr / mvsep and the fv4 model for isolation
I can confidently say it's one of the best if not the best atm for " full voice " extraction
Either way, idk man where you can find it
did u test the davinci one once?
Davinci resolve is not a dedicated software for voice separation
I used Demucs to separate background noise from an audio track. 
It is just a bonus
whereas uvr / mvsep are made for that, and so are the models, for specific usages, with specific traits
yeah true
so yeah, as they say, pick your poison 🤔
which one do you use
gabox's voc fv4
oh
You can find it on hugging face
is it uvr or mvsep?
both
I just prefer uvr atm
mvsep has some issues on my end so
either way, it'll perform the same way here n there
if you prefer mvsep use that, if you like uvr, you can use that as well, no difference ( maybe in speed? but yeah, haven't tested it on mvsep so can't say ). btw, by mvsep I mean the local one, not the website
ok thank you for everything brother
i'll try to start
Np
Best of luck, and in case of problems, there's ' Audio separation ' discord, they can help you out
uvr and mvsep devs and such, in there
oh yeah wait before forgetting to ask
whats the rvc software u use
I use my fork
fork?
Codename's rvc fork version 3, based on Applio: codename0og/codename-rvc-fork-3 on GitHub
oh
Just my own take on applio, let's put it that way
i thought u were making a joke with that
oh
mb
Can someone help? Why i cant download the audio
Is this android
monkey
Welp. I keep getting an error
"Process Process-1:
Traceback (most recent call last):
File "C:\Users\User\Downloads\Codename-RVC-Fork-V3.0.4\Codename-RVC-Fork-V3.0.4\env\lib\multiprocessing\process.py", line 314, in _bootstrap
self.run()
File "C:\Users\User\Downloads\Codename-RVC-Fork-V3.0.4\Codename-RVC-Fork-V3.0.4\env\lib\multiprocessing\process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\User\Downloads\Codename-RVC-Fork-V3.0.4\Codename-RVC-Fork-V3.0.4\rvc\train\train.py", line 590, in run
reference,
UnboundLocalError: local variable 'reference' referenced before assignment"
It happens whenever I start trying to train
I can do every other step including generating the index
its just training
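For context, `UnboundLocalError` is the generic Python error for reading a local name on a code path where no assignment ever ran. The sketch below is not the fork's `train.py`; the function and argument names are hypothetical and only show the pattern and one common way to avoid it.

```python
def pick_reference(use_custom: bool, files: list[str]) -> str:
    # 'reference' is only bound inside the branches below.
    if use_custom and files:
        reference = files[0]
    elif not use_custom:
        reference = "default"
    # If use_custom is True but files is empty, neither branch ran,
    # so reading the name here raises UnboundLocalError.
    return reference


def pick_reference_safe(use_custom: bool, files: list[str]) -> str:
    # Binding a fallback first means every path has a value.
    reference = "default"
    if use_custom and files:
        reference = files[0]
    return reference
```

An error of this kind usually means some earlier step did not produce what the code expected, which is why checking for earlier tracebacks and for missing files is a reasonable next step.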
I have three datasets, two of 18 seconds and one of 24 seconds, how many epochs are recommended for each?
like this outdated video https://www.youtube.com/watch?v=bP8AMf20MAY&ab_channel=kalozalt can I make real time voice change?
Oh you were the Mac guy
ya
RVC = Retrieval-based Voice Conversion, the best speech-to-speech AI models (on v2). It runs inference (uses models) on pre-recorded audio (AI covers) and trains (makes) models
Wokada = uses RVC for realtime inference
Use #🔍│help-w-okada
Yeah I'm trying to use the speech to speech, my bad with the wordings @low shard
You awake to help me out?
Yeah you meant realtime voice changer for calls or just ai covers?
that's weird
show me your log folder
The former yeah but not for calls. itll be for content similar to the start of this video https://youtu.be/NTfVUDYSmpE?si=pnOhAE1iC9nES--n
aka titled How to Dub Anime with the help of AI
Hmm.. does the reference folder contain files?
Hm. That's quirky ngl
That error shouldn't be triggered under any circumstances
What were the steps you did prior to training?
preprocessing, feature extraction, index and training?
Well, tell me if you got the zip or cloned the repo
Wdym
I guess I dont really need realtime voice changer but just that functionality of speech to speech that doesnt sound so AI. wups.
how did you get the fork, via repo dl or zip dl ( releases )
Zip I think
lemme inspect the training script rquick
oh lol
Yea, no abnormalities there
In fact, it never happened to me or any other users
huh 🤔
Because it technically can't
Bro why do I get cursed with new errors
Did you click or tweak anything specific?
I didn't really tweak much
was there any other traceback ( earlier ) in console?
Wdym traceback
Yeah
well, weird shit weird shit, completely without any logical explanation
Move the fork folder to C drive
c/fork/run bat and all the rest
Bet
and retry, lemme know if that fixes the issue
and if not, this time fully inspect the console to see if there's no earlier tracebacks
yea?
An error occurred extracting file C:\ApplioForkTraining\logs\AdamBudgetCuts\sliced_audios_16k\0_0_11.wav on cuda:0: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
it says that for every processed dataset file
thats like 50 errors
@simple ore Any idea why that'd happen? ( zluda ) or is there something I don't know but you do
Either way @formal wind, I'm not sure what's the issue
From what I remember, zluda should work just fine and support the training ( otherwise what was the point of it in the first place as directml existed in rvc )
so I suppose, Noobies would have to take over this case
but all I know for sure is that it ain't the fork
changes I've made aren't affecting that part of functionality so it must be your pc specific thing or indeed something zluda-related I am unaware of
for amd it is best to train in the cloud; even if you configure zluda well, the performance is still terrible compared to nvidia
to be honest it's not worth the suffering of going the zluda route for such horrible speeds
So what am I supposed to do here exactly
the whole deal was about using newest applio
Unless you know colabs / kaggles that use the up to date stuff
kaggle uses the latest applio
( matter of logging
What...
What applio is the latest
oh, welp 🤔
there are alot of applios on there
just remove that
welp ¯_(ツ)_/¯
Oh I use that one alrighty
it'll download main branch instead
I didn't use kaggle so, wouldn't know
Well all the struggle for nothing lol
which has the very good avg logs
Either way, think it'll be time to start making my own kaggle
🦈 🫂
for cases like this
noobies has done zluda stuff, in the end only the vram matters
no problem, just dont forget to remove that line
any better ai voice models than weights?
I ran chmod +x run-applio.sh
./run-applio.sh to run on macOS using these instructions https://docs.applio.org/applio/getting-started/installation but it's either waiting for an input or it's running indefinitely
what did i do wrong?
empty train loader
no I mean, why the extraction fails
no vram?
Traceback (most recent call last):
File "/Users/name/Desktop/AI/Applio-3.2.8-bugfix/app.py", line 22, in <module>
from tabs.inference.inference import inference_tab
File "/Users/name/Desktop/AI/Applio-3.2.8-bugfix/tabs/inference/inference.py", line 17, in <module>
from tabs.settings.sections.restart import stop_infer
ImportError: cannot import name 'stop_infer' from 'tabs.settings.sections.restart' (/Users/name/Desktop/AI/Applio-3.2.8-bugfix/tabs/settings/sections/restart.py)
well, then that's some abnormality
like maybe he did not slice anything
and trying to extract features on 20 min long tiles
All i did was download zip chmod +x run-install.sh
./run-install.sh
After installation, run:
chmod +x run-applio.sh
./run-applio.sh
and it returned those errors. Could i get help on this?
20 min file unsliced? 
did you screw with /assets/config.json?
no I don't know how to touch allat so i just downloaded zip unzipped and ran the codes in the instructions
if i did tho how can i fix it?
then you screwed something up
because I've seen this error recently and that was from someone fucking up the config
How did he fix it?
by not messing with assets/config.json, where he had forgotten to put a comma
i dont even know where that is tho
I did lol
welp
{
"theme": {
"file": "Applio.py",
"class": "Applio"
},
"plugins": [],
"discord_presence": true,
"lang": {
"override": false,
"selected_lang": "en_US"
},
"flask_server": false,
"version": "3.2.8-bugfix",
"model_author": "None"
}
yea i def didnt go in this file and mess with anything
that looks fine, so something else is messed up... did not unzip properly or something
aka missing i18n files with translations
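A quick way to rule out the forgotten-comma kind of mistake is to let Python's `json` module parse the file and report where it stops. A minimal sketch; the function name is made up, and the two sample strings only reuse keys from the config pasted above.

```python
import json


def check_json(text: str) -> str:
    """Return 'ok' or a message pointing at the first syntax error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return "ok"


good = '{"flask_server": false, "version": "3.2.8-bugfix"}'
bad = '{"flask_server": false "version": "3.2.8-bugfix"}'  # comma missing

print(check_json(good))
print(check_json(bad))
```

For the real file, the text could be read with `Path("assets/config.json").read_text(encoding="utf-8")` from the Applio folder and passed to `check_json`.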
is there some other way to unzip properly?
is that linux?
well, duh
lol
thank you!
dont expect to be able to train
I wont, im just gonna use it for speech to speech
and for that I would find one from the voice models right
start with the start, what GPU, did you follow the amd install guide to the letter?
yeah, inference runs on CPU though, so it may be a bit slow
but it is what it is
I'm not doin' local applio
I don't got the brains to keep tryin' that
why does your error mention a windows local path then?
No Like I mean I dont wanna attempt local anymore
I was trying to fix local
what's your GPU again?
AMD Radeon RX 6600
8GB VRAM should've been fine
as long as you follow applio's installation instructions properly
but colab should be just as fast
so not much point to run locally, you can train stuff in the cloud and play some games while it is happening
Bet thanks!
any fixes for "failed to load asio driver / error 0"? im trying to start the rvc and get the voice changer working. i have selected flex asio for both inputs and cmd says "failed to load asio driver / error 0". cant seem to get it working
use this channel - > #🔍│help-w-okada
help-rvc is for rvc, not wokada
oh mb
what is the latest version of rvc?
anyone got any methods to reduce choppiness? some words aren't said even though im using a high end gpu
Increase the chunk and extra, and fiddle with the noise suppressors ( on / off, or change type ). Also keep a proper distance from the mic ( and do not use index )
Also As Lyery said, use the w-okada channel
how long on average does it take to train a model on roughly 300 lossless wav files
a wave file can have varying length
20 seconds each roughly
aside, you can't really estimate it
sadly
This is not linear like that, unfortunately
As I said, you can't estimate it at all, not even a chance
In machine learning you use the metrics to know when the training's " done "
In case of rvc / applio, you use the tensorboard
Aside, using sets above 1 hour has almost no point unless you're absolutely sure there's a shit ton of variety in there and I mean it, meaningful variety and diversity
Else you're exposing the model to a risk of biasing towards data of similar patterns and that in consequence, can lead to overfitting / bad generalization = bad model
i see
Fear not tho, the docs ( and so, the guide ) have some nicely explained stuff
matter of checking them out
Yo, how to enable FP32 on Codename Fork?
enabled by default
dw
fp16 got removed (in the latest version)
hey, new here. might be a stupid question. Is there any way to use trained beatrice v2 models like you can with rvc models?
yes, with a vst
Oh okay, thanks for the info! So FP32 is enabled by default in the latest version, no need to change anything.
sorry can you explain what a vst is
beatrice is not meant to be used in a webui like rvc
but more in a DAW
like fl studio
The ultra-lightweight, ultra-low-latency AI voice changer "Beatrice" has been officially released! 🎉🎉
Let's all become Tsukuyomi-chan on a single CPU thread!!
https://t.co/dtkPZO0hqa
#つくよみちゃん
^
thats the only way to use beatrice
outside original w-okada
every beatrice info available is in japanese tho
thank you so much
Wait should I be looking at loss_avg or loss in tensorboard
avg loss
ignore old loss graphs
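An averaged curve is easier to read because the per-step loss jumps around a lot. The sketch below uses a plain moving average purely as an illustration; how Applio actually computes its avg loss is not stated in this chat, so treat the method and the function name as assumptions.

```python
def moving_average(values: list[float], window: int) -> list[float]:
    """Average each point with up to window-1 preceding points."""
    smoothed = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the value that just fell out of the window.
            running -= values[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed
```

A smoothed curve that has flattened out, or started climbing again, is a clearer stopping signal than any single noisy point.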
Bet
- How to use RVC Mainline Colab by Cauthess
- AICoverGen Colab Guide by Eddy (Spanish Helper)
- Create a model with RVC disconnected (colab) by Angetyde
hello, I'm sure I'm doing everything right, but I get an error along the way. Can you help me? " mv: cannot stat 'MMVCServerSIO.py': No such file or directory
/content/voice-changer/server/HVoice.py:3: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
from distutils.util import strtobool
Traceback (most recent call last):
File "/content/voice-changer/server/HVoice.py", line 10, in <module>
from downloader.SampleDownloader import downloadInitialSamples
File "/content/voice-changer/server/downloader/SampleDownloader.py", line 12, in <module>
from voice_changer.RVC.RVCModelSlotGenerator import RVCModelSlotGenerator
File "/content/voice-changer/server/voice_changer/RVC/RVCModelSlotGenerator.py", line 4, in <module>
import torch
ModuleNotFoundError: No module named 'torch'
WARNING:pyngrok.process.ngrok:t=2025-03-12T02:21:03+0000 lvl=warn msg="Stopping forwarder" name=http-33487-51a86839-104e-4c55-9bd3-f2b82bb15f8f acceptErr="failed to accept connection: Listener closed"
--------- SERVER STOPPED! --------- "
syntax error moment
generating sola buffer?
my voices just suddenly dont work? i havent used rvc in a week and they just wont work. no audio when i select models 
i also checked my mic , it works
rvc is not w-okada
rvc stands for retrieval based voice conversion, not realtime voice changer
w-okada is a software that allows u to use rvc models in realtime
so this channel is for rvc (not the voice changer)
which colab with web ui + vocal remover from youtube is the best? the one i saw last year doesn't work anymore 😦
rvc or vocal remover? the latter baked in rvc is nothing but a bloat while there are better alternatives:
- [UVR5 UI](#📰│dev-updates message)
- [MSST colab inference (tweaked version)](#📰│dev-updates message)
i was adding my ai vocals on a song. i was putting song Let's say Metallica - Enter sandman , and i was adding Elvis presley voice model, colab were downloading the youtube metallica song+removing vocals+adding elvis voice on it. it was great but now doesnt work
colab has banned the downloader, so you'd better do it manually first
but i cant even open the webui, i think it's a gradio or ngrok thing. i wish the problem was only downloading from youtube 🙂
share a screenshot of the issue
and what google colab link do u use
and what’s ur pc gpu
@cosmic spire
what
bro
Don't ping anyone without reason
i just said help
can u help then
How To Troubleshoot 
- Don't simply mention your issue, like "my rvc is not working".
- Describe the step you are on, what you're trying to do, the RVC you're using, a screenshot, etc.
- The more context, the better.
- Don't be desperate. You can ping a Helper, but if they ignore, they aren't available/don't know the answer.
- It's okay if you're frustrated, but don't take it into this server.
- Don't DM without prior consent.
- Don't ask for every little instruction. Put your own effort & test things by yourself.
- Don't ask to ask.
- Check if your answer is a Google search away/on our guides website.
can u help then
How about I tell you how to behave?

Describe your problem and what you're trying to do with RVC. If you want help with W-Okada, the realtime voice changer, go to #🔍│help-w-okada.
Don't ask someone to ask you back. That's a bad thing to do in a help channel.
Hi, on applio, I need to merge 2 voices but when I put the two pth files of the templates and click on fusion it gives me an error
I don't understand, what do you want to read?
Um, damn it. The screenshot of cmd of Applio you're running.
oh, ok
do you have to put only the pth files to merge 2 models?
That looks like a lot of problems there.
I tried several times
Does someone know why i don't have the faster whisper v3 turbo asr mode in my ui?
just have this
@glacial pollen
Opera GX spotted. 
Use Google Chrome, Mozilla Firefox or Microsoft Edge instead.
not related to his issue anyway
Can you tell me what the problem is then, pls?
you'd better search the GPT sovits support server
i'm here now do you know if i have to check "choose audio" or no?
did that
new problem on top
I NEED line of code or that 1 file that can fix the split bug infer for Applio Kaggle, or if anyone here know it feel free to send
im sure it is already in the main branch
by any chance you dont know if i have to check yes in "choose audio" (image on top)
Yo, I need some help. Do you know which one is the best between HiFi-GAN, MRF HiFi-GAN, and RefineGAN for RVC Wokada?
hifigan
refinegan got deleted in the latest applio update due to issues
gives models a robotic/metallic sound
not deleted
just hidden until pretrains are done
blaise wanted to make a release with other fixes
yea, still I feel refinegan has room for improvement
what is actually gone is 44100Hz sampling rate
unfortunately refinegan gen had some bugs
wdym how
There is a button for that within the ui
First the dataset has to be processed and after that, you'd extract the pitch
( pick rmvpe ) and that's it
i can't show the images dang it
can you show me please?
I'll do it instead for you
thank you
huh!?!?
mine looks a lot different
wait
which voice changer are you using if i may ask?
cuz i'm using the realtime voice changer client
maybe a different one idk
because ur asking it here instead of #🔍│help-w-okada
oh mb
btw is it a different voice changer or something?
is rvc
rvc and w-okada are not the same thing
w-okada is a software that allows realtime rvc inference
rvc is for training ai voice models
w-okada is for using them in realtime
rvc can also do local conversion, speech to speech (not realtime)
then your question makes no sense because w-okada does not extract the pitch of an audio
it estimates the pitch in realtime
you want to train a model?
Well i just want to extract the pitch of the models so i guess yea
you want to extract the pitch of a model???
thats... nonsense
🦈
for training a model you first need a dataset, then in the preprocess steps you do f0 estimation, which gets saved in the model's logs folder
but it's not what you think; the saved pitch is just a bunch of numbers
Yes..
Can you use different words to describe what youre trying cause that doesnt make sense
Who? Me?
ys
I'm trying to extract the pitch cuz some models like minos prime need it
So i just don't extract it and instead i just set the number?
what's up
?
can you give a visual example
or audio
of what the issue is and how it should sound like
Basically it requires rmvpe or crepe for the extraction thingy
Yup that's right and i indeed selected it
it's pitch estimation for both realtime & non realtime rvc
Oh okie thanks!
use rmvpe
woa so i finally understand
he was trying to ask which f0 estimation to use
🦈 🔥
Hey, where can i find a tutorial to use the ai voices ? I dont want to train i just want to test the models that re already created
Is this really supposed to happen when I started training this? (I am using the no-UI Applio Colab notebook).
It even saved my index file a bit early.
I just quit the training because it's not showing the epoch numbers from it.
There you have a guide buddy. It got all you need to know.
-rvc
- How to use RVC Mainline Colab by Cauthess
- AICoverGen Colab Guide by Eddy (Spanish Helper)
- Create a model with RVC disconnected (colab) by Angetyde
thankss
You're welcome.
Has anyone got the same problem or should someone fix it?
No it is not, but it looks like tensorboard's dependency got updated and it is not compatible with what applio had installed
Well sh_t.
Can someone help me please
isn't there a Turk here who can help?
Been thinking, what are the best tools to separate a voice from music for something like a podcast? annoying when training
spleeter is a pile of slop
spleeter and demucs are too ancient, I'd recommend some best mel roformer models in applications such as:
- [UVR5 UI](#📰│dev-updates message)
- [MSST inference colab](#📰│dev-updates message)
oh ohly shit there's a Pinokio image for one of them, thanks!!
You need to describe what you use, when it happens and stuff
Don't provide vague descriptions of issues
we can't read minds, unfortunately
im sure even chatgpt is more likely to hallucinate when trying to answer such vague question
In any case...
Given you mention onnx which applio / rvc doesn't support, you're talking about w-okada.
Now.. whether you mean the core components or the voice models themselves, you need to get such things.
rmvpe ( f0 extractors in general ) along with hubert / cvec should be bundled, so like yea
You must refer to voice models
They can be either a pytorch format ( pth ) or onnx
Those can be acquired from #🔍│find-models #1175430844685484042 or weights.com
You've asked the same question for the third time already.
bruh
Shit.
I don't know about the mainline RVC Colab. But I can only think of Applio the RVC.
Well, I just want to know if anyone has already fixed the RVC Mainline Colab so I can go back to training voice models. I apologize for that
just use alternatives
kaggle or whatever
i went to upload one of the voice models i found here and it says i couldn’t upload it because the file was not “onnx” or “pth”
and what is the model you're trying to upload then
extension
what is it?

I have a feeling you're just dragging a zip in
that topic should be discussed in #🔍│help-w-okada
if it is voice changer
okay well i don’t know
it’s
#🔍│help-w-okada go in there
For W-Okada the voice changer, go to #🔍│help-w-okada. This channel #✨│ai-help is about RVC programs.
No thanks, I'd rather just play the waiting game again. I got impatient, so yeah, I've learned my lesson. But thanks👍🏻
RVC in this context doesn't stand for realtime voice changer.
doesnt mean you would spam the same shit
Yeah I know
What are you talking about? I was pointing out how you asked the same question for the third time.
Well either way.. if he wants to wait then so be it
but imo it's a waste of time. Colab's like a bpd person
Well, that's pretty much it if he knows how to code.
one time it's aweee uwu, one time it's shitty shit
true
"Shitty shit"' is goin' in my quote book.
lol
Bro , can some one help me please ?
Is vonovox legit? https://github.com/dr87/Vonovox
yes, but some features are behind a paywall
ty!
Yes, it's another fork of WOKADA, it's around the same performance as the deiteris fork
Also it's better u talk about this in #🔍│help-w-okada
Btw Vonovox is Nvidia only
Sry didnt know it was part of wokada
!howtoask
Please be specific on which problem you encountered about an RVC program.
I don't know about Vonovox, but Deiteris' W-Okada fork is better at realtime voice changing.
I mean we talked about it no? 💀
Or did i skip my explanation by accident 
U said u had someone making a new one but never mentioned a name
-colab
- Applio, by IA Hispano Google Colab
- RVC Disconnected, by Kit Lemonfoot Google Colab
- RVC Mainline, by Hina Google Colab
- Hina's Mod AICoverGen WebUI, by Hina Google Colab
- AICoverGen-NoWebUI [English], by Ardha, fixed by Eddy, Hina and Gdr Google Colab
- AICoverGen-NoWebUI [Spanish], by Eddy, Hina and Gdr Google Colab
- UVR5 NO UI, by Eddy Google Colab
- UVR5 UI, by Eddy Google Colab
- Hina's Modified Original W-Okada's Realtime Voice Changer, Google Colab
- FaceFusion UI, by Nick088 Google Colab
- FaceFusion NO UI, by Nick088 Google Colab
- EasyGUI, by Rejekts Google Colab
- 🆕 Music Source Separation Training (Inference), by Jarredou & Makidanye Google Colab
While the Colab free plan provides up to 12 hours of daily usage, the GPU is typically available for only about 4 hours each day on average.
-rvc
- How to use RVC Mainline Colab by Cauthess
- AICoverGen Colab Guide by Eddy (Spanish Helper)
- Create a model with RVC disconnected (colab) by Angetyde
be aware of #📰│dev-updates , many colabs might be broken
Hi. What is the best way to improve the quality of the model without increasing the dataset? For example, is it possible to train the model to a specific voice? I mean that the model would learn to convert specifically my voice to the target voice. Or maybe it is possible to add text to the incoming voice, so that the model would understand the words better? At the moment my trained models often have problems with hissing, buzzing and whistling sounds.
RVC is Speech To Speech
and no you can't train a voice model to adhere specifically to your voice
get better cleaner dataset and look at https://docs.aihub.gg/rvc/resources/training/
Last update: Dec 24, 2024
Hello, I got this error:
D:\RVC1006Nvidia>runtime\python.exe gui_v1.py
2025-03-13 18:26:47 | INFO | faiss.loader | Loading faiss with AVX2 support.
2025-03-13 18:26:47 | INFO | faiss.loader | Successfully loaded faiss with AVX2 support.
2025-03-13 18:26:47 | INFO | configs.config | Found GPU NVIDIA GeForce GTX 1050 Ti
is_half:True, device:cuda:0
Input device: 7:Microphone (MICUSB) (Windows DirectSound)
Output device: 16:Speakers (Realtek HD Audio output) (Windows WDM-KS)
cuda_is_available: True
Exception in thread Thread-1:
Traceback (most recent call last):
File "threading.py", line 980, in _bootstrap_inner
File "threading.py", line 917, in run
File "D:\RVC1006Nvidia\gui_v1.py", line 653, in soundinput
with sd.Stream(
File "D:\RVC1006Nvidia\runtime\lib\site-packages\sounddevice.py", line 1800, in __init__
_StreamBase.__init__(self, kind='duplex', wrap_callback='array',
File "D:\RVC1006Nvidia\runtime\lib\site-packages\sounddevice.py", line 898, in __init__
_check(_lib.Pa_OpenStream(self._ptr, iparameters, oparameters,
File "D:\RVC1006Nvidia\runtime\lib\site-packages\sounddevice.py", line 2747, in _check
raise PortAudioError(errormsg, err)
sounddevice.PortAudioError: Error opening Stream: Illegal combination of I/O devices [PaErrorCode -9993]
Guys, is there a good step by step guide to get applio rvc to work with the rtx 5000 series cards?
why is my locally installed applio taking forever to convert?
What's your PC GPU and how long is the file
Follow https://docs.aihub.gg/rvc/local/applio/ to download it, but after you've extracted the precompiled build, go to its folder in Windows Explorer, type "CMD" in the address bar and press Enter, then in CMD run: env\python -m pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
Last update: Apr 01, 2024
This should work
Uhh I'm not sure if that still supports MPS, what version are you using
Also which m
its m2
Alr, what applio version did you download
The one from our docs that has a link for the latest huggingface stable release?
Thanks for the reply! 🙂
I also asked in the Applio Discord and the answer I got was very similar, but the thing is, after trying to install the new "nightly" version, It just gave me a bunch of "Requirement already satisfied" answers.
But from what i could see they are/they were a bunch of cu121 and not cu128 "files". So I first had to run: env\python -m pip uninstall torch torchvision torchaudio
And after that I did run the command to install the new nightly "version" and now it seems to be working! 🙂
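To confirm which build ended up installed after the uninstall and reinstall, the CUDA tag can be read from the torch version string. A minimal sketch: the helper names are made up, and the sample version strings in the comments are illustrative, not taken from this chat.

```python
import re
from typing import Optional


def cuda_tag(torch_version: str) -> Optional[str]:
    """Extract the CUDA build tag after '+', e.g. 'cu128' from '...+cu128'."""
    match = re.search(r"\+(cu\d+)$", torch_version)
    return match.group(1) if match else None


def installed_cuda_tag() -> Optional[str]:
    """Tag of the torch build in the current environment (requires torch)."""
    import torch

    return cuda_tag(torch.__version__)


# Illustrative strings only:
#   cuda_tag("2.5.1+cu121")  is the older CUDA 12.1 build
#   cuda_tag("2.5.1+cpu")    has no CUDA tag at all
```

Calling `installed_cuda_tag()` from a script started with `env\python` checks the same interpreter the pip commands were run against.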
Oh alright, goodluck
is there anything i could do to optimize the converting speed @low shard
Your input and output audio devices use two different host APIs: the input is Windows DirectSound and the output is WDM-KS. Both have to be the same. Use MME on both
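The mismatch can also be checked from Python. In the `sounddevice` package each entry from `query_devices()` carries a `hostapi` index into `query_hostapis()`; the helper names below are made up for this sketch, and the pure functions take plain lists so they can be tried without audio hardware.

```python
def host_api_name(devices, hostapis, index: int) -> str:
    """Name of the host API (MME, DirectSound, WDM-KS, ...) a device belongs to."""
    return hostapis[devices[index]["hostapi"]]["name"]


def same_host_api(devices, hostapis, input_index: int, output_index: int) -> bool:
    """True when both devices are exposed through the same host API."""
    return host_api_name(devices, hostapis, input_index) == host_api_name(
        devices, hostapis, output_index
    )


def check_live(input_index: int, output_index: int) -> bool:
    """Run the check against the real device list (needs the sounddevice package)."""
    import sounddevice as sd

    return same_host_api(
        list(sd.query_devices()), sd.query_hostapis(), input_index, output_index
    )
```

With the indices from the log above, `check_live(7, 16)` would be expected to return False, since device 7 is DirectSound and device 16 is WDM-KS.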
also need to uninstall old torch torchvision torchaudio using pip uninstall
but that's it
i've been away from the ai voice stuff for a few years
as of now whats the easiest way to train a voice model
The cable labelling is weird. Cable output is in input, cable input is in output, you can ignore the labelling they are all in the correct section
locally with rtx nvidia or cloud
thx, I saved it as one of my copypastas
what's ur pc gpu
U can train on AMD too btw, just not as good
on 7900xtx? pretty good
yeah also depends on the AMD gpu
how's it going after the switch btw?
7.8gb
what
that's just an amount of memory, it could be storage or it could be RAM, since all you gave is a size in gigabytes
You can check your pc gpu via:
ctrl+shift+esc (task manager) -> Performance tab -> GPU
6700xt -> 4070TiS is 6-7x faster, can also use a bunch of other things like flash attention, triton, etc
those were ass to install with amd/zluda combo
Nice, I don't think you regretted it at all lol
welcome to the green side 
nope, been running it pretty much non-stop for the last 2.5 months
with 6700xt I'd given up long ago
is that under rocm on linux
cause im jw how amd would compare to just directly using nvidia on a card with equivalent memory
AMD and Intel are good only for cheaper gaming
But for AI Nvidia is da best
Imo AMD cpu x Nvidia gpu is just a godtier
I honestly have had enough of intel, never again
i still want to know how rocm compares to natively doing stuff on nvidia
i heard from like one person it works fine but i have no frame of reference
windows, hip sdk + zluda
anyone know why rvc isnt working on linux on AMD?
can someone help me with the client like my voice is a bit robotic and idk how to fix that
anyone know how to make your model/any model speak a different language than english?
like for example it works speaking another language but it doesnt make certain sounds
im using applio
i assume it's got something with embedder model "Model used for learning speaker embedding."
though there's only asian things here
there's custom but idk what to do with it
alright I got it figured out lol.
for anyone trying the ai to speak a language more like the input file just make sure the "search feature ratio" is set to a low value
I'm curious of 9070 XT, was thinking it could have performance (optimization) leap
You mean an index file of an RVC voice model or a TTS?
nah i meant like for example i'd put in an audio of me speaking my language and when the ai'd say it it'd say it but skip over the special sounds like just not pronounce them completely
stuff like ł or ć yknow
I waited long enough but this error persists “ERR_NGROK_8012”
You can either take the index file out or reduce index ratio down, but the result might be unexpected.
Are you using Applio with ngrok?
collab
No, that's not what I meant. I know you're running from a cloud service like Google Colab and Kaggle. But is it W-Okada or RVC?
mainline, rvc i guess
I don't know about the mainline RVC for Colab. A lot of people here complained to me this specific Colab notebook won't work.
oh so this is a common problem like when they change the python version ?
Likely.
Wrong channel sorry
oh welp, thanks yuuka
You're welcome. 
how long does it take to usually convert like a 3 min audio on applio?
depends on your gpu
you can't find the answer till you try it
I can't put my dataset into this folder.
put in anywhere else
Is that a joke ?
You think it is a joke?
because of this
because:
- You have to first do the preprocessing of the dataset:
- You have to have the proper model folder selected:
- It has to contain sliced_audios and sliced_audios_16k folders
Else you'll encounter:
no-feature-todo
It quite literally says " no feature extraction to execute "
because feature extraction is done on samples, which after preprocessing end up in
It's pretty straightforward dude.
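The check described above can be sketched in a few lines of Python. This is only an illustration, not Applio's own code: the folder names `sliced_audios` and `sliced_audios_16k` come from the message above, while the function names and the `my_model` directory are hypothetical.

```python
import tempfile
from pathlib import Path

# Folder names taken from the explanation above; preprocessing is what creates them.
REQUIRED_FOLDERS = ("sliced_audios", "sliced_audios_16k")


def missing_preprocess_folders(model_dir):
    """Return the preprocessing folders that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FOLDERS if not (root / name).is_dir()]


def ready_for_extraction(model_dir):
    """True only if both folders exist and each contains at least one entry."""
    root = Path(model_dir)
    if missing_preprocess_folders(root):
        return False
    return all(any((root / name).iterdir()) for name in REQUIRED_FOLDERS)


if __name__ == "__main__":
    # Demo on a throwaway directory; "my_model" is a made-up model folder name.
    with tempfile.TemporaryDirectory() as tmp:
        model_dir = Path(tmp) / "my_model"
        model_dir.mkdir()
        print("missing before preprocessing:", missing_preprocess_folders(model_dir))
        for name in REQUIRED_FOLDERS:
            (model_dir / name).mkdir()
            (model_dir / name / "0_0.wav").touch()
        print("ready for extraction:", ready_for_extraction(model_dir))
```

If the first function returns anything, preprocessing has not been run for that model folder (or a different model folder is selected), which matches the "no feature extraction to execute" message.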
thank you 👍
yw, hope it works now
Is there a tutorial on how to use this Applio thing? https://colab.research.google.com/github/IAHispano/Applio/blob/main/assets/Applio_NoUI.ipynb#scrollTo=v0EgikgjFCjE
Last update: Jan 31, 2025
Hey, can someone check this spectrogram from Spek and lmk if it looks good for training a voice model? Or is the quality too low pls?
It's good
Select high sample rate when training
48khz
ok ty
Or 44khz
did you upscale this audio?
(apollo, resemble, etc)
high frequencies look very artificial
its bad to have fake frequencies in the dataset, rvc gets confused during training
it cant work well with synthetic data
better train the original audio (without any sort of upscaling)
rvc already upscales the audio in the training process
not exactly "upscaling", it's more of an ability that comes from the pretrain. with more epochs it may slowly reproduce the dataset, including the cutoff
temporal upscaling 
nah DLSS 4 & FSR 4 are better
what about intel xess 
Hi! how are you guys?
I already trained a model but I wanna increase the training, do you know how I can do that on Google Colab?
Simply (i'm not sure if you're using Applio colab) don't delete your training files from the file explorer, leave the dataset path empty, put a higher epoch count on "total epochs" and click on train. You can find similar info on the guides.
What Python package do I install? It's telling me to download 3.13.2 but when I tried using that it didn't work because it couldn't find the torch location, even though I downloaded torch. Am I downloading the wrong version?
Does anybody have a tutorial I can follow? the Youtube tutorials are all outdated
@low shard Here
alr we talked in #🧬│ai-chat
Yes
first of all, elaborate:
- ur pc gpu
- what u want to do
- what OS do you use
I would like to install Python using my GPU should I use their newest version that is compatible with my laptop, or is there a specific older version I need to use?
Are you trying to do anything RVC-related? or just want to install python?
Its RVC related
then, elaborate #✨│ai-help message first
I have tested RVC without python or pytorch, but when I looked at another tutorial it said I needed python and pytorch, so I would just like to know which version of Python I need. I am currently running CUDA 11.8 but they want me to install Python 3.13.2
So I dont need to install python or pytorch to use it on discord
?
nvidia is a company that makes a lot of things, which nvidia gpu?
realtime for calls?
you don't need python
ignore everything you get off old youtube tuts
RVC = Retrieval-based-Voice-Conversion, the best speech-to-speech AI models (on v2). It runs inference (uses models) on pre-recorded audio (AI covers) and trains (makes) models
Wokada = uses RVC for realtime inference
So I can just download the voice changer and im ready to go
I have a 2070super, looking to get rvc for calls
Do yall actually need Wokada or RVC?
then you don't need RVC, you need wokada
go to #🔍│help-w-okada
so, you want wokada, the realtime voice changer for calls?
Yes
RVC is not a voice changer for calls
^
tell your pc gpu and OS in #🔍│help-w-okada
Hi, I just got to this, are there docs or something to tell me which model does what, which is better for an xyz purpose...
Yes. There is huge as hell doc in fact
edit 13.03.25 deton24’s Instrumental and vocal & stems separation & mastering (UVR 5 GUI: VR/MDX-Net/MDX23C/Demucs 1-4, and BS/Mel-Roformer in beta MVSEP-MDX23-Colab/KaraFan/drumsep/LarsNet/SCNet x-minus.pro (uvronline.app)/mvsep.com/ GSEP/Dango.ai/Audioshake/Music.ai) General reading advice | D...
tensorboard loads fine through collab, but the rvc link leads me to an issue,
ERR_NGROK_8012
Traffic was successfully tunneled to the ngrok agent, but the agent failed to establish a connection to the upstream web service at http://localhost:xxxx. The error encountered was:
dial tcp xxx.x.x.x.xxxx connect: connection refused
the local url also does not appear in the notebook either:
RVC URL:
Tensorboard URL:
File URL:
The tensorboard extension is already loaded. To reload it, use:
%reload_ext tensorboard
Reusing TensorBoard on port 8077 (pid 5487), started 0:09:36 ago. (Use '!kill 5487' to kill it.)
Traceback (most recent call last):
File "/content/training/runmain.py", line 3, in <module>
from dotenv import load_dotenv
ModuleNotFoundError: No module named 'dotenv'
very new to rvc so please bear with me
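ERR_NGROK_8012 with "connection refused" means the tunnel is up but nothing is accepting connections on the local port behind it. The traceback above suggests the web UI crashed on the missing `dotenv` module before it started listening, which would be consistent with that. A minimal sketch for checking a local port from a notebook cell follows; the function name is hypothetical, and the port number in the usage comment is a placeholder since the real one is masked in the log.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (including ConnectionRefusedError
        # and timeouts) when nothing is listening on that address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Usage (placeholder port; substitute the one shown in your ngrok error page):
# print(is_listening("127.0.0.1", 8000))
```

If this returns False, the problem is on the notebook side (the app never started), not with ngrok itself.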
Can someone model this for me
#1159289738314919936 is what you should use
or visit #1191429836321849435
has anyone gotten the RVC realtime to work on arch linux with AMD gpus. i keep getting errors
go to #🔍│help-w-okada and check the pinned guide to get w-okada fork with better optimization
To request someone to do a voice model for you, you can create one in #1159289738314919936 or train it by yourself.
For W-Okada, go to #🔍│help-w-okada. If you mean the realtime mode in an RVC program, that thing is too old.
What should be dataset for Applio? is there a specific length?
There's no specific length you should use to train a voice model in Applio. Good-quality audio is what gets you a good-quality model, and 30 - 60 minutes of audio should be good enough.
Should we chop the audio like earlier? where we used to make chunks of 10 sec audio, i mean 30 mins audio splitted in 10 sec each, or a 30 min single audio file?
I think Applio should have a feature to autochop audio for train. You can read some more about Applio there. https://docs.applio.org/applio
5 minutes as bare minimum, the more the better, but more than 1-2 hours usually gives less noticeable improvement given the same quality consistency
can we train in Steps? like i have a bad GPU ( NVIDIA GeForce GTX 1050 Ti (4 GB)) , can i train daily 1 hour-2 hour and continue again?
NVIDIA GeForce GTX 1050 can be used for AI inference. Although AI training is possible on this specific GPU, it would be real slow.
You'd have to leave your PC on for a long time to finish a single training.
6 GB is bare minimum and 8+ is recommended
GTX cards are also not recommended due to lack of tensor cores for optimization
elaborate:
- ur pc gpu
- what u want to do
- what guide are u using
- what are u doing step by step
epochs are just a unit of measurement for the training cycle
more or fewer epochs doesn't by itself mean more quality
you need to monitor the tensorboard
Which is easy to use? i have used collab in earlier days of RVC
it used to get disconected after certain Epochs or load
that’s bc google colab has a random daily gpu time
which can be max 4 hours
kaggle gives 30 hours weekly for better gpus
but it needs phone number and its harder to use
I would suggest applio kaggle
Now for 200-300 epochs with a dataset of 15 mins, I believe it should take 2-3 hours, right? Which cloud should I use so that it doesn't get disconnected and is easy?
U sure?
Yes.
Whats this error?
invalid authtoken, double check at https://dashboard.ngrok.com/get-started/your-authtoken
then paste it at this highlighted one
how to upload data set?
Thanks worked
Please @knotty moth
anyone know why
the volume is weird like that
and how i can fix it
also sent a link to a file cause i didnt have perms to post a vid
Which RVC or W-Okada program are you trying to run?
Hello everyone, I'm new to this world and discovering new things that appeal to me a lot. I've learned, for better or worse, how to start the program, but I don't know if my settings are correct. I tried it as a joke with friends on Discord and they tell me it's not very clear, it seems to swallow words, and there's a delay as well. My setup is an R5 5600X and an RTX 4070S
!howtoask
How To Troubleshoot 
- Don't simply mention your issue, like "my rvc is not working".
- Describe the step you are on, what you're trying to do, the RVC you're using, a screenshot, etc.
- The more context, the better.
- Don't be desperate. You can ping a Helper, but if they ignore, they aren't available/don't know the answer.
- It's okay if you're frustrated, but don't take it into this server.
- Don't DM without prior consent.
- Don't ask for every little instruction. Put your own effort & test things by yourself.
- Don't ask to ask.
- Check if your answer is a Google search away/on our guides website.
Are you trying to install RVC or W-Okada or something?
program as in the application itself or the specific voice im using
For W-Okada the realtime voice changer, go to #🔍│help-w-okada. For RVC the voice converter/changer, #✨│ai-help here. Anything else, let me know.
No, I ain't clicking on that link.
thats fine, can i have perms to post a video then?
or at least an audio file
But I'm not a moderator bruh. At least just tell the name of the program.
okay i figured out its rvc, like i thought
Now say the full name of it. "Applio, mainline RVC or Tiger RVC GUI" for example.
is this normal? or am i doing something wrong? because its being trained super fast and model i check for 100 Epochs its not sounding great
batch size 15
Oh it changed? I used it on windows awhile ago(a year ish). I kinda out of loop. I'll look it up since I like using it on discord
you didn't seem to follow the training guide
- stick on batch size 4 or 8 for most cases
- use the "simple" slicing method
- 5 minutes of dataset is bare minimum to yield good enough results and it shouldn't be only 1 step per epoch
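The "1 step per epoch" symptom follows from simple arithmetic: an epoch is one pass over all slices, taken one batch at a time. The sketch below is only an approximation (a real trainer may group slices by length or drop partial batches), and the 3-second slice length is an assumed example value, not something stated above.

```python
import math


def estimate_slices(dataset_seconds: float, slice_seconds: float) -> int:
    """Rough number of slices a dataset yields when cut into fixed-length pieces."""
    return math.ceil(dataset_seconds / slice_seconds)


def steps_per_epoch(num_slices: int, batch_size: int) -> int:
    """Approximate optimizer steps per epoch: one step per batch of slices."""
    return math.ceil(num_slices / batch_size)


if __name__ == "__main__":
    # 5 minutes of audio cut into (assumed) 3-second slices, batch size 8.
    slices = estimate_slices(5 * 60, 3)
    print("sliced dataset:", slices, "slices,", steps_per_epoch(slices, 8), "steps/epoch")
    # A dataset that ended up as a single slice with batch size 15:
    # every epoch is one step, so epochs fly by and the model barely learns.
    print("unsliced dataset:", steps_per_epoch(1, 15), "step/epoch")
```

This is why training that finishes suspiciously fast usually points at the dataset or slicing step rather than at the GPU.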
Actually yea, this is the one I am using I believe. If I am understanding the fork anyway, this is it but I can't get it to work when I run the command.
are these 2 things normal? (using rvc mainline collab)
you did not slice the audio or your slices are >10s
no they are obviously not, the requirements have not been installed
So I need to install py 3.10.12 on my pc?
I've tried installing py 3.10.12 on my pc, only because it doesn't use the installer I'm not sure how to open it. There is a folder but I don't see the application in the install folder.
what kind of linux is that?
im using win 11
yes
i need to WAIT.......?
or use the colab that has been fixed
or you can use a local install if you have a decent gpu
my issue with kaggle, the “datasets” folder in the Mainline web GUI (Kaggle) is locked, I can't put my datasets into the folder
anyone around that knows how to use the current cloud method to make covers?
I've tried like 6 times, yet the song always has some kinks to iron out, can never figure it out for the life of me
sometimes the start of the song, the middle, and at the very end some artifacting happens
or it happens at the start but is absent at the end
or sometimes, it's fine throughout until some straining lines approach
#📰│dev-updates message or make creations in weights
I think that's the one I used, even had it segment the audio to avoid it but it happens still
felt out of my element with this so figured I'd ask here
Hey
turns out i can use Applio
Yesss Applio is great
Or the Colab I use, the RVC AI Cover Maker, works the best for me. Or locally if u have a strong gpu
nah, i want to train voice
how to use custom pretrain (other pretrain #1235952130855010365 ) on Applio ?
hii can anyone teach me how to use the latest UVR5 to make ai cover? i can't find tutorial right now 😭 plz
last time i made ai cover was last year and idk whats happening rn but i can't use the old rvc anymore, i just need the simplist way to make ai cover plz
https://www.weights.com/create this should be the "easiest"
tysm!! does it have the same quality as the rvc and other google collab ?
That site provides pretty much the basic stem separation: vocals and an instrumental.
There's a working UVR5 Colab notebook link that was posted by a mod somewhere here.
bro wanted the easiest/simplest one, so I gave it to him.

i saw the link but im dumb i didn't see the tutorial, can i work it online with Mac Os?
Google Colab is a website. It will sure work on Safari.
-colab
- Applio, by IA Hispano Google Colab
- RVC Disconnected, by Kit Lemonfoot Google Colab
- RVC Mainline, by Hina Google Colab
- Hina's Mod AICoverGen WebUI, by Hina Google Colab
- AICoverGen-NoWebUI [English], by Ardha, fixed by Eddy, Hina and Gdr Google Colab
- AICoverGen-NoWebUI [Spanish], by Eddy, Hina and Gdr Google Colab
- UVR5 NO UI, by Eddy Google Colab
- UVR5 UI, by Eddy Google Colab
- Hina's Modified Original W-Okada's Realtime Voice Changer, Google Colab
- FaceFusion UI, by Nick088 Google Colab
- FaceFusion NO UI, by Nick088 Google Colab
- EasyGUI, by Rejekts Google Colab
- 🆕 Music Source Separation Training (Inference), by Jarredou & Makidanye Google Colab
While the Colab free plan provides up to 12 hours of daily usage, the GPU is typically available for only about 4 hours each day on average.
yeah but i just don't know how to use it ....
just click them step by step or...?is there a tutorial link?
i used to follow youtube video but they are now outdated
I'm running RVC on my macbook m4 2024. I got it somewhat working, and the WebUI is popping up. But, whenever I try to do anything, my tasks load in the queue indefinitely without completion. Is this a common bug? Anyone have any info on how to fix it?
can anyone tell me how to install voice model at this page?
I don't know, I haven't even used it yet, that's why I'm recommending Weights.
I don't know if I should feel annoyed or happy when the voice model I made in Weights and the one I did in Colab are barely different.
UVR5 UI does not use a voice model
RVC or Applio might be more in line with what you're looking for.
you don't
that's just for cleaning vocals
what's ur pc gpu
yess😭 the quality weight made is kinda......
what's ur pc gpu
@oblique heart
Train (make) RVC Models on cloud:
- Prepare the Dataset
- Setup RVC:
Choose a cloud way to use RVC,
- Google Colabs (4 hours of daily gpu for free, not much hours, but easy to use):
- Kaggles (a bit harder to use and needs phone number but gives 30 hours weekly of better gpus):
- Mainline (UI)
- Applio by Vidal (UI)
- Lightning.ai (Kinda hard, needs login, no issue with web uis or anything, but only free 15 credits monthly):
- Be sure to know about the tensorboard
Google Colab = Easier but risk of getting disconnected
Kaggle = Harder but way more gpu time
If you are looking for the easiest free way, use https://weights.com, which ofc uses RVC
RVC Inference (use models) on pre-recorded audio on Cloud
You can use either:
- Weights.com: Easiest Possible Ever Automatic
- Ilaria RVC Zero: Fastest free on cloud
- Applio UI Colab: RVC Fork with some extra features like TTS
- RVC AI Cover Maker UI: Automatically Separates the Vocals and Instrumentals, converts the voice and mixes them back
Here's all cloud options
if you need any help or have any questions, you can ask here
ty
yw and lmk
um it says i can turn on headless mode optionally to run the gpu on all sessions
what does that mean?
im a little confused
to keep the session running even when the browser tab is closed
oh
the ngrok link leads to an error
error 406
the link to sign up
nvm i got it to work
wait what is the pretrain
is it the AI model i want it to sound like?
it's basically a base for your actual training
without it, everyone would need more than 40 hours for training
but that works as a base for you to train your own model with a much smaller dataset than that
If I stream on twitch (obs open 3500 bitrate) + using w-okada RVC + game running --> will my PC "melt" as in lower the life span of my hardware significantly? I have a 4070 Ti Super OC Nvidia // Ryzen 9 3900x // 32gb ram // lots of fans // and a pretty good cooling system that maintains low temps
rvc is not w-okada
use the w-okada channel for stuff related to it #🔍│help-w-okada
as long as your temps are fine, your pc will not degrade as much
what kills components are temps
and unstable voltages
if your voltages and temps are fine, you'll be fine
oh woopsie my bad, and thanks for the response
I made this little sh*t!
yo, does anyone know why training isn't working on applio colab?
How do I make my own voice model with voice clips?
I've searched for methods but they're all outdated
Can an NPU work for rvc
need to update numpy to 1.26.4
how do i update it on colab?
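One way to pin a package version from a Colab cell is to call pip through the notebook's own interpreter. This is a minimal sketch: the version `1.26.4` comes from the message above, the helper names are hypothetical, and whether the pin actually fixes training in a given notebook is not confirmed here.

```python
import sys
from importlib import metadata


def installed_version(package: str):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def pip_pin_command(package: str, version: str):
    """Build the pip command that installs exactly package==version."""
    return [sys.executable, "-m", "pip", "install", f"{package}=={version}"]


# In a notebook cell, before running the training cell:
# import subprocess
# print("numpy before:", installed_version("numpy"))
# subprocess.check_call(pip_pin_command("numpy", "1.26.4"))
# If numpy was already imported in the session, you may need to restart the
# runtime for the new version to be picked up.
```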
Okay so I've created an app... for the time being it's using edge-tts to generate the speech output, not too slow, but obviously there's not a lot of control over the voices (though between rate and pitch you can do a lot more than you think).
That being said, it's hard to beat the 2-3 seconds or so it takes to get the audio (depending on length).
I want to do it with cloned voices, trained or otherwise, but it has to be fast, either on CPU or a 3060 12GB.
What are my best options?
i need help
i tried to do the cover thing
i cant send screenshots
but the output part is empty
im trying to make a cover in rvc colab
applio
and after i hit convert it says file inferred successfully but export audio is empty
@odd shale
!give-media-perms 1h @plucky jay
elaborate:
- your pc gpu
- what guide are u using
also don't ping random helpers
@dusty mortar #🔍│help-w-okada message
How to (unofficially) use Applio for RTX 50 series cards
Download it as described in https://docs.aihub.gg/rvc/local/applio/
After you've extracted the precompiled version, go to its folder in Windows Explorer, type "CMD" in the address bar and press Enter, then in CMD run env\python -m pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
If you only get "Requirement already satisfied" messages, run env\python -m pip uninstall torch torchvision torchaudio and then the command above again
Last update: Apr 01, 2024
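To confirm which build ended up installed, the CUDA tag can be read from the torch version string (pip wheels from download.pytorch.org carry a suffix such as `+cu121` or `+cu128`). A minimal sketch, with hypothetical helper names and script filename; it could be saved as e.g. `check_torch.py` and run with `env\python check_torch.py`:

```python
import re


def cuda_tag(torch_version: str):
    """Return the CUDA build tag (e.g. 'cu128') from a torch version string, or None."""
    match = re.search(r"\+(cu\d+)", torch_version)
    return match.group(1) if match else None


def needs_reinstall(torch_version: str, wanted: str = "cu128") -> bool:
    """True if the installed build is not the wanted CUDA build."""
    return cuda_tag(torch_version) != wanted


if __name__ == "__main__":
    try:
        import torch
    except Exception:
        # torch missing or broken in this environment; nothing to report.
        print("torch could not be imported from this Python environment")
    else:
        version = str(torch.__version__)
        print("torch", version, "-> build tag:", cuda_tag(version))
        print("uninstall + reinstall needed:", needs_reinstall(version))
```

If the tag still shows an older build such as `cu121`, that matches the "Requirement already satisfied" case above, where the old packages have to be uninstalled first.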
is a 1660 super good enough for training in a decent time? (≤2 hours)
being faster than the audio duration or half of it is sufficient imo
most Nvidia gpus should have fulfilled it
and I should be using what to do that? I have voice-changer open now to toy with, but that's assuming I'd be piping the resulting edge-tts audio into it and then to my speakers I guess.
hmm... this is working somewhat..
Hello, I'm new in Weights, how can I have an accurate Ai cover? My Ai covers are mostly dry and they sometimes can't withstand high notes
They sometimes sing like they didn't drink water
Hi everyone, do these pre-trained models work well for French, or are they mainly optimized for English?
help, I downloaded a voice changer, configured it and launched it, but I can't hear the neural network: RVC voices don't work, only Beatrice works. Video card: RTX 3050, processor: AMD Ryzen 5 6600H, vol 0
i was using the applio colab i switched to the other one and it worked thank you
doing it "realtime" is not a good idea
pls go to #🔍│help-w-okada and read the pinned guide there
um so whenever i upload a new ai voice i need to press download embedder before actually downloading it, but when i download the embedder thing it just says wait a moment and ive been waiting for like almost 10 mins ngl
did yall have to do the same thing
or am i buggin bruh
nvm
Hello guys, there is a model that i use which sounds good (psycho2go by dan), but its pronunciation in arabic isn't good. i downloaded an arabic dataset,
Is there a way to make this model with the same sound good in arabic ?
in talking or singing vocals? depending on how well the input audio articulates, and it may struggle more on the latter
In talking, but it should be able to handle changing in tune without going all robotic😅
I tried adding the pth file of psycho2go model in the load model in train tab rvc but it gives me error
guys ive got a question i'm beginner with all these ai stuff and i wanted to ask something ive got the vocals of a song that i want to convert it to ai covers but i dont know what to use can someone help me?
you aren't supposed to "load" pth files in training
But i need to get the same voice just need it to learn the pronunciation
you don't need the existing model, you need the dataset
if the model was made by someone else, you're cooked 
I guess I'm cooked 😭😭😂
Chatgpt told me I'm not cooked and i can fine tune this model even if i had an arabic dataset for other speaker
alternatively, you can try using the output audio inferred with the model as a dataset
though I'm not sure if it's ideal
you prob mean an arabic pretrain, but unfortunately I don't think it exists yet
and to make an arabic pretrain you'd need massive amount of dataset (100h+)
Then back to being cooked😂
With the same speaker?
several speakers, perhaps around 20-100 speakers (30m-1h each)
I give up😂
i downloaded a voice model but unlike the other models, the one i have downloaded has model.pth and a metadata.json in it what do i do with the metadata.json
nothing, it's just the metadata of the model on weights.com
so all i need to do is insert the model.pth in the place that i would normally put the pth files?
!give-media-perms 1h @gritty merlin
elaborate:
- ur pc gpu
- what guide are u following
use another model's index file as placeholder and set index rate to 0
you can even delete the model metadata, all you need is the pth and index if it has one
also u might wanna rename the model
since all models on weights are renamed as "model"
It's optional
alr ty
it should work in Applio I suppose
What you absolutely need is a pth
pth is basically the model containing the voice
index, in short, contains the accent it has been trained on
wsp

