#✨│ai-help
Ayo? @covert anchor level 3 !!! 
sorry no idea cuz I'm not using the latest applio
oh :c
btw check the model name, should be only alphanumeric without spaces as safe bet
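A minimal sketch of the "alphanumeric, no spaces" tip above. `safe_model_name` is a hypothetical helper, not part of Applio; it just strips everything except ASCII letters and digits so you can see what a safe name would look like.

```python
import re


def safe_model_name(name: str) -> str:
    """Return the name with everything except ASCII letters and digits removed."""
    cleaned = re.sub(r"[^A-Za-z0-9]", "", name)
    if not cleaned:
        # Nothing usable is left, so let the caller pick another name.
        raise ValueError("name has no alphanumeric characters")
    return cleaned


print(safe_model_name("mezcla v2 (final)"))
```

Accented letters are dropped too, which is the conservative choice if the tool trips on non-ASCII paths.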
well, dw, you tried and i apreciate that
this is the name
it's a champion from league, on spanish
the name of the dataset, "mezcla" means mix
and that's all
probably im just gonna wait untill tomorrow and get help of a friend that uses the latest version of applio
anyways, thx!
what, why?
because that's how it works. Either you slice the file yourself into 3-5s slices, or you let Applio slice it in preprocess
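A rough sketch of what manual 3-5s slicing boils down to, in case it helps. `plan_slices` is a hypothetical helper that only computes cut points at fixed intervals (4s by default, an assumed middle value); a real slicer would usually cut on silences instead, so treat this as an illustration rather than how Applio does it.

```python
def plan_slices(total_seconds, slice_len=4.0, min_len=3.0):
    """Return (start, end) pairs covering the audio in slice_len chunks.

    A trailing remainder shorter than min_len is dropped instead of
    producing a too-short slice.
    """
    slices = []
    start = 0.0
    while total_seconds - start >= min_len:
        end = min(start + slice_len, total_seconds)
        slices.append((start, end))
        start = end
    return slices


for start, end in plan_slices(11.5):
    print(f"{start:.1f}s -> {end:.1f}s")
```

The cut points can then be fed to whatever audio editor or tool you use to export the individual files.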
there is a guide
yhanks so muchh
Ayo? @boreal sluice level 1 !!! 
whenever i use any voice it rlly just cracks alot lol
is there a fix or is my mic the problem
try not to infer doubled vocals
I would suggest Deiteris' w-okada fork
Guide style is in the same as Blanc_dot's. Thanks Blanc_dot for corrections. Most technical information comes from deiteris.
Last update October 6th, 2024: Multi PC setup explanation added
Translations added for:
German: https://rentry.co/ForkVoiceChangerGuide_de
Turkish: https://rentry.co/ForkVo...
It's kinda better than the OG version.
And if you ask, nope, we can't give support nor troubleshooting for voice.ai issues
The source of the problem can be one of two things: either the model you're using wasn't properly made/trained, or it simply doesn't fit your voice.
You're welcome.
Tiny fact: models made with a tiny/moderate-size dataset prolly won't perform properly on W-Okada or Voice.ai
Tho there can be tiny/lucky exceptions
Make sense
Ayo? @gritty zinc level 1 !!! 
So this One work on everything?
Yep, it should.
Deiteris fork is kinda more optimized than others.
Also it's matter of playing around with settings and reading the guide.
Read the docs i gave you above.
I'm not sure which version of W-Okada you're using
oh yh ty
I have f5-tts installed, how do I put weighted voices on it? it only seems to let me upload audio samples, and not .zip of weighted
thank u! Im guessing it must be voice
i do have a deeper voice
soo ill find smth that fits more
F5 tts is only 0shot
Sorry, what is 0shot?
0shot = no actual training needed, just an audio file to work, inferior in terms of quality to few-shots
few-shots = training needed, an example is GPT-SoVITS (TTS) & RVC (STS)
You can't upload RVC models to GPT-SoVITS nor F5 TTS
huh, could have sworn I installed a version of f5-tts that could do training based on the github tutorial, oh well, thanks fo the information
RVC models can be used only in programs that use RVC, such as W-okada
The training is for the base model that does the 0shot, not for training an individual voice model
hm
You can train the big model that's being used for 0shot, not train every voice you want to a model in F5 tts
can i smh link it to disc?
You mean as a realtime voice changer for calls? you could technically do that for any TTS
Table Of Contents Introduction Index of the best TTS 1. ElevenLabs/11Labs: 2. GPT-SoVITS: 3. Fish Speech: 4. F5 TTS: 5. Edge TTS: 6. StyleTTS2: 7. XTTS2: 8. OpenVoice v2: 9. MeloTTS: Use TTS in Realtime on calls (ONLY PC) Introduction TTS Means Text To Speech! Inference means when you use the...
But it's not really realtime
You'd have to type the words and let the audio play so
eh beter than nothing
There's actually Wokada, which uses RVC models in realtime
So it's Speech To Speech, rather than Text To Speech
What's your pc gpu?
this is too complicated for my dump ass i just want act in vc for online class
thought it would be good idea
prob no
It's AI
Open Source AI
There's guides
Leo gave you the wokada guide above
which has everything you need to know
yes buttt
Ayo? @gritty zinc level 2 !!! 
it kinda NOT what im looking for excatly
what are u looking for exactly
smt like the text speech thing
i use that prob
Yea you can install any TTS in the guide and use it for calls
Really depends tho
If you want generic voices and easy: edge tts
If you want custom voices and easy: F5 TTS or Fish Speech
If you want custom voices and best quality (but more complex): gpt-sovits
the process for using each of those in calls is kind of the same
Also, depends if your pc gpu is good enough tho
alr tyy
pls help
Ayo? @true ravine level 1 !!! 
ok I downloaded RVC off of pinokio, what tab and where do I put the weighted file?
Ayo? @pure pecan level 1 !!! 
wait, is RVC only for singing, I thought it was a text to speech
well I founded a folder called weights, and put my model in there, but when im in the rvc ui, I don't see any option to select it.
nvm figrued it out, had to take the pth file out
just wish i could use tts instead of voice...
I want to script something and have it read it all out
blah i just keep getting errors trying to convert voice, oh well, thanks anyway
Ayo? @pure pecan level 2 !!! 
sounds awful with my voice though
What is that
rvc
Colab or local
local
windows button + tab then create a new desktop and use that
maybe the model was just bad, tried another one and it's really good (1000 epocs hatsune miku)
seems to not work with recordings that are longer than a minute
and rmvpe doesnt work at all
yeah i think im gonna need some 1 on 1 help in call or something i just dont get what im doing
hirari
yeah?
Need help
No errors, just the RVC bugs out and stops whenever i try to train
INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration
Ayo? @blazing crane level 1 !!! 
Using Ov2 pretrain at 40k
Last time, which one is the best alternative for denoising?
yes?
-colab
- Applio, by IA Hispano Google Colab
- RVC Disconnected, by Kit Lemonfoot Google Colab
- RVC Mainline, by Hina Google Colab
- AICoverGen-WebUI, by Hina Google Colab
- AICoverGen-NoWebUI [English], by Ardha, fixed by Eddy, Hina and Gdr Google Colab
- AICoverGen-NoWebUI [Spanish], by Eddy, Hina and Gdr Google Colab
- UVR5 NO UI, by Eddy Google Colab
- UVR5 UI, by Eddy Google Colab
- Modified W-Okada's Voice Changer, Google Colab
- 🆕 FaceFusion UI, by Nick088 Google Colab
- 🆕 FaceFusion NO UI, by Nick088 Google Colab
- 🆕 EasyGUI, by Rejekts Google Colab
While the Colab free plan provides up to 12 hours of daily usage, the GPU is typically available for only about 4 hours each day on average.
@crude flame
2
what is 2 bro speak up since u at least getting help
Which one to be exact
It's a collage...
Yh
Just asking
for colab melband it is the second one. for mvsep you can use either standard or aggressive for denoise by aufr33, and for xminus you can use either of the melband
What's the best option among all MelBand Roformers
Cause results are different everywhere
the first two are the same (mvsep and colab) and idk about xminus
They're the same?
just use the one that sounds best
Which one is it
Number 1 or two in the Colab
2
I don't wanna waste time or GPU
THIS???
yes
With high Chunk size and overlap 16?
thats fine
tsm it worked ❤️
could i get some help please?
what do u think about this?
1hr+ - 8
<1hr - 4
10 hr+ - 16
something like that
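The rule of thumb above written out as a tiny function, in case someone wants it in a script. `suggest_batch_size` is a hypothetical name and the thresholds are just the ones from the message, so treat it as a starting point and not a hard rule.

```python
def suggest_batch_size(dataset_hours: float) -> int:
    """Map dataset length in hours to the batch sizes suggested in the chat."""
    if dataset_hours >= 10:
        return 16
    if dataset_hours >= 1:
        return 8
    return 4


for hours in (0.4, 2, 12):
    print(hours, "h ->", suggest_batch_size(hours))
```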
more data makes a batch more uniform
so if its less than 1 hour 4 right?
it really depends on the variety of the dataset, if there's a lot of different content, using a larger batch kinda levels out the outliers
with a smaller batch those outliers have a more noticeable effect
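A toy illustration of the outlier point, not actual RVC training: it draws random batches from a made-up list of numbers (95 ordinary values and 5 outliers, purely assumed) and measures how much the batch average jumps around. Bigger batches give a steadier average, which is the "levelling" effect described above.

```python
import random
import statistics


def batch_mean_spread(values, batch_size, trials=2000, seed=0):
    """Standard deviation of the mean of randomly drawn batches."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.sample(values, batch_size))
        for _ in range(trials)
    ]
    return statistics.pstdev(means)


# Toy "dataset": mostly similar values plus a few outliers.
values = [1.0] * 95 + [10.0] * 5

spread_small = batch_mean_spread(values, 4)
spread_large = batch_mean_spread(values, 16)
print(f"batch 4 spread:  {spread_small:.3f}")
print(f"batch 16 spread: {spread_large:.3f}")
```

In real training the batch average is over gradients rather than plain numbers, but the same averaging logic applies.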
what if there was like a 2-8 hour data set i would still use 8 right?
sure you can try batch 8 for 30m+
do they affect quality at all?
or is it just the time
it does affect the training result
the model is trying to find optimal parameters to generate a voice, with different batches it may overshoot or undershoot the target or circle around local minima
oh dang should i try to reatrain then bc yesterday i made like a 22 min data set on 8 batch size and it doesnt sound like the voice i was making but it sounds realistic
there's any way i can improve the speed on the training?
what's your gpu and dataset size?
i should be training at 2 or 3 secs on every epoch, or at least i think i should
rtx 4060
no clue on wwhat these pics r
it was training 2-3sec/epoch because you had an empty set lol
empty set?
with 2 mute files and discarded 9 min audio
it is an example how the model looks for the optimal solution (the peak in the middle) doing small incremental steps towards it
discarded? im not understanding, idk if it was for me
when you tried to use unsliced 9 min file, it was not used at all
but now that you've sliced it the training actually uses it, and for a 9 min file and a 4060, 52 s/epoch is a good speed
this better?
oh
Ayo? @marsh schooner level 4 !!! 
the goal is to get it as far down as possible right?
the further down means more realism most likely
and is it fine going backwards like this?
-rvc
- How to use RVC Mainline Colab by Cauthess
- AICoverGen Colab Guide by Eddy (Spanish Helper)
- Create a model with RVC disconnected (colab) by Angetyde
-colab
that's restarting from some previous saved set of weights
g total is a sum of feature match (most important), mel loss (medium), and kl
mel loss is how close the spectrogram of the generated audio is to the original
less is better
fm is more complex measure, generally it should go down or stay somewhat stable
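A simplified sketch of the two ideas above, assuming plain Python lists instead of tensors. `mel_l1_loss` and `generator_total` are hypothetical names; the real training code most likely weights the terms and may add others, so this only shows the shape of the calculation.

```python
def mel_l1_loss(generated, target):
    """Mean absolute difference between two equally sized spectrograms.

    Both arguments are lists of rows of floats. Lower means the
    generated spectrogram is closer to the original.
    """
    total = 0.0
    count = 0
    for gen_row, tgt_row in zip(generated, target):
        for g, t in zip(gen_row, tgt_row):
            total += abs(g - t)
            count += 1
    return total / count


def generator_total(fm, mel, kl):
    """Sum of the three components named in the chat (weights omitted)."""
    return fm + mel + kl


mel = mel_l1_loss([[1.0, 2.0], [3.0, 4.0]], [[1.0, 1.0], [1.0, 4.0]])
print("mel:", mel)
print("g total:", generator_total(2.0, mel, 1.5))
```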
so if i trained something and its too high from my liking is there something i can change in my training settings to get it to sound closer to the original
hey does rvc work in ableton or any daw for real time or no
like can you talk through rvc and get the outputs /inputs into your daw like fl studio, ableton, etc in realtime with no delayt
or is that simply not possible
do u mean rvc okada aka the voice changere
this
Ayo? @sharp rampart level 1 !!! 
or do i have the wrong thing to do it
that i have no clue i use the voice changer i thought that was just text to speech though
theres a voice changer for real time called okada but it impossible not to have delay but u could still have full convos with ppl using it and u dont need a crazy gpu
i run mine using my 3060 ti
i need it to have no delay so i can make music with it
thats not possible yet is it?
i have no clue i dont do music ai
bc im trying to get my voice to sound closer to the original is there a setting that helps that or no?
For how many minutes of dataset can I use this batch size without the colab disconnecting and for how many times so that the model does not sound bad?
isnt this slow for a 4090 22-25 min dataset
Where do i get a virtual mic
33s per 25 min set, about right?
Hello everyone! I would need the advice of someone who knows about RVC please, I can't find what I'm looking for on the forums or it's old (or I don't know how to search). I'm looking to train an rvc model, the problem is that I only have 2min of audio in wav files of about 2s. So I've tried 50, 100, 300 Epoch, changing the batch size... I still have a "robotic" voice and all the Ss or CHs are "metallic". Is it possible to train my model with all these wav files, or is it better to increase the size of my dataset? All my audio files are studio quality, without reverb ect ect... Thank you!
I run it locally on a quadro RTX 4000 Ada 8Go
increase the size of the dataset that fixes the robotic S, ch
it does not really fix them but makes them appear way less
"frequent errors occured" anyone know how to fix
Do you think it's better to concatenate all my files in one 2min wav file ?
never let rvc slice full wav files for you, it's bad
ideally every sample should be 3s-5s long
use audacity audio labelling
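If you label regions in Audacity and export the labels as text, each line is start, end and label separated by tabs. A small sketch for checking which labelled regions fall outside the 3-5s range mentioned above; `parse_labels` and `out_of_range` are hypothetical helper names, and the sample text is made up.

```python
def parse_labels(text):
    """Parse Audacity label export text into (start, end, label) tuples."""
    regions = []
    for line in text.splitlines():
        # Skip blank lines and the extra backslash lines Audacity may
        # write for spectral selections.
        if not line.strip() or line.startswith("\\"):
            continue
        parts = line.split("\t")
        if len(parts) < 3:
            continue
        regions.append((float(parts[0]), float(parts[1]), parts[2]))
    return regions


def out_of_range(regions, min_len=3.0, max_len=5.0):
    """Return the regions whose duration is not within min_len..max_len."""
    return [r for r in regions if not (min_len <= r[1] - r[0] <= max_len)]


sample = "0.000000\t3.500000\tline1\n3.500000\t5.000000\tline2\n"
for start, end, label in out_of_range(parse_labels(sample)):
    print(f"{label}: {end - start:.1f}s is outside 3-5s")
```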
All my files is around 2s long, it's better to increase manually the duration ?
I can make a python script for that, 3s of full audio, without silence
3s average
is not going to help because your dataset is just 2 minutes of audio
Okay thx
10 minutes is the bare minimum for okay performance
models hit the "realistic" tone at around 40 minutes
I have a another model with 8min of audio also in 2s wave file, but it's a really "specific" voice, it's really glitched when i use it. (It's Pat from Mickey in french), even if i try to sound like the original voice to help rvc it's the same. But if i set the pitch to -12 it's super clean, but, tooooooo deep
I'll try 10min audio
yii
use batch size 4
train for not longer than 200 epochs
And increase the dataset size ?
Ayo? @chilly ridge level 2 !!! 
yes also when a voice does not match the original is due to:
undertraining (most common)
dataset has too much variety, for example a person using two "voices" at the same time
does uvr5 also give both
vocal and audio
for separating vocals? yea, use mvsep bs roformer
or mel roformer kim
which do u prefer personally
Ayo? @oak inlet level 1 !!! 
The audio came from the kingdom hearts game, so i have files with just him yelling. Do i need to use these kinds of files ?
i prefer mel roformer, bs roformer is known to have muddy instrumentals and vocals
oh
avoid yelling and laughing audios
you can keep a few laughs
but be sure to not add too much of them
might confuse ai believing thats how the person sounds
how do i add ffprobe and ffmpeg to root. Im kinda restarted
Thx for all these advice ! I'll try theme all
are u on linux? sorry idk
if ur new to rvc check this https://docs.ai-hub.wtf/
ok
sibilants are made out of noise "columns" that are shifted in frequency
so RVC needs about 5-10k attempts to make a proper 'S' or 'Ch' or similar sounds out of pure noise. That is why undertrained model produces metallic S.
Can you explain please ?
Ho i see
too small dataset causes them to appear often due to lack of data
with a too small dataset you need to train for like 2000+ epochs
but chances are that while it may fix S sounds it may ruin voiced parts (those wavy lines)
I'll use audio file from all kingdom hearts game instead of 1 to increase the dataset size
I think i can reach 8-10min
That's a lot, even with my gpu
the example above took 5000 loops
The image ?
it will not fully remove them, they're still going to appear moderately
he's trying to explain to you that the more steps you train, the less they appear
xD
no, they don't appear less, they take a proper shape
the noise column shifts into the right frequency range
so instead of metallic z it is a proper hissing s
well this is a better way to say it lol
do 4 and no more than 200 epochs

it would depend on how many times there's S in the dataset and if you're lucky enough for the training to hit that S slice of audio
I dont think even 10h+ dataset from 3k voice recordings for a specific person's voice is necessary like this lol https://www.techspot.com/news/105764-panasonic-resurrects-long-dead-founder-ai-share-management.html
anyway, with a pretrain the requirements are not that high.. i've been testing it from scratch
with a good dataset 10-20 epochs with a pretrain get you a recognizable voice of a specific person
nowhere perfect, but like 50% of the work is done there
after that it is just small touches here and there that slowly shape up the voice
but anyway, rvc is not a real voice clone, there are very important characteristics it cannot reproduce such as personal inter-phoneme microdelays and mannerisms
~30-40 sec per epoch for 10 min dataset & batch size 8 should be normal
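For a feel of where a number like that comes from: an epoch is one pass over all slices, done in batches. A small sketch, assuming roughly 4s slices (my assumption, the message doesn't say) and a hypothetical helper name; the trainer may group or drop samples differently, so the count is approximate.

```python
import math


def steps_per_epoch(dataset_seconds, slice_seconds, batch_size):
    """Approximate number of training steps in one pass over the dataset."""
    slices = math.ceil(dataset_seconds / slice_seconds)
    return math.ceil(slices / batch_size)


# 10 minutes of audio, ~4s slices, batch size 8.
print(steps_per_epoch(10 * 60, 4, 8))
```

Dividing the epoch time you see by this step count gives a rough seconds-per-step figure for comparing GPUs or settings.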
@analog obsidian Hi Lyery, are you there?

You have a really cool Goth Mommy model, unfortunately the link is not working, did you delete it? If not, mind to share it with me if its still public? Thank you 😄
The link is this one: https://voice-models.com/model/1ucea3z45g5
i no longer have it sry, lost it when i upgraded my ssd
Was the best one I heard 🤭 . No worries, thank you
Hey, alt136735! Please use the command !howtoask to increase your chance of getting help by structuring your question in a way others can understand better. Also make sure you're asking in the right help channel:
- General RVC help: #✨│ai-help
- W-Okada / Realtime RVC: #🔍│help-w-okada
- AI image related: #🔍│help-ai-art
File "C:\Users\vedant\Desktop\Retrieval-based-Voice-Conversion-WebUI-main\infer\modules\vc\modules.py", line 172, in vc_single
self.hubert_model = load_hubert(self.config)
File "C:\Users\vedant\Desktop\Retrieval-based-Voice-Conversion-WebUI-main\infer\modules\vc\utils.py", line 23, in load_hubert
models, _, _ = checkpoint_utils.load_model_ensemble_and_task(
File "C:\Users\vedant\AppData\Local\Programs\Python\Python310\lib\site-packages\fairseq\checkpoint_utils.py", line 423, in load_model_ensemble_and_task
raise IOError("Model file not found: {}".format(filename))
OSError: Model file not found: assets/hubert/hubert_base.pt
!howtoask
How To Troubleshoot 
- Don't simply mention your issue, like "my rvc is not working".
- Describe the step you are on, what you're trying to do, the RVC you're using, a screenshot, etc.
- The more context, the better.
- Don't be desperate. You can ping a Helper, but if they ignore, they aren't available/don't know the answer.
- It's okay if you're frustrated, but don't take it into this server.
- Don't DM without prior consent.
- Don't ask for every little instruction. Put your own effort & test things by yourself.
- Don't ask to ask.
- Check if your answer is a Google search away/on our guides website.
- Creating Datasets for RVC using iZotope RX11, by Cauthess
- Gathering and Isolating Audio, by SCRFilms ❄
- Instrumental and vocal & stems separation & mastering guide, by deton24
- Vocal Mixing Tutorial, by Roomie
- https://mvsep.com/
hi there, i'd like to ask whether RVC is better than weights.gg
RVC needs lots of time to generate, while weights.gg not
idk if there's differences. Thanks!
weights uses rvc
I'm sorry so it's exactly the same thing?
if u have a 5090, it might be faster
yup
Ok thanks!
The difference is the GPU
I don't remember if it was an AI specialized GPU like A100 or an rtx 4090 tho
help, I got No module named 'gradio'
and I don't know how to fix it
if you got the source, you need to install all requirements
but better just get a compiled version
with all the required packages included
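If you do run from source, here's a quick way to see which packages Python can't find before launching. `missing_modules` is a hypothetical helper built on the standard `importlib`; the module list is only an example taken from the errors in this chat.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Example list: gradio and fairseq both show up in errors above.
print(missing_modules(["gradio", "fairseq", "torch"]))
```

Anything it prints still has to be installed from the project's requirements file (usually `requirements.txt`, but check the repo) using the same Python you launch the UI with.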
Guys plsss tell me wat Ai vocal remover y’all useee
Guys, what should I do if I experience a large delay in my voice changer? When I test it in the program, delay is about 3-4 seconds, but when I join the game, it shows around 30 000 ms. Please Help(((
refresh discord, be sure your connection is stable
might be a discord moment
bro 100mb per sec and also cable how is that can be unstable 😭😭😭
see if there's an 'X' next to the search thing, could be that your last search got stuck
o ty
did that solve it?
yeah
Ayo? @sage wind level 1 !!! 
i can even send screenshot but i dont have permission
yea you need to be level 1 by chatting to send images in help channels
yw 🔥
yw
I increased the dataset to 10min with every file duration between 5 and 8s (idk why but audacity don't want to work with less duration), it's actually in training for 200 epochs
20s/epoch
@low shard How can I convert vocals from huggingface.co to A Mp3
8s is too long, but it doesn't really matter because rvc is going to slice those long samples
(and audacity already removed long silences by itself so the chances of the rvc slicer adding long silences are very low)
I'm guessing you're talking about the vocals output you get from Ilaria RVC Zero?
Why would you want to convert them from .wav to mp3? mp3 is lossy and lower quality than wav
wav is way better, it's lossless
There's no reason to convert them,
If you really want to do that, you can just google any wav to mp3 converting site like this
Best way to convert your WAV to MP3 file in seconds. 100% free, secure and easy to use! Convertio — advanced online tool that solving any problems with any files.
But It's really not suggested
use audacity (with ffmpeg) or any other audio editor, or the online sites like suggested above
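If you'd rather do it locally than use a website, this is roughly what an ffmpeg conversion looks like from Python. A minimal sketch: it assumes `ffmpeg` is installed and on your PATH, `wav_to_mp3_command` is a hypothetical helper, and the file names are placeholders.

```python
import subprocess


def wav_to_mp3_command(src, dst, quality=2):
    """Build an ffmpeg command that encodes src to MP3 with LAME VBR.

    quality is the -q:a value: lower means better quality and bigger files.
    """
    return ["ffmpeg", "-i", src, "-codec:a", "libmp3lame",
            "-q:a", str(quality), dst]


cmd = wav_to_mp3_command("vocals.wav", "vocals.mp3")
print(" ".join(cmd))
# Uncomment to actually run the conversion:
# subprocess.run(cmd, check=True)
```

Keep the original wav around, since the mp3 can't be turned back into the lossless version.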
thanks men
- Search for it in AI HUB Docs or Applio Docs. You will probably find your answer there 📚
- Ask for help in #🔍│help-w-okada if it's related to real time voice changing but make sure to read #1297207135469305866 first
- Ask for help in #✨│ai-help for general help, but use the command !howtoask first to learn how to structure your question properly and increase your chances of getting a reply
- Last but not least, ask for help in #🔍│help-ai-art if it's related to AI images.
It's better but sometimes it's robotic again
but it's really really better
yes its normal, usually you stop getting robotic sibilances at the 30 minute mark
anything below that use a batch size of 4
and pray for getting them less often
okay, and if i try more epochs ? like 400 ? will it just get worse ?
not worse but the improvement is very marginal and you have the risk of overfitting
no audible difference + unnecessary risks
ok i'll try to increase the dataset again
do you know how i can automatically extract the voice of a certain character from a movie file ?
it looks really long to do manually
that is called speaker diarization, currently the only one that exists is named pyannote and is extremely ass
i tried with python and speechbrain but it's not convincing
Ayo? @chilly ridge level 3 !!! 
sadly is better to separate speakers manually
nah not that ass
i tried that too
it just identified the whole movie as if it were the character i'm looking for
idk why it wasn't for me
yea its trash dont use it
I haven't played with it since a while tho https://github.com/sanctuary-osai/Pyannote-Speaker-Diarization-3.1
if it works for you, then explain how to use it
better not having the risks of ai getting confused at voices
they do have a paid version i havent tried
probably he tried that
How could I?
idk i was guessing
me 2
the parameters are explained in the ui of the project i sent
🔥
thx
That's true tho, better double check it
but maybe could help, im saying it just for that
imagine having to watch a whole movie to make a model 😭
diarization didn't work
any idea how i can export all the movie scenes with that character ?
clearly i don't like the idea of doing it manually
do it manually

anyone know fix
i have 1h of dataset, do i also use 4 for batch size and 200Epoch ?
so, where is it
hard to answer this since it depends on the variety of the dataset
ah
I mean
It should be pretty simple
🤔
you lack a model ( a component of rvc, that is )
oh
same as always, i have 1h of audio with only 3s-4s audio file
for simplicity sake use batch 8
Dl all you lack from there
thx
THAT WAS IT
I DIDNT KNOW WHAT TO DO ON THAT TY
( + additionally, the root of the issue
but typically the last line tells you what is the main deal
ye
Alr, glad I could help. best of luck man
yes
oh ty
rmvpe is for f0 extraction
hubert is for features
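A small sketch for checking that both of those model files are where the UI looks for them. The two relative paths are the ones that appear in the error and log pasted in this chat; `missing_assets` is a hypothetical helper and other forks may use different folders.

```python
from pathlib import Path

# Paths as they appear in the error and log output in this chat.
REQUIRED_ASSETS = [
    "assets/hubert/hubert_base.pt",  # feature extraction
    "assets/rmvpe/rmvpe.pt",         # f0 (pitch) extraction
]


def missing_assets(root, required=REQUIRED_ASSETS):
    """Return the required files that do not exist under the RVC folder."""
    return [rel for rel in required if not (Path(root) / rel).is_file()]


# "." assumes the script is run from inside the RVC folder.
for rel in missing_assets("."):
    print("missing:", rel)
```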
hello how do i send pictures for help?
First you must get to some level, I think 5, 7 or 10 to have an ability to ( in case you aren't able to rn
You level up by being active on chat
it's level 2
oh, then 2 then. Seems like the threshold got lowered
i can do it rn
oh so i just yap?
yues
I mean yea kinda, but you can just go #🤖│bots I suppose
or somethin'
to not clog the main chats
Anyway, I gotta go back to work now
yo i got another error now, there is no traceback, it processes for a few sec then it says error
2024-12-01 11:13:13 | INFO | fairseq.tasks.hubert_pretraining | current directory is C:\Users\vedant\Desktop\Retrieval-based-Voice-Conversion-WebUI-main
2024-12-01 11:13:13 | INFO | fairseq.tasks.hubert_pretraining | HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': 'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
2024-12-01 11:13:13 | INFO | fairseq.models.hubert.hubert | HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}
2024-12-01 11:13:14 | INFO | infer.modules.vc.pipeline | Loading rmvpe model,assets/rmvpe/rmvpe.pt
this isn't an error
oh
its info
its normal
nothing seems wrong here
hi! i try to use RVC but in the end it says this.does anyone know whats the problem and how to fix it
should i rec it
this gui is extremely outdated
-colab
oh snap is there a tuto on how to download it?
seems a lot of stuff
im not aware of much github
nah there isnt
wait
make sure u have pip 24.0
3.10
rc1
tf is that?
u need it to install the requirements
Ayo? @oak inlet level 3 !!! 
nah its aight i was just tryna make a silly cover
oh
thank you very much helping tho!
Ayo? @waxen kelp level 2 !!! 
what's ur pc gpu?
and yea u shouldn't use rvc gui
its GX 840
you sure it's that?
You can check your pc gpu via:
ctrl+shift+esc (task manager) -> Performance tab -> GPU
and here u should find the gpu memory
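On NVIDIA cards you can also get the name and VRAM from a terminal with `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader`. A minimal sketch for reading that output; `parse_gpu_csv` is a hypothetical helper and the sample line is made up to match the 3060 Ti mentioned in this chat.

```python
def parse_gpu_csv(output):
    """Parse 'name, memory' lines from nvidia-smi CSV output (no header)."""
    gpus = []
    for line in output.strip().splitlines():
        name, memory = [part.strip() for part in line.split(",", 1)]
        gpus.append((name, memory))
    return gpus


sample = "NVIDIA GeForce RTX 3060 Ti, 8192 MiB\n"
for name, memory in parse_gpu_csv(sample):
    print(f"{name}: {memory}")
```

This won't list AMD or Intel GPUs, so Task Manager is still the fallback there.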
I have never heard of that gpu + can't find it online
I just realized my version of rvc gui is outdated too damn
Ayo? @gloomy lynx level 1 !!! 
yea nah u should never use rvc gui
what's ur pc gpu
nvidia geforce rtx 3060 ti
applio imo
Alr lemme try it
10min25/Epoch is normal ??
what, 10 minutes per epoch or 25 seconds?
that's not normal, it should be 1-3 minutes per epoch if your dataset has 1 hour of data
how much vram do you have
I reach 48min without silence in audio
8 GB VRAM, 32 GB RAM and a big Xeon processor
I have an AI gpu normally
I don't know if that changes anything
oh that explains
Quadro ADA RTX 4000 8 GB
this is what are you using rn?
idk sorry i have no idea, i don't use these types of gpu
With the 10min dataset and 4batch it was 7s/epoch
I checked the "cache dataset in GPU" option
oohhh
this explains
caching the dataset in gpu with huge datasets causes massive vram usage
you're using system RAM
this is why it became so slow
system is using fallback memory
disable that on big datasets
Ok so i need to uncheck this option ?
yes
Now it's 2min/Epoch
Ayo? @chilly ridge level 4 !!! 
Thx 🙏
I'll credit you if i finish my project one day
so uhwhere do i download
Ayo? @oak hearth level 2 !!! 
my thingy flatlined, it TECHnically is going down and it isnt going up do i just download the latest .pth saved?
or do i let it go until it goes up?
click
you need an A100 to do that
there's no point of using that option tho
oml thank you how did it get like that
I dont recall there's RTX ada with only 8 GB (you could get a cheaper normal rtx one with 12/16 gb vram lmao)
so uh is there any way to resume training without a g and d file
I get it for free, so i don't really care 🙃
I just checked and you're right, i have a quadro rtx 4000 8 GB, the 4000 ada 20 GB is in my boss's workstation
Mb
you can resell it, profit! 
Haha, no, i prefer keep it
can someone help
Hey, jinxss! Please use the command !howtoask to increase your chance of getting help by structuring your question in a way others can understand better. Also make sure you're asking in the right help channel:
- General RVC help: #✨│ai-help
- W-Okada / Realtime RVC: #🔍│help-w-okada
- AI image related: #🔍│help-ai-art
I have a 3060 8 GB in my laptop. With the same dataset and settings, which one do you think would have fewer min/epoch ?
What’s the
why does it say the input is the vb output, it also says output is vb input
Ayo? @timber hamlet level 1 !!! 
how much faster is an a100 compared to colab t4
there's only RTX 4000 ada 20Gogs, your boss must have modded yours to 8Gogs (if not physically, perhaps only bios tweaking). performance wise, it's roughly between 4060 Ti (including 16Gogs) and 4070 (12 Gogs).
No no
I said i don't have an ada
Just a quadro rtx 4000
My boss's gpu is an rtx 4000 ada
bruh you did say this
Here
the name seems a bit ambiguous, perhaps the old Turing one?
https://www.techpowerup.com/gpu-specs/quadro-rtx-4000.c3336
Haha yeah, it's this one
I thought I had the same computer as my boss
But it looks like he keeps the big gpu for himself
in comparison, there's RTX 2070 with the same gogs and similar performance
I see, so, my laptop with the 3060 8 GB could be faster/epoch ?
also RTX laptops are usually easier to overheat than desktop ones
I put my laptop upside down in front of the AC 🤫
I don't understand the graph
skill issue
it literally shows the same performance as 2060
Okay, not bad for a free gpu
how the fuck do i fix this
probbaly something like vc redist is missing
or cuda tools
well when i was opening the rvc file
it showed up a error
is there a proper way to open the file?
is failing to load because some dependency is missing
there's a way to find the missing dependency
https://github.com/lucasg/Dependencies/releases/tag/v1.11.1 - download x64 release, unzip, run the dependenciesgui.exe
from that open the dll shown on the screenshot, it will list what it needs
but my guess is either vc++ redist or CUDA toolkit
Check out this creation I made on Weights.gg! https://www.weights.gg/shared/cm43ggokn1qqwogar8nn0oibe?inviteCode=4619f
Are there any accessible applications that could have the same effect as RVC?
-cOLAB
- Applio, by IA Hispano Google Colab
- RVC Disconnected, by Kit Lemonfoot Google Colab
- RVC Mainline, by Hina Google Colab
- AICoverGen-WebUI, by Hina Google Colab
- AICoverGen-NoWebUI [English], by Ardha, fixed by Eddy, Hina and Gdr Google Colab
- AICoverGen-NoWebUI [Spanish], by Eddy, Hina and Gdr Google Colab
- UVR5 NO UI, by Eddy Google Colab
- UVR5 UI, by Eddy Google Colab
- Modified W-Okada's Voice Changer, Google Colab
- 🆕 FaceFusion UI, by Nick088 Google Colab
- 🆕 FaceFusion NO UI, by Nick088 Google Colab
- 🆕 EasyGUI, by Rejekts Google Colab
While the Colab free plan provides up to 12 hours of daily usage, the GPU is typically available for only about 4 hours each day on average.
Ayo? @blazing solar level 1 !!! 
wdym same effects as RVC
what’s ur pc gpu and what are u looking for?
Can do the same thing as RVC I mean
You can just use RVC
RVC is the best speech to speech program
Reply to this
do you mean any frontend applications using RVC? I'd recommend any of following:
-rvc
- How to use RVC Mainline Colab by Cauthess
- AICoverGen Colab Guide by Eddy (Spanish Helper)
- Create a model with RVC disconnected (colab) by Angetyde
gradio is one of the most used framework for making webuis in AI projects
so a requirement for eddy’s UVR UI
by checking its github, https://github.com/Eddycrack864/UVR5-UI
you seem to be on windows, did you run UVR5-UI-installer.bat Without administrator?
Yes
Are you sure it’s in the C drive and there’s no special characters in whatever folder it is?
it doesn’t seem to be on the C drive
how many hours do yall think is too many hours for a dataset to be used on a voice changer
like if i wanted to avoid it being overtrained what would i do
for the maximum
why does rvc keep crashing whenever i click start audio conversion? (not responding)
-hf
- UVR5 UI, by Eddy and Ilaria Huggingface Spaces
- Ilaria RVC Zero, by thestingerx Huggingface Spaces
- RVC⚡ZERO, by r3gm Huggingface Spaces
- Applio, by IA Hispano Huggingface Spaces
- 🆕 FaceFusion UI, by Nick088 Huggingface Spaces
What are u using to run rvc
as in gpu?
do u know that maximum dataset length a voice can have without being overtrained
There is no constant length u can’t predict anything
But if I were to choose
I’d go for 7-10 min dataset of clean and diverse data
minutes???
Yes
No dats bad
Someone need to put out a new one 😭
I can’t do it for local rvc
Ik what he using
smh
how is that bad
Dats what will most likely happen
it seems he resumed training with different batch size from before
It was fucked before they even stopped training
he also kept overtraining, perhaps till more than 1k epochs 💀
hi I have problem, I tried to import voice model to gui but got this error:
size mismatch for enc_p.emb_phone.weight: copying a param with shape torch.Size([192, 768]) from checkpoint, the shape in current model is torch.Size([192, 256])
how can I fix it?
I'm sorry for my mistakes, english is not my native language
What are u using to run rvc
locally or colab
rvc gui from there https://github.com/Tiger14n/RVC-GUI?tab=readme-ov-file
so 2-3 hours would be fine?
This is an architecture mismatch error
The gui ur using is very out of date
It’s only compatible with rvc v1 models
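To make the mismatch concrete: the error reports the checkpoint's `enc_p.emb_phone.weight` as `[192, 768]` while the old GUI builds a model expecting `[192, 256]`. A minimal sketch, assuming the second dimension is what separates the two architectures (256 for v1, 768 for v2); `guess_rvc_version` is a hypothetical helper, not part of any RVC tool:

```python
# Hypothetical helper (not part of RVC): guess the architecture version from
# the shape of enc_p.emb_phone.weight as printed in the size-mismatch error.
# Assumption: 256-wide input features mean v1, 768-wide mean v2.
EMBED_DIM_TO_VERSION = {256: "v1", 768: "v2"}


def guess_rvc_version(emb_phone_shape):
    """Return 'v1', 'v2', or 'unknown' for a (hidden, embed_dim) shape."""
    if len(emb_phone_shape) != 2:
        return "unknown"
    return EMBED_DIM_TO_VERSION.get(emb_phone_shape[1], "unknown")


# Shapes copied from the error message above.
checkpoint_shape = (192, 768)   # what the downloaded model contains
gui_model_shape = (192, 256)    # what the old GUI tries to load it into

print("checkpoint looks like:", guess_rvc_version(checkpoint_shape))
print("GUI expects:", guess_rvc_version(gui_model_shape))
```

If the two answers differ, the model file itself is probably fine and the frontend is simply too old for it.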
1 hour is my maximum length
oh what have I use instead?
-colab
Suggestions for @vale raptor
- Applio, by IA Hispano Google Colab
- RVC Disconnected, by Kit Lemonfoot Google Colab
- RVC Mainline, by Hina Google Colab
- AICoverGen-WebUI, by Hina Google Colab
- AICoverGen-NoWebUI [English], by Ardha, fixed by Eddy, Hina and Gdr Google Colab
- AICoverGen-NoWebUI [Spanish], by Eddy, Hina and Gdr Google Colab
- UVR5 NO UI, by Eddy Google Colab
- UVR5 UI, by Eddy Google Colab
- Modified W-Okada's Voice Changer, Google Colab
- 🆕 FaceFusion UI, by Nick088 Google Colab
- 🆕 FaceFusion NO UI, by Nick088 Google Colab
- 🆕 EasyGUI, by Rejekts Google Colab
While the Colab free plan provides up to 12 hours of daily usage, the GPU is typically available for only about 4 hours each day on average.
Use applio it’s the first link
thanks I'll try
Let me know if something goes wrong
it’s better to ask whats the user pc gpu first than giving colab
in case he has a good pc to run it on
True
what’s ur pc gpu?
yea bc i seen people using colab while they got an rtx 3060, just because the helper gave them a colab
The user should always be asked what hardware they got so the helper can give them the tools to use
Could you explain how to download it?
Download what
he
rtx 2060s
-gui
see
that’s why every staff should always ask for the hardware first
your pc is good enough to run it locally (use it on ur pc), colab is just a cloud service (run it on remote good pc, for people who got bad pc)
elaborate whats ur pc gpu and what u want to do
rtx 4060
Ayo? @amber fjord level 1 !!! 
and what do u want to do
trolling my friends
trolling can mean anything
im guessing you want realtime voice changer for calls?
yeah
Yea it's best to always be specific when asking for help
-rt
1st link, its the wokada (program for using rvc , speech to speech, models in realtime for calls) fork (modified version, this one has better performance)
the 1st link is a guide to install it (which u will use the nvidia way as you got an rtx)
By reading the guide, you have everything you need to know, if you run into any issues, ask in #🔍│help-w-okada as this isn’t the correct help channel for that
thank
yw
I'm sorry for silly questions but how can I add voice model
Ayo? @vale raptor level 2 !!! 
Download tab then paste the huggingface model link
I see
damn i did it I finally did it it's victory
thank you very much
since the whole thing is in 35.3-36.4 range, the growing value is not a big deal. but the value itself is high, so yeah, likely just bad dataset
too much variety in the samples, i guess?
Too much of something is always bad not just in rvc 
He trained it for way too long I think dats why the value is high
it started with 35 and ended with 36.4?...so training duration played little difference
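For a sense of scale, the drift being discussed can be put as a percentage. A small worked example using the two endpoints quoted above (35.3 and 36.4); `relative_change_pct` is just an illustrative name:

```python
def relative_change_pct(start, end):
    """Percentage change from start to end."""
    return (end - start) / start * 100.0


# Endpoints of the range mentioned in the chat.
print(round(relative_change_pct(35.3, 36.4), 1))  # -> 3.1
```

So the whole run moved by about 3%, which is why the absolute level matters more here than the upward trend.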
I have an opposite problem
what is it?
I wish people stopped placing projects into 'special' folders
move the vc folder to C:\vc
in short, programs like voice changer are not designed with the best Windows SDK guidelines. VC executed by you, without admin permissions, is trying to write stuff into semi-protected Program Files folder, which Windows does not allow, because the software should utilize a user folder instead
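A quick way to check whether a folder is actually writable without admin rights is to try creating a temporary file in it. This is only a sketch; `can_write_to` is a made-up helper and `C:\vc` is just the path suggested above:

```python
import tempfile


def can_write_to(folder):
    """Return True if the current user can create a file inside folder."""
    try:
        # The temporary file is deleted automatically when the block exits.
        with tempfile.TemporaryFile(dir=folder):
            return True
    except OSError:
        # Covers PermissionError (protected folder) and missing folders.
        return False


# Example: the system temp folder should normally be writable.
print(can_write_to(tempfile.gettempdir()))
```

Running `can_write_to(r"C:\Program Files\...")` as a normal user would be expected to return False, while `can_write_to(r"C:\vc")` should return True once the folder exists.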
I never really understood what the fm graph is good for
hey im using the google RVC colab, the added index didn't get created ); everything else goes well, what should i do?
elaborate more, which colab and whats the error when training the index?
rvc V2
there's a comparison between an original slice of audio and a generated slice of audio, the discriminator (the teacher in the GAN algorithm) compares them by looking through several filters and calculates a mean difference
since it does not compare the exact values, it is possible that one part of the generated audio gets better while the other gets worse, but the average difference stays about the same
maybe I don't understand something, but no matter where I put it, the program doesn't work for me
How do I know if the model is improving or not thru feature matching
generally the difference between real and fake audio, and thus fm value, should be going down
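The explanation above can be sketched in a few lines. This is a simplified, HiFi-GAN-style feature matching loss written with plain lists; it assumes each discriminator layer's output is flattened to a list of floats and leaves out any scaling factor the real training code may apply:

```python
def feature_matching_loss(real_feats, fake_feats):
    """Sum over layers of the mean absolute difference between
    the discriminator's features for real and generated audio.

    real_feats / fake_feats: list of layers, each a flat list of floats.
    """
    total = 0.0
    for real_layer, fake_layer in zip(real_feats, fake_feats):
        diffs = [abs(r - f) for r, f in zip(real_layer, fake_layer)]
        total += sum(diffs) / len(diffs)
    return total


real = [[1.0, 1.0]]
fake_a = [[0.5, 1.0]]  # first feature is off, second matches
fake_b = [[1.0, 0.5]]  # first feature matches, second is off

print(feature_matching_loss(real, fake_a))  # 0.25
print(feature_matching_loss(real, fake_b))  # 0.25
```

Both fakes get the same value even though they are wrong in different places, which is the "one part gets better while the other gets worse" effect: the average can stay flat while the audio changes.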
My fm be going down sometimes
Actually most of the time
here I had a model try to produce a single sample
you can hear the difference and the FM chart reflects that
I meant up** 
What are the factors that influence fm
which colab? Send link and show the error
yeah, best case I see something like
real, so many people still use this channel for wokada
cuz someone needs to rename the channel 🙂
help-rvc (not VC!)
help-w-okada (the VC!)
Yeah dats what I be getting too
I thought it was bad
I think the value going up is expected as the model figures out the parameters to use, but it should stabilize and settle at some value or around it without much deviation
like that 'cement' example above
the discriminator used in RVC is not stellar, so in most cases the model just finds some local minima for parameters and settles around that
and it does not get out of that hole no matter how much more you train it
there's a chance that fm going up is just that hump on the chart after which the value would go down once the model learns to reproduce the audio better, but I've yet to train for that long to see it
I guess improving the discriminator is the next move
that would require a whole new set of pretrains
I've tested a new loss function that does not require much change, seems to be doing better
When is it releasing

does anyone have any colab i can use to make ai covers
Ayo? @fallen grotto level 2 !!! 
-colab
Suggestions for @fallen grotto
- Applio, by IA Hispano Google Colab
- RVC Disconnected, by Kit Lemonfoot Google Colab
- RVC Mainline, by Hina Google Colab
- AICoverGen-WebUI, by Hina Google Colab
- AICoverGen-NoWebUI [English], by Ardha, fixed by Eddy, Hina and Gdr Google Colab
- AICoverGen-NoWebUI [Spanish], by Eddy, Hina and Gdr Google Colab
- UVR5 NO UI, by Eddy Google Colab
- UVR5 UI, by Eddy Google Colab
- Modified W-Okada's Voice Changer, Google Colab
- 🆕 FaceFusion UI, by Nick088 Google Colab
- 🆕 FaceFusion NO UI, by Nick088 Google Colab
- 🆕 EasyGUI, by Rejekts Google Colab
While the Colab free plan provides up to 12 hours of daily usage, the GPU is typically available for only about 4 hours each day on average.
Use applio
thanks ho
😭
Not available yet
Any order of model to use to extract vocals from a song to make AI cover?
I'm both using MVSEP and UVR
i have a branch created for applio 3.2.7
Sorry if this is the wrong place to ask this question but I'm brand new to AI voices and what can be done with them.
I was wondering if the w-okada voice changer is the only 'app' available for real-time changing of voices on a Mac or are there any other apps I could try RVC files out on a Mac with? Thanks.
read the pinned guide there: #🔍│help-w-okada
So basically w-okada is the only way to use RVC voices in real-time on a Mac then? There aren't any other apps that can be used instead?
Can u link it
nope, not even voice.ai
Okay thanks. Seems rather strange but it is what it is I guess.
Ayo? @rare bear level 1 !!! 
W-okada is the best 
Well from what I've been told so far it's the only one so would be the best wouldn't it? 😉
lol
voice ai so ass bro omg
🙏
When I try to start Applio (colab), I am getting this:
An error occurred connecting to Discord: Could not find Discord installed and running on this machine.
Traceback (most recent call last):
File "/content/program_ml/app.py", line 90, in <module>
inference_tab()
File "/content/program_ml/tabs/inference/inference.py", line 418, in inference_tab
choices=get_speakers_id(model_file.value),
File "/content/program_ml/tabs/inference/inference.py", line 325, in get_speakers_id
model_data = torch.load(model, map_location="cpu")
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1004, in load
with _open_zipfile_reader(opened_file) as opened_zipfile:
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 456, in __init__
super().__init__(torch._C.PyTorchFileReader(name_or_buffer))
OSError: [Errno 22] Invalid argument
It just stays like this with no link to gradio. Anyone else?
It was working last night, so I am guessing it needs to be fixed. Edit: nvm
Ayo? @shut goblet level 3 !!! 
if I had to guess, some translation is messed up
Where can i find Voice models for AllTalk Ai?
pth files dont work there
i dont understand what this means 
Hey, I need to remove the breaths from my dataset, for example: the breath a singer takes when he is about to sing the verse, or releases when he finishes singing.
There aren't here
80% of the models are RVC (Best few shots Speech To Speech)
20% GPT-SoVITS (Best few shots Text To Speech)
Why is .wav not recognized as a format?
i18n is a package that is used to translate UI to different languages
not an actual .wav maybe?
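One way to check that guess is to look at the first 12 bytes of the file: a standard WAV starts with `RIFF`, then a 4-byte size, then `WAVE`. A minimal sketch (it ignores rarer variants such as RF64, and the function names are made up for illustration):

```python
def looks_like_wav(header: bytes) -> bool:
    """True if the first 12 bytes match the standard RIFF/WAVE layout."""
    return len(header) >= 12 and header[0:4] == b"RIFF" and header[8:12] == b"WAVE"


def file_looks_like_wav(path):
    """Read just the header of a file and check it."""
    with open(path, "rb") as f:
        return looks_like_wav(f.read(12))


# A renamed MP3 usually starts with an ID3 tag instead of RIFF.
print(looks_like_wav(b"RIFF\x24\x00\x00\x00WAVE"))
print(looks_like_wav(b"ID3\x03\x00\x00\x00\x00\x00\x00\x00\x00"))
```

If the check fails, the file was probably just renamed to `.wav` and needs a real conversion in an audio editor or converter.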
Did you solve it?
I thought that bug was fixed. Anyway, how to fix that is described here
Thanks for lmk Nick
What are pretrains? If I want to clone a voice, should I use a pretrain or spend more time making it myself? What will be more quality?
Ayo? @oak hearth level 3 !!! 
Are pretrains just faster and better and should i use them to make the most lifelike realistic clone?
pretrains are pre-made models trained on days worth of audio (in rvc case)
making them from scratch is a very hard process even for people that have the knowledge
always use a pretrain model when training on rvc
their purpose is to have a baseline during the finetuning process (when you are training a model)
They can affect the final result quality since audio upscaling is involved
If you train without a pretrain your model is going to sound like shit because rvc has no prior knowledge of sounds
always use the original pretrain as the custom made pretrains don't reconstruct your dataset frequencies as well as the original
for a realistic model train an expressive non monotone dataset of 30 minutes and above
the more data, the better and more realistic
be sure your dataset has variety, for example avoid monotone dialogue or audios repeating the same sentence/words etc
Can someone give me advice? I’m training a model with 15 hours of data and I don’t want to use a pretrain on it. Which one should I select? Custom makes me add my own pretrain I think but I’m not sure.
Ayo? @coral frigate level 5 !!! 
15 hours is too short for a pretrain
aim for the amount of time as vctk dataset
are you making a from scratch pretrain or a finetune
I’m not aiming to make a pretrain. I just want to make a voice model with my dataset but I don’t want to use any of the pretrains applio offers
what i said still applies
If that makes sense
rvc will not be able to give you a good result
use a pretrain instead
when you remove a pretrain you remove the knowledge of sounds
Im still new to making models but everytime I’ve used a pretrain, the voice has sounded strange and very off. Is there a way I can fix that then if I have to use a pretrain?
de-select this
you've used a pretrain with this exact dataset?
Yes with 150 epoch. It sounds nothing like the dataset and just sounds really off. It has been the case whenever I use a pretrain. In this case I used the contentvec pretrain.
contentvec is not a pretrained model
is an embedder
contentvec is for feature extraction
for your huge dataset of 15 hours you can try batch 16, and it should take 1 or 2 days to give good results with a pretrain
Ow I assumed that they were all just pretrains.
Can you suggest which pretrain I should use from the list for an English model?
original pretrain
Sorry if this is a dumb question but How do I use an original pretrain?
dont touch this
are you on applio? simply don't change anything related to pretrains, it'll use the original
yea that
leave it like that
Ow ok I just keep that unchecked and I’m good to go?
And yes I’m on applio
yep
Ok thank you so much guys
So would it matter which one of these I select if I’m unchecking the box anyway?
leave it on contentvec
Ok thank you
you dont need shitton hours of a single speaker dataset from 3k voice recordings like this https://www.techspot.com/news/105764-panasonic-resurrects-long-dead-founder-ai-share-management.html
30m-2h can already produce good results
I was just told when I started that the bigger the dataset the better
do you even have enough time and effort to clean the massive dataset lmao
remember: quality > quantity
the more the better but i agree 15 hours is too much 😭
also you have to be sure your dataset has no possible sounds that could cause problems later
Yeah sadly I did have to listen through all 15 hours like a podcast
i mean technically a 15 hour dataset is better than a 30 min or an 1 hour one
but rvc already becomes realistic at the 30 minute mark
technically speaking is not bad
but is too much effort
Idk I’m dumb and still figuring rvc out. I’ve actually been trying to train this for 3 weeks now but applio hasn’t been able to train it. Just keep getting an error
for realistic purposes i recommend using a dataset of 1 hour
max 2 hours
higher than that is not bad, but not worth the effort
Would’ve been nice to know before going through hours and hours of data
yea... we tell people the more the better because most believe a model is going to sound realistic at 5 minutes of data
surely the cleaning effort could take more than double the audio duration itself, unless you hire a few freelancers (that still waste of money)
Is it normal for feature extraction to take 10-15 minutes? I’m trying to troubleshoot why I still can’t train the model
yes because your dataset is 15 hours
@knotty moth is rmvpe on applio using gpu now?
or they're still using rmvpe cpu
if applio is still using rmvpe cpu, is going to be super slow
if applio is using rmvpe gpu it should be done in 30 mins approx
what i can say about this is at least his 15 hour model will be able to do much more than a 1 or 2 hour one
so the effort will at least be worth a little bit
but still, rvc is really limited because hifigan and bad code
ive been getting this error in the cmd for a month and im probably doing something dumb thats failing the extraction. have any suggestions on how else i can trouble shoot it?
reduce cpu cores to 4
not sure but it still has some gpu usage
ill try that
hmm he got oom so i suppose is using rmvpe gpu
do i use 4 for both preprocess and extract?
yep, it doesn't really make a speed difference, or at least i haven't noticed it
maybe it has one but is very marginal
I only recall oom happens when inferring a big whole audio depending on ram
oom also happens when you're using too much cpu cores during feature extraction
staring at crepe
not sure, it simply hangs the system for a while and then crashes
it gave me a bsod once 😭
crashed that bad
It’s in the middle of extracting all the audio files but as its going in seeing that it still gives the same error line for each one
uh.... are you sure you reduced the cpu cores during feature extraction?
whats going on is that your system is getting out of vram during the feature extraction
Yh it’s at 4 now for both
Could it be because I have 33 files in the dataset folder maybe?
i dont think so, first process is the preprocessing, this slices your audios since hifigan does not work in long audios
after that feature extraction, extracts the features of the sliced audios
so is doing it on 3s samples
it goes out of vram when is doing a lot of them
at the same time
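Rough arithmetic for why a 15 hour dataset is heavy here, assuming fixed 3 second slices (the real slicer cuts on silence, so the actual count will differ):

```python
import math


def count_slices(total_seconds, slice_seconds=3.0):
    """Number of fixed-length slices needed to cover the audio."""
    return math.ceil(total_seconds / slice_seconds)


dataset_hours = 15
print(count_slices(dataset_hours * 3600))  # 18000
```

That is on the order of 18,000 samples going through feature extraction, so anything that processes many of them in parallel adds up quickly.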
This has been an issue for a while now. Idk what else I can do to fix the issue
last resort is to try using 2 cpu cores instead of 4
if that doesn't work im lost sorry, have never faced an issue like that on applio before
might be related to applio only
and nothing is wrong in this section?
nope, looks fine to me
Well then I’m doomed
setting cpu cores to 2 also shouldn't affect the speed or quality
since it looks applio is finally using rmvpe gpu
like u could try that
cpu cores only affect feature speed when you're using an cpu based extractor
I’ll try that but I don’t think it’ll change much seeing that my cpu should’ve been able to handle 4
Maybe redownloading applio will do something
your cpu is not being used here, your gpu is
the gpu is doing the extractor
hence why it gets out of vram
in older versions of applio rmvpe was actually cpu based
but is very slow compared to rmvpe gpu
Even then my gpu should’ve been able to handle 4 without vram issues. I’ll try redownloading applio and hope that fixes something
yeah exactly it should be able to do it, i find weird its getting out of vram
i hope redownloading fixes the issue
Thanks. I hope so
Ayo? @coral frigate level 6 !!! 
I’m out of options after that
Limit each file's length to 10 mins ( maybe 15 )
use 4 or 8 threads and switch to rmvpe ( non gpu variant )
btw, would be helpful if you provided info on your amount of ram, ( and vram if you actually did use rmvpe gpu variant )
Also, total length of the set and length per file ( can be avg )
is applio using rmvpe gpu?
is weird he's getting oom with a 4090
not sure as I don't use applio
perhaps it's a 2 in 1 ?
in any case, 4090 shouldn't have any ooms like that
its a 15 hour dataset
he is using rvc slicer to slice those files
only reasonable way out of it is.. the extraction's done on cpu and / or per-file length
oh
That's an overkill
what's the length per file? ( as I assume it's not one big sample but split )
How do I switch to the non-GPU variant? I have 32 GB of RAM and the file lengths vary from 20 minutes to 50 minutes for the longest
ye that's the thing
above 20 or maybe 25 ( but I'd stick to 20 per file ) mins, it can cause issues
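If you want to pre-split the long files yourself, the cut points are easy to plan. A sketch that only computes the offsets (it does not touch audio), using the 20 minute limit suggested above; `plan_chunks` is a hypothetical helper:

```python
def plan_chunks(duration_s, max_chunk_s=20 * 60):
    """Return (start, end) offsets in seconds covering the file
    in consecutive chunks no longer than max_chunk_s."""
    chunks = []
    start = 0
    while start < duration_s:
        end = min(start + max_chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks


# The longest file mentioned was about 50 minutes.
print(plan_chunks(50 * 60))  # [(0, 1200), (1200, 2400), (2400, 3000)]
```

The offsets can then be used in any audio editor or splitter; ideally nudge each cut to a nearby silence so no word gets chopped.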
oh but even if he lets the slicer to do it?
yea this is new to me too xD
oh yea
that can surely be the case cause, 1 set where extraction is loading 15 hours of data vs sequential 1 by 1 sid processing ( even 40 or 100 hours )
is different
interesting, rvc always giving surprises
or so I suspect at least
there's no other way out of it
other than applio being jammed ( wouldn't be surprised at this point )
i believe this might be the problem actually lmao
lemme prepare something
might have a 'temp' solution
@coral frigate Amount of threads you have?
I'll assume you can afford to use 8
How do I check? But I definitely would
that's alright then, gimme a sec
have we known his cpu? if it's intel 13/14th, could be its degradation issue
Ryzen 9 9950x
check your cpu & gpu stability, shouldnt be overvolted
Sure on what?
@coral frigate Cause basically, you'd have to try 2 things
- Try to feature extract on single 30-50 min file
- If that failed, you'd have to use another fork or just, mainline for the sake of extraction ( and then move stuff back to applio ) (( that'd confirm whether applio is the issue or just the situation itself isn't favored by rvc in general ))
- If that failed too.. rip, in that case it'd mean rvc's not optimized for single speaker 15h at once extraction
Ok how do I move the feature extract back into applio if that’s the issue ?
