#✨│ai-help
1 messages · Page 243 of 1
hahaha. I use it for bannerlord where we got somewhat of a RP medieval clan. But thanks man you really did help me alot It now works on every platform!
Hello, can anyone tell me which Colab is currently used to make AI Covers and Models?
is it normal for the output headphones to sound a little better than cable output?
Hello everyone!
I’m looking for help to create a custom Arabic RVC voice model for a gift.
I want to make a voice similar to Gulf Arabic artists like Ouzii / Luigii style, for a personal use (not commercial).
If anyone is available to help me build the model, I would really appreciate it 🩷
I’m ready to pay for the service if needed.
Thank you so much 🙏🏼
Which voice changers are free?
is Mel-Roformer-Denoise-Aufr33 better or Mel-Roformer-Denoise-Aufr33-Aggr
specifiacally speaking when using this
https://huggingface.co/spaces/TheStinger/UVR5_UI
this one is and I can help u set it up in dms
https://huggingface.co/Shadicti/deiteris-Fork/resolve/main/voice-changer-windows-nvidia-b2332.zip?download=true
Yes pls
bro i can not find the rvc download anywhere
what are you looking for specifically?
i used to have it it was just called rvc and i rember getting it off git hub, you would import the zip and put in a pre recorded audio and then it would be ai voice so
are you trying to train a model?
I wanted to train a voice model, but the link i usually use for the rvc fork is down. Does anyone have another link?
Is their a cracked version of Voice Mod?
Is there a way to fix static or is it just a mic issue?
using applio and recieving this error An error occurred during audio conversion: index -1 is out of bounds for axis 0 with size 0 how to fix?
i forgot what is the best ai cover maker rn?
What happened to the separate channels for the RVC help
I use weights
I'm not getting any output through the virtual cable doing realtime, anyone know why?
I can hear it when I set my monitor to headphones but nothing comes through on discord or in sound control panel
hi, does anyone know, how to check my model in training process (weights.com)? because it says that it already created, but i cant use it and even see in the list. I just wanna know what time left
Is there a way to get rid of like the weird voice cracks?
no but there's a way to decrease them
well different ways actually
one is training a better model
then the other way is enabling fp32 mode and pray if your current model is going to get any better (if the model was trained in fp16 i don't think enabling fp32 will help that much since the model is already fried inside)
how do i do that?
train the model using a considerable big dataset, around 40 mins ~ 1 hour
i noticed voice cracks happen because the f0 estimator being mid (they're not that great sadly) but also because the model is trying to do a sound it doesn't know how to reproduce
Yeah I mean how do I do that in itself?
In the context of RVC, the dataset is an audio file containing the voice the model will replicate. It can be either speaking or singing.
i am using rvc ai cover maker and when im trying to make a cover it at export audio and output information it says error
no
what are u doing then?
okay so it was an app. called RVC. i had it on my old computer. And you couldint train voices but you would import zips upload an mp3 click how many. "ceces" i thinnk it was called then it would export it in the voice
@viscid moss I think u need to translate this for me
omfg
dude
let me do smth rq
🥀
im gonna make a drawing of what i remeber the app looking like
hhmmm idk
Heya, just wondering how do we get ranked up so we can share our RVC and TTS models? Also been doing ton of learning, testing, and ai training and would love to soon share some of the work I have. Making a reliable compact modular system that can spontaneously regenerate your custom AI on almost any platform with training/memory enabled, very lightweight and 100% portable.
Having a blast learning about all this and making stuff and would love the chance to share too!
@junior gull maybe check https://discord.com/channels/1159260121998827560/1305527335646269440 ?
yeah i think so
getting this on kaggle when starting applio:
PyngrokNgrokError: The ngrok process errored on start: authentication failed: The authtoken you specified does not look like a proper ngrok tunnel authtoken.\nYour authtoken: token\nInstructions to install your authtoken are on your ngrok dashboard:\nhttps://dashboard.ngrok.com/get-started/your-authtoken\r\n\r\nERR_NGROK_105\r\n.
https://discord.com/channels/1159260121998827560/1305527335646269440 there's no level requirement to submit there, and the model quality matters
where can i download onnx models most of them are pth models there
thanks, that channel was collapsed before didn't' see it.
hey i'm currently using the w okada fork but i can't seem to find a way to prevent the voice changer from picking up my laptop's fans and making unwanted noises
what GPU?
why rvc gui not opening
is it a error?
Running with the system Python.
Nie moPress any key to continue . . .
where is a more up-to-date tutorial on how to use colab for w-okoda's realtime voice changer thing?
i dont understand anything and the tutorial im following isint working
i switched to another mic and it's better, i have the gtx 1660 ti
laptop
Colabs are generally broken, youre best of using kaggle
But whats your gpu first of all
it's like an amd 580, it's really not the greatest so thats why i wanted to do it online since the app doesnt work too great on my pc
Fork wokada would work decently, id say give it a try first
Last update: May 5, 2025
was about to ask where to find that thank you
Else, for the online hosted, follow this
https://docs.aihub.gg/rvc-voice-changer/cloud/w-okada-kaggle/
Last update: May 5, 2025
tysm
why rvc gui not opening
is it a error?
Running with the system Python.
Nie moPress any key to continue . . .
i've cloned the public kaggle thing but i cant find "Notebook Options"
I guess the accelerator part here
k
they may have shifted some options recently
is "Realtime Voice Changer Client" still good?
i just installed applio rvc and this shows up
Please run 'run-install.bat' first to set up the environment.
Press any key to continue . . .
what i need to do
would you be suprised if i said you need to run "run-install.bat" first
by the way is there a way to make it sound better my models sound nowhere near as good as the samples
okay so i've done all of this
is there a way to save this so that i dont have to do the setup the next time i want to try and run this?
there is a slight quality degradation while using a model in realtime
u could try max extra value, crossfade set to 0.1s, and enabling fp32
if the model still sounds bad then its a model issue and the only way to fix is to get another model or train one yourself
can anyone help me why is this lagging
Looks like your GPU is not strong enough for the chunk you chose
Chunk (the 512.0 ms) has to be higher than perf
Whats your gpu
im rtx 4060
idk bro
f0 det rmvpe
alr trying
can anybody help me with the error of vol0000?
line 1 is working, but sound is not being played. I reinstalled the program and components ~5 times, but no changes.
so for videos like the president stuff, is elevenlabs the best to do that stuff?
Make sure your input is at 100%, pretty sure its 10% at default
is elevenlabs the best thing for the tts videos or is there a better alternative?
chatterbox is very good and free and may be better than 11labs for english
awesome, thanks
...and i'm not a coder, shit.
it installs with one command line
where do i put that command?
as long as you have python installed, ideally 3.10 or 3.11
dont use 3.12 or 3.13
got this
oh
this correct?
it probably installs cpu version of the torch
if you run nvidia gpu you may need to change it to cuda version
i'm a bit new to all this stuff
anyway, it can run on CPU as well, that's fine to just test it
but i am doing the correct command in the right place now
you are installing it in to the global repository
usually not a good idea to do that with multiple project as there are often conflicting versions of the libraries
any idea how to uninstall it?
dont worry too much, you can install it properly later
alright
i mean i just restarted the installation like three times already lmao
alright it was giving me a lot of errors
idk what to do
wheel creation?
i just don't want the files cramped in my c drive
i already installed and cancelled it like 3 times to troubleshoot
well, by default the global repository is on C
under c:\users\user\appdata\local\programs\python\python310\libs\site-packages
that's the global repository
to make a local you need to use #✨│ai-help message
unless you've installed python to somewhere else
OHH i got it from the microsoft store
i'm just not cut out for this shit 😭
most of AI project is not for beginners
but you can ask chatgpt to explain things
is that the best for beginners that want to export models here and make tts videos and shit?
if you install chatterbox is it really simple to use
alright i'll keep working
https://youtu.be/trgPAtcVNfQ
following this video but the commands don't exactly work
my vcc thing isnt working. help
how do i fix this being loud at the start when i speak then it gets normal
how do i fix this error:
The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
from distutils.util import strtobool
not an error
my rvc wont work, my friend is just saying its playing a hello every few seconds and my audio wont go through at all
nvm got it to work
Timer: 00:00:48/content/voice-changer/server/HVoice.py:3: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
from distutils.util import strtobool
Traceback (most recent call last):
File "/content/voice-changer/server/HVoice.py", line 10, in <module>
from downloader.SampleDownloader import downloadInitialSamples
File "/content/voice-changer/server/downloader/SampleDownloader.py", line 12, in <module>
from voice_changer.RVC.RVCModelSlotGenerator import RVCModelSlotGenerator
File "/content/voice-changer/server/voice_changer/RVC/RVCModelSlotGenerator.py", line 4, in <module>
import torch
ModuleNotFoundError: No module named 'torch'
WARNING:pyngrok.process.ngrok:t=2025-06-02T23:20:37+0000 lvl=warn msg="Stopping forwarder" name=http-41369-e68eec6b-0315-4a59-870c-6d6c66810395 acceptErr="failed to accept connection: Listener closed"
--------- SERVER STOPPED! ---------
this is the full error
seems like you tried to install requirements and it failed. You may want to grab a prebuilt package.
but again this looks like colab?
yeah im using colab
whattttt
unless someone fixes the install part
show me the url
yeah, outdated af, i aint gonna touch it
fml
i never had issues w it before
do u happen to know an updated version?
that supports mac?
well, maybe not oudates, but something is wrong with requrements install perhaps
how do i fix it
rip lol
in that time I had to fix applio colab like 5 times already
ugh
hi does anyone know how to stop this happening? i cant delete slots with the edit button as all my slots are coming up as "blank" from previous voices
is there another version? xd
Is there a way to use without internet connection/chrome tab?
like a built in program?
no, but i recommend you to use vonovox
its faster than wokada and doesnt run in your browser
oh, do you have a link/guide for that please?
tutorial is there, just scroll down until you find it
thank you very much
nvidia only
gotcha, im on RTX 3070 TI.

oh, do i have to compile it myself?
just download it and run setup.bat
no dont worry, setup.bat will just download the required stuff to run it
then once thats done run start.bat
im only finding the source code downloads on the "downloads" tab
there is a guide for it https://docs.aihub.gg/rvc-voice-changer/local/vonovox/ its kinda bare atm but it explains stuff
legendary, thank you thats the one
excited to play around with it 👍
kinda wild how like 2 months ago we were sus of vonovox
things change quick
yeah lol
would you say the performance of vono is better than rvc?
oh, what should i use to refer to the web based version?
w-okada
gotcha, that would make sense.
if you talking about the realtime version
i am yes.
then yea, w-okada
Thank you, been using okada for a while, but i feel like my card can be utilised better.
unsure how to word that but yeah
wokada is poorly optimized yup, the fork dev tried his best to improve it
vonovox is a completely new software, so things are better
awesome 
long install time via the bat xD
getting there
do i use launcher once that says complete?
or start
I think there's now some way to conver the whole pytorch model thing to just a cuda kernel
for super fast performance
oh, vonovox shows my folders empty where my pth files are, thats odd. (oh, vono uses pth not safetensor)
dev said he's going to add safetensors soon
where can i stay updated with that please? as all my main voices are on safetensor
join vonovox discord server (it's in the github repo)
you've been so helpful, thank you lyery!
is there any good places to find pth files?
usually use weights, but im unsure if its just safetensor there
thank you
I'm about to stab the hugging face website 23 times on the aides of march
I want to download deepseek prover V2 671B but there's 163 files and nothing I do works
I've tried git, aria, Jdownloader, and the python CLI
git lfs?
idk if it doesn't exist or what
I'm on windows btw
oh wait I found it
how do I download it using that?
the usual git clone of the repo?
where does one find that? it's not the link at the top of the website or the one the files are coming from apparently
wihout /resolve/main ?
does anyone know of any text to speech models that run locally or on web and are not as expensive as eleven labs since I create audio's of over 1hr every other day
@candid basin
Is very likely that it's your datasets issue
Care to show me a short sample of the audio u are using
i still have not used my dataset.i am aksing if possible to lett my i can finetunn it on my dataset
i just tried the hugind space veriosn
Well training is fine tuning so yes you can definitely fine tune
Which one
Yeah that's a zero shot tts
is threre any git of python example code on how i can finetune one on my dataset ?
thnks fo ryour help
or any guide
Ah you simply want to fine tune a chatter box model. In that case go to their git hub and look through their read me. Perhaps they'll have some info there regarding model training which I think they do. I thought you were asking about RVC initially
:] unfortunately they do not :p .I thought to ask her ein order to skip some time searching :]
thank a lot for the help!!
Aww sucks; well good luck finding some sources then. Perhaps a quick Google query might do the trick
I tried it and it's insane how you can make the vocals sound so much better
thank you!
your welcome bro
Is this W okadas? It looks a little different.
yes its the W Okada Deiteris Fork
Awesome, i just found that and downloaded it… Definitely have to check out the improvements!
alright bet i can even help you if you need some help ❤️
I probably won’t get to check it out until tonight, but I may hit you up on that if it gives me any trouble setting it up. Thank you!
alright bet
who can make me a index and pth
🎉 | Jawh leveled up!
ℹ | Level up messages can be disabled for the guild with owo level disabletext
i try everything . shi isnt working
literally no one
vonovox
🎉 | Razer leveled up!
| Extra rewards were added for missing levels
ewwwwwww
💀
btw, do you know if it's okay to pair spin with KLM 4? Or the experimental one that SSS made in pretrain lah.
if it was made with 7_12 spin then yea
can someone give me a link to a colab where i can train models (one that isn't applio)?
good luck with that
What limitations (if any) does the Weights voice model training feature have regarding audio quality compared to a local or cloud training like "Mainline Collab" or Applio?
Like, whats the max ammount of khs that the Weights training (USING A PREMIUM TRAINING) uses/supports of an audio for example.
Or if it adds any sort of compression to the dataset/final model audio no matter what format and properties it has.
Im only certain of the fact that you cannot upload very heavy audios for the training, wich means you will mostly not be able to use a max quality wav dataset for example.
do not train on weights.gg 
how do i even download is there a tutorial
what are you trying to download?
voice changer
bet I can help u, dm me ^^
take this example, both used the same dataset and one sounds better than the other (I didn't use premium because ai should be FREE)
idk any of the complicated stuff I can only provide this kinda info
I'm no nerd..
yeah, that premium thing is... eh... but atleast you can get free premium trainings for each 5 day streak successfully made. thats something I suppose ¯_(ツ)_/¯
me neither i guess 😐
Jesus, now that im really analazing, the Weights one is just depressing comparing to tbh
indeed
how can i turn someones normal voice into like an ai singing voice
I can show u how
okkk thank youuuu
check dms
how long is lfs supposed to be finished for? it's been like this for over 30 minutes now
omfg it doesn't even work???
yeah no when I say "like this" I mean "100% (163/163)"
this has been running for abt 4 hours
that 689 isn't the total going to be downloaded, that's the amount already downloaded
and it's been going up every once in a while those 4 hours
and it's been stuck at 689 for like 30 minutes
well more like 40/45 now
maybe for the tool you suggested to actually work
which apparently learning how bad huggingface is was too much to ask
lfs is how the models are stored there, there's no other way of downloading them, other than clicking off each file manually
so the way they're stored just doesn't work then
-colab
Google Colab is a Cloud (Remote Good PC) Service. While the Free plan provides up to 12 hours of daily usage, the GPU is typically available for only about 4 hours each day on average.
by IA Hispano
Google Colab
by Hina
Google Colab
by Eddy
Google Colab
by Eddy
Google Colab
by Deiteris & Hina
Google Colab
by Shiro & Eddy
Google Colab
by Nick088
Google Colab
by Nick088
Google Colab
by Jarredou & Makidanye
Google Colab
you are trying to download almost 700GB from a web site for free
be grateful it does not give you 5MB/s download speed
ohh.. 700GB download is taking more than 4 hours... the horror
dude
I'm not "downloading 700 gb for free" if over half of the 4.3gb dls are 1kb
I'm not complaining it's taking long
I'm complaining it doesn't work
chill the f down and wait until it is done
oh my bad I though percent was out of a hundred
so could you enlighten me on what percent means then?
I'm doing stuff with my pool one minute
lfs downloads the resource pointers 1st, those are 1kb files
then it actually pulls the content
Does the server have a channel for Lora's for using stable diffusion
you could've ran out of space or something
I did last night, that's why it didn't finish till today, I had to delete the stuff, expand that partition, and restart
so the 1kb files have this in it
but none of them have changed
and it's been sitting at 100% for over an hour now
i mean the text in that 1kb file, it is supposed to be replaced by the actual 4GB content
yeah I think that's only happened with 4 so far
like the one numbered 004 not 4 different ones
it should be downloading them after the pull
well they should all be downloaded
there's a 641gb "objects" folder in .git
I assume it's pulling smth from there
you can check the status with git lfs ls-files I think
I assume * is done and - is in progress
it was all a fucking waste anyway
you need an Nvidia GPU and I have an AMD
it doesn't tell you ANYWHERE
this one section is the only way it'd be possible to find out
Attempts to run deepseek, finds out
now I'm trying to run the 7B parameter version of math (not prover) and using the exact thing it tells me to in the way it tells me to and it's giving an error based on their code
Traceback (most recent call last):
File "D:\AIs\Deepseek-Prover\runDeepseek.py", line 6, in <module>
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "d:\AIs\Deepseek-Prover\venv\Lib\site-packages\transformers\models\auto\auto_factory.py", line 571, in from_pretrained
return model_class.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "d:\AIs\Deepseek-Prover\venv\Lib\site-packages\transformers\modeling_utils.py", line 309, in _wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "d:\AIs\Deepseek-Prover\venv\Lib\site-packages\transformers\modeling_utils.py", line 4508, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "d:\AIs\Deepseek-Prover\venv\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 618, in __init__
self.model = LlamaModel(config)
^^^^^^^^^^^^^^^^^^
File "d:\AIs\Deepseek-Prover\venv\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 379, in __init__
self.post_init()
File "d:\AIs\Deepseek-Prover\venv\Lib\site-packages\transformers\modeling_utils.py", line 1969, in post_init
if v not in ALL_PARALLEL_STYLES:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: argument of type 'NoneType' is not iterable```
this is getting rediculus
v has no value
yeah I don't know why
trace it back
that's 5 files deep
and half the functions just tell me there's no definition in vscode
usual thing when you IDE did not load definitions
if v not in ALL_PARALLEL_STYLES: this is a part of a pretrained model load
okay so it's hardcoded to be none
"if self._tp_plan is not None and is_torch_greater_or_equal("2.3"):
for _, v in self._tp_plan.items():
if v not in ALL_PARALLEL_STYLES:"
" _tp_plan = None"
and everything I've done is from the example given
no
my bad I thought "here give some examples" was an example
config = self._autoset_attn_implementation(config, torch_dtype=dtype, check_device_map=False)
and that's what I did
self._tp_plan = self.config.base_model_tp_plan.copy() if self.config.base_model_tp_plan is not None else {}
if it is not empty, the loop happens
and it throws and exception if it is unsupported style
for you it throws one because the value is None
so I have to go on my own and figure out what a tp plan is and set it for the model to work and they don't say that anywhere
AMD rx 6750
using zluda?
if this was an issue about not having a GPU I'd understand bc it doesn't support any type of cuda or whatever the AMD equivilant was called
not sure what that is
it is the magic that lets you run CUDA stuff on AMD GPUs
does it still work if my GPU doesn't support ROCm?
yeah my actually strong PC is on windows
for 6750 read te instructions on https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU page
wait but isn't this all for nothing if it's still giving an error I can't fix?
based on what the models are supposed to actually do this probably won't help with the thing I wanted anyway so this all is just kind of a waste of time
@simple ore u seem knowledgeable mind if i shoot u a random question new to this discord but wanted ur opinion on something
!howtoask
How To Troubleshoot 
- Don't simply mention your issue, like "
my rvc is not working". - Describe the step you are on, what you're trying to do, the RVC you're using, a screenshot, etc.
- The more context, the better.
- Don't be desperate. You can ping a Helper, but if they ignore, they aren't available/don't know the answer.
- It's okay if you're frustrated, but don't take it into this server.
- Don't DM without prior consent.
- Don't ask for every little instruction. Put your own effort & test things by yourself.
- Don't ask to ask.
- Check if your answer is a Google search away/on our guides website.
My friend cannot change their input settings. Every time they do they get this error message. They just got the software ( fresh install ), version 1.5.3.16a
What is your friend's PC GPU?
There's a better W-Okada than this version.
NVIDIA GeForce RTX 3050 Ti Laptop(4GB)
i did not know, i just got the same one they had before/version that i use
hoping to avoid compatibility issues whoopsies
Use this W-Okada instead. https://docs.aihub.gg/rvc-voice-changer/local/deiteris-w-okada-fork/#download-nvidia-on-windows
Last update: May 5, 2025
should we just delete the entire folder the old one was in?
i just want to make triple sure because i tend to mess up
we're downloading htis one right?
thank you!
It's working wonderfully, thanks a lot!
anyone got a working colab link for RVC2 (no applio) ?
What's the best option for using the voice changer with AMD? Is this it or is there a better one? https://www.kaggle.com/code/suneku/voice-changer-public
i'm trying to do this with 'ngrok'
Chunk is mainly for latency but if its too low for your gpu to handle it will lose some quality too
Colabs not working atm, follow Kaggle
how tho
this guide there doesn't even tell me what site to go to
There is a guide on the link nick sent right next to the link you opened
Full name of your amd gpu?
i have to use phone numbah?
Yes
bruh
check your gpu name in task manager or any other applications like GPU-Z
RX 5000 series/newer are recommended
if it's only "AMD radeon graphics", more likely it's integrated gpu which is less capable
yes but there are options to choose from
Just keep rmvpe selected.
Don't use any of these crepe options.
that's the one used for these models
but i realized that "_onnx", which was default set, sounds more clear
Keep that one selected then if it works fine for you.
No
Higher chunk higher delay but also more time to compute the voice, but at some point increasing doesnt improve voice
Extra 2.7s , advanced settings: increasing crossfade length helps with clearer voice, turning on fp32 for nvidia gpus too
can someone give me a link to a colab where i can train models?
got this error on mac M1 when i wanna convert. Any ideas how to fix? Tried google already but didn't fix it: AttributeError: 'NoneType' object has no attribute 'tobytes'
the audio-path is definitely not wrong
you need to provide a bigger error log
Traceback (most recent call last):
File "/Users/jlapping/.pyenv/versions/3.10.11/lib/python3.10/site-packages/gradio/routes.py", line 488, in run_predict
output = await app.get_blocks().process_api(
File "/Users/jlapping/.pyenv/versions/3.10.11/lib/python3.10/site-packages/gradio/blocks.py", line 1434, in process_api
data = self.postprocess_data(fn_index, result["prediction"], state)
File "/Users/jlapping/.pyenv/versions/3.10.11/lib/python3.10/site-packages/gradio/blocks.py", line 1335, in postprocess_data
prediction_value = block.postprocess(prediction_value)
File "/Users/jlapping/.pyenv/versions/3.10.11/lib/python3.10/site-packages/gradio/components/audio.py", line 349, in postprocess
file_path = self.audio_to_temp_file(
File "/Users/jlapping/.pyenv/versions/3.10.11/lib/python3.10/site-packages/gradio/components/base.py", line 325, in audio_to_temp_file
temp_dir = Path(self.DEFAULT_TEMP_DIR) / self.hash_bytes(data.tobytes())
AttributeError: 'NoneType' object has no attribute 'tobytes'
2025-06-04 15:10:38 | INFO | httpx | HTTP Request: POST http://localhost:7865/api/predict "HTTP/1.1 500 Internal Server Error"
2025-06-04 15:10:38 | INFO | httpx | HTTP Request: POST http://localhost:7865/reset "HTTP/1.1 200 OK"
you can try saving the file you want to convert into assets/audios and then just click refresh on UI and pick the file from the drop-down list
guys anyone can tell me which training and makings rvc models, last time i used RVC1006Nvidia
RVC1006Nvidia still works, otherwise you have Applio or more advanced forks
if u say still rvc1006nvidia is good ill keep that
thanks for info btw <3
if you're happy with it and know how to use it properly, why not
i did really good models with this thanks. oh i also my i ask. can we make good models with laughing or screaming, i mean i did really nice realistic models for realtime voice changer but. they cannot laugh or scream etc. that voices is not word thats why maybe.
i wondering if we have good dataset for that voices is it possible to make good models?
laugher generally fails, screaming requires a dataset with a large dynamic range, generally rvc inference is pretty flat
Optional
Last update: May 5, 2025
What do you need to download to train locally?
what's your pc gpu? what do you want to do?
2070 super
good
I gave you an explaination to the differences and links to the docs
ok thanks you!
yw and lmk
can someone give me a link to a colab where i can train models (one that isn't applio)?
Let me know when you found something.. I need the same
gl on finding a working on
ew money
?
don't pay for ai
i need it for a proyect
finding or funding, that's two very different requests
a or b?
A
thank u
spelling mistakes go hard
live mods dont review the model submissions do they?
just wondering...i got a very silly and confusing reply for mine.
"sounds distored" "retrain and getr rid of distortion"
.... submited with the details and description and listed online with the same.... DISTORTION is signature to this char's voice pattern, since she always talks that way, and my goal was to faithfully capture that in great detail, and I did. I clearly communicated that on her hugginface listing too, and on my model submission details. Its not any different than the dozens of robotic voices listed here already. Same idea.
But then I got that wierd reply like a real person hadn't even bothered to read anything.
your model was rejected because the voice you submitted had effects and that makes it harder for us to QC it
thats why its a rule that you cant submit a robotic voice or a voice with effects
but yet there are like over a dozen modles i can immediately see that have such effects and MUCH heavier too all in the model section?
So that doesnt' really make sense? And hers is WAY lighter than that....its a knowon characteristic of this character. It's not added. Its literally in EVERY refrence file to her voice becuase....its her voice. There is no version of her iwthout it....and I wouldn't want one if there was. Against thtat doesnt' make any sense???
they can post those models because they passed the model maker test
Show your skills first with a normal model, then do whatever
^
why wasn't that said in the reply then? ~_~ instead of telling me to build my model withotu her main defining feature she is known for lol. could have saved ALOT of confusion.
Alirght then.
"Submit a model trained on a normal voice"
it was
maybe a languate barrier? that wasn't how it was phrased...it was still complaining about her voice. It never said I had to do a normal BEOFRE hers could be considered. That's a key differnce, and changes the meaning completely. But I understand now thanks to @simple ore clarification.
Thank you
😐
might be a bit though...the only other one i was already working on was another male voice with a similar effect....based on Zachary Quinto's Invincible character 🙃 Found these both really appealing as voices to use for AI related projects.
i'm trying to learn illustrious character lora training
the tut i used for pony claims that the settings should be fine for illustrious (and were in fact written with it in mind) but i find my resulting models are always less style influenced
any images i gen with them come out looking very booru
https://civitai.com/articles/9005/a-detailed-beginners-guide-to-lora-training-on-civitais-trainer
ive been using this tut which is supposed to be for the civit trainer but the settings can be pretty much replicated on kohya
whats the best place for me to iterate on my settings to try and reenforce style? ive seen plenty of cartoon loras that actually maintain their source style on models like wai-nsfw and im trying to achieve that
also worth noting im p sure i trained at 512 nvm i checked i AM on 1024 res, will also say my datasets are unfortunately limited in size, usually around 15-25 total images
mi spelling mistakes arre hard?
flux is another one you can try though it's rather demanding to do locally
you can run quantized flux really well
great quality for regular artsy-fartsy and realistic pics
does this look bad to yall
4 batch size with a 4 min dataset and that was at like 120 epochs
nvm i was just paranoid its fine
Last update: May 5, 2025
what is the diff between sio and rest protocols for the voice changer? i know what rest is, but does this provide any latency help over sio? if it does, then why isnt it the default? (asking bc some guides say to swap sio to rest)
anyone know why my shit is just blank white
Does anyone have the new Google Collab for creating AI models (the RVC v2 disconnected)?
Protocol: rest (Use SIO if you want less delay but if you encounter any issues with SIO switch back to rest. Rest has slightly more delay than SIO)
Any support on how to make them sound less ai>
Hey! Where can I find people to train a voice model? I have a dataset, I would be grateful for help
I was hoping someone could point me in the direction too, I have tried a few "clone" models of it but they always fail to work
what would you suggest if you don't have a GPU but still want to train now that Disconnected is gone?
Train (make) RVC Models on cloud:
- Prepare the Dataset
- Setup RVC:
Choose a cloud way to use RVC,
- Google Colabs (max 4 hours of daily T4 16gb gpu not granted for free, not much hours for training, but easy to use, there's a paid tier):
- Kaggles (a bit harder to use and needs phone number but gives 30 hours weekly of better gpus, either T4x2 16gb each or P100 16gb, only free):
- Lightning.ai (Kinda hard, needs login, no issue with web uis or anything, but only free 15 credits monthly, Free Studios run 24/7 but require restart every 4 hours. There's a paid tier):
- Be sure to know about the tensorboard
Google Colab = Easier but risk of getting disconnected
Kaggle = Harder but way more gpu time
If you are looking for the easiest way and for free, is using https://weights.com/ which ofc uses RVC
RVC Inference (use models) on pre-recorded audio on Cloud
You can use either:
- Weights.com: Easiest Possible Ever Automatic
- Ilaria RVC Zero: Fastest free on cloud
- Applio UI Colab: RVC Fork with some extra features like TTS
- RVC AI Cover Maker UI: Automatically Separates the Vocals and Instrumentals, converts the voice and mixes them back
You can't request for models anymore, tho you can find docs on how to make them yourself
elaborate
what's your pc gpu first
!howtoask
How To Troubleshoot 
- Don't simply mention your issue, like "
my rvc is not working". - Describe the step you are on, what you're trying to do, the RVC you're using, a screenshot, etc.
- The more context, the better.
- Don't be desperate. You can ping a Helper, but if they ignore, they aren't available/don't know the answer.
- It's okay if you're frustrated, but don't take it into this server.
- Don't DM without prior consent.
- Don't ask for every little instruction. Put your own effort & test things by yourself.
- Don't ask to ask.
- Check if your answer is a Google search away/on our guides website.
How do I make female voices sound more like human and not ai like the monotone background of the ai
elaborate:
- your pc gpu
- what tutorial link are you using
- a screenshot of the program you're using with the settings
- if you want to use them in realtime or pre-recorded audios
!give-media-perms 1h @royal grove
nice, you want to use them in realtime of pre-recorded audios?
also what tutorial link are you following?
Real time
I didn follow any tutorial the only one i used was to actually download wokada
lemme guess, you found the link via a youtube tutorial?
yeah
🌟Best FREE Voice Changer :👉 https://bit.ly/3U3MVYi
👉W-okada Voice Changer download link: https://huggingface.co/wok000/vcclient000/tree/main
👉VB-Audio download link: https://vb-audio.com/Cable/
👉Get RVC Voice Models Here : https://voice-models.com/
🚀 Up To 60% OFF on Father's Day :https://bit.ly/43XM7cJ
#freevoicechanger #real...
this one
video tutorials get outdated easily, and in fact this is an old version of original wokada lmao
thats old asf
yup u just wasted time
shit
plus vb audio cable gives issues and can randomly stop working as users told us on windows
forget everything you get by video tutorials
should i delete the vb audio cable
you should delete everything you just installed like if you never saw that video
you're using an almost year old version lmfao
its like trying to run windows xp in 2025, shittiest performance you could get and you're missing out on options to get better quality
thanks bro
ill download this one
can i keep the voices when i delete everything?
that's just a link to the github repo, you need to read the full guide
if you dont read it you will just fuck things up lol, dont just click the first thing you see
is it supposed to open up in your browser
which one do i download?
they are all AMD
@low shard
is this not required to download
because after downloading the nvidia it opened a version of the voice changer on my browser
do i use that one?
nvm i figured it out
thanks
which tutorial link are u following
you shouldn't look at the github
@royal grove dont go in the github, read the guide i sent you
where do i put my downloaded models
Thank you for this, I have tried the Applio (UI) via Colabs but the tutorial is referring to an older version so alof of it isnt the same and then it just fails to generate an index or model. RVC V2 Disconnected was so much easier, and it worked for me. SHame it was closed
if you get a message about the index failing to generate, you probably messed up the prep part, did not slice dataset or pointed to a wrong folder. The preprocess log should output xx minutes processed and extract feature log should say xxx/xxx segments processed. Not 0/0.
Just wanted to know anyones thoughts on Renting a Cloud GPU for a few hours specifically for building a model that im unable to build locally but will be able to run locally once built. (havent seen this topic get mentioned what i see is those rent for environment due to specs)
Use Case Qwen2.5 32B -Instruct fp8 Quant GOAL
Extra Context my Specs:
CPU: AMD Ryzen 7 7800X3D (8 cores / 16 threads)
GPU: NVIDIA GeForce RTX 4090
Motherboard: MSI MAG B650 TOMAHAWK WIFI
RAM: 64GB G.Skill Flare X5 Series DDR5
thanks im trying to do through tensorRT LLM and go through the building process through trtllm-build but it requires x3 x4 so my thought process is this.
Quantization Optimization Strategy
Testing Protocol (Shoot for the Best, Work Downwards):
Priority 1: FP8 Validation
✅ RTX 4090 Support: Test FP8 tensor operations compatibility
✅ Performance Benchmark: Measure speed vs memory vs quality
✅ Stability Test: Ensure consistent outputs and no crashes
Priority 2: Enhanced AWQ Evaluation
✅ Calibration Quality: Test 1024 vs 512 calibration samples
✅ Block Size Impact: Compare 64 vs 128 block sizes
✅ Mixed Precision: FP8 KV cache + AWQ weights performance
Priority 3: Baseline Confirmation
✅ Standard AWQ: Ensure proven configuration works as expected
✅ Fallback Readiness: Validate backup option performs acceptably
goal use cloud gpu specs like A100 80GB to build then it will fit within my 24GB VRAM if that makes sense
Q5_0 will fit
8192 context is a bit small, but it may be fine with it offloaded into RAM
Q5_0 model size is 21GB
my apologies for lack of context i aim for 32k context length with this i used Q4_K_M previously but i need improvements because that uses llama.cpp which is like 75%-80% compared to tensorRT
Decision Matrix:
Metric
FP8
Enhanced AWQ
Standard AWQ
RTX 4090 Support
Test Required
✅ Proven
✅ Proven
Expected Speed
🥇 Best
🥈 Better
🥉 Good
Memory Efficiency
🥇 Best
🥈 Better
🥉 Good
Quality
🥇 Best
🥈 Better
🥉 Good
Risk Level
🔶 Medium
🟢 Low
🟢 Minimal
Selection Criteria:
RTX 4090 compatibility (must work flawlessly)
Performance improvement over current Q4_K_M (minimum 6x speedup)
Memory efficiency (must fit in 24GB with overhead)
Output quality (must maintain coherent responses)
Stability (no crashes or artifacts during extended use)
Executive Summary
Goal: Build an ultra-optimized Qwen2.5-32B-Instruct model using W4A8_AWQ (but aiming for FP8) quantization via cloud GPU, then deploy locally on RTX 4090 for maximum performance with 32K context support.
Problem: RTX 4090 (24GB) cannot compile TensorRT engines for 32B models due to memory constraints during build process, despite having sufficient memory for runtime.
Solution: Use cloud GPU (A100 80GB) for one-time engine compilation, then deploy locally.
Primary Goal (Best-Case Scenario):
Build Qwen2.5-32B with FP8 quantization achieving:
✅ Speed: 70-110 tokens/sec on RTX 4090 (10-20% faster than W4A8_AWQ)
✅ Memory: ~12-16GB runtime usage (more efficient than AWQ)
✅ Context: Full 32K token support
✅ Quality: 98%+ of FP16 performance (floating-point precision advantage)
Secondary Goal (High-Performance Fallback):
Enhanced W4A8_AWQ quantization achieving:
✅ Speed: 65-95 tokens/sec on RTX 4090
✅ Memory: ~14-17GB runtime usage
✅ Context: Full 32K token support
✅ Quality: 96%+ of FP16 performance
Tertiary Goal (Proven Baseline):
Standard W4A8_AWQ quantization achieving:
✅ Speed: 60-90 tokens/sec on RTX 4090
✅ Memory: ~15-18GB runtime usage
✅ Context: Full 32K token support
✅ Quality: 95%+ of FP16 performance
Sorry if is too much context just trying to share relevant details after ideal build is complete i plan to use with anythingllm and setup draft model,embedding model,vector db etc these options will just beat the slow Q4_K_M 32k context speed i was unsatisfied with
guys why my app crash everytime i tried to use voice ai
Tell me your PC specs
wdym
Your cpu gpu and ram
i7 4079 geforce 32
What ?
yes but how do i use the text to speech
You can use kokoro tts, f5 tts
where
On your system, install it
You mean you want to fine tune?
That's what I said, that's called fine tuning a model, if you want to create a pre trained from scratch, that's like impossible on our normal systems.
I suggest you to use applio tts. There you can use your voice model. Or you can use any tts and then convert that audio into your desired character's voice
I would not suggest that, edge tts in applio is purely for demo purposes. There are better tts available. Edge is just a screen reader for websites after all.
is there any benefit to use nvidia broadcasts echo and noise removal as opposed to using okada's builtin echo and sup1&2?
broadcast is much much better
alright, is it advisable to have them both on? or just broadcast with dietris fork?
mic -> broadcast app -> voice changer
both gonna use rtx cores on gpu, have not tried it personally.. should be fine on a newer gpu
@scenic arch https://x.com/mrprowestie/status/1252867224466939908
Okay this is amazing. NVIDIA RTX Voice filter blocking out direct fan noise and a hammer banging on the desk... what is this wizardry?! 👀
@NVIDIAGeForceUK
version from 5 years ago, should be even better now
whats the best tts for rvc also
There are different Text To Speech (TTS) AIs:
GPT So Vits: RVC isn't as good as GPT So Vits for tts, but gpt so vits (few shot tts, which means needs just a lil training for models) can't use rvc models (and viceversa), and its only limited to: english, chinese, Cantonese, japanese & korean, if you wanna check gpt so vits instead, read https://docs.ai-hub.wtf/tts/gpt-sovits/
Freemium 11labs: Easy way to do TTS is https://elevenlabs.io/, you can't use RVC model on this but its a mostly premium easy way for good quality TTS
FishSpeech: FishSpeech is a 0 shot (no explicit training needed) TTS, if you got a good pc you can use it locally else use their site
You can check TTS in our tts index
With RVC Models:
RVC is natively for Speech To Speech, but forks such as Applio have built in tts (using Microsoft Edge TTS to make a generated tts audio, which i suggest you to choose a tts model that is the same gender and language of the rvc model you wanna use, and then convert it with rvc)
If you wanna do tts locally with RVC Voice Models (if you got a good pc):
- You can get Applio in our docs
If you don't got a good pc you can do tts with RVC Voice Models on cloud:
-
Ilaria RVC Zero (Running on A100 GPU, free fasted rvc on cloud) and the guide
-
Use Applio UI Colab (with google colab T4 free daily limit gpu)
-
You could try another tts from our tts index and use the output as an input in rvc
Question about W-Okada voice changer. If I still have an older version, is it not supported anymore or work properly? Because I've been having trouble with my gaming laptop I just got a year ago after graduation that worked just fine until recently last month went I was minding my business on Minecraft and it closed my laptop and never turned back on so Geek Squad looked at it and said its a faulty battery issue they "fixed" but after it returned home to me three days ago and only got 5 minutes of playtime for updating, clearing storage space and some games and it turned back off on me after opening Google so my father had to return it and Geek Squad said they had a feeling he'd be back like they were expecting it and now they're saying it can't be fixed and I'm forced to buy a new laptop after my father used up the protection plan. Anybody know?
I would like an answer before the laptop arrives in two days...
Ah, the Geek Scam
You either learn how to diagnose and fix your PC or you pay thru the nose for placebo fixes.
So the voice changer is not responsible? And I knew Geek Squad was a scam from a different pc expert actually opening up the devices to look inside and fix things and replace the motherboard but my father would not listen
So idk if that's a good idea for now until I figure out if my problem with my laptop dying and not turning back on is caused by okada or not...But just saying in case it is
very unlikely to damage anything permanently
I know how
can u help me out?
Yeaz in dms
i already have cable device
Just gimme a minute since I'm not home
Okay but I did read this and I wasn't sure if he meant the file not working for some users or the whole computer itself
Okay weird things are happening and idk why.
Specs:
Cpu: 13th Gen Intel Core i7-13700K
Ram: 64 GB
Gpu : RTX 4070
So I've tried using the latest W-Okada, and the one from a year ago. The newest tends to break then not work at all, while the old one gets worse over time, basically cuts in and out and fails to make any sound at all.
I'm using it for streaming, and changing my voice in game, so I expect delay, No matter what it comes out to about 4 second delay, then gets choppier and choppier. Any ideas on what to do? I can try and gather more info later today like logs and so on.
Thanks in advance, I'm not the best when it comes to tech so even obvious fixes are welcome
Send screenshot of your interface of the latest
what's ur pc gpu exactly? what do u want to do?
RVC = Retrieval-based-Voice-Conversion, the best Few Shots Speech To Speech AI Models (on v2), Inferences (use models) pre-recorded audio (ai covers) and train (make) models. Technically, Mainline RVC does have a go-realtime.bat (aka RVC-GUI), but it's pretty messy and outdated so it's extremely not suggested for realtime.
Wokada = uses RVC for realtime inference. There's 2 main versions, Original made by Wok, and the most suggested one is Deiteris Fork (modified version)
rx 6700 xt
want to try out realtime
so should i go with the fork?
then you need wokada deiteris fork yes, read https://docs.aihub.gg/rvc-voice-changer/local/deiteris-w-okada-fork/
Last update: May 5, 2025
alright thanks alot
yw and lmk!
guys please can i ask for some free ai websites except weights gg i need them
it means for <=4000 serie use one version, for 5000 series use the specifically one internded just fro 5000.
so what would i use for 6000
amd rx6700 xt
you download the AMD version of the voice changer from the corresponding link
How do I get started?
Sorry had to deal with some VA stuff dming it now
Welp Idk what I have managed to do, but now the old one is borked
32 bit float or 64? to train models
how much does mic quality matter? I feel like I can’t get w okada to sound that realistic
im using a hyperx solo cast so like about average
Does not need all that good of a mic, rvc model extracts pitch and phonemes.
hmmm
it still just sounds a bit unnatural ngl I don’t get why
you’re right I read that on the fork site I did the crackle fixes as well but it just sounds weird
Oh, I was told Float was better.
Can someone please point me the right direction on how to get consistent characters in text to images. I am already including as much detail as possible in the prompts. I have tried so many different AI tools already.
well, it should convert it to +/- 1.0 values during load
- make a lora, 2) use image prompt with an image of the character face, for example
I see
At what values do I see the graph better, my friend?
Sorry but I am kinda new to AI. What is lora?
It is basically a set of specific guidelines how to draw something. You train it on a small set of images, I think 20 is enough, then it can draw the same character using a keyword consistently
huh?
the tensorboard, to see the lowest peaks, will 0.7 be fine?
Oh nice I will do some research and try to learn how. Thank you!
if you're using avg_50 charts, they are smooth enough to use 0.5
with more epoch the grap itself smoothes out
since tensorboard does not really shows every logged value
Okay, thanks, I used it at 0.7 or 0.9
for old loss graph 0.987 was necessary because they were so random
do yall have any voice changer that i can use the models with?
can someone help me when im speaking sometimes the voices make a robotic sound with which setting can i avoid that - using okada
Whats your gpu and send screenshot of wokada interface
Whats your gpu
nvidia geforce rtx 5060 ti(0) - i hear some clunky noices and robotic pitches and voice cracks
i cant fix it tried changing the chunk but it doesnt work
Last update: May 5, 2025
Do you have the 2nd nvidia link specifically for rtx 5000
let me check
Its a separate version
deiteris fork?
Yes
i dont have it
so i need to install voice-changer-windows-amd64-cuda.zip.002
?
i read that i need to download both 001 and 002 and then unzip them
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
@pastel oak it said this
what am i supposed to do
it works when im using my cpu but doesnt let me use gpu
Just use the precompiled download from the guide
What youre downloading is not for rtx 5000
Yes it is
I dont think you understand what im trying to say just download the 2nd nvidia link
+CUDA12.8 Pytorch updated
(Pytorch nightly version)
RTX 5080 test done
Windows / NVIDIA
For RTX 5000 users
thanks to deiteris (https://github.com/deiteris/voice-changer)
@pastel oak with my gpu do u have idea for what chunk extra and f0 to run it on (last question sry for being annoying)
No
Start with chunk 200 and extra 2.7
If the "perf" on the graph the one in green color is a low number like 30, then decrease chunk to around 100 if you want less delay
thanks a lot - also should i use the formant (im using it on non supported langueage)
The one has nothign to do with the other, formant pitch shifting is like pitch but more "inbetween", generally not needed unless you want funny chipmunk voices
oh okay thanks a lot
also how do u delete models bec when i go to edit i dont see it
do it manually within the model_dir folder, then restart the voice changer
thanksssss
help, I installed the voice changer and virtual microphone correctly in the voice changer, I also installed the virtual cable on the microphone correctly in the discord, I start, I say, it doesn't work, but when I turn on the loud video, the voice changer perceives it as a voice, and changes the voice to the video, in general, instead of the microphone sound, the voice changer changes the sound of headphones
(im using amd version, im had amd graphics card)
and now, it captures both the sound and the microphone, what should I do?
Screenhot of your interface
Cutting off is you getting too quiet at the end of sentences, keep in. Sens. Threshold further to the left if you moved it to the right
Do you mean crackles with distoetion
No the in. Sens. Under F0
And check crossfade length in advanced settings and bring it to 0.10 if its lower than that
check audio settings in voice changer and make sure the input is the real mic for regular cases
Tooltip and guide has brief explanations but roughly explained it constructs the voice more clearly the higher it is but adds delay
Whenever i try to use the deiteris fork on kaggle, it manages to process and get the server ready, but when i click on the link it says "this site can't be reached" is this just a network problem or something?
one cecond
I specified everything correctly, I can take screenshots, if I did something wrong, please write
show the screenshot here
!give-media-perms 30m @hazy dune
Hey all - just joined here so apologies in advance for any repetitive questions.
I'm pretty new to AI, so is there any material / videos that anyone could recommend,
I'm specifically struggling with getting ChatGPT to recall and give me an accurate time.
Anyone had similar issues or any advice on how to resolve?
I'm also interested to understand if / how I can link up my various platforms to create an autonomous set of AI agents who need minimal human supervision or direction?
Thanks in advance
why would you expect a model that generates tokens from probabilies give you current time?
at best it can repeat something you said from its context
or there can be some patches added to the processing, like storing things you told the model to remember in a special context or running queries without using the llm such as "what time is it now?"
Please do not cross post in multiple channels as it could be considered spamming
if the voice changer is being used, the settings are greyed out
Last update: May 5, 2025
hey everyone - i just had a question
so everyone probably knows how people are using temp student emails to gain free access to veo 3
but i had some concerns - what if google finds out? this may be a dumb question but do u guys think they will charge the cards for the full 15 months
?
You are trying to do the most compute demanding tasks one can think of.. for free
free with some asterisk ffs*
are you a student?
you are trying to steal a service that is being provided conditionally by google with certain expectations
also be aware that google is logging anything and would report you if you attempt to do anything below the board
sorry i dont ever recall saying i was going to use it - i was merley asking a question. so pls mind ur own business
what to do if the ai itself says TRAIL
elaborate
what's ur pc gpu? what do u want to do? what tut link are u following?
4060/I want to use the voice/from the streamer
want to use it in realtime? or in pre-recorded audios? also, share the link of the tutorial you used
I hope you didnt use a youtube tutorial, since video tutorials are outdated asf
realtime
sorry😅
you can delete everything you got off youtube
they use an over year old software
-realtime
Guides for Programs that use RVC Models in Realtime for Calls/Games
Most suggested. GUIDE
ONLY the latest alpha comes close to the Deiteris Fork performance, older versions in youtube tuts are way worse. GUIDE
Unavailable, the guide is outdated and the program is worse compared to the ones above, and much less updated
read the 1st link
official or deiteris fork?
deiteris fork
oh btw if a user has a nvidia gpu you might want to also recommend vonovox
its better than deiteris' fork
why's that?
were there tests done? may i see them?
not extensive tests but tests
i am able to get 35 ms of delay with vonovox with no game open and with overwatch on ultra graphics on 1440p i can get like 50ms delay
lyery has also messed with it i think
But what would be the direct link to the colab since I'm looking and I just can't find it?
An RX 580 and an i7 2700k are no good for making models locally.
is there any way to change the accent on owakada? like even when i speak my first language it sound like i got an accent. is it using like an asian accent ?
with 0 index value it uses your own pronunciation
if you increase the index use it would use a blend between the voice model and you, at 1 it is the voice model's... but there's performance drawback from using the index
illustrious lora training parameter question
why the fuck is this
when i
kohya if you couldnt tell
what's the source for 1st screenshot?
where i download the voice mod?
what kind of files does the beatrice vst take? I'm trying to put a model in but it shows .toml files
Also can someone convert an rvc or pth file to toml
so do people normally use index or not?
for realtime usually not
but if your performance is fine, go for it
i see
do you find you could tell if someones using it?
im trying to get it to sound realistic but it just doesnt lol
index doesnt make things realistic, its just where the accent of the model is stored, so it kinda makes the model sound more truthful to the original dataset
for a more realistic result train a big dataset 
ohhh
makes sense
do you recommend any natural sounding female models?
searched through the channel for it but maybe you have the secret weapon
can someone help me
this question.... 50 times a day
im new
i mean it’s hard to find im ngl
so many egirl models and no resources on natural ones
makes sense why people ask considering this is why most people go to voice changers
just train yourself one
its easy
yea but why?
rvc seems to have been out for a while so id be surprised if no one’s made a good one
i have natural models, it just takes time
like I don’t really mind going through making one but I feel like someone else would have made something a lot better than I could’ve
how long?
if it’s not a lot of manual work that’s fine for me tbh
ok ill try to get on the dev branch and try again
depends
for natural results usually a week, for mid results maybe 1 day or less
is it a lot of just afk hours making it?
bc I mean I could try making one that takes longer
nope, u gotta clean the dataset which takes a lot of time
Do people still use applio for model making? I really haven't heard anyone mention it for some time.i also don't think it's been updated for a few months
I guess applio is still the best option
ohhh
now I get why there’s not many good models lmao
do you know any that are actually decent or do most just use what they make
I see
how much time would you say per day id have to put to make it sound good then?
if it’s like 20 min a day id consider it lol
well this was a 5 hour set that took me around hmm 1 week or so to clean? the audio quality is extremely bad so i had to remove tons of stuff
i think i was cleaning 40 minute per day
oh damn
How does one put an rvc model in beatrice vst?
I think another challenge is actually finding data sets lmao
idk where id even start with that tbh
wait so the actual length depends on the data set you give it?
like a 3 hour data set would be like 3-4 days
my set without cleaning was around 8 hours
that’s a bit confusing ngl but im sure I’ll figure it out once I research how it works
do you think a lesser data set like 4 hours is enough?
im assuming the data sets in the voice models channel were small ones
the 2 hour model i trained of that person sounded pretty mid
not robotic but mid
you guessed it, small datasets give robotic results
so 2 hours was the original data set like your 8 hour one?
originally i had a 3 hour stream, that after cleaning, truncating etc, got me around 2 hours and 30 minutes??? can't remember
ah yes
but that person talks a lot in their streams so ye
hold on i still have the 2 hour model
so u can compare it to this
so this is the 2 hour model
does kinda sound like him??? but not as accurate as the 5 hour one
if you want mommy egirl model literally just search f4m on yt and you will find so much
but still decent since its not robotic at all
LMAOOOO
I didn’t even know something like this existed

lol
i would say 2 hours is enough for most models to not sound robotic
are you saying take multiple videos
yup
wait yea there’s like 7 hour dumps
yeah u can just take that 7 hour alone
but these mommy voice are mad monotone so they wont sound as good as a expressive voice
true
can there be background noise?
these videos have like breathing and rain and shit
yeah that’s the problem ngl
depends
the monotone gives it away imo
ill keep that in mind
yea ngl I need something that is more expressive
legit just a regular woman’s voice is what I need
my workflow is using a noise gate to remove noise, then manually silencing every bad part

LMAO okay
but ye keep in mind very monotone datasets can't do much
this one is very monotone i can say
yea it sounded like it
do you have any recommendations on getting larger data sets of women voices?
vtuber
i only train male voices :D
maybe like certain voice actors idk really what’s out there
ooo I see
you can also look up speed painting with commentary


